The Evolution of Language Ambiguity
Among the greatest obstacles to the development of robust natural language technologies is the prevalence of ambiguity. When early toy natural language processing systems were scaled up to handle naturally occurring text, almost every sentence turned out to be massively ambiguous. Words typically have multiple interpretations, and sentences commonly admit several alternative phrase structures. When several such ambiguities combine in a single sentence, the number of possible parses can grow combinatorially.
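To make the combinatorial explosion concrete, the sketch below counts readings under a deliberately naive, hypothetical assumption: every combination of word senses can pair with every binary bracketing of the words, and nothing rules any combination out. The number of binary bracketings of n + 1 words is the Catalan number C(n). The sense counts in the example are invented for illustration, and real grammars prune many of these readings, so the result is an upper bound rather than a parser's actual output.

```python
from math import comb, prod


def catalan(n: int) -> int:
    """Number of distinct binary bracketings of a sequence of n + 1 items."""
    return comb(2 * n, n) // (n + 1)


def count_readings(senses_per_word: list[int]) -> int:
    """Naive upper bound on the readings of a sentence.

    Assumes (hypothetically) that word senses and phrase structure vary
    independently: every sense combination times every binary bracketing.
    """
    n_words = len(senses_per_word)
    return prod(senses_per_word) * catalan(n_words - 1)


# Hypothetical six-word sentence; each entry is one word's number of senses.
senses = [1, 3, 2, 1, 2, 2]
print(count_readings(senses))  # 24 sense combinations x 42 bracketings = 1008
```

Adding one more two-sense word to this example would double the sense combinations and raise the bracketing count from 42 to 132, which is why the growth is multiplicative rather than additive.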
The pervasiveness of ambiguity in natural language is paradoxical. The more interpretations an expression has, the greater the chance that the hearer's interpretation will not match the speaker's intention. Ambiguity should therefore hinder communication. Since languages are primarily media of communication, and since languages evolve, one would expect ambiguity to diminish over time. Evidently it has not, or languages would not be so massively ambiguous.
This project seeks to understand why languages are so ambiguous. Drawing on tools from evolutionary theory and population biology, we are developing and analyzing mathematical models of the evolution of communication in order to discover the conditions under which ambiguous language emerges, spreads, and is maintained. We build on earlier evolutionary approaches to the study of language, but unlike most such work, we do not assume a one-to-one pairing of meanings and expressions.
Our study starts with a computational simulation comparing the evolution of a very simple ambiguous language with that of a very simple unambiguous language. Under most conditions the unambiguous language comes to dominate the population, but the ambiguous language emerges when one meaning is used overwhelmingly often. An analysis of WordNet shows that this finding reflects actual word usage in English.
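The intuition behind this result can be illustrated with a toy replicator-dynamics sketch. It is not the simulation described above. Its ingredients are hypothetical assumptions chosen for illustration: two meanings, the more common one used with frequency p; an ambiguous language with one word, whose hearers always guess the more common meaning; and an unambiguous language with two words, always understood, whose speakers pay a fixed cost for learning and storing the second word. Payoffs here do not depend on the population mix, which keeps the model minimal.

```python
def payoffs(p: float, cost: float) -> tuple[float, float]:
    """Expected payoff per exchange for (ambiguous, unambiguous) speakers.

    p    -- frequency of one of the two meanings (0 <= p <= 1)
    cost -- hypothetical cost of maintaining a second word (0 <= cost < 1)
    """
    # One word for both meanings: the hearer guesses the more common
    # meaning, so communication succeeds only that often.
    ambiguous = max(p, 1.0 - p)
    # Two words: always understood, but pays the assumed extra-word cost.
    unambiguous = 1.0 - cost
    return ambiguous, unambiguous


def simulate(p: float, cost: float, x0: float = 0.5, generations: int = 200) -> float:
    """Discrete-time replicator dynamics; returns the final share of ambiguous users."""
    f_amb, f_unamb = payoffs(p, cost)
    x = x0
    for _ in range(generations):
        mean_fitness = x * f_amb + (1.0 - x) * f_unamb
        # Each type grows in proportion to its fitness relative to the mean.
        x = x * f_amb / mean_fitness
    return x


for p in (0.7, 0.9, 0.99):
    print(p, round(simulate(p, cost=0.05), 4))
```

Under these assumptions the ambiguous language takes over only when p > 1 - cost, so with a small cost it wins only when one meaning dominates usage, which mirrors the qualitative pattern reported above. A fuller model would make payoffs depend on who is talking to whom.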