Summary prepared by Matt Smith (additions and comments by Frawley)
One of the first things we notice about speech sounds is their variety. Consider the way Bobby McFerrin creates a beat, or the click sounds in the language spoken in "The Gods Must Be Crazy." But while no language organizes its rhythms the way Bobby McFerrin produces some of his (chest beating, e.g.), some languages do use different sorts of clicks as basic sounds.
In spite of their variety, speech sounds are universally limited. First, they are limited by the circumstances of their production: no two spoken words are ever the same. They vary with the speaker's vocal tract size, mass, length of vocal cords, etc. They also vary with other factors reflecting the person's performance, such as whether the speaker is sick.
Second, each speech sound contains different features, but these are drawn from a universal set. These features may vary slightly from speaker to speaker. But there exist salient properties of sounds that make the slight changes insignificant. Even infants have the ability to discriminate between the different features.
The features can be summed up into about 20 questions: e.g., is it nasal? is it labial? etc. Some examples of these features are:
[m] -- [sonorant, nasal, stop, labial]
[u] -- [vocalic, (sonorant), high, round, ATR]
[g] -- [stop, dorsal, slack vocal folds]
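The "20 questions" idea can be sketched by treating each sound as a set of features and each question as a membership test. This is only an illustration: the names FEATURES, has_feature, and shared_features are hypothetical, the feature bundles are copied from the three examples above, and the parenthesized (sonorant) on [u] is assumed here to count as an ordinary feature.

```python
# Feature bundles taken from the three examples above.
# Assumption: the parenthesized "(sonorant)" on [u] is treated as present.
FEATURES = {
    "m": {"sonorant", "nasal", "stop", "labial"},
    "u": {"vocalic", "sonorant", "high", "round", "ATR"},
    "g": {"stop", "dorsal", "slack vocal folds"},
}


def has_feature(sound, feature):
    """Answer one yes/no question, e.g. 'is it nasal?'."""
    return feature in FEATURES[sound]


def shared_features(a, b):
    """Return the features two sounds have in common."""
    return FEATURES[a] & FEATURES[b]


print(has_feature("m", "nasal"))   # is [m] nasal?
print(has_feature("g", "labial"))  # is [g] labial?
print(shared_features("m", "g"))   # what do [m] and [g] share?
```

On this toy data, [m] and [g] share only the feature "stop", which is the kind of grouping the questions are meant to expose.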
Each language has a fixed definition of which features are allowed. This is what causes great difficulty for people learning other languages. Once you learn one language, you become accustomed to its feature definitions. Learning new definitions is very hard because you must build them on top of your existing ones.
Because of these invariants, speech perception is in some sense imperfect. There are illusions in speech perception, and what makes them interesting is that they can sometimes be learned.
Note: just as there are visual illusions, there are also speech illusions. This is because, in these cases at least, the mind applies "what it knows" absolutely. How do we determine what parts of a sound are "in the head"?
One way to see the mental representation of sounds is to look for minimal pairs, words that differ in only one sound. Because that difference in sound causes a difference in meaning, the sound difference is significant to the speaker. But just as there are properties of sound that are preserved and hence are significant to the mental representation, there are physical distinctions which are lost in memory. For example, the long and short versions of a vowel in English are both stored in the one memory location of that vowel. The difference between these versions is not mentally significant in English and is attributable to the way they are pronounced more than the way they are thought of. Moreover, these differences -- significant or not -- are never taught to us directly but picked up in the act of learning the language.
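The minimal-pair test described above can be sketched as a small search over a word list. Everything in this sketch is an assumption made for illustration: the four-word LEXICON, the rough ASCII stand-ins for sounds, and the function names. Words are treated as sequences of sounds, and two words form a minimal pair when they have the same length and differ in exactly one position.

```python
from itertools import combinations

# Hypothetical toy lexicon; each word is a tuple of rough ASCII sound symbols.
LEXICON = {
    "pat": ("p", "a", "t"),
    "bat": ("b", "a", "t"),
    "bad": ("b", "a", "d"),
    "pit": ("p", "i", "t"),
}


def is_minimal_pair(a, b):
    """True if two sound sequences differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def minimal_pairs(lexicon):
    """List every pair of words in the lexicon that forms a minimal pair."""
    return [
        (w1, w2)
        for w1, w2 in combinations(sorted(lexicon), 2)
        if is_minimal_pair(lexicon[w1], lexicon[w2])
    ]


print(minimal_pairs(LEXICON))
```

Each pair found this way points at one sound contrast (for instance [p] versus [b] in "pat" and "bat") that speakers must be keeping apart mentally.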
Expletive Infixation is another process that illustrates a property of speech which is never taught to us, but which we all know and use. The rule we all seem to have learned is that the inserted expletive, itself a foot (some combination of strong and weak stress), must go somewhere between feet and must go before the last foot. Judging where the infix should go comes naturally to us all, even though there are a great many possibilities. Who teaches us this rule? Why do we put the infix where we do?
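One reading of this rule can be sketched as follows: the only candidate sites are the boundaries between adjacent feet, so the infix always ends up before the last foot and never at the edges of the word. The function name, the placeholder infix "bloomin", and the division of the example words into feet are all assumptions made for illustration, not an analysis taken from the lecture.

```python
def infixation_candidates(feet, infix):
    """Return every form made by inserting the infix at a boundary between feet.

    `feet` is a word already divided into feet (an assumed, hand-made division).
    A word with a single foot has no internal boundary, so it yields no forms.
    """
    candidates = []
    for i in range(1, len(feet)):
        before = "".join(feet[:i])
        after = "".join(feet[i:])
        candidates.append(before + "-" + infix + "-" + after)
    return candidates


# Hypothetical foot divisions, for illustration only.
print(infixation_candidates(["fan", "tastic"], "bloomin"))
print(infixation_candidates(["Kala", "mazoo"], "bloomin"))
print(infixation_candidates(["cat"], "bloomin"))
```

The point of the sketch is that the rule refers to feet, a unit that is never pronounced as such, which is why forms that split a foot sound wrong to speakers.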
These examples show a critical point about linguists' approach to language: in addition to the speech that you hear and say, there must be some abstract structure of sound that is implicit but nevertheless there: what Prof. Idsardi called "inaudible structures." Otherwise, the patterns in what we do observe would not appear. Just as syntax and semantics require abstract implicit structure, so too does the system of speech sounds.