Psychology 247 Cognitive Psychology
Language Structure
Erwin Segal
Language and its structure have been a field of study for many centuries. Even the Bible noted that people who speak different languages cannot communicate with one another. Some subtleties of language structure were studied by Plato and Aristotle. The invention of the alphabet by Phoenician scholars over 2500 years ago showed detailed study of language.

Language is the most important social product for preserving and passing on culture; and its study can be seen to be an important window on the mind. A natural language is very complicated, yet easy for children to learn. Almost all normal children know a great deal about their language by 3 years old, and most subtle nuances by the time they are 10. Interestingly, people who learn a language when they are very young know subtle properties better than those who learned the language when they were older, and this difference tends not to be overcome with years of speaking.

Language manifests structure at many different levels. The sequences of sounds are limited. The sequences of words are limited, and the same words in different orders mean different things. "Dog bites man" is a fairly common event, but "Man bites dog" is newsworthy. The order of "man" and "dog" is not the primary cause of the difference, because "Man is bitten by dog" has the same relative order as "Man bites dog" but the same logical meaning as "Dog bites man."

Language and Communication
Generally it is thought that languages probably evolved so that people could communicate with one another. Thus the primary function of a natural human language is communication.

A first approximation of how the process of communication works can begin with a structure originally articulated by C. E. Shannon in 1948 and published as The Mathematical Theory of Communication in 1949. The basic idea is represented in this flow chart:

[Figure: Shannon's view of a communication system]

The source is in some sense the mind of the communicator. It contains the idea or message that is to be communicated. In order for the process to proceed, the message has to be put into a form by which it can be encoded and transmitted. In normal speech this would be putting it into language as a sequence of sound units of speech (phonetics and phonology). These units are put into a sound wave form by the muscles of the lungs, vocal cords, tongue and other articulators. Then the sound waves that are produced go through the air, which is the medium in this case. The ears and brain have to decode the message and put it into a form that can be understood by the mind of the listener at the destination.
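The chain described above (source, encoding, medium, decoding, destination) can be sketched in Python. This is only an illustrative toy under stated assumptions: the function names are hypothetical, and a string of bits stands in for the sound units of speech.

```python
import random

def encode(message):
    """Encoder: turn the message into a linear sequence of bits (8 per byte)."""
    bits = []
    for byte in message.encode("utf-8"):
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits

def channel(bits, noise=0.0, rng=None):
    """Medium: pass the bits along, flipping each one with probability `noise`."""
    rng = rng or random.Random(0)
    return [bit ^ 1 if rng.random() < noise else bit for bit in bits]

def decode(bits):
    """Decoder: regroup the bits into bytes and rebuild the text."""
    data = bytes(
        int("".join(map(str, bits[i:i + 8])), 2)
        for i in range(0, len(bits), 8)
    )
    # Bytes damaged by noise may no longer be valid text; mark them instead of failing.
    return data.decode("utf-8", errors="replace")

def communicate(message, noise=0.0, rng=None):
    """Source -> encoder -> medium -> decoder -> destination."""
    return decode(channel(encode(message), noise, rng))

print(communicate("Dog bites man"))
```

With `noise=0.0` the destination receives exactly what the source sent; raising `noise` shows how a message can be damaged in the medium even when encoder and decoder both work correctly.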

These questions can be asked for both oral and written communication.

Some properties of natural languages

  • Concatenation: Languages present themselves as a sequence of elements. The physical communication process proceeds (more or less) unit by unit over time or space. A code of a natural language "linearizes" the message so that it can be sequentially transmitted.
  • Dual patterning: Language can be seen to have at least two different structures: one that relates to the sounds (or other forms of the physical instantiation of the language) and another that relates to the meanings of the utterances. The first system is often designated the phonological system and the second the semantic system. The smallest linguistic units of the phonological system are generally thought to be phonemes. The smallest units of the semantic system are morphemes. Although many words are single morphemes, others contain more than one morpheme. Larger units in both systems are generally based on rules of combination of more elementary units, and the combinatorial process continues for several levels, which gives natural languages a hierarchical structure. Phonology is the study of the sound system and its principles of combination. Syntax is the study of the rules of combination of units, particularly those forms which underlie the semantic system. Semantics (as Segal views it) is the study of relating the meaningful forms to their interpretations. All of these studies include differences among languages and processes of acquisition and change.
  • Arbitrariness: Generally there is no clearly identified set of properties in the phonological system that can be used to identify the meanings of the forms. There are certain subregularities within languages, but the same principles only hold very loosely or not at all between languages.
  • Discreteness: There are elements and units which comprise all natural languages. Starting with the sound system or the written alphabet, and continuing to higher level elements such as words, phrases, and sentences, elements are interpreted as being in one category or another. There is no continuous flow between categories. For example, a sound is categorized as either a /b/ or a /p/ even if the actual sound is exactly in the middle between the two. Likewise, a word is either 'bit' or 'pit', not an intermediate word that means a piece of a peach stone.

  • Openness (productivity and creativity): Natural languages have the capability of being able to communicate about almost anything. A language is not limited to a finite, a priori set of messages. Although the elements are discrete, that does not mean that new ones cannot be added to the list. The sound system of a language is relatively stable, but words of certain types are created fairly easily, and higher order units such as sentences and paragraphs are very often novel creations. Sequences of elements can construct meanings that were never communicated before.
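The discreteness property in the list above can be illustrated with a small Python sketch. The single numeric `cue` running from 0 (a clear /b/) to 1 (a clear /p/) and the boundary at 0.5 are assumptions made for illustration, not measurements.

```python
def categorize(cue, boundary=0.5):
    """Map a continuous acoustic cue onto one of two discrete phonemes.

    `cue` is a hypothetical scale: 0.0 is a clear /b/, 1.0 is a clear /p/.
    Every value, even one exactly at the boundary, receives one label.
    """
    return "/b/" if cue < boundary else "/p/"

def perceive_word(cue):
    """The listener hears 'bit' or 'pit', never something in between."""
    return categorize(cue).strip("/") + "it"

for cue in (0.1, 0.5, 0.9):
    print(cue, categorize(cue), perceive_word(cue))
```

However finely the input varies, the output set contains only the two categories.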
    Chomsky
    The scientific study of natural language received its greatest boost from the contributions of Noam Chomsky, starting in the 1950s (Syntactic Structures, 1957; Review of Skinner's 'Verbal Behavior', 1959; Aspects of the Theory of Syntax, 1965).

    Prior to Chomsky the general view was that the structure of natural languages was too variable and inconsistent to be described in a meaningful way. The logicians thought that anyone who wanted to communicate precisely should write in a reformed language more like a first-order predicate calculus.

    Chomsky argued that natural languages have a rather precise structure with which to represent meanings; it was just that no one had studied it correctly. One could think of finding the formal structures that underlie all natural languages.

    In 1957 and thereafter Chomsky argued that the productive component of language was its syntax, and that phonology and semantics were interpretive implementations of the syntax. One could generate an infinite number of sentences by following generative and recursive rules in the syntactic component of the language. This was part of the linguistic competence that all humans either already had or had the potential to develop. One could discover the rules of language by careful, systematic, intuitive analyses of different sentences. By 1965 Chomsky seemed to think that there would be many grammars of natural languages in the very near future, following his guidelines. Anderson's analyses of language structure on pages 354-362 are based on a small part of this early work by Chomsky, as are pp 366-367, 374-377, and 379-386.

    Chomsky believed that the fact that we know a language means that we somehow have internalized a grammar of that language. It is up to linguists and other cognitive scientists to figure out what the properties of the internalized grammar are. He spent some time discussing the nature of this knowledge.

    Chomsky suggested that scientists can construct formal grammars that represent the internal knowledge. He also suggested that there were different levels of constraints on the grammars proposed. The weakest constraint was one that simply had empirical adequacy. Such a grammar would be a formal system by which one could generate all and only the sentences of a language. A second level added structural adequacy to empirical adequacy. This grammar not only would generate the appropriate sentences, but also would show the appropriate set of structural relations among the units in the grammar. The third level was what Chomsky called theoretical adequacy. This level added to the others the potential to be acquired by humans in the time and with the data available. Chomsky himself did not spend much time on the learning processes that one might use to acquire a language.

    Formally Chomsky defined a language as the set of sentences that are sentences of that language. A grammar of the language is a finite set of rules that generate all and only the sentences of the language. Also, (equivalently?) a grammar could receive any string of concatenated elements and decide whether the string is a sentence in the language or not. People who know a language supposedly have the ability to generate sentences and to evaluate them.
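The two roles of a grammar described above, generating the sentences of a language and deciding whether a given string is one of them, can be sketched in Python. The tiny grammar and its four-word vocabulary are hypothetical, chosen only so that the whole language can be listed.

```python
from itertools import product

# Hypothetical toy grammar: each category maps to its alternative expansions.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["dogs"], ["cats"]],
    "VP": [["sleep"], ["chase", "NP"]],
}

def expand(symbol):
    """Yield every word sequence that `symbol` can be rewritten as."""
    if symbol not in RULES:  # a word, not a category
        yield [symbol]
        return
    for alternative in RULES[symbol]:
        # Combine every expansion of each part of the alternative, in order.
        for parts in product(*(list(expand(s)) for s in alternative)):
            yield [word for part in parts for word in part]

def generate():
    """Generator role: the full set of sentences of the language."""
    return {" ".join(words) for words in expand("S")}

def match(symbol, words, start):
    """Positions at which `symbol` can end if it begins at index `start`."""
    if symbol not in RULES:
        if start < len(words) and words[start] == symbol:
            return {start + 1}
        return set()
    ends = set()
    for alternative in RULES[symbol]:
        positions = {start}
        for part in alternative:
            positions = {e for p in positions for e in match(part, words, p)}
        ends |= positions
    return ends

def accepts(string):
    """Recognizer role: is the whole string a sentence (an S)?"""
    words = string.split()
    return len(words) in match("S", words, 0)

print(sorted(generate()))
print(accepts("dogs chase cats"), accepts("chase dogs cats"))
```

For this finite language the two procedures agree on every string: `accepts` returns true for exactly the six sentences that `generate` produces.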

    Syntax
    There is without doubt a hierarchical structure to natural languages. The idea of hierarchical structure in natural languages has been known since ancient times. Chomsky, however, is the first modern person who tried to analyze the structure of natural language with mathematical precision. A concatenated sequence of elements can be organized into sequences of larger units. For example, a string of phonemes may be parsed into a string of words; a string of words into phrases, some phrases into higher order phrases; phrases into clauses; clauses into sentences; sentences into paragraphs; paragraphs into stories, etc. There are constraints as to what is acceptable at every level of the hierarchy. E.g., a string of phonemes may not be parsed into a string of words if there are errors in the structure at that level. Words may not make legitimate phrases, etc.

    Another form of evidence for the hierarchical nature of language is that some sentences are ambiguous: the same string of words can be parsed into different higher order units, and a listener may choose a parse other than the one intended for the communication.
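As a rough illustration of this kind of ambiguity, the Python sketch below stores two groupings of one word string as nested tuples. The phrase "old men and women" and the category labels are hypothetical examples, not taken from the lecture.

```python
# Each tree is (label, child, child, ...); a leaf is (label, word).
# Reading A: "old" applies to both men and women.
READING_A = ("NP", ("Adj", "old"),
             ("NP", ("N", "men"), ("Conj", "and"), ("N", "women")))
# Reading B: only the men are old.
READING_B = ("NP", ("NP", ("Adj", "old"), ("N", "men")),
             ("Conj", "and"), ("N", "women"))

def is_leaf(children):
    return len(children) == 1 and isinstance(children[0], str)

def words(tree):
    """Flatten a tree back into its left-to-right word sequence."""
    label, *children = tree
    if is_leaf(children):
        return [children[0]]
    return [word for child in children for word in words(child)]

def bracket(tree):
    """Show the grouping of a tree with square brackets."""
    label, *children = tree
    if is_leaf(children):
        return children[0]
    return "[" + " ".join(bracket(child) for child in children) + "]"

print(" ".join(words(READING_A)), "=", " ".join(words(READING_B)))
print(bracket(READING_A))
print(bracket(READING_B))
```

Both trees flatten to the same word string, so the difference between the readings lives entirely in the grouping of the units.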

    Chomsky proposed that one could begin to generate sentences by an algorithm that sequentially applies sets of rules. The first set consists of rewrite rules. A rewrite rule might be S --> NP + VP, to be interpreted as: rewrite S (for sentence) as NP (noun phrase) concatenated with VP (verb phrase). Further rules are NP --> D + N, which rewrites NP as a determiner (the, a, this, etc.) plus a noun, and VP --> V + NP. Then D, N, and V are rewritten as words in the lexicon: D --> the, a; N --> boy, dog; V --> kissed, bit. This little grammar can generate the sentences "A dog kissed the boy" and "The boy bit the dog," along with 30 others. This grammar seems to capture something about the structure of language, but it obviously needs enhancing. A sentence such as "The man is coming home to his wife" shows that the first and second NP cannot be generated totally independently of one another.
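The little grammar above is small enough to run. The Python sketch below stores the rewrite rules in a dictionary and applies them to the leftmost rewritable symbol; the function names and the leftmost-first strategy are assumptions made for illustration.

```python
# The rewrite rules from the text, one entry per left-hand symbol.
REWRITE = {
    "S": [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"]],
    "D": [["the"], ["a"]],
    "N": [["boy"], ["dog"]],
    "V": [["kissed"], ["bit"]],
}

def derive(choose, start=("S",)):
    """Rewrite the leftmost rewritable symbol until only words remain.

    `choose` picks one alternative from a list of possible rewrites.
    Returns every intermediate string of the derivation.
    """
    current = list(start)
    steps = [current]
    while any(symbol in REWRITE for symbol in current):
        i = next(k for k, symbol in enumerate(current) if symbol in REWRITE)
        current = current[:i] + choose(REWRITE[current[i]]) + current[i + 1:]
        steps.append(current)
    return steps

def all_sentences(symbols=("S",)):
    """Enumerate every word string derivable from `symbols`."""
    symbols = list(symbols)
    for i, symbol in enumerate(symbols):
        if symbol in REWRITE:
            results = []
            for alternative in REWRITE[symbol]:
                results.extend(
                    all_sentences(symbols[:i] + alternative + symbols[i + 1:])
                )
            return results
    return [" ".join(symbols)]

for step in derive(lambda alternatives: alternatives[0]):
    print(" ".join(step))
print(len(all_sentences()))
```

Passing `random.choice` as `choose` (after `import random`) yields a random sentence instead of always taking the first alternative. The enumeration also shows the grammar's weakness: it treats "the boy kissed the boy" as no different from any other sentence, since nothing links one NP to the other.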

    Trying to demonstrate the kinds of constraints that a language puts onto the cognitive system that uses it is very informative. One cannot generate a language simply from left to right, or beginning to end, because the constraints extend over large regions of the text. In "The boy who ... (whatever) ... is coming," for example, the verb "is" depends on "The boy" no matter how much material intervenes. One cannot even generate sentences in the strictly hierarchical manner implied by the phrase structure rules and rewrite rules.

    The same basic content can be presented with many different grammatical structures. Chomsky, taking off from his mentor, Zellig Harris, proposed that different sentences with the same set of meaning relations are transformationally related to one another. It is true that speakers can easily generate different forms of the same content, and there is evidence that the closer the formal relationship, the faster this can be done. Examples: "The boy kissed the girl." "The girl was kissed by the boy." "The one who kissed the girl is the boy." "It's the girl that the boy kissed." "The boy didn't kiss the girl." "Who kissed the girl?" "Who did the boy kiss?"
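The idea that several surface forms share one set of meaning relations can be sketched in Python. This is not Chomsky's formalism, which operates on phrase structure; it is a simplified template version in which the `Kernel` record and all function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Kernel:
    """A hypothetical meaning skeleton: who did what to whom."""
    agent: str      # e.g. "the boy"
    verb_past: str  # simple past form, e.g. "kissed"
    verb_base: str  # bare form, e.g. "kiss"
    patient: str    # e.g. "the girl"

def finish(text, end="."):
    """Capitalize the first letter and add final punctuation."""
    return text[0].upper() + text[1:] + end

def active(k):
    return finish(f"{k.agent} {k.verb_past} {k.patient}")

def passive(k):
    # Assumes the past participle equals the past form, as with "kissed".
    return finish(f"{k.patient} was {k.verb_past} by {k.agent}")

def cleft(k):
    return finish(f"it's {k.patient} that {k.agent} {k.verb_past}")

def negative(k):
    return finish(f"{k.agent} didn't {k.verb_base} {k.patient}")

def subject_question(k):
    return finish(f"who {k.verb_past} {k.patient}", "?")

def object_question(k):
    return finish(f"who did {k.agent} {k.verb_base}", "?")

kiss = Kernel("the boy", "kissed", "kiss", "the girl")
for form in (active, passive, cleft, negative, subject_question, object_question):
    print(form(kiss))
```

Each function reads the same four fields, which is the sense in which the forms are related: only the arrangement and the small grammatical words change.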

    Once we have the concept of a set of rules that relate one sentence to another, as in the examples above, it can be seen that there can be relations between sentence-like components within a single sentence: "John met a man who won a gold medal at the Olympics." This sentence contains, within a noun phrase, the sentence-like clause "who won a gold medal at the Olympics."
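Because a noun phrase can contain a sentence-like clause that itself contains a noun phrase, a rule of this kind can apply to its own output. The Python sketch below uses one hypothetical recursive rule, NP --> "a man" or "a man who met" + NP, with a depth argument that only limits how far the recursion is followed.

```python
def noun_phrase(depth):
    """NP --> 'a man' | 'a man who met' NP (a hypothetical recursive rule)."""
    if depth == 0:
        return "a man"
    return "a man who met " + noun_phrase(depth - 1)

def embedded_sentence(depth):
    """Build one sentence with `depth` levels of embedded clauses."""
    return "John met " + noun_phrase(depth) + "."

for depth in range(3):
    print(embedded_sentence(depth))
```

Every value of `depth` gives a different, longer sentence, which is one way to see how a finite set of rules can generate an unbounded number of sentences.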

    Other issues in grammar include the fact that the choice among different forms obviously depends upon the particular pragmatic purpose of the communication. Even though the forms may have similar 'cognitive meanings', they have very different implications. These pragmatic constraints make a mathematical underpinning of the language that much more difficult to find and to understand psychologically.

    There is behavioral evidence of hierarchical structure including the pause patterns of speakers, where hesitations and errors appear in speech, and even the interesting phenomenon that errors or disfluencies are acceptable and even informative in some sentence locations and not others. If one forgets words when trying to rotely recall a sentence, the grammar predicts where the errors will occur.

    Theories of the Acquisition and Cognition of Language
    There is no current agreement on how people learn and use language. The problem is very hard. Children know their own language intuitively very well, but no linguist has yet been able to describe that knowledge. Classical behaviorist learning theories cannot approach explaining any of the complex patterns of language and its use. The language that is acquired often seems to be underdetermined by the input received by the language learner. People who learned a natural language early in life seem to know the subtleties better than others who have had as much or more experience but started later. Some argue that the knowledge of language far surpasses the level of cognition exhibited by the child.

    It has been proposed that language is an independent modular cognitive system that is encapsulated and basically independent of influence from other cognitive systems of the human organism. It supposedly has its own set of acquisition principles, the principal one being that language acquisition is innate. Recently a theory of parameter setting has been proposed to account for the differences in the forms of different languages. This theory says that we all have all the possible different structural constraints innately present, but they need some input to be released. The input from our first language sets the parameters. For example, a switch differentiates 'subject-verb-object' order of basic sentences from 'subject-object-verb' order. Another may mark 'pronoun deletion.'
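The switch metaphor can be made concrete with a small Python sketch. Everything here is a simplifying assumption: the two boolean fields, the English stand-in words, and the crude rule that a single heard utterance is enough to flip a switch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Parameters:
    verb_final: bool = False  # False: subject-verb-object; True: subject-object-verb
    pro_drop: bool = False    # True: a pronoun subject may be left unexpressed

PRONOUNS = {"I", "you", "he", "she", "it", "we", "they"}

def linearize(subject, verb, obj, params):
    """Order the same three elements according to the parameter settings."""
    order = [subject, obj, verb] if params.verb_final else [subject, verb, obj]
    if params.pro_drop and subject in PRONOUNS:
        order = order[1:]  # the subject comes first in both orders
    return " ".join(order)

def set_parameters(utterances):
    """Set the switches from heard utterances.

    Each utterance is a list of role tags such as ["S", "V", "O"] or
    ["O", "V"]. One matching utterance flips a switch (a crude assumption).
    """
    verb_final = any(u[-1] == "V" and "O" in u for u in utterances)
    pro_drop = any("S" not in u for u in utterances)
    return Parameters(verb_final, pro_drop)

heard = [["S", "O", "V"], ["O", "V"]]
params = set_parameters(heard)
print(params)
print(linearize("she", "reads", "books", params))
```

A learner who hears both verb-final and verb-medial utterances has no obvious way to leave one switch in two positions, which is the kind of difficulty critics raise for bilingual acquisition.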

    Alternative views argue that language, though it may have some innate components, is generally part of cognition and its properties are not that unique. It is learned to a great extent via the different functions that language serves. There is no real evidence that there is a critical period for language acquisition. Different roles that language plays in the environment of children seem to be somewhat correlated with particular acquisitions. Parameter setting seems to be a model which would have many disastrous consequences in bilingual children, or even in learning second languages in general. There is no good evidence that language is encapsulated. Specific environmental inputs affect different aspects of language access.

    We have a problem, lots of facts, and no clear explanation.