Why is it so difficult to build a machine that understands language? Languages are ambiguous: a single word or statement can have multiple meanings, and one idea can be expressed in many ways. Two kinds of ambiguity are semantic ambiguity (e.g. batter has two meanings) and syntactic ambiguity (e.g. like can be a preposition or a verb).
There is also morpho-syntactic (part-of-speech) ambiguity: the same word can be, for example, an adjective or a noun.
Ambiguity also occurs within a single part of speech. Burn has more than one meaning as a verb: it can mean, for example, to write data to a CD or to set something on fire.
Parts of speech:
- Determiner (a, the): a closed class (no new determiners are added to the language)
- Noun (person, place, thing, or idea)
- Adjective (e.g. pretty, nice, green, fast, crispy, easy)
- Verb (e.g. tell, run, go, eat, burn)
- Adverb (e.g. quickly, slowly, fast, well, south)
- Auxiliary Verb (have, be, do)
- Modal Verb (e.g. may, can, might, will, ought?)
- Preposition (e.g. over, under, through, in, on, with): a closed class (no new prepositions are added to the language)
- Conjunction
  - Coordinating: and, or, but, nor, so, yet
  - Subordinating: although, when, if, in so far as, as, after, before
- Pronoun (e.g. he, she, they, it, him, her, them)
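Part-of-speech ambiguity can be seen directly in the word lists above. The sketch below builds a small lexicon from those lists and reports every word that appears under more than one part of speech. The names WORDS_BY_TAG, build_lexicon, and ambiguous_words are made up for this illustration, and "like" is added as both a verb and a preposition, following the example at the top of these notes.

```python
from collections import defaultdict

# Word lists copied from the notes. "like" is added to the verb and
# preposition lists because the notes cite it as an ambiguous word.
WORDS_BY_TAG = {
    "determiner": ["a", "the"],
    "adjective": ["pretty", "nice", "green", "fast", "crispy", "easy"],
    "verb": ["tell", "run", "go", "eat", "burn", "like"],
    "adverb": ["quickly", "slowly", "fast", "well", "south"],
    "auxiliary": ["have", "be", "do"],
    "modal": ["may", "can", "might", "will"],
    "preposition": ["over", "under", "through", "in", "on", "with", "like"],
    "pronoun": ["he", "she", "they", "it", "him", "her", "them"],
}


def build_lexicon(words_by_tag):
    """Invert the tag -> words table into a word -> set-of-tags lexicon."""
    lexicon = defaultdict(set)
    for tag, words in words_by_tag.items():
        for word in words:
            lexicon[word].add(tag)
    return dict(lexicon)


def ambiguous_words(lexicon):
    """Return the words that carry more than one part-of-speech tag."""
    return {word: sorted(tags) for word, tags in lexicon.items() if len(tags) > 1}


LEXICON = build_lexicon(WORDS_BY_TAG)
print(ambiguous_words(LEXICON))
# {'fast': ['adjective', 'adverb'], 'like': ['preposition', 'verb']}
```

A tagger that looks up "fast" or "like" in this lexicon gets two candidate tags and needs the surrounding context to choose between them.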
If the same sentence can be derived in two different ways (that is, it has two distinct parse trees) using the same grammar, the grammar is ambiguous.
Context-free grammars
S → S + S | S - S | S * S | S / S | T
T → id1 | id2 | id3
This is an ambiguous grammar. You can prove this by deriving the sentence id1 * id2 + id3 with two different parse trees: one groups it as (id1 * id2) + id3, the other as id1 * (id2 + id3).
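As a rough check of the ambiguity claim, the sketch below enumerates every parse tree of a token sequence. It assumes the ambiguous form of the grammar, S → S op S | T with T → id1 | id2 | id3, where op is one of +, -, *, /. The function name parse_trees and the nested-tuple tree representation are choices made for this illustration, not part of the notes.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "/"}
IDENTIFIERS = {"id1", "id2", "id3"}


def parse_trees(tokens):
    """Return all parse trees of tokens under S -> S op S | T, T -> id.

    A tree is either an identifier string or a (left, operator, right) tuple.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(start, end):
        # All ways to derive tokens[start:end] from S.
        results = []
        # S -> T -> id: a single identifier token.
        if end - start == 1 and tokens[start] in IDENTIFIERS:
            results.append(tokens[start])
        # S -> S op S: try every operator position as the top-level split.
        for k in range(start + 1, end - 1):
            if tokens[k] in OPERATORS:
                for left in trees(start, k):
                    for right in trees(k + 1, end):
                        results.append((left, tokens[k], right))
        return tuple(results)

    return list(trees(0, len(tokens)))


for tree in parse_trees("id1 * id2 + id3".split()):
    print(tree)
# ('id1', '*', ('id2', '+', 'id3'))
# (('id1', '*', 'id2'), '+', 'id3')
```

Two trees for one sentence is exactly the condition for ambiguity stated above. A sentence with a single operator, such as id1 + id2, has only one tree.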
Compound nouns (e.g. history book) are not recognized by the grammar on the handout.


