Any time that you want a general approach to proving properties of grammars and languages, keep these things in mind…
- Try induction first. Sometimes a direct proof or proof by contradiction is the way to go, but induction is the most likely tool to use for grammars.
- Given some grammar, G = (N, Σ, P, S), if you're asked to prove x ∈ L(G), then you just need to come up with a specific sequence of rules (find a string π ∈ P+ (sequence of one or more values in P) such that S ⇒ x. So the existence of π is the proof that x ∈ P. It demonstrates that there is some computation path that will produce x.
There may be more than one left-most or right-most derivation. One language might have more than one left-most derivation under one grammar and the same language might have only one left-monst derivation under a different grammar (that's called unambiguous grammar). This is the type of grammar we want for compilers. We want different parsers to parse the same code the same way.
S ⇒ x
S ⇒ 0
You can construct an inductive proof based on either |π| (number of steps to derivation or |x| (length of sentenial form). Usually it's easier to do induction on the length of the derivation.
1: S → aSb
2: S → λ
Conjecture:
L(G) = { anbn | n ≥ 0 }
To prove this, you have to show two things:
- If x ∈ L(G) implies x ∈ { anbn | n ≥ 0 }
- If y ∈ { anbn | n ≥ 0 } then y ∈ L(G)
There is a difference. The first says the terminal string has to be of the form { anbn | n ≥ 0 }. You have to prove by induction that if S ⇒ x ∈ Σ*, then x=anbn for some n ≥ 0.
Also, you must prove that if x = anbn, n ≥ 0, then there exists some πx such that S ⇒ x ∈ Σx.
Hypothesis: For n ≥ 0, S ⇒ anbn.
Prove by induction on n:
- Basis Case (n=0): S ⇒(2) λ = a0b0 = λb0 = λλ = λ
- Induction Case: Assume Induction Hypothesis holds for all k, 0 ≤ k ≤ n. Show that it holds for n+1.
Show that: S ⇒ an+1bn+1
S (1)⇒ aSb (1n2)⇒ aanbnb = an+1bn+1 (using the inductive hypothesis).
Inductive Definitions, cont.
We're looking for different ways to specify formal specification systems. Three formal systems for specifying languages or sets:
- Set notation
- Grammars
- Inductive definitions
Recall that inductive definitions have three parts:
- Basis Rule that declares a finite subset of S.
- A set of inductive rules: R1, R2, … Rn where each rule takes the form: if P(x) then f(x) ⊆ S. P(x) is some predicate true about an existing member x of S. f(x) is a finite set of potentially new members of S. Even though f(x) may be finite, some of those members may already exist in S. So it doesn't guarantee that all or any members will be new.
- Standard Closure Rule: Nothing belongs to S that cannot be gotten from rules (1) and (2). You have to be able to define membership exclusively in terms of some finite application of these rules.
Sn is the set of all members of S you can get by n applications of the inductive rules.
S0 is the set gotten by no applications of the inductive rules — in other words, it is the set obtained by the basis rule.
Example 6
illustrates how we can use inductive definitions to prove something about languages. It also illustrates that any time we specify a language in some formal system we are burdened with proving that it does define the language that we say it does.L = { (ab)n | n ≥ 0 } = { λ, ab, abab, ababab, … }
Inductive definition of the same language (L' — give it a different name until you've proved that L'=L):
Basis: λ ∈ L'
Inductive Rules:
R1: If x ∈ L' then xab ∈ L' Choose any string known to exist in L' and you can get a new string in L' by concatenating ab on the end of it… by predicate)
Theorem: L = L'. In order to prove two sets are equal, you must show that each is a subset of the other.
So prove: L ∈ L' and L' ∈ L.
Lemma 1
Claim: L'n (any member of L' that's earned membership by n applications of the inductive rules) … L'n = { (ab)k | 0 ≤ k ≤ n }
(a) x ∈ L'n ⇒ x ∈ { (ab)k | 0 ≤ k ≤ n }
(b) If x ∈ (ab)n for some n ≥ 0, then x ∈ L'n.
Prove (a) and (b) by induction on n. You'll really prove both directions at once.
Basis. L'0 = { (ab)k | 0 ≤ k ≤ 0 }
LHS: L'0 = { λ }
RHS: { (ab)0 } = { λ }
So we have established that LHS = RHS for n=0.
Inductive Case.
Inductive Hypothesis: IH(n) : L'n = { (ab)k | 0 ≤ k ≤ n } for some m ≥ 0.
Show: IH(n) ⇒ IH(n+1)
LHS: Consider L'n+1
= L'n ∪ { y | y ∈ L' by some application R1(x) x ∈ L'}
By IH(n) = { (ab)k | 0 ≤ k ≤ n } ∪ { (ab)kab | 0 ≤ k ≤ n }
≡ ∪ { (ab)k+1 | 0 ≤ k ≤ n }
≡ ∪ { (ab)k | 1 ≤ k ≤ n+1 }
= { (ab)k | 0 ≤ k ≤ n } ∪ { (ab)k | 1 ≤ k ≤ n } ∪ { (ab)n+1 }
= { (ab)k | 0 ≤ k ≤ n } ∪ { (ab)n+1 }
= { (ab)k | 0 ≤ k ≤ n+1 }
∴ that equality holds…
Now that our induction is complete we can assume that our lemma holds.
Case A. L ⊆ L'
Let x ∈ L. By definition of L, x = (ab)n for some n ≥ 0. By Lemma 1, L'n (same n!) = { (ab)k | 0 ≤ k ≤ n }
∴ x ∈ L'n ⊆ L'
∴ L ⊆ L'
Case B.
Let x ∈ L'. Show that x ∈ L.
By definition of L' (closure rule), x ∈ L'n for some n ≥ 0.
∴ x = (ab)n for some n ≥ 0
∴ By definition of L, x ∈ L.
You won't be expected to do proofs in this level of detail, but you will have to become familiar with the language used and the mental discipline…
Finite State Automata
We're going to start with Regular Languages and we'll characterize them in terms of Left-Linear Grammar, Right-Linear Grammar, and Finite State Machines (plus some others). We'll show that the family of regular languages is really an abstract data type closed under a number of operations. Then we'll move into context-free grammars with a very important test (the pumping lemma) that characterizes regular languages to test whether a language is regular or not. We'll develop some mathematical tools (algorithms) for detecting non-regular languages. We'll characterize those (CFGs) in terms of Type 2 grammars and Push-Down Automata. We'll study another extension of the pumping lemma, Pumping Lemma 2 that will get us outside of the context-free into the Turing Machine and Phrase Structured Grammars. These are the recursively innumerable languages, and the largest family of languages that we can describe with computers or computational devices.
Definition: A deterministic finite state machine (automaton) is a 5-tuple, M = (Q, Σ, δ q0, A) where
Q = a non-empty, finite set of states.
Σ = alphabet of input symbols
q0 ∈ Q is the initial state of M
A ⊆ Q is a set of accept states
δ = a function from Q x Σ → Q
What is the domain of δ? It's a cross-product: QxΣ = { (q,a) | q ∈ Q and a ∈ Σ }
|QxΣ| = |Q| * |Σ| if they're both finite.
The fact that it's a function means that it's defined on every member of the domain. So, for all (q,a) ∈ QxΣ, δn(q,a) ∈ Q. This means that if the machine is any of its states it will respond (it won't hang — it will go to some state / do something).
Exercise »
Design a valid finite state machine that accepts all valid Java identifiers.Due: (no due date).

