Definition 2. Σ* = {x|x is a finite string over Σ}
What do we mean by "string"? Here we will use an inductive definition.
Basis: λ ∈ Σ* and |λ| = 0 and a ∈ Σ* for all a ∈ Σ and |a| = 1. Type conversion from symbols to string (char to string).
Inductive Step: If x ∈ Σ* then λ ⋅ x = x ⋅ λ = x
|a ⋅ x| = |x ⋅ a| = |x| + 1
"little dot": {(λ, λ)} ∪ Σ* ∗ Σ ∪ Σ ∗ Σ* → Σ*
Σ* set of strings
L ⊆ Σ* is a language over Σ
Concatenation: If L and M are strings over the same alphabet…
L, M ⊆ Σ*
L ⋅ M (Set concatenation.) = { x ⋅ y | x ∈ L and y ∈ M} (String concatenation.)
x ∈ L = { λ, ab, b }
y ∈ M = { ba, b, aa }
L ⋅ M = { (λ) ba, b, aa, (ab) abba, abb, abaa, (b) bba, bb, baa }
M ⋅ L ≠ L ⋅ M (in general)
Proposition 1: Φ ⋅ L = L ⋅ Φ = Φ
Φ ⋅ L = { x ⋅ y | x ∈ Φ and y ∈ L } … will always be false. (Look this up, I didn't get it all down.)
{λ} ⋅ L = L ⋅ {λ} = L
λ ⋅ L is nonsense! Do not write this on an exam!
Lk = L ⋅ L … L (k times)
L0 ≡ {λ} by definition
Lk = L
Lk + 1 = Lk ⋅ L = L ⋅ Lk where k ≥ 0
Φ0 = {λ} Here's a case where you do get something for nothing. Through the magic of mathematical definition…
[See slides on Phrase Structure Grammar.]
⇒ : VG* → VG*
α ⇒ β iff α = α1 = α2
and β = α1 = α2
and u → v ∈ G.
Initial state is S, the start symbol. The computer non-deterministically chooses a rule from the grammar which matches the left-hand side and if it finds a match it replaces the occurrence of the left-hand side with the right-hand side…
S = α0 ⇒ α1 ⇒ α2 ⇒ … ⇒ α∞
Computation continues indefinitely until some case that may terminate it. If the current state of memory happens to be an element of a string over the terminal alphabet (no non-terminals, αn ∈ Σ*) then the computation halts.
A language defined by G — Let G = (N,Σ,P,S) be a psg. Then the language generated by G is the set: L(G) = { x ∈ Σ* | S G⇒ + x }; that is, x ∈ L(G) if and only if there is a computation in G satisfying: S = α0 G⇒ α1 … G⇒ αn = x, for some n ≥ 1. That's how Chomsky defined a language for a phrase structured grammar.
Type 0: P = { u → v | u ∈ V* N V* and v ∈ V* }
Type 1: P = { u → v | u ∈ V* N V* and v ∈ V* and |v| ≥ |u| } Restriction: no erasing! "Context sensitive."
N = { S, X, Y }
Σ = { a, b, c }
P = {
1: S → SXY
2: S → Y
3: X → aXb
4: Y → bbY
5: X → Y
6: Y → c
7: aX → Yb
8: bY → bc
You can replace Y by c but only when preceded by a b (Rule 8). Rules 1-6 are context-free rules. These rules mean that you can replace S regardless of what its context is. There are no constraints on S's context. Rules that have two or more symbols on the left-hand side are context-sensitive.
Type 2: P = { u → v | u ∈ N and v ∈ V* } "Context-free grammar." Erasing is allowed again, but the left part is restricted to be context-free.
Context-free grammars are the family of languages from which we take all programming languages. Programming languages can, then, be described by Type 2 "Context-Free" grammars.
Type 3: P = { u → v | u ∈ N and v ∈ Σ* ∪ Σ*N } "Right linear."
Linear CFG: u → v ∈ P iff u ∈ N and v ∈ Σ* N Σ* ∪ Σ* … It's either a string of terminals or a string that has exactly one non-terminal that can occur anywhere.
Recognizers/Acceptors
x ∈ Σ* → [ P ] → Halt: Accept! if x ∈ L(P). If x ¬sin; L(P) then Halt and Reject. R may never Halt!
Type 0 grammars correspond to Turing Machines. Type 1 correspond to linear bounded automata (memory is bounded by a multiple of the input size. Type 2 correspond to non-deterministic push-down automata (memory must take the form of a stack, unbounded in size but only access the top element). Type 3 correspond to regular, finite-state machines. Finite state machines have a fixed amount of memory for all inputs so there's only a limited amount of information that they can remember.
This course will be structured around showing these equivalencies and showing properties of the individual languages (closed under blah blah) and what can you prove (algorithms) to answer questions?
Example 2
G2 = { N, Σ, P, S }
N = { S, X }
Σ = { 0, 1 }
P = { 1: S → 1S
2: S → λ
3: S → 0X
4: X → 1X
5: X → 0S }
This is a context-free grammar.
S 2⇒ λ ∈ Σ* ∴ L(G).
L(G) = { λ …
S 1⇒ 1S 1⇒ 11S 3⇒ 110X 4,4⇒ 11011X 5⇒ 110110S 2⇒ 110110 ∈ Σ* ∴ 110110 ∈ L(G)
L(G) = { 1 }* "The set containing 1, star."
Lk
L0 = { λ }
L1 = L
Lk+1 = LkL = LLk
L* = ∞k=0∪ Lk = L0 ∪ L1 ∪ L2 ∪ …
L+ = ∞k=1∪ Lk
L(Gv) = {1}*({0}{1}*{0}{1}*)* = {1}* = …
Our program produces strings that have an even number of zeros as the output of a computation. We need to show that.
You should always try proof by induction first.
{1}* ( {3} {4}* {5} )* {2}
Every execution sequence that terminates must be of this form. Replace stars by variables and do induction on those variables and show what the properties of the centential form produce.
{1}n1 ( {3} {4}n2 {5} )n3 {2} ⇒ 1n1 (0 1n2 0)n3 S
You won't be asked to prove this on a quiz or an exam.
Example 3
N = { S }
Σ = { 0, 1 }
P = { 1: S → 0S1
2: S → λ
}
S 2⇒ λ
S 1⇒ 0S1 1⇒ 00S11 1⇒ … 1⇒ 0nS1n 2⇒ 0n1n
This cannot be generated by a Type 3 grammar.
A regular grammar can be Type 3, Left-Linear, DFSA, NFSA, or regular expressions.
L = { x ∈ {0,1}* | |x|0 = |x|1 } You'll never be able to come up with a finite-state machine that will recognize this language.
Example 5
1: S → SXYZ
2: S → λ
3: YX → XY
4: ZX → XZ
5: ZY → YZ
6: X → a
7: aX → aa
8: aY → ab
9: bY → bb
10: bZ → bc
11: cZ → cc
This looks like Type 1 but erasing is allowed. So it's a Type 0.
S ⇒ SXYZ ⇒ SXYZXYZ ⇒ XYZXYZ ⇒ XXYYZZ (with combination of 3, 4, 5) … You should discover how this happens.
Use 6 a couple times and get ⇒ aaYYZZ
8, 9 ⇒ aabbZZ
10, 11 ⇒ aabbcc
{ anbncn | n > 0 } ⊆ L(G)
Describe as many different kinds of strings as you can that are members of L(G).

Comments