241-303 Discrete Maths: Grammars/8 1
Discrete Maths
• Objectives
– to introduce grammars and show their importance for defining programming languages and writing compilers;
– to show the connection between REs and grammars
241-303, Semester 1 2014-2015
8. Grammars
Overview
1. Why Grammars?
2. Languages
3. Using a Grammar
4. Parse Trees
5. Ambiguous Grammars
6. Top-down and Bottom-up Parsing
7. Building Recursive Descent Parsers
8. Making the Translation Easy
9. Building a Parse Tree
10. Kinds of Grammars
11. From RE to a Grammar
12. Context-free Grammars vs. REs
1. Why Grammars?
• Grammars are the standard way of defining programming languages.
• Tools exist for translating grammars into parsers, a major part of a compiler (e.g. JavaCC, yacc, ANTLR; lex does the same job for REs)
– this saves weeks of work
2. Languages
• We use a natural language to communicate
– its grammar rules are very complex
– the rules don’t cover important things
• We use a formal language to define a programming language
– its grammar rules are fairly simple
– the rules cover almost everything
• A formal language is a set of legal strings.
• The strings are legal if they correctly use the language’s alphabet and grammar rules.
• The alphabet is often called the language’s terminal symbols (or terminals).
Example 1
• Alphabet (terminals) = {1, 2, 3}
• Using the grammar rules (not shown here; see later), the language is:
L1 = { 11, 12, 13, 21, 22, 23, 31, 32, 33 }
• L1 is the set of strings of length 2.
Example 2
• Terminals = {1, 2, 3}
• Using different grammar rules, the language is:
L2 = { 111, 222, 333}
• L2 is the set of strings of length 3, where all the terminals are the same.
Example 3
• Terminals = {1, 2, 3}
• Using different grammar rules, the language is:
L3 = {2, 12, 22, 32, 112, 122, 132, ...}
• L3 is the set of strings whose numerical value is divisible by 2.
3. Using a Grammar
• A grammar is a notation for defining a language, and is made from 4 parts:
– the terminal symbols
– the syntactic categories (nonterminal symbols)
• e.g. statement, expression, noun, verb
– the grammar rules (productions)
• e.g. A => B1 B2 ... Bn
– the starting nonterminal
• the top-most syntactic category for this grammar
• We define a grammar G as a 4-tuple:
G = (T, N, P, S)
– T = terminal symbols
– N = nonterminal symbols
– P = productions
– S = starting nonterminal
3.1. Example 1
• Consider the grammar:
T = {0, 1}
N = {S, R}
P = { S => 0
      S => 0 R
      R => 1 S }
S is the starting nonterminal
• The right-hand sides of productions usually use a mix of terminals and nonterminals.
Is “01010” in the language?
• Start with an S rule:

  Rule          String Generated
  --            S
  S => 0 R      0 R
  R => 1 S      0 1 S
  S => 0 R      0 1 0 R
  R => 1 S      0 1 0 1 S
  S => 0        0 1 0 1 0

• No more rules can be applied since there are no more nonterminals left in the string.
• Yes, it is in the language.
Example 2
• Consider the grammar:
T = {a, b, c, d, z}
N = {S, R, U, V}
P = { S => R U z | z
      R => a | b R
      U => d V U | c
      V => b | c }
S is the starting nonterminal
• The notation:
X => Y | Z
is shorthand for the two rules:
X => Y
X => Z
• Read ‘|’ as ‘or’.
Is “adbdbcz” in the language?
  Rule          String Generated
  --            S
  S => R U z    R U z
  R => a        a U z
  U => d V U    a d V U z
  V => b        a d b U z
  U => d V U    a d b d V U z
  V => b        a d b d b U z
  U => c        a d b d b c z

Yes!
• This grammar has choices about how to rewrite the string.
Is “abdbcz” in the language?
  Rule          String Generated
  --            S
  S => R U z    R U z
  R => a        a U z
  which U rule?
• U must be replaced by something beginning with a ‘b’, but the only U rule is:
U => d V U | c
No
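As a sketch (not from the original slides), the Example 2 grammar can be checked by a small recursive-descent recogniser in C, in the style of section 7: one function per nonterminal, choosing an alternative by looking at the next character. The names in_language(), expect(), in and failed are hypothetical, and the input is read from a string rather than with getchar() so that it is easy to test.

```c
/* Recogniser for:  S => R U z | z,   R => a | b R,
                    U => d V U | c,   V => b | c          */
static const char *in;  /* next unread input character */
static int failed;      /* set to 1 when the input cannot match */

static void expect(char c)          /* match one terminal */
{
    if (*in == c) in++;
    else failed = 1;
}

static void R(void)                 /* R => a | b R */
{
    if (*in == 'a') in++;
    else if (*in == 'b') { in++; R(); }
    else failed = 1;
}

static void V(void)                 /* V => b | c */
{
    if (*in == 'b' || *in == 'c') in++;
    else failed = 1;
}

static void U(void)                 /* U => d V U | c */
{
    if (*in == 'd') {
        in++;
        V();
        if (!failed) U();
    }
    else if (*in == 'c') in++;
    else failed = 1;                /* e.g. a 'b' here, as in "abdbcz" */
}

static void S(void)                 /* S => R U z | z */
{
    if (*in == 'z') { in++; return; }
    R();
    if (!failed) U();
    if (!failed) expect('z');
}

/* 1 if the whole of s is in the language, 0 otherwise */
int in_language(const char *s)
{
    in = s;
    failed = 0;
    S();
    return !failed && *in == '\0';
}
```

With this sketch, in_language("adbdbcz") returns 1 and in_language("abdbcz") returns 0, matching the two derivations above.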
3.2. BNF
• BNF is a shorthand notation for productions
– Backus Normal Form, or
– Backus-Naur Form
• We have already used ‘|’:
X => Y1 | Y2 | ... | Yn
• X => Y [Z]
is shorthand for two rules:
X => Y
X => Y Z
• [Z] means 0 or 1 occurrences of Z.
• X => Y { Z }
is shorthand for an infinite number of rules:
X => Y
X => Y Z
X => Y Z Z
X => Y Z Z Z
  :
• { Z } means 0 or more occurrences of Z.
3.3. A Grammar for Expressions
• Consider the grammar:
T = { 0, 1, 2, ..., 9, +, -, *, /, (, ) }
N = { Expr, Number }
P = { Expr => Number
      Expr => ( Expr )
      Expr => Expr + Expr | Expr - Expr | Expr * Expr | Expr / Expr }
Expr is the starting nonterminal
Defining Number
• The RE definition for a number is:
number = digit digit*
digit = [0-9]
• The productions for Number are:
Number => Digit { Digit }
Digit => 0 | 1 | 2 | 3 | … | 9
or
Number => Number Digit | Digit
Digit => 0 | 1 | 2 | 3 | ... | 9
Using Productions
• Expand Expr into (125-2)*3
Expr => Expr * Expr
     => ( Expr ) * Expr
     => ( Expr - Expr ) * Expr
     => ( Number - Number ) * Number
       :
     => ( 125 - 2 ) * 3
• Expand Number into 125
Number => Number Digit
       => Number Digit Digit
       => Digit Digit Digit
       => 1 2 5
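Following the expansion above, the value of a Number can be computed straight from the left-recursive rules: the innermost Number holds the leftmost digit, so each Number => Number Digit step multiplies by 10 and adds one digit. A sketch, where number_value() is a hypothetical name and s is assumed to contain only digits:

```c
#include <stddef.h>   /* size_t */

/* Value of the digit string s[0..len-1], following
     Number => Number Digit    value = 10 * value(Number) + value(Digit)
     Number => Digit           value = value(Digit)
   Assumes len >= 1 and every character is '0'..'9'. */
long number_value(const char *s, size_t len)
{
    long digit = s[len - 1] - '0';                  /* the rightmost Digit */
    if (len == 1)
        return digit;                               /* Number => Digit */
    return 10 * number_value(s, len - 1) + digit;   /* Number => Number Digit */
}
```

For "125" the calls unwind as ((1 * 10) + 2) * 10 + 5 = 125, the same grouping as the parse Number => Number Digit => Number Digit Digit.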
3.4. Grammars are not Unique
• Two grammars that do the same thing:
Balanced => ε
Balanced => ( Balanced ) Balanced
and:
Balanced => ε
Balanced => ( Balanced )
Balanced => Balanced Balanced
• Both generate the same strings, e.g.:
(()(())) () (()())
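The first Balanced grammar translates directly into a recursive function: if the next character is ( use ( Balanced ) Balanced, otherwise use ε. A sketch with hypothetical names (balanced(), is_balanced(), in, failed), reading from a string:

```c
/* Recogniser for the first grammar:  Balanced => ε | ( Balanced ) Balanced */
static const char *in;   /* next unread input character */
static int failed;       /* set to 1 when a ')' is missing */

static void balanced(void)
{
    if (*in != '(')
        return;              /* use Balanced => ε: read nothing */
    in++;                    /* match '(' */
    balanced();
    if (*in != ')') {
        failed = 1;
        return;
    }
    in++;                    /* match ')' */
    balanced();
}

/* 1 if s is a balanced string of brackets, 0 otherwise */
int is_balanced(const char *s)
{
    in = s;
    failed = 0;
    balanced();
    return !failed && *in == '\0';
}
```

The second grammar could not be coded this way: Balanced => Balanced Balanced is left recursive, so the function would call itself without reading any input (see section 8).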
3.5. Productions for parts of C
• Control structures:
Statement => while ( Cond ) Statement
Statement => if ( Cond ) Statement
Statement => if ( Cond ) Statement else Statement
• Testing (conditionals):
Cond => Expr < Expr | Expr > Expr | ...
• Statement blocks:
Statement => ‘{’ StatList ‘}’
StatList => Statement ; StatList | Statement ;
Using the Statement Production
Statement
  => while ( Cond ) Statement
  => while ( Expr < Expr ) Statement
  => while ( Expr < Expr ) { StatList }
  => while ( Expr < Expr ) { Statement ; Statement ; }
    :
  => while (x < 10) { y++; x++; }
• This example requires an extra Expr production for variables:
Expr => VariableName
3.6. Generating a Language
• For a given grammar, what strings can it generate?
– the language is the set of legal strings
• Most languages contain an infinite number of strings (e.g. English)
– but there is a process for generating them
• For each production, list the strings that can be derived immediately.
• On the 2nd round, put those strings back into the productions to generate more strings.
• On the 3rd round, put those strings back...
• Continue for as many rounds as you want.
Example
• Consider the grammar:
T = { w, c, s, ‘{’, ‘}’, ‘;’ }
N = { S, L }
P = { S => w c S | ‘{’ L ‘}’ | s ‘;’
      L => L S | ε }
S is the starting nonterminal
Strings in First 3 Rounds (new strings only)

           S                        L
Round 1:   s;                       ε
Round 2:   wcs;  {}                 s;
Round 3:   wcwcs;  wc{}  {s;}       wcs;  {}  s;s;  s;wcs;  s;{}
4. Parse Trees
• A parse tree is a graphical way of showing how productions are used to generate a string.
• Data structures representing parse trees are used inside compilers to store information about the program being compiled.
Example 1
• Consider the grammar:
T = { a, b }
N = { S }
P = { S => S S | a S b | a b | b a }
S is the starting nonterminal
Parse Tree for “aabbba”
• The root of the tree is the start symbol S.
• Expand S using S => S S.
• Expand the left S using S => a S b.
• Expand the new S in the middle using S => a b.
• Expand the right S using S => b a.
• The finished tree:

            S
         /     \
        S       S
      / | \    / \
     a  S  b  b   a
       / \
      a   b

• Stop when there are no more nonterminals in leaf positions.
• Read off the string by reading the leaves left to right.
Example 2
• Consider the grammar:
T = { a, +, *, (, ) }
N = { E, T, F }
P = { E => T | T + E
      T => F | F * T
      F => a | ( E ) }
E is the starting nonterminal
Is “a+a*a” in the Language?
• Start with the root E.
• Expand using E => T + E.
• Expand the left T using T => F.
• Continue expansion until:

          E
        / | \
       T  +  E
       |     |
       F     T
       |   / | \
       a  F  *  T
          |     |
          a     F
                |
                a

• Yes: the leaves read “a+a*a”.
5. Ambiguous Grammars
• A grammar is ambiguous when a string can be represented by more than one parse tree
– it means that the string has more than one “meaning” in the language
• e.g. a variant of the last grammar example:
P = { E => E + E | E * E | ( E ) | a }
Parse Trees for “a+a*a”

   Left tree:               Right tree:
        E                          E
      / | \                      / | \
     E  +  E                    E  *  E
     |   / | \                / | \   |
     a  E  *  E              E  +  E  a
        |     |              |     |
        a     a              a     a

• The two parse trees allow a string like “5+5*5” to be read in two different ways:
– 5 + 25 = 30 (the left-hand tree)
– 10 * 5 = 50 (the right-hand tree)
Why is Ambiguity Bad?
• In a programming language, a string with more than one meaning means that the compiler and run-time system will not know how to process it.
• e.g. in C:
x = 5 + 5 * 5; // what is the value in x?
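In C itself the value is 30, because C's grammar gives * a higher precedence than +. The unambiguous E/T/F grammar from section 4 (Example 2) does the same: * is handled in T, below +, so a*a is always grouped first. A minimal evaluator sketch, assuming single decimal digits in place of a and using hypothetical names (E(), T(), F(), eval()):

```c
/* Evaluator for  E => T | T + E,  T => F | F * T,  F => digit | ( E ) */
static const char *in;   /* next unread input character */
static int failed;       /* set to 1 on a syntax error */

static int E(void);

static int F(void)       /* F => digit | ( E ) */
{
    if (*in >= '0' && *in <= '9')
        return *in++ - '0';
    if (*in == '(') {
        in++;
        int v = E();
        if (*in == ')') in++; else failed = 1;
        return v;
    }
    failed = 1;
    return 0;
}

static int T(void)       /* T => F | F * T */
{
    int v = F();
    if (*in == '*') { in++; v *= T(); }
    return v;
}

static int E(void)       /* E => T | T + E */
{
    int v = T();
    if (*in == '+') { in++; v += E(); }
    return v;
}

/* Returns the value of s; *ok is 1 only if all of s parsed. */
int eval(const char *s, int *ok)
{
    in = s;
    failed = 0;
    int v = E();
    *ok = !failed && *in == '\0';
    return v;
}
```

Because T + E and F * T are right recursive, this sketch groups 2+3+4 as 2+(3+4). That is harmless for + and *, but the same shape would give wrong answers for - and /.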
6. Top-down and Bottom-up Parsing
• Top-down parsing creates a parse tree starting from the start symbol and moves down towards the leaves.
– used in most compilers
– usually implemented as recursive-descent parsing
• Bottom-up parsing creates a parse tree starting from the leaves, and moves up towards the start symbol.
– productions are used in ‘reverse’
• Both kinds of parsing often require “guessing” to decide which productions to use to parse a string.
Example
• Consider the grammar:
T = { a, +, *, (, ) }
N = { E, T, F }
P = { E => T | T + E
      T => F | F * T
      F => a | ( E ) }
E is the starting nonterminal
Top-down Parse of “a+a*a”
[Figure: the parse tree for “a+a*a” (as in section 4, Example 2), built from the root E down towards the leaves.]
Bottom-up Parse of “a+a*a”
[Figure: the same parse tree, built from the leaves up towards the root E, using the productions in reverse.]
Guessing when Building
• Guessing occurs when there are several rules which can apply to the current nonterminal.
• Compilers are very bad at guessing (a wrong guess means backtracking, which is slow), and so programming language designers try to make grammars as simple as possible.
Guessing in Bottom-up
• The compiler must backtrack to an earlier point and try a different rule.
[Figure: a bottom-up parse of “a+a*a” in which every a has been reduced through F and T to E, leaving E + E * E. No right-hand side matches this, so the parse is STUCK!]
7. Building Recursive Descent Parsers
• The parser will read a string as input, and test if it fits the grammar.
[Figure: input string (e.g. "a+a*a") → parser (checks input against the grammar) → output is: "yes" or "no"]
• In section 9 we will add the ability to generate a parse tree.
[Figure: input string (e.g. "a+a*a") → parser (checks input against the grammar) → output is: "no" or a tree]
• The parser will be coded in 2 steps:
– 1) Convert the grammar into syntax graphs
– 2) Convert the syntax graphs into code
[Figure: grammar → converted to → syntax graphs → converted to → parser]
• The pay-off is that a programmer writes high-level grammar rules instead of complex code.
7.1. What is a Syntax Graph?
• A syntax graph is a graphical representation of a grammar
– easier to manipulate than grammars
• For example:
P = { A => x | ( B )
      B => A C
      C => { + A } }
• Valid strings: x (x) (x+x+x)
• Remember that { R } means 0 or more R’s.
Graphs for A, B, and C
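[Figure: A is a choice point between the circle x and the path circle (, box B, circle ). B is the path box A, box C. C is a loop with a choice point: either leave, or go through the circle + and the box A and come back.]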
Meanings
• The input string is processed by following the graph for the top-most symbol in the grammar.
• A circle containing x means:
– if the current input character is an "x" then continue by reading the next input character, otherwise reject the input string.
• A box containing B means:
– that the current input should be processed by the B syntax graph. It's like 'calling' the B graph to do the work.
Choice Points Make Things Hard
• The graph must decide which path to take when execution reaches a choice point.
– "deciding" can be hard
[Figure: a choice point where one path starts with the circle ( and the other with the circle x.]
• The general solution is to look ahead in the graph:
– e.g. if the current input character is a "(" then go along the path that looks for a "(" next
• For lookahead to be fast, it should be possible to decide which path to take by looking at only the next graph symbol.
• This is possible if all the paths start with circles.
– e.g. at the choice point in A, the top path wants a "(" and the bottom path wants an "x".
7.2. From Grammar to Syntax Graph
• There are 6 translation rules to convert a grammar into a syntax graph.
241-303 Discrete Maths: Grammars/8 63
7.2.1. Translate a Production
• The production:A => Body
is mapped to a graph labelled A.
BodyA
The text inside thehexagon still needsto be translated toa graph.
241-303 Discrete Maths: Grammars/8 64
7.2.2. Translate a Terminal
• A terminal symbol x is translated to the graph:
[Figure: a circle containing x.]
• The translation is finished.
7.2.3. Translate a Nonterminal
• A nonterminal symbol B is translated to the graph:
[Figure: a box containing B.]
• The translation is finished.
7.2.4. Translate ‘|’
• A production body of the form:
Body1 | Body2 | ... | BodyN
is translated to the graph:
[Figure: parallel paths through the hexagons Body1, Body2, ..., BodyN, leaving from a single choice point.]
• The text inside the hexagons still needs to be translated to graphs.
7.2.5. Translate a Sequence
• A production body of the form:
Body1 Body2 ... BodyN
is translated to the graph:
[Figure: the hexagons Body1, Body2, ..., BodyN joined one after another in a single path.]
• The text inside the hexagons still needs to be translated to graphs.
7.2.6. Translate {...} (0 or more)
• A production body of the form:
{ Body1 }
is translated to the graph:
[Figure: a loop with a choice point: either skip Body1, or go through the hexagon Body1 and come back to the choice point.]
• The text inside the hexagon still needs to be translated to a graph.
7.3. Grammar to Graph Example
• Consider the grammar:
T = { x, +, (, ) }
N = { A, B, C }
P = { A => x | ( B )
      B => A C
      C => { + A } }
A is the starting nonterminal
Translate the A Rule
• A => x | ( B ) uses 7.2.1 to become:
[Figure: a graph labelled A containing the hexagon x | ( B ).]
• Use 7.2.4 to become:
[Figure: graph A with two parallel hexagons, x and ( B ).]
• Use 7.2.2 on the top branch:
[Figure: the top hexagon becomes the circle x.]
• Use 7.2.5 on the bottom branch:
[Figure: the bottom hexagon becomes the sequence of hexagons (, B, ).]
• Use rules 7.2.2 and 7.2.3 on the bottom branch:
[Figure: the bottom branch becomes the circle (, the box B and the circle ), giving the finished graph for A.]
Graphs for A, B, and C
[Figure: the finished graphs for A, B and C, as in section 7.1, with one choice point in A and one in C.]
Combining the Graphs
• Combine B and C graphs with A:
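[Figure: a single graph A. One path is the circle x; the other is the circle (, the box A, a loop that repeats the circle + and the box A, and the circle ). There are two choice points: x or ( at the start, and + or leave the loop.]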
Two Easy Choice Points
It is easy to decide which path to take at the two choice points.
Each path starts with a different terminal: x or ( at the first choice point, + or ) at the second.
We can lookahead to decide which path to take.
241-303 Discrete Maths: Grammars/8 76
7.4. From Syntax Graphs to Code
• Each syntax graph is tranformed into a function using 6 basic transformations.
• main() does two things:– reads the first input character:
ch = getchar(); // ch is a global variable
– calls the function representing the starting nonterminal: A();
241-303 Discrete Maths: Grammars/8 77
7.4.1. Transform a Graph
• A graph G containing GBody becomes the function:
void G()
{
    /* the code generated by transforming the graph GBody */
}
[Figure: a graph labelled G containing a pentagon GBody.]
• The graph inside the pentagon still needs to be translated to code.
7.4.2. Transform a Terminal
• A circle containing x becomes the code:
if (ch == 'x')
    ch = getchar();   // get ch for the next step
else
    error();          // reports the error, then exits
• i.e. check the input is x, then get the next input.
7.4.3. Transform a Nonterminal
• A box containing G1 becomes the function call:
G1();
7.4.4. Transform a Choice
• A choice point between the paths GBody1, GBody2, ..., GBodyN becomes a switch or multiple if statement:
if (ch == firstGBody1) {
    /* transformation of GBody1 */
}
else if (ch == firstGBody2) {
    /* transformation of GBody2 */
}
else if ...
  :
else if (ch == firstGBodyN) {
    /* transformation of GBodyN */
}
else
    error();
• The translation tests ch to see if it is the character firstGBody1, firstGBody2, etc.
– ch is the current input character
• firstGBody1, firstGBody2, etc. are the first terminals (circles) of the paths GBody1, GBody2, etc.
• These terminals must be distinct (different)
– then only one test will succeed
7.4.5. Transform a Sequence
• A sequence of paths GBody1 GBody2 ... GBodyN becomes the block:
{
    /* transformation of GBody1 */
    /* transformation of GBody2 */
      :
    /* transformation of GBodyN */
}
7.4.6. Transform a Multiple
• A loop through GBody1 (a choice point that either repeats GBody1 or leaves) becomes:
while (ch == firstGBody1) {
    /* transformation of GBody1 */
}
• firstGBody1 is the first terminal in GBody1.
Two Optimising Transformations
• There are two other transformations for a choice and a multiple.
• These are optimisations when the graph is a special shape.
241-303 Discrete Maths: Grammars/8 86
7.4.7. Optimising Choice
• Becomes a switch or multiple if statement.
::
GBody1
GBody2
GBodyN
continued
x1
x2
xNchoicepoint
241-303 Discrete Maths: Grammars/8 87
if (ch == ‘x1’) { ch = getchar(); // transformation of GBody1 ;}else if (ch == ‘x2’) { ch = getchar(); // transformation of GBody2 ;}else if ...
:else if (ch == ‘xN’) { ch = getchar(); // transformation of GBodyN ;}else error();
continued
241-303 Discrete Maths: Grammars/8 88
• Here the assumption is that the terminals x1, x2, etc. are all different
– this means that only 1 test will succeed
7.4.8. Optimising Multiple
• A loop whose path starts with the circle x, followed by GBody1, becomes:
while (ch == 'x') {
    ch = getchar();
    /* transformation of GBody1 */
}
Code Optimisations
• Sometimes the generated code can be simplified. For example:
ch = getchar();
foo();
while (ch == 'x') {
    ch = getchar();
    foo();
}
can be rewritten as:
do {
    ch = getchar();
    foo();
} while (ch == 'x');
error() Function
• A simple error reporting function:
#include <stdio.h>    // printf()
#include <stdlib.h>   // exit()

void error(void)
{
    printf("Error while processing %c\n", ch);   // ch is the global input character
    exit(1);
}
7.5. Graph to Code Example
• The original grammar in section 7.3:
T = { x, +, (, ) }
N = { A, B, C }
P = { A => x | ( B )
      B => A C
      C => { + A } }
A is the starting nonterminal
Graphs for A, B, and C (again)
[Figure: the syntax graphs for A, B and C from section 7.1.]
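Code
#include <stdio.h>    // getchar(), printf()
#include <stdlib.h>   // exit()

void A();   // parse functions
void B();
void C();
void error();

int ch;     // holds current input char (an int, so it can also hold EOF)

int main(void)
{
    ch = getchar();
    A();
    if (ch != '\n' && ch != EOF)   // reject trailing input, e.g. "x+x"
        error();
    printf("parsed successfully\n");
    return 0;
}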
void A()
{
    if (ch == 'x')
        ch = getchar();
    else if (ch == '(') {
        ch = getchar();
        B();
        if (ch == ')')
            ch = getchar();
        else
            error();
    }
    else
        error();
}
• This code has been optimised to reduce the number of calls to error().
void B()
{
    A();
    C();
}

void C()
{
    while (ch == '+') {
        ch = getchar();
        A();
    }
}
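The parser above reads from stdin and exits on the first error. For testing, a minimal sketch of the same parser can instead read from a string and answer "yes" (1) or "no" (0), as in the diagram at the start of section 7. Here src, next_char(), failed and parse() are hypothetical names, and error() records the failure instead of calling exit().

```c
/* Section 7.5's parser for  A => x | ( B ),  B => A C,  C => { + A },
   adapted to read from a string and answer yes (1) or no (0). */
static const char *src;  /* the rest of the input string */
static int ch;           /* current input character, '\0' at the end */
static int failed;       /* set by error() instead of calling exit() */

static void next_char(void)      /* plays the role of ch = getchar() */
{
    ch = *src;
    if (*src != '\0')
        src++;
}

static void error(void) { failed = 1; }

static void A(void);
static void B(void);
static void C(void);

static void A(void)              /* A => x | ( B ) */
{
    if (ch == 'x')
        next_char();
    else if (ch == '(') {
        next_char();
        B();
        if (ch == ')') next_char(); else error();
    }
    else
        error();
}

static void B(void)              /* B => A C */
{
    A();
    C();
}

static void C(void)              /* C => { + A } */
{
    while (ch == '+' && !failed) {
        next_char();
        A();
    }
}

/* 1 ("yes") if all of s is generated by A, 0 ("no") otherwise */
int parse(const char *s)
{
    src = s;
    failed = 0;
    next_char();                 /* like main(): read the first character */
    A();
    return !failed && ch == '\0';
}
```

The final check ch == '\0' matters: without it, "x+x" would be accepted after reading only the first x.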
8. Making the Translation Easy
• The translation (syntax graphs to code) requires the grammar to have special properties.
• When there is a choice about which path to take through a graph, the decision should depend only on the current character and the first terminals on the paths.
Examples
• Easy: execution is at a choice point whose paths start with the circles ( and x, and the current input is 'x'.
– The choice is easy to make.
• Hard: execution is at a choice point whose paths both start with the circle a, and the current input is 'a'.
– The choice isn't easy.
• It may be possible to “convert” a grammar into a suitable form by using techniques such as:
– left recursion elimination
– left factoring
8.1. Left Recursion Elimination
• Example of left recursion:
L => L a d | ε
• How many times should the L production be used to parse “adad”?
• Rearrange the grammar:
L => a d L | ε
• Such a rearrangement is not always possible.
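• BAD: with L => L a d | ε, execution is at the start of L and the current input is "a". The non-empty path starts with the box L, not a circle, so the lookahead cannot pick a path (and following that path calls L again without reading any input).
• GOOD: with L => a d L | ε, the current input "a" selects the path that starts with the circle a.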
Another Example
• Left recursive grammar:
Number => Number Digit | Digit
Digit => 0 | 1 | 2 | 3 | ... | 9
• How many times should the Number production be used to parse “123”?
• Rearrange to:
Number => Digit Number | Digit
Digit => 0 | 1 | 2 | 3 | ... | 9
• There’s still a problem: both Number alternatives begin with Digit (see 8.2).
8.2. Left factoring
• When 2 (or more) productions begin with the same terminal or nonterminal, then which production should be used?
• e.g. Which X rule should be used to parse “ae...”?
X => a d S
X => a e R
• BAD: execution is at the start of X and the current input is 'a'. Both X paths start with the circle a (one continues with d S, the other with e R), so the lookahead cannot choose between them.
• Left factoring creates a new production which represents the “tails” of the left factored rules.
• e.g. left factoring the X rules:
X => a XTail
XTail => d S | e R
• GOOD: X reads the circle a, then calls XTail. The two XTail paths start with different circles, d and e, so the next input character decides which one to take.
Another Example
• Which Number rule should be used to parse “123”?
Number => Digit Number | Digit
Digit => 0 | 1 | 2 | 3 | ... | 9
• Left factorise Number:
Number => Digit NumTail
NumTail => Number | ε
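After left factoring, one character of lookahead decides every choice: NumTail takes the Number path only when a digit comes next (a Number always starts with a Digit), and otherwise uses ε. A minimal sketch, with hypothetical names and string input:

```c
/* Recogniser for the left-factored rules
   Number => Digit NumTail,  NumTail => Number | ε,  Digit => 0 | ... | 9 */
static const char *in;   /* next unread input character */
static int failed;

static int isDigit(char c) { return c >= '0' && c <= '9'; }

static void Number(void);

static void NumTail(void)        /* NumTail => Number | ε */
{
    if (isDigit(*in))            /* a Number always starts with a Digit */
        Number();
    /* otherwise NumTail => ε: read nothing */
}

static void Number(void)         /* Number => Digit NumTail */
{
    if (!isDigit(*in)) {
        failed = 1;
        return;
    }
    in++;                        /* Digit */
    NumTail();
}

int is_number(const char *s)
{
    in = s;
    failed = 0;
    Number();
    return !failed && *in == '\0';
}
```

Following 7.4.8, the recursion through NumTail could equally be written as a while loop over the remaining digits.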
9. Building a Parse Tree
• Now we will augment the parser code of section 7 to generate a parse tree.
• The grammar again:
T = { x, +, (, ) }
N = { A, B, C }
P = { A => x | ( B )
      B => A C
      C => { + A } }
A is the starting nonterminal
9.1. Representing the Parse Trees
• The production:
A => x | ( B )
can create two possible parse trees:
– A with a single child x, or
– A with three children: (, the tree for B, and )
• The production:
B => A C
will create the parse tree:
– B with two children: the tree for A and the tree for C
• The production:
C => { + A }
can generate an infinite number of parse trees:
– C with a single child ε (zero repetitions), or
– C with the children +, tree for A, or
– C with the children +, tree for A, +, tree for A, or ...
A Parse Tree for “(x+x+x)”
          A
        / | \
       (  B  )
         / \
        A   C
        |  / | \  \
        x +  A  +  A
             |     |
             x     x

• Our code will read in a string and create a parse tree data structure like this one.
9.2. The Tree Data Structure
• The nodes in a parse tree can have different numbers of children.
• The C grammar rule can generate 1 child or any even number of children!
– 2, 4, 6, 8, ...
• struct node {
      char label;
      struct node *leftChild;
      struct node *rightSib;   // sibling
  };
  typedef struct node *TREE;
[Figure: Tree Node Data Structure: a cell holding label, with a leftChild pointer and a rightSib pointer.]
• This approach allows us to have a variable number of children: a node points only to its first child, and each child points to its next sibling.
9.3. Tree Building Functions
• A collection of building functions:
– a function to create a node with 0 children
– a function to create a node with 1 child
– a function to create a node with 2 children
– etc.
• The C production will require some fancy coding.
TREE makeLeaf(char x)
{
    TREE root = (TREE) malloc(sizeof(struct node));   // needs <stdlib.h>
    root->label = x;
    root->leftChild = NULL;
    root->rightSib = NULL;
    return root;
}
• makeLeaf() creates the node:
[Figure: a node labelled x whose leftChild and rightSib are both NULL (N).]
TREE makeNode1(char x, TREE t)
{   // the subtree t is supplied
    TREE root = makeLeaf(x);
    root->leftChild = t;
    return root;
}
• makeNode1() creates the tree:
[Figure: a node labelled x with rightSib NULL (N) and leftChild pointing to t; the pointers inside t are shown as ‘?’.]
• ‘?’ means that makeNode1() does not care what the value is.
TREE makeNode2(char x, TREE t1, TREE t2)
{   // subtrees t1 and t2 are supplied
    TREE root = makeNode1(x, t1);
    t1->rightSib = t2;
    return root;
}
• makeNode2() creates the tree:
[Figure: a node labelled x with rightSib NULL (N), whose leftChild points to t1; t1's rightSib points to t2.]
TREE makeNode3(char x, TREE t1, TREE t2, TREE t3)
{   // the subtrees are supplied
    TREE root = makeNode2(x, t1, t2);
    t2->rightSib = t3;
    return root;
}
• makeNode3() creates the tree:
[Figure: a node labelled x with rightSib NULL (N), whose leftChild points to t1; t1's rightSib points to t2, and t2's rightSib points to t3.]
• This approach can be used to create makeNode4(), and so on.
Dealing with the C production
• The C production can generate any even number of children:
C => { + A }
• A C tree will be constructed in three ways:
– a C node with 1 child
• use makeNode1('C', makeLeaf('e')), where 'e' stands for ε
– a C node with 2 children
• use makeNode2()
– a C node with 4, 6, 8, ... children
• use add2Children() repeatedly, after calling makeNode2() first
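TREE add2Children(TREE t, TREE t1, TREE t2)
{
    TREE rm = rightMostChild(t);
    rm->rightSib = t1;
    t1->rightSib = t2;
    return t;
}
• We will not define rightMostChild(), which returns rm, the rightmost child of t.
• add2Children() adds t1 and t2 to the end of t’s children (after rm):
[Figure: node t with its existing children ending at rm; rm's rightSib now points to t1, and t1's rightSib points to t2.]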
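The slides leave rightMostChild() undefined. One possible definition, assumed here rather than taken from the original, walks along the rightSib chain starting at t->leftChild. The sketch repeats struct node, makeLeaf(), makeNode1() and makeNode2() so that it compiles on its own, and adds a hypothetical numChildren() helper for checking the result.

```c
#include <stdlib.h>   /* malloc(), exit() */

struct node {
    char label;
    struct node *leftChild;
    struct node *rightSib;
};
typedef struct node *TREE;

TREE makeLeaf(char x)
{
    TREE root = malloc(sizeof(struct node));
    if (root == NULL)
        exit(1);                       /* out of memory */
    root->label = x;
    root->leftChild = NULL;
    root->rightSib = NULL;
    return root;
}

TREE makeNode1(char x, TREE t)
{
    TREE root = makeLeaf(x);
    root->leftChild = t;
    return root;
}

TREE makeNode2(char x, TREE t1, TREE t2)
{
    TREE root = makeNode1(x, t1);
    t1->rightSib = t2;
    return root;
}

/* Assumed definition: follow the sibling chain of t's children to
   the last one. t must have at least one child. */
TREE rightMostChild(TREE t)
{
    TREE rm = t->leftChild;
    while (rm->rightSib != NULL)
        rm = rm->rightSib;
    return rm;
}

TREE add2Children(TREE t, TREE t1, TREE t2)
{
    TREE rm = rightMostChild(t);
    rm->rightSib = t1;
    t1->rightSib = t2;   /* t2->rightSib is already NULL if t2 came from makeLeaf() */
    return t;
}

/* Hypothetical helper: count t's children by walking the rightSib chain */
int numChildren(TREE t)
{
    int n = 0;
    for (TREE c = t->leftChild; c != NULL; c = c->rightSib)
        n++;
    return n;
}
```

Each call to add2Children() walks the whole child list, so building a C node with 2k children this way costs time proportional to k²; keeping a pointer to the last child would avoid that, at the price of an extra variable in C().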
241-303 Discrete Maths: Grammars/8 123
9.4. Parse Trees as CodeA
xbecome:
A
( treefor B
)
A
N
x
NN
A
N
(
N
B )
NN
cells representing B tree
241-303 Discrete Maths: Grammars/8 124
• The tree B with the children tree for A and tree for C becomes:
[Figure: B cell (rightSib N) → leftChild → the cells for the A tree → rightSib → the cells for the C tree (rightSib N).]
• The tree C with the children +, tree for A, +, tree for A becomes:
[Figure: C cell (rightSib N) → leftChild → + cell → rightSib → cells for the A tree → rightSib → + cell → rightSib → cells for the A tree (rightSib N).]
9.5. Code with Parse Tree Generation
#include ...

struct node { ... };
typedef struct node *TREE;

TREE A();   // parse functions
TREE B();
TREE C();
void error();
TREE makeLeaf(char x);
  :         // other TREE building prototypes

int ch;     // holds current input char (an int, so it can also hold EOF)
int main(void)
{
    ch = getchar();
    TREE parseTree = A();
      :
    // use parseTree, print it, etc.
    return 0;
}
TREE A()
{
    if (ch == 'x') {
        ch = getchar();
        return makeNode1('A', makeLeaf('x'));
    }
    else if (ch == '(') {
        ch = getchar();
        TREE BTree = B();
        if (ch == ')') {
            ch = getchar();
            return makeNode3('A', makeLeaf('('), BTree, makeLeaf(')'));
        }
        else
            error();
    }
    else
        error();
    return NULL;   // not reached: error() exits
}
TREE B()
{
    TREE ATree = A();   // A() must run before C(): the input is read left to right
    TREE CTree = C();
    return makeNode2('B', ATree, CTree);
}
TREE C()
{
    TREE ATree, CTree;
    int numLoops = 0;   // times round the loop
    while (ch == '+') {
        numLoops++;
        ch = getchar();
        ATree = A();
        if (numLoops == 1)   // 1st time through loop
            CTree = makeNode2('C', makeLeaf('+'), ATree);
        else                 // 2nd, 3rd, etc. time
            CTree = add2Children(CTree, makeLeaf('+'), ATree);
    }
    if (numLoops == 0)       // skipped the loop
        CTree = makeNode1('C', makeLeaf('e'));
    return CTree;
}
10. Kinds of Grammars
• There are 4 main kinds of grammar, of increasing expressive power:
– regular (type 3) grammars
– context-free (type 2) grammars
– context-sensitive (type 1) grammars
– unrestricted (type 0) grammars
• They vary in the kinds of productions they allow.
10.1. Regular Grammars
• Every production is of the form:
A => a | a B | ε
– A, B are nonterminals, a is a terminal
• These are sometimes called right linear rules because if a nonterminal appears in the rule body, then it must appear last.
• Regular grammars are equivalent to REs (and also to automata).
• e.g. S => w T, T => x T, T => a
An Equivalence Diagram
[Figure: Regular Grammars, REs and Automata, linked to show that they have the same expressive power.]
Example
• Integer => + UInt | - UInt | 0 Digits | 1 Digits | ... | 9 Digits
UInt => 0 Digits | 1 Digits | ... | 9 Digits
Digits => 0 Digits | 1 Digits | ... | 9 Digits | ε
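Because every production is right linear, each nonterminal can be treated as an automaton state, and the terminal at the front of the chosen production selects the next state. This sketch makes that equivalence concrete; the names are hypothetical, and a REJECT state is added for characters that no production allows. It accepts in the Digits state, the only nonterminal with an ε production.

```c
/* Each nonterminal of the Integer grammar becomes a state. */
enum state { S_INTEGER, S_UINT, S_DIGITS, S_REJECT };

static int isDigit(char c) { return c >= '0' && c <= '9'; }

/* One step: the production used for the current nonterminal
   decides the next state. */
static enum state step(enum state st, char c)
{
    switch (st) {
    case S_INTEGER:   /* Integer => + UInt | - UInt | 0 Digits | ... | 9 Digits */
        if (c == '+' || c == '-')
            return S_UINT;
        return isDigit(c) ? S_DIGITS : S_REJECT;
    case S_UINT:      /* UInt => 0 Digits | ... | 9 Digits */
        return isDigit(c) ? S_DIGITS : S_REJECT;
    case S_DIGITS:    /* Digits => 0 Digits | ... | 9 Digits | ε */
        return isDigit(c) ? S_DIGITS : S_REJECT;
    default:
        return S_REJECT;
    }
}

int is_integer(const char *s)
{
    enum state st = S_INTEGER;
    for (; *s != '\0'; s++)
        st = step(st, *s);
    /* only Digits has an ε production, so it is the only accepting state */
    return st == S_DIGITS;
}
```

No recursion or stack is needed: a fixed number of states is enough, which is exactly what separates regular grammars from the context-free examples in section 12.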
10.2. Context-Free Grammars
• Every production is of the form:
A => α
– A is a nonterminal; α can be any number of nonterminals or terminals
• Most of our examples have been context-free grammars
– used widely to define programming languages
– they subsume regular grammars
• e.g. A => a, A => aBcd, B => ae
10.3. Context-Sensitive Grammars
• Every production is of the form:
α => β
– α, β can contain any number of terminals and nonterminals
– α must contain at least 1 nonterminal
– size(β) >= size(α)
– β cannot be ε
• e.g. A => a11, A => aB2d, B2 => ae
• Context-sensitive rules allow the grammar to specify a context for a rewrite
– e.g. A1a0 => 1b00
– the string 2A1a01 becomes 21b001
• Context-sensitive grammars are more powerful than context-free grammars because of this context ability.
Example
• The language:
E = { 012, 001122, 000111222, ... }
or, in brief, E = { 0ⁿ 1ⁿ 2ⁿ | n >= 1 }
cannot be expressed by a context-free grammar; it needs a context-sensitive grammar such as:
S => 0 A 1 2 | 0 1 2
A => 0 A 1 C | 0 1 C
C 1 => 1 C
C 2 => 2 2
Rewrite S to 001122
• S => 0 A 1 2                   (using S => 0 A 1 2)
  0 A 1 2 => 0 0 1 C 1 2         (using A => 0 1 C)
  0 0 1 C 1 2 => 0 0 1 1 C 2     (using C 1 => 1 C)
  0 0 1 1 C 2 => 0 0 1 1 2 2     (using C 2 => 2 2)
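A small string-rewriting sketch can replay derivations like the one above for any n >= 1. The helpers rewrite() and derive() are hypothetical: rewrite() replaces the leftmost occurrence of a production's left-hand side, and derive() applies the productions of the context-sensitive grammar in one fixed order (our choice; the grammar itself allows others). The context rules are visible in the code: a C can only change when a 1 or a 2 is right after it.

```c
#include <string.h>   /* strstr(), strlen(), memmove(), memcpy(), strcpy() */

/* Replace the leftmost occurrence of lhs in s with rhs.
   Returns 1 on success, 0 if lhs is absent or the result needs more than cap bytes. */
static int rewrite(char *s, size_t cap, const char *lhs, const char *rhs)
{
    char *p = strstr(s, lhs);
    size_t ll = strlen(lhs), rl = strlen(rhs);
    if (p == NULL || strlen(s) - ll + rl + 1 > cap)
        return 0;
    memmove(p + rl, p + ll, strlen(p + ll) + 1);   /* shift the tail, incl. '\0' */
    memcpy(p, rhs, rl);
    return 1;
}

/* Derive 0^n 1^n 2^n into out, starting from S. Returns 1 on success. */
int derive(int n, char *out, size_t cap)
{
    if (n < 1 || cap < 2)
        return 0;
    strcpy(out, "S");
    if (n == 1)
        return rewrite(out, cap, "S", "012");       /* S => 0 1 2 */
    if (!rewrite(out, cap, "S", "0A12"))            /* S => 0 A 1 2 */
        return 0;
    for (int i = 2; i < n; i++)
        if (!rewrite(out, cap, "A", "0A1C"))        /* A => 0 A 1 C */
            return 0;
    if (!rewrite(out, cap, "A", "01C"))             /* A => 0 1 C */
        return 0;
    /* move each C right past the 1s (C 1 => 1 C), then turn it into a 2 (C 2 => 2 2) */
    while (rewrite(out, cap, "C1", "1C") || rewrite(out, cap, "C2", "22"))
        ;
    return 1;
}
```

For n = 2 this applies exactly the four steps shown above and leaves "001122" in out.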
10.4. Unrestricted Grammars
• Every production is of the form:
α => β
– α, β can contain any number of terminals and nonterminals; α must contain at least 1 nonterminal
– no restrictions on size(β)
• it may be smaller than size(α)
– β can be ε
• Also called phrase-structure grammars.
• Unrestricted grammars are more general than context-sensitive grammars.
A => 11A => aB2 => aeA
Example
• The language:
E = { ε, 012, 001122, 000111222, ... }
or, in brief, E = { 0ⁿ 1ⁿ 2ⁿ | n >= 0 }
cannot be written as a context-sensitive grammar as defined above (producing ε needs a production whose right-hand side is ε), but it can be expressed using an unrestricted grammar:
S => 0 A 1 2 | ε
A => 0 A 1 C | ε
C 1 => 1 C
C 2 => 2 2
• The new features are the ε productions for S and A.
Rewrite S to 012
• S => 0 A 1 2
• 0 A 1 2 => 0 1 2
– using A => ε
10.5. Why so many Grammar Kinds?
• More powerful grammars are more expressive, but also harder to implement efficiently
– a trade-off between power and implementation
• For example, most compilers have two grammar-based components:
– the lexical analyzer
• uses REs (regular grammars) to parse basic nonterminals such as identifier and number
– the syntax analyzer
• uses (context-free) grammars to deal with complex syntactic categories such as loops and expressions
Lexical and Syntax Analyzers
[Figure: the compiler. The program text file is read as chars ('i' 'n' 't' ' ' 'x' '=' '4' '3' ';' ...) by the lexical analyzer, which passes tokens (int x = 43 ; ...) to the syntax analyzer, which builds a parse tree for code generation.]
11. From REs to Grammars
• It is easy to translate an RE into a context-free grammar.
– each RE operand and operator can be implemented by a grammar rule
• In fact, the power of context-free grammars is not needed, since REs are equivalent to regular grammars
– we translate to context-free because it is simple to do
Operand to Production
• Assume that R is the regular expression, and G is the new production.
• Each operand translates to a production:

  Operand      Production
  R = x        G => x
  R = ε        G => ε
  R = {}       nothing
Operator to Production
• Assume that S and T are REs; Gs and Gt are their translation to productions.
• Each operator translates to a production:

  Operator      Production
  R = S | T     G => Gs | Gt
  R = S T       G => Gs Gt
  R = S*        G => Gs G | ε   or   G => { Gs }
Example: translate a | bc*
• The RE with brackets:
a | ( b ( c* ) )
• Translate the operands:
A => a
B => b
C => c
– the nonterminals A, B, C are invented
• Translate the operators in precedence order.
• Translate c*:
CStar => C CStar | ε
• Translate b c*:
BCStar => B CStar
• Translate a | b c*:
S => A | BCStar
• The CStar, BCStar, and S nonterminals are invented.
• The complete grammar:
T = { a, b, c }
N = { S, BCStar, CStar, A, B, C }
P = { S => A | BCStar
      BCStar => B CStar
      CStar => C CStar | ε
      A => a
      B => b
      C => c }
S is the starting nonterminal
• These rules can be simplified.
Rules Simplification
• Substitute in the right-hand sides for the A, B, and C rules:
P = { S => a | BCStar
      BCStar => b CStar
      CStar => c CStar | ε }
• Substitute in the right-hand side for BCStar:
P = { S => a | b CStar
      CStar => c CStar | ε }
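The simplified grammar is small enough to recognise directly. In this sketch (function names hypothetical, input from a string) the ε alternative of CStar becomes the exit test of a while loop, as in the 'multiple' transformation of section 7.4.

```c
/* Recogniser for  S => a | b CStar,  CStar => c CStar | ε   (the RE a | bc*) */
static const char *in;   /* next unread input character */

static void CStar(void)
{
    while (*in == 'c')   /* each pass uses CStar => c CStar; stopping uses ε */
        in++;
}

static int S(void)       /* returns 0 if neither S alternative can start */
{
    if (*in == 'a') { in++; return 1; }
    if (*in == 'b') { in++; CStar(); return 1; }
    return 0;
}

/* 1 if all of s matches a | bc*, 0 otherwise */
int matches(const char *s)
{
    in = s;
    return S() && *in == '\0';
}
```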
12. Context-free Grammars vs. REs
• REs (and automata) are equivalent to regular grammars
– they can be used for all the same problems
• Every production in a regular grammar is right linear:
A => a | a B | ε
– A, B are nonterminals, a is a terminal
• This means that a regular grammar (and also REs, automata) cannot be used to express many context-free languages, or the languages that need context-sensitive or unrestricted grammars.
• REs are less powerful than context-free grammars.
Example
• Context-free grammar:
S => 0 1 | 0 S 1
– it defines the language E = { 0ⁿ 1ⁿ | n >= 1 }
• The S production is not right linear (S is not at the end of 0 S 1), so this is not a regular grammar; in fact no RE can model the language E, as the proof below shows.
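A recursive-descent recogniser for E needs the grammar left factored first (section 8.2): S => 0 STail, STail => 1 | S 1. In this sketch (hypothetical names, string input), each pending call of STail() waits on the C call stack for its matching 1, so the stack depth grows with n. That unbounded memory is exactly what a finite automaton lacks.

```c
/* Recogniser for the left-factored form of  S => 0 1 | 0 S 1:
   S => 0 STail,   STail => 1 | S 1                          */
static const char *in;   /* next unread input character */
static int failed;

static void S(void);

static void STail(void)          /* STail => 1 | S 1 */
{
    if (*in == '1')
        in++;
    else if (*in == '0') {       /* S always starts with 0 */
        S();
        if (*in == '1') in++; else failed = 1;
    }
    else
        failed = 1;
}

static void S(void)              /* S => 0 STail */
{
    if (*in == '0') {
        in++;
        STail();
    }
    else
        failed = 1;
}

/* 1 if s is 0^n 1^n for some n >= 1, 0 otherwise */
int in_E(const char *s)
{
    in = s;
    failed = 0;
    S();
    return !failed && *in == '\0';
}
```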
12.1. Proof Using Automata
• Proving that REs are weaker than context-free grammars is easiest if we prove that automata are weaker than context-free grammars
– remember that REs are equivalent to automata
• Assume an automaton with 2*k states.
• How could it be used to represent E = { 0ⁿ 1ⁿ | n >= 1 }?
• We will consider the case when n >> k.
First Automaton
• Problem: not enough states since n >> k
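[Figure: from the start state, states 1, 2, 3, ..., k are linked by 0 transitions; states k+1, ..., 2k-2, 2k-1, 2k are then linked by 1 transitions. The 0 chain and the 1 chain must be of equal length.]
• It uses all of its allowed 2*k states.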
Second Try
• Add loops to reuse states.
[Figure: the same automaton with a 0 loop added on the chain of 0 transitions and a 1 loop added on the chain of 1 transitions, so that states can be reused. The two chains must still match in length.]
• Question: how many 0’s were matched between state 1 and k?
– Answer: it could be any number
• Question: how can the number of matched 0’s be used to fix the number of matched 1’s?
– Answer: it cannot when n can be any number
• So, no automaton can model the language E = { 0ⁿ 1ⁿ | n >= 1 }
– so there is no RE for E
– but E can be written as a context-free grammar
• This shows that REs are weaker than context-free grammars.