SYNTAX ANALYSIS
OVERVIEW
The role of Syntax Analyzer
Context Free grammars
Derivations, parse tree and Ambiguity
Eliminating Ambiguity
Elimination of Left Recursion
Left factoring
Top-Down parsing
THE ROLE OF THE PARSER
Main task:
read the string of tokens produced by the lexical analyzer and verify that the string of token names can be generated by the grammar of the source language.
Position of parser in compiler model:
[Figure: the lexical analyzer reads the source program and returns a token each time the parser issues a get-next-token request; the parser builds a parse tree, which the rest of the front end turns into an intermediate representation. Both phases consult the symbol table.]
Grammars offer significant advantages to both language designers and compiler writers.
A grammar gives a precise syntactic specification of a programming language.
A properly designed grammar imparts a structure to a PL that is useful for translation of source programs into correct object code and for the detection of errors.
Languages evolve over a period of time, acquiring new constructs and performing additional tasks; a grammar makes it easier to add such new constructs.
For certain classes of grammars, we can automatically construct an efficient parser that determines if a source program is well formed.
Three general types of parsers for grammars
Universal parsing methods
Cocke-Younger-Kasami algorithm
Earley’s algorithm
Too inefficient to use in production compilers
Parsers commonly used in compilers are either top-down or bottom-up, and scan the input one symbol at a time.
Most efficient top-down and bottom-up methods work only on subclasses of grammars.
LL and LR grammars are expressive enough to describe most syntactic constructs in PLs
SYNTACTIC ERROR HANDLING
Common programming errors can occur at different levels:
Lexical: misspelling an identifier, keyword, or operator
Syntactic: an arithmetic expression with unbalanced parentheses
Semantic: an operator applied to an incompatible operand (type mismatch)
Logical: an infinitely recursive call
Much of error detection and recovery can be handled by parsing methods.
Detection of semantic and logical errors at compile time is a difficult task.
The error handler in a parser has simple-to-state goals:
Report the presence of errors clearly and accurately.
Recover from each error quickly enough to detect subsequent errors.
Add minimal overhead to the processing of correct programs.
LL and LR parsing methods detect an error as soon as possible.
ERROR RECOVERY STRATEGIES
Panic mode: discard input symbols until a token from a designated set of synchronizing tokens is found.
Phrase level: perform local correction (e.g., replacement of characters) on the remaining input.
Error productions: augment the grammar with productions that generate common erroneous constructs.
Global correction: algorithms that choose a minimal sequence of changes to obtain a globally least-cost correction.
CONTEXT FREE GRAMMARS
PL constructs have an inherently recursive structure that can be defined by CFGs
A CFG consists of terminals, non-terminals, a start symbol, and productions.
Terminals: id, (, ), *, /, +, -
Non-terminals: expression, term, factor
Start symbol: expression
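The components listed above can be sketched in Python. The productions below are an assumption (the standard expression grammar implied by these terminals and non-terminals), since the slide's production list was not preserved:

```python
# A CFG represented as a mapping from each non-terminal to its list of
# productions; each production is a list of grammar symbols.
# Assumed productions: expression -> expression + term | expression - term | term
#                      term -> term * factor | term / factor | factor
#                      factor -> ( expression ) | id
grammar = {
    "expression": [["expression", "+", "term"],
                   ["expression", "-", "term"],
                   ["term"]],
    "term": [["term", "*", "factor"],
             ["term", "/", "factor"],
             ["factor"]],
    "factor": [["(", "expression", ")"], ["id"]],
}

start_symbol = "expression"
nonterminals = set(grammar)
# Every symbol that appears in a body but has no productions is a terminal.
terminals = {sym for prods in grammar.values()
             for prod in prods for sym in prod} - nonterminals
```

Deriving the terminal set from the productions, rather than listing it separately, keeps the two from drifting apart.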
DERIVATIONS
A derivation corresponds to the construction of a parse tree: starting from the start symbol, a non-terminal is repeatedly replaced using one of its production rules.
Any string of grammar symbols is a sentential form if it is derivable from the start symbol.
A sentence in a grammar is a sentential form with
no non-terminals.
Left most derivation
the leftmost non-terminal is replaced in the sentential
form
Right most derivation:
the rightmost non-terminal is replaced in the sentential
form
PARSE TREE
A parse tree is a graphical representation of a derivation that filters out the order in which productions are applied to replace non-terminals.
Yield of a parse tree:
the leaves of the parse tree, read from left to right, constitute a sentential form.
AMBIGUITY
A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
An ambiguous grammar is one that produces more than one leftmost or rightmost derivation for the same sentence.
ELIMINATING AMBIGUITY
Consider the following grammar of expressions consisting of plus, minus, and digits:
E → E + E | E - E | 0 | 1 | … | 9
When an operand such as 5 has operators to its left and right, conventions are needed for deciding which operator applies to that operand.
ASSOCIATIVITY OF OPERATORS
In PLs, the operators +, -, *, / are left-associative.
An operand with plus signs on both sides of it
belongs to the operator to its left.
Exponentiation and assignment are right-
associative.
PRECEDENCE OF OPERATORS
Rules defining the relative precedence of operators are needed when more than one kind of operator is present.
*, / have higher precedence than +, -.
9+5*2 is interpreted as 9+(5*2) and not (9+5)*2.
Consider the following "dangling-else" grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
According to this grammar, the compound statement
if E1 then S1 else if E2 then S2 else S3
has a unique parse tree. The grammar is ambiguous, however, since the string
if E1 then if E2 then S1 else S2
has two parse trees.
The dangling-else grammar can be rewritten into an unambiguous grammar by matching each else with the closest unmatched then:
stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt
ELIMINATION OF LEFT RECURSION
A grammar is left-recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α.
Immediate left recursion is a production of the form A → Aα.
The left-recursive productions
A → Aα | β
can be replaced by the non-left-recursive productions
A → βA'
A' → αA' | ε
without changing the strings derivable from A.
In general, immediate left recursion among any number of A-productions
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
can be eliminated as
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
The non-left-recursive grammar for the expression grammar
E → E + T | T    T → T * F | F    F → (E) | id
is
E → TE'    E' → +TE' | ε
T → FT'    T' → *FT' | ε
F → (E) | id
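The transformation for immediate left recursion can be sketched in code. This is a minimal sketch; the `"eps"` marker and the primed-name convention are illustrative choices:

```python
def eliminate_immediate_left_recursion(nt, productions):
    """Split the nt-productions into left-recursive ones (A -> A alpha)
    and the rest (A -> beta), then rewrite as
    A -> beta A'   and   A' -> alpha A' | eps."""
    recursive = [p[1:] for p in productions if p[:1] == [nt]]  # the alphas
    others = [p for p in productions if p[:1] != [nt]]         # the betas
    if not recursive:
        return {nt: productions}          # nothing to eliminate
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [["eps"]],
    }

# E -> E + T | T   becomes   E -> T E'  and  E' -> + T E' | eps
g = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
```

Applying the same call to the T-productions yields T → FT', T' → *FT' | ε in the same way.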
ALGORITHM TO ELIMINATE LEFT-RECURSION
Arrange the non-terminals in some order A1, A2, …, An. For each i, replace every production of the form Ai → Ajγ with j < i by substituting in the bodies of the Aj-productions, then eliminate the immediate left recursion among the Ai-productions.
The resulting non-left-recursive grammar may have ϵ-productions.
Example: consider the following grammar
S → Aa | b
A → Ac | Sd | ε
The algorithm yields the following non-left-recursive grammar
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε
LEFT FACTORING
The left-factoring transformation is useful for producing a grammar suitable for predictive parsing.
If A → αβ1 | αβ2 are two A-productions whose bodies share a common prefix α, the left-factored productions for A become
A → αA'
A' → β1 | β2
ALGORITHM:
Example: Consider the grammar
Left factored grammar becomes
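A single left-factoring step can be sketched in code. The example grammar at the end, S → iEtS | iEtSeS | a, is the usual dangling-else abstraction and is an assumption here, since the slide's own example was not preserved:

```python
def left_factor_step(nt, productions):
    """One step of left factoring: pull out the longest prefix shared
    by at least two alternatives of nt:
    A -> a b1 | a b2   =>   A -> a A'  and  A' -> b1 | b2."""
    prefix = []
    for i in range(len(productions)):
        for j in range(i + 1, len(productions)):
            a, b = productions[i], productions[j]
            k = 0
            while k < min(len(a), len(b)) and a[k] == b[k]:
                k += 1
            if k > len(prefix):
                prefix = a[:k]            # longest common prefix so far
    if not prefix:
        return {nt: productions}          # already left-factored
    new_nt = nt + "'"
    factored, rest = [], []
    for p in productions:
        if p[:len(prefix)] == prefix:
            factored.append(p[len(prefix):] or ["eps"])  # empty tail -> eps
        else:
            rest.append(p)
    return {nt: [prefix + [new_nt]] + rest, new_nt: factored}

# Hypothetical example: S -> iEtS | iEtSeS | a
g = left_factor_step("S", [["i", "E", "t", "S"],
                           ["i", "E", "t", "S", "e", "S"],
                           ["a"]])
```

The result is S → iEtSS' | a and S' → ε | eS, matching the general rule above with α = iEtS.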
NON-CONTEXT FREE LANGUAGE CONSTRUCTS
L1 = { wcw | w is in (a|b)* } is not context free
Since L1 is not context free, checking that identifiers are declared before their use, as required in languages like C and Java (which allow identifiers of arbitrary length), cannot be enforced by a context-free grammar alone; such checks are deferred to semantic analysis.
L2 = { anbmcndm | n >= 1 and m >= 1 } is not context free
Here, L2 abstracts the problem of checking that the number of formal parameters in the declaration of a function agrees with the number of actual parameters in a use of the function.
TOP-DOWN PARSING
Finding leftmost derivation for an input string.
The key problem is to determine the production to be
applied for a non-terminal.
Backtracking may be involved in finding the correct production for a non-terminal.
For most PL constructs, backtracking can be avoided by applying suitable transformations to the grammar.
RECURSIVE DESCENT PARSING
A recursive-descent parser finds a leftmost derivation for the input string, constructing the parse tree from the root and creating its nodes in preorder.
Recursive-descent parsing may involve backtracking. Consider the grammar
S → cAd
A → ab | a
The derivation for the string w = cad involves backtracking.
[Figure: three snapshots of the parse tree for w = cad. The parser expands S → cAd, first tries A → ab; matching b against the input d fails, so it backtracks and succeeds with A → a.]
Recursive-descent parsing consists of a set of procedures, one for each non-terminal.
Issues:
May require backtracking, i.e., repeated scans over the input.
A left-recursive grammar can cause the parser to go into an infinite loop.
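The grammar S → cAd, A → ab | a from above can be handled by a backtracking recursive-descent parser. A minimal sketch, using one procedure per non-terminal and generators to enumerate the alternatives (so that a failed choice can be retried):

```python
def parse_S(s, pos):
    """S -> c A d: yield every input position reachable after matching S."""
    if pos < len(s) and s[pos] == "c":
        for after_A in parse_A(s, pos + 1):
            if after_A < len(s) and s[after_A] == "d":
                yield after_A + 1

def parse_A(s, pos):
    """A -> ab | a: yielding both possibilities, in order, is what
    lets the caller backtrack when the first choice fails."""
    if s[pos:pos + 2] == "ab":
        yield pos + 2
    if s[pos:pos + 1] == "a":
        yield pos + 1

def accepts(s):
    # The string is in the language iff some parse of S consumes it all.
    return any(end == len(s) for end in parse_S(s, 0))
```

For w = cad, the alternative A → ab fails (b does not match d), and the generator then supplies A → a, which succeeds, mirroring the backtracking step in the figure above.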
FIRST AND FOLLOW SETS
If α is any string of grammar symbols,
FIRST(α) is the set of terminals that begin the strings derived
from α.
For any non-terminal A
FOLLOW(A) is the set of terminals that can appear
immediately to the right of A in some sentential form.
FIRST()
To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ϵ can be added to any FIRST(X):
1. If X is a terminal, then FIRST(X) = {X}.
2. If X → ϵ is a production, add ϵ to FIRST(X).
3. If X → Y1Y2…Yk is a production, add to FIRST(X) every symbol of FIRST(Y1) other than ϵ; if ϵ is in FIRST(Y1), also add the non-ϵ symbols of FIRST(Y2), and so on; if ϵ is in all of FIRST(Y1), …, FIRST(Yk), add ϵ to FIRST(X).
FOLLOW()
To compute FOLLOW(A) for all non-terminals A, apply the following rules until nothing can be added to any FOLLOW set:
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input end marker.
2. If there is a production A → αBβ, add everything in FIRST(β) except ϵ to FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ϵ, add everything in FOLLOW(A) to FOLLOW(B).
EXAMPLE:
In the expression grammar
E → TE'    E' → +TE' | ϵ
T → FT'    T' → *FT' | ϵ
F → (E) | id
The FIRST sets are
FIRST(F) = FIRST(T)=FIRST(E)={(, id}
FIRST(E’) ={+, ϵ}
FIRST(T’) = {*, ϵ}
The FOLLOW sets are
FOLLOW(E) = FOLLOW(E’) = {), $}
FOLLOW(T) = FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {+, *, ), $}
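The rules above are naturally computed as a fixed point: keep applying them until no set grows. A sketch for the non-left-recursive expression grammar, whose results agree with the FIRST and FOLLOW sets listed above:

```python
EPS = "eps"
grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
NT = set(grammar)

def first_of(sym, first):
    # FIRST of one symbol: the symbol itself if terminal, its set otherwise.
    return first[sym] if sym in NT else {sym}

def compute_first():
    first = {A: set() for A in NT}
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for prod in prods:
                acc = set()
                for sym in prod:
                    if sym == EPS:
                        acc.add(EPS)
                        break
                    f = first_of(sym, first)
                    acc |= f - {EPS}
                    if EPS not in f:
                        break
                else:
                    acc.add(EPS)          # every symbol in the body is nullable
                if not acc <= first[A]:
                    first[A] |= acc
                    changed = True
    return first

def compute_follow(first):
    follow = {A: set() for A in NT}
    follow["E"].add("$")                  # rule 1: $ into FOLLOW(start)
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for prod in prods:
                for i, B in enumerate(prod):
                    if B not in NT:
                        continue
                    add, nullable = set(), True
                    for sym in prod[i + 1:]:
                        f = first_of(sym, first)
                        add |= f - {EPS}  # rule 2: FIRST of what follows B
                        if EPS not in f:
                            nullable = False
                            break
                    if nullable:
                        add |= follow[A]  # rule 3: B can end an A-derivation
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

Running both computations reproduces FIRST(E') = {+, ϵ}, FOLLOW(F) = {+, *, ), $}, and the rest of the sets shown above.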
PREDICTIVE PARSING
The goal of predictive parsing is to construct a top-down parser that never backtracks.
Obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking.
Given an input symbol a and a non-terminal A to be expanded, we must be able to choose the unique alternative among the A-productions that derives a string beginning with a.
To do so, we apply the following transformations to the grammar:
Eliminate left recursion
Perform left factoring
LL(1) GRAMMARS
Class of LL(1) grammars: Left-to-right scan, Leftmost derivation, 1 symbol of lookahead.
LL(1) grammars aid in automatic construction of predictive parser.
A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions in G, the following conditions hold:
FIRST(α) and FIRST(β) are disjoint sets.
If ϵ is in FIRST(β), then FIRST(α) and FOLLOW(A) are disjoint sets, and likewise if ϵ is in FIRST(α).
CONSTRUCTION OF A PREDICTIVE PARSING TABLE
For each production A → α of the grammar:
1. For each terminal a in FIRST(α), add A → α to M[A, a].
2. If ϵ is in FIRST(α), add A → α to M[A, b] for every terminal b in FOLLOW(A); if ϵ is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well.
Every undefined entry of M is an error entry.
EXAMPLE: PREDICTIVE PARSING TABLE FOR
EXPRESSION GRAMMAR
For every LL(1) grammar, each parsing-table entry
uniquely identifies a production or signals an error.
For some grammars, the table may have entries that are multiply defined.
If a grammar is left-recursive or ambiguous, then at least one entry of the table is multiply defined.
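Table construction can be sketched directly from the FIRST and FOLLOW sets of the expression-grammar example (hardcoded below from the sets given earlier), recording any multiply defined entry as a conflict:

```python
EPS, END = "eps", "$"
# FIRST and FOLLOW sets copied from the expression-grammar example above.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", END}, "E'": {")", END}, "T": {"+", ")", END},
          "T'": {"+", ")", END}, "F": {"+", "*", ")", END}}
grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(prod):
    # FIRST of a whole production body, per rule 3 of the FIRST computation.
    out = set()
    for sym in prod:
        f = {EPS} if sym == EPS else FIRST.get(sym, {sym})
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def build_table():
    table, conflicts = {}, []
    for A, prods in grammar.items():
        for prod in prods:
            f = first_of_string(prod)
            lookaheads = f - {EPS}
            if EPS in f:
                lookaheads |= FOLLOW[A]   # includes $ when $ is in FOLLOW(A)
            for a in lookaheads:
                if (A, a) in table:
                    conflicts.append((A, a))   # multiply defined entry
                table[(A, a)] = prod
    return table, conflicts

TABLE, CONFLICTS = build_table()
```

For this LL(1) grammar the conflict list comes back empty; running the same construction on a left-recursive or ambiguous grammar would populate it.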
Although left-recursion elimination and left-factoring
are easy to do, there are some grammars for which
no amount of alteration will produce an LL(1)
grammar.
EXAMPLE: PREDICTIVE PARSING TABLE FOR
DANGLING-ELSE PROBLEM
The predictive parsing table for the left-factored dangling-else grammar contains a multiply defined entry (for the else-part non-terminal on input else), reflecting the underlying ambiguity.
TABLE-DRIVEN PREDICTIVE PARSING
PREDICTIVE PARSING ALGORITHM
A table-driven predictive parser keeps a stack of grammar symbols, initially the start symbol on top of $. At each step it compares the top-of-stack symbol X with the current input symbol a: if X is a terminal, it must match a, and both are consumed; if X is a non-terminal, the parser replaces X by the body of the production in M[X, a], or reports an error if that entry is empty.
EXAMPLE:
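A sketch of the table-driven parsing loop. The table entries below are the usual ones for the non-left-recursive expression grammar and are an assumption, since the slide's table was not preserved ([] denotes the ϵ-production):

```python
END = "$"
# Assumed predictive parsing table for:
# E -> TE'   E' -> +TE' | eps   T -> FT'   T' -> *FT' | eps   F -> (E) | id
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", END): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", END): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    """Table-driven predictive parsing: pop X from the stack; a terminal
    must match the lookahead, a non-terminal is expanded via TABLE."""
    stack = [END, "E"]           # start symbol on top of $
    tokens = tokens + [END]
    i = 0
    while stack:
        X = stack.pop()
        a = tokens[i]
        if X not in NONTERMINALS:           # terminal or $
            if X != a:
                return False
            i += 1
        else:
            rhs = TABLE.get((X, a))
            if rhs is None:
                return False                # blank table entry: error
            stack.extend(reversed(rhs))     # push body, leftmost on top
    return i == len(tokens)
```

On input id + id * id the loop performs exactly the leftmost derivation E ⇒ TE' ⇒ FT'E' ⇒ id T'E' ⇒ …, one table lookup per expansion and no backtracking.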