Languages and Grammars


A language is a set of strings.

Example: The set of all valid C++ programs is a language. Each program is a string of symbols called tokens: identifiers, keywords, numeric constants, operator symbols, and various punctuation marks.

Problem: Given a string and a language, determine whether the string belongs to the language.

A mechanism to describe, or recognize, the strings that make up a language is a grammar.

A grammar consists of

1. A set of non-terminal symbols

2. A set of terminal symbols – strings in the language are formed by the terminal symbols.

3. A specified non-terminal symbol called the start symbol.

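4. A set of rules, called productions, each of which replaces a non-terminal symbol with a string of terminal and non-terminal symbols. Applied repeatedly, starting from the start symbol, they produce the strings in the language.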

A string of terminal symbols belongs to the language defined by the grammar if we can obtain it by beginning with the start symbol and repeatedly replacing a non-terminal symbol with the right-hand side of a production for that non-terminal.

This process is called a derivation.

Example: Consider the following grammar for arithmetic expressions involving additions and subtractions.

The non-terminal symbols are: <L>, <int>

The terminal symbols are: all integers, +, -

The start symbol is: <L>

The rules are

<L> → <int> + <L>

<L> → <int> - <L>

<L> → <int>

<int> → any integer

Example of a string: 4 - 2 + 5

<L> ⇒ <int> - <L> ⇒ 4 - <L> ⇒ 4 - <int> + <L>

⇒ 4 - 2 + <L> ⇒ 4 - 2 + <int> ⇒ 4 - 2 + 5
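The derivation steps above can be replayed mechanically. The following is a minimal C++ sketch, not part of the original slides: the names Form, applyProduction, toString and deriveExample are hypothetical, and the sketch assumes each step rewrites the leftmost occurrence of the named non-terminal.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// A sentential form: a sequence of terminal and non-terminal symbols.
using Form = std::vector<std::string>;

// Replace the leftmost occurrence of nonTerminal in form with rhs.
// Returns false if the non-terminal does not occur in the form.
bool applyProduction(Form& form, const std::string& nonTerminal, const Form& rhs) {
    for (std::size_t i = 0; i < form.size(); ++i) {
        if (form[i] == nonTerminal) {
            form.erase(form.begin() + i);
            form.insert(form.begin() + i, rhs.begin(), rhs.end());
            return true;
        }
    }
    return false;
}

// Join the symbols of a form with single spaces.
std::string toString(const Form& form) {
    std::string out;
    for (std::size_t i = 0; i < form.size(); ++i) {
        if (i > 0) out += ' ';
        out += form[i];
    }
    return out;
}

// Replays the derivation of 4 - 2 + 5, printing every intermediate form,
// and returns the final form (or an empty string if a step cannot be applied).
std::string deriveExample() {
    struct Step { std::string nonTerminal; Form rhs; };
    const std::vector<Step> steps = {
        {"<L>",   {"<int>", "-", "<L>"}},
        {"<int>", {"4"}},
        {"<L>",   {"<int>", "+", "<L>"}},
        {"<int>", {"2"}},
        {"<L>",   {"<int>"}},
        {"<int>", {"5"}},
    };
    Form form = {"<L>"};   // every derivation begins with the start symbol
    std::cout << toString(form) << '\n';
    for (const Step& s : steps) {
        if (!applyProduction(form, s.nonTerminal, s.rhs)) return "";
        std::cout << "=> " << toString(form) << '\n';
    }
    return toString(form);
}
```

Calling deriveExample() from main prints one line per sentential form and returns the string 4 - 2 + 5, which contains only terminal symbols.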

This derivation verifies that the original string, or arithmetic expression, belongs to the language, but it doesn’t tell the whole story. We need to look at the parse tree for this derivation.

The root of a parse tree is the start symbol. The children of any node that contains a non-terminal are the symbols (terminals and non-terminals) on the right-hand side of the production applied to that non-terminal.

          <L>
        /  |  \
   <int>   -   <L>
     |       /  |  \
     4  <int>   +   <L>
          |          |
          2        <int>
                     |
                     5

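Evaluating this parse tree from the bottom up, the subtree for 2 + 5 is computed first:

2 + 5 = 7

4 - 7 = -3

In other words, this grammar gives 4 - 2 + 5 = -3. What’s wrong with this? In “conventional” arithmetic, 4 - 2 + 5 = 7.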

The additions and subtractions in expressions recognized by this grammar are right associative; that is, they are performed from right to left. The right-most operation is performed first, then the right-most of the remaining operations, and so on.

In ordinary arithmetic if several additions and subtractions appear in an expression, they are performed from left to right. The operations are said to be left associative.
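The difference between the two evaluation orders can be checked with a short C++ sketch. It is an illustration only: the Expr struct and the functions apply, evalRight and evalLeft are hypothetical names, and the sketch assumes the expression is already split into at least one integer with a '+' or '-' between each neighbouring pair.

```cpp
#include <cstddef>
#include <vector>

// An expression int op int op ... op int.
// ops[i] sits between values[i] and values[i + 1].
struct Expr {
    std::vector<int> values;   // assumed non-empty
    std::vector<char> ops;     // each entry is '+' or '-'
};

int apply(int lhs, char op, int rhs) {
    return op == '+' ? lhs + rhs : lhs - rhs;
}

// Right associative: everything to the right of the first operator
// is evaluated before that operator is applied.
int evalRight(const Expr& e, std::size_t i = 0) {
    if (i + 1 == e.values.size()) return e.values[i];   // a single integer
    return apply(e.values[i], e.ops[i], evalRight(e, i + 1));
}

// Left associative: operators are applied one at a time from left to right.
int evalLeft(const Expr& e) {
    int result = e.values[0];
    for (std::size_t i = 0; i < e.ops.size(); ++i)
        result = apply(result, e.ops[i], e.values[i + 1]);
    return result;
}
```

For 4 - 2 + 5, stored as Expr{{4, 2, 5}, {'-', '+'}}, evalRight returns -3 and evalLeft returns 7.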

Example: Revise the productions in the previous example so the operations are left associative.

<L> → <L> + <int> | <L> - <int> | <int>

<int> → any integer

The symbol “|” represents “or”

              <L>
            /  |  \
        <L>    +   <int>
      /  |  \        |
  <L>    -  <int>    5
   |          |
 <int>        2
   |
   4

In this grammar the associativity is consistent with ordinary arithmetic.

4-2=2

2+5=7

Example: The grammar for the language L consists of

non-terminals: <S>, which is also the start symbol

terminals: a, b, c

Any string of a’s, b’s, and c’s is a candidate to belong to this language.

productions: <S> → a <S> b | c

It’s not too hard to see that L = { a^n c b^n : n >= 0 }, that is, n a’s, then a single c, then n b’s.
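Membership in this set can also be tested directly, without using the grammar, by counting. The sketch below is an illustration under that reading of L; the function name inLanguage is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Returns true if s consists of n a's, then one c, then n b's, for some n >= 0.
bool inLanguage(const std::string& s) {
    std::size_t i = 0;

    std::size_t countA = 0;                       // leading a's
    while (i < s.size() && s[i] == 'a') { ++i; ++countA; }

    if (i == s.size() || s[i] != 'c') return false;   // exactly one c must follow
    ++i;

    std::size_t countB = 0;                       // trailing b's
    while (i < s.size() && s[i] == 'b') { ++i; ++countB; }

    // Nothing may follow the b's, and the two counts must agree.
    return i == s.size() && countA == countB;
}
```

With this sketch, inLanguage("aacbb") and inLanguage("c") are true, while inLanguage("acbb") and inLanguage("ab") are false.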

Example: Does the string aacbb belong to the language L?

<S> ⇒ a <S> b ⇒ aa <S> bb ⇒ aacbb

       <S>
     /  |  \
    a  <S>  b
     /  |  \
    a  <S>  b
        |
        c

To write a program which will recognize strings in a particular language we rely heavily on the grammar for the language.

There is a function for each non-terminal symbol.

Consider the previous example with productions

<S> a <S> b | c

Write a function corresponding to the non-terminal symbol <S> which takes note of the current character in the specified string, and performs the actions indicated by the right hand side of the productions for <S>.

The function bool S ( ) is called in main via

    input a string
    ch = getNextChar()    // the first character in the string
    if (S())
        display: the string is recognized by the grammar
    else
        display: the string is not recognized by the grammar

    bool S() {
        // st is a stack holding the a's that still need a matching b.
        // EOS is taken to be true when ch is the last character of the string.
        if (ch == 'a') {                  // applies the production <S> → a <S> b
            st.push(ch);
            if (EOS) return false;        // an a cannot end the string
            ch = getNextChar();
            if (!S()) return false;
            if (ch == 'b') {
                if (st.empty()) return false;
                st.pop();
                if (st.empty() && EOS) return true;
                if (st.empty() xor EOS) return false;
                ch = getNextChar();
                return true;
            }
            return false;                 // expected a b after <S>
        }
        else if (ch == 'c') {             // applies the production <S> → c
            if (st.empty() && EOS) return true;
            if (st.empty() xor EOS) return false;
            ch = getNextChar();
            return true;
        }
        else                              // something bad happened
            return false;
    }   // the end of S
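For comparison, here is a self-contained C++ sketch of a recognizer for the same productions. It is one possible arrangement, not the version given above: the class Recognizer and the functions accepts and recognizes are hypothetical names, the input is assumed to be held in a std::string, and the explicit stack is left out because the chain of recursive calls already records how many a’s are waiting for a b.

```cpp
#include <cstddef>
#include <string>

// Recursive-descent recognizer for <S> -> a <S> b | c
class Recognizer {
public:
    explicit Recognizer(const std::string& input) : text(input), pos(0) {}

    // True if the whole input can be derived from <S>.
    bool accepts() {
        pos = 0;
        return S() && pos == text.size();   // no characters may be left over
    }

private:
    const std::string text;
    std::size_t pos;                        // index of the current character

    bool atEnd() const { return pos == text.size(); }

    // One function for the one non-terminal <S>.
    bool S() {
        if (atEnd()) return false;
        if (text[pos] == 'a') {             // <S> -> a <S> b
            ++pos;
            if (!S()) return false;
            if (atEnd() || text[pos] != 'b') return false;
            ++pos;
            return true;
        }
        if (text[pos] == 'c') {             // <S> -> c
            ++pos;
            return true;
        }
        return false;                       // no production applies
    }
};

// Convenience wrapper around the class.
bool recognizes(const std::string& s) {
    return Recognizer(s).accepts();
}
```

In this arrangement the end-of-string test is made once, in accepts, after S returns, so S itself only has to check that each a is eventually followed by a matching b. For example, recognizes("aacbb") is true and recognizes("acbb") is false.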

Example: tracing the calls for the input string aacbb

    In main:           ch = 'a'; S is called
    Call 1 to S:       ch == 'a', push; st: a; ch = 'a'; S is called
    Call 2 to S:       ch == 'a', push; st: a a; ch = 'c'; S is called
    Call 3 to S:       ch == 'c'; ch = 'b'; return true
    Back in call 2:    ch == 'b', pop; st: a; ch = 'b' (EOS); return true
    Back in call 1:    ch == 'b', pop; st: empty and EOS; return true
    In main:           S returned true, so the string is recognized by the grammar