57
1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020 Programming Paradigms Lecture 2: Syntax (Part 1) Join the course on Ilias! See link on http://software-lab.org/teaching/winter2019/pp/

Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

1

Prof. Dr. Michael Pradel

Software Lab, University of StuttgartWinter 2019/2020

Programming ParadigmsLecture 2: Syntax (Part 1)Join the course on Ilias! See link onhttp://software-lab.org/teaching/winter2019/pp/

Page 2: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

2 - 1

Wake-up Exercise

What does the following C code print?

#include <stdio.h>

int main() {int arr[3] = {5, 42, 23};int n = 2[arr];printf("%d", n);

}

https://ilias3.uni-stuttgart.de/vote/0ZT9

Page 3: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

2 - 2

Wake-up Exercise

What does the following C code print?

#include <stdio.h>

int main() {int arr[3] = {5, 42, 23};int n = 2[arr];printf("%d", n);

}

Result: 23

https://ilias3.uni-stuttgart.de/vote/0ZT9

Page 4: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

2 - 3

Wake-up Exercise

What does the following C code print?

#include <stdio.h>

int main() {int arr[3] = {5, 42, 23};int n = 2[arr];printf("%d", n);

}

Result: 23

Same as arr[2]:Computes a pointer *(arr + 2) andreturns value stored at the address

https://ilias3.uni-stuttgart.de/vote/0ZT9

Page 5: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

3

Motivation

� Goal: Specify a programminglanguage

� What code is part of the language?

� What is the meaning of a piece of code?

� Important for both developers andtools

� In contrast: Natural languages notformally specified

Page 6: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

4 - 1

Syntax vs. Semantics

Structure ofcode

Meaning ofcode

Example:

Grammar to define a language:

digit → 0 | 1 | ... | 9non zero digit → 1 | ... | 9number → non zero digit digit*

Could mean

� Natural numbers

� Days of a 10-day

week

� Colors

� ...

Page 7: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

4 - 2

Syntax vs. Semantics

Structure ofcode

Meaning ofcode

Example:

Grammar to define a language:

digit → 0 | 1 | ... | 9non zero digit → 1 | ... | 9number → non zero digit digit*

Could mean

� Natural numbers

� Days of a 10-day

week

� Colors

� ...Focus of this and next lecture

Page 8: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

5 - 1

Syntax of Different PLs

Common: Different syntax, same semantics

if [ $foo -gt 100 ]then...

fi

if (foo > 100) {...

}

Java: Bash:

Page 9: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

5 - 2

Syntax of Different PLs

Common: Different syntax, same semantics

Sometimes: Same syntax, different semantics

if [ $foo -gt 100 ]then...

fi

if (foo > 100) {...

}

Java: Bash:

if ("abc" != 5) {...

}

Java: JavaScript:

Branch isexecuted

Type error atcompile time

Page 10: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

1

Page 11: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

7

Plan for Today

� Specifying syntax

� Regular expressions

� Context-free grammars

� Scanning

� Parsing

Page 12: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

8

Tokens

Basic building blocks of every PL

� Keywords, identifiers, constants, operators

� Think: ”Words” of the language

Example: C has more than 100 tokens

� Keywords, e.g., double, if, return, struct

� Identifiers, e.g., my var, printf

� Literals, e.g., 6.022e23, ’x’

� Punctuators, e.g., (, }, &&

Page 13: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

9 - 1

Regular Expressions

� Used to specify tokens� A regular expression is one of:

� A character

� The empty string ε

� The concatentation of two regular expressions

� Two regular expressions separated by |

• Means a string generated by one or the other

� A reg. expression followed by the Kleene star *

• Means zero or more repetitions

Page 14: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

9 - 2

Regular Expressions

� Used to specify tokens� A regular expression is one of:

� A character

� The empty string ε

� The concatentation of two regular expressions

� Two regular expressions separated by |

• Means a string generated by one or the other

� A reg. expression followed by the Kleene star *

• Means zero or more repetitions

No recursion!No recursion!

Page 15: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

2

Page 16: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

11

Quiz

https://ilias3.uni-stuttgart.de/vote/0ZT9

Which of the following strings isaccepted by the regular expressionnumber?

Page 17: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

3

Page 18: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

13 - 1

Identifiers in Popular PLs

Different PLs allow different identifiers

� Case-sensitive vs. case-insensitive� E.g., foo, Foo, and FOO are the same in Ada and

Common Lisp, but not in Perl and C

� Letters and digits: Almost always allowed

� Underscore: Allowed in most languages

In addition to syntax rules: Conventions

� E.g., Java: ClassName, variableName

Page 19: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

13 - 2

Identifiers in Popular PLs

Different PLs allow different identifiers

� Case-sensitive vs. case-insensitive� E.g., foo, Foo, and FOO are the same in Ada and

Common Lisp, but not in Perl and C

� Letters and digits: Almost always allowed

� Underscore: Allowed in most languages

In addition to syntax rules: Conventions

� E.g., Java: ClassName, variableName

Know the rules of the language you use!

Page 20: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

14

White space in Popular PLs

Free format vs. formatting as syntax

� Spaces and tabs sometimes matter

� E.g., in Python

� Line breaks sometimes matter

� E.g., to separate statements in JavaScript or Python

Page 21: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

15

Demo

[demos/whitespace: show both stmts onone line; insert semi-colon; show notindenting print (after invertingcondition)]

Page 22: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

4

Page 23: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

17 - 1

Context-free Grammars

≈ Regular expressions + Recursion

Example: Arithmetic expressionsexpr→ id | number | expr op expr | ( expr )op→ + | - | * | /

Page 24: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

17 - 2

Context-free Grammars

≈ Regular expressions + Recursion

Example: Arithmetic expressionsexpr→ id | number | expr op expr | ( expr )op→ + | - | * | /

Non-terminals

Page 25: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

17 - 3

Context-free Grammars

≈ Regular expressions + Recursion

Example: Arithmetic expressionsexpr→ id | number | expr op expr | ( expr )op→ + | - | * | /

Non-terminals Terminals =tokens of the PL

Page 26: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

17 - 4

Context-free Grammars

≈ Regular expressions + Recursion

Example: Arithmetic expressionsexpr→ id | number | expr op expr | ( expr )op→ + | - | * | /

Recursion allows fornesting expressions

Page 27: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

5

Page 28: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

6

Page 29: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

20

Derivations

Create concrete strings from thegrammar

� Begin with start symbol

� Repeat until no non-terminals remain:

� Choose production with a remaining non-terminal on

the left-hand side

� Replace it with right-hand side of the production(choose one option if multiple options)

Page 30: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

7

Page 31: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

22

Parse Trees

Tree-structured representation of aderivation

� Root = Start symbol

� Leaf nodes = Tokens that result from derivation

� Intermediate nodes = Application of a production

Page 32: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

8

Page 33: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

24

Not All Grammars are Equal

Each language has infinitely manygrammars

Some grammars are ambiguous

� A single string may have multiple derivations

� Unambiguous grammars facilitate parsing

Grammar should reflect the internalstructure of the PL

� E.g., associativity and precedence of operators

Page 34: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

25

Example: Revised Grammar

expr→ term | expr add op term

term→ factor | term mult op factor

factor→ id | number | - factor | ( expr )

add op→ + | -

mult op→ * | /

A better version of the grammar ofarithmetic expressions:

Page 35: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

26

Quiz: Context-free Grammars

https://ilias3.uni-stuttgart.de/vote/0ZT9

Draw the parse tree of foo * x + bar

with the revised grammar. How manynodes and edges does the tree have?

Page 36: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

9

Page 37: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

28

Plan for Today

� Specifying syntax

� Regular expressions

� Context-free grammars

� Scanning

� Parsing

Page 38: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

29

Implementing a Scanner

General idea

� Read one character at a time

� Whenever a full token is recognized, return it

� When no token can be recognized, report an error

� Sometimes, need to look multiple characters

ahead to determine next token

Page 39: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

30

Option 1: Ad-hoc Scanners

� Manually implemented

� Handle common tokens first

� Used in many production compilers

� Compact code

� Efficient scanning

Page 40: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

31

Option 2: Finite Automata

� Each token specified by a regularexpression

� Finite automata = Recognizers ofregular expressions

Example:

c ( ( a | b ) c )∗ S1 c S2

a

b

Page 41: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

32

Definition: DFA

Deterministic finite automaton (DFA):(Q,Σ, δ, q0, F )

� Finite set Q of states

� Finite set Σ of input symbols

� Transition function δ : Q× Σ→ Q

� Start state q0

� Set of accept states F ⊆ Q

Page 42: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

33

DFA versus NFA

� Deterministic finite automaton (DFA):

� At most one outgoing transition for each input

symbol

� No ε transitions (empty word)

� Non-deterministic finite automaton(NFA)

� Multiple outgoing transitions for same character

� May have ε transitions

Page 43: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

34

From Reg. Expressions to DFAs

See course on theoretical computerscience or Chapter 2 of ProgrammingLanguage Pragmatics

Page 44: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

35

From NFA to Minimal DFA

� NFA to DFA� Why? To avoid exploring multiple possible next

states during scanning

� How? Powerset construction method (not

covered here)

� DFA to minimal DFA� Why? Simplifies a DFA-based scanner

� How? Remove unreachable and

non-distiguishable states

Page 45: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

36

From DFA to Scanner

Two popular options

� Implement the DFA using switch statements

� Mostly in hand-written scanners

� Table-based scanners

� Table represents states and transitions

� Driver program indexes the table

� Mostly in auto-generated scanners

Page 46: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

37 - 1

Switch Statement Style

S1 c S2

a

b

state = S1token = ""loop:switch state:case S1:switch in_char:case ’c’: state = S2else error

case S2:switch in_char:case ’a’: state = S1case ’b’: state = S1case ’ ’: returnelse error

token = token + in_charread next in_char

Startingstate: S1

Page 47: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

37 - 2

Switch Statement Style

S1 c S2

a

b

state = S1token = ""loop:switch state:case S1:switch in_char:case ’c’: state = S2else error

case S2:switch in_char:case ’a’: state = S1case ’b’: state = S1case ’ ’: returnelse error

token = token + in_charread next in_char

Loop reads onecharacter at a timeand builds thetoken

Page 48: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

37 - 3

Switch Statement Style

S1 c S2

a

b

state = S1token = ""loop:switch state:case S1:switch in_char:case ’c’: state = S2else error

case S2:switch in_char:case ’a’: state = S1case ’b’: state = S1case ’ ’: returnelse error

token = token + in_charread next in_char

Switch statementthat handles thecurrent state

Page 49: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

37 - 4

Switch Statement Style

S1 c S2

a

b

state = S1token = ""loop:switch state:case S1:switch in_char:case ’c’: state = S2else error

case S2:switch in_char:case ’a’: state = S1case ’b’: state = S1case ’ ’: returnelse error

token = token + in_charread next in_char

Switch statementsto handle thecurrent character

Page 50: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

37 - 5

Switch Statement Style

S1 c S2

a

b

state = S1token = ""loop:switch state:case S1:switch in_char:case ’c’: state = S2else error

case S2:switch in_char:case ’a’: state = S1case ’b’: state = S1case ’ ’: returnelse error

token = token + in_charread next in_char

Move to nextstate ifcharacteraccepted

Page 51: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

37 - 6

Switch Statement Style

S1 c S2

a

b

state = S1token = ""loop:switch state:case S1:switch in_char:case ’c’: state = S2else error

case S2:switch in_char:case ’a’: state = S1case ’b’: state = S1case ’ ’: returnelse error

token = token + in_charread next in_char

Return thetoken when aspace occurs

Page 52: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

37 - 7

Switch Statement Style

S1 c S2

a

b

state = S1token = ""loop:switch state:case S1:switch in_char:case ’c’: state = S2else error

case S2:switch in_char:case ’a’: state = S1case ’b’: state = S1case ’ ’: returnelse error

token = token + in_charread next in_char

Raise an errorfor any illegalcharacter

Page 53: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

38

Table-based Scanning

Transition table indexed by state and input:

S1 c S2

a

b

State ’a’ ’b’ ’c’ Return

S1 - - S2 -S2 S1 S1 - token

Driver program

� moves to a new state,

� returns a token, or

� raises an error

Page 54: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

39

Longest Possible Token Rule

� What if one token is a prefix ofanother?� Number 3.1 vs. number 3.141

� Accept the longest possible token� 3.141 for the above example

� How to decide whether token hasended?� Scanner looks ahead (at least one character)

Page 55: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

40 - 1

Quiz: Automata and Scanners

https://ilias3.uni-stuttgart.de/vote/0ZT9

Which of these statements is true?

� A scanner produces a sequence of tokens.

� A scanner produces a syntax tree.

� Efficient scanners are based on NFAs.

� A scanner for C will turn ”ifStmt” into two tokens

”if” and ”Stmt”.

Page 56: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

40 - 2

Quiz: Automata and Scanners

https://ilias3.uni-stuttgart.de/vote/0ZT9

Which of these statements is true?

� A scanner produces a sequence of tokens.

� A scanner produces a syntax tree.

� Efficient scanners are based on NFAs.

� A scanner for C will turn ”ifStmt” into two tokens

”if” and ”Stmt”.

Page 57: Lecture 2: Syntax (Part 1) Programming Paradigmssoftware-lab.org/teaching/winter2019/pp/lecture_syntax1.pdf1 Prof. Dr. Michael Pradel Software Lab, University of Stuttgart Winter 2019/2020

41

Plan for Today

� Specifying syntax

� Regular expressions

� Context-free grammars

� Scanning

� Parsing

4