38
1 Semantic Analysis and Symbol Tables

1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

Embed Size (px)

Citation preview

Page 1: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

1

Semantic Analysis and Symbol Tables

Page 2: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

2

Outline

• How to build symbol tables

• How to use them to find – multiply-declared vars– undeclared vars

• How to perform type checking

Page 3: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

3

The Compiler So Far

• Lexical analysis– Detects inputs with illegal tokens

• int main# (`int argc,

• Parsing– Detects inputs with ill-formed parse trees

• int x = 3 x += 2;• int main { ; }

• Semantic analysis– Last “front end” phase– Catches remaining errors

Page 4: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

4

Introduction

• Typical semantic errors– multiple declarations: declare var. once in a

scope

– undeclared variable: declare before use

– type mismatch: type of LHS of assignment should match (or be compatible with) type of RHS

– wrong arguments: call method with right number and types of args

Page 5: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

5

A Simple Semantic Analyzer

• Works in two phases, traversing AST1. For each scope in program

• process the declarations (var_decl, fun_decl, param) – add new entries to symbol table – report variables that are multiply declared

• process the statements – find uses of undeclared variables – update the “ID” nodes (like “var”) of AST to point to

appropriate “decl” nodes– “use” nodes will point to “decl” nodes in tree (“declPtr” in

TreeNode struct) • No longer need symbol table

2. Process all statements in program again• use the AST “declPtr” links to determine the type of each

expression, and to find type errors

Page 6: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

6

Symbol Table

• Purpose – keep track of declared names

• names of variables, functions, methods, classes, fields

• Symbol table entry – associates a name with various attributes, like

• kind of name (variable, method, class, field)• type (int, float)• nest level • memory location (i.e., where will it be found at runtime)

– globals have labels– locals use “offset” in TreeNode

Page 7: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

7

Scoping

• In most languages, the same name can be declared multiple times if – its declarations occur in different scopes, and/or – its declarations involve different kinds of names

• MSDN on C++: an identifier (name) is a sequence of characters used to denote one of the following

– Object or variable name – Class, structure, or union name – Enumerated type name – Member of a class, structure, union, or enumeration – Function or class-member function – typedef name – Label name – Macro name – Macro parameter

Page 8: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

8

Scoping Example

• Java: can use same name for – a class, – field of class, – a method of class, and – a local variable of method

• A legal Java program:

class Test { int Test = 0; void Test () { double Test = 1; }

}

Page 9: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

9

Scoping: Overloading

• Java and C++– can use same name for more than one method

as long as the signatures are unique (signature = name + param type list)

int add (int a, int b);float add (float a, float b);void add (int a);

Page 10: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

10

Scoping: General Rules

• Scope rules of a language– determine which declaration of a named object corresponds

to each use of the object – i.e., scoping rules map uses of objects to their declarations

• C++ and Java use static scoping– mapping from uses to declarations is made at compile time – C++ uses the "most closely nested" rule

• use of variable “x” matches the declaration in the most closely enclosing scope such that the declaration precedes the use

• a deeply nested variable “x” hides “x” declared in an outer scope

– Java• inner scopes cannot define variables defined in outer scopes

Page 11: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

11

Scope Levels

• Each function can have several scopes– one for the parameters, – one for the function body,

• combine above (we will do this for C-)

– and possibly additional scopes in the function• for each “for” loop • each nested block

Page 12: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

12

Scope Levels Example

void f (int k) { // parameter int k = 0; // local variable (not OK in C++ or Java)while (k) {

int k = 1; // local var in loop (not OK in Java)}

}– the outermost scope includes just the name “f”, and

– function “f” has three nested scopes

Page 13: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

13

Dynamic Scoping

• Not all languages use static scoping • Lisp, APL, and Snobol use dynamic

scoping • Dynamic scoping

– A use of a variable that has no corresponding declaration in the same function corresponds to the declaration in the most-recently-called still active function

Page 14: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

14

Dynamic Scoping: Example

• What’s the output of the following code?

void main() { f1(); f2(); }

void f1() { int x = 10; g(); }

void f2() { String x = "hello"; f3(); g(); }

void f3() { double x = 30.5; }

void g() { print(x); }

Page 15: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

15

Static vs Dynamic Scoping

– Generally, dynamic scoping is a bad idea

• can make a program difficult to understand

• a single use of a variable can correspond to – many different declarations – with different types!

Page 16: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

16

Used before declared?

• Can a name be used before declaration?– Java: a method or field name can be used

before the definition appears • not true for a variable!

– C++?

Page 17: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

17

class Test { void f() {

// OK _val = 0; // OK g(); // ERROR! x = 1; int x;

}

void g() {} int _val;

}

Used before declared? Example

Page 18: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

18

Language Assumptions

• Assume that our language– uses static scoping– requires that all names be declared before use – does not allow multiple declarations of a name in the

same scope • even for different kinds of names

– does allow the same name to be declared in multiple nested scopes

• but only once per scope

– uses the same scope for a function’s parameters and for the local variables declared at the beginning of the function

Page 19: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

19

Symbol Table Purpose

• Our symbol table will be used to answer two questions

1. Given a declaration of a name, is there already a declaration of the same name in the current scope • i.e., is it multiply declared?

2. Given a use of a name, to which declaration does it correspond, or is it undeclared?

Page 20: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

20

Symbol Table Purpose (Cont’d)

• Symbol table is only needed to answer those two questions, i.e.– once all declarations have been processed to

build the symbol table, – and all uses have been processed to link each

“var” or “call” node in the AST with the corresponding decl node,

– then the symbol table itself is no longer needed• no more lookups based on name will be performed

Page 21: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

21

Symbol Table Operations

• Given the above assumptions, we will need

1. Look up a name in the current scope only • to check if it is multiply declared

2. Look up a name in the current and enclosing scopes • to check for a use of an undeclared name, and • to link a use with the corresponding symbol table entry

3. Insert a new name into the symbol table with its attributes

4. Do what must be done when a new scope is entered

5. Do what must be done when a scope is exited

Page 22: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

22

Symbol Table Implementations

1. a list of tables*2. a table of lists

• For each approach, we will consider – what must be done when entering and exiting a scope, – when processing a declaration, and – when processing a use

• Simplification – assume each symbol-table entry includes only

• symbol name • type • nesting level of its declaration

Page 23: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

23

Method 1: List of Hash Tables

• The idea– symbol table = a list (or vector) of hash tables – one hash table for each currently visible scope

• When processing a scope S

back front

declarations made in S

declarations made in scopes that enclose S

do push_back of a table

Page 24: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

24

Example

void f (int a, int b) {

double x;

while (...) { int x, y; ... }

} void g () { f (); }

• After processing decls inside the while loop

x: int, 2y: int, 2

a: int, 1b: int, 1x: double, 1

f: (int, int) void, 0

Our implementation: vector<unordered_set<…> *> table;

Page 25: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

25

List of Hash Tables: Operations

1.On scope entry • increment current level number and add a new

empty hashtable to back of list

2.To process a declaration of “x”• look up “x” in last table in list

• If it is there, then issue a "multiply declared variable" error;

• otherwise, add “x” to the last table in the list

Page 26: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

26

Operations (Cont.)

3. To process a use of “x”• look up “x” starting in last table in list

• if it is not there, then look up “x” in each prior table in the list

• if it is not in any table then issue an "undeclared variable" error

4. On scope exit • remove the last table from the list and

decrement current level number

Page 27: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

27

Remember

• Function names belong in the hash table for the outermost scope– not in same table as function’s variables

• In example above– function name “f” is in the symbol table for the

outermost scope– name “f” is not in the same scope as parameters “a” and

“b”, and variable “x”– This is so that when use of name “f” in method “g” is

processed, “f” is found in an enclosing scope's table

Page 28: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

28

Running Times for Operations

1.Scope entry• time to initialize a new, empty hash table• probably proportional to the capacity of the hash table

2.Process a declaration• using hashing, constant expected time (O(1))

3.Process a use• using hashing, worst-case time is O(depth of nesting), when

every table in the list must be examined

4.Scope exit• time to remove a table from the list, which is O(1) if memory

deallocation is ignored

Page 29: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

29

Method 2: Hash Table of Lists

• The idea – when processing a scope S, the structure of the

symbol table is

x:

y:

z:

Page 30: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

30

Definition

• Just one big hash table, containing an entry for each variable for which there is – some declaration in scope S or – in a scope that encloses S

• Associated with each variable is a list of symbol-table entries– First list item corresponds to the most closely

enclosing declaration – Other list items correspond to declarations in

enclosing scopes

Page 31: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

31

Example

void f (int a) {

double x;

while (...) { int x, y; ... }

void g () { f (); }

}

• After processing declarations inside the while loop f:

a:

x:

y:

int, 1

int, 2 double, 1

int, 2

int void, 0

Page 32: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

32

Nesting Level Information

• Level-number attribute stored in each entry enables us to determine whether most closely enclosing declaration was made – in the current scope or – in an enclosing scope

Page 33: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

33

Hash Table of Lists: Operations

1.On scope entry• increment the current level number

2.To process a declaration of “x”• look up ”x”

• if found, get level number from first list item• if that level number = current level then issue a

"multiply declared variable" error• otherwise, add a new item to front of list with

appropriate type and current level number

Page 34: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

34

Operations (Cont.)

3. To process a use of “x”• look up “x”• if not found, issue an "undeclared variable"

error

4. On scope exit• scan all lists in table, looking at first item on

each list• if that item's level number = current level

number, then remove item from its list • decrement current level number

Page 35: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

35

Running times

1.Scope entry• time to increment level number, O(1)

2.Process a declaration• using hashing, constant expected time (O(1))

3.Process a use• using hashing, constant expected time (O(1))

4.Scope exit• time proportional to the number of names in symbol table

(or perhaps even size of hash table if no auxiliary information is maintained to allow iteration through non-empty hash table buckets)

Page 36: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

36

Type Checking

• Purpose of type-checking phase – Determine type of each expression in program

• Annotate AST

– Find type errors

• Type rules of a language define – how to determine expression types, and – what is considered to be an error

• Type rules specify, for every operator (including assignment), – what types the operands can have, and – what is the type of the result

Page 37: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

37

Example

• Both C++ and Java allow the addition of an int and a double, and the result is of type double

• However, – C++ also allows a value of type double to be

assigned to a variable of type int– Java considers that an error

Page 38: 1 Semantic Analysis and Symbol Tables. 2 Outline How to build symbol tables How to use them to find –multiply-declared vars –undeclared vars How to perform

38

Other Type Errors

• The type checker must also 1. find type errors having to do with the context of

expressions, • e.g., the context of some expressions must be boolean

2. type errors having to do with function calls

• Examples of context errors– condition of an if statement – condition of a while loop – termination condition of a for loop

• Examples of errors – calling something that is not a function – calling a function with wrong number of arguments – calling a function with arguments of the wrong types