149
Compila(on 03683133 Lecture 1: Introduc(on Lexical Analysis Noam Rinetzky 1

Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Compila(on      0368-­‐3133  

Lecture  1:  Introduc(on  

Lexical    Analysis          

Noam  Rinetzky  1

Page 2: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

2

Page 3: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Admin  

•  Lecturer:  Noam  Rinetzky  – [email protected]  –  h.p://www.cs.tau.ac.il/~maon  

•  T.A.:  Orr  Tamir  

•  Textbooks:    – Modern  Compiler  Design    –  Compilers:  principles,  techniques  and  tools  

3

Page 4: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Admin  

•  Compiler  Project  40%  –  4.5  prac(cal  exercises  – Groups  of  3  

•  1  theore(cal  exercise  10%  – Groups  of  1  

•  Final  exam  50%    – must  pass  

4

Page 5: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Course  Goals  

• What  is  a  compiler  •  How  does  it  work  •  (Reusable)  techniques  &  tools  

5

Page 6: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Course  Goals  

• What  is  a  compiler  •  How  does  it  work  •  (Reusable)  techniques  &  tools  

•  Programming  language  implementa(on  –  run(me  systems    

•  Execu(on  environments  – Assembly,  linkers,  loaders,  OS    

6

Page 7: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Introduc(on  

Compilers:  principles,  techniques  and  tools  

7

Page 8: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  is  a  Compiler?  

8

Page 9: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  is  a  Compiler?  

“A  compiler  is  a  computer  program  that  transforms  source  code  wriaen  in  a  programming  language  (source  language)  into  another  language  (target  language).      

The  most  common  reason  for  wan(ng  to  transform  source  code  is  to  create  an  executable  program.”    

         -­‐-­‐Wikipedia  

9

Page 10: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  is  a  Compiler?  source language target language

Compiler

Executable

code

exe Source

text

txt

10

Page 11: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  is  a  Compiler?  

Executable

code

exe Source

text

txt

Compiler

int a, b; a = 2; b = a*2 + 1;

MOV R1,2 SAL R1 INC R1 MOV R2,R1

11

Page 12: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  is  a  Compiler?  source language target language

C C++

Pascal Java

Postscript

TeX

Perl JavaScript

Python Ruby

Prolog

Lisp

Scheme ML

OCaml

IA32 IA64

SPARC C

C++ Pascal Java

Java Bytecode

Compiler

12

Page 13: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

High  Level  Programming  Languages  •  Impera(ve  Algol,  PL1,  Fortran,  Pascal,  Ada,  Modula,  C  

–  Closely  related  to  “von  Neumann”  Computers  

•  Object-­‐oriented  Simula,  Smalltalk,  Modula3,  C++,  Java,  C#,  Python  –   Data  abstrac(on  and  ‘evolu(onary’  form  of  program  development  •  Class  an  implementa(on  of  an  abstract  data  type  (data+code)  •  Objects  Instances  of  a  class  •  Inheritance  +  generics  

•  Func(onal  Lisp,  Scheme,  ML,  Miranda,  Hope,  Haskel,  OCaml,  F#    

•  Logic  Programming  Prolog   13

Page 14: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

More  Languages  •  Hardware  descrip(on  languages  VHDL  

–  The  program  describes  Hardware  components  –  The  compiler  generates  hardware  layouts  

•  Graphics  and  Text  processing  TeX,  LaTeX,    postscript  –  The  compiler  generates  page  layouts  

•  Scrip(ng  languages  Shell,  C-­‐shell,  Perl  –  Include  primi(ves  constructs  from  the  current  sojware  environment  

•  Web/Internet  HTML,  Telescript,  JAVA,  Javascript    •  Intermediate-­‐languages  Java  bytecode,  IDL    

14

Page 15: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

High  Level  Prog.  Lang.,  Why?  

15

Page 16: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

High  Level  Prog.  Lang.,  Why?  

16

Page 17: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Compiler  vs.  Interpreter    

17

Page 18: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Compiler  

•  A  program  which  transforms  programs    •  Input  a  program  (P)  •  Output  an  object  program  (O)  

–  For  any  x,  “O(x)”  “=“  “P(x)”  Compiler

Source

text

txt Executable

code

exe

P O 18

Page 19: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Compiling  C  to  Assembly  

Compiler

int  x;  scanf(“%d”,  &x);  x  =  x  +  1  ;  printf(“%d”,  x);  

add  %fp,-­‐8,  %l1  mov  %l1,  %o1  call  scanf  ld  [%fp-­‐8],%l0  add  %l0,1,%l0  st  %l0,[%fp-­‐8]  ld  [%fp-­‐8],  %l1  mov    %l1,  %o1  call  printf            

5

6

19

Page 20: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Interpreter  

•  A  program  which  executes  a  program  •  Input  a  program  (P)  +  its  input  (x)  •  Output  the  computed  output  (P(x))  

Interpreter

Source

text

txt

Input

Output

20

Page 21: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Interpre(ng  (running)  .py  programs  

•  A  program  which  executes  a  program  •  Input  a  program  (P)  +  its  input  (x)  •  Output  the  computed  output  (“P(x)”)  

Interpreter

5

int  x;  scanf(“%d”,  &x);  x  =  x  +  1  ;  printf(“%d”,  x);  

6  

21

Page 22: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Compiler  vs.  Interpreter  Source

Code

Executable

Code

Machine

Source

Code

Intermediate

Code

Interpreter

preprocessing

processing preprocessing

processing

22

Page 23: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Compiled  programs  are  usually  more  efficient  than  

scanf(“%d”,&x);  y  =  5  ;  z  =  7  ;  x  =  x  +  y  *  z;  printf(“%d”,x);    

add      %fp,-­‐8,  %l1  mov      %l1,  %o1  call    scanf  mov      5,  %l0  st        %l0,[%fp-­‐12]  mov      7,%l0  st        %l0,[%fp-­‐16]  ld        [%fp-­‐8],  %l0  ld        [%fp-­‐8],%l0  add      %l0,  35  ,%l0  st        %l0,[%fp-­‐8]  ld        [%fp-­‐8],  %l1  mov      %l1,  %o1    call    printf  

Compiler

23

Page 24: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Compilers  report    input-­‐independent    possible  errors  •  Input-­‐program            

•  Compiler-­‐Output    –  “line  88:  x  may  be  used  before  set''  

scanf(“%d”,    &y);  if  (y  <  0)  

 x  =  5;  ...  If  (y  <=  0)  

 z  =  x  +  1;    

24

Page 25: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Interpreters  report    input-­‐specific  definite  errors  

•  Input-­‐program            

•  Input  data    –  y  =  -­‐1  –  y  =  0      

scanf(“%d”,    &y);  if  (y  <  0)  

 x  =  5;  ...  If  (y  <=  0)  

 z  =  x  +  1;    

25

Page 26: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Interpreter        vs.            Compiler  

•  Conceptually  simpler    –  “define”  the  prog.  lang.    

•  Can  provide  more  specific  error  report  

•  Easier  to  port  

   

•  Faster  response  (me  

•  [More  secure]  

•  How  do  we  know  the  transla(on  is  correct?  

•  Can  report  errors  before  input  is  given  

•  More  efficient  code  –  Compila(on  can  be  expensive    –  move  computa(ons  to  compile-­‐(me  

•  compile-­‐:me  +  execu:on-­‐:me  <  interpreta:on-­‐:me  is  possible  

26

Page 27: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Concluding  Remarks  

•  Both  compilers  and  interpreters  are  programs  wriaen  in  high  level  language  

•  Compilers  and  interpreters  share  func(onality  

•  In  this  course  we  focus  on  compilers  

27

Page 28: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Ex  0:  A  Simple  Interpreter    

28

Page 29: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Toy  compiler/interpreter  

•  Trivial  programming  language  •  Stack  machine  •  Compiler/interpreter  wriaen  in  C  •  Demonstrate  the  basic  steps  

•  Textbook:  Modern  Compiler  Design  1.2  

29

Page 30: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Conceptual  Structure  of  a  Compiler  

Executable

code

exe Source

text

txt

Semantic

Representation

Backend

(synthesis)

Compiler

Frontend

(analysis)

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

Intermediate Representation

(IR)

Code

Generation

30

Page 31: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Structure  of  toy  Compiler  /  interpreter  

Executable code

exe

Source

text

txt

Semantic

Representation

Backend (synthesis) Frontend

(analysis)

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

Intermediate Representation

(AST)

Code

Generation

Execution

Engine

Execution Engine

Output*

* Programs in our PL do not take input 31

Page 32: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Source  Language  

•  Fully  parameterized  expressions  •  Arguments  can  be  a  single  digit    

ü     (4  +  (3  *  9))  ✗ 3  +  4  +  5  ✗  (12  +  3)    

expression  →  digit  |  ‘(‘  expression  operator  expression  ‘)’  operator  →  ‘+’  |  ‘*’  digit  →  ‘0’  |  ‘1’  |  ‘2’  |  ‘3’  |  ‘4’  |  ‘5’  |  ‘6’  |  ‘7’  |  ‘8’  |  ‘9’    

32

Page 33: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

The  abstract  syntax  tree  (AST)  

•  Intermediate  program  representa(on  •  Defines  a  tree  

–  Preserves  program  hierarchy  

•  Generated  by  the  parser  •  Keywords  and  punctua(on  symbols  are  not  stored    – Not  relevant  once  the  tree  exists  

33

Page 34: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Concrete  syntax  tree#  for  5*(a+b)  

expression

number expression ‘*’

identifier

expression ‘(’ ‘)’

‘+’ identifier

‘a’ ‘b’

‘5’

#Parse tree 34

Page 35: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Abstract  Syntax  tree  for  5*(a+b)  

‘*’

‘+’

‘a’ ‘b’

‘5’

35

Page 36: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Annotated  Abstract  Syntax  tree  

‘*’

‘+’

‘a’ ‘b’

‘5’

type:real

loc: reg1

type:real

loc: reg2

type:real

loc: sp+8

type:real

loc: sp+24

type:integer

36

Page 37: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Driver  for  the  toy  compiler/interpreter  

#include        "parser.h"            /*  for  type  AST_node  */  #include        "backend.h"          /*  for  Process()  */  #include        "error.h"              /*  for  Error()  */    int  main(void)  {          AST_node  *icode;            if  (!Parse_program(&icode))  Error("No  top-­‐level  expression");          Process(icode);            return  0;  }  

 

37

Page 38: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Structure  of  toy  Compiler  /  interpreter  

Executable code

exe

Source

text

txt

Semantic

Representation

Backend (synthesis) Frontend

(analysis)

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

Intermediate Representation

(AST)

Code

Generation

Execution

Engine

Execution Engine

Output*

* Programs in our PL do not take input 38

Page 39: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Lexical  Analysis  

•  Par((ons  the  inputs  into  tokens  – DIGIT  –  EOF  –  ‘*’  –  ‘+’  –  ‘(‘  –  ‘)’  

•  Each  token  has  its  representa(on  •  Ignores  whitespaces   39

Page 40: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

lex.h:  Header  File  for  Lexical  Analysis  

/*  Define  class  constants  */  

/*  Values  0-­‐255  are  reserved  for  ASCII  characters  */  

#define  EoF          256  

#define  DIGIT      257  

typedef  struct  {  

 int  class;    

 char  repr;}  Token_type;  

 

extern    Token_type  Token;  

extern    void  get_next_token(void);  

40

Page 41: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

#include  "lex.h"      token_type  Token;    //  Global  variable    void  get_next_token(void)  {          int  ch;          do  {                  ch  =  getchar();                  if  (ch  <  0)  {                          Token.class  =  EoF;  Token.repr  =  '#';                          return;                  }          }  while  (Layout_char(ch));          if  ('0'  <=  ch  &&  ch  <=  '9')  {Token.class  =  DIGIT;}          else  {Token.class  =  ch;}          Token.repr  =  ch;  }      static  int  Layout_char(int  ch)  {          switch  (ch)  {            case  '  ':  case  '\t':  case  '\n':  return  1;            default:                                                return  0;          }  }          

Lexical  Analyzer  

41

Page 42: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Structure  of  toy  Compiler  /  interpreter  

Executable code

exe

Source

text

txt

Semantic

Representation

Backend (synthesis) Frontend

(analysis)

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

Intermediate Representation

(AST)

Code

Generation

Execution

Engine

Execution Engine

Output*

* Programs in our PL do not take input 42

Page 43: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Parser  

•  Invokes  lexical  analyzer  •  Reports  syntax  errors  •  Constructs  AST  

43

Page 44: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Parser  Header  File  

typedef  int  Operator;  

typedef  struct  _expression  {        

       char  type;        /*  'D'  or  'P'  */    

       int  value;        /*  for  'D'  type  expression  */          

       struct  _expression  *left,  *right;        /*  for  'P'  type  expression  */    

       Operator  oper;                                              /*  for  'P'  type  expression  */          

}  Expression;  

 

typedef  Expression  AST_node;              /*  the  top  node  is  an  Expression  */  

extern  int  Parse_program(AST_node  **);  

44

Page 45: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

AST  for  (2  *  ((3*4)+9))  P

* oper  

type  lej   right  

P +

P

*

D

2

D

9

D

4 D

3 45

Page 46: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Driver  for  the  Toy  Compiler  

#include        "parser.h"            /*  for  type  AST_node  */  #include        "backend.h"          /*  for  Process()  */  #include        "error.h"              /*  for  Error()  */    int  main(void)  {          AST_node  *icode;            if  (!Parse_program(&icode))  Error("No  top-­‐level  expression");          Process(icode);            return  0;  }  

 

46

Page 47: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

AST  for  (2  *  ((3*4)+9))  P

* oper  

type  lej   right  

P +

P

*

D

2

D

9

D

4 D

3 47

Page 48: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Driver  for  the  Toy  Compiler  

#include        "parser.h"            /*  for  type  AST_node  */  #include        "backend.h"          /*  for  Process()  */  #include        "error.h"              /*  for  Error()  */    int  main(void)  {          AST_node  *icode;            if  (!Parse_program(&icode))  Error("No  top-­‐level  expression");          Process(icode);            return  0;  }  

 

48

Page 49: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Source  Language  

•  Fully  parenthesized  expressions  •  Arguments  can  be  a  single  digit    

ü     (4  +  (3  *  9))  ✗ 3  +  4  +  5  ✗  (12  +  3)    

expression  →  digit  |  ‘(‘  expression  operator  expression  ‘)’  operator  →  ‘+’  |  ‘*’  digit  →  ‘0’  |  ‘1’  |  ‘2’  |  ‘3’  |  ‘4’  |  ‘5’  |  ‘6’  |  ‘7’  |  ‘8’  |  ‘9’    

49

Page 50: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

lex.h:  Header  File  for  Lexical  Analysis  

/*  Define  class  constants  */  

/*  Integers  are  used  to  encode  characters  +  special  codes  */  

/*  Values  0-­‐255  are  reserved  for  ASCII  characters  */  

#define  EoF          256  

#define  DIGIT      257  

typedef  struct  {  

 int  class;    

 char  repr;}  Token_type;  

 

extern    Token_type  Token;  

extern    void  get_next_token(void);  

50

Page 51: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

#include  "lex.h"      token_type  Token;    //  Global  variable    void  get_next_token(void)  {          int  ch;          do  {                  ch  =  getchar();                  if  (ch  <  0)  {                          Token.class  =  EoF;  Token.repr  =  '#’;  return;}          }  while  (Layout_char(ch));            if  ('0'  <=  ch  &&  ch  <=  '9')    

 Token.class  =  DIGIT;          else    

 Token.class  =  ch;          Token.repr  =  ch;  }      static  int  Layout_char(int  ch)  {          switch  (ch)  {            case  '  ':  case  '\t':  case  '\n':  return  1;            default:                                                return  0;          }  }          

Lexical  Analyzer  

51

Page 52: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

AST  for  (2  *  ((3*4)+9))  P

* oper  

type  lej   right  

P +

P

*

D

2

D

9

D

4 D

3 52

Page 53: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Driver  for  the  Toy  Compiler  

#include        "parser.h"            /*  for  type  AST_node  */  #include        "backend.h"          /*  for  Process()  */  #include        "error.h"              /*  for  Error()  */    int  main(void)  {          AST_node  *icode;            if  (!Parse_program(&icode))  Error("No  top-­‐level  expression");          Process(icode);            return  0;  }  

 

53

Page 54: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Parser  Environment  #include  "lex.h”,  "error.h”,  "parser.h"      static  Expression  *new_expression(void)  {          return  (Expression  *)malloc(sizeof  (Expression));  }            static  int  Parse_operator(Operator  *oper_p);  static  int  Parse_expression(Expression  **expr_p);  int  Parse_program(AST_node  **icode_p)  {          Expression  *expr;          get_next_token();                      /*  start  the  lexical  analyzer  */          if  (Parse_expression(&expr))  {                  if  (Token.class  !=  EoF)  {                          Error("Garbage  after  end  of  program");                  }                  *icode_p  =  expr;                  return  1;          }          return  0;  }   54

Page 55: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Top-­‐Down  Parsing  •  Op(mis(cally  build  the  tree  from  the  root  to  leaves  •  For  every  P  →  A1  A2  …  An  |  B1  B2  …  Bm  

–  If  A1  succeeds  •  If  A2  succeeds  &  A3  succeeds  &  …  •  Else  fail  

–  Else  if  B1  succeeds  •  If  B2  succeeds  &  B3  succeeds  &  ..  •  Else  fail  

–  Else  fail  

•  Recursive  descent  parsing  –  Simplified:  no  backtracking  

•  Can  be  applied  for  certain  grammars    55

Page 56: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

static  int  Parse_expression(Expression  **expr_p)  {          Expression  *expr  =  *expr_p  =  new_expression();          if  (Token.class  ==  DIGIT)  {                  expr-­‐>type  =  'D';  expr-­‐>value  =  Token.repr  -­‐  '0';                  get_next_token();        return  1;          }          if  (Token.class  ==  '(')  {                  expr-­‐>type  =  'P';    get_next_token();                  if  (!Parse_expression(&expr-­‐>left))  {    Error("Missing  expression");  }                  if  (!Parse_operator(&expr-­‐>oper))  {  Error("Missing  operator");  }                  if  (!Parse_expression(&expr-­‐>right))  {  Error("Missing  expression");  }                  if  (Token.class  !=  ')')  {    Error("Missing  )");  }                  get_next_token();                  return  1;          }          /*  failed  on  both  attempts  */          free_expression(expr);  return  0;  }  

Parser  

static  int  Parse_operator(Operator  *oper)  {          if  (Token.class  ==  '+')  {                  *oper  =  '+';  get_next_token();  return  1;          }          if  (Token.class  ==  '*')  {                  *oper  =  '*';  get_next_token();  return  1;          }          return  0;  }  

 

56

Page 57: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

AST  for  (2  *  ((3*4)+9))  

P

* oper  

type  lej   right  

P +

P

*

D

2

D

9

D

4 D

3 57

Page 58: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Structure  of  toy  Compiler  /  interpreter  

Executable code

exe

Source

text

txt

Semantic

Representation

Backend (synthesis) Frontend

(analysis)

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

Intermediate Representation

(AST)

Code

Generation

Execution

Engine

Execution Engine

Output*

* Programs in our PL do not take input 58

Page 59: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Seman(c  Analysis    

•  Trivial  in  our  case  •  No  iden(fiers  •  No  procedure  /  func(ons  •  A  single  type  for  all  expressions  

59

Page 60: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Structure  of  toy  Compiler  /  interpreter  

Executable code

exe

Source

text

txt

Semantic

Representation

Backend (synthesis) Frontend

(analysis)

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

Intermediate Representation

(AST)

Code

Generation

Execution

Engine

Execution Engine

Output*

* Programs in our PL do not take input 60

Page 61: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Intermediate  Representa(on  

P

* oper  

type  lej   right  

P +

P

*

D

2

D

9

D

4 D

3 61

Page 62: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Alterna(ve  IR:  3-­‐Address  Code  

L1: _t0=a _t1=b _t2=_t0*_t1 _t3=d _t4=_t2-_t3 GOTO L1

“Simple Basic-like programming language” 62

Page 63: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Structure  of  toy  Compiler  /  interpreter  

Executable code

exe

Source

text

txt

Semantic

Representation

Backend (synthesis) Frontend

(analysis)

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

Intermediate Representation

(AST)

Code

Generation

Execution

Engine

Execution Engine

Output*

* Programs in our PL do not take input 63

Page 64: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Code  genera(on  

•  Stack  based  machine  •  Four  instruc(ons  

–  PUSH  n  – ADD  – MULT  –  PRINT  

64

Page 65: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Code  genera(on  #include        "parser.h"              #include        "backend.h"    static  void  Code_gen_expression(Expression  *expr)  {          switch  (expr-­‐>type)  {          case  'D':                  printf("PUSH  %d\n",  expr-­‐>value);                  break;          case  'P':                  Code_gen_expression(expr-­‐>left);                  Code_gen_expression(expr-­‐>right);                  switch  (expr-­‐>oper)  {                  case  '+':  printf("ADD\n");  break;                  case  '*':  printf("MULT\n");  break;                  }                  break;          }  }  void  Process(AST_node  *icode)  {          Code_gen_expression(icode);  printf("PRINT\n");  }   65

Page 66: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Compiling  (2*((3*4)+9))  

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

P

* oper  

type  lej   right  

P +

P

*

D

2

D

9

D

4 D

3 66

Page 67: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Execu(ng  Compiled  Program  

Executable code

exe

Source

text

txt

Semantic

Representation

Backend (synthesis) Frontend

(analysis)

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

Intermediate Representation

(AST)

Code

Generation

Execution

Engine

Execution Engine

Output*

* Programs in our PL do not take input 67

Page 68: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Generated  Code  Execu(on  

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack Stack’

2

68

Page 69: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Generated  Code  Execu(on  

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

3

2

Stack

2

69

Page 70: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Generated  Code  Execu(on  

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

4

3

2

Stack

3

2

70

Page 71: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Generated  Code  Execu(on  

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

12

2

Stack

4

3

2

71

Page 72: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Generated  Code  Execu(on  

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

9

12

2

Stack

12

2

72

Page 73: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Generated  Code  Execu(on  

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

21

2

Stack

9

12

2

73

Page 74: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Generated  Code  Execu(on  

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

42

Stack

21

2

74

Page 75: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Generated  Code  Execu(on  

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’ Stack

42

75

Page 76: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Shortcuts  

•  Avoid  genera(ng  machine  code  •  Use  local  assembler  •  Generate  C  code  

76

Page 77: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Structure  of  toy  Compiler  /  interpreter  

Executable code

exe

Source

text

txt

Semantic

Representation

Backend (synthesis) Frontend

(analysis)

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

Intermediate Representation

(AST)

Code

Generation

Execution

Engine

Execution Engine

Output*

* Programs in our PL do not take input 77

Page 78: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Interpreta(on  

•  Boaom-­‐up  evalua(on  of  expressions  •  The  same  interface  of  the  compiler  

78

Page 79: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

#include  "parser.h"              #include  "backend.h”    static  int  Interpret_expression(Expression  *expr)  {          switch  (expr-­‐>type)  {          case  'D':                  return  expr-­‐>value;                  break;          case  'P':                    int  e_left  =  Interpret_expression(expr-­‐>left);                  int  e_right  =  Interpret_expression(expr-­‐>right);                  switch  (expr-­‐>oper)  {                  case  '+':  return  e_left  +  e_right;                  case  '*':  return  e_left  *  e_right;                  break;          }  }    void  Process(AST_node  *icode)  {          printf("%d\n",  Interpret_expression(icode));  }  

79

Page 80: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Interpre(ng  (2*((3*4)+9))  

P

* oper  

type  lej   right  

P +

P

*

D

2

D

9

D

4 D

3 80

Page 81: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Summary:  Journey  inside  a  compiler  

Lexical Analysis

Syntax Analysis

Sem. Analysis

Inter. Rep.

Code Gen.

x = b*b – 4*a*c

txt

<ID,”x”> <EQ> <ID,”b”> <MULT> <ID,”b”> <MINUS> <INT,4> <MULT> <ID,”a”> <MULT> <ID,”c”>

Token Stream

81

Page 82: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Lexical Analysis

Syntax Analysis

Sem. Analysis

Inter. Rep.

Code Gen.

<ID,”x”>  <EQ>  <ID,”b”>  <MULT>  <ID,”b”>  <MINUS>  <INT,4>  <MULT>  <ID,”a”>  <MULT>  <ID,”c”>  

‘b’   ‘4’  

‘b’  ‘a’  

‘c’  

ID  

ID  

ID  

ID  

ID  

factor  

term   factor  MULT  

term  

expression  

expression  

factor  

term   factor  MULT  

term  

expression  

term  

MULT   factor  

MINUS  

Syntax  Tree  

Summary:  Journey  inside  a  compiler  

82

Statement  

‘x’  

ID   EQ  

Page 83: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Lexical Analysis

Syntax Analysis

Sem. Analysis

Inter. Rep.

Code Gen.

‘b’   ‘4’  

‘b’  ‘a’  

‘c’  

ID  

ID  

ID  

ID  

ID  

factor  

term   factor  MULT  

term  

expression  

expression  

factor  

term   factor  MULT  

term  

expression  

term  

MULT   factor  

MINUS  

Syntax  Tree  

Summary:  Journey  inside  a  compiler  

83

<ID,”x”>  <EQ>  <ID,”b”>  <MULT>  <ID,”b”>  <MINUS>  <INT,4>  <MULT>  <ID,”a”>  <MULT>  <ID,”c”>  

Page 84: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Sem. Analysis

Inter. Rep.

Code Gen.

‘b’  

‘4’  

‘b’

‘a’  

‘c’  

MULT  

MULT  

MULT  

MINUS  

Lexical Analysis

Syntax Analysis

Abstract  Syntax  Tree  

Summary:  Journey  inside  a  compiler  

84

Page 85: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Lexical Analysis

Syntax Analysis

Sem. Analysis

Inter. Rep.

Code Gen.

‘b’  

‘4’  

‘b’  

‘a’  

‘c’  

MULT  

MULT  

MULT  

MINUS  

type:  int  loc:  sp+8  

type:  int  loc:  const  

type:  int  loc:  sp+16  

type:  int  loc:  sp+16  

type:  int  loc:  sp+24  

type:  int  loc:  R2  

type:  int  loc:  R2  

type:  int  loc:  R1  

type:  int  loc:  R1  

Annotated  Abstract  Syntax  Tree  

Summary:  Journey  inside  a  compiler  

85

Page 86: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Journey  inside  a  compiler  

Lexical Analysis

Syntax Analysis

Sem. Analysis

Inter. Rep.

Code Gen.

‘b’  

‘4’  

‘b’  

‘a’  

‘c’  

MULT  

MULT

MULT  

MINUS  

type:  int  loc:  sp+8  

type:  int  loc:  const  

type:  int  loc:  sp+16  

type:  int  loc:  sp+16  

type:  int  loc:  sp+24  

type:  int  loc:  R2  

type:  int  loc:  R2  

type:  int  loc:  R1  

type:  int  loc:  R1  

R2  =  4*a  R1=b*b  R2=  R2*c  R1=R1-­‐R2  

Intermediate  Representa(on  

86

Page 87: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Journey  inside  a  compiler  

Inter. Rep.

Code Gen.

‘b’  

‘4’  

‘b’  

‘a’  

‘c’  

MULT  

MULT  

MULT  

MINUS  

type:  int  loc:  sp+8  

type:  int  loc:  const  

type:  int  loc:  sp+16  

type:  int  loc:  sp+16  

type:  int  loc:  sp+24  

type:  int  loc:  R2  

type:  int  loc:  R2  

type:  int  loc:  R1  

type:  int  loc:  R1  

R2  =  4*a  R1=b*b  R2=  R2*c  R1=R1-­‐R2  

MOV  R2,(sp+8)  SAL  R2,2  MOV  R1,(sp+16)  MUL  R1,(sp+16)  MUL  R2,(sp+24)  SUB  R1,R2  

Lexical Analysis

Syntax Analysis

Sem. Analysis

Intermediate  Representa(on  

Assembly  Code  

87

Page 88: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Error  Checking  

•  In  every  stage…  

•  Lexical  analysis:  illegal  tokens  •  Syntax  analysis:  illegal  syntax    •  Seman(c  analysis:  incompa(ble  types,  undefined  variables,  …  

•  Every  phase  tries  to  recover  and  proceed  with  compila(on  (why?)  –  Divergence  is  a  challenge  

88

Page 89: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

The  Real  Anatomy  of  a  Compiler  

Executable

code

exe

Source

text

txt Lexical Analysis

Sem. Analysis

Process text input

characters Syntax Analysis tokens AST

Intermediate code

generation

Annotated AST

Intermediate code

optimization

IR Code generation IR

Target code optimization

Symbolic Instructions

SI Machine code generation

Write executable

output

MI

89

Page 90: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Op(miza(ons  •  “Op(mal  code”  is  out  of  reach    

–  many  problems  are  undecidable  or  too  expensive  (NP-­‐complete)  –  Use  approxima(on  and/or  heuris(cs    

•  Loop  op(miza(ons:  hois(ng,  unrolling,  …    •  Peephole  op(miza(ons  •  Constant  propaga(on  

–  Leverage  compile-­‐(me  informa(on  to  save  work  at  run(me  (pre-­‐computa(on)  

•  Dead  code  elimina(on  –  space  

•  …  

90

Page 91: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Machine  code  genera(on  

•  Register  alloca(on  –  Op(mal  register  assignment  is  NP-­‐Complete  –  In  prac(ce,  known  heuris(cs  perform  well  

•  assign  variables  to  memory  loca(ons  •  Instruc(on  selec(on  

–  Convert  IR  to  actual  machine  instruc(ons  

•  Modern  architectures  –  Mul(cores  –  Challenging  memory  hierarchies    

91

Page 92: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

And  on  a  More  General  Note  

92

Page 93: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Course  Goals  

• What  is  a  compiler  •  How  does  it  work  •  (Reusable)  techniques  &  tools  

•  Programming  language  implementa(on  –  run(me  systems    

•  Execu(on  environments  – Assembly,  linkers,  loaders,  OS    

93

Page 94: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

To  Compilers,  and  Beyond  …  

•  Compiler  construc(on  is  successful  –  Clear  problem    –  Proper  structure  of  the  solu(on  –  Judicious  use  of  formalisms  

• Wider  applica(on  – Many  conversions  can  be  viewed  as  compila(on  

•  Useful  algorithms  94

Page 95: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Conceptual  Structure  of  a  Compiler  

Executable

code

exe Source

text

txt

Semantic

Representation

Backend

(synthesis)

Compiler

Frontend

(analysis)

95

Page 96: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Conceptual  Structure  of  a  Compiler  

Executable

code

exe Source

text

txt

Semantic

Representation

Backend

(synthesis)

Compiler

Frontend

(analysis)

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

Intermediate Representation

(IR)

Code

Generation

96

Page 97: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Judicious  use  of  formalisms  

•  Regular  expressions  (lexical  analysis)  •  Context-­‐free  grammars    (syntac(c  analysis)  •  Aaribute  grammars  (context  analysis)  •  Code  generator  generators  (dynamic  programming)  

•  But  also  some  niay-­‐griay  programming  

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

Intermediate Representation

(IR)

Code

Generation

97

Page 98: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Use  of  program-­‐genera(ng  tools  

•  Parts  of  the  compiler  are  automa(cally  generated  from  specifica(on  

Stream  of  tokens  

Jlex  

regular  expressions  

input  program   scanner  

98

Page 99: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Use  of  program-­‐genera(ng  tools  

•  Parts  of  the  compiler  are  automa(cally  generated  from  specifica(on  

Jcup  

Context  free  grammar  

Stream  of  tokens   parser   Syntax  tree  

99

Page 100: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Use  of  program-­‐genera(ng  tools  •  Simpler  compiler  construc(on  

–  Less  error  prone  –  More  flexible  

•  Use  of  pre-­‐canned  tailored  code  –  Use  of  dirty  program  tricks  

•  Reuse  of  specifica(on  

tool  specification  

input   (generated)  code  

output  100

Page 101: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Compiler  Construc(on  Toolset  

•  Lexical  analysis  generators  –  Lex,  JLex  

•  Parser  generators  –  Yacc,  Jcup  

•  Syntax-­‐directed  translators  •  Dataflow  analysis  engines  

101

Page 102: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Wide  applicability  

•  Structured  data  can  be  expressed  using  context  free  grammars  – HTML  files  –  Postscript  –  Tex/dvi  files  – …  

102

Page 103: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Generally  useful  algorithms  

•  Parser  generators  •  Garbage  collec(on  •  Dynamic  programming  •  Graph  coloring  

103

Page 104: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

How  to  write  a  compiler?  

104

Page 105: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

How  to  write  a  compiler?  

L1  Compiler    

Executable  compiler  

exe    

L2  Compiler  source    

txt  L1  

105

Page 106: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

How  to  write  a  compiler?  

L1  Compiler    

Executable  compiler  

exe    

L2  Compiler  source    

txt  L1  

L2  Compiler    

Executable  program  

exe    

Program  source    

txt  L2  

=

106

Page 107: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

How  to  write  a  compiler?  

L1  Compiler    

Executable  compiler  

exe    

L2  Compiler  source    

txt  L1  

L2  Compiler    

Executable  program  

exe    

Program  source    

txt  L2  

=

107

Program    

Output  

Y    

Input    

X  

=  

107

Page 108: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Bootstrapping  a  compiler    

108

Page 109: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Bootstrapping  a  compiler    

L1  Compiler  simple              

L2  executable  compiler  

exe  Simple  

L2  compiler  source    

txt  L1  

L2s  Compiler  Inefficient  adv.        

L2  executable  compiler  

exe  advanced  

L2  compiler  source    

txt  L2  

L2  Compiler  Efficient  adv.        

L2  executable  compiler  

Y  advanced  

L2  compiler  source    

X  

=

=  

109

Page 110: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Proper  Design  

•  Simplify  the  compila(on  phase  –  Portability  of  the  compiler  frontend  –  Reusability  of  the  compiler  backend  

•  Professional  compilers  are  integrated  

Java

C

Pascal

C++

ML

Pentium

MIPS

Sparc

Java

C

Pascal

C++

ML

Pentium

MIPS

Sparc

IR

110

Page 111: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Modularity  

Source Language 1

txt

Semantic Representation

Backend

TL2

Frontend

SL2

int a, b; a = 2; b = a*2 + 1;

MOV R1,2 SAL R1 INC R1 MOV R2,R1

Frontend

SL3

Frontend

SL1

Backend

TL1

Backend

TL3

Source Language 1

txt

Source Language 1

txt

Executable target 1

exe

Executable target 1

exe

Executable target 1

exe

SET R1,2 STORE #0,R1 SHIFT R1,1 STORE #1,R1 ADD R1,1 STORE #2,R1

111

Page 112: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

112

Page 113: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Lexical  Analysis  

Modern  Compiler  Design:  Chapter  2.1  

 

113

Page 114: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Conceptual  Structure  of  a  Compiler  

Executable

code

exe Source

text

txt

Semantic

Representation

Backend

Compiler

Frontend

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

Intermediate Representation

(IR)

Code

Generation

114

Page 115: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Conceptual  Structure  of  a  Compiler  

Executable

code

exe Source

text

txt

Semantic

Representation

Backend

Compiler

Frontend

Lexical Analysis

Syntax Analysis

Parsing

Semantic Analysis

Intermediate Representation

(IR)

Code

Generation

words   sentences   115

Page 116: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  does  Lexical  Analysis  do?    

•  Language:  fully  parenthesized  expressions  Expr  →  Num  |  LP  Expr  Op  Expr  RP  Num  →  Dig  |  Dig  Num  Dig  →  ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’  LP  →  ‘(’  RP  →  ‘)’  Op  →  ‘+’  |  ‘*’  

(      (      23      +        7      )      *      19      )  

116

Page 117: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  does  Lexical  Analysis  do?    

•  Language:  fully  parenthesized  expressions  Context  free  language  

Regular  languages  

(      (      23      +        7      )      *      19      )  

Expr  →  Num  |  LP  Expr  Op  Expr  RP  Num  →  Dig  |  Dig  Num  Dig  →  ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’  LP  →  ‘(’  RP  →  ‘)’  Op  →  ‘+’  |  ‘*’  

117

Page 118: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  does  Lexical  Analysis  do?    

•  Language:  fully  parenthesized  expressions  Context  free  language  

Regular  languages  

(      (      23      +        7      )      *      19      )  

Expr  →  Num  |  LP  Expr  Op  Expr  RP  Num  →  Dig  |  Dig  Num  Dig  →  ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’  LP  →  ‘(’  RP  →  ‘)’  Op  →  ‘+’  |  ‘*’  

118

Page 119: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  does  Lexical  Analysis  do?    

•  Language:  fully  parenthesized  expressions  Context  free  language  

Regular  languages  

(      (      23      +        7      )      *      19      )  

Expr  →  Num  |  LP  Expr  Op  Expr  RP  Num  →  Dig  |  Dig  Num  Dig  →  ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’  LP  →  ‘(’  RP  →  ‘)’  Op  →  ‘+’  |  ‘*’  

119

Page 120: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  does  Lexical  Analysis  do?    

•  Language:  fully  parenthesized  expressions  Context  free  language  

Regular  languages  

(      (      23      +        7      )      *      19      )  

LP    LP    Num    Op    Num    RP    Op    Num    RP  

Expr  →  Num  |  LP  Expr  Op  Expr  RP  Num  →  Dig  |  Dig  Num  Dig  →  ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’  LP  →  ‘(’  RP  →  ‘)’  Op  →  ‘+’  |  ‘*’  

120

Page 121: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  does  Lexical  Analysis  do?    

•  Language:  fully  parenthesized  expressions  Context  free  language  

Regular  languages  

(      (      23      +        7      )      *      19      )  

LP    LP    Num    Op    Num    RP    Op    Num    RP  Kind  

Value  

Expr  →  Num  |  LP  Expr  Op  Expr  RP  Num  →  Dig  |  Dig  Num  Dig  →  ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’  LP  →  ‘(’  RP  →  ‘)’  Op  →  ‘+’  |  ‘*’  

121

Page 122: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  does  Lexical  Analysis  do?    

•  Language:  fully  parenthesized  expressions  Context  free  language  

Regular  languages  

(      (      23      +        7      )      *      19      )  

LP    LP    Num    Op    Num    RP    Op    Num    RP  Kind  

Value  

Expr  →  Num  |  LP  Expr  Op  Expr  RP  Num  →  Dig  |  Dig  Num  Dig  →  ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’  LP  →  ‘(’  RP  →  ‘)’  Op  →  ‘+’  |  ‘*’   Token   Token   …  

122

Page 123: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

•  Par((ons  the  input  into  stream  of  tokens  – Numbers  –  Iden(fiers  –  Keywords  –  Punctua(on    

•  Usually  represented  as  (kind,  value)  pairs  –  (Num,  23)  –  (Op,  ‘*’)  

•     “word”  in  the  source  language  •     “meaningful”  to  the  syntac(cal  analysis  

What  does  Lexical  Analysis  do?    

123

Page 124: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

From  scanning  to  parsing((23  +  7)  *  x)  

) ? * ) 7 + 23 ( (

RP Id OP RP Num OP Num LP LP

Lexical    Analyzer  

program  text  

token  stream  

Parser  Grammar:      Expr  →  ...  |  Id      Id  →  ‘a’  |  ...  |  ‘z’    

Op(*)  

Id(?)  

Num(23)   Num(7)  

Op(+)  

Abstract  Syntax  Tree  

valid  syntax  error  

124

Page 125: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Why  Lexical  Analysis?  

• Well,  not  strictly  necessary,  but  …  –  Regular  languages  ⊆  Context-­‐Free  languages  

•  Simplifies  the  syntax  analysis  (parsing)      – And  language  defini(on  

• Modularity  •  Reusability    •  Efficiency  

125

Page 126: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Lecture  goals

• Understand  role  &  place  of  lexical  analysis  

• Lexical  analysis  theory  • Using  program  genera(ng  tools    

126

Page 127: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Lecture  Outline  

ü Role  &  place  of  lexical  analysis  • What  is  a  token?  •  Regular  languages  •  Lexical  analysis  •  Error  handling  •  Automa(c  crea(on  of  lexical  analyzers    

127

Page 128: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

What  is  a  token?  (Intui(vely)  

•  A  “word”  in  the  source  language  – Anything  that  should  appear  in  the  input  to  syntax  analysis  •  Iden(fiers  • Values  • Language  keywords  

•  Usually,  represented  as  a  pair  of  (kind,  value)  

128

Page 129: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Example  Tokens  

Type   Examples  

ID   foo,  n_14,  last  NUM   73,  00,  517,  082    REAL   66.1,  .5,  5.5e-­‐10  IF   if  COMMA   ,  NOTEQ   !=  LPAREN   (  RPAREN   )  

129

Page 130: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Example  Non  Tokens  

Type   Examples  

comment   /*  ignored  */  preprocessor  direc(ve   #include  <foo.h>  

#define  NUMS  5.6  macro   NUMS  whitespace   \t,  \n,  \b,  ‘  ‘  

130

Page 131: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Some  basic  terminology  

•  Lexeme  (aka  symbol)  -­‐  a  series  of  leaers  separated  from  the  rest  of  the  program  according  to  a  conven(on  (space,  semi-­‐column,  comma,  etc.)    

•  Paaern  -­‐  a  rule  specifying  a  set  of  strings.  Example:  “an  iden(fier  is  a  string  that  starts  with  a  leaer  and  con(nues  with  leaers  and  digits”  –  (Usually)  a  regular  expression      

•  Token  -­‐  a  pair  of  (paaern,  aaributes)    

131

Page 132: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Example  void  match0(char  *s)  /*  find  a  zero  */  

{  

   if  (!strncmp(s,  “0.0”,  3))  

       return  0.  ;  

}  

VOID  ID(match0)  LPAREN  CHAR  DEREF  ID(s)  RPAREN    

LBRACE    

IF  LPAREN  NOT  ID(strncmp)  LPAREN  ID(s)  COMMA  STRING(0.0)  COMMA  NUM(3)  RPAREN  RPAREN    

RETURN  REAL(0.0)  SEMI    

RBRACE    

EOF   132

Page 133: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Example  Non  Tokens  

Type   Examples  

comment   /*  ignored  */  preprocessor  direc(ve   #include  <foo.h>  

#define  NUMS  5.6  macro   NUMS  whitespace   \t,  \n,  \b,  ‘  ‘  

•  Lexemes  that  are  recognized  but  get  consumed  rather  than  transmiaed  to  parser  –  If  –  i/*comment*/f  

133

Page 134: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

How  can  we  define  tokens?  

•  Keywords  –  easy!  –  if,  then,  else,  for,  while,  …    

•  Iden(fiers?    •  Numerical  Values?  •  Strings?  

•  Characterize  unbounded  sets  of  values  using  a  bounded  descrip(on?  

134

Page 135: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Lecture  Outline  

ü Role  &  place  of  lexical  analysis  ü What  is  a  token?  •  Regular  languages  •  Lexical  analysis  •  Error  handling  •  Automa(c  crea(on  of  lexical  analyzers    

135

Page 136: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Regular  languages

•  Formal  languages  –  Σ            =  finite  set  of  leaers  – Word                =  sequence  of  leaer  –  Language  =  set  of  words  

•  Regular  languages  defined  equivalently  by  –  Regular  expressions  –  Finite-­‐state  automata  

136

Page 137: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Common  format  for  reg-­‐expsBasic  Pa4erns   Matching  

x     The  character  x  

.   Any  character,  usually  except  a  new  line  

[xyz]   Any  of  the  characters  x,y,z  

^x   Any  character  except  x  

Repe;;on  Operators  

R?   An  R  or  nothing  (=op(onally  an  R)  

R*   Zero  or  more  occurrences  of  R  

R+   One  or  more  occurrences  of  R  

Composi;on  Operators  

R1R2   An  R1  followed  by  R2  

R1|R2   Either  an  R1  or  R2  

Grouping  

(R)   R  itself   137

Page 138: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Examples  

•  ab*|cd?  =    •  (a|b)*  =  •  (0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9)*  =    

138

Page 139: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Escape  characters  

• What  is  the  expression  for  one  or  more  +  symbols?  – (+)+  won’t  work  – (\+)+  will  

•  backslash  \  before  an  operator  turns  it  to  standard  character  – \*,  \?,  \+,  a\(b\+\*,  (a\(b\+\*)+,  …  

•  backslash  double  quotes  surrounds  text  – “a(b+*”,  “a(b+*”+   139

Page 140: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Shorthands  

•  Use  names  for  expressions  –  leaer  =  a  |  b  |  …  |  z  |  A  |  B  |  …  |  Z  –  leaer_  =  leaer  |  _  –  digit  =  0  |  1  |  2  |  …  |  9  –  id  =  leaer_  (leaer_  |  digit)*  

•  Use  hyphen  to  denote  a  range  –  leaer  =  a-­‐z  |  A-­‐Z  –  digit  =  0-­‐9  

140

Page 141: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Examples  

•  if  =  if  •  then  =  then  •  relop  =  <  |  >  |  <=  |  >=  |  =  |  <>  

•  digit  =  0-­‐9  •  digits  =  digit+    

141

Page 142: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Example  

•  A  number  is  

 number  =  (  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  )+      (  ε  |  .  (  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  )+    

     (  ε  |  E  (  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  )+                  )    

 )  

•  Using  shorthands  it  can  be  written  as      number  =  digits  (ε  |  .digits  (ε  |  E  (ε|+|-­‐)  digits  )  )  

142

Page 143: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Exercise  1  -­‐  Ques(on

•  Language  of  ra(onal  numbers  in  decimal  representa(on  (no  leading,  ending  zeros)  –  0  –  123.757  –  .933333  – Not  007  – Not  0.30  

143

Page 144: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Exercise  1  -­‐  Answer

•  Language  of  ra(onal  numbers  in  decimal  representa(on  (no  leading,  ending  zeros)  

–  Digit  =  1|2|…|9  Digit0          =  0|Digit  Num  =  Digit  Digit0*  Frac  =  Digit0*  Digit    Pos    =  Num  |  .Frac  |  0.Frac|  Num.Frac  PosOrNeg  =  (Є|-­‐)Pos  R    =  0  |  PosOrNeg  

144

Page 145: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Exercise  2  -­‐  Ques(on  

•  Equal  number  of  opening  and  closing  parenthesis:  [n]n  =  [],  [[]],  [[[]]],  …  

145

Page 146: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Exercise  2  -­‐  Answer  

•  Equal  number  of  opening  and  closing  parenthesis:  [n]n  =  [],  [[]],  [[[]]],  …  

•  Not  regular  •  Context-­‐free  •  Grammar:  S  ::=  []  |  [S]  

146

Page 147: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Challenge:  Ambiguity  

•  if  =  if  •  id  =  leaer_  (leaer_  |  digit)*  

•  “if”  is  a  valid  word  in  the  language  of  iden(fiers…  so  what  should  it  be?  

•  How  about  the  iden(fier  “iffy”?  

•  Solu(on  –  Always  find  longest  matching  token  –  Break  (es  using  order  of  defini(ons…  first  defini(on  wins  (=>  list  rules  for  keywords  before  iden(fiers)   147

Page 148: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

Crea(ng  a  lexical  analyzer  

•  Given  a  list  of  token  defini(ons  (paaern  name,  regex),  write  a  program  such  that  –  Input:  String  to  be  analyzed  – Output:  List  of  tokens    

•  How  do  we  build  an  analyzer?  

148

Page 149: Compilaon - TAUmaon/teaching/2014-2015/compilation/compilatio… · Admin* • Compiler*Project40%* – 4.5*prac(cal*exercises* – Groupsof3 • 1*theore(cal*exercise*10%* – Groupsof1

149