Levels of Programming Languages

Page 1: Levels of Programming Languages

Review

Levels of Programming Languages

High-level program:
    class Triangle { ... float surface() return b*h/2; }

Low-level program:
    LOAD r1,b
    LOAD r2,h
    MUL r1,r2
    DIV r1,#2
    RET

Executable machine code:
    0001001001000101001001001110110010101101001...

Page 2: Levels of Programming Languages

Compilers and other translators

Examples:
• Chinese => English
• Java => JVM byte codes
• Scheme => C
• C => Scheme
• x86 Assembly Language => x86 binary codes

Other non-traditional examples: disassembler, decompiler (e.g. JVM => Java)

Page 3: Levels of Programming Languages

Tombstone Diagrams

What are they?
– diagrams consisting of a set of “puzzle pieces” that we can use to reason about language processors and programs
– different kinds of pieces
– combination rules (not all diagrams are “well formed”)

Kinds of pieces:
– M: machine M implemented in hardware
– S -> T over L: translator from S to T implemented in L
– M over L: interpreter for language M implemented in L
– P over L: program P implemented in L

Page 4: Levels of Programming Languages

Syntax Specification

Syntax is specified using “Context Free Grammars” (CFGs):
– A finite set of terminal symbols
– A finite set of non-terminal symbols
– A start symbol
– A finite set of production rules

Usually CFGs are written in “Backus-Naur Form” (BNF) notation.

A production rule in BNF notation is written as:
    N ::= α
where N is a non-terminal and α is a sequence of terminals and non-terminals.

N ::= α | β | ... is an abbreviation for several rules with N as left-hand side.

Page 5: Levels of Programming Languages

Concrete and Abstract Syntax

The previous grammar specified the concrete syntax of Mini Triangle.

The concrete syntax is important for the programmer who needs to know exactly how to write syntactically well-formed programs.

The abstract syntax omits irrelevant syntactic details and only specifies the essential structure of programs.

Example: different concrete syntaxes for an assignment
    v := e
    (set! v e)
    e -> v
    v = e

Page 6: Levels of Programming Languages

Abstract Syntax Trees

Abstract Syntax Tree for: d:=d+10*n

AssignmentCmd
  SimpleVName
    Ident d
  BinaryExpression
    VNameExp
      SimpleVName
        Ident d
    Op +
    BinaryExpression
      IntegerExp
        Int-Lit 10
      Op *
      VNameExp
        SimpleVName
          Ident n
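As a rough illustration of how such a tree might be represented in Java, here is a minimal sketch. The class names echo the node labels on the slide, but the fields, constructors and the `AstDemo` helper are assumptions made for this example, and the SimpleVName/Ident layers are collapsed into a plain string to keep it short.

```java
// Hypothetical AST node classes; names echo the slide's node labels, nothing here is a real library.
abstract class Expr { }

class IntegerExp extends Expr {
    final int value;
    IntegerExp(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
}

class VNameExp extends Expr {
    final String ident;   // SimpleVName and Ident collapsed into one string (simplification)
    VNameExp(String ident) { this.ident = ident; }
    @Override public String toString() { return ident; }
}

class BinaryExpression extends Expr {
    final Expr left;
    final char op;
    final Expr right;
    BinaryExpression(Expr left, char op, Expr right) {
        this.left = left;
        this.op = op;
        this.right = right;
    }
    // Fully parenthesized, so the tree shape is visible in the printed form.
    @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
}

class AssignmentCmd {
    final String target;
    final Expr value;
    AssignmentCmd(String target, Expr value) {
        this.target = target;
        this.value = value;
    }
    @Override public String toString() { return target + " := " + value; }
}

class AstDemo {
    // Builds the tree for d := d + 10*n; the multiplication is the inner node.
    static AssignmentCmd example() {
        return new AssignmentCmd("d",
            new BinaryExpression(new VNameExp("d"), '+',
                new BinaryExpression(new IntegerExp(10), '*', new VNameExp("n"))));
    }

    public static void main(String[] args) {
        System.out.println(example());   // prints: d := (d + (10 * n))
    }
}
```

The concrete-syntax details (the `:=` token, operator precedence) are gone from the tree itself; precedence survives only as nesting.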

Page 7: Levels of Programming Languages

Contextual Constraints

Syntax rules alone are not enough to specify the format of well-formed programs.

Example 1:
    let const m~2
    in m + x
Undefined! x is never declared, which violates the scope rules.

Example 2:
    let const m~2 ; var n:Boolean
    in begin n := m<4; n := n+1 end
Type error! n is a Boolean, so n+1 violates the type rules.

Page 8: Levels of Programming Languages

Semantics

Specification of semantics is concerned with specifying the “meaning” of well-formed programs.

Terminology:

Expressions are evaluated and yield values (and may or may not perform side effects)

Commands are executed and perform side effects.

Declarations are elaborated to produce bindings.

Side effects:
• change the values of variables
• perform input/output

Page 9: Levels of Programming Languages

Phases of a Compiler

A compiler’s phases are steps in transforming source code into object code.

The different phases correspond roughly to the different parts of the language specification:

• Syntax analysis <-> Syntax
• Contextual analysis <-> Contextual constraints
• Code generation <-> Semantics

Page 10: Levels of Programming Languages

Compiler Passes

• A pass is a complete traversal of the source program, or a complete traversal of some internal representation of the source program.

• A pass can correspond to a “phase” but it does not have to!

• Sometimes a single “pass” corresponds to several phases that are interleaved in time.

• Which passes a compiler makes over the source program, and how many, is an important design decision.

Page 11: Levels of Programming Languages

Syntax Analysis

Dataflow chart:

Source Program -> (stream of characters) -> Scanner -> (stream of “tokens”) -> Parser -> Abstract Syntax Tree

Both the scanner and the parser can also produce error reports.

Page 12: Levels of Programming Languages

Regular Expressions

• RE are a notation for expressing a set of strings of terminal symbols.

Different kinds of RE:
    ε       The empty string
    t       Generates only the string t
    X Y     Generates any string xy such that x is generated by X and y is generated by Y
    X | Y   Generates any string which is generated either by X or by Y
    X*      The concatenation of zero or more strings generated by X
    (X)     For grouping
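The operators above can be tried out with Java's standard `java.util.regex` package. The following is only a sketch: the class name `RegexKinds`, the constant names and the two example patterns are assumptions made for illustration, and the patterns deliberately use nothing beyond concatenation, `|`, `*`, grouping and an empty alternative standing in for ε.

```java
import java.util.regex.Pattern;

class RegexKinds {
    // digit digit* written with alternation, grouping, concatenation and repetition only
    static final String INT_LITERAL =
        "(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*";

    // ':' followed by either '=' or the empty string, so it generates ":" and ":="
    static final String COLON_OR_BECOMES = ":(=|)";

    // True if the whole string s is in the set of strings generated by re.
    static boolean generates(String re, String s) {
        return Pattern.matches(re, s);
    }

    public static void main(String[] args) {
        System.out.println(generates(INT_LITERAL, "2024"));
        System.out.println(generates(INT_LITERAL, "12a"));
        System.out.println(generates(COLON_OR_BECOMES, ":="));
    }
}
```

`Pattern.matches` tests the entire input, which corresponds to asking whether a string is generated by the expression rather than whether it merely contains a match.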

Page 13: Levels of Programming Languages

FA and the implementation of Scanners

• Regular expressions, NDFA-ε, NDFA and DFA are all equivalent formalisms in terms of the languages that can be defined with them.

• Regular expressions are a convenient notation for describing the “tokens” of programming languages.

• Regular expressions can be converted into FAs (the algorithm for conversion into NDFA-ε is straightforward)

• DFAs can be easily implemented as computer programs.
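To make the last point concrete, here is a minimal sketch of a hand-coded DFA in Java. The token classes (identifiers and integer literals), the state names and the `TokenDfa` class are assumptions chosen for this example, not the scanner of any particular compiler.

```java
class TokenDfa {
    // States of a small hypothetical DFA; IN_IDENT and IN_INT are the accepting states.
    static final int START = 0, IN_IDENT = 1, IN_INT = 2, ERROR = 3;

    // Transition function: exactly one next state for every (state, character) pair.
    static int step(int state, char c) {
        switch (state) {
            case START:
                if (Character.isLetter(c)) return IN_IDENT;
                if (Character.isDigit(c)) return IN_INT;
                return ERROR;
            case IN_IDENT:
                return Character.isLetterOrDigit(c) ? IN_IDENT : ERROR;
            case IN_INT:
                return Character.isDigit(c) ? IN_INT : ERROR;
            default:
                return ERROR;   // ERROR is a trap state
        }
    }

    // Runs the DFA over the whole input and reports which accepting state, if any, it ends in.
    static String classify(String input) {
        int state = START;
        for (int i = 0; i < input.length(); i++) {
            state = step(state, input.charAt(i));
        }
        switch (state) {
            case IN_IDENT: return "IDENTIFIER";
            case IN_INT:   return "INTLITERAL";
            default:       return "ERROR";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("n1"));
        System.out.println(classify("42"));
        System.out.println(classify("4x"));
    }
}
```

A generated scanner would typically encode `step` as a table rather than a `switch`, but the control structure is the same: one state variable and one transition per input character.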

Page 14: Levels of Programming Languages

JFlex Lexical Analyzer Generator for Java

Definition of tokens (regular expressions) -> JFlex -> Java file: Scanner class (recognizes tokens)

Page 15: Levels of Programming Languages

Parsing

Parsing == Recognition + determining phrase structure (for example by generating AST)

– Different types of parsing strategies

• bottom up

• top down

– Recursive descent parsing

• What is it

• How to implement one given an EBNF specification

– Bottom up parsing algorithms

Page 16: Levels of Programming Languages

Top-down parsing

[Figure: top-down parse of the sentence “The cat sees a rat .”. Starting from the root Sentence, the tree is expanded to Subject Verb Object ., then Subject to “The” Noun (“cat”), Verb to “sees”, and Object to “a” Noun (“rat”).]

Page 17: Levels of Programming Languages

Bottom up parsing

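[Figure: bottom-up parse of the sentence “The cat sees a rat .”. Starting from the words, “cat” is reduced to Noun and “The” Noun to Subject, “sees” to Verb, “rat” to Noun and “a” Noun to Object, and finally Subject Verb Object . to Sentence.]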

Page 18: Levels of Programming Languages

Development of Recursive Descent Parser

(1) Express grammar in EBNF

(2) Grammar Transformations: Left factorization and Left recursion elimination

(3) Create a parser class with
– private variable currentToken
– methods to call the scanner: accept and acceptIt

(4) Implement private parsing methods:
– add private parseN method for each non-terminal N
– public parse method that
  • gets the first token from the scanner
  • calls parseS (S is the start symbol of the grammar)
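The steps above can be sketched in Java as follows. The grammar (`Expr ::= Term ('+' Term)*`, `Term ::= digit | '(' Expr ')'`), the class name `RdParser`, the use of single characters as tokens and the `$` end marker are all assumptions made to keep the example small; a real parser would be driven by a separate scanner class.

```java
class RdParser {
    static final char EOT = '$';     // hypothetical end-of-text token
    private final String input;      // the "scanner" is just a cursor over the characters
    private int pos;
    private char currentToken;

    RdParser(String input) { this.input = input; }

    private char nextToken() {
        return pos < input.length() ? input.charAt(pos++) : EOT;
    }

    // Unconditionally moves on to the next token.
    private void acceptIt() { currentToken = nextToken(); }

    // Moves on only if the current token is the expected one.
    private void accept(char expected) {
        if (currentToken != expected) {
            throw new IllegalStateException(
                "expected '" + expected + "' but found '" + currentToken + "'");
        }
        acceptIt();
    }

    // Expr ::= Term ('+' Term)*
    private String parseExpr() {
        String tree = parseTerm();
        while (currentToken == '+') {
            acceptIt();
            tree = "(" + tree + "+" + parseTerm() + ")";
        }
        return tree;
    }

    // Term ::= digit | '(' Expr ')'
    private String parseTerm() {
        if (Character.isDigit(currentToken)) {
            String leaf = String.valueOf(currentToken);
            acceptIt();
            return leaf;
        }
        accept('(');
        String inner = parseExpr();
        accept(')');
        return inner;
    }

    // Gets the first token, parses the start symbol, then requires end of input.
    public String parse() {
        pos = 0;
        acceptIt();
        String tree = parseExpr();
        accept(EOT);
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(new RdParser("1+(2+3)+4").parse());   // prints: ((1+(2+3))+4)
    }
}
```

The returned string is a stand-in for an AST: its parentheses show the phrase structure the parser found. Each decision is made by looking only at `currentToken`, which is what restricts this technique to grammars with one-token lookahead.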

Page 19: Levels of Programming Languages

LL(1) Grammars

• The presented algorithm to convert EBNF into a parser does not work for all possible grammars.

• It only works for so-called “LL(1)” grammars.

• Basically, an LL(1) grammar is a grammar which can be parsed with a top-down parser with a lookahead (in the input stream of tokens) of one token.

• What grammars are LL(1)? How can we recognize that a grammar is (or is not) LL(1)?

=> We can deduce the necessary conditions from the parser generation algorithm.

Page 20: Levels of Programming Languages

LR parsing

– The algorithm makes use of a stack.

– The first item on the stack is the initial state of a DFA

– A state of the automaton is a set of LR(0)/LR(1) items.

– The initial state is constructed from productions of the form S ::= • α [, $] (where S is the start symbol of the CFG)

– The stack contains (in alternating order):

• A DFA state

• A terminal symbol or part (subtree) of the parse tree being constructed

– The items on the stack are related by transitions of the DFA

– There are two basic actions in the algorithm:

• shift: get next input token

• reduce: build a new node (remove children from stack)

Page 21: Levels of Programming Languages

Bottom Up Parsers: Overview of Algorithms

• LR(0): The simplest algorithm, theoretically important but rather weak (not practical)

• SLR: An improved version of LR(0), more practical but still rather weak.

• LR(1): LR(0) algorithm with an extra lookahead token.
  – very powerful algorithm. Not often used because of large memory requirements (very big parsing tables)

• LALR: “Watered down” version of LR(1)
  – still very powerful, but has much smaller parsing tables
  – most commonly used algorithm today

Page 22: Levels of Programming Languages

JavaCUP: A LALR generator for Java

Grammar (BNF-like specification) -> JavaCUP -> Java file: Parser class (uses the scanner to get tokens, parses the stream of tokens)

Definition of tokens (regular expressions) -> JFlex -> Java file: Scanner class (recognizes tokens)

Together, the generated Scanner and Parser classes form the syntactic analyzer.

Page 23: Levels of Programming Languages

Contextual Analysis -> Decorated AST

[Figure: decorated AST of a program that declares n: Integer and c: Char and then executes c := ‘&’; n := n + 1. The tree consists of Program, LetCommand, SequentialDeclaration, VarDecl, SimpleT, SequentialCommand, AssignCommand, BinaryExpr, VNameExp, SimpleV, Char.Expr and Int.Expr nodes, with Ident, Op, Char.Lit and Int.Lit leaves.]

Annotations:
– result of identification: each applied occurrence of an identifier is linked to its declaration
– result of type checking: expression nodes are annotated with their type (:char, :int)

Page 24: Levels of Programming Languages

Nested Block Structure

A language exhibits nested block structure if blocks may be nested one within another (typically with no upper bound on the level of nesting that is allowed).

There can be any number of scope levels (depending on the level of nesting of blocks):

Typical scope rules:

• no identifier may be declared more than once within the same block (at the same level).

• for any applied occurrence there must be a corresponding declaration, either within the same block or in a block in which it is nested.
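One common way to enforce these two rules is an identification table organized as a stack of scopes. The sketch below assumes that design; the class name `IdTable`, its method names and the use of plain strings as attributes are illustrative choices, not taken from the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class IdTable {
    // One map per open block; the innermost block is at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<Map<String, String>>();

    IdTable() { openScope(); }   // the outermost (global) level

    void openScope()  { scopes.push(new HashMap<String, String>()); }
    void closeScope() { scopes.pop(); }

    // Declares id at the current level; returns false if it is already declared there.
    boolean enter(String id, String attr) {
        Map<String, String> current = scopes.peek();
        if (current.containsKey(id)) return false;
        current.put(id, attr);
        return true;
    }

    // Looks id up from the innermost block outwards; returns null if it is undeclared.
    String retrieve(String id) {
        for (Map<String, String> scope : scopes) {   // iteration starts at the innermost scope
            String attr = scope.get(id);
            if (attr != null) return attr;
        }
        return null;
    }

    public static void main(String[] args) {
        IdTable table = new IdTable();
        table.enter("n", "int");
        table.openScope();
        table.enter("n", "bool");                    // legal: a different block level
        System.out.println(table.retrieve("n"));     // prints: bool
        table.closeScope();
        System.out.println(table.retrieve("n"));     // prints: int
    }
}
```

A `false` result from `enter` corresponds to a duplicate-declaration error, and a `null` result from `retrieve` to an undeclared identifier.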

Page 25: Levels of Programming Languages

Type Checking

For most statically typed programming languages, type checking is a bottom-up algorithm over the AST:

• Types of expression AST leaves are known immediately:
  – literals => obvious
  – variables => from the ID table
  – named constants => from the ID table

• Types of internal nodes are inferred from the types of the children and the type rule for that kind of expression
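A bottom-up checker of this kind might look like the following sketch. The node classes (`TcIntLit`, `TcVar`, `TcBinary`), the string type names and the two type rules for `+` and `<` are assumptions for illustration; the ID table is reduced to a map from identifier to type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical expression nodes; check() returns the node's type or throws on a type error.
abstract class TcExpr {
    abstract String check(Map<String, String> idTable);
}

class TcIntLit extends TcExpr {
    final int value;
    TcIntLit(int value) { this.value = value; }
    String check(Map<String, String> idTable) { return "int"; }   // literal: type is obvious
}

class TcVar extends TcExpr {
    final String name;
    TcVar(String name) { this.name = name; }
    String check(Map<String, String> idTable) {                   // leaf: type from the ID table
        String type = idTable.get(name);
        if (type == null) throw new IllegalStateException(name + " is not declared");
        return type;
    }
}

class TcBinary extends TcExpr {
    final TcExpr left;
    final String op;
    final TcExpr right;
    TcBinary(TcExpr left, String op, TcExpr right) {
        this.left = left;
        this.op = op;
        this.right = right;
    }
    String check(Map<String, String> idTable) {
        // Children first (bottom-up), then apply the type rule for this operator.
        String l = left.check(idTable);
        String r = right.check(idTable);
        boolean bothInt = l.equals("int") && r.equals("int");
        if (op.equals("+") && bothInt) return "int";    // int + int : int
        if (op.equals("<") && bothInt) return "bool";   // int < int : bool
        throw new IllegalStateException("type error: " + l + " " + op + " " + r);
    }
}

class TypeCheckDemo {
    public static void main(String[] args) {
        Map<String, String> ids = new HashMap<String, String>();
        ids.put("m", "int");
        ids.put("n", "bool");
        System.out.println(new TcBinary(new TcVar("m"), "<", new TcIntLit(4)).check(ids));
    }
}
```

With `m` declared as an integer and `n` as a Boolean, `m<4` checks to `bool`, while `n+1` is rejected because no rule for `+` accepts a Boolean operand.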

Page 26: Levels of Programming Languages

Runtime organization

• Data Representation: how to represent values of the source language on the target machine.
  – Primitives, arrays, structures, unions, pointers

• Expression Evaluation: how to organize computing the values of expressions (taking care of intermediate results)
  – Register vs. stack machine

• Storage Allocation: how to organize storage for variables (considering different lifetimes of global, local and heap variables)
  – Activation records, static links

• Routines: how to implement procedures, functions (and how to pass their parameters and return values)
  – Value vs. reference, closures, recursion

• Object Orientation: runtime organization for OO languages
  – Method tables

Page 27: Levels of Programming Languages

Tricky sort

[Figure: stack diagram showing activations of p (with n:23, n:15, n:7 and n:88), check and identity (with i:88).]

Page 28: Levels of Programming Languages

JVM

The JVM is an abstract machine in the true sense of the word.

The JVM spec. does not specify implementation details (can be dependent on target OS/platform, performance requirements etc.)

The JVM spec defines a machine independent “class file format” that all JVM implementations must support.

[Figure: .class files (external representation, platform independent) are loaded into the JVM (internal representation, implementation dependent), which holds classes, methods, objects, arrays, integers and other primitive types.]

Page 29: Levels of Programming Languages

Inspecting JVM code

% javac Factorial.java
% javap -c -verbose Factorial
Compiled from Factorial.java
public class Factorial extends java.lang.Object {
    public Factorial();
        /* Stack=1, Locals=1, Args_size=1 */
    public int fac(int);
        /* Stack=2, Locals=4, Args_size=2 */
}

Method Factorial()
   0 aload_0
   1 invokespecial #1 <Method java.lang.Object()>
   4 return

Page 30: Levels of Programming Languages

Inspecting JVM Code ...

// address:                     0    1 2      3
Method int fac(int)  // stack: this n result i
   0 iconst_1        // stack: this n result i 1
   1 istore_2        // stack: this n result i
   2 iconst_2        // stack: this n result i 2
   3 istore_3        // stack: this n result i
   4 goto 14
   7 iload_2         // stack: this n result i result
   8 iload_3         // stack: this n result i result i
   9 imul            // stack: this n result i result*i
  10 istore_2
  11 iinc 3 1
  14 iload_3         // stack: this n result i i
  15 iload_1         // stack: this n result i i n
  16 if_icmple 7     // stack: this n result i
  19 iload_2         // stack: this n result i result
  20 ireturn
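The slides do not show Factorial.java itself. The following is a plausible reconstruction inferred from the bytecode: the local names `result` and `i` come from the stack comments, and the loop shape from the `goto 14` / `if_icmple 7` pair. The `public` modifier on the class is dropped here so that the snippet compiles under any file name, and the `main` method is added only for demonstration.

```java
class Factorial {
    // Locals: 0 = this, 1 = n, 2 = result, 3 = i (matching Locals=4 in the javap output).
    public int fac(int n) {
        int result = 1;              // iconst_1, istore_2
        int i = 2;                   // iconst_2, istore_3
        while (i <= n) {             // iload_3, iload_1, if_icmple 7
            result = result * i;     // iload_2, iload_3, imul, istore_2
            i++;                     // iinc 3 1
        }
        return result;               // iload_2, ireturn
    }

    public static void main(String[] args) {
        System.out.println(new Factorial().fac(5));   // prints: 120
    }
}
```

A `for` loop with the same initialization, test and increment would be compiled to the same shape, so the original source may equally well have used one.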

Page 31: Levels of Programming Languages

Code Generation

Source program:
    let var n: integer;
        var c: char
    in begin
        c := ‘&’;
        n := n+1
    end

Target program:
    PUSH 2
    LOADL 38
    STORE 1[SB]
    LOAD 0
    LOADL 1
    CALL add
    STORE 0[SB]
    POP 2
    HALT

Source and target program must be “semantically equivalent”.

Semantic specification of the source language (SL) is structured in terms of phrases in the SL: expressions, commands, etc.
=> Code generation follows the same “inductive” structure.

Page 32: Levels of Programming Languages

Specifying Code Generation with Code Templates

The code generation functions for Mini Triangle

Phrase class    Function       Effect of the generated code
Program         run P          Run program P then halt. Starting and finishing with empty stack.
Command         execute C      Execute command C. May update variables but does not shrink or grow the stack!
Expression      evaluate E     Evaluate E, net result is pushing the value of E on the stack.
V-name          fetch V        Push value of constant or variable on the stack.
V-name          assign V       Pop value from stack and store in variable V.
Declaration     elaborate D    Elaborate declaration, make space on the stack for constants and variables in the declaration.

Page 33: Levels of Programming Languages

Code Generation with Code Templates

While command:

execute [while E do C] =
        JUMP h
    g:  execute [C]
    h:  evaluate [E]
        JUMPIF(1) g
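As a rough sketch of how a code generator could instantiate this template, the Java method below assembles the instruction sequence from already-generated code for E and C. The class `WhileTemplate`, the representation of instructions as strings and the explicit label parameters are assumptions made for illustration; a real code generator would emit addresses and backpatch the jumps instead of using symbolic labels.

```java
import java.util.ArrayList;
import java.util.List;

class WhileTemplate {
    // Instantiates execute[while E do C].
    // evaluateE and executeC stand in for the code already generated for E and C;
    // g and h are label names supplied by the caller (hypothetical, for readability).
    static List<String> executeWhile(List<String> evaluateE, List<String> executeC,
                                     String g, String h) {
        List<String> code = new ArrayList<String>();
        code.add("JUMP " + h);         // on entry, go straight to the condition
        code.add(g + ":");
        code.addAll(executeC);         // execute [C]
        code.add(h + ":");
        code.addAll(evaluateE);        // evaluate [E], leaving a truth value on the stack
        code.add("JUMPIF(1) " + g);    // repeat the body while E is true
        return code;
    }

    public static void main(String[] args) {
        List<String> cond = new ArrayList<String>();
        cond.add("<code for E>");
        List<String> body = new ArrayList<String>();
        body.add("<code for C>");
        for (String line : executeWhile(cond, body, "g", "h")) {
            System.out.println(line);
        }
    }
}
```

Placing the condition after the body means each iteration executes only one jump instruction, the conditional `JUMPIF(1) g`, at the cost of a single extra `JUMP h` on entry.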

Page 34: Levels of Programming Languages

Code improvement (optimization)

The code generated by our compiler is not efficient:
• It computes values at runtime that could be known at compile time
• It computes values more times than necessary

We can do better!
• Constant folding
• Common sub-expression elimination
• Code motion
• Dead code elimination
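Constant folding, the first technique in the list, can be sketched as a bottom-up rewrite of the expression tree. The node classes (`CfInt`, `CfVar`, `CfBin`) and the `ConstantFolder` helper below are assumptions for this example, and only `+` and `*` on integer literals are folded.

```java
// Hypothetical expression nodes used only for this sketch.
abstract class CfExpr { }

class CfInt extends CfExpr {
    final int value;
    CfInt(int value) { this.value = value; }
}

class CfVar extends CfExpr {
    final String name;
    CfVar(String name) { this.name = name; }
}

class CfBin extends CfExpr {
    final CfExpr left;
    final char op;
    final CfExpr right;
    CfBin(CfExpr left, char op, CfExpr right) {
        this.left = left;
        this.op = op;
        this.right = right;
    }
}

class ConstantFolder {
    // Replaces every operator node whose operands are both literals by the computed literal.
    static CfExpr fold(CfExpr e) {
        if (!(e instanceof CfBin)) return e;        // literals and variables stay as they are
        CfBin b = (CfBin) e;
        CfExpr l = fold(b.left);                    // fold the children first
        CfExpr r = fold(b.right);
        if (l instanceof CfInt && r instanceof CfInt) {
            int x = ((CfInt) l).value;
            int y = ((CfInt) r).value;
            switch (b.op) {
                case '+': return new CfInt(x + y);
                case '*': return new CfInt(x * y);
                default:  break;                    // unknown operator: leave it unfolded
            }
        }
        return new CfBin(l, b.op, r);
    }

    // Prints an expression fully parenthesized.
    static String show(CfExpr e) {
        if (e instanceof CfInt) return Integer.toString(((CfInt) e).value);
        if (e instanceof CfVar) return ((CfVar) e).name;
        CfBin b = (CfBin) e;
        return "(" + show(b.left) + b.op + show(b.right) + ")";
    }

    public static void main(String[] args) {
        CfExpr e = new CfBin(new CfVar("n"), '+', new CfBin(new CfInt(2), '*', new CfInt(5)));
        System.out.println(show(fold(e)));          // prints: (n+10)
    }
}
```

The sub-expression `2*5` is computed once at compile time, while `n+10` has to stay because the value of `n` is only known at runtime.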

Page 35: Levels of Programming Languages

Optimization implementation

• Is the optimization correct or safe?
• Is the optimization an improvement?
• What sort of analyses do we need to perform to get the required information?
  – Local
  – Global