Formalization of Generics for the .NET Common Language Runtime
Dachuan Yu (Yale University); Andrew Kennedy, Don Syme (Microsoft Research Cambridge)

Introduction
  • Upcoming revision of the Microsoft .NET platform includes support for parametric polymorphism (“generics”) in:
      • Programming languages: C#, Visual Basic, Managed C++
      • Common Language Runtime (the “virtual machine”)
      • Visual Studio (Integrated Development Environment)
      • Libraries
  • Previous work (PLDI’01) described implementation techniques used in the CLR
  • Now we formalize the polymorphic intermediate language and aspects of the implementation

CLR: The big picture
[Figure. Diagram labels: a C# program, a Visual Basic program and an SML.NET program, each passed through its own compiler (C# compiler, Visual Basic compiler, SML.NET compiler) to IL. Common Language Runtime: loader & JIT front-end, JIT IL, JIT code-gen, machine code, native binary. Runtime services: garbage collector, native interop, security, remoting, exception handling, threads.]

High-level design of generics
  • Type parameterization for all declarations
      • classes, e.g. class Set<T>
      • interfaces, e.g. interface IComparable<T>
      • structs, e.g. struct HashBucket<K,D>
      • methods, e.g. static void Reverse<T>(T[] arr)
      • delegates (“first-class methods”), e.g. delegate void Action<T>(T arg)
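The five declaration forms listed above can be seen together in one small program. This is only an illustrative sketch: the declaration names come from the slide, but every member body, the `GenericsSketch` namespace and the `Program` class are assumptions made so that the example compiles and runs.

```csharp
namespace GenericsSketch
{
    using System.Collections.Generic;

    // Generic interface: the type parameter appears in a member signature.
    interface IComparable<T> { int CompareTo(T other); }

    // Generic struct (value type) with two type parameters.
    struct HashBucket<K, D>
    {
        public K Key;
        public D Data;
        public HashBucket(K key, D data) { Key = key; Data = data; }
    }

    // Generic delegate ("first-class method").
    delegate void Action<T>(T arg);

    // Generic class; the body is a hypothetical list-backed set.
    class Set<T>
    {
        private readonly List<T> items = new List<T>();
        public int Count => items.Count;
        public void Add(T item) { if (!items.Contains(item)) items.Add(item); }
        public void ForEach(Action<T> action) { foreach (T item in items) action(item); }
    }

    static class ArrayUtils
    {
        // Generic method: T is chosen (here inferred) at each call site.
        public static void Reverse<T>(T[] arr)
        {
            for (int i = 0, j = arr.Length - 1; i < j; i++, j--)
            {
                T tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
            }
        }
    }

    static class Program
    {
        static void Main()
        {
            int[] xs = { 1, 2, 3 };
            ArrayUtils.Reverse(xs);                           // T inferred as int
            System.Console.WriteLine(string.Join(",", xs));   // 3,2,1

            var names = new Set<string>();
            names.Add("a"); names.Add("a"); names.Add("b");
            System.Console.WriteLine(names.Count);            // 2

            var bucket = new HashBucket<string, double>("pi", 3.14);
            Action<string> print = s => System.Console.WriteLine(s);
            names.ForEach(print);                             // prints a, then b
            print(bucket.Key);                                // pi
        }
    }
}
```

Declaring the types inside their own namespace keeps the slide's names (`IComparable<T>`, `Action<T>`) from clashing with the library types of the same name.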

Good design => Tricky implementation
  • Unrestricted instantiation

        List<string> ls = new List<string>();   // reference types
        List<double> ld = …                     // primitive types
        List<Pair<string,double>> lsd = …       // struct types

  • Full support for run-time types

        if (x is Set<string>) { ... }   // type-test
        y = (List<T>) z;                // checked cast

  • Recursion in instantiations

        class List<T> : ICloneable<List<T>>   // finite
        class C<T> { C<C<T>> fld; }           // infinite
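A short runnable sketch of what "full support for run-time types" means in practice: a type test and a checked cast against an instantiation that mentions the method's own type parameter. The class and method names (`RuntimeTypesDemo`, `IsListOf`, `AsListOf`) are hypothetical and chosen only for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class RuntimeTypesDemo
{
    // Type test: the answer depends on the exact run-time instantiation of x.
    public static bool IsListOf<T>(object x) => x is List<T>;

    // Checked cast: throws InvalidCastException if z is not a List<T>.
    public static List<T> AsListOf<T>(object z) => (List<T>)z;

    static void Main()
    {
        object ls = new List<string>();

        Console.WriteLine(IsListOf<string>(ls));   // True
        Console.WriteLine(IsListOf<double>(ls));   // False
        Console.WriteLine(IsListOf<object>(ls));   // False: List<string> is not a List<object>

        try
        {
            AsListOf<double>(ls);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("cast failed");      // the cast is checked at run time
        }
    }
}
```

Because `IsListOf<T>` must distinguish `List<string>` from `List<double>`, the instantiation of `T` has to be available when the method runs, which is why type arguments cannot simply be erased.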

Why formalize?
  • In previous work (POPL’01, Gordon & Syme) the aim was a type soundness proof for a subset of IL (Baby IL)
  • Our aims are different:
      • Implementation techniques used in the CLR product are subtle and difficult to get right (=> bugs, perhaps security holes)
          • We’d like to validate those techniques
      • Current JIT- and pre-compilers are not type-preserving
          • Our formalization provides a basis for typed compiler intermediate languages for more capable and robust compilers
      • It’s also difficult to express and apply optimizations
          • Formalization makes this easier
  • By-product is a generic variant of Baby IL

Formalization: the big picture
  • BILG classes and methods
      • BILG = “Baby IL with Generics”
      • A tiny subset of MS-IL
  • Translation from BILG to BILC:
      • Specialize generic classes and methods
      • Share instantiations w.r.t. data representation
      • Introduce types-as-values
  • BILC classes and methods
      • BILC = “Baby IL with Constrained generics”
      • A typed intermediate language more suitable for code-generation
  • Optimize use of types-as-values

Illustrative example, in C#

    class ArrayUtils {
      static List<T> ArrayToList<T>(T[] arr) {
        … new List<T>() …
      }
    }

    class List<T> {
      virtual List<T> Append(object obj) {
        … (List<T>) obj …
        … new ListCell<T> …
      }
    }

  • ArrayToList:
      • Want to share generated code for ArrayToList over different instantiations of T
      • Pass type parameters at runtime?
      • Look up type representations at runtime?
  • List:
      • Want to share generated code for List over different instantiations of T
      • How do we know what T is?
      • Look up type representations at runtime?

Source Language: BILG
  • “Baby IL with Generics”
  • Purely functional, à la Featherweight Java (Igarashi, Pierce, Wadler)
      • Primitive types & generic classes
      • Inheritance-based subtyping
      • Generic methods (static and virtual)
      • Type-case operation (isinst) inspects run-time type of object
      • No overloading, no interfaces, no abstract methods, no structs (“value classes”), no delegates, no boxing, no null values, no heap, no bounded polymorphism
  • Just enough to demonstrate most of the implementation techniques!
  • Typing rules & big-step semantics in paper
      • Easier to work with big-step
      • ¬∃v. e ⇓ v taken as definition of divergence

Source language: BILG
  (type)        T, U ::= X | int32 | int64 | I
  (inst type)   I    ::= C<T1,…,Tn>
  (class def)   cd   ::= class C<X1,…,Xn> : I { T1 f1; …; Tm fm; md1 … mdk }
  (method def)  md   ::= static T m<X1,…,Xn>(T1,…,Tm) { e; }
                       | virtual T m<X1,…,Xn>(T1,…,Tm) { e; }
  (method ref)  M    ::= I::m<T1,…,Tn>
  (expr)        e    ::= ldc.i4 i4 | ldc.i8 i8 | ldarg x
                       | e1 … en newobj I
                       | e ldfld I::f
                       | e1 … en call M
                       | e e1 … en callvirt M
                       | e isinst I or e

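BILG typing and evaluation for isinst

Typing:

$$\frac{E \vdash e : I \qquad E \vdash e' : I'}{E \vdash e\ \mathtt{isinst}\ I'\ \mathtt{or}\ e' : I'}$$

Evaluation, when the instance test succeeds:

$$\frac{\mathit{fr} \vdash e \Downarrow I'(f_1 = v_1,\ldots,f_n = v_n) \qquad \vdash I' <: I}{\mathit{fr} \vdash e\ \mathtt{isinst}\ I\ \mathtt{or}\ e' \Downarrow I'(f_1 = v_1,\ldots,f_n = v_n)}$$

Evaluation, when the instance test fails:

$$\frac{\mathit{fr} \vdash e \Downarrow I'(f_1 = v_1,\ldots,f_n = v_n) \qquad \vdash \neg(I' <: I) \qquad \mathit{fr} \vdash e' \Downarrow v'}{\mathit{fr} \vdash e\ \mathtt{isinst}\ I\ \mathtt{or}\ e' \Downarrow v'}$$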

Observe:
  • Types affect evaluation
  • They cannot be erased
  • They serve static and dynamic purposes

Target Language: BILC
  • Similar to BILG, but adds
      • Representation constraints on type parameters
          • ref: “must be a reference type”
          • i4: “must be a 32-bit integer”
          • i8: “must be a 64-bit integer”
      • Types-as-values
          • R_T is a value representing closed type T
          • The value R_T has singleton type Rep(T), interpreted as “is a value representing the type T”
          • Construct reps for open types: mkrep_{C<T1,…,Tn>}(e1,…,en) creates a type-rep for C<T1,…,Tn> given type-reps for T1,…,Tn
  • Semantics given by small-step reduction relation
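The types-as-values idea can be modelled in a few lines of C#. This is a sketch under stated assumptions, not the CLR's mechanism: `TypeRep`, `Rep<T>`, `Reps`, the string keys and the interning table are all hypothetical. `Rep<T>` plays the role of the singleton type Rep(T) through a phantom type parameter, and `MkRepList` plays the role of mkrep for the class `List`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Untyped run-time representation of a closed type (hypothetical model).
sealed class TypeRep
{
    public string Key { get; }                       // e.g. "List<int32>"
    private TypeRep(string key) { Key = key; }

    // Assumption of this sketch: reps are interned, one object per distinct type.
    private static readonly Dictionary<string, TypeRep> Table =
        new Dictionary<string, TypeRep>();

    // Find or create the rep for name<args>.
    public static TypeRep MkRep(string name, params TypeRep[] args)
    {
        string key = args.Length == 0
            ? name
            : name + "<" + string.Join(",", args.Select(a => a.Key)) + ">";
        if (!Table.TryGetValue(key, out TypeRep rep))
        {
            rep = new TypeRep(key);
            Table[key] = rep;
        }
        return rep;
    }

    public override string ToString() => Key;
}

// Rep<T> stands in for Rep(T): the phantom parameter T records statically
// which type the wrapped run-time value represents.
sealed class Rep<T>
{
    public TypeRep Value { get; }
    internal Rep(TypeRep value) { Value = value; }
}

static class Reps
{
    // A rep for a closed type, like R_int32.
    public static readonly Rep<int> Int32 = new Rep<int>(TypeRep.MkRep("int32"));

    // Like mkrep for List: given a rep for T, produce a rep for List<T>.
    public static Rep<List<T>> MkRepList<T>(Rep<T> elem) =>
        new Rep<List<T>>(TypeRep.MkRep("List", elem.Value));
}

static class Program
{
    static void Main()
    {
        Rep<List<int>> li = Reps.MkRepList(Reps.Int32);
        Rep<List<List<int>>> lli = Reps.MkRepList(li);
        Console.WriteLine(lli.Value);   // List<List<int32>>

        // Interning makes equal types share one rep object.
        Console.WriteLine(object.ReferenceEquals(
            li.Value, Reps.MkRepList(Reps.Int32).Value));   // True
    }
}
```

The static type of `MkRepList` ensures that a rep for `List<T>` can only be built from a rep for that same `T`, which is the property the singleton type Rep(T) is meant to capture.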

Target language: BILC (subset)
  (type)            T, U ::= X | int32 | int64 | I
  (inst type)       I    ::= C<T1,…,Tn>
  (extended types)  τ    ::= T | Rep(T)
  (constraint)      s    ::= ref | i4 | i8
  (class def)       cd   ::= class C<X1:s1,…,Xn:sn> : I { T1 f1; …; Tm fm; md1 … mdk }
  (method def)      md   ::= static T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; }
                           | virtual T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; }
  (method ref)      M    ::= I::m<T1,…,Tn>
  (expr)            e    ::= i4 | i8 | x
                           | I(e,e1,…,en)
                           | e ldfld I::f
                           | e1 … en call M
                           | e e1 … en callvirt M
                           | e isinst_I e or e
                           | R_T
                           | mkrep_{C<T1,…,Tn>}(e1,…,en)

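Some typing and reduction rules

Typing for mkrep:

$$\frac{E \vdash C\langle T_1,\ldots,T_n\rangle\ \mathrm{ok} \qquad E \vdash e_1 : \mathrm{Rep}(T_1) \quad \cdots \quad E \vdash e_n : \mathrm{Rep}(T_n)}{E \vdash \mathtt{mkrep}_{C\langle T_1,\ldots,T_n\rangle}(e_1,\ldots,e_n) : \mathrm{Rep}(C\langle T_1,\ldots,T_n\rangle)}$$

Typing for isinst:

$$\frac{E \vdash e : I' \qquad E \vdash e' : \mathrm{Rep}(I) \qquad E \vdash e'' : I}{E \vdash e\ \mathtt{isinst}_I\ e'\ \mathtt{or}\ e'' : I}$$

Reduction for isinst:

$$\frac{v = I(w, v_1,\ldots,v_n) \qquad w \prec w'}{\vdash (v\ \mathtt{isinst}_T\ w'\ \mathtt{or}\ v') \to v}$$

$$\frac{v = I(w, v_1,\ldots,v_n) \qquad w \nprec w'}{\vdash (v\ \mathtt{isinst}_T\ w'\ \mathtt{or}\ v') \to v'}$$

“Reflected subtyping”: $R_I \prec R_{I'}$ iff $I <: I'$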

Observe:
  • Types do not affect evaluation
  • They can be erased
  • They serve only static purposes

Example
Static generic method in BILG:

    static List<T> Conv<T>(object a) { … a isinst List<T> … }

Translated to BILC:

    // Specialized code for T = int32
    static List_i Conv_i(object a) { … a isinst_{List_i} R_{List_i} … }

    // Specialized code for T = int64
    static List_l Conv_l(object a) { … a isinst_{List_l} R_{List_l} … }

    // Code shared for reference types
    // Extra parameter r representing T
    // Lookup/create type rep at runtime via mkrep
    static List_r<T> Conv_r<T:ref>(Rep(T) r, object a) { … a isinst_{List_r<T>} (mkrep_{List_r<T>}(r)) … }
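The shape of this translation can be mimicked in C#. The sketch below is a hypothetical model: `TypeRep`, `Obj` and the `Conv` class are invented for illustration, objects carry the rep of their exact type, and the instance test is simplified to exact rep equality (subtyping is ignored). The slide elides the method bodies, so each method here only performs the type test.

```csharp
using System;
using System.Collections.Generic;

// Interned run-time type representation (hypothetical model).
sealed class TypeRep
{
    public string Key { get; }
    private TypeRep(string key) { Key = key; }
    private static readonly Dictionary<string, TypeRep> Table =
        new Dictionary<string, TypeRep>();

    // Find or create the rep for name, or for name<arg> if arg is given.
    public static TypeRep MkRep(string name, TypeRep arg = null)
    {
        string key = arg == null ? name : name + "<" + arg.Key + ">";
        if (!Table.TryGetValue(key, out TypeRep rep))
        {
            rep = new TypeRep(key);
            Table[key] = rep;
        }
        return rep;
    }
}

// Every object carries the rep of its exact run-time type.
sealed class Obj
{
    public TypeRep Type { get; }
    public Obj(TypeRep type) { Type = type; }
}

static class Conv
{
    static readonly TypeRep ListI = TypeRep.MkRep("List_i");
    static readonly TypeRep ListL = TypeRep.MkRep("List_l");

    // Specialized for T = int32: the rep to test against is a constant.
    public static bool ConvI(Obj a) => object.ReferenceEquals(a.Type, ListI);

    // Specialized for T = int64.
    public static bool ConvL(Obj a) => object.ReferenceEquals(a.Type, ListL);

    // Shared by all reference instantiations: r represents T, and the rep
    // for List_r<T> is looked up or created on every call.
    public static bool ConvR(TypeRep r, Obj a) =>
        object.ReferenceEquals(a.Type, TypeRep.MkRep("List_r", r));
}

static class Program
{
    static void Main()
    {
        TypeRep str = TypeRep.MkRep("string");
        Obj listOfString = new Obj(TypeRep.MkRep("List_r", str));
        Obj listOfInt = new Obj(TypeRep.MkRep("List_i"));

        Console.WriteLine(Conv.ConvR(str, listOfString));                     // True
        Console.WriteLine(Conv.ConvR(TypeRep.MkRep("object"), listOfString)); // False
        Console.WriteLine(Conv.ConvI(listOfInt));                             // True
        Console.WriteLine(Conv.ConvL(listOfInt));                             // False
    }
}
```

Only `ConvR` needs the extra parameter: the two specialized copies know their instantiation statically, while the shared copy must be told which `T` it is running at.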

We need more…
  • So far: specialization, sharing, and separation of run-time types from static types
      • but mkrep is a costly operation, requiring type-rep creation at runtime
  • Idea: instead of passing representations for type parameters, pass representations of types that we actually need:

        // Extra parameter r representing List<T>
        static List_r<T> Conv_r<T:ref>(Rep(List_r<T>) r, object a) { … a isinst_{List_r<T>}(r) … }
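A rough way to see the saving is to count table lookups in a toy model. Everything below is an assumption made for illustration (`TypeRep`, its `Lookups` counter, `Obj`, and the two `Conv` variants); the instance test is again simplified to exact rep equality.

```csharp
using System;
using System.Collections.Generic;

// Interned type representation that counts how often MkRep is invoked.
sealed class TypeRep
{
    public static int Lookups;                       // number of MkRep calls so far
    public string Key { get; }
    private TypeRep(string key) { Key = key; }
    private static readonly Dictionary<string, TypeRep> Table =
        new Dictionary<string, TypeRep>();

    public static TypeRep MkRep(string name, TypeRep arg = null)
    {
        Lookups++;
        string key = arg == null ? name : name + "<" + arg.Key + ">";
        if (!Table.TryGetValue(key, out TypeRep rep))
        {
            rep = new TypeRep(key);
            Table[key] = rep;
        }
        return rep;
    }
}

sealed class Obj
{
    public TypeRep Type { get; }
    public Obj(TypeRep type) { Type = type; }
}

static class Conv
{
    // Receives Rep(T) and rebuilds Rep(List_r<T>) on every call.
    public static bool WithRepOfT(TypeRep r, Obj a) =>
        object.ReferenceEquals(a.Type, TypeRep.MkRep("List_r", r));

    // Receives Rep(List_r<T>) directly: no mkrep in the method body.
    public static bool WithRepOfList(TypeRep listRep, Obj a) =>
        object.ReferenceEquals(a.Type, listRep);
}

static class Program
{
    static void Main()
    {
        TypeRep str = TypeRep.MkRep("string");
        TypeRep listOfStr = TypeRep.MkRep("List_r", str);   // built once by the caller
        Obj a = new Obj(listOfStr);

        int before = TypeRep.Lookups;
        for (int k = 0; k < 1000; k++) Conv.WithRepOfT(str, a);
        Console.WriteLine(TypeRep.Lookups - before);        // 1000

        before = TypeRep.Lookups;
        for (int k = 0; k < 1000; k++) Conv.WithRepOfList(listOfStr, a);
        Console.WriteLine(TypeRep.Lookups - before);        // 0
    }
}
```

The cost has not vanished: the caller still has to obtain the rep for `List_r<T>` once. The point is that it is no longer paid on each execution of the method body.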

We need more…
  • In general, we need many type-reps in a single method body
      • So we pass around dictionaries of type-reps
  • What type does a dictionary of type-reps have?
      • At its simplest, it is just a tuple, e.g. Rep(List<X>) × Rep(Vec<Vec<X>>) is the type of a two-slot dictionary containing type-reps for List<X> and Vec<Vec<X>>
      • In general, dictionaries may contain cycles (e.g. for mutually recursive methods), so we need recursive values and their types
      • Worse still, polymorphic recursion requires “infinite” dictionaries
  • Simpler: use name-based types for dictionaries
      • reps for methods: Rep(M), R_M, mkrep_M(e1,…,en)
      • statically: each Rep-type determines a particular tuple of other Rep-types
      • dynamically: each type-rep R_T or method-rep R_M determines a tuple of type-rep/method-rep values
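The two-slot dictionary from the slide, with slots for List<X> and Vec<Vec<X>>, can be sketched as follows. `TypeRep`, `MethodDict`, `Dicts.DictFor` and the choice to fill slots lazily on first use are all assumptions of this sketch; the slides do not say how or when slots are populated.

```csharp
using System;
using System.Collections.Generic;

// Interned type representation (hypothetical model).
sealed class TypeRep
{
    public string Key { get; }
    private TypeRep(string key) { Key = key; }
    private static readonly Dictionary<string, TypeRep> Table =
        new Dictionary<string, TypeRep>();

    public static TypeRep MkRep(string name, TypeRep arg = null)
    {
        string key = arg == null ? name : name + "<" + arg.Key + ">";
        if (!Table.TryGetValue(key, out TypeRep rep))
        {
            rep = new TypeRep(key);
            Table[key] = rep;
        }
        return rep;
    }
}

// A dictionary of type-reps: a fixed tuple of slots, each filled on first use.
sealed class MethodDict
{
    private readonly Func<TypeRep>[] makers;   // how to compute each slot
    private readonly TypeRep[] slots;          // cached results (null = not yet filled)

    public MethodDict(params Func<TypeRep>[] makers)
    {
        this.makers = makers;
        slots = new TypeRep[makers.Length];
    }

    // Fetch slot i, computing and caching it if necessary.
    public TypeRep Slot(int i) => slots[i] ?? (slots[i] = makers[i]());
}

static class Dicts
{
    // Dictionary of "type" Rep(List<X>) × Rep(Vec<Vec<X>>) for a given X.
    public static MethodDict DictFor(TypeRep x) => new MethodDict(
        () => TypeRep.MkRep("List", x),
        () => TypeRep.MkRep("Vec", TypeRep.MkRep("Vec", x)));
}

static class Program
{
    static void Main()
    {
        MethodDict d = Dicts.DictFor(TypeRep.MkRep("string"));
        Console.WriteLine(d.Slot(0).Key);   // List<string>
        Console.WriteLine(d.Slot(1).Key);   // Vec<Vec<string>>
        Console.WriteLine(object.ReferenceEquals(d.Slot(1), d.Slot(1)));   // True: cached
    }
}
```

A method body that needs several type-reps then takes one `MethodDict` argument and reads slots by index, instead of receiving one rep parameter per type it mentions.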

Target language: BILC (full)
  (type)        T, U ::= X | int32 | int64 | I
  (inst type)   I    ::= C<T1,…,Tn>
  (ext type)    τ    ::= T | Rep(T) | Rep(M)
  (constraint)  s    ::= ref | i4 | i8
  (class def)   cd   ::= class C<X1:s1,…,Xn:sn> : I { T1 f1; …; Tm fm; md1 … mdk } with τ1,…,τp
  (method def)  md   ::= static T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; } with τ1,…,τp
                       | virtual T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; }
  (method ref)  M    ::= I::m<T1,…,Tn>
  (expr)        e    ::= i4 | i8 | x
                       | I(e,e1,…,en)
                       | e ldfld I::f
                       | e1 … en call M
                       | e e1 … en callvirt M
                       | e isinst_I e or e
                       | R_T | R_M
                       | mkrep_{C<T1,…,Tn>}(e1,…,en)
                       | mkrep_{C<T1,…,Tn>::m<U1,…,Uk>}(e1,…,en,e1,…,ek)
                       | objdict_i e
                       | mdict_i e

Translation scheme
  • Static generic methods:
      • Extra dictionary parameter associated with method
      • Accessed using mdict_i(e)
  • Virtual methods in generic classes:
      • Obtain dictionary through type of object
      • Accessed using objdict_i(e)
  • Generic virtual methods:
      • Dictionary type not known statically (body could be overridden)
      • So pass reps for type parameters and construct type-reps at runtime using mkrep

In the paper…
  • Complete formalization of BILG, BILC, and a translation
  • Theorems:
      • Translation preserves types
      • Translation preserves behaviour
  • And in forthcoming technical report:
      • Full proofs
      • Type erasure theorem: types in BILC do not affect evaluation

Future work
  • Extend BILG and the translation to cover more features
      • Value classes (structs)
          • Would satisfy representation constraint of form [s1,…,sn] where s1,…,sn are constraints on the fields’ representations
          • Now have unbounded number of specializations
          • All methods on generic structs whose code is shared take a dictionary parameter
          • Need treatment of boxing
  • Flexible specialization policies
      • Less sharing: e.g. full specialization of selected types
      • More sharing: e.g. share all instantiations of C<T> by boxing and unboxing appropriately (cf. ML)

Future work: structural typing
  • Flexible specialization interacts badly with run-time types based on name-equivalence
  • Instead, describe dictionaries using structural typing:
      • Products: Rep(List<X>) × Rep(X) is a two-slot dictionary with type-reps for List<X> and X
      • Circular dictionaries => Recursive types, e.g. μD. Rep(Vec<X>) × (Rep(Set<X>) × D)
      • Polymorphic recursion in code => Higher-kinded recursive types, e.g. (μD. λX. Rep(Vec<X>) × D(Set<X>)) string
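To see why the polymorphic-recursion case corresponds to an "infinite" dictionary, the last type can be unfolded by hand. The name $D_0$ is introduced here only as an abbreviation, and the reading of the binders as $\mu$ and $\lambda$ is the standard one for recursive and higher-kinded types; each step below unrolls the recursive type once and then applies it to its argument.

```latex
\begin{align*}
D_0 \;&=\; \mu D.\ \lambda X.\ \mathrm{Rep}(\mathrm{Vec}\langle X\rangle) \times D(\mathrm{Set}\langle X\rangle) \\
D_0\ \mathrm{string}
  \;&=\; \mathrm{Rep}(\mathrm{Vec}\langle \mathrm{string}\rangle) \times D_0(\mathrm{Set}\langle \mathrm{string}\rangle) \\
  \;&=\; \mathrm{Rep}(\mathrm{Vec}\langle \mathrm{string}\rangle) \times
         \bigl(\mathrm{Rep}(\mathrm{Vec}\langle \mathrm{Set}\langle \mathrm{string}\rangle\rangle) \times
         D_0(\mathrm{Set}\langle \mathrm{Set}\langle \mathrm{string}\rangle\rangle)\bigr) \\
  \;&=\; \cdots
\end{align*}
```

Every unfolding exposes a slot for a strictly larger instantiation, so no finite tuple type describes the whole dictionary.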

Related work
  • Rep(T)
      • Crary, Weirich, Morrisett: “Intensional polymorphism in type-erasure semantics”
  • Dictionary-passing for polymorphism implementation
      • Saha and Shao (ML)
      • Viroli and Natali (Java)

Questions?