

Software Languages Lab


agentschap voor Innovatie door Wetenschap en Technologie (IWT, the Flemish agency for Innovation by Science and Technology)

Many-Core Virtual Machines: Decoupling Abstract From Concrete Concurrency

Virtual Machines

Concurrency

So Many Models

Stefan Marr, Theo D’Hondt {stefan.marr, tjdhondt}@vub.ac.be http://soft.vub.ac.be/~smarr/research/

Experiments

a VM has to: abstract from hardware and operating systems, for multiple languages

Abstraction by ILs

but... concurrency support is limited

decouple abstract concurrency models and concrete concurrency models

what to include in ILs?

how to combine the various models?

what are the fundamental problems?

Locality and Encapsulation?

Approach and Evaluation

The Manycore RoarVM

ECMA-335, 4th Edition / June 2006

Common Language Infrastructure (CLI)

Partitions I to VI

• powered by fast JIT compilers and great GCs
• foundation for multi-language VMs
• allow reuse of existing infrastructure
• require huge investments
• reuse is economically necessary

• VM Intermediate Languages (ILs)
• often defined as bytecodes
• expressive abstraction for various target languages
• state of the art is very diverse

VM           Abstraction  Model             #Register  Execution Mode         Length    in Bytes  #Opcodes
CLI          Bytecode     stack             0          -                      variable  >= 1      217
CPython      Bytecode     stack             0          switch                 variable  1 or 3    102
Dalvik VM    Bytecode     register                     threaded               variable  >= 2      218
Dis VM       Bytecode     memory-to-memory  0          -                      variable  1 - 33    158
Erlang       Bytecode     register          1024       threaded, JIT          fixed     4         148
JVM          Bytecode     stack             0          -                      variable  >= 1      201
Lua          Bytecode     register          255        switch                 fixed     4         38
Mozart       Bytecode     register-memory              threaded               variable  4 - 24    97
Parrot       Bytecode     register                     switch, threaded, JIT  variable  >= 4      >1200
PHP          Bytecode     register-memory              threaded               fixed     76        136
Rubinius     Bytecode     stack             0          JIT                    variable  4 - 16    89
Ruby 1.8     AST          stack             0          switch                 -         -         105
Ruby 1.9     Bytecode     stack             2          threaded               variable  >= 32     77
Self         Bytecode     stack             0          JIT                    fixed     1         17
Squeak       Bytecode     stack             0          switch, threaded       variable  1 or 2    71
TraceMonkey  Bytecode     stack             1          threaded, JIT          variable  >= 1      234
V8           AST          -                 -          JIT                    -         -         38
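The "Model" and "Execution Mode" columns can be made concrete with a small sketch. The two interpreters below evaluate (2 + 3) * 4, once with a stack-based IL and once with a register-based IL, both using switch-style dispatch (an if/elif chain). The opcode names, the tuple encoding, and the function names `run_stack` and `run_register` are assumptions made for illustration; they are not the instruction set of any VM listed in the table.

```python
# Hypothetical mini-ILs for illustration only.

def run_stack(code):
    """Stack-based IL: operands are implicit and live on an operand stack."""
    stack = []
    for op, *args in code:
        # Switch-style dispatch: one branch per opcode.
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


def run_register(code, num_registers=4):
    """Register-based IL: every instruction names its operands explicitly."""
    regs = [0] * num_registers
    for op, *args in code:
        if op == "load":
            dst, value = args
            regs[dst] = value
        elif op == "add":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "mul":
            dst, a, b = args
            regs[dst] = regs[a] * regs[b]
        elif op == "ret":
            return regs[args[0]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise ValueError("program ended without ret")


# (2 + 3) * 4 in both encodings.
STACK_PROGRAM = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
REGISTER_PROGRAM = [
    ("load", 0, 2),
    ("load", 1, 3),
    ("add", 0, 0, 1),
    ("load", 1, 4),
    ("mul", 0, 0, 1),
    ("ret", 0),
]

if __name__ == "__main__":
    print(run_stack(STACK_PROGRAM))
    print(run_register(REGISTER_PROGRAM))
```

The stack program uses shorter instructions, while the register program needs explicit operands in every instruction. The table reflects the same split: several stack ILs get by with one-byte instructions, whereas the register ILs listed use at least two.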

Model       Threads/Locks   CSP         Actors      Data-flow
IL Support  Marginal        High-level  High-level  Marginal
StdLib      Low/high-level  High-level  High-level  High-level

• VM support is minimal
• only one specific concurrency model is supported
• only few ILs provide a notion of concurrency
• no comprehensive abstraction

Stefan Marr, Michael Haupt, and Theo D'Hondt. Intermediate Language Design of High-level Language Virtual Machines: Towards Comprehensive Concurrency Support. In: Proc. of the 3rd Workshop on Virtual Machines and Intermediate Languages, ACM, October (2009)

• abstract concurrency models are defined by languages or libraries
• used by application developers

• wide range of models supported by VM is necessary
• implementing unsupported models on top is hard
• restrictions hinder efficient implementation
• support at VM-level allows reuse and optimization
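As an illustration of building one abstract model on top of another, here is a minimal sketch of an actor implemented at library level using only threads and a locked queue. The class `Actor` and the helper `make_counter` are hypothetical names introduced for this example. The sketch also hints at why the approach is limited: the actor's state is isolated by convention only, since nothing in the underlying threads/locks model stops other code from touching it.

```python
import queue
import threading


class Actor:
    """Hypothetical library-level actor: one thread plus one mailbox."""

    _STOP = object()  # sentinel message that ends the actor's loop

    def __init__(self, behavior):
        self._behavior = behavior
        self._mailbox = queue.Queue()  # thread-safe FIFO, uses locks internally
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def stop(self):
        """Process all messages sent so far, then terminate the actor."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        # Messages are handled one at a time, so the behavior itself
        # needs no locking as long as its state is not shared.
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                return
            self._behavior(message)


def make_counter():
    """Create an actor that sums the numbers it receives."""
    state = {"total": 0}  # isolated by convention only, not enforced

    def behavior(message):
        state["total"] += message

    return Actor(behavior), state


if __name__ == "__main__":
    counter, state = make_counter()
    for i in range(1, 101):
        counter.send(i)
    counter.stop()
    print(state["total"])
```

Each actor here costs a full OS thread, and the runtime knows nothing about mailboxes or isolation, so it can neither enforce nor optimize them.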

concrete concurrency models are provided by the underlying system

Single-Core (pictured: Intel Banias)
• preemptive OS threads
• instruction-level parallelism
• VM challenges
  • deep cache hierarchies
  • cache-consciousness required

Multi-Core (pictured: Intel Nehalem)
• uniform memory access
• native support for thread-level parallelism and cache coherency
• VM challenges
  • locality and cache hierarchy must be considered
  • avoid cache thrashing

Many-Core (pictured: IBM Cell B.E.)
• non-uniform memory access architectures
• can have explicit core-to-core communication
• VM challenges
  • very diverse designs
  • with or without cache coherency
  • explicit inter-core communication

there are various ways to express concurrency and solutions are domain-specific

A Foundation for Concurrency Support in Multi-Language VMs?

Exploring the Design Space:

Stefan Marr et al. Virtual Machine Support for Many-Core Architectures: Decoupling Abstract From Concrete Concurrency Models. In: 2nd International Workshop on Programming Languages Approaches to Concurrency and Communication-cEntric Software, York, UK, March (2009)

Hans Schippers, Tom Van Cutsem, Stefan Marr, Michael Haupt, and Robert Hirschfeld. Towards an Actor-based Concurrent Machine Model. In: Proc. of the 4th Workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems, ACM (2009)

earlier experiments: an IL for threads/locks, and an IL for Actors

• A Smalltalk VM for multi- and manycore systems
• runs on the 64-core TILE architecture
• runs on standard x86 systems
• supports Linux and OS X
• released under the Eclipse Public License at http://github.com/smarr/RoarVM

In cooperation with David Ungar and Sam Adams from IBM Research

On a TILEPro64
• 64 cores on a single chip
• explicit core-to-core communication
• small caches
• shared coherent memory

Non-Uniform Memory Access

Shared Mutable State

in some form explicit or implied in languages

[Figure: design space spanned by two axes, Locality and Encapsulation, starting from "none". Locality values: flat partitioning, k-level hierarchy, dynamic hierarchy. Encapsulation values: immutability bit, readability bit, predicate-based. Placed in the space: C/C++/Java, AmbientTalk, E vats, Actors, CSP, Clojure Agents, X10, STM, UPC, CAF, RoarVM, CUDA, OpenCL, Nested Actors, Active Objects, CoBoxes.]

1. Top-Down from a Language Perspective
   • X10 inter-place operations
   • Nesting of Actors or CoBoxes
   • Clojure Agents

2. Abstract from experiments and extend the VM model

3. Assess and evaluate benefits of VM Support
   • Language engineering effort
   • Avoid duplication in different language implementations
   • Trade-off to VM complexity
   • Performance benefits
   • Memory/cache utilization

Our Platform for Experiments

Implement relevant concepts on top of the RoarVM

NUMA is the dominating hardware characteristic

Partitioned Global Address Space: locality explicit in shared-memory

Non-shared Memory Languages: natural fit for NUMA

protecting and isolating shared state

avoiding mutable shared state, perhaps allowing immutable shared state

Is the notion of locality inevitable for a VM?

Can a VM support this better by some notion of Encapsulation?
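One way to picture how locality and encapsulation could interact is the sketch below. It assumes a flat partitioning of objects into owning domains plus a per-object immutability bit: mutable state is accessible only from its owning domain, while immutable state may be shared freely. The names `Domain`, `Cell`, and `AccessError` are hypothetical and chosen for this example; this is not the RoarVM's actual mechanism.

```python
class AccessError(Exception):
    """Raised when an access violates locality or immutability."""


class Domain:
    """Hypothetical unit of locality, e.g. one partition of the heap."""

    def __init__(self, name):
        self.name = name


class Cell:
    """Hypothetical heap object with an owning domain and an immutability bit."""

    def __init__(self, owner, value, immutable=False):
        self.owner = owner
        self.immutable = immutable
        self._value = value

    def read(self, accessor):
        # Immutable state may be shared; mutable state is readable only locally.
        if self.immutable or accessor is self.owner:
            return self._value
        raise AccessError(f"{accessor.name} may not read state of {self.owner.name}")

    def write(self, accessor, value):
        if self.immutable:
            raise AccessError("object is immutable")
        if accessor is not self.owner:
            raise AccessError(f"{accessor.name} may not write state of {self.owner.name}")
        self._value = value


if __name__ == "__main__":
    a, b = Domain("a"), Domain("b")
    local = Cell(a, 1)
    local.write(a, 2)                      # owner may mutate its own state
    shared = Cell(a, 42, immutable=True)
    print(local.read(a), shared.read(b))   # immutable state is visible remotely
```

If checks like these were performed by the VM instead of a library, every language implemented on top could rely on them, which is the kind of reuse the questions above are asking about.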
