
ISMS Keynote – 9 Feb 2015

A Model‐driven Approach for Time‐energy Performance of Parallel Applications

Yong Meng TEO+* and Lavanya Ramapantulu
Department of Computer Science
National University of Singapore
email: [email protected]
url: www.comp.nus.edu.sg/~teoym

+ Visiting Professor, Chinese Academy of Sciences
* Centre for Business Analytics, NUS


My Interests

Recent PhD Theses
1. Specification and Verification of Shared‐memory Concurrent Programs, Le Duy Khanh, Dec 2014.
2. Parallelism‐Energy Performance Analysis of Multicore Systems, B.M. Tudor, Jan 2014. [IPDPS 2012 PhD Forum Best Poster Award]
3. On Flash Crowd Performance of Peer‐Assisted File Distribution, C. Carbunaru, June 2014.
4. Strategy‐proof Resource Pricing in Federated Systems, M. Mihailescu, 2012. [Best Paper Award ‐ 10th International Conference on Algorithms and Architectures for Parallel Processing, May 2010]
5. Composable Simulation Models and their Formal Validation, Claudia Szabo, 2010. [ACM SIGSIM 2009 Best PhD Student Paper Award]

Teaching
• CS5224 Cloud Computing, CS3210 Parallel Computing, CS5239 Computer Systems Performance Analysis, …


NUS School of Computing

NUS faculties and schools:
• Faculty of Arts and Social Sciences
• School of Business
• School of Computing
• Faculty of Dentistry
• School of Design and Environment
• Faculty of Engineering
• Faculty of Law
• Yong Loo Lin School of Medicine
• Yong Siew Toh Conservatory of Music
• Saw Swee Hock School of Public Health
• Faculty of Science
• University Scholars Programme
• Yale‐NUS College
• Lee Kuan Yew School of Public Policy
• NUS Graduate School for Integrative Sciences & Engineering
• Duke‐NUS Graduate Medical School Singapore

School of Computing:
• Established July 1998 (formerly DISCS within FoS)
• Departments:
– Computer Science
– Information Systems
• Staff strength:
– 111 (academic staff)
– 115 (research staff)
• Student population: ~2,330 (total)
– 1,800 undergraduates
– 530 graduate students


Recent Rankings


Massachusetts Institute of Technology

Carnegie Mellon University

University of Cambridge

University of California, Berkeley

National University of Singapore

The Hong Kong University of Science and Technology

University of Edinburgh

Faster is better

time (traditionally) & energy

Energy Use of Datacenters

• Energy consumption of large‐scale data centers and its costs are significant
– 2006: 6,000 data centers in the US consumed 61 × 10^9 kWh of energy, 1.5% of all electricity consumption, at a cost of $4.5 billion
– 2006‐2011: from 7 GW to 12 GW, 10 new power plants
• 1998‐2007: performance of supercomputers (+7,000%) has increased 3.5 times faster than their operating efficiency* (+2,000%)
• Effort to reduce energy use is focused on the computing, networking, and storage activities of a data center – our focus

*operating efficiency of a system = performance per Watt of power

Datacenter Energy Usage

[Figure not preserved.]

Source: Barroso L.A., et al., The Datacenter as a Computer: An Introduction to the Design of Warehouse‐Scale Machines, 2nd Edition, 2013

Outline

• Motivation

• Objective

• Research Questions

• Time‐energy Performance [ICPP 2014]

• Heterogeneous Low‐power Systems

• Summary


Motivation

• computing platforms are increasingly heterogeneous

– processors: brawny vs wimpy, big‐little, accelerators, …

– supercomputer with accelerators

– data centers with different server generations

– heterogeneous cloud computing resources with different price‐performance

Examples pictured:
• ARM Cortex‐A9
• big.LITTLE: ARM A15 + A7
• NVIDIA Jetson TK1 (CPU: 4‐core ARM A15; GPU: 192‐core NVIDIA Kepler)

System Utilization vs Percentage Power Usage

[Figure: percentage of power usage (y‐axis, 0 to 100) against percentage of system utilization (x‐axis, 0 to 100), with curves labelled "Power" and "Energy efficiency" and a marked "Typical operating region".]

1. A typical Google cluster spends most of its time in the 10‐50% utilization range: a mismatch between server workload profile and server energy efficiency
2. Energy‐proportional system (ideal): energy consumed is proportional to the amount of work done

Energy‐proportional Systems

• Even when power requirements scale linearly with the load, energy efficiency is not a linear function of load; an idle system still uses 50% of its peak power
• An ideal system consumes no power when idle, very little power under a light load and, gradually, more power as the load increases
• Dynamic power range: the span between the lower and upper bounds of the power consumption of a device
– Processor (70%), DRAM (50%), disk drive (25%), network switches (15%), human (??%)
– wider range is better

Is a human being an energy‐proportional system?
• idle (70 W), average (120 W), peak (1‐2 kW)
• dynamic power range = 1 – 70/1000 > 90%
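The two claims on this slide can be checked numerically. The following is a minimal Python sketch, assuming a linear power model with idle power at 50% of peak; the function names are illustrative and do not come from the talk.

```python
def dynamic_power_range(idle_watts, peak_watts):
    """Fraction of peak power that varies with load: 1 - idle/peak."""
    return 1.0 - idle_watts / peak_watts


def linear_power_fraction(utilization, idle_fraction=0.5):
    """Power as a fraction of peak, assuming it grows linearly with load."""
    return idle_fraction + (1.0 - idle_fraction) * utilization


def relative_efficiency(utilization, idle_fraction=0.5):
    """Work done per unit of power, normalized so that full load gives 1.0."""
    return utilization / linear_power_fraction(utilization, idle_fraction)


# Human example from the slide: idle 70 W, peak taken as 1 kW.
human_range = dynamic_power_range(70, 1000)

# Efficiency at a few utilization levels (in percent) for the assumed server.
efficiency_by_load = {u: relative_efficiency(u / 100) for u in (10, 30, 50, 100)}
```

Under these assumptions the human dynamic power range comes out at 0.93, and a server at 10% utilization reaches only about 0.18 of its full‐load efficiency, even though its power curve is perfectly linear.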


Research Questions

1. Can we replace traditional servers with low‐power nodes? [SIGMETRICS2013]
2. How do we configure energy‐efficient heterogeneous clusters (data centers)? [ICPP2014, IPDPS2015]
3. What is the cost of processing big data on low‐power servers? [VLDB2015]
4. Is dataflow a suitable model of computation and scheduling to scale out workloads on low‐power servers? [PACT2013 workshop]

General Objective

To develop models and techniques for dynamic resource provisioning to achieve energy‐efficient computing while meeting performance deadlines

Approach:
1. generalized analytic performance model for configuring application resource demand (this talk)
2. technique for runtime provisioning using polymorphic tasks
3. …

Time‐energy Performance

L. Ramapantulu, B.M. Tudor, D. Loghin, T. Vu and Y.M. Teo, Modeling the Energy Efficiency of Heterogeneous Clusters, Proc of 43rd International Conference on Parallel Processing, Minneapolis, USA, Sep 2014.

Reducing Power: Wimpy vs Brawny Servers

[Figure: power [W] against performance [MFLOPS] for a brawny node and a wimpy node. Annotations: "Marginal improvement in performance at high power"; "High idle power".]

Objective

How do we configure energy‐efficient heterogeneous clusters (data centers)?

Given an application with an energy budget and an execution time deadline, determine efficient configurations to run the application

Motivating Example Configuring Heterogeneous Systems

What is the total number of possible configurations to run an application with ten AMD and ten ARM nodes?

Total = 36,380 configurations
• mixed configurations = 10 ARM nodes × 4 cores per ARM node × 5 core frequencies per ARM node × 10 AMD nodes × 6 cores per AMD node × 3 core frequencies per AMD node = 36,000
• AMD only = 10 × 6 × 3 = 180
• ARM only = 10 × 4 × 5 = 200
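The count above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic on the slide; the dictionary layout and function name are illustrative.

```python
# Per-type choices taken from the slide: node count, cores per node, core frequencies.
ARM = {"nodes": 10, "cores": 4, "frequencies": 5}
AMD = {"nodes": 10, "cores": 6, "frequencies": 3}


def homogeneous_configs(node_type):
    """Choices of node count x active cores per node x core frequency."""
    return node_type["nodes"] * node_type["cores"] * node_type["frequencies"]


arm_only = homogeneous_configs(ARM)
amd_only = homogeneous_configs(AMD)
mixed = arm_only * amd_only          # every ARM choice paired with every AMD choice
total = mixed + arm_only + amd_only
```

The search space grows multiplicatively with each additional node type, which is why measuring every configuration directly is impractical.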


Contributions

• Model‐driven approach: measurement‐based analytical model to determine energy‐efficient configurations on a mix of heterogeneous nodes
– Meets a time deadline with minimum energy

• Our analysis shows that the energy‐deadline Pareto frontier consisting of heterogeneous mixes is almost always more energy‐efficient than homogeneous clusters
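To make the energy‐deadline Pareto frontier concrete, here is a minimal Python sketch. It assumes each configuration has already been given a predicted execution time and energy. The configuration names and numbers are hypothetical and are not results from the talk.

```python
def pareto_frontier(configs):
    """Return configs not dominated in (time, energy); lower is better for both."""
    frontier = []
    best_energy = float("inf")
    # Sort by time, then energy, so each kept point must strictly lower the energy.
    for cfg in sorted(configs, key=lambda c: (c["time"], c["energy"])):
        if cfg["energy"] < best_energy:
            frontier.append(cfg)
            best_energy = cfg["energy"]
    return frontier


def min_energy_within_deadline(configs, deadline):
    """Pick the lowest-energy configuration that finishes by the deadline."""
    feasible = [c for c in configs if c["time"] <= deadline]
    return min(feasible, key=lambda c: c["energy"]) if feasible else None


# Hypothetical (time in s, energy in J) predictions, for illustration only.
configs = [
    {"name": "arm-only", "time": 40.0, "energy": 300.0},
    {"name": "amd-only", "time": 10.0, "energy": 900.0},
    {"name": "mix-a", "time": 15.0, "energy": 500.0},
    {"name": "mix-b", "time": 20.0, "energy": 650.0},
]
frontier_names = [c["name"] for c in pareto_frontier(configs)]
choice = min_energy_within_deadline(configs, deadline=20.0)
```

With these made‐up numbers, "mix-b" is dominated by "mix-a" (slower and more energy), so it drops off the frontier, and "mix-a" is selected for a 20 s deadline.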


Model‐driven Approach

[Diagram not preserved. Labels: Applications; workload parameters; Heterogeneous Systems; system parameters; Non‐intrusive Baseline Execution; baseline measurement; Time‐Energy Performance Model; energy‐efficient Pareto‐optimal configurations.]

• considers different ISAs
• resource overlap
• unifying unit of work

Applications

Domain              Program        Problem Size
HPC                 EP             2,147,483,648 random numbers
Web Server          memcached      600,000 GET/SET operations
Streaming video     x264           600 frames, 704 × 576
Financial           Black‐scholes  500,000 stock options
Speech recognition  Julius         2,310,559 samples
Web security        RSA‐2048       5,000 key verifications

Heterogeneous System 

• ARM v7‐A Cortex‐A9: quad‐core, 0.2 to 1.4 GHz
• AMD K10, x86_64: six‐core, 0.8 to 2.1 GHz

Baseline Execution

• Measurements needed only for a single node, for each type of node
– non‐intrusive hardware performance counters
• Execute the program for a very small problem size
– measure instructions, computation cycles and stall cycles
– e.g. measure instructions per GET operation of memcached
• Execute micro‐benchmarks to measure active and stall power of processor cores

Execution Time Model

Parallel application on $n_{ARM}$ ARM nodes and $n_{AMD}$ AMD nodes:

• match the execution rates between ARM and AMD nodes: $T(n_{ARM}) \approx T(n_{AMD})$
• within a type of node, the workload is equally divided: $T(n_{ARM}) \approx T_1 / n_{ARM}$
• CPU and I/O overlap: $T_1 \approx \max(T_{CPU}, T_{I/O})$
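One way to read the rate‐matching condition is as a work split in proportion to aggregate throughput. The sketch below is a Python illustration under that assumption. The per‐node rates are hypothetical inputs, and the talk does not state that the split is computed this way.

```python
def single_node_time(t_cpu, t_io):
    """CPU and I/O overlap, so the slower of the two bounds the node time."""
    return max(t_cpu, t_io)


def split_work(total_work, n_arm, rate_arm, n_amd, rate_amd):
    """Split work so both node types finish together: T(n_ARM) ~ T(n_AMD).

    rate_* is an assumed per-node throughput in work units per second.
    """
    arm_capacity = n_arm * rate_arm
    amd_capacity = n_amd * rate_amd
    work_arm = total_work * arm_capacity / (arm_capacity + amd_capacity)
    work_amd = total_work - work_arm
    return work_arm, work_amd


# Hypothetical cluster: 4 ARM nodes at 10 units/s and 2 AMD nodes at 40 units/s.
work_arm, work_amd = split_work(1200.0, n_arm=4, rate_arm=10.0, n_amd=2, rate_amd=40.0)
time_arm = work_arm / (4 * 10.0)   # work is divided equally among the ARM nodes
time_amd = work_amd / (2 * 40.0)   # and equally among the AMD nodes
```

In this example the ARM nodes receive 400 units and the AMD nodes 800 units, so both groups finish after 10 s and neither sits idle waiting for the other.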


Execution Time Model

$T_{core} \approx T_{core,work} + T_{core,stall}$

$T_{core,work} \approx I_{core} \times WPI / f$

$T_{core,stall} \approx I_{core} \times SPI_{core} / f$

• stall cycles increase linearly with
– increase in core clock frequency
– increase in the number of cores

Stalls due to Memory Contention


Energy Model

• Total Energy = $E_{ARM} \times n_{ARM} + E_{AMD} \times n_{AMD}$

• $E_{node} = E(core) + E(mem) + E(I/O) + E(idle)$

• $E(core) = P_{core,act} \times T_{core,work} + P_{core,stall} \times T_{core,stall}$
– power × time
– uses execution time model
– measured values for $P_{core,act}$, $P_{core,stall}$, $P_{I/O}$
– $P_{mem,act}$, $P_{mem,stall}$ for ARM and AMD from literature and spec.

Model Summary

Execution Time Model

$T = \max(T_{ARM}, T_{AMD})$

$T_{ARM} = \max(T_{CPU,ARM}, T_{I/O,ARM})$

$T_{CPU,ARM} = \max(T_{core,ARM}, T_{mem,ARM})$

$T_{core,ARM} = I_{core,ARM} \times (WPI_{ARM} + SPI_{core,ARM}) / f_{ARM}$

$T_{mem,ARM} = I_{core,ARM} \times (WPI_{ARM} + SPI_{mem,ARM}) / f_{ARM}$

$T_{i/o,ARM} = \max(T_{I/O,ARM}, 1/\lambda_{I/O})$

Energy Model

$E = E_{ARM} + E_{AMD}$

$E_{ARM} = (E_{core,ARM} + E_{mem,ARM} + E_{I/O,ARM} + E_{idle,ARM}) \times n_{ARM}$

$E_{core,ARM} = (P_{core,act,ARM} \times T_{act,ARM} + P_{core,stall,ARM} \times T_{stall,ARM}) \times c_{act,ARM}$
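The summary equations translate fairly directly into code. The following Python sketch evaluates the time model and the core‐energy term only. It reads WPI and SPI as work and stall cycles per instruction and I_core as an instruction count, which are assumptions about the notation. All numeric values are hypothetical, and the memory, I/O and idle energy terms are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class NodeType:
    instructions: float   # I_core (assumed: instructions executed per core)
    wpi: float            # WPI (assumed: work cycles per instruction)
    spi_core: float       # SPI_core (assumed: core stall cycles per instruction)
    spi_mem: float        # SPI_mem (assumed: memory stall cycles per instruction)
    freq_hz: float        # f, core clock frequency
    t_io: float           # T_I/O in seconds
    p_act: float          # P_core,act in watts
    p_stall: float        # P_core,stall in watts
    active_cores: int     # c_act
    nodes: int            # n


def node_time(nt):
    """T = max(T_CPU, T_I/O), with T_CPU = max(T_core, T_mem)."""
    t_core = nt.instructions * (nt.wpi + nt.spi_core) / nt.freq_hz
    t_mem = nt.instructions * (nt.wpi + nt.spi_mem) / nt.freq_hz
    return max(max(t_core, t_mem), nt.t_io)


def core_energy(nt):
    """E_core = (P_act * T_act + P_stall * T_stall) * c_act for one node."""
    t_act = nt.instructions * nt.wpi / nt.freq_hz
    t_stall = nt.instructions * nt.spi_core / nt.freq_hz
    return (nt.p_act * t_act + nt.p_stall * t_stall) * nt.active_cores


def cluster_time(arm, amd):
    """T = max(T_ARM, T_AMD)."""
    return max(node_time(arm), node_time(amd))


def cluster_core_energy(arm, amd):
    """Core term only; E_mem, E_I/O and E_idle are omitted in this sketch."""
    return core_energy(arm) * arm.nodes + core_energy(amd) * amd.nodes


# Hypothetical parameter values, chosen only to exercise the formulas.
arm = NodeType(2e9, 1.0, 0.5, 1.0, 1.0e9, 1.0, 0.5, 0.25, active_cores=4, nodes=10)
amd = NodeType(4e9, 0.5, 0.25, 0.5, 2.0e9, 3.0, 8.0, 4.0, active_cores=6, nodes=10)

total_time = cluster_time(arm, amd)
total_core_energy = cluster_core_energy(arm, amd)
```

In this example the ARM side is memory‐bound (4 s) while the AMD side is I/O‐bound (3 s), so the cluster time is 4 s. Sweeping such a function over all node, core and frequency choices is what makes a model‐driven search over tens of thousands of configurations feasible.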


Model Validation


Performance‐to‐Power Ratio

[Figure not preserved. Annotations: "memory bound on ARM"; "x86 ISA has special instruction for cryptography".]

Research Questions

1. Is heterogeneity better than homogeneity?
2. Are larger mixes of heterogeneous nodes better?
3. …

Heterogeneity versus Homogeneity

[Figure not preserved; label: (36,380), the total number of configurations.]


Heterogeneity

• Enables a sweet region

• Saves more energy for a given deadline


Are larger mixes better?

• Larger mixes are more energy efficient
• Enable a larger number of “sweet spots”

Observations

1. Heterogeneity allows larger energy savings compared to homogeneous systems.
2. Larger mixes increase the number of configurations in the sweet region.
3. …

Conclusions

• measurement‐driven analytical model to determine energy‐efficient configurations for a single workload on a heterogeneous mix with different ISAs
• heterogeneity is almost always more energy‐efficient than homogeneity
– but not for programs with a large sequential fraction and high parallel overhead

L. Ramapantulu, B.M. Tudor, D. Loghin, T. Vu and Y.M. Teo, Modeling the Energy Efficiency of Heterogeneous Clusters, Proceedings of 43rd International Conference on Parallel Processing, Minneapolis, USA, Sep 2014.

Heterogeneous Low‐power Systems

1. Nov 2014 – 12‐node Heterogeneous CPU‐GPU Cluster (Jetson TK1) with 44 ARM cores & 2,304 GPU cores
2. Aug 2014 – 32‐core ODROID‐XU3 big.LITTLE (ARM A15 + A7)
3. Jun 2014 – Brawny and Wimpy Systems with GPUs (NVIDIA GTX 750 Ti)
4. Oct 2013 – Heterogeneous Low‐power System with GPU (NVIDIA Tegra 3 based Kayla platform)
5. Sep 2013 – 32‐core ODROID XU+E big.LITTLE (ARM A15 + A7)

29 June 2015, Computer Systems Group ‐ Clusters

big.LITTLE Node: ODROID XU+E (Sep 2013)

• CPU: Samsung Exynos 5410 Octa with ARM Cortex‐A15 (1.6 GHz) quad core + Cortex‐A7 quad core + GPU
• Memory: 2 GB LPDDR3
• Power: 5 V, 4 A

big.LITTLE ODROID XU+E Cluster (Sep 2013)

16 A15 + 16 A7

Brawny & Wimpy Systems with GPU (June 2014)*

[Diagram not preserved. Labelled components: Power Line (240V AC); Power Monitor; Serial interface; Controller; two 1 Gbps links; Brawny System with GPU (Dell Optiplex + NVIDIA GTX 750 Ti); Wimpy System with GPU (Kayla DevKit + NVIDIA GTX 750 Ti).]

*Big Data on Heterogeneous Systems with GPUs, Nvidia GPU Technology Workshop, Singapore, July 2014.

6‐node Nvidia Jetson TK1 Cluster (Nov 2014)

• CPU: 4‐core ARM Cortex‐A15
• GPU: 192‐core NVIDIA Kepler
• Memory: 2 GB LPDDR3
• Storage: 16 GB eMMC 4.51
• Network: 1 Gbps
• Power: 12 V, 5 A

Publications
http://www.comp.nus.edu.sg/~teoym

1. D. Loghin, B.M. Tudor, H. Zhang, B.C. Ooi and Y.M. Teo, A Performance Study of Big Data on Small Nodes, Proc of 41st International Conference on Very Large Data Bases, Vol. 8, No. 7, Hawaii, USA, Aug 31‐Sep 4, 2015.

2. L. Ramapantulu, D. Loghin and Y.M. Teo, An Approach for Energy Efficient Execution of Hybrid Parallel Programs, Proceedings of 29th IEEE International Parallel & Distributed Processing Symposium, Hyderabad, India, May 25‐29, 2015 (acceptance 22%).

3. D. Loghin, B.M. Tudor and Y.M. Teo, An Approach for Direct Dataflow Execution on Contemporary Multicore Systems, Proc of 3rd International Workshop on Dataflow Execution Models for Extreme Scale Computing, IEEE Computer Society Press, in conjunction with PACT2013, Edinburgh, Scotland, Sep 2013.

4. B.M. Tudor and Y.M. Teo, On Understanding the Energy Consumption of ARM‐based Multicore Servers, ACM SIGMETRICS, Carnegie Mellon University, Pittsburgh, USA, June 17‐21, 2013 [acceptance: 27 of 196] (featured article in HPCwire: Mapping the Energy Envelope of Multicore ARM Chips, 6 June 2013).

5. B.M. Tudor and Y.M. Teo, Towards Modeling Parallelism and Energy Performance of Multicore Systems, Proc of 26th IEEE International Parallel & Distributed Processing Symposium, Shanghai, China, May 21‐25, 2012. [PhD Forum Best Poster Award]

6. B. Tudor, Y.M. Teo and S. See, Understanding Off‐chip Contention of Parallel Programs in Chip Multiprocessors, Proc. of 40th International Conference on Parallel Processing, Taipei, Taiwan, Sep 2011 (acceptance 19%).

Questions & Answers

Thank you!

http://www.comp.nus.edu.sg/~teoym
Email: [email protected]

Acknowledgements
Computer Systems Group, NUS

Funding
National Research Foundation, Ministry of Education, Nvidia, Oracle (Sun), …