45
1 Compute Power with Energy- Efficiency Partnerships, Standards and the ARM GPU Perspective Jem Davies ARM Fellow, VP of Technology, Media Processing Division, ARM

Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

  • Upload
    vuhuong

  • View
    221

  • Download
    6

Embed Size (px)

Citation preview

Page 1: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

1

Compute Power with Energy-Efficiency

Partnerships, Standards and the ARM GPU Perspective

Jem DaviesARM Fellow, VP of Technology,Media Processing Division, ARM

Page 2: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

2

What does ARM do? ARM® was founded in 1990 Initially designed CPUs Over twenty years, have covered CPUs, GPUs, DSPs,

audio processors, video processors, tools, fabric IP and physical IP

ARM’s partners make SoCs containing many heterogeneous compute engines

Heritage of low-power Not just mobile, but across multiple segments > 6 Bn ARM-based chips in 2010

Page 3: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

3

ARM also does GraphicsGrowing the Media

Processor Licensing BaseGrowing Shipments in Mobileand Non-Mobile Applications

46 licenses for graphics and video 7 licenses added in Q1 2011

More Mali™ based chips shipping into mobile and consumer electronics devices

Samsung announced that Mali-based Exynos delivered 5 times more graphics performance than their previous design

STMicroelectronics announced 10 major STB design wins for Mali-based STi7108

46

Licenses Partners RoyaltyPayers

40

6

Pre-2007, 248

14

2007

2008

2009

2010 11

Q1 2011 7

Page 4: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

4

History - the Passage of Time What lessons from history can we learn?

Page 5: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

5

Microprocessor Forum 1992

Page 6: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

6

Count the Architectures (11)

ARM

MIPS29K

PA

NVAX

N32x1688xxx

Alpha

x86

i960

SPARC

x86x86 x86

Page 7: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

7

The Survivors

ARM

x86

Page 8: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

8

What Were the Lessons? Good architectures will succeed

Power did matter It has come to matter more

Too much variety can be a bad thing For developers

The Ecosystem Really matters Developers need to be able to support a range of platforms

Page 9: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

9

The Eras of Computing

1960 1970 1980 1990 2000 2010 2020

Uni

ts

1M

10MMainframe

Mini

1st Era

100M

1 BillionPC

DesktopInternet

2nd Era

100 BillionThe Internet of Things

10 BillionMobile Internet

Page 10: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

10

The Eras of Computing

1960 1970 1980 1990 2000 2010 2020

Uni

ts

1M

10MMainframe

Mini

1st Era

100M

1 BillionPC

DesktopInternet

2nd Era

100 BillionThe Internet of Things

10 BillionMobile Internet

Page 11: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

11

Industry Changes in Requirements

Evolution of the industry-driving metric

Page 12: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

12

Industry Changes in Requirements

Evolution of the industry-driving metricFunctionality

Up to 1980sSupercomputers &

mainframes

Page 13: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

13

Industry Changes in Requirements

Evolution of the industry-driving metricFunctionality

Up to 1980sSupercomputers &

mainframes

Functionality$

1990sThe personal

computer

Page 14: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

14

Industry Changes in Requirements

Evolution of the industry-driving metricFunctionality

Up to 1980sSupercomputers &

mainframes

Functionality$

1990sThe personal

computer

FunctionalityPower × $

2000sNotebooks

Page 15: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

15

Industry Changes in Requirements

Evolution of the industry-driving metricFunctionality

Up to 1980sSupercomputers &

mainframes

Functionality$

1990sThe personal

computer

FunctionalityPower × $

2000sNotebooks

FunctionalityEnergy × $

2010sMobiles &mobility

Page 16: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

16

ARM’s Strategic Direction

FunctionalityEnergy×$

Page 17: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

17

ARM’s Strategic Direction

The bad news: this is a veryhard metric to optimize for

The good news: if you crack it, you “own”the simpler metrics as well…

FunctionalityEnergy×$

Page 18: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

18

Energy-efficiency Has Always Mattered Segment differentiation matters less than you think Mobile worries about warm hands/ears And “You can never be too thin”

Sensors want 10 years from a button cell battery Everybody hates fans Desktops struggle to: Dissipate more than ~150W out of a single chip package Supply more than ~300W from a PSU onto a PCI card

Servers struggle: To get more than ~10kW into a rack To get more than ~500kW of heat out of a shipping container Spending more on power and cooling than on hardware To buy more power from the grid

Page 19: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

19

What’s Coming?

Why should developers care?

Page 20: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

20

Moore’s Law Is Not Dead Some version of Moore’s

law will continue to be true for this decade

But it is getting less and less relevant

In the past, the fabs and Moore’s law gave us PPA (Power, Performance and Area) improvements for free But not any more…

Page 21: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

21

Expectations of The Future

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

28nm (2010)

20nm (2012)

16nm (2014)

14nm (2016)

10nm (2018)

7nm (2020)

Freq

uenc

y (G

Hz)

Expectation: 20% frequency uplift per node

Page 22: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

22

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

28nm (2010)

20nm (2012)

16nm (2014)

14nm (2016)

10nm (2018)

7nm (2020)

Freq

uenc

y (G

Hz)

The Bitter Reality?

Expectation: 20% frequency uplift per node

Current best prediction of core-type performance

Note:Normalized to fixed

Leakage / um

Page 23: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

23

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

28nm (2010)

20nm (2012)

16nm (2014)

14nm (2016)

10nm (2018)

7nm (2020)

Freq

uenc

y (G

Hz)

The Bitter Reality?

Expectation: 20% frequency uplift per node

Current best prediction of core-type performance

Note:Normalized to fixed

Leakage / um

Page 24: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

24

Silicon Generations: Shrink and Add

32nm45nm

Expectation: shrink and addnew functionality in about same area

Page 25: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

25

The Creation of Dark Silicon

Year

Node

Area-1

Peak freq

Power

45nm

2008

1

1

1

22nm

2014

4

1.6

1

11nm

2020

16

2.4

0.6

Exploitable Si(in 45nm power budget)

25%

(4 x 1)-1 = 25%

10%

(16 x 0.6)-1 = 10%

Source: ITRS 2008

Page 26: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

26

The Creation of Dark Silicon

Year

Node

Area-1

Peak freq

Power

45nm

2008

1

1

1

22nm

2014

4

1.6

1

11nm

2020

16

2.4

0.6

Exploitable Si(in 45nm power budget)

25%

(4 x 1)-1 = 25%

10%

(16 x 0.6)-1 = 10%

Source: ITRS 2008

Source: ITRS 2008Lack of power scaling severely limits

system options!

Page 27: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

27

So What Can We Do? We can have more transistors

We just can’t power them all at the same time

We need to use these extra transistors in new ways Multicores Many-cores Domain-specific processors

It all points to heterogeneous processing And aggressive power management

Computing to be done in the most efficient place

Page 28: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

28

Graphics

Page 29: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

29

The Perfect Domain-specific Use-case? GPUs have traditionally been the poster-boys for domain-specific computing Everyone accepts graphics on CPU makes little sense From performance perspective and from energy-efficiency

3D computer graphics used to be very fixed-function Transform, lighting, texturing, rendering...Hardware followed those fixed functions

Modern APIs provide more flexible, programmable models GPUs now used for more general-purpose computing So where does this all end up?

Page 30: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

30

Domain-Specific vs. General Purpose Do GPUs merge with CPUs? Graphics remains one of the very few problem spaces that have very high levels of

parallelism: Do <foo> on every vertex in the frame (10s - 100s of kvertices/frame) Do <bar> on every pixel in the frame (millions of pixels/frame)

Data-level parallelism through thread-level parallelism

Throughput-oriented architectures (and implementations) will continue to have great advantages: Performance Energy

CPUs remain “easier” to program

Page 31: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

31

Innovative ‘Midgard’ GPU architecture Tri-pipe - for performance and flexibility

Scalable and high-end performance Superb graphics quality and performance Up to 2 Gpix/s

Powerful general-purpose computing Up to 68 GFLOPS

Multicore scalability up to 4 cores

Resource efficiency Frugal use of memory and bus-bandwidth Dynamic power management

Broad API and OS support OpenCL™ Full Profile/Renderscript OpenGL® ES and Open VG™ Microsoft DirectX™ to v11

Common driver for all Midgard GPUs

Mali-T604 for Visual Computing

Page 32: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

32

It’s All About The System

CoreLink™ CCI-400 Cache Coherent Interconnect128 bit @ 0.5 Eagle frequency

Quad Cortex-

A15

CoherentI/O

device

128b

Mali-T604Graphics

asyncbridge

asyncbridge

128b 128b

MMU-400 MMU-400

128b 128b

Huxley DMCDMC-400

ACE

ACE ACE-Lite + DVM

ACE-LiteACE-LiteACE-Lite

ACE-Lite

NIC-400

Other Slaves

Other Slaves

128b

NIC-400

LCDVideo

Quad Cortex-

A15

128b

ACE

ACE

AXI4

AXI4

Configurable: AXI4/AXI3/AHB/APB

Configurable: AXI4/AXI3/AHB

DDR3/LPDDR2

WideIO DRAM

ACE-LiteACE-Lite

PHYPHY

GIC-400

ACE-Lite + DVM ACE-Lite + DVM

128b

MMU-400

Page 33: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

33

GPU Computing Observations Beware of vested interests Companies that designs CPUs, GPUs, fabric can afford a more balanced view

Raw GFLOPS numbers tell you (nearly) nothing Most GPUs will only run well (efficiently, fast) with very high numbers of

simultaneous threads (high parallelism) Most GPUs are designed for high throughput Most GPUs will run badly with few threads or requirements for low latency Remember the system (memory, energy...) Understand performance vs. bandwidth vs. latency

There is still a shortage of highly-parallelisable algorithms Remember Amdahl’s law

Page 34: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

34

Amdahl’s Law is Alive and Well100% parallel

95% parallel

90% parallel

75% parallel50% parallel

0

4

8

12

16

20

24

28

32

0 4 8 12 16 20 24 28 32

Max

imum

spe

edup

Number of cores

Speedup on parallel processors is limited by thesequential portion of the program

Sequential portion need not be largeto constrain speedup significantly !

Page 35: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

35

Standards

Page 36: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

36

Complexity Out-of-order superscalar CPUs introduces some complexity Multicore CPUs/GPUs – more complexity Memory consistency model Threading models

Heterogeneous computing - more complexity Some GPU compute engines are: Complex Badly described Difficult to reason about

Most graphics developers want to create visually stunning content, not argue about the finer points of computer science

Page 37: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

37

Complexity, Abstraction, Standards If we do not abstract away from this complexity

and present a simpler world to the developer… Either they won’t use these new systems Or, they’ll use it and get it wrong

Either way, it will be our fault,and they will hate us for it

If we all provide different abstractions… They will hate us for it

We need standard(s) And a very small number of them

Page 38: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

38

Standards

And if you don’t like any of the current standards, if you wait for a while, there will be new ones along…

"The great thing about standards is that you have SO many to choose from."

- Andrew Tannenbaum

Page 39: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

39

Before Creating New Standards… Check that you are solving the right problem The problem changes Check that there isn’t a “Good enough” solution already

There are many solutions to the same problem Not all have the same costs Some costs are well hidden Some costs are not borne by “you” Some costs only become clear over time

Remember the ecosystem

Page 40: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

40

Developers and Developer Communities ARM and AMD know a bit about developer communities

Content remains king!

Development is really expensive and very hard

Re-use of content is an economic imperative For developers…

But it’s not enough that developers don’t hate us! They have to make money as well

Page 41: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

41

Business Model Benefits Business models like ARM’s means everybody makes money Which encourages that developer community What goes around,

comes around

Page 42: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

42

Different-sized Pies

Page 43: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

43

Remember…

Being Right is Easy

Making Money is the Tricky Bit

Page 44: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

44

Final Thoughts To provide increased performance we must be clever Not just rely on Moore’s Law

GPU computing is here already Heterogeneous computing is next Computing in the most energy-efficient way is the real problem Solve that, and everything else will be easy

When we introduce new complex stuff… It must make/save money It must be easily (and widely) usable Or it won’t get used!

And then we wouldn’t make any money

Page 45: Compute Power with Energy- Efficiencydeveloper.amd.com/wordpress/media/2013/06/Compute_Power_with... · Compute Power with Energy-Efficiency ... 3D computer graphics used to be very

45

Thank you

Questions?