Transcript

1

ARM® big.LITTLE™ Technology Unleashed

An Improved User Experience Delivered

Govind Wathan, Product Specialist

Cortex®-A Mobile & Consumer CPU Products

2

Agenda

Introduction to big.LITTLE Technology

Benefits of big.LITTLE Technology

Future big.LITTLE systems

Summary

Questions

3

Mobile Application Workloads

Mobile users spend a large share of their time on a range of mobile applications*:

38% on web browsing and Facebook

32% on gaming

16% on audio, video and utility

Common “building blocks” in workloads:

Short bursts of high intensity

Long periods of sustained high intensity

Low intensity

[Figure: power vs. time traces for Web Browsing, Gaming, and Audio Playback]

Measured on a Quad Cortex-A7 Symmetric Multiprocessing platform

* Source: Flurry Analytics

4

Mobile Application Workloads

Sustained Performance Envelope

Category 1: Burst of High Intensity Workloads (Example: Web Browsing)

Category 2: Sustained Performance at Thermal Limit (Example: Castlemaster)

Category 3: Long-use Low-Intensity Workloads (Example: Audio Playback)

[Figure: power for each workload category]

Applications require a mix of performance levels

Mobile users want a better user experience, but not at the cost of reduced battery life

5

Mobile Application Workload Profiles

[Figure: percentage of time spent in DVFS states (High, Mid, Low, WFI / PowerDown, Idle) for each workload category]

Category 1: Burst High Intensity Workloads (Example: Web Browsing)

Category 2: Sustained Performance at Thermal Limit (Example: Castlemaster)

Category 3: Long-use Low-Intensity Workloads (Example: Audio Playback)

Measured on a Quad Cortex-A7 Symmetric Multiprocessing platform

Applications require a mix of performance levels

Mobile users want a better user experience, but not at the cost of reduced battery life
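
The workload profiles on this slide report the percentage of time spent in each DVFS state. A minimal sketch of how such a residency profile could be computed from a sampled frequency trace is shown here. The state names come from the slide; the frequency thresholds, the sample format (one value per equally spaced sample, `None` for a core in WFI / power-down), and the function names are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical frequency thresholds in MHz; not taken from the slides.
LOW_MAX_MHZ = 600
MID_MAX_MHZ = 1000


def classify(freq_mhz):
    """Map one frequency sample to a DVFS state label used on the slide."""
    if freq_mhz is None:  # None marks a core in WFI / power-down
        return "WFI / PowerDown"
    if freq_mhz <= LOW_MAX_MHZ:
        return "Low"
    if freq_mhz <= MID_MAX_MHZ:
        return "Mid"
    return "High"


def residency(samples):
    """Return the percentage of equally spaced samples spent in each state."""
    if not samples:
        return {}
    counts = Counter(classify(s) for s in samples)
    return {state: 100.0 * n / len(samples) for state, n in counts.items()}


# Invented trace: mostly idle with a short high-frequency burst.
trace = [None, None, 400, 400, 1200, 1200, 800, None, None, None]
profile = residency(trace)
```

With this invented trace, half of the samples fall in WFI / PowerDown, which is the general shape the slide describes for burst-type workloads.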

6

big.LITTLE Technology

“Right Task on the Right Core”

Heterogeneous Computing

2x higher performance vs. LITTLE only

Up to 75% CPU power savings vs. big only

Architecturally Identical Processors

High performance tuned big cores

Low power tuned LITTLE cores

Hardware Coherency

Cache Coherent Interconnect (CCI)

L1 and L2 snooping between clusters

Seamless & Automatic Task Allocation

Up to 40% SoC power savings*

[Diagram labels: big Cluster, LITTLE Cluster, L2 Cache (one per cluster), Cache Coherent Interconnect, Interrupt Control]

* Measured across a set of casual games and common use-cases on an ARM Partner 4xCortex-A15.4xCortex-A7 big.LITTLE device
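
“Right Task on the Right Core” can be pictured as a load-based placement rule. The sketch below is an illustration only and is not the big.LITTLE MP scheduler: the two thresholds, the 0.0 to 1.0 load scale, and the function name are hypothetical. A task moves to the big cluster when its tracked load rises above an upper threshold and returns to the LITTLE cluster when it falls below a lower one; the gap between the thresholds provides hysteresis so that tasks do not bounce between clusters.

```python
# Hypothetical thresholds on a 0.0-1.0 tracked-load scale (assumptions).
UP_THRESHOLD = 0.70    # above this, move the task to a big core
DOWN_THRESHOLD = 0.25  # below this, move the task to a LITTLE core


def place(current_cluster, load):
    """Return the cluster ('big' or 'LITTLE') a task should run on next."""
    if current_cluster == "LITTLE" and load > UP_THRESHOLD:
        return "big"
    if current_cluster == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"
    return current_cluster  # inside the hysteresis band: stay put


# Invented load history for one task: idle, a burst, then idle again.
cluster = "LITTLE"
history = []
for load in [0.10, 0.50, 0.90, 0.60, 0.30, 0.20]:
    cluster = place(cluster, load)
    history.append(cluster)
```

In this run the task stays on LITTLE until the 0.90 burst, remains on big while its load sits between the thresholds, and drops back only once the load falls below 0.25.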

7

Agenda

Introduction to big.LITTLE Technology

Benefits of big.LITTLE Technology

Future big.LITTLE systems

Summary

Questions

8

big.LITTLE MP Software Evolution

Improving Performance and Efficiency

[Figure: software evolution timeline (2012, H1 2013, H2 2013): big.LITTLE Cluster Migration, CPU Migration, Global Task Scheduling (big.LITTLE MP)]

Measured Power and Performance on big.LITTLE Devices

(big.LITTLE MP relative to Cluster Migration)

Power (Lower is Better): Web Browsing -29%, Intensive Gaming -38%

Performance (Higher is Better): Web Browsing +20%, Intensive Gaming +60%

9

big.LITTLE MP

Delivers higher power efficiency

Extends battery life

Improves user experience

Measured Power and Performance on big.LITTLE Devices

(big.LITTLE MP relative to Cluster Migration)

Power (Lower is Better): Web Browsing -29%, Intensive Gaming -38%

Performance (Higher is Better): Web Browsing +20%, Intensive Gaming +60%

10

big.LITTLE MP Improves User Experience (UX)

[Figure: Normalized Jank* (Less is Better), LITTLE only vs. big.LITTLE, for Asphalt 7, DungeonDefenders, and Video Playback; UX improvement labels: 58%, 65%, 47%]

* Measure of variance in frame rate

Measurements conducted on the same big.LITTLE platform

[Figure: DVFS states, Web Browsing with Audio; idle, low, mid, and high frequency residency (0% to 100%) for CPU0 to CPU5 across the LITTLE Cluster and big Cluster]

Short bursts of performance on big cores enable sustained levels of smooth user-experience

LITTLE cores handle background tasks and audio
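
The slide defines jank only as a "measure of variance in frame rate" and does not give the exact formula. As a hedged sketch under that definition, the example below takes jank to be the population variance of sampled frame rates and normalizes it against a baseline run. The sample values and function names are invented for illustration and are not measurements from the slides.

```python
from statistics import pvariance


def jank(frame_rates):
    """Variance of the sampled frame rate (frames per second)."""
    return pvariance(frame_rates)


def normalized_jank(candidate, baseline):
    """Jank of `candidate` relative to `baseline` (baseline = 1.0)."""
    return jank(candidate) / jank(baseline)


# Invented frame-rate samples, not measurements from the slides.
little_only = [30.0, 20.0, 30.0, 20.0]  # frame rate swings widely
big_little = [29.0, 31.0, 29.0, 31.0]   # frame rate stays close to its mean

ratio = normalized_jank(big_little, little_only)
```

A ratio below 1.0 means the candidate run has a steadier frame rate than the baseline, matching the "Less is Better" reading of the chart.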

11

big.LITTLE MP Delivers Higher Power Efficiency

4x4 big.LITTLE MP vs. 4x4 Cluster Migration

35% average improvement in power efficiency across Single-Thread and Multi-Thread workloads

[Figure: power efficiency, big.LITTLE MP vs. Cluster Migration (axis 0.00 to 2.00)]

Frequency residency profile while running Antutu CPU

SoC thermal budget constrains Cortex-A15 cores to lower frequency resulting in lower benchmark performance

A7 cores not running due to cluster migration

Cortex-A15 and Cortex-A7 clusters at peak performance within the thermal budget

[Figure: frequency residency for Cortex-A15 MP4 and Cortex-A7 MP4 under Cluster Migration and big.LITTLE MP; frequency labels: 1.2GHz, 1.3 GHz, 1.7GHz, 1.1GHz, 1.4GHz, 1.2GHz]

12

big.LITTLE MP Extends Battery Life

[Figure: relative battery life on big.LITTLE MP, Cluster Migration vs. big.LITTLE MP (axis 0% to 200%)]

[Figure: DVFS states, Temple Run; idle, low, medium, and high frequency residency (0% to 100%) for A7 CPU0 to A7 CPU3 (LITTLE Cluster) and A15 CPU4 and A15 CPU5 (big Cluster)]

Cores in the big cluster are powered down

Single-thread performance on highly efficient LITTLE cores enables increased power savings

13

big.LITTLE MP Support and Services Available

big.LITTLE MP Software

http://git.linaro.org/gitweb?p=arm/big.LITTLE/mp.git

Linaro Landing Teams for Club and Core Members

Provides Software Support under NDA

Exclusive Landing Teams for each Member company

Services and Support Offered through ARM

Active Assist Design Review – big.LITTLE system

Technical Support & Application Notes

big.LITTLE MP Integration and Tuning Guides

On-site Software Training

14

Agenda

Introduction to big.LITTLE Technology

Benefits of big.LITTLE Technology

Future big.LITTLE systems

Summary

Questions

15

ARMv8-A Enables 64-bit big.LITTLE

Improved performance on big.LITTLE ARMv8

Cortex-A57: Highest performance big CPU in thermal envelope

Cortex-A53: Most energy efficient LITTLE CPU

SpecInt2000 Power vs. Performance*

[Figure: Power (mW) vs. Performance (Spec2000) for Cortex-A15 (ARMv7-A big), Cortex-A7 (ARMv7-A LITTLE), Cortex-A57 (ARMv8-A big), and Cortex-A53 (ARMv8-A LITTLE); annotations: Higher performance at same power, Extended range of efficiency]

*SpecInt2000 on iso-process & 32-bit

16

Extending big.LITTLE MP for Thermal Management

ARM Intelligent Power Allocation (IPA)

IPA Elements:

Proactive temperature control

Real time CPU & GPU performance requests

Power estimation

Dynamic power allocation

Dynamic Allocation by:

•Performance required

•Thermal headroom

[Diagram: power transforms to heat (Tdie, Tskin; SoC, Device); Performance Requests from big, LITTLE, and GPU; Allocated Performance for big, LITTLE, and GPU]
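
The slide lists the elements of IPA (temperature control, performance requests, power estimation, dynamic allocation) without giving an algorithm. The sketch below is one simple way those pieces could fit together and is not ARM's implementation: a proportional controller turns thermal headroom into a power budget, and the budget is split among the big, LITTLE, and GPU actors in proportion to their requested power. All names, temperatures, and milliwatt figures are hypothetical.

```python
def power_budget(temp_c, control_temp_c, sustainable_mw, gain_mw_per_c):
    """Proportional controller: more thermal headroom gives a larger budget."""
    headroom = control_temp_c - temp_c
    return max(0.0, sustainable_mw + gain_mw_per_c * headroom)


def allocate(requests_mw, budget_mw):
    """Split the budget among actors in proportion to their requests.

    No actor is granted more power than it requested.
    """
    total = sum(requests_mw.values())
    if total <= budget_mw:
        return dict(requests_mw)  # no constraint: grant every request
    scale = budget_mw / total
    return {actor: req * scale for actor, req in requests_mw.items()}


# Invented power requests (mW) for a GPU-heavy phase.
requests = {"big": 500.0, "LITTLE": 500.0, "GPU": 3000.0}

# Well below the control temperature: requests are granted in full.
cool = allocate(requests, power_budget(50.0, 70.0, 2000.0, 100.0))
# At the control temperature: the budget shrinks and requests are scaled.
hot = allocate(requests, power_budget(70.0, 70.0, 2000.0, 100.0))
```

In the cool case every actor gets what it asked for; in the hot case the GPU still receives the largest share, but all three actors are scaled back to fit the reduced budget.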

17

Intelligent Power Allocation in Action

Device temperature is below threshold

There are no constraints on power / performance

Every actor runs at max required frequency

[Figure: running frequency vs. time over three consecutive runs of GLB TRex, with max and running frequencies for “big”, “LITTLE”, and GPU; median filtered chart for clarity]

18

Intelligent Power Allocation in Action

High load on GPU & low load on CPU

GPU gets allocated most of the power

[Figure: running frequency vs. time over three consecutive runs of GLB TRex, with max and running frequencies for “big”, “LITTLE”, and GPU; median filtered chart for clarity]

19

Intelligent Power Allocation in Action

High load on CPU & low load on GPU

CPU gets allocated most of the power

[Figure: running frequency vs. time over three consecutive runs of GLB TRex, with max and running frequencies for “big”, “LITTLE”, and GPU; median filtered chart for clarity]

20

Intelligent Power Allocation in Action

Device temperature gets hotter

IPA reduces available power to actors

This maintains temperature control

[Figure: running frequency vs. time over three consecutive runs of GLB TRex, with max and running frequencies for “big”, “LITTLE”, and GPU; median filtered chart for clarity]

21

Intelligent Power Allocation in Action

IPA vs. Traditional (Relative Performance):

1st Run: 13% Improvement

2nd Run: 34% Improvement

3rd Run: 36% Improvement

Average: 28% Improvement

[Figure: running frequency vs. time over three consecutive runs of GLB TRex, with max and running frequencies for “big”, “LITTLE”, and GPU; median filtered chart for clarity]
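
As a quick arithmetic check on this slide, the 28% average is consistent with the three per-run improvement labels, assuming 13%, 34%, and 36% belong to the first, second, and third runs in that order:

```python
# Per-run improvements (%) as labeled on the slide.
runs = [13, 34, 36]
average = sum(runs) / len(runs)  # about 27.67
rounded = round(average)         # rounds to the 28% shown as the average
```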

22

big.LITTLE Mobile 2015

[Block diagram components: Cortex-A57, Cortex-A53, Mali T720 GPU, I/O Coherent Masters, CoreLink CCI-400, GIC-400, MMU-400 (x3), NIC-400 (x2), TZC-400, DMC-400, DRAM (2 * x32 DDR3-1600), Display (x2), Peripherals]

23

ARM big.LITTLE Mobile Roadmap

ARM IP

Present: CCI-400. Future: Next-Gen Cache Coherent Interconnects

Present: Cortex-A57, Cortex-A17, Cortex-A15. Future: Next-Gen High Performance “big” CPUs

Present: Cortex-A53, Cortex-A7. Future: Next-Gen Power Efficient “LITTLE” CPUs

ARM Software

Present: Global Task Scheduling. Future: + Intelligent Power Allocation, + 64-bit Android L Support

24

Agenda

Introduction to big.LITTLE Technology

Benefits of big.LITTLE Technology

Future big.LITTLE systems

Summary

Questions

25

Summary

big.LITTLE is fast becoming the de facto power optimization technology in mobile

big.LITTLE processing technology delivers best-in-class performance and energy efficiency in devices today

Improved user-experience and prolonged battery life measured on real smartphone devices

Devices transitioning to advanced big.LITTLE Technology with additional features and IP support

26

Thank You

