ARM big.LITTLE™ Technology: Gaming, Audio Playback, Mobile Application Workloads, Sustained Performance


  • 1

    ARM® big.LITTLE™ Technology Unleashed An Improved User Experience Delivered

    Govind Wathan Product Specialist

    Cortex®-A Mobile & Consumer CPU Products

  • 2

     Introduction to big.LITTLE Technology

     Benefits of big.LITTLE Technology

     Future big.LITTLE systems

     Summary

     Questions

    Agenda

  • 3

    Mobile Application Workloads

     Mobile users spend much of their time on a range of mobile applications*:

     38% on web browsing and Facebook

     32% on gaming

     16% on audio, video and utility

     Common “building blocks” in workloads:

     Short bursts of high intensity

     Long periods of sustained high intensity

     Low intensity

    [Figure: power over time for Web Browsing, Gaming and Audio Playback, measured on a Quad Cortex-A7 Symmetric Multiprocessing platform]

    * Source: Flurry Analytics

  • 4

    Mobile Application Workloads

    Category 1: Burst of High Intensity Workloads (Example: Web Browsing)

    Category 2: Sustained Performance at Thermal Limit (Example: Castlemaster)

    Category 3: Long-use Low-Intensity Workloads (Example: Audio Playback)

    [Figure: power of each workload category relative to the Sustained Performance Envelope]

     Applications require a mix of performance levels

     Mobile users want a better user experience, but not at the cost of reduced battery life

  • 5

    Mobile Application Workload Profiles

    [Figure: Percentage of Time Spent in DVFS States (High, Mid, Low, Idle, WFI / Power Down) for each workload category, measured on a Quad Cortex-A7 Symmetric Multiprocessing platform]

    Category 1: Burst High Intensity Workloads (Example: Web Browsing)

    Category 2: Sustained Performance at Thermal Limit (Example: Castlemaster)

    Category 3: Long-use Low-Intensity Workloads (Example: Audio Playback)

     Applications require a mix of performance levels

     Mobile users want a better user experience, but not at the cost of reduced battery life
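A DVFS residency profile of the kind charted on this slide can be computed from a per-core trace of states and durations. The sketch below is illustrative only: the trace format, the state names and the sample values are hypothetical and are not taken from the measured platform.

```python
from collections import defaultdict


def dvfs_residency(samples):
    """Return the percentage of total time spent in each DVFS state.

    `samples` is a list of (state, duration_ms) pairs. Both the trace
    format and the state names are assumptions made for illustration.
    """
    totals = defaultdict(float)
    for state, duration_ms in samples:
        totals[state] += duration_ms
    total_time = sum(totals.values())
    if total_time == 0:
        return {}
    return {state: 100.0 * t / total_time for state, t in totals.items()}


# Made-up trace for a single core: mostly idle, with short high-frequency bursts.
trace = [("High", 50), ("Idle", 600), ("Low", 250), ("High", 100)]
residency = dvfs_residency(trace)
```

A burst-type workload would show up as a small "High" share next to a large "Idle" share, while a sustained workload would concentrate its time in the upper states.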

  • 6

     Heterogeneous Computing

     2x higher performance vs. LITTLE only

     Up to 75% CPU power savings vs. big only

     Architecturally Identical Processors

     High performance tuned big cores

     Low power tuned LITTLE cores

     Hardware Coherency

     Cache Coherent Interconnect (CCI)

     L1 and L2 snooping between clusters

     Seamless & Automatic Task Allocation

    big.LITTLE Technology

    “Right Task on the Right Core”

    [Figure: big cluster and LITTLE cluster, each with its own L2 cache, connected through the Cache Coherent Interconnect, alongside an Interrupt Control block]

    Up to 40% SoC power savings*

    * Measured across a set of casual games and common use-cases on an ARM Partner 4x Cortex-A15 + 4x Cortex-A7 big.LITTLE device
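The “Right Task on the Right Core” idea can be pictured as a placement rule driven by how busy a task has recently been. The sketch below is a simplified illustration, not the scheduler's actual code: the function name, the load scale and both threshold values are assumptions. Using separate up and down thresholds avoids bouncing a task between clusters when its load hovers near a single cut-off.

```python
# Hypothetical thresholds on a 0.0 to 1.0 tracked-load scale (assumed values).
UP_THRESHOLD = 0.80    # above this, a task on a LITTLE core moves to a big core
DOWN_THRESHOLD = 0.30  # below this, a task on a big core moves to a LITTLE core


def place_task(current_cluster, load):
    """Return the cluster ("big" or "LITTLE") a task should run on next."""
    if current_cluster == "LITTLE" and load > UP_THRESHOLD:
        return "big"
    if current_cluster == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"
    # Between the two thresholds the task stays where it is (hysteresis).
    return current_cluster
```

With these assumed values, a task at 0.5 load stays on whichever cluster it already occupies, so only clearly heavy or clearly light tasks migrate.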

  • 7

     Introduction to big.LITTLE Technology

     Benefits of big.LITTLE Technology

     Future big.LITTLE systems

     Summary

     Questions

    Agenda

  • 8

    big.LITTLE MP Software Evolution

    Improving Performance and Efficiency

    2012: big.LITTLE Cluster Migration

    H1 2013: CPU Migration

    H2 2013: Global Task Scheduling (big.LITTLE MP)

    [Figure: core usage diagrams for each software model]

    Measured Power and Performance on big.LITTLE Devices (big.LITTLE MP relative to Cluster Migration)

    Power (Lower is Better): Web Browsing -29%, Intensive Gaming -38%

    Performance (Higher is Better): Web Browsing +20%, Intensive Gaming +60%
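The three software models named on this slide differ mainly in which cores the operating system can use at any one time. The sketch below models that difference for a 4+4 system. It is a simplification for illustration: the function name, the core labels and the per-slot demand flags are hypothetical, and real implementations decide on tracked load rather than a boolean.

```python
def schedulable_cores(model, demand_high):
    """List the cores available to run tasks under each software model.

    `demand_high` holds one boolean per core slot (a hypothetical input):
    True means the work on that slot currently needs high performance.
    """
    n = len(demand_high)
    if model == "cluster_migration":
        # Only one cluster is active at a time, chosen for the whole system.
        cluster = "big" if any(demand_high) else "LITTLE"
        return [f"{cluster}{i}" for i in range(n)]
    if model == "cpu_migration":
        # Each LITTLE core is paired with a big core; one of each pair runs.
        cores = []
        for i, high in enumerate(demand_high):
            name = "big" if high else "LITTLE"
            cores.append(f"{name}{i}")
        return cores
    if model == "global_task_scheduling":
        # All cores in both clusters are visible to the scheduler at once.
        return [f"LITTLE{i}" for i in range(n)] + [f"big{i}" for i in range(n)]
    raise ValueError(f"unknown model: {model}")
```

For a single demanding task on slot 1, cluster migration brings up all four big cores, CPU migration switches only that pair, and Global Task Scheduling leaves all eight cores available so the task can go to a big core while the rest of the work stays on LITTLE cores.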

  • 9

    big.LITTLE MP

     Delivers higher power efficiency

     Extends battery life

     Improves user experience

    Measured Power and Performance on big.LITTLE Devices (big.LITTLE MP relative to Cluster Migration)

    Power (Lower is Better): Web Browsing -29%, Intensive Gaming -38%

    Performance (Higher is Better): Web Browsing +20%, Intensive Gaming +60%
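The power and performance deltas on this chart can be combined into a single performance-per-watt ratio. The sketch below is a worked illustration under two assumptions: that the four figures pair with the workloads in the order shown (-29% and +20% for web browsing, -38% and +60% for intensive gaming), and that efficiency is defined as performance divided by power. The slide itself does not state an efficiency figure for these workloads.

```python
def relative_efficiency(perf_change, power_change):
    """Performance-per-power of big.LITTLE MP relative to Cluster Migration.

    Both arguments are fractional changes, e.g. +20% -> 0.20, -29% -> -0.29.
    Efficiency = performance / power is an assumed definition.
    """
    return (1.0 + perf_change) / (1.0 + power_change)


web_browsing = relative_efficiency(0.20, -0.29)      # 1.20 / 0.71
intensive_gaming = relative_efficiency(0.60, -0.38)  # 1.60 / 0.62
```

Under those assumptions the ratio comes out at roughly 1.7x for web browsing and roughly 2.6x for intensive gaming.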

  • 10

    big.LITTLE MP Improves User Experience (UX)

    [Figure: Normalized Jank* (Less is Better), LITTLE only vs. big.LITTLE]

    UX improvement: Asphalt 7 58%, Dungeon Defenders 65%, Video Playback 47%

    * Measure of variance in frame rate

    Measurements conducted on the same big.LITTLE platform

    [Figure: DVFS states: Web Browsing with Audio. Share of time (0% to 100%) that CPU0 to CPU5, across the LITTLE cluster and big cluster, spend in Idle, Low Frequency, Mid Frequency and High Frequency states]

    Short bursts of performance on big cores enable sustained levels of smooth user experience

    LITTLE cores handle background tasks and audio
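The slide defines jank as a measure of variance in frame rate. One way to compute such a figure from recorded frame times, and to normalize it against a baseline run, is sketched below. The function names, the use of population variance and the sample frame times are all assumptions; the exact metric behind the chart is not given on the slide.

```python
from statistics import pvariance


def frame_rate_variance(frame_times_ms):
    """Variance of the instantaneous frame rate (frames per second)."""
    rates = [1000.0 / t for t in frame_times_ms]
    return pvariance(rates)


def normalized_jank(candidate_ms, baseline_ms):
    """Jank of a candidate run relative to a baseline run (lower is better)."""
    return frame_rate_variance(candidate_ms) / frame_rate_variance(baseline_ms)


# Made-up frame times: the baseline stalls badly on some frames, the candidate less so.
baseline = [16, 40, 16, 50]
candidate = [16, 20, 16, 20]
ratio = normalized_jank(candidate, baseline)
```

A ratio below 1.0 means the candidate run has a steadier frame rate than the baseline.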

  • 11

    big.LITTLE MP Delivers Higher Power Efficiency

    4x4 big.LITTLE MP vs. 4x4 Cluster Migration

    [Figure: Power Efficiency of Cluster Migration vs. big.LITTLE MP (axis 0.00 to 2.00)]

     35% average improvement in power efficiency across Single-Thread and Multi-Thread workloads

    [Figure: Frequency residency profile while running Antutu CPU, for Cortex-A15 MP4 and Cortex-A7 MP4 under big.LITTLE MP and Cluster Migration; labelled frequencies: 1.2GHz, 1.3 GHz, 1.7GHz, 1.1GHz, 1.4GHz, 1.2GHz; A7 cores not running due to cluster migration]

     SoC thermal budget constrains Cortex-A15 cores to lower frequency, resulting in lower benchmark performance

     Cortex-A15 and Cortex-A7 clusters at peak performance within the thermal budget

  • 12

    big.LITTLE MP Extends Battery Life

    [Figure: Relative battery life on big.LITTLE MP, Cluster Migration vs. big.LITTLE MP (axis 0% to 200%)]

    [Figure: DVFS states: Temple Run. Share of time (0% to 100%) that A7 CPU0 to CPU3 (LITTLE cluster) and A15 CPU4 and CPU5 (big cluster) spend in idle, low frequency, medium frequency and high frequency states]

    Cores in the big cluster are powered down

    Single-thread performance on highly efficient LITTLE cores enables increased power savings

  • 13

     big.LITTLE MP Software

     http://git.linaro.org/gitweb?p=arm/big.LITTLE/mp.git

     Linaro Landing Teams for Club and Core Members

     Provides Software Support under NDA

     Exclusive Landing Teams for each Member company

     Services and Support Offered through ARM

     Active Assist Design Review – big.LITTLE system

     Technical Support & Application Notes

     big.LITTLE MP Integration and Tuning Guides

     On-site Software Training

    big.LITTLE MP Support and Services Available


  • 14

    Agenda

     Introduction to big.LITTLE Technology

     Benefits of big.LITTLE Technology

     Future big.LITTLE systems

     Summary

     Questions

  • 15

     Improved performance on big.LITTLE ARMv8

     Cortex-A57: Highest performance big CPU in thermal envelope

     Cortex-A53: Most energy efficient LITTLE CPU

    ARMv8-A Enables 64-bit big.LITTLE