GreenDroid Mobile Application Processor - Intranet...

GreenDroid Mobile Application Processor

Morici Simone, Spiriti Emanuele

The main scenario

Key technological problem: the utilization wall
The percentage of transistors that can be switched at full frequency decreases because of power constraints
The part of the chip that cannot be switched is called dark silicon

Utilization wall

Before 2005, threshold voltage and supply voltage scaled with new process generations
Nowadays this is impossible because of leakage-related limits
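
The scaling argument can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the classical scaling rules, which the slide does not spell out: for a linear scaling factor s per generation, transistor count grows as s², switching frequency as s, and per-transistor capacitance shrinks as 1/s, while supply voltage shrinks as 1/s only as long as voltage scaling is possible. The function and parameter names are illustrative.

```python
def full_chip_power(s: float, voltage_scales: bool) -> float:
    """Relative power needed to switch every transistor at full frequency
    after scaling by linear factor s (assumed classical scaling rules)."""
    transistors = s ** 2          # more devices in the same area
    frequency = s                 # faster switching
    capacitance = 1.0 / s         # smaller devices
    vdd = 1.0 / s if voltage_scales else 1.0
    # Dynamic power per transistor is proportional to C * Vdd^2 * f.
    return transistors * capacitance * vdd ** 2 * frequency


def utilization(s: float, voltage_scales: bool) -> float:
    """Fraction of the chip usable at full frequency under a fixed power budget."""
    return min(1.0, 1.0 / full_chip_power(s, voltage_scales))


if __name__ == "__main__":
    for s in (1.0, 1.4, 2.0):
        print(s, utilization(s, True), utilization(s, False))
```

Under these assumptions the power stays constant while voltage scales, but once the voltage is fixed the usable fraction of the chip falls as 1/s² with every generation.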

The Idea

In 2005 the industry shifted from single-threaded to multicore processors
But the dark silicon area is still increasing
Dark silicon gets cheaper and cheaper
The power budget gets exponentially more valuable
Trade this low-cost resource to increase energy efficiency

Brief description of GreenDroid

45 nm multicore prototype
It targets the Android software stack
Automatically generated "conservation cores" (c-cores from now on) are used to reduce energy consumption (instead of maximizing performance)
It tries to solve the utilization wall and, consequently, the dark silicon issues
Patching support to accommodate software updates

The architecture 1/2

We have an array/matrix of tiles
Each tile is made up of:
  ◦ General-purpose CPU
  ◦ OCN (On-Chip Network, point-to-point mesh interconnection)
  ◦ L1 data cache
  ◦ Specialized interface
  ◦ 8 to 15 c-cores per tile
Each tile is unique.

Architecture 2/2

Each c-core is coupled to the CPU via the L1 cache and the specialized interface
This lets the CPU:
  ◦ Pass arguments to c-cores
  ◦ Perform context switches
  ◦ Reconfigure the hardware

OCN used for memory traffic and synchronization.

The architecture

Architecture details

The CPU is a 32-bit, 7-stage, in-order pipeline and has an FPU and a multiplier
The frequency of 1.5 GHz is set by the cache access time
The L1 caches of all the tiles together provide a larger L2 cache
Coherence is provided by lightweight L2 directories at the DRAM interfaces, which use the L1 caches as victim caches
c-cores are power gated, otherwise the power budget is exceeded

Execution Model

Execution starts on one of the CPUs
When the CPU recognizes hot code, it transfers execution to the appropriate c-core
Execution moves from tile to tile according to the availability of the c-cores and their specialization
Data associated with a given c-core usually resides in the associated L1 cache
C-cores are largely transparent to developers.
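
The placement decision described above can be sketched as follows. This is an illustrative model, not the GreenDroid runtime: the `Tile` class, the `place` function, the function names used as keys, and the policy of checking the current tile first are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    """One tile: a general-purpose CPU plus the c-cores attached to it."""
    name: str
    c_cores: set[str] = field(default_factory=set)  # hot functions with a c-core here
    busy: set[str] = field(default_factory=set)     # c-cores currently in use


def place(function: str, tiles: list[Tile], current: Tile) -> tuple[Tile, str]:
    """Return (tile, engine) for the next call to `function`.

    Hot code goes to a free c-core specialized for it, moving to another
    tile if needed; everything else stays on the current tile's CPU.
    """
    # Assumption of this sketch: try the current tile before the others.
    for tile in [current] + [t for t in tiles if t is not current]:
        if function in tile.c_cores and function not in tile.busy:
            return tile, "c-core"
    return current, "cpu"


if __name__ == "__main__":
    t0 = Tile("t0", {"memcpy"})
    t1 = Tile("t1", {"decode_jpeg"})
    print(place("decode_jpeg", [t0, t1], t0))   # moves to t1's c-core
    print(place("parse_config", [t0, t1], t0))  # cold code stays on the CPU
```

The caller never names a tile or a c-core, which is one way to read the claim that c-cores are largely transparent to developers.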

The Android stack

The hot code resides mainly in:
  ◦ Commonly used applications (e.g. web browser, mail)
  ◦ Application libraries
  ◦ Dalvik virtual machine
  ◦ A few locations in the kernel
95% of the code is covered by c-core execution, and 72% is due to the virtual machine

Main ideas behind c-cores

We must profile the code
A specialized circuit (c-core) tries to mirror the hot code, adding extra logic that allows patching
Cold code runs on the CPU
A specialized compiler is responsible for recognizing which code matches the c-cores
We also have a runtime system that manages the allocation of c-cores according to availability
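
As an illustration of the profiling step, the following minimal sketch picks the hot code from an execution profile. The selection rule (take the most-executed functions until a target share of execution is covered), the function names, and the counts are assumptions made for the example, not data from GreenDroid.

```python
def select_hot_code(profile: dict[str, int], coverage: float) -> list[str]:
    """Pick the most-executed functions until `coverage` of all profiled
    execution is accounted for; whatever is left is cold code."""
    total = sum(profile.values())
    hot: list[str] = []
    covered = 0
    for name, count in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if total == 0 or covered / total >= coverage:
            break
        hot.append(name)
        covered += count
    return hot


if __name__ == "__main__":
    # Hypothetical execution counts, for illustration only.
    profile = {"interp_loop": 700, "memcpy": 200, "draw": 60, "init": 40}
    print(select_hot_code(profile, 0.95))
```

Functions returned by `select_hot_code` would be candidates for c-cores; the rest would keep running on the CPU.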

C-cores details

Data path
  ◦ Functional units (adders, shifters) to execute instructions
  ◦ Multiplexers to implement control decisions
  ◦ Registers
Control unit
  ◦ Implements the state machine that mirrors the Control Flow Graph
  ◦ Tracks branch outcomes (computed in the data path) to determine which hardware block must be active
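
The split between data path and control unit can be illustrated with a small software simulation. This is a sketch under assumptions: the summation loop, the block names, and the dictionary encoding of the state machine are invented for the example and are not output of the GreenDroid toolchain.

```python
def run_c_core(n: int) -> int:
    """Simulate a c-core for: s = 0; for (i = 0; i < n; i++) s += i; return s."""
    regs = {"i": 0, "s": 0, "n": n}           # data-path registers

    # Data path: one operation per basic block. A block returns its branch
    # outcome, or None when it has a single successor.
    def entry():
        regs["i"], regs["s"] = 0, 0

    def cond():
        return regs["i"] < regs["n"]           # comparator

    def body():
        regs["s"] += regs["i"]                 # adder
        regs["i"] += 1                         # adder

    datapath = {"entry": entry, "cond": cond, "body": body}

    # Control unit: one state per basic block, edges copied from the CFG.
    next_state = {
        ("entry", None): "cond",
        ("cond", True): "body",
        ("cond", False): "exit",
        ("body", None): "cond",
    }

    state = "entry"
    while state != "exit":
        outcome = datapath[state]()            # activate only the current block
        state = next_state[(state, outcome)]   # branch outcome selects the edge
    return regs["s"]


if __name__ == "__main__":
    print(run_c_core(5))
```

The `next_state` table plays the role of the control unit: it holds no arithmetic, only the shape of the control flow graph.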

A graphical representation

Synthesizing c-cores

The c-cores are not designed by hand. A C/C++-to-Verilog toolchain is used to convert the code into hardware
The toolchain identifies the main loops and functions given a target workload
The CFG and the data control flow graph are created

The compiler generates:
  ◦ The Verilog code for the control unit
  ◦ The data path that closely mimics the representations
  ◦ Function stubs that applications can call in place of the original functions to invoke the hardware
  ◦ A description of the c-core, used when we update the function
Small changes in the source code correspond to small changes in the hardware
Since the target is to minimize energy consumption and not to achieve better performance, we can exploit many more C constructs than when we try to extract more parallelism from the code

Patching

Since the software evolves, the c-cores must adapt too
  ◦ Redefine compile-time constants in hardware
  ◦ Exception mechanism that transfers control back and forth between the CPU and the c-cores
The area of the chip is increased
But the experiments show that the adaptation process can hold for a decade
Remember that the mean lifecycle of a smartphone is 3 years
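
The two patching mechanisms listed above can be sketched in software. This is a simplified model under assumptions: the `CCore` class, its `scale` and `limit` constants, and the negative-input case are hypothetical, and the whole call falls back to the CPU, whereas the slide describes control moving back and forth between the CPU and the c-core.

```python
class PatchException(Exception):
    """Raised when the c-core reaches behaviour it cannot handle in hardware."""


class CCore:
    def __init__(self, scale: int, limit: int):
        # Values that were compile-time constants in the original source,
        # held in configurable registers so a patch can redefine them.
        self.constants = {"scale": scale, "limit": limit}

    def patch(self, name: str, value: int) -> None:
        self.constants[name] = value

    def run(self, x: int) -> int:
        if x < 0:
            # Hypothetical path added by a later software version:
            # it does not exist in the hardware.
            raise PatchException
        return min(x * self.constants["scale"], self.constants["limit"])


def software_version(x: int, scale: int, limit: int) -> int:
    """The updated software, which the CPU can always execute."""
    if x < 0:
        return 0
    return min(x * scale, limit)


def call(core: CCore, x: int) -> int:
    try:
        return core.run(x)
    except PatchException:
        # Control goes back to the CPU for the part the c-core lacks.
        c = core.constants
        return software_version(x, c["scale"], c["limit"])


if __name__ == "__main__":
    core = CCore(scale=2, limit=100)
    print(call(core, 10))
    core.patch("scale", 3)   # a software update changed the constant
    print(call(core, 10))
    print(call(core, -5))    # handled on the CPU via the exception path
```

A changed constant costs only a register write, while new control flow costs a trip back to the CPU, so the c-core stays useful as long as most of the hot path is unchanged.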

Accelerators

Most specialized hardware is used to achieve better performance
It needs simple code that exposes parallelism and a simple way to create a circuit that is neither costly nor complex
In GreenDroid, accelerators are mainly used to reduce energy consumption
More code becomes suitable for a c-core that executes it
First we accelerate the code that can be parallelized, then we take the remaining code and try to map as much of it as possible to c-cores

High-level synthesis tools

Since the code is different and less parallelizable, we must have a completely automatic toolchain
We can't have a user-aided process because
  ◦ Code is too large
  ◦ Code is constantly evolving
  ◦ HLS supports I/O and system calls
  ◦ Parts of the kernel are also translated

Conclusions

56% less energy consumption due to the absence of fetch/decode and register file on c-cores
35% energy savings come from the specialization of the code
These are great results, and since the utilization wall problem grows exponentially, this way of thinking must be considered in every future architecture, both desktop and mobile
