Optimization Of Power Consumption For An ARM7-BASED Multimedia Handheld Device
Hoseok Chang; Wonchul Lee; Wonyong Sung
Circuits and Systems, 2003. ISCAS '03. Proceedings of the 2003 International Symposium on , Volume: 5 , 25-28 May 2003 Pages:V-105 - V-108 vol.5
Presenter: Chin-Chi Hu
112/04/18 Chin-Chi Hu 2/20
Abstract
We have developed a multimedia handheld educational device and optimized the current consumption not only by employing several software optimization techniques but also by using a dynamic clock frequency scaling (DFS) scheme. Although the ARM7 CPU employed does not support operating voltage scaling, controlling the operating frequency helps reduce the current consumption in the idle time and results in up to 25% power reduction at the system level. The CPU operation frequency is determined by profiling the multimedia program components, which include LZW (Lempel-Ziv-Welch) image decompression, MP3 audio decoding, CELP-based speech decoding, speech recognition and ADPCM. In particular, it is shown that the time for LZW decompression is proportional to the image size rather than the size of the compressed file. The CPU load becomes almost full, between 80 and 95%, after applying DFS.
What’s the problem?
Multi-tasking operating system and dynamic frequency scaling
  analyze the current consumption of the system
Software optimization techniques
  improve the software to reduce the number of instructions and clock cycles
CPU load estimation
  estimate the CPU load for executing each software component
Results and optimization
Introduction
A low-power multimedia handheld device that uses only two AA-size batteries
The DSP programs needed to be optimized
  MP3 decoding
  LZW (Lempel-Ziv-Welch) decompression
  speech recognition
Aspects of the optimization
  ARM7-specific feature optimization of the software components
  lowering the CPU clock frequency to minimize the idle time
System architecture
Speaking partner
  ARM7TDMI 60 MHz CPU
  8 KB cache
  graphic LCD controller
  synchronous DRAM controller
  IIS interface
  8 channels of 10-bit ADC
  128 KB NOR flash for system ROM
  NAND flash and SMC (smart media card) for program ROM
  SSFDC (solid state floppy disk card) and USB for read/write
Current consumption
The CPU draws some power even when the CPU load is very small and the CPU is mostly in the idle state
It is advantageous for power reduction to use the lowest possible clock frequency
An estimate of the minimum clock frequency for a real-time implementation is needed
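The estimate of the minimum clock frequency can be sketched as follows. This is a minimal illustration, not the authors' implementation: the list of selectable clock frequencies and the 95% load ceiling are assumptions made for the example, and only the 60 MHz reference clock comes from the slides. The sketch also assumes that the cycle count of a task does not change with the clock, which ignores memory stalls.

```python
REFERENCE_MHZ = 60.0  # clock at which the loads were profiled (from the slides)

# Hypothetical set of selectable clock frequencies in MHz (an assumption).
AVAILABLE_MHZ = [7.5, 15.0, 30.0, 60.0]


def min_clock_mhz(load_at_reference, max_load=0.95):
    """Return the lowest available clock that keeps the CPU load <= max_load.

    load_at_reference is the fraction of CPU time used at REFERENCE_MHZ.
    """
    # Millions of cycles per second the task needs to run in real time.
    demand_mhz = load_at_reference * REFERENCE_MHZ
    for clock in sorted(AVAILABLE_MHZ):
        if demand_mhz <= clock * max_load:
            return clock
    raise ValueError("workload does not fit at the highest available clock")


def load_at(load_at_reference, clock_mhz):
    """CPU load after scaling the clock, assuming an unchanged cycle count."""
    return load_at_reference * REFERENCE_MHZ / clock_mhz
```

With these assumed frequency steps, a task that loads the CPU by 10% at 60 MHz would run at 7.5 MHz, where its load rises to about 80%.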
Current consumption
This figure shows that the dynamic frequency scaling scheme is more efficient than the constant frequency operation with idle state when the load condition is low
Current consumption
Current consumption at each hardware block (CPU load is 10%)
Software optimization
The ARM7TDMI processor has characteristics suited to implementing DSP algorithms
  large number of registers
  most of the instructions can be executed conditionally
  32-bit barrel shifter
  block load and store instructions are supported
The ARM7TDMI processor has a relatively simple data path, where the hardware multiplier has a precision of only 32*8 bits
Software optimization
MP3 decoding algorithm
  C language based high-level optimization
  assembly language based low-level optimization
  optimized by the conditional execution of the ARM7TDMI processor
Software optimization
Block data transfer
  used to load (LDM) or store (STM) any subset of the currently visible registers from/to sequential memory
Block data transfer of 15 32-bit registers from registers to sequential memory
  14S+2N+1I cycles
From registers to memory using the store instruction (STR)
  (1S+1N+1I)*15 cycles
S: sequential cycles
N: non-sequential cycles
I: internal cycles
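The two cycle formulas above can be compared with a short calculation. The sketch assumes that every S, N and I cycle costs one clock, which is a simplification made here for illustration; on real hardware the cost of each cycle type depends on the memory system.

```python
def total_cycles(s, n, i, cost_s=1, cost_n=1, cost_i=1):
    """Weighted sum of sequential (S), non-sequential (N) and internal (I) cycles.

    The per-type costs default to 1 clock each, which is an assumption.
    """
    return s * cost_s + n * cost_n + i * cost_i


REGISTERS = 15

# Block transfer of 15 registers, as given on the slide: 14S + 2N + 1I.
block_cycles = total_cycles(s=14, n=2, i=1)

# One STR per register, as given on the slide: (1S + 1N + 1I) * 15.
single_cycles = REGISTERS * total_cycles(s=1, n=1, i=1)

print(block_cycles, single_cycles)  # prints "17 45"
```

Under this equal-cost assumption the block transfer needs 17 clocks against 45 for individual stores. Raising the cost of N cycles widens the gap, because the single-store sequence pays one N cycle per register.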
Software optimization
Optimization for speech recognition
  16-bit multiplications instead of 32-bit multiplications
    8% reduction in cycle time
  several software optimization techniques employed
    loop fusion
    loop unrolling
    post increment/decrement conversion
    total execution time is reduced to about 30~45%
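Loop fusion, the first technique listed above, can be illustrated with a small sketch. The two frame features computed here (energy and peak amplitude) and the function names are hypothetical and are not taken from the authors' speech recognizer. The example shows only the structural change: two passes over the same data become one, so loop overhead and memory reads are paid once.

```python
def features_separate(frame):
    """Two separate loops over the same frame (before fusion)."""
    energy = 0
    for sample in frame:
        energy += sample * sample
    peak = 0
    for sample in frame:
        if abs(sample) > peak:
            peak = abs(sample)
    return energy, peak


def features_fused(frame):
    """One loop computing both features (after fusion)."""
    energy = 0
    peak = 0
    for sample in frame:
        energy += sample * sample
        if abs(sample) > peak:
            peak = abs(sample)
    return energy, peak
```

Both functions return the same result for any frame. Loop unrolling and post increment/decrement conversion act at the C or assembly level and are not shown here.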
CPU load estimation
The load for MP3 decoding depends on the bit rate and the sampling clock frequency
CPU load at 60 MHz
  56 kbps, 22.05 kHz: 10%
  32 kbps, 22.05 kHz: 9.6%
  32 kbps, 16 kHz: 7%
The load for CELP decoding is almost constant
  18% of the 60 MHz CPU load
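The load figures above can be converted into an equivalent clock demand per component. The sketch below uses only the percentages listed on this slide; the conversion assumes that the load scales linearly with the clock frequency, which is a simplification.

```python
REFERENCE_MHZ = 60.0  # clock at which the loads were measured

# CPU load in percent at 60 MHz, copied from the slide.
LOAD_PERCENT_AT_60MHZ = {
    "MP3 56 kbps, 22.05 kHz": 10.0,
    "MP3 32 kbps, 22.05 kHz": 9.6,
    "MP3 32 kbps, 16 kHz": 7.0,
    "CELP decoding": 18.0,
}


def demand_mhz(load_percent):
    """Clock demand in MHz implied by a load measured at REFERENCE_MHZ."""
    return load_percent / 100.0 * REFERENCE_MHZ


# Demand per component, rounded to two decimals for readability.
DEMAND_MHZ = {
    name: round(demand_mhz(percent), 2)
    for name, percent in LOAD_PERCENT_AT_60MHZ.items()
}

for name, mhz in DEMAND_MHZ.items():
    print(f"{name}: {mhz} MHz")
```

Under this linear assumption, MP3 decoding at 56 kbps needs about 6 MHz and CELP decoding about 10.8 MHz, far below the 60 MHz maximum.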
CPU load estimation
Processing time of LZW according to the number of pixels
Processing time of LZW according to the compressed data size
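A decoder sketch suggests one plausible reason why LZW processing time follows the number of pixels rather than the compressed size. This is a textbook LZW decoder over a list of integer codes, not the authors' implementation; variable-width code packing and clear codes are omitted. Every decoded byte is copied to the output once, so the copying work grows with the decompressed size even when few codes are read.

```python
def lzw_decode(codes):
    """Decode a list of LZW codes (initial alphabet: byte values 0..255)."""
    if not codes:
        return b""
    table = {value: bytes([value]) for value in range(256)}
    next_code = 256
    previous = table[codes[0]]
    output = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code not yet in the table: previous string plus its first byte.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output += entry  # cost proportional to the number of output bytes
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(output)
```

For example, the four codes `[65, 66, 256, 258]` decode to the seven bytes `ABABABA`; a highly repetitive image yields few codes but still requires one write per pixel.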
CPU load estimation
Execution time prediction of each software component
Experimental result
No change of the CPU clock frequency, which would be a more aggressive power optimization approach at the cost of paying the delay for PLL relocking
Conclusion
A dynamic frequency scaling scheme is employed to reduce the CPU power consumption; it shows that a system power saving of 20% can be achieved
The power analysis shows that the current consumed by the DRAM is almost equal to that of the CPU core, which means that reducing cache misses is most important for lowering power consumption
The current can be further reduced without any significant change in the power reduction algorithm
  employ a CPU that supports dynamic voltage scaling (Intel’s XScale)