The BA20 Processor: Responding to IoT and Wearable Device Energy Challenges

Bill Finch, Sr. Vice President, CAST, Inc.

[email protected] • www.cast-inc.com

Linley Tech Processor Conference, October 22–23, 2014


DESCRIPTION

Announcing a new PipelineZero 32-bit processor IP core designed to achieve fantastic energy and processing efficiency for wearables, IoT sensors, and other mobile devices. Presented at the Linley Tech Processor Conference, October 22, 2014. Learn more at: http://www.cast-inc.com/news/2014-10-22-new-ba20-processor-ip-features-zero-stage-pipeline-for-energy-and-performance-efficiency


Slide 2

Announcing the BA20 MPU

Energy-Optimized 32-bit Embedded Processor
  PipelineZero™ Architecture
  Code-Dense ISA, Power Management, Full Ecosystem
  IP core in RTL or FPGA netlist

Provided by CAST, Inc.
  20 years in IP
  BA2x Family, 8051s, GPUs, Compression, more

Developed by Beyond Semiconductor
  32-bit RISC/DSP experts

Slide 3

Keys to a Low-Power µP

Consume as little energy as possible while idle

Leakage proportional to area

Use as little memory as possible

Active and idle memory system power can be > CPU power

Complete tasks with the lowest possible energy cost

Small silicon footprint

High code density

High CoreMarks per MHz & per µW

Slide 4

Performance Realization

Process evolution has made processing power more than adequate for many applications

3 pipeline stages run > 400MHz X CoreMarks (40nm)

5 pipeline stages > 800MHz X CoreMarks (40nm)

but

Some applications just need high performance efficiency (CM/MHz), not high frequency

3-5 stage pipeline processors are overkill!
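
The difference between efficiency and frequency comes down to simple arithmetic: a CoreMark score is efficiency (CM/MHz) multiplied by clock rate (MHz). The Python sketch below computes the clock a core would need to reach a target score. The target of 100 CoreMarks is a hypothetical workload and the function name is illustrative. The two efficiency values are the 2.42 and 3.41 CM/MHz figures quoted elsewhere in this deck.

```python
def required_mhz(target_coremarks: float, cm_per_mhz: float) -> float:
    """Clock rate (MHz) needed to reach a CoreMark score at a given CM/MHz."""
    if cm_per_mhz <= 0:
        raise ValueError("cm_per_mhz must be positive")
    return target_coremarks / cm_per_mhz


TARGET_COREMARKS = 100.0  # hypothetical workload, not a figure from the deck

for efficiency in (2.42, 3.41):
    mhz = required_mhz(TARGET_COREMARKS, efficiency)
    print(f"{efficiency:.2f} CM/MHz needs {mhz:.1f} MHz")
```

Under these assumptions the more efficient core reaches the same score at a clock roughly 30% lower, which is the headroom the later slides trade for energy savings.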

Slide 5

How Can We Do Better?

Keep the relevant best practices
  Variable length ISA for better code density
  Advanced power-management support including frequency scaling
  Interrupt handling efficiency, debugging facilities, optional arithmetic acceleration, memory protection schemes …

Re-invent the basic architecture
  What about those “old fashioned” non-pipelined CPUs?

Slide 6

Typical 3-Stage RISC Pipeline

Hazards Limit Performance
  Data & Structural Hazards: execution delay
  Branch Hazards: branch target delay

Pre-Fetching Wastes Energy when a Branch is Taken

Pipeline Registers & Hazards Resolution Increase Area

[Figure: pipeline diagram with stages labeled (Pre)Fetch, Decode, Execute, Write]

Slide 7

BA20 PipelineZero Approach

Lack of Hazards = Higher Performance
  No Data Hazards (execution and write on same cycle)
  No Structural Hazards (single-issue/no pipeline)
  No Branch Hazards (branch resolved & next fetch address in Execute)

Shorter Branch Shadow = Less Energy Waste

Fewer Pipeline Registers & Simpler Control = Smaller Area

[Figure: diagram with stages labeled Fetch, Decode, Execute, Write]
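
A toy cycle-count model shows how branch hazards cost both cycles and fetch energy. The Python sketch below charges one cycle per instruction plus a fixed penalty per taken branch. The program length, branch rate, and penalty values are hypothetical, and the model ignores data hazards and memory wait states. It does not describe the BA20 or any other specific core.

```python
def total_cycles(instructions: int, taken_branches: int, branch_penalty: int) -> int:
    """Toy model: one cycle per instruction plus a fixed penalty per taken branch."""
    if min(instructions, taken_branches, branch_penalty) < 0:
        raise ValueError("counts must be non-negative")
    if taken_branches > instructions:
        raise ValueError("more taken branches than instructions")
    return instructions + taken_branches * branch_penalty


INSTRUCTIONS = 1000    # hypothetical program length
TAKEN_BRANCHES = 150   # hypothetical: 15% of instructions are taken branches

for penalty in (1, 0):  # hypothetical penalties, in cycles per taken branch
    cycles = total_cycles(INSTRUCTIONS, TAKEN_BRANCHES, penalty)
    wasted_fetches = TAKEN_BRANCHES * penalty  # fetches discarded after a taken branch
    print(f"penalty={penalty}: {cycles} cycles, "
          f"CPI={cycles / INSTRUCTIONS:.2f}, wasted fetches={wasted_fetches}")
```

In this model every cycle of branch penalty is also an instruction fetch that gets thrown away, so a shorter branch shadow reduces both run time and wasted memory accesses.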

Slide 8

BA20: M4 Efficiency, M0+ Size

[Figure: chart of CoreMarks/MHz against silicon area at 40 nm, each axis marked Better/Worse]

Core          Pipeline stages   Area (mm², 40 nm)   CoreMarks/MHz
Cortex®-M4    3–4               0.04                3.40
Cortex®-M0+   2                 0.009               2.42
BA20          0 (1)             0.01                3.41
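
One way to combine the two axes is efficiency per unit area. The Python sketch below divides each core's CoreMarks/MHz by its 40 nm area, using only the figures quoted on this slide. The combined metric is an illustration and does not appear in the original deck.

```python
# (area in mm² at 40 nm, CoreMarks/MHz) as quoted on the slide
CORES = {
    "Cortex-M4": (0.04, 3.40),
    "Cortex-M0+": (0.009, 2.42),
    "BA20": (0.01, 3.41),
}


def cm_per_mhz_per_mm2(area_mm2: float, cm_per_mhz: float) -> float:
    """Illustrative combined metric: CoreMarks/MHz delivered per mm² of silicon."""
    return cm_per_mhz / area_mm2


for name, (area, efficiency) in CORES.items():
    print(f"{name}: {cm_per_mhz_per_mm2(area, efficiency):.1f} CM/MHz per mm²")
```

On these numbers the BA20 comes out at about 341 CM/MHz per mm², against roughly 269 for the Cortex-M0+ and 85 for the Cortex-M4.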

Slide 9

How does Higher Performance Lead to Lower Energy?

Do more in less time

Sleep for a longer time

Slide 10

How does Higher Performance Lead to Lower Energy?

Allows lower clock rates
  Reduces clock tree and CPU power when active
  Enables use of HVT cells and a smaller implementation, both of which lower leakage power
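
The "do more in less time, sleep longer" argument can be sketched as a duty-cycle energy model: energy per period is active power times active time plus sleep power times the remaining time. In the Python sketch below, the workload, clock rate, power figures, and period are all hypothetical. It also assumes both cores draw the same active power at the same clock, which is a simplification. Only the two CM/MHz values come from this deck.

```python
def energy_per_period_uj(work_iterations: float, cm_per_mhz: float, f_mhz: float,
                         p_active_uw: float, p_sleep_uw: float,
                         period_s: float) -> float:
    """Energy (µJ) for one wake/compute/sleep period of a periodic task."""
    active_s = work_iterations / (cm_per_mhz * f_mhz)  # CoreMark iterations / (iterations per second)
    if active_s > period_s:
        raise ValueError("work does not fit in the period at this clock rate")
    return p_active_uw * active_s + p_sleep_uw * (period_s - active_s)


# Hypothetical task: 10 CoreMark iterations of work every second at 10 MHz,
# 500 µW while active and 1 µW while asleep.
for efficiency in (2.42, 3.41):
    e = energy_per_period_uj(10, efficiency, 10.0, 500.0, 1.0, 1.0)
    print(f"{efficiency:.2f} CM/MHz: {e:.1f} µJ per period")
```

With these made-up numbers the more efficient core finishes sooner and spends more of each period asleep, so its energy per period is lower even though the clock rate is the same.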

Slide 11

Best Practices: Code Density

BA2 ISA developed to take advantage of state-of-the-art compilers

Variable length instruction encoding
  16-bit, 24-bit, 32-bit and 48-bit
  Compiler chooses smallest suitable encoding
  Yields denser code than fixed-length ISAs

32 General Purpose Registers mean fewer load/store operations

Load/store ~25% of code for typical programs on RISC CPUs
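
The effect of variable-length encoding on code size can be estimated from an instruction mix. The Python sketch below totals the bytes for a mix over the four encoding lengths listed above and compares it with a fixed 32-bit encoding of the same instruction count. The mix is entirely hypothetical and is not measured BA2 data. Assuming equal instruction counts for both ISAs is also a simplification.

```python
ENCODING_BITS = (16, 24, 32, 48)  # encoding lengths listed on the slide


def code_bytes(mix: dict[int, int]) -> int:
    """Total code size in bytes for a mix of {encoding length in bits: instruction count}."""
    for bits in mix:
        if bits not in ENCODING_BITS:
            raise ValueError(f"unsupported encoding length: {bits}")
    return sum((bits // 8) * count for bits, count in mix.items())


# Hypothetical mix of 1000 instructions; NOT measured BA2 data.
HYPOTHETICAL_MIX = {16: 500, 24: 300, 32: 150, 48: 50}

variable_size = code_bytes(HYPOTHETICAL_MIX)
fixed_size = 4 * sum(HYPOTHETICAL_MIX.values())  # every instruction 32 bits wide
print(f"variable-length: {variable_size} bytes, fixed 32-bit: {fixed_size} bytes")
```

For this made-up mix the variable-length program takes 2800 bytes against 4000 bytes for the fixed-length one. Real savings depend on how often the compiler can use the short encodings.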

Slide 12

BA2 Code Size Advantage

[Figure: bar chart of CSiBe Benchmark code size for MIPS, PPC, ARM, Thumb2, and BA22; vertical axis from 20000 to 45000]

Provides best-in-class code density and enables high-performance implementations

Slide 13

Code Density Benefits

Smaller FLASH, ROM, RAM for code storage

Fewer accesses to instruction memory

Slide 14

BA2 Code Size Benefits

Reduces memory size requirements and resulting product cost

Beken Chooses BA22 Processor to Satisfy Tight Constraints in New Mobile Bluetooth Audio Chip

“Beken’s evaluation determined that their program code for the BA22 core fits in a 128KB memory, versus 170KB for the next-closest competitor….”

CAST Press Release, June 2, 2014
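
The saving in the Beken quote is easy to quantify. The Python sketch below uses only the two memory sizes from the press release; the function name is illustrative.

```python
def percent_saved(smaller_kb: float, larger_kb: float) -> float:
    """Reduction in memory size, as a percentage of the larger footprint."""
    return 100.0 * (larger_kb - smaller_kb) / larger_kb


BA22_KB = 128        # from the quoted press release
COMPETITOR_KB = 170  # from the quoted press release

saved_kb = COMPETITOR_KB - BA22_KB
print(f"{saved_kb} KB saved, {percent_saved(BA22_KB, COMPETITOR_KB):.1f}% smaller")
```

That is a 42 KB difference, or about 24.7% less code memory than the next-closest competitor in Beken's evaluation.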

Slide 15

Best Practices: Power Management

Power and Clock Gating
  Automatically gates clock of unused modules
  Broadcasts module status to enable power and/or clock gating

Dynamic Frequency Scaling
  Independent SoC Bus and CPU clocks
  Over-clock CPU, under-clock or shut off peripheral bus, when computational load is high
  Under-clock CPU & bus, when computational load is low

Wake-up on Interrupt, Tick-Timer or PM Event
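
The frequency-scaling policy described above can be sketched as a small decision function. In the Python sketch below, the load thresholds, clock multipliers, nominal frequency, and all names are hypothetical. It is not a BA20 register interface or API, only an illustration of choosing independent CPU and bus clocks from the computational load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockPlan:
    cpu_mhz: float
    bus_mhz: float  # 0.0 means the peripheral bus clock is shut off


def plan_clocks(load: float, nominal_mhz: float = 20.0) -> ClockPlan:
    """Pick CPU and bus clocks from a load estimate in the range 0.0 to 1.0."""
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    if load >= 0.75:   # hypothetical "high load" threshold
        # Over-clock the CPU and shut off the peripheral bus.
        return ClockPlan(cpu_mhz=nominal_mhz * 2, bus_mhz=0.0)
    if load <= 0.25:   # hypothetical "low load" threshold
        # Under-clock both the CPU and the bus.
        return ClockPlan(cpu_mhz=nominal_mhz / 4, bus_mhz=nominal_mhz / 4)
    return ClockPlan(cpu_mhz=nominal_mhz, bus_mhz=nominal_mhz)


for load in (0.9, 0.5, 0.1):
    print(load, plan_clocks(load))
```

A real policy would also need hysteresis and a way to wake the bus when a peripheral needs service. Both are left out to keep the sketch short.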

Slide 16

Best Practices: Development & Configurability

Easy software development
  Complete GNU Toolchain, customized Eclipse IDE
  Cycle-Accurate ISS, Ported C Libraries & OSs
  JTAG & Serial Debugging, Development Kits

Flexible Architecture and Options
  Options for ALU and Interrupt Controller
  JTAG or Serial Debug Interface
  Optional Memory Protection Unit
  Pre-Integrated Peripherals

Slide 17

The New BA20 Processor IP

Ultra-Low Power
  PipelineZero Architecture for higher performance efficiency and lower area
  Advanced Power Management
  BA2 ISA for extreme code density

Easy Software Development

Flexible Architecture and Peripheral Options to match different requirements

Better Business Terms (no royalties)