September 1, 2009 http://csg.csail.mit.edu/korea L01-1
CSE 4541.763: Complex Digital Systems for Software People
Lecturers: Arvind & Jihong Kim
TAs: Nirav Dave, K. Elliott Fleming, Myron King
Why take CSE 4541.763
Something new and exciting as well as useful
Fun: Design systems that you never thought you would design in a course
You are part of an experiment: you will prove that it is possible to design complex digital systems with little knowledge of circuits
New, exciting and useful …
Wide Variety of Products Rely on ASICs
ASIC = Application-Specific Integrated Circuit
Current Cellphone Architecture
Two chips, each with an ARM general-purpose processor (GPP) and a DSP (TI OMAP 2420)
[Figure: cellphone block diagram. Labels: Comms. Processing; Application Processing; WLAN RF; WCDMA/GSM RF]
Real power saving implies specialized hardware
H.264 video decoder implementations in software vs. hardware
power/energy savings could be 100 to 1000 fold
but our mind set is that hardware design is:
- Difficult, risky
- Increases time-to-market
- Inflexible, brittle, error prone, ...
- Difficult to deal with changing standards, …
New design flows and tools can change this mind set
SoC & Multicore Convergence: more application-specific blocks
On-chip memory banks
Structured on-chip networks
General-purpose processors
Application-specific processing units
Source: http://www.intel.com/technology/silicon/mooreslaw/index.htm
What’s required?
ICs with dramatically higher performance, optimized for applications, and at a size and power to deliver mobility and a cost to address mass consumer markets
ASIC Design Styles
Custom and Semi-Custom
- Hand-drawn transistors (+ some standard cells)
- High volume, best possible performance: used for most advanced microprocessors
Standard-Cell-Based ASICs
- High volume, moderate performance: graphics chips, network chips, cell-phone chips
Field-Programmable Gate Arrays
- Prototyping
- Low volume, low-moderate performance applications
Different design styles require different design flows and have vastly different costs
Exponential growth: Moore’s Law
- Intel 8080A, 1974: 3 MHz, 6K transistors, 6 µm
- Intel 8086, 1978, 33 mm²: 10 MHz, 29K transistors, 3 µm
- Intel 80286, 1982, 47 mm²: 12.5 MHz, 134K transistors, 1.5 µm
- Intel 386DX, 1985, 43 mm²: 33 MHz, 275K transistors, 1 µm
- Intel 486, 1989, 81 mm²: 50 MHz, 1.2M transistors, 0.8 µm
- Intel Pentium, 1993/1994/1996, 295/147/90 mm²: 66 MHz, 3.1M transistors, 0.8/0.6/0.35 µm
- Intel Pentium II, 1997, 203/104 mm²: 300/333 MHz, 7.5M transistors, 0.35/0.25 µm
Shown with approximate relative sizes
http://www.intel.com/intel/intelis/museum/exhibit/hist_micro/hof/hof_main.htm
Intel Penryn (2007)
- Dual core
- Quad-issue out-of-order superscalar processors
- 6MB shared L2 cache
- 45nm technology: metal gate transistors, high-K gate dielectric
- 410 million transistors
- 3+? GHz clock frequency
Could fit over 500 486 processors on same size die.
But Design Effort is Growing
Nvidia Graphics Processing Units
[Figure: Design Effort per Chip, 1993 to 2002, vertical axis 0 to 120. Series: Transistors (M); relative staffing on front-end; relative staffing on back-end]
9x growth in back-end staff
5x growth in front-end staff
Front-end is designing the logic (RTL)
Back-end is fitting all the gates and wires on the chip; meeting timing specifications; wiring up power, ground, and clock
Design Cost Impacts Chip Cost
An Altera study
Non-Recurring Engineering (NRE) costs for a 90nm ASIC are ~$30M
- 59% chip design (architecture, logic & I/O design, product & test engineering)
- 30% software and applications development
- 11% prototyping (masks, wafers, boards)
If we sell 100,000 units, NRE costs add
$30M/100K = $300 per chip!
Alternative: Use FPGAs
Hand-crafted IBM-Sony-Toshiba Cell microprocessor achieves 4GHz in 90nm, but at the development cost of >$400M
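The NRE arithmetic on this slide can be sketched in a few lines of Python. The $30M figure, the 100,000-unit volume and the percentage shares come from the slide; the larger volumes in the loop and the function names are illustrative assumptions, added only to show how quickly the per-chip burden falls with volume.

```python
NRE_DOLLARS = 30_000_000  # ~$30M NRE for a 90nm ASIC (from the slide)

# Cost shares in percent (from the slide).
NRE_SHARES = {
    "chip design": 59,
    "software and applications development": 30,
    "prototyping": 11,
}


def nre_per_unit(nre_dollars: int, units_sold: int) -> float:
    """Return the NRE cost that each sold unit has to absorb."""
    if units_sold <= 0:
        raise ValueError("units_sold must be positive")
    return nre_dollars / units_sold


def nre_breakdown(nre_dollars: int, shares: dict) -> dict:
    """Split the NRE total into dollar amounts using integer percent shares."""
    return {name: nre_dollars * pct // 100 for name, pct in shares.items()}


if __name__ == "__main__":
    # 100,000 units is the slide's example; the other volumes are hypothetical.
    for units in (100_000, 1_000_000, 10_000_000):
        print(f"{units:>10,} units: ${nre_per_unit(NRE_DOLLARS, units):,.2f} NRE per chip")
    for name, dollars in nre_breakdown(NRE_DOLLARS, NRE_SHARES).items():
        print(f"{name}: ${dollars:,}")
```

At low volumes the NRE term dominates the chip cost, which is the case the slide makes for FPGAs.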
Field-Programmable Gate Arrays (FPGAs)
Arrays mass-produced but programmed by customer after fabrication
Can be programmed by loading SRAM bits, or loading FLASH memory
Each cell in array contains a programmable logic function
Array has programmable interconnect between logic functions
Overhead of programmability makes arrays expensive and slow but startup costs are low, so much cheaper than ASIC for small volumes
FPGA Pros and Cons
Advantages
- Dramatically reduce the cost of errors
- Little physical design work
- Remove the reticle costs from each design
Disadvantages (as compared to an ASIC) [Kuon & Rose, FPGA2006]
- Switching power ~12X worse
- Performance up to 3-4X worse
- Area 20-40X greater
Still requires tremendous design effort at RTL level
What is needed to make hardware design easier
Extreme IP reuse
- Multiple instantiations of a block for different performance and application requirements
- Packaging of IP so that the blocks can be assembled easily to build a large system (black box model)
Ability to do modular refinement
Whole system simulation to enable concurrent hardware-software development
Need to inject modern software design techniques in hardware design
Bluespec
Bluespec: Enabling High-level Synthesis
[Figure: Bluespec tool flow. Labels: Bluespec SystemVerilog source; Bluespec Compiler; C; Bluesim (cycle accurate); Verilog 95 RTL; Verilog sim; VCD output; Debussy visualization; RTL synthesis; gates; FPGA; power estimation tool]
Fun: Design systems that you never thought you would design in a course
The new opportunity
“Big” FPGAs have become widely available
- A multicore can be emulated on one FPGA
- but the programming model is RTL and not too many people design hardware
Enable the use of FPGAs via Bluespec
Some cool projects
IBM PowerPC Prototype
Intel’s HAsim – Cycle-accurate performance models
AirBlue – A new platform to experiment with wireless protocols
Video decoder – H.264
Hardware software co-generation
IBM: PowerPC Prototype
K. Ekanadham, Jessica Tseng (IBM); Asif Khan, M. Vijayaraghavan (MIT)
Goal: Implement a multithreaded, multicore, in-order PowerPC on an FPGA platform and boot Linux on it in 12 months
Team: 2 (IBM) + 2 (MIT) + Linux and FPGA help
The team accomplished the goal (September 2008)
- Bluespec PowerPC boots Linux on FPGAs in 10 min
- 100M instructions to reach “Hello World”
- 15K lines of Bluespec generated 90K lines of Verilog
IBM synthesized the generated Verilog using their tools in a 40nm library; it ran at 500MHz on the first try!
Working on a public release by January 2010…
HAsim: Performance modeling of CPUs
Joel Emer … (Intel), M. Pellauer … (MIT)
Intel Asim: Framework for execution-driven simulation
- Performance: 10s to 100s of KIPS for high-detail models
- Parallelizing the simulator could get 3x to 5x
- But want 1,000x or 10,000x speedup
HAsim: Configure FPGAs into a simulator of the target design
Preliminary results
                  Unpipelined    5-stage        Out of Order
FPGA Slices       6599 (20%)     9220 (28%)     22,873 (69%)
Block RAMs        18 (5%)        25 (7%)        25 (7%)
Clock Speed       98.8 MHz       96.9 MHz       95.0 MHz
Average FMR       41.1           7.49           15.6
Simulation MIPS   2.4 MIPS       5.1 MIPS       4.7 MIPS
70% to 90% code reuse
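The rows of the preliminary results can be related to each other with a little arithmetic. The sketch below assumes that FMR is the ratio of FPGA cycles to simulated model cycles (the slide does not expand the acronym), so the simulated clock rate is the FPGA clock divided by the FMR. The "implied instructions per model cycle" figure is a derived quantity that follows only under that assumption; it is not reported on the slide.

```python
# (FPGA clock in MHz, average FMR, reported simulation MIPS), copied from the slide.
RESULTS = {
    "Unpipelined": (98.8, 41.1, 2.4),
    "5-stage": (96.9, 7.49, 5.1),
    "Out of Order": (95.0, 15.6, 4.7),
}


def model_mhz(fpga_mhz: float, fmr: float) -> float:
    """Simulated model cycles per microsecond, assuming FMR = FPGA cycles per model cycle."""
    return fpga_mhz / fmr


def implied_ipc(fpga_mhz: float, fmr: float, mips: float) -> float:
    """Instructions per simulated model cycle implied by the reported MIPS (derived, not from the slide)."""
    return mips / model_mhz(fpga_mhz, fmr)


if __name__ == "__main__":
    for name, (clock, fmr, mips) in RESULTS.items():
        print(f"{name}: {model_mhz(clock, fmr):.2f} model MHz, "
              f"{implied_ipc(clock, fmr, mips):.2f} instructions per model cycle")
```

Under this reading the unpipelined model comes out at roughly one instruction per model cycle, which agrees with its reported 2.4 MIPS and is a useful sanity check on the assumed meaning of FMR.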
AirBlue: A platform to experiment with wireless protocols
Hari Balakrishnan, R. Gummadi, A. Ng, E. Fleming
SoftPHY: Expose signal quality to higher layers
Enables new protocols
- MIXIT (wireless network coding)
- PPR (Partial Packet Recovery)
- Rate adaptation
Allocate OFDM channels efficiently
- Variable demands
- Variable SNRs
Several cross-layer experiments have already been conducted on a 24Mbps implementation of 802.11 on the AirBlue 1.0 platform
AirBlue 1.0 fits in Nokia N95
Working on AirBlue 2.0 platform
IP Reuse via parameterized modules
Example: OFDM-based protocols
(Alfred) Man Cheuk Ng, …
[Figure: OFDM transceiver block diagram between two MAC blocks]
TX blocks: TX Controller, Scrambler, FEC Encoder, Interleaver, Mapper, Pilot & Guard Insertion, IFFT, CP Insertion, D/A
RX blocks: A/D, Synchronizer, S/P, FFT, Channel Estimator, De-Mapper, De-Interleaver, FEC Decoder, De-Scrambler, RX Controller
Figure annotations: standard specific; different algorithms; different throughput requirements; reusable algorithm with different parameter settings
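As a rough illustration of "reusable algorithm with different parameter settings", here is a minimal Python sketch of one parameterized block, a generic row/column block interleaver with its matching de-interleaver. This is not the course's Bluespec code and not the interleaver of any particular standard. The function name and the two example parameter settings are hypothetical, chosen only to show one algorithm being instantiated twice.

```python
from typing import Callable, List, Sequence, Tuple


def make_block_interleaver(rows: int, cols: int) -> Tuple[Callable, Callable]:
    """Build (interleave, deinterleave) for one rows x cols block.

    The block is written row by row and read column by column.
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    size = rows * cols
    # Output position k takes the input symbol at index perm[k].
    perm = [r * cols + c for c in range(cols) for r in range(rows)]

    def interleave(block: Sequence) -> List:
        if len(block) != size:
            raise ValueError(f"expected {size} symbols, got {len(block)}")
        return [block[i] for i in perm]

    def deinterleave(block: Sequence) -> List:
        if len(block) != size:
            raise ValueError(f"expected {size} symbols, got {len(block)}")
        out = [None] * size
        for k, i in enumerate(perm):
            out[i] = block[k]
        return out

    return interleave, deinterleave


# Two hypothetical instantiations of the same algorithm (parameters are made up).
small_tx, small_rx = make_block_interleaver(rows=2, cols=3)
large_tx, large_rx = make_block_interleaver(rows=16, cols=18)

if __name__ == "__main__":
    data = list(range(6))
    print(small_tx(data))            # symbols reordered column by column
    print(small_rx(small_tx(data)))  # original order restored
```

The transmitter and receiver halves share a single permutation, so a change of parameters cannot leave the two sides inconsistent. That is the kind of reuse the slide argues for.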
H.264 Video Decoder
Elliott Fleming, Chun Chieh Lin
[Figure: decoder block diagram from Compressed Bits to Frames. Blocks: NAL unwrap; Parse + CAVLC; Inverse Quant Transformation; Intra Prediction; Inter Prediction; Deblock Filter; Ref Frames]
May be implemented in hardware or software depending upon ...
Different requirements for different environments
- QVGA 320x240p (30 fps)
- DVD 720x480p
- HD DVD 1280x720p (60-75 fps)
Supported by Nokia
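To get a feel for how far apart these environments are, the raw pixel rates can be computed directly. The sketch below uses only the resolutions and frame rates listed on the slide. No frame rate is given for DVD, so only its per-frame pixel count is computed. Pixel rate is a crude proxy for decoder workload, an assumption made here for illustration.

```python
def pixels_per_frame(width: int, height: int) -> int:
    """Number of pixels in one frame."""
    return width * height


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw pixel rate the decoder must sustain at the given frame rate."""
    return pixels_per_frame(width, height) * fps


QVGA_RATE = pixels_per_second(320, 240, 30)
HD_RATE_LOW = pixels_per_second(1280, 720, 60)
HD_RATE_HIGH = pixels_per_second(1280, 720, 75)
DVD_FRAME = pixels_per_frame(720, 480)  # frame rate not stated on the slide

if __name__ == "__main__":
    print(f"QVGA @30 fps: {QVGA_RATE:,} pixels/s")
    print(f"HD @60 fps:   {HD_RATE_LOW:,} pixels/s ({HD_RATE_LOW // QVGA_RATE}x QVGA)")
    print(f"HD @75 fps:   {HD_RATE_HIGH:,} pixels/s ({HD_RATE_HIGH // QVGA_RATE}x QVGA)")
    print(f"DVD frame:    {DVD_FRAME:,} pixels")
```

By this measure the HD targets need roughly 24 to 30 times the QVGA throughput, so one fixed design point is unlikely to suit every environment.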
H.264 in Bluespec
Initial design: Base profile
- Eight man-months
- 8K lines of Bluespec (in contrast to 80K lines of C standard)
- Decoded 720p @ 32FPS
Major architectural explorations over 3 months to meet different performance or cost criteria
- High performance designs (4.2 mm sq in 180nm): 720p@75FPS, 1080p@65FPS
- Low cost designs: QCIF@15FPS (2.2mm sq), 720p@30FPS (2.4mm sq)
FPGA implementations for VGA output
Hw/Sw codesign in Bluespec: FEC Decoder
Nirav Dave, Myron King
[Figure: software/hardware stack. Labels: Application; O/S; Driver; High-Level Driver (O/S adaptation); Low-Level Driver (HW adaptation); Stable Interface; Physical Bus Interface; Hardware; Driver Team; HW Team]
Any change in hardware affects the device driver
- Split the device driver
- Make the low-level device driver the responsibility of the hardware team
- Use Bluespec to describe both the hardware and the low-level device driver
The compiler is still under development
Has implications for parallel programming
You are part of an experiment: You will prove that it is possible to design complex digital systems with little knowledge of circuits
Historical analogy
In the fifties, before the invention of Fortran, if you had said it is possible to program a computer without understanding its architecture (e.g., how many registers the machine had), people would have thought you were crazy.
We will show it is possible to design complex digital systems without understanding circuits
The Course Philosophy
Effective abstractions to reduce design effort
- High-level design language rather than logic gates
- Control specified with Guarded Atomic Actions rather than with finite state machines
- Guarded module interfaces automatically ensure correctness of composition of existing modules
Design discipline to avoid bad design points
- Decoupled units rather than tightly coupled state machines
Design space exploration to find good designs
- Architecture choice has largest impact on solution quality
We learn by doing actual designs
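To make "Guarded Atomic Actions" a little more concrete before any Bluespec is shown, here is a small Python model of the idea. It is not Bluespec syntax and not how the Bluespec compiler schedules hardware. It assumes a simplified semantics in which state changes only through rules, each rule has a guard and an action, and one enabled rule fires atomically at a time. The `Rule` class, the `run` scheduler and the GCD example are illustrative names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    guard: Callable[[Dict[str, int]], bool]   # may the rule fire in this state?
    action: Callable[[Dict[str, int]], None]  # atomic state update


def run(state: Dict[str, int], rules: List[Rule], max_steps: int = 100_000) -> List[str]:
    """Fire one enabled rule at a time until no guard holds; return the firing order."""
    fired = []
    for _ in range(max_steps):
        enabled = [r for r in rules if r.guard(state)]
        if not enabled:
            return fired
        rule = enabled[0]  # simplistic choice; any enabled rule would be a legal pick
        rule.action(state)
        fired.append(rule.name)
    raise RuntimeError("rules did not quiesce within max_steps")


def _swap(s: Dict[str, int]) -> None:
    s["x"], s["y"] = s["y"], s["x"]


def _subtract(s: Dict[str, int]) -> None:
    s["y"] -= s["x"]


# Subtractive GCD written as two rules with mutually exclusive guards.
GCD_RULES = [
    Rule("swap", lambda s: s["x"] > s["y"] and s["y"] != 0, _swap),
    Rule("subtract", lambda s: s["x"] <= s["y"] and s["y"] != 0, _subtract),
]


def gcd(a: int, b: int) -> int:
    """GCD of a >= 1 and b >= 0; the answer is in x once no rule is enabled."""
    if a < 1 or b < 0:
        raise ValueError("this sketch requires a >= 1 and b >= 0")
    state = {"x": a, "y": b}
    run(state, GCD_RULES)
    return state["x"]


if __name__ == "__main__":
    print(gcd(15, 6))
```

There is no explicit finite state machine here: control emerges from which guards are true, which is the contrast the philosophy slide draws.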