15
1 September 1, 2009 http://csg.csail.mit.edu/korea L01-1 CSE 4541.763: Complex Digital Systems for Software People Lecturers: Arvind & Jihong Kim TAs: Nirav Dave, K. Elliott Fleming, Myron King September 1, 2009 L01-2 http://csg.csail.mit.edu/korea Why take CSE 4541.763 Something new and exciting as well as useful Fun: Design systems that you never thought you would design in a course You are part of an experiment You will prove that it is possible to design complex digital systems with little knowledge of circuits

Why take CSE 4541 - ocw.snu.ac.kr

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Why take CSE 4541 - ocw.snu.ac.kr

1

September 1, 2009 http://csg.csail.mit.edu/korea L01-1

CSE 4541.763: Complex Digital Systems for Software People

Lecturers: Arvind & Jihong KimTAs: Nirav Dave, K. Elliott Fleming, Myron King

September 1, 2009 L01-2http://csg.csail.mit.edu/korea

Why take CSE 4541.763

Something new and exciting as well as useful

Fun: Design systems that you never thought you would design in a course

You are part of an experimentYou will prove that it is possible to design complex digital systems with little knowledge of circuits

Page 2: Why take CSE 4541 - ocw.snu.ac.kr

2

September 1, 2009 http://csg.csail.mit.edu/korea L01-3

New, exciting and useful …

September 1, 2009 L01-4http://csg.csail.mit.edu/korea

Wide Variety of Products Rely on ASICsASIC = Application-Specific Integrated Circuit

Page 3: Why take CSE 4541 - ocw.snu.ac.kr

3

September 1, 2009 L01-5http://csg.csail.mit.edu/korea

Two chips, each with an ARM general-purpose processor (GPP) and a DSP (TI OMAP 2420)

Current Cellphone Architecture

Comms. Processing

Application Processing

WLAN RFWLAN RF WLAN RFWCDMA/GSM RF

September 1, 2009 L01-6http://csg.csail.mit.edu/korea

Real power saving implies specialized hardware

H.264 video decoder implementations in software vs. hardware

power/energy savings could be 100 to 1000 fold

but our mind set is that hardware design is:Difficult, risky

Increases time-to-market

Inflexible, brittle, error prone, ...Difficult to deal with changing standards, …

New design flows and tools can change this mind set

Page 4: Why take CSE 4541 - ocw.snu.ac.kr

4

September 1, 2009 L01-7http://csg.csail.mit.edu/korea

SoC & Multicore Convergence:more application specific blocks

On-chip memory banks

Structured on-chip networks

General-purpose

processors

Application-specific

processing units

September 1, 2009 L01-8http://csg.csail.mit.edu/korea

Source: http://www.intel.com/technology/silicon/mooreslaw/index.htm

What’s required?ICs with dramatically higher performance, optimized for applications

and at a size and power to deliver mobilitycost to address mass consumer markets

Page 5: Why take CSE 4541 - ocw.snu.ac.kr

5

September 1, 2009 L01-9http://csg.csail.mit.edu/korea

Custom and Semi-CustomHand-drawn transistors (+ some standard cells)High volume, best possible performance: used for most advanced microprocessors

Standard-Cell-Based ASICsHigh volume, moderate performance: Graphics chips, network chips, cell-phone chips

Field-Programmable Gate ArraysPrototyping Low volume, low-moderate performance applications

ASIC Design Styles

Different design styles require different design flows and have vastly different costs

September 1, 2009 L01-10http://csg.csail.mit.edu/korea

Exponential growth: Moore’s Law

Intel 8080A, 19743Mhz, 6K transistors, 6u

Intel 8086, 1978, 33mm2

10Mhz, 29K transistors, 3uIntel 80286, 1982, 47mm2

12.5Mhz, 134K transistors, 1.5uIntel 386DX, 1985, 43mm2

33Mhz, 275K transistors, 1u

Intel 486, 1989, 81mm2

50Mhz, 1.2M transistors, .8uIntel Pentium, 1993/1994/1996, 295/147/90mm2

66Mhz, 3.1M transistors, .8u/.6u/.35uIntel Pentium II, 1997, 203mm2/104mm2

300/333Mhz, 7.5M transistors, .35u/.25u

http://www.intel.com/intel/intelis/museum/exhibit/hist_micro/hof/hof_main.htmShown with approximate relative sizesShown with approximate relative sizes

Page 6: Why take CSE 4541 - ocw.snu.ac.kr

6

September 1, 2009 L01-11http://csg.csail.mit.edu/korea

Intel Penryn (2007)Dual coreQuad-issue out-of-order superscalar processors6MB shared L2 cache45nm technology

Metal gate transistorsHigh-K gate dielectric

410 Million transistors3+? GHz clock frequency

Could fit over 500 486 processors on same size die.

September 1, 2009 L01-12http://csg.csail.mit.edu/korea

But Design Effort is GrowingNvidia Graphics Processing Units

0

20

40

60

80

100

120

19

93

19

95

19

96

19

97

19

98

19

99

20

00

20

01

20

01

20

02

20

02

Design Effort per Chip

Transistors (M)

Relative staffing on front-end

Relative staffing on back-end

9x growth in back-end staff

5x growth in front-end staff

Front-end is designing the logic (RTL)Back-end is fitting all the gates and wires on the chip; meeting timing specifications; wiring up power, ground, and clock

Page 7: Why take CSE 4541 - ocw.snu.ac.kr

7

September 1, 2009 L01-13http://csg.csail.mit.edu/korea

Design Cost Impacts Chip CostAn Altera study

Non-Recurring Engineering (NRE) costs for a 90nm ASIC is ~ $30M

59% chip design (architecture, logic & I/O design, product & test engineering)30% software and applications development11% prototyping (masks, wafers, boards)

If we sell 100,000 units, NRE costs add

$30M/100K = $300 per chip!

Alternative: Use FPGAs

Hand-crafted IBM-Sony-Toshiba Cell microprocessor achieves 4GHz in 90nm, but at the development cost of >$400M

September 1, 2009 L01-14http://csg.csail.mit.edu/korea

Field-Programmable Gate Arrays (FPGAs)

Arrays mass-produced but programmed by customer after fabrication

Can be programmed by loading SRAM bits, or loading FLASH memory

Each cell in array contains a programmable logic functionArray has programmable interconnect between logic functionsOverhead of programmability makes arrays expensive and slow but startup costs are low, so much cheaper than ASIC for small volumes

Page 8: Why take CSE 4541 - ocw.snu.ac.kr

8

September 1, 2009 L01-15http://csg.csail.mit.edu/korea

FPGA Pros and ConsAdvantages

Dramatically reduce the cost of errorsLittle physical design workRemove the reticle costs from each design

Disadvantages (as compared to an ASIC)[Kuon & Rose, FPGA2006]

Switching power around ~12X worsePerformance up 3-4X worseArea 20-40X greater Still requires

tremendous design effort at RTL level

September 1, 2009 L01-16http://csg.csail.mit.edu/korea

What is needed to make hardware design easier

Extreme IP reuseMultiple instantiations of a block for different performance and application requirementsPackaging of IP so that the blocks can be assembled easily to build a large system (black box model)

Ability to do modular refinementWhole system simulation to enable concurrent hardware-software development

Need to inject modern software design techniques in hardware design

Bluespec

Page 9: Why take CSE 4541 - ocw.snu.ac.kr

9

September 1, 2009 L01-17http://csg.csail.mit.edu/korea

Bluespec: Enabling High-level Synthesis

VCD output

DebussyVisualization

C

Bluesim CycleAccurate

Bluespec SystemVerilog source

Verilog 95 RTL

Verilog sim

Bluespec Compiler

RTL synthesis

gates

FPGAPower estimation

tool

Power estimation

tool

September 1, 2009 http://csg.csail.mit.edu/korea L01-18

Fun: Design systems that you never thought you would design in a course

Page 10: Why take CSE 4541 - ocw.snu.ac.kr

10

September 1, 2009 L01-19http://csg.csail.mit.edu/korea

The new opportunity

“Big” FPGAs have become widely availableA multicore can be emulated on one FPGAbut the programming model is RTL and not too many people design hardware

Enable the use of FPGAs via Bluespec

September 1, 2009 L01-20http://csg.csail.mit.edu/korea

Some cool projects IBM PowerPC Prototype

Intel’s HAsim – Cycle-accurate performance models

AirBlue – A new platform to experiment with wireless protocols

Video decoder – H.264

Hardware software co-generation

Page 11: Why take CSE 4541 - ocw.snu.ac.kr

11

September 1, 2009 L01-21http://csg.csail.mit.edu/korea

IBM: PowerPC Prototype K. Ekanadham, Jessica Tseng (IBM)Asif Khan, M. Vijayaraghavan (MIT)

Goal: Implement a multithreaded, multicore, in-order PowerPC on an FPGA platform and boot Linux on it in 12 months

Team: 2(IBM) + 2(MIT) + Linux and FPGA help

The team accomplished the goal (September 2008)- Bluespec PowerPC boots Linux on FPGAs in 10min;- 100M instructions to reach “Hello World”; - 15K lines of Bluespec generated 90K lines of Verilog

IBM synthesized the generated Verilog using their tools in 40nm library

– ran at 500MHz in the first try!

Working on a public release by January 2010…

September 1, 2009 L01-22http://csg.csail.mit.edu/korea

HAsim: Performance modeling of CPUs Joel Emer … (Intel), M. Pellauer …(MIT)

Intel Asim: Framework for execution-driven simulationPerformance: 10s to 100s of KIPS for high-detail modelsParallelizing the simulator could get 3x to 5x

But want 1,000x or 10,000x speedupHAsim: Configure FPGAs into a simulator of the target design

4.7 MIPS5.1 MIPS2.4 MIPSSimulation MIPs

15.67.4941.1Average FMR

95.0 MHz96.9 MHz98.8 MHzClock Speed

25 (7%)25 (7%)18 (5%)Block RAMs

22,873 (69%)9220 (28%)6599 (20%)FPGA Slices

Out of Order5-stageUnpipelined

Prelim

inary

results

70% to 90% code reuse

Page 12: Why take CSE 4541 - ocw.snu.ac.kr

12

September 1, 2009 L01-23http://csg.csail.mit.edu/korea

AirBlue: A platform to experiment with wireless protocols Hari Balakrishnan, R. Gummadi, A. Ng, E. Flemming

SoftPHY: Expose signal quality to higher layersEnables new protocols

MIXIT (wireless network coding)PPR (Partial Packet Recovery)Rate adaptation

Allocate OFDM channels efficientlyVariable demandsVariable SNRs

Several cross-layer experiments have already been conducted on a 24Mbps implementation of 802.11 on AirBlue 1.0 platform

AirBlue 1.0 Fits in Nokia N95

working on Airblue 2.0 platform

September 1, 2009 L01-24http://csg.csail.mit.edu/korea

IP Reuse via parameterized modulesExample OFDM based protocols

MAC

MAC

standard specific

Scrambler FECEncoder Interleaver Mapper

Pilot &Guard

InsertionIFFT CP

Insertion

De-Scrambler

FECDecoder

De-Interleaver

De-Mapper

ChannelEstimater

FFT Synchronizer

TXController

RXController S/P

D/A

A/D

Different algorithms

Different throughput requirements

Reusable algorithm with different parameter settings

(Alfred) Man Cheuk Ng, …

Page 13: Why take CSE 4541 - ocw.snu.ac.kr

13

September 1, 2009 L01-25http://csg.csail.mit.edu/korea

H.264 Video DecoderElliott Fleming, Chun Chieh Lin

May be implemented in hardware or software depending upon ...

NALunwrap

Parse+

CAVLC

Inverse Quant

Transformation

DeblockFilter

IntraPrediction

InterPrediction

RefFrames

Com

pre

ssed

Bits

Fram

es

Different requirements for different environments- QVGA 320x240p (30 fps)- DVD 720x480p- HD DVD 1280x720p (60-75 fps)

Supported by Nokia

September 1, 2009 L01-26http://csg.csail.mit.edu/korea

H.264 in BluespecInitial Design: Base profile

Eight man-months8K lines of Bluespec

in contrast to 80K lines of C standardDecoded 720p @ 32FPS

Major architectural explorations over 3 months to meet different performance or cost criteria

High performance designs (4.2 mm sq in 180nm)720p@75FPS, 1080p@ 65FPS,

Low cost designs QCIF@15FPS (2.2mm sq), 720p@30FPS (2.4mm sq)

FPGA implementations for VGA output

Page 14: Why take CSE 4541 - ocw.snu.ac.kr

14

September 1, 2009 L01-27http://csg.csail.mit.edu/korea

Driver

Hw/Sw codesign in Bluespec:FEC Decoder Nirav Dave, Myron king

O/S

High-Level Driver(O/S adaptation)

Low-Level Driver(HW adaptation)

Hardware

Any changes in hardware affects the device driver

Split the device-driver

Make the low-level device driver the responsibility of the hardware team

Use Bluespec to describe both the hardware and the low-level device driver

Stable Interface

Application

Physical Bus

Interace

Driver Team

HW Team

The compiler is still under development

Has implications for parallel programming

September 1, 2009 http://csg.csail.mit.edu/korea L01-28

You are part of an experiment: You will prove that it is possible to design complex digital systems with little knowledge of circuits

Page 15: Why take CSE 4541 - ocw.snu.ac.kr

15

September 1, 2009 L01-29http://csg.csail.mit.edu/korea

Historical analogyIn fifties, before the invention of Fortran, if you had said it is possible to program a computer without understanding its architecture (e.g., how many registers the machine had), please would thought you were crazy.

We will show it is possible to design complex digital systems without understanding circuits

September 1, 2009 L01-30http://csg.csail.mit.edu/korea

The Course PhilosophyEffective abstractions to reduce design effort

High-level design language rather than logic gatesControl specified with Guarded Atomic Actions rather than with finite state machinesGuarded module interfaces automatically ensure correctness of composition of existing modules

Design discipline to avoid bad design pointsDecoupled units rather than tightly coupled state machines

Design space exploration to find good designsArchitecture choice has largest impact on solution quality

We learn by doing actual designs