
1

EE290 A: Advanced Topics in CAD

Component Based Design of Electronic Systems

Lecture 10

Professor Kurt Keutzer

Department of Electrical Engineering and Computer Sciences

University of California at Berkeley

Spring 1999

2

Why components?

Why are components/IP blocks an attractive way to design

electronic systems today?

What are the alternative design methodologies?

3

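NTRS: Raw Silicon Capability

[Figure: transistor counts (axis ticks 4M through 512M) by NTRS process generation: 1997 (250 nm), 1999 (180 nm), 2001 (150 nm), 2003 (130 nm), 2006 (100 nm), 2009 (70 nm), 2012 (50 nm). Series: total microprocessor transistors; microprocessor logic transistors per cm2; ASIC logic transistors per cm2. Callouts: 24M tr/6M gates; 64M tr/16M gates.]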

4

Design Productivity

[Figure: logic transistors per chip (K) and productivity (transistors per staff-month), both on log scales, versus year (1981 to 2009), with process generations 2.5µ, .35µ, and .10µ marked.]

Complexity growth rate: 58%/yr. compound

Productivity growth rate: 21%/yr. compound

Source: SEMATECH
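The two compound growth rates quoted on this slide imply a widening gap between what a chip can hold and what a design team can produce. The sketch below only compounds the slide's 58%/yr and 21%/yr figures. The function names, the assumption that both rates stay constant, and the chosen horizons are illustrative and not part of the SEMATECH data.

```python
# Compound growth rates quoted on the slide (source: SEMATECH).
COMPLEXITY_GROWTH = 0.58    # logic transistors per chip, per year
PRODUCTIVITY_GROWTH = 0.21  # transistors per staff-month, per year


def compound(rate: float, years: float) -> float:
    """Growth factor after `years` at a fixed compound annual rate."""
    return (1.0 + rate) ** years


def design_gap(years: float) -> float:
    """Ratio of complexity growth to productivity growth after `years`.

    A value of 2.0 means the staff-months needed per chip have doubled,
    assuming both rates stay constant (an assumption for illustration).
    """
    return compound(COMPLEXITY_GROWTH, years) / compound(PRODUCTIVITY_GROWTH, years)


if __name__ == "__main__":
    for years in (1, 5, 10):  # hypothetical horizons
        print(f"after {years:2d} years the gap has grown {design_gap(years):.1f}x")
```

At these rates the design effort per chip grows by roughly 30% per year unless methodology changes.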

5

NTRS: Prediction - Increasing design reuse

[Figure: predicted design reuse (axis 10% to 90%) by NTRS process generation: 1997 (250 nm), 1999 (180 nm), 2001 (150 nm), 2003 (130 nm), 2006 (100 nm), 2009 (70 nm), 2012 (50 nm).]

6

Alternative: El Greco design flow

Alternative is synthesis from a very high-level description - e.g. El Greco/Synopsys

[Figure: El Greco design flow. Boxes: Algorithm/Protocol Design/Capture/Verify (C/C++); Models; Refinement (e.g. Float to Fixed); Manual Partitioning; System-level HW-SW Coverification; Hardware Implementation; Software Implementation; System Integration and Verification (System Test - Hardware & Software). Annotations: optimized code generation for SCENIC (+ VHDL & Verilog); ASIC QoR underpinned by Synopsys synthesis; core vendor's own tools.]

System designer controls HW-SW codesign - El Greco shortens the loop

7

Design Productivity by Approach

Approach          Gates/week (Dataquest)
Domain specific   8K - 12K
Behavioral        2K - 10K
RTL               1K - 2K
Gate              100 - 200
Transistor        10 - 20

[Figure: small example circuit with signals a, b, s, q, d, clk.]

8

To Design, Implement, Verify ...

Staff months by design level and design size:

Level        10M tr/2.5M gates   24M tr/6M gates   64M tr/16M gates
(unlabeled)  62.5                150               400
Beh          125                 300               800
RTL          625                 1500              4000
gate         6250                15,000            40,000
tr           62,500              150,000           400,000

[Figure: small example circuit with signals a, b, s, q, d, clk.]
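The staff-month figures on this slide can be reproduced from the Dataquest gates/week ranges on the previous slide. The sketch below is a reconstruction under stated assumptions: 4 weeks per staff-month, and one representative gates/week value per approach chosen inside the Dataquest range because it matches the slide's numbers. Neither assumption is stated in the slides; both are inferred from the arithmetic.

```python
WEEKS_PER_STAFF_MONTH = 4  # assumption inferred from the slide's numbers

# Representative gates/week per approach. Each value lies inside the Dataquest
# range quoted in the slides and was picked because it reproduces the
# staff-month figures; the exact points are an inference, not source data.
GATES_PER_WEEK = {
    "domain specific": 10_000,
    "behavioral": 5_000,
    "rtl": 1_000,
    "gate": 100,
    "transistor": 10,
}


def staff_months(design_gates: int, approach: str) -> float:
    """Staff-months to design `design_gates` gates with the given approach."""
    gates_per_month = GATES_PER_WEEK[approach] * WEEKS_PER_STAFF_MONTH
    return design_gates / gates_per_month


if __name__ == "__main__":
    # Design sizes quoted in the slides, in gates.
    for gates in (2_500_000, 6_000_000, 16_000_000):
        row = ", ".join(
            f"{name}: {staff_months(gates, name):,.1f}" for name in GATES_PER_WEEK
        )
        print(f"{gates:,} gates -> {row}")
```

Under these assumptions a 2.5M-gate design at the transistor level costs 62,500 staff-months, while the same design at RTL costs 625.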

9

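Even for 10M transistors ...

[Figure: staff months for the 10M-transistor design (62.5, 125, 625, 6250, 62,500) with the Beh and RTL levels marked, alongside Power, Delay, and Area labels and a small example circuit with signals a, b, s, q, d, clk.]

Callouts on the figure:

● Implementations here are often not good enough

● Because implementations here are inferior/unpredictable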

10

Productivity vs. QOR

[Figure: design time (increasing) versus QOR, e.g. clock speed (getting better).]

11

Productivity vs. QOR


[Figure: design time (increasing) versus QOR, with three design points marked: PCI interface, 66 MHz; embedded µP, 200 MHz; high-end µP, 800 MHz.]

12

After only one process generation


[Figure: design time (increasing) versus QOR after one process generation, with the same three design points marked: PCI interface, 66 MHz; embedded µP, 200 MHz; high-end µP, 800 MHz.]

13

Moral of the Story

Each process generation will

● Make it twice as easy to realize a fixed (e.g. 66 MHz) timing spec

● Make it twice as easy to realize a fixed (e.g. 100 mm2) area requirement

● Make time-to-market requirements 20% more stringent

Time is always on the side of more productive EDA tools/methodologies, but current high-productivity synthesis methodologies are still not meeting QOR requirements

14

Why components? - summary

Increasing re-use of intellectual property blocks/components is the consensus because:

● Moore’s law continues to provide significantly greater silicon capability

● Significant productivity improvements are required

● Alternative methodologies are not coming forward - and the implementation quality does not meet requirements

15

What type of components?


● What size of component?

● What type/capability of component?

16

Delay degradation in DSM

• With scaling processes, interconnect delay becomes a decreasing portion of stage delay, even with noise

[Figure: delay (ps, 0 to 130) versus process generation (0.25, 0.18, 0.13, 0.1 µm) for a 2-input NAND, FO = 2, W/L = 20, 50K gate block. Series: gate delay; stage delay (w/o noise); stage delay (w/ noise).]

17

Dynamic Power Analysis, 50K blocks

Parameters:

● Packing density from NTRS

● Switching activity = 0.15

● Device sizing set at W/L = 20

● Routing density set at 0.4 (M1/2) and 0.2 (M3/4)

• 50K blocks

• Analysis focuses on dense std. cell logic parts of an ASIC

• Frequency set according to NTRS

• Power density is roughly constant through scaling

[Figure: power density (W/mm2, 0.00 to 0.30) versus generation (0.25, 0.18, 0.13, 0.1 µm). Series: total power density; interconnect portion.]
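Dynamic power in switching CMOS logic is commonly modeled as P = a * C * V^2 * f, where a is the switching activity (0.15 in this analysis). The sketch below applies that standard model per unit area. The capacitance, supply voltage, and frequency values in the usage example are hypothetical placeholders, not the NTRS figures used for the plot, and the function name is illustrative.

```python
def dynamic_power_density(
    activity: float,      # fraction of capacitance switched per cycle
    cap_per_mm2: float,   # switched capacitance per unit area, in F/mm^2
    vdd: float,           # supply voltage, in V
    freq_hz: float,       # clock frequency, in Hz
) -> float:
    """Dynamic power density in W/mm^2 using P = a * C * V^2 * f."""
    return activity * cap_per_mm2 * vdd ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical values for illustration only (not taken from the slides,
    # apart from the 0.15 switching activity).
    density = dynamic_power_density(
        activity=0.15, cap_per_mm2=1e-9, vdd=2.0, freq_hz=100e6
    )
    print(f"power density: {density:.3f} W/mm^2")
```

The model shows why power density can stay roughly flat across generations: rising capacitance per area and frequency can be offset by the quadratic effect of a falling supply voltage.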

18

Dynamic Power Analysis, 100K & 200K Blocks

Parameters:

● Packing density from NTRS

● Switching activity = 0.15

● Device sizing set at W/L = 20

● Routing density set at 0.4 (M1/2), varied from 0.2 to 0.3 (M3/4)

• Various block sizes

• Normalized to 50K gate block performance

• Numbers are relatively independent of technology generation

[Figure: power * delay (normalized, 1.0 to 1.6). Series: 200K blocks, equivalent performance; 200K blocks, optimal sizing; 100K blocks, equivalent performance; 100K blocks, optimal sizing.]

19

Within a 50K - 100K Module

75 - 100% delay in gates:

● 3.5µ - 1.0µ (1980 - 1990)

● 0.18µ - 0.1µ (1998 - 2005)

This flow should work OK for blocks of 50-100K gates and should continue to work OK in the future

Flow: HDL -> RTL synthesis (with library) -> netlist -> logic optimization -> netlist -> physical design -> layout

Proper sizing within flow is absolutely required

• typically: size down then up

• try: size up, then down

20

Is 100K too small? Look at the DEC Alpha

21

Details (Alpha 21264)

Unit                   #    Aspect Ratio   # Transistors
Instruction cache      1    0.73           2.9M
ITB                    1    0.56           284k
PC                     1    0.91           488k
Branch Predictor       1    0.53           337k
Data cache             1    0.82           2.8M
DTB                    2    0.74           419k
MBox                   1    0.61           586k
LD/ST Reorder Unit     1    0.78           612k
L2 Cache/System IO     1    0.79           596k
Integer Exec           2    0.75           290k
                       2    0.54           404k
Integer Queue          1    0.5            617k
Integer Reg File       2    0.91           217k
Integer Mapper         1    0.71           432k
FP div/sqrt            1    0.57           252k
FP add                 1    0.97           429k
FP Queue               1    0.81           515k
FP Reg File            1    0.67           296k
FP Mapper              1    0.81           515k
FP mul                 1    0.61           725k
uP                     24   0.81           15.2M

A. Tabbara - UC Berkeley

22

Details of Block Size

A. Tabbara - UC Berkeley

200K tr/50K gates

400K tr/100K gates
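One way to read the Alpha 21264 unit sizes against the 50K - 100K gate block size is to convert transistor counts to gate equivalents. The sketch below assumes 4 transistors per gate, the ratio implied elsewhere in these slides (for example "24M tr/6M gates"). The unit subset, the function names, and the decision to leave out the caches are choices made for this illustration.

```python
TRANSISTORS_PER_GATE = 4  # ratio implied by "24M tr/6M gates" in the slides

# Transistor counts as listed for the Alpha 21264 (subset of logic units;
# caches are omitted as a choice made for this sketch).
UNIT_TRANSISTORS = {
    "ITB": 284_000,
    "Branch Predictor": 337_000,
    "Integer Reg File": 217_000,
    "FP add": 429_000,
    "Integer Queue": 617_000,
    "FP mul": 725_000,
}


def gate_equivalents(transistors: int) -> float:
    """Convert a transistor count to gate equivalents."""
    return transistors / TRANSISTORS_PER_GATE


def in_block_range(
    transistors: int, low_gates: int = 50_000, high_gates: int = 100_000
) -> bool:
    """True if the unit falls inside the 50K - 100K gate block range."""
    return low_gates <= gate_equivalents(transistors) <= high_gates


if __name__ == "__main__":
    for unit, count in UNIT_TRANSISTORS.items():
        verdict = "within" if in_block_range(count) else "outside"
        print(f"{unit:18s} {gate_equivalents(count):>10,.0f} gates ({verdict} 50K-100K)")
```

Under this conversion several of the listed units land inside the 50K - 100K gate range, and the largest logic units are within about a factor of two of it.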

23

Components: Next Generation SSI?

• Gary Smith, Principal Analyst at Dataquest, writes:

• ``The amount of confusion within the CAD community has been the big surprise in the effort to develop System Level Integration (SLI) design methodology. … How do you design a million gate IC? … Fortunately Kurt Keutzer gets it. The first thing we needed to know was the optimum size of a basic SLI library element. The work, at Berkeley, now tells us it’s 50,000 gates. So we have defined today’s SSI. Now the challenge is to develop the 200 to 400 basic library elements needed to really do SLI design.’’

24

Design Paradigm: Re-useable IP

Semiconductor Industry

3rd Party Designers

Customer Designs

Reused blocks: µP, DSP, Encryption

Silicon capability enables integration of entire systems on a single die

Achieve required design productivity by assembling re-useable blocks of intellectual property (IP)

25

What type of component? Reuse at what level?

Entire sub-system

● e.g. MPEG

Large module

● DCT, motion estimation

Sub-module

● Filter, multiplier

Primitive component

● gate, full-adder

``We want to reuse at the highest level that we can.’’ -

Lance Mills, HP

26

Type/Functionality of Components

Video: MPEG, DVD, HDTV

Audio: MP3, voice recognition

Processors: CPUs, DSPs, Java

Networking: ATM, Ethernet, ISDN, FibreChannel, SONET

Bus: PCI, USB, IEEE 1394

Memory: SRAM, ROM, CAM

Wireless: CDMA, TDMA

Communication: modems, transceivers

Coding: speech, Viterbi, Reed-Solomon

Display drivers/controllers: TFT

Other: sensors, encryption/decryption, GPS

Power PC core: 3.1mm2 in 0.35µ

ARM Core: 3.8 mm2 in 0.35µ

MPEG2 Decoder: ~65k gates

PCI Bus: ~8k gates

Ethernet MAC: ~7k gates (soft)

RSA Encryption: ~7k gates

Niraj Shah

27

How will components be implemented?

What are the most promising implementation media?

● SW

    ● SW running on a standard processor - MIPS R4000

    ● SW running on a tailored processor - TI TMS320C54

    ● SW for a configurable processor - XTENSA

    ● SW for an application-specific processor - C-Cube MPEG decoder

● HW

    ● hard - actual layout

    ● firm - netlist

    ● soft - synthesizable RTL model in VHDL?/Verilog?

The aim of this course is to make us the authority on these questions!

28

How much do we lose in hard vs soft IP?

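[Figure: implementation styles compared. Library labels: gate array; std cell; std cell, 2 sizes; std cell, 6 sizes; std cell, limitless; dynamic library. Design-method labels: RTL syn; auto P&R; tiling; manual layout. Gap between the two starred (*) points: 5X speed, 1/4X area, 1/10X power.]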

29

What do we gain with soft vs. hard IP?

FAB portability - soft, firm IP is migratable to multiple processes

Modifiability - soft IP can be user modified (a plus?)

Process migratability - a soft, firm IP design can more easily be migrated to the next process generation

Wider range of QOR of implementation

30

Interconnect Complexities

Interconnect effects play a major role in the increasing costs for large hard-block design styles

Without new Circuit Fabrics and the flexibility they offer, interconnect problems will significantly impact performance and cost for emerging IC technologies

[Figure: wire-length distribution. Vertical axis: occurrence rate (normalized); horizontal axis: wire length / die size, with a mark at ~0.5. Labels: local wires; global wires. Inset: blocks connected by global wires.]

Pileggi - CMU

31

Technology Scaling

Block sizes cannot grow as rapidly as chip sizes since block design becomes increasingly difficult: each block is a chip design over multiple configurations

If the blocks are inflexible, the global wiring problems begin to dominate all aspects of performance quality and system cost

[Figure: wire-length distribution for a larger chip with finer feature sizes. Vertical axis: occurrence rate (normalized); horizontal axis: wire length / die size, with a mark at ~0.5.]

Pileggi - CMU

32

Softer Fabrics

With soft, flexible Fabrics, the system assembly can more thoroughly exploit the available technology

The interconnect problem is controlled via: soft boundaries for area re-shaping; re-synthesis and re-mapping for timing; smart wires; and top-down specified Fabric synthesis

[Figure: wire-length distribution with softer fabrics, labeled "Superior timing, power and cost". Vertical axis: occurrence rate (normalized); horizontal axis: wire length / die size, with a mark at ~0.5.]

Pileggi - CMU

33

What methodology will prevail?

Object-oriented

• Top down

• Correct by construction

• Highly sophisticated user

• Intelligence of the system is more in the tools than the IP

• Interfaces are carefully tuned for application

• High performance oriented

Component-oriented

• Bottom up

• Robust and error tolerant

• Relatively unsophisticated user

• Intelligence is in the IP, tools are relatively simple

• Standard ``plug and play’’ interfaces for a broad range of applications

• High productivity oriented

34

Outline of issues

Why components?

● Raw silicon capability

● Design productivity

What type of components?

● What size of component?

● What type/capability of component?

How will they be implemented?

● Review of implementation alternatives

What methodologies are likely to succeed?

● Object-oriented vs. component-oriented

Who are the players?

● foundries

● fabless semiconductor, 3rd party IP providers, vertical semiconductor

● system companies

Which design styles are likely to predominate?

● Time-to-market (productivity)

● Features

● Process portability

● In-field upgradability, programmability

● Quality of results

35

Interrelationship of issues

Semiconductor Processing Capability

System Functionality Desired

Design Constraints

Design Productivity

Business Issues

Implementation Approach
