VLSI Programming

Embed Size (px)

Citation preview

  • 7/30/2019 VLSI Programming

    1/61

    VLSI Programming: Lecture 1

    Course 2IN35

    Course: Kees van Berkel [email protected]

    Rudolf Mak [email protected]

    Lab: Kees van Berkel, Rudolf Mak

    2/7/20121

    Wei Tong, Hrishikesh Salunkhe

    www: http://www.win.tue.nl/~cberkel/2IN35/

    Lecture 1 Introduction

  • 7/30/2019 VLSI Programming

    2/61

    Introduction to VLSI Programming: goals

    to acquire insight in the description, design, andoptimization of fine-grained parallel computations;

    to acquire insight in the (future) capabilities of VLSIas an implementation medium of parallel computations;

    2/7/20122

    to acquire skills in the design of parallel computations andin their implementation on FPGAs.

  • 7/30/2019 VLSI Programming

    3/61

    Contents

    Massive parallelism is needed to exploit the huge and stillincreasing computational capabilities of Very Large ScaleIntegrated (VLSI) circuits:

    we focus on fine-grained parallelism(not on networks of computers);

    2/7/20123

    we assume that parallelism is by design(not by compilation);

    we draw inspiration from consumer applications, such asdigital TV, 3D TV, image processing, mobile phones, etc.;

    we will use Field Programmable Arrays (FPGA) as fine-grained abstraction of VLSI for practical implementation.

  • 7/30/2019 VLSI Programming

    4/61

    FPGA IC on the Xilinx XUP Board

    2/7/20124

    XC2VP30

    FPGA

  • 7/30/2019 VLSI Programming

    5/61

    Lab work prerequisites

    Notebook

    Exceed (can be obtained through the software distributionof the university)

    Access to UNIX server Dept. W&I

    2/7/20125

    ,

    Lab work is by teams of two students.

    Have FPGA tools (SW) installed on your machine by Feb 28

  • 7/30/2019 VLSI Programming

    6/61

    VLSI Programming: time table 2012

    date class | lab subject (provisional)

    Feb 7 2 | 0 hours introduction, DSP representations, bounds

    Feb 14 2 | 0 hours pipelining, retiming, transposition, J-slow, unfolding

    Feb 21 no lecture (carnaval) (have FPGA tools installed)

    Feb 28 2 | 0 hours unfolding (cntd), look-ahead, strength reduction

    Mar 6 1 | 3 hours Introductions FPGA, Verilog, Lab (A1)

    2/7/20126

    Mar 20 1 | 3 hours folding; FPGAs: pipelining, retiming (A2)

    Mar 27 1 | 3 hours DSP processors; FPGAs: parallelism, strength reduc. (A3)

    Apr 3 1 | 3 hours FPGAs: sample-rate conversion (A4)

    Apr10,17 2x no lecture (examinations)

    Apr 24 1 | 3 hours FPGAs: video scaler (A5); hand-in report A3

    May 1 1 | 3 hours FPGAs: video scaler (A5 cntd)

    May 22 deadline final report A4, A5

  • 7/30/2019 VLSI Programming

    7/61

    Course grading (provisional)

    Your course grade is based on:

    the quality of your programs/designs [30%];

    your final report on the design and evaluationof these programs (guidelines will follow) [30%];

    2/7/20127

    programs, the report and the lecture notes [20%];

    intermediate assignments [20%].

    Credits: 5 points = based on 140 hours on your side

  • 7/30/2019 VLSI Programming

    8/61

    Note on course literature

    First half of VLSI programming course is based on:

    Keshab K. Parhi. VLSI Digital Signal Processing Systems, Design andImplementation. Wiley Inter-Science 1999.

    This book is recommended.

    Accompanying slides can be found on:

    2/7/20128

    http://www.ece.umn.edu/users/parhi/slides.html http://www.win.tue.nl/~cberkel/2IN35/

    Mandatory reading:

    Keshab K. Parhi. High-Level Algorithm and ArchitectureTransformations for DSP Synthesis. Journal of VLSI SignalProcessing, 9, 121-143 (1995), Kluwer Academic Publishers.

  • 7/30/2019 VLSI Programming

    9/61

    Introduction

    Some inspiration from the technology sideVLSIFPGAs

    Some inspiration from the application sideMachine Intellli ence

    2/7/20129

    Bee, SKA, SETIDigital Signal Processing (Software Defined Radio)

    Parhi, Chapters 1, 2DSP Representation MethodsIteration bounds

  • 7/30/2019 VLSI Programming

    10/61

    Some inspiration

    2/7/201210

    rom t e tec no ogy s e

  • 7/30/2019 VLSI Programming

    11/61

    Vertical cut through VLSI circuit

    2/7/201211

  • 7/30/2019 VLSI Programming

    12/61

    Intel 4004 processor [1970]

    1970

    2/7/201212

    4-bit

    2300transis

    tors

  • 7/30/2019 VLSI Programming

    13/61

    Intel Itanium 2 6M Processor [2004]

    2/7/201213

  • 7/30/2019 VLSI Programming

    14/61

    Intel Itanium 2 6M Processor [2004]

    Power dissipation 130 W (107 W typical) 3 Levels of Cache: 16k + 16k, 256K, 6M

    CMOS technology: 130nm Clock frequency1.5 GHz 7,877 pins, 95 percent of the are for power

    2/7/201214

    1,322 Specint base2000 for a single processor! Next generation: 1.7GHz and 9MByte cache!

    Rusu et. al [Intel], Itanium 2 Processor 6M: Higher Frequency andLarger L3 Cache, IEEE Micro April 2004, vol. 24, Issue 2, pp. 10-18.

  • 7/30/2019 VLSI Programming

    15/61

    Moores Law

    2/7/201215

  • 7/30/2019 VLSI Programming

    16/61

    Rule of two [Hu, 1993]

    Every 2 generations of IC technology (6 years)

    device feature size 0.5 x chip size 2 x clock frequency 2 x (no longer true)

    2/7/201216

    number of i/o pins 2 x DRAM capacity 16 x logic-gate density 4 x

  • 7/30/2019 VLSI Programming

    17/61

    ITRS: INTERNATIONAL TECHNOLOGY

    ROADMAP FOR SEMICONDUCTORSThe overall objective of the ITRS is to present industry-wide

    consensus on the best current estimate of the industrysresearch and development needs out to a 15-year horizon.

    As such, it provides a guide to the efforts of companies,universities, governments, and other research providers/funders.

    The ITRS has improved the quality of R&D investment decisionsmade at all levels and has helped channel research efforts toareas that most need research breakthroughs.

    Involves over 1000 technical experts, world wide.

    a self-fulfilling prophecy? or wishful thinking?

    2/7/2012ST-Ericsson confidential17

  • 7/30/2019 VLSI Programming

    18/61

    2009 ITRSMPU/high-performance ASIC

    Half Pitch and Gate Length Trends

    2/7/2012ST-Ericsson confidential18

  • 7/30/2019 VLSI Programming

    19/61

    2009 ITRS Functions/chip and chip size

    2/7/201219

  • 7/30/2019 VLSI Programming

    20/61

    Transistor counts [M]/ Micro Processor Unit

    2/7/2012ST-Ericsson confidential20

  • 7/30/2019 VLSI Programming

    21/61

    FPGA, the last 10 years

    Price: 300x100

    1000

    Capacity

    Speed

    Virtex-II Pro

    Memory

    Bandwidth

    2/7/201221

    1/91 1/92 1/93 1/94 1/95 1/96 1/97 1/98 1/99 1/00 1/01 1/02 1/03 1/04

    Year

    Speed: 20x

    Capacity: 200x

    Source: Xilinx

    1

    10

    Price

    Virtex-II Pro

    10x

    1x

    Spartan-3

  • 7/30/2019 VLSI Programming

    22/61

    Virtex 4 FPGA: 4VSX55

    500MHz clock

    Flexible Logic

    6,144 CLBs

    2/7/201222

    multi-port RAM320 18 kbit

    Programmable

    512 DSP slides

    450MHz

    PowerPC1Gbps

    Differential I/O

    0.6-11.1Gbps

    Serial Trx

  • 7/30/2019 VLSI Programming

    23/61

    Some inspiration

    2/7/201223

    rom t e app cat on s e

  • 7/30/2019 VLSI Programming

    24/61

    All things grand and small [Moravec 98]

    2/7/201224

  • 7/30/2019 VLSI Programming

    25/61

    Chess Machine Performance [Moravec 98]

    2/7/201225

  • 7/30/2019 VLSI Programming

    26/61

    brain power equivalentper $1000 of computer

    Evolution computer power/cost [Moravec 98]

    2/7/201226

  • 7/30/2019 VLSI Programming

    27/61

    The Square Kilometer Array (SKA)

    ... the ultimate exploration tool

    ... and the ultimate

    software defined radio

    MPSoC -- 2010, June3027

  • 7/30/2019 VLSI Programming

    28/61

    The Square Kilometer Array (SKA)

    antenna surface: 1 km2 (sensitivity 50)large physical extent (3000+ km)wide frequency range: 70 MHz 30 GHzfull design by 2010; phase 1: 2017; phase 2: 2022

    1000- 1500 dishes (15m) in the central 5 km (2000-3000 total)

    MPSoC -- 2010, June3028

    + dense and/or sparse aperture arraysconnected to a massive data processor by an optical fibre network

    Software Defined RadioSoftware Defined RadioSoftware Defined RadioSoftware Defined Radio Astronomy

    computational load (on-line) 1 exa MAC1 exa MAC1 exa MAC1 exa MAC (1018 MAC/s)power budget = 30 MW30 MW30 MW30 MW 30 pJ/MAC 30 pJ/MAC 30 pJ/MAC 30 pJ/MAC allallallall----inininin

  • 7/30/2019 VLSI Programming

    29/61

    References

    Chip fotos: http://www-vlsi.stanford.edu/group/chips.html

    ITRS Roadmap http://www.itrs.net/Links/2005ITRS/ExecSum2005.pdf

    2/7/201229

    en w compu er ar ware ma c e uman ra n http://www.jetpress.org/volume1/moravec.htm

    BEE & Square Kilometer Array

    http://bwrc.eecs.berkeley.edu/Research/BEE/ http://seti.berkeley.edu/casper/papers/BEE2_ska2004_poster.pdf http://www.skatelescope.org/

  • 7/30/2019 VLSI Programming

    30/61

    VLSI Digital Signal ProcessingSystems

    2/7/201230

    Parhi, Chapters 1&2

  • 7/30/2019 VLSI Programming

    31/61

    DSP applications classes

    10G

    1G

    100M10M

    1M video

    HDTVradio

    radar

    erate

    2/7/201231

    100k

    10k

    1k

    100

    10

    1

    speechaudio modems

    controlseismic

    modeling

    complexity

    Samp

    # instructions/sample

  • 7/30/2019 VLSI Programming

    32/61

    Typical DSP algorithms

    speech (de-)coding

    speech recognitionspeech synthesisspeaker identification

    sound synthesisecho cancellationmodem: (de-)modulationvision

    2/7/201232

    Hi-fi audio en/decodingnoise cancellationaudio equalization

    ambient acousticemulation.

    image (de-)compressionimage compositionbeam cancellation

    spectral estimationetc.

  • 7/30/2019 VLSI Programming

    33/61

    Typical DSP algorithms: FIR Filters

    Filters reduce signal noise and enhance image or signal qualityby removing unwanted frequencies.

    Finite Impulse Response (FIR) filters compute y(n):

    )(*)()()()(1

    nxnhkixkhiyN

    ==

    2/7/201233

    wherex is the input sequencey is the output sequence

    h is the impulse response (filter coefficients)N is the number of taps (coefficients) in the filter

    Output sequence depends only on input sequence and impulseresponse.

    =

  • 7/30/2019 VLSI Programming

    34/61

    Typical DSP algorithms: IIR Filters

    Infinite Impulse Response (IIR) filters compute:

    =

    =

    +=

    1

    0

    1

    1)()()()()(

    N

    k

    M

    kkixkbkiykaiy

    2/7/201234

    Output sequence depends on input sequence, impulseresponse,as well as previous outputs

    Adaptive filters (FIR and IIR) update their coefficients tominimize the distance between the filter output and thedesired signal.

  • 7/30/2019 VLSI Programming

    35/61

    Typical DSP Algorithms: DFT and FFT

    The Discrete Fourier Transform (DFT) supports frequencydomain (spectral) analysis:

    for k = 0, 1, , N-1, where

    x is the input sequence in the time domain (real or complex)

    1)()(

    21

    0

    ===

    =

    jeWnxWky Nj

    N

    N

    n

    nk

    N

    2/7/201235

    y is an output sequence in the frequency domain (complex)

    The Inverse Discrete Fourier Transform (IDFT) is computed as

    The Fast Fourier Transform (FFT) and its inverse (IFFT) providean efficient method for computing the DFT and IDFT.

    1-n,...1,0,nfor,)()(1

    0

    ==

    =

    N

    k

    nk

    N kyWnx

  • 7/30/2019 VLSI Programming

    36/61

    Typical DSP Algorithms: DCT

    The Discrete Cosine Transform (DCT) and its inverse IDCT arefrequently used in video (de-) compression (e.g., MPEG-2):

    1-N...1,0,kfor,)(]2

    )12(cos[)()(1

    0

    =+

    =

    =

    N

    n

    nxN

    knkeky

    2/7/201236

    where e(k) = 1/sqrt(2)ifk = 0; otherwise e(k) = 1.

    A N-Point, 1D-DCT requires N2MAC operations.

    1-N...1,0,kfor,)(]2cos[)()( 0=

    +

    =

    =knyN

    n

    keNnx

  • 7/30/2019 VLSI Programming

    37/61

    Typical DSP Algorithms: distance calcul.

    Distance calculations are typically used in patternrecognition, motion estimation, and coding.

    Problem: chose the vector rkwhose distance (see below) fromthe input vector xis minimum.

    2/7/201237

    |)()(|11

    0

    =

    =

    N

    ik irix

    Nd

    =

    =

    1

    0

    2)]()([1N

    ik irix

    Nd

    Mean Absolute Difference (MAD) Mean Square Error (MSE)

  • 7/30/2019 VLSI Programming

    38/61

    Typical DSP Algorithms:Matrix Computs

    Matrix computations are typically used to estimate parametersin DSP systems.

    Matrix vector multiplication Matrix-matrix multiplication

    2/7/201238

    Matrix inversion Matrix triangulization

    Matrices often have (band) structures or may be sparse.

  • 7/30/2019 VLSI Programming

    39/61

    Computation Rates

    To estimate the hardware resources required, we can usethe equation:

    where

    SSCNRR =

    2/7/201239

    c

    Rs is the sampling rateNs is the (average) number of operations per sample

    For example, a 1-D FIR has NS = 2Nand a 2-D FIR has NS = 2N

    2.

    C i l R f FIR Fil i

  • 7/30/2019 VLSI Programming

    40/61

    Computational Rates for FIR Filtering

    Signal type Frequency # taps Performance

    Speech 8 kHz N =128 20 MOPs

    Music 48 kHz N =256 240 MOPs

    Video phone 6.75 MHz N*N = 81 1,090 MOPs

    2/7/201240

    TV 27 MHz N*N = 81 4,370 MOPs

    HDTV 144 MHz N*N = 81 23,300 MOPs

  • 7/30/2019 VLSI Programming

    41/61

    2/7/201241

  • 7/30/2019 VLSI Programming

    42/61

    2/7/201242

  • 7/30/2019 VLSI Programming

    43/61

    2/7/201243

  • 7/30/2019 VLSI Programming

    44/61

    2/7/201244

  • 7/30/2019 VLSI Programming

    45/61

    2/7/201245

    z-k = k units delay

    edge labeled n= multiplication by n

  • 7/30/2019 VLSI Programming

    46/61

    Graphical representations: typical usage

    block diagram

    data flow graph

    signal flow graph

    2/7/201246

    general

    signal processing

    LTI systems

    Linear Systems

  • 7/30/2019 VLSI Programming

    47/61

    Linear Systems

    input x, output y:

    discrete system:

    x(n) y(n)

    linear system:

    results in

    2/7/201247

    x

    1(n) + x

    2(n) y

    1(n) + y

    2(n)

    c1 x1(n) + c2 x2(n) c1 y1(n) + c2 y2(n)

    for arbitrary c1 and c2

    Most of our examples will be linear systems

    results in

    results in

    Linear Time-Invariant Systems

  • 7/30/2019 VLSI Programming

    48/61

    Linear Time Invariant Systems

    input x, output y:

    x(n+k) = x(n) shifted by integer k sample periods

    time-invariant system

    x(n) =x(n+k) y(n) = y(n+k)results in

    2/7/201248

    Most of our examples will be linear time-invariant systems, orLTI systems

    Commutativity of LTI systems

  • 7/30/2019 VLSI Programming

    49/61

    Commutativity of LTI systems

    LTISystem A

    LTISystem Bx(n) y(n)

    f(n)

    2/7/201249

    LTI

    System B

    LTI

    System A

    x(n) y(n)g(n)

    is equivalent to

    Iteration of a Synchronous Flow Graph

  • 7/30/2019 VLSI Programming

    50/61

    Iteration of a Synchronous Flow Graph

    Each actor fires the minimum number of times to return thegraph to a particular state

    Example of amulti-rate DFG:A

    1

    B

    2

    C

    32 2 1

    # firings for 1 iteration

    2/7/201250

    2 2 3

    # tokens per edge for 1 iteration

    A A B B C C

    2 4 6 3

    Iteration period

  • 7/30/2019 VLSI Programming

    51/61

    Iteration period

    Iteration period =the time required for the execution of one iteration of the SFG

    Example:

    a

    2/7/201251

    Let

    Tm= 10 = multiplication timeTa= 4 = addition time

    Iteration period = Tm+Ta= 14

    = minimum sample period Ts; that is: Ts Tm+Ta

    + D y n-x n

  • 7/30/2019 VLSI Programming

    52/61

    2/7/201252

  • 7/30/2019 VLSI Programming

    53/61

    2/7/201253

  • 7/30/2019 VLSI Programming

    54/61

    2/7/201254

  • 7/30/2019 VLSI Programming

    55/61

    2/7/201255

    4 types of delay paths

  • 7/30/2019 VLSI Programming

    56/61

    1

    2

    3 outputsinputs

    combinationalfunctions

    4 types of delay paths

    from to

    2/7/201256

    delay elements

    = state

    4

    Finite state machine (FSM) representation of a DSP system

    1 inputs state

    2 state outputs

    3 inputs outputs

    4 state state

  • 7/30/2019 VLSI Programming

    57/61

    2/7/201257

    DSP references

  • 7/30/2019 VLSI Programming

    58/61

    Keshab K. Parhi. VLSI Digital Signal Processing Systems, Design andImplementation. Wiley Inter-Science 1999.

    Richard G. Lyons. Understanding Digital Signal Processing(2ndedition). Prentice Hall 2004.

    2/7/201258

    John G. Proakis and Dimitris K Manolakis. Digital Signal Processing(4th edition), Prentice Hall, 2006.

    Simon Haykin. Neural Networks, a Comprehensive Foundation(2ndedition). Prentice Hall 1999.

    Computer Architecture and DSP references

  • 7/30/2019 VLSI Programming

    59/61

    Hennessy and Patterson, Computer Architecture, aQuantitative Approach. 3rd edition. Morgan Kaufmann, 2002.

    Phil Lapsley, Jeff Bier, Amit Sholam, Edward Lee. DSPProcessor Fundamentals, Berkeley Design Technology, Inc,1994-199

    2/7/201259

    Jennifer Eyre, Jeff Bier, The Evolution of DSP Processors, IEEESignal Processing Magazine, 2000.

    Kees van Berkel et al. Vector Processing as an Enabler forSoftware-Defined Radio in Handheld Devices, EURASIP Journalon Applied Signal Processing 2005:16, 2613-2625.

    VLSI Programming: next lecture

  • 7/30/2019 VLSI Programming

    60/61

    Parhi, Chapters 2, 3

    Representations of DSP algorithms Data flow graphs Loop bounds and iteration bounds

    2/7/201260

    Pipelining of digital filters Parallel processing Retiming techniques

  • 7/30/2019 VLSI Programming

    61/61

    THANK YOU