Lecture 1: Course Intro

Prof. Mike Schulte

Application-Specific Processor Design

ECE 450-11

Course/Instructor Info

• Name: Application-Specific Processor Design

• Number: ECE 450-11

• Homepage: http://www.eecs.lehigh.edu/~mschulte/ece450-00

• Location: 416 Packard Lab

• Time: T, Th 5:00-6:15 PM

• Instructor: Michael Schulte

• Office: 326 Packard Lab

• Phone: 758-5036

• Email: [email protected]

• Office hours: T, Th 6:15-7:00 or by appointment

Course Objectives

• To provide students with the background needed to design and analyze application-specific processors.

• Typical application-specific processing systems include:

– Digital Signal Processing (DSP) Systems

» Cellular telephones, wireless base-stations, modems

– Multimedia Systems

» High-definition TV, video conferencing, computer graphics

– Scientific Computing Systems

» Partial differential equation solvers, vector processors

– Control Systems

» Manufacturing plants, navigation systems, chemical processing

Topics Covered

• Textbook: Peter Pirsch, Architectures for Digital Signal Processing, John Wiley & Sons, 1998.

– Signal Processing Algorithms & Implementations (Chapter 1)

– Computer Arithmetic (Chapter 3)

– Pipelining and Parallel Processing (Chapter 4)

– Array Architectures (Chapter 5)

– FIR, IIR, DFT, and FFT Implementations (Chapters 6 and 7)

– Digital Signal Processors and Multimedia Processors (Chapter 8)

– Multiprocessor Systems (Chapter 9)

– Implementation Strategies (Chapter 10)

• Other useful textbooks:

– Keshab Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons, 1999.

– Vijay K. Madisetti, Digital Signal Processors: An Introduction to Rapid Prototyping and Design Synthesis, IEEE CS Press, 1995.

• Course schedule: http://www.eecs.lehigh.edu/~mschulte/ece450-00/schedule.html










Prerequisites and Grading

• Prerequisites:

– A previous course in computer architecture (e.g., ECE201)

– Experience with hardware description languages (e.g., VHDL or Verilog)

– The course does not assume knowledge of DSP or transistor-level design.

• Grading:

– Homework: 20%

– First exam: 20%

– Second exam: 20%

– Class project: 40%

Course Project

• The course project is to:

– Perform in-depth research on a topic in the field of application-specific processor design

– Research and design an application-specific processor

• The project will consist of:

– Project proposal (9/28/00)

– Status report (10/26/00)

– Final report (12/03/00)

– Project presentation (12/01/00 and 12/03/00)

• Projects may be done individually or in pairs.

Sample Projects

• Sample projects include:

– Research and design (in Verilog or VHDL) of:

» DCT or FFT accelerators

» Viterbi encoders or decoders

» Low-power arithmetic units (e.g., multipliers, adders, multiply-accumulate units, or function approximators)

» Parallel saturating arithmetic units for GSM coders

» Reed-Solomon or Turbo coders

» Novel FIR or IIR implementations

– Propose and evaluate compiler, architecture, or circuit techniques for reducing power dissipation.

– Investigate designs for encryption and decryption.

Useful Web Resources

• Application-specific processor design links: http://www.eecs.lehigh.edu/~mschulte/ece450-00/#asp links

• Computer arithmetic links: http://www.eecs.lehigh.edu/~mschulte/ece450-00/#comp-arch links

• Digital design links: http://www.eecs.lehigh.edu/~mschulte/ece450-00/#design links

• Literature search links:

– Lehigh University Database Systems: http://www.lehigh.edu/~inlib/

– IEEE Xplore: http://ieeexplore.ieee.org/lpdocs/epic03/


DSP Algorithms

DSP Algorithm: System Applications

• Speech coding: digital cellular telephones, personal communications systems, digital cordless telephones, multimedia computers, secure communications

• Speech encryption: digital cellular telephones, personal communications systems, digital cordless telephones, secure communications

• Speech recognition: advanced user interfaces, multimedia workstations, robotics, automotive applications, cellular telephones, personal communications systems

• Speech synthesis: advanced user interfaces, robotics

• Speaker identification: security, multimedia workstations, advanced user interfaces

• High-fidelity audio: consumer audio, consumer video, digital audio broadcast, professional audio, multimedia computers

• Modems: digital cellular telephones, personal communications systems, digital cordless telephones, digital audio broadcast, digital signaling on cable TV, multimedia computers, wireless computing, navigation, data/fax

• Noise cancellation: professional audio, advanced vehicular audio, industrial applications

• Audio equalization: consumer audio, professional audio, advanced vehicular audio, music

• Ambient acoustics emulation: consumer audio, professional audio, advanced vehicular audio, music

• Audio mixing/editing: professional audio, music, multimedia computers

• Sound synthesis: professional audio, music, multimedia computers, advanced user interfaces

• Vision: security, multimedia computers, advanced user interfaces, instrumentation, robotics, navigation

• Image compression: digital photography, digital video, multimedia computers, videoconferencing

• Image compositing: multimedia computers, consumer video, advanced user interfaces, navigation

• Beamforming: navigation, medical imaging, radar/sonar, signals intelligence

• Echo cancellation: speakerphones, hands-free cellular telephones

• Spectral estimation: signals intelligence, radar/sonar, professional audio, music

Typical DSP Algorithms: FIR Filters

• Filters reduce signal noise and enhance image or signal quality by removing unwanted frequencies.

• Finite Impulse Response (FIR) filters compute:

$$y(i) = \sum_{k=0}^{N-1} h(k)\, x(i-k) = h(n) * x(n)$$

where

– x is the input sequence

– y is the output sequence

– h is the impulse response (filter coefficients)

– N is the number of taps (coefficients) in the filter

– * denotes convolution

• The output sequence depends only on the input sequence and the impulse response.
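The direct-form sum above can be sketched in Python as follows. This is a minimal illustration, not the course's or the textbook's implementation: the function name `fir_filter` is hypothetical, and input samples before x(0) are assumed to be zero. Each output sample costs N multiply-accumulate operations.

```python
def fir_filter(h, x):
    """Direct-form FIR filter: y(i) = sum_{k=0}^{N-1} h(k) * x(i - k).

    Assumption: input samples before x[0] are zero.
    """
    n_taps = len(h)
    y = []
    for i in range(len(x)):
        acc = 0.0  # plays the role of a MAC unit's accumulator
        for k in range(n_taps):
            if i - k >= 0:
                acc += h[k] * x[i - k]  # one multiply-accumulate
        y.append(acc)
    return y


# An impulse input returns the impulse response h followed by zeros.
print(fir_filter([1, 2, 1], [1, 0, 0, 0]))  # prints [1.0, 2.0, 1.0, 0.0]
```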

Typical DSP Algorithms: IIR Filters

• Infinite Impulse Response (IIR) filters compute:

$$y(i) = \sum_{k=1}^{M-1} a(k)\, y(i-k) + \sum_{k=0}^{N-1} b(k)\, x(i-k)$$

• The output sequence depends on the input sequence, previous outputs, and the filter coefficients a(k) and b(k).

• Both FIR and IIR filters:

– Require dot product (multiply-accumulate) operations

– Use fixed coefficients

• Adaptive filters update their coefficients to minimize the distance between the filter output and the desired signal.
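A minimal Python sketch of the IIR difference equation, assuming the sign convention shown above (feedback terms are added; some texts subtract them instead) and zero initial conditions. The function name `iir_filter` is hypothetical. The feedback loop is what makes the impulse response infinite.

```python
def iir_filter(a, b, x):
    """IIR filter: y(i) = sum_{k=1}^{M-1} a(k) y(i-k) + sum_{k=0}^{N-1} b(k) x(i-k).

    len(a) == M (a[0] is ignored because the feedback sum starts at k = 1),
    len(b) == N. Inputs and outputs before i = 0 are assumed to be zero.
    """
    y = []
    for i in range(len(x)):
        acc = 0.0
        for k in range(len(b)):        # feedforward (FIR-like) part
            if i - k >= 0:
                acc += b[k] * x[i - k]
        for k in range(1, len(a)):     # feedback part uses earlier outputs
            if i - k >= 0:
                acc += a[k] * y[i - k]
        y.append(acc)
    return y


# One feedback tap a(1) = 0.5: the impulse response decays geometrically
# instead of ending after N samples.
print(iir_filter([0.0, 0.5], [1.0], [1, 0, 0, 0]))  # prints [1.0, 0.5, 0.25, 0.125]
```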

Typical DSP Algorithms: Discrete Fourier Transform

• The Discrete Fourier Transform (DFT) enables spectral analysis in the frequency domain.

• It is computed as

$$y(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad W_N = e^{-j 2\pi / N}, \qquad j = \sqrt{-1}$$

for k = 0, 1, …, N-1, where

– x is the input sequence in the time domain

– y is the output sequence in the frequency domain

• The Inverse Discrete Fourier Transform is computed as

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} y(k)\, W_N^{-nk}, \qquad n = 0, 1, \ldots, N-1$$

• The Fast Fourier Transform (FFT) provides an efficient method for computing the DFT, reducing the operation count from order $N^2$ to order $N \log_2 N$ when N is a power of 2.
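The sketch below contrasts the direct DFT sum (order $N^2$ operations) with a recursive radix-2 decimation-in-time FFT, one common FFT variant chosen here for brevity; it is not necessarily the formulation used in the textbook. Function names are hypothetical, and `fft` assumes the input length is a power of 2.

```python
import cmath


def dft(x):
    """Direct DFT: y(k) = sum_n x(n) W_N^{nk}, W_N = exp(-j 2 pi / N)."""
    n_pts = len(x)
    w = cmath.exp(-2j * cmath.pi / n_pts)
    return [sum(x[n] * w ** (n * k) for n in range(n_pts)) for k in range(n_pts)]


def idft(y):
    """Inverse DFT: x(n) = (1/N) sum_k y(k) W_N^{-nk}."""
    n_pts = len(y)
    w = cmath.exp(-2j * cmath.pi / n_pts)
    return [sum(y[k] * w ** (-n * k) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n_pts = len(x)
    if n_pts == 1:
        return list(x)
    even = fft(x[0::2])   # DFT of even-indexed samples
    odd = fft(x[1::2])    # DFT of odd-indexed samples
    out = [0j] * n_pts
    for k in range(n_pts // 2):
        t = cmath.exp(-2j * cmath.pi * k / n_pts) * odd[k]  # twiddle factor W_N^k
        out[k] = even[k] + t               # butterfly: upper output
        out[k + n_pts // 2] = even[k] - t  # butterfly: lower output
    return out


x = [1, 2, 3, 4, 0, 0, 0, 0]
# Largest difference between the two methods (rounding error only).
print(max(abs(p - q) for p, q in zip(dft(x), fft(x))))
```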

Typical DSP Algorithms: Discrete Cosine Transform

• The Discrete Cosine Transform (DCT) is frequently used in video compression (e.g., MPEG-2).

• The DCT and Inverse DCT (IDCT) are computed as:

$$y(k) = e(k) \sum_{n=0}^{N-1} x(n) \cos\left[\frac{(2n+1)k\pi}{2N}\right], \qquad k = 0, 1, \ldots, N-1$$

$$x(n) = \frac{2}{N} \sum_{k=0}^{N-1} e(k)\, y(k) \cos\left[\frac{(2n+1)k\pi}{2N}\right], \qquad n = 0, 1, \ldots, N-1$$

where $e(k) = 1/\sqrt{2}$ if k = 0 and $e(k) = 1$ otherwise.

• An N-point 1-D DCT computed directly requires $N^2$ MAC operations.
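A direct (order $N^2$) Python sketch of the DCT/IDCT pair above, using the same e(k) scaling so that the IDCT exactly undoes the DCT up to rounding. Function names are hypothetical; fast DCT algorithms would reduce the MAC count but are not shown.

```python
import math


def e(k):
    """Scale factor: 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct(x):
    """y(k) = e(k) sum_n x(n) cos[(2n+1) k pi / (2N)]; N*N MACs overall."""
    n_pts = len(x)
    return [e(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
                       for n in range(n_pts))
            for k in range(n_pts)]


def idct(y):
    """x(n) = (2/N) sum_k e(k) y(k) cos[(2n+1) k pi / (2N)]."""
    n_pts = len(y)
    return [2 / n_pts * sum(e(k) * y[k] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
                            for k in range(n_pts))
            for n in range(n_pts)]


x = [1.0, 2.0, 3.0, 4.0]
print([round(v, 6) for v in idct(dct(x))])  # round trip recovers x
```

A constant input puts all of its energy in y(0), which is why the DCT compacts smooth image blocks into a few coefficients.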

Typical DSP Algorithms: Distance Calculations

• Distance calculations are typically used in pattern recognition, motion estimation, and coding.

• Problem: Choose the vector $r_k$ whose distance from the input vector x is minimum.

• The distance is typically defined as:

– The mean absolute difference (MAD or L1 norm)

$$d = \frac{1}{N} \sum_{i=0}^{N-1} \left| x(i) - r_k(i) \right|$$

– The mean square error (MSE or L2 norm)

$$d = \frac{1}{N} \sum_{i=0}^{N-1} \left[ x(i) - r_k(i) \right]^2$$
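A minimal Python sketch of the search problem stated above: compute MAD or MSE against every candidate $r_k$ and keep the index with the smallest distance. The codebook values and function names are hypothetical, chosen only for illustration.

```python
def mad(x, r):
    """Mean absolute difference: (1/N) sum |x(i) - r(i)|."""
    return sum(abs(xi - ri) for xi, ri in zip(x, r)) / len(x)


def mse(x, r):
    """Mean square error: (1/N) sum [x(i) - r(i)]^2."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, r)) / len(x)


def nearest(x, codebook, dist=mad):
    """Return the index k of the vector r_k with minimum distance from x."""
    return min(range(len(codebook)), key=lambda k: dist(x, codebook[k]))


codebook = [[0, 0, 0, 0], [10, 10, 10, 10], [5, 5, 5, 5]]  # hypothetical r_k
x = [4, 6, 5, 5]
print(nearest(x, codebook), nearest(x, codebook, dist=mse))  # prints 2 2
```

MAD needs only subtraction, absolute value, and addition, which is one reason it is popular for hardware motion estimation; MSE adds a multiplication per element.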

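Typical DSP Algorithms: Matrix Computations

• Matrix computations are typically used to estimate parameters in DSP systems.

• The Gauss-Jordan method for matrix triangularization uses the equations:

$$a'_{ik} = a_{ik} - \frac{a_{ij}\, a_{nk}}{a_{nj}}, \qquad a'_{nk} = \frac{a_{nk}}{a_{nj}}$$

where A is the matrix and $a_{nj}$ is the pivot element.

• Givens' method rotates a matrix by an angle θ, using the equations

$$a'_{ik} = a_{ik}\cos\theta - a_{jk}\sin\theta, \qquad a'_{jk} = a_{ik}\sin\theta + a_{jk}\cos\theta, \qquad \theta = \tan^{-1}\left(\frac{a_{ij}}{a_{jj}}\right)$$

This choice of θ makes $a'_{ij} = a_{ij}\cos\theta - a_{jj}\sin\theta = 0$.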
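A Python sketch of both update rules above, applied row by row. Assumptions: for Gauss-Jordan the pivot row n is taken to be the diagonal row j with no row exchanges, so a zero pivot raises `ZeroDivisionError`; for Givens, $a_{jj} \neq 0$. Function names are hypothetical.

```python
import math


def gauss_jordan(a):
    """Reduce an augmented matrix (list of rows) in place with diagonal pivots."""
    rows, cols = len(a), len(a[0])
    for j in range(rows):
        n = j                    # pivot row (assumption: no row exchanges)
        pivot = a[n][j]          # a_nj
        for i in range(rows):
            if i != n:
                factor = a[i][j] / pivot
                # a'_ik = a_ik - a_ij * a_nk / a_nj
                a[i] = [a[i][k] - factor * a[n][k] for k in range(cols)]
        # a'_nk = a_nk / a_nj
        a[n] = [a[n][k] / pivot for k in range(cols)]
    return a


def givens_rotate(a, i, j):
    """Rotate rows i and j by theta = atan(a_ij / a_jj) so that a'_ij = 0."""
    theta = math.atan(a[i][j] / a[j][j])
    c, s = math.cos(theta), math.sin(theta)
    row_i = [a[i][k] * c - a[j][k] * s for k in range(len(a[i]))]
    row_j = [a[i][k] * s + a[j][k] * c for k in range(len(a[i]))]
    a[i], a[j] = row_i, row_j
    return a


# Solve 2x + y = 5, x + 3y = 10: the last column becomes the solution (1, 3).
print(gauss_jordan([[2.0, 1.0, 5.0], [1.0, 3.0, 10.0]]))
# Zero the element below the diagonal of [[3, 1], [4, 2]].
print(givens_rotate([[3.0, 1.0], [4.0, 2.0]], i=1, j=0))
```

Gauss-Jordan needs a division per pivot, while each Givens rotation needs a sine/cosine pair, matching the division and function-approximation requirements listed in the summary that follows.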

Summary of DSP Applications

• DSP applications typically require:

– Dot product computations (filters, transforms, matrices): $Op_1 \times Op_2$

– Distance calculations (pattern recognition, coding): $|Op_1 - Op_2|$, $(Op_1 - Op_2)^2$

– Division or reciprocals (matrix computations, normalization): $Op_1 - \frac{Op_2\, Op_3}{Op_4}$

– Function approximations (Givens rotations, DFT, DCT): $\sin(Op_1)$, $\cos(Op_2)$

Computation Rates

• To estimate the hardware resources required, we can use the equation:

$$R_C = R_S \cdot n_{op}$$

where

– $R_C$ is the computation rate

– $R_S$ is the sampling rate

– $n_{op}$ is the average number of operations per sample

• For example, a 1-D FIR filter with N taps has $n_{op} = 2N$, and a 2-D FIR filter with an N × N kernel has $n_{op} = 2N^2$.

• What does the above equation assume?

Computational Rates for FIR Filtering

Signal type | Sampling rate | Taps | Performance
Speech | 8 kHz | N = 128 | 2 MOPs
Music | 48 kHz | N = 256 | 24 MOPs
Video phone | 6.75 MHz | N × N = 81 | 1,090 MOPs
TV | 27 MHz | N × N = 81 | 4,370 MOPs
HDTV | 144 MHz | N × N = 81 | 23,300 MOPs
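A small Python sketch of the estimate $R_C = R_S \cdot n_{op}$ applied to the filter sizes in the table, where N × N = 81 is taken as N = 9. The helper names are hypothetical, and, like the equation, the code counts every multiply and add as one operation of equal cost with no overhead.

```python
def computation_rate(sample_rate_hz, ops_per_sample):
    """R_C = R_S * n_op, in operations per second."""
    return sample_rate_hz * ops_per_sample


def fir_ops_1d(n_taps):
    """1-D FIR: N multiplies + N adds per output sample."""
    return 2 * n_taps


def fir_ops_2d(n):
    """2-D FIR with an N x N kernel: 2 * N^2 operations per output sample."""
    return 2 * n * n


rows = [
    ("Speech", 8e3, fir_ops_1d(128)),
    ("Music", 48e3, fir_ops_1d(256)),
    ("Video phone", 6.75e6, fir_ops_2d(9)),
    ("TV", 27e6, fir_ops_2d(9)),
    ("HDTV", 144e6, fir_ops_2d(9)),
]
for name, rs, nop in rows:
    print(f"{name:12s} {computation_rate(rs, nop) / 1e6:10.1f} MOPs")
```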

Computational Requirements

[Chart: computation rates of representative applications, plotted on scales from 100 to 400 MOPs and from 1 GOP to 100 GOPs. Values in parentheses are as labeled on the chart.]

• GSM_FR (2.5)

• GSM_EFR (16)

• GSM_HR, AC-3 decode, V.34 (20)

• 16 × GSM_EFR (380)

• GSM terminal (baseband, HR) (52)

• ADSL XCVR - 1.5 Mb/s (100)

• ADSL XCVR - 6.1 Mb/s (360)

• DFSE EQ - 2 Mb/s (650)

• Full-rate DAB Viterbi decoder, MPEG II MP@ML 30 f/s decode (600)

• P × 64 CIF, 15 f/s, 100 kb/s (1.2)

• MPEG II encode, 30 f/s, full search, P = 16 (35)

• MPEG II encode, MP@ML, 30 f/s, ALG search, P = 16 (1.68)

Implementation Hierarchy

Processing Method: General Description of Task

• Algorithm: actual computation steps and functional relationships

• Architecture: implementation of connected modules, each performing a subtask

• Circuitry: basic module implementation as logic gates or transistors

Processor Classification

• Processor

– DSP

» Fixed point: 16 bit, 20 bit, 24 bit

» Floating point: 32 bit IEEE, other

– General purpose

» Integer: 32 bit + subsets, 64 bit + subsets

» Floating point: 32/64 bit IEEE, other (80 bit)

Simplified DSP Chip

[Block diagram components: DSP core, instruction memory, data memory, A/D converter, D/A converter, serial ports]

DSP Basic Features

• Fast Multiply-Accumulate (MAC)

– DSP filters and transforms are multiply-intensive

• Multiple-Access Memory

– 1 instruction and 2 data accesses per cycle

• Specialized Addressing

– FIFOs, arrays, permutations

• Specialized Program Control

– Efficient loops

– Fast interrupt handling
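The MAC and addressing features above can be imitated in software to show what the hardware saves. This sketch is an illustration under stated assumptions, not a model of any particular DSP: it keeps an FIR delay line in a circular buffer (modulo addressing, one way to realize a FIFO), saturates the integer accumulator after every MAC instead of letting it wrap (a 32-bit accumulator by default, chosen for simplicity), and computes the bit-reversed addresses used to permute radix-2 FFT data. The names `saturate`, `CircularFir`, and `bit_reverse` are hypothetical. On a DSP, address generation and saturation happen in hardware with zero loop overhead.

```python
def saturate(value, bits):
    """Clamp to the signed range of a `bits`-wide register instead of wrapping."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


class CircularFir:
    """Integer FIR whose delay line is a circular buffer (modulo addressing)."""

    def __init__(self, coeffs, acc_bits=32):
        self.h = list(coeffs)          # hypothetical integer coefficients
        self.buf = [0] * len(coeffs)   # delay line used as a FIFO
        self.pos = 0                   # index of the newest sample
        self.acc_bits = acc_bits

    def step(self, sample):
        n = len(self.h)
        self.pos = (self.pos + 1) % n  # advance pointer; overwrite the oldest sample
        self.buf[self.pos] = sample
        acc = 0
        for k in range(n):
            # x(i - k) sits k slots behind the newest sample, modulo N
            acc = saturate(acc + self.h[k] * self.buf[(self.pos - k) % n],
                           self.acc_bits)
        return acc


def bit_reverse(index, bits):
    """Bit-reversed address, the permutation used to reorder radix-2 FFT data."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (index & 1)
        index >>= 1
    return out


f = CircularFir([1, 2, 1])
print([f.step(s) for s in [1, 0, 0, 0]])       # prints [1, 2, 1, 0]
print([bit_reverse(i, 3) for i in range(8)])   # prints [0, 4, 2, 6, 1, 5, 3, 7]
```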

Code Characteristics: General Purpose vs. DSP

• General Purpose

– Limited Parallelism

– Control Dominated

– Inherently Serial

– Branch Intensive (20%)

• DSP

– Parallel Inner Loops

– Loop Setup, then Compute

– Overlapped Parallel Processing

– Multiple Independent Streams

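Amdahl's Law

$$\text{speedup} = \frac{1}{t_{serial} + \dfrac{t_{parallel}}{\text{Speedup}_{parallel}}}$$

where $t_{serial}$ and $t_{parallel}$ are the fractions of the original execution time spent in serial and parallelizable code ($t_{serial} + t_{parallel} = 1$), and $\text{Speedup}_{parallel}$ is the speedup achieved on the parallelizable portion.

[Chart: speedup (log scale, 1 to 10,000) versus % parallel code (0 to 100%); plotted values range from 1.0 with no parallel code to 10,000 as the parallel fraction approaches 100%.]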
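A minimal Python sketch of Amdahl's Law as written above. The function name is hypothetical; the default assumes an unlimited speedup of the parallel portion, so a 100% parallel fraction would divide by zero and is excluded.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup=float("inf")):
    """speedup = 1 / (t_serial + t_parallel / Speedup_parallel), t_serial + t_parallel = 1."""
    t_parallel = parallel_fraction
    t_serial = 1.0 - parallel_fraction
    return 1.0 / (t_serial + t_parallel / parallel_speedup)


for pct in (50, 90, 99, 99.9):
    print(f"{pct:5}% parallel -> speedup {amdahl_speedup(pct / 100):8.1f}")
```

With unlimited parallel speedup, 90% parallel code gives a speedup of about 10 and 99% about 100, the same shape as the chart: the serial fraction, not the parallel hardware, bounds the gain. This is why DSP code, dominated by parallel inner loops, benefits so much from parallel hardware.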

Workload Comparisons

[Chart comparing general-purpose, DSP, and video workloads under Amdahl's Law]

DSP vs. General Purpose

• DSP

– Execution predictability: required to guarantee real-time constraints

– Zero-overhead loop buffer

– Complex instructions: multiple operations issued

– Harvard memory architecture

– Specialized addressing modes

– Operate on stream data

• General Purpose

– Fast but non-predictable: dynamic instruction issue, non-deterministic caches

– Branch prediction

– RISC superscalar instructions: multiple instructions issued

– Von Neumann architecture: a split cache has a similar benefit

– Typically linear addressing

– Caches assume locality

Compilable Architectures

[Diagram labels: Algorithms, Compiler, Architecture, Optimize, Cost/Power, Performance; architecture options with No MAC, 1 MAC, 2 MAC, and n MAC units]

Implementations

[Diagram labels: GSM, xDSL, DBS, LMDS, HFC, QAM, QPSK, DMT, CAP, DFE, MLSE, DSS, OSDM, DFSE, CELP, DAB, DVD, DVB, MPEG2, MPEG4, AC-3, MUSICAM, COFDM, FFT, FIR]

System Integration Trends

[Diagram: GSM base transceiver station (8 FR/EFR, 16 HR channels) with A/D and D/A converters and pre-equalizer, equalizer, and channel-coding stages. One version uses three DSP1620 chips, one per stage, with external SRAM; another uses generic DSP blocks.]

DSP Future

[Diagram: the same GSM base transceiver station (8 FR/EFR, 16 HR channels) with pre-equalizer, equalizer, and channel-coding stages and A/D and D/A converters. Panels are labeled Current Products, Current Designs, and Next Generation Designs.]