Course/Instructor Info
• Name: Application-Specific Processor Design
• Number: ECE 450-11
• Homepage: http://www.eecs.lehigh.edu/~mschulte/ece450-00
• Location: 416 Packard Lab
• Time: T, Th 5:00-6:15 PM
• Instructor: Michael Schulte
• Office: 326 Packard Lab
• Phone: 758-5036
• Email: [email protected]
• Office hours: T, Th 6:15-7:00 or by appointment
Course Objectives
• To provide students with the background needed to design and analyze application-specific processors.
• Typical application-specific processing systems include:
– Digital Signal Processing (DSP) Systems
» Cellular telephones, wireless base-stations, modems
– Multimedia Systems
» High-definition TV, video conferencing, computer graphics
– Scientific Computing Systems
» Partial differential equation solvers, vector processors
– Control Systems
» Manufacturing plants, navigation systems, chemical processing
Topics Covered
• Textbook: Peter Pirsch, Architectures for Digital Signal Processing, John Wiley & Sons, 1998.
– Signal Processing Algorithms & Implementations (Chapter 1)
– Computer Arithmetic (Chapter 3)
– Pipelining and Parallel Processing (Chapter 4)
– Array Architectures (Chapter 5)
– FIR, IIR, DFT, and FFT Implementations (Chapters 6 and 7)
– Digital Signal Processors and Multimedia Processors (Chapter 8)
– Multiprocessor Systems (Chapter 9)
– Implementation Strategies (Chapter 10)
• Other useful textbooks:
– Keshab Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons, 1999.
– Vijay K. Madisetti, Digital Signal Processors: An Introduction to Rapid Prototyping and Design Synthesis, IEEE CS Press, 1995.
• Course schedule: http://www.eecs.lehigh.edu/~mschulte/ece450-00/schedule.html
Prerequisites and Grading
• Prerequisites:
– A previous course in computer architecture (e.g., ECE201)
– Experience with hardware description languages (e.g., VHDL or Verilog)
– The course does not assume knowledge of DSP or transistor-level design.
• Grading:
– Homework: 20%
– First exam: 20%
– Second exam: 20%
– Class project: 40%
Course Project
• The course project is to:
– Perform in-depth research on a topic in the field of application-specific processor design
– Research and design an application-specific processor
• The project will consist of:
– Project proposal (9/28/00)
– Status report (10/26/00)
– Final report (12/03/00)
– Project presentation (12/01/00 and 12/03/00)
• Projects to be done by one or two people
Sample Projects
• Sample projects include:
– Research and design (in Verilog or VHDL)
» DCT or FFT accelerators
» Viterbi encoders or decoders
» Low-power arithmetic units (e.g., multipliers, adders, multiply-accumulate units, or function approximators)
» Parallel saturating arithmetic units for GSM coders
» Reed-Solomon or Turbo coders
» Novel FIR or IIR implementations
– Propose and evaluate compiler, architecture, or circuit techniques for reducing power dissipation.
– Investigate designs for encryption and decryption.
Useful Web Resources
• Application-specific processor design links: http://www.eecs.lehigh.edu/~mschulte/ece450-00/#asp links
• Computer arithmetic links: http://www.eecs.lehigh.edu/~mschulte/ece450-00/#comp-arch links
• Digital design links: http://www.eecs.lehigh.edu/~mschulte/ece450-00/#design links
• Literature search links:
– Lehigh University Database Systems: http://www.lehigh.edu/~inlib/
– IEEE Xplore: http://ieeexplore.ieee.org/lpdocs/epic03/
DSP Algorithms
DSP Algorithm: System Application
• Speech Coding: Digital cellular telephones, personal communications systems, digital cordless telephones, multimedia computers, secure communications
• Speech Encryption: Digital cellular telephones, personal communications systems, digital cordless telephones, secure communications
• Speech Recognition: Advanced user interfaces, multimedia workstations, robotics, automotive applications, cellular telephones, personal communications systems
• Speech Synthesis: Advanced user interfaces, robotics
• Speaker Identification: Security, multimedia workstations, advanced user interfaces
• High-fidelity Audio: Consumer audio, consumer video, digital audio broadcast, professional audio, multimedia computers
• Modems: Digital cellular telephones, personal communications systems, digital cordless telephones, digital audio broadcast, digital signaling on cable TV, multimedia computers, wireless computing, navigation, data/fax
• Noise Cancellation: Professional audio, advanced vehicular audio, industrial applications
• Audio Equalization: Consumer audio, professional audio, advanced vehicular audio, music
• Ambient Acoustics Emulation: Consumer audio, professional audio, advanced vehicular audio, music
• Audio Mixing/Editing: Professional audio, music, multimedia computers
• Sound Synthesis: Professional audio, music, multimedia computers, advanced user interfaces
• Vision: Security, multimedia computers, advanced user interfaces, instrumentation, robotics, navigation
• Image Compression: Digital photography, digital video, multimedia computers, videoconferencing
• Image Compositing: Multimedia computers, consumer video, advanced user interfaces, navigation
• Beamforming: Navigation, medical imaging, radar/sonar, signals intelligence
• Echo Cancellation: Speakerphones, hands-free cellular telephones
• Spectral Estimation: Signals intelligence, radar/sonar, professional audio, music
Typical DSP Algorithms: FIR Filters
• Filters reduce signal noise and enhance image or signal quality by removing unwanted frequencies.
• Finite Impulse Response (FIR) filters compute:
$$y(i) = \sum_{k=0}^{N-1} h(k)\, x(i-k) = h(n) * x(n)$$
where
– x is the input sequence
– y is the output sequence
– h is the impulse response (filter coefficients)
– N is the number of taps (coefficients) in the filter
• Output sequence depends only on input sequence and impulse response.
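The FIR sum can be sketched directly in Python. This is a minimal illustration, not an optimized implementation: the name `fir_filter` and the treatment of x(i) as zero for i < 0 are assumptions made for the example. Each inner-loop iteration corresponds to one multiply-accumulate (MAC).

```python
def fir_filter(x, h):
    """Direct-form FIR: y(i) = sum over k = 0..N-1 of h(k) * x(i - k)."""
    n_taps = len(h)
    y = []
    for i in range(len(x)):
        acc = 0.0
        for k in range(n_taps):
            if i - k >= 0:                # assumption: x(i) = 0 for i < 0
                acc += h[k] * x[i - k]    # one multiply-accumulate
        y.append(acc)
    return y


# A 4-tap moving average applied to a unit step.
print(fir_filter([1.0, 1.0, 1.0, 1.0, 1.0], [0.25, 0.25, 0.25, 0.25]))
# prints [0.25, 0.5, 0.75, 1.0, 1.0]
```

Feeding a unit impulse into the same function returns the coefficients h, which is why h is called the impulse response.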
Typical DSP Algorithms: IIR Filters
• Infinite Impulse Response (IIR) filters compute:
$$y(i) = \sum_{k=1}^{M-1} a(k)\, y(i-k) + \sum_{k=0}^{N-1} b(k)\, x(i-k)$$
• Output sequence depends on input sequence, previous outputs, and impulse response.
• Both FIR and IIR filters
– Require dot product (multiply-accumulate) operations
– Use fixed coefficients
• Adaptive filters update their coefficients to minimize the distance between the filter output and the desired signal.
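A direct-form IIR filter can be sketched the same way, with a feedback sum over previous outputs added to the feedforward sum over inputs. The name `iir_filter`, the convention that `a[0]` is an unused placeholder (so that `a[k]` lines up with a(k) for k >= 1), and the zero initial conditions are assumptions made for this example.

```python
def iir_filter(x, a, b):
    """Direct-form IIR: feedback taps a(1..) on past outputs, feedforward taps b(0..) on inputs."""
    y = []
    for i in range(len(x)):
        acc = 0.0
        for k in range(1, len(a)):        # feedback terms; a[0] is an unused placeholder
            if i - k >= 0:                # assumption: y(i) = 0 for i < 0
                acc += a[k] * y[i - k]
        for k in range(len(b)):           # feedforward terms
            if i - k >= 0:                # assumption: x(i) = 0 for i < 0
                acc += b[k] * x[i - k]
        y.append(acc)
    return y


# First-order recursion y(i) = 0.5 * y(i-1) + x(i), driven by a unit impulse.
print(iir_filter([1.0, 0.0, 0.0, 0.0], a=[0.0, 0.5], b=[1.0]))
# prints [1.0, 0.5, 0.25, 0.125]
```

The response keeps halving and never becomes exactly zero, which is the sense in which the impulse response is infinite.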
Typical DSP Algorithms: Discrete Fourier Transform
• The Discrete Fourier Transform (DFT) allows for spectral analysis in the frequency domain.
• It is computed as
$$y(k) = \sum_{n=0}^{N-1} W_N^{nk}\, x(n), \qquad W_N = e^{-j 2\pi / N}, \quad j = \sqrt{-1}$$
for k = 0, 1, … , N-1, where
– x is the input sequence in the time domain
– y is the output sequence in the frequency domain
• The Inverse Discrete Fourier Transform is computed as
$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} W_N^{-nk}\, y(k), \quad \text{for } n = 0, 1, \ldots, N-1$$
• The Fast Fourier Transform (FFT) provides an efficient method for computing the DFT.
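For reference, the DFT and its inverse can be evaluated straight from their definitions. The sketch below assumes the common convention W_N = exp(-j 2π/N) with the 1/N factor placed on the inverse; the names `dft` and `idft` are chosen for the example. It performs N complex MACs per output, N^2 in total, which is the cost the FFT reduces.

```python
import cmath


def dft(x):
    """Direct DFT: y(k) = sum over n of W^(n*k) * x(n), with W = exp(-2j*pi/N)."""
    n_pts = len(x)
    w = cmath.exp(-2j * cmath.pi / n_pts)
    return [sum(x[n] * w ** (n * k) for n in range(n_pts)) for k in range(n_pts)]


def idft(y):
    """Inverse DFT, assuming the 1/N normalization sits on the inverse."""
    n_pts = len(y)
    w = cmath.exp(-2j * cmath.pi / n_pts)
    return [sum(y[k] * w ** (-n * k) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


spectrum = dft([1.0, 2.0, 3.0, 4.0])   # approximately [10, -2+2j, -2, -2-2j]
recovered = idft(spectrum)             # approximately the original samples
```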
Typical DSP Algorithms: Discrete Cosine Transform
• The Discrete Cosine Transform (DCT) is frequently used in video compression (e.g., MPEG-2).
• The DCT and Inverse DCT (IDCT) are computed as:
$$y(k) = e(k) \sum_{n=0}^{N-1} \cos\!\left[\frac{(2n+1)k\pi}{2N}\right] x(n), \quad \text{for } k = 0, 1, \ldots, N-1$$
$$x(n) = \frac{2}{N} \sum_{k=0}^{N-1} e(k) \cos\!\left[\frac{(2n+1)k\pi}{2N}\right] y(k), \quad \text{for } n = 0, 1, \ldots, N-1$$
where e(k) = 1/sqrt(2) if k = 0; otherwise e(k) = 1.
• An N-point 1-D DCT requires N^2 MAC operations.
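The N^2 MAC count is easy to see when the DCT is written as two nested sums. The sketch below assumes the DCT-II form with e(0) = 1/sqrt(2), no scale factor on the forward transform, and a 2/N factor on the inverse; the names `dct`, `idct`, and `scale` are chosen for the example.

```python
import math


def scale(k):
    """e(k) from the slide: 1/sqrt(2) for k = 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def dct(x):
    """N-point 1-D DCT by direct evaluation: N outputs, N MACs each."""
    n_pts = len(x)
    return [scale(k) * sum(math.cos((2 * n + 1) * k * math.pi / (2 * n_pts)) * x[n]
                           for n in range(n_pts))
            for k in range(n_pts)]


def idct(y):
    """Inverse DCT, assuming the 2/N normalization sits on the inverse."""
    n_pts = len(y)
    return [(2.0 / n_pts) * sum(scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts)) * y[k]
                                for k in range(n_pts))
            for n in range(n_pts)]


coeffs = dct([1.0, 2.0, 3.0, 4.0])
restored = idct(coeffs)   # approximately [1.0, 2.0, 3.0, 4.0]
```

A constant input puts all of its energy into y(0), which is the property compression schemes exploit on smooth image blocks.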
Typical DSP Algorithms: Distance Calculations
• Distance calculations are typically used in pattern recognition, motion estimation, and coding.
• Problem: Choose the vector r_k whose distance from the input vector x is minimum.
• The distance is typically defined as:
– The mean absolute difference (MAD or L1 norm)
$$d = \frac{1}{N} \sum_{i=0}^{N-1} \left| x(i) - r_k(i) \right|$$
– The mean square error (MSE or L2 norm)
$$d = \frac{1}{N} \sum_{i=0}^{N-1} \left[ x(i) - r_k(i) \right]^2$$
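The minimum-distance search can be sketched as follows. The function names `mad`, `mse`, and `nearest_vector`, and the small reference set, are illustrative assumptions and not part of the course material.

```python
def mad(x, r):
    """Mean absolute difference between vectors x and r."""
    return sum(abs(xi - ri) for xi, ri in zip(x, r)) / len(x)


def mse(x, r):
    """Mean square error between vectors x and r."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, r)) / len(x)


def nearest_vector(x, refs, distance=mad):
    """Return the index k of the reference vector r_k closest to x."""
    return min(range(len(refs)), key=lambda k: distance(x, refs[k]))


refs = [[0, 0, 0], [1, 2, 4], [5, 5, 5]]
print(nearest_vector([1, 2, 3], refs))        # prints 1
print(nearest_vector([1, 2, 3], refs, mse))   # prints 1
```

MAD needs only subtract, absolute value, and accumulate, while MSE adds a multiply per element; that difference is one reason MAD is often preferred in hardware.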
Typical DSP Algorithms: Matrix Computations
• Matrix computations are typically used to estimate parameters in DSP systems.
• The Gauss-Jordan method for matrix triangularization uses the equations:
$$a'_{ik} = a_{ik} - a_{ij}\,\frac{a_{nk}}{a_{nj}}, \qquad a'_{nk} = \frac{a_{nk}}{a_{nj}}$$
where A is the matrix and a_nj is the pivot element.
• Givens' method rotates a matrix by an angle θ, using the equations
$$a'_{ik} = a_{ik}\cos\theta - a_{jk}\sin\theta, \qquad a'_{jk} = a_{ik}\sin\theta + a_{jk}\cos\theta, \qquad \theta = \tan^{-1}\frac{a_{ij}}{a_{jj}}$$
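A single Givens rotation can be sketched as below. Two assumptions are made for the example: the signs are chosen so that the rotation zeroes element a_ij, and `math.atan2` is used instead of a plain arctangent of a_ij / a_jj so that a zero a_jj does not cause a division by zero. The name `givens_rotate` is illustrative.

```python
import math


def givens_rotate(a, i, j):
    """Rotate rows i and j of matrix a by theta = atan(a[i][j] / a[j][j]) so a[i][j] becomes 0."""
    theta = math.atan2(a[i][j], a[j][j])   # substituted for tan^-1(a_ij / a_jj)
    c, s = math.cos(theta), math.sin(theta)
    out = [row[:] for row in a]            # leave the input matrix untouched
    for k in range(len(a[0])):
        out[i][k] = a[i][k] * c - a[j][k] * s
        out[j][k] = a[i][k] * s + a[j][k] * c
    return out


rotated = givens_rotate([[3.0, 1.0], [4.0, 2.0]], i=1, j=0)
# rotated is approximately [[5.0, 2.2], [0.0, 0.4]]: the element below the diagonal is zeroed.
```

Each updated element needs two multiplies and one add, plus the sine and cosine of θ, which is why rotations appear under both dot products and function approximations.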
Summary of DSP Applications
• DSP applications typically require
– Dot product computations (filters, transforms, matrices)
– Distance calculations (pattern recognition, coding)
– Division or reciprocals (matrix computations, normalization)
– Function approximations (Givens rotations, DFTs, DCTs)
• Corresponding operations: $Op_1 \times Op_2$; $|Op_1 - Op_2|$ and $(Op_1 - Op_2)^2$; $Op_1 - \frac{Op_2\, Op_3}{Op_4}$; $Op_1 \cos\theta \pm Op_2 \sin\theta$
Computation Rates
• To estimate the hardware resources required, we can use the equation:
$$R_C = R_S \cdot n_{op}$$
where
– R_C is the computation rate
– R_S is the sampling rate
– n_op is the average number of operations per sample
• For example, a 1-D FIR has n_op = 2N and a 2-D FIR has n_op = 2N^2.
• What does the above equation assume?
Computational Rates for FIR Filtering
Signal type    Sampling frequency    # taps        Performance
Speech         8 kHz                 N = 128       20 MOPs
Music          48 kHz                N = 256       24 MOPs
Video phone    6.75 MHz              N*N = 81      1,090 MOPs
TV             27 MHz                N*N = 81      4,370 MOPs
HDTV           144 MHz               N*N = 81      23,300 MOPs
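The table entries can be checked against the rate equation with a few lines of Python. The helper names below are assumptions for the example, and the operation counts use n_op = 2N for a 1-D FIR and n_op = 2 * (N*N) for a 2-D FIR with N*N taps. Under those assumptions the music, video phone, TV, and HDTV rows come out close to the listed values, while the speech row evaluates to about 2 MOPs, so the 20 MOPs entry may rest on an assumption that the slide does not state.

```python
def computation_rate(sampling_rate_hz, ops_per_sample):
    """R_C = R_S * n_op, in operations per second."""
    return sampling_rate_hz * ops_per_sample


def fir_ops_1d(n_taps):
    """One multiply and one add per tap: n_op = 2N."""
    return 2 * n_taps


def fir_ops_2d(n_taps_total):
    """2-D FIR with N*N taps in total: n_op = 2 * (N*N)."""
    return 2 * n_taps_total


print(computation_rate(48_000, fir_ops_1d(256)))      # prints 24576000  (about 24.6 MOPs)
print(computation_rate(27_000_000, fir_ops_2d(81)))   # prints 4374000000  (about 4,370 MOPs)
print(computation_rate(8_000, fir_ops_1d(128)))       # prints 2048000  (about 2 MOPs)
```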
Computational Requirements
[Figure: computational requirements of typical applications. Axis ticks: 100 MOPs, 200 MOPs, 300 MOPs, 400 MOPs, 1 GOP, 10 GOPs, 100 GOPs. Chart labels, with the value shown in parentheses on the chart:]
– GSM_FR (2.5)
– GSM_EFR (16)
– GSM_HR, AC-3 decode, V.34 (20)
– 16 X GSM_EFR (380)
– GSM Terminal (Baseband, HR) (52)
– ADSL XCVR - 1.5Mb/s (100)
– ADSL XCVR - 6.1Mb/s (360)
– DFSE EQ - 2Mb/s (650)
– Full-rate DAB Viterbi Decoder, MPEG II MP@ML, 30fps Decode (600)
– P X 64 CIF, 15 f/s, 100kb/s (1.2)
– MPEG II Encode, 30f/s, Full Search, P=16, (35)
– MPEG II Encode, MP@ML, 30f/s, ALG Search, P=16, (1.68)
Implementation Hierarchy
Processing Method: General Description of Task
• Algorithm: Actual computation steps and functional relationships
• Architecture: Implementation of connected modules, each with a subtask
• Circuitry: Basic module implementation, as logic gates or transistors
Processor Classification
• Processor
– DSP
» Fixed Point: 16 bit, 20 bit, 24 bit
» Floating Point: 32 bit IEEE, Other
– General Purpose
» Integer: 32 bit + subsets, 64 bit + subsets
» Floating Point: 32/64 bit IEEE, Other (80 bit)
DSP Basic Features
• Fast Multiply-Accumulate (MAC)
– DSP filters and transforms are multiply intensive
• Multiple Access Memory
– 1 instruction, 2 data per cycle
• Specialized Addressing
– FIFO, arrays, permutations
• Specialized Program Control
– Efficient loops
– Fast interrupt handling
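To make the addressing and MAC features concrete, the sketch below models a FIR delay line as a circular buffer. Modulo addressing lets the newest sample overwrite the oldest without moving any data, which is roughly what a DSP's FIFO addressing mode provides in hardware. The class name `CircularFir` and its layout are assumptions for illustration and do not describe any particular processor.

```python
class CircularFir:
    """FIR filter whose delay line is a circular buffer addressed modulo its length."""

    def __init__(self, h):
        self.h = list(h)
        self.delay = [0.0] * len(self.h)   # delay line, initially all zeros
        self.head = 0                      # index of the newest sample

    def step(self, sample):
        n_taps = len(self.h)
        self.head = (self.head - 1) % n_taps       # wrap the pointer instead of shifting data
        self.delay[self.head] = sample             # newest sample overwrites the oldest
        acc = 0.0
        for k in range(n_taps):                    # tight inner loop: one MAC per tap
            acc += self.h[k] * self.delay[(self.head + k) % n_taps]
        return acc


fir = CircularFir([0.25, 0.25, 0.25, 0.25])
print([fir.step(1.0) for _ in range(5)])
# prints [0.25, 0.5, 0.75, 1.0, 1.0]
```

In each iteration the loop fetches one coefficient and one data sample, which is the "1 instruction, 2 data per cycle" access pattern that multiple-access memory is meant to sustain.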
Code Characteristics: General Purpose vs. DSP
• General Purpose
– Limited Parallelism
– Control Dominated
– Inherently Serial
– Branch Intensive (20%)
• DSP
– Parallel Inner Loops
– Loop Setup, then Compute
– Overlapped Parallel Processing
– Multiple Independent Streams
Amdahl’s Law
$$\text{speedup} = \frac{1}{t_{serial} + \dfrac{t_{parallel}}{\text{Speedup}_{parallel}}}$$
[Figure: Speedup (log scale, 1 to 10000) versus % Parallel Code (0 to 100). Data labels along the curve: 1.0, 1.1, 1.3, 1.4, 1.7, 2.0, 2.5, 3.3, 5.0, 6.7, 10.0, 20.0, 33.3, 50.0, 100.0, 200.0, 500.0, 1000.0, 10000.0. Workload comparisons marked on the plot: General Purpose, DSP, Video.]
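Amdahl's Law is simple to evaluate numerically. The sketch below assumes that the serial and parallel times are expressed as fractions of the original run time that sum to 1; the name `amdahl_speedup` and the default of an unbounded parallel speedup are assumptions for the example. With that default, the result reduces to 1 / (1 - parallel fraction).

```python
def amdahl_speedup(parallel_fraction, parallel_speedup=float("inf")):
    """Overall speedup when a fraction of the run time is sped up by parallel_speedup."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


print(amdahl_speedup(0.5))                   # prints 2.0
print(round(amdahl_speedup(0.9), 1))         # prints 10.0
print(round(amdahl_speedup(0.9, 10.0), 2))   # prints 5.26
```

Even with 90% of the code parallelized and a tenfold speedup on that part, the overall gain is only about 5x, which shows why the large speedups on the plot require code that is almost entirely parallel.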
DSP vs. General Purpose
DSP:
• Execution Predictability
– Required to guarantee real-time constraints
• Zero-overhead Loop Buffer
• Complex Instructions
– Multiple Operations Issued
• Harvard Memory Architecture
• Specialized Addressing Modes
• Operate on Stream Data
General Purpose:
• Fast But Non-predictable
– Dynamic Instruction Issue
– Non-deterministic caches
• Branch Prediction
• RISC Superscalar Instructions
– Multiple Instructions Issued
• Von Neumann Architecture
– Split Cache has similar benefit
• Typically Linear Addressing
• Caches Assume Locality
Compilable Architectures
[Figure labels: Algorithms, Compiler, Architecture; Optimize: Cost / Power, Performance; architecture options: No MAC, 1 MAC, 2 MAC, n MAC; Implementations: GSM, xDSL, DBS, LMDS, HFC, QAM, QPSK, DMT, CAP, DFE, MLSE, DSS, OSDM, DFSE, CELP, DAB, DVD, DVB, MPEG2, MPEG4, AC-3, MUSICAM, COFDM, FFT, FIR]
System Integration Trends
[Figure: a GSM Base Transceiver Station (8 FR/EFR, 16 HR channels) drawn three times, under the captions Current Products, Current Designs, and Next Generation Designs. Block labels: three DSP1620 devices (Pre-equalizer, Equalizer, Channel Coding) with A/D, A/D, and D/A converters and six SRAMs; three DSP blocks with A/D, A/D, and D/A converters; one "DSP Future" block (Pre-equalizer, Equalizer, Channel Coding) with A/D, A/D, and D/A converters.]