ENCM 515 Review talk on 2001 Final
A. Wong, Electrical and Computer Engineering, University of Calgary, Canada
wona @ ucalgary.ca
To Be Tackled Today
Review important concepts of DSP
2001 ENCM 515 Final Exam
  Question 1
  Question 2
Disclaimer
The answers given in this presentation are the views of the presenter and not necessarily the answers accepted by Dr. Smith
Requirements for “perfect” DSP architecture - 1
Fast instruction cycle -- not clock speed
Fast hardware multiplier
Floating point for easier design -- avoids scaling and overflow
High precision
Wide busses for registers, memory, processing units
Fast loop operation
Requirements for “perfect” DSP architecture - 2
Several data buses available to reduce memory bus conflict/transfer overhead
Harvard architecture and/or instruction caches to avoid instruction and data-fetch clashes
Duplicate resources for parallel computation
Dedicated address calculation hardware
Requirements for “perfect” DSP architecture - 3
Extensive temporary registers to avoid unnecessary fetches of continually used data
Architecture allows easy parallel operation in multiprocessor systems -- NEW
Cycle time adjustable by instruction -- UNCOMMON
Duplicate resources for parallel computation of real and imaginary components -- UNCOMMON -- SIMD?
2001 Final Exam - 1
Assume that non-volatile registers have been saved as needed and that the DAG registers I4, M4, B4, L4, I3, M3, I12, M12 have been set correctly.
A – circle the compute component of ONE 21k instruction
B – circle the first totally parallel instruction in code
C – circle the instructions that demonstrate filling the algorithm pipeline
1 F9 = F9 - F9, R2 = 256;
2 F1 = dm(I4,M4), F5 = pm(I12,M12);
3 lcntr = R2, do (pc, END_DEMOD - 1)
4 F13 = F1 * F5, F9 = F9 + F13, F1 = dm(I4,M4), F5 = pm(I12,M12);
END_DEMOD:
5 F13 = F1 * F5, F9 = F9 + F13;
6 dm(I3,M3)
2001 Final Exam – 1 – D.S.A
A – circle the compute component of ONE 21k instruction -- OK
B – circle the first totally parallel instruction in code -- OK
C – circle the instructions that demonstrate filling the algorithm pipeline – the dm and pm in 2 and the + and * in 4
1 F9 = F9 - F9, R2 = 256;
2 F1 = dm(I4,M4), F5 = pm(I12,M12);
3 lcntr = R2, do (pc, END_DEMOD - 1)
4 F13 = F1 * F5, F9 = F9 + F13, F1 = dm(I4,M4), F5 = pm(I12,M12);
END_DEMOD:
5 F13 = F1 * F5, F9 = F9 + F13;
6 dm(I3,M3)
2001 Final Exam - 2
Briefly explain, using the context of this code, the concept of a pipeline in parallel instruction processors.
Answer – Pipelines are necessary for parallelizing the above code, since it uses the same registers at different stages of the instruction cycle (Fetch, Decode, and Execute).
2001 Final Exam - 3
The code would be more understandable if the first instruction had been written as F9 = 0, R2 = 256 but that was not possible. Explain.
Answer – There is a set number of bits on the data bus. If the instruction uses too many constants, there may not be enough bits to store the numbers.
2001 Final Exam – 3 – D.S.A
The code would be more understandable if the first instruction had been written as F9 = 0, R2 = 256 but that was not possible. Explain.
Answer – There is a set number of bits on the data bus. If the instruction uses too many constants, there may not be enough bits to store the numbers.
Answer – Incomplete. Better: each constant takes 32 bits, so a total of 64 bits is needed, but the program bus that carries instructions is only 48 bits wide.
2001 Final Exam - 4
The code will not provide the correct synchronous detection result. There are a number of ways of fixing the code. Would changing instruction 2 to F13 = F13 - F13, F1 = dm(I4,M4), F5 = pm(I12,M12); be one of them?
Answer – Yes. Because F13 is not initially set to 0, it may contain “garbage” when it is first used, resulting in an error.
2001 Final Exam - 5
Explain the differences and relative advantages between processors with a von Neumann and Harvard architecture.
[Figure: two block diagrams. Von Neumann: a CPU linked by one address bus and one data bus. Harvard: a CPU linked to ROM and to data memory, each with its own address bus and data bus.]
2001 Final Exam – 5 – D.S.A.
Pictures are nice – but N. Q. A. – The question said “Relative advantages and disadvantages” and you never discussed these at all.
[Figure: two block diagrams. Von Neumann: a CPU linked by one address bus and one data bus. Harvard: a CPU linked to ROM and to data memory, each with its own address bus and data bus.]
2001 Final Exam - 6
Using processors discussed in ENCM 515, provide examples of processors with a von Neumann and with a Harvard architecture.
Answer – von Neumann (68k), Harvard (29k)
2001 Final Exam - 7
The SHARC 21k does not have a Harvard architecture but a Super Harvard ARChitecture. What are the advantages of having a super Harvard over the normal type, and under what circumstances will these advantages disappear?
Answer – The 21k allows caching of instructions for fast access. The advantage disappears when the cache is full or when cache thrash occurs.
2001 Final Exam - 8
Consider the code given earlier. Will instruction 6 be cached? If it is, how do you know? If not, why not?
Answer – No, caching only occurs when a data access on the PM bus conflicts with an instruction access on the PM bus.
2001 Final Exam – 8 – D.S.A
Answer – No, caching only occurs when a data access on the PM bus conflicts with an instruction access on the PM bus.
ANSWER – Yes -- the PM data access in instruction 4 inside the loop clashes with the fetch of instruction 6 outside the loop.
1 F9 = F9 - F9, R2 = 256;
2 F1 = dm(I4,M4), F5 = pm(I12,M12);
3 lcntr = R2, do (pc, END_DEMOD - 1)
4 F13 = F1 * F5, F9 = F9 + F13, F1 = dm(I4,M4), F5 = pm(I12,M12);
END_DEMOD:
5 F13 = F1 * F5, F9 = F9 + F13;
6 dm(I3,M3)
Homework
Saturation arithmetic – Design, write and document a 21k assembly language code segment that accesses N points of a floating point array PMarray[] over the PM data bus, TRIPLES each value and sets all results above +25.0 to be equal to +25.0 before storing the result into a floating point array DMarray[] over the DM data bus.
.segment/pm seg_pmda;
.var PMarray[256]; // The initial array
.endseg;
.segment/dm seg_dmda;
.var DMarray[512]; // The final array
.var N; // The number of values to be converted
.endseg;