EEM478-WEEK1
Introduction
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 2
Learning Objectives
Why process signals digitally?
Definition of a real-time application.
Why use Digital Signal Processing processors?
What are the typical DSP algorithms?
Parameters to consider when choosing a DSP processor.
Programmable vs ASIC DSP.
Texas Instruments’ TMS320 family.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 3
Why go digital?
Digital signal processing techniques are now so powerful that sometimes it is extremely difficult, if not impossible, for analogue signal processing to achieve similar performance.
Examples: FIR filter with linear phase.
Adaptive filters.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 4
Why go digital?
Analogue signal processing is achieved by using analogue components such as: Resistors.
Capacitors.
Inductors.
The inherent tolerances associated with these components, temperature, voltage changes and mechanical vibrations can dramatically affect the effectiveness of the analogue circuitry.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 5
Why go digital?
With DSP it is easy to: Change applications.
Correct applications.
Update applications.
Additionally DSP reduces: Noise susceptibility.
Chip count.
Development time.
Cost.
Power consumption.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 6
Why NOT go digital?
High frequency signals cannot be processed digitally because of two reasons: Analog to Digital Converters, ADC cannot
work fast enough.
The application can be too complex to be performed in real-time.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 7
DSP processors have to perform tasks in real-time, so how do we define real-time?
The definition of real-time depends on the application.
Example: a 100-tap FIR filter is performed in real-time if the DSP can perform and complete the following operation between two samples:
Real-time processing
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 8
We can say that we have a real-time application if: Waiting Time 0
Real-time processing
Processing TimeWaiting Time
Sample Timen n+1
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 9
Why not use a General Purpose Processor (GPP) such as a Pentium instead of a DSP processor? What is the power consumption of a
Pentium and a DSP processor?
What is the cost of a Pentium and a DSP processor?
Why do we need DSP processors?
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 10
Use a DSP processor when the following are required: Cost saving.
Smaller size.
Low power consumption.
Processing of many “high” frequency signals in real-time.
Use a GPP processor when the following are required: Large memory.
Advanced operating systems.
Why do we need DSP processors?
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 11
What are the typical DSP algorithms?
The Sum of Products (SOP) is the key element in most DSP algorithms:
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 12
Hardware vs. Microcode multiplication
DSP processors are optimised to perform multiplication and addition operations.
Multiplication and addition are done in hardware and in one cycle.
Example: 4-bit multiply (unsigned).
1011x 1110
1011x 1110
Hardware Microcode
10011010 00001011.1011..
1011...
10011010
Cycle 1Cycle 2Cycle 3Cycle 4
Cycle 5
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 13
Parameters to consider when choosing a DSP processor
Parameter
Arithmetic format
Extended floating point
Extended Arithmetic
Performance (peak)
Number of hardware multipliers
Number of registers
Internal L1 program memory cache
Internal L1 data memory cache
Internal L2 cache
32-bit
N/A
40-bit
1200MIPS
2 (16 x 16-bit) with 32-bit result
32
32K
32K
512K
32-bit
64-bit
40-bit
1200MFLOPS
2 (32 x 32-bit) with 32 or 64-bit result
32
32K
32K
512K
TMS320C6211 (@150MHz)
TMS320C6711 (@150MHz)
C6711 Datasheet: \Links\TMS320C6711.pdf
C6211 Datasheet: \Links\TMS320C6211.pdf
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 14
Parameters to consider when choosing a DSP processor
Parameter
I/O bandwidth: Serial Ports (number/speed)
DMA channels
Multiprocessor support
Supply voltage
Power management
On-chip timers (number/width)
Cost
Package
External memory interface controller
JTAG
2 x 75Mbps
16
Not inherent
3.3V I/O, 1.8V Core
Yes
2 x 32-bit
US$ 21.54
256 Pin BGA
Yes
Yes
2 x 75Mbps
16
Not inherent
3.3V I/O, 1.8V Core
Yes
2 x 32-bit
US$ 21.54
256 Pin BGA
Yes
Yes
TMS320C6211 (@150MHz)
TMS320C6711 (@150MHz)
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 15
Floating vs. Fixed point processors
Applications which require: High precision.
Wide dynamic range.
High signal-to-noise ratio.
Ease of use.
Need a floating point processor.
Drawback of floating point processors: Higher power consumption.
Can be higher cost.
Can be slower than fixed-point counterparts and larger in size.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 16
Floating vs. Fixed point processors
It is the application that dictates which device and platform to use in order to achieve optimum performance at a low cost.
For educational purposes, use the floating-point device (C6711) as it can support both fixed and floating point operations.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 17
General Purpose DSP vs. DSP in ASIC
Application Specific Integrated Circuits (ASICs) are semiconductors designed for dedicated functions.
The advantages and disadvantages of using ASICs are listed below:
Advantages
• High throughput• Lower silicon area• Lower power consumption• Improved reliability• Reduction in system noise• Low overall system cost
Disadvantages
• High investment cost• Less flexibility• Long time from design to
market
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 18
Texas Instruments’ TMS320 family
Different families and sub-families exist to support different markets.
Lowest CostControl Systems Motor Control Storage Digital Ctrl Systems
C2000 C5000
EfficiencyBest MIPS per
Watt / Dollar / Size Wireless phones Internet audio players Digital still cameras Modems Telephony VoIP
C6000
Multi Channel and Multi Function App's
Comm Infrastructure Wireless Base-stations DSL Imaging Multi-media Servers Video
Performance &Best Ease-of-Use
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 19
C6713C62x™
C6000 RoadmapP
erf
orm
an
ce
Time
Software CompatibleFloating PointFloating Point
Multi-coreMulti-core C64x™ DSP1.1 GHz
C64x™ DSP1.1 GHz
C64x™ DSPC64x™ DSP
2nd Generation (Fixed Point)
General Purpose C6414C6414 C6415C6415 C6416C6416
MediaGateway
3G Wireless Infrastructure
C6201
C6701
C6202
C6203
C6211C6711
C6204
1st Generation
C6205
C6712C67x™
Fixed-point
Floating-point
C6411C6411
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 20
Part 2
TMS320C6000 Architectural Overview
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 21
Describe C6000 CPU architecture.
Introduce some basic instructions.
Describe the C6000 memory map.
Provide an overview of the peripherals.
Learning Objectives
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 22
General DSP System Block Diagram
PERIPHERALS
Central
Processing
Unit
Internal Memory
Internal Buses
ExternalMemory
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 23
Implementation of Sum of Products (SOP)
It has been shown in Chapter 1 that SOP is the key element for most DSP algorithms.
So let’s write the code for this algorithm and at the same time discover the C6000 architecture.
Two basic
operations are required
for this algorithm.
(1) Multiplication
(2) Addition
Therefore two basic
instructions are required
Y =N
an xnn = 1
*
= a1 * x1 + a2 * x2 +... + aN * xN
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 24
Two basic
operations are required
for this algorithm.
(1) Multiplication
(2) Addition
Therefore two basic
instructions are required
Implementation of Sum of Products (SOP)
Y =N
an xnn = 1
*So let’s implement the SOP algorithm!
The implementation in this module will be done in assembly.
= a1 * x1 + a2 * x2 +... + aN * xN
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 25
Multiply (MPY)
The multiplication of a1 by x1 is done in assembly by the following instruction:
MPY a1, x1, Y
This instruction is performed by a multiplier unit that is called “.M”
Y =N
an xnn = 1
*
= a1 * x1 + a2 * x2 +... + aN * xN
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 26
Multiply (.M unit)
.M
Y =40
an xnn = 1
*
The . M unit performs multiplications in hardware
MPY .M a1, x1, Y
Note: 16-bit by 16-bit multiplier provides a 32-bit result.
32-bit by 32-bit multiplier provides a 64-bit result.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 27
Addition (.?)
.M
.?
Y =40
an xnn = 1
*
MPY .M a1, x1, prod
ADD .? Y, prod, Y
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 28
Add (.L unit)
.M
.L
Y =40
an xnn = 1
*
MPY .M a1, x1, prod
ADD .L Y, prod, Y
RISC processors such as the C6000 use registers to hold the operands, so lets change this code.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 29
Register File - A
Y =40
an xnn = 1
*
MPY .M a1, x1, prod
ADD .L Y, prod, Y
.M
.L
A0A1A2A3A4
A15
Register File A
.
.
.
a1x1
prod
32-bits
Y
Let us correct this by replacing a, x, prod and Y by the registers as shown above.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 30
Specifying Register Names
Y =40
an xnn = 1
*
MPY .M A0, A1, A3
ADD .L A4, A3, A4
The registers A0, A1, A3 and A4 contain the values to be used by the instructions.
.M
.L
A0A1A2A3A4
A15
Register File A
.
.
.
a1x1
prod
32-bits
Y
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 31
Specifying Register Names
Y =40
an xnn = 1
*
MPY .M A0, A1, A3
ADD .L A4, A3, A4
Register File A contains 16 registers (A0 -A15) which are 32-bits wide.
.M
.L
A0A1A2A3A4
A15
Register File A
.
.
.
a1x1
prod
32-bits
Y
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 32
Data loading
Q: How do we load the operands into the registers?
.M
.L
A0A1A2A3A4
A15
Register File A
.
.
.
a1x1
prod
32-bits
Y
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 33
Load Unit “.D”
A: The operands are loaded into the registers by loading them from the memory using the .D unit.
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
a1x1
prod
32-bits
Y
.D
Data Memory
Q: How do we load the operands into the registers?
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 34
Load Unit “.D”
It is worth noting at this stage that the only way to access memory is through the .D unit.
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
a1x1
prod
32-bits
Y
.D
Data Memory
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 35
Load Instruction
Q: Which instruction(s) can be used for loading operands from the memory to the registers?
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
a1x1
prod
32-bits
Y
.D
Data Memory
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 36
Load Instructions (LDB, LDH,LDW,LDDW)
A: The load instructions..M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
a1x1
prod
32-bits
Y
.D
Data Memory
Q: Which instruction(s) can be used for loading operands from the memory to the registers?
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 37
Using the Load Instructions
00000000
00000002
00000004
00000006
00000008
Data
16-bits
Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address.
Also the addresses are 32-bit wide.
address
FFFFFFFF
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 38
The syntax for the load instruction is:
Where:
Rn is a register that contains the address of the operand to be loaded
and
Rm is the destination register.
Using the Load Instructions
00000000
00000002
00000004
00000006
00000008
Data
a1x1
prod
16-bits
Y
address
FFFFFFFF
LD *Rn,Rm
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 39
The syntax for the load instruction is:
The question now is how many bytes are going to be loaded into the destination register?
Using the Load Instructions
00000000
00000002
00000004
00000006
00000008
Data
a1x1
prod
16-bits
Y
address
FFFFFFFF
LD *Rn,Rm
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 40
The syntax for the load instruction is:
LD *Rn,Rm
Using the Load Instructions
00000000
00000002
00000004
00000006
00000008
Data
a1x1
prod
16-bits
Y
address
FFFFFFFF
The answer, is that it depends on the instruction you choose:• LDB: loads one byte (8-bit)
• LDH: loads half word (16-bit)
• LDW: loads a word (32-bit)
• LDDW: loads a double word (64-bit)
Note: LD on its own does not exist.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 41
Using the Load Instructions
00000000
00000002
00000004
00000006
00000008
Data
16-bits
address
FFFFFFFF
0xB0xA0xD0xC
Example:
If we assume that A5 = 0x4 then:
(1) LDB *A5, A7 ; gives A7 = 0x00000001
(2) LDH *A5,A7; gives A7 = 0x00000201
(3) LDW *A5,A7; gives A7 = 0x04030201
(4) LDDW *A5,A7:A6; gives A7:A6 = 0x0807060504030201
0x10x20x30x40x50x60x70x8
The syntax for the load instruction is:
LD *Rn,Rm
01
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 42
Using the Load Instructions
00000000
00000002
00000004
00000006
00000008
Data
16-bits
address
FFFFFFFF
0xB0xA0xD0xC
Question:
If data can only be accessed by the load instruction and the .D unit, how can we load the register pointer Rn in the first place?
0x10x20x30x40x50x60x70x8
The syntax for the load instruction is:
LD *Rn,Rm
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 43
The instruction MVKL will allow a move of a 16-bit constant into a register as shown below:
MVKL .? a, A5(‘a’ is a constant or label)
How many bits represent a full address?
32 bits
So why does the instruction not allow a 32-bit move?
All instructions are 32-bit wide (see instruction opcode).
Loading the Pointer Rn
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 44
To solve this problem another instruction is available:
MVKH
Loading the Pointer Rn
eg. MVKH .? a, A5(‘a’ is a constant or label)
ah
ah x
al a
A5
MVKL a, A5
MVKH a, A5
Finally, to move the 32-bit address to a register we can use:
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 45
Loading the Pointer Rn
MVKL0x1234FABC, A5
A5 = 0xFFFFFABC ; Wrong
Example 1
A5 = 0x87654321
MVKL0x1234FABC, A5
A5 = 0xFFFFFABC (sign extension)
MVKH0x1234FABC, A5
A5 = 0x1234FABC ; OK
Example 2
MVKH0x1234FABC, A5
A5 = 0x12344321
Always use MVKL then MVKH, look at the following examples:
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 46
LDH, MVKL and MVKH
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
ax
prod
32-bits
Y
.D
Data Memory
MVKL pt1, A5MVKH pt1, A5
MVKL pt2, A6MVKH pt2, A6
LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 47
Creating a loop
MVKL pt1, A5MVKH pt1, A5
MVKL pt2, A6MVKH pt2, A6
LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
So far we have only implemented the SOP for one tap only, i.e.
Y= a1 * x1
So let’s create a loop so that we can implement the SOP for N Taps.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 48
Creating a loop
With the C6000 processors there are no dedicated
instructions such as block repeat. The loop is created
using the B instruction.
So far we have only implemented the SOP for one tap only, i.e.
Y= a1 * x1
So let’s create a loop so that we can implement the SOP for N Taps.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 49
What are the steps for creating a loop
1. Create a label to branch to.
2. Add a branch instruction, B.
3. Create a loop counter.
4. Add an instruction to decrement the loop counter.
5. Make the branch conditional based on the value in
the loop counter.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 50
1. Create a label to branch to
MVKL pt1, A5MVKH pt1, A5
MVKL pt2, A6MVKH pt2, A6
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 51
MVKL pt1, A5MVKH pt1, A5
MVKL pt2, A6MVKH pt2, A6
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
B .? loop
2. Add a branch instruction, B.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 52
Which unit is used by the B instruction?
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
ax
prod
32-bits
Y
.D
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
ax
prod
32-bits
Y
.D
Data Memory
.S
MVKL pt1, A5MVKH pt1, A5
MVKL pt2, A6MVKH pt2, A6
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
B .? loop
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 53
Data Memory
Which unit is used by the B instruction?
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
ax
prod
32-bits
Y
.D
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
ax
prod
32-bits
Y
.D
.S
MVKL .S pt1, A5MVKH .S pt1, A5
MVKL .S pt2, A6MVKH .S pt2, A6
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
B .S loop
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 54
Data Memory
3. Create a loop counter.
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
ax
prod
32-bits
Y
.D
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
ax
prod
32-bits
Y
.D
.S
MVKL .S pt1, A5MVKH .S pt1, A5
MVKL .S pt2, A6MVKH .S pt2, A6MVKL .S count, B0
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
B .S loop
B registers will be introduced later
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 55
4. Decrement the loop counter
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
ax
prod
32-bits
Y
.D
Data Memory
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
ax
prod
32-bits
Y
.D
.S
MVKL .S pt1, A5MVKH .S pt1, A5
MVKL .S pt2, A6MVKH .S pt2, A6MVKL .S count, B0
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
B .S loop
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 56
What is the syntax for making instruction conditional?
[condition] Instruction Label
e.g.
[B1] B loop
(1) The condition can be one of the following registers: A1, A2, B0, B1, B2.
(2) Any instruction can be conditional.
5. Make the branch conditional based on the value in the loop counter
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 57
The condition can be inverted by adding the exclamation symbol “!” as follows:
[!condition] Instruction Label
e.g.
[!B0] B loop ;branch if B0 = 0
[B0] B loop ;branch if B0 != 0
5. Make the branch conditional based on the value in the loop counter
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 58
Data Memory
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
ax
prod
32-bits
Y
.D
.M
.L
A0
A1
A2
A3
A15
Register File A
.
.
.
ax
prod
32-bits
Y
.D
.S
MVKL .S2 pt1, A5MVKH .S2 pt1, A5
MVKL .S2 pt2, A6MVKH .S2 pt2, A6MVKL .S2 count, B0
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop
5. Make the branch conditional
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 59
Case 1: B .S1 label Relative branch.
Label limited to +/- 220 offset.
More on the Branch Instruction (1)
With this processor all the instructions are encoded in a 32-bit.
Therefore the label must have a dynamic range of less than 32-bit as the instruction B has to be coded.
21-bit relative addressB
32-bit
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 60
More on the Branch Instruction (2)
By specifying a register as an operand instead of a label, it is possible to have an absolute branch.
This will allow a dynamic range of 232.
Case 2: B .S2 register Absolute branch.
Operates on .S2 ONLY!
5-bit register
codeB
32-bit
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 61
Testing the code
This code performs the following
operations:
a0*x0 + a0*x0 + a0*x0 + … + a0*x0
However, we would like to perform:
a0*x0 + a1*x1 + a2*x2 + … + aN*xN
MVKL .S2 pt1, A5MVKH .S2 pt1, A5
MVKL .S2 pt2, A6MVKH .S2 pt2, A6MVKL .S2 count, B0
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 62
Modifying the pointers
The solution is to modify the pointers
A5 and A6.
MVKL .S2 pt1, A5MVKH .S2 pt1, A5
MVKL .S2 pt2, A6MVKH .S2 pt2, A6MVKL .S2 count, B0
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 63
Indexing Pointers
Description
Pointer
Syntax PointerModified
*R No
R can be any register
In this case the pointers are used but not modified.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 64
Indexing Pointers
Description
Pointer
+ Pre-offset
- Pre-offset
Syntax PointerModified
*R
*+R[disp]
*-R[disp]
No
No
No
[disp] specifies the number of elements size in DW (64-bit), W (32-bit), H (16-bit), or B (8-bit).
disp = R or 5-bit constant. R can be any register.
In this case the pointers are modified BEFORE being used
and RESTORED to their previous values.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 65
Indexing Pointers
Description
Pointer
+ Pre-offset
- Pre-offset
Pre-increment
Pre-decrement
Syntax PointerModified
*R
*+R[disp]
*-R[disp]
*++R[disp]
*--R[disp]
No
No
No
Yes
Yes
In this case the pointers are modified BEFORE being used
and NOT RESTORED to their Previous Values.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 66
Indexing Pointers
Description
Pointer
+ Pre-offset
- Pre-offset
Pre-increment
Pre-decrement
Post-increment
Post-decrement
Syntax PointerModified
*R
*+R[disp]
*-R[disp]
*++R[disp]
*--R[disp]
*R++[disp]
*R--[disp]
No
No
No
Yes
Yes
Yes
Yes
In this case the pointers are modified AFTER being used
and NOT RESTORED to their Previous Values.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 67
Indexing Pointers
Description
Pointer
+ Pre-offset
- Pre-offset
Pre-increment
Pre-decrement
Post-increment
Post-decrement
Syntax PointerModified
*R
*+R[disp]
*-R[disp]
*++R[disp]
*--R[disp]
*R++[disp]
*R--[disp]
No
No
No
Yes
Yes
Yes
Yes
[disp] specifies # elements - size in DW, W, H, or B. disp = R or 5-bit constant. R can be any register.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 68
Modify and testing the code
This code now performs the following
operations:
a0*x0 + a1*x1 + a2*x2 + ... + aN*xN
MVKL .S2 pt1, A5MVKH .S2 pt1, A5
MVKL .S2 pt2, A6MVKH .S2 pt2, A6MVKL .S2 count, B0
loop LDH .D *A5++, A0
LDH .D *A6++, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 69
Store the final result
This code now performs the following
operations:
a0*x0 + a1*x1 + a2*x2 + ... + aN*xN
MVKL .S2 pt1, A5MVKH .S2 pt1, A5
MVKL .S2 pt2, A6MVKH .S2 pt2, A6MVKL .S2 count, B0
loop LDH .D *A5++, A0
LDH .D *A6++, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop
STH .D A4, *A7
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 70
Store the final result
The Pointer A7 has not been initialised.
MVKL .S2 pt1, A5MVKH .S2 pt1, A5
MVKL .S2 pt2, A6MVKH .S2 pt2, A6MVKL .S2 count, B0
loop LDH .D *A5++, A0
LDH .D *A6++, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop
STH .D A4, *A7
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 71
Store the final result
The Pointer A7 is now initialised.
MVKL .S2 pt1, A5MVKH .S2 pt1, A5
MVKL .S2 pt2, A6MVKH .S2 pt2, A6
MVKL .S2 pt3, A7MVKH .S2 pt3, A7MVKL .S2 count, B0
loop LDH .D *A5++, A0
LDH .D *A6++, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop
STH .D A4, *A7
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 72
What is the initial value of A4?
A4 is used as an accumulator,
so it needs to be reset to zero.
MVKL .S2 pt1, A5MVKH .S2 pt1, A5
MVKL .S2 pt2, A6MVKH .S2 pt2, A6
MVKL .S2 pt3, A7MVKH .S2 pt3, A7MVKL .S2 count, B0ZERO .L A4
loop LDH .D *A5++, A0
LDH .D *A6++, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop
STH .D A4, *A7
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 73
How can we add more processing
power to this processor?
.S1
.M1
.L1
.D1
A0A1A2A3A4
Register File A
.
.
.
Data Memory
A15
32-bits
Increasing the processing power!
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 74
(1) Increase the clock frequency.
.S1
.M1
.L1
.D1
A0A1A2A3A4
Register File A
.
.
.
Data Memory
A15
32-bits
Increasing the processing power!
(2) Increase the number of Processing units.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 75
To increase the Processing Power, this processor has two sides (A and B or 1 and 2)
Data Memory
.S1
.M1
.L1
.D1
A0A1A2A3A4
Register File A
.
.
.
A15
32-bits
.S2
.M2
.L2
.D2
B0B1B2B3B4
Register File B
.
.
.
B15
32-bits
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 76
Can the two sides exchange operands in order to increase performance?
Data Memory
.S1
.M1
.L1
.D1
A0A1A2A3A4
Register File A
.
.
.
A15
32-bits
B15
.S2
.M2
.L2
.D2
B0B1B2B3B4
Register File B
.
.
.
32-bits
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 77
The answer is YES but there are limitations.
To exchange operands between the two sides, some cross paths or links are required.
What is a cross path? A cross path links one side of the CPU to
the other.
There are two types of cross paths:
Data cross paths.
Address cross paths.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 78
Data Cross Paths
Data cross paths can also be referred to as register file cross paths.
These cross paths allow operands from one side to be used by the other side.
There are only two cross paths:
one path which conveys data from side B to side A, 1X.
one path which conveys data from side A to side B, 2X.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 79
TMS320C67x Data-Path
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 80
Data Cross Paths
Data cross paths only apply to the .L, .S and .M units.
The data cross paths are very useful, however there are some limitations in their use.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 81
Data Cross Path Limitations
A
2x
.L1
.M1
.S1
B
1x
<src>
<src><dst>
(1) The destination register must be on same side as unit.
(2) Source registers - up to one cross path per execute packet per side.
Execute packet: group of instructions that execute simultaneously.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 82
Data Cross Path Limitations
A
2x
.L1
.M1
.S1
B
1x
<src>
<src><dst>
eg:ADD .L1x A0,A1,B2MPY .M1x A0,B6,A9SUB .S1x A8,B2,A8
|| ADD .L1x A0,B0,A2
|| Means that the SUB and ADD belong to the same fetch packet, therefore execute simultaneously.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 83
Data Cross Path Limitations
A
2x
.L1
.M1
.S1
B
1x
<src>
<src><dst>
eg:ADD .L1x A0,A1,B2MPY .M1x A0,B6,A9SUB .S1x A8,B2,A8
|| ADD .L1x A0,B0,A2
NOT VALID!
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 84
Data Cross Paths for both sides
A
2x
.L1
.M1
.S1
B
1x
<src>
<src><dst>
.L2
.M2
.S2
<dst>
<src>
<src>
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 85
Address cross paths
.D1A
Addr
Data
LDW.D1T1 *A0,A5STW.D1T1 A5,*A0
(1) The pointer must be on the same side of the unit.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 86
Load or store to either side
.D1A
*A0
B
Data1 A5
Data2 B5
DA1 = T1
DA2 = T2LDW.D1T1 *A0,A5LDW.D1T2 *A0,B5
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 87
Standard Parallel Loads
.D1A
A5
*A0
BB5
.D2
Data1
*B0
LDW.D1T1 *A0,A5|| LDW.D2T2 *B0,B5
DA1 = T1
DA2 = T2
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 88
Parallel Load/Store using address cross paths
.D1A
A5
*A0
BB5
.D2
Data1
*B0
LDW.D1T2 *A0,B5|| STW.D2T1 A5,*B0
DA1 = T1
DA2 = T2
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 89
Fill the blanks ... Does this work?
.D1A
*A0
B.D2
Data1
*B0
LDW.D1__ *A0,B5|| STW.D2__ B6,*B0
DA1 = T1
DA2 = T2
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 90
Not Allowed!
.D1A
*A0
BB5
B6
.D2
Data1
*B0
LDW.D1T2 *A0,B5|| STW.D2T2 B6,*B0
DA2 = T2
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 91
Not Allowed!Parallel accesses: both cross or neither cross
.D1A
*A0
BB5
B6
.D2
Data1
*B0
LDW.D1T2 *A0,B5|| STW.D2T2 B6,*B0
DA2 = T2
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 92
Conditions Don’t Use Cross Paths
If a conditional register comes from the opposite side, it does NOT use a data or address cross-path.
Examples:
[B2] ADD .L1 A2,A0,A4[A1] LDW .D2 *B0,B5
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 93
CPURef Guide
Full CPU Datapath(Pg 2-2)
‘C62x Data-Path Summary
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 94
‘C67x Data-Path Summary ‘C67x
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 95
Cross Paths - Summary
Data Destination register on same side as unit.
Source registers - up to one cross path per execute packet per side.
Use “x” to indicate cross-path.
Address Pointer must be on same side as unit. Data can be transferred to/from either side. Parallel accesses: both cross or neither cross.
Conditionals Don’t Use Cross Paths.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 96
Code Review (using side A only)
MVK .S1 40, A2 ; A2 = 40, loop count
loop: LDH .D1 *A5++, A0 ; A0 = a(n)
LDH .D1 *A6++, A1 ; A1 = x(n)
MPY .M1 A0, A1, A3 ; A3 = a(n) * x(n)
ADD .L1 A3, A4, A4 ; Y = Y + A3
SUB .L1 A2, 1, A2 ; decrement loop count
[A2] B .S1 loop ; if A2 0, branch
STH .D1 A4, *A7 ; *A7 = Y
Y =40
an xnn = 1
*
Note: Assume that A4 was previously cleared and the pointers are initialised.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 97
Let us have a look at the final details concerning the functional units.
Consider first the case of the .L and .S units.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 98
So where do the 40-bit registers come from?
Operands - 32/40-bit Register, 5-bit Constant
Operands can be: 5-bit constants (or 16-bit for MVKL and MVKH).
32-bit registers.
40-bit Registers.
However, we have seen that registers are only 32-bit.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 99
A 40-bit register can be obtained by concatenating two registers.
However, there are 3 conditions that need to be respected:
The registers must be from the same side.
The first register must be even and the second odd.
The registers must be consecutive.
Operands - 32/40-bit Register, 5-bit Constant
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 100
A1:A0
A3:A2
A5:A4
A7:A6
A9:A8
A11:A10
A13:A12
A15:A14
odd even:328
40-bit Reg
B1:B0
B3:B2
B5:B4
B7:B6
B9:B8
B11:B10
B13:B12
B15:B14
odd even:328
40-bit Reg
Operands - 32/40-bit Register, 5-bit Constant
All combinations of 40-bit registers are shown below:
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 101
32-bitReg
40-bitReg
< src > < src >
32-bitReg
5-bitConst
32-bitReg
40-bitReg
< dst >
.L or .S
Operands - 32/40-bit Register, 5-bit Constant
instr .unit <src>, <src>, <dst>
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 102
Operands - 32/40-bit Register, 5-bit Constant
instr .unit <src>, <src>, <dst>
32-bitReg
40-bitReg
< src > < src >
32-bitReg
5-bitConst
32-bitReg
40-bitReg
< dst >
.L or .S
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 103
Operands - 32/40-bit Register, 5-bit Constant
OR.L1 A0, A1, A2
instr .unit <src>, <src>, <dst>
32-bitReg
40-bitReg
< src > < src >
32-bitReg
5-bitConst
32-bitReg
40-bitReg
< dst >
.L or .S
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 104
Operands - 32/40-bit Register, 5-bit Constant
OR.L1 A0, A1, A2
ADD.L2 -5, B3, B4
instr .unit <src>, <src>, <dst>
32-bitReg
40-bitReg
< src > < src >
32-bitReg
5-bitConst
32-bitReg
40-bitReg
< dst >
.L or .S
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 105
Operands - 32/40-bit Register, 5-bit Constant
OR.L1 A0, A1, A2
ADD.L2 -5, B3, B4ADD.L1 A2, A3, A5:A4
instr .unit <src>, <src>, <dst>
32-bitReg
40-bitReg
< src > < src >
32-bitReg
5-bitConst
32-bitReg
40-bitReg
< dst >
.L or .S
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 106
Operands - 32/40-bit Register, 5-bit Constant
OR.L1 A0, A1, A2
ADD.L2 -5, B3, B4ADD.L1 A2, A3, A5:A4
SUB.L1 A2, A5:A4, A5:A4
instr .unit <src>, <src>, <dst>
32-bitReg
40-bitReg
< src > < src >
32-bitReg
5-bitConst
32-bitReg
40-bitReg
< dst >
.L or .S
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 107
Operands - 32/40-bit Register, 5-bit Constant
OR.L1 A0, A1, A2
ADD.L2 -5, B3, B4ADD.L1 A2, A3, A5:A4
SUB.L1 A2, A5:A4, A5:A4ADD.L2 3, B9:B8, B9:B8
instr .unit <src>, <src>, <dst>
32-bitReg
40-bitReg
< src > < src >
32-bitReg
5-bitConst
32-bitReg
40-bitReg
< dst >
.L or .S
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 108
To move the content of a register (A or B) to another register (B or A) use the move “MV” Instruction, e.g.:
MV A0, B0
MV B6, B7
To move the content of a control register to another register (A or B) or vice-versa use the MVC instruction, e.g.:
MVC IFR, A0
MVC A0, IRP
Register to register data transfer
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 109
TMS320C6211/6711 Instruction Set
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 110
'C6211 Instruction Set (by category)
ArithmeticABSADDADDAADDKADD2MPYMPYHNEGSMPYSMPYHSADDSATSSUBSUBSUBASUBCSUB2
Program CtrlBIDLENOP
LogicalANDCMPEQCMPGTCMPLTNOTORSHLSHRSSHLXOR
Data Mgmt
LDB/H/WMVMVCMVKMVKLMVKHMVKLHSTB/H/W
Bit MgmtCLREXTLMBDNORMSET
Note: Refer to the 'C6000 CPU Reference Guide for more details.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 111
'C6211 Instruction Set (by unit).S Unit
MVKLHNEGNOT ORSETSHLSHRSSHLSUBSUB2XORZERO
ADDADDKADD2ANDBCLREXTMVMVCMVKMVKLMVKH
.M Unit
SMPYSMPYH
MPYMPYH
.L Unit
NOTORSADDSATSSUBSUBSUBCXORZERO
ABSADDANDCMPEQCMPGTCMPLTLMBDMVNEGNORM
.D Unit
STB/H/WSUBSUBAZERO
ADDADDALDB/H/WMVNEGOther
IDLENOPNote: Refer to the 'C6000 CPU
Reference Guide for more details.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 112
‘C6711 Additional Instructions (by unit)
.S Unit
CMPLTDPRCPSPRCPDPRSQRSPRSQRDPSPDP
ABSSPABSDPCMPGTSPCMPEQSPCMPLTSPCMPGTDPCMPEQDP.M Unit
MPYI MPYID
MPYSPMPYDP
.L Unit
INTSPINTSPUSPINTSPTRUNCSUBSPSUBDP
ADDDPADDSPDPINTDPSPINTDPINTDPU
.D Unit
ADDAD LDDW
Note: Refer to the 'C6000 CPU Reference Guide for more details.
‘C67x
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 113
TMS320C6211/6711 Memory Map
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 114
‘C6211 Memory MapByte Address
FFFF_FFFF
0000_000064K x 8 Internal
(L2 cache)
Internal Memory Unified (data or prog) 4 blocks - each can be
RAM or cache
On-chip Peripherals0180_0000
External Memory Async (SRAM, ROM, etc.)
Sync (SBSRAM, SDRAM)
256M x 8 External2
256M x 8 External3
8000_0000
9000_0000
A000_0000
B000_0000
256M x 8 External0
256M x 8 External1Level 1 Cache
4KB Program 4KB Data Not in map CPU L2
64K
4KP
4KD
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 115
TMS320C6211/6711 Peripherals
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 116
'C6x System Block Diagram
PERIPHERALS
Memory
Internal BusesExternalMemory
CPU
.D1
.M1
.L1
.S1
.D2
.M2
.L2
.S2
Regs (B
0-B15)
Regs (A
0-A15)
Control Regs
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 117
‘C6x Internal Buses
A
D
InternalMemory
x32AD
ExternalInterface
A
Dx32
Peripherals
can perform 64-bit data loads.‘C67x
Data Addr - T1 x32
Data Data - T1 x32/64
Data Addr - T2 x32
Data Data - T2 x32/64
Aregs
Bregs
Program Addr x32
Program Data x256PC
DMA Addr - Read x32
DMA Data - Read x32
DMA Addr - Write x32
DMA Data - Write x32
DMA
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 118
PERIPHERALS
Memory
'C6x System Block Diagram
Internal Buses
CPU
.D1
.M1
.L1
.S1
.D2
.M2
.L2
.S2
Regs (B
0-B15)
Regs (A
0-A15)
Control Regs
EMIF
Ext’lMemory
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 119‘C6202
16M x 8 External0
Int’l Prog (64K instr)
Int’l Data (128K bytes)
On-chip Peripherals
4M x 8 External1
16M x 8 External2
16M x 8 External3
64K x 8 Internal
(L2 cache)
On-chip Peripherals
256M x 8 External2
256M x 8 External3
256M x 8 External0
256M x 8 External1
‘C6211
‘C6201/11 Memory Maps
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 120
PERIPHERALS
'C6x System Block Diagram
Internal Buses
CPU
.D1
.M1
.L1
.S1
.D2
.M2
.L2
.S2
Regs (B
0-B15)
Regs (A
0-A15)
Control Regs
EMIF
Ext’lMemory
- Sync
- Async
ProgramRAM Data Ram
Addr
D (32)
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 121
'C6x Peripherals
‘C6x
CPU
EMIF
DMA
Boot
ExternalMemory
EMIF (External Memory
Interface)
- Glueless access to async/sync
memory
EPROM, SRAM, SDRAM,
SBSRAM
DMA/EDMA (Enhance Direct
Memory Acces)
McBSP
HPI/XB
Timer
PLL
McBSP (Multi-Channel
Buffered Serial Port)
- High speed sync serial
comm
- T1/E1/MVIP interface
HPI (Host Port Interface)
/Expansion Bus (XB)
- 16/32-bit host P access
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 122
Clocking - Basic Definitions
What is a “clock cycle”?‘C6x‘C6x
CLKINCLKOUT2 (1/2 CLKOUT1)
CLKOUT1 (‘C6x clock cycle)PLL
x1x4
CLKIN - MHz PLL CLKOUT2 - MHz MIPs (max)
250 x1 125 2000
200 x1 100 1600
50 x4 100 1600
25 x4 50 800
CLKOUT1 - MHz
250 (4ns)
200 (5ns)
200
100 (10ns)
When we talk about cycles ...
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 123
'C6x System Block Diagram (Final)
Internal Buses
CPU
.D1
.M1
.L1
.S1
.D2
.M2
.L2
.S2
Regs (B
0-B15)
Regs (A
0-A15)
Control Regs
EMIF
Ext’lMemory
- Sync- Async
ProgramRAM
Data Ram
D (32)
Serial Port
Host Port
Boot Load
Timers
Pwr Down
DMA
Addr
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 124
ProgramMemory
DataMemory
L2 Memory
’C6201B 64KB1 blk Pgm/Cache
64KB2 blks4 banks ea
External
’C6701 64KB1 blk Pgm/Cache
64KB2 blks8 banks ea
External
’C6202 256KB1 blk Pgm/Cache1 blk Mapped Pgm
128 KB2 blks4 banks ea
External
’C6211/C6711 4 KB1 blk Cache
4 KB1 blk Cache
64 KB4 blk MappedCache
L1 Memory
Internal Memory Summary
TMS320C67x DSP GenerationParametric Table
TMS320C64x DSP Generation Parametric Table
TMS320C62x DSP Generation Parametric Table
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 125
‘C6000 Device Summary
Device MIPS MHz Kbytes pins mm W $ Periphs
6201B 1600 200 128 352 27 1.9 80-110 D2H
6202 2000 250 384 352 27 1.9 120-150 D3X
6211 1200 150 72 256 27 1.5 20-40 E2H
TMS320 MFLOPS MHz Kbytes pins mm W $ Periphs
6701 1000 167 128 352 35 1.9 170-200 D2H
6711 600 100 72 256 27 0.9 20-40 E2H
Peripherals Legend:D,E: DMA,EDMA2,3: # of McBSPs
H,X: HPI, XBUS
TMS320C62x DSP Generation Parametric Table
TMS320C67x DSP GenerationParametric Table
TMS320C64x DSP Generation Parametric Table
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 126
’C6000 History
6201 r1 1Q97 coincident ‘C6x architectural announcement. Sample CPU core, minimal peripherals.
6201 r2 4Q97. Full production, with peripherals.
6201B 4Q98. Power reduced, .18 micron silicon, double ports into internal data memory.
6701 3Q98. Pin-for-pin compatible floating-point version of ‘C6201. 1GFLOP (@ 167MHz) performance.
6202 2Q99. 2000 MIPS @ 250MHz. 2-3x 6201 on-chip memory.
Replaced HPI with Expansion Bus (32-bit HPI + more).
6211 3Q99. 2 cents per MIPS! 1200MIPS @ 150MHz as low as $25. Double-level cache, enhanced DMA.
6711 Announced 3/1/99. 6701 floating-point CPU with 6211-like memory/peripherals. Volume pricing under $20.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 127
‘C6x Family Part Numbering
Example = TMS320LC6201PKGA200 TMS320 = TI DSP
L = Place holder for voltage levels
C6 = C6x family
2 = Fixed-point core
01 = Memory/peripheral configuration
PKG = Pkg designator (actual letters TBD)
A = -40 to 85C (blank for 0 to 70C)
200 = Core CPU speed in Mhz
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 128
Device Summary Table
Device Int Mem Ext Mem Peripherals
6201/6701 64K Data 3 x 16M DMA16K Instr 1 x 4M 2 McBSP
HPI (16-bit)2 Timer/Counters (32-bit)
6202 128K Data 3 x 16M DMA48K Instr 1 x 4M 2 McBSP
4 x 256M ---- XBus (32-bit)2 Timer/Counters (32-bit)
6211 4K Data Cache 4 x 256M EDMA4K Prog Cache 2 McBSP 64K RAM/Cache HPI (16-bit)
2 Timer/Counters (32-bit)
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 129
Module 1 Exam1. Functional Units
a. How many can perform an ADD? Name them.
b. Which support memory loads/stores?
.M .S .D .L
six; .L1, .L2, .D1, .D2, .S1, .S2
2. Memory Map
a. How many external ranges exist on ‘C6201?
Four
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 130
3. Conditional Codea. Which registers can be used as cond’l registers?
b. Which instructions can be conditional?
4. Performancea. What is the 'C6711 instruction cycle time?
b. How can the 'C6711 execute 1200 MIPs?
A1, A2, B0, B1, B2
All of them
CLKOUT1
1200 MIPs = 8 instructions (units) x 150 MHz
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 131
5. Coding Problems
a. Move contents of A0-->A1
MV .L1 A0, A1or ADD .S1 A0, 0, A1or MPY .M1 A0, 1, A1(what’s the problem
with this?)
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 132
5. Coding Problems
a. Move contents of A0-->A1
b. Move contents of CSR-->A1
c. Clear register A5
MV .L1 A0, A1or ADD .S1 A0, 0, A1or MPY .M1 A0, 1, A1
ZERO .S1 A5or SUB .L1 A5, A5, A5or MPY .M1 A5, 0, A5or CLR .S1 A5, 0, 31, A5or MVK .S1 0, A5or XOR .L1 A5,A5,A5
(A0 can only be a 16-bit value)
MVC CSR, A1
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 133
5. Coding Problems (cont’d)
d. A2 = A02 + A1
e. If (B1 0) then B2 = B5 * B6
f. A2 = A0 * A1 + 10
g. Load an unsigned constant (19ABCh) into register A6.
MPY.M1 A0, A0, A2ADD.L1 A2, A1, A2
[B1] MPY.M2 B5, B6, B2
mvkl .s1 0x00019abc,a6mvkh .s1 0x00019abc,a6
value .equ 0x00019abc
mvkl.s1 value,a6mvkh.s1 value,a6
MPY A0, A1, A2ADD 10, A2, A2
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 134
5. Coding Problems (cont’d)
h. Load A7 with contents of mem1 and post-increment the selected pointer.
x16 mem
mem1 10h
A7
load_mem1: MVKL .S1 mem1, A6
MVKH .S1 mem1, A6
LDH .D1 *A6++, A7
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002Chapter 1, Slide 135
WEEK1
TMS320C6000 Introduction andArchitectural Overview
- End -