Upload
shereenkhan120
View
230
Download
0
Embed Size (px)
Citation preview
7/27/2019 ARM Processor Organization - Presentation
1/99
ARM processorARM processor
organizationorganization
P. BakowskiP. Bakowski
[email protected]@ieee.org
7/27/2019 ARM Processor Organization - Presentation
2/99
P. Bakowski 2
ARM register bankARM register bank
TheThe register bankregister bank, which stores the, which stores the processorprocessor
statestate..
r00r00
r01r01
r14r14r15r15
7/27/2019 ARM Processor Organization - Presentation
3/99
P. Bakowski 3
ARM register bankARM register bank
It hasIt has two read portstwo read ports and oneand one write port whichwrite port which
can each be used to access any register, plus ancan each be used to access any register, plus anadditional read port and an additional write portadditional read port and an additional write port
that give special access to r15, the programthat give special access to r15, the program
counter.counter.
r00r00
r01r01
r14r14
r15r15
7/27/2019 ARM Processor Organization - Presentation
4/99
P. Bakowski 4
ARM register bankARM register bank
It hasIt has two read portstwo read ports andand oneone write portwrite port whichwhich
can each be used to access any register, plus ancan each be used to access any register, plus anadditional read port and an additional write portadditional read port and an additional write port
that give special access to r15, the programthat give special access to r15, the program
counter.counter.
r00r00
r01r01
r14r14
r15r15
7/27/2019 ARM Processor Organization - Presentation
5/99
P. Bakowski 5
ARM register bankARM register bank
It hasIt has two read portstwo read ports andand oneone write portwrite port whichwhich
can each be used to access any register, plus ancan each be used to access any register, plus anadditional read portadditional read port and anand an additional write portadditional write port
that give special access to r15, the programthat give special access to r15, the program
counter.counter.
r00r00
r01r01
r14r14
r15r15
7/27/2019 ARM Processor Organization - Presentation
6/99
P. Bakowski 6
ARM register bankARM register bank
It hasIt has two read portstwo read ports andand oneone write portwrite port whichwhich
can each be used to access any register, plus ancan each be used to access any register, plus anadditional read portadditional read port and anand an additional write portadditional write port
that give special access to r15, thethat give special access to r15, the programprogram
countercounter..
r00r00
r01r01
r14r14
r15r15
7/27/2019 ARM Processor Organization - Presentation
7/99
P. Bakowski 7
ARM barrel shifterARM barrel shifter
TheThe barrel shifterbarrel shifter, which can, which can shift or rotateshift or rotate oneone
operandoperand byby any number of bitsany number of bits..
number of bitsnumber of bits
7/27/2019 ARM Processor Organization - Presentation
8/99
P. Bakowski 8
ARM ALUARM ALU
The ALU, which performs theThe ALU, which performs the arithmeticarithmetic andand logiclogic
functionsfunctions required by therequired by the instructioninstruction set.set.
functionsfunctions
operandsoperands
7/27/2019 ARM Processor Organization - Presentation
9/99
P. Bakowski 9
ARM 3ARM 3--stage pipelinestage pipeline
TheThe address registeraddress register andand incrementerincrementer, which select, which select
and hold addresses and generate sequentialand hold addresses and generate sequentialaddresses when required.addresses when required.
address registeraddress register
incrementerincrementer
7/27/2019 ARM Processor Organization - Presentation
10/99
P. Bakowski 10
ARM 3ARM 3--stage pipelinestage pipelineThe data registers, which hold data passing to andThe data registers, which hold data passing to and
from memory.from memory.
data out registerdata out register data in registerdata in register
d[31:0]d[31:0]to/from memoryto/from memory
datadata instructionsinstructions
7/27/2019 ARM Processor Organization - Presentation
11/99
P. Bakowski 11
ARM 3ARM 3--stage pipelinestage pipeline
The instruction decoder and associated control logic.The instruction decoder and associated control logic.
control pathcontrol path
da
ta
in
regi s
ter
da
ta
in
regi s
ter
instructionsinstructions
data pathdata path
control signalscontrol signals
7/27/2019 ARM Processor Organization - Presentation
12/99
P. Bakowski 12
Three stage pipeline : ARM 1,2,3Three stage pipeline : ARM 1,2,3
FETCHFETCH; the instruction is fetched from memory and; the instruction is fetched from memory and
placed in the instruction pipeline.placed in the instruction pipeline.
fetchfetch
memorymemoryaddress registeraddress register
data in registerdata in registerto instructionto instructionregisterregister
clock cycleclock cycle
7/27/2019 ARM Processor Organization - Presentation
13/99
P. Bakowski 13
Three stage pipeline : ARM 1,2,3Three stage pipeline : ARM 1,2,3DECODEDECODE; the instruction is decoded and the; the instruction is decoded and the
datapath control signals prepared for the next cycle.datapath control signals prepared for the next cycle.
fetchfetch decodedecode
instruction registerinstruction register
control pathcontrol path
control signalscontrol signals
fetchfetch
7/27/2019 ARM Processor Organization - Presentation
14/99
P. Bakowski 14
Three stage pipeline : ARM 1,2,3Three stage pipeline : ARM 1,2,3
EXECUTEEXECUTE; the instruction; the instruction controls the datapathcontrols the datapath; the; the
register bank is read, an operand shifted the ALUregister bank is read, an operand shifted the ALUresult generated and written back into a destinationresult generated and written back into a destination
register.register.
fetchfetch decodedecode
fetchfetch
executeexecute
decodedecode
data pathdata pathcontrol signalscontrol signals
7/27/2019 ARM Processor Organization - Presentation
15/99
P. Bakowski 15
Three stage pipeline : ARM 1,2,3Three stage pipeline : ARM 1,2,3
fetchfetch decodedecode
fetchfetch
executeexecute
decodedecode executeexecutefetchfetch decodedecode executeexecute
instruction throughputinstruction throughput : 1 instruction per clock cycle: 1 instruction per clock cycle
instruction latency : 3 clock cyclesinstruction latency : 3 clock cycles
clock cycleclock cycle
7/27/2019 ARM Processor Organization - Presentation
16/99
P. Bakowski 16
Three stage pipeline : ARM 1,2,3Three stage pipeline : ARM 1,2,3
fetchfetch decodedecode
fetchfetch
executeexecute
decodedecode executeexecutefetchfetch decodedecode executeexecute
instruction throughput : 1 instruction per clock cycleinstruction throughput : 1 instruction per clock cycle
instruction latencyinstruction latency : 3 clock cycles: 3 clock cycles
7/27/2019 ARM Processor Organization - Presentation
17/99
P. Bakowski 17
ARM 1,2,3 architectureARM 1,2,3 architecture
addr
ess
register
addr
ess
register
data
in
register
data
in
register
incrementer
incrementer
data
out
register
data
out
register
control pathcontrol path
control signalscontrol signals
registerregister
bankbank
MM
ALUALU
BSBS
instruction registerinstruction register
a[31:0]a[31:0]
d[31:0]d[31:0]
AbusAbus
BbusBbus
ALUbusALUbus
PCPC
MM multiplier, BSmultiplier, BS barrel shifterbarrel shifter
7/27/2019 ARM Processor Organization - Presentation
18/99
P. Bakowski 18
Multi cycle executionMulti cycle execution
fetchfetch decodedecodeSTRSTR
STRSTR store instructionstore instruction needs two execution cycles:needs two execution cycles:
address calculation cycle andaddress calculation cycle and
data transfer cycledata transfer cycle
fetchfetch
7/27/2019 ARM Processor Organization - Presentation
19/99
P. Bakowski 19
Multi cycle executionMulti cycle execution
fetchfetch decodedecodeSTRSTR
executeexecutedecodedecode addressaddress
fetchfetch
fetchfetch
address calculationaddress calculation cycle andcycle and
data transfer cycledata transfer cycle
7/27/2019 ARM Processor Organization - Presentation
20/99
P. Bakowski 20
Multi cycle executionMulti cycle execution
fetchfetch decodedecodeSTRSTR
executeexecutedecodedecode addressaddress
fetchfetch decodedecode executeexecute
fetchfetch decodedecode
transfertransfer
address calculation cycle andaddress calculation cycle and
data transferdata transfer cyclecycle
4 clock cycles4 clock cycles
7/27/2019 ARM Processor Organization - Presentation
21/99
P. Bakowski 21
Multi cycle executionMulti cycle execution
fetchfetch decodedecodeSTRSTR
executeexecutedecodedecode addressaddress
fetchfetch decodedecode executeexecute
fetchfetch decodedecode
transfertransfer
address calculation cycle andaddress calculation cycle and
data transferdata transfer cyclecycle
4 clock cycles4 clock cycles
Attention: onlyAttention: only oneonememory transfermemory transfer
per clock cycleper clock cycle
7/27/2019 ARM Processor Organization - Presentation
22/99
P. Bakowski 22
Processor performanceProcessor performance
The timeThe time TT, required to execute a given program, required to execute a given programis given by:is given by:
T=(NT=(Ninstinst*CPI)/f*CPI)/fclkclk
NNinstinst--number of instructions in the programnumber of instructions in the program
CPICPI --clock cycles per instructionclock cycles per instruction
ffclkclk --clock frequencyclock frequency
7/27/2019 ARM Processor Organization - Presentation
23/99
P. Bakowski 23
Processor performanceProcessor performance
The timeThe time TT, required to execute a given program, required to execute a given programis given by:is given by:
T=(NT=(Ninstinst*CPI)/f*CPI)/fclkclk
NNinstinst--number of instructionsnumber of instructionsin the programin the program
CPICPI --clock cycles per instructionclock cycles per instruction
ffclkclk --clock frequencyclock frequency
7/27/2019 ARM Processor Organization - Presentation
24/99
P. Bakowski 24
Processor performanceProcessor performance
The timeThe time TT, required to execute a given program, required to execute a given programis given by:is given by:
T=(NT=(Ninstinst*CPI)/f*CPI)/fclkclk
NNinstinst--number of instructions in the programnumber of instructions in the program
CPICPI --clockclock cycles per instruction (throughput)cycles per instruction (throughput)
ffclkclk --clock frequencyclock frequency
7/27/2019 ARM Processor Organization - Presentation
25/99
P. Bakowski 25
Processor performanceProcessor performance
The timeThe time TT, required to execute a given program, required to execute a given programis given by:is given by:
T=(NT=(Ninstinst*CPI)/f*CPI)/fclkclk
NNinstinst--number of instructions in the programnumber of instructions in the program
CPICPI --clock cycles per instructionclock cycles per instruction
ffclkclk --clock frequencyclock frequency
7/27/2019 ARM Processor Organization - Presentation
26/99
P. Bakowski 26
Processor performanceProcessor performance
SinceSince NNinsiinsi is constantis constant for a given program there arefor a given program there areonlyonly two waystwo ways to increase performance:to increase performance:
increase the clock rate,increase the clock rate, ffclkclk.. reduce the average number of clock cycles perreduce the average number of clock cycles per
instruction,instruction, CPI.CPI.
7/27/2019 ARM Processor Organization - Presentation
27/99
P. Bakowski 27
Clock rate increaseClock rate increaseIncrease the clock rateIncrease the clock rate,, ffclkclk. This requires the logic in. This requires the logic ineach pipeline stage to be simplified and, therefore,each pipeline stage to be simplified and, therefore,
thethe number of pipeline stages to be increasednumber of pipeline stages to be increased..
stage1stage1 stage2stage2 stage3stage3
clock cycleclock cycle 3 stages3 stages
7/27/2019 ARM Processor Organization - Presentation
28/99
P. Bakowski 28
Clock rate increaseClock rate increaseIncrease the clock rateIncrease the clock rate,, ffclkclk. This requires the logic in. This requires the logic ineach pipeline stage to be simplified and, therefore,each pipeline stage to be simplified and, therefore,
thethe number of pipeline stages to be increasednumber of pipeline stages to be increased..
stage1stage1 stage2stage2 stage3stage3 stage4stage4 stage5stage5
stage1stage1 stage2stage2 stage3stage3
clock cycleclock cycle 3 stages3 stages
clock cycleclock cycle 5 stages5 stages
7/27/2019 ARM Processor Organization - Presentation
29/99
P. Bakowski 29
Clock rate increaseClock rate increaseNote that theNote that the clock rateclock rate to be used depends heavilyto be used depends heavily
on theon the implementation technologyimplementation technology..
stage1stage1 stage2stage2 stage3stage3clock cycleclock cycle 3 stages3 stages
new implementation technologynew implementation technology
stage1stage1 stage2stage2 stage3stage3clock cycleclock cycle 3 stages3 stages
7/27/2019 ARM Processor Organization - Presentation
30/99
P. Bakowski 30
More hardware resourcesMore hardware resourcesReduceReduce the average number ofthe average number of clock cycles perclock cycles per
instructioninstruction,, CPI.CPI. This requires the introduction of moreThis requires the introduction of more
parallelismparallelism that means more hardware resources to bethat means more hardware resources to beused in a given clock cycle.used in a given clock cycle.
data readdata read oror
instruction fetchinstruction fetchdata readdata read andand
instruction fetchinstruction fetch
D/I MD/I M DMDM IMIM
7/27/2019 ARM Processor Organization - Presentation
31/99
P. Bakowski 31
55--stage pipeline organizationstage pipeline organizationHigher performance ARM cores employ a 5Higher performance ARM cores employ a 5--stagestage
pipeline and havepipeline and have separate instruction and dataseparate instruction and data
memoriesmemories..
Breaking instruction execution down into five stagesBreaking instruction execution down into five stages
rather than three reduces the maximum work whichrather than three reduces the maximum work which
must be completed in a clock cycle, and hence allowsmust be completed in a clock cycle, and hence allowsa higher clock frequency to be used.a higher clock frequency to be used.
The separate instruction and data memories seen asThe separate instruction and data memories seen as
separate caches connected to a unified instruction andseparate caches connected to a unified instruction anddata main memory allow a significant reduction in thedata main memory allow a significant reduction in the
core's CPI.core's CPI.
7/27/2019 ARM Processor Organization - Presentation
32/99
P. Bakowski 32
55--stage pipeline organizationstage pipeline organizationHigher performance ARM cores employ a 5Higher performance ARM cores employ a 5--stagestage
pipeline and havepipeline and have separate instruction and dataseparate instruction and data
memoriesmemories..
BreakingBreaking instruction executioninstruction execution down intodown into five stagesfive stages
rather than three reduces the maximum work whichrather than three reduces the maximum work which
must be completed in a clock cycle, and hence allowsmust be completed in a clock cycle, and hence allowsaa higher clock frequencyhigher clock frequency to be used.to be used.
The separate instruction and data memories seen asThe separate instruction and data memories seen as
separate caches connected to a unified instruction andseparate caches connected to a unified instruction anddata main memory allow a significant reduction in thedata main memory allow a significant reduction in the
core's CPI.core's CPI.
7/27/2019 ARM Processor Organization - Presentation
33/99
P. Bakowski 33
55--stage pipeline organizationstage pipeline organizationHigher performance ARM cores employ a 5Higher performance ARM cores employ a 5--stagestage
pipeline and havepipeline and have separate instruction and dataseparate instruction and data
memoriesmemories..
Breaking instruction execution down into five stagesBreaking instruction execution down into five stages
rather than three reduces the maximum work whichrather than three reduces the maximum work which
must be completed in a clock cycle, and hence allowsmust be completed in a clock cycle, and hence allowsa higher clock frequency to be used.a higher clock frequency to be used.
TheThe separate instruction and data memoriesseparate instruction and data memories seen asseen as
separate cachesseparate caches connected to a unified instruction andconnected to a unified instruction anddata main memory allow a significant reduction in thedata main memory allow a significant reduction in the
core's CPI.core's CPI.
7/27/2019 ARM Processor Organization - Presentation
34/99
P. Bakowski 34
Fetch stageFetch stagefetchfetch decodedecode executeexecute bufferbuffer writewrite
FETCHFETCH -- the instruction is fetched from memory andthe instruction is fetched from memory and
placed in theplaced in the instruction cacheinstruction cache..
incrementer
incrementer
I cacheI cachenext PCnext PC
to decoderto decoder
7/27/2019 ARM Processor Organization - Presentation
35/99
P. Bakowski 35
Decode stageDecode stagefetchfetch decodedecode executeexecute bufferbuffer writewrite
DECODEDECODE -- the instruction isthe instruction is decodeddecoded andand registerregister
operands readoperands read from the register file.from the register file.
II--decode
decode
7/27/2019 ARM Processor Organization - Presentation
36/99
P. Bakowski 36
Decode stageDecode stagefetchfetch decodedecode executeexecute bufferbuffer writewrite
There areThere are three operand read portsthree operand read ports in the register file,in the register file,
so most instructions can obtain all their operands inso most instructions can obtain all their operands in
one cycle.one cycle.
II--decode
decode registerregister
filefile
7/27/2019 ARM Processor Organization - Presentation
37/99
P. Bakowski 37
Execute stageExecute stagefetchfetch decodedecode executeexecute bufferbuffer writewrite
EXECUTEEXECUTE -- anan operand isoperand is shiftedshifted and the ALU resultand the ALU resultgenerated.generated.
registerregisterfilefile
MMALUALU
BSBS
+4+4
ALUbusALUbus
7/27/2019 ARM Processor Organization - Presentation
38/99
P. Bakowski 38
Execute stageExecute stagefetchfetch decodedecode executeexecute bufferbuffer writewrite
EXECUTEEXECUTE -- an operand is shifted and thean operand is shifted and theALU resultALU resultgeneratedgenerated..
registerregisterfilefile
MMALUALU
BSBS
+4+4
ALUbusALUbus
7/27/2019 ARM Processor Organization - Presentation
39/99
P. Bakowski 39
Execute stageExecute stagefetchfetch decodedecode executeexecute bufferbuffer writewrite
If the instruction is aIf the instruction is a load or storeload or store the memorythe memoryaddress is computedaddress is computed in the ALU.in the ALU.
registerregisterfilefile
MMALUALU
BSBS
+4+4
ALUbusALUbus
7/27/2019 ARM Processor Organization - Presentation
40/99
P. Bakowski 40
Buffer stageBuffer stagefetchfetch decodedecode executeexecute bufferbuffer writewrite
BUFFERBUFFERdatadata -- data memory is accessed if required.data memory is accessed if required.
Otherwise the ALU result is simply buffered for one clockOtherwise the ALU result is simply buffered for one clock
cycle to give the same pipeline flow for all instructions.cycle to give the same pipeline flow for all instructions.
+4+4
D cacheD cache
byte replicationbyte replication
rotation/signrotation/signextensionextension
7/27/2019 ARM Processor Organization - Presentation
41/99
P. Bakowski 41
Write back stageWrite back stagefetchfetch decodedecode executeexecute bufferbuffer writewrite
WRITEWRITE--back; the results generated by the instructionback; the results generated by the instruction
are written back to the register file, including any dataare written back to the register file, including any data
loaded from memory.loaded from memory.
registerregister
filefile
ALUALU
ALUbusALUbus
D cacheD cacherotation/signrotation/sign
extensionextension
7/27/2019 ARM Processor Organization - Presentation
42/99
P. Bakowski 42
Data forwardingData forwardingIn the 5In the 5--stage pipeline instructionstage pipeline instruction execution is spreadexecution is spread
across three pipeline stagesacross three pipeline stages, the only way to resolve, the only way to resolve
data dependencies without stalling the pipeline is todata dependencies without stalling the pipeline is tointroduceintroduce forwardingforwarding pathspaths..
fetchfetch
decodedecode
executeexecute
bufferbuffer
writewrite
7/27/2019 ARM Processor Organization - Presentation
43/99
P. Bakowski 43
Data forwardingData forwarding
fetchfetch decodedecode executeexecute bufferbuffer writewrite
fetchfetch decodedecode executeexecute bufferbuffer writewrite
Data dependenciesData dependencies arise when an instructionarise when an instruction
needs to use theneeds to use the result of one of its predecessorsresult of one of its predecessorsbefore that result has returned to thebefore that result has returned to the register fileregister file..
to register fileto register file
7/27/2019 ARM Processor Organization - Presentation
44/99
P. Bakowski 44
Data forwardingData forwardingForwarding pathsForwarding paths (by(by--pass) allow the intermediatepass) allow the intermediate
results to be passed between stages as soon as they areresults to be passed between stages as soon as they are
available, in the 5available, in the 5
--stage ARM pipeline each of the threestage ARM pipeline each of the three
source operands can be forwarded from any of threesource operands can be forwarded from any of three
intermediate result registersintermediate result registers
fetchfetch decodedecode executeexecute bufferbuffer writewrite
fetchfetch decodedecode executeexecute bufferbuffer writewrite
to register fileto register file
byby--pass pathspass paths
7/27/2019 ARM Processor Organization - Presentation
45/99
P. Bakowski 45
PC organizationPC organization -- compatibilitycompatibility
TheThe programming behaviorprogramming behavior of the PC implementedof the PC implemented
through r15 is based on thethrough r15 is based on the operational characteristicsoperational characteristicsof the 3of the 3--stage ARM pipeline.stage ARM pipeline.
Basically the 5Basically the 5--stage pipeline reads the instructionstage pipeline reads the instruction
operandsoperands one stage earlierone stage earlier and that is incompatible withand that is incompatible with33--stage design.stage design.
7/27/2019 ARM Processor Organization - Presentation
46/99
P. Bakowski 46
PC organizationPC organization -- compatibilitycompatibility
The programming behavior of the PC implementedThe programming behavior of the PC implemented
through r15 is based on the operational characteristicsthrough r15 is based on the operational characteristicsof the 3of the 3--stage ARM pipeline.stage ARM pipeline.
Basically the 5Basically the 5--stage pipeline reads the instructionstage pipeline reads the instruction
operandsoperands one stage earlierone stage earlier and that is incompatible withand that is incompatible with33--stage design.stage design.
7/27/2019 ARM Processor Organization - Presentation
47/99
P. Bakowski 47
PC organisationPC organisation -- solutionsolutionThis problem is resolved by theThis problem is resolved by the incrementationincrementation of the PCof the PC
value from the fetch stage in thevalue from the fetch stage in the decode stagedecode stage,,
bypassing the pipeline register between the two stages.bypassing the pipeline register between the two stages.
PC+4 for the next instruction is equal to PC+8 for thePC+4 for the next instruction is equal to PC+8 for the
current instruction (4 bytes farther), so the correct r15current instruction (4 bytes farther), so the correct r15
value is obtained without additional hardware.value is obtained without additional hardware.
7/27/2019 ARM Processor Organization - Presentation
48/99
P. Bakowski 48
PC organisationPC organisation -- solutionsolutionPC+4 for the next instruction is equal to PC+8 for thePC+4 for the next instruction is equal to PC+8 for the
current instruction (4 bytes farther), so the correct r15current instruction (4 bytes farther), so the correct r15
value is obtained without additional hardware.value is obtained without additional hardware.
incrementer
incre
menter
I cacheI cache
nextnext
PC+4PC+4
to decoderto decoder
next PC+8next PC+8
registerregister
filefile
r15r15
7/27/2019 ARM Processor Organization - Presentation
49/99
P. Bakowski 49
ARM programming modelARM programming modelTheThe Instruction Set ArchitectureInstruction Set Architecture (ISA) defines the(ISA) defines the
operations that the programmer can use to change theoperations that the programmer can use to change the
statestate of the system incorporating the processor.of the system incorporating the processor.
This state usually comprises the values of the data itemsThis state usually comprises the values of the data items
in the visible registers and the memory.in the visible registers and the memory.
Each instruction performs a defined transformation fromEach instruction performs a defined transformation from
the state before the instruction is executed to the statethe state before the instruction is executed to the state
after it has completed.after it has completed.
7/27/2019 ARM Processor Organization - Presentation
50/99
P. Bakowski 50
ARM programming modelARM programming modelThe Instruction Set Architecture (ISA) defines theThe Instruction Set Architecture (ISA) defines the
operations that the programmer can use to change theoperations that the programmer can use to change the
state of the system incorporating the processor.state of the system incorporating the processor.
ThisThis statestate usually comprises the values of theusually comprises the values of the data itemsdata items
inin thethe visiblevisible registersregisters and theand the memorymemory..
Each instruction performs a defined transformation fromEach instruction performs a defined transformation from
the state before the instruction is executed to the statethe state before the instruction is executed to the state
after it has completed.after it has completed.
7/27/2019 ARM Processor Organization - Presentation
51/99
P. Bakowski 51
ARM programming modelARM programming modelThe Instruction Set Architecture (ISA) defines theThe Instruction Set Architecture (ISA) defines the
operations that the programmer can use to change theoperations that the programmer can use to change the
state of the system incorporating the processor.state of the system incorporating the processor.
This state usually comprises the values of the data itemsThis state usually comprises the values of the data items
in the visible registers and the memory.in the visible registers and the memory.
Each instructionEach instruction performsperforms a defineda defined transformationtransformation fromfrom
thethe state beforestate before the instruction is executed to thethe instruction is executed to the statestate
afterafter it has completed.it has completed.
7/27/2019 ARM Processor Organization - Presentation
52/99
P. Bakowski 52
ARM memory subsystemARM memory subsystem
ARM memoryARM memory may be viewed as amay be viewed as a linear array oflinear array of
bytesbytes numbered fromnumbered from zero up to 2zero up to 2
3232
--11..
00
223232--11
linear arraylinear array
of bytesof bytes
7/27/2019 ARM Processor Organization - Presentation
53/99
P. Bakowski 53
ARM memory subsystemARM memory subsystemDataData items may be 8items may be 8--bitbit bytesbytes, 16, 16--bitbit halfhalf--wordswords oror
3232--bitbit wordswords..
223232--11
wordswordsbytesbytes
00112233
7/27/2019 ARM Processor Organization - Presentation
54/99
P. Bakowski 54
ARM memory subsystemARM memory subsystemWordsWords are alwaysare always alignedaligned onon 44--byte boundariesbyte boundaries (the(the
two leasttwo least significantsignificant address bitsaddress bits areare zerozero) and half) and half--
words are aligned on even byte boundaries.words are aligned on even byte boundaries.
00112233
223232--11
0000
word addressword address
byte numberbyte number
7/27/2019 ARM Processor Organization - Presentation
55/99
P. Bakowski 55
ARM memory subsystemARM memory subsystemWordsWords are alwaysare always alignedaligned onon 44--byte boundariesbyte boundaries (the(the
two leasttwo least significantsignificant address bitsaddress bits areare zerozero) and half) and half--
words are aligned on even byte boundaries.words are aligned on even byte boundaries.
00112233
223232--11
little endianlittle endian organizationorganization
7/27/2019 ARM Processor Organization - Presentation
56/99
P. Bakowski 56
ARM loadARM load--store architecturestore architecture
TheThe processing instructionprocessing instruction (add, subtract, and so on)(add, subtract, and so on)
take the valuestake the values from the registersfrom the registers and always placeand always placethe results into a register.the results into a register.
registerregisterfilefile
MMALUALU
BSBS
+4+4
ALUbusALUbus
7/27/2019 ARM Processor Organization - Presentation
57/99
P. Bakowski 57
ARM loadARM load--store architecturestore architecture
TheThe processing instructionprocessing instruction (add, subtract, and so on)(add, subtract, and so on)
take the values from the registers and always placetake the values from the registers and always placethethe results into a registerresults into a register..
registerregisterfilefile
MMALUALU
BSBS
+4+4
ALUbusALUbus
7/27/2019 ARM Processor Organization - Presentation
58/99
P. Bakowski 58
ARMARM loadload--store architecturestore architectureThe only instructions which apply to memory state are onesThe only instructions which apply to memory state are ones
which copy memorywhich copy memory values into registervalues into register ((load instructionsload instructions))
or copy register values into memory (store instructions).or copy register values into memory (store instructions).
registerregisterfilefile
DD --cachecache memorymemory
7/27/2019 ARM Processor Organization - Presentation
59/99
P. Bakowski 59
ARM loadARM load--storestore architecturearchitectureThe only instructions which apply to memory state are onesThe only instructions which apply to memory state are ones
which copy memory values into register (load instructions)which copy memory values into register (load instructions)
or copy registeror copy register values into memoryvalues into memory ((store instructionsstore instructions).).
registerregisterfilefile
DD --cachecache memorymemory
7/27/2019 ARM Processor Organization - Presentation
60/99
P. Bakowski 60
ARM instructionsARM instructions
In general the ARM instructions fall into one of theIn general the ARM instructions fall into one of the
followingfollowing three categoriesthree categories:: data processing instructionsdata processing instructions
data transfer instructionsdata transfer instructions
control flow instructionscontrol flow instructions
7/27/2019 ARM Processor Organization - Presentation
61/99
P. Bakowski 61
ARM instructionsARM instructions
In general the ARM instructions fall into one of theIn general the ARM instructions fall into one of the
followingfollowing three categoriesthree categories:: data processingdata processing instructionsinstructions
data transfer instructionsdata transfer instructions
control flow instructionscontrol flow instructions
7/27/2019 ARM Processor Organization - Presentation
62/99
P. Bakowski 62
ARM instructionsARM instructions
In general the ARM instructions fall into one of theIn general the ARM instructions fall into one of the
followingfollowing three categoriesthree categories:: data processing instructionsdata processing instructions
data transferdata transfer instructionsinstructions
control flow instructionscontrol flow instructions
7/27/2019 ARM Processor Organization - Presentation
63/99
P. Bakowski 63
ARM instructionsARM instructions
In general the ARM instructions fall into one of theIn general the ARM instructions fall into one of the
followingfollowing three categoriesthree categories:: data processing instructionsdata processing instructions
data transfer instructionsdata transfer instructions
control flowcontrol flow instructionsinstructions
7/27/2019 ARM Processor Organization - Presentation
64/99
P. Bakowski 64
ARM data processingARM data processing
Data processing instructions:Data processing instructions:
these use andthese use and change only register valueschange only register values;;
7/27/2019 ARM Processor Organization - Presentation
65/99
P. Bakowski 65
ARM data processingARM data processingFor example, an instruction can addFor example, an instruction can add two registerstwo registers andand
place the result in aplace the result in a registerregister..
registerregister
filefile MMALUALU
BSBS
+4+4
ALUbusALUbus
7/27/2019 ARM Processor Organization - Presentation
66/99
P. Bakowski 66
ARM data transferARM data transfer
Data transfer instructions copy memory values intoData transfer instructions copy memory values into
registers (registers (loadload instructions) or copy register valuesinstructions) or copy register valuesinto memory (into memory (storestore instructions);instructions);
An additional form, useful only in systems code,An additional form, useful only in systems code,exchanges a memory value with a register value.exchanges a memory value with a register value.
7/27/2019 ARM Processor Organization - Presentation
67/99
P. Bakowski 67
ARM data transferARM data transfer
Data transfer instructions copy memory values intoData transfer instructions copy memory values into
registers (load instructions) or copy register valuesregisters (load instructions) or copy register valuesinto memory (store instructions);into memory (store instructions);
An additional form, useful only in systems code,An additional form, useful only in systems code,exchangesexchanges a memory value with a register value.a memory value with a register value.
7/27/2019 ARM Processor Organization - Presentation
68/99
P. Bakowski 68
ARM data transferARM data transferData transfer instructions copy memory values intoData transfer instructions copy memory values into
registers (load instructions) or copy register valuesregisters (load instructions) or copy register values
into memory (store instructions);into memory (store instructions);
An additional form, useful only in systems code,An additional form, useful only in systems code,
exchangesexchanges a memory value with a register value.a memory value with a register value.
e.g.e.g. test and settest and set instructioninstruction
7/27/2019 ARM Processor Organization - Presentation
69/99
P. Bakowski 69
ARM control flowARM control flowControl flow instructions cause execution toControl flow instructions cause execution to switch toswitch to
a different addressa different address, either, either permanentlypermanently ((branchbranch
instructionsinstructions) or saving a return address to resume the) or saving a return address to resume theoriginal sequence (branch and link instructions) ororiginal sequence (branch and link instructions) or
trapping into system code (supervisor calls).trapping into system code (supervisor calls).
7/27/2019 ARM Processor Organization - Presentation
70/99
P. Bakowski 70
ARM control flowARM control flowControl flow instructions cause execution to switch toControl flow instructions cause execution to switch to
a different address, either permanently (brancha different address, either permanently (branch
instructions) orinstructions) or saving a return address to resume thesaving a return address to resume theoriginal sequenceoriginal sequence ((branch and linkbranch and link instructions) orinstructions) or
trapping into system code (supervisor calls).trapping into system code (supervisor calls).
link addresslink address
7/27/2019 ARM Processor Organization - Presentation
71/99
P. Bakowski 71
ARM control flowARM control flowControl flow instructions cause execution to switch toControl flow instructions cause execution to switch to
a different address, either permanently (brancha different address, either permanently (branch
instructions) or saving a return address to resume theinstructions) or saving a return address to resume theoriginal sequence (branch and link instructions) ororiginal sequence (branch and link instructions) or
trapping into system codetrapping into system code ((supervisor callssupervisor calls).).
system codesystem code
link addresslink address
7/27/2019 ARM Processor Organization - Presentation
72/99
P. Bakowski 72
ARM supervisor modeARM supervisor modeThe ARM processor supports aThe ARM processor supports a protected supervisorprotected supervisor
modemode. The protection mechanism ensures that user. The protection mechanism ensures that user
code cannot gaincode cannot gain supervisor privilegessupervisor privileges withoutwithoutappropriate checks being carried out to ensure that theappropriate checks being carried out to ensure that the
code is not attemptingcode is not attempting illegal operationsillegal operations..
user codeuser code
I/O driverI/O driver
illegal operation ?illegal operation ?
7/27/2019 ARM Processor Organization - Presentation
73/99
P. Bakowski 73
ARM supervisor modeARM supervisor modeThe upshot of this for the userThe upshot of this for the user--level programmer is thatlevel programmer is that
systemsystem--level functionslevel functions can only be accessed throughcan only be accessed through
specifiedspecified supervisor callsupervisor call..
system codesystem code
user codeuser code
I/O driverI/O driver
system/supervisorsystem/supervisor callcall
ARM I/O programming
7/27/2019 ARM Processor Organization - Presentation
74/99
P. Bakowski 74
ARM I/O programmingARM I/O programming
The ARM handles I/0The ARM handles I/0 (input/output) peripherals (such(input/output) peripherals (such
as disk controllers, network interfaces, and so on) asas disk controllers, network interfaces, and so on) as
memorymemory--mapped devicesmapped devices with interrupt support.with interrupt support.
memorymemory--mapped devicesmapped devices
ARM I/O programming
7/27/2019 ARM Processor Organization - Presentation
75/99
P. Bakowski 75
ARM I/O programmingARM I/O programming
TheThe internal registersinternal registers in these devices appear asin these devices appear as
addressable locationsaddressable locations within the ARM's memory mapwithin the ARM's memory map
and may be read and written using the same (and may be read and written using the same (loadload--storestore) instructions as any other memory locations.) instructions as any other memory locations.
memory locationsmemory locations
storestore
loadload
ARM I/O interruptions
7/27/2019 ARM Processor Organization - Presentation
76/99
P. Bakowski 76
ARM I/O interruptionsARM I/O interruptions
Peripherals may attract the processor's attention byPeripherals may attract the processor's attention by
making anmaking an interrupt requestinterrupt request using either theusing either the normalnormal
interruptinterrupt ((IRQIRQ) or the) or the fast interruptfast interrupt ((FIQFIQ)) inputinput..
to CPUto CPU --
IRQIRQ
to CPUto CPU -- FIQFIQ
ARM I/O interruptions
7/27/2019 ARM Processor Organization - Presentation
77/99
P. Bakowski 77
ARM I/O interruptionsARM I/O interruptions
Both interrupt inputs are levelBoth interrupt inputs are level--sensitive and maskable.sensitive and maskable.
Normally most interrupt sources share theNormally most interrupt sources share the IRQ inputIRQ input,,
with just one or twowith just one or two timetime--critical sourcescritical sources connected toconnected tothethe higherhigher--priority FIQ inputpriority FIQ input..
IRQIRQ
FIQFIQ
ARM I/O interruptions
7/27/2019 ARM Processor Organization - Presentation
78/99
P. Bakowski 78
ARM I/O interruptionsARM I/O interruptions
Some systems may includeSome systems may include direct memory accessdirect memory access
(DMA) hardware external to the processor to handle(DMA) hardware external to the processor to handle
highhigh--bandwidth trafficbandwidth traffic..
system bussystem bus
DMA trafficDMA traffic
ARM exceptions
7/27/2019 ARM Processor Organization - Presentation
79/99
P. Bakowski 79
ARM exceptionse cept o s
The ARM architecture supports a range of :The ARM architecture supports a range of :
interruptsinterrupts
trapstraps
supervisor callssupervisor calls
all grouped under the general heading ofall grouped under the general heading of exceptionsexceptions..
ARM exceptionsARM exceptions
7/27/2019 ARM Processor Organization - Presentation
80/99
P. Bakowski 80
e cept o sp
The ARM architecture supports a range of :The ARM architecture supports a range of :
interruptsinterrupts
trapstraps
supervisor callssupervisor calls
all grouped under the general heading ofall grouped under the general heading of exceptionsexceptions..
ARM exceptionsARM exceptions
7/27/2019 ARM Processor Organization - Presentation
81/99
P. Bakowski 81
pp
The ARM architecture supports a range of :The ARM architecture supports a range of :
interruptsinterrupts
trapstraps
supervisor callssupervisor calls
all grouped under the general heading ofall grouped under the general heading of exceptionsexceptions..
ARM exceptionsARM exceptions
7/27/2019 ARM Processor Organization - Presentation
82/99
P. Bakowski 82
pp
The ARM architecture supports a range of :The ARM architecture supports a range of :
interruptsinterrupts
trapstraps
supervisor callssupervisor calls
all grouped under the general heading ofall grouped under the general heading of exceptionsexceptions..
ARM exceptionsARM exceptions
7/27/2019 ARM Processor Organization - Presentation
83/99
P. Bakowski 83
pp
The ARM architecture supports a range of :The ARM architecture supports a range of :
interruptsinterrupts
trapstraps
supervisor callssupervisor calls
all grouped under the general heading ofall grouped under the general heading of exceptionsexceptions..
ARM exceptionsARM exceptions
7/27/2019 ARM Processor Organization - Presentation
84/99
P. Bakowski 84
p
TheThe general waygeneral way of exception handling is the same in allof exception handling is the same in all
cases:cases:
the current state is saved by copying the PC intothe current state is saved by copying the PC into
rl4_exc and the CPSR into SPSR_exc (whererl4_exc and the CPSR into SPSR_exc (where excexc standsstands
for the exception type);for the exception type);
the processor operating mode is changed to thethe processor operating mode is changed to theappropriate exception mode;appropriate exception mode;
the PC is forced to a value between 00the PC is forced to a value between 001616 and 1Cand 1C1616, the, the
particular value depending on the type of exception.particular value depending on the type of exception.
ARM exceptionsARM exceptions
7/27/2019 ARM Processor Organization - Presentation
85/99
P. Bakowski 85
The general way of exception handling is the same in allThe general way of exception handling is the same in all
cases:cases:
thethe current statecurrent state isis savedsaved by copying theby copying the PCPC intointo
rl4_excrl4_exc and theand the CPSRCPSRintointo SPSR_excSPSR_exc (where(where excexc standsstands
for thefor the exception typeexception type););
the processor operating mode is changed to thethe processor operating mode is changed to theappropriate exception mode;appropriate exception mode;
the PC is forced to a value between 00the PC is forced to a value between 001616 and 1Cand 1C1616, the, the
particular value depending on the type of exception.particular value depending on the type of exception.
ARM exceptionsARM exceptions
7/27/2019 ARM Processor Organization - Presentation
86/99
P. Bakowski 86
The general way of exception handling is the same in allThe general way of exception handling is the same in all
cases:cases:
the current state is saved by copying the PC intothe current state is saved by copying the PC into
rl4_exc and the CPSR into SPSR_exc (whererl4_exc and the CPSR into SPSR_exc (where excexc standsstands
for the exception type);for the exception type);
thethe processor operating modeprocessor operating mode is changed to theis changed to theappropriateappropriate exception modeexception mode;;
the PC is forced to a value between 00the PC is forced to a value between 001616 and 1Cand 1C1616, the, the
particular value depending on the type of exception.particular value depending on the type of exception.
ARM exceptionsARM exceptions
7/27/2019 ARM Processor Organization - Presentation
87/99
P. Bakowski 87
The general way of exception handling is the same in allThe general way of exception handling is the same in all
cases:cases:
the current state is saved by copying the PC intothe current state is saved by copying the PC into
rl4_exc and the CPSR into SPSR_exc (whererl4_exc and the CPSR into SPSR_exc (where excexc standsstands
for the exception type);for the exception type);
the processor operating mode is changed to thethe processor operating mode is changed to theappropriate exception mode;appropriate exception mode;
the PC is forced to a value betweenthe PC is forced to a value between 00001616 andand 1C1C1616, the, the
particular value depending on the type of exception.particular value depending on the type of exception.
ARM instruction executionARM instruction execution
7/27/2019 ARM Processor Organization - Presentation
88/99
P. Bakowski 88
address
register
address
register
datain
register
datain
register
incremen
ter
incremen
ter
data
out
register
data
out
register
control pathcontrol path
control signalscontrol signals
registerregister
bankbank
MM
ALUALU
BSBS
instruction registerinstruction register
a[31:0]a[31:0]
d[31:0]d[31:0]
AbusAbus
BbusBbus
ALUbusALUbus
PCPC
MM multiplier, BSmultiplier, BS barrel shifterbarrel shifter
Data processing instructionData processing instruction
7/27/2019 ARM Processor Organization - Presentation
89/99
P. Bakowski 89
address
register
address
register
datain
register
datain
register
incremen
ter
incremen
ter
data
out
register
data
out
register
registerregister
bankbank
MM
ALUALU
BSBS
a
[31
:0]
a
[31
:0]
d[31:
0]
d[31:
0]
AbusAbus
BbusBbus
ALUbusALUbus
PCPC
registerregister registerregister operationsoperations
Data processing instructionData processing instructionrrr
7/27/2019 ARM Processor Organization - Presentation
90/99
P. Bakowski 90
address
register
address
register
datain
register
datain
register
incremen
ter
incremen
ter
data
out
register
data
out
regist
er
registerregister
bankbank
MM
ALUALU
BSBS
a
[31
:0]
a
[31
:0]
d[31:
0]
d[31:
0]
AbusAbus
BbusBbus
ALUbusALUbus
PCPC
registerregister immediateimmediate operationsoperations
ir
register
ir
register
Store instructionStore instructionrrrr
7/27/2019 ARM Processor Organization - Presentation
91/99
P. Bakowski 91
address
register
ad
dress
register
datain
register
datain
register
incremen
ter
incremen
ter
data
out
register
data
out
regist
er
registerregister
bankbank
MM
ALUALU
BSBS
a
[31
:0]
a
[31
:0]
d[31:
0]
d[31:
0]
AbusAbus
BbusBbus
ALUbusALUbus
PCPC
compute address operationcompute address operation
ir
register
ir
register
new addressnew address
Store instructionStore instructionrrrr
7/27/2019 ARM Processor Organization - Presentation
92/99
P. Bakowski 92
address
register
ad
dress
register
datain
register
datai
n
register
incremen
ter
incremen
ter
data
out
register
data
out
regist
er
registerregister
bankbank
MM
ALUALU
BSBS
a
[31
:0]
a
[31
:0]
d[31:
0]
d[31:
0]
AbusAbus
BbusBbus
ALUbusALUbus
PCPC
store datastore data autoauto--indexindex
ir
register
ir
register
autoauto
--indexindex
Branch instructionBranch instructionrrrr
7/27/2019 ARM Processor Organization - Presentation
93/99
P. Bakowski 93
address
register
ad
dress
register
datain
register
datai
n
register
incremen
ter
incremen
ter
data
out
register
data
out
regist
er
registerregister
bankbank
MM
ALUALU
BSBSa
[31
:0]
a
[31
:0]
d[31:
0]
d[31:
0]
AbusAbus
BbusBbus
ALUbusALUbus
PCPC
compute branch addresscompute branch address
ir
register
ir
register
branch addressbranch address
Branch instructionBranch instructionrr
erer
7/27/2019 ARM Processor Organization - Presentation
94/99
P. Bakowski 94
ad
dress
register
ad
dress
regi
ster
datain
register
datai
n
register
incremen
ter
incremen
ter
data
out
regist
er
data
out
regist
er
registerregister
bankbank
MM
ALUALU
BSBSa
[31
:0]
a
[31
:0]
d[31:
0]
d[31:
0]
AbusAbus
BbusBbus
ALUbusALUbus
PCPC
store return addressstore return address
ir
register
ir
register
return addressreturn address
R14R14
SummarySummary
7/27/2019 ARM Processor Organization - Presentation
95/99
P. Bakowski 95
ARMARM register bankregister bank
ARM barrel shifter and ALUARM barrel shifter and ALU
ARM 3ARM 3--stage and 5stage and 5--stage pipelinesstage pipelines
ARM programming modelARM programming modelARM instructionsARM instructions
SummarySummary
7/27/2019 ARM Processor Organization - Presentation
96/99
P. Bakowski 96
ARM register bankARM register bank
ARMARM barrel shifterbarrel shifter andandALUALU
ARM 3ARM 3--stage and 5stage and 5--stage pipelinesstage pipelines
ARM programming modelARM programming modelARM instructionsARM instructions
SummarySummary
7/27/2019 ARM Processor Organization - Presentation
97/99
P. Bakowski 97
ARM register bankARM register bank
ARM barrel shifter and ALUARM barrel shifter and ALU
ARMARM 33--stagestage andand 55--stage pipelinesstage pipelines
ARM programming modelARM programming modelARM instructionsARM instructions
SummarySummary
7/27/2019 ARM Processor Organization - Presentation
98/99
P. Bakowski 98
ARM register bankARM register bank
ARM barrel shifter and ALUARM barrel shifter and ALU
ARM 3ARM 3--stage and 5stage and 5--stage pipelinesstage pipelines
ARMARM programming modelprogramming modelARM instructionsARM instructions
SummarySummary
7/27/2019 ARM Processor Organization - Presentation
99/99
P. Bakowski 99
ARM register bankARM register bank
ARM barrel shifter and ALUARM barrel shifter and ALU
ARM 3ARM 3--stage and 5stage and 5--stage pipelinesstage pipelines
ARM programming modelARM programming modelARMARM instructionsinstructions