Microprocessor UNIT - IV


THE ARM RISC ARCHITECTURE

The RISC revolution
In the early 1980s the idea of RISC was introduced. RISC stands for Reduced Instruction Set Computer.
RISC processors have faster clock rates, ranging from 20 to 120 MHz.
Most RISC processors use hardwired control and 32 bit instructions.
These processors use a limited set of addressing modes.
A large register file and separate instruction and data caches are used. The large register file eliminates unnecessary storage of intermediate results in memory.

Problems in CISC processors:
1. Instructions were of varying length, from 1 byte to 8 bytes. This causes problems with the pre-fetching and pipelining of instructions.
2. ALU instructions could have operands that were memory locations. Because the number of cycles it takes to access memory varies, the execution time of such instructions varies too, which complicates pipelining.
3. Most ALU instructions had only 2 operands, where one of the operands is also the destination. This means this operand is destroyed during the operation, or it must be saved somewhere beforehand.

To overcome these problems, the idea of RISC was introduced. A RISC instruction set is composed of instructions that all have exactly the same size, usually 32 bits, so they can be pre-fetched and pipelined successfully.

Thus A = B + C will be assembled as,
    Load R1, B
    Load R2, C
    Add R3, R1, R2
    Store A, R3
Although it takes 4 instructions, we can reuse the values in the registers.

The RISC architecture

RISC architecture uses separate instruction and data caches, with separate access paths.
Hardwired control is found in most RISC processors.

The CISC architecture

In a CISC processor, there is a unified cache holding both data and instructions, so data and instructions have to share the same path.
CISC processors use microprogrammed control, so a control memory is needed. Accessing it slows down instruction execution.

RISC
Clock rate is 50-150 MHz.
Simple instructions taking one cycle.
Very few instructions refer to memory.
Fixed format instructions.
Few addressing modes.
Multiple register sets.
Highly pipelined.
Complexity is in the compiler.

CISC
Clock rate is 33-50 MHz.
Complex instruction set, with instructions taking multiple cycles.
Most of the instructions may refer to memory.
Variable format instructions.
Many addressing modes.
Single register set.
Less pipelined.
Complexity is in the microprogram.

RISC properties
The following are the properties of RISC architecture.
Register to register operations.
One instruction per cycle.
Hardwired instructions.
Reduced number of instructions.
Simple addressing modes.
Simple instruction format.
Instruction pipelining.

1. Register to register operations
The most important characteristic of a RISC processor is that frequently accessed operands remain in high speed storage.
To implement register to register operations, a RISC processor provides multiple sets of registers.
These register sets are organized into overlapped windows and act as a small, fast buffer holding the subset of variables that are most likely to be used.

Figure: Overlapped register windows. The current procedure and the called procedure each have parameter registers, local registers and temporary registers.

The window is divided into three fixed size areas.
Parameter registers: They hold parameters passed down from the procedure that called the current procedure, and results to be passed back up.
Local registers: They are used for local variables, as assigned by the compiler.
Temporary registers: They are used to exchange parameters and results with the procedure called by the current procedure.
The temporary registers of the current procedure are physically the same as the parameter registers of the called procedure.
This overlap permits parameters to be passed without actual movement of data.

2. One instruction per cycle
In RISC processors, one instruction is executed per machine cycle.
A machine cycle is defined to be the time it takes to fetch two operands from registers, perform an ALU operation, and store the result in a register.
So RISC machine instructions are no more complicated than, and execute about as fast as, the microinstructions of a CISC machine.

3. Hardwired instructions
With simple, one cycle instructions, there is no need for microinstructions.
The machine instructions can be hardwired.
These instructions execute faster than instructions implemented with microinstructions, since it is not necessary to access a microprogram control memory during instruction execution.

4. Reduced number of instructions
A RISC processor provides a limited number of instructions, which simplifies the design of the control unit.

5. Simple addressing modes
A RISC processor uses simple addressing modes in almost all instructions.
This simplifies the instruction set and the control unit.

6. Instruction pipelining
The process of fetching the next instruction while the current instruction is being executed is known as pipelining.
The CPU contains several independent units that work in parallel.
One of them fetches the instructions, and the others decode and execute them.
At any instant, several instructions are in various stages of processing.
Most instructions have the following two phases.
Instruction Fetch (I).
Execute (E).

The instruction fetch phase fetches the instruction to be executed from memory.
The execute phase performs an ALU operation with register input and output.
Load and store instructions require three phases:
Instruction Fetch (I).
Execute (E).
Data Transfer (D).
Here also the instruction fetch phase fetches the instruction to be executed.
In the execute phase the memory address is calculated, and in the data transfer phase the data is transferred from register to memory or from memory to register, depending on the instruction.
In two way instruction pipelining, the I and E phases of two different instructions are performed simultaneously.
In three way instruction pipelining, three instructions can be overlapped.

Two way instruction pipelining

Three way instruction pipelining
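The overlap described above can be sketched in a few lines of Python. This is only an illustration under idealized assumptions (every phase takes one cycle, no stalls); the function names `pipeline_cycles` and `timeline` are made up for this sketch and do not come from the slides.

```python
def pipeline_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed by an ideal pipeline: the first instruction takes
    n_stages cycles, every further instruction completes one cycle later."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def timeline(n_instructions: int, phases: str = "IE") -> list:
    """One text row per instruction; column k shows the phase active in cycle k."""
    total = pipeline_cycles(n_instructions, len(phases))
    rows = []
    for i in range(n_instructions):
        row = ["."] * total
        for s, phase in enumerate(phases):
            row[i + s] = phase  # instruction i enters phase s in cycle i + s
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    # Two way pipelining: I of one instruction overlaps E of the previous one.
    for row in timeline(4, "IE"):
        print(row)
    # Three way pipelining for load/store style instructions (I, E, D).
    for row in timeline(4, "IED"):
        print(row)
```

Without overlap, four two-phase instructions would need 8 cycles; with two way pipelining the same sequence needs 5 in this idealized model.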

RISC addressing modes
The small instruction set of a typical RISC processor consists mostly of register to register operations, with simple load and store operations for memory access.
Each operand is brought into a processor register with a load instruction, and results are transferred to memory by means of a store instruction.
In this architecture almost all instructions use simple register addressing, so only a few addressing modes are needed.
A RISC processor has three basic addressing modes.
Register addressing,
Immediate operand addressing and
Relative to PC addressing for branch instructions.

Register addressing: In register addressing, the instruction usually consists of three kinds of field: an opcode field which specifies the operation, one or two source register fields, and one destination register field.
For example: ADD R1, R2, R3    (R3 ← R1 + R2)

Immediate addressing mode: In immediate operand addressing mode, the second source is an immediate operand. The operation is performed on the data in the source register and the immediate operand, and the result is stored in the destination register.
Example: ADD R1, #100, R2    (R2 ← R1 + 100)

Relative to PC addressing: In relative to PC addressing, the instruction usually consists of three fields: opcode field, condition field and address field.
The opcode field specifies the operation.
The condition field specifies one of many possible branch conditions.
The address field specifies the signed offset which is added to the contents of the PC to calculate the new address when the branch condition is satisfied.
Example: JMP COND, R1(R2)    (PC ← R1 + R2)

Design for low power consumption
Power consumption is becoming the limiting factor in the amount of functionality that can be placed in a device.
The ARM processor is designed for power efficient processing.
Most components are currently fabricated using CMOS technology.
CMOS technology is cost efficient and consumes less power than other technologies.
The sources of power consumption on a CMOS chip can be classified as
Static power dissipation.
Dynamic power dissipation.
Dynamic power is frequency dependent, while static power is not.
Until recently a 5 V supply was standard, but many modern processors operate on a 3 V power supply.
The latest technologies operate with supplies of between 1 and 2 V, and this will reduce further in future.

CMOS power components
Switching power: This is the power dissipated by charging and discharging the gate output capacitance.
Short-circuit power: During a transition on the input of a CMOS gate, both the p and n transistors can conduct simultaneously, resulting in a transitory conducting path from Vdd to Vss. The resulting power dissipation is a small fraction of the total.
Leakage current: A very small current, called leakage current, flows through the transistors when they are in the OFF state. The power dissipation due to leakage current is small and can be neglected.

CMOS circuit power

Neglecting the power consumption due to leakage current and short circuit current, the total power dissipation of a CMOS circuit is the sum of the power dissipation of all the gates in the circuit. It is given by,

$$P_C = \frac{1}{2} \cdot f \cdot V_{dd}^2 \cdot \sum_g A_g \cdot C_g$$

Where,
f = clock frequency,
Vdd = supply voltage,
Ag = gate activity factor,
Cg = gate load capacitance.
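As a rough numerical illustration, the sketch below evaluates the usual switching power relation, power = 1/2 × f × Vdd² × Σ(Ag × Cg), for a list of gates. The function name `cmos_switching_power` and all the gate values are hypothetical; they are chosen only to show how strongly the result depends on Vdd.

```python
def cmos_switching_power(f_hz, vdd, gates):
    """Switching power in watts.

    f_hz  : clock frequency in hertz
    vdd   : supply voltage in volts
    gates : iterable of (activity_factor, load_capacitance_in_farads) pairs
    Leakage and short-circuit power are neglected, as in the text.
    """
    return 0.5 * f_hz * vdd ** 2 * sum(a * c for a, c in gates)


if __name__ == "__main__":
    # Hypothetical circuit: 10,000 gates, each with activity 0.2 and 50 fF load.
    gates = [(0.2, 50e-15)] * 10_000
    for vdd in (5.0, 3.0, 1.5):
        watts = cmos_switching_power(50e6, vdd, gates)
        print(f"Vdd = {vdd} V -> {watts * 1e3:.2f} mW")
```

Because Vdd enters as a square, halving the supply voltage cuts the switching power to a quarter at the same clock frequency.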

Low power circuit design
Various approaches to low power design are as follows,
Use a low power supply voltage, Vdd.
Keep the circuit activity factor as small as possible.
Simplify the circuit and use the minimum number of gates to implement it.
Use the minimum clock frequency. A lower clock frequency allows operation at a reduced Vdd, resulting in lower power consumption.

Reducing Vdd
By reducing Vdd we can reduce power dissipation. However, reducing Vdd also reduces the performance of the circuit, because the maximum operating frequency is reduced as Vdd is reduced.
By also decreasing the transistor threshold voltage, Vt, we can improve the performance.

ARM Architecture Basics
The ARM (Advanced RISC Machine) processor is basically a RISC processor.
ARM is a 32 bit processor.
The ARM processor provides solutions for
Open platforms running complex operating systems for wireless, consumer and imaging applications.
Embedded real time systems for mass storage, automotive, industrial and networking applications.
Secure applications including smart cards and SIM cards.
The RISC design was adapted by ARM to create a flexible embedded processor, so the ARM architecture is not a pure RISC architecture.
It incorporates a number of features from RISC design, but rejects a number of others.
The ARM instruction set differs from a pure RISC instruction set because it has been made suitable for embedded applications.

Architecture Inheritance
The features of RISC which are accepted by the ARM processor:
A large uniform register file.
A load store architecture.
Uniform and fixed length instruction fields.
Three address instruction formats.
The features of RISC which are rejected by the ARM processor:
Register windows.
Delayed branches.
Single cycle execution of all instructions.
In addition, the ARM architecture gives,
Control over both the ALU and the shifter in every data processing instruction.
Load and store multiple instructions.

ARM core dataflow model

Load-store architecture:
It has two instruction types, load and store, for transferring data in and out of the processor respectively.
LOAD: This instruction copies data from memory to registers in the processor core.
STORE: This instruction copies data from registers in the processor core to memory.
The ARM instruction set does not include instructions that directly manipulate data in memory. Data processing is carried out only in registers.

Data bus:
Data enters the ARM core through the data bus. It is either an instruction opcode or a data item.
Data and instructions share the same bus.

Instruction decoder:
This unit decodes the instruction opcode read from memory before the instruction is executed.

Register file:
This is a bank of 32 bit registers used for storing data items.

Sign extend:
The ARM core is a 32 bit processor, so most ARM instructions treat registers as holding signed or unsigned 32 bit values.
When the processor reads signed 8 bit or 16 bit numbers from memory, the sign extend hardware converts them to 32 bit values before placing them in the register file.

ALU and MAC:
Most ARM instructions have two source operands, held in the source registers Rn and Rm.
These operands are read from the Rn and Rm registers using the internal buses A and B respectively.
The ALU or the MAC takes the operand values from the A and B buses, computes a result, and writes it via the internal C bus to the destination register Rd in the register file.

Address register:
This holds the address generated by load and store instructions and places it on the address bus.

Barrel shifter:
The contents of the Rm register can alternatively be preprocessed in the barrel shifter before being applied as an input to the ALU.

Incrementer:
For load and store instructions, the incrementer updates the contents of the address register before the processor core reads or writes the next register value from or to consecutive memory locations.

ARM visible registers

The register file in the ARM core contains all the registers available to a programmer.
The current mode of the processor decides which registers are available to the programmer.
The ARM processor has a total of 37 registers, all 32 bits wide. They can be classified into two groups,
General purpose registers and
Special purpose registers.

General purpose registers:
Registers r0-r12 are used as general purpose registers. Depending on the context, registers r13-r15 can also be used as general purpose registers.
The general purpose registers hold either data or an address.

Special purpose registers:
Registers r13-r15, the CPSR (current program status register) and the SPSR (saved program status register) are the special purpose registers. In user mode, registers r13, r14 and r15 are labeled r13 sp, r14 lr and r15 pc respectively.
Stack pointer (r13 sp): Register r13 is the stack pointer. It stores the address of the top of the stack in the current processor mode.
Link register (r14 lr): Register r14 is the link register. The processor stores the return address in this register when a subroutine is called.
Program counter (r15 pc): Register r15 is the program counter. It stores the address of the next instruction to be fetched from memory by the processor.

The unbanked registers:
Registers r0-r7 are unbanked registers. This means that each of them refers to the same 32 bit physical register in all processor modes.
They are completely general purpose registers, with no special uses implied by the architecture.

The banked registers:
Registers r8 to r14 are banked registers. The physical register each of them refers to depends on the current processor mode.
Almost all instructions allow the banked registers to be used wherever a general purpose register is allowed.
Out of the 37 registers, 20 are banked registers.

Program status register:

Format of CPSR
The current program status register is accessible in all processor modes.
It contains the condition code flags, the interrupt disable bits, the current processor mode and other status and control information.
User mode and system mode do not have an SPSR, because they are not exception modes.

Control flags:
The control bits change when an exception arises and can be altered by software.

Bits 0-4 (mode select bits):
These bits determine the processor mode.

Processor mode            Mode select bits
Abort                     10111
Fast interrupt request    10001
Interrupt request         10010
Supervisor                10011
System                    11111
Undefined                 11011
User                      10000

Bit 5 (Thumb state bit):
This bit gives the state of the core.
The state of the core determines which instruction set is being executed.
There are three instruction sets,
ARM.
Thumb.
Jazelle.
Some processors have extra bits allocated to decide the state of the processor.
The J bit in the flags field is only available on Jazelle enabled processors.
The Jazelle J and Thumb T bits in the CPSR decide the state of the processor.
When both the J and T bits are 0, the processor is in ARM state and executes ARM instructions.
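The mode select bits and the Thumb state bit can be extracted from a CPSR value with simple masking. The following is a minimal sketch using the bit positions and mode encodings listed above; the names `MODES` and `decode_cpsr` are invented for this illustration.

```python
# Mode select encodings (CPSR bits 4..0) as listed in the table above.
MODES = {
    0b10000: "User",
    0b10001: "Fast interrupt request",
    0b10010: "Interrupt request",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr: int) -> dict:
    """Return the processor mode and Thumb state encoded in a 32 bit CPSR value."""
    mode_bits = cpsr & 0x1F          # bits 0-4: mode select bits
    thumb = bool((cpsr >> 5) & 1)    # bit 5: T bit (1 = Thumb state)
    return {"mode": MODES.get(mode_bits, "reserved"), "thumb": thumb}


if __name__ == "__main__":
    print(decode_cpsr(0x00000013))   # supervisor mode, ARM state
    print(decode_cpsr(0x00000030))   # user mode, Thumb state
```

Encodings not present in the table are reported as "reserved" here, which is a choice made for the sketch rather than something the slides specify.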

Thumb:
The Thumb instruction set is a reworking of the ARM set, with a few things omitted.
Thumb instructions are 16 bits long.
This allows for greater code density in places where memory is restricted.
Most Thumb instructions can only address the first eight registers and, apart from branches, instructions cannot be executed conditionally.
So the Thumb instruction set always comes along with the full ARM instruction set.

Jazelle:
Jazelle executes 8 bit instructions.
It is a hybrid mix of software and hardware.
It is designed to speed up the execution of Java bytecodes.
The Jazelle technology and a specially modified version of the Java virtual machine are needed to execute Java bytecodes.

Bits 6 and 7 (interrupt masks):
There are two interrupts available on the ARM processor core.
Interrupt request (IRQ) and
Fast interrupt request (FIQ)
These are maskable interrupts and their masking is controlled by bits 6 and 7 of the CPSR.
Bit 6 (F) controls FIQ and bit 7 (I) controls IRQ.
When a mask bit is set to binary 1, the corresponding interrupt request is masked, and when the bit is 0, the interrupt is available.

Conditional code flags:
These flag bits are updated by the operations performed by the ALU.
The conditional code flags are usually modified by,
Execution of a comparison instruction.
Execution of some other arithmetic, logical and move instructions.

Bit 28 (overflow flag, V):
It is set in one of two ways,
For an addition or subtraction, V is set to 1 if signed overflow occurs.
For non addition/subtraction, V is normally left unchanged.

Bit 29 (carry flag, C):
It is set in one of four ways,
For an addition, including the comparison instruction CMN, C is set to 1 if the addition produced a carry, and to 0 otherwise.
For a subtraction, including the comparison instruction CMP, C is set to 0 if the subtraction produced a borrow, and to 1 otherwise.
For non-addition/subtraction operations that incorporate a shift operation, C is set to the last bit shifted out of the value by the shifter.
For other non-addition/subtraction operations, C is normally left unchanged.
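The carry and overflow rules for addition and subtraction can be checked with a small model. This sketch works on plain Python integers masked to 32 bits; the helper names `add_flags` and `sub_flags` are made up for the illustration and only cover the add/subtract cases described above.

```python
MASK32 = 0xFFFFFFFF


def add_flags(a: int, b: int):
    """Return (result, C, V) for the 32 bit addition a + b."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    carry = 1 if full > MASK32 else 0
    # Signed overflow: both operands have the same sign, the result does not.
    overflow = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, carry, overflow


def sub_flags(a: int, b: int):
    """Return (result, C, V) for a - b. C is 0 if a borrow occurred, 1 otherwise."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    carry = 0 if a < b else 1
    # Signed overflow: operands have different signs and the result's sign
    # differs from the sign of a.
    overflow = (((a ^ b) & (a ^ result)) >> 31) & 1
    return result, carry, overflow


if __name__ == "__main__":
    print(add_flags(0xFFFFFFFF, 1))   # unsigned wrap-around sets C
    print(add_flags(0x7FFFFFFF, 1))   # signed overflow sets V
    print(sub_flags(0, 1))            # borrow clears C
```

Note the inverted sense of C for subtraction: a comparison of equal values leaves C set to 1 because no borrow occurs.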

Bit 30 (zero flag, Z):
It is set to 1 if the result of the instruction is zero (which often indicates an equal result from a comparison), and to 0 otherwise.

Bit 31 (negative flag, N):
It is set to bit 31 of the result of the instruction.
If this result is regarded as a two's complement signed integer, then N = 1 if the result is negative and N = 0 if it is positive or zero.

The memory system:
The ARM processor views memory as a linear collection of bytes numbered in ascending order from zero to 2^32 - 1.
The ARM7TDMI processor is bi-endian and can treat words in memory as being stored in either
Little endian or
Big endian format.
Little endian is traditionally the default format for ARM processors.

Little endian:
In little endian format, the lowest addressed byte in a word is considered the least significant byte of the word, and the highest addressed byte is the most significant.
So the byte at address 0 of the memory system connects to data lines 7 through 0.
For a word aligned address A, the figure shows how the word at address A, the halfwords at addresses A and A+2, and the bytes at addresses A, A+1, A+2 and A+3 map on to each other when the core is configured as little endian.

Bits 31-24             Bits 23-16             Bits 15-8              Bits 7-0
Word at address A (bits 31-0)
Halfword at address A+2 (bits 31-16)          Halfword at address A (bits 15-0)
Byte at address A+3    Byte at address A+2    Byte at address A+1    Byte at address A

Big endian:
In big endian format, the ARM processor stores the most significant byte of a word at the lowest numbered byte address and the least significant byte at the highest numbered byte address.
So the byte at address 0 of the memory system connects to data lines 31 through 24.
For a word aligned address A, the figure shows how the word at address A, the halfwords at addresses A and A+2, and the bytes at addresses A, A+1, A+2 and A+3 map on to each other when the core is configured as big endian.

Bits 31-24             Bits 23-16             Bits 15-8              Bits 7-0
Word at address A (bits 31-0)
Halfword at address A (bits 31-16)            Halfword at address A+2 (bits 15-0)
Byte at address A      Byte at address A+1    Byte at address A+2    Byte at address A+3
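The two byte orderings are easy to compare on a concrete value. The sketch below uses Python's built-in `int.to_bytes` and `int.from_bytes`; the helper names `store_word` and `load_halfword` and the sample word 0x11223344 are chosen only for this illustration.

```python
def store_word(word: int, big_endian: bool) -> bytes:
    """Bytes of a 32 bit word as they would sit at addresses A, A+1, A+2, A+3."""
    order = "big" if big_endian else "little"
    return (word & 0xFFFFFFFF).to_bytes(4, order)


def load_halfword(memory: bytes, offset: int, big_endian: bool) -> int:
    """Halfword read from address A+offset, where offset is 0 or 2."""
    order = "big" if big_endian else "little"
    return int.from_bytes(memory[offset:offset + 2], order)


if __name__ == "__main__":
    word = 0x11223344
    for big in (False, True):
        mem = store_word(word, big)
        name = "big" if big else "little"
        print(name, [hex(b) for b in mem],
              hex(load_halfword(mem, 0, big)),
              hex(load_halfword(mem, 2, big)))
```

In little endian configuration the byte at address A is 0x44 and the halfword at A is 0x3344; in big endian configuration the byte at address A is 0x11 and the halfword at A is 0x1122.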

ARM instruction execution
ARM instructions are classified as,
Data processing instructions.
Data transfer instructions.
Branch instructions.

Data processing instructions:
These are two operand instructions.
One operand is always a register and the other operand is either a second register or an immediate value.
The second operand is routed through the barrel shifter to the ALU.
The arithmetic or logical operation is performed on the operands in the ALU, and the result from the ALU is written back to the destination register.

When the second operand is an immediate value, it is extracted from the current instruction at the top of the instruction pipeline and routed through the barrel shifter to the ALU.
At the same time, the PC is incremented and copied back into both the address register and r15 in the register bank, and the next instruction is loaded into the instruction pipeline.
All these operations are performed in a single clock cycle.

Data processing instruction datapath activity

Data transfer instructions:
Data transfer instructions are executed in two or more cycles.
In the first cycle a memory address is computed, in a manner similar to the way a data processing instruction computes its result.
A register is used as the base address, to which an offset is added.
The offset is either taken from a second register or is a 12 bit immediate value taken from the current instruction, which is routed through the shifter without any shift.
The computed address from the ALU is sent to the address register.
In the second cycle the actual data transfer takes place.
It is important to note that the PC value is incremented and stored in the register bank at the end of the first cycle, so that the address register is free to accept the data transfer address for the second cycle.
At the end of the second cycle the PC is loaded into the address register to fetch the next instruction.

STR datapath activity

Branch instructions:
Branch instructions are executed in three cycles.
In the first cycle, a 24 bit immediate field is extracted from the instruction and shifted left by two bit positions using the barrel shifter to give a word aligned offset.
This offset is added to the PC and the result is loaded into the address register.
In the second cycle, for a branch with link, the return address (the contents of the PC) is loaded into the link register r14 through the ALU.
The third cycle is used to fill the instruction pipeline.

First two cycles of branch instruction
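The first cycle of a branch, as described above, amounts to a sign extension, a two bit left shift and an addition. The sketch below models that arithmetic; `branch_target` is a hypothetical helper, and `pc` is assumed to be the PC value the instruction sees when it executes (on ARM state cores this is the branch's own address plus 8).

```python
def branch_target(pc: int, imm24: int) -> int:
    """Compute the branch target from the PC and the 24 bit immediate field."""
    imm24 &= 0xFFFFFF
    # Sign extend the 24 bit field: bit 23 is the sign bit.
    if imm24 & 0x800000:
        imm24 -= 1 << 24
    offset = imm24 << 2               # shift left by two: word aligned offset
    return (pc + offset) & 0xFFFFFFFF


if __name__ == "__main__":
    # Forward branch by two words and backward branch by two words.
    print(hex(branch_target(0x8008, 0x000002)))
    print(hex(branch_target(0x8008, 0xFFFFFE)))
```

Because the field is shifted left by two, the 24 bit immediate covers a range of roughly ±32 MB around the PC.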

ARM organization and implementation

3 stage pipeline ARM organization:
The main components of an ARM organization with a 3 stage pipeline are,
Register bank: It stores the processor state. It has two read ports and one write port, each of which can be used to access any register. It also has an additional read port and an additional write port that give special access to r15, the program counter (PC).
Barrel shifter: It is used to shift or rotate one operand by any number of bits.
ALU: It performs the arithmetic and logical functions required by the instruction set.
Address register and incrementer: They select and hold all memory addresses and generate sequential addresses when required.

3-Stage pipelining

5 stage pipeline ARM organization:
The pipeline provided by ARM7 is very cost effective.
For higher performance, we require processor organizations which support a larger number of pipeline stages.
The time required to execute a program is given by,

$$T_{prog} = \frac{N_{inst} \times CPI}{F_{clk}}$$

Tprog: Time required to execute a given program.
Ninst: Number of ARM instructions executed in the program.
CPI: Average number of clock cycles per instruction.
Fclk: Processor's clock frequency.

One way to increase the performance is to
Increase the clock rate, Fclk: To achieve this it is necessary to simplify the logic in each pipeline stage, which increases the number of pipeline stages.

Thus, to give higher performance, the ARM9 core employs a 5 stage pipeline.

5-Stage pipelining

Stage      Activity
FETCH      Instruction fetch
DECODE     Thumb/ARM instruction decoder
EXECUTE    Shift, ALU
MEMORY     Memory access
WRITE      Register write
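The execution time relation used in this section (program time = number of instructions × average cycles per instruction ÷ clock frequency) can be explored numerically. In the sketch below the function name `program_time` and all the numbers are hypothetical; they are not measurements of any real ARM core.

```python
def program_time(n_inst: int, cpi: float, f_clk_hz: float) -> float:
    """Execution time in seconds: Tprog = Ninst * CPI / Fclk."""
    return n_inst * cpi / f_clk_hz


if __name__ == "__main__":
    n_inst = 1_000_000
    # Hypothetical comparison: a shorter pipeline at a lower clock rate
    # against a deeper pipeline at a higher clock rate.
    t_short = program_time(n_inst, cpi=1.9, f_clk_hz=40e6)
    t_deep = program_time(n_inst, cpi=1.5, f_clk_hz=100e6)
    print(f"shorter pipeline: {t_short * 1e3:.1f} ms")
    print(f"deeper pipeline:  {t_deep * 1e3:.1f} ms")
```

For a fixed program, Ninst does not change, so the only levers are a higher clock frequency and a lower CPI, and a deeper pipeline mainly addresses the former.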

ARM9TDMI 5 Stage Pipelining:
It has separate instruction and data memories to support the 5 stage pipeline.
A data dependency is a pipeline hazard which arises when an instruction needs to use the result of one of its predecessors before that result has been returned to the register file.
The ARM9TDMI provides forwarding paths to resolve data dependencies without stalling the 5 stage pipeline. This technique is known as data forwarding.
There are some cases in which forwarding paths cannot avoid a pipeline stall due to data dependencies. For example,
    LDR R0, [R7]
    ADD R4, R0, R2
This instruction sequence suffers a single cycle penalty due to the load-use interlock on register R0.
In such cases, compilers are encouraged not to put a dependent instruction immediately after a load instruction.
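A compiler or a student exercise can spot load-use interlocks with a simple scan over adjacent instructions. The sketch below assumes a made-up tuple representation `(opcode, destination, sources)` and a one cycle penalty per interlock, as in the example above; it is not a model of the real ARM9TDMI interlock logic.

```python
def count_load_use_stalls(program) -> int:
    """Count single cycle penalties caused by an instruction that reads the
    register loaded by the immediately preceding LDR.

    program: list of (opcode, dest_register, source_registers) tuples.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "LDR" and prev_dest in cur_sources:
            stalls += 1
    return stalls


if __name__ == "__main__":
    interlocked = [
        ("LDR", "R0", ("R7",)),
        ("ADD", "R4", ("R0", "R2")),   # uses R0 straight after the load
    ]
    rescheduled = [
        ("LDR", "R0", ("R7",)),
        ("SUB", "R5", ("R6", "R1")),   # independent instruction fills the slot
        ("ADD", "R4", ("R0", "R2")),
    ]
    print(count_load_use_stalls(interlocked), count_load_use_stalls(rescheduled))
```

Moving an independent instruction between the load and its first use removes the penalty without changing the result of the program.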

The 5 pipeline stages are,
Fetch: In this stage the processor fetches the instruction from memory and places it in the instruction pipeline.
Decode: In this stage, the instruction is decoded and the register operands are read from the register file.
Execute: In this stage, an operand is shifted and the ALU result is generated. If the instruction is a load or a store, the memory address is computed in the ALU.
Memory: In this stage, data memory is accessed if required.
Write: In this stage, the results generated by the instruction are written back to the register file, including any data loaded from memory.

Three stage pipelined instruction execution

ARM implementation

ARM clocking scheme:
Most ARMs do not operate with edge sensitive registers.
The ARM clocking scheme is based around 2 phase non overlapping clocks generated internally from a single input clock signal.
This scheme allows the use of level sensitive transparent latches.
Data movement in this scheme is controlled by passing the data alternately through latches which are open during phase 1 and latches which are open during phase 2.

ARM datapath timing:

As shown in the figure, the register read buses are valid early in phase 1.
One operand is passed through the barrel shifter, and the output of the barrel shifter is valid later in phase 1.
The ALU has input latches which are open during phase 1, so the ALU receives its operands as soon as they are valid. The latches close at the end of phase 1 so that the phase 2 precharge does not get through to the ALU.
The ALU then continues to process the operands in phase 2.
At the end of phase 2 the ALU output is valid, and it is latched in the destination register.
The minimum datapath cycle time is given by,
T(min) = Register read time + Shifter delay + ALU delay + Register write set up time + Phase 2 to phase 1 non overlap time.

Adder Design:

Ripple carry adder circuit

The ARM performs 32 bit addition, and the adder has a significant effect on the datapath cycle time and therefore on the processor's performance.
A ripple carry adder has a worst case carry path 32 gates long.
In order to reduce the worst case carry path and to allow a higher clock rate, the ARM2 uses a 4 bit carry look ahead circuit.

4 bit carry look ahead circuit

ALU functions:Along with the addition, ALU does address computations for memory transfer, branch calculations, bit wise logical functions and so on.

ARM2 ALU logic

ARM6 carry select adder scheme:
The carry select adder used in the ARM6 computes the sums of various fields of the word for a carry in of both zero and one. The final result is then selected by using the correct carry in value to control a multiplexer.

In this scheme, the worst case addition time is significantly faster than the 4 bit carry look ahead adder.
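The idea behind carry select addition can be modelled at bit level. The sketch below is a simplified illustration: it assumes uniform 4 bit fields (the actual ARM6 field sizes are not given here), and the names `ripple_add` and `carry_select_add` are invented for the example.

```python
def ripple_add(a: int, b: int, carry_in: int, width: int):
    """Bit by bit addition; returns (sum, carry_out). The carry ripples
    through every bit position, which is what makes this form slow."""
    total = 0
    carry = carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select_add(a: int, b: int, width: int = 32, block: int = 4):
    """Each field is summed for carry in = 0 and carry in = 1; the real
    carry in then selects one of the two precomputed results."""
    mask = (1 << block) - 1
    result = 0
    carry = 0
    for lo in range(0, width, block):
        fa = (a >> lo) & mask
        fb = (b >> lo) & mask
        sum0, c0 = ripple_add(fa, fb, 0, block)   # speculative: carry in 0
        sum1, c1 = ripple_add(fa, fb, 1, block)   # speculative: carry in 1
        s, carry = (sum1, c1) if carry else (sum0, c0)   # the multiplexer
        result |= s << lo
    return result, carry


if __name__ == "__main__":
    print(carry_select_add(0x12345678, 0x0FEDCBA8))
    print(carry_select_add(0xFFFFFFFF, 1))
```

In hardware both speculative sums of every field are computed at the same time, so only the short chain of multiplexer selections lies on the critical path; the sequential loop here just reproduces the result, not the timing.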

ARM6 ALU Organization:
The ARM6 carry select adder does not easily lead to a merging of the arithmetic and logical functions into a single structure, as was used on the ARM2.
Instead, a separate logic unit runs in parallel with the adder, and a multiplexer selects the output from the adder or from the logic unit as required.

Carry arbitration adder:
The ARM9TDMI uses improved adder logic called a carry arbitration adder.
It computes all intermediate carry values using a parallel prefix tree, which is a very fast parallel logic structure.

A    B    C          u    v
0    0    0          0    0
0    1    unknown    1    0
1    0    unknown    1    0
1    1    1          1    1

The table shows the values of u and v for inputs A and B and the carry C at a particular bit position.
When C is unknown, the values of u and v are 1 and 0 respectively.
It is important to note that u gives the carry out if the carry in is one, and v gives the carry out if the carry in is zero.
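The (u, v) encoding follows directly from its definition: u is the carry out when the carry in is one, and v is the carry out when the carry in is zero. The sketch below derives the pair for one bit and shows how two adjacent pairs combine; the sequential `reduce` stands in for the parallel prefix tree, and the names `carry_pair`, `combine` and `final_carry` are made up for this illustration.

```python
from functools import reduce


def carry_pair(a_bit: int, b_bit: int):
    """(u, v) for one bit position."""
    u = a_bit | b_bit   # carry out if the carry in is 1
    v = a_bit & b_bit   # carry out if the carry in is 0
    return u, v


def combine(hi, lo):
    """Merge the pair of a higher bit group with the pair of the group below it.
    The lower group's carry out selects which of the higher group's values applies."""
    u_hi, v_hi = hi
    u_lo, v_lo = lo
    return (u_hi if u_lo else v_hi, u_hi if v_lo else v_hi)


def final_carry(a: int, b: int, width: int = 32, carry_in: int = 0) -> int:
    """Carry out of a width bit addition, computed from the (u, v) pairs."""
    pairs = [carry_pair((a >> i) & 1, (b >> i) & 1) for i in range(width)]
    u, v = reduce(lambda lo, hi: combine(hi, lo), pairs)
    return u if carry_in else v


if __name__ == "__main__":
    for a_bit in (0, 1):
        for b_bit in (0, 1):
            print(a_bit, b_bit, carry_pair(a_bit, b_bit))
    print(final_carry(0xFFFFFFFF, 1))
```

Because `combine` is associative, a hardware implementation can merge pairs in a tree rather than one after another, which is where the speed of the parallel prefix structure comes from.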

The barrel shifter:
In the ARM architecture the shift time is critical, since it contributes directly to the datapath cycle time.
In order to minimize the delay through the shifter, a cross bar switch matrix is used instead of actually shifting the data.
Each input is connected to each output through a switch.

The figure shows a 4x4 matrix. ARM processors use a 32x32 matrix.
Precharging sets all outputs to logic 0, so those which are not connected to any input during switching remain at 0, giving the zero filling required by the shift operation.
For rotate right, the right shift diagonal is enabled together with the complementary left shift diagonal.
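The cross bar idea can be modelled by listing which switches are closed for a given shift. The sketch below is a behavioural illustration only: `crossbar`, `lsl`, `lsr` and `ror` are hypothetical helper names, and each closed switch is written as an (input bit, output bit) pair.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def crossbar(value: int, connections) -> int:
    """Drive the outputs from the inputs through the closed switches.
    connections: iterable of (input_bit, output_bit) pairs."""
    out = 0   # precharge: every output starts at logic 0
    for i, j in connections:
        out |= ((value >> i) & 1) << j
    return out


def lsl(value: int, n: int) -> int:
    """Logical shift left: input bit i drives output bit i + n."""
    return crossbar(value, [(i, i + n) for i in range(WIDTH - n)])


def lsr(value: int, n: int) -> int:
    """Logical shift right: input bit i drives output bit i - n."""
    return crossbar(value, [(i, i - n) for i in range(n, WIDTH)])


def ror(value: int, n: int) -> int:
    """Rotate right: the right shift diagonal plus the complementary
    left shift diagonal are enabled together."""
    n %= WIDTH
    if n == 0:
        return value & MASK
    return lsr(value, n) | lsl(value, WIDTH - n)


if __name__ == "__main__":
    print(hex(lsl(0xFFFFFFFF, 4)))    # low bits stay at the precharged 0
    print(hex(ror(0x12345678, 8)))
```

Each shift amount corresponds to one diagonal of the switch matrix, so the delay is the same whatever the shift distance.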

Multiplier Design:
The older ARM cores support multiplication with a 32 bit result. They use the barrel shifter and the ALU to generate the product, with multiplication implemented using the modified Booth's algorithm.
Recent ARM cores, on the other hand, support multiplication with a 64 bit result. For high performance multiplication they use carry save adders.
In this technique, the carry output from bit i during step j is applied to the carry input of bit i+1 during the next step j+1.
After the addition in the last row, one more step is required in which the carries are allowed to ripple from the least to the most significant bit.

High Speed Multiplier Organization

ARM Register Bank:
The ARM register bank consists of 31 general purpose registers, each one 32 bits wide.
Each bit in a register is implemented using a register cell circuit.

The register cell consists of an asymmetric cross coupled CMOS inverter pair.
When the register contents are changed, the cell is overwritten by a strong signal from the ALU bus.

Read buses A and B are provided to read the state of the cell.
A read operation is performed by activating the control signals read A and read B.
The register cells are arranged in columns to form a 32 bit register.
Such columns are packed together to form the complete register bank.
Decoders are used for the read and write enable lines, and they are packed above the columns.
In the ARM processor the program counter is a part of the register bank, with two write ports and three read ports.
The other registers in the bank have only one write port and two read ports.
The PC is kept at one end of the register array.

ARM core datapath buses

ARM control logic structure
It consists of three structural components,
Instruction decoder PLA.
Distributed secondary control.
Decentralized control units.

Instruction decoder PLA: It uses an internal cycle counter and some of the instruction bits to identify the class of operation to be performed on the datapath in the next cycle.
Distributed secondary control: It uses information from the PLA to select other instruction bits or processor state information to control the datapath.
Decentralized control units: They control the datapath for specific instructions that take a variable number of cycles to complete their execution.
The cycle count block indicates the current cycle number in a multi-cycle instruction.
According to the cycle count, the PLA generates different control outputs.
The cycle count also determines whether this is the last cycle of the current instruction and, if it is, initiates the transfer of the next instruction from the instruction pipeline.

Physical Design:
There are two principal mechanisms used to implement an ARM processor core.
Hard macrocell: It is a physical layout. It can be used only on the particular process for which it has been designed. For every new process, the layout needs to be modified and recharacterized.
Soft macrocell: It is a synthesizable design expressed in a hardware description language such as VHDL. It can readily be ported to a new process technology.
Recent ARM processor cores are available in both hard and soft forms.

ARM7TDMI coreThe ARM7TDMI is the current low end ARM core.It is mainly used in many digital mobile telephones.Features:ARM7TDMI core is a member of the ARM family of general purpose 32 bit microprocessors.ARM family offers high performance for low power consumption and small size.ARM7TDMI core uses pipeline to increase the speed of the flow of instructions to the processor.It uses 3 stage pipeline with stages,FetchDecode andExecute.ARM7TDMI core has a Von Neumann architecture, with a single 32 bit data bus carrying both instruction and data.

Data handled by the ARM7TDMI can be 8 bits (byte), 16 bits (halfword) or 32 bits (word).
The ARM7TDMI instruction set can be extended with specialized additional instructions implemented by coprocessors.
The ARM7TDMI processor contains hardware extensions for advanced debugging features.
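The 3-stage pipeline listed among the features can be illustrated with a short Python sketch. It assumes an idealized pipeline in which every instruction spends exactly one cycle in each stage, with no stalls or branches; the function name pipeline_trace and the sample instruction names are hypothetical.

```python
def pipeline_trace(instructions):
    """Return, for each clock cycle, which instruction occupies each stage.

    Idealized model: one cycle per stage, no stalls, no branches.
    """
    stages = ("fetch", "decode", "execute")
    n = len(instructions)
    trace = []
    for cycle in range(n + len(stages) - 1):
        row = {}
        for depth, stage in enumerate(stages):
            # The instruction in a deeper stage was fetched `depth` cycles earlier.
            idx = cycle - depth
            row[stage] = instructions[idx] if 0 <= idx < n else None
        trace.append(row)
    return trace


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_trace(["ADD", "SUB", "MOV"])):
        print(cycle, row)
```

Once the pipeline is full, one instruction is being executed while the following one is decoded and the one after that is fetched, which is how the pipeline raises instruction throughput.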

ARM7TDMI Organization

CLOCK SIGNALS:

MCLK: Memory clock input. This is the main clock for all memory accesses and processor operations.
WAIT: When LOW, the processor extends an access over a number of cycles of MCLK, which is useful for accessing slow memory.
ECLK: External clock output.

MEMORY INTERFACE:

MREQ: Memory request: This is LOW when the processor requires a memory access during the following cycle.
SEQ: Sequential address: This is HIGH when the address of the next memory cycle is closely related to that of the last memory access.
LOCK: Locked operation: This is HIGH when the processor is performing a locked memory access. It is used to prevent the memory controller from allowing another device to access the memory. It is active only during the data swap instructions.
R/W: Read/Write: This is LOW when the processor is performing a read cycle.
MAS[1:0]: Memory access size: Indicates to the memory system the size of the data transfer required, for both read and write cycles. The signals become valid before the falling edge of MCLK and remain valid until the rising edge of MCLK. The binary values 00, 01 and 10 represent byte, halfword and word respectively.
BL[3:0]: Byte latch control: The values on the data bus are latched on the falling edge of MCLK when these signals are HIGH.

MMU INTERFACE:
TRANS: Memory translate: This is LOW when the processor is in user mode. It can be used to tell the memory management system when address translation should be turned on.
MODE[4:0]: Processor mode: These are the inverse of the internal status bits indicating the current processor mode.
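A memory system model might decode MAS[1:0] and MODE[4:0] roughly as follows. This is a minimal sketch: the MAS encoding is the one given above, while the mode table is the standard ARM mode-bit encoding, which is not listed in these notes and is included here as an assumption. The function names are hypothetical.

```python
# MAS[1:0] encoding as described in the notes: name and size in bytes.
MAS_SIZES = {0b00: ("byte", 1), 0b01: ("halfword", 2), 0b10: ("word", 4)}

# Assumption: standard ARM mode-bit encodings (not given in the notes).
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_mas(mas):
    """Return (name, size_in_bytes) for a MAS[1:0] value."""
    if mas not in MAS_SIZES:
        raise ValueError(f"undefined MAS value: {mas:#04b}")
    return MAS_SIZES[mas]


def decode_mode_pins(mode_pins):
    """MODE[4:0] carries the inverse of the mode bits, so invert before lookup."""
    mode_bits = ~mode_pins & 0b11111
    return MODES.get(mode_bits, "reserved")


if __name__ == "__main__":
    print(decode_mas(0b01))
    print(decode_mode_pins(0b01111))
```

The inversion step is the point of the second function: a pin value of 01111 corresponds to internal mode bits 10000.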

ABORT: Memory abort: The memory system uses this signal to tell the processor that a requested access is not allowed.

STATUS SIGNAL:
TBIT: This is HIGH when the processor is executing the Thumb instruction set and LOW when it is executing the ARM instruction set.

CONFIGURATION:
BIGEND: Big endian configuration: Selects how the processor treats bytes in memory. HIGH selects big endian format; LOW selects little endian format.

INTERRUPTS:
FIQ: Fast interrupt request: Taking this LOW causes the processor to be interrupted if the appropriate enable in the processor is active. The signal is level sensitive and must be held LOW until a suitable response is received from the processor.

IRQ: Interrupt request: As FIQ, but with lower priority. It can be taken LOW to interrupt the processor.
ISYNC: Synchronous interrupts: Set this HIGH if IRQ and FIQ are synchronous to the processor clock. Set it LOW for asynchronous interrupts.

INITIALIZATION:
RESET: Used to start the processor from a known address. A LOW level causes the instruction being executed to terminate abnormally. When RESET goes HIGH for at least one clock cycle, the processor restarts from address 0.

BUS CONTROL:
ENIN: Enable input: This must be LOW for the data bus to be driven during a write cycle.

ENOUT: Enable output: During a write cycle, this signal is driven LOW before the rising edge of MCLK and remains LOW for the entire cycle.
DBE: Data bus enable: Must be HIGH for data to appear on either the bidirectional or the unidirectional data output bus. When LOW, the bidirectional data bus is placed into a high impedance state and data output is prevented on the unidirectional data output bus.
ABE: Address bus enable: The address bus drivers are disabled when this is LOW. ABE must be HIGH if there is no system requirement to disable the address drivers.
ALE: Address latch enable: This signal is provided for backwards compatibility with older ARM processors. It enables the address signals to be held valid for the complete duration of a memory access cycle.

APE: Address pipeline enable: Selects whether the address bus and other signals operate in pipelined mode (APE is HIGH) or depipelined mode (APE is LOW).
BUSEN: Data bus configuration: A static configuration signal that selects whether the bidirectional data bus (D[31:0]) or the unidirectional data buses (DIN[31:0] and DOUT[31:0]) are used to transfer data between the processor and memory. When BUSEN is LOW, D[31:0] is used. When BUSEN is HIGH, DIN[31:0] and DOUT[31:0] are used.

DEBUG INTERFACE:
The ARM7TDMI processor contains hardware extensions for advanced debugging features.
DBGACK: Debug acknowledge: This is HIGH when the processor is in debug state.

DBGEN: Debug enable: A static configuration signal that disables the debug features of the processor when held LOW. This signal must be HIGH to enable the debug function.
DBGRQ: Debug request: This is a level sensitive input that, when HIGH, causes the ARM7TDMI core to enter debug state after executing the current instruction. It lets external hardware force the core into debug state, in addition to the debugging features provided by the EmbeddedICE logic.
EXTERN0: External input 0: This is connected to the EmbeddedICE debug logic and enables breakpoints and watchpoints to be dependent on an external condition.
EXTERN1: External input 1: This is connected to the EmbeddedICE debug logic and enables breakpoints and watchpoints to be dependent on an external condition.

COMMRX: Communication channel receive: This is HIGH when the communication channel receive buffer is full. This signal changes after the rising edge of MCLK.
COMMTX: Communication channel transmit: This is HIGH when the communication channel transmit buffer is empty. This signal changes after the rising edge of MCLK.
EXEC: Executed: This is HIGH when the instruction in the execution unit is not being executed, for example because it has failed its condition code check.
RANGEOUT0: This is HIGH when EmbeddedICE watchpoint unit 0 has matched the conditions currently present on the address, data and control buses.
RANGEOUT1: This is HIGH when EmbeddedICE watchpoint unit 1 has matched the conditions currently present on the address, data and control buses.
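The RANGEOUT behaviour can be pictured with a small Python model of a watchpoint unit. This is a sketch under stated assumptions, not the EmbeddedICE register layout: it assumes a value/mask comparison in which mask bits set to 1 are ignored, it compares only the address and data buses (control-bus matching is left out), and the names Watchpoint and rangeout are hypothetical.

```python
from dataclasses import dataclass

WORD_MASK = 0xFFFFFFFF  # 32-bit buses


@dataclass
class Watchpoint:
    addr_value: int
    addr_mask: int = 0            # assumption: bits set to 1 are ignored
    data_value: int = 0
    data_mask: int = WORD_MASK    # default: ignore the data bus entirely

    def matches(self, addr, data):
        """True when every unmasked address and data bit equals its value."""
        addr_ok = ((addr ^ self.addr_value) & ~self.addr_mask & WORD_MASK) == 0
        data_ok = ((data ^ self.data_value) & ~self.data_mask & WORD_MASK) == 0
        return addr_ok and data_ok


def rangeout(watchpoint, addr, data):
    """Model of a RANGEOUT pin: 1 (HIGH) on a match, 0 (LOW) otherwise."""
    return 1 if watchpoint.matches(addr, data) else 0


if __name__ == "__main__":
    # Watch any access in the 256-byte block starting at 0x8000.
    wp = Watchpoint(addr_value=0x8000, addr_mask=0xFF)
    print(rangeout(wp, 0x8042, 0), rangeout(wp, 0x8100, 0))
```

Masking the low address bits turns a single-address watchpoint into a range watchpoint, which is one reason a value/mask scheme is a natural way to describe such comparators.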
