Upload
jinto-george
View
318
Download
0
Embed Size (px)
Citation preview
Efficient register renaming and
recovery for high-performance
processors
ABSTRACT In this seminar paper, a new method to increase the
performance of the processor is proposed. Most
computer applications require that the computer
system meets minimum requirements in order for
the installation to run. One of those requirements is
processor performance. The processor performance
includes its speed and ability to perform a task. It is
very important to increase the performance of the
processor and thereby increasing the speed. For that
purpose we are using register renaming technique.
Jinto George B-tech ECE
Efficient register renaming and recovery
for high-performance processors 1
Dept. of ECE Mangalam College Of Engineering
EFFICIENT REGISTER RENAMING AND
RECOVERY FOR HIGH-PERFORMANCE
PROCESSORS
Seminar report submitted in partial fulfillment of the requirements for the
Award of the degree
BACHELOR OF TECHNOLOGY
in
ELECTRONICS AND COMMUNICATION ENGINEERING
Of
MAHATMA GANDHI UNIVERSITY
Submitted by
JINTO GEORGE
Under the guidance of
Ms. JYOTHISREE K R
Assistant Professor
Dept. of Electronics and Communication Engineering
SEPTEMBER 2014
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
MANGALAM COLLEGE OF ENGINEERING, ETTUMANOOR
Affiliated to Mahatma Gandhi University, Kerala
Efficient register renaming and recovery
for high-performance processors 2
Dept. of ECE Mangalam College Of Engineering
ABSTRACT
In this seminar paper, a new method to increase the performance of the processor is
proposed. Most computer applications require that the computer system meets minimum
requirements in order for the installation to run. One of those requirements is processor
performance. The processor performance includes its speed and ability to perform a task. It is
very important to increase the performance of the processor and thereby increasing the speed.
For that purpose we are using register renaming technique. Register renaming technique is a
technique which is used to increase the performance of the processor. In computer
architecture, register renaming refers to a technique used to avoid unnecessary serialization of
program operations imposed by the reuse of registers by those operations. Register renaming
technique is a technique which is used to increase the performance of the high performance
processors. By using this method we avoid data hazards such as WAR and WAW.
Efficient register renaming and recovery
for high-performance processors 3
Dept. of ECE Mangalam College Of Engineering
1. INTRODUCTION
Most computer applications require that the computer system meets minimum
requirements in order for the installation to run. One of those requirements is processor
performance. The processor performance includes its speed and ability to perform a task. It is
very important to increase the performance of the processor and thereby increasing the speed.
For that purpose we are using register renaming technique. In computer architecture, register
renaming refers to a technique used to avoid unnecessary serialization of program operations
imposed by the reuse of registers by those operations. Register renaming technique is a
technique which is used to increase the performance of the high performance processors.
Modern superscalar microprocessors implement out-of-order and speculative execution to
increase the performance. During this write after read and write after write data hazards are
occurred. To overcome this problem we are using register renaming technique. Registers
renaming techniques are commonly used in high performance processors. High performance
processors are processors which is used for high performance computing.
Efficient register renaming and recovery
for high-performance processors 4
Dept. of ECE Mangalam College Of Engineering
2. REGISTER RENAMING TECHNIQUE
In computer architecture, register renaming refers to a technique used to avoid
unnecessary serialization of program operations imposed by the reuse of registers by those
operations. I.e. register renaming technique is a technique which is used to increase the
performance of the processors. Modern superscalar microprocessors implement out-of-order
and speculative execution to increase the performance. These mechanisms require register
renaming techniques to solve write after read and write after write data hazards.
Register renaming is a technique used to allow multiple execution paths without
conflicts between different execution units trying to use the same registers. Instead of just one
set of registers being used, multiple sets are put into the processor. This allows different
execution units to work simultaneously without unnecessary pipeline stalls. For many
computations, a CPU uses the data stored in its registers. Processors with a large number of
registers can use them to hold many program variables, which reduces the number of cache
and external memory accesses. Register renaming technique used in all current processors
based on the x86 instruction set, can eliminate this false dependency by dynamically
substituting different physical registers when dealing with closely-grouped independent
operations. Internally, renaming requires more physical registers than the logical registers
that are visible to the outside world
Register renaming distinguishes two kinds of registers: logical and physical registers.
Logical registers are those used by the compiler, whereas physical registers are those actually
implemented in the machine. Typically, the number of physical registers is quite larger than
the number of logical registers. When an instruction that produces a result is decoded, the
renaming logic allocates a free physical register. The logical destination register is said to be
mapped to that physical register. Subsequent data-dependent instructions rename their source
registers to access this physical register. Renaming structures are accessed every cycle after
the instructions are decoded. The register renaming circuitry also deals with the register
mapping. As these structures are highly accessed, renaming structures are critical because of
their high power density. We can perform the register renaming mainly in two ways,
Register renaming using RAM
Register renaming using CAM
Efficient register renaming and recovery
for high-performance processors 5
Dept. of ECE Mangalam College Of Engineering
A logical source register is renamed using its identifier to obtain the current mapping.
This is performed faster and more efficiently in terms of energy with a RAM structure. The
RAM is directly indexed by a source register, whereas this register is compared against all
current mappings in the CAM.
Register renaming is a form of pipelining that deals with data dependences between
instructions by renaming their register operands. An assembly language programmer or a
compiler specifies these operands using architectural registers - the registers that are explicit
in the instruction set architecture. Renaming replaces architectural register names by, in
effect, value names, with a new value name for each instruction destination operand. This
eliminates the name dependences (output dependences and anti-dependences) between
instructions and automatically recognizes true dependences.
The recognition of true data dependences between instructions permits a more flexible
life cycle for instructions. By maintaining a status bit for each value indicating whether or not
it has been computed yet, it allows the execution phase of two instruction operations to be
performed out of order when there are no true data dependences between them. This is called
out-of-order execution.
Efficient register renaming and recovery
for high-performance processors 6
Dept. of ECE Mangalam College Of Engineering
2.1. PRINCIPLE OF REGISTER RENAMING
Figure 2.1: The principle of register renaming
In computer architecture, register renaming refers to a technique used to avoid
unnecessary serialization of program operations imposed by the reuse of registers by those
operations. Basic principle register renaming is to eliminate false data dependencies. False
data dependencies are eliminated by writing generated results temporarily to buffers, called
the rename buffers (RB) instead of the referenced architectural registers (AR). Then
During dispatching a new rename buffer need to be allocated to each instruction
whose destination register causes false data dependency.
Referenced source operands need to be fetched from the RB file, if they are
actually renamed, else from the AR file.
During retirement buffered results need to be transferred from the RB file to the
AR file.
Efficient register renaming and recovery
for high-performance processors 7
Dept. of ECE Mangalam College Of Engineering
2.2 INSTRUCTION EXECUTION IN REGISTER RENAMING
Different instructions may take different amounts of time to execute; for example, a
processor may be able to execute hundreds of instructions while a single load from the main
memory is in progress. Shorter instructions executed while the load is outstanding will finish
first, thus the instructions are finishing out of the original program order. Out-of-order
execution has been used in most recent high-performance CPUs to achieve some of their
speed gains.
Consider this piece of code running on an out-of-order CPU:
# Instruction
1 R1 = M[1024]
2 R1 = R1 + 2
3 M[1032] = R1
4 R1 = M[2048]
5 R1 = R1 + 4
6 M[2056] = R1
Figure 2.2.1: Out-of-order Instruction execution with data hazards
Instructions 4, 5, and 6 are independent of instructions 1, 2, and 3, but the processor
cannot finish 4 until 3 is done, because 3 would then write the wrong value. Here some data
hazards are occurred. Here data hazards are WAR and WAW. We can eliminate this
Efficient register renaming and recovery
for high-performance processors 8
Dept. of ECE Mangalam College Of Engineering
restriction by changing the names of some of the registers by using register renaming
technique. This as follows:
Figure 2.2.2: Instruction execution after renaming
Now instructions 4, 5, and 6 can be executed in parallel with instructions 1, 2, and 3,
so that the program can be executed faster. This is the technique in register renaming. When
possible, the compiler would detect the distinct instructions and try to assign them to a
different register. However, there is a finite number of register names that can be used in the
assembly code. Many high performance CPUs have more physical registers than may be
named directly in the instruction set, so they rename registers in hardware to achieve
additional parallelism.
# Instruction
# Instruction
1 R1 = M[1024] 4 R2 = M[2048]
2 R1 = R1 + 2 5 R2 = R2 + 4
3 M[1032] = R1 6 M[2056] = R2
Efficient register renaming and recovery
for high-performance processors 9
Dept. of ECE Mangalam College Of Engineering
3. REGISTER RENAMING USING RAM
In register renaming using RAM technique, the source registers are renamed to
physical registers. In addition, a free physical register is allocated. To allocate a free physical
register, RAM approaches use a free register queue (FRQ). After I is renamed, subsequent
instructions having a data dependence will be renamed to physical registers. The direct-
mapped memory allows the mappings to be rapidly performed while taking up a small area.
Later, at the commit stage, physical registers are released by placing their identifiers back into
the FRQ.
Because the RAM table is updated at the rename stage, it is modified by either no
speculative or speculative instructions. On misspeculation, the changes must be cancelled,
restoring the RAM to its previous state at the time the offending instruction enters the rename
stage.
The simplest strategy for misspeculation recovery consists in waiting until the
mispredicted branch reaches the reorder buffer (ROB) head. As ROB entries contain the
previous mapping for the destination register, the correct RAM state can be restored by
scanning the ROB once the offending instruction reaches the ROB head.
Efficient register renaming and recovery
for high-performance processors 10
Dept. of ECE Mangalam College Of Engineering
4. REGISTER RENAMING USING CAM
Content-addressable memory (CAM) is a special type of computer memory used in
certain very-high-speed searching applications. It is also known as associative memory,
associative storage, or associative array. It compares input search data (tag) against a table of
stored data, and returns the address of matching data (or in the case of associative memory,
the matching data). Several custom computers, like the Goodyear STARAN, were built to
implement CAM, and were designated associative computers. Here register renaming
technique is performed with the help of CAM.
Figure 4: Register renaming using CAM
CAM structures have as many rows as the number of available physical registers.
Each row maintains information for renaming, recovery, and register allocation.
Efficient register renaming and recovery
for high-performance processors 11
Dept. of ECE Mangalam College Of Engineering
5. HYBRID RAM-CAM REGISTER RENAMING TECHNIQUE
In this technique we are using the combination of RAM and CAM register renaming
technique. So this technique provide the advantages of both the approaches. The hybrid
scheme uses two techniques:
A CAM containing all register mappings up to date.
A RAM acting as a cache of the CAM, containing a subset of its renaming
information.
The CAM table can be indexed both directly by a physical register and associatively
by a logical register, while the RAM is indexed by a logical register. A RAM entry in the
hybrid scheme may or may not contain a valid copy of the current mapping. Register
renaming is performed by just accessing the RAM, while valid entries are hit. If an invalid
entry is accessed, a RAM miss is said to occur and the CAM is used to retrieve the current
mapping. Therefore, the CAM is not looked up in all renaming cycles, but only upon RAM
misses, allowing for a lower number of CAM read/write ports compared with a typical CAM
implementation. On misspeculation, the entire RAM contents are invalidated. After recovery,
the current mappings are only available in the CAM, because all RAM entries are invalid.
Subsequent renaming of source registers will cause RAM misses, and the CAM will be
looked up to obtain the mappings. Both CAM lookups and new register allocations will cause
the RAM to be progressively updated and quickly reducing the RAM miss rate. In hybrid
RAM–CAM method, the register renaming takes place in the following manner,
Instruction reach at the rename stage
Destination registers are mapped to the new physical registers.
The new mapping is recorded both in the RAM and the CAM.
Efficient register renaming and recovery
for high-performance processors 12
Dept. of ECE Mangalam College Of Engineering
5.1 BLOCK DIAGRAM DESCRIPTION
Figure 5.1: Block diagram for hybrid RAM-CAM register renaming.
The above figure represents the block diagram of hybrid RAM-CAM register
renaming technique. I.e. this register renaming mechanism uses the combination of both the
RAM AND CAM approaches. So this method provides the advantages of both the
approaches. So this method provides fast access time. It also provides the energy efficient
access to the register mappings. By using this method we can increase the performance the
processors.
The block diagram is horizontally divided into three sections.
Clearing Previous Destination Mappings
Destination Register Renaming
Source Register Renaming
A. Clearing Previous Destination Mappings
CAM entries corresponding to previous destination map- pings are cleared. In the first
stage, all previous destination mappings are looked up in the RAM. For those valid entries,
physical register identifiers are obtained, which are then used in the second stage to directly
Efficient register renaming and recovery
for high-performance processors 13
Dept. of ECE Mangalam College Of Engineering
index the CAM and clear the current mapping entries. On the contrary, invalid RAM entries
cause previous mappings to be associatively cleared in the CAM.
B. Destination Register Renaming
Free physical registers are mapped to the destination registers. The PE provides
physical register identifiers from a set of free entries in the CAM. These identifiers are used
to directly index the CAM and set the new mappings. New mappings are also updated in the
RAM, which is indexed with the identifiers of the destination logical registers. So in the
destination register renaming, free physical registers are mapped to the destination register.
C. Source Register Renaming
For each source register, the associated mapped physical register identifier is
obtained. In the first stage, the RAM is accessed. On a hit, mappings are available right away.
Otherwise, an associative CAM search is performed in the second stage. Finally, those
mappings retrieved from the CAM are updated in the RAM on the third stage. This last stage
is optional and increases the RAM complexity. Nevertheless, it may provide performance and
energy benefits if these updates avoid enough RAM misses.
Notice that the previous mappings are cleared in the CAM in the second stage,
whereas new mappings are set in the first stage. Thus, a hazard arises when an associative
CAM lookup in the second stage for a given instruction accesses a mapping allocated by the
same instruction in the first stage. This hazard can be avoided by flagging the new mapping
entries at the end of the first stage in an additional single-bit column in the CAM. The flags
are reset at the end of the second stage.
Efficient register renaming and recovery
for high-performance processors 14
Dept. of ECE Mangalam College Of Engineering
5.2 REORDER BUFFERS
A re-order buffer (ROB) is used for out-of-order instruction execution. It allows
instructions to be committed in-order. Additional benefits of the re-order buffer include
allowing for precise exceptions and easy rollback control of target address mispredictions
(branch or jump). The ROB works by storing instructions in their original fetched order. The
ROB can also be accessed from the side since each reservation station has an additional
parameter that points to instruction in the ROB. When jump prediction is not correct or a
non-recoverable exception is encountered in the instruction stream, the ROB is cleared of all
instructions and reservation stations are re-initialized.
Here register renaming is performed based on REORDER BUFFER (ROB).
Renaming based on a reorder buffer uses a physical register file that is the same size as the
architectural register file, together with a set of registers arranged as a queue data structure,
called the reorder buffer. As instructions are issued, they are assigned entries for any results
they may generate at the tail of the reorder buffer. That is, a place is reserved in the queue.
We maintain the logical order of instructions within this buffer – so if we can issue four
instructions i to i+3 at once, we put i in the reorder buffer first, followed by i+1, i+2 and i+3.
As instruction execution proceeds, the assigned entry will ultimately be filled in by a value,
representing the result of the instruction. When entries reach the head of the reorder buffer,
provided they've been filled in with their actual intended result, they are removed, and each
value is written to its intended architectural register. If the value is not yet available, then we
must wait until it is. Because instructions take variable times to execute, and because they
may be executed out of program order, we may well find that the reorder buffer entry at the
head of the queue is still waiting to be filled, while later entries are ready. In this case, all
entries behind the unfilled slot must stay in the reorder buffer until the head instruction
completes. The reorder buffer effectively provides the history mechanism required for
register renaming. The oldest version of a register is that stored in the architectural register;
the next oldest is that nearest the head of the reorder buffer; the youngest is that nearest the
tail of the reorder buffer. When the instruction reaches the head of the buffer, its value is
stored in the logical or physical register. At renaming stage an instruction is assigned an entry
at the tail of the reorder buffer (ROB) which becomes the name of the result register.
Efficient register renaming and recovery
for high-performance processors 15
Dept. of ECE Mangalam College Of Engineering
The reorder buffer, as its name implies, can be used for in order execution. The
reorder buffer contains
Type of instruction (branch ,store, ALU)
Destination (memory address, other ROB’s)
Value and its presence/absence.
Figure 5.2: Reorder buffer
Efficient register renaming and recovery
for high-performance processors 16
Dept. of ECE Mangalam College Of Engineering
5.3 HYBRID RAM-CAM REGISTER RENAMING IN HIGH
PERFORMANCE MULTI CORE PROCESSORS
Figure 5.3: Register renaming in multi core processor
A multi-core processor is an integrated circuit (IC) to which two or more processors
have been attached for enhanced performance, reduced power consumption, and more
efficient simultaneous processing of multiple tasks. Here we are implementing hybrid RAM
and CAM for multi core processors. Each core contains renaming stages therefore; pipeline
execution will be takes place. I.e. during the execution of one another instruction is decoded.
This helps to the simultaneous execution of an instruction. So this scheme increases
processors performance and it consumes time.
Efficient register renaming and recovery
for high-performance processors 17
Dept. of ECE Mangalam College Of Engineering
6. ADVANTAGES
Fast and energy-efficient access to register mappings.
Reduces the power consumption.
It reduces the access time.
Leakage energy is lower for hybrid schemes.
The processors work in a non-speculative mode in the common case, hence RAM
invalidations are unusual.
The silicon area required to implement the hybrid RAM–CAM scheme does not
exceed the area required by conventional renaming mechanisms.
Efficient register renaming and recovery
for high-performance processors 18
Dept. of ECE Mangalam College Of Engineering
7. DISADVANTAGES
Hardware is complex.
It is difficult to analyse as it contains the combination of both RAM-CAM approach.
Efficient register renaming and recovery
for high-performance processors 19
Dept. of ECE Mangalam College Of Engineering
8. APPLICATION
Register renaming technique can be used in high performance processors. So we can
increase its speed of operation. Register renaming can be used multi core processors also.
Register renaming technique can be used for processors i.e. is used for high computing.
Efficient register renaming and recovery
for high-performance processors 20
Dept. of ECE Mangalam College Of Engineering
9. FUTURE SCOPE
Register renaming can be used in high performance processors. It can be also used in
multi core processors. It can be also used for processors i.e. is used for high performance
computing. High performance computing includes weather fore casting, military application,
space researches etc.
Efficient register renaming and recovery
for high-performance processors 21
Dept. of ECE Mangalam College Of Engineering
10. CONCLUSION
In this paper, we presented a renaming mechanism consisting of a RAM table and a
low-complexity CAM table, as a hybrid design that took the best of both approaches.
Experimental results showed that a two-way hybrid approach achieved small performance
slowdowns (about 2% and 1% for integer and floating-point benchmarks, respectively) with
respect to a four-way CAM-based renaming mechanism that was able to recover in one clock
cycle. These small slowdowns were accompanied by a drastic reduction of the original
associative searches carried out in the CAM-based approach to only 8% and 3%. Hybrid
designs also reduced the dynamic energy by 16% and 12% with respect to the original CAM
consumption, closing the dynamic energy consumption gap between CAM and RAM
approaches. Besides general performance improvements, hybrid designs were proved to be
more efficient than the simplest non-checkpointed RAM approaches in terms of both area and
energy. Finally, the experiments showed that the performance benefits span different
processor configurations, whether with short or long pipelines. From this we found that this
technology will be much helpful to improve the performance of the processor.
Efficient register renaming and recovery
for high-performance processors 22
Dept. of ECE Mangalam College Of Engineering
REFERENCE
Efficient Register Renaming and Recovery for High-
Performance Processors bySalvador Petit, Member, IEEE,
Rafael Ubal, Julio Sahuquillo, Member, IEEE, and Pedro López,
Member, IEEE
A. Moshovos, “Power-aware register renaming,” Dept. Comput.
Eng. Group, Univ. Toronto, Toronto, ON, Canada, Tech. Rep.
01-08-02, 2002.
E. Safi, A. Moshovos, and A. Veneris, “A physical level study
and optimization of CAM-based checkpointed register alias
table,” in Proc. 13th Int. Symp. Low Power Electron. Design,
Aug. 2008, pp. 233–236.