Register renaming technique

Efficient register renaming and

recovery for high-performance

processors

ABSTRACT In this seminar paper, a new method to increase the

performance of the processor is proposed. Most

computer applications require that the computer

system meets minimum requirements in order for

the installation to run. One of those requirements is

processor performance. The processor performance

includes its speed and ability to perform a task. It is

very important to increase the performance of the

processor and thereby increasing the speed. For that

purpose we are using register renaming technique.

Jinto George B-tech ECE

Efficient register renaming and recovery

for high-performance processors 1

Dept. of ECE Mangalam College Of Engineering

EFFICIENT REGISTER RENAMING AND

RECOVERY FOR HIGH-PERFORMANCE

PROCESSORS

Seminar report submitted in partial fulfillment of the requirements for the

Award of the degree

BACHELOR OF TECHNOLOGY

in

ELECTRONICS AND COMMUNICATION ENGINEERING

Of

MAHATMA GANDHI UNIVERSITY

Submitted by

JINTO GEORGE

Under the guidance of

Ms. JYOTHISREE K R

Assistant Professor

Dept. of Electronics and Communication Engineering

SEPTEMBER 2014

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

MANGALAM COLLEGE OF ENGINEERING, ETTUMANOOR

Affiliated to Mahatma Gandhi University, Kerala




ABSTRACT

In this seminar paper, a new method to increase the performance of the processor is

proposed. Most computer applications require that the computer system meets minimum

requirements in order for the installation to run. One of those requirements is processor

performance. The processor performance includes its speed and ability to perform a task. It is

very important to increase the performance of the processor and thereby increasing the speed.

For that purpose we are using register renaming technique. Register renaming technique is a

technique which is used to increase the performance of the processor. In computer

architecture, register renaming refers to a technique used to avoid unnecessary serialization of

program operations imposed by the reuse of registers by those operations. Register renaming

technique is a technique which is used to increase the performance of the high performance

processors. By using this method we avoid data hazards such as WAR and WAW.




1. INTRODUCTION

Most computer applications require that the computer system meets minimum

requirements in order for the installation to run. One of those requirements is processor

performance. The processor performance includes its speed and ability to perform a task. It is

very important to increase the performance of the processor and thereby increasing the speed.

For that purpose we are using register renaming technique. In computer architecture, register

renaming refers to a technique used to avoid unnecessary serialization of program operations

imposed by the reuse of registers by those operations. Register renaming technique is a

technique which is used to increase the performance of the high performance processors.

Modern superscalar microprocessors implement out-of-order and speculative execution to

increase the performance. During this write after read and write after write data hazards are

occurred. To overcome this problem we are using register renaming technique. Registers

renaming techniques are commonly used in high performance processors. High performance

processors are processors which is used for high performance computing.




2. REGISTER RENAMING TECHNIQUE

In computer architecture, register renaming refers to a technique used to avoid

unnecessary serialization of program operations imposed by the reuse of registers by those

operations. I.e. register renaming technique is a technique which is used to increase the

performance of the processors. Modern superscalar microprocessors implement out-of-order

and speculative execution to increase the performance. These mechanisms require register

renaming techniques to solve write after read and write after write data hazards.

Register renaming is a technique used to allow multiple execution paths without

conflicts between different execution units trying to use the same registers. Instead of just one

set of registers being used, multiple sets are put into the processor. This allows different

execution units to work simultaneously without unnecessary pipeline stalls. For many

computations, a CPU uses the data stored in its registers. Processors with a large number of

registers can use them to hold many program variables, which reduces the number of cache

and external memory accesses. Register renaming technique used in all current processors

based on the x86 instruction set, can eliminate this false dependency by dynamically

substituting different physical registers when dealing with closely-grouped independent

operations. Internally, renaming requires more physical registers than the logical registers

that are visible to the outside world

Register renaming distinguishes two kinds of registers: logical and physical registers.

Logical registers are those used by the compiler, whereas physical registers are those actually

implemented in the machine. Typically, the number of physical registers is quite larger than

the number of logical registers. When an instruction that produces a result is decoded, the

renaming logic allocates a free physical register. The logical destination register is said to be

mapped to that physical register. Subsequent data-dependent instructions rename their source

registers to access this physical register. Renaming structures are accessed every cycle after

the instructions are decoded. The register renaming circuitry also deals with the register

mapping. As these structures are highly accessed, renaming structures are critical because of

their high power density. We can perform the register renaming mainly in two ways,

Register renaming using RAM

Register renaming using CAM




A logical source register is renamed using its identifier to obtain the current mapping.

This is performed faster and more efficiently in terms of energy with a RAM structure. The

RAM is directly indexed by a source register, whereas this register is compared against all

current mappings in the CAM.

Register renaming is a form of pipelining that deals with data dependences between

instructions by renaming their register operands. An assembly language programmer or a

compiler specifies these operands using architectural registers - the registers that are explicit

in the instruction set architecture. Renaming replaces architectural register names by, in

effect, value names, with a new value name for each instruction destination operand. This

eliminates the name dependences (output dependences and anti-dependences) between

instructions and automatically recognizes true dependences.

The recognition of true data dependences between instructions permits a more flexible

life cycle for instructions. By maintaining a status bit for each value indicating whether or not

it has been computed yet, it allows the execution phase of two instruction operations to be

performed out of order when there are no true data dependences between them. This is called

out-of-order execution.




2.1. PRINCIPLE OF REGISTER RENAMING

Figure 2.1: The principle of register renaming

In computer architecture, register renaming refers to a technique used to avoid

unnecessary serialization of program operations imposed by the reuse of registers by those

operations. Basic principle register renaming is to eliminate false data dependencies. False

data dependencies are eliminated by writing generated results temporarily to buffers, called

the rename buffers (RB) instead of the referenced architectural registers (AR). Then

During dispatching a new rename buffer need to be allocated to each instruction

whose destination register causes false data dependency.

Referenced source operands need to be fetched from the RB file, if they are

actually renamed, else from the AR file.

During retirement buffered results need to be transferred from the RB file to the

AR file.




2.2 INSTRUCTION EXECUTION IN REGISTER RENAMING

Different instructions may take different amounts of time to execute; for example, a

processor may be able to execute hundreds of instructions while a single load from the main

memory is in progress. Shorter instructions executed while the load is outstanding will finish

first, thus the instructions are finishing out of the original program order. Out-of-order

execution has been used in most recent high-performance CPUs to achieve some of their

speed gains.

Consider this piece of code running on an out-of-order CPU:

# Instruction

1 R1 = M[1024]

2 R1 = R1 + 2

3 M[1032] = R1

4 R1 = M[2048]

5 R1 = R1 + 4

6 M[2056] = R1

Figure 2.2.1: Out-of-order Instruction execution with data hazards

Instructions 4, 5, and 6 are independent of instructions 1, 2, and 3, but the processor

cannot finish 4 until 3 is done, because 3 would then write the wrong value. Here some data

hazards are occurred. Here data hazards are WAR and WAW. We can eliminate this

http://en.wikipedia.org/wiki/Out-of-order_execution

http://en.wikipedia.org/wiki/Out-of-order_execution




restriction by changing the names of some of the registers by using register renaming

technique. This as follows:

Figure 2.2.2: Instruction execution after renaming

Now instructions 4, 5, and 6 can be executed in parallel with instructions 1, 2, and 3,

so that the program can be executed faster. This is the technique in register renaming. When

possible, the compiler would detect the distinct instructions and try to assign them to a

different register. However, there is a finite number of register names that can be used in the

assembly code. Many high performance CPUs have more physical registers than may be

named directly in the instruction set, so they rename registers in hardware to achieve

additional parallelism.

# Instruction

# Instruction

1 R1 = M[1024] 4 R2 = M[2048]

2 R1 = R1 + 2 5 R2 = R2 + 4

3 M[1032] = R1 6 M[2056] = R2




3. REGISTER RENAMING USING RAM

In register renaming using RAM technique, the source registers are renamed to

physical registers. In addition, a free physical register is allocated. To allocate a free physical

register, RAM approaches use a free register queue (FRQ). After I is renamed, subsequent

instructions having a data dependence will be renamed to physical registers. The direct-

mapped memory allows the mappings to be rapidly performed while taking up a small area.

Later, at the commit stage, physical registers are released by placing their identifiers back into

the FRQ.

Because the RAM table is updated at the rename stage, it is modified by either no

speculative or speculative instructions. On misspeculation, the changes must be cancelled,

restoring the RAM to its previous state at the time the offending instruction enters the rename

stage.

The simplest strategy for misspeculation recovery consists in waiting until the

mispredicted branch reaches the reorder buffer (ROB) head. As ROB entries contain the

previous mapping for the destination register, the correct RAM state can be restored by

scanning the ROB once the offending instruction reaches the ROB head.




4. REGISTER RENAMING USING CAM

Content-addressable memory (CAM) is a special type of computer memory used in

certain very-high-speed searching applications. It is also known as associative memory,

associative storage, or associative array. It compares input search data (tag) against a table of

stored data, and returns the address of matching data (or in the case of associative memory,

the matching data). Several custom computers, like the Goodyear STARAN, were built to

implement CAM, and were designated associative computers. Here register renaming

technique is performed with the help of CAM.

Figure 4: Register renaming using CAM

CAM structures have as many rows as the number of available physical registers.

Each row maintains information for renaming, recovery, and register allocation.




5. HYBRID RAM-CAM REGISTER RENAMING TECHNIQUE

In this technique we are using the combination of RAM and CAM register renaming

technique. So this technique provide the advantages of both the approaches. The hybrid

scheme uses two techniques:

A CAM containing all register mappings up to date.

A RAM acting as a cache of the CAM, containing a subset of its renaming

information.

The CAM table can be indexed both directly by a physical register and associatively

by a logical register, while the RAM is indexed by a logical register. A RAM entry in the

hybrid scheme may or may not contain a valid copy of the current mapping. Register

renaming is performed by just accessing the RAM, while valid entries are hit. If an invalid

entry is accessed, a RAM miss is said to occur and the CAM is used to retrieve the current

mapping. Therefore, the CAM is not looked up in all renaming cycles, but only upon RAM

misses, allowing for a lower number of CAM read/write ports compared with a typical CAM

implementation. On misspeculation, the entire RAM contents are invalidated. After recovery,

the current mappings are only available in the CAM, because all RAM entries are invalid.

Subsequent renaming of source registers will cause RAM misses, and the CAM will be

looked up to obtain the mappings. Both CAM lookups and new register allocations will cause

the RAM to be progressively updated and quickly reducing the RAM miss rate. In hybrid

RAM–CAM method, the register renaming takes place in the following manner,

Instruction reach at the rename stage

Destination registers are mapped to the new physical registers.

The new mapping is recorded both in the RAM and the CAM.




5.1 BLOCK DIAGRAM DESCRIPTION

Figure 5.1: Block diagram for hybrid RAM-CAM register renaming.

The above figure represents the block diagram of hybrid RAM-CAM register

renaming technique. I.e. this register renaming mechanism uses the combination of both the

RAM AND CAM approaches. So this method provides the advantages of both the

approaches. So this method provides fast access time. It also provides the energy efficient

access to the register mappings. By using this method we can increase the performance the

processors.

The block diagram is horizontally divided into three sections.

Clearing Previous Destination Mappings

Destination Register Renaming

Source Register Renaming

A. Clearing Previous Destination Mappings

CAM entries corresponding to previous destination mappings are cleared. In the first

stage, all previous destination mappings are looked up in the RAM. For those valid entries,

physical register identifiers are obtained, which are then used in the second stage to directly




index the CAM and clear the current mapping entries. On the contrary, invalid RAM entries

cause previous mappings to be associatively cleared in the CAM.

B. Destination Register Renaming

Free physical registers are mapped to the destination registers. The PE provides

physical register identifiers from a set of free entries in the CAM. These identifiers are used

to directly index the CAM and set the new mappings. New mappings are also updated in the

RAM, which is indexed with the identifiers of the destination logical registers. So in the

destination register renaming, free physical registers are mapped to the destination register.

C. Source Register Renaming

For each source register, the associated mapped physical register identifier is

obtained. In the first stage, the RAM is accessed. On a hit, mappings are available right away.

Otherwise, an associative CAM search is performed in the second stage. Finally, those

mappings retrieved from the CAM are updated in the RAM on the third stage. This last stage

is optional and increases the RAM complexity. Nevertheless, it may provide performance and

energy benefits if these updates avoid enough RAM misses.

Notice that the previous mappings are cleared in the CAM in the second stage,

whereas new mappings are set in the first stage. Thus, a hazard arises when an associative

CAM lookup in the second stage for a given instruction accesses a mapping allocated by the

same instruction in the first stage. This hazard can be avoided by flagging the new mapping

entries at the end of the first stage in an additional single-bit column in the CAM. The flags

are reset at the end of the second stage.




5.2 REORDER BUFFERS

A re-order buffer (ROB) is used for out-of-order instruction execution. It allows

instructions to be committed in-order. Additional benefits of the re-order buffer include

allowing for precise exceptions and easy rollback control of target address mispredictions

(branch or jump). The ROB works by storing instructions in their original fetched order. The

ROB can also be accessed from the side since each reservation station has an additional

parameter that points to instruction in the ROB. When jump prediction is not correct or a

non-recoverable exception is encountered in the instruction stream, the ROB is cleared of all

instructions and reservation stations are re-initialized.

Here register renaming is performed based on REORDER BUFFER (ROB).

Renaming based on a reorder buffer uses a physical register file that is the same size as the

architectural register file, together with a set of registers arranged as a queue data structure,

called the reorder buffer. As instructions are issued, they are assigned entries for any results

they may generate at the tail of the reorder buffer. That is, a place is reserved in the queue.

We maintain the logical order of instructions within this buffer – so if we can issue four

instructions i to i+3 at once, we put i in the reorder buffer first, followed by i+1, i+2 and i+3.

As instruction execution proceeds, the assigned entry will ultimately be filled in by a value,

representing the result of the instruction. When entries reach the head of the reorder buffer,

provided they've been filled in with their actual intended result, they are removed, and each

value is written to its intended architectural register. If the value is not yet available, then we

must wait until it is. Because instructions take variable times to execute, and because they

may be executed out of program order, we may well find that the reorder buffer entry at the

head of the queue is still waiting to be filled, while later entries are ready. In this case, all

entries behind the unfilled slot must stay in the reorder buffer until the head instruction

completes. The reorder buffer effectively provides the history mechanism required for

register renaming. The oldest version of a register is that stored in the architectural register;

the next oldest is that nearest the head of the reorder buffer; the youngest is that nearest the

tail of the reorder buffer. When the instruction reaches the head of the buffer, its value is

stored in the logical or physical register. At renaming stage an instruction is assigned an entry

at the tail of the reorder buffer (ROB) which becomes the name of the result register.




The reorder buffer, as its name implies, can be used for in order execution. The

reorder buffer contains

Type of instruction (branch ,store, ALU)

Destination (memory address, other ROB’s)

Value and its presence/absence.

Figure 5.2: Reorder buffer




5.3 HYBRID RAM-CAM REGISTER RENAMING IN HIGH

PERFORMANCE MULTI CORE PROCESSORS

Figure 5.3: Register renaming in multi core processor

A multi-core processor is an integrated circuit (IC) to which two or more processors

have been attached for enhanced performance, reduced power consumption, and more

efficient simultaneous processing of multiple tasks. Here we are implementing hybrid RAM

and CAM for multi core processors. Each core contains renaming stages therefore; pipeline

execution will be takes place. I.e. during the execution of one another instruction is decoded.

This helps to the simultaneous execution of an instruction. So this scheme increases

processors performance and it consumes time.




6. ADVANTAGES

Fast and energy-efficient access to register mappings.

Reduces the power consumption.

It reduces the access time.

Leakage energy is lower for hybrid schemes.

The processors work in a non-speculative mode in the common case, hence RAM

invalidations are unusual.

The silicon area required to implement the hybrid RAM–CAM scheme does not

exceed the area required by conventional renaming mechanisms.




7. DISADVANTAGES

Hardware is complex.

It is difficult to analyse as it contains the combination of both RAM-CAM approach.




8. APPLICATION

Register renaming technique can be used in high performance processors. So we can

increase its speed of operation. Register renaming can be used multi core processors also.

Register renaming technique can be used for processors i.e. is used for high computing.




9. FUTURE SCOPE

Register renaming can be used in high performance processors. It can be also used in

multi core processors. It can be also used for processors i.e. is used for high performance

computing. High performance computing includes weather fore casting, military application,

space researches etc.




10. CONCLUSION

In this paper, we presented a renaming mechanism consisting of a RAM table and a

low-complexity CAM table, as a hybrid design that took the best of both approaches.

Experimental results showed that a two-way hybrid approach achieved small performance

slowdowns (about 2% and 1% for integer and floating-point benchmarks, respectively) with

respect to a four-way CAM-based renaming mechanism that was able to recover in one clock

cycle. These small slowdowns were accompanied by a drastic reduction of the original

associative searches carried out in the CAM-based approach to only 8% and 3%. Hybrid

designs also reduced the dynamic energy by 16% and 12% with respect to the original CAM

consumption, closing the dynamic energy consumption gap between CAM and RAM

approaches. Besides general performance improvements, hybrid designs were proved to be

more efficient than the simplest non-checkpointed RAM approaches in terms of both area and

energy. Finally, the experiments showed that the performance benefits span different

processor configurations, whether with short or long pipelines. From this we found that this

technology will be much helpful to improve the performance of the processor.




REFERENCE

Efficient Register Renaming and Recovery for High-

Performance Processors bySalvador Petit, Member, IEEE,

Rafael Ubal, Julio Sahuquillo, Member, IEEE, and Pedro López,

Member, IEEE

A. Moshovos, “Power-aware register renaming,” Dept. Comput.

Eng. Group, Univ. Toronto, Toronto, ON, Canada, Tech. Rep.

01-08-02, 2002.

E. Safi, A. Moshovos, and A. Veneris, “A physical level study

and optimization of CAM-based checkpointed register alias

table,” in Proc. 13th Int. Symp. Low Power Electron. Design,

Aug. 2008, pp. 233–236.