Link-Time Path-Sensitive Memory Redundancy Elimination

Manel Fernández and Roger Espasa
{mfernand,roger}@ac.upc.es

Computer Architecture Department

Universitat Politècnica de Catalunya

Barcelona, Spain

Motivation

The memory “gap”
  Processor speed increases faster than memory speed
  L1-cache latency continues to increase
  Memory operations remain a significant bottleneck

Memory redundancy
  Instructions that repeatedly access the same location
  Lots of memory operations are redundant
  Hardware designers exploit memory redundancy
    E.g., caches take advantage of temporal reuse

The compiler must be very aggressive in memory optimizations

Memory redundancy

Memory instructions that repeatedly access the same location
  Lots of memory operations are redundant

Sources of redundancy
  Source code structure
    Programmers introduce redundancy
  Traditional compilation
    Separate compilation units
    Limitations in the compilation model
    Code generation introduces redundancy

What percentage of memory operations are redundant at run time?

    … = *p;            /* redundancy source */
    if ( … ) {
        *q = …         /* intervening store */
        … = *p;        /* redundant load */
    }

Dynamic memory redundancy

[Figure: Load redundancy and store redundancy versus redundancy window size (2 to 1024 entries) for m88ksim, compress, vortex, and the average]
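A rough sketch of how dynamic redundancy could be measured over a memory trace with a bounded window. The definition used here is an assumption, not taken from the slides: an access counts as redundant when the window already holds the same address with the same value. The function name, the trace format, and the LRU replacement policy are likewise hypothetical.

```python
from collections import OrderedDict


def measure_redundancy(trace, window_size):
    """Return (load_redundancy, store_redundancy) as fractions in [0, 1].

    trace: iterable of (op, address, value) with op in {"load", "store"}.
    Assumed definition: an access is redundant if the window already
    records the same address with the same value.
    """
    window = OrderedDict()              # address -> last value, in LRU order
    counts = {"load": 0, "store": 0}
    redundant = {"load": 0, "store": 0}

    for op, address, value in trace:
        counts[op] += 1
        if address in window and window[address] == value:
            redundant[op] += 1
        # Record the access and make it the most recently used entry.
        window[address] = value
        window.move_to_end(address)
        if len(window) > window_size:
            window.popitem(last=False)  # evict the least recently used entry

    return tuple(
        redundant[op] / counts[op] if counts[op] else 0.0
        for op in ("load", "store")
    )


trace = [
    ("load", 0x10, 5), ("store", 0x20, 1), ("load", 0x10, 5),
    ("store", 0x20, 1), ("store", 0x20, 2), ("load", 0x30, 7),
]
print(measure_redundancy(trace, window_size=2))
```

Under this definition a larger window can only expose more redundancy, which matches the shape one would expect from a plot against window size.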

Talk outline

Motivation

Memory redundancy elimination (MRE)

Evaluation

Summary

Memory redundancy elimination (MRE)

Removal of memory instructions that repeatedly access the same location

Targeted at redundancy type
  Load redundancy elimination (LRE) in a path-sensitive fashion
    Based on path-sensitive memory disambiguation
  Store redundancy elimination (SRE)

Targeted at redundancy distance
  Eliminating close/distant redundancy

In the context of a binary optimizer
  Overcome limitations of traditional compilers
  Need to deal with “executable code” problems

Load redundancy elimination (LRE)

Fundamental problems
  Alias analysis for disambiguation
  Liveness analysis for register bypassing
  Cost-benefit analysis for applying LRE
    Profile information is needed

Eliminating close redundancy
  Within extended basic blocks (EBBs)

Eliminating distant redundancy
  Intraprocedural dataflow analysis [HorspoolHo97]
  For fully/partially-redundant loads
    Redundancy on all/some paths
  Partial-LRE requires insertion of speculative loads

Hot path:

    ...
    I1  load (p0), r1
        move r1, r0        (inserted bypass)
    ...
    I2  load (p0), r2      (replaced by: move r0, r2)
    ...

[HorspoolHo97] R. N. Horspool and H. C. Ho. Partial redundancy elimination driven by a cost-benefit analysis. CSSE’97.
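The close-redundancy case can be sketched on a single straight-line block. This is a simplified illustration under stated assumptions, not the optimizer's algorithm: the tuple encoding of instructions is hypothetical, every store conservatively invalidates all known locations (no alias analysis), and the first load's destination register is reused directly while it is still intact, where the slides instead use a separate bypass register found by liveness analysis.

```python
def eliminate_redundant_loads(block):
    """Rewrite redundant loads in a straight-line block as register moves.

    Hypothetical encoding, mirroring the slide syntax:
      ("load", addr_reg, dst)     load (addr_reg), dst
      ("store", src, addr_reg)    store src, (addr_reg)
      ("move", src, dst)          move src, dst
      ("add", src, imm, dst)      add src, imm, dst
    """
    available = {}   # address register -> register holding that location's value
    result = []

    def kill(reg):
        # Redefining `reg` invalidates entries using it as address or as holder.
        stale = [a for a, holder in available.items() if a == reg or holder == reg]
        for addr in stale:
            del available[addr]

    for ins in block:
        op = ins[0]
        if op == "load":
            _, addr, dst = ins
            holder = available.get(addr)
            if holder is None:
                result.append(ins)
                kill(dst)
                if addr != dst:
                    available[addr] = dst
            elif holder != dst:
                result.append(("move", holder, dst))   # bypass instead of reloading
                kill(dst)
            # holder == dst: the register already holds the value; drop the load.
        elif op == "store":
            _, src, addr = ins
            result.append(ins)
            available.clear()          # conservative: the store may alias anything
            available[addr] = src      # the stored value is now known for (addr)
        else:
            result.append(ins)
            kill(ins[-1])              # last field is the destination register
    return result


block = [("load", "p0", "r1"), ("add", "r1", 1, "r3"), ("load", "p0", "r2")]
print(eliminate_redundant_loads(block))
```

With an intervening store or a redefinition of `p0` between the two loads, the second load is left in place, which is where alias analysis would be needed to do better.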

Memory disambiguation

Register use-def chains
  Symbolic descriptors for every use
  Disambiguation by instruction inspection

Fails on path-sensitive redundancies

Need to deal with path-sensitive information
  Partial-LRE is not sufficient either

    Block 1:  ... I0 def p0 ... I1 load (p0), r1 ...
    Block 2:  ... I3 add p0, 8, p0 ...
    Block 3:  IØ Ø-def p0 ... I2 load (p0), r2 ...

Symbolic descriptor of p0 at I2: $p0_{I2} = \text{Ø}(p0_{I0},\ p0_{I3}) = \text{Ø}(p0,\ p0 + 8)$

Path-sensitive memory disambiguation
  Established for only a subset of all the possible paths
  Subsumes generic disambiguation

Path-sensitive LRE
  Partial-LRE is now adapted for dealing with path-sensitive redundancies
  Availability on edge (AVEDGij)

Path-sensitive redundancy

    Block 1:  ... I0 def p0 ... I1 load (p0), r1;  move r1, r0 ...
    Block 2:  ... I3 add p0, 8, p0;  load (p0), r0 ...
    Block 3:  IØ Ø-def p0 ... I2 load (p0), r2 (replaced by: move r0, r2) ...

Symbolic descriptor of p0 at I2: $\text{Ø}(p0,\ p0 + 8)$
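A small sketch of the difference between generic and path-sensitive disambiguation for the example above. The `Sym` and `Phi` classes, the edge names, and both function names are hypothetical; the only thing taken from the slides is the idea that the address at I2 is a Ø-merge of p0 and p0 + 8, so it matches the address at I1 on one incoming edge only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """Symbolic address: a base value plus a constant offset."""
    base: str
    offset: int = 0


@dataclass(frozen=True)
class Phi:
    """Ø-merge of symbolic addresses, one per incoming CFG edge."""
    incoming: tuple   # tuple of (edge_name, Sym) pairs


def same_location_on_all_paths(earlier, later):
    """Generic check: the two addresses must match whatever path is taken."""
    if isinstance(later, Phi):
        return all(sym == earlier for _, sym in later.incoming)
    return later == earlier


def edges_with_same_location(earlier, later):
    """Path-sensitive check: the incoming edges on which the addresses match."""
    if isinstance(later, Phi):
        return {edge for edge, sym in later.incoming if sym == earlier}
    return {"*"} if later == earlier else set()


# Address of I1: p0 as defined at I0.
addr_i1 = Sym("p0@I0")
# Address of I2: Ø(p0, p0 + 8), one entry per edge into the merge block.
addr_i2 = Phi((("from_I1", Sym("p0@I0")), ("from_I3", Sym("p0@I0", 8))))

print(same_location_on_all_paths(addr_i1, addr_i2))
print(edges_with_same_location(addr_i1, addr_i2))
```

The generic check rejects the pair, while the per-edge check reports that the load at I2 is redundant along the edge that bypasses I3, which is the information an edge-based availability analysis can consume.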

Store redundancy elimination (SRE)

Similar approach to LRE
  SRE on EBBs
  Full- and Partial-SRE
    New formulation of the analysis
  No path-sensitive elimination!

    ...
    I1  store r1, (p0)     (redundant: overwritten by I2)
    ...
    I2  store r2, (p0)
    ...

Elimination of dead stores
  Other optimizations produce a lot of dead stores
  Form of dead code elimination
  Based on heuristics
    Includes a basic analysis for useless stack locations

    ...
    I1  load (p0), r0
    ...
    I2  store r0, (p0)     (redundant: writes back the value just loaded)
    ...
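The store-store case can be sketched as a backward scan over one straight-line block. This is an illustrative simplification, not the analysis from the talk: the tuple encoding is hypothetical, address equality is decided by register name only, and any load is assumed to possibly read any location.

```python
def eliminate_overwritten_stores(block):
    """Drop stores that a later store overwrites before any load can observe them.

    Hypothetical encoding, mirroring the slide syntax:
      ("load", addr_reg, dst), ("store", src, addr_reg),
      ("move", src, dst), ("add", src, imm, dst)
    """
    overwritten = set()   # address registers stored to later, with no load in between
    kept = []

    for ins in reversed(block):
        op = ins[0]
        if op == "store":
            _, _src, addr = ins
            if addr in overwritten:
                continue              # a later store rewrites (addr) first
            overwritten.add(addr)
        elif op == "load":
            overwritten.clear()       # conservative: the load may read any location
        else:
            # The destination register is redefined here, so earlier uses of it
            # as an address refer to a different value.
            overwritten.discard(ins[-1])
        kept.append(ins)

    kept.reverse()
    return kept


block = [("store", "r1", "p0"), ("move", "r5", "r6"), ("store", "r2", "p0")]
print(eliminate_overwritten_stores(block))
```

A store through a different register between the two stores does not block the removal, since stores do not read memory; a load or a redefinition of the address register does.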

Talk outline

Motivation

Evaluation

Summary

Methodology

Benchmark suite
  SPECint95
    Compiled on an AlphaServer with full optimizations
    Instrumented using Pixie to get profiling information
    Aggressively re-optimized using Alto

Experimental framework
  Alto executable optimizer

Evaluation
  Dynamic number of loads/stores
  Actual execution time
    AlphaServer GS-140, Alpha EV6-21264

Dynamic number of loads/stores

[Figure: Dynamic number of loads per benchmark, Partial and Complete configurations]

[Figure: Dynamic number of stores per benchmark, Partial and Complete configurations]

Execution time

[Figure: Relative execution time on an AlphaServer GS-140, Alpha EV6-21264 525MHz, for go, m88ksim, gcc, compress, li, ijpeg, perl, vortex, and Gmean; Partial and Complete configurations]

Dynamic replay traps

[Figure: Relative number of replay traps per benchmark on the sim-alpha simulator, modeling an Alpha EV6-21264; Partial and Complete configurations]

Talk outline

Motivation

Evaluation

Summary

A high percentage of memory operations are redundant

Memory redundancy elimination (MRE)
  Removal of redundant memory operations
    Load redundancy elimination (LRE) in a path-sensitive fashion
      Based on path-sensitive memory disambiguation
    Store redundancy elimination (SRE)
      Including elimination of dead stores
  For executable code or link-time
    Overcome limitations of traditional compilers
  Valuable results on real execution time

Future directions
  Explore better alias analysis mechanisms
  Additional techniques for MRE

Backup slides

Dynamic memory redundancy

[Figure: Dynamic load redundancy (%) and dynamic store redundancy (%) for window sizes from 2 to 1024 entries, for m88ksim, compress, vortex, and the average]

I1 loads a value from memory into r1
I2 loads from the same location into r2
Location (p0) is not modified between I1 and I2
r1 can be safely bypassed to r2
I2 can be removed!

    ...
    I1  load (p0), r1
        move r1, r0
    I2  load (p0), r2      (replaced by: move r0, r2)
    ...

LRE on executable code

Is (p1) at I1 the same memory location as (p2) at I2?
  Alias analysis!

Is there any available register between I1 and I2 that can be used to bypass r1 to r2?
  Register liveness analysis!

    ...
    I1  load (p1), r1
        move r1, r0
    I2  load (p2), r2      (replaced by: move r0, r2)
    ...

LRE: Eliminating close redundancy

For extended basic blocks (EBBs)
  Alias analysis: for disambiguation
  Register liveness analysis: for bypassing

Profile-guided LRE
  There is not always a benefit in removing a redundant load
  Need to evaluate cost-benefit of applying LRE!

[Equation: cost C and benefit B of applying LRE, in terms of instruction latency (lat) and basic-block (BB) execution frequency (freq)]

Hot path:

    ...
    I1  load (p0), r1
        move r1, r0
    ...
    I2  load (p0), r2      (replaced by: move r0, r2)
    ...
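One way to read the cost-benefit question for the hot-path example is as a comparison of profile-weighted latencies. The model below is an assumption made for illustration, not the equation used in the talk: the benefit is the load latency saved each time the redundant load's block runs, and the cost is one move in the source block plus one move in the redundant block. The function name and the default latencies are hypothetical.

```python
def lre_is_profitable(freq_src, freq_red, lat_load=3, lat_move=1):
    """Decide whether replacing a redundant load by a register bypass pays off.

    freq_src: execution count of the block holding the redundancy source (I1)
    freq_red: execution count of the block holding the redundant load (I2)
    lat_load, lat_move: assumed latencies in cycles (illustrative defaults)
    """
    benefit = freq_red * lat_load                 # loads no longer executed
    cost = (freq_src + freq_red) * lat_move       # inserted move + replacing move
    return benefit > cost


# Redundant load on the hot path, source rarely executed: worth doing.
print(lre_is_profitable(freq_src=10, freq_red=1000))
# Source on the hot path, redundant load rarely reached: not worth doing.
print(lre_is_profitable(freq_src=1000, freq_red=10))
```

The second call shows why profile information matters: the bypass move is paid every time the source executes, even when the redundant load is almost never reached.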

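LRE: Eliminating distant redundancy

For eliminating fully- and partially-redundant loads
  Requires insertion of speculative loads

Dataflow analysis [HorspoolHo97]
  Extended cost equation
  Complex search for available registers

    Path with the redundancy source:  ... I1 store r1, (p0);  move r1, r0 ...
    Path without it:                  load (p0), r0           (inserted speculative load)
    Merge:                            I2 load (p0), r1        (replaced by: move r0, r1) ...

[Equation: extended cost with an insertion term (C insert: inserted loads, weighted by edge (EDG) frequency) and a bypass term (C bypass: moves, weighted by the basic-block (BB) frequencies of the redundancy source and the redundant load)]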
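The distinction between fully and partially redundant loads can be sketched with a textbook availability dataflow over the control-flow graph. This is a generic illustration under stated assumptions, not the cost-driven formulation of [HorspoolHo97]: the CFG encoding, the block names, and the `gen`/`kill` sets for a single memory location are hypothetical.

```python
def classify_load(preds, gen, kill, entry, block):
    """Classify a load at the start of `block` as "full", "partial" or "none".

    preds: dict mapping each block to the list of its predecessors
    gen:   blocks that make the location's value available at their exit
    kill:  blocks that may modify the location (and do not re-generate it)
    entry: entry block; nothing is available on entry to the procedure
    """
    blocks = list(preds)
    must_in = {b: False for b in blocks}
    may_in = {b: False for b in blocks}
    must_out = {b: True for b in blocks}    # optimistic start for "all paths"
    may_out = {b: False for b in blocks}    # pessimistic start for "some path"

    changed = True
    while changed:
        changed = False
        for b in blocks:
            ps = preds[b]
            new_must_in = b != entry and bool(ps) and all(must_out[p] for p in ps)
            new_may_in = any(may_out[p] for p in ps)
            new_must_out = b in gen or (new_must_in and b not in kill)
            new_may_out = b in gen or (new_may_in and b not in kill)
            new = (new_must_in, new_may_in, new_must_out, new_may_out)
            if new != (must_in[b], may_in[b], must_out[b], may_out[b]):
                must_in[b], may_in[b], must_out[b], may_out[b] = new
                changed = True

    if must_in[block]:
        return "full"       # available on every path: the load can simply go
    if may_in[block]:
        return "partial"    # available on some paths: needs inserted loads
    return "none"


# Diamond CFG: B0 -> B1 -> B2 and B0 -> B3 -> B2; the load of interest is in B2.
preds = {"B0": [], "B1": ["B0"], "B3": ["B0"], "B2": ["B1", "B3"]}
print(classify_load(preds, gen={"B1"}, kill=set(), entry="B0", block="B2"))
print(classify_load(preds, gen={"B0"}, kill=set(), entry="B0", block="B2"))
```

In the "partial" case, a speculative load would be inserted on the paths where the value is not available, so that the load at the merge point becomes fully redundant.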


Path-sensitive LRE

Path-sensitive redundancy
  Redundancy occurs only on some execution paths
  Partial-LRE is not sufficient

Memory disambiguation
  Using register use-def chains
  Symbolic descriptors for every use

Path-sensitive memory disambiguation is needed!

    Block 1:  ... I0 def p0 ... I1 load (p0), r1 ...
    Block 2:  ... I3 add p0, 8, p0 ...
    Block 3:  IØ Ø-def p0 ... I2 load (p0), r2 ...

Symbolic descriptor of p0 at I2: $p0_{I2} = \text{Ø}(p0_{I0},\ p0_{I3}) = \text{Ø}(p0,\ p0 + 8)$

Path-sensitive memory disambiguation

Path-sensitive information
  Disambiguation is established for only a subset of all the possible paths
  For detecting path-sensitive exact memory dependencies

Partial-LRE
  Algorithm is now adapted for dealing with path-sensitive redundancies
  Availability on edge (AVEDGij)

    Block 1:  ... I0 def p0 ... I1 load (p0), r1;  move r1, r0 ...
    Block 2:  ... I3 add p0, 8, p0;  load (p0), r0 ...
    Block 3:  IØ Ø-def p0 ... I2 load (p0), r2 (replaced by: move r0, r2) ...

Symbolic descriptor of p0 at I2: $\text{Ø}(p0,\ p0 + 8)$

A combined algorithm

Short-distance MRE
  Basic: MRE within EBBs

Long-distance MRE
  Full: Full-MRE
  Partial: Partial-MRE
  Complete: Path-sensitive LRE, Partial SRE, Dead store elimination

Optimization flow
  Easy optimizations (including Basic-MRE)
  Function inlining
  Long-distance MRE (Full/Partial/Complete)
  Easy optimizations (including Basic-MRE)

[Figure: Dynamic number of loads per benchmark, Partial and Complete configurations]

[Figure: Dynamic number of stores per benchmark, Partial and Complete configurations]

Alpha 21264 results

[Figure: Execution time and dynamic number of replay traps per benchmark, Partial and Complete configurations]
