PERFORMANCE
M. C. Felipe Santiago Espinosa
March 2018
Master's in Electronics
Computer Architecture
Unit 2
Defining Performance
• Which airplane has the best performance?
Unit 2 - Performance (Computer Architecture)
Response Time and Throughput
• Response time
  – How long it takes to do a task
• Throughput
  – Total work done per unit time
    • e.g., tasks/transactions/… per hour
• How are response time and throughput affected by
  – Replacing the processor with a faster version?
  – Adding more processors?
• We’ll focus on response time for now…
Relative Performance
• Define Performance = 1 / Execution Time
• “X is n times faster than Y” means:

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

• Example: time taken to run a program
  – 10s on A, 15s on B
  – Execution Time_B / Execution Time_A = 15s / 10s = 1.5
  – So A is 1.5 times faster than B
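The relative-performance ratio can be sketched in a few lines of Python. The function name `speedup` and its parameter names are illustrative assumptions, not part of the slides; the 10s and 15s values are the ones from the example.

```python
def speedup(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is defined as 1 / execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    return time_y / time_x


# Slide example: the program takes 10s on A and 15s on B.
print(speedup(time_x=10.0, time_y=15.0))
```

Passing the times in the opposite order gives the reciprocal, which is how much slower B is relative to A.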
Measuring Execution Time
• Elapsed time
  – Total response time, including all aspects
    • Processing, I/O, OS overhead, idle time
  – Determines system performance
• CPU time
  – Time spent processing a given job
    • Discounts I/O time, other jobs’ shares
  – Comprises user CPU time and system CPU time
  – Different programs are affected differently by CPU and system performance
CPU Clocking
• Operation of digital hardware governed by a constant-rate clock

[Figure: clock timing diagram with labels: Clock (cycles), Data transfer and computation, Update state, Clock period]

• Clock period: duration of a clock cycle
  – e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s
• Clock frequency (rate): cycles per second
  – e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
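Clock period and clock rate are reciprocals of each other, and the two example values on the slide (250ps and 4.0GHz) happen to be a matching pair. The following minimal sketch checks that; the helper names are hypothetical.

```python
def clock_period_s(clock_rate_hz: float) -> float:
    """Clock period in seconds for a given clock rate in Hz."""
    return 1.0 / clock_rate_hz


def clock_rate_hz(clock_period_seconds: float) -> float:
    """Clock rate in Hz for a given clock period in seconds."""
    return 1.0 / clock_period_seconds


period = clock_period_s(4.0e9)           # 4.0 GHz clock
print(f"{period * 1e12:.0f} ps")          # period expressed in picoseconds
print(f"{clock_rate_hz(250e-12) / 1e9:.1f} GHz")
```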
CPU Time

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

• Performance improved by
  – Reducing number of clock cycles
  – Increasing clock rate (frequency)
  – Hardware designer must often trade off clock rate against cycle count
CPU Time Example
• Computer A: 2GHz clock and, for some program, 10s CPU time
• Designing Computer B
  – Aim for 6s CPU time (the same program)
  – Can do faster clock, but causes 1.2 × clock cycles
• How fast must Computer B's clock be (that is, what clock frequency does Computer B need)?
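One way to work the example is to apply CPU Time = Clock Cycles / Clock Rate twice: first to recover the cycle count on A, then to solve for B's clock rate. The sketch below does this; the function and variable names are assumptions made for illustration.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock Cycles = CPU Time x Clock Rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate_hz(cycles: float, target_cpu_time_s: float) -> float:
    """Clock Rate = Clock Cycles / CPU Time."""
    return cycles / target_cpu_time_s


cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2.0e9)
cycles_b = 1.2 * cycles_a                      # B needs 1.2x the cycles of A
rate_b = required_clock_rate_hz(cycles_b, target_cpu_time_s=6.0)

print(f"Cycles on A: {cycles_a:.3g}")
print(f"Cycles on B: {cycles_b:.3g}")
print(f"Required clock rate for B: {rate_b / 1e9:.2f} GHz")
```

Under these numbers B needs about twice A's clock rate: the 10s to 6s target alone calls for a 1.67× faster clock, and the extra 20% of cycles raises that to 2×.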
Instruction Count and CPI

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

• Instruction Count for a program
  – Determined by program, ISA and compiler
• Average cycles per instruction
  – Determined by CPU hardware
  – If different instructions have different CPI
    • Average CPI affected by instruction mix
CPI Example
• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?
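Because both computers run the same ISA, the instruction count is the same on each and cancels out of the comparison, so it is enough to compare CPI × Cycle Time. A minimal sketch, with hypothetical names:

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction = CPI x Clock Cycle Time."""
    return cpi * cycle_time_ps


time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)

faster, ratio = ("A", time_b / time_a) if time_a < time_b else ("B", time_a / time_b)
print(f"A: {time_a:.0f} ps/instruction, B: {time_b:.0f} ps/instruction")
print(f"{faster} is faster by a factor of {ratio:.2f}")
```

Multiplying either per-instruction time by the shared instruction count gives the full CPU time; the ratio is unchanged.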
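CPI in More Detail
• If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)$$

• Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$

where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$.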
CPI Example
• Alternative compiled code sequences using instructions in classes A, B, C

Class               A    B    C
CPI for class       1    2    3
IC in sequence 1    2    1    2
IC in sequence 2    4    1    1

• Sequence 1: IC = 5
  – Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  – Avg. CPI = 10/5 = 2.0
• Sequence 2?
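The weighted-average CPI calculation can be sketched as follows, using the class CPIs and instruction counts from the table. The dictionary layout and function names are illustrative assumptions. Running it reproduces the Sequence 1 figures and applies the same steps to Sequence 2.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

SEQUENCE_1 = {"A": 2, "B": 1, "C": 2}   # instruction counts per class
SEQUENCE_2 = {"A": 4, "B": 1, "C": 1}


def clock_cycles(counts: dict) -> int:
    """Clock Cycles = sum over classes of CPI_i x Instruction Count_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Weighted average CPI = Clock Cycles / total Instruction Count."""
    return clock_cycles(counts) / sum(counts.values())


for name, seq in (("Sequence 1", SEQUENCE_1), ("Sequence 2", SEQUENCE_2)):
    print(f"{name}: IC = {sum(seq.values())}, "
          f"cycles = {clock_cycles(seq)}, avg CPI = {average_cpi(seq):.2f}")
```

Sequence 2 executes more instructions yet needs fewer cycles, which is why instruction count alone is not a reliable performance indicator.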
Performance Summary
The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

• Performance depends on
  – Algorithm: affects IC, possibly CPI
  – Programming language: affects IC, CPI
  – Compiler: affects IC, CPI
  – Instruction set architecture: affects IC, CPI, Tc
Power Trends
• In CMOS IC technology

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

Slide annotations: ×30 (power), 5V → 1V (voltage), ×1000 (frequency)
Reducing Power
• Suppose a new CPU has
  – 85% of capacitive load of old CPU
  – 15% voltage and 15% frequency reduction

$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$
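The power ratio follows directly from Power = Capacitive load × Voltage² × Frequency, because the old values cancel and only the scaling factors remain. A minimal sketch, with hypothetical function and parameter names:

```python
def relative_power(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """P_new / P_old when load, voltage and frequency are each scaled.

    Voltage enters squared, so its scale factor is applied twice.
    """
    return cap_scale * volt_scale ** 2 * freq_scale


# 85% capacitive load, 15% lower voltage, 15% lower frequency.
ratio = relative_power(cap_scale=0.85, volt_scale=0.85, freq_scale=0.85)
print(f"P_new / P_old = {ratio:.2f}")
```

Setting `volt_scale=1.0` shows how much of the saving comes from the voltage term: without the voltage reduction the ratio would only drop to about 0.72.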
• The power wall
  – We can’t reduce voltage further
  – We can’t remove more heat
• How else can we improve performance?
Uniprocessor Performance
Constrained by power, instruction-level parallelism, memory latency
Multiprocessors
• Multicore microprocessors
  – More than one processor per chip
• Requires explicitly parallel programming
  – Compare with instruction level parallelism
    • Hardware executes multiple instructions at once
    • Hidden from the programmer
  – Hard to do
    • Programming for performance
    • Load balancing
    • Optimizing communication and synchronization
SPEC CPU Benchmark
• Programs used to measure performance
  – Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
  – Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
  – Elapsed time to execute a selection of programs
    • Negligible I/O, so focuses on CPU performance
  – Normalize relative to reference machine
  – Summarize as geometric mean of performance ratios
    • CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
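The geometric mean can be sketched as below. The ratios in the usage line are made-up values chosen only to make the result easy to check; they are not SPEC results. The function computes the mean through logarithms, which is equivalent to the n-th root of the product and avoids overflow for long lists.

```python
import math


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n positive execution-time ratios."""
    if not ratios:
        raise ValueError("at least one ratio is required")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    # exp(mean(log r_i)) equals (r_1 * ... * r_n) ** (1 / n)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical ratios (reference time / measured time) for two programs.
print(geometric_mean([2.0, 8.0]))
```

A useful property of the geometric mean is that the ratio of two machines' summaries does not depend on which reference machine was used for normalization.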
CINT2006 for Intel Core i7 920
SPEC Power Benchmark
• Power consumption of server at different workload levels
  – Performance: ssj_ops/sec
  – Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}$$
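The overall metric sums throughput and power over the eleven measurement points (i = 0 to 10) before dividing, rather than averaging per-level ratios. The sketch below uses invented throughput and power figures purely for illustration; they are not measurements of any real server, and the function name is an assumption.

```python
def overall_ssj_ops_per_watt(ssj_ops: list, power_watts: list) -> float:
    """Sum of ssj_ops over all load levels divided by sum of power."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Hypothetical data for 11 load levels (index 0 = idle, index 10 = full load).
hypothetical_ops = [1000 * i for i in range(11)]
hypothetical_watts = [50 + 10 * i for i in range(11)]

print(overall_ssj_ops_per_watt(hypothetical_ops, hypothetical_watts))
```

Because the idle level contributes power but no operations, high idle power pulls the overall figure down.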
SPECpower_ssj2008 for Xeon X5650
Pitfall: Amdahl’s Law
• Improving an aspect of a computer and expecting a proportional improvement in overall performance

$$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$$

• Example: multiply accounts for 80s/100s
  – How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20$$

  – Can’t be done!
• Corollary: make the common case fast
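A short numerical sketch of Amdahl's Law for the multiply example (80s affected, 20s unaffected) shows why a 5× overall speedup is out of reach: the unaffected 20s sets a ceiling that is approached but never met. The function names are illustrative assumptions.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / factor + t_unaffected


def overall_speedup(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Original time divided by improved time."""
    original = t_affected + t_unaffected
    return original / improved_time(t_affected, t_unaffected, factor)


def speedup_limit(t_affected: float, t_unaffected: float) -> float:
    """Upper bound on speedup as the improvement factor grows without limit."""
    return (t_affected + t_unaffected) / t_unaffected


for factor in (2, 10, 100, 1_000_000):
    print(factor, round(overall_speedup(80.0, 20.0, factor), 4))
print("limit:", speedup_limit(80.0, 20.0))
```

Reaching exactly 5× would require the multiply time to drop to zero, which is the meaning of the slide's equation 20 = 80/n + 20 having no finite solution.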
Fallacy: Low Power at Idle
• Look back at i7 power benchmark
  – At 100% load: 258W
  – At 50% load: 170W (66%)
  – At 10% load: 121W (47%)
• Google data center
  – Mostly operates at 10% – 50% load
  – At 100% load less than 1% of the time
• Consider designing processors to make power proportional to load
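The percentages on the slide are the measured power at each load divided by the power at 100% load. The sketch below recomputes them and, for contrast, shows what an idealized machine with perfectly load-proportional power would draw. That ideal is an assumption added for illustration, not a figure from the slides.

```python
# Measured power from the slide (watts), keyed by load fraction.
MEASURED_WATTS = {1.0: 258.0, 0.5: 170.0, 0.1: 121.0}
PEAK_WATTS = MEASURED_WATTS[1.0]


def fraction_of_peak(watts: float) -> float:
    """Measured power as a fraction of power at 100% load."""
    return watts / PEAK_WATTS


def proportional_watts(load: float) -> float:
    """Power of a hypothetical, perfectly load-proportional machine."""
    return load * PEAK_WATTS


for load, watts in MEASURED_WATTS.items():
    print(f"load {load:.0%}: measured {watts:.0f} W "
          f"({fraction_of_peak(watts):.0%} of peak), "
          f"ideal proportional {proportional_watts(load):.1f} W")
```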
Pitfall: MIPS as a Performance Metric
• MIPS: Millions of Instructions Per Second
  – Doesn’t account for
    • Differences in ISAs between computers
    • Differences in complexity between instructions

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

• CPI varies between programs on a given CPU
Error in Applying MIPS
• For a program X, a compiler generated the following instruction distribution:

Instruction type    Frequency    CPI
ALU operations      43 %         1
Loads               21 %         2
Stores              12 %         2
Branches            24 %         2

• An optimizing compiler eliminates 50% of the ALU instructions (without reducing loads, stores, or branches).
• With a 2 ns clock cycle (500 MHz clock rate), what is the MIPS rating of the optimized code and of the unoptimized code? Do the MIPS ratings agree with the execution times?
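One way to work this exercise is to treat the instruction frequencies as relative instruction counts, halve the ALU count for the optimized code, and compare MIPS = Clock rate / (CPI × 10^6) against execution time. The sketch below follows that approach; the data layout and function names are assumptions, and the time is expressed per instruction of the original program because the absolute instruction count is not given.

```python
CLOCK_RATE_HZ = 500e6  # 2 ns clock cycle

# class -> (relative instruction count, CPI), from the exercise table
UNOPTIMIZED = {"alu": (0.43, 1), "load": (0.21, 2),
               "store": (0.12, 2), "branch": (0.24, 2)}
# The optimizer removes half of the ALU instructions; other classes are unchanged.
OPTIMIZED = dict(UNOPTIMIZED, alu=(0.43 / 2, 1))


def total_count(mix: dict) -> float:
    return sum(count for count, _ in mix.values())


def total_cycles(mix: dict) -> float:
    return sum(count * cpi for count, cpi in mix.values())


def average_cpi(mix: dict) -> float:
    return total_cycles(mix) / total_count(mix)


def mips(mix: dict) -> float:
    return CLOCK_RATE_HZ / (average_cpi(mix) * 1e6)


def relative_time_ns(mix: dict) -> float:
    """Execution time in ns per instruction of the original program."""
    return total_cycles(mix) / CLOCK_RATE_HZ * 1e9


for name, mix in (("unoptimized", UNOPTIMIZED), ("optimized", OPTIMIZED)):
    print(f"{name}: CPI = {average_cpi(mix):.3f}, MIPS = {mips(mix):.1f}, "
          f"time = {relative_time_ns(mix):.2f} ns per original instruction")
```

The optimized code runs in less time but reports a lower MIPS rating, because removing the cheap 1-cycle ALU instructions raises the average CPI. This is the disagreement the exercise is designed to expose.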
MFLOPS or megaFLOPS

$$\text{MFLOPS} = \frac{\text{FP operations}}{\text{Execution time} \times 10^6}$$

• A popular alternative for comparing different systems.
• Based on operations rather than instructions.
• The metric does not apply outside floating-point operations.
  – For a program such as a compiler: MFLOPS = 0.
• A program made up entirely of floating-point additions achieves a much higher MFLOPS rating than a program made up entirely of divisions.
Concluding Remarks
• Execution time: the best performance measure
• Power is a limiting factor
  – Use parallelism to improve performance
Homework: performance problems posted on the course web page.
Due: