
Sogang University

Advanced Computing System

Chap 1. Computer Architecture

Hyuk-Jun Lee, PhD

Dept. of Computer Science and Engineering, Sogang University

Seoul, Korea

Email: [email protected]

Contents

• Computer Architecture Overview

• Metrics

– Performance

– Power consumption

• What to improve?

– Common case

– Amdahl's law

The Computer Revolution

• Progress in computer technology
  – Underpinned by Moore’s Law

• Makes novel applications feasible
  – Computers in automobiles
  – Cell phones
  – Human genome project
  – World Wide Web
  – Search engines

• Computers are pervasive

Classes of Computers

• Desktop computers
  – General purpose, variety of software
  – Subject to cost/performance tradeoff

• Server computers
  – Network based
  – High capacity, performance, reliability
  – Range from small servers to building-sized

• Embedded computers
  – Hidden as components of systems
  – Stringent power/performance/cost constraints

Understanding Performance

• Algorithm
  – Determines number of operations executed

• Programming language, compiler, architecture
  – Determine number of machine instructions executed per operation

• Processor and memory system
  – Determine how fast instructions are executed

• I/O system (including OS)
  – Determines how fast I/O operations are executed

Below Your Program

• Application software
  – Written in high-level language

• System software
  – Compiler: translates HLL code to machine code
  – Operating System: service code
    • Handling input/output
    • Managing memory and storage
    • Scheduling tasks & sharing resources

• Hardware
  – Processor, memory, I/O controllers

Levels of Program Code

• High-level language
  – Level of abstraction closer to problem domain
  – Provides for productivity and portability

• Assembly language
  – Textual representation of instructions

• Hardware representation
  – Binary digits (bits)
  – Encoded instructions and data

Components of a Computer

• Same components for all kinds of computer
  – Desktop, server, embedded

• Input/output includes
  – User-interface devices
    • Display, keyboard, mouse
  – Storage devices
    • Hard disk, CD/DVD, flash
  – Network adapters
    • For communicating with other computers

The BIG Picture

Typical x86 PC System

Inside the Processor

• AMD Barcelona: 4 processor cores

ARM-based smartphone structure

[Figure: block diagram of an ARM-based smartphone; labeled parts include IP blocks and an embedded multimedia card (eMMC)]

Response Time and Throughput

• Response time
  – How long it takes to do a task

• Throughput
  – Total work done per unit time
    • e.g., tasks/transactions/… per hour

• How are response time and throughput affected by
  – Replacing the processor with a faster version?
  – Adding more processors?

• We’ll focus on response time for now…

CPU Time

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

• Performance improved by
  – Reducing number of clock cycles
  – Increasing clock rate
  – Hardware designer must often trade off clock rate against cycle count
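As a quick check of the CPU time equation, here is a minimal Python sketch of its two equivalent forms. The function names and the 20-billion-cycle, 2 GHz inputs are illustrative assumptions, not values from the slides.

```python
import math


def cpu_time_from_cycle_time(clock_cycles: float, cycle_time_s: float) -> float:
    """CPU Time = CPU Clock Cycles x Clock Cycle Time (seconds)."""
    return clock_cycles * cycle_time_s


def cpu_time_from_clock_rate(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU Time = CPU Clock Cycles / Clock Rate (Hz)."""
    return clock_cycles / clock_rate_hz


# Illustrative inputs (assumed): 20 billion cycles on a 2 GHz clock.
cycles = 20e9
rate_hz = 2e9

t_rate = cpu_time_from_clock_rate(cycles, rate_hz)
t_cycle = cpu_time_from_cycle_time(cycles, 1 / rate_hz)  # cycle time = 1 / clock rate
print(t_rate)                         # 10.0
print(math.isclose(t_rate, t_cycle))  # True
```

Lowering `cycles` or raising `rate_hz` both shorten CPU time, which is why a designer who raises the clock rate must watch that the cycle count does not grow to cancel the gain.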

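CPU Time Example

• Computer A: 2 GHz clock, 10 s CPU time
• Designing Computer B
  – Aim for 6 s CPU time
  – Can do faster clock, but causes 1.2 × clock cycles
• How fast must Computer B clock be?

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$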
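The same three steps can be scripted as a minimal Python sketch; `required_clock_rate` is a hypothetical helper name, and the inputs are the numbers from this example.

```python
def required_clock_rate(time_a_s: float, rate_a_hz: float,
                        cycle_factor: float, target_time_b_s: float) -> float:
    """Clock rate Computer B needs to reach target_time_b_s when it
    executes cycle_factor times as many clock cycles as Computer A."""
    cycles_a = time_a_s * rate_a_hz       # Clock Cycles_A = CPU Time_A x Clock Rate_A
    cycles_b = cycle_factor * cycles_a    # Clock Cycles_B = 1.2 x Clock Cycles_A
    return cycles_b / target_time_b_s     # Clock Rate_B = Clock Cycles_B / CPU Time_B


rate_b = required_clock_rate(time_a_s=10.0, rate_a_hz=2e9,
                             cycle_factor=1.2, target_time_b_s=6.0)
print(f"{rate_b / 1e9:.1f} GHz")  # 4.0 GHz
```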

Instruction Count and CPI

• Instruction Count for a program
  – Determined by program, ISA and compiler

• Average cycles per instruction
  – Determined by CPU hardware
  – If different instructions have different CPI
    • Average CPI affected by instruction mix

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$
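A minimal Python sketch of CPU time in terms of instruction count and CPI; the function name and the program size (1 billion instructions, CPI 2.0, 2 GHz clock) are hypothetical values chosen for illustration.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU Time = Instruction Count x CPI / Clock Rate (seconds)."""
    clock_cycles = instruction_count * cpi  # Clock Cycles = IC x CPI
    return clock_cycles / clock_rate_hz


# Hypothetical program: 1e9 instructions, average CPI 2.0, 2 GHz clock.
print(cpu_time(1e9, 2.0, 2e9))  # 1.0
```

Halving the CPI or doubling the clock rate each halves the result; the instruction count itself is fixed by the program, ISA, and compiler.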

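CPI Example

• Computer A: Cycle Time = 250 ps, CPI = 2.0
• Computer B: Cycle Time = 500 ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

A is faster, by a factor of 1.2.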
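Because both machines run the same ISA, the instruction count I cancels, so the comparison reduces to time per instruction. A minimal Python sketch with a hypothetical helper name:

```python
# Time per instruction = CPI x cycle time; instruction count I cancels in the ratio.
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    return cpi * cycle_time_ps


per_instr_a = time_per_instruction_ps(2.0, 250)  # Computer A
per_instr_b = time_per_instruction_ps(1.2, 500)  # Computer B
ratio = per_instr_b / per_instr_a

print(f"A: {per_instr_a:.0f} ps, B: {per_instr_b:.0f} ps, A faster by {ratio:.1f}x")
# A: 500 ps, B: 600 ps, A faster by 1.2x
```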

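CPI in More Detail

• If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

• Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

where $\frac{\text{Instruction Count}_i}{\text{Instruction Count}}$ is the relative frequency of class i.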
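The weighted-average form translates directly into code. This minimal Python sketch uses a hypothetical mix of three instruction classes (the names and counts are assumptions, not slide data) and checks that dividing total clock cycles by instruction count gives the same CPI as summing CPI_i times relative frequency.

```python
import math


def weighted_average_cpi(cpi_by_class: dict, count_by_class: dict) -> float:
    """CPI = sum over classes of CPI_i x (Instruction Count_i / Instruction Count)."""
    total = sum(count_by_class.values())
    return sum(cpi_by_class[k] * count_by_class[k] / total for k in count_by_class)


# Hypothetical instruction mix (assumed for illustration).
cpi = {"alu": 1, "load_store": 2, "branch": 3}
counts = {"alu": 50, "load_store": 30, "branch": 20}

avg_cpi = weighted_average_cpi(cpi, counts)
cycles = sum(cpi[k] * counts[k] for k in counts)  # Clock Cycles = sum of CPI_i x IC_i

print(cycles)                                                # 170
print(f"{avg_cpi:.2f}")                                      # 1.70
print(math.isclose(avg_cpi, cycles / sum(counts.values())))  # True
```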

CPI Example

• Alternative compiled code sequences using instructions in classes A, B, C

| Class            | A | B | C |
|------------------|---|---|---|
| CPI for class    | 1 | 2 | 3 |
| IC in sequence 1 | 2 | 1 | 2 |
| IC in sequence 2 | 4 | 1 | 1 |

• Sequence 1: IC = 5
  – Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  – Avg. CPI = 10/5 = 2.0

• Sequence 2: IC = 6
  – Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  – Avg. CPI = 9/6 = 1.5
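A short Python sketch that recomputes both sequences from the table; the names are hypothetical. It makes the point of the example explicit: Sequence 2 executes more instructions yet needs fewer clock cycles, so a lower instruction count alone does not guarantee faster code.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class, taken from the table.
SEQUENCES = {
    "Sequence 1": {"A": 2, "B": 1, "C": 2},
    "Sequence 2": {"A": 4, "B": 1, "C": 1},
}


def summarize(ic_by_class: dict) -> tuple:
    """Return (instruction count, clock cycles, average CPI) for one sequence."""
    instructions = sum(ic_by_class.values())
    cycles = sum(CPI_BY_CLASS[c] * n for c, n in ic_by_class.items())
    return instructions, cycles, cycles / instructions


for name, ic in SEQUENCES.items():
    instructions, cycles, avg_cpi = summarize(ic)
    print(f"{name}: IC = {instructions}, cycles = {cycles}, avg. CPI = {avg_cpi:.1f}")
```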

Performance Summary

• Performance depends on
  – Algorithm: affects IC, possibly CPI
  – Programming language: affects IC, CPI
  – Compiler: affects IC, CPI
  – Instruction set architecture: affects IC, CPI, Tc (clock cycle time)

The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

Uniprocessor Performance

Constrained by power, instruction-level parallelism, memory latency

Power consumption

• For CMOS chips, the traditionally dominant energy consumption has been in switching transistors, called dynamic power

$$\text{Power}_{\text{dynamic}} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$$

• For mobile devices, energy is a better metric

$$\text{Energy}_{\text{dynamic}} = \text{Capacitive load} \times \text{Voltage}^2$$

• For a fixed task, slowing the clock rate (frequency switched) reduces power, but not energy
• Capacitive load is a function of the number of transistors connected to an output and of the technology, which determines the capacitance of wires and transistors
• Dropping voltage helps both, so supply voltages went from 5 V to 1 V
• To save energy and dynamic power, most CPUs now turn off the clock of inactive modules (e.g., the floating-point unit)
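To see why slowing the clock reduces power but not energy for a fixed task, the minimal Python sketch below multiplies dynamic power by execution time. The 1 nF switched capacitance, 1 V supply, and 1e9 switching events per task are assumed values chosen only to keep the arithmetic simple.

```python
def dynamic_power(capacitive_load_f: float, voltage_v: float,
                  freq_switched_hz: float) -> float:
    """Power_dynamic = 1/2 x Capacitive load x Voltage^2 x Frequency switched (W)."""
    return 0.5 * capacitive_load_f * voltage_v ** 2 * freq_switched_hz


# Assumed values for illustration only.
C = 1e-9           # switched capacitance, farads
V = 1.0            # supply voltage, volts
SWITCHES = 1e9     # switching events the fixed task requires

for f in (2e9, 1e9):                 # halve the clock rate
    power = dynamic_power(C, V, f)
    time_s = SWITCHES / f            # the fixed task takes longer at a lower clock
    energy = power * time_s          # energy = power x time
    print(f"{f / 1e9:.0f} GHz: power = {power:.2f} W, energy = {energy:.2f} J")
```

Halving f halves the power, but the task runs twice as long, so the energy is unchanged; lowering V instead reduces both, because voltage enters squared.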

Example of quantifying power

• Suppose a 15% reduction in voltage results in a 15% reduction in frequency. What is the impact on dynamic power?

$$\text{Power}_{\text{dynamic,new}} = \frac{1}{2} \times \text{Capacitive load} \times (0.85 \times \text{Voltage})^2 \times (0.85 \times \text{Frequency switched}) = 0.85^3 \times \text{Power}_{\text{dynamic,old}} \approx 0.6 \times \text{Power}_{\text{dynamic,old}}$$
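A minimal Python check of the arithmetic; the helper name is hypothetical. Because voltage enters squared, scaling both voltage and frequency by 0.85 cuts dynamic power to about 61% of its old value.

```python
def dynamic_power_ratio(voltage_scale: float, freq_scale: float) -> float:
    """Power_new / Power_old for dynamic power; the 1/2 and capacitive load cancel."""
    return voltage_scale ** 2 * freq_scale


ratio = dynamic_power_ratio(0.85, 0.85)  # 15% lower voltage and frequency
print(f"{ratio:.3f}")  # 0.614
```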

Power consumption

• Because leakage current flows even when a transistor is off, static power is now important too

$$\text{Power}_{\text{static}} = \text{Current}_{\text{static}} \times \text{Voltage}$$

• Leakage current increases in processors with smaller transistor sizes
• Increasing the number of transistors increases power even if they are turned off
• In 2006, the goal for leakage was 25% of total power consumption; high-performance designs were at 40%
• Very low power systems even gate the voltage to inactive modules to control loss due to leakage

Amdahl’s Law

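$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$$

$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

Best you could ever hope to do:

$$\text{Speedup}_{\text{maximum}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}$$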
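A minimal Python sketch of Amdahl’s Law and its upper bound; the function names and the 90% enhanceable fraction in the usage loop are hypothetical.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Speedup_overall = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_max_speedup(fraction_enhanced: float) -> float:
    """Limit as Speedup_enhanced grows without bound: 1 / (1 - Fraction_enhanced)."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


# Hypothetical case: 90% of execution time can be enhanced.
for s in (2, 10, 100, 1000):
    print(f"Speedup_enhanced = {s:>4}: overall = {amdahl_speedup(0.9, s):.2f}")
print(f"upper bound: {amdahl_max_speedup(0.9):.2f}")
```

Even a 1000x enhancement stays below the 10x ceiling set by the 10% of time that is not enhanced.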

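Amdahl’s Law example

• New CPU 10X faster
• I/O bound server, so 60% of time is spent waiting for I/O

$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} = \frac{1}{(1 - 0.4) + \frac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$$

• Apparently, it’s human nature to be attracted by “10X faster”, vs. keeping in perspective that it’s just 1.6X faster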
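A minimal Python check of this example; the variable names are illustrative. It also applies the “best you could ever hope to do” bound, showing that no CPU upgrade alone can speed this server up by more than about 1.67x.

```python
# 60% of the time is spent waiting for I/O, so only 40% benefits from the new CPU.
fraction_cpu = 0.4
cpu_speedup = 10.0

overall = 1.0 / ((1.0 - fraction_cpu) + fraction_cpu / cpu_speedup)
ceiling = 1.0 / (1.0 - fraction_cpu)  # limit even for an infinitely fast CPU

print(f"overall speedup: {overall:.4f}x")  # 1.5625x
print(f"ceiling:         {ceiling:.4f}x")  # 1.6667x
```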