UC Regents Spring 2005 © UCB

CS 152 L1: The MIPS ISA

2005-1-18

John Lazzaro

(www.cs.berkeley.edu/~lazzaro)

CS 152 Computer Architecture and Engineering

Lecture 1 – The MIPS ISA

www-inst.eecs.berkeley.edu/~cs152/

And also, an intro to the course.

TAs: Ted Hong and David Marquardt


CS 152: Computer Design Team Projects

... supports a 1.875-Mbyte on-chip L2 cache. Power4 and Power4+ systems both have 32-Mbyte L3 caches, whereas Power5 systems have a 36-Mbyte L3 cache. The L3 cache operates as a backdoor with separate buses for reads and writes that operate at half processor speed. In Power4 and Power4+ systems, the L3 was an inline cache for data retrieved from memory. Because of the higher transistor density of the Power5’s 130-nm technology, we could move the memory controller on chip and eliminate a chip previously needed for the memory controller function. These two changes in the Power5 also have the significant side benefits of reducing latency to the L3 cache and main memory, as well as reducing the number of chips necessary to build a system.

Chip overview

Figure 2 shows the Power5 chip, which IBM fabricates using silicon-on-insulator (SOI) devices and copper interconnect. SOI technology reduces device capacitance to increase transistor performance [5]. Copper interconnect decreases wire resistance and reduces delays in wire-dominated chip-timing paths. In 130 nm lithography, the chip uses eight metal levels and measures 389 mm^2.

The Power5 processor supports the 64-bit PowerPC architecture. A single die contains two identical processor cores, each supporting two logical threads. This architecture makes the chip appear as a four-way symmetric multiprocessor to the operating system. The two cores share a 1.875-Mbyte (1,920-Kbyte) L2 cache. We implemented the L2 cache as three identical slices with separate controllers for each. The L2 slices are 10-way set-associative with 512 congruence classes of 128-byte lines. The data’s real address determines which L2 slice the data is cached in. Either processor core can independently access each L2 controller. We also integrated the directory for an off-chip 36-Mbyte L3 cache on the Power5 chip. Having the L3 cache directory on chip allows the processor to check the directory after an L2 miss without experiencing off-chip delays. To reduce memory latencies, we integrated the memory controller on the chip. This eliminates driver and receiver delays to an external controller.

Processor core

We designed the Power5 processor core to support both enhanced SMT and single-threaded (ST) operation modes. Figure 3 shows the Power5’s instruction pipeline, which is identical to the Power4’s. All pipeline latencies in the Power5, including the branch misprediction penalty and load-to-use latency with an L1 data cache hit, are the same as in the Power4. The identical pipeline structure lets optimizations designed for Power4-based systems perform equally well on Power5-based systems. Figure 4 shows the Power5’s instruction flow diagram.

In SMT mode, the Power5 uses two separate instruction fetch address registers to store the program counters for the two threads. Instruction fetches (IF stage) alternate between the two threads. In ST mode, the Power5 uses only one program counter and can fetch instructions for that thread every cycle. It can fetch up to eight instructions from the instruction cache (IC stage) every cycle. The two threads share the instruction cache and the instruction translation facility. In a given cycle, all fetched instructions come from the same thread.

Source: IEEE Micro, Hot Chips 15, p. 42.

Figure 2. Power5 chip (FXU = fixed-point execution unit, ISU = instruction sequencing unit, IDU = instruction decode unit, LSU = load/store unit, IFU = instruction fetch unit, FPU = floating-point unit, and MC = memory controller).
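The L2 figures quoted in the excerpt can be cross-checked with a little arithmetic: three slices, each 10-way set-associative with 512 congruence classes of 128-byte lines, should add up to the stated 1,920 Kbytes (1.875 Mbytes). The short Python sketch below does only that multiplication; the constant and function names are made up for illustration and are not from the article.

```python
# Parameters quoted in the Power5 excerpt (names are illustrative).
SLICES = 3                 # identical L2 slices
WAYS = 10                  # associativity of each slice
CONGRUENCE_CLASSES = 512   # sets per slice
LINE_BYTES = 128           # bytes per cache line


def l2_capacity_bytes(slices=SLICES, ways=WAYS,
                      sets=CONGRUENCE_CLASSES, line=LINE_BYTES):
    """Total capacity = slices x ways x sets x bytes per line."""
    return slices * ways * sets * line


total = l2_capacity_bytes()
print(total, "bytes")
print(total // 1024, "Kbytes")
print(total / (1024 * 1024), "Mbytes")
```

The product is 1,966,080 bytes, which matches the 1,920-Kbyte (1.875-Mbyte) figure in the text.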

IBM Power 5 “die photo”: a die is an unpackaged part

Teams of 4-5 students

Single-cycle CPU project: 3 weeks

Pipelined CPU: 4 weeks

Final Project: 5 weeks

200 hr/student


CS 152: Real hardware, not simulation

Intel XScale 80200: used in earlier HP PocketPCs

Will we fabricate CPU dies?

Back when I was taking classes (1984 @ Caltech), our project course did fab chips.


CS 152: Our implementation technology


Design CPUs by wiring up fixed parts

These models don’t have enough parts ...


Xilinx Virtex E

43,200 “parts” + 655,000 RAM bits

Write Verilog to “wire” parts.

FPGAs: Field Programmable Gate Arrays


Calinx: 150/152 boards in 119/125 Cory

Xilinx Virtex E FPGA

DRAM

Program Xilinx via PC

Download CPU machine code using TFTP


CS152: Approximate grading weights

Lab work: Lab 1: 2%, Lab 2: 8%, Lab 3: 15%, Lab 4: 25%

Midterm exams

HW

Evals

Subject to fine tuning ...


Grading: Peer and staff evaluations

Peer evals: Teammates grade each other after Labs 2, 3, 4. Rewards good “team players”.

Staff evals: TAs are your “managers”. They observe how well you work on the team.


Exams: Two mid-terms, no final ...

MT1: Thursday before Spring Break

MT2: Last Thursday of classes.

3 hours, early evening, no calculators or electronic devices.


Homeworks: To prepare for midterms

Two long homeworks, preceding each midterm.

Work on it alone, no collaboration.

No late homeworks accepted.


152: Semester Calendar

www-inst.eecs.berkeley.edu/~cs152/

See the class webpage for the most up to date version! Changes daily!

Also read our Usenet newsgroup: ucb.class.cs152


Week One: You are here.

Lab due Monday. 119 Cory is open during weekday daytimes. Go to 253 Cory to get Card Key access.

Download Lab 1, done individually. It is a refresher lab for MIPS assembly language programming. Use 119 Cory machines, or use RDF (see web page “Resources”).


Week One: Preparing for team labs

Lab 2, the first team lab, is up. I will be mailing out tentative team assignments later today. On Friday at noon, meet in 119 Cory w/ TAs, to finalize teams.

TAs: Ted Hong and David Marquardt. Both undergrads. See web page for contact info (coming).


Week One: 61(c) boot-camp begins.

61(c)-only students MUST attend first 4 150 Lectures in order to learn Xilinx CAD and Verilog (in person, or via video). 61(c)’ers MUST also do the first 4 150 labs.

Do 150 labs self-study (not handed in or graded). Ask the 152 TAs questions, not the 150 TAs.


Weeks Two and Three: Lab 2 Begins


Week Four/Five: Lab 2 Ends, Begin Lab 3


Course Organization: Final Thoughts

Text: “Computer Organization and Design”, 3rd Edition, David Patterson and John Hennessy. Not 2nd edition.

We expect you to obey the EECS Policy on Academic Dishonesty. See “Course Info” on website for info.

Printing: The first 125 pages are free (count includes cover sheets). Then, $12 per 200 pages. Plan ahead ...


MIPS Instruction Set

Your CPU projects will use a subset of the MIPS ISA.

Lab 1 refreshes your MIPS machine language skills, uses SPIM.


Instruction Sets: A Thin Interface

Software: Application (iTunes), Operating System (Mac OS X), Compiler, Assembler

Instruction Set Architecture

Hardware: Processor, Memory, I/O system

Datapath & Control

Digital Design

Circuit Design

Transistors

Syntax: ADD $8 $9 $10 Semantics: $8 = $9 + $10

In Hexadecimal: 012A4020

Binary:    000000  01001   01010   01000   00000   100000
Bitfield:  opcode  rs      rt      rd      shamt   funct
Fieldsize: 6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

“R-Format”
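The encoding on this slide can be reproduced by packing the six R-format fields into one 32-bit word. The Python sketch below is only an illustration: the helper name encode_r_format is made up, and the field values are the ones for ADD $8 $9 $10 (rs = 9, rt = 10, rd = 8, funct = 100000 binary).

```python
def encode_r_format(opcode, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields (6/5/5/5/5/6 bits) into a 32-bit word."""
    fields = (("opcode", opcode, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


# ADD $8 $9 $10: destination rd = $8, sources rs = $9 and rt = $10.
word = encode_r_format(opcode=0, rs=9, rt=10, rd=8, shamt=0, funct=0b100000)
print(f"{word:08X}")   # 012A4020
print(f"{word:032b}")  # the same word as 32 binary digits
```

Note that the destination register appears first in the assembly syntax but sits in the rd field, after rs and rt, in the machine word.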


Hardware implements semantics ...

Instruction Fetch → Instruction Decode → Operand Fetch → Execute → Result Store → Next Instruction

Instruction Fetch: Fetch next inst from memory: 012A4020

Instruction Decode: Decode fields (opcode rs rt rd shamt funct) to get: ADD $8 $9 $10

Operand Fetch: “Retrieve” register values: $9 $10

Execute: Add $9 to $10

Result Store: Place this sum in $8

Next Instruction: Prepare to fetch instruction that follows the ADD in the program.

Syntax: ADD $8 $9 $10 Semantics: $8 = $9 + $10
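The six steps for the ADD can be sketched as a tiny interpreter. This is a hedged illustration, not the course's reference model: the names decode_r_format and step_add, the dictionary used as instruction memory, and the address 0x400000 are all assumptions. The sketch wraps the sum to 32 bits and does not model the overflow exception that MIPS ADD raises.

```python
def decode_r_format(word):
    """Split a 32-bit word into the six R-format fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


def step_add(regs, memory, pc):
    """Run one ADD through the six steps and return the next PC."""
    word = memory[pc]                        # Instruction Fetch
    f = decode_r_format(word)                # Instruction Decode
    if f["opcode"] != 0 or f["funct"] != 0x20:
        raise ValueError("this sketch only handles ADD")
    a, b = regs[f["rs"]], regs[f["rt"]]      # Operand Fetch
    total = (a + b) & 0xFFFFFFFF             # Execute (overflow trap not modeled)
    regs[f["rd"]] = total                    # Result Store
    return pc + 4                            # Next Instruction


regs = [0] * 32
regs[9], regs[10] = 7, 35
memory = {0x400000: 0x012A4020}              # ADD $8 $9 $10 at an assumed address
next_pc = step_add(regs, memory, 0x400000)
print(regs[8], hex(next_pc))                 # 42 0x400004
```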


Memory Instructions: LW $1,30($2)

Instruction Fetch → Instruction Decode → Operand Fetch → Execute → Result Store → Next Instruction

Instruction Fetch: Fetch the load inst from memory

Instruction Decode: Decode fields (opcode rs rt offset, “I-Format”) to get: LW $1, 30($2)

Operand Fetch: “Retrieve” register value: $2

Execute: Compute memory address: 30 + $2

Result Store: Load memory address contents into: $1

Next Instruction: Prepare to fetch instr that follows the LW in the program. Depending on load semantics, new $1 is visible to that instr, or not until the following instr (“delayed loads”).
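The address computation for LW $1, 30($2) can be sketched in a few lines of Python. The helper names, the dictionary used as data memory, and the sample register and memory values are assumptions chosen for illustration. The sketch makes the loaded value visible immediately, so it does not model delayed loads.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed offset."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def load_word(regs, data_memory, rt, base, offset):
    """LW rt, offset(base): compute the address, then load into rt."""
    address = (regs[base] + sign_extend_16(offset)) & 0xFFFFFFFF  # Execute
    if address % 4 != 0:
        raise ValueError("LW needs a word-aligned address")
    regs[rt] = data_memory[address]                               # Result Store
    return address


regs = [0] * 32
regs[2] = 0x10000002                      # assumed base value in $2
data_memory = {0x10000020: 0xDEADBEEF}    # assumed memory contents
address = load_word(regs, data_memory, rt=1, base=2, offset=30)
print(hex(address), hex(regs[1]))         # 0x10000020 0xdeadbeef
```

The base value is chosen so that 30 + $2 lands on a multiple of 4, since a word load from an unaligned address is not legal on MIPS.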


Branch Instructions: BEQ $1,$2,25

Instruction Fetch → Instruction Decode → Operand Fetch → Execute → Result Store → Next Instruction

Instruction Fetch: Fetch branch inst from memory

Instruction Decode: Decode fields (opcode rs rt offset, “I-Format”) to get: BEQ $1, $2, 25

Operand Fetch: “Retrieve” register values: $1, $2

Execute: Compute if we take branch: $1 == $2 ?

Next Instruction: ALWAYS prepare to fetch instr that follows the BEQ in the program (“delayed branch”). IF we take branch, the instr we fetch AFTER that instruction is PC + 4 + 100.

PC == “Program Counter”
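The "+ 100" in the branch target comes from the offset field: it counts instructions (words), so 25 words is 25 x 4 = 100 bytes, added to the address of the delay-slot instruction (PC + 4). The Python sketch below works this through; the function name and the starting PC of 0x400000 are assumptions for illustration.

```python
def beq_fetch_addresses(pc, rs_value, rt_value, offset_words):
    """Return (delay_slot_pc, following_pc) for a BEQ located at pc."""
    offset_words &= 0xFFFF
    if offset_words & 0x8000:             # sign-extend the 16-bit offset
        offset_words -= 0x10000
    delay_slot_pc = pc + 4                # ALWAYS fetched (delayed branch)
    if rs_value == rt_value:              # branch taken
        following_pc = pc + 4 + offset_words * 4
    else:                                 # branch not taken: fall through
        following_pc = pc + 8
    return delay_slot_pc, following_pc


taken = beq_fetch_addresses(0x400000, 5, 5, 25)
not_taken = beq_fetch_addresses(0x400000, 5, 6, 25)
print([hex(a) for a in taken])       # ['0x400004', '0x400068']
print([hex(a) for a in not_taken])   # ['0x400004', '0x400008']
```

In both cases the instruction at PC + 4 is fetched first; only the instruction after it depends on the comparison.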


Conclusions: The Architect’s Contract

To the program, it appears that instructions execute in the correct order defined by the ISA.

What the machine actually does is up to the hardware designers, as long as the contract is kept.

As each instruction completes, the machine state (regs, mem) appears to the program to obey the ISA.

The primary challenge of 152 CPU projects!


New successful instruction sets are rare

instruction set

software

hardware

Implementors suffer with original sins of ISAs, to support the installed base of software.
