Finishing out EECS 470: A few snapshots of the real world


Real processors: How they differ from your project

• What we’ve talked about so far isn’t grounded by the real world in any meaningful way.
  – That is, we haven’t really looked at how real processors do things.
• Today we’ll look at two processors.
  – We’ll start with a 2003 core from AMD.
    • Lots of details available, close to your project.
  – Then jump to the latest Intel core.
    • Look at performance issues.

AMD 64-bit core

Most taken from http://www.chip-architect.com/

Bit-interleaved busses running “North-South”

Integer Decode/Dispatch

• 3 types of instructions
  – Direct path
    • RISC-like
  – Vector path
    • Broken into smaller instructions via microcode.
  – Double
    • 128-bit instructions that can be broken into 2 independent 64-bit instructions (called Doubles).
    • Others are done via microcode.
    • Most 128-bit SSE and SSE2 instructions are made into Doubles.
• Each cycle an instruction is issued into one of 3 lanes.
  – Each lane has
    • 8 RSs
    • 1 ALU
    • 1 AGU (Address Generation Unit)
  – Each RS sees broadcasts from all ALUs, AGUs, L/S units, etc.
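The lane structure and broadcast wakeup described above can be sketched in a few lines of Python. This is only an illustrative model, not AMD's design: the class names `RSEntry` and `Scheduler`, the use of integer tags, and the idea that an entry simply records which tags it is still waiting for are all assumptions made for the example.

```python
from dataclasses import dataclass, field

LANES = 3          # from the slide: 3 lanes
RS_PER_LANE = 8    # from the slide: 8 RSs per lane


@dataclass
class RSEntry:
    """Hypothetical reservation-station entry."""
    name: str
    pending: set                                  # tags still awaited
    values: dict = field(default_factory=dict)    # tag -> captured value

    def ready(self):
        return not self.pending


class Scheduler:
    """Hypothetical model: 3 lanes of 8 RSs, all snooping every broadcast."""

    def __init__(self):
        self.lanes = [[] for _ in range(LANES)]

    def dispatch(self, lane, entry):
        # Refuse the instruction if that lane's 8 RSs are all occupied.
        if len(self.lanes[lane]) >= RS_PER_LANE:
            return False
        self.lanes[lane].append(entry)
        return True

    def broadcast(self, tag, value):
        # Every RS in every lane sees the result, whichever unit produced it.
        woken = []
        for lane in self.lanes:
            for entry in lane:
                if tag in entry.pending:
                    entry.pending.discard(tag)
                    entry.values[tag] = value
                    if entry.ready():
                        woken.append(entry.name)
        return woken


sched = Scheduler()
sched.dispatch(0, RSEntry("add", pending={5}))
sched.dispatch(2, RSEntry("load", pending={5, 6}))
print(sched.broadcast(5, 99))   # the add in lane 0 becomes ready
```

The point of the sketch is that wakeup is global: a result produced in one lane can make an entry in any other lane ready, which is why every RS needs a comparator on every broadcast bus.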

Rename

• Break the physical register file into 2 parts (sort of like the P6 scheme with ARF/RoB).
  – 72 in-flight instructions are kept in the RoB.
• The other structure is the IFFRF: Integer Future File and Register File.
  – 16 registers of committed state
  – 16 “future registers”
  – 8 scratch-pad registers

Future file

• In the P6 scheme we had to look in 3 places for the data:
  – The PRF
  – The RoB
  – The CDB (later)
• Here we look in the FF, or in the CDB-like things later.
  – The FF holds the speculative value if it is known.
  – At execution complete, instructions check whether they were the last thing to dispatch that writes to a given architectural register (AR).
    • This is done by tagging the FF with the RoB number.
  – If they were the last to have that AR as a destination, they update the FF.

How does the

• At issue we:
  – Check the FF for source operands
  – Reserve a spot in the RoB
  – Place our tag (RoB number) in the FF
  – Mark the FF entry as invalid
• At EX complete we:
  – Send RoB number and data to the CDB
  – Send data to the RoB
  – Update FF if tag matches
• At retire:
  – Update ARF value (from RoB)
• At mispredict:
  – Copy ARF value into FF.
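The four events above (issue, EX complete, retire, mispredict) can be walked through with a small Python model. It is a sketch under several assumptions, not the real hardware: the class name `FutureFileModel` is made up, tags are a simple increasing counter rather than the 8-bit RoB tag, and a mispredict is assumed to be handled only once everything older has retired, so the whole RoB is squashed and the ARF is copied into the FF.

```python
from collections import OrderedDict

NUM_ARCH_REGS = 16  # from the slide: 16 registers of committed state


class FutureFileModel:
    """Hypothetical model of the FF / RoB / ARF interaction."""

    def __init__(self):
        self.arf = [0] * NUM_ARCH_REGS            # committed state
        self.ff_value = [0] * NUM_ARCH_REGS       # speculative values
        self.ff_valid = [True] * NUM_ARCH_REGS
        self.ff_tag = [None] * NUM_ARCH_REGS      # RoB number of last writer
        self.rob = OrderedDict()                  # tag -> [dest, value or None]
        self.next_tag = 0                         # assumption: plain counter

    def issue(self, dest, srcs):
        # Check the FF for source operands: a value, or a tag to wait on.
        operands = []
        for s in srcs:
            if self.ff_valid[s]:
                operands.append(("value", self.ff_value[s]))
            else:
                operands.append(("wait", self.ff_tag[s]))
        # Reserve a spot in the RoB.
        tag = self.next_tag
        self.next_tag += 1
        self.rob[tag] = [dest, None]
        # Place our tag in the FF and mark the entry invalid.
        self.ff_tag[dest] = tag
        self.ff_valid[dest] = False
        return tag, operands

    def complete(self, tag, value):
        dest = self.rob[tag][0]
        self.rob[tag][1] = value                  # send data to the RoB
        if self.ff_tag[dest] == tag:              # update FF only if tag matches
            self.ff_value[dest] = value
            self.ff_valid[dest] = True
        return tag, value                         # what goes out on the CDB

    def retire(self):
        if not self.rob:
            return False
        tag, (dest, value) = next(iter(self.rob.items()))
        if value is None:                         # head has not executed yet
            return False
        del self.rob[tag]
        self.arf[dest] = value                    # update ARF from the RoB
        return True

    def mispredict(self):
        # Assumption: everything still in flight is younger than the branch.
        self.rob.clear()
        self.ff_value = list(self.arf)
        self.ff_valid = [True] * NUM_ARCH_REGS
        self.ff_tag = [None] * NUM_ARCH_REGS


m = FutureFileModel()
t0, _ = m.issue(dest=1, srcs=[2])       # first writer of r1
t1, ops = m.issue(dest=1, srcs=[1])     # second writer of r1, reads r1
print(ops)                              # waits on the first writer's tag
m.complete(t0, 10)                      # tag mismatch: FF is not updated
m.complete(t1, 20)                      # tag match: FF now holds 20
```

Note how the first `complete` writes the RoB but leaves the FF alone, because a younger instruction has since claimed r1. That tag check is what keeps the FF holding only the newest speculative value.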

What did the FF buy us?

• P6-like advantages
  – No free list for the PRF
  – Can just clear the RAT on mispredict.
• But no need to access the RoB looking for data
  – RoB data is only written once (EX complete) and only read once (commit).
• Some pain
  – Early branch resolution looks hard.
• It uses an 8-bit descriptor for 72 entries.

Re-Order-Buffer Tag definition

[Figure: 8-bit tag, bit 7 down to bit 0. Labels: wrap bit; Instruction In Flight Number; re-order buffer index 0..23; sub-index 0..2.]

1) A sub-index 0, 1 or 2, which identifies from which of the three lanes the instruction was dispatched.
2) A value 0..23 that identifies the “cycle” in which the instruction was dispatched. The “cycle counter” wraps to 0 after reaching 23.
3) A wrap bit. When two instructions have different wrap bits, the cycle counter has wrapped between the dispatches.
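The three fields above are enough to pack a tag into 8 bits and to decide which of two in-flight instructions is older. The Python sketch below shows one way this could work. The exact bit positions (wrap bit in bit 7, the 0..23 index in bits 6..2, the sub-index in bits 1..0) are an assumption read off the order of the figure labels, and so is the idea that a lower sub-index means an older instruction within the same dispatch cycle. All function names are hypothetical.

```python
ROB_ROWS = 24   # "cycle counter" values 0..23
LANES = 3       # sub-index values 0..2; 24 * 3 = 72 RoB entries


def encode_tag(wrap, index, sub):
    """Pack the fields into 8 bits (assumed layout: wrap | index | sub)."""
    assert wrap in (0, 1) and 0 <= index < ROB_ROWS and 0 <= sub < LANES
    return (wrap << 7) | (index << 2) | sub


def decode_tag(tag):
    """Return (wrap, index, sub) from an 8-bit tag."""
    return (tag >> 7) & 0x1, (tag >> 2) & 0x1F, tag & 0x3


def next_row(wrap, index):
    """Advance the cycle counter; flip the wrap bit when it passes 23."""
    if index + 1 == ROB_ROWS:
        return wrap ^ 1, 0
    return wrap, index + 1


def is_older(a, b):
    """True if tag a was dispatched before tag b (both still in flight)."""
    wrap_a, idx_a, sub_a = decode_tag(a)
    wrap_b, idx_b, sub_b = decode_tag(b)
    if wrap_a == wrap_b:
        # Same pass through the RoB: smaller position is older.
        return (idx_a, sub_a) < (idx_b, sub_b)
    # The counter wrapped between the two: the larger index is the older one.
    return (idx_a, sub_a) > (idx_b, sub_b)


old = encode_tag(0, 23, 2)        # last slot before the counter wraps
new = encode_tag(1, 0, 0)         # first slot after the wrap
print(is_older(old, new))         # True
```

The comparison only works because at most 24 rows are in flight at once, so two live tags with different wrap bits can never share the same index.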
