
March 2005 - R. Smith - University of St Thomas - Minnesota

ENGR 330: Today’s Class

• Notes
  – Networking/Telecom Course (QMCS 370)
  – CIGs

• Pentium Instruction Set
  – Format overview
  – Evolution
  – Details, Address Modes

• Pentium Architecture/Pipelining
  – Pentium, the first
  – Pentium Pro
  – Pentium 3
  – Pentium 4

• Memory Management (if time)


Pentium Instruction Format

• Supports 8, 16, 32-bit operands
• Officially 17 addressing modes, arguably more
• Keyed off the opcode and prefixes
• “Assembly language” descends from the old 8080 CPU (source-level lineage, not binary compatibility)


Chronology

• 8080 (1974)
  – 8-bit registers, 16-bit RAM addresses (MITS Altair)

• 8086 (1978), 8088 (1979) “IA-16”
  – 16-bit registers and RAM addresses (IBM-PC)
  – 8088 hardware was “backwards compatible” with 8085
  – “Assembler compatible” with 8080 - translate the source and reassemble
  – Segmentation allowed 1MB of RAM addressing

• 80386 (1985) “IA-32”
  – 32-bit registers w/smaller ‘subsets’ for compatibility
  – 32-bit addresses made segments largely irrelevant

• Pentium - the first (1993)

• P6 Family introduced in 1995
  – Pentium Pro, Pentium II, Pentium III, etc.

• Pentium 4 introduced in 2000


Instruction Set Extensions

• Each new processor brought new instructions
  – Specialized sets, too

• 80x87 Math Co-Processor
  – Introduced floating point instructions and stack
  – Integrated into later processors

• MMX (1997)
  – SIMD instructions, 8 integer registers @ 64 bits (reused FP)

• 3DNow! (AMD in 1998)
  – MMX extended to support floating point operations

• SSE (1999; SSE2 in 2000 for integers)
  – 8 giant 128-bit registers for SIMD operation


Pentium General Registers

• Cut into halves/quarters for compatibility


Pentium Registers

• Address Space
  – Segments with 32 bit addresses
  – Usually only 1 segment is used by a program

• Standard general purpose registers
  – EAX, EBX, ECX, and EDX – 32 bits each, with lower half accessible separately and as separate high/low bytes
  – Each has special jobs in certain arithmetic instructions

• Address Registers
  – ESI, EDI – point to strings in memory
  – EBP – points to base of the current stack frame (local memory)
  – ESP – the stack pointer
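
The way a 32-bit register is cut into a lower half and separate high/low bytes can be sketched in a few lines of Python. This is only an illustration of the bit layout: the function name `sub_registers` is made up for this example, and the same split applies to EBX, ECX, and EDX.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into its legacy 16-bit and 8-bit views."""
    eax &= 0xFFFFFFFF            # keep only 32 bits
    ax = eax & 0xFFFF            # lower half: the 16-bit AX register
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # high byte of AX
        "AL": ax & 0xFF,         # low byte of AX
    }


views = sub_registers(0x12345678)
print({name: hex(value) for name, value in views.items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
```

Note that the upper 16 bits of EAX have no name of their own; only the lower half is reachable through the older register names.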


Intel Assembly Language

• Opcode destination, source
  – Format similar to LC-3 and MIPS
  – BUT, instructions may operate directly on memory operands
  – Operands may be reg/mem, reg/reg, mem/reg; mem/mem only in special cases (e.g. string moves)
  – But it all depends on the opcode - many weird restrictions

• Segment Registers – mostly obsolete
  – Provided the “upper” part of the address in the 16-bit, 1MB days
  – RAM addresses traditionally included a segment register

• MOV AL,DS:[7777h]
  – Move contents of 7777 hex (mapped by DS) to AL
  – DS segment is for “data” - the default segment
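
As a rough sketch of how a segment register supplied the “upper” part of an address in the 16-bit days: in 8086 real mode the physical address is the segment value shifted left 4 bits plus the 16-bit offset, giving a 20-bit (1MB) address. The function name and the DS value below are assumptions chosen only for illustration.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """8086 real mode: physical = segment * 16 + offset, wrapped to 20 bits."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


ds = 0x1234  # hypothetical contents of DS
print(hex(real_mode_address(ds, 0x7777)))  # 0x19ab7
```

With DS holding 1234h, `MOV AL,DS:[7777h]` would therefore read the byte at physical address 19AB7h.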


Addressing Modes

• “Displacement only” - direct address
  – Traditionally uses a segment register
  – DS is the default

• Register Indirect
  – MOV AL,[BX] - moves to AL from RAM addressed by BX
  – MOV AL,ES:[DI] - addressed by DI with ES segment
  – The ‘BP’ register uses SS segment by default

• Various Indexed modes
  – Combine 1 or 2 index registers
  – May include a constant offset (displacement)


Memory Addressing Modes Summary

• Pick zero or one from each column
• Prefix an “E” for the 32-bit Pentium registers
• BX = EAX, EBX, ECX, EDX, ESP, or EBP
• Also add a “scale factor” of 1/2/4/8 for 8/16/32/64-bit elements
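
The “pick zero or one from each column” rule amounts to one formula: effective address = base + index × scale + displacement, where any term may be absent. A minimal sketch follows, assuming 32-bit wraparound and the 1/2/4/8 scale factors of IA-32; the function name and register values are made up for the example.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Compute base + index*scale + displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# MOV EAX,[EBX+ESI*4+16] with hypothetical EBX = 1000h and ESI = 3
print(hex(effective_address(base=0x1000, index=3, scale=4, displacement=16)))  # 0x101c

# Register indirect is the same formula with only a base: MOV AL,[EBX]
print(hex(effective_address(base=0x1000)))  # 0x1000
```

Scaling by 4 is what lets an index register count array elements rather than bytes when the elements are 32 bits wide.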


Pentium Instructions

• Traditional Instructions
  – Add, sub, add w/carry, sub w/carry, mul, div
  – BCD arithmetic, booleans, shifts/rotates, string ops
  – Loops, conditionals, condition code setting, subroutines

• MMX / SSE / XMM Extensions
  – Intended to better support image manipulation for multimedia
  – MMX: Eight 64-bit registers, plus special instructions
  – SSE/XMM: Eight 128-bit registers, each holding e.g. two 64-bit or four 32-bit values
  – Parallel adds, shifts, multiplies of multiple values packed into MMX registers

• Example applications
  – Subtracting one image from another for overlaying
  – Unpacking a compressed image (JPEG, MPEG, etc)
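
To make “multiple values packed into MMX registers” concrete, the sketch below simulates subtracting eight 8-bit pixels at once inside a single 64-bit value, clamping each result at zero. This mirrors the kind of work an MMX unsigned saturating byte subtract (PSUBUSB) does in one instruction; the Python loop is only a model of the lanes, not how the hardware is built, and the function name is made up here.

```python
def packed_sub_saturate_u8(a: int, b: int) -> int:
    """Subtract eight unsigned bytes packed in 64-bit values, clamping at 0."""
    result = 0
    for lane in range(8):               # eight independent 8-bit lanes
        shift = lane * 8
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        diff = max(x - y, 0)            # saturate instead of wrapping around
        result |= diff << shift
    return result


image_a = 0x1020304050607080  # eight hypothetical pixel values
image_b = 0x013010505000807F
print(hex(packed_sub_saturate_u8(image_a, image_b)))
```

Saturation matters for image work: a pixel that would go negative becomes black rather than wrapping around to a bright value.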


Architecture of The First Pentium

• Before SIMD and super-graphics, but still a major machine

• Superscalar – faster than “linear” instruction execution

• Separate caches for instructions and data


Pentium Details

• Fixed Point: 5 stages
  – U-pipe and V-pipe are interchangeable except if the instruction needs the barrel shifter.
  – U and V can run 2 instructions in parallel – cover vast majority of instructions used

• Floating Point: 8 stages
  – Uses part of the fixed point pipeline


Pentium Pro Processor Overview

• First P6 Pentium
  – 1995
  – Pentium II in 1997
  – Pentium III in 1999

• Reservation Station
  – Decoupled instruction fetching from execution
  – Leap in performance

P6 Architecture – Caches & Execution

[Diagram]

Pentium Pro Details

[Diagram]

Pentium III System Structure

[Diagram]

Pentium III Processor

[Diagram]

P III Instruction Execution Units

[Diagram]

Pentium 4 System Structure

[Diagram]

Pentium 4 microinstructions

• Embeds a RISC architecture and pipelining within a CISC instruction set
  – Instructions fetched to CPU
  – Translated into internal RISC-style “microinstructions”
  – Microinstructions are stored in the level 0 instruction cache (the “trace cache”)
  – CPU execution logic executes microinstructions in a pipelined fashion

• Retains compatibility with old Pentium and x86 code while achieving RISC-like performance


Pentium 4 Processor Architecture

[Diagram]


Memory Hierarchies

• Temporal Locality
  – If I touched location X just now, I’ll likely touch it again soon.

• Spatial Locality
  – If I touch location X, I’ll probably also touch X+1, X-1, etc.

• Lesson: keep stuff you’re using nearby in the fastest RAM you can build

• Lesson: if you’re not using it right now, it’s OK to stick it in slower storage till you need it

• Lesson: the system can hide the hierarchy from your programs, most of the time
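
One way to see why spatial locality pays off is to count how many distinct fixed-size blocks of RAM an access pattern touches. The sketch below assumes 16-byte blocks (an arbitrary choice for illustration) and compares 64 consecutive byte reads with 64 reads spaced 64 bytes apart; the function name is made up here.

```python
def lines_touched(addresses, line_size=16):
    """Count the distinct line_size-byte blocks covered by a set of addresses."""
    return len({address // line_size for address in addresses})


sequential = range(0, 64)          # 64 consecutive bytes
strided = range(0, 64 * 64, 64)    # 64 bytes, one every 64 bytes

print(lines_touched(sequential))   # 4
print(lines_touched(strided))      # 64
```

Both patterns read 64 bytes, but the sequential one needs only 4 blocks brought into fast memory, while the strided one needs 64.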


Storage Technologies (costs in 2004)

• At the top: Hard Drives
  – Size: Terabytes. Cost/GB: $.50-$2
  – Access time: 5 million to 20 million nsec

• Flash
  – Size: Gigabytes. Cost/GB: $15
  – Access time: 200 nsec

• Dynamic RAM (typical computer RAM)
  – Size: Gigabytes. Cost/GB: $100-200
  – Speed: 50-70 nsec

• Static RAM (cache, on-chip, registers)
  – Size: Megabytes. Cost/GB: $4K to $10K
  – Speed: .05 - 5 nsec


The driving force in computer design

• Programs are hard to write

• How do we get the most out of the programs we have already written?

• Implications for memory
  – CPU mustn’t see cache operation in general
  – CPU mustn’t see oddities in RAM layout or availability

• How do we hide these details?


Hiding the details

• Cache implementation
  – We give the CPU an MAR/MDR interface
  – We make most RAM references as fast as possible
  – We NEVER make a mistake
    • Process swapping
    • Multiprocessor problems

• RAM Management
  – We make RAM look identical to all programs
  – Programs can’t tell where they really reside in RAM
  – Give programs exactly as much RAM as they need at a given time, and give the rest away to other programs that are taking turns with the CPU


Direct Mapped Cache

• The preferred design these days
  – A collection of high speed RAM locations
  – Broken into individually addressed “cache entries”
  – Part of RAM address chooses cache entry (“Direct mapping”)

• A cache entry
  – “Index” is its address in the cache
  – Valid bit - true if the entry contains valid RAM data
  – “Tag” holds the address bits not used to select the cache entry
  – Data area - where the stored data resides
    • Store multiple words (spatial locality)
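
The index / valid bit / tag mechanism can be sketched as a tiny simulator. This is a minimal model under stated assumptions: it tracks only hits and misses (no data is stored), the default of 64 entries of 16 bytes is picked for illustration, and the class and method names are made up here.

```python
class DirectMappedCache:
    """Hit/miss model of a direct-mapped cache; no data is actually stored."""

    def __init__(self, entries=64, line_size=16):
        self.entries = entries
        self.line_size = line_size
        self.valid = [False] * entries
        self.tags = [0] * entries

    def access(self, address):
        """Return True on a hit; on a miss, fill the entry and return False."""
        line = address // self.line_size   # drop the byte-within-entry bits
        index = line % self.entries        # part of the address picks the entry
        tag = line // self.entries         # remaining upper address bits
        if self.valid[index] and self.tags[index] == tag:
            return True
        self.valid[index] = True           # a miss replaces whatever was there
        self.tags[index] = tag
        return False


cache = DirectMappedCache()
print([cache.access(a) for a in (0x0000, 0x0004, 0x0400, 0x0000)])
# [False, True, False, False]
```

Address 0004h hits because it lies in the same 16-byte entry as 0000h. Address 0400h maps to the same index with a different tag, so it evicts that entry and the final access to 0000h misses again.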


Example

• 32 bit RAM addresses
• 64 cache entries, each contains 16 bytes
• How do we resolve cache addresses?
• How big is the tag field?
• How much RAM does it need, in bits, per entry?
• How much for the whole cache?
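
One way to work through these questions is to split the 32-bit address into a byte offset within the entry, an index that selects the entry, and a tag made of whatever is left. The sketch below assumes power-of-two sizes, one valid bit per entry, and that the index itself is not stored (it is implied by the entry's position); the function and key names are made up for this example.

```python
def cache_layout(address_bits=32, entries=64, bytes_per_entry=16):
    """Field widths and storage cost of a direct-mapped cache (power-of-two sizes)."""
    offset_bits = (bytes_per_entry - 1).bit_length()     # selects a byte in the entry
    index_bits = (entries - 1).bit_length()              # selects the cache entry
    tag_bits = address_bits - index_bits - offset_bits   # remaining upper bits
    bits_per_entry = 1 + tag_bits + 8 * bytes_per_entry  # valid + tag + data
    return {
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "bits_per_entry": bits_per_entry,
        "total_bits": entries * bits_per_entry,
    }


print(cache_layout())
# {'offset_bits': 4, 'index_bits': 6, 'tag_bits': 22, 'bits_per_entry': 151, 'total_bits': 9664}
```

So the low 4 bits pick the byte, the next 6 bits pick the entry, the tag is 22 bits, each entry costs 1 + 22 + 128 = 151 bits, and the whole cache needs 64 × 151 = 9664 bits to hold 1024 bytes of data.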


CPU and Cache Handling

• What happens with a cache hit?

• What happens with a cache miss?
  – A stall, like a pipeline stall, but simpler
  – We stall the whole CPU - inefficient but it’s the best approach

• What happens when we write data?
  – “Write through” runs the write while CPU proceeds
  – Other CPU accesses get the cached, updated value
  – “Write miss” - obvious approach isn’t efficient
  – Use a “write buffer” to catch missed writes
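
The write-through and write-buffer ideas can be sketched as follows. This is a deliberately simplified model, not a description of any particular CPU: it assumes word addressing, one word per cache entry, that a write always updates the cache entry, that untouched RAM reads as 0, and that the buffer is drained before any read miss goes to RAM. All names are made up for the example.

```python
from collections import deque


class WriteThroughCache:
    """Toy write-through cache with a write buffer (one word per entry)."""

    def __init__(self, entries=64):
        self.entries = entries
        self.lines = {}              # index -> (tag, value)
        self.write_buffer = deque()  # writes still on their way to RAM
        self.ram = {}                # address -> value

    def _split(self, address):
        return address % self.entries, address // self.entries

    def write(self, address, value):
        """Update the cache and queue the RAM write; the CPU does not wait."""
        index, tag = self._split(address)
        self.lines[index] = (tag, value)
        self.write_buffer.append((address, value))

    def drain(self):
        """Complete every queued write to RAM, oldest first."""
        while self.write_buffer:
            address, value = self.write_buffer.popleft()
            self.ram[address] = value

    def read(self, address):
        """Return the cached value on a hit; on a miss, drain then read RAM."""
        index, tag = self._split(address)
        entry = self.lines.get(index)
        if entry is not None and entry[0] == tag:
            return entry[1]
        self.drain()                 # RAM must be up to date before we read it
        value = self.ram.get(address, 0)
        self.lines[index] = (tag, value)
        return value


cache = WriteThroughCache()
cache.write(5, 42)
print(cache.read(5), 5 in cache.ram)   # 42 False
```

The read returns the updated value straight from the cache even though the write has not reached RAM yet, which is the behavior the “other CPU accesses get the cached, updated value” bullet describes.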


All done.

• Questions?

• Diagrams cribbed from random Internet sites