Microprocessor system architectures – IA64
Jakub Yaghob

Application architecture

Application architecture features – I

- Instruction set architecture
  - Load-execute-store architecture, no stack, no hardware divide instruction
- Explicit parallelism
  - Massive resources (128 integer and 128 floating-point registers, 64 predicate registers, 8 branch registers)
  - Enhancements: speculation, predication, software pipelining, branch prediction, multimedia instructions
- Instruction-level parallelism
  - Independent instructions grouped in bundles
  - Multiple bundles issued per clock
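The bundle format is easier to grasp when seen concretely. Each bundle is 128 bits: a 5-bit template, which encodes the execution-unit types of the three slots and where stops fall, followed by three 41-bit instruction slots. The Python sketch below only extracts these bit fields from a bundle held as an integer; `decode_bundle` and `encode_bundle` are hypothetical helpers, and no opcode decoding is attempted.

```python
# Sketch: split a 128-bit IA-64 bundle (held as a Python int) into fields.
# Assumed layout: bits 0-4 template, then three 41-bit slots (slot 0 lowest).
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def decode_bundle(bundle: int) -> tuple[int, list[int]]:
    """Return (template, [slot0, slot1, slot2]); opcodes are not decoded."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("a bundle is exactly 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
             for i in range(3)]
    return template, slots


def encode_bundle(template: int, slots: list[int]) -> int:
    """Inverse of decode_bundle, handy for building test bundles."""
    if not 0 <= template < (1 << TEMPLATE_BITS) or len(slots) != 3:
        raise ValueError("need a 5-bit template and exactly three slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot holds 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


template, slots = decode_bundle(encode_bundle(0x10, [1, 2, 3]))
print(template, slots)  # 16 [1, 2, 3]
```

Since 5 + 3 × 41 = 128, the fields exactly fill the bundle with no padding.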

Application architecture features – II

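- Explicit parallelism: the instruction group
  - Defined by the compiler (a group ends at a stop, written ;; in assembly)
  - Instructions of a group may execute in parallel
  - Strict requirements on dependencies: register RAW and WAW dependencies are forbidden inside a group
- Memory model
  - Relatively weak
  - Ordering is guaranteed only for RAW, WAW and WAR dependencies on the same memory location
  - Memory access synchronization must be explicit (e.g., ld.acq, st.rel, mf)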
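The register rule for instruction groups can be illustrated with a small checker that places stops greedily. This is a minimal sketch under simplifying assumptions: each instruction is reduced to the sets of registers it reads and writes, and the architecture's exceptions to the RAW/WAW rule (for example some compare types writing the same predicate) as well as all memory dependencies are ignored. The names `Insn`, `insn` and `split_into_groups` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    text: str
    dests: frozenset  # registers written
    srcs: frozenset   # registers read


def insn(text, dests, srcs):
    return Insn(text, frozenset(dests), frozenset(srcs))


def split_into_groups(insns):
    """Greedily end the current instruction group (i.e. place a stop ';;')
    before any instruction that reads (RAW) or writes (WAW) a register already
    written in that group. WAR is allowed inside a group, so it does not split."""
    groups, current, written = [], [], set()
    for ins in insns:
        if current and (ins.srcs & written or ins.dests & written):
            groups.append(current)
            current, written = [], set()
        current.append(ins)
        written |= ins.dests
    if current:
        groups.append(current)
    return groups


program = [
    insn("add r1 = r2, r3", {"r1"}, {"r2", "r3"}),
    insn("sub r2 = r5, r6", {"r2"}, {"r5", "r6"}),  # WAR on r2: same group
    insn("add r4 = r1, r2", {"r4"}, {"r1", "r2"}),  # RAW on r1: new group
]
for group in split_into_groups(program):
    print([x.text for x in group])
```

A real compiler schedules instructions to fill groups rather than only inserting stops, but the legality test per group is the same idea.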

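Speculation

- Memory loads are issued early
- Control speculation
  - A load is moved above the condition that guards it
  - The load is sometimes executed "uselessly", when the condition is not met
- Data speculation
  - A load is moved above a store that may alias it
  - Checked using the ALAT (Advanced Load Address Table)
- Speculation check
  - A control-speculative load that would cause an exception does not raise it; the exception is deferred (NaT) and detected by chk.s
  - Data speculation is invalidated if there is a write to the loaded memory location; this is detected by chk.a or ld.c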
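The two kinds of speculation and their checks can be mimicked by a toy model. It assumes a dictionary as memory, a set of addresses whose access would fault, and an ALAT keyed by target register that matches exact addresses only (real hardware also considers access size and overlap). Method names echo the IA-64 mnemonics ld.s, chk.s, ld.a and chk.a, but the class itself is hypothetical and models results, not timing.

```python
class SpeculationModel:
    """Toy model of IA-64 speculation; registers hold (value, nat) pairs."""

    def __init__(self, memory, faulting=()):
        self.memory = dict(memory)       # address -> value
        self.faulting = set(faulting)    # addresses whose access would fault
        self.regs = {}                   # register name -> (value, nat)
        self.alat = {}                   # register name -> address of ld.a

    # Control speculation: ld.s defers a would-be fault by setting NaT.
    def ld_s(self, reg, addr):
        if addr in self.faulting:
            self.regs[reg] = (0, True)
        else:
            self.regs[reg] = (self.memory.get(addr, 0), False)

    def chk_s(self, reg):
        """True if the value is usable; False means 'branch to recovery'."""
        return not self.regs[reg][1]

    # Data speculation: ld.a records the load address in the ALAT.
    def ld_a(self, reg, addr):
        self.regs[reg] = (self.memory.get(addr, 0), False)
        self.alat[reg] = addr

    def store(self, addr, value):
        self.memory[addr] = value
        # A store to the same location invalidates matching ALAT entries
        # (real hardware compares overlapping address ranges, not just equality).
        for reg in [r for r, a in self.alat.items() if a == addr]:
            del self.alat[reg]

    def chk_a(self, reg):
        """True if the advanced load is still valid; False means 'reload'."""
        return reg in self.alat


m = SpeculationModel({0x100: 7}, faulting={0x0})
m.ld_a("r10", 0x100)   # load hoisted above a possibly aliasing store
m.store(0x100, 5)      # the store does alias
print(m.chk_a("r10"))  # False: the value in r10 is stale and must be reloaded
```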

Predication

- Predicate registers
  - 64 1-bit predicate registers PR0-PR63
  - PR0 is hardwired to 1; writes to it are ignored
- No dedicated arithmetic/logic flags
  - Predicates are set by compare instructions
  - A compare writes a pair of PRs (one receives the result of the comparison, the other its complement)
  - Several modes of setting (some of them may break the WAW rule inside an instruction group)
- Nearly all instructions are conditioned by a PR
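If-conversion with a complementary predicate pair can be sketched as follows. The model assumes normal-type compares only (other compare types update predicates differently) and represents a predicated instruction as a Python `if` on its qualifying predicate; `Predicates`, `compare` and `abs_diff` are hypothetical names, and the assembly in the comments is only indicative.

```python
class Predicates:
    """PR0-PR63; PR0 always reads as 1 and writes to it are ignored."""

    def __init__(self):
        self._pr = [True] + [False] * 63

    def __getitem__(self, i):
        return self._pr[i]

    def __setitem__(self, i, value):
        if i != 0:
            self._pr[i] = bool(value)


def compare(pr, p1, p2, condition, qp=0):
    """Normal-type compare: when the qualifying predicate pr[qp] is 1, write the
    result to p1 and its complement to p2; otherwise leave both unchanged."""
    if pr[qp]:
        pr[p1] = condition
        pr[p2] = not condition


def abs_diff(a, b):
    """if (a > b) d = a - b; else d = b - a;  after if-conversion (no branch).
    Both subtractions are issued; each commits only if its predicate is 1."""
    pr = Predicates()
    compare(pr, 6, 7, a > b)     # cmp.gt p6, p7 = a, b
    d = None
    if pr[6]:
        d = a - b                # (p6) sub d = a, b
    if pr[7]:
        d = b - a                # (p7) sub d = b, a
    return d


print(abs_diff(3, 10))  # 7
```

Because the two predicates are complementary, exactly one subtraction commits, and the branch (with its possible misprediction) disappears.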

Register stack

- Support for procedure calls
  - GR0-GR31 are global (static) registers
  - GR32-GR127 form the register stack
- Each procedure has its own register frame
  - Two variable-sized areas: local and output
- Registers are renamed on a call; the alloc instruction sizes the new frame
  - The caller's first output register becomes GR32 in the callee
- If the register stack overflows, the CPU frees some registers by saving them to memory
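The renaming of the register frame on a call can be modelled with three numbers: the physical base of the frame, its size (locals plus outputs) and the size of its locals. The sketch below assumes an unlimited physical register file, so no RSE spilling happens; the class and method names are hypothetical, while the rule that the callee starts with only the caller's output area follows the description above.

```python
class RegisterStack:
    """Logical-to-physical mapping of stacked registers GR32..GR127.
    Simplified: unlimited physical registers, no RSE spilling."""

    def __init__(self):
        self.bof = 0      # physical index of the current frame's GR32
        self.sof = 0      # size of frame (locals + outputs)
        self.sol = 0      # size of locals (including inputs)
        self.saved = []   # caller frames, pushed on each call

    def alloc(self, inputs_and_locals, outputs):
        total = inputs_and_locals + outputs
        if inputs_and_locals < 0 or outputs < 0 or total > 96:
            raise ValueError("a frame has at most 96 stacked registers")
        self.sol, self.sof = inputs_and_locals, total

    def physical(self, gr):
        """Map logical GRn (n >= 32) of the current frame to a physical slot."""
        if not 32 <= gr < 32 + self.sof:
            raise IndexError(f"r{gr} is outside the current frame")
        return self.bof + (gr - 32)

    def call(self):
        """br.call: the caller's output area becomes the callee's frame."""
        self.saved.append((self.bof, self.sof, self.sol))
        self.bof += self.sol
        self.sof -= self.sol      # callee starts with only the outputs
        self.sol = 0

    def ret(self):
        self.bof, self.sof, self.sol = self.saved.pop()


rs = RegisterStack()
rs.alloc(4, 2)                    # caller: r32-r35 inputs/locals, r36-r37 outputs
out0 = rs.physical(36)
rs.call()
print(out0 == rs.physical(32))    # True: caller's r36 is the callee's r32
```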

Privilege levels and serialization

- Privilege levels
  - Like IA-32, levels 0-3
  - System instructions and registers are accessible only with CPL=0
- Serialization
  - Data dependency
    - Applies to all application and system resources except control registers
    - Values written to a register are observed by instructions in subsequent instruction groups
  - Instruction serialization (srlz.i)
    - Modifications are observed before instruction fetches for subsequent instruction groups are re-initiated
  - Data serialization (srlz.d)
    - Modifications affecting both execution and data memory accesses are observed
  - In-flight resources
    - Reading a resource whose modification has not yet been serialized returns an undefined ("some") value

System registers

Processor Status Register (PSR)

- Describes the current execution environment
- Divided into four overlapping sections
- Accessed by special instructions

Control registers

- 128 control registers; many are reserved, only 26 are used
- Groups
  - Global control registers: CR0 (DCR = Default Control Register), CR2 (IVA = Interruption Vector Address), CR8 (PTA = Page Table Address)
  - Interruption control registers: control of the interruption currently being handled
- Writes to control registers are not implicitly serialized

Banked general registers

- Fast switching of GR16-GR31 for interrupt handlers
- The current bank is given by PSR.bn
- Bank switching
  - An interruption selects bank 0
  - rfi restores the bank from IPSR.bn
  - bsw switches to the specified bank
  - NaT bits are switched together with the registers
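Bank switching can be sketched as two copies of GR16-GR31, each entry holding a value and its NaT bit, so that switching banks also switches NaT. Only PSR.bn and its saved copy in IPSR are modelled, and starting in bank 1 is an assumption; the class name and methods are hypothetical.

```python
class BankedRegisters:
    """GR16-GR31 in two banks; each entry is (value, nat), so a bank switch
    also switches the NaT bits."""

    def __init__(self):
        self.banks = [[(0, False)] * 16, [(0, False)] * 16]
        self.bn = 1          # PSR.bn; assumption: normal code runs in bank 1
        self.ipsr_bn = 1     # saved copy of PSR.bn (only this IPSR field modelled)

    def _slot(self, gr):
        if not 16 <= gr <= 31:
            raise IndexError("only GR16-GR31 are banked")
        return gr - 16

    def read(self, gr):
        return self.banks[self.bn][self._slot(gr)]

    def write(self, gr, value, nat=False):
        self.banks[self.bn][self._slot(gr)] = (value, nat)

    def interrupt(self):
        self.ipsr_bn = self.bn   # PSR is saved to IPSR on an interruption
        self.bn = 0              # the handler starts in bank 0

    def rfi(self):
        self.bn = self.ipsr_bn   # bank restored from IPSR.bn

    def bsw(self, bank):
        self.bn = bank           # bsw.0 / bsw.1
```

The handler can use GR16-GR31 freely without saving them to memory, because the interrupted code's values stay untouched in the other bank.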

Virtual memory model

- Virtual regions: support for operating systems with multiple address spaces
- Protection domain mechanism: support for operating systems with a single address space
- TLB: paging algorithms are left to the OS
- VHPT (Virtual Hash Page Table): augments TLB performance; inverted page tables
- Other mechanisms: various page sizes, fixed translations, …

Address translation

TLB

- Separate TLBs for instructions and data
  - The data TLB also translates accesses made by the VHPT walker and the RSE
- Each TLB is divided into two parts
  - Translation registers (TR): fully associative array; the OS sets translations explicitly; no automatic replacement
  - Translation cache (TC): entries can be inserted by an instruction; automatic replacement; entries are also inserted automatically from the VHPT
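The split between pinned TR entries and replaceable TC entries can be sketched as a lookup that tries TR first and then TC. This is a simplified model: entries are keyed by virtual page number only (real entries also carry region ID, page size and rights), and the FIFO replacement policy of the TC is an assumption, since the actual policy is implementation-specific. The class and method names are hypothetical.

```python
from collections import OrderedDict


class TLB:
    """One TLB (instruction or data side), keyed by virtual page number only.
    TR entries are pinned; TC entries are replaced automatically."""

    def __init__(self, tc_capacity=4):
        self.tr = {}                 # vpn -> pfn, set explicitly by the OS
        self.tc = OrderedDict()      # vpn -> pfn, oldest entry first
        self.tc_capacity = tc_capacity

    def insert_tr(self, vpn, pfn):
        self.tr[vpn] = pfn           # never evicted by hardware

    def insert_tc(self, vpn, pfn):
        """Insert via an instruction (or from a VHPT walk); evicts the oldest
        entry when full. FIFO is an assumption: real policies vary."""
        if vpn in self.tc:
            del self.tc[vpn]
        elif len(self.tc) >= self.tc_capacity:
            self.tc.popitem(last=False)
        self.tc[vpn] = pfn

    def lookup(self, vpn):
        """Return the pfn, or None on a miss."""
        if vpn in self.tr:
            return self.tr[vpn]
        return self.tc.get(vpn)
```

Pinning a translation in a TR is how an OS keeps, for example, its own kernel mappings from ever being evicted.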

Access rights on pages

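- Defined by TLB.ar (access rights) and TLB.pl (privilege level)
- Rights selected by TLB.ar:
  - 0: read only
  - 1: read, execute
  - 2: read, write
  - 3: read, write, execute
  - 4: read only / read, write
  - 5: read, execute / read, write, execute
  - 6: read, write, execute / read, write
  - 7: execute, promote / read, execute
- Where two sets of rights are given, which one applies depends on the current privilege level relative to TLB.pl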

Virtual addressing – other – I

- Page sizes: 4K, 8K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 4G
- Region registers (RR)
  - The highest 3 bits of a VA form an index into the 8 region registers
  - rid: region identifier
  - ps: preferred page size
  - ve: VHPT walker enable
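The use of region registers in address decomposition can be sketched in a few lines. The region registers are represented as a hypothetical list of eight dictionaries with `rid`, `ps` (log2 of the page size) and `ve` keys; the function only splits the address, it does not perform a TLB lookup.

```python
# Page size exponents for the sizes listed above (4K = 2**12 ... 4G = 2**32).
PAGE_SIZE_BITS = {12, 13, 14, 16, 18, 20, 22, 24, 26, 28, 32}


def split_va(va, region_registers):
    """Split a 64-bit VA using the region register its top 3 bits select.
    region_registers: list of 8 dicts with hypothetical keys 'rid', 'ps', 've'."""
    if not 0 <= va < (1 << 64):
        raise ValueError("a virtual address has 64 bits")
    vrn = va >> 61                          # virtual region number, 0..7
    rr = region_registers[vrn]
    ps = rr["ps"]                           # log2 of the preferred page size
    if ps not in PAGE_SIZE_BITS:
        raise ValueError("unsupported page size")
    in_region = va & ((1 << 61) - 1)        # address bits below the region index
    return {
        "vrn": vrn,
        "rid": rr["rid"],                   # the RR supplies the region ID
        "vpn": in_region >> ps,             # virtual page number within region
        "offset": va & ((1 << ps) - 1),     # byte offset within the page
    }


rrs = [{"rid": 0x100 + i, "ps": 14, "ve": 1} for i in range(8)]
print(split_va(0xA000_0000_0001_2345, rrs))
```

The region ID rather than the region number is what ends up in the translation lookup, which lets different processes map the same region number to different address spaces.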

Virtual addressing – other – II

- Protection keys
  - At least 16 protection key registers
  - The key in a TLB entry is compared with the protection keys; if none matches, a "key miss fault" is raised
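The protection key check reduces to a search of the protection key registers. In this sketch each PKR is a hypothetical dictionary with a valid flag and a key; the per-key read/write/execute disable bits that real PKRs also carry are not modelled, and the function and exception names are hypothetical.

```python
class KeyMissFault(Exception):
    """Raised when no protection key register holds the TLB entry's key."""


def check_protection_key(tlb_key, pkrs):
    """Search the protection key registers for the key of the TLB entry.
    pkrs: list of dicts with hypothetical keys 'valid' and 'key'."""
    for pkr in pkrs:
        if pkr["valid"] and pkr["key"] == tlb_key:
            return pkr                     # access continues with this PKR
    raise KeyMissFault(f"no PKR holds key {tlb_key:#x}")


pkrs = [{"valid": True, "key": k} for k in (0x1, 0x5, 0x9)]
pkrs += [{"valid": False, "key": 0}] * 13  # 16 registers in total
```

In a single-address-space OS, a context switch can then change which domains are accessible by reloading the PKRs instead of flushing the TLB.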

VHPT – I

VHPT – II

- Properties
  - The CPU never writes anything into the VHPT
  - The CPU does not keep the TLB and the VHPT coherent
- Two formats
  - Short: one table per region, 8-byte entries
  - Long: one large table for the whole system, 32-byte entries
- Various sizes, always a power of 2
- Searched when the TLB misses
- An entry found in the VHPT is automatically inserted into the TC
- Fixed hash functions
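A long-format VHPT lookup after a TLB miss can be sketched as below. The hash function here is a made-up stand-in (real hash functions are fixed by the processor), collisions simply overwrite, and the TC is a plain dictionary; all names are hypothetical. The sketch does reflect the properties listed above: only the OS inserts entries, the walker just reads, and a hit is copied into the TC.

```python
class LongVHPT:
    """Toy long-format VHPT: one system-wide table with 32-byte entries."""

    ENTRY_BYTES = 32

    def __init__(self, size_bytes):
        if size_bytes < self.ENTRY_BYTES or size_bytes & (size_bytes - 1):
            raise ValueError("VHPT size must be a power of 2")
        self.entries = [None] * (size_bytes // self.ENTRY_BYTES)

    def _index(self, rid, vpn):
        return (rid ^ vpn) & (len(self.entries) - 1)   # hypothetical hash

    def os_insert(self, rid, vpn, pfn):
        """Only the OS writes the VHPT; a collision overwrites the old entry."""
        self.entries[self._index(rid, vpn)] = ((rid, vpn), pfn)

    def walk(self, rid, vpn):
        """Hardware walker: read-only search, the stored tag must match."""
        entry = self.entries[self._index(rid, vpn)]
        if entry is not None and entry[0] == (rid, vpn):
            return entry[1]
        return None


def translate(tc, vhpt, rid, vpn):
    """On a TC miss, search the VHPT; a hit is inserted into the TC."""
    key = (rid, vpn)
    if key in tc:
        return tc[key]
    pfn = vhpt.walk(rid, vpn)
    if pfn is None:
        raise LookupError("VHPT miss: the OS fault handler must resolve it")
    tc[key] = pfn
    return pfn
```

Because the CPU never writes the VHPT and does not keep it coherent with the TLB, an OS that changes a VHPT entry must also purge any stale copy from the TC itself.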

Physical addressing and memory attributes

- Physical addresses have only 63 bits; current implementations use only 50 of them
- Memory attributes
  - Virtual addressing: like IA-32 (WB, WC, …)
  - Physical addressing: selected by bit 63 of the address
    - 0: WB, speculative
    - 1: UC, non-speculative
- Nontrivial rules for memory ordering
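How bit 63 selects the memory attribute in physical addressing can be shown with simple bit manipulation. The sketch assumes 50 implemented physical address bits and treats any higher set bit as an error; the function name and returned fields are hypothetical.

```python
IMPLEMENTED_PA_BITS = 50   # implementation-specific; 50 as stated above


def physical_mode_access(addr):
    """Classify a 64-bit address used with physical addressing."""
    uncacheable = ((addr >> 63) & 1) == 1    # bit 63 selects the attribute
    pa = addr & ((1 << 63) - 1)              # the 63-bit physical address
    if pa >> IMPLEMENTED_PA_BITS:
        raise ValueError("address uses unimplemented physical address bits")
    return {
        "pa": pa,
        "attribute": "UC" if uncacheable else "WB",
        "speculative": not uncacheable,
    }
```

The same physical location can therefore be reached both cached and uncached, just by flipping bit 63.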

Interrupts – I

- Kinds by handler
  - IVA-based: handled by the OS; the vector table is located by CR2 (IVA)
  - PAL-based: handled by PAL or by system firmware, possibly by the OS
- Kinds by behavior
  - Abort
  - Interrupt: external, asynchronous
  - Fault
  - Trap
- Interrupts are disabled during interrupt handling

Interrupts – II

- 81 exceptions are currently defined
  - 5 for "hard" exceptions: RESET, INIT, INT, MCA, PMI
  - 23 for IA-32 emulation
- IVA-based interruptions: vectors have fixed addresses; several exceptions share one vector
- External interrupts: 256 vectors, priority given by the vector number
  - The current vector is read from CR65 (IVR = Interrupt Vector Register)
  - The current priority is in CR66 (TPR = Task Priority Register)
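Priority by vector number can be illustrated with a simplified model. It assumes 16 vectors per priority class (class = vector >> 4), that a higher class means higher priority, that a single threshold taken from TPR masks every class up to and including it, and that reading IVR with nothing deliverable returns the spurious vector 15. Vectors 0-15 and other special cases are left out; `read_ivr` and `masked_up_to_class` are hypothetical names, and this is not the exact TPR semantics.

```python
SPURIOUS_VECTOR = 15


def priority_class(vector):
    return vector >> 4          # assumed: 16 vectors per priority class


def read_ivr(pending, masked_up_to_class):
    """Return the highest-priority pending vector whose class is above the
    mask threshold, or the spurious vector if none can be delivered."""
    candidates = [v for v in pending
                  if 16 <= v <= 255 and priority_class(v) > masked_up_to_class]
    return max(candidates, default=SPURIOUS_VECTOR)
```

Raising the threshold in TPR lets a handler block lower-priority interrupts while still accepting more urgent ones.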

RSE – I

- Register Stack Engine (RSE): transfers the register stack to and from memory
  - In the background, without software intervention
  - Different activity modes (enforced lazy, store intensive, load intensive, eager)
- The physical register stack must have at least 96 registers; more registers are added in multiples of 16
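The size rule for the physical register stack and the on-demand spilling it enables can be combined into a small model. It only counts registers: frames are pushed and popped, the oldest registers are spilled to the backing store when the physical file overflows, and a caller's registers are filled back on return. Overlap between caller outputs and callee inputs is ignored, spilling only on demand is just one of the modes listed above, and the class is hypothetical.

```python
def valid_stacked_register_count(n):
    """At least 96 physical stacked registers, growing in steps of 16."""
    return n >= 96 and (n - 96) % 16 == 0


class ToyRSE:
    """Counts registers of active frames held in the physical file versus
    spilled to the backing store; spills/fills happen only on demand."""

    def __init__(self, physical_regs=96):
        if not valid_stacked_register_count(physical_regs):
            raise ValueError("invalid physical register stack size")
        self.capacity = physical_regs
        self.frames = []     # frame sizes, oldest first
        self.resident = 0    # registers currently in the physical file
        self.spilled = 0     # registers currently in the backing store

    def call(self, frame_size):
        if not 0 <= frame_size <= 96:
            raise ValueError("a frame has at most 96 registers")
        self.frames.append(frame_size)
        self.resident += frame_size
        excess = self.resident - self.capacity
        if excess > 0:       # spill the oldest registers to memory
            self.resident -= excess
            self.spilled += excess

    def ret(self):
        self.resident -= self.frames.pop()
        caller = self.frames[-1] if self.frames else 0
        missing = caller - self.resident
        if missing > 0:      # fill the caller's spilled registers back
            self.spilled -= missing
            self.resident += missing
```

Since a single frame never exceeds 96 registers and the physical file has at least 96, the current frame always fits entirely in registers.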

RSE – II

Firmware

- Processor Abstraction Layer (PAL): unified interface to the processor firmware
- System Abstraction Layer (SAL): isolates the OS from implementation variations between platforms
- Extensible Firmware Interface (EFI): OS booting
- Each firmware layer (including the OS) has a defined entry point
- PAL and SAL are placed in the 16 MB of memory directly below 4 GB, with a fixed structure
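The placement of PAL and SAL is simple arithmetic: the region ends at 4 GB and is 16 MB long. The sketch just computes the boundaries and tests membership; the constant and function names are hypothetical.

```python
MIB = 1 << 20
GIB = 1 << 30

FIRMWARE_END = 4 * GIB                     # first address above the region
FIRMWARE_BASE = FIRMWARE_END - 16 * MIB    # start of the PAL/SAL area


def in_firmware_region(pa):
    return FIRMWARE_BASE <= pa < FIRMWARE_END


print(hex(FIRMWARE_BASE), hex(FIRMWARE_END - 1))  # 0xff000000 0xffffffff
```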

Model firmware