
Introduction to SimpleScalar (Based on SimpleScalar Tutorial)



April 20, 2023


CSCE614, Texas A&M University


Overview

• What is an architectural simulator?
  – A tool that reproduces the behavior of a computing device

• Why use a simulator?
  – Leverages a faster, more flexible software development cycle
  – Permits more design space exploration
  – Facilitates validation before the hardware becomes available
  – Level of abstraction can be tailored to the design task
  – Makes it possible to increase/improve system instrumentation
  – Usually less expensive than building a real system


Advantages of SimpleScalar

• Highly flexible
  – Functional simulator + performance simulator

• Portable
  – Host: the virtual target runs on most Unix-like systems
  – Target: simulators can support multiple ISAs

• Extensible
  – Source is included for the compiler, libraries, and simulators
  – Easy to write new simulators

• Performance
  – Runs codes approaching ‘real’ sizes


Simulation Tools

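[Figure: taxonomy of architectural simulators. Functional simulators: interpreters and direct execution. Performance simulators: trace-driven and exec-driven; exec-driven simulators are further divided into instruction schedulers and cycle timers. Shaded tools in the original figure are included in the SimpleScalar Tool Set.]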


Functional vs. Performance Simulators

• Functional simulators implement the architecture
  – Perform real execution
  – Implement what programmers see

• Performance simulators implement the microarchitecture
  – Model system resources/internals
  – Concerned with timing
  – Do not implement what programmers see
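To make the functional side concrete, here is a minimal functional simulator in C for a hypothetical four-instruction toy ISA (OP_ADDI, OP_ADD, OP_BNEZ, and OP_HALT are illustrative only, not PISA or Alpha opcodes). It performs real execution on architectural state (registers and PC) and counts instructions, but models no cycles, caches, or pipeline resources; adding those is what turns it into a performance simulator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy ISA for illustration only (not PISA or Alpha). */
typedef enum { OP_ADDI, OP_ADD, OP_BNEZ, OP_HALT } opcode;

typedef struct {
    opcode op;
    int rd, rs, rt;   /* register numbers, 0..7 */
    int32_t imm;      /* immediate, or absolute branch target for OP_BNEZ */
} insn;

/* Architectural state: exactly what the programmer can observe. */
typedef struct {
    int32_t reg[8];   /* r0 always reads as zero */
    int pc;           /* index into the program array */
} arch_state;

/* Fetch, decode, and execute until OP_HALT. Returns the dynamic instruction
 * count; no cycles, caches, or pipeline resources are modeled. */
static long run_functional(const insn *prog, arch_state *s)
{
    long count = 0;
    for (;;) {
        const insn *i = &prog[s->pc];             /* fetch */
        assert(i->rd < 8 && i->rs < 8 && i->rt < 8);
        count++;
        switch (i->op) {                          /* decode + execute */
        case OP_ADDI:
            s->reg[i->rd] = s->reg[i->rs] + i->imm;
            s->pc++;
            break;
        case OP_ADD:
            s->reg[i->rd] = s->reg[i->rs] + s->reg[i->rt];
            s->pc++;
            break;
        case OP_BNEZ:
            s->pc = (s->reg[i->rs] != 0) ? (int)i->imm : s->pc + 1;
            break;
        case OP_HALT:
            return count;
        }
        s->reg[0] = 0;                            /* keep r0 hardwired to zero */
    }
}
```

A loop that sums 5 + 4 + 3 + 2 + 1 into r2 executes 18 dynamic instructions here (2 setup, 5 iterations of 3, and the halt); a performance simulator would run the same instructions and additionally report how many cycles they took.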


Trace-Driven vs. Execution-Driven Simulators

• Trace-Driven
  – Simulator reads a ‘trace’ of the instructions captured during a previous execution
  – Easy to implement
  – No functional components necessary
  – No feedback into the trace (e.g., a mis-prediction cannot alter the recorded instruction stream)

• Execution-Driven
  – Simulator runs the program (trace-on-the-fly)
  – Hard to implement
  – Advantages:
    • Faster than tracing
    • No need to store traces
    • Register and memory values, which usually are not in a trace, are available
    • Supports mis-speculation cost modeling


Instruction Schedulers vs. Cycle Timers

• Instruction Schedulers
  – Simulator schedules an instruction when its resources are available
  – Instructions proceed one at a time
  – Simpler, but less detailed

• Cycle Timers
  – Simulator tracks microarchitectural state every cycle
  – Simulator state == microarchitecture state
  – Perfect for microarchitecture simulation


SimpleScalar Release 3.0

• SimpleScalar now executes multiple instruction sets: SimpleScalar PISA (the old "SimpleScalar ISA") and Alpha AXP

• All simulators now support external I/O traces (EIO traces), which are generated with a new simulator, sim-eio

• Supports more platforms

• Explicit fault support

• And many more changes


Simulator Suite

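• Sim-Fast: 300 lines; functional; 4+ MIPS

• Sim-Safe: 350 lines; functional with checks

• Sim-Profile: 900 lines; functional; lots of statistics

• Sim-Cache, Sim-Cheetah, Sim-BPred: < 1000 lines; functional; cache statistics, branch statistics

• Sim-Outorder: 3900 lines; performance; out-of-order issue, branch prediction, mis-speculation, ALUs, cache, TLB; 200+ KIPS

The original figure arranges these along a Performance-to-Detail axis: Sim-Fast is the fastest, Sim-Outorder the most detailed.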


Sim-Fast

• Functional simulation

• Optimized for speed

• Assumes no cache

• Performs no instruction checking

• Does not support Dlite!

• Does not allow command line arguments

• < 300 lines of code


Sim-Safe

• Functional simulation

• Checks for instruction errors

• Optimized for speed

• Assumes no cache

• Supports Dlite!

• Does not allow command line arguments


Sim-Cache

• Cache simulation

• Ideal for fast simulation of caches (when the effect of cache performance on execution time is not needed)

• Accepts command line arguments for:
  – Level 1 & 2 instruction and data caches
  – TLB configuration (data and instruction)
  – Flush and compress
  – And more

• Ideal for high-level cache studies that do not take cache access time into account


Sim-Cache (cont'd)

• Generates one- and two-level cache hierarchy statistics and profiles

• Extra options (also supported on sim-outorder):

-cache:dl1 <config> - level 1 data cache configuration

-cache:dl2 <config> - level 2 data cache configuration

-cache:il1 <config> - level 1 instruction cache configuration

-cache:il2 <config> - level 2 instruction cache configuration

-tlb:dtlb <config> - data TLB configuration

-tlb:itlb <config> - instruction TLB configuration

-flush - flush caches on system calls

-cache:icompress - remap 64-bit instruction addresses to 32-bit equivalents

-pcstat <stat> - record statistic <stat> by text address


Specifying Cache Configurations

• All cache and TLB configurations are specified with the same format:

<name>:<nsets>:<bsize>:<assoc>:<repl>

• where:
<name> - cache name (make this unique)
<nsets> - number of sets
<bsize> - block size (in bytes)
<assoc> - associativity (number of “ways”)
<repl> - set replacement policy
  l - for LRU
  f - for FIFO
  r - for RANDOM

• Examples:
il1:1024:32:2:l - 2-way set-associative 64 KB cache with 32-byte blocks, LRU replacement
dtlb:1:4096:64:r - 64-entry fully associative TLB with 4 KB pages, random replacement
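The capacity arithmetic behind these examples (sets × block size × ways) can be checked with a small C sketch. `parse_cache_cfg` and `cache_bytes` are hypothetical helpers written for illustration, not SimpleScalar's own option parser, and the sketch accepts only the l/f/r policies listed above.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of a <name>:<nsets>:<bsize>:<assoc>:<repl> configuration string. */
typedef struct {
    char name[32];
    unsigned nsets;   /* number of sets */
    unsigned bsize;   /* block size in bytes (page size for a TLB) */
    unsigned assoc;   /* ways per set */
    char repl;        /* 'l' = LRU, 'f' = FIFO, 'r' = random */
} cache_cfg;

/* Hypothetical parser: returns 1 on success, 0 on malformed input. */
static int parse_cache_cfg(const char *s, cache_cfg *c)
{
    assert(s != NULL && c != NULL);
    if (sscanf(s, "%31[^:]:%u:%u:%u:%c",
               c->name, &c->nsets, &c->bsize, &c->assoc, &c->repl) != 5)
        return 0;
    return c->repl == 'l' || c->repl == 'f' || c->repl == 'r';
}

/* Capacity in bytes = sets x block size x ways. */
static unsigned long cache_bytes(const cache_cfg *c)
{
    return (unsigned long)c->nsets * c->bsize * c->assoc;
}
```

For il1:1024:32:2:l this gives 1024 × 32 × 2 = 65,536 bytes (64 KB). For dtlb:1:4096:64:r, one set of 64 ways gives the 64 TLB entries, each mapping a 4096-byte page.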


Sim-Bpred

• Simulate different branch prediction mechanisms

• Generate prediction hit and miss rate reports

• Does not simulate the effect of branch prediction on total execution time

• Supported predictor types:

nottaken always predict not taken

taken always predict taken

perfect perfect predictor

bimod bimodal predictor

2lev 2-level adaptive predictor

comb combined predictor (bimodal and 2-level)
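A prediction hit rate is correct predictions divided by branches executed. The C sketch below computes it for the two static predictors (taken, nottaken) over a recorded list of branch outcomes; `hit_rate` and the outcome array are illustrative assumptions, not sim-bpred's internal interface, and the miss rate is simply 1 minus the result.

```c
#include <assert.h>
#include <stddef.h>

/* The two static predictors from the list above. */
typedef enum { PRED_NOTTAKEN, PRED_TAKEN } static_pred;

/* Fraction of branches predicted correctly, given recorded outcomes
 * (1 = taken, 0 = not taken). The miss rate is 1.0 minus this value. */
static double hit_rate(static_pred p, const int *outcomes, size_t n)
{
    assert(outcomes != NULL && n > 0);
    int predicted = (p == PRED_TAKEN);   /* static: same guess every time */
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if ((outcomes[i] != 0) == predicted)
            hits++;
    return (double)hits / (double)n;
}
```

For a loop branch that is taken three times and then falls through (1, 1, 1, 0), taken scores 0.75 and nottaken scores 0.25.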


Sim-Profile

● Program profiler

● Generates detailed profiles, by symbol and by address

● Keeps track of and reports:
  – Dynamic instruction counts
  – Instruction class counts
  – Branch class counts
  – Usage of address modes
  – Profiles of the text & data segments


Sim-Outorder

• Most complicated and detailed simulator

• Supports out-of-order issue and execution

• Provides reports on:
  – Branch prediction
  – Cache
  – External memory
  – Various configurations


Sim-Outorder: Detailed Performance Simulator

• Generates timing statistics for a detailed out-of-order issue processor core with a two-level cache memory hierarchy and main memory

• Extra options:

-fetch:ifqsize <size> - instruction fetch queue size (in insts)

-fetch:mplat <cycles> - extra branch mis-prediction latency (cycles)

-bpred <type> - specify the branch predictor

-decode:width <insts> - decoder bandwidth (insts/cycle)

-issue:width <insts> - RUU issue bandwidth (insts/cycle)

-issue:inorder - constrain instruction issue to program order

-issue:wrongpath - permit instruction issue after mis-speculation

-ruu:size <insts> - capacity of RUU (insts)

-lsq:size <insts> - capacity of load/store queue (insts)

-cache:dl1 <config> - level 1 data cache configuration

-cache:dl1lat <cycles> - level 1 data cache hit latency


Sim-Outorder: Detailed Performance Simulator (cont'd)

-cache:dl2 <config> - level 2 data cache configuration

-cache:dl2lat <cycles> - level 2 data cache hit latency

-cache:il1 <config> - level 1 instruction cache configuration

-cache:il1lat <cycles> - level 1 instruction cache hit latency

-cache:il2 <config> - level 2 instruction cache configuration

-cache:il2lat <cycles> - level 2 instruction cache hit latency

-cache:flush - flush all caches on system calls

-cache:icompress - remap 64-bit inst addresses to 32-bit equiv.

-mem:lat <1st> <next> - specify memory access latency (first, rest)

-mem:width - specify width of memory bus (in bytes)

-tlb:itlb <config> - instruction TLB configuration

-tlb:dtlb <config> - data TLB configuration

-tlb:lat <cycles> - latency (in cycles) to service a TLB miss


Sim-Outorder: Detailed Performance Simulator (cont'd)

-res:ialu - specify number of integer ALUs

-res:imult - specify number of integer multiplier/dividers

-res:memports - specify number of first-level cache ports

-res:fpalu - specify number of FP ALUs

-res:fpmult - specify number of FP multiplier/dividers

-pcstat <stat> - record statistic <stat> by text address

-ptrace <file> <range> - generate pipetrace


Specifying the Branch Predictor

• Specifying the branch predictor type: -bpred <type>

• The supported predictor types are:

nottaken always predict not taken

taken always predict taken

perfect perfect predictor

bimod bimodal predictor (BTB w/ 2 bit counters)

2lev 2-level adaptive predictor

• Configuring the bimodal predictor (only useful when “-bpred bimod” is specified):

-bpred:bimod <size> size of direct-mapped BTB
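The bimodal scheme can be sketched as a direct-mapped table of 2-bit saturating counters indexed by branch address, with <size> entries as set by -bpred:bimod. This is an illustrative C sketch, not bpred.c: the `bimod_pred` type and its functions are hypothetical, the initial counter value (1, weakly not taken) is an arbitrary choice, and dropping the low 3 address bits assumes 8-byte PISA instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bimodal predictor: a direct-mapped table of 2-bit saturating
 * counters indexed by branch address. Counter values 2..3 predict taken. */
typedef struct {
    unsigned size;    /* entries, as given to -bpred:bimod; power of two */
    uint8_t *ctr;     /* one 2-bit counter (0..3) per entry */
} bimod_pred;

/* ASSUMPTION: 8-byte PISA instructions, so the low 3 address bits are dropped. */
#define BR_SHIFT 3

static int bimod_init(bimod_pred *b, unsigned size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    b->size = size;
    b->ctr = malloc(size);
    if (b->ctr == NULL)
        return 0;
    for (unsigned i = 0; i < size; i++)
        b->ctr[i] = 1;               /* start weakly not taken (arbitrary choice) */
    return 1;
}

static uint8_t *bimod_entry(const bimod_pred *b, uint32_t pc)
{
    return &b->ctr[(pc >> BR_SHIFT) & (b->size - 1)];
}

static int bimod_predict(const bimod_pred *b, uint32_t pc)
{
    return *bimod_entry(b, pc) >= 2;
}

/* Saturating update: count up on taken, down on not taken, clamp to 0..3. */
static void bimod_update(bimod_pred *b, uint32_t pc, int taken)
{
    uint8_t *c = bimod_entry(b, pc);
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}

static void bimod_free(bimod_pred *b)
{
    free(b->ctr);
    b->ctr = NULL;
}
```

Because the counters saturate, one not-taken outcome after a run of taken ones only moves an entry from 3 to 2, so the prediction stays "taken"; a second not-taken outcome is needed to flip it.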


Specifying the Branch Predictor (cont'd)

• configuring the 2-level adaptive predictor (only useful when “-bpred 2lev” is specified):

-bpred:2lev <l1size> <l2size> <hist_size> <xor>

Configurations: N, M, W, X
N: # entries in the first level (# of shift registers)
M: # entries in the second level (# of counters, or other FSMs)
W: width of the shift register(s) (# of bits in each shift register)
X: (yes-1/no-0) XOR history and address for the second-level index (we use 0 for this homework)

Sample predictors:
GAg: 1, M, W, 0 where M = 2^W
GAp: 1, M, W, 0 where M = C * 2^W, C is the # of per-address prediction tables
PAg: N, M, W, 0 where M = 2^W
PAp: N, M, W, 0 where M = N * 2^W
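One way to see why these table sizes work: with X = 0, the second-level index can be formed from the W history bits with branch-address bits concatenated above them, masked to M entries. When M = 2^W the address bits are masked away, so all branches share one set of counters (GAg, PAg); when M = C * 2^W the low log2(C) address bits select one of C tables. The C sketch below assumes power-of-two sizes and a 3-bit address shift for 8-byte PISA instructions; the helper names are hypothetical, and bpred.c remains the reference for the exact bit layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of a -bpred:2lev N M W X configuration, X = 0 only. */
typedef struct {
    unsigned l1size;  /* N: number of history shift registers */
    unsigned l2size;  /* M: number of 2-bit counters */
    unsigned width;   /* W: bits per shift register */
} twolev_cfg;

/* ASSUMPTION: 8-byte PISA instructions, so the low 3 address bits are dropped. */
#define BR_SHIFT 3

/* First level: which shift register a branch uses (always 0 when N = 1). */
static unsigned l1_index(const twolev_cfg *c, uint32_t pc)
{
    assert((c->l1size & (c->l1size - 1)) == 0);
    return (pc >> BR_SHIFT) & (c->l1size - 1);
}

/* Second level with X = 0: history in the low W bits, address bits
 * concatenated above them, then masked to the M-entry table. */
static unsigned l2_index(const twolev_cfg *c, uint32_t pc, unsigned history)
{
    assert((c->l2size & (c->l2size - 1)) == 0);
    unsigned idx = history | ((pc >> BR_SHIFT) << c->width);
    return idx & (c->l2size - 1);
}

/* Shift the newest outcome into a W-bit history register. */
static unsigned push_history(unsigned history, int taken, unsigned width)
{
    return ((history << 1) | (taken ? 1u : 0u)) & ((1u << width) - 1);
}
```

For a GAp configuration with one global history register, W = 2, and eight per-address tables (M = 8 * 2^2 = 32), a branch at address 0x18 with history 3 maps to counter 15 (table 3, entry 3); under GAg with M = 4 the same branch would use counter 3 regardless of its address.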


Performance Comparison of GAg, GAp, PAg, and PAp


[Figure: (a) GAp; (b) a (2,2) predictor. Labels: branch address, 2-bit global branch history, 2 bits per branch predictor, prediction.]

• GAp: 1 global history register and 8 per-address prediction tables


Hack the State Machine of the Branch Predictor!


[Figure: two 2-bit branch-predictor state diagrams, each with two predict-taken (T) and two predict-not-taken (NT) states connected by Taken / Not taken transitions.]

(a) A3 (same as shown in the textbook); (b) A2 (original SimpleScalar implementation)
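A 2-bit predictor's automaton is easiest to modify when written as a next-state table. The rows below encode the saturating up/down counter (A2), the original SimpleScalar behavior per the caption above; the A3 transitions are deliberately not filled in because the figure's arrows did not survive extraction, so take them from the textbook diagram. The `bp_state` names and `step` function are hypothetical, not bpred.c code.

```c
#include <assert.h>

/* 2-bit predictor states; the two upper states predict taken. */
typedef enum { SNT = 0, WNT = 1, WT = 2, ST = 3 } bp_state;

/* next_state[state][outcome], where outcome 0 = not taken, 1 = taken.
 * These rows encode the saturating up/down counter (A2). To experiment with
 * another automaton such as A3, edit the rows to match its state diagram. */
static const bp_state next_state[4][2] = {
    /* SNT */ { SNT, WNT },
    /* WNT */ { SNT, WT  },
    /* WT  */ { WNT, ST  },
    /* ST  */ { WT,  ST  },
};

static int predict_taken(bp_state s)
{
    return s >= WT;
}

static bp_state step(bp_state s, int taken)
{
    assert((unsigned)s <= ST);
    return next_state[s][taken ? 1 : 0];
}
```

Keeping the transitions in data rather than in if/else chains means a different automaton changes only the table, while the prediction rule (upper two states predict taken) stays the same.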


Sim-Outorder HW Architecture

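[Figure: Sim-Outorder pipeline stages Fetch, Dispatch, Register Scheduler, Exec, Writeback, and Commit, with a Memory Scheduler and Mem stage for memory operations. Memory components: I-Cache, D-Cache, I-TLB, D-TLB, and Virtual Memory.]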


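Sim-Outorder (Main Loop)

• sim_main() in sim-outorder.c (simplified):

    ruu_init();
    for (;;) {              /* one iteration per simulated cycle */
        ruu_commit();       /* retire completed instructions in order */
        ruu_writeback();    /* complete execution, wake up dependents */
        lsq_refresh();      /* find loads/stores whose memory deps are met */
        ruu_issue();        /* issue ready instructions to functional units */
        ruu_dispatch();     /* decode/rename into the RUU and LSQ */
        ruu_fetch();        /* fetch into the instruction fetch queue */
    }

• Executed once for each simulated machine cycle

• Walks the pipeline from Commit to Fetch
  – Reverse traversal handles inter-stage latch synchronization in a single pass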


Sim-Outorder (RUU/LSQ)

• RUU (Register Update Unit)
  – Handles register synchronization/communication
  – Serves as reorder buffer and reservation stations
  – Performs out-of-order issue when register and memory dependences are satisfied

• LSQ (Load/Store Queue)
  – Handles memory synchronization/communication
  – Contains all loads and stores in program order

• Relationship between RUU and LSQ
  – Memory dependences are resolved by the LSQ
  – Load/store effective addresses are calculated in the RUU
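The reorder-buffer role of the RUU can be sketched as a circular buffer: entries enter at the tail in program order, may finish out of order, and retire only from the head. This C sketch is illustrative only; `ruu_buf`, `ruu_insert`, `ruu_mark_done`, and `ruu_retire` are hypothetical names (not sim-outorder's functions), the capacity of 8 is arbitrary, and no register, dependence, or LSQ state is modeled.

```c
#include <assert.h>

#define RUU_SIZE 8   /* hypothetical capacity, cf. -ruu:size */

/* Reorder-buffer view of the RUU: entries enter at the tail in program
 * order, may complete out of order, and retire only from the head. */
typedef struct {
    int id[RUU_SIZE];         /* instruction IDs */
    int completed[RUU_SIZE];  /* 1 once execution has finished */
    int head, tail, num;
} ruu_buf;

/* Returns 0 (dispatch stalls) when the RUU is full. */
static int ruu_insert(ruu_buf *r, int id)
{
    if (r->num == RUU_SIZE)
        return 0;
    r->id[r->tail] = id;
    r->completed[r->tail] = 0;
    r->tail = (r->tail + 1) % RUU_SIZE;
    r->num++;
    return 1;
}

/* Mark instruction `id` as finished executing (out of order is fine). */
static void ruu_mark_done(ruu_buf *r, int id)
{
    for (int i = 0, k = r->head; i < r->num; i++, k = (k + 1) % RUU_SIZE) {
        if (r->id[k] == id) {
            r->completed[k] = 1;
            return;
        }
    }
    assert(0 && "instruction not in RUU");
}

/* Retire completed entries from the head in program order; returns count. */
static int ruu_retire(ruu_buf *r)
{
    int n = 0;
    while (r->num > 0 && r->completed[r->head]) {
        r->head = (r->head + 1) % RUU_SIZE;
        r->num--;
        n++;
    }
    return n;
}
```

If instruction 2 finishes before instructions 0 and 1, nothing retires until 0 completes, and 1 and 2 then retire together once 1 finishes; this is the in-order retirement behavior that keeps architectural state precise.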


Sim-Outorder: Fetch

● ruu_fetch()

● Models machine fetch bandwidth

● Fetches instructions from one I-cache/memory block

● Blocks until I-cache misses are resolved

● Puts instructions into the instruction fetch queue, named fetch_data in sim-outorder.c (called the dispatch queue in the paper)

● Probes the branch predictor to obtain the cache line for the next cycle


Sim-Outorder: Dispatch

● ruu_dispatch()

● Models instruction decoding and register renaming

● Takes instructions from fetch_data

● Decodes instructions

● Enters and links instructions into the RUU and LSQ

● Splits memory operations into two separate instructions (the effective-address computation in the RUU and the memory access in the LSQ)


Sim-Outorder: Scheduler

● lsq_refresh()

● Models instruction selection, wakeup, and issue

● Separate schedulers track register and memory dependences

● Locates instructions with all register inputs ready and all memory inputs ready

● Issue of ready loads is stalled if there is a store with an unresolved effective address in the LSQ

● If an earlier store's address matches the load address, the store's value is forwarded to the load
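The two load rules above can be sketched in C over an LSQ held in program order. This is an illustrative sketch, not lsq_refresh() itself: `lsq_entry`, `check_load`, and `load_action` are hypothetical, index 0 is assumed to be the oldest entry (no circular wrap), store data is assumed to be ready, and addresses are compared as whole words with no partial overlaps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical LSQ entry; index 0 is the oldest (head) entry. */
typedef struct {
    int is_store;
    int addr_known;    /* effective address already computed? */
    uint32_t addr;     /* effective address (full-word match assumed) */
    int32_t value;     /* store data, assumed ready */
} lsq_entry;

typedef enum { LOAD_STALL, LOAD_FORWARD, LOAD_TO_CACHE } load_action;

/* Decide what the load at lsq[idx] may do, following the rules above. */
static load_action check_load(const lsq_entry *lsq, int idx, int32_t *fwd)
{
    assert(idx >= 0 && !lsq[idx].is_store);
    /* Any earlier store with an unresolved address blocks the load. */
    for (int i = 0; i < idx; i++)
        if (lsq[i].is_store && !lsq[i].addr_known)
            return LOAD_STALL;
    /* Otherwise forward from the youngest earlier store to the same address. */
    for (int i = idx - 1; i >= 0; i--)
        if (lsq[i].is_store && lsq[i].addr == lsq[idx].addr) {
            *fwd = lsq[i].value;
            return LOAD_FORWARD;
        }
    return LOAD_TO_CACHE;   /* no conflict: access the D-cache */
}
```

The stall rule is deliberately conservative: an unresolved store address blocks every later load, even one that would turn out not to alias.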


Sim-Outorder: Execute

● ruu_issue()

● Models functional units, D-cache issue, and execution latencies

● Gets instructions that are ready

● Reserves a free functional unit

● Schedules writeback events using the latency of the functional unit

● Latencies are hardcoded in fu_config[] in sim-outorder.c


Sim-Outorder: Writeback

● ruu_writeback()

● Models writeback bandwidth, detects mis-predictions, and initiates the mis-prediction recovery sequence

● Gets instructions that have finished execution (from the event queue)

● Wakes up instructions on the output dependence chains of each completed instruction

● Detects branch mis-predictions and rolls state back to the checkpoint
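The hand-off between issue and writeback can be sketched as an event queue ordered by completion cycle: issue inserts an event at now + latency, and writeback pops events whose cycle has arrived. This C sketch uses a small sorted array; `event_queue`, `schedule_writeback`, and `next_completed` are hypothetical names, and the latencies in the usage note are illustrative rather than the values in fu_config[].

```c
#include <assert.h>

#define MAX_EVENTS 16

/* Completion event: instruction `id` finishes executing at cycle `when`. */
typedef struct { long when; int id; } event;

/* Event queue kept sorted by completion cycle, earliest first. */
typedef struct { event ev[MAX_EVENTS]; int n; } event_queue;

/* Issue side: schedule a writeback event `latency` cycles after `now`.
 * Events with equal times stay in insertion order. */
static void schedule_writeback(event_queue *q, long now, int latency, int id)
{
    assert(q->n < MAX_EVENTS && latency >= 0);
    long when = now + latency;
    int i = q->n++;
    while (i > 0 && q->ev[i - 1].when > when) {   /* insertion sort step */
        q->ev[i] = q->ev[i - 1];
        i--;
    }
    q->ev[i].when = when;
    q->ev[i].id = id;
}

/* Writeback side: pop the next instruction finished by cycle `now`;
 * returns its id, or -1 if nothing has completed yet. */
static int next_completed(event_queue *q, long now)
{
    if (q->n == 0 || q->ev[0].when > now)
        return -1;
    int id = q->ev[0].id;
    for (int i = 1; i < q->n; i++)
        q->ev[i - 1] = q->ev[i];
    q->n--;
    return id;
}
```

If a 3-cycle operation (id 10) and a 1-cycle operation (id 11) both issue at cycle 0, writeback sees id 11 at cycle 1 and id 10 at cycle 3, so completion order follows latency rather than issue order.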


Sim-Outorder: Commit

● ruu_commit()

● Models in-order retirement of instructions, store commits to the D-cache, and D-TLB miss handling

● While the head of the RUU/LSQ is ready to commit:
  – Handle D-TLB misses
  – Retire stores to the D-cache
  – Update the register file and rename table
  – Reclaim RUU/LSQ resources


Sim-Outorder: Processor Core and Other Specifications

• Instruction fetch, decode, and issue bandwidth

• Capacity of RUU and LSQ

• Branch mis-prediction latency

• Number of functional units
  – Integer ALUs, integer multipliers/dividers
  – FP ALUs, FP multipliers/dividers

• Latency of I-cache/D-cache, memory, and TLB

• Record statistic by text address


Global Options

• These are supported on most simulators

-h print help message

-d enable debug messages

-i start up in Dlite! debugger

-q quit immediately (use with -dumpconfig)

-config <file> read config parameters from <file>

-dumpconfig <file> save config parameters into <file>


How to get help from us

• Drop by during the TA’s office hours

• E-Mail [email protected]