© Michel Dubois, Murali Annavaram, Per Stenstrom All rights reserved
CHAPTER 9: Simulation Methods
• SIMULATION METHODS
• PARALLEL SIMULATIONS
How to study a computer system
Methodologies:
• Construct a hardware prototype
• Mathematical modeling
• Simulation
Construct a hardware prototype
• Advantages
- Runs fast
• Disadvantages
- Takes a long time to build: the RPM (Rapid Prototyping engine for Multiprocessors) project at USC took a few graduate students several years
- Expensive
- Not flexible
Mathematically model the system
• Use analytical modeling
- Probabilistic
- Queuing
- Markov
- Petri net
• Advantages
- Very flexible
- Very quick to develop
- Runs quickly
• Disadvantages
- Cannot capture the effects of system details
- Computer architects are skeptical of models
Simulation
• Write a program that mimics system behavior
• Advantages
- Very flexible
- Relatively quick to develop
• Disadvantages
- Runs slowly (e.g., 30,000 times slower than hardware)
Most popular research method
• Simulation is chosen by MOST research projects
• Why?
- Mathematical models are NOT accurate
- Building a prototype is too time-consuming and too expensive for academic researchers
Simulation Bottleneck
• 1 GHz = 1 billion cycles per second
• Simulating one second of a future machine's execution = simulating 1B cycles!
• Simulating 1 cycle of a target = 30,000 cycles on a host
• 1 second of target execution = 30,000 seconds on a host (assuming the host also runs at 1 GHz) = 8.3 hours
• SPEC CPU2000 (CPU2K) benchmarks run for a few hours natively
• Speed is much worse when simulating CMP targets!
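The arithmetic on this slide can be checked with a short sketch. The function name `host_seconds` and its parameters are illustrative, not from the slides, and the host clock of 1 GHz is an assumption needed to turn host cycles into host seconds.

```python
def host_seconds(target_seconds: float,
                 target_hz: float = 1e9,    # 1 GHz target, as on the slide
                 slowdown: int = 30_000,    # host cycles per simulated target cycle
                 host_hz: float = 1e9) -> float:  # ASSUMPTION: host also runs at 1 GHz
    """Wall-clock seconds the host needs to simulate `target_seconds` of target time."""
    target_cycles = target_seconds * target_hz
    host_cycles = target_cycles * slowdown
    return host_cycles / host_hz


seconds = host_seconds(1.0)
hours = seconds / 3600
print(f"{seconds:.0f} s on the host = {hours:.1f} hours")
```

With these defaults, one simulated second costs 30,000 host seconds, or roughly 8.3 hours. A faster host clock shrinks the figure proportionally.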
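How to overcome simulation bottleneck
Simulation levels, from most detailed (slowest) to least detailed (fastest):
• Gate level (RTL)
• Cycle accurate
• Functional level (ISA)
• Model-based approximation
Trade accuracy for simulation speed: detail decreases and simulation speed increases going down the list.
This trade-off has resulted in a plethora of simulators.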
Tool classification: OS code execution
• System-level (complete system)
- Does simulate behavior of an entire computer system, including OS and user code
- Examples:
– Simics
– SimOS
• User-level
- Does NOT simulate OS code
- Does emulate system calls
- Examples:
– SimpleScalar
Tool classification: Simulation detail
• Instruction set
- Does simulate the function of instructions
- Does NOT model detailed micro-architectural timing
- Examples:
– Simics
• Micro-architecture
- Does clock cycle level simulation
- Does speculative, out-of-order multiprocessor timing simulation
- May NOT implement functionality of full instruction set or any devices
- Examples:
– SimpleScalar
• RTL
- Does logic gate-level simulation
- Examples:
– Synopsys
Tool classification: Simulation input
• Trace-driven
- Simulator reads a “trace” of instructions captured during a previous execution by software/hardware
- Easy to implement, no functional component needed
- Large trace size; no branch prediction (the trace records only the path that was actually executed)
• Execution-driven
- Simulator “runs” the program, generating a trace on-the-fly
- More difficult to implement, but has many advantages
- Interpreter, direct-execution
- Examples:
– Simics, SimpleScalar…
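The difference between the two input styles can be sketched with a toy example. Everything here is hypothetical: the three-opcode ISA, the `execute` interpreter, and the `timing_model` with its 2-cycle taken-branch penalty are made up for illustration and do not correspond to Simics or SimpleScalar. The same timing model is fed once by an interpreter running the program on the fly (execution-driven) and once by a trace recorded earlier (trace-driven).

```python
from typing import Iterable, Iterator, List, Tuple

Instr = Tuple[str, int]        # (opcode, operand) in a made-up toy ISA
Record = Tuple[int, str, int]  # trace record: (pc, opcode, next_pc)

PROGRAM: List[Instr] = [
    ("addi", 3),   # acc += 3
    ("addi", -1),  # acc -= 1
    ("bnez", 1),   # if acc != 0: jump to instruction 1
    ("halt", 0),
]


def execute(program: List[Instr]) -> Iterator[Record]:
    """Functional component: interprets the program and yields records on the fly."""
    pc, acc = 0, 0
    while True:
        op, arg = program[pc]
        if op == "addi":
            acc += arg
            next_pc = pc + 1
        elif op == "bnez":
            next_pc = arg if acc != 0 else pc + 1
        elif op == "halt":
            yield (pc, op, pc)
            return
        else:
            raise ValueError(f"unknown opcode: {op}")
        yield (pc, op, next_pc)
        pc = next_pc


def timing_model(records: Iterable[Record], taken_penalty: int = 2) -> int:
    """Toy timing component: 1 cycle per instruction plus an ASSUMED penalty per taken branch."""
    cycles = 0
    for pc, op, next_pc in records:
        cycles += 1
        if op == "bnez" and next_pc != pc + 1:
            cycles += taken_penalty
    return cycles


# Execution-driven: the timing model consumes records as the interpreter produces them.
cycles_execution_driven = timing_model(execute(PROGRAM))

# Trace-driven: the records were captured earlier (here, stored in a list)
# and the timing model needs no functional component at all.
recorded_trace = list(execute(PROGRAM))
cycles_trace_driven = timing_model(recorded_trace)

print(cycles_execution_driven, cycles_trace_driven, len(recorded_trace))
```

The stored trace grows with the number of executed instructions, which is the "large trace size" drawback. The execution-driven path keeps only the interpreter state in memory.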
Multi-Core Simulation
• Sequential simulation
– All target cores are simulated in one thread (on one host core)
– Unified memory hierarchy models simulate resource contention
• Parallel simulation
– Each target core is simulated in a separate thread
There is no relation between the number of target cores and the number of cores on the host (except for simulation speed)!
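A minimal sketch of sequential simulation, under stated assumptions: the `Core` and `SharedBus` classes, the one-request-per-cycle bus, and the fixed core-priority order are all invented for illustration. A single host thread advances every target core by one cycle in turn, and a single shared memory model creates the contention between them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Core:
    core_id: int
    pending: List[bool]  # one entry per instruction: True if it accesses shared memory
    stalls: int = 0


class SharedBus:
    """ASSUMED unified memory model: serves at most one request per target cycle."""

    def __init__(self) -> None:
        self.granted_cycle = -1

    def try_acquire(self, cycle: int) -> bool:
        if self.granted_cycle == cycle:
            return False  # already granted to another core this cycle
        self.granted_cycle = cycle
        return True


def simulate(cores: List[Core], bus: SharedBus) -> int:
    """Advance all target cores in lock step inside one host thread; return total cycles."""
    cycle = 0
    while any(core.pending for core in cores):
        for core in cores:  # fixed priority order: lower core_id wins the bus
            if not core.pending:
                continue
            if core.pending[0] and not bus.try_acquire(cycle):
                core.stalls += 1  # contention: retry the same instruction next cycle
                continue
            core.pending.pop(0)
        cycle += 1
    return cycle


# Four target cores, each issuing one memory access followed by one ALU instruction.
cores = [Core(core_id=i, pending=[True, False]) for i in range(4)]
total_cycles = simulate(cores, SharedBus())
stalls = [core.stalls for core in cores]
print(total_cycles, stalls)
```

The number of target cores is just the length of the `cores` list, so the same code simulates 4 or 64 target cores on a single host core. Only the run time changes. A parallel simulator would instead give each `Core` its own host thread and would have to synchronize accesses to the shared memory model.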