
1

UC Berkeley

Time dilation in RAMP

Zhangxi Tan and David Patterson, Computer Science Division

UC Berkeley

2

A time machine

• Using RAMP as a datacenter simulator
  – Vary DC configurations: processors, disks, network, etc.
  – Evaluate different system implementations: MapReduce with a 10 Gbps, 2 ms delay interconnect or a 100 Gbps, 80 ms delay interconnect
  – Explore and predict what would happen if you upgraded the hardware in your cluster: more powerful CPUs, faster/larger disks
  – Try things in the future!

RAMP inside

3

The problems

• Emulate many fast computers in an FPGA

• What are the problems?
  – First comment, half a year ago at the RadLab retreat: 100 MHz is too slow and can't reflect a GHz machine
  – Targets are becoming more and more complex
    • Implementing them in an FPGA with cycle accuracy is desirable
    • How many cores can we put in an FPGA? (Original vision: 16-24 cores per chip. Now: 1 Leon on a V2P30, 2-3 on a V2P70)

4

Methodologies

• RDL
  – Target cycle, host cycle, start, stop, channel model…
    • Transfer data between units with extra start/stop control
    • Replace the original transfer logic with RDL
    • Control the target clock: if there is no data, still send something to keep the target time "running" (see the channel sketch after this list)
    • A bad control logic implementation may cause deadlock
    • RDLize units (build channels and units) if you want them to talk to each other
  – Compared to porting apps for MicroBlaze?
  – Is RDLizing obvious and simple??

• Model: event driven or clock driven?

• Time dilation
  – Remove target cycle control
    • Is stepping every clock cycle really the way to debug a 1000-node system?
  – Use a standard data transfer interface
  – Rescale everything to a "virtual wall clock" and "slow down" events accordingly
    • Events: timer interrupts, data sent/received, etc.
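To make the channel behavior above concrete, here is a minimal Python sketch. It is not RDL itself (RDL describes hardware); the `Channel` class and `IDLE` token are illustrative names. The idea it shows is the one stated above: a sender puts an idle token on the channel whenever it has no data, so the receiver can still advance its target clock.

```python
# Minimal sketch of the channel idea above: even when a unit has no data to
# send, it puts an idle token on the channel so the receiver can keep its
# target clock "running". Channel/IDLE are illustrative names, not RDL's API.

IDLE = object()  # token meaning "no data this target cycle"

class Channel:
    def __init__(self, latency_cycles: int = 1):
        self.latency = latency_cycles
        self.queue = []

    def send(self, payload=IDLE):
        """Sender calls this once per target cycle, with or without data."""
        self.queue.append(payload)

    def recv(self):
        """Receiver may advance its target clock only when a token arrives;
        returning None means it must stall its target time."""
        if len(self.queue) >= self.latency:
            return self.queue.pop(0)
        return None

ch = Channel()
for cycle in range(3):
    ch.send("pkt" if cycle == 1 else IDLE)   # data only on cycle 1
    token = ch.recv()
    print(cycle, "data" if token not in (IDLE, None) else "idle")
```

The idle tokens are exactly what guards against the deadlock mentioned above: if a unit only sent when it had data, a receiver waiting on that channel could never advance its target time.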

5

Basic Idea

• "Slow down" the passage of time to make the target appear faster
  – 10 ms of wall-clock time = 2 ms of target time
    • Network: shorter time to send a packet -> bandwidth increases, latency decreases
    • Disk: shorter time to read/write
    • CPU: shorter time to do computation
  – The virtual wall clock is the time coordinate in the target; the implementation only controls event intervals (a small sketch follows the figure)

[Figure: wall clock vs. virtual wall clock. Without time dilation, a 10 ms wall-clock interval is perceived as a 10 ms event interval; with time dilation, the same 10 ms of wall clock maps to a 2 ms perceived event interval on the virtual wall clock.]
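A minimal sketch of this rescaling, assuming a time dilation factor of 5 so that the slide's 10 ms wall-clock interval becomes a 2 ms target interval; the helper names are illustrative:

```python
# Sketch of the rescaling: the emulator only controls event intervals, so a
# wall-clock interval is perceived by the target as interval / TDF.
# TDF = 5 reproduces the slide's "10 ms wall clock = 2 ms target time" example.

TDF = 5  # time dilation factor = wall-clock time / target time

def to_target_ms(wall_ms: float) -> float:
    """Interval as perceived on the target's virtual wall clock."""
    return wall_ms / TDF

def to_wall_ms(target_ms: float) -> float:
    """Wall-clock interval the emulator must wait for a given target interval."""
    return target_ms * TDF

print(to_target_ms(10.0))  # 2.0  -> 10 ms of wall clock feels like 2 ms
print(to_wall_ms(10.0))    # 50.0 -> a 10 ms target event is spaced 50 ms apart
```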

6

Real world examples

• Network: sending 100 Mb of data between two events, at the same rate and with the same logic
  – Real: the events are 1 sec apart -> perceived BW: 100 Mbps
  – With time dilation: the same interval is perceived as 100 ms -> perceived BW: 1 Gbps

• CPU and OS: the OS updates its timer every 10 ms (jiffies) on each timer interrupt
  – Timer interrupt before time dilation: 10 ms
  – Reprogram the timer to slow the interrupt down: after time dilation it fires every 50 ms of wall-clock time, perceived as 10 ms in the target
  – No OS modifications, no HW changes
  – This speeds up the processor by 5x

(A sketch of both examples follows.)
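A small sketch of both examples, using the slide's own numbers (an implied factor of 10 for the network case, where 1 sec is perceived as 100 ms, and a factor of 5 for the timer case); the helper functions are illustrative:

```python
# The data moves at the same physical rate with the same logic; only the
# perceived interval shrinks, so perceived bandwidth scales by the TDF.
# Numbers are taken from the slide.

def perceived_bw_mbps(data_mb: float, wall_s: float, tdf: float) -> float:
    """Bandwidth as seen by the target: wall_s is perceived as wall_s / tdf."""
    return data_mb / (wall_s / tdf)

print(perceived_bw_mbps(100, 1.0, tdf=1))   # 100.0  Mbps without dilation
print(perceived_bw_mbps(100, 1.0, tdf=10))  # 1000.0 Mbps = 1 Gbps with dilation

# Timer example: the unmodified OS expects a 10 ms jiffy. With TDF = 5 the
# emulator programs the hardware timer to fire every 50 ms of wall-clock time,
# which the target still perceives as 10 ms between interrupts.
def wall_timer_period_ms(jiffy_ms: float, tdf: float) -> float:
    return jiffy_ms * tdf

print(wall_timer_period_ms(10, tdf=5))  # 50 ms of wall clock per interrupt
```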

7

Experiments

• HW emulator (FPGA): 32-bit Leon3 at 50 MHz, 90 MHz DDR memory, 8 KB L1 cache (4 KB instruction and 4 KB data)
  – Target system: Linux 2.6 kernel, Leon @ 50 MHz / 250 MHz / 500 MHz / 1 GHz / 2 GHz
  – Run the Dhrystone benchmark
  – Tomorrow: HW/SW co-simulation example

• Concept: Time Dilation Factor (TDF) = wall clock time / emulated clock time (worked example below)
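A small worked example of this definition, under my assumption (not stated in the deck) that the 50 MHz Leon3 host retires one target cycle per host cycle, so the required TDF is just the ratio of target to host clock rate:

```python
# TDF = wall clock time / emulated clock time. Assuming the Leon3 host runs at
# 50 MHz and the emulator retires one target cycle per host cycle (my
# assumption, not stated on the slide), the TDF needed for each target clock
# is simply target_freq / host_freq.

HOST_MHZ = 50.0

for target_mhz in (50, 250, 500, 1000, 2000):
    tdf = target_mhz / HOST_MHZ
    print(f"{target_mhz:>5} MHz target -> TDF = {tdf:g}")
# 50 MHz -> 1, 250 MHz -> 5, 500 MHz -> 10, 1 GHz -> 20, 2 GHz -> 40
```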

8

Dhrystone result (w/o memory TD)

How close do we get to a 3 GHz x86 (~8000 Dhrystone MIPS)? The differences come down to memory, cache, and CPI.

9

Problems

• Similar to time dilation in VMs
  – "To Infinity and Beyond: Time-Warped Network Emulation", NSDI '06

• Everything scales linearly, including memory!
  – VMs are lucky: networking code can fit in the cache easily.
  – RAMP has more knobs to tweak.

• Solution: slow down the memory and redo the experiment

10

Dhrystone w. Memory TD

Keep the memory access latency constant:
  – 90 MHz DDR DRAM with 200 ns latency in all targets (50 MHz to 2 GHz)
  – The latency is pessimistic, but it reflects the trend (see the sketch below)
How does the RAMP Blue result + time dilation compare with a real system?
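A sketch of what holding the latency constant implies for the target: each memory access costs more target cycles as the emulated clock rate rises. The arithmetic below is mine, but the 200 ns figure is the slide's.

```python
# With memory time dilation, the 200 ns DRAM latency is held constant in real
# terms, so each memory access costs more target CPU cycles as the emulated
# clock rate goes up, instead of scaling down with the rest of the machine.

MEM_LATENCY_NS = 200.0

for target_mhz in (50, 250, 500, 1000, 2000):
    cycles = MEM_LATENCY_NS * target_mhz / 1000.0  # ns * (cycles per ns)
    print(f"{target_mhz:>5} MHz target: {cycles:g} target cycles per access")
# 50 MHz -> 10 cycles, 2 GHz -> 400 cycles
```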

11

Limitations of naïve time dilation

• Fixed CPI (memory/CPU) model
• Next steps
  – Variable time dilation factor: distribution and state (a statistical model)
  – Emulate out-of-order execution with time dilation: peek at each instruction and dilate it
  – Going deterministic? No, I'll do statistics

Proposed model
[Figure: each unit contains its own time dilation counter]
• No extra control between units
• Reprogram the Time Dilation Counter (TDC) in each unit to get a different target configuration (see the sketch below)
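One possible reading of the proposed model as a sketch; the `Unit` class, its fields, and the `tick`/`reprogram` methods are my own illustrative names, not anything defined in the deck:

```python
# One reading of the proposed model: each unit owns a Time Dilation Counter
# (TDC) and advances its own target time from the shared host clock, with no
# extra control between units; retargeting is just reprogramming the counter.
# Class and field names are illustrative, not from the slides.

class Unit:
    def __init__(self, name: str, tdc: int):
        self.name = name
        self.tdc = tdc              # host cycles per target cycle for this unit
        self.host_cycles = 0
        self.target_cycles = 0

    def tick(self):
        """Advance one host cycle; the local target clock advances once every
        `tdc` host cycles."""
        self.host_cycles += 1
        if self.host_cycles % self.tdc == 0:
            self.target_cycles += 1

    def reprogram(self, tdc: int):
        """Change this unit's target configuration without touching others."""
        self.tdc = tdc

# Two units with different dilation counters on the same host clock.
cpu, disk = Unit("cpu", tdc=1), Unit("disk", tdc=40)
for _ in range(40):
    cpu.tick()
    disk.tick()
print(cpu.target_cycles, disk.target_cycles)  # 40 vs. 1 target cycles
```

The point is that each unit keeps its own notion of target time, so changing the simulated configuration never requires new control wiring between units.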

12

Discussions!