Upload
clare-booker
View
215
Download
0
Embed Size (px)
Citation preview
Simulation of the Fresh BreezeStorage System
Xiaoxuan MengJack Dennis
MIT CSAIL University of Delaware
Funded by NSF HECURA Grant CCF-0937832
Project Goals
• Implement fine-grain management of tasks and memory.• Provide a base for cpability-based access control and security.• Support composability of arbitrary parallel program components.• Achieve high efficiency in operation.
2
Computer Systems that:
The Fresh Breeze Memory Model
• Fan-out as large as 16
Data
Chunkse.g. 128 Bytes
Master
Chunk
• Write-Once then Read Only
Cycle-Free Heap Arrays as Trees of Chunks
3
• Arrays: Three levels yields 4096 elements (longs)
Storage System Architecture
Execution System
Top Storage
Level
4
AD
Main Storage
Level
CUSU
SUAD CU
SU
SU
AD CUSU
SU
AD – Associative Directory
CU - Control Unit
SU – Storage Unit
Spawn and Join
5
Master Task
spawn (n)
join_fetch Join
Worker 0 Worker n-1
join_update join_update
Spawned Tasks
Continuation Task
Spawn - Join Implementation
Results from
Spawned Tasks
6
Join TicketJoin Data
Count/Limit
Continuation
Task
System for Simulation
L1P
MainMemory
Switch
Processors
40Up to 64IndependentStorage Units
EUs TSUs MSUs
L1P
Fresh Breeze Simulation
X-Bar Switch
EUs – Task Execution TSUs – CacheImplementation
MSUs – Main Storage Implementation
Cyclops TUs
The Dot Product
A
B
* Sum
A B
5 levels:Vector length =165 = 1,048,576
* +
scalar result
* *
Each Task:Dot Product of two16-element vectors:16 multiplies; 15 adds
Spawn – Join Programming
void DotProd ( a, b ) {join_init (16, Accumulate ( ) );for ( i = 0; i < 16; i++)
spawn_one ( i, DotProd ( a[i], b[i] )quit ();
}
void Accumulate ( ) {handle = join_fetch ();sum = 0;for ( j = 0; j < 16; j++ )
sum += handle [j];join_update ( sum );
}
Findings
• Using fine-grain tasks and a hardware scheduler can effectively distribute concurrent tasks over a large number of processors.
• Task stealing can achieve high utilization of many processors.
11
Further Work
• Model a Three-Level Storage System: L1, L2, Main.• Task distribution to exploit data locality.• Further benchmark applications for testing: MM, FFT, LUD.• FunJava Compiler: DFG transformations and code
generation.
12