Anton, a Special-Purpose Machine for Molecular Dynamics Simulation
By David E. Shaw et al.
Presented by Bob Koutsoyannis
The Anton Legacy
• Anton van Leeuwenhoek, “Father of Microscopy”
• First to see bacteria and other microorganisms
• Objective: Improve the tools available to scientists to further our understanding of organisms & diseases
Anton the Machine
• Specialized massively parallel machine being built to improve molecular dynamics simulations.
• In the works, to be completed by 2009
• The simulated biological system is spatially distributed among many nodes connected in a 3D torus.
• MD specific hardware
• Novel parallel algorithms
Molecular Dynamics Simulation
• Models the motions and interactions of molecular systems (atomic-level simulations)
– Proteins
– Cell membranes
– DNA
Motivation
• Life saving…
• Used to visualize biochemical phenomena that cannot be seen in lab experiments.
– Protein folding
– Protein-protein interactions
– Protein-drug interactions
• Key for developing drugs
What makes one MD simulator better than the next?
• Time Scale
– Being able to simulate the interaction between molecules for more than a nanosecond.
• Problem Size
– Why is a millisecond of simulation out of the scope of our current technology?
– Consider 200,000 molecules
• 10^12 time steps to simulate a millisecond
– Each time step requires intense arithmetic computation on all 200,000 molecules
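The step count above can be checked with a few lines of arithmetic. This is only a sketch: the slides do not state the time-step length, so a 1 femtosecond step is assumed here because it reproduces the 10^12 figure. The names `time_steps` and `particle_updates` are illustrative.

```python
# Assumption (not stated on the slide): one time step covers 1 femtosecond.
FS_PER_MS = 10**12  # 1 ms = 1e-3 s and 1 fs = 1e-15 s, so 1 ms = 1e12 fs


def time_steps(simulated_fs: int, step_fs: int = 1) -> int:
    """Number of integration steps needed to cover simulated_fs femtoseconds."""
    return simulated_fs // step_fs


def particle_updates(steps: int, n_particles: int) -> int:
    """Total per-particle updates if every particle is touched on every step."""
    return steps * n_particles


steps = time_steps(FS_PER_MS)
updates = particle_updates(steps, 200_000)
print(f"{steps:.0e} steps, {updates:.0e} particle updates")
```

Even before counting the arithmetic inside each update, the product of step count and particle count shows why a millisecond is out of reach for general-purpose hardware.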
What makes one MD simulator better than the next?
• Other Projects Addressing MD Sims
– Folding@Home
• Network of 200,000 PCs
• Large sample for independent molecular sims
• But no millisecond simulations
– FASTRUN, MDGRAPE, MD Engine
• Good with larger molecular system sims
• Have strong arithmetic units
• Still limited by communication bottlenecks
MD Simulator Requirements
• Force Calculation (getting an idea of the level of computation needed)
• Molecular mechanics force fields are used to model the total potential energy (PE) of a system.
• Input: X, Y, Z coordinates. Outputs: force quantities.
[Figure: M1, M2]
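To make "coordinates in, forces out" concrete, here is a minimal sketch of a single harmonic bond term, one kind of term found in molecular mechanics force fields. It is not taken from the Anton paper; the function name, the spring constant `k`, and the rest length `r0` are hypothetical values chosen for illustration.

```python
import math


def harmonic_bond_forces(p1, p2, k, r0):
    """Energy and forces for one harmonic bond, U = 0.5 * k * (r - r0)**2.

    p1, p2 are (x, y, z) positions; they must not coincide (r > 0).
    Returns (energy, force_on_atom_1, force_on_atom_2).
    """
    d = [b - a for a, b in zip(p1, p2)]      # vector from atom 1 to atom 2
    r = math.sqrt(sum(c * c for c in d))     # current bond length
    energy = 0.5 * k * (r - r0) ** 2
    scale = -k * (r - r0) / r                # -dU/dr, divided by r to normalise d
    f2 = tuple(scale * c for c in d)         # pulls atom 2 back toward rest length
    f1 = tuple(-c for c in f2)               # equal and opposite force on atom 1
    return energy, f1, f2


# Two atoms 2 units apart with a rest length of 1: the bond is stretched.
energy, f1, f2 = harmonic_bond_forces((0, 0, 0), (2, 0, 0), k=10.0, r0=1.0)
print(energy, f1, f2)
```

A full force field sums many such terms (bonds, angles, torsions, non-bonded pairs), which is where the per-step arithmetic load comes from.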
• Force Calculation (getting an idea of the level of computation needed)
• For every time step, the force fields must be updated.
• FFT, convolution, inverse FFT (computationally expensive operations)
• For 200,000 molecules/step…
• 1) Need a huge number of arithmetic processing elements
MD Simulator Requirements
• Integration (getting an idea of the level of computation needed)
• For every time step, updates of atomic positions and velocities must be made.
• Global actions and constraints must be enforced on the entire system (temperature, pressure, optimizations).
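The per-step position and velocity update can be sketched with velocity Verlet, a common MD integrator. The slides do not say which integration scheme Anton uses, so treat this as an assumed stand-in; the 1-D harmonic oscillator used as a test force is also just an illustration.

```python
def velocity_verlet_step(pos, vel, force_fn, mass, dt):
    """Advance one particle (1-D for brevity) by one time step dt."""
    half_vel = vel + 0.5 * dt * force_fn(pos) / mass          # half kick
    new_pos = pos + dt * half_vel                             # drift
    new_vel = half_vel + 0.5 * dt * force_fn(new_pos) / mass  # second half kick
    return new_pos, new_vel


def simulate(pos, vel, force_fn, mass, dt, n_steps):
    """Repeat the single-step update n_steps times."""
    for _ in range(n_steps):
        pos, vel = velocity_verlet_step(pos, vel, force_fn, mass, dt)
    return pos, vel


# Hypothetical test system: unit-mass particle on a unit spring, F(x) = -x.
x, v = simulate(1.0, 0.0, lambda x: -x, mass=1.0, dt=0.01, n_steps=1000)
energy = 0.5 * v * v + 0.5 * x * x  # should stay close to the initial 0.5
print(x, v, energy)
```

In a real simulator the same update runs for every atom on every step, followed by the global constraint and thermostat/barostat adjustments mentioned above.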
MD Simulator Requirements
• Parallelization (getting an idea of the level of computation needed)
• For every time step, every atom must communicate with every other atom within its cutoff radius.
• 2) A lot of inter-processor communication that can be scaled well is needed.
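The cutoff-radius requirement can be illustrated with a brute-force pair search. This is a sketch only: it checks all pairs in O(N^2) time, ignores periodic boundaries, and is not the parallel method Anton uses. The function name and sample coordinates are made up.

```python
from itertools import combinations


def pairs_within_cutoff(positions, cutoff):
    """Return index pairs (i, j), i < j, whose distance is at most cutoff."""
    cutoff_sq = cutoff * cutoff  # compare squared distances to avoid sqrt
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(positions), 2):
        dist_sq = sum((p - q) ** 2 for p, q in zip(a, b))
        if dist_sq <= cutoff_sq:
            pairs.append((i, j))
    return pairs


atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(pairs_within_cutoff(atoms, cutoff=1.5))
```

When atoms are spread over many nodes, each interacting pair may live on two different nodes, which is why this step drives the communication requirement.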
MD Simulator Requirements
• Parallelization (getting an idea of the level of computation needed)
• The whole system is broken down into boxes (one per processing node)
• Each node handles the bonded interactions within its box
• NT (neutral territory) method for non-bonded interactions (much more common).
• NT method for atom migration
MD Simulator Requirements
• 1) Need a huge number of arithmetic processing elements
• 2) A lot of inter-processor communication that can be scaled well is needed.
• 3) Memory is not an issue
– With 25,000 atoms (64 bytes each), total = 1.6 MB; over 512 nodes that is about 3.1 KB/node, which is smaller than most L1 caches
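The memory estimate is simple arithmetic and can be reproduced directly from the numbers on the slide. The function name is illustrative, and decimal units (1 KB = 1000 bytes) are assumed.

```python
ATOMS = 25_000
BYTES_PER_ATOM = 64
NODES = 512


def memory_footprint(atoms, bytes_per_atom, nodes):
    """Return (total bytes, bytes per node) for evenly distributed atoms."""
    total = atoms * bytes_per_atom
    return total, total / nodes


total_bytes, per_node_bytes = memory_footprint(ATOMS, BYTES_PER_ATOM, NODES)
print(f"total: {total_bytes / 1e6} MB, per node: {per_node_bytes} bytes")
```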
Why Specialized Hardware?
[Figure: needs: memory, communication, computation]
• Consider Moore’s Law at 10X improvement in 5 years vs. Anton’s 1000X in 1 year.
• Can great discoveries wait?
• Can use custom pipelines with more precision and increased datapath logic speed, over less silicon area.
• Have tailored ISAs for geometric calculations
• Programmability for accommodating various force fields and integration algorithms
• Dedicated memory for each particle to accumulate forces
Why Specialized Hardware?
[Figure: needs: memory, communication, computation]
• Low-latency, high-bandwidth network within and between ASICs.
• Push-based communication with counters (reduces waiting).
• Set of autonomous Direct Memory Access (DMA) engines, allowing for greater overlap of communication and computation.
• Admission control features
Communication Latency
[Figure: updating the force field; annotation: "This node may update for them"]
Subsystems of Anton
1. High-Throughput Interaction Subsystem (HTIS)
2. Flexible Subsystem
3. Communication Subsystem
4. Memory Subsystem
High-Throughput Interaction Subsystem
– Executes non-bonded MD interaction calculations (charge spreading & force interpolation)
– Accumulates forces on each particle as data streams through.
– ICB controls the flow of data through the HTIS, has programmable ISA extensions, and acts as a buffering, pre-fetching, synchronization, and write-back controller
Flexible Subsystem
• Initiates force computation phase
• Calculates bonded force terms
• Force correction terms
• All integration tasks
• Constraint calculations (temp & pressure)
• Position and velocity updates
• Atom migration
• All maintenance activities (boot, diagnostic, self-test, loading sims, switching contexts, logging, checkpointing, error reporting).
Flexible Subsystem
• General-Purpose Core w/ Caches
• Remote Access Unit
– Autonomous data transfers
• Geometry Cores
– Bonded MD calculations
• Correction Pipeline
– Computes force correction terms
• Racetrack
– Local, internal interconnect for flexible subsystem components
• Ring Interface Unit
– Lets the flexible subsystem transfer packets to/from the communication subsystem.
Communications Subsystem
• Routing: 48-bit address space
• 16-bit node identifier, 32 bits of address per node
• Flow control
Memory Subsystem
• Provides access to ASIC DRAM
• Supports accumulation and synchronization
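The 48-bit routing address used by the communications subsystem (16-bit node identifier plus 32 bits of per-node address) can be sketched as a pack/unpack pair. The slides do not give the bit layout, so placing the node identifier in the upper 16 bits is an assumption, and the function names are hypothetical.

```python
NODE_BITS = 16   # node identifier width, from the slide
LOCAL_BITS = 32  # per-node address width, from the slide


def pack_address(node_id: int, local_addr: int) -> int:
    """Combine node id and local address into one 48-bit value.

    Assumed layout: node id in the high 16 bits, local address in the low 32.
    """
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id does not fit in 16 bits")
    if not 0 <= local_addr < (1 << LOCAL_BITS):
        raise ValueError("local_addr does not fit in 32 bits")
    return (node_id << LOCAL_BITS) | local_addr


def unpack_address(addr: int):
    """Split a 48-bit address back into (node_id, local_addr)."""
    return addr >> LOCAL_BITS, addr & ((1 << LOCAL_BITS) - 1)


addr = pack_address(511, 0xDEADBEEF)
print(hex(addr), unpack_address(addr))
```

A 16-bit node field can name up to 65,536 nodes, and a 32-bit local field covers 4 GiB of addressable space on each one.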
Simulation Evaluations
• 500X faster than NAMD, 80-100X faster than Desmond, 100X faster than Blue Matter
Accuracy
• Force error measured as relative rms force error
• Energy drift
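A sketch of how a relative rms force error might be computed is shown below. The slides do not define the metric, so the definition used here (rms of the force error over all particles, divided by the rms of the reference force) is an assumption, as are the function name and sample vectors.

```python
import math


def relative_rms_force_error(forces, reference):
    """rms(forces - reference) / rms(reference), over all particles and axes.

    forces and reference are equal-length lists of (fx, fy, fz) tuples;
    reference must contain at least one non-zero component.
    """
    err_sq = sum((a - b) ** 2
                 for f, r in zip(forces, reference)
                 for a, b in zip(f, r))
    ref_sq = sum(b * b for r in reference for b in r)
    # Both sums cover the same number of components, so the 1/N factors cancel.
    return math.sqrt(err_sq / ref_sq)


reference = [(3.0, 4.0, 0.0)]
computed = [(3.0, 4.0, 0.5)]
print(relative_rms_force_error(computed, reference))
```

The reference forces would typically come from a higher-precision calculation of the same configuration.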
Efficiency
• Increasing the simulated system size leads to an increase in efficiency.