Accelerating Biomolecular Simulation using a Scalable Network of Reconfigurable Hardware

Arun Patel1, Christopher Madill2,3, Manuel Saldaña1, Christopher Comis1, Dave Chui1, Sam Lee1, Régis Pomès2,3, Paul Chow1.
1 Department of Electrical and Computer Engineering, University of Toronto. 2 Department of Structural Biology and Biochemistry, The Hospital for Sick Children. 3 Department of Biochemistry, University of Toronto.
Motivation
• Computer simulations of biomolecules are playing an increasingly important role in medical research.
• Understanding the balance of physical forces governing atomic-level interactions is a central challenge in modern Biochemistry.
• Computer simulations have been successfully applied to study biophysical phenomena including:
  i) Cell membrane transport.
  ii) Molecular conformational equilibria.
  iii) Protein/ligand docking.
  iv) Time-dependent molecular motion.
• Current technological limitations restrict the size and length of simulations.
• The primary objective of this study is to engineer a scalable Molecular Dynamics simulator which is capable of outperforming supercomputers and computing clusters.
• This project is a collaboration between the Department of Electrical and Computer Engineering and the Department of Biochemistry at the University of Toronto, and the Department of Structural Biology and Biochemistry at The Hospital for Sick Children.
Force Calculations (F = −∇E)

E_Bond = k_b (l − l₀)²
E_Angle = k_θ (θ − θ₀)²
E_Torsion = A [1 + cos(nτ + φ)]
E_van der Waals = 4ε [(σ/r)¹² − (σ/r)⁶]
E_Electrostatic = q₁q₂ / r
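As an illustration, the nonbonded terms above can be evaluated directly. A minimal Python sketch, with parameter values and unit conventions as placeholders (the Coulomb prefactor is taken as 1):

```python
def lennard_jones_energy(r, epsilon, sigma):
    """E_vdW = 4*eps*[(sigma/r)^12 - (sigma/r)^6].
    Zero at r = sigma; minimum of -epsilon at r = 2^(1/6)*sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def coulomb_energy(r, q1, q2):
    """E_elec = q1*q2 / r (unit prefactor assumed for illustration)."""
    return q1 * q2 / r

def bond_energy(l, kb, l0):
    """E_bond = kb*(l - l0)^2, a harmonic stretch about equilibrium length l0."""
    return kb * (l - l0) ** 2
```

In a full force field the total energy is the sum of these terms over all bonds, angles, torsions, and atom pairs, and forces follow from F = −∇E.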
Molecular Dynamics Architecture

• Coordinate Repository: distribute atoms to the compute engines.
• Compute Force (multiple engines in parallel): evaluate the forces acting on each atom.
• Reduce Forces: determine the net force acting on every atom.
• Integrate Forces: calculate acceleration, velocity and the new position as a function of the computed forces:
  a = F/m,  v = ∫ a dt,  r = r₀ + ∫ v dt
  (a = acceleration, F = force, m = mass, r = position, dt = timestep, v = velocity)
• Output Coordinates: echo coordinates to a computer for storage and/or visualization.
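The pipeline above can be sketched in a few lines of Python. This is a toy sketch, not the hardware design: a per-atom harmonic restraint stands in for the real pairwise force engines, and the update follows the explicit a = F/m, v = ∫a dt, r = r₀ + ∫v dt scheme from the figure:

```python
def compute_forces(positions, k=1.0):
    # "Compute Force" engines: a stand-in F = -k*r per atom
    # (real engines evaluate bonded and nonbonded pairwise terms).
    return [-k * r for r in positions]

def integrate(positions, velocities, forces, masses, dt):
    # "Integrate Forces": a = F/m, v += a*dt, r += v*dt (explicit update).
    for i in range(len(positions)):
        a = forces[i] / masses[i]
        velocities[i] += a * dt
        positions[i] += velocities[i] * dt
    return positions, velocities

def run(positions, velocities, masses, dt, steps):
    trajectory = []
    for _ in range(steps):
        forces = compute_forces(positions)   # compute + reduce
        positions, velocities = integrate(positions, velocities, forces, masses, dt)
        trajectory.append(list(positions))   # "Output Coordinates"
    return trajectory
```

Each iteration corresponds to one timestep of the architecture: forces are computed and reduced, positions are integrated, and coordinates are echoed out for storage or visualization.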
Conventional vs. Proposed Implementations
• NAMD, a state-of-the-art MD program, employs many techniques to improve simulation performance, including:
  • Spatial decomposition to enable parallel force evaluations.
  • PM-Ewald algorithm to improve electrostatic calculation efficiency.
  • Nonbonded cutoff to reduce the number of pairwise calculations.
• Despite these enhancements, a protein folding reaction which occurs in 10⁻⁵ seconds would require about 30 years of CPU time to simulate.
• Consequently, MD simulations are often run on supercomputers or large clusters.

[Photo: 3000-node Molecular Dynamics Cluster at the Pittsburgh Supercomputing Center]
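The nonbonded cutoff is simple to illustrate: atom pairs separated by more than r_cut contribute negligibly and are skipped before any force evaluation. A minimal sketch (brute-force pair enumeration for clarity; production codes like NAMD use cell lists and neighbor lists):

```python
import itertools

def pairs_within_cutoff(coords, r_cut):
    """Return index pairs whose separation is below the nonbonded cutoff.
    coords is a list of (x, y, z) tuples; distances are compared squared
    to avoid a square root per pair."""
    kept = []
    for (i, a), (j, b) in itertools.combinations(enumerate(coords), 2):
        r2 = sum((ax - bx) ** 2 for ax, bx in zip(a, b))
        if r2 < r_cut ** 2:
            kept.append((i, j))
    return kept
```

Only the surviving pairs are passed to the expensive Lennard-Jones and electrostatic kernels, which is what makes the cutoff such an effective optimization.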
[Figure: mapping of force summations from a single CPU, to an FPGA with multiple computing cores and transceivers, to a circuit board of networked FPGAs]
• Our proposed implementation is not another attempt at building a supercomputer. It is an interdisciplinary effort to design a machine that performs MD simulations roughly 10³ times faster than the current supercomputer-based approach.
• MD is a highly-parallel problem with a large computation-to-data transfer ratio.
• Computation throughput is improved vastly by performing time-consuming computation kernels with hardware accelerators.
• Using a network of interconnected FPGAs, we achieve greater parallelization and higher integration than typical supercomputers.
• The fully-interconnected network topology, developed by profiling NAMD network activity, is ideal for transferring small data quickly.
• A single rack of FPGA-based hardware can outperform a supercomputer in MD simulation!
Hardware Components

[Figure: block diagram of two FPGAs (Chip A, Chip B) linked by multi-gigabit transceivers (TX/RX); on-chip blocks (Block A, Block B) communicate through Fast Simplex Link (FSL) interfaces alongside a CPU and force-summation engines]

• Multi-Gigabit Transceivers allow gigabit-rate communication.
• Serial protocol enables a fully-interconnected network topology.
• FPGAs combine computation and communication hardware on-chip.
• Reconfigurability = flexibility.
• High integration density possible.
• Programming model allows a mix of hardware and software elements.
• Soft-processors can be used to emulate hardware functionality.
• Ewald Electrostatic Force Engine: computes electrostatic forces in hardware with an N·log(N) algorithm; other engines use an N² algorithm.
• Lennard-Jones Potential Engine: computes van der Waals forces; outperforms existing software implementations by up to 88x.
• Fast Simplex Links abstract on-chip hardware communication; the standard interface enables rapid integration of system modules.
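The gap between N² and N·log(N) electrostatics can be made concrete with a back-of-the-envelope operation count. The constant factor on the Ewald side is illustrative only, not a measured value:

```python
import math

def all_pairs_ops(n):
    # Direct pairwise electrostatics: every atom interacts with every other,
    # giving n*(n-1)/2 pair evaluations per timestep.
    return n * (n - 1) // 2

def ewald_ops(n, c=1.0):
    # Ewald-style grid methods scale as c * N * log(N);
    # the constant c depends on the implementation and is assumed here.
    return int(c * n * math.log2(n))
```

For 100,000 atoms the all-pairs count is about 5 × 10⁹ interactions per timestep, while N·log₂(N) is under 2 × 10⁶, and the gap widens with system size.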
[Figure: prototype system diagrams. The Simulation FPGA contains a CPU, an LJ Forces engine, Reduce and Coords modules, MGT interfaces and an Ethernet interface; a Visualization FPGA drives an output terminal over an MGT link; sub-clusters of CPU nodes are joined by MGT links and optical links]
Hardware Prototypes
• Goal of initial prototype is to develop the programming model.
• Uses standardized Message-Passing Interface (MPI) software for communication.
• Accelerators are emulated using soft-processors.
• Control processor separated from computation units.
• Lennard-Jones forces only.
• Second prototype divided into fully-interconnected sub-clusters of five FPGAs.
• Optical links used to connect sub-clusters together.
• Embedded soft-processors handle control flow and perform scheduling duties.
• Each FPGA “node” contains a heterogeneous mixture of processors and accelerators.
First Generation Prototype
Second Generation Prototype
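The first-generation programming model (a control processor scattering work to soft-processor engines over MPI, with accelerators emulated in software) can be sketched with threads and queues standing in for the message-passing fabric. The function names `engine` and `control` are illustrative, not taken from the actual prototype:

```python
import threading, queue

def lj_force(r, epsilon=1.0, sigma=1.0):
    # Magnitude of the Lennard-Jones force, F = -dE/dr:
    # F = 24*eps/r * [2*(sigma/r)^12 - (sigma/r)^6].
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 ** 2 - sr6) / r

def engine(inbox, outbox):
    # Soft-processor emulating a hardware LJ accelerator: receive a block
    # of (index, distance) work items, return (index, force) results.
    block = inbox.get()
    outbox.put([(i, lj_force(r)) for i, r in block])

def control(distances, n_engines=2):
    # Control processor: scatter work round-robin, then gather results
    # (queues stand in for MPI send/recv between processors).
    inboxes = [queue.Queue() for _ in range(n_engines)]
    outbox = queue.Queue()
    workers = [threading.Thread(target=engine, args=(q, outbox)) for q in inboxes]
    for w in workers:
        w.start()
    items = list(enumerate(distances))
    for k, q in enumerate(inboxes):          # scatter
        q.put(items[k::n_engines])
    forces = {}
    for _ in workers:                        # gather
        for i, f in outbox.get():
            forces[i] = f
    for w in workers:
        w.join()
    return [forces[i] for i in range(len(distances))]
```

Separating the control processor from the computation units in this way is what lets the soft-processor engines be swapped for real hardware accelerators without changing the communication model.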