Upload
others
View
11
Download
0
Embed Size (px)
Citation preview
Scaling Quantum Mechanics and First Principles Dynamics for Accuracy and Efficiency
Todd J. Martínez SLAC National Accelerator Laboratory
Stanford University
SciDAC4 Project: Designing Photocatalysts Through Scalable Quantum Mechanics and Dynamics
Project Overview • Improve computational modeling tools for modern
architectures to enable design of improved photocatalysts • Major Thrusts
– Enable Rapid Prototyping and deployment of multi-level parallel algorithms (Alex Aiken, Kunle Olukotun, Lexing Ying)
• Legion/Regent for high-level parallel expression • DeLite to build domain specific languages for quantum chemistry
and first principle dynamics – Develop and implement modular library frameworks for large-scale
atomistic simulations (Ed Hohenstein, Todd Martínez, Robert Parrish) • Lightspeed framework to build electronic structure from highly
tuned primitives – Design of photoactivated enzymes (Ron Dror, Possu Huang, TJ Lane,
Henry van den Bedem) • Protein design with novel cofactors • QM/MM investigations of photoenzymes
FAP: CO2 Photochemistry in Nature
hv
Fatty Acid Photodecarboxylase
• Photo-activated production of alkanes
• Requirement for light: not yet clear
• Mechanism: not known
• Project goal: QM calculations to predict mechanism, verify by comparing to spectroscopy, crystallography
Chromophore in FAP
FAD
Like all photoenzymes, FAP contains an “antenna” molecule (cofactor) that absorbs photons and plays a key role in the chemistry
Exp
QM/MM
Absorption Spectrum
QM/MM calculations of excited state mechanism are in progress Large QM regions: Need fast and accurate QM!
See Poster by TJ Lane
Legion
• Task-based programming model – Heterogeneous machines – Distributed memory
• Key features – Programs are written without specifying
where computations will run and where data will be placed
– Separate mapping phase allows program to be tuned to a particular machine
• Currently working to apply Legion to electron repulsion integral generation
Task graph for one time step of a simple application
DeLite
• DeLite framework allows for easy specification of domain specific languages – Domain specific optimizations – Structure computation and data – Optimize mapping to different hardware targets such as GPUs or
FPGAs
• Working to develop DSL for J/K matrix build • Using DeLite multiscale neural network DSL to machine learn
better exchange-correlation functionals
Learn the nonlinear map from potential to density
Primitives for Electronic Structure?
k
Electron Repulsion Integrals (ERIs)
Density Matrix
SCF Energy
SCF Gradients
We built fast GPU versions of these primitives Build a framework that allows easy reuse?
Continuum Solvation and QM/MM
Lightspeed: EST Primitives ----------------------------------------- Python/C++ API ------------------------------------------
Lightspeed: Library Layout
R.M. Parrish, T.J. Martinez and co-workers, in preparation.
Geometry RHF CIS CASCI THC-PT2
Psidewinder: EST Methods -------------------------------------------- Python API ---------------------------------------------
TeraChem (GPU-based IntBox, CASBox, etc)
OpenMM (CPU/GPU-based MM)
LibXC (CPU-based XC kernels)
Box Layer: Performant Code (Embedded or Linked as needed/available) ------------------------------------------ Wrapped API -------------------------------------------
Users Students Dynamics Codes Applications Codes
Resource List Tensor Basis DFTGrid THCGrid
Data Structures (Simple C++ Classes)
IntBox GridBox DFTBox CASBox THCBox
Compute Primitives (Static Functions)
Future Extensions
Lightspeed
R.M. Parrish, T.J. Martinez and co-workers, in preparation.
Lightspeed Example: QM/MM FOMO-CASCI
Lightspeed Example: QM/MM FOMO-CASCI (continued)
And another dozen lines to production-scale AIMD (see Parrish poster)...
Key Methods:
Lightspeed Enables Rapid Prototyping: Parallax – Neglect of Fragment Differential Overlap (NFDO)
Next Target: Current State of the Art:
Single Points Dynamics
Large Gas-Phase Large PBC
At the Least, Coulomb Embedding is Required: => Parallax Idea:
Parallax: PME Approach
T. Darden, D. York, and L. Pedersen, J. Chem. Phys., 98, 10089 (1993). L. Füsti-Molnár and Peter Pulay, J. Chem. Phys., 117, 7827 (2002).
C.-M. Chang, Y. Shao, and J. Kong, J. Chem. Phys., 136, 114112 (2012).
Raw Density:
Not Fourier Representable
Gaussian-Blurred Density:
Fourier Representable
ESP of blurred density (long-range Ewald ESP):
Parallax: Full Coulomb Embedding (Long Range)
Parallax: Full Coulomb Embedding (Short Range)
=> Linear Scaling, highly but not embarrassingly parallel.
Parallax: Sensitivity to Grid/Ewald Parameters Fourier Grid Spacing: Short-Range Cutoff Distance:
Parallax: RHF/STO-3G 648-Atom Unit Cell -1.6x104 Eh SCF Energy
Parallax: Scaling for Simple Water Box
N=1 N=2 N=4
Parallax: RHF/STO-3G Timings on PBC Water Boxes (10 SCF Iterations + Gradient)
12-Core Intel [email protected] GHz/400 GB RAM
Parallax – Next Steps
• Benchmark accuracy of approach • Application to chemical reactions in solution
– Reacting system treated as a single fragment
• Extend to allow for fragmentation across covalent bonds • Extend to allow for dynamic fragmentation
Recommendation Systems
Recommendation Systems
Dark City Star Wars Zero Dark Thirty
Steel Magnolias
Joe 1 ? 2 1
Fred 5 5 1 1
Jane ? ? 5 ?
Alex ? 3 ? ?
Sara 1 ? ? 5
Need to complete the matrix… Is this possible? Only if the matrix does not have much information! Machine learning strategy: ASSUME the problem is well-posed In example above – 10 numbers out of 20, assume this is enough to determine remaining 10 numbers…
Singular Value Decomposition
= + + …
Diagonal Rank = number nonzero diagonal elements
Assume rank is small and find best low rank approximation that reproduces known entries in the matrix…
Full CI Full Configuration Interaction:
All possible arrangements of N electrons in M orbitals
Need matrix-vector products Hc, vector length NI is factorial in N,M Formal cost: Accounting for sparsity of H:
c1 +c2 +c3 +c4
Rank Sparsity in Full CI? How can we reveal noninformative nature of CI vector?
Rearrange vector to a matrix:
“α-string” “β-string”
c C
Now, play same trick from recommendation systems:
Memory requirement, Computational cost:
⊗
Note early work by Koch and later also Taylor…
Are Electronic Wavefunctions Informative?
50 terms (out of 104) are enough for kcal/mol accuracy! Even fewer terms needed
for accurate energy differences!
rr-FCI Performance
≈1010 determinants! Even the “conventional” FCI timings here are FAR faster than expected – this is because of Fales-Levine work on GPU-based Full CI
1016 determinants
Number of grains of sand on Earth: 1018
Age of the universe: 1017 sec
≈1015 seconds w/ conventional FCI
rr-FCI • Renaissance in selected CI methods – exploiting element
sparsity of CI vector (Evangelista, Head-Gordon, Umrigar, Alavi): FCI-QMC, Heat Bath-CI, Adaptive Sampling CI
• These also increase the efficiency of full CI – some are deterministic (ASCI) and others are stochastic (FCI-QMC, HBCI)
• rr-FCI exploits rank sparsity of CI vector – There is an rr-FCI value for every element of the CI vector – This is qualitatively different from approximations which neglect small
elements • Deterministic, so gradient is straightforward for dynamics • Working to improve convergence and to exploit mixed
precision
Tensor Hypercontraction
Applying decomposition techniques to 4th order tensor:
P,Q indices have approx. same range as basis
i,j,k,l are unpinned! Can sum over any of these without carrying along the others… Numerical methods to determine X and Z simultaneously… Analytic formulas to determine Z, given X Numerical methods to determine X, given Z Formal scaling of all pair methods reduced to O(N4)!
Implementation of THC
Graphical Representation of J-like term
Where to put parenthesis?
Automatic Factorization for THC
See poster by Ed Hohenstein
Conclusions • Lightspeed framework being developed to rapidly prototype new
algorithms: will provide lessons for more flexible DSLs and implementation of core “boxes” in Legion
• NFDO method being developed using the Lightspeed framework – demonstrates ability to rapidly prototype in this environment
• Rank reduced Full CI leverages low complexity of CI wavefunctions • Rank reduction techniques can also be used for electron repulsion
integrals with tensor hypercontraction – we are exploring automated code generation and optimization to implement THC algorithms
• Simulations of photoenzyme activity in FAP are underway and will be enabled/facilitated by quantum chemistry and first principles dynamics developments
Acknowledgments
• Alex Aiken • Kunle Olukotun • Lexing Ying • TJ Lane • Henry Van Den Bedem • Ron Dror • Possu Huang • Ed Hohenstein • Robert Parrish • Alice Walker • Scott Fales
$$: DOE ASCR / DOE BES
• Henrik Koch • Stefan Seritan • Benjamin Levine • Nick Settje