1
Adaptive Parallel Simulation of a Two-Timescale Model for Apoptotic Receptor-Clustering on GPUs Cooperation with M. Daub • G. Schneider Extending the Scope of Approximate Computing to Scientific Computing and Simulation Technology Dipl.-Inf. Alexander Schöll, Prof. Dr. Hans-Joachim Wunderlich E-mail: [email protected], [email protected] Institute of Computer Architecture and Computer Engineering Motivation Heterogeneous computer architectures Goal: Efficient and fault-tolerant execution of simulation applications Simulation on Reconfigurable Heterogeneous Architectures Challenges Reliability Simulation applications: often executed for days and months Modern CMOS devices: Increasingly vulnerable to reliability threats Required: Fault-tolerant simulation algorithms Achieving optimal performance Performance depends on the combination of implementation and utilized architecture Alexander Schöll Hans-Joachim Wunderlich SimTech Cluster of Excellence www.simtech.uni-stuttgart.de Approximate Computing Trade-off precision for efficiency Often limited to applications with inherent error tolerance Applying approximate computing to simulation technology Tight accuracy constraints Often low error resilience Acceleration of Markov-Chain Monte-Carlo Molecular Simulations Cooperation with Cooperation with J. Castillo • J. Groß Markov-Chain Monte-Carlo (MCMC) Core of many tasks in thermodynamics Mapping to GPU: exploiting parallel energy calculations and speculative evaluation of Monte- Carlo moves Heterogeneous mapping to CPU and GPU results in significant speedups Collaborations in SimTech Current work Molecular Configuration Motivation Deeper understanding for the activation of apoptosis Simulation: Dominated by extensive computing times Goals Reduction of computation time … to obtain extensive and detailed conclusions about the clustering behavior Computational Performance Results Adaptive discretization of time and heterogeneous mapping to CPU and GPU results in significant speedups Biological Evaluation Evolution of ligand-receptor clusters in less than 0.5s Preconditioned Conjugate Gradient (PCG): Important sparse linear system solver Iterative solving method Goal: PCG on approximate hardware with guaranteed result accuracy Challenges: Error resilience is changing over time Overhead by additional operations to monitor error resilience Solution: Use efficient fault tolerance to monitor and adapt approximation at runtime Experimental Results Hardware utilization and iteration count compared to execution on precise hardware 50% 60% 70% 80% 90% 100% 110% 120% 130% Hardware utilization Iteration count [1] A. Schöll, C. Braun, M. A. Kochte, and H.-J. Wunderlich, "Efficient Algorithm-Based Fault Tolerance for Sparse Matrix Operations", Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN'16, Toulouse, France, 28. June-1. July, 2016, pp. 251-262. [2] A. Schöll, C. Braun, and H.-J. Wunderlich, "Applying Efficient Fault Tolerance to Enable the Preconditioned Conjugate Gradient Solver on Approximate Computing Hardware”, in Proceedings of the 29th Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT'16, University of Connecticut, USA, 19. – 20. September, 2016 , pp. 21 - 26. DFTS Best Paper Award 2016. [3] A. Schöll, C. Braun, M. A. Kochte, and H.-J. Wunderlich, "Low-Overhead Fault-Tolerance for the Preconditioned Conjugate Gradient Solver", in Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT'15, Amherst, MA, USA, 12.-14. October, 2015, pp. 60-66. [4] C. Braun, S. Holst, J. Castillo, J. Groß, and H.-J. Wunderlich, "Acceleration of Monte-Carlo Molecular Simulations on Hybrid Computing Architectures", in Proceedings of the 30th IEEE International Conference on Computer Design, ICCD'12, Montreal, Canada, 30. September-3. October, 2012, pp. 207-212. [5] A. Schöll, C. Braun, M. Daub, G. Schneider, and H.-J. Wunderlich, "Adaptive Parallel Simulation of a Two-Timescale Model for Apoptotic Receptor-Clustering on GPUs", in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2014, Belfast, UK, 2.-5. November, 2014, pp. 424-431. SimTech Best Paper Award 2014. x86-64 ARM SPARC Intel MIC AMD Excavator Intel Skylake Nvidia Pascal Xilinx Zynq Xilinx Virtex Altera Stratix Central Processing Unit Graphics Processing Unit Field Programmable Gate Array CPU GPU FPGA CPU CPU GPU GPU GPU CPU CPU FPGA FPGA Approximate Computing Paradigm AC Emerging Trade-off precision for a gain in efficiency Required: Exploit inherent error tolerance of applications Approximate Computing in image processing

SimTech Cluster of Excellence - uni-stuttgart.de

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SimTech Cluster of Excellence - uni-stuttgart.de

Adaptive Parallel Simulation of a Two-Timescale Model for Apoptotic Receptor-Clustering on GPUs

Cooperation with M. Daub • G. Schneider

Extending the Scope of Approximate Computing to Scientific Computing and Simulation Technology

Dipl.-Inf. Alexander Schöll, Prof. Dr. Hans-Joachim Wunderlich

E-mail: [email protected], [email protected]

Institute of Computer Architecture and Computer Engineering

MotivationHeterogeneous computer architectures

Goal: Efficient and fault-tolerant execution of simulation applications

Simulation on Reconfigurable Heterogeneous Architectures

ChallengesReliability• Simulation applications:

• often executed for days and months

• Modern CMOS devices: • Increasingly vulnerable to reliability threats

• Required: Fault-tolerant simulation algorithms

Achieving optimal performance• Performance depends on the combination of implementation and

utilized architecture

AlexanderSchöll

Hans-JoachimWunderlich

SimTech Cluster of Excellence

www.simtech.uni-stuttgart.de

Approximate Computing• Trade-off precision for efficiency

• Often limited to applications withinherent error tolerance

Applying approximate computing tosimulation technology• Tight accuracy constraints

• Often low error resilience

Acceleration of Markov-Chain Monte-Carlo Molecular Simulations

Cooperation with Cooperation with J. Castillo • J. Groß

Markov-Chain Monte-Carlo (MCMC)• Core of many tasks in thermodynamics

• Mapping to GPU: exploiting parallel energy calculations and speculative evaluation of Monte-Carlo moves

• Heterogeneous mappingto CPU and GPU resultsin significant speedups

Collaborations in SimTech

Current work

MolecularConfiguration

Motivation• Deeper understanding for the

activation of apoptosis

Simulation: Dominated by extensive computing times

Goals• Reduction of computation time

• … to obtain extensive and detailed conclusions about the clustering behavior

Computational Performance Results• Adaptive discretization of time and heterogeneous

mapping to CPU and GPU results in significant speedups

Biological Evaluation

Evolution of ligand-receptor clusters in less

than 0.5s

Preconditioned Conjugate Gradient (PCG):Important sparse linear system solver• Iterative solving method

Goal: PCG on approximate hardware with guaranteed result accuracy

Challenges:• Error resilience is changing over time

• Overhead by additional operationsto monitor error resilience

Solution: • Use efficient fault tolerance

to monitor and adapt approximation at runtime

Experimental Results

• Hardware utilization and iteration countcompared to execution on precise hardware

50%

60%

70%

80%

90%

100%

110%

120%

130%

Hardware utilization Iteration count

[1] A. Schöll, C. Braun, M. A. Kochte, and H.-J. Wunderlich, "Efficient Algorithm-Based Fault Tolerance for Sparse Matrix Operations", Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN'16, Toulouse, France, 28. June-1. July, 2016, pp. 251-262.

[2] A. Schöll, C. Braun, and H.-J. Wunderlich, "Applying Efficient Fault Tolerance to Enable the Preconditioned Conjugate Gradient Solver on Approximate Computing Hardware”, in Proceedings of the 29th Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT'16, University of Connecticut, USA, 19. – 20. September, 2016 , pp. 21 - 26. DFTS Best Paper Award 2016.

[3] A. Schöll, C. Braun, M. A. Kochte, and H.-J. Wunderlich, "Low-Overhead Fault-Tolerance for the Preconditioned Conjugate Gradient Solver", in Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT'15, Amherst, MA, USA, 12.-14. October, 2015, pp. 60-66.

[4] C. Braun, S. Holst, J. Castillo, J. Groß, and H.-J. Wunderlich, "Acceleration of Monte-Carlo Molecular Simulations on Hybrid Computing Architectures", in Proceedings of the 30th IEEE International Conference on Computer Design, ICCD'12, Montreal, Canada, 30. September-3. October, 2012, pp. 207-212.

[5] A. Schöll, C. Braun, M. Daub, G. Schneider, and H.-J. Wunderlich, "Adaptive Parallel Simulation of a Two-Timescale Model for Apoptotic Receptor-Clustering on GPUs", in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2014, Belfast, UK, 2.-5. November, 2014, pp. 424-431. SimTech Best Paper Award 2014.

x86-64ARMSPARC

Intel MIC AMD ExcavatorIntel Skylake

Nvidia Pascal Xilinx Zynq Xilinx VirtexAltera Stratix

Central Processing Unit Graphics Processing Unit Field Programmable Gate Array

CPU GPU FPGA

CPU CPU GPU GPUGPUCPU CPU FPGAFPGA

Approximate Computing Paradigm

AC

Emerging

Trade-off precision fora gain in efficiency

Required: Exploit inherent error tolerance of applications

Approximate Computing in image processing