Henri Bal describes how high performance computing is driven by the demands of large-scale data problems. He also describes the links between high performance distributed computing and other computer science disciplines within the Data Science Research Center (DSRC).
Henri Bal, Vrije Universiteit Amsterdam
High Performance Distributed Computing
Outline
1. Development of the field
2. Highlights VU-HPDC group
3. Links to data science cycle
4. Conclusions
Developments
• Multiple types of data explosions:
  – Big data: huge processing/transportation demands
  – Complex heterogeneous data
10-100× global internet traffic per year, exascale processing
Complex data
Developments
• Infrastructure explosion
  – High complexity: heterogeneous systems with diversity of processors, systems, networks
VU HPDC GROUP
• Bridge the gap between demanding applications and complex infrastructure
• Distributed programming systems for
  – Clusters, grids, clouds
  – Heterogeneous systems ("Jungles")
  – Accelerators (GPUs)
  – Clouds & mobile devices
• Applications: multimedia, semantic web, model checking, games, astronomy, astrophysics, climate modeling, …
Highlights VU-HPDC group
• 1st Prize: SCALE 2008
• AAAI-VC 2007
• DACH 2008 - BS, DACH 2008 - FT
• 3rd Prize: ISWC 2008
• 1st Prize: SCALE 2010
• EYR 2011 Sustainability award
• Solved Awari (2002): 889 billion game states
Application areas: multimedia data, astronomy data, semantic web
Links to data science cycle
[Figure: the data science cycle]
Layers: Understand and decide; Analyze and model; Store and process
Disciplines: Reasoning, Knowledge representation, Multimedia Retrieval, Modeling and simulation, Machine Learning, Information Retrieval, Decision Theory, Perception/Cognition, Visual Analytics, Distributed Processing, Large Scale Databases, Software Eng., System/Network Eng.
Highlighted links: Distributed reasoning; Distributed computing (MapReduce)
Reasoning – Semantic Web
• Make the Web smarter by injecting meaning so that machines can “understand” it
  – Initial idea by Tim Berners-Lee in 2001
• Now attracting the interest of big IT companies
Google Example
Distributed Reasoning
• WebPIE: web-scale distributed reasoner doing full materialization
• QueryPIE: distributed reasoning with backward chaining + pre-materialization of schema triples
• DynamiTE: maintains materialization after updates (additions & removals)
Challenge: real-time incremental reasoning on web scale, combining new (streaming) data & existing historic data
With: Jacopo Urbani, Alessandro Margara, Frank van Harmelen
COMMIT/
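WebPIE computes the full materialization (the closure of all derivable triples) with distributed MapReduce jobs; its actual job layout and optimizations are not described on this slide. As a minimal sketch, assuming a tiny in-memory triple set and only two RDFS rules (rdfs9: type inheritance along rdfs:subClassOf; rdfs11: transitivity of rdfs:subClassOf), full materialization can be written as repeated map/shuffle/reduce joins until no new triple appears. The function names below are hypothetical.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def map_phase(triples):
    """Emit (join key, tagged value) pairs for rules rdfs9 and rdfs11."""
    for s, p, o in triples:
        if p == SUBCLASS:
            yield s, ("super_of_key", o)     # key subClassOf o
            yield o, ("sub_of_key", s)       # s subClassOf key
        elif p == TYPE:
            yield o, ("instance_of_key", s)  # s type key


def reduce_phase(key, values):
    """Join all tagged values that share the class `key`."""
    supers = [v for tag, v in values if tag == "super_of_key"]
    subs = [v for tag, v in values if tag == "sub_of_key"]
    instances = [v for tag, v in values if tag == "instance_of_key"]
    for b in supers:
        for x in instances:
            yield (x, TYPE, b)       # rdfs9: x type key, key subClassOf b
        for a in subs:
            yield (a, SUBCLASS, b)   # rdfs11: a subClassOf key subClassOf b


def materialize(triples):
    """Run map/shuffle/reduce rounds until a fixpoint is reached."""
    closure = set(triples)
    while True:
        groups = defaultdict(list)   # shuffle: group values by join key
        for key, value in map_phase(closure):
            groups[key].append(value)
        derived = {t for key, vals in groups.items() for t in reduce_phase(key, vals)}
        new = derived - closure
        if not new:
            return closure
        closure |= new
```

For example, from "Cat subClassOf Mammal", "Mammal subClassOf Animal" and "tom type Cat", the loop derives "tom type Mammal", "Cat subClassOf Animal" and finally "tom type Animal". This naive fixpoint re-joins everything each round; a web-scale reasoner would need to avoid that redundant work.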
Distributed Computing
• Jungle computing with Ibis
  – Distributed, heterogeneous, hierarchical systems
• Programming accelerators
With: NLeSC (Frank Seinstra, Rob van Nieuwpoort et al.)
Ibis
• Computational Astrophysics (Leiden)
• Climate Modeling (Utrecht)
• Multimedia Content Analysis (UvA)
AMUSE: radiative transport, gravitational dynamics, hydrodynamics, stellar evolution
Accelerators (GPUs)
• Use cases
  – Multimedia content analysis
  – Climate modeling
  – LOFAR (pulsar pipelines)
• Methodology for efficient GPU programming
  – Stepwise refinement, different levels of hardware abstraction
  – Compiler feedback at each level
[Figure: GPU architecture block diagram: host interface, GigaThread engine, GPCs (each with a raster engine and several SMs with polymorph engines), L2 cache, memory controllers]
Challenge: getting grip on performance
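The slide does not show the languages or abstraction levels the methodology actually uses. As an illustration only, written in Python rather than a GPU language, one computation (a sum) can be refined step by step from a hardware-agnostic statement to a form that mirrors a GPU's thread blocks and shared-memory tree reduction, with each level checked against the previous one. The level names and functions are hypothetical.

```python
def sum_level0(xs):
    """Level 0, hardware-agnostic: only state what to compute."""
    return sum(xs)


def sum_level1(xs, num_blocks):
    """Level 1: split the work over independent blocks (like GPU thread blocks)."""
    chunk = (len(xs) + num_blocks - 1) // num_blocks  # ceiling division
    partials = [sum(xs[b * chunk:(b + 1) * chunk]) for b in range(num_blocks)]
    return sum(partials)


def block_tree_reduce(values):
    """Level 2, inside one block: pairwise tree reduction, as threads would
    perform it in shared memory; the stride halves at every step."""
    shared = list(values)
    size = 1
    while size < len(shared):
        size *= 2
    shared += [0] * (size - len(shared))  # pad to a power of two
    stride = size // 2
    while stride > 0:
        for tid in range(stride):         # each iteration models one thread
            shared[tid] += shared[tid + stride]
        stride //= 2
    return shared[0]


def sum_level2(xs, block_size):
    """Level 2: each block tree-reduces its slice; the host adds the partials."""
    partials = [block_tree_reduce(xs[i:i + block_size])
                for i in range(0, len(xs), block_size)]
    return sum(partials)
```

Integer inputs make all levels agree exactly; with floating-point data, the different summation orders can give slightly different results, which is one reason to check each refinement step separately.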
Glasswing: MapReduce on Accelerators
• Use accelerators (OpenCL) as a mainstream feature
• Massive out-of-core data sets
• Scale vertically & horizontally
• Maintain MapReduce abstraction
With: Ismail El Helw, Rutger Hofman, UvA-SNE
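To make "the MapReduce abstraction" concrete: the user supplies only a map function and a reduce function, and the framework handles partitioning, grouping and parallel execution. The sketch below is plain Python, not Glasswing's API (which targets OpenCL kernels and is not shown here); all names are hypothetical.

```python
from collections import defaultdict


def word_count_map(_key, line):
    """User-supplied map: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word, counts):
    """User-supplied reduce: add up the counts for one word."""
    yield word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn, num_partitions=4):
    """Sequential stand-in for a MapReduce runtime."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in records:
        for k, v in map_fn(key, value):
            # Shuffle: hash-partition intermediate keys across reducers.
            partitions[hash(k) % num_partitions][k].append(v)
    output = {}
    for part in partitions:  # a real runtime reduces partitions in parallel
        for k in sorted(part):
            for out_k, out_v in reduce_fn(k, part[k]):
                output[out_k] = out_v
    return output
```

For instance, `run_mapreduce(enumerate(["the cat", "The dog the end"]), word_count_map, word_count_reduce)` returns the counts `{"the": 3, "cat": 1, "dog": 1, "end": 1}` (dictionary order may vary). Because the user code never sees partitions or devices, a runtime is free to run the map and reduce functions on CPUs or accelerators.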
Glasswing Pipeline
• Overlaps computation, communication & disk access
• Supports multiple buffering levels
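A pipeline that overlaps disk access, computation and communication can be sketched with one thread per stage and bounded queues as the buffers between them. This is a minimal Python illustration, assuming a hypothetical `depth` parameter for the buffer size (depth=2 is roughly double buffering); it is not Glasswing's implementation.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def run_pipeline(chunks, read, compute, send, depth=2):
    """Read, compute and send stages run concurrently on different chunks."""
    q_in, q_read, q_comp = (queue.Queue(maxsize=depth) for _ in range(3))
    q_out = queue.Queue()  # unbounded, so the last stage never blocks
    workers = [
        threading.Thread(target=stage, args=(read, q_in, q_read)),
        threading.Thread(target=stage, args=(compute, q_read, q_comp)),
        threading.Thread(target=stage, args=(send, q_comp, q_out)),
    ]
    for w in workers:
        w.start()
    for chunk in chunks:
        q_in.put(chunk)  # blocks while `depth` chunks are already waiting
    q_in.put(_DONE)
    results = []
    while (item := q_out.get()) is not _DONE:
        results.append(item)
    for w in workers:
        w.join()
    return results
```

While chunk i is being sent, chunk i+1 can be computed on and chunk i+2 read, so the slowest stage, rather than the sum of all stages, tends to set the throughput. Each stage has a single thread and FIFO queues, so output order matches input order.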
Evaluation (DAS-4, EC2)
• Compute-bound applications benefit dramatically from GPUs (up to 107×)
• Better scalability than Hadoop
• Runs on a variety of accelerators & clouds
Challenge: real-world (compute-intensive) applications
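One way to see why compute-bound applications benefit most: in a fully overlapped pipeline, each chunk costs roughly as much as its slowest stage, and a GPU only shortens the compute stage. The toy model below assumes that idealized overlap; the stage times in the usage note are hypothetical, not Glasswing measurements.

```python
def pipelined_speedup(t_io, t_compute, kernel_speedup):
    """Speedup of an overlapped pipeline when only the compute stage is
    accelerated by `kernel_speedup`; per-chunk cost = slowest stage."""
    before = max(t_io, t_compute)
    after = max(t_io, t_compute / kernel_speedup)
    return before / after
```

With hypothetical times, a compute-bound job (t_io=1, t_compute=100) gains 50× from a 50× faster kernel, but gains only 100× from a 200× faster one, because I/O becomes the bottleneck; an I/O-bound job (t_io=10, t_compute=1) gains nothing.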
Conclusions
• Strong links with Big data & Complex data