DRIVING PURDUE UNIVERSITY'S NEXT GIANT LEAP IN HPC AND DATA SCIENCE
PRESTON SMITH, EXECUTIVE DIRECTOR OF RESEARCH
[email protected]
Purdue University
Outline
Topics Today
▪ Community Cluster Program at Purdue University
▪ Integrative Data Science Initiative
▪ “Gilbreth” Supercomputer for AI, ML, and HPC
Research Computing History at Purdue
1967 – CDC 6500
• Purdue became one of the first academic institutions with a supercomputer, a Control Data Corp 6500
• Peak performance of 1/3 of a megaflop!
1983 – Cyber 205
• Purdue acquired a Cyber 205, one of the most powerful systems operated by a university at the time.
Research Computing History at Purdue
Early 90s – Intel Paragon XP/S
Late 90s – IBM SP
The Landscape, Early 2000s
... Privately run clusters were proliferating in labs and newly built datacenters across campus!
Community Clusters
The Early Years
• Without a large capital acquisition by the university, providing cutting-edge computing capabilities for researchers was not possible.
• Many faculty were getting funding to acquire, host and operate HPC resources for themselves
• Solution: pool these funds to operate clusters for researchers!
– The faculty no longer have to devote a grad student to managing their cluster, or persuade their dean to renovate computing space for them!
Community Cluster Program
The Rules
• You get out at least what you put in
– Buy 1 node or 100, you get a queue that guarantees access to up to that many nodes
• But wait, there’s more!!
– What if your neighbor isn’t using his queue?
– You can use it, but your job has to run in 4-hour chunks so he can reclaim his nodes when he wants to run.
• You don’t have to do the work
– Your grad student gets to do research rather than run your cluster.
• Nor do you have to provide space in your lab for computers.
– Central IT provides data center space, systems administration, and application support.
– Just submit jobs!
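The queue rules above can be modeled as a simple admission check. This is a minimal sketch under stated assumptions, not Purdue's actual scheduler configuration: the queue names `owner` and `standby`, the `Job` type, and the `can_start` helper are hypothetical. Only the purchased-node guarantee and the 4-hour cap on borrowed cycles come from the slide.

```python
from dataclasses import dataclass

# Walltime cap (hours) for jobs borrowing another group's idle nodes, per the slide.
STANDBY_MAX_HOURS = 4


@dataclass
class Job:
    group: str    # investing research group submitting the job
    nodes: int    # number of nodes requested
    hours: float  # requested walltime in hours
    queue: str    # "owner" or "standby" (hypothetical queue names)


def can_start(job: Job, purchased: dict, in_use: dict, idle_nodes: int) -> bool:
    """Return True if `job` may start now under the simplified community-cluster rules.

    `in_use` counts nodes each group currently uses through its owner queue.
    """
    if job.nodes > idle_nodes:
        # No job starts without free nodes. An owner job whose nodes are
        # borrowed waits for the standby job to end, at most STANDBY_MAX_HOURS.
        return False
    if job.queue == "owner":
        # Guarantee: a group can always use up to the nodes it bought.
        return in_use.get(job.group, 0) + job.nodes <= purchased.get(job.group, 0)
    if job.queue == "standby":
        # Borrowing: any idle node, but only in short chunks so the
        # owning group can reclaim its nodes quickly.
        return job.hours <= STANDBY_MAX_HOURS
    raise ValueError(f"unknown queue: {job.queue!r}")


purchased = {"lab_a": 10, "lab_b": 4}  # nodes each group bought (hypothetical)
in_use = {"lab_a": 8}                  # owner-queue nodes currently in use
print(can_start(Job("lab_a", 2, 24, "owner"), purchased, in_use, idle_nodes=6))    # True
print(can_start(Job("lab_a", 3, 24, "owner"), purchased, in_use, idle_nodes=6))    # False
print(can_start(Job("lab_b", 6, 24, "standby"), purchased, in_use, idle_nodes=6))  # False
print(can_start(Job("lab_b", 6, 4, "standby"), purchased, in_use, idle_nodes=6))   # True
```

The design choice the slide describes is visible in the two branches: the owner branch enforces a ceiling tied to what was purchased, while the standby branch imposes no node ceiling but a walltime ceiling, which is what lets idle capacity be shared without weakening the owner's guarantee.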
#166 Rice, 2015
#302 Brown, 2017
Take Giant Leaps – Sustainable Planet
Dr. Jian Jin
Jian Jin uses hyperspectral imaging to collect and analyze data about plant phenotypes.
Jin’s lab can generate 10 TB of image data each day, and image processing can be greatly accelerated by the V100 GPUs on Gilbreth.
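To put 10 TB per day in perspective, here is a back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes), data arriving evenly over 24 hours, and, for the yearly figure, a hypothetical lab producing that much every day of the year; the helper names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_s(tb_per_day: float) -> float:
    """Average ingest rate in MB/s for a daily data volume (decimal units)."""
    return tb_per_day * 1e12 / SECONDS_PER_DAY / 1e6


def yearly_volume_pb(tb_per_day: float, days: int = 365) -> float:
    """Total volume in PB if this much data were produced every day for `days` days."""
    return tb_per_day * days / 1000


print(f"{sustained_rate_mb_s(10):.1f} MB/s")  # about 115.7 MB/s, around the clock
print(f"{yearly_volume_pb(10):.2f} PB/year")   # 3.65 PB
```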
Integrative Data Science Initiative
IDSI Vision
Interdisciplinary Approach to Research
• Support structured research efforts in Data Science Theory and Fundamentals, Data-Driven Discovery, and Data Science Applications.
Pervasive Inclusion of Data Science in Education
• Establish a Data Science Education Ecosystem incorporating data science across campus. GOAL: every undergraduate completes her or his studies with relevant professional skills in data science.
• Create a physical presence for IDSI and the Education Ecosystem to promote creative collaboration through proximity and physical interaction.
Corporate & Non-Profit Engagement
• Increase data science research and education collaborations with business and industry.
Launched in 2018
Integrative Data Science Initiative
Research
• 3 Broad Areas:
– Data science theory and fundamentals
– Data-driven discovery
– Data science applications
• Themes of Excellence:
– Health and life sciences
– Agriculture
– Manufacturing
– Transportation and civil infrastructure
Data Science Fundamentals: 40+ faculty, 22 hired since 2015
Data Science Applications: 150+ faculty, 83 hired since 2015
INTERNAL FUNDING – RESEARCH
Eight data science research projects launched by IDSI
The selected projects create synergies that bring researchers together to explore data science questions at the nexus of:
§ health care;
§ defense ethics;
§ society and policy;
§ and fundamentals, methods, and algorithms
52 proposal teams comprising 172 faculty across 48 departments and colleges
Nearly $2M awarded
Integrative Data Science Initiative
Funded Research Projects
• Fingerprints of the Human Brain: A Data Science Perspective
• Quantum Machine Learning for Data Analytics and Optimization
• A Relational-Based Measure of State Legislator Consequence
Integrative Data Science Initiative
Education
• Core Data Science and Related Fields
– Increase the number and quality of graduates with core skills in data science
• Infuse Data Science Across Disciplines
– GOAL: every undergraduate completes her or his studies with relevant professional skills in data science.
• The Data Mine
– An application-centric data science living-learning community for all disciplines
• Data Science Applications Certificate Program
– A formal certificate program designed for all Purdue students
The Office of the Provost funded 12 proposals as part of the Education Ecosystem sector of IDSI.
Integrative Data Science Initiative
The Data Mine Learning Community
20 NEW LEARNING COMMUNITIES
For more information, contact Mark Daniel Ward ([email protected])
A VISION
A PLACE
AN EXPERIENCE
AN INNOVATIVE ENGINE
The first large-scale living-learning community for undergraduates from all majors, focused on Data Science for All
Corporate partners, faculty, and TAs mentor teams of 4-6 students, who develop practical solutions to open-ended, data-driven problems
Hillenbrand Hall, 800-student capacity, 100% committed to The Data Mine, with dedicated co-working space
Interdisciplinary teams bring creativity and new perspectives to tough problems where data science is a key part of the solution
300+ HOURS STUDENTS INVEST IN PARTNER PROJECTS PER ACADEMIC YEAR
33% WOMEN
616 STUDENTS DEDICATED TO DATA MINE PROJECTS
Take Giant Leaps - Health and Longevity
Dr. Wen Jiang
Dr. Wen Jiang’s research team operates multiple cryo-electron microscopy (cryo-EM) instruments.
The team uses the community clusters, Data Depot, and the V100 GPUs on Gilbreth to accelerate cryo-EM applications such as RELION and cryoSPARC.
The Landscape, early 2018
We’ve gone full circle
▪ Now, with a university strategic effort in data science,
▪ ... we see privately run clusters and GeForce GPU workstations proliferating in labs and offices across campus
Gilbreth - Community Cluster
Gilbreth
• GPU-based system ideal for machine learning, AI, and big data science, as well as FEA, chemistry, and MD
• 50 nodes, 100 GPUs
– 1 PF of single-precision performance!
• 2-3 PB parallel filesystem storage
• Flash storage
• Annual subscription fee for access
Over 40 faculty investors in one year
Prof. Lillian Moller Gilbreth
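The headline numbers can be unpacked with a little arithmetic. This sketch uses only figures from these slides; note that both are peak rates, that Gilbreth's 1 PF is quoted in single precision while the slides do not state the CDC 6500's precision, and that "1 PF" is a rounded headline figure, so the per-GPU value and the ratio are order-of-magnitude comparisons only.

```python
CDC_6500_PEAK_FLOPS = 1e6 / 3  # "1/3 of a megaflop" (1967 slide)
GILBRETH_PEAK_FLOPS = 1e15     # "1 PF of single-precision performance"
GILBRETH_NODES = 50
GILBRETH_GPUS = 100

gpus_per_node = GILBRETH_GPUS // GILBRETH_NODES              # 2
tflops_per_gpu = GILBRETH_PEAK_FLOPS / GILBRETH_GPUS / 1e12  # 10.0 on average
ratio = GILBRETH_PEAK_FLOPS / CDC_6500_PEAK_FLOPS            # about 3e9

print(f"{gpus_per_node} GPUs per node, ~{tflops_per_gpu:.0f} TFLOPS per GPU")
print(f"~{ratio:.1e} times the CDC 6500's peak")
```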
Take Giant Leaps - Aerospace
Dr. Alina Alexeenko
Developed an implementation of the multi-species Discontinuous Galerkin Fast Spectral (DGFS) method for solving the full Boltzmann equation for monatomic gas mixtures on multi-GPU/multi-CPU architectures.
Using 36 nodes of Gilbreth, Alexeenko’s research team achieved a parallel efficiency of 0.95, which would allow CFD simulations to complete in less than a day instead of months on CPU platforms.
http://www.cfd.tu-berlin.de/~panek/cfd/jet2.png
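Parallel efficiency is conventionally defined for strong scaling as E = T_1 / (p · T_p), so the speedup is S = T_1 / T_p = E · p. The slide does not say what the baseline T_1 was; this sketch assumes a one-node baseline with p = 36 nodes, and the 100-hour runtime is a hypothetical input chosen only to illustrate the calculation.

```python
def parallel_efficiency(t_base: float, t_parallel: float, p: int) -> float:
    """Strong-scaling efficiency E = T_1 / (p * T_p), with T_1 the baseline runtime."""
    return t_base / (p * t_parallel)


def speedup(efficiency: float, p: int) -> float:
    """Speedup S = T_1 / T_p = E * p."""
    return efficiency * p


P = 36             # Gilbreth nodes used (from the slide)
E = 0.95           # reported parallel efficiency (from the slide)
S = speedup(E, P)  # about 34.2x over the assumed one-node baseline

t_one_node = 100.0           # hypothetical one-node runtime, in hours
t_36_nodes = t_one_node / S  # about 2.9 hours
print(f"speedup ~{S:.1f}x, runtime {t_one_node:.0f} h -> {t_36_nodes:.1f} h")
print(f"check: E = {parallel_efficiency(t_one_node, t_36_nodes, P):.2f}")
```

An efficiency of 0.95 means only about 5% of the ideal 36x is lost to communication and load imbalance. The "less than a day vs months" comparison on the slide is a separate GPU-versus-CPU claim and is not derived from this calculation.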
The next Giant Leap?
The Goal
Enable discovery at Purdue that previously was not possible!
▪ By bringing the GPU workstations proliferating in labs in from the cold and into the campus ecosystem