Upload
jamie-kinney
View
198
Download
1
Embed Size (px)
Citation preview
Accelerating Time to Science:Transforming Research in the Cloud
Jamie Kinney - @jamiekinneyDirector of Scientific Computing, a.k.a. “SciCo” – Amazon Web Services
Why does Amazon care about Scientific Computing?• In order to meaningfully change our world for the better by accelerating the pace of scientific discovery
• It is a great application of AWS with a broad customer base
• The scientific community helps us innovate on behalf of all customers– Streaming data processing & analytics– Exabyte scale data management solutions and exaflop scale compute– Collaborative research tools and techniques– New AWS regions– Significant advances in low-power compute, storage and data centers– Efficiencies which will lower our costs and therefore pricing for all customers
Why Did We Create SciCo?
In order to make it easy to find, discover and use AWS for scientific computing at any scale. Specifically, SciCo helps AWS:
• More effectively support global “Big Science” Collaborations• Develop a solution-centric focus for engaging with the global scientific and engineering communities
• Accelerate the development of a scientific computing ecosystem on AWS• Educate and Evangelize our role in Scientific Computing
How is AWS Used for Scientific Computing?
• High Performance Computing (HPC) for Engineering and Simulation• High Throughput Computing (HTC) for Data-Intensive Analytics• Collaborative Research Environments• Monte-Carlo Simulations• Data Visualization• Hybrid Supercomputing centers• Citizen Science• Science-as-a-Service• Internet of Things (IOT)• Serverless Computing
Why do researchers love using AWS?
Time to ScienceAccess research
infrastructure in minutes
Low CostPay-as-you-go pricing
ElasticEasily add or remove capacity
Globally AccessibleEasily Collaborate with
researchers around the world
SecureA collection of tools to
protect data and privacy
ScalableAccess to effectively
limitless capacity
Research Grants
AWS provides free usage credits to help researchers:• Teach advanced courses• Explore new projects• Create resources for the scientific community
aws.amazon.com/grants
AWS hosts “gold standard” reference data at our expense in order to catalyze rapid innovation and increased AWS adoptionA few examples:1,000 Genomes ~250 TB
Cancer Genomics Data Sets ~5 PB (TCGA & ICGC )Astronomy Data and APIs 1PB+Common CrawlOpenStreetMapCensus Data
Climate, Meteorology, & Earth Observing DataIRRI 3000 Rice Genome Project
Public Data Sets
• NEXRAD level-II archives going back to 1991
• Real-time data via Unidata
• Tools and tutorials:
• Using Python with NEXRAD on AWS
• THREDDS Data Server
• Creating Static and Animated Maps with NEXRAD on AWS
• AWS JavascriptS3 Explorer
• NOAA’s Weather & Climate Toolkit
• Unidata’s Integrated Data Viewer
NEXRAD on AWS
https://aws.amazon.com/noaa-big-data/nexrad/
• Raw and processed genomic, transcriptomic, and epigonomic data from thousands of cancer patients
• Controlled-access data sets available to qualified researchers with access administered by Seven Bridges Genomics and the Ontario Institute for Cancer Research
• Available to users of the NCI’s Cancer Genomics Cloud (CGC) running on AWS
• Users can interact with the data via the CGC web portal or the CGC’s APIs
• The PanCancer Launcher is an open-source system to create EC2 instances, enqueueanalysis work items, trigger Docker-based analytic pipelines, and clean up launched resources when work is complete.
The Cancer Genome Atlas (TCGA) and ICGC
https://aws.amazon.com/public-data-sets/tcga/https://aws.amazon.com/public-data-sets/icgc/https://aws.amazon.com/blogs/aws/new-aws-public-data-sets-tcga-and-icgc/
High Throughput Computing at Scale
The Large Hadron Collider @ CERN includes 6,000+ researchers from over 40 countries and produces approximately 25PB of data each year.
The ATLAS and CMS experiments are using AWS for Monte Carlo simulations, processing, and analysis of LHC data.
Data-Intensive ComputingThe Square Kilometer Array will link 250,000 radio telescopes together, creating the world’s most sensitive telescope. The SKA will generate zettabytesof raw data, publishing exabytes annually over 30-40 years.
Researchers are using AWS to develop and test: • Data processing pipelines• Image visualization tools• Exabyte-scale research data management• Collaborative research environmentsaws.amazon.com/solutions/case-studies/icrar/
Astrocompute in the Cloud Program• AWS is adding 1PB of SKA pre-cursor data to the Amazon Public Data Sets program
• We are also providing $500K in AWS Research Grants for the SKA to direct towards projects focused on:– High-throughput data analysis– Image analysis algorithms– Data mining discoveries (i.e. ML, CV and
data compression)– Exascale data management techniques– Collaborative research enablement
https://www.skatelescope.org/ska-aws-astrocompute-call-for-proposals/
NepalEarthquake
Individuals around the world
are analyzing before/after imagery of Kathmandu
in order to more-effectively direct
emergency response and
recovery efforts
Estimating Trees & Shrubs in Sub-Sahara
• NASA’s Center for Climate Simulation (NCCS) used AWS to process satellite imagery in order to estimate the tree and brush biomass over the entire arid and semi-arid zone on the south side of the Sahara.
• By using AWS, NASA was able to reduce the processing time from ~1 year to less than a month.• By estimating biomass, NASA is able to develop more accurate climate models which will help us
better understand and mitigate the effects of climate change.
High Performance Computing
Simulations in the Automotive Sector• Crash and materials simulations• Fluid and thermal dynamics simulations• Car body aerodynamics• Electronics and electromagnetic simulations
Honda materials science simulations on AWS:• Deploying scalable HPC clusters on AWS Spot – up to 1000 C3 instances• Running more simulations than before, for more accurate results
“Cloud offers us an opportunity, as we can innovate faster than before.”- Ayumi Tada, IT System Administrator, Honda R&D
Social Analytics on AWSWe Feel is a project that explores whether social media can provide an accurate, real-time signal of the world’s emotional state that analyzes approximately 27 million tweets/day.
A collaboration between CSIRO, The Black Dog Institute, Amazon Web Services and GNIP.
The outcomes?1. We can now monitor, in real-time, the emotional
health of the world2. Seamlessly scale infrastructure up or down in
direct relation to social activity3. Amazon’s Big Data platform enables real-time
trend analysis, queries of historical data and geospatial analytics
http://wefeel.csiro.au/
Baylor College of Medicine CHARGEBaylor College of Medicine Human Genome Sequencing Center and DNANexus using the Mercury Pipeline for the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium
Supports 300+ researchers around the world
Analyzed the genomes of over 14,000 individuals, encompassing 3,751 whole genomes and 10,940 whole exomes (~1PB of data)
Used 3.3 million core hours over 4 weeks to complete the job 5.7x faster than what could have been accomplished on-premise
The outcomes?• Easier collaboration• Faster time to science• Cost-effective: On-premise was prohibitively expensive• No longer constrained by on-premise capacity• Scientists focusing on Science as opposed to infrastructure
https://aws.amazon.com/solutions/case-studies/baylor/
Schrodinger & Cycle Computing:Computational Chemistry for Better Solar Power
Simulation by Mark Thompson of the University of Southern California to see which of 205,000 organic compounds could be used for photovoltaic cells for
solar panel material.
Estimated computation time 264 years completed in 18 hours.
• 156,314 core cluster, 8 regions
• 1.21 petaFLOPS (Rpeak)
• $33,000 or 16¢ per molecule
Loosely Coupled
Science-as-a-Service
Globus Genomics, DNAnexus, and SevenBridges Genomics offer inexpensive, easy-to-use, and secure platforms for processing and analyzing genomic data.
The Weather Company pushes four gigabytes of data to AWS each second in order to delivers 15 billion forecasts each day to their customers around the world.
aws.amazon.com/solutions/case-studies/the-weather-company/
Citizen ScienceThe Asteroid Data Hunters competition used AWS to develop better mechanisms for finding near-Earth asteroids. The top algorithm is 18% better at finding asteroids!
Internet of Things• Connect and manage devices• Secure device connections & data• Process & act upon device data• Read & set device state at any time
https://aws.amazon.com/iot
John Deere’s Vice President of Technology Information Solutions Patrick Pinkston describing how the 178-year-old company is using Amazon’s cloud.
John Deere is using AWS IoT solutions to help farmers track planting a crop downto the seed, both at the individual seed level and by the acreage of the field.
https://www.youtube.com/watch?v=uq4kQPsM4cQ