The title of this talk is a crass attempt to be catchy and topical, by referring to the recent victory of Watson in Jeopardy. My point (perhaps confusingly) is not that new computer capabilities are a bad thing. On the contrary, these capabilities represent a tremendous opportunity for science. The challenge that I speak to is how we leverage these capabilities without computers and computation overwhelming the research community in terms of both human and financial resources. The solution, I suggest, is to get computation out of the lab: to outsource it to third-party providers.

Abstract follows:

We have made much progress over the past decade toward effective distributed cyberinfrastructure. In big-science fields such as high energy physics, astronomy, and climate, thousands benefit daily from tools that enable the distributed management and analysis of vast quantities of data. But we now face a far greater challenge. Exploding data volumes and new research methodologies mean that many more (ultimately most?) researchers will soon require similar capabilities. How can we possibly supply information technology (IT) at this scale, given constrained budgets? Must every lab become filled with computers, and every researcher an IT specialist?

I propose that the answer is to take a leaf from industry, which is slashing both the costs and complexity of consumer and business IT by moving it out of homes and offices to so-called cloud providers. I suggest that by similarly moving research IT out of the lab, we can realize comparable economies of scale and reductions in complexity, empowering investigators with new capabilities and freeing them to focus on their research. I describe work we are doing to realize this approach, focusing initially on research data lifecycle management. I present promising results obtained to date, and suggest a path towards large-scale delivery of these capabilities.
I also suggest that these developments are part of a larger "revolution in scientific affairs," as profound in its implications as the much-discussed "revolution in military affairs" resulting from more capable, low-cost IT. I conclude with some thoughts on how researchers, educators, and institutions may want to prepare for this revolution.
www.ci.anl.gov www.ci.uchicago.edu
So long, computer overlords
How Cloud (and Grid) can liberate research IT – and transform discovery
Ian Foster
The data deluge
1330 molecular biology databases (Nucleic Acids Research; 96 in Jan 2001)
Genomic sequencing output x2 every 9 months
>300 public centers
100,000 TB
MACHO et al.: 1 TB
Palomar: 3 TB
2MASS: 10 TB
GALEX: 30 TB
Sloan: 40 TB
Pan-STARRS: 40,000 TB
Climate model intercomparison project (CMIP) of the IPCC
2004: 36 TB
2012: 2,300 TB
Big science has achieved big successes
All build on NSF OCI (& DOE)-supported Globus Toolkit software
LIGO: 1 PB data in last science run, distributed worldwide
ESG: 1.2 PB climate data delivered to 23,000 users; 600+ pubs
OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010
• Robust production solutions
• Substantial teams and expense
• Sustained, multi-year effort
• Application-specific solutions, built on common technology
But small science is struggling
• More data, more complex data
• Ad-hoc solutions
• Inadequate software, hardware
• Data plan mandates
Medium-scale science struggles too!
• Dark Energy Survey receives 100,000 files each night in Illinois
• They transmit files to Texas for analysis … then move results back to Illinois
• Process must be reliable, routine, and efficient
• The cyberinfrastructure team is not large
Image credit: Roger Smith/NOAO/AURA/NSF
Blanco 4m on Cerro Tololo
The challenge of staying competitive
"Well, in our country," said Alice … "you'd generally get to somewhere else — if you run very fast for a long time, as we've been doing.”
"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
Current approaches are unsustainable
• Small laboratories
– PI, postdoc, technician, grad students
– Estimate 5,000 across US university community
– Average ill-spent/unmet need of 0.5 FTE/lab?
• Medium-scale projects
– Multiple PIs, a few software engineers
– Estimate 500 across US university community
– Average ill-spent/unmet need of 3 FTE/project?
• Total 4000 FTE: at ~$100K/FTE => $400M/yr
Plus computers, storage, opportunity costs, …
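The total on this slide follows directly from the per-lab and per-project guesses. A minimal sketch of that arithmetic, using only the slide's own estimates (the constant and function names are illustrative, not from the talk):

```python
# Back-of-the-envelope estimate; every input is a guess taken from the slide.
SMALL_LABS = 5_000        # estimated small laboratories across US universities
FTE_PER_LAB = 0.5         # ill-spent/unmet need per lab (FTE)
MEDIUM_PROJECTS = 500     # estimated medium-scale projects
FTE_PER_PROJECT = 3       # ill-spent/unmet need per project (FTE)
COST_PER_FTE = 100_000    # roughly $100K per FTE per year


def total_fte() -> float:
    """Sum the unmet need over labs and projects, in FTE."""
    return SMALL_LABS * FTE_PER_LAB + MEDIUM_PROJECTS * FTE_PER_PROJECT


def annual_cost() -> float:
    """Convert the FTE total into dollars per year."""
    return total_fte() * COST_PER_FTE


print(f"{total_fte():.0f} FTE, ${annual_cost() / 1e6:.0f}M/yr")
# prints: 4000 FTE, $400M/yr
```

Labs contribute 2,500 FTE and projects 1,500 FTE, so the small laboratories account for the larger share of the estimate even though each one needs far less help.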
And don’t forget administrative costs
42% of the time spent by an average PI on a federally funded research project was reported to be expended on administrative tasks related to that project rather than on research — Federal Demonstration Partnership faculty burden survey, 2007
You can run a company from a coffee shop
Because businesses outsource their IT
• Web presence
• Email (hosted Exchange)
• Calendar
• Telephony (hosted VOIP)
• Human resources and payroll
• Accounting
• Customer relationship mgmt
Software as a Service (SaaS)
And often their large-scale computing too
• Web presence
• Email (hosted Exchange)
• Calendar
• Telephony (hosted VOIP)
• Human resources and payroll
• Accounting
• Customer relationship mgmt
• Data analytics
• Content distribution
Infrastructure as a Service (IaaS)
Software as a Service (SaaS)
Let’s rethink how we provide research IT
Accelerate discovery and innovation worldwide by providing research IT as a service
Leverage software-as-a-service to
• provide millions of researchers with unprecedented access to powerful tools;
• enable a massive shortening of cycle times in time-consuming research processes; and
• reduce research IT costs dramatically via economies of scale
so long,
computer overlords
Time-consuming tasks in science
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
• …
Data movement can be surprisingly difficult
[Diagram: data transfer from site A to site B]
Discover endpoints, determine available protocols, negotiate firewalls, configure software, manage space, determine required credentials, configure protocols, detect and respond to failures, determine expected performance, determine actual performance, identify, diagnose, and correct network misconfigurations, integrate with file systems, …
It took 2 weeks and much help from many people to move 10 TB between California and Tennessee.
(2007 BES report)
Globus Online’s SaaS/Web 2.0 architecture
• Fire-and-forget data movement
• Automatic fault recovery
• High performance
• No client software install
• Across multiple security domains
Web interface
HTTP REST interface: POST https://transfer.api.globusonline.org/v0.10/transfer <transfer-doc>
Command line interface:
ls alcf#dtn:/
scp alcf#dtn:/myfile \
    nersc#dtn:/myfile
GridFTP servers
FTP servers
Other protocols: HTTP, WebDAV, SRM, …
Globus Connect on local computers
(Hosted on)
(Operate)
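To make "fire-and-forget" with "automatic fault recovery" concrete, here is a minimal generic sketch of the idea: the caller hands over a copy operation once, and a wrapper retries it with exponential backoff when a network-style error occurs. This is not Globus Online's implementation or API; `transfer_with_recovery`, `make_flaky_copy`, and all parameters are hypothetical names invented for illustration.

```python
import time
from typing import Callable


def transfer_with_recovery(copy_once: Callable[[], None],
                           max_attempts: int = 5,
                           base_delay_s: float = 0.0) -> int:
    """Call copy_once until it succeeds; return the number of attempts used.

    OSError (which includes ConnectionError) is treated as a recoverable
    fault. The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            copy_once()
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base, 2*base, 4*base, ...
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def make_flaky_copy(failures: int) -> Callable[[], None]:
    """Hypothetical stand-in for a real copy: fails `failures` times, then succeeds."""
    remaining = {"n": failures}

    def copy_once() -> None:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise ConnectionError("simulated network drop")

    return copy_once


attempts = transfer_with_recovery(make_flaky_copy(2))
print(attempts)  # prints: 3
```

A hosted service can run a loop of this kind on the user's behalf, which is what lets the user submit a request and walk away; a real system would also need checkpointing and credential handling, which this sketch omits.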
Example application: UC sequencing facility
Sequencing instrument
Mac using Globus Connect
iBi File Server
iBi general-purpose compute cluster
Sequencing-specific compute cluster
Mount drive
Delivery of data to customer
Statistics and user feedback
• Launched November 2010
– >1400 users registered
– >350 TB user data moved
– >28 million user files moved
– >140 endpoints registered
• Widely used on TeraGrid/XSEDE; other centers & facilities; internationally
• >20x faster than SCP
• Faster than hand-tuned
“Last time I needed to fetch 100,000 files from NERSC, a graduate student babysat the process for a month.”
“I expected to spend four weeks writing code to manage my data transfers; with Globus Online, I was up and running in five minutes.”
“Globus Online’s speed has us planning experiments that we would never have considered previously.”
Moving 586 Terabytes in two weeks
Monitoring provides deep visibility
[Plot: monitored transfers; axis scale from Kilobyte to Terabyte]
20 Terabytes in less than one day
20 Gigabytes in more than two days
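The two annotations above imply very different average rates. A short sketch of the conversion, assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and treating "less than one day" and "more than two days" as exactly one and two days, so the results are bounds rather than measurements:

```python
SECONDS_PER_DAY = 86_400
TB = 1e12  # decimal terabyte, an assumption
GB = 1e9   # decimal gigabyte, an assumption


def avg_rate_bytes_per_s(total_bytes: float, days: float) -> float:
    """Average transfer rate for total_bytes moved over the given number of days."""
    return total_bytes / (days * SECONDS_PER_DAY)


fast = avg_rate_bytes_per_s(20 * TB, 1)  # lower bound: it took less than a day
slow = avg_rate_bytes_per_s(20 * GB, 2)  # upper bound: it took more than two days

print(f"fast transfer: at least {fast / 1e6:.0f} MB/s")
print(f"slow transfer: at most {slow / 1e3:.0f} KB/s")
print(f"ratio: at least {fast / slow:.0f}x")
# prints:
# fast transfer: at least 231 MB/s
# slow transfer: at most 116 KB/s
# ratio: at least 2000x
```

A gap of three orders of magnitude between two transfers is the kind of anomaly that per-transfer monitoring is meant to surface.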
Common research data management steps
• Dark Energy Survey
• Galaxy genomics
• LIGO observatory
• SBGrid structural biology consortium
• NCAR climate data applications
• Land use change; economics
We have choices of where to compute
• Campus systems
– First target for many researchers
• XSEDE supercomputers
– 220,000 cores, peer-reviewed awards
– Optimized for scientific computing
• Open Science Grid
– 60,000 cores; high throughput
• Commercial cloud providers
– Instant access for small tasks
– Expensive for big projects
Users insist that they need everything connected
Towards “research IT as a service”
Research data management as a service
• GO-User
– Credentials and other profile information
• GO-Transfer
– Data movement
• GO-Team
– Group membership
• GO-Collaborate
– Connect to collaborative tools: Jira, Confluence, …
• GO-Store
– Access to campus, cloud, XSEDE storage
• GO-Catalog
– On-demand metadata catalogs
• GO-Compute
– Access to computers
• GO-Galaxy
– Share, create, run workflows
Today
Fall
Prototype
SaaS services in action: The XSEDE vision
XUAS
Data analysis as a service: Early steps
Securely and reliably:
1. Assemble code
2. Find computers
3. Deploy code
4. Run program
5. Access data
6. Store data
7. Record workflow
8. Reuse workflow
[3, 4]
VM image, App code, Workflow, Galaxy, Condor
Data store
[5, 6]
We have built such systems for biological, environmental, and economics researchers
[1, 2]
[7, 8]
SaaS economics: A quick tutorial
• Lower per-user cost (x10?) via aggregation onto common infrastructure
– $400M/yr => $40M/yr?
• Initial “cost trough” due to fixed costs
• Per-user revenue permits positive return to scale
• Further reduce per-user cost over time
[Plot: $ versus time, starting at 0]
x10 reduction in per-user cost:
– $50K => $5K/yr per lab
– $300K => $30K/yr per project
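The per-lab and per-project figures tie back to the national total. A small sketch, assuming the lab and project counts from the earlier "Current approaches are unsustainable" slide (5,000 labs, 500 projects) and treating the tenfold reduction as a conjecture, as the slide itself does; the names are illustrative:

```python
LABS = 5_000                # from the earlier estimate in this talk
PROJECTS = 500              # from the earlier estimate in this talk
COST_PER_LAB = 50_000       # $/yr today (0.5 FTE at ~$100K)
COST_PER_PROJECT = 300_000  # $/yr today (3 FTE at ~$100K)
REDUCTION = 10              # conjectured x10 saving from aggregation


def national_total(per_lab: int, per_project: int) -> int:
    """Total yearly cost across all labs and projects, in dollars."""
    return LABS * per_lab + PROJECTS * per_project


before = national_total(COST_PER_LAB, COST_PER_PROJECT)
after = national_total(COST_PER_LAB // REDUCTION, COST_PER_PROJECT // REDUCTION)

print(f"before: ${before / 1e6:.0f}M/yr, after: ${after / 1e6:.0f}M/yr")
# prints: before: $400M/yr, after: $40M/yr
```

The sketch covers only the steady-state per-user cost; it does not model the initial "cost trough" from fixed costs that the slide also mentions.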
A national cyberinfrastructure strategy?
[Diagram: many nodes labeled L and P, captioned "Small and medium laboratories and projects", connected to "aaS" layers: Research data management; Collaboration, computation; Research administration]
• To provide more capability for more people at less cost …
• Create infrastructure
– Robust and universal
– Economies of scale
– Positive returns to scale
• Via the creative use of
– Aggregation (“cloud”)
– Federation (“grid”)
Acknowledgments
• Colleagues at UChicago and Argonne
– Steve Tuecke, Ravi Madduri, Kyle Chard, Tanu Malik, Michael Russell, Paul Dave, Stuart Martin, Dan Katz, and many others
• Colleagues at other institutions– Carl Kesselman, Miron Livny, John Towns, and others
• NSF OCI, MPS, and SBE; DOE ASCR; and NIH for support
For more information
• Foster, I. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Computing (May/June):70-73, 2011.
• Allen, B., Bresnahan, J., Childers, L., Foster, I., Kandaswamy, G., Kettimuthu, R., Kordas, J., Link, M., Martin, S., Pickett, K. and Tuecke, S. Globus Online: Radical Simplification of Data Movement via SaaS. Communications of the ACM, 2011.