This presentation examines some of the challenges scientists face and describes various cyberinfrastructure technologies that help address these challenges. Example projects employing cyberinfrastructure technologies that we have worked on at the Grid Research Centre, including the GeoChronos project, are also presented. This presentation was given at the IAI International Wireless Sensor Networks Summer School held at the University of Alberta on July 6th, 2009.
Cyberinfrastructure and its Role in Science
Cameron Kiddle
Research Fellow, Grid Research Centre
Adjunct Assistant Professor, Department of Computer Science, University of Calgary
Distributed Systems Architect, WestGrid
Outline
  Challenges
  Cyberinfrastructure
  Cyberinfrastructure Technologies
  Examples
    ICE Force Project
    Molecular Dynamics Simulations
    GT4-based Grid for Canada
    Fire Dynamics Simulator
    Rendering on the Cloud
    GeoChronos
IAI Summer School July 6, 2009
Cyberinfrastructure - 2
Collaboration Challenges
  Familiarity/awareness of collaboration tools
  Keeping all interested parties in the loop
  Finding related work and researchers
  Keeping up to date with current research
  Collaboration while working in the field
Data Challenges
  Acquisition of data
    Many different data sources
    Large quantities of data
    Different regulations/mechanisms for accessing data
    Lack of automation
    Finding the right data
    Bandwidth constraints
  Managing data
    Scattered and unorganized data
    Inadequate tools for recording/maintaining metadata
      Data without metadata is meaningless
      Lack of suitable metadata standards
      Validation of metadata
    Tracking provenance of data
  Pre-processing of data
    Raw data typically cannot be directly analyzed
    Significant amount of time spent preparing data for analysis
    Lack of automation
Application Challenges
  Limited availability of computing resources
  Access to and familiarity with heterogeneous computing resources
  Fault tolerance and reliability
  Access to software available in the research lab while in the field or at other locations
  Installing, configuring and updating software
  System dependencies of software
  Awareness and suitability of available software
  Sharing applications and results
Cyberinfrastructure
“Like the physical infrastructure of roads, bridges, power grids, telephone lines, and water systems that support modern society, "cyberinfrastructure" refers to the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor.”
Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure, 2003.
Cyberinfrastructure Technologies
  Grid Computing
  Cloud Computing
  Virtualization
  Web 2.0 / Social Networking
  Web Portals / Scientific Gateways
  Semantic Web
  …
Grid Computing
  Many different definitions/uses
    computational grids, data grids, desktop grids, campus grids, sensor grids, access grids
  Coordinated sharing of heterogeneous resources across administrative domains
  [Figure: resources in Domain A, Domain B and Domain C shared by Virtual Organization X and Virtual Organization Y]
Grid Middleware
  The layer between users/applications and grid resources that glues everything together
  Example grid middleware
    Globus Toolkit
      GT2 – pre-standards
      GT4 – Web Services based
    UNICORE
    gLite
    ARC
    NAREGI
Key Grid Middleware Services
  Security Services
    Concerned with authentication, authorization, secure communication, …
  Information Services
    Provide information about resources, policy, services and applications to tools and users
  Data Management Services
    Manage movement and replication of data as well as metadata about data
  Execution Management Services
    Handle placement, provisioning and lifetime management of jobs and workflows
Benefits of Grid Computing
  Easier access to more resources
    Users/organizations can share resources
    Single sign-on
    Common interface (hide heterogeneity)
  Improved data management
    Efficient file transfers
    Abstraction of physical location of data
  Automated execution of jobs and workflows
Example Grid Projects
  Name: LHC Computing Grid (http://lcg.web.cern.ch/)
  Description: data storage and analysis infrastructure for the high energy physics community using the Large Hadron Collider (LHC) at CERN (ATLAS Tier-1 site at TRIUMF in British Columbia)

  Name: Network for Earthquake Engineering Simulation (NEES) (http://www.nees.org/)
  Description: a US national network of 15 facilities to study the impact of earthquakes on buildings, bridges, etc.

  Name: Expanding GEOsciences on DEmand (EGEODE) (http://www.egeode.org/)
  Description: a virtual organization (VO) associated with EGEE that is dedicated to research in geoscience for both public and private industrial R&D and academic laboratories

  Name: International Virtual Observatory Alliance (IVOA) (http://www.ivoa.net/)
  Description: development of standards and infrastructure to share and analyze astronomical archives from around the world
Cloud Computing
  Transparent access to scalable and dynamic services over the Internet
  Key features:
    Everything as a Service (EaaS)
    Utility/On-demand
    Accessibility/Transparency
    Scalability
    Virtualization
Cloud Computing Solutions
Benefits of Cloud Computing
  Reduce capital, support and maintenance costs
    Pay only for what you use
    Get access to more/fewer resources when needed
  Ready to use for users
    No more downloads, installations or updates
  Simplify and speed up software development
    Don’t have to support multiple platforms
  Application popularity and lifespan difficult to predict
    Scale applications according to user demand
Cloud Computing Case Study: Application Popularity on Facebook
  Difficult to predict popularity and lifespan of applications
  Facebook Application Growth
    Sep. 2007: ~3700
    Sep. 2008: ~39000
  Facebook Application Popularity (Sep. 12, 2008)
    39181 applications
    Active user data for 37155 apps
    3 apps > 10 million active users
    80% of apps < 1000 active users
  [Figure: Monthly Active Users vs. Rank of Facebook Applications (September 12, 2008)]
Cloud Computing Case Study: Shrek (Dreamworks)
  Shrek (2001) – 5 million CPU render hours
  Shrek 2 (2004) – 10 million CPU render hours
  Shrek 3 (2007) – 20 million CPU render hours

  Time to Render
             1 CPU         100 CPUs      10000 CPUs
  Shrek      571 years     5.7 years     21 days
  Shrek 2    1142 years    11.4 years    42 days
  Shrek 3    2283 years    22.8 years    83 days

(Source: R. Rowe. DreamWorks Animation "Shrek the Third": Linux Feeds an Ogre. Linux Journal. June 5, 2007. (http://www.linuxjournal.com/article/9653))
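
The render times in the table above can be reproduced by dividing the CPU render hours by the number of CPUs. The sketch below assumes perfect linear speed-up and a 365-day year; neither assumption is stated on the slide, and the function names are illustrative only.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 365 * HOURS_PER_DAY  # assumption: 365-day year

# CPU render hours as quoted on the slide.
CPU_HOURS = {
    "Shrek": 5_000_000,
    "Shrek 2": 10_000_000,
    "Shrek 3": 20_000_000,
}


def wall_clock_hours(cpu_hours, cpus):
    """Ideal wall-clock hours, assuming the work divides evenly across CPUs."""
    return cpu_hours / cpus


def describe(cpu_hours, cpus):
    """Format the ideal render time in years, or in days if under a year."""
    hours = wall_clock_hours(cpu_hours, cpus)
    if hours >= HOURS_PER_YEAR:
        return f"{hours / HOURS_PER_YEAR:.1f} years"
    return f"{hours / HOURS_PER_DAY:.0f} days"


for film, cpu_hours in CPU_HOURS.items():
    print(film, [describe(cpu_hours, n) for n in (1, 100, 10_000)])
```

Rounding the single-CPU figures to whole years gives the values shown in the table. Real render farms will not scale perfectly, so these are lower bounds on elapsed time.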
Cloud Computing Case Study: Animoto
  Animoto (http://animoto.com)
    Produces professional quality videos from images
    Runs on Amazon EC2
  Popularity soared when promoted on Facebook
  During the course of 4 days:
    Jumped from 8 to 450 renderings per minute
    ~20000 new users per hour
    3500 instances running on Amazon EC2 at peak
(Source: D. Barker. You Need 3,500 Servers by When?! On-demand Enterprise. 2008.07.07)
Virtualization
  Can transform a single physical machine into multiple virtual machines (VMs), each with its own OS and software stack
  Virtualization software
    Xen, KVM, VMware
    Support allocation, deallocation, checkpointing and migration of VMs
  Benefits
    Custom environments (root access)
    More efficient use of resources (consolidation)
    System maintenance without disruption
Web 2.0 – The “Social Web”
  Aimed at:
    Providing feature rich user environments
    Making it easier for users to generate Web content
    Improving online social connectivity
  Example Web 2.0 technologies
    Blogs (WordPress, TypePad)
    Wikis (Wikipedia)
    Mashups (HousingMaps, ChicagoCrime)
    Widgets/Gadgets (iGoogle, Netvibes)
    Social networks (Facebook, MySpace, YouTube)
Social Networking Sites/Platforms
Web Portals / Scientific Gateways
  Aimed at providing a community of users access to computing resources through a common Web-based interface
  Web portal development tools
    GridSphere (portlet based)
    Web 2.0/Social Networking
  Examples
    TeraGrid Scientific Gateways (over 30 of them)
    nanoHUB
Semantic Web
  Aimed at representing knowledge, not just information
  Connecting and relating data in a way understandable by machines
  Semantic Web standards
    Resource Description Framework (RDF)
    Web Ontology Language (OWL)
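
RDF expresses knowledge as subject-predicate-object triples, and class hierarchies let a machine infer facts that were never stated directly. The following is a minimal sketch of that idea using plain Python tuples rather than an RDF library. All identifiers (dataset42, sensorA, collectedBy and so on) are hypothetical, plain strings stand in for real URIs, and the subclass walk imitates only a very small part of what RDF Schema and OWL provide.

```python
# Each fact is a (subject, predicate, object) triple.
# All names are hypothetical; real RDF would use URIs.
TRIPLES = {
    ("dataset42", "type", "SpectralLibrary"),
    ("dataset42", "collectedBy", "sensorA"),
    ("sensorA", "type", "GroundSensor"),
    ("GroundSensor", "subClassOf", "Sensor"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def is_a(triples, entity, cls):
    """True if entity has type cls, directly or via subClassOf links."""
    pending = {o for _, _, o in match(triples, s=entity, p="type")}
    seen = set()
    while pending:
        current = pending.pop()
        if current == cls:
            return True
        if current in seen:
            continue  # guard against cycles in the class hierarchy
        seen.add(current)
        pending |= {o for _, _, o in match(triples, s=current, p="subClassOf")}
    return False


print(match(TRIPLES, p="collectedBy"))
print(is_a(TRIPLES, "sensorA", "Sensor"))
```

The second query succeeds even though no triple states that sensorA is a Sensor; the answer is derived from the subClassOf relation, which is the kind of machine-understandable relationship the slide refers to.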
Confederation Bridge ICE Force Monitoring Project
  Monitoring of forces on the Confederation Bridge
  Data analyzed by civil engineering groups at University of Calgary and Carleton University
  GRC developed solution to automate data management as part of a CANARIE AAP project
  (http://www.confederationbridge.com)
ICE Force - Technologies Used
  Grid Middleware
    GT4
  Data Management
    Proactive Data Management Service (PDMS)
    Data Transfer - GridFTP, RFT
    Replication Management - RLS
    Metadata Management - MCS
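
Replication management rests on a catalog that maps a logical file name to the physical locations of its replicas, which is the role the Globus Replica Location Service (RLS) plays in this stack. The sketch below is a toy in-memory version of that idea only. It is not the RLS API or the PDMS design; the class, its methods, the host names and the file names are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ReplicaCatalog:
    """Toy catalog: logical file name -> set of physical replica URLs."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_url):
        """Record that a copy of logical_name exists at physical_url."""
        self._replicas[logical_name].add(physical_url)

    def unregister(self, logical_name, physical_url):
        """Forget one replica, for example after it has been deleted."""
        self._replicas[logical_name].discard(physical_url)

    def lookup(self, logical_name):
        """Return all known replica URLs in a stable (sorted) order."""
        return sorted(self._replicas.get(logical_name, ()))

    def pick(self, logical_name, preferred_host=None):
        """Return a replica on preferred_host if one exists, else the first."""
        replicas = self.lookup(logical_name)
        if not replicas:
            raise KeyError(f"no replica registered for {logical_name!r}")
        for url in replicas:
            if urlparse(url).hostname == preferred_host:
                return url
        return replicas[0]


# Hypothetical usage: one logical file with replicas at two sites.
catalog = ReplicaCatalog()
name = "bridge/2009-07-06.dat"
catalog.register(name, "gsiftp://storage.calgary.example/data/a.dat")
catalog.register(name, "gsiftp://storage.carleton.example/data/a.dat")
print(catalog.pick(name, preferred_host="storage.carleton.example"))
```

Users and tools refer only to the logical name, so data can be moved or replicated without changing how it is addressed. This is the "abstraction of physical location of data" listed among the benefits of grid computing.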
Molecular Dynamics Simulations (GROMACS)
  GROMACS
    Parallel molecular dynamics simulation application
    Can simulate hundreds to millions of particles
    Simulation runs can take days, weeks or months
  Issues with long running jobs
    Fault tolerance
    Scheduler policy constraints
  (http://moose.bio.ucalgary.ca/)
GROMACS - Grid Enabled Solution
  Automated grid enabled solution developed by GRC to manage GROMACS simulations as part of a CANARIE AAP project
  Long jobs split into a series of shorter jobs
  Automates checkpointing, migration and reconfiguration of jobs
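
Splitting a long job into a series of shorter, checkpointed jobs can be sketched as follows. This is an illustration of the general technique only, not the GRC system and not the GROMACS checkpoint format: run_segment is a hypothetical stand-in for one short simulation job, and the JSON checkpoint file and step counts are assumptions made for the example.

```python
import json
import os
import tempfile


def run_segment(state, steps):
    """Hypothetical stand-in for one short job: advance by `steps` steps."""
    return {"step": state["step"] + steps}


def save_checkpoint(path, state):
    """Write to a temporary file first so a crash never leaves a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Resume from the last checkpoint, or start from step 0 if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def run_long_job(total_steps, segment_steps, checkpoint_path, max_segments=None):
    """Run a long job as short segments, checkpointing after each one.

    max_segments simulates an interruption, such as a scheduler time limit.
    Returns the number of segments executed in this call.
    """
    state = load_checkpoint(checkpoint_path)
    segments = 0
    while state["step"] < total_steps:
        if max_segments is not None and segments >= max_segments:
            break
        steps = min(segment_steps, total_steps - state["step"])
        state = run_segment(state, steps)
        save_checkpoint(checkpoint_path, state)
        segments += 1
    return segments


# Hypothetical usage: the first call is cut short, the second resumes.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
run_long_job(1000, 300, path, max_segments=2)
run_long_job(1000, 300, path)
print(load_checkpoint(path))
```

Because each segment is short, it fits within scheduler policy limits, and a failure costs at most one segment of work. The checkpoint file is also what would allow a job to be migrated and resumed on a different resource.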
GROMACS - Portal
GROMACS - Technologies Used
  Grid Middleware
    GT4
  Information Services
    WS MDS
  Data Management
    PDMS (GridFTP, RFT, RLS, MCS)
  Execution Management
    Custom system (Condor-G, WS GRAM)
  Portal
    GridSphere
Web Service based Grid Environment for Canada
  Established a GT4-based grid environment from resources across Canada (CANARIE CIIP)
GT4-based Grid - Model Schemas
  Models developed to describe systems, applications and scheduler policy (GRC Model Schema)
  [Figure: System Model Class Diagram]
GT4-based Grid – Viewing Resource Information
  Used WebMDS, a customizable Web based interface for viewing resource information published by WS MDS
GT4-based Grid - Technologies Used
  Grid Middleware
    GT4
  Data Management
    GridFTP, RFT
  Information Services
    GRC Model Schema, WS MDS, WebMDS
  Execution Management
    Condor-G, WS GRAM
Example: Fire Simulation
  Developed a comprehensive environment for the Fire Dynamics Simulator (FDS) as part of a collaborative project between GRC and HP Labs
  Deployed on HP Labs Data Centre at University of Calgary
  Initial focus of project
    Leverage Web 2.0 technologies
    Explore use of virtualization in a utility/cloud computing environment
Fire Simulation - Technologies Used
  User level
    Web 2.0/social networking technology (Facebook)
  Service provider level
    LAMP environment (Linux, Apache, MySQL, Perl/Python/PHP)
    Simulation (FDS, Condor)
    Visualization (Smokeview, VNC)
  Resource (utility) provider level
    Cloud computing technology (ASPEN)
    Virtual machine technology (Xen)
Example: Rendering on the Cloud
  GRC created an on-demand cloud rendering service for EDM Studio
  Cybera Pilot Project
  Technologies used:
    Cloud computing technology (ASPEN)
    Virtual machine technology (Xen)
    Social networking technology (Ning/Elgg)
GeoChronos
  An on-line platform
  For:
    Earth Observation Scientists
  Facilitating:
    Collaboration between scientists
    Data access, management and sharing
    Application access, management and sharing
  Leveraging:
    Web 2.0 / social networking technologies (Elgg)
    Semantic Web technologies (RDF, OWL)
    Cloud computing and virtualization technologies (ASPEN, Xen)
GeoChronos - Collaboration
  Social networking portal
    Elgg-based (elgg.org)
  Social networking services
    Blogs
    Tags
    Media/document sharing
    Wikis
    Friends/contacts
    Groups
    Discussions
    Message boards
    Calendars
    Status
    News Feeds
http://geochronos.org/
GeoChronos - Data
  Data Acquisition
    Automated acquisition of data from sensors (ground, airborne, satellite) or third parties
  Data Storage
    Store, share, browse and search data
      e.g., spectral library
  Data Processing
    Automated data workflows
      e.g., mosaic, reproject and subset MODIS data
GeoChronos - Applications
  Interactive Application Service (IAS)
    On-line, on-demand access to scientific applications
    Share application sessions and data with other users
    Access control to applications
  Batch Processing Service
    Batch processing environment for longer running data processing tasks or simulations
    For use directly by individual users or as part of automated data workflows
GeoChronos - Project Team
  Principal Investigators
    Dr. Arturo Sanchez-Azofeifa, University of Alberta
    Dr. John Gamon, University of Alberta
    Dr. Benoit Rivard, University of Victoria
    Dr. Rob Simmonds, University of Calgary
  Project Coordination
  Platform Development
  Domain Scientists
GeoChronos - Virtual Organization
Contact Information
Cameron Kiddle
[email protected]
http://pages.cspc.ucalgary.ca/~kiddlec/
http://grid.ucalgary.ca/