The iPlant Collaborative, Pollen RCN, March 2nd, 2013. Steve Goff, BIO5 Institute, University of Arizona. The iPlant Collaborative: Cyberinfrastructure for the Plant Sciences. 9:00 - 9:20 AM: Steve Goff, Director, iPlant Collaborative: iPlant Overview, Data Store, Discovery Environment.
www.iplantcollaborative.org
The iPlant Collaborative Pollen RCN
March 2nd, 2013
Steve Goff
BIO5 Institute
University of Arizona
The iPlant Collaborative
Cyberinfrastructure for the Plant Sciences
9:00 - 9:20 AM Steve Goff, Director, iPlant Collaborative: iPlant Overview, Data Store, Discovery Environment
9:20 - 9:30 AM Martha Narro, Sr. Project Coordinator, iPlant Collaborative: Bisque
9:30 – 9:40 AM Naim Matasci, iPlant Collaborative: Atmosphere
9:40 – 9:50 AM Matt Bomhoff, University of Arizona: CoGe
9:50 - 10:00 AM iPlant Presenters: Questions and Discussion
11:00 - 12:00 NOON Poster session / Booth Demonstrations by presenters in the previous session (Tutorials: PollenTubeTracker in Bisque, RNAseq in Discovery Environment)
NSF’s PSCIC Program
PSCIC Goals:
• “to create a new type of organization - a cyberinfrastructure collaborative for plant science”
• “to enable new conceptual advances through integrative, computational thinking”
• “to address an evolving array of grand challenge questions in plant science: the driving force and organizing principles for the collaborative”
The iPlant Collaborative
Cyberinfrastructure for the Plant Sciences
• NSF Funded Project – finished 5th year
• Recommended for second 5-year term
• iPlant is a cyberinfrastructure platform
• The platform is extensible by users
• NSF recommended scope beyond plants
• iPlant supports plant & animal breeding
• iPlant will bridge the genomics – breeding gap
NSF Cyberinfrastructure Vision
• High Performance Computing
• Data and Data Analysis
• Virtual Organizations
• Learning and Workforce
Ref: “Cyberinfrastructure Vision for 21st Century Discovery”, NSF Cyberinfrastructure Council, March 2007.
Grand Challenge Projects + Added Efforts
• Plant Tree of Life – iPToL – May ’09
  + Taxonomic Intelligence (TNRS)
  + Scientific Networking Website (MyPlant)
  + Perpetually Updated Trees
  + Species Distribution Maps
• Genotype to Phenotype – iPG2P – Aug ’09
  + Image Analysis Platform (Bisque)
  + GLM/PLM, Association
  + Integrated Breeding Platform (GCP/Gates)
  + Comparative Genomics Platform (CoGe)
  + Semantic Web Development
NAR Databases & Tools Over Time
[Chart: x-axis years 2004 to 2012; y-axis 0 to 1300]
PubMed Publications Over Time
[Chart: x-axis years 1950 to 2010; y-axis 0 to 1,000,000]
Accounts for ~70% - Currently >2,500/day
Biology’s “Big Data” Instruments
Ultra-High-Throughput Sequencers
Example: Illumina HiSeq 2000
• >1 terabyte sequence data / 11 days
• Estimated >1k analysis jobs/day
• Analysis – the new bottleneck
• Rapidly introducing new technology
[Illustration: a continuous run of raw nucleotide sequence, e.g. …AGGCCTTGCAAATGACGCCTGTATCAATGCT…]
What iPlant has to offer:
• Data Management Resources
• High-Performance Computing Resources
• Tool Integration System
• Application Programming Interfaces
• Cloud Computing Resources
• Image Analysis Platform
• Molecular Breeding Platform (with IBP)
The iPlant Collaborative
Web site – entry point to tools & documentation
The iPlant Discovery Environment:
iPlant needs to empower researchers to use next-generation sequencing, but also to point out its pitfalls
The iPlant Data Store
Fast data transfers via parallel, non-TCP file transfer (iDrop)
• Move large (>2 GB) files with ease
Multiple, consistent access modes
• iPlant API
• iPlant web apps
• Desktop mount (FUSE/DAV)
• Java applet (iDrop)
• Command line
Fine-grained ACL permissions
• Sharing made simple
“Cloud Storage”… but it’s not Amazon
Access and a storage allocation are automatic with your iPlant account
iPlant Data Store Transfer Performance
Data Transfer from UC Berkeley to iPlant Data Store (UA)
• Dec 5th, 2011: 100 GB in <30 min
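To put the transfer figure above in more familiar units, the arithmetic can be sketched as follows. This is only a back-of-the-envelope conversion: the function names are illustrative, decimal units (1 GB = 1000 MB) are assumed, and because the slide reports "<30 min" the result is a lower bound on the average rate, not a measured value.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / (minutes * 60)


def throughput_gbit_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput in Gbit/s (1 byte = 8 bits)."""
    return size_gb * 8 / (minutes * 60)


# 100 GB in 30 minutes, the upper bound on the time quoted on the slide
print(f"{throughput_mb_per_s(100, 30):.1f} MB/s")      # 55.6 MB/s
print(f"{throughput_gbit_per_s(100, 30):.2f} Gbit/s")  # 0.44 Gbit/s
```

In other words, the reported transfer sustained at least roughly 56 MB/s on average between the two sites.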
The iPlant Data Store
• >100 Petabytes available
• Fast transfer
• Storage near HPC
• Replicated
iPlant Access to HPC via XSEDE
Scalable Computation for High Throughput Analysis
• Leveraging XSEDE
• TACC, SDSC, PSC, EBI
• >500,000 Compute Cores
• 1-4 TB shared memory
[Diagram labels: TACC Stampede, TACC Lonestar, TACC Corral, PSC Blacklight, SDSC CI, EBI Web Services]
Bisque Image Management, Analysis, Sharing System
Martha Narro will describe.
Customized cloud platform for computing on your terms!
Naim Matasci will describe Atmosphere
Accelerating Analysis – an Example
• Code Parallelization
• Biallelic SNP Association
• Estimated 1,600 years
• Reduced to 4 hours
Challenges:
• Months of communication
• Few weeks of development
• Only used once to date
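The size of the improvement quoted above (an estimated 1,600 years reduced to 4 hours) is easier to appreciate as a single speedup factor. The sketch below is plain arithmetic under one assumption, a 365.25-day year; the function name is illustrative, and the slide does not say how the gain was split between parallelization and other optimization.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length of 365.25 days


def speedup(before_years: float, after_hours: float) -> float:
    """Ratio of the original estimated run time to the optimized run time."""
    return before_years * HOURS_PER_YEAR / after_hours


print(f"{speedup(1600, 4):,.0f}x")  # 3,506,400x
```

That is a reduction of roughly 3.5 million-fold in wall-clock time.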
The Integrated Breeding Portal
https://www.integratedbreeding.net/
Also in Chinese, soon French and Spanish
OneKP
The problem
• OneKP: consortium formed to sequence the transcriptomes of 1000 phylogenetically diverse plant species.
• Needs: storage, access to compute resources and expertise, distribution.
Our approach
• Assign personnel with expertise in the required fields to the project
• Cover storage and computational needs
• Scrubbed all names to match NCBI taxa names (20% could originally not be matched)
• iPlant will be offering BLAST and search services against the OneKP results in the next DE release
• The optimized BLASTX and translation pipeline is available to the community through the Discovery Environment
Results
• iPlant is replicating the entire dataset including raw reads, assemblies and analysis results
• Annotated 86 million contigs against NCBI's RefSeq using BLASTX
• Identified the open reading frames and estimated the protein sequences, resulting in 19,556,877 potential genes
• Will increase the number of plant genes in GenBank by a factor of 100.
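For readers unfamiliar with the open-reading-frame step mentioned in the results above, a generic six-frame ORF search can be sketched as below. This is not the OneKP pipeline, whose details the slides do not give; it is a minimal illustration that assumes the standard genetic code, takes the longest peptide beginning with methionine, and uses function names invented for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(contig: str) -> str:
    """Longest peptide starting with M, searched over all six reading frames."""
    contig = contig.upper()
    best = ""
    for strand in (contig, reverse_complement(contig)):
        for frame in range(3):
            # Stop codons split the translation into candidate segments.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


print(longest_orf("CCATGGCTTGATT"))  # MA
```

A production pipeline would typically also use the BLASTX alignments to choose the reading frame and to handle partial transcripts, which this sketch ignores.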
Assembly and Annotation
Results
• Diverse species assembled/annotated: Rice, diploid switchgrass, Ceratopteris, several Solanaceae, mulberry, maize accessions, Thellungiella, barley, wheat, and soybean
• Laboratory groups engaged: >30, including Cornell, Iowa State University, University of Florida, JCVI, Penn State University, CSIRO, and Purdue
• Applications deployed to HPC: ALLPATHS, Velvet, Oases, ABYSS, Newbler, SOAPdenovo, SOAPdenovo-Trans, Trinity, Celera Assembler
• HPC applications available via DE: Velvet, ABYSS, Newbler, SOAPdenovo, Trinity, InterproScan
• Current deployment and optimization efforts: Trinity, InterproScan, MAKER
• HPC systems used: PSC Blacklight, TACC Ranger, TACC Lonestar, SDSC Trestles
• Usage statistics:
  • 7,000 HPC jobs; 1.5 million computing hours in Y1 of this initiative
  • >1000 HPC-backed assembly/annotation jobs run by iPlant DE users in 8 months
The problem
• Full-scale genome and transcriptome sequencing is affordable and accessible
• Assembly and knowledge extraction remain challenging
• Extremely computationally intensive. Complex, low-efficiency software. Command-line only.
Our approach
• Provide HPC resources
  • >100k CPUs
  • multi-TB RAM
  • petascale storage
• Optimize workflows and algorithms
• Provide access via Discovery Environment
iPlant Cyberinfrastructure Strengths
• Extensible, flexible platform architecture
• Not limited to plant science (iAnimal, iArthropod)
• Diverse community collaborations
• Experienced staff working in a distributed fashion
• Unified access to iPlant (single sign-on)
• Genotype to Phenotype & Phylogenetics tools
• Various levels of support, novice to expert user
• Developing semantic web effort
The iPlant Collaborative - Acknowledgments
Executive Team: Steve Goff, Dan Stanzione
Staff: Greg Abram, Sonali Aditya, Roger Barthelson, Brad Boyle, Todd Bryan, Gordon Burleigh, John Cazes, Mike Conway, Karen Cranston, Rion Doodey, Andy Edmonds, Dmitry Fedorov, Michael Gatto, Utkarsh Gaur, Steven Gregory, Matthew Hanlon, Anthony Heath, Barbara Heath, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Eun-Sook Jeong, Logan Johnson, Chris Jordan, B.D. Kim, Kathleen Kennedy, Mohammed Khalfan, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Aruna Lakshmanan, Sue Lauter, Tina Lee, Andrew Lenards, Monica Lent, Zhenyuan Lu, Eric Lyons, Naim Matasci, Sheldon McKay, Robert McLay, Angel Mercer, Dave Micklos, Nathan Miller, Steve Mock, Martha Narro, Praveen Nuthulapati, Shannon Oliver, Shiran Pasternak, William Peil, Dennis Roberts, Jerry Schneider, Bruce Schumaker, Sriramu Singaram, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Kris Urie, Peter Van Buren, Hans Vasquez-Gross, Matthew Vaughn, Jason Williams, John Wregglesworth, Weijia Xu
Postdocs: Barbara Banbury, Jamie Estill, Bindu Joseph, Christos Noutsos, Brad Ruhfel, Stephen A. Smith, Chunlao Tang, Lin Wang, Liya Wang, Norman Wickett
Students: Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, Yi-Da Chen, John Donoghue, Yekatarina Khartianova, Chris La Rose, Amgad Madkour, Aniruddha Marathe, Andrew Mercer, Kurt Michaels, Dhanesh Prasad, Andrew Predoehl, Jose Salcedo, Shalini Sasidharan, Gregory Striemer, Jason Vandeventer, Kuan Yang
Faculty Advisors & Collaborators: Ali Akoglu, Greg Andrews, Kobus Barnard, Sue Brown, Thomas Brutnell, Michael Donoghue, Casey Dunn, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, Dan Kliebenstein, Jim Leebens-Mack, David Lowenthal, Robert Martienssen, B.S. Manjunath, Nirav Merchant, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Ann Stapleton, Lincoln Stein, Val Tannen, Todd Vision, Doreen Ware, Steve Welch, Mark Westneat
[Graphic labels: Metadata, Data, Tools, Workflows, Viz]