The iPlant Collaborative: Pollen RCN, March 2nd, 2013. Steve Goff, BIO5 Institute, University of Arizona. www.iplantcollaborative.org


Page 1: The iPlant Collaborative Pollen RCN, March 2nd, 2013

www.iplantcollaborative.org

The iPlant Collaborative Pollen RCN

March 2nd, 2013

Steve Goff

BIO5 Institute

University of Arizona

Page 2

The iPlant Collaborative: Cyberinfrastructure for the Plant Sciences

9:00 - 9:20 AM Steve Goff, Director, iPlant Collaborative: iPlant Overview, Data Store, Discovery Environment

9:20 - 9:30 AM Martha Narro, Sr. Project Coordinator, iPlant Collaborative: Bisque

9:30 – 9:40 AM Naim Matasci, iPlant Collaborative: Atmosphere

9:40 – 9:50 AM Matt Bomhoff, University of Arizona: CoGe

9:50 - 10:00 AM iPlant Presenters: Questions and Discussion

11:00 - 12:00 NOON Poster session / Booth Demonstrations by presenters in the previous session (Tutorials: PollenTubeTracker in Bisque, RNAseq in Discovery Environment)

Page 3

NSF’s PSCIC Program

PSCIC Goals:

• “to create a new type of organization - a cyberinfrastructure collaborative for plant science”
• “to enable new conceptual advances through integrative, computational thinking”
• “to address an evolving array of grand challenge questions in plant science: the driving force and organizing principles for the collaborative”

Page 4

The iPlant Collaborative: Cyberinfrastructure for the Plant Sciences

• NSF-funded project – finished 5th year
• Recommended for second 5-year term
• iPlant is a cyberinfrastructure platform
• The platform is extensible by users
• NSF recommended scope beyond plants
• iPlant supports plant & animal breeding
• iPlant will bridge the genomics – breeding gap

Page 5

NSF Cyberinfrastructure Vision

• High Performance Computing
• Data and Data Analysis
• Virtual Organizations
• Learning and Workforce

Ref: “Cyberinfrastructure Vision for 21st Century Discovery”, NSF Cyberinfrastructure Council, March 2007.

Page 6

Grand Challenge Projects + Added Efforts

• Plant Tree of Life – iPToL – May ’09
    + Taxonomic Intelligence (TNRS)
    + Scientific Networking Website (MyPlant)
    + Perpetually Updated Trees
    + Species Distribution Maps

• Genotype to Phenotype – iPG2P – Aug ’09
    + Image Analysis Platform (Bisque)
    + GLM/PLM, Association
    + Integrated Breeding Platform (GCP/Gates)
    + Comparative Genomics Platform (CoGe)
    + Semantic Web Development

Page 7

NAR Databases & Tools Over Time

[Figure: chart with years 2004 to 2012 on the horizontal axis and counts from 0 to 1300 on the vertical axis]

Page 8

PubMed Publications Over Time

[Figure: chart with years 1950 to 2010 on the horizontal axis and counts from 0 to 1,000,000 on the vertical axis]

Accounts for ~70% - Currently >2,500/day

Page 9

Biology’s “Big Data” Instruments

Ultra-High-Throughput Sequencers

Example: Illumina HiSeq 2000

• >1 terabyte sequence data / 11 days
• Estimated >1k analysis jobs/day
• Analysis – the new bottleneck
• Rapidly introducing new technology

[Figure: background of repeated DNA sequence text, …AGGCCTTGCAAATGACGCCTGTATCAATGCT…]

Page 10

What iPlant has to offer:

• Data Management Resources
• High-Performance Computing Resources
• Tool Integration System
• Application Programming Interfaces
• Cloud Computing Resources
• Image Analysis Platform
• Molecular Breeding Platform (with IBP)

Page 11

The iPlant Collaborative Web site – entry point to tools & documentation

Page 12

The iPlant Discovery Environment:

iPlant needs to empower researchers to use next-generation sequencing, but also to point out its pitfalls

Page 13

The iPlant Data Store

Fast data transfers via parallel, non-TCP file transfer (iDrop)

• Move large (>2 GB) files with ease

Multiple, consistent access modes

• iPlant API
• iPlant web apps
• Desktop mount (FUSE/DAV)
• Java applet (iDrop)
• Command line

Fine-grained ACL permissions

• Sharing made simple

“Cloud Storage”… but it’s not Amazon

Access and a storage allocation are automatic with your iPlant account

Page 14

iPlant Data Store Transfer Performance

Data Transfer from UC Berkeley to iPlant Data Store (UA)

• Dec 5th, 2011:
    • 100 GB: <30 min
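
As a rough reading of the figure above, 100 GB in under 30 minutes implies a sustained rate of at least about 55 MB/s. The sketch below works that out. It assumes decimal units (1 GB = 1000 MB) and treats 30 minutes as an upper bound on the transfer time; the slide states neither, and the function name is purely illustrative.

```python
def min_throughput(size_gb: float, max_minutes: float) -> tuple[float, float]:
    """Lower bound on average throughput for a transfer that finished
    within max_minutes. Returns (megabytes/s, megabits/s).

    Assumption: decimal units, 1 GB = 1000 MB.
    """
    seconds = max_minutes * 60
    mb_per_s = size_gb * 1000 / seconds
    mbit_per_s = mb_per_s * 8
    return mb_per_s, mbit_per_s


# Figures from the slide: 100 GB in under 30 minutes.
mb_s, mbit_s = min_throughput(100, 30)
print(f"at least {mb_s:.1f} MB/s (about {mbit_s:.0f} Mbit/s)")
# prints: at least 55.6 MB/s (about 444 Mbit/s)
```

Because the slide reports "<30 min", the real average rate was higher than this bound.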

Page 15

The iPlant Data Store

• >100 Petabytes available
• Fast transfer
• Storage near HPC
• Replicated

Page 16

iPlant Access to HPC via XSEDE

Scalable Computation for High Throughput Analysis

• Leveraging XSEDE
• TACC, SDSC, PSC, EBI
• >500,000 Compute Cores
• 1-4TB shared memory

[Figure labels: TACC Stampede, TACC Lonestar, TACC Corral, PSC Blacklight, SDSC CI, EBI Web Services]

Page 17

Bisque Image Management, Analysis, Sharing System

Martha Narro will describe.

Page 18

Customized cloud platform for computing on your terms!

Naim Matasci will describe Atmosphere.

Page 19

Accelerating Analysis – an Example

• Code Parallelization
• Biallelic SNP Association
• Estimated 1,600 years
• Reduced to 4 hours

Challenges:

• Months of communication
• Few weeks of development
• Only used once to date
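
To put the two run times above side by side, going from an estimated 1,600 years to 4 hours is a reduction of roughly 3.5 million-fold. The sketch below does the arithmetic. It assumes a 365.25-day year and makes no claim about how much of the gain came from parallelization versus other optimization, which the slide does not break down.

```python
# Assumption: one year = 365.25 days.
HOURS_PER_YEAR = 365.25 * 24


def speedup(before_years: float, after_hours: float) -> float:
    """Ratio of the original estimated run time to the reduced run time."""
    return before_years * HOURS_PER_YEAR / after_hours


# Figures from the slide: estimated 1,600 years, reduced to 4 hours.
factor = speedup(1600, 4)
print(f"{factor:,.0f}x faster")
# prints: 3,506,400x faster
```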

Page 20

The Integrated Breeding Portal

https://www.integratedbreeding.net/

Also in Chinese, soon French and Spanish

Page 21

OneKP

The problem

• OneKP: consortium formed to sequence the transcriptomes of 1000 phylogenetically diverse plant species.
• Needs: storage, access to compute resources and expertise, distribution.

Our approach

• Assign personnel with expertise in the required fields to the project
• Cover storage and computational needs
• Scrubbed all names to match NCBI taxa names (20% could originally not be matched)
• iPlant will be offering BLAST and search services against the OneKP results in the next DE release
• The optimized BLASTX and translation pipeline is available to the community through the Discovery Environment

Results

• iPlant is replicating the entire dataset, including raw reads, assemblies and analysis results
• Annotated 86 million contigs against NCBI's RefSeq using BLASTX
• Identified the open reading frames and estimated the protein sequences, resulting in 19,556,877 potential genes
• Will increase the number of plant genes in GenBank by a factor of 100.
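
For readers unfamiliar with the open-reading-frame step mentioned in the results, here is a minimal sketch of the general idea. It is not the OneKP or iPlant pipeline, which the slide describes only as an optimized BLASTX and translation pipeline. The function name, the `min_codons` threshold, and the toy sequence are all illustrative assumptions, and the sketch scans only the three forward reading frames.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Find simple ATG-to-stop open reading frames on the forward strand.

    Returns (start, end, frame) tuples, where seq[start:end] runs from
    the ATG through the stop codon. Illustrative only: it ignores the
    reverse strand and ORFs that run off the end of the sequence.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF if it is long enough (stop codon included).
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Toy contig (hypothetical): one ORF, ATG AAA TTT TAG, starting at index 2.
contig = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(contig):
    print(frame, contig[start:end])
# prints: 2 ATGAAATTTTAG
```

A production workflow would also handle the reverse complement and partial transcripts, and would typically use alignment evidence such as BLASTX hits to choose among candidate frames.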

Page 22

Assembly and Annotation

The problem

• Full-scale genome and transcriptome sequencing is affordable and accessible
• Assembly and knowledge extraction remain challenging
• Extremely computationally intensive. Complex, low-efficiency software. Command-line only.

Our approach

• Provide HPC resources
    • >100k CPUs
    • multi-TB RAM
    • petascale storage
• Optimize workflows and algorithms
• Provide access via Discovery Environment

Results

• Diverse species assembled/annotated: Rice, diploid switchgrass, Ceratopteris, several Solanaceae, mulberry, maize accessions, Thellungiella, barley, wheat, and soybean
• Laboratory groups engaged: >30, including Cornell, Iowa State University, University of Florida, JCVI, Penn State University, CSIRO, and Purdue
• Applications deployed to HPC: ALLPATHS, Velvet, Oases, ABYSS, Newbler, SOAPdenovo, SOAPdenovo-Trans, Trinity, Celera Assembler
• HPC applications available via DE: Velvet, ABYSS, Newbler, SOAPdenovo, Trinity, InterproScan
• Current deployment and optimization efforts: Trinity, InterproScan, MAKER
• HPC systems used: PSC Blacklight, TACC Ranger, TACC Lonestar, SDSC Trestles
• Usage statistics:
    • 7,000 HPC jobs; 1.5 million computing hours in Y1 of this initiative
    • >1000 HPC-backed assembly/annotation jobs run by iPlant DE users in 8 months

Page 23

iPlant Cyberinfrastructure Strengths

• Extensible, flexible platform architecture
• Not limited to plant science (iAnimal, iArthropod)
• Diverse community collaborations
• Experienced staff working in a distributed fashion
• Unified access to iPlant (single sign-on)
• Genotype to Phenotype & Phylogenetics tools
• Various levels of support, novice to expert user
• Developing semantic web effort

Page 24

The iPlant Collaborative - Acknowledgments

[Figure labels: Metadata, Data, Tools, Workflows, Viz]

Executive Team: Steve Goff, Dan Stanzione

Staff: Greg Abram, Sonali Aditya, Roger Barthelson, Brad Boyle, Todd Bryan, Gordon Burleigh, John Cazes, Mike Conway, Karen Cranston, Rion Doodey, Andy Edmonds, Dmitry Fedorov, Michael Gatto, Utkarsh Gaur, Steven Gregory, Matthew Hanlon, Anthony Heath, Barbara Heath, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Eun-Sook Jeong, Logan Johnson, Chris Jordan, B.D. Kim, Kathleen Kennedy, Mohammed Khalfan, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Aruna Lakshmanan, Sue Lauter, Tina Lee, Andrew Lenards, Monica Lent, Zhenyuan Lu, Eric Lyons, Naim Matasci, Sheldon McKay, Robert McLay, Angel Mercer, Dave Micklos, Nathan Miller, Steve Mock, Martha Narro, Praveen Nuthulapati, Shannon Oliver, Shiran Pasternak, William Peil, Dennis Roberts, Jerry Schneider, Bruce Schumaker, Sriramu Singaram, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Kris Urie, Peter Van Buren, Hans Vasquez-Gross, Matthew Vaughn, Jason Williams, John Wregglesworth, Weijia Xu

Postdocs: Barbara Banbury, Jamie Estill, Bindu Joseph, Christos Noutsos, Brad Ruhfel, Stephen A. Smith, Chunlao Tang, Lin Wang, Liya Wang, Norman Wickett

Students: Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, Yi-Da Chen, John Donoghue, Yekatarina Khartianova, Chris La Rose, Amgad Madkour, Aniruddha Marathe, Andrew Mercer, Kurt Michaels, Dhanesh Prasad, Andrew Predoehl, Jose Salcedo, Shalini Sasidharan, Gregory Striemer, Jason Vandeventer, Kuan Yang

Faculty Advisors & Collaborators: Ali Akoglu, Greg Andrews, Kobus Barnard, Sue Brown, Thomas Brutnell, Michael Donoghue, Casey Dunn, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, Dan Kliebenstein, Jim Leebens-Mack, David Lowenthal, Robert Martienssen, B.S. Manjunath, Nirav Merchant, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Ann Stapleton, Lincoln Stein, Val Tannen, Todd Vision, Doreen Ware, Steve Welch, Mark Westneat