myExperiment – A Web 2.0 Virtual Research Environment
David De Roure, Carole Goble



    NeSC VRE Workshop

    26/2/2007

    Overview
    • e-Science is about scientists doing science
    • A Tale of Two Projects
    • myExperiment
    • Design Patterns for a VRE

    CombeChem pilot project (www.combechem.org)
    [Diagram components: X-Ray e-Lab, Analysis, Properties, Properties e-Lab, Simulation, Video, Diffractometer, Grid Middleware, Structures Database]

    Reducing time-to-experiment
    Entire e-Science cycle, encompassing experimentation, analysis, publication, research and learning
    [Diagram components: E-Scientists, Graduate Students, Undergraduate Students, E-Experimentation, Digital Library]
    http://www.ukoln.ac.uk/projects/ebank-uk/

    The key observation!

    Publication at Source describes the need to capture data and its context from the outset, and to maintain a complete end-to-end connection between the laboratory bench and the intellectual chemical knowledge that is published as a result of the investigation.

    Provenance: the details of the origins of data are just as important to understanding as their actual values.
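    Publication at Source and provenance combine into a simple mechanism: if every derived item keeps a link to the items and the process it came from, the chain from a published result back to the bench can be recovered on demand. The Python below is a minimal sketch under that assumption. The `Artefact` record, the `trace_lineage` function and all artefact and process names are hypothetical, chosen to echo the CombeChem X-ray setting; they are not CombeChem's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Artefact:
    """Hypothetical data item that records how it was produced."""
    ident: str
    produced_by: Optional[str] = None  # process name; None for raw bench data
    derived_from: List[str] = field(default_factory=list)


def trace_lineage(store: Dict[str, Artefact], ident: str) -> List[str]:
    """Walk back from `ident` to every artefact it depends on (depth-first)."""
    seen: List[str] = []
    stack = [ident]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a shared ancestor is reported only once
        seen.append(current)
        stack.extend(store[current].derived_from)
    return seen


# Hypothetical store: raw data captured at the bench, then derived results.
STORE = {
    a.ident: a
    for a in [
        Artefact("diffraction-images"),
        Artefact("crystal-structure", "structure-solution", ["diffraction-images"]),
        Artefact("computed-properties", "simulation", ["crystal-structure"]),
        Artefact("published-table", "report", ["crystal-structure", "computed-properties"]),
    ]
}

for step in trace_lineage(STORE, "published-table"):
    print(step, "<-", STORE[step].produced_by or "bench")
```

    The crystal structure feeds the published table both directly and through the simulation, yet appears once in the trace; the walk ends at the raw diffraction images, which have no producing process.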

    My Chemistry Experiment
    [Image: Box of Chemists]


    The RDF Graph


    e-Crystals Federation model
    [Diagram components: data creation & capture in Smart lab; laboratory repository; e-Research workflows; data analysis, transformation, mining, modelling; institutional data repositories; data curation & preservation: databases & databanks; aggregator services; publishers: peer-review journals, conference proceedings; presentation services: portals; data discovery, linking, citation. Link labels: deposit, deposit (Chemistry Central), validation, harvest, search, publication, linking, citation.]
    This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0.

    Key collective activities in e-science
    • Interpretation of data/events
    • Following through decisions / coordinating activities
    • Producing documents & other artifacts
    • Archiving/recovering information
    • Informal and formal communication
    • Meetings
    http://www.aktors.org/coakting/

    What we learnt about VREs
    • Reducing time-to-experiment
    • Datasets as publication
    • Provenance matters
    • Publish the pieces, don't warehouse
    • Semantic lab notebooks in the VRE
    • Blogging the lab
    • Federated back end
    • Semantic DataGrid, built socially
    • Deep integration with collaborative tools

    Bioinformatics is not Chemistry
    There are many pieces, from many boxes, but no box, and no lid with a complete picture of what the puzzle is supposed to be.
    • Planning? No.
    • Metadata an afterthought

    myGrid
    Open Source middleware for Life Scientists that enables them to undertake in silico experiments and share those experiments and their results.
    • Machinery for linking together datasets and tools
    • Individual scientists, in under-resourced labs, who use other people's datasets and applications
    • Ad hoc & exploratory workflows (data flows)
    • To support sharing and collaboration between scientists to disseminate best practice and improve the quality of science
    • 33,000 downloads; 200+ user sites; 400+ workflows; 3500 third-party external services accessible
    • Moved from prototype to production quality
    Open Middleware Infrastructure Institute UK
    http://www.mygrid.org.uk


    Widespread Adoption
    Users in US, Asia, UK, Europe, Australia
    • Systems biology
    • Proteomics
    • Gene/protein annotation
    • Microarray data analysis
    • Medical image analysis
    • Heart simulation orchestration
    • High throughput screening of chemical compounds
    • Phenotypical studies
    • Public health studies
    • Clinical trial analysis
    • Plants, mouse, human
    • Astronomy
    • Cultural heritage

    Recycling, Reuse, Repurposing
    • Identified a pathway whose correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance.
    • Manual analysis of the microarray and QTL data had failed to identify this gene as a candidate.
    • Repetitive, unbiased analysis: Paul Fisher et al., "A Systematic Strategy for Large-Scale Unbiased Analysis of Genotype-Phenotype Correlations", Bioinformatics (in review).
    • The trypanosomiasis cattle workflow was reused without change to identify the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.
    • A manual two-year study of candidate genes had previously failed to do this.

    Service and workflow annotation
    • Ontology: 710 classes
    • Full-time curator
    • Tagging by the masses
    • 3500 services, 350 curated
    Provenance
    • Ontology: 35 classes
    • Enriched with domain ontologies and service ontologies. Possibly.
    • Exported with the data. Desirably.

    New Scientific Digital Artefacts
    • Design
    • Workflow design history
    • Experiment purpose
    • Scientist
    • LogBook
    • Workflow run log
    • Data lineage
    • Results interpretation log

    New digital artefacts

    myExperiment.org Portal Party
    • 28th & 29th Sept 2006
    • Hand-picked Taverna users + Taverna development team
    • Facilitated by NCeSS
    • AJAX-based development
    • CombeChem transfer

    • A social networking environment for sharing any workflow
    • A Taverna workflow run environment
    • A multi-workflow launch environment

    NeSC VRE Workshop

    * | | Slide *Virtual Research EnvironmentsVRE 1Technology-focusedExperimental Diverse design & development approachesStand-alone solutions

    VRE 2User- & research practice-focusedDevelopmental Unified design & development approachesIntegrated solutions

    CollaborationSupporting small & large-scale researchSupport for single-disciplinary and multi-disciplinary research

    openwetware.org


    What are we trying to do?
    • Enabling scientists to be (more) creative.
    • Enabling scientists to be scientists, and not programmers.
    • Enabling mediocre scientists to become better and thus do better science.
    • Enabling smart scientists to be smarter and propagate their smartness.
    • Accelerating dissemination, pooling and insight.
    • Encouraging sanctioned plagiarism.

    Principles
    • Focus on making it easy to publish information: discovering and sharing experimental artefacts; publishing results to standard community repositories; publishing scholarly output
    • Familiar social networking / web paradigms: keeping it free and fluid and creative. Me-Science.
    • Crossing system boundaries: trans-workflow
    • Crossing discipline boundaries: multi-disciplinary, inter-disciplinary, trans-disciplinary
    • Clustering expertise: intellectual fusion outside discipline. We-Science. Life Science, Social Science, Astronomy, Chemistry

    Scoping exercise
    • Workflow warehouse / federation of repositories: Open Archives Initiative. Federated myExperiments. Sharepoint.
    • Social space + organised rich site: social discourse + an organised service / workflow space using curated semantics.
    • Granularity and identifiers: rolling-up provenance. Id resolution.
    • Open vs protected content: quality, reliability, validation, safety, intellectual property, ownership, secrecy. A duty of guardianship. Curation? Policing? Local data mixed with shared resources.
    • Desktop integration: Google gadgets for workflows. Interacting with workflows through Office products.
    • Workflow execution (WHIP): the Workflows Hosted in Portals project.
    • Evolving the myExperiment software: community development. Enabling scientists to add value through applications and collaborative tagging.

    NeSC VRE Workshop

    Hack Fest26/2/2007 | myExperiment | Slide *

    NeSC VRE Workshop

    NeSC VRE Workshop

    NeSC VRE Workshop

    Q1. Workflow Warehouse or Federation of Repositories?
    • Everything on the myExperiment.org web site vs distributed stores
    • Multiple myExperiments
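    The trade-off in Q1 can be made concrete with two small functions. This is a minimal sketch, assuming each repository can be modelled as a mapping from a shared workflow identifier to a title; the repository names, identifiers and the `warehouse` and `federated_search` helpers are hypothetical, not part of myExperiment.

```python
from typing import Dict, Set

# Hypothetical stores: each maps a workflow identifier to its title.
REPOSITORIES: Dict[str, Dict[str, str]] = {
    "myexperiment.org": {"wf:101": "Microarray pathway analysis", "wf:102": "QTL candidate genes"},
    "lab-store": {"wf:101": "Microarray pathway analysis", "wf:200": "Proteomics peak picking"},
}


def warehouse(repos: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Warehouse option: copy every entry into one central index."""
    merged: Dict[str, str] = {}
    for repo in repos.values():
        merged.update(repo)
    return merged


def federated_search(repos: Dict[str, Dict[str, str]], term: str) -> Dict[str, Set[str]]:
    """Federation option: query each store in place and merge hits by identifier.

    Returns identifier -> names of the stores holding it, so a workflow
    held in several stores appears once.
    """
    needle = term.lower()
    hits: Dict[str, Set[str]] = {}
    for name, repo in repos.items():
        for ident, title in repo.items():
            if needle in title.lower():
                hits.setdefault(ident, set()).add(name)
    return hits


print(federated_search(REPOSITORIES, "microarray"))
```

    Both approaches only merge cleanly because the stores share identifiers; without resolvable identifiers the federated result could not tell that the two copies of `wf:101` are the same workflow.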

    Q2. Social Space or Shoe Shop?
    • Shopping for workflows, services and data should be as easy as shopping for shoes.
    • Organic growth is good and bad.
    • Social tagging might help people discover workflows, but we need good metadata for automated use.
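    The difference between tags and metadata for automated use can be shown in a few lines. This is a minimal sketch, assuming each workflow carries free-form user tags plus curated input and output types; the `Workflow` record, the type names such as `gene_list`, and the `by_tag` and `chains` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Workflow:
    name: str
    tags: FrozenSet[str]  # social tagging: free-form, user-supplied
    input_type: str       # curated metadata (hypothetical type names)
    output_type: str


CATALOGUE = [
    Workflow("pathway_finder", frozenset({"microarray", "pathways"}), "gene_list", "pathway_list"),
    Workflow("expression_filter", frozenset({"micro-array", "filtering"}), "expression_matrix", "gene_list"),
    Workflow("qtl_genes", frozenset({"qtl", "genes"}), "qtl_region", "gene_list"),
]


def by_tag(catalogue: List[Workflow], tag: str) -> List[str]:
    """Human-style discovery: exact tag match, sensitive to spelling variants."""
    return [w.name for w in catalogue if tag in w.tags]


def chains(catalogue: List[Workflow]) -> List[Tuple[str, str]]:
    """Machine-style use: pairs (a, b) where b can consume a's output."""
    return [
        (a.name, b.name)
        for a in catalogue
        for b in catalogue
        if a is not b and a.output_type == b.input_type
    ]


print(by_tag(CATALOGUE, "microarray"))  # misses the "micro-array" tag
print(chains(CATALOGUE))
```

    The tag search misses `expression_filter` because it was tagged "micro-array", while the typed metadata finds every valid pairing; this is the organic-growth trade-off in miniature.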

    Q3. How open is the content?
    • OpenWetware is open
    • Our users don't want this
    • Provenance helps

    Q4. Integration
    • Bring the user to the web site, vs
    • Bring "myExperimentness" to existing interfaces

    Web 2.0 Design Patterns
    http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

    1. The Long Tail
    2. Data is the Next Intel Inside
    3. Users Add Value
    4. Network Effects by Default
    5. Some Rights Reserved
    6. The Perpetual Beta
    7. Cooperate, Don't Control
    8. Software Above the Level of a Single Device

    1. The Long Tail
    Our target users are not just the specialist e-Scientists using computing resources to tackle major scientific breakthroughs, but also the large number of scientists conducting the routine processes of science on a daily basis. Through sharing we have the potential to enable smart scientists to be smarter and propagate their smartness, in turn enabling other scientists to become better and conduct better science.

    2. Data is the Next Intel Inside
    myExperiment understands that scientists are focused on data, not software or one particular workflow engine. Workflows are components of customised applications, many of which are data-oriented rather than process-oriented. Users manipulate, through their own applications, the product (data, model) yielded by the workflow. Furthermore, workflows themselves are the data of myExperiment and provide its unique value.

    3. Users Add Value
    myExperiment makes it easy to find workflows and is designed to make it useful and straightforward to share workflows and add workflows to the pool. To succeed we draw on the insights into the incentive models of scientists gained through experience with Taverna.

    4. Network Effects by Default
    myExperiment aggregates user data as a side-effect of using the VRE. The ability to execute workflows from myExperiment, and the integration of tools such as Taverna with myExperiment, further enable us to achieve increased value through usage.

    5. Some Rights Reserved
    myExperiment users require protection as well as sharing, but the environment is designed for maximum ease of sharing to achieve collective benefits: workflows are "hackable" and "remixable". Initiatives such as Science Commons provide a useful context for this.

    6. The Perpetual Beta
    myExperiment is an online service (a collection of online services) and is continually evolving in response to its users. To support this, the project commenced with developers being embedded in the user community. Through day-to-day contact between designers and researchers, design is both inspired and validated.

    7. Cooperate, Don't Control
    myExperiment is a network of cooperating data services with simple interfaces which make it easy to work with content. It both provides services and reuses the services of others. It aims to support lightweight programming models so that it can easily be part of loosely coupled systems.

    8. Software Above the Level of a Single Device
    The current model of Taverna running on the scientist's desktop PC or laptop is evolving into myExperiment being available through a variety of interfaces and supporting workflow execution.

    Closing
    e-Science is difficult; workflows and Web 2.0 make it easier. Our design workshops and the review against Web 2.0 design patterns have revealed the relationship between myExperiment and Web 2.0. The collective benefits of participation arise not only from the users but also from the developers' ease of use and ease of development. It might be useful to review other VREs against the design patterns.

    Take-homes
    myExperiment is a Web 2.0 environment for scientists to share experiments.

    Join us!

    David De Roure [email protected]

    Carole Goble [email protected]

    Credits
    • myGrid and CombeChem
    • Matt Lee
    • David Withers
    • Don Cruickshank
    • Rob Procter
    • Alex Voss
    • June Finch
    • Ed Zaluska
    • All the users, including embedders

    Experiment API: requests for information return subgraphs, not just the triples containing the requested resource; e.g. getRecord() returns all processes and materials in the record.
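    The difference between returning triples and returning a subgraph can be sketched over a plain list of (subject, predicate, object) tuples. This is a minimal illustration, not the project's implementation: `get_record` is a hypothetical stand-in named after the getRecord() call in the note, and the record, process and material identifiers are invented. Against a real RDF store the same effect would typically come from a graph query (for example a SPARQL CONSTRUCT) rather than a Python loop.

```python
from collections import deque
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical lab-record graph; identifiers and predicates are illustrative.
GRAPH: List[Triple] = [
    ("record:1", "hasProcess", "process:mix"),
    ("record:1", "hasProcess", "process:heat"),
    ("record:1", "title", "Synthesis run 1"),
    ("process:mix", "usesMaterial", "material:A"),
    ("process:mix", "usesMaterial", "material:B"),
    ("process:heat", "temperatureC", "80"),
    ("material:A", "name", "sodium chloride"),
    ("record:2", "title", "Unrelated run"),
]


def triples_about(graph: List[Triple], resource: str) -> List[Triple]:
    """Triple-level answer: only statements whose subject is `resource`."""
    return [t for t in graph if t[0] == resource]


def get_record(graph: List[Triple], record: str) -> List[Triple]:
    """Subgraph-level answer: follow object links breadth-first from `record`."""
    subjects = {s for s, _, _ in graph}
    result: List[Triple] = []
    visited = {record}
    queue = deque([record])
    while queue:
        node = queue.popleft()
        for s, p, o in graph:
            if s == node:
                result.append((s, p, o))
                # Descend only into objects that are themselves described.
                if o in subjects and o not in visited:
                    visited.add(o)
                    queue.append(o)
    return result


print(len(triples_about(GRAPH, "record:1")), len(get_record(GRAPH, "record:1")))
```

    The triple-level call returns three statements about the record itself; the subgraph call also brings back the processes, their materials and the material's name, while leaving the unrelated `record:2` out.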

    Complex. Open and traditional publishers. This is the scope of the JISC world; JISC works with all these stakeholders. We have constructed some more pieces.

    And some meta-pieces: SBML model building, sequence analysis, microarray analysis, proteomics, QTL analysis, chemoinformatics, high throughput screening, image processing, rendering Dilbert cartoons.

    Taverna workflow workbench. Soaplab + Web Service + Java + Beanshell + more.

    Commercial use by Apple Corp + BioTeam. Users across USA and Europe. EU EMBRACE NoE. Soaplab services supported by the European Bioinformatics Institute.

    • Service providers: BioMOBY, BluePrint, EMBOSS, BioMART
    • Middleware developers (Semantic Grid technologies, workflow): SIMDAT, EGEE, GridSphere, SCEC
    • Life Science users and tool developers: Virginia Bioinformatics Institute, SDSC, UC Davis, USC, VL-e, Purdue, caBIG, EMBRACE
    • Commercial: Lexicon Genetics, Apple, BioTeam
    • High profile: three citations in Science, May 2004 Distributed Computing issue
    • International invitations

    Confirmed by the biologists. The "Worm Lady" is Joanne Pennock; as far as I know she works for Prof. Richard K. Grencis. Description: Trichuris muris, the mouse whipworm, is a useful parasite model of the human parasite Trichuris trichiura. Whipworms derive their name from their characteristic morphology. Adults occupy the large intestine with their anterior ends embedded in the cells lining the intestine. Transmission occurs by ingestion of contaminated material.

    Jo didn't know about the tools; she didn't know how to do it properly. REUSE.

    Identified sex-dependent biological pathways involved in the mouse model. The correlation between sex dependence and the ability of mice to expel the parasite had previously been hypothesised, but had not been verified using conventional manual analysis techniques.

    Combine different formalisms in one system? E.g. a dataflow Kahn network and a central-clock-based calculus.
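    For readers unfamiliar with the first of those formalisms: in a Kahn process network, processes communicate only through unbounded FIFO channels, writes never block, and each process reads from one named channel at a time and blocks until a token arrives (it may not poll for whichever channel is ready). That rule makes the output independent of scheduling. The Python below is a minimal sketch of the dataflow half only, using threads and `queue.Queue` as channels; the process names and the x² + 2x network are illustrative, and the clock-based calculus is not modelled.

```python
import queue
import threading

END = object()  # sentinel marking the end of a stream


def source(values, outs):
    """Emit each value on every output channel, then END."""
    for v in values:
        for q in outs:
            q.put(v)
    for q in outs:
        q.put(END)


def transform(fn, inp, out):
    """Blocking read of one token at a time; write fn(token)."""
    while True:
        v = inp.get()
        if v is END:
            out.put(END)
            return
        out.put(fn(v))


def combine(left, right, out):
    """Fixed read order (left, then right): never poll for whichever is ready."""
    while True:
        a = left.get()
        b = right.get()
        if a is END or b is END:
            out.put(END)
            return
        out.put(a + b)


def collect(inp, results):
    while True:
        v = inp.get()
        if v is END:
            return
        results.append(v)


def run_network(values):
    c1, c2, c3, c4, c5 = (queue.Queue() for _ in range(5))  # unbounded FIFO channels
    results = []
    procs = [
        threading.Thread(target=source, args=(values, [c1, c2])),
        threading.Thread(target=transform, args=(lambda x: x * x, c1, c3)),
        threading.Thread(target=transform, args=(lambda x: 2 * x, c2, c4)),
        threading.Thread(target=combine, args=(c3, c4, c5)),
        threading.Thread(target=collect, args=(c5, results)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results


print(run_network([1, 2, 3]))
```

    However the threads are scheduled, the result is always x² + 2x for each input, in input order; combining this determinism with a clock-driven formalism is what makes the question in the note non-trivial.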

    Kepler logo