SysMO-DB: Just Enough Exchange for Systems Biology Data and Models
Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester
Wolfgang Müller, O. Krebs, Isabel Rojas - EML Research gGmbH (not for profit)
Jacky Snoep - University of Stellenbosch
MS eScience Workshop, Pittsburgh, PA
SysMO=SYStems biology of Micro Organisms
11 projects, 91 partners, 9 countries, started 2007
Started July 2008, 3 years, 3 staff + 3 investigators, 3 teams over 3 sites
Sensitively retrofit a data access, model handling and data integration platform.
Support and manage the diversity of data, models and competencies.
Web-based solution:
  Exchange of data, models and processes (intra- and inter-consortia)
  Search for data, models and processes across the initiative
  Dissemination of results
SysMO-DB
SysMO-DB Team
University of Manchester, UK: Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs
EML Research gGmbH, Germany: Wolfgang Müller, Olga Krebs, Isabel Rojas
University of Stellenbosch, South Africa, and University of Manchester, UK: Jacky Snoep
Connect projects, connect to outside
Project specific solutions
Internally used tools & data
Outside data and tools
Project
Public
My Disk: Data, Models, Workflows
Personal
SysMO-DB, inter-project
Own solutions
  Own data solutions and collaboration environments: wikis, e-Groupware, PHProject, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets.
Suspicion
  Suspicion and caution over sharing.
  Interesting interplay between modellers, experimentalists and bioinformaticians.
Data issues
  Many do not have data, do not follow the standards that exist, or do not know who is doing what.
  Much of the data cannot be compared: different organisms, different strains.
Resource issues
  No extra resources for the consortia.
  91 institutes, 11 consortia, some overlapping.
Principles…
Go for a series of small victories
Realistic
Don't reinvent
Migrate to standards
Sustainable and extensible
Provide instant gratification
Address doubt and anxiety
Build it
Three types of people
[Diagram: Modellers, Experimentalists and Bioinformaticians, connected by arrows labelled "Exchange"]
"Natural" collaboration within SysMO
Short, simplified, black and white:
Collaboration during project design
Varying methods of collaboration during project
  Binomes (one modeller, one experimentalist)
  Groups collaborating with groups (occasional/formalized exchange of information)
Varying success
Need for a watering hole/meeting point
  Application where experimentalists/bioinformaticians/modellers meet
(Photo: "Hot Watering Hole Action" by betty x1138, NYC, USA, taken 2005-07-14, http://flickr.com/photos/98334721@N00/25901056)
Trying to make experimentalists, modellers, bioinformaticians peacefully share resources
Some numbers & some consequences
1 Software Engineer, 1 Bioinformatician, 1 Bio-database specialist
11 projects, 91 partners
  20 programmer days/year/project
  2.5 programmer days/year/partner
  "Just in case" approach impossible
Focus on real needs
  "Just in time", "just enough"
  The right 20%
Help people help themselves
Communication!
80-20 rule: 80% of the features won't be used anyway; the useful features are the other 20%
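The per-project and per-partner figures can be checked with a quick calculation. This is only a sketch: it assumes the single software engineer works about 220 days per year, a figure that does not appear on the slide.

```python
# Assumption (not from the slides): one software engineer works ~220 days/year.
WORKING_DAYS_PER_YEAR = 220
SOFTWARE_ENGINEERS = 1
PROJECTS = 11
PARTNERS = 91

programmer_days = WORKING_DAYS_PER_YEAR * SOFTWARE_ENGINEERS
days_per_project = programmer_days / PROJECTS
days_per_partner = programmer_days / PARTNERS

print(f"{days_per_project:.1f} programmer days/year/project")
print(f"{days_per_partner:.1f} programmer days/year/partner")
```

Under this assumption the per-project figure matches the slide, and the per-partner figure lands close to the 2.5 quoted there. Either way there is no room for building features "just in case".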
Useful features
Social Approach
Questionnaires
PALs (Project Area Liaisons)
  21 postdocs and PhD students
  Bio/bioinf/modeller
  Our design and technical collaboration team
  Very intense face-to-face and virtual collaboration
  UK and Continental PALs Chapters
Audits and Sharing
  Methods, data, models, standards, software, schemas, spreadsheets, SOPs…
Communication via PALs
DB team, PALs, Projects
Show what is there
Suggest what is possible
Ask for requirements
Give requirements
Tell priorities
Rate outcomes
Suggest improvements
Double check
Transmit
Disseminate
Collect answers
Outcome of first PALs meeting:
  Need to find the guy who does xyz: Yellow pages
  Need to store Standard Operating Procedures
  Almost all our data is Excel
What's there: SysMO-SEEK screenshots
Yellow pages
Tag clouds
Bookmarks
Yellow pages tabs
ISA tabs
Standard Operating Procedures
JWS connection for modellers
View Study
New Assay (ISA)
Rights and sharing
Rights and sharing: create group
So much for the webapp
Rights + sharing
Connection to modellers' tools
Yellow pages
SOPs
Almost there: improved Excel support
Matthew Horridge
Towards Just-Enough Exchange
Incremental steps from beta to beta
Largely a story about how to handle Excel sheets for the users' benefit
SysMO Just Enough Exchange
[Diagram labels: COSMIC, Alfresco; BaCell-SysMO, Alfresco; MOSES, Wiki; SysMO-LAB, Wiki; SABIO-RK; Public Resources; Spreadsheets; BASE]
Need for tradeoff
Huge number of systems
Huge number of standards (MIBBI, OBO…)
  Some of them big standards
Too much to cope with for a few people, but:
  Comparison needs standardisation
  Search needs standardisation
Need to move incrementally to just-enough standard implementation
Path = goal
The journey is part of the reward
Let people use what they use anyway
If changes are necessary, be as unintrusive as possible
Be aware of legacy data
Nudge people towards best practices
Give instantly useful added value to as many users as possible:
  Simple search, simple exchange, simple tool use
A roadmap
Provide convincing Web 2.0 functionality for use and as appetizer
  Yellow pages
  SOPs
Upload service:
  Hand-triggered upload of link/file
  Hand-added metadata
Harvesting + change detection service
  Automatic download
  Hand-added metadata
Support for Excel templates
  Promote internal standards by use + tooling
  Mappers + parsers
  Classifiers
Use other data types where appropriate
  SBML, Matlab, Mathematica…
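One way to picture the harvesting + change detection step is to fingerprint each file and download it again only when the fingerprint changes. The sketch below makes that assumption. The function names, the in-memory dictionary standing in for a project repository, and the file names are all hypothetical and are not taken from SysMO-DB.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a SHA-256 digest used as a fingerprint of a file's content."""
    return hashlib.sha256(data).hexdigest()


def harvest(repository: dict, known_hashes: dict) -> list:
    """Return the paths that are new or changed since the last run.

    `repository` maps path -> content and stands in for a project store;
    `known_hashes` maps path -> digest from the previous harvest and is
    updated in place.
    """
    changed = []
    for path, data in sorted(repository.items()):
        digest = content_hash(data)
        if known_hashes.get(path) != digest:
            changed.append(path)  # new or modified: download again
            known_hashes[path] = digest
    return changed


# Hypothetical example data.
repo = {"assay1.xls": b"glucose 1.0", "sop.doc": b"protocol v1"}
seen = {}
first_run = harvest(repo, seen)      # everything is new
repo["assay1.xls"] = b"glucose 1.2"  # an experimentalist edits the sheet
second_run = harvest(repo, seen)     # only the edited file is reported
print(first_run, second_run)
```

Metadata would still be added by hand for each reported file, as the roadmap states.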
Stability hierarchy
Single group
Single SysMO project
Whole SysMO
Template for a group of experiments
More stable JERM data model
Template best practice
Project-level template
Increasing stability
Parsers/ annotators
Enter into that
Use mappers where needed
JERM Extraction Architecture
[Architecture diagram. Components: Project repositories, Harvester, Data handlers, Classifier/Dispatcher, Template recognizers, Extractors, Parsers, Mappers. Outputs: data and metadata.]
Oops: some projects not prolonged
Need all project data in the system fast, so…
JERM Extraction Architecture
[Architecture diagram, second version with the same component labels: Project repositories, Harvester, Data handlers, Classifier/Dispatcher, Template recognizers, Extractors, Parsers, Mappers, with additional "Data" labels.]
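The components named in the architecture can be read as a small pipeline: a classifier/dispatcher asks template recognizers what kind of sheet it has, then hands the sheet to an extractor whose mapper renames columns to shared names. The sketch below follows that reading. The template name, the column names, the `Extracted` class and the list-of-tuples sheet (standing in for a parsed Excel file) are all hypothetical, and the real JERM implementation may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Extracted:
    data: list
    metadata: dict = field(default_factory=dict)


# Hypothetical template: recognized by its header row; the mapping renames
# project-specific column names to shared ones.
TEMPLATES = {
    "growth_curve": {
        "header": ("Time [h]", "OD600"),
        "mapping": {"Time [h]": "time_h", "OD600": "optical_density"},
    },
}


def recognize_template(rows):
    """Template recognizer: return the name of the matching template, or None."""
    header = tuple(rows[0]) if rows else ()
    for name, template in TEMPLATES.items():
        if header == template["header"]:
            return name
    return None


def extract(rows, template_name):
    """Extractor + mapper: turn rows into records keyed by shared names."""
    mapping = TEMPLATES[template_name]["mapping"]
    keys = [mapping[column] for column in rows[0]]
    records = [dict(zip(keys, row)) for row in rows[1:]]
    return Extracted(records, {"template": template_name})


def dispatch(rows):
    """Classifier/dispatcher: route to an extractor, or flag for manual work."""
    name = recognize_template(rows)
    if name is None:
        # Unrecognized sheets are still accepted; metadata is added by hand.
        return Extracted([], {"template": None, "needs_manual_metadata": True})
    return extract(rows, name)


sheet = [("Time [h]", "OD600"), (0, 0.05), (1, 0.11)]
result = dispatch(sheet)
unknown = dispatch([("foo", "bar"), (1, 2)])
print(result.data[1], unknown.metadata)
```

Falling back to hand-added metadata for unrecognized sheets lets data enter the system quickly even before a template exists for it.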
Lessons we're learning
Some interesting bits along the way
Subsetting: Don't overwhelm
Standards need to be comprehensive
  Goal: "Minimum information"… (MIBBI)
  Tends to be superset of what is needed for a project
  Example for non-applicable attributes
    Tissue of a single cell
    Gender
Useful to use adapted subset-templates
Experimental design selection list
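A subset-template can be thought of as the full checklist minus the attributes that do not apply to a project. The sketch below shows that idea with hypothetical attribute names; a real minimum-information checklist would be far longer.

```python
# Hypothetical attribute names standing in for a comprehensive checklist.
FULL_CHECKLIST = [
    "organism", "strain", "tissue", "gender", "growth_medium", "temperature_c",
]


def subset_template(checklist, not_applicable):
    """Return the checklist attributes, in order, minus the non-applicable ones."""
    excluded = set(not_applicable)
    return [attribute for attribute in checklist if attribute not in excluded]


# For single-cell micro-organisms, tissue and gender make no sense.
microbe_template = subset_template(FULL_CHECKLIST, ["tissue", "gender"])
print(microbe_template)
```

Because the subset is derived from the full checklist, data entered with it stays compatible with the comprehensive standard.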
From biofolksonomy to ontology
Observation:
  Fast growing set of standards
  Standards are a moving target
Incremental approach
  Keyword annotation
  Controlled selection lists
  Home-brewed taxonomies
  Use/contribution to standard ontologies
  Provide migration tools
Tags + suggestions
Home-brewed taxonomy
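The step from free keyword tags to controlled selection lists could be supported by suggesting existing terms that resemble what a user typed. The sketch below uses fuzzy string matching from Python's standard library for this. The term list and the function name are hypothetical, and nothing here is taken from SysMO-SEEK itself.

```python
import difflib

# Hypothetical controlled selection list.
CONTROLLED_TERMS = ["transcriptomics", "proteomics", "metabolomics", "fluxomics"]


def suggest_terms(free_tag, terms=CONTROLLED_TERMS, limit=3):
    """Suggest controlled terms that closely resemble a free-text tag."""
    return difflib.get_close_matches(free_tag.lower(), terms, n=limit, cutoff=0.6)


print(suggest_terms("Proteomic"))
```

Users keep tagging freely, and the suggestions nudge them towards the shared vocabulary without forcing it.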
A word on software
Template tooling
  Excel
  Java
SysMO-SEEK (open source under Apache license)
  Ruby on Rails
    Convention over configuration
  Libraries & plugins
    Rails specific (e.g. acts_as_authenticated)
    SOLR & Lucene introduce Java/Ruby
  Database: MySQL, also tested with SQLite (exclude DB dependencies)
Summary
SysMO-DB as a virtual meeting point for different flavours of systems biologists
SysMO-DB's mantra: Just enough, just in time
  Flexible JERM extraction architecture
  Just enough metadata (incremental)
Lot done, still a lot to do
Challenges ahead…
Social
  PALs work great and are motivated
  Now need more, more, more data, data, data
Technical
  Publishing into public repositories
  Search + exploration: the test for data quality
    Hierarchical faceted search
    Distributed search via Taverna workflows
  More workflows via SysMO-SEEK
  Improve modelling support
Bonus track: what if…
…the average data quality is below par?
"Nagging functionality"
  Remind people of potentially faulty metadata
  Give suggestions what to improve and how
  Give possibility to create automatic mappings
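A nagging check could be as simple as comparing each record against a list of required fields and allowed values and turning every problem into a reminder with a suggested fix. The sketch below illustrates this under assumed rules. The field names, allowed values and message wording are hypothetical.

```python
# Hypothetical required fields and controlled values.
REQUIRED_FIELDS = ["organism", "strain", "assay_type"]
ALLOWED_VALUES = {"assay_type": {"transcriptomics", "proteomics", "metabolomics"}}


def nag(metadata):
    """Return human-readable reminders for missing or suspicious metadata."""
    reminders = []
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        if value is None or not str(value).strip():
            reminders.append(f"'{name}' is missing: please add it.")
    for name, allowed in ALLOWED_VALUES.items():
        value = metadata.get(name)
        if value and value not in allowed:
            options = ", ".join(sorted(allowed))
            reminders.append(
                f"'{name}' has unknown value '{value}': choose one of {options}."
            )
    return reminders


record = {"organism": "Bacillus subtilis", "strain": "", "assay_type": "proteomix"}
reminders = nag(record)
for reminder in reminders:
    print(reminder)
```

Each reminder says what is wrong and how to fix it, in line with giving suggestions on what to improve and how.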
Thanks
EML people: Isabel, Olga
UMAN people: Carole, Katy, Finn, Stuart, Sergejs
Jacky at Stellenbosch
BBSRC, BMBF, KTF
…and Microsoft for sponsoring this workshop
www.sysmo-db.org
End + questions