MAGE: Revised submission against LSR RFP-007 "Gene Expression"
Ugis Sarkans, EBI; Michael Miller, Rosetta Inpharmatics



Page 1: MAGE : Revised submission against LSR RFP-007 "Gene Expression"

MAGE: Revised submission against LSR RFP-007 "Gene Expression"

Ugis Sarkans, EBI

Michael Miller, Rosetta Inpharmatics


Overview

• Acknowledgements

• Specification history and structure

• Fundamental Terms

• UML Packages

• Mapping from PIM to XML-PSM

• Schedule

• Resources


Acknowledgements

• Doug Bassett (Rosetta)

• Derek Bernhart (Affymetrix)

• Alvis Brazma (EBI)

• Steve Chervitz (Affymetrix)

• Francisco Dela Vega (Applied Biosystems)

• Michael Dickson (NetGenics)

• David Frankel (IONA)

• Ken Griffiths (NetGenics)

• Scott Markel (NetGenics)

• Michael Miller (Rosetta)

• Dave Nellesen (Incyte)

• Alan Robinson (EBI)

• Ugis Sarkans (EBI)

• Barry Schwartz (Affymetrix)

• Martin Senger (EBI)

• Paul Spellman (Stanford)

• Jason Stewart (NCGR)

• Charles Troup (Agilent)

• Participants of the MAGE programming jamboree (hosted by Iobion) in Toronto, September 2001


Model-Driven Architecture

• Platform Independent Model (UML)

– most of the effort was spent on this

• Platform Specific Model: XML

• UML (refined from the PIM)

– not used (the Rational Rose profile for UML was not very useful)

• DTD

– generated from the PIM

– then modified manually


History of the submittal

• lifesci/01-06-02 - an interim draft before the Danvers meeting

– not enough time to work out the XML

• lifesci/01-08-01 - not the final submission

– the programming jamboree after the Toronto meeting helped a lot, especially in the XML mapping area

• lifesci/01-10-01 - current submission


Specification Structure

• Text document with explanations, including all diagrams

– prepared partly by exporting from Rational Rose

• PIM: the UML model as a single XMI file

• XMI => DTD translation software (as a formal representation of the mapping rules)

• XML DTD


Fundamental Terms

• BioSample - tissue, cell line, etc. that may be treated

• BioMaterial - generic term for biologically based material

• BioSequence - an abstraction of a biological sequence

• BioAssay - treatment of an array with a labeled extract, i.e. hybridization

– in a broader sense, an experimental step


Fundamental Terms (2)

• Reporter - the physical representation of biosequence(s) on an array

• Feature - a location on an array

• Event - description of an action, e.g. treatment of a BioSample or the act of hybridization

• Transformation - a specific Event that transforms one set of data into another


UML Packages (1)

• BioSequence and BQS

• BioMaterial

• BioEvent

• ArrayDesign and DesignElement

• ArrayManufacture

• BioAssay

• BioAssayData


UML Packages (2)

• Experiment

• HigherLevelAnalysis

• Miscellaneous

– Describable

– Measurement

– QuantitationType

– Protocol

– Audit and Security

UML Packages (3)

[Package diagram: BSANE BQS, Description, Protocol, Measurement, Audit, Treatment, Transformation, BioEvent, Experiment, ArrayDesign, BioMaterial, BioAssayData, BioAssay, DesignElement, HigherLevelAnalysis, BioSequence, ArrayManufacture, QuantitationType]


Package dependencies


Important package dependencies


Experiment

• Represents the container for a hierarchical grouping of BioAssays

• ExperimentDesign describes and annotates the overall design and purpose of the experiment

• The description of experimental steps can be structured with ExperimentalFactors and FactorValues:

– an ExperimentalFactor is part of the ExperimentDesign

– FactorValues can be attached to BioAssays
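To make the ExperimentalFactor/FactorValue relationship concrete, here is a minimal Python sketch. It assumes a flat list of BioAssays, although the real Experiment allows a hierarchical grouping. The class and field names (`experimental_factors`, `factor_values`, `group_by_factor`) are illustrative and do not come from the MAGE-OM API. The factor is declared once on the ExperimentDesign, and each BioAssay carries FactorValues that can then be used to group assays.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FactorValue:
    factor: str  # name of the ExperimentalFactor this value belongs to
    value: str


@dataclass
class BioAssay:
    name: str
    factor_values: list = field(default_factory=list)


@dataclass
class ExperimentDesign:
    description: str
    experimental_factors: list = field(default_factory=list)  # factor names


@dataclass
class Experiment:
    name: str
    design: ExperimentDesign
    bio_assays: list = field(default_factory=list)  # flat list (simplification)

    def group_by_factor(self, factor):
        """Group BioAssay names by their value for one ExperimentalFactor."""
        if factor not in self.design.experimental_factors:
            raise KeyError(f"{factor!r} is not an ExperimentalFactor of this design")
        groups = defaultdict(list)
        for assay in self.bio_assays:
            for fv in assay.factor_values:
                if fv.factor == factor:
                    groups[fv.value].append(assay.name)
        return dict(groups)


design = ExperimentDesign("time course", experimental_factors=["time"])
exp = Experiment("heat shock", design, [
    BioAssay("ba1", [FactorValue("time", "0h")]),
    BioAssay("ba2", [FactorValue("time", "1h")]),
    BioAssay("ba3", [FactorValue("time", "1h")]),
])
print(exp.group_by_factor("time"))  # {'0h': ['ba1'], '1h': ['ba2', 'ba3']}
```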


Experiment


HigherLevelAnalysis

• The results of performing analysis on the BioAssayData from an Experiment

• Clustering allows specifying the results of analysis as a hierarchical tree

• Cluster Nodes can have NodeValues and are associated with *Dimension objects
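The following is a hedged sketch of a Clustering result stored as a hierarchical tree. Each hypothetical `Node` holds its NodeValues (a plain dict here) and its children. Leaves carry a member name that stands in for an entry of the associated *Dimension object. The field names are assumptions, not MAGE-OM attribute names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    member: Optional[str] = None  # leaf: name of the clustered element
    node_values: dict = field(default_factory=dict)  # e.g. {"distance": 0.2}
    children: list = field(default_factory=list)

    def leaves(self):
        """Return leaf members in left-to-right (depth-first) order."""
        if not self.children:
            return [self.member] if self.member is not None else []
        result = []
        for child in self.children:
            result.extend(child.leaves())
        return result


tree = Node(node_values={"distance": 0.9}, children=[
    Node(node_values={"distance": 0.2}, children=[Node("geneA"), Node("geneB")]),
    Node("geneC"),
])
print(tree.leaves())  # ['geneA', 'geneB', 'geneC']
```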


BioAssayData

• The data associated with either a measured BioAssay or a derived BioAssay

• Data is conceptually a 3-D matrix, with dimensions:

– BioAssayDimension

– DesignElementDimension

– QuantitationTypeDimension

• Transformations are used to capture the data processing sequence and rules

– *Mapping objects formalize dimension translations

• Two representations for BioDataValues:

– a set of BioDataTuples

– BioDataCube
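The two BioDataValues representations carry the same information. Below is a minimal sketch of converting between them. It assumes the cube is indexed in the order BioAssay, DesignElement, QuantitationType, and that the identifiers along each dimension are given as ordered lists. The function names are hypothetical.

```python
from itertools import product


def tuples_to_cube(tuples, bioassays, design_elements, qtypes):
    """Build a nested-list cube indexed [bioassay][design_element][qtype]; gaps stay None."""
    ba_idx = {b: i for i, b in enumerate(bioassays)}
    de_idx = {d: i for i, d in enumerate(design_elements)}
    qt_idx = {q: i for i, q in enumerate(qtypes)}
    cube = [[[None] * len(qtypes) for _ in design_elements] for _ in bioassays]
    for ba, de, qt, value in tuples:
        cube[ba_idx[ba]][de_idx[de]][qt_idx[qt]] = value
    return cube


def cube_to_tuples(cube, bioassays, design_elements, qtypes):
    """Flatten a cube back into (bioassay, design_element, qtype, value) tuples."""
    out = []
    for (i, ba), (j, de), (k, qt) in product(
        enumerate(bioassays), enumerate(design_elements), enumerate(qtypes)
    ):
        if cube[i][j][k] is not None:  # skip empty cells
            out.append((ba, de, qt, cube[i][j][k]))
    return out


bas, des, qts = ["ba1", "ba2"], ["f1", "f2"], ["signal", "background"]
data = [("ba1", "f1", "signal", 120.0), ("ba1", "f1", "background", 10.0),
        ("ba2", "f2", "signal", 80.0)]
cube = tuples_to_cube(data, bas, des, qts)
print(cube[1][1][0])  # 80.0
```

The tuple form is compact when most cells are empty. The cube gives direct positional access once the three dimensions are fixed.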


BioAssayData

BioAssayData

[Diagram: BioAssayData together with BioAssay, QuantitationType, DesignElement, and Transformation]


QuantitationType

• StandardQuantitationTypes and SpecializedQuantitationTypes

• list of SQTs

• can refer to a Channel object

• QuantitationTypeMap - within BioAssayData package


BioAssay

• Three types of BioAssays (experimental steps):

– PhysicalBioAssay

• contains information and annotation on the event of joining an Array with BioMaterial, typically with LabeledExtract(s); also, Treatments

– MeasuredBioAssay

• results from FeatureExtraction

– DerivedBioAssay

• corresponds to a dry-lab experimental step
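The three BioAssay types form a chain from wet lab to dry lab. A minimal sketch, assuming that each MeasuredBioAssay points at the PhysicalBioAssay it was extracted from and that each DerivedBioAssay lists its upstream assays. The attribute names and the `lineage` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalBioAssay:
    name: str
    array: str  # identifier of the Array used
    labeled_extracts: list = field(default_factory=list)


@dataclass
class MeasuredBioAssay:
    name: str
    source: PhysicalBioAssay  # the assay FeatureExtraction was applied to


@dataclass
class DerivedBioAssay:
    name: str
    sources: list  # upstream Measured or Derived BioAssays (dry-lab step)


def lineage(assay):
    """Trace any BioAssay back to the PhysicalBioAssays it ultimately came from."""
    if isinstance(assay, PhysicalBioAssay):
        return [assay.name]
    if isinstance(assay, MeasuredBioAssay):
        return lineage(assay.source)
    names = []
    for src in assay.sources:
        for name in lineage(src):
            if name not in names:  # keep first-seen order, drop duplicates
                names.append(name)
    return names


hyb1 = PhysicalBioAssay("hyb1", "array-001", ["extract_A"])
hyb2 = PhysicalBioAssay("hyb2", "array-002", ["extract_B"])
scan1, scan2 = MeasuredBioAssay("scan1", hyb1), MeasuredBioAssay("scan2", hyb2)
norm = DerivedBioAssay("normalized", [scan1, scan2])
print(lineage(norm))  # ['hyb1', 'hyb2']
```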


BioAssay


Array

• Manufacturing information about the implementation of an array design

– Defects and deviations from the design can be recorded:

• FeatureDefects

• ZoneDefects

– The LIMS biomaterial information for what was put on each feature can be recorded here

– ArrayGroups and Fiducials


Array


BioMaterial

• Describes how a BioSource is treated to obtain the BioMaterial for hybridization (typically a LabeledExtract)

• Used by a BioAssayCreation, in combination with an Array, to produce a PhysicalBioAssay

• Treatments are typically linear in time but can form a directed acyclic graph (DAG)

• Formalization of Treatments with Compounds
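Treatments can branch and merge (pooling two samples, for example), so a valid processing order has to follow the graph rather than a single timeline. Below is a sketch using Python's standard `graphlib` (Python 3.9+) with made-up BioMaterial names. The dict maps each product to the BioMaterials it was derived from. This is an assumed simplification of the Treatment source/target associations.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical BioMaterial graph: each product maps to the set of
# BioMaterials it was derived from by some Treatment.
derived_from = {
    "liver_1": {"mouse_1"},
    "liver_2": {"mouse_2"},
    "pooled_liver": {"liver_1", "liver_2"},  # a merge: two sources, one target
    "total_RNA": {"pooled_liver"},
    "labeled_extract": {"total_RNA"},
}


def processing_order(graph):
    """Return BioMaterials so that every source precedes what was made from it."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the cycle
        raise ValueError(f"treatments do not form a DAG: {err.args[1]}") from err


order = processing_order(derived_from)
print(order)
```

Several orders can be valid. The only guarantee is that each source comes before its product, and a cyclic set of treatments is rejected.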


BioMaterial


DesignElement

• DesignElements:

– Features are the locations on the array

– Reporters represent a biological sequence (clone, oligo, etc.) that can be placed on one or more Features

• immobilized characteristics

– A CompositeSequence is a grouping that represents a biological sequence composed of other biological sequences (gene, exon, etc.)

• biological characteristics

• *Maps - for relating Features to Reporters, etc.

– MismatchInformation
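The *Maps record how Features relate to Reporters and how Reporters relate to CompositeSequences. This slide does not say how values should be combined along these maps. The sketch below assumes simple averaging at each level and uses plain dicts in place of the *Map classes. All names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical maps standing in for the *Map classes
feature_to_reporter = {"f1": "r1", "f2": "r1", "f3": "r2"}
reporter_to_composite = {"r1": "geneX", "r2": "geneX"}


def roll_up(feature_values, f2r, r2c):
    """Average Feature values per Reporter, then Reporter values per CompositeSequence."""
    by_reporter = defaultdict(list)
    for feature, value in feature_values.items():
        by_reporter[f2r[feature]].append(value)
    reporter_values = {r: mean(vs) for r, vs in by_reporter.items()}

    by_composite = defaultdict(list)
    for reporter, value in reporter_values.items():
        by_composite[r2c[reporter]].append(value)
    composite_values = {c: mean(vs) for c, vs in by_composite.items()}
    return reporter_values, composite_values


feature_values = {"f1": 100.0, "f2": 120.0, "f3": 200.0}
reporters, composites = roll_up(feature_values, feature_to_reporter, reporter_to_composite)
print(reporters)   # {'r1': 110.0, 'r2': 200.0}
print(composites)  # {'geneX': 155.0}
```

Averaging reporter means weights each Reporter equally, regardless of how many Features it occupies. Other aggregation rules are equally plausible.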


DesignElement


BioSequence

• BioSequence class - abstraction of various biosequences

• DatabaseEntries for characterizing BioSequences

• Simplification of the BSANE draft; will need to be compatible with the final BSANE result


ArrayDesign

• ArrayDesign describes a microarray design that can be manufactured

– Zone information

– DesignElementGroups


ArrayDesign


BioEvent

• Abstraction of various MAGE events:

– physical (e.g., BioMaterial Treatment)

– data manipulation (Transformation)

• Have associated ProtocolApplications (an ordered list)

• Subclasses have some target (the result of the BioEvent)

• Often have sources

• Relevant for BioMaterial, BioAssay, BioAssayData packages


Protocol

• Protocol and ProtocolApplication

– Protocol describes a generic laboratory procedure or analysis algorithm

– ProtocolApplication describes the actual application of a protocol

– ProtocolApplication records:

• values for the replaceable parameters

• any variation from the Protocol

• Similarly:

– Hardware and HardwareApplication

– Software and SoftwareApplication
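Below is a minimal sketch of how a ProtocolApplication might supply values for a Protocol's replaceable parameters and record deviations. Several details are assumptions made for illustration: the parameter names, the use of `None` to mean "must be supplied", and the `effective_parameters` method.

```python
from dataclasses import dataclass, field


@dataclass
class Protocol:
    name: str
    text: str
    # replaceable parameters: name -> default value (None = must be supplied)
    parameters: dict = field(default_factory=dict)


@dataclass
class ProtocolApplication:
    protocol: Protocol
    activity_date: str
    parameter_values: dict = field(default_factory=dict)
    deviations: str = ""  # free-text note of any variation from the Protocol

    def effective_parameters(self):
        """Merge supplied values over the Protocol's defaults and validate them."""
        unknown = set(self.parameter_values) - set(self.protocol.parameters)
        if unknown:
            raise KeyError(f"not parameters of {self.protocol.name}: {sorted(unknown)}")
        merged = {**self.protocol.parameters, **self.parameter_values}
        missing = sorted(k for k, v in merged.items() if v is None)
        if missing:
            raise ValueError(f"no value supplied for: {missing}")
        return merged


hyb = Protocol("hybridization", "Hybridize the labeled extract to the array.",
               {"temperature_C": 42, "duration_h": None})
app = ProtocolApplication(hyb, "2001-10-01", {"duration_h": 16},
                          deviations="washed twice instead of three times")
print(app.effective_parameters())  # {'temperature_C': 42, 'duration_h': 16}
```

The same pattern would carry over to Hardware/HardwareApplication and Software/SoftwareApplication.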


Protocol


Miscellaneous (1)

• Hierarchy of top-level abstract classes

– Extendable - can have properties

– Describable - can also have Descriptions and Security and Audit information

– Identifiable - also has an identifier (unambiguous within some scope) and a name

• AuditAndSecurity package

– Contact/Person/Organization classes

– tracking of changes (audit trail)

– user security (access rights to MAGE objects)
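The abstract-class hierarchy can be sketched with Python dataclass inheritance. Attribute names such as `property_sets`, `descriptions`, `security`, and `audit_trail` are assumptions. `IdentifierScope` is a hypothetical helper that shows what "unambiguous within some scope" implies in practice.

```python
from dataclasses import dataclass, field


@dataclass
class Extendable:
    property_sets: dict = field(default_factory=dict)  # extra name/value properties


@dataclass
class Describable(Extendable):
    descriptions: list = field(default_factory=list)
    security: object = None  # access-rights information
    audit_trail: list = field(default_factory=list)  # change history


@dataclass
class Identifiable(Describable):
    identifier: str = ""
    name: str = ""


@dataclass
class BioSequence(Identifiable):  # any concrete class inherits all of the above
    sequence: str = ""


class IdentifierScope:
    """Reject duplicate identifiers within one scope (e.g. one document)."""

    def __init__(self):
        self._objects = {}

    def register(self, obj: Identifiable):
        if obj.identifier in self._objects:
            raise ValueError(f"duplicate identifier {obj.identifier!r}")
        self._objects[obj.identifier] = obj

    def lookup(self, identifier):
        return self._objects[identifier]


seq = BioSequence(identifier="BioSequence:001", name="geneX probe", sequence="ACGT")
scope = IdentifierScope()
scope.register(seq)
print(scope.lookup("BioSequence:001").name)  # geneX probe
```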


Miscellaneous (2)

• Description package

– Description is a container for:

• free-text description

• OntologyEntries

• DatabaseEntries

• BibliographicReferences

• BQS package

– BibliographicReference class

• Measurement package

– Measurement is a quantity with a unit

– a simple Measurement ontology is provided
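A Measurement pairs a value with a unit. The sketch below uses a tiny, made-up unit table as a stand-in for the MAGE Measurement ontology. It only handles units that convert by a scale factor; temperature, for instance, would also need an offset.

```python
from dataclasses import dataclass

# Hypothetical fragment of a unit ontology: unit -> (quantity kind, factor to base unit)
UNITS = {
    "L": ("volume", 1.0), "mL": ("volume", 1e-3), "uL": ("volume", 1e-6),
    "g": ("mass", 1.0), "mg": ("mass", 1e-3), "ug": ("mass", 1e-6),
}


@dataclass(frozen=True)
class Measurement:
    value: float
    unit: str

    def to(self, unit):
        """Return an equivalent Measurement in another unit of the same kind."""
        kind_from, factor_from = UNITS[self.unit]
        kind_to, factor_to = UNITS[unit]
        if kind_from != kind_to:
            raise ValueError(f"cannot convert {self.unit} ({kind_from}) to {unit} ({kind_to})")
        return Measurement(self.value * factor_from / factor_to, unit)


volume = Measurement(2.5, "mL")
print(volume.to("uL"))
```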


DTD & XML Format

<MAGE-ML>
  <{packageName}_package>
    <{className}_assnlist>  <!-- generated container element -->
      <{className}>  <!-- independent class elements -->
        <{container}>  <!-- one of *_assn, *_assnref, *_assnlist, *_assnreflist -->
          <{className or className_ref}>
            …  <!-- alternating {container} and {className or className_ref} -->
          </{className or className_ref}>
        </{container}>
      </{className}>
    </{className}_assnlist>
    ...  <!-- more independent classes -->
  </{packageName}_package>
  ...  <!-- more packages -->
</MAGE-ML>

* slide borrowed from Angel Pizarro, UPenn
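Below is a sketch that assembles the XML tree example with Python's `xml.etree.ElementTree`. It follows the naming pattern above (`_package`, `_assnlist`, `_assn`, `_assnref`, `_ref`); the tree-example slide abbreviates the package suffix to `_pkg`. The `identifier` and `name` attribute values are made up. Resolving a `*_ref` element by matching `identifier` is an assumption about how references are used.

```python
import xml.etree.ElementTree as ET


def build_example():
    """Assemble the 'XML tree example' (identifiers and names are made up)."""
    root = ET.Element("MAGE-ML")

    audit = ET.SubElement(root, "AuditAndSecurity_package")
    contacts = ET.SubElement(audit, "Contact_assnlist")
    ET.SubElement(contacts, "Contact", identifier="Contact:1", name="J. Doe")

    exp_pkg = ET.SubElement(root, "Experiment_package")
    experiments = ET.SubElement(exp_pkg, "Experiment_assnlist")
    experiment = ET.SubElement(experiments, "Experiment", identifier="Experiment:1")
    provider = ET.SubElement(experiment, "Provider_assnref")
    ET.SubElement(provider, "Contact_ref", identifier="Contact:1")  # reference, not a copy
    design = ET.SubElement(experiment, "ExperimentDesign_assn")
    ET.SubElement(design, "ExperimentDesign")  # owned object, nested in place
    return root


def resolve_refs(root):
    """Map each *_ref element to the independent element with the same identifier."""
    by_id = {el.get("identifier"): el for el in root.iter()
             if el.get("identifier") and not el.tag.endswith("_ref")}
    return {ref: by_id[ref.get("identifier")]
            for ref in root.iter() if ref.tag.endswith("_ref")}


root = build_example()
print(ET.tostring(root, encoding="unicode"))
for ref, target in resolve_refs(root).items():
    print(ref.tag, "->", target.tag)  # Contact_ref -> Contact
```

An `_assn` container nests the associated object in place. An `_assnref` container holds only a `_ref` element pointing to an object declared elsewhere, here in another package.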


XML tree example

MAGE-ML
  AuditAndSecurity_pkg
    Contact_assnlist
      Contact
  Experiment_pkg
    Experiment_assnlist
      Experiment
        Provider_assnref
          Contact_ref
        ExperimentDesign_assn
          ExperimentDesign

* slide borrowed from Angel Pizarro, UPenn


Programming APIs

• Mapping of the OM to language-specific OMs

• APIs are automatically generated from the OM specifications

– get/set methods for associations

– get/set methods for attributes

• XML <=> language-specific OM marshallers/unmarshallers, also automatically generated
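The generated APIs come from code generators targeting Java, C++, and Perl. As a language-neutral illustration only, this Python sketch builds a class at run time from a small, hypothetical class specification. The class gets Java-style `getX`/`setX` methods for every attribute and association. `make_class` and `_accessors` are invented names, not part of any MAGE toolkit.

```python
def _accessors(name):
    """Build a getX/setX pair for one attribute or association called `name`."""
    cap = name[0].upper() + name[1:]

    def getter(self):
        return self._values[name]

    def setter(self, value):
        self._values[name] = value

    getter.__name__, setter.__name__ = f"get{cap}", f"set{cap}"
    return {getter.__name__: getter, setter.__name__: setter}


def make_class(class_name, attributes, associations):
    """Create a class with get/set methods for every attribute and association."""
    fields = tuple(attributes) + tuple(associations)

    def __init__(self):
        self._values = dict.fromkeys(fields)  # everything starts unset (None)

    namespace = {"__init__": __init__, "fields": fields}
    for name in fields:
        namespace.update(_accessors(name))
    return type(class_name, (object,), namespace)


# Hypothetical specification for one class of the object model
Experiment = make_class("Experiment", ["identifier", "name"], ["bioAssays", "providers"])
exp = Experiment()
exp.setName("heat shock")
print(exp.getName())       # heat shock
print(exp.getProviders())  # None
```

Real generators emit source files rather than building classes at run time. However, the mapping from one specification entry to one accessor pair is the same idea.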


Programming APIs (cont.)

• Use standard modules/packages

– Xerces, JDK, etc.

• Implementation in Java, C++, Perl

• Building annotation tools/database access modules on top of these APIs


Schedule

• LSR ‘vote to vote’ at the Dublin OMG meeting in November

– LSR, AB, and DTC votes at the Dublin OMG meeting

• Setting up the FTF

• Open-source implementation efforts

– Jamboree II at EBI, December 6-11

• MAGE v.2.0

– current MAGE <=> MAGE v.2.0 mapping rules


Web Sites

• MAGE specification - hosted by Rosetta

– links to documents

• presentations

• UML models

– XMI files

– Rose .mdl files

– HTML version

– PNG image files of diagrams

– http://www.geml.org/omg.htm

• MGED programming effort:

– http://sourceforge.net/projects/mged


Mailing Lists

• Specification-related

– [email protected]

– to subscribe, send the following to [email protected]:

subscribe lsr-ge <yourEmailAddress>

• MAGE-STK development-related

– https://lists.sourceforge.net/lists/listinfo/mged-mage


Questions?