4th Annual EPSRC e-science meeting

The need for e-Science

An industrial perspective

Stephen Calvert – VP Cheminformatics, GSK
Yike Guo – Imperial College

What is the “industrial” world like?

• Historically
– Low volume
  • 30-50 cmpds/yr/chemist; 10,000s of assay wells/yr
– Low information diversity
  • scientists generally dealt with limited types of data
– Reductionist approach
  • limited information per experiment
– Interpretation critical for next step
  • scientists required:
    – simple systems to assist in information monitoring
    – decision making resides with the scientist

What is the “industrial” world like?

• What happened in the last 5 years?
– “industrialisation”: application of “principles of industrialisation” to drug discovery
  • high volume
    – 10,000 cmpds/yr/chemist; 100+ million wells/yr
– biology revolution
  • Human genome
    – “system biology”: holistic view and interpretation
    – high content data (images)
    – multiple result types from each experiment: bio-markers, pathways
– knowledge integration
  • scientific discipline integration
– scientists required:
  • complex systems, algorithms, statistics, ...
  • decision making shared between systems and scientists
  • “Informatics” essential: partnership not service

How have we (IT) tackled the transition?

• Business as usual
– problem-centric view
  • build applications
  • integrate applications
• Educate scientists in the realms of IT
– “Now I need to be an IT expert alongside chemistry, biology, genetics, robotics, engineering ...”
– interesting time scale: generations
• Technology is our saviour!
– client server, web services, Java, C#, CORBA, OO programming, extreme programming, grid computing, ...

What are the results?

• “islands” of process & data
– complex integration problem
• “spaghetti” joins our worlds: unsustainable, costly
• control with “IT”
– mismatch in cycle time to change
– engineered out serendipity
– service role reversed

[Figure: “islands” of process and data (chemistry, infrastructure, samples, screening, “library” design, data). The samples island is drawn as a sample-management system: HayStack stores with tube and solid store managers, a manual store with its manager, process control manager (PinPoint), weighing balances, dissolve and sort hardware, client order, availability, submission and sample history components, booking-in, processing and dispatch managers, works order, processing and job queues, sample holding areas, the ALS system and ALS manager (RTS), a stock record database, and the Discovery Stock and Discovery Sample History warehouses. Clients are scientists and a remote compound bank. Legend: GSK applications, user interface component, database, physical queue, electronic queue, automation hardware.]

How could we do it differently?

• result in:
– handing control of science back to the scientist
– match cycle times to change
– simplify
• how can we merge the 2 worlds?
– physical, information

Doodling in knowledge and experiment space

• no predefined steps
• capture what was done; don’t restrict what can be done
• don’t restrict the non-obvious

[Figure: information resources available to the scientist: target list & status, target leads, IC50, assay, exclusion lists, structure validation, other assay...]

Q: - are these results real?

Q: - what do I know about these compounds?

Q: - what other data can I acquire?

this is workflow – isn’t it?

physical & information worlds merge

Doodling in knowledge & experiment space

• Need access to world-class scientific algorithms and tools
• Need access to disparate data sources from multiple locations
• Intuitive & flexible GUI design/analysis
• Framework needs to be very generic
• Ability to construct a “just-in-time” application
• Need to serve the requirements of a varied user community
– both in terms of scientific and technical know-how
• Capture and dissemination of “best practice” within a creative environment to enhance efficiency company-wide

Discovery Net Overview

• Funding:
– One of the Eight UK National e-Science Projects (£2.4 M)
• Key Features:
– Allow Scientists to Construct, Share and Execute Complex Knowledge Discovery Processes & Services
– Allow Institutions to Manage and Utilise the Compositional Services as their Intellectual Property
• Applications:
– Life Science
– Environmental Modelling
– Geo-hazard Prediction
• Achievement:
– For the First Time, Discovery Net Realises the Dynamic Construction of Compositional Services on the GRID for Real Time Knowledge Discovery and Decision Making
• Goal: Constructing the World’s First Infrastructure for Global Knowledge Discovery on the Grid of Web Services

[Figure: scientific information (literature, databases, operational data, images, instrument data) feeds scientific discovery in real time using GRID resources, through real time data integration, dynamic application integration, discovery services and process knowledge management.]

Workflow = Compositional Service

Enterprise Wide Integrative Scientific Decision Making Platform with Discovery Net Workflow

• Constructing a ubiquitous workflow: by scientists
– Integrate information resources/software applications cross-domain
– Support innovation and capture the best practice of your scientific research
• Warehousing workflows: for scientists
– Manage discovery processes within an organisation
– Construct an enterprise process knowledge bank
• Deployment workflow: to scientists
– Turn workflows into reusable applications/services
– Turn every scientist into a solution builder
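The idea that a workflow is itself a reusable service can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Discovery Net's API: the `Workflow` class, the step functions and the IC50-like values are all hypothetical and invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

Step = Callable[[Any], Any]


@dataclass
class Workflow:
    """A named, ordered chain of steps that is itself callable like a step."""
    name: str
    steps: List[Tuple[str, Step]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)  # record of what was run

    def then(self, label: str, step: Step) -> "Workflow":
        """Append a step and return self so calls can be chained."""
        self.steps.append((label, step))
        return self

    def __call__(self, data: Any) -> Any:
        for label, step in self.steps:
            data = step(data)
            self.log.append(f"{self.name}:{label}")
        return data


# Hypothetical steps standing in for real data access and analysis services.
def load_assay_results(_: Any) -> List[float]:
    return [0.2, 7.5, 1.1, 12.0]  # made-up IC50-like values


def keep_potent(values: List[float]) -> List[float]:
    return [v for v in values if v < 5.0]


def summarise(values: List[float]) -> dict:
    return {"count": len(values), "best": min(values)}


# Built by a scientist from existing steps...
screen = (Workflow("screen")
          .then("load", load_assay_results)
          .then("filter", keep_potent))

# ...and then reused as a single step inside another workflow.
report = Workflow("report").then("screen", screen).then("summarise", summarise)

result = report(None)
```

Because a `Workflow` is callable, `screen` can be stored, shared and dropped into `report` unchanged, and each workflow's `log` keeps a record of what was actually run.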

An Integrative Analysis Example: Interactive Scientific Discovery with Workflow

[Figure: screenshots of an integrative analysis workflow. Panels: relational data mining; text mining; spectrum data mining; chemical sequence data model; chemical data model; visualizing relational data clusters; visualizing multidimensional data; visualizing sequence data; visualizing pathway data; text mining visualization; visualizing cluster statistics; visualizing serial/spectrum data; decision tree model of metabonomic profile; chemical structure visualization.]

Discovery Net Commercialisation

Research (Discovery Net Research):
– CS: Workflow for Informatics on SOA
– Sensor: Sensor Data Processing and Mining
– Application: Life, Environmental and Geo-physical Sciences

Commercialisation (Imperial College Spin Out Companies):
– Workflow technology: KDE Informatics Platform
– HT sensor processing: Label Free HT bioSensors (DeltaDot)

Life Science Industry

library design - GSK

• Process of selecting the molecules I want to make from the universe of molecules
• Toolbox: scientific models, chemical handling, chemical properties, data access, statistics, data visualisation, ...
• Scientists can doodle in chemical space
– Capture how scientists made decisions
• New algorithms, data sources added in < 1 hour
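As a rough sketch of what selecting molecules from a universe while capturing the decisions might look like, the Python below filters a toy compound set with pluggable rules. Everything in it is an assumption made for illustration: the `register` decorator, the rule names, the thresholds and the compound properties are invented and do not describe the GSK toolbox.

```python
from typing import Callable, Dict, List, Tuple

Molecule = Dict[str, float]
Filter = Callable[[Molecule], bool]

# Registry of selection rules; adding a new rule is a one-line registration.
FILTERS: Dict[str, Filter] = {}


def register(name: str) -> Callable[[Filter], Filter]:
    """Decorator that adds a rule to the registry under the given name."""
    def wrap(fn: Filter) -> Filter:
        FILTERS[name] = fn
        return fn
    return wrap


# Illustrative thresholds only.
@register("mol_weight<=500")
def light_enough(m: Molecule) -> bool:
    return m["mol_weight"] <= 500.0


@register("logp<=5")
def not_too_lipophilic(m: Molecule) -> bool:
    return m["logp"] <= 5.0


def design_library(
    universe: Dict[str, Molecule], rule_names: List[str]
) -> Tuple[List[str], List[Tuple[str, str, bool]]]:
    """Return the selected ids plus a decision trail of (id, rule, passed)."""
    selected: List[str] = []
    trail: List[Tuple[str, str, bool]] = []
    for mol_id, props in universe.items():
        keep = True
        for rule in rule_names:
            passed = FILTERS[rule](props)
            trail.append((mol_id, rule, passed))
            if not passed:
                keep = False
                break  # stop at the first failed rule
        if keep:
            selected.append(mol_id)
    return selected, trail


# Made-up compounds and property values.
universe = {
    "cmpd-1": {"mol_weight": 320.4, "logp": 2.1},
    "cmpd-2": {"mol_weight": 612.7, "logp": 3.3},
    "cmpd-3": {"mol_weight": 410.0, "logp": 6.2},
}

selected, trail = design_library(universe, ["mol_weight<=500", "logp<=5"])
```

The `trail` list records which rule accepted or rejected each compound, which is one simple way to capture how a selection was made; a new rule becomes available as soon as it is registered.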

The 2003 SARS outbreak

KDE Example 2: SARS Genome Annotation

[Figure: China SARS Virtual Lab based on Discovery Net. D-Net (integration, interpretation, and discovery) links the following analyses and resources: relationship between SARS and other viruses; mutual regions identification; homology search against viral genome DB, protein DB and motif DB; annotation using Artemis and GenSense; gene prediction and predicted genes; phylogenetic analysis; exon prediction; splice site prediction; immunogenetics; multiple sequence alignment; microarray analysis; key word search; GeneSense Ontology; epidemiological analysis; SARS patients diagnosis; protein localization site prediction; protein interaction prediction; prediction of the relationship between SARS virus and human receptors; classification and secondary structure prediction; bibliographic databases; Genbank.]

Achievement: Dynamic Construction of Compositional Services
• Rapid construction of applications via composition of existing web services using workflow
• Instant deployment of analytical workflows as new web services with resource mapping
• Integrated workflow, provenance and service management
• Collaborative construction of workflows by large numbers of researchers

Requirements:
• Rapid construction and sharing of mission-critical discovery services
• Integration of diverse bioinformatics applications
• Support for collaborative research between geographically distributed researchers
• Deployment of services as easy-to-use tools for real time decision making

Compositional Services for SARS Mutation Analysis

• 50 data resources
• > 200 software applications and services
• Designed on top of the web service environment
• Used by more than 200 scientists
• Results published in Science

Future Challenge: GSK - InforSense & IC e-Science Collaboration

• Workflow Fusion: Applying advanced performance programming technology for dynamic optimization of workflow execution

• Workflow Abstraction: Investigating abstraction mechanisms for building workflow hierarchy and higher order composition forms

• Dynamic Service Composition: Investigating service ontology for dynamically composing services with workflow

• Workflow Metadata Model: Building up a generic metadata model for scientific workflow management and workflow warehousing

• Man-machine interface: free scientists from IT speak
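The slides do not define “workflow fusion”. One possible reading, borrowed from loop fusion in performance programming, is to merge consecutive element-wise steps so the data is traversed once instead of once per step. The Python sketch below illustrates only that assumed reading; the step functions and readings are hypothetical.

```python
from functools import reduce
from typing import Callable, Iterable, List, Sequence

ElementStep = Callable[[float], float]


def run_unfused(steps: Sequence[ElementStep], data: Iterable[float]) -> List[float]:
    """Run each step as its own pass, building an intermediate list each time."""
    values = list(data)
    for step in steps:
        values = [step(v) for v in values]
    return values


def fuse(steps: Sequence[ElementStep]) -> ElementStep:
    """Combine element-wise steps into one function, applied in order."""
    def fused(value: float) -> float:
        return reduce(lambda acc, step: step(acc), steps, value)
    return fused


def run_fused(steps: Sequence[ElementStep], data: Iterable[float]) -> List[float]:
    """Apply the fused function in a single pass, with no intermediate lists."""
    fused = fuse(steps)
    return [fused(v) for v in data]


# Hypothetical element-wise steps.
def subtract_background(v: float) -> float:
    return v - 1.0


def scale(v: float) -> float:
    return v * 2.0


def clip_negative(v: float) -> float:
    return max(v, 0.0)


steps = [subtract_background, scale, clip_negative]
readings = [0.5, 1.5, 4.0]  # made-up instrument readings

unfused_result = run_unfused(steps, readings)
fused_result = run_fused(steps, readings)
```

Both runs give the same values; the fused version simply avoids materialising one intermediate list per step, which matters more as data volumes grow.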

How can you help?

• encourage focused research on the key issues SCIENTISTS face in industry
• catalyse joint work in these focused fields between academics, industry and commercial software vendors
• facilitate solution-oriented communication between computer scientists and domain scientists in both academia and industry

e-Science

• A politician's view:
‘[The e-Science platform] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’
Tony Blair

• A Scientist’s View:
[The e-Science platform] should help me to do my scientific research free from the complexity of IT