1 Data Integration and Extraction over Molecular Biological Data Cui Tao supported by NSF

Data Integration and Extraction over Molecular Biological Data

Cui Tao

supported by NSF

Motivation

Online biological data:

  • Highly diverse in granularity and variety
  • Various formats
  • Different terminologies, ID systems, units

How to Build a Gene Extraction Ontology?

  • Concepts
  • Relationship sets
  • Constraints
  • Data Frames

RNA sequence: (G*A*U*C*)*

DNA sequence: (G*A*T*C*)*
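
The two patterns above accept any string over the RNA alphabet {G, A, U, C} and the DNA alphabet {G, A, T, C}, respectively. The following is a minimal sketch of how such patterns might be used as recognizers, assuming Python's `re` module. The helper `classify_sequence` is hypothetical and not part of the original slides. The sketch swaps in the equivalent character classes `[GAUC]+` and `[GATC]+`, because nested stars such as `(G*A*T*C*)*` can backtrack heavily on long inputs, and it requires at least one base, whereas the slide patterns also accept the empty string.

```python
import re

# Character-class forms of the slide patterns (assumption: at least one base).
RNA = re.compile(r"[GAUC]+")
DNA = re.compile(r"[GATC]+")


def classify_sequence(text: str) -> str:
    """Classify a candidate string by the alphabet it is written in."""
    s = text.strip().upper()
    is_dna = DNA.fullmatch(s) is not None
    is_rna = RNA.fullmatch(s) is not None
    if is_dna and is_rna:
        # Neither T nor U occurs, so the alphabet alone cannot decide.
        return "DNA or RNA"
    if is_dna:
        return "DNA"
    if is_rna:
        return "RNA"
    return "unknown"
```

A string containing both T and U, or any other character, is reported as `unknown`.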

Knowledge Sources

  • Gene Ontology: thousands of terms (Molecular Function, Biological Process, Cellular Component)
  • All Species Toolkit: 1,231,935 species names
  • Protein Databases: thousands of protein names

  • Extraction Rules
  • Statistical NLP
  • Machine learning: Naïve Bayes, Hidden Markov Models, Decision Trees

Integration

Integration: Information Hidden behind Links

Query-based Extraction

  • Query the gene extraction ontology
  • Find applicable resources
  • Fill out forms
  • Extract information
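
The three steps after the query (find applicable resources, fill out forms, extract information) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Resource` class, the function names, the placeholder resources `resource_a` and `resource_b`, and the toy result page are all hypothetical, and no real database or web form is contacted.

```python
import re
from dataclasses import dataclass


@dataclass
class Resource:
    name: str        # placeholder label, not a real database
    provides: set    # ontology concepts the resource can answer
    form_field: str  # name of the search field in its query form


def find_applicable(resources, wanted):
    """Step 1: keep resources that provide at least one requested concept."""
    return [r for r in resources if r.provides & wanted]


def fill_form(resource, gene_name):
    """Step 2: build the form parameters that would be submitted."""
    return {resource.form_field: gene_name}


def extract(page_text, recognizers):
    """Step 3: apply one recognizer per concept to the returned page text."""
    found = {}
    for concept, pattern in recognizers.items():
        match = re.search(pattern, page_text)
        if match:
            found[concept] = match.group(1)
    return found


resources = [
    Resource("resource_a", {"Gene Sequence"}, "q"),
    Resource("resource_b", {"Species"}, "term"),
]
hits = find_applicable(resources, {"Gene Sequence", "Protein Function"})
params = fill_form(hits[0], "alfR")
page = "Sequence: GATTACA"  # toy stand-in for a fetched result page
result = extract(page, {"Gene Sequence": r"Sequence:\s*([GATC]+)"})
```

In a real system, step 2 would submit `params` to the resource's form and step 3 would run on the returned page. Here both are replaced by in-memory stand-ins.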

Example: “Find the alfR gene, its sequence, its protein's function, and any mutant that inhibits this gene.”

[Figure: ontology concepts involved in the query: Gene, Gene Name, Gene Sequence, Protein Function, Mutant, Mutant Function]
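
One way to picture how the example query is related to ontology concepts is simple keyword matching. The sketch below is a hedged illustration, not the method from the slides. The concept names are taken from the slide, while the cue patterns and the helper `concepts_in_query` are assumptions made for this example.

```python
import re

# Concept names come from the slide; the keyword cues are illustrative assumptions.
CONCEPT_CUES = {
    "Gene Name": r"\b\w+\s+gene\b",
    "Gene Sequence": r"\bsequence\b",
    "Protein Function": r"\bprotein'?s?\s+function\b",
    "Mutant": r"\bmutants?\b",
}


def concepts_in_query(query: str) -> list:
    """Return the concepts whose cue pattern occurs in the query text."""
    q = query.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return [concept for concept, cue in CONCEPT_CUES.items() if re.search(cue, q)]


query = ("Find the alfR gene, its sequence, its protein's function, "
         "and any mutant that inhibits this gene.")
matched = concepts_in_query(query)
```

For the example query, all four cues fire, so `matched` lists Gene Name, Gene Sequence, Protein Function, and Mutant.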

Contribution

  • Provides a way to automatically integrate online biological data from different sources
  • Provides an approach that finds appropriate online resources, fills out online forms, and extracts data based on the user's query