Upload
julio-e-peironcely
View
792
Download
5
Tags:
Embed Size (px)
DESCRIPTION
Talk given at the 9th International Conference on Chemical Structures, June 9th. How metabolomics and metabolite identification can be improved by chemoinformatics.
Citation preview
June 9th 2011 ICCS 2011
Julio E. Peironcely @peyron
PhD student at Leiden University and TNO
Improving Metabolite Identification with Chemoinformatics
Metabolome
Julio E. Peironcely
all low molecular weight molecules(metabolites) in cells, body fluids,
tissues, etc.
Metabolomics the quantitative and qualitative
analysis of all metabolites in samples of cells, body fluids,
tissues, etc. June 9th 2011 ICCS 2011
Metabolomics Genome
Transcripts
Proteins
Metabolites
Phenotype
Julio E. Peironcely June 9th 2011 ICCS 2011
Metabolomics
Biological question
Sample preparation
Experi- mental design
Data acquisition
Data pre- processing
Biological inter-
pretation
Data analysis
Samples Raw data List of peaks/ biomolecules
Relevant biomolecules/ connectivities
& Models
Metabolites
Sampling
Protocol
Julio E. Peironcely June 9th 2011 ICCS 2011
LC-MS and Metabolite Identification
peak = m/z@rt = metabolite
RPLC-microTOF (Bruker)
Kloet et al, Metabolomics 2011
MS MS
LC
Julio E. Peironcely June 9th 2011 ICCS 2011
When all you have is a mass and a “window”
Mass + Window
MS KEGG HMDB
PubChem
ChemSpider
Measured Mass + Mass Window = Multiple Elemental Compositions
List of Elemental Compositions
List of Molecules
Julio E. Peironcely June 9th 2011 ICCS 2011
When you have MS
Mass + Window MS n
Use the expert system
Measured Mass + Mass Window + Fragments =
Single Elemental Composition
n
KEGG HMDB
PubChem
ChemSpider
fragments 1 EC
List of Molecules
Julio E. Peironcely June 9th 2011 ICCS 2011
Bottlenecks in Metabolite Identification
Many metabolites in LC-MS not identified
HighRes MS can obtain 1 EC = many structures
No tools for automatic identification
Takes long for the expert to identify metabolites
Julio E. Peironcely
Experimental Data
Processing Data Trees
Find Similar Trees
Generate Structures
Filter Structures
1.1 1.2
1.1.1
1.2.1
1.2.2
1
1.1 1.2
1.1.1 1.2.1 1.2.2
Database MS n
present
absent
Identity assignment
Structure generator
Elemental Formula +
Fragments
Metabolite-likeness
Our approach
Julio E. Peironcely June 9th 2011 ICCS 2011
Experimental Data
Processing Data Trees
Generate Structures
Filter Structures
1.1 1.2
1.1.1
1.2.1
1.2.2
1
1.1 1.2
1.1.1 1.2.1 1.2.2
Structure generator
Elemental Formula +
Fragments
Metabolite-likeness
Our approach
Find Similar Trees Database
MS n present
absent
Identity assignment
Julio E. Peironcely June 9th 2011 ICCS 2011
Experimental Data
Processing Data Trees
Find Similar Trees
Filter Structures
1.1 1.2
1.1.1
1.2.1
1.2.2
1
1.1 1.2
1.1.1 1.2.1 1.2.2
Database MS n
present
absent
Identity assignment
Metabolite-likeness
Our approach
Generate Structures
Structure generator
Elemental Formula +
Fragments
Julio E. Peironcely June 9th 2011 ICCS 2011
Experimental Data
Processing Data Trees
Find Similar Trees
Generate Structures
1.1 1.2
1.1.1
1.2.1
1.2.2
1
1.1 1.2
1.1.1 1.2.1 1.2.2
Database MS n
present
absent
Identity assignment
Structure generator
Elemental Formula +
Fragments
Our approach
Filter Structures
Metabolite-likeness
Julio E. Peironcely June 9th 2011 ICCS 2011
MEF: spectral to fragmentation tree
CML
MEF CDK
XCMS
Rojas-Cherto et al. Bioinformatics (submitted)
Experimental Data
Processing Data Trees
Find Similar Trees
Generate Structures
Filter Structures
Julio E. Peironcely June 9th 2011 ICCS 2011
MEF: spectral to fragmentation tree
1.1 1.2
1.1.1 1.2.1 1.2.2
1
1.1 1.2
1.1.1 1.2.1 1.2.2
Mass EC …
1.3
1.3.1 1.3.2
1.3
1.3.1 1.3.2 1.2.3
1.2.3
full scan
MS2
MS3
MS
Spectral Tree
Peak
Noise
υ υ υ υ
υ
Fragmentation Tree
Parent Ion
Reaction
Neutral loss Fragments
Rojas-Cherto et al. Bioinformatics (submitted)
Experimental Data
Processing Data Trees
Find Similar Trees
Generate Structures
Filter Structures
Julio E. Peironcely June 9th 2011 ICCS 2011
Fragmentation tree fingerprints
1.1 1.2
1.1.1
1.2.1
1.2.2
1
1.1 1.2
1.1.1 1.2.1 1.2.2
1.3
1.3.1
1.3.2
1.3
1.3.1 1.3.2 1.2.3 1.2.3
Miguel Rojas-Cherto, Leiden University
Experimental Data
Processing Data Trees
Find Similar Trees
Generate Structures
Filter Structures
Poster 59
Julio E. Peironcely June 9th 2011 ICCS 2011
Fragmentation tree fingerprints results
Miguel Rojas-Cherto, Leiden University
Experimental Data
Processing Data Trees
Find Similar Trees
Generate Structures
Filter Structures
Poster 59
“unknown” Most similar trees MCS
Julio E. Peironcely June 9th 2011 ICCS 2011
Structure Generator Elemental Formula Fragments
Generate
Keep molecules if canonical
augmentation
Poster 38
In collaboration with Jean-Loup Faulon, Evry University
Experimental Data
Processing Data Trees
Find Similar Trees
Generate Structures
Filter Structures
CDK Nauty
All non-duplicated molecules
Julio E. Peironcely June 9th 2011 ICCS 2011
Structure Generator Results
Glycine Phenylalanine Malic acid D-Cysteine p-Cresol sulfate
C2H5NO2 C9H11NO2 C4H6O5 C3H7NO2S C7H8O3S
84 277,810,163 8,070 3,838 10,203,389
6 4,037,499 1,601 100 19,940
93,137 948
584
278
Elemental Composition
# Output Molecules
1 Fragment
2 Fragments
3 Fragments
MOLGEN same # of molecules
In collaboration with Jean-Loup Faulon, Evry University
Experimental Data
Processing Data Trees
Find Similar Trees
Generate Structures
Filter Structures
Poster 38
Julio E. Peironcely June 9th 2011 ICCS 2011
Metabolite-likeness
Training Set 532 + 532
HMDB 8K
ZINC 21M
Standardization
Diversity Selection
Test Set 6.4K + 6.4K
5-fold CV
SVM RF BC
Metabolite likeness
In collaboration with Andreas Bender, Cambridge Univ.
Atom Counts Physicochemical desc.
MDL Public Keys FCFP_4 ECFP_4
Experimental Data
Processing Data Trees
Find Similar Trees
Generate Structures
Filter Structures
Julio E. Peironcely June 9th 2011 ICCS 2011
Metabolite-likeness
Training Set 532 + 532
HMDB 8K
ZINC 21M
Standardization
Diversity Selection
Test Set 6.4K + 6.4K
5-fold CV
SVM RF BC
Metabolite likeness
1st RF – MDLPublicKeys Sensitivity Specificity AUC
99.84% 87.52% 99.20%
2nd RF – ECFP_4 Sensitivity Specificity AUC
99.77% 86.36% 99%
In collaboration with Andreas Bender, Cambridge Univ.
Atom Counts Physicochemical desc.
MDL Public Keys FCFP_4 ECFP_4
Experimental Data
Processing Data Trees
Find Similar Trees
Generate Structures
Filter Structures
Julio E. Peironcely June 9th 2011 ICCS 2011
Metabolite-likeness, external validation
HMDB External
validation set ChEMBL
Metabolite likeness
DrugBank
Standardization
RF – MDLPublicKeys RF – ECFP_4
In collaboration with Andreas Bender, Cambridge Univ.
Random Selection
Experimental Data
Processing Data Trees
Find Similar Trees
Generate Structures
Filter Structures
Julio E. Peironcely June 9th 2011 ICCS 2011
Metabolite-likeness, external validation
In collaboration with Andreas Bender, Cambridge Univ.
Experimental Data
Processing Data Trees
Find Similar Trees
Generate Structures
Filter Structures
Julio E. Peironcely June 9th 2011 ICCS 2011
www.MetiTree.nl ICCS 2011
Group Database Upload and organize your own trees Visualize trees Find similar trees (to be added: Struct. Gen, Met-likeness)
Julio E. Peironcely June 9th 2011 ICCS 2011
Conclusions Chemoinformatics plays a crucial
role in the metabolite identification pipeline
Now it is the time to challenge this pipeline with real cases
Expert is still needed
Julio E. Peironcely June 9th 2011 ICCS 2011
Acknowledgements TNO Quality of Life Leon Coulier Albert Tas
Evry University Jean-Loup Faulon Davide Fichera
HMP University of Alberta David Wishart Ying (Edison) Dong
Wageningen University/PRI Egon Willighangen Justin van der Hooft Ric de Vos Jacques Vervoort
Leiden University Miguel Rojas-Cherto Piotr Kasper Michael van Vliet Theo Reijmers Rob Vreeken Ronnie van Doorn Thomas Hankemeier
University of Cambridge Andreas Bender
Julio E. Peironcely June 9th 2011 ICCS 2011