Upload
srikantapatra
View
215
Download
0
Embed Size (px)
Citation preview
8/8/2019 03 Wright Carolina EBRC
1/32
1
Fred Wright, Ph.D.
University of North Carolina at Chapel Hill
The CarolinaEnvironmental
Bioinformatics Research Center
8/8/2019 03 Wright Carolina EBRC
2/32
2
Organization of Center Three major Research Projects: (1) Biostatistics,
(2) Cheminformatics, and (3) ComputationalInfrastructure for Systems Toxicology
Administrative Unit
Public Outreach and Training Activity (POTA) Functional areas ofAnalysis, Methods
Development and Tools Development overseen
by a panel of experienced investigators
8/8/2019 03 Wright Carolina EBRC
3/32
3
Director
Dr. Fred WrightExternal Science
Advisory Committee
Administrative Co-Director
Dr. Mary Sym
Analysis - Dr. Lawrence KupperMethods - Dr. J. Stephen Marron
Tools - Dr. Jan Prins
Integration
Translation(POTA)
Dr. BradHemminger
QualityAssurance
Dr. Mary Sym
Research Oversight
Project 1
Biostatistics
Dr. Fred Wright
Project 2
Chem-informatics
Dr. AlexTropsha
Project 3
Computationand SystemsToxicology
Dr. Ivan Rusyn
Dr. David Stotts
CEBRC Center Organization
Scientific Co-Director
Dr. Ivan Rusyn
8/8/2019 03 Wright Carolina EBRC
4/32
4
(1) Biostatistics inComputational Toxicology
Emphasis on strengths inmicroarray analysis,
elucidation of
networks/pathways,Bayesian approaches
Stresses existing
capabilities
8/8/2019 03 Wright Carolina EBRC
5/32
5
(2) Chem-informatics seeks to establish a
universally applicable androbust predictive toxicologymodeling framework
Focuses on Quantitative
Structure Activity/PropertyRelationships (QSAR)
Establishes a modeling
workflow, toxicityprediction scheme and planfor software development
Figure 4. Predictive QSPR Modeling Workflow
InputStructure
File
ConvertStructures
dbtranslateBabeletc.
MolconnZGenAP
etc.
GenerateDescriptors
Utility(UNC)
NormalizeDescriptors
Descriptor Generation
MolconnZ-ToDescr
(UNC)
ReformatDescriptors
Descriptor formatting
InputDescriptor
File
InputActivity
File
Train & TestSet Selection
SE6(UNC)
Build & Test Models
Randomize(UNC)
RandomizeActivities
RWKNN,SAPLS (UNC),
etc
QSARAlgorithm
KNNPredict,
SAPLSPred (UNC)etc.
PredictTest Set
Report & Visualize Results
ModStat(UNC)
CompileResults
Weblab, TSAR,MOE, Spotfire,
etc.
VisualizeResults Database
to Screen
Screen Database
Utility
NormalizeDescriptors
DBMine,KNNPredict,
etc.
MineDatabase
TSAR, MOE,Spotfire,
etc.
VisualizeHits
QSAR
Model(s)
programsfunctions User input
Do y-randomTesting?
Randomize
Y-rand
test
QSARModel
Predict
Pass orFail?
Train/testSelection?
Ye s
No
PassFailUpdate Status Db
To Complete
DescriptorPreparation
QSARModeling/
Predict
No
UpdateStatus DB
To Complete
SE6
QSARModelpredict
Are there moretraining sets?
No
Ye s
CollectGRID
Process
Results
Ye s
Figure 5. Workflow Logic of Model Generation Process
8/8/2019 03 Wright Carolina EBRC
6/32
6
(3) Computational
Framework for
Systems Toxicology
Uses model for toxicityprofiling in multiple strains
of mice to set up
computational infrastructure Some data mining activity
will develop user-friendly
software tools from methodsin Projects 1 and 2
Control C57BL/10J BUB/BnJ MSM/Ms
Liver Injury
8/8/2019 03 Wright Carolina EBRC
7/32
7
Project 1Biostatistics in Computational Toxicology Fred Wright, Ph.D. (P.I.) statistical genetics, genomic analysis
Mayetri Gupta, Ph.D. sequence analysis, motif detection Young Troung, Ph.D. Bayesian network genetic analysis, SVM
methods for metabolomic data
Joseph Ibrahim, Ph.D. Bayesian analysis of microarray data Danyu Lin, Ph.D. haplotype-phenotype analysis, microarray
analysis
Fei Zou, Ph.D. statistical genetics, genomic analysis
Andrew Nobel, Ph.D. clustering, data dimensional reduction,
genetic pathway analysis
Masters trained personnel
8/8/2019 03 Wright Carolina EBRC
8/32
8
Project 1 objectives
to provide statistical analysis capability
to develop appropriate new methods to apply to
computational toxicology problems
to develop computational tools
to disseminate research findings, train students, and
coordinate additional statistical research incomputational toxicology.
8/8/2019 03 Wright Carolina EBRC
9/32
9
Methods (to name a few)
Sample size estimation for high-throughput data
P-value computation, significance testing Multiple-testing issues, false discovery rates
Dose-response modeling
New measures of differential expression
Transcriptional regulation and motif discovery
Network analysis, discrimination methods Pathway analysis
8/8/2019 03 Wright Carolina EBRC
10/32
10
Tools Much of initial code has been implemented in
R/Bioconductor. This is directly useful to other
statistical investigators.
Working with project 3 investigators and students toproduce user-friendly web-based and/or standalone
applications
Working to increase utility of methods by integrationwith informatics and biological annotation
We view the SAM software as a model for
independent successful dissemination. Project 3
personnel are working on implementing appropriate
procedures in ArrayTrack
8/8/2019 03 Wright Carolina EBRC
11/32
11
Log(sample mean)
Expression measurementsshow a mean-variancerelationship
Which we can exploit to
reduce the false discoveryrate
Grey envelope shows allthe SAM procedures
Example 1. New ways of detecting
differential expression
Hu and Wright, in press
Lowest curve is a newmaximum likelihoodprocedure
Log(sample mean)
8/8/2019 03 Wright Carolina EBRC
12/32
12
Example 2. Significant genes/pathways/categories: theSignificance Analysis ofFunction and Expression
procedure (honest pathway significance testing)
Category
p-value
Genes in
category
Shading
indicatesindividually
significant
genes
Barry et al. Bioinformatics 21:1943-1949 , 2005
8/8/2019 03 Wright Carolina EBRC
13/32
13
Key: blue (p
8/8/2019 03 Wright Carolina EBRC
14/32
14
Example 3. Isotonic regression: gene expression dose-response data
Model -
fshould be
strictly
increasing or
decreasing
Hu et al., 2005, Bioinformatics21: 3524-3529).
8/8/2019 03 Wright Carolina EBRC
15/32
15
Pyrethroid Biomarker Project (J. Harrill, K.
Crofton and colleagues, U.S. E.P.A)
Problem: Lack of a cost efficient biomarker of effect
hampers assessments of the cumulative risk ofpyrethroid insecticides.
Aim: Develop a biochemical biomarker of effect for
pyrethroids that reflects changes in neuronal firingrates.
Methods: Use gene arrays and RT-PCR to identify
dose-responsive transcripts in rat CNS. Permethrin
and deltamethrin each examined at four doses,
Affymetrix arrays.
8/8/2019 03 Wright Carolina EBRC
16/32
16
( ) ( )highest lowest f dose f doseM
v
=
Dose-response, cont.: a statistic to rank genes
Standard error estimate.Could be improved.
Dose-response dataon pyrethroid in ratbrains, courtesy ofJ. Harrill and K.Crofton, U.S. E.P.A.
8/8/2019 03 Wright Carolina EBRC
17/32
17
Project 2Chem-informatics
Alex Tropsha, Ph.D. (P.I.) computational chemistry, QSAR Weifan Zheng, Ph.D. computational methods in drug discovery,
QSAR
Alexander Golbraikh, Ph.D. mathematical approaches in QSAR
development
Yufeng Liu, Ph.D. Support vector machines, semi-supervised
machine learning
additional personnel
8/8/2019 03 Wright Carolina EBRC
18/32
18
Project 2 objectives
to develop an innovative QSPR modeling workflow
based on the principles of combinatorial QSPR
modeling, model validation and consensus prediction
to develop toxicity predictors using the workflow
to integrate modeling tools and endpoint predictors
using workflow design middleware and workflow
deployment in a predictive toxicology web portal
Applied to toxicology datasets
8/8/2019 03 Wright Carolina EBRC
19/32
19
Project 2 builds on years of research in the Tropsha lab
on QSAR/QSPR modeling and developing robustpredictors
Many of the machine learning and cross-validation ideas
are used in statistical genomics
Descriptors topological molecular indices, size and
shape, hydrophilic/phobic indices, physical properties,
etc.
Try to predict biological activity
Analysis of the Carcinogenic Potency Database
(collaboration with Dr. A. Richard, EPA) has been
performed, applied to 693 compounds, with
classification kNN QSAR prediction accuracies
estimated at 85%-90%.
8/8/2019 03 Wright Carolina EBRC
20/32
20
Only accept models
that have aq2 > 0.6
R2 > 0.6, etc.
MultipleTraining Sets
Validated Predictive
Models with High Internal& External Accuracy
OriginalDataset
MultipleTest Sets
Combi-QSARModelingSplit into
Training andTest Sets
ActivityPrediction
Y-Randomization
Database
Screening
Only accept models
that have aq2 > 0.6
R2 > 0.6, etc.
MultipleTraining Sets
Validated Predictive
Models with High Internal& External Accuracy
OriginalDataset
MultipleTest Sets
Combi-QSARModelingSplit into
Training andTest Sets
ActivityPrediction
Y-Randomization
Database
Screening
Flowchart of predictive toxicology framework based on
validated combi-QSAR models. Numerous public datasets
proposed.
8/8/2019 03 Wright Carolina EBRC
21/32
21
Project 3Computational Infrastructure for Systems Toxicology
David Stotts, Ph.D. (co-P.I.) computer science, software
engineering
Ivan Rusyn, Ph.D. (co-P.I.) toxicology, genomics Wei Wang, Ph.D. computer science, data mining
David Threadgill, Ph.D. mammalian genetics, genomics
Additional programmers and students
8/8/2019 03 Wright Carolina EBRC
22/32
22
Project 3 objectivesDevelop and implement algorithms that streamlineanalysis of multi-dimensional data streams.
Facilitate the development of a standard workflow for
(i) analysis of -omics data, (ii) linkages to classical
indicators of adverse health effects, and (iii) integration
with other types of biological information such as
sequence and cross-species comparisons.
Implement user-friendly software for approaches fromProjects 1-3
8/8/2019 03 Wright Carolina EBRC
23/32
23
A driving biological problem:
Toxicogenetic analysis of susceptibility to
toxicant-induced organ injury The model being used by Drs. Threadgill and
Rusyn involves extensive profiling of numerous
mouse strains (over 40) for relevant organs Early data on acetominophen and alcohol on liver
Proposals for trichloroethylene and other toxicants
on liver, kidney, and other organs
8/8/2019 03 Wright Carolina EBRC
24/32
24Image courtesy of D.W. Threadgill
The Mouse as a Model for StudyingGenotype-Phenotype Interactions
8/8/2019 03 Wright Carolina EBRC
25/32
25
Strain-specific susceptibility toacetaminophen (APAP)-induced
liver injury. Serum ALT levels(top panel) and tissuehistopathological changes(bottom panel) were assessed24 hrs after a single doseexposure to APAP (300mg/kg, i.g., 24 hrs).
A large variation in response by
genetic background
8/8/2019 03 Wright Carolina EBRC
26/32
26
Toxicological and expression analysis of genotype-specific responses toethanol in liver. Serum and liver tissues were collected from mice of 6different strains after acute (5 g/kg, 6 hrs; A) or subchronic (4 weeks, B)treatment with ethanol.
8/8/2019 03 Wright Carolina EBRC
27/32
27
Systems Biology Approach
8/8/2019 03 Wright Carolina EBRC
28/32
28
Transcriptome map for the murine brain and liver.
Examination of genetic networks that regulate
gene expression in liver (webQTL and beyond)
Source: Ivan Rusyn and colleagues
8/8/2019 03 Wright Carolina EBRC
29/32
29
Correlation between gene expression of CYP2C29 and several liver-specific phenotypes recorded for BXD strains.
Source: Ivan Rusyn and colleagues
8/8/2019 03 Wright Carolina EBRC
30/32
30
Development of new methods for -omics data analysis:Finding associations between gene expression profiles and
strain-specific genotyping data
SiZer
Smoothing
approach
8/8/2019 03 Wright Carolina EBRC
31/32
31
Data analysis procedures in concert with project 1,including principal component analyses, distance-weighted discrimination, SAFE, etc.
Specific data mining approaches also proposed, suchas subspace clustering (SNPs vs. phenotypes, geneexpression), that fall outside of typical statistical
framework The computational challenges are immense when wecompare different omics platforms (e.g., 100,000SNPs X 30,000 transcripts)
This requires serious computer science (activities ofUNC SNP group).
8/8/2019 03 Wright Carolina EBRC
32/32
32
Collaborations with the NCCT and NJ ebCTC Several collaborative projects identified and ongoing
Several graduate students working on projects at
NCCT or shared advising with CEBRC members
Implementation of ArrayTrack extensions underway
in Project 3
Plan for deliberate expansion of collaborations with
NCCT
The ebCTC is highly complementary to our Center.
Coordination with ebTrack/ArrayTrack development
is one key area of interchange.