Office of Research and Development
U.S. EPA’s ToxCast, Tox21, and COSMOS Projects: Cheminformatics Approaches to Creating Data Linkages and Synergies
Ann Richard, U.S. EPA, National Center for Computational Toxicology, Office of Research and Development
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
Society of Toxicology, Phoenix, AZ, Mar 24-27, 2014
Office of Research and Development National Center for Computational Toxicology
ToxCast & Tox21: Chemicals, Data and Release Timelines
Set                 Chemicals   Assays   Endpoints   Completion      Available
ToxCast Phase I     293         ~600     ~700        2011            Now
ToxCast Phase II    767         ~600     ~700        03/2013         Now
ToxCast E1K         800         ~50      ~120        03/2013         Now
Tox21               ~8300       ~80      ~150        Ongoing         Ongoing
ToxCast Phase III   ~900        ~100     ~100        Just starting   2014-2015
[Figure: chemicals-by-assays coverage diagram; labels: Chemicals (~9000), Assays (~600)]
Chemical categories covered: pesticides, antimicrobials, food additives, green alternatives, HPV, MPV, endocrine reference cmpds, tox reference cmpds, NTP in vivo, FDA GRAS, FDA PAFA, EDSP, water contaminants, exposure data, industrial, failed drugs, marketed drugs, fragrances, flame retardants, etc.
ToxCast PhII Data Release: http://www.epa.gov/ncct/toxcast/data.html
• ToxCast Assay Summary Activity Files
• ToxCast Assay Annotation Files
• ToxCast Chemical Library & Structure Files (DSSTox)
• ToxCast Concentration Response Data Files
• ToxRefDB Effect & Endpoint Data Files
ToxCast PhI & PhII (1060): # Compounds per Inventory
[Figure: bar charts of compound counts per inventory]
Inventories: PesticideInerts, Water, Consumer, Antimicrobials, Green Chemistry, HPV, MPV, TRI, IRIS, EDSP, GRAS, AIR
Bar values as listed: 243, 217, 210, 91, 85, 232, 83, 216, 240, 130, 26, 90
In vivo and related inventories: Total In vivo, FDA CFSAN, NTP In Vivo, Donated Pharmaceuticals, PesticideActives
Bar values as listed: 580, 94, 202, 135, 329
• Excellent coverage of multiple high-interest inventories
• Many chemicals appear on many lists
• Broad diversity of chemical-use categories
• Large overlap with data-rich in vivo inventories
Synergies: Tox21/ToxCast Chemical Overlaps with COSMOS
[Figure: Venn diagram of the Tox21, ToxCast, ToxRefDB and COSMOS inventories; labeled counts: 714, 936, 166, 1478]
Tox21: ~8300 (unique structures)
COSMOS: ~5500 (unique structures)
ToxCast: PhI (300), PhII (1060), E1k & PhIII (2300)
ToxRefDB: In vivo animal studies
• Significant CASRN overlap: increased shared data & knowledge resources for these chemicals
• What about non-overlapping chemicals?
• How do we utilize the full chemical-data landscape?
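A minimal sketch of how a CASRN-based overlap between two inventories could be computed. The inventory contents below are invented for illustration and do not reflect actual Tox21 or COSMOS membership; the helper name overlap_summary is hypothetical.

```python
# Toy inventories keyed by CASRN. Membership is invented for illustration only.
tox21 = {"80-05-7", "50-00-0", "71-43-2", "64-17-5"}
cosmos = {"80-05-7", "64-17-5", "57-55-6"}


def overlap_summary(inv_a, inv_b):
    """Split two CASRN sets into shared and inventory-specific parts."""
    return {
        "shared": sorted(inv_a & inv_b),   # chemicals with data in both inventories
        "only_a": sorted(inv_a - inv_b),   # non-overlapping chemicals in A
        "only_b": sorted(inv_b - inv_a),   # non-overlapping chemicals in B
    }


summary = overlap_summary(tox21, cosmos)
for key, casrns in summary.items():
    print(key, len(casrns), casrns)
```

Exact CASRN matching only finds chemicals registered under the same identifier; the non-overlapping sets are where structure-based comparison would be needed.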
Chemical Elements to Data Integration:
Chemical representation   Identifiers / descriptors                             Uses
Test Sample               Supplier, Lot/Batch, physical description             Experimental Endpoint Data
Generic Substance         Chemical Name, CASRN                                  Public toxicity datasets
Structure                 SMILES, InChI                                         Structure searching & modeling
Features, Properties      Chemotypes, fingerprints, phys-chem properties, ...   Chemical analogs, Read-across, SAR modeling
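Because CASRN is the key used to link substance-level records across datasets, a basic sanity check before joining is useful. The sketch below applies the standard CAS check-digit rule (the last digit equals the position-weighted sum of the preceding digits, counted from the right, modulo 10). The function name is_valid_casrn is hypothetical and is not part of any EPA tool.

```python
import re


def is_valid_casrn(casrn: str) -> bool:
    """Check CASRN format (2-7 digits, 2 digits, 1 check digit) and check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", casrn.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... starting from the rightmost digit before the check digit.
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


for candidate in ["80-05-7", "7732-18-5", "80-05-8", "not-a-casrn"]:
    print(candidate, is_valid_casrn(candidate))
```

A valid check digit only shows the number is well formed; it does not confirm that the CASRN is assigned to the intended substance.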
ToxCast & Tox21 Property Space
[Figure: 3-D scatter plot of chemical property space; axes VAR(1), VAR(2), VAR(3)]
Legend: Tox21 (7324 unique); ToxCast e1k (+800); ToxCast PhaseII (767); Donated Pharma (135); ToxCast PhaseI (293)
Plotted properties:
LOG P = Octanol/Water partition coefficient
TPSA = log (Total Polar Surface Area)
Complexity = log (complexity based on paths, branching, atoms)
Chemical properties computed using “ADRIANA.Code” software by Molecular Networks (P. Volarath)
Estimating Toxicity Mechanism Coverage: DEREK (LHASA) Predictions for ToxCast PhII (1060)
[Figure: bar chart of the most frequently fired DEREK alerts; axis ticks 0 to 90]
Halogenated benzene
Polyhalogenated aromatic
Alkylating agent
Phenol or precursor
Organophosphorus ester
Alkyl ester of phosphoric or phosphonic acid
Substituted pyrimidine or purine
Aromatic primary or secondary amine
1,2-Dihalogenated hydrocarbon
beta-O/S-Substituted carboxylic acid or…
Polyhalogenated benzene
Alkyl aldehyde or precursor
Alkylphenol
Hydrazine or precursor
Simple aniline or precursor
Di- to poly-halogenated alkane or cycloalkane
HERG Pharmacophore I
Organophosphorus di- or tri-ester
1,2-Ethyleneglycol or derivative
Aromatic nitro compound
• 328/450 unique DEREK alerts fired across entire dataset
• 128 alerts fired 5 or more times across dataset
• DEREK predicts 1 or more toxicity endpoints for 80% of chemicals
• DEREK predicts 3 or more endpoints for 40% of chemicals
Chemistry: What’s needed?
• Incorporate chemical information into usable tools for chemical prioritization & safety assessments
• Publicly available data & computational tools & resources for chemists, toxicologists & modelers to access & utilize chemical information
• Harvesting of existing chemical activity (in vitro, in vivo) data into databases & computational forms
• Integration of available data resources (HTS, in vivo)
• Cheminformatics foundation to enable structure modeling
• Ability to “look across” data (HTS, in vivo, chemical) to form hypotheses, guide analog selection, and improve prediction models
Data! Public availability, Transparency, Tools, Usability
Public Resources: EPA ToxCast On-line Resources
>300K structures; >16K structures
Data Integration: Chemicals, HTS assay results, In vivo data, Product categories, Analysis tools
iCSS Dashboard
>2K structures
Public Resources: Tox21 Chemical & Bioassay Data
DSSTox: TOX21S structures
Tox21 assays x ToxCast cmpds
PubChem: Tox21, 88 bioassays, 9762 compounds
Public Resources: COSMOS DB v1.0
http://www.cosmostox.eu/
• >12K toxicity studies across 27 endpoints for more than 1,600 compounds
• US FDA PAFA content donated by US FDA Office of Food Additive Safety (OFAS) and oRepeatToxDB compiled by COSMOS Consortium
• Endpoints include both repeat dose toxicity studies and genetic toxicity data
• Toxicity data searchable by endpoint, test system, route of exposure, sites, or other details
• >80K records, 40K unique structures
• Searchable by name, CAS, structure, structure-similarity
Public Resources: KNIME Chemistry Data Analytics
https://www.knime.org/
• Workflows can be freely published & shared
– reproducible & transparent
– promotes quality standards
• Scripting for “non-programmers”
• Used to improve quality of structures in ACToR and efficiency of DSSTox curation
• KNIME chemotyper implemented in multiple COSMOS projects & workflows
KNIME Workflow developed by Kamel Mansouri, ORISE PostDoc, NCCT
Structure processing
SMILES, InChI
Chemical properties
Fingerprinting
Structure similarity
Statistics
Visualization
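The structure-similarity step listed above is commonly computed on binary fingerprints with the Tanimoto (Jaccard) coefficient. The sketch below is plain Python, not a KNIME node and not the workflow shown on the slide; the function name and the toy fingerprints are assumptions for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two equal-length 0/1 fingerprints.

    Returns (bits set in both) / (bits set in either); 0.0 if neither has any bit set.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 0.0


# Toy 8-bit fingerprints (hypothetical, not real chemotype data).
query = [1, 1, 0, 1, 0, 0, 1, 0]
candidate = [1, 0, 0, 1, 0, 1, 1, 0]
print(round(tanimoto(query, candidate), 3))
```

Values near 1 indicate that two structures share most of their encoded features, which is one way candidate analogs can be ranked.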
Public Resources: Chemotyper & ToxPrint Chemotypes
Developed by Altamira & Molecular Networks, Funded by US FDA
• Chemotyper allows visualization of chemotypes in an imported structure inventory (e.g., ToxCast)
• Chemotyper “fingerprint” files generated for ToxCast & Tox21 inventories
• ToxPrint feature set designed to capture important structural frameworks, fragments and elements spanning inventories of toxicological & regulatory interest to EPA, FDA.
ToxCast ToxPrint Chemotype “Fingerprints”
DSSTox_GSID bond:C=O_carbonyl_generic chain:aromaticAlkane_Ph-C1_acyclic_generic bond:COH_alcohol_generic bond:NC=O_aminocarbonyl_generic bond:C(=O)N_carboxamide_generic ring:hetero_[6]_Z_generic bond:CX_halide_aromatic-X_generic bond:COH_alcohol_aliphatic_generic bond:CN_amine_aliphatic_generic bond:CN_amine_aromatic_generic bond:C(=O)O_carboxylicAcid_generic bond:CX_halide_alkyl-X_generic chain:alkeneCyclic_ethene_generic chain:alkeneLinear_mono-ene_ethylene_generic bond:S=O_sulfonyl_generic ring:hetero_[6]_N_pyridine_generic bond:CN_amine_ter-N_generic bond:CC(=O)C_ketone_generic ring:hetero_[5_6]_Z_generic bond:CN_amine_pri-NH2_generic bond:CX_halide_generic-X_dihalo_(1_2-) bond:CN_amine_sec-NH_generic ring:hetero_[6_6]_Z_generic bond:CC(=O)C_ketone_aliphatic_generic bond:CN_amine_alicyclic_generic bond:C=O_carbonyl_ab-unsaturated_generic bond:S~N_generic bond:S(=O)O_sulfonicAcid_generic bond:CC(=O)C_ketone_alkene_cyclic_2-en-1-one_generic ring:hetero_[6]_O_pyran_generic ring:hetero_[6]_N_diazine_(1_3-)_generic bond:CC(=O)C_ketone_alkene_generic bond:NC=O_urea_generic ring:hetero_[5]_N_pyrrole_generic bond:C#N_nitrile_generic ring:hetero_[6]_N_triazine_generic chain:aromaticAlkene_Ph-C2_acyclic_generic bond:CX_halide_alkenyl-X_generic group:aminoAcid_aminoAcid_generic bond:CX_halide_alkyl-X_ethyl_generic bond:P~S_generic bond:C=O_aldehyde_generic ring:fused_steroid_generic_[5_6_6_6] bond:N=N_azo_generic bond:metal_metalloid_Si_generic bond:quatN_generic bond:C=S_carbonyl_thio_generic bond:S(=O)O_sulfuricAcid_generic bond:C=N_carboxamidine_generic ring:hetero_[7]_generic_1-Z chain:alkyne_ethyne_generic ring:hetero_[3]_Z_generic
20197 1 0 1 1 1 1 1 1 1 0 0 0 1 0 1 1 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
47368 1 1 0 1 1 1 0 0 1 0 0 0 1 0 1 1 1 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
47271 1 1 0 1 1 1 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47305 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47346 1 0 0 1 1 1 0 0 1 1 0 1 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
47375 1 1 0 0 0 1 1 0 0 1 1 0 1 0 0 1 1 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21244 1 1 1 1 1 1 0 1 1 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22519 1 1 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
24107 1 1 0 0 0 1 0 0 1 0 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47254 1 1 0 1 1 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47289 1 0 0 1 1 1 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
47311 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47355 1 0 0 1 1 1 1 0 1 1 0 0 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
48507 1 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
48511 1 1 0 1 1 1 0 0 1 1 0 1 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20822 1 1 0 1 1 1 0 0 1 1 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
21097 1 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21233 1 1 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21777 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
22588 1 1 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
23322 1 0 1 0 0 0 0 1 1 1 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0
23412 1 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23645 1 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25234 1 1 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
34260 1 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47316 1 1 1 1 1 1 0 1 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
47325 1 1 0 1 1 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
47339 1 1 0 1 1 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47347 1 1 0 1 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47351 1 1 0 1 1 1 0 0 1 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
“toxprint_v2_vs_TOX21S_v4a_8599_03Dec2013.csv”
Excellent Coverage (# chemicals with chemotypes): Tox21: 8599 chemicals x 729 chemotypes
• all 8454 structures have ≥ 1 chemotype
• 95% have ≥ 5 chemotypes each
• 65% have ≥ 10 chemotypes each
Diversity (# chemotypes present):
ToxCast (1860)   500/729 (68%)
Tox21 (8599)     627/729 (86%)
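Coverage and diversity figures of this kind can be derived directly from a chemical-by-chemotype 0/1 matrix. The sketch below assumes the matrix has already been loaded into a dictionary keyed by chemical ID; the tiny matrix, the thresholds, and the function name coverage_stats are illustrative and are not taken from the fingerprint file.

```python
def coverage_stats(fingerprints, thresholds=(1, 5, 10)):
    """Summarize a {chemical_id: [0/1, ...]} chemotype matrix.

    Returns (fraction of chemicals with >= t chemotypes for each t,
             number of chemotype columns present in at least one chemical).
    """
    rows = list(fingerprints.values())
    per_chemical = [sum(bits) for bits in rows]
    fractions = {
        t: sum(1 for c in per_chemical if c >= t) / len(rows) for t in thresholds
    }
    n_columns = len(rows[0])
    present = sum(1 for j in range(n_columns) if any(bits[j] for bits in rows))
    return fractions, present


# Toy matrix: 3 chemicals x 4 chemotypes (hypothetical values).
toy = {"A": [1, 0, 1, 0], "B": [0, 0, 1, 0], "C": [1, 1, 1, 0]}
fractions, present = coverage_stats(toy, thresholds=(1, 2, 3))
print(fractions, f"{present}/4 chemotypes present")
```

The first result corresponds to the "have ≥ N chemotypes" lines, and the second to the "chemotypes present" ratio.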
EPA ToxCast iCSS Dashboard: http://actor.epa.gov/dashboard/
• Filter 1892 ToxCast chemicals by ToxPrint_Chemotype, e.g. filter by ToxPrint_Chemotype: bond.C..O.N_carboxamide_.NH2
• Export all ToxCast Assay data
• ToxPrint Chemotype, Chemical Use Category, phys-chem properties, assay hits…
e.g. ToxCast (1860) “Bisphenol A” chemotype search
• Refine or expand chemotype subgroup of interest
• Are there HTS assay hits enriched within this chemotype subgroup?
• Can export list of chemotypes for selected chemicals
• Can export structures containing chemotypes
• Use in iCSS Dashboard to explore ToxCast HTS results
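One common way to ask whether assay hits are enriched within a chemotype subgroup is a one-sided hypergeometric (Fisher exact) test. The slides do not state which statistic was used, so the sketch below is an assumed approach; the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_subgroup, n_hits_total, n_hits_subgroup):
    """P(at least n_hits_subgroup hits fall in the subgroup by chance).

    n_total: chemicals tested in the assay
    n_subgroup: chemicals containing the chemotype
    n_hits_total: assay hits among all chemicals
    n_hits_subgroup: assay hits among chemotype-containing chemicals
    """
    upper = min(n_subgroup, n_hits_total)
    favorable = sum(
        comb(n_hits_total, k) * comb(n_total - n_hits_total, n_subgroup - k)
        for k in range(n_hits_subgroup, upper + 1)
    )
    return favorable / comb(n_total, n_subgroup)


# Hypothetical counts: 1860 chemicals, 40 with the chemotype,
# 100 assay hits overall, 12 of them in the chemotype subgroup.
p = enrichment_p_value(1860, 40, 100, 12)
print(f"one-sided p-value: {p:.3g}")
```

When many chemotypes and assays are screened together, the resulting p-values would normally need a multiple-testing correction.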
Synergies: Tox21/ToxCast Chemical Overlaps with COSMOS
[Figure: Venn diagram of the Tox21, ToxCast, ToxRefDB and COSMOS inventories; labeled counts: 714, 936, 166, 1478]
Tox21: ~8300 (unique structures)
COSMOS: ~5500 (unique structures)
ToxCast: PhI (300), PhII (1060), E1k & PhIII (2300)
ToxRefDB: In vivo animal studies
• Significant overlap: increased shared data & knowledge resources per overlapping chemical
• What about non-overlapping chemicals?
• How do we utilize the full chemical-data landscape?
ToxPrint Profiling: Synergies across inventories
How representative is the Tox21-COSMOS overlap (1478) of the remainder of COSMOS structures (5540-1478)?
[Figure: bar chart per ToxPrint chemotype; axes: Percentage Proportional (0% to 100%) and count (0 to 2000); series: COSMOS-only (5540-1478), Tox21 Overlap (1478)]
Chemotypes shown:
bond:C(=O)O_carboxylicEster_aromatic
bond:C=O_aldehyde_generic
bond:CC(=O)C_ketone_generic
bond:CN_amine_pri-NH2_aromatic
bond:CN_amine_pri-NH2_generic
bond:CN_amine_sec-NH_generic
bond:CN_amine_ter-N_generic
bond:COC_ether_aliphatic
bond:COC_ether_aliphatic__aromatic
bond:COC_ether_alkenyl
bond:COC_ether_aromatic
bond:COH_alcohol_aromatic_phenol
bond:COH_alcohol_diol_(1_1-),(1_2-),(1_3-)
bond:COH_alcohol_generic
bond:CS_sulfide
bond:CX_halide_alkyl-X_generic
bond:CX_halide_aromatic-X_generic
bond:CX_halide_generic-X_dihalo_(1_2-)
bond:N=N_azo_generic
bond:NC=O_urea_generic
bond:quatN_alkyl_acyclic
bond:S(=O)O_sulfonate
bond:S(=O)O_sulfonicEster_acyclic_(S-C(ring))
bond:metal_metalloid_Si_generic
bond:metal_metalloid_Si_organo
chain:alkaneLinear_octyl_C8
chain:alkaneLinear_decyl_C10
chain:alkaneLinear_dodedyl_C12
chain:alkaneLinear_tetradecyl_C14
chain:alkaneLinear_hexadecyl_C16
chain:alkaneLinear_stearyl_C18
ring:hetero_[5]_Z_1_2-Z, 2_3-Z,2_4_1_3_4-Z
ring:hetero_[5]_Z_1_3-Z
ring:hetero_[6]_N_pyridine_generic
ring:hetero_[6]_O_pyran_generic
Reinforcement of major COSMOS ToxPrint chemotypes.
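The comparison between an overlap set and the remainder of an inventory can be sketched as a per-chemotype share calculation. This assumes the percentage axis represents, for each chemotype, the share of chemotype-containing structures that fall in the overlap set, which is an interpretation and not stated on the slide. The counts and the function name overlap_share are hypothetical.

```python
def overlap_share(overlap_counts, remainder_counts):
    """For each chemotype, fraction of its structures that lie in the overlap set.

    Both arguments map chemotype name -> number of structures containing it.
    Chemotypes absent from both sets get None.
    """
    shares = {}
    for name in sorted(set(overlap_counts) | set(remainder_counts)):
        in_overlap = overlap_counts.get(name, 0)
        in_remainder = remainder_counts.get(name, 0)
        total = in_overlap + in_remainder
        shares[name] = in_overlap / total if total else None
    return shares


# Hypothetical counts, not values read from the chart.
overlap = {"bond:N=N_azo_generic": 30, "bond:CS_sulfide": 10}
remainder = {
    "bond:N=N_azo_generic": 90,
    "bond:CS_sulfide": 10,
    "bond:COC_ether_alkenyl": 5,
}
shares = overlap_share(overlap, remainder)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Chemotypes with a share near zero mark regions of the larger inventory that the overlap set does not represent.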
ToxPrint Profiling: Synergies across inventories
In what areas of chemotype space can ToxCast & Tox21 chemical-assay data inform COSMOS?
[Figure: bar chart; x-axis: ToxPrint Chemotypes *; y-axis: Chemotype count (0 to 2000); series: ToxCast not in COSMOS, Tox21 not in COSMOS, COSMOS_5540]
* 60 ToxPrint chemotypes mapped to 20 categories
ToxPrint Profiling: Synergies across inventories
Chemotype distribution across chemicals with ToxRefDB Developmental study (all species) compared to COSMOS & Tox21
[Figure: bar chart per ToxPrint chemotype; axis: Log (Chemical count), 0.0 to 3.5]
Series:
COSMOS (55
Tox21 (w/o ToxRef)
ToxRef DEV (all species)
Chemotypes shown:
bond:C(=O)N_carbamate
bond:CC(=O)C_ketone_generic
bond:CN_amine_sec-NH_generic
bond:COC_ether_aliphatic__aromatic
bond:COH_alcohol_aromatic_phenol
bond:COH_alcohol_generic
bond:CX_halide_alkyl-X_generic
bond:N=N_azo_generic
bond:quatN_alkyl_acyclic
bond:metal_group_III_other_Sn_organo
chain:alkaneLinear_octyl_C8
chain:alkaneLinear_tetradecyl_C14
chain:alkeneLinear_diene_linoleic_(C18)
ring:hetero_[5]_Z_1_2_3-Z
ring:hetero_[5]_Z_1_3-Z
ToxPrint Profiling: e.g. Modeling in vivo activity subsets
ToxCast Phase I (291 total) Rat Carcinogenicity Study using ToxRefDB & Meteor:Derek workflow, Volarath et al.
[Figure: bar chart comparing subsets S1 and S2 across ToxPrint chemotypes*; axis ticks 0 to 14]
S1: Metabolically Activated (134 cmpds)
S2: Direct acting & inactives (157 cmpds)
Propose use of S1 feature set to predict chemical space in PhII & Tox21 more likely to require metabolic activation for Rat Carcinogenicity
*Altamira beta version of ToxPrint Chemotypes
ToxPrint Profiling: e.g. Modeling in vitro to in vivo endpoint
ToxCast Phase I (291 total) Rat Carcinogenicity Study using ToxRefDB & Meteor:Derek workflow, Volarath et al.
Subset of ToxCast assays that differentiate metabolically activated RatCarc chemicals (S1) from the remainder (S2)
ToxCast Phase I Assay           Assay Hits in S1   Assay Hits in S2   Fraction of total Hits in S1
BSK_SM3C_MCP1_up                16                 7                  0.7
BSK_hDFCGF_IL8_up               14                 4                  0.78
BSK_BE3C_MIG_up                 6                  1                  0.86
ATG_PPARa_TRANS                 5                  2                  0.71
CLM_Hepat_LysosomalMass_1hr     5                  2                  0.71
CLM_Hepat_LysosomalMass_48hr    5                  2                  0.71
CLM_NuclearSize_24hr            5                  2                  0.71
NVS_NR_hPR                      5                  1                  0.83
[Figure: bar chart of assay hits, axis 0 to 16; series: # Chemicals in 159-Dataset, # Chemicals in 134-Dataset]
HTS activity profile sensitive to chemical features!
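The "Fraction of total Hits in S1" column follows from the two hit counts as S1 / (S1 + S2). A short sketch reproducing the tabulated values from the counts on the slide (variable and function names are illustrative):

```python
# (assay, hits in S1, hits in S2) as listed in the table.
assay_hits = [
    ("BSK_SM3C_MCP1_up", 16, 7),
    ("BSK_hDFCGF_IL8_up", 14, 4),
    ("BSK_BE3C_MIG_up", 6, 1),
    ("ATG_PPARa_TRANS", 5, 2),
    ("CLM_Hepat_LysosomalMass_1hr", 5, 2),
    ("CLM_Hepat_LysosomalMass_48hr", 5, 2),
    ("CLM_NuclearSize_24hr", 5, 2),
    ("NVS_NR_hPR", 5, 1),
]


def s1_fraction(hits_s1, hits_s2):
    """Share of an assay's hits that fall in the metabolically activated subset S1."""
    return hits_s1 / (hits_s1 + hits_s2)


fractions = {name: round(s1_fraction(s1, s2), 2) for name, s1, s2 in assay_hits}
for name, value in fractions.items():
    print(f"{name}: {value}")
```

Note that S1 (134 cmpds) is smaller than S2, so a fraction above 0.5 already indicates over-representation of S1 among the hits.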
ToxPrint: e.g. Data mining & QSAR models
1. Data Mining: Tox21 cleft palate actives (ToxRef, public, CERES) significantly enriched within triazole/imidazole chemotype groups
2. QSAR model: Further differentiation of cleft palate actives by HTS assay results (TGFb) & partial pi- and sigma- charges yields predictive model within chemotype subgroups
C Yang et al., Altamira
QSAR using biologically informed chemical features
[Diagram labels: Toxicity; Biological features; HTS Assays; In vitro; In vivo; ToxPrint “Chemotypes”]
HTS results are used to inform feature selection, linking chemical features to putative toxicity mechanism
Building a public chemotype “knowledge-base”
Chemicals: Cosmos, CERES, ToxRef, ToxCast, Tox21
Use categories, Fate & Transport, ADME, Reactivity, Biotransformation, Phys-chem properties, Biological activities
Chemistry: What’s needed?
• Incorporate chemical information into usable tools for chemical prioritization & safety assessments
• Publicly available data & computational tools & resources for chemists, toxicologists & modelers to access & utilize chemical information
• Harvesting of existing chemical activity (in vitro, in vivo) data into databases & computational forms
• Integration of available data resources (HTS, in vivo)
• Cheminformatics foundation to enable structure modeling
• Ability to “look across” data (HTS, in vivo, chemical) to form hypotheses, guide analog selection, and improve prediction models
Data! Public availability, Transparency, Tools, Usability
Resources: ToxCast, ToxRefDB, iCSS Dashboard, DSSTox, ACToR, KNIME, ToxPrint & Chemotyper, FDA CERES
Acknowledgements:
EPA NCCT ToxCast Team:
Richard Judson (ACToR)
Keith Houck (HTS)
Matt Martin (ToxRefDB, Dashboard)
Lisa Truong
Tox21 leadership & consortium
External Collaborators:
Altamira: Chihae Yang, Jim Rathman
Molecular Networks: Aleksey Tarkhov, Christof Schwab
COSMOS: Mark Cronin
U.S. FDA: Kirk Arvidson, Patra Volarath (formerly EPA Post Doc)
This work was reviewed by EPA and approved for publication but does not necessarily reflect official Agency policy.
Questions?