TITLE OF PRESENTATION |
Presented By
Date
Combining semantic triple stores across knowledge domains
Matthew Clark
15 March 2016
Matthew Clark ([email protected]), Frederik van den Broek, Anton Yuryev, Maria Shkrob, Sherri Matis-Mitchell, Timothy Hoctor. R & D Solutions, Elsevier Inc., 251st National Meeting of the American Chemical Society, San Diego CA March 13-17, 2016, American Chemical Society: Washington DC, 1996; CINF 118.
What Constitutes Big Data?
• Very large data sets
• Order of ~10^7 documents published (patents, journals, books)
• Each document has ~200 sentences, for a total of ~10^9 statements
• Statements are about molecules, properties, reactions, indications etc.
• Combinatorial connections between large data sets
• “Connecting the dots” among these facts results in a very large number of possible connections
• $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ combinations of k elements chosen from a pool of n
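To give a feel for the combinatorial growth described on this slide, here is a minimal sketch that evaluates the binomial coefficient for a pool of about 10^9 statements. The function name `possible_connections` is illustrative only, and the choice of k = 2 and k = 3 is an assumption made for the example.

```python
import math

def possible_connections(n: int, k: int) -> int:
    """Number of ways to choose k items from a pool of n: n! / (k! (n-k)!)."""
    return math.comb(n, k)

# Order of magnitude quoted on the slide for the number of statements.
n_statements = 10**9

pairs = possible_connections(n_statements, 2)    # two-statement connections
triples = possible_connections(n_statements, 3)  # three-statement connections

print(f"pairs:   {pairs:.3e}")
print(f"triples: {triples:.3e}")
```

Even pairwise connections number about 5 x 10^17, which is why exhaustive enumeration is not practical and targeted queries are needed.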
Pathways
• Relationships mined from 12,000 titles, 25M documents
• <subject> <verb> <object> relationships
• Each subject, object, verb has a taxonomy
• Example: “protein” causes/induces disease
Compounds
• 16,000 journal titles plus patent offices
• Compounds, Reactions, Properties
• Over 6 million compounds with bioactivity
Bioassays
• Biological relationships mined from journals/patents (over 16 million)
• <compound> <verb> <object> <quantity>
• Example: Sunitinib binds-to Bcr-abl in <assay type> at 1nM
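The <subject> <verb> <object> and <compound> <verb> <object> <quantity> patterns above can be held in one simple record type. The sketch below is an illustration under assumptions, not the schema of any Elsevier product: the `Triple` class, the `match` helper and the "protein X" entry are hypothetical, and only the Sunitinib statement comes from the slide.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    verb: str
    obj: str
    quantity: Optional[str] = None  # only bioassay statements carry a quantity

# Illustrative store; only the Sunitinib entry is taken from the slide.
store = [
    Triple("Sunitinib", "binds-to", "Bcr-abl", "1nM"),
    Triple("protein X", "induces", "disease Y"),  # hypothetical placeholder
]

def match(store, subject=None, verb=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (verb is None or t.verb == verb)
        and (obj is None or t.obj == obj)
    ]

hits = match(store, verb="binds-to")
print(hits)
```

Keeping pathway and bioassay statements in the same shape is what makes it possible to join them on a shared, normalized entity name.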
Biological Pathways extracted via semantic text mining
A upregulates B
B upregulates C
C increases Disease
Normalizing vocabularies required: proteins, diseases, drugs, chemicals
A → B → C → disease
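Chaining the extracted statements into a pathway amounts to a path search over the relationships. The following is a minimal sketch using the A, B, C example from the slide; the `find_paths` function is a hypothetical helper, and a depth-first search is just one reasonable way to do this.

```python
from collections import defaultdict

# The three statements from the slide, as (subject, verb, object).
relations = [
    ("A", "upregulates", "B"),
    ("B", "upregulates", "C"),
    ("C", "increases", "Disease"),
]

def find_paths(relations, start, goal):
    """Depth-first search for every chain of statements linking start to goal."""
    graph = defaultdict(list)
    for subject, verb, obj in relations:
        graph[subject].append((verb, obj))

    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        for verb, nxt in graph[node]:
            if nxt not in path[::2]:  # entities sit at even positions; skip cycles
                stack.append((nxt, path + [verb, nxt]))
    return paths

paths = find_paths(relations, "A", "Disease")
print(" ".join(paths[0]))
```

The search only works if "B" in one statement and "B" in another are the same normalized entity, which is why the vocabularies have to be reconciled first.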
Bioactivities through text analysis
IC50 6.3nM, kinase binding assay 10mM concentration
Chemical Structures And Properties
InChI, Name
NCBI, UniProt
EMTREE ReaxysTree, Structures
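Activities such as the 6.3 nM IC50 above are compared later in the deck on a logarithmic scale ("log units", pX). The slides do not define pX; the sketch below assumes the common convention pX = -log10(concentration in mol/L), as used for pIC50. The function names are illustrative.

```python
import math

def to_px(value_nm: float) -> float:
    """Convert a concentration in nM to pX, assuming pX = -log10(mol/L).

    1 nM = 1e-9 mol/L, so -log10(value_nm * 1e-9) = 9 - log10(value_nm).
    """
    return 9.0 - math.log10(value_nm)

def passes_threshold(value_nm: float, min_px: float = 7.0) -> bool:
    """True when the activity is at least min_px log units."""
    return to_px(value_nm) >= min_px

print(round(to_px(6.3), 2))      # the IC50 quoted on the slide
print(passes_threshold(100.0))   # 100 nM
print(passes_threshold(1000.0))  # 1 µM
```

Under this assumption, a cutoff of 7 log units corresponds to activities of 100 nM or better, and a cutoff of 6 log units to 1 µM or better.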
Example: Process for Finding New Indications for a Drug
Find all targets for which the compound has high affinity
Collate the diseases by targets and activity of the compound
Using the unique set of proteins from the preceding steps, search for all diseases reported to be related to them
Step 1 Step 2 Step 3
Find all compound-protein/gene relationships with > 1 reference using text analysis
Targets inhibited
Targets Related to Disease
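The process on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the PipelinePilot protocol itself: the target and disease names are hypothetical toy data, the function `suggest_indications` is invented for the example, and the 7.0 cutoff is a parameter.

```python
from collections import defaultdict

# Hypothetical toy inputs; real data would come from the bioassay and pathway stores.
compound_activity = {"T1": 8.5, "T2": 7.2, "T3": 5.1}   # target -> pX for one drug
target_diseases = {"T1": {"D1", "D2"}, "T2": {"D1"}, "T3": {"D3"}}

def suggest_indications(compound_activity, target_diseases, min_px=7.0):
    # Find the targets for which the compound has high affinity.
    active = {t for t, px in compound_activity.items() if px >= min_px}

    # Look up every disease reported to be related to those targets.
    by_disease = defaultdict(set)
    for target in active:
        for disease in target_diseases.get(target, ()):
            by_disease[disease].add(target)

    # Collate diseases by the targets hit; most targets first, then by name.
    return sorted(
        ((disease, sorted(targets)) for disease, targets in by_disease.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )

ranked = suggest_indications(compound_activity, target_diseases)
for disease, targets in ranked:
    print(disease, ";".join(targets), len(targets))
```

Ranking by the number of inhibited targets per disease mirrors the "Target Count" column used for the ruxolitinib example.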
Processing Protocol in Biovia PipelinePilot
Input: Drug Name
Output: Ranked indications the drug may treat
Example drug: Ruxolitinib
• Janus kinase inhibitor selective for JAK1 and JAK2
• Approved for
• Myelofibrosis – cancer of bone marrow
• Polycythemia vera – too many red blood cells are made in bone marrow
* Jakafi is a registered trademark of Incyte Corporation. Incyte Corporation did not sponsor and was not involved in this data analysis.
Elsevier indexes each reported measurement; must compute the ‘best’ value for each compound
• For each target in the pathway, search for active compounds and compute activity for each
• In many cases there are several reported measurements for the same target/compound
Target in Disease Pathway: ABCC8
Reported Activities | Mean by Compound
8.0, 7.9 | 7.9
6.6, 6.0, 7.0, 6.0 | 6.4
5.0 | 5.0
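Collapsing several reported measurements into one value per compound is a group-and-average step. The sketch below uses the ABCC8 activities from the slide; the compound labels are hypothetical, and the assignment of values to compounds is inferred from the reported means rather than stated in the source.

```python
from collections import defaultdict
from statistics import mean

# (compound, reported activity) for target ABCC8.
# Labels are hypothetical; the grouping is inferred from the slide's means.
reports = [
    ("cmpd-1", 8.0), ("cmpd-1", 7.9),
    ("cmpd-2", 6.6), ("cmpd-2", 6.0), ("cmpd-2", 7.0), ("cmpd-2", 6.0),
    ("cmpd-3", 5.0),
]

def mean_by_compound(reports):
    """Group activities by compound and return the arithmetic mean of each group."""
    grouped = defaultdict(list)
    for compound, activity in reports:
        grouped[compound].append(activity)
    return {compound: mean(values) for compound, values in grouped.items()}

means = mean_by_compound(reports)
for compound, value in means.items():
    print(compound, value)
```

Because the activities are on a logarithmic scale, averaging them is equivalent to taking a geometric mean of the underlying concentrations.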
Ruxolitinib Reported Activities
Specific Bioactivities >= 7 log units
Target Name | Number of Reports | pX
JAK2 | 34 | 8.5
JAK1 | 24 | 8.5
JAK3 | 8 | 8.0
TYK2 | 16 | 7.7
JAK2 (V617F) | 4 | 7.6
LTK | 1 | 7.5
MAP3K2 | 1 | 7.4
ROCK2 | 1 | 7.3
ROCK1 | 1 | 7.2
CaMK2 | 2 | 7.2
DCAMKL1 | 1 | 7.2
DAPK1 | 1 | 7.1
LRRK2 | 1 | 7.1
ACK1 | 1 | 7.1
DAPK3 | 1 | 7.1
LRRK2 (G2019S) | 1 | 7.1
DAPK2 | 1 | 7.0
Diseases Related to Ruxolitinib Active Targets
MedDRA Level | Disease | Targets in Disease Pathway Inhibited by Ruxolitinib | Target Count
soc | Neoplasms | JAK2;JAK1;TYK2;JAK3;LTK;ROCK2;ROCK1;DAPK1;LRRK2 | 9
hlt | Inflammation | JAK2;JAK1;TYK2;JAK3;ROCK1;DAPK1;LRRK2;DAPK3 | 8
pt | Cancer | JAK2;JAK1;JAK3;MAP3K2;ROCK1;DAPK1;LRRK2 | 7
pt | Cell Transformation, Neoplastic | JAK2;JAK1;ROCK1;DAPK1 | 4
hlt | Colitis | TYK2;JAK3;LRRK2 | 3
pt | Hypertension | JAK2;ROCK1;DAPK3 | 3
pt | Ischemia | JAK2;ROCK1;DAPK1 | 3
pt | Insulin Resistance | JAK2;ROCK2;ROCK1 | 3
hlt | Diabetes Mellitus | JAK2;TYK2;ROCK1 | 3
pt | Obesity | JAK2;ROCK2;ROCK1 | 3
Analysis: Suggested Indications Are Consistent with Current Clinical Trials
Disease | # Targets Inhibited | Example Ruxolitinib Trials from ClinicalTrials.gov
Neoplasms | 9 | Neoplasms, Hematologic; Myeloproliferative Neoplasms
Inflammation | 8 |
Neoplasm Metastasis | 5 | Metastatic Pancreatic Adenocarcinoma; Metastatic Cancer
Cell Transformation, Neoplastic | 4 |
Colitis | 3 |
Hypertension | 3 |
Ischemia | 3 |
Insulin Resistance | 3 |
Diabetes Mellitus | 3 |
Obesity | 3 |
Inflammatory Bowel Diseases | 3 |
Autoimmune Diseases | 3 | Ruxolitinib Prior to Transplant in Patients With Myelofibrosis
Atherosclerosis | 3 |
Graft vs Host Disease | 3 | Ruxolitinib in Combination With Autotransplant
Prostate Cancer | 3 | Metastatic Prostate Cancer
Neoplasm Invasiveness | 3 | Metastatic Pancreatic Adenocarcinoma; Metastatic Cancer
Cardiac Hypertrophy | 3 |
Hyperinsulinism | 2 |
• There is a cluster of insulin/diabetes related indications – possible new area?
Examples of Target-Disease Relationships
Disease Name | Relationship | Selected Sentences | Number of Refs
inflammation | IL17A --+> Inflammation | Collectively, the data presented here indicate that integrin αvβ8 on DCs facilitates the development of Th17 cells, and consequently contributes to IL-17-mediated CNS inflammation, through activation of TGF-β. … IL-17 mRNA in sputum of asthmatic patients: linking T cell driven inflammation and granulocytic influx? | 955
autoimmune diseases | IL17A ---> Autoimmune Diseases | It has been well recognized that IL-23/Th17/IL-17 axis is critically involved in driving chronic inflammatory autoimmune diseases. … The production of IL-17 by T helper17 cells was recently shown to be essential for development of CIA or other autoimmune diseases. | 271
arthritis | IL17A --+> Arthritis | In a murine model, interleukin-17 plays a critical role in the pathogenesis of arthritis. | 196
This Analysis Shows Connections of Ruxolitinib to Alopecia
A cancer drug that grows hair! Trials are under way.
“Alopecia areata is driven by cytotoxic T lymphocytes and is reversed by JAK inhibition”, Nature Medicine 20, 1043–1049 (2014), doi:10.1038/nm.3645
Global transcriptional profiling of mouse and human AA skin revealed gene expression signatures indicative of cytotoxic T cell infiltration, an interferon-γ (IFNG) response and upregulation of several γ-chain (γc) cytokines known to promote the activation and survival of IFN-γ–producing CD8+NKG2D+ effector T cells. Therapeutically, antibody-mediated blockade of IFN-γ, interleukin-2 (IL-2) or interleukin-15 receptor β (IL-15Rβ) prevented disease development, reducing the accumulation of CD8+NKG2D+ T cells in the skin and the dermal IFN response in a mouse model of AA.
Example: Treatments for Congenital Hyperinsulinism
• A rare genetic disease
• Permanently excessive level of insulin in the blood
• Develops within the first few days of life
• Symptoms include floppiness, shakiness, poor feeding, seizures, fits and convulsions.
• If not caught quickly, it can lead to brain injury or even death.
• In the most severe cases the only viable treatment is the removal of the pancreas, consigning the patient to a lifetime of diabetes.
is a UK charity that is building the rare disease community to raise awareness, drive research and develop treatments. is partnering with Findacure scientists to help identify and evaluate treatments for this devastating disease.
From pathways to treatments: Biovia PipelinePilot implementation combines data sources
Automated analysis combines bioassay data with pathway data
Step 1: Find all targets that could be used to affect the disease state
Step 2: Query for each target to find the activities for each compound that are >6 log units
Step 3: Collate data by compound to summarize the targets/activities related to disease that the compound hits
• Compute geometric mean of activities for ranking
• Rank by number of targets and geometric mean of activities against targets
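The collate-and-rank step can be sketched as follows. This is a minimal illustration, not the PipelinePilot implementation: the compound and target names are hypothetical, `rank_compounds` is an invented helper, and keeping the highest value when a compound has several activities against one target is an assumption made for the example.

```python
from collections import defaultdict
from statistics import geometric_mean

# Hypothetical (compound, target, activity in log units) records.
activities = [
    ("cmpd-A", "T1", 8.0), ("cmpd-A", "T2", 6.5),
    ("cmpd-B", "T1", 7.0),
    ("cmpd-C", "T1", 5.5),   # below the cutoff, so it is dropped
]
disease_targets = {"T1", "T2"}  # targets related to the disease

def rank_compounds(activities, disease_targets, min_px=6.0):
    """Collate activities by compound, then rank by target count and geometric mean."""
    hits = defaultdict(dict)
    for compound, target, px in activities:
        if target in disease_targets and px > min_px:
            # Assumption: keep the highest activity per compound/target pair.
            hits[compound][target] = max(px, hits[compound].get(target, px))

    rows = [
        (compound, len(targets), geometric_mean(list(targets.values())))
        for compound, targets in hits.items()
    ]
    # More targets first; ties broken by the higher geometric mean.
    return sorted(rows, key=lambda row: (-row[1], -row[2]))

ranked = rank_compounds(activities, disease_targets)
for compound, n_targets, gmean in ranked:
    print(compound, n_targets, round(gmean, 2))
```

The two sort keys correspond to the two ranking criteria named on the slide: number of targets hit, then geometric mean of the activities against them.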
From pathways to treatments: Automated analysis combines bioassay data with pathway data
Step 1: Find all targets that could be used to affect the disease state
• 88 targets related to hyperinsulinism with ≥3 literature references
• Full PathwayStudio relationship information
• PathwayStudio also has all compounds suggested as treatments
Building and refining the disease model
• Summary of the literature findings: CHI mutations in the context of insulin secretion
• Generate hypotheses using:
• 6.2M literature-extracted findings
• Functional annotations (e.g. Gene Ontology)
• >1800 pre-built pathways modeling disease and normal states
From pathways to treatments: Automated analysis combines bioassay data with pathway data
Step 1: Find all targets that could be used to affect the disease state
Step 2: Query for each target to find compounds that have high affinity for them (>6 log units)
Targets based on text mining
Approved compounds
From pathways to treatments: Automated analysis combines bioassay data with pathway data
Mean of activities among these targets
Targets and activities for each compound
Drug-likeness metrics for sorting/classification
• All compounds that were observed to bind to targets in pathway
• Sorted by number of active targets. Too many targets may suggest lack of specificity.
Step 1: Find all targets that could be used to affect the disease state
Step 2: Query for each target to find compounds that have high affinity for them (>6 log units)
Step 3: Collate data by compound to summarize the targets/activities related to disease that the compound hits
• Compute geometric mean of activities for ranking
• Rank by number of targets and geometric mean of activities against targets
Next Step – Analyze Molecules to Identify Common Active Scaffolds for Novel Designs
• Starts with the set of active compounds and attempts to find common active scaffolds among them
• This is one of 38 scaffold systems identified as potentially active to treat hyperinsulinism
• Analysis method used: "The Scaffold Tree, Visualization of the Scaffold Universe by Hierarchical Scaffold Classification", Schuffenhauer, A., Ertl, P., Roggo, S., Wetzel, S., Koch, M. A., Waldmann, H., J. Chem. Inf. Model. 2007, 47, 47-58.
More levels of “simplification” of common scaffolds from active compounds
Level 1 Level 2 Level 3
Note: many of these can be recognized as kinase-inhibitor scaffolds
Who is collaborating? The collaboration analysis shows clinical centers specializing in CHI
• Filtered for institutions with > 4 publications that collaborated with another institution
• Size of circle proportional to total number of publications
• Line width proportional to the number of co-authored publications
• Lines labeled with DOIs
Who are the researchers in congenital hyperinsulinism?
• Filtered for authors with > 3 publications who collaborated with another person
• Size of circle proportional to total number of publications
• Line width proportional to the number of co-authored publications
• Lines labeled with DOIs
• Numbers for authors are Scopus IDs
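The collaboration maps described above can be derived from publication records with a co-occurrence count. The sketch below is an assumption-laden illustration: the DOIs and author names are hypothetical, `collaboration_graph` is an invented helper, and the same logic would apply to institutions instead of authors.

```python
from collections import Counter
from itertools import combinations

# Hypothetical publication records: DOI -> list of authors (or institutions).
pubs = {
    "10.0000/a": ["Author1", "Author2"],
    "10.0000/b": ["Author1", "Author2", "Author3"],
    "10.0000/c": ["Author1"],
    "10.0000/d": ["Author1", "Author3"],
}

def collaboration_graph(pubs, min_pubs):
    """Build nodes (publication counts) and edges (shared DOIs) for the map.

    Keeps only entities with more than min_pubs publications that share
    at least one publication with another kept entity.
    """
    pub_count = Counter(a for authors in pubs.values() for a in set(authors))
    keep = {a for a, n in pub_count.items() if n > min_pubs}

    edges = {}
    for doi, authors in pubs.items():
        for a, b in combinations(sorted(set(authors) & keep), 2):
            edges.setdefault((a, b), []).append(doi)  # edge label: list of DOIs

    # Circle size: total publications; drop entities with no collaboration edge.
    nodes = {a: pub_count[a] for a in keep if any(a in pair for pair in edges)}
    return nodes, edges

nodes, edges = collaboration_graph(pubs, min_pubs=1)
for pair, dois in sorted(edges.items()):
    print(pair, "width", len(dois), dois)
```

Here the node value drives the circle size, the length of each DOI list drives the line width, and the DOIs themselves serve as the line labels.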
Summary
• Results in testable ideas
• Many compounds are already approved drugs, can be tested in in-vivo experiments
• Concepts can be extended to find novel compounds
• Use modeling tools to extract common frameworks
• SAR to optimize activity for new indication
• Compare with compounds suggested as treatments as found by text mining
• Shows power of combining pathway data with experimentally verified binding data
• Not just theoretical pathways, but testable ideas.