Matthias Negri, PhD
Scientific Information Center
Research Networking, Boehringer Ingelheim Pharma GmbH & Co KG
ChemAxon UGM, Budapest, 24 May 2016
Leveraging the value of document content – PLEXUS for agile dissemination of results
Content
1. Data in Documents – beyond chemistry
2. Tools/technologies
3. Current (past) experiences - Patent Curation Workflow
4. What else..
5. Deployment of results – PLEXUS
6. Examples
Negri M, ChemAxon UGM 2016
The manifold ways of Chemistry: Names …
InChI=1S/C22H30N6O4S/c1-5-7-17-19-20(27(4)25-17)22(29)24-21(23-19)16-14-15(8-9-18(16)32-6-2)33(30,31)28-12-10-26(3)11-13-28/h8-9,14H,5-7,10-13H2,1-4H3,(H,23,24,29)
InChIKey=BNRNXUUZRGQAQC-UHFFFAOYSA-N
CCCC1=NN(C)C2=C1N=C(NC2=O)C1=CC(=CC=C1OCC)S(=O)(=O)N1CCN(C)CC1
Label, INN/UNN, TradeName, DrugCode, DrugName, CompanyCode, DBcode1, DBcode2, DBcode3, …, CDB code, Develop.Code, ProductCode, …
IDMP code
バイアグラ
偉哥
Viagra
Sildenafil
5-{2-ethoxy-5-[(4-methylpiperazin-1-yl)sulfonyl]phenyl}-1-methyl-3-propyl-1H,6H,7H-pyrazolo[4,3-d]pyrimidin-7-one
Compound 3
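The slide's point, that one compound hides behind many labels, can be illustrated with a small lookup table. This is a minimal sketch only: the tables and the `resolve` helper are hypothetical, filled by hand with identifiers shown on the slide and keyed on the InChIKey given there. A real workflow would derive the key with a cheminformatics toolkit instead of hard-coding it, and document-local labels such as "Compound 3" are left out because they are only meaningful inside one document.

```python
import unicodedata
from typing import Optional

# InChIKey copied from the slide; used here as the canonical compound key.
SILDENAFIL_KEY = "BNRNXUUZRGQAQC-UHFFFAOYSA-N"
SILDENAFIL_SMILES = (
    "CCCC1=NN(C)C2=C1N=C(NC2=O)C1=CC(=CC=C1OCC)S(=O)(=O)N1CCN(C)CC1"
)


def normalize(label: str) -> str:
    """Case- and width-insensitive form of a textual name."""
    return unicodedata.normalize("NFKC", label).strip().casefold()


# Hypothetical synonym table: textual names (any script) -> canonical key.
NAME_TO_KEY = {
    normalize(name): SILDENAFIL_KEY
    for name in ("Sildenafil", "Viagra", "バイアグラ", "偉哥")
}

# Structure strings are case-sensitive, so they are matched exactly.
STRUCTURE_TO_KEY = {SILDENAFIL_SMILES: SILDENAFIL_KEY}


def resolve(label: str) -> Optional[str]:
    """Return the canonical key for a label, or None if it is unknown."""
    stripped = label.strip()
    if stripped in STRUCTURE_TO_KEY:
        return STRUCTURE_TO_KEY[stripped]
    return NAME_TO_KEY.get(normalize(stripped))
```

With this table, `resolve("viagra")`, `resolve("偉哥")` and `resolve(SILDENAFIL_SMILES)` all return the same key, while an unknown label returns `None`.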
The manifold ways of Chemistry: … appearance
IUPAC: 1‐methyl‐7‐(1‐methyl‐1H‐pyrazol‐4‐yl)‐5‐[4‐(trifluoromethoxy)phenyl]‐1H,4H,5H‐imidazo[4,5‐c]pyridin‐4‐one
- text
- molecule attachments (MOL, SDF, CDX)
- tables
- images
- example/compound numbers
The manifold ways of Chemistry: … data
Chemistry as linking node for all … data
- Drug indication
- Disease/condition
- Reaction types
- Mechanism of action
- Medicinal and off-targets
- Description
- Contraindications
- Side effects, AE
- Drug-drug interactions (DDI)
- Drug group/type/classification
- Sampling time per drug dose
- Dosage history
- Bibliographic “novelty check”
- Patent landscape
- Safety
- Companies
- Toxicity
- PK/PD
- Bioactivity
- Phys-chem properties: experimental, calculated; internal vs external
- Other data
… and beyond Chemistry: PLENTY of DATA!
Content
1. Data in Documents – chemistry & beyond
2. Tools/technologies
3. Current (past) experiences - Patent Curation Workflow
4. What else.. can we do?
5. PLEXUS – use cases
Tools/technologies: The interplay
1. Pipelining - KNIME/XPATH
2. Chemical recognition - ChemAxon KNIME nodes + Command line tools
3. Text/data-mining – Linguamatics I2E
4. Optical Structure Recognition – Keymodule CLiDE
5. Visualization – ChemCurator and PLEXUS
Current (past) experiences: Patent Curation Workflow – update
Workflow stages: INPUT, OCR, TABLE
- Slower and memory-intensive, but higher quality, more control and IUPAC-enriched XML
- Faster, but less informative/flexible
I2E API via KNIME (GET) – batch indexing, text-mining and (relational) data retrieval
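To make the XPath step on IUPAC-enriched XML more tangible, here is a minimal sketch using Python's standard library in place of a KNIME XPath node. The XML layout (`<compound>` elements with `type` and `label` attributes) is an assumption made up for this illustration; the actual schema produced by the workflow is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated document; element and attribute names are assumed.
SAMPLE_XML = """<document id="patent-1">
  <p>Example 1: <compound type="iupac" label="3">1-methyl-7-(1-methyl-1H-pyrazol-4-yl)-5-[4-(trifluoromethoxy)phenyl]-1H,4H,5H-imidazo[4,5-c]pyridin-4-one</compound> was prepared.</p>
  <p>Reference drug: <compound type="common">sildenafil</compound></p>
</document>"""


def extract_compounds(xml_text, name_type=None):
    """Return (type, name) pairs for annotated compounds, in document order.

    If name_type is given, only annotations with that 'type' are returned.
    """
    root = ET.fromstring(xml_text)
    if name_type is None:
        path = ".//compound"
    else:
        # ElementTree supports this attribute predicate from XPath.
        path = f".//compound[@type='{name_type}']"
    return [(el.get("type"), (el.text or "").strip()) for el in root.findall(path)]
```

Calling `extract_compounds(SAMPLE_XML, "iupac")` returns only the systematic name, which is the kind of selection the enriched XML makes possible.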
Patent Curation Workflow: Visualize data-/text-mining results
SDF file imported into ChemCC project + automatic mapping to existing chemistry
Tables are exported as Excel Sheets or as SDF files
Patent Curation Workflow – update
1. Still NO full automation, BUT: more workflows thanks to KNIME’s flexibility
2. Time & computational resources: 8-CPU notebook + server (needed in particular for OCR correction routines)
3. Novelty checking: compare preferred IUPAC vs traditional names (incl. common names)
4. Improved handling of OCR
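Points 3 and 4 can be sketched at the string level. The example below is an assumption-laden illustration, not the workflow's actual routine: it only normalizes dash look-alikes and line-break hyphenation that OCR or PDF extraction tends to introduce, then flags extracted names that are absent from a known-name list. The helper names `comparison_key` and `novel_names` are hypothetical, and a real novelty check would compare structures or generated names rather than raw strings.

```python
import re

# Unicode hyphen/dash look-alikes often seen after OCR or PDF extraction.
_DASHES = "\u2010\u2011\u2012\u2013\u2014\u2212"
_DASH_TABLE = str.maketrans({ch: "-" for ch in _DASHES})


def comparison_key(raw_name: str) -> str:
    """Normalize a chemical name for comparison only (not for display)."""
    name = raw_name.translate(_DASH_TABLE)
    # Rejoin names that were broken after a hyphen at a line end.
    name = re.sub(r"-\s*\n\s*", "-", name)
    # Collapse any remaining whitespace runs.
    name = re.sub(r"\s+", " ", name)
    return name.strip().casefold()


def novel_names(extracted, known):
    """Return extracted names whose normalized form is not in 'known'."""
    known_keys = {comparison_key(n) for n in known}
    return [n for n in extracted if comparison_key(n) not in known_keys]
```

Because the key is case-folded, it must not be fed back into name-to-structure conversion, where case can be significant (for example `1H` or `N`).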
Content
1. Data in Documents – chemistry & beyond
2. Tools/technologies
3. Current (past) experiences - Patent Curation Workflow
4. Patents worked nicely. What else now?
5. PLEXUS – use cases
PLEXUS
What’s next: Visualization, Search & Share
1. Extraction of chemical reactions from PDFs
2. External databases – combine structured and unstructured (= TEXT) search
3. Internal documents – make more out of Docx files
PLEXUS 1. Extract&Search for chemical reactions in PDFs
Extraction of chemical reactions from PDFs:
1. Easy search for chemical compounds or reactions in your own PDF collections
“Where is that reaction?”
2. Share your experience – leverage in-house synthetic knowledge
“Does this reaction also work when using a different reagent?”
“Which yield was achieved in house for that reaction?”
PLEXUS 1. Extract&Search for chemical reactions in PDFs
Input: PDF collection(s)
- Chemistry recognition (n2s/d2s, OSR)
- Linguistic reaction pattern recognition
- Annotation with BRAT
- Reaction extraction – splitting into components
Output: Mrv file, viewed in the PLEXUS browser
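The "splitting into components" step can be illustrated on a reaction SMILES string, where the three parts are written as `reactants>agents>products` and molecules within a part are separated by dots. This is a simplified sketch that swaps in reaction SMILES for the Mrv files used in the talk; the `Reaction` class and `split_reaction` helper are hypothetical names, and the dot-splitting would wrongly separate multi-fragment species such as salts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reaction:
    reactants: List[str]
    agents: List[str]
    products: List[str]


def split_reaction(rxn_smiles: str) -> Reaction:
    """Split 'reactants>agents>products' into lists of component SMILES."""
    parts = rxn_smiles.strip().split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators in reaction SMILES")

    def components(part: str) -> List[str]:
        # Empty sections (e.g. no agents) yield an empty list.
        return [c for c in part.split(".") if c]

    reactants, agents, products = (components(p) for p in parts)
    return Reaction(reactants, agents, products)


# Acid-catalysed esterification of acetic acid with ethanol.
example = split_reaction("CC(=O)O.OCC>[H+]>CC(=O)OCC.O")
```

Once the roles are separated, each component can be indexed on its own, which is what makes questions like "does this reaction also work with a different reagent?" searchable.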
PLEXUS 1. Extract&Search for chemical reactions in PDFs
BRAT - capture the essence (role, anaphora, etc) of chemical reactions in text
PLEXUS 1. Extract&Search for chemical reactions in PDFs
Visualize/design views for selected content – search & export results
PLEXUS 2. External databases – DrugBank
Exploit database searches beyond predefined fields:
1. “Classic” DB search (incl. chemical search) + search in “unstructured” text boxes
2. Upload of content (e.g. DrugBank) as raw XML/xls/csv or as pre-processed/enriched information
3. Map DBs via IJC and display selected one-to-one/one-to-many relations via PLEXUS
4. Custom views for the various “customers” within a company
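The combination in point 1, a structured-field filter plus a search inside free-text boxes, can be sketched as follows. The record layout and the toy entries are assumptions for illustration and are not taken from DrugBank; chemical (substructure) search is deliberately left out because it needs a cheminformatics engine.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DrugRecord:
    name: str
    groups: Tuple[str, ...]  # structured, predefined field
    description: str         # "unstructured" text box


# Hypothetical toy records; only the first refers to a real drug.
RECORDS = [
    DrugRecord("sildenafil", ("approved",), "Inhibitor of phosphodiesterase 5 (PDE5)."),
    DrugRecord("toy-compound-A", ("experimental",), "Placeholder text mentioning PDE5 selectivity."),
    DrugRecord("toy-compound-B", ("approved",), "Placeholder text about an unrelated target."),
]


def search(records, group: Optional[str] = None, text: Optional[str] = None):
    """Return names of records matching both the field filter and the text query."""
    hits = []
    for rec in records:
        if group is not None and group not in rec.groups:
            continue  # structured filter failed
        if text is not None and text.casefold() not in rec.description.casefold():
            continue  # free-text filter failed
        hits.append(rec.name)
    return hits
```

For example, `search(RECORDS, group="approved", text="pde5")` narrows the text hits to approved entries, something a search restricted to predefined fields alone could not express.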
PLEXUS 2. External databases – DrugBank
Substructure search
Text-based search
PLEXUS 3. Internal Documents – Docx/Doc
Make more out of static Word file collections (Dir:\file1, file2..)
1. Search over 1000s of documents
2. Assign chemical meaning to Doc/Docx files
3. Extract & store bits of information in a company-wide repository for phys.-chem. or experimental data
4. Outlook: combine “internal” and external data
PLEXUS 3. Internal Documents – Docx/Doc
- Text/data-mining (indexing/annotation): free text & tables
- Chemistry recognition (molconvert/reaction split): text + IMG/skc files
- Combine both outputs (join/map)
- IJC: design form view
- PLEXUS: visualize, search
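The join/map step that combines the text-mining output with the chemistry-recognition output can be pictured as a join on shared keys. The sketch below assumes both outputs carry a document identifier and a compound label; these field names, the helper `join_outputs` and the toy rows are hypothetical and stand in for whatever the KNIME workflow actually uses.

```python
def join_outputs(text_hits, chem_hits):
    """Inner-join text-mining rows with recognized structures.

    Rows are matched on the (doc_id, label) pair; text rows without a
    recognized structure are dropped.
    """
    chem_index = {(row["doc_id"], row["label"]): row for row in chem_hits}
    joined = []
    for hit in text_hits:
        chem = chem_index.get((hit["doc_id"], hit["label"]))
        if chem is not None:
            joined.append({**hit, "smiles": chem["smiles"]})
    return joined


# Toy rows, invented for illustration only.
text_hits = [
    {"doc_id": "file1.docx", "label": "Compound 3", "field": "solubility", "value": "toy value"},
    {"doc_id": "file2.docx", "label": "Compound 7", "field": "logP", "value": "toy value"},
]
chem_hits = [
    {"doc_id": "file1.docx", "label": "Compound 3", "smiles": "CCO"},
]

combined = join_outputs(text_hits, chem_hits)
```

The joined rows carry both the extracted data and a structure, which is what allows a single view to be searched by text and by chemistry.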
PLEXUS 3. Internal Documents – Docx/Doc
Different tabs for an improved overview
PLEXUS - weak points, limitations
- Version incompatibilities, stability issues, Java 7 vs Java 8
- Annoying error messages when fields are empty
PLEXUS - weak points, limitations
- Only ONE chemical field – limited search options: it is not possible to search for product and reagent in a “chemical” way
PLEXUS - weak points, limitations
- Limited options - capabilities of IJC are not reflected in PLEXUS
Highlight search terms
HTML representation
IJC snapshot
Thank You
Acknowledgements
Lutz Weber
Anett Plüschel
Ulf Laube
Matthias Irmer
S.I.C. group
MedChem/ChemDev
H. Schmid
M. Santagostino
D. Kirberg