Chemically Informed Text Mining
David Milward
Linguamatics
Chemaxon UGM Budapest 2013
© Linguamatics 2013
Linguamatics: Agile Text Mining
Boston Cambridge
I2E: agile, scalable, real-time NLP-based text mining
Fact extraction and knowledge synthesis
Fortune 500
• Pharma/Biotech: including 9 of the top 10
• Healthcare: including Kaiser Permanente
• Government: including FDA
Chemical Searching combined with Text Searching
• Melting points for exemplified compounds in patents
• Patent Data from IFI Claims Direct
A Versatile Toolbox for Finding Information …
Terminologies
• Search for e.g. cancer and get synonyms and children:
• Malignant neoplasms, Malignant tumor …
• Leukaemia, Lymphoma, Astrocytoma …
Linguistics
Regular Expressions
• e.g. microRNA: let-?\d+.* mirn?a?-?\d+.*
Chemical Substructure
High Throughput
• Simultaneous processing of large numbers of items, e.g. 500 genes from a microarray experiment
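The terminology and regular-expression tools above can be illustrated with a minimal Python sketch. The dictionary layout, the function names `expand_term` and `is_microrna`, and the case-insensitive matching are assumptions made for illustration; only the example terms and the two patterns come from the slide, and nothing here reflects how I2E implements these features.

```python
import re

# Toy terminology built from the slide's "cancer" example. The dictionary
# layout is an assumption for illustration, not I2E's ontology format.
TERMINOLOGY = {
    "cancer": {
        "synonyms": ["Malignant neoplasms", "Malignant tumor"],
        "children": ["Leukaemia", "Lymphoma", "Astrocytoma"],
    },
}


def expand_term(term):
    """Return the term followed by its synonyms and child concepts."""
    key = term.lower()
    entry = TERMINOLOGY.get(key, {})
    return [key] + entry.get("synonyms", []) + entry.get("children", [])


# The two microRNA patterns quoted on the slide. Case-insensitive matching
# is an assumption; the slide does not say how case is handled.
MICRORNA_PATTERNS = [
    re.compile(r"let-?\d+.*", re.IGNORECASE),
    re.compile(r"mirn?a?-?\d+.*", re.IGNORECASE),
]


def is_microrna(token):
    """True if the whole token matches either microRNA pattern."""
    return any(p.fullmatch(token) for p in MICRORNA_PATTERNS)


if __name__ == "__main__":
    print(expand_term("cancer"))
    for token in ["let-7a", "miR-21", "mirna-155", "letter"]:
        print(token, is_microrna(token))
```

Both patterns require at least one digit after the prefix, so ordinary words such as "letter" and the bare class name "miRNA" are not matched.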
… and Presenting it Efficiently
Identify Extract Synthesize Analyze
• Pie charts for drill down
• Trending over time
• Interaction networks
• Mind maps with clustering
• Clustered results table
• RDF/BEL for network modelling
[Figure: example BEL network. Node labels: bp(apoptosis), p(C), taof(p(A)), microRNA(Q), kaof(p(D)), p(D, P@Y), p(B), catof(p(R)). Legend: catalytic activity, kinase activity, microRNA abundance, phosphorylation at unspecified tyrosine, protein abundance, direct causation, transcriptional activity, biological process.]
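To make the nested notation in the BEL figure easier to read, here is a small Python sketch that spells out terms such as kaof(p(D)). The pairing of each abbreviation with a legend entry is inferred from the names (the figure layout is lost), and `FUNCTIONS` and `describe` are hypothetical names. It is a reading aid, not a BEL parser, and it leaves modification arguments such as "D, P@Y" untouched.

```python
import re

# Meanings inferred by pairing the figure's abbreviations with its legend.
# The pairing is an assumption, since the original layout was lost.
FUNCTIONS = {
    "p": "protein abundance",
    "microRNA": "microRNA abundance",
    "bp": "biological process",
    "catof": "catalytic activity of",
    "kaof": "kinase activity of",
    "taof": "transcriptional activity of",
}

# A term is a function name followed by one parenthesised argument.
TERM = re.compile(r"^(\w+)\((.*)\)$")


def describe(term):
    """Recursively spell out a nested term such as kaof(p(D))."""
    text = term.strip()
    match = TERM.match(text)
    if not match:
        return text  # a bare name such as "D" or "apoptosis"
    name, argument = match.groups()
    label = FUNCTIONS.get(name, name)  # unknown functions keep their name
    return f"{label} {describe(argument)}"


if __name__ == "__main__":
    for term in ["bp(apoptosis)", "taof(p(A))", "kaof(p(D))", "catof(p(R))"]:
        print(term, "->", describe(term))
```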
ChemAxon Integration
Mol files
Mol conversion with Filtering
5.7 g (56.7 mmol) of triethylamine in 20 ml methylene chloride are added dropwise at room temperature to a solution of 10 g (56.7 mmol) 2-hydroxymethyl-6-methylene-1,4-dithiepane
I2E Index
Name-to-Structure
I2E Query with Substructure/Similarity
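The flow on this slide (chemical names in text, conversion to structures with filtering, then indexing) can be sketched roughly as follows. This is a toy under stated assumptions: a hand-written lookup table stands in for real name-to-structure conversion, SMILES strings stand in for Mol files, and the filter is assumed to exclude unwanted compounds such as solvents, which the slide does not specify. `NAME_TO_SMILES`, `find_structures`, and `filter_structures` are hypothetical names and do not correspond to ChemAxon or I2E functions.

```python
# Example sentence taken from the slide.
TEXT = (
    "5.7 g (56.7 mmol) of triethylamine in 20 ml methylene chloride are "
    "added dropwise at room temperature to a solution of 10 g (56.7 mmol) "
    "2-hydroxymethyl-6-methylene-1,4-dithiepane"
)

# Stand-in for name-to-structure conversion: a tiny hand-written lookup.
# A real pipeline would call a name-to-structure tool here instead.
NAME_TO_SMILES = {
    "triethylamine": "CCN(CC)CC",
    "methylene chloride": "ClCCl",
}


def find_structures(text):
    """Return (name, offset, SMILES) for every known name found in text."""
    lowered = text.lower()
    hits = []
    for name, smiles in NAME_TO_SMILES.items():
        start = lowered.find(name)
        while start != -1:
            hits.append((name, start, smiles))
            start = lowered.find(name, start + 1)
    return sorted(hits, key=lambda hit: hit[1])  # order of appearance


def filter_structures(hits, excluded_names):
    """Drop structures that should not be indexed (assumed: solvents)."""
    return [hit for hit in hits if hit[0] not in excluded_names]


if __name__ == "__main__":
    hits = find_structures(TEXT)
    for name, offset, smiles in filter_structures(hits, {"methylene chloride"}):
        print(f"{name} at offset {offset}: {smiles}")
```

Keeping the character offset with each structure is what would let a structure hit be combined with ordinary text search over the same sentence.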
YOUR APPLICATION HERE!
I2E Server
Indexing tasks
Querying tasks
Class matching
Index/Query Publishing
Administration Tasks
I2E Client Pipeline Pilot Components
WSAPI Web View
Sample Web GUI
Client
I2E WSAPI
Server
I2E Web Services API (WSAPI)
I2E WSAPI Examples
Thank You!
For more information…
Please visit our table or www.linguamatics.com
Webinars:
www.linguamatics.com/welcome/events/webinars.html
Contact: Phil Hastings
Email: [email protected]