Synthesizing Knowledge by Exploiting Diverse Data Sources: From Microblogs to Patents David Milward ICIC, October 2011

Page 1: Synthesizing Knowledge by Exploiting Diverse Data Sources

Synthesizing Knowledge by Exploiting Diverse Data Sources:

From Microblogs to Patents

David Milward

ICIC, October 2011

Page 2: Range Of Application Areas

In the beginning...

• Protein-Protein Interactions

• Vocabulary Development

• Biomarker Discovery

• Target Identification & Prioritization

• Drug Repositioning

• Conference Abstract Mining

• Key Opinion Leader Identification

• Competitive Intelligence

• Safety/Tox

• In-licensing Opportunities

• Gene Profiling

• Systems Biology

• Mining FDA Drug Labels

• Extracting Numerical and Experimental Data

• Mutations and Gene Expression

• Sentiment Analysis in Social Media

• Workflow Integration

• Mining Electronic Medical Records

• Clinical Trial Analysis

• Patent Analysis

Page 3: Example Data Sources

• External Data

– Social Media

– Scientific Literature

– Conference Abstracts

– Patents

– Clinical Trials

• Internal Data

– SharePoint repositories

– Excel spreadsheets

– Project reports

– Electronic Health Records

Page 4: Mining Social Media

What is being said about our brands? How are the messages being relayed?

Who are the key opinion leaders?

Which sites are influencing our consumers?

What are people saying about us and our competitors?

To provide actionable information from public sentiment

Page 5: Microblogs: Twitter

© Digital Surgeons 2010. All stats are based in the U.S. unless specified otherwise.

* twitter.com, March 2011

Page 6: Natural Language Processing (NLP)

• Find and extract patterns, not just keywords

• Capture the thousands of ways people say the same thing

• For example: how do people express that they are going to get a flu vaccine vs. not get one?
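
The difference between matching keywords and matching patterns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word lists, the two-word negation window and the function name `classify` are all hypothetical and are not taken from the talk or from any product. A keyword search for "flu shot" would treat both kinds of message alike, whereas the pattern also looks at the verb and at any negation in front of it.

```python
import re

# Hypothetical vocabulary for illustration only; a real system would use
# much richer linguistic patterns than these short lists.
VACCINE = r"(?:flu|h1n1)\s+(?:shot|jab|vaccine|vaccination)"
INTENT = re.compile(
    # Optional negation, followed by at most two words before the verb.
    r"\b(?P<neg>(?:not|never|won't|wont|refuse to)\s+(?:\w+\s+){0,2})?"
    # A "getting" verb, followed by at most three words before the vaccine mention.
    r"(?:get(?:ting)?|got|book(?:ed|ing)?|schedul(?:e|ed|ing))\s+(?:\w+\s+){0,3}?"
    + VACCINE,
    re.IGNORECASE,
)


def classify(text):
    """Label a message as 'getter', 'non-getter' or 'unknown'."""
    match = INTENT.search(text)
    if match is None:
        return "unknown"
    return "non-getter" if match.group("neg") else "getter"


# Made-up example messages.
for message in [
    "Getting my flu shot tomorrow",
    "I'm not going to get a flu vaccine this year",
    "Flu season is here",
]:
    print(classify(message), "<-", message)
```

The fixed word windows are the obvious weakness of this sketch: a negation more than two words before the verb is missed, which is one reason real systems rely on linguistic analysis rather than hand-written regular expressions.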

Page 7: Refining Sentiment via Context: UK Elections

• Accurate prediction of UK Election Results based on sentiment trends

• What we counted as positive

• What we did not count as positive

Page 8: Who will get a vaccine, and finding their influences

Getters Non Getters

Page 9: Categorizing by Topic and by Statement

• Many tweets or newsfeed statements are near-duplicates, but a character-by-character comparison does not reveal this

• Here we categorize by topic (e.g. jobs or drug) and use NLP to cluster by message, which allows faster reviewing
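
Why character-by-character comparison misses duplicates can be shown with a small Python sketch. It is a much simpler stand-in for the NLP clustering described above: it only normalizes surface noise (retweet markers, mentions, hashtags, links, punctuation) and groups messages whose normalized text is identical. The function names and the example messages are invented for illustration.

```python
import re
from collections import defaultdict


def normalize(message):
    """Reduce a message to its core wording (an assumed, simplistic notion of 'same')."""
    text = message.lower()
    text = re.sub(r"https?://\S+", " ", text)  # links are often shortened differently
    text = re.sub(r"\brt\b", " ", text)        # retweet marker
    text = re.sub(r"[@#]\w+", " ", text)       # mentions and hashtags
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # punctuation
    return " ".join(text.split())


def cluster(messages):
    """Group messages that share the same normalized text."""
    groups = defaultdict(list)
    for message in messages:
        groups[normalize(message)].append(message)
    return list(groups.values())


# Made-up messages: the first two differ character by character
# but carry the same statement.
messages = [
    "RT @user: Acme recalls drug X http://t.co/abc",
    "Acme recalls drug X! http://bit.ly/xyz #pharma",
    "Acme is hiring in Boston",
]
for group in cluster(messages):
    print(len(group), group[0])
```

A reviewer then reads one representative per group instead of every copy. Dropping hashtags entirely is a design shortcut that loses information when a hashtag is part of the sentence.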

Page 10: Mining MEDLINE

Drug Repurposing

Target ID and Prioritization

Drug-Drug Interactions

Biomarker Discovery

Page 11: Gene Profile

• Gathers together information relevant for a specific gene from multiple documents

Page 12: Relating Genes to Disease

Visualization via Cytoscape

Page 13: Mining Conference Proceedings

Identify industry leaders

Gather information for internal customers

Identify new/novel areas of research

Find other companies / researchers working in the same area

To provide actionable information about competitors or activities of potential acquisition targets

Page 14: Conference Proceedings

• Creating an itinerary for poster viewing

Page 15: Conference Proceedings – Presenter Network

Page 16: Mining Patent Literature

Research

IP

In-licensing

Competitive intelligence

Page 17: Patent Grants Distribution by Disease (Sample)

Page 18: Patent Applications and Grants: Companies vs. Diseases

[Chart: counts of patent applications and grants (axis scale 0 to 25,000) for each of Abbott, AZ, Bayer, BMS, GSK, Roche, Merck, Novartis and Pfizer, with a legend of disease categories]

Disease categories in the legend: Virus Diseases; Substance-Related Disorders; Stomatognathic Diseases; Skin and Connective Tissue Diseases; Respiratory Tract Diseases; Parasitic Diseases; Otorhinolaryngologic Diseases; Occupational Diseases; Nutritional and Metabolic Diseases; Nervous System Diseases; Neoplasms; Musculoskeletal Diseases; Mental Disorders; Male Urogenital Diseases; Immune System Diseases; Hemic and Lymphatic Diseases; Female Urogenital Diseases and Pregnancy Complications; Eye Diseases; Endocrine System Diseases

Page 19: Finding Inhibition Information in Patents

Page 20: Mining Clinical Trial Data

Select trial sites more effectively

Find precedents for study design

Monitor progress of competitors’ trials

Find other companies running clinical trials in the same therapeutic area

To provide actionable information about worldwide clinical development activities

Page 21: Clinical Trial Analysis

NCT Number Standardization of data Hits in context

Page 22: Clinical Trial Analysis

NCT Number Standardization of data Hits in context

Page 23: Workflow Integration + Clinical Trial Analysis

Page 24: Mining Electronic Health Records

Diagnosis prediction from reports

Translational medicine

Analysis of treatment outcomes

Monitoring of adherence to protocols

Page 25: Electronic Health Records

• Identify patient information

• Need to consider context within the record, as well as the immediate sentence
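
The point about context can be illustrated with a short Python sketch. It assumes, purely for illustration, that a record is plain text with section headers on their own lines ending in a colon, and it uses a crude sentence-level negation check. The function name, the header convention, the negation word list and the sample record are all invented here and do not describe any real record format.

```python
import re

# Assumed header convention: a line of letters and spaces ending with ":".
SECTION_HEADER = re.compile(r"^(?P<name>[A-Za-z ]+):\s*$")
# Crude, hypothetical negation cues checked anywhere in the sentence.
NEGATION = re.compile(r"\b(?:no|denies|without|negative for)\b", re.IGNORECASE)


def find_mentions(record, term):
    """Return each sentence mentioning `term`, with its record-level context."""
    term_pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    section = "unknown"
    mentions = []
    for line in record.splitlines():
        header = SECTION_HEADER.match(line.strip())
        if header:
            section = header.group("name").lower()
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", line):
            if term_pattern.search(sentence):
                mentions.append({
                    "sentence": sentence.strip(),
                    "section": section,                                # context within the record
                    "negated": bool(NEGATION.search(sentence)),        # context within the sentence
                    "about_patient": section != "family history",
                })
    return mentions


# Invented sample record.
RECORD = """Family History:
Mother had diabetes.
Assessment:
Patient denies chest pain. Diabetes diagnosed in 2009.
"""
for mention in find_mentions(RECORD, "diabetes"):
    print(mention)
```

Both sentences mention diabetes, but only the one under "Assessment" is about the patient, which the sentence alone would not reveal.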


Page 26: Mining Multiple Sources

Merge information from all possible sources

Find more detailed information from a second source

Connect information across different structured and unstructured sources

Compare trends in different data sources

Page 27: Delivering Diverse Data Sources

Page 28: I2E OnDemand

[Diagram labels: Hosted Site; Firewall; Company 1; Company 2; Company 3; MEDLINE; Genes/Proteins; Chemicals; Diseases; Relationships]

Complete and up-to-date MEDLINE index with built-in terminologies, managed by Linguamatics

Page 29: Structured and Unstructured Data Workflows

• Successful queries are converted into regular workflows for up-to-date dashboards and alerting

• Structured resources (tabular/semantic web) are augmented with information from the literature

Directly ask questions such as:

• Which genes from a microarray experiment (or GWAS) show evidence of association with a disease?
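
A question of this kind amounts to joining a structured gene list with relations extracted from text. The Python sketch below shows one possible shape of that join, assuming the extracted relations are available as (gene, disease, document id) tuples. The function name, the tuple layout and the placeholder gene and document identifiers are hypothetical and do not represent the output format of any particular tool.

```python
from collections import defaultdict


def prioritize(experiment_genes, literature_relations, disease):
    """Rank experiment genes by the number of distinct documents linking them to `disease`."""
    supporting_docs = defaultdict(set)
    for gene, related_disease, doc_id in literature_relations:
        if related_disease.lower() == disease.lower():
            supporting_docs[gene].add(doc_id)
    # Keep only genes that came out of the experiment and have literature support.
    ranked = [
        (gene, len(supporting_docs[gene]))
        for gene in experiment_genes
        if gene in supporting_docs
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Placeholder data, invented for illustration.
experiment_genes = ["GENE_A", "GENE_B", "GENE_C"]
literature_relations = [
    ("GENE_A", "asthma", "doc1"),
    ("GENE_A", "asthma", "doc2"),
    ("GENE_B", "asthma", "doc2"),
    ("GENE_C", "diabetes", "doc3"),
    ("GENE_D", "asthma", "doc4"),  # in the literature, but not in the experiment
]
print(prioritize(experiment_genes, literature_relations, "asthma"))
```

Counting distinct documents rather than raw mentions is a deliberate choice in this sketch, so that one paper repeating a claim does not outweigh several independent papers.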

[Diagram labels: Semantic Store/DB; Microarray; GWAS; Documents; Indexed Documents; High throughput querying; Prioritization]

Page 30: Conclusions

• Text Mining allows exploitation of diverse data sources e.g.

– Scientific Literature

– Patents

– Social Media

• Wide range of applications, from R&D to Marketing

• Results from each source are valuable in themselves, but it is also possible to combine

– Structured and unstructured data

– Data from different unstructured sources