Predictive Analytics for OpenFDA & Other Sources October 6, 2014

Page 1: Predictive Analytics for OpenFDA & Other Sources

Predictive Analytics for OpenFDA & Other Sources

October 6, 2014

Page 2: Predictive Analytics for OpenFDA & Other Sources

Data Fusion to Know Individuals

Page 3: Predictive Analytics for OpenFDA & Other Sources

OpenFDA Queries

Endpoint:

https://api.fda.gov/drug/event.json?

search=patient.drug.openfda.pharm_class_epc:"nonsteroidal+anti-inflammatory+drug"

Search for records where openfda.pharm_class_epc (pharmacologic class) contains "nonsteroidal anti-inflammatory drug".

&count=patient.reaction.reactionmeddrapt.exact

Count the field patient.reaction.reactionmeddrapt (patient reactions).

Page 4: Predictive Analytics for OpenFDA & Other Sources

https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:%22nonsteroidal+anti-inflammatory+drug%22&count=patient.reaction.reactionmeddrapt.exact
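The query shown on these slides can also be assembled in code. The following is a minimal sketch, not part of the original deck: the helper name `build_event_query` is hypothetical, and it only builds the URL string (it does not call the API). It assumes the phrase is wrapped in URL-encoded double quotes (`%22`) with spaces written as `+`, as in the URL above.

```python
BASE_URL = "https://api.fda.gov/drug/event.json"


def build_event_query(search_field, phrase, count_field=None):
    """Build an openFDA drug-event query URL (hypothetical helper).

    search_field: field to search, e.g. patient.drug.openfda.pharm_class_epc
    phrase:       exact phrase to match; spaces become '+', quotes become %22
    count_field:  optional field whose values should be counted
    """
    quoted_phrase = "%22" + phrase.replace(" ", "+") + "%22"
    url = f"{BASE_URL}?search={search_field}:{quoted_phrase}"
    if count_field:
        url += f"&count={count_field}"
    return url


# Reproduces the NSAID query from the slides.
nsaid_url = build_event_query(
    "patient.drug.openfda.pharm_class_epc",
    "nonsteroidal anti-inflammatory drug",
    count_field="patient.reaction.reactionmeddrapt.exact",
)
print(nsaid_url)
```

Leaving out `count_field` yields a plain search that returns matching safety reports rather than counts.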

Page 5: Predictive Analytics for OpenFDA & Other Sources

Important OpenFDA data types

What the drug is supposed to fix: Established Pharmacologic Class (EPC) - pharm_class_epc

How the drug works: Mechanism of Action (MOA) - pharm_class_moa

What the drug affects: Physiologic Effect (PE) - pharm_class_pe

What is in the drug: Chemical Structure (CS) - pharm_class_cs

Page 6: Predictive Analytics for OpenFDA & Other Sources

https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:%22Serotonin+and+Norepinephrine+Reuptake+Inhibitor%22

Safety Report ID

Biographical Data

Adverse Reactions

Drug Information

Page 7: Predictive Analytics for OpenFDA & Other Sources

More OpenFDA data types

How serious is the reaction: serious (1 for Yes, 2 for No)
• "serious": "1",
• "seriousnesscongenitalanomali": "1",
• "seriousnessdeath": "1",
• "seriousnessdisabling": "1",
• "seriousnesshospitalization": "1",
• "seriousnesslifethreatening": "1",
• "seriousnessother": "1"

What is the drug indicated for: drugindication

Circumstances for taking drug: patient.drug.drugadditional
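As an illustration of how the fields on this slide might be read from a single safety report, here is a minimal sketch. The function names and the `sample_report` record are hypothetical and made up for the example. The layout (seriousness flags at the top level of the report, `drugindication` under `patient.drug`) is an assumption based on the field paths named in the slides.

```python
SERIOUSNESS_FIELDS = [
    "seriousnesscongenitalanomali",
    "seriousnessdeath",
    "seriousnessdisabling",
    "seriousnesshospitalization",
    "seriousnesslifethreatening",
    "seriousnessother",
]


def summarize_seriousness(report):
    """Return (is_serious, reasons) for one report dict.

    "serious" is "1" for Yes and "2" for No; each seriousness* field
    that equals "1" is collected as a reason.
    """
    is_serious = report.get("serious") == "1"
    reasons = [name for name in SERIOUSNESS_FIELDS if report.get(name) == "1"]
    return is_serious, reasons


def drug_indications(report):
    """List the drugindication values for the drugs in one report."""
    drugs = report.get("patient", {}).get("drug", [])
    return [d["drugindication"] for d in drugs if "drugindication" in d]


# Made-up record for illustration only; not real openFDA data.
sample_report = {
    "serious": "1",
    "seriousnesshospitalization": "1",
    "patient": {"drug": [{"drugindication": "PAIN"}, {}]},
}

print(summarize_seriousness(sample_report))
print(drug_indications(sample_report))
```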

Page 8: Predictive Analytics for OpenFDA & Other Sources

Predictions on OpenFDA Data

Hierarchical Clustering (“unsupervised learning”) on Manufacturers by Drug Class and Adverse Events

Generates insights and further questions to explore, such as:

Do some adverse events dominate all others?

What is the role of retail distributors, as opposed to manufacturers? Is it an artifact of the data, or something they do between themselves and the patient?
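To make the clustering idea concrete, here is a minimal sketch of agglomerative (bottom-up hierarchical) clustering in plain Python. The deck does not say which linkage, distance, or tooling was used, so average linkage and Euclidean distance are assumptions. The manufacturer names and counts are invented for illustration and are not taken from OpenFDA.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering.

    vectors:    dict mapping a name to its count vector
    n_clusters: number of groups to stop at
    Returns a list of sorted name lists, one per cluster.
    """
    clusters = [[name] for name in vectors]

    def linkage(c1, c2):
        dists = [euclidean(vectors[a], vectors[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical adverse-event counts per manufacturer for two drug classes.
counts = {
    "maker_a": [900, 850],
    "maker_b": [880, 870],
    "maker_c": [40, 35],
    "maker_d": [50, 30],
    "maker_e": [400, 420],
}
groups = agglomerate(counts, 3)
print(groups)
```

With real data, each vector would hold one manufacturer's adverse-event counts per drug class (or per reaction term), and the resulting groups would correspond to the manufacturer groupings described on the following slides.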

Page 9: Predictive Analytics for OpenFDA & Other Sources

Manufacturers by All Drug Classes

Group distinguished by an abnormally large number of adverse events for the products they make – includes Mylan and Teva

Group with a troublingly large number of adverse events for the products they make – includes AbbVie and Pfizer

Group above average in the number of product adverse events – includes private-label companies CVS, Kroger, Wal-Mart, and Publix

Other manufacturers – not troubling in the number of adverse events

Page 10: Predictive Analytics for OpenFDA & Other Sources

Manufacturers by All Adverse Events

Other manufacturers – not troubling in the number of adverse events

Group of one (Mylan), highly distinguished by an abnormally large number of adverse events for the products it makes

Group with a troublingly large number of adverse events for the products they make – includes Teva and the grocery chain Kroger

Group above average in the number of product adverse events – includes big pharma maker Merck

Page 11: Predictive Analytics for OpenFDA & Other Sources

Conditional Probability Models (Bayes) Very Helpful for Predictions

Model Type              % Correct on Age   % Correct on Gender
Random Forest           48%                55%
Support Vector Machine  48%                55%
Decision Trees          14%                9%
Naïve Bayes             64%                78%

Page 12: Predictive Analytics for OpenFDA & Other Sources

Why is Bayes So Much Better?

Works on Conditional Probability

Utilizes Much More of What We Already Know

Probability of Age 18to34 | drug = Rating % Age 18to34 (drug)

Page 13: Predictive Analytics for OpenFDA & Other Sources

Bayes is Conditional Probability

Intuition is “What are the chances of X, given that I know Y?”

This will always be better than flipping a coin – as in the case of gender prediction

The probability of Female (F) for any given Drug (T) equals the probability of the Drug given Female, times the probability of being Female, divided by the probability of the Drug: P(F | T) = P(T | F) × P(F) / P(T).
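A small worked example of this rule, P(F | T) = P(T | F) × P(F) / P(T), may help. The probabilities below are invented purely for illustration and are not drawn from the presentation's data.

```python
def posterior(p_t_given_f, p_f, p_t):
    """Bayes' rule: P(F | T) = P(T | F) * P(F) / P(T)."""
    return p_t_given_f * p_f / p_t


# Hypothetical inputs (assumptions for the example only).
p_f = 0.5            # P(Female)
p_t_given_f = 0.08   # P(Drug | Female)
p_t_given_m = 0.02   # P(Drug | Male)

# Total probability of the drug across both genders.
p_t = p_t_given_f * p_f + p_t_given_m * (1 - p_f)

p_f_given_t = posterior(p_t_given_f, p_f, p_t)
print(round(p_f_given_t, 3))
```

With these made-up inputs, knowing the drug moves the estimate for Female from the 50% prior to 80%, which is the sense in which the conditional model uses more of what is already known.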

Page 14: Predictive Analytics for OpenFDA & Other Sources

Bayes Results for Single Person Households

                           ACCURACY                            WEIGHTED ACCURACY
Genre                      Gender  Age     Size        Weight  Gender  Age
ADVENTURE                  75.4%   62.0%   16,565      1.001   75.5%   62.1%
AUDIENCE PARTICIPATION     84.1%   78.8%   46,283      1.003   84.4%   79.0%
AWARD CEREMONIES           60.4%   42.6%   655         1.000   60.4%   42.6%
CHILD - LIVE               78.6%   67.7%   4,868       1.000   78.6%   67.7%
CHILD DAY - ANIMATION      74.7%   59.3%   3,487       1.000   74.7%   59.4%
CHILD MULTI-WEEKLY         81.6%   73.2%   1,916,697   1.144   93.3%   83.8%
CHILDREN'S NEWS            76.0%   33.3%   300         1.000   76.0%   33.3%
COMEDY VARIETY             76.7%   68.9%   326,770     1.025   78.6%   70.6%
CONCERT MUSIC              67.8%   54.6%   2,822       1.000   67.9%   54.6%
CONVERSATIONS, COLLOQUIES  76.8%   63.3%   113,290     1.009   77.5%   63.9%
DAYTIME DRAMA              81.1%   62.5%   20,478      1.002   81.2%   62.6%
DEVOTIONAL                 64.0%   47.8%   1,344       1.000   64.0%   47.8%
EVENING ANIMATION          80.7%   76.7%   481,722     1.036   83.6%   79.5%
FEATURE FILM               74.5%   62.7%   449,549     1.034   77.0%   64.8%
FORMAT VARIES              76.6%   56.0%   1,127       1.000   76.6%   56.0%
GENERAL DOCUMENTARY        74.6%   63.9%   2,004,256   1.150   85.8%   73.5%
GENERAL DRAMA              75.0%   63.6%   1,949,243   1.146   86.0%   72.9%
GENERAL VARIETY            73.4%   62.1%   377,859     1.028   75.5%   63.8%
INSTRUCTION, ADVICE        79.1%   67.2%   1,000,586   1.075   85.0%   72.2%
NEWS                       77.8%   65.4%   971,951     1.073   83.5%   70.1%
NEWS DOCUMENTARY           77.5%   63.2%   100,634     1.008   78.1%   63.7%
OFFICIAL POLICE            46.6%   29.2%   1,009       1.000   46.6%   29.2%
PARTICIPATION VARIETY      75.3%   62.3%   174,900     1.013   76.3%   63.1%
POPULAR MUSIC              77.0%   67.5%   458,606     1.034   79.6%   69.8%
POPULAR MUSIC STANDARD     69.0%   50.5%   2,335       1.000   69.0%   50.5%
PRIVATE DETECTIVE          71.5%   71.5%   20,522      1.002   71.6%   71.7%
QUIZ GIVE AWAY             79.1%   68.7%   76,822      1.006   79.5%   69.1%
QUIZ PANEL                 79.8%   63.4%   1,700       1.000   79.8%   63.4%
SCIENCE FICTION            76.1%   65.3%   24,219      1.002   76.2%   65.4%
SITUATION COMEDY           75.4%   61.3%   1,124,687   1.084   81.8%   66.5%
SPORTS ANTHOLOGY           83.8%   64.8%   52,166      1.004   84.1%   65.0%
SPORTS COMMENTARY          79.0%   68.7%   993,734     1.075   84.9%   73.9%
SPORTS EVENT               75.0%   62.2%   204,127     1.015   76.2%   63.1%
SPORTS NEWS                81.1%   68.3%   15,275      1.001   81.2%   68.4%
SUSPENSE/MYSTERY           81.3%   70.9%   342,405     1.026   83.4%   72.7%
UNCLASSIFIED               77.8%   62.8%   38,060      1.003   78.0%   63.0%
WESTERN DRAMA              75.6%   63.8%   4,300       1.000   75.7%   63.9%
AVERAGE                    75.4%   62.1%   13,325,353          77.5%   63.9%
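The slide does not state how the Weight and Weighted Accuracy columns were computed. The figures appear consistent with weight = 1 + size / total size and weighted accuracy = accuracy × weight, so the sketch below checks that reading against rows of the table. This is an inference from the numbers, not the author's documented method, and the function names are hypothetical.

```python
TOTAL_SIZE = 13_325_353  # size shown in the AVERAGE row


def weight(size, total=TOTAL_SIZE):
    """Assumed weighting: 1 plus the genre's share of all records."""
    return 1 + size / total


def weighted_accuracy(accuracy_pct, size, total=TOTAL_SIZE):
    """Assumed weighted accuracy: raw accuracy scaled by the weight."""
    return accuracy_pct * weight(size, total)


# CHILD MULTI-WEEKLY row: 81.6% gender accuracy, size 1,916,697.
w = weight(1_916_697)
print(round(w, 3))                                  # table shows 1.144
print(round(weighted_accuracy(81.6, 1_916_697), 1)) # table shows 93.3%
```

Because the table's accuracies are already rounded to one decimal, recomputed values can differ from the printed ones in the last digit for some rows.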

Page 15: Predictive Analytics for OpenFDA & Other Sources

Simplifying the Problem Set

Single Households

Multi-Person Households

Same Gender & Same Age Class: nothing to predict

Same Gender & Diff. Age Class: predict age

Diff. Gender & Same Age Class: predict gender

Diff. Gender & Diff. Age Class: predict both

Household counts shown on the slide: 123K, 21K, 44K, 303K, 133K, 500K

Age / Gender models by Drug
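The split of multi-person households into four cases can be expressed as a small routing function. This is a sketch under assumptions: the function name, the `(gender, age_class)` tuple layout, and the age-class labels other than `18to34` are hypothetical.

```python
def prediction_task(members):
    """Decide what needs predicting for a multi-person household.

    members: list of (gender, age_class) tuples for the known
             household members (hypothetical representation).
    """
    if len(members) < 2:
        raise ValueError("expected a multi-person household")
    genders = {gender for gender, _ in members}
    ages = {age_class for _, age_class in members}
    same_gender = len(genders) == 1
    same_age = len(ages) == 1
    if same_gender and same_age:
        return "nothing to predict"
    if same_gender:
        return "predict age"
    if same_age:
        return "predict gender"
    return "predict both"


print(prediction_task([("F", "18to34"), ("F", "35to54")]))
```

When every member shares an attribute, that attribute is already known for whoever generated the record, so only the attributes that differ within the household need a model.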

Page 16: Predictive Analytics for OpenFDA & Other Sources

2 Stage Models

Stage 1: Age / Gender Models by Drug

Stage 2: Age / Gender Conditional Probability

Same Gender & Diff. Age Class: predict age

Diff. Gender & Same Age Class: predict gender

Diff. Gender & Diff. Age Class: predict both

Single Households

Page 17: Predictive Analytics for OpenFDA & Other Sources

Age Conditional Probabilities

Page 18: Predictive Analytics for OpenFDA & Other Sources

Full Bayes Model

Using all the independent variables –

Where MAX is the predicted Age or Gender class, given all the known conditional probabilities.

NOTE: The MAX prediction for Age is constrained by ID. Each ID has only 2 possible Age classes, since these are known, so if the model predicts an Age class outside the boundaries of an ID, pick the class with the next highest MAX probability for Age.
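The formula from this slide did not survive in the text, so the sketch below assumes the standard naive Bayes form: score each class by its prior times the product of the conditional probabilities of the observed variables, then take the maximum. It also applies the constraint from the NOTE by falling back to the next highest class that is allowed for the ID. All function names, class labels other than `18to34`, and probabilities are hypothetical.

```python
import math


def naive_bayes_scores(priors, likelihoods, observed):
    """Log-score per class: log P(c) + sum of log P(x_i | c)."""
    scores = {}
    for cls, prior in priors.items():
        score = math.log(prior)
        for feature in observed:
            score += math.log(likelihoods[cls][feature])
        scores[cls] = score
    return scores


def constrained_max(scores, allowed):
    """Highest-scoring class that is among the classes allowed for the ID."""
    for cls in sorted(scores, key=scores.get, reverse=True):
        if cls in allowed:
            return cls
    raise ValueError("no allowed class has a score")


# Made-up probabilities for illustration only.
priors = {"18to34": 0.3, "35to54": 0.4, "55plus": 0.3}
likelihoods = {
    "18to34": {"drug_a": 0.10, "drug_b": 0.05},
    "35to54": {"drug_a": 0.04, "drug_b": 0.06},
    "55plus": {"drug_a": 0.02, "drug_b": 0.20},
}
scores = naive_bayes_scores(priors, likelihoods, ["drug_a", "drug_b"])

unconstrained = constrained_max(scores, set(scores))
constrained = constrained_max(scores, {"35to54", "55plus"})
print(unconstrained, constrained)
```

Working in log space avoids underflow when many conditional probabilities are multiplied. Here the unconstrained maximum falls outside the ID's two known classes, so the constrained pick is the runner-up.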