Netflix Prize and Heritage Health Prize Philip Chan

Cash Prizes to Stimulate Research

Ansari X Prize for Private Spaceflight (2004) [$10M] reach 100 km above Earth twice within 2 weeks

DARPA Grand Challenge (2005) [$2M] autonomous vehicle: 131 miles in 10 hours

Archon X Prize for Genomics (2006) [$10M] map 100 human genomes in 10 days

Cash Prizes to Stimulate Research [continued]

Netflix Prize (2006) [$1M] Recommend movies with a 10% improvement over Netflix’s own Cinematch system

Heritage Health Prize (2011) [$3M] Predict days in hospital next year with at most 0.4 error

Netflix Prize

netflixprize.com

Netflix Prize

Task

Given customer ratings on some movies, predict customer ratings on other movies

If John rates “Mission Impossible” a 5, “Over the Hedge” a 3, and “Back to the Future” a 4, how would he rate “Harry Potter”, … ?
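As an illustration of the task (not the approach any competitor actually used), the simplest guess for John's unseen rating is the average of the ratings he has already given. This sketch assumes ratings are stored in a plain dictionary keyed by movie title; the function name `predict_user_mean` is hypothetical.

```python
def predict_user_mean(user_ratings: dict[str, float]) -> float:
    """Predict any unseen movie's rating as the user's mean known rating."""
    if not user_ratings:
        raise ValueError("need at least one known rating")
    return sum(user_ratings.values()) / len(user_ratings)


john = {
    "Mission Impossible": 5,
    "Over the Hedge": 3,
    "Back to the Future": 4,
}

# The same guess is used for "Harry Potter" and every other unseen movie
print(predict_user_mean(john))  # 4.0
```

This baseline ignores everything about the target movie itself, which is exactly the information better algorithms try to exploit.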

Performance

Prediction error: root mean squared error (RMSE) of the predicted ratings (lower is better)

Cash Award

Grand Prize: $1M for a 10% improvement by 2011 (within 5 years)

Progress Prize: $50K per year for a 1% improvement over the previous best
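Netflix scored submissions by root mean squared error (RMSE) on held-out ratings, and "improvement" meant the percentage reduction in RMSE relative to a baseline (Cinematch for the Grand Prize). A minimal sketch of both quantities; the function names and numbers are hypothetical.

```python
import math


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error between two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("lists must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def improvement_pct(baseline_rmse: float, new_rmse: float) -> float:
    """Percentage reduction in RMSE relative to a baseline system."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Hypothetical: a baseline at RMSE 1.0 and a new model at RMSE 0.9 is a ~10% improvement
print(improvement_pct(1.0, 0.9))
```

Under this definition, reaching the Grand Prize means `improvement_pct(cinematch_rmse, new_rmse) >= 10`, where `cinematch_rmse` is a placeholder for the baseline score.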

Intellectual Property

Netflix has a non-exclusive license to the algorithm

Authors must publicly describe the algorithm

Participation

51K contestants, 41K teams, 186 countries

Leader Board

Started on Oct 2, 2006

Improvement by the top algorithm:

after a week: ~0.9%

after two weeks: ~4.5%

after a month: ~5%

after a year: ~8.4%

after two years: ~9.4%

July 26, 2009 (less than 3 years): 10%

Winner

BellKor’s Pragmatic Chaos: 7 members, a merger of 3 teams

BellKor: AT&T Labs, USA & Yahoo! Research, Israel

PragmaticTheory: telecommunications, Canada

BigChaos: started a company, Austria

Winning solution: a combination (blend) of many different algorithms
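Combining algorithms, often called blending, builds the final prediction from the outputs of several individual models. The winners' blend was far more elaborate; this sketch only shows the basic idea with two hypothetical predictors, choosing the mixing weight that minimizes RMSE on a held-out set by a simple grid search.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def blend_two(actual, pred_a, pred_b, steps=100):
    """Find w in [0, 1] minimizing RMSE of w*pred_a + (1-w)*pred_b on held-out data."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        err = rmse(actual, blended)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Hypothetical held-out ratings: model A overshoots by 0.5, model B undershoots by 0.5
actual = [4, 3, 5, 2]
pred_a = [4.5, 3.5, 5.5, 2.5]
pred_b = [3.5, 2.5, 4.5, 1.5]
print(blend_two(actual, pred_a, pred_b))  # (0.5, 0.0)
```

Neither model alone is exact, but their errors cancel in the blend. The weight must be fit on data the individual models were not trained on, otherwise the blend favors whichever model overfits most.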

Runner-up

The Ensemble: ~30 members, a “last-minute” merger

Teams had 30 days to beat the first team that crossed the 10% threshold

Same accuracy, but submitted 20 minutes later!

Heritage Health Prize

heritagehealthprize.com

Health Care

71M individuals admitted to US hospitals each year

Unnecessary admissions cost $30B

Heritage Provider Network

Has a network of doctors in California

Can we identify earlier those most at risk and ensure they get the treatment they need?

Can we reduce unnecessary hospitalizations?

Heritage Health Prize

Launch video: http://www.youtube.com/watch?v=GuZ8nkpygAs

Given patient data, predict how many days a patient will spend in a hospital in the next year

The prediction helps develop strategies to reduce emergencies and hence hospitalizations

Grand Prize

$3M: at most 0.4 error (~0.5 day) by Apr 4, 2013 [2 years]

$500K: consolation prize for the top team if no one gets below 0.4 error

Milestone Prizes

Top 2 performers at each milestone

Aug 31, 2011: $30K, $20K

Feb 13, 2012: $50K, $30K (video: http://www.youtube.com/watch?v=pkmkNnGyihY)

Sep 4, 2012: $60K, $40K

Performance of Algorithms

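Prediction Error Rate (RMSLE)

$$\text{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\text{real}_i - \text{prediction}_i\right)^2}$$

where real_i = log(actual # of days + 1) and prediction_i = log(predicted # of days + 1) for patient i, and n is the number of patients

Prediction error threshold = 0.4 (~0.5 day)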
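The RMSLE on this slide translates directly into code. This sketch uses `math.log1p`, which computes log(x + 1), for both the real and predicted values; the function name is hypothetical. It also shows where "~0.5 day" comes from: if a patient's actual stay is 0 days, a prediction of d days has log error log(d + 1), which equals 0.4 when d = e^0.4 - 1, about 0.49 days.

```python
import math


def rmsle(actual_days: list[float], predicted_days: list[float]) -> float:
    """Root mean squared logarithmic error over n patients."""
    if len(actual_days) != len(predicted_days) or not actual_days:
        raise ValueError("lists must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(actual_days, predicted_days):
        real = math.log1p(actual)           # log(actual # of days + 1)
        prediction = math.log1p(predicted)  # log(predicted # of days + 1)
        total += (real - prediction) ** 2
    return math.sqrt(total / len(actual_days))


THRESHOLD = 0.4
# Days of error on a 0-day stay that produce a log error equal to the threshold
print(round(math.expm1(THRESHOLD), 2))  # 0.49
```

Because errors are measured in log space, predicting 1 day instead of 0 costs as much as predicting 3 days instead of 1.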

Intellectual Property

Exclusive license to the Sponsor, plus the participant’s own use

Algorithms not previously published

Data sets may be used for the competition only; other purposes require written consent

Data Sets

Training and validation data sets: for participants to design algorithms

Feedback data set: for calculating standings on the Leaderboard

Scoring data set: for determining prize winners

http://www.heritagehealthprize.com/c/hhp/Data

Data (in CSV format)

Members Data (113K members)

Claims Data (2.7M claims)

Drug Count Data (818K prescriptions)

Lab Count Data (361K labs)

Outcome Data (76K in Y2, 71K in Y3)

Target (71K in Y4, for prediction)

Total ~264 MB (including other files)

Members Data

MemberID, AgeAtFirstClaim, Sex

Claims Data

MemberID

ProviderID

Vendor ID

PCP (primary care physician) ID

Year

Specialty (of physician/vendor?)

PlaceSvc (place of service): office, outpatient hospital, inpatient hospital, …

PayDelay (between service and payment)

Claims Data [continued]

LengthOfStay (in hospital)

DSFS (days since first claim)

PrimaryConditionGroup (diagnostic categories)

CharlsonIndex (comorbidity score: effect of coexisting diseases on mortality risk)

ProcedureGroup (intervention categories)

SupLOS (supplement to LengthOfStay): 1 if LengthOfStay is NULL because of de-identification
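The claims table has many rows per member, so a natural first step is to aggregate it into one feature row per member and year. This sketch assumes a CSV with a header row containing the `MemberID` and `Year` columns listed above and simply counts claims per (MemberID, Year); the sample rows and the function name are hypothetical, and a real feature set would also summarize fields such as LengthOfStay or PrimaryConditionGroup.

```python
import csv
import io
from collections import Counter


def claims_per_member_year(csv_file) -> Counter:
    """Count claim rows per (MemberID, Year) in a claims CSV with a header row."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row["MemberID"], row["Year"])] += 1
    return counts


# Hypothetical sample rows, not real competition data
sample = io.StringIO(
    "MemberID,ProviderID,Year,PlaceSvc\n"
    "101,p1,Y1,Office\n"
    "101,p2,Y1,Inpatient Hospital\n"
    "101,p1,Y2,Office\n"
    "202,p3,Y1,Office\n"
)
counts = claims_per_member_year(sample)
print(counts[("101", "Y1")], counts[("202", "Y2")])  # 2 0
```

With a real file, pass `open(path, newline="")` instead of the `StringIO` sample.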

Drug Count Data

MemberID, Year, DSFS (days since first service), DrugCount (unique prescription drugs)

Lab Count Data

MemberID, Year, DSFS (days since first service), LabCount (unique lab or pathology tests)

Outcome Data

MemberID

DaysInHospital_Y2 (from claims in Y1, i.e., predict Y2 based on Y1)

DaysInHospital_Y3 (from claims in Y2)

ClaimsTruncated: 1 for members whose claims were “truncated”
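The outcome tables imply the training setup: features from year t are paired with days in hospital in year t + 1 (Y1 to Y2, Y2 to Y3), and the trained model is then applied to Y3 features to predict the Y4 target. A minimal sketch of that pairing, assuming features and outcomes have already been reduced to dictionaries keyed by (MemberID, year); all names and values are hypothetical.

```python
def make_training_pairs(features, outcomes):
    """Pair each member's year-t features with that member's days in hospital in year t+1.

    features: {(member_id, t): feature_list}
    outcomes: {(member_id, t): days_in_hospital}
    Members with no outcome recorded for year t+1 are skipped.
    """
    pairs = []
    for (member_id, year), feats in features.items():
        target = outcomes.get((member_id, year + 1))
        if target is not None:  # 0 days is a valid outcome, so test for None explicitly
            pairs.append((feats, target))
    return pairs


# Hypothetical data; years are integers here (1 = Y1, 2 = Y2, ...)
features = {("101", 1): [2, 0], ("101", 2): [1, 3], ("202", 1): [0, 1]}
outcomes = {("101", 2): 0, ("101", 3): 4, ("202", 2): 1}
print(make_training_pairs(features, outcomes))  # [([2, 0], 0), ([1, 3], 4), ([0, 1], 1)]
```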

Using Other Data?

Yes, if freely available to anyone (public source); the URL must be posted to the forum

Except demographic, socioeconomic, or clinical information about the members

Naive Algorithms

For predicting the number of Days in Hospital in the next year

Posted as “benchmarks” on the Leaderboard

Always Predict 15 (max)

Assumes everyone spends 15 days (the maximum) in the hospital

RMSLE = 2.628062 (550+% over threshold)

Always Predict Zero

Assumes no one goes to the hospital

RMSLE = 0.522226 (31% over threshold)

Predict Random Values

Between 0 and 15 days

RMSLE = 0.752297 (88% over threshold)

Always Predict Average

Average ~= 0.209179

RMSLE = 0.486459 (22% over threshold)
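The "% over threshold" figures for these benchmarks are (RMSLE - 0.4) / 0.4. The sketch below recomputes them from the reported scores and also gives the constant prediction that minimizes RMSLE on a known set of outcomes: since RMSLE is squared error in log(days + 1) space, the best constant there is the mean of log(days + 1), so the best constant in days is e^mean - 1. Function names are hypothetical, and this is not necessarily how the "Average" benchmark was computed.

```python
import math

THRESHOLD = 0.4


def pct_over_threshold(score: float, threshold: float = THRESHOLD) -> float:
    """How far an RMSLE score is above the prize threshold, in percent."""
    return 100.0 * (score - threshold) / threshold


def best_constant_prediction(actual_days: list[float]) -> float:
    """Constant number of days that minimizes RMSLE on the given outcomes."""
    mean_log = sum(math.log1p(d) for d in actual_days) / len(actual_days)
    return math.expm1(mean_log)


# RMSLE values reported for the naive benchmarks
benchmarks = {
    "always 15": 2.628062,
    "always 0": 0.522226,
    "random 0-15": 0.752297,
    "always average": 0.486459,
}
for name, score in benchmarks.items():
    print(f"{name}: {pct_over_threshold(score):.1f}% over threshold")
```

For example, with outcomes of 0 and 3 days the best constant is 1 day (log(1) and log(4) average to log(2)), not the arithmetic mean of 1.5 days.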

Leader Board

Competition started on Apr 4, 2011 with partial data

All data were released on June 4, 2011

Sep 9, 2011: RMSLE 0.456384 (~14.1% over threshold)

Aug 29, 2012: RMSLE 0.450426 (~12.6% over threshold)

Teams

Sep 9, 2011: 914 teams, 6021 entries

Aug 29, 2012: 1292 teams

Considerations

Accurate prediction algorithms

Efficiency: time and space

Teams

Form your own teams: www.heritagehealthprize.com

Join my team: CSE 4403 Independent Study or CSE 5801 Independent Research

THANK YOU

www.heritagehealthprize.com