Page 1: PhUSE 2014

PhUSE 2014, October 2014

Simple and Efficient Matching Algorithms for Case-Control Matching

Berber Snoeijer, Edith Heintjes

Page 2: PhUSE 2014

Contents

• Observational studies
• Basic technique
• Different matching options
• Conclusions

Page 3: PhUSE 2014

Observational studies

• (Retrospective) cohort
• Case-control

[Figure: cases vs. controls]

Page 4: PhUSE 2014

Case-control studies

Limit possible confounding factors

Page 6: PhUSE 2014

Case-control studies

Page 7: PhUSE 2014

Expected result

Page 8: PhUSE 2014

Matching

Greedy

Optimal

Others

Exact

Closest

Caliper

Page 9: PhUSE 2014

Efficient programming

• Limit number of data steps

PROC sql;
  CREATE table Myagbs AS
  SELECT Distinct agb FROM data.fi_medicijnen_20145
quit;

data fif3 ;
  input POSTCODE INWONERS PROVINCIE PLAATSFIF3 NAAMFIF3 ;
run ;

proc SQL;
  create table xar3 as
  SELECT f.fif3, f.naamfif3, oapo_artcd, month(oapo_afldat) as month, year(oapo_afldat ) as year ,
  ORDER BY fif3, oapo_artcd, year, month ;
QUIT;

data Inkoop_fif3 (RENAME=(var1=agb var2=fif3 ));
  format Var1-var2 repmon verpak 12. zindex $8.;
  input var1-var2 zindex periode verpak;
run ;

proc sql ;
  create table data.fi_medicijnen_fif3 as
  select a.agb, a.zindex, a.fif3, a.verpak as aantalstuks, a.djm format=ddmmyy10.,
  from inkoop_fif3 a left join data.fi_knmp as b
    on a.zindex = left(b.knmp_artcd);
quit;

Proc SQL;
  CREATE TABLE XXX AS
  SELECT zindex, djm, fif3, knmp_prcd, knmp_atccd, knmp_inkhoev, SUM(aantalstuks) as aantalstuks
  FROM data.fi_medicijnen_fif3
  GROUP BY zindex, djm, fif3, knmp_prcd, knmp_atccd, knmp_inkhoe;
QUIT;

PROC SQL;
  CREATE TABLE Xar4 AS
  SELECT a.*, FROM xar3 as a FULL OUTER JOIN TotXarelto as b ON a.oapo_artcd=b.zindex ;
QUIT;

Page 10: PhUSE 2014

Efficient programming

• Limit sorting

Page 11: PhUSE 2014

Efficient programming

• Decrease size of datasets

Page 12: PhUSE 2014

Efficient programming

• Limit number of iterations

Page 13: PhUSE 2014

Basic technique

1. Construct all possible pairs
2. Add a random number to each combination
3. Sort by control and random number

PROC SQL;
  CREATE TABLE _Input AS
  SELECT a.*, b.*, ranuni(&Seed) AS randomnum
  FROM Cases AS a INNER JOIN Controls AS b
    ON … (all exact and caliper criteria)
  ORDER BY Pt_control, randomnum;
QUIT;
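As a hedged illustration of steps 1 to 3, the sketch below fills in the join condition with the criteria named on the example slides: an exact match on gender and an age caliper of 2.5 years. The toy tables, the column names (gender_case, age_case, gender_control, age_control) and the seed value are assumptions made for this example only.

```sas
/* Hypothetical toy data; column names are assumptions, not from the slides */
data Cases;
  input Pt_case gender_case $ age_case;
  datalines;
1 F 40
2 M 55
;
run;

data Controls;
  input Pt_control gender_control $ age_control;
  datalines;
10 F 38
11 F 42.5
12 M 54
13 M 60
;
run;

%let Seed = 12345;   /* assumed seed, fixed so the run is repeatable */

proc sql;
  create table _Input as
  select a.*, b.*, ranuni(&Seed) as randomnum
  from Cases as a inner join Controls as b
    on  a.gender_case = b.gender_control            /* exact criterion   */
    and abs(a.age_case - b.age_control) <= 2.5      /* caliper criterion */
  order by Pt_control, randomnum;
quit;
```

With this toy data the join should yield three candidate pairs (case 1 with controls 10 and 11, case 2 with control 12); control 13 falls outside the age caliper and gets no pair.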

Page 14: PhUSE 2014

Basic technique

4. Pick the first case for each control

data _Result1;
  set _Input;
  by Pt_control;
  if first.pt_control then output;
run;

5. Sort by case

proc sort data = _Result1;
  by Pt_case randomnum;
run;

Page 15: PhUSE 2014

Basic technique

6. Pick the controls up to the maximum number of controls you desire

data _result2;
  set _result1;
  retain Matchno;
  by Pt_case;
  if first.pt_case then Matchno = 1;
  else MatchNo = MatchNo + 1;
  if Matchno <= &MaxMatch then output _result2;
run;
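To make steps 4 to 6 concrete, here is a small self-contained sketch that runs them on a hand-made pair table. The pair list, the random numbers and the value of MaxMatch are invented for illustration; only the step logic follows the slides.

```sas
/* Hypothetical candidate pairs that already carry a random number */
data _Input;
  input Pt_case Pt_control randomnum;
  datalines;
1 10 0.40
2 10 0.20
1 11 0.70
2 12 0.10
2 13 0.90
2 14 0.30
;
run;

%let MaxMatch = 2;   /* assumed maximum number of controls per case */

/* Step 3: sort by control and random number */
proc sort data=_Input;
  by Pt_control randomnum;
run;

/* Step 4: each control goes to the case with the lowest random number */
data _Result1;
  set _Input;
  by Pt_control;
  if first.Pt_control then output;
run;

/* Step 5: sort by case */
proc sort data=_Result1;
  by Pt_case randomnum;
run;

/* Step 6: keep at most &MaxMatch controls per case */
data _Result2;
  set _Result1;
  by Pt_case;
  retain Matchno;
  if first.Pt_case then Matchno = 1;
  else Matchno = Matchno + 1;
  if Matchno <= &MaxMatch then output;
run;
```

In this toy run, _Result2 should hold the pairs 1-11, 2-12 and 2-10. Case 1 ends with a single control because control 10 was claimed by case 2, while controls 13 and 14 stay unused. Leftovers like these are what further rounds or iterations can pick up.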

Page 16: PhUSE 2014

Basic technique

Page 17: PhUSE 2014

By round

Round 1
Round 2
Round 3
Round 3, iteration 2
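The slides do not show the looping code, so the following is only one possible way to chain iterations, not the authors' implementation. The macro name, its parameters, the work table names and the stopping rule are all assumptions; the "saturation" setting mentioned on the example slides is not modelled here.

```sas
/* Hypothetical candidate pairs (all exact and caliper criteria already applied) */
data _Pairs;
  input Pt_case Pt_control;
  datalines;
1 10
2 10
1 11
2 12
2 13
2 14
;
run;

/* Sketch only: macro name, parameters and table names are assumptions */
%macro match_rounds(pairs=_Pairs, maxmatch=2, seed=12345, maxiter=50);
  %local iter nleft;
  %let iter  = 0;
  %let nleft = 1;

  data _Matched;                         /* empty result table */
    set &pairs(keep=Pt_case Pt_control obs=0);
  run;
  data _Left;                            /* pairs still available */
    set &pairs(keep=Pt_case Pt_control);
  run;

  %do %while (&nleft > 0 and &iter < &maxiter);
    %let iter = %eval(&iter + 1);

    proc sql;                            /* random number, sort by control */
      create table _In as
      select *, ranuni(%eval(&seed + &iter)) as randomnum
      from _Left
      order by Pt_control, randomnum;
    quit;

    data _Pick1;                         /* first case for each control */
      set _In;
      by Pt_control;
      if first.Pt_control;
    run;

    proc sql;                            /* attach matches found so far */
      create table _Pick2 as
      select p.Pt_case, p.Pt_control, p.randomnum, coalesce(m.n, 0) as already
      from _Pick1 as p
      left join (select Pt_case, count(*) as n
                 from _Matched group by Pt_case) as m
        on p.Pt_case = m.Pt_case
      order by p.Pt_case, p.randomnum;
    quit;

    data _New(keep=Pt_case Pt_control);  /* fill each case up to maxmatch */
      set _Pick2;
      by Pt_case;
      retain Matchno;
      if first.Pt_case then Matchno = already + 1;
      else Matchno = Matchno + 1;
      if Matchno <= &maxmatch then output;
    run;

    proc append base=_Matched data=_New;
    run;

    proc sql noprint;                    /* drop used controls and full cases */
      create table _Left2 as
      select l.Pt_case, l.Pt_control
      from _Left as l
      where l.Pt_control not in (select Pt_control from _Matched)
        and l.Pt_case not in (select Pt_case from _Matched
                              group by Pt_case
                              having count(*) >= &maxmatch);
      select count(*) into :nleft trimmed from _Left2;
    quit;

    data _Left;
      set _Left2;
    run;
  %end;
%mend match_rounds;

%match_rounds(pairs=_Pairs, maxmatch=2);
```

Each pass matches without replacement: a control that has been assigned is removed from the remaining pairs, and a case leaves the pool once it has reached maxmatch controls. The loop stops when no candidate pairs are left, with maxiter as a safety cap.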

Page 18: PhUSE 2014

Closest match

Calculate the absolute difference between each case and each of its candidate controls. Sort by control, then by absolute difference, then by random number.

PROC SQL;
  CREATE TABLE _Input AS
  SELECT a.*, b.*, ranuni(&Seed) AS randomnum,
         Abs(CaseVal-RefVal) AS AbsDif
  FROM Cases AS a INNER JOIN Controls AS b
    ON … (all exact and caliper criteria)
  ORDER BY Pt_control, AbsDif, randomnum;
QUIT;

Page 19: PhUSE 2014

Closest match (note: flip the picture)

1: 1.5

2: 1.7

3: 1.9

10: 1.6

11: 1.7

12: 1.8

13: 1.85

14: 1.9

15: 2.0
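The values above can be run through the closest-match query as a small worked example. Treating IDs 1 to 3 as cases and IDs 10 to 15 as controls is an assumption (the picture itself is not in the transcript), as are the 0.25 caliper, the seed and the rounding of AbsDif.

```sas
/* Values from the slide; the case/control split is an assumption */
data Cases;
  input Pt_case CaseVal;
  datalines;
1 1.5
2 1.7
3 1.9
;
run;

data Controls;
  input Pt_control RefVal;
  datalines;
10 1.6
11 1.7
12 1.8
13 1.85
14 1.9
15 2.0
;
run;

%let Seed = 12345;   /* assumed seed */

proc sql;
  create table _Input as
  select a.*, b.*, ranuni(&Seed) as randomnum,
         /* rounding keeps floating-point noise from deciding ties */
         round(abs(CaseVal - RefVal), 0.001) as AbsDif
  from Cases as a inner join Controls as b
    on abs(a.CaseVal - b.RefVal) <= 0.25     /* assumed caliper */
  order by Pt_control, AbsDif, randomnum;
quit;

/* Each control goes to its closest case; ties fall to the random number */
data _Result1;
  set _Input;
  by Pt_control;
  if first.Pt_control then output;
run;
```

Under these assumptions control 11 goes to case 2, and controls 13, 14 and 15 go to case 3. Controls 10 and 12 are equally close to two cases (0.1 either way), so the random number should decide which case gets them.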

Page 20: PhUSE 2014

Tests

Match 1 control  Distance rank  Least number of                  Total number of  Total number of  Number of
by round         priority       matches priority  Run time       matched cases    matched pairs    iterations
---------------  -------------  ----------------  -------------  ---------------  ---------------  ----------
No               No             No                1 min, 4 sec   1670             8025             25
No               Yes            No                1 min, 0 sec   1924             8781             6
No               No             Yes               1 min, 19 sec  1715             9831             7
No               Yes            Yes               1 min, 57 sec  1685             9828             9
Yes              No             No                4 min, 41 sec  2223             8441             74
Yes              Yes            No                4 min, 37 sec  2290             8859             32
Yes              No             Yes               5 min, 29 sec  2338             9190             39
Yes              Yes            Yes               9 min, 37 sec  2308             9171             45

2500 cases, 25000 possible matches, maximum of 8 controls per case

Page 21: PhUSE 2014

Least number of matches method

Proc SQL;
  Create table _input2 as
  select *, ranuni(&Seed) AS randomnum, Count(*) as Nmatches
  from _InputMe
  group by pt_case
  order by pt_control, Nmatches, randomnum;
Quit;

data _Result1;
  set _Input2;
  by Pt_control;
  if first.pt_control then output;
run;

Page 22: PhUSE 2014

Least number of matches method (2)

Proc SQL;
  Create table _input2 as
  select *, ranuni(&Seed) AS randomnum,
         case when (Count(*) <= 10)    then Count(*)
              when (Count(*) <= 100)   then round(Count(*), 10.)
              when (Count(*) <= 1000)  then round(Count(*), 100.)
              when (Count(*) <= 10000) then round(Count(*), 1000.)
              else 10000
         end as Nmatches
  from _InputMe
  group by pt_case
  order by pt_control, Nmatches, AbsDif, randomnum;
Quit;

Nmatches values: 1, 2, 3, …, 10, 20, 30, …, 100, 200, 300, …, 1000
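A quick way to see what the CASE expression does is to apply the same rounding rule to a few counts in a DATA step. The input values below are made up for illustration; the thresholds and rounding units are copied from the query.

```sas
/* Same bucketing rule as the CASE expression, applied to made-up counts */
data _Buckets;
  input n @@;
  if      n <= 10    then Nmatches = n;
  else if n <= 100   then Nmatches = round(n, 10);
  else if n <= 1000  then Nmatches = round(n, 100);
  else if n <= 10000 then Nmatches = round(n, 1000);
  else                    Nmatches = 10000;
  put n= Nmatches=;   /* writes each count and its bucket to the log */
  datalines;
3 14 15 104 150 1499 20000
;
run;
```

The log should show the buckets 3, 10, 20, 100, 200, 1000 and 10000. Cases with a similar number of candidate controls therefore share one Nmatches value, which lets AbsDif and the random number decide between them in the ORDER BY.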

Page 23: PhUSE 2014

Example

• 2415 cases
• 22140 possible matches
• Match on
  – gender
  – age range (+/- 2.5 year)
• Max 10 matches per case
• No replacement
• All at once
• 7 rounds
• 47 seconds

Page 24: PhUSE 2014

Example

• 2415 cases
• 22140 possible matches
• Match on
  – gender
  – age range (+/- 2.5 year)
• Max 10 matches per case
• No replacement
• Round by round, 10% saturation
• 16 rounds
• 1 min 50 seconds

Page 25: PhUSE 2014

Example

• 2415 cases
• 22140 possible matches
• Match on
  – gender
  – age range (+/- 2.5 year)
• Max 10 matches per case
• No replacement
• Round by round, 60% saturation
• 19 rounds
• 1 min 58 seconds

Page 26: PhUSE 2014

Example

• 2415 cases
• 22140 possible matches
• Match on
  – gender
  – age range (+/- 2.5 year)
• Max 10 matches per case
• No replacement
• Round by round, full saturation
• 41 rounds
• 2 min 21 seconds

Page 27: PhUSE 2014

Conclusions

• Efficient and fast
• Useful with big data
• Optimal
• Can handle any combination of exact and caliper variables
• Can handle any number of matches to controls
• Final distribution can be examined and best options can be chosen

Page 28: PhUSE 2014

• Questions?