Propensity Scores How to do it – Part 1


X11 X12 X13
X21 X22 X23
X31 X32 X33

No matrices were harmed in this presentation

WHY YOU NEED IT

TWO NON-EQUIVALENT GROUPS

Patients in specialized units

People who attend a fundraising event

Research Question

Are nursing homes dangerous for seniors? Does admittance to a nursing home increase risk of death in adults over 65 years of age when controlling for age, gender, race, and number of emergency room visits?

Propensity Score Matching

or

Do nursing homes kill you?

ANY TIME YOU CAN ASK THE QUESTION ….

Is there a difference on OUTCOME between levels of “treatment” A, controlling for X, Y and Z?

Examples

OUTCOME    “TREATMENT” LEVELS                  COVARIATES
DROP OUT   PUBLIC, PRIVATE                     INCOME, PARENT EDUCATION, GR. 8 ACHIEVEMENT
BMI        DAILY SOFT DRINKS, NO SOFT DRINKS   GENDER, AGE, RACE, EXERCISE FREQ.
DEATH      LIVES AT HOME, NURSING HOME         AGE, GENDER, TOTAL ER VISITS

1. Make sure there are pre-existing differences

(Thank you, Captain Obvious)

2a. Decide on covariates

Are the differences pre-existing, or could they possibly be due to the different “treatment” levels?

Race and gender are good choices for covariates. If more students at private vs public schools are black or female, the schooling probably didn’t cause that.

Differences in grade 10 math scores, by contrast, may be a result of the type of school, so they are a poor choice.

2b. Decide on covariates

Don’t use your outcome variable as one of your covariates

3. Run logistic regression to generate propensity scores

LOGISTIC REGRESSION VARIABLES dep
  /METHOD=ENTER indep1 indep2 indep3
  /SAVE=PRED
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).

RENAME VARIABLES (PRE_1=propen).

SAVE OUTFILE="test.sav".
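For readers who want to see what the saved predicted probability is, here is a minimal Python sketch (not SPSS) of the same idea: fit a logistic regression with group membership as the dependent variable and keep each case's predicted probability as its propensity score. The data, the variable meanings, and the function names fit_logistic and predict are all made up for illustration, and the fit uses plain gradient ascent rather than the iterative procedure SPSS uses, so coefficients will only approximately agree.

```python
import math

def predict(w, xi):
    """Predicted probability of being in the 'treated' group."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit by batch gradient ascent on the log-likelihood.
    Returns [intercept, b1, b2, ...]. Illustration only."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical data: [age rescaled to 0-1, female indicator]
X = [[0.2, 0], [0.4, 1], [0.5, 0], [0.7, 1],
     [0.3, 1], [0.6, 0], [0.8, 1], [0.9, 0]]
# 1 = "treated" group (e.g. nursing home), 0 = comparison group
y = [0, 0, 1, 0, 0, 1, 1, 1]

w = fit_logistic(X, y)
# The propensity score is the predicted probability of group membership
scores = [predict(w, xi) for xi in X]
```

Each entry of scores plays the role of the propen variable created by the RENAME step above.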

4. Select matching method

1. Quintiles

2. Nearest neighbors

3. Calipers

ALL OF THE ABOVE CAN BE DONE EITHER WITH OR WITHOUT REPLACEMENT
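The three options can be sketched in a few lines of Python. This is an illustration under assumptions, not the matching program used in this presentation: the function names, the ID labels and the scores are hypothetical, and the nearest-neighbor search is a simple greedy pass in the order the treated cases are listed.

```python
def quintile_groups(scores):
    """Assign each case to a stratum 1..5 by the rank of its propensity score."""
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    return {case: (rank * 5) // n + 1 for rank, case in enumerate(ordered)}

def match_nearest(treated, controls, caliper=None, replacement=False):
    """Greedy nearest-neighbor matching on the propensity score.
    treated, controls: dict of id -> score.
    caliper: maximum allowed score difference (None = no limit).
    replacement: if False, each control can be used only once."""
    available = dict(controls)
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # no control close enough; this treated case goes unmatched
        pairs.append((t_id, c_id))
        if not replacement:
            del available[c_id]
    return pairs

# Hypothetical propensity scores
treated = {"t1": 0.30, "t2": 0.32, "t3": 0.90}
controls = {"c1": 0.31, "c2": 0.50, "c3": 0.35}

nearest = match_nearest(treated, controls)
with_caliper = match_nearest(treated, controls, caliper=0.05)
with_replacement = match_nearest(treated, controls, caliper=0.05, replacement=True)
```

Without a caliper, t3 is paired with c2 even though their scores are far apart; with a caliper of 0.05 it is left unmatched. With replacement, c1 serves as the match for both t1 and t2.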

5. Run matching program & test its effectiveness

6. Run your analysis using the matched data set

An actual example

Do specialized hospital units save lives?

Our problem

We have cities with and without specialized care units (trauma center, burn unit)

We want to see if the cities with specialized units have higher survival rates, controlling for other variables

Creating Propensity Scores

What variables are related to group?

Example:

Age group and gender were significantly related to city.

Preparing the data

Maximum likelihood solutions are large-sample methods. You may wish to combine or delete categories with small counts.

Consider dropping or combining categories… (this was done)

MECHANISM        Frequency   Percent   Cumulative Percent
Fall                  1370      19.2                 19.2
GSW                   1357      19.0                 38.2
MVC                   2161      30.3                 68.5
Other                  588       8.2                 76.7
Accidents             1077      15.1                 91.8
Shark attacks           44        .6                 92.4
HWB                    542       7.6                100.0
Total                 7139     100.0

Start SPSS

Open example.sav

File > Open > Data

Note: This is real data with some changes made for confidentiality

An appearance by Captain Obvious

Because propensity score matching essentially checks whether the difference between groups disappears once pre-existing differences are controlled, before you go to all of this trouble, test to see that the groups are, in fact, significantly different.

Syntax

vs

Pointy-clicky stuff

EDIT > OPTIONS > Viewer

Example: City study

ANALYZE >

Descriptive Statistics > Crosstabs > Statistics > Chi-square

Use crosstabs to test for difference on categorical variables

Move the desired variables to Rows and Columns

Click on Statistics

Note: You can put multiple variables under rows

Click on chi-square

If desired, select phi coefficient also.

SYNTAX

CROSSTABS
  /TABLES=OUTCOME Age_groups CategGCS BY City_of_injury
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT
  /COUNT ROUND CELL.
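To make clear what the chi-square option computes, here is a small Python sketch of the Pearson chi-square statistic for a crosstab. The counts are invented for illustration (they are not from the city study), chi_square is a hypothetical helper name, and no continuity correction is applied.

```python
import math

def chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts.
    Returns (statistic, degrees of freedom)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2 x 2 crosstab: rows = outcome, columns = city
table = [[30, 10],
         [20, 40]]

stat, df = chi_square(table)
n = sum(sum(row) for row in table)
phi = math.sqrt(stat / n)  # phi coefficient, meaningful for 2 x 2 tables
```

With 1 degree of freedom, a statistic above 3.84 is significant at the .05 level, so these made-up groups would count as different on the row variable.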

Basic statistics to test covariates

Testing for differences on numeric variables

ANALYZE > COMPARE MEANS > INDEPENDENT SAMPLES T-TEST

Independent samples t-test

Age as test variable

City_of_injury as group
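The statistic behind this menu choice can be sketched in Python as follows. This is the equal-variances (pooled) form of the independent samples t-test; the ages are made up and pooled_t is a hypothetical helper name. SPSS also reports a version that does not assume equal variances, which this sketch does not cover.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Independent samples t statistic assuming equal variances.
    Returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical ages in two cities
city_a = [30, 40, 50]
city_b = [20, 30, 40]

t, df = pooled_t(city_a, city_b)
```

Compare t against the critical value for df degrees of freedom, or read the p-value directly from the SPSS output, to decide whether the groups differ on the numeric covariate.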

What differs between cities?

Age in years and age group were not significantly different between cities

Gender, Trauma Type, Mechanism of Injury, Admission to ICU, GCS, ISS & RTS are all significantly different between cities

What differs between outcomes?

ICU_LOS, Trauma Type, Mechanism of Injury, Admission to ICU, GCS, ISS & RTS are all significantly different between outcomes

What variables should be controlled?

Example of City A vs B

- Logistic regression with city as dependent and age group, trauma type & admission to ICU as independents.

- Logistic regression with city as dependent and Age Group, Gender, Trauma Type, Mechanism of Injury, Admission to ICU, GCS, ISS & RTS as independents.

Since running the logistic regression and creating propensity scores takes relatively little time, it is not much trouble to test more than one model.

Logistic regression

From SPSS menu select:

ANALYZE > REGRESSION >

BINARY LOGISTIC

Covariates: MECHANISM TRAUMATYPE RTS ISS CategGCS ICULOS

Define categorical variables

Select Predicted Probabilities (not yet)

SYNTAX

LOGISTIC REGRESSION VARIABLES City_of_injury
  /METHOD=ENTER MECHANISM TRAUMATYPE RTS ISS CategGCS ICULOS
  /CONTRAST (MECHANISM)=Indicator
  /CONTRAST (CategGCS)=Indicator
  /CONTRAST (TRAUMATYPE)=Indicator
  /SAVE=PRED          <-- Don’t include this yet
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).