32
An R package for analyzing truncated data Carla Moreira 1* , Jacobo de U˜ na- ´ Alvarez 1 and Rosa M. Crujeiras 2 1 Department of Statistics and OR, University of Vigo 2 Department of Statistics and OR, University of Santiago de Compostela * [email protected]

An R package for analyzinig truncated dataDeU... · An R package for analyzing truncated data ... This package incorporates the functions efron.petrosian, lynden and shen, which call

  • Upload
    dinhnhi

  • View
    255

  • Download
    0

Embed Size (px)

Citation preview

An R package for analyzingtruncated data

Carla Moreira1∗, Jacobo de Una-Alvarez1 and Rosa M. Crujeiras2

1 Department of Statistics and OR, University of Vigo2 Department of Statistics and OR, University of Santiago de Compostela

[email protected]

Introduction Algorithms for DTD Package description Conclusions

Outline

1 Introduction

2 Algorithms for DTD

3 Package description

4 Conclusions

Moreira et al. useR! 2009 DTDA package 2/30

Introduction Algorithms for DTD Package description Conclusions

Motivation examples

Astronomy

Epidemiology

Economy

Survival Analysis

In these cases, we must apply specialized statistical modelsand methods due the need to accommodate the event oflosses in the sample, such as grouping, censoring ortruncation.

Moreira et al. useR! 2009 DTDA package 3/30

Introduction Algorithms for DTD Package description Conclusions

Truncation Scheme

Moreira et al. useR! 2009 DTDA package 4/30

Introduction Algorithms for DTD Package description Conclusions

Truncation Scheme

t1

Moreira et al. useR! 2009 DTDA package 4/30

Introduction Algorithms for DTD Package description Conclusions

Truncation Scheme

t1 t2Observational Window

Moreira et al. useR! 2009 DTDA package 4/30

Introduction Algorithms for DTD Package description Conclusions

Truncation Scheme

Let X∗ be the ultimate time of interest with df F

(U∗, V ∗) the pair of truncation times, with joint df K

We observe (U∗,X∗, V ∗) if and only if U∗ ≤ X∗ ≤ V ∗

Let (Ui,Xi, Vi), i = 1, ..., n be the observed data.

Under the assumption of independence between X∗ and (U∗, V ∗):

The full likelihood is given by:

Ln(f, k) =n

j=1

fjkj∑n

i=1 Fiki

Moreira et al. useR! 2009 DTDA package 5/30

Introduction Algorithms for DTD Package description Conclusions

Truncation SchemeWhere:

f = (f1, f2, ..., fn)

k = (k1, k2, ..., kn)

Fi =∑n

m=1 fmJim

andJim = I[Ui≤Xm≤Vi] = 1 if Ui ≤ Xm ≤ Vi,

or zero otherwise.As noted by Shen (2008):

Ln(f, k) =

n∏

j=1

fj

Fj

×

n∏

j=1

Fjkj∑n

i=1 Fiki

= L1(f) × L2(f, k)

Moreira et al. useR! 2009 DTDA package 6/30

Introduction Algorithms for DTD Package description Conclusions

Efron-Petrosian estimators

The condicional NPMLE of F (Efron-Petrosian, 1999) is definedas the maximizer of L1(f).

1

fj

=

n∑

i=1

Jij ×1

Fi

, j = 1, ..., n

where Fi =

n∑

m=1

fmJim.

This equation was used by Efron and Petrosian (1999) tointroduce the EM algorithm to compute f .

Moreira et al. useR! 2009 DTDA package 7/30

Introduction Algorithms for DTD Package description Conclusions

EM algorithm from Efron and Petrosian (1999)

EP1. Compute the initial estimate F(0) corresponding to

f(0) = (1/n, ..., 1/n);

EP2. Apply (1) to get an improved estimator f(1) to compute the

F(1) pertaining to f(1);

EP3. Repeat Step EP2 until convergence criterion is reached.

Moreira et al. useR! 2009 DTDA package 8/30

Introduction Algorithms for DTD Package description Conclusions

Shen EstimatorInterchanging the roles of X’s and (Ui, Vi):

Ln(f, k) =n

j=1

kj

Kj

×

n∏

j=1

Kjfj∑n

i=1 Kifi

= L1(k) × L2(k, f)

where

Ki =

n∑

m=1

kmI[Um≤Xi≤Vm] =

n∑

m=1

kmJim

and maximizing L1(k):

1

kj

=

n∑

i=1

Jji

1

Ki

, j = 1, ..., n

with Ki =n

m=1

kmJim.

Moreira et al. useR! 2009 DTDA package 9/30

Introduction Algorithms for DTD Package description Conclusions

Shen Estimator

Shen (2008) showed that the solutions are the unconditionalNPMLE of F and K, respectively, and both estimators can beobtained by:

fj =

[

n∑

i=1

1

Kj

]−11

Kj

, j = 1, ..., n

kj =

[

n∑

i=1

1

Fj

]−11

Fj

, j = 1, ..., n

Moreira et al. useR! 2009 DTDA package 10/30

Introduction Algorithms for DTD Package description Conclusions

EM algorithm from Shen (2008)

S1. Compute the initial estimate F(0) corresponding to

f(0) = (1/n, ..., 1/n);

S2. Apply (4) to get the first step estimator k(1) and compute the

K(1) pertaining to k(1);

S3. Apply (3) to get the first step estimator f(1) and its

corresponding F(1);

S4. Repeat Steps S2 and S3 until convergence criterion is reached.

Moreira et al. useR! 2009 DTDA package 11/30

Introduction Algorithms for DTD Package description Conclusions

DTDA-package

efron.petrosian(X,...)

lynden(X,...)

shen(X,...)

Moreira et al. useR! 2009 DTDA package 12/30

Introduction Algorithms for DTD Package description Conclusions

DTDA-package

efron.petrosian(X,...)

lynden(X,...)

shen(X,...)

3 examples data sets with X ∼ Unif(0,1)and:

Ex.1 U ∼ Unif(0,0.5), V ∼ Unif(0.5,1)

Ex.2 U ∼ Unif(0,0.25), V ∼ Unif(0.75,1)

Ex.3 U ∼ Unif(0,0.67), V ∼ Unif(0.33,1)

Moreira et al. useR! 2009 DTDA package 13/30

Introduction Algorithms for DTD Package description Conclusions

efron.petrosian illustration under double truncation

EX.1-50% of truncation

efron.petrosian(X,U,V,. . .)

>iter

>f

>FF

>S

>Sob

>upperF

>lowerF

>upperS

>lowerS0.2 0.6 1.0

0.0

0.2

0.4

0.6

0.8

1.0

EP estimator

Time of interest

0.2 0.6 1.0

0.2

0.4

0.6

0.8

1.0

Survival

Time of interest

Moreira et al. useR! 2009 DTDA package 14/30

Introduction Algorithms for DTD Package description Conclusions

efron.petrosian illustration under double truncation

0.2 0.6 1.0

0.0

0.2

0.4

0.6

0.8

1.0

EP estimator

Time of interest

0.2 0.6 1.0

0.2

0.4

0.6

0.8

1.0

Survival

Time of interest

Moreira et al. useR! 2009 DTDA package 15/30

Introduction Algorithms for DTD Package description Conclusions

efron.petrosian illustration under left truncation

EX.1

efron.petrosian(X,U,...)

0.2 0.4 0.6 0.8 1.0

0.2

0.4

0.6

0.8

1.0

EP estimator

Time of interest

0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Survival

Time of interest

Moreira et al. useR! 2009 DTDA package 16/30

Introduction Algorithms for DTD Package description Conclusions

efron.petrosian illustration under left truncation

0.2 0.4 0.6 0.8 1.0

0.2

0.4

0.6

0.8

1.0

EP estimator

Time of interest

0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Survival

Time of interest

Moreira et al. useR! 2009 DTDA package 17/30

Introduction Algorithms for DTD Package description Conclusions

lynden illustration under double truncation

EX.2-25% of truncation

lynden(X,U,V,...)

>iter

>NJ

>f

>FF

>h

>S

>Sob

>upperF

>lowerF

>upperS

>lowerS

0.2 0.6

0.0

0.2

0.4

0.6

0.8

1.0

EP estimator

Time of interest

0.2 0.6

0.0

0.2

0.4

0.6

0.8

1.0

Survival

Time of interest

Moreira et al. useR! 2009 DTDA package 18/30

Introduction Algorithms for DTD Package description Conclusions

lynden illustration under double truncation

0.2 0.6

0.0

0.2

0.4

0.6

0.8

1.0

EP estimator

Time of interest

0.2 0.6

0.0

0.2

0.4

0.6

0.8

1.0

Survival

Time of interest

Moreira et al. useR! 2009 DTDA package 19/30

Introduction Algorithms for DTD Package description Conclusions

lynden illustration under right truncation

EX.2

lynden(X,V,...)

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

1.0

Survival

Time of interest

Moreira et al. useR! 2009 DTDA package 20/30

Introduction Algorithms for DTD Package description Conclusions

lynden illustration under right truncation

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

1.0

Survival

Time of interest

Moreira et al. useR! 2009 DTDA package 21/30

Introduction Algorithms for DTD Package description Conclusions

shen illustration under double truncation

EX.3-67% of truncation

shen(X,U,V...)

>iter

>f

>FF

>S

>Sob

>k

>fU

>fV

>upperF

>lowerF

>upperS

>lowerS

0.2 0.4 0.6 0.8

0.0

0.4

0.8

Shen estimator

Time of interest

0.2 0.4 0.6 0.8

0.2

0.6

1.0

Survival

Time of interest

0.2 0.4 0.6 0.8

0.0

0.4

0.8

Marginal U

Time of interest

0.2 0.4 0.6 0.8

0.0

0.4

0.8

Marginal V

Time of interest

Moreira et al. useR! 2009 DTDA package 22/30

Introduction Algorithms for DTD Package description Conclusions

shen illustration under double truncation

0.2 0.4 0.6 0.8

0.0

0.4

0.8

Shen estimator

Time of interest

0.2 0.4 0.6 0.8

0.2

0.6

1.0

Survival

Time of interest

0.2 0.4 0.6 0.8

0.0

0.4

0.8

Marginal U

Time of interest

0.2 0.4 0.6 0.8

0.0

0.4

0.8

Marginal V

Time of interest

Moreira et al. useR! 2009 DTDA package 23/30

Introduction Algorithms for DTD Package description Conclusions

Summary

The DTDA package provides different algorithms foranalyzing randomly truncated data, one-sided and two-sided(i.e. doubly) truncated data being allowed.

Moreira et al. useR! 2009 DTDA package 24/30

Introduction Algorithms for DTD Package description Conclusions

Summary

The DTDA package provides different algorithms foranalyzing randomly truncated data, one-sided and two-sided(i.e. doubly) truncated data being allowed.

This package incorporates the functions efron.petrosian,lynden and shen, which call the iterative methods introducedby Efron and Petrosian (1999)and Shen (2008).

Moreira et al. useR! 2009 DTDA package 25/30

Introduction Algorithms for DTD Package description Conclusions

Summary

The DTDA package provides different algorithms foranalyzing randomly truncated data, one-sided and two-sided(i.e. doubly) truncated data being allowed.

This package incorporates the functions efron.petrosian,lynden and shen, which call the iterative methods introducedby Efron and Petrosian (1999)and Shen (2008).

Estimation of the lifetime and truncation times distributions ispossible, together with the corresponding pointwise confidencelimits based on the bootstrap.

Moreira et al. useR! 2009 DTDA package 26/30

Introduction Algorithms for DTD Package description Conclusions

Summary

The DTDA package provides different algorithms foranalyzing randomly truncated data, one-sided and two-sided(i.e. doubly) truncated data being allowed.

This package incorporates the functions efron.petrosian,lynden and shen, which call the iterative methods introducedby Efron and Petrosian (1999)and Shen (2008).

Estimation of the lifetime and truncation times distributions ispossible, together with the corresponding pointwise confidencelimits based on the bootstrap.

Plots of cumulative distributions and survival functions areprovided.

Moreira et al. useR! 2009 DTDA package 27/30

Introduction Algorithms for DTD Package description Conclusions

Summary

The DTDA package provides different algorithms foranalyzing randomly truncated data, one-sided and two-sided(i.e. doubly) truncated data.

This package incorporates the functions efron.petrosian,lynden and shen, which call the iterative methods introducedby Efron and Petrosian (1999)and Shen (2008).

Estimation of the lifetime and truncation times distributions ispossible, together with the corresponding pointwise confidencelimits based on the bootstrap.

Plots of marginal cumulative distributions and survivalfunctions are provided.

There are no R packages with double truncation scheme.

Moreira et al. useR! 2009 DTDA package 28/30

Introduction Algorithms for DTD Package description Conclusions

Acknowledgments

Work supported by the research Grant MTM2008-03129 andMTM2008-0310 of the Spanish Ministerio de Ciencia eInnovacion

Grant PGIDIT07PXIB300191PR of the Xunta de Galicia

Moreira et al. useR! 2009 DTDA package 29/30

Introduction Algorithms for DTD Package description Conclusions

References

Efron, B. and Petrosian, V. (1999)Nonparametric methods for doubly truncated data.Journal of the American Statistical Association, 94, 824-834.

Lynden-Bell, D. (1971)A method of allowing for known observational selection insmall samples applied to 3CR quasars.Mon. Not. R. Astr. Soc, 155, 95-118.

Moreira, C. and de Una Alvarez, J.(Under revision)Bootstrapping the NPMLE for doubly truncated data.Journal of Nonparametric Statistics.

Shen P-S. (2008)Nonparametric analysis of doubly truncated data.Annals of the Institute of Statistical Mathematics, DOI10.1007/s10463-008-0192-2.

Moreira et al. useR! 2009 DTDA package 30/30