16 Partha Lahiri


  • 8/6/2019 16 Partha Lahiri

    1/31

    SMALL DOMAIN PROPORTION ESTIMATION: AN ADAPTIVE BAYESIAN APPROACH

    Partha Lahiri

    JPSM, University of Maryland, College Park, USA

    [Based on joint work with Ms. Benmei Liu]


    2/31

    Examples

    - Estimation of batting averages of major league baseball players (Efron and Morris, 1975)
    - Small Area Income and Poverty Estimation (SAIPE)
    - Survey of drug use in Nebraska


    3/31

    Borrowing Strength:

    Relevant sources of information:
    - Census data
    - Administrative information
    - Related surveys

    Method of combining information:
    - Choice of good small area models
    - Use of a good statistical methodology


    4/31

    A Basic Area Level Model

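    $\theta_i = g(P_i)$, where $P_i$ is the true proportion for area $i$

    $p_i$: direct design-based estimate for area $i$

    $x_i$: a vector of known auxiliary variables

    Model: For $i = 1, \ldots, m$,

    Level 1: $\hat{\theta}_i = g(p_i) \mid \theta_i \overset{ind}{\sim} N(\theta_i, D_i)$

    Level 2: $\theta_i = g(P_i) \overset{ind}{\sim} N(x_i^T \beta, \sigma^2)$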


    5/31

    The sampling variances $D_i$ are assumed to be known.

    - Carter and Rolph (1974): $g(p_i) = \arcsin(\sqrt{p_i})$, $D_i = 1/(4n_i)$
    - Efron and Morris (1975): $g(p_i) = \sqrt{n_i}\,\arcsin(2p_i - 1)$, $D_i = 1$
    - Fay and Herriot (1979): $g(p_i) = \log(p_i)$, $D_i$ estimated by GVF
    - SAIPE: $g(p_i) = p_i$ for state-level estimation of the proportion of poor school-age children, and $g(Y_i) = \log(Y_i)$ for county-level poverty counts of school-age children
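The arcsine transformations above are variance stabilizing by a first-order Taylor (delta-method) argument. As a minimal sketch, assuming the simple binomial approximation $\text{Var}(p_i) \approx P_i(1-P_i)/n_i$ with no design effect, the following checks that the approximate variances of the Carter-Rolph and Efron-Morris transforms do not depend on $p_i$. All function names are hypothetical.

```python
import math


def carter_rolph(p, n):
    """Arcsine square-root transform and its approximate sampling variance D_i."""
    return math.asin(math.sqrt(p)), 1.0 / (4.0 * n)


def efron_morris(p, n):
    """Scaled arcsine transform; its approximate sampling variance D_i is 1."""
    return math.sqrt(n) * math.asin(2.0 * p - 1.0), 1.0


def d_carter_rolph(p):
    """Derivative of asin(sqrt(p)) with respect to p, for 0 < p < 1."""
    return 1.0 / (2.0 * math.sqrt(p * (1.0 - p)))


def d_efron_morris(p, n):
    """Derivative of sqrt(n) * asin(2p - 1) with respect to p, for 0 < p < 1."""
    return math.sqrt(n) / math.sqrt(p * (1.0 - p))


def delta_var(deriv, p, n):
    """First-order Taylor (delta-method) variance g'(p)^2 * p(1 - p) / n."""
    return deriv(p) ** 2 * p * (1.0 - p) / n


if __name__ == "__main__":
    n = 45  # illustrative area sample size
    for p in (0.1, 0.25, 0.4):
        v_cr = delta_var(d_carter_rolph, p, n)
        v_em = delta_var(lambda q: d_efron_morris(q, n), p, n)
        # Neither variance depends on p: v_cr = 1/(4n), v_em = 1.
        print(p, round(v_cr, 6), round(v_em, 6))
```

The $p$-dependence of $P_i(1-P_i)/n_i$ cancels against the squared derivative, which is why $D_i$ can be treated as known once $n_i$ is known.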


    6/31

    Comments:

    The model is simple and does not require knowledge of detailed design information (e.g., PSU identifiers), which may not be available in a public-use file.

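    The resulting empirical best predictor (EBP) of $\theta_i$ is design-consistent, since the weight it places on the direct estimate tends to 1 as the sampling variance $D_i$ shrinks with growing area sample size.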

    The EBP method extends to specified non-normal distributions for the sampling errors and random effects.


    7/31

    For unspecified non-normality of the sampling errors and random effects, one can use EBLUP [Lahiri and Rao, 1995], certain adaptive methods [Li and Lahiri, 2007; Fabrizi and Trivisano, 2007], or linear EB [Ghosh and Lahiri, 1987].

    Known sampling variances $D_i$: GVF-type methods are generally used. These methods usually do not account for small area effects, and the uncertainty in estimating the sampling variances is not incorporated into the EBP.

    In some situations, standard estimates [REML, ML, ANOVA, etc.] of the model variance $\sigma^2$ can be zero.


    8/31

    When $\hat{\sigma}^2$ is zero, the EBLUP reduces to the regression synthetic estimate. One way to avoid this problem is to use the ADM or AML estimates [Morris, 1987; Li and Lahiri, 2007].

    The rationale behind the transformation $g(\cdot)$ rests on a Taylor series argument, and the transformation is used primarily to stabilize the variance. Direct modeling of the direct estimates is possible, but it is likely to lead to a nonlinear, non-normal mixed model.

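    A simple back transformation is often used to obtain the estimate of $P_i$. The optimality property of the BP is lost by such a back transformation, because for a nonlinear $g$, $g^{-1}(E[\theta_i \mid \text{data}])$ generally differs from $E[P_i \mid \text{data}]$.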
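A minimal sketch of both points on this slide, under stated assumptions: $\beta$ and $\sigma^2$ are treated as known (so this is the BLUP rather than the EBLUP), the predictor uses the standard Fay-Herriot shrinkage weight $\gamma_i = \sigma^2/(\sigma^2 + D_i)$, and $g$ is taken to be the logit purely for illustration. The helper names are hypothetical.

```python
import math


def blup(theta_hat, D, xb, sigma2):
    """BLUP of theta_i under the basic area level model with beta and sigma2 known.

    theta_hat: direct estimate g(p_i); D: known sampling variance D_i;
    xb: synthetic value x_i^T beta; sigma2: model variance.
    """
    gamma = sigma2 / (sigma2 + D)  # weight on the direct estimate
    return gamma * theta_hat + (1.0 - gamma) * xb


def expit(t):
    """Inverse logit, used here as the (assumed) back transformation g^{-1}."""
    return 1.0 / (1.0 + math.exp(-t))


def mean_expit_normal(m, v, grid=4001, width=10.0):
    """E[expit(theta)] for theta ~ N(m, v), by the trapezoidal rule on m +/- width*sd."""
    sd = math.sqrt(v)
    lo = m - width * sd
    h = 2.0 * width * sd / (grid - 1)
    total = 0.0
    for k in range(grid):
        t = lo + k * h
        w = 0.5 if k in (0, grid - 1) else 1.0
        dens = math.exp(-0.5 * ((t - m) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += w * expit(t) * dens
    return total * h


if __name__ == "__main__":
    print(blup(theta_hat=-1.0, D=0.2, xb=-2.0, sigma2=0.0))  # -2.0 (synthetic value)
    print(blup(theta_hat=-1.0, D=0.2, xb=-2.0, sigma2=0.2))  # -1.5
    # Naive back transformation vs. the mean of P_i = expit(theta_i).
    print(expit(-2.0), mean_expit_normal(-2.0, 1.0))
```

Setting `sigma2=0.0` makes the weight on the direct estimate vanish, so the prediction equals the synthetic value $x_i^T\beta$ regardless of the data. The last line shows the back-transformation gap: for $\theta_i \sim N(-2, 1)$, `expit(-2.0)` is noticeably smaller than the mean of $\text{expit}(\theta_i)$.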


    9/31

    Measures of uncertainty and the confidence interval problem are quite challenging, and the theory rests on asymptotics.

    A hierarchical Bayes (HB) implementation of the basic area level model provides exact inference at the expense of specifying priors for the hyperparameters.


    10/31

    Estimation of Small Area Proportions: Two Basic Area Models
    Ref: Liu, Lahiri and Kalton (2007)

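    Model 1:

    Level 1: $p_i \mid P_i \overset{ind}{\sim} N\left(P_i, \text{DEFF}_i \frac{P_i(1 - P_i)}{n_i}\right)$

    Level 2: $g(P_i) \overset{ind}{\sim} N(x_i^T \beta, \sigma^2)$

    Model 2:

    Level 1: $p_i \mid P_i \overset{ind}{\sim} \text{Beta}\left(P_i, \text{DEFF}_i \frac{P_i(1 - P_i)}{n_i}\right)$, i.e., a beta distribution with mean $P_i$ and variance $\text{DEFF}_i P_i(1 - P_i)/n_i$

    Level 2: $g(P_i) \overset{ind}{\sim} N(x_i^T \beta, \sigma^2)$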
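Model 2 states the beta distribution through its mean and variance. A minimal sketch of converting that form into the usual shape parameters $(a, b)$ follows; the conversion is standard beta algebra rather than something taken from the slides, and the function names are hypothetical.

```python
def beta_params(P, n, deff):
    """Shape parameters (a, b) of a beta with mean P and variance deff * P * (1 - P) / n.

    Solving a / (a + b) = P and P(1 - P) / (a + b + 1) = variance
    gives the precision a + b = n / deff - 1.
    """
    if not 0.0 < P < 1.0:
        raise ValueError("P must lie strictly between 0 and 1")
    nu = n / deff - 1.0  # precision a + b
    if nu <= 0.0:
        raise ValueError("need n / deff > 1 for a proper beta distribution")
    return P * nu, (1.0 - P) * nu


def beta_mean_var(a, b):
    """Mean and variance of Beta(a, b), used to check the conversion."""
    s = a + b
    return a / s, a * b / (s * s * (s + 1.0))


if __name__ == "__main__":
    a, b = beta_params(P=0.2, n=50, deff=1.5)
    print(a, b, beta_mean_var(a, b))  # mean approximately 0.2, variance approximately 0.0048
```

A beta density lives on the open interval (0, 1), so a direct estimate $p_i = 0$ falls outside the Level 1 support of Model 2; this is one concrete way in which zero estimates cause trouble.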


    11/31

    Comments

    Both EBP [Jiang and Lahiri 2002; Chatterjee and Lahiri 2008] and the Bayesian implementation [Liu, Lahiri and Kalton 2007] of the above models are possible.

    Level 1 modeling could be problematic in the presence of a sizable number of zero direct estimates for small areas.

    $\text{DEFF}_i = \dfrac{\sum_h W_{ih}^2 P_{ih}(1 - P_{ih})/n_{ih}}{P_i(1 - P_i)/n_i}$; $W_{ih} = N_{ih}/N_i$; $N_i = \sum_h N_{ih}$; $n_i = \sum_h n_{ih}$


    12/31

    $P_{ih}$ is the population proportion for stratum $h$ in area $i$.

    The design effect $\text{DEFF}_i$ is a function of the $P_{ih}$, which are unknown.

    If $P_{ih} = P_i$ for all $h$, then $\text{DEFF}_i = \text{deff}_i = n_i \sum_h W_{ih}^2 / n_{ih}$.

    DEFF estimation typically requires a synthetic assumption, and the variability due to estimating the DEFF is not accounted for (other refs: papers by Rao and You; Folsom and Singh).
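The two DEFF expressions can be computed directly when the stratum quantities are known. The sketch below assumes they are (in practice the $P_{ih}$ are not, which is why a synthetic assumption is needed), uses the identity $P_i = \sum_h W_{ih} P_{ih}$, and uses hypothetical function names.

```python
def deff_exact(W, P, n):
    """DEFF_i from stratum weights W_ih, stratum proportions P_ih and sample sizes n_ih."""
    n_i = sum(n)
    P_i = sum(w * p for w, p in zip(W, P))  # area proportion = sum_h W_ih P_ih
    num = sum(w * w * p * (1.0 - p) / nh for w, p, nh in zip(W, P, n))
    return num / (P_i * (1.0 - P_i) / n_i)


def deff_synthetic(W, n):
    """deff_i = n_i * sum_h W_ih^2 / n_ih, which equals DEFF_i when every P_ih = P_i."""
    n_i = sum(n)
    return n_i * sum(w * w / nh for w, nh in zip(W, n))


if __name__ == "__main__":
    W = [0.5, 0.3, 0.2]
    print(deff_exact(W, [0.3, 0.3, 0.3], [10, 10, 10]), deff_synthetic(W, [10, 10, 10]))
    print(deff_exact(W, [0.1, 0.3, 0.6], [10, 10, 10]))  # unequal P_ih change DEFF_i
    print(deff_synthetic(W, [50, 30, 20]))  # proportional allocation
```

With proportional allocation ($n_{ih}/n_i = W_{ih}$), $\text{deff}_i = 1$; by the Cauchy-Schwarz inequality any other allocation gives $\text{deff}_i > 1$.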


    13/31

    A Unit Level Model:

    Level 1: $y_{ik} \mid \pi_i \overset{ind}{\sim} \text{Bernoulli}(\pi_i)$

    Level 2: $\text{logit}(\pi_i) = x_i'\beta + v_i$, where $v_i \overset{iid}{\sim} N(0, \sigma^2)$, $i = 1, \ldots, m$

    Ref:

    MacGibbon and Tomberlin (1989)

    Malec, Sedransk, Moriarity and LeClere (1997)

    Malec, Sedransk and Tompkins (1993)

    Malec, Davis and Cao (1999)

    Ghosh et al. (1998)


    14/31

    An Illustration of Platykurtic Random Effects:

    Study Population: 2002 natality public-use data file that

    contains records on all births (4,024,378) occurring within

    the United States in 2002.

    For $i = 1, \ldots, 51$:

    $P_i$: true low birth weight rate

    $x_{i1}$: proportion of mothers of age < 15 yr

    $x_{i2}$: proportion of cases where the newborn is the first child in the family


    15/31

    Fit the Level 2 model $\text{logit}(P_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + v_i$.

    Obtain the residuals.

    Analyze the residuals for possible departure from normality.

    The $p$-value from the Kolmogorov-Smirnov test is 0.0436.
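The slides do not say how the Kolmogorov-Smirnov test was carried out. As an assumption-laden sketch, the following compares the residuals with a normal distribution whose mean and standard deviation are estimated from those same residuals, and uses the asymptotic Kolmogorov distribution for the $p$-value. It is not meant to reproduce the reported 0.0436, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(residuals):
    """KS distance between the ECDF and a normal with estimated mean and sd."""
    x = sorted(residuals)
    n = len(x)
    ref = NormalDist(mean(x), stdev(x))
    d = 0.0
    for k, xk in enumerate(x):
        f = ref.cdf(xk)
        # The ECDF jumps from k/n to (k+1)/n at xk; check both sides of the jump.
        d = max(d, (k + 1) / n - f, f - k / n)
    return d


def ks_asymptotic_pvalue(d, n, terms=100):
    """P(K > sqrt(n) * d) under the limiting Kolmogorov distribution."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the limiting tail probability is numerically 1 here
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam) for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


if __name__ == "__main__":
    two_clusters = [-1.0] * 25 + [1.0] * 26  # 51 clearly non-normal "residuals"
    d = ks_statistic_normal(two_clusters)
    print(round(d, 3), ks_asymptotic_pvalue(d, len(two_clusters)))
```

Because the normal parameters are estimated from the residuals themselves, the standard Kolmogorov-Smirnov $p$-value is conservative (the Lilliefors correction addresses this), and with $m = 51$ areas the asymptotic distribution is only an approximation.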


    16/31

    The exponential power (EP) distribution
    Ref: Box and Tiao (1973)

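    $f_{EP}(x \mid \mu, \sigma, \psi) = \dfrac{c_1}{\sigma} \exp\left\{-\left|\dfrac{c_0^{1/2}(x - \mu)}{\sigma}\right|^{1/\psi}\right\}, \quad -\infty < x < +\infty,$

    where $\mu \in \mathbb{R}$, $\sigma \in \mathbb{R}^{+}$, $\psi \in (0, 1]$,

    $c_0 = \Gamma(3\psi)/\Gamma(\psi)$, $c_1 = c_0^{1/2} / \{2\psi\,\Gamma(\psi)\}$.

    $\mu$: location

    $\sigma$: scale

    The excess kurtosis is:

    $\gamma_2 = \dfrac{\Gamma(\psi)\,\Gamma(5\psi)}{\Gamma(3\psi)^2} - 3.$

    Normal: $\psi = 0.5$; platykurtic: $\psi < 0.5$; leptokurtic: $\psi > 0.5$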
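A minimal sketch of the EP density and its excess kurtosis follows, with a crude trapezoidal check that the density integrates to 1 and that $\sigma^2$ is its variance. The function names, the integration range, and the step count are our own choices, not part of the source.

```python
import math


def ep_constants(psi):
    """c0 and c1 of the EP density for kurtosis parameter psi in (0, 1]."""
    c0 = math.gamma(3.0 * psi) / math.gamma(psi)
    c1 = math.sqrt(c0) / (2.0 * psi * math.gamma(psi))
    return c0, c1


def ep_pdf(x, mu=0.0, sigma=1.0, psi=0.5):
    """EP(mu, sigma, psi) density; psi = 0.5 gives N(mu, sigma^2)."""
    c0, c1 = ep_constants(psi)
    z = abs(math.sqrt(c0) * (x - mu) / sigma)
    return c1 / sigma * math.exp(-z ** (1.0 / psi))


def ep_excess_kurtosis(psi):
    return math.gamma(psi) * math.gamma(5.0 * psi) / math.gamma(3.0 * psi) ** 2 - 3.0


def trapezoid(f, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * (0.5 * f(lo) + 0.5 * f(hi) + sum(f(lo + k * h) for k in range(1, steps)))


if __name__ == "__main__":
    for psi in (0.2, 0.5, 1.0):
        mass = trapezoid(lambda x: ep_pdf(x, psi=psi), -20.0, 20.0)
        var = trapezoid(lambda x: x * x * ep_pdf(x, psi=psi), -20.0, 20.0)
        # mass and var should both be close to 1 (sigma = 1).
        print(psi, round(mass, 6), round(var, 6), round(ep_excess_kurtosis(psi), 4))
```

At $\psi = 1$ the density is the Laplace (double exponential) distribution with excess kurtosis 3, and as $\psi$ decreases toward 0 the density flattens toward a uniform shape.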


    17/31

    NCHS Data Analyses Cont.

    $v_i \sim EP(0, \sigma, \psi)$

    Assume that the components of $(\sigma, \psi)$ are independent and (i) $\psi \sim \text{Unif}(0, 1)$; (ii) $\sigma \sim \text{Unif}(0, K)$, where $K$ is a large positive number.

    The posterior mean of $\psi$ is 0.2. The one-sided 95% credible interval is (0, 0.473), which does not include the normal case ($\psi = 0.5$).

    Among several alternatives, the EP model with $\psi = 0.2$ fits the data best in terms of the smallest DIC (Ref on DIC: Spiegelhalter et al., 2002).
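DIC, as defined by Spiegelhalter et al. (2002), adds an effective number of parameters $p_D = \bar{D} - D(\bar{\theta})$ to the posterior mean deviance $\bar{D}$, where $D(\theta) = -2 \log L(\theta)$. A minimal sketch on a hypothetical one-parameter binomial example (not the NCHS data) follows; the function names and the toy data are assumptions.

```python
import math
import random


def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = Dbar + pD, with pD = Dbar - D(theta_bar)."""
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d


# Hypothetical toy data: 7 successes in 20 Bernoulli trials with a uniform
# prior, so the posterior of p is Beta(8, 14).
def deviance(p, successes=7, trials=20):
    """-2 log-likelihood without the binomial coefficient."""
    return -2.0 * (successes * math.log(p) + (trials - successes) * math.log(1.0 - p))


if __name__ == "__main__":
    rng = random.Random(2002)
    draws = [rng.betavariate(8, 14) for _ in range(20000)]
    p_bar = sum(draws) / len(draws)
    value, p_d = dic([deviance(p) for p in draws], deviance(p_bar))
    print(round(value, 2), round(p_d, 2))  # p_d should be near 1 (one parameter)
```

Smaller DIC is preferred. Omitting the binomial coefficient shifts every DIC by the same constant for models fit to the same data, so model comparisons are unaffected.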


    18/31

    [Figure 1. Normal Q-Q plot of the random effect $v_i$ (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)]

    [Figure 2. Normal Q-Q plot of randomly generated platykurtic data (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)]


    19/31

    [Figure 3. Posterior density of the kurtosis parameter $\psi$ (x-axis: psi, from 0 to 1; y-axis: Density)]


    20/31

    Our Unit Level Model for Estimating Small Area Proportions

    For $i = 1, \ldots, m$,

    Level 1: $y_{ik} \mid \pi_i \overset{ind}{\sim} \text{Bernoulli}(\pi_i)$

    Level 2: $\text{logit}(\pi_i) = x_i'\beta + v_i$

    Two approaches for modeling the area-specific random effects $v_i$:

    Option 1: Assume that the kurtosis of $v_i$ is known; e.g., $v_i \overset{iid}{\sim} N(0, \sigma^2)$ - Bernoulli-Logit-Normal model


    21/31

    Option 2: Assume that the distribution of $v_i$ is a member of a class of distributions that covers a wide range of kurtosis values, and let the data determine the unknown kurtosis; e.g., assume $v_i \sim EP(0, \sigma, \psi)$ and estimate $\psi$ just as we would estimate the other hyperparameters of the hierarchical model - Bernoulli-Logit-EP model


    22/31

    Comments:

    In unpublished work in the early 1990s, Lahiri and Rao considered a robust extension of the Battese-Harter-Fuller (1988) model using EP.

    Fabrizi and Trivisano (2007): robust extensions of the Fay-Herriot model for continuous data using EP.

    Li and Lahiri (2007) considered a super-population model chosen adaptively from the well-known Box-Cox class of transformations.


    23/31

    Simulated Data Analysis

    Two aims:

    - When $v_i$ is non-normal, how effective is the Bernoulli-Logit-EP model relative to the Bernoulli-Logit-Normal model?
    - When $v_i$ is indeed normal, what is the effect of overparametrization for the Bernoulli-Logit-EP model?


    24/31

    $m = 100$, $n_i = 5$, $x_i'\beta = \beta = 0$.

    We consider two cases: $\psi = 0.2$ (platykurtic) and $\psi = 0.5$ (normal).

    For each of the two cases, one sample was generated from the models $\text{logit}(\pi_i) = \beta + v_i$, $v_i \sim EP(0, \sigma = 0.1, \psi)$, $i = 1, \ldots, m$, and $y_{ij} \sim \text{Bernoulli}(\pi_i)$, $j = 1, \ldots, n_i$, $i = 1, \ldots, m$.
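The data-generating process on this slide can be sketched as follows. The EP draws use a gamma representation that follows from the EP density given earlier: if $X \sim EP(\mu, \sigma, \psi)$, then $|c_0^{1/2}(X - \mu)/\sigma|^{1/\psi}$ is Gamma($\psi$, 1). That sampler, the seed, and the function names are assumptions of this sketch, not details from the slides.

```python
import math
import random


def ep_draw(rng, mu, sigma, psi):
    """One draw from EP(mu, sigma, psi) using a gamma representation.

    If X ~ EP(mu, sigma, psi), then W = |sqrt(c0) (X - mu) / sigma|^(1/psi)
    follows Gamma(shape=psi, scale=1), and the sign of X - mu is symmetric.
    """
    c0 = math.gamma(3.0 * psi) / math.gamma(psi)
    w = rng.gammavariate(psi, 1.0)
    magnitude = sigma * w ** psi / math.sqrt(c0)
    return mu + (magnitude if rng.random() < 0.5 else -magnitude)


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def simulate(m=100, n=5, beta=0.0, sigma=0.1, psi=0.2, seed=1):
    """Generate (pi_i, [y_i1, ..., y_in]) for i = 1..m from the Bernoulli-logit-EP model."""
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        v = ep_draw(rng, 0.0, sigma, psi)  # area random effect v_i
        pi = expit(beta + v)               # logit(pi_i) = beta + v_i
        y = [1 if rng.random() < pi else 0 for _ in range(n)]
        data.append((pi, y))
    return data


if __name__ == "__main__":
    for psi in (0.2, 0.5):  # platykurtic and normal cases
        data = simulate(psi=psi)
        direct = [sum(y) / len(y) for _, y in data]
        print(psi, min(direct), max(direct))
```

Setting `psi=0.5` recovers normal random effects with standard deviation `sigma`. With only five units per area, the direct estimates take values in {0, 0.2, ..., 1}, so zeros do occur.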


    25/31

    Priors for the hyperparameters: (i) $f(\beta) \propto 1$; (ii) $\sigma \sim \text{Unif}(0, K)$; and (iii) $\psi \sim \text{Unif}(0, 1)$. We computed HB estimates for the two models using WinBUGS. For each WinBUGS run, three independent chains were used. Each chain had a burn-in of 1,000 samples, followed by 4,000 samples after burn-in. The resulting 12,000 MCMC samples after burn-in were then used to compute the posterior means and percentiles for each HB model based on each sample dataset. The potential scale reduction factor $\hat{R}$ was used as the primary measure of convergence (see Gelman and Rubin, 1992).
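A basic version of the Gelman-Rubin potential scale reduction factor for one scalar quantity is sketched below. It omits the degrees-of-freedom correction of Gelman and Rubin (1992), and the function name is hypothetical.

```python
import math
import random
from statistics import mean, variance


def psrf(chains):
    """Basic potential scale reduction factor R-hat.

    chains: list of M equal-length lists of post-burn-in draws of one quantity.
    """
    n = len(chains[0])
    W = mean(variance(c) for c in chains)   # average within-chain variance
    B = n * variance([mean(c) for c in chains])  # between-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled estimate of the posterior variance
    return math.sqrt(var_plus / W)


if __name__ == "__main__":
    rng = random.Random(1992)
    # Three chains of 4,000 draws each, mirroring the run length described above.
    mixed = [[rng.gauss(0.0, 1.0) for _ in range(4000)] for _ in range(3)]
    stuck = [[rng.gauss(mu, 1.0) for _ in range(4000)] for mu in (0.0, 0.0, 3.0)]
    print(round(psrf(mixed), 3), round(psrf(stuck), 3))
```

Values close to 1 suggest that the chains have mixed; a cutoff such as 1.1 is a common rule of thumb, though the slides do not state which threshold was used.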


    26/31

    Let $P_i^{HB}$ denote an HB estimator of $P_i$, and let $P_{i,q}^{HB}$ denote the $q$th percentile of the posterior distribution of $P_i$. To evaluate the two HB models, the following evaluation statistics are calculated for each HB estimator:

    Average squared deviation (ASD): $\text{ASD} = \frac{1}{m}\sum_{i=1}^{m} (P_i^{HB} - P_i)^2$

    Average absolute deviation (AAD): $\text{AAD} = \frac{1}{m}\sum_{i=1}^{m} |P_i^{HB} - P_i|$

    Average squared relative deviation (ASRD): $\text{ASRD} = \frac{1}{m}\sum_{i=1}^{m} \left((P_i^{HB} - P_i)/P_i\right)^2$


    27/31

    Average absolute relative deviation (AARD): $\text{AARD} = \frac{1}{m}\sum_{i=1}^{m} |P_i^{HB} - P_i|/P_i$

    Average length of the 95% credible interval (ALCI): $\text{ALCI} = \frac{1}{m}\sum_{i=1}^{m} (P_{i,.975}^{HB} - P_{i,.025}^{HB})$
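The five evaluation statistics translate directly into code. A minimal sketch, assuming the true $P_i$ are available (as they are in the simulation and in the finite-population real-data study) and that every $P_i > 0$ so the relative deviations are defined; the function name is hypothetical.

```python
def evaluation_stats(P_true, P_hb, lower, upper):
    """ASD, AAD, ASRD, AARD and ALCI over m areas.

    P_true: true P_i; P_hb: HB estimates P_i^HB; lower / upper: the 2.5th and
    97.5th posterior percentiles P_{i,.025}^HB and P_{i,.975}^HB.
    """
    m = len(P_true)
    diffs = [e - t for e, t in zip(P_hb, P_true)]
    return {
        "ASD": sum(d * d for d in diffs) / m,
        "AAD": sum(abs(d) for d in diffs) / m,
        "ASRD": sum((d / t) ** 2 for d, t in zip(diffs, P_true)) / m,
        "AARD": sum(abs(d) / t for d, t in zip(diffs, P_true)) / m,
        "ALCI": sum(u - l for l, u in zip(lower, upper)) / m,
    }


if __name__ == "__main__":
    stats = evaluation_stats([0.1, 0.2], [0.12, 0.18], [0.05, 0.10], [0.20, 0.30])
    print({k: round(v, 6) for k, v in stats.items()})
```

Table 1 reports ratios of these statistics, the Normal-model value divided by the EP-model value.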


    28/31

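    Table 1: Ratios of ASD, AAD, ASRD, AARD, and ALCI for the two models (Normal/EP) using the simulated data

    | DGP | ASD | AAD | ASRD | AARD | ALCI |
    |---|---|---|---|---|---|
    | EP(0, 0.1, 0.2) | 1.258 | 1.106 | 1.259 | 1.106 | 1.100 |
    | N(0, 0.1) | 1.064 | 1.038 | 1.058 | 1.033 | 1.017 |

    A ratio greater than 1 means the Bernoulli-Logit-Normal model has the larger value of that statistic.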


    29/31

    Real Data Analysis

    Data source: 2002 natality public-use data. We drew 6 sets of samples, each of size $n = 4,526$, using simple random sampling within states from the finite population.

    The state-level sample sizes $n_i$ ranged from 7 (for small states such as Vermont) to 690 (for California).

    Using each sample, we computed the HB estimates for each model using the two auxiliary variables.


    30/31

    The prior assumptions for the hyperparameters are the same as those used earlier.

    To evaluate the two HB models, the five evaluation statistics were computed for each HB estimator.

    The numbers in the table consistently show that the Bernoulli-Logit-EP model works better than the Bernoulli-Logit-Normal model in terms of the five evaluation statistics (the only tie, at the reported precision, is ASD for sample 1).


    31/31

    Table 3: ASD, AAD, ASRD, AARD and ALCI for the HB estimators using real data

    | Sample | Model | ASD | AAD | ASRD | AARD | ALCI |
    |---|---|---|---|---|---|---|
    | 1 | EP | 0.00021 | 0.01168 | 0.00283 | 0.15695 | 0.05410 |
    | 1 | Normal | 0.00021 | 0.01176 | 0.00285 | 0.15809 | 0.05920 |
    | 2 | EP | 0.00007 | 0.00653 | 0.00102 | 0.08779 | 0.06544 |
    | 2 | Normal | 0.00010 | 0.00746 | 0.00133 | 0.09948 | 0.06952 |
    | 3 | EP | 0.00012 | 0.00812 | 0.00139 | 0.10284 | 0.04663 |
    | 3 | Normal | 0.00013 | 0.00853 | 0.00154 | 0.10776 | 0.05214 |
    | 4 | EP | 0.00070 | 0.01846 | 0.00988 | 0.25118 | 0.12736 |
    | 4 | Normal | 0.00087 | 0.02061 | 0.01241 | 0.28247 | 0.13299 |
    | 5 | EP | 0.00043 | 0.01699 | 0.00569 | 0.22286 | 0.10696 |
    | 5 | Normal | 0.00063 | 0.02057 | 0.00810 | 0.26668 | 0.12456 |
    | 6 | EP | 0.00086 | 0.02238 | 0.01121 | 0.28965 | 0.12820 |
    | 6 | Normal | 0.00147 | 0.02994 | 0.01876 | 0.38553 | 0.14330 |
