Lecture Estimation

  • 7/29/2019 Lecture Estimation

    1/38

    Binomial distributions for sample counts

    Binomial distributions are models for some categorical variables, typically

    representing the number of successes in a series of n trials.

    The observations must meet these requirements:

    The total number of observations n is fixed in advance.

    Each observation falls into just 1 of 2 categories: success and failure.

    The outcomes of all n observations are statistically independent.

    All n observations have the same probability of success, p.

    We record the next 50 births at a local hospital. Each newborn is either a

    boy or a girl; each baby is either born on a Sunday or not.
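    The birth example can be turned into a small calculation. This is a minimal sketch, assuming each birth is independent and that the probability of a boy is 0.5 (an illustrative value, not taken from the slides); the function name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) when X counts successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed values for illustration: 50 births, P(boy) = 0.5.
n, p = 50, 0.5
prob_25_boys = binomial_pmf(25, n, p)

# The probabilities of all possible counts 0..n must add up to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(f"P(exactly 25 boys in {n} births) = {prob_25_boys:.4f}")
print(f"Sum over all possible counts = {total:.6f}")
```

    The same function applies to the "born on a Sunday" count by changing p (for instance to 1/7, if all weekdays are assumed equally likely).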

    Applications for binomial distributions

    Binomial distributions describe the possible number of times that

    a particular event will occur in a sequence of observations.

    They are used when we want to know about the occurrence of an

    event, not its magnitude.

    In a clinical trial, a patient's condition may improve or not. We study the number of patients who improved, not how much better they feel.

    Is a person ambitious or not? The binomial distribution describes the number of ambitious persons, not how ambitious they are.

    In quality control we assess the number of defective items in a lot of goods, irrespective of the type of defect.

    Reminder: Sampling variability

    Each time we take a random sample from a population, we are likely to get a different set of individuals and calculate a different statistic. This is called sampling variability.

    If we take a lot of random samples of the same size from a given population, the variation from sample to sample (the sampling distribution) will follow a predictable pattern.
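    Sampling variability is easy to see in a simulation. The sketch below assumes a made-up population of 10,000 values (roughly normal, centered at 70) and draws 1,000 random samples of size 25; all of these numbers are illustrative choices, not values from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 values, roughly normal around 70.
population = [random.gauss(70, 10) for _ in range(10_000)]

# Draw many random samples of the same size and record each sample mean.
sample_size = 25
sample_means = [
    statistics.mean(random.sample(population, sample_size))
    for _ in range(1_000)
]

print("first five sample means:", [round(m, 1) for m in sample_means[:5]])
print("mean of the sample means:", round(statistics.mean(sample_means), 2))
print("SD of the sample means:", round(statistics.stdev(sample_means), 2))
```

    Each sample gives a different mean, but the means cluster around the population mean and vary much less than the individual observations do.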

    Binomial mean and standard deviation

    The center and spread of the binomial distribution for a count $X$ are defined by the mean $\mu$ and standard deviation $\sigma$:

    $$\mu = np, \qquad \sigma = \sqrt{npq} = \sqrt{np(1 - p)}$$

    Effect of changing p when n is fixed. For small samples, binomial distributions are skewed when p is different from 0.5.

    [Figure: three binomial distributions, a) n = 10, p = 0.25; b) n = 10, p = 0.5; c) n = 10, p = 0.75. Horizontal axis: number of successes (0 to 10); vertical axis: P(X = x).]
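    The mean and standard deviation formulas can be checked against the three cases on this slide (n = 10 with p = 0.25, 0.5, and 0.75). This is a sketch only; the helper names are hypothetical, and the mean is also recomputed directly from the probabilities as a cross-check.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial count with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p)) of a binomial count."""
    return n * p, sqrt(n * p * (1 - p))

n = 10
summary = {}
for label, p in (("a", 0.25), ("b", 0.5), ("c", 0.75)):
    mu, sigma = binomial_mean_sd(n, p)
    # Cross-check: the mean computed from the definition, sum of k * P(X = k).
    mean_from_pmf = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    summary[label] = (mu, sigma, mean_from_pmf)
    print(f"{label}) p = {p}: mean = {mu:.2f}, SD = {sigma:.3f}, "
          f"mean from probabilities = {mean_from_pmf:.2f}")
```

    Cases a) and c) have the same standard deviation because p(1 - p) is the same for p = 0.25 and p = 0.75; only the direction of the skew differs.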

    Sample proportions

    The proportion of successes can be more informative than the count. In statistical sampling the sample proportion of successes, $\hat{p}$, is used to estimate the proportion $p$ of successes in a population.

    For any SRS of size n, the sample proportion of successes is:

    $$\hat{p} = \frac{\text{count of successes in the sample}}{n} = \frac{X}{n}$$

    In an SRS of 50 students in an undergrad class, 10 are Hispanic:

    $\hat{p} = 10/50 = 0.2$ (proportion of Hispanics in sample)

    The 30 subjects in an SRS are asked to taste an unmarked brand of coffee and rate it "would buy" or "would not buy." Eighteen subjects rated the coffee "would buy."

    $\hat{p} = 18/30 = 0.6$ (proportion of "would buy")

    Sampling distribution of the sample proportion

    The sampling distribution of $\hat{p}$ is never exactly normal. But as the sample size increases, the sampling distribution of $\hat{p}$ becomes approximately normal.

    The normal approximation is most accurate for any fixed n when p is close to 0.5, and least accurate when p is near 0 or near 1.
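    One way to see how the accuracy of the normal approximation depends on p is to compare exact binomial probabilities with their normal counterparts. The sketch below works with the count X (the proportion X/n has the same shape), uses n = 30 as an arbitrary illustrative sample size, and applies a continuity correction of 0.5; these choices and the function name are assumptions, not part of the lecture.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_cdf_gap(n, p):
    """Largest absolute gap between the exact binomial P(X <= k) and a normal
    approximation with the same mean and SD, evaluated at k + 0.5."""
    normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    exact_cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        exact_cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        worst = max(worst, abs(exact_cdf - normal.cdf(k + 0.5)))
    return worst

gap_center = max_cdf_gap(30, 0.5)   # p close to 0.5
gap_edge = max_cdf_gap(30, 0.05)    # p near 0

print(f"n = 30, p = 0.50: largest gap = {gap_center:.4f}")
print(f"n = 30, p = 0.05: largest gap = {gap_edge:.4f}")
```

    With the same n, the gap is noticeably larger for the p near 0 than for p = 0.5, in line with the statement on the slide.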

    Estimation

    Estimation: a process whereby we select a random sample from a population and use a sample statistic to estimate a population parameter.

    Point and Interval Estimation

    Point Estimate: a sample statistic used to estimate the exact value of a population parameter.

    Confidence interval (interval estimate): a range of values defined by the confidence level within which the population parameter is estimated to fall.

    Confidence Level: the likelihood, expressed as a percentage or a probability, that a specified interval will contain the population parameter.

    Inferential Statistics involves Three Distributions:

    A population distribution: variation in the larger group that we want to know about.

    A distribution of sample observations: variation in the sample that we can observe.

    A sampling distribution: a normal distribution whose mean and standard deviation are unbiased estimates of the parameters, which allows one to infer the parameters from the statistics.

    The Central Limit Theorem Revisited

    What does this theorem tell us?

    Even if a population distribution is skewed, the sampling distribution of the mean is approximately normal when the sample size is large enough.

    The mean of the sampling distribution equals the population mean, so as the sample size gets larger, sample means fall closer to the population mean.

    As the sample size gets larger, the standard error of the mean decreases in size (which means that the variability in the sample estimates from sample to sample decreases as n increases).

    It is important to remember that researchers do not typically conduct repeated samples of the same population. Instead, they use the knowledge of theoretical sampling distributions to construct confidence intervals around estimates.
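    Although researchers rarely take repeated samples in practice, a simulation can do it for us and illustrate the points above. This sketch assumes a strongly skewed population (exponential with mean 1 and standard deviation 1, an illustrative choice) and compares the observed spread of sample means with the theoretical standard error, population SD divided by the square root of n.

```python
import random
import statistics
from math import sqrt

random.seed(1)  # fixed seed so the sketch is reproducible

def simulate_sample_means(n, replications=2_000):
    """Means of repeated samples of size n from a skewed (exponential) population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]

results = {}
for n in (5, 30, 200):
    means = simulate_sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n = {n:3d}: mean of sample means = {results[n][0]:.3f}, "
          f"observed SE = {results[n][1]:.3f}, theoretical SE = {1 / sqrt(n):.3f}")
```

    The mean of the sample means stays near the population mean of 1 at every sample size, while the observed standard error shrinks as n grows.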

    A range of reasonable guesses at a population value, for example, a mean.

    Confidence level = chance that range of guesses captures the population value.

    Most common confidence level is 95%

    General Format of a Confidence Interval

    estimate +/- margin of error

    Accuracy of a mean

    A sample of n = 36 college women has mean pulse = 75.3.

    The SD of these pulse rates = 8.

    How well does this sample mean estimate the population mean?

    Standard Error of Mean

    SEM = SD of sample / square root of n

    SEM = 8 / square root(36) = 8 / 6 = 1.33

    Margin of error of mean = 2 x SEM

    Margin of error = 2 x 1.33 = 2.66, about 2.7
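    The same arithmetic as a short sketch, using the slide's rule of thumb that the 95% margin of error is about 2 x SEM. The function name is hypothetical; the inputs are the pulse data above.

```python
from math import sqrt

def standard_error_of_mean(sd, n):
    """SEM = sample SD divided by the square root of the sample size."""
    return sd / sqrt(n)

# Pulse data for the sample of college women: SD = 8, n = 36.
sem = standard_error_of_mean(8, 36)
margin_of_error = 2 * sem  # rule of thumb for roughly 95% confidence

print(f"SEM = {sem:.2f}")
print(f"Margin of error = {margin_of_error:.1f}")
```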

    Interpretation

    95% confidence that the sample mean is within 2.7 (pulse beats) of the population mean.

    A 95% confidence interval for the population mean:

    sample mean +/- margin of error

    75.3 +/- 2.7; 72.6 to 78.0

    C.I. for mean pulse of men

    n = 49

    sample mean = 70.3, SD = 8

    SEM = 8 / square root(49) = 1.1

    margin of error = 2 x 1.1 = 2.2

    Interval is 70.3 +/- 2.2, or 68.1 to 72.5

    Do men and women differ in mean pulse?

    C.I. for women is 72.6 to 78.0

    C.I. for men is 68.1 to 72.5

    No overlap between intervals

    We say that population means differ
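    The comparison can be sketched in code. The helpers below are hypothetical names; they use the slides' "mean +/- 2 x SEM" rule without intermediate rounding, so the men's interval comes out slightly different from the rounded figures above (about 68.0 to 72.6), but the conclusion is the same.

```python
from math import sqrt

def approx_95_interval(mean, sd, n):
    """Approximate 95% interval: mean +/- 2 x SEM, with SEM = sd / sqrt(n)."""
    margin = 2 * sd / sqrt(n)
    return mean - margin, mean + margin

def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

women = approx_95_interval(mean=75.3, sd=8, n=36)
men = approx_95_interval(mean=70.3, sd=8, n=49)

print(f"women: {women[0]:.1f} to {women[1]:.1f}")
print(f"men:   {men[0]:.1f} to {men[1]:.1f}")
print("intervals overlap:", intervals_overlap(women, men))
```

    Checking for non-overlapping intervals is the informal approach used in this lecture; it is a conservative way to judge whether two population means differ.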

    Confidence Levels:

    Confidence Level: the likelihood, expressed as a percentage or a probability, that a specified interval will contain the population parameter.

    95% confidence level: there is a .95 probability that a specified interval DOES contain the population mean. In other words, there are 5 chances out of 100 (or 1 chance out of 20) that the interval DOES NOT contain the population mean.

    99% confidence level: there is 1 chance out of 100 that the interval DOES NOT contain the population mean.

    Constructing a Confidence Interval (CI)

    The sample mean is the point estimate of the population mean.

    The sample standard deviation is the point estimate of the population standard deviation.

    The standard error of the mean makes it

    possible to state the probability that an

    interval around the point estimate contains

    the actual population mean.

    The Standard Error

    Standard error of the mean: the standard deviation of a sampling distribution.

    $$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}$$

    Estimating standard errors

    $$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}$$

    Since the standard error is generally not known, we usually work with the estimated standard error:

    $$s_{\bar{x}} = \frac{s_x}{\sqrt{n}}$$

    Determining a Confidence Interval (CI)

    Given a large enough sample, any confidence interval for the population mean may be constructed:

    $$CI = \bar{X} \pm Z(SE_{\bar{x}}) = \bar{X} \pm Z\left(\frac{s_x}{\sqrt{n}}\right)$$

    where Z is chosen from a standard normal distribution table to obtain a desired degree of confidence.
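    The large-sample formula translates directly into a small function. This is a sketch under the slide's assumptions (large sample, Z read from a standard normal table and passed in by the caller); the function name and the example numbers are hypothetical.

```python
from math import sqrt

def mean_confidence_interval(sample_mean, sample_sd, n, z):
    """Large-sample interval: sample mean +/- z * (sample SD / sqrt(n))."""
    estimated_se = sample_sd / sqrt(n)
    margin = z * estimated_se
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample, chosen only for illustration: mean 100, SD 15, n = 225.
low, high = mean_confidence_interval(sample_mean=100.0, sample_sd=15.0, n=225, z=1.96)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

    Here the estimated standard error is 15 / 15 = 1, so the interval is simply 100 +/- 1.96.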

    Confidence Interval Width

    $$\bar{X} \pm Z\left(\frac{s_x}{\sqrt{n}}\right)$$

    Confidence Level: Increasing our confidence level from 95% to 99% means we are less willing to draw the wrong conclusion; we take a 1% risk (rather than a 5% risk) that the specified interval does not contain the true population mean.

    If we reduce our risk of being wrong, then we need a wider range of values, so the interval becomes less precise.

    Confidence Interval Z Values
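    The Z values for common confidence levels can be recomputed rather than read from a table. A minimal sketch, assuming two-sided intervals and using `statistics.NormalDist` from the Python standard library; the choice of levels (90%, 95%, 99%) is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# For a two-sided interval at a given level, the critical value cuts off
# (1 - level) / 2 in each tail, i.e. it is the (0.5 + level / 2) quantile.
z_values = {
    level: standard_normal.inv_cdf(0.5 + level / 2)
    for level in (0.90, 0.95, 0.99)
}

for level, z in z_values.items():
    print(f"{level:.0%} confidence: Z = {z:.3f}")
```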

    Confidence Interval Width

    $$\bar{X} \pm Z\left(\frac{s_x}{\sqrt{n}}\right)$$

    Sample Size: Larger samples result in smaller standard errors, and therefore, in sampling distributions that are more clustered around the population mean. A more closely clustered sampling distribution indicates that our confidence intervals will be narrower and more precise.

    Confidence Interval Width

    $$\bar{X} \pm Z\left(\frac{s_x}{\sqrt{n}}\right)$$

    Standard Deviation: Smaller sample standard deviations result in narrower, more precise confidence intervals.

    (Unlike sample size and confidence level, the researcher plays no role in determining the standard deviation of a sample.)
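    The three influences on interval width (confidence level, sample size, and standard deviation) can be compared side by side. In this sketch the width is taken as twice the margin of error, 2 x Z x s / sqrt(n); the baseline numbers reuse the pulse example (s = 8, n = 36), and the alternative values are illustrative assumptions.

```python
from math import sqrt

def ci_width(sd, n, z):
    """Full width of the interval mean +/- z * sd / sqrt(n)."""
    return 2 * z * sd / sqrt(n)

baseline = ci_width(sd=8, n=36, z=1.96)             # 95% confidence
higher_confidence = ci_width(sd=8, n=36, z=2.576)   # 99% confidence
larger_sample = ci_width(sd=8, n=144, z=1.96)       # four times the sample size
smaller_sd = ci_width(sd=4, n=36, z=1.96)           # half the standard deviation

print(f"baseline (95%, n = 36, s = 8): {baseline:.2f}")
print(f"99% confidence:                {higher_confidence:.2f}")
print(f"n = 144:                       {larger_sample:.2f}")
print(f"s = 4:                         {smaller_sd:.2f}")
```

    Raising the confidence level widens the interval, while quadrupling the sample size or halving the standard deviation each cut the width in half.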

    Finding confidence interval of the mean years of education of

    voters. (Table 9.4, Hamilton)

    Mean = 12.97 years

    Standard deviation = 2.42 years

    Number of cases n= 155

    Calculation of 95 percent confidence interval.

    $$\bar{X} \pm Z\left(\frac{s_x}{\sqrt{n}}\right) = 12.97 \pm 1.96\left(\frac{2.42}{\sqrt{155}}\right) = 12.97 \pm 0.38$$

    So the interval is 12.59 to 13.35.

    Interpretation

    Informal: Based on our analysis of this particular sample, we are about 95% confident that the mean education among all voters in this town lies between 12.59 and 13.35 years.

    Formal: If we took a large number of random samples, each with 155 cases, and calculated confidence intervals in this manner for each sample, about 95% of those confidence intervals should include the true population mean $\mu$.
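    The formal interpretation can be acted out by simulation. This sketch assumes, purely for illustration, that the population is normal with mean 12.97 and standard deviation 2.42 (the sample values above, treated here as if they were the true parameters), draws 2,000 samples of 155 cases, and counts how many 95% intervals contain the true mean.

```python
import random
import statistics
from math import sqrt

random.seed(42)  # fixed seed so the sketch is reproducible

# Assumed population parameters (illustrative only).
TRUE_MEAN, TRUE_SD, N = 12.97, 2.42, 155
Z_95 = 1.96
replications = 2_000

covered = 0
for _ in range(replications):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    sample_mean = statistics.fmean(sample)
    margin = Z_95 * statistics.stdev(sample) / sqrt(N)
    # Does this sample's interval include the true population mean?
    if sample_mean - margin <= TRUE_MEAN <= sample_mean + margin:
        covered += 1

coverage = covered / replications
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

    The share should come out close to 0.95; any single interval either contains the true mean or does not.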

    Confidence Intervals for Proportions

    Estimating the standard error of a proportion: based on the Central Limit Theorem, a sampling distribution of proportions is approximately normal, with a mean, $\mu_{\hat{p}}$, equal to the population proportion, $p$, and with a standard error of proportions equal to:

    $$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

    Since the standard error of proportions is generally not known, we usually work with the estimated standard error:

    $$s_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

    Determining a Confidence Interval for a Proportion

    Large sample confidence intervals for proportions are found as

    $$\hat{p} \pm Z(SE_{\hat{p}}) = \hat{p} \pm Z\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

    where Z is chosen from a table of the standard normal distribution to give the desired degree of confidence.

    Finding an approximate 95% confidence interval for the

    proportion favoring school closings.

    Sample statistics:

    Proportion favoring school closings = 0.431

    Number of cases n = 153

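    Confidence interval for population proportion

    $$\hat{p} \pm Z(SE_{\hat{p}}) = \hat{p} \pm Z\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.431 \pm 1.96\sqrt{\frac{0.431(1 - 0.431)}{153}} = 0.431 \pm 0.078$$

    So the interval is 0.353 to 0.509.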
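    The school-closing calculation can be reproduced with a short function. This is a sketch of the large-sample formula only; the function name is hypothetical, and Z = 1.96 is the usual value for approximately 95% confidence.

```python
from math import sqrt

def proportion_confidence_interval(p_hat, n, z=1.96):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    estimated_se = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * estimated_se
    return p_hat - margin, p_hat + margin

# Sample statistics from the slide: proportion 0.431, n = 153.
low, high = proportion_confidence_interval(0.431, 153)
print(f"95% CI for the proportion: {low:.3f} to {high:.3f}")
```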

    Interpretation

    Informal: Based on our analysis of this one sample, we are about 95% confident that the proportion in favor of closing schools, among all voters in this town, lies between 0.353 and 0.509.

    Formal: If we took a large number of random samples, each with 153 cases, and calculated confidence intervals in this manner for each sample, about 95% of those confidence intervals should include the true population proportion $p$.