CFA Level 1 Quantitative Analysis E Book - Part 4(1)


    8/13/2019 CFA Level 1 Quantitative Analysis E Book - Part 4(1)

    CFA Quantitative Analysis E-Book 4 of 8

    www.educorporatebridge.com All Rights Reserved. Corporate Bridge TM

    Quantitative Analysis E-Book

    Part 4 of 8


    Sampling and Estimation


    1. Introduction.

    In investment analysis, it is often impossible to study every member of the population. Even if analysts could examine the entire population, it may not be economically efficient to do so. Sampling is the process of obtaining a sample. A simple random sample is a sample obtained in such a way that each element of the population has an equal probability of being selected, and the selection of any one element has no impact on the chance of selecting another element.

    A sample is random if the method used to obtain it meets the criterion of randomness (each element has an equal chance of being selected at each draw). The word 'simple' tells you that the process is not difficult, and the word 'random' tells you that you don't know in advance which observations will be selected in the sample. The actual composition of the sample does not determine whether or not it is a random sample.

    Example

    Suppose that a company has 30 directors, and you wish to choose 10 of them to serve on a committee. You could place the names of the 30 directors on separate pieces of paper and draw them out one by one until you have drawn a sample of size 10.

    Note that the conditions for simple random sampling have been satisfied, in that every one of the 30 directors has an equal (non-zero) chance of being selected in the sample.


    In this example, it makes no sense to sample with replacement. Sampling with replacement would mean that once you have drawn a name, it goes back into the hat (i.e. it is replaced) and can be drawn again. If the same person's name is drawn more than once, 10 draws will not produce a committee of 10 different people, so this experiment should be done without replacement.
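The hat-drawing procedure can be sketched with Python's standard `random` module. This is a minimal illustration, not a prescribed method: the director labels are hypothetical placeholders, `random.Random.sample` draws without replacement (each name can appear at most once), and `random.Random.choices` draws with replacement, so a name may repeat and fewer than 10 different directors may be chosen.

```python
import random

# Hypothetical labels for the 30 directors in the example.
directors = [f"Director {i}" for i in range(1, 31)]


def draw_committee(names, size, seed=None):
    """Simple random sample WITHOUT replacement: each name can be drawn at most once."""
    rng = random.Random(seed)
    return rng.sample(names, size)


def draw_with_replacement(names, size, seed=None):
    """Sampling WITH replacement: each drawn name goes back into the hat."""
    rng = random.Random(seed)
    return rng.choices(names, k=size)


committee = draw_committee(directors, 10, seed=42)
print(committee)
print(len(set(committee)))  # 10: always ten different directors

repeated = draw_with_replacement(directors, 10, seed=42)
print(len(set(repeated)))   # may be fewer than 10 if a name repeats
```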

    A biased sample is one in which the method used to create the sample results in samples that are systematically different from the population. For instance, consider a research project on attitudes toward cricket. Collecting the data by publishing a questionnaire in a magazine and asking people to fill it out and send it in would produce a biased sample. People interested enough to spend their time and energy filling out and sending in the questionnaire are likely to have different attitudes toward cricket than those who do not take the time to fill it out.

    It is important to realize that it is the method used to create the sample, not the actual makeup of the sample itself, that defines the bias. A random sample that is very different from the population is not biased: it is by definition not systematically different from the population. It is randomly different.


    SAMPLING ERROR

    The sample taken from a population is used to draw conclusions about the population. However, it is unlikely that the sample statistic will be identical to the population parameter. Suppose there is a class of 100 students, and a sample of size 10 from that class is chosen. If, by chance, mostly the brightest students are selected, the sample gives a misleading picture of the population, because the sample mean x-bar will be much higher than the population mean. Equally, a sample comprising mainly weaker students could be chosen, and then the opposite would apply. The ideal is a sample that comprises a few bright students, a few weaker students, and mainly average students, as this gives a good idea of the composition of the population. However, because you cannot control which items go into the sample, you depend to some degree on chance as to whether the results are favorable or not.

    Sampling error (also called error of estimation) is the difference between the observed value of a statistic

    and the quantity it is intended to estimate. For example, sampling error of the mean equals sample mean

    minus population mean.


    Sampling error can apply to statistics such as the mean, the variance, the standard deviation or any other value that can be obtained from the sample. The sampling error varies from sample to sample. A good estimator is one whose sampling error distribution is highly concentrated about the population parameter value.

    Sampling error of the mean = sample mean - population mean = $\bar{x} - \mu$

    Sampling error of the standard deviation = sample standard deviation - population standard deviation = $s - \sigma$
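As a small sketch, both sampling errors can be computed with Python's `statistics` module. The population of 100 grades below is randomly generated purely for illustration (it is not data from the text), and the sketch assumes the population standard deviation uses the divide-by-N formula (`pstdev`) while the sample standard deviation uses the divide-by-(n - 1) formula (`stdev`).

```python
import random
import statistics


def sampling_errors(population, sample):
    """Return (x-bar - mu, s - sigma) for a sample drawn from `population`."""
    mean_error = statistics.mean(sample) - statistics.mean(population)
    # Population SD divides by N; sample SD divides by n - 1.
    sd_error = statistics.stdev(sample) - statistics.pstdev(population)
    return mean_error, sd_error


# Hypothetical population: grades of 100 students (illustrative numbers only).
rng = random.Random(0)
population = [rng.randint(40, 95) for _ in range(100)]
sample = rng.sample(population, 10)

mean_err, sd_err = sampling_errors(population, sample)
print(f"sampling error of the mean:    {mean_err:+.2f}")
print(f"sampling error of the std dev: {sd_err:+.2f}")
```

Rerunning with a different seed gives a different sample and therefore different sampling errors, which is the sample-to-sample variation described above.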


    Sampling distribution

    A sample statistic itself is a random variable, which varies depending upon the composition of the sample. It therefore has a probability distribution. The sampling distribution of a statistic is the distribution of all the distinct possible values that the statistic can assume when computed from samples of the same size randomly drawn from the same population. The most commonly used sample statistics include the mean, variance and standard deviation.

    If you compute the mean of a sample of 10 numbers, the value you obtain will not equal the population mean exactly; by chance it will be a little bit higher or a little bit lower. If you sampled sets of 10 numbers over and over again (computing the mean for each set), you would find that some sample means come much closer to the population mean than others. Some would be higher than the population mean and some would be lower. Imagine sampling 10 numbers and computing the mean over and over again, say about 1,000 times, and then constructing a relative frequency distribution of those 1,000 means. This distribution of means is a very good approximation to the sampling distribution of the mean. The sampling distribution of the mean is a theoretical distribution that is approached as the number of samples in the relative frequency distribution increases. With 1,000 samples, the relative frequency distribution is quite close; with 10,000 it is even closer. As the number of samples approaches infinity, the relative frequency distribution approaches the sampling distribution.
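The repeated-sampling thought experiment can be simulated. The sketch below assumes a hypothetical population of 1,000 normally distributed values (mean 50, standard deviation 10, chosen only for illustration), draws 1,000 samples of size 10 without replacement, and keeps each sample mean; the resulting list approximates the sampling distribution of the mean, and increasing `num_samples` makes the approximation closer.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, num_samples, seed=None):
    """Draw `num_samples` random samples of `sample_size` (without replacement)
    and return their means: an empirical approximation of the sampling
    distribution of the mean."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(num_samples)]


# Hypothetical population of 1,000 values (illustrative only).
rng = random.Random(1)
population = [rng.gauss(50, 10) for _ in range(1000)]

means = simulate_sample_means(population, sample_size=10, num_samples=1000, seed=2)
print("population mean:     ", round(statistics.mean(population), 2))
print("mean of sample means:", round(statistics.mean(means), 2))
print("lowest / highest:    ", round(min(means), 2), round(max(means), 2))
```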


    The sampling distribution of the mean for a sample size of 10 is just an example; there is a different sampling distribution for other sample sizes. Also, keep in mind that the relative frequency distribution approaches a sampling distribution as the number of samples increases, not as the sample size increases, since there is a different sampling distribution for each sample size.

    A sampling distribution can also be defined as the relative frequency distribution that would be obtained if all possible samples of a particular sample size were taken. For example, the sampling distribution of the mean for a sample size of 10 would be constructed by computing the mean for each of the possible ways in which 10 scores could be sampled from the population and creating a relative frequency distribution of these means. Although these two definitions may seem different, they are actually the same: both procedures produce exactly the same sampling distribution.
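For a population small enough to list every sample, this second definition can be applied exactly. The sketch uses a hypothetical five-member population and samples of size 2 drawn without replacement; `itertools.combinations` enumerates all 10 possible samples, and `fractions.Fraction` keeps the arithmetic exact, so the mean of the sampling distribution can be checked against the population mean with no rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def exact_sampling_distribution_of_mean(population, n):
    """Relative frequency of each sample mean over ALL samples of size n
    drawn without replacement: {sample mean: proportion of samples}."""
    means = [Fraction(sum(sample), n) for sample in combinations(population, n)]
    counts = Counter(means)
    return {m: Fraction(c, len(means)) for m, c in sorted(counts.items())}


population = [2, 4, 6, 8, 10]   # small hypothetical population
dist = exact_sampling_distribution_of_mean(population, 2)
for m, p in dist.items():
    print(f"sample mean {float(m):>4}: relative frequency {p}")

# The mean of the sampling distribution equals the population mean exactly.
print(sum(m * p for m, p in dist.items()) == Fraction(sum(population), len(population)))  # True
```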

    Statistics other than the mean have sampling distributions too. The sampling distribution of the median is the distribution that would result if the median instead of the mean were computed in each sample. Sampling distributions are very important since almost all inferential statistics are based on sampling distributions.


    Simple random vs. stratified random sampling

    In stratified random sampling, the population is subdivided into subpopulations (strata) based on one or more classification criteria. Simple random samples are then drawn from each stratum, with the size of each sample proportional to the relative size of its stratum in the population. These samples are then pooled.

    It is important to note that the strata do not have to be the same size, or even similar in size, and frequently are not.

    Stratified random sampling guarantees that population subdivisions of interest are represented in the

    sample. The estimates of parameters produced from stratified sampling have greater precision (i.e. smaller

    variance or dispersion) than estimates obtained from simple random sampling.


    For example, investors may want to fully duplicate a bond index by owning all the bonds in the index in proportion to their market value weights. This is known as pure bond indexing. However, it is difficult and costly to implement because a bond index typically consists of thousands of issues. If simple random sampling is used, the sample selected may not accurately reflect the risk factors of the index. Stratified random sampling can be used to replicate the bond index instead.

    Divide the population of index bonds into groups with similar risk factors (e.g. issuer, duration/maturity,

    coupon rate, credit rating, call exposure, etc.). Each group is called a stratum or cell.

    Select a sample from each cell proportional to the relative market weighting of the cell in the index.

    A stratified sample will ensure that at least one issue in each cell is included in the sample.
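The cell-based approach can be sketched as follows, under simplifying assumptions: each bond is a dictionary with hypothetical keys `id`, `cell` and `market_value`; one issue is picked at random from each cell (so every cell is represented); and the amount allocated to that cell is its market-value weight in the index times the portfolio value. A real indexing process would usually hold several issues per cell and match risk factors more carefully.

```python
import random


def stratify_bond_index(bonds, portfolio_value, seed=None):
    """Pick one issue per cell and size each cell's position by its index weight.

    `bonds` is a list of dicts with hypothetical keys 'id', 'cell', 'market_value'.
    Returns {cell: (chosen_bond_id, amount_to_invest)}.
    """
    cells = {}
    for bond in bonds:
        cells.setdefault(bond["cell"], []).append(bond)
    index_value = sum(b["market_value"] for b in bonds)

    rng = random.Random(seed)
    plan = {}
    for cell, members in cells.items():
        weight = sum(b["market_value"] for b in members) / index_value
        chosen = rng.choice(members)  # guarantees every cell is represented
        plan[cell] = (chosen["id"], weight * portfolio_value)
    return plan


# Hypothetical index: cells defined by (credit rating, maturity bucket).
index = [
    {"id": "B1", "cell": ("AAA", "short"), "market_value": 400},
    {"id": "B2", "cell": ("AAA", "short"), "market_value": 100},
    {"id": "B3", "cell": ("AAA", "long"), "market_value": 300},
    {"id": "B4", "cell": ("BBB", "short"), "market_value": 150},
    {"id": "B5", "cell": ("BBB", "long"), "market_value": 50},
]

for cell, (bond_id, amount) in stratify_bond_index(index, 1_000_000, seed=3).items():
    print(cell, bond_id, f"{amount:,.0f}")
```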


    Time-series and cross-sectional data.

    Data come in many different shapes and sizes, and measure many different things at different times. Financial analysts are often interested in particular types of data such as time-series data or cross-sectional data.

    Time-series data are a set of observations collected at discrete and usually equally spaced time intervals. For example, the daily closing price of a certain stock recorded over the last six weeks is an example of time-series data. Note that a time period that is too long or too short may lead to time-period bias; refer to subject g for details.

    Other examples of time-series would be staff numbers at a particular institution taken on a monthly basis in

    order to assess staff turnover rates, weekly sales figures of ice-cream sold during a holiday period at a

    seaside resort and the number of students registered for a particular course on a yearly basis. All of the

    above would be used to forecast likely data patterns in the future.


    Cross-sectional data are observations that come from different individuals or groups at a single point in time. For example, the closing prices of a group of 20 different tech stocks on December 15, 1986 would be an example of cross-sectional data. Note that the underlying population should consist of members with similar characteristics. For example, suppose you are interested in how much companies spend on research and development (R&D). Firms in some industries, such as retail, spend little on R&D, while firms in industries such as technology spend heavily on R&D. Therefore, it is inappropriate to summarize R&D data across all companies. Rather, analysts should summarize R&D data by industry and then analyze the data in each industry group.
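The point about summarizing by industry can be illustrated with a short grouping sketch. The firm names and R&D figures (as a percentage of sales) are made up for illustration; the pooled mean lands between the two industries and describes neither of them well, while the per-industry means are more informative.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical R&D spending as a percentage of sales (illustrative only).
firms = [
    ("RetailCo A", "retail", 0.5),
    ("RetailCo B", "retail", 1.0),
    ("TechCo A", "technology", 14.0),
    ("TechCo B", "technology", 18.0),
]


def summarize_by_group(records):
    """Mean R&D value for each industry group in (name, industry, value) records."""
    groups = defaultdict(list)
    for _name, industry, value in records:
        groups[industry].append(value)
    return {industry: mean(values) for industry, values in groups.items()}


print("pooled mean:", mean(value for _, _, value in firms))  # 8.375
print("by industry:", summarize_by_group(firms))  # {'retail': 0.75, 'technology': 16.0}
```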

    Other examples of cross-sectional data would be: an inventory of all ice creams in stock at a particular

    store, a list of grades obtained by a class of students for a specific test.


    2. The Central Limit Theorem.

    The central limit theorem states that, given a distribution with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the mean $\bar{x}$ approaches a normal distribution with mean $\mu$ and variance $\sigma^2/N$ as $N$, the sample size, increases.

    The amazing and counter-intuitive thing about the central limit theorem is that no matter what the shape of

    the original distribution, x-bar approaches a normal distribution.

    If the original variable X has a normal distribution, then $\bar{x}$ will be normal regardless of the sample size.

    If the original variable X does not have a normal distribution, then $\bar{x}$ will be approximately normal only if N is large; $N \ge 30$ is the usual rule of thumb. This is called a distribution-free result: no matter what distribution X has, $\bar{x}$ will still be approximately normal for sufficiently large N.
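The central limit theorem can be checked by simulation. This sketch assumes an exponential population with mean 1 and variance 1 (strongly right-skewed, so clearly non-normal), draws 5,000 samples for each sample size N, and compares the variance of the sample means with the theoretical $\sigma^2/N$. A histogram of the means for N = 30 should look roughly bell-shaped even though the population is not.

```python
import random
import statistics


def draw_exponential(rng):
    """One draw from an exponential population with mean 1 and variance 1."""
    return rng.expovariate(1.0)


def sample_means(draw, n, num_samples, rng):
    """Means of `num_samples` independent samples of size n, each value from draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(num_samples)]


rng = random.Random(7)
for n in (2, 30, 100):
    means = sample_means(draw_exponential, n, 5000, rng)
    print(f"N={n:>3}: mean of x-bar = {statistics.fmean(means):.3f}, "
          f"variance of x-bar = {statistics.variance(means):.4f} "
          f"(theory sigma^2/N = {1 / n:.4f})")
```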

    Keep in mind that N is the sample size for each mean and not the number of samples. Remember in a

    sampling distribution the number of samples is assumed to be infinite. The sample size is the number of

    scores in each sample; it is the number of scores that goes into the computation of each mean.


    Two things should be noted about the effect of increasing N:

    1. The distributions become more and more normal.

    2. The spread of the distributions decreases.

    Based on the central limit theorem, when the sample size is large, you can:

    1. Use the sample mean to infer the population mean.

    2. Construct confidence intervals for the population mean based on the normal distribution.

    Note that the central limit theorem does not require the underlying population to be normally distributed. Therefore, the central limit theorem can be applied to a population with any probability distribution, provided that distribution has a finite variance.


    3. Standard Error of the Sample Mean.

    The standard error of a statistic is the standard deviation of the sampling distribution of that statistic. Standard errors are important because they reflect how much sampling fluctuation a statistic will show. The inferential statistics involved in the construction of confidence intervals and significance testing are based on standard errors. The standard error of a statistic depends on the sample size. In general, the larger the sample size, the smaller the standard error. The standard error of a statistic is usually designated by the Greek letter sigma ($\sigma$) with a subscript indicating the statistic.

    The standard error of the mean is designated $\sigma_m$. It is the standard deviation of the sampling distribution of the mean. The formula for the standard error of the mean is $\sigma_m = \sigma/\sqrt{N}$, where $\sigma$ is the standard deviation of the original distribution and $N$ is the sample size (the number of scores each mean is based upon). This formula does not assume a normal distribution. However, many of the uses of the formula do assume a normal distribution. The formula shows that the larger the sample size, the smaller the standard error of the mean. More specifically, the size of the standard error of the mean is inversely proportional to the square root of the sample size.
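A one-line helper makes the formula concrete. The function name is illustrative; the loop shows the inverse square-root relationship, where each fourfold increase in the sample size halves the standard error (using a hypothetical $\sigma = 20$).

```python
import math


def standard_error_of_mean(sigma, n):
    """sigma_m = sigma / sqrt(N), the SD of the sampling distribution of the mean."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)


# Each fourfold increase in N halves the standard error (hypothetical sigma = 20).
for n in (25, 100, 400):
    print(n, standard_error_of_mean(20, n))
# 25 4.0
# 100 2.0
# 400 1.0
```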


    Example 2

    Suppose that the mean grade of students in a class is unknown, but a sample of 30 students is taken from

    the class, and the mean from the sample is found to be 60%, with a standard deviation of 9%. Calculate the

    standard error of the sample mean, and interpret your results.

    Here $\mu$ and $\sigma$ are unknown, but the sample mean $m$ is given as 60 and the sample standard deviation $s$ is given as 9. Since n = 30, you can estimate the standard error of the sample mean as $s/\sqrt{n} = 9/\sqrt{30} = 1.6432$.

    This means that if you took all possible samples of size 30 from the class, the sample means would be expected to vary around the population mean with a standard deviation of about 1.64 percentage points.

    It is important to note that when you know $\sigma$, you must use it; when you don't, you use its sample equivalent $s$.
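The arithmetic in Example 2 can be reproduced in a couple of lines; because $\sigma$ is unknown, the sample standard deviation $s$ stands in for it.

```python
import math

s = 9.0   # sample standard deviation (percentage points)
n = 30    # sample size
standard_error = s / math.sqrt(n)  # s / sqrt(n) stands in for sigma / sqrt(n)
print(round(standard_error, 4))    # 1.6432
```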


    4. Estimators.

    Very often, there are a number of different estimators that can be used to estimate unknown population parameters. When faced with such a choice, it is desirable to know that the estimator chosen is the "best" under the circumstances, that is, that it has more desirable properties than any of the other options available. There are three desirable properties of estimators:

    1. Unbiasedness: An estimator's expected value (the mean of its sampling distribution) equals the parameter it is intended to estimate. For example, the sample mean is an unbiased estimator of the population mean, because the expected value of the sample mean is equal to the population mean.

    2. Efficiency: An estimator is efficient if no other unbiased estimator of the same parameter has a sampling distribution with smaller variance. That is, in repeated samples, analysts expect the estimates from an efficient estimator to be more tightly grouped around the parameter than estimates from other unbiased estimators. For example, the sample mean is an efficient estimator of the population mean, and the sample variance is an efficient estimator of the population variance.

    3. Consistency: A consistent estimator is one for which the probability of accurate estimates (estimates close to the value of the population parameter) increases as sample size increases. In other words, a consistent estimator's sampling distribution becomes concentrated on the value of the parameter it is intended to estimate as the sample size approaches infinity. For example, as the sample size increases to infinity, the standard error of the sample mean declines to 0, and the sampling distribution concentrates around the population mean. Therefore, the sample mean is a consistent estimator of the population mean.
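Unbiasedness can be verified exactly on a tiny example. The sketch assumes a hypothetical three-value population and samples of size 2 drawn independently (with replacement), enumerates every equally likely ordered sample, and averages two versions of the sample variance. Dividing by n - 1 gives an average equal to the population variance, while dividing by n comes out too small by the factor (n - 1)/n, which is why the sample variance uses n - 1.

```python
from fractions import Fraction
from itertools import product


def exact_variance(values, ddof):
    """Variance with divisor len(values) - ddof (ddof=0: population, ddof=1: sample)."""
    vals = [Fraction(v) for v in values]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - ddof)


population = [1, 3, 8]                        # small hypothetical population
sigma2 = exact_variance(population, ddof=0)   # population variance (divide by N)

n = 2
# Every ordered sample of size n drawn with replacement is equally likely.
samples = list(product(population, repeat=n))
avg_unbiased = sum(exact_variance(s, ddof=1) for s in samples) / len(samples)
avg_biased = sum(exact_variance(s, ddof=0) for s in samples) / len(samples)

print("population variance:        ", sigma2)        # 26/3
print("average s^2, divide by n-1: ", avg_unbiased)  # 26/3 (unbiased)
print("average s^2, divide by n:   ", avg_biased)    # 13/3 (biased low)
```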


    The single estimate of an unknown population parameter calculated as a sample mean is called a point estimate of the mean. The formula used to compute the point estimate is called an estimator. The specific value calculated from sample observations using an estimator is called an estimate. For example, the sample mean is a point estimate of the population mean. Suppose two samples are taken from a population, and the sample means are 16 and 21, respectively. Then 16 and 21 are two estimates of the population mean. Note that an estimator will yield different estimates as repeated samples are taken from the same population.

    A confidence interval is an interval for which one can assert with a given probability $1 - \alpha$, called the degree of confidence, that it will contain the parameter it is intended to estimate. This interval is often referred to as the $100(1 - \alpha)\%$ confidence interval for the parameter, where $\alpha$ is referred to as the level of significance. The end points of a confidence interval are called the lower and upper confidence limits.

    For example, suppose that a 95% confidence interval for the population mean is 20 to 40. This means that:

    We can be 95% confident that the population mean lies in the range of 20 to 40;

    "95%" is the degree of confidence; "5%" is the level of significance;

    20 and 40 are the lower and upper confidence limits, respectively.


    5. Confidence Intervals for the Population Mean.

    Confidence intervals are typically constructed by using the following structure:

    Confidence Interval = Point Estimate ± Reliability Factor × Standard Error

    Point estimate is the value of a sample statistic used to estimate the population parameter.

    Reliability factor is a number based on the sampling distribution of the point estimate and the degree of confidence $(1 - \alpha)$.

    Standard error refers to the standard error of the sample statistic that is used to produce the point estimate.

    Whatever the distribution of the population, the sample mean is always the point estimate used to construct

    the confidence intervals for the population mean. The reliability factor and the standard error, however,

    may vary depending on three factors:

    1. Distribution of population: normal or non-normal.

    2. Population variance: known or unknown.

    3. Sample size: large or small.


    z-Statistic: a standard normal random variable

    If a population is normally distributed with a known variance, the z-statistic is used as the reliability factor to construct confidence intervals for the population mean.

    In practice, the population standard deviation is rarely known. However, learning how to compute a confidence interval when the standard deviation is known is an excellent introduction to how to compute a confidence interval when the standard deviation has to be estimated.

    Three values are used to construct a confidence interval for $\mu$:

    1. The sample mean ($m$);

    2. The value of $z$ (which depends on the level of confidence); and

    3. The standard error of the mean ($\sigma_m$).

    The confidence interval has $m$ for its center and extends a distance equal to the product of $z$ and $\sigma_m$ in both directions. Therefore, the formula for a confidence interval is:

    $$m - z\sigma_m \le \mu \le m + z\sigma_m$$


    For a $100(1 - \alpha)\%$ confidence interval for the population mean, the z-statistic to be used is $z_{\alpha/2}$, the point of the standard normal distribution such that $\alpha/2$ of the probability falls in the right-hand tail. Effectively, the $1 - \alpha$ share of the area that makes up the confidence interval falls in the center of the graph, symmetrically around the mean. This leaves $\alpha$ of the area in the two tails combined, or $\alpha/2$ in each tail.

    Commonly used reliability factors are as follows:

    90% confidence intervals: $z_{0.05} = 1.645$; $\alpha$ is 10%, with 5% in each tail.

    95% confidence intervals: $z_{0.025} = 1.96$; $\alpha$ is 5%, with 2.5% in each tail.

    99% confidence intervals: $z_{0.005} = 2.576$ (often quoted as 2.575 or 2.58); $\alpha$ is 1%, with 0.5% in each tail.
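These reliability factors can be recovered from the inverse cumulative distribution function of the standard normal distribution. The sketch uses `statistics.NormalDist` from Python's standard library (available since Python 3.8); `z_reliability_factor` is an illustrative helper name.

```python
from statistics import NormalDist


def z_reliability_factor(confidence):
    """z_{alpha/2}: the standard normal value with alpha/2 of the area to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_reliability_factor(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576
```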


    Example

    Assume that the standard deviation of SAT verbal scores in a school system is known to be 100. A

    researcher wishes to estimate the mean SAT score and compute a 95% confidence interval from a random

    sample of 10 scores.

    The 10 scores are: 320, 380, 400, 420, 500, 520, 600, 660, 720, and 780. Therefore, $m = 530$, $N = 10$, and $\sigma_m = 100/\sqrt{10} = 31.62$. The value of $z$ for the 95% confidence interval is the number of standard deviations one must go from the mean (in both directions) to contain 0.95 of the scores.

    It turns out that one must go 1.96 standard deviations from the mean in both directions to contain 0.95 of the scores; the value of 1.96 was found using a z table. Since each tail is to contain 0.025 of the scores, you find the value of z for which 1 - 0.025 = 0.975 of the scores are below. This value is 1.96.


    All the components of the confidence interval are now known: $m = 530$, $\sigma_m = 31.62$, $z = 1.96$.

    Lower limit = 530 - (1.96)(31.62) = 468.02

    Upper limit = 530 + (1.96)(31.62) = 591.98

    Therefore, $468.02 \le \mu \le 591.98$. This means that the experimenter can be 95% confident that the mean SAT score in the school system is between 468 and 592. More precisely, if the experimenter repeatedly took samples from the population and calculated a 95% confidence interval from each sample, on average 95% of those intervals would contain $\mu$. Notice that this is a rather large range of scores. Naturally, if a larger sample size had been used, the range of scores would have been smaller.
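The repeated-sampling interpretation can be checked by simulation. This sketch assumes a hypothetical normal population whose true mean is 530 and standard deviation is 100 (values chosen only to mirror the SAT example; in practice $\mu$ is unknown), builds 20,000 intervals of the form $m \pm 1.96\,\sigma_m$ from samples of size 10, and reports the fraction that contain $\mu$, which should come out close to 0.95.

```python
import math
import random
import statistics


def coverage_of_z_intervals(mu, sigma, n, z, trials, seed=None):
    """Share of intervals m +/- z * sigma / sqrt(n) that contain mu, when each
    sample of size n is drawn from a normal population with mean mu, SD sigma."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        m = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if m - z * se <= mu <= m + z * se:
            hits += 1
    return hits / trials


# Hypothetical "true" parameters, chosen only to mirror the SAT example.
print(coverage_of_z_intervals(mu=530, sigma=100, n=10, z=1.96, trials=20_000, seed=5))
```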


    The computation of the 99% confidence interval is exactly the same except that $z = 2.576$ (often rounded to 2.58) rather than 1.96 is used. The 99% confidence interval is $448.54 \le \mu \le 611.46$. As it must be, the 99% confidence interval is even wider than the 95% confidence interval.

    Summary of Computations

    1. Compute $m = \sum X / N$.

    2. Compute $\sigma_m = \sigma/\sqrt{N}$.

    3. Find $z$ (1.96 for a 95% interval; 2.58 for a 99% interval).

    4. Lower limit $= m - z\sigma_m$

    5. Upper limit $= m + z\sigma_m$

    6. Lower limit $\le \mu \le$ Upper limit

    Assumptions:

    1. Normal distribution

    2. $\sigma$ is known

    3. Scores are sampled randomly and are independent
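The summary steps translate directly into a short function. This is a sketch with an illustrative name; the reliability factor $z$ is passed in (step 3), and the function returns the lower and upper limits (steps 4 to 6). Applied to the SAT scores with $\sigma = 100$, it reproduces the 95% interval, and with $z = 2.576$ (the unrounded value behind the 448.54 and 611.46 limits) the 99% interval; using the rounded 2.58 would widen the 99% limits slightly.

```python
import math


def z_confidence_interval(scores, sigma, z):
    """Confidence interval for mu when sigma is known (summary steps 1-6)."""
    n = len(scores)
    m = sum(scores) / n            # step 1: sample mean
    se = sigma / math.sqrt(n)      # step 2: standard error of the mean
    # step 3: z is supplied by the caller (e.g. 1.96 for 95%, 2.576 for 99%)
    return m - z * se, m + z * se  # steps 4-6: lower and upper limits


sat = [320, 380, 400, 420, 500, 520, 600, 660, 720, 780]
low95, high95 = z_confidence_interval(sat, sigma=100, z=1.96)
low99, high99 = z_confidence_interval(sat, sigma=100, z=2.576)
print(f"95%: {low95:.2f} <= mu <= {high95:.2f}")  # 95%: 468.02 <= mu <= 591.98
print(f"99%: {low99:.2f} <= mu <= {high99:.2f}")  # 99%: 448.54 <= mu <= 611.46
```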


    There are three other points worth mentioning here:

    The point estimate will always lie exactly at the midpoint of the confidence interval. This is because it is the "best" estimate for $\mu$, and so the confidence interval extends out from it equally in both directions.

    The higher the level of confidence, the wider the interval will be. This is because as the confidence level is increased, a wider interval is needed to give a greater chance of capturing the unknown population value within that interval.

    The width of the confidence interval is always twice the part after the ± sign, that is, twice the reliability factor × standard error. The width is simply the upper limit minus the lower limit.

    It is very rare for a researcher wishing to estimate the mean of a population to already know its standard deviation. Therefore, the construction of a confidence interval almost always involves the estimation of both $\mu$ and $\sigma$.


    STUDENT'S T-DISTRIBUTION

    When $\sigma$ is known, the formula $m - z\sigma_m \le \mu \le m + z\sigma_m$ is used for a confidence interval. When $\sigma$ is not known, $s$ is used as an estimate of $\sigma$, and $s_m = s/\sqrt{N}$ (where $N$ is the sample size) is used as an estimate of $\sigma_m$. Whenever the standard deviation is estimated, the t distribution rather than the normal (z) distribution should be used. For the same level of confidence, the values of t are larger than the values of z, so confidence intervals when $\sigma$ is estimated are wider than confidence intervals when $\sigma$ is known. The formula for a confidence interval for $\mu$ when $\sigma$ is estimated is:

    $$m - t s_m \le \mu \le m + t s_m$$

    where $m$ is the sample mean, $s_m$ is an estimate of $\sigma_m$, and $t$ depends on the degrees of freedom and the level of confidence.


The t-distribution is a symmetrical probability distribution defined by a single parameter known as degrees of freedom (df). Each value for the number of degrees of freedom defines one distribution in this family of distributions. Like a standard normal distribution (i.e. a z-distribution), the t-distribution is symmetrical around its mean. The t-distribution has the following characteristics:

It can be thought of as an estimated standard normal distribution, in which the unknown $\sigma$ is replaced by its estimate $s$. As n gets larger, $s$ approaches $\sigma$ and t approaches z.

The mean is 0, and the distribution is bell-shaped. There is not one t-distribution but a family of t-distributions. All t-distributions have the same mean of 0, but their standard deviations differ according to the degrees of freedom and hence the sample size, n.

The shape depends on the degrees of freedom (n - 1). The t-distribution is less peaked than a standard normal distribution and has fatter tails (i.e. more probability in the tails).

$t_{\alpha/2}$ is greater than $z_{\alpha/2}$ for a given level of significance, $\alpha$.

Its variance is $v/(v-2)$ for $v > 2$, where $v = n - 1$. The variance is always greater than 1, and it approaches 1 as $v$ increases.
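As a quick numerical check of the last property, the sketch below evaluates $v/(v-2)$ for a few sample sizes chosen arbitrarily for illustration. It only restates the formula above; `t_variance` is a hypothetical helper name, not a library function.

```python
def t_variance(v: int) -> float:
    """Variance of a t-distribution with v degrees of freedom (finite only for v > 2)."""
    if v <= 2:
        raise ValueError("the variance is not finite for v <= 2")
    return v / (v - 2)

# Degrees of freedom v = n - 1 for a sample of size n.
for n in (5, 11, 31, 101, 1001):
    v = n - 1
    print(f"n = {n:5d}  v = {v:5d}  variance = {t_variance(v):.4f}")
```

The printed variances fall toward 1 as n grows (for example, 2.0000 at n = 5 and 1.2500 at n = 11), matching the statement that t approaches z for large samples.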


The value of t can be determined from a t table. The degrees of freedom for t equal the degrees of freedom for the estimate of $\sigma_m$, which is $N - 1$.


A portion of the t-table is presented below. Each column gives the level of significance ($\alpha$) for a two-tailed test:

| df | 0.20 | 0.10 | 0.05 | 0.02 | 0.01 |
|----|------|------|------|------|------|
| 1 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657 |
| 2 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 |
| 29 | 1.311 | 1.699 | 2.045 | 2.462 | 2.756 |
| 30 | 1.310 | 1.697 | 2.042 | 2.457 | 2.750 |

Suppose the sample size (n) is 30 and the level of significance ($\alpha$) is 5%. Then df = n - 1 = 29, and $t_{\alpha/2} = t_{0.025} = 2.045$ (find the 29 df row, then move to the 0.05 column).
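The table values can also be reproduced numerically. The sketch below is one possible approach, assuming only Python's standard library: it integrates the Student's t density with Simpson's rule and uses bisection to find the two-tailed critical value. The helper names (`t_pdf`, `upper_tail`, `t_critical`) are hypothetical; in practice a statistics package or a printed table would normally be used. The last line prints $z_{0.025}$ for comparison, illustrating that $t_{\alpha/2}$ exceeds $z_{\alpha/2}$.

```python
import math
from functools import lru_cache
from statistics import NormalDist

@lru_cache(maxsize=None)
def _log_norm_const(v: int) -> float:
    """Log of the normalising constant of the t density with v degrees of freedom."""
    return math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(v * math.pi)

def t_pdf(x: float, v: int) -> float:
    """Density of Student's t-distribution with v degrees of freedom."""
    return math.exp(_log_norm_const(v)) * (1 + x * x / v) ** (-(v + 1) / 2)

def upper_tail(x: float, v: int, h: float = 0.01) -> float:
    """P(T > x) for x >= 0: 0.5 minus the integral of the density over [0, x] (Simpson's rule)."""
    n = max(2, 2 * math.ceil(x / (2 * h)))  # Simpson's rule needs an even number of steps
    step = x / n
    total = t_pdf(0.0, v) + t_pdf(x, v)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * step, v)
    return 0.5 - total * step / 3

def t_critical(v: int, alpha: float) -> float:
    """Two-tailed critical value t_{alpha/2}, i.e. P(|T| > t) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while upper_tail(hi, v) > alpha / 2:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, v) > alpha / 2:
            lo = mid  # tail still too large: the critical value lies further out
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 2, 29, 30):
    print(df, [round(t_critical(df, a), 3) for a in (0.20, 0.10, 0.05, 0.02, 0.01)])
print("z:", round(NormalDist().inv_cdf(1 - 0.05 / 2), 3))  # 1.96
```

Each printed row should agree with the corresponding table row to about three decimals; the bisection is slowest for df = 1, where the critical values are far out in the tail.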


Example

Assume a researcher is interested in estimating the mean reading speed (number of words per minute) of high-school graduates and computing the 95% confidence interval. A sample of 6 graduates was taken, and the reading speeds were: 200, 240, 300, 410, 450, and 600. For these data,

$m = 366.6667$

$s_m = 60.9736$

df = 6 - 1 = 5

t = 2.571 (two-tailed, $\alpha$ = 0.05, df = 5)

Therefore, the lower limit is $m - t\,s_m = 209.904$ and the upper limit is $m + t\,s_m = 523.430$, so the 95% confidence interval is:

$$209.904 \le \mu \le 523.430$$

Thus, the researcher can be 95% confident that the mean reading speed of high-school graduates is between 209.904 and 523.430.


Summary of Computations

1. Compute $m = \sum X / N$.
2. Compute $s$.
3. Compute $s_m = s/N^{1/2}$.
4. Compute df = N - 1.
5. Find t for these df using a t table.
6. Lower limit = $m - t\,s_m$
7. Upper limit = $m + t\,s_m$
8. Lower limit $\le \mu \le$ Upper limit

Assumptions:

1. Normal distribution
2. Scores are sampled randomly and are independent
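The eight steps translate almost line for line into code. The sketch below is a minimal illustration, not a general tool: `t_confidence_interval` and the small `T_95` lookup are hypothetical names, and the lookup holds only the two-tailed 0.05 values from the t-table plus the df = 5 value used in the reading-speed example. Applied to the example data, it should reproduce the interval computed above.

```python
import math
import statistics

# Two-tailed critical values for alpha = 0.05 (95% confidence). The df = 1, 2, 29, 30
# entries come from the t-table and df = 5 from the example; this is not a full table.
T_95 = {1: 12.706, 2: 4.303, 5: 2.571, 29: 2.045, 30: 2.042}

def t_confidence_interval(scores, t_table=T_95):
    n = len(scores)
    m = sum(scores) / n               # 1. sample mean
    s = statistics.stdev(scores)      # 2. sample standard deviation (n - 1 divisor)
    s_m = s / math.sqrt(n)            # 3. estimated standard error of the mean
    df = n - 1                        # 4. degrees of freedom
    t = t_table[df]                   # 5. look up t (KeyError if df is not in the lookup)
    lower = m - t * s_m               # 6. lower limit
    upper = m + t * s_m               # 7. upper limit
    return lower, upper               # 8. lower <= mu <= upper

lower, upper = t_confidence_interval([200, 240, 300, 410, 450, 600])
print(f"{lower:.3f} <= mu <= {upper:.3f}")  # 209.904 <= mu <= 523.430
```

Passing the t value in through a lookup keeps step 5 explicit; it could be replaced by any source of critical values without changing the other steps.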


    Discuss the issues surrounding selection of the appropriate sample size

It's all starting to become a little confusing. Which distribution do you use?

When a large sample (generally 30 or more observations) is used, a z table can always be used to construct the confidence interval. It does not matter whether the population distribution is normal or whether the population variance is known. This is because the central limit theorem ensures that, when the sample is large, the distribution of the sample mean is approximately normal. However, when the population variance is unknown, the t-statistic is more conservative: its critical values are greater than those of z, so using the t-statistic results in a wider confidence interval.

If the sample is small, however, a t table has to be used to construct the confidence interval when the population distribution is normal and the population variance is unknown.


    If the population distribution is not normal, there is no way to construct a confidence interval from a small

    sample (even if the population variance is known).

Therefore, all else equal, you should try to select a sample of 30 or more observations. The larger the sample size, the more precise (narrower) the confidence interval.
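The gain in precision from a larger sample can be made concrete. Assuming a known population standard deviation, the full width of a z-based interval is $2z\sigma/\sqrt{n}$, so it shrinks with the square root of n: quadrupling the sample halves the width. In the sketch below, $\sigma = 0.20$ and the sample sizes are assumed values chosen only for illustration, and `z_interval_width` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def z_interval_width(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Full width 2 * z * sigma / sqrt(n) of a z-based confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return 2 * z * sigma / math.sqrt(n)

# Assumed population standard deviation of 0.20; each step quadruples n.
for n in (30, 120, 480):
    print(n, round(z_interval_width(0.20, n), 4))
```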

    In general, at least one of the following is needed:

    A normal distribution for the population.

    A sample size that is greater than or equal to 30.

If one or both of the above hold, then a z-table or t-table is used, depending on whether $\sigma$ is known or unknown. If neither holds, a confidence interval for the mean cannot be constructed.


    A summary of the situation is as follows:

    If the population is normally distributed, and the population variance is known, use a z-score

    irrespective of sample size.

    If the population is normally distributed, and the population variance is unknown, use a t-score

    irrespective of sample size.

If the population is not normally distributed, and the population variance is known, use a z-score only if n ≥ 30; otherwise it cannot be done.

If the population is not normally distributed, and the population variance is unknown, use a t-score only if n ≥ 30; otherwise it cannot be done.
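The four rules can be written as a small decision function. This sketch encodes the summary list literally, including the n ≥ 30 cut-off; `choose_statistic` is a hypothetical name, and returning `None` stands for "it cannot be done".

```python
from typing import Optional

def choose_statistic(population_normal: bool, variance_known: bool, n: int) -> Optional[str]:
    """Return "z", "t", or None when no confidence interval can be constructed."""
    if not population_normal and n < 30:
        return None  # small sample from a non-normal population
    # Normal population (any n) or a large sample: the choice depends only on the variance.
    return "z" if variance_known else "t"

print(choose_statistic(True, False, 12))   # t
print(choose_statistic(False, True, 50))   # z
print(choose_statistic(False, False, 20))  # None
```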


    6. Common biases in sampling methods.

    As has already been mentioned repeatedly, if there are problems with the choice of sample, then the

    conclusions that are drawn from a sample could be in error.

    There are a number of different types of bias that can creep into samples. It is important to be aware of

    them, and have the ability to comment on their possible appearance in the data where appropriate.

    Data-snooping bias is the bias in the inference drawn as a result of prying into the empirical results of

    others to guide your own analysis.

    Finding seemingly significant but in fact spurious patterns in the data is a serious problem in financial

    analysis. Although it afflicts all non-experimental sciences, data-snooping is particularly problematic for

    financial analysis because of the large number of empirical studies performed on the same datasets. Given

    enough time, enough attempts, and enough imagination, almost any pattern can be teased out of any

    dataset. In some cases, these spurious patterns are statistically small, almost unnoticeable in isolation. But

    because small effects in financial calculations can often lead to very large differences in investment

    performance, data-snooping biases can be surprisingly substantial.


For example, after examining the empirical evidence from 1986 to 2002, Professor Minard concludes that a growth investment strategy produces superior investment performance. After reading about Professor Minard's study, Monica decides to conduct research on growth versus value investing based on the same or related historical data used by Professor Minard. Monica's research is subject to data-snooping bias because, among other things, the pattern Professor Minard found in those data may be spurious.

    The best way to avoid data-snooping bias is to examine new data. However, data-snooping bias is difficult

    to avoid because investment analysis is typically based on historical or hypothesized data.

    Data-snooping bias can easily lead to data-mining bias.

Data-mining is the practice of finding forecasting models by extensive searching through databases for patterns or trading rules (i.e. repeatedly "drilling" in the same data until you find something). In its narrow sense, it means continually mixing and matching the elements of a database until one "discovers" two or more data series that are highly correlated. More generically, data-mining refers to any of a number of practices in which data can be tortured into confessing anything.


    Two signs may indicate the existence of data-mining in research findings about profitable trading

    strategies:

1. Many of the variables actually used in the research are not reported. This may indicate that the researchers were searching through many unreported variables.

    2. There is no plausible economic theory available to explain why those strategies work.

To avoid data-mining, analysts should use out-of-sample data to test a potentially profitable trading rule.

    That is, analysts should test the trading rule on a data set other than the one used to establish the rule.
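A small simulation illustrates why mined rules need out-of-sample tests. Under assumed parameters (500 hypothetical rules, 60 months of returns each, all pure noise with a true mean of zero), roughly 5% of the rules typically clear a |t| > 2 hurdle in sample purely by chance, and the best in-sample rule has no genuine edge to carry into new data. The seed, volatility, and threshold are arbitrary choices for illustration.

```python
import math
import random
import statistics

def t_stat(returns):
    """t-statistic for the hypothesis that the mean return is zero."""
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(len(returns)))

rng = random.Random(42)        # arbitrary seed for reproducibility
N_RULES, N_MONTHS = 500, 60    # assumed sizes, for illustration only

# Each hypothetical "trading rule" earns pure noise: true mean 0, 5% monthly volatility.
in_sample = [[rng.gauss(0.0, 0.05) for _ in range(N_MONTHS)] for _ in range(N_RULES)]
out_sample = [[rng.gauss(0.0, 0.05) for _ in range(N_MONTHS)] for _ in range(N_RULES)]

t_in = [t_stat(r) for r in in_sample]
hits = sum(abs(t) > 2.0 for t in t_in)  # rules that look "significant" in sample
best = max(range(N_RULES), key=lambda i: t_in[i])

print(f"{hits} of {N_RULES} noise rules look significant in sample")
print(f"best rule: in-sample t = {t_in[best]:.2f}, "
      f"out-of-sample t = {t_stat(out_sample[best]):.2f}")
```

Because the out-of-sample returns are drawn independently, the best rule's out-of-sample t-statistic is just another draw centred on zero, whatever its in-sample record.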

Sample selection bias occurs when data availability leads to certain assets being excluded from the analysis. Non-response is one source: discrete choice analysis has become a popular tool for assessing the value of non-market goods, but the surveys used in these studies frequently suffer from large non-response, which can lead to significant bias in parameter estimates and in the estimate of the mean.


    Survivorship bias is the most common type of sample selection bias. It occurs when studies are conducted on

    databases that have eliminated all companies that have ceased to exist (often due to bankruptcy). The findings

from such studies will most likely be upwardly biased, since the surviving companies will look better than those that no longer exist. For example, many mutual fund

    databases provide historical data about only those funds that are currently in existence. As a result, funds that

    have ceased to exist due to closure or merger do not appear in these databases. Generally, funds that have

    ceased to exist have lower returns relative to the surviving funds. Therefore, the analysis of a mutual fund

    database with survivorship bias will overestimate the average mutual fund return because the database only

includes the better-performing funds. Return data on stocks listed on an exchange are another example of survivorship bias: it is difficult to collect information on delisted companies, and these companies often have poor performance.
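The upward bias can be illustrated with a hypothetical fund universe. The sketch assumes 1,000 funds with average annual returns drawn around 6% and a made-up closure rule (funds averaging below 2% disappear from the database); neither figure comes from real data. Because only the lower returns are dropped, the survivors' average can never be below the full-universe average.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed

# Hypothetical universe: average annual returns of 1,000 funds, centred on 6%.
all_funds = [rng.gauss(0.06, 0.04) for _ in range(1000)]

# Assumed closure rule: funds averaging below 2% a year close or merge and
# therefore drop out of a survivors-only database.
survivors = [r for r in all_funds if r >= 0.02]

print(f"funds in database: {len(survivors)} of {len(all_funds)}")
print(f"mean return, all funds:      {statistics.mean(all_funds):.4f}")
print(f"mean return, survivors only: {statistics.mean(survivors):.4f}")
```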

    Look-ahead bias exists when studies assume that fundamental information is available when it is not. For

    example, researchers often assume a person had annual earnings data in January; in reality the data might not

    be available until March. This usually biases results upwards.
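One common safeguard, sketched here under assumed data, is a point-in-time lookup: a figure is used only if it had been published by the date of the decision. The records and publication dates below are hypothetical, and `latest_known_eps` is an illustrative helper.

```python
from datetime import date

# Hypothetical records: (fiscal year, EPS, date the annual report became public).
earnings = [
    (2001, 1.10, date(2002, 3, 15)),
    (2002, 1.25, date(2003, 3, 14)),
]

def latest_known_eps(as_of: date):
    """Most recent EPS that had actually been published on `as_of` (point-in-time lookup)."""
    known = [(year, eps) for year, eps, published in earnings if published <= as_of]
    return max(known)[1] if known else None  # max() picks the latest fiscal year

# Deciding in January 2003, the FY2002 figure is not yet public, so FY2001 must be used.
print(latest_known_eps(date(2003, 1, 31)))  # 1.1
print(latest_known_eps(date(2003, 4, 1)))   # 1.25
```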

Time period bias occurs when a test design is based on a time period that may make the results time-period specific. Even the worst performers have months or even years in which they look wonderful. After all, stopped clocks are right twice a day. To eliminate strategies that have just been lucky, research must encompass many years. However, if the time period is too long, the fundamental economic structure may have changed during the time frame, resulting in two data sets that reflect different relationships.
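The "stopped clock" effect can be shown with simulated data. In the sketch below, ten years of monthly returns are pure noise with an assumed 4% monthly volatility, yet picking the best 12-month window after the fact always makes the series look at least as good as its full-period record, and typically much better. The seed and volatility are arbitrary.

```python
import random
import statistics

rng = random.Random(3)  # arbitrary seed
monthly = [rng.gauss(0.0, 0.04) for _ in range(120)]  # ten years of zero-mean noise

full_mean = statistics.mean(monthly)
# Mean return of every 12-month window (months i+1 .. i+12).
windows = [statistics.mean(monthly[i:i + 12]) for i in range(len(monthly) - 11)]
best_start = max(range(len(windows)), key=windows.__getitem__)

print(f"full-period mean monthly return: {full_mean:.4f}")
print(f"best 12-month window (months {best_start + 1}-{best_start + 12}): "
      f"{windows[best_start]:.4f}")
```

The best window can never fall below the full-period mean, because the ten non-overlapping calendar-year windows are among those searched and their average equals the full-period mean.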

For FREE Resources

    https://www.educorporatebridge.com/freebies.php

    Corporate Bridge Blog

    Finance News, Articles, Interview Tips etc

    https://www.educorporatebridge.com/blog

    For Online Finance Courses

    For any other enquiry / information

    Email [email protected]

https://www.educorporatebridge.com

    Disclaimer Please refer to the updated curriculum of CFA level 1 for further information
