
Topic 1: Sampling and Sampling Distributions Chapter 6

Objective: To draw inferences about population parameters on the basis of _______ information.

In Economics 245 (Descriptive Statistics) we saw that statistics are largely applied to describing population parameters, and estimating or testing hypotheses about population characteristics.

Example:
  • measures of _______ tendency, such as the population mean, median, and mode;
  • measures of dispersion, such as the population variance, standard deviation, coefficient of variation.


If all values of a population are k____, we can determine these parameters. But for many populations, there are m______ and time constraints to the act of gathering data from an entire population to determine the population parameter of interest. →Hence, samples are taken to _______ the population parameters.

In Topic 1 we will explore how a sample is taken, and how the information generated from the sample can be applied.


To deal with the issues of proper sampling, we must ask the following questions:
1) What are the most c___ efficient means of collecting samples that generate the best representation of the population?
2) How do we clearly d_______ sample information in the most useful manner?
3) How do we generate i________ about the population from sample summaries?
4) How a_______ are the inferences generated from sample information?


Sample Design To answer question #1, we must describe how sample data is collected. (i.e. describe the procedure or plan of action.) Definition: Sample Design: is a pre-determined p___ depicting how to collect a sample from a given population.

That is, before any data collection takes place, the researcher p____ how he or she will actually collect the data.

The primary objective of the sample design is to collect a sample that accurately r________ the population that one is trying to replicate by a sample.


Ideally, the sample contains population “characteristics” in the same proportion or c____________. Ex. Suppose some population contains a 6.5% unemployment rate. (Obtained from the Census). The sample must contain ≈6.5 % unemployed. Ex. 50% of the population is over 60 years old. Want a sample that has 50% of the people surveyed over 60 years old.


E.g. Income structure: 10% of the population are in a 50% income-tax bracket; 55% of the population are in a 40% income-tax bracket; 35% of the population are in a <40% income-tax bracket. We want a sample that contains the same proportions of each income group. Used for determining an election platform: more jobs versus tax cuts.


There are many ways of collecting a ___ sample. Let us look at some of the e_____ that can be created from bad sampling designs:

(I) N__-S_______ Errors: these are errors that include all different human errors, such as mistakes in collecting, analysing and reporting data. Examples: i) Wrong p_________ is sampled: An error of misrepresentation can arise when the wrong population is sampled by mistake. i.e. A survey question, when applied to a certain sample, misrepresents the true population response because the sample did not accurately replicate the population.


Example: “Should hours of operation for bars downtown be extended?” asked to the average person in a bar at 2 a.m. in downtown Vancouver. The positive or negative responses may be higher or lower than the Canadian average.


ii) Response ____: Another type of error, known as response bias, affects results in surveys.

Due to: poor _______ on questionnaires.

Interview techniques that cause the interviewee to respond to a question that does not reflect his / her true opinion. Distorts the truth. Example: Air quality control survey: “Do you drive an automobile that is environmentally friendly?” Answer ‘no’ if you do not have a car. Or perception of your car’s environmental friendliness is high, but in reality your car would fail every test available, so your response misrepresents the population. Your perception may be s_________ if the survey does not specify some criterion.


(II) S_______ Errors:

Errors that represent the differences that may exist between a s_____ statistic and the population parameter being estimated are called sampling errors.

Will always occur in all data collection except a census where all _____ are surveyed.

Occurs in a situation where the sample does not represent the true population being examined because the sample only represents a portion of a population. An inferential e____ may be made because the data collected from the sample does not truly reflect the characteristic of the true population being explored.


Example: The unemployment rate calculated from the Labour Force Survey understates the actual unemployment rate because it excludes certain groups within the provincial level, such as people on native reserves, people living in the NWT, Yukon and Nunavut.


Example: The actual amount of income people earn may be higher than reported because the sample randomly selected a portion of the population that had a higher than average number of union members relative to the total economy.


There is usually a trade-off between sample _____ and the ____ of making an error. In order to minimize the cost of erroneous decisions based on inaccurate information, the researcher must improve the sample design.

But such improvements are expensive. “Balance the costs of making an error and cost of sampling.” (Not easy – hard to specify the cost of making an incorrect decision!)


Next: Determining Which Sample Designs Most Effectively Minimize Sampling Errors

I) Pro__________ Sampling

Based on a random s________ process. One criterion for a good sample is that every item in the population being examined has an equal and independent chance of being chosen as part of the sample.

A) Simple R_____ Samples: Every possible sample of size n (from a population of size N) is equally likely to be chosen.

Requirement: access to all items in the population. This can be difficult if the population is large and elements are difficult to identify.
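As a rough illustration of drawing such a sample by computer, the following Python sketch uses the standard library's `random.sample`. The population labels, the sizes N = 500 and n = 50, and the function name `simple_random_sample` are assumptions made only for this example.

```python
import random

# Hypothetical population: item labels 1..N (N = 500 is an assumed size).
N = 500
population = list(range(1, N + 1))

def simple_random_sample(items, n, seed=None):
    """Draw n distinct items so that every subset of size n is equally likely."""
    rng = random.Random(seed)      # the seed only makes the illustration repeatable
    return rng.sample(items, n)    # sampling without replacement

sample = simple_random_sample(population, n=50, seed=1)
print(len(sample), len(set(sample)))
```

Note that the function needs the full list of items, which mirrors the requirement of access to all items in the population.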


B) S_________ Sampling:

A random starting point in the population is selected, and every kth element thereafter is a sample point in the sample.

___ equivalent to simple random sampling because every set of n items does not have an equal probability of being selected. Bias will result if there is periodicity in the items in the population. Example: We want to determine the average time spent in movie line-ups. Sample the wait times on “cheap night” Tuesday: sampling on a single day of the week, rather than across the daily pattern, will result in a biased amount being recorded.


Example: Sleeping: What is the average amount of sleep an individual achieves each night? Sample times on Friday night: sampling on a single day of the week, rather than across the daily pattern, will result in a biased amount being recorded.


The systematic way of drawing a simple random sample is to select the sample with the help of a random number table or generator. In a random number table, a sequence of integers between 0 and 9 is generated at random. Each digit has the same probability of occurrence.


Example Using EViews:

Random Number Generator in EViews: (i) Uniform random integer generator: The rndint command will create (pseudo) random integers drawn uniformly from zero to a specified maximum. The rndint command ignores the current sample and fills the entire object with random integers. Command syntax: rndint(object_name, n)


Type the name of the series object to fill followed by an integer value representing the maximum value n of the random integers. n should be a positive integer.

Example: Suppose we want to create a series of 10 random integers with a value from zero to 100. Type the following code in the command window in a new workfile (with a range of 10), hitting return after each line:

    series x
    rndint(x,100)


A different set of random numbers is generated every time you use the command.
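For readers without EViews, a rough Python analogue of the same exercise is sketched below. It is not EViews code; it assumes the maximum value 100 is included in the range, and the seed value 245 is arbitrary.

```python
import random

# Ten pseudo-random integers drawn uniformly from 0 to 100 inclusive,
# mirroring a workfile with a range of 10 and a maximum value of 100.
x = [random.randint(0, 100) for _ in range(10)]
print(x)

# Unseeded, the values change from run to run; fixing a seed makes them repeatable.
rng_a = random.Random(245)
rng_b = random.Random(245)
same = [rng_a.randint(0, 100) for _ in range(10)] == [rng_b.randint(0, 100) for _ in range(10)]
print(same)
```

The second part shows why the numbers are called pseudo-random: two generators started from the same seed produce the same sequence.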


Sampling With Prior K_______e

Simple random sampling may be difficult when the population is large and certain elements of the population are hard to identify.

Problems:
E________
T___________________

There are two random sampling techniques that use prior knowledge about the population and hence reduce the costs of simple random sampling:
1) S_________ Sampling
2) C______ Sampling


(I) S_________ Sampling

This technique divides the population into a number of distinct and similar subgroups, and then selects a proportionate number of items from each subgroup.

“The use of stratified sampling requires that a population be divided into homogeneous groups called strata. Each stratum is then sampled according to certain specified criteria.”

If we can identify certain characteristics in the population and can separate these characteristics into subgroups, we require f____ sample points to determine the level or concentration of the characteristic under study.


The optimal method of selecting strata is to find groups with a large _________ between strata, but with only a small variability within the strata. i.e. Groups should have large inter-strata variation, but little intra-strata variation. Each subgroup contains persons who share common traits and each subgroup is distinctly different.


Example: A politician hires a company to determine the political platform he/she should stress: job creation or lower taxes. The research team divides the population into income classes: upper, middle and lower income groups. Then, they sample the proportionate amount from each stratum. In this case, we know the population is composed of 15% upper income, 55% middle income and 30% lower income. We then take a sample that contains 15% from the upper income stratum, 55% from the middle income stratum, and 30% from the lower income stratum.
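The proportionate allocation in this example can be sketched in Python as follows. The population of 1,000 labelled people, the sample size of 100, and the function name `proportionate_stratified_sample` are assumptions for illustration; only the 15% / 55% / 30% split comes from the example.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """Sample from each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / total)   # stratum sample size
        sample[name] = rng.sample(members, n_h)
    return sample

# Hypothetical population of 1,000 people: 15% upper, 55% middle, 30% lower income.
strata = {
    "upper": [f"U{i}" for i in range(150)],
    "middle": [f"M{i}" for i in range(550)],
    "lower": [f"L{i}" for i in range(300)],
}
sample = proportionate_stratified_sample(strata, n=100, seed=3)
print({name: len(chosen) for name, chosen in sample.items()})
```

With these round numbers the stratum sample sizes are 15, 55 and 30. For other proportions the rounding step may leave the total slightly different from n, so a real design would need a rule for allocating the remainder.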


Example: Studying the average wage rate in Canada: It is believed that people in large cities earn higher wages than people in smaller towns. Could divide the population of Canada into urban and rural strata. Then, study the wage rate of each group to get an average wage rate in Canada. Example: We want to determine if there is a difference between the average amount of healthcare consumed by the rich and the poor. The researcher would divide the people into strata according to their gross incomes. Then the researcher could sample each stratum to determine if there is a statistically significant difference between healthcare usage among the two groups.


Advantages:
(1) If homogeneous subsets of a population can be identified, then only a relatively ______ number of sample observations are needed to determine the characteristics of each subset. Thus, stratified sampling is usually less expensive than simple random sampling, because we only require a small number of sample points to get an accurate measure of the characteristic under study.
(2) Use of prior knowledge about the population may improve the ______ of the statistical inference based on stratified sampling as compared to simple random sampling, i.e. an improvement in the “efficiency” of the estimate.

Example: The cost of post secondary education is higher for people who do not live in a city with a university. We want to measure the average size of a student loan.


65% of university students are residents of the city. 35% are from other parts of the province.

If we divide students into two strata: residents and non-residents, and then sample in the same proportion, we may get a better average student loan estimate.

(Proportionate stratified sampling.)



(3) By-Product: In dividing the population into strata, a valuable side effect is produced. We can determine inferences about each group without further sampling.

Example: Continuation from above: If we wanted to know the average loan size of students that are not residents of the university city, we would sample from that group.


(II) C______ Sampling The population is subdivided into groups called clusters, where each cluster has the s___ characteristics as the population.

Each cluster is assumed to be representative of the p_________. Hence, the researcher has only to pick a few clusters to constitute the sample in order to estimate the characteristics of the population.

All elements in each selected cluster are taken to constitute the sample.

Procedure:
1) Divide the population into ________.
2) Randomly select a ______ of clusters.
3) Select all the elements in each selected cluster.
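The three-step procedure can be sketched in Python as below. The cluster names, the income figures and the function name `cluster_sample` are made up for illustration and do not come from the notes.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every element in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)            # step 2: pick clusters
    elements = [x for name in chosen for x in clusters[name]]    # step 3: take all elements
    return chosen, elements

# Step 1: hypothetical clusters (names and incomes are made up for illustration).
clusters = {
    "City A": [52, 61, 47],
    "City B": [58, 49, 66, 55],
    "City C": [45, 70],
    "City D": [63, 51, 59],
}
chosen, elements = cluster_sample(clusters, n_clusters=2, seed=11)
print(chosen, elements)
```

Unlike stratified sampling, only some groups are visited, but every element of a visited group enters the sample.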


Example: Want to know the average income level in large Canadian cities.

It would take a lot of time gathering information from all large cities in Canada.

Could simply gather the data from a few large cities to represent the average income in large urban cities.

Cluster #1: Toronto; Cluster #2: Vancouver


Example: The researcher is interested in some student characteristic. All the students in Canada are the population. Form clusters: group together all students at each university: UVic, UBC, SFU, York, Guelph, Queen's, McGill, etc. It would be very costly to sample all universities, but we could pick a few universities and look at all of the students at those universities to estimate some population characteristic.

Optimal Way of Selecting Clusters: Have groups that have a lot of intra-cluster variation, but little inter-cluster variation, i.e. there should be little variation between groups, but a high variability within each group that represents the population.


Advantages: 1) Lowers the ____ of sampling when the population is large. 2) Better _______ when we do not have a total accounting of, or full access to, all the information about the population in question. Disadvantage:

1) Clusters may not be truly representative of the population: Geographical bias. The student cluster at the University of Northern B.C. may be different to the typical Vancouver area campus cluster. Because of the location of the university, there may be large differences in class size, loan size, family background, average age, political beliefs, etc.... Or not.


Other Sampling Procedures

(1) Systematic Sampling

To select n items from a population of size N, start at some randomly chosen point in the population and include every kth element thereafter as a sample point, where k = N/n.

Example: Population of 500 items (N = 500) and sample size of 50 (n = 50): k = 500/50 = 10, so pick every 10th item.

Cost efficient, but the sample could be biased if the data has a cyclical pattern to it.
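A minimal Python sketch of the N = 500, n = 50 example follows. It assumes the random start is chosen among the first k items and that N is an exact multiple of n; the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(items, n, seed=None):
    """Pick a random start in the first k items, then every k-th item after it."""
    k = len(items) // n                 # sampling interval, k = N / n
    rng = random.Random(seed)
    start = rng.randrange(k)            # random starting point, 0 <= start < k
    return items[start::k][:n]

population = list(range(1, 501))        # N = 500 items, labelled 1..500
sample = systematic_sample(population, n=50, seed=7)
print(sample[:3], len(sample))
```

Because only the starting point is random, there are just k possible samples here, which is why the method is not equivalent to simple random sampling.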


(2) Two Stage Sampling These are samples within clusters. Samples where elements are drawn in two different stages. Example: Explore some student characteristic:

Cluster all universities and colleges and pick a few representative clusters in stage 1;

Within these clusters, randomly select a sample of students.
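The two stages can be sketched in Python as below. This sketch assumes the stage 1 clusters are chosen at random, which is one way of picking "a few representative clusters"; the school names, student IDs and function name `two_stage_sample` are invented for the illustration.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: sample elements within each."""
    rng = random.Random(seed)
    stage1 = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in stage1}

# Hypothetical institutions, each with 200 student IDs.
clusters = {f"School {c}": [f"{c}{i:03d}" for i in range(200)] for c in "ABCDE"}
sample = two_stage_sample(clusters, n_clusters=2, n_per_cluster=25, seed=5)
print({name: len(students) for name, students in sample.items()})
```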


(3) Sequential Sampling

Like two-stage sampling. Sampling is done in 2 or more stages.

Initial sampling: results are
  • Perfect: no more sampling
  • OK: no more sampling
  • Not OK: more samples required

The process carries on until we have a good sample.


Stage 1: Within your cluster of students in a university, you cluster all non-resident students. Take a sample to determine the average student loan size. Your results are very low compared to other university clusters.
Stage 2: Take another sample from your original cluster. This time the average is 50% higher than the first.
Stage 3: Take another sample from the cluster and your results are somewhere between the results from stages 1 and 2. Stop sampling.

The motivation for these three other sampling techniques is cost savings. This is typically because:
- Sampling is expensive.
- Sampling is destructive (Computer Chip example).
- Research team travel is extensive and expensive.


Non-Probabilistic Sampling 1) J________ Sampling:

These procedures involve judgement samples, i.e. the selection of elements in a sample is determined by some judgement, opinion or ______ of one individual or many individuals. Usually employed when a random sample cannot be taken or it is not practical to take a random sample.


Example: The local government is concerned about a proposed change in speed limits in town. Seventy-five drivers are asked their opinion regarding speed limits. Out of the 75 interviews, the researcher disregards any response from a driver with any speeding ticket history. Example: The instructor picks 10 students to honestly assess the class’s understanding of the course material. Such students come to class regularly.


2) Q____ Sampling

The number of sample observations gathered is dictated to the researcher before the survey begins, but it is left up to the researcher whom he/she picks to survey.

If guidelines are clear and the researcher is experienced and honest, this type of sampling is very cost efficient. Problem: researcher ____ may enter the survey process.

Usually used for:
  • market research
  • political preferences


3) C__________ Sampling

This sampling procedure selects observations on the basis of convenience to the researcher.

Least representative sampling.

Example: Elian Gonzalez from Cuba: TV interviewer questions the “average” person on the street to collect public opinion. Example: An interviewer asks whoever she meets on the street whether laws should be tougher to deter serious crime, or whether gun control should be tighter to prevent serious accidents, or whether capital punishment should be reinstated after some serious crime has been committed by a 25 year old, etc.

Bias results. Used primarily for preliminary studies.


Summary: Statistical Sampling Procedures

Probabilistic Non-Probabilistic


Sample Statistics

The primary objective of sampling from a population is to infer something about that population.

The sample design determines the degree of accuracy of the information gathered from the sample.

Next we will pull together sampling and the notion of probability from Economics 245, to analyze sample results:


Notation: Suppose we plan to take a sample of n observations in order to determine a characteristic of some random variable x. i) Let n = # of observations we are planning on taking in the sample. i.e. n = sample size (N = population size)

n ≤ N: the sample (size n) is drawn from within the population (size N).


Each observation in the sample is represented by xi = x1, x2, x3, x4, ..., xn. There are n xi’s in the sample space.

ii) Assume the sample is random. Then each sample observation is random. X1, ..., Xn are random because the sample space of each Xi is the entire population of X values. In simple random sampling, every element in the population has an e____ chance of being the observation that occurs _____.


iii) Each random variable (x1, x2, x3, ...xn) has a theoretical probability distribution. (i.e. The range of possible values that the variable can take and the frequency of each possible value.) iv) Under simple random sampling, each Xi will have the same probability distribution as the population random variable X, since the sample space for each one is the entire population of X-values. Example: Population is Age (Xi = age) of students giving blood on Friday morning:

18 18 27 19 37
31 20 20 21 18
22 18 19 19 25
25 24 26 18 18
19 21 21 29 19

Population is size 25 (N = 25)


Let n=4. In our first random sample: S1: {X1=18, X2=18, X3=27, X4=19}. In our second random sample: S2: {X1=31, X2=20, X3=20, X4=21}. If we keep sampling, we will derive the sampling distribution.

Note: The probability of any Xi occurring is the same.


Example:

N = 4 and n = 2: there are $N^n = 4^2 = 16$ ordered pairs. All the possible samples of size n = 2 (with replacement) are:

{18, 21} {18, 25} {18, 19} {25, 18}
{25, 19} {19, 21} {18, 18} {21, 21}
{21, 18} {21, 25} {21, 19} {25, 21}
{19, 18} {19, 25} {25, 25} {19, 19}

The a______ of each sample:

[19.5] [21.5] [18.5] [21.5]
[22] [20] [18] [21]
[19.5] [23] [20] [23]
[18.5] [22] [25] [19]

The population mean = 20.75; the mean of all the sample averages taken together also = 20.75.
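The enumeration can be checked with a short Python sketch. The population {18, 19, 21, 25} is inferred from the values that appear in the listed pairs; it is not stated explicitly in the notes.

```python
from itertools import product
from statistics import mean

population = [18, 19, 21, 25]      # the four values appearing in the listed pairs
n = 2

samples = list(product(population, repeat=n))   # ordered pairs, with replacement
sample_means = [mean(s) for s in samples]

print(len(samples))          # number of possible samples, N**n
print(mean(population))      # population mean
print(mean(sample_means))    # mean of all the sample means
```

The last two printed values agree, which is the point of the example: averaged over every possible sample, the sample mean equals the population mean.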


The population parameters of interest are usually a measure of central location and dispersion. Definitions: (I) Population parameter is some characteristic of the population. For example: μ = population mean; σ2 = population variance; ∏ = population proportion; (II) A sample _________ is any function of the random sample values.


Example from above:

Sample mean:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

For a sample of size 2:

$$\bar{X} = \frac{X_1 + X_2}{2}$$

One possible random sample will have a sample mean equal to: (18+25)/2 = 21.5


(III) Any function of a random variable is also a random variable.

Hence, the sample statistic is ______ and it has its own probability distribution.

The sample mean, sample variance and sample standard deviation provide the most widely used information about the population mean, population variance and population standard deviation.

Sample Mean:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = (X_1 + X_2 + \dots + X_n)/n$$

Similar to the population mean calculation.


Sample Variance: Take the sum of the squared deviations about $\bar{X}$.

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \frac{1}{n-1}\left[(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2\right]$$

Recall the population variance is:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$$

We divide by (___) because it gives the best estimate of the unknown population variance. (More on this later.)

Sample Standard Deviation: $S = \sqrt{S^2}$. Square root of the sample variance. Same units as the ____.


Example: Price of 1 litre of ice cream. We take a sample of 5 brands of ice cream to estimate the average cost of a 1 litre carton of ice cream. Find the sample mean, variance and standard deviation:

Ice Cream      i = brand   Price (Xi) in $   (Xi - X̄)                 (Xi - X̄)²
Safeway        1           4.69              [4.69 - 5.59] = -0.9
Island Farms   2           3.99              [3.99 - 5.59] = -1.6
Penny Lite     3           4.29              [4.29 - 5.59] = -1.3
Breyers’       4           5.99              [5.99 - 5.59] = 0.4
Dairy Queen    5           8.99              [8.99 - 5.59] = 3.4
n = 5                      ∑Xi = 27.95       ∑(Xi - X̄) = 0            ∑(Xi - X̄)² = 16.78

X̄ = 5.59


Sample Mean = 27.95/5 = $5.59; Sample Variance = 16.78/(5-1) = 4.195 (in $²); Sample Standard Deviation = $2.05
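These figures can be reproduced outside EViews. The sketch below uses Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) divisor; the five prices are the ones from the ice cream example.

```python
from statistics import mean, variance, stdev

# Prices in $: Safeway, Island Farms, Penny Lite, Breyers', Dairy Queen
prices = [4.69, 3.99, 4.29, 5.99, 8.99]

x_bar = mean(prices)
s2 = variance(prices)    # divides by n - 1 (sample variance)
s = stdev(prices)        # square root of the sample variance

print(round(x_bar, 2), round(s2, 3), round(s, 2))
```

The population versions (`pvariance`, `pstdev`) divide by n instead, and would give smaller values for the same data.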


Using EViews:

Notice that EViews assumes that we are using a sample, not a population, in its calculations.


Sample Statistics Using Frequencies

Data are often presented in the form of a frequency distribution.

That is, some values of Xi occur more than once.

The number of times Xi occurs (its frequency,) is denoted as “ fi”.

If data is in the form of a frequency distribution, then we make the following adjustments to our formulas:


The Sample Mean:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{k} X_i f_i = (X_1 f_1 + X_2 f_2 + \dots + X_k f_k)/n$$

where:
k = # of different values of Xi
fi = # of times the value of Xi occurs

The Sample Variance:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{k}(X_i - \bar{X})^2 f_i = \frac{1}{n-1}\left[(X_1 - \bar{X})^2 f_1 + (X_2 - \bar{X})^2 f_2 + \dots + (X_k - \bar{X})^2 f_k\right]$$


Example: Examine the number of visits to McDonald’s each month. As a vehicle goes through the drive-through on the last day of the month, each customer is asked how often they made a purchase at McDonald’s that month:

# of Visits (V)   Freq (fi)   V*fi   V-xbar    (V-xbar)²   (V-xbar)²*fi
1                 5           5      -3.618    13.0910     65.456
2                 8           16     -2.618    6.8549      54.839
3                 10          30     -1.618    2.6185      26.185
4                 7           28     -0.618    0.3821      2.675
5                 6           30     0.3818    0.1458      0.875
6                 5           30     1.3818    1.9090      9.547
7                 4           28     2.3818    5.6730      22.692
8                 5           40     3.3818    11.4370     57.183
9                 3           27     4.3820    19.2000     57.601
10                2           20     5.3820    28.9640     57.928
                  n = 55      ∑ = 254                      ∑ = 354.982

Mean = _______   Variance = 354.982 / 54 = ______   Standard Deviation = 2.563
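The frequency-weighted formulas can be applied to this table directly. The Python sketch below is one way to do it; the variable names are chosen for the illustration, and the visit counts and frequencies are those in the table.

```python
from math import sqrt

visits = list(range(1, 11))                 # V = 1, 2, ..., 10
freq = [5, 8, 10, 7, 6, 5, 4, 5, 3, 2]      # f_i for each value of V

n = sum(freq)                                                    # total observations
x_bar = sum(v * f for v, f in zip(visits, freq)) / n             # weighted mean
s2 = sum((v - x_bar) ** 2 * f for v, f in zip(visits, freq)) / (n - 1)
s = sqrt(s2)

print(n, round(x_bar, 3), round(s2, 3), round(s, 3))
```

Working from the frequencies gives the same result as listing all 55 individual observations and applying the unweighted formulas.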


Concluding Note:

Recall, the objective in calculating sample statistics (mean, variance, standard deviation), is to ________ population parameters.

When we take samples from a population, we may end up with _________ results each time we take a different sample.

The only way to determine the true population parameters is to take account of every element in the population (census). Problem: Census is too expensive.


Thus, we must accept that sample statistics generated from sampling will be used to estimate population parameters, and we must make statements on how reliable or accurate such sample statistics are in describing population parameters.


Probability Distribution of A Statistic

Sampling Distribution of X̄

Each sample statistic is a random variable. Recall from the notes:

i) Let n = # of observations we are planning on taking in the sample, i.e. n = sample size (N = population size). Each observation in the sample is represented by xi = x1, x2, ..., xn. There are n xi’s in the sample space.


ii) Assume the sample is random. Then each sample observation is random. X1, ..., Xn are random because the sample space of each Xi is the entire population of X values. In simple random sampling, every element in the population has an equal chance of being the observation that occurs first.

iii) Each random variable (x1, x2, x3, ..., xn) has a theoretical probability distribution (i.e. the range of possible values that the variable can take and the frequency of each possible value).

iv) Under simple random sampling, each Xi will have the same probability distribution as the population random variable X, since the sample space for each one is the entire population of X-values.

Therefore, each _________ has its own probability distribution.

The probability distribution of a sample statistic is called its: “sampling ____________.”

So it is legitimate to ask: “What is the expected value of \(\bar{X}\), \(E(\bar{X})\)?” or “What is the variance of \(\bar{X}\), \(V(\bar{X})\)?” The results help determine how “reliable” or accurate our sample statistics are as measures of the corresponding population parameters, i.e. how good our estimates of the population parameters are.

Recall:
(i) Each ______ we draw from a particular population will have its own sample statistic, i.e. a particular sample will have a particular derived sample mean, sample variance, etc.
(ii) Usually each sample implies a different ___________ value, i.e. each sample will have a different sample mean, sample variance, etc.

Example: Population = [3, 4, 5, 7, 10]; N = 5. Let n = 2.

Sample 1 = {3, 4}:

\[ \bar{X}_1 = \frac{3 + 4}{2} = 3.5 \]

Sample 2 = {7, 10}:

\[ \bar{X}_2 = \frac{7 + 10}{2} = 8.5 \]

... etc.

(iii) If we derive all possible samples of a fixed size (n), calculate each sample’s statistics, and build up the associated probability distribution, this is what we call the “________ distribution.”

Illustration: Sampling Distribution of \(\bar{X}\) (the Sample Mean)

“The sampling distribution of \(\bar{X}\) is the probability distribution of all ________ values of \(\bar{X}\) that could occur when a sample of size n is taken from some specified population.”

Note: A sampling distribution is a p_________ because it contains all the possible ______ of some sample statistic. (Population of \(\bar{X}\) = {\(\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k\)})

Example: Consider a parent population with only 4 values, [4, 6, 8, 10] (the number of vacations in the past 5 years by four people), that occur with _____ probability.

If the population X = {4, 6, 8, 10}, then µ = 28/4 = 7. The average number of trips is 7. The variance is 5:

| Xi | (Xi − μ) | (Xi − μ)² |
|----|----------|-----------|
| 4  |          |           |
| 6  |          |           |
| 8  |          |           |
| 10 |          |           |

Var(X) = Σ(Xi − μ)²/N = 20/4 = 5

[Figure: Uniform Distribution; P(x) = 0.25 at each of X = 4, 6, 8, 10]

Now assume we do n__ know much about the population and decide to take a random sample of size n = 2 (with ___________). Your sample will look like one of the following ordered pairs:

| (X1, X2) | \(\bar{X}\) |
|----------|-------------|
| (4, 4) | |
| (4, 6) (6, 4) | |
| (4, 8) (6, 6) (8, 4) | |
| (4, 10) (6, 8) (8, 6) (10, 4) | |
| (6, 10) (8, 8) (10, 6) | |
| (8, 10) (10, 8) | 9 |
| (10, 10) | 10 |

Sampling Distribution of \(\bar{X}\) for n = 2 when X = {4, 6, 8, 10}:

| \(\bar{X}\) | \(P(\bar{X})\) |
|-------------|----------------|
| | |
| | 2/16 = 1/8 |
| | 3/16 |
| | 4/16 = 1/4 |
| | 3/16 |
| | 2/16 = 1/8 |
| 10 | 1/16 |

[Figure: Probability Distribution of \(\bar{X}\) for n = 2; vertical axis \(P(\bar{X})\) with ticks at 1/16, 2/16, 3/16, 4/16; horizontal axis \(\bar{X}\) from 4 to 10]
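
The table above can be rebuilt by brute force. The following is a minimal sketch, assuming the 16 ordered pairs listed earlier (which include repeated values such as (4, 4)) are all equally likely; the variable names are illustrative and not part of the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [4, 6, 8, 10]
n = 2

# All ordered samples of size n, allowing repeated values: 4**2 = 16 pairs.
samples = list(product(population, repeat=n))

# Count how often each sample mean occurs, then turn counts into probabilities.
counts = Counter(Fraction(sum(s), n) for s in samples)
sampling_dist = {mean: Fraction(c, len(samples)) for mean, c in sorted(counts.items())}

for mean, prob in sampling_dist.items():
    print(f"mean = {float(mean):4.1f}   P = {prob}")
```

Exact fractions are used so the probabilities can be compared directly with the sixteenths in the table.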

Mean of the Sample Mean \(\bar{X}\)

Notation: Let \(\mu_{\bar{X}}\) denote the “mean of the sampling distribution of \(\bar{X}\).”

I.e. it is the mean of all possible ______ means (the population of \(\bar{X}\)-values). So:

\[ E(\bar{X}) = \mu_{\bar{X}} = \sum_{i=1}^{k} \bar{X}_i \, P(\bar{X}_i) \]

where i = 1, 2, …, k, and k is the number of distinct possible values of \(\bar{X}\).

In our last example k = 7:

\[ E(\bar{X}) = 4\left(\tfrac{1}{16}\right) + 5\left(\tfrac{2}{16}\right) + 6\left(\tfrac{3}{16}\right) + 7\left(\tfrac{4}{16}\right) + 8\left(\tfrac{3}{16}\right) + 9\left(\tfrac{2}{16}\right) + 10\left(\tfrac{1}{16}\right) = 7 = \mu \]

So for the population: \(\mu_{\bar{X}} = \mu =\) __

It is not a coincidence that these 2 means are e____!!

\(\mu_{\bar{X}} =\) __ for any parent population and any given s______ size.

Hence, if µ is the mean of the population of X’s and \(\mu_{\bar{X}}\) is the ____ of the population of \(\bar{X}\)’s, then the _______ value of \(\bar{X}\) is:

\[ E(\bar{X}) = \mu_{\bar{X}} = \text{\_\_} \]

Proof:

Let \(X_i \sim (\mu_X, \sigma^2)\) for all i; that is, each \(X_i\) is an independently distributed random variable, with a mean of \(\mu_X = \mu\) and a variance of \(\sigma^2\).

Since:

(i) \(\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)\), and

(ii) \(E(X_i) = \mu\),

we can apply the rules of expectation:

\[
\begin{aligned}
E(\bar{X}) &= E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) \\
&= \frac{1}{n} E\left(\sum_{i=1}^{n} X_i\right) && \text{(take the expectation operator through and pull } 1/n \text{ out)} \\
&= \frac{1}{n} E(X_1 + X_2 + \cdots + X_n) && \text{(expand)} \\
&= \frac{1}{n}\left[E(X_1) + E(X_2) + E(X_3) + \cdots + E(X_n)\right] && \text{(take the expectation operator through)} \\
&= \frac{1}{n}(\mu + \mu + \mu + \cdots + \mu) && \text{(replace with } \mu \text{)} \\
&= \frac{1}{n}(n\mu) = \mu = \mu_{\bar{X}}. && \text{(} n \text{'s cancel)}
\end{aligned}
\]

Variance of the Sample Mean, \(V(\bar{X})\)

We also need to know the ________ of the sampling distribution of \(\bar{X}\) for a given sample size n.

Notation: The ________ of the values of \(\bar{X}\) is denoted by either \(V(\bar{X})\) or \(\sigma_{\bar{X}}^2\).

The _______ is the average of the squared deviations of the variable \(\bar{X}\) about its mean \(\mu_{\bar{X}}\):

\[ \sigma_{\bar{X}}^2 = V(\bar{X}) = E\left[(\bar{X} - \mu_{\bar{X}})^2\right] = \sum_{\bar{X}} (\bar{X} - \mu_{\bar{X}})^2 P(\bar{X}) \]

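Continuation of the previous example:

| \(\bar{X}\) | \(P(\bar{X})\) | \((\bar{X} - \mu_{\bar{X}})\) | \((\bar{X} - \mu_{\bar{X}})^2\) | \((\bar{X} - \mu_{\bar{X}})^2 P(\bar{X})\) |
|----|------------|---------------|--|------|
|    | 1/16       | (4 − 7) = −3  |  | 9/16 |
|    | 2/16 = 1/8 | (5 − 7) = −2  |  | 8/16 |
|    | 3/16       | (6 − 7) = −1  |  | 3/16 |
|    | 4/16 = 1/4 | (7 − 7) = 0   |  | 0    |
|    | 3/16       | (8 − 7) = 1   |  | 3/16 |
|    | 2/16 = 1/8 | (9 − 7) = 2   |  | 8/16 |
| 10 | 1/16       | (10 − 7) = 3  |  | 9/16 |

\[ \sum_{i} (\bar{X}_i - \mu_{\bar{X}})^2 P(\bar{X}_i) = \frac{40}{16} = 2.5 \]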

Mean = __  Variance = ____  Recall the population variance = ___

Notice that:

\[ V(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \]

i.e. 5/2 = 2.5

This is not a coincidence either!!

Recall: Var(X) = Var(Xi) = σ² for all i, and ‘n’ is the sample size. The sampling distribution is for all possible samples of size n.
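
Both results can be checked numerically on the four-value population. This is only an illustrative sketch, assuming the same 16 equally likely ordered samples of size 2 as in the example; the names are not from the notes.

```python
from fractions import Fraction
from itertools import product

population = [4, 6, 8, 10]
n = 2
N = len(population)

# Population mean and variance (dividing by N, as in the notes).
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# Mean of every ordered sample of size n, allowing repeated values.
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

# Mean and variance of the population of sample means.
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

print(mu, sigma2, mean_of_means, var_of_means, sigma2 / n)
```

The last two printed values agree, which is the relationship V(X̄) = σ²/n for this example.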

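Proof:

\[
\begin{aligned}
V(\bar{X}) &= V\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) \\
&= \left(\frac{1}{n}\right)^2 V\left(\sum_{i=1}^{n} X_i\right) \\
&= \left(\frac{1}{n}\right)^2 \left[\sum_{i=1}^{n} V(X_i)\right] && \text{(since the } X_i \text{'s are independent under random sampling)} \\
&= \frac{1}{n^2}\left(\sigma^2 + \sigma^2 + \sigma^2 + \cdots + \sigma^2\right) \\
&= \frac{1}{n^2}\left(n\sigma^2\right) \\
&= \frac{\sigma^2}{n}.
\end{aligned}
\]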

Although we calculated the value of \(V(\bar{X})\) directly in this 4-element population of Xi’s, in problems where there are many values of \(\bar{X}\), direct calculation is impractical. As long as we know the ________ of the population, σ², we can calculate \(V(\bar{X})\). This is because the variance of the random variable \(\bar{X}\) is related to ___, the population variance, and to the sample ___ by the formula:

\[ V(\bar{X}) = \frac{\sigma^2}{n} \]

The variance of \(\bar{X}\) is always ____ than or equal to the population variance.

The variance of the mean of a sample of n independent observations is 1/n times the variance of the parent population:

\[ V(\bar{X}) = \sigma_{\bar{X}}^2 = \left(\frac{1}{n}\right)\sigma^2 \]

When n = 1, the samples contain only one observation and the distributions of X and \(\bar{X}\) are the s___.

As n increases, \(\sigma_{\bar{X}}^2\) becomes _______ because the sample means will tend to be closer to the value of the population mean \(\mu_X\). When n = N (in a finite population sampled without replacement), all sample means will _____ the population mean and \(V(\bar{X})\) will equal 0.

With our example, the population variance (σ²) is known (= 5) and n = 2. So the variance of \(\bar{X}\), \(V(\bar{X})\), is:

\[ V(\bar{X}) = \sigma_{\bar{X}}^2 = \left(\frac{1}{n}\right)\sigma^2 = \left(\frac{1}{2}\right)(5) = 2.5 \]

What Happens to \(V(\bar{X})\) as n Increases?

Because each sample contains more ___________ or more elements of the population as the sample size increases, the sample will be ______ to the population, so we expect less variability.

Example: Suppose X ~ N(0, 100). Randomly draw samples of size (i) 10, (ii) 100, (iii) 1000 from this population.

Calculate \(\bar{X}_{10}\) for all possible samples of size 10.
Calculate \(\bar{X}_{100}\) for all possible samples of size 100.
Calculate \(\bar{X}_{1000}\) for all possible samples of size 1000.

Then we can show:
For n = 10: the d_________ of \(\bar{X}\) is quite wide around the mean of 0.
For n = 1000: there is less variation around the mean of zero.

When n approaches infinity, there is no __________ and the variance of \(\bar{X}\) = 0.
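
We cannot literally enumerate all possible samples from a continuous population, but a simulation gives the same picture. The sketch below is an illustration only: the seed and the number of replications (500) are arbitrary choices, not part of the notes, and the simulated variances will only approximate σ²/n.

```python
import random
import statistics

random.seed(245)  # arbitrary seed so the run is repeatable

SIGMA = 10.0        # population standard deviation, since sigma^2 = 100
REPLICATIONS = 500  # number of samples drawn for each sample size (assumed)

def simulated_variance_of_mean(n):
    """Draw REPLICATIONS samples of size n from N(0, 100); return the variance of their means."""
    means = [
        statistics.fmean(random.gauss(0.0, SIGMA) for _ in range(n))
        for _ in range(REPLICATIONS)
    ]
    return statistics.pvariance(means)

results = {n: simulated_variance_of_mean(n) for n in (10, 100, 1000)}
for n, v in results.items():
    print(f"n = {n:5d}: simulated V(Xbar) = {v:8.4f}, theory sigma^2/n = {SIGMA**2 / n:8.4f}")
```

The simulated variance shrinks roughly in proportion to 1/n, in line with the formula derived earlier.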

[Figure: Sampling distributions for \(\bar{X}\) at various sample sizes]

Standard Error of the Mean

Notation: We usually call the standard deviation of the \(\bar{X}\)’s, \(\sigma_{\bar{X}}\), the standard _____ of the mean. The error refers to ________ error.

\(\sigma_{\bar{X}}\) is a measure of the standard expected error when the sample mean is used to obtain information or draw conclusions about the _______ population mean.

Standard error of the mean:

\[ \sigma_{\bar{X}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}} \]

In our example:

\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{5}}{\sqrt{2}} = \frac{2.236}{1.414} = 1.5811 \]

Notes:
(i) \(\mu_{\bar{X}}\) and \(\sigma_{\bar{X}}\) are parameters of the population of sample averages for all conceivable samples of size n. These parameters are usually u______.
(ii) The population parameters (μ, σ²) are also usually unknown.
(iii) This means that we c____ use the relationships \(\mu = \mu_{\bar{X}}\) and \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\) to solve for values of one of these statistics.

But these relationships allow us to test hypotheses about the population parameters on the basis of sample results. More on this later.

Next: We have now derived the mean and the variance of the sampling distribution, but have not said anything about the s____ of the sampling distribution of \(\bar{X}\). Recall that distributions with the same mean and variance can have very different s_____. We must now specify an assumption about the entire distribution of \(\bar{X}\)’s:

Sampling Distribution of \(\bar{X}\), Normal Parent Population

It is typically not possible to specify the shape of the distribution of the \(\bar{X}\)’s when the parent population is ________ and the sample size is _____.

However, the shape of the sampling distribution can be specified when the samples are taken from a normally distributed parent population (X). In this case, the \(\bar{X}\)’s are distributed ________.

“The sampling distribution of \(\bar{X}\)’s drawn from a normal parent population is a ______ distribution.”

Recall: The mean of the \(\bar{X}\)’s is \(\mu_{\bar{X}} = \mu\) and the variance of the \(\bar{X}\)’s is \(\sigma_{\bar{X}}^2 = \sigma^2/n\).

Hence the sampling distribution of \(\bar{X}\) is:

\[ \bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}}^2) = N\left(\mu, \frac{\sigma^2}{n}\right) \]

whenever the parent population is normal, X ~ N(μ, σ²).

Meaning, regardless of the _____ of the parent population, the mean and variance of \(\bar{X}\) equal \(\mu_{\bar{X}} = \mu\) and \(\sigma_{\bar{X}}^2 = \sigma^2/n\).

From the last example: X ~ N(0, 100). Hence,

\[
\begin{aligned}
\bar{X}_{10} &\sim N\left(0, \tfrac{100}{10}\right) = N(0, 10) \\
\bar{X}_{100} &\sim N\left(0, \tfrac{100}{100}\right) = N(0, 1) \\
\bar{X}_{1000} &\sim N\left(0, \tfrac{100}{1000}\right) = N(0, 0.1).
\end{aligned}
\]

Remember: The normal distribution is a __________ distribution (i.e. an infinite number of different samples could be drawn).

Example: Suppose all the possible samples of size 10 are drawn from a normal distribution that has a mean of 25 and a variance of 50. That is, X is normally distributed with mean μ = 25 and variance σ² = 50: X ~ N(25, 50).

Since the population ____ μ = 25, the mean of the \(\bar{X}\)’s equals \(\mu_{\bar{X}} = 25\).

Since the population variance σ² = 50, the variance of the \(\bar{X}\)’s equals

\[ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} = \frac{50}{10} = 5. \]

Since X is normal, \(\bar{X}\) is normally distributed: \(\bar{X} \sim N(25, 5)\).

What this means is:

68.3% of the sample _____ will fall within ± one standard error of the mean, where \(\sigma_{\bar{X}} = \sqrt{5} = 2.24\):

\[ \mu \pm 1\sigma_{\bar{X}} = 25 \pm (1)(2.24) = 22.76 \text{ to } 27.24 \]

95.45% of the sample _____ will fall within ± two standard errors of the mean:

\[ \mu \pm 2\sigma_{\bar{X}} = 25 \pm (2)(2.24) = 25 \pm 4.48 \Rightarrow 20.52 \text{ to } 29.48 \]

99.7% of the sample _____ will fall within ± three standard errors of the mean:

\[ \mu \pm 3\sigma_{\bar{X}} = 25 \pm (3)(2.24) = 25 \pm 6.72 \Rightarrow 18.28 \text{ to } 31.72 \]
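
These intervals and coverage probabilities can be reproduced with Python's standard library. This is a small sketch under the example's assumptions (X ~ N(25, 50), n = 10); `xbar_dist` and `coverage` are illustrative names.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma2, n = 25, 50, 10
xbar_dist = NormalDist(mu=mu, sigma=sqrt(sigma2 / n))  # Xbar ~ N(25, 5)

coverage = {}
for k in (1, 2, 3):
    lower = mu - k * xbar_dist.stdev
    upper = mu + k * xbar_dist.stdev
    # Probability that a sample mean lands within k standard errors of mu.
    coverage[k] = xbar_dist.cdf(upper) - xbar_dist.cdf(lower)
    print(f"within {k} standard error(s): {lower:.2f} to {upper:.2f}, probability {coverage[k]:.4f}")
```

The endpoints differ slightly from the hand calculation in the second decimal place because the code does not round the standard error to 2.24.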

The Standardized Form of the Random Variable \(\bar{X}\), σ Known

In Economics 245, we saw that it is easier to work with the standard normal form of a variable than it is to leave it in its original units.

The same type of t____________ made on a random variable X can be made on the random variable \(\bar{X}\).

Recall, to transform the random variable X to its standard normal form (Z), we subtract the mean from each value and divide by the standard deviation:

\[ Z = \frac{X - \mu}{\sigma} \]

Z has a mean of 0 and a variance of 1: Z ~ N(0, 1).

The standardization of \(\bar{X}\) is ____________ the same way:

\[ Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]

The random variable Z has a mean of zero and a variance of one. Thus: when sampling from a normal parent population, the distribution of

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]

will be normal with mean zero and variance equal to one.

Example: Suppose X is the height (in inches) of basketball players on all university teams in Canada during summer term. Suppose X ~ N(75, 36). A random sample of ____ players is drawn from this population.

What is the probability that the sample average team player height is less than __ inches? (What is \(P(\bar{X} \le 80)\)?)

Solution: If X ~ N(75, 36), then \(\bar{X} \sim N(75, 36/9 = 4)\). Standardize the variable \(\bar{X}\):

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{80 - 75}{6/\sqrt{9}} = \frac{5}{6/3} = \frac{5}{2} = 2.5 \]

Looking at the Cumulative Standardized Normal Distribution Table F(Z), on page ___, P(Z ≤ 2.5) = 0.____. The probability that the average height of basketball players in our sample of size 9 is less than 80 inches is 99.38%.

[Figure: standard normal density with Z = 0 and Z = 2.5 marked; the area to the left of 2.5 is the CDF, 0.9938]

Example: Let X be the amount of money customers owe on home mortgages at the Bank of Nova Scotia (in thousands of $). Suppose X ~ N(150, ____). Draw a random sample of __ from the population. What is the probability that the average amount owing is greater than 200 (thousand dollars), \(P(\bar{X} \ge 200)\)?

Solution: X ~ N(150, 8100), so \(\bar{X} \sim N(150, 8100/25) = N(150, 324)\);

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{200 - 150}{90/\sqrt{25}} = \frac{50}{90/5} = \frac{50}{18} = 2.78 \]

P(Z ≥ 2.78) = (1 − 0.____) = 0.0027.

The probability that the average amount owing on a mortgage is greater than 200 (thousand dollars) is 0.27%.

[Figure: standard normal density with Z = 2.78 marked; CDF = 0.9973 to the left, 0.0027 in the right tail]
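
The two worked examples above can be checked without a printed table. The sketch below uses the standard library's `NormalDist` in place of the table lookup; the helper `z_for_sample_mean` is a hypothetical name introduced only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

def z_for_sample_mean(xbar, mu, sigma2, n):
    """Standardize a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sqrt(sigma2) / sqrt(n))

# Basketball heights: X ~ N(75, 36), n = 9, P(Xbar <= 80)
z_height = z_for_sample_mean(80, 75, 36, 9)
p_height = Z.cdf(z_height)

# Mortgage balances: X ~ N(150, 8100), n = 25, P(Xbar >= 200)
z_mortgage = z_for_sample_mean(200, 150, 8100, 25)
p_mortgage = 1 - Z.cdf(z_mortgage)

print(f"heights:   z = {z_height:.2f}, P = {p_height:.4f}")
print(f"mortgages: z = {z_mortgage:.2f}, P = {p_mortgage:.4f}")
```

The mortgage probability is computed from the unrounded z = 50/18, so it can differ from the table value in the later decimal places.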

The limitation of the last section is obvious: “We cannot always assume that the parent population is ________.”

What if the Population is Non-Normal?

Sampling Distribution of \(\bar{X}\): Population Distribution Unknown and σ Known

When the samples drawn are not from a normal population or when the population distribution is u_____, the size of the s_____ is extremely important.

When the sample size is ____, the shape of the distribution will depend mostly on the shape of the p_____ population. As the sample size increases, the shape of the sampling distribution of \(\bar{X}\) will become more and more like a n_____ distribution, regardless of the shape of the p_____ population.

Central Limit Theorem: “Regardless of the distribution of the parent population, as long as it has a finite m___ µ and variance σ², the distribution of the m____ of the random samples will approach a n_____ distribution, with mean μ and variance σ²/n, as the sample size n goes to infinity.”

(I) When the parent population is normal, the sampling distribution of \(\bar{X}\) is exactly n_____.
(II) When the parent population is not normal or is unknown, the sampling distribution of \(\bar{X}\) becomes approximately ______ as the sample size increases.

Example: Let the sample be (X1, X2, ..., Xn), and let S = X1 + X2 + X3 + ... + Xn. Then:

E(S) = E(X1) + E(X2) + ... + E(Xn) = ΣE(Xi) = nE(X) = __

V(S) = V(X1 + X2 + ... + Xn) = V(X1) + V(X2) + ... + V(Xn) = ΣV(Xi) = nV(Xi) = __², assuming independence.

So according to the CLT, as n → ∞, S → N(nμ, nσ²).

Now,

\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{S}{n}. \]

The expected value of \(\bar{X}\) is:

\[ E(\bar{X}) = \frac{1}{n}E(S) = \frac{1}{n}(n\mu) = \mu, \]

and the variance of \(\bar{X}\) is:

\[ V(\bar{X}) = V\left(\frac{S}{n}\right) = \frac{1}{n^2}V(S) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}. \]

So, according to the CLT: as n → ∞, \(\bar{X} \sim N(\mu, \sigma^2/n)\) regardless of the ____ of the parent population distribution.

(Note: the CLT applies in discrete and continuous cases.)

The first row of diagrams shows different parent populations. The next 3 rows show the sampling distribution of \(\bar{X}\) for all possible repeated samples of size n = 2, n = 5, and n = 30, drawn from the populations in the first row.

Column 1: U______ population. At n = 2, the distribution is symmetrical. At n = 5, it is a normal-looking distribution.

Column 2: _______ population. At n = 2, the distribution is symmetrical. At n = 5, the distribution is bell-shaped.

Column 3: Highly s_____ exponential population. At n = 2 and n = 5, the distribution is still skewed. At n = 30, the distribution of \(\bar{X}\) is a symmetrical bell shape, approaching normal.

In general, if n ≥ __, the normal distribution will be a good approximation to the sampling distribution of \(\bar{X}\).
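
The behaviour described for the skewed exponential column can be illustrated by simulation. The sketch below is hedged: the exponential population with mean 1, the seed, and the 4000 replications are assumptions made for the illustration, and skewness is used as a rough stand-in for "how far from a symmetric bell shape".

```python
import random
import statistics

random.seed(6)  # arbitrary seed so the run is repeatable

REPLICATIONS = 4000  # number of samples per sample size (assumed)

def skewness(values):
    """Population skewness: the average cubed standardized deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_means(n):
    """Means of REPLICATIONS samples of size n from an exponential population with mean 1."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(REPLICATIONS)
    ]

skew_by_n = {n: skewness(sample_means(n)) for n in (2, 5, 30)}
for n, sk in skew_by_n.items():
    print(f"n = {n:2d}: skewness of the simulated sample means = {sk:.3f}")
```

A skewness near zero indicates a nearly symmetric distribution; the value should fall as n grows, matching the third column of the diagrams.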

Sampling Distribution of \(\bar{X}\), Normal Population, σ Unknown

Recall that if X ~ N(μ, σ²), then \(\bar{X} \sim N(\mu, \sigma^2/n)\). Also recall that the standardized form,

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}, \]

is important in the determination of the p_________y of \(\bar{X}\) taking some value, assuming that its population mean is μ. We then use this probability distribution for p______ solving and decision making.

But what happens if _ is unknown?

In solving a problem where σ is unknown, ‘s’, the sample statistic for the s_______ deviation σ, can be applied to solve problems involving standardization. It is l_________ because it can be shown that E(s²) = _²,

and we can standardize, creating a new ratio:

\[ t = \frac{\bar{X} - \mu}{s/\sqrt{n}}. \]

The “t-ratio” is not ________ distributed. The resulting distribution no longer has a v_______ equal to 1.

To determine the distribution of the ratio \(\frac{\bar{X} - \mu}{s/\sqrt{n}}\) we follow these steps:
1) Collect all the possible samples of size n from a normal parent population.
2) Calculate \(\bar{X}\) and s for each sample.
3) Subtract μ from each value of \(\bar{X}\), and then divide this deviation by the appropriate value of \(s/\sqrt{n}\).

This process will generate an infinite number of values of the random variable \(\frac{\bar{X} - \mu}{s/\sqrt{n}}\).

The m___ of the t-distribution still equals 0.

The v_______ no longer equals V(Z) = 1. It is _____. Because we use ‘s’ to standardize, the dispersion, or the variation around the mean of zero, will be wider.

“s” introduces an element of uncertainty or ____ because s² is a parameter estimate, not the a_____ population parameter. Hence the more uncertainty there is, the more s_____ out the distribution.

Notes About the t-Distribution:

1) The t-distribution was developed by W.S. Gosset. It involves two random variables, \(\bar{X}\) and s. Hence, the variable “t” is a __________ random variable.

2) −∞ < t < ∞. Cumulative probability.

3) The t-distribution is symmetrical: E(t) = 0 = median = mode.

4) The variability of the t-distribution depends on the sample ___ (n), since n affects the reliability of ‘s’ as an estimate of σ.

When n is l____, ‘s’ will be a good estimator of σ. When n is s____, ‘s’ may not be a good estimator.

The variability of the distribution depends on n:

\[ t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \]

The t-distribution:

\[ t = \frac{Z}{\sqrt{\chi^2_\nu/\nu}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\chi^2_\nu/\nu}}. \]

6) We characterize the t-distribution in terms of the sample size minus one, (n − 1). The (n − 1) is referred to as the number of “degrees of f______” (d.f.), which represents the number of in_________ pieces of information that are used to estimate the standard deviation of the parent population.

ν (“nu”) denotes degrees of freedom: ν = (n − 1). The t-distribution is described by ν degrees of freedom.

(i) The mean of the t-distribution is 0: E(t) = 0.

(ii) The variance, which is finite only for ν > 2 (that is, n ≥ 4), is

\[ V(t) = \frac{\nu}{\nu - 2} = \frac{n - 1}{(n - 1) - 2} = \frac{n - 1}{n - 3}. \]
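
To see how quickly this variance approaches the standard normal value of 1, the formula can be evaluated for a few sample sizes. This is a minimal sketch; `t_variance` is a hypothetical helper name, and the sample sizes are chosen only for illustration.

```python
def t_variance(n):
    """Variance of the t-distribution for a sample of size n: nu / (nu - 2), with nu = n - 1."""
    nu = n - 1
    if nu <= 2:
        raise ValueError("the variance is finite only when nu = n - 1 exceeds 2")
    return nu / (nu - 2)

for n in (4, 10, 31, 121):
    print(f"n = {n:3d}, nu = {n - 1:3d}, V(t) = {t_variance(n):.4f}")
```

The variance is well above 1 for very small samples and close to 1 once ν is large.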

7) For small sample sizes, the t-distribution is typically more spread out than the normal distribution.

The t-distribution typically has fatter _____ than the Z for small degrees of freedom.

When the degrees of freedom are larger than __, the t-distribution resembles the ______ distribution.

In the limit, as n approaches infinity, the t and Z distributions are the ____. So, the t-tables usually have probability values only for ν ≤ 30, since for larger samples the normal distribution normally gives a good approximation and is easier to use.

Although the t-distribution holds for any sample size, we usually use it when we are using s____ samples.

[Figure: t-distribution compared with the standard normal]

Probability Applications:

Probability questions involving a t-distributed random variable can be solved by forming the t-statistic:

\[ t = \frac{\bar{X} - \mu}{s/\sqrt{n}}, \]

and determining the probability by using the Student t-table or a computer-generated value (using the @ctdist(x,ν) command in EViews).

The Student t-table gives the values of “t” for selected values of the probability \(1 - F(t_{\nu,\alpha}) = 1 - P(t < t_{\nu,\alpha}) = P(t > t_{\nu,\alpha}) = \alpha\) across the top of the table and for degrees of freedom (ν) down the left margin.

The table gives probabilities for selected t-values for each degree of freedom.

More extensive tables are available.

The easiest way to determine probabilities is to use a statistical package.

[Figure: t-distribution with ν = 24, with −0.685, 0 and 0.685 marked; F(0.685) = 0.75, α = 1 − F(tν)]

Recall, the t-distribution is the appropriate distribution for inference on a population ____ whenever the parent population is normally distributed and σ is _______.

Example: A large restaurant reports that its outstanding bills to suppliers are approximately normally distributed with a mean of $1200. The standard deviation is _______. A random sample of 10 accounts is taken. The mean of the sample is \(\bar{X} = 980\), with a standard deviation s = ___. What is the probability that the sample mean will be 980 or lower when μ = 1200, i.e. \(P(\bar{X} \le 980)\)?

To solve, standardize the values:

\[ t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{980 - 1200}{210/\sqrt{10}} = \frac{-220}{66.4078} = -3.312 \]

The value t = −3.312 does not appear in the row ν = 9. Use the table to determine an upper and a lower bound for P(t < −3.312):

F(−3.250) = 0.005
F(−4.781) = 0.0005
0.0005 ≤ P(t ≤ −3.312) ≤ 0.005.

A sample mean as low as or lower than 980 will occur between approximately 0.05% and 0.5% of the time when μ = 1200. We may be concerned with the accuracy of the sample.

Using EViews:

[Figure: EViews output]

Draw another sample. Next sample: \(\bar{X} = 1250\), s = 195 and n = 10. What is \(P(\bar{X} \ge 1250)\)?

\[ t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{1250 - 1200}{195/\sqrt{10}} = \frac{50}{61.6644} = 0.8108 \]

\(P(\bar{X} \ge 1250)\), from the row ν = 9:

F(0.703) = 0.75 → 1 − F(0.703) = 0.25
F(1.383) = 0.90 → 1 − F(1.383) = 0.10
0.25 ≥ P(t ≥ 0.8108) ≥ 0.10

Using EViews:

[Figure: t-distribution with ν = 9; the area to the right of t = 0.8108 is 21.92%; t = 0.703 leaves 25% in the right tail and t = 1.383 leaves 10% in the right tail]
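
For readers without EViews, the two restaurant probabilities can be approximated with the standard library alone. The sketch below is not the EViews routine: it integrates the Student t density numerically with Simpson's rule, and the function names (`t_pdf`, `t_cdf`, `t_ratio`) and the step count are assumptions made for this illustration.

```python
from math import gamma, pi, sqrt

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=2000):
    """P(t <= x): Simpson's rule on [0, |x|] combined with the symmetry of the density."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ratio(xbar, mu, s, n):
    """t = (xbar - mu) / (s / sqrt(n))."""
    return (xbar - mu) / (s / sqrt(n))

nu = 9
t1 = t_ratio(980, 1200, 210, 10)
p1 = t_cdf(t1, nu)       # P(Xbar <= 980)
t2 = t_ratio(1250, 1200, 195, 10)
p2 = 1 - t_cdf(t2, nu)   # P(Xbar >= 1250)

print(f"first sample:  t = {t1:.4f}, P = {p1:.4f}")
print(f"second sample: t = {t2:.4f}, P = {p2:.4f}")
```

Both probabilities should fall inside the bounds read from the t-table above.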

Example: Determine an interval (a, b) such that P(a ≤ t ≤ b) = 0.90, assuming n − 1 = 19 degrees of freedom.

Put half of the excluded area in each tail of the distribution:

\[ \left(\tfrac{1}{2}\right)(0.10) = 0.05 \]

\[ P(t \ge b) = 0.05 \Rightarrow F(1.729) = 0.95 \Rightarrow b = 1.729 \]

[Figure: t-distribution with ν = 19, with a, 0 and b marked; 0.05 in each tail, 0.90 between a and b, and a cumulative area of 0.95 to the left of b]

Since the t-distribution is symmetrical, ‘a’ is the negative value of b: a = −_____.

\[ P(-1.729 \le t \le 1.729) = 0.90 \]

Use of the t-Distribution When the Population is Not Normal

The discussion so far regarding the t-distribution assumes that samples drawn are from a ________ distributed parent population.

But often we cannot be sure or we cannot determine if the parent population is ______.

“So how important is this ________ assumption?”

The normality assumption can be _______ without significantly changing the sampling distribution of the t-distribution.

The distribution is said to be quite “robust”, which implies the results still hold even if the assumptions about the parent population do not _______ to the original assumption of normality.

We must stress that the t-distribution is appropriate whenever ‘x’ is normal and σ is _______ , even though many t-tables do not list values higher than ν=30.

Some texts suggest that the ______ distribution be used to approximate the t-distribution when ν > 30, since t and z-values will then be quite _____. Because of this procedure, the t-distribution is sometimes erroneously applied to only _____ samples. But, the t-distribution is always correct whenever σ is unknown and x is normal.

The Sampling Distribution of the Sample Variance s², Normal Population

We examined the sampling distribution of \(\bar{X}\) to determine how good \(\bar{X}\) is as an estimator of μ. Now we need to examine the ________ distribution of s² to consider issues about σ². That is, we need to explore the distribution that consists of all the possible values of s² calculated from _______ of size n.

Characteristics of the sample ________:

1) s² must always be p_______. Hence, the distribution of s² cannot be a ______ distribution. The distribution of s² is unimodal, skewed to the ____, and looks like a smooth curve. When sampling is from a normal population, it has one parameter, the degrees of freedom. The shape depends on the sample ____.

2) The usual application involving s² is analyzing whether s² will be larger or smaller than some observed value, given some assumed value of σ².

[Figure: density f(χ²) of the χ² distribution with ν degrees of freedom]

Example: Given σ² = 0.020, what is the probability that a random sample of n = 10 will result in a sample variance of at least 0.015, i.e. P(s² ≥ 0.015), assuming (n − 1) = 9 and σ² = 0.02?

We cannot directly solve this type of problem. We must _________ it: “Multiply s² by (n − 1), then divide the product by σ².” This new random variable is denoted “χ²” (chi-square).

The chi-square distribution is part of a family of positively ______ density functions, which depend on one parameter, n − 1, the degrees of freedom:

\[ \chi^2_{n-1} = \frac{\nu s^2}{\sigma^2} = \frac{(n - 1)s^2}{\sigma^2} \]

If s² is the variance of random samples of size n taken from a normal population having a ________ of σ², then the variable

\[ \frac{(n - 1)s^2}{\sigma^2} \]

has the same distribution as a χ²-variable with (n − 1) d.f. Solving a problem involving s² follows the same process as solving problems for \(\bar{X}\).

Example continuation:

\[ P(s^2 \ge 0.015) = P\left[\frac{(n - 1)s^2}{\sigma^2} \ge \frac{9(0.015)}{0.02}\right] = P(\chi^2_9 \ge 6.75) \]

Properties of the χ² Distribution

1) The number of degrees of freedom in a χ² distribution determines its _____, f(χ²).

When the degrees of freedom is _____, the shape of the density function is highly skewed to the _____.

As ν gets larger, the distribution becomes more s__________.

As ν → ∞, the chi-square distribution becomes normal.

2) χ² is never less than ____. It has values between zero and positive infinity.

3) \(E(\chi^2_\nu) = \nu\)

4) \(V(\chi^2_\nu) = 2\nu\)

Table 7 in the Appendix gives values of the 1 − cumulative χ² distribution for selected values of ν.

Example: Use the chi-square distribution to solve the following. Assume the sample variance equals 16 (dollars squared), s² = __, the population variance is 9 (dollars squared), σ² = 9, and the sample is of size 11, n = __. What is P(s² ≥ 16)?

\[ \chi^2_\nu = \frac{\nu s^2}{\sigma^2} = \frac{(n - 1)s^2}{\sigma^2} = \frac{10(16)}{9} = 17.78 \]

From the Appendix, for ν = 10: the χ² value with upper-tail area 0.100 is ______ and the χ² value with upper-tail area 0.050 is _____.

Meaning: 0.10 > P(χ² ≥ 17.78) > 0.05.
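
Both chi-square examples in this section can be approximated numerically instead of bracketed from a table. The sketch below is an illustration under stated assumptions: it integrates the chi-square density with Simpson's rule (adequate here because ν > 2, so the density is finite at zero), and the function names and step count are invented for this example.

```python
from math import exp, gamma

def chi2_pdf(x, nu):
    """Density of the chi-square distribution with nu degrees of freedom (x >= 0)."""
    return x ** (nu / 2 - 1) * exp(-x / 2) / (2 ** (nu / 2) * gamma(nu / 2))

def chi2_upper_tail(x, nu, steps=2000):
    """P(chi-square >= x) = 1 - integral of the density on [0, x], via Simpson's rule.
    Intended for nu > 2, where the density is finite at zero."""
    h = x / steps
    total = chi2_pdf(0.0, nu) + chi2_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, nu)
    return 1 - total * h / 3

def chi2_statistic(s2, sigma2, n):
    """(n - 1) * s^2 / sigma^2."""
    return (n - 1) * s2 / sigma2

stat_a = chi2_statistic(0.015, 0.02, 10)  # first example: 6.75 with 9 d.f.
p_a = chi2_upper_tail(stat_a, 9)
stat_b = chi2_statistic(16, 9, 11)        # second example: about 17.78 with 10 d.f.
p_b = chi2_upper_tail(stat_b, 10)

print(f"P(s^2 >= 0.015) = P(chi2_9  >= {stat_a:.2f}) = {p_a:.4f}")
print(f"P(s^2 >= 16)    = P(chi2_10 >= {stat_b:.2f}) = {p_b:.4f}")
```

The second probability should land between 0.05 and 0.10, consistent with the bracketing from the table.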