Sampling

Why We Sample

At the beginning of the term, we talked about populations and samples.
▪ What are they?
▪ Why do we take samples?

Sampling

Generally, we want to know about the population.
But studying/surveying the entire population is problematic!
▪ Too costly
▪ May be impossible!

Sampling

So, we typically study samples rather than entire populations.
But we are not usually interested in the sample itself.
We hope that the sample will give us insight into the population.

Sampling

Starting here, we will look at the relationship between samples and populations:
▪ What we can learn
▪ How precise/reliable the information is

Sample Mean vs. Population Mean

Suppose we were interested in knowing the average travel time for students coming to Seneca.
We don’t want to ask every Seneca student.
So, we take a sample.
We hope that the sample mean will give us insight into the population mean.

Will the sample mean be exactly equal to the population mean?

No, because it depends on exactly who winds up in our sample.

Will the sample mean be the same for every sample?

No, because it depends on exactly who winds up in our sample.

Let’s Try This!

Get into groups (samples) of two, and calculate your average travel time

Key Points

1. The sample mean is RANDOM
▪ It depends on exactly who winds up in the sample

Do these samples give us reliable estimates of the population mean?

VERY SMALL -> Subject to a great deal of randomness

Let’s Try It Again…

Groups of 3

Groups of 5

Groups of 10

Key Points

1. The sample mean is RANDOM
▪ It depends on exactly who winds up in the sample

2. The larger the sample, the more likely that the sample mean will be close to the population mean

In larger samples, the randomness tends to ‘average out’, meaning less random fluctuation from sample to sample

Larger samples give more reliable results
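The two key points can be checked with a small simulation. This is only a sketch: the population of travel times below is made up (uniform between 5 and 90 minutes), and the names `population` and `spread_of_sample_means` are illustrative, not part of the course material. It draws many samples of each size and measures how much the sample means vary from sample to sample.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: travel times (in minutes) for 5,000 students.
population = [random.uniform(5, 90) for _ in range(5000)]
population_mean = statistics.mean(population)

def spread_of_sample_means(sample_size, num_samples=2000):
    """Draw many samples and return the standard deviation of their means."""
    means = [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

# Same group sizes as the in-class exercise: 2, 3, 5 and 10.
spreads = {n: spread_of_sample_means(n) for n in (2, 3, 5, 10)}

print(f"Population mean: {population_mean:.1f} minutes")
for n, spread in spreads.items():
    print(f"n = {n:2d}: sample means typically vary by about {spread:.1f} minutes")
```

The spread should shrink as the group size grows, which is the 'averaging out' described above.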

Implications

Because the sample mean is random, we can describe it using a probability distribution.
▪ I.e., for any given sample mean, there is some probability
▪ And, we can talk about, ‘what is the probability that we get a sample mean in the range ______?’

This distribution is called the ‘sampling distribution’.

What Does the Sampling Distribution Look Like?

Depending on the actual raw data distribution, the distribution of the sample mean can have many different shapes.
In the next slide, we look at three different data distributions, and what the distribution of the sample means looks like
▪ When sample size, n, = 2

Source: Dawson B, Trapp RG: Basic & Clinical Biostatistics, 4th edition

[Figure: Raw Data vs. Distribution of Sample Mean, n = 2]

Those distributions look strange!

But, as sample size increases, wonderful things happen:
First, the sample mean gets more accurate
▪ The distribution gets narrower
▪ I.e., the probability of getting a sample mean far from the real population mean is low
Second, the distribution changes shape

Source: Dawson B, Trapp RG: Basic & Clinical Biostatistics, 4th edition

[Figure: Raw Data vs. Distribution of Sample Mean, when n = 2, n = 10, and n = 30]

Central Limit Theorem

As we take larger samples, the distribution of the sample mean approaches the normal distribution!
▪ (Almost) regardless of the shape of the actual raw data distribution

Because of this, we can use what we have learned about the normal distribution to, e.g., judge how reliable/accurate our sample results are!
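A rough way to see the Central Limit Theorem at work is to start from raw data that is clearly not normal and watch the shape of the sample means change. The sketch below assumes an exponential (strongly right-skewed) raw distribution, which is a choice made for illustration only. It uses skewness as a simple measure of shape, where a value near 0 means symmetric, like the normal distribution.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

def skewness(values):
    """Skewness of a list of numbers: about 0 for a symmetric shape."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def sample_means(sample_size, num_samples=5000):
    """Means of many samples drawn from a right-skewed raw distribution."""
    return [
        statistics.mean([random.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]

# Same sample sizes as in the figure: 2, 10 and 30.
skew_by_n = {n: skewness(sample_means(n)) for n in (2, 10, 30)}

for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of the sample means = {skew:.2f}")
```

The skewness should move toward 0 as n grows, even though the raw data stays just as skewed as before.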

T-Distribution

As discussed, if the sample size is large, the sampling distribution approaches the normal distribution.
But, it’s not exactly equal to the normal distribution
▪ Especially if n is small!
For this reason, we use another, closely related distribution: the t-distribution.

T-Distribution

The t-distribution takes sample size into account.
T is wider and flatter than normal.
The smaller the sample, the wider and flatter!
▪ Reflecting that the information is less reliable
▪ I.e., that we are more likely to get a result far from the real population mean

[Figure: Normal distribution vs. t-distribution for n = 2 and n = 4]

T-Distribution

To use the t-distribution, we need to provide degrees of freedom.
This is just n – 1
▪ (Sample size – 1)

Understanding Sample Mean

We can use the t-distribution to determine the probability of getting a mean in a given range, in the same way we used the normal distribution to find the probability of getting a value in a certain range

Solving Sample Questions

When using t, there is no built-in ‘one-step’ function like norm.dist.

2-step process:
1. Convert the x-value(s) into t-scores
▪ Like z-scores!
2. Use the t-score(s) to look up the probability
▪ Using t.dist
▪ And the same structure: ‘Less than’ -> t.dist; ‘Greater than’ -> 1-t.dist; ‘Between’ -> t.dist(big) – t.dist(small)

Step 1: T-score

Recall: z = (value – mean)/SD

T-score: t = (value – mean)/(SD/sqrt(n))

Divide the standard deviation by the square root of the sample size
• The bigger the sample size, the bigger the number you divide SD by
• -> Smaller SD for the sample mean -> less spread out/more accurate!
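Step 1 can be written as a short function. The function name `t_score` and the numbers used below are illustrative assumptions, not values from the course. The loop shows how the same 2-unit gap between value and mean turns into a larger t-score as the sample size grows.

```python
import math

def t_score(value, mean, sd, n):
    """t = (value - mean) / (SD / sqrt(n)), following the formula above."""
    sd_of_sample_mean = sd / math.sqrt(n)  # SD divided by sqrt of sample size
    return (value - mean) / sd_of_sample_mean

# Illustrative numbers only: value 32, mean 30, SD 8.
for n in (4, 16, 64):
    print(f"n = {n:2d}: t = {t_score(value=32, mean=30, sd=8, n=n)}")
# n =  4: t = 0.5
# n = 16: t = 1.0
# n = 64: t = 2.0
```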

Step 2: Calculate Probability

=t.dist(t-score, degrees of freedom, True)

Business Problems

I will walk you through an example, but first, we note that we cover this primarily so you will understand what comes later.
Direct business applications (or at least, marketing applications) aren’t as common as for other techniques.

‘Normal distribution’ Question

Heights for a particular segment are normally distributed, with an average of 176 cm, and a standard deviation of 7.1 cm.
If you select an individual at random, what is the probability that he has a height greater than 180 cm?

=1 – norm.dist(180, 176, 7.1, true) ≈ 0.287
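For anyone who wants to check this outside Excel, the same calculation can be sketched in Python, using the standard library's `NormalDist` as a stand-in for norm.dist. The variable names are illustrative.

```python
from statistics import NormalDist

# Heights: normally distributed, mean 176 cm, standard deviation 7.1 cm.
heights = NormalDist(mu=176, sigma=7.1)

# 'Greater than' -> 1 minus the cumulative probability up to 180 cm.
prob_taller_than_180 = 1 - heights.cdf(180)

print(round(prob_taller_than_180, 3))  # 0.287
```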

‘Sampling’ Question

Heights for a particular segment are normally distributed, with an average of 176 cm, and a standard deviation of 7.1 cm.
If you select a random sample of size 5, what is the probability that the mean height is greater than 180 cm?

t = (180-176)/(7.1/sqrt(5)) = 1.259756607

Degrees of freedom = n – 1 = 5 – 1 = 4

prob =1 – t.dist(1.259756607, 4, true) ≈ 0.138
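The two-step process can also be sketched in Python. The standard library has no t-distribution, so the helper `t_cdf` below is a hand-rolled stand-in for t.dist that integrates the t density numerically. It is an assumption made for this illustration, not how Excel computes it, and statistics libraries such as SciPy provide a ready-made version.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, t], plus the 0.5 that lies left of 0."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

n = 5

# Step 1: convert the x-value into a t-score.
t = (180 - 176) / (7.1 / math.sqrt(n))

# Step 2: 'Greater than' -> 1 - t.dist, with n - 1 = 4 degrees of freedom.
prob = 1 - t_cdf(t, df=n - 1)

print(round(t, 4), round(prob, 3))  # 1.2598 0.138
```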

More practice

Repeat, with:
▪ Sample size of 15
▪ Sample size of 30

What happens to the probability? Why?
