Random Numbers

Toss 1 quarter – What is the probability of getting a head?

Toss 1 quarter 10 times – What is the proportion of heads you get?

Toss 1 quarter 40 times – What is the proportion of heads you get?
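The tosses above can also be tried on a computer. The following is a minimal sketch, assuming a fair coin simulated with Python's standard random module; the function name, the seed, and the extra run of 4000 tosses are illustrative choices, not part of the exercise.

```python
import random

def proportion_of_heads(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times; return the proportion of heads."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses

rng = random.Random(2024)  # fixed seed (an arbitrary choice) so the run is repeatable
for n in (10, 40, 4000):   # 4000 is an added case, to show a longer run
    print(n, proportion_of_heads(n, rng))
```

The proportion usually differs from 0.5 for 10 or 40 tosses and tends to settle closer to 0.5 as the number of tosses grows.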

Excel Exercise 1

The outcome of a coin toss

The time between customers arriving at an ATM

The price of a share of a company’s stock at the close of the market

Independence

Trials (events, outcomes) must be independent of each other.

Probability is empirical.

Computer simulation is faster, but it’s not empirical.

Probability Models

Create a list of the possible outcomes.

Assign probabilities to each outcome.

Phone book – Benford’s Law

First digits of phone number suffixes (e.g., 123-4567)

First digits of addresses

[Figure: bar charts of first-digit frequencies for phone number suffixes and for addresses]

Benford’s Law

In many data sets, the distribution of “first digits” can be modeled by a logarithmic distribution: the probability that the first digit is d is log10(1 + 1/d), for d = 1, ..., 9.

This means that the digit “1” has a much higher probability of occurring than if the digits were uniformly distributed.
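To see how far this is from a uniform distribution, the probabilities can be tabulated. This is a small sketch using the standard statement of Benford's Law, P(d) = log10(1 + 1/d); the variable names are illustrative.

```python
import math

# Benford's Law: P(first digit = d) = log10(1 + 1/d), for d = 1..9
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    # Compare with 1/9, the probability if the nine digits were equally likely
    print(d, f"{p:.3f}", "uniform:", f"{1 / 9:.3f}")
```

Under this model the digit 1 comes first about 30% of the time, against about 11% under a uniform distribution.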

Probability Models

e.g., An “outcome” can be a particular random sample of 55,000 households out of 106,000,000 households.

Suppose there are five students. If we want to draw a random sample with n=3, we have 10 distinct possibilities for samples

$$\binom{5}{3} = \frac{5!}{3!\,(5-3)!} = 10$$

And therefore the sample space is ….

Jack Dan Jill

Jack Dan Harry

Jack Dan Helen

Jack Jill Harry

Jack Jill Helen

Jack Harry Helen

Dan Jill Harry

Dan Jill Helen

Dan Harry Helen

Jill Harry Helen

Each of these is an event
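The ten samples can be listed mechanically. A minimal sketch, assuming the five students are Jack, Dan, Jill, Harry and Helen as in the list above; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

students = ["Jack", "Dan", "Jill", "Harry", "Helen"]

# Every unordered sample of 3 students out of 5: this is the sample space
sample_space = list(combinations(students, 3))

for sample in sample_space:
    print(" ".join(sample))

# The count agrees with the formula 5! / (3! (5-3)!)
print(len(sample_space), comb(5, 3))
```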

Jack Dan Jill        10%    4%    5%    9%    3%
Jack Dan Harry       10%   12%   12%    5%    9%
Jack Dan Helen       10%    7%   13%    3%   25%
Jack Jill Harry      10%    0%    4%    8%   24%
Jack Jill Helen      10%    7%   10%    2%   20%
Jack Harry Helen     10%   16%   14%   16%    7%
Dan Jill Harry       10%   16%   12%   15%    2%
Dan Jill Helen       10%   10%   10%   11%    6%
Dan Harry Helen      10%   19%   14%   17%    0%
Jill Harry Helen     10%    8%    5%   14%    5%
sum                 100%  100%  100%  100%  100%

What if we only care about the sum of the dots, not their order nor how we get to the sum? How many alternatives are there?

What if we only care about the number of dots on the dice, but not their order? How many alternatives are there?
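Both questions can be answered by counting. The sketch below assumes two ordinary six-sided dice, which the slide does not state explicitly; the variable names are illustrative.

```python
from itertools import product

faces = range(1, 7)  # assumption: two six-sided dice

ordered = list(product(faces, repeat=2))                # every (first die, second die) roll
sums = {a + b for a, b in ordered}                      # only the total matters
unordered = {tuple(sorted(roll)) for roll in ordered}   # dots matter, order does not

print(len(ordered), len(sums), len(unordered))
```

Under this assumption there are 36 ordered rolls, 11 possible sums, and 21 unordered pairs.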

A is an event

Assigning Probabilities: Intervals of Outcomes

[Figure: plot with axes x and y]

There are 10 possible events.

What is P(0.2 < X < 0.6)?

Assigning Probabilities: Intervals of Outcomes

There are 50 possible events.

What is P(0.2 < X < 0.6)?

[Figure: plot with axes x and y]

Assigning Probabilities: Intervals of Outcomes

There are 5000 possible events.

What is P(0.2 < X < 0.6)?

[Figure: plot with axes x and y]
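One way to work through the three cases is to count outcomes. The sketch below assumes the possible events are the equally likely, equally spaced values 0/n, 1/n, ..., (n-1)/n; the slides do not say how the events are laid out, so this grid is an assumption made for illustration.

```python
from fractions import Fraction

def prob_between(n, low, high):
    """P(low < X < high) when X is equally likely to be 0/n, 1/n, ..., (n-1)/n."""
    outcomes = [Fraction(k, n) for k in range(n)]  # assumed equally spaced grid
    favorable = sum(1 for x in outcomes if low < x < high)
    return Fraction(favorable, n)

low, high = Fraction(2, 10), Fraction(6, 10)
for n in (10, 50, 5000):
    p = prob_between(n, low, high)
    print(n, p, float(p))
```

On this grid the answers are 3/10, 19/50 and 1999/5000, which approach 0.4, the length of the interval, as the number of possible events grows.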

This Normal distribution is the idealized equivalent of a Normal probability model.

[Figure: Normal density curve, Density against z, with z from -4 to 4]

This Normal probability model (empirical) is idealized by the Normal distribution (computational).

Random Variables

Icosahedron

X

Y

X+Y

Assigning Probabilities: Intervals of Outcomes

Both are uniformly distributed between 0 and 1, but one is a discrete random variable and the other one is continuous.

[Figure: two plots with axes x and y, one per random variable]

For discrete random variables, the difference between > and ≥ matters.

Take the list of integers from 1 to 50. The probability of randomly picking any individual integer out of that sample space, say, 23, is 1/50, one fiftieth.

n          50     500     50,000     50 million      50 trillion             -> Infinity
P(X=23)    1/50   1/500   1/50,000   1/50,000,000    1/50,000,000,000,000    -> 0

For continuous random variables, the difference between > and ≥ doesn’t matter because the probability of picking any one exact number is zero.

If I buy a lottery ticket for $1 but win nothing, my net gain is -$1.

If I win $100,000 (and the probability is 0.00001), my net gain is $99,999.

Then on average, my net gain from playing the lottery is

$$\frac{-1 + 99{,}999}{2} = \$49{,}999 \qquad \text{Wrong!}$$

If I buy a lottery ticket for $1 but win nothing, my net gain is -$1.

If I win $100,000 (and the probability is 0.00001), my net gain is $99,999.

Then on average, my net gain from playing the lottery is

This is the expected value of X

$$E(X) = (-1)(0.99999) + (99{,}999)(0.00001) = \$0$$
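The difference between a plain average and an expected value can be checked numerically. This is a minimal sketch using only the figures stated above (a $1 ticket and a 0.00001 chance of winning $100,000); the variable names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

p_win = Fraction(1, 100_000)           # 0.00001, as stated above

# Net gain -> probability of that net gain
outcomes = {Fraction(-1): 1 - p_win,   # lose: net gain -$1
            Fraction(99_999): p_win}   # win: net gain $99,999

# Plain average of the two net gains: ignores how likely each one is
naive_average = sum(outcomes) / len(outcomes)

# Expected value: each net gain weighted by its probability
expected_value = sum(x * p for x, p in outcomes.items())

print(naive_average, expected_value)
```

With these figures the plain average is 49,999 while the probability-weighted sum is 0, because winning is 99,999 times less likely than losing.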

Excel Exercise 2

Excel Exercise 3

=0

Not independent

The Sampling Distribution of a Sample Mean

Suppose there are five students. If we want to draw a random sample with n=3, we have 10 distinct possibilities for samples

$$\binom{5}{3} = \frac{5!}{3!\,(5-3)!} = 10$$

Jack Dan Jill

Jack Dan Harry

Jack Dan Helen

Jack Jill Harry

Jack Jill Helen

Jack Harry Helen

Dan Jill Harry

Dan Jill Helen

Dan Harry Helen

Jill Harry Helen

Suppose there are 1000 individuals. We can collect information from small samples (n=30, n=60, n=100, n=300).

What will be the sampling distribution of the mean (the distribution of the sample mean)?
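The question can be explored by simulation. The sketch below is not the procedure used for the slides; it assumes a hypothetical population of 1000 normal random numbers with mean and standard deviation of 200,000 (both values are assumptions), draws repeated simple random samples, and looks at how the sample means vary.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed (an arbitrary choice) so the run is repeatable

# Hypothetical population of 1000 individuals; mean and spread are assumptions
population = [rng.gauss(200_000, 200_000) for _ in range(1000)]

def sample_means(n, repetitions=500):
    """Means of `repetitions` simple random samples of size n (without replacement)."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)]

spread = {}
for n in (30, 60, 100, 300):
    means = sample_means(n)
    spread[n] = statistics.stdev(means)  # spread of the sampling distribution
    print(n, round(statistics.fmean(means)), round(spread[n]))
```

The sample means center near the population mean for every n, and their spread shrinks as n grows.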

[Figure: kernel density function of 1000 normal random numbers; Density against hprice]

[Figure: kernel density function of normal random numbers in samples n = 30; ten panels, each showing Density against hprice]

[Figure: kernel density function of normal random numbers in samples n = 60; ten panels, each showing Density against hprice]

[Figure: kernel density function of normal random numbers in samples n = 100; ten panels, each showing Density against hprice]

[Figure: kernel density function of normal random numbers in samples n = 300; three panels, each showing Density against hprice]

[Figure: kernel density function of 1000 normal random numbers; Density against hprice]

              mean       st dev
Population    201,209    199,964
Sample 1      174,898    203,879
Sample 2      145,792    176,502
Sample 3      221,358    200,407
Sample 4      216,957    209,431
Sample 5      205,055    247,535
Sample 6      189,929    217,423
Sample 7      245,484    177,200
Sample 8      207,733    207,355
Sample 9      231,096    189,594
Sample 10     201,115    170,539

n=60

[Figure: four histograms of x (Density against x): hist x in 1/60, hist x in 61/120, hist x in 121/180, hist x in 181/240]

Excel Exercise 4

[Figure: four panels labeled n=3, n=30, n=60 and n=100; horizontal axis from -1.5 to 1.5; population mean = 0.0068]

n             mean       standard error
3              0.1144    0.4978
30            -0.0001    0.1752
60            -0.0421    0.1249
100           -0.011     0.0967
population     0.0068    n.a.

[Figure: plot with horizontal axis from 0 to 1000 and vertical axis from -15 to 20]

Notice that the Central Limit Theorem doesn’t say “Draw an SRS of size n from a Normally distributed population.” It says “any” population.
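The "any population" point can be illustrated with a population that is clearly not Normal. The sketch below assumes an exponential population (a right-skewed distribution chosen only for illustration) and uses skewness as a rough measure of departure from the symmetric Normal shape; the function names are illustrative.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed (an arbitrary choice) so the run is repeatable

def skewness(values):
    """Average cubed z-score: about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def means_of_exponential_samples(n, repetitions=5000):
    """Means of samples of n independent draws from an exponential population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(repetitions)]

# n = 1 shows the population itself; larger n shows the sample mean
skew = {n: skewness(means_of_exponential_samples(n)) for n in (1, 10, 100)}
for n, g in skew.items():
    print(n, round(g, 2))
```

The skewness of the sample mean falls toward 0 as n grows, even though the population itself is strongly skewed.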

Excel Exercise 5 and 6