June 4, 2009 Dr. Lisa Green

Probability and Statistics


Page 1: Probability and Statistics

June 4, 2009
Dr. Lisa Green

Page 2: Probability and Statistics

Main goal: Understand the difference between probability and statistics.

Also will see:
• Binomial Model
• Law of Large Numbers
• Monte Carlo Simulation
• Confidence Intervals

Page 3: Probability and Statistics

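[Diagram: Model and Data, linked in two directions. Going from Model to Data is Probability; going from Data to Model is Statistics.]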

Model: An idealized version of how the world works.

Data: Collected observations.

Page 4: Probability and Statistics

Probability: The model is known, and we use this knowledge to describe what the data will look like.

Statistics: The model is (partially) unknown, and we use the data to make conclusions about the model.

Page 5: Probability and Statistics

There are repeated trials, each of which has only two outcomes (success or failure).

The trials are independent of each other.

The number of trials (n) is known.

The probability of success on each trial (p) is constant.

Page 6: Probability and Statistics

Flip a coin 10 times, count the number of heads seen. n=10, p=0.50

Test 100 newly manufactured widgets, count the number that fail to work. n=100, p=?

Give a blood test to 35 volunteers, count the number with high cholesterol. n=35, p=?
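Each of these counts can be imitated on a computer when p is known. The sketch below runs one binomial experiment as n independent trials with a fixed success probability p. The function name `run_binomial_experiment` and the seed value are illustrative choices, not something taken from the slides.

```python
import random

def run_binomial_experiment(n, p, rng):
    """Run n independent trials, each succeeding with probability p,
    and return the number of successes."""
    return sum(1 for _ in range(n) if rng.random() < p)

# Fixed (arbitrary) seed so the run can be repeated exactly.
rng = random.Random(2009)

# The coin example: n=10 flips, p=0.50 chance of heads on each flip.
heads = run_binomial_experiment(10, 0.50, rng)
print(f"Heads in 10 flips: {heads}")
```

For the widget and cholesterol examples p is unknown, so they cannot be simulated this way without first assuming a value for p.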

Page 7: Probability and Statistics

Pick a point at random inside the unit square.

If it is also inside the arc of the unit circle, count it as a success. If not, count it as a failure.

What is the probability of a success?

[Figure: the unit square, side length 1 unit, with an arc of the unit circle drawn inside it.]

Page 8: Probability and Statistics

We know that the probability of success is π/4.

If we repeat this trial n times, we have a binomial experiment.

If n=100, then 95% of the time we expect between 71 and 86 of the trials to be successes.
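The slides do not say how this range was obtained. One common approach, assumed here, is the normal approximation to the binomial: take the mean np plus or minus 1.96 standard deviations, where the standard deviation is sqrt(np(1-p)), and keep the whole numbers that fall inside. The function name `binomial_range` is illustrative.

```python
import math

def binomial_range(n, p, z=1.96):
    """Approximate range that holds the success count about 95% of the time.

    Assumption: normal approximation to the binomial, mean +/- z * sd,
    keeping only whole-number counts inside the range.
    """
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    lower = math.ceil(mean - z * sd)
    upper = math.floor(mean + z * sd)
    return lower, upper

p = math.pi / 4  # probability that a random point lands inside the arc
print(binomial_range(100, p))  # (71, 86)
```

Under this assumption the same function gives 760 to 810 for n=1000. The range grows with n, but it becomes narrower as a fraction of n.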

Page 9: Probability and Statistics

n            Lower bound   Upper bound
100          71            86
1000         760           810
10000        7774          7934
100000       78286         78794
1000000      784594        786202
10000000     7851438       7856526

7851438/10000000 * 4 = 3.1406 and 7856526/10000000 * 4 = 3.1426

This is the law of large numbers in action.

If we didn’t already know the value of pi, and we had a lot of time, we could use this to estimate pi. Using random processes to estimate constant numbers is called Monte Carlo Simulation.
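A minimal sketch of such an estimate follows. It assumes the arc is the quarter of the unit circle centered at one corner of the square, so a point (x, y) counts as a success when x² + y² ≤ 1. The function name `estimate_pi` and the seed are illustrative.

```python
import random

def estimate_pi(n, rng):
    """Estimate pi from n random points in the unit square.

    Assumption: a point is a success if it lies inside the quarter circle
    of radius 1 centered at the origin, which happens with probability pi/4.
    """
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The fraction of successes estimates pi/4, so multiply by 4.
    return 4 * inside / n

rng = random.Random(42)  # arbitrary seed for repeatability
for n in (100, 10_000, 100_000):
    print(n, estimate_pi(n, rng))
```

The estimates tend to settle closer to pi as n grows, though any single run can still land above or below it.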

A simulation of this is at http://polymer.bu.edu/java/java/montepi/montepiapplet.html

Page 10: Probability and Statistics

We knew the model.

We knew the values of all constants.

We used that knowledge to make predictions about what was going to happen.

Page 11: Probability and Statistics

Ask a randomly chosen person whether they know anyone affected by layoffs at GM.

If the response is yes, count this as a success. If not, count it as a failure.

What is the probability of a success?

Page 12: Probability and Statistics

We don’t know the probability of success. Let’s call it p for now.

If we repeat the trial n times, and are careful about which people we talk to, we have a binomial experiment.

If we talk to 100 people, and 17 say they know someone affected by layoffs at GM, then the value of p is somewhere between 0.096 and 0.244 (95% confidence).
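The slides do not name the interval formula. The numbers match the standard large-sample interval for a proportion, which is assumed in this sketch: the observed proportion plus or minus 1.96 times sqrt(p̂(1-p̂)/n). The function name `proportion_interval` is illustrative.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for an unknown proportion p.

    Assumption: large-sample (normal approximation) interval,
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(17, 100)
print(f"{low:.3f} to {high:.3f}")  # 0.096 to 0.244
```

Keeping the observed proportion at 0.17 while increasing n shrinks the margin, because n appears in the denominator under the square root.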

Page 13: Probability and Statistics

n          Observed successes   Lower bound   Upper bound
100        17                   0.096         0.244
1000       170                  0.147         0.193
10000      1700                 0.163         0.177
100000     17000                0.168         0.172
1000000    170000               0.169         0.171

Note: There are obviously logistical difficulties in asking a million people a question.

Confidence intervals have confidence levels. The ones above are at the 95% confidence level. Here is an applet that lets you explore what the confidence level means: http://www.rossmanchance.com/applets/Confsim/Confsim.html
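The same idea can be sketched in a few lines of code. Here we pretend to know the true p (0.17 is an assumed value, chosen only for the simulation), draw many samples of 100 people, build an interval from each sample, and count how often the interval contains the true p. The large-sample interval formula and the function names are assumptions, not taken from the slides.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Large-sample 95% interval for a proportion (assumed formula)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def coverage(true_p, n, repetitions, rng):
    """Fraction of simulated samples whose interval contains true_p."""
    hits = 0
    for _ in range(repetitions):
        # One sample: n independent yes/no answers with success probability true_p.
        successes = sum(1 for _ in range(n) if rng.random() < true_p)
        low, high = wald_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / repetitions

rng = random.Random(7)  # arbitrary seed for repeatability
print(coverage(true_p=0.17, n=100, repetitions=2000, rng=rng))
```

The printed fraction should come out in the neighborhood of 0.95. It is often slightly lower, because the formula is only an approximation for samples of this size.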

Page 14: Probability and Statistics

We knew the model, but not the value of all constants.

We used observed data to tell us something about the model (the unknown constant).

Page 15: Probability and Statistics

Buffon’s Needle http://www.mste.uiuc.edu/reese/buffon/buffon.html

Reese’s Pieces Applet http://www.rossmanchance.com/applets/Reeses/ReesesPieces.html

CAUSEweb http://www.causeweb.org/

Page 16: Probability and Statistics

$$P(x) = \binom{n}{x} p^x (1-p)^{n-x}$$
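The binomial formula gives the probability of exactly x successes in n trials: the number of ways to choose which x trials succeed, times p^x, times (1-p)^(n-x). A small sketch using Python's standard library follows; the function name `binomial_pmf` is illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Probability of exactly 5 heads in 10 fair coin flips.
print(binomial_pmf(5, 10, 0.50))  # 0.24609375

# The probabilities over all possible counts add up to 1 (up to rounding).
print(sum(binomial_pmf(x, 10, 0.14) for x in range(11)))
```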

Page 17: Probability and Statistics

[Figure panels: N=10, p=0.14 and N=100, p=0.14]