Wednesday, May 13, 2015 Report at 11:30 to Prairieview

AP Exam

We started by describing univariate (that means one variable) data graphically and numerically.

Graphically:

Exploring Data: Describing Patterns and Departures from Patterns

Dotplot, Stemplot

More graphs…

Histogram

Cumulative Frequency

You will need to read, interpret, and answer questions about graphs.

When asked to describe data based on the graph, focus on SOCS

Shape: Mound, Skewed (left or right? positive or negative?), Bimodal, Unimodal, Uniform, approximately normal…

Outliers: Are there any potential outliers? If you are only asked to describe the graph, you don’t need to calculate; just mention any potential outliers.

Center: Where (approximately) is the median? The mean? Based on the shape of the data, which is the better choice for center?

Spread: Range, IQR

SOCS

While a plot provides a nice visual description of a dataset, we often want a more detailed numeric summary of the center and spread.

Mean

Median or Q2

Describing Center

Measures of Variability: When describing the “spread” of a set of data, we can use:

Range: Max − Min
Interquartile Range: IQR = Q3 − Q1
Standard Deviation: s = sqrt( Σ(x − x̄)² / (n − 1) )
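The measures of center and spread above can be sketched in a few lines of Python. This is only an illustration: the data set is made up, the quartiles use the "median of each half" convention (other conventions give slightly different values), and the 1.5 × IQR rule for flagging potential outliers is a common convention assumed here rather than something stated in these notes.

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-each-half convention."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]   # values above the median position
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)

def summarize(data):
    """Numeric summary of center and spread for one quantitative variable."""
    q1, med, q3 = quartiles(data)
    iqr = q3 - q1
    # Assumed convention: flag values more than 1.5 * IQR beyond the quartiles.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "median": med,
        "range": max(data) - min(data),
        "iqr": iqr,
        "stdev": statistics.stdev(data),   # sample SD, divides by n - 1
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

scores = [2, 4, 4, 5, 7, 9, 10, 12, 30]   # made-up data for illustration
summary = summarize(scores)
print(summary)
```

In this made-up data set the value 30 pulls the mean above the median, which is why the median and IQR are usually the better summary for skewed data or data with outliers.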

Use a bar graph to display categorical data. Make sure the graph is labeled and the bars are of equal width and evenly spaced.

A pie chart may be used to display categorical data if the data are parts of a whole.

Analyzing Categorical Data

Conditional Distribution

Describes the values of a variable among individuals who have a specific value of another variable.
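As a small sketch of this definition, the conditional distribution of one variable given a value of another is just a row of a two-way table divided by its row total. The table below (subject preference by grade level) and the function name are hypothetical, chosen only for illustration.

```python
# Hypothetical two-way table: counts of preferred subject by grade level.
counts = {
    "Junior": {"Math": 30, "Science": 20, "English": 50},
    "Senior": {"Math": 45, "Science": 30, "English": 25},
}

def conditional_distribution(table, given):
    """Proportions of the column variable among individuals in row `given`."""
    row = table[given]
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}

juniors = conditional_distribution(counts, "Junior")
seniors = conditional_distribution(counts, "Senior")
print(juniors)
print(seniors)
```

Comparing the two conditional distributions side by side is how you would argue whether the variables appear to be associated.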

Many distributions of data and many statistical applications can be described by an approximately normal distribution.

Symmetric, bell-shaped curve
Centered at mean μ
Described as N(μ, σ)
Empirical Rule:
68% of data within 1σ of μ
95% within 2σ of μ
99.7% within 3σ of μ

Normal Distribution

If the data does not fall exactly 1σ, 2σ, or 3σ from µ, we can standardize the value using a z-score: z = (x − µ) / σ

Standardizing Data

You can find the proportion or percentile by finding the area to the left of the z-score.

The total area under a distribution curve = 1

Use Table A to find percentiles or p-values.

To get area above: normalcdf(z, 100, 0, 1)
To get area below: normalcdf(-100, z, 0, 1)
Between: normalcdf(z1, z2, 0, 1), where z1 < z2
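For readers without the calculator at hand, the same areas can be sketched with Python's standard-library `statistics.NormalDist`. The helper names and the example values (x = 130, µ = 100, σ = 15) are made up for illustration; this is not a replacement for the calculator steps above.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

def area_below(z):
    """Area to the left of z (the percentile, as a proportion)."""
    return standard_normal.cdf(z)

def area_above(z):
    """Area to the right of z."""
    return 1 - standard_normal.cdf(z)

def area_between(z_low, z_high):
    """Area between two z-scores, with z_low < z_high."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

z = z_score(130, mu=100, sigma=15)   # hypothetical value and parameters
print(z, area_below(z), area_above(z))

# The Empirical Rule is just these areas at 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(k, round(area_between(-k, k), 4))
```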

Shape
Normal Probability Plot: Linear plot = Normal distribution

Assessing Normality

Bivariate Data (That means 2 variables)

The study of bivariate data is the study of the relationship between quantitative variables.

• D O F S (Direction, Outliers, Form, Strength)
• Least Squares Regression Line
• Residuals (observed – predicted)
• Correlation (r) – Correlation Coefficient
• r² – Coefficient of Determination

Calculator Steps:
Make sure Diagnostics are on
Enter data in L1, L2
STAT CALC LinReg (a + bx)

Correlation “r”

We can describe the strength of a linear relationship with the Correlation Coefficient, r.

-1 ≤ r ≤ 1

The closer r is to 1 or -1, the stronger the linear relationship between x and y.

r alone is not enough to say there is a linear relationship between 2 variables.
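A minimal sketch of why r alone is not enough: below, r is computed from standardized values for two made-up data sets, one exactly linear and one clearly curved (y = x²). The curved data still produces an r close to 1, so a scatterplot or residual plot is needed before calling a relationship linear. The function name and data are illustrative assumptions.

```python
import statistics

def correlation(xs, ys):
    """Correlation coefficient r: average product of z-scores (dividing by n - 1)."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

xs = list(range(1, 11))
linear = [2 * x + 1 for x in xs]   # exactly linear, made-up data
curved = [x ** 2 for x in xs]      # clearly curved, made-up data

r_linear = correlation(xs, linear)
r_curved = correlation(xs, curved)
print(r_linear, r_curved)
```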

Least Squares Regression Line

When we observe a linear relationship between x and y, we often want to describe it with a “line of best fit” y=a+bx.

We can find this line by performing least-squares regression.

We can use the resulting equation to predict y-values for given x-values.
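As a rough sketch of what the least-squares fit computes, the slope and intercept of ŷ = a + bx can be found directly from the data: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The data (hours studied and test scores) and the function name are hypothetical.

```python
import statistics

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx            # slope
    a = y_bar - b * x_bar      # intercept: the line passes through (x_bar, y_bar)
    return a, b

hours = [1, 2, 3, 4, 5]          # made-up explanatory values
scores = [52, 60, 61, 70, 77]    # made-up response values

a, b = least_squares_line(hours, scores)
predicted = a + b * 3.5          # prediction for an x inside the observed range
print(a, b, predicted)
```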

Assessing Fit

If we hope to make useful predictions of y, we must assess whether or not the LSRL is indeed the best fit. If not, we may need to find a different model.

Use the residual plot to help determine linearity.

Points in the residual plot should be randomly scattered, with no obvious pattern or curvature.

If you are satisfied that the LSRL provides an appropriate model for predictions, you can use it to predict a y-hat for x’s within the observed range of x-values.

Predictions for observed x-values can be assessed by noting the residual. Residual = observed y − predicted y = y − ŷ
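The sketch below computes residuals (observed minus predicted) for a small made-up data set and refuses to predict outside the observed range of x. The coefficients a = 46 and b = 6 are the least-squares values for this particular made-up data; the function names are hypothetical.

```python
def residuals(xs, ys, a, b):
    """Residual for each point: observed y minus predicted y_hat = a + b * x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def predict(x, a, b, observed_xs):
    """Predict y_hat, but only for x within the observed range (no extrapolation)."""
    if not min(observed_xs) <= x <= max(observed_xs):
        raise ValueError("x is outside the observed range of x-values")
    return a + b * x

hours = [1, 2, 3, 4, 5]          # made-up explanatory values
scores = [52, 60, 61, 70, 77]    # made-up response values
a, b = 46.0, 6.0                 # least-squares coefficients for this made-up data

res = residuals(hours, scores, a, b)
print(res)                        # positive: line underpredicts; negative: overpredicts
print(predict(3.5, a, b, hours))
```

Plotting `res` against `hours` gives the residual plot; for a least-squares line the residuals sum to zero, so what matters is whether they show a pattern.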

Making Predictions

Bivariate Relationship – Non Linear

If data is not best described by a LSRL, we may be able to find a Power or Exponential model that can be used for more accurate predictions.

Power Model (ln x , ln y ) or ( log x , log y )

Exponential Model ( x , ln y ) or ( x , log y )

If (x,y) is non-linear, we can transform it to try to achieve a linear relationship.

If the transformed data appears linear, we can find an LSRL and then transform back to the original terms of the data.
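Here is a minimal sketch of the exponential case, assuming natural logs: fit a line to (x, ln y) to get ln ŷ = a + bx, then transform back to ŷ = eᵃ · (eᵇ)ˣ. The data is made up to follow y = 3 · 2ˣ exactly so the result is easy to check; the function names are hypothetical.

```python
import math
import statistics

def fit_line(xs, ys):
    """Least-squares intercept and slope (a, b) for y_hat = a + b * x."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def fit_exponential(xs, ys):
    """Fit y_hat = c * d**x by regressing ln(y) on x, then transforming back."""
    a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(a), math.exp(b)

xs = [0, 1, 2, 3, 4]
ys = [3 * 2 ** x for x in xs]    # made-up data following y = 3 * 2**x

c, d = fit_exponential(xs, ys)
print(c, d)                       # approximately 3 and 2
print(c * d ** 2.5)               # prediction in the original units
```

A power model works the same way, except that both x and y are logged before fitting, which transforms back to ŷ = eᵃ · xᵇ.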

Our goal in statistics is often to answer a question about a population using information from a sample.

Observational Study vs. Experiment

We must be sure the sample is representative of the population in question.

Sampling and Surveys

If you are performing an observational study, your sample can be obtained in a number of ways:

Convenience
Cluster
Systematic
Simple Random Sample
Stratified Random Sample
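Three of these sampling methods can be sketched with Python's `random` module. The population (student ID numbers 1 to 100), the two strata, and the function names are all hypothetical, and the seed is fixed only so the illustration is repeatable.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every group of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Systematic: random start among the first k, then every k-th individual."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified: take a separate SRS from each stratum and combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

rng = random.Random(2015)                 # fixed seed for a repeatable illustration
students = list(range(1, 101))            # hypothetical ID numbers 1..100
strata = {"underclassmen": students[:50], "upperclassmen": students[50:]}

srs = simple_random_sample(students, 10, rng)
systematic = systematic_sample(students, 10, rng)
stratified = stratified_sample(strata, 5, rng)
print(srs, systematic, stratified, sep="\n")
```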

Observational Studies

In an experiment, we impose a treatment with the hopes of establishing a causal relationship.

Experiments exhibit 3 Principles:

Randomization
Control
Replication

Experimental Study

Like Observational Studies, Experiments can take a number of different forms:

Completely Controlled
Randomized Comparative Experiment
Blocked
Matched Pairs
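Random assignment is what separates these designs from an observational study. The sketch below shows one plausible way to assign treatments at random for a two-group comparative experiment and for a matched pairs design. The subject labels, pairings, and function names are hypothetical.

```python
import random

def randomized_two_groups(subjects, rng):
    """Shuffle all subjects, then split them into treatment and control groups."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def matched_pairs(pairs, rng):
    """Within each pair of similar subjects, randomly pick who gets the treatment."""
    assignment = []
    for first, second in pairs:
        if rng.random() < 0.5:            # a coin flip decides the order
            first, second = second, first
        assignment.append({"treatment": first, "control": second})
    return assignment

rng = random.Random(513)                  # fixed seed for a repeatable illustration
subjects = [f"S{i}" for i in range(1, 21)]                   # hypothetical subjects
pairs = [(subjects[i], subjects[i + 1]) for i in range(0, 20, 2)]

treatment, control = randomized_two_groups(subjects, rng)
paired = matched_pairs(pairs, rng)
print(treatment, control, paired, sep="\n")
```

A blocked design would apply `randomized_two_groups` separately within each block, so that the blocking variable cannot be confounded with the treatment.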

Experimental Designs

Scope of Inference