Stats 3000 Week 2 - Winter 2011

Section D

1. Goodness of fit (Ch. 6)
2. Test of independence (Ch. 6)
3. Simple regression and correlation (not included on test 3)
   a) Regression (Ch. 9)
   b) Correlation (Ch. 9)
   c) Inferences about regression and correlation (Ch. 9)

Data comes in pairs of quantitative variables. Given such paired data (bivariate data), we want to determine whether there is a relationship between the two quantitative variables and, if so, identify what the relationship is.

Regression analysis allows us to identify an equation that best fits the data, and to predict values of one variable based on another variable.

Descriptive Methods in Regression

What is Linear Regression?

• The straight-line linear regression model is a means of relating one quantitative variable to another quantitative variable.

• It is a way of predicting the value of one variable from another.
  – It is a hypothetical model of the relationship between two variables.
  – The model used is a linear one.
  – Therefore, we describe the relationship using the equation of a straight line.

LINEAR REGRESSION ANALYSIS (PREDICTION)

The process of finding the equation of the straight line that best predicts the value of one variable from a given value of the other.

Procedures that allow us to predict one variable (Y) based on knowledge of another variable (X).

The goal is to be able to predict new values of Y based on values of X.

Generally, Y is called the dependent variable (also the predicted, criterion, or outcome variable; plotted on the ordinate), and X is called the independent variable (also the predictor; plotted on the abscissa).

Scatter Plot: shows the relationship between X and Y.

Student | High School GPA (X) | University GPA (Y)
2 | 2.25 | 2.00
3 | 2.60 | 1.80
4 | 2.65 | 2.80
5 | 2.80 | 2.10
6 | 3.10 | 2.00
7 | 2.90 | 2.65
8 | 3.25 | 2.25
9 | 3.30 | 2.60
10 | 3.60 | 3.00
11 | 3.25 | 3.10

(Figure: scatterplot of University GPA (Y) against High School GPA (X).)

Describing a Straight Line

Linear equation: when the relationship between X and Y is linear, it can be written as Y = bX + a.

Regression line: the line whose equation is used for prediction; the line that best describes the relationship between Y, the dependent variable, and X, the independent variable.

Linear regression builds on the equation for a straight line because the relationship between the two variables is assumed to be linear.

A straight line should yield the best "fit" of the data points in a scatterplot (a linear model).

$$\hat{Y} = bX + a \quad \text{(regression equation)}$$

where
$\hat{Y}$ = predicted value of Y
b = slope of the line, called the regression coefficient
X = value of the independent variable
a = intercept

Slope: the change in the predicted value of Y for a one-unit increase in X.

$$b = \frac{\text{change in value of } \hat{Y}}{\text{change in value of } X} = \frac{\Delta \hat{Y}}{\Delta X}$$

Intercept: the point at which the line intersects the Y axis; it is the value of $\hat{Y}$ when X = 0. The intercept determines the location of the line.

Intercepts and Slopes

Least squares criterion: the statistical method for finding the best prediction line. The best prediction line minimizes the error in predicting Y from X.

The best regression line is the one closest to the actual data points.

Residual: the difference between an observed score and its predicted value.


$e = Y - \hat{Y}$ = residual, the error in prediction

The best regression line minimizes the value of

$$\sum (Y - \hat{Y})^2 \quad \text{(is at a minimum)}$$

$$SS_{\text{residual}} = \text{sum of squares residual} = \sum (Y - \hat{Y})^2$$
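A minimal Python sketch of the two quantities just defined, the residuals and $SS_{\text{residual}}$, for any candidate line $\hat{Y} = bX + a$. The function names and the three-point toy data set are hypothetical, chosen only for illustration.

```python
def residuals(x, y, a, b):
    """Return the residuals e = Y - Y_hat for the line Y_hat = b*X + a."""
    return [yi - (b * xi + a) for xi, yi in zip(x, y)]


def ss_residual(x, y, a, b):
    """Sum of squared residuals, the quantity the least-squares line minimizes."""
    return sum(e ** 2 for e in residuals(x, y, a, b))


# Hypothetical toy data, for illustration only
x = [1, 2, 3]
y = [2, 4, 7]
print(residuals(x, y, a=0, b=2))    # [0, 0, 1]
print(ss_residual(x, y, a=0, b=2))  # 1
```

The least-squares line is the choice of a and b that makes `ss_residual` as small as possible for the given data.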

$$b = \frac{\text{Cov}_{XY}}{s_X^2} = \frac{\text{Covariance of } X \text{ and } Y}{\text{Variance of } X}$$

Covariance

The degree to which X and Y vary together (covary); the variation in one variable (X) that is shared by another (Y).

$$\text{Cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}$$

$$s_X^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

$$a = \bar{Y} - b\bar{X}$$
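The formulas $b = \text{Cov}_{XY}/s_X^2$ and $a = \bar{Y} - b\bar{X}$ can be sketched in Python as follows. This is an illustrative helper (the name `least_squares_fit` is hypothetical), assuming sample statistics with an n − 1 divisor as in the worked examples; that divisor cancels in b anyway.

```python
def least_squares_fit(x, y):
    """Return (b, a) for Y_hat = b*X + a, with b = Cov(X, Y) / Var(X) and a = Ybar - b*Xbar."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length lists with at least 2 pairs")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and sample variance of X, both with divisor n - 1
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b = cov_xy / var_x      # slope (regression coefficient)
    a = y_bar - b * x_bar   # intercept
    return b, a


# Points that lie exactly on Y = 2X + 1
print(least_squares_fit([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```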

A medical researcher is interested in the possibility of a linear relationship between a patient's age and the effectiveness of a certain drug, measured in hours. The drug is administered to 8 randomly selected patients.

Age (X) | Effectiveness (Y, hours)
34 | 6.3
42 | 8.1
37 | 7.9
55 | 9.8
47 | 8.6
43 | 8.4
52 | 9.1
39 | 8.6

$\bar{X} = 43.625$, $\bar{Y} = 8.35$

$$b = \frac{\text{Cov}_{XY}}{s_X^2}$$

Covariance:

$$\sum (X - \bar{X})(Y - \bar{Y}) = (34 - 43.625)(6.3 - 8.35) + (42 - 43.625)(8.1 - 8.35) + (37 - 43.625)(7.9 - 8.35) + (55 - 43.625)(9.8 - 8.35) + \cdots + (39 - 43.625)(8.6 - 8.35) = 45.55$$

$$\text{Cov}_{XY} = \frac{45.55}{7} = 6.507$$

Variance of X:

$$s_X^2 = \frac{(34 - 43.625)^2 + (42 - 43.625)^2 + (37 - 43.625)^2 + (55 - 43.625)^2 + \cdots + (39 - 43.625)^2}{7} = \frac{371.875}{7} = 53.125$$

$$b = \frac{6.507}{53.125} = .122$$

$$a = \bar{Y} - b\bar{X} = 8.35 - (.122)(43.625) = 3.03$$

$$\hat{Y} = .122X + 3.03$$

Prediction for a 44-year-old patient:

$$\hat{Y} = 3.03 + (.122)(44) = 8.398 \text{ hours}$$
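A sketch that reproduces the drug example in plain Python, assuming the eight (age, effectiveness) pairs from the table above and n − 1 divisors. It keeps full precision throughout, so a comes out near 3.006 rather than 3.03: the hand calculation rounds b to .122 before computing a.

```python
# Age (X) and drug effectiveness in hours (Y) for the 8 patients in the example
age = [34, 42, 37, 55, 47, 43, 52, 39]
effect = [6.3, 8.1, 7.9, 9.8, 8.6, 8.4, 9.1, 8.6]

n = len(age)
x_bar = sum(age) / n       # 43.625
y_bar = sum(effect) / n    # 8.35 (up to floating-point error)

# Sum of cross products and sum of squares of X, each divided by n - 1
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, effect)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in age) / (n - 1)

b = cov_xy / var_x          # slope (regression coefficient)
a = y_bar - b * x_bar       # intercept


def predict(x):
    """Predicted effectiveness (hours) for a patient of age x."""
    return b * x + a


print(f"Cov = {cov_xy:.3f}, s_X^2 = {var_x:.3f}")  # Cov = 6.507, s_X^2 = 53.125
print(f"b = {b:.4f}, a = {a:.3f}")                 # b = 0.1225, a = 3.006
print(f"Y_hat(44) = {predict(44):.2f} hours")      # Y_hat(44) = 8.40 hours
```

Either version of the equation predicts about 8.4 hours for a 44-year-old; the small difference in a comes only from when the rounding happens.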

A researcher suspects that there is a relationship between the number of promises a political candidate makes and the number of promises that are fulfilled once the candidate is elected. He examines the track record of 10 politicians. Use SPSS to construct a regression equation that predicts the number of promises kept from the number of promises made.

In the SPSS output, column B under "Unstandardized Coefficients" gives the regression equation; the (Constant) row is the intercept.

$$\hat{Y} = 0.118X + 9.268$$

Standard error of estimate ($s_{Y \cdot X}$)

It is a measure of the amount of error in prediction, in units of the Y variable. It is the standard deviation of the distribution of obtained Y scores about the predicted values of Y, $\hat{Y}$.

Standard error of estimate: a measure of the error in prediction, used as the basis for a measure of the accuracy of prediction.

$$s_{Y \cdot X} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{SS_{\text{residual}}}{df}}, \qquad df = n - 2$$

$s_{Y \cdot X}$ represents the average error in prediction over an entire scatterplot.

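Age (X) | Effectiveness (Y) | $\hat{Y}$ | $(Y - \hat{Y})^2$
34 | 6.3 | 7.178 | .771
42 | 8.1 | 8.154 | .003
37 | 7.9 | 7.544 | .127
55 | 9.8 | 9.740 | .004
47 | 8.6 | 8.764 | .027
43 | 8.4 | 8.276 | .015
52 | 9.1 | 9.374 | .075
39 | 8.6 | 7.788 | .659

$$\sum (Y - \hat{Y})^2 = 1.681$$

$$s_{Y \cdot X} = \sqrt{\frac{1.681}{8 - 2}} = \sqrt{.28} = .53 \text{ hours}$$

That is, the averaged dispersion of the effectiveness scores around their predicted values is about .53 hours.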
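The same standard error can be sketched in Python, assuming the rounded equation $\hat{Y} = .122X + 3.03$ from the notes and df = n − 2. The helper name is hypothetical.

```python
import math

# Drug example data: age (X) and effectiveness in hours (Y)
age = [34, 42, 37, 55, 47, 43, 52, 39]
effect = [6.3, 8.1, 7.9, 9.8, 8.6, 8.4, 9.1, 8.6]


def standard_error_of_estimate(x, y, a, b):
    """s_{Y.X} = sqrt(SS_residual / (n - 2)) for the line Y_hat = b*X + a."""
    n = len(x)
    ss_res = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_res / (n - 2))  # df = n - 2


# Using the rounded equation from the notes, Y_hat = .122X + 3.03
s = standard_error_of_estimate(age, effect, a=3.03, b=0.122)
print(f"s_Y.X = {s:.2f} hours")  # s_Y.X = 0.53 hours
```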


Scatterplot

• To see if scores may be related, construct a graph of the scores, called a scatterplot.
  – The variable labeled X is plotted on the horizontal axis (the abscissa).
  – The Y variable is plotted on the vertical axis (the ordinate).
  – The score of a subject on each of the two measures is indicated by one point on the scatterplot.

Conclusions drawn from scatterplots are subjective. A more precise and objective method for detecting straight-line patterns is the linear correlation coefficient.

The linear correlation coefficient r (often simply called the correlation coefficient) measures the strength of the linear relationship between the paired x and y values in a sample.

Correlation is a statistical technique used to measure the relationship between two variables (its magnitude and direction).

Descriptive Methods in Correlation

Pearson correlation (r): r is a descriptive statistic used to measure the degree of straight-line relationship between two variables.

r also determines the precision with which predictions can be made using the regression line ($r^2$ = coefficient of determination).

The value of r is not affected by the choice of x or y: interchange all x and y values and the value of r will not change.

r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear.

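$$r = \frac{\text{Cov}_{XY}}{s_X s_Y} = \frac{\text{Covariance}}{(\text{S.D. of } X)(\text{S.D. of } Y)}$$

where

$$s_X = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \;\; (\text{S.D. of } X), \qquad s_Y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n - 1}} \;\; (\text{S.D. of } Y)$$

$$\text{Cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1} \;\; (\text{Covariance})$$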
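A minimal Python sketch of $r = \text{Cov}_{XY} / (s_X s_Y)$, assuming n − 1 in every divisor as above. The function name and the four-point data set are hypothetical; the last line checks the property that swapping X and Y leaves r unchanged.

```python
import math


def pearson_r(x, y):
    """Pearson r = Cov_XY / (s_X * s_Y), using n - 1 in every divisor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return cov / (s_x * s_y)


x = [1, 2, 3, 4]
y = [2, 4, 5, 9]  # hypothetical data
print(round(pearson_r(x, y), 4))           # 0.9648
print(pearson_r(x, y) == pearson_r(y, x))  # True: swapping X and Y leaves r unchanged
```

The n − 1 divisors cancel between numerator and denominator, so using n throughout would give the same r.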

Sign: r is a measure of the extent to which paired scores occupy the same (+) or opposite (−) positions within their own distributions.

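$$z_X = \frac{X - \bar{X}}{s_X}, \qquad z_Y = \frac{Y - \bar{Y}}{s_Y}$$

$$r = \frac{\text{Cov}_{XY}}{s_X s_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$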

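The sign of r is determined by the covariance, that is, by the numerator.
+ : the two variables move in the same direction (paired z scores share signs: $+z_X$ with $+z_Y$, $-z_X$ with $-z_Y$).
− : the two variables move in opposite directions (paired z scores have opposite signs: $+z_X$ with $-z_Y$, $-z_X$ with $+z_Y$).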

RAW SCORES

Subject | X | Y
1 | 60 | 16
2 | 45 | 13
3 | 40 | 12
4 | 20 | 8
5 | 10 | 6

Z SCORES

Subject | $z_X$ | $z_Y$
1 | 1.25 | 1.25
2 | 0.50 | 0.50
3 | 0.25 | 0.25
4 | −0.75 | −0.75
5 | −1.25 | −1.25
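Multiplying paired z scores makes the sign rule concrete, and dividing their sum by n − 1 gives r itself: $r = \sum z_X z_Y / (n - 1)$ when the z scores use the n − 1 standard deviation. The Python sketch below (helper names are hypothetical) recomputes the Z SCORES table from the raw scores and confirms that r = 1 for these five subjects, since every pair of z scores is identical.

```python
import math


def z_scores(values):
    """z = (value - mean) / s, where s is the sample S.D. (divisor n - 1)."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / s for v in values]


def r_from_z(x, y):
    """r as the sum of z_X * z_Y products divided by n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(p * q for p, q in zip(zx, zy)) / (len(x) - 1)


x = [60, 45, 40, 20, 10]  # raw X scores, subjects 1-5
y = [16, 13, 12, 8, 6]    # raw Y scores, subjects 1-5
print(z_scores(x))    # [1.25, 0.5, 0.25, -0.75, -1.25]
print(z_scores(y))    # [1.25, 0.5, 0.25, -0.75, -1.25]
print(r_from_z(x, y)) # 1.0
```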

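$$b = \frac{\text{Cov}_{XY}}{s_X^2}, \qquad r = \frac{\text{Cov}_{XY}}{s_X s_Y}, \qquad \text{Cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}$$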
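Dividing the two formulas for b and r shows why they must share a sign: both have $\text{Cov}_{XY}$ in the numerator, and the standard deviations are positive whenever X and Y vary. When $\text{Cov}_{XY} \neq 0$,

$$\frac{b}{r} = \frac{\text{Cov}_{XY} / s_X^2}{\text{Cov}_{XY} / (s_X s_Y)} = \frac{s_Y}{s_X} \quad\Longrightarrow\quad b = r \, \frac{s_Y}{s_X}$$

and when $\text{Cov}_{XY} = 0$, both b and r are 0.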

b and r will have the same sign.

1. The magnitude of r ranges between 0 and 1.
2. The sign of r is either positive or negative. Thus, r can take any value between −1 and 1.
3. Generally, regardless of sign, the degree of linear relationship is:
   0 < |r| ≤ .3: weak
   .3 < |r| ≤ .55: slight
   .55 < |r| ≤ .8: moderate
   .8 < |r| ≤ 1: strong
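A small Python helper that applies these cutoffs is sketched below. The function name is hypothetical, and treating each upper bound as inclusive (so |r| = .3 counts as weak) is an assumption.

```python
def describe_r(r):
    """Return (strength, direction) for a correlation coefficient r.

    Cutoffs follow the list in these notes; treating each upper bound
    as inclusive is an assumption.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return ("no linear relationship", "none")
    if size <= 0.3:
        strength = "weak"
    elif size <= 0.55:
        strength = "slight"
    elif size <= 0.8:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"
    return (strength, direction)


print(describe_r(0.5189))  # ('slight', 'positive')
print(describe_r(-0.9))    # ('strong', 'negative')
```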

Direction of relationship

• A correlation coefficient indicates the direction of the relationship by the positive or negative sign of the coefficient.

• A positive r indicates:
  – a positive (direct) relationship between variables X and Y;
  – as the scores on variable X increase, the scores on variable Y tend to increase.

• A negative r indicates:
  – a negative (inverse) relationship between variables X and Y;
  – as the scores on variable X increase, the scores on variable Y tend to decrease.

(Figure: scatterplots of Y on X for r = 1, 0.9, 0.5 and r = −1, −0.9, −0.5; when r = 1 or r = −1 there is no error in predictions.)

Example: Most of us have heard that tall people generally have larger feet than short people. Is that really true and, if so, what is the relationship between height and foot length? To examine this, Professor Dennis Young obtained data on shoe size and height for a sample of students at Carleton University.

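Size (X) | $X - \bar{X}$ | $(X - \bar{X})^2$ | Height (Y) | $Y - \bar{Y}$ | $(Y - \bar{Y})^2$ | $(X - \bar{X})(Y - \bar{Y})$
10.5 | −0.46 | 0.2156 | 70 | −1.46 | 2.1441 | 0.6798
13 | 2.04 | 4.1441 | 72 | 0.54 | 0.2870 | 1.0906
10.5 | −0.46 | 0.2156 | 74.5 | 3.04 | 9.2156 | −1.4094
12 | 1.04 | 1.0727 | 71 | −0.46 | 0.2156 | −0.4809
10.5 | −0.46 | 0.2156 | 71 | −0.46 | 0.2156 | 0.2156
13 | 2.04 | 4.1441 | 77 | 5.54 | 30.6441 | 11.2691
11.5 | 0.54 | 0.2870 | 72 | 0.54 | 0.2870 | 0.2870
10 | −0.96 | 0.9298 | 72 | 0.54 | 0.2870 | −0.5166
8.5 | −2.46 | 6.0727 | 67 | −4.46 | 19.9298 | 11.0013
10.5 | −0.46 | 0.2156 | 73 | 1.54 | 2.3584 | −0.7130
10.5 | −0.46 | 0.2156 | 72 | 0.54 | 0.2870 | −0.2487
11 | 0.04 | 0.0013 | 70 | −1.46 | 2.1441 | −0.0523
9 | −1.96 | 3.8584 | 69 | −2.46 | 6.0727 | 4.8406
13 | 2.04 | 4.1441 | 70 | −1.46 | 2.1441 | −2.9809

$\bar{X} = 10.964286$, $\sum (X - \bar{X})^2 = 25.732$

$\bar{Y} = 71.4643$, $\sum (Y - \bar{Y})^2 = 76.232$

$\sum (X - \bar{X})(Y - \bar{Y}) = 22.982$

$$s_X = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{25.732}{13}} = 1.407$$

$$s_Y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n - 1}} = \sqrt{\frac{76.232}{13}} = 2.422$$

$$\text{Cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1} = \frac{22.982}{13} = 1.768$$

$$r = \frac{\text{Cov}_{XY}}{s_X s_Y} = \frac{1.768}{(1.407)(2.422)} = 0.519$$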
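The shoe-size calculation can be checked with a short Python sketch, assuming the 14 (size, height) pairs from the table above and n − 1 divisors as in the hand calculation. Carrying full precision gives r ≈ 0.5189, matching the hand result to three decimals.

```python
import math

# Shoe size (X) and height (Y) for the 14 students in the example
size = [10.5, 13, 10.5, 12, 10.5, 13, 11.5, 10, 8.5, 10.5, 10.5, 11, 9, 13]
height = [70, 72, 74.5, 71, 71, 77, 72, 72, 67, 73, 72, 70, 69, 70]

n = len(size)
x_bar = sum(size) / n
y_bar = sum(height) / n

ss_x = sum((x - x_bar) ** 2 for x in size)                         # sum of (X - X_bar)^2
ss_y = sum((y - y_bar) ** 2 for y in height)                       # sum of (Y - Y_bar)^2
sp = sum((x - x_bar) * (y - y_bar) for x, y in zip(size, height))  # sum of cross products

s_x = math.sqrt(ss_x / (n - 1))
s_y = math.sqrt(ss_y / (n - 1))
cov = sp / (n - 1)
r = cov / (s_x * s_y)

print(f"SS_X = {ss_x:.3f}, SS_Y = {ss_y:.3f}, SP = {sp:.3f}")  # SS_X = 25.732, SS_Y = 76.232, SP = 22.982
print(f"s_X = {s_x:.3f}, s_Y = {s_y:.3f}, Cov = {cov:.3f}")    # s_X = 1.407, s_Y = 2.422, Cov = 1.768
print(f"r = {r:.4f}")                                          # r = 0.5189
```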

Exercise 14: see WebCT, exercise folder.

Textbook exercises: 9.2, 9.3, 9.10, 9.11, 9.13, 9.15. When appropriate, verify your answers with SPSS. Get the data for SPSS from WebCT: SPSS folder, SPSS exercises subfolder.

Readings to prepare for week 3, January 17-22

Chapter 9

Sections: 9.7, 9.8, 9.10, 9.11

SPSS assignment # 2 due next week
