Modeling a Linear Relationship
Lecture 44
Secs. 13.1 – 13.3.1
Tue, Apr 24, 2007


Bivariate Data

Data are called bivariate if each observation consists of a pair of values (x, y).

x is the explanatory variable, also called the independent variable. y is the response variable, also called the dependent variable.

Scatterplots

Scatterplot – A display in which each observation (x, y) is plotted as a point in the xy plane.

Example

Draw a scatterplot of the following data on cholesterol vs. calories in Subway sandwiches.

Calories (x):    350 290 330 290 320 370 280 290 310 230
Cholesterol (y):  50  20  45  15  35  50  20  25  20   0
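For readers who want to reproduce this plot off the calculator, here is a minimal Python sketch (assuming the matplotlib library, which is not part of the lecture):

    import matplotlib.pyplot as plt

    # Subway sandwich data from the slides
    calories = [350, 290, 330, 290, 320, 370, 280, 290, 310, 230]
    cholesterol = [50, 20, 45, 15, 35, 50, 20, 25, 20, 0]

    # Plot each (x, y) observation as a point in the xy plane
    plt.scatter(calories, cholesterol)
    plt.xlabel("Calories (x)")
    plt.ylabel("Cholesterol (y)")
    plt.show()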

Example

[Scatterplot: Cholesterol (0–50) vs. Calories (200–400) for the Subway data.]

Example

Does there appear to be a relationship? How can we tell?

TI-83 - Scatterplots

To set up a scatterplot:
Enter the x values in L1.
Enter the y values in L2.
Press 2nd STAT PLOT.
Select Plot1 and press ENTER.

TI-83 - Scatterplots

The Stat Plot display appears.
Select On and press ENTER.
Under Type, select the first icon (a small image of a scatterplot) and press ENTER.
For XList, enter L1.
For YList, enter L2.
For Mark, select the one you want and press ENTER.

TI-83 - Scatterplots

To draw the scatterplot:
Press ZOOM. The Zoom menu appears.
Select ZoomStat (#9) and press ENTER. The scatterplot appears.
Press TRACE and use the arrow keys to inspect the individual points.

Describing a Linear Relationship

How would we describe this relationship?

[Scatterplot: the Subway data again, Cholesterol (0–50) vs. Calories (200–400).]

Linear Association

Draw (or imagine) an oval around the data set.
If the oval is tilted, then there is some linear association.
If the oval is tilted upwards from left to right, then there is positive association.
If the oval is tilted downwards from left to right, then there is negative association.
If the oval is not tilted at all, then there is no association.
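To see how these patterns arise, here is an illustrative simulation (not from the lecture; the slopes and the noise level are arbitrary choices) that generates data with each kind of association. Plotting each y-list against x as in the earlier sketch shows the corresponding tilted or untilted oval:

    import random

    random.seed(1)
    x = [random.uniform(0, 10) for _ in range(50)]

    # Positive association: y tends to rise with x (oval tilted upwards)
    y_pos = [2 * xi + random.gauss(0, 3) for xi in x]
    # Negative association: y tends to fall as x rises (oval tilted downwards)
    y_neg = [-2 * xi + random.gauss(0, 3) for xi in x]
    # No association: y ignores x entirely (untilted oval)
    y_none = [random.gauss(0, 3) for _ in x]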

Positive Linear Association

[Scatterplot: oval tilted upwards from left to right.]

Negative Linear Association

[Scatterplot: oval tilted downwards from left to right.]

No Linear Association

[Scatterplot: untilted oval.]

Strong vs. Weak Association

The association is strong if the oval is narrow.

The association is weak if the oval is wide.

Strong Positive Linear Association

[Scatterplot: narrow upward-tilted oval.]

Weak Positive Linear Association

[Scatterplot: wide upward-tilted oval.]

Example

[Scatterplot: the Subway data, showing a narrow, upward-tilted cloud of points.]

Describing the Relationship

There appears to be a strong positive linear association between calories and cholesterol in Subway sandwiches.

Example

Draw a scatterplot of the following data.

x y

2 3

3 5

5 9

6 12

9 16

Simple Linear Regression

To quantify the linear relationship between x and y, we wish to find the equation of the line that “best” fits the data.

Typically, there will be many lines that all look pretty good.

How do we measure how well a line fits the data?

Measuring the Goodness of Fit

Which line better fits the data?

[Scatterplot: the same data with several candidate lines drawn through it.]

Measuring the Goodness of Fit

Start with the scatterplot. Draw any line through the scatterplot. Measure the vertical distance from every point to the line. Each of these distances represents a deviation, called a residual, from the line.

[Scatterplot: a line through the data, with a vertical segment (labeled e) from each point to the line.]

Residuals

The ith residual – The difference between the observed value of yi and the predicted, or expected, value of yi.

Use yi^ for the predicted yi.

The formula for the ith residual is

ei = yi – yi^.

Residuals

Notice that the residual is positive if the data point is above the line and it is negative if the data point is below the line.
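As a quick check of this sign convention, here is a small Python sketch (using the five-point data set and the candidate line y^ = -1 + 2x that appear in the example below):

    xs = [2, 3, 5, 6, 9]
    ys = [3, 5, 9, 12, 16]

    # Predicted values from the candidate line y^ = -1 + 2x
    y_hat = [-1 + 2 * x for x in xs]

    # Residuals e_i = y_i - y_i^: positive above the line, negative below
    residuals = [y - yh for y, yh in zip(ys, y_hat)]
    print(residuals)  # [0, 0, 0, 1, -1]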

Measuring the Goodness of Fit

The ith residual.

[Scatterplot: at xi, the residual ei is the vertical gap between the observed yi and the predicted yi^.]

Find the sum of the squared residuals. The smaller the sum of squared residuals, the better the fit.

Example

Consider the data points

x y

2 3

3 5

5 9

6 12

9 16

Example

[Scatterplot of the five data points.]

Least Squares Line

Let’s see how good the fit is for the line

y^ = -1 + 2x,

where y^ represents the predicted value of y, not the observed value.

Sum of Squared Residuals

Begin with the data set.

x y

2 3

3 5

5 9

6 12

9 16

Sum of Squared Residuals

Compute the predicted y, using y^ = -1 + 2x.

x y y^

2 3 3

3 5 5

5 9 9

6 12 11

9 16 17

Sum of Squared Residuals

Compute the residuals, y – y^.

x y y^ y – y^

2 3 3 0

3 5 5 0

5 9 9 0

6 12 11 1

9 16 17 -1

Sum of Squared Residuals

Square the residuals.

x y y^ y – y^ (y – y^)²

2 3 3 0 0

3 5 5 0 0

5 9 9 0 0

6 12 11 1 1

9 16 17 -1 1

Sum of Squared Residuals

Find the sum of the squared residuals.

x y y^ y – y^ (y – y^)²

2 3 3 0 0

3 5 5 0 0

5 9 9 0 0

6 12 11 1 1

9 16 17 -1 1

SSE = Σ(y – y^)² = 2.00

Least Squares Line

Least squares line – The line for which the sum of the squares of the residuals is as small as possible.

The least squares line is also called the line of best fit or the regression line.

Regression Line

We will write the regression line as

y^ = a + bx,

where a is the y-intercept and b is the slope.

This is the usual slope-intercept form, y = mx + b, with the two terms rearranged and relabeled.

TI-83 – Computing Residuals

It is not hard to compute the residuals and the sum of their squares on the TI-83. (Later, we will see a faster method.)

Enter the x-values in list L1 and the y-values in list L2.
Compute a + b*L1 and store it in list L3 (the y^ values).
Compute (L2 – L3)². This is a list of the squared residuals.
Compute sum(Ans). This is the sum of the squared residuals.
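The same steps translate directly to Python; a minimal sketch, assuming the candidate line y^ = -1 + 2x from above:

    a, b = -1, 2              # coefficients of the candidate line
    L1 = [2, 3, 5, 6, 9]      # x-values (list L1 on the calculator)
    L2 = [3, 5, 9, 12, 16]    # y-values (list L2)

    L3 = [a + b * x for x in L1]                        # predicted y^ values
    squared = [(y - yh) ** 2 for y, yh in zip(L2, L3)]  # squared residuals
    print(sum(squared))       # sum of the squared residuals: 2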

Sum of Squared Residuals

Now let’s see how good the fit is for the line y^ = -0.5 + 1.9x. We will compute the sum of squared residuals, SSE.

Sum of Squared Residuals

Begin with the data set.

x y

2 3

3 5

5 9

6 12

9 16

Sum of Squared Residuals

Compute the predicted y, using y^ = -0.5 + 1.9x.

x y y^

2 3 3.3

3 5 5.2

5 9 9.0

6 12 10.9

9 16 16.6

Sum of Squared Residuals

Compute the residuals, y – y^.

x y y^ y – y^

2 3 3.3 -0.3

3 5 5.2 -0.2

5 9 9.0 0.0

6 12 10.9 1.1

9 16 16.6 -0.6

Sum of Squared Residuals

Compute the squared residuals.

x y y^ y – y^ (y – y^)²

2 3 3.3 -0.3 0.09

3 5 5.2 -0.2 0.04

5 9 9.0 0.0 0.00

6 12 10.9 1.1 1.21

9 16 16.6 -0.6 0.36

Sum of Squared Residuals

Find the sum of the squared residuals.

x y y^ y – y^ (y – y^)²

2 3 3.3 -0.3 0.09

3 5 5.2 -0.2 0.04

5 9 9.0 0.0 0.00

6 12 10.9 1.1 1.21

9 16 16.6 -0.6 0.36

SSE = Σ(y – y^)² = 1.70

Sum of Squared Residuals

We conclude that y^ = -0.5 + 1.9x is a better fit than y^ = -1 + 2x.

Is it the best fit?
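Wrapping the computation in a function makes such comparisons easy; a sketch (the helper sse is ours, not the lecture's):

    xs = [2, 3, 5, 6, 9]
    ys = [3, 5, 9, 12, 16]

    def sse(a, b):
        """Sum of squared residuals for the line y^ = a + b*x."""
        return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

    print(sse(-1.0, 2.0))   # 2.0
    print(sse(-0.5, 1.9))   # about 1.70 -- the better fit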

Sum of Squared Residuals

[Scatterplot: the data with the line y^ = -1 + 2x drawn through it.]

Sum of Squared Residuals

[Scatterplot: the data with the line y^ = -0.5 + 1.9x drawn through it.]

Example

For all the lines that one could draw through this data set, it turns out that 1.70 is the smallest possible value for the sum of the squares of the residuals.

x y

2 3

3 5

5 9

6 12

9 16

Example

Therefore,

y^ = -0.5 + 1.9x

is the regression line for this data set.
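This can be checked numerically; for instance, NumPy's polyfit performs a least squares fit (a sketch, assuming NumPy is available):

    import numpy as np

    xs = [2, 3, 5, 6, 9]
    ys = [3, 5, 9, 12, 16]

    # A degree-1 polyfit returns the least squares [slope, intercept]
    b, a = np.polyfit(xs, ys, 1)
    print(a, b)   # approximately -0.5 and 1.9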

Prediction

Use the regression line to predict y when x = 4, x = 7, and x = 20.

Interpolation – Using an x value within the observed extremes of x values to predict y.

Extrapolation – Using an x value beyond the observed extremes of x values to predict y.

Interpolation vs. Extrapolation

Interpolated values are more reliable than extrapolated values.

The farther out the values are extrapolated, the less reliable they are.
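A short sketch of the predictions asked for above, flagging which ones extrapolate beyond the observed x range of 2 to 9:

    def predict(x):
        """Prediction from the regression line y^ = -0.5 + 1.9x."""
        return -0.5 + 1.9 * x

    for x in (4, 7, 20):
        kind = "interpolation" if 2 <= x <= 9 else "extrapolation"
        print(x, predict(x), kind)   # 7.1 and 12.8 interpolate; 37.5 extrapolates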
