Business Statistics for Managerial Decision Making Examining Relationships


Page 1: Business Statistics for Managerial Decision Making

Business Statistics for Managerial Decision Making

Examining Relationships

Page 2: Business Statistics for Managerial Decision Making

Examining Relationships

To study the relationship between two variables, we measure both variables on the same individuals.

Often we think that one of the variables explains or influences the other.

A response variable measures an outcome of a study.

An explanatory variable explains or influences changes in a response variable.

Page 3: Business Statistics for Managerial Decision Making

Scatter plot

A scatter plot shows the relationship between two quantitative variables measured on the same individuals.

The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis.

Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.

Page 4: Business Statistics for Managerial Decision Making

Example

Scatter plot of the gross sales for each day in April 2000 against the number of items sold on the same day at the Duck Worth Wearing company.

The dotted lines intersect at the point (72, 594), the data for April 22, 2000.

Page 5: Business Statistics for Managerial Decision Making

Examining a Scatter Plot

In any graph of data, look for the overall pattern and for striking deviations from the pattern.

You can describe the overall pattern of a scatter plot by the form, direction, and strength of the relationship.

Page 6: Business Statistics for Managerial Decision Making

Positive Association, Negative Association

Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together.

Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other, and vice versa.

Page 7: Business Statistics for Managerial Decision Making

Example

City and highway fuel consumption for 2002 model two-seater cars.

There is one unusual observation.

Describe the pattern of the relationship between city and highway mileage.

Page 8: Business Statistics for Managerial Decision Making

Example

Scatter plot of life expectancy against domestic product per person for all the nations for which data are available.

Describe the form, direction, and strength of the overall pattern.

The three African nations marked on the graph are outliers.

Page 9: Business Statistics for Managerial Decision Making

Correlation

The correlation measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as r.

Suppose we have data on variables x and y for n individuals. The values for the first individual are x1 and y1, the values for the second individual are x2 and y2, and so on.

The means and standard deviations of the two variables are $\bar{x}$ and $s_x$ for the x-values, and $\bar{y}$ and $s_y$ for the y-values.

Page 10: Business Statistics for Managerial Decision Making

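Correlation

The correlation r between x and y is

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}
\quad \text{or} \quad
r = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{(n-1)\, s_x s_y}
$$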
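As a rough illustration of the two forms of the formula, the sketch below computes r both ways and checks that they agree. The function names and the five data points are hypothetical, chosen only to make the arithmetic easy to follow; they do not come from the slides.

```python
from math import sqrt

def summary(xs, ys):
    """Return n, the two means, and the two sample standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations use the divisor n - 1.
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return n, x_bar, y_bar, s_x, s_y

def correlation(xs, ys):
    """r from the sum of products of deviations from the means."""
    n, x_bar, y_bar, s_x, s_y = summary(xs, ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return numerator / ((n - 1) * s_x * s_y)

def correlation_shortcut(xs, ys):
    """r from the shortcut numerator: sum(x*y) - n * x_bar * y_bar."""
    n, x_bar, y_bar, s_x, s_y = summary(xs, ys)
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    return numerator / ((n - 1) * s_x * s_y)

# Hypothetical data, not from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_definition = correlation(xs, ys)
r_shortcut = correlation_shortcut(xs, ys)
print(round(r_definition, 4), round(r_shortcut, 4))
```

Both functions divide by the standard deviations, so they will fail if either variable is constant; in that case the correlation is undefined.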

Page 11: Business Statistics for Managerial Decision Making

Facts about Correlation

1. Correlation makes no distinction between explanatory and response variables.

2. Correlation requires that both variables be quantitative.

3. r does not change when we change the units of measurement of x, y, or both.

4. Positive r indicates positive linear association between variables, and negative r indicates negative linear association.

5. The correlation r is always a number between -1 and 1.

6. Correlation measures the strength of only a linear relationship between two variables.
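Several of these facts can be checked numerically. The sketch below uses hypothetical height and weight values (not from the slides) to illustrate facts 1, 3, and 5, and a small curved data set to illustrate fact 6.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r computed from deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return numerator / ((n - 1) * s_x * s_y)

# Hypothetical data: heights in inches, weights in pounds.
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [115, 120, 138, 150, 160, 175]

r = correlation(height_in, weight_lb)

# Fact 1: swapping explanatory and response variables leaves r unchanged.
r_swapped = correlation(weight_lb, height_in)

# Fact 3: changing units (inches to cm, pounds to kg) leaves r unchanged.
height_cm = [2.54 * h for h in height_in]
weight_kg = [0.45359237 * w for w in weight_lb]
r_metric = correlation(height_cm, weight_kg)

# Fact 6: y = x^2 is a perfect relationship, but it is not linear,
# and on these symmetric x-values the correlation is zero.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curved = correlation(curve_x, curve_y)

print(r, r_swapped, r_metric, r_curved)
```

The curved example is the reason a scatter plot should be examined before r is interpreted: r near zero rules out a linear pattern, not a relationship.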

Page 12: Business Statistics for Managerial Decision Making

Correlation

Page 13: Business Statistics for Managerial Decision Making

Least Squares Regression

Correlation measures the direction and strength of the linear relationship between two quantitative variables.

If the scatter plot shows a linear relationship, we can summarize this overall pattern by drawing a line on the scatter plot.

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.

We use a regression line to predict the value of y for a given value of x.

Page 14: Business Statistics for Managerial Decision Making

Least Squares Regression

This is a scatter plot of the natural gas consumption for the Sanchez family. Outside temperature, x, is measured by heating degree-days in a month. The average amount of natural gas that the family uses per day during the month is y.

Page 15: Business Statistics for Managerial Decision Making

Least Squares Regression

How do we draw the least squares regression line?

Different people might draw different lines by eye on a scatter plot.

Page 16: Business Statistics for Managerial Decision Making

Least Squares Regression

The difference between the observed value and our prediction based on the least squares regression line is the error of our prediction.

This error is also called the residual:

Residual = observed y - predicted y

$$
e = y - \hat{y}
$$

Page 17: Business Statistics for Managerial Decision Making

Least Squares Regression

The least squares regression line of y on x is the line that makes the sum of the squares of the errors or residuals (vertical distances of the data points from the line) as small as possible.

Given the data on the explanatory variable, x, and the response variable, y, we can find the equation of the line with the smallest sum of squared residuals.

Page 18: Business Statistics for Managerial Decision Making

Least Squares Regression

The least squares regression line is:

$$
\hat{y} = a + bx
$$

where the slope is:

$$
b = r\,\frac{s_y}{s_x}
$$

and the intercept is:

$$
a = \bar{y} - b\,\bar{x}
$$
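The slope and intercept formulas translate directly into code. The sketch below is a minimal illustration under stated assumptions: the function names and the five data points are hypothetical, not from the slides. It also compares the sum of squared residuals of the fitted line with that of a few nearby lines, to show what "least squares" means in practice.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for y_hat = a + b*x, using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, not from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
sse_best = sum_squared_residuals(xs, ys, a, b)

# Nudging the intercept or slope in either direction increases the sum.
sse_nearby = [
    sum_squared_residuals(xs, ys, a + da, b + db)
    for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
]
print(a, b, sse_best, sse_nearby)
```

For these points the fitted line is approximately y_hat = 2.2 + 0.6x, and every perturbed line has a larger sum of squared residuals.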

Page 19: Business Statistics for Managerial Decision Making

Example

This table gives data on declines of at least 10% in the Standard & Poor's 500-stock index between 1940 and 1999. The data show how far the index fell from its peak (decline, in percent) and how long the decline in stock prices lasted (duration, in months).

Decline (%)   Duration (months)
42            28
27            5
14            1
15            8
10            1
22            15
14            15
26            6
22            8
36            18
48            21
26            19
14            10
34            3
20            3

Page 20: Business Statistics for Managerial Decision Making

Example

Scatter plot of percent decline versus duration in months of the bear markets.

Is there a linear association?

Is the association positive or negative?

Page 21: Business Statistics for Managerial Decision Making

Example

Calculation shows that the mean and standard deviation of the durations are:

$$
\bar{x} = 10.73, \quad s_x = 8.20
$$

For the declines they are:

$$
\bar{y} = 24.67, \quad s_y = 11.20
$$

The correlation between duration and decline is: r = 0.6285

Find the equation of the least-squares line for predicting decline from duration.

One bear market has a duration of 15 months but a very low decline of 14%. What is the predicted decline for a bear market with duration of 15 months? What is the residual for this particular bear market?
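One way to work the exercise, taking the summary statistics on this slide to be a mean of 10.73 and standard deviation of 8.20 for the durations, a mean of 24.67 and standard deviation of 11.20 for the declines, and r = 0.6285. The intermediate values are rounded, so the final digits are approximate.

$$
\begin{aligned}
b &= r\,\frac{s_y}{s_x} = 0.6285 \times \frac{11.20}{8.20} \approx 0.858 \\
a &= \bar{y} - b\,\bar{x} \approx 24.67 - 0.858 \times 10.73 \approx 15.46 \\
\hat{y} &= a + b \times 15 \approx 15.46 + 0.858 \times 15 \approx 28.3 \\
e &= y - \hat{y} \approx 14 - 28.3 = -14.3
\end{aligned}
$$

So the line predicts a decline of roughly 28% for a 15-month bear market, and the observed 14% decline falls about 14 percentage points below that prediction.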

Page 22: Business Statistics for Managerial Decision Making

Residual plots

Recall: a residual is the difference between an observed value of the response variable and the value predicted by the regression line.

Residual = observed y - predicted y

$$
e = y - \hat{y}
$$

A residual plot is a scatter plot of the regression residuals against the explanatory variable x.

Residual plots help us assess the fit of a regression line. The mean of the least-squares residuals is always zero.
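The sketch below computes the residuals for the bear-market example and confirms that their mean is zero up to rounding error. The two lists are a reading of the flattened Decline/Duration table on the earlier slide and should be treated as an assumption; they do reproduce the means and correlation quoted in the example. Variable names are illustrative.

```python
from statistics import mean, stdev

# Assumed reading of the Decline/Duration table (15 bear markets).
duration = [28, 5, 1, 8, 1, 15, 15, 6, 8, 18, 21, 19, 10, 3, 3]          # months (x)
decline = [42, 27, 14, 15, 10, 22, 14, 26, 22, 36, 48, 26, 14, 34, 20]   # percent (y)

n = len(duration)
x_bar, y_bar = mean(duration), mean(decline)
s_x, s_y = stdev(duration), stdev(decline)

# Correlation, then slope and intercept of the least squares line.
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(duration, decline)) / ((n - 1) * s_x * s_y)
b = r * s_y / s_x
a = y_bar - b * x_bar

# Residual = observed y - predicted y, one per bear market.
residuals = [y - (a + b * x) for x, y in zip(duration, decline)]
mean_residual = mean(residuals)

# Points for a residual plot: residual against the explanatory variable.
residual_plot_points = list(zip(duration, residuals))

print(round(r, 4), round(a, 2), round(b, 3), mean_residual)
```

Plotting `residual_plot_points` with any charting tool gives the residual plot described above; the horizontal line at zero then plays the role of the regression line.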

Page 23: Business Statistics for Managerial Decision Making

Residual plots

The residuals should have no systematic pattern.

The residual plot to the right shows a scatter of the points with no unusual individual observations or systematic change as x increases.

[Figure: Degree Days Residual Plot. Residuals (vertical axis, -1 to 1) against degree days (horizontal axis, 0 to 60).]

Page 24: Business Statistics for Managerial Decision Making

Residual plots

The points in this residual plot have a curved pattern, so a straight line fits poorly.

Page 25: Business Statistics for Managerial Decision Making

Residual plots

The points in this plot show more spread for larger values of the explanatory variable x, so prediction will be less accurate when x is large.

Page 26: Business Statistics for Managerial Decision Making

Residual plots

Influential observations

An observation is influential if removing it would markedly change the fitted line.

Points that are extreme in the x direction of a scatter plot are often influential for the least squares regression line.

Observation 21 is an influential observation.

Page 27: Business Statistics for Managerial Decision Making

Residual plots

The solid line is calculated from all the data. The dashed line is calculated leaving observation 21 out.

Observation 21 is an influential observation since leaving it out moves the regression line quite a bit.
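The same comparison can be sketched in code by fitting the line twice, once with all the points and once with the suspect point removed. The data here are hypothetical (the slide's own data set is not reproduced in the text): five points close to a line with slope 2, plus one point that is extreme in the x direction.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # algebraically the same as r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data; the last point is far from the others in x.
xs = [1, 2, 3, 4, 5, 20]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

a_all, b_all = fit_line(xs, ys)                    # "solid line": all observations
a_without, b_without = fit_line(xs[:-1], ys[:-1])  # "dashed line": extreme point left out

print(round(b_all, 3), round(b_without, 3))
```

With these values the slope drops from about 2 to well under 1 once the extreme point is included, which is the kind of shift that marks an observation as influential.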

Page 28: Business Statistics for Managerial Decision Making

Residual plots

Outlier

An observation that falls outside the overall pattern of the observations is called an outlier.

Points that are outliers in the y direction of a scatter plot have large regression residuals.

Page 29: Business Statistics for Managerial Decision Making

Caution about Correlation and Regression

Extrapolation

Extrapolation is the use of a regression line for prediction far outside the range of values of the explanatory variable x that you used to obtain the line.

These predictions are often not accurate.

Lurking variable

A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.

Page 30: Business Statistics for Managerial Decision Making

Caution about Correlation and Regression

Association is not causation

An association between an explanatory variable x and a response variable y, even if it is strong, is not by itself good evidence that changes in x actually cause changes in y.

Example: There is a high positive correlation between the number of television sets per person (x) and the average life expectancy (y) for the world’s nations. Could we lengthen the lives of people in Rwanda by shipping them TV sets?

The best way to get evidence that x causes y is to do an experiment in which we change x and keep lurking variables under control.

Page 31: Business Statistics for Managerial Decision Making

Relations in Categorical Data

Two-way tables

When two categorical variables are studied, each with several levels, the data can be organized in a two-way table.

For example, let's look at the relation between the payment method (cash, check, credit card) and the type of purchase (impulse purchase, planned purchase).

Page 32: Business Statistics for Managerial Decision Making

Relations in Categorical Data

                   Payment method
                   Cash   Check   Credit card   Total
Impulse purchase   14     4       13            31
Planned purchase   20     11      35            66
Total              34     15      48            97

Page 33: Business Statistics for Managerial Decision Making

Relations in Categorical Data

Marginal distribution

We can look at each categorical variable separately in a two-way table by studying the row totals and the column totals. They represent the marginal distributions, expressed in counts or percentages.

Example: 31/97 = 31.96% were impulse shoppers. 48/97 = 49.5% paid by credit card.
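The marginal distributions can be computed from the cell counts alone. The sketch below is one possible way to do it; the counts are those in the two-way table, while the dictionary layout and variable names are illustrative choices.

```python
# Cell counts from the two-way table (rows: type of purchase, columns: payment method).
counts = {
    "Impulse purchase": {"Cash": 14, "Check": 4, "Credit card": 13},
    "Planned purchase": {"Cash": 20, "Check": 11, "Credit card": 35},
}

# Row totals and column totals.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {}
for cells in counts.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

grand_total = sum(row_totals.values())

# Marginal distributions as proportions of the grand total.
row_marginal = {row: total / grand_total for row, total in row_totals.items()}
col_marginal = {col: total / grand_total for col, total in col_totals.items()}

for row, p in row_marginal.items():
    print(f"{row}: {100 * p:.2f}%")
for col, p in col_marginal.items():
    print(f"{col}: {100 * p:.2f}%")
```

Each marginal distribution sums to 100%, which is a quick check that the totals were entered correctly.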

Page 34: Business Statistics for Managerial Decision Making

Relations in Categorical Data

The marginal distributions summarize each categorical variable independently. But the two-way table actually describes the relationship between both categorical variables.

The cells of a two-way table represent the intersection of a given level of one categorical factor with a given level of the other categorical factor.

Page 35: Business Statistics for Managerial Decision Making

Relations in Categorical Data

For example:

35/97 = 36.1% of shoppers made a planned purchase and paid by credit card.

What percentage made an impulse purchase and paid cash?
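Cell percentages of this kind are each cell count divided by the grand total. A minimal sketch, using the counts from the two-way table and an illustrative tuple-keyed dictionary:

```python
# Cell counts keyed by (type of purchase, payment method), from the two-way table.
cell_counts = {
    ("Impulse", "Cash"): 14, ("Impulse", "Check"): 4, ("Impulse", "Credit card"): 13,
    ("Planned", "Cash"): 20, ("Planned", "Check"): 11, ("Planned", "Credit card"): 35,
}

grand_total = sum(cell_counts.values())

# Percentage of all shoppers falling in each cell.
cell_percent = {cell: 100 * count / grand_total for cell, count in cell_counts.items()}

planned_credit = cell_percent[("Planned", "Credit card")]
impulse_cash = cell_percent[("Impulse", "Cash")]
print(f"{planned_credit:.1f}% planned and credit card, {impulse_cash:.1f}% impulse and cash")
```

The six cell percentages add up to 100%, since every shopper falls in exactly one cell.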

Page 36: Business Statistics for Managerial Decision Making

Relations in Categorical Data

The percents of the row variable within one specific value of the column variable represent the conditional distribution of the row variable. Comparing the conditional distributions allows you to describe the relationship between the two categorical variables.

Example: Among those who used a credit card, 13/48 = 27.1% of the purchases were on impulse, and 35/48 = 72.9% of the purchases were planned.

Page 37: Business Statistics for Managerial Decision Making

Relations in Categorical Data

Similarly, the percents of the column variable within one specific value of the row variable represent the conditional distribution of the column variable.

Example: Among those who purchased on impulse, 13/31 = 41.9% paid by credit card, 4/31 = 12.9% paid by check, and 14/31 = 45.2% paid cash.
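Both kinds of conditional distribution come from dividing a cell count by the total of the row or column being conditioned on, rather than by the grand total. The sketch below shows one way to compute them; the counts are from the two-way table, and the function names are hypothetical.

```python
# Cell counts from the two-way table (rows: type of purchase, columns: payment method).
counts = {
    "Impulse purchase": {"Cash": 14, "Check": 4, "Credit card": 13},
    "Planned purchase": {"Cash": 20, "Check": 11, "Credit card": 35},
}

def purchase_type_given_payment(counts, payment):
    """Conditional distribution of the row variable for one column value."""
    column_total = sum(cells[payment] for cells in counts.values())
    return {row: cells[payment] / column_total for row, cells in counts.items()}

def payment_given_purchase_type(counts, purchase_type):
    """Conditional distribution of the column variable for one row value."""
    cells = counts[purchase_type]
    row_total = sum(cells.values())
    return {col: count / row_total for col, count in cells.items()}

given_credit = purchase_type_given_payment(counts, "Credit card")
given_impulse = payment_given_purchase_type(counts, "Impulse purchase")

for row, p in given_credit.items():
    print(f"Credit card users, {row}: {100 * p:.1f}%")
for col, p in given_impulse.items():
    print(f"Impulse purchases, {col}: {100 * p:.1f}%")
```

Calling `purchase_type_given_payment` for each payment method in turn and comparing the results is the comparison of conditional distributions described on the previous slide.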