cld.persiangig.comcld.persiangig.com/dl/8PxB5w/STS AMAR MBA 92.07 (1).pdf · Class Intervals and Class Boundaries If each class grouping has the same width Determine the width of

مار و احتمالمار و احتمالا ا

کارشناسی ارشدکارشناسی ارشد

MBAMBA دانشکده مدیریت دانشگاه تهراندانشکده مدیریت دانشگاه تهران

دکـتر عموزاددکـتر عموزاد: : مدرسمدرس

چارچوب و طبقه بندی کلی

مار و احتمال ا

مار )بررسی نمیشود(احتما.ت مار )بررسی نمیشود(احتما.ت ا ا

مار استنباطی تخمین / نمونه گیری (ا

زمون )و ا

مار ناپارامتریکمار پارامتریک ا

ا

مار توصیفی بررسی / سرشماری (ا

بررسی نمیشود - )جوامع

سرفصل های پیشنهادی درس

ردي

ف

توضيحاتتعداد جلساتموضوع

*1گيری-نمونه ھای- روش و آمار معرفی و مبانی1

*1 مرکزی حد قضيه – مطلوب آماره و پارامتر و آماره2

*3)موفقيت نسبت – واريانس – زوج – ميانگين( ای-فاصله تخمين3

*1نمونه اندازه محاسبه4

*3آن انواع و آماری فرض آزمون5

*1دو خی توزيع کاربردھای6

*1ھمبستگی و رگرسيون7

*SPSS1 افزار-نرم8

منابع پیشنهادی درس

انتشارات –مومنی منصور آذر، عادل –مديريت در آن کاربرد و آمار -1

.سمت

مرکز – اصل وحيدی قاسم محمد عميدی، علی –فروند جان –رياضی آمار -2

.دانشگاھی نشر

–قيومی فعال علی مومنی، منصور – افزار نرم از استفاده با آماری تحليل -3

.نو کتاب نشر

منابع پیشنهادی درس

فصل اولفصل اول

مفاهیم اولیهمفاهیم اولیه

Key Definitions

� A population is the collection of all items or

things under consideration –people or objects

� A sample is a portion of the population

selected for analysisselected for analysis

� A parameter is a summary measure that

describes a characteristic of the population

� A statistic is a summary measure computed

from a sample

Population vs. Sample

a b c d

Population Sample

b c

ef gh i jk l m n

o p q rs t u v w

x y z

g i n

o r u

y

Measures used to describe the population are called parameters

Measures computed from sample data are called statistics

Key Definitions

� A survey is the gathering of data about a

particular group of people or items

� A census is a survey of the entire population

� A sample is a survey of a portion of the

population

Two Branches of Statistics

� Descriptive statistics

� Collecting, summarizing, and describing data

� Inferential statistics� Inferential statistics

� Drawing conclusions and/or making decisions

concerning a population based only on sample

data

Descriptive Statistics

� Collect data

� e.g. Survey

Present data� Present data

� e.g. Tables and graphs

� Characterize data

� e.g. Sample mean = iX

n

∑

Inferential Statistics

� Estimation

� e.g.: Estimate the population

mean weight using the sample

mean weight

� Hypothesis testing

� e.g.: Test the claim that the

population mean weight is over

120 pounds

Drawing conclusions and/or making decisions concerning a population based on sample results.

Data Sources

SecondaryData Compilation

Print or Electronic

PrimaryData Collection

Observation

Experimentation

Print or Electronic

Survey

Types of Data

Data

Categorical(Qualitative)

Numerical(Quantitative) (Qualitative) (Quantitative)

Discrete Continuous

Examples:

� Marital Status

� Political Party� Eye Color

(Defined categories)Examples:

� Number of Children

� Defects per hour

(Counted items)

Examples:

� Weight

� Voltage

(Measured characteristics)

Levels of Measurementand Measurement Scales

Interval Data

Highest Level

Strongest forms of measurementDifferences between

measurements but no

Ratio DataDifferences between measurements, true zero exists

Interval Data

Ordinal Data

Nominal Data

Higher Level

Lowest Level

Weakest form of measurement

Categories (no ordering or direction)

Ordered Categories (rankings, order, or scaling)

measurements but no true zero

Tables and Charts forNumerical Data

Numerical Data

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 2-16

Ordered Array

Stem and Leaf

DisplayHistogram Polygon Ogive

Frequency Distributions and

Cumulative Distributions

� Data in raw form (as collected):

24, 26, 24, 21, 27, 27, 30, 41, 32, 38

(continued)

The Ordered Array


� Data in ordered array from smallest to largest:

21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Example

� Completed Stem-and-leaf diagram:

(continued)

Data in ordered array:21, 24, 24, 26, 27, 27, 30, 32, 38, 41


� Completed Stem-and-leaf diagram:

Stem Leaves

2 1 4 4 6 7 7

3 0 2 8

4 1

Class Intervals and Class Boundaries

� If each class grouping has the same width

� Determine the width of each interval by

range≅intervalofWidth


� Use at least 5 but no more than 15 groupings

� Class boundaries never overlap

� Round up the interval width to get desirable endpoints

groupingsclassdesiredofnumber≅intervalofWidth

Approximate Number of Classes

Observations Classes

Less than 50 3-6

50-200 6-9


� Excel uses the square root of n.

50-200 6-9

200-1000 8-12

More than 1000 10-15

Graphing Numerical Data: The Histogram

� A graph of the data in a frequency distribution

is called a histogram

� The class boundaries (or class midpoints)


are shown on the horizontal axis

� frequency is measured on the vertical axis

� Bars of the appropriate heights can be used to

represent the number of observations within

each class

Histogram : Daily High Tem perature

67

Histogram Example

Class

10 but less than 20 15 3




FrequencyClass

Midpoint


0

3

6

5

4

2

00

1

2

3

4

5

6

7

5 15 25 35 45 55 More

Fre

qu

en

cy

Class Midpoints

(No gaps between

bars)


Frequency Polygon: Daily High Temperature

7

Graphing Numerical Data: The Frequency Polygon

Class





FrequencyClass

Midpoint


0

1

2

3

4

5

6

7

5 15 25 35 45 55 More

Fre

qu

en

cy

Class Midpoints



More than 60 65 0

(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)

Graphing Cumulative Frequencies: The Ogive (Cumulative % Polygon)

Ogive: Daily High Temperature

100

Class

less than 10 10 0

less than 20 20 15

less than 30 30 45

less than 40 40 70

Cumulative Percentage

Lower class

boundary


0

20

40

60

80

100

10 20 30 40 50 60

Cu

mu

lati

ve

Pe

rce

nta

ge

Class Boundaries (Not Midpoints)

less than 50 50 90

less than 60 60 100

Scatter Diagram Example

Cost per Day vs. Production Volume

200

250C

os

t p

er

Da

y

Volume per day

Cost per day

23 125

26 140


0

50

100

150

200

0 10 20 30 40 50 60 70

Volume per Day

Co

st

pe

r D

ay

29 146

33 160

38 167

42 170

50 188

55 195

60 200

Tables and Charts for Categorical Data: Univariate Data

Categorical Data


Graphing Data

Pie Charts

Pareto Diagram

Bar Charts

Tabulating Data

Summary Table

Bar Chart Example

Investor's Portfolio

Investment Amount PercentageType (in thousands $) (%)

Stocks 46.5 42.27

Bonds 32.0 29.09

Current Investment Portfolio


Investor's Portfolio

0 10 20 30 40 50

Stocks

Bonds

CD

Savings

Amount in $1000's

CD 15.5 14.09

Savings 16.0 14.55

Total 110.0 100.0

Pie Chart Example


Savings

Investment Amount PercentageType (in thousands $) (%)

Stocks 46.5 42.27

Bonds 32.0 29.09

CD 15.5 14.09


Percentages are rounded to the nearest percent

15%

CD 14%

Bonds 29%

Stocks

42%

CD 15.5 14.09

Savings 16.0 14.55

Total 110.0 100.0

Pareto Diagram Examplecu

mu

lativ

e %

investe

d in

veste

d i

n e

ach

cate

go

ry

35%

40%

45%

70%

80%

90%

100%



cu

mu

lativ

e %

investe

d

(line g

rap

h)

% i

nveste

d i

n e

ach

cate

go

ry

(bar

gra

ph

)

0%

5%

10%

15%

20%

25%

30%

Stocks Bonds Savings CD

0%

10%

20%

30%

40%

50%

60%

70%

فصل دوم

شاخص های اولیه

Summary Measures

Arithmetic Mean

Describing Data Numerically

Range Skewness

Central Tendency Variation ShapeQuartiles

Arithmetic Mean

Median

Mode Variance

Standard Deviation

Coefficient of Variation

Range

Interquartile Range

Geometric Mean

Skewness

Measures of Central Tendency

Central Tendency

Arithmetic Mean Median Mode Geometric Mean

Overview

Arithmetic Mean Median Mode Geometric Mean

n

X

X

n

ii∑

== 1

n/1n21G )XXX(X ×××= L

Midpoint of ranked values

Most frequently observed value

(Arithmetic Mean)

n/1n21G )XXX(X ×××= L

Sample size

n

XXX

n

X

X n21

n

1ii +++

==∑

= L

Observed values

Arithmetic Mean

(continued)

0 1 2 3 4 5 6 7 8 9 10

Mean = 3

0 1 2 3 4 5 6 7 8 9 10

Mean = 4

35

15

5

54321==

++++4

5

20

5

104321==

++++

(Median)

0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10

Median = 3 Median = 3

(Mode)

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Mode = 9

0 1 2 3 4 5 6

No ModeMode = 9

No Mode

Review Example

$ 2 ,0 0 0 K

House Prices:

$2,000,000$ 5 0 0 K

$ 3 0 0 K

$ 1 0 0 K

$ 1 0 0 K

$2,000,000500,000300,000100,000100,000

(Quartiles)

25% 25% 25% 25%

� The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger

� Q2 is the same as the median (50% are smaller, 50% are larger)

� Only 25% of the observations are greater than the third quartile

Q1 Q2 Q3

Quartile Formulas

First quartile position: Q1 = (n+1)/4

Second quartile position: Q2 = (n+1)/2 (the median position)Second quartile position: Q2 = (n+1)/2 (the median position)

Third quartile position: Q3 = 3(n+1)/4

where n is the number of observed values

(Measures of Variation)

ความผนแปร

Variance Standard

Deviation

Coefficient of

Variation

Range Interquartile

Range

Same center,

different variation

Deviation VariationRange

(Range)

Range = Xlargest – Xsmallest

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Range = 14 - 1 = 13

7 8 9 10 11 12

Range = 12 - 7 = 5

7 8 9 10 11 12

Range = 12 - 7 = 5

Disadvantages of the Range

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 5 - 1 = 4

Range = 120 - 1 = 119

Interquartile Range

� Interquartile range = 3rd quartile – 1st quartile

= Q3 – Q1

Interquartile Range

Median(Q2)

XmaximumX

minimum Q1 Q3

Example:

25% 25% 25% 25%25% 25% 25% 25%

12 30 45 57 70

Interquartile range = 57 – 30 = 27

� Sample variance:

(Variance)

)X(Xn

2∑ −

1-n

)X(X

S 1i

2i

2∑

=

−

=

Where = arithmetic mean

n = sample size

Xi = ith value of the variable X

X

(Standard Deviation)

� Sample standard deviation:

1-n

)X(X

S

n

1i

2i∑

=

−

=

Calculation Example:

Sample Standard Deviation

Sample Data (Xi) : 10 12 14 15 17 18 18 24

n = 8 Mean = X = 16

)X(2 4)X(1 4)X(1 2)X(1 0 2222 −++−+−+− L

4 .2 4 2 67

1 2 6

18

1 6 )(2 41 6 )(1 41 6 )(1 21 6 )(1 0

1n

)X(2 4)X(1 4)X(1 2)X(1 0S

2222

2222

==

−

−++−+−+−=

−

−++−+−+−=

L

L

Measuring variation

Small standard deviation

Large standard deviation

Comparing Standard Deviations

Mean = 15.5

S = 3.33811 12 13 14 15 16 17 18 19 20 21

Data B

Data A

11 12 13 14 15 16 17 18 19 20 21

Data BMean = 15.5

S = .9258

11 12 13 14 15 16 17 18 19 20 21

Mean = 15.5

S = 4.57

Data C

(Coefficient of Variation, CoV)

1 0 0 %X

SC V ⋅

=

Comparing Coefficients

of Variation

� Stock A:

� Average price last year = $50

� Standard deviation = $5

1 0 %1 0 0 %$ 5

1 0 0 %S

C V =⋅=⋅

=

� Stock B:

� Average price last year = $100

� Standard deviation = $5

Both stocks have the same standard deviation, but stock B is less variable relative to its price

1 0 %1 0 0 %$ 5 0

1 0 0 %X

C V A =⋅=⋅

=

5 %1 0 0 %$ 1 0 0

$ 51 0 0 %

X

SC VB =⋅=⋅

=

(Shape of a Distribution)

� (skewed)

Mean = MedianMean < Median Median < Mean

Right-SkewedLeft-Skewed Symmetric

Population Summary Measures

XN

∑N

XXX

N

XN211i

i +++==µ

∑= L

µ = population mean

N = population size


Where

� squared deviations of values from the mean

� Population variance:

Population Variance

µ)(XN

2i∑ −

Nσ 1i

i2∑

==

Where µ = population mean

N = population size


Population Standard Deviation

� Population standard deviation:

N

µ)(X

σ

N

1i

2i∑

=

−

=

The Empirical Rule

1σµ ±

µ

68%

1σµ ±

� contains about 95% of the values in

the population or the sample

� contains about 99.7% of the values in the population or the sample

The Empirical Rule

2σµ ±

3σµ ±in the population or the sample

3σµ±

99.7%95%

2σµ±

Distribution Shape and Box and Whisker Plot

Right-SkewedLeft-Skewed Symmetric

Q1 Q2 Q3 Q1 Q2 Q3 Q1 Q2 Q3

Coefficient of Correlation

� Measures the relative strength of the linear relationship between two variables

� Sample coefficient of correlation:� Sample coefficient of correlation:

∑∑

∑

==

=

−−

−−

=n

i

i

n

i

i

n

i

ii

YYXX

YYXX

r

1

2

1

2

1

)()(

))((

Features of Correlation Coefficient, r

� Unit free

� Ranges between –1 and 1

� The closer to –1, the stronger the negative linear

relationshiprelationship

� The closer to 1, the stronger the positive linear

relationship

� The closer to 0, the weaker any linear relationship

Scatter Plots of Data with Various Correlation Coefficients

Y

X

Y

X

Y

XX X X

Y

X

Y

X

r = -1 r = -.6 r = 0

r = +.3r = +1

Y

Xr = 0

فصل سوم

ماری معرفی برخی توزیع های ا

Probability Distributions

ContinuousProbability


DiscreteProbability Probability

Distributions

Binomial

Hypergeometric

Poisson


Normal

Uniform

Exponential

The Normal Distribution



Normal

Uniform

Exponential


The Normal Distribution

� ‘Bell Shaped’

� Symmetrical

� Mean, Median and Modeare Equal

Location is determined by the

f(X)

σLocation is determined by the mean, µ

Spread is determined by the standard deviation, σ

The random variable has an infinite theoretical range: + ∞∞∞∞ to −−−− ∞∞∞∞

Mean = Median = Mode

Xµ

σ

Many Normal Distributions

By varying the parameters µ and σ, we obtain different normal distributions

The Normal Distribution Shape

f(X) Changing µ shifts the distribution left or right.

Xµ

σ

Changing σ increases or decreases the spread.

The Normal Probability Density Function

� The formula for the normal probability density function is

2µ)/σ](1/2)[(Xe1

f(X) −−=

Where e = the mathematical constant approximated by 2.71828

π = the mathematical constant approximated by 3.14159

µ = the population mean

σ = the population standard deviation

X = any value of the continuous variable

µ)/σ](1/2)[(Xe2π

1f(X) −−

σ=

The Standardized Normal

� Any normal distribution (with any mean and standard deviation combination) can be transformed into the standardized normaldistribution (Z)distribution (Z)

� Need to transform X units into Z units

Translation to the Standardized Normal Distribution

� Translate from X to the standardized normal (the “Z” distribution) by subtracting the meanof X and dividing by its standard deviation:

σ

µXZ

−=

Z always has mean = 0 and standard deviation = 1

The Standardized Normal Probability Density Function

� The formula for the standardized normal probability density function is

2(1/2)Ze1

f(Z) −=

Where e = the mathematical constant approximated by 2.71828

π = the mathematical constant approximated by 3.14159

Z = any value of the standardized normal distribution

(1/2)Ze2π

1f(Z) −=

The Mathematical Model

( )( )

( )

21

2

2

1

2

: density of random variable

X

f X e

f X X

µσ

πσ

2− −

=

( )

( )

: density of random variable

3.14159; 2.71828

: population mean

: population standard deviation

: value of random variable

f X X

e

X X

π

µ

σ

= =

−∞ < < ∞

Expectation

( )µµ σµ

σµ

σπ

+−−=

=

∫

∫∞

−−

∞

∞−

−−

)(

)(

22

22

2/)(

2/)(

2

1

xdex

dxxeXE

x

x

( )

µ

µ

µµ

σµ

σπ

σµ

σπ

+=

+−−=

∫

∫∞

∞−

−−

∞

∞−

−−

0

)(

22

22

2/)(

2

1

2/)(

2

1

dxe

xdex

x

x

Variance

( ) ( )µ σ

µ

σ

µ

σπσ

µ

)(2/)(2

2

122

=− ∫∞

∞−

−−−−

deXExx

x

( ) ( )

π

µ

π

σ

π

σ

σσσπ

2

)(

2

2/2

2

2

2

22

=

=

=−

∫

∫∞

∞−

−

∞−

dyey

deXE

y

The StandardizedNormal Distribution

� Also known as the “Z” distribution

� Mean is 0

� Standard Deviation is 1

f(Z)

Z

f(Z)

0

1

Values above the mean have positive Z-values,values below the mean have negative Z-values

Example

� If X is distributed normally with mean of 100and standard deviation of 50, the Z value for X = 200 is

1 0 02 0 0µX −−

� This says that X = 200 is two standard deviations (2 increments of 50 units) above the mean of 100.

2 .05 0

1 0 02 0 0

σ

µXZ =

−=

−=

Comparing X and Z units

Z

100

2.00

200 X

Note that the distribution is the same, only the scale has changed. We can express the problem in original units (X) or in standardized units (Z)

(µ = 100, σ = 50)

(µ = 0, σ = 1)

Finding Normal Probabilities

Probability is the area under thecurve!

f(X) P a X b( )≤

Probability is measured by the area under the curve

≤

a b X

f(X) P a X b( )≤≤

P a X b( )<<=

(Note that the probability of any individual value is zero)

f(X)

Probability as Area Under the Curve

The total area under the curve is 1.0, and the curve is symmetric, so half is above the mean, half is below

0 .5)XP (µ =∞<<0 .5µ )XP ( =<<− ∞

Xµ

0.50.5

1 .0)XP ( =∞<<− ∞

0 .5)XP (µ =∞<<

Empirical Rules

f(X)

What can we say about the distribution of values around the mean? There are some general rules:

µ ± 1σ encloses about

68% of X’s

Xµ µ+1σµ-1σ

σσ

68.26%

The Empirical Rule

� µ ± 2σ covers about 95% of X’s

� µ ± 3σ covers about 99.7% of X’s

(continued)

xµ

2σ 2σ

xµ

3σ 3σ

95.44% 99.72%

The Standardized Normal Table

� The Standardized Normal table in the textbook (Appendix table E.2) gives the probability less than a desired value for Z (i.e., from negative infinity to Z)(i.e., from negative infinity to Z)

Z0 2.00

.9772Example:

P(Z < 2.00) = .9772

The Standardized Normal Table

The column gives the value of Z to the second decimal point

(continued)

Z 0.00 0.01 0.02 …

The value within the table gives the probability from Z = −−−− ∞∞∞∞up to the desired Z value

.9772

2.0P(Z < 2.00) = .9772

The row shows the value of Z to the first decimal point

2.0

.

.

.

0.0

0.1

General Procedure for Finding Probabilities

� Draw the normal curve for the problem in

To find P(a < X < b) when X is distributed normally:

� Draw the normal curve for the problem in

terms of X

� Translate X-values to Z-values

� Use the Standardized Normal Table


� Suppose X is normal with mean 8.0 and standard deviation 5.0

� Find P(X < 8.6)

X

8.6

8.0

� Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(X < 8.6)

(continued)


0 .1 25 .0

8 .08 .6

σ

µXZ =

−=

−=

Z0.120X8.68

µ = 8

σ = 10

µ = 0

σ = 1

P(X < 8.6) P(Z < 0.12)

Z .00 .01

Solution: Finding P(Z < 0.12)

.5478.02

Standardized Normal Probability Table (Portion) = P(Z < 0.12)

P(X < 8.6)

Z

0.12

0.0 .5000 .5040 .5080

.5398 .5438

0.2 .5793 .5832 .5871

0.3 .6179 .6217 .6255

0.1 .5478

0.00

Upper Tail Probabilities

� Suppose X is normal with mean 8.0 and standard deviation 5.0.

� Now Find P(X > 8.6)

X

8.6

8.0

� Now Find P(X > 8.6)…

(continued)

P(X > 8.6) = P(Z > 0.12) = 1.0 - P(Z ≤ 0.12)

= 1.0 - .5478 = .4522

Upper Tail Probabilities

Z

0.12

0Z

0.12

.5478

0

1.000 1.0 - .5478 = .4522

Probability Between Two Values

� Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(8 < X < 8.6)

Calculate Z-values:

P(8 < X < 8.6)

= P(0 < Z < 0.12)

Z0.120

X8.68

05

88

σ

µXZ =

−=

−=

0 .1 25

88 .6

σ

µXZ =

−=

−=

Solution: Finding P(0 < Z < 0.12)

= P(0 < Z < 0.12)

P(8 < X < 8.6)

= P(Z < 0.12) – P(Z ≤ 0)

= .5478 - .5000 = .0478Z .00 .01 .02

Standardized Normal Probability Table (Portion)

Z

0.12

.0478

0.00

.50000.0 .5000 .5040 .5080

.5398 .5438

0.2 .5793 .5832 .5871

0.3 .6179 .6217 .6255

0.1 .5478


� Now Find P(7.4 < X < 8)

Probabilities in the Lower Tail

X

7.48.0

Probabilities in the Lower Tail

Now Find P(7.4 < X < 8)…

P(7.4 < X < 8)

= P(-0.12 < Z < 0)

(continued)

.0478

X7.4 8.0

= P(Z < 0) – P(Z ≤ -0.12)

= .5000 - .4522 = .0478

.0478

.4522

Z-0.12 0

The Normal distribution is symmetric, so this probability is the same as P(0 < Z < 0.12)

� Steps to find the X value for a known probability:

1. Find the Z value for the known probability

Finding the X value for a Known Probability

2. Convert to X units using the formula:

ZσµX +=

Finding the X value for a Known Probability

Example:


� Now find the X value so that only 20% of all

(continued)

� Now find the X value so that only 20% of all values are below this X

X? 8.0

.2000

Z? 0

Find the Z value for 20% in the Lower Tail

� 20% area in the lower tail is consistent with a Z value of -0.84Z .03 .04

Standardized Normal Probability Table (Portion)

.05…

1. Find the Z value for the known probability

Z value of -0.84Z .03

-0.9 .1762 .1736

.2033

-0.7 .2327 .2296

.04

-0.8 .2005

.05

.1711

.1977

.2266

…

…

…

…

X? 8.0

.2000

Z-0.84 0

2. Convert to X units using the formula:

Finding the X value

0.5)84.0(0.8

ZσµX

−+=

+=

80.3

0.5)84.0(0.8

=

−+=

So 20% of the values from a distribution

with mean 8.0 and standard deviation 5.0 are less than 3.80

The Uniform Distribution




Normal

Uniform

Exponential

The Uniform Distribution

� The uniform distribution is a

probability distribution that has equal

probabilities for all possible probabilities for all possible

outcomes of the random variable

� Also called a rectangular distribution

The Continuous Uniform Distribution:

bXaifab

1≤≤

−

The Uniform Distribution(continued)

f(X) =

otherwise 0

where

f(X) = value of the density function at any X value

a = minimum value of X

b = maximum value of X

f(X) =

Properties of the Uniform Distribution

� The mean of a uniform distribution is

2

baµ

+=

� The standard deviation is

1 2

a )-(bσ

2

=

Uniform Distribution Example

Example: Uniform probability distributionover the range 2 ≤ X ≤ 6:

f(X) = = .25 for 2 ≤ X ≤ 66 - 21

2 6

.25

f(X) = = .25 for 2 ≤ X ≤ 66 - 2

X

f(X)

42

62

2

baµ =

+=

+=

1 5 4 7.11 2

2 )-(6

1 2

a )-(bσ

22

===

The Exponential Distribution




Normal

Uniform

Exponential


� Used to model the length of time between two occurrences of an event (the time between arrivals)

� Examples: � Time between trucks arriving at an unloading dock

� Time between transactions at an ATM Machine

� Time between phone calls to the main operator


� Defined by a single parameter, its mean λ(lambda)

� The probability that an arrival time is less than some specified time X is

Xλe1X)time P(arrival −−=<

some specified time X is

where e = mathematical constant approximated by 2.71828

λ = the population mean number of arrivals per unit

X = any value of the continuous variable where 0 < X < ∞

Exponential Distribution Example

Example: Customers arrive at the service counter at the rate of 15 per hour. What is the probability that the arrival time between consecutive customers is less than three minutes?

� The mean number of arrivals per hour is 15, so λ = 15

� Three minutes is .05 hours

� P(arrival time < .05) = 1 – e-λX = 1 – e-(15)(.05) = .5276

� So there is a 52.76% probability that the arrival time between successive customers is less than three minutes

Exponential Distributions

( )arrival time 1

: any value of continuous random variable

: the population average number of

XP X e

X

λ

λ

−< = −

: the population average number of

arrivals per unit of time

1/ : average time between arrivals

2.71828e

λ

λ

=

e.g.: Drivers Arriving at a Toll Bridge;

Customers Arriving at an ATM Machine

Exponential Distributions

� Describes time or distance between events

� Used for queues

� Density function

(continued)

f(X)λλλλ = 0.5

1x

�

� Parameters

�

X

λλλλ = 2.0

( )1

x

f x e λ

λ

−

=

µ λ σ λ= =

فصل چهارم

توزیع نمونه گیری

د.یل استفاده از نمونه گیری / مزایا

صرفه جویــی در زمان

/ د.یل مزایا

کاهش هزینه ها

مزیت در زمون های

ا

مخرب

امکان تعمیم نتایج به جامعه

مباحث اصلی در نمونه گیری

نمونه گیری

روشهای نمونه گیری

در فصول –محاسبه حجم نمونه بعدی بررسی میشود

تصادفی ساده

قرعه کشی

جدول اعداد تصادفی

گروهی نظام مند

منتظم

غیرمنتظم

خوشه ای مرحله ای

Types of Samples

Samples

Non-Probability Probability SamplesNon-Probability Samples

Judgment

Probability Samples

Simple

Random

Systematic

Stratified

Cluster

Convenience

Types of Samples:Nonprobability Sample

� In a nonprobability sample, items included are chosen without regard to their probability of occurrence.� In convenience sampling, items are selected based � In convenience sampling, items are selected based

only on the fact that they are easy, inexpensive, or convenient to sample.

� In a judgment sample, you get the opinions of pre-

selected experts in the subject matter.

Types of Samples:Probability Sample

� In a probability sample, items in the sample are chosen on the basis of known probabilities.

Probability SamplesProbability Samples

Simple

RandomSystematic Stratified Cluster

Probability Sample:Simple Random Sample

� Every individual or item from the frame has an equal chance of being selected

� Selection may be with replacement (selected individual is returned to frame for possible individual is returned to frame for possible reselection) or without replacement (selected individual isn’t returned to the frame).

� Samples obtained from table of random numbers or computer random number generators.

Selecting a Simple Random Sample Using A Random Number Table

Sampling Frame For Population With 850

Items

Item Name Item #

Portion Of A Random Number Table

49280 88924 35779 00283 81163 07275

11100 02340 12860 74697 96644 89439

09893 23997 20048 49420 88872 08401

The First 5 Items in a simple Item Name Item #Bev R. 001

Ulan X. 002

. .

. .

. .

. .

Joann P. 849

Paul F. 850

The First 5 Items in a simple random sample

Item # 492

Item # 808

Item # 892 -- does not exist so ignore

Item # 435

Item # 779

Item # 002

� Decide on sample size: n

� Divide frame of N individuals into groups of k

individuals: k=N/n

� Randomly select one individual from the 1st

Probability Sample:Systematic Sample

� Randomly select one individual from the 1st

group

� Select every kth individual thereafter

N = 40

n = 4

k = 10

First Group

Probability Sample:Stratified Sample

� Divide population into two or more subgroups (called strata) according

to some common characteristic

� A simple random sample is selected from each subgroup, with sample

sizes proportional to strata sizes

� Samples from subgroups are combined into one� Samples from subgroups are combined into one

� This is a common technique when sampling population of voters, stratifying across racial or socio-economic lines.

Population

Divided

into 4

strata

Probability SampleCluster Sample

� Population is divided into several “clusters,” each representative of

the population

� A simple random sample of clusters is selected

� All items in the selected clusters can be used, or items can be

chosen from a cluster using another probability sampling technique

� A common application of cluster sampling involves election exit polls,

where certain election districts are selected and sampled.

Population divided into 16 clusters.

Randomly selected clusters for sample

ماره مطلوبماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای اماره مطلوبویژگیهای ا ویژگیهای ا

Unbiasedness

BiasedUnbiased

P(X)

Xµµ

X

Less Variability

Sampling

Distribution Sampling

P(X)

of MedianSampling

Distribution of

Mean

µ X

Effect of Large Sample

Larger

sample size

Smaller

P(X)

Smaller

sample size

Xµ

توزیع نمونه گیری توزیع نمونه گیری توزیع نمونه گیری توزیع نمونه گیری توزیع نمونه گیری توزیع نمونه گیری توزیع نمونه گیری توزیع نمونه گیری

Sampling Distributions



of the Mean


of the Proportion

.3

P(x)

(continued)

Summary Measures for the Population Distribution:

Developing a Sampling Distribution

N

Xµ i=∑

.3

.2

.1

018 20 22 24

A B C D

Uniform Distribution

x

214

24222018=

+++=

2.236N

µ)(Xσ

2i

=−

=∑

1st 2nd Observation

Obs 18 20 22 24

18 18,18 18,20 18,22 18,24

Now consider all possible samples of size n=2

1st 2nd Observation

(continued)


16 Sample Means

20 20,18 20,20 20,22 20,24

22 22,18 22,20 22,22 22,24

24 24,18 24,20 24,22 24,24

16 possible samples (sampling with replacement)

1st 2nd Observation

Obs 18 20 22 24

18 18 19 20 21

20 19 20 21 22

22 20 21 22 23

24 21 22 23 24

1st 2nd Observation

Sampling Distribution of All Sample Means

Sample Means Distribution

16 Sample Means


(continued)

_1st 2nd Observation

Obs 18 20 22 24

18 18 19 20 21

20 19 20 21 22

22 20 21 22 23

24 21 22 23 24

18 19 20 21 22 23 240

.1

.2

.3 P(X)

X_

(no longer uniform)

_

Summary Measures of this Sampling Distribution:

Developing aSampling Distribution

(continued)

2116

24211918

N

Xµ i

X=

++++==

∑ L

16N

1.5816

21)-(2421)-(1921)-(18

N

)µX(σ

222

2

Xi

X

=+++

=

−=∑

L

Comparing the Population with its Sampling Distribution

P(X)

PopulationN = 4

P(X)

1.58σ 21µXX

==2.236σ 21µ ==

Sample Means Distributionn = 2

_

18 19 20 21 22 23 240

.1

.2

.3 P(X)

X18 20 22 24

A B C D

0

.1

.2

.3 P(X)

X _

_

Sampling Distributions of the Mean



of the Mean


of the Proportion

Standard Error of the Mean

� Different samples of the same size from the same population will yield different sample means

� A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean:sample is given by the Standard Error of the Mean:

� Note that the standard error of the mean decreases as the sample size increases

n

σσ

X=

If the Population is Normal

� If a population is normal with mean µ and

standard deviation σ, the sampling distribution

of is also normally distributed withX

and

(This assumes that sampling is with replacement or

sampling is without replacement from an infinite population)

µµX

=n

σσ

X=

Z-value for Sampling Distributionof the Mean

� Z-value for the sampling distribution of :

σµ)X(

σ

)µX(Z X −

=−

=

X

where: = sample mean

= population mean

= population standard deviation

n = sample size

Xµ

σ

n

σσX

Finite Population Correction

� Apply the Finite Population Correction if:

� the sample is large relative to the population

(n is greater than 5% of N)

and…and…

� Sampling is without replacement

Then

1N

nN

n

σ

µ)X(Z

−

−

−=

Normal Population Distribution

Sampling Distribution Properties

�µµx =

Normal Sampling Distribution (has the same mean)

(i.e. is unbiased )xx

x

µ

xµ

Sampling Distribution Properties

� For sampling with replacement:

As n increases,

decreases

Larger sample size

(continued)

xσ

Smaller sample size

x

x

µ

If the Population is not Normal

� We can apply the Central Limit Theorem:

� Even if the population is not normal,

� …sample means from the population will beapproximately normal as long as the sample size is approximately normal as long as the sample size is large enough.

Properties of the sampling distribution:

andµµx =n

σσ x =

n↑

Central Limit Theorem

As the sample size gets

the sampling distribution becomes almost normal

size gets large enough…

almost normal regardless of shape of population

x

Population Distribution

Central Tendency

If the Population is not Normal(continued)

Sampling distribution properties:

µµ =

Sampling Distribution (becomes normal as n increases)

Variation

(Sampling with replacement)

x

x

Larger sample size

Smaller sample size

µµx =

n

σσ x =

xµ

µ

How Large is Large Enough?

� For most distributions, n > 30 will give a sampling distribution that is nearly normal

� For fairly symmetric distributions, n > 15� For fairly symmetric distributions, n > 15

� For normal population distributions, the sampling distribution of the mean is always normally distributed

Example

� Suppose a population has mean µ = 8 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.

� What is the probability that the sample mean is between 7.8 and 8.2?

Example

Solution:

� Even if the population is not normally distributed, the central limit theorem can be used (n > 30)

(continued)

used (n > 30)

� … so the sampling distribution of is approximately normal

� … with mean = 8

� …and standard deviation

x

xµ

0.536

3

n

σσx ===

Example

Solution (continued):(continued)

363

8-8.2

nσ

µ- µ

363

8-7.8P 8.2) µ P(7.8

X

X

<<=<<

0.38300.5)ZP(-0.5

36n36

=<<=

Z7.8 8.2 -0.5 0.5

Sampling Distribution

Standard Normal Distribution .1915

+.1915

Population Distribution

??

??

?????

???Sample Standardize

8µ = 8µX

= 0µz =xX

Sampling Distributions of the Proportion



of the Mean


of the Proportion

Population Proportions, p

p = the proportion of the population having some characteristic

� Sample proportion ( ps ) provides an estimateof p:

� 0 ≤ ps ≤ 1

� ps has a binomial distribution

(assuming sampling with replacement from a finite population or without replacement from an infinite population)

size sample

interest ofstic characteri the having sample the in itemsofnumber

n

Xps ==

Sampling Distribution of p

� Approximated by a

normal distribution if:

�

Sampling DistributionP(ps)

.3

.2

.1

5np ≥

where

and

(where p = population proportion)

.1

00 . 2 .4 .6 8 1 ps

pµsp =

n

p)p(1σ

sp

−=

5p)n(1

and

≥−

Z-Value for Proportions

p)p(1

pp

σ

ppZ s

p

s

s−

−=

−=

Standardize ps to a Z value with the formula:

� If sampling is without replacement

and n is greater than 5% of the

population size, then must use

the finite population correction

factor:

1N

nN

n

p )p (1σ

sp−

−−=

n

p)p(1ps−

pσ

Example

� If the true proportion of voters who support

Proposition A is p = .4, what is the probability

that a sample of size 200 yields a sample that a sample of size 200 yields a sample

proportion between .40 and .45?

� i.e.: if p = .4 and n = 200, what is

P(.40 ≤ ps ≤ .45) ?

Example

� if p = .4 and n = 200, what is

P(.40 ≤ ps ≤ .45) ?

(continued)

.03464.4).4(1p)p(1

σ =−

=−

=Find : σ .03464200n

σsp ===

1.44)ZP(0

.03464

.40.45Z

.03464

.40.40P.45)pP(.40 s

≤≤=

−≤≤

−=≤≤

Find :

Convert to standard normal:

spσ

Example

� if p = .4 and n = 200, what is

P(.40 ≤ ps ≤ .45) ?

(continued)

Use standard normal table: P(0 ≤ Z ≤ 1.44) = .4251

Z.45 1.44

.4251

Standardize

Sampling DistributionStandardized

Normal Distribution

Use standard normal table: P(0 ≤ Z ≤ 1.44) = .4251

.40 0ps

Example:

( )8 =2 25

7.8 8.2 ?

n

P X

µ σ= =

< < =

Sampling Distribution Standardized

( )

( )

7.8 8 8.2 87.8 8.2

2 / 25 2 / 25

.5 .5 .3830

X

X

XP X P

P Z

µ

σ

−− −< < = < <

= − < < =

Sampling Distribution Standardized Normal Distribution2

.425

Xσ = = 1Zσ =

8X

µ =8.2 Z

0Zµ =0.57.8 0.5−

.1915

X

Example:( )200 .4 .43 ?Sn p P p= = < =

( )( )

( ).43 .4

.43 .87 .8078.4 1 .4

200

S

S

S p

S

p

pP p P P Z

µ

σ

− − < = < = < = − 200

Sampling DistributionStandardized

Normal Distribution

Spσ 1Zσ =

SpµSp Z0.43 .87

فصل پنجم

تخمین فاصله ای

Point and Interval Estimates

� A point estimate is a single number,

� a confidence interval provides additional information about the variability of the estimateinformation about the variability of the estimate

Point Estimate

Lower

Confidence

Limit

Upper

Confidence

Limit

Width of confidence interval

We can estimate a Population Parameter …

Point Estimates

with a SampleStatistic

(a Point Estimate)(a Point Estimate)

Mean

Proportion pπ

Xµ

Confidence Intervals

� How much uncertainty is associated with a point estimate of a population parameter?

� An interval estimate provides more An interval estimate provides more information about a population characteristic than does a point estimate

� Such interval estimates are called confidence intervals

Confidence Interval Estimate

� An interval gives a range of values:

� Takes into consideration variation in sample statistics from sample to sample

� Based on observations from 1 sample� Based on observations from 1 sample

� Gives information about closeness to unknown population parameters

� Stated in terms of level of confidence

� e.g. 95% confident, 99% confident

� Can never be 100% confident

Confidence Interval Example

Cereal fill example

� Population has µ = 368 and σ = 15.

� If you take a sample of size n = 25 you know

� 368 ± 1.96 * 15 / = (362.12, 373.88) contains 95% of 25� 368 ± 1.96 * 15 / = (362.12, 373.88) contains 95% of the sample means

� When you don’t know µ, you use X to estimate µ� If X = 362.3 the interval is 362.3 ± 1.96 * 15 / = (356.42, 368.18)

� Since 356.42 ≤ µ ≤ 368.18 the interval based on this sample makes a correct statement about µ.

But what about the intervals from other possible samples of size 25?

25

25


(continued)

Sample # XLower

Limit

Upper

Limit

Contain

µ?

1 362.30 356.42 368.18 Yes

2 369.50 363.62 375.38 Yes

3 360.00 354.12 365.88 No

4 362.12 356.24 368.00 Yes

5 373.88 368.00 379.76 Yes


� In practice you only take one sample of size n

� In practice you do not know µ so you do not know if the interval actually contains µ

However you do know that 95% of the intervals

(continued)

� However you do know that 95% of the intervals formed in this manner will contain µ

� Thus, based on the one sample, you actually selected you can be 95% confident your interval will contain µ (this is a 95% confidence interval)

Note: 95% confidence is based on the fact that we used Z = 1.96.

Estimation Process

Population

Random Sample

Mean

I am 95% confident that µ is between 40 & 60.

(mean, µ, is unknown)

Population Mean

X = 50

Sample

40 & 60.

General Formula

� The general formula for all confidence intervals is:

Point Estimate ± (Critical Value)(Standard Error)

Where:•Point Estimate is the sample statistic estimating the population parameter of interest

•Critical Value is a table value based on the sampling distribution of the point estimate and the desired confidence level

•Standard Error is the standard deviation of the point estimate

Confidence Level

� Confidence Level

� Confidence the interval will contain the unknown population parameter

� A percentage (less than 100%)

Confidence Level, (1-α)

� Suppose confidence level = 95%

� Also written (1 - α) = 0.95, (so α = 0.05)

� A relative frequency interpretation:

� 95% of all the confidence intervals that can be

(continued)

� 95% of all the confidence intervals that can be constructed will contain the unknown true parameter

� A specific interval either will contain or will not contain the true parameter

� No probability involved in a specific interval


Population

ConfidenceIntervals

PopulationPopulation Mean

σ Unknown

PopulationProportion

σ Known

Confidence Interval for µ(σ Known)

� Assumptions� Population standard deviation σ is known� Population is normally distributed� If population is not normal, use large sample

� Confidence interval estimate:

where is the point estimate

Zα/2 is the normal distribution critical value for a probability of α/2 in each tail

is the standard error

n

σ/2ZX α±

X

nσ/

Finding the Critical Value, Zα/2

� Consider a 95% confidence interval:

0.05 so 0.951 ==− αα

1.96/2Z ±=α

Zα/2 = -1.96 Zα/2 = 1.96

0.0252

=α

0.0252

=α

Point EstimateLower Confidence Limit

UpperConfidence Limit

Z units:

X units: Point Estimate

0

Common Levels of Confidence

� Commonly used confidence levels are 90%, 95%, and 99%

Confidence Level

Confidence Coefficient, Zα/2 value

α−1Level

1.28

1.645

1.96

2.33

2.58

3.08

3.27

0.80

0.90

0.95

0.98

0.99

0.998

0.999

80%

90%

95%

98%

99%

99.8%

99.9%

α−1

Intervals and Level of Confidence

Intervals

Sampling Distribution of the Mean

x

/2α /2αα−1

µµx

=


Intervals extend from

to

(1-α)x100%of intervals constructed contain µ;

(α)x100% do not.

n

σ2/αZX −

n

σ2/αZX +

x

x1

x2

Example

� A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms.

� Determine a 95% confidence interval for the true mean resistance of the population.

Example

� A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the

population standard deviation is 0.35 ohms.

(continued)

2.4068 1.9932

0.2068 2.20

)11(0.35/ 1.96 2.20

n

σ/2 ZX

≤≤

±=

±=

±

µ

α

population standard deviation is 0.35 ohms.

� Solution:


Population

ConfidenceIntervals


σ Unknown


σ Known

Do You Ever Truly Know σ?

� Probably not!

� In virtually all real world business situations, σ is not known.known.

� If there is a situation where σ is known then µ is also known (since to calculate σ you need to know µ.)

� If you truly know µ there would be no need to gather a sample to estimate it.

� If the population standard deviation σ is unknown, we can substitute the sample standard deviation, S

Confidence Interval for µ(σ Unknown)

� This introduces extra uncertainty, since S is variable from sample to sample

� So we use the t distribution instead of the normal distribution

� Assumptions

� Population standard deviation is unknown

� Population is normally distributed

� If population is not normal, use large sample

Confidence Interval for µ(σ Unknown)

(continued)

If population is not normal, use large sample

� Use Student’s t Distribution

� Confidence Interval Estimate:

(where tα/2 is the critical value of the t distribution with n -1 degrees of freedom and an area of α/2 in each tail)

n

StX 2/α±

Student’s t Distribution

� The t is a family of distributions

� The tα/2 value depends on degrees of freedom (d.f.)

� Number of observations that are free to vary after sample mean has been calculated

d.f. = n - 1

Degrees of Freedom (df)

Idea: Number of observations that are free to varyafter sample mean has been calculated

Example: Suppose the mean of 3 numbers is 8.0

If the mean of these three values is 8.0, then X3 must be 9(i.e., X3 is not free to vary)

Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2

(2 values can be any numbers, but the third is not free to vary for a given mean)

Let X1 = 7

Let X2 = 8

What is X3?

Student’s t Distribution

Standard Normal

(t with df = ∞)

Note: t Z as n increases

t0

t (df = 5)

t (df = 13)t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal

Student’s t Table

Upper Tail Area

df .10 .05 .025

1 3.078 6.314 12.706

Let: n = 3 df = n - 1 = 2

α = 0.10α/2 = 0.051 3.078 6.314 12.706

2 1.886

3 1.638 2.353 3.182

t0 2.920

The body of the table contains t values, not probabilities

α/2 = 0.05

α/2 = 0.05

4.3032.920

Selected t distribution values

With comparison to the Z value

Confidence t t t ZLevel (10 d.f.) (20 d.f.) (30 d.f.) (∞ d.f.)

0.80 1.372 1.325 1.310 1.280.80 1.372 1.325 1.310 1.28

0.90 1.812 1.725 1.697 1.645

0.95 2.228 2.086 2.042 1.96

0.99 3.169 2.845 2.750 2.58

Note: t Z as n increases

Example of t distribution confidence interval

A random sample of n = 25 has X = 50 and S = 8. Form a 95% confidence interval for µ

� d.f. = n – 1 = 24, so 2.06390.025t/2 ==αt

The confidence interval is

25

8(2.0639)50

n

S/2 ±=± αtX

46.698 ≤ µ ≤ 53.302

Example of t distribution confidence interval

� Interpreting this interval requires the assumption that the population you are sampling from is approximately a normal distribution (especially since n is only 25).

(continued)

distribution (especially since n is only 25).

� This condition can be checked by creating a:

� Normal probability plot or

� Boxplot


Population

ConfidenceIntervals


σ Unknown


σ Known

Confidence Intervals for the Population Proportion, π

� An interval estimate for the population

proportion ( π ) can be calculated by

adding an allowance for uncertainty to adding an allowance for uncertainty to

the sample proportion ( p )

Confidence Intervals for the Population Proportion, π

� Recall that the distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation

(continued)

)(1 ππ −

� We will estimate this with sample data:

n

p)p(1−

n

)(1σp

ππ −=

Confidence Interval Endpoints

� Upper and lower confidence limits for the population proportion are calculated with the formula

p)p(1 −

� where � Zα/2 is the standard normal value for the level of confidence desired

� p is the sample proportion

� n is the sample size

� Note: must have np > 5 and n(1-p) > 5

n

p)p(1/2Zp

−± α

Example

� A random sample of 100 people

shows that 25 are left-handed.

Form a 95% confidence interval for � Form a 95% confidence interval for

the true proportion of left-handers

Example

� A random sample of 100 people shows that 25 are left-handed. Form a 95% confidence interval for the true proportion of left-handers.

(continued)

/1000.25(0.75)1.9625/100

p)/np(1/2Zp

±=

−± α

0.3349 0.1651

(0.0433) 1.96 0.25

≤≤

±=

π

Determining Sample Size

DeterminingSample Size

For the Mean

For theProportion

Sampling Error

� The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 - αααα)

� The margin of error is also called sampling error

� the amount of imprecision in the estimate of the

population parameter

� the amount added and subtracted to the point

estimate to form the confidence interval


For the


For the Mean

n

σ2/αZX ±

n

σ2/αZe =

Sampling error (margin of error)


For the


(continued)

For the Mean

n

σ2/αZe =

2

2σ22/

e

Zn α=Now solve

for n to get


� To determine the required sample size for the mean, you must know:

The desired level of confidence (1 - αααα), which

(continued)

� The desired level of confidence (1 - αααα), which

determines the critical value, Zα/2

� The acceptable sampling error, e

� The standard deviation, σ

Required Sample Size Example

If σ = 45, what sample size is needed to

estimate the mean within ± 5 with 90%

confidence?

(Always round up)

219.195

(45)(1.645)

e

σZn

2

22

2

22

===

So the required sample size is n = 220

If σ is unknown

� If unknown, σ can be estimated when

using the required sample size formula

� Use a value for σ that is expected to be � Use a value for σ that is expected to be

at least as large as the true σ

� Select a pilot sample and estimate σ with

the sample standard deviation, S



For the

(continued)

For theProportion

2

2

e

)(1Zn

ππ −=

Now solve for n to getn

)(1Ze

ππ −=


� To determine the required sample size for the proportion, you must know:

� The desired level of confidence (1 - αααα), which determines the

(continued)

critical value, Zα/2

� The acceptable sampling error, e

� The true proportion of events of interest, π

� π can be estimated with a pilot sample if necessary (or

conservatively use 0.5 as an estimate of π)


How large a sample would be necessary

to estimate the true proportion defective in

a large population within ±3%, with 95%

confidence?confidence?

(Assume a pilot sample yields p = 0.12)


Solution:

For 95% confidence, use Zα/2 = 1.96

e = 0.03

(continued)

p = 0.12, so use this to estimate π

So use n = 451

450.742(0.03)

0.12)(0.12)(12(1.96)

2e

)(12/2Z

n =−

=−

=ππα

Confidence Interval for Population Total Amount

� Point estimate for a population of size N:


XN total Population =


1n

S)2/(

−

−±

N

nNtNXN α

(This is sampling without replacement, so use the finite population correction in the confidence interval formula)

Confidence Interval for Population Total: Example

A firm has a population of 1000 accounts and wishes to estimate the total population value.

A sample of 80 accounts is selected with average balance of $87.6 and standard deviation of $22.3.

Find the 95% confidence interval estimate of the total balance.

Example Solution

1n

S)2/(

−

−±

N

nNtNXN α

22.3S ,6.87X 80, n ,1000N ====

The 95% confidence interval for the population total balance is $82,837.52 to $92,362.48

48.762,4600,87

11000

801000

80

22.3)9905.1)(1000()6.87)(1000(

n

±=

−

−±=

� Point estimate for a population of size N:

� Where the average difference, D, is:

DN Difference Total =

Confidence Interval for Total Difference

� Where the average difference, D, is:

value original - value auditedD where

n

D

D

i

n

1ii

=

=∑

=


1n

DS)2/(

−

−±

N

nNtNDN α

Confidence Interval for Total Difference

(continued)

where

1n)2/(

−±

NtNDN α

1

1

2)D(

−

=

−

=

∑

n

n

i

iD

DS

One-Sided Confidence Intervals

� Application: find the upper bound for the proportion of items that do not conform with internal controls

nNp)p(1 −−

� where � Zα is the standard normal value for the level of confidence desired

� p is the sample proportion of items that do not conform

� n is the sample size

� N is the population size

1N

nN

n

p)p(1Zp boundUpper

−

−−+= α

Use A fpc When Sampling More Than 5% Of The Population (n/N > 0.05)

Confidence Interval For µ with a fpc

/ 2 1

S N nX t

Nnα

−±

− A fpc simplyreduces the

Confidence Interval For π with a fpc

1

)1(2/

−

−−±

N

nN

n

ppzp α

reduces thestandard errorof either the

sample mean orthe sample proportion

Confidence Interval for µ with a fpc

10s 50,x 100,n 1000,N Suppose ====

:for CI 95% 2/ =−

−±

nNstx αµ

)88.51 ,12.48( 88.150

11000

1001000

100

10984.150

1 :for CI 95% 2/

=±

=−

−±

=−

±Nn

tx αµ

Determining Sample Size with a fpc

� Calculate the sample size (n0) without a fpc

� For µ: 2

22

2/0

e

zn

σα=

� For π:

� Apply the fpc utilizing the following formula to arrive at the final sample size (n).

� n = n0N / (n0 + (N-1))

e

2

2

2/0

)1(

e

zn

ππα −=

فصل ششم

ماری زمون فرض ا

ا

What is a Hypothesis?

� A hypothesis is a claim (assertion) about a population parameter:

population mean� population mean

� population proportion

Example: The mean monthly cell phone bill in this city is µ = $42

Example: The proportion of adults in this city with cell phones is π = 0.68

The Null Hypothesis, H0

� States the claim or assertion to be tested

Example: The average diameter of a

manufactured bolt is 30mm ( )30µ:H0 =

� Is always about a population parameter, not about a sample statistic

30µ:H0 = 30X:H0 =

The Null Hypothesis, H0

� Begin with the assumption that the null hypothesis is true

� Similar to the notion of innocent untilproven guilty

(continued)

proven guilty

� Refers to the status quo or historical value

� Always contains “=“, or “≤”, or “≥” sign

� May or may not be rejected

The Alternative Hypothesis, H1

� Is the opposite of the null hypothesis

� e.g., The average diameter of a manufactured bolt is not equal to 30mm ( H1: µ ≠ 30 )

� Challenges the status quoChallenges the status quo

� Never contains the “=“, or “≤”, or “≥” sign

� May or may not be proven

� Is generally the hypothesis that the researcher is trying to prove

The Hypothesis Testing Process

� Claim: The population mean age is 50.� H0: µ = 50, H1: µ ≠ 50

� Sample the population and find sample mean.

Population

Sample


� Suppose the sample mean age was X = 20.

� This is significantly lower than the claimed mean population age of 50.

(continued)

� If the null hypothesis were true, the probability of getting such a different sample mean would be very

small, so you reject the null hypothesis .

� In other words, getting a sample mean of 20 is so unlikely if the population mean was 50, you conclude that the population mean must not be 50.


Sampling

Distribution of X

(continued)

µ = 50

If H0 is trueIf it is unlikely that you

would get a sample

mean of this value ...

... then you reject

the null hypothesis

that µ = 50.

20

... When in fact this were

the population mean…

X

The Test Statistic and Critical Values

� If the sample mean is close to the stated population mean, the null hypothesis is not rejected.

� If the sample mean is far from the stated � If the sample mean is far from the stated population mean, the null hypothesis is rejected.

� How far is “far enough” to reject H0?

� The critical value of a test statistic creates a “line in the sand” for decision making -- it answers the question of how far is far enough.

The Test Statistic and Critical Values

Sampling Distribution of the test statistic

Region of

Rejection

Region of

RejectionRegion of

Critical Values

“Too Far Away” From Mean of Sampling Distribution

RejectionRegion of

Non-Rejection

Possible Errors in Hypothesis Test Decision Making

� Type I Error

� Reject a true null hypothesis

� Considered a serious type of error

� The probability of a Type I Error is αααα

� Called level of significance of the test

� Set by researcher in advance

� Type II Error

� Failure to reject false null hypothesis

� The probability of a Type II Error is β


Possible Hypothesis Test Outcomes

Actual Situation

(continued)

Decision H0 True H0 False

Do Not Reject H0

No Error

Probability 1 - α

Type II Error

Probability β

Reject H0 Type I Error

Probability α

No Error

Probability 1 - β


� The confidence coefficient (1-α) is the probability of not rejecting H0 when it is true.

� The confidence level of a hypothesis test is

(continued)

� The confidence level of a hypothesis test is (1-α)*100%.

� The power of a statistical test (1-β) is the probability of rejecting H0 when it is false.

Type I & II Error Relationship

� Type I and Type II errors cannot happen atthe same time

� A Type I error can only occur if H0 is true� A Type I error can only occur if H0 is true

� A Type II error can only occur if H0 is false

If Type I error probability (αααα) , then

Type II error probability (β)

Factors Affecting Type II Error

� All else equal,

� β when the difference between

hypothesized parameter and its true value

� β when αααα

� β when σ

� β when n

Level of Significance and the Rejection Region

Level of significance = ααααH0: µ = 30

H1: µ ≠ 30

/2αααα/2αααα

This is a two-tail test because there is a rejection region in both tails

Critical values

Rejection Region

30

Hypothesis Tests for the Mean

Hypothesis Tests for µµµµ

σσσσ Known σσσσ Unknown

(Z test) (t test)

Z Test of Hypothesis for the Mean (σ Known)

� Convert sample statistic ( ) to a ZSTAT test statisticX


The test statistic is:

n

σ

µXZ

S T A T

−=

σ Known σ Unknownσσσσ Known σσσσ Unknown(Z test) (t test)

Critical Value Approach to Testing

� For a two-tail test for the mean, σ known:

� Convert sample statistic ( ) to test statistic (ZSTAT)

X

� Determine the critical Z values for a specifiedlevel of significance αααα from a table or computer

� Decision Rule: If the test statistic falls in the rejection region, reject H0 ; otherwise do not

reject H0

� There are two cutoff values (critical values), defining the regions of

Two-Tail Tests

αααα/2

H0: µ = 30

H1: µ ≠≠≠≠ 30

αααα/2

Do not reject H0 Reject H0Reject H0

regions of rejection

αααα/2

-Zα/2 0 +Zα/2

αααα/2

Lower critical value

Upper critical value

30

Z

X

6 Steps in Hypothesis Testing

1. State the null hypothesis, H0 and the alternative hypothesis, H1

2. Choose the level of significance, α, and the sample size, nsample size, n

3. Determine the appropriate test statistic and sampling distribution

4. Determine the critical values that divide the rejection and nonrejection regions

6 Steps in Hypothesis Testing

5. Collect data and compute the value of the test statistic

6. Make the statistical decision and state the managerial conclusion. If the test statistic falls

(continued)

managerial conclusion. If the test statistic falls into the nonrejection region, do not reject the null hypothesis H0. If the test statistic falls into the rejection region, reject the null hypothesis. Express the managerial conclusion in the context of the problem

Hypothesis Testing Example

Test the claim that the true mean diameter of a manufactured bolt is 30mm.

(Assume σ = 0.8)

1. State the appropriate null and alternative1. State the appropriate null and alternativehypotheses

� H0: µ = 30 H1: µ ≠ 30 (This is a two-tail test)

2. Specify the desired level of significance and the sample size

� Suppose that α = 0.05 and n = 100 are chosen for this test


3. Determine the appropriate technique� σ is assumed known so this is a Z test.

4. Determine the critical values

� For α = 0.05 the critical Z values are ±1.96

(continued)

2 .00 .0 8

.1 6

1 0 0

0 .8

3 02 9 .8 4

n

σ

µXZ S T A T −−−−====

−−−−====

−−−−====

−−−−====

5. Collect the data and compute the test statistic

� Suppose the sample results are

n = 100, X = 29.84 (σ = 0.8 is assumed known)

So the test statistic is:

� 6. Is the test statistic in the rejection region?

α/2 = 0.025


(continued)

α/2 = 0.025

Reject H0 Do not reject H0

-Zα/2 = -1.96 0

Reject H0 if ZSTAT < -1.96 or ZSTAT > 1.96; otherwise do not reject H0

Reject H0

+Zα/2 = +1.96

Here, ZSTAT = -2.0 < -1.96, so the test statistic is in the rejection region

6 (continued). Reach a decision and interpret the result


(continued)

α = 0.05/2 α = 0.05/2

-2.0

Since ZSTAT = -2.0 < -1.96, reject the null hypothesisand conclude there is sufficient evidence that the mean diameter of a manufactured bolt is not equal to 30


-Zα/2 = -1.96 0

Reject H0

+Zα/2= +1.96

p-Value Approach to Testing

� p-value: Probability of obtaining a test

statistic equal to or more extreme than the

observed sample value given H0 is true

� The p-value is also called the observed level of

significance

� It is the smallest value of αααα for which H0 can be

rejected

p-Value Approach to Testing:Interpreting the p-value

� Compare the p-value with αααα

� If p-value < αααα , reject H0

� If p-value ≥≥≥≥ αααα , do not reject H0� If p-value ≥≥≥≥ αααα , do not reject H0

� Remember

� If the p-value is low then H0 must go

The 5 Step p-value approach toHypothesis Testing

1. State the null hypothesis, H0 and the alternative hypothesis, H1

2. Choose the level of significance, α, and the sample size, n

Determine the appropriate test statistic and sampling 3. Determine the appropriate test statistic and sampling distribution

4. Collect data and compute the value of the test statistic and the p-value

5. Make the statistical decision and state the managerial conclusion. If the p-value is < α then reject H0, otherwise do not reject H0. State the managerial conclusion in the context of the problem

p-value Hypothesis Testing Example

Test the claim that the true mean diameter of a manufactured bolt is 30mm.

(Assume σ = 0.8)

1. State the appropriate null and alternative1. State the appropriate null and alternativehypotheses

� H0: µ = 30 H1: µ ≠ 30 (This is a two-tail test)

2. Specify the desired level of significance and the sample size

� Suppose that α = 0.05 and n = 100 are chosen for this test


3. Determine the appropriate technique� σ is assumed known so this is a Z test.

4. Collect the data, compute the test statistic and the p-value

(continued)

2 .00 .0 8

.1 6

1 0 0

0 .8

3 02 9 .8 4

n

σ

µXZ S T A T −−−−====

−−−−====

−−−−====

−−−−====

� Suppose the sample results are

n = 100, X = 29.84 (σ = 0.8 is assumed known)

So the test statistic is:

p-Value Hypothesis Testing Example:Calculating the p-value

4. (continued) Calculate the p-value.� How likely is it to get a ZSTAT of -2 (or something further from the

mean (0), in either direction) if H0 is true?

p-value = 0.0228 + 0.0228 = 0.0456

P(Z < -2.0) = 0.0228

0

-2.0

Z

2.0

P(Z > 2.0) = 0.0228

� 5. Is the p-value < α?

� Since p-value = 0.0456 < α = 0.05 Reject H0

� 5. (continued) State the managerial conclusion in the context of the situation.


(continued)

in the context of the situation.

� There is sufficient evidence to conclude the average diameter of a manufactured bolt is not equal to 30mm.

Connection Between Two Tail Tests and Confidence Intervals

� For X = 29.84, σ = 0.8 and n = 100, the 95%confidence interval is:

0.8 (1.96) 29.84 to

0.8 (1.96) - 29.84 ++++

29.6832 ≤ µ ≤ 29.9968

� Since this interval does not contain the hypothesized mean (30), we reject the null hypothesis at αααα = 0.05

100 (1.96) 29.84 to

100 (1.96) - 29.84 ++++

Do You Ever Truly Know σ?

� Probably not!

� In virtually all real world business situations, σ is not known.known.

� If there is a situation where σ is known then µ is also known (since to calculate σ you need to know µ.)

� If you truly know µ there would be no need to gather a sample to estimate it.

Hypothesis Testing: σ Unknown

� If the population standard deviation is unknown, you instead use the sample standard deviation S.

� Because of this change, you use the t distribution instead of the Z distribution to test the null hypothesis about the of the Z distribution to test the null hypothesis about the mean.

� When using the t distribution you must assume the population you are sampling from follows a normal distribution.

� All other steps, concepts, and conclusions are the same.

t Test of Hypothesis for the Mean (σ Unknown)


t Test of Hypothesis for the Mean (σ Unknown)

� Convert sample statistic ( ) to a tSTAT test statistic


X







n

S

µXt

S T A T

−=


Example: Two-Tail Test(σ Unknown)

The average cost of a hotel room in New York is said to be $168 per night. To determine if this is true, a random sample of 25 hotels is taken and resulted in an X of $172.50 and an S of $15.40. Test the appropriate hypotheses at α = 0.05.

(Assume the population distribution is normal)

H0: µ = 168

H1: µ ≠≠≠≠ 168

� α α α α = 0.05

Example Solution: Two-Tail t Test

Reject HReject H

α/2=.025

Do not reject H

α/2=.025H0: µ = 168

H1: µ ≠≠≠≠ 168

t� α α α α = 0.05

� n = 25, df = 25-1=24

� σσσσ is unknown, so

use a t statistic

� Critical Value:

±t24,0.025 = ± 2.0639 Do not reject H0: insufficient evidence that true mean cost is different than $168

Reject H0Reject H0

-t 24,0.025

Do not reject H0

0

-2.0639 2.0639

1 .4 6

2 5

1 5 .4 0

1 6 81 7 2 .5 0

n

S

µXS T A Tt =

−=

−=

1.46

t 24,0.025

Example Two-Tail t Test Using A p-value from Excel

� Since this is a t-test we cannot calculate the p-value without some calculation aid.

� The Excel output below does this:t Test for the Hypothesis of the Mean

Data

Null Hypothesis µ= 168.00$

Level of Significance 0.05

Sample Size 25

Sample Mean 172.50$

Sample Standard Deviation 15.40$

Standard Error of the Mean 3.08$ =B8/SQRT(B6)Degrees of Freedom 24 =B6-1

t test statistic 1.46 =(B7-B4)/B11

Lower Critical Value -2.0639 =-TINV(B5,B12)

Upper Critical Value 2.0639 =TINV(B5,B12)p-value 0.157 =TDIST(ABS(B13),B12,2)

=IF(B18<B5, "Reject null hypothesis","Do not reject null hypothesis")

Data

Intermediate Calculations

Two-Tail Test

Do Not Reject Null Hypothesis

p-value > αSo do not reject H0

Connection of Two Tail Tests to Confidence Intervals

� For X = 172.5, S = 15.40 and n = 25, the 95%confidence interval for µ is:

172.5 - (2.0639) 15.4/ 25 to 172.5 + (2.0639) 15.4/ 25172.5 - (2.0639) 15.4/ 25 to 172.5 + (2.0639) 15.4/ 25

166.14 ≤ µ ≤ 178.86

� Since this interval contains the Hypothesized mean (168), we do not reject the null hypothesis at αααα = 0.05

One-Tail Tests

� In many cases, the alternative hypothesis focuses on a particular direction

H : µ ≥ 3 This is a lower-tail test since the

H0: µ ≥ 3

H1: µ < 3

H0: µ ≤ 3

H1: µ > 3

This is a lower-tail test since the alternative hypothesis is focused on the lower tail below the mean of 3

This is an upper-tail test since the alternative hypothesis is focused on the upper tail above the mean of 3

� There is only one

critical value, since

the rejection area is

in only one tail

Lower-Tail Tests

αααα

H0: µ ≥ 3

H1: µ < 3


in only one tail αααα

-Zα or -tα 0

µ

Z or t

X

Critical value

Upper-Tail Tests

αααα

H0: µ ≤ 3

H1: µ > 3� There is only one

critical value, since

the rejection area is

in only one tail

Reject H0Do not reject H0

αααα

Zα or tα0

µ

in only one tail

Critical value

Z or t

X_

Example: Upper-Tail t Test for Mean (σ unknown)

A phone industry manager thinks that customer monthly cell phone bills have increased, and now average over $52 per month. The company wishes to test this claim. (Assume a normal population)claim. (Assume a normal population)

H0: µ ≤ 52 the average is not over $52 per month

H1: µ > 52 the average is greater than $52 per month(i.e., sufficient evidence exists to support the manager’s claim)

Form hypothesis test:

� Suppose that α = 0.10 is chosen for this test and n = 25.

Find the rejection region: Reject H0

Example: Find Rejection Region

(continued)


αααα = 0.10

1.3180

Reject H0 if tSTAT > 1.318

Obtain sample and compute the test statistic

Suppose a sample is taken with the following

results: n = 25, X = 53.1, and S = 10

Example: Test Statistic

(continued)

results: n = 25, X = 53.1, and S = 10

� Then the test statistic is:

0 .5 5

2 5

1 0

5 25 3 .1

n

S

µXt

S T A T=

−=

−=

Example: Decision

αααα = 0.10

Reject H0

Reach a decision and interpret the result:

(continued)


αααα = 0.10

1.3180

Do not reject H0 since tSTAT = 0.55 ≤ 1.318

there is not sufficient evidence that themean bill is over $52

tSTAT = 0.55

Example: Utilizing The p-value for The Test

�Calculate the p-value and compare to αααα (p-value below calculated using excel spreadsheet on next page)

ααααReject H0

p-value = .2937

Reject

H0

αααα = .10

Do not reject

H01.318

0

Reject H0

tSTAT = .55

Do not reject H0 since p-value = .2937 > αααα = .10

Hypothesis Tests for Proportions

� Involves categorical variables

� Two possible outcomes

� Possesses characteristic of interest� Possesses characteristic of interest

� Does not possess characteristic of interest

� Fraction or proportion of the population in the category of interest is denoted by π

Proportions

� Sample proportion in the category of interest is denoted by p

� sizesample

sampleininterest ofcategory in number

n

Xp ==

(continued)

�

� When both nπ and n(1-π) are at least 5, p can be approximated by a normal distribution with mean and standard deviation

�

sizesamplen

π=pµn

)(1σ

ππ −=p

� The sampling distribution of p is approximately normal, so the test

Hypothesis Tests for Proportions

Hypothesis Tests for p

normal, so the test statistic is a ZSTAT

value:

n

)(1

pZ

STAT

ππ

π

−

−=

nπ ≥≥≥≥ 5and

n(1-π) ≥≥≥≥ 5

nπ < 5or

n(1-π) < 5

Not discussed in this chapter

� An equivalent form to the last slide, but in terms of the number in the

Z Test for Proportion in Terms of Number in Category of Interest

Hypothesis Tests for X

number in the category of interest, X:

)(1n

nXZ

STAT

ππ

π

−

−=

X ≥≥≥≥ 5and

n-X ≥≥≥≥ 5

X < 5or

n-X < 5

Not discussed in this chapter

Example: Z Test for Proportion

A marketing company claims that it receives 8% responses from its mailing. To test this mailing. To test this claim, a random sample of 500 were surveyed with 25 responses. Test at the α = 0.05 significance level.

Check:

nπ = (500)(.08) = 40

n(1-π) = (500)(.92) = 460�

Z Test for Proportion: Solution

αααα = 0.05

n = 500, p = 0.05

H0: π = 0.08

H1: π ≠≠≠≠ 0.08

Test Statistic:

2.47

500

.08).08(1

.08.05

n

)(1

pSTATZ −=

−

−=

−

−=

ππ

π

n = 500, p = 0.05

Reject H0 at α = 0.05

Critical Values: ± 1.96 Decision:

Conclusion:

z0

Reject Reject

.025.025

1.96

-2.47

There is sufficient evidence to reject the company’s claim of 8% response rate.

-1.96

Do not reject H0

Reject H0Reject H0

α/2 = .025

Calculate the p-value and compare to αααα(For a two-tail test the p-value is always two-tail)

(continued)

p-value = 0.0136:

p-Value Solution

α/2 = .025 α/2 = .025

1.960

Z = -2.47

0.01362(0.0068)

2.47)P(Z2.47)P(Z

==

≥+−≤

Reject H0 since p-value = 0.0136 < αααα = 0.05

Z = 2.47

-1.96

α/2 = .025

0.00680.0068

The Power of a Test

� The power of the test is the probability of

correctly rejecting a false H0

Suppose we correctly reject H0: µ ≥≥≥≥ 52

when in fact the true mean is µ = 50

Reject H0: µ ≥ 52

Do not reject H0 : µ ≥ 52

5250

when in fact the true mean is µ = 50

α

Power = 1-β

Type II Error

� Suppose we do not reject H0: µ ≥ 52 when in fact the true mean is µ = 50

This is the true This is the range of X where H is not rejected



5250

This is the true distribution of X if µµµµ = 50

H0 is not rejected

Prob. of type II error = β

Type II Error

� Suppose we do not reject H0: µ ≥ 52 when in fact the true mean is µ = 50

Here, β = P( X ≥ cutoff ) if µ = 50

(continued)



α

5250

β

Here, β = P( X ≥ cutoff ) if µ = 50

� Suppose n = 64 , σ = 6 , and α = .05

Calculating β

50.76664

61.64552

n

σZµXcutoff =−=−== αα

(for H0 : µ ≥ 52)



α

5250

So β = P( x ≥ 50.766 ) if µ = 50

50.766

0.15390.84611.01.02)P(Z

646

5050.766ZP50)µ|50.766XP( =−=≥=

−

≥==≥

� Suppose n = 64 , σ = 6 , and α = 0.05

Calculating β and Power of the test

(continued)



64

5250

Probability of type II error:

β = 0.1539

Power

= 1 - β= 0.8461

50.766The probability of correctly rejecting a false null hypothesis is 0.8641

Power of the Test

� Conclusions regarding the power of the test:

1. A one-tail test is more powerful than a two-tail test

2. An increase in the level of significance (α) results in 2. An increase in the level of significance (α) results in an increase in power

3. An increase in the sample size results in an increase in power

Two-Sample Tests

Two-Sample Tests

Population Population Population Population

Population Means,

Independent Samples

Population Means, Related Samples

Population Variances

Group 1 vs. Group 2

Same group before vs. after treatment

Variance 1 vs.Variance 2

Examples:

Population Proportions

Proportion 1 vs. Proportion 2

Difference Between Two Means

Population means, independent

samples

Goal: Test hypothesis or form a confidence interval for the difference between two population means, µ – µ

*population means, µ1 – µ2

The point estimate for the difference is

X1 – X2

σ1 and σ2 unknown, assumed equal

σ1 and σ2 unknown, not assumed equal

Difference Between Two Means: Independent Samples


samples*

� Different data sources� Unrelated

� Independent� Sample selected from one

population has no effect on the sample selected from the other

Use Sp to estimate unknown σ. Use a Pooled-Variance t test.



Use S1 and S2 to estimate unknown σ1 and σ2. Use a Separate-variance t test

sample selected from the other population

Hypothesis Tests forTwo Population Means

Lower-tail test:

H : µ ≥≥≥≥ µ

Upper-tail test:

H : µ ≤ µ

Two-tail test:

H : µ = µ

Two Population Means, Independent Samples

H0: µ1 ≥≥≥≥ µ2

H1: µ1 < µ2

i.e.,

H0: µ1 – µ2 ≥≥≥≥ 0H1: µ1 – µ2 < 0

H0: µ1 ≤ µ2

H1: µ1 > µ2

i.e.,

H0: µ1 – µ2 ≤ 0H1: µ1 – µ2 > 0

H0: µ1 = µ2

H1: µ1 ≠ µ2

i.e.,

H0: µ1 – µ2 = 0H1: µ1 – µ2 ≠ 0

Two Population Means, Independent Samples

Lower-tail test:

H0: µ1 – µ2 ≥≥≥≥ 0H1: µ1 – µ2 < 0

Upper-tail test:

H0: µ1 – µ2 ≤ 0H1: µ1 – µ2 > 0

Two-tail test:

H0: µ1 – µ2 = 0H1: µ1 – µ2 ≠ 0

Hypothesis tests for µ1 – µ2

1 1 2 1 1 2 1 1 2

αααα αααα/2 αααα/2αααα

-tα -tα/2tα tα/2

Reject H0 if tSTAT < -tα Reject H0 if tSTAT > tα Reject H0 if tSTAT < -tα/2or tSTAT > tα/2


samples

Hypothesis tests for µ1 - µ2 with σ1

and σ2 unknown and assumed equal

Assumptions:

� Samples are randomly andindependently drawn

� Populations are normallydistributed or both samplesizes are at least 30

� Population variances areunknown but assumed equal

*σ1 and σ2 unknown, assumed equal



samples

• The pooled variance is:

• The test statistic is:

(continued)

( ) ( )1)n(n

S1nS1nS

21

2

22

2

112p

−+−

−+−=

()1

*


and σ2 unknown and assumed equal

• The test statistic is:

• Where tSTAT has d.f. = (n1 + n2 – 2)



( ) ( )

+

−−−=

21

2p

2121STAT

n

1

n

1S

µµXXt


samplesThe confidence interval for

µ – µ is:

Confidence interval for µ1 - µ2 with σ1and σ2 unknown and assumed equal

( )

+±−

21

2p/221

n

1

n

1SXX αt

µ1 – µ2 is:

Where tα/2 has d.f. = n1 + n2 – 2



Pooled-Variance t Test Example

You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE & NASDAQ? You collect the following data:

NYSE NASDAQNumber 21 25Number 21 25Sample mean 3.27 2.53Sample std dev 1.30 1.16

Assuming both populations are approximately normal with equal variances, isthere a difference in meanyield (α = 0.05)?

Pooled-Variance t Test Example: Calculating the Test Statistic

( ) ( ) ( )µµXX −−−−−


(continued)

H0: µ1 - µ2 = 0 i.e. (µ1 = µ2)H1: µ1 - µ2 ≠ 0 i.e. (µ1 ≠ µ2)

( ) ( ) ( ) ( )1.5021

1)25(1)-(21

1.161251.30121

1)n()1(n

S1nS1nS

22

21

2

22

2

112p =

−+

−+−=

−+−

−+−=

( ) ( ) ( )2.040

25

1

21

15021.1

02.533.27

n

1

n

1S

µµXXt

21

2p

2121=

+

−−=

+

−−−=

Pooled-Variance t Test Example: Hypothesis Test Solution

H0: µ1 - µ2 = 0 i.e. (µ1 = µ2)

H1: µ1 - µ2 ≠ 0 i.e. (µ1 ≠ µ2)

αααα = 0.05

df = 21 + 25 - 2 = 44 t0 2.0154-2.0154

.025

Reject H0 Reject H0

.025

Critical Values: t = ± 2.0154

Test Statistic: Decision:

Conclusion:

Reject H0 at αααα = 0.05

There is evidence of a difference in means.

t

2.040

2.040

25

1

21

15021.1

2.533.27t =

+

−=

Pooled-Variance t Test Example: Confidence Interval for µ1 - µ2

Since we rejected H0 can we be 95% confident that µNYSE

> µNASDAQ?

95% Confidence Interval for µNYSE - µNASDAQ95% Confidence Interval for µNYSE - µNASDAQ

Since 0 is less than the entire interval, we can be 95% confident that µNYSE > µNASDAQ

(((( )))) )471.1,009.0(3628.00154.274.0 n

1

n

1SXX

21

2

p/221 ====××××±±±±====

++++±±±±−−−− ααααt


samples


and σ2 unknown, not assumed equal

Assumptions:

� Samples are randomly andindependently drawn

� Populations are normallydistributed or both samplesizes are at least 30

� Population variances areunknown and cannot beassumed to be equal*



Related PopulationsThe Paired Difference Test

Tests Means of 2 Related Populations

� Paired or matched samples

� Repeated measures (before/after)

� Use difference between paired values:

Related samples

� Eliminates Variation Among Subjects

� Assumptions:

� Both Populations Are Normally Distributed

� Or, if not Normal, use large samples

Di = X1i - X2i

Related PopulationsThe Paired Difference Test

The ith paired difference is Di , where

Related samples

Di = X1i - X2i

The point estimate for the Dn

∑

(continued)

The point estimate for the paired difference population mean µD is D : n

D

D 1ii∑

==

n is the number of pairs in the paired sample

1n

)D(D

S

n

1i

2i

D−

−

=∑

=

The sample standard deviation is SD

� The test statistic for µD is:Paired

samples

µDt D−

=

The Paired Difference Test:Finding tSTAT

n

S

µDt

DSTAT

D−=

� Where tSTAT has n - 1 d.f.

Lower-tail test:

H0: µD ≥≥≥≥ 0H : µ < 0

Upper-tail test:

H0: µD ≤ 0H : µ > 0

Two-tail test:

H0: µD = 0H : µ ≠ 0

Paired Samples

The Paired Difference Test: Possible Hypotheses

H1: µD < 0 H1: µD > 0 H1: µD ≠ 0


-tα -tα/2tα tα/2

Reject H0 if tSTAT < -tα Reject H0 if tSTAT > tα Reject H0 if tSTAT < -tα/2or tSTAT > tα/2

Where tSTAT has n - 1 d.f.

The confidence interval for µD isPaired samples

SD2/αtD ±

The Paired Difference Confidence Interval

1n

)D(D

S

n

1i

2i

D−

−

=∑

=

n2/αtD ±

where

� Assume you send your salespeople to a “customer service” training workshop. Has the training made a difference in the number of complaints? You collect the following data:

Paired Difference Test: Example

Number of Complaints: (2) - (1) Σ DNumber of Complaints: (2) - (1)Salesperson Before (1) After (2) Difference, Di

C.B. 6 4 - 2

T.F. 20 6 -14

M.H. 3 2 - 1

R.K. 0 0 0

M.O. 4 0 - 4

-21

D =Σ Di

n

5 .6 7

1n

)D(DS

2i

D

=

−

−=

∑

= -4.2

� Has the training made a difference in the number of complaints (at the 0.01 level)?

H0: µD = 0H1: µD ≠ 0

Reject

αααα/2

Paired Difference Test: Solution

Reject

αααα/2

- 4.2D =

1 .6 655 .6 7 /

04 .2

n/S

µt

D

S T A TD −=

−−=

−=

D

Test Statistic:

t0.005 = ± 4.604d.f. = n - 1 = 4

αααα/2

- 4.604 4.604

Decision: Do not reject H0

(tstat is not in the reject region)

Conclusion: There is not a significant change in the number of complaints.

αααα/2

- 1.66αααα = .01

Two Population Proportions

Goal: test a hypothesis or form a confidence interval for the difference between two population proportions,

π1 – π2

Population proportions

1 2

The point estimate for the difference is

Assumptions:

n1 π1 ≥ 5 , n1(1- π1) ≥ 5

n2 π2 ≥ 5 , n2(1- π2) ≥ 5

21 pp −



The pooled estimate for the overall proportion is:

In the null hypothesis we assume the

null hypothesis is true, so we assume π1

= π2 and pool the two sample estimates

21

21

nn

XXp

+

+=

overall proportion is:

where X1 and X2 are the number of items of interest in samples 1 and 2



( ) ( )−−− pp ππ

The test statistic for

π1 – π2 is a Z statistic:

(continued)

( ) ( )

+−

−−−=

21

2121STAT

n

1

n

1)p(1p

ppZ

ππ

2

22

1

11

21

21

n

Xp ,

n

Xp ,

nn

XXp ==

+

+=where

Hypothesis Tests forTwo Population Proportions


Lower-tail test:

H : π ≥≥≥≥ π

Upper-tail test:

H : π ≤ π

Two-tail test:

H : π = πH0: π1 ≥≥≥≥ π2

H1: π1 < π2

i.e.,

H0: π1 – π2 ≥≥≥≥ 0H1: π1 – π2 < 0

H0: π1 ≤ π2

H1: π1 > π2

i.e.,

H0: π1 – π2 ≤ 0H1: π1 – π2 > 0

H0: π1 = π2

H1: π1 ≠ π2

i.e.,

H0: π1 – π2 = 0H1: π1 – π2 ≠ 0

Hypothesis Tests forTwo Population Proportions


Lower-tail test:

H0: π1 – π2 ≥≥≥≥ 0H1: π1 – π2 < 0

Upper-tail test:

H0: π1 – π2 ≤ 0H1: π1 – π2 > 0

Two-tail test:

H0: π1 – π2 = 0H1: π1 – π2 ≠ 0

(continued)

1 1 2 1 1 2 1 1 2


-zα -zα/2zα zα/2

Reject H0 if ZSTAT < -Zα Reject H0 if ZSTAT > Zα Reject H0 if ZSTAT < -Zα/2or ZSTAT > Zα/2

Hypothesis Test Example: Two population Proportions

Is there a significant difference between the proportion of men and the proportion of women who will vote Yes on Proposition A?

� In a random sample, 36 of 72 men and 35 of 50 women indicated they would vote Yes

� Test at the .05 level of significance

� The hypothesis test is:H0: π1 – π2 = 0 (the two proportions are equal)

H1: π1 – π2 ≠ 0 (there is a significant difference between proportions)

The sample proportions are:


(continued)

� The sample proportions are:

� Men: p1 = 36/72 = 0.50

� Women: p2 = 35/50 = 0.70

.582122

71

5072

3536

nn

XXp

21

21 ========++++

++++====

++++

++++====

� The pooled estimate for the overall proportion is:

The test statistic for π1 – π2 is:


(continued)

.025

-1.96 1.96

.025

(((( )))) (((( ))))

11p(1p

ppz 2121

STAT

++++−−−−

−−−−−−−−−−−−====

)

ππππππππ

Reject H0 Reject H0

-1.96 1.96-2.20

Decision: Do not reject H0

Conclusion: There is not significant evidence of a difference in proportions who will vote yes between men and women.

(((( )))) (((( ))))2.20

50

1

72

1.582)(1.582

0.70.50

nnp(1p

21

−−−−====

++++−−−−

−−−−−−−−====

++++−−−− )

Critical Values = ±1.96For αααα = .05

Confidence Interval forTwo Population Proportions


The confidence interval for

π1 – π2 is:

( )2

22

1

11/221

n

)p(1p

n

)p(1pZpp

−+

−±− α

Testing for the Ratio Of Two Population Variances

Tests for TwoPopulation Variances

H0: σ12 = σ2

2

H1: σ12 ≠ σ2

2

H0: σ12 ≤ σ2

2

*Hypotheses FSTAT

S12 / S2

2

F test statistic

H0: σ1 ≤ σ2

H1: σ12 > σ2

2

S12 = Variance of sample 1 (the larger sample variance)

n1 = sample size of sample 1

S22 = Variance of sample 2 (the smaller sample variance)

n2 = sample size of sample 2

n1 –1 = numerator degrees of freedom

n2 – 1 = denominator degrees of freedom

Where:

� The F critical value is found from the F table

� There are two degrees of freedom required: numerator

and denominator

� The larger sample variance is always the numerator

The F Distribution

The larger sample variance is always the numerator

� When

� In the F table,

� numerator degrees of freedom determine the column

� denominator degrees of freedom determine the row

df1 = n1 – 1 ; df2 = n2 – 12

2

2

1

S

SFSTAT ====

Finding the Rejection Region

H0: σ12 = σ2

2

H1: σ12 ≠ σ2

2

H0: σ12 ≤ σ2

2

H1: σ12 > σ2

2

α/2

F0

α

FαReject H0Do not

reject H0

Reject H0 if FSTAT > Fα

F

0

α/2

Reject H0Do not reject H0 Fα/2

Reject H0 if FSTAT > Fα/2

F Test: An Example

You are a financial analyst for a brokerage firm. You want to compare dividend yields between stocks listed

on the NYSE & NASDAQ. You collect the following data:

NYSE NASDAQNYSE NASDAQNumber 21 25Mean 3.27 2.53Std dev 1.30 1.16

Is there a difference in the

variances between the NYSE

& NASDAQ at the αααα = 0.05 level?

F Test: Example Solution

� Form the hypothesis test:

H0: σ21 = σ2

2 (there is no difference between variances)

H1: σ21 ≠ σ2

2 (there is a difference between variances)

� Find the F critical value for αααα = 0.05:Find the F critical value for αααα = 0.05:

� Numerator d.f. = n1 – 1 = 21 –1 =20

� Denominator d.f. = n2 – 1 = 25 –1 = 24

� Fα/2 = F.025, 20, 24 = 2.33

� The test statistic is:

256.116.1

30.12

2

2

2

2

1 ===S

SFSTAT

α/2 = .025

H0: σ12 = σ2

2

H1: σ12 ≠ σ2

2

F Test: Example Solution

(continued)

0

α/2 = .025

F0.025=2.33Reject H0Do not

reject H0� FSTAT = 1.256 is not in the rejection

region, so we do not reject H0

� Conclusion: There is not sufficient evidence of a difference in variances at α = .05

F

فصل هفتمفصل هفتم

توزیع خی دوتوزیع خی دو

Chi-Square Test for a Variance or Standard Deviation

� A χ2 test statistic is used to test whether or not the population variance or standard deviation is equal to a specified value:

22 1)S-(n

====χH0: σ

2 = σ02

Copyright ©2011 Pearson Education

Where n = sample size

S2 = sample variance

σ2 = hypothesized population variance

follows a chi-square distribution with d.f. = n - 1

2σ

====STATχ

2

STATχ

Ha: σ2 ≠ σ0

2

Reject H0 if > or if < 2

STATχχχχ 2

2/ααααχχχχ 2

STATχχχχ 2

2/1 ααααχχχχ −−−−

Chi-Square Test For A Variance: An Example

Suppose you have gathered a random sample of size 25 and

obtained a sample standard deviation of s = 7 and want to

do the following hypothesis test:

H0: σ2 = 81


H0: σ = 81

Ha: σ2 ≠ 81

185.1481

49*24

σ

1)S-(n2

22 ============STATχ

Since < 14.185 < you fail to reject H0401.122

975.0 ====χχχχ 364.392

025.0 ====χχχχ

Contingency Tables

Contingency Tables

� Useful in situations comparing multiple population proportions


� Used to classify sample observations according to two or more characteristics

� Also called a cross-classification table.

Contingency Table Example

Left-Handed vs. Gender

Dominant Hand: Left vs. Right

Gender: Male vs. Female


Gender: Male vs. Female

� 2 categories for each variable, so this

is called a 2 x 2 table

� Suppose we examine a sample of300 children

Contingency Table Example

Sample results organized in a contingency table:

(continued)

Gender

Hand Preference

Left Rightsample size = n = 300:


Gender Left Right

Female 12 108 120

Male 24 156 180

36 264 300

120 Females, 12 were left handed

180 Males, 24 were left handed

χ2 Test for the Difference Between Two Proportions

H0: π1 = π2 (Proportion of females who are left

handed is equal to the proportion of

males who are left handed)

H1: π1 ≠ π2 (The two proportions are not the same –

hand preference is not independent


� If H0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males

� The two proportions above should be the same as the proportion of left-handed people overall

hand preference is not independent

of gender)

The Chi-Square Test Statistic

∑−

=

cells

2

2)(

all e

eo

STAT f

ffχ

The Chi-square test statistic is:


� where:

fo = observed frequency in a particular cell

fe = expected frequency in a particular cell if H0 is true

(Assumed: each cell in the contingency table has expected

frequency of at least 5)

cells all e

freedom of degree 1 has case 2x 2 thefor 2

STATχ

Decision Rule

The test statistic approximately follows a chi-squared distribution with one degree of freedom

2

STATχ


χχχχ2222

χ2α

Decision Rule:If , reject H0, otherwise, do not reject H0

0

α


22

αSTAT χ χ >>>>

Computing the Average Proportion

Here:120 Females, 12

n

X

nn

XXp

21

21 =+

+=

The average proportion is:


Here:120 Females, 12 were left handed

180 Males, 24 were left handed

i.e., based on all 180 children the proportion of left handers is 0.12, that is, 12%

12.0300

36

180120

2412p ==

+

+=

Finding Expected Frequencies

� To obtain the expected frequency for left handed females, multiply the average proportion left handed (p) by the total number of females

� To obtain the expected frequency for left handed males,


� To obtain the expected frequency for left handed males, multiply the average proportion left handed (p) by the total number of males

If the two proportions are equal, then

P(Left Handed | Female) = P(Left Handed | Male) = .12

i.e., we would expect (.12)(120) = 14.4 females to be left handed(.12)(180) = 21.6 males to be left handed

Observed vs. Expected Frequencies

Gender

Hand Preference

Left Right


FemaleObserved = 12

Expected = 14.4

Observed = 108

Expected = 105.6120

MaleObserved = 24

Expected = 21.6

Observed = 156

Expected = 158.4180

36 264 300

Gender

Hand Preference

Left Right

FemaleObserved = 12

Expected = 14.4

Observed = 108

Expected = 105.6120

Observed = 24 Observed = 156



MaleObserved = 24

Expected = 21.6

Observed = 156

Expected = 158.4180

36 264 300

0.7576158.4

158.4)(156

21.6

21.6)(24

105.6

105.6)(108

14.4

14.4)(12

f

)f(fχ

2222

cells all e

2

eo2

=−

+−

+−

+−

=

−= ∑STAT


Decision Rule

Decision Rule:If > 3.841, reject H0, otherwise, do not reject H

3.841 d.f. 1 with ; 0.7576 is statistic test The 2

05.0

2 == χχSTAT

2

STATχ


otherwise, do not reject H0

Here, = 0.7576< = 3.841,

so we do not reject H0 and conclude that there is not sufficient evidence that the two proportions are different at α = 0.05

χχχχ2222

χ20.05 = 3.841

0

0.05


2

STATχ

2

05.0χ

� Extend the χ2 test to the case with more than two independent populations:

χ2 Test for Differences Among More Than Two Proportions

H : π = π = … = π


H0: π1 = π2 = … = πc

H1: Not all of the πj are equal (j = 1, 2, …, c)


∑−

=

cells

2

2)(

all e

eo

STAT f

ffχ



� Where:

fo = observed frequency in a particular cell of the 2 x c table




cells all e

freedom of degrees 1-c 1)-1)(c-(2 has case cx 2 thefor χ2 ====STAT

Computing the Overall Proportion

n

X

nnn

XXXp

c21

c21 =+++

+++=

L

LThe overall proportion is:

� Expected cell frequencies for the c categories


� Expected cell frequencies for the c categories are calculated as in the 2 x 2 case, and the decision rule is the same:

Where is from the chi-squared distribution with c – 1 degrees of freedom

Decision Rule:If , reject H0, otherwise, do not reject H0

22

αSTAT χ χ >>>>

2

αχ

Chi-Square Test Results

Chi-Square Test: Administrators, Students, Faculty

Admin Students Faculty Total

H0: π1 = π2 = π3

H1: Not all of the πj are equal (j = 1, 2, 3)


Admin Students Faculty Total

Favor 63 20 37 120

60 30 30

Oppose 37 30 13 80

40 20 20

Total 100 50 50 200

ObservedExpected

0

2

010

2 Hreject so 9.2103 79212 =>=.STAT

χ.χ

χ2 Test of Independence

� Similar to the χ2 test for equality of more than two proportions, but extends the concept to contingency tables with r rows and c columns


H0: The two categorical variables are independent

(i.e., there is no relationship between them)

H1: The two categorical variables are dependent

(i.e., there is a relationship between them)

χ2 Test of Independence

∑−

=

cells

2

2)(

all e

eo

STAT f

ffχ


(continued)


� where:

fo = observed frequency in a particular cell of the r x c table




cells all e

freedom of degrees 1)-1)(c-(r has case cr x for the χ 2

STAT

Expected Cell Frequencies

� Expected cell frequencies:

n

total columntotalrow fe

×=


n

Where:

row total = sum of all frequencies in the row

column total = sum of all frequencies in the column

n = overall sample size

Decision Rule

� The decision rule is

If , reject H0,

otherwise, do not reject H

22

αSTAT χ χ >>>>


Where is from the chi-squared distribution with (r – 1)(c – 1) degrees of freedom


2

αχ

Example

� The meal plan selected by 200 students is shown below:

ClassStanding

Number of meals per week

Total20/week 10/week none


Standing

Fresh. 24 32 14 70

Soph. 22 26 12 60

Junior 10 14 6 30

Senior 14 16 10 40

Total 70 88 42 200

Example

� The hypothesis to be tested is:

(continued)

H0: Meal plan and class standing are independent




H1: Meal plan and class standing are dependent

(i.e., there is a relationship between them)

ClassStanding

Number of meals per week

Total20/wk 10/wk none

Fresh. 24 32 14 70

Soph. 22 26 12 60 Number of meals per week

Observed:

Expected cell frequencies if H0 is true:

Example: Expected Cell Frequencies

(continued)


Junior 10 14 6 30

Senior 14 16 10 40

Total 70 88 42 200

ClassStanding

per week

Total20/wk 10/wk none

Fresh. 24.5 30.8 14.7 70

Soph. 21.0 26.4 12.6 60

Junior 10.5 13.2 6.3 30

Senior 14.0 17.6 8.4 40

Total 70 88 42 2005.10

200

7030

n

total columntotalrow fe

=×

=

×=

Example for one cell:

Example: The Test Statistic

� The test statistic value is:

cells

2

2

f

)ff(χ

all e

eo

STAT

−= ∑

(continued)


709048

4810

830

83032

524

52424 222

..

).(

.

).(

.

).(=

−++

−+

−= L

= 12.592 from the chi-squared distribution with (4 – 1)(3 – 1) = 6 degrees of freedom

2

050.χ

Example: Decision and Interpretation

(continued)

Decision Rule:If > 12.592, reject H0, otherwise, do not reject H

12.592 d.f. 6 with ; 7090 is statistictest The 2

050

2 ========.STAT

χ.χ

2

STATχ



Here, = 0.709 < = 12.592,

so do not reject H0

Conclusion: there is not sufficient evidence that meal plan and class standing are related at α = 0.05

χχχχ2222

χ20.05=12.592

0

0.05


2

STATχ 2

050.χ

فصل هشتمفصل هشتم

رگرسیون خطیرگرسیون خطی

Correlation vs. Regression

� A scatter plot can be used to show the relationship between two variables

� Correlation analysis is used to measure the strength of the association (linear relationship) between two variables


between two variables

� Correlation is only concerned with strength of the relationship

� No causal effect is implied with correlation

� Scatter plots were first presented in Ch. 2

� Correlation was first presented in Ch. 3

Introduction to Regression Analysis

� Regression analysis is used to:

� Predict the value of a dependent variable based on the value of at least one independent variable

� Explain the impact of changes in an independent


Explain the impact of changes in an independent variable on the dependent variable

Dependent variable: the variable we wish topredict or explain

Independent variable: the variable used to predict or explain the dependent variable

Simple Linear Regression Model

� Only one independent variable, X

� Relationship between X and Y is described by a linear function


� Changes in Y are assumed to be related to changes in X

Types of Relationships

Y Y

Linear relationships Curvilinear relationships


Y

X

X

Y

X

X


Y Y

Strong relationships Weak relationships

(continued)


Y

X

X

Y

X

X


Y

No relationship

(continued)


Y

X

X


Population Y intercept

Population SlopeCoefficient

Random Error term

Dependent

Independent Variable


ii10i εXββY ++=Linear component

Dependent Variable

Random Errorcomponent

(continued)

Y

Observed Value of Y for Xi

ii10i εXββY ++=

ε



Random Error for this Xi value

X

Predicted Value of Y for Xi

Xi

Slope = β1

Intercept = β0

εi

The simple linear regression equation provides an estimate of the population regression line

Simple Linear Regression Equation (Prediction Line)

Estimate of the regression

Estimate of the regression slope

Estimated (or predicted) Y value for


i10i XbbY +=

the regression intercept

regression slopeY value for observation i

Value of X for observation i

The Least Squares Method

b0 and b1 are obtained by finding the values of

that minimize the sum of the squared

differences between Y and :Y


2i10i

2ii ))Xb(b(Ymin)Y(Ymin +−=− ∑∑

Finding the Least Squares Equation

� The coefficients b0 and b1 , and other regression results in this chapter, will be found using Excel


found using Excel

Formulas are shown in the text for those

who are interested

Simple Linear Regression Example

� A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)


� A random sample of 10 houses is selected

� Dependent variable (Y) = house price in $1000s

� Independent variable (X) = square feet

Simple Linear Regression Example: Data

House Price in $1000s(Y)

Square Feet (X)

245 1400

312 1600

279 1700


308 1875

199 1100

219 1550

405 2350

324 2450

319 1425

255 1700

300

350

400

450

Ho

us

e P

ric

e (

$1

00

0s

)

Simple Linear Regression Example: Scatter Plot

House price model: Scatter Plot


0

50

100

150

200

250

300

0 500 1000 1500 2000 2500 3000

Ho

us

e P

ric

e (

$1

00

0s

)

Square Feet

350

400

450H

ou

se

Pri

ce

($

10

00

s)

Simple Linear Regression Example: Graphical Representation

House price model: Scatter Plot and Prediction Line

Slope


0

50

100

150

200

250

300

0 500 1000 1500 2000 2500 3000

Square Feet

Ho

us

e P

ric

e (

$1

00

0s

)

feet) (square 0.10977 98.24833 price house +=

Slope = 0.10977

Intercept = 98.248

Simple Linear Regression Example: Interpretation of bo

� b0 is the estimated average value of Y when the

value of X is zero (if X = 0 is in the range of



value of X is zero (if X = 0 is in the range of

observed X values)

� Because a house cannot have a square footage

of 0, b0 has no practical application

Simple Linear Regression Example: Interpreting b1

� b1 estimates the change in the average

value of Y as a result of a one-unit



value of Y as a result of a one-unit

increase in X

� Here, b1 = 0.10977 tells us that the mean value of a

house increases by .10977($1000) = $109.77, on

average, for each additional one square foot of size

(sq.ft.) 0.1098 98.25 price house +=

Predict the price for a house with 2000 square feet:

Simple Linear Regression Example: Making Predictions


317.85

0)0.1098(200 98.25

=

+=

The predicted price for a house with 2000 square feet is 317.85($1,000s) = $317,850

450

Simple Linear Regression Example: Making Predictions

� When using a regression model for prediction, only predict within the relevant range of data

Relevant range for interpolation


0

50

100

150

200

250

300

350

400

0 500 1000 1500 2000 2500 3000

Square Feet

Ho

us

e P

ric

e (

$1

00

0s

)

Do not try to extrapolate

beyond the range of observed X’s

Measures of Variation

� Total variation is made up of two parts:

SSE SSR SST +=Total Sum of Regression Sum Error Sum of


Total Sum of Squares

Regression Sum of Squares

Error Sum of Squares

∑ −= 2i )YY(SST ∑ −= 2

ii )YY(SSE∑ −= 2i )YY(SSR

where:

= Mean value of the dependent variable

Yi = Observed value of the dependent variable

= Predicted value of Y for the given Xi valueiY

Y

� SST = total sum of squares (Total Variation)

� Measures the variation of the Yi values around their mean Y

� SSR = regression sum of squares (Explained Variation)

(continued)



� SSR = regression sum of squares (Explained Variation)

� Variation attributable to the relationship between X and Y

� SSE = error sum of squares (Unexplained Variation)

� Variation in Y attributable to factors other than X

(continued)

Yi

SST = ∑∑∑∑(Yi - Y)2

SSE = ∑∑∑∑(Yi - Yi )2

∧∧∧∧

_

Y∧∧∧∧

Y



Xi

Y

X

SST = ∑∑∑∑(Yi - Y)

SSR = ∑∑∑∑(Yi - Y)2∧∧∧∧

_

_

Y_Y∧∧∧∧

� The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable

Coefficient of Determination, r2


� The coefficient of determination is also called r-squared and is denoted as r2

1r0 2 ≤≤note:

squares of sum total

squares of sum regression2 ==SST

SSRr

Examples of Approximate r2 Values

Y

r2 = 1

Perfect linear relationship


r2 = 1

X

Y

X

r2 = 1

Perfect linear relationship between X and Y:

100% of the variation in Y is explained by variation in X


Y

0 < r2 < 1

Weaker linear relationships


X

Y

X

Weaker linear relationships between X and Y:

Some but not all of the variation in Y is explained by variation in X


r2 = 0

No linear relationship

Y


No linear relationship between X and Y:

The value of Y does not depend on X. (None of the variation in Y is explained by variation in X)

Xr2 = 0

Standard Error of Estimate

� The standard deviation of the variation of observations around the regression line is estimated by

∑n


2

)ˆ(

2

1

2

−

−

=−

=

∑=

n

YY

n

SSES

n

i

ii

YX

WhereSSE = error sum of squares

n = sample size

Comparing Standard Errors

YY

SYX is a measure of the variation of observed Y values from the regression line


X XYX

S smallYX

S large

The magnitude of SYX should always be judged relative to the size of the Y values in the sample data

i.e., SYX = $41.33K is moderately small relative to house prices in the $200K - $400K range

Assumptions of RegressionL.I.N.E

� Linearity

� The relationship between X and Y is linear

� Independence of Errors

Error values are statistically independent


� Error values are statistically independent

� Normality of Error

� Error values are normally distributed for any given value of X

� Equal Variance (also called homoscedasticity)

� The probability distribution of the errors has constant variance

Residual Analysis

� The residual for observation i, ei, is the difference between its observed and predicted value

� Check the assumptions of regression by examining the residuals

iii YYe −=


residuals

� Examine for linearity assumption

� Evaluate independence assumption

� Evaluate normal distribution assumption

� Examine for constant variance for all levels of X (homoscedasticity)

� Graphical Analysis of Residuals

� Can plot residuals vs. X

Residual Analysis for Linearity

Y Y


Not Linear Linear�

x

resid

ua

ls

x

x x

resid

ua

ls

Residual Analysis for Independence

Not Independent

Independent

X

resid

ua

ls

�


X

Xresid

ua

ls

resid

ua

lsX

resid

ua

ls

Residual Analysis for Normality

Percent

When using a normal probability plot, normal errors will approximately display in a straight line

100


Residual-3 -2 -1 0 1 2 3

0

100

Residual Analysis for Equal Variance

Y Y


Non-constant variance � Constant variance

x x

x x

resid

ua

ls

resid

ua

ls

House Price Model Residual Plot

40

60

80

Simple Linear Regression Example: Excel Residual Output

RESIDUAL OUTPUT

Predicted House Price Residuals

1 251.92316 -6.923162

2 273.87671 38.12329


-60

-40

-20

0

20

40

0 1000 2000 3000

Square Feet

Re

sid

ua

ls

2 273.87671 38.12329

3 284.85348 -5.853484

4 304.06284 3.937162

5 218.99284 -19.99284

6 268.38832 -49.38832

7 356.20251 48.79749

8 367.17929 -43.17929

9 254.6674 64.33264

10 284.85348 -29.85348

Does not appear to violate

any regression assumptions

� Used when data are collected over time to detect if autocorrelation is present

� Autocorrelation exists if residuals in one

Measuring Autocorrelation:The Durbin-Watson Statistic


� Autocorrelation exists if residuals in one time period are related to residuals in another period

Autocorrelation

� Autocorrelation is correlation of the errors (residuals) over time

Time (t) Residual Plot

10

15

� Here, residuals show a


� Violates the regression assumption that residuals are random and independent

-15

-10

-5

0

5

10

0 2 4 6 8

Time (t)

Resid

uals

� Here, residuals show a cyclic pattern, not random. Cyclical patterns are a sign of positive autocorrelation

The Durbin-Watson Statistic

� The Durbin-Watson statistic is used to test for autocorrelation

H0: residuals are not correlated

H1: positive autocorrelation is present


∑

∑

=

=

−−

=n

1i

2i

n

2i

21ii

e

)ee(

D

� The possible range is 0 ≤ D ≤ 4

� D should be close to 2 if H0 is true

� D less than 2 may signal positive autocorrelation, D greater than 2 may signal negative autocorrelation


Testing for Positive Autocorrelation

� Calculate the Durbin-Watson test statistic = D (The Durbin-Watson Statistic can be found using Excel or Minitab)

H0: positive autocorrelation does not exist



(The Durbin-Watson Statistic can be found using Excel or Minitab)

Decision rule: reject H0 if D < dL

0 dU 2dL


� Find the values dL and dU from the Durbin-Watson table(for sample size n and number of independent variables k)

Inconclusive

� Suppose we have the following time series data:

140

160

Testing for Positive Autocorrelation (continued)


� Is there autocorrelation?

y = 30.65 + 4.7038x

R2 = 0.8976

0

20

40

60

80

100

120

0 5 10 15 20 25 30

Tim e

Sa

les

� Example with n = 25:

Durbin-Watson Calculations

Sum of Squared Difference of Residuals 3296.18

y = 30.65 + 4.7038x

R2 = 0.8976

40

60

80

100

120

140

160

Sa

les

Testing for Positive Autocorrelation

(continued)

Excel/PHStat output:


Difference of Residuals 3296.18

Sum of Squared Residuals 3279.98

Durbin-Watson Statistic 1.00494

0

20

40

0 5 10 15 20 25 30

Tim e

1.004943279.98

3296.18

e

)e(e

Dn

1i

2

i

n

2i

21ii

==

−

=

∑

∑

=

=

−

� Here, n = 25 and there is k = 1 one independent variable

� Using the Durbin-Watson table, dL = 1.29 and dU = 1.45

� D = 1.00494 < dL = 1.29, so reject H0 and conclude that significant positive autocorrelation exists

Testing for Positive Autocorrelation (continued)


significant positive autocorrelation exists

Decision: reject H0 since

D = 1.00494 < dL

0 dU=1.45 2dL=1.29

Reject H0 Do not reject H0Inconclusive

Inferences About the Slope

� The standard error of the regression slope coefficient (b1) is estimated by

== YXYX SSS


∑ −==

2i

YXYXb

)X(X

S

SSX

SS

1

where:

= Estimate of the standard error of the slope

= Standard error of the estimate

1bS

2n

SSESYX

−=

Inferences About the Slope: t Test

� t test for a population slope� Is there a linear relationship between X and Y?

� Null and alternative hypotheses� H : β = 0 (no linear relationship)


� H0: β1 = 0 (no linear relationship)

� H1: β1 ≠ 0 (linear relationship does exist)

� Test statistic

1b

11STAT

S

βbt

−=

2nd.f. −=

where:

b1 = regression slopecoefficient

β1 = hypothesized slope

Sb1 = standarderror of the slope

Inferences About the Slope: t Test Example

House Price in $1000s

(y)

Square Feet (x)

245 1400

312 1600

(sq.ft.) 0.1098 98.25 price house +=

Estimated Regression Equation:


279 1700

308 1875

199 1100

219 1550

405 2350

324 2450

319 1425

255 1700

The slope of this model is 0.1098

Is there a relationship between the

square footage of the house and its

sales price?


H0: β1 = 0

H1: β1 ≠ 0From Excel output:

Coefficients Standard Error t Stat P-value

Intercept 98.24833 58.03348 1.69296 0.12892

Square Feet 0.10977 0.03297 3.32938 0.01039


Square Feet 0.10977 0.03297 3.32938 0.01039

1bSb1

329383032970

0109770

S

βbt

1b

11

STAT.

.

.=

−=

−=


Test Statistic: tSTAT = 3.329

d.f. = 10- 2 = 8

H0: β1 = 0

H1: β1 ≠ 0


There is sufficient evidence

that square footage affects

house price

Decision: Reject H0

Reject H0Reject H0

α/2=.025

-tα/2

Do not reject H0

0tα/2

α/2=.025

-2.3060 2.3060 3.329

d.f. = 10- 2 = 8


H0: β1 = 0

H1: β1 ≠ 0

From Excel output:

Coefficients Standard Error t Stat P-value

Intercept 98.24833 58.03348 1.69296 0.12892


Intercept 98.24833 58.03348 1.69296 0.12892

Square Feet 0.10977 0.03297 3.32938 0.01039

p-value

There is sufficient evidence that

square footage affects house price.

Decision: Reject H0, since p-value < α

F Test for Significance

� F Test statistic:

where

MSE

MSRFSTAT =

SSRMSR =


where

1kn

SSEMSE

k

SSRMSR

−−=

=

where FSTAT follows an F distribution with k numerator and (n – k - 1)denominator degrees of freedom

(k = the number of independent variables in the regression model)

H0: β1 = 0

H1: β1 ≠ 0

α = .05

df1= 1 df2 = 8

Test Statistic:

Decision:

11.08FSTAT ==MSE

MSR

F Test for Significance(continued)


df1= 1 df2 = 8 Decision:

Conclusion:

Reject H0 at αααα = 0.05

There is sufficient evidence that house size affects selling price0

α = .05

F.05 = 5.32Reject H0Do not

reject H0

Critical Value:

Fαααα = 5.32

F

Confidence Interval Estimate for the Slope

Confidence Interval Estimate of the Slope:

Excel Printout for House Prices:

1b2/1 Sb αt± d.f. = n - 2


Excel Printout for House Prices:

At 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

Since the units of the house price variable is

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

Confidence Interval Estimate for the Slope (continued)


Since the units of the house price variable is $1000s, we are 95% confident that the average impact on sales price is between $33.74 and $185.80 per square foot of house size

This 95% confidence interval does not include 0.

Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance

t Test for a Correlation Coefficient

� Hypotheses

H0: ρ = 0 (no correlation between X and Y)

H1: ρ ≠ 0 (correlation exists)


� Test statistic

(with n – 2 degrees of freedom)

2n

r1

ρ-rt

2STAT

−

−

=

0 b if rr

0 b if rr

where

12

12

<−=

>+=

t-test For A Correlation Coefficient

Is there evidence of a linear relationship between square feet and house price at the .05 level of significance?

H0: ρ = 0 (No correlation)

(continued)


H1: ρ ≠ 0 (correlation exists)

α =.05 , df = 10 - 2 = 8

3.329

210

.7621

0.762

2n

r1

ρrt

22STAT =

−

−

−=

−

−

−=

t-test For A Correlation Coefficient

Conclusion:

Decision:Reject H0

3.329

210

.7621

0.762

2n

r1

ρrt

22STAT =

−

−

−=

−

−

−=

(continued)


Conclusion:There is evidence of a linear association at the 5% level of significance

Reject H0Reject H0

α/2=.025

-tα/2

Do not reject H0

0tα/2

α/2=.025

-2.3060 2.3060

3.329

d.f. = 10-2 = 8

2102n −−

Estimating Mean Values and Predicting Individual Values

YConfidence Interval for the mean of Y, given X

Goal: Form intervals around Y to express uncertainty about the value of Y for a given Xi

Y∧∧∧∧

Copyright ©2011 Pearson EducationXXi

Y = b0+b1Xi

∧∧∧∧

Y, given Xi

Prediction Interval

for an individual Y,given Xi

Confidence Interval for the Average Y, Given X

Confidence interval estimate for the mean value of Y given a particular Xi

XX|Y :µfor interval Confidencei

=


Size of interval varies according to distance away from mean, X

ihtY YX2/

XX|Y

Sˆ

i

α±

=

∑ −

−+=

−+=

2i

2i

2i

i)X(X

)X(X

n

1

SSX

)X(X

n

1h

Prediction Interval for an Individual Y, Given X

Confidence interval estimate for an Individual value of Y given a particular Xi

= :Yfor interval Confidence XXi


This extra term adds to the interval width to reflect the added uncertainty for an individual case

ihtY +±

=

1Sˆ YX2/

XXi

α

Estimation of Mean Values: Example

Find the 95% confidence interval for the mean price of 2,000 square-foot houses

∧∧∧∧

Confidence Interval Estimate for µY|X=X i


Predicted Price Yi = 317.85 ($1,000s)∧∧∧∧

37.12317.85)X(X

)X(X

n

1StY

2i

2i

YX0.025 ±=−

−+±

∑The confidence interval endpoints are 280.66 and 354.90, or from $280,660 to $354,900

Estimation of Individual Values: Example

Find the 95% prediction interval for an individual house with 2,000 square feet

∧∧∧∧

Prediction Interval Estimate for YX=X i


Predicted Price Yi = 317.85 ($1,000s)∧∧∧∧

102.28317.85)X(X

)X(X

n

11StY

2i

2i

YX0.025 ±=−

−++±

∑The prediction interval endpoints are 215.50 and 420.07, or from $215,500 to $420,070

Finding Confidence and Prediction Intervals in Excel

� From Excel, use

PHStat | regression | simple linear regression …


� Check the

“confidence and prediction interval for X=”

box and enter the X-value and confidence level desired

فصل نهمفصل نهم

پیش بینیپیش بینی

The Importance of Forecasting

� Governments forecast unemployment rates, interest rates, and expected revenues from income taxes for policy purposes

� Marketing executives forecast demand, sales, and


� Marketing executives forecast demand, sales, and consumer preferences for strategic planning

� College administrators forecast enrollments to plan for facilities and for faculty recruitment

� Retail stores forecast demand to control inventory levels, hire employees and provide training

Common Approaches to Forecasting

Common Approaches to Forecasting

Quantitative forecasting Qualitative forecasting


� Used when historical data are unavailable

� Considered highly subjective and judgmental

Causal

Quantitative forecasting methods

Qualitative forecasting methods

Time Series

� Use past data to predict future values

Time-Series Data

� Numerical data obtained at regular time intervals

� The time intervals can be annually, quarterly, monthly, weekly, daily, hourly, etc.


monthly, weekly, daily, hourly, etc.

� Example:

Year: 2005 2006 2007 2008 2009

Sales: 75.3 74.2 78.5 79.7 80.2

Time-Series Plot

� the vertical axis U.S. Inflation Rate

A time-series plot is a two-dimensional plot of time series data


the vertical axis measures the variable of interest

� the horizontal axis corresponds to the time periods

U.S. Inflation Rate

0.00

2.00

4.00

6.00

8.00

10.00

12.00

14.00

16.00

19

75

19

77

19

79

19

81

19

83

19

85

19

87

19

89

19

91

19

93

19

95

19

97

19

99

20

01

Year

Infl

ati

on

Ra

te (

%)

Time-Series Components

Time Series

Cyclical Component

Irregular Component

Trend Component

Seasonal Component


Component ComponentComponent Component

Overall, persistent, long-term movement

Regular periodic fluctuations,

usually within a 12-month period

Repeating swings or

movements over more than one

year

Erratic or residual

fluctuations

Trend Component

� Long-run increase or decrease over time (overall upward or downward movement)

� Data taken over a long period of time


Sales

Time

Trend Component

� Trend can be upward or downward

� Trend can be linear or non-linear

Sales Sales

(continued)


Downward linear trend

Sales

Time Upward nonlinear trend

Sales

Time

Seasonal Component

� Short-term regular wave-like patterns

� Observed within 1 year

� Often monthly or quarterly


Sales

Time (Quarterly)

Winter

Spring

Summer

Fall

Winter

Spring

Summer

Fall

Cyclical Component

� Long-term wave-like patterns

� Regularly occur but may vary in length

� Often measured peak to peak or trough to trough


trough

Sales

1 Cycle

Year

Irregular Component

� Unpredictable, random, “residual” fluctuations

� Due to random variations of

� Nature

� Accidents or unusual events


� Accidents or unusual events

� “Noise” in the time series

Does Your Time Series Have A Trend Component?

� A time series plot should help you to answer this question.

Often it helps if you “smooth” the time series


� Often it helps if you “smooth” the time series data to help answer this question.

� Two popular smoothing methods are moving averages and exponential smoothing.

Smoothing Methods

� Moving Averages

� Calculate moving averages to get an overall impression of the pattern of movement over time

� Averages of consecutive time series values for a


� Averages of consecutive time series values for a chosen period of length L

� Exponential Smoothing

� A weighted moving average

Moving Averages

� Used for smoothing

� A series of arithmetic means over time

� Result dependent upon choice of L (length of period for computing means)


period for computing means)

� Last moving average of length L can be extrapolated one period into future for a short term forecast

� Examples: � For a 5 year moving average, L = 5

� For a 7 year moving average, L = 7

� Etc.

Moving Averages

� Example: Five-year moving average

� First average:

(continued)

5

YYYYYMA(5) 54321 ++++

=


� Second average:

� etc.

5

5

YYYYYMA(5) 65432 ++++

=

Example: Annual Data

Year Sales

1

2

3

4

23

40

25

27

Annual Sales

50

60


4

5

6

7

8

9

10

11

etc…

27

32

48

33

37

37

50

40

etc…

0

10

20

30

40

1 2 3 4 5 6 7 8 9 10 11

Year

Sa

les

Calculating Moving Averages

Year Sales

1 23

2 40

3 25

Average Year

5-Year Moving Average

3 29.4

4 34.4

5 33.0

5

543213

++++=

322725402329.4

++++=


� Each moving average is for a consecutive block of 5 years

3 25

4 27

5 32

6 48

7 33

8 37

9 37

10 50

11 40

5 33.0

6 35.4

7 37.4

8 41.0

9 39.4

… …

5

322725402329.4

++++=

etc…

Annual vs. 5-Year Moving Average

50

60

Annual vs. Moving Average

The 5-year moving average smoothes the data and makes it easier to


0

10

20

30

40

1 2 3 4 5 6 7 8 9 10 11

Year

Sa

les

Annual 5-Year Moving Average

makes it easier to see the underlying trend

Exponential Smoothing

� Used for smoothing and short term forecasting (one period into the future)


� A weighted moving average

� Weights decline exponentially

� Most recent observation is given the highest weight

Exponential Smoothing

� The weight (smoothing coefficient) is W

� Subjectively chosen

� Ranges from 0 to 1

� Smaller W gives more smoothing, larger W gives

(continued)


Smaller W gives more smoothing, larger W gives less smoothing

� The weight is:

� Close to 0 for smoothing out unwanted cyclical and irregular components

� Close to 1 for forecasting

Exponential Smoothing Model

� Exponential smoothing model

11 YE =

E)W1(WYE −−+=


1iii E)W1(WYE −−+=

where:Ei = exponentially smoothed value for period i

Ei-1 = exponentially smoothed value already

computed for period i - 1Yi = observed value in period iW = weight (smoothing coefficient), 0 < W < 1

For i = 2, 3, 4, …

Exponential Smoothing Example

� Suppose we use weight W = 0.2

Time Period

(i)

Sales(Yi)

Forecast from prior

period (Ei-1)

Exponentially Smoothed Value for this period (Ei)

1

2

23

40

--

23

23

(.2)(40)+(.8)(23)=26.4

E1 = Y1

since no prior


2

3

4

5

6

7

8

9

10

etc.

40

25

27

32

48

33

37

37

50

etc.

23

26.4

26.12

26.296

27.437

31.549

31.840

32.872

33.697

etc.

(.2)(40)+(.8)(23)=26.4

(.2)(25)+(.8)(26.4)=26.12

(.2)(27)+(.8)(26.12)=26.296

(.2)(32)+(.8)(26.296)=27.437

(.2)(48)+(.8)(27.437)=31.549

(.2)(48)+(.8)(31.549)=31.840

(.2)(33)+(.8)(31.840)=32.872

(.2)(37)+(.8)(32.872)=33.697

(.2)(50)+(.8)(33.697)=36.958

etc.

1ii

i

E)W1(WY

E

−−+

=

prior information exists

Sales vs. Smoothed Sales

� Fluctuations have been smoothed

40

50

60


� NOTE: the smoothed value in this case is generally a little low, since the trend is upward sloping and the weighting factor is only .2

0

10

20

30

40

1 2 3 4 5 6 7 8 9 10Time Period

Sa

les

Sales Smoothed

Forecasting Time Period i + 1

� The smoothed value in the current period (i) is used as the forecast value for next period (i + 1) :


i1i EY =+

There Are Three Popular Methods For Trend-Based Forecasting

� Linear Trend Forecasting


� Nonlinear Trend Forecasting

� Exponential Trend Forecasting

Linear Trend Forecasting

Estimate a trend line using regression analysis

Year

Time Period

(X)Sales

(Y)

� Use time (X) as the independent variable:


(X) (Y)

2004

2005

2006

2007

2008

2009

0

1

2

3

4

5

20

40

30

50

70

65

XbbY 10 +=

In least squares linear, non-linear, andexponential modeling, time periods arenumbered starting with 0 and increasingby 1 for each time period.


The linear trend forecasting equation is:

Sales trend

Year

Time Period

(X)Sales

(Y)

2004 0 20

ii X 9.571421.905Y +=

(continued)


0

10

20

30

40

50

60

70

80

0 1 2 3 4 5 6

Year

sa

les

2004

2005

2006

2007

2008

2009

0

1

2

3

4

5

20

40

30

50

70

65


� Forecast for time period 6 (2010):

Year

Time Period

(X)Sales

(y)

2004 0 20

(continued)

Sales trend

79.33

(6) 9.571421.905Y

=

+=


2004

2005

2006

2007

2008

2009

2010

0

1

2

3

4

5

6

20

40

30

50

70

65

??

Sales trend

0

10

20

30

40

50

60

70

80

0 1 2 3 4 5 6

Year

sa

les

Nonlinear Trend Forecasting

� A nonlinear regression model can be used when

the time series exhibits a nonlinear trend

� Quadratic form is one type of a nonlinear model:


� Compare adj. r2 and standard error to that of

linear model to see if this is an improvement

� Can try other functional forms to get best fit

i2i2i10i XXY ε+β+β+β=

Exponential Trend Model

� Another nonlinear trend model:

i

X

10i εββY i=


� Transform to linear form:

)εlog()log(βX)βlog()log(Y i1i0i ++=

Exponential Trend Model

� Exponential trend forecasting equation:

i10i XbbYlog( +=)ˆ

where b = estimate of log(β )

(continued)


where b0 = estimate of log(β0)

b1 = estimate of log(β1)

Interpretation:

%100)1β( 1 ×− is the estimated annual compoundgrowth rate (in %)

Trend Model Selection Using Differences

� Use a linear trend model if the first differences are approximately constant

)YY()YY()Y(Y 1-nn2312 −==−=− L


� Use a quadratic trend model if the second differences are approximately constant

1-nn2312

)]YY()Y[(Y

)]YY()Y[(Y)]YY()Y[(Y

2-n1-n1-nn

23341223

−−−==

−−−=−−−

L

� Use an exponential trend model if the percentage differences are approximately constant

(continued)

Trend Model Selection Using Differences


%100Y

)Y(Y%100

Y

)Y(Y%100

Y

)Y(Y

1-n

1-nn

2

23

1

12 ×−

==×−

=×−

L

Autoregressive Modeling

� Used for forecasting

� Takes advantage of autocorrelation

� 1st order - correlation between consecutive values

2nd order - correlation between values 2 periods


ip-ip2-i21-i10i YAYAYAAY δ+++++= L

� 2nd order - correlation between values 2 periods apart

� pth order Autoregressive model:

Random Error

Autoregressive Model: Example

Year Units

The Office Concept Corp. has acquired a number of office units (in thousands of square feet) over the last eight years. Develop the second order Autoregressive model.


Year Units

02 4

03 304 205 3 06 2 07 2 08 409 6

Autoregressive Model: Example Solution

Year Yi Yi-1 Yi-2

02 4 -- --

03 3 4 --04 2 3 405 3 2 3

� Develop the 2nd order

table

� Use Excel to estimate a

regression model


05 3 2 306 2 3 207 2 2 308 4 2 209 6 4 2

Coefficients

Intercept 3.5

X Variable 1 0.8125

X Variable 2 -0.9375

Excel Output

regression model

2i1ii 0.9375Y0.8125Y3.5Y −− −+=

Autoregressive Model Example: Forecasting

Use the second-order equation to forecast number of units for 2010:


625.4

)0.9375(4)0.8125(63.5

)0.9375(Y)0.8125(Y3.5Y

0.9375Y0.8125Y3.5Y

200820092010

2i1ii

=

−+=

−+=

−+= −−

Autoregressive Modeling Steps

1.Choose p (note that df = n – 2p – 1)

2.Form a series of “lagged predictor” variables

Yi-1 , Yi-2 , … ,Yi-p


3.Use Excel or Minitab to run regression model using all p variables

4.Test significance of Ap

� If null hypothesis rejected, this model is selected

� If null hypothesis not rejected, decrease p by 1 and repeat

Choosing A Forecasting Model

� Perform a residual analysis

� Eliminate a model that shows a pattern or trend

� Measure magnitude of residual error using squared differences and select the model


squared differences and select the model with the smallest value

� Measure magnitude of residual error using absolute differences and select the model with the smallest value

� Use simplest model

� Principle of parsimony

Residual Analysis

T T

e e

0 0


Random errors

Trend not accounted for

Cyclical effects not accounted for

Seasonal effects not accounted for

T T

T T

e e

0 0

Measuring Errors

� Choose the model that gives the smallest measuring errors

� Mean Absolute Deviation (MAD)

� Sum of squared errors (SSE)


(MAD)

� Less sensitive to extreme observations

(SSE)

� Sensitive to outliers

∑=

−=n

1i

2ii )Y(YSSE

n

YY

MAD

n

1iii∑

=

−

=

Principal of Parsimony

� Suppose two or more models provide a good fit for the data

� Select the simplest model

� Simplest model types:


� Simplest model types:� Least-squares linear

� Least-squares quadratic

� 1st order autoregressive

� More complex types:� 2nd and 3rd order autoregressive

� Least-squares exponential

� Time series are often collected monthly or quarterly

� These time series often contain a trend component, a seasonal component, and the irregular component

Forecasting With Seasonal Data


irregular component

� Suppose the seasonality is quarterly

� Define three new dummy variables for quarters:

Q1 = 1 if first quarter, 0 otherwise

Q2 = 1 if second quarter, 0 otherwise

Q3 = 1 if third quarter, 0 otherwise

(Quarter 4 is the default if Q1 = Q2 = Q3 = 0)

فصل دهمفصل دهم


موزش نرم افزارموزش نرم افزارا ا

spssاجراي برنامه اجراي برنامه اجراي برنامه اجراي برنامه


منوي اصليمنوي اصليمنوي اصليمنوي اصلي - - - - 2222

� Spss با انتخاب يك . داراي يك منوي اصلي كه مشتمل بر يك سري دستور قابل اجرا است

ن دستور قابل اجراست.دستور با استفاده از موشواره، ا


File امكان File گزينه طريق از فايلها با كار�

كردن باز جديد، فايل ايجاد .است پذير

ذخيره ها، داده نمايش موجود، فايلهاي

و …و عمليات چاپ فايلها، خروج نهايتا

جمله از ،Exit دستور با spss از

.باشد مي منو اين قابليتهاي


.باشد مي منو اين قابليتهاي

Edit

جهت جستجو داده، جايگزيني، �

كپـي كردن داده ها، جا به جايــي

در فايلها از دستورات منوي

Edit استفاده مي شود .


Edit استفاده مي شود .

View

جهت حذف يا نمايش ميله ابزار،

خطوط زمينه در ويرايشگر

داده ها، تغيير قلم و عنوان

مقادير از اين منو استفاده


مقادير از اين منو استفاده

.مي شود

Data

تعريف جهت شكل مطابق�

نها، مقادير و متغيرها به رفتن ا

تنظيم ،)Case( خاص مورد

وزن و فايلها، تركيب ها، داده


وزن و فايلها، تركيب ها، داده

مورد اين از موردها، به دادن

. شود مي استفاده

Transform

متغيرهاي محاسبه و ايجاد براي�

مجدد، گذاري كد جديد،

مقاديرمفقود جايگزيني

)Missing( منو اين از …و


)Missing( منو اين از …و

.ميشود استفاده

Analyze

ماري 16گزينه از �15گزينه اين منو تمامي گزارشهاي ا

ماري توصيفي شامل جداول در مورد داده ها از ا

مار استنباطي توصيفي، ميانگين، انحراف معيار تا ا

شامل ضريب همبستگي، رگرسيون چند متغيره، و

…


شامل ضريب همبستگي، رگرسيون چند متغيره، و

. از طريق اين منو قابل اجرا است…

Graphs

منو اين از نمودارها انواع رسم جهت�

اي، ميله نمودارهاي .شود مي استفاده

....و پراكنش اي، دايره خطي، ستوني،


Utilities

جستجوي اطqعات درباره �

متغيرها و فايل ها در اين منو

تعريف سري . امكان پذير است

متغيرها نيز در اين منو انجام مي


متغيرها نيز در اين منو انجام مي

.شود

Window

براي Windowدستوراتز ا

كوچك كردن پنجره ويرايشگر

.داده ها استفاده مي شود


Help

انواع راهنمائيها براي كار در �

قسمتهاي مختلف نرم افزار

Spss از اين منو بدست مي ،


Spss از اين منو بدست مي ،

يد .ا

نوار ابزارنوار ابزارنوار ابزارنوار ابزار ----3333

مطابق شكل زير، ميله ابزار شامل دكمه هايــي است كه براي اجراي برخي �نها استفاده مي spssدستورات

جهت سرعت بخشيدن به عمليات از ا

.شود


ميله فرمولميله فرمولميله فرمولميله فرمول - - - - 4444

، ميله فرمول spssيكي ديگر از اجزاي پنجره ويرايشگر داده ها در �

است كه در قسمت زيرين ميله ابزار قرار دارد، سطري كه محتويات

در صورت پر بودن سلول، قسمت سمت . سلول فعال را نشان مي دهد

.می دهد راست محتويات سلول را نشان


نوار پيمايش پنجره نوار پيمايش پنجره نوار پيمايش پنجره نوار پيمايش پنجره - - - - 5555

درنوارهاي افقي وعمودي كه در جهت هاي راست وپائين صفحه ويرايشگر

داده ها قرار دارند ، مثلثهاي كوچكي ديده ميشود كه با حركت دادن

نها مي توان داده ها را حركت داد وقسمتهاي مورد نظر را موشواره روي ا

. روي صفحه نمايش مشاهده نمود


. روي صفحه نمايش مشاهده نمود

نوار نمايش داده ها ومتغيرهانوار نمايش داده ها ومتغيرهانوار نمايش داده ها ومتغيرهانوار نمايش داده ها ومتغيرها - - - - 6666


پنجره ويرايشگر داده ها


:استداراي دو كاربرگ SPSSويرايشگر داده ها در . �

نمايشگر اطqعات1.

نمايشگر متغيرها2.


نمايشگر متغيرها

ورود داده هاورود داده هاورود داده هاورود داده ها


تعريف متغيرها







در سمت راست سلول دكمه اي ديده کليکبا

با كليك روي اين دكمه كادر . مي شود

. جديدي ديده مي شود




Align




با . مي شودداده ها نمايانبا كليك روي دكمه كوچك گوشه پائين سمت چپ، صفحه نمايشگر�

: انتخاب اولين سلول، ورود داده ها شروع مي شود


وارد كردن داده هاوارد كردن داده هاوارد كردن داده هاوارد كردن داده ها


. اطqعات فرضي مربوط به متغير جنسيت مطابق شكل وارد شده است , درشكل زير�


:ويرايش داده ها ابعاد مختلف دارد

تغيير و اصqح داده ها 1.

اضافه كردن يك سطر 2.


اضافه كردن متغير جديد 3.

) مورد ( حذف سطر 4.

) متغير ( حذف ستون 5.

)find(جستجوی داده 6.

)sort(مرتب سازی داده ها 7.


جهت ايجاد متغير جديد مراحل زير را اجرا

: كنيد

دستور Transformـ از منوي 1

Compute راانتخاب و اجرا كنيد .

Compute) ) ) ) محاسبه داده ها محاسبه داده ها محاسبه داده ها محاسبه داده ها ( ( ( (


. راانتخاب و اجرا كنيد



شمارش داده ها شمارش داده ها شمارش داده ها شمارش داده ها



ادامه بحثادامه بحثادامه بحثادامه بحث

در سمت چپ كادر از . نام متغير جديد را تايپ كنيد Target Variableدر كادر - 3

منتقل Numeric Variablesليست متغيرها ، متغير مورد نظر را انتخاب و به كادر

.كنيد


Define گزينه روي - 4 Values شود مي ديده زير شكل . كنيد كليك :



به عنوان . مقداري را كه مي خواهيد شمارش شود را تايپ كنيد Valueدر كادر مقابل گزينه - 5

7مثال عدد



به كادر Addبا كليك روي دكمه 7عدد

Values to Count منتقل شده

مقابل 5است و مقدار بعدي يعني عدد

تايپ شده است Valueگزينه



Add دكمه فشار با و كنيد تايپ ، شود شمارش است قرار كه مقداري هر

.كنيد وارد شمارش كادر در

.كليك كنيد Continueجهت ادامه مراحل روي گزينه -6


.كليك كنيد Continueجهت ادامه مراحل روي گزينه -6

. كليك كنيد OKروي گزينه - 7


با انجام مراحل فوق متغير جديدي به متغيرها اضافه - 8

.شده است


متغير جديدي است كه نشان دهنده mزير ستون ,در شكل

. ميباشد 7و 5تعداد پاسخهاي هر پاسخگو به كد


متغير جديدي است mمتغير ◄كه جهت شمارش تعريف شده

.است

براي 0مطابق شكل ، عدد ◄


براي 0مطابق شكل ، عدد ◄نشان دهنده 1پاسخگوي شماره

مي 7و 5عدم انتخاب كدهاي باشد

جداول توافقيجداول توافقيجداول توافقيجداول توافقي

براي تهيه جداول فراواني دوبعدي از دستور

Crosstabs اين دستور براي . استفاده كنيد

جداول دو بعدي اسمي و رتبه اياسمي و رتبه اياسمي و رتبه اياسمي و رتبه ايداده هاي


جداول دو بعدي اسمي و رتبه اياسمي و رتبه اياسمي و رتبه اياسمي و رتبه ايداده هاي

.را ايجادمي كند

جهت استفاده از گزينه

Crosstabs :

دستور Analyzeاز منوي 1.


Descriptive Statistics را انتخاب و

.كنيدكليك


را Crosstabsزير دستور 1.

.انتخاب و اجرا كنيد


جداول توافقيجداول توافقيجداول توافقيجداول توافقي

Crosstabsپس از اجراي گزينه 1.

ديده Crosstabsكادر مكالمه

.مي شود

با انتخاب و كليك كردن روي نام 2.

نها را به كادرهاي Rowsمتغيرها، ا


با انتخاب و كليك كردن روي نام

نها را به كادرهاي و Rowsمتغيرها، ا

Columns منتقل نمائيد


.كليك كنيد OKروي گزينه . 3

در صورتي كه نياز به جداول سه �

بعدي داشته باشيد، متغير


مورد نظررا به كادر

Previous منتقل كنيد.

زمونهايزمونهايازمونهايازمونهاياt ا

زمون :وجود دارد tسه نوع ا

1 .t اي يك نمونه

2 .t دو گروه مستقل


2 .t دو گروه مستقل

3 .t زوجي يا دو گروه وابسته

زمون زمون ازمون ازمون ايك نمونه اي يك نمونه اي يك نمونه اي يك نمونه اي tا

زمون به اين سؤال پاسخ مي دهد كه ميانگين مشاهده شده �اين ا

اين . در مقايسه با مقدار واقعي تفاوت معناداري دارد يا خير


اين . در مقايسه با مقدار واقعي تفاوت معناداري دارد يا خير

زمون زمون ساده ترين ا

. مي باشد tا

زمون :اي تك نمونه tجهت انجام ا

Compareگزينه Analyzeاز منوي Means را

.كنيد انتخاب وكليك


.كنيد انتخاب وكليك

Tدستور دستور دستور دستور Test... Sampl–One راكليك

واجراكنيد

زمون زمون ازمون ازمون ايك نمونه اي يك نمونه اي يك نمونه اي يك نمونه اي tا



Test T Sampl–Oneپنجره �

. ديده مي شود

نظر را از ليست متغيرهاي كادر متغير مورد�


نظر را از ليست متغيرهاي كادر متغير مورد

Test(سمت چپ به كادر سمت راست

Variables (منتقل كنيد .


.خروجي ديده مي شود. كليك كنيد OKروي گزينه �

زمون قضاوت کنيد sigبا توجه به سطح معني داري �.درمورد ا


زمون زمون ازمون ازمون ادو گروه مستقلدو گروه مستقلدو گروه مستقلدو گروه مستقل tا

زمون غير وابسته نيز مي خوانندزمون را ا

در اين نوع . اين نوع ا

ماري مستقل، زمون تفاوت بين ميانگينهاي دو جامعه ا

ا

زمون قرار مي گيرد. مورد ا


زمون قرار مي گيرد. مورد ا


دو نمونه تصادفي از دو جامعه را با هم مقايسه مي كنيم تا

نها را معين كنيم. تفاوت يا عدم تفاوت ميانگينهاي ا


زمون :مستقل tجهت اجراي ا

.كليك كنيد Analyzeروي منوي . 1

.كليك كنيد Compare Meansروي گزينه . 2


.كليك كنيد Compare Meansروي گزينه . 2

Independent Samples Tمطابق شكل روي دستور . 3

tests كليك كنيد

زمون زمون ازمون ازمون ادو گروه مستقلدو گروه مستقلدو گروه مستقلدو گروه مستقل tا



– Independentكادر گـفتگوي. 4

Sample T Testمي شودمشاهده .


روي . فهرستي از متغير ها در كادر ديده مي شود

ن را درون كادر متغير وابسته كليك كنيد و ا

منتقل Test Variableمقابل با نام

كنيد

ادامه بحث

ادامه بحث


ادامه بحث


روي متغير مستقل كليك كنيد . 5

ن را به كادر و ا


Grouping Variable منتقل نمائيد.


عنوان مثال جنسيت به عنوان به

. متغير مستقل در نظر گرفته شد



Define Groupsروي دكمه

مطابق شكل كادر . كليك كنيد

ديده مي شود، اين كادر مشخص

مي كند كه كدام دو گروه در حال


مي كند كه كدام دو گروه در حال

. مقايسه شدن هستند


در مثال موجود متغير مستقل

جنسيت مي باشد كه .زم است

را به درون كادرهاي 2و1كدهاي

.مشخص وارد كنيد


.مشخص وارد كنيد

زمون . كليك كنيد OKروي گزينه . 5 . مشاهده مي شود tخروجي ا

مي بينيد شكل در كه همانگونه .كنيد كليك Continue دكمه روي .6است شده وارد پرانتز درون مقادير


Sig مقدارمقدارمقدارمقدار بهبهبهبه ،،،، گروهگروهگروهگروه دودودودو واريانسهايواريانسهايواريانسهايواريانسهاي برابري برابري برابري برابري بررسيبررسيبررسيبررسي برايبرايبرايبراي� ا ا ا زمونزمونزمونزمونا

زمون Sig مقدار اگرمي شودمي شودمي شودمي شود توجهتوجهتوجهتوجه لونلونلونلون 05/0 از كمتر لون ا

از بايد حالت دراين .نيستند برابر جامعه دو واريانسهاي باشد،


از بايد حالت دراين .نيستند برابر جامعه دو واريانسهاي باشد،

ماره هايكرد استفاده نابرابر واريانسهاي به مربوط ا

زمونزمونازمونازمونازوجيزوجيزوجيزوجي tا

زمون، �براي تشخيص . همبسته يا وابسته نيز مي گويند tبه اين ا

زمون استفاده و انجام تفاوت ميانگين دو گروه وابسته، ازاين ا

.مي شود


.مي شود


Compareگزينه Analyzeاز منوي- �1 Means را انتخاب وكليك كنيد


زمونزمونازمونازموناا t زوجيزوجيزوجيزوجي

:ديده مي شودزير كليك كنيد، پنجره Samples T test Pairedروي . 2



Pairedدو متغير موردنظر را به طور هم زمان انتخاب و به كادر . 3

Variables منتقل كنيد.

خروجي ديده مي شود. كليك كنيد OKروي دكمه . 4


خروجي ديده مي شود. كليك كنيد OKروي دكمه . 4

تحليل واريانستحليل واريانستحليل واريانستحليل واريانس

مقدار . مجذور انحراف اعداد از ميانگين را واريانس مي گويند�

.واريانس نشان دهنده پراكندگي داده ها از ميانگين است


Way - - - - Oneتحليل واريانس يك طرفه تحليل واريانس يك طرفه تحليل واريانس يك طرفه تحليل واريانس يك طرفه

ماري روشروشروشروش�ماري اماري اماري ان طيطيطيطي كهكهكهكه ا

نانانا متغيرمتغيرمتغيرمتغير رويرويرويروي مستقلمستقلمستقلمستقل متغيرمتغيرمتغيرمتغير يكيكيكيك تاثيرتاثيرتاثيرتاثير ا

گـفتهگـفتهگـفتهگـفته طرفهطرفهطرفهطرفه يكيكيكيك واريانسواريانسواريانسواريانس تحليلتحليلتحليلتحليل مي شود،مي شود،مي شود،مي شود، بررسيبررسيبررسيبررسي وابستهوابستهوابستهوابسته


گـفتهگـفتهگـفتهگـفته طرفهطرفهطرفهطرفه يكيكيكيك واريانسواريانسواريانسواريانس تحليلتحليلتحليلتحليل مي شود،مي شود،مي شود،مي شود، بررسيبررسيبررسيبررسي وابستهوابستهوابستهوابسته

....مي شودمي شودمي شودمي شود

ناليز واريانس يكطرفه :براي محاسبه ا

را Compare means، گزينه Analyzeاز منوي . 1

.كليك كنيد


.كليك كنيد

- One، گزينه اسqيد بعدمطابق . 2 Way AN0VA

.را انتخاب كنيد

ناليز واريانس يكطرفها


Oneگزينه اجراي با - Way

ANOVA، ديده مقابل پنجره

:مي شود



متغير هاي مورد نظر را از ليست سمت . �4

چپ به كادر سمت راست

Dependent List منتقل كنيد .

OKو سپس Contintueروي دكمه

خروجي ديده مي شود. كليك كنيد


و سپس روي دكمه

خروجي ديده مي شود. كليك كنيد


كليك Optionsروي گزينه . 5

- :ديده مي شود مقابلشكل . كنيدو سپس Contintueروي دكمه

OK خروجي ديده مي شود . كليك كنيد .


OK خروجي ديده مي شود . كليك كنيد .

ناليز واريانس دو طرفهناليز واريانس دو طرفهاناليز واريانس دو طرفهاناليز واريانس دو طرفها- Towا Way Analysis of

Variance((((

درتحليل واريانس دو طرفه، متغير مستقل تغييرات متغير وابسته را درتحليل واريانس دو طرفه، متغير مستقل تغييرات متغير وابسته را درتحليل واريانس دو طرفه، متغير مستقل تغييرات متغير وابسته را درتحليل واريانس دو طرفه، متغير مستقل تغييرات متغير وابسته را �

....تبيين مي كندتبيين مي كندتبيين مي كندتبيين مي كند


....تبيين مي كندتبيين مي كندتبيين مي كندتبيين مي كند

، متغير وابسته را ، متغير وابسته را ، متغير وابسته را ، متغير وابسته را اياياياي هنگامي كه دو متغير مستقل با مقياس اسمي يا رتبههنگامي كه دو متغير مستقل با مقياس اسمي يا رتبههنگامي كه دو متغير مستقل با مقياس اسمي يا رتبههنگامي كه دو متغير مستقل با مقياس اسمي يا رتبه�

ناليز واريانس دو طرفه تبيين نمايند، ازتبيين نمايند، ازتبيين نمايند، ازتبيين نمايند، ازناليز واريانس دو طرفهاناليز واريانس دو طرفهاناليز واريانس دو طرفهاجهت محاسبه روابط متغيرها، استفاده جهت محاسبه روابط متغيرها، استفاده جهت محاسبه روابط متغيرها، استفاده جهت محاسبه روابط متغيرها، استفاده ا


ناليز واريانس دو طرفه تبيين نمايند، ازتبيين نمايند، ازتبيين نمايند، ازتبيين نمايند، ازناليز واريانس دو طرفهاناليز واريانس دو طرفهاناليز واريانس دو طرفهاجهت محاسبه روابط متغيرها، استفاده جهت محاسبه روابط متغيرها، استفاده جهت محاسبه روابط متغيرها، استفاده جهت محاسبه روابط متغيرها، استفاده ا

....مي شودمي شودمي شودمي شود

ادامه بحث، گزينه Analyzeاز منوي - 1

General Linear Model

. كليك كنيد

Univariateمطابق شكل گزينه . 2


مطابق شكل گزينه . 2

: انتخاب كنيد

ناليز واريانس دو طرفهناليز واريانس دو طرفهاناليز واريانس دو طرفهاناليز واريانس دو طرفهاا


مطابق شكل متغير وابسته را به كادر . �4

Dependent Variable منتقل

كنيد و متغيرهاي مستقل را به كادر بعدي

.منتقل نماييد



كليك كنيد Modelروي گزينه مدل . 5

:ديده مي شود مقابلشكل



كليك كنيد full factorialروي گزينه . 6

.را فشار دهد continueو دكمه



كليك كنيد plotsروي گزينه . 7

. ديده مي شود مقابلشكل

متغيرهاي مورد نظر را به كادرهاي مقابل

.را كليك كنيد Addمنتقل كنيد و گزينه


.را كليك كنيد Addمنتقل كنيد و گزينه

.را فشاردهد continueدكمه


كليك optionsروي گزينه . 8

ديده مي شود مقابلكنيد شكل

.



كليك كنيد و دكمه Residual plotروي گزينه �

continue را فشار دهد، خروجي ديده مي شود.


ضريب همبستگي پيرسونضريب همبستگي پيرسونضريب همبستگي پيرسونضريب همبستگي پيرسون

:جهت محاسبه ضريب همبستگي پيرسون مراحل زير را دنبال كنيد

را انتخاب و كليك Correlate، گزينه Analyzeاز منوي . 1

.نمائيد


.نمائيد

:را كليك كنيد Bivariateمطابق شكل، دستور . 2

ضريب همبستگي پيرسونضريب همبستگي پيرسونضريب همبستگي پيرسونضريب همبستگي پيرسون



بعد از اجراي دستور . 3

Bivariate مقابل پنجره

:مشاهده مي شود



متغير هاي مورد نظر را به كادر .4

Variable منتقل كنيد.

Pearsonروي گزينه . 5


Pearsonروي گزينه . 5

.كليك كنيد


كليك Optionsروي دكمه . �6

:شکل مقابل ديده مي شود.كنيد


:شکل مقابل ديده مي شود.كنيد


گزينه هاي زير مجموعه . 7

Statistics مت دارqرا ع

)مطابق شكل(.كنيد



.را كليك كنيد Continueدكمه . 8

. را كليك كنيد OKدكمه . 9

.خروجي مشاهده مي شود


.خروجي مشاهده مي شود

زمون زمون ازمون ازمون اMann -Whitney U Testمن ويتني من ويتني من ويتني من ويتني Uا

زمون ناپارامتريك جهت متغيرهايــي با مقياس اسميزمون ناپارامتريك جهت متغيرهايــي با مقياس اسمييك ازمون ناپارامتريك جهت متغيرهايــي با مقياس اسمييك ازمون ناپارامتريك جهت متغيرهايــي با مقياس اسمييك ا ....رتبه اي مي باشدرتبه اي مي باشدرتبه اي مي باشدرتبه اي مي باشد - - - - يك ا

زمونزمونازمونازمونازمون معادلمعادلمعادلمعادل ويتنيويتنيويتنيويتني منمنمنمن ا

زمونازمونازمونا....مي باشدمي باشدمي باشدمي باشد مستقلمستقلمستقلمستقل گروهگروهگروهگروه دودودودو t پارامتريكپارامتريكپارامتريكپارامتريك ا


زمونزمونازمونازمونازمون معادلمعادلمعادلمعادل ويتنيويتنيويتنيويتني منمنمنمن ا

زمونازمونازمونا....مي باشدمي باشدمي باشدمي باشد مستقلمستقلمستقلمستقل گروهگروهگروهگروه دودودودو t پارامتريكپارامتريكپارامتريكپارامتريك ا

زمون اين كاربرد حسب بر را گروه دو است قرار كه هنگامي است ا

.كنند مقايسه هم با افراد رتبه

زمون :اجرا كنيد Uمراحل زيررا جهت ا

را NonParametric Testگزينه Analyzeاز منوي . 1

.كليك كنيد


.كليك كنيد

را انتخاب independent Sample 2مطابق شكل گزينه . 2

.كنيد

زمون زمون ازمون ازمون امن ويتنيمن ويتنيمن ويتنيمن ويتنيUا


ادامه بحث

2اجراي انتخاب وپس از . 3

independent Sample

. ديده مي شود مقابلپنجره را Mann Whitney Uگزينه.4


را Mann Whitney Uگزينه.4

.عqمت دار كنيد

ادامه بحث

Defineروي گزينه . 5

Groups كليك كنيد، كادر

و 1كدهاي گروه : ديده مي شود مقابل

. را مقابل كادرهاي هر كدام تايپ كنيد 2


. را مقابل كادرهاي هر كدام تايپ كنيد 2

زمون زمون ازمون ازمون امن ويتنيمن ويتنيمن ويتنيمن ويتنيUا

براي 2و 1در اينجا متغير جنسيت با توجه به كدگذاري اوليه داراي دو كد

قايان و خانمها مي باشد .ا

خروجي زير . را كليك كنيد OKو سپس Continueدكمه . 7


خروجي زير . را كليك كنيد OKو سپس Continueدكمه . 7

.ديده مي شود

زمون ويلكاكسون زمون ويلكاكسون ازمون ويلكاكسون ازمون ويلكاكسون اWilcoxon Testا

زمون ناپارامتريك جهت متغيرهايــي با مقياس �ويلكاكسون، ا

زمون، امكان مقايسه قبل و بعد . رتبه اي مي باشداز طريق اين ا

.يك وضعيت تحت تاثير يك متغير امكان پذير است


.يك وضعيت تحت تاثير يك متغير امكان پذير است

زمون پارامتريك �زمون ويلكاكسون، معادل ا

زمون پارامتريك ا

زمون ويلكاكسون، معادل ا





....زوجي مي باشدزوجي مي باشدزوجي مي باشدزوجي مي باشد tا


: براي اجراي ويلكاكسون

، گزينه Analyzeاز منوي . 1

Nonparametrice Test را

.كليك كنيد


.كليك كنيد

2 .- Related Sample 2 را انتخاب

:كنيد

ادامه بحث

:، كادر ديده مي شود2با اجراي گزينه . 3

متغيرهاي مورد نظر را به صورت جفتي به . 4

. منتقل كنيد Test Pairs Listكادر

انتقال به صورت تك متغيري امكان پذير


. منتقل كنيد كادر

انتقال به صورت تك متغيري امكان پذير

.نمي باشد

.را كليك كنيد Wilcoxonگزينه . 5

.خروجي ديده مي شود. را كليك كنيد OKدكمه . 6


.خروجي ديده مي شود. را كليك كنيد OKدكمه . 6

زمون كروسكال واليسزمون كروسكال واليسازمون كروسكال واليسازمون كروسكال واليساا

هنگامي كه داده ها در مقياس رتبه اي باشند، جهت مقايسه وضعيت يك �

زمون استفاده مي شودچند گروهچند گروهچند گروهچند گروهمتغير در .، از اين ا

زمون كروسكال واليس�زمون كروسكال واليسازمون كروسكال واليسازمون كروسكال واليسازمونهاي ,,,, ا

زمونهاي معادل تحليل واريانس يك طرفه در ازمونهاي معادل تحليل واريانس يك طرفه در ازمونهاي معادل تحليل واريانس يك طرفه در امعادل تحليل واريانس يك طرفه در ا


زمون كروسكال واليس�زمون كروسكال واليسازمون كروسكال واليسازمون كروسكال واليسازمونهاي ,,,, ا

زمونهاي معادل تحليل واريانس يك طرفه در ازمونهاي معادل تحليل واريانس يك طرفه در ازمونهاي معادل تحليل واريانس يك طرفه در امعادل تحليل واريانس يك طرفه در ا

....پارامتريك استپارامتريك استپارامتريك استپارامتريك است

ادامه بحث

زمون :براي اجراي ا

، گزينه Analyzeاز منوي . 1

NonParametrice Test را

.انتخاب و كليك كنيد



K Independentگزينه . 2

Samples راانتخاب كنيد:

ادامه بحث

Kپس از اجراي گزينه . 3

Independent Samples مقابل پنجره


مقابل پنجره

:ديده مي شود

ادامه بحث

متغير هاي مورد نظر را از كادر سمت چپ . 4

و test Variableبه كادرهاي

Grouping Variable منتقل

.كنيد


-Kruskalگزينه . 5

Wallish را مارک دار كنيد.

ادامه بحث

Defineروي گزينه . 6

Range كليك كنيد .

مقابل . ديده مي شود مقابلكادر

گزينه حداقل و حداكـثر، كدهاي


گزينه حداقل و حداكـثر، كدهاي

.مورد استفاده را تايپ كنيد

زمون كاي دوزمون كاي دوازمون كاي دوازمون كاي دواا

هنگامي كه داده هايــي با مقياس اسمي وجود دارد، يكي از معمول ترين �

زمون مي باشدزمونها، ا

.ا


ادامه بحث

زمون � :براي اجراي اين ا

، Analyzeاز منوي . 1

Descriptive Statisticsگزينه

.را انتخاب و كليك كنيد


.را انتخاب و كليك كنيد

ادامه بحث

را انتخاب و كليك Crosstabدستور . �با اجراي اين دستور شكل ديده . كنيد

:مي شود

متغيرهاي مورد نظر را به كادرهاي . �3Row وColumn )سطر و ستون (


Row وColumn )سطر و ستون ( .منتقل كنيد

ادامه بحث

كليك Statisticsروي گزينه . 4

: ديده مي شودمقابل پنجره. كنيد

- Chiروي گزينه . 5 Square


.كليك كنيد

ادامه بحث

. را انتخاب و كليك كنيد OKو سپس Continueگزينه . 6

.خروجي ديده مي شود


.خروجي ديده مي شود

زمون �يد، يعني به علت عدم X2ا

زمونهاي ناپارامتري به شمار مي ا

از ا

جهت گيري نمي تواند مشخص كند وضعيت كدام جنسيت بهتر است،

.صرفا متفاوت بودن وضعيت با توجه به نوع متغير مشخص مي شود


.صرفا متفاوت بودن وضعيت با توجه به نوع متغير مشخص مي شود

زمون �زمون كاربرد ازمون كاربرد ازمون كاربرد ايا رابطه بين X2كاربرد ا

يا رابطه بين اين است كه مشخص مي كند ايا رابطه بين اين است كه مشخص مي كند ايا رابطه بين اين است كه مشخص مي كند ااين است كه مشخص مي كند ا

....دو متغير كيفي تصادفي است يا واقعيدو متغير كيفي تصادفي است يا واقعيدو متغير كيفي تصادفي است يا واقعيدو متغير كيفي تصادفي است يا واقعي


زمون فريدمنزمون فريدمنازمون فريدمنازمون فريدمناا

زمون kهنگامي كه قرار است متغيرهايــي با مقياس رتبه اي در گروه وابسته ا

زمون استفاده شوند، جهت بررسي تفاوت در گروه هاي وابسته از اين ا

.مي شود


.مي شود

ادامه بحث

زمون :جهت استفاده از اين ا

.را انتخاب و كليك كنيد NonParametric Testگزينه Analyzeاز منوي . 1

. را كليك كنيد K Ralated Sampleگزينه . 2


. را كليك كنيد گزينه . 2

زمون فريدمنزمون فريدمنازمون فريدمنازمون فريدمناا


ادامه بحث

K Ralatedپس از اجراي گزينه . 3

Samplesكادر زير ديده مي شود ،:

متغيرهاي مورد نظر را از كادر سمت چپ به . 4

Test Variableكادر سمت راست


كادر سمت راست

.منتقل كنيد

.را عqمت دار كنيد Friedmanگزينه . 5

خروجي مشاهده. را كليك كنيد OKدكمه . 6

.مي شود


زمون كوكرانزمون كوكرانازمون كوكرانازمون كوكراناا

زمون �اگر متغيرهاي مورد بررسي داراي مقياس اسمي باشند، جهت ا

زمون كوكران استفاده كرد گروه وابسته مي توان از kتفاوت بين ا


زمون كوكران استفاده كرد گروه وابسته مي توان از kتفاوت بين ا

ادامه بحث

گزينه Analyzeاز منوي . 1

Nonparametric Test را


را K Related Samplesگزينه .2


را گزينه .2

:پنجره مشاهده مي شود. اجرا كنيد

ادامه بحث

متغيرهاي مورد نظر را از كادر سمت . �3

Test Variableچپ به كادر

.منتقل كنيد

را را را را ′′′′Q s Cochran گزينهگزينهگزينهگزينه. �4


را را را را گزينهگزينهگزينهگزينه. 4

....عqمت دار كنيدعqمت دار كنيدعqمت دار كنيدعqمت دار كنيد

ادامه بحث

.را كليك كنيد OKدكمه . �5

.خروجي مشاهده مي شود �


زمون كوكران متغير �زمون كوكران متغير توجه داشته باشيد براي استفاده از ازمون كوكران متغير توجه داشته باشيد براي استفاده از ازمون كوكران متغير توجه داشته باشيد براي استفاده از اتوجه داشته باشيد براي استفاده از ا

....مورد نظر بايستي دو بعدي باشدمورد نظر بايستي دو بعدي باشدمورد نظر بايستي دو بعدي باشدمورد نظر بايستي دو بعدي باشد


زمون مك نمار زمون مك نمار ازمون مك نمار ازمون مك نمار اMc.Nemar Testا

زمون مك نمار جهت مقايسه دو وضعيت كاربرد دارد� .ا

ن است كه متغير بايستي كيفي باشد و دو �زمون ا

ن است كه متغير بايستي كيفي باشد و دو شرط استفاده از اين ا

زمون ا


زمون ا


زمون ا

شرط استفاده از اين ا

اگر متغير دو مقوله اي نباشد، پيغام خطا ديده ....مقوله داشته باشدمقوله داشته باشدمقوله داشته باشدمقوله داشته باشد


اگر متغير دو مقوله اي نباشد، پيغام خطا ديده ....مقوله داشته باشدمقوله داشته باشدمقوله داشته باشدمقوله داشته باشد

.مي شود

ادامه بحث

Nonparametric، گزينه Analyzeاز منوي . �1 Test را كليك كنيد.

:ديده مي شود اسqید بعدپنجره . را اجرا كنيد ...Related Samples-2گزينه . �2


زمون مك نمار زمون مك نمار ازمون مك نمار ازمون مك نمار اMc.Nemar Testا


ادامه بحث

Testجفت متغير مورد نظر را به كادر . �3

Pair List منتقل كنيد.

.را كليك كنيد McNemarگزينه . �4

خروجي ديده . را كليك كنيد OKدكمه . �5


خروجي ديده . را كليك كنيد OKدكمه . �5

.مي شود

ضريب همبستگي اسپيرمنضريب همبستگي اسپيرمنضريب همبستگي اسپيرمنضريب همبستگي اسپيرمن

براي محاسبه همبستگي بين دو متغير بر حسب رتبه ها در اين دو �

متغير، به جاي استفاده از ضريب همبستگي پيرسون از ضريب


متغير، به جاي استفاده از ضريب همبستگي پيرسون از ضريب

.همبستگي اسپيرمن استفاده مي شود

ادامه بحث

گزينه Analyzeاز منوي . 1

Correlate را انتخاب وكليك كنيد.

کادر .را اجرا كنيد Bivariateدستور . 2

.مقابل ديده ميشود


.مقابل ديده ميشود

ضريب همبستگي اسپيرمنضريب همبستگي اسپيرمنضريب همبستگي اسپيرمنضريب همبستگي اسپيرمن

متغيرهاي مورد نظر را به كادر . �3

Variables منتقل كنيد .

را عqمت دار Spearmanگزينه . �4

. كنيد


. كنيد

خروجي . را كليك نمائيد OKدكمه .�5

مشاهده مي شود

موفق باشیدموفق باشید


موفق باشیدموفق باشید

Documents

cld.persiangig.comcld.persiangig.com/dl/8PxB5w/STS AMAR MBA 92.07 (1).pdf · Class Intervals and Class Boundaries If each class grouping has the same width Determine the width of