ISyE 6203: Variability Basics
John H. Vande Vate
Fall 2011
Agenda
• Forecasting
• Assignable-Cause vs Common-Cause Variation
• Review of Probability
• Review of Regression
• Forecasting
Forecasting
Forecasting is the effort to determine what we can about the future from the past.
We will focus on Quantitative Methods – i.e., not opinions, judgments, “markets”, etc.
Laws of Forecasting
• Law 1: Forecasts are wrong
• Law 2: Forecast Demand, not Sales
• Law 3: It is generally easier to forecast aggregate data than it is to forecast the details. (Big Idea)
• Law 4: It is generally easier to forecast a short time into the future than to forecast far into the future
• Law 5: Simpler forecasts are generally better forecasts
General Framework
• Future = f(Past) + Residual Error
• The Specifics
  – What aspects of the past are relevant
  – What form of f to use
• The Issues
  – Accuracy: Is E[Residual Error] ~ 0?
  – Precision: Is the Residual Error small?
  – Complexity and Cost!
Variability
Residual Error is the “noise” or unpredictable variation
• Our focus is on managing this
• What tools are available for managing unpredictable variation?
Types of Variability
• Predictable or Assignable Cause Variations
  – Weekends and holidays
  – Major scheduled events, promotions
  – Seasonality & Growth
  – Other causal relationships
• Unpredictable or Common Cause Variations
  – Inherent randomness
Types of Variability
• Predictable or Assignable Cause Variations
• Unpredictable or Common Cause Variations
• Avoid unnecessary variability
• Trade-off of capturing predictable variation while managing complexity
Review of Probability
• Random Variable: a function that associates a value with each point in a sample space.
• Two random variables X and Y are independent if and only if, for any two subsets A and B of their sample spaces,

  P(X ∈ A and Y ∈ B) = P(X ∈ A)·P(Y ∈ B)

• Mean: E[X] = ∫ x·p(x) dx, where p(x) is the density of X
• Variance: E[(X − E[X])(X − E[X])]
Review of Probability
• Covariance: Cov(X,Y) = E[(X − E[X])(Y − E[Y])]
• If X and Y are independent, Cov(X,Y) = 0
• Correlation Coefficient:

  ρ_XY = Cov(X,Y) / (σ_X·σ_Y)

• ρ_XY ranges between −1 and 1
• If ρ_XY = 0, X and Y are uncorrelated
• If X and Y are independent, they are uncorrelated. Uncorrelated random variables need not be independent.
• More details at http://en.wikipedia.org/wiki/Correlation_and_dependence
Uncorrelated vs Independent
• If X and Y are independent, they are uncorrelated. Uncorrelated random variables need not be independent.
• Example:
  – X is uniformly distributed on [−1, 1]
  – Y is X²
• E[X] = 0
• E[Y] = E[X²] = ∫ from −1 to 1 of x²/2 dx = 1/3
Uncorrelated vs Independent
• If X and Y are independent, they are uncorrelated. Uncorrelated random variables need not be independent.
• Example:
  – X is uniformly distributed on [−1, 1]
  – Y is X²
• Cov(X,Y) = E[X(X² − E[X²])]
           = E[X³] − E[X]·E[X²]
           = ∫ from −1 to 1 of x³/2 dx − 0 = 0
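The slide’s example is easy to check numerically. A minimal sketch in pure Python (the sample size and seed are arbitrary choices): draw X uniformly on [−1, 1], set Y = X², and estimate the means and the covariance.

```python
import random
import statistics

# Numerical check of the example: X ~ Uniform[-1, 1], Y = X^2 are clearly
# dependent (Y is a function of X) yet uncorrelated (Cov(X, Y) = 0).
random.seed(42)
xs = [random.uniform(-1, 1) for _ in range(200_000)]
ys = [x * x for x in xs]

mean_x = statistics.fmean(xs)
mean_y = statistics.fmean(ys)
cov_xy = statistics.fmean((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

print(f"E[X]     ~ {mean_x:.3f}")   # close to 0
print(f"E[Y]     ~ {mean_y:.3f}")   # close to 1/3
print(f"Cov(X,Y) ~ {cov_xy:.4f}")   # close to 0
```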
Covariance and Variance
• Constant α:
  – E[αX] = αE[X]
  – Var[αX] = σ²_αX = α²σ²_X
  – Stdev[αX] = σ_αX = |α|σ_X
• E[X + Y] = E[X] + E[Y]
• Var[X + Y] = σ²_(X+Y) = σ²_X + σ²_Y + 2·Cov(X,Y) = σ²_X + σ²_Y + 2ρ_XY·σ_X·σ_Y
• Stdev[X + Y] = σ_(X+Y) = Sqrt(σ²_X + σ²_Y + 2·Cov(X,Y))
The Big Idea
• When combining random variables, some of the “noise” cancels. How much cancels depends on the correlation.
Stdev[X + Y]
• Var[X+Y] = σ²_X + σ²_Y + 2·Cov(X,Y) = σ²_X + σ²_Y + 2ρ_XY·σ_X·σ_Y
• If
  – ρ_XY = 1:  Var[X+Y] = σ²_X + σ²_Y + 2σ_X·σ_Y = (σ_X + σ_Y)²
  – ρ_XY = 0:  Var[X+Y] = σ²_X + σ²_Y
  – ρ_XY = −1: Var[X+Y] = σ²_X + σ²_Y − 2σ_X·σ_Y = (σ_X − σ_Y)²
• So
  – ρ_XY = 1:  Stdev[X+Y] = σ_X + σ_Y — no reduction in variation
  – ρ_XY = 0:  Stdev[X+Y] = Sqrt(σ²_X + σ²_Y) — reduction in variation
  – ρ_XY = −1: Stdev[X+Y] = |σ_X − σ_Y| — greatest reduction in variation
The Big Idea
• When combining random variables, some of the “noise” cancels. How much cancels depends on the correlation.
Noise Canceling
• X1, X2 independent, identically distributed rvs with std dev σ_X
• Var[2X1] = 4σ²_X
• Stdev[2X1] = 2σ_X
• Var[X1 + X2] = 2σ²_X + 2·Cov(X1,X2) = 2σ²_X (independence means Cov(X1,X2) = 0)
• Stdev[X1 + X2] = Sqrt(2)·σ_X ≈ 1.41σ_X
• About 30% of the variability canceled
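The cancellation is easy to see in a small simulation. This is a sketch with an assumed normal distribution and σ = 10; neither choice comes from the course data.

```python
import random
import statistics

# For i.i.d. X1, X2: Stdev[X1 + X2] = sqrt(2)*sigma, roughly 30% below
# Stdev[2*X1] = 2*sigma, even though both have the same mean.
random.seed(7)
x1 = [random.gauss(0, 10) for _ in range(100_000)]
x2 = [random.gauss(0, 10) for _ in range(100_000)]

sd_double = statistics.pstdev([2 * a for a in x1])           # ~ 2 * 10 = 20
sd_sum = statistics.pstdev([a + b for a, b in zip(x1, x2)])  # ~ sqrt(2) * 10

print(f"Stdev[2*X1]    ~ {sd_double:.1f}")
print(f"Stdev[X1 + X2] ~ {sd_sum:.1f}")
print(f"Canceled: {1 - sd_sum / sd_double:.0%}")  # about 29%
```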
The Big Idea?
• That’s a “Big Idea”?• So what?!• Where’s the beef?
Exam Review
• Rules of the game:
  – I probably made mistakes grading.
  – If you think I made mistakes grading your exam, write a brief note indicating what you would like me to review and turn it in. I will review it.
Results
• Average ~71
• Std Dev ~18
[Histogram: exam scores by bin, 31–40 through 91–100, with A/B/C grade cutoffs]
Question 1
Question 2
• Where?
  – Indianapolis
• Why?
  – Co-locating the cross dock and the plant eliminates $1.2 million in cycle inventory of monitors, saving us $222 thousand annually
  – Other reasons
• What I wanted you to see
  – We have a different model of cost if the cross dock is at a plant
  – So check those locations separately
Question 3
• Admittedly challenging.
• To simplify discussion assume the context in which the number and locations of cross docks and pools is fixed.
• We’ll address the case where we can close cross docks and pools subsequently
What we have to do
• Manage flows of components to the cross docks – that’s no longer the same in every solution
• Balance the flows of components into and products out of the cross docks
• Manage the flows of finished goods from the cross docks to the pools
• Balance the flows of each product at each pool
• Manage the assignments of stores to pools
• Manage the single sourcing of products to the pools
• Account for trucks between the cross docks and the pools
• Manage the frequency requirements (for each product) to the pools
Flows of Components
• We can identify the components and the plants
• Let CompFlow(i,j) represent the units of the component made at plant i shipped to cross dock j each week. Non-negative.
• Multiply by $1 × the distance between plant i and cross dock j × the weight of the product made at plant i, and divide by 30,000 lbs, to get the trucking cost to the cross dock.
Balance at Cross Docks
• We have to balance the flow of each component i to each cross dock j with the volumes of finished products assembled there.
• Let Assemble(p,j) be the units of finished product p assembled at cross dock j each week. Non-negative.
• Let Recipe(p,i) be the number of components from plant i in a unit of finished product p.
• How to express the balance?
Balance
• Assemble(p,j) × Recipe(p,i) = units of the component made at plant i needed to produce finished product p at cross dock j.
• Sum over the finished products to get the total units of the component made at plant i needed at cross dock j:

  Sum over p in Products of Assemble(p,j) × Recipe(p,i) = CompFlow(i,j)
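The balance computation can be sanity-checked in a few lines of Python. A sketch on hypothetical data — the plants, products, recipes and weekly assembly quantities below are invented for illustration, not from the assignment:

```python
# The LP's balance constraint requires CompFlow(i, j) to equal the components
# consumed by assembly at cross dock j: sum over p of Assemble(p,j)*Recipe(p,i).
plants = ["GreenBay", "Denver"]
cross_docks = ["Indy"]
products = ["TV", "Monitor"]

# Hypothetical weekly assembly quantities and recipes (components per unit).
assemble = {("TV", "Indy"): 100, ("Monitor", "Indy"): 50}
recipe = {("TV", "GreenBay"): 1, ("TV", "Denver"): 2,
          ("Monitor", "GreenBay"): 1, ("Monitor", "Denver"): 0}

def required_comp_flow(i, j):
    """Units of plant i's component needed at cross dock j each week."""
    return sum(assemble[p, j] * recipe[p, i] for p in products)

# The model would impose CompFlow(i, j) == required_comp_flow(i, j);
# here we just evaluate the right-hand side for each plant/cross-dock pair.
for i in plants:
    for j in cross_docks:
        print(i, j, required_comp_flow(i, j))
```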
Flow of Finished Goods
• Let’s call the units of finished product p shipped from cross dock j to pool k each week ProdToPool(p,j,k)
• We want all the assembled product p at cross dock j to flow out to pools:

  Sum over k in Pools of ProdToPool(p,j,k) = Assemble(p,j)
Balance Flow at Pools
• Let’s use Assign(s,k) to indicate whether or not store s is assigned to pool k
• The demand for product p at each store each week is StoreDemand(p)
• The demand for product p at pool k implied by the assignments is

  Sum over s in Stores of StoreDemand(p) × Assign(s,k)

• The total units of product p available to meet this demand at pool k is

  Sum over j in Cross Docks of ProdToPool(p,j,k)
Balance Flow at Pools
• These should balance for each product at each pool:

  Sum over s in Stores of StoreDemand(p) × Assign(s,k) = Sum over j in Cross Docks of ProdToPool(p,j,k)
Assignments of Stores to Pools
• Assign each store s to one and only one pool:

  Sum over k in Pools of Assign(s,k) = 1

• So far everything is pretty standard.
Manage Single Sourcing at Pools
• Each pool k should get all of each product p from one and only one cross dock
• Let Source(p,j,k) indicate whether or not pool k sources product p from cross dock j
• Single Sourcing:

  Sum over j in Cross Docks of Source(p,j,k) = 1
But it’s a bit more complicated
• We don’t know the demand at the pool except as a function of the Assign variables, so we need to shut down ProdToPool variables based on the Source decisions. To do this we need an upper bound on ProdToPool.
• Let TotalDemand(p) be the total demand each week for product p across all stores
• For each product p, cross dock j and pool k:

  ProdToPool(p,j,k) ≤ TotalDemand(p) × Source(p,j,k)
Trucks from Cross Dock to Pool
• Since there is a frequency requirement for serving the pools, we must keep track of the trucks we send each week
• We know the weight moving from each cross dock j to each pool k each week:

  Sum over p in Products of ProdWeight(p) × ProdToPool(p,j,k)

• So,

  Trucks(j,k) ≥ Sum over p in Products of ProdWeight(p) × ProdToPool(p,j,k) / 30,000
Frequency Requirement
• At least one truck from cross dock j to pool k whenever pool k sources SOME product from cross dock j
• For each pool k, product p and cross dock j:

  Trucks(j,k) ≥ Source(p,j,k)
If we can shut down cross docks?
• Let OpenCD(j) indicate whether or not we will open cross dock j
• Clearly we want, for each product p, pool k and cross dock j:

  Source(p,j,k) ≤ OpenCD(j)
Alternate & Better Model
• Integrate the sourcing decisions into a binary decision variable:

  Path(prod, cd, pool, store) = 1 if store receives prod from pool and pool receives prod from cd

• Keep

  Assign(store, pool) = 1 if store receives supplies from pool
Alternate & Better Model
• Single sourcing at store s:

  Sum over p in Pools of Assign(s,p) = 1

• Keep Assign(store, pool) = 1 if store receives supplies from pool
• Ensure Path choices are consistent with assignments, i.e., we get to the store via the assigned pool for all products:

  Sum over cd in Cross Docks of Path(prod, cd, pool, store) = Assign(store, pool)
Alternate & Better Model
• Ensure single sourcing at the pool
• Introduce Source(prod, cd, pool) = 1 if pool receives prod from cd:

  Sum over cd in Cross Docks of Source(prod, cd, pool) = 1

• Ensure consistency with Path decisions:

  Path(prod, cd, pool, store) ≤ Source(prod, cd, pool)
Alternate & Better Model
• Volumes at the pool as before – based on Assign
• Volumes at the Cross Dock based on Path:

  Sum over prod in Products, pool in Pools, store in Stores of Recipe(prod, comp) × Path(prod, cd, pool, store)

  is the requirement for comp at cd
• …
Examples
• Adding across customers or geographies
• Demand for a single SKU at a single DC
[Charts: daily sales in 2010 at four DCs – Reno, Georgia, Edison, Carlstadt]
Daily Sales: Single SKU
• Sum of Stdevs of all DCs: 1,957
• Stdev of Demand over all DCs: 1,553 (21% less)
[Chart: total daily sales across all DCs, Jan–Dec 2010]
Correlated?
• If daily sales were uncorrelated across all DCs then variances would add
• Stdev across all DCs would equal
Sqrt(Var[DC-1]+Var[DC-2]+Var[…]) = 1,150
• Actual Stdev is 1,553
• Conclusion?
Adding across Products
• Sum of Stdev in Weekly Sales across all SKUs for vendor at a DC: 1,043
• By far the largest SKU has Stdev 999.
• Stdev in Total Weekly Sales for DC of all SKUs from vendor: 996
• Explain
Adding across time
• Stdev in daily sales of SKU: 999.49
• Stdev in weekly sales of SKU: 3882.63
• Much lower than 5*Daily Stdev
• Higher than ?*Daily Stdev
• Conclusion?
How can we:
• Add across customers?
• Add across products?
• Add across time?
• When do these conflict?
Questions?
Agenda
• Assignable-Cause vs Common-Cause Variation
• Review of Probability
• Review of Regression: A tool for capturing predictable variation
• Forecasting
Correlation Example
• Truckload shipments from Green Bay and Denver to Indianapolis
• Assemble products in Indianapolis and distribute by full truckload from there to stores
• What will happen to costs compared to direct full-truckload shipments?
  – Transportation
  – Pipeline
  – At plants
  – At Indianapolis Warehouse/Cross Dock
  – At Stores

Tree Height vs Trunk Diameter (Correlation Coefficient: 0.886)

  Height y (ft) | Diameter x (in)
  35 | 8
  49 | 9
  27 | 7
  33 | 6
  60 | 13
  21 | 7
  45 | 11
  51 | 12
[Scatter plot: Tree Height (feet) vs Trunk Diameter (inches)]
Regression
• Explain or model the relationship between the dependent variable (e.g., tree height) and the independent variables (e.g., trunk diameter)
• Linear Regression Model

  y = β0 + β1·x + ε

  – β0 + β1·x captures the assignable cause variation
  – ε is the common cause variation
Regression

Best-fit line for Tree Height vs Trunk Diameter:
  y = 4.5413x − 1.3147, R² = 0.7854
[Scatter plot with fitted line: Tree Height (feet) vs Trunk Diameter (inches)]
Regression
• The model y = 4.5413x − 1.3147 is the “best fit” linear model for the relationship
• It is not based on physical laws or causality (e.g., thin trees don’t have negative height)
• It does “explain” about 78% of the variability in the data (R² = 0.7854)
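The fit can be reproduced from the eight tree observations with the standard least-squares formulas; a short sketch:

```python
# Recompute the slide's fit from the tree data
# (height y in feet, trunk diameter x in inches).
ys = [35, 49, 27, 33, 60, 21, 45, 51]
xs = [8, 9, 7, 6, 13, 7, 11, 12]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)     # Total Sum of Squares

slope = s_xy / s_xx                 # ~ 4.5413
intercept = y_bar - slope * x_bar   # ~ -1.3147
r_squared = slope * s_xy / s_yy     # explained SS / total SS ~ 0.7854

print(f"y = {slope:.4f}x + {intercept:.4f}, R^2 = {r_squared:.4f}")
```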
Explained vs Unexplained Variation

For linear regression, with y_i the ith observation, ŷ_i the ith estimate, and ȳ the average of the observations:

  Total Sum of Squares     = Sum of (y_i − ȳ)²
  Explained Sum of Squares = Sum of (ŷ_i − ȳ)²
  Residual Sum of Squares  = Sum of (y_i − ŷ_i)²

Coefficient of Determination:

  R² = Explained Sum of Squares / Total Sum of Squares
     = 1 − Residual Sum of Squares / Total Sum of Squares
Coefficient of Determination
• 0 ≤ R² ≤ 1
• Note in our example the Coefficient of Determination R² equals the square of the Correlation Coefficient r:

  R² = 0.7854 = r² = 0.886²

• This is generally the case for simple linear regression (one independent variable)
Standard Error
• A sort of standard deviation about the regression line.
• How widely dispersed are observations about the regression line?
P-values
• Indicates how likely it is to see this value if the true value of the coefficient is 0 and there’s as much noise as we see in the data.
SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.886
  R Square            0.785
  Adjusted R Square   0.750
  Standard Error      6.635
  Observations        8

ANOVA
              df   SS        MS       F       Significance F
  Regression   1   966.736   966.736  21.960  0.003
  Residual     6   264.139    44.023
  Total        7  1230.875

                Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
  Intercept         -1.315        9.149       -0.144   0.890    -23.701    21.072
  X Variable 1       4.541        0.969        4.686   0.003      2.170     6.913

• Strong evidence there’s a relationship (slope P-value 0.003)
• Weak evidence the intercept isn’t 0 (P-value 0.890)
Multiple Linear Regression
• With more than one independent variable, e.g.,

  Sales = β0 + β1·GDP + β2·Unemployment Rate + …

• Need to watch out for
  – Non-linearity: the relationship might not be linear, e.g., weight of the tree vs trunk diameter.
  – Multicollinearity: one independent variable is a linear function of another (eliminate one)
  – Over-specified model: adding more independent variables increases R², but reduces the degrees of freedom in the fit. Adjusted R² attempts to account for this.
Static Regression
Salaries
Independent Data
Excel’s Linest
• The array function Linest(Y-array, Array of X’s, [const], [stat]) outputs the β’s
• One column for each β
• Remember: array functions are entered with Ctrl-Shift-Enter
• Allows you to perform running regressions
• Coefficients come out in reverse order (go figure)
LinEst
• Regression.xls: 30 faculty salary observations with Gender, Rank, Dept, Years and Merit as independent variables
• Excel regression output: Multiple R 0.948, R Square 0.899, Adjusted R Square 0.878, Standard Error 3.765, Observations 30
• Coefficients: Intercept 0.323, Gender 8.050, Rank 8.430, Dept 9.220, Years 0.059, Merit 3.107
• Linest returns the same coefficients in reverse order:

  {=Linest(B2:B31, C2:G31)} → Merit 3.107, Years 0.059, Dept 9.220, Rank 8.430, Gender 8.050, Intercept 0.323

[Worksheet: full data table and ANOVA output omitted]
Questions?
LTL Rates
• Complicated to work with rate engines
• Alternative: model the rates
• Challenge:
  – Model for rates = Max{Min Charge, Min{Intercept1 + Rate1 × Weight × Distance, Intercept2 + Rate2 × Weight × Distance}}
• Come up with estimates for the Min Charge, the Intercepts and the Rates
• Winner: minimum sum of squared errors
LTL Rates

  Observation | Distance | Weight | Price
  1  | 165 |     87 |   $112.52
  2  | 165 |    125 |   $161.66
  3  | 333 |    107 |   $103.11
  4  | 165 |    299 |   $386.70
  5  | 165 |    322 |   $416.44
  6  | 561 |    389 |   $409.77
  7  | 165 |  1,754 | $1,344.97
  8  | 165 |  3,345 | $2,303.70
  9  | 333 |  5,997 | $2,443.78
  10 | 561 |  5,000 | $2,345.00
  11 | 165 | 20,000 | $2,680.00
  12 | 333 | 10,000 | $3,270.01
  13 | 333 | 20,000 | $3,434.00
  14 | 561 | 20,000 | $2,932.00
Agenda
• Assignable-Cause vs Common-Cause Variation
• Review of Probability
• Review of Regression
• Forecasting
Forecasting
Forecasting is the effort to determine what we can about the future from the past.
We will focus on Quantitative Methods – i.e., not opinions, judgments, “markets”, etc.
General Framework

• Future = f(Past) + Residual Error
• The Specifics
  – What aspects of the past are relevant
  – What form of f to use
• The Issues
  – Accuracy: Is E[Residual Error] ~ 0?
  – Precision: Is the Residual Error small?
  – Complexity and Cost!
Examples
• Autoregressive or time-series
  – Past = historical values of the process we are forecasting, e.g., past demand forecasts future demand
• Causal
  – Past = historical values of “leading indicators” like GDP, employment, housing starts, etc.
• Regression, Maximum Likelihood
  – Past may include both historical values and leading indicators
Past & Future
• We lump data into time periods
  – Average, total or sample in some period
  – Reduces data requirements
  – Averaging and totaling smooth the data (remember the Big Idea)
  – Actionable
• What we’re forecasting
  – y(t) = value at time t
• Past
  – Autoregressive: y(t−1), y(t−2), y(t−3), …
  – Leading indicators: x(i,t−1), x(i,t−2), x(i,t−3), …
Specifics
• Autoregressive or Time-Series
  – Moving average
    • Past = the past n observations y(t−1), y(t−2), …, y(t−n)

      f(y(t−1), …, y(t−n)) = (1/n) × Sum over i = 1..n of y(t−i)

    • More a tool for understanding the past than for forecasting the future
  – Exponential Smoothing
    • Past = “all” past observations y(t−1), y(t−2), y(t−3), …

      f(y(t−1), y(t−2), …) = Sum over i ≥ 1 of α(1−α)^(i−1)·y(t−i)
                           = α·y(t−1) + (1−α)·f(y(t−2), y(t−3), …)
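The recursive form of exponential smoothing can be sketched directly. The value of alpha and the initialization below are illustrative choices, not values prescribed by the course:

```python
# Simple exponential smoothing: the forecast is a weighted average of all
# past observations with weights alpha*(1-alpha)^(i-1), computed recursively
# as f_t = alpha*y_(t-1) + (1-alpha)*f_(t-1).
def exp_smooth_forecasts(history, alpha=0.3):
    """Return the smoothed series; forecasts[t] is the forecast for period t+1."""
    forecasts = [history[0]]  # a common initialization: seed with the first value
    for y in history[1:]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts

sales = [100, 120, 90, 110, 130, 95]
print([round(f, 1) for f in exp_smooth_forecasts(sales)])
```

Note how each new forecast only needs the latest observation and the previous forecast, which is why exponential smoothing is so cheap to run across thousands of SKUs.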
Specifics
• Exponential Smoothing with Trend
  – Past = “all” past observations y(t−1), y(t−2), y(t−3), …
  – Forecast uses exponential smoothing to estimate
    • The “Level”: a weighted average of the observation and the past estimate
    • The “Trend”: a weighted average of the observation and the past estimate
  – Forecast m periods in the future = Level + m × Trend
  – Details at the Engineering Statistics Handbook
Specifics
• Exponential Smoothing with Trend & Seasonality
  – Past = “all” past observations y(t−1), y(t−2), y(t−3), …
  – Forecast uses exponential smoothing to estimate
    • The “(De-seasonalized) Level”: a weighted average of the de-seasonalized observation and the past de-seasonalized estimate
    • The “Trend”: a weighted average of the observation and the past estimate
    • The “Seasonal factors”: a weighted average of the observation and the past estimate
  – Forecast m periods in the future = (Level + m × Trend) × Seasonal Factor
  – Details at the Engineering Statistics Handbook
Specifics
• Exponential Smoothing with …
  – You get the idea.
  – Issues
    • Initialization data
    • Choosing the weights
    • Growing complexity
Specifics
• Regression, Maximum Likelihood
  – Past = past observations y(t−1), y(t−2), y(t−3), … and leading indicators x(i,t−1), x(i,t−2), …
  – f(y(t−1), y(t−2), …, x(i,t−1), x(i,t−2), …) is some function of these past values
• Examples:
  – Linear models
  – Non-linear models
    » Diffusion
    » Logit
    » Probit
Bass Diffusion
• Postulates a form for cumulative sales over the life of a product
• Three parameters
  – m: the total potential market
  – p, q: shape parameters

  N(t) = m × (1 − e^(−(p+q)t)) / (1 + (q/p)·e^(−(p+q)t))
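The cumulative-adopters curve is a one-line function. The values of m, p and q below are illustrative, not estimates fitted to any data set:

```python
import math

# Bass cumulative sales by time t:
# N(t) = m * (1 - exp(-(p+q)t)) / (1 + (q/p) * exp(-(p+q)t)).
def bass_cumulative(t, m, p, q):
    """Cumulative sales by time t for market size m and shape parameters p, q."""
    e = math.exp(-(p + q) * t)
    return m * (1 - e) / (1 + (q / p) * e)

m, p, q = 100_000, 0.03, 0.38  # hypothetical market size and shape parameters
for t in [0, 5, 10, 20]:
    print(t, round(bass_cumulative(t, m, p, q)))
```

N(0) = 0 and N(t) approaches m as t grows, giving the familiar S-shaped adoption curve; fitting means choosing m, p and q to minimize the squared errors against observed cumulative sales.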
Questions?
Top Down vs Bottom Up
• Often faced with forecasting 100’s of families and 1,000’s of SKUs
• Different options:
  – Top Down: develop an aggregate forecast and allocate it to the more detailed level
  – Bottom Up: develop individual detailed forecasts and aggregate them up
Laws of Forecasting
• Law 1: Forecasts are wrong
• Law 2: Forecast Demand, not Sales
• Law 3: It is generally easier to forecast aggregate data than it is to forecast the details. (Big Idea)
• Law 4: It is generally easier to forecast a short time into the future than to forecast far into the future
• Law 5: Simpler forecasts are generally better forecasts
Examples from ABL
• Sales for SKU 8295
• Raw data
[Chart: raw sales for SKU 8295, Jan–Dec 2010]
Variability?
• How do we forecast this?
• How do we assign a variability to this?
Not Actionable!
Sales by Day

We can make ordering decisions on a daily basis:
[Chart: daily sales for SKU 8295, Jan–Dec 2010]
Weekly Sales
• Or on a weekly basis
[Chart: weekly sales for SKU 8295, weeks 2–52]
Or a Monthly Basis
• Or on a monthly basis
• Which is appropriate?
[Chart: units sold by month, months 1–12]
Compare the Variability
[Charts: weekly sales and daily sales for SKU 8295, shown side by side]
The Big Idea
• Average Daily Sales: 1280.196
• Std Dev in Daily Sales: 1546.472
• Average Weekly Sales: 6400.981
• Std Dev in Weekly Sales: 5971.578
• Avg Weekly Sales = 5 × Average Daily Sales
• What about the relationship between the variabilities?
• 5 × Std Dev in Daily Sales = 7732.361
• What does the Big Idea say we should expect?
The Big Idea
• If sales from day to day were independent, we would see:

  5 × Variance in Daily Sales = Variance in Weekly Sales, i.e.,
  Sqrt(5) × Std Dev in Daily Sales = Std Dev in Weekly Sales

• Sqrt(5) × Std Dev in Daily Sales = 3458.017 < 5971.578
• So, sales from day to day are (auto)correlated
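The comparison is one line of arithmetic; a quick sketch using the deck’s numbers:

```python
import math

# If daily sales were independent, weekly variance would be 5x daily
# variance, so the weekly std dev would be sqrt(5) times the daily std dev.
daily_sd = 1546.472
weekly_sd_actual = 5971.578

weekly_sd_if_independent = math.sqrt(5) * daily_sd
print(f"Implied weekly std dev: {weekly_sd_if_independent:.3f}")  # ~3458.017
print(f"Actual weekly std dev:  {weekly_sd_actual:.3f}")
# Actual is well above the implied value, so daily sales are
# positively (auto)correlated: less noise cancels than independence predicts.
```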
Forecasting
• Try simple techniques:
  – Moving Average
  – Exponential Smoothing
  – Exponential Smoothing with Trend
  – Exponential Smoothing with Trend & Seasonality
  – (Auto) Regression
  – Bass Diffusion
Moving Average
• Average Error: 666.29 (over-estimates)
• Std Dev of Error: 4,094 (< Std Dev in Sales)
[Chart: weekly sales vs moving-average forecast]
Exponential Smoothing
• Average Error: 786.13 (less accurate)
• Std Dev in Errors: 4,308 (less precise)
[Chart: weekly sales vs exponential-smoothing forecast]
Exponential Smoothing w/ Trend
• Average Error: −167.58 (more accurate)
• Std Dev in Errors: 5,065 (less precise)
[Charts: weekly sales vs exponential smoothing, and vs exponential smoothing with trend]
Exp. Smoothing w/ Trend & Seasonality

Seasonality by: Day of Week, Week of Month, Week of Year
[Pie charts: Sales by Day of Week – Mon 15%, Tue 21%, Wed 20%, Thu 24%, Fri 20%; Sales by Week of Month – 1: 31%, 2: 16%, 3: 23%, 4: 21%, 5: 25%]
Regression
• Using the previous 2 weeks, the previous 3 weeks, …
• Might use Week of Month too
Previous 2 Weeks
• Average Error: 2,096
• Std Dev in Error: 4,929
[Chart: weekly sales vs regression forecast from the previous 2 weeks]
Previous 3 Weeks
• Average Error: 1,837
• Std Dev in Error: 5,010
[Chart: weekly sales vs regression forecast from the previous 3 weeks]
Previous 4 Weeks
• Average Error: 1,664
• Std Dev in Error: 5,367.29
[Chart: weekly sales vs regression forecast from the previous 4 weeks]
Bass Diffusion
• Average Error: −1,536.17
• Std Dev in Error: 5,324.07
[Chart: weekly sales vs fitted Bass diffusion curve]
Conclusion
• The simple moving average provides the best forecast on a weekly basis
• Exponential smoothing is better on a monthly basis
• Now let’s build a demand distribution given a forecast
• Idea: compare Actual Sales to Forecasted Sales through the ratio Actual/Forecast
• Accuracy means the Average ratio should be 1
• Precision means its Std Dev should be small
Demand Distribution
• We know the forecast is WRONG
• But it does give us some information
• What Actual Sales will be is uncertain, but we can develop a distribution for it
• What are the chances Actual Sales are larger than X? Smaller than Y? …
Actual to Forecast Ratios
• μ, the Avg, is 1.1 (What does that mean?)
• σ, the Std Dev, is 0.87
[Histogram of Actual/Forecast ratios: Ratio < 1 means we over-forecast; Ratio > 1 means we under-forecast]
Translate Forecast into Demand Distribution
• Assuming things continue as they have…
• If the forecast is 100, what do we expect actual sales to be? 1.1 × 100 = 110
• So, if μ is the average Actual/Forecast ratio and F is the forecast, Expected Demand is μF
• What is the spread of actual demand about this mean value?
Translate Forecast into Demand Distribution
• If things continue as they have, it is natural to assume we will draw from the historical distribution of Actual/Forecast ratios.
• So, if μ = 1 and the forecast is 100, what should the distribution of Actual Sales be?
  – The mean should be 100
  – The std dev of actual demand about this mean value should be 100σ
• Why?
Demand Distribution
• The Actual/Forecast ratio has a distribution with mean μ and std dev σ
• If μ = 1, the distribution for Actual Sales is just F, the forecast, times the Actual/Forecast ratio, so it has mean F and std dev σF
• If μ ≠ 1, the same argument gives mean μF and std dev σF
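The scaling argument can be sketched in a few lines. The ratio history below is invented for illustration (though its average happens to match the slide’s 1.1):

```python
import statistics

# Turn a forecast into a demand distribution: scale the historical
# Actual/Forecast ratios by the new forecast F. The scaled distribution
# has mean mu*F and std dev sigma*F.
ratios = [0.6, 1.4, 0.9, 2.1, 0.5, 1.0, 1.2, 0.8, 1.6, 0.9]  # hypothetical
mu = statistics.fmean(ratios)
sigma = statistics.pstdev(ratios)

F = 100  # this week's forecast
demand_scenarios = [r * F for r in ratios]

print(f"Expected demand: {statistics.fmean(demand_scenarios):.1f}")   # mu * F
print(f"Std dev:         {statistics.pstdev(demand_scenarios):.1f}")  # sigma * F
```

The scenarios themselves can then answer questions like “what are the chances Actual Sales exceed X?” by counting how many scaled ratios fall above X.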
Common Cause Variability
• So, the common cause variability in demand for SKU 8295 that we will need to protect against with safety stock is about 87% of the Forecasted demand!
• Just working with Raw Sales– Std Dev/Average = 93%
Common Cause Variability
• On the other hand, with monthly sales
• Working with Raw Sales– Std Dev/Average = 74%
• Working with Exponential Smoothing forecast– Std Dev/Average = 36%