Session 9

Applied Regression -- Prof. Juran


Outline

Two Multivariate Methods

• Cluster Analysis
– Excel
– Minitab

• Discriminant Analysis
– Excel
– Minitab

Steam case

Cars


Cluster Analysis

• Concerned with grouping a large number of observations into reasonable sub-groups (clusters) on the basis of their similarities on multiple dimensions

• Similar to regression in terms of its basic method: finding a solution that minimizes a total sum of squared errors

• Not concerned with explaining variability or forecasting

• No dependent variable


Example: MBA Programs

School           Accept %  Enroll %  GMAT  GPA   Cost     Minority %  Non-U.S. %  Female %  Salary    Pop. Density  Student Body  Age  Rolling  Job
Harvard          12.0%     82.5%     705   3.58  $31,800  24.0        35.0        35.0      $127,338  14,618        1821          28   0        92%
Stanford         9.0%      78.0%     718   3.58  $31,002  24.0        24.0        38.0      $130,253  2,371         756           27   0        92%
Columbia         10.0%     72.0%     710   3.45  $32,154  20.0        28.0        35.0      $121,000  23,671        1225          27   1        94%
Penn             15.5%     70.2%     703   3.57  $31,218  19.0        42.0        31.0      $122,711  11,492        1542          28   0        88%
MIT              18.0%     69.9%     703   3.50  $31,200  19.0        35.0        26.0      $120,449  11,398        638           28   0        91%
Northwestern     16.0%     64.0%     700   3.45  $30,255  17.5        30.8        30.0      $120,500  12,185        1250          28   0        88%
Carn.-Mellon     28.2%     60.1%     660   3.25  $26,750  21.3        44.0        29.0      $113,448  6,598         440           28   0        94%
Chicago          28.4%     60.0%     695   3.33  $30,596  18.2        36.0        22.0      $117,893  12,185        1043          29   0        93%
Michigan         19.6%     58.5%     676   3.34  $30,686  20.5        29.0        27.6      $119,718  4,238         862           28   0        93%
Duke             20.1%     57.0%     688   3.43  $30,323  20.6        34.0        30.0      $117,899  2,503         682           27   0        93%
Yale             20.0%     56.7%     698   3.50  $29,720  24.0        34.0        33.0      $114,274  6,559         436           28   0        94%
Michigan State   22.3%     55.0%     641   3.36  $17,400  13.6        39.0        24.0      $90,733   3,738         221           28   0        95%
Texas            29.0%     52.4%     680   3.34  $24,252  12.3        28.0        22.7      $106,905  2,260         780           28   1        78%
NYU              15.2%     51.7%     700   3.40  $32,194  23.0        33.0        38.0      $112,900  23,671        585           27   1        86%
Cornell          25.8%     51.2%     669   3.32  $30,505  22.6        33.0        31.0      $117,825  1,802         551           29   0        92%
Virginia         16.6%     51.0%     681   3.40  $28,900  15.1        29.0        30.0      $115,272  3,938         516           28   1        93%
Dartmouth        18.0%     51.0%     690   3.40  $30,250  17.2        29.9        24.0      $124,235  1,800         436           28   0        92%
Berkeley         14.0%     50.3%     688   3.43  $21,208  19.4        33.0        29.0      $115,560  9,631         471           28   0        69%
Indiana          24.7%     49.8%     651   3.35  $20,694  14.3        31.0        22.0      $102,299  4,073         596           28   0        83%
UCLA             8.0%      48.0%     691   3.60  $22,490  21.6        28.0        30.0      $114,368  7,436         653           28   1        62%
Emory            31.5%     46.7%     651   3.30  $28,012  10.7        36.0        21.0      $108,959  2,996         387           28   1        88%
Georgetown       20.8%     45.0%     662   3.35  $28,440  10.8        35.0        25.0      $103,968  9,351         520           28   1        81%
North Carolina   23.8%     44.2%     674   3.30  $26,545  16.2        32.0        29.0      $111,104  2,619         554           27   0        92%
Rochester        32.0%     44.0%     637   3.33  $28,620  10.3        52.0        24.0      $103,633  6,541         453           29   1        72%
USC              29.0%     43.4%     676   3.30  $30,044  32.3        29.0        30.0      $100,537  7,436         589           28   0        81%
Ohio State       29.0%     41.4%     645   3.38  $21,792  15.5        35.0        41.0      $99,859   3,368         277           28   1        79%
Minnesota        34.4%     40.6%     645   3.33  $21,367  8.6         26.0        31.0      $93,080   6,606         236           28   1        94%


Cluster Analysis Questions

• Given a certain number of clusters, which schools are grouped together?

• How is the set of clusters affected if we change the number of clusters?

• For each cluster, which school is the most “typical”?

• How different are the clusters from each other?

• What is the best number of clusters?


Basic Method in Excel

• We will assume that all of these attributes deserve equal weighting in our analysis. We will
– name a school as the “typical” school in each cluster (called the centroid of the cluster),
– assign each of the non-centroid schools to the cluster whose centroid it is most similar to, and
– optimize the identities of the centroids and the cluster assignments so as to minimize the total squared Euclidean distance between each school and its cluster centroid.

• We define “most similar” to be the least sum of squared errors across all attributes between a cluster member and the centroid of the cluster.
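The assignment step described above can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet model itself: the five points and their two attributes are hypothetical toy data, and the function names are made up for this example. Each observation is assigned to the chosen centroid with the smallest sum of squared differences, and the totals are added up to give the quantity being minimized.

```python
def squared_distance(a, b):
    """Sum of squared differences across all attributes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_centroids(points, centroid_names):
    """Assign every point to its nearest centroid; return assignments and total."""
    assignments = {}
    total = 0.0
    for name, attrs in points.items():
        # Squared distance from this point to each candidate centroid.
        dists = {c: squared_distance(attrs, points[c]) for c in centroid_names}
        nearest = min(dists, key=dists.get)
        assignments[name] = nearest
        total += dists[nearest]
    return assignments, total


# Hypothetical toy data: five observations with two standardized attributes.
points = {
    "A": (0.0, 0.0),
    "B": (1.0, 0.0),
    "C": (0.0, 1.0),
    "D": (10.0, 10.0),
    "E": (11.0, 10.0),
}

# Centroids are themselves observations, as in the method described above.
assignments, total = assign_to_centroids(points, ["A", "D"])
print(assignments)
print(total)
```

Because the centroids are restricted to actual observations, a centroid always has distance zero to itself and is assigned to its own cluster.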


Nonlinear Problems

Some nonlinear problems can be formulated in a linear fashion (e.g. some network problems).

Other nonlinear functions can be solved with our basic methods (e.g. smooth, continuous functions that are concave or convex, such as portfolio variances).

However, there are many types of nonlinear problems that pose significant difficulties.


Nonlinear Problems

The linear solution to a nonlinear (say, integer) problem may be infeasible.

The linear solution may be far away from the actual optimal solution.

Some functions have many local minima (or maxima), and Solver is not guaranteed to find the global minimum (or maximum).


3 Solvers

• Simplex LP Solver

• GRG Nonlinear Solver

• Evolutionary Solver


Solution Methodology

The standard simplex algorithm (Solver’s default method) won’t work on this problem. The GRG Nonlinear algorithm will make an honest effort, but is likely to give up without finding the optimal solution.

This can result from the use of MAX, IF, and SUMIF functions, which introduce discontinuities in our objective function and constraints as functions of the decision variables.

It can also be the result of using numerical decision variables that are in fact simply names (as in this example, where the names of the clusters happen to be numbers).

The Evolutionary Solver, a genetic algorithm, can do a good job with a problem like this, but is not guaranteed to find the optimal solution.


Solution Methodology

The Evolutionary Solver operates in a completely different way from the other types. Instead of searching in a structured way guaranteed to reach the optimal solution, genetic algorithms operate somewhat like biological evolutionary processes, with some degree of randomness in the steps taken from one solution to the next.

In a finite period of time, the Evolutionary Solver is not guaranteed to find the optimal solution, but it will find very good solutions and try to improve upon them.
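To make the idea of randomized search concrete, here is a rough Python sketch. It is not the algorithm inside Excel's Evolutionary Solver; it is a much simpler stand-in (one candidate solution, random mutation of one centroid index, keep the change only if the total squared distance drops). The data, function names, iteration count, and seed are all assumptions chosen for illustration.

```python
import random


def total_sq_dist(points, medoids):
    """Sum over all points of the squared distance to the nearest centroid."""
    return sum(
        min(sum((x - y) ** 2 for x, y in zip(p, points[m])) for m in medoids)
        for p in points
    )


def evolve_medoids(points, k, iterations=2000, seed=0):
    """Random-mutation search: change one centroid index, keep it if better."""
    rng = random.Random(seed)
    best = [0] * k  # start with every centroid set to the first observation
    best_cost = total_sq_dist(points, best)
    for _ in range(iterations):
        candidate = list(best)
        # Mutate: replace one randomly chosen centroid with a random observation.
        candidate[rng.randrange(k)] = rng.randrange(len(points))
        cost = total_sq_dist(points, candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]

start_cost = total_sq_dist(points, [0, 0])
medoids, cost = evolve_medoids(points, k=2)
print(sorted(medoids), cost, start_cost)
```

On this tiny example the search has no trouble reaching the best pair of centroids. On larger problems a search of this kind can stall at a good but non-optimal solution, which is the caveat made above.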


Standardization

In cluster analysis it is common to standardize the attribute data, so that those variables with large units (such as cost, salary and student body size) do not dominate the sum of squares over attributes with small units (such as % female, % admitted, and % with a job at graduation).

So we transform each attribute for each school into a z-value.
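As a small sketch of the z-value transformation, the Python below standardizes the GMAT column from the MBA table. It assumes the sample standard deviation (dividing by n - 1), which is what Excel's STDEV function computes; the function and variable names are illustrative only.

```python
from statistics import mean, stdev

# GMAT scores copied from the MBA Programs table.
gmat = {
    "Harvard": 705, "Stanford": 718, "Columbia": 710, "Penn": 703,
    "MIT": 703, "Northwestern": 700, "Carnegie-Mellon": 660, "Chicago": 695,
    "Michigan": 676, "Duke": 688, "Yale": 698, "Michigan State": 641,
    "Texas": 680, "NYU": 700, "Cornell": 669, "Virginia": 681,
    "Dartmouth": 690, "Berkeley": 688, "Indiana": 651, "UCLA": 691,
    "Emory": 651, "Georgetown": 662, "North Carolina": 674, "Rochester": 637,
    "USC": 676, "Ohio State": 645, "Minnesota": 645,
}


def standardize(values):
    """Return z-values: (x - mean) / sample standard deviation."""
    data = list(values.values())
    avg = mean(data)
    sd = stdev(data)  # sample standard deviation, n - 1 in the denominator
    return {name: (x - avg) / sd for name, x in values.items()}


z = standardize(gmat)
print(round(z["Stanford"], 2), round(z["Harvard"], 2))  # 1.67 1.11
```

After standardization every attribute has mean 0 and standard deviation 1, so a one-unit difference means the same thing for GMAT as it does for cost or salary.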


Columns: A School; B Accept %; C Enroll %; D GMAT; E GPA; F Cost; G Minority pct; H Non-U.S. pct; I Female %; J Salary (base+signing); K Pop. Den. in City (people per sq. mile); L Size: Student Body

A                     B      C      D      E      F      G      H      I      J      K      L
Stanford          -1.63   2.02   1.67   1.82   0.78   1.09  -1.59   1.68   1.74  -0.88   0.18
Harvard           -1.22   2.42   1.11   1.82   0.97   1.09   0.28   1.11   1.45   1.18   2.94
Penn              -0.76   1.32   1.02   1.71   0.83   0.15   1.47   0.34   0.99   0.66   2.21
MIT               -0.42   1.30   1.02   1.00   0.82   0.15   0.28  -0.61   0.76   0.64  -0.12
Northwestern      -0.69   0.77   0.89   0.49   0.60  -0.13  -0.43   0.15   0.77   0.77   1.46
Duke              -0.14   0.15   0.38   0.28   0.62   0.45   0.11   0.15   0.51  -0.86  -0.01
Chicago            0.97   0.41   0.68  -0.74   0.68   0.00   0.45  -1.37   0.51   0.77   0.92
Columbia          -1.49   1.48   1.32   0.49   1.05   0.34  -0.91   1.11   0.82   2.71   1.39
Dartmouth         -0.42  -0.39   0.47  -0.03   0.60  -0.19  -0.59  -0.99   1.14  -0.98  -0.65
Berkeley          -0.96  -0.45   0.38   0.28  -1.54   0.22  -0.06  -0.04   0.27   0.34  -0.56
Michigan          -0.21   0.28  -0.14  -0.64   0.70   0.43  -0.74  -0.30   0.69  -0.57   0.46
Virginia          -0.61  -0.39   0.08  -0.03   0.28  -0.58  -0.74   0.15   0.24  -0.62  -0.44
NYU               -0.80  -0.33   0.89  -0.03   1.06   0.90  -0.06   1.68   0.01   2.71  -0.26
Yale              -0.15   0.12   0.81   1.00   0.47   1.09   0.11   0.73   0.14  -0.17  -0.65
UCLA              -1.76  -0.66   0.51   2.02  -1.24   0.64  -0.91   0.15   0.15  -0.03  -0.09
Cornell            0.62  -0.37  -0.44  -0.85   0.66   0.82  -0.06   0.34   0.50  -0.98  -0.35
North Carolina     0.36  -1.00  -0.22  -1.05  -0.28  -0.38  -0.23  -0.04  -0.17  -0.84  -0.34
Carnegie-Mellon    0.95   0.42  -0.82  -1.56  -0.23   0.58   1.80  -0.04   0.06  -0.17  -0.64
Texas              1.05  -0.27   0.04  -0.64  -0.82  -1.11  -0.91  -1.24  -0.59  -0.90   0.24
USC                1.05  -1.06  -0.14  -1.05   0.55   2.64  -0.74   0.15  -1.23  -0.03  -0.25
Indiana            0.48  -0.49  -1.21  -0.54  -1.66  -0.73  -0.40  -1.37  -1.06  -0.59  -0.23
Emory              1.39  -0.77  -1.21  -1.05   0.07  -1.41   0.45  -1.56  -0.39  -0.78  -0.77
Rochester          1.45  -1.01  -1.81  -0.74   0.21  -1.48   3.16  -0.99  -0.92  -0.18  -0.60
Georgetown        -0.05  -0.92  -0.74  -0.54   0.17  -1.39   0.28  -0.80  -0.89   0.30  -0.43
Michigan State     0.15  -0.04  -1.64  -0.44  -2.44  -0.86   0.96  -0.99  -2.21  -0.65  -1.20
Ohio State         1.05  -1.25  -1.47  -0.23  -1.40  -0.51   0.28   2.25  -1.30  -0.71  -1.06
Minnesota          1.78  -1.31  -1.47  -0.74  -1.50  -1.80  -1.25   0.34  -1.98  -0.17  -1.16

=((VLOOKUP($A9,'Raw Data'!$A$2:$O$28,3,0))-AVERAGE('Raw Data'!C$2:C$28))/STDEV('Raw Data'!C$2:C$28)


Standardized data (columns: A Index; B School; C Accept %; D Enroll %; E GMAT; F GPA; G Cost; H Minority pct; I Non-U.S. pct; J Female %; K Salary; L Density; M St Body; N Age; O Rolling; P Job)

 A  B                     C      D      E      F      G      H      I      J      K      L      M      N      O      P
 1  Stanford          -1.63   2.02   1.67   1.82   0.78   1.09  -1.59   1.68   1.74  -0.88   0.18  -1.68  -0.75   0.58
 2  Harvard           -1.22   2.42   1.11   1.82   0.97   1.09   0.28   1.11   1.45   1.18   2.94   0.13  -0.75   0.58
 3  Penn              -0.76   1.32   1.02   1.71   0.83   0.15   1.47   0.34   0.99   0.66   2.21   0.13  -0.75   0.12
 4  MIT               -0.42   1.30   1.02   1.00   0.82   0.15   0.28  -0.61   0.76   0.64  -0.12   0.13  -0.75   0.46
 5  Northwestern      -0.69   0.77   0.89   0.49   0.60  -0.13  -0.43   0.15   0.77   0.77   1.46   0.13  -0.75   0.12
 6  Duke              -0.14   0.15   0.38   0.28   0.62   0.45   0.11   0.15   0.51  -0.86  -0.01  -1.68  -0.75   0.69
 7  Chicago            0.97   0.41   0.68  -0.74   0.68   0.00   0.45  -1.37   0.51   0.77   0.92   1.95  -0.75   0.69
 8  Columbia          -1.49   1.48   1.32   0.49   1.05   0.34  -0.91   1.11   0.82   2.71   1.39  -1.68   1.28   0.81
 9  Dartmouth         -0.42  -0.39   0.47  -0.03   0.60  -0.19  -0.59  -0.99   1.14  -0.98  -0.65   0.13  -0.75   0.58
10  Berkeley          -0.96  -0.45   0.38   0.28  -1.54   0.22  -0.06  -0.04   0.27   0.34  -0.56   0.13  -0.75  -2.08
11  Michigan          -0.21   0.28  -0.14  -0.64   0.70   0.43  -0.74  -0.30   0.69  -0.57   0.46   0.13  -0.75   0.69
12  Virginia          -0.61  -0.39   0.08  -0.03   0.28  -0.58  -0.74   0.15   0.24  -0.62  -0.44   0.13   1.28   0.69
13  NYU               -0.80  -0.33   0.89  -0.03   1.06   0.90  -0.06   1.68   0.01   2.71  -0.26  -1.68   1.28  -0.12
14  Yale              -0.15   0.12   0.81   1.00   0.47   1.09   0.11   0.73   0.14  -0.17  -0.65   0.13  -0.75   0.81
15  UCLA              -1.76  -0.66   0.51   2.02  -1.24   0.64  -0.91   0.15   0.15  -0.03  -0.09   0.13   1.28  -2.88
16  Cornell            0.62  -0.37  -0.44  -0.85   0.66   0.82  -0.06   0.34   0.50  -0.98  -0.35   1.95  -0.75   0.58
17  North Carolina     0.36  -1.00  -0.22  -1.05  -0.28  -0.38  -0.23  -0.04  -0.17  -0.84  -0.34  -1.68  -0.75   0.58
18  Carnegie-Mellon    0.95   0.42  -0.82  -1.56  -0.23   0.58   1.80  -0.04   0.06  -0.17  -0.64   0.13  -0.75   0.81
19  Texas              1.05  -0.27   0.04  -0.64  -0.82  -1.11  -0.91  -1.24  -0.59  -0.90   0.24   0.13   1.28  -1.04
20  USC                1.05  -1.06  -0.14  -1.05   0.55   2.64  -0.74   0.15  -1.23  -0.03  -0.25   0.13  -0.75  -0.69
21  Indiana            0.48  -0.49  -1.21  -0.54  -1.66  -0.73  -0.40  -1.37  -1.06  -0.59  -0.23   0.13  -0.75  -0.46
22  Emory              1.39  -0.77  -1.21  -1.05   0.07  -1.41   0.45  -1.56  -0.39  -0.78  -0.77   0.13   1.28   0.12
23  Rochester          1.45  -1.01  -1.81  -0.74   0.21  -1.48   3.16  -0.99  -0.92  -0.18  -0.60   1.95   1.28  -1.73
24  Georgetown        -0.05  -0.92  -0.74  -0.54   0.17  -1.39   0.28  -0.80  -0.89   0.30  -0.43   0.13   1.28  -0.69
25  Michigan State     0.15  -0.04  -1.64  -0.44  -2.44  -0.86   0.96  -0.99  -2.21  -0.65  -1.20   0.13  -0.75   0.92
26  Ohio State         1.05  -1.25  -1.47  -0.23  -1.40  -0.51   0.28   2.25  -1.30  -0.71  -1.06   0.13   1.28  -0.92
27  Minnesota          1.78  -1.31  -1.47  -0.74  -1.50  -1.80  -1.25   0.34  -1.98  -0.17  -1.16   0.13   1.28   0.81

Cluster centers and standardized values (columns: A Cluster; B School; C Index; D Accept %; E Enroll %; F GMAT; G GPA; H Cost; I Minority pct; J Non-U.S. pct; K Female %; L Salary; M Density; N St Body; O Age; P Rolling)

Column offset                  2        3        4        5        6        7        8        9       10       11       12       13       14
 A  B                 C        D        E        F        G        H        I        J        K        L        M        N        O        P
 1  Stanford          1  -1.6267   2.0201   1.6671   1.8161   0.7775   1.0862  -1.5873   1.6804   1.7438  -0.8812   0.1812  -1.6849  -0.7526
 2  Penn              3  -0.7560   1.3239   1.0235   1.7137   0.8286   0.1486   1.4654   0.3443   0.9884   0.6572   2.2142   0.1348  -0.7526
 3  Michigan         11  -0.2069   0.2769  -0.1351  -0.6407   0.7026   0.4299  -0.7393  -0.3047   0.6887  -0.5663   0.4554   0.1348  -0.7526
 4  Columbia          8  -1.4927   1.4849   1.3238   0.4853   1.0503   0.3361  -0.9089   1.1078   0.8171   2.7114   1.3943  -1.6849   1.2795
 5  UCLA             15  -1.7606  -0.6559   0.5086   2.0208  -1.2384   0.6362  -0.9089   0.1534   0.1529  -0.0269  -0.0852   0.1348   1.2795
 6  Carnegie-Mellon  18   0.9451   0.4242  -0.8216  -1.5620  -0.2295   0.5799   1.8046  -0.0375   0.0607  -0.1683  -0.6361   0.1348  -0.7526
 7  USC              20   1.0522  -1.0633  -0.1351  -1.0502   0.5506   2.6425  -0.7393   0.1534  -1.2323  -0.0269  -0.2507   0.1348  -0.7526
 8  Georgetown       24  -0.0461  -0.9235  -0.7358  -0.5384   0.1707  -1.3890   0.2783  -0.8010  -0.8887   0.2961  -0.4292   0.1348   1.2795
 9  Michigan State   25   0.1548  -0.0359  -1.6369  -0.4360  -2.4439  -0.8639   0.9566  -0.9919  -2.2142  -0.6507  -1.2025   0.1348  -0.7526
10  Rochester        23   1.4541  -1.0127  -1.8085  -0.7431   0.2133  -1.4827   3.1614  -0.9919  -0.9222  -0.1779  -0.6025   1.9545   1.2795

SumSqDists 140.4160356

Squared distances to centers (the last two columns give the cluster and school each row is assigned to)

School          Index   To 1   To 2   To 3   To 4   To 5   To 6   To 7   To 8   To 9  To 10  Minimum  Cluster  School
Stanford            1    0.0   24.2   24.0   23.0   38.4   49.9   48.4   56.4   72.2  104.8      0.0        1  Stanford
Harvard             2   19.7    5.6   26.5   17.3   45.9   45.7   52.0   55.2   74.9   86.7      5.6        2  Penn
Penn                3   24.2    0.0   18.7   21.3   35.0   29.5   41.3   34.7   52.4   55.2      0.0        2  Penn
MIT                 4   19.4    8.6    8.2   20.1   29.5   18.1   27.0   21.8   35.5   47.4      8.2        3  Michigan
Northwestern        5   18.8    6.2    6.6   14.6   26.4   22.4   25.8   21.5   40.3   51.7      6.2        2  Penn
Duke                6   17.0   17.4    5.7   25.7   31.9   14.1   19.5   20.4   30.1   50.3      5.7        3  Michigan
Chicago             7   49.6   19.6   10.2   38.9   46.5   14.6   24.0   20.8   34.6   33.7     10.2        3  Michigan
Columbia            8   23.0   21.3   27.8    0.0   41.0   46.7   48.1   39.6   70.9   86.3      0.0        4  Columbia
Dartmouth           9   26.3   23.6    3.7   36.0   29.8   16.3   21.7   16.1   28.9   43.1      3.7        3  Michigan
Berkeley           10   36.5   26.8   17.5   39.2    9.9   23.2   21.6   16.6   26.5   38.5      9.9        5  UCLA
Michigan           11   24.0   18.7    0.0   27.8   32.6   11.9   15.1   16.8   28.7   44.0      0.0        3  Michigan
Virginia           12   29.0   26.3    7.6   26.6   22.8   18.8   23.4    8.2   26.3   36.8      7.6        3  Michigan
NYU                13   33.4   29.3   26.7    9.8   32.3   33.8   29.0   26.4   56.8   62.4      9.8        4  Columbia
Yale               14   18.2   15.4    7.5   26.7   26.9   14.7   15.7   22.1   30.2   47.7      7.5        3  Michigan
UCLA               15   38.4   35.0   32.6   41.0    0.0   49.6   36.1   25.6   49.0   54.0      0.0        5  UCLA
Cornell            16   41.6   29.3    6.5   48.7   39.8   10.2   13.4   20.1   28.9   32.9      6.5        3  Michigan
North Carolina     17   36.8   34.0    8.8   39.1   37.2   12.0   17.3   13.6   19.0   41.5      8.8        3  Michigan
Carnegie-Mellon    18   49.9   29.5   11.9   46.7   49.6    0.0   18.4   18.3   17.1   26.0      0.0        6  Carnegie-Mellon
Texas              19   53.0   39.0   16.4   46.6   25.4   23.3   24.4    7.0   22.7   27.1      7.0        8  Georgetown
USC                20   48.4   41.3   15.1   48.1   36.1   18.4    0.0   24.6   34.8   45.5      0.0        7  USC
Indiana            21   57.0   40.4   15.3   56.3   31.0   15.9   21.2   10.3    7.3   27.9      7.3        9  Michigan State
Emory              22   67.9   46.3   18.9   57.2   43.3   15.4   28.7    5.4   18.1   15.6      5.4        8  Georgetown
Rochester          23  104.8   55.2   44.0   86.3   54.0   26.0   45.5   16.5   31.8    0.0      0.0       10  Rochester
Georgetown         24   56.4   34.7   16.8   39.6   25.6   18.3   24.6    0.0   19.2   16.5      0.0        8  Georgetown

Formulas shown on the worksheet:

=VLOOKUP(C42,$A$2:$B$28,2,0)

=VLOOKUP($B40,$B$2:$P$28,D$32,0)

=SUMXMY2($C5:$P5,$D$35:$Q$35)

=MIN(C51:L51)

=MATCH(M53,C53:L53,0)

=VLOOKUP(N55,$A$34:$B$43,2)


Optimization Procedure

We set up the model in a large spreadsheet, as shown here.

The upper section contains the standardized data, the middle section contains information about the 10 centroids, and the lower section evaluates the distances between each school and each of the centroids, and assigns schools to clusters on the basis of minimum distance.


Decision Variables

We begin by setting up cells C34:C43, where Solver can identify which schools are centroids. In this initial solution, all centroids have a value of 1 (the index for Stanford), and the corresponding standardized data for Stanford appear in D34:P43. These indices will be manipulated by Solver to find the best ten centroids.

Cluster centers and standardized values (columns: A Cluster; B School; C Index; D Accept %; E Enroll %; F GMAT; G GPA; H Cost; I Minority pct; J Non-U.S. pct; K Female %; L Salary; M Density; N St Body; O Age; P Rolling)

Column offset                  2        3        4        5        6        7        8        9       10       11       12       13       14
 A  B                 C        D        E        F        G        H        I        J        K        L        M        N        O        P
 1  Stanford          1  -1.6267   2.0201   1.6671   1.8161   0.7775   1.0862  -1.5873   1.6804   1.7438  -0.8812   0.1812  -1.6849  -0.7526
 2  Stanford          1  -1.6267   2.0201   1.6671   1.8161   0.7775   1.0862  -1.5873   1.6804   1.7438  -0.8812   0.1812  -1.6849  -0.7526
 3  Stanford          1  -1.6267   2.0201   1.6671   1.8161   0.7775   1.0862  -1.5873   1.6804   1.7438  -0.8812   0.1812  -1.6849  -0.7526
 4  Stanford          1  -1.6267   2.0201   1.6671   1.8161   0.7775   1.0862  -1.5873   1.6804   1.7438  -0.8812   0.1812  -1.6849  -0.7526
 5  Stanford          1  -1.6267   2.0201   1.6671   1.8161   0.7775   1.0862  -1.5873   1.6804   1.7438  -0.8812   0.1812  -1.6849  -0.7526
 6  Stanford          1  -1.6267   2.0201   1.6671   1.8161   0.7775   1.0862  -1.5873   1.6804   1.7438  -0.8812   0.1812  -1.6849  -0.7526
 7  Stanford          1  -1.6267   2.0201   1.6671   1.8161   0.7775   1.0862  -1.5873   1.6804   1.7438  -0.8812   0.1812  -1.6849  -0.7526
 8  Stanford          1  -1.6267   2.0201   1.6671   1.8161   0.7775   1.0862  -1.5873   1.6804   1.7438  -0.8812   0.1812  -1.6849  -0.7526
 9  Stanford          1  -1.6267   2.0201   1.6671   1.8161   0.7775   1.0862  -1.5873   1.6804   1.7438  -0.8812   0.1812  -1.6849  -0.7526
10  Stanford          1  -1.6267   2.0201   1.6671   1.8161   0.7775   1.0862  -1.5873   1.6804   1.7438  -0.8812   0.1812  -1.6849  -0.7526

=VLOOKUP(C42,$A$2:$B$28,2,0)

=VLOOKUP($B40,$B$2:$P$28,D$32,0)


In the lower section of the worksheet, we calculate the total squared distance from each school to each centroid, and pick the minimum. Cell B45, the objective function in this problem, is the sum of M49:M75.

SumSqDists 1107.029838

Squared distances to centers (the last two columns give the cluster and school each row is assigned to)

School          Index   To 1   To 2   To 3   To 4   To 5   To 6   To 7   To 8   To 9  To 10  Minimum  Cluster  School
Stanford            1    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0      0.0        1  Stanford
Harvard             2   19.7   19.7   19.7   19.7   19.7   19.7   19.7   19.7   19.7   19.7     19.7        1  Stanford
Penn                3   24.2   24.2   24.2   24.2   24.2   24.2   24.2   24.2   24.2   24.2     24.2        1  Stanford
MIT                 4   19.4   19.4   19.4   19.4   19.4   19.4   19.4   19.4   19.4   19.4     19.4        1  Stanford
Northwestern        5   18.8   18.8   18.8   18.8   18.8   18.8   18.8   18.8   18.8   18.8     18.8        1  Stanford
Duke                6   17.0   17.0   17.0   17.0   17.0   17.0   17.0   17.0   17.0   17.0     17.0        1  Stanford
Chicago             7   49.6   49.6   49.6   49.6   49.6   49.6   49.6   49.6   49.6   49.6     49.6        1  Stanford
Columbia            8   23.0   23.0   23.0   23.0   23.0   23.0   23.0   23.0   23.0   23.0     23.0        1  Stanford
Dartmouth           9   26.3   26.3   26.3   26.3   26.3   26.3   26.3   26.3   26.3   26.3     26.3        1  Stanford
Berkeley           10   36.5   36.5   36.5   36.5   36.5   36.5   36.5   36.5   36.5   36.5     36.5        1  Stanford
Michigan           11   24.0   24.0   24.0   24.0   24.0   24.0   24.0   24.0   24.0   24.0     24.0        1  Stanford
Virginia           12   29.0   29.0   29.0   29.0   29.0   29.0   29.0   29.0   29.0   29.0     29.0        1  Stanford
NYU                13   33.4   33.4   33.4   33.4   33.4   33.4   33.4   33.4   33.4   33.4     33.4        1  Stanford
Yale               14   18.2   18.2   18.2   18.2   18.2   18.2   18.2   18.2   18.2   18.2     18.2        1  Stanford
UCLA               15   38.4   38.4   38.4   38.4   38.4   38.4   38.4   38.4   38.4   38.4     38.4        1  Stanford
Cornell            16   41.6   41.6   41.6   41.6   41.6   41.6   41.6   41.6   41.6   41.6     41.6        1  Stanford
North Carolina     17   36.8   36.8   36.8   36.8   36.8   36.8   36.8   36.8   36.8   36.8     36.8        1  Stanford
Carnegie-Mellon    18   49.9   49.9   49.9   49.9   49.9   49.9   49.9   49.9   49.9   49.9     49.9        1  Stanford
Texas              19   53.0   53.0   53.0   53.0   53.0   53.0   53.0   53.0   53.0   53.0     53.0        1  Stanford
USC                20   48.4   48.4   48.4   48.4   48.4   48.4   48.4   48.4   48.4   48.4     48.4        1  Stanford
Indiana            21   57.0   57.0   57.0   57.0   57.0   57.0   57.0   57.0   57.0   57.0     57.0        1  Stanford
Emory              22   67.9   67.9   67.9   67.9   67.9   67.9   67.9   67.9   67.9   67.9     67.9        1  Stanford
Rochester          23  104.8  104.8  104.8  104.8  104.8  104.8  104.8  104.8  104.8  104.8    104.8        1  Stanford
Georgetown         24   56.4   56.4   56.4   56.4   56.4   56.4   56.4   56.4   56.4   56.4     56.4        1  Stanford
Michigan State     25   72.2   72.2   72.2   72.2   72.2   72.2   72.2   72.2   72.2   72.2     72.2        1  Stanford
Ohio State         26   63.5   63.5   63.5   63.5   63.5   63.5   63.5   63.5   63.5   63.5     63.5        1  Stanford
Minnesota          27   78.2   78.2   78.2   78.2   78.2   78.2   78.2   78.2   78.2   78.2     78.2        1  Stanford

=SUMXMY2($C5:$P5,$D$35:$Q$35)
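For readers who want to check a squared distance outside Excel, the following Python sketch mirrors what SUMXMY2 computes: the sum of squared differences between two equal-length ranges. The three attribute rows are copied from the standardized data (Accept % through Job, rounded to two decimals); the helper name `sumxmy2` is simply borrowed from the Excel function for readability.

```python
# Standardized attribute rows (Accept % through Job, 14 values each),
# copied from the standardized data table and rounded to two decimals.
stanford = [-1.63, 2.02, 1.67, 1.82, 0.78, 1.09, -1.59,
            1.68, 1.74, -0.88, 0.18, -1.68, -0.75, 0.58]
harvard = [-1.22, 2.42, 1.11, 1.82, 0.97, 1.09, 0.28,
           1.11, 1.45, 1.18, 2.94, 0.13, -0.75, 0.58]
penn = [-0.76, 1.32, 1.02, 1.71, 0.83, 0.15, 1.47,
        0.34, 0.99, 0.66, 2.21, 0.13, -0.75, 0.12]


def sumxmy2(xs, ys):
    """Sum of squared differences between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("ranges must have the same length")
    return sum((x - y) ** 2 for x, y in zip(xs, ys))


to_stanford = sumxmy2(harvard, stanford)
to_penn = sumxmy2(harvard, penn)
print(round(to_stanford, 1), round(to_penn, 1))  # 19.7 5.6

# The assignment step picks the smaller of the candidate distances.
nearest = min([("Stanford", to_stanford), ("Penn", to_penn)], key=lambda t: t[1])
print(nearest[0])  # Penn
```

The two results agree with the worksheet values for Harvard (19.7 to Stanford, 5.6 to Penn), up to the rounding of the inputs.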


Here’s a close-up view of the cells where the assignments are made:

Minimum  Cluster  School
0.0      6        Stanford
5.6      10       Penn
0.0      10       Penn
8.3      7        Dartmouth
6.2      10       Penn
6.8      7        Dartmouth
10.6     1        Cornell

Minimum: =MIN(C51:L51)

Cluster: =MATCH(M53,C53:L53,0)

School: =VLOOKUP(N55,$A$34:$B$43,2)


Columbia MIT Texas
NYU Duke Emory
Chicago Georgetown
Indiana Dartmouth Ohio State
Michigan State Michigan Minnesota
Virginia
Carnegie-Mellon Yale USC
Cornell
Harvard North Carolina Stanford
Penn
Northwestern UCLA Rochester
Berkeley

Columbia MIT Texas
NYU Northwestern Emory
Duke Georgetown
Indiana Chicago Ohio State
Michigan State Dartmouth Minnesota
Michigan
Harvard Virginia USC
Penn Yale
Cornell Stanford
UCLA North Carolina
Berkeley Carnegie-Mellon Rochester

Columbia MIT Texas
NYU Duke Emory
Chicago Georgetown
Indiana Dartmouth Ohio State
Michigan State Michigan Minnesota
Virginia
Harvard Yale UCLA
Penn Cornell Berkeley
Northwestern North Carolina
Carnegie-Mellon Stanford
Rochester USC

Columbia MIT Texas
NYU Duke Emory
Chicago Georgetown
UCLA Dartmouth Rochester
Berkeley Michigan Ohio State
Virginia Minnesota
Harvard Yale
Penn Cornell Indiana
Northwestern North Carolina Michigan State
Carnegie-Mellon
USC Stanford

Columbia MIT Texas
NYU Duke Emory
Chicago Georgetown
UCLA Dartmouth Rochester
Berkeley Michigan Minnesota
Virginia Indiana
Harvard Yale Michigan State
Penn Cornell Ohio State
Northwestern North Carolina
Carnegie-Mellon Stanford
USC

Columbia MIT Texas
NYU Northwestern Emory
Duke Georgetown
UCLA Chicago Rochester
Berkeley Dartmouth Minnesota
Michigan Indiana
Harvard Virginia Michigan State
Penn Yale Ohio State
Stanford Cornell
North Carolina
Carnegie-Mellon
USC

Stanford Duke Texas
Harvard Chicago Emory
Penn Dartmouth Georgetown
MIT Michigan Rochester
Northwestern Virginia Minnesota
Columbia Yale Indiana
NYU Cornell Michigan State
North Carolina Ohio State
UCLA Carnegie-Mellon
Berkeley USC

Stanford Duke Texas
Harvard Chicago Emory
Penn Dartmouth Georgetown
MIT Michigan Rochester
Northwestern Virginia Minnesota
Columbia Yale Indiana
NYU Cornell Michigan State
Berkeley North Carolina Ohio State
Carnegie-Mellon UCLA
USC

Stanford Duke Dartmouth
Harvard Chicago Yale
Penn Columbia Cornell
MIT NYU Michigan
Northwestern Berkeley
Virginia Carnegie-Mellon Ohio State
UCLA USC Emory
North Carolina Texas Georgetown
Minnesota Indiana Rochester
Michigan State

Stanford Duke Dartmouth
Harvard Chicago Yale
Penn Columbia Cornell
MIT NYU Michigan
Northwestern Carnegie-Mellon Berkeley
Virginia USC Ohio State
UCLA Texas Emory
North Carolina Indiana Georgetown
Minnesota Michigan State Rochester


Cluster Analysis Questions

• Given a certain number of clusters, which schools are grouped together?

• How is the set of clusters affected if we change the number of clusters?

• For each cluster, which school is the most “typical”?

• How different are the clusters from each other?

• What is the best number of clusters?


Given a certain number of clusters, which schools are grouped together?

• Columbia and NYU are always in the same cluster, as are Harvard-Penn, Indiana-Michigan State.

• Michigan-Cornell-Yale-Dartmouth-Chicago-Duke.

• Texas-Emory-Georgetown-Minnesota.

• What happens with UCLA-Berkeley?


How is the set of clusters affected if we change the number of clusters?

• Notice the behavior of Northwestern as we reduce the number of clusters.

• Stanford seems to be very different from all other schools; the last school to have its own cluster.


For each cluster, which school is the most “typical”?

• The centroid represents the school most typical in each cluster.

• We observe that Michigan is almost always the centroid of a large cluster.


How different are the clusters from each other?

• This is difficult to assess with this method; Minitab will provide more useful output.


What is the best number of clusters?

[Figure: TSS vs. Number of Clusters. x-axis: Clusters (1 to 10); y-axis: Total Squared Distances (0 to 500).]
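A curve of total squared distance against the number of clusters can be produced by re-solving the clustering problem for each cluster count and recording the best objective value. The Python sketch below does this by exhaustive search over every possible set of centroids, which is only practical for very small data sets; the six points are hypothetical toy data and the function names are illustrative.

```python
from itertools import combinations


def total_sq_dist(points, medoids):
    """Sum over all points of the squared distance to the nearest centroid."""
    return sum(
        min(sum((x - y) ** 2 for x, y in zip(p, points[m])) for m in medoids)
        for p in points
    )


def best_tss(points, k):
    """Smallest total squared distance over every choice of k centroids."""
    indices = range(len(points))
    return min(total_sq_dist(points, combo) for combo in combinations(indices, k))


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]

tss = {k: best_tss(points, k) for k in range(1, len(points) + 1)}
for k, value in tss.items():
    print(k, value)
```

The total can never increase as clusters are added, and it reaches zero when every observation is its own centroid. One common reading of such a curve is to look for the point where further clusters stop producing large reductions; in this toy example that happens at two clusters.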


Correlation issues?

              Acc. %    Enr. %      GMAT       GPA      Cost    Min. %    Int. %    Fem. %    Salary Pop. Den.      Size       Age   Rolling       Job
Acc. %             1
Enr. %        -0.614         1
GMAT          -0.771     0.723         1
GPA           -0.810     0.637     0.703         1
Cost          -0.299     0.512     0.636     0.232         1
Min. %        -0.436     0.403     0.602     0.347     0.463         1
Int. %         0.336    -0.040    -0.383    -0.202     0.011    -0.218         1
Fem. %        -0.419     0.273     0.377     0.445     0.220     0.489    -0.250         1
Salary        -0.674     0.739     0.838     0.579     0.717     0.476    -0.167     0.255         1
Pop. Den.     -0.416     0.398     0.479     0.294     0.389     0.250     0.045     0.339     0.241         1
Size          -0.485     0.757     0.642     0.531     0.532     0.323    -0.026     0.188     0.632     0.506         1
Age            0.472    -0.231    -0.393    -0.269    -0.117    -0.214     0.436    -0.462    -0.197    -0.226    -0.069         1
Rolling        0.158    -0.423    -0.300    -0.117    -0.166    -0.501    -0.048     0.086    -0.379     0.183    -0.248    -0.037         1
Job            0.045     0.415     0.167    -0.122     0.384     0.082    -0.094     0.082     0.234     0.011     0.145    -0.210    -0.387         1
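As an illustration of where one entry in such a matrix comes from, the Python sketch below computes the Pearson correlation between the GMAT and GPA columns of the MBA table (both lists are in the table's row order, Harvard first). The function name is illustrative; the matrix above reports 0.703 for this pair.

```python
from math import sqrt

# GMAT and GPA columns from the MBA Programs table, in the table's row order.
gmat = [705, 718, 710, 703, 703, 700, 660, 695, 676, 688, 698, 641, 680, 700,
        669, 681, 690, 688, 651, 691, 651, 662, 674, 637, 676, 645, 645]
gpa = [3.58, 3.58, 3.45, 3.57, 3.50, 3.45, 3.25, 3.33, 3.34, 3.43, 3.50, 3.36,
       3.34, 3.40, 3.32, 3.40, 3.40, 3.43, 3.35, 3.60, 3.30, 3.35, 3.30, 3.33,
       3.30, 3.38, 3.33]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson(gmat, gpa)
print(round(r, 3))
```

When two attributes are this strongly correlated, both contribute to every squared distance, so whatever they measure in common carries more weight in the clustering than the equal-weighting assumption might suggest.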


[Figure: Minorities vs. Enroll %. Scatter plot; x-axis: Enroll % (standardized); y-axis: Minority % (standardized). Labeled points: Harvard, Minnesota, USC, Cornell, Columbia, Wharton, MIT, Stanford, NYU.]


[Figure: Women vs. Enroll %. Scatter plot; x-axis: Enroll % (standardized); y-axis: Female % (standardized). Labeled points: Harvard, Minnesota, Ohio State, Emory, Cornell, Columbia, Wharton, MIT, Stanford, NYU.]


[Figure: International vs. Enroll %. Scatter plot; x-axis: Enroll % (standardized); y-axis: Intl % (standardized). Labeled points: Harvard, Minnesota, Cornell, Columbia, Wharton, MIT, Stanford, Carnegie-Mellon, NYU, Rochester.]


[Figure: Population vs. Enroll %. Scatter plot; x-axis: Enroll % (standardized); y-axis: Pop. Density (standardized). Labeled points: Harvard, Minnesota, Cornell, Dartmouth, NYU, Columbia, Wharton, MIT, Stanford.]


[Figure: GMAT vs. Enroll %. Scatter plot; x-axis: Enroll % (standardized); y-axis: GMAT (standardized). Labeled points: Harvard, Minnesota, Cornell, Columbia, Wharton, MIT, Stanford, Rochester, NYU.]


      Number of  Similarity  Distance  Clusters      New  Number of obs.
Step   clusters       level     level    joined  cluster  in new cluster
   1         26     81.1711   1.92787    9   17        9               2
   2         25     79.2999   2.11946    5    6        5               2
   3         24     77.4028   2.31370   21   22       21               2
   4         23     76.9886   2.35612    1    4        1               2
   5         22     76.6438   2.39141    9   10        9               3
   6         21     76.5324   2.40283    5   11        5               3
   7         20     76.5139   2.40472    5    9        5               6
   8         19     76.2752   2.42916    5   23        5               7
   9         18     75.5949   2.49881    1    5        1               9
  10         17     75.1837   2.54091    1   15        1              10
  11         16     74.3490   2.62638    1   16        1              11
  12         15     74.1586   2.64588   13   21       13               3
  13         14     73.5778   2.70534   12   19       12               2
  14         13     72.2647   2.83979   12   13       12               5
  15         12     72.0863   2.85805    1   12        1              16
  16         11     69.3901   3.13412    3   14        3               2
  17         10     69.2620   3.14723   18   20       18               2
  18          9     68.8567   3.18872    1    7        1              17
  19          8     68.8347   3.19098    1    8        1              18
  20          7     65.8747   3.49405   26   27       26               2
  21          6     64.3943   3.64563    1   26        1              20
  22          5     64.2088   3.66462    1   18        1              22
  23          4     64.1918   3.66636    1   25        1              23
  24          3     62.7114   3.81794    1    3        1              25
  25          2     61.3983   3.95238    1   24        1              26
  26          1     59.7832   4.11776    1    2        1              27
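The similarity and distance columns in the amalgamation table appear to be linked by a simple rescaling. A minimal sketch, assuming the similarity level equals 100 × (1 − d / d_max), where d_max is the largest pairwise distance among the observations. The value 10.2389 used for d_max is an assumption inferred from the output (it reproduces the printed similarity levels); the slides do not state it.

```python
# Assumed largest pairwise distance among the 27 schools
# (inferred from the output, not stated in the slides).
D_MAX = 10.2389

def similarity_level(distance, d_max=D_MAX):
    """Rescale a distance level to a 0-100 similarity level."""
    return 100.0 * (1.0 - distance / d_max)

# Distance levels copied from steps 1, 20 and 26 of the amalgamation table.
for step, d in [(1, 1.92787), (20, 3.49405), (26, 4.11776)]:
    print(step, round(similarity_level(d), 2))
```

Under this assumption the three distances map to similarity levels that agree with the table (81.17, 65.87 and 59.78) to two decimals.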


              Number of  Within cluster  Average distance  Maximum distance
           observations  sum of squares     from centroid     from centroid
Cluster1             16         152.337           2.94527           5.25820
Cluster2              1           0.000           0.00000           0.00000
Cluster3              2           4.911           1.56706           1.56706
Cluster4              1           0.000           0.00000           0.00000
Cluster5              1           0.000           0.00000           0.00000
Cluster6              2           4.953           1.57362           1.57362
Cluster7              1           0.000           0.00000           0.00000
Cluster8              1           0.000           0.00000           0.00000
Cluster9              1           0.000           0.00000           0.00000
Cluster10             1           0.000           0.00000           0.00000
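For the two-observation clusters, the figures in this partition table can be checked by hand: the centroid is the midpoint of the two points, so each point lies half the joining distance from it. A small sketch, assuming Cluster3 is the pair joined at step 16 (distance level 3.13412) and Cluster6 the pair joined at step 17 (distance level 3.14723). That pairing is inferred from the output, not stated in the slides.

```python
def two_point_cluster_stats(joining_distance):
    """Average distance from centroid and within-cluster sum of squares
    for a cluster made of exactly two observations."""
    half = joining_distance / 2.0      # each point is half-way from the midpoint
    within_ss = 2.0 * half ** 2        # two equal squared distances
    return half, within_ss

# Joining distances copied from steps 16 and 17 of the amalgamation table.
for label, d in [("Cluster3", 3.13412), ("Cluster6", 3.14723)]:
    avg, ss = two_point_cluster_stats(d)
    print(label, round(avg, 4), round(ss, 3))
```

Both results agree with the printed average distances and sums of squares to within rounding, which supports the assumed pairing.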


Variable                           Cluster1    Cluster2    Cluster3    Cluster4    Cluster5    Cluster6
Accept %                          -0.038602    -1.62669    -1.14449     0.94506     0.97185    -1.35880
Enroll %                           0.107802     2.02010     0.57950     0.42423     0.41448    -0.55259
GMAT                               0.015098     1.66709     1.10928    -0.82163     0.68019     0.44419
GPA                                0.031042     1.81605     0.22938    -1.56203    -0.74310     1.15067
Cost                               0.099078     0.77748     1.05504    -0.22952     0.68133    -1.39023
Minority pct                      -0.163118     1.08618     0.61740     0.57990    -0.00139     0.42989
Non-U.S. pct                      -0.011110    -1.58728    -0.48491     1.80461     0.44785    -0.48491
Female %                          -0.308277     1.68044     1.39412    -0.03747    -1.37362     0.05797
Salary (base+signing)              0.117210     1.74379     0.41147     0.06074     0.50592     0.21257
Pop. Den. In City (people per s   -0.273673    -0.88123     2.71135    -0.16828     0.77406     0.15817
Size: Student Body                 0.132103     0.18125     0.56664    -0.63609     0.92357    -0.32053
Mean Age                           0.021061    -1.68491    -1.68491     0.13479     1.95450     0.13479
Rolling (1=yes)                   -0.244604    -0.75263     1.27947    -0.75263    -0.75263     0.26342
Students w/ first job offer by     0.295438     0.57646     0.34588     0.80705     0.69176    -2.47880

Variable                           Cluster7    Cluster8    Cluster9   Cluster10  Grand centroid
Accept %                            1.45405     1.05221     1.05221     1.77552      -0.0000000
Enroll %                           -1.01274    -1.06333    -1.24651    -1.31486       0.0000000
GMAT                               -1.80853    -0.13508    -1.46526    -1.46526      -0.0000000
GPA                                -0.74310    -1.05020    -0.23127    -0.74310       0.0000000
Cost                                0.21335     0.55060    -1.40373    -1.50438      -0.0000000
Minority pct                       -1.48274     2.64253    -0.50767    -1.80151      -0.0000000
Non-U.S. pct                        3.16136    -0.73930     0.27826    -1.24809       0.0000000
Female %                           -0.99186     0.15341     2.25307     0.34429       0.0000000
Salary (base+signing)              -0.92224    -1.23231    -1.30021    -1.97914       0.0000000
Pop. Den. In City (people per s    -0.17789    -0.02694    -0.71307    -0.16693       0.0000000
Size: Student Body                 -0.60246    -0.25070    -1.05769    -1.16373      -0.0000000
Mean Age                            1.95450     0.13479     0.13479     0.13479      -0.0000000
Rolling (1=yes)                     1.27947    -0.75263     1.27947     1.27947       0.0000000
Students w/ first job offer by     -1.72939    -0.69176    -0.92234     0.80705       0.0000000


Distances Between Cluster Centroids

            Cluster1   Cluster2   Cluster3   Cluster4   Cluster5   Cluster6   Cluster7   Cluster8
Cluster1     0.00000     5.1787    4.66361    3.07784    3.14580    3.86172     5.6111    3.98074
Cluster2     5.17873     0.0000    5.07379    7.06157    7.04249    5.91279    10.2389    6.95451
Cluster3     4.66361     5.0738    0.00000    6.14703    6.00605    5.48510     8.4794    6.00463
Cluster4     3.07784     7.0616    6.14703    0.00000    3.81626    5.82255     5.1003    4.29133
Cluster5     3.14580     7.0425    6.00605    3.81626    0.00000    5.81509     5.8010    4.89544
Cluster6     3.86172     5.9128    5.48510    5.82255    5.81509    0.00000     6.6180    5.13248
Cluster7     5.61115    10.2389    8.47936    5.10025    5.80099    6.61797     0.0000    6.74368
Cluster8     3.98074     6.9545    6.00463    4.29133    4.89544    5.13248     6.7437    0.00000
Cluster9     4.66285     7.9670    6.77100    4.95710    6.69781    5.01759     5.2538    5.19326
Cluster10    4.89897     8.8418    7.36838    5.48899    6.39972    6.38305     5.9808    5.87407

            Cluster9  Cluster10
Cluster1     4.66285    4.89897
Cluster2     7.96699    8.84179
Cluster3     6.77100    7.36838
Cluster4     4.95710    5.48899
Cluster5     6.69781    6.39972
Cluster6     5.01759    6.38305
Cluster7     5.25382    5.98082
Cluster8     5.19326    5.87407
Cluster9     0.00000    3.49405
Cluster10    3.49405    0.00000
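Each entry in this matrix should be the ordinary Euclidean distance between two centroid vectors from the cluster-centroid table. As a hedged check, the sketch below recomputes the Cluster9 to Cluster10 distance from the 14 standardized coordinates copied from that table. The helper name `euclidean` is illustrative only.

```python
import math

# Centroid coordinates copied from the cluster-centroid table
# (14 standardized variables, Accept % through first job offer).
cluster9 = [1.05221, -1.24651, -1.46526, -0.23127, -1.40373, -0.50767, 0.27826,
            2.25307, -1.30021, -0.71307, -1.05769, 0.13479, 1.27947, -0.92234]
cluster10 = [1.77552, -1.31486, -1.46526, -0.74310, -1.50438, -1.80151, -1.24809,
             0.34429, -1.97914, -0.16693, -1.16373, 0.13479, 1.27947, 0.80705]

def euclidean(a, b):
    """Straight-line distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(round(euclidean(cluster9, cluster10), 4))
```

The result should land very close to the 3.49405 shown in the matrix. Both clusters contain a single observation, so this is also the distance level at which the two observations would be joined.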


Married Sophisticates: You're in your late 20s or early 30s, recently married and likely have a household income between $50,000 and $100,000. You probably own a home, most likely in an upscale suburban neighborhood. You're a fan of "green and trendy cars," shop at Banana Republic and The Gap and are a loyal Netflix Inc. subscriber.

Truckin' & Stylin': You're in your 30s or 40s, live in a rural town and earn a moderate income. You may be married, but you don't have any children. You shop at stores like Wal-mart and AutoZone and enjoy watching NASCAR and classic shows on TV Land.

Collegiate Crowd: Between 18 and 23 years old, you're single and highly mobile. You're likely a renter and probably live in a college town. You buy clothes from American Eagle and Express Inc. and are a frequent liquor store patron. Your TV is tuned to Family Guy and you probably have copies of Rolling Stone and Us Weekly lying around.


Shooting Stars: You're in your 30s or 40s, married without any kids. You enjoy a six-figure household income and likely have a graduate degree. You shop at stores like Ann Taylor and Sephora, read magazines like Men's Health and Real Simple and use the web to check your stock investments and make travel plans.

Apple Pie Families: You're part of an upper-middle class family, likely living in a smaller city or nearby suburb. You probably drive a minivan. You shop at stores like Home Depot, Target and Best Buy, read Sports Illustrated and listen to NPR.

City Mixers: You're a childless, single "urbanite" living in a city like New York, Los Angeles or Chicago. Well-educated, you likely enjoy museums and the theater. You buy groceries from Trader Joe's and Whole Foods, outfit your home with Crate & Barrel and buy clothes from Banana Republic. You read The New York Times and watch The Office.


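[Figure: Dendrogram, Single Linkage, Euclidean Distance. y-axis: Similarity (59.78, 73.19, 86.59, 100.00); x-axis: Observations, in the order 2, 24, 14, 3, 25, 20, 18, 27, 26, 8, 7, 22, 21, 13, 19, 12, 16, 15, 23, 10, 17, 9, 11, 6, 5, 4, 1.]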


[Table: the 27 schools listed by observation number under each amalgamation step, steps 1 to 6. Observation numbers: 1 Harvard, 2 Stanford, 3 Columbia, 4 Penn, 5 MIT, 6 Northwestern, 7 Carnegie-Mellon, 8 Chicago, 9 Michigan, 10 Duke, 11 Yale, 12 Michigan State, 13 Texas, 14 NYU, 15 Cornell, 16 Virginia, 17 Dartmouth, 18 Berkeley, 19 Indiana, 20 UCLA, 21 Emory, 22 Georgetown, 23 North Carolina, 24 Rochester, 25 USC, 26 Ohio State, 27 Minnesota.]


[Table: the 27 schools listed by observation number under each amalgamation step, steps 7 to 13.]


[Table: the 27 schools listed by observation number under each amalgamation step, steps 14 to 20.]


[Table: the 27 schools listed by observation number under each amalgamation step, steps 21 to 26.]


Discriminant Analysis

• Concerned with predicting membership in two or more sub-groups (categories) on the basis of predictor variables

• Similar to regression in terms of its purpose: finding a function of independent variables that enables us to correctly forecast the value of a dependent variable

• Dependent variable is categorical


Rick Beck Consumer Credit

Subject  Single  Married  Divorced  Widowed  Credit A  Credit B  Credit C  Credit D  Credit E  Children?  Age    Income      Debt  Female  July Default
      1       1        0         0        0         1         0         0         0         0          0   29   $65,311  $185,246       1             0
      2       0        1         0        0         0         0         1         0         0          1   44   $25,803   $24,699       0             0
      3       0        1         0        0         0         1         0         0         0          1   28   $33,286   $59,406       0             0
      4       0        0         1        0         1         0         0         0         0          0   39   $53,188  $170,868       0             0
      5       0        1         0        0         0         1         0         0         0          1   49   $75,419  $101,881       0             0
      6       0        1         0        0         0         0         0         0         1          1   52   $77,962   $61,582       1             1
      7       0        1         0        0         0         0         1         0         0          1   35   $37,222   $28,267       0             0
      8       0        1         0        0         0         0         0         1         0          1   54   $52,914   $44,654       0             1
      9       0        1         0        0         0         1         0         0         0          1   34   $67,021   $92,176       0             0
     10       0        0         1        0         1         0         0         0         0          1   42   $74,753  $191,216       0             0
     11       0        1         0        0         0         0         1         0         0          1   40   $59,282   $52,319       0             0
     12       1        0         0        0         0         1         0         0         0          0   36   $46,501   $71,008       1             0
     13       0        1         0        0         1         0         0         0         0          1   33   $40,820  $159,388       0             0
     14       1        0         0        0         0         1         0         0         0          0   38   $36,557   $64,047       0             0
     15       0        1         0        0         0         0         1         0         0          1   27   $62,586   $56,442       1             0
     16       1        0         0        0         0         1         0         0         0          0   53   $69,656   $94,161       0             0
     17       0        1         0        0         0         0         1         0         0          1   32   $74,703   $66,860       1             0
     18       0        1         0        0         0         0         1         0         0          1   31   $59,561   $54,065       1             0
     19       0        1         0        0         0         0         1         0         0          1   42   $50,329   $41,829       0             0
     20       0        0         1        0         0         1         0         0         0          1   50   $67,447   $89,373       1             0
     21       0        1         0        0         1         0         0         0         0          1   39   $21,207  $136,043       0             0
     22       0        1         0        0         0         1         0         0         0          1   25   $18,380   $42,364       1             0
     23       0        1         0        0         1         0         0         0         0          1   40   $61,626  $173,986       1             0
     24       0        1         0        0         0         1         0         0         0          1   26   $45,353   $67,803       1             0
     25       1        0         0        0         0         1         0         0         0          1   35   $54,935   $84,884       1             0
     26       1        0         0        0         0         1         0         0         0          0   37   $30,084   $57,730       1             1
     27       0        1         0        0         0         0         0         1         0          1   45   $40,077   $33,029       1             0
     28       0        1         0        0         1         0         0         0         0          1   47   $29,328  $147,257       1             0
     29       0        1         0        0         0         1         0         0         0          1   44   $33,745   $58,952       1             0
     30       1        0         0        0         0         1         0         0         0          0   24   $18,004   $45,675       1             0


Excel Method

• Standardize data

• Create coefficients for each independent variable

• Create a “score” for each observation (the sumproduct of the independent variables and the coefficients)

• Create a “cut-off” value


Excel Method

• Use the cut-off value as a decision rule for categorization, giving a predicted value of the dependent variable

• Track how many observations are correctly predicted using the current coefficients and cut-off value

• Optimize the coefficients and cut-off value so as to maximize the number of correct predictions
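The Excel steps listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the spreadsheet itself: it uses only three predictors (Age, Income, Debt) for subjects 1 to 6 of the credit data, standardizes with the sample standard deviation (the slides do not say which form is used), and replaces the spreadsheet optimizer with a crude random search. The function names are hypothetical.

```python
import random
import statistics

def standardize(rows):
    """Convert each column to z-scores (sample standard deviation assumed)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def number_correct(z_rows, actual, coefs, cutoff):
    """Score each row, apply the cut-off rule, and count correct predictions."""
    correct = 0
    for z, y in zip(z_rows, actual):
        score = sum(c * v for c, v in zip(coefs, z))   # sumproduct of coefficients and row
        prediction = 1 if score > cutoff else 0        # decision rule
        correct += int(prediction == y)
    return correct

def random_search(z_rows, actual, n_iter=2000, seed=0):
    """Crude stand-in for a spreadsheet optimizer: keep the best random guess."""
    rng = random.Random(seed)
    k = len(z_rows[0])
    best_coefs, best_cutoff = [1.0] * k, 1.0           # start: all coefficients 1, cut-off 1
    best = number_correct(z_rows, actual, best_coefs, best_cutoff)
    for _ in range(n_iter):
        coefs = [rng.gauss(0, 1) for _ in range(k)]
        cutoff = rng.gauss(0, 1)
        n = number_correct(z_rows, actual, coefs, cutoff)
        if n > best:
            best, best_coefs, best_cutoff = n, coefs, cutoff
    return best_coefs, best_cutoff, best

# Age, Income, Debt and July Default for subjects 1-6 of the credit data.
X = [[29, 65311, 185246], [44, 25803, 24699], [28, 33286, 59406],
     [39, 53188, 170868], [49, 75419, 101881], [52, 77962, 61582]]
y = [0, 0, 0, 0, 0, 1]

Z = standardize(X)
baseline = number_correct(Z, y, [1.0, 1.0, 1.0], 1.0)
coefs, cutoff, best = random_search(Z, y)
print(baseline, best)
```

Because the count of correct predictions is a step function of the coefficients, gradient-based optimizers tend to stall on it, which is one reason a search-style method is used in this sketch. The search can only match or improve on the all-ones starting point.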


Discriminant Scores

Coefficients (B2:O2, Single through Female): 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Cut-off Value: 1

Number Correct: 582

Subject  Single  Married  Divorced  Widowed  Credit A  Credit B  Credit C  Credit D  Credit E  Children?    Age  Income   Debt  Female  July Default  Score  Prediction
      1    1.57    -1.13     -0.31    -0.25      1.65     -0.60     -0.64     -0.35     -0.25      -1.19  -1.06    0.85   1.92    1.01             0   1.20           1
      2   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84   0.37   -1.27  -1.12   -0.99             0  -2.76           0
      3   -0.64     0.88     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84  -1.15   -0.87  -0.47   -0.99             0  -3.17           0
      4   -0.64    -1.13      3.20    -0.25      1.65     -0.60     -0.64     -0.35     -0.25      -1.19  -0.11    0.20   1.65   -0.99             0   0.53           0
      5   -0.64     0.88     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84   0.85    1.40   0.34   -0.99             0   1.90           1
      6   -0.64     0.88     -0.31    -0.25     -0.60     -0.60     -0.64     -0.35      3.96       0.84   1.13    1.53  -0.43    1.01             1   5.51           1
      7   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84  -0.49   -0.66  -1.06   -0.99             0  -2.94           0
      8   -0.64     0.88     -0.31    -0.25     -0.60     -0.60     -0.64      2.81     -0.25       0.84   1.32    0.18  -0.75   -0.99             1   1.00           0
      9   -0.64     0.88     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84  -0.58    0.94   0.15   -0.99             0  -0.16           0
     10   -0.64    -1.13      3.20    -0.25      1.65     -0.60     -0.64     -0.35     -0.25       0.84   0.18    1.36   2.03   -0.99             0   4.39           1
     11   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84  -0.01    0.53  -0.60   -0.99             0  -0.82           0
     12    1.57    -1.13     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25      -1.19  -0.39   -0.16  -0.25    1.01             0  -1.31           0
     13   -0.64     0.88     -0.31    -0.25      1.65     -0.60     -0.64     -0.35     -0.25       0.84  -0.68   -0.47   1.43   -0.99             0  -0.39           0
     14    1.57    -1.13     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25      -1.19  -0.20   -0.70  -0.38   -0.99             0  -3.79           0
     15   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84  -1.25    0.71  -0.52    1.01             0   0.20           0
     16    1.57    -1.13     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25      -1.19   1.23    1.09   0.19   -0.99             0  -0.01           0
     17   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84  -0.77    1.36  -0.33    1.01             0   1.52           1
     18   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84  -0.87    0.54  -0.57    1.01             0   0.37           0
     19   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84   0.18    0.05  -0.80   -0.99             0  -1.31           0
     20   -0.64    -1.13      3.20    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84   0.94    0.97   0.10    1.01             0   4.83           1
     21   -0.64     0.88     -0.31    -0.25      1.65     -0.60     -0.64     -0.35     -0.25       0.84  -0.11   -1.52   0.99   -0.99             0  -1.32           0
     22   -0.64     0.88     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84  -1.44   -1.67  -0.79    1.01             0  -2.58           0
     23   -0.64     0.88     -0.31    -0.25      1.65     -0.60     -0.64     -0.35     -0.25       0.84  -0.01    0.65   1.71    1.01             0   3.67           1
     24   -0.64     0.88     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84  -1.34   -0.22  -0.31    1.01             0  -0.55           0
     25    1.57    -1.13     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84  -0.49    0.29   0.02    1.01             0   1.34           1
     26    1.57    -1.13     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25      -1.19  -0.30   -1.04  -0.50    1.01             1  -2.35           0
     27   -0.64     0.88     -0.31    -0.25     -0.60     -0.60     -0.64      2.81     -0.25       0.84   0.46   -0.51  -0.97    1.01             0   1.23           1
     28   -0.64     0.88     -0.31    -0.25      1.65     -0.60     -0.64     -0.35     -0.25       0.84   0.65   -1.08   1.20    1.01             0   2.09           1
     29   -0.64     0.88     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84   0.37   -0.85  -0.47    1.01             0   0.37           0


July Default  Score  Prediction  Correct?
           0   1.20           1         0
           0  -2.76           0         1
           0  -3.17           0         1
           0   0.53           0         1
           0   1.90           1         0
           1   5.51           1         1
           0  -2.94           0         1
           1   1.00           0         0
           0  -0.16           0         1
           0   4.39           1         0
           0  -0.82           0         1
           0  -1.31           0         1
           0  -0.39           0         1
           0  -3.79           0         1

Score (column Q): =SUMPRODUCT($B$2:$O$2,B12:O12)

Prediction (column R): =IF(Q14>$B$4,1,0)

Correct? (column S): =IF(R16=P16,1,0)


Discriminant Scores

Coefficients (B2:E2 shown): 1 1 1 1

Cut-off Value: 1

Number Correct: 582

Subject  Single  Married  Divorced  Widowed
      1    1.57    -1.13     -0.31    -0.25
      2   -0.64     0.88     -0.31    -0.25
      3   -0.64     0.88     -0.31    -0.25
      4   -0.64    -1.13      3.20    -0.25

Number Correct: =SUM(S10:S1009)


Discriminant Scores

Coefficients (B2:O2):
Single       2E+29
Married      2E+29
Divorced     -6.7E+28
Widowed      8.375E+28
Credit A     -5.9E+29
Credit B     -3.1E+29
Credit C     -1.6E+29
Credit D     1.9E+29
Credit E     6.9E+29
Children?    -2.8E+29
Age          2.40956E+28
Income       -1.7E+29
Debt         3E+29
Female       2.8E+28

Cut-off Value: 7E+29

Number Correct: 877

Subject  Single  Married  Divorced  Widowed  Credit A  Credit B  Credit C  Credit D  Credit E  Children?    Age  Income   Debt  Female  July Default      Score
      1    1.57    -1.13     -0.31    -0.25      1.65     -0.60     -0.64     -0.35     -0.25      -1.19  -1.06    0.85   1.92    1.01             0  -1.51E+29
      2   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84   0.37   -1.27  -1.12   -0.99             0  -2.36E+29
      3   -0.64     0.88     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84  -1.15   -0.87  -0.47   -0.99             0  -5.00E+29
      4   -0.64    -1.13      3.20    -0.25      1.65     -0.60     -0.64     -0.35     -0.25      -1.19  -0.11    0.20   1.65   -0.99             0  -7.51E+29
      5   -0.64     0.88     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84   0.85    1.40   0.34   -0.99             0  -6.07E+29
      6   -0.64     0.88     -0.31    -0.25     -0.60     -0.60     -0.64     -0.35      3.96       0.84   1.13    1.53  -0.43    1.01             1   2.83E+30
      7   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84  -0.49   -0.66  -1.06   -0.99             0  -3.42E+29
      8   -0.64     0.88     -0.31    -0.25     -0.60     -0.60     -0.64      2.81     -0.25       0.84   1.32    0.18  -0.75   -0.99             1   6.15E+29
      9   -0.64     0.88     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84  -0.58    0.94   0.15   -0.99             0  -6.17E+29
     10   -0.64    -1.13      3.20    -0.25      1.65     -0.60     -0.64     -0.35     -0.25       0.84   0.18    1.36   2.03   -0.99             0  -1.40E+30
     11   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84  -0.01    0.53  -0.60   -0.99             0  -4.02E+29
     12    1.57    -1.13     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25      -1.19  -0.39   -0.16  -0.25    1.01             0   3.80E+28
     13   -0.64     0.88     -0.31    -0.25      1.65     -0.60     -0.64     -0.35     -0.25       0.84  -0.68   -0.47   1.43   -0.99             0  -6.36E+29
     14    1.57    -1.13     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25      -1.19  -0.20   -0.70  -0.38   -0.99             0   3.93E+28
     15   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84  -1.25    0.71  -0.52    1.01             0  -3.83E+29
     16    1.57    -1.13     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25      -1.19   1.23    1.09   0.19   -0.99             0  -6.58E+28
     17   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84  -0.77    1.36  -0.33    1.01             0  -4.26E+29
     18   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84  -0.87    0.54  -0.57    1.01             0  -3.59E+29
     19   -0.64     0.88     -0.31    -0.25     -0.60     -0.60      1.56     -0.35     -0.25       0.84   0.18    0.05  -0.80   -0.99             0  -3.72E+29
     20   -0.64    -1.13      3.20    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84   0.94    0.97   0.10    1.01             0  -1.19E+30
     21   -0.64     0.88     -0.31    -0.25      1.65     -0.60     -0.64     -0.35     -0.25       0.84  -0.11   -1.52   0.99   -0.99             0  -5.69E+29
     22   -0.64     0.88     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84  -1.44   -1.67  -0.79    1.01             0  -4.06E+29
     23   -0.64     0.88     -0.31    -0.25      1.65     -0.60     -0.64     -0.35     -0.25       0.84  -0.01    0.65   1.71    1.01             0  -6.74E+29
     24   -0.64     0.88     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84  -1.34   -0.22  -0.31    1.01             0  -5.13E+29
     25    1.57    -1.13     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84  -0.49    0.29   0.02    1.01             0  -5.34E+29
     26    1.57    -1.13     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25      -1.19  -0.30   -1.04  -0.50    1.01             1   1.18E+29
     27   -0.64     0.88     -0.31    -0.25     -0.60     -0.60     -0.64      2.81     -0.25       0.84   0.46   -0.51  -0.97    1.01             0   7.05E+29
     28   -0.64     0.88     -0.31    -0.25      1.65     -0.60     -0.64     -0.35     -0.25       0.84   0.65   -1.08   1.20    1.01             0  -5.08E+29
     29   -0.64     0.88     -0.31    -0.25     -0.60      1.65     -0.64     -0.35     -0.25       0.84   0.37   -0.85  -0.47    1.01             0  -4.13E+29


* ERROR * After subtracting group means, Credit E is highly correlated with other predictors.

* ERROR * Calculations for discriminant analysis cannot be done.
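A likely cause of this error is the indicator (dummy) coding: when every subject belongs to exactly one credit category, the five Credit columns always sum to 1, so any one of them is an exact linear function of the other four. The sketch below checks this on the Credit A to Credit E columns for subjects 1 to 8 of the credit data. The interpretation is an assumption, though it is consistent with the predictor list in the Minitab run that follows, which omits Single and Credit E.

```python
# Credit A-E indicator columns for subjects 1-8 of the credit data.
credit = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
]

# Every subject falls in exactly one credit category, so each row sums to 1 ...
row_sums = [sum(row) for row in credit]

# ... which means Credit E can be rebuilt exactly from the other four columns.
rebuilt_e = [1 - sum(row[:4]) for row in credit]
actual_e = [row[4] for row in credit]

print(row_sums)
print(rebuilt_e == actual_e)
```

The same argument applies to the marital-status indicators (Single, Married, Divorced, Widowed), which is why one column from each group is usually left out as the reference category.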


Linear Method for Response: Default

Predictors: Married, Divorced, Widowed, Credit A, Credit B, Credit C, Credit D, Children, Age, Income, Debt, Male

Group      0    1
Count    847  153

Summary of classification

                 True Group
Put into Group       0      1
0                  757     56
1                   90     97
Total N            847    153
N correct          757     97
Proportion       0.894  0.634

N = 1000    N Correct = 854    Proportion Correct = 0.854
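The summary rows follow directly from the four classification counts. A short sketch that recomputes them, with the counts copied from the output above and illustrative variable names:

```python
# Classification counts from the Minitab summary:
# outer key = group the observation was put into, inner key = true group.
put_into = {0: {0: 757, 1: 56},
            1: {0: 90, 1: 97}}

total_n = {g: put_into[0][g] + put_into[1][g] for g in (0, 1)}   # column totals
n_correct = {g: put_into[g][g] for g in (0, 1)}                  # diagonal entries
proportion = {g: n_correct[g] / total_n[g] for g in (0, 1)}

overall = sum(n_correct.values()) / sum(total_n.values())
print(total_n, n_correct)
print({g: round(p, 3) for g, p in proportion.items()}, round(overall, 3))
```

The per-group proportions show that the rule is much better at recognizing non-defaulters (group 0) than defaulters (group 1), something the overall 0.854 figure hides.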


Squared Distance Between Groups

         0        1
0  0.00000  3.43784
1  3.43784  0.00000

Linear Discriminant Function for Groups

               0       1
Constant  -33.44  -27.08
Married     1.68    0.52
Divorced    0.23   -0.45
Widowed    -1.07   -2.23
Credit A  233.62  228.17
Credit B   86.86   80.92
Credit C   38.30   32.32
Credit D   34.27   31.99
Children    1.88    1.02
Age         0.31    0.30
Income      0.00    0.00
Debt       -0.00   -0.00
Male        2.49    2.46


Summary of Misclassified Observations

True Pred SquaredObservation Group Group Group Distance Probability 26** 1 0 0 8.366 0.627 1 9.402 0.373 27** 0 1 0 12.842 0.172 1 9.699 0.828 32** 0 1 0 18.46 0.032 1 11.67 0.968 37** 0 1 0 10.55 0.468 1 10.29 0.532 40** 0 1 0 15.651 0.027 1 8.499 0.973 52** 0 1 0 14.99 0.172 1 11.85 0.828 62** 1 0 0 9.263 0.872 1 13.104 0.128 64** 0 1 0 16.556 0.026 1 9.290 0.974
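The probabilities in this table appear to be tied to the squared distances: the closer an observation is to a group, the higher its probability of belonging to it. The sketch below assumes the usual linear discriminant relationship, probability proportional to exp(-distance/2). That formula is an assumption about how the output was computed, not something stated on the slides, although it reproduces the rows checked here.

```python
import math

def posterior_from_distances(sq_dist):
    """Convert {group: squared distance} into {group: probability}.

    Assumes probability is proportional to exp(-0.5 * squared distance),
    with any prior weighting already folded into the distances.
    """
    weights = {g: math.exp(-0.5 * d) for g, d in sq_dist.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Squared distances copied from the misclassified-observations table.
obs26 = posterior_from_distances({0: 8.366, 1: 9.402})
obs62 = posterior_from_distances({0: 9.263, 1: 13.104})
print({g: round(p, 3) for g, p in obs26.items()})
print({g: round(p, 3) for g, p in obs62.items()})
```

Observation 37 is the interesting case: its two distances are nearly equal, so the probabilities are close to 50/50 and the classification is a near coin flip.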

Excel vs. Minitab

• In this case the Excel method found a better solution

• The Excel method illustrated here is really only good for distinguishing between two groups; the Minitab method is more general (multiple groups)

• For the special case in which there are only two groups, there is a better tool called logistic regression (or logit regression). This will be the topic for Session 10.

Steam Case

Detergent plant energy costs are increasing and the company wants to control them better.

The actual causes of energy use are not well understood beyond an intuitive grasp.

The problem is complex, as there are many hypothesized causes.

Relatively good historical records have been maintained.

Question: Can past data be used to build a descriptive and predictive model?

Why the Variation? How can steam usage be predicted or controlled?

Historical plant data are available for a 25-month period on:

STEAM Thousands of tons of high pressure steam used

INV Tons of inventory of fats

PROD Tons of detergent production

WIND Average wind velocity

CDAY Calendar days in the month

OPDAY Operating days in the month

FDAY Number of days below freezing

TEMP Average outside air temperature

STARTS Number of production start-ups

SEQ  MONTH  STEAM   INV  PROD  WIND  CDAY  OPDAY  FDAY  TEMP  STARTS
  1      1  10.98  5.20  0.61   7.4    31     20    22  35.3       4
  2      2  11.13  5.12  0.64   8.0    29     20    25  29.7       5
  3      3  12.51  6.19  0.78   7.4    31     23    17  30.8       4
  4      4   8.40  3.89  0.49   7.5    30     20    22  58.8       4
  5      5   9.27  6.28  0.84   5.5    31     21     0  61.4       5
  6      6   8.73  5.76  0.74   8.9    30     22     0  71.3       4
  7      7   6.36  3.45  0.42   4.1    31     11     0  74.4       2
  8      8   8.50  6.57  0.87   4.1    31     23     0  76.7       5
  9      9   7.82  5.69  0.75   4.1    30     21     0  70.7       4
 10     10   9.14  6.14  0.76   4.5    31     20     0  57.5       5
 11     11   8.24  4.84  0.65  10.3    30     20    11  46.4       4
 12     12  12.19  4.88  0.62   6.9    31     21    12  28.9       4
 13      1  11.88  6.03  0.79   6.6    31     21    25  28.1       5
 14      2   9.57  4.55  0.60   7.3    28     19    18  39.1       5
 15      3  10.94  5.71  0.70   8.1    31     23     5  46.8       4
 16      4   9.58  5.67  0.74   8.4    30     20     7  48.5       4
 17      5  10.09  6.72  0.85   6.1    31     22     0  59.3       6
 18      6   8.11  4.95  0.67   4.9    30     22     0  70.0       4
 19      7   6.83  4.62  0.45   4.6    31     11     0  70.0       3
 20      8   8.88  6.60  0.95   3.7    31     23     0  74.5       4
 21      9   7.68  5.01  0.64   4.7    30     20     0  72.1       4
 22     10   8.47  5.68  0.75   5.3    31     21     1  58.1       6
 23     11   8.86  5.28  0.70   6.2    30     20    14  44.6       4
 24     12  10.36  5.36  0.67   6.8    31     20    22  33.4       4
 25      1  11.08  5.87  0.70   7.5    31     22    28  28.6       5

[Figure: Run Chart. Steam (0 to 14) plotted by Month over the 25-month period.]

Building a Useful Regression Model

• Multiple regression will permit us to fit Steam versus any combination of the explanatory variables.

• It is not necessarily good to use them all. The scientific principle of modeling parsimony should be applied.

• Fitting the “best” simple model and fitting a “full model” can be useful diagnostics.

• Bottom Line: We will find that several quite good and simple models can be developed in this case.

            SEQ    MONTH    STEAM      INV     PROD     WIND     CDAY    OPDAY     FDAY     TEMP   STARTS
SEQ           1
MONTH    0.3438        1
STEAM   -0.1415  -0.4122        1
INV      0.1140  -0.0601   0.3832        1
PROD     0.1088   0.0268   0.3055   0.9436        1
WIND    -0.2062  -0.3179   0.4743  -0.1261  -0.1437        1
CDAY     0.1102   0.1738   0.1367   0.3821   0.2482  -0.3168        1
OPDAY    0.0094  -0.1017   0.5361   0.6851   0.7645   0.2311   0.0201        1
FDAY    -0.0908  -0.4498   0.6406  -0.1911  -0.2264   0.5581  -0.2048   0.1169        1
TEMP     0.0257   0.3296  -0.8452  -0.0019   0.0677  -0.6163   0.0774  -0.2098  -0.8576        1
STARTS   0.1195  -0.1651   0.3821   0.6163   0.6013   0.0739  -0.0533   0.6006   0.1175  -0.2370        1


One approach to building a model is to start with one independent variable, and then add others sequentially on the basis of their correlation with the dependent variable. (We will examine variants of this approach next time.)

Here are the independent variables, ranked by the absolute value of their correlation with Steam:

            STEAM
TEMP      -0.8452
FDAY       0.6406
OPDAY      0.5361
WIND       0.4743
MONTH     -0.4122
INV        0.3832
STARTS     0.3821
PROD       0.3055
CDAY       0.1367
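The ranking step is simple to reproduce: sort the candidate variables by the magnitude of their correlation with STEAM, ignoring sign. A minimal sketch using the values from the correlation matrix (the dictionary name is illustrative):

```python
# Correlations with STEAM, copied from the STEAM row and column of the correlation matrix.
corr_with_steam = {
    "MONTH": -0.4122, "INV": 0.3832, "PROD": 0.3055, "WIND": 0.4743,
    "CDAY": 0.1367, "OPDAY": 0.5361, "FDAY": 0.6406, "TEMP": -0.8452,
    "STARTS": 0.3821,
}

# Rank by absolute value: a strong negative correlation is as useful as a strong positive one.
ranked = sorted(corr_with_steam.items(), key=lambda item: abs(item[1]), reverse=True)

for name, r in ranked:
    print(f"{name:<7}{r:>8.4f}")
```

This ordering only reflects each variable's individual relationship with STEAM; it says nothing about how much a variable adds once others are already in the model.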

[Figure: Steam vs. Temp. Scatter plot of Steam (0 to 14) against Temp (0 to 90).]

Regression Statistics
Multiple R           0.8452
R Square             0.7144
Adjusted R Square    0.7020
Standard Error       0.8901
Observations         25

ANOVA
              df        SS        MS         F   Significance F
Regression     1   45.5924   45.5924   57.5428           0.0000
Residual      23   18.2234    0.7923
Total         24   63.8158

             Coefficients   Standard Error     t Stat   P-value
Intercept         13.6230           0.5815    23.4288    0.0000
TEMP              -0.0798           0.0105    -7.5857    0.0000
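For readers who want to see where the intercept, slope, and R Square come from, here is a minimal sketch of the simple least-squares calculation in plain Python. The data are the TEMP and STEAM columns from the data table; the function name `simple_ols` is illustrative and is not how Excel labels anything.

```python
# TEMP and STEAM for the 25 months, copied from the data table.
temp = [35.3, 29.7, 30.8, 58.8, 61.4, 71.3, 74.4, 76.7, 70.7, 57.5, 46.4, 28.9, 28.1,
        39.1, 46.8, 48.5, 59.3, 70.0, 70.0, 74.5, 72.1, 58.1, 44.6, 33.4, 28.6]
steam = [10.98, 11.13, 12.51, 8.40, 9.27, 8.73, 6.36, 8.50, 7.82, 9.14, 8.24, 12.19, 11.88,
         9.57, 10.94, 9.58, 10.09, 8.11, 6.83, 8.88, 7.68, 8.47, 8.86, 10.36, 11.08]

def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)                                 # total sum of squares
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    return intercept, slope, 1 - sse / sst

intercept, slope, r_squared = simple_ols(temp, steam)
print(round(intercept, 4), round(slope, 4), round(r_squared, 4))

# Predicted steam for month 1 (TEMP = 35.3).
print(round(intercept + slope * temp[0], 2))
```

The slope says each additional degree of average outside temperature is associated with roughly 0.08 fewer units of steam.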

SEQ  MONTH  STEAM  TEMP  Predicted STEAM  Residuals
  1      1  10.98  35.3            10.81       0.17
  2      2  11.13  29.7            11.25      -0.12
  3      3  12.51  30.8            11.16       1.35
  4      4   8.40  58.8             8.93      -0.53
  5      5   9.27  61.4             8.72       0.55
  6      6   8.73  71.3             7.93       0.80
  7      7   6.36  74.4             7.68      -1.32
  8      8   8.50  76.7             7.50       1.00
  9      9   7.82  70.7             7.98      -0.16
 10     10   9.14  57.5             9.03       0.11
 11     11   8.24  46.4             9.92      -1.68
 12     12  12.19  28.9            11.32       0.87
 13      1  11.88  28.1            11.38       0.50
 14      2   9.57  39.1            10.50      -0.93
 15      3  10.94  46.8             9.89       1.05
 16      4   9.58  48.5             9.75      -0.17
 17      5  10.09  59.3             8.89       1.20
 18      6   8.11  70.0             8.03       0.08
 19      7   6.83  70.0             8.03      -1.20
 20      8   8.88  74.5             7.68       1.20
 21      9   7.68  72.1             7.87      -0.19
 22     10   8.47  58.1             8.98      -0.51
 23     11   8.86  44.6            10.06      -1.20
 24     12  10.36  33.4            10.96      -0.60
 25      1  11.08  28.6            11.34      -0.26

[Figure: Intervals. Steam (0 to 18) vs. Temp (0 to 90), showing the Data, Y-hat, Prediction LCL and UCL, and Confidence LCL and UCL.]

Let’s add “Days Below Freezing” to the model:

Regression Statistics
Multiple R           0.8610
R Square             0.7413
Adjusted R Square    0.7177
Standard Error       0.8663
Observations         25

ANOVA
              df        SS        MS         F   Significance F
Regression     2   47.3047   23.6524   31.5153           0.0000
Residual      22   16.5111    0.7505
Total         24   63.8158

             Coefficients   Standard Error     t Stat   P-value
Intercept         15.4426           1.3310    11.6025    0.0000
FDAY              -0.0505           0.0334    -1.5105    0.1451
TEMP              -0.1056           0.0199    -5.3039    0.0000

Not encouraging; why?

Let’s skip “Days Below Freezing” and try “Operating Days”:

Regression Statistics
Multiple R           0.9215
R Square             0.8491
Adjusted R Square    0.8354
Standard Error       0.6616
Observations         25

ANOVA
              df        SS        MS         F   Significance F
Regression     2   54.1871   27.0935   61.9043           0.0000
Residual      22    9.6287    0.4377
Total         24   63.8158

             Coefficients   Standard Error     t Stat   P-value
Intercept          9.1269           1.1028     8.2761    0.0000
TEMP              -0.0724           0.0080    -9.0498    0.0000
OPDAY              0.2028           0.0458     4.4314    0.0002

Better!

The most obvious 3-variable model:

Regression Statistics
Multiple R           0.9279
R Square             0.8609
Adjusted R Square    0.8411
Standard Error       0.6501
Observations         25

ANOVA
              df        SS        MS         F   Significance F
Regression     3   54.9417   18.3139   43.3388           0.0000
Residual      21    8.8741    0.4226
Total         24   63.8158

             Coefficients   Standard Error     t Stat   P-value
Intercept         10.1988           1.3482     7.5647    0.0000
TEMP              -0.0802           0.0098    -8.1868    0.0000
OPDAY              0.2108           0.0454     4.6470    0.0001
WIND              -0.1295           0.0969    -1.3363    0.1957

The “Wind Velocity” variable doesn’t seem to add much.

A better 3-variable model:

Regression Statistics
Multiple R           0.9379
R Square             0.8796
Adjusted R Square    0.8624
Standard Error       0.6049
Observations         25

ANOVA
              df        SS        MS         F   Significance F
Regression     3   56.1310   18.7103   51.1290           0.0000
Residual      21    7.6848    0.3659
Total         24   63.8158

             Coefficients   Standard Error     t Stat   P-value
Intercept          8.5663           1.0373     8.2581    0.0000
TEMP              -0.0758           0.0075   -10.1578    0.0000
INV                0.4880           0.2117     2.3048    0.0315
OPDAY              0.1082           0.0586     1.8456    0.0791

This one is better; the adjusted R-square is up to 0.8624.
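Adjusted R-square is the right yardstick here because plain R-square can never go down when a variable is added. A short sketch of the standard adjustment, applied to the R Square values reported above (n is the number of observations, k the number of predictors; the function name is illustrative):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square: penalizes R-square for the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

two_var = adjusted_r2(0.8491, n=25, k=2)    # TEMP + OPDAY
three_var = adjusted_r2(0.8796, n=25, k=3)  # TEMP + INV + OPDAY
print(round(two_var, 4), round(three_var, 4))
```

Adding INV raises adjusted R-square, so the gain in fit is more than the penalty for the extra coefficient.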


There are some useful inferences to be made from the “full” model, even though it contains some insignificant variables and may not be the best model for forecasting purposes.

Regression Statistics
Multiple R           0.9539
R Square             0.9098
Adjusted R Square    0.8648
Standard Error       0.5997
Observations         25

ANOVA
              df        SS        MS         F   Significance F
Regression     8   58.0619    7.2577   20.1818           0.0000
Residual      16    5.7539    0.3596
Total         24   63.8158

             Coefficients   Standard Error     t Stat   P-value
Intercept          6.2909           6.8096     0.9238    0.3693
INV                0.9365           0.5760     1.6257    0.1235
PROD              -4.6388           3.9979    -1.1603    0.2629
WIND              -0.0868           0.1035    -0.8388    0.4139
CDAY               0.1053           0.2152     0.4893    0.6313
OPDAY              0.2206           0.0810     2.7222    0.0151
FDAY              -0.0181           0.0258    -0.7001    0.4939
TEMP              -0.0874           0.0163    -5.3696    0.0001
STARTS            -0.2548           0.2142    -1.1896    0.2516

Conclusions from the Full Model

Note that R² for the simple model with only TEMP was 71%, while for the full model it was 91%. The full-model R² becomes a benchmark for other models.

TEMP is still very significant in the full model, even given that all other variables are in the model.

OPDAY is the only other variable that is significant on a “last-in” basis.

INV is of borderline significance on a “last-in” basis.

These three variables’ significance is evaluated on what might be called a “worst-case” basis.

Examine the correlations among these variables:

           STEAM      INV    OPDAY     TEMP
STEAM          1
INV       0.3832        1
OPDAY     0.5361   0.6851        1
TEMP     -0.8452  -0.0019  -0.2098        1

Validating a Three-Parameter Model

The bottom line is whether the model makes good predictions:

[Figure: Predictions vs. Observations. Steam (0 to 14) by Months 1 through 25, showing Actual Data and Predictions.]

Residual Analysis

[Figure: Histogram of Residuals. Frequency (0 to 12) vs. Residual Error (-2.5 to 2.5).]

[Figure: Normal Probability Plot. Standardized Residual (-3 to 3) vs. Normal Score (-3.0 to 3.0).]

[Figure: Residuals vs. Observations. Residual Error (-2.0 to 2.0) vs. Steam (6 to 13).]

[Figure: Residuals vs. Predictions. Residual Error (-2.0 to 2.0) vs. Predicted Steam (0 to 14).]

[Figure: Residuals vs. Time. Residual Error (-2.0 to 2.0) vs. Months (0 to 25).]

[Figure: Residuals vs. Temperature. Residual Error (-2.0 to 2.0) vs. Temperature (20 to 80).]

[Figure: Residuals vs. Inventory. Residual Error (-2.0 to 2.0) vs. Inventory of Fats in tons (3.0 to 7.0).]

[Figure: Residuals vs. Operating Days. Residual Error (-2.0 to 2.0) vs. Operating Days (10 to 24).]

We might check the residuals against other independent variables not in the model. Why?

[Figure: Residuals vs. Starts. Residual Error (-2.0 to 2.0) vs. Starts (0.0 to 7.0).]

Holdout Samples

One possible problem with our model is the potential for “overfitting”, where we have a great model in terms of fitting past data, but a poor model for predicting the future. How can we assess the predictive quality of our model?

One method is to “hold out” some data, and fit a model using only a subset of all of the data we have. Then, we can use the model to see how well it would have predicted the holdout sample.

Here, we use one year of data to fit a model:

Regression Statistics
Multiple R           0.9463
R Square             0.8954
Adjusted R Square    0.8562
Standard Error       0.7080
Observations         12

ANOVA
              df        SS        MS         F   Significance F
Regression     3   34.3438   11.4479   22.8370           0.0003
Residual       8    4.0103    0.5013
Total         11   38.3541

             Coefficients   Standard Error     t Stat   P-value
Intercept          9.8145           1.7183     5.7117    0.0004
TEMP              -0.0861           0.0127    -6.7716    0.0001
INV                0.4893           0.3620     1.3516    0.2135
OPDAY              0.0804           0.1161     0.6927    0.5081
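The holdout idea can be sketched in a few lines: fit on months 1 to 12, then score months 13 to 25 without refitting. The code below is an illustration in plain Python, not the Excel procedure used in class; it solves the normal equations directly, and the helper names `fit_ols` and `design` are made up for this example.

```python
# (TEMP, INV, OPDAY, STEAM) for months 1-25, copied from the data table.
rows = [
    (35.3, 5.20, 20, 10.98), (29.7, 5.12, 20, 11.13), (30.8, 6.19, 23, 12.51),
    (58.8, 3.89, 20, 8.40), (61.4, 6.28, 21, 9.27), (71.3, 5.76, 22, 8.73),
    (74.4, 3.45, 11, 6.36), (76.7, 6.57, 23, 8.50), (70.7, 5.69, 21, 7.82),
    (57.5, 6.14, 20, 9.14), (46.4, 4.84, 20, 8.24), (28.9, 4.88, 21, 12.19),
    (28.1, 6.03, 21, 11.88), (39.1, 4.55, 19, 9.57), (46.8, 5.71, 23, 10.94),
    (48.5, 5.67, 20, 9.58), (59.3, 6.72, 22, 10.09), (70.0, 4.95, 22, 8.11),
    (70.0, 4.62, 11, 6.83), (74.5, 6.60, 23, 8.88), (72.1, 5.01, 20, 7.68),
    (58.1, 5.68, 21, 8.47), (44.6, 5.28, 20, 8.86), (33.4, 5.36, 20, 10.36),
    (28.6, 5.87, 22, 11.08),
]

def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def design(data):
    """Design matrix rows: intercept, TEMP, INV, OPDAY."""
    return [[1.0, t, inv, opday] for t, inv, opday, _ in data]

train, holdout = rows[:12], rows[12:]
beta = fit_ols(design(train), [r[3] for r in train])

# Predict the 13 held-out months with the first-year coefficients.
predictions = [sum(b * x for b, x in zip(beta, xs)) for xs in design(holdout)]
errors = [r[3] - p for r, p in zip(holdout, predictions)]
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5

print([round(b, 4) for b in beta])  # intercept, TEMP, INV, OPDAY
print(round(rmse, 3))               # typical size of a holdout prediction error
```

If the holdout errors are much larger than the Standard Error of the fitted model, that is a warning sign of overfitting.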

[Figure: Using 1st Year to Predict 2nd Year. Steam (0 to 14) by Month 1 through 25, showing STEAM and Y-hat.]


Is the 3-variable model significantly better than the 2-variable model?

                          2 Variables   3 Variables
R-square                       0.8491        0.8796
Adjusted R-square              0.8354        0.8624
Standard Error                 0.6616        0.6049
Coefficient for Temp          -0.0724       -0.0758
Coefficient for Inv                          0.4880
Coefficient for Opdays         0.2028        0.1082
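One way to answer the "significantly better?" question is a partial F-test on the drop in residual sum of squares when INV is added. The sketch below uses the Residual SS values from the two ANOVA tables above; the function name is illustrative.

```python
def partial_f(sse_reduced, sse_full, df_full, extra_terms=1):
    """Partial F statistic for adding `extra_terms` predictors to a reduced model."""
    return ((sse_reduced - sse_full) / extra_terms) / (sse_full / df_full)

# Residual SS: 9.6287 for TEMP + OPDAY (22 df), 7.6848 for TEMP + INV + OPDAY (21 df).
f_stat = partial_f(sse_reduced=9.6287, sse_full=7.6848, df_full=21)
print(round(f_stat, 3))

# With one added variable, the partial F equals the square of that variable's t Stat.
print(round(2.3048 ** 2, 3))
```

Because only one variable is added, this test is equivalent to the t-test on INV in the 3-variable output, whose P-value is 0.0315.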

Predictions and Observations

[Figures: two “Predictions vs. Observations” panels. Each plots Steam (0 to 14) by Months 1 through 25, showing Actual Data and Predictions.]

Histograms of Residuals

[Figures: two “Histogram of Residuals” panels. Each plots Frequency vs. Residual Error (-2.5 to 2.5).]

Normal Plots

[Figures: two “Normal Probability Plot” panels. Each plots Standardized Residual (-3 to 3) vs. Normal Score (-3 to 3).]

Residuals vs. Observations

[Figures: two “Residuals vs. Observations” panels. Each plots Residual Error (-2.0 to 2.0) vs. Steam (6 to 13).]

Residuals vs. Predictions

[Figures: two “Residuals vs. Predictions” panels. Each plots Residual Error (-2.0 to 2.0) vs. Predicted Steam (0 to 14).]

Residuals vs. Temp

[Figures: two “Residuals vs. Temperature” panels. Each plots Residual Error (-2.0 to 2.0) vs. Temperature (20 to 80).]

Residuals vs. Opdays

[Figures: two “Residuals vs. Operating Days” panels. Each plots Residual Error (-2.0 to 2.0) vs. Operating Days (10 to 24).]

Residuals vs. Inventory

[Figures: two “Residuals vs. Inventory” panels. Each plots Residual Error (-2.0 to 2.0) vs. Inventory of Fats in tons (3.0 to 7.0).]

Residuals vs. Time

[Figures: two “Residuals vs. Time” panels. Each plots Residual Error (-2.0 to 2.0) vs. Months (0 to 25).]

Residuals vs. Starts

[Figures: two “Residuals vs. Starts” panels. Each plots Residual Error (-2.0 to 2.0) vs. Starts (0 to 7).]

Conclusions

• We observe very low values for operating days in months 7 and 19. On investigation we find that the plant shuts down for a two-week vacation in July. Perhaps we need a dummy variable for summer vacation shutdown?

• Some reduced model appears to fit well. Which one we’d adopt depends in part on how management wishes to use the model and on whether the predictor variables being considered (in this case INV, TEMP, OPDAY, etc.) can themselves be predicted.

• A more complex model might be possible. Perhaps TEMP*CDAY would be better than either alone. Perhaps WIND^3*TEMP (measuring a wind-chill factor)? However, it is easy to reach the point of diminishing returns and paralysis by analysis.
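The two modeling ideas in these conclusions, a shutdown dummy and an interaction term, are just new columns built from existing ones. A minimal sketch, assuming the dummy should flag every July (MONTH = 7); the column names `vacation` and `temp_x_cday` are illustrative, and whether either column actually improves the model is left untested here.

```python
# MONTH, TEMP, and CDAY for the 25 observations, copied from the data table.
month = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1]
temp = [35.3, 29.7, 30.8, 58.8, 61.4, 71.3, 74.4, 76.7, 70.7, 57.5, 46.4, 28.9, 28.1,
        39.1, 46.8, 48.5, 59.3, 70.0, 70.0, 74.5, 72.1, 58.1, 44.6, 33.4, 28.6]
cday = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 31,
        28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 31]

# Dummy variable: 1 in the July shutdown months, 0 otherwise.
vacation = [1 if m == 7 else 0 for m in month]

# Interaction term: the product of two existing predictors.
temp_x_cday = [t * c for t, c in zip(temp, cday)]

# The flagged rows are observations 7 and 19 (SEQ numbering starts at 1).
print([i + 1 for i, flag in enumerate(vacation) if flag])
```

Either new column would then be added to the regression like any other predictor and judged by its t Stat and by the change in adjusted R-square.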

Cars Case

Need to remove one drive type (I removed FWD)

Need to remove one make (I removed Chevrolet)

Still need to remove Volkswagen because of multicollinearity
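The reason one category has to go: if every drive type (or every make) gets its own 0/1 column, those columns always sum to 1 and duplicate the intercept, so the regression cannot be solved. A small sketch of dummy coding with a dropped baseline; the four example cars are hypothetical and are not rows from the Cars data.

```python
def dummy_code(values, baseline):
    """One 0/1 column per category except the baseline, which is absorbed by the intercept."""
    levels = sorted(set(values) - {baseline})
    columns = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, columns

# Hypothetical drive types for four cars (illustration only).
drive = ["FWD", "AWD", "RWD", "FWD"]
levels, columns = dummy_code(drive, baseline="FWD")

print(levels)   # the dummy columns that remain
print(columns)  # FWD cars are all zeros: they are the reference group
```

Each remaining coefficient is then read as a price difference relative to the dropped category.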

A Possible “Enter” Procedure

Regression Statistics   Model 1  Model 2  Model 3  Model 4  Model 5  Model 6  Model 7a  Model 7b  Model 7c  Model 7d
Multiple R               0.9175   0.9408   0.9507   0.9595   0.9632   0.9736    0.9736    0.9738    0.9737    0.9736
R Square                 0.8418   0.8852   0.9038   0.9206   0.9278   0.9479    0.9479    0.9484    0.9481    0.9479
Adjusted R Square        0.8379   0.8793   0.8962   0.9120   0.9178   0.9390    0.9372    0.9377    0.9374    0.9372
Standard Error             6492     5602     5195     4782     4623     3983      4040      4024      4033      4041

ABS(Correl)  Variable                      Model 1  Model 2  Model 3  Model 4  Model 5  Model 6  Model 7a  Model 7b  Model 7c  Model 7d
             Intercept                       -7326     9993    10618    13428    11043    -9528     -8779     -8912    -10341     -9532
 0.9175      HP                                171      129      118      109       96      136       135       135       137       136
-0.8163      FWD                                     -11145    -9929   -11738    -7298    -8040     -8131     -7930     -8028     -8055
 0.6064      AWD                                               9886      417     6608     3864      3808      4023      3901      3853
 0.5584      Audi                                                      10095    11199    10615     10505     10460     10507     10644
 0.5507      Lexus                                                               7316     3662      3627      3665      3513      3680
-0.5170      MPG City                                                                      635       628       618       668       634
 0.5163      RWD
-0.2737      Powertrain Warranty (miles)                                                              -6
-0.2367      Chevrolet                                                                                        -1203
-0.2192      Toyota                                                                                                      -825
-0.1827      Nissan                                                                                                                 191
-0.1546      Mazda
-0.1518      Ford
-0.1513      Volkswagen
-0.1508      Saturn
-0.1455      Honda
-0.0928      Trunk
-0.0888      Chrysler

A Possible “Remove” Procedure

Multiple R          0.9757  0.9757  0.9757  0.9756  0.9755  0.9753  0.9752  0.9750  0.9748  0.9742  0.9727
R Square            0.9520  0.9520  0.9520  0.9519  0.9516  0.9513  0.9510  0.9505  0.9501  0.9490  0.9462
Adjusted R Square   0.9243  0.9271  0.9296  0.9320  0.9338  0.9355  0.9372  0.9385  0.9399  0.9402  0.9388
Standard Error        4435    4353    4276    4206    4148    4093    4041    3997    3953    3941    3989

Intercept           -13060  -13066  -13972  -14427  -15063  -15499  -16268  -18731  -17596  -19837  -20269
MPG City               666     666     662     662     656     650     657     673     626     669     687
HP                     141     141     140     140     140     139     139     141     139     147     145
Trunk                   -1
Warranty               -70     -70     -59     -55     -46     -35     -25
Audi                  7857    7857    8440    8640    9056    9193    9305    9866   10047    9533   10060
Chrysler              1770    1770    1760    1762    1755
Ford                 -4533   -4535   -4078   -3915   -3586   -3579   -3566   -3222   -3069   -3362
Honda                -1215   -1215    -776
Lexus                 2386    2388    2782    2885    3157    2500    2586    2851    3090
Nissan               -1692   -1693   -1247   -1074
Saturn                1899    1900    1897    1902    1908    1443
Toyota               -2435   -2435   -1983   -1800   -1443   -1417   -1461   -1168
Volkswagen            -621    -622
RWD                   7563    7564    7578    7570    7573    8263    8069    7727    7758    9473   10141
AWD                  11443   11446   11506   11474   11487   11534   11560   11456   11443   10595   10934

Minitab “Best Subsets” Procedure

Number of variables       1       2       3       4       5       6       7
Multiple R           0.9175  0.9372  0.9543  0.9669  0.9727  0.9742  0.9748
R Square             0.8418  0.8783  0.9107  0.9348  0.9462  0.9490  0.9501
Adjusted R Square    0.8379  0.8720  0.9036  0.9278  0.9388  0.9402  0.9399
Standard Error         6492    5768    5005    4334    3989    3941    3953

Intercept             -7326  -32141  -30500  -23445  -20269  -19837  -17596
HP                    171.1   206.9   191.6   159.1   145.3   147.2   139.1
MPG City                      805.4   828.5   721.1   686.9   668.9   626.4
Audi                                   9856   14864   10060    9533   10047
RWD                                            8518   10141    9473    7758
AWD                                                   10934   10595   11443
Ford                                                          -3362   -3069
Lexus                                                                  3090

Conclusions

• These factors can be used to estimate MSRP within about $4000

• You should expect to pay more for higher HP and for higher fuel efficiency (despite the fact that they are negatively correlated with each other)

• FWD is the least expensive drive train

• Audi (and maybe Lexus) is overpriced, based on these features

• Ford is a relative bargain

Comparison of Three Models

Regression Statistics   Enter "Best"   Remove "Best"   Best Subsets #6
Multiple R                    0.9736          0.9742            0.9742
R Square                      0.9479          0.9490            0.9490
Adjusted R Square             0.9390          0.9402            0.9402
Standard Error                  3983            3941              3941

Intercept                      -9528          -19837            -19837
HP                             135.9           147.2             147.2
FWD                            -8040
AWD                             3864           10595             10595
Audi                           10615            9533              9533
Lexus                           3662
MPG City                       634.6           668.9             668.9
RWD                                             9473              9473
Ford                                           -3362             -3362
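To see how a fitted model like Best Subsets #6 turns into a price estimate, here is a minimal sketch that plugs a car's features into the coefficients from that column. The function name and the two example cars are hypothetical; makes other than Audi and Ford, and FWD drive, fall into the baseline and contribute nothing beyond the intercept.

```python
# Coefficients from the Best Subsets #6 column of the comparison table.
COEF = {
    "Intercept": -19837, "HP": 147.2, "MPG City": 668.9,
    "Audi": 9533, "Ford": -3362, "RWD": 9473, "AWD": 10595,
}

def predict_msrp(hp, mpg_city, make, drive):
    """Estimated MSRP from the Best Subsets #6 coefficients."""
    estimate = COEF["Intercept"] + COEF["HP"] * hp + COEF["MPG City"] * mpg_city
    estimate += COEF.get(make, 0)   # only Audi and Ford have make effects in this model
    estimate += COEF.get(drive, 0)  # FWD is the baseline drive type
    return estimate

# Hypothetical cars, for illustration only.
baseline_car = predict_msrp(hp=200, mpg_city=25, make="Honda", drive="FWD")
ford_rwd_car = predict_msrp(hp=200, mpg_city=25, make="Ford", drive="RWD")
print(round(baseline_car, 1), round(ford_rwd_car, 1))
```

Given the model's Standard Error of 3941, any such estimate should be read as accurate only to within several thousand dollars.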

Summary

Two Multivariate Methods
• Cluster Analysis
  – Excel
  – Minitab
• Discriminant Analysis
  – Excel
  – Minitab

Steam case
Cars