Session 9. Applied Regression -- Prof. Juran

Session 9


Outline

Two Multivariate Methods

• Cluster Analysis
– Excel
– Minitab

• Discriminant Analysis
– Excel
– Minitab

Steam case
Cars


Cluster Analysis

• Concerned with grouping a large number of observations into reasonable sub-groups (clusters) on the basis of their similarities on multiple dimensions

• Similar to regression in terms of its basic method: finding a solution that minimizes a total sum of squared errors

• Not concerned with explaining variability or forecasting

• No dependent variable


Example: MBA Programs

School | Accept % | Enroll % | GMAT | GPA | Cost | Minority % | Non-U.S. % | Female % | Salary | Pop. Density | Student Body | Age | Rolling | Job
Harvard | 12.0% | 82.5% | 705 | 3.58 | $31,800 | 24.0 | 35.0 | 35.0 | $127,338 | 14,618 | 1821 | 28 | 0 | 92%
Stanford | 9.0% | 78.0% | 718 | 3.58 | $31,002 | 24.0 | 24.0 | 38.0 | $130,253 | 2,371 | 756 | 27 | 0 | 92%
Columbia | 10.0% | 72.0% | 710 | 3.45 | $32,154 | 20.0 | 28.0 | 35.0 | $121,000 | 23,671 | 1225 | 27 | 1 | 94%
Penn | 15.5% | 70.2% | 703 | 3.57 | $31,218 | 19.0 | 42.0 | 31.0 | $122,711 | 11,492 | 1542 | 28 | 0 | 88%
MIT | 18.0% | 69.9% | 703 | 3.50 | $31,200 | 19.0 | 35.0 | 26.0 | $120,449 | 11,398 | 638 | 28 | 0 | 91%
Northwestern | 16.0% | 64.0% | 700 | 3.45 | $30,255 | 17.5 | 30.8 | 30.0 | $120,500 | 12,185 | 1250 | 28 | 0 | 88%
Carn.-Mellon | 28.2% | 60.1% | 660 | 3.25 | $26,750 | 21.3 | 44.0 | 29.0 | $113,448 | 6,598 | 440 | 28 | 0 | 94%
Chicago | 28.4% | 60.0% | 695 | 3.33 | $30,596 | 18.2 | 36.0 | 22.0 | $117,893 | 12,185 | 1043 | 29 | 0 | 93%
Michigan | 19.6% | 58.5% | 676 | 3.34 | $30,686 | 20.5 | 29.0 | 27.6 | $119,718 | 4,238 | 862 | 28 | 0 | 93%
Duke | 20.1% | 57.0% | 688 | 3.43 | $30,323 | 20.6 | 34.0 | 30.0 | $117,899 | 2,503 | 682 | 27 | 0 | 93%
Yale | 20.0% | 56.7% | 698 | 3.50 | $29,720 | 24.0 | 34.0 | 33.0 | $114,274 | 6,559 | 436 | 28 | 0 | 94%
Michigan State | 22.3% | 55.0% | 641 | 3.36 | $17,400 | 13.6 | 39.0 | 24.0 | $90,733 | 3,738 | 221 | 28 | 0 | 95%
Texas | 29.0% | 52.4% | 680 | 3.34 | $24,252 | 12.3 | 28.0 | 22.7 | $106,905 | 2,260 | 780 | 28 | 1 | 78%
NYU | 15.2% | 51.7% | 700 | 3.40 | $32,194 | 23.0 | 33.0 | 38.0 | $112,900 | 23,671 | 585 | 27 | 1 | 86%
Cornell | 25.8% | 51.2% | 669 | 3.32 | $30,505 | 22.6 | 33.0 | 31.0 | $117,825 | 1,802 | 551 | 29 | 0 | 92%
Virginia | 16.6% | 51.0% | 681 | 3.40 | $28,900 | 15.1 | 29.0 | 30.0 | $115,272 | 3,938 | 516 | 28 | 1 | 93%
Dartmouth | 18.0% | 51.0% | 690 | 3.40 | $30,250 | 17.2 | 29.9 | 24.0 | $124,235 | 1,800 | 436 | 28 | 0 | 92%
Berkeley | 14.0% | 50.3% | 688 | 3.43 | $21,208 | 19.4 | 33.0 | 29.0 | $115,560 | 9,631 | 471 | 28 | 0 | 69%
Indiana | 24.7% | 49.8% | 651 | 3.35 | $20,694 | 14.3 | 31.0 | 22.0 | $102,299 | 4,073 | 596 | 28 | 0 | 83%
UCLA | 8.0% | 48.0% | 691 | 3.60 | $22,490 | 21.6 | 28.0 | 30.0 | $114,368 | 7,436 | 653 | 28 | 1 | 62%
Emory | 31.5% | 46.7% | 651 | 3.30 | $28,012 | 10.7 | 36.0 | 21.0 | $108,959 | 2,996 | 387 | 28 | 1 | 88%
Georgetown | 20.8% | 45.0% | 662 | 3.35 | $28,440 | 10.8 | 35.0 | 25.0 | $103,968 | 9,351 | 520 | 28 | 1 | 81%
North Carolina | 23.8% | 44.2% | 674 | 3.30 | $26,545 | 16.2 | 32.0 | 29.0 | $111,104 | 2,619 | 554 | 27 | 0 | 92%
Rochester | 32.0% | 44.0% | 637 | 3.33 | $28,620 | 10.3 | 52.0 | 24.0 | $103,633 | 6,541 | 453 | 29 | 1 | 72%
USC | 29.0% | 43.4% | 676 | 3.30 | $30,044 | 32.3 | 29.0 | 30.0 | $100,537 | 7,436 | 589 | 28 | 0 | 81%
Ohio State | 29.0% | 41.4% | 645 | 3.38 | $21,792 | 15.5 | 35.0 | 41.0 | $99,859 | 3,368 | 277 | 28 | 1 | 79%
Minnesota | 34.4% | 40.6% | 645 | 3.33 | $21,367 | 8.6 | 26.0 | 31.0 | $93,080 | 6,606 | 236 | 28 | 1 | 94%


Cluster Analysis Questions

• Given a certain number of clusters, which schools are grouped together?

• How is the set of clusters affected if we change the number of clusters?

• For each cluster, which school is the most “typical”?

• How different are the clusters from each other?

• What is the best number of clusters?


Basic Method in Excel

• We will assume that all of these attributes deserve equal weighting in our analysis. We will
– name a school as the “typical” school in each cluster (called the centroid of the cluster),
– assign each of the non-centroid schools to the cluster whose centroid it is most similar to, and
– optimize the identities of the centroids and the cluster assignments so as to minimize the total squared Euclidean distance between each school and its cluster centroid.

• We define “most similar” to be the least sum of squared errors across all attributes between a cluster member and the centroid of the cluster.
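A minimal sketch of the assignment rule and objective described above, written in Python rather than Excel. The four-point data set `toy` and the function names are hypothetical and exist only for illustration; as in the spreadsheet model, each centroid is one of the observations.

```python
def squared_distance(a, b):
    # Sum of squared differences across all attributes.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_centroids(rows, centroid_names):
    """Assign every row to its most similar centroid and total the distances.

    rows: dict mapping a name to its list of (standardized) attributes.
    centroid_names: names of the rows currently acting as centroids.
    """
    assignment = {}
    total = 0.0
    for name, values in rows.items():
        best = min(centroid_names,
                   key=lambda c: squared_distance(values, rows[c]))
        assignment[name] = best
        total += squared_distance(values, rows[best])
    return assignment, total


# Hypothetical data: two attributes per observation.
toy = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [5.0, 5.0], "D": [5.0, 6.0]}
assignment, total = assign_to_centroids(toy, ["A", "C"])
```

With A and C as centroids, B joins A and D joins C, so the objective is 1 + 1 = 2. The optimization problem is to pick the centroid names that make this total as small as possible.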


Nonlinear Problems

Some nonlinear problems can be formulated in a linear fashion (e.g., some network problems).

Other nonlinear functions can be solved with our basic methods (e.g., smooth, continuous functions that are concave or convex, such as portfolio variances).

However, there are many types of nonlinear problems that pose significant difficulties.


Nonlinear Problems

The linear solution to a nonlinear (say, integer) problem may be infeasible.

The linear solution may be far away from the actual optimal solution.

Some functions have many local minima (or maxima), and Solver is not guaranteed to find the global minimum (or maximum).


3 Solvers

• Simplex LP Solver

• GRG Nonlinear Solver

• Evolutionary Solver


Solution Methodology

The standard simplex algorithm (Solver’s default method) won’t work on this problem. The GRG Nonlinear algorithm will make an honest effort, but is likely to give up without finding the optimal solution.

This can result from the use of MAX, IF, and SUMIF functions, which introduce discontinuities in our objective function and constraints as functions of the decision variables.

It can also be the result of using numerical decision variables that are in fact simply names (as in this example, where the names of the clusters happen to be numbers).

The Evolutionary Solver, a genetic algorithm, can do a good job with a problem like this, but is not guaranteed to find the optimal solution.


Solution Methodology

The Evolutionary Solver operates in a completely different way from the other types. Instead of searching in a structured way guaranteed to reach the optimal solution, genetic algorithms operate somewhat like biological evolutionary processes, with some degree of randomness in the steps taken from one solution to the next.

In a finite period of time, the Evolutionary Solver is not guaranteed to find the optimal solution, but it will find very good solutions and try to improve upon them.
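The sketch below is not Solver's Evolutionary algorithm. It is a much simpler mutate-and-keep-if-better search, shown only to illustrate how random steps can drive the choice of centroids. The function names, the six-point data set `toy`, the seed, and the number of generations are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_squared_distance(rows, centroid_ids):
    # Each row contributes its squared distance to the nearest centroid.
    return sum(min(squared_distance(r, rows[c]) for c in centroid_ids)
               for r in rows)


def random_search(rows, k, generations=200, seed=0):
    """Swap one centroid at random; keep the swap only if the total drops."""
    rng = random.Random(seed)
    best = rng.sample(range(len(rows)), k)
    best_cost = total_squared_distance(rows, best)
    for _ in range(generations):
        others = [i for i in range(len(rows)) if i not in best]
        if not others:  # every row is already a centroid
            break
        candidate = list(best)
        candidate[rng.randrange(k)] = rng.choice(others)  # the "mutation"
        cost = total_squared_distance(rows, candidate)
        if cost < best_cost:  # the "selection"
            best, best_cost = candidate, cost
    return sorted(best), best_cost


# Hypothetical data: two well-separated groups of three points.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, cost = random_search(toy, k=2)
```

Like the Evolutionary Solver, a search of this kind carries no guarantee of optimality in a finite number of steps; it only keeps improving on the best solution found so far.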


Standardization

In cluster analysis it is common to standardize the attribute data, so that those variables with large units (such as cost, salary and student body size) do not dominate the sum of squares over attributes with small units (such as % female, % admitted, and % with a job at graduation).

So we transform each attribute for each school into a z-value.
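As a rough Python counterpart to the z-value transformation, the sketch below standardizes the GMAT column from the MBA Programs table. It uses the sample standard deviation, which is what Excel's STDEV computes; the variable and function names are illustrative assumptions.

```python
from statistics import mean, stdev

# GMAT scores copied from the MBA Programs table.
gmat = {
    "Harvard": 705, "Stanford": 718, "Columbia": 710, "Penn": 703,
    "MIT": 703, "Northwestern": 700, "Carnegie-Mellon": 660,
    "Chicago": 695, "Michigan": 676, "Duke": 688, "Yale": 698,
    "Michigan State": 641, "Texas": 680, "NYU": 700, "Cornell": 669,
    "Virginia": 681, "Dartmouth": 690, "Berkeley": 688, "Indiana": 651,
    "UCLA": 691, "Emory": 651, "Georgetown": 662, "North Carolina": 674,
    "Rochester": 637, "USC": 676, "Ohio State": 645, "Minnesota": 645,
}


def standardize(values):
    # z = (value - column mean) / sample standard deviation
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


z = dict(zip(gmat, standardize(list(gmat.values()))))
```

After this step every attribute has mean 0 and standard deviation 1, so no single attribute dominates the squared distances because of its units.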


School | Accept % | Enroll % | GMAT | GPA | Cost | Minority pct | Non-U.S. pct | Female % | Salary (base+signing) | Pop. Den. In City (people per sq. mile) | Size: Student Body
Stanford | -1.63 | 2.02 | 1.67 | 1.82 | 0.78 | 1.09 | -1.59 | 1.68 | 1.74 | -0.88 | 0.18
Harvard | -1.22 | 2.42 | 1.11 | 1.82 | 0.97 | 1.09 | 0.28 | 1.11 | 1.45 | 1.18 | 2.94
Penn | -0.76 | 1.32 | 1.02 | 1.71 | 0.83 | 0.15 | 1.47 | 0.34 | 0.99 | 0.66 | 2.21
MIT | -0.42 | 1.30 | 1.02 | 1.00 | 0.82 | 0.15 | 0.28 | -0.61 | 0.76 | 0.64 | -0.12
Northwestern | -0.69 | 0.77 | 0.89 | 0.49 | 0.60 | -0.13 | -0.43 | 0.15 | 0.77 | 0.77 | 1.46
Duke | -0.14 | 0.15 | 0.38 | 0.28 | 0.62 | 0.45 | 0.11 | 0.15 | 0.51 | -0.86 | -0.01
Chicago | 0.97 | 0.41 | 0.68 | -0.74 | 0.68 | 0.00 | 0.45 | -1.37 | 0.51 | 0.77 | 0.92
Columbia | -1.49 | 1.48 | 1.32 | 0.49 | 1.05 | 0.34 | -0.91 | 1.11 | 0.82 | 2.71 | 1.39
Dartmouth | -0.42 | -0.39 | 0.47 | -0.03 | 0.60 | -0.19 | -0.59 | -0.99 | 1.14 | -0.98 | -0.65
Berkeley | -0.96 | -0.45 | 0.38 | 0.28 | -1.54 | 0.22 | -0.06 | -0.04 | 0.27 | 0.34 | -0.56
Michigan | -0.21 | 0.28 | -0.14 | -0.64 | 0.70 | 0.43 | -0.74 | -0.30 | 0.69 | -0.57 | 0.46
Virginia | -0.61 | -0.39 | 0.08 | -0.03 | 0.28 | -0.58 | -0.74 | 0.15 | 0.24 | -0.62 | -0.44
NYU | -0.80 | -0.33 | 0.89 | -0.03 | 1.06 | 0.90 | -0.06 | 1.68 | 0.01 | 2.71 | -0.26
Yale | -0.15 | 0.12 | 0.81 | 1.00 | 0.47 | 1.09 | 0.11 | 0.73 | 0.14 | -0.17 | -0.65
UCLA | -1.76 | -0.66 | 0.51 | 2.02 | -1.24 | 0.64 | -0.91 | 0.15 | 0.15 | -0.03 | -0.09
Cornell | 0.62 | -0.37 | -0.44 | -0.85 | 0.66 | 0.82 | -0.06 | 0.34 | 0.50 | -0.98 | -0.35
North Carolina | 0.36 | -1.00 | -0.22 | -1.05 | -0.28 | -0.38 | -0.23 | -0.04 | -0.17 | -0.84 | -0.34
Carnegie-Mellon | 0.95 | 0.42 | -0.82 | -1.56 | -0.23 | 0.58 | 1.80 | -0.04 | 0.06 | -0.17 | -0.64
Texas | 1.05 | -0.27 | 0.04 | -0.64 | -0.82 | -1.11 | -0.91 | -1.24 | -0.59 | -0.90 | 0.24
USC | 1.05 | -1.06 | -0.14 | -1.05 | 0.55 | 2.64 | -0.74 | 0.15 | -1.23 | -0.03 | -0.25
Indiana | 0.48 | -0.49 | -1.21 | -0.54 | -1.66 | -0.73 | -0.40 | -1.37 | -1.06 | -0.59 | -0.23
Emory | 1.39 | -0.77 | -1.21 | -1.05 | 0.07 | -1.41 | 0.45 | -1.56 | -0.39 | -0.78 | -0.77
Rochester | 1.45 | -1.01 | -1.81 | -0.74 | 0.21 | -1.48 | 3.16 | -0.99 | -0.92 | -0.18 | -0.60
Georgetown | -0.05 | -0.92 | -0.74 | -0.54 | 0.17 | -1.39 | 0.28 | -0.80 | -0.89 | 0.30 | -0.43
Michigan State | 0.15 | -0.04 | -1.64 | -0.44 | -2.44 | -0.86 | 0.96 | -0.99 | -2.21 | -0.65 | -1.20
Ohio State | 1.05 | -1.25 | -1.47 | -0.23 | -1.40 | -0.51 | 0.28 | 2.25 | -1.30 | -0.71 | -1.06
Minnesota | 1.78 | -1.31 | -1.47 | -0.74 | -1.50 | -1.80 | -1.25 | 0.34 | -1.98 | -0.17 | -1.16

=((VLOOKUP($A9,'Raw Data'!$A$2:$O$28,3,0))-AVERAGE('Raw Data'!C$2:C$28))/STDEV('Raw Data'!C$2:C$28)


Index | School | Accept % | Enroll % | GMAT | GPA | Cost | Minority pct | Non-U.S. pct | Female % | Salary | Density | St Body | Age | Rolling | Job
1 | Stanford | -1.63 | 2.02 | 1.67 | 1.82 | 0.78 | 1.09 | -1.59 | 1.68 | 1.74 | -0.88 | 0.18 | -1.68 | -0.75 | 0.58
2 | Harvard | -1.22 | 2.42 | 1.11 | 1.82 | 0.97 | 1.09 | 0.28 | 1.11 | 1.45 | 1.18 | 2.94 | 0.13 | -0.75 | 0.58
3 | Penn | -0.76 | 1.32 | 1.02 | 1.71 | 0.83 | 0.15 | 1.47 | 0.34 | 0.99 | 0.66 | 2.21 | 0.13 | -0.75 | 0.12
4 | MIT | -0.42 | 1.30 | 1.02 | 1.00 | 0.82 | 0.15 | 0.28 | -0.61 | 0.76 | 0.64 | -0.12 | 0.13 | -0.75 | 0.46
5 | Northwestern | -0.69 | 0.77 | 0.89 | 0.49 | 0.60 | -0.13 | -0.43 | 0.15 | 0.77 | 0.77 | 1.46 | 0.13 | -0.75 | 0.12
6 | Duke | -0.14 | 0.15 | 0.38 | 0.28 | 0.62 | 0.45 | 0.11 | 0.15 | 0.51 | -0.86 | -0.01 | -1.68 | -0.75 | 0.69
7 | Chicago | 0.97 | 0.41 | 0.68 | -0.74 | 0.68 | 0.00 | 0.45 | -1.37 | 0.51 | 0.77 | 0.92 | 1.95 | -0.75 | 0.69
8 | Columbia | -1.49 | 1.48 | 1.32 | 0.49 | 1.05 | 0.34 | -0.91 | 1.11 | 0.82 | 2.71 | 1.39 | -1.68 | 1.28 | 0.81
9 | Dartmouth | -0.42 | -0.39 | 0.47 | -0.03 | 0.60 | -0.19 | -0.59 | -0.99 | 1.14 | -0.98 | -0.65 | 0.13 | -0.75 | 0.58
10 | Berkeley | -0.96 | -0.45 | 0.38 | 0.28 | -1.54 | 0.22 | -0.06 | -0.04 | 0.27 | 0.34 | -0.56 | 0.13 | -0.75 | -2.08
11 | Michigan | -0.21 | 0.28 | -0.14 | -0.64 | 0.70 | 0.43 | -0.74 | -0.30 | 0.69 | -0.57 | 0.46 | 0.13 | -0.75 | 0.69
12 | Virginia | -0.61 | -0.39 | 0.08 | -0.03 | 0.28 | -0.58 | -0.74 | 0.15 | 0.24 | -0.62 | -0.44 | 0.13 | 1.28 | 0.69
13 | NYU | -0.80 | -0.33 | 0.89 | -0.03 | 1.06 | 0.90 | -0.06 | 1.68 | 0.01 | 2.71 | -0.26 | -1.68 | 1.28 | -0.12
14 | Yale | -0.15 | 0.12 | 0.81 | 1.00 | 0.47 | 1.09 | 0.11 | 0.73 | 0.14 | -0.17 | -0.65 | 0.13 | -0.75 | 0.81
15 | UCLA | -1.76 | -0.66 | 0.51 | 2.02 | -1.24 | 0.64 | -0.91 | 0.15 | 0.15 | -0.03 | -0.09 | 0.13 | 1.28 | -2.88
16 | Cornell | 0.62 | -0.37 | -0.44 | -0.85 | 0.66 | 0.82 | -0.06 | 0.34 | 0.50 | -0.98 | -0.35 | 1.95 | -0.75 | 0.58
17 | North Carolina | 0.36 | -1.00 | -0.22 | -1.05 | -0.28 | -0.38 | -0.23 | -0.04 | -0.17 | -0.84 | -0.34 | -1.68 | -0.75 | 0.58
18 | Carnegie-Mellon | 0.95 | 0.42 | -0.82 | -1.56 | -0.23 | 0.58 | 1.80 | -0.04 | 0.06 | -0.17 | -0.64 | 0.13 | -0.75 | 0.81
19 | Texas | 1.05 | -0.27 | 0.04 | -0.64 | -0.82 | -1.11 | -0.91 | -1.24 | -0.59 | -0.90 | 0.24 | 0.13 | 1.28 | -1.04
20 | USC | 1.05 | -1.06 | -0.14 | -1.05 | 0.55 | 2.64 | -0.74 | 0.15 | -1.23 | -0.03 | -0.25 | 0.13 | -0.75 | -0.69
21 | Indiana | 0.48 | -0.49 | -1.21 | -0.54 | -1.66 | -0.73 | -0.40 | -1.37 | -1.06 | -0.59 | -0.23 | 0.13 | -0.75 | -0.46
22 | Emory | 1.39 | -0.77 | -1.21 | -1.05 | 0.07 | -1.41 | 0.45 | -1.56 | -0.39 | -0.78 | -0.77 | 0.13 | 1.28 | 0.12
23 | Rochester | 1.45 | -1.01 | -1.81 | -0.74 | 0.21 | -1.48 | 3.16 | -0.99 | -0.92 | -0.18 | -0.60 | 1.95 | 1.28 | -1.73
24 | Georgetown | -0.05 | -0.92 | -0.74 | -0.54 | 0.17 | -1.39 | 0.28 | -0.80 | -0.89 | 0.30 | -0.43 | 0.13 | 1.28 | -0.69
25 | Michigan State | 0.15 | -0.04 | -1.64 | -0.44 | -2.44 | -0.86 | 0.96 | -0.99 | -2.21 | -0.65 | -1.20 | 0.13 | -0.75 | 0.92
26 | Ohio State | 1.05 | -1.25 | -1.47 | -0.23 | -1.40 | -0.51 | 0.28 | 2.25 | -1.30 | -0.71 | -1.06 | 0.13 | 1.28 | -0.92
27 | Minnesota | 1.78 | -1.31 | -1.47 | -0.74 | -1.50 | -1.80 | -1.25 | 0.34 | -1.98 | -0.17 | -1.16 | 0.13 | 1.28 | 0.81

Cluster centers and standardized values

Column offset | | | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
Cluster | School | Index | Accept % | Enroll % | GMAT | GPA | Cost | Minority pct | Non-U.S. pct | Female % | Salary | Density | St Body | Age | Rolling
1 | Stanford | 1 | -1.6267 | 2.0201 | 1.6671 | 1.8161 | 0.7775 | 1.0862 | -1.5873 | 1.6804 | 1.7438 | -0.8812 | 0.1812 | -1.6849 | -0.7526
2 | Penn | 3 | -0.7560 | 1.3239 | 1.0235 | 1.7137 | 0.8286 | 0.1486 | 1.4654 | 0.3443 | 0.9884 | 0.6572 | 2.2142 | 0.1348 | -0.7526
3 | Michigan | 11 | -0.2069 | 0.2769 | -0.1351 | -0.6407 | 0.7026 | 0.4299 | -0.7393 | -0.3047 | 0.6887 | -0.5663 | 0.4554 | 0.1348 | -0.7526
4 | Columbia | 8 | -1.4927 | 1.4849 | 1.3238 | 0.4853 | 1.0503 | 0.3361 | -0.9089 | 1.1078 | 0.8171 | 2.7114 | 1.3943 | -1.6849 | 1.2795
5 | UCLA | 15 | -1.7606 | -0.6559 | 0.5086 | 2.0208 | -1.2384 | 0.6362 | -0.9089 | 0.1534 | 0.1529 | -0.0269 | -0.0852 | 0.1348 | 1.2795
6 | Carnegie-Mellon | 18 | 0.9451 | 0.4242 | -0.8216 | -1.5620 | -0.2295 | 0.5799 | 1.8046 | -0.0375 | 0.0607 | -0.1683 | -0.6361 | 0.1348 | -0.7526
7 | USC | 20 | 1.0522 | -1.0633 | -0.1351 | -1.0502 | 0.5506 | 2.6425 | -0.7393 | 0.1534 | -1.2323 | -0.0269 | -0.2507 | 0.1348 | -0.7526
8 | Georgetown | 24 | -0.0461 | -0.9235 | -0.7358 | -0.5384 | 0.1707 | -1.3890 | 0.2783 | -0.8010 | -0.8887 | 0.2961 | -0.4292 | 0.1348 | 1.2795
9 | Michigan State | 25 | 0.1548 | -0.0359 | -1.6369 | -0.4360 | -2.4439 | -0.8639 | 0.9566 | -0.9919 | -2.2142 | -0.6507 | -1.2025 | 0.1348 | -0.7526
10 | Rochester | 23 | 1.4541 | -1.0127 | -1.8085 | -0.7431 | 0.2133 | -1.4827 | 3.1614 | -0.9919 | -0.9222 | -0.1779 | -0.6025 | 1.9545 | 1.2795

SumSqDists 140.4160356

Squared distances to centers, and the cluster each school is assigned to

School | Index | To 1 | To 2 | To 3 | To 4 | To 5 | To 6 | To 7 | To 8 | To 9 | To 10 | Minimum | Cluster | Assigned to School
Stanford | 1 | 0.0 | 24.2 | 24.0 | 23.0 | 38.4 | 49.9 | 48.4 | 56.4 | 72.2 | 104.8 | 0.0 | 1 | Stanford
Harvard | 2 | 19.7 | 5.6 | 26.5 | 17.3 | 45.9 | 45.7 | 52.0 | 55.2 | 74.9 | 86.7 | 5.6 | 2 | Penn
Penn | 3 | 24.2 | 0.0 | 18.7 | 21.3 | 35.0 | 29.5 | 41.3 | 34.7 | 52.4 | 55.2 | 0.0 | 2 | Penn
MIT | 4 | 19.4 | 8.6 | 8.2 | 20.1 | 29.5 | 18.1 | 27.0 | 21.8 | 35.5 | 47.4 | 8.2 | 3 | Michigan
Northwestern | 5 | 18.8 | 6.2 | 6.6 | 14.6 | 26.4 | 22.4 | 25.8 | 21.5 | 40.3 | 51.7 | 6.2 | 2 | Penn
Duke | 6 | 17.0 | 17.4 | 5.7 | 25.7 | 31.9 | 14.1 | 19.5 | 20.4 | 30.1 | 50.3 | 5.7 | 3 | Michigan
Chicago | 7 | 49.6 | 19.6 | 10.2 | 38.9 | 46.5 | 14.6 | 24.0 | 20.8 | 34.6 | 33.7 | 10.2 | 3 | Michigan
Columbia | 8 | 23.0 | 21.3 | 27.8 | 0.0 | 41.0 | 46.7 | 48.1 | 39.6 | 70.9 | 86.3 | 0.0 | 4 | Columbia
Dartmouth | 9 | 26.3 | 23.6 | 3.7 | 36.0 | 29.8 | 16.3 | 21.7 | 16.1 | 28.9 | 43.1 | 3.7 | 3 | Michigan
Berkeley | 10 | 36.5 | 26.8 | 17.5 | 39.2 | 9.9 | 23.2 | 21.6 | 16.6 | 26.5 | 38.5 | 9.9 | 5 | UCLA
Michigan | 11 | 24.0 | 18.7 | 0.0 | 27.8 | 32.6 | 11.9 | 15.1 | 16.8 | 28.7 | 44.0 | 0.0 | 3 | Michigan
Virginia | 12 | 29.0 | 26.3 | 7.6 | 26.6 | 22.8 | 18.8 | 23.4 | 8.2 | 26.3 | 36.8 | 7.6 | 3 | Michigan
NYU | 13 | 33.4 | 29.3 | 26.7 | 9.8 | 32.3 | 33.8 | 29.0 | 26.4 | 56.8 | 62.4 | 9.8 | 4 | Columbia
Yale | 14 | 18.2 | 15.4 | 7.5 | 26.7 | 26.9 | 14.7 | 15.7 | 22.1 | 30.2 | 47.7 | 7.5 | 3 | Michigan
UCLA | 15 | 38.4 | 35.0 | 32.6 | 41.0 | 0.0 | 49.6 | 36.1 | 25.6 | 49.0 | 54.0 | 0.0 | 5 | UCLA
Cornell | 16 | 41.6 | 29.3 | 6.5 | 48.7 | 39.8 | 10.2 | 13.4 | 20.1 | 28.9 | 32.9 | 6.5 | 3 | Michigan
North Carolina | 17 | 36.8 | 34.0 | 8.8 | 39.1 | 37.2 | 12.0 | 17.3 | 13.6 | 19.0 | 41.5 | 8.8 | 3 | Michigan
Carnegie-Mellon | 18 | 49.9 | 29.5 | 11.9 | 46.7 | 49.6 | 0.0 | 18.4 | 18.3 | 17.1 | 26.0 | 0.0 | 6 | Carnegie-Mellon
Texas | 19 | 53.0 | 39.0 | 16.4 | 46.6 | 25.4 | 23.3 | 24.4 | 7.0 | 22.7 | 27.1 | 7.0 | 8 | Georgetown
USC | 20 | 48.4 | 41.3 | 15.1 | 48.1 | 36.1 | 18.4 | 0.0 | 24.6 | 34.8 | 45.5 | 0.0 | 7 | USC
Indiana | 21 | 57.0 | 40.4 | 15.3 | 56.3 | 31.0 | 15.9 | 21.2 | 10.3 | 7.3 | 27.9 | 7.3 | 9 | Michigan State
Emory | 22 | 67.9 | 46.3 | 18.9 | 57.2 | 43.3 | 15.4 | 28.7 | 5.4 | 18.1 | 15.6 | 5.4 | 8 | Georgetown
Rochester | 23 | 104.8 | 55.2 | 44.0 | 86.3 | 54.0 | 26.0 | 45.5 | 16.5 | 31.8 | 0.0 | 0.0 | 10 | Rochester
Georgetown | 24 | 56.4 | 34.7 | 16.8 | 39.6 | 25.6 | 18.3 | 24.6 | 0.0 | 19.2 | 16.5 | 0.0 | 8 | Georgetown

=VLOOKUP(C42,$A$2:$B$28,2,0)

=VLOOKUP($B40,$B$2:$P$28,D$32,0)

=SUMXMY2($C5:$P5,$D$35:$Q$35)

=MIN(C51:L51)

=MATCH(M53,C53:L53,0)

=VLOOKUP(N55,$A$34:$B$43,2)
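For readers more comfortable with code than with spreadsheet lookups, here is a hedged Python reading of the MIN, MATCH, and VLOOKUP steps, applied to Harvard's row of squared distances from the ten-cluster solution shown above. The variable names are assumptions made for this sketch.

```python
# Centroid school for each cluster number, from the cluster-centers section.
centroid_school = {
    1: "Stanford", 2: "Penn", 3: "Michigan", 4: "Columbia", 5: "UCLA",
    6: "Carnegie-Mellon", 7: "USC", 8: "Georgetown",
    9: "Michigan State", 10: "Rochester",
}

# Harvard's squared distances to centroids 1 through 10.
harvard = [19.7, 5.6, 26.5, 17.3, 45.9, 45.7, 52.0, 55.2, 74.9, 86.7]

minimum = min(harvard)                # like =MIN(...)
cluster = harvard.index(minimum) + 1  # like =MATCH(minimum, ..., 0), 1-based
school = centroid_school[cluster]     # like =VLOOKUP(cluster, ..., 2)
```

This reproduces the Harvard row of the table: minimum 5.6, cluster 2, assigned to Penn.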


Optimization Procedure

We set up the model in a large spreadsheet, as shown here.

The upper section contains the standardized data, the middle section contains information about the 10 centroids, and the lower section evaluates the distances between each school and each of the centroids, and assigns schools to clusters on the basis of minimum distance.


Cluster centers and standardized values

Column offset | | | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
Cluster | School | Index | Accept % | Enroll % | GMAT | GPA | Cost | Minority pct | Non-U.S. pct | Female % | Salary | Density | St Body | Age | Rolling
1 | Stanford | 1 | -1.6267 | 2.0201 | 1.6671 | 1.8161 | 0.7775 | 1.0862 | -1.5873 | 1.6804 | 1.7438 | -0.8812 | 0.1812 | -1.6849 | -0.7526
2 | Stanford | 1 | -1.6267 | 2.0201 | 1.6671 | 1.8161 | 0.7775 | 1.0862 | -1.5873 | 1.6804 | 1.7438 | -0.8812 | 0.1812 | -1.6849 | -0.7526
3 | Stanford | 1 | -1.6267 | 2.0201 | 1.6671 | 1.8161 | 0.7775 | 1.0862 | -1.5873 | 1.6804 | 1.7438 | -0.8812 | 0.1812 | -1.6849 | -0.7526
4 | Stanford | 1 | -1.6267 | 2.0201 | 1.6671 | 1.8161 | 0.7775 | 1.0862 | -1.5873 | 1.6804 | 1.7438 | -0.8812 | 0.1812 | -1.6849 | -0.7526
5 | Stanford | 1 | -1.6267 | 2.0201 | 1.6671 | 1.8161 | 0.7775 | 1.0862 | -1.5873 | 1.6804 | 1.7438 | -0.8812 | 0.1812 | -1.6849 | -0.7526
6 | Stanford | 1 | -1.6267 | 2.0201 | 1.6671 | 1.8161 | 0.7775 | 1.0862 | -1.5873 | 1.6804 | 1.7438 | -0.8812 | 0.1812 | -1.6849 | -0.7526
7 | Stanford | 1 | -1.6267 | 2.0201 | 1.6671 | 1.8161 | 0.7775 | 1.0862 | -1.5873 | 1.6804 | 1.7438 | -0.8812 | 0.1812 | -1.6849 | -0.7526
8 | Stanford | 1 | -1.6267 | 2.0201 | 1.6671 | 1.8161 | 0.7775 | 1.0862 | -1.5873 | 1.6804 | 1.7438 | -0.8812 | 0.1812 | -1.6849 | -0.7526
9 | Stanford | 1 | -1.6267 | 2.0201 | 1.6671 | 1.8161 | 0.7775 | 1.0862 | -1.5873 | 1.6804 | 1.7438 | -0.8812 | 0.1812 | -1.6849 | -0.7526
10 | Stanford | 1 | -1.6267 | 2.0201 | 1.6671 | 1.8161 | 0.7775 | 1.0862 | -1.5873 | 1.6804 | 1.7438 | -0.8812 | 0.1812 | -1.6849 | -0.7526

=VLOOKUP(C42,$A$2:$B$28,2,0)

=VLOOKUP($B40,$B$2:$P$28,D$32,0)

Decision Variables

We begin by setting up cells C34:C43, where Solver can identify which schools are centroids. In this initial solution, all centroids have a value of 1 (the index for Stanford), and the corresponding standardized data for Stanford appear in D34:P43. These indices will be manipulated by Solver to find the best ten centroids.


In the lower section of the worksheet, we calculate the total squared distance from each school to each centroid, and pick the minimum. Cell B45 (the objective function in this problem) is the sum of M49:M75.

SumSqDists 1107.029838

Squared distances to centers Assigned toSchool Index To 1 To 2 To 3 To 4 To 5 To 6 To 7 To 8 To 9 To 10 Minimum Cluster SchoolStanford 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1 StanfordHarvard 2 19.7 19.7 19.7 19.7 19.7 19.7 19.7 19.7 19.7 19.7 19.7 1 StanfordPenn 3 24.2 24.2 24.2 24.2 24.2 24.2 24.2 24.2 24.2 24.2 24.2 1 StanfordMIT 4 19.4 19.4 19.4 19.4 19.4 19.4 19.4 19.4 19.4 19.4 19.4 1 StanfordNorthwestern 5 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 18.8 1 StanfordDuke 6 17.0 17.0 17.0 17.0 17.0 17.0 17.0 17.0 17.0 17.0 17.0 1 StanfordChicago 7 49.6 49.6 49.6 49.6 49.6 49.6 49.6 49.6 49.6 49.6 49.6 1 StanfordColumbia 8 23.0 23.0 23.0 23.0 23.0 23.0 23.0 23.0 23.0 23.0 23.0 1 StanfordDartmouth 9 26.3 26.3 26.3 26.3 26.3 26.3 26.3 26.3 26.3 26.3 26.3 1 StanfordBerkeley 10 36.5 36.5 36.5 36.5 36.5 36.5 36.5 36.5 36.5 36.5 36.5 1 StanfordMichigan 11 24.0 24.0 24.0 24.0 24.0 24.0 24.0 24.0 24.0 24.0 24.0 1 StanfordVirginia 12 29.0 29.0 29.0 29.0 29.0 29.0 29.0 29.0 29.0 29.0 29.0 1 StanfordNYU 13 33.4 33.4 33.4 33.4 33.4 33.4 33.4 33.4 33.4 33.4 33.4 1 StanfordYale 14 18.2 18.2 18.2 18.2 18.2 18.2 18.2 18.2 18.2 18.2 18.2 1 StanfordUCLA 15 38.4 38.4 38.4 38.4 38.4 38.4 38.4 38.4 38.4 38.4 38.4 1 StanfordCornell 16 41.6 41.6 41.6 41.6 41.6 41.6 41.6 41.6 41.6 41.6 41.6 1 StanfordNorth Carolina 17 36.8 36.8 36.8 36.8 36.8 36.8 36.8 36.8 36.8 36.8 36.8 1 StanfordCarnegie-Mellon 18 49.9 49.9 49.9 49.9 49.9 49.9 49.9 49.9 49.9 49.9 49.9 1 StanfordTexas 19 53.0 53.0 53.0 53.0 53.0 53.0 53.0 53.0 53.0 53.0 53.0 1 StanfordUSC 20 48.4 48.4 48.4 48.4 48.4 48.4 48.4 48.4 48.4 48.4 48.4 1 StanfordIndiana 21 57.0 57.0 57.0 57.0 57.0 57.0 57.0 57.0 57.0 57.0 57.0 1 StanfordEmory 22 67.9 67.9 67.9 67.9 67.9 67.9 67.9 67.9 67.9 67.9 67.9 1 StanfordRochester 23 104.8 104.8 104.8 104.8 104.8 104.8 104.8 104.8 104.8 104.8 104.8 1 StanfordGeorgetown 24 56.4 56.4 56.4 56.4 56.4 56.4 56.4 56.4 56.4 56.4 56.4 1 StanfordMichigan State 25 72.2 72.2 72.2 72.2 72.2 72.2 72.2 
72.2 72.2 72.2 72.2 1 StanfordOhio State 26 63.5 63.5 63.5 63.5 63.5 63.5 63.5 63.5 63.5 63.5 63.5 1 StanfordMinnesota 27 78.2 78.2 78.2 78.2 78.2 78.2 78.2 78.2 78.2 78.2 78.2 1 Stanford

=SUMXMY2($C5:$P5,$D$35:$Q$35)
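SUMXMY2 returns the sum of squared differences between two ranges. The Python sketch below mimics it for the Stanford and Penn rows, using the two-decimal standardized values shown earlier in this deck (14 attributes, Accept % through Job). The function name is borrowed from Excel for readability and is otherwise an assumption of this example.

```python
def sumxmy2(xs, ys):
    # Same idea as Excel's SUMXMY2: sum of (x - y)^2 over paired values.
    return sum((x - y) ** 2 for x, y in zip(xs, ys))


# Standardized values rounded to two decimals, as displayed in the worksheet.
stanford = [-1.63, 2.02, 1.67, 1.82, 0.78, 1.09, -1.59,
            1.68, 1.74, -0.88, 0.18, -1.68, -0.75, 0.58]
penn = [-0.76, 1.32, 1.02, 1.71, 0.83, 0.15, 1.47,
        0.34, 0.99, 0.66, 2.21, 0.13, -0.75, 0.12]

penn_to_stanford = sumxmy2(penn, stanford)
```

The result is close to the 24.2 shown in the worksheet for Penn's squared distance to Stanford; the small gap comes from using rounded inputs here, whereas the worksheet works with unrounded z-values.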


Here’s a close-up view of the cells where the assignments are made:

Minimum | Cluster | School
0.0 | 6 | Stanford
5.6 | 10 | Penn
0.0 | 10 | Penn
8.3 | 7 | Dartmouth
6.2 | 10 | Penn
6.8 | 7 | Dartmouth
10.6 | 1 | Cornell

=MIN(C51:L51)

=MATCH(M53,C53:L53,0)

=VLOOKUP(N55,$A$34:$B$43,2)






Columbia MIT Texas

NYU Duke Emory

Chicago Georgetown

Indiana Dartmouth Ohio State

Michigan State Michigan Minnesota

Virginia

Carnegie-Mellon Yale USC

Cornell

Harvard North Carolina Stanford

Penn

Northwestern UCLA Rochester

Berkeley


Columbia MIT Texas

NYU Northwestern Emory

Duke Georgetown

Indiana Chicago Ohio State

Michigan State Dartmouth Minnesota

Michigan

Harvard Virginia USC

Penn Yale

Cornell Stanford

UCLA North Carolina

Berkeley Carnegie-Mellon Rochester


Columbia MIT Texas

NYU Duke Emory

Chicago Georgetown

Indiana Dartmouth Ohio State

Michigan State Michigan Minnesota

Virginia

Harvard Yale UCLA

Penn Cornell Berkeley

Northwestern North Carolina

Carnegie-Mellon Stanford

Rochester USC


Columbia MIT Texas

NYU Duke Emory

Chicago Georgetown

UCLA Dartmouth Rochester

Berkeley Michigan Ohio State

Virginia Minnesota

Harvard Yale

Penn Cornell Indiana

Northwestern North Carolina Michigan State

Carnegie-Mellon

USC Stanford


Columbia MIT Texas

NYU Duke Emory

Chicago Georgetown

UCLA Dartmouth Rochester

Berkeley Michigan Minnesota

Virginia Indiana

Harvard Yale Michigan State

Penn Cornell Ohio State

Northwestern North Carolina

Carnegie-Mellon Stanford

USC


Columbia MIT Texas

NYU Northwestern Emory

Duke Georgetown

UCLA Chicago Rochester

Berkeley Dartmouth Minnesota

Michigan Indiana

Harvard Virginia Michigan State

Penn Yale Ohio State

Stanford Cornell

North Carolina

Carnegie-Mellon

USC


Stanford Duke Texas

Harvard Chicago Emory

Penn Dartmouth Georgetown

MIT Michigan Rochester

Northwestern Virginia Minnesota

Columbia Yale Indiana

NYU Cornell Michigan State

North Carolina Ohio State

UCLA Carnegie-Mellon

Berkeley USC


Stanford Duke Texas

Harvard Chicago Emory

Penn Dartmouth Georgetown

MIT Michigan Rochester

Northwestern Virginia Minnesota

Columbia Yale Indiana

NYU Cornell Michigan State

Berkeley North Carolina Ohio State

Carnegie-Mellon UCLA

USC


Stanford Duke Dartmouth

Harvard Chicago Yale

Penn Columbia Cornell

MIT NYU Michigan

Northwestern Berkeley

Virginia Carnegie-Mellon Ohio State

UCLA USC Emory

North Carolina Texas Georgetown

Minnesota Indiana Rochester

Michigan State


Stanford Duke Dartmouth

Harvard Chicago Yale

Penn Columbia Cornell

MIT NYU Michigan

Northwestern Carnegie-Mellon Berkeley

Virginia USC Ohio State

UCLA Texas Emory

North Carolina Indiana Georgetown

Minnesota Michigan State Rochester




Given a certain number of clusters, which schools are grouped together?

• Columbia and NYU are always in the same cluster, as are Harvard-Penn, Indiana-Michigan State.

• Michigan-Cornell-Yale-Dartmouth-Chicago-Duke.

• Texas-Emory-Georgetown-Minnesota.

• What happens with UCLA-Berkeley?


How is the set of clusters affected if we change the number of clusters?

• Notice the behavior of Northwestern as we reduce the number of clusters.

• Stanford seems to be very different from all other schools; the last school to have its own cluster.


For each cluster, which school is the most “typical”?

• The centroid represents the school most typical in each cluster.

• We observe that Michigan is almost always the centroid of a large cluster.


How different are the clusters from each other?

• This is difficult to assess with this method; Minitab will provide more useful output.


What is the best number of clusters?

[Figure: TSS vs. Number of Clusters. x-axis: Clusters (1 to 10); y-axis: Total Squared Distances (0 to 500).]


Correlation issues?

 | Acc. % | Enr. % | GMAT | GPA | Cost | Min. % | Int. % | Fem. % | Salary | Pop. Den. | Size | Age | Rolling | Job
Acc. % | 1
Enr. % | -0.614 | 1
GMAT | -0.771 | 0.723 | 1
GPA | -0.810 | 0.637 | 0.703 | 1
Cost | -0.299 | 0.512 | 0.636 | 0.232 | 1
Min. % | -0.436 | 0.403 | 0.602 | 0.347 | 0.463 | 1
Int. % | 0.336 | -0.040 | -0.383 | -0.202 | 0.011 | -0.218 | 1
Fem. % | -0.419 | 0.273 | 0.377 | 0.445 | 0.220 | 0.489 | -0.250 | 1
Salary | -0.674 | 0.739 | 0.838 | 0.579 | 0.717 | 0.476 | -0.167 | 0.255 | 1
Pop. Den. | -0.416 | 0.398 | 0.479 | 0.294 | 0.389 | 0.250 | 0.045 | 0.339 | 0.241 | 1
Size | -0.485 | 0.757 | 0.642 | 0.531 | 0.532 | 0.323 | -0.026 | 0.188 | 0.632 | 0.506 | 1
Age | 0.472 | -0.231 | -0.393 | -0.269 | -0.117 | -0.214 | 0.436 | -0.462 | -0.197 | -0.226 | -0.069 | 1
Rolling | 0.158 | -0.423 | -0.300 | -0.117 | -0.166 | -0.501 | -0.048 | 0.086 | -0.379 | 0.183 | -0.248 | -0.037 | 1
Job | 0.045 | 0.415 | 0.167 | -0.122 | 0.384 | 0.082 | -0.094 | 0.082 | 0.234 | 0.011 | 0.145 | -0.210 | -0.387 | 1
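Each entry in a correlation matrix like this one is a Pearson correlation between two attribute columns. A minimal sketch of that calculation, assuming the GMAT and GPA columns from the MBA Programs table (listed in the table's row order) and a hypothetical helper named `pearson`:

```python
from math import sqrt


def pearson(x, y):
    # Covariance of x and y divided by the product of their spreads.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Columns copied from the MBA Programs table, Harvard through Minnesota.
gmat = [705, 718, 710, 703, 703, 700, 660, 695, 676, 688, 698, 641, 680, 700,
        669, 681, 690, 688, 651, 691, 651, 662, 674, 637, 676, 645, 645]
gpa = [3.58, 3.58, 3.45, 3.57, 3.50, 3.45, 3.25, 3.33, 3.34, 3.43, 3.50, 3.36,
       3.34, 3.40, 3.32, 3.40, 3.40, 3.43, 3.35, 3.60, 3.30, 3.35, 3.30, 3.33,
       3.30, 3.38, 3.33]

r_gmat_gpa = pearson(gmat, gpa)
```

Strongly correlated attributes (the matrix reports 0.703 for GMAT and GPA, and 0.838 for GMAT and Salary) carry overlapping information, so under equal weighting they effectively count more than once in the squared distances.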


Minorities vs. Enroll %

[Figure: scatter plot. x-axis: Enroll % (standardized); y-axis: Minority % (standardized). Labeled points: Harvard, Minnesota, USC, Cornell, Columbia, Wharton, MIT, Stanford, NYU.]


Women vs. Enroll %

[Figure: scatter plot. y-axis: Female % (standardized). Labeled points: Harvard, Minnesota, Ohio State, Emory, Cornell, Columbia, Wharton, MIT, Stanford, NYU.]


International vs. Enroll %

[Figure: scatter plot. y-axis: Intl % (standardized). Labeled points: Harvard, Minnesota, Cornell, Columbia, Wharton, MIT, Stanford, Carnegie-Mellon, NYU, Rochester.]


Population vs. Enroll %

[Figure: scatter plot. y-axis: Pop. Density (standardized). Labeled points: Harvard, Minnesota, Cornell, Dartmouth, NYU, Columbia, Wharton, MIT, Stanford.]


GMAT vs. Enroll %

[Figure: scatter plot. y-axis: GMAT (standardized). Labeled points: Harvard, Minnesota, Cornell, Columbia, Wharton, MIT, Stanford, Rochester, NYU.]



Step | Number of clusters | Similarity level | Distance level | Clusters joined | New cluster | Number of obs. in new cluster
1 | 26 | 81.1711 | 1.92787 | 9, 17 | 9 | 2
2 | 25 | 79.2999 | 2.11946 | 5, 6 | 5 | 2
3 | 24 | 77.4028 | 2.31370 | 21, 22 | 21 | 2
4 | 23 | 76.9886 | 2.35612 | 1, 4 | 1 | 2
5 | 22 | 76.6438 | 2.39141 | 9, 10 | 9 | 3
6 | 21 | 76.5324 | 2.40283 | 5, 11 | 5 | 3
7 | 20 | 76.5139 | 2.40472 | 5, 9 | 5 | 6
8 | 19 | 76.2752 | 2.42916 | 5, 23 | 5 | 7
9 | 18 | 75.5949 | 2.49881 | 1, 5 | 1 | 9
10 | 17 | 75.1837 | 2.54091 | 1, 15 | 1 | 10
11 | 16 | 74.3490 | 2.62638 | 1, 16 | 1 | 11
12 | 15 | 74.1586 | 2.64588 | 13, 21 | 13 | 3
13 | 14 | 73.5778 | 2.70534 | 12, 19 | 12 | 2
14 | 13 | 72.2647 | 2.83979 | 12, 13 | 12 | 5
15 | 12 | 72.0863 | 2.85805 | 1, 12 | 1 | 16
16 | 11 | 69.3901 | 3.13412 | 3, 14 | 3 | 2
17 | 10 | 69.2620 | 3.14723 | 18, 20 | 18 | 2
18 | 9 | 68.8567 | 3.18872 | 1, 7 | 1 | 17
19 | 8 | 68.8347 | 3.19098 | 1, 8 | 1 | 18
20 | 7 | 65.8747 | 3.49405 | 26, 27 | 26 | 2
21 | 6 | 64.3943 | 3.64563 | 1, 26 | 1 | 20
22 | 5 | 64.2088 | 3.66462 | 1, 18 | 1 | 22
23 | 4 | 64.1918 | 3.66636 | 1, 25 | 1 | 23
24 | 3 | 62.7114 | 3.81794 | 1, 3 | 1 | 25
25 | 2 | 61.3983 | 3.95238 | 1, 24 | 1 | 26
26 | 1 | 59.7832 | 4.11776 | 1, 2 | 1 | 27
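The similarity and distance levels in this output appear to be linked by the rule similarity = 100 × (1 − distance / d_max), where d_max is the largest distance between any two observations. The sketch below checks that reading against two steps of the table. The value 10.2389 is taken from the Distances Between Cluster Centroids output later in this deck (Cluster2 to Cluster7, both single-school clusters) and is assumed here to be the largest pairwise distance.

```python
def similarity_level(distance, d_max):
    # Rescale a distance to a 0-100 similarity, with 100 meaning identical.
    return 100.0 * (1.0 - distance / d_max)


D_MAX = 10.2389  # assumed largest pairwise distance (see lead-in)

first_merge = similarity_level(1.92787, D_MAX)  # step 1 distance level
last_merge = similarity_level(4.11776, D_MAX)   # step 26 distance level
```

Under that assumption, the two results agree with the 81.1711 and 59.7832 reported for steps 1 and 26.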



Cluster | Number of observations | Within cluster sum of squares | Average distance from centroid | Maximum distance from centroid
Cluster1 | 16 | 152.337 | 2.94527 | 5.25820
Cluster2 | 1 | 0.000 | 0.00000 | 0.00000
Cluster3 | 2 | 4.911 | 1.56706 | 1.56706
Cluster4 | 1 | 0.000 | 0.00000 | 0.00000
Cluster5 | 1 | 0.000 | 0.00000 | 0.00000
Cluster6 | 2 | 4.953 | 1.57362 | 1.57362
Cluster7 | 1 | 0.000 | 0.00000 | 0.00000
Cluster8 | 1 | 0.000 | 0.00000 | 0.00000
Cluster9 | 1 | 0.000 | 0.00000 | 0.00000
Cluster10 | 1 | 0.000 | 0.00000 | 0.00000


Variable | Cluster1 | Cluster2 | Cluster3 | Cluster4 | Cluster5 | Cluster6
Accept % | -0.038602 | -1.62669 | -1.14449 | 0.94506 | 0.97185 | -1.35880
Enroll % | 0.107802 | 2.02010 | 0.57950 | 0.42423 | 0.41448 | -0.55259
GMAT | 0.015098 | 1.66709 | 1.10928 | -0.82163 | 0.68019 | 0.44419
GPA | 0.031042 | 1.81605 | 0.22938 | -1.56203 | -0.74310 | 1.15067
Cost | 0.099078 | 0.77748 | 1.05504 | -0.22952 | 0.68133 | -1.39023
Minority pct | -0.163118 | 1.08618 | 0.61740 | 0.57990 | -0.00139 | 0.42989
Non-U.S. pct | -0.011110 | -1.58728 | -0.48491 | 1.80461 | 0.44785 | -0.48491
Female % | -0.308277 | 1.68044 | 1.39412 | -0.03747 | -1.37362 | 0.05797
Salary (base+signing) | 0.117210 | 1.74379 | 0.41147 | 0.06074 | 0.50592 | 0.21257
Pop. Den. In City (people per s | -0.273673 | -0.88123 | 2.71135 | -0.16828 | 0.77406 | 0.15817
Size: Student Body | 0.132103 | 0.18125 | 0.56664 | -0.63609 | 0.92357 | -0.32053
Mean Age | 0.021061 | -1.68491 | -1.68491 | 0.13479 | 1.95450 | 0.13479
Rolling (1=yes) | -0.244604 | -0.75263 | 1.27947 | -0.75263 | -0.75263 | 0.26342
Students w/ first job offer by | 0.295438 | 0.57646 | 0.34588 | 0.80705 | 0.69176 | -2.47880

Variable | Cluster7 | Cluster8 | Cluster9 | Cluster10 | Grand centroid
Accept % | 1.45405 | 1.05221 | 1.05221 | 1.77552 | -0.0000000
Enroll % | -1.01274 | -1.06333 | -1.24651 | -1.31486 | 0.0000000
GMAT | -1.80853 | -0.13508 | -1.46526 | -1.46526 | -0.0000000
GPA | -0.74310 | -1.05020 | -0.23127 | -0.74310 | 0.0000000
Cost | 0.21335 | 0.55060 | -1.40373 | -1.50438 | -0.0000000
Minority pct | -1.48274 | 2.64253 | -0.50767 | -1.80151 | -0.0000000
Non-U.S. pct | 3.16136 | -0.73930 | 0.27826 | -1.24809 | 0.0000000
Female % | -0.99186 | 0.15341 | 2.25307 | 0.34429 | 0.0000000
Salary (base+signing) | -0.92224 | -1.23231 | -1.30021 | -1.97914 | 0.0000000
Pop. Den. In City (people per s | -0.17789 | -0.02694 | -0.71307 | -0.16693 | 0.0000000
Size: Student Body | -0.60246 | -0.25070 | -1.05769 | -1.16373 | -0.0000000
Mean Age | 1.95450 | 0.13479 | 0.13479 | 0.13479 | -0.0000000
Rolling (1=yes) | 1.27947 | -0.75263 | 1.27947 | 1.27947 | 0.0000000
Students w/ first job offer by | -1.72939 | -0.69176 | -0.92234 | 0.80705 | 0.0000000


Distances Between Cluster Centroids

 | Cluster1 | Cluster2 | Cluster3 | Cluster4 | Cluster5 | Cluster6 | Cluster7 | Cluster8
Cluster1 | 0.00000 | 5.1787 | 4.66361 | 3.07784 | 3.14580 | 3.86172 | 5.6111 | 3.98074
Cluster2 | 5.17873 | 0.0000 | 5.07379 | 7.06157 | 7.04249 | 5.91279 | 10.2389 | 6.95451
Cluster3 | 4.66361 | 5.0738 | 0.00000 | 6.14703 | 6.00605 | 5.48510 | 8.4794 | 6.00463
Cluster4 | 3.07784 | 7.0616 | 6.14703 | 0.00000 | 3.81626 | 5.82255 | 5.1003 | 4.29133
Cluster5 | 3.14580 | 7.0425 | 6.00605 | 3.81626 | 0.00000 | 5.81509 | 5.8010 | 4.89544
Cluster6 | 3.86172 | 5.9128 | 5.48510 | 5.82255 | 5.81509 | 0.00000 | 6.6180 | 5.13248
Cluster7 | 5.61115 | 10.2389 | 8.47936 | 5.10025 | 5.80099 | 6.61797 | 0.0000 | 6.74368
Cluster8 | 3.98074 | 6.9545 | 6.00463 | 4.29133 | 4.89544 | 5.13248 | 6.7437 | 0.00000
Cluster9 | 4.66285 | 7.9670 | 6.77100 | 4.95710 | 6.69781 | 5.01759 | 5.2538 | 5.19326
Cluster10 | 4.89897 | 8.8418 | 7.36838 | 5.48899 | 6.39972 | 6.38305 | 5.9808 | 5.87407

 | Cluster9 | Cluster10
Cluster1 | 4.66285 | 4.89897
Cluster2 | 7.96699 | 8.84179
Cluster3 | 6.77100 | 7.36838
Cluster4 | 4.95710 | 5.48899
Cluster5 | 6.69781 | 6.39972
Cluster6 | 5.01759 | 6.38305
Cluster7 | 5.25382 | 5.98082
Cluster8 | 5.19326 | 5.87407
Cluster9 | 0.00000 | 3.49405
Cluster10 | 3.49405 | 0.00000


Married Sophisticates: You're in your late 20s or early 30s, recently married and likely have a household income between $50,000 and $100,000. You probably own a home, most likely in an upscale suburban neighborhood. You're a fan of "green and trendy cars," shop at Banana Republic and The Gap and are a loyal Netflix Inc. subscriber.

Truckin' & Stylin': You're in your 30s or 40s, live in a rural town and earn a moderate income. You may be married, but you don't have any children. You shop at stores like Wal-mart and AutoZone and enjoy watching NASCAR and classic shows on TV Land.

Collegiate Crowd: Between 18 and 23 years old, you're single and highly mobile. You're likely a renter and probably live in a college town. You buy clothes from American Eagle and Express Inc. and are a frequent liquor store patron. Your TV is tuned to Family Guy and you probably have copies of Rolling Stone and Us Weekly lying around.


Shooting Stars: You're in your 30s or 40s, married without any kids. You enjoy a six-figure household income and likely have a graduate degree. You shop at stores like Ann Taylor and Sephora, read magazines like Men's Health and Real Simple and use the web to check your stock investments and make travel plans.

Apple Pie Families: You're part of an upper-middle class family, likely living in a smaller city or nearby suburb. You probably drive a minivan. You shop at stores like Home Depot, Target and Best Buy, read Sports Illustrated and listen to NPR.

City Mixers: You're a childless, single "urbanite" living in a city like New York, Los Angeles or Chicago. Well-educated, you likely enjoy museums and the theater. You buy groceries from Trader Joe's and Whole Foods, outfit your home with Crate & Barrel and buy clothes from Banana Republic. You read The New York Times and watch The Office.


[Figure: Dendrogram, Single Linkage, Euclidean Distance. x-axis: Observations; y-axis: Similarity (59.78 to 100.00).]
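Single linkage with Euclidean distance repeatedly merges the two clusters whose closest members are nearest to each other. The sketch below is a deliberately small, unoptimized illustration of that rule; it is not Minitab's implementation, and the four one-dimensional `points` are hypothetical.

```python
from itertools import combinations
from math import dist


def single_linkage(points):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the closest pair of members.
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return merges


# Hypothetical observations on a single attribute.
points = [(0.0,), (1.0,), (3.0,), (10.0,)]
merges = single_linkage(points)
```

Here points 0 and 1 merge first at distance 1, point 2 joins them at distance 2, and point 3 joins last at distance 7. A dendrogram draws exactly this merge history, with the merge level on the vertical axis.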


[Tables: school listings for steps 1 to 26 of the clustering, one column per step]

Observation numbers used in the listings:
1 Harvard, 2 Stanford, 3 Columbia, 4 Penn, 5 MIT, 6 Northwestern, 7 Carnegie-Mellon, 8 Chicago, 9 Michigan, 10 Duke, 11 Yale, 12 Michigan State, 13 Texas, 14 NYU, 15 Cornell, 16 Virginia, 17 Dartmouth, 18 Berkeley, 19 Indiana, 20 UCLA, 21 Emory, 22 Georgetown, 23 North Carolina, 24 Rochester, 25 USC, 26 Ohio State, 27 Minnesota


Discriminant Analysis
• Concerned with predicting membership in two or more sub-groups (categories) on the basis of predictor variables

• Similar to regression in terms of its purpose: finding a function of independent variables that enables us to correctly forecast the value of a dependent variable

• Dependent variable is categorical


Rick Beck Consumer Credit

Subject Single Married Divorced Widowed Credit A Credit B Credit C Credit D Credit E Children? Age Income Debt Female July Default
1 1 0 0 0 1 0 0 0 0 0 29 $65,311 $185,246 1 0
2 0 1 0 0 0 0 1 0 0 1 44 $25,803 $24,699 0 0
3 0 1 0 0 0 1 0 0 0 1 28 $33,286 $59,406 0 0
4 0 0 1 0 1 0 0 0 0 0 39 $53,188 $170,868 0 0
5 0 1 0 0 0 1 0 0 0 1 49 $75,419 $101,881 0 0
6 0 1 0 0 0 0 0 0 1 1 52 $77,962 $61,582 1 1
7 0 1 0 0 0 0 1 0 0 1 35 $37,222 $28,267 0 0
8 0 1 0 0 0 0 0 1 0 1 54 $52,914 $44,654 0 1
9 0 1 0 0 0 1 0 0 0 1 34 $67,021 $92,176 0 0
10 0 0 1 0 1 0 0 0 0 1 42 $74,753 $191,216 0 0
11 0 1 0 0 0 0 1 0 0 1 40 $59,282 $52,319 0 0
12 1 0 0 0 0 1 0 0 0 0 36 $46,501 $71,008 1 0
13 0 1 0 0 1 0 0 0 0 1 33 $40,820 $159,388 0 0
14 1 0 0 0 0 1 0 0 0 0 38 $36,557 $64,047 0 0
15 0 1 0 0 0 0 1 0 0 1 27 $62,586 $56,442 1 0
16 1 0 0 0 0 1 0 0 0 0 53 $69,656 $94,161 0 0
17 0 1 0 0 0 0 1 0 0 1 32 $74,703 $66,860 1 0
18 0 1 0 0 0 0 1 0 0 1 31 $59,561 $54,065 1 0
19 0 1 0 0 0 0 1 0 0 1 42 $50,329 $41,829 0 0
20 0 0 1 0 0 1 0 0 0 1 50 $67,447 $89,373 1 0
21 0 1 0 0 1 0 0 0 0 1 39 $21,207 $136,043 0 0
22 0 1 0 0 0 1 0 0 0 1 25 $18,380 $42,364 1 0
23 0 1 0 0 1 0 0 0 0 1 40 $61,626 $173,986 1 0
24 0 1 0 0 0 1 0 0 0 1 26 $45,353 $67,803 1 0
25 1 0 0 0 0 1 0 0 0 1 35 $54,935 $84,884 1 0
26 1 0 0 0 0 1 0 0 0 0 37 $30,084 $57,730 1 1
27 0 1 0 0 0 0 0 1 0 1 45 $40,077 $33,029 1 0
28 0 1 0 0 1 0 0 0 0 1 47 $29,328 $147,257 1 0
29 0 1 0 0 0 1 0 0 0 1 44 $33,745 $58,952 1 0
30 1 0 0 0 0 1 0 0 0 0 24 $18,004 $45,675 1 0


Excel Method
• Standardize data

• Create coefficients for each independent variable

• Create a “score” for each observation (the sumproduct of the independent variables and the coefficients)

• Create a “cut-off” value


Excel Method (continued)
• Use the cut-off value as a decision rule for categorization – a predicted value of the dependent variable

• Track how many observations are correctly predicted using the current coefficients and cut-off value

• Optimize the coefficients and cut-off value so as to maximize the number of correct predictions
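The scoring and counting steps listed above can be sketched in Python. This is only an illustration under stated assumptions: the toy data, the function names `standardize` and `count_correct`, and the use of the sample standard deviation are not from the slides, and the optimization step (done with a solver in Excel) is left out.

```python
import numpy as np

def standardize(X):
    """Convert each column to z-scores (mean 0, sample standard deviation 1)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def count_correct(Z, y, coefs, cutoff):
    """Score each row, apply the cut-off rule, and count correct predictions."""
    scores = Z @ coefs                            # one sumproduct per observation
    predictions = (scores > cutoff).astype(int)   # 1 if score exceeds the cut-off
    n_correct = int((predictions == y).sum())
    return n_correct, scores, predictions

# Hypothetical toy data: two predictors, six observations (not the credit data).
X = np.array([[29, 65], [44, 25], [28, 33], [52, 78], [54, 53], [37, 30]])
y = np.array([0, 0, 0, 1, 1, 0])

Z = standardize(X)
coefs = np.ones(Z.shape[1])   # starting point: every coefficient equal to 1
n_correct, scores, predictions = count_correct(Z, y, coefs, cutoff=1.0)
print(n_correct, predictions)
```

An optimizer would then adjust `coefs` and `cutoff` to push `n_correct` as high as possible. Because the count is a step function of the coefficients, a search method that does not rely on gradients is the natural choice.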


Discriminant Scores 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Cut-off Value 1

Number Correct 582

Subject Single Married Divorced Widowed Credit A Credit B Credit C Credit D Credit E Children? Age Income Debt Female July Default Score Prediction
1 1.57 -1.13 -0.31 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 -1.19 -1.06 0.85 1.92 1.01 0 1.20 1
2 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 0.37 -1.27 -1.12 -0.99 0 -2.76 0
3 -0.64 0.88 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 -1.15 -0.87 -0.47 -0.99 0 -3.17 0
4 -0.64 -1.13 3.20 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 -1.19 -0.11 0.20 1.65 -0.99 0 0.53 0
5 -0.64 0.88 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 0.85 1.40 0.34 -0.99 0 1.90 1
6 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 -0.64 -0.35 3.96 0.84 1.13 1.53 -0.43 1.01 1 5.51 1
7 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 -0.49 -0.66 -1.06 -0.99 0 -2.94 0
8 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 -0.64 2.81 -0.25 0.84 1.32 0.18 -0.75 -0.99 1 1.00 0
9 -0.64 0.88 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 -0.58 0.94 0.15 -0.99 0 -0.16 0
10 -0.64 -1.13 3.20 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 0.84 0.18 1.36 2.03 -0.99 0 4.39 1
11 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 -0.01 0.53 -0.60 -0.99 0 -0.82 0
12 1.57 -1.13 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 -1.19 -0.39 -0.16 -0.25 1.01 0 -1.31 0
13 -0.64 0.88 -0.31 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 0.84 -0.68 -0.47 1.43 -0.99 0 -0.39 0
14 1.57 -1.13 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 -1.19 -0.20 -0.70 -0.38 -0.99 0 -3.79 0
15 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 -1.25 0.71 -0.52 1.01 0 0.20 0
16 1.57 -1.13 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 -1.19 1.23 1.09 0.19 -0.99 0 -0.01 0
17 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 -0.77 1.36 -0.33 1.01 0 1.52 1
18 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 -0.87 0.54 -0.57 1.01 0 0.37 0
19 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 0.18 0.05 -0.80 -0.99 0 -1.31 0
20 -0.64 -1.13 3.20 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 0.94 0.97 0.10 1.01 0 4.83 1
21 -0.64 0.88 -0.31 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 0.84 -0.11 -1.52 0.99 -0.99 0 -1.32 0
22 -0.64 0.88 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 -1.44 -1.67 -0.79 1.01 0 -2.58 0
23 -0.64 0.88 -0.31 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 0.84 -0.01 0.65 1.71 1.01 0 3.67 1
24 -0.64 0.88 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 -1.34 -0.22 -0.31 1.01 0 -0.55 0
25 1.57 -1.13 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 -0.49 0.29 0.02 1.01 0 1.34 1
26 1.57 -1.13 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 -1.19 -0.30 -1.04 -0.50 1.01 1 -2.35 0
27 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 -0.64 2.81 -0.25 0.84 0.46 -0.51 -0.97 1.01 0 1.23 1
28 -0.64 0.88 -0.31 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 0.84 0.65 -1.08 1.20 1.01 0 2.09 1
29 -0.64 0.88 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 0.37 -0.85 -0.47 1.01 0 0.37 0


July Default Score Prediction Correct?
0 1.20 1 0
0 -2.76 0 1
0 -3.17 0 1
0 0.53 0 1
0 1.90 1 0
1 5.51 1 1
0 -2.94 0 1
1 1.00 0 0
0 -0.16 0 1
0 4.39 1 0
0 -0.82 0 1
0 -1.31 0 1
0 -0.39 0 1
0 -3.79 0 1

Score: =SUMPRODUCT($B$2:$O$2,B12:O12)

Prediction: =IF(Q14>$B$4,1,0)

Correct?: =IF(R16=P16,1,0)


Discriminant Scores 1 1 1 1

Cut-off Value 1

Number Correct 582 (formula: =SUM(S10:S1009))

Subject Single Married Divorced Widowed
1 1.57 -1.13 -0.31 -0.25
2 -0.64 0.88 -0.31 -0.25
3 -0.64 0.88 -0.31 -0.25
4 -0.64 -1.13 3.20 -0.25


Discriminant Scores 2E+29 2E+29 -6.7E+28 8.375E+28 -5.9E+29 -3.1E+29 -1.6E+29 1.9E+29 6.9E+29 -2.8E+29 2.40956E+28 -1.7E+29 3E+29 2.8E+28

Cut-off Value 7E+29

Number Correct 877

Subject Single Married Divorced Widowed Credit A Credit B Credit C Credit D Credit E Children? Age Income Debt Female July Default Score
1 1.57 -1.13 -0.31 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 -1.19 -1.06 0.85 1.92 1.01 0 -1.51E+29
2 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 0.37 -1.27 -1.12 -0.99 0 -2.36E+29
3 -0.64 0.88 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 -1.15 -0.87 -0.47 -0.99 0 -5.00E+29
4 -0.64 -1.13 3.20 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 -1.19 -0.11 0.20 1.65 -0.99 0 -7.51E+29
5 -0.64 0.88 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 0.85 1.40 0.34 -0.99 0 -6.07E+29
6 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 -0.64 -0.35 3.96 0.84 1.13 1.53 -0.43 1.01 1 2.83E+30
7 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 -0.49 -0.66 -1.06 -0.99 0 -3.42E+29
8 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 -0.64 2.81 -0.25 0.84 1.32 0.18 -0.75 -0.99 1 6.15E+29
9 -0.64 0.88 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 -0.58 0.94 0.15 -0.99 0 -6.17E+29
10 -0.64 -1.13 3.20 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 0.84 0.18 1.36 2.03 -0.99 0 -1.40E+30
11 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 -0.01 0.53 -0.60 -0.99 0 -4.02E+29
12 1.57 -1.13 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 -1.19 -0.39 -0.16 -0.25 1.01 0 3.80E+28
13 -0.64 0.88 -0.31 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 0.84 -0.68 -0.47 1.43 -0.99 0 -6.36E+29
14 1.57 -1.13 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 -1.19 -0.20 -0.70 -0.38 -0.99 0 3.93E+28
15 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 -1.25 0.71 -0.52 1.01 0 -3.83E+29
16 1.57 -1.13 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 -1.19 1.23 1.09 0.19 -0.99 0 -6.58E+28
17 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 -0.77 1.36 -0.33 1.01 0 -4.26E+29
18 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 -0.87 0.54 -0.57 1.01 0 -3.59E+29
19 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 1.56 -0.35 -0.25 0.84 0.18 0.05 -0.80 -0.99 0 -3.72E+29
20 -0.64 -1.13 3.20 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 0.94 0.97 0.10 1.01 0 -1.19E+30
21 -0.64 0.88 -0.31 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 0.84 -0.11 -1.52 0.99 -0.99 0 -5.69E+29
22 -0.64 0.88 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 -1.44 -1.67 -0.79 1.01 0 -4.06E+29
23 -0.64 0.88 -0.31 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 0.84 -0.01 0.65 1.71 1.01 0 -6.74E+29
24 -0.64 0.88 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 -1.34 -0.22 -0.31 1.01 0 -5.13E+29
25 1.57 -1.13 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 -0.49 0.29 0.02 1.01 0 -5.34E+29
26 1.57 -1.13 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 -1.19 -0.30 -1.04 -0.50 1.01 1 1.18E+29
27 -0.64 0.88 -0.31 -0.25 -0.60 -0.60 -0.64 2.81 -0.25 0.84 0.46 -0.51 -0.97 1.01 0 7.05E+29
28 -0.64 0.88 -0.31 -0.25 1.65 -0.60 -0.64 -0.35 -0.25 0.84 0.65 -1.08 1.20 1.01 0 -5.08E+29
29 -0.64 0.88 -0.31 -0.25 -0.60 1.65 -0.64 -0.35 -0.25 0.84 0.37 -0.85 -0.47 1.01 0 -4.13E+29



* ERROR * After subtracting group means, Credit E is highly correlated with other predictors.

* ERROR * Calculations for discriminant analysis cannot be done.
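A likely reason for this error is that every category of a dummy-coded variable was included, so the columns of each set (for example Single, Married, Divorced, Widowed) always add up to 1. The sketch below illustrates the resulting linear dependence on hypothetical data. It centers on the overall column means rather than on group means, which is a simplification, but the same dependence appears either way.

```python
import numpy as np

# Hypothetical marital-status codes for six subjects
# (0 = Single, 1 = Married, 2 = Divorced, 3 = Widowed).
status = np.array([0, 1, 1, 2, 3, 1])
dummies = np.eye(4)[status]            # one 0/1 column per category

# After centering, the four columns sum to zero in every row,
# so one column is an exact linear combination of the others.
centered_all = dummies - dummies.mean(axis=0)
centered_drop = centered_all[:, 1:]    # drop the first category as the baseline

rank_all = np.linalg.matrix_rank(centered_all)
rank_drop = np.linalg.matrix_rank(centered_drop)
print(rank_all, "of", centered_all.shape[1], "columns are independent")
print(rank_drop, "of", centered_drop.shape[1], "columns are independent")
```

Dropping one category from each set restores full column rank, which is consistent with the shorter predictor list in the Minitab run that follows.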



Linear Method for Response: Default

Predictors: Married, Divorced, Widowed, Credit A, Credit B, Credit C, Credit D, Children, Age, Income, Debt, Male

Group      0     1
Count    847   153

Summary of classification

                  True Group
Put into Group      0      1
0                 757     56
1                  90     97
Total N           847    153
N correct         757     97
Proportion      0.894  0.634

N = 1000    N Correct = 854    Proportion Correct = 0.854
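The proportions in the classification summary follow directly from the four cell counts. A minimal check in Python (the variable names are illustrative only):

```python
import numpy as np

# Rows: group assigned by the model (0, 1); columns: true group (0, 1).
confusion = np.array([[757, 56],
                      [90, 97]])

total_by_true_group = confusion.sum(axis=0)     # column totals
correct_by_group = np.diag(confusion)           # correctly classified counts
proportion_by_group = correct_by_group / total_by_true_group
overall = correct_by_group.sum() / confusion.sum()

print(total_by_true_group, proportion_by_group.round(3), round(overall, 3))
```

The model does much better on group 0 (non-defaults) than on group 1, which is worth noting because group 1 makes up only 153 of the 1000 observations.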


Squared Distance Between Groups

         0        1
0  0.00000  3.43784
1  3.43784  0.00000

Linear Discriminant Function for Groups

               0        1
Constant  -33.44   -27.08
Married     1.68     0.52
Divorced    0.23    -0.45
Widowed    -1.07    -2.23
Credit A  233.62   228.17
Credit B   86.86    80.92
Credit C   38.30    32.32
Credit D   34.27    31.99
Children    1.88     1.02
Age         0.31     0.30
Income      0.00     0.00
Debt       -0.00    -0.00
Male        2.49     2.46


Summary of Misclassified Observations

Observation  True Group  Pred Group  Group  Squared Distance  Probability
26**         1           0           0       8.366            0.627
                                     1       9.402            0.373
27**         0           1           0      12.842            0.172
                                     1       9.699            0.828
32**         0           1           0      18.46             0.032
                                     1      11.67             0.968
37**         0           1           0      10.55             0.468
                                     1      10.29             0.532
40**         0           1           0      15.651            0.027
                                     1       8.499            0.973
52**         0           1           0      14.99             0.172
                                     1      11.85             0.828
62**         1           0           0       9.263            0.872
                                     1      13.104            0.128
64**         0           1           0      16.556            0.026
                                     1       9.290            0.974


Excel vs. Minitab
• In this case the Excel method found a better solution (877 of 1000 observations classified correctly, versus 854 for Minitab)

• The Excel method illustrated here is really only good for distinguishing between two groups; the Minitab method is more general (multiple groups)

• For the special case in which there are only two groups, there is a better tool called logistic regression (or logit regression). This will be the topic for Session 10.


Steam Case

Detergent plant energy costs are increasing and the company wants to control them better.

The actual causes of energy use are not well understood beyond an intuitive grasp.

The problem is complex, as there are many hypothesized causes.

Relatively good historical records have been maintained.

Question: Can past data be used to build a descriptive and predictive model?

Why the Variation? How can steam usage be predicted or controlled?


Historical plant data are available for a 25-month period on:

STEAM Thousands of tons of high pressure steam used

INV Tons of inventory of fats

PROD Tons of detergent production

WIND Average wind velocity

CDAY Calendar days in the month

OPDAY Operating days in the month

FDAY Number of days below freezing

TEMP Average outside air temperature

STARTS Number of production start-ups


SEQ MONTH STEAM INV PROD WIND CDAY OPDAY FDAY TEMP STARTS
1 1 10.98 5.20 0.61 7.4 31 20 22 35.3 4
2 2 11.13 5.12 0.64 8.0 29 20 25 29.7 5
3 3 12.51 6.19 0.78 7.4 31 23 17 30.8 4
4 4 8.40 3.89 0.49 7.5 30 20 22 58.8 4
5 5 9.27 6.28 0.84 5.5 31 21 0 61.4 5
6 6 8.73 5.76 0.74 8.9 30 22 0 71.3 4
7 7 6.36 3.45 0.42 4.1 31 11 0 74.4 2
8 8 8.50 6.57 0.87 4.1 31 23 0 76.7 5
9 9 7.82 5.69 0.75 4.1 30 21 0 70.7 4
10 10 9.14 6.14 0.76 4.5 31 20 0 57.5 5
11 11 8.24 4.84 0.65 10.3 30 20 11 46.4 4
12 12 12.19 4.88 0.62 6.9 31 21 12 28.9 4
13 1 11.88 6.03 0.79 6.6 31 21 25 28.1 5
14 2 9.57 4.55 0.60 7.3 28 19 18 39.1 5
15 3 10.94 5.71 0.70 8.1 31 23 5 46.8 4
16 4 9.58 5.67 0.74 8.4 30 20 7 48.5 4
17 5 10.09 6.72 0.85 6.1 31 22 0 59.3 6
18 6 8.11 4.95 0.67 4.9 30 22 0 70.0 4
19 7 6.83 4.62 0.45 4.6 31 11 0 70.0 3
20 8 8.88 6.60 0.95 3.7 31 23 0 74.5 4
21 9 7.68 5.01 0.64 4.7 30 20 0 72.1 4
22 10 8.47 5.68 0.75 5.3 31 21 1 58.1 6
23 11 8.86 5.28 0.70 6.2 30 20 14 44.6 4
24 12 10.36 5.36 0.67 6.8 31 20 22 33.4 4
25 1 11.08 5.87 0.70 7.5 31 22 28 28.6 5


[Figure: Run Chart. Steam (0 to 14) by Month, 25 consecutive months]


Building a Useful Regression Model
• Multiple regression will permit us to fit Steam versus any combination of the explanatory variables.

• It is not necessarily good to use them all. The scientific principle of modeling parsimony should be applied.

• Fitting the “best” simple model and fitting a “full model” can both be useful diagnostics.

• Bottom Line: We will find that several quite good and simple models can be developed in this case.


Correlation matrix (lower triangle):

        SEQ      MONTH    STEAM    INV      PROD     WIND     CDAY     OPDAY    FDAY     TEMP     STARTS
SEQ     1
MONTH   0.3438   1
STEAM   -0.1415  -0.4122  1
INV     0.1140   -0.0601  0.3832   1
PROD    0.1088   0.0268   0.3055   0.9436   1
WIND    -0.2062  -0.3179  0.4743   -0.1261  -0.1437  1
CDAY    0.1102   0.1738   0.1367   0.3821   0.2482   -0.3168  1
OPDAY   0.0094   -0.1017  0.5361   0.6851   0.7645   0.2311   0.0201   1
FDAY    -0.0908  -0.4498  0.6406   -0.1911  -0.2264  0.5581   -0.2048  0.1169   1
TEMP    0.0257   0.3296   -0.8452  -0.0019  0.0677   -0.6163  0.0774   -0.2098  -0.8576  1
STARTS  0.1195   -0.1651  0.3821   0.6163   0.6013   0.0739   -0.0533  0.6006   0.1175   -0.2370  1



One approach to building a model is to start with one independent variable, and then add others sequentially on the basis of their correlation with the dependent variable. (We will examine variants of this approach next time.)

Here are the independent variables, ranked by their correlation with Steam:

Variable  Correlation with STEAM
TEMP      -0.8452
FDAY       0.6406
OPDAY      0.5361
WIND       0.4743
MONTH     -0.4122
INV        0.3832
STARTS     0.3821
PROD       0.3055
CDAY       0.1367
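A ranking like this can be reproduced with a few lines of Python. The sketch below is illustrative: it re-enters the 25 months of plant data (SEQ omitted), and the names `data`, `names`, and `ranking` are assumptions, not part of the case.

```python
import numpy as np

names = ["MONTH", "STEAM", "INV", "PROD", "WIND", "CDAY", "OPDAY", "FDAY", "TEMP", "STARTS"]
data = np.array([
    [1, 10.98, 5.20, 0.61, 7.4, 31, 20, 22, 35.3, 4],
    [2, 11.13, 5.12, 0.64, 8.0, 29, 20, 25, 29.7, 5],
    [3, 12.51, 6.19, 0.78, 7.4, 31, 23, 17, 30.8, 4],
    [4, 8.40, 3.89, 0.49, 7.5, 30, 20, 22, 58.8, 4],
    [5, 9.27, 6.28, 0.84, 5.5, 31, 21, 0, 61.4, 5],
    [6, 8.73, 5.76, 0.74, 8.9, 30, 22, 0, 71.3, 4],
    [7, 6.36, 3.45, 0.42, 4.1, 31, 11, 0, 74.4, 2],
    [8, 8.50, 6.57, 0.87, 4.1, 31, 23, 0, 76.7, 5],
    [9, 7.82, 5.69, 0.75, 4.1, 30, 21, 0, 70.7, 4],
    [10, 9.14, 6.14, 0.76, 4.5, 31, 20, 0, 57.5, 5],
    [11, 8.24, 4.84, 0.65, 10.3, 30, 20, 11, 46.4, 4],
    [12, 12.19, 4.88, 0.62, 6.9, 31, 21, 12, 28.9, 4],
    [1, 11.88, 6.03, 0.79, 6.6, 31, 21, 25, 28.1, 5],
    [2, 9.57, 4.55, 0.60, 7.3, 28, 19, 18, 39.1, 5],
    [3, 10.94, 5.71, 0.70, 8.1, 31, 23, 5, 46.8, 4],
    [4, 9.58, 5.67, 0.74, 8.4, 30, 20, 7, 48.5, 4],
    [5, 10.09, 6.72, 0.85, 6.1, 31, 22, 0, 59.3, 6],
    [6, 8.11, 4.95, 0.67, 4.9, 30, 22, 0, 70.0, 4],
    [7, 6.83, 4.62, 0.45, 4.6, 31, 11, 0, 70.0, 3],
    [8, 8.88, 6.60, 0.95, 3.7, 31, 23, 0, 74.5, 4],
    [9, 7.68, 5.01, 0.64, 4.7, 30, 20, 0, 72.1, 4],
    [10, 8.47, 5.68, 0.75, 5.3, 31, 21, 1, 58.1, 6],
    [11, 8.86, 5.28, 0.70, 6.2, 30, 20, 14, 44.6, 4],
    [12, 10.36, 5.36, 0.67, 6.8, 31, 20, 22, 33.4, 4],
    [1, 11.08, 5.87, 0.70, 7.5, 31, 22, 28, 28.6, 5],
])

corr = np.corrcoef(data, rowvar=False)     # variables are in columns
steam = names.index("STEAM")

# Rank every other variable by the absolute value of its correlation with STEAM.
ranking = sorted(
    ((name, corr[steam, i]) for i, name in enumerate(names) if name != "STEAM"),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name:7s}{r:8.4f}")
```

Ranking by absolute value matters here: TEMP has the strongest relationship with STEAM even though its correlation is negative.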


[Figure: Steam vs. Temp. Scatter plot of Steam (0 to 14) against Temp (0 to 90)]


Regression Statistics
Multiple R          0.8452
R Square            0.7144
Adjusted R Square   0.7020
Standard Error      0.8901
Observations        25

ANOVA
            df   SS        MS        F         Significance F
Regression  1    45.5924   45.5924   57.5428   0.0000
Residual    23   18.2234   0.7923
Total       24   63.8158

            Coefficients   Standard Error   t Stat    P-value
Intercept   13.6230        0.5815           23.4288   0.0000
TEMP        -0.0798        0.0105           -7.5857   0.0000
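For readers who want to reproduce this output outside Excel, here is a minimal sketch of the same simple regression in Python with NumPy. The arrays re-enter the STEAM and TEMP columns of the case data; the variable names are illustrative.

```python
import numpy as np

temp = np.array([35.3, 29.7, 30.8, 58.8, 61.4, 71.3, 74.4, 76.7, 70.7, 57.5,
                 46.4, 28.9, 28.1, 39.1, 46.8, 48.5, 59.3, 70.0, 70.0, 74.5,
                 72.1, 58.1, 44.6, 33.4, 28.6])
steam = np.array([10.98, 11.13, 12.51, 8.40, 9.27, 8.73, 6.36, 8.50, 7.82, 9.14,
                  8.24, 12.19, 11.88, 9.57, 10.94, 9.58, 10.09, 8.11, 6.83, 8.88,
                  7.68, 8.47, 8.86, 10.36, 11.08])

n = len(steam)
slope, intercept = np.polyfit(temp, steam, 1)    # least-squares line

fitted = intercept + slope * temp
sse = ((steam - fitted) ** 2).sum()              # residual sum of squares
sst = ((steam - steam.mean()) ** 2).sum()        # total sum of squares
r2 = 1 - sse / sst
std_err = np.sqrt(sse / (n - 2))                 # standard error of the regression

print(f"intercept={intercept:.4f} slope={slope:.4f} R2={r2:.4f} s={std_err:.4f}")
```

The fitted line says that each additional degree of average temperature is associated with a drop of roughly 0.08 units of monthly steam use.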


SEQ MONTH STEAM TEMP Predicted STEAM Residuals
1 1 10.98 35.3 10.81 0.17
2 2 11.13 29.7 11.25 -0.12
3 3 12.51 30.8 11.16 1.35
4 4 8.40 58.8 8.93 -0.53
5 5 9.27 61.4 8.72 0.55
6 6 8.73 71.3 7.93 0.80
7 7 6.36 74.4 7.68 -1.32
8 8 8.50 76.7 7.50 1.00
9 9 7.82 70.7 7.98 -0.16
10 10 9.14 57.5 9.03 0.11
11 11 8.24 46.4 9.92 -1.68
12 12 12.19 28.9 11.32 0.87
13 1 11.88 28.1 11.38 0.50
14 2 9.57 39.1 10.50 -0.93
15 3 10.94 46.8 9.89 1.05
16 4 9.58 48.5 9.75 -0.17
17 5 10.09 59.3 8.89 1.20
18 6 8.11 70.0 8.03 0.08
19 7 6.83 70.0 8.03 -1.20
20 8 8.88 74.5 7.68 1.20
21 9 7.68 72.1 7.87 -0.19
22 10 8.47 58.1 8.98 -0.51
23 11 8.86 44.6 10.06 -1.20
24 12 10.36 33.4 10.96 -0.60
25 1 11.08 28.6 11.34 -0.26


[Figure: Intervals. Steam vs. Temp, showing Data, Y-hat, Prediction LCL and UCL, and Confidence LCL and UCL]


Let’s add “Days Below Freezing” to the model:

Regression Statistics
Multiple R          0.8610
R Square            0.7413
Adjusted R Square   0.7177
Standard Error      0.8663
Observations        25

ANOVA
            df   SS        MS        F         Significance F
Regression  2    47.3047   23.6524   31.5153   0.0000
Residual    22   16.5111   0.7505
Total       24   63.8158

            Coefficients   Standard Error   t Stat    P-value
Intercept   15.4426        1.3310           11.6025   0.0000
FDAY        -0.0505        0.0334           -1.5105   0.1451
TEMP        -0.1056        0.0199           -5.3039   0.0000

Not encouraging; why?


Let’s skip “Days Below Freezing” and try “Operating Days”:

Regression Statistics
Multiple R          0.9215
R Square            0.8491
Adjusted R Square   0.8354
Standard Error      0.6616
Observations        25

ANOVA
            df   SS        MS        F         Significance F
Regression  2    54.1871   27.0935   61.9043   0.0000
Residual    22   9.6287    0.4377
Total       24   63.8158

            Coefficients   Standard Error   t Stat    P-value
Intercept   9.1269         1.1028           8.2761    0.0000
TEMP        -0.0724        0.0080           -9.0498   0.0000
OPDAY       0.2028         0.0458           4.4314    0.0002

Better!


The most obvious 3-variable model:

Regression Statistics
Multiple R          0.9279
R Square            0.8609
Adjusted R Square   0.8411
Standard Error      0.6501
Observations        25

ANOVA
            df   SS        MS        F         Significance F
Regression  3    54.9417   18.3139   43.3388   0.0000
Residual    21   8.8741    0.4226
Total       24   63.8158

            Coefficients   Standard Error   t Stat    P-value
Intercept   10.1988        1.3482           7.5647    0.0000
TEMP        -0.0802        0.0098           -8.1868   0.0000
OPDAY       0.2108         0.0454           4.6470    0.0001
WIND        -0.1295        0.0969           -1.3363   0.1957

The “Wind Velocity” variable doesn’t seem to add much.


A better 3-variable model:

Regression Statistics
Multiple R          0.9379
R Square            0.8796
Adjusted R Square   0.8624
Standard Error      0.6049
Observations        25

ANOVA
            df   SS        MS        F         Significance F
Regression  3    56.1310   18.7103   51.1290   0.0000
Residual    21   7.6848    0.3659
Total       24   63.8158

            Coefficients   Standard Error   t Stat     P-value
Intercept   8.5663         1.0373           8.2581     0.0000
TEMP        -0.0758        0.0075           -10.1578   0.0000
INV         0.4880         0.2117           2.3048     0.0315
OPDAY       0.1082         0.0586           1.8456     0.0791

This one is better; the adjusted R-square is up to 0.8624.
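A multiple regression such as the TEMP, INV, OPDAY model can be fitted by ordinary least squares in a few lines. The sketch below is one way to do it with NumPy, re-entering the four relevant columns of the case data; it is an illustration, not the Excel procedure used to produce the output above.

```python
import numpy as np

temp = np.array([35.3, 29.7, 30.8, 58.8, 61.4, 71.3, 74.4, 76.7, 70.7, 57.5,
                 46.4, 28.9, 28.1, 39.1, 46.8, 48.5, 59.3, 70.0, 70.0, 74.5,
                 72.1, 58.1, 44.6, 33.4, 28.6])
inv = np.array([5.20, 5.12, 6.19, 3.89, 6.28, 5.76, 3.45, 6.57, 5.69, 6.14,
                4.84, 4.88, 6.03, 4.55, 5.71, 5.67, 6.72, 4.95, 4.62, 6.60,
                5.01, 5.68, 5.28, 5.36, 5.87])
opday = np.array([20, 20, 23, 20, 21, 22, 11, 23, 21, 20, 20, 21, 21, 19, 23,
                  20, 22, 22, 11, 23, 20, 21, 20, 20, 22], dtype=float)
steam = np.array([10.98, 11.13, 12.51, 8.40, 9.27, 8.73, 6.36, 8.50, 7.82, 9.14,
                  8.24, 12.19, 11.88, 9.57, 10.94, 9.58, 10.09, 8.11, 6.83, 8.88,
                  7.68, 8.47, 8.86, 10.36, 11.08])

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones(len(steam)), temp, inv, opday])
coefs, *_ = np.linalg.lstsq(X, steam, rcond=None)

fitted = X @ coefs
sse = ((steam - fitted) ** 2).sum()
sst = ((steam - steam.mean()) ** 2).sum()
r2 = 1 - sse / sst

n, k = len(steam), X.shape[1] - 1               # k = number of predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

for name, b in zip(["Intercept", "TEMP", "INV", "OPDAY"], coefs):
    print(f"{name:10s}{b:9.4f}")
print(f"R2={r2:.4f}  adjusted R2={adj_r2:.4f}")
```

Swapping columns in and out of `X` gives the other candidate models, so the comparisons on these slides can be repeated quickly.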


There are some useful inferences to be made from the “full” model, even though it contains some insignificant variables and may not be the best model for forecasting purposes.


Regression Statistics
Multiple R          0.9539
R Square            0.9098
Adjusted R Square   0.8648
Standard Error      0.5997
Observations        25

ANOVA
            df   SS        MS       F         Significance F
Regression  8    58.0619   7.2577   20.1818   0.0000
Residual    16   5.7539    0.3596
Total       24   63.8158

            Coefficients   Standard Error   t Stat    P-value
Intercept   6.2909         6.8096           0.9238    0.3693
INV         0.9365         0.5760           1.6257    0.1235
PROD        -4.6388        3.9979           -1.1603   0.2629
WIND        -0.0868        0.1035           -0.8388   0.4139
CDAY        0.1053         0.2152           0.4893    0.6313
OPDAY       0.2206         0.0810           2.7222    0.0151
FDAY        -0.0181        0.0258           -0.7001   0.4939
TEMP        -0.0874        0.0163           -5.3696   0.0001
STARTS      -0.2548        0.2142           -1.1896   0.2516


Conclusions from the Full Model

Note that R² for the simple model with only TEMP was 71%, while for the full model it was 91%. The full-model R² becomes a benchmark for other models.

TEMP is still very significant in the full model, even given that all other variables are in the model.

OPDAY is the only other variable that is significant on a “last-in” basis.

INV is of borderline significance on a “last-in” basis.

These three variables’ significance is evaluated on what might be called a “worst-case” basis.
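The jump in R² from 71% to 91% comes at the cost of seven more predictors. Adjusted R² makes that trade-off visible by penalizing each added variable. A small sketch of the standard formula, applied to the R² values reported in the outputs (the function name is illustrative):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

simple = adjusted_r2(0.7144, n=25, k=1)   # TEMP only
best3 = adjusted_r2(0.8796, n=25, k=3)    # TEMP, INV, OPDAY
full = adjusted_r2(0.9098, n=25, k=8)     # all eight predictors

print(round(simple, 4), round(best3, 4), round(full, 4))
```

On this measure the full model is only marginally ahead of the three-variable model, which is the quantitative side of the parsimony argument. Small differences in the last decimal place relative to the Excel output come from starting with R² rounded to four places.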


Examine the correlations among these variables:

        STEAM    INV      OPDAY    TEMP
STEAM   1
INV     0.3832   1
OPDAY   0.5361   0.6851   1
TEMP    -0.8452  -0.0019  -0.2098  1


Validating a Three-Parameter Model

The bottom line is whether the model makes good predictions:

[Figure: Predictions vs. Observations. Steam by month (1 to 25), showing Actual Data and Predictions]


Residual Analysis

[Figure: Histogram of Residuals. Frequency vs. Residual Error]


[Figure: Normal Probability Plot. Standardized Residual vs. Normal Score]


[Figure: Residuals vs. Observations. Residual Error vs. Steam]


[Figure: Residuals vs. Predictions. Residual Error vs. Predicted Steam]


[Figure: Residuals vs. Time. Residual Error vs. Months]


[Figure: Residuals vs. Temperature. Residual Error vs. Temperature]


[Figure: Residuals vs. Inventory. Residual Error vs. Inventory of Fats (tons)]


[Figure: Residuals vs. Operating Days. Residual Error vs. Operating Days]


We might check the residuals against other independent variables not in the model. Why?

[Figure: Residuals vs. Starts. Residual Error vs. Starts]


Holdout Samples

One possible problem with our model is the potential for “overfitting”, where we have a great model in terms of fitting past data, but a poor model for predicting the future. How can we assess the predictive quality of our model?

One method is to “hold out” some data, and fit a model using only a subset of all of the data we have. Then, we can use the model to see how well it would have predicted the holdout sample.


Here, we use one year of data to fit a model:

Regression Statistics
Multiple R          0.9463
R Square            0.8954
Adjusted R Square   0.8562
Standard Error      0.7080
Observations        12

ANOVA
            df   SS        MS        F         Significance F
Regression  3    34.3438   11.4479   22.8370   0.0003
Residual    8    4.0103    0.5013
Total       11   38.3541

            Coefficients   Standard Error   t Stat    P-value
Intercept   9.8145         1.7183           5.7117    0.0004
TEMP        -0.0861        0.0127           -6.7716   0.0001
INV         0.4893         0.3620           1.3516    0.2135
OPDAY       0.0804         0.1161           0.6927    0.5081
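The holdout idea can be sketched as follows: fit the TEMP, INV, OPDAY model on months 1 to 12 only, then predict months 13 to 25 and compare the errors. The root-mean-square error used here as the summary measure is an assumption for illustration; the slides compare predictions and actual values graphically.

```python
import numpy as np

temp = np.array([35.3, 29.7, 30.8, 58.8, 61.4, 71.3, 74.4, 76.7, 70.7, 57.5,
                 46.4, 28.9, 28.1, 39.1, 46.8, 48.5, 59.3, 70.0, 70.0, 74.5,
                 72.1, 58.1, 44.6, 33.4, 28.6])
inv = np.array([5.20, 5.12, 6.19, 3.89, 6.28, 5.76, 3.45, 6.57, 5.69, 6.14,
                4.84, 4.88, 6.03, 4.55, 5.71, 5.67, 6.72, 4.95, 4.62, 6.60,
                5.01, 5.68, 5.28, 5.36, 5.87])
opday = np.array([20, 20, 23, 20, 21, 22, 11, 23, 21, 20, 20, 21, 21, 19, 23,
                  20, 22, 22, 11, 23, 20, 21, 20, 20, 22], dtype=float)
steam = np.array([10.98, 11.13, 12.51, 8.40, 9.27, 8.73, 6.36, 8.50, 7.82, 9.14,
                  8.24, 12.19, 11.88, 9.57, 10.94, 9.58, 10.09, 8.11, 6.83, 8.88,
                  7.68, 8.47, 8.86, 10.36, 11.08])

X = np.column_stack([np.ones(len(steam)), temp, inv, opday])
fit_rows = slice(0, 12)        # first year: used to estimate the coefficients
holdout_rows = slice(12, 25)   # remaining 13 months: never seen by the fit

coefs, *_ = np.linalg.lstsq(X[fit_rows], steam[fit_rows], rcond=None)

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

holdout_pred = X[holdout_rows] @ coefs
rmse_fit = rmse(steam[fit_rows], X[fit_rows] @ coefs)
rmse_holdout = rmse(steam[holdout_rows], holdout_pred)

print(f"RMSE on fitting year: {rmse_fit:.3f}")
print(f"RMSE on holdout months: {rmse_holdout:.3f}")
```

If the holdout error is much larger than the in-sample error, that is a sign of overfitting; if the two are comparable, the model is generalizing reasonably well.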


[Figure: Using 1st Year to Predict 2nd Year. Steam by Month (1 to 25), showing STEAM and Y-hat]


Is the 3-variable model significantly better than the 2-variable model?

                         2 Variables      3 Variables
R-square                 0.8491           0.8796
Adjusted R-square        0.8354           0.8624
Standard Error           0.6616           0.6049
Coefficient for Temp     -0.0724          -0.0758
Coefficient for Inv      (not in model)   0.4880
Coefficient for Opdays   0.2028           0.1082
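One standard way to answer the question is a partial F-test on the added variable, using the residual sums of squares from the two regression outputs. The sketch below is plain arithmetic on those reported figures; the variable names are illustrative.

```python
# Residual sums of squares taken from the regression outputs.
sse_reduced = 9.6287    # TEMP + OPDAY model, 22 residual df
sse_full = 7.6848       # TEMP + INV + OPDAY model, 21 residual df
df_full = 21
extra_terms = 1         # INV is the only added variable

# Partial F: reduction in SSE per added term, relative to the larger model's MSE.
f_stat = ((sse_reduced - sse_full) / extra_terms) / (sse_full / df_full)

# With a single added variable, the partial F equals the square of its t statistic.
t_stat_inv = 2.3048
print(round(f_stat, 3), round(t_stat_inv ** 2, 3))
```

Because only one variable is added, this test is equivalent to the t-test on INV in the 3-variable output, whose P-value of 0.0315 indicates a significant improvement at the 5% level.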



0

2

4

6

8

10

12

14

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Months

Ste

am

Actual Data

Predictions


0

2

4

6

8

10

12

14

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Months

Ste

am

Actual Data

Predictions

Predictions and Observations


Histograms of Residuals

Histogram of Residuals

0

2

4

6

8

10

12

-2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5

Residual Error

Fre

qu

en

cy

Histogram of Residuals

0

1

2

3

4

5

6

7

8

9

10

-2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5

Residual Error

Fre

qu

en

cy


Normal Plots


-3

-2

-1

0

1

2

3

-3.0 -2.0 -1.0 0.0 1.0 2.0 3.0

Normal Score

Sta

nd

ard

ize

d R

es

idu

al


-3

-2

-1

0

1

2

3

-3 -2 -1 0 1 2 3

Normal Score

Sta

nd

ard

ize

d R

es

idu

al




-2.0

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

6 7 8 9 10 11 12 13

Steam

Re

sid

ua

l E

rro

r


-2.0

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

6 7 8 9 10 11 12 13

Steam

Re

sid

ua

l E

rro

r




-2.0

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

0 2 4 6 8 10 12 14

Predicted Steam

Re

sid

ua

l E

rro

r


-2.0

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

0 2 4 6 8 10 12 14

Predicted Steam

Re

sid

ua

l E

rro

r


Residuals vs. Temp


-2.0

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

20 30 40 50 60 70 80

Temperature

Re

sid

ua

l E

rro

r


-2.0

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

20 30 40 50 60 70 80

Temperature

Re

sid

ua

l E

rro

r


Residuals vs. Opdays


-2.0

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

10 12 14 16 18 20 22 24

Operating Days

Re

sid

ua

l E

rro

r


-2.0

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

10 12 14 16 18 20 22 24

Operating Days

Re

sid

ua

l E

rro

r




-2.0

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0


Re

sid

ua

l E

rro

r


-2.0

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

2.0

3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0


Re

sid

ua

l E

rro

r


Residuals vs. Time

[Figure: two panels, each titled Residuals vs. Time. Residual Error (y-axis, -2.0 to 2.0) vs. Months (x-axis, 0 to 25).]




[Figure: two panels. Residual Error (y-axis, -2.0 to 2.0) vs. Starts (x-axis, 0 to 7).]


Conclusions

• We observe very low values for operating days in months 7 and 19. On investigation, we find that the plant shuts down for a two-week vacation in July. Perhaps we need a dummy variable for the summer vacation shutdown?

• Some reduced model appears to fit the data well. Which one we adopt depends in part on how management wishes to use the model and on whether the predictor variables being considered (in this case INV, TEMP, OPDAYS, etc.) can themselves be predicted.

• A more complex model might be possible. Perhaps TEMP*CDAYS would be better than either variable alone. Perhaps WIND^3*TEMP (measuring a wind-chill factor)? However, it is easy to reach the point of diminishing returns and fall into paralysis by analysis.
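The two modeling ideas in these conclusions, a shutdown dummy and an interaction term, can be sketched as extra columns in the design matrix. This is only an illustration under stated assumptions: the data are synthetic (not the Steam case data), the variable names mirror the slide labels, and the shutdown months are taken to be 7 and 19 as noted above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 25
month = np.arange(1, n_months + 1)

# Synthetic stand-ins for the case variables (NOT the real Steam data).
temp = rng.uniform(20, 80, n_months)
opdays = rng.integers(18, 25, n_months).astype(float)
cdays = rng.integers(28, 32, n_months).astype(float)

# Dummy variable: 1 in the shutdown months (7 and 19), 0 otherwise.
shutdown = np.isin(month, [7, 19]).astype(float)
opdays[shutdown == 1] = 11.0  # mimic the low operating-day counts

# Interaction term of the kind suggested above, used in place of
# TEMP and CDAYS as separate predictors.
temp_x_cdays = temp * cdays

# Synthetic response so that the fit has something to recover.
steam = (12.0 - 0.08 * temp + 0.1 * opdays - 1.0 * shutdown
         + rng.normal(0, 0.5, n_months))

# Design matrix: intercept, OPDAYS, shutdown dummy, TEMP*CDAYS.
X = np.column_stack([np.ones(n_months), opdays, shutdown, temp_x_cdays])
coef, *_ = np.linalg.lstsq(X, steam, rcond=None)
residuals = steam - X @ coef
```

The coefficient on the dummy absorbs the level shift in the two shutdown months, so those months no longer have to be explained by OPDAYS alone.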


Cars Case

Need to remove one drive type (I removed FWD)

Need to remove one make (I removed Chevrolet)

Still need to remove Volkswagen because of multicollinearity
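Why one category has to go can be shown with a small rank check. The sketch below uses a made-up sample of drive types (not the Cars data): with an intercept and a dummy for every level, the dummy columns sum to the intercept column, so the design matrix loses rank.

```python
import numpy as np

# Made-up sample of drive types, for illustration only.
drive = ["FWD", "RWD", "AWD", "FWD", "AWD", "RWD", "FWD"]

def one_hot(values, levels):
    """Return one 0/1 column per level, in the order given."""
    return np.array([[1.0 if v == lev else 0.0 for lev in levels]
                     for v in values])

intercept = np.ones(len(drive))

# All three dummies plus an intercept: 4 columns, but FWD + RWD + AWD = 1.
full = np.column_stack([intercept, one_hot(drive, ["FWD", "RWD", "AWD"])])

# Drop FWD so that it becomes the baseline level.
reduced = np.column_stack([intercept, one_hot(drive, ["RWD", "AWD"])])

rank_full = np.linalg.matrix_rank(full)        # less than the column count
rank_reduced = np.linalg.matrix_rank(reduced)  # equal to the column count
```

With FWD dropped, the RWD and AWD coefficients read as price differences relative to a front-wheel-drive car.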


A Possible “Enter” Procedure

ABS(Correl) Regression Statistics         Model 1  Model 2  Model 3  Model 4  Model 5  Model 6 Model 7a Model 7b Model 7c Model 7d
            Multiple R                     0.9175   0.9408   0.9507   0.9595   0.9632   0.9736   0.9736   0.9738   0.9737   0.9736
            R Square                       0.8418   0.8852   0.9038   0.9206   0.9278   0.9479   0.9479   0.9484   0.9481   0.9479
            Adjusted R Square              0.8379   0.8793   0.8962   0.9120   0.9178   0.9390   0.9372   0.9377   0.9374   0.9372
            Standard Error                   6492     5602     5195     4782     4623     3983     4040     4024     4033     4041

            Intercept                       -7326     9993    10618    13428    11043    -9528    -8779    -8912   -10341    -9532
 0.9175     HP                                171      129      118      109       96      136      135      135      137      136
-0.8163     FWD                                     -11145    -9929   -11738    -7298    -8040    -8131    -7930    -8028    -8055
 0.6064     AWD                                                9886      417     6608     3864     3808     4023     3901     3853
 0.5584     Audi                                                       10095    11199    10615    10505    10460    10507    10644
 0.5507     Lexus                                                                7316     3662     3627     3665     3513     3680
-0.5170     MPG City                                                                       635      628      618      668      634
 0.5163     RWD
-0.2737     Powertrain Warranty (miles)                                                             -6
-0.2367     Chevrolet                                                                                      -1203
-0.2192     Toyota                                                                                                   -825
-0.1827     Nissan                                                                                                             191
-0.1546     Mazda
-0.1518     Ford
-0.1513     Volkswagen
-0.1508     Saturn
-0.1455     Honda
-0.0928     Trunk
-0.0888     Chrysler
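One way to read the table above is as a loop: rank candidate predictors by the absolute value of their correlation with the response, try them in that order, and keep each one only if adjusted R Square improves. The sketch below is an interpretation under that assumption, run on synthetic data; the function names are hypothetical and it is not the exact procedure used to build the slide.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit of y on X (intercept added here)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def enter_by_correlation(X, y):
    """Try predictors in order of |corr(x, y)|; keep those that help."""
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    order = np.argsort(corr)[::-1]  # strongest correlation first
    kept, best = [], -np.inf
    for j in order:
        trial = kept + [int(j)]
        score = adjusted_r2(X[:, trial], y)
        if score > best:  # keep only if adjusted R^2 rises
            kept, best = trial, score
    return kept, best

# Synthetic data: only columns 0 and 1 truly drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(42, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 0.5, 42)

kept, best = enter_by_correlation(X, y)
```

In the table, Models 7a to 7d each try one further variable after Model 6, and none of them raises adjusted R Square above 0.9390, which is the stopping signal this loop encodes.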


A Possible “Remove” Procedure

Multiple R            0.9757  0.9757  0.9757  0.9756  0.9755  0.9753  0.9752  0.9750  0.9748  0.9742  0.9727
R Square              0.9520  0.9520  0.9520  0.9519  0.9516  0.9513  0.9510  0.9505  0.9501  0.9490  0.9462
Adjusted R Square     0.9243  0.9271  0.9296  0.9320  0.9338  0.9355  0.9372  0.9385  0.9399  0.9402  0.9388
Standard Error          4435    4353    4276    4206    4148    4093    4041    3997    3953    3941    3989

Intercept             -13060  -13066  -13972  -14427  -15063  -15499  -16268  -18731  -17596  -19837  -20269
MPG City                 666     666     662     662     656     650     657     673     626     669     687
HP                       141     141     140     140     140     139     139     141     139     147     145
Trunk                     -1
Warranty                 -70     -70     -59     -55     -46     -35     -25
Audi                    7857    7857    8440    8640    9056    9193    9305    9866   10047    9533   10060
Chrysler                1770    1770    1760    1762    1755
Ford                   -4533   -4535   -4078   -3915   -3586   -3579   -3566   -3222   -3069   -3362
Honda                  -1215   -1215    -776
Lexus                   2386    2388    2782    2885    3157    2500    2586    2851    3090
Nissan                 -1692   -1693   -1247   -1074
Saturn                  1899    1900    1897    1902    1908    1443
Toyota                 -2435   -2435   -1983   -1800   -1443   -1417   -1461   -1168
Volkswagen              -621    -622
RWD                     7563    7564    7578    7570    7573    8263    8069    7727    7758    9473   10141
AWD                    11443   11446   11506   11474   11487   11534   11560   11456   11443   10595   10934
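The Adjusted R Square row can be recomputed from R Square once the sample size and the number of predictors are known. The slides do not state the sample size; n = 42 below is an assumption, inferred because it reproduces the reported values to within rounding. The check shows why the second-to-last column (six predictors) beats the last one (five predictors).

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

N_OBS = 42  # assumption: inferred from the table, not stated in the slides

# Second-to-last column: six predictors remain, R Square = 0.9490.
six_predictors = adjusted_r2(0.9490, N_OBS, 6)

# Last column: Ford has been removed, five predictors remain, R Square = 0.9462.
five_predictors = adjusted_r2(0.9462, N_OBS, 5)

# Removing Ford lowers R Square by more than the penalty relief is worth.
ford_removal_hurts = five_predictors < six_predictors
```

The table reports 0.9402 and 0.9388 for these two columns; small differences in the last digit come from using the rounded R Square values as inputs.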


Minitab “Best Subsets” Procedure

                            1        2        3        4        5        6        7
Multiple R             0.9175   0.9372   0.9543   0.9669   0.9727   0.9742   0.9748
R Square               0.8418   0.8783   0.9107   0.9348   0.9462   0.9490   0.9501
Adjusted R Square      0.8379   0.8720   0.9036   0.9278   0.9388   0.9402   0.9399
Standard Error           6492     5768     5005     4334     3989     3941     3953

Intercept               -7326   -32141   -30500   -23445   -20269   -19837   -17596
HP                      171.1    206.9    191.6    159.1    145.3    147.2    139.1
MPG City                         805.4    828.5    721.1    686.9    668.9    626.4
Audi                                       9856    14864    10060     9533    10047
RWD                                                 8518    10141     9473     7758
AWD                                                         10934    10595    11443
Ford                                                                 -3362    -3069
Lexus                                                                          3090
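The idea behind a best-subsets search can be sketched as an exhaustive loop: for each model size, fit every combination of predictors and keep the one with the highest R Square. This is a minimal illustration on synthetic data with hypothetical predictor names; it is not Minitab's algorithm, which uses a more efficient search.

```python
from itertools import combinations

import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (intercept added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def best_subsets(X, y, names):
    """For each subset size k, return (R^2, names) of the best subset."""
    p = X.shape[1]
    best = {}
    for k in range(1, p + 1):
        scored = [(r_squared(X[:, list(cols)], y), cols)
                  for cols in combinations(range(p), k)]
        r2, cols = max(scored)
        best[k] = (r2, [names[j] for j in cols])
    return best

# Synthetic data: only x0 and x1 truly drive the response.
rng = np.random.default_rng(2)
X = rng.normal(size=(42, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 0.5, 42)
names = ["x0", "x1", "x2", "x3"]

best = best_subsets(X, y, names)
```

R Square can only rise as k grows, so the choice of model size is made on adjusted R Square or standard error, as in the table above where size 6 has the highest adjusted R Square.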

Conclusions

• These factors can be used to estimate MSRP within about $4000

• You should expect to pay more for higher HP and for higher fuel efficiency (despite the fact that they are negatively correlated with each other)

• FWD is the least expensive drive train

• Audi (and maybe Lexus) is overpriced, based on these features

• Ford is a relative bargain



Comparison of Three Models

Regression Statistics    Enter "Best"    Remove "Best"  Best Subsets #6
Multiple R                     0.9736           0.9742           0.9742
R Square                       0.9479           0.9490           0.9490
Adjusted R Square              0.9390           0.9402           0.9402
Standard Error                   3983             3941             3941

Intercept                       -9528           -19837           -19837
HP                              135.9            147.2            147.2
FWD                             -8040
AWD                              3864            10595            10595
Audi                            10615             9533             9533
Lexus                            3662
MPG City                        634.6            668.9            668.9
RWD                                               9473             9473
Ford                                             -3362            -3362
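To see how such a model turns into a price estimate, the Best Subsets #6 coefficients from the table can be applied to a car's features. The two cars below are hypothetical, and the function name is made up for illustration; any make other than Audi or Ford, and FWD as the drive type, fall into the baseline.

```python
# Coefficients copied from the Best Subsets #6 column.
COEF = {
    "Intercept": -19837.0,
    "HP": 147.2,
    "MPG City": 668.9,
    "Audi": 9533.0,
    "RWD": 9473.0,
    "AWD": 10595.0,
    "Ford": -3362.0,
}

def predict_msrp(hp, mpg_city, make, drive):
    """Estimated MSRP in dollars; dummy variables are 1 when they apply."""
    estimate = COEF["Intercept"] + COEF["HP"] * hp + COEF["MPG City"] * mpg_city
    estimate += COEF.get(make, 0.0) if make in ("Audi", "Ford") else 0.0
    estimate += COEF.get(drive, 0.0) if drive in ("RWD", "AWD") else 0.0
    return estimate

# Two hypothetical cars with identical horsepower and fuel economy.
ford_fwd = predict_msrp(200, 25, "Ford", "FWD")
audi_awd = predict_msrp(200, 25, "Audi", "AWD")
gap = audi_awd - ford_fwd  # price difference attributed to make and drive type
```

With a standard error of 3941, either estimate should be read as accurate only to within roughly $4000.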


Summary

Two Multivariate Methods
• Cluster Analysis
– Excel
– Minitab
• Discriminant Analysis
– Excel
– Minitab

Steam case
Cars
