Multiple Regression in Matrix Form Using EXCEL
Energy Consumption of Luxury Hotels in Hainan Province, China
Source: Y. Xin, S. Lu, N. Zhu, W. Wu (2012). "Energy Consumption Quota of Four and Five Star Luxury Hotels Buildings in Hainan Province, China," Energy and Buildings, Vol. 45, pp. 250-256.




Data

• n = 19 Luxury Hotels
• Dependent Variable: Y = Energy Consumption (millions of kWh)
• Predictors:
  X1 = Area (1000s of square meters)
  X2 = Age (Years)
  X3 = Effective # of Guest Rooms (#Rooms*Occupancy Rate)

hotel  enrgcons  area     age  effrooms
1       1.9539    43.000   6   136.92
2       1.0456    19.979  16   135.45
3       4.2453    46.529   7   177.59
4       2.1262    20.962   6   156.51
5       2.7860    24.212   5   330.38
6      13.8340   112.200   4   385.39
7       5.5581    45.000   3   159.25
8       4.0012    28.548   6   104.02
9       4.6698    32.865   8   179.50
10      8.9240    59.406   5   290.03
11      6.8655    45.000  10   391.37
12      6.0146    37.435  13   287.17
13      8.1857    50.828   4   170.03
14     11.7361    68.000  13   292.11
15     14.8374    78.868   8   324.54
16      5.3665    28.454  13   167.62
17     13.5162    70.000   4   338.53
18      3.8844    20.000   5   130.02
19     10.5734    50.000  12   195.57


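Regression Model (n = #obs, p = #predictors)

Scalar Form:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \varepsilon_i, \quad i = 1, \ldots, n, \quad \varepsilon_i \sim NID(0, \sigma^2)$$

Matrix Form:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$$

where:

$$\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad \underset{n \times (p+1)}{\mathbf{X}} = \begin{bmatrix} 1 & X_{11} & \cdots & X_{1p} \\ 1 & X_{21} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{bmatrix}, \quad \underset{(p+1) \times 1}{\boldsymbol{\beta}} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \quad \underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

Ordinary Least Squares Estimator of $\boldsymbol{\beta}$ (Note: there are two common notations for the OLS estimator: $\hat{\boldsymbol{\beta}}$ and $\mathbf{b}$):

$$\hat{\boldsymbol{\beta}} = \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}, \quad E\{\hat{\boldsymbol{\beta}}\} = \boldsymbol{\beta}, \quad \sigma^2\{\hat{\boldsymbol{\beta}}\} = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}$$

Projection (Hat) Matrix (Note: there are two common notations: $\mathbf{P}$ and $\mathbf{H}$):

$$\underset{n \times n}{\mathbf{P}} = \mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$

Fitted Values, Residuals, and Mean Vector:

$$\underset{n \times 1}{\hat{\mathbf{Y}}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{P}\mathbf{Y}, \quad \underset{n \times 1}{\mathbf{e}} = \mathbf{Y} - \hat{\mathbf{Y}} = (\mathbf{I} - \mathbf{P})\mathbf{Y}, \quad \underset{n \times 1}{\bar{\mathbf{Y}}} = \frac{1}{n}\mathbf{J}\mathbf{Y}$$

$$\underset{n \times n}{\mathbf{I}} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}, \quad \underset{n \times n}{\mathbf{J}} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}$$

Analysis of Variance:

Total (Corrected) Sum of Squares:

$$SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \mathbf{Y}'\left(\mathbf{I} - \frac{1}{n}\mathbf{J}\right)\mathbf{Y}, \quad df_{TO} = n - 1$$

Residual (Error) Sum of Squares:

$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \mathbf{Y}'(\mathbf{I} - \mathbf{P})\mathbf{Y}, \quad df_E = n - (p + 1)$$

Regression Sum of Squares:

$$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = \mathbf{Y}'\left(\mathbf{P} - \frac{1}{n}\mathbf{J}\right)\mathbf{Y}, \quad df_R = p$$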


X Matrix and Y Vector

X                              Y
1   43.000   6  136.92     1.9539
1   19.979  16  135.45     1.0456
1   46.529   7  177.59     4.2453
1   20.962   6  156.51     2.1262
1   24.212   5  330.38     2.7860
1  112.200   4  385.39    13.8340
1   45.000   3  159.25     5.5581
1   28.548   6  104.02     4.0012
1   32.865   8  179.50     4.6698
1   59.406   5  290.03     8.9240
1   45.000  10  391.37     6.8655
1   37.435  13  287.17     6.0146
1   50.828   4  170.03     8.1857
1   68.000  13  292.11    11.7361
1   78.868   8  324.54    14.8374
1   28.454  13  167.62     5.3665
1   70.000   4  338.53    13.5162
1   20.000   5  130.02     3.8844
1   50.000  12  195.57    10.5734

Note: In my current EXCEL Worksheet:

• X is in cells B31:E49
• Y is in cells G31:G49

The matrix operations are applied only to cells with numeric values, not cells with the matrix names/labels.


Matrix Operations/Rules

• When computing a new matrix:
  Highlight a range of cells that will contain the matrix (you must know its dimension). The box above the leftmost viewable column shows the size of your highlighted area.
  Type the relevant matrix commands.
  Press the keys: "Ctrl"-"Shift"-"Enter"

• Useful Matrix Commands (mmult can only do 2-at-a-time):
  =mmult(RangeA,RangeB) computes AB
  =mmult(transpose(RangeA),RangeA) computes A'A
  =minverse(RangeA) computes A^-1 for square (full rank) A
  =mdeterm(RangeA) computes det(A) for square A
  =mmult(RangeA,mmult(RangeB,RangeC)) computes ABC


Computing X’X, X’Y, (X’X)^-1, b

• X is in Cells B31:E49, Y is in Cells G31:G49
• X’X and (X’X)^-1 are (p+1)x(p+1) ≡ 4x4 in this example
• X’Y and b are (p+1)x1 ≡ 4x1 in this example
• Computing X’X: Highlight a 4x4 range (say J31:M34) and Type:
  =mmult(transpose(B31:E49),B31:E49) Ctrl-Shift-Enter
• Computing X’Y: Highlight a 4x1 range (say O31:O34) and Type:
  =mmult(transpose(B31:E49),G31:G49) Ctrl-Shift-Enter
• Computing (X’X)^-1: Highlight a 4x4 range (say J38:M41) and Type:
  =minverse(J31:M34) Ctrl-Shift-Enter
• Computing b: Highlight a 4x1 range (say O38:O41) and Type:
  =mmult(J38:M41,O31:O34) Ctrl-Shift-Enter
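As a cross-check outside EXCEL, the same four steps can be sketched in plain Python. This is only an illustration, not part of the worksheet: the helper names (transpose, matmul, inverse) are made up for this sketch, the Gauss-Jordan routine stands in for MINVERSE, and the data are the 19 hotels as printed on the Data slide.

```python
# Hotel data as printed on the Data slide: (energy, area, age, effective rooms).
DATA = [
    (1.9539, 43.000, 6, 136.92), (1.0456, 19.979, 16, 135.45),
    (4.2453, 46.529, 7, 177.59), (2.1262, 20.962, 6, 156.51),
    (2.7860, 24.212, 5, 330.38), (13.8340, 112.200, 4, 385.39),
    (5.5581, 45.000, 3, 159.25), (4.0012, 28.548, 6, 104.02),
    (4.6698, 32.865, 8, 179.50), (8.9240, 59.406, 5, 290.03),
    (6.8655, 45.000, 10, 391.37), (6.0146, 37.435, 13, 287.17),
    (8.1857, 50.828, 4, 170.03), (11.7361, 68.000, 13, 292.11),
    (14.8374, 78.868, 8, 324.54), (5.3665, 28.454, 13, 167.62),
    (13.5162, 70.000, 4, 338.53), (3.8844, 20.000, 5, 130.02),
    (10.5734, 50.000, 12, 195.57),
]
X = [[1.0, area, age, rooms] for _, area, age, rooms in DATA]  # n x (p+1)
Y = [[energy] for energy, _, _, _ in DATA]                     # n x 1


def transpose(A):
    """Return A' as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Return the matrix product AB (two matrices at a time, like MMULT)."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(A):
    """Invert a square full-rank matrix by Gauss-Jordan with row pivoting."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


XtX = matmul(transpose(X), X)   # =mmult(transpose(B31:E49),B31:E49)
XtY = matmul(transpose(X), Y)   # =mmult(transpose(B31:E49),G31:G49)
XtX_inv = inverse(XtX)          # =minverse(J31:M34)
b = matmul(XtX_inv, XtY)        # =mmult(J38:M41,O31:O34)

for name, row in zip(("intercept", "area", "age", "effrooms"), b):
    print(f"{name:>9}: {row[0]: .6f}")
```

If the data were transcribed correctly, the printed coefficients should agree with the b column of the worksheet up to rounding.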


Hotel Example

X'X                                               X'Y
19        881.286   148        4352               130.124
881.286   50940.4   6470.05    227950.988         7638.204
148       6470.05   1424       33603.29           983.8417
4352      227951    33603.29   1156375.75         34573.19

INV(X'X)                                          b
 0.701445  -0.00363    -0.035     -0.0009071      -2.25277
-0.00363    0.000187    0.000239  -3.01E-05        0.148709
-0.035      0.000239    0.004001  -3.164E-05       0.113045
-0.00091   -3E-05      -3.2E-05    1.1132E-05      0.005777

For the truly hard-core types, b can be computed in one step (after highlighting a 4x1 range):

=MMULT(MINVERSE(MMULT(TRANSPOSE(B31:E49),B31:E49)),MMULT(TRANSPOSE(B31:E49),G31:G49))

However, you still need (X’X)^-1 in its own range to obtain the Variance-Covariance Matrix and Standard Errors for b.


Computing the Projection (Hat) Matrix

• P = X(X’X)^-1 X’ and is nxn.
  X is in B31:E49, (X’X)^-1 is in J38:M41
  Highlight (say) Cells S31:AK49 and Type:
  =MMULT(B31:E49,MMULT(J38:M41,TRANSPOSE(B31:E49)))

P (19 columns, followed by the row Sum)
 0.1358  0.0471  0.1015  0.0810 -0.0591  0.0490  0.1350  0.1395  0.0714  0.0362 -0.1017 -0.0394  0.1306  0.0087  0.0244  0.0488  0.0173  0.1074  0.0666   1.0000
 0.0471  0.3463  0.0554  0.0405 -0.0777 -0.1485 -0.0531  0.0659  0.0860 -0.0626  0.0358  0.1783 -0.0299  0.1711  0.0055  0.2402 -0.1183  0.0246  0.1933   1.0000
 0.1015  0.0554  0.0824  0.0601 -0.0299  0.0633  0.0964  0.0996  0.0609  0.0412 -0.0422  0.0003  0.0971  0.0400  0.0447  0.0539  0.0306  0.0749  0.0698   1.0000
 0.0810  0.0405  0.0601  0.1472  0.1509 -0.1006  0.1082  0.1238  0.0901  0.0451  0.0349  0.0232  0.0799 -0.0682 -0.0455  0.0484  0.0250  0.1607 -0.0047   1.0000
-0.0591 -0.0777 -0.0299  0.1509  0.4726 -0.0956  0.0262 -0.0075  0.0614  0.1200  0.3258  0.1300 -0.0189 -0.1009 -0.0269 -0.0247  0.1478  0.1283 -0.1217   1.0000
 0.0490 -0.1485  0.0633 -0.1006 -0.0956  0.4901  0.0728 -0.0436 -0.0326  0.1445  0.0141 -0.0370  0.1061  0.1666  0.2572 -0.0826  0.2138 -0.0995  0.0624   1.0000
 0.1350 -0.0531  0.0964  0.1082  0.0262  0.0728  0.1752  0.1441  0.0685  0.0743 -0.0813 -0.0776  0.1557 -0.0592  0.0159 -0.0141  0.0707  0.1402  0.0019   1.0000
 0.1395  0.0659  0.0996  0.1238 -0.0075 -0.0436  0.1441  0.1657  0.0906  0.0248 -0.0935 -0.0327  0.1281 -0.0365 -0.0250  0.0627 -0.0064  0.1524  0.0479   1.0000
 0.0714  0.0860  0.0609  0.0901  0.0614 -0.0326  0.0685  0.0906  0.0732  0.0329  0.0276  0.0484  0.0603  0.0176  0.0061  0.0753  0.0160  0.0958  0.0507   1.0000
 0.0362 -0.0626  0.0412  0.0451  0.1200  0.1445  0.0743  0.0248  0.0329  0.1024  0.0915  0.0231  0.0680  0.0228  0.0841 -0.0212  0.1299  0.0452 -0.0022   1.0000
-0.1017  0.0358 -0.0422  0.0349  0.3258  0.0141 -0.0813 -0.0935  0.0276  0.0915  0.3552  0.2149 -0.0857  0.0826  0.0732  0.0437  0.1255 -0.0111 -0.0094   1.0000
-0.0394  0.1783  0.0003  0.0232  0.1300 -0.0370 -0.0776 -0.0327  0.0484  0.0231  0.2149  0.2037 -0.0653  0.1409  0.0573  0.1334  0.0196 -0.0114  0.0906   1.0000
 0.1306 -0.0299  0.0971  0.0799 -0.0189  0.1061  0.1557  0.1281  0.0603  0.0680 -0.0857 -0.0653  0.1461 -0.0166  0.0413 -0.0004  0.0657  0.1083  0.0297   1.0000
 0.0087  0.1711  0.0400 -0.0682 -0.1009  0.1666 -0.0592 -0.0365  0.0176  0.0228  0.0826  0.1409 -0.0166  0.2438  0.1568  0.1241  0.0291 -0.0924  0.1698   1.0000
 0.0244  0.0055  0.0447 -0.0455 -0.0269  0.2572  0.0159 -0.0250  0.0061  0.0841  0.0732  0.0573  0.0413  0.1568  0.1667  0.0187  0.1167 -0.0563  0.0852   1.0000
 0.0488  0.2402  0.0539  0.0484 -0.0247 -0.0826 -0.0141  0.0627  0.0753 -0.0212  0.0437  0.1334 -0.0004  0.1241  0.0187  0.1726 -0.0573  0.0383  0.1400   1.0000
 0.0173 -0.1183  0.0306  0.0250  0.1478  0.2138  0.0707 -0.0064  0.0160  0.1299  0.1255  0.0196  0.0657  0.0291  0.1167 -0.0573  0.1755  0.0211 -0.0223   1.0000
 0.1074  0.0246  0.0749  0.1607  0.1283 -0.0995  0.1402  0.1524  0.0958  0.0452 -0.0111 -0.0114  0.1083 -0.0924 -0.0563  0.0383  0.0211  0.1834 -0.0098   1.0000
 0.0666  0.1933  0.0698 -0.0047 -0.1217  0.0624  0.0019  0.0479  0.0507 -0.0022 -0.0094  0.0906  0.0297  0.1698  0.0852  0.1400 -0.0223 -0.0098  0.1620   1.0000

Sum
 1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000

The Sums on the margins just show that the rows and columns sum to 1, and are not part of the P matrix
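The margin sums can also be checked numerically. The sketch below (plain Python, with hypothetical helper names and a Gauss-Jordan inverse standing in for MINVERSE) rebuilds P from the data on the Data slide and confirms that rows and columns sum to 1. It also checks two standard properties of a projection matrix that the slide does not state explicitly: P is symmetric, and its trace equals p+1, the number of columns of X.

```python
# Hotel data as printed on the Data slide: (energy, area, age, effective rooms).
DATA = [
    (1.9539, 43.000, 6, 136.92), (1.0456, 19.979, 16, 135.45),
    (4.2453, 46.529, 7, 177.59), (2.1262, 20.962, 6, 156.51),
    (2.7860, 24.212, 5, 330.38), (13.8340, 112.200, 4, 385.39),
    (5.5581, 45.000, 3, 159.25), (4.0012, 28.548, 6, 104.02),
    (4.6698, 32.865, 8, 179.50), (8.9240, 59.406, 5, 290.03),
    (6.8655, 45.000, 10, 391.37), (6.0146, 37.435, 13, 287.17),
    (8.1857, 50.828, 4, 170.03), (11.7361, 68.000, 13, 292.11),
    (14.8374, 78.868, 8, 324.54), (5.3665, 28.454, 13, 167.62),
    (13.5162, 70.000, 4, 338.53), (3.8844, 20.000, 5, 130.02),
    (10.5734, 50.000, 12, 195.57),
]
X = [[1.0, area, age, rooms] for _, area, age, rooms in DATA]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(A):
    """Gauss-Jordan inverse with row pivoting (square, full-rank A)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


# P = X (X'X)^-1 X', built two matrices at a time as with MMULT.
XtX_inv = inverse(matmul(transpose(X), X))
P = matmul(X, matmul(XtX_inv, transpose(X)))

row_sums = [sum(row) for row in P]
col_sums = [sum(col) for col in zip(*P)]
trace = sum(P[i][i] for i in range(len(P)))
max_asym = max(abs(P[i][j] - P[j][i]) for i in range(len(P)) for j in range(len(P)))

print("row sums within 1e-6 of 1:", all(abs(s - 1) < 1e-6 for s in row_sums))
print("col sums within 1e-6 of 1:", all(abs(s - 1) < 1e-6 for s in col_sums))
print(f"trace(P) = {trace:.6f}")
print(f"largest |P_ij - P_ji| = {max_asym:.2e}")
```

The rows sum to 1 because X contains a column of 1s, so P reproduces that column exactly.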


Computing the Analysis of Variance

• Y is in Cells G31:G49, X’Y is in O31:O34, b is in O38:O41
• P is in S31:AK49
• Total (Corrected) Sum of Squares: dfTotal = n-1
  Y’(I - (1/n)J)Y = Y’Y - Y’(1/n)JY
  Y’Y: 1) =mmult(transpose(G31:G49),G31:G49) 2) =sumsq(G31:G49)
  Y’(1/n)JY: =(sum(G31:G49))^2/count(G31:G49)
• Residual (Error) Sum of Squares: dfError = n-(p+1)
  Y’(I - P)Y = Y’Y - Y’PY = Y’Y - b’X’Y (the b’X’Y form is much easier for large n, since it avoids the nxn matrix P)
  Y’PY: =mmult(transpose(G31:G49),mmult(S31:AK49,G31:G49))
  b’X’Y: 1) =mmult(transpose(O38:O41),O31:O34) 2) =sumproduct(O31:O34,O38:O41)
• Regression Sum of Squares: dfReg = p
  Y’(P - (1/n)J)Y = Y’PY - Y’(1/n)JY = b’X’Y - Y’(1/n)JY
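The scalar shortcuts above (Y’Y, Y’(1/n)JY, and b’X’Y) are enough to fill in the ANOVA table without ever forming P. The following plain-Python sketch illustrates this under the same assumptions as the worksheet; the helper names are hypothetical, and the F critical value and P-value are left out because the Python standard library has no F distribution.

```python
# Hotel data as printed on the Data slide: (energy, area, age, effective rooms).
DATA = [
    (1.9539, 43.000, 6, 136.92), (1.0456, 19.979, 16, 135.45),
    (4.2453, 46.529, 7, 177.59), (2.1262, 20.962, 6, 156.51),
    (2.7860, 24.212, 5, 330.38), (13.8340, 112.200, 4, 385.39),
    (5.5581, 45.000, 3, 159.25), (4.0012, 28.548, 6, 104.02),
    (4.6698, 32.865, 8, 179.50), (8.9240, 59.406, 5, 290.03),
    (6.8655, 45.000, 10, 391.37), (6.0146, 37.435, 13, 287.17),
    (8.1857, 50.828, 4, 170.03), (11.7361, 68.000, 13, 292.11),
    (14.8374, 78.868, 8, 324.54), (5.3665, 28.454, 13, 167.62),
    (13.5162, 70.000, 4, 338.53), (3.8844, 20.000, 5, 130.02),
    (10.5734, 50.000, 12, 195.57),
]
X = [[1.0, area, age, rooms] for _, area, age, rooms in DATA]
Y = [[energy] for energy, _, _, _ in DATA]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(A):
    """Gauss-Jordan inverse with row pivoting (square, full-rank A)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


n, p = len(X), len(X[0]) - 1
XtY = matmul(transpose(X), Y)
b = matmul(inverse(matmul(transpose(X), X)), XtY)

y = [row[0] for row in Y]
YtY = sum(v * v for v in y)                            # =sumsq(G31:G49)
correction = sum(y) ** 2 / n                           # =(sum(G31:G49))^2/count(G31:G49)
bXtY = sum(bj[0] * cj[0] for bj, cj in zip(b, XtY))    # =sumproduct(O31:O34,O38:O41)

SSTO, df_total = YtY - correction, n - 1
SSE, df_error = YtY - bXtY, n - (p + 1)
SSR, df_reg = bXtY - correction, p
MSR, MSE = SSR / df_reg, SSE / df_error
F_obs = MSR / MSE

print(f"Regression  df={df_reg:2d}  SS={SSR:8.2f}  MS={MSR:7.2f}  F={F_obs:.2f}")
print(f"Residual    df={df_error:2d}  SS={SSE:8.2f}  MS={MSE:7.2f}")
print(f"Total       df={df_total:2d}  SS={SSTO:8.2f}")
```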


Hotel Example: ANOVA, F & t-tests

Y'Y (1)   Y'Y (2)   Y'(1/n)JY   Y'PY      b'X'Y (1)   b'X'Y (2)
1221.52   1221.52   891.17      1153.67   1153.67     1153.67

SSTotal   dfTotal   SSResid   dfResid   SSReg    dfReg
330.35    18        67.85     15        262.50   3

ANOVA
Source       df   SS       MS      F_obs   Fcrit   P-value
Regression    3   262.50   87.50   19.35   3.29    0.0000
Residual     15    67.85    4.52
Total        18   330.35

s2{b} = s^2*INV(X'X)                                  b         s{b}     t_obs
 3.17268107  -0.01642   -0.15832   -0.0041028        -2.2528    1.7812   -1.2647
-0.0164204    0.000845   0.001081  -0.0001362         0.1487    0.0291    5.1162
-0.158321     0.001081   0.018097  -0.0001431         0.1130    0.1345    0.8403
-0.0041028   -0.00014   -0.00014    5.0351E-05        0.0058    0.0071    0.8141

Note: s^2 = MSResid = 4.52. In the companion EXCEL spreadsheet, these results are in cells J45:Q60.
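The standard errors and t-statistics follow directly from s^2 and the diagonal of (X’X)^-1. A minimal plain-Python sketch of that step is given below; the helper and variable names are made up for this illustration, and P-values for the t-tests are omitted because the standard library has no t distribution.

```python
# Hotel data as printed on the Data slide: (energy, area, age, effective rooms).
DATA = [
    (1.9539, 43.000, 6, 136.92), (1.0456, 19.979, 16, 135.45),
    (4.2453, 46.529, 7, 177.59), (2.1262, 20.962, 6, 156.51),
    (2.7860, 24.212, 5, 330.38), (13.8340, 112.200, 4, 385.39),
    (5.5581, 45.000, 3, 159.25), (4.0012, 28.548, 6, 104.02),
    (4.6698, 32.865, 8, 179.50), (8.9240, 59.406, 5, 290.03),
    (6.8655, 45.000, 10, 391.37), (6.0146, 37.435, 13, 287.17),
    (8.1857, 50.828, 4, 170.03), (11.7361, 68.000, 13, 292.11),
    (14.8374, 78.868, 8, 324.54), (5.3665, 28.454, 13, 167.62),
    (13.5162, 70.000, 4, 338.53), (3.8844, 20.000, 5, 130.02),
    (10.5734, 50.000, 12, 195.57),
]
X = [[1.0, area, age, rooms] for _, area, age, rooms in DATA]
Y = [[energy] for energy, _, _, _ in DATA]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(A):
    """Gauss-Jordan inverse with row pivoting (square, full-rank A)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


n, p = len(X), len(X[0]) - 1
XtX_inv = inverse(matmul(transpose(X), X))
XtY = matmul(transpose(X), Y)
b = matmul(XtX_inv, XtY)

# s^2 = MSResid = (Y'Y - b'X'Y) / (n - (p+1))
YtY = sum(row[0] ** 2 for row in Y)
bXtY = sum(bj[0] * cj[0] for bj, cj in zip(b, XtY))
s2 = (YtY - bXtY) / (n - (p + 1))

# s2{b} = s^2 * INV(X'X); standard errors are square roots of its diagonal.
cov_b = [[s2 * v for v in row] for row in XtX_inv]
se_b = [cov_b[j][j] ** 0.5 for j in range(p + 1)]
t_obs = [b[j][0] / se_b[j] for j in range(p + 1)]

for name, bj, se, t in zip(("intercept", "area", "age", "effrooms"), b, se_b, t_obs):
    print(f"{name:>9}: b={bj[0]: .4f}  s{{b}}={se:.4f}  t_obs={t: .4f}")
```

Each t_obs would be compared with a t critical value on n-(p+1) = 15 degrees of freedom.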


Obtaining Fitted Values, Residuals and Diagonal Elements of P

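$$\mathbf{P} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \begin{bmatrix} \mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_n' \end{bmatrix} (\mathbf{X}'\mathbf{X})^{-1} \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{bmatrix}, \quad \mathbf{x}_i' = \begin{bmatrix} 1 & X_{i1} & \cdots & X_{ip} \end{bmatrix}$$

$$P_{ii} = \mathbf{x}_i'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_i, \quad P_{ij} = \mathbf{x}_i'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_j$$

$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{P}\mathbf{Y}, \quad \sigma^2\{\hat{\mathbf{Y}}\} = \mathbf{P}\,\sigma^2\mathbf{I}\,\mathbf{P}' = \sigma^2\mathbf{P}, \quad s^2\{\hat{Y}_i\} = MSE \cdot P_{ii}$$

$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = (\mathbf{I} - \mathbf{P})\mathbf{Y}, \quad \sigma^2\{\mathbf{e}\} = (\mathbf{I} - \mathbf{P})\,\sigma^2\mathbf{I}\,(\mathbf{I} - \mathbf{P})' = \sigma^2(\mathbf{I} - \mathbf{P}), \quad s^2\{e_i\} = MSE\,(1 - P_{ii})$$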

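(X’X)^-1 is in Cells J38:M41, b is in O38:O41, "New" X is in B71:E89, "New" Y is in F71:F89

Computing Y-hat: Highlight cells G71:G89 and Type: =mmult(B71:E89,O38:O41) Ctrl-Shift-Enter

Computing e: In Cell I71 Type: =F71-G71 Then double-click the fill handle (the small box in the southeast corner of the cell) to copy the formula down the column

Computing Pii: In Cell J71 Type: =MMULT(B71:E71,MMULT($J$38:$M$41,TRANSPOSE(B71:E71))) Ctrl-Shift-Enter, then double-click the fill handle in the southeast corner of J71. The $ signs keep the (X’X)^-1 range fixed while the row of X changes.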


Hotel Example

X                           Y         Y-hat     e         P_diag   s{Y-hat}  s{e}     rstudent
1   43.000   6  136.92     1.9539    5.6110   -3.6570   0.1358   0.7836    1.9771   -1.8497
1   19.979  16  135.45     1.0456    3.3095   -2.2639   0.3463   1.2516    1.7195   -1.3167
1   46.529   7  177.59     4.2453    6.4837   -2.2384   0.0824   0.6106    2.0372   -1.0988
1   20.962   6  156.51     2.1262    2.4469   -0.3207   0.1472   0.8161    1.9639   -0.1633
1   24.212   5  330.38     2.7860    3.8215   -1.0356   0.4726   1.4620    1.5445   -0.6705
1  112.200   4  385.39    13.8340   17.1109   -3.2769   0.4901   1.4889    1.5186   -2.1578
1   45.000   3  159.25     5.5581    5.6982   -0.1401   0.1752   0.8902    1.9315   -0.0725
1   28.548   6  104.02     4.0012    3.2718    0.7295   0.1657   0.8658    1.9425    0.3755
1   32.865   8  179.50     4.6698    4.5759    0.0939   0.0732   0.5756    2.0474    0.0459
1   59.406   5  290.03     8.9240    8.8221    0.1019   0.1024   0.6807    2.0149    0.0506
1   45.000  10  391.37     6.8655    7.8305   -0.9649   0.3552   1.2675    1.7078   -0.5650
1   37.435  13  287.17     6.0146    6.4427   -0.4281   0.2037   0.9598    1.8978   -0.2256
1   50.828   4  170.03     8.1857    6.7402    1.4455   0.1461   0.8130    1.9652    0.7356
1   68.000  13  292.11    11.7361   11.0165    0.7196   0.2438   1.0501    1.8494    0.3891
1   78.868   8  324.54    14.8374   12.2548    2.5826   0.1667   0.8682    1.9414    1.3303
1   28.454  13  167.62     5.3665    4.4165    0.9500   0.1726   0.8836    1.9345    0.4911
1   70.000   4  338.53    13.5162   10.5647    2.9515   0.1755   0.8910    1.9311    1.5284
1   20.000   5  130.02     3.8844    2.0377    1.8467   0.1834   0.9107    1.9219    0.9609
1   50.000  12  195.57    10.5734    7.6690    2.9044   0.1620   0.8559    1.9469    1.4918

rstudent ≡ Studentized Residual = e/s{e}
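The columns of this table can be reproduced row by row, mirroring the fill-down approach used in the worksheet. The plain-Python sketch below is an illustration only; the helper and variable names are hypothetical, and rstudent is computed exactly as defined on this slide (e/s{e}, with s{e} based on the overall MSE).

```python
# Hotel data as printed on the Data slide: (energy, area, age, effective rooms).
DATA = [
    (1.9539, 43.000, 6, 136.92), (1.0456, 19.979, 16, 135.45),
    (4.2453, 46.529, 7, 177.59), (2.1262, 20.962, 6, 156.51),
    (2.7860, 24.212, 5, 330.38), (13.8340, 112.200, 4, 385.39),
    (5.5581, 45.000, 3, 159.25), (4.0012, 28.548, 6, 104.02),
    (4.6698, 32.865, 8, 179.50), (8.9240, 59.406, 5, 290.03),
    (6.8655, 45.000, 10, 391.37), (6.0146, 37.435, 13, 287.17),
    (8.1857, 50.828, 4, 170.03), (11.7361, 68.000, 13, 292.11),
    (14.8374, 78.868, 8, 324.54), (5.3665, 28.454, 13, 167.62),
    (13.5162, 70.000, 4, 338.53), (3.8844, 20.000, 5, 130.02),
    (10.5734, 50.000, 12, 195.57),
]
X = [[1.0, area, age, rooms] for _, area, age, rooms in DATA]
Y = [[energy] for energy, _, _, _ in DATA]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(A):
    """Gauss-Jordan inverse with row pivoting (square, full-rank A)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


n, p = len(X), len(X[0]) - 1
XtX_inv = inverse(matmul(transpose(X), X))
b = matmul(XtX_inv, matmul(transpose(X), Y))

y_hat = [row[0] for row in matmul(X, b)]                 # Y-hat = Xb
e = [yi[0] - fi for yi, fi in zip(Y, y_hat)]             # e = Y - Y-hat
MSE = sum(v * v for v in e) / (n - (p + 1))              # SSE / dfError

# P_ii = x_i'(X'X)^-1 x_i, one row of X at a time (no nxn matrix needed).
p_diag = [matmul([x], matmul(XtX_inv, transpose([x])))[0][0] for x in X]
s_yhat = [(MSE * h) ** 0.5 for h in p_diag]              # s{Y-hat_i}
s_e = [(MSE * (1 - h)) ** 0.5 for h in p_diag]           # s{e_i}
rstudent = [ei / si for ei, si in zip(e, s_e)]

for i in range(n):
    print(f"{i + 1:2d}  {y_hat[i]:8.4f} {e[i]:8.4f} {p_diag[i]:7.4f} "
          f"{s_yhat[i]:7.4f} {s_e[i]:7.4f} {rstudent[i]:8.4f}")
```

Computing P_ii one row at a time avoids building the full 19x19 P matrix, which matters more as n grows.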


[Figure: Residuals vs Fitted Values. Scatter plot of the residuals e (vertical axis, -4 to 4) against the Fitted Values (horizontal axis, 0 to 18).]