Modeling television viewership
The Nielsen ratings are the best–known measures of viewership of television shows. These
ratings form the basis for the setting of advertising rates, and are thus crucial for the success
(and survival) of shows. The ratings, which are based on diaries kept by a sample of “Nielsen
families,” are intended to measure in–home viewing. This can result in biased figures for
shows that are often watched in public places such as bars or restaurants (sports shows, for
example), or for shows that appeal to people who watch television in groups, such as those
aimed at college–aged viewers.
Two ratings values are typically examined: the rating, which is (an estimate of) the
percentage of televisions tuned to a particular show at a particular time out of all televisions,
and the share, which is the percentage of televisions tuned to a particular show at a particular
time out of all televisions that are being used at that time. The latter variable corrects for
the fact that more or less television in general is watched on certain nights of the week.
These numbers can then be converted into estimates of the total number of viewers of the
program (and thereby total number of potential customers for the advertisers!).
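To make the distinction between the two measures concrete, here is a minimal sketch of both calculations. The counts are invented for illustration and are not Nielsen figures; the function and variable names are my own.

```python
# Hypothetical counts for one time slot (illustrative only, not Nielsen data).
total_televisions = 1000    # all televisions in the sample
televisions_in_use = 600    # televisions switched on at that time
tuned_to_show = 90          # televisions tuned to the show of interest


def rating(tuned: int, total: int) -> float:
    """Percentage of all televisions tuned to the show."""
    return 100.0 * tuned / total


def share(tuned: int, in_use: int) -> float:
    """Percentage of televisions in use that are tuned to the show."""
    return 100.0 * tuned / in_use


show_rating = rating(tuned_to_show, total_televisions)
show_share = share(tuned_to_show, televisions_in_use)
print(f"rating = {show_rating:.1f}, share = {show_share:.1f}")
```

Because the televisions in use are a subset of all televisions, the share can never be smaller than the rating.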
The data analyzed here are the estimated household ratings for each of the 157 television
shows for new episodes during the 2011-2012 season (that is, not including repeat episodes
shown in the regular time slot). I am indebted to Karl Rosen for sharing these data with
me. For each show the network (ABC, CBS, CW, Fox, or NBC) and type (comedy, drama,
news, reality/participation, or animation) are recorded. Since these data are actually a
listing of all shows for 2011-2012, they are not a sample from that season, but rather should
be viewed as a “snapshot” sample from a (hopefully) stable ongoing process. That is, a
significant difference between two networks, for example, would hopefully say something
about the 2012-2013 season and beyond. Note that NBC’s “Sunday Night Football” and
ABC’s “Saturday Movie of the Week” are not included, since they were the only shows of
their type broadcast in prime time by the networks during the 2011-2012 season.
Side–by–side boxplots show that there are definitely network and type effects. The networks
fall into three groups: CBS, the other major networks (NBC, ABC, and Fox), and the
“netlet” CW, which seriously lags behind. Animation and news shows are generally lower
rated, while comedies, dramas, and reality shows are similar. There are noticeably different
amounts of variability in household rating across the different networks and the different
types of shows.
©2017, Jeffrey S. Simonoff
Our first attempt to fit a two–way ANOVA model to these data ends in failure: Minitab
refuses to fit the model with the interaction, giving the message
General Linear Model: HH Rating versus Network, Type
The following terms cannot be estimated and were removed:
Network*Type
and then fits the model with only main effects. A table that cross–classifies the shows by
network and type reveals the problem: there are combinations that never occur (seven of
them), making it impossible to fit a model with an interaction effect.
Rows: Network   Columns: Full type

                      Evening
      Comedy  Drama  Animation  News  Reality/Participatio  All
ABC       11     14          0     4                    11   40
CBS       10     14          0     2                     5   31
CW         0     10          0     0                     4   14
FOX        4     10          8     0                    10   32
NBC       11     10          0     3                    16   40
All       36     58          8     9                    46  157

Cell Contents: Count
There are three things we might do here. One would be to fit a model with only main
effects (what Minitab did automatically). I could do that, but I am interested in whether
certain networks are better performers for certain types of shows, and that defines the
interaction effect. A second possibility is to figure out a way to fit an interaction effect even
when there are “holes” in the data. That can be done, in fact, but I’ll postpone that to an
appendix. A third possibility is to change our data a bit so that the holes aren’t there any
more. I’ll do that here, by noting three things. First, evening animation is a big problem,
since those shows only occurred on Fox. All of those shows are comedies, however, so I will
just reclassify them as comedies. Second, the CW network is a problem, since it has no
comedies or news shows. Third, news shows are a problem, since neither CW nor Fox has
any of them. I will address these last two problems by removing CW shows and news shows for now. This still leaves
85.4% (134) of the shows in the sample.
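The bookkeeping behind these choices can be checked with a short sketch. The counts are copied from the cross-classification table; the dictionary layout, the shortened label "Reality", and the helper names are my own illustrative choices.

```python
# Counts of shows by network and type, copied from the cross-classification table.
counts = {
    "ABC": {"Comedy": 11, "Drama": 14, "Animation": 0, "News": 4, "Reality": 11},
    "CBS": {"Comedy": 10, "Drama": 14, "Animation": 0, "News": 2, "Reality": 5},
    "CW":  {"Comedy": 0,  "Drama": 10, "Animation": 0, "News": 0, "Reality": 4},
    "FOX": {"Comedy": 4,  "Drama": 10, "Animation": 8, "News": 0, "Reality": 10},
    "NBC": {"Comedy": 11, "Drama": 10, "Animation": 0, "News": 3, "Reality": 16},
}


def empty_cells(table):
    """Return the (network, type) combinations with no shows."""
    return [(net, typ) for net, row in table.items() for typ, n in row.items() if n == 0]


def total(table):
    """Total number of shows in the table."""
    return sum(n for row in table.values() for n in row.values())


# Reclassify animation as comedy, then drop the CW network and news shows.
reduced = {}
for net, row in counts.items():
    if net == "CW":
        continue
    reduced[net] = {
        "Comedy": row["Comedy"] + row["Animation"],
        "Drama": row["Drama"],
        "Reality": row["Reality"],
    }

print(len(empty_cells(counts)), "empty cells in the full table")
print(len(empty_cells(reduced)), "empty cells after restructuring")
print(total(reduced), "shows kept,", f"{100 * total(reduced) / total(counts):.1f}% of the total")
```

After the restructuring every network/type combination is populated, which is what allows the interaction to be estimated.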
Here is a two–way ANOVA for the 134 comedies, dramas, and reality shows on ABC,
CBS, Fox, and NBC:
General Linear Model: HH Rating versus Network, Type
Method
Factor coding (-1, 0, +1)
Factor Information
Factor Type Levels Values
Network Fixed 4 ABC, CBS, FOX, NBC
Type Fixed 3 Comedy, Drama, Reality/Participatio
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Network 3 159.06 53.020 17.25 0.000
Type 2 38.69 19.343 6.29 0.003
Network*Type 6 23.00 3.834 1.25 0.287
Error 122 375.03 3.074
Total 133 644.18
Model Summary
S R-sq R-sq(adj) R-sq(pred)
1.75330 41.78% 36.53% 29.71%
Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 4.582 0.157 29.10 0.000
Network
ABC 0.286 0.261 1.10 0.275 1.62
CBS 1.783 0.297 6.00 0.000 1.88
FOX -0.713 0.271 -2.64 0.009 1.64
Type
Comedy -0.777 0.219 -3.54 0.001 1.35
Drama 0.282 0.216 1.31 0.194 1.36
Network*Type
ABC Comedy 0.348 0.371 0.94 0.350 2.19
ABC Drama -0.170 0.356 -0.48 0.633 2.08
CBS Comedy 0.100 0.404 0.25 0.805 2.19
CBS Drama 0.798 0.383 2.08 0.039 2.07
FOX Comedy -0.303 0.373 -0.81 0.418 2.20
FOX Drama -0.427 0.383 -1.12 0.266 2.18
Regression Equation
HH Rating = 4.582 + 0.286 Network_ABC + 1.783 Network_CBS - 0.713 Network_FOX
            - 1.356 Network_NBC - 0.777 Type_Comedy + 0.282 Type_Drama
            + 0.494 Type_Reality/Participatio
            + 0.348 Network*Type_ABC Comedy - 0.170 Network*Type_ABC Drama
            - 0.178 Network*Type_ABC Reality/Participatio
            + 0.100 Network*Type_CBS Comedy + 0.798 Network*Type_CBS Drama
            - 0.898 Network*Type_CBS Reality/Participatio
            - 0.303 Network*Type_FOX Comedy - 0.427 Network*Type_FOX Drama
            + 0.731 Network*Type_FOX Reality/Participatio
            - 0.145 Network*Type_NBC Comedy - 0.201 Network*Type_NBC Drama
            + 0.346 Network*Type_NBC Reality/Participatio
The interaction effect is not close to statistically significant. The two main effects are
statistically significant, but we should remember that in an unbalanced design situation
like this, it can happen that the presence of an insignificant interaction effect can make
main effects look significant when they wouldn’t be once the interaction is removed from the
model.
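The entries in the ANOVA table above are tied together by simple arithmetic, which the following sketch reproduces from the printed sums of squares and degrees of freedom. The p-values are not recomputed here, since that would need the F distribution; the variable names are my own.

```python
# Adjusted sums of squares and degrees of freedom from the ANOVA table above.
anova = {
    "Network":      (3, 159.06),
    "Type":         (2, 38.69),
    "Network*Type": (6, 23.00),
}
error_df, error_ss = 122, 375.03
total_df, total_ss = 133, 644.18

mse = error_ss / error_df  # error mean square

f_values = {}
for term, (df, ss) in anova.items():
    f_values[term] = (ss / df) / mse  # term mean square over error mean square

r_sq = 1 - error_ss / total_ss
r_sq_adj = 1 - mse / (total_ss / total_df)

for term, f in f_values.items():
    print(f"{term:<13} F = {f:.2f}")
print(f"R-sq = {100 * r_sq:.2f}%, R-sq(adj) = {100 * r_sq_adj:.2f}%")
```

Note that the adjusted sums of squares for the three terms plus the error do not add up to the total sum of squares. That is a consequence of the unbalanced design, and it is why the test for one effect can change when another effect is removed from the model.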
Unfortunately, a plot of residuals versus fitted values shows that we have long right-tailed
residuals and nonconstant variance, which suggests modeling viewers in the logged scale.
Here are side-by-side boxplots for logged viewership separated by network and type.
While the general patterns are similar to before, there is some evidence that the nonconstant
variance might be alleviated somewhat.
Here is an ANOVA with logged viewership as the response.
General Linear Model: Logged rating versus Network, Type
Method
Factor coding (-1, 0, +1)
Factor Information
Factor Type Levels Values
Network Fixed 4 ABC, CBS, FOX, NBC
Type Fixed 3 Comedy, Drama, Reality/Participatio
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Network 3 1.5459 0.51529 16.03 0.000
Type 2 0.4818 0.24092 7.50 0.001
Network*Type 6 0.1713 0.02855 0.89 0.506
Error 122 3.9211 0.03214
Total 133 6.3988
Model Summary
S R-sq R-sq(adj) R-sq(pred)
0.179277 38.72% 33.20% 26.03%
Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 0.6103 0.0161 37.91 0.000
Network
ABC 0.0536 0.0267 2.01 0.047 1.62
CBS 0.1587 0.0304 5.22 0.000 1.88
FOX -0.0744 0.0277 -2.69 0.008 1.64
Type
Comedy -0.0869 0.0224 -3.87 0.000 1.35
Drama 0.0390 0.0221 1.76 0.080 1.36
Network*Type
ABC Comedy 0.0577 0.0380 1.52 0.131 2.19
ABC Drama -0.0156 0.0364 -0.43 0.669 2.08
CBS Comedy -0.0114 0.0413 -0.28 0.784 2.19
CBS Drama 0.0540 0.0392 1.38 0.170 2.07
FOX Comedy -0.0128 0.0382 -0.34 0.737 2.20
FOX Drama -0.0258 0.0391 -0.66 0.511 2.18
Regression Equation
Logged rating = 0.6103 + 0.0536 Network_ABC + 0.1587 Network_CBS - 0.0744 Network_FOX
                - 0.1379 Network_NBC - 0.0869 Type_Comedy + 0.0390 Type_Drama
                + 0.0479 Type_Reality/Participatio
                + 0.0577 Network*Type_ABC Comedy - 0.0156 Network*Type_ABC Drama
                - 0.0421 Network*Type_ABC Reality/Participatio
                - 0.0114 Network*Type_CBS Comedy + 0.0540 Network*Type_CBS Drama
                - 0.0427 Network*Type_CBS Reality/Participatio
                - 0.0128 Network*Type_FOX Comedy - 0.0258 Network*Type_FOX Drama
                + 0.0386 Network*Type_FOX Reality/Participatio
                - 0.0336 Network*Type_NBC Comedy - 0.0126 Network*Type_NBC Drama
                + 0.0462 Network*Type_NBC Reality/Participatio
The interaction effect is still quite insignificant, but there is still a problem, in that there
are four clear outliers:
These are the four lowest-rated shows of the 2011-2012 season: two versions of “Comedy
Time Saturday” on CBS, “Q’Viva” on Fox, and “Escape Routes” on NBC. While these four
shows are lowest-rated, it might not be immediately apparent why they are so distinctly
outlying. The issue is that they are particularly low-rated for their own groups (CBS
comedy, Fox reality, and NBC reality); if they had all been NBC comedies (the lowest-rated
combination), for example, they might not have been outliers.
Did these shows have an important effect? Apparently so:
General Linear Model: Logged rating versus Network, Type
Method
Factor coding (-1, 0, +1)
Factor Information
Factor Type Levels Values
Network Fixed 4 ABC, CBS, FOX, NBC
Type Fixed 3 Comedy, Drama, Reality/Participatio
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Network 3 1.7157 0.57190 32.62 0.000
Type 2 0.3106 0.15528 8.86 0.000
Network*Type 6 0.3373 0.05621 3.21 0.006
Error 118 2.0686 0.01753
Total 129 4.8693
Model Summary
S R-sq R-sq(adj) R-sq(pred)
0.132404 57.52% 53.56% 48.34%
Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 0.6324 0.0121 52.29 0.000
Network
ABC 0.0315 0.0198 1.59 0.115 1.61
CBS 0.1885 0.0231 8.17 0.000 1.89
FOX -0.0753 0.0208 -3.62 0.000 1.65
Type
Comedy -0.0701 0.0170 -4.12 0.000 1.35
Drama 0.0168 0.0165 1.02 0.309 1.36
Network*Type
ABC Comedy 0.0409 0.0283 1.45 0.151 2.19
ABC Drama 0.0066 0.0270 0.24 0.808 2.05
CBS Comedy 0.0757 0.0323 2.35 0.021 2.29
CBS Drama 0.0242 0.0294 0.82 0.412 2.10
FOX Comedy -0.0509 0.0286 -1.78 0.078 2.18
FOX Drama -0.0249 0.0292 -0.85 0.395 2.12
Regression Equation
Logged rating = 0.6324 + 0.0315 Network_ABC + 0.1885 Network_CBS - 0.0753 Network_FOX
                - 0.1447 Network_NBC - 0.0701 Type_Comedy + 0.0168 Type_Drama
                + 0.0532 Type_Reality/Participatio
                + 0.0409 Network*Type_ABC Comedy + 0.0066 Network*Type_ABC Drama
                - 0.0475 Network*Type_ABC Reality/Participatio
                + 0.0757 Network*Type_CBS Comedy + 0.0242 Network*Type_CBS Drama
                - 0.0999 Network*Type_CBS Reality/Participatio
                - 0.0509 Network*Type_FOX Comedy - 0.0249 Network*Type_FOX Drama
                + 0.0758 Network*Type_FOX Reality/Participatio
                - 0.0657 Network*Type_NBC Comedy - 0.0059 Network*Type_NBC Drama
                + 0.0716 Network*Type_NBC Reality/Participatio
Means
Fitted
Term Mean SE Mean
Network
ABC 0.6639 0.0222
CBS 0.8209 0.0278
FOX 0.5571 0.0239
NBC 0.4877 0.0224
Type
Comedy 0.5624 0.0207
Drama 0.6493 0.0194
Reality/Participatio 0.6857 0.0227
Network*Type
ABC Comedy 0.6348 0.0399
ABC Drama 0.6873 0.0354
ABC Reality/Participatio 0.6697 0.0399
CBS Comedy 0.8266 0.0468
CBS Drama 0.8620 0.0354
CBS Reality/Participatio 0.7742 0.0592
FOX Comedy 0.4362 0.0382
FOX Drama 0.5491 0.0419
FOX Reality/Participatio 0.6861 0.0441
NBC Comedy 0.3519 0.0399
NBC Drama 0.4987 0.0419
NBC Reality/Participatio 0.6126 0.0342
The interaction effect is now statistically significant, so apparently the relative
performance of comedies, dramas, and reality shows differs from network to network. Note that in
a model that includes the interaction effect the fitted (and predicted) values correspond to
the means for each network / type combination, which is the average response value for each
combination. Let’s look at residual plots to see if the assumptions of the regression seem
reasonable now.
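The link between the printed coefficients and the fitted cell means can be made explicit with a short sketch. It relies on the fact that under (-1, 0, +1) factor coding the effects for each factor sum to zero, so the level that is not printed in the coefficient table is minus the sum of the printed ones. The dictionary layout and the shortened label "Reality" are my own choices.

```python
# Coefficients printed for the fit without the four outliers (effect coding).
constant = 0.6324
network = {"ABC": 0.0315, "CBS": 0.1885, "FOX": -0.0753}
show_type = {"Comedy": -0.0701, "Drama": 0.0168}
interaction = {
    ("ABC", "Comedy"): 0.0409, ("ABC", "Drama"): 0.0066,
    ("CBS", "Comedy"): 0.0757, ("CBS", "Drama"): 0.0242,
    ("FOX", "Comedy"): -0.0509, ("FOX", "Drama"): -0.0249,
}

# The effects for each factor sum to zero, so the unprinted level is minus
# the sum of the printed ones.
network["NBC"] = -sum(network.values())
show_type["Reality"] = -sum(show_type.values())
for net in ("ABC", "CBS", "FOX"):
    interaction[(net, "Reality")] = -(interaction[(net, "Comedy")] + interaction[(net, "Drama")])
for typ in ("Comedy", "Drama", "Reality"):
    interaction[("NBC", typ)] = -sum(interaction[(net, typ)] for net in ("ABC", "CBS", "FOX"))

# Fitted value for a cell: constant + network effect + type effect + interaction effect.
fitted = {
    (net, typ): constant + network[net] + show_type[typ] + interaction[(net, typ)]
    for net in network for typ in show_type
}
for (net, typ), mean in sorted(fitted.items()):
    print(f"{net} {typ:<8} {mean:.4f}")
```

The twelve values agree with the Network*Type entries in the Means table up to rounding in the last printed digit of the coefficients.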
The residual plots look better than they have before. There is still a bit of a right tail, and
some evidence of nonconstant variance. We can look at Levene’s test to see if nonconstant
variance is indicated by it.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Network 3 5.303 1.7675 5.41 0.002
Type 2 2.633 1.3166 4.03 0.020
Network*Type 6 5.409 0.9015 2.76 0.015
Error 118 38.547 0.3267
Total 129 52.014
Model Summary
S R-sq R-sq(adj) R-sq(pred)
0.571549 25.89% 18.98% 10.47%
The test indicates nonconstant variance related to both network and type (and both
together). Note, by the way, that if the interaction term in this test had been insignificant,
we would then rerun the Levene’s test with only main effects, since the presence of the
interaction could obscure the potential importance of main effects in accounting for nonconstant
variance.
How would we handle this heteroscedasticity? We would use weighted least squares, of
course. I will stick with logged rating as the response variable because of the long right tails
of the residuals in the original analysis, although another approach would be to do WLS for
rating in the original scale. The appendix discusses a second way of getting the weights,
but I will use here the same method we used for one-way ANOVA. First, we determine the
weights based on the standard deviations of the residuals from the model.
Test for Equal Variances: SRES3 versus Network, Type
95% Bonferroni Confidence Intervals for Standard Deviations
Network Type N StDev CI
ABC Comedy 11 0.85533 (0.39726, 2.49026)
ABC Drama 14 0.76142 (0.47642, 1.53005)
ABC Reality/Participatio 11 1.53967 (0.54362, 5.89672)
CBS Comedy 8 0.70824 (0.30548, 2.55829)
CBS Drama 14 0.74397 (0.36886, 1.88672)
CBS Reality/Participatio 5 0.29992 (0.06609, 3.18758)
FOX Comedy 12 0.74125 (0.42488, 1.69886)
FOX Drama 10 1.21589 (0.50983, 4.06430)
FOX Reality/Participatio 9 1.88663 (1.00239, 5.20931)
NBC Comedy 11 0.81160 (0.39577, 2.25056)
NBC Drama 10 1.13147 (0.68311, 2.62673)
NBC Reality/Participatio 15 1.01923 (0.55605, 2.30935)
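As a minimal sketch of how the StDev column becomes a weight variable, each show can be given a weight of one over the squared standard deviation for its network/type cell. The standard deviations are copied from the table above; the dictionary layout, the shortened label "Reality", and the helper `weight_for` are my own illustrative choices rather than anything Minitab produces.

```python
# Standard deviations of the standardized residuals by cell, from the table above.
cell_sd = {
    ("ABC", "Comedy"): 0.85533, ("ABC", "Drama"): 0.76142, ("ABC", "Reality"): 1.53967,
    ("CBS", "Comedy"): 0.70824, ("CBS", "Drama"): 0.74397, ("CBS", "Reality"): 0.29992,
    ("FOX", "Comedy"): 0.74125, ("FOX", "Drama"): 1.21589, ("FOX", "Reality"): 1.88663,
    ("NBC", "Comedy"): 0.81160, ("NBC", "Drama"): 1.13147, ("NBC", "Reality"): 1.01923,
}

# Each weight is one over the squared standard deviation for that cell.
weights = {cell: 1.0 / sd ** 2 for cell, sd in cell_sd.items()}


def weight_for(network: str, show_type: str) -> float:
    """Look up the WLS weight for a show from its network and type."""
    return weights[(network, show_type)]


# List the cells from the smallest weight (most variable) to the largest.
for cell, w in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{cell[0]} {cell[1]:<8} weight = {w:.3f}")
```

Fox reality shows receive the smallest weight and CBS reality shows the largest, reflecting the most and least variable cells respectively.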
The weights are the inverse of the squared entries given under StDev. We fit a WLS
model based on all of the observations (including the OLS outliers), since an observation
might not be an outlier any more relative to a higher estimated standard deviation. This is
in fact the case here, since there are now only 3 outliers apparent from the WLS fit of the
two-way ANOVA:
The show “Q’Viva” is no longer an outlier, because logged ratings for Fox reality shows
have larger-than-average variability (the two versions of “Comedy Time Saturday” and
“Escape Routes” are still outliers). It turns out, however, that once the other three shows are
omitted “Q’Viva” shows up as a little unusual, so we’ll go back to omitting all four of them:
General Linear Model: Logged rating versus Network, Type
Method
Factor coding (-1, 0, +1)
Weights wt
Factor Information
Factor Type Levels Values
Network Fixed 4 ABC, CBS, FOX, NBC
Type Fixed 3 Comedy, Drama, Reality/Participatio
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Network 3 3.0036 1.00120 62.86 0.000
Type 2 0.3543 0.17716 11.12 0.000
Network*Type 6 0.5890 0.09817 6.16 0.000
Error 118 1.8793 0.01593
Total 129 7.0912
Model Summary
S R-sq R-sq(adj) R-sq(pred)
0.126201 73.50% 71.03% 67.86%
Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 0.6324 0.0119 52.95 0.000
Network
ABC 0.0315 0.0207 1.52 0.131 2.23
CBS 0.1885 0.0158 11.94 0.000 1.87
FOX -0.0753 0.0258 -2.92 0.004 2.91
Type
Comedy -0.0701 0.0149 -4.72 0.000 2.03
Drama 0.0168 0.0162 1.04 0.299 2.30
Network*Type
ABC Comedy 0.0409 0.0261 1.56 0.120 2.17
ABC Drama 0.0066 0.0256 0.26 0.799 1.97
CBS Comedy 0.0757 0.0222 3.41 0.001 2.92
CBS Drama 0.0242 0.0217 1.11 0.267 2.98
FOX Comedy -0.0509 0.0294 -1.73 0.086 2.93
FOX Drama -0.0249 0.0343 -0.73 0.469 2.29
Means
Fitted
Term Mean SE Mean
Network
ABC 0.6639 0.0239
CBS 0.8209 0.0146
FOX 0.5571 0.0323
NBC 0.4877 0.0213
Type
Comedy 0.5624 0.0153
Drama 0.6493 0.0188
Reality/Participatio 0.6857 0.0264
Network*Type
ABC Comedy 0.6348 0.0325
ABC Drama 0.6873 0.0257
ABC Reality/Participatio 0.6697 0.0586
CBS Comedy 0.8266 0.0316
CBS Drama 0.8620 0.0251
CBS Reality/Participatio 0.7742 0.0169
FOX Comedy 0.4362 0.0270
FOX Drama 0.5491 0.0485
FOX Reality/Participatio 0.6861 0.0794
NBC Comedy 0.3519 0.0309
NBC Drama 0.4987 0.0452
NBC Reality/Participatio 0.6126 0.0332
The interaction effect is highly statistically significant. The following interaction plot
summarizes the effect:
We see that Fox and NBC are very similar to each other, with reality shows having the
highest ratings, dramas lower, and comedies lowest; Fox is a bit higher than NBC, which
has the lowest ratings in all three categories. CBS and ABC have higher ratings for dramas,
and lower ratings for comedies and reality shows (although the differences between types
are smaller, especially for ABC), with CBS having the highest ratings for all three types
of shows and ABC somewhat lower. Another way of looking at this is that the networks
generally rank CBS / ABC / Fox / NBC, with the differences between the ratings of the
networks being largest for comedies, smaller for dramas, and smallest for reality shows.
Residual plots and diagnostics look fine (remember that the guideline for leverage values
is (2.5)(12)/130 = .231, since there are 2 + 3 + 6 = 11 predictor variables in the regression
that corresponds to the two-way ANOVA fit).
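The leverage guideline, and the leverage (HI4) and Cook's distance (COOK4) columns in the diagnostics listing for this fit, can be reproduced with a short sketch. It uses two standard facts: when a separate mean is fit to every cell and the weights are constant within a cell, an observation's leverage is one over its cell size, and Cook's distance can be written in terms of the standardized residual and the leverage. The cell sizes are taken from the N column of the equal-variances table; the names and the shortened label "Reality" are my own.

```python
# Cell sizes for the 130 shows in the final fit (N column of the equal-variances table).
cell_n = {
    ("ABC", "Comedy"): 11, ("ABC", "Drama"): 14, ("ABC", "Reality"): 11,
    ("CBS", "Comedy"): 8,  ("CBS", "Drama"): 14, ("CBS", "Reality"): 5,
    ("FOX", "Comedy"): 12, ("FOX", "Drama"): 10, ("FOX", "Reality"): 9,
    ("NBC", "Comedy"): 11, ("NBC", "Drama"): 10, ("NBC", "Reality"): 15,
}
n = sum(cell_n.values())        # number of observations
p = 3 + 2 + 6                   # predictors: network, type, interaction
guideline = 2.5 * (p + 1) / n   # leverage guideline used in the handout

# With a separate mean for every cell (and equal weights within a cell),
# each observation's leverage is one over its cell size.
leverage = {cell: 1.0 / size for cell, size in cell_n.items()}


def cooks_distance(std_resid: float, h: float, n_params: int) -> float:
    """Cook's distance from a standardized residual and a leverage value."""
    return std_resid ** 2 * h / ((1.0 - h) * n_params)


ncis_cook = cooks_distance(2.34425, leverage[("CBS", "Drama")], p + 1)
print(f"guideline = {guideline:.3f}, largest leverage = {max(leverage.values()):.3f}")
print(f"Cook's D for NCIS = {ncis_cook:.7f}")
```

Since the largest possible leverage here belongs to the smallest cell (CBS reality shows, with five shows), no observation can exceed the guideline in this fit.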
Row Show SRES4 HI4 COOK4
1 NCIS 2.34425 0.071429 0.0352276
2 DANCING WITH THE STARS 2.04441 0.090909 0.0348301
3 AMERICAN IDOL-WEDNESDAY 1.50543 0.111111 0.0236074
4 DANCING W/STARS RESULTS 1.79929 0.090909 0.0269787
5 AMERICAN IDOL-THURSDAY 1.35514 0.111111 0.0191292
6 NCIS: LOS ANGELES 1.34941 0.071429 0.0116724
7 BIG BANG THEORY, THE 1.49214 0.125000 0.0265057
8 TWO AND A HALF MEN 1.39241 0.125000 0.0230811
9 MENTALIST, THE 0.60198 0.071429 0.0023229
10 PERSON OF INTEREST 0.56694 0.071429 0.0020604
11 CRIMINAL MINDS 0.40602 0.071429 0.0010568
12 VOICE 2.26734 0.066667 0.0306002
13 UNFORGETTABLE 0.01472 0.071429 0.0000014
14 CSI -0.05816 0.071429 0.0000217
15 BLUE BLOODS -0.07153 0.071429 0.0000328
16 MODERN FAMILY 2.13887 0.090909 0.0381229
17 MIKE & MOLLY 0.25842 0.125000 0.0007950
18 CASTLE 1.72396 0.071429 0.0190516
19 2 BROKE GIRLS 0.01713 0.125000 0.0000035
20 HAWAII FIVE-0 -0.37553 0.071429 0.0009040
21 X-FACTOR-THU 0.61759 0.111111 0.0039731
22 SURVIVOR: SOUTH PACIFIC 1.45478 0.200000 0.0440912
23 GOOD WIFE, THE -0.43293 0.071429 0.0012015
24 X-FACTOR-WED 0.59428 0.111111 0.0036789
25 ROB -0.09996 0.125000 0.0001190
26 GREY’S ANATOMY 1.40656 0.071429 0.0126822
27 CSI: MIAMI -0.57951 0.071429 0.0021528
28 CSI: NY -0.63942 0.071429 0.0026209
29 AMAZING RACE 19 0.37004 0.200000 0.0028527
30 AMERICA’S GOT TALENT-TUE 1.37307 0.066667 0.0112222
31 SURVIVOR: ONE WORLD 0.22247 0.200000 0.0010311
32 AMERICA’S GOT TALENT-MON 1.27379 0.066667 0.0096581
33 VOICE:RESULTS SHOW 1.23807 0.066667 0.0091239
34 HOW I MET YOUR MOTHER -0.74643 0.125000 0.0066328
35 BODY OF PROOF 0.81381 0.071429 0.0042454
36 ONCE UPON A TIME 0.78129 0.071429 0.0039130
37 RULES OF ENGAGEMENT -0.81845 0.125000 0.0079745
38 DESPERATE HOUSEWIVES 0.65732 0.071429 0.0027697
39 GIFTED MAN, A -1.26643 0.071429 0.0102810
40 UNDERCOVER BOSS -0.97691 0.200000 0.0198823
41 AMAZING RACE 20 -1.07038 0.200000 0.0238689
42 BACHELOR, THE 0.33848 0.090909 0.0009548
43 LAST MAN STANDING 0.88545 0.090909 0.0065335
44 BACHELORETTE, THE 0.29467 0.090909 0.0007236
45 REVENGE 0.33672 0.071429 0.0007268
46 MIDDLE, THE 0.74841 0.090909 0.0046676
47 HOW TO BE A GENTLEMAN -1.49526 0.125000 0.0266167
48 HARRY’S LAW 1.49744 0.100000 0.0207624
49 MISSING 0.09756 0.071429 0.0000610
50 NYC 22 -1.85980 0.071429 0.0221721
51 BONES 0.97554 0.100000 0.0088118
52 SCANDAL 0.01185 0.071429 0.0000009
53 PRIVATE PRACTICE -0.06568 0.071429 0.0000277
54 LAST MAN STANDING-8:30PM 0.41611 0.090909 0.0014429
55 TOUCH 0.86411 0.100000 0.0069138
56 SUBURGATORY 0.36258 0.090909 0.0010955
57 LAW AND ORDER:SVU 1.25936 0.100000 0.0146852
58 DUETS -0.08371 0.090909 0.0000584
59 GLEE 0.72202 0.100000 0.0048270
60 TERRA NOVA 0.71540 0.100000 0.0047388
61 SMASH 1.09745 0.100000 0.0111518
62 HOUSE 0.64156 0.100000 0.0038112
63 ALCATRAZ 0.50957 0.100000 0.0024042
64 CHARLIE’S ANGELS -0.79356 0.071429 0.0040368
65 MAN UP! -0.23437 0.090909 0.0004577
66 GCB -0.85097 0.071429 0.0046420
67 ROOKIE BLUE -0.86254 0.071429 0.0047691
68 BIGGEST LOSER 13 -0.05840 0.066667 0.0000203
69 NEW GIRL 1.76605 0.083333 0.0236283
70 APPRENTICE 12 -0.14622 0.066667 0.0001273
71 BIGGEST LOSER 12 -0.14622 0.066667 0.0001273
72 HAPPY ENDINGS -0.46827 0.090909 0.0018273
73 AMER FUNN HOME VIDEOS -0.56782 0.090909 0.0026868
74 SO YOU THINK CN DANCE -0.48914 0.111111 0.0024922
75 DON’T TRUST THE B-APT 23 -0.64691 0.090909 0.0034874
76 FEAR FACTOR -0.40453 0.066667 0.0009741
77 WHO’S STILL STANDING -0.41412 0.066667 0.0010208
78 PAN AM -1.37604 0.071429 0.0121378
79 OFF THEIR ROCKRS -0.42374 0.066667 0.0010688
80 CELEBRITY WIFE SWAP -0.61200 0.090909 0.0031212
81 SHARK TANK -0.63162 0.090909 0.0033245
82 WHO DO YOU THINK YOU ARE -0.48199 0.066667 0.0013828
83 EXTREME MAKEOVER:HOME ED. -0.65803 0.090909 0.0036084
84 HELL’S KITCHEN-MON -0.62188 0.111111 0.0040285
85 EXTREME MAKEOVER:HM ED-9P -0.67135 0.090909 0.0037559
86 FAMILY GUY 1.16242 0.083333 0.0102366
87 WORK IT -0.92990 0.090909 0.0072060
88 SIMPSONS 1.12032 0.083333 0.0095085
89 WIPEOUT-THURS -0.72540 0.090909 0.0043850
90 HELL’S KITCHEN-MON 9P -0.68331 0.111111 0.0048637
91 FINDER -0.14729 0.100000 0.0002009
92 PARENTHOOD 0.17515 0.100000 0.0002841
93 AMERICAN NINJA WARRIOR -0.73572 0.066667 0.0032219
94 MOBBED -0.76442 0.111111 0.0060868
95 RIVER, THE -1.88026 0.071429 0.0226628
96 OFF THEIR ROCKRS 830 -0.79946 0.066667 0.0038043
97 OFFICE 1.63772 0.090909 0.0223510
98 PRIME SUSPECT 0.06746 0.100000 0.0000421
99 GRIMM -0.02343 0.100000 0.0000051
100 FASHION STAR -1.16054 0.066667 0.0080170
101 YOU DESERVE IT -1.09474 0.090909 0.0099872
102 COUGAR TOWN -1.70413 0.090909 0.0242006
103 WHITNEY 1.06923 0.090909 0.0095271
104 I HATE MY TEENAGE DGHTR 0.20797 0.083333 0.0003277
105 PLAYBOY CLUB -0.32387 0.100000 0.0009712
106 RAISING HOPE 0.15666 0.083333 0.0001859
107 NAPOLEON DYNAMITE 0.13943 0.083333 0.0001473
108 AMERICAN DAD 0.06991 0.083333 0.0000370
109 SING OFF -1.38134 0.066667 0.0113578
110 CLEVELAND-SUN 8:30P 0.03477 0.083333 0.0000092
111 UP ALL NIGHT 0.79671 0.090909 0.0052896
112 COPS 2 -0.97911 0.100000 0.0088764
113 ARE YOU THERE CHELSEA 0.45313 0.090909 0.0017111
114 ALLEN GREGORY -0.46634 0.083333 0.0016475
115 FREE AGENTS 0.13865 0.090909 0.0001602
116 PARKS AND RECREATION 0.11944 0.090909 0.0001189
117 AWAKE -1.08172 0.100000 0.0108345
118 30 ROCK -0.03730 0.090909 0.0000116
119 KITCHEN NIGHTMARES -1.51369 0.111111 0.0238672
120 COMMUNITY -0.13814 0.090909 0.0001590
121 BOB’S BURGERS -1.11383 0.083333 0.0093986
122 COPS -1.46053 0.100000 0.0197512
123 CHUCK -1.28768 0.100000 0.0153530
124 CLEVELAND -1.31923 0.083333 0.0131846
125 FIRM -1.38017 0.100000 0.0176376
126 BEST FRIENDS FOREVER -0.65684 0.090909 0.0035953
127 FRINGE -1.84127 0.100000 0.0313916
128 BREAKING IN -1.75813 0.083333 0.0234169
129 BENT 9P -1.43112 0.090909 0.0170674
130 BENT 930 -1.95150 0.090909 0.0317363
Appendix: Fitting a two–way ANOVA model to data where some combinations are missing
How could we have fit a two–way ANOVA model including an interaction effect to the
full data set? The key is to fit the interaction manually using indicator or effect coding
variables, and determine the appropriate partial F–test by hand. So, for example, in this
example, four variables are created to represent the Network main effect, three are created
to represent the Type main effect (keeping the animation shows as comedies), and then 12
pairwise products are created to represent the interaction (although as we will see not all of
those are used).
In fact, Minitab gives us the partial F-test that we need, although it obscures this fact
somewhat. Here is the fit of the two-way ANOVA based on only the main effects; we are
using all of the shows other than the four identified as outliers in the earlier analysis, and
are now including the CW as a network and news shows as a type:
General Linear Model: Logged rating versus Network, Type
Method
Factor coding (-1, 0, +1)
Factor Information
Factor Type Levels Values
Network Fixed 5 ABC, CBS, CW, FOX, NBC
Type Fixed 4 Comedy, Drama, News, Reality/Participatio
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Network 4 8.0360 2.00900 100.19 0.000
Type 3 0.5076 0.16919 8.44 0.000
Error 145 2.9075 0.02005
Lack-of-Fit 9 0.4335 0.04816 2.65 0.007
Pure Error 136 2.4740 0.01819
Total 152 11.0179
Model Summary
S R-sq R-sq(adj) R-sq(pred)
0.141604 73.61% 72.34% 70.54%
It turns out that what Minitab is reporting as the “Lack-of-Fit” test is, in fact, the
test for the two-way interaction between Network and Type. As you can see, it is strongly
statistically significant, with F = 2.65 and p = .007.
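The same statistic can be obtained by hand as a partial F-test, comparing the error sum of squares of the main-effects model (the Error line) with that of the model including the interaction (the Pure Error line). This is a minimal sketch of that arithmetic; the function name is my own, and the p-value is not recomputed.

```python
def partial_f(sse_reduced: float, df_reduced: int, sse_full: float, df_full: int) -> float:
    """Partial F statistic comparing a reduced model to a fuller one."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Error line of the main-effects fit versus the Pure Error line (interaction fit).
f_interaction = partial_f(sse_reduced=2.9075, df_reduced=145, sse_full=2.4740, df_full=136)
print(f"F = {f_interaction:.2f} on {145 - 136} and {136} degrees of freedom")
```

The numerator has 9 degrees of freedom rather than 12 because three of the network/type combinations are empty.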
Since the effect is statistically significant we clearly need to fit the model, so we can check
assumptions (construct residual plots and diagnostics, perform a Levene’s test, and so on);
indeed, even if the interaction is not statistically significant, we would still want to do this to
make sure that violations of assumptions haven’t resulted in the test mistakenly indicating
that the interaction is not needed. To create the indicator variables we need, click on Calc
→ Make Indicator Variables. Enter the first categorical variable (say Network) under
Indicator variables for:. The program will automatically provide names for the indicator
variables that will be formed, but you can change those if you want. Note that an indicator
variable for each of the categories will be formed, but one should be ignored. Do the same for
the second categorical variable (once again ignoring one of the variables formed). Finally, use
the calculator to construct the pairwise products of each indicator for rows by each indicator
for columns.
If you now try to fit the model using these variables using the regression program (not
the General Linear Model) treating the indicators as continuous predictors, it will work, but
you have to remember to not include any of the product variables that are all zeroes in your
regression call (there will be one of these for each of the empty cells). Here are results based
on logged rating (this output is from Minitab 16, which is why it looks a little different).
This is OLS output, but if nonconstant variance was indicated a WLS analysis would be
conducted by constructing a weight variable in the same way as was done earlier.
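The construction of the indicator and product variables, and the check for products that are all zeroes, can be sketched outside Minitab as follows. The cell counts are those of the 153 shows used in this appendix (animation counted as comedy, the four outliers removed); the choice of NBC and Reality as the ignored levels matches the regression output below, while the column names and the shortened label "Reality" are my own.

```python
from itertools import product

# Number of shows in each cell (animation counted as comedy, four outliers removed).
cell_counts = {
    ("ABC", "Comedy"): 11, ("ABC", "Drama"): 14, ("ABC", "News"): 4, ("ABC", "Reality"): 11,
    ("CBS", "Comedy"): 8,  ("CBS", "Drama"): 14, ("CBS", "News"): 2, ("CBS", "Reality"): 5,
    ("CW", "Comedy"): 0,   ("CW", "Drama"): 10,  ("CW", "News"): 0,  ("CW", "Reality"): 4,
    ("FOX", "Comedy"): 12, ("FOX", "Drama"): 10, ("FOX", "News"): 0, ("FOX", "Reality"): 9,
    ("NBC", "Comedy"): 11, ("NBC", "Drama"): 10, ("NBC", "News"): 3, ("NBC", "Reality"): 15,
}

# One (network, type) record per show.
shows = [cell for cell, n in cell_counts.items() for _ in range(n)]

# Indicator variables, ignoring one level of each factor (NBC and Reality here).
networks = ["ABC", "CBS", "CW", "FOX"]
types = ["Comedy", "Drama", "News"]

columns = {}
for net in networks:
    columns[f"Network_{net}"] = [int(show[0] == net) for show in shows]
for typ in types:
    columns[f"Type_{typ}"] = [int(show[1] == typ) for show in shows]

# Pairwise products of each network indicator with each type indicator.
for net, typ in product(networks, types):
    net_col = columns[f"Network_{net}"]
    typ_col = columns[f"Type_{typ}"]
    columns[f"{net}{typ}"] = [a * b for a, b in zip(net_col, typ_col)]

# Products that are zero for every show correspond to empty cells and are dropped.
dropped = [name for name, col in columns.items() if not any(col)]
predictors = [name for name in columns if name not in dropped]
print("dropped:", dropped)
print(len(predictors), "predictors remain")
```

The number of predictors that remain equals the regression degrees of freedom in the output that follows.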
Regression Analysis: Logged rating versus Network_ABC, Network_CBS, ...
The regression equation is
Logged rating = 0.613 + 0.0571 Network_ABC + 0.162 Network_CBS
- 0.716 Network_CW + 0.0736 Network_FOX - 0.261 Type_Comedy
- 0.114 Type_Drama - 0.116 Type_News + 0.226 ABCComedy
+ 0.131 ABCDrama - 0.057 ABCNews + 0.313 CBSComedy
+ 0.202 CBSDrama + 0.085 CBSNews + 0.203 CWDrama
+ 0.0107 FoxComedy - 0.0232 FoxDrama
Predictor Coef SE Coef T P
Constant 0.61256 0.03482 17.59 0.000
Network_ABC 0.05712 0.05354 1.07 0.288
Network_CBS 0.16166 0.06965 2.32 0.022
Network_CW -0.71650 0.07590 -9.44 0.000
Network_FOX 0.07358 0.05687 1.29 0.198
Type_Comedy -0.26062 0.05354 -4.87 0.000
Type_Drama -0.11385 0.05506 -2.07 0.041
Type_News -0.11615 0.08530 -1.36 0.176
ABCComedy 0.22571 0.07858 2.87 0.005
ABCDrama 0.13148 0.07736 1.70 0.092
ABCNews -0.0572 0.1161 -0.49 0.623
CBSComedy 0.31297 0.09370 3.34 0.001
CBSDrama 0.20161 0.08927 2.26 0.026
CBSNews 0.0851 0.1415 0.60 0.548
CWDrama 0.20277 0.09695 2.09 0.038
FoxComedy 0.01069 0.08002 0.13 0.894
FoxDrama -0.02323 0.08290 -0.28 0.780
S = 0.134876 R-Sq = 77.5% R-Sq(adj) = 74.9%
Analysis of Variance
Source DF SS MS F P
Regression 16 8.54383 0.53399 29.35 0.000
Residual Error 136 2.47404 0.01819
Total 152 11.01788
Here is an interaction plot:
In addition to the patterns we saw before, we see that CW has much lower ratings than
any of the other networks, and while news shows for NBC have ratings at about the middle
level for that network, they are lowest for CBS shows, and by far the lowest-rated shows for
ABC.
Minitab commands
Two–way analysis of variance is conducted by clicking on Stat → ANOVA → General
Linear Model → Fit General Linear Model. Enter the target variable under Responses:
and the two categorizing predicting variables under Factors:. To include the interaction
effect, click on Model. Highlight the two factor variables to the left, and click on Add.
This will add the variables “multiplied” by each other (i.e., ROW*COL) under Terms in the
model:. Residual plots and storage are obtained as stated earlier. To get effect estimates for
your model, click on Options and then All terms in the model in the drop-down menu
next to Means:. Note that the effects for main effects are not interpretable in the presence
of the interaction.
To construct an interaction plot, click on Stat → ANOVA → Interaction plot. Enter the
two predicting variables that define the interaction under Factors:, and enter the response
variable next to the box labeled Responses:.
Levene’s test is constructed in the usual way by fitting a two-way ANOVA with the
absolute standardized residuals as the response. Note that if n_ij = 1 in some cell(s) the
standardized residual produced by Minitab for that single observation will be set to the
missing value code * because technically the standardized residual is undefined (h_ii = 1
for the observation in a cell with n_ij = 1, so the standardized residual is 0/0). For such
an observation set the standardized residual equal to 0 and the weight equal to 1, since
the observation will be fit perfectly (resulting in a zero residual) no matter what weight is
used. Remember that if the interaction effect is not significant in the Levene’s test ANOVA
you should run it again with the interaction effect removed to see if it is related to one
or the other main effect; to do so highlight the product term in the box under Terms in
the model and click “X”. If weights for a weighted least squares fit depend on only one
of the effects, they can be determined using the method described for one-way ANOVA
models. If weights are needed based on two categorical variables (either if the interaction
effect in the Levene’s test is statistically significant or if it is not but both main effects in the
Levene’s test are), they can be estimated simultaneously. Click on Stat → ANOVA → Test
for Equal Variances. Enter the residuals from the OLS fit under Response:, and the two
variables under Factors:. The resultant output gives the standard deviations of the residuals
separated by the levels of the variables under “StDev” in the portion labeled Bonferroni
confidence intervals for standard deviations. The weights are one over the squared
standard deviations. Note that you should not use the tests provided in the output as
your test of constant variance in a two-way ANOVA, as they do not take into account the
potential structure in the nonconstant variance; construct Levene’s test as is described in
the handout. An alternative approach to get weights is to estimate the variances in the way
that is discussed for a numerical predictor in the Appendix of the CAPM handout. That is,
save the standardized residuals SRES from the original two–way ANOVA, perform a two-way
ANOVA with log(SRES ∗ SRES) as the target variable, saving the fitted values; and then set
the weights equal to WT = 1/exp(FITS).
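The alternative approach to the weights can be sketched as follows. The standardized residuals here are invented purely for illustration, and the sketch assumes that the two-way ANOVA for log(SRES * SRES) includes the interaction, so that the fitted value for each show is simply its cell's mean of the logged squared residuals; a main-effects-only variance model would give different fitted values.

```python
import math
from collections import defaultdict

# Hypothetical standardized residuals (SRES) tagged by network and type;
# the values are invented for illustration and are not from the handout's data.
records = [
    ("CBS", "Drama", 0.6), ("CBS", "Drama", -0.4), ("CBS", "Drama", 0.9),
    ("FOX", "Reality", 1.8), ("FOX", "Reality", -2.1), ("FOX", "Reality", 1.2),
    ("NBC", "Comedy", -0.7), ("NBC", "Comedy", 1.1),
]

# Response for the variance model: log of the squared standardized residual.
log_sq = defaultdict(list)
for network, show_type, sres in records:
    log_sq[(network, show_type)].append(math.log(sres * sres))

# Assumption: the variance model includes the interaction, so its fitted value
# (FITS) for every show in a cell is that cell's mean of log(SRES * SRES).
fits = {cell: sum(values) / len(values) for cell, values in log_sq.items()}

# WT = 1 / exp(FITS)
weights = {cell: 1.0 / math.exp(fit) for cell, fit in fits.items()}

for cell, wt in weights.items():
    print(cell, round(wt, 3))
```

A residual that is exactly zero cannot be logged, so such observations (including those in cells with a single show) would need the special handling described above before this calculation is applied.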
To construct a table tabulating counts of the observations separated by a cross-classification
of predictive variables, click Stat → Tables → Cross Tabulation and Chi-Square.
Enter the variables that define the effects in the ANOVA under Categorical variables (one
under For rows and the other under For columns). To get a table of means of the
response variable separated by the predicting variables, click Stat → Tables → Descriptive
Statistics. Enter the variables that define the effects in the ANOVA under Categorical
variables (one under For rows and the other under For columns), and click on Associated
Variables. Enter the target variable for the ANOVA under Associated variables: and
click in the box next to Means. For a model fit without the interaction, you might also want to obtain the
estimated target variable for each combination of the two predictors. This is not the response
cell mean, since the interaction effect hasn’t been fit. Using a calculator, calculate the overall
average of the fitted means given for one of the two effects (it doesn’t matter which one).
The estimated expected response for the (i, j)th combination is the ith row effect + the jth
column effect − the overall average.
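The calculation just described is sketched below. For illustration only, the inputs are the Network and Type fitted means printed under "Means" earlier in this handout; those came from a model that includes the interaction, so in practice the corresponding fitted means from the main-effects-only fit would be used instead. The function name is my own.

```python
from statistics import mean

# Fitted means for each level, as printed under "Means" (illustrative inputs;
# in practice these would come from the main-effects-only fit).
network_means = {"ABC": 0.6639, "CBS": 0.8209, "FOX": 0.5571, "NBC": 0.4877}
type_means = {"Comedy": 0.5624, "Drama": 0.6493, "Reality": 0.6857}

# The overall average can be taken over either set of fitted means.
overall = mean(network_means.values())


def additive_estimate(network: str, show_type: str) -> float:
    """Row effect + column effect - overall average."""
    return network_means[network] + type_means[show_type] - overall


print(f"overall average = {overall:.4f}")
print(f"CBS drama estimate = {additive_estimate('CBS', 'Drama'):.4f}")
```

Averaging the type means instead of the network means gives the same overall average up to rounding, which is why it does not matter which effect is used.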
If a two-way ANOVA model is fit without an interaction term, multiple comparisons for
either main effect (or both) can be obtained by highlighting and entering each term (or both)
under Choose terms for comparisons as is done for one-way ANOVA models. In a model
that includes an interaction, comparisons can be made between the different combinations
of row and column level by entering the interaction (ROW*COL for the data analyzed in this
handout) under “Terms:”.