7/24/2019 0000 Stats Practice

Yale SOM

MGT 403: Statistics

Practice Problem Set - P1-1

Introduction. You have been hired to study the evolution of executive compensation over time. Specifically, you will examine how CEOs' salaries vary between different sectors and how they are related to a company's sales in the early 1990s. You receive data on a random sample of CEOs, which is contained in ceosalary1.dta. Type describe to see the contents of this data set.

Question 1

(a) There are two hypotheses concerning CEO compensation in the early 1990s. One is that average CEO salaries were at most $1,000,000. Another concerns the default belief that average CEO salaries were actually $1,200,000. You want to test these two hypotheses. Note that the data to test them is contained in the variable salary (which measures CEO salary in $1000). Can you reject the null hypotheses, at the 5% level, implied by these tests? To answer this question, write down the following steps for each test:

1. The null hypothesis

2. The alternative hypothesis

3. The formula for the realization of the test statistic

4. The rejection region: for which values of the test statistic you reject the null hypothesis

Now use the data to carry out the two tests.

First, do it manually by typing summarize salary, or by using the User menu (Summarize and Describe Data, then Simple Summary Statistics (summarize)) to input this command. Use the result to calculate the realization (or outcome) of the test statistics.

Can you reject the null hypotheses? Why or why not?

Second, check your answer by conducting a test of means in Stata. You can use Simple Test of Association, then Test of Means, in the Stata User menu.

What are the p-values for each of these two tests? Based on the p-value, can you reject the null hypothesis at the 5% level for each test? Explain why.

(b) Compute the 95% confidence interval for the unknown population mean of CEO salaries, by

writing down the formula for the confidence interval

using the results from the command summarize salary to compute the confidence interval.

Does 1200 fall in the confidence interval?

Practice Problem Set - P1-2

The Internet portal Yahoo may allow its members to customize their start pages

(homepages). As part of a short survey regarding likes and dislikes, users were asked

about their interests in options such as QuickTime movie clips with daily news and

sports events on their pages. Yahoo hopes that QuickTime will entice users to follow

a larger number of hyperlinks so that it can attract more advertisers.

The newly customized page option was made available to 100 Internet users who were randomly sampled from the target population. The benchmark for Yahoo is 6 non-Yahoo content links clicked by all its customers on average prior to the availability of the QuickTime option (during any one-week period).

After one week of access to the customized homepage option, Yahoo observes

the (average) number of non-Yahoo links for each customer. For the sample of 100

customers, the average is 7.8 links and the standard deviation is 9.5 links.

1. Test the two-tailed null hypothesis that the customization with QuickTime does not alter the true average (benchmark) number of links at the 5% level

(critical value is 1.96). Specify null and alternative hypotheses, compute the

value for the test statistic and state whether you can reject or not the null

hypothesis and why.

2. Construct a 95 percent confidence interval for the true but unknown population

parameter. Interpret the resulting interval statistically and managerially.

3. Which of these two procedures is more informative, the test of the null hypothesis or the confidence interval? Explain.

Practice Problem Set P1-1 Answers

Question 1

(a) To test the research hypothesis that the mean of salary is at most (less than or equal to) 1000, we have

1. The null hypothesis: \( H_0: \mu > 1000 \)

2. The alternative hypothesis: \( H_a: \mu \le 1000 \)

3. The formula for the realization of the test statistic: \( t = \dfrac{\bar{x} - 1000}{s/\sqrt{N}} \), where \( \bar{x} \) is the sample mean, \( s \) the sample standard deviation, and \( N \) the number of observations

4. The rejection region: reject if \( t < -1.65 \).

To test if the mean of salary is equal to 1200:

1. The null hypothesis: \( H_0: \mu = 1200 \)

2. The alternative hypothesis: \( H_1: \mu \neq 1200 \)

3. The formula for the realization of the test statistic: \( t = \dfrac{\bar{x} - 1200}{s/\sqrt{N}} \)

4. The rejection region: reject if \( |t| > 1.96 \) (this is the same as saying that the rejection region is \( t < -1.96 \) or \( t > 1.96 \)).

For the manual calculation of the realization of the test statistic we need the mean of salaries in the sample, the standard deviation, and the number of observations. We get all of these from Stata's summarize command.

. summarize salary

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

salary | 206 1141.063 611.193 223 4143

Hence the realization or value for the test statistic for the first test equals

\[ t = \frac{1141.063 - 1000}{611.193/\sqrt{206}} = 3.313. \]

The realization or value for the test statistic for the second test equals

\[ t = \frac{1141.063 - 1200}{611.193/\sqrt{206}} = -1.384. \]

For the first test, since the value for t is 3.3, which is not smaller than -1.65, we cannot reject the null hypothesis in favor of the alternative that the mean of salaries in the population of CEOs is at most $1,000,000 at the 5% level. For the second test, since the value for the test statistic t of -1.384 is not in the rejection region of t < -1.96 or t > 1.96, we also cannot reject the null hypothesis that the mean of salaries in the population of CEOs is equal to $1,200,000 at the 5% level.
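The manual calculation can also be double-checked outside Stata. The following is a minimal Python sketch that uses only the summary statistics reported by summarize salary; the function name one_sample_t is an illustrative choice, and the cutoffs -1.65 and 1.96 are the large-sample critical values used in the text.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """t statistic for a hypothesized mean mu0, computed from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error

# Summary statistics from `summarize salary` (salary is measured in $1000).
n, mean, sd = 206, 1141.063, 611.193

t_1000 = one_sample_t(mean, sd, n, 1000)  # first test: reject if t < -1.65
t_1200 = one_sample_t(mean, sd, n, 1200)  # second test: reject if |t| > 1.96

reject_first = t_1000 < -1.65
reject_second = abs(t_1200) > 1.96

print(round(t_1000, 3), reject_first)
print(round(t_1200, 3), reject_second)
```

Both flags come out False, which matches the conclusion that neither null hypothesis can be rejected at the 5% level.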

We get the same results using the ttest command in Stata. Note that when Stata sets a "95% confidence level" as the default, it is just asking you if you would like to see the 95% confidence interval for the unknown population mean of CEO salaries together with the value for t.

. ttest salary == 1000

One-sample t test

------------------------------------------------------------------------------

Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

salary | 206 1141.063 42.58383 611.193 1057.105 1225.022

------------------------------------------------------------------------------

mean = mean(salary) t = 3.3126

Ho: mean = 1000 degrees of freedom = 205

Ha: mean < 1000 Ha: mean != 1000 Ha: mean > 1000

Pr(T < t) = 0.9995 Pr(|T| > |t|) = 0.0011 Pr(T > t) = 0.0005

. ttest salary == 1200

One-sample t test

------------------------------------------------------------------------------

Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

salary | 206 1141.063 42.58383 611.193 1057.105 1225.022

------------------------------------------------------------------------------

mean = mean(salary) t = -1.3840

Ho: mean = 1200 degrees of freedom = 205

Ha: mean < 1200 Ha: mean != 1200 Ha: mean > 1200

Pr(T < t) = 0.0839 Pr(|T| > |t|) = 0.1679 Pr(T > t) = 0.9161

The p-value for the first test is 0.9995. Since the p-value is greater than 5%, we cannot reject the null hypothesis in favor of the alternative that average salaries in the population of CEOs are, at most, $1,000,000 at the 5% level. The p-value for the second test is 0.1679. Since the p-value is greater than 5%, we also cannot reject the null hypothesis that average salaries in the population of CEOs are $1,200,000, at the 5% level.

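(b) The formula for the 95% confidence interval for the mean of salary in our CEO population is

\[ \left[ \bar{x} - 1.96\,\frac{s}{\sqrt{N}},\; \bar{x} + 1.96\,\frac{s}{\sqrt{N}} \right], \]

where \( s \) is the sample standard deviation of salary.

With the results from summarize salary above we get

\[ \left[ 1141.063 - 1.96\,\frac{611.193}{\sqrt{206}},\; 1141.063 + 1.96\,\frac{611.193}{\sqrt{206}} \right] = [1057.6, 1224.5]. \]

Thus we are 95% confident that the true mean of CEO salaries is between 1057.6 and 1224.5. Note that the interval contains 1200 and is almost equal to the interval reported by the command ttest salary == 1200. The small remaining difference arises because Stata uses the critical value of the t distribution with 205 degrees of freedom (about 1.97) rather than the large-sample value 1.96.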
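As a cross-check on the interval arithmetic, here is a short Python sketch under the large-sample (normal) approximation. It takes the critical value from the standard library's NormalDist instead of hard-coding 1.96; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Summary statistics from `summarize salary` (salary is measured in $1000).
n, mean, sd = 206, 1141.063, 611.193

z = NormalDist().inv_cdf(0.975)        # two-sided 5% critical value, about 1.96
half_width = z * sd / math.sqrt(n)     # critical value times the standard error
ci = (mean - half_width, mean + half_width)

contains_1200 = ci[0] <= 1200 <= ci[1]
print(round(ci[0], 1), round(ci[1], 1), contains_1200)
```

The flag contains_1200 is the confidence-interval counterpart of the two-sided test of the value 1200: a hypothesized mean inside the 95% interval is one that the test does not reject at the 5% level.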

Practice Problem Set - P1-2- Answers












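1. Null hypothesis: \( H_0: \mu = 6 \)

Alternative hypothesis: \( H_a: \mu \neq 6 \)

Computing the value for the test statistic:

\[ t = \frac{\bar{x} - \mu_0}{\sqrt{s^2/N}} = \frac{7.8 - 6}{\sqrt{9.5^2/100}} = 1.89 \]

Given that t = 1.89 is not in the rejection region for a two-sided test at the 5% level (t > 1.96 or t < -1.96), we cannot reject the null hypothesis that the true average number of links equals 6.
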
2. The 95% confidence interval for the true but unknown mean of the number of links is:

\[ 7.8 \pm 1.96 \times \frac{9.5}{10} = 7.8 \pm 1.9 = (5.9, 9.7) \]

Therefore, we can be quite sure or confident (95 percent) that the true but

unknown population mean is between 5.9 and 9.7 links.

3. Neither is more informative than the other; it depends on the type of question one is asking. The confidence interval gives us a range that we are 95% confident contains the population mean. It is useful when we want to get a sense of what the population mean could be. A hypothesis test, in contrast, allows us to answer a different question: whether a specific hypothesis about the population is true (supported by the data) or not.

Practice Problem Set P2-1

Introduction. You have been asked to analyze the relationship between research and development (R&D) spending and sales of firms in the chemical and telecommunications industries. You receive data on a random sample of firms contained in the data set rd.dta. Type describe to see the contents of the data set. The binary (dummy) variable chem is equal to one if the firm is in the chemical industry and equal to zero if the firm is in the telecommunications industry.

Question 1

1. Run the regression of sales as a function of R&D.

2. Does the estimated coefficient suggest that sales and R&D spending are positively or negatively correlated?

3. By how much do sales increase or decrease on average when R&D spending increases by one million dollars?

4. Is this effect significantly different from zero at the 5% level and why?

5. What is the interpretation of the estimate for the intercept parameter \( \beta_0 \)?

6. How much does the variation in R&D spending explain the variation in sales?

Question 2

After analyzing the relationship between prices you are asked how the returns of the DJIA and GE are related: when one increases, does the other decrease or vice-versa? Or when one increases does the other also increase? To investigate this question, you first have to generate the returns using the User menu command Manipulate Variables and Obs, then Generate New Variable, and the formula

\[ \text{return}_t = 100 \times \frac{\text{price}_t - \text{price}_{t-1}}{\text{price}_{t-1}}. \]

Since each observation in our data set represents one date and the observations are chronologically sorted, we can implement this formula in Stata by 100 * (close_DJIA - close_DJIA[_n-1]) / close_DJIA[_n-1], for example, for DJIA. Here [_n-1] means that we are taking the observation from the previous period. Do the same for GE returns.
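For readers who want to see the lagged-observation logic spelled out, here is a minimal Python sketch of the same return formula. The price list is hypothetical and only illustrates the mechanics; the first return is left as None because there is no previous observation, just as Stata produces a missing value for the first row.

```python
def percent_returns(prices):
    """Percentage return 100 * (p_t - p_{t-1}) / p_{t-1} for chronologically sorted prices."""
    returns = [None]  # no previous price exists for the first observation
    for previous, current in zip(prices, prices[1:]):
        returns.append(100 * (current - previous) / previous)
    return returns

# Hypothetical closing prices, not taken from the course data set.
close_example = [100.0, 102.0, 99.96]
returns_example = percent_returns(close_example)
print(returns_example)
```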

1. Plot the relationship between the return for the GE stock and the return for DJIA. Is the relationship increasing or decreasing?

2. We can define the beta of a given stock as \( \beta_a = \dfrac{\operatorname{cov}(\text{return}_a, \text{return}_p)}{\operatorname{var}(\text{return}_p)} \), where \( \text{return}_a \) and \( \text{return}_p \) are the returns of the stock in question and the stock market index, respectively, and the risk free rate is constant over time. Given the previous plot, should the beta of the GE stock be positive or negative? Explain why.

Practice Problem set P2-1 Answers

Question 1

1. The Stata command for the regression is regress sales rd, robust, which yields the following output:

Linear regression Number of obs = 61

F( 1, 59) = 42.28

Prob > F = 0.0000

R-squared = 0.7971

Root MSE = 3542

------------------------------------------------------------------------------

| Robust

sales | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

rd | 18.00799 2.769413 6.50 0.000 12.46641 23.54957

_cons | 1040.966 397.866 2.62 0.011 244.8379 1837.094

------------------------------------------------------------------------------

2. The estimated parameter \( \beta_1 \) equals about 18, a positive number, which suggests that R&D spending and sales are positively correlated.

3. When R&D spending increases by one million dollars, predicted sales increase by 18 million dollars.

4. This effect is significantly different from zero at the 5% level since the value of the t-statistic reported in the regression output is equal to 6.5, which is in the rejection region of t < -1.96 or t > 1.96. Accordingly, the p-value is nearly zero, well below 5%.

5. The estimate for the intercept, \( \beta_0 \), is 1041, which implies that predicted sales are 1041 million dollars when R&D spending equals zero.

6. The R-squared value tells us that variation in R&D spending explains about 80% of the variation in sales. The fit is quite good.
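The interpretations above follow mechanically from the reported coefficients. The Python sketch below plugs the numbers from the regression output into a prediction function (the name predicted_sales is illustrative) and recomputes the t-statistic as the coefficient divided by its robust standard error. It assumes, as the answers do, that both variables are measured in millions of dollars.

```python
# Estimates copied from the regression output of sales on rd.
intercept, slope = 1040.966, 18.00799
slope_se = 2.769413

def predicted_sales(rd):
    """Predicted sales (millions of dollars) for R&D spending rd (millions of dollars)."""
    return intercept + slope * rd

# Effect of one more million dollars of R&D, evaluated at an arbitrary level.
effect_of_one_more_million = predicted_sales(101) - predicted_sales(100)

# The t-statistic for H0: slope = 0 is the coefficient over its standard error.
t_slope = slope / slope_se
significant = abs(t_slope) > 1.96

print(round(effect_of_one_more_million, 2), round(t_slope, 2), significant)
```

Because the model is linear, the one-unit effect is the same at every level of R&D spending; the starting value of 100 is arbitrary.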

Question 2

First generate the returns for DJIA and GE using the following commands:

generate return_DJIA = 100 * (close_DJIA - close_DJIA[_n-1]) / close_DJIA[_n-1]

generate return_GE = 100 * (close_GE - close_GE[_n-1]) / close_GE[_n-1]

1. We plot the returns of GE against the returns of DJIA using the Stata command twoway (scatter return_GE return_DJIA). See Figure 3.

Figure 1: Question I.4

2. From the graph we see that the relationship between GE returns and DJIA returns is positive: higher DJIA returns are associated with higher GE returns. Thus the covariance in this sample between GE and DJIA returns is positive, so the numerator for beta is positive. A variance is never negative (recall that a variance involves sums of squared terms, and squared terms are always nonnegative), and it is strictly positive here because DJIA returns vary in the sample, so the denominator for beta is positive. This implies that the beta of the GE stock in this sample is positive.
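The sign argument can be illustrated numerically. The Python sketch below computes beta as sample covariance over sample variance; the two return series are hypothetical values chosen to move together and are not the GE or DJIA data.

```python
from statistics import mean

def beta(stock_returns, index_returns):
    """Sample covariance of stock and index returns divided by sample variance of index returns."""
    n = len(index_returns)
    mean_stock, mean_index = mean(stock_returns), mean(index_returns)
    covariance = sum((s - mean_stock) * (i - mean_index)
                     for s, i in zip(stock_returns, index_returns)) / (n - 1)
    variance = sum((i - mean_index) ** 2 for i in index_returns) / (n - 1)
    return covariance / variance

# Hypothetical percentage returns that tend to rise and fall together.
index_example = [1.0, -0.5, 2.0, -1.5, 0.5]
stock_example = [1.4, -0.2, 2.5, -2.1, 0.9]

beta_example = beta(stock_example, index_example)
print(round(beta_example, 3))
```

Since the denominator is a sum of squares, the sign of beta is always the sign of the covariance in the numerator.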

Practice Problem Set P2-2

A consulting firm wants to get a better understanding of its cost structure based

on data on costs incurred for projects in the past so as to improve its bidding process

for projects. Experience suggests that there are two main components of costs in a

project: (1) variable costs that are directly related to the size of the project, which

is reasonably proxied by the number of person-hours for the project, and (2) fixed

costs, which are incurred irrespective of the size of the project.

A regression of the total costs (in $) against the number of person-hours based

on data on 42 projects gave the following results:


F( 1, 40) = 157.8

Prob > F = 0.000

R-squared = 0.87

Root MSE = 2979

------------------------------------------------------------------------------

| Robust

totalcost | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

Person-hours | 372.15 29.629 12.6 0.000 311.0 433.3

_cons | 3209.76 1387.962 2.31 0.030 345.1 6074.4

------------------------------------------------------------------------------

1. Test the null hypothesis that the slope parameter is zero. State the hypotheses

in appropriate symbols, state the p-value, and interpret the result.

2. Define the 95 percent confidence interval for the true slope parameter and

interpret this interval.

3. Assuming that the equation is a reasonable approximation of the nature of

project costs, interpret the slope coefficient precisely in a manner understand-

able by a layperson.

4. What is the best estimate of fixed costs?

5. What is the predicted total cost for a project that will employ 1,000 person-hours?

Practice Problem Set P2-2-Answers

1. \( H_0: \beta_1 = 0 \)

\( H_a: \beta_1 \neq 0 \)

The p-value for the test is 0.000.

Since the p-value is less than 5% we can reject the null hypothesis at the

5% level.

We can therefore state that there is a relationship between total costs of a project and the number of person-hours required to complete the project in our sample and that this relationship is highly significant (given our data, it is unlikely that no relationship exists).

2. The 95% confidence interval for the slope parameter is (311.0, 433.3). We can state with 95% confidence that each additional person-hour raises the expected total cost of a project by between $311.0 and $433.3.

3. If the number of person-hours for a project increases by an hour, the total cost

of the project is expected to increase by $372.

4. The best estimate of fixed costs is given by the intercept: the cost of the project when the number of person-hours is zero. This cost is $3,210.

5. The predicted total cost for a project that will employ 1,000 person-hours is

\[ 3{,}210 + 372 \times 1{,}000 = \$375{,}210 \]
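The prediction is a direct plug-in of the estimated coefficients. The Python sketch below (the function name predicted_cost is illustrative) evaluates it twice: once with the rounded coefficients used in the answer and once with the coefficients exactly as printed in the regression output, to show how much the rounding matters.

```python
def predicted_cost(person_hours, intercept, slope):
    """Predicted total project cost in dollars from a fitted line."""
    return intercept + slope * person_hours

# Rounded coefficients, as used in the worked answer.
rounded = predicted_cost(1000, intercept=3210, slope=372)

# Coefficients as printed in the regression table.
unrounded = predicted_cost(1000, intercept=3209.76, slope=372.15)

print(rounded, round(unrounded, 2), round(unrounded - rounded, 2))
```

Rounding the slope by 15 cents per hour shifts the prediction by about $150 at 1,000 person-hours, so carrying the unrounded coefficients is the safer habit for large projects.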

Practice Problem Set P3-1

Jim Douglas, the manager of Colonial Furniture, has been reviewing weekly advertising expenditures. All of his advertising thus far has been focused on radio.

He is interested in learning how the effect of advertising might differ across different

media. He recorded the following variables:

Sales: Number of customers in each week (individuals visiting an outlet)

# Ads: The number of ads in the week

Medium (1=radio, 2=television).

1. Jim recalled from a class he had taken, that regression analysis could be used to

estimate the effects of the different media. He proposed the following regression

model:

\[ \text{Sales} = \beta_0 + \beta_1\,\text{Ads} + \beta_2\,\text{Medium} \]

and he seeks your advice. Would you propose an alternative model? If so, explain the problem with Jim's model. Write out your proposed model explicitly in the form of an equation.

2. Jim then created one indicator variable: Radio (1 if radio, 0 otherwise, that is,

television). He then ran a regression of Sales against #Ads and Radio. The

results are reported below:

F( 2, 49) = 14.91

Prob > F = 0.0000

R-squared = 0.69

Root MSE = 44.87

--------------------------------------------------------------------------

| Robust

sales | Coef. Std. Err. t P>|t| [95% Conf. Inter

-------------+------------------------------------------------------------

Ads | 25 3.98 6.34 0.00 17.23 33.2

Radio | -47 16.44 -2.83 0.01 -79.64 -13.5

_cons | 283 17.46 16.19 0.00 247.50 317.6

--------------------------------------------------------------------------

Interpret the effect for Radio precisely.

3. What is the effect of a 1 unit increase in the number of ads? State this result

precisely.

4. What is the 95% prediction interval for sales when the company airs 50 ads on

television?

Practice Problem Set P3-1-Answers

1. Medium is a categorical variable (with values 1 or 2). So I would not include

it directly. I would create a dummy variable for the Medium category. For

example, Radio (1 if radio; 0 otherwise, that is, TV)

I would then estimate the model (treating TV as the base category):

\[ \text{Sales} = \beta_0 + \beta_1\,\text{Ads} + \beta_2\,\text{Radio} \]

2. For any given level of advertising (number of ads), radio ads are expected to

produce 47 fewer customers relative to ads shown on TV.

3. An increase in the number of ads per week by 1 is expected to increase the number of customers per week by 25, holding the medium through which the ads are transmitted constant.

4. To answer this question, we first need to compute the predicted sales value

when airing 50 ads on television.

Sales = 283 + 25*50 + (-47)*0 = 1,533 customers

Then the 95% prediction interval for sales is:

\[ 1{,}533 \pm 1.96 \times 44.87 = (1{,}445.1;\ 1{,}620.9) \]
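The same steps can be sketched in Python. This uses the approximation from the answer, predicted value plus or minus 1.96 times the Root MSE, which ignores the estimation uncertainty in the coefficients themselves; the dictionary and function names are illustrative.

```python
# Coefficients and Root MSE copied from the regression output.
coef = {"_cons": 283, "Ads": 25, "Radio": -47}
root_mse = 44.87

def predicted_customers(ads, radio):
    """Predicted weekly customers; radio is 1 for radio ads and 0 for television."""
    return coef["_cons"] + coef["Ads"] * ads + coef["Radio"] * radio

point = predicted_customers(ads=50, radio=0)           # 50 ads on television
interval = (point - 1.96 * root_mse, point + 1.96 * root_mse)

# The Radio coefficient is the gap between the media at any fixed number of ads.
radio_gap = predicted_customers(50, 1) - predicted_customers(50, 0)

print(point, round(interval[0], 1), round(interval[1], 1), radio_gap)
```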

Data Set and Questions

You have been hired to investigate the relationship between individuals' physical

attractiveness and their wage. You receive the data set beauty3.dta, which contains

data on the wage and other characteristics, such as education and years of experience,

for a random sample of individuals.

The data set also contains the variable looks, which measures a given individual's subjective physical attractiveness. The variable looks encompasses five categories, where 5 denotes the highest level of attractiveness and 1 denotes the lowest level of attractiveness. The binary zero-one variable belavg is derived from looks: belavg is equal to 1 if looks equals 1 or 2 and 0 otherwise.

1. By how much more/less do individuals with below average looks earn per hour,

on average, relative to individuals with average/above average looks? Run the

appropriate regression and answer the question.

2. Is the above estimate significant at the 5% level?

3. Does experience attenuate the "looks" advantage? Run the appropriate regression and answer the question.

4. How much does the variation in looks and experience explain the variation in

wages?

Statistics Practice PS 3-2 Answers

1. Regression: The Stata command is regress wage belavg, robust, which

yields the following output:

Linear regression Number of obs =

F( 1, 1257) = 1

Prob > F = 0.

R-squared = 0.

Root MSE = 4.

--------------------------------------------------------------------------

| Robust

wage | Coef. Std. Err. t P>|t| [95% Conf. Inter

-------------+------------------------------------------------------------

belavg | -1.118143 .3120741 -3.58 0.000 -1.730386 -.505

_cons | 6.387627 .128631 49.66 0.000 6.135272 6.63

--------------------------------------------------------------------------

Individuals with below-average looks earn $1.12 less per hour, on average, than individuals with average or above-average looks.

2. The coefficient on belavg of -1.12 is significant at the 5% level since the value of the t-statistic (-3.58) is lower than the critical value of -1.96, which implies that one can reject the null hypothesis at the 5% level. Recall that the rejection region for a large sample two-sided test that each of the regression coefficients is equal to zero, at the 5% level, is t < -1.96 or t > 1.96.

3. regress wage belavg exper, robust


F( 2, 1256) = 5

Prob > F = 0.

R-squared = 0.

Root MSE = 4.

--------------------------------------------------------------------------

| Robust

wage | Coef. Std. Err. t P>|t| [95% Conf. Inter

-------------+------------------------------------------------------------

belavg | -1.270895 .29984 -4.24 0.000 -1.859137 -.682

exper | .0966187 .0093563 10.33 0.000 .078263 .114

_cons | 4.646653 .1693257 27.44 0.000 4.314461 4.97

--------------------------------------------------------------------------

It seems that experience does not attenuate the advantage of looks. Holding experience constant, those with below-average looks earn $1.27 less per hour than those with average or above-average looks, a larger gap than the $1.12 estimated without controlling for experience. Further, this coefficient is statistically significant at the 5% level as the p-value of 0.000 is less than 5%.

4. The variation in looks and experience explains only 8% of the variation in wages.
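To make the comparison between the two regressions explicit, the Python sketch below plugs in the reported coefficients. The function name predicted_wage and the choice of 10 years of experience are illustrative; because the model has no interaction term, the gap is the same at every experience level.

```python
# Coefficient on belavg from the regression of wage on belavg alone.
belavg_alone = -1.118143

# Coefficients from the regression of wage on belavg and exper.
coef = {"_cons": 4.646653, "belavg": -1.270895, "exper": 0.0966187}

def predicted_wage(belavg, exper):
    """Predicted hourly wage for a looks indicator and years of experience."""
    return coef["_cons"] + coef["belavg"] * belavg + coef["exper"] * exper

# Wage gap between below-average and other looks at the same experience.
gap_at_10_years = predicted_wage(1, 10) - predicted_wage(0, 10)

# "Attenuate" would mean the gap shrinks in size once experience is controlled for.
attenuated = abs(coef["belavg"]) < abs(belavg_alone)

print(round(gap_at_10_years, 2), attenuated)
```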

Sample Exam Questions

Administrative Details

This final is open book. You can consult your class notes, problem set solutions

and other materials. But you cannot discuss the exam with anyone. This

constitutes a violation of the honor code. Show all your work, including all the Stata

output relevant to answer the questions. [1]

Sample Exam Question 1

The Nielsen Media organization conducts tests of commercials in its laboratories.

The firm regularly invites members of identified target markets to its premises. Attendees are shown one or more television programs in which commercials are embedded, and asked questions about products and other aspects both before and after

they view programs.

Each study is typically sponsored by a single company such as Procter & Gamble (P&G). On November 29, 2002, Nielsen Media Research did a study on a brand that was not performing well in the market. P&G was interested in whether new commercials it proposes to air might change target members' preferences for the

brand.

A total of 32 consumers participated in the study. They first provided preference

and perception data on multiple brands. Then they watched two TV programs

with a standard number of commercials. Thereafter they provided preference and

perception data on some of the same brands and other brands. (Researchers also obtained brain scanner analyses based on principles of neuromarketing but those data are ignored here.)

[1] Though all the sample questions already show the Stata output, you will have to create your own Stata output when answering the questions in the exam.

The data of interest pertain to brand X on which consumer preferences were

obtained both before and after the TV programs (with relevant commercials on brand

X as part of the TV program). The rating was on a 5 point scale, where 5=great

and 1=lousy. The sample data about the preferences of the brand are summarized

below:

1. You have taken a regression course and want to use this "promising analytical technique". You create a dependent variable with 32 "before" and 32 "after" preference scores for brand X. The regression includes only one dummy (indicator) variable, AFTER, to distinguish the two categories of observations: AFTER = 1 if after, 0 if before.

F( 1, 62) = 1.89

Prob > F = 0.17

R-squared = 0.18

Root MSE = 1.53

--------------------------------------------------------------------------

Robust

ratings | Coef. Std. Err. t P>|t| [95% Conf. Inter

-----------+--------------------------------------------------------------

After | 0.551 0.401 1.375 0.175 -0.251 1.3

_cons | 2.068 0.283 7.293 0.000 1.500 2.6

--------------------------------------------------------------------------

What is the average preference in the sample for brand X before the TV programs?

2. What is the average preference in the sample for brand X after the TV programs?

3. Based on the above regression, do the ads for brand X have a statistically

significant effect on the average preference for the brand in the target market?

Be precise and show relevant numbers.

4. Your exposure to regression analysis suggests that it may be useful to include

other variables so as to improve the understanding of effects of interest. So

you decide to add two independent variables: PPur = 1, if the consumer has

purchased the product in the past, = 0 if not; Male = 1 if male, = 0 if female.

F( 1, 60) = 35.5

Prob > F = 0.00

R-squared = 0.81

Root MSE = 0.92

--------------------------------------------------------------------------

Robust

ratings | Coef. Std. Err. t P>|t| [95% Conf. Inter

-----------+--------------------------------------------------------------

After | 0.521 0.240 2.170 0.030 0.051 0.9

PPur | 2.452 0.254 9.649 0.000 1.942 2.9

Male | -0.047 0.201 -0.234 0.815 -0.441 0.3

_cons | 0.742 0.237 3.126 0.002 0.266 1.2

--------------------------------------------------------------------------

Based on the above analysis, do the ads have a statistically significant effect

on the average preference for the brand in the target market? Be precise and

show relevant numbers.

5. Is your conclusion in (4) different from your conclusion in (3)? Explain the

difference, if any. Relate the idea of controlling for other variables (that is,

adding more relevant variables to the model or holding constant these other

relevant variables) to the difference between the test you did in (3) and in (4).

Sample Exam Question 2

Investment bankers earn large fees for making arrangements and giving advice relating to mergers and acquisitions (M&A) when one firm joins with or purchases another.

Consider the following regression on the total dollar amount of M&A activity

against the number of deals of the top 15 major firms in this industry.

Dependent Variable: Total M&A Volume (in millions of dollars) for a firm

Independent Variable: Number of Deals for the corresponding firm

Below is the regression output:


F( 1, 34) = 19.9

Prob > F = 0.000

R-squared = 0.604

Root MSE = 12286

------------------------------------------------------------------------------

Robust

M&AVolume | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-----------+----------------------------------------------------------------

Deals | 269.660 60.512 4.456 0.000 138.932 400.389

_cons | 1461.941 5737.309 0.254 0.802 -10932.8 13856.640

------------------------------------------------------------------------------

For all questions, assume that the linearity assumption holds.

1. Does the regression equation have significant explanatory power? Be precise

(use a specific result to explain).

2. How much of the variation in M&A volume across these firms is explained by

the number of deals that each firm handles?

3. What is the marginal increase in M&A volume attributable to an additional deal that a firm makes (or what is the predicted difference in M&A volume

between firm B and firm A, if B has one more deal than A)? Be precise and

state the units (e.g. billions? millions?).

4. Your firm wants to be among the top players in this industry next year with

100 deals. Assuming that the estimated relationship applies to next year, what

is your best estimate of M&A volume for your firm if it achieved its goal next

year? Again, be sure to state the units (e.g. millions or billions). Note: The question asks for your best estimate of the predicted value; so whether or not something is statistically significant is irrelevant to this question.

Sample Exam Question 3

The movie-v2.dta dataset contains a sample of movies shown on U.S. movie screens between 1985 and 2001. It contains the title of the movie, the year of its premiere, the number of screens per week, the movie's total U.S. box office (revenue from ticket sales), a binary variable indicating whether the movie was produced in the U.S. or not, and the movie's production budget, among other information.

Type describe to see the contents of the data set.

1. Run the regression of box office onto budget and whether the movie was pro-

duced in the U.S.

2. There is a claim that movies with larger budgets generate bigger box office revenues. Holding the location of production constant, how does an increase of one thousand dollars in budget change the movie's box office?

3. What is the predicted box office for a movie with a 50 million dollar budget

produced in the US?

4. What is the predicted box office for a movie with the same budget produced

outside the US?

5. How much of the variation in box office is explained by the production budget and whether a movie is produced in the US?

Sample Exam Questions-Answers

Sample Exam Question 1

1. The average preference for brand X before TV programs=2.07 (intercept)

2. The average preference for brand X after TV programs=2.07+0.55=2.62

3. No, the p-value of the difference is 0.18 which is greater than 0.05. Therefore

we cannot reject the null that the difference of 0.55 (After-Before Advertising)

is equal to zero.

4. Yes, now they do.

The null is $H_0: \beta_{\text{After}} = 0$

The alternative is $H_a: \beta_{\text{After}} \neq 0$

The p-value is 0.03


5. Yes, it is different. The reason is that there are other characteristics, such as

past purchase, that explain a difference in preferences between consumers. By

including such variables in a regression, we have a better chance of learning

the impact of ads on brand preference.


1. The p-value of the F-statistic is 0.000 (Stata rounded it to zero), which is less than 0.01 or any reasonable type I error probability. Hence the regression is highly significant and has significant explanatory power.

2. 60.4% of the variation in M&A Volume is explained by the number of deals (as

seen in the R-square).

3. The slope coefficient is 269.66. Therefore, the marginal increase in M&A volume attributable to an additional deal is $269.66 million.

4. 1,462+269.66*100=$28,428 million or $28.428 billion.
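The arithmetic in answer 4 can be re-checked with a few lines of code. This is a minimal sketch in Python rather than Stata; the variable names are illustrative, and the intercept and slope are simply the values quoted in the answers above.

```python
# Fitted line from the answers above: volume = intercept + slope * deals,
# with M&A volume measured in millions of dollars.
intercept = 1462.0   # estimated intercept ($ million), as used in answer 4
slope = 269.66       # estimated increase in volume per additional deal ($ million)
deals = 100          # number of deals plugged in for the prediction

predicted_millions = intercept + slope * deals
predicted_billions = predicted_millions / 1000  # convert $ million to $ billion

print(f"Predicted M&A volume: ${predicted_millions:,.0f} million "
      f"(${predicted_billions:.3f} billion)")
```

Keeping the unit conversion as a separate step makes it harder to misstate the answer in millions versus billions.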


1. The regression is implemented as follows:

. regress boxoffice budget usa, robust

                                                  F(  2,   941) = 6
                                                  Prob > F      = 0.
                                                  R-squared     = 0.
                                                  Root MSE      = 4
--------------------------------------------------------------------------
             |               Robust
   boxoffice |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Inter
-------------+------------------------------------------------------------
      budget |   1.032913   .1143477     9.03   0.000   .8085068   1.25
         usa |   23697.24   4742.385     5.00   0.000   14390.36   3300
       _cons |  -10530.44   4557.711    -2.31   0.021   -19474.9   -1585
--------------------------------------------------------------------------

2. Holding the location of production constant, an increase of one thousand dollars in budget increases the movie's box office by 1.03 thousand dollars.

3. The predicted box office for a movie with a budget of 50 million dollars produced in the US is:

$$-10{,}530.4 + 1.03 \times 50{,}000 + 23{,}697.2 \times 1 = 64{,}666.8 \text{ thousand.}$$

That is 64.6668 million dollars.

4. The predicted box office for a movie with the same budget produced outside the US is:

$$-10{,}530.4 + 1.03 \times 50{,}000 + 23{,}697.2 \times 0 = 40{,}969.6 \text{ thousand.}$$

That is 40.9696 million dollars.

5. We can see by the R-squared that 31.5% of the variation in box office is explained by the production budget and whether a movie is produced in the US.
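The two predictions in answers 3 and 4 use the same fitted equation and differ only in the value of the usa dummy. The sketch below, written in Python rather than Stata, re-computes both. It uses the rounded coefficients that the answers use (1.03 rather than 1.032913), and the function and constant names are illustrative assumptions, not part of the original solution.

```python
# Rounded coefficients taken from the answers above (all in $ thousand).
CONS = -10530.4    # intercept (_cons)
B_BUDGET = 1.03    # box office change per $1 thousand of budget
B_USA = 23697.2    # shift in box office for movies produced in the US


def predict_box_office(budget_thousands, usa):
    """Return predicted box office in $ thousand; usa is 1 (US) or 0 (non-US)."""
    return CONS + B_BUDGET * budget_thousands + B_USA * usa


budget = 50_000  # a $50 million budget expressed in $ thousand
us_pred = predict_box_office(budget, usa=1)
non_us_pred = predict_box_office(budget, usa=0)

print(f"US:     {us_pred:,.1f} thousand = {us_pred / 1000:.4f} million")
print(f"Non-US: {non_us_pred:,.1f} thousand = {non_us_pred / 1000:.4f} million")
print(f"Gap:    {us_pred - non_us_pred:,.1f} thousand")
```

The gap between the two predictions equals the usa coefficient, which is how a dummy variable shifts the fitted line when the budget is held fixed.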


MGT 403 Statistics PRACTICE PROBLEMS

MGT 403: Probability Modeling and Statistics

STATISTICS: PRACTICE PROBLEMS

This is a PRACTICE PROBLEM SET. You do NOT need to turn it in. It is optional, for students who would like a little more experience solving problems. Solutions will be posted.

There are 3 QUESTIONS.

Question 1

The Internet portal Yahoo is considering allowing its members to customize their start

pages (homepages). As part of a short survey regarding likes and dislikes, users were asked

about their interests in options such as QuickTime movie clips with daily news and sports

events on their pages. Yahoo hopes that QuickTime will entice users to follow a larger

number of hyperlinks so that it can attract more advertisers.

The newly customized page option with QuickTime links was made available to 100 Internet users who were randomly sampled from the target population. The prior benchmark for Yahoo has been 6 non-Yahoo content links clicked on average by its members per visit. It collected data on the 100 users over 1 week to see if the availability of the QuickTime link options significantly changes the average non-Yahoo links clicked per visit.

Findings: After one week of access to the new customized homepage option with QuickTime links, Yahoo observes that the average number of non-Yahoo links for each customer in the sample per visit is 7.8 links and the standard deviation is 9.5 links.

Answer the following:

(i). Draw a graph and test the Null Hypothesis that the customization with QuickTime

does NOT alter the average number of non-Yahoo links clicked. Use the customary

95% Confidence Interval (t critical value is 1.96). State whether you reject the null

hypothesis or not. Also compute the t statistic.

(ii). Draw a new graph and show the 95% Confidence Interval for the estimated mean number of non-Yahoo links clicked in the sample of 100 customers. Be precise about where, numerically, the boundaries of the Confidence Interval lie. Does the Confidence Interval include the previous average of 6 or not? How does your answer to this last question relate to your answer to (i)?
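The numbers needed for parts (i) and (ii) all come from the sample mean, the sample standard deviation, and the sample size. The sketch below, in Python rather than Stata, shows one way to compute them. It assumes the null value is the prior benchmark of 6 links and uses the critical value of 1.96 stated in the question; the variable names are illustrative.

```python
import math

n = 100            # number of sampled users
sample_mean = 7.8  # observed average non-Yahoo links per visit
sample_sd = 9.5    # observed standard deviation
benchmark = 6.0    # prior average, taken here as the null hypothesis value
critical = 1.96    # critical value given in the question

# Standard deviation of the sample mean (the standard error).
standard_error = sample_sd / math.sqrt(n)

# t statistic: distance of the sample mean from the null, in standard errors.
t_stat = (sample_mean - benchmark) / standard_error

# 95% confidence interval centered on the sample mean.
ci_low = sample_mean - critical * standard_error
ci_high = sample_mean + critical * standard_error

reject_null = abs(t_stat) > critical

print(f"standard error = {standard_error:.3f}")
print(f"t statistic    = {t_stat:.3f}")
print(f"95% CI         = [{ci_low:.3f}, {ci_high:.3f}]")
print(f"reject null    = {reject_null}")
```

The two-sided test and the confidence interval are two views of the same calculation: the null is rejected exactly when the benchmark falls outside the interval.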



Question 2

You have been hired to study executive compensation patterns. Your current project examines CEO salaries in the 1990s. You are curious whether some of the popular statements about high CEO salaries during this time period are correct. You have collected data on CEO salaries in the 90s; the data is in the STATA dataset ceosalary.dta (available on the class website on Canvas).

A widely read commentator of the time is known to have stated that average CEO compensation in the 90s (your sample period) was 1.2 million. You want to test this hypothesis.

(i). Carry out the appropriate t test in STATA, just as we did in class and you did in Problem Set 1, Question 2. What is the t value? Is the Null Hypothesis rejected or not? What is the p value?

(ii). You can also carry out this kind of test manually in STATA. To do this, run the command summarize salary from the command line. This will show you the mean of salary as well as its standard deviation. To proceed, assume the distribution for salary is a Normal distribution. Now compute the standard deviation of the Test Statistic, which is the average over the observations. To do this, recall that the standard deviation for the Test Statistic is:

$$\sigma_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{N}}$$

where $\hat{\sigma}$ is the estimated standard deviation of the underlying variable, and $N$ is the size of the sample. Once you compute this, go out 1.96 of these standard deviations in either direction from the sample mean to construct the Confidence Interval. Then check whether the Null Hypothesis value lies inside the Confidence Interval or not.
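The manual procedure in (ii) can be sketched outside Stata as well. The Python example below starts from raw observations, computes the mean and sample standard deviation (the two numbers summarize reports), and then builds the interval. The salary values are hypothetical and are NOT taken from ceosalary.dta; the function name and the assumption that salary is recorded in $1000 (so the null value is 1200) are also illustrative.

```python
import math
import statistics


def mean_test(values, null_mean, critical=1.96):
    """Manual test of a mean: standard error, t statistic, and confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (divides by n - 1)
    se = sd / math.sqrt(n)          # standard deviation of the sample mean
    t_stat = (mean - null_mean) / se
    ci = (mean - critical * se, mean + critical * se)
    return {
        "mean": mean,
        "sd": sd,
        "se": se,
        "t": t_stat,
        "ci": ci,
        "null_inside_ci": ci[0] <= null_mean <= ci[1],
    }


# Hypothetical salaries in $1000, for illustration only.
salaries = [900, 1100, 1250, 1400, 1000, 1300, 1500, 950]
result = mean_test(salaries, null_mean=1200)

print(f"mean = {result['mean']:.1f}, se = {result['se']:.2f}, t = {result['t']:.3f}")
print(f"CI   = [{result['ci'][0]:.1f}, {result['ci'][1]:.1f}]")
print(f"null value inside CI: {result['null_inside_ci']}")
```

With the real data, the mean and standard deviation from summarize salary and the number of observations would take the place of the hypothetical list.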



Question 2 - Continued - Regression

Next we will run a regression to explore factors that may influence CEO salaries. Run a regression in STATA in which salary is the dependent variable and the independent variables are: sales - sales of the company in the preceding few years; roe - return on equity for the company in the preceding few years; indus - a dummy variable set to 1 if the company is in an industrial sector; finance - a dummy variable set to 1 if the company is in the financial sector; and utility - a dummy variable set to 1 if the company is a utility (often regulated). Interpret your results. Then run the model without sales. What is strange (or hard to interpret) about these results compared to the results when sales is included? How can we interpret this?



Question 3

Consider the VERY small dataset that consists of 3 datapoints:

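X1 = 1.0   Y1 = 200.0

X2 = 2.0   Y2 = 145.0

X3 = 3.0   Y3 = 20.0

Use the formulas:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i \qquad \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$$

to compute $\hat{\beta}_0$ and $\hat{\beta}_1$. Then compute $\text{resid}_i$ for each datapoint and finally $R^2$.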

After you have done this calculation manually, enter these 3 datapoints into STATA (or use the dataset Q3-practice, in which this has already been done for you, on the Canvas website under STATS/FEINSTEIN/STATA Datasets). Run the regress command and check your work.
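As a cross-check on the hand calculation, the same formulas can be evaluated step by step in a short script. This is a minimal sketch in Python rather than Stata; the variable names are illustrative, and R-squared is computed with the standard definition of one minus the residual sum of squares over the total sum of squares, which the question does not spell out.

```python
xs = [1.0, 2.0, 3.0]
ys = [200.0, 145.0, 20.0]
n = len(xs)

# Sample means of X and Y.
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope: sum of cross-deviations divided by sum of squared X deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
beta1 = s_xy / s_xx

# Intercept: forces the fitted line through the point of means.
beta0 = y_bar - beta1 * x_bar

# Residuals: observed Y minus fitted Y for each datapoint.
residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]

# R-squared: share of the total variation in Y explained by the line.
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(f"beta1 = {beta1:.4f}, beta0 = {beta0:.4f}")
print("residuals =", [round(e, 4) for e in residuals])
print(f"R-squared = {r_squared:.4f}")
```

The residuals sum to zero (up to floating-point rounding), which is a quick sanity check on any least-squares fit that includes an intercept.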

