Core Statistics Project

Group Project Report GBA 462 – Core Statistics for MS Students

Team Members: Yifan Zhang, Fan Yang, Lin Cong, Nikolaos Polyzopoulos, Shubham Sharma

Executive Summary

Our main goal is to analyze the stock prices of companies in different sectors to better understand their performance over the past 10 years (2005 – 2015). We ran basic data analysis on each company, comparing the mean and standard deviation of its stock price and its average monthly return, to get a brief idea of its overall performance. To learn how the companies performed relative to the others in their sector, as well as to the entire market, we ran hypothesis tests on a few companies and applied regression models to the sectors.

Data Gathering

We collected stock data from Yahoo Finance (finance.yahoo.com). We picked 3 sectors of interest out of 10 (consumer discretionary, financials, and information technology) and 5 companies in each sector, so we conducted statistical analysis on the stock prices of 15 companies in total. We also collected the Dow Jones Index and the S&P 500 index from Yahoo Finance, and the USD/EUR exchange rate and US Treasury long-term rates from Quandl (www.quandl.com). All of the series use the same time window and frequency: monthly data from September 2005 to September 2015 (ten years in total).

Data Analysis and Evaluation

Basic Data Analysis

We first calculated the monthly percentage return for each company by dividing the change in stock price from the previous month by the previous month’s stock price. We then averaged each company’s monthly price and monthly return over the last 10 years (2005 – 2015) and calculated their standard deviations, to get a general idea of which companies performed similarly and which performed noticeably better or worse. A summary of the mean and standard deviation of the stock price and the average monthly return is shown in the following chart:

* Red highlights show the highest mean of stock price/average monthly return in each sector.

* Green highlights show the lowest mean of stock price/average monthly return in each sector.
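The return calculation described above can be sketched in a few lines of Python. This is only an illustration: the price list is made up, and the helper name monthly_returns is our own, not something taken from the project workbook.

```python
from statistics import mean, stdev


def monthly_returns(prices):
    """Return (p_t - p_{t-1}) / p_{t-1} for each pair of consecutive prices."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


# Hypothetical month-end closing prices, for illustration only.
prices = [100.0, 110.0, 99.0, 108.9]

returns = monthly_returns(prices)
avg_return = mean(returns)   # average monthly return
return_sd = stdev(returns)   # sample standard deviation (n - 1 denominator)

print([round(r, 4) for r in returns])
print(round(avg_return, 4), round(return_sd, 4))
```

A series of n monthly prices yields n - 1 monthly returns, because the first month has no previous period to compare against.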

In the rest of the project we give a more detailed analysis using hypothesis tests and regression models.

Hypothesis Tests

A statistical hypothesis test is a method of statistical inference that uses observed samples of random variables to test a statistical hypothesis. We built two hypothesis tests to determine which company performed better during the past 10 years by comparing their average monthly stock returns. The two comparisons are Google Inc. A vs. Microsoft Corp, and JP Morgan Chase & Co vs. Morgan Stanley.

Hypothesis Test #1: Google Inc. A VS. Microsoft Corp

In this hypothesis test, we test whether Microsoft Corp performed worse than Google Inc. during the last 10 years by comparing their mean stock returns at the 95% confidence level. Here are the data we collected:

                          Google Inc    Microsoft Corp
Mean stock return (x̅)     1.52%         0.85%
Standard deviation (s)    0.087924      0.070337

The null hypothesis here is H0: μMSFT − μGOOGL ≥ 0 (Microsoft Corp’s mean stock return was greater than or equal to Google Inc’s), and the alternative hypothesis is Ha: μMSFT − μGOOGL < 0 (Microsoft Corp’s mean stock return was lower than Google Inc’s).

We test at α = 0.05, and each sample has 120 observations. Since each sample has more than 30 observations, we can use the Z test. The test statistic is:

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{(\bar{x}_1 - \bar{x}_2)}} $$

where $\sigma_{(\bar{x}_1 - \bar{x}_2)} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $D_0 = 0$.

We compute z = -0.65587 and compare it with the critical value of -1.64485, which can be found using the NORM.S.INV(0.05) function in Excel. Since z does not lie in the rejection region (z < -1.64485), we fail to reject the null hypothesis H0. Therefore we conclude that there is not sufficient evidence that Google’s mean stock return was statistically significantly higher than Microsoft’s during the past 10 years.
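As a cross-check on the Excel calculation, the same lower-tail test can be sketched in Python using only the summary statistics from the table. The function name two_sample_z is our own. Because the means in the table are rounded to two decimals of a percent, the z value computed here is expected to differ slightly from the reported -0.65587.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """Large-sample z statistic for H0: mu1 - mu2 = d0."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - d0) / standard_error


# Sample 1 is Microsoft, sample 2 is Google (monthly returns as decimals).
z = two_sample_z(0.0085, 0.070337, 120, 0.0152, 0.087924, 120)

# Lower-tail critical value, the counterpart of Excel's NORM.S.INV(0.05).
z_crit = NormalDist().inv_cdf(0.05)

# Reject H0 only if z falls in the lower-tail rejection region.
reject_h0 = z < z_crit

print(round(z, 4), round(z_crit, 5), reject_h0)
```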

Hypothesis Test #2: JP Morgan Chase & Co VS. Morgan Stanley

In this hypothesis test, we test whether JP Morgan Chase & Co performed better than Morgan Stanley during the last 10 years by comparing their mean stock returns at the 95% confidence level. Here are the data we collected:

                          JP Morgan     Morgan Stanley
Mean stock return (x̅)     1.08%         0.47%
Standard deviation (s)    0.086334      0.1067

The null hypothesis here is H0: μJPM − μMS ≤ 0 (JP Morgan’s stock return performed worse

than or equal to Morgan Stanley’s), and the alternative hypothesis is Ha: μJPM − μMS > 0 (JP

Morgan’s stock return performed better than Morgan Stanley’s)

We test at α = 0.05, and each sample has 120 observations. Since each sample has more than 30 observations, we can use the Z test. The test statistic is:

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{(\bar{x}_1 - \bar{x}_2)}} $$

where $\sigma_{(\bar{x}_1 - \bar{x}_2)} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $D_0 = 0$.

We compute z = 0.4843 and compare it with the critical value of 1.64485, which can be found using the NORM.S.INV(0.95) function in Excel. Since z does not lie in the rejection region (z > 1.64485), we conclude from Figure 2 that we fail to reject the null hypothesis H0. Therefore there is not sufficient evidence that JP Morgan’s mean stock return was statistically significantly higher than Morgan Stanley’s during the past 10 years.
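The same decision can also be reached through a p-value instead of a critical value. The sketch below is not part of the original Excel workflow; it recomputes z from the rounded summary statistics in the table (so it may differ slightly from the reported 0.4843) and then takes the upper-tail probability.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the table (monthly returns as decimals).
mean_jpm, sd_jpm = 0.0108, 0.086334
mean_ms, sd_ms = 0.0047, 0.1067
n = 120  # observations per company

standard_error = sqrt(sd_jpm ** 2 / n + sd_ms ** 2 / n)
z = (mean_jpm - mean_ms) / standard_error

# Upper-tail p-value for Ha: mu_JPM - mu_MS > 0.
p_value = 1.0 - NormalDist().cdf(z)

alpha = 0.05
print(round(z, 4), round(p_value, 4), p_value < alpha)
```

A p-value above α leads to the same conclusion as a z statistic below the upper critical value: H0 is not rejected.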

Regression Model Analysis

Regression models describe the relationships between variables and are mainly used to predict and estimate unknown values. We conducted 4 regression analyses to understand how these variables affect each other in the market: Amazon and the S&P 500 index, Amazon and the 4 market data series, the Financials sector as a whole and the 4 market data series, and the S&P 500 index and the 3 sectors.

Regression Model #1 (Single): Amazon – S&P 500 index

We would like to see how changes in the S&P 500 index contribute to changes in the stock price of one company. We therefore built a simple linear regression model with the stock price of Amazon as the dependent variable and the S&P 500 index as the independent variable. After running it in Excel, we get the following results:

From the output of the regression, we are able to conclude the following information:

The general form of linear regression model is y=β0+ β1x. Given that β0=1049.77, β1=2.06, we

achieve the linear regression formula: y=1049.77+2.06x.

Multiple R measures the strength of the linear relationship between y and x. We find multiple R = 0.78, which is fairly close to 1 and, together with the positive slope β1, shows that the stock price of Amazon (y) and the S&P 500 index (x) are positively correlated. From the regression statistics, R2 = 0.61, which means that about 61% of the sample variation in the stock price of Amazon (y) can be explained by using the S&P 500 index (x) to predict it.

Looking at Significance F under ANOVA, Significance F = 2.84 × 10^-26 < α = 0.05, so we conclude that this linear regression model is statistically significant and that the S&P 500 index (x) is a significant predictor of the stock price of Amazon (y).
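For readers who want to see where the intercept, slope, multiple R and R2 in the Excel output come from, here is a minimal sketch of simple least-squares regression in plain Python. The five data points are invented for illustration and are not the project data; the function name simple_ols is our own.

```python
from math import sqrt


def simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares; return (b0, b1, r, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)  # Excel's "Multiple R" is |r| in the one-variable case
    return intercept, slope, r, r * r


# Hypothetical index levels (x) and stock prices (y), for illustration only.
index_level = [1200.0, 1300.0, 1250.0, 1400.0, 1500.0]
stock_price = [40.0, 48.0, 47.0, 60.0, 65.0]

b0, b1, multiple_r, r_squared = simple_ols(index_level, stock_price)
print(round(b0, 4), round(b1, 6), round(multiple_r, 4), round(r_squared, 4))
```

In the one-variable case R2 is simply the square of the correlation coefficient, which is why 0.78 squared gives roughly the 0.61 reported above.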

Regression Model #2 (Multiple): Amazon – 4 Market Data

(S&P 500 index, Dow Jones Index, the Exchange rates and US Treasury long term rates)

We would like to find out how each of the S&P 500 index, the Dow Jones Index, the USD/EUR exchange rate and US Treasury long-term rates contributes to changes in the stock price of Amazon. We therefore built a multiple regression model with the stock price of Amazon as the dependent variable and these 4 series as the independent variables. After running it in Excel, we get the following results:

From the output of the regression, we are able to conclude the following information:

The general form of multiple regression model is y=β0+ β1 X1+ β2 X2+…+ βkXk (k independent

variables). From the output, we find β0=164.69, β1=0.17, β2=0.0049, β3=-16.61, β4=-72.63. This

gives us the following linear regression formula:

Y=164.69+0.17X1 + 0.0049X2 - 16.61X3 - 72.62X4

As we can see from the regression statistics, R2 = 0.84, showing that about 84% of the sample variation in the stock price of Amazon (y) can be explained by using the 4 market data series to predict it. Significance F under ANOVA is much lower than our chosen α = 0.1, so we conclude that this regression model is significant overall. However, when we look at the P-value of each independent variable separately, we see that the Dow Jones Index and the exchange rate do not have a significant influence on the variation of Amazon’s stock price, as their P-values are higher than α = 0.1. The S&P 500 index and US Treasury rates, on the other hand, do have statistically significant effects on Amazon’s stock price.
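The coefficients β0 … βk reported by Excel are the solution of the least-squares normal equations. The sketch below shows that calculation in plain Python under simplifying assumptions: it uses two made-up predictors rather than the four market series, and the y values are constructed from known coefficients (2, 3, -1) so that the fit can be checked by eye. The helper names are our own, and in practice a library routine such as numpy.linalg.lstsq would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def multiple_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, coefs):
    """Share of the sample variation in y explained by the fitted model."""
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Hypothetical data: y is built exactly as 2 + 3*x1 - 1*x2.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 5.0)]
y = [2.0 + 3.0 * x1 - 1.0 * x2 for x1, x2 in rows]

coefs = multiple_ols(rows, y)
fit_quality = r_squared(rows, y, coefs)
print([round(c, 6) for c in coefs], round(fit_quality, 6))
```

Real market data will not fit exactly, so R2 will be below 1, as with the 0.84 reported above.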

Regression Model #3 (Multiple): Financials Sector – 4 Market Data


We would like to see how each of the S&P 500 index, the Dow Jones Index, US Treasury Bond rates, and the USD/EUR exchange rate contributes to changes in the average stock price of the Financials sector. We therefore ran a regression with the average stock price of the Financials sector as the dependent variable and the S&P 500 index, the Dow Jones Index, US Treasury Bond rates, and the exchange rate as the 4 independent variables. After running it in Excel, we get the following results:

From the output of the data, we are able to conclude the following information:

Given that the coefficient of the intercept is β0 = -32.31, of S&P 500 index is β1 = 0.042, of Dow

Jones Index is β2 = -0.0004, of Exchange rate is β3 = -2.089, and of the US Treasury rate is β4 =

9.1203, we achieve the linear regression formula:

y = -32.31 + 0.042X1 – 0.0004X2 – 2.089X3 + 9.1203X4

As we could see from the regression statistics, the R Square for this model is 0.835, showing

that about 83.5% of the sample variation in the average stock price of the Financial sector can

be explained by using the 4 market data to predict it.

Looking at Significance F under ANOVA, which is far smaller than α = 0.05, we conclude that this linear regression model is statistically significant overall and could be used for prediction. However, when we look at the P-value of each independent variable separately, we see that the Dow Jones Index and the exchange rate do not have a significant influence on the variation of the Financials sector stock price, as their P-values are higher than α = 0.05. The S&P 500 index and US Treasury rates, on the other hand, do have significant positive effects on the Financials sector stock price.

The plotted graphs below provide a more straightforward understanding of the correlation

between the Financials Sector and the 4 market data:

Figure 1: Confidence Interval

Figure 2: Prediction Interval

Figure 3: Linear Regression Line

Multicollinearity Check

Since one would expect a strong correlation between the Dow Jones Index and the S&P 500, we used Gretl to check for multicollinearity between these two variables. Figure 4 shows the correlation coefficients between the four variables: S&P 500, Dow Jones, Exchange Rate and US Treasury Bill rates. The correlation coefficient between S&P 500 and Dow Jones is 0.9856, which is higher than 0.8, so we conclude that there is extreme multicollinearity between the two. To make our model more precise, we need to eliminate one of these two variables. We used stepwise regression for this, and after running it in Gretl (Figure 5 and Figure 6), the p-value of the model is much lower than that of the previous model. In conclusion, we chose to eliminate Dow Jones rather than S&P 500 to optimize the model, although the P-values of the two alternatives do not differ significantly.

Figure 4

Figure 5

Figure 6
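The pairwise check behind Figure 4 can be sketched as follows: compute the correlation coefficient for every pair of explanatory variables and flag any pair whose absolute value exceeds 0.8, the threshold used above. The series below are invented for illustration, and the function names are our own; this is not the Gretl output.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear_pairs(series, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    names = list(series)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(series[a], series[b])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 4)))
    return flagged


# Hypothetical monthly observations, for illustration only.
series = {
    "sp500": [1200.0, 1300.0, 1250.0, 1400.0, 1500.0],
    "dow": [10500.0, 11200.0, 10900.0, 12100.0, 12800.0],
    "treasury": [4.5, 4.8, 3.1, 3.9, 2.2],
}

flagged = flag_collinear_pairs(series)
print(flagged)
```

Any flagged pair is a candidate for dropping one of its two variables, which is the decision made above for Dow Jones and S&P 500.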

Regression Model #4 (Multiple): S&P 500 Index – 3 Sectors


While the S&P 500 index contributes to the performance of each company and sector, the reverse is also true. Since the S&P 500 index is built from the data of 500 companies, we would now like to see whether it correlates with the consumer discretionary, financials, and information technology sectors. We built the regression model with the S&P 500 index as the dependent variable and the 3 sectors as the independent variables. Excel provides us with the following results:

The conclusions are as follows:

Given that R Square is 0.925, about 92.5% of the sample variation in the S&P 500 index can be explained by using the 3 sectors’ average stock prices to predict it. The Significance F, which is far smaller than α = 0.05, is evidence that the regression model as a whole holds. As for the influence of each sector individually, the P-values show that the Financials sector and the Information Technology sector have a significant relationship with the S&P 500 index, whereas the Consumer Discretionary sector does not.

The coefficient of each sector gives us an idea of the weight with which it contributes to the S&P 500 index. For the Financials sector, a one-dollar increase in the average stock price is associated with a 9.39-unit increase in the S&P 500 index, and for the Information Technology sector, a one-dollar increase in the average stock price is associated with a 6.71-unit increase, holding the other sectors constant. Therefore, the Financials sector has a higher weight on the S&P 500 index.

Conclusion
