Group Project Report GBA 462 – Core Statistics for MS Students
Team Members: Yifan Zhang Fan Yang Lin Cong Nikolaos Polyzopoulos Shubham Sharma
Executive Summary
Our main goal is to analyze the stock prices of companies in different sectors to better
understand their performance over the past 10 years (2005 - 2015). We performed basic
data analysis on each company, comparing the mean and standard deviation of its stock
price and its average monthly return to get a brief idea of its overall performance. To
learn how the companies performed relative to the others in their sector, as well as to the
entire market, we ran hypothesis tests on a few companies and applied regression models
to each sector.
Data Gathering
We collected stock data from Yahoo Finance (finance.yahoo.com). Out of 10 sectors, we picked
the 3 that interested us most (consumer discretionary, financials and information technology)
and 5 companies in each sector, so we conducted statistical analysis on the stock prices of
15 companies in total. We also collected the Dow Jones Index and the S&P 500 index from
Yahoo Finance, and the USD/EUR exchange rate and US Treasury long-term rates from
Quandl (www.quandl.com). All of the series use the same time window and frequency:
monthly data from September 2005 to September 2015 (ten years in total).
Data Analysis and Evaluation
Basic Data Analysis
We first calculated the monthly percentage return for each company by dividing the change in
stock price from the previous month by the stock price of the previous month.
Then we averaged each company’s monthly price and monthly return over the last 10 years
(2005 – 2015), and calculated their standard deviations to get a general idea of which
companies performed similarly and which performed noticeably better or worse. A
summary of the mean and standard deviation of the stock price and the average monthly
return is shown in the following chart:
* Red highlights show the highest mean of stock price/average monthly return in each sector.
* Green highlights show the lowest mean of stock price/average monthly return in each sector.
In the rest of the project we give a more detailed analysis using hypothesis tests and
regression models.
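The monthly return calculation described in this section can be sketched in a few lines of Python. This is only an illustration: the price list is hypothetical (it is not the data used in the report), and the function name monthly_returns is our own choice.

```python
from statistics import mean, stdev

def monthly_returns(prices):
    """Return (p_t - p_{t-1}) / p_{t-1} for each pair of consecutive monthly prices."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]

# Hypothetical month-end closing prices, for illustration only.
prices = [100.0, 110.0, 99.0, 108.9]

returns = monthly_returns(prices)
avg_return = mean(returns)   # average monthly return
return_sd = stdev(returns)   # sample standard deviation of the monthly returns

print([round(r, 4) for r in returns], round(avg_return, 4), round(return_sd, 4))
```

Note that n monthly prices yield n - 1 monthly returns, since the first month has no previous period to compare with.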
Hypothesis Tests
A statistical hypothesis test is a method of statistical inference that uses sample data to
decide whether there is enough evidence to reject a stated hypothesis. We built two hypothesis
tests to compare which company performed better during the past 10 years, based on the average
monthly return of its stock. The two comparisons we selected are Google Inc. A VS. Microsoft Corp,
and JP Morgan Chase & Co VS. Morgan Stanley.
Hypothesis Test #1: Google Inc. A VS. Microsoft Corp
In this hypothesis test, we test whether Microsoft Corp performed worse than Google
Inc. during the last 10 years by comparing their mean stock returns at the 95% confidence level.
Here are the data we collected:
                          Google Inc    Microsoft Corp
Mean stock return (x̅)    1.52%         0.85%
Standard deviation (s)    0.087924      0.070337
The null hypothesis is H0: μMSFT − μGOOGL ≥ 0 (Microsoft Corp’s mean stock return is greater
than or equal to Google Inc’s), and the alternative hypothesis is Ha: μMSFT − μGOOGL < 0
(Microsoft Corp’s mean stock return is lower than Google Inc’s).
We test at α = 0.05, and each sample has 120 observations. Since each sample has more than
30 observations, we can use a Z test. The test statistic is:
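$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{\bar{x}_1 - \bar{x}_2}}, \quad \text{where } \sigma_{\bar{x}_1 - \bar{x}_2} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \quad D_0 = 0 $$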
We compute z = -0.65587 and compare it with the critical value of -1.64485, which can be
found with the NORM.S.INV(0.05) function in Excel. Since z does not lie in the rejection
region (z < -1.64485), we fail to reject the null hypothesis H0.
We therefore conclude that there is not sufficient evidence that Google’s mean stock return was
statistically significantly higher than Microsoft’s during the past 10 years.
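As a cross-check, the test statistic and critical value for Test #1 can be reproduced from the summary statistics in the table above. The sketch below uses only Python's standard library; the function name two_sample_z is our own, sample 1 is taken to be Microsoft and sample 2 Google, and NormalDist().inv_cdf plays the role of Excel's NORM.S.INV. Because it starts from the rounded means (0.85% and 1.52%), it gives z of about -0.65 rather than exactly the -0.65587 reported above.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """Large-sample z statistic for H0: mu1 - mu2 = d0, with unpooled standard error."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - d0) / standard_error

# Sample 1 = Microsoft, sample 2 = Google (rounded summary statistics from the table).
z = two_sample_z(0.0085, 0.070337, 120, 0.0152, 0.087924, 120)

# Lower-tailed test at alpha = 0.05: reject H0 only if z falls below the critical value.
z_crit = NormalDist().inv_cdf(0.05)
reject_h0 = z < z_crit

print(round(z, 3), round(z_crit, 5), reject_h0)
```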
Hypothesis Test #2: JP Morgan Chase & Co VS. Morgan Stanley
In this hypothesis test, we test whether JP Morgan Chase & Co
performed better than Morgan Stanley during the last 10 years by comparing their mean stock
returns at the 95% confidence level. Here are the data we collected:
                          JP Morgan     Morgan Stanley
Mean stock return (x̅)    1.08%         0.47%
Standard deviation (s)    0.086334      0.1067
The null hypothesis is H0: μJPM − μMS ≤ 0 (JP Morgan’s mean stock return is lower
than or equal to Morgan Stanley’s), and the alternative hypothesis is Ha: μJPM − μMS > 0 (JP
Morgan’s mean stock return is higher than Morgan Stanley’s).
We test at α = 0.05, and each sample has 120 observations. Since each sample has more than
30 observations, we can use a Z test. The test statistic is:
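$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{\bar{x}_1 - \bar{x}_2}}, \quad \text{where } \sigma_{\bar{x}_1 - \bar{x}_2} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \quad D_0 = 0 $$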
We compute z = 0.4843 and compare it with the critical value of 1.64485, which can be found
with the NORM.S.INV(0.95) function in Excel. Since z does not lie in the rejection region
(z > 1.64485; see Figure 2), we fail to reject the null hypothesis H0.
We therefore conclude that there is not sufficient evidence that JP Morgan’s mean stock return was
statistically significantly higher than Morgan Stanley’s during the past 10 years.
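The same decision can also be reached through a p-value instead of a critical value. The sketch below is an illustration of that alternative, not the procedure used in the report: it recomputes z for Test #2 from the rounded summary statistics (so it lands near, not exactly on, the reported 0.4843) and takes the upper-tail probability under the standard normal distribution. Variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

# Rounded summary statistics from the table (sample 1 = JP Morgan, sample 2 = Morgan Stanley).
mean_jpm, sd_jpm = 0.0108, 0.086334
mean_ms, sd_ms = 0.0047, 0.1067
n = 120  # monthly observations per company

standard_error = sqrt(sd_jpm ** 2 / n + sd_ms ** 2 / n)
z = (mean_jpm - mean_ms) / standard_error

# Upper-tailed test: the p-value is the probability of a z at least this large under H0.
p_value = 1.0 - NormalDist().cdf(z)

print(round(z, 3), round(p_value, 3))
```

A p-value above α = 0.05 leads to the same conclusion as the critical-value comparison: H0 is not rejected.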
Regression Model Analysis
Regression models describe the relationships between variables and are mainly used to predict
and estimate unknown values. We conducted 4 regression analyses to understand how
different variables affect each other in the market: Amazon and the S&P 500 index,
Amazon and the 4 market data series, the Financials sector as a whole and the 4 market data
series, and the S&P 500 index and the 3 sectors.
Regression Model #1 (Single): Amazon – S&P 500 index
We would like to see how changes in the S&P 500 index relate to changes in the stock
price of one company. Therefore, we built a simple linear regression model with the
stock price of Amazon as the dependent variable and the S&P 500 index as the independent
variable. After running it in Excel, we get the following results:
From the output of the regression, we are able to conclude the following information:
The general form of the simple linear regression model is y=β0+β1x. Given that β0=1049.77 and
β1=2.06, we obtain the fitted regression equation: y=1049.77+2.06x.
Multiple R measures the strength of the linear relationship between y and x.
We find multiple R=0.78, which is fairly close to 1. Together with the positive slope β1, this
shows that the stock price of Amazon (y) and the S&P 500 index (x) are positively correlated.
From the regression statistics, R²=0.61, which means that about 61% of the sample variation in
the stock price of Amazon (y) can be explained by the S&P 500 index (x).
Looking at Significance F under ANOVA, Significance F = 2.84 × 10^-26 < α = 0.05, so we
conclude that this linear regression model is statistically significant and that the S&P 500
index (x) has a significant linear relationship with the stock price of Amazon (y).
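The quantities Excel reports for a single-variable regression (intercept, slope, correlation and R²) can be computed directly from their least-squares definitions. The sketch below is a minimal illustration in plain Python; the function name simple_ols and the five data points are hypothetical and are not the Amazon or S&P 500 series used in the report.

```python
from math import sqrt

def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # intercept
    r = sxy / sqrt(sxx * syy)    # Pearson correlation; Excel's "Multiple R" is |r| here
    return b0, b1, r, r * r

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, r, r_squared = simple_ols(x, y)
print(round(b0, 4), round(b1, 4), round(r, 4), round(r_squared, 4))
```

In a regression with one independent variable, R² is simply the square of the correlation coefficient, which is why multiple R = 0.78 and R² = 0.61 in the output above are consistent with each other.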
Regression Model #2 (Multiple): Amazon – 4 Market Data
(S&P 500 index, Dow Jones Index, the Exchange rates and US Treasury long term rates)
We would like to find out how each of the S&P 500 index, the Dow Jones Index, the USD/EUR
exchange rate and US Treasury long-term rates contributes to changes in the stock price of
Amazon. Therefore, we built a multiple regression model with the stock price of Amazon as the
dependent variable and 4 independent variables: the S&P 500 index, the Dow Jones Index, the
USD/EUR exchange rate and US Treasury long-term rates. After running it in Excel, we get the
following results:
From the output of the regression, we are able to conclude the following information:
The general form of multiple regression model is y=β0+ β1 X1+ β2 X2+…+ βkXk (k independent
variables). From the output, we find β0=164.69, β1=0.17, β2=0.0049, β3=-16.61, β4=-72.63. This
gives us the following linear regression formula:
Y=164.69+0.17X1 + 0.0049X2 - 16.61X3 - 72.62X4
From the regression statistics, R²=0.84, showing that about 84% of the sample variation in
the stock price of Amazon (y) can be explained by the 4 market data series.
Significance F under ANOVA is much lower than our chosen α = 0.1, so we conclude that the
regression model as a whole is statistically significant. However, when we look at
the P-value of each independent variable separately, we see that the Dow Jones
Index and the exchange rate do not have a significant influence on the variation of Amazon's
stock price, as their P-values are higher than α=0.1. The S&P 500 index and the US Treasury
rate, on the other hand, do have a significant influence on Amazon's stock price: the S&P 500
coefficient is positive, while the US Treasury rate coefficient is negative.
Regression Model #3 (Multiple): Financials Sector – 4 Market Data
(S&P 500 index, Dow Jones Index, the Exchange rates and US Treasury long term rates)
We would like to see how each of the S&P 500 index, the Dow Jones Index, US Treasury
long-term rates and the USD/EUR exchange rate contributes to changes in the average stock price
of the Financials sector. Therefore, we built a multiple regression model with the average
stock price of the Financials sector as the dependent variable and the S&P 500 index, the Dow
Jones Index, the exchange rate and US Treasury rates as the 4 independent variables. After
running it in Excel, we get the following results:
From the output of the regression, we are able to conclude the following information:
Given that the intercept is β0 = -32.31, the coefficient of the S&P 500 index is β1 = 0.042, of
the Dow Jones Index is β2 = -0.0004, of the exchange rate is β3 = -2.089, and of the US Treasury
rate is β4 = 9.1203, we obtain the fitted regression equation:
y = -32.31 + 0.042X1 – 0.0004X2 – 2.089X3 + 9.1203X4
From the regression statistics, the R Square for this model is 0.835, showing
that about 83.5% of the sample variation in the average stock price of the Financials sector can
be explained by the 4 market data series.
Significance F under ANOVA is far smaller than α=0.05, so we conclude that this regression
model as a whole is statistically significant and could be used for
prediction. However, when we look at the P-value of each independent
variable separately, we see that the Dow Jones Index and the exchange rate
do not have a significant influence on the variation of the Financials sector stock price, as
their P-values are higher than α=0.05. The S&P 500 index and the US Treasury rate, on the other
hand, do have a significant positive influence on the Financials sector stock price.
The plotted graphs below provide a more straightforward understanding of the correlation
between the Financials Sector and the 4 market data:
Figure 1: Confidence Interval
Figure 2: Prediction Interval
Figure 3: Linear Regression Line
Multicollinearity Check
Since the Dow Jones and the S&P 500 can be expected to be strongly correlated, we used
Gretl to check for multicollinearity between these two variables. Figure 4 shows the
correlation coefficients between the four variables: the S&P 500, the Dow Jones, the
exchange rate and US Treasury rates. The correlation coefficient
between the S&P 500 and the Dow Jones is 0.9856, which is higher than our threshold of 0.8. We
therefore conclude that there is severe multicollinearity between the S&P 500 and the Dow
Jones. To make our model more precise, we need to eliminate one of these two variables. We
used stepwise regression in Gretl (Figure 5 and Figure 6), and the p-value of the resulting
model is much lower than that of the previous model. In conclusion, we chose to eliminate
the Dow Jones rather than the S&P 500 to optimize the model. However, the P-values of these
two do not differ significantly.
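The pairwise correlation screen described above (the report used Gretl) can be sketched in plain Python as follows. This is an illustration under stated assumptions: the three short series are hypothetical stand-ins rather than the report's data, the names pearson and flagged are our own, and the 0.8 cut-off is the rule of thumb quoted in the text.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical monthly values, for illustration only.
series = {
    "sp500": [1200.0, 1250.0, 1300.0, 1280.0, 1350.0, 1400.0],
    "dow_jones": [10500.0, 10900.0, 11300.0, 11150.0, 11700.0, 12100.0],
    "usd_eur": [0.82, 0.80, 0.79, 0.81, 0.78, 0.80],
}

THRESHOLD = 0.8  # pairs with |r| above this are flagged as potentially collinear

flagged = []
for a, b in combinations(series, 2):
    r = pearson(series[a], series[b])
    if abs(r) > THRESHOLD:
        flagged.append((a, b, r))

for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.4f}")
```

When a pair is flagged, one of the two variables is a candidate for removal, which is the decision made between the Dow Jones and the S&P 500 above.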
Figure 4
Figure 5
Figure 6
Regression Model #4 (Multiple): S&P 500 Index – 3 Sectors
(Consumer Discretionary, Financials and Information Technology sectors)
While the S&P 500 index influences the performance of each company and sector, the reverse is
also true. Since the S&P 500 index is built from the data of 500 companies, we
would now like to see whether the S&P 500 index correlates with the consumer discretionary,
financials, and information technology sectors. We built the regression model with the S&P 500
index as the dependent variable and the 3 sectors as the independent variables. Excel provides
us with the following results:
The conclusions are as follows:
Given that the R Square is 0.925, about 92.5% of the sample variation in the S&P 500
index can be explained by the 3 sectors’ average stock prices.
Significance F is far smaller than α = 0.05, which is evidence that the regression model
as a whole is statistically significant. As for the influence of each sector individually,
their P-values indicate that both the Financials sector and the Information Technology sector
have a statistically significant relationship with the S&P 500 index, whereas the Consumer
Discretionary sector does not.
The coefficient of each sector gives us an idea of how much weight it carries in
the S&P 500 index. For the Financials sector, a one-dollar increase in the average stock
price is associated with a 9.39-unit increase in the S&P 500 index, and for the Information
Technology sector, a one-dollar increase in the average stock price is associated with a
6.71-unit increase in the S&P 500 index, in each case holding the other sectors constant.
Therefore, the Financials sector has a higher weight in the S&P 500 index.
Conclusion