Introduction to Statistical Analysis
Statistical Methods in Finance
Lecture 1
Ta-Wei Huang
September 8, 2015
Ta-Wei Huang Introduction to Statistical Analysis September 8, 2015 1 / 20
Table of Contents
We all know the importance of data analysis, but we seldom know the procedure of data analysis. In this class, I would like to introduce the basic concepts of statistical data analysis.
1 What is statistics?
2 Statistical Procedures
3 Next Lecture
What is statistics?
Definition on Wikipedia
Statistics is the study of the collection, analysis, interpretation,
presentation, and organization of data.
Descriptive statistics: summarize data from a sample using indexes
such as the mean or standard deviation
Inferential statistics: draw conclusions from data that are subject to
random variation (e.g., observational errors, sampling variation)
Actually, this is an old-fashioned statement! Modern statistical methods
cover more than descriptive and inferential analysis.
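As a minimal sketch of the two branches in the definition above, the following Python snippet first summarizes a sample (descriptive) and then relates the sample mean to its sampling variation (inferential). The return values are hypothetical, and the one-sample t-statistic is only one illustrative inferential tool, not something prescribed by the lecture.

```python
import math
import statistics

# Hypothetical monthly returns in percent (illustrative values only).
returns = [1.2, -0.8, 2.5, 0.4, 1.7]

# Descriptive statistics: summarize the sample itself.
sample_mean = statistics.mean(returns)
sample_sd = statistics.stdev(returns)  # sample standard deviation (n - 1)

# Inferential statistics: account for sampling variation.
standard_error = sample_sd / math.sqrt(len(returns))
t_statistic = sample_mean / standard_error  # for H0: true mean return = 0

print(f"mean = {sample_mean:.3f}, sd = {sample_sd:.3f}")
print(f"standard error = {standard_error:.3f}, t = {t_statistic:.3f}")
```

The mean and standard deviation describe only these five observations; the standard error and t-statistic are what allow a cautious statement about the population the sample came from.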
Statistical Procedures
Modern Statistical Procedures
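A modern statistical analysis goes through the following steps.
Problem formulation
Data collection
Data cleaning and exploratory data analysis (EDA)
Statistical task formulation
Statistical methods selection
Model evaluation
Model deployment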
Problem Formulation
Don’t ask "What can we learn from this data set?" The most important
question is: what problem are we facing now? Then decide what kinds
of data you need.
improve credit card coverage of our bank in Taiwan
decrease the non-performing loan ratio of our bank
develop a trading rule to earn higher profit
Domain knowledge plays the most important role in this step! Learn the
basic financial theory and understand how the system works!
Data Collection
After you’ve formulated your problem, you should collect the data. It’s
natural to ask two questions: what kinds of data do you need, and how do
you collect them?
What kinds of data do you need? You need to use domain knowledge to
answer this question. ⇒ Define the population of your problem.
How do you collect them? Most of the time, we get data from some database.
⇒ Produce representative data so that you can draw correct information to
solve your problem!
Data Cleaning and Exploratory Data Analysis 1
Data cleaning deals with detecting and removing errors and inconsistencies
from data in order to improve the quality of data. There are several types of
dirty data.
Missing values: some required values in the dataset are missing.
Inconsistent responses: usually seen in survey sampling.
Other errors: such as mistyping, undesired formats, etc.
Data cleaning is the most exhausting step of the whole statistical
analysis. We need to clean the data so that the final dataset is well structured.
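The three kinds of dirty data listed above can be flagged with a few simple checks. The following is a minimal sketch in Python; the records, field names, and the rule that defines an "inconsistent response" are all hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical loan-application records; every field name is an assumption.
records = [
    {"id": 1, "income": 52000, "employed": "yes", "months_at_job": 36, "opened": "2015-03-01"},
    {"id": 2, "income": None, "employed": "yes", "months_at_job": 12, "opened": "2015-04-15"},
    {"id": 3, "income": 41000, "employed": "no", "months_at_job": 24, "opened": "2015-05-20"},
    {"id": 4, "income": 38000, "employed": "yes", "months_at_job": 5, "opened": "20/06/2015"},
]

def find_issues(row):
    """Return a list of data-quality problems found in one record."""
    issues = []
    # Missing values: a required field is empty.
    if any(value is None for value in row.values()):
        issues.append("missing value")
    # Inconsistent responses: answers that contradict each other.
    if row["employed"] == "no" and (row["months_at_job"] or 0) > 0:
        issues.append("inconsistent response")
    # Other errors: here, a date that is not in YYYY-MM-DD format.
    try:
        datetime.strptime(row["opened"], "%Y-%m-%d")
    except (TypeError, ValueError):
        issues.append("bad date format")
    return issues

report = {row["id"]: find_issues(row) for row in records}
clean = [row for row in records if not report[row["id"]]]

for record_id, issues in report.items():
    print(record_id, issues or "ok")
```

Dropping flagged rows, as `clean` does here, is only the simplest option; whether to drop, correct, or impute a value depends on the problem and on domain knowledge.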
Data Cleaning and Exploratory Data Analysis 2
EDA is an approach to analyzing data sets to summarize their main
characteristics, often with visual methods. A statistical model can be used
or not, but primarily EDA is for seeing what the data can tell us beyond
the formal modeling or hypothesis testing task.
Graphical techniques: use suitable visualization to discover patterns
Cluster analysis: find individuals with similar features and group them
Dimension reduction: decrease the number of variables by rotating the axes
EDA Example 1
Question: Is the return on large stocks > small stocks? Stock dividend > cash
dividend? Is there any interaction effect?
Returns by capital size and dividend policy:
Capital    Cash Dividend        Stock Dividend
Large       4.24%   -5.23%       2.69%    8.12%
            3.94%    9.37%       6.71%   12.20%
Small       5.92%   12.10%      24.65%   -8.53%
           -9.03%    0.24%       1.69%   12.63%
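A first exploratory look at these questions is to compare the four cell means. The sketch below does this in Python, assuming that in each row of the table the first two values belong to the cash-dividend column and the last two to the stock-dividend column; that assignment is inferred from the layout and is not stated in the slide.

```python
from statistics import mean

# Returns in percent, keyed by (capital size, dividend policy).
# Assumption: first two values per row are "Cash", last two are "Stock".
returns = {
    ("Large", "Cash"): [4.24, -5.23, 3.94, 9.37],
    ("Large", "Stock"): [2.69, 8.12, 6.71, 12.20],
    ("Small", "Cash"): [5.92, 12.10, -9.03, 0.24],
    ("Small", "Stock"): [24.65, -8.53, 1.69, 12.63],
}

cell_means = {cell: mean(values) for cell, values in returns.items()}

# Effect of switching from cash to stock dividend within each size group.
policy_effect = {
    size: cell_means[(size, "Stock")] - cell_means[(size, "Cash")]
    for size in ("Large", "Small")
}

# Interaction contrast: zero would mean parallel lines on a profile chart.
interaction = policy_effect["Large"] - policy_effect["Small"]

for cell, value in cell_means.items():
    print(cell, f"{value:.2f}%")
print("policy effect by size:", {k: round(v, 2) for k, v in policy_effect.items()})
print("interaction contrast:", round(interaction, 2))
```

With only four observations per cell, these differences are purely descriptive; judging whether they are more than sampling noise would need a formal test such as a two-way ANOVA.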
EDA Example 2
What can we find from the following interaction plot (or profile chart)?
Statistical Task Formulation 1
Now that we have structured data, we can determine our task. From
the viewpoint of purpose, there are mainly three kinds of tasks. Note that
the task should connect tightly with your problem.
Explanatory Analysis: want to find hidden common structures
behind the population.
Prediction: want to predict some feature when a new individual
comes in. (Why important?)
Forecasting: want to forecast the future outcome of one or more given time
series variables.
Then, determine your outputs and inputs.
Statistical Task Formulation 2
Suppose that you have n stocks, with the following variables: the return on
stock i, $R_{i,t}$; the risk-free rate, $R_{f,t}$; and the market return, $R_{m,t}$.
Explanatory Analysis: Does the CAPM hold for this data set?
⇒ Need to design an "empirical form" of the CAPM.
⇒ performance measure: the explanatory power
Forecasting: Can we use the CAPM to predict a company’s future
return?
⇒ Need to design a forecasting model.
⇒ performance measure: the predictive accuracy
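One common empirical form of the CAPM (an assumption here, since the slide does not name one) is the excess-return regression $R_{i,t} - R_{f,t} = \alpha_i + \beta_i (R_{m,t} - R_{f,t}) + \varepsilon_{i,t}$, where the CAPM implies $\alpha_i = 0$. The Python sketch below fits it for a single stock by ordinary least squares and reports $R^2$ as a simple measure of explanatory power. All return series are hypothetical.

```python
from statistics import mean

def ols_simple(x, y):
    """Fit y = alpha + beta * x by ordinary least squares; return (alpha, beta, r2)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    ss_res = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return alpha, beta, 1 - ss_res / ss_tot

# Hypothetical monthly returns in percent (illustrative values only).
r_stock = [2.1, -1.0, 3.4, 0.6, -2.2, 1.8]
r_market = [1.5, -0.8, 2.6, 0.4, -1.9, 1.2]
r_free = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

# Work with excess returns over the risk-free rate.
excess_stock = [ri - rf for ri, rf in zip(r_stock, r_free)]
excess_market = [rm - rf for rm, rf in zip(r_market, r_free)]

alpha, beta, r2 = ols_simple(excess_market, excess_stock)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, R^2 = {r2:.3f}")
```

For the explanatory task, interest lies in whether the estimated alpha is close to zero and how much variation the market factor explains. For the forecasting task, the same fit would instead be judged on out-of-sample errors.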
Statistical Methods Selection
Determine which models to apply using the following criteria.
Apply methods appropriate for the statistical task.
Apply methods appropriate for the outputs/inputs and data types.
Apply methods that your computer can handle. (Important!)
In finance, linear models and time series analysis are the most popular
methods, but others are also useful, such as multivariate analysis (or data
mining) and statistical learning.
Data Type
There are three types of data you will use when dealing with a problem in
financial econometrics.
Cross-sectional Data: data on one or more variables collected at a
single point in time.
Time series Data: data that have been collected over a period of time
on one or more variables.
Panel Data: data having the dimensions of both time series and
cross-sections (very commonly seen).
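The relationship between the three data types can be sketched in a few lines of Python: a panel stored in "long" format contains a cross-section for every date and a time series for every individual. The stock labels, months, and returns below are hypothetical.

```python
# Hypothetical panel in long format: one row per (stock, month) pair.
panel = [
    {"stock": "A", "month": "2015-01", "ret": 1.2},
    {"stock": "A", "month": "2015-02", "ret": -0.5},
    {"stock": "B", "month": "2015-01", "ret": 0.7},
    {"stock": "B", "month": "2015-02", "ret": 2.1},
]

def cross_section(rows, month):
    """All stocks observed at a single point in time."""
    return {r["stock"]: r["ret"] for r in rows if r["month"] == month}

def time_series(rows, stock):
    """One stock observed over successive periods, in time order."""
    selected = [r for r in rows if r["stock"] == stock]
    return [(r["month"], r["ret"]) for r in sorted(selected, key=lambda r: r["month"])]

print(cross_section(panel, "2015-01"))
print(time_series(panel, "A"))
```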
Cross-sectional Data
Cross-sectional data are collected by observing many subjects (such as
individuals, firms, countries, or regions) at the same point in time, or
without regard to differences in time.
Company ID Delisted EPS ROE Profit Margin
3651 1 0.39 3.13 0.68
5296 1 0.13 0.28 -6.74
4975 1 -2.82 -19.67 -76.53
3613 1 4.05 20.88 3.92
Time Series Data
A time series is a sequence of data points, typically consisting of successive
measurements made over a time interval.
Panel Data
Panel data refers to multi-dimensional data, frequently involving
measurements over time. Panel data contain observations of multiple
variables obtained over multiple time periods for the same individuals.
Example
A simple market model is of the form
$$R_{i,t} = \alpha_{i,t} + \beta_{i,t} R_{m,t} + \varepsilon_{i,t},$$
where $R_{i,t}$ is the return on stock i at time t and $R_{m,t}$ is the market return
on stocks at time t.
Model Evaluation
After getting the result, an evaluation step is necessary, and the performance
measures vary with the purpose.
Explanatory analysis: goodness-of-fit and model interpretation.
Prediction: RMSE(prediction), ROC curve and AUC, cost analysis.
Forecasting: RMSE, MAE, MAPE, MASE, and so on.
Spirit
max Profit(Θ) or min Cost(Θ) subject to model risk, where Θ is the
result from a model.
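The forecasting measures named above have short standard definitions. The following Python sketch implements them under stated assumptions: MAPE is reported in percent and is undefined when an actual value is zero, and MASE uses the non-seasonal form, scaling by the in-sample MAE of the naive "last value" forecast. The test data are hypothetical.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mase(actual, forecast, train):
    """Mean absolute scaled error, scaled by the naive forecast's in-sample MAE."""
    naive_errors = [abs(train[t] - train[t - 1]) for t in range(1, len(train))]
    return mae(actual, forecast) / (sum(naive_errors) / len(naive_errors))

# Hypothetical training history, realized values, and forecasts.
train = [80, 90, 100]
actual = [100, 110, 120]
forecast = [90, 110, 130]

print(f"RMSE = {rmse(actual, forecast):.3f}")
print(f"MAE  = {mae(actual, forecast):.3f}")
print(f"MAPE = {mape(actual, forecast):.3f}%")
print(f"MASE = {mase(actual, forecast, train):.3f}")
```

A MASE below 1 indicates that the forecast errors are smaller on average than those of the naive forecast on the training data, which makes it comparable across series with different scales.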
Final Step: Model Deployment
Congratulations! From here we can deploy our statistical model! But wait,
we still need to ask some questions.
What is the risk and loss if your model is totally wrong?
Is your dataset reproducible? How often should you update your
model?
If there is a structural change in your population, what should you
do?
Model risk is very important! You should always be aware of the limitations
of your model so that when your model fails, the loss is controllable.
Next Lecture
The Next Lecture
In the next lecture, we will review probability theory at an advanced level!