Data science in Financial Markets:
How Do Data Science Techniques Reshape Financial Trading?
A study submitted in partial fulfilment
of the requirements for the degree of
MSc Data Science
at
THE UNIVERSITY OF SHEFFIELD
by
Duan Zhao
September 2016
Abstract
Background
Data Science and data science techniques play a key role in modern financial markets and financial
trading by providing new tools such as backtesting and automated trading systems, and powerful
applications such as high frequency trading. These innovations have made the markets significantly
different from the past in fundamental ways.
Aims
The first aim of this research is to investigate new techniques created by Data Science for financial
trading, and to try to understand the trading strategies, computer programs and implications buried in
the secrecy of this black box from a micro perspective. The second aim is to investigate the potential
social implications of the massive use of data science techniques in financial trading, and to
understand how modern financial trading reshapes our economy and society from a macro
perspective.
Methods
This research will be implemented with both qualitative and quantitative methods. The first part of
the research involves building several computer programs that
reproduce backtesting, an automated trading system and a few trading strategies. The second part of
the research tries to develop an innovative method to integrate the micro and macro analysis.
Drawing on these technical details together with insights from the literature, the potential social
impacts of these tools will be analysed and investigated.
Results
Three trading strategies, the Moving Average strategy, the Improved Moving Average strategy and the MACD
strategy, are reproduced and backtested with the historical data of 15 listed companies. The MACD
and Improved Moving Average strategies perform much better than the Moving Average strategy,
making profits on all 15 stocks. The average compound return rate for the MACD strategy is 20% per
year over the previous five years. A much higher return rate can be achieved by applying the same MACD
trading strategy to high frequency data. The average daily return rate of the MACD strategy on index
futures is around 30%.
Conclusions
Data science techniques dramatically enhance the profit rate for financial speculators. Excessive
financial speculation may have negative social impacts, because it subtracts value from society,
costing a great deal of both physical and human capital without providing any socially beneficial
service. Excessive speculation may also compel companies to sacrifice long-term value in order to
meet short-term goals set by financial markets. From a macro point of view, it is the Neoliberal
policies deployed since the 1980s that gave rise to various financial innovations. Many of these
innovations pursue self-interest, contribute nothing to the good of society and may transfer
a great deal of wealth from the real economy to the financial sector, resulting in worsening income
inequality.
Acknowledgements
I would like to express my gratitude to my supervisor, Dr. Jo Bates, who has supported me
throughout my dissertation with her patience, insightful comments and encouragement. It would have been
impossible for me to complete this dissertation without her guidance.
I would also like to thank the Information School for creating a very interesting data science programme. I
really enjoyed this year at Sheffield.
I would also like to thank my family whose support and encouragement were more than I can
express on paper.
Contents
1. Introduction
2. Research aims and objectives
3. Literature Review
3.1 Backtesting
3.2 Automated trading system
3.3 High frequency trading
3.3.1 HFT strategies
3.3.2 The winner-take-all nature of HFT
3.4 The impacts of utilising data science techniques on financial markets
4. Methodology
4.1 Data
4.2 Coding
4.3 Visualization
4.4 From micro technical details to macro social impacts
4.5 Ethical aspects
5. Demonstration and case study
5.1 Preparation of historical stock data
5.1.1 Download data
5.1.2 Visualization
5.1.3 Stock split
5.2 Moving Average strategy
5.2.1 Calculate Moving Average
5.2.2 Backtesting strategy
5.2.3 Compute profits and visualization
5.2.4 Evaluation and discussion
5.3 Improved Moving Average Strategy
5.3.1 Avoid trades in range bound
5.3.2 Choose the best value for N
5.3.3 Comparison between MA strategy and Improved MA strategy
5.4 Using technical indicators
5.5 Applying same strategy to other stocks
5.6 Applying MACD strategy to high frequency trading
6. Potential Social impacts of applying Data Science Technologies in finance trading
6.1 Promote financial speculation
6.1.1 Investing vs. speculation
6.1.2 Data Science techniques: super weapons for speculation
6.2 Potential social impacts of excessive speculation in finance trading
6.2.1 Financial speculation subtracts value from society
6.2.2 Waste of capital, both physical and human
6.2.3 Negative impacts on corporation governance
6.3 Financial industry, Neoliberalism and income inequality
6.3.1 The culture of speculation in finance sector
6.3.2 The Neoliberal reform in 1980s
6.3.3 Financial innovation and income inequality
7. Conclusion
References
Appendices
List of tables and figures
Table 1 Example of historical market data
Table 2 Backtesting of 15 stocks
Figure 1 E-Mini S&P 500 futures prices during the flash crash
Figure 2 Shanghai composite index during China’s wild swings
Figure 3 the interface of R studio
Figure 4 Backtesting result of the share price of Yahoo
Figure 5 Backtesting result of the share price of Facebook
Figure 6 Codes for downloading data
Figure 7 Historical data of Apple
Figure 8 Candlestick chart
Figure 9 Codes for plotting candlestick chart
Figure 10 Visualization of Apple historical share price
Figure 11 Codes for dealing with stock split
Figure 12 Visualization of Apple historical share price recalculated
Figure 13 Codes for calculating MA 5 & 20 lines
Figure 14 Apple historical share price with MA 5 and MA 20
Figure 15 Codes for MA strategy
Figure 16 Codes for calculating profits
Figure 17 Backtesting result for MA strategy on Apple
Figure 18 Codes for calculating profit rate
Figure 19 Backtesting result for MA strategy on Apple from Jan. 2011 to Apr. 2012
Figure 20 Codes of Improved MA strategy
Figure 21 Backtesting results for Improved MA strategy on Apple with different values of N
Figure 22 Codes of Improved MA strategy
Figure 23 Backtesting result for Improved MA strategy on Apple
Figure 24 Comparison between MA strategy and Improved MA strategy
Figure 25 MACD indicator
Figure 26 Codes for calculating MACD
Figure 27 Codes for MACD strategy
Figure 28 Backtesting results for MACD strategy
Figure 29 Backtesting results for MA strategy on Citi group
Figure 30 Backtesting results for MA strategy on Nestle
Figure 31 Backtesting results for Improved MA strategy on Citi group
Figure 32 Backtesting results for Improved MA strategy on Nestle
Figure 33 Backtesting results for MACD strategy on Volkswagen Group
Figure 34 Backtesting results for MACD strategy on Toyota Motor
Figure 35 Daily return rate of Index points
Figure 36 Backtesting for MACD strategy on index future on 2015-05-06
Figure 37 Backtesting for MACD strategy on index future on 2015-07-02
Figure 38 Financial Business and Nonfinancial Business Augmented Rates of Profit in U.S.
Figure 39 the top decile income share from 1917-2014
Figure 40 Real Median Household Income in the United States
1. Introduction
Data Science focuses on extracting information and knowledge from data and producing data-based products.
As Provost and Fawcett (2013) stated, “Data science is a set of fundamental principles that support and guide
the principled extraction of information and knowledge from data.” These principles are applied broadly and
intertwined closely with data mining, data analysis, data-driven decision making and big data. Data science
has proved its value in customer behaviour analysis, online recommendations and advertising, credit scoring,
supply-chain management, fraud detection, and financial trading (Schutt & O'Neil, 2013).
The main function of the financial system is to allocate resources efficiently by transferring capital from savers to
borrowers (Boot & Thakor, 1997). This function is achieved through two means, bank lending and direct finance.
Direct finance refers to the situation where borrowers obtain funds by selling securities or issuing bonds
through financial markets without using a third party (Mishkin, 2007). Financial markets consist of stock
markets, bond markets, commodity markets, money markets, derivatives markets, futures markets and foreign
exchange markets (Pilbeam, 2010). Despite the benefits of financial markets, an over-developed financial market
may become a recipe for economic crises. As we have witnessed, the U.S. sub-prime mortgage crisis
led the whole world into recession and harmed society deeply (Reinhart & Rogoff, 2008).
Data Science and data science techniques play a key role in modern financial markets and financial trading.
As Kitchin (2014) pointed out, “With the rise of Data Science, a revolution is underway, reshaping how
knowledge produced and business conducted from an infrastructural level”. In financial markets, data
science has changed the way in which information and knowledge are gathered, processed, stored and
reused in financial trading by providing new tools such as backtesting and automated trading systems, and
various applications such as high frequency trading. These applications have made the markets significantly
different from the past in fundamental ways, as O’Hara stated, “from the way traders trade, to the way markets
are structured, to the way liquidity and price discovery arise” (O’Hara, 2015). At the same time, these
techniques have also turned financial trading into a black box. As Pasquale (2015) stressed, the inputs and
outputs of financial trading systems can be observed, but “we cannot tell how one becomes the other”.
To understand this evolution of the financial markets and trading, the following questions are raised and will
be investigated in this research: How do Data Science technologies reshape Financial Markets? What are the
data products produced by Data Science in financial trading? How do they work? What are the impacts of
these tools on the liquidity, stability and equality of financial markets? What are the potential social impacts
of these techniques?
2. Research aims and objectives
The first aim of this research is to investigate new techniques created by Data Science for financial trading,
and to try to understand the trading strategies, computer programs and implications buried in the secrecy of this
black box from a micro perspective.
The second aim is to investigate the social implications of the massive use of data science techniques in
financial trading, and to understand how modern financial trading reshapes our economy and society from
a macro perspective.
Objective 1:
New tools created by data science techniques, such as backtesting and automated trading systems, will be
described and demonstrated with computer programs that I coded myself.
Objective 2:
The applications based on these new tools, such as high frequency trading (HFT), will be analysed with case
studies and visualization methods.
Objective 3:
Financial impacts of these data science applications in financial trading will be analysed through textual
analysis.
Objective 4:
To link the micro and macro, theories that explain the social influences of modern financial trading will be
discussed.
This dissertation is organised as follows. Section 3, the literature review, describes two basic tools produced by
data science techniques for financial trading, backtesting and automated trading systems, which gave rise to
high frequency trading (HFT). Section 4 explains the data and methods used in this research. To understand
the technical details of these new tools, I programmed several trading strategies and backtested them
with real historical market data. Both test results and data are visualised so that they can be better understood. Section 5
demonstrates the whole research process with the code of the computer programs in order to provide a rough
picture of how these new tools may create extraordinary profits in real financial trading. Building on
this, section 6 discusses the potential social impacts of these new tools and argues that many of these new
trading tools focus on financial speculation, which may not only waste social capital but also lead to a high
degree of income inequality.
3. Literature Review
In this section, backtesting and automated trading systems are first introduced. With the help of these tools,
traders can build various high frequency trading strategies, such as market making, arbitrage and directional
trading. Most of these strategies have a winner-take-all nature, and it costs traders a great deal to maintain their
advantage in the competition.
3.1 Backtesting
Backtesting is the practice of using historical financial data to test trading strategies or analytical models to
see how the strategy or model would actually have performed (Masteika et al., 2014). The assumption of backtesting
is that if a strategy worked in the past, it has a good but not certain chance of making profits
in future, and conversely, if the strategy failed previously, it is unlikely to perform well in
future (Lopez & Saidenberg, 2000). With the help of computer programs and historical trading data,
backtesting enables investors and analysts to evaluate and optimize their trading strategies and models before
taking them into real trading. The results of backtesting are evaluated with various statistical indicators. For
example, the Sharpe ratio measures the performance of a trading strategy after adjusting for its risk,
dividing the excess return one may receive by the extra volatility that one endures in performing that strategy
(Maier-Paape & Platen, 2014). Unlike traditional investment approaches, backtesting makes it possible for
investors to test and improve strategies before facing any risk in real trading (Cao et al., 2004). Due to these
benefits, backtesting is widely employed by investors, hedge funds and investment banks (Ni & Zhang,
2005).
Previous studies also illustrate that one of the major problems of backtesting is overfitting. Overfitting is a
machine learning concept, describing a situation in which a model fits a particular set of observations but is unable to
describe the general structure. For example, a trading strategy can be designed around a group of parameters
that fits one set of historical data perfectly, but fails to make profits when the same strategy is applied to another
period of historical data or to real trading (Bailey et al., 2014). The risk of overfitting cannot be eliminated,
but it can be diminished by reducing the number of parameters or by testing the same strategy on various types of
historical data (Maier-Paape & Platen, 2014).
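One common way to check for overfitting is to tune a parameter on one period and evaluate it on another. The Python sketch below (standard library only, not the dissertation's R code) assumes a simple hypothetical rule, holding one share whenever the price is above its moving average, and a simulated random walk as price data. Because a random walk has no exploitable pattern, any in-sample profit of the best window comes from the parameter search itself.

```python
import random

def ma_strategy_profit(prices, window):
    """Profit of a long-only rule: hold one share while price > moving average."""
    profit = 0.0
    for t in range(window, len(prices) - 1):
        average = sum(prices[t - window:t]) / window  # mean of the previous `window` prices
        if prices[t] > average:                       # signal known at time t
            profit += prices[t + 1] - prices[t]       # gain or loss over the next step
    return profit

def split_sample_test(prices, windows, split=0.5):
    """Pick the best window in-sample, then score it out-of-sample."""
    cut = int(len(prices) * split)
    in_sample, out_of_sample = prices[:cut], prices[cut:]
    best = max(windows, key=lambda w: ma_strategy_profit(in_sample, w))
    return (best,
            ma_strategy_profit(in_sample, best),
            ma_strategy_profit(out_of_sample, best))

# Simulated random-walk prices (assumption: no real market data is used here)
rng = random.Random(42)
prices = [100.0]
for _ in range(999):
    prices.append(prices[-1] + rng.gauss(0, 1))

best, in_profit, out_profit = split_sample_test(prices, range(2, 60))
print(best, round(in_profit, 2), round(out_profit, 2))
```

Comparing the two profit figures gives a rough sense of how much of the backtested performance survives on data the parameter search never saw.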
3.2 Automated trading system
An automated trading system is a computer program developed to automatically create orders and submit them to
the market exchange (Austin et al., 2004). An automated trading system has multiple functions: passing
instant market data to trading strategies, submitting orders to the market exchange, managing risk and
dynamically optimizing itself to adapt to changing market situations (Dempster & Leemans, 2006).
An automated trading system can execute repetitive tasks correctly at a high speed, which is impossible for
human beings. The key part of an automated trading system is the buy-sell rule, which is usually based
on fundamental analysis or technical analysis. While fundamental analysis evaluates the true value of an asset
with financial factors, such as the balance sheet and supply and demand, technical analysis focuses on the pattern of
historical price movements of a security in order to forecast the future (Creamer & Freund, 2010). One major
branch of technical analysis is indicator analysis. Indicators such as Moving Average Convergence
Divergence (MACD) and the Relative Strength Index (RSI) can be calculated from past trading data, and reflect
characteristics of market movement (Liu & Xiao, 2009).
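As an illustration of how such an indicator is derived from past prices, the following Python sketch computes MACD with the conventional 12, 26 and 9 period settings. It is not the dissertation's R implementation; the helper names `ema` and `macd`, and the choice to seed each exponential moving average with the first value, are assumptions.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1).

    Seeded with the first value (one of several common conventions).
    """
    alpha = 2.0 / (span + 1)
    result = [values[0]]
    for value in values[1:]:
        result.append(alpha * value + (1 - alpha) * result[-1])
    return result

def macd(prices, fast=12, slow=26, signal=9):
    """Return the MACD line, the signal line and the histogram."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram

# Steadily rising made-up prices: the fast average stays above the slow one
prices = [float(p) for p in range(1, 61)]
macd_line, signal_line, histogram = macd(prices)
print(macd_line[-1] > 0)
```

A typical reading treats the MACD line crossing above the signal line as a buy signal and crossing below as a sell signal, although the exact rule varies between strategies.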
3.3 High frequency trading
High frequency trading (HFT) is one of the major applications of data science techniques in financial trading.
A wide range of trading activities and behaviours are described by the term HFT. However, they have at least
three attributes in common: they are carried out by sophisticated computer programs, they depend on extraordinarily fast
speed, and they are strategy-based (O’Hara, 2015). According to the definition from the U.S. Securities and
Exchange Commission (SEC, 2010, p.45), high frequency trading refers to “professional traders
acting in a proprietary capacity that engage in strategies that generate a large number of trades on a daily
basis”.
As an application of data science techniques, an HFT program is an automated trading system pre-set with
strategies that have been shown to be profitable by backtesting. An HFT program can run at a high speed with
ultra-low latency. Latency refers to the time interval between an HFT program receiving data from the market
and sending orders back to the market (Carrion, 2013). After years of development, the latency of an HFT program
can be reduced from the scale of milliseconds (one thousandth of a second) to even nanoseconds (one billionth
of a second) when it is equipped with the best micro-chips and co-located with the market exchange computers (Serbera &
Paumard, 2016).
3.3.1 HFT strategies
HFT deploys a wide variety of strategies, which can be divided into three categories: market making, arbitrage
and directional trading (Jones, 2013). Each category is discussed below.
Market making refers to posting limit orders on both the buy and sell sides of a financial asset
and profiting from the bid-offer spread. Limit orders are orders for a certain number of shares set at a
specified price on the order book, waiting for orders from other traders to meet their price requirement. Market
making provides liquidity for the market, because market participants can trade immediately at the price
provided by the market maker (Amihud & Mendelson, 1980). HFT market making strategies, however, differ
from traditional market making in speed and sensitivity. Market makers bear the risk of losing money if their
limit order prices are left behind by new information. An HFT market making strategy adjusts its quotes in
response to new information at a high frequency (Menkveld, 2014). It also uses historical correlation
patterns to constantly adjust its quotes. For example, if the movements of stock A and stock B have been highly
correlated historically, then when the price of stock A moves up, the HFT market making strategy will immediately raise the
price of its limit orders on stock B. As a result, HFT market making strategies tend to submit and cancel massive numbers of
limit orders continuously (Jones, 2013).
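The stock A and stock B example can be sketched in a few lines of Python. This is a toy illustration rather than a real market making system: the functions `make_quotes` and `update_fair_value`, the sensitivity `beta` and all prices are hypothetical.

```python
def make_quotes(fair_value, half_spread):
    """Return (bid, ask) limit prices placed symmetrically around fair_value."""
    return fair_value - half_spread, fair_value + half_spread

def update_fair_value(fair_value_b, return_a, beta):
    """Shift B's fair value by beta times A's latest return.

    beta is an assumed sensitivity estimated from the historical correlation.
    """
    return fair_value_b * (1 + beta * return_a)

# Initial quotes on stock B around an assumed fair value of 50.00
fair_b = 50.00
bid, ask = make_quotes(fair_b, half_spread=0.02)

# Stock A rises by 1%; the old quotes on B are cancelled and reposted higher
fair_b = update_fair_value(fair_b, return_a=0.01, beta=0.8)
new_bid, new_ask = make_quotes(fair_b, half_spread=0.02)
print(bid, ask, new_bid, new_ask)
```

Each price change in stock A triggers a cancel and a repost on stock B, which is consistent with the large volume of order submissions and cancellations noted above.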
HFT arbitrage as a trading strategy depends heavily on data science techniques, involving data mining,
statistics and automated trading systems. Arbitrage as a trading strategy has existed for decades, but due
to the advent of data science techniques, financial globalization and market fragmentation, the number of
arbitrage opportunities for HFT has soared since the 1990s. Arbitrage takes many forms, which can be divided into
two groups, deterministic arbitrage and statistical arbitrage. Deterministic arbitrage is the situation in which a
sure profit can be obtained with no risk, while statistical arbitrage refers to trading that takes advantage
of statistical mispricing (Bondarenko, 2003). One example of deterministic arbitrage is the arbitrage between
an exchange-traded fund (ETF) and a future. Both S&P 500 futures and the S&P 500 ETF track the S&P 500 index.
When a large buy order pushes up the price of S&P 500 futures and the price of the S&P 500 ETF does not move
up at that moment, an HFT arbitrage strategy will buy the ETF and sell the future, locking in the price gap as profit
(Marshall et al., 2013). As for statistical arbitrage, the most common strategy is pairs trading. If the price
movements of two stocks have been highly correlated historically, the stocks of two rival companies for instance,
HFT strategies will buy the under-performing stock and sell the over-performing one when their prices diverge,
expecting the price gap between the two stocks to narrow in future and yield a profit. Pairs trading is a market-neutral strategy,
because its profit is gained from changes in the price gap between two securities, regardless of whether the market
is in an uptrend, a downtrend or a sideways movement (Elliott et al., 2005).
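A pairs trading signal of the kind described above can be sketched as follows, in Python with the standard library only. The sketch assumes the spread is the simple price difference between the two stocks and that a trade is opened when the latest spread is more than `entry_z` standard deviations from its historical mean. Practical implementations often use a hedge ratio or log prices instead; the function name and threshold are hypothetical.

```python
import statistics

def pairs_signal(prices_a, prices_b, entry_z=2.0):
    """Compare the latest A-B spread with its history and suggest a trade."""
    spread = [a - b for a, b in zip(prices_a, prices_b)]
    history, latest = spread[:-1], spread[-1]
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return "hold"
    z = (latest - mean) / stdev
    if z > entry_z:
        return "sell A, buy B"   # A has out-performed B; bet on the gap narrowing
    if z < -entry_z:
        return "buy A, sell B"   # A has under-performed B
    return "hold"

# Made-up prices: A jumps away from B on the last observation
stock_a = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 11.0]
stock_b = [10.0] * 8
print(pairs_signal(stock_a, stock_b))
```

Because one leg is bought and the other sold, a general market move that affects both stocks equally leaves the position's value roughly unchanged.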
Directional trading strategies are based on the trader’s assessment of the near-term direction of the market or a stock.
HFT directional trading strategies assess the direction of a security from related news or from the detection of large orders.
Some HFT traders use data science techniques to analyse Twitter data and media news in order to respond
immediately after the release of new information. For example, on Tuesday, 23 April 2013, at 1.07pm Eastern
Time, the official Associated Press Twitter account @AP was hacked and posted a false tweet
saying that “Two Explosions in the White House and Barack Obama is injured”. Within seconds, the U.S.
stock market dropped dramatically, and $136.5 billion of the S&P 500 index's value had been wiped out before
the market recovered a few minutes later (Selyukh, 2013). Another category of directional trading strategies
focuses on detecting large orders. If an HFT strategy detects a series of large buy orders for one stock, this is a
signal that an institutional trader may be starting to purchase a certain amount of shares. The HFT program may then
purchase the existing limit sell orders, driving up the price, and sell those shares back to that institutional
trader at a higher price (Jones, 2013).
3.3.2 The winner-take-all nature of HFT
As shown above, most HFT strategies have a winner-take-all nature, and this makes fierce competition
between HFT traders inevitable. For market making strategies, an HFT trader who reacts to news more slowly than
others will be the victim of the market movement. For arbitrage and directional strategies, only the first
HFT trader who detects the opportunity can make the deal and gain the profit. In the HFT world, being slightly slower
than others means failure. HFT firms therefore have to invest a great amount of money to reduce
latency as much as they can, for example by purchasing the best computers and switches, hiring rocket scientists to create
a technological advantage, renting co-location services next to the stock exchange, and even digging a gigantic
tunnel to reduce the communication latency from 17 milliseconds to 12 milliseconds with a straight-line cable
(Lewis, 2014). High costs and fierce competition have made the HFT business much more difficult than before.
It has been estimated that the total earnings of the HFT industry reached a peak of USD 5 billion in 2009
and kept declining to USD 1 billion in 2013 (O’Hara, 2015).
3.4 The impacts of utilising data science techniques on financial markets
The applications of data science techniques in financial trading have various impacts on the financial market
and society in terms of market liquidity, efficiency, stability and fairness.
Market liquidity and efficiency have been improved by HFT. Liquidity refers to the ability to trade a large
amount of securities in a short time period without causing a large price movement. Hendershott & Riordan
(2013) found that automated trading activities reduce the volatility of liquidity, because they consume
liquidity when it is cheap and provide liquidity when it is expensive. Jarnecic & Snape (2014) showed that, as
HFT strategies constantly submit limit orders at multiple prices, they provide liquidity on an ongoing basis.
Market efficiency is enhanced because HFT strategies respond to news faster than traditional traders and HFT
arbitrage strategies are more capable of capturing arbitrage opportunities (Manahov et al., 2014).
Despite the benefits that data science techniques bring to financial markets, market stability may be
threatened by the massive use of automated trading systems. A famous example is the “flash crash” that
happened on 6th May 2010 in the U.S. stock markets (Lauricella, 2010). The U.S. Securities and Exchange
Commission (SEC) and the Commodity Futures Trading Commission (CFTC) later found that the
extraordinary swing of the market was triggered by a mutual fund’s large sell order. After the limit buy
orders were exhausted, which is a signal to go short for most HFT programs, a tremendous number of sell
orders submitted by those HFT programs crashed the market (SEC & CFTC, 2010).
Figure 1 E-Mini S&P 500 futures prices during the flash crash (Jones, 2013)
The U.S. is not the only country to have experienced this kind of accident. On 16th August 2013, China’s stock market
rose 5.96% within 15 minutes, due to a set of large buy orders mistakenly submitted by the automated trading
system of one securities company (Shaffer, 2013).
Figure 2 Shanghai composite index during China’s wild swings (source: Tdx.com.cn)
Concerns about fairness arise from two perspectives: fairness between traders, and fairness between the
industry and society. As an example of the former, HFT traders can co-locate their computers with those of the exchange in order
to receive market information faster and have their orders processed earlier than traditional traders (Angel
& McCabe, 2013).
Generally speaking, Data Science techniques have increased the complexity and uncertainty of the market. It
costs the public and the government much more effort to understand and supervise it. Social fairness is hard
to define or measure, but concerns arise when so many resources are placed in financial trading simply to
pursue more profits while contributing little to society. Frank Pasquale argued that “financial sector took 29
percent of the profits of the American economy while accounting for only 10% of the value added in the
fourth quarter of 2010” (Pasquale, 2015, p.6).
To sum up, new tools built with data science techniques have significantly changed financial trading and the
finance industry and have various impacts at both the micro and macro levels. It is important to understand these
new tools at the micro technical level in order to gain a better understanding of their potential social impacts
from a macro perspective.
4. Methodology
This research will be implemented with both qualitative and quantitative methods.
In order to understand the applications produced by data science techniques for financial trading and to
demonstrate their functions, the first part of the research involves building
several computer programs that reproduce backtesting, an automated trading system and a few trading strategies.
The algorithms will be written in R and tested with real historical market data. Basic knowledge of financial
markets and financial trading can be found in textbooks (Pilbeam, 2010). Details of the technical indicators
and trading strategies used in this research, such as Moving Average (MA) and Moving Average Convergence
Divergence (MACD), are taken from the existing literature (Chong & Ng, 2008). Skills for building computer programs
with R can be found in relevant books such as R in Action (Kabacoff, 2015).
The second part of the research tries to develop an innovative method to integrate the micro and macro
analysis. The literature suggests that the financial system plays an important role in the modern economy and
society. It also indicates that data science techniques, such as high frequency trading, have reshaped
financial trading significantly. In this research, by demonstrating the tools, I aim to develop a detailed
understanding of these techniques at the micro level and, together with insights from the literature, to investigate
the potential social impacts of these tools at the macro level.
4.1 Data
All the data for this research are real historical market data, downloaded from public sources such as Yahoo
Finance, Google Finance, Bloomberg and Tushare (a Chinese company providing China’s financial market
data openly and freely via APIs).
Table 1 Example of historical market data
date        open   high   close  low    volume
2013/3/11   22.72  22.98  22.63  21.85  573293.4
2013/3/12   22.60  22.79  21.72  21.31  840769.1
2013/3/13   21.69  22.58  22.09  21.61  836901.5
2013/3/14   22.00  22.09  21.67  21.30  577853.6
2013/3/15   21.66  22.63  22.20  20.80  943574.1
2013/3/18   21.70  22.00  21.35  21.21  745294.8
2013/3/19   21.30  21.73  21.50  21.04  529771.2
Table 1 is an example of historical market data. The first column records the date, and the second to fifth columns record
the opening price, highest price, closing price and lowest price of the stock on that day. The volume column records the
volume of shares traded that day. This kind of data is the basic input for backtesting, automated trading systems and HFT
programs (Maier-Paape & Platen, 2014).
4.2 Coding
Figure 3 The interface of RStudio
As shown above, most of the coding work for this research will be done in RStudio. R is an open-source
statistical programming language widely used in academic research. In this research, I will use R to write
programs that implement backtesting, an automated trading system and some strategies used in HFT.
4.3 Visualization
Figure 4 Backtesting result of the share price of Yahoo
Figure 5 Backtesting result of the share price of Facebook
Figures 4 and 5 are examples of one of the visualization tools that will be used in this research to
demonstrate how Data Science techniques work in financial trading. The code for this visualization method is
included in the appendix. The buy and sell signals generated by the strategy in backtesting are shown on a
candlestick chart of the stock price. The literature offers various measures for evaluating the performance of
trading strategies; in this research, the profit rate is the indicator used to measure the performance
of each trading strategy (Masteika et al., 2014).
4.4 From micro technical details to macro social impacts
Data science techniques have produced many new tools for finance and financial trading, and have turned them
into a “black box”. This research aims to open this black box by developing and demonstrating some of these new
tools, and by investigating their social impacts on the basis of the technical details. I will use the
insights from redeveloping and demonstrating these tools, together with theories from the literature, to
investigate their potential social impacts and their relationship to data science techniques.
4.5 Ethical aspects
This research raises no ethical risks because, as shown in section 4.1, all the data used
in this research are public market data. No private information will be collected or used in this research.
5. Demonstration and case study
Data Science techniques provide powerful weapons for financial trading. A backtesting system allows traders to
build, test and improve their trading strategies. This section aims to demonstrate the whole process of
backtesting: acquiring data, building a strategy, improving the strategy and testing it. All computer
coding is in R with RStudio.
Firstly, a simple trading strategy, the Moving Average (MA) strategy, will be built on historical data from Apple
Inc., a world-famous company listed on NASDAQ. Based on the backtesting results, a
better strategy is derived from this first simple strategy. A third strategy is then built with a technical
indicator, MACD, after evaluating the performance of the second strategy. Secondly, the three strategies are
tested with historical market data from 15 well-known companies to make a further assessment. Finally, high
frequency data is used to test the MACD strategy, demonstrating the high profit potential of HFT.
5.1 Preparation of historical stock data
5.1.1 Download data
Figure 6 Codes for downloading data
As shown in figure 6, lines 1 to 3 are comments written by the programmer. Comments start with “#” and are
not executed. Lines 4 and 5 install and load the “quantmod” package. A package contains many predefined
functions that anyone can use after loading it. Line 6 uses the function “getSymbols” to download
historical data for Apple from Yahoo Finance between Jan. 1st 2010 and Jun. 21st 2016. From line 7 to line 11,
the raw data are cleaned and saved as a csv file on the local drive. The cleaned data are shown in figure 7.
Figure 7 Historical data of Apple
5.1.2 Visualization
Market data is usually shown with a candlestick chart. Four crucial pieces of information, the open price,
close price, highest price and lowest price for a certain time period, are contained in one candlestick, as
shown in figure 8. If the close price is higher than the open price, the body is green; otherwise it is red.
Figure 8 Candlestick chart (source: Wikipedia)
In R, a programmer can write a custom function to plot a candlestick chart. As shown in figure 9, line 15
passes the stock data to the function “Cplot”, which is defined from line 18 to line 43 and produces the
visualization of the stock data in figure 10.
Figure 9 Codes for plotting candlestick chart
Figure 10 Visualization of Apple historical share price
5.1.3 Stock split
On Jun. 9th 2014, Apple split every share of its stock into seven shares (Solomon, 2010), so the stock
price fell to one seventh of its previous level. For traders, the share price before the stock split has to be
recalculated, because the total market value of Apple is not affected by the split.
Figure 11 Codes for dealing with stock split
As shown in figure 11, a new function “Stock_Split” is built from line 52 to line 62 to recalculate the stock
price of Apple before the stock split. Each pre-split price is replaced by one seventh of its original value.
The adjusted historical stock price pattern is shown in figure 12, which is significantly different from figure 10.
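The adjustment described above can be sketched in a few lines of R. This is a minimal illustration and not the author's “Stock_Split” function from figure 11: the function name adjust_for_split, the column names and the toy prices are all assumptions made for the example.

```r
# Hypothetical helper: put pre-split prices on the post-split scale.
# `prices` is assumed to be a data frame with a Date column `date` and
# numeric price columns; these names are illustrative only.
adjust_for_split <- function(prices, split_date, ratio,
                             cols = c("open", "high", "low", "close")) {
  before <- prices$date < split_date            # rows dated before the split
  prices[before, cols] <- prices[before, cols] / ratio
  prices
}

# Toy data around a 7-for-1 split; the values are made up for illustration.
toy <- data.frame(
  date  = as.Date(c("2014-06-05", "2014-06-06", "2014-06-09", "2014-06-10")),
  open  = c(644, 651, 92.7, 94.7),
  high  = c(649, 652, 93.9, 95.1),
  low   = c(642, 644, 91.8, 93.6),
  close = c(647, 645, 93.7, 94.3)
)
adjusted <- adjust_for_split(toy, as.Date("2014-06-09"), ratio = 7)
```

Only rows dated before the split are rescaled, so the adjusted series is continuous across the split date. Volume is left unchanged in this sketch.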
Figure 12 Visualization of Apple historical share price recalculated
5.2 Moving Average strategy
5.2.1 Calculate Moving Average
A moving average is the arithmetic mean of the close price over a certain number of previous trading days. For
example, MA 5 is a line connecting, for each trading day, the mean of the close prices of the five previous
trading days. To calculate MA 5, in the function “Calculate_MA”, a loop from line 76 to line 79 calculates the
mean close price of the previous 5 days for each trading day. Another loop from line 80 to line 83 calculates
MA 20. As shown in figure 14, the black line is the MA 5 line and the orange line is the MA 20 line. It is clear
that the MA 5 line is much more sensitive to movements in the stock price. When the stock starts an uptrend,
the MA 5 line will cross above the MA 20 line.
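A loop of the kind described above might look like the following sketch. It is not the code of figure 13: the name moving_average is hypothetical, and the window is assumed to include the current trading day, which the text leaves open.

```r
# Simple moving average over the last n values, including the current one
# (an assumption). Returns NA until n observations are available.
moving_average <- function(x, n) {
  out <- rep(NA_real_, length(x))
  if (length(x) < n) return(out)
  for (i in n:length(x)) {
    out[i] <- mean(x[(i - n + 1):i])
  }
  out
}

# Close prices taken from the close column of Table 1.
close <- c(22.63, 21.72, 22.09, 21.67, 22.20, 21.35, 21.50)
ma5 <- moving_average(close, 5)
```

With only seven observations, the first four entries of ma5 are NA and the remaining three are the means of the three available 5-day windows. MA 20 would be obtained with moving_average(close, 20) on a longer series.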
Figure 13 Codes for calculating MA 5 & 20 lines
Figure 14 Apple historical share price with MA 5 (the black one) and MA 20 (the yellow one)
5.2.2 Backtesting strategy
From this feature, a simple MA strategy can be designed. When MA 5 crosses above MA 20, the
short position is closed and a long position is opened; when MA 5 crosses below MA 20, the long position is
closed and a short position is opened. A short position refers to a transaction in which an investor borrows
stocks from other investors and sells them, making a profit if the price of the stock drops. A long position,
on the contrary, refers to a transaction in which an investor buys stocks and holds them, making a profit if
the stock price rises (Pilbeam, 2010).
Figure 15 Codes for MA strategy
This strategy is precisely defined by the code from line 137 to line 151 shown in figure 15. For the i-th day,
if the value of MA 5 is larger than MA 20, and the value of MA 5 was smaller than MA 20 on the day before, then
a long position is opened. Conversely, if the value of MA 5 is smaller than MA 20 on the i-th day, but MA 5 was
larger than MA 20 on the (i-1)-th day, a short position is opened. The trade and the position are recorded
separately in the attributes Trade and Hold. A for-loop applies these two “if” conditional statements to
all the data from beginning to end, thereby backtesting this strategy with historical data and recording the
results.
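The two conditional statements can be sketched as follows. This is an illustration under assumptions rather than the code of figure 15: the function name crossover_backtest is hypothetical, Trade is coded as +1 (buy), -1 (sell) or 0, and Hold as +1 (long), -1 (short) or 0 (no position yet).

```r
# Sketch of the crossover rule. `fast` and `slow` are assumed to be numeric
# vectors of equal length (e.g. MA 5 and MA 20), possibly with leading NAs.
crossover_backtest <- function(fast, slow) {
  n <- length(fast)
  trade <- integer(n)
  hold <- integer(n)
  for (i in seq_len(n)[-1]) {
    hold[i] <- hold[i - 1]                        # carry the position forward
    if (anyNA(c(fast[i], slow[i], fast[i - 1], slow[i - 1]))) next
    if (fast[i] > slow[i] && fast[i - 1] < slow[i - 1]) {
      trade[i] <- 1L                              # cross above: go long
      hold[i] <- 1L
    } else if (fast[i] < slow[i] && fast[i - 1] > slow[i - 1]) {
      trade[i] <- -1L                             # cross below: go short
      hold[i] <- -1L
    }
  }
  data.frame(Trade = trade, Hold = hold)
}

# Toy series: the fast line crosses above on day 4 and below on day 6.
fast <- c(NA, 1, 2, 4, 5, 2, 1)
slow <- c(NA, 3, 3, 3, 3, 3, 3)
res <- crossover_backtest(fast, slow)
```

Because the comparisons are strict, as in the text, a day on which the two lines are exactly equal does not count as a crossover in this sketch.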
5.2.3 Compute profits and visualization
To evaluate the performance of a trading strategy, the profit is calculated each time a trade is completed, and
a buy or sell label is placed near the day on which that trade is conducted. Two functions are built to
calculate the profits and plot them. As shown in figure 16, a for-loop from line 166 to line 182 scans all the
trading results. Within this loop, from line 168 to line 173, a conditional statement detects a trade when the
attribute Trade is not 0; the profit of this trade is then calculated and stored in the variable
“Delta_Profit”. Finally, at line 180, all “Delta_Profit” values are added together to get the total profit up
to that trade.
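One way to express this profit calculation is sketched below. It is not the code of figure 16: the function name cumulative_profit is hypothetical, and the sketch assumes one share per trade and that a position opened at one trade is closed at the close price of the next trade.

```r
# Sketch: realised profit of a one-share strategy. `close`, `trade` and
# `hold` are assumed to be parallel vectors (price, Trade, Hold per day).
cumulative_profit <- function(close, trade, hold) {
  n <- length(close)
  delta_profit <- numeric(n)
  entry_price <- NA_real_
  position <- 0                                   # position held before today
  for (i in seq_len(n)) {
    if (trade[i] != 0) {
      if (position != 0) {
        # Close the old position: long gains if price rose, short if it fell.
        delta_profit[i] <- position * (close[i] - entry_price)
      }
      entry_price <- close[i]
      position <- hold[i]
    }
  }
  cumsum(delta_profit)                            # total profit up to each day
}

close <- c(10, 11, 12, 15, 14, 13, 9, 10)
trade <- c(0, 1, 0, 0, -1, 0, 0, 1)
hold  <- c(0, 1, 1, 1, -1, -1, -1, 1)
total <- cumulative_profit(close, trade, hold)
```

In the toy data the long position earns 3 (bought at 11, closed at 14) and the short position earns 4 (sold at 14, covered at 10). A position still open on the last day is not marked to market in this sketch.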
Figure 16 Codes for calculating profits
Figure 17 Backtesting result for MA strategy on Apple
The result of backtesting the MA strategy on Apple from 2011 to 2016 is visualized in figure 17. Each trade is
marked as “Sell” or “Buy” beside the candlestick chart, and the profit line is plotted below to show the total
profit up to each trade. The strategy keeps losing money in the first year, when the share price is
range-bound (the stock price moves up and down within a channel). From early 2012, when Apple starts an
uptrend, the MA strategy begins to make profits. Once the stock price becomes range-bound again in late 2013,
the profits made earlier are soon lost. From 2014 to late 2015, the Apple share price goes through an uptrend
and a downtrend with several range-bound periods, and the profit line of the MA strategy just fluctuates near
zero. From 2016, the MA strategy makes several good trades when the price is no longer range-bound, and the
profit line reaches 30 before the backtesting ends.
Because the strategy is designed to buy or sell only one share per trade, the profit rate is calculated by
dividing the final profit by the mean of the highest and lowest prices of the stock within that period. The
code used to calculate the profit rate is shown in figure 18.
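The profit rate definition above amounts to a one-line formula. The sketch below uses a hypothetical function name and made-up inputs; it is not the code of figure 18.

```r
# Profit rate: final profit divided by the midpoint of the period's
# highest and lowest prices (one share traded at a time).
profit_rate <- function(final_profit, high, low) {
  final_profit / mean(c(max(high), min(low)))
}

# Made-up example: final profit of 30, prices ranging from 50 to 130.
rate <- profit_rate(30, high = c(80, 130, 120), low = c(50, 90, 100))
```

Here the midpoint is (130 + 50) / 2 = 90, so the rate is 30 / 90, about 33%.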
Figure 18 Codes for calculating profit rate
5.2.4 Evaluation and discussion
From the backtesting result above, it is clear that the MA strategy performs well when the stock price is in a
trend, whether up or down, but loses money when the stock price is range-bound. This is because it takes
a few days for the MA lines to react to movements in the stock price. Figure 19 shows the backtesting
results for the first 16 months. The stock price begins to rise 5 days before MA 5 crosses above MA 20, the
crossover labelled “a” in figure 19. The strategy opens a long position at the price of point a. A few days
later, the stock price begins to go down, but more days pass before MA 5 reacts to the downtrend and crosses
below MA 20 at point b, where the MA strategy closes the long position. The stock price at b is already lower
than that at point a, so a loss is made on the long position from point a to point b. Similarly, the MA
strategy opens a short position at point c. This position does make a profit in the next 5 days as the stock
price drops, but when the strategy closes this position at point d, the stock price is already higher than
that at point c, because the trading signal from the MA crossover lags behind the movement of the stock price.
Figure 19 Backtesting result for MA strategy on Apple from Jan. 2011 to Apr. 2012
From Dec. 23rd 2011 to early April 2012, an uptrend takes the stock price from 50 to 90. The trading
signal still lags behind the movement of the stock price, but the price rises so much that the
MA strategy makes a profit on this trade.
5.3 Improved Moving Average Strategy
Backtesting allows a trader to evaluate a trading strategy with real historical data without losing any money.
From the discussion above, if a switch can be designed to turn trading off when the stock is range-bound and
back on when the stock is trending, a much higher profit can be achieved.
5.3.1 Avoid trades in range bound
Usually, during a range-bound period, the MA strategy generates trading signals frequently, so the switch can
be designed as follows: if there was a trading signal within the N days before a new trading signal, do not
trade on the new signal. As shown in figure 20, at line 257 a new variable “N” is created to set the
number of days during which trading is stopped, and at line 259 a new attribute “Signal” is created to store
all trading signals. Lines 266 to 284 contain the code for the Improved MA strategy. Compared with the code in
figure 15, a new condition is added at line 269 and line 279. Under this condition, a new trade can be
conducted only if the sum of the trading signals generated in the previous N days is zero.
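The switch can be sketched as a filter over the raw signals. This is an illustration, not the code of figure 20: the names filter_signals and n_quiet are hypothetical, and the sketch sums the absolute values of the recent signals so that a buy and a sell inside the window cannot cancel out. That choice is an assumption; the text only says that the sum of the signals must be zero.

```r
# Sketch of the range-bound switch. `signal` is assumed to hold the raw
# crossover signals (+1, -1 or 0 per day). A signal is acted on only if
# no signal occurred in the previous n_quiet days.
filter_signals <- function(signal, n_quiet) {
  trade <- numeric(length(signal))
  for (i in seq_along(signal)) {
    if (signal[i] == 0) next
    start <- max(1, i - n_quiet)
    recent <- if (i > 1) signal[start:(i - 1)] else numeric(0)
    if (sum(abs(recent)) == 0) trade[i] <- signal[i]
  }
  trade
}

signal <- c(0, 1, 0, -1, 1, 0, 0, 0, 0, -1)
trade <- filter_signals(signal, n_quiet = 3)
```

In the toy series the signals on days 4 and 5 are suppressed because each follows another signal within three days, while the signals on days 2 and 10 are traded.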
Figure 20 Codes of Improved MA strategy
5.3.2 Choose the best value for N
As shown in figure 21, the Improved MA strategy performs differently with different values of N, the number
of days during which new trades are stopped to reduce losses in range-bound periods. With a backtesting
system, a trader can test the strategy with all candidate values of N and choose the one with the highest
profit rate, as shown in figure 22. A loop tests N from one to thirty, recording the profit rate of each
result. In this instance, in figure 23, the profit rate reaches 102.78% when N equals 13.
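The search over N can be sketched generically. In the example below, best_n and toy_score are hypothetical names, and toy_score is a made-up stand-in for a real backtest of the Improved MA strategy; it is shaped to peak at N = 13 purely for illustration.

```r
# Generic grid search: evaluate a scoring function for each candidate N
# and keep the value with the highest score.
best_n <- function(score_fun, candidates = 1:30) {
  rates <- vapply(candidates, score_fun, numeric(1))
  list(n = candidates[which.max(rates)], rate = max(rates), rates = rates)
}

# Placeholder for "backtest with this N and return the profit rate".
toy_score <- function(n) 1 - (n - 13)^2 / 100
result <- best_n(toy_score)
```

In practice score_fun would run the full backtest for a given N and return its profit rate, so that result$n holds the chosen value and result$rates the profit rate for every candidate.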
Figure 21 Backtesting results for Improved MA strategy on Apple with different values of N
Figure 22 Codes of Improved MA strategy
Figure 23 Backtesting result for Improved MA strategy on Apple
5.3.3 Comparison between MA strategy and Improved MA strategy
As shown below, in 2011 the MA strategy makes 18 trades while the stock price is range-bound, and most of
these trades end up losing money. The MA strategy has lost more than 25 dollars by the end of 2011. The
Improved MA strategy makes only 8 trades in 2011 and therefore loses only 8 dollars. When these two
strategies are tested on the same historical data, the profit rate of the MA strategy is 7.18%, while that of
the Improved MA strategy is 34.04%. Hence, a better strategy is developed based on the results from backtesting.
Figure 24 Comparison between MA strategy and Improved MA strategy
5.4 Using technical indicators
The Improved MA strategy performs much better than the MA strategy, but its trading signals
are still a few days behind the trend. To overcome this problem, technical indicators, which are built
from basic historical data with mathematical methods, can be used. There are a number of technical indicators
used in financial trading, and Moving Average Convergence Divergence (MACD) is one of them (Appel, 2003).
MACD is calculated from different exponential moving average (EMA) lines. An EMA is also a moving average
line but, unlike MA, it gives more weight to the latest data, so EMA is much more sensitive than MA.
As shown in figure 25, a MACD indicator consists of three components: the DIFF line, which is obtained by
subtracting the 26-day EMA from the 12-day EMA; the DEA line, which is the 9-day EMA of the DIFF line; and the
MACD histogram, the value of which is DIFF minus DEA.
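The three components can be computed with a short sketch such as the one below. It is not the code of figure 26: the function names ema and macd are hypothetical, the smoothing factor 2 / (n + 1) is the usual convention, and seeding the EMA with the first observation is an assumption (some implementations seed it with an n-day simple average instead).

```r
# Exponential moving average with smoothing factor 2 / (n + 1),
# seeded with the first observation (an assumption).
ema <- function(x, n) {
  if (length(x) == 0) return(numeric(0))
  alpha <- 2 / (n + 1)
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

# DIFF = EMA(12) - EMA(26); DEA = 9-day EMA of DIFF; histogram = DIFF - DEA.
macd <- function(close, n_fast = 12, n_slow = 26, n_signal = 9) {
  diff_line <- ema(close, n_fast) - ema(close, n_slow)
  dea_line <- ema(diff_line, n_signal)
  data.frame(DIFF = diff_line, DEA = dea_line, HIST = diff_line - dea_line)
}

# Close prices taken from the close column of Table 1.
close <- c(22.63, 21.72, 22.09, 21.67, 22.20, 21.35, 21.50)
ind <- macd(close)
```

A crossover strategy on this indicator would compare the DIFF and DEA columns day by day, in the same way as MA 5 and MA 20 are compared in the MA strategy.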
Figure 25 MACD indicator (source: Wikipedia) Figure 26 Codes for calculating MACD
Figure 27 Codes for MACD strategy
MACD can easily be calculated with a computer program, as shown in figure 26. A new strategy, the MACD
strategy, can be built with the MACD indicator. Similar to the MA strategy, when the DIFF line crosses above
the DEA line, a long position is opened and the short position is closed; when the DIFF line crosses below the
DEA line, the opposite transaction is made. The code for the MACD strategy is in figure 27.
Figure 28 Backtesting results for MACD strategy
The backtesting results of the MACD strategy are in figure 28. The graph in the middle is the MACD indicator.
The MACD strategy performs better than the Improved MA strategy, because it not only reaches a higher profit
rate at the end, but also makes smaller losses than the Improved MA strategy at the beginning.
5.5 Applying same strategy to other stocks
Three strategies, MA, Improved MA and MACD, have been built with historical data for Apple. With a backtesting
system, it is possible to test these strategies on other stocks to make a further assessment. In this section,
historical data from fifteen well-known companies are downloaded. The performance of each strategy on each
stock is in table 2.
The return rate of each strategy over the whole 5-year period is recorded in the MA, IMA and
MACD columns. The N column records the value of N used for the Improved MA strategy. The last column,
Compound, is the compound return of the MACD strategy. The compound return rate is computed under the
condition that each year’s return can be reinvested in the next year. For example, for HSBC Holdings, the
return rate of the MACD strategy over five years is 102.32%, which means an average return rate of 20.464% per
year (102.32% / 5), so the compound return rate is 153.68% (= 1.20464 ^ 5 - 1).
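The compound return calculation described above can be written as a small function. The name compound_return is hypothetical; the formula is the one stated in the text, applied to two MACD values from table 2.

```r
# Compound return as defined in the text: spread the total return evenly
# over the years, then reinvest each year's return.
compound_return <- function(total_return_pct, years = 5) {
  annual <- total_return_pct / 100 / years        # simple average per year
  ((1 + annual)^years - 1) * 100
}

hsbc <- compound_return(102.32)      # HSBC Holdings, MACD column
berkshire <- compound_return(9.56)   # Berkshire Hathaway, MACD column
```

Rounded to two decimals these give 153.68 and 9.93, matching the Compound column of table 2 for the two stocks.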
The best strategy for each stock is marked with a grey background. The MACD strategy performs best
on nine stocks, and IMA performs best on the remaining six. Both the MACD strategy and the IMA strategy achieve
a positive return on all fifteen stocks, while the MA strategy loses money on almost half of the fifteen
stocks. Overall, the MACD strategy is the best of the three strategies, with an average return rate of 63.29%
and an average compound return rate of 104.56%.
No  Name                 Country      Code          Source   MA (%)  IMA (%)   N  MACD (%)  Compound (%)
1   Berkshire Hathaway   U.S.         NYSE:BRK.A    google    15.19    24.81   4      9.56          9.93
2   JPMorgan Chase       U.S.         NYSE:JPM      google    16.35    17.43  15     82.49        114.58
3   Exxon Mobil          U.S.         NYSE:XOM      google    16.3     40.21   3     19.27         20.81
4   Toyota Motor         Japan        NYSE:TM       google   -44.31    55.49  20      3.89          3.95
5   AT&T                 U.S.         NYSE:T        google   -33.01    17.66   8      6.52          6.69
6   HSBC Holdings        U.K.         HSBA.L        yahoo     32.83    49.54   8    102.32        153.68
7   Citigroup            U.S.         NYSE:C        google    67.15   132.01  12    128.12        212.87
8   Wal-Mart Stores      U.S.         NYSE:WMT      google     9.33    48.08  20     50.71         62.09
9   Samsung Electronics  South Korea  KRX:005930    google   -33.42    29.61   9     30.59         34.57
10  Volkswagen Group     Germany      VOW.F         yahoo     52.74   102.68  13    209.3         474.51
11  Microsoft            U.S.         NASDAQ:MSFT   google   -20.83    34.44  10     15.03         15.96
12  Google               U.S.         NASDAQ:GOOGL  google    -9.64    38.51  24     59.2          74.98
13  Ford Motor           U.S.         F             yahoo     40.71    42.66   1    142.98        251.67
14  IBM                  U.S.         NYSE:IBM      google   -21.94    16.1    4     72.17         96.23
15  Nestle               Switzerland  VTX:NESN      google   -51.48     7.29  21     17.18         18.40
    Average                                                    2.40    43.77         63.29        104.56
Table 2 Backtesting of 15 stocks
The best and worst backtesting results for each strategy are shown below.
The MA strategy performs best on Citigroup, with a total return of 67.15%, and worst on Nestle, with -51.48%.
As figures 29 and 30 show, the MA strategy begins to lose money once the stock is range-bound.
Figure 29 Backtesting results for MA strategy on Citi group
Figure 30 Backtesting results for MA strategy on Nestle
The Improved MA strategy performs best on Citigroup, with a total return of 132.01%, and worst on
Nestle, with 7.29%. Comparing figure 31 with figure 29, the Improved MA strategy achieves its goal: by
reducing the trade frequency during range-bound periods, the return rate is greatly enhanced. In figure 32, as
the uptrend of Nestle ends in early 2015 and the price starts to move up and down between 65 and 75, the
Improved MA strategy begins to lose money. After all, the Improved MA strategy is derived from the MA strategy
by reducing losses during range-bound periods. If a stock keeps moving within a range, none of these
strategies can make a satisfactory return.
Figure 31 Backtesting results for Improved MA strategy on Citi group
Figure 32 Backtesting results for Improved MA strategy on Nestle
The MACD strategy is more sensitive to movements in the stock price, so it can achieve the best return among
these three strategies, as shown in figure 33. However, if the stock moves within a tight range, as happened
for Toyota Motor in figure 34 from early 2015 to mid 2016, the MACD strategy also loses money.
Figure 33 Backtesting results for MACD strategy on Volkswagen Group
Figure 34 Backtesting results for MACD strategy on Toyota Motor
5.6 Applying MACD strategy to high frequency trading
After testing strategies with daily stock data, this section focuses on testing the MACD strategy on high
frequency data: the CSI 300 Index future traded on the China Financial Futures Exchange. The data are
candlestick bars with a period of 15 seconds. The MACD strategy is tested on 47 trading days, from Apr. 30th
2015 to Jul. 7th 2015. The daily return in index points is in figure 35. As the leverage on index futures is
usually 8 to 10 times, the real return equals 5 to 8 times this rate.
The MACD strategy makes a positive return on all trading days, with the lowest at 0.17% on May
6th (figure 36) and the highest at 27.35% on Jul. 2nd (figure 37). The average daily return in index points is
7.17%, so the average daily return in money terms is more than 35%.
Figure 35 Daily return rate of Index points
Figure 36 Backtesting for MACD strategy on index future on 2015-05-06
Figure 37 Backtesting for MACD strategy on index future on 2015-07-02
6. Potential Social impacts of applying Data Science Technologies in finance trading
6.1 Promote financial speculation
Data science techniques have reshaped financial trading with many super weapons for financial speculation,
which is significantly different from traditional investment.
6.1.1 Investing vs. speculation
Investing is the act of allocating capital to assets in order to gain a return in the future. Investors make
investments in stock markets and debt markets, helping companies to raise funds for productive purposes,
and thereby indirectly create value for society (Hazen, 1991). Investing has two characteristics: it adds
value to society, and it is long-term. In equity markets, investors analyse and research listed companies,
predict their future performance, and purchase the shares of the most promising companies. The stock price of
these companies rises because of investors’ purchases, which makes it possible for the company to obtain more
loans from banks or to issue new debt. The company can therefore produce better products or provide improved
services for its customers and earn more profits in the future to meet investors’ expectations. Investing is
long-term because this process may take several months or even a few years, during which
investors hold the stocks of the company.
Speculation, on the other hand, is short-term and non-productive. In financial trading, the term speculation
refers to the practice of making profits based on predictions of market movements rather than on the financial
attributes of the instrument, such as dividends or interest (Szado, 2011). Regulatory authorities, the U.S.
Commodity Futures Trading Commission for example, view a speculator as a trader “who trades with the
objective of achieving profits through the successful anticipation of price movements” (CFTC, 2005). The
holding periods for stock speculation are usually shorter than those for investing: a few days for most
speculators, or a few seconds for HFT speculators. Speculation is a zero-sum game. One speculator’s gain is
another’s loss, and therefore speculative trading creates little value for society. It simply transfers wealth
from loser to winner. For example, in futures markets, option markets, credit default swaps (CDS) and other
derivative markets, each contract has two traders on opposite sides, a seller and a buyer. While the
winner receives what the loser loses, society gains nothing.
6.1.2 Data Science techniques: super weapons for speculation
Data Science techniques were not created for financial speculation; however, they are widely used by
speculators to reduce the cost of speculation, enhance its practicality and therefore significantly increase
its profit margin.
Consistently making profits in financial speculation is extremely difficult; therefore, after World War II,
value investing prevailed over speculation until the rise of quantitative trading in the 1990s. A professional
speculator’s goal is to discover the price pattern in historical data, which appears completely chaotic.
Then, based on his “knowledge” of the stock price pattern, trading rules are set up to guide his speculation.
Trading rules are a set of trading plans, quotes, requirements and principles that define the right thing to do
in each type of situation, for example trading the strongest stocks in a bull market, or trading with a stop
loss. The speculator then has to apply his trading rules in real trading, bearing risk and loss and suffering
from greed and fear, and improve the rules based on the trading results until he either makes profits
consistently or gives up (Lefevre, 2004). Most financial speculators end in failure. Because of this,
after World War II, value investing was regarded as the orthodoxy of financial trading. The aim of value
investing is to buy a security when it is undervalued by the market compared with its true value as computed by
fundamental analysis (Graham & Dodd, 1934). However, since Data Science techniques were introduced into
financial speculation, a whole new world has opened up for speculators. Computer programming, backtesting and
automated trading systems greatly enhance the power of speculators.
Firstly, with data science techniques, it costs a speculator much less time, money and effort to create a
profitable trading rule, or trading strategy. As demonstrated in section 5.1, historical data of securities
from global markets can be easily acquired and visualized with data science techniques. Trading
strategies can then be precisely defined with mathematical models and computer code. Sections
5.2 to 5.4 show that, with a backtesting system, a speculator can test his trading strategy on any period
of historical data for any security and receive feedback immediately. Doing the same thing might have cost
speculators in the last century several months and a certain amount of money. Without losing anything, a
speculator in the 21st century can easily and continually evaluate and improve his strategy until it is
profitable enough to be deployed in real trading.
Secondly, data science techniques dramatically enhance the practicality of applying a profitable trading
strategy, in terms of both precision and scope. For speculators in the last century, emotional fluctuations,
spoiled food, or even a nightmare might prevent them from trading according to their trading rules. As
Jesse Livermore, the most famous and successful speculator before WWII, noted, “A stock operator
has to fight a lot of expensive enemies within himself” (Lefevre, 2004). At present, however, automated trading
systems replace humans and execute trading strategies precisely as they are designed. As shown in sections 5.5
and 5.6, an automated trading system can also apply the same strategy to various stocks at the same time,
creating profits at little marginal cost. Data Science techniques greatly increase the scope of trading for
speculators, as it is impossible for a human to monitor and trade fifteen or more stocks at the same time.
Thirdly, the profit margin of financial speculation has been increased astonishingly by data science techniques.
As shown in section 5.3, a simple strategy after improvement can produce a profit of 80 dollars per share in 5
years, while the stock price rose from 60 to 100, which means that in this case a speculator can make 200% of
the profit made by a value investor. In section 5.5, after testing the same strategies on 15 different stocks,
the MACD strategy produces a positive return for all 15 stocks. The MA strategy performs well for only 8
stocks, but the Improved MA strategy produces positive returns for all 15 stocks. Over a five-year period, the
average return is 63% for the MACD strategy and 43% for the Improved MA strategy. However, if compound interest
is taken into consideration, which means that the previous year’s profit can be reinvested, the average return
for the MACD strategy will be 103% over 5 years, which roughly means a return rate of 20% per year. As a
comparison, the compound rate of return of Warren Buffett, one of the most successful investors ever, is 22.3%
per year (Mises, 2010). Therefore, with data science techniques, speculators can achieve the profit rate of
the best ‘value investing’ investors with little difficulty.
The profit rate for HFT is even higher. Section 5.6 shows a fraction of the HFT world: when the same
MACD strategy is applied to index futures, the profit rate soars. After testing the MACD strategy on 47 trading
days, which is roughly 2 months of real market data, the strategy produces an average daily return rate of 7%
in index points. Because the leverage of index futures is 5 to 7 times, the daily return rate in money
would be more than 30%, which means that if this strategy works as designed, it can turn £1,000 into £10,000
within two weeks!
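The arithmetic behind this estimate can be checked with a short sketch. The function name days_to_multiple is hypothetical, and the calculation assumes the same 30% return on every trading day with all profits reinvested, which is a strong simplification of the backtesting results.

```r
# Trading days of compounding needed to grow capital by a target multiple,
# assuming an identical daily return that is fully reinvested.
days_to_multiple <- function(daily_return, multiple) {
  ceiling(log(multiple) / log(1 + daily_return))
}

days <- days_to_multiple(0.30, 10)   # 30% per day, tenfold growth
final <- 1000 * 1.30^days            # value of 1,000 after that many days
```

Under these assumptions tenfold growth takes nine trading days, which is consistent with the “within two weeks” estimate.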
It may be argued that these results come from backtesting, and that transaction costs and various technical
problems in real trading may reduce the profit rate. It is true that there is a long way to go from
backtesting to real financial trading. However, the purpose of section 5 is to demonstrate the principles and
potential power of using data science in financial trading, especially for financial speculation. Because
most people who do this job on Wall Street hold PhDs in mathematics, physics or computer science from
the world’s best universities, it is reasonable to assume that they can create much more profitable strategies
than those demonstrated in section 5 and apply them in real trading.
6.2 Potential social impacts of excessive speculation in finance trading
Investing and speculation are two cultures that have co-existed in financial trading since financial markets
were created. There used to be a balance between them. Investors provide capital for companies and receive a
steady but relatively low return. Speculators, who are tempted by the lure of making fast money, trade
frequently and unintentionally provide liquidity for investors. Most speculators end up losing money or earning
little, but most investors achieve a low but positive return, so the balance can be maintained.
However, now that data science techniques are widely used in financial trading, the balance between investing
and speculation may have been broken, because the return on speculation has soared with the super weapons
provided by data science techniques. At present, financial speculation may not only be conducted by
individual speculators, but may also have become a main source of profits for big financial institutions, such
as investment banks and funds.
Excessive financial speculation has many social impacts. Speculators create little or no
wealth for society, so their gain is society’s loss. The extraordinary return of financial speculation may
attract many talented and well-educated young people to become professional speculators instead of
scientists or inventors who benefit society. What is more, excessive speculation may compel the management of
companies to focus more on short-term profits, rather than on making the decisions that benefit their companies
most in the long term.
6.2.1 Financial speculation subtracts value from society
It can be deduced from general principles that, because financial speculation does not create wealth, all
profits made by speculators come from others’ losses: those of other speculators, investors, or even
middle-class people who hold shares in mutual funds or pension funds (Tong, 2014).
From 2006, fund managers began to notice a strange phenomenon in stock markets. When they decided to
purchase the stock of a company, its share price would go up immediately after they submitted
their purchase orders, and the existing sell orders on the order book would simply disappear (O’Hara, 2015).
The fund manager therefore had to purchase those stocks at a higher price and at higher cost. Three years passed
before outsiders to Wall Street figured out what had happened. It was due to the order detection strategy of
HFT speculators. Because an order from a fund manager is usually large, say one million shares at a
time, the broker has to send the order to various exchanges or markets to get it matched.
Because the distances between the broker’s computer and the various stock exchanges are not the same, orders
arrive at some markets a few milliseconds later than at others, and it is in this gap that HFT speculators make
their profits. When the HFT program detects a large order in one market, it sends buy orders to all other
markets and exchanges to consume all existing sell orders and push up the stock price. All these actions can
be completed before the fund manager’s other orders arrive at those markets, owing to the technical advantages
of HFT speculators over funds. The fund manager then has to raise the price of his buy order to get those
shares, and the HFT speculators sell him the shares they have just bought and make a profit (Lewis, 2014).
Facing this challenge from HFT speculators, fund managers have two choices: bear the loss of profits, or use
algorithmic trading to cut one large order into many small orders so that their trading plan is not detected
by HFT speculators. Both choices raise the costs of funds, and it is the people who hold shares of the funds
who ultimately pay the bill, because this cost is part of the fund fee. It has been estimated that retirement
savings account fees may cost “a median-income two-earner family nearly $154,794 and consume nearly
one-third of their investment returns” (Hiltonsmith, 2012). Where does their money go? To the profits of
financial institutions and HFT speculators.
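The idea of cutting one large order into many small orders can be sketched in a few lines of R. This is only a minimal illustration under stated assumptions: the function name slice_order, its parameters and the child order size of 30,000 shares are hypothetical and are not taken from any real trading system, which would also randomise the size and timing of the child orders.

```r
# Hypothetical helper: split a large parent order into smaller child orders.
# 'total_shares' and 'max_child' are illustrative parameters (assumptions).
slice_order <- function(total_shares, max_child) {
  stopifnot(total_shares > 0, max_child > 0)
  n_full    <- total_shares %/% max_child   # number of full-size child orders
  remainder <- total_shares %% max_child    # shares left for one smaller final child
  children  <- rep(max_child, n_full)
  if (remainder > 0) children <- c(children, remainder)
  children
}

# A one-million-share parent order, as in the example in the text
child_orders <- slice_order(total_shares = 1000000, max_child = 30000)
length(child_orders)  # number of child orders sent to the markets
sum(child_orders)     # always equals the size of the parent order
```

Each child order is much smaller than the parent order, which makes the overall trading plan harder to detect, at the price of more transactions and a longer execution time.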
6.2.2 Waste of capital, both physical and human
Because present-day speculators depend entirely on their technical advantages to make profits, an arms race
among them is inevitable. This arms race costs a great deal of capital, both physical and human. Unlike in
other industries, the product of this race is simply better speculative trading techniques, not improved
services that benefit the majority of society.
The literature indicates that HFT is driven by technology, so HFT companies and speculators have to spend a
great deal of money to maintain their technical advantage over other speculators and funds. One of the most
famous examples is a private company, Spread Networks, which spent $300 million to build an 827-mile
fiber cable route connecting the Chicago and New York exchanges. This new route cut the latency between
the two exchanges from 17 milliseconds to 12 milliseconds, giving users of Spread Networks a slight
advantage of 5 milliseconds over others (Steiner, 2010).
The extraordinary return of financial speculation attracts many well-educated people to devote themselves to
professional speculation using the most advanced mathematical and computing techniques, which is a waste
of human capital for society. For example, Renaissance Technologies, founded by James Simons in 1982, is
one of the most famous and successful quant funds. Simons is an American mathematician who worked as a
code breaker during the Cold War (Mallaby, 2010). With statistical methods and data science techniques,
Simons and his fund achieved a 71.8 percent annual return from 1994 through mid-2014 (Rubin & Collins,
2015), and he had accumulated more than $15 billion by 2016 (Forbes, 2016). On its website, Renaissance
Technologies describes itself as a group of “roughly one hundred fifty people, half of whom have PhDs. in
scientific disciplines, dedicated to producing superior returns for its clients and employees by adhering to
mathematical and statistical methods” (Renaissance Technologies, 2016). As for recruitment, it does not
require candidates to have experience in finance; however, strong scientific and computing skills and an
excellent academic record are basic requirements for most job openings, and for certain jobs only
candidates with a “Ph.D. in Computer Science, Mathematics, Physics, Statistics, or a related discipline” can
apply (Renaissance Technologies, 2016). Because the web pages of the job openings may not always be
accessible, screenshots of these pages are included in the Appendix. Renaissance Technologies is not the
only quant fund or financial institution that hires many PhDs in scientific fields. In fact, the practice is so
common that there is a special term for them, “Wall Street Rocket Scientists” (Zimmer, 2010). The massive
rewards of financial speculation, made possible by data science techniques and other mathematical and
financial knowledge, divert many talented young people from socially beneficial fields into professional
speculation, which harms the future of society.
6.2.3 Negative impacts on corporate governance
Excessive speculation in financial markets also has negative impacts on corporate governance, because it
often compels listed corporations to sacrifice long-term growth to meet short-term expectations.
As mentioned previously, investors focus on long-term value, while speculators mainly focus on the
short-term movement of the stock price. If a company fails to meet a short-term expectation, for instance the
growth rate of quarterly earnings, its stock price usually drops in the near future. Investors will view this as
a chance to buy more shares at a lower price, as long as the company is capable of fulfilling its long-term
goals. Speculators, on the other hand, will sell or short the stock in order to profit from the short-term
downtrend. As markets become more and more speculative, a corporation will see a larger drop in its share
price whenever it fails to meet the short-term targets set by analysts, because there are more speculators to
short and fewer investors to buy. As a result, company managers come under greater pressure, as the share
price is one of the most decisive indicators of their performance. This pressure may force companies to
sacrifice long-term value to meet short-term goals, because for any activity with long-term benefits, such as
research and development or deploying new strategies, the cost is incurred now while the fruits are
uncertain and can only be reaped in the future (Ashkenas, 2012; Strine, 2010). A survey of 400 executives
indicates that 78% of managers admit that they have sacrificed long-term value to smooth earnings in order
to boost the stock price (Graham, Harvey & Rajgopal, 2005).
6.3 Financial industry, Neoliberalism and income inequality
More than a century ago, Oscar Wilde described the cynic as “a man who knows the price of everything and
the value of nothing”. Today, financial speculators armed with data science techniques fit this definition,
making enormous profits from the price movements of securities and derivatives with little consideration of
the value behind them.
In 2016, the IMF’s research department published a new report, Neoliberalism: Oversold?, rethinking the
effects of the Neoliberal policies widely deployed by many countries since the 1980s. Financial openness, one
of the major policies of Neoliberalism, not only attracted foreign direct investment that boosts long-term
economic growth, but also brought “portfolio investment, banking and especially hot, or speculative, debt
inflows”, which led to a high degree of income inequality in both the U.S. and the U.K. (Ostry, Loungani &
Furceri, 2016). Volscho (2015) argued that Neoliberalism has “put Wall street back into power”, because it
took the profit rate of the financial sector to a new high, while the profit rate of other industries “was not
restored by the neoliberal era”.
6.3.1 The culture of speculation in finance sector
Although data science techniques are widely used in financial speculation, it is not data science that creates
speculation. The impact of a technique is decided not by the technique itself but by the people who use it.
From a broader point of view, the culture of speculation has gradually prevailed over the culture of investing
in the finance industry. Data science techniques are exploited by this speculative culture and, in turn,
strengthen it. In 1950, the annual turnover of stocks in the U.S. was 15%, which means that only 15% of
market shares were traded within that year. By the late 1990s it had risen to 100%, and in 2010 the annual
turnover was 250%, which means that, in general, everyone in the markets trades more frequently and has
become more speculative. For institutional investors, fund portfolio turnover also increased from 15%-20%
in the 1950s to 100% in recent years (Bogle, 2011). Over the last decade, the average annual value of equity
IPOs in the U.S. was $42 billion (Renaissance Capital, 2016), and that of secondary offerings was $110
billion (Bogle, 2011). The stock markets therefore raised around $150 billion for listed companies each year,
while annual stock trading volume is around $30 trillion (Pollin & Heintz, 2011), 200 times the capital
raised for listed companies. At the same time, trading in futures and other derivatives has also soared. In
2010, the trading volume in S&P 500 futures was $33 trillion, and the total value of all credit default swaps
(CDS) and other derivatives was $580 trillion, about 4 times the value of the world’s stocks and bonds
(Bogle, 2011).
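The ratio between trading volume and capital raised can be checked with a few lines of R. The sketch below uses only the figures quoted above, expressed in billions of US dollars; the variable names are illustrative assumptions.

```r
# Figures quoted in the text, in billions of US dollars
ipo_value       <- 42     # average annual equity IPOs (Renaissance Capital, 2016)
secondary_value <- 110    # average annual secondary offerings (Bogle, 2011)
trading_volume  <- 30000  # annual stock trading volume, $30 trillion (Pollin & Heintz, 2011)

# Capital actually raised for listed companies each year
capital_raised <- ipo_value + secondary_value      # 152, "around $150 billion" in the text

# Trading volume as a multiple of the capital raised
ratio_exact   <- trading_volume / capital_raised   # roughly 197
ratio_rounded <- trading_volume / 150              # 200, the rounded figure used in the text
```

Whether the exact or the rounded figure is used, trading volume is roughly two hundred times the capital that the stock markets raise for listed companies.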
The original aim of futures and derivatives is to allocate risk for investors who hold the underlying
securities, like buying insurance for a car. The problem now is that the finance industry has become so
speculative that the “insurance fee” is several times the value of the assets insured. Why has this happened?
Financial institutions earn fees and commissions by selling derivatives to their clients. For example, income
from sales and trading accounted for 36% of Morgan Stanley’s revenues during the first nine months of
2010, while revenue from investment banking, that is, raising capital for companies, accounted for only
15%. Goldman Sachs made 63% of its revenue from trading and sales from July to September 2010, while
corporate finance contributed only 13% (Cassidy, 2010).
6.3.2 The Neoliberal reform in 1980s
Why has the financial industry become more and more speculative? The original purpose of the financial
industry is to provide financial services to other, productive industries. However, since Neoliberal policies
became dominant in the 1980s, the finance industry has gradually turned to serving itself and the rich by
extracting profits from society. As a result, income inequality has worsened, causing many social problems
(Volscho, 2015).
In the late 1970s and early 1980s, the economies of the Western world were stuck in stagnation: growth had
ceased but inflation kept rising (Easterly, 2001). Keynesian policies, which had saved the U.S. from the
Great Depression and had been widely deployed ever since, could do nothing about it, as government
spending was already too high and government deficits were one of the main causes of the stagnation.
To overcome the stagnation, Neoliberal policies were deployed by U.S. President Reagan and U.K. Prime
Minister Thatcher. Neoliberalism, following classical Liberalism, believes in the “invisible hand” of the free
market: when individuals pursue their self-interest under justice in a free and competitive market, they
unintentionally promote the good of society (Adams, 2001). Neoliberal scholars developed Adam Smith’s
invisible hand into general equilibrium theory, proving that several free and fully competitive interacting
markets can reach an overall equilibrium automatically, maximizing the welfare of every market participant
(Samuelson, 1953). The key features of Neoliberal reform are therefore the reduction of welfare and
taxation and the deregulation of the finance industry in order to promote free markets (Prasad, 2006).
Figure 38 Financial Business and Nonfinancial Business Augmented Rates of Profit in U.S. (Bakir &
Campbell, 2013)
6.3.3 Financial innovation and income inequality
Neoliberal policies did work after the 1980s. As shown in figure 38, the declining trend in the rate of profit
of both financial and nonfinancial business in the U.S. ceased in the early 1980s and began to reverse. The
side effects of Neoliberalism did not become clear until the 1990s, when the profit rate of finance began to
soar while that of nonfinancial business fluctuated around 5%-8%.
The deregulation of the financial sector made room for financial innovations, such as the powerful
speculation weapons based on data science techniques, or the subordinated debt and CDS that brought the
whole world into crisis in 2007. These financial innovations were invented in the pursuit of self-interest in
free markets but, contrary to the theoretical hypothesis, many of them have nothing to do with the good of
society; their result is to transfer wealth from the majority of society to the very rich. Paul Volcker, a
former chairman of the Federal Reserve, stated that the ATM was the only financial innovation that had
improved society (WSJ, 2009). Paul Krugman, a Nobel Prize winner in economics, views the rapid growth
of finance since the 1980s, the starting point of the Neoliberal era, “largely as rent-seeking, rather than true
productivity” (Krugman, 2009).
Figure 39 The top decile income share from 1917-2014 (Saez, 2013)
One of the social impacts of Neoliberalism and financial innovation is income inequality. As financial
speculators use data science techniques to extract profits from other investors, the financial industry drains
wealth from the real economy. It has been estimated that around $635 billion of wealth is transferred to the
financial sector in the United States each year (Turbeville, 2012). As shown in figure 39, the income share of
the richest 10% of American families started to rise in the 1980s and kept rising even after the financial
crisis of 2007, reaching 50% in 2012, the highest level since the Great Depression.
Figure 40 Real Median Household Income in the United States (source: U.S. Bureau of the Census)
While the rich get richer, the public suffers. Figure 40 shows that real median household income in the
United States reached its peak in 1998 and has been declining since then. Many factors have contributed to
the worsening of income inequality. Excessive financial speculation within an overgrown financial industry
is definitely one of them, if not the biggest.
7. Conclusion
Backtesting and automated trading systems are new tools that data science techniques have brought to
financial trading. As described and demonstrated in this research, these tools make it possible to build
various types of trading strategy in computer code and test them on real historical data. A trading strategy
can then be analysed and improved based on the evaluation of its backtesting results.
Many applications can be built on these new tools, such as various profitable trading strategies and high
frequency trading (HFT). These powerful applications have significant financial impacts. In this research,
three trading strategies are reproduced: the Mean Average strategy, the Improved Mean Average strategy
and the MACD strategy. When the three strategies are tested with the historical data of 15 listed companies,
the MACD and Improved Mean Average strategies perform much better than the Mean Average strategy,
making profits on all 15 stocks. The average compound return rate of the MACD strategy is 20% per year
over the previous five years. A much higher return rate can be achieved by applying the same trading
strategy to high frequency data. The average daily return rate of the MACD strategy in index futures is
around 30%.
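To make the notion of a compound return rate concrete, a minimal R sketch is given here. It assumes the usual definition of a compound annual return, (final value / initial value)^(1/years) - 1; the function name compound_annual_return and the example figures are hypothetical and are not taken from the backtesting results of this research.

```r
# Hypothetical helper: compound annual return from the start and end value of a portfolio
compound_annual_return <- function(initial_value, final_value, years) {
  stopifnot(initial_value > 0, final_value > 0, years > 0)
  (final_value / initial_value)^(1 / years) - 1
}

# Illustrative figures only: a 20% compound return per year over five years
# multiplies the starting capital by 1.2^5, i.e. about 2.49
growth_factor <- 1.2^5
rate <- compound_annual_return(initial_value = 100,
                               final_value   = 100 * growth_factor,
                               years         = 5)
```

Under this definition, a strategy that compounds at 20% per year for five years turns 100 units of capital into roughly 249.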
These new tools and applications are so profitable that they may have many potential impacts on both the
economy and society. Data science techniques dramatically enhance the profit rate of financial speculators,
who can improve their trading strategies at low cost with backtesting and then trade exactly according to
those strategies with automated trading systems. As a result, the balance between speculation and investing
in the finance industry may have been broken, as the return rate of financial speculation has soared with
data science techniques. However, excessive financial speculation may have negative social impacts, because
it subtracts value from society, consuming a great deal of both physical and human capital without
providing any socially beneficial service. Excessive speculation may also compel companies to sacrifice
long-term value in order to meet short-term goals set by financial markets. From a macro point of view, it is
the Neoliberal policies deployed since the 1980s that gave rise to various financial innovations. Many of
these innovations, driven by the pursuit of self-interest, contribute nothing to the good of society. Just as
speculative trading programs built with data science techniques extract profits from other investors,
financial innovations may transfer a great deal of wealth from the real economy to the financial sector,
worsening income inequality.
The limitation of this research is that its demonstrations and case studies are based on secondary sources
such as books and the literature. Data science techniques do enhance the power of financial speculation, but
it is still unclear what happens in real-world financial speculation. Future work may need to acquire more
market data or interview traders to gain a better understanding of these new tools. It is also suggested that
more policy-oriented research be done, in order to protect not only investors but also society from the harm
of excessive financial speculation.
References
Adams, I. (2001). Political ideology today. Manchester University Press.
Amihud, Y., & Mendelson, H. (1980). Dealership market: Market-making with
inventory. Journal of Financial Economics, 8(1), 31-53.
Angel, J. J., & McCabe, D. (2013). Fairness in financial markets: The case of high
frequency trading. Journal of business ethics, 112(4), 585-595.
Appel, G. (2003). Become Your Own Technical Analyst: How to Identify Significant
Market Turning Points Using the Moving Average Convergence-Divergence Indicator
or MACD. The Journal of Wealth Management, 6(1), 27-36.
Ashkenas, R. (2012). Thinking Long-Term in a Short-Term Economy. Retrieved July
10, 2016 from https://hbr.org/2012/08/thinking-long-term-in-a-short/
Austin, M. P., Bates, G., Dempster, M. A., Leemans, V., & Williams, S. N. (2004).
Adaptive systems for foreign exchange trading. Quantitative Finance, 4(4), 37-45.
Bailey, D. H., Borwein, J. M., de Prado, M. L., & Zhu, Q. J. (2014).
Pseudomathematics and financial charlatanism: The effects of backtest over fitting on
out-of-sample performance. Notices of the AMS, 61(5), 458-471.
Bakir, E., & Campbell, A. (2013). The Financial Rate of Profit: What is it, and how
has it behaved in the United States? Review of Radical Political Economics, 45(3),
295-304.
Bogle, J. C. (2011). The clash of the cultures. Journal of Portfolio Management, 37(3),
14.
Bondarenko, O. (2003). Statistical arbitrage and securities prices. Review of Financial
Studies, 16(3), 875-919.
Boot, A. W., & Thakor, A. V. (1997). Financial system architecture. Review of
Financial Studies, 10(3), 693-733.
Cao, L., Wang, J., Lin, L., & Zhang, C. (2004, September). Agent services-based
infrastructure for online assessment of trading strategies. In Intelligent Agent
Technology, 2004.(IAT 2004). Proceedings. IEEE/WIC/ACM International
Conference on (pp. 345-348). IEEE.
Carrion, A. (2013). Very fast money: High-frequency trading on the
NASDAQ. Journal of Financial Markets, 16(4), 680-711.
Cassidy, J. (2010). WHAT GOOD IS WALL STREET? Much of what investment
bankers do is socially worthless. Retrieved July 10, 2016 from
http://www.newyorker.com/magazine/2010/11/29/what-good-is-wall-street
Chong, T. T. L., & Ng, W. K. (2008). Technical analysis and the London stock
exchange: testing the MACD and RSI rules using the FT30. Applied Economics
Letters, 15(14), 1111-1114.
Commodity Futures Trading Commission. (2005). CFTC Glossary. Retrieved July
11, 2016 from
http://www.cftc.gov/ConsumerProtection/EducationCenter/CFTCGlossary/index.htm
Creamer, G., & Freund, Y. (2010). Automated trading with boosting and expert
weighting. Quantitative Finance, 10(4), 401-420.
Dempster, M. A., & Leemans, V. (2006). An automated FX trading system using
adaptive reinforcement learning. Expert Systems with Applications, 30(3), 543-552.
Easterly, W. (2001). The lost decades: developing countries' stagnation in spite of
policy reform 1980–1998. Journal of Economic Growth, 6(2), 135-157.
Elliott, R. J., Van Der Hoek, J., & Malcolm, W. P. (2005). Pairs trading. Quantitative
Finance, 5(3), 271-276.
Forbes. (2016). The World’s Billionaires. Retrieved July 10, 2016 from
http://www.forbes.com/profile/james-simons/
Goldstein, M. A., Kumar, P., & Graves, F. C. (2014). Computerized and
High‐Frequency Trading. Financial Review, 49(2), 177-202.
Graham, B., & Dodd, D. L. (1934). Security analysis: principles and technique.
McGraw-Hill.
Graham, J. R., Harvey, C. R., & Rajgopal, S. (2005). The economic implications of
corporate financial reporting. Journal of Accounting and Economics, 40(1), 3-73.
Hazen, T. L. (1991). Rational Investments, Speculation, or Gambling--Derivative
Securities and Financial Futures and Their Effect on the Underlying Capital
Markets. Nw. UL Rev., 86, 987.
Hendershott, T., & Riordan, R. (2013). Algorithmic trading and the market for
liquidity. Journal of Financial and Quantitative Analysis, 48(04), 1001-1024.
Hiltonsmith, R. (2012). The Retirement Savings Drain: The Hidden and Excessive
Costs of 401 (k) s. Demos, May, 29.
Jones, C. M. (2013). What do we know about high-frequency trading? Columbia
Business School Research Paper, (13-11).
Kabacoff, R. (2015). R in action: data analysis and graphics with R. Manning
Publications Co.
Kitchin, R. (2014). The data revolution. SAGE publications.
Krugman, P. (2009). Darling, I love you. Retrieved July 10, 2016 from
http://krugman.blogs.nytimes.com/2009/12/09/darling-i-love-you
Lauricella, T. (2010). Market Plunge Baffles Wall Street. Retrieved November 11,
2015 from
http://www.wsj.com/articles/SB10001424052748704370704575228664083620340
Lefevre, E. (2004). Reminiscences of a stock operator (Vol. 175). John Wiley & Sons.
Levine, D. M. (2013). A day in the quiet life of a NYSE floor trader. Retrieved April
10, 2016 from
http://fortune.com/2013/05/29/a-day-in-the-quiet-life-of-a-nyse-floor-trader/
Lewis, M. (2014). Flash boys: a Wall Street revolt. WW Norton & Company.
Liu, Z., & Xiao, D. (2009). An automated trading system with multi-indicator fusion
based on DS evidence theory in forex market. In Fuzzy Systems and Knowledge
Discovery, 2009. FSKD'09. Sixth International Conference on (Vol. 3, pp. 239-243).
IEEE.
Livermore, J. (2006). How to trade in stocks. McGraw Hill Professional.
Lopez, J. A., & Saidenberg, M. R. (2000). Evaluating Credit Risk Models. Journal of
Banking and Finance, 24, 151-167.
Madura, J. (2012). Financial Institutions and Markets (10th Edition). Ohio, United
States: Cengage South-Western.
Manahov, V., Hudson, R., & Gebka, B. (2014). Does high frequency trading affect
technical analysis and market efficiency? And if so, how? Journal of International
Financial Markets, Institutions and Money, 28, 131-157.
Maier-Paape, S., & Platen, A. (2014). Backtest of trading systems on candle
charts. arXiv preprint arXiv:1412.5558.
Mallaby, S. (2010). More money than god: Hedge funds and the making of the new
elite. A&C Black.
Marshall, B. R., Nguyen, N. H., & Visaltanachoti, N. (2013). ETF arbitrage: Intraday
evidence. Journal of Banking & Finance, 37(9), 3486-3498.
Masteika, S., Rutkauskas, A. V., & Alexander, J. A. (2012, February). Continuous
futures data series for back testing and technical analysis. In Conference Proceedings,
3rd International Conference on Financial Theory and Engineering (Vol. 29, pp.
265-269). IACSIT Press.
Menkveld, A. J. (2014). High‐Frequency Traders and Market Structure. Financial
Review, 49(2), 333-344.
Mises, L. V. (2010). The Wealth of Generations: Warren Buffett. Retrieved July 10,
2016 from http://thewealthofgenerations.blogspot.co.uk/2010/01/warren-buffett.html
Mishkin, F. S. (2007). The economics of money, banking, and financial markets.
Pearson education.
Ni, J., & Zhang, C. (2005). An efficient implementation of the backtesting of trading
strategies. In Parallel and Distributed Processing and Applications (pp. 126-131).
Springer Berlin Heidelberg.
O’Hara, M. (2015). High frequency market microstructure. Journal of Financial
Economics, 116(2), 257-270.
Ostry, J., Loungani, P., & Furceri, D. (2016). Neoliberalism: Oversold? Finance &
Development, June 2016, Vol. 53, No. 2.
Pasquale, F. (2015). The black box society: The secret algorithms that control money
and information. Harvard University Press.
Pilbeam, K. (2010). Finance and financial markets. Palgrave Macmillan.
Pollin, R., & Heintz, J. (2011). Transaction Costs, Trading Elasticities and the
Revenue Potential of Financial Transaction Taxes for the United States. Research
Brief, December 2011, 1-16.
Prasad, M. (2006). The politics of free markets: The rise of neoliberal economic
policies in Britain, France, Germany, and the United States (Vol. 19). Chicago:
University of Chicago Press.
Provost, F., & Fawcett, T. (2013). Data science and its relationship to big data and
data-driven decision making. Big Data, 1(1), 51-59.
Reinhart, C. M., & Rogoff, K. S. (2008). Is the 2007 US sub-prime financial crisis so
different? An international historical comparison (No. w13761). National Bureau of
Economic Research.
Renaissance Capital. (2016). IPO Center. Retrieved July 10, 2016 from
http://www.renaissancecapital.com/ipohome/press/mediaroom.aspx?market=us
Renaissance Technologies. (2016). Job Opening for Quantitative Finance. Retrieved
July 10, 2016 from https://www.rentec.com/Jobs.action?research=true
Rubin, R., & Collins, M. (2015). How an Exclusive Hedge Fund Turbocharged Its
Retirement Plan. Retrieved July 10, 2016 from
http://www.bloomberg.com/news/articles/2015-06-16/how-an-exclusive-hedge-fund-turbocharged-retirement-plan
Saez, E. (2013). Striking it Richer: The Evolution of Top Incomes in the United States
(updated with 2012 preliminary estimates). Berkeley: University of California,
Department of Economics. http://elsa.berkeley.edu/~saez/saez-UStopincomes-2012.pdf
and The World Top Incomes Database. http://topincomes.gmond.parisschoolofeconomics.eu
Samuelson, P. A. (1953). Prices of factors and good in general equilibrium. The
Review of Economic Studies, 1-20.
Schutt, R. & O'Neil, C. (2013). Doing Data Science. California, United States:
O’Reilly Media.
Securities and Exchange Commission. (2010). Concept Release on Equity Market
Structure (Release No. 34-61358). Washington, DC: U.S.
Commodity Futures Trading Commission, & Securities and Exchange Commission.
(2010). Findings regarding the market events of May 6, 2010. Report of the Staffs of
the CFTC and SEC to the Joint Advisory Committee on Emerging Regulatory Issues.
Serbera, J. P., & Paumard, P. (2016). The fall of high-frequency trading: A survey of
competition and profits. Research in International Business and Finance, 36,
271-287.
Selyukh, A. (2013). Hackers send fake market-moving AP tweet on White House
explosions. Retrieved April 9, 2016 from
http://www.reuters.com/article/net-us-usa-whitehouse-ap-idUSBRE93M12Y20130423
Shaffer, L., China’s wild swings spark confusion in markets. Retrieved November 11,
2015 from http://www.cnbc.com/id/100967541
Solomon, J. (2010). Apple stock now costs $94. Fans love it. Retrieved July 10, 2016
from http://money.cnn.com/2014/06/09/investing/apple-stock-split-reactions/
Steiner, C. (2010). Wall Street's speed war. Retrieved July 10, 2016 from
http://www.forbes.com/forbes/2010/0927/outfront-netscape-jim-barksdale-daniel-spivey-wall-street-speed-war.html
Strine Jr, L. E. (2010). One Fundamental Corporate Governance Question We Face:
Can Corporations Be Managed for the Long Term Unless Their Powerful Electorates
Also Act and Think Long Term?. The Business Lawyer, 1-26.
Szado, E. (2011). Defining speculation: The first step toward a rational dialogue. The
Journal of Alternative Investments, 14(1), 75.
The Crises. (2009). Income Inequality in the US. Retrieved July 10, 2016 from
http://www.the-crises.com/income-inequality-in-the-us-1/
Tong, L. (2014). A blessing or a curse? The impact of high frequency trading on
institutional investors. In The Impact of High Frequency Trading on Institutional
Investors (October 5, 2015). European Finance Association Annual Meetings.
Turbeville, W. C. (2012). A New Perspective on the Costs and Benefits of Financial
Regulation: Inefficiency of Capital Intermediation in a Deregulated System. Md. L.
Rev., 72, 1173.
Volscho, T. (2015). The Revenge of the Capitalist Class: Crisis, the Legitimacy of
Capitalism and the Restoration of Finance from the 1970s to Present. Critical
Sociology, 0896920515589003.
Wall Street Journal. (2009). Paul Volcker: Think More Boldly. Retrieved July 10,
2016 from
http://www.wsj.com/articles/SB10001424052748704825504574586330960597134
Zimmer, B. (2010). Quants. Retrieved July 10, 2016 from
http://www.nytimes.com/2010/05/16/magazine/16FOB-OnLanguage-t.html?_r=0
Appendices
A: Websites of Renaissance Technologies
B: Codes programmed for this research with R studio
setwd("U:/ Disertation/program")
#-------------------#
#5.1.1 Get data from Yahoo
#-------------------#
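#Install quantmod only if it is not already available
if(!requireNamespace("quantmod",quietly=TRUE)) install.packages("quantmod")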
library(quantmod)
stock_data<-getSymbols("AAPL",from = "2010-01-01",to = "2016-06-21",src =
"yahoo",auto.assign=FALSE,row.names= 1)
colnames(stock_data)<-c('OPEN','HIGH','LOW','CLOSE','VOLUME','ADJUSTED')
#An xts object stores a single numeric matrix, so build a data frame to hold the character DATE column
#together with the four price columns (DATE, OPEN, HIGH, LOW, CLOSE)
stock_data<-data.frame(DATE=as.character(index(stock_data)),coredata(stock_data)[,1:4],stringsAsFactors=FALSE)
write.csv(stock_data, file="data/AAPL.csv",row.names=F,quote=F)
#-------------------#
#5.1.2 Visualization
#-------------------#
Cplot(stock_data)
#-------------------#
#Plot C-graph
Cplot<-function(stock)
{
par(mar=c(2,4,1.5,0.5))
N<-length(stock$OPEN)
plot(c(1:N),stock$CLOSE,type='n',xaxt='n',xlab='',ylab='Price',font.axis=1.5)
title(main="APPLE",cex=2,col='black')
w<-0.3
for(i in 1:N)
{
D<-as.numeric(stock$CLOSE[i])-as.numeric(stock$OPEN[i])
lines(c(i,i),c(stock$LOW[i],stock$HIGH[i]),col='black',lwd=1)
x<-c(i-w,i-w,i+w,i+w)
y<-c(stock$OPEN[i],stock$CLOSE[i],stock$CLOSE[i],stock$OPEN[i])
if(D<0)
{
polygon(x,y,col='red',border='red')
} else
{
polygon(x,y,col='green',border='green')
}
}
Index<-seq(from=1,to=N,length=5)
Index<-round(Index)
Text<-stock$DATE[Index]
axis(side=1,Index,labels=Text,cex.axis=1)
}
#-------------------#
#5.1.3 Deal with stock split
#-------------------#
stock<-read.csv("data/AAPL.csv",header=T,encoding='UTF-8',as.is=TRUE)
stock<-Stock_Split(stock)
Cplot(stock)
#-------------------#
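#Function for APPLE stock split
#Rows 1 to 1114 are the trading days from 2010-01-04 to 2014-06-06, i.e. before Apple's 7-for-1 split
#took effect on 2014-06-09; their prices are divided by 7 to be comparable with post-split prices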
Stock_Split<-function(stock)
{
for(i in 1:1114)
{
stock$OPEN[i]<-stock$OPEN[i]/7
stock$HIGH[i]<-stock$HIGH[i]/7
stock$LOW[i]<-stock$LOW[i]/7
stock$CLOSE[i]<-stock$CLOSE[i]/7
}
return(stock)
}
#-------------------#
#-------------------#
#5.2.1 Calculate MA
#-------------------#
stock<-Calculate_MA(stock)
Kplot_MA(stock)
#-------------------#
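#Calculate MA5 and MA20: the 5-day and 20-day moving averages of the closing price
#(rows without enough history keep the initial value 0)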
Calculate_MA<-function(stock)
{
stock$MA5<-as.numeric(0)
stock$MA20<-as.numeric(0)
N<-length(stock$DATE)
for(i in 5:N)
{
stock$MA5[i]<-mean(stock$CLOSE[(i-4):i])
}
for(i in 20:N)
{
stock$MA20[i]<-mean(stock$CLOSE[(i-19):i])
}
return(stock)
}
#-------------------#
# Plot MA
Kplot_MA<-function(stock,Number)
{
par(mar=c(2,4,1.5,0.5))
N<-length(stock$OPEN)
plot(c(1:N),stock$CLOSE,type='n',xaxt='n',xlab='',ylab='Price',font.axis=1.5)
title(main="APPLE",cex=2,col='black')
w<-0.3
for(i in 1:N)
{
D<-as.numeric(stock$CLOSE[i])-as.numeric(stock$OPEN[i])
lines(c(i,i),c(stock$LOW[i],stock$HIGH[i]),col='black',lwd=1)
x<-c(i-w,i-w,i+w,i+w)
y<-c(stock$OPEN[i],stock$CLOSE[i],stock$CLOSE[i],stock$OPEN[i])
if(D<0)
{
polygon(x,y,col='red',border='red')
} else
{
polygon(x,y,col='green',border='green')
}
}
lines(c(1:N),stock$MA5,lwd=2,col="black")
lines(c(1:N),stock$MA20,lwd=2,col="orange")
Index<-seq(from=1,to=N,length=5)
Index<-round(Index)
Text<-stock$DATE[Index]
axis(side=1,Index,labels=Text,cex.axis=1)
}
#-------------------#
#-------------------#
#5.2.2 Backtesting MA strategy
#-------------------#
stock<-Cut_Period(stock,253,1628)
stock<-MA_Strategy(stock)
#-------------------#
Cut_Period<-function(data,start,end)
{
data<-data[c(start:end),]
return(data)
}
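#-------------------#
#MA crossover strategy: Trade is '1' (buy) when MA5 crosses above MA20 and '-1' (sell) when MA5
#crosses below MA20; Hold records the position held after each day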
MA_Strategy<-function(stock)
{
stock$Trade<-0
stock$Hold<-0
N<-length(stock$DATE)
flag_hold<-'0'
for (i in 2:N)
{
if(stock$MA5[i-1]<=stock$MA20[i-1] & stock$MA5[i]>stock$MA20[i] & stock$Hold[i-1]!='1')
{
stock$Trade[i]<-'1'
flag_hold<-'1'
}
if(stock$MA5[i-1]>=stock$MA20[i-1] & stock$MA5[i]<stock$MA20[i] & stock$Hold[i-1]!='-1')
{
stock$Trade[i]<-'-1'
flag_hold<-'-1'
}
stock$Hold[i]<-flag_hold
}
return(stock)
}
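#-------------------#
# Illustrative sketch, not part of the original listing: the two crossover
# conditions tested inside MA_Strategy, written in vectorised form on toy data.
# demo_fast and demo_slow are made-up stand-ins for MA5 and MA20. Unlike
# MA_Strategy, this sketch does not track the Hold state, so it only marks
# where the two lines cross.
demo_fast<-c(1,1,3,4,3,1,1)
demo_slow<-c(2,2,2,2,2,2,2)
n_demo<-length(demo_fast)
prev_fast<-demo_fast[-n_demo]
prev_slow<-demo_slow[-n_demo]
cur_fast<-demo_fast[-1]
cur_slow<-demo_slow[-1]
# Upward cross: fast line at or below the slow line yesterday, above it today
cross_up<-c(FALSE,prev_fast<=prev_slow & cur_fast>cur_slow)
# Downward cross: fast line at or above the slow line yesterday, below it today
cross_down<-c(FALSE,prev_fast>=prev_slow & cur_fast<cur_slow)
# 1 marks a buy signal, -1 a sell signal, 0 no signal
demo_signal<-ifelse(cross_up,1,ifelse(cross_down,-1,0))
demo_signal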
#-------------------#
#5.2.3 Compute profits and visualization
#-------------------#
stock<-Calcualate_Profit(stock)
Kplot_Profit(stock,"APPLE")
Calcualate_Profit<-function(stock)
{
stock$Profit<-as.numeric(0)
stock$Delta_Profit<-0
flag_first_trade<-0
N<-length(stock$DATE)
for (i in 2:N)
{
if(stock$Trade[i]!='0' & flag_first_trade=='1')
{
# Profit of the position being closed: entered at CLOSE[last_trade], exited at OPEN[i]
stock$Delta_Profit[i]<-((-1)*stock$OPEN[i]*as.numeric(stock$Trade[i])
-stock$CLOSE[last_trade]*as.numeric(stock$Trade[last_trade]))
last_trade<-i
}
if(stock$Trade[i]!='0' & flag_first_trade=='0')
{
flag_first_trade<-'1'
last_trade<-i
}
stock$Profit[i]<-stock$Profit[i-1]+stock$Delta_Profit[i]
}
return(stock)
}
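#-------------------#
# Illustrative sketch, not part of the original listing: the Delta_Profit
# arithmetic worked through on made-up prices. A trade value of 1 is a buy and
# -1 is a sell, as in MA_Strategy; the position is entered at the CLOSE of the
# earlier signal day and exited at the OPEN of the later one. The helper name
# demo_profit is an assumption for this example.
demo_profit<-function(open_now,trade_now,close_last,trade_last)
{
return((-1)*open_now*trade_now-close_last*trade_last)
}
# Long position: bought at a close of 100, sold at an open of 110
demo_long<-demo_profit(110,-1,100,1)
# Short position: sold at a close of 108, bought back at an open of 105
demo_short<-demo_profit(105,1,108,-1)
c(demo_long,demo_short)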
#-------------------#
# Plot Profits
Kplot_Profit<-function(stock,stock_name)
{
m<-matrix(c(1,2,1,2),2,2)
N<-length(stock$OPEN)
layout(m,heights=c(3,1))
par(mar=c(0.5,2.5,1.5,0.5))
plot(c(1:N),stock$CLOSE,type='n',xaxt='n',xlab='',ylab='Price',font.axis=1.5,cex.axis=1.8)
title(main=stock_name,cex=4,col='black')
w<-0.3
for(i in 1:N)
{
D<-as.numeric(stock$CLOSE[i])-as.numeric(stock$OPEN[i])
lines(c(i,i),c(stock$LOW[i],stock$HIGH[i]),col='black',lwd=1)
x<-c(i-w,i-w,i+w,i+w)
y<-c(stock$OPEN[i],stock$CLOSE[i],stock$CLOSE[i],stock$OPEN[i])
if(D<0)
{
polygon(x,y,col='red',border='red')
if(stock$Trade[i]=='1'){text(i,as.numeric(stock$LOW[i])-4,"buy",col='green',cex=2)}
if(stock$Trade[i]=='-1'){text(i,as.numeric(stock$HIGH[i])+5,"sell",col='red',cex=2)}
} else
{
polygon(x,y,col='green',border='green')
if(stock$Trade[i]=='1'){text(i,as.numeric(stock$LOW[i])-4,"buy",col='green',cex=2)}
if(stock$Trade[i]=='-1'){text(i,as.numeric(stock$HIGH[i])+5,"sell",col='red',cex=2)}
}
}
text((N*0.1),(max(as.numeric(stock$HIGH))-10),paste("Profit rate:",Profit_rate(stock),"%"),col='black',cex=2)
lines(c(1:N),stock$MA5,lwd=2,col="black")
lines(c(1:N),stock$MA20,lwd=2,col="orange")
Index<-seq(from=1,to=N,length=5)
Index<-round(Index)
Text<-stock$DATE[Index]
axis(side=1,Index,labels=Text,cex.axis=1.8)
plot(c(1:N),stock$Profit,type='n',xaxt='n',xlab='',ylab='Profit',font.axis=1.5,cex.axis=1.8)
abline(0,0)
lines(c(1:N),stock$Profit,lwd=2,col="blue")
}
#-------------------#
#-------------------#
#5.2.4 Evaluation and discussion
#-------------------#
temp<-Cut_Period(stock,1,270)
Kplot_Profit(temp,"APPLE")
#-------------------#
#-------------------#
#5.3.1 Avoid trades in range bound
#-------------------#
profits<-c()
for (i in 1:30)
{
stock<-MA_Avoid_Range_Bound(stock,i)
stock<-Calcualate_Profit(stock)
profits[i]<-Profit_rate(stock)
}
max(profits)
i<-which.max(profits)
i
profits
stock<-MA_Avoid_Range_Bound(stock,13)
stock<-Calcualate_Profit(stock)
Kplot_Profit(stock,"APPLE")
MA_Avoid_Range_Bound<-function(stock,N)
{
stock$Signal<-0
stock$Trade<-0
stock$Hold<-0
Len<-length(stock$DATE)
flag_hold<-'0'
for (i in (N+1):Len)
{
if(stock$MA5[i-1]<=stock$MA20[i-1] & stock$MA5[i]>stock$MA20[i] & stock$Hold[i-1]!='1')
{
stock$Signal[i]<-'1'
if(sum(as.numeric(stock$Signal[(i-N):(i-1)]))==0)
{
stock$Trade[i]<-'1'
flag_hold<-'1'
}
}
if(stock$MA5[i-1]>=stock$MA20[i-1] & stock$MA5[i]<stock$MA20[i] & stock$Hold[i-1]!='-1')
{
stock$Signal[i]<-'1'
if(sum(as.numeric(stock$Signal[(i-N):(i-1)]))==0)
{
stock$Trade[i]<-'-1'
flag_hold<-'-1'
}
}
stock$Hold[i]<-flag_hold
}
return(stock)
}
Profit_rate<-function(stock)
{
profit<-stock$Profit[length(stock$Profit)]
average_cost<-(min(stock$CLOSE)+max(stock$CLOSE))/2
return(round(100*profit/average_cost,2))
}
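#-------------------#
# Illustrative sketch, not part of the original listing: the Profit_rate
# formula on made-up numbers. The final cumulative profit is divided by the
# midpoint of the lowest and highest close and expressed as a percentage.
demo_stock<-data.frame(CLOSE=c(40,50,60),Profit=c(0,5,15))
demo_cost<-(min(demo_stock$CLOSE)+max(demo_stock$CLOSE))/2
demo_rate<-round(100*demo_stock$Profit[nrow(demo_stock)]/demo_cost,2)
demo_rate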
#-------------------#
#-------------------#
#5.4 Using technical indicators
#-------------------#
stock<-read.csv("data/AAPL.csv",header=T,encoding='UTF-8',as.is=TRUE)
stock<-Stock_Split(stock)
Cplot(stock)
stock<-Calculate_MACD(stock)
stock<-Cut_Period(stock,253,1628)
stock<-MACD_Cross(stock)
stock<-Calcualate_Profit(stock)
Kplot_MACD(stock,"APPLE")
temp<-Cut_Period(stock,1,330)
Kplot_MACD(temp,"APPLE")
# Calculate MACD
Calculate_MACD<-function(stock)
{
library(TTR)
stock$EMA12<-'0'
stock$EMA26<-'0'
stock$DIFF<-'0'
stock$DEA<-'0'
stock$MACD<-'0'
stock$EMA12<-EMA(stock$CLOSE,12)
stock$EMA26<-EMA(stock$CLOSE,26)
stock$DIFF<-stock$EMA12-stock$EMA26
stock$DEA<-EMA(stock$DIFF,9)
stock$MACD<-2*(stock$DIFF-stock$DEA)
return(stock)
}
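#-------------------#
# Illustrative sketch, not part of the original listing: the DIFF, DEA and MACD
# columns rebuilt with a hand-written EMA in base R. The helper demo_ema, its
# seeding with the simple mean of the first n values, the short periods and the
# toy prices are assumptions for this example; TTR::EMA may differ in details
# such as its handling of missing values.
demo_ema<-function(x,n)
{
alpha<-2/(n+1)
out<-rep(NA_real_,length(x))
# Skip leading NAs, then seed with the simple mean of the first n values
first<-which(!is.na(x))[1]
seed<-first+n-1
if(is.na(first) || seed>length(x)) return(out)
out[seed]<-mean(x[first:seed])
if(seed<length(x))
{
for(i in (seed+1):length(x))
{
out[i]<-alpha*x[i]+(1-alpha)*out[i-1]
}
}
return(out)
}
demo_price<-c(1,2,3,4,5,6,7,8,9,10)
# Fast EMA minus slow EMA, then a signal line, then the histogram
demo_diff<-demo_ema(demo_price,2)-demo_ema(demo_price,4)
demo_dea<-demo_ema(demo_diff,3)
demo_macd<-2*(demo_diff-demo_dea)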
#-------------------#
#MACD Cross strategy
MACD_Cross<-function(stock)
{
stock$Trade<-0
stock$Hold<-0
N<-length(stock$DATE)
flag_hold<-'0'
for (i in 2:N)
{
if(stock$DIFF[i-1]<=stock$DEA[i-1] & stock$DIFF[i]>stock$DEA[i] & stock$Hold[i-1]!='1')
{
stock$Trade[i]<-'1'
flag_hold<-'1'
}
if(stock$DIFF[i-1]>=stock$DEA[i-1] & stock$DIFF[i]<stock$DEA[i] & stock$Hold[i-1]!='-1')
{
stock$Trade[i]<-'-1'
flag_hold<-'-1'
}
stock$Hold[i]<-flag_hold
}
return(stock)
}
#-------------------#
#Plot MACD
Kplot_MACD<-function(stock,stock_name)
{
m<-matrix(c(1,2,3,1,2,3),3,2)
N<-length(stock$OPEN)
layout(m,heights=c(3,1,1))
par(mar=c(0.5,2.5,1.5,0.5))
plot(c(1:N),stock$CLOSE,type='n',xaxt='n',xlab='',ylab='Price',font.axis=1.5,cex.axis=1.8)
title(main=stock_name,cex=2,col='black')
w<-0.3
for(i in 1:N)
{
D<-as.numeric(stock$CLOSE[i])-as.numeric(stock$OPEN[i])
lines(c(i,i),c(stock$LOW[i],stock$HIGH[i]),col='black',lwd=1)
x<-c(i-w,i-w,i+w,i+w)
y<-c(stock$OPEN[i],stock$CLOSE[i],stock$CLOSE[i],stock$OPEN[i])
if(D<0)
{
polygon(x,y,col='green',border='green')
if(stock$Trade[i]=='1'){text(i,as.numeric(stock$LOW[i])-0.2,"buy",col='green',cex=2)}
if(stock$Trade[i]=='-1'){text(i,as.numeric(stock$HIGH[i])+0.2,"sell",col='red',cex=2)}
} else
{
polygon(x,y,col='red',border='red')
if(stock$Trade[i]=='1'){text(i,as.numeric(stock$LOW[i])-1.5,"buy",col='green',cex=2)}
if(stock$Trade[i]=='-1'){text(i,as.numeric(stock$HIGH[i])+1.5,"sell",col='red',cex=2)}
}
}
Index<-seq(from=1,to=N,length=5)
Index<-round(Index)
Text<-stock$DATE[Index]
axis(side=1,Index,labels=Text,cex.axis=1.8)
text((N*0.1),(max(as.numeric(stock$HIGH))-10),paste("Profit rate:",Profit_rate(stock),"%"),col='black',cex=2.5)
plot(c(1:N),stock$DIFF,type='n',xaxt='n',xlab='',ylab='MACD',font.axis=1.5,cex.axis=1.8)
w<-0.1
for(i in 1:N)
{
if(stock$MACD[i]<0)
{
lines(c(i,i),c(stock$MACD[i],0),col='red',lwd=1)
}
if(stock$MACD[i]>0)
{
lines(c(i,i),c(0,stock$MACD[i]),col='green',lwd=1)
}
}
abline(0,0)
lines(c(1:N),stock$DIFF,lwd=2,col="red")
lines(c(1:N),stock$DEA,lwd=2,col="blue")
plot(c(1:N),stock$Profit,type='n',xaxt='n',xlab='',ylab='Profit',font.axis=1.5,cex.axis=1.8)
abline(0,0)
lines(c(1:N),stock$Profit,lwd=2,col="blue")
}
#-------------------#
#-------------------#
#5.5 Applying same strategy to other stocks
#-------------------#
stock_list<-c("NYSE:BRK.A","NYSE:JPM","NYSE:XOM","NYSE:TM","NYSE:T","HSBA.L","NYSE:C","NYSE:WMT",
"KRX:005930","VOW.F","NASDAQ:MSFT","NASDAQ:GOOGL","F","NYSE:IBM","VTX:NESN")
stock_Num<-15
stock_code<-stock_list[stock_Num]
# getSymbols is provided by the quantmod package
library(quantmod)
stock_data<-getSymbols(stock_code,from="2010-01-01",to="2016-06-21",src="google",auto.assign=FALSE,row.names=1)
stock_data<-stock_data[,c(1:4)]
colnames(stock_data)<-c('OPEN','HIGH','LOW','CLOSE')
# Convert the xts object to a data frame with DATE as the first column,
# so that the price columns stay numeric
stock<-data.frame(DATE=as.character(index(stock_data)),coredata(stock_data),stringsAsFactors=FALSE)
stock<-Calculate_MA(stock)
stock<-Calculate_MACD(stock)
stock<-stock[c(253:length(stock$OPEN)),]
stock_file<-paste("data/",stock_Num,".csv",sep="")
write.csv(stock, file=stock_file,row.names=F,quote=F)
stock_file<-paste("data/",stock_Num,".csv",sep="")
stock<-read.csv(stock_file,header=T,encoding='UTF-8',as.is=TRUE)
stock<-MA_Strategy(stock)
stock<-Calcualate_Profit(stock)
Kplot_Profit(stock,stock_list[stock_Num])
pro<-c()
for (i in 1:30)
{
stock<-MA_Avoid_Range_Bound(stock,i)
stock<-Calcualate_Profit(stock)
pro[i]<-Profit_rate(stock)
}
max(pro)
i<-which.max(pro)
i<-21
stock<-MA_Avoid_Range_Bound(stock,i)
stock<-Calcualate_Profit(stock)
Kplot_Profit(stock,stock_list[stock_Num])
stock<-MACD_Cross(stock)
stock<-Calcualate_Profit(stock)
Kplot_MACD(stock,stock_list[stock_Num])
#-------------------#
#5.6 Applying MACD strategy to high frequency trading
#-------------------#
# pattern is a regular expression, so match names ending in ".csv"
temp<-list.files(path="data/if_csv",pattern="\\.csv$",all.files=T)
pro_fut<-c()
i=1
future_Num<-substr(temp[i],1,17)
future_file<-paste("data/if_csv/",future_Num,".csv",sep="")
index_future<-read.csv(future_file,header=T,encoding='UTF-8',as.is=TRUE)
index_future<-MACD_Cross(index_future)
index_future<-Calcualate_Profit(index_future)
Kplot_MACD(index_future,future_Num)
pro_fut[i]<-Profit_rate(index_future)
pro_fut
mean(pro_fut)
Information School.
Access to Dissertation
A Dissertation submitted to the University may be held by the Department (or School) within which the
Dissertation was undertaken and made available for borrowing or consultation in accordance with
University Regulations.
Requests for the loan of dissertations may be received from libraries in the UK and overseas. The
Department may also receive requests from other organisations, as well as individuals. The conservation
of the original dissertation is better assured if the Department and/or Library can fulfill such requests by
sending a copy. The Department may also make your dissertation available via its web pages.
In certain cases where confidentiality of information is concerned, if either the author or the supervisor so
requests, the Department will withhold the dissertation from loan or consultation for the period specified
below. Where no such restriction is in force, the Department may also deposit the Dissertation in the
University of Sheffield Library.
To be completed by the Author – Select (a) or (b) by placing a tick in the appropriate box
If you are willing to give permission for the Information School to make your dissertation available in
these ways, please complete the following:
√ (a) Subject to the General Regulation on Intellectual Property, I, the author, agree to this dissertation
being made immediately available through the Department and/or University Library for
consultation, and for the Department and/or Library to reproduce this dissertation in whole or
part in order to supply single copies for the purpose of research or private study
(b) Subject to the General Regulation on Intellectual Property, I, the author, request that this
dissertation be withheld from loan, consultation or reproduction for a period of [ ] years from
the date of its submission. Subsequent to this period, I agree to this dissertation being made
available through the Department and/or University Library for consultation, and for the
Department and/or Library to reproduce this dissertation in whole or part in order to supply
single copies for the purpose of research or private study
Name Duan Zhao
Department Information School
Signed Duan Zhao (赵端) Date Aug. 24. 2016
THIS SHEET MUST BE SUBMITTED WITH DISSERTATIONS BY DEPARTMENTAL REQUIREMENTS.