Data science in Financial Markets: How Do Data Science Techniques Reshape Financial Trading? A study submitted in partial fulfilment of the requirements for the degree of MSc Data Science at THE UNIVERSITY OF SHEFFIELD by Duan Zhao September 2016

Data science in Financial Markets: How Do Data Science Techniques Reshape Financial Trading? · MSc Data Science · dagda.shef.ac.uk/dispub/dissertations/2015-16/External/Duan_Zhao.pdf


Data science in Financial Markets:

How Do Data Science Techniques Reshape Financial Trading?

A study submitted in partial fulfilment

of the requirements for the degree of

MSc Data Science

at

THE UNIVERSITY OF SHEFFIELD

by

Duan Zhao

September 2016


Abstract

Background

Data science and data science techniques play a key role in modern financial markets and financial trading by providing new tools, such as backtesting and automated trading systems, and powerful applications, such as high frequency trading. These innovations have made the markets fundamentally different from those of the past.

Aims

The first aim of this research is to investigate new techniques created by Data Science for financial trading, and to try to understand the trading strategies, computer programs and implications buried in the secrecy of this black box from a micro perspective. The second aim is to investigate the potential social implications of the massive use of data science techniques in financial trading, and to understand how modern financial trading reshapes our economy and society from a macro perspective.

Methods

This research uses both qualitative and quantitative methods. The first part of the research involves building several computer programs that reproduce backtesting, an automated trading system and a few trading strategies. The second part of the research tries to develop an innovative method to integrate the micro and macro analysis. Drawing on these technical details together with insights from the literature, the potential social impacts of these tools are analysed and investigated.

Results

Three trading strategies, the Moving Average strategy, the Improved Moving Average strategy and the MACD strategy, are reproduced and backtested with the historical data of 15 listed companies. The MACD strategy and the Improved Moving Average strategy perform much better than the Moving Average strategy, making profits on all 15 stocks. The average compound return rate for the MACD strategy is 20% per year over the previous five years. A much higher return rate can be achieved by applying the same MACD trading strategy to high frequency data. The average daily return rate of the MACD strategy on index futures is around 30%.

Conclusions

Data science techniques dramatically enhance the profit rate for financial speculators. Excessive financial speculation may have negative social impacts, because it subtracts value from society, costing a great deal of both physical and human capital without providing any socially beneficial service. Excessive speculation may also compel companies to sacrifice long-term value in order to meet short-term goals set by financial markets. From a macro point of view, it is the neoliberal policies deployed since the 1980s that gave rise to various financial innovations. Many of these innovations aim to pursue self-interest, contribute nothing to the good of society, and may transfer a great deal of wealth from the real economy to the financial sector, resulting in worsening income inequality.


Acknowledgements

I would like to express my gratitude to my supervisor, Dr. Jo Bates, who has supported me throughout my dissertation with her patience, insightful comments and encouragement. It would have been impossible for me to complete this dissertation without her guidance.

I would also like to thank the Information School for creating a very interesting data science programme. I really enjoyed this year at Sheffield.

I would also like to thank my family, whose support and encouragement were more than I can express on paper.


Contents

1. Introduction
2. Research aims and objectives
3. Literature Review
3.1 Backtesting
3.2 Automated trading system
3.3 High frequency trading
3.3.1 HFT strategies
3.3.2 The winner-take-all nature of HFT
3.4 The impacts of utilising data science techniques on financial markets
4. Methodology
4.1 Data
4.2 Coding
4.3 Visualization
4.4 From micro technical details to macro social impacts
4.5 Ethical aspects
5. Demonstration and case study
5.1 Preparation of historical stock data
5.1.1 Download data
5.1.2 Visualization
5.1.2 Stock split
5.2 Moving Average strategy
5.2.1 Calculate Moving Average
5.2.2 Backtesting strategy
5.2.3 Compute profits and visualization
5.2.4 Evaluation and discussion
5.3 Improved Moving Average Strategy
5.3.1 Avoid trades in range bound
5.3.2 Chose the best value for N
5.3.3 Comparison between MA strategy and Improved MA strategy
5.4 Using technical indicators
5.5 Applying same strategy to other stocks
5.6 Applying MACD strategy to high frequency trading
6. Potential Social impacts of implying Data Science Technologies in finance trading
6.1 Promote financial speculation
6.1.1 Investing vs. speculation
6.1.2 Data Science techniques: super weapons for speculation
6.2 Potential social impacts of excessive speculation in finance trading
6.2.1 Financial speculation subtracts value from society
6.2.2 Waste of capital, both physical and human
6.2.3 Negative impacts on corporation governance
6.3 Financial industry, Neoliberalism and income inequality
6.3.1 The culture of speculation in finance sector
6.3.2 The Neoliberal reform in 1980s
6.3.3 Financial innovation and income inequality
7. Conclusion

References

Appendices

List of tables and figures

Table 1 Example of historical market data
Table 2 Backtesting of 15 stocks
Figure 1 E-Mini S&P 500 futures prices during the flash crash
Figure 2 Shanghai composite index during China’s wild swings
Figure 3 The interface of R studio
Figure 4 Backtesting result of the share price of Yahoo
Figure 5 Backtesting result of the share price of Facebook
Figure 6 Codes for downloading data
Figure 7 Historical data of Apple
Figure 8 Candlestick chart
Figure 9 Codes for plotting candlestick chart
Figure 10 Visualization of Apple historical share price
Figure 11 Codes for dealing with stock split
Figure 12 Visualization of Apple historical share price recalculated
Figure 13 Codes for calculating MA 5 & 20 lines
Figure 14 Apple historical share price with MA 5 and MA 20
Figure 15 Codes for MA strategy
Figure 16 Codes for calculating profits
Figure 17 Backtesting result for MA strategy on Apple
Figure 18 Codes for calculating profit rate
Figure 19 Backtesting result for MA strategy on Apple from Jan. 2011 to Apr. 2012
Figure 20 Codes of Improved MA strategy
Figure 21 Backtesting results for Improved MA strategy on Apple with different values of N
Figure 22 Codes of Improved MA strategy
Figure 23 Backtesting result for Improved MA strategy on Apple
Figure 24 Comparison between MA strategy and Improved MA strategy
Figure 25 MACD indicator
Figure 26 Codes for calculating MACD
Figure 27 Codes for MACD strategy
Figure 28 Backtesting results for MACD strategy
Figure 29 Backtesting results for MA strategy on Citi group
Figure 30 Backtesting results for MA strategy on Nestle
Figure 31 Backtesting results for Improved MA strategy on Citi group
Figure 32 Backtesting results for Improved MA strategy on Nestle
Figure 33 Backtesting results for MACD strategy on Volkswagen Group
Figure 34 Backtesting results for MACD strategy on Toyota Motor
Figure 35 Daily return rate of Index points
Figure 36 Backtesting for MACD strategy on index future on 2015-05-06
Figure 37 Backtesting for MACD strategy on index future on 2015-07-02
Figure 38 Financial Business and Nonfinancial Business Augmented Rates of Profit in U.S.
Figure 39 The top decile income share from 1917-2014
Figure 40 Real Median Household Income in the United States


1. Introduction

Data Science focuses on extracting information and knowledge from data and producing data-based products.

As Provost and Fawcett (2013) stated, “Data science is a set of fundamental principles that support and guide

the principled extraction of information and knowledge from data.” These principles are applied broadly and

intertwined closely with data mining, data analysis, data-driven decision making and big data. Data science

has proved its value in customer behaviour analysis, online recommendations and advertising, credit scoring,

supply-chain management, fraud detection, and financial trading (Schutt & O'Neil, 2013).

The main function of the financial system is to allocate resources efficiently by transferring capital from savers to borrowers (Boot & Thakor, 1997). This function is achieved through two means: bank lending and direct finance. Direct finance refers to the situation where borrowers raise funds by selling securities or issuing bonds through financial markets without using a third party (Mishkin, 2007). Financial markets consist of stock markets, bond markets, commodity markets, money markets, derivatives markets, futures markets and foreign exchange markets (Pilbeam, 2010). Despite the benefits of financial markets, an over-developed financial market may become a recipe for economic crises. As we have witnessed, the U.S. sub-prime mortgage crisis led the whole world into recession and harmed society deeply (Reinhart & Rogoff, 2008).

Data Science and data science techniques play a key role in modern financial markets and financial trading. As Kitchin (2014) pointed out, “With the rise of Data Science, a revolution is underway, reshaping how knowledge produced and business conducted from an infrastructural level”. In financial markets, data science has changed the way in which information and knowledge are gathered, processed, stored and reused in financial trading by providing new tools, such as backtesting and automated trading systems, and various applications, such as high frequency trading. These applications have made the markets fundamentally different from those of the past, “from the way traders trade, to the way markets are structured, to the way liquidity and price discovery arise” (O’Hara, 2015). At the same time, these techniques have also turned financial trading into a black box. As Pasquale (2015) stressed, the inputs and outputs of financial trading systems can be observed, but “we cannot tell how one becomes the other”.

To understand this evolution of the financial markets and trading, the following questions are raised and will

be investigated in this research: How do Data Science technologies reshape Financial Markets? What are the

data products produced by Data Science in financial trading? How do they work? What are the impacts of

these tools on the liquidity, stability and equality of financial markets? What are the potential social impacts

of these techniques?

2. Research aims and objectives

The first aim of this research is to investigate new techniques created by Data Science for financial trading,

and try to understand the trading strategy, computer program and implications buried in the secrecy of this

black box from a micro perspective.

The second aim is to investigate the social implications of the massive use of data science techniques in financial trading, and to understand how modern financial trading reshapes our economy and society from a macro perspective.

Objective 1:

New tools created by data science techniques, such as backtesting and automated trading systems, will be described and demonstrated with computer programs that I coded myself.

Objective 2:

The applications based on these new tools such as high frequency trading (HFT), will be analysed with case

study and visualization methods.


Objective 3:

Financial impacts of these data science applications in financial trading will be analysed through textual

analysis.

Objective 4:

To link the micro and macro, theories that explain the social influences of modern financial trading will be

discussed.

This dissertation is organised as follows. Section 3, the literature review, describes two basic tools produced by data science techniques for financial trading, backtesting and automated trading systems, which gave rise to high frequency trading (HFT). Section 4 explains the data and methods used in this research. To understand the technical details of those new tools, I programmed several trading strategies myself and backtested them with real historical market data. Both test results and data are visualised so that they can be better understood. Section 5 demonstrates the whole research process with the code of the computer programs, in order to provide a rough picture of how these new tools may create extraordinary profits in real financial trading. Building on this, Section 6 discusses the potential social impacts of these new tools and argues that many of them focus on financial speculation, which may not only waste social capital but also lead to a high degree of income inequality.

3. Literature Review

In this section, backtesting and automated trading systems are introduced first. With the help of these tools, traders can build various high frequency trading strategies, such as market making, arbitrage and directional trading. Most of these strategies have a winner-take-all nature, and it costs traders a great deal to maintain their advantage in the competition.

3.1 Backtesting

Backtesting is the practice of using historical financial data to test trading strategies or analytical models in order to see how they would actually have performed (Masteika et al., 2014). The assumption behind backtesting is that if a strategy worked in the past, it has a good, though not certain, chance of making profits in future, and conversely, if a strategy failed previously, it is unlikely to perform well in future (Lopez & Saidenberg, 2000). With the help of computer programs and historical trading data, backtesting enables investors and analysts to evaluate and optimize their trading strategies and models before using them in real trading. The results of a backtest are summarised by various statistical indicators. For example, the Sharpe ratio measures the risk-adjusted performance of a trading strategy by dividing the excess return one receives by the extra volatility one endures for following that strategy (Maier-Paape & Platen, 2014). Unlike traditional investment approaches, backtesting makes it possible for investors to test and improve strategies before facing any risk in real trading (Cao et al., 2004). Because of these benefits, backtesting is widely employed by investors, hedge funds and investment banks (Ni & Zhang, 2005).
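The Sharpe ratio described above can be illustrated with a minimal sketch. It is written in Python for illustration and is not the code used in this study; the function name, the zero default risk-free rate, the 252-day annualisation factor and the sample returns are all assumptions.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualised Sharpe ratio of a series of per-period returns.

    risk_free_rate is expressed per period; periods_per_year=252 assumes
    daily data (an illustrative convention, not taken from the study).
    """
    excess = [r - risk_free_rate for r in returns]
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


# Made-up daily returns, used only to show the call.
daily_returns = [0.01, -0.005, 0.007, 0.002, -0.003]
print(round(sharpe_ratio(daily_returns), 2))
```

A higher value means more excess return per unit of volatility, which is why the ratio is convenient for comparing strategies with different risk levels.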

Previous studies also show that one of the major problems of backtesting is overfitting. Overfitting is a machine learning concept describing a situation in which a model fits a particular set of observations but is unable to describe the general structure. For example, a trading strategy can be designed around a group of parameters that fits one set of historical data perfectly, but fails to make profits when the same strategy is applied to other periods of historical data or to real trading (Bailey et al., 2014). The risk of overfitting cannot be eliminated, but it can be reduced by lowering the number of parameters or by testing the same strategy on various types of historical data (Maier-Paape & Platen, 2014).
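One simple way to test a strategy on data it was not tuned on is a holdout check: choose the parameter on an earlier period, then measure the result on a later one. The sketch below is an illustrative Python example, not the procedure used in this study; the price-above-average rule, the function names and the 70/30 split are assumptions.

```python
def sma(prices, n):
    """Simple moving average; None until n prices are available."""
    return [None if i + 1 < n else sum(prices[i + 1 - n:i + 1]) / n
            for i in range(len(prices))]


def strategy_return(prices, n):
    """Total return from holding the asset only while price > its n-period average."""
    avg = sma(prices, n)
    growth = 1.0
    for i in range(len(prices) - 1):
        if avg[i] is not None and prices[i] > avg[i]:
            growth *= prices[i + 1] / prices[i]
    return growth - 1.0


def holdout_check(prices, windows, split=0.7):
    """Pick the best window in-sample, then report its out-of-sample return."""
    cut = int(len(prices) * split)
    train, test = prices[:cut], prices[cut:]
    best = max(windows, key=lambda n: strategy_return(train, n))
    return best, strategy_return(train, best), strategy_return(test, best)
```

A large gap between the in-sample and out-of-sample returns is a warning sign that the chosen parameter fits noise in the training period rather than a general pattern.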

3.2 Automated trading system

An automated trading system is a computer program developed to create orders automatically and submit them to the market exchange (Austin et al., 2004). An automated trading system has multiple functions: passing instant market data to trading strategies, submitting orders to the market exchange, managing risk, and dynamically optimizing itself to adapt to changing market conditions (Dempster & Leemans, 2006).

An automated trading system can execute repetitive tasks correctly at high speed, which is impossible for human beings. The key part of an automated trading system is the buy-sell rule, which is usually based on fundamental analysis or technical analysis. While fundamental analysis evaluates the true value of an asset using financial factors, such as the balance sheet and supply and demand, technical analysis focuses on the pattern of a security's historical price movements in order to forecast the future (Creamer & Freund, 2010). One major branch of technical analysis is indicator analysis. Indicators such as Moving Average Convergence Divergence (MACD) and the Relative Strength Index (RSI) can be calculated from past trading data and reflect characteristics of market movement (Liu & Xiao, 2009).
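To show how an indicator is derived from past prices, the following is a minimal sketch of the MACD calculation in Python. It is not the code used in this study. The 12/26/9 spans are the commonly used defaults, and both the spans and the choice to seed each average with the first observation are assumptions here.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]  # seed with the first observation (one common convention)
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a list of closing prices."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram
```

A buy-sell rule can then be built on the result, for example by treating the MACD line crossing above the signal line as a buy signal and the opposite crossing as a sell signal.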

3.3 High frequency trading

High frequency trading (HFT) is one of the major applications of data science techniques in financial trading. The term HFT covers a wide range of trading activities and behaviours, but they have at least three attributes in common: the trading is done by sophisticated computer programs, it depends on extraordinarily fast speed, and it is strategy-based (O’Hara, 2015). According to the definition from the U.S. Securities and Exchange Commission (SEC, 2010, p.45), high frequency trading refers to “professional traders acting in a proprietary capacity that engage in strategies that generate a large number of trades on a daily basis”.

As an application of data science techniques, a HFT program is an automated trading system pre-set with strategies that have been shown to be profitable by backtesting. A HFT program can run at high speed with ultra-low latency. Latency refers to the time interval between the moment a HFT program receives data from the market and the moment it sends orders back to the market (Carrion, 2013). After years of development, the latency of a HFT program can be reduced from the scale of milliseconds (one thousandth of a second) to even nanoseconds (one billionth of a second) when it runs on the best microchips and is co-located with the market exchange's computers (Serbera & Paumard, 2016).

3.3.1 HFT strategies

HFT deploys a wide variety of strategies, which can be divided into three categories, market making, arbitrage

and directional trading (Jones, 2013). Each category is discussed below.

Market making refers to posting limit orders on both the buy and sell sides of one financial asset and profiting from the bid-offer spread. Limit orders are orders for a certain number of shares set at a specified price on the order book, waiting for orders from other traders that meet the price requirement. Market making provides liquidity for the market, because market participants can trade immediately at the price provided by the market maker (Amihud & Mendelson, 1980). HFT market making strategies, however, differ from traditional market making in speed and sensitivity. Market makers bear the risk of losing money if their limit order prices are left behind by current information. A HFT market making strategy therefore adjusts its quotes at high frequency in response to new information (Menkveld, 2014). It also uses historical correlation patterns to adjust its quotes constantly. For example, if the movements of stock A and stock B have been highly correlated historically, then when the price of stock A moves up, a HFT market making strategy will immediately raise the price of its limit orders on stock B. As a result, HFT market making strategies tend to submit and cancel massive numbers of limit orders continuously (Jones, 2013).
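The quoting logic described above can be sketched in a few lines of Python. This is a deliberately simplified illustration rather than a real market making system; the function names, the symmetric half-spread and the sensitivity parameter beta are hypothetical.

```python
def make_quotes(mid_price, half_spread):
    """Return (bid, ask) limit prices placed symmetrically around a mid price."""
    return mid_price - half_spread, mid_price + half_spread


def requote_on_peer_move(mid_b, peer_return, beta):
    """Shift stock B's mid price after correlated stock A moves by peer_return.

    beta is a hypothetical sensitivity of B to A estimated from history.
    """
    return mid_b * (1 + beta * peer_return)


bid, ask = make_quotes(100.0, 0.05)
round_trip_profit = ask - bid  # earned per share if both orders are filled

# Stock A rises by 1%, so the quotes on stock B are cancelled and re-posted higher.
new_mid = requote_on_peer_move(100.0, peer_return=0.01, beta=0.8)
new_bid, new_ask = make_quotes(new_mid, 0.05)
```

Each re-quote means cancelling the old pair of limit orders and submitting a new pair, which is why this kind of strategy generates so many order submissions and cancellations.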

HFT arbitrage depends heavily on data science techniques, including data mining, statistics and automated trading systems. Arbitrage as a trading strategy has existed for decades, but owing to the advent of data science techniques, financial globalization and market fragmentation, the number of arbitrage opportunities for HFT has soared since the 1990s. Arbitrage takes many forms, which can be divided into two groups: deterministic arbitrage and statistical arbitrage. Deterministic arbitrage is the situation in which a sure profit can be obtained with no risk, while statistical arbitrage refers to trading that takes advantage of statistical mispricing (Bondarenko, 2003). One example of deterministic arbitrage is the arbitrage between an exchange-traded fund (ETF) and a futures contract. Both S&P 500 futures and the S&P 500 ETF track the S&P 500 index. When a large buy order pushes up the price of S&P 500 futures and the price of the S&P 500 ETF does not move up at that moment, a HFT arbitrage strategy will buy the ETF and sell the future, locking in the price gap as profit (Marshall et al., 2013). As for statistical arbitrage, the most common strategy is pairs trading. If the price movements of two stocks have been highly correlated historically, the stocks of two rival companies for instance, and their prices diverge, HFT strategies will buy the under-performing stock and sell the over-performing one, expecting the price gap between the two stocks to narrow in future and yield a profit. Pairs trading is a market-neutral strategy, because its profit comes from the volatility of the relationship between the two securities, whether the market is in an uptrend, a downtrend or moving sideways (Elliott et al., 2005).
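A pairs trading signal of the kind described above can be sketched as follows. The example is illustrative Python, not a method taken from the cited literature; using the plain price difference as the spread, the z-score test and the entry threshold of 2 are all assumptions.

```python
import statistics


def pairs_signal(prices_a, prices_b, entry_z=2.0):
    """Return 'short_a_long_b', 'long_a_short_b' or 'hold' from the latest spread.

    The spread is the plain price difference A - B; its z-score is measured
    against the history preceding the latest observation.
    """
    spread = [a - b for a, b in zip(prices_a, prices_b)]
    history, latest = spread[:-1], spread[-1]
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return "hold"
    z = (latest - mean) / stdev
    if z > entry_z:
        return "short_a_long_b"  # A has outperformed: sell A, buy B
    if z < -entry_z:
        return "long_a_short_b"  # A has underperformed: buy A, sell B
    return "hold"
```

Because one leg is bought and the other sold, the position gains when the spread returns towards its historical mean, regardless of the direction in which the overall market moves.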

Directional trading strategies are based on the trader’s assessment of the near-term direction of the market or of a stock. HFT directional trading strategies assess the direction of a security from related news or from the detection of large orders. Some HFT traders use data science techniques to analyse Twitter data and media news in order to respond immediately after the release of new information. For example, on Tuesday, 23 April 2013, at 1.07pm Eastern Time, the official Associated Press Twitter account @AP was hacked and posted a false tweet saying that “Two Explosions in the White House and Barack Obama is injured”. Within seconds, the U.S. stock market dropped dramatically, and $136.5 billion of the S&P 500 index's value had been wiped out before the market recovered a few minutes later (Selyukh, 2013). Another category of directional trading strategies focuses on detecting large orders. A series of large buy orders for one stock is a signal that an institutional trader may be starting to purchase a substantial number of shares. If a HFT program detects such a series, it may buy up the existing limit sell orders, driving up the price, and then sell those shares to the institutional trader at a higher price (Jones, 2013).

3.3.2 The winner-take-all nature of HFT

As shown above, most HFT strategies have a winner-take-all nature, and this makes fierce competition between HFT traders inevitable. For market making strategies, a HFT trader who reacts to news more slowly than others will be the victim of the market movement. For arbitrage and directional strategies, only the first HFT trader to detect the opportunity can make the deal and gain the profits. In the HFT world, being slightly slower than others means failure. HFT firms therefore have to invest a great amount of money to reduce latency as much as they can, for example by purchasing the best computers and switches, hiring rocket scientists to create a technological advantage, renting co-location services next to the stock exchange, and even digging a gigantic tunnel to reduce the communication latency from 17 milliseconds to 12 milliseconds with a straight-line cable (Lewis, 2014). High costs and fierce competition have made the HFT business much more difficult than before. It has been estimated that the total earnings of the HFT industry reached a peak of USD 5 billion in 2009 and had declined to USD 1 billion by 2013 (O’Hara, 2015).


3.4 The impacts of utilising data science techniques on financial markets

The applications of data science techniques on financial trading have various impacts on the financial market

and the society in terms of market liquidity, efficiency, stability and fairness.

Market liquidity and efficiency have been improved by HFT. Liquidity refers to the ability to trade a large amount of securities in a short time period without causing a heavy price movement. Hendershott & Riordan (2013) found that automated trading activities reduce the volatility of liquidity, because they consume liquidity when it is cheap and provide liquidity when it is expensive. Jarnecic & Snape (2014) showed that, as HFT strategies constantly submit limit orders at multiple prices, they provide liquidity on an ongoing basis. Market efficiency is enhanced because HFT strategies respond to news faster than traditional traders and HFT arbitrage strategies are better able to capture arbitrage opportunities (Manahov et al., 2014).

Despite the benefits that data science techniques bring to financial markets, market stability may be threatened by the massive use of automated trading systems. A famous example is the “flash crash” that happened on 6th May 2010 in the U.S. stock markets (Lauricella, 2010). The U.S. Securities and Exchange Commission (SEC) and the Commodity Futures Trading Commission (CFTC) later found that the extraordinary swing of the market was triggered by a mutual fund’s large sell order. Once the limit buy orders were exhausted, which is a signal to go short for most HFT programs, a tremendous number of sell orders submitted by those HFT programs crashed the market (SEC & CFTC, 2010).

Figure 1 E-Mini S&P 500 futures prices during the flash crash (Jones, 2013)


The U.S. is not the only market to have experienced this kind of accident. On 16th August 2013, China’s stock market rose 5.96% within 15 minutes, owing to a set of large buy orders mistakenly submitted by the automated trading system of one securities company (Shaffer, 2013).

Figure 2 Shanghai composite index during China’s wild swings (source: Tdx.com.cn)

Concerns about fairness arise from two perspectives: fairness between traders, and fairness between the industry and society. For example, HFT traders can co-locate their computers with those of the exchange in order to receive market information faster and have their orders processed earlier than traditional traders (Angel & McCabe, 2013).

Generally speaking, data science techniques have increased the complexity and uncertainty of the market. It costs the public and the government much more effort to understand and supervise it. Social fairness is hard to define or measure, but concerns arise when so many resources are devoted to financial trading simply to pursue more profits while contributing little to society. Frank Pasquale argued that “financial sector took 29 percent of the profits of the American economy while accounting for only 10% of the value added in the fourth quarter of 2010” (Pasquale, 2015, p.6).

To sum up, new tools built with data science techniques have significantly changed financial trading and the finance industry, with various impacts at both the micro and macro levels. It is important to understand these new tools at the micro, technical level in order to better understand their potential social impacts from a macro perspective.

4. Methodology

This research will be implemented with both qualitative and quantitative methods.

To understand the applications that data science techniques have produced for financial trading, and to demonstrate their functions, the first part of the research builds several computer programs that reproduce backtesting, an automated trading system and a few trading strategies. The algorithms will be written in R and tested with real historical market data. Basic knowledge of financial markets and financial trading can be found in textbooks (Pilbeam, 2010). Details of the technical indicators and trading strategies used in this research, such as Moving Average (MA) and Moving Average Convergence Divergence (MACD), are taken from the existing literature (Chong & Ng, 2008). Skills for building computer programs with R can be found in relevant books such as R in Action (Kabacoff, 2015).

The second part of the research tries to develop an innovative method to integrate the micro and macro analysis. The literature suggests that the financial system plays an important role in the modern economy and society. It also notes that data science techniques, such as high frequency trading, have reshaped financial trading significantly. In this research, by demonstrating the tools, I aim to develop a detailed understanding of these techniques at the micro level and to combine it with insights from the literature in order to investigate the potential social impacts of these tools at the macro level.

4.1 Data

All the data for this research is real historical market data, downloaded from public sources such as Yahoo Finance, Google Finance, Bloomberg and Tushare (a Chinese company providing China’s financial market data openly and freely via APIs).

Table 1 Example of historical market data

date       open   high   close  low    volume
2013/3/11  22.72  22.98  22.63  21.85  573293.4
2013/3/12  22.60  22.79  21.72  21.31  840769.1
2013/3/13  21.69  22.58  22.09  21.61  836901.5
2013/3/14  22.00  22.09  21.67  21.30  577853.6
2013/3/15  21.66  22.63  22.20  20.80  943574.1
2013/3/18  21.70  22.00  21.35  21.21  745294.8
2013/3/19  21.30  21.73  21.50  21.04  529771.2

Table 1 is an example of historical market data. The first column records the date; the second to fifth columns record the open price, highest price, close price and lowest price of the stock on that day. The volume column records the trading volume for that day. This kind of data is the basic input for backtesting, automated trading systems and HFT programs (Maier-Paape & Platen, 2014).
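The column layout of Table 1 can be illustrated with a minimal sketch. The research code itself is written in R; the Python below is only an illustration, and the names `Bar`, `parse_bar` and `is_consistent` are hypothetical. It parses rows in the column order of Table 1 and checks that the high and low prices bracket the open and close prices.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """One row of daily market data, in the column order of Table 1."""
    date: str
    open: float
    high: float
    close: float
    low: float
    volume: float


def parse_bar(line: str) -> Bar:
    """Parse one whitespace-separated row: date open high close low volume."""
    date, open_, high, close, low, volume = line.split()
    return Bar(date, float(open_), float(high), float(close), float(low), float(volume))


def is_consistent(bar: Bar) -> bool:
    """A bar is plausible if low and high bracket both open and close."""
    return bar.low <= min(bar.open, bar.close) and max(bar.open, bar.close) <= bar.high


# First two rows of Table 1.
rows = [
    "2013/3/11 22.72 22.98 22.63 21.85 573293.4",
    "2013/3/12 22.60 22.79 21.72 21.31 840769.1",
]
bars = [parse_bar(row) for row in rows]
```

A consistency check of this kind is a cheap way to catch columns that were swapped during download or cleaning before the data reaches a backtest.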

4.2 Coding

Figure 3 The interface of RStudio

As shown above, most of the coding work of this research will be done with RStudio. R is an open-source statistical programming language, widely used in academic research. In this research, I will use R to write computer programs that implement backtesting, an automated trading system and some strategies used in HFT.

4.3 Visualization

Figure 4 Backtesting result of the share price of Yahoo

Figure 5 Backtesting result of the share price of Facebook

Figures 4 and 5 are examples of one of the visualization tools that will be deployed in this research to demonstrate how Data Science techniques work in financial trading. The code for this visualization method is included in the appendix. The buy and sell signals created by the strategy in backtesting are shown on a candlestick chart of the stock price. The literature offers various measures for evaluating the performance of trading strategies; in this research the profit rate is the indicator used to measure the performance of each trading strategy (Masteika et al., 2014).

4.4 From micro technical details to macro social impacts

Data science techniques have produced many new tools for finance and financial trading, and have turned them into a “black box”. This research aims to open this black box by developing and demonstrating some of these new tools, and by investigating their social impacts on the basis of the technical details. I will use the insights from redeveloping and demonstrating these new tools, together with theories from the literature, to investigate the potential social impacts of these tools and their relationship to data science techniques.

4.5 Ethical aspects

This research involves no ethical risk because, as described in section 4.1, all the data used in this research will be public market data. No private information will be collected or used in this research.

5. Demonstration and case study

Data Science techniques provide powerful weapons for financial trading. A backtesting system allows traders to build, test and improve their trading strategies. This section demonstrates the whole process of using a backtesting system: acquiring data, building a strategy, improving the strategy, and testing the strategy. All the code is written in R with RStudio.

Firstly, a simple trading strategy, the Moving Average strategy, will be built on historical data from Apple Inc., a world-famous company listed on NASDAQ. Based on the backtesting results, a better strategy is created from this first simple strategy. A third strategy is then built with a technical indicator, MACD, after evaluating the performance of the second strategy. Secondly, the three strategies are tested with historical market data from 15 well-known companies to make a further assessment. Finally, high frequency data is used to test the MACD strategy, demonstrating the high profit potential of HFT.

5.1 Preparation of historical stock data

5.1.1 Download data

Figure 6 Codes for downloading data

As shown in figure 6, lines 1 to 3 are notes written by the programmer. Notes start with “#” and are not run. Lines 4 and 5 install and load the “quantmod” package. A package contains many pre-set functions that anyone can use after loading it. Line 6 uses the function “getSymbols” to download historical data for Apple from Yahoo Finance between Jan. 1st 2010 and Jun. 21st 2016. From line 7 to line 11, the raw data are cleaned and saved as a csv file on the local drive. The cleaned data are shown in figure 7.

Figure 7 Historical data of Apple

5.1.2 Visualization

Market data is usually shown with a candlestick chart. Four crucial pieces of information, the open price, close price, highest price and lowest price of a certain time period, are contained in one candlestick, as shown in figure 8. If the close price is higher than the open price, the body is green; otherwise it is red.

Figure 8 Candlestick chart (source: Wikipedia)
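The candlestick rule described above can be written down in a few lines. This is a minimal Python sketch rather than the R function “Cplot” used in this research; the function name `candle` and the returned keys are hypothetical, chosen only for illustration.

```python
def candle(open_, high, low, close):
    """Describe one candlestick: body colour, body extent and the two wicks."""
    # Green when the period closes above its open, red otherwise.
    colour = "green" if close > open_ else "red"
    body_low, body_high = sorted((open_, close))
    return {
        "colour": colour,
        "body": (body_low, body_high),
        "upper_wick": (body_high, high),
        "lower_wick": (low, body_low),
    }


# First and third rows of Table 1 (open, high, low, close).
first = candle(22.72, 22.98, 21.85, 22.63)
third = candle(21.69, 22.58, 21.61, 22.09)
```

A plotting routine would then draw the body as a filled rectangle and the wicks as thin vertical lines for each period.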

In R, a programmer can write a function to plot a candlestick chart. As shown in figure 9, line 15 passes the stock data to the function “Cplot”, which is defined from line 18 to line 43, and produces the visualization of the stock data in figure 10.

Figure 9 Codes for plotting candlestick chart

Figure 10 Visualization of Apple historical share price

5.1.3 Stock split

On Jun. 9th 2014, Apple split each share of its stock into seven shares (Solomon, 2010), so the stock price fell to one seventh of its previous level. For traders, the share prices before the split have to be recalculated, because the total market value of Apple is not affected by the stock split.

Figure 11 Codes for dealing with stock split

As shown in figure 11, a new function “Stock_Split” is built from line 52 to line 62 to recalculate the stock price of Apple before the stock split. Each recalculated price is one seventh of the original price. The real historical stock price pattern is shown in figure 12, which is significantly different from figure 10.

Figure 12 Visualization of Apple historical share price recalculated
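The split adjustment can be sketched as follows. This is a Python illustration, not the R function “Stock_Split” from figure 11; the function name `adjust_for_split` is hypothetical, and the two sample prices are made up for illustration rather than taken from Apple’s price history.

```python
from datetime import date

SPLIT_DATE = date(2014, 6, 9)  # split date given in the text
SPLIT_RATIO = 7                # one share became seven


def adjust_for_split(prices, split_date=SPLIT_DATE, ratio=SPLIT_RATIO):
    """Divide every price before the split date by the split ratio."""
    adjusted = []
    for day, price in prices:
        adjusted.append((day, price / ratio if day < split_date else price))
    return adjusted


# Illustrative prices only, not real market data.
sample = [(date(2014, 6, 6), 644.0), (date(2014, 6, 9), 93.0)]
adjusted = adjust_for_split(sample)
```

After the adjustment the two days are on the same scale, so a moving average or a crossover rule no longer sees an artificial sevenfold drop at the split.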

5.2 Moving Average strategy

5.2.1 Calculate Moving Average

A moving average is the arithmetic mean of the close prices over a certain number of previous trading days. For example, MA 5 is the line connecting, for each trading day, the mean of the close prices of the five previous trading days. To calculate MA 5, in the function “Calculate_MA”, a loop from line 76 to line 79 calculates the mean close price of the previous 5 days for each trading day. Another loop from line 80 to line 83 calculates MA 20. As shown in figure 14, the black line is the MA 5 line and the orange line is the MA 20 line. The MA 5 line is clearly much more sensitive to movements in the stock price. When the stock starts an uptrend, the MA 5 line will cross above the MA 20 line.

Figure 13 Codes for calculating MA 5 & 20 lines

Figure 14 Apple historical share price with MA 5 (the black one) and MA 20 (the yellow one)
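The moving average calculation can be sketched in a few lines. This is a Python illustration of the idea, not the R function “Calculate_MA”; the name `moving_average` is hypothetical, and the sketch assumes that the window ends on, and includes, the current trading day.

```python
def moving_average(closes, window):
    """Trailing simple moving average; None until `window` closes are available."""
    averages = []
    for i in range(len(closes)):
        if i + 1 < window:
            averages.append(None)  # not enough history yet
        else:
            averages.append(sum(closes[i + 1 - window:i + 1]) / window)
    return averages


# Close column of Table 1.
closes = [22.63, 21.72, 22.09, 21.67, 22.20, 21.35, 21.50]
ma5 = moving_average(closes, 5)
```

With these seven closes the first four entries of `ma5` are empty, and the fifth is the mean of the first five closes, about 22.06.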

5.2.2 Backtesting strategy

From this feature, a simple MA strategy can be designed. When MA 5 crosses above MA 20, the short position is closed and a long position is opened; when MA 5 crosses below MA 20, the long position is closed and a short position is opened. A short position refers to a transaction in which an investor borrows stocks from other investors and sells them, making a profit if the price of the stock drops. A long position, on the contrary, refers to a transaction in which an investor buys stocks and holds them, making a profit if the stock price rises (Pilbeam, 2010).

Figure 15 Codes for MA strategy

This strategy can be precisely defined by the code from line 137 to line 151 shown in figure 15. For the ith day, if the value of MA 5 is larger than MA 20 and the value of MA 5 was smaller than MA 20 on the day before, then a long position is opened. Conversely, if the value of MA 5 is smaller than MA 20 on the ith day but MA 5 was larger than MA 20 on the (i-1)th day, a short position is opened. The trade and the position are recorded separately in the attributes Trade and Hold. A for-loop applies these two “if” conditional statements to all the data from beginning to end, thereby backtesting the strategy with historical data and recording the results.
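The two crossover conditions can be sketched as a single loop. This is a Python illustration rather than the R code in figure 15; the function name `ma_crossover` is hypothetical, and the encoding of Trade (+1 buy, -1 sell, 0 no action) and Hold (+1 long, -1 short, 0 flat) is an assumption made for this sketch.

```python
def ma_crossover(fast, slow):
    """Return (trade, hold) lists for a fast/slow moving average crossover.

    trade[i] is +1 (buy), -1 (sell) or 0; hold[i] is the position after day i.
    """
    size = len(fast)
    trade, hold = [0] * size, [0] * size
    for i in range(1, size):
        hold[i] = hold[i - 1]  # carry the position forward by default
        if None in (fast[i], slow[i], fast[i - 1], slow[i - 1]):
            continue  # averages not yet defined
        if fast[i] > slow[i] and fast[i - 1] < slow[i - 1]:
            trade[i], hold[i] = 1, 1    # crossed above: close short, go long
        elif fast[i] < slow[i] and fast[i - 1] > slow[i - 1]:
            trade[i], hold[i] = -1, -1  # crossed below: close long, go short
    return trade, hold


# Illustrative values only: the fast line crosses above, then below, the slow line.
fast = [None, 1.0, 3.0, 3.0, 1.0]
slow = [None, 2.0, 2.0, 2.0, 2.0]
trade, hold = ma_crossover(fast, slow)
```

Because each signal only looks at day i and day i-1, the loop never uses future prices, which is what makes the backtest result comparable to trading the rule in real time.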

5.2.3 Compute profits and visualization

To evaluate the performance of a trading strategy, the profit is calculated each time a trade is completed, and a buy or sell label is placed near the day on which the trade is made. Two functions are built to calculate the profits and plot them. As shown in figure 16, a for-loop from line 166 to line 182 scans all the trading results. Within this loop, from line 168 to line 173, a conditional statement detects a trade action when the attribute Trade is not 0; the profit of this trade is then calculated and stored in the variable “Delta_Profit”. Finally, at line 180, all “Delta_Profit” values are added together to give the total profit up to that trade.

Figure 16 Codes for calculating profits
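The profit bookkeeping can be sketched as follows. This is a Python illustration, not the R code in figure 16; the function name `cumulative_profit` is hypothetical. The sketch assumes one share per trade, that every trade closes the previous position and opens the opposite one, and that a position still open at the end is not counted.

```python
def cumulative_profit(closes, trade):
    """Total realised profit per share after each day.

    trade[i] is +1 (buy), -1 (sell) or 0 (no action) on day i.
    """
    position, entry, total = 0, None, 0.0
    profit_line = []
    for price, action in zip(closes, trade):
        if action != 0:
            if position != 0:
                # Long gains when price rose since entry; short gains when it fell.
                delta_profit = position * (price - entry)
                total += delta_profit
            position, entry = action, price
        profit_line.append(total)
    return profit_line


# Illustrative prices and trades only.
profit_line = cumulative_profit([10.0, 12.0, 11.0, 9.0], [1, 0, -1, 1])
```

In this example the long position opened at 10 is closed at 11 for a profit of 1, and the short position opened at 11 is closed at 9 for a profit of 2, so the profit line ends at 3.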

Figure 17 Backtesting result for MA strategy on Apple

The result of backtesting the MA strategy on Apple from 2011 to 2016 is visualized in figure 17. Each trade is marked as “Sell” or “Buy” beside the candlestick chart, and the profit line is plotted below to show the total profit up to each trade. The strategy keeps losing money in the first year, when the share price is range-bound (the stock price moves up and down within a channel). From early 2012, when Apple starts an uptrend, the MA strategy begins to make profits. Once the stock price becomes range-bound again in late 2013, the profits made earlier are soon lost. From 2014 to late 2015, the Apple share price shows an uptrend and a downtrend with several range-bound periods, and the profit line of the MA strategy simply fluctuates near zero. From 2016, the MA strategy makes several good trades once the price is no longer range-bound, and the profit line reaches 30 before the backtesting ends.

Because the strategy is designed to sell or buy only one share in each trade, the profit rate is calculated by dividing the final profit by the mean of the highest and lowest prices of the stock within the period. The code for calculating the profit rate is shown in figure 18.

Figure 18 Codes for calculating profit rate
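The profit rate defined above is a one-line formula. The following Python sketch is an illustration, not the R code in figure 18; the function name `profit_rate` is hypothetical and the numbers in the example are invented for illustration.

```python
def profit_rate(final_profit, highs, lows):
    """Final profit per share divided by the midpoint of the period's price range."""
    reference_price = (max(highs) + min(lows)) / 2
    return final_profit / reference_price


# Illustrative figures only: highest price 120, lowest price 60, final profit 30.
rate = profit_rate(30.0, highs=[100.0, 120.0], lows=[60.0, 80.0])
```

Here the reference price is 90, so a final profit of 30 per share corresponds to a profit rate of about 33%.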

5.2.4 Evaluation and discussion

From the backtesting result above, it is clear that the MA strategy performs well when the stock price is trending, whether up or down, but loses money when the stock price is range-bound. This is because it takes a few days for the MA lines to react to movements in the stock price. Figure 19 shows the backtesting results for the first 16 months. The stock price begins to rise 5 days before MA 5 crosses above MA 20, a crossover labelled “a” in figure 19. The strategy opens a long position at the price of point a. A few days later the stock price begins to fall, but more days pass before MA 5 reacts to the downtrend and crosses below MA 20 at point b, where the MA strategy closes the long position. The stock price at b is already lower than at point a, so the long position from point a to point b makes a loss. Similarly, the MA strategy opens a short position at point c. This position does make a profit over the next 5 days as the stock price drops, but by the time the strategy closes the position at point d, the stock price is already higher than at point c, because the trading signal from the MA crossover comes later than the movement of the stock price.

Figure 19 Backtesting result for MA strategy on Apple from Jan. 2011 to Apr. 2012

From Dec. 23rd 2011 to early April 2012, an uptrend takes the stock price from 50 to 90. The trading signal still lags behind the movement of the stock price, but the price rises so much that the MA strategy makes a profit on this trade.

5.3 Improved Moving Average Strategy

Backtesting allows a trader to evaluate a trading strategy with real historical data without losing any money. From the discussion above, if a switch can be designed to turn trading off when the stock is range-bound and on when the stock is trending, a much higher profit can be achieved.

5.3.1 Avoid trades in range bound

During a range-bound period the MA strategy usually generates trading signals frequently, so a switch can be designed as follows: if another trading signal occurred within the N days before a new trading signal, the strategy does not trade on the new signal. As shown in figure 20, at line 257 a new variable “N” is created to set the number of days for which trading is stopped, and at line 259 a new attribute “Signal” is created to store all trading signals. Lines 266 to 284 contain the code for the Improved MA strategy; compared with the code in figure 15, a new condition is added at line 269 and line 279. Under this condition, a new trade can be made only if the sum of the trading signals generated in the previous N days is zero.

Figure 20 Codes of Improved MA strategy
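The N-day switch can be sketched as follows. This is a Python illustration, not the R code in figure 20; the function name `improved_ma_trades` is hypothetical. As an assumption of this sketch, signals are stored as +1 or -1 and the filter counts their absolute values, so that a buy signal and a sell signal inside the window cannot cancel each other out.

```python
def improved_ma_trades(fast, slow, n):
    """Crossover signals, traded only when no signal fired in the previous n days."""
    size = len(fast)
    signal, trade = [0] * size, [0] * size
    for i in range(1, size):
        if None in (fast[i], slow[i], fast[i - 1], slow[i - 1]):
            continue
        if fast[i] > slow[i] and fast[i - 1] < slow[i - 1]:
            signal[i] = 1
        elif fast[i] < slow[i] and fast[i - 1] > slow[i - 1]:
            signal[i] = -1
        recent = signal[max(0, i - n):i]  # signals of the previous n days
        if signal[i] != 0 and sum(abs(s) for s in recent) == 0:
            trade[i] = signal[i]
    return signal, trade


# Illustrative values only: three quick crossovers, a pause, then one more.
fast = [1.0, 3.0, 1.0, 3.0, 3.0, 3.0, 1.0]
slow = [2.0] * 7
signal, trade = improved_ma_trades(fast, slow, n=2)
```

In this example four signals are generated, but only the first and the last are traded, because the second and third each follow another signal within two days.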

5.3.2 Choose the best value for N

As shown in figure 21, the Improved MA strategy performs differently with different values of N, the number of days for which new trades are stopped in order to reduce losses in range-bound periods. With a backtesting system, a trader can test the strategy with all possible values of N and choose the one with the highest profit rate, as shown in figure 22. A loop tests N from one to thirty, recording the profit rate of each result. In this instance, in figure 23, the profit rate reaches 102.78% when N equals 13.

Figure 21 Backtesting results for Improved MA strategy on Apple with different values of N

Figure 22 Codes of Improved MA strategy

Figure 23 Backtesting result for Improved MA strategy on Apple
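The search over N can be sketched as a simple loop over candidate values. This is a Python illustration, not the R code in figure 22; the names `best_n` and `toy_rate` are hypothetical, and `toy_rate` is a stand-in scoring function, whereas a real run would backtest the Improved MA strategy for each N and return its profit rate.

```python
def best_n(evaluate, candidates=range(1, 31)):
    """Return (n, rate) for the candidate with the highest profit rate."""
    results = {n: evaluate(n) for n in candidates}
    best = max(results, key=results.get)
    return best, results[best]


def toy_rate(n):
    """Stand-in for a backtest: peaks at n = 13 purely for illustration."""
    return 1.0 - abs(n - 13) / 30


chosen_n, chosen_rate = best_n(toy_rate)
```

Because N is chosen on the same historical data that is used to report the result, the selected value is fitted to that period and may not perform as well on other periods or other stocks.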

5.3.3 Comparison between MA strategy and Improved MA strategy

As shown below, in 2011 the MA strategy makes 18 trades while the stock price is range-bound, and most of these trades end up losing money. The MA strategy has lost more than 25 dollars by the end of 2011. The Improved MA strategy makes only 8 trades in 2011 and therefore loses only 8 dollars. When the two strategies are tested on the same historical data, the profit rate of the MA strategy is 7.18%, while that of the Improved MA strategy is 34.04%. Hence, a better strategy has been developed on the basis of the backtesting results.

Figure 24 Comparison between MA strategy and Improved MA strategy

5.4 Using technical indicators

The Improved MA strategy performs much better than the MA strategy, but its trading signals are still a few days behind the trend. To overcome this problem, technical indicators, which are built from basic historical data with mathematical methods, can be used. There are a number of technical indicators used in financial trading, and Moving Average Convergence Divergence (MACD) is one of them (Appel, 2003). MACD is calculated from different exponential moving average (EMA) lines. An EMA is also a moving average line but, unlike MA, it gives more weight to the latest data, so EMA is much more sensitive than MA. As shown in figure 25, a MACD indicator consists of three components: the DIFF line, obtained by subtracting the 26-day EMA from the 12-day EMA; the DEA line, which is the 9-day EMA of the DIFF line; and the MACD histogram, whose value is DIFF minus DEA.

Figure 25 MACD indicator (source: Wikipedia)

Figure 26 Codes for calculating MACD

Figure 27 Codes for MACD strategy

MACD can be easily calculated with a computer program, as shown in figure 26. A new strategy, the MACD strategy, can be built with the MACD indicator. Similar to the MA strategy, when the DIFF line crosses above the DEA line, a long position is opened and the short position is closed; when the DIFF line crosses below the DEA line, the opposite transaction is made. The code for the MACD strategy is in figure 27.
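The MACD indicator and its crossover rule can be sketched as follows. This is a Python illustration, not the R code in figures 26 and 27; the function names are hypothetical. The sketch assumes the common EMA smoothing factor of 2 / (span + 1) and seeds each EMA with the first value of the series, which may differ from the seeding used in this research.

```python
def ema(values, span):
    """Exponential moving average, seeded with the first value (an assumption)."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for value in values[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return the DIFF line, the DEA line and the MACD histogram."""
    diff = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    dea = ema(diff, signal)
    histogram = [d - e for d, e in zip(diff, dea)]
    return diff, dea, histogram


def macd_signals(diff, dea):
    """+1 where DIFF crosses above DEA, -1 where it crosses below, else 0."""
    signals = [0] * len(diff)
    for i in range(1, len(diff)):
        if diff[i] > dea[i] and diff[i - 1] < dea[i - 1]:
            signals[i] = 1
        elif diff[i] < dea[i] and diff[i - 1] > dea[i - 1]:
            signals[i] = -1
    return signals


# Illustrative steadily rising prices, not market data.
rising = [float(p) for p in range(1, 41)]
diff, dea, histogram = macd(rising)
```

On a steadily rising series the 12-day EMA stays above the 26-day EMA, so DIFF is positive after the first day; on a flat series all three components stay at zero.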

Figure 28 Backtesting results for MACD strategy

The backtesting results of the MACD strategy are in figure 28. The graph in the middle is the MACD indicator. The MACD strategy performs better than the Improved MA strategy, because it not only reaches a higher profit rate at the end but also makes smaller losses than the Improved MA strategy at the beginning.

5.5 Applying same strategy to other stocks

Three strategies, MA, Improved MA and MACD, have been built with historical data of Apple. With a backtesting system, it is possible to test these strategies on other stocks to make a further assessment. In this section, historical data from fifteen well-known companies are downloaded. The performance of each strategy on each stock is given in table 2.

The return rate of each strategy over the whole period of 5 years is recorded in the MA, IMA and MACD columns. The value of N used for the Improved MA strategy is recorded in the N column. The last column, Compound, is the compound return of the MACD strategy. The compound return rate is computed under the condition that each year’s return can be reinvested in the next year. For example, for HSBC Holdings the return rate of the MACD strategy over five years is 102.32%, which means an average return rate of 20.464% per year (102.32% / 5), so the compound return rate is 153.68% (= 1.20464 ^ 5 - 1), as recorded in table 2.
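The compound return calculation can be checked with a short sketch. This is a Python illustration; the function name `compound_return` is hypothetical. It follows the method described above: the total return is divided evenly over the years and then compounded.

```python
def compound_return(total_return_pct, years):
    """Compound return (%) when the average annual return is reinvested each year."""
    annual_rate = total_return_pct / 100 / years
    return ((1 + annual_rate) ** years - 1) * 100


# MACD return rates for HSBC Holdings and JPMorgan Chase from table 2.
hsbc = compound_return(102.32, 5)
jpm = compound_return(82.49, 5)
```

With these inputs the sketch gives approximately 153.68% for HSBC Holdings and 114.58% for JPMorgan Chase, which agrees with the Compound column of table 2.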

The best strategy for each stock is marked with a grey background. The MACD strategy performs best on nine stocks, and IMA performs best on the remaining six. Both the MACD strategy and the IMA strategy achieve a positive return on all fifteen stocks, while the MA strategy ends up losing money on almost half of the fifteen stocks. Overall, the MACD strategy is the best of the three strategies, with an average return rate of 63.29% and an average compound return rate of 104.56%.

No  Name                 Country      Code          Source  MA (%)  IMA (%)  N   MACD (%)  Compound (%)
1   Berkshire Hathaway   U.S.         NYSE:BRK.A    google  15.19   24.81    4   9.56      9.93
2   JPMorgan Chase       U.S.         NYSE:JPM      google  16.35   17.43    15  82.49     114.58
3   Exxon Mobil          U.S.         NYSE:XOM      google  16.3    40.21    3   19.27     20.81
4   Toyota Motor         Japan        NYSE:TM       google  -44.31  55.49    20  3.89      3.95
5   AT&T                 U.S.         NYSE:T        google  -33.01  17.66    8   6.52      6.69
6   HSBC Holdings        U.K.         HSBA.L        yahoo   32.83   49.54    8   102.32    153.68
7   Citigroup            U.S.         NYSE:C        google  67.15   132.01   12  128.12    212.87
8   Wal-Mart Stores      U.S.         NYSE:WMT      google  9.33    48.08    20  50.71     62.09
9   Samsung Electronics  South Korea  KRX:005930    google  -33.42  29.61    9   30.59     34.57
10  Volkswagen Group     Germany      VOW.F         yahoo   52.74   102.68   13  209.3     474.51
11  Microsoft            U.S.         NASDAQ:MSFT   google  -20.83  34.44    10  15.03     15.96
12  Google               U.S.         NASDAQ:GOOGL  google  -9.64   38.51    24  59.2      74.98
13  Ford Motor           U.S.         F             yahoo   40.71   42.66    1   142.98    251.67
14  IBM                  U.S.         NYSE:IBM      google  -21.94  16.1     4   72.17     96.23
15  Nestle               Switzerland  VTX:NESN      google  -51.48  7.29     21  17.18     18.40
    Average                                                 2.40    43.77        63.29     104.56

Table 2 Backtesting of 15 stocks

The best and worst backtesting results of each strategy are shown below.

The MA strategy performs best on Citigroup, with a total return of 67.15%, and worst on Nestle, with -51.48%. Figures 29 and 30 show that the MA strategy begins to lose money once the stock is range-bound.

Figure 29 Backtesting results for MA strategy on Citigroup

Figure 30 Backtesting results for MA strategy on Nestle

The Improved MA strategy performs best on Citigroup, with a total return of 132.01%, and worst on Nestle, with 7.29%. Comparing figure 31 with figure 29, the Improved MA strategy achieves its goal: by reducing the trade frequency during range-bound periods, the return rate is greatly enhanced. From figure 32, once the uptrend of Nestle ends in early 2015 and the price starts to move up and down between 65 and 75, the Improved MA strategy begins to lose money. After all, the Improved MA strategy is derived from the MA strategy by reducing losses during range-bound periods. If a stock keeps moving within a range, none of these strategies can make a satisfactory return.

Figure 31 Backtesting results for Improved MA strategy on Citigroup

Figure 32 Backtesting results for Improved MA strategy on Nestle

The MACD strategy is more sensitive to movements in the stock price, so it can achieve the best result among the three strategies, as shown in figure 33. However, if the stock is in a tight range, as happened to Toyota Motor from early 2015 to mid 2016 in figure 34, the MACD strategy also loses money.

Figure 33 Backtesting results for MACD strategy on Volkswagen Group

Figure 34 Backtesting results for MACD strategy on Toyota Motor

5.6 Applying MACD strategy to high frequency trading

After testing the strategies with daily stock data, this section tests the MACD strategy on high frequency data: the CSI 300 Index future traded on the China Financial Futures Exchange. The data consists of candlesticks with a period of 15 seconds. The MACD strategy is tested on 47 trading days, from Apr. 30th 2015 to Jul. 7th 2015. The daily return in index points is shown in figure 35. As the leverage on index futures is usually 8 to 10 times, the real return equals 5 to 8 times this rate.

The MACD strategy makes a positive return on all trading days, with the lowest at 0.17% on May 6th (figure 36) and the highest at 27.35% on Jul. 2nd (figure 37). The average daily return in index points is 7.17%, so the average daily return in money terms is more than 35%.

Figure 35 Daily return rate of Index points

Figure 36 Backtesting for MACD strategy on index future on 2015-05-06

Figure 37 Backtesting for MACD strategy on index future on 2015-07-02

6. Potential Social impacts of applying Data Science Technologies in financial trading

6.1 Promote financial speculation

Data science techniques have reshaped financial trading with many super weapons for financial speculation,

which is significantly different from traditional investment.

6.1.1 Investing vs. speculation

Investing is the act of allocating capital to assets in order to gain a return in the future. Investors make investments in stock markets and debt markets, helping companies to raise funds for productive purposes, and so indirectly create value for society (Hazen, 1991). Investing has two characteristics: it adds value to society, and it is long-term. In equity markets, investors analyse and research listed companies, predict their future performance, and purchase the shares of the most promising companies. The stock price of these companies rises as a result of investors’ purchases, which makes it possible for the company to obtain more loans from banks or to issue new debt. The company can therefore produce better products or provide improved services for its customers, and earn more profit in the future to meet investors’ expectations. Investing is a long-term behaviour because this process may take several months or even a few years, during which investors hold the stock of the company.

Speculation, on the other hand, is short-term and non-productive. In financial trading, the term speculation refers to the practice of making profits from predictions of market movement rather than from the financial attributes of the instrument, such as dividends or interest (Szado, 2011). A regulatory authority such as the U.S. Commodity Futures Trading Commission views a speculator as a trader “who trades with the objective of achieving profits through the successful anticipation of price movements” (CFTC, 2005). The holding periods for stock speculation are usually shorter than for investing: a few days for most speculators, or a few seconds for HFT speculators. Speculation is a zero-sum game. One speculator’s gain is another’s loss, so speculative trading creates little value for society; it simply transfers wealth from loser to winner. For example, in futures markets, option markets, credit default swap (CDS) markets and other derivative markets, each contract involves two traders on opposite sides, a seller and a buyer. While the winner receives what the loser loses, society gains nothing.

6.1.2 Data Science techniques: super weapons for speculation

Data Science techniques were not created for financial speculation. However, they are widely used by speculators to reduce the cost of speculation and enhance its practicality, and therefore to increase the profit margin of financial speculation significantly.

Consistently making profits in financial speculation is extremely difficult; therefore, after World War II, value investing prevailed over speculation until the rise of quantitative trading in the 1990s. A professional speculator’s goal is to discover the price pattern in historical data that appears completely chaotic. Then, based on his “knowledge” of the stock price pattern, trading rules are set up to guide his speculation. Trading rules are a set of trading plans, quotes, requirements and principles that define the right thing to do in each type of situation: for example, trade the strongest stocks in a bull market, or trade with a stop loss. The speculator then has to apply his trading rules in real trading, bearing risk and loss and suffering from greed and fear, and improve the rules on the basis of the trading results until he either makes profits consistently or gives up (Lefevre, 2004). Most financial speculators end in failure. Because of this, after World War II value investing was regarded as the orthodoxy of financial trading. The aim of value investing is to buy securities when they are undervalued by the market compared with the true value computed by fundamental analysis (Graham & Dodd, 1934). However, once Data Science techniques were introduced into financial speculation, a whole new world opened up for speculators. Computer programming, backtesting and automated trading systems greatly enhance the power of speculators.

Firstly, with data science techniques, it costs a speculator much less time, money and effort to create a

profitable trading rule, that is, a trading strategy. As demonstrated in section 5.1, historical data on securities

from global markets can be easily acquired and visualized with data science techniques. Trading

strategies can then be precisely defined with mathematical models and computer code. Sections

5.2 to 5.4 show that, with a backtesting system, a speculator can test his trading strategy on any period of

historical data for any security and receive feedback immediately. Doing the same thing would have cost speculators

in the last century several months and a considerable amount of money. Without losing anything, a speculator in

the 21st century can easily and repeatedly evaluate and improve his strategy until it is profitable enough to be

deployed in real trading.
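The idea of defining a trading rule in code and receiving immediate feedback from historical data can be illustrated with a minimal sketch. The rules tested in section 5 are not reproduced here: the long-only moving-average crossover rule, the window lengths, the function names and the price series below are all illustrative assumptions, and transaction costs are ignored.

```python
from typing import List, Optional


def moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until `window` prices are available."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the price leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def backtest_ma_crossover(prices: List[float], short: int = 3, long: int = 5) -> float:
    """Profit per share of a hypothetical long-only crossover rule.

    One share is held during day t whenever the short average was above
    the long average at the close of day t-1; otherwise nothing is held.
    Transaction costs and slippage are ignored (an assumption).
    """
    fast = moving_average(prices, short)
    slow = moving_average(prices, long)
    profit = 0.0
    for t in range(1, len(prices)):
        f, s = fast[t - 1], slow[t - 1]
        if f is not None and s is not None and f > s:
            profit += prices[t] - prices[t - 1]
    return profit


# Made-up closing prices, for illustration only.
prices = [10, 10, 10, 10, 10, 11, 12, 13, 14, 15, 14, 13, 12, 11, 10]
print("strategy profit per share:", backtest_ma_crossover(prices))
print("buy-and-hold profit per share:", prices[-1] - prices[0])
```

On this toy series the rule earns 1.0 per share while buy-and-hold earns nothing. Changing `short` and `long` and re-running gives the kind of instant feedback described above, although repeated tuning on the same data also risks overfitting the backtest.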

Secondly, data science techniques dramatically enhance the practicality of implementing a profitable trading strategy

in terms of both precision and scope. For speculators in the last century, emotional fluctuations, spoiled food,

or even a nightmare could become barriers to trading according to their trading rules. As

Jesse Livermore, the most famous and successful speculator before WWII, noted, “A stock operator

has to fight a lot of expensive enemies within himself” (Lefevre, 2004). At present, however, automated trading

systems replace humans and execute trading strategies precisely as designed. As shown in sections 5.5

and 5.6, an automated trading system can also apply the same strategy to various stocks at the same time,

creating profits at little marginal cost. Data Science techniques greatly increase the scope of trading for

speculators, as it is impossible for a human to monitor and trade fifteen or more stocks at the same time.

Thirdly, the profit margin of financial speculation has been increased astonishingly by data science techniques.

As shown in section 5.3, a simple strategy after improvement can produce a profit of 80 dollars per share in 5 years,

while the stock price rose from 60 to 100; in this case, a speculator makes 200% of the profit

made by a value investor. In section 5.4, after testing the same strategies on 15 different stocks, the MACD

strategy manages to produce a positive return for all 15 stocks. The MA strategy performs well for only 8 stocks,

but the improved MA strategy creates positive profits for all 15 stocks. Over a five-year period, the average

return is 63% for the MACD strategy and 43% for the improved MA strategy. However, if compound interest is

taken into consideration, meaning that the previous year’s profit is reinvested, the average return for the MACD

strategy will be 103% over 5 years, or roughly 20% per year as a simple average. As a comparison,

Warren Buffett, one of the most successful investors ever, has achieved a compound rate of return of 22.3% per year

(Mises, 2010). Therefore, with data science techniques, a rate of return close to that of the best ‘value investing’ investors can be

achieved by speculators with little difficulty.
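The arithmetic behind these annual figures can be checked with a short sketch. The function names are illustrative assumptions; the inputs are the figures quoted above. Dividing the 103% cumulative return by five years gives about 20% per year, whereas the geometric (compound) annual rate that produces 103% over five years is about 15%, so the comparison with a 22.3% compound rate depends on which convention is used.

```python
def annualised_rate(total_return: float, years: int) -> float:
    """Constant compound annual rate that yields `total_return` over `years`."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


def compound_total_return(annual_rate: float, years: int) -> float:
    """Cumulative return from compounding `annual_rate` for `years`."""
    return (1.0 + annual_rate) ** years - 1.0


macd_total = 1.03   # 103% cumulative return over five years (section 5.4)
buffett_rate = 0.223  # 22.3% compound annual rate (Mises, 2010)
years = 5

print(f"simple average per year:    {macd_total / years:.1%}")
print(f"geometric rate per year:    {annualised_rate(macd_total, years):.1%}")
print(f"22.3% compounded for 5 yrs: {compound_total_return(buffett_rate, years):.0%}")
```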

The profit rate for HFT is even higher. Section 5.5 shows a fraction of the HFT world: when the same

MACD strategy is applied to index futures, the profit rate soars. After testing the MACD strategy on 47 trading days,

which is roughly 2 months of real market data, the strategy manages to produce an average daily return of 7%

in index points. Because the leverage of index futures is 5 to 7 times, the daily return in money terms

would be more than 30%, which means that if this strategy works as designed, it can turn £1,000 into £10,000

within two weeks!
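The leverage and compounding arithmetic in this paragraph can be sketched as follows. This is only an illustration under strong assumptions: the same return is earned every trading day, all profits are reinvested, and margin calls and transaction costs are ignored. The function names are hypothetical.

```python
import math


def leveraged_return(index_return: float, leverage: float) -> float:
    """Return on margin capital when the position is `leverage` times the capital."""
    return index_return * leverage


def days_to_multiply(multiple: float, daily_return: float) -> int:
    """Whole trading days needed to grow capital by `multiple` at a constant daily return."""
    return math.ceil(math.log(multiple) / math.log(1.0 + daily_return))


index_daily_return = 0.07  # 7% average daily return in index points (section 5.5)
for leverage in (5, 7):    # leverage range quoted in the text
    money_return = leveraged_return(index_daily_return, leverage)
    print(f"leverage {leverage}x: {money_return:.0%} per day in money terms")

# Trading days needed to turn 1,000 into 10,000 at a constant 30% per day.
print("trading days to multiply capital by 10:", days_to_multiply(10, 0.30))
```

At a constant 30% per day the tenfold increase takes nine trading days, which is consistent with the "within two weeks" estimate, since two weeks contain ten trading days.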

It may be argued that these results come from backtesting, and that transaction costs and various technical

problems in real trading may reduce the profit rate. It is true that there is a long way to go from

backtesting to real financial trading. However, the purpose of section 5 is to demonstrate the principles and

potential power of utilising data science in financial trading, especially for financial speculation. Because

most people who do this job on Wall Street hold PhDs in mathematics, physics or computer science from the

world’s best universities, it is reasonable to assume that they can create much more profitable strategies than

those demonstrated in section 5 and apply them in real trading.

6.2 Potential social impacts of excessive speculation in financial trading

Investing and speculation are two cultures that have co-existed in financial trading since financial markets were created.

There used to be a balance between them. Investors provide capital for companies and receive a steady but

relatively low return. Speculators, who are tempted by the lure of making fast money, trade frequently and

unintentionally provide liquidity for investors. Most speculators eventually lose money or earn little, while most

investors achieve a low but positive return, so the balance can be maintained.

However, now that data science techniques are widely used in financial trading, the balance between investing and

speculation may have been broken, because the returns from speculation have soared with the super weapons

provided by data science techniques. At present, financial speculation is not only conducted by

individual speculators, but may also have become a main source of profits for big financial institutions, such as

investment banks and funds.

Excessive financial speculation has many social impacts. Speculators create little or no

wealth for society, so their gain is society’s loss. The extraordinary returns of financial speculation may

attract many talented and well-educated young people to become professional speculators instead of

scientists or inventors who benefit society. What is more, excessive speculation may compel the management of

companies to focus more on short-term profits rather than on making the decisions that benefit companies

most in the long term.

6.2.1 Financial speculation subtracts value from society

It can be deduced from general principles that, because financial speculation does not create wealth, all profits

made by speculators come from the losses of others: other speculators, investors, or even middle-class people who hold shares

of mutual funds or pension funds (Tong, 2014).

From 2006, fund managers began to notice a strange phenomenon in stock markets. When they decided to

purchase the stock of a company, its share price would go up immediately after they submitted

their purchase orders, and the existing sell orders on the order book would simply disappear (O’Hara, 2015).

Therefore, the fund manager had to purchase those stocks at a higher price and higher cost. Three years passed

before outsiders to Wall Street figured out what had happened: it was due to the order detection strategy of

HFT speculators. Because an order from a fund manager is usually large, say one million shares at one

time, the broker has to send the order to various exchanges or markets to get it matched.

Because the distances between the broker’s computer and the various stock exchanges are not the same, orders

arrive at some markets a few milliseconds later than at others, and it is during this gap that HFT speculators make

their profits. When the HFT program detects a large order in one market, it sends buy orders to all other

markets and exchanges to consume all existing sell orders and push up the stock price. All these actions can

be completed before the fund manager’s other orders arrive at those markets, owing to the technical advantages of

HFT speculators over funds. The fund manager then has to raise the price of his buy order to get those shares,

and the HFT speculators can sell him the shares which they have just bought and make a profit (Lewis,

2014).

Facing challenges from HFT speculators, fund managers have two choices: bearing the loss of profits, or using

algorithmic trading to cut one large order into many small orders in order to prevent their trading plan from being detected

by HFT speculators. Both choices raise the costs of funds, and it is the people who hold shares in the funds

who ultimately pay the bill, because this cost is part of the fund fee. It has been estimated that retirement

savings account fees may cost “a median-income two-earner family nearly $154,794 and consume nearly

one-third of their investment returns” (Hiltonsmith, 2012). Where does their money go? To the profits of financial

institutions and HFT speculators.

6.2.2 Waste of capital, both physical and human

Because present-day speculators depend entirely on their technical advantages to make profits, an arms race

among them is inevitable. This arms race costs a great deal of capital, both physical and human. Unlike in other

industries, the product of this race is simply better speculative trading techniques, not improved services

that benefit the majority of society.

The literature indicates that HFT is driven by technology, so HFT companies and speculators have to

spend a great deal of money to maintain their technical advantage over other speculators and funds. One of the

most famous examples is that a private company, Spread Networks, spent $300 million to build an 827-mile

long fiber cable route connecting the Chicago and New York exchanges. This new cable route cut

the latency between these two exchanges from 17 milliseconds to 12 milliseconds, giving users of Spread

Networks a slight advantage of 5 milliseconds over others (Steiner, 2010).

The extraordinary returns of financial speculation attract a number of well-educated people to devote

themselves to professional speculation using the most advanced mathematical and computing techniques, which is

a waste of human capital for society. For example, Renaissance Technologies, founded by James Simons in 1982, is one of the most famous and

successful quant funds. James Simons is an American mathematician who

used to be a code breaker during the Cold War (Mallaby, 2010). With statistical methods and data science

techniques, Simons and his fund achieved a 71.8 percent annual return from 1994 through mid-2014 (Rubin &

Collins, 2015), and he had accumulated more than $15 billion for himself by 2016 (Forbes, 2016). On its website,

Renaissance Technologies describes itself as a group of “roughly one hundred fifty people, half of whom

have PhDs. in scientific disciplines, dedicated to producing superior returns for its clients and employees by

adhering to mathematical and statistical methods” (Renaissance Technologies, 2016). As for recruitment, it

does not require job candidates to have experience in finance; however, strong scientific and computing

skills and an excellent academic record are basic requirements for most job openings, and for certain jobs only candidates with a

“Ph.D. in Computer Science, Mathematics, Physics, Statistics, or a related discipline” can apply (Renaissance

Technologies, 2016). Because the web pages of job openings may not always be accessible, screenshots

of these pages are included in the Appendix. Renaissance Technologies is not the only quant fund or financial

institution hiring many PhDs in scientific fields. In fact, it is so common that there is a special term for them,

“Wall Street Rocket Scientists” (Zimmer, 2010). The massive rewards of financial speculation, boosted by

data science techniques and other mathematical and financial knowledge, divert many talented young

people from socially beneficial fields into professional speculation, which harms the future of society.

6.2.3 Negative impacts on corporate governance

Excessive speculation in financial markets also has negative impacts on corporate governance, because it

usually compels listed corporations to sacrifice long-term growth to meet short-term expectations.

As mentioned previously, investors focus on long-term value, while speculators mainly focus on the

short-term movement of the stock price. If a company fails to meet a short-term expectation, the growth rate of

quarterly earnings for instance, its stock price usually drops in the near future.

Investors will view this as a chance to buy more shares of the company at a lower price, as long as the company

is capable of fulfilling its long-term goals. Speculators, on the other hand, will sell or short the company’s stock

in order to profit from the short-term downtrend. As markets become more and

more speculative, a corporation will see a larger drop in its share price whenever it fails to meet the

short-term targets set by analysts, because there are more speculators to short and fewer investors to buy. As a result,

company managers have to bear greater pressure, as the share price is one of the most decisive indicators used to

measure their performance. This pressure may force companies to sacrifice long-term value to meet short-term

goals, because for any activity with long-term benefits, such as research and development or deploying new

strategies, the cost is committed now but the fruits are uncertain and can only be reaped in the future

(Ashkenas, 2012; Strine, 2010). A survey of 400 executives indicates that 78% of managers admit that they

have sacrificed long-term value to smooth earnings in order to boost the stock price (Graham, Harvey & Rajgopal,

2005).

6.3 Financial industry, Neoliberalism and income inequality

More than a century ago, Oscar Wilde described the cynic as “a man who knows the price of everything and the

value of nothing”. Nowadays financial speculators armed with data science techniques fit this definition, making

enormous profits out of the price movements of securities and derivatives with little consideration of the value

behind them.

In 2016, the IMF’s research department published a new report, Neoliberalism: Oversold?, rethinking the effects

of the Neoliberal policies widely deployed by many countries since the 1980s. Financial openness, which is one of

the major policies of Neoliberalism, not only attracted foreign direct investment that boosts long-term

economic growth, but also brought “portfolio investment, banking and especially hot, or speculative, debt

inflows”, which led to a high degree of income inequality in both the U.S. and the U.K. (Ostry, Loungani & Furceri,

2016). Volscho (2015) argued that Neoliberalism has “put Wall street back into power”, because it took the

profit rate of the financial sector to a new high, while the profit rate of other industries “was not restored by the

neoliberal era”.

6.3.1 The culture of speculation in finance sector

Although data science techniques are widely used in financial speculation, it is not Data Science that creates

speculation. The impact of a technique is decided not by the technique itself, but by the people who utilise it.

From a broader point of view, the culture of speculation has gradually prevailed over the investing culture in the

finance industry. Data science techniques are exploited by this speculative culture and, in turn, strengthen it. In

1950, the annual turnover of stocks in the U.S. was 15%, which means only 15% of market shares were traded

within that year. By the late 1990s it had risen to 100%, and in 2010 the annual turnover was 250%, which means that,

in general, everyone in the markets trades more frequently and has become more speculative. For institutional

investors, fund portfolio turnover also increased, from 15%-20% in the 1950s to 100% in recent years (Bogle,

2011). Over the last decade, the average annual value of equity IPOs in the U.S. was $42 billion (Renaissance Capital,

2016), and that of secondary offerings was $110 billion (Bogle, 2011). The stock markets therefore raised around

$150 billion for listed companies each year, while annual stock trading volume was around $30 trillion

(Pollin & Heintz, 2011), 200 times the capital raised for listed companies. At the same time, trading in futures

and other derivatives also soared. In 2010, trading volume in S&P 500 futures was $33 trillion, and the total

value of all credit default swaps (CDS) and other derivatives was $580 trillion, about 4 times the value of the

world’s stocks and bonds (Bogle, 2011).
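The ratios quoted in this paragraph can be reproduced with a few lines of arithmetic. The inputs are the figures cited above; reading turnover as an implied average holding period (one divided by annual turnover) is a common rule of thumb added here as an assumption, not a calculation taken from the cited sources.

```python
BILLION = 1e9
TRILLION = 1e12


def holding_period_years(annual_turnover: float) -> float:
    """Rule-of-thumb average holding period implied by annual turnover (assumption)."""
    return 1.0 / annual_turnover


ipo_value = 42 * BILLION             # average annual equity IPOs
secondary_offerings = 110 * BILLION  # average annual secondary offerings
trading_volume = 30 * TRILLION       # annual stock trading volume

capital_raised = ipo_value + secondary_offerings
trading_to_capital = trading_volume / capital_raised

for year, turnover in ((1950, 0.15), (2010, 2.50)):
    period = holding_period_years(turnover)
    print(f"{year}: turnover {turnover:.0%}, implied holding period {period:.1f} years")

print(f"capital raised per year: ${capital_raised / BILLION:.0f} billion")
print(f"trading volume / capital raised: about {trading_to_capital:.0f} times")
```

Under this reading, turnover of 15% corresponds to holding a share for almost seven years on average, while turnover of 250% corresponds to less than five months.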

The initial aim of futures and derivatives was to allocate risks for investors who hold the underlying securities,

like buying insurance for a car. However, the problem now is that the finance industry has become so

speculative that the “insurance fee” is several times the value of the assets insured. Why would this happen? Financial

institutions make profits out of fees and commissions by selling derivatives to their clients. For example,

income from sales and trading accounted for 36% of Morgan Stanley’s revenues during the first nine months

of 2010, while revenue from investment banking, which is raising capital for companies, accounted for only

15%. Goldman Sachs made 63% of its revenue from trading and sales from July to September 2010, while

corporate finance accounted for only 13% (Cassidy, 2010).

6.3.2 The Neoliberal reform in the 1980s

Why has the financial industry become more and more speculative? The initial purpose of the financial industry was to

provide financial services for other, productive industries. However, since Neoliberal policies took a

dominant place in the 1980s, the finance industry has gradually turned to serving itself and the rich by

extracting profits from society. As a result, income inequality has worsened, causing many social

problems (Volscho, 2015).

In the late 1970s and early 1980s, the economy of the western world was stuck in stagflation, in which the economy

ceased to grow but inflation kept rising (Easterly, 2001). Keynesian policies, which had saved the U.S.

from the Great Depression and had been deployed widely ever since, could do nothing about it, as government

spending was already too high and government deficits were one of the main causes of the stagflation.

To overcome it, Neoliberal policies were deployed by U.S. President Reagan and U.K. Prime

Minister Thatcher. Neoliberalism, following classical Liberalism, believes in the “invisible hand” of the free

market: while individuals pursue self-interest under justice in a free and competitive market, the good of

society is promoted by individuals unintentionally (Adams, 2001). Neoliberal scholars developed Adam

Smith’s invisible hand into general equilibrium theory, proving that several free and fully competitive

interacting markets can achieve an overall equilibrium automatically, maximizing the welfare of every

market participant (Samuelson, 1953). Therefore, the key features of Neoliberal reform are the reduction of

welfare and taxation and the deregulation of the finance industry in order to promote free markets (Prasad, 2006).

Figure 38 Financial Business and Nonfinancial Business Augmented Rates of Profit in U.S. (Bakir & Campbell, 2013)

6.3.3 Financial innovation and income inequality

Neoliberal policies did work from the 1980s: as shown in figure 38, the declining trend in the rate of profit of both

financial and nonfinancial business in the U.S. ceased in the early 1980s and the rate started to go up. The side effects of

Neoliberalism were not clear until the 1990s, when the profit rate of finance began to soar, while that of

nonfinancial business fluctuated around 5%-8%.

The deregulation of the financial sector made room for financial innovations, such as the super speculation

weapons based on data science techniques, or the subordinated debt and CDS which brought the whole world into

crisis in 2007. These financial innovations were invented in the pursuit of self-interest in free markets, but,

contrary to the theoretical hypothesis, many of them have nothing to do with the good of society; their

result is to transfer wealth from the majority of society to the very rich. Paul Volcker, a

former chairman of the Federal Reserve, stated that the ATM was the only financial innovation that improved society

(WSJ, 2009). Paul Krugman, a Nobel Prize winner in economics, views the rapid growth of finance since the

1980s, the starting point of the Neoliberal era, “largely as rent-seeking, rather than true productivity” (Krugman,

2009).

Figure 39 The top decile income share from 1917 to 2014 (Saez, 2013)

One of the social impacts of Neoliberalism and financial innovations is income inequality. As financial

speculators extract profits from other investors with data science techniques, the financial industry drains wealth

from the real economy. It has been estimated that each year around $635 billion of wealth is transferred to the

financial sector in the United States (Turbeville, 2012). As shown in figure 39, the income share of the top 10% of

American families started to rise in the 1980s and kept rising even after the financial crisis in 2007, reaching

50% in 2012, the highest level since the Great Depression.

Figure 40 Real Median Household Income in the United States (source: U.S. Bureau of the Census)

While the rich get richer, the public suffers. Figure 40 shows that real median household income in the

United States reached its peak in 1998 and has been declining since then. Many factors have caused the

worsening of income inequality. Excessive financial speculation within an overgrown financial industry

is definitely one of them, if not the biggest one.

7. Conclusion

Backtesting and automated trading systems are the new tools produced by data science techniques for financial

trading. As described and demonstrated in this research, with these new tools it is possible to build various

types of trading strategy in computer code and test them on real historical data. A trading strategy can be

analysed and improved based on the evaluation of backtesting results.

Many applications can be built on these new tools, such as various profitable trading strategies and high

frequency trading (HFT). These powerful applications have significant financial impacts. In this research,

three trading strategies are reproduced: the Moving Average (MA) strategy, the improved MA strategy and the MACD

strategy. After testing these three strategies on the historical data of 15 listed companies, the MACD strategy

and the improved MA strategy perform much better than the MA strategy, making profits on

all 15 stocks. The average compound return for the MACD strategy is about 20% per year over the previous five

years. A much higher return can be achieved by applying the same trading strategy to high frequency data:

the average daily return of the MACD strategy in index futures is around 30%.

These new tools and applications are so profitable that they may have many impacts on both the economy

and society. Data Science techniques enhance the profit rate of financial speculators dramatically, as

speculators can improve their trading strategies at low cost with backtesting, and conduct real trading

exactly according to their trading strategies with automated trading systems. As a result, the balance between

speculation and investing in the finance industry may have been broken, as the return on financial

speculation has soared with data science techniques. However, excessive financial speculation may have

negative social impacts, because it subtracts value from society, costing a great deal of both physical and

human capital without providing any socially beneficial service. Excessive speculation may also compel

companies to sacrifice long-term value in order to meet short-term goals set by financial markets. From a macro

point of view, it is the Neoliberal policies deployed since the 1980s that gave rise to various financial innovations.

Many of these innovations, pursuing only self-interest, contribute nothing to the good of society.

Just as speculative trading programs built with data science techniques extract profits from other investors,

financial innovations may transfer a great deal of wealth from the real economy to the financial sector, resulting in the

worsening of income inequality.

The limitation of this research is that its demonstrations and case studies are based on secondary

sources such as books and literature. Data Science techniques do enhance the power of financial speculation,

but it is still unclear what happens in real financial speculation. Future work may need to acquire more

market data or interview traders to gain a better understanding of these new tools. It is also suggested that more

policy-oriented research should be done, in order to protect not only investors but also society from the harm

of excessive financial speculation.

References

Adams, I. (2001). Political ideology today. Manchester University Press.

Amihud, Y., & Mendelson, H. (1980). Dealership market: Market-making with

inventory. Journal of Financial Economics, 8(1), 31-53.

Angel, J. J., & McCabe, D. (2013). Fairness in financial markets: The case of high

frequency trading. Journal of business ethics, 112(4), 585-595.

Appel, G. (2003). Become Your Own Technical Analyst: How to Identify Significant

Market Turning Points Using the Moving Average Convergence-Divergence Indicator

or MACD. The Journal of Wealth Management, 6(1), 27-36.

Ashkenas, R. (2012). Thinking Long-Term in a Short-Term Economy. Retrieved July

10, 2016 from https://hbr.org/2012/08/thinking-long-term-in-a-short/

Austin, M. P., Bates, G., Dempster, M. A., Leemans, V., & Williams, S. N. (2004).

Adaptive systems for foreign exchange trading. Quantitative Finance, 4(4), 37-45.

Bailey, D. H., Borwein, J. M., de Prado, M. L., & Zhu, Q. J. (2014).

Pseudomathematics and financial charlatanism: The effects of backtest overfitting on

out-of-sample performance. Notices of the AMS, 61(5), 458-471.

Bakir, E., & Campbell, A. (2013). The Financial Rate of Profit: What is it, and how

has it behaved in the United States? Review of Radical Political Economics, 45(3),

295-304.

Bogle, J. C. (2011). The clash of the cultures. Journal of Portfolio Management, 37(3),

14.

Bondarenko, O. (2003). Statistical arbitrage and securities prices. Review of Financial

Studies, 16(3), 875-919.

Boot, A. W., & Thakor, A. V. (1997). Financial system architecture. Review of

Financial studies, 10(3), 693-733.

Cao, L., Wang, J., Lin, L., & Zhang, C. (2004, September). Agent services-based

infrastructure for online assessment of trading strategies. In Intelligent Agent

Technology, 2004. (IAT 2004). Proceedings. IEEE/WIC/ACM International

Conference on (pp. 345-348). IEEE.

Carrion, A. (2013). Very fast money: High-frequency trading on the

NASDAQ. Journal of Financial Markets, 16(4), 680-711.

Cassidy, J. (2010). WHAT GOOD IS WALL STREET? Much of what investment

bankers do is socially worthless. Retrieved July 10, 2016 from

http://www.newyorker.com/magazine/2010/11/29/what-good-is-wall-street

Chong, T. T. L., & Ng, W. K. (2008). Technical analysis and the London stock

exchange: testing the MACD and RSI rules using the FT30. Applied Economics

Letters, 15(14), 1111-1114.

Commodity Futures Trading Commission. (2005). CFTC GLOSSARY. Retrieved July

11, 2016 from

http://www.cftc.gov/ConsumerProtection/EducationCenter/CFTCGlossary/index.htm

Creamer, G., & Freund, Y. (2010). Automated trading with boosting and expert

weighting. Quantitative Finance, 10(4), 401-420.

Dempster, M. A., & Leemans, V. (2006). An automated FX trading system using

adaptive reinforcement learning. Expert Systems with Applications, 30(3), 543-552.

Easterly, W. (2001). The lost decades: developing countries' stagnation in spite of

policy reform 1980–1998. Journal of Economic Growth, 6(2), 135-157.

Elliott, R. J., Van Der Hoek, J., & Malcolm, W. P. (2005). Pairs trading. Quantitative

Finance, 5(3), 271-276.

Forbes. (2016). The World’s Billionaires. Retrieved July 10, 2016 from

http://www.forbes.com/profile/james-simons/

Goldstein, M. A., Kumar, P., & Graves, F. C. (2014). Computerized and

High‐Frequency Trading. Financial Review, 49(2), 177-202.

Graham, B., & Dodd, D. L. (1934). Security analysis: principles and technique.

McGraw-Hill.

Graham, J. R., Harvey, C. R., & Rajgopal, S. (2005). The economic implications of

corporate financial reporting. Journal of accounting and economics, 40(1), 3-73.

Hazen, T. L. (1991). Rational Investments, Speculation, or Gambling--Derivative

Securities and Financial Futures and Their Effect on the Underlying Capital

Markets. Nw. UL Rev., 86, 987.

Hendershott, T., & Riordan, R. (2013). Algorithmic trading and the market for

liquidity. Journal of Financial and Quantitative Analysis, 48(04), 1001-1024.

Hiltonsmith, R. (2012). The Retirement Savings Drain: The Hidden and Excessive

Costs of 401 (k) s. Demos, May, 29.

Jones, C. M. (2013). What do we know about high-frequency trading? Columbia

Business School Research Paper, (13-11).

Kabacoff, R. (2015). R in action: data analysis and graphics with R. Manning

Publications Co.

Kitchin, R. (2014). The data revolution. SAGE publications.

Krugman, P. (2009). Darling, I love you. Retrieved July 10, 2016 from

http://krugman.blogs.nytimes.com/2009/12/09/darling-i-love-you

Lauricella, T. (2010). Market Plunge Baffles Wall Street. Retrieved November 11,

2015 from

http://www.wsj.com/articles/SB10001424052748704370704575228664083620340

Lefevre, E. (2004). Reminiscences of a stock operator (Vol. 175). John Wiley & Sons.

Levine, D.M. (2013), A day in the quiet life of a NYSE floor trader, Retrieved April

10, 2016 from

http://fortune.com/2013/05/29/a-day-in-the-quiet-life-of-a-nyse-floor-trader/

Lewis, M. (2014). Flash boys: a Wall Street revolt. WW Norton & Company.

Liu, Z., & Xiao, D. (2009). An automated trading system with multi-indicator fusion

based on DS evidence theory in forex market. In Fuzzy Systems and Knowledge

Discovery, 2009. FSKD'09. Sixth International Conference on (Vol. 3, pp. 239-243).

IEEE.

Livermore, J. (2006). How to trade in stocks. McGraw Hill Professional.

Lopez, J.A. and Saidenberg, M.R., 2000. Evaluating Credit Risk Models. Journal of

Banking and Finance, 24, 151-167

Madura, J. (2012). Financial Institutions and Markets (10th Edition). Ohio, United

States: Cengage South-Western.

Manahov, V., Hudson, R., & Gebka, B. (2014). Does high frequency trading affect

technical analysis and market efficiency? And if so, how?. Journal of International

Financial Markets, Institutions and Money, 28, 131-157.

Maier-Paape, S., & Platen, A. (2014). Backtest of trading systems on candle

charts. arXiv preprint arXiv:1412.5558.

Mallaby, S. (2010). More money than god: Hedge funds and the making of the new

elite. A&C Black.

Marshall, B. R., Nguyen, N. H., & Visaltanachoti, N. (2013). ETF arbitrage: Intraday

evidence. Journal of Banking & Finance, 37(9), 3486-3498.

Masteika, S., Rutkauskas, A. V., & Alexander, J. A. (2012, February). Continuous

futures data series for back testing and technical analysis. In Conference Proceedings,

3rd International Conference on Financial Theory and Engineering (Vol. 29, pp.

265-269). IACSIT Press.

Menkveld, A. J. (2014). High‐Frequency Traders and Market Structure. Financial

Review, 49(2), 333-344.

Mises, L.V. (2010), The Wealth of Generations: Warren Buffett, Retrieved July 10,

2016 from http://thewealthofgenerations.blogspot.co.uk/2010/01/warren-buffett.html

Mishkin, F. S. (2007). The economics of money, banking, and financial markets.

Pearson education.

Ni, J., & Zhang, C. (2005). An efficient implementation of the backtesting of trading

strategies. In Parallel and Distributed Processing and Applications (pp. 126-131).

Springer Berlin Heidelberg.

O’Hara, M. (2015). High frequency market microstructure. Journal of Financial

Economics, 116(2), 257-270.

Ostry, J., Loungani, P., & Furceri, D. (2016). Neoliberalism: Oversold? Finance &

Development, June 2016, Vol. 53, No. 2.

Pasquale, F. (2015). The black box society: The secret algorithms that control money

and information. Harvard University Press.

Pilbeam, K. (2010). Finance and financial markets. Palgrave Macmillan.

Pollin, R., & Heintz, J. (2011). Transaction Costs, Trading Elasticities and the

Revenue Potential of Financial Transaction Taxes for the United States. Research

Brief, December 2011, 1-16.

Prasad, M. (2006). The politics of free markets: The rise of neoliberal economic

policies in Britain, France, Germany, and the United States (Vol. 19). Chicago:

University of Chicago Press.

Provost, F., & Fawcett, T. (2013). Data science and its relationship to big data and

data-driven decision making. Big Data, 1(1), 51-59.

Reinhart, C. M., & Rogoff, K. S. (2008). Is the 2007 US sub-prime financial crisis so

different? An international historical comparison (No. w13761). National Bureau of

Economic Research.

Renaissance Capital. (2016). IPO Center. Retrieved July 10, 2016 from

http://www.renaissancecapital.com/ipohome/press/mediaroom.aspx?market=us

Renaissance Technologies. (2016). Job Opening for Quantitative Finance. Retrieved

July 10, 2016 from https://www.rentec.com/Jobs.action?research=true

Rubin, R. & Collins, M. (2015). How an Exclusive Hedge Fund Turbocharged Its

Retirement Plan. Retrieved July 10, 2016 from

http://www.bloomberg.com/news/articles/2015-06-16/how-an-exclusive-hedge-fund-turbocharged-retirement-plan

Saez, E. (2013). Striking it Richer: The Evolution of Top Incomes in the United States

(updated with 2012 preliminary estimates). Berkeley: University of California,

Department of Economics. http://elsa.berkeley.edu/~saez/saez-UStopincomes-2012.pdf

and The World Top Incomes Database. http://topincomes.gmond.parisschoolofeconomics.eu

Samuelson, P. A. (1953). Prices of factors and good in general equilibrium. The

Review of Economic Studies, 1-20.

Schutt, R. & O'Neil, C. (2013). Doing Data Science. California, United States:

O’Reilly Media.

Securities and Exchange Commission. (2010). Concept Release on Equity Market

Structure (Release No. 34-61358). Washington, DC: U.S.

Securities and Exchange Commission, & Securities and Exchange Commission.

(2010). Findings regarding the market events of May 6, 2010. Report of the Staffs of

the CFTC and SEC to the Joint Advisory Committee on Emerging Regulatory Issues.

Serbera, J. P., & Paumard, P. (2016). The fall of high-frequency trading: A survey of

competition and profits. Research in International Business and Finance, 36,

271-287.

Selyukh, A. (2013). Hackers send fake market-moving AP tweet on White House

explosions. Retrieved April 9, 2016 from

http://www.reuters.com/article/net-us-usa-whitehouse-ap-idUSBRE93M12Y20130423

Shaffer, L., China’s wild swings spark confusion in markets. Retrieved November 11,

2015 from http://www.cnbc.com/id/100967541

Solomon, J. (2010). Apple stock now costs $94. Fans love it. Retrieved July 10, 2016

from http://money.cnn.com/2014/06/09/investing/apple-stock-split-reactions/

Steiner, C. (2010). Wall Street's speed war. Retrieved July 10, 2016 from

http://www.forbes.com/forbes/2010/0927/outfront-netscape-jim-barksdale-daniel-spivey-wall-street-speed-war.html

Strine Jr, L. E. (2010). One Fundamental Corporate Governance Question We Face:

Can Corporations Be Managed for the Long Term Unless Their Powerful Electorates

Also Act and Think Long Term?. The Business Lawyer, 1-26.

Szado, E. (2011). Defining speculation: The first step toward a rational dialogue. The

Journal of Alternative Investments, 14(1), 75.

The Crises. (2009). Income Inequality in the US. Retrieved July 10, 2016 from

http://www.the-crises.com/income-inequality-in-the-us-1/

Tong, L. (2014). A blessing or a curse? The impact of high frequency trading on

institutional investors. In The Impact of High Frequency Trading on Institutional

Investors (October 5, 2015). European Finance Association Annual Meetings.

Turbeville, W. C. (2012). New Perspective on the Costs and Benefits of Financial

Regulation: Inefficiency of Capital Intermediation in a Deregulated System, A. Md. L.

Rev., 72, 1173.

Volscho, T. (2015). The Revenge of the Capitalist Class: Crisis, the Legitimacy of

Capitalism and the Restoration of Finance from the 1970s to Present. Critical

Sociology, 0896920515589003.

Wall Street Journal. (2009). Paul Volcker: Think More Boldly. Retrieved July 10,

2016 from

http://www.wsj.com/articles/SB10001424052748704825504574586330960597134

Zimmer, B. (2010). Quants. Retrieved July 10, 2016 from

http://www.nytimes.com/2010/05/16/magazine/16FOB-OnLanguage-t.html?_r=0

Appendices

A: Websites of Renaissance Technologies

B: Code programmed for this research with RStudio

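# Note: the sections below call their helper functions before defining them,

# so source the function definitions before running the calls.

setwd("U:/ Disertation/program")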

#-------------------#

#5.1.1 Get data from Yahoo

#-------------------#

install.packages("quantmod")

library(quantmod)

stock_data<-getSymbols("AAPL",from = "2010-01-01",to = "2016-06-21",src =

"yahoo",auto.assign=FALSE,row.names= 1)

colnames(stock_data)<-c('OPEN','HIGH','LOW','CLOSE','Volume','Adjust')

# An xts object stores one matrix of a single type, so the DATE strings are carried in a data frame

stock_data<-data.frame(DATE=as.character(index(stock_data)),coredata(stock_data)[,1:4],stringsAsFactors=FALSE)

write.csv(stock_data, file="data/AAPL.csv",row.names=F,quote=F)

#-------------------#

#5.1.2 Visualization

#-------------------#

Cplot(stock_data)

#-------------------#

#Plot C-graph

Cplot<-function(stock)

{

par(mar=c(2,4,1.5,0.5))

N<-length(stock$OPEN)

plot(c(1:N),stock$CLOSE,type='n',xaxt='n',xlab='',ylab='Price',font.axis=1.5)

title(main="APPLE",cex=2,col='black')

w<-0.3

for(i in 1:N)

{

D<-as.numeric(stock$CLOSE[i])-as.numeric(stock$OPEN[i])

lines(c(i,i),c(stock$LOW[i],stock$HIGH[i]),col='black',lwd=1)

x<-c(i-w,i-w,i+w,i+w)

y<-c(stock$OPEN[i],stock$CLOSE[i],stock$CLOSE[i],stock$OPEN[i])

if(D<0)

{

polygon(x,y,col='red',border='red')

} else

{

polygon(x,y,col='green',border='green')

}

}

Index<-seq(from=1,to=N,length=5)

Index<-round(Index)

Text<-stock$DATE[Index]

axis(side=1,Index,labels=Text,cex.axis=1)

}

#-------------------#

#5.1.3 Deal with stock split

#-------------------#

stock<-read.csv("data/AAPL.csv",header=T,encoding='UTF-8',as.is=TRUE)

stock<-Stock_Split(stock)

Cplot(stock)

#-------------------#

#Function for APPLE stock split: rows 1 to 1114 are the trading days

#before the 7-for-1 split that took effect on 9 June 2014

Stock_Split<-function(stock)

{

for(i in 1:1114)

{

stock$OPEN[i]<-stock$OPEN[i]/7

stock$HIGH[i]<-stock$HIGH[i]/7

stock$LOW[i]<-stock$LOW[i]/7

stock$CLOSE[i]<-stock$CLOSE[i]/7

}

return(stock)

}
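
The fixed row count in Stock_Split only fits this particular AAPL download. As a minimal sketch, assuming the same DATE, OPEN, HIGH, LOW and CLOSE columns, the adjustment could instead be keyed on the split date. The helper name adjust_for_split and the toy prices below are hypothetical and are not part of the original program.

```r
# Hypothetical helper: divide all prices dated before the split by the split ratio.
adjust_for_split <- function(stock, split_date, ratio) {
  before <- as.Date(stock$DATE) < as.Date(split_date)
  cols <- c("OPEN", "HIGH", "LOW", "CLOSE")
  stock[before, cols] <- stock[before, cols] / ratio
  stock
}

# Made-up prices around a 7-for-1 split, for illustration only.
toy <- data.frame(
  DATE  = c("2014-06-05", "2014-06-06", "2014-06-09"),
  OPEN  = c(700, 707, 92),
  HIGH  = c(714, 721, 94),
  LOW   = c(693, 700, 91),
  CLOSE = c(707, 714, 94),
  stringsAsFactors = FALSE
)
adj <- adjust_for_split(toy, "2014-06-09", 7)
print(adj)
```

Rows dated on or after the split date are left unchanged, so the series becomes continuous across the split.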

#-------------------#

#-------------------#

#5.2.1 Calculate MA

#-------------------#

stock<-Calculate_MA(stock)

Kplot_MA(stock)

#-------------------#

#Calculate MA5, MA20

Calculate_MA<-function(stock)

{

stock$MA5<-as.numeric(0)

stock$MA20<-as.numeric(0)

N<-length(stock$DATE)

for(i in 5:N)

{

stock$MA5[i]<-mean(stock$CLOSE[(i-4):i])

}

for(i in 20:N)

{

stock$MA20[i]<-mean(stock$CLOSE[(i-19):i])

}

return(stock)

}
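
The two loops in Calculate_MA compute trailing averages one row at a time. A vectorised alternative can be sketched in base R with stats::filter. The helper name trailing_ma is an assumption for illustration and does not appear in the original program.

```r
# Hypothetical helper: trailing simple moving average over the last n values.
trailing_ma <- function(x, n) {
  # sides = 1 makes the filter use the current value and the n - 1 values before it
  ma <- as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
  # Mirror Calculate_MA, which leaves the first n - 1 rows at 0
  ma[is.na(ma)] <- 0
  ma
}

close <- c(10, 11, 12, 13, 14, 15, 16)
ma5 <- trailing_ma(close, 5)
print(ma5)
```

Under these assumptions, stock$MA5 and stock$MA20 would correspond to trailing_ma(stock$CLOSE, 5) and trailing_ma(stock$CLOSE, 20).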

#-------------------#

# Plot MA

Kplot_MA<-function(stock)

{

par(mar=c(2,4,1.5,0.5))

N<-length(stock$OPEN)

plot(c(1:N),stock$CLOSE,type='n',xaxt='n',xlab='',ylab='Price',font.axis=1.5)

title(main="APPLE",cex=2,col='black')

w<-0.3

for(i in 1:N)

{

D<-as.numeric(stock$CLOSE[i])-as.numeric(stock$OPEN[i])

lines(c(i,i),c(stock$LOW[i],stock$HIGH[i]),col='black',lwd=1)

x<-c(i-w,i-w,i+w,i+w)

y<-c(stock$OPEN[i],stock$CLOSE[i],stock$CLOSE[i],stock$OPEN[i])

if(D<0)

{

polygon(x,y,col='red',border='red')

} else

{

polygon(x,y,col='green',border='green')

}

}

lines(c(1:N),stock$MA5,lwd=2,col="black")

lines(c(1:N),stock$MA20,lwd=2,col="orange")

Index<-seq(from=1,to=N,length=5)

Index<-round(Index)

Text<-stock$DATE[Index]

axis(side=1,Index,labels=Text,cex.axis=1)

}

#-------------------#

#-------------------#

#5.2.2 Backtesting MA strategy

#-------------------#

stock<-Cut_Period(stock,253,1628)

stock<-MA_Strategy(stock)

#-------------------#

Cut_Period<-function(data,start,end)

{

data<-data[c(start:end),]

return(data)

}

#-------------------#

MA_Strategy<-function(stock)

{

stock$Trade<-0

stock$Hold<-0

N<-length(stock$DATE)

flag_hold<-'0'

for (i in 2:N)

{

if(stock$MA5[i-1]<=stock$MA20[i-1] & stock$MA5[i]>stock$MA20[i] & stock$Hold[i-1]!='1')

{

stock$Trade[i]<-'1'

flag_hold<-'1'

}

if(stock$MA5[i-1]>=stock$MA20[i-1] & stock$MA5[i]<stock$MA20[i] & stock$Hold[i-1]!='-1')

{

stock$Trade[i]<-'-1'

flag_hold<-'-1'

}

stock$Hold[i]<-flag_hold

}

return(stock)

}
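
The crossover test inside MA_Strategy compares yesterday's and today's ordering of the two averages. The sketch below isolates that test in vectorised form. The helper name cross_signal is hypothetical, and the sketch deliberately omits the Hold bookkeeping that MA_Strategy uses to block repeated trades in the same direction.

```r
# Hypothetical helper: +1 where `fast` crosses above `slow`,
# -1 where it crosses below, 0 otherwise.
cross_signal <- function(fast, slow) {
  n <- length(fast)
  prev_fast <- c(NA, fast[-n])  # value on the previous row
  prev_slow <- c(NA, slow[-n])
  up   <- prev_fast <= prev_slow & fast > slow
  down <- prev_fast >= prev_slow & fast < slow
  sig <- integer(n)
  sig[which(up)]   <- 1L   # which() drops the NA in the first row
  sig[which(down)] <- -1L
  sig
}

fast <- c(1, 2, 4, 5, 3, 1)
slow <- c(3, 3, 3, 3, 3, 3)
sig <- cross_signal(fast, slow)
print(sig)
```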

#-------------------#

#5.2.3 Compute profits and visualization

#-------------------#

stock<-Calcualate_Profit(stock)

Kplot_Profit(stock,"APPLE")

Calcualate_Profit<-function(stock)

{

stock$Profit<-as.numeric(0)

stock$Delta_Profit<-0

flag_first_trade<-0

N<-length(stock$DATE)

for (i in 2:N)

{

if(stock$Trade[i]!='0' & flag_first_trade=='1')

{

stock$Delta_Profit[i]=((-1)*stock$OPEN[i]*as.numeric(stock$Trade[i])

-stock$CLOSE[last_trade]*as.numeric(stock$Trade[last_trade]))

last_trade<-i

}

if(stock$Trade[i]!='0' & flag_first_trade=='0')

{

flag_first_trade<-'1'

last_trade<-i

}

stock$Profit[i]<-stock$Profit[i-1]+stock$Delta_Profit[i]

}

return(stock)

}

#-------------------#

# Plot Profits

Kplot_Profit<-function(stock,stock_name)

{

m<-matrix(c(1,2,1,2),2,2)

N<-length(stock$OPEN)

layout(m,heights=c(3,1))

par(mar=c(0.5,2.5,1.5,0.5))

plot(c(1:N),stock$CLOSE,type='n',xaxt='n',xlab='',ylab='Price',font.axis=1.5,cex.axis=1.8)

title(main=stock_name,cex=4,col='black')

w<-0.3

for(i in 1:N)

{

D<-as.numeric(stock$CLOSE[i])-as.numeric(stock$OPEN[i])

lines(c(i,i),c(stock$LOW[i],stock$HIGH[i]),col='black',lwd=1)

x<-c(i-w,i-w,i+w,i+w)

y<-c(stock$OPEN[i],stock$CLOSE[i],stock$CLOSE[i],stock$OPEN[i])

if(D<0)

{

polygon(x,y,col='red',border='red')

if(stock$Trade[i]=='1'){text(i,as.numeric(stock$LOW[i])-4,"buy",col='green',cex=2)}

if(stock$Trade[i]=='-1'){text(i,as.numeric(stock$HIGH[i])+5,"sell",col='red',cex=2)}

} else

{

polygon(x,y,col='green',border='green')

if(stock$Trade[i]=='1'){text(i,as.numeric(stock$LOW[i])-4,"buy",col='green',cex=2)}

if(stock$Trade[i]=='-1'){text(i,as.numeric(stock$HIGH[i])+5,"sell",col='red',cex=2)}

}

}

text((N*0.1),(max(as.numeric(stock$HIGH))-10),paste("Profit rate:",Profit_rate(stock),"%"),col='black',cex=2)

lines(c(1:N),stock$MA5,lwd=2,col="black")

lines(c(1:N),stock$MA20,lwd=2,col="orange")

Index<-seq(from=1,to=N,length=5)

Index<-round(Index)

Text<-stock$DATE[Index]

axis(side=1,Index,labels=Text,cex.axis=1.8)

plot(c(1:N),stock$Profit,type='n',xaxt='n',xlab='',ylab='Profit',font.axis=1.5,cex.axis=1.8)

abline(0,0)

lines(c(1:N),stock$Profit,lwd=2,col="blue")

}

#-------------------#

#-------------------#

#5.2.4 Evaluation and discussion

#-------------------#

temp<-Cut_Period(stock,1,270)

Kplot_Profit(temp,"APPLE")

#-------------------#

#-------------------#

#5.3.1 Avoid trades in range bound

#-------------------#

profits<-c()

for (i in 1:30)

{

stock<-MA_Avoid_Range_Bound(stock,i)

stock<-Calcualate_Profit(stock)

profits[i]<-Profit_rate(stock)

}

max(profits)

i<-which.max(profits)

i

profits

stock<-MA_Avoid_Range_Bound(stock,13)

stock<-Calcualate_Profit(stock)

Kplot_Profit(stock,"APPLE")

MA_Avoid_Range_Bound<-function(stock,N)

{

stock$Signal<-0

stock$Trade<-0

stock$Hold<-0

Len<-length(stock$DATE)

flag_hold<-'0'

for (i in (N+1):Len)

{

if(stock$MA5[i-1]<=stock$MA20[i-1] & stock$MA5[i]>stock$MA20[i] & stock$Hold[i-1]!='1')

{

stock$Signal[i]<-'1'

if(sum(as.numeric(stock$Signal[(i-N):(i-1)]))==0)

{

stock$Trade[i]<-'1'

flag_hold<-'1'

}

}

if(stock$MA5[i-1]>=stock$MA20[i-1] & stock$MA5[i]<stock$MA20[i] & stock$Hold[i-1]!='-1')

{

stock$Signal[i]<-'1'

if(sum(as.numeric(stock$Signal[(i-N):(i-1)]))==0)

{

stock$Trade[i]<-'-1'

flag_hold<-'-1'

}

}

stock$Hold[i]<-flag_hold

}

return(stock)

}

Profit_rate<-function(stock)

{

profit<-stock$Profit[length(stock$Profit)]

average_cost<-(min(stock$CLOSE)+max(stock$CLOSE))/2

return(round(100*profit/average_cost,2))

}
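
Profit_rate expresses the cumulative profit as a percentage of the midpoint between the lowest and highest closing price, not of the capital actually committed to each trade. A small worked example with made-up numbers shows the arithmetic.

```r
# Made-up closing prices and cumulative profit, for illustration only.
close  <- c(90, 100, 110, 120)
profit <- 21

# Midpoint of the closing-price range: (90 + 120) / 2 = 105
average_cost <- (min(close) + max(close)) / 2

# 100 * 21 / 105 = 20 percent
profit_rate <- round(100 * profit / average_cost, 2)
print(profit_rate)
```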

#-------------------#

#-------------------#

#5.4 Using technical indicators

#-------------------#

stock<-read.csv("data/AAPL.csv",header=T,encoding='UTF-8',as.is=TRUE)

stock<-Stock_Split(stock)

Cplot(stock)

stock<-Calculate_MACD(stock)

stock<-Cut_Period(stock,253,1628)

stock<-MACD_Cross(stock)

stock<-Calcualate_Profit(stock)

Kplot_MACD(stock,"APPLE")

temp<-Cut_Period(stock,1,330)

Kplot_MACD(temp,"APPLE")

# Calculate MACD

Calculate_MACD<-function(stock)

{

library(TTR)

stock$EMA12<-'0'

stock$EMA26<-'0'

stock$DIFF<-'0'

stock$DEA<-'0'

stock$MACD<-'0'

stock$EMA12<-EMA(stock$CLOSE,12)

stock$EMA26<-EMA(stock$CLOSE,26)

stock$DIFF<-stock$EMA12-stock$EMA26

stock$DEA<-EMA(stock$DIFF,9)

stock$MACD<-2*(stock$DIFF-stock$DEA)

return(stock)

}
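
Calculate_MACD relies on the EMA function from TTR. As a rough sketch of what that smoothing does, the base R code below assumes the common convention of a smoothing factor 2/(n+1), seeded with the simple average of the first n values. The helper name ema_sketch is hypothetical, and the details may differ from TTR's implementation.

```r
# Hypothetical helper: exponential moving average with smoothing factor 2 / (n + 1).
ema_sketch <- function(x, n) {
  alpha <- 2 / (n + 1)
  out <- rep(NA_real_, length(x))
  out[n] <- mean(x[1:n])  # seed with the simple average of the first n values
  if (length(x) > n) {
    for (i in (n + 1):length(x)) {
      # Blend today's value with yesterday's average
      out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
    }
  }
  out
}

x <- c(1, 2, 3, 4, 5, 6)
e3 <- ema_sketch(x, 3)
print(e3)
```

In Calculate_MACD, DIFF is the 12-day average minus the 26-day average, DEA is a 9-day average of DIFF, and the plotted MACD bar is twice their difference.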

#-------------------#

#MACD Cross strategy

MACD_Cross<-function(stock)

{

stock$Trade<-0

stock$Hold<-0

N<-length(stock$DATE)

flag_hold<-'0'

for (i in 2:N)

{

if(stock$DIFF[i-1]<=stock$DEA[i-1] & stock$DIFF[i]>stock$DEA[i] & stock$Hold[i-1]!='1')

{

stock$Trade[i]<-'1'

flag_hold<-'1'

}

if(stock$DIFF[i-1]>=stock$DEA[i-1] & stock$DIFF[i]<stock$DEA[i] & stock$Hold[i-1]!='-1')

{

stock$Trade[i]<-'-1'

flag_hold<-'-1'

}

stock$Hold[i]<-flag_hold

}

return(stock)

}

#-------------------#

#Plot MACD

Kplot_MACD<-function(stock,stock_name)

{

m<-matrix(c(1,2,3,1,2,3),3,2)

N<-length(stock$OPEN)

layout(m,heights=c(3,1,1))

par(mar=c(0.5,2.5,1.5,0.5))

plot(c(1:N),stock$CLOSE,type='n',xaxt='n',xlab='',ylab='Price',font.axis=1.5,cex.axis=1.8)

title(main=stock_name,cex=2,col='black')

w<-0.3

for(i in 1:N)

{

D<-as.numeric(stock$CLOSE[i])-as.numeric(stock$OPEN[i])

lines(c(i,i),c(stock$LOW[i],stock$HIGH[i]),col='black',lwd=1)

x<-c(i-w,i-w,i+w,i+w)

y<-c(stock$OPEN[i],stock$CLOSE[i],stock$CLOSE[i],stock$OPEN[i])

if(D<0)

{

polygon(x,y,col='green',border='green')

if(stock$Trade[i]=='1'){text(i,as.numeric(stock$LOW[i])-0.2,"buy",col='green',cex=2)}

if(stock$Trade[i]=='-1'){text(i,as.numeric(stock$HIGH[i])+0.2,"sell",col='red',cex=2)}

} else

{

polygon(x,y,col='red',border='red')

if(stock$Trade[i]=='1'){text(i,as.numeric(stock$LOW[i])-1.5,"buy",col='green',cex=2)}

if(stock$Trade[i]=='-1'){text(i,as.numeric(stock$HIGH[i])+1.5,"sell",col='red',cex=2)}

}

}

Index<-seq(from=1,to=N,length=5)

Index<-round(Index)

Text<-stock$DATE[Index]

axis(side=1,Index,labels=Text,cex.axis=1.8)

text((N*0.1),(max(as.numeric(stock$HIGH))-10),paste("Profit rate:",Profit_rate(stock),"%"),col='black',cex=2.5)

plot(c(1:N),stock$DIFF,type='n',xaxt='n',xlab='',ylab='MACD',font.axis=1.5,cex.axis=1.8)

w<-0.1

for(i in 1:N)

{

if(stock$MACD[i]<0)

{

lines(c(i,i),c(stock$MACD[i],0),col='red',lwd=1)

}

if(stock$MACD[i]>0)

{

lines(c(i,i),c(0,stock$MACD[i]),col='green',lwd=1)

}

}

abline(0,0)

lines(c(1:N),stock$DIFF,lwd=2,col="red")

lines(c(1:N),stock$DEA,lwd=2,col="blue")

plot(c(1:N),stock$Profit,type='n',xaxt='n',xlab='',ylab='Profit',font.axis=1.5,cex.axis=1.8)

abline(0,0)

lines(c(1:N),stock$Profit,lwd=2,col="blue")

}

#-------------------#

#-------------------#

#5.5 Applying same strategy to other stocks

#-------------------#

stock_list<-c("NYSE:BRK.A","NYSE:JPM","NYSE:XOM","NYSE:TM","NYSE:T","HSBA.L","NYSE:C","NYSE:WMT","KRX:005930","VOW.F","NASDAQ:MSFT","NASDAQ:GOOGL","F","NYSE:IBM","VTX:NESN")

stock_Num<-15

stock_code<-stock_list[stock_Num]

stock_data<-getSymbols(stock_code,from = "2010-01-01",to = "2016-06-21",src =

"google",auto.assign=FALSE,row.names= 1)

stock_data<-stock_data[,c(1:4)]

colnames(stock_data)<-c('OPEN','HIGH','LOW','CLOSE')

# An xts object stores one matrix of a single type, so the DATE strings are carried in a data frame

stock<-data.frame(DATE=as.character(index(stock_data)),coredata(stock_data),stringsAsFactors=FALSE)

stock<-Calculate_MA(stock)

stock<-Calculate_MACD(stock)

stock<-stock[c(253:length(stock$OPEN)),]

stock_file<-paste("data/",stock_Num,".csv",sep="")

write.csv(stock, file=stock_file,row.names=F,quote=F)

stock_file<-paste("data/",stock_Num,".csv",sep="")

stock<-read.csv(stock_file,header=T,encoding='UTF-8',as.is=TRUE)

stock<-MA_Strategy(stock)

stock<-Calcualate_Profit(stock)

Kplot_Profit(stock,stock_list[stock_Num])

pro<-c()

for (i in 1:30)

{

stock<-MA_Avoid_Range_Bound(stock,i)

stock<-Calcualate_Profit(stock)

pro[i]<-Profit_rate(stock)

}

max(pro)

i<-which.max(pro)

i<-21

stock<-MA_Avoid_Range_Bound(stock,i)

stock<-Calcualate_Profit(stock)

Kplot_Profit(stock,stock_list[stock_Num])

stock<-MACD_Cross(stock)

stock<-Calcualate_Profit(stock)

Kplot_MACD(stock,stock_list[stock_Num])

#-------------------#

#5.6 Applying MACD strategy to high frequency trading

#-------------------#

temp=list.files(path="data/if_csv",pattern = "\\.csv$",all.files = T)

pro_fut<-c()

i=1

future_Num<-substr(temp[i],1,17)

future_file<-paste("data/if_csv/",future_Num,".csv",sep="")

index_future<-read.csv(future_file,header=T,encoding='UTF-8',as.is=TRUE)

index_future<-MACD_Cross(index_future)

index_future<-Calcualate_Profit(index_future)

Kplot_MACD(index_future,future_Num)

pro_fut[i]<-Profit_rate(index_future)

pro_fut

mean(pro_fut)

Information School.

Access to Dissertation

A Dissertation submitted to the University may be held by the Department (or School) within which the

Dissertation was undertaken and made available for borrowing or consultation in accordance with

University Regulations.

Requests for the loan of dissertations may be received from libraries in the UK and overseas. The

Department may also receive requests from other organisations, as well as individuals. The conservation

of the original dissertation is better assured if the Department and/or Library can fulfill such requests by

sending a copy. The Department may also make your dissertation available via its web pages.

In certain cases where confidentiality of information is concerned, if either the author or the supervisor so

requests, the Department will withhold the dissertation from loan or consultation for the period specified

below. Where no such restriction is in force, the Department may also deposit the Dissertation in the

University of Sheffield Library.

To be completed by the Author – Select (a) or (b) by placing a tick in the appropriate box

If you are willing to give permission for the Information School to make your dissertation available in

these ways, please complete the following:

√ (a) Subject to the General Regulation on Intellectual Property, I, the author, agree to this dissertation

being made immediately available through the Department and/or University Library for

consultation, and for the Department and/or Library to reproduce this dissertation in whole or

part in order to supply single copies for the purpose of research or private study

(b) Subject to the General Regulation on Intellectual Property, I, the author, request that this

dissertation be withheld from loan, consultation or reproduction for a period of [ ] years from

the date of its submission. Subsequent to this period, I agree to this dissertation being made

available through the Department and/or University Library for consultation, and for the

Department and/or Library to reproduce this dissertation in whole or part in order to supply

single copies for the purpose of research or private study

Name Duan Zhao

Department Information School

Signed Duan Zhao (赵端) Date Aug. 24. 2016

To be completed by the Supervisor – Select (a) or (b) by placing a tick in the appropriate

box

(a) I, the supervisor, agree to this dissertation being made immediately available through the

Department and/or University Library for loan or consultation, subject to any special restrictions

(*) agreed with external organisations as part of a collaborative project.

*Special

restrictions

(b) I, the supervisor, request that this dissertation be withheld from loan, consultation or

reproduction for a period of [ ] years from the date of its submission. Subsequent to this period,

I agree to this dissertation being made available through the Department and/or University

Library for loan or consultation, subject to any special restrictions (*) agreed with external

organisations as part of a collaborative project

Name

Department

Signed Date

THIS SHEET MUST BE SUBMITTED WITH DISSERTATIONS BY DEPARTMENTAL REQUIREMENTS.