Discovery of Stock Trading Expertise Using Genetic Programming


Laboratoire des Sciences de l'Image, de l'Informatique et de la Télédétection

LSIIT - UMR 7005

Fundamental and Applied Computer Science Research Master

Internship Research Report

Discovery of Stock Trading Expertise Using Genetic Programming

By: Syed Muhammed Ali Jafri

Supervisor: Pr. Jerzy KORCZAK

Illkirch, September 2006



Contents

1. Introduction
2. Background
2.1 Financial Prediction
2.2 Evolutionary Computing
2.3 Genetic Algorithms
2.3.1 String representation
2.3.2 Crossover and mutation operations
2.3.3 Fitness based selection
2.4 Genetic Programming
2.4.1 Tree structure representation
2.4.2 Operators and terminals
2.4.3 GP Crossover and mutation
3. Internet Bourse Experts System
3.1 Introduction
3.2 Conceptual Flow
3.3 GP Engine in IBE
4. System Design and Implementation
4.1 Problem Definition
4.2 Implementation details
4.2.1 Technical Indicator set
4.2.2 Fitness Measurement
4.2.3 Description of GP Engine
5. Experimentation
5.1 Experimental Aims and Objectives
5.2 Trading Procedure
5.3 Experimental Input Data
5.4 Parameters
5.5 Results
5.6 Discussion of Results
6. Conclusion
Appendix A: GP Algorithm Parameters
Bibliography



1. Introduction

Evolutionary computation has been extensively applied to problems whose solution space is

irregular, i.e., too large and highly complex, so that it is difficult to employ conventional

optimization procedures to search for the global optimum [Chen, 1998].

Solution spaces for financial time series data are highly irregular. General acceptance of this

property has in fact fostered the growth of financial engineering [Chen, 1998].

Many strategies and frameworks have been employed, ranging from the traditional and more
popular autoregressive statistical approaches such as ARCH and GARCH [Gourieroux, 1997]
to more recent machine learning and evolutionary approaches such as neural networks
[Krishnaswamy et al., 2000], genetic algorithms [Allen, 1999], [Korczak, 2001],
[Lipinski, 2003], [Korczak, 2004] and genetic programming [Langdon, 1995], [Kaboudan, 1999],
[Santini, 2000], [Hui, 2003], [Castebrunet, 2005], the last of which is the concern of this report.

The objective is to model a process of evolution-based learning and to create a genetic
programming based system which is able to accept high frequency stock market data,
analyze it rapidly and give BUY/SELL/HOLD signals.

This system will generate trees of technical trading rules joined together by logical
operators. Every decision signal at a given point in time will be the result of a training stage
and a testing stage. The training stage will include the generation of trees, performance testing
and evolution. At the end of the training stage, a single best tree will remain, which will be
used to generate a decision for the testing stage. For subsequent time steps, a selection of the
best trees from the previous time step will be reused.

The idea is that at each time step the better performing trees will be used and promoted, while
the weaker trees are discarded.

This report begins with an introduction to the theory and practice of financial prediction and a

description of evolutionary computation techniques, with a particular focus on Genetic

Programming. This is followed by a review of an existing system, Internet Bourse Experts.

Details of the design and implementation of the GP system are included next, together with a

discussion of various design choices and also a description of the dynamically-adaptive GP

algorithm and how this algorithm will fit into IBE.


2. Background

2.1 Financial Prediction

The idea behind financial prediction is to use historical pricing data of the assets traded to

identify unique trends and patterns in the fluctuations of prices [Pantazopoulos et al., 1998].

These patterns and trends are used to predict what the forthcoming price movements will be

and decisions to buy or sell an asset are made on this basis.

Classes of patterns and/or trends are identified using technical indicators, which can be
either quantitative or qualitative. Many technical indicators are based on moving average
computations or on series of local minima and maxima [Lendasse et al., 2001].

2.2 Evolutionary Computing

Evolutionary Computing concerns itself with computer programs that imitate living
organisms undergoing the Darwinian mechanism of natural selection, for the purpose of
optimization, adaptation or search [Koza, 1992], [Spears, 2003]. All evolutionary algorithms
represent a set of possible solutions to a given problem as a population of individuals. The
fitness of each candidate solution is tested, and the best individuals are permitted to survive
and to produce offspring derived from them. In nature, this process is seen to create complex
and highly adapted organisms - optimized solutions to the problem of survival and
reproduction in the natural environment [Xu et al., 2003].

2.3 Genetic Algorithms

Genetic Algorithms (GA) operate on a population of individuals represented by character
strings [Holland, 1975], [Mitchell et al., 1992]. These are evaluated according to a fitness
function appropriate to the problem at hand. Pairs of individuals, selected at random but biased
according to fitness, are recombined to create members of a new population. Starting from an
initial population of randomly generated candidate solutions, successive generations are
produced until some termination criterion is reached: this may be the convergence of the
average and maximum fitness values, or simply a limit on the number of generations.

2.3.1 String representation

Genetic Algorithms represent candidate solutions as strings - finite sequences of characters
from a given alphabet (typically binary or integer numeric). The method of mapping a
candidate solution to a GA string depends on the problem domain: the string may represent,

for example, an ordered sequence of operations, or a set of independent parameters. However,

a particular location in the string sequence always represents the same part or parameter of the

solution.


2.3.2 Crossover and mutation operations

The string representation used in GA is analogous to the structure of biological genetic

material - DNA. In the same way, the method of creating new GA strings mimics the

recombination mechanisms of DNA.

Crossover is the operation of exchanging information, or genetic material, between two
individuals. It works by swapping the values at corresponding locations between pairs of

strings. There are various methods for implementing crossover suited to different applications.

The simplest method is single point crossover: a point is selected at random to divide each

string into two sections, one of which is swapped over. Alternatively, a greater number of

crossover points may be used so that more than one contiguous sub-sequence is exchanged

between parent strings. Another method, uniform crossover, acts on individual locations -

swapping each according to a fixed probability.

Mutation is simply the action of randomly changing the value of individual locations or sub-

strings within a GA sequence. Although crossover is the main factor in the evolutionary

behavior in GA, mutation is important because it is the only way of introducing new genetic

material into the overall population.
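The operators described above can be sketched on binary strings as follows. This is only an illustrative sketch: the function names, the binary alphabet and the default probabilities are assumptions made for the example and are not taken from IBE or from the system described in this report.

```python
import random

def single_point_crossover(a, b, rng):
    """Swap the tails of two equal-length strings at one random cut point."""
    point = rng.randrange(1, len(a))          # cut strictly inside the string
    return a[:point] + b[point:], b[:point] + a[point:]

def uniform_crossover(a, b, rng, p_swap=0.5):
    """Swap each location independently with probability p_swap (assumed value)."""
    child_a, child_b = list(a), list(b)
    for i in range(len(a)):
        if rng.random() < p_swap:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return "".join(child_a), "".join(child_b)

def mutate(s, rng, p_mut=0.05, alphabet="01"):
    """Replace each location by a random symbol with probability p_mut (assumed value)."""
    return "".join(rng.choice(alphabet) if rng.random() < p_mut else c for c in s)

rng = random.Random(0)
c1, c2 = single_point_crossover("00000000", "11111111", rng)
```

Whatever cut point is drawn, `c1` starts with zeros and ends with ones while `c2` is its complement, which shows that crossover only rearranges existing material; new symbols can enter the population only through `mutate`.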

2.3.3 Fitness based selection

As stated previously, individuals are selected for reproduction randomly, but with the
probability of selection weighted according to the measured fitness of the candidate solution.
There are many methods by which fitness based selection can be implemented [Blickle, 1995].
The following three are among the most effective and widely used:

Fitness Proportional Selection (FPS)

The sum total of the fitness values of all population members is calculated and a

random number is selected between zero and this value. Running through all

population members, the fitness values are summed a second time. When the sum

exceeds the randomly generated number, the current population member is returned. If

the total fitness sum is thought of as the circumference of a circle, then each individual

is represented by a sector of the circle equal to its fitness value. If a pointer is placed at

a random position on the wheel, the probability of it falling within any individual

sector is proportional to the fitness of that individual. This is why FPS is also known as

Roulette Wheel Selection. A disadvantage of this method is that a fitness proportional
selection weighting may not always be suitable. It may be desirable to
disproportionately bias selection in favor of individuals whose fitness is only
marginally greater than average, or to have only a small bias towards individuals who
have very high relative measured fitness. Another problem with this method is that it
does not work with negative fitness values unless these are first shifted or rescaled to
be non-negative.

Rank Selection

This method works like FPS, except that the fraction of the roulette wheel assigned to each
individual depends on its rank position rather than on its absolute fitness. The degree of bias
can be controlled by raising the rank position to a chosen power. This is a comparatively
slow method because the population must be sorted according to fitness.
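A minimal sketch of rank selection is given below, assuming that the worst individual receives rank 1 and that the selection weight is the rank raised to a chosen power. The function name and the `power` parameter are hypothetical and serve only to illustrate the idea.

```python
import random

def rank_selection(population, fitness, rng, power=1.0):
    """Pick one individual with probability proportional to rank ** power."""
    # Sort from worst to best so that the worst individual gets rank 1.
    ranked = sorted(population, key=fitness)
    weights = [(rank + 1) ** power for rank in range(len(ranked))]
    # random.choices draws according to the given relative weights.
    return rng.choices(ranked, weights=weights, k=1)[0]

rng = random.Random(0)
chosen = rank_selection([-5.0, 0.0, 3.0], lambda x: x, rng)
```

Because only the ordering is used, negative fitness values pose no difficulty here, in contrast to fitness proportional selection.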

Tournament Selection

A group of individuals is selected from the population at random, and the fittest member
of this group is returned. The degree of selection bias is related to the size of the group,
or tournament - the greater the size, the greater the relative weight of fitter individuals.
With a tournament size of two, the fittest member of the population is twice as likely to
be selected as the median. This method is the most computationally efficient of the three,
as only the individuals selected for the tournament need to be inspected.
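Tournament selection can be sketched in a few lines. The version below assumes that contestants are drawn with replacement; the function name and signature are illustrative only.

```python
import random

def tournament_selection(population, fitness, rng, size=2):
    """Return the fittest of `size` individuals drawn at random with replacement."""
    contestants = [rng.choice(population) for _ in range(size)]
    return max(contestants, key=fitness)

rng = random.Random(0)
winner = tournament_selection(list(range(101)), lambda x: x, rng)
```

Under this assumption, with a tournament of size two and N distinct fitness values, the individual of rank i (counted from the worst) wins with probability (2i - 1) / N^2. For the best individual this is close to 2/N and for the median close to 1/N, which agrees with the factor of two stated above.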

Various other schemes also exist, such as truncation selection, linear ranking selection and
exponential ranking selection, but in general these are outperformed by the schemes described
above.

For a comprehensive review of selection schemes, see [Blickle, 1995].

2.4 Genetic Programming

Genetic Programming (GP) is an extension of GA [Koza, 1992]. GP uses a similar

evolutionary procedure for search and optimization based on selective recombination from a

population of candidate solutions. It differs from GA in the representation of the candidate

solutions.

2.4.1 Tree structure representation

The tree structure is a hierarchical model consisting of a set of interconnected nodes (see
figure 1). Each node can have several connections to nodes at a lower level, but only a single

parent connection.

The name genetic programming refers to the fact that the tree structure is usually used to

represent a function in the style of a computer program syntax tree. The branch nodes

represent functions - they take values passed by their immediate descendants as input

arguments and return an output to their parent. The terminal leaf nodes represent input

arguments or variables. The branching hierarchy denotes the evaluation ordering of functions.

In contrast to genetic algorithms, the tree representation of GP facilitates the creation of

candidate solutions of variable size and complexity - crossover and mutation operations can
alter the size of individual trees. Another important difference is that, unlike GA, there is no

specific mapping of individual parts of the tree to a part of a candidate solution. The GP

function parse tree returns output values from a given set of input variables.
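The parse-tree representation can be illustrated with a small sketch. The `Node` class, its field names and the Boolean function set below are assumptions chosen for the example; they are not the data structures of the GP engine described later in this report.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Node:
    """A GP tree node: a function with children, or a terminal naming an input."""
    label: str
    func: Optional[Callable[..., bool]] = None    # None marks a terminal
    children: List["Node"] = field(default_factory=list)

    def evaluate(self, inputs: Dict[str, bool]) -> bool:
        if self.func is None:                      # terminal: look up the input value
            return inputs[self.label]
        args = [child.evaluate(inputs) for child in self.children]
        return self.func(*args)                    # branch: apply function to child outputs

    def size(self) -> int:
        return 1 + sum(child.size() for child in self.children)

AND = lambda a, b: a and b
OR = lambda a, b: a or b
NOT = lambda a: not a

# (Input 1 OR Input 2) AND (NOT Input 3)
tree = Node("AND", AND, [
    Node("OR", OR, [Node("Input 1"), Node("Input 2")]),
    Node("NOT", NOT, [Node("Input 3")]),
])
result = tree.evaluate({"Input 1": False, "Input 2": True, "Input 3": False})
```

Evaluation proceeds from the leaves upwards, so the branching hierarchy fixes the order in which the functions are applied.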

2.4.2 Operators and terminals

The GP tree structure is constructed from two sets of node types - functions and terminals.

The branch nodes - those which have at least one connection to a child node - are taken from
the function set. This set typically consists of simple logical (AND, OR, etc.), conditional
(IF-THEN-ELSE), arithmetic (+, -, *, /) or comparison (<, >, =) operators. The choice of
function set is a design decision which depends on the problem domain and on the data types
that the GP function should take as input and return as output. The terminal set consists of all
the data input variables which are to be evaluated by the GP function.

Function and terminal sets must be chosen such that they are capable of expressing a solution
to the problem. This means that the designer should have knowledge about the problem

domain - including some idea of the likely form of solutions.

2.4.3 GP Crossover and mutation

Genetic Programming implements crossover and mutation operations equivalent to those used

in GA. To carry out the process of crossover on a pair of trees, a single node is selected at

random from each - these form the crossover points. The sub-trees originating at these nodes

are swapped over, creating two new GP trees (shown in figure 2). If the two sub-trees contain

a different number of nodes then the resulting offspring trees will be of different sizes from the
parents. Crossover is easy to implement in code by swapping over pointers between parent and
child nodes at the selected points.
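Sub-tree crossover can be sketched as follows. For brevity, trees are represented here as nested lists of the form [label, child, ...], which is an assumption of this example rather than the representation used by the GP engine; all function names are hypothetical.

```python
import copy
import random

def subtree_paths(tree, path=()):
    """Yield the index path to every node of a nested-list tree [label, child, ...]."""
    yield path
    for i, child in enumerate(tree[1:], start=1):
        yield from subtree_paths(child, path + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return `new` if path is the root, else graft `new` into `tree` at `path`."""
    if not path:
        return new
    parent = get_subtree(tree, path[:-1])
    parent[path[-1]] = new
    return tree

def count_nodes(tree):
    return 1 + sum(count_nodes(child) for child in tree[1:])

def crossover(parent1, parent2, rng):
    """Swap one randomly chosen sub-tree between copies of the two parents."""
    child1, child2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    path1 = rng.choice(list(subtree_paths(child1)))
    path2 = rng.choice(list(subtree_paths(child2)))
    sub1, sub2 = get_subtree(child1, path1), get_subtree(child2, path2)
    child1 = replace_subtree(child1, path1, sub2)
    child2 = replace_subtree(child2, path2, sub1)
    return child1, child2

parent1 = ["AND", ["OR", ["Input 1"], ["Input 2"]], ["NOT", ["Input 3"]]]
parent2 = ["IF", ["Input 4"], ["Input 1"], ["NOT", ["Input 2"]]]
child1, child2 = crossover(parent1, parent2, random.Random(0))
```

The parents are copied before the swap, so they remain available for further selection. The total number of nodes is preserved, but each child may differ in size from its parents.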

The parent trees are selected using the roulette wheel selection algorithm, in which parents
are chosen according to their fitness: the fitter a chromosome is, the greater its chance of
being selected. Imagine a roulette wheel on which all the chromosomes in the population are
placed. The size of each section of the wheel is proportional to the value of the fitness
function of the corresponding chromosome - the bigger the value, the larger the section.

A marble is thrown onto the roulette wheel and the chromosome on which it stops is selected.
Clearly, chromosomes with larger fitness values will be selected more often.

This process can be described by the following algorithm.

1. Calculate the sum of the fitnesses of all chromosomes in the population - sum S.

2. Generate a random number from the interval (0, S) - r.

3. Go through the population, accumulating the fitnesses from 0 - sum s. When the sum
s is greater than r, stop and return the current chromosome.

Step 1 is, of course, performed only once for each population.
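The three steps above translate almost directly into code. The sketch below is illustrative; the function name is hypothetical, and rejecting negative fitness values is a choice made for this example.

```python
import random

def roulette_wheel_selection(population, fitnesses, rng):
    """Return one individual with probability proportional to its fitness."""
    if any(f < 0 for f in fitnesses):
        raise ValueError("fitness values must be non-negative")
    total = sum(fitnesses)                 # step 1: sum S
    r = rng.uniform(0, total)              # step 2: random number r between 0 and S
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f                       # step 3: partial sum s
        if running > r:
            return individual
    return population[-1]                  # guard for rounding when r equals S

rng = random.Random(0)
selected = roulette_wheel_selection(["a", "b"], [1.0, 3.0], rng)
```

With fitnesses 1 and 3, the second individual occupies three quarters of the wheel and is therefore returned about three times as often as the first.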

Mutation works in a similar way; in the strictest definition a new randomly generated sub-tree

is inserted at a randomly selected node and the displaced section is discarded.

In our application, mutation can be any one of the following operations:

1. Removing a randomly selected node from the tree (Deletion operation).

2. Adding a node at a randomly selected point in the tree without removing any portion
of the original tree (Addition operation).

3. The classical definition of mutation: removing a section of the tree and replacing it with
a randomly generated sub-tree (Replacement operation).
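One possible reading of the three operations is sketched below on nested-list trees of the form [label, child, ...]. Several details are assumptions made for the example and are not specified in this report: deletion is interpreted as replacing a function node by one of its own children, addition as wrapping the selected sub-tree in a NOT node, and the function and terminal sets are hypothetical.

```python
import copy
import random

FUNCTIONS = {"AND": 2, "OR": 2, "NOT": 1}      # hypothetical function set: name -> arity
TERMINALS = ["Input 1", "Input 2", "Input 3", "Input 4"]

def random_tree(rng, depth=2):
    if depth == 0 or rng.random() < 0.3:
        return [rng.choice(TERMINALS)]
    name = rng.choice(sorted(FUNCTIONS))
    return [name] + [random_tree(rng, depth - 1) for _ in range(FUNCTIONS[name])]

def paths(tree, path=()):
    yield path
    for i, child in enumerate(tree[1:], start=1):
        yield from paths(child, path + (i,))

def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def graft(tree, path, new):
    if not path:
        return new
    subtree(tree, path[:-1])[path[-1]] = new
    return tree

def is_valid(tree):
    label, children = tree[0], tree[1:]
    if label in FUNCTIONS:
        return len(children) == FUNCTIONS[label] and all(is_valid(c) for c in children)
    return label in TERMINALS and not children

def mutate(tree, rng):
    """Apply one randomly chosen operation to a copy of the tree."""
    tree = copy.deepcopy(tree)
    path = rng.choice(list(paths(tree)))
    target = subtree(tree, path)
    op = rng.choice(["delete", "add", "replace"])
    if op == "delete" and len(target) > 1:
        new = rng.choice(target[1:])           # deletion: hoist one child of the node
    elif op == "add":
        new = ["NOT", target]                  # addition: nothing is removed
    else:
        new = random_tree(rng)                 # replacement (also used for terminals)
    return graft(tree, path, new)

mutant = mutate(["AND", ["Input 1"], ["NOT", ["Input 2"]]], random.Random(0))
```

Each operation returns a syntactically valid tree, which can be checked with `is_valid`.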


Because crossover can exchange sub-trees between different locations, unlike in GA, there is

less need for mutation in creating and maintaining diversity in the population of candidate

solutions. Therefore the mutation operator is sometimes left out of GP algorithms if the

population is made large enough to ensure sufficient initial diversity of available building

blocks [Mitchell, 1998].

Figure 1: GP parse-tree representation of two functions taking four separate input parameters. [Figure not reproduced: two parse trees, Parent 1 and Parent 2, built from AND, OR, NOT and IF nodes over Input 1 to Input 4, each with a marked sub-tree (Subtree 1, Subtree 2) and a return value at the root.]

Figure 2: Result of GP crossover. [Figure not reproduced: the two offspring trees, Child 1 and Child 2.]

3. Internet Bourse Experts System

One of the goals of this project is to analyze an existing system, namely the Internet Bourse
Experts system, and to attempt to improve on its existing experts generator, which uses GA,
by replacing it with a GP based system.

3.1 Introduction

Internet Bourse Experts (IBE) is an on-line multi-agent system, based on a client-server
architecture, which analyzes financial data and is able to generate stock trading expertise. In
this context, the expertise is composed of trading rules encoded as GA strings [Korczak,
Kustner, 2001], [Korczak, Lipinski, 2004], [Lipinski, 2003].

Given a library of trading rules, the objective is to find the best-performing collection of
trading rules and to judge its efficiency without giving much priority to economic relevance.
IBE uses genetic algorithms, which employ the "survival of the fittest" principle, to create AI
based experts which base their decisions on a subset of the trading rules. The result is not
necessarily a global optimum, but the most effective expert under the circumstances. The
fitness function used to evaluate experts in the population is explicitly tailored to stock
trading. The evolutionary approach presented here, whereby knowledge-based trading systems
are built, is evaluated on real financial time series.


There are a large number of trading rules based on technical analysis indicators. Using these

rules, financial experts and market traders make decisions on the stock market: to buy, sell, or

defer action and do nothing.

For more details refer to [Lipinski, 2003], [Korczak, 2001], [Korczak, 2004].

3.2 Conceptual Flow

Within the system, a certain number of intelligent agents exist [Zitvogel, 2003]. These agents
specialize in, and represent, different methods of analyzing and processing the data as well as
heterogeneous events. Each agent is autonomous, using its own methods to analyze the market
and concentrating on its own objectives.

The starting point of the system (see figure 3) is the stock market data, from which all events
are derived. This data is preprocessed by the database agents and then stored in the database.

Preprocessing consists of grouping the data and calculating an average. This is necessary
because of the large volume of data that arrives from the stock market every second: storing
all of it is neither useful nor feasible. The data is therefore preprocessed and stored in the
database by intelligent agents, while the stock market is continuously analyzed by other
intelligent agents, known as market watch agents, which try to detect important events as early
as possible in order to keep the system on track. As a consequence, the system can adapt easily
to a new situation.

When the preprocessed data arrives in the financial database, it is treated by two classes
of agents. The first class focuses on a global analysis of the market, such as the analysis of
volatility, and the second concentrates on the analysis of individual stocks. The second class
thus forms the specialist experts which are used to define the state of the quotations of a
particular stock. In certain cases, the agents require supplementary knowledge, which is
stored in an experts database managed by the experts observation agents.

The expert generator uses genetic algorithms to find the best composition of rules. Each
composition forms an expert in competition with the others.

While these agents process numerical data, there are also agents based on textual analysis.
Certain agents can monitor the news flashes which relate to a particular stock. After the text
analysis phase, they generate an additional signal for other agents, telling them to change
certain of their parameters.

Finally, the output of each agent is captured by the visualization agent and presented to the
user.


Fig. 3: Agents of IBE

3.3 GP Engine in IBE

The idea is to replace the GA based engine in the Experts Generator module with our GP
engine. As stated in the previous section, the agents can be divided into two classes: the first
class deals with a global analysis of the market and the second class is concerned with
individual stocks.

Each tree will generate trading signals with respect to the current point in time of the stock
being monitored. During the system's migration phase, when the GA engine in IBE is
replaced with the GP engine, the first class of agents will remain unchanged; it is the second
class of agents which will undergo a change. As the Experts Generator module receives data, it
will use the genetic program to generate trees of trading rules with certain predefined
parameters. As with the genetic algorithm, each tree will be a composition of trading rules.

4. System Design and Implementation

4.1 Problem Definition

The objective is to create a system which uses a GP based method to optimize a set of existing
technical trading rules on historical quotation data. This system must be able to evolve
optimized candidate solutions and must implement an appropriate dynamically adaptive GP
learning algorithm; that is, it must continually evolve and adapt to the changing dynamics of
the stock market. It must be able to produce trading expertise, promoting the fittest experts
and rejecting the weaker ones.

It must also be borne in mind that what is fit now might be weak at the next moment. This
calls for continuous fitness evaluation.

[Fig. 3 components: Live Stock Market Data; Stock Watch Agents; Database Agents; Volatility Agents; Action Analysis Agents; Text Analysis Agents; Experts Generator; Experts Observation Agents; Visualization Agents; Security Agents; Financial Database; Experts Database; Users Database; Users]
4.2 Implementation details

4.2.1 Technical Indicator set

Price Channel Breakout (PCB)

If the current price exceeds the maximum over the previous n time units, BUY; if it
falls below the minimum over this period, SELL; otherwise HOLD.

If $Price_{current} > \max_{x=current-n}^{current-1} Price_x$ Return BUY
Else If $Price_{current} < \min_{x=current-n}^{current-1} Price_x$ Return SELL
Else Return HOLD    [4.2]
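A minimal sketch of the PCB rule is shown below. The function name is hypothetical, and returning HOLD when fewer than n previous prices are available is an assumption made for the example.

```python
def price_channel_breakout(prices, n):
    """Signal for the last price, compared with the previous n prices."""
    if len(prices) < n + 1:
        return "HOLD"                       # not enough history yet (assumption)
    current, window = prices[-1], prices[-n - 1:-1]
    if current > max(window):
        return "BUY"
    if current < min(window):
        return "SELL"
    return "HOLD"

signal = price_channel_breakout([1, 2, 3, 2, 5], n=3)
```

In this example the last price, 5, exceeds the maximum of the three preceding prices (2, 3, 2), so the rule signals BUY.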

Simple Moving Average Crossover (SMAC)

If a short term (5-day) moving average crosses above a long term (50-day)
moving average then BUY; if the short term average crosses below then SELL.

If $MovingAverage_{ShortTerm} > MovingAverage_{LongTerm}$ Return BUY
Else If $MovingAverage_{ShortTerm} < MovingAverage_{LongTerm}$ Return SELL
Else Return HOLD    [4.3]
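The rule can be sketched as follows, using the 5-day and 50-day windows given above. Like the formula, the sketch compares the current levels of the two averages rather than detecting the moment of crossing; the function names and the HOLD result for short histories are assumptions.

```python
def simple_moving_average(prices, window):
    """Mean of the last `window` prices."""
    return sum(prices[-window:]) / window

def smac_signal(prices, short=5, long=50):
    if len(prices) < long:
        return "HOLD"                       # not enough history yet (assumption)
    short_avg = simple_moving_average(prices, short)
    long_avg = simple_moving_average(prices, long)
    if short_avg > long_avg:
        return "BUY"
    if short_avg < long_avg:
        return "SELL"
    return "HOLD"

signal = smac_signal(list(range(1, 61)))    # steadily rising prices
```

For a steadily rising series the short term average lies above the long term average, so the signal is BUY.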

Moving Average Convergence Divergence (MACD)

The MACD is the difference between the short term and long term Exponential
Moving Average (EMA) values of the price. If the MACD crosses above its own EMA value,
return a BUY indicator; SELL if it crosses below.

$MovingAverageCD = ExponentialMovingAverage_{ShortTerm} - ExponentialMovingAverage_{LongTerm}$

If $MovingAverageCD > ExponentialMovingAverage_{Current}$ Return BUY
Else If $MovingAverageCD < ExponentialMovingAverage_{Current}$ Return SELL
Else Return HOLD    [4.4]
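The MACD rule can be sketched as below. The periods 12, 26 and 9 are the conventional defaults and are assumptions here, since the report does not state the values used; the smoothing factor 2 / (span + 1) and the seeding of the EMA with the first value are likewise assumptions of this example.

```python
def ema(values, span):
    """Exponential moving average series, seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd_signal(prices, short=12, long=26, signal=9):
    short_ema, long_ema = ema(prices, short), ema(prices, long)
    macd = [s - l for s, l in zip(short_ema, long_ema)]
    signal_line = ema(macd, signal)         # the MACD's own EMA
    if macd[-1] > signal_line[-1]:
        return "BUY"
    if macd[-1] < signal_line[-1]:
        return "SELL"
    return "HOLD"

decision = macd_signal([float(p) for p in range(1, 61)])
```

As with the formula, the sketch compares the latest MACD value with the latest value of its EMA.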

Relative Strength Index (RSI)

The RSI compares the magnitude of a stock's recent gains to the magnitude of its
recent losses and turns that information into a number that ranges from 0 to 100. It
takes a single parameter, n, the number of time periods to use in the calculation.

$AverageGain = TotalGains / n$
$AverageLoss = TotalLoss / n$
$FirstRelativeStrength = AverageGain / AverageLoss$

For Count = 2 to n:
$SmoothedRelativeStrength_n = \frac{[AverageGain_{count-1} \times (count-1) + Gain_{count}] / count}{[AverageLoss_{count-1} \times (count-1) + Loss_{count}] / count}$

$RelativeStrengthIndex = 100 - \frac{100}{1 + RelativeStrength}$

If $RelativeStrengthIndex > 70$ then BUY
Else If $RelativeStrengthIndex < 30$ then SELL
Else HOLD    [4.5]
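The calculation can be sketched as follows. For brevity this example uses only the simple averages (the first relative strength) and omits the smoothing loop; the default n = 14 is a conventional value and an assumption, as are the function names and the treatment of periods with no losses. The thresholds follow the rule as stated in this report.

```python
def rsi(prices, n=14):
    """RSI over the last n price changes, using simple average gain and loss."""
    changes = [b - a for a, b in zip(prices[-n - 1:-1], prices[-n:])]
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = sum(-c for c in changes if c < 0) / n
    if avg_loss == 0:
        # No losses: relative strength is unbounded; flat prices are treated as neutral.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

def rsi_signal(prices, n=14):
    value = rsi(prices, n)
    if value > 70:
        return "BUY"
    if value < 30:
        return "SELL"
    return "HOLD"

value = rsi([10, 13, 12], n=2)   # gains 3, losses 1: RS = 3, RSI = 75
```

The function expects at least n + 1 prices, since n price changes are needed.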

K-Stochastic

A technical momentum indicator that compares a security's closing price to its price
range over a given time period. The oscillator's sensitivity to market movements can be
reduced by adjusting the time period or by taking a moving average of the result. It
takes a single parameter, n, the number of time periods to use in the calculation.

$\%K = 100 \times \frac{Price_{current} - LowestPrice_n}{HighestPrice_n - LowestPrice_n}$
%D = 3-period moving average of %K

If %K > %D then BUY
Else If %K < %D then SELL
Else HOLD    [4.6]
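A sketch of the %K and %D computation is given below. It works on a single price series, as in the formula; the default n = 14, the neutral value returned for a flat window and the function names are assumptions made for the example.

```python
def percent_k(prices, n):
    """%K of the last price relative to the range of the last n prices."""
    window = prices[-n:]
    low, high = min(window), max(window)
    if high == low:
        return 50.0                          # flat window: neutral value (assumption)
    return 100.0 * (prices[-1] - low) / (high - low)

def stochastic_signal(prices, n=14, d_period=3):
    # %K at each of the last d_period time steps; %D is their average.
    ends = range(len(prices) - d_period + 1, len(prices) + 1)
    ks = [percent_k(prices[:end], n) for end in ends]
    k, d = ks[-1], sum(ks) / d_period
    if k > d:
        return "BUY"
    if k < d:
        return "SELL"
    return "HOLD"

signal = stochastic_signal([3, 2, 1, 2, 4], n=3)
```

In this example the last three %K values are 0, 100 and 100, so %D is about 66.7 and the current %K of 100 lies above it, giving BUY. At least n + d_period - 1 prices are required.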

1-Day Price Change

The 1-Day Price Change indicator gives a BUY signal if the price has risen from the
previous day's value and SELL if it has dropped. This is a naive trading strategy that is
used to benchmark the performance of the GP-evolved trading rules.

If $Price_{current} > Price_{current-1}$ then BUY
Else If $Price_{current} < Price_{current-1}$ then SELL
Else HOLD    [4.7]

4.2.2 Fitness Measurement

The fitness of individual technical trading rules is measured directly from the returns

generated by simulated trading using those rules [Altenberg, 1993].


A set of ratios for measuring the performance of a stock has been observed and
analyzed [Lipinski, 2003]. These ratios, while not very useful on their own, provide
valuable insight into the dynamics of a stock price when used in conjunction with each other.
They include:

Sharpe Ratio

$Sharpe\ Ratio = \frac{r_p - r_f}{\sigma_p}$    [4.8]

where:
$r_p$ is the expected portfolio return
$r_f$ is the risk-free rate
$\sigma_p$ is the portfolio standard deviation

Source: [Sharpe, 1996]

The Sharpe ratio measures risk adjusted performance. On an international front, current

Sharpe ratios range from 1.7 to 2.5 with the average being 0.9, the ratio of choice by

modern standards being above 1.0 [Domash, 2006].

The larger the Sharpe ratio the better (the more consistent the results). The ratio will be

negative if the average return is less than the risk-free return. Some systems exhibit a

Sharpe ratio of 0.5 or more, and ratios above 1.0 are sometimes seen. For a long-term

system, open profit should be included in each month's profit and loss data in order for

the Sharpe ratio to be meaningful. If there are fewer than 12 months of data, we do not

calculate the Sharpe ratio, because such a small number of data points might not be

statistically significant and could give misleading results.

The one-year (short-term) Sharpe ratio provides an indication of how well a system has

performed in the most recent 12 months. The calculation uses the average monthly

profit/loss in excess of the risk-free return for the most recent 12 months, divided by

the standard deviation of monthly profits and losses over the same period.
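The calculation described above can be sketched as follows. The use of the sample standard deviation and the function signature are assumptions made for the example; the minimum of 12 observations follows the rule stated above.

```python
import statistics

def sharpe_ratio(returns, risk_free=0.0, min_periods=12):
    """(mean return - risk-free rate) / standard deviation of the returns."""
    if len(returns) < min_periods:
        return None                          # too few data points to be meaningful
    excess = statistics.mean(returns) - risk_free
    return excess / statistics.stdev(returns)

monthly = [0.02, 0.01, 0.03] * 4             # twelve hypothetical monthly returns
ratio = sharpe_ratio(monthly, risk_free=0.01)
```

The ratio is negative whenever the average return is below the risk-free return, as noted above.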

Sortino ratio

$Sortino\ Ratio = \frac{\langle R \rangle - R_f}{\sigma_d}$    [4.9]

where:
$\langle R \rangle$ is the expected return
$R_f$ is the risk-free rate of return
$\sigma_d$ is the standard deviation of negative asset returns

Source: [Sortino, 1994]

The larger the Sortino ratio the better. The Sortino ratio will be larger if the profit is

high, and if the disappointments are small. For a given average disappointment, the

Sortino ratio would be better if there were many small disappointments, rather than a

few large disappointments (see the examples below). The ratio will be negative if the
average return is below the risk-free return. Some systems exhibit a Sortino ratio of 1
or more, and ratios above 2 may be seen.

If there are fewer than 24 months of data, we do not calculate the Sortino ratio, because

we feel that there may not be enough "disappointment" data to be statistically

meaningful.

When "average" and "standard deviation" of the disappointments are mentioned, the

calculations include the zero values. For example, for disappointments of 1.5, 1.5, 0, 0,

0, 0 the average is 0.5 and the standard deviation is 0.8; for disappointments of 1, 1, 1,

0, 0, 0 the average is again 0.5 but the standard deviation is 0.5, which is significantly

smaller than in the first example.

The one-year (short-term) Sortino ratio is calculated to provide an indication of how
well a system has performed in the most recent 12 months.
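The worked figures for the disappointments can be reproduced with the sketch below, which assumes the sample standard deviation with zero values included. The function names are hypothetical, the 24-month minimum is not enforced here, and other formulations of the downside deviation exist, so this should be read as one possible interpretation of the description above.

```python
import statistics

def disappointments(returns, target=0.0):
    """Shortfall below the target for each period; zero when the target is met."""
    return [max(0.0, target - r) for r in returns]

def sortino_ratio(returns, risk_free=0.0):
    """(mean return - risk-free rate) / standard deviation of the disappointments."""
    downside = statistics.stdev(disappointments(returns, risk_free))
    if downside == 0:
        return float("inf")                  # no disappointments at all
    return (statistics.mean(returns) - risk_free) / downside

few_large = [1.5, 1.5, 0, 0, 0, 0]
many_small = [1, 1, 1, 0, 0, 0]
spread_few_large = statistics.stdev(few_large)     # about 0.77
spread_many_small = statistics.stdev(many_small)   # about 0.55
```

Both series have an average disappointment of 0.5, but the series with many small disappointments has the smaller standard deviation and would therefore yield the larger Sortino ratio for the same excess return.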

A seemingly obvious method to evaluate the fitness of a trading rule is to see whether it
generates any profits. Overheads such as transaction costs have to be taken into account. The

idea that the trading rule might perform better under all conditions and time periods except for

the current one has to be taken into account as well. This would imply giving individual

trading rules a second chance.

Another means of judging fitness is to compare the results of trades made by individual
trading rules with the results of the BUY and HOLD strategy. Many authors have disputed the
effectiveness of this strategy: in some of the literature it is considered a poor choice for
short term investments, although it provides steady performance for long term portfolios. For
a comparative study, however, it does provide an indication [Koza et al., 1996].

Initially both of these methods will be used and, after due experimentation, the decision to
deploy one or both of them will be made.
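Both fitness measures can be illustrated with a simple trading simulation. Everything in this sketch is an assumption made for the example: the all-in or all-out trading policy, the starting capital, the proportional transaction cost of 0.1% and the function names are not taken from the implemented system.

```python
def simulate_trading(prices, signals, capital=1000.0, cost_rate=0.001):
    """Final capital after following BUY/SELL/HOLD signals, fully invested or in cash."""
    cash, shares = capital, 0.0
    for price, signal in zip(prices, signals):
        if signal == "BUY" and cash > 0:
            shares = cash * (1 - cost_rate) / price     # pay the cost on purchase
            cash = 0.0
        elif signal == "SELL" and shares > 0:
            cash = shares * price * (1 - cost_rate)     # pay the cost on sale
            shares = 0.0
    return cash + shares * prices[-1]                   # value any open position

def buy_and_hold(prices, capital=1000.0, cost_rate=0.001):
    """Final capital when buying at the first price and holding to the end."""
    return capital * (1 - cost_rate) / prices[0] * prices[-1]

def fitness(prices, signals, capital=1000.0, cost_rate=0.001):
    """Excess final capital of the rule over the buy-and-hold benchmark."""
    return (simulate_trading(prices, signals, capital, cost_rate)
            - buy_and_hold(prices, capital, cost_rate))

score = fitness([10, 20, 10], ["BUY", "SELL", "HOLD"], cost_rate=0.0)
```

With no transaction costs, buying at 10 and selling at 20 doubles the capital to 2000, whereas buy-and-hold ends at 1000, so the excess is 1000. A positive `cost_rate` reduces the result of every completed trade.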

4.2.3 Description of GP Engine

The GP algorithm is detailed in this section. The flow chart in figure 5 shows the
components of this algorithm; the functionality of each module has been divided
according to the level of the GP hierarchy with which it is concerned. Furthermore,
before the actual algorithm is presented, the notation and terminology used
within it are explained so as to facilitate understanding of the algorithm.


Fig. 5: Flow Chart of GP Algorithm

Figure 5 shows a detailed flowchart representation of the algorithm.

Specification, Notation & Terminology

Each function of the algorithm is represented by a letter (A, B, C, etc.) and a name
detailing its functionality. Any function can be called from another function, and
arguments are given in italics.


An in-depth analysis of the algorithm parameters and their significance is provided in
Appendix A.

Object_Offset / Object_Count: An integer relating to an object such as a population, generation or expert; it identifies the count or the offset of the object being worked upon.

Node_Library: A text file containing nodes for the GP tree. Nodes are randomly selected from this library to create GP trees.

T, T_test / T_train: T is an integer representing the total time for a stock as a number of "ticks". When subscripted with test or train, it specifies the tick at which to begin testing or training.

N_TRE / N_GEN: The integer N defines the maximum number of trees or generations that can be created at any instant.

Object_{Object_Count}: A direct representation of the object at an instance of Object_Count.

C_test / C_train (time): The capital, or performance measure. There are separate capital values for the testing period and the training period, represented as C_test^net(T_test - 1 + Population_Offset) and the corresponding training-period value.

Operation_Limit: An integer defining the limit for any operation, starting at 0.

P_X: The percentage for any context X. X can be elitism, crossover, mutation, the percentage of a generation to be carried over, or the percentage of the stock quotations to be used for testing.

Rand: Any random variable.

R_dec: The boundary limit.

B_time: B identifies the stock; when subscripted with a time value, it denotes the value of the stock at that time.

A_lower,time / A_upper,time: Generated by the moving average parameter. At any given time, these are the lower and upper limits of the moving average boundary.

Algorithm

Initialization:

1. Initialize parameters
2. Initialize Population_Offset = 0
3. Retrieve Node_Library
4. Set Population_{Population_Offset} = Population Creation from Node_Library
5. Train, Test and Evolve Population_{Population_Offset}

Train, Test and Evolve Population

1. While T_test + Population_Offset < T
   i. Train, Test and Evolve Generation
   ii. Increment Population_Offset by 1
   iii. Set Population_{Population_Offset} = Population Creation from Population_{Population_Offset-1}

Train, Test and Evolve Generation

1. While Generation_Count



   iv. Mutation_Limit = P_mut * N_TRE / 100

4. For Count = 0 to Elitism_Limit
   i. Generation_{Expert_Count} = Previous_Generation_{Count}
   ii. Increment Expert_Count

5. For Count = 0 to Crossover_Count
   i. Initialize Parent_Expert_1 = Roulette Wheel Selection of Expert from Previous_Generation at Population_Offset
   ii. Initialize Parent_Expert_2 = Roulette Wheel Selection of Expert from Previous_Generation at Population_Offset
   iii. Generation_{Expert_Count}, Generation_{Expert_Count+1} = Expert Creation From Crossover of Parent_Expert_1 and Parent_Expert_2
   iv. Increment Expert_Count by 2

6. For Count = 0 to Mutation_Count
   i. Generation_{Expert_Count} = Expert Creation From Mutation of Generation_{Count}
   ii. Increment Expert_Count

Roulette Wheel Selection of Expert from Generation at Population_Offset

1. Initialize Performance_Sum = Sum of C^{test}_{net}(T_test-1+Population_Offset) of each Expert in Generation
2. Initialize Performance_Ratio = 0
3. Initialize Count = 0
4. Select a random double value Rand between 0 and 1.
5. While Performance_Ratio
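The last step of the selection procedure did not survive in this copy of the report, so the following is only a minimal Python sketch of standard fitness-proportionate (roulette wheel) selection, consistent with the surviving steps: a performance sum, a running performance ratio and a random value between 0 and 1. The function name `roulette_wheel_select`, the plain list of performances and the error handling are assumptions made for illustration, not the report's implementation.

```python
import random
from typing import Optional, Sequence


def roulette_wheel_select(performances: Sequence[float],
                          rng: Optional[random.Random] = None) -> int:
    """Return the index of the selected expert.

    Each expert is chosen with probability proportional to its performance
    (its net worth in the report's notation). Performances are assumed to be
    non-negative with a positive sum.
    """
    rng = rng or random.Random()
    performance_sum = sum(performances)
    if performance_sum <= 0:
        raise ValueError("performances must sum to a positive value")

    rand = rng.random()          # random double in [0, 1)
    performance_ratio = 0.0      # running cumulative share of the total
    for count, performance in enumerate(performances):
        performance_ratio += performance / performance_sum
        if rand < performance_ratio:
            return count
    # Guard against floating point round-off in the cumulative sum.
    return len(performances) - 1
```

Calling the function twice yields the two parents used by the crossover step; an expert with three times the net worth of another is, on average, selected three times as often.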


Tree Creation From Node_Library:

1. Initialize an empty Tree

2. While Tree_Depth

4.2.4 Conclusion

The problem, to rapidly analyze stock market price data for a given stock and give a BUY, SELL or HOLD decision, has been detailed. A solution has been proposed: a genetic programming based algorithm which incorporates certain financial technical indicator functions. The technical indicator functions have also been detailed and explained along with their context. A flow chart describing the flow of the modules of the GP algorithm was presented to give a clearer view of its structure. Finally, the algorithm itself was presented, with a technical specification.

At this stage, we are ready to run experiments by assigning parameter values and to draw conclusions from these experiments.

5. Experimentation

5.1 Experimental Aims and Objectives

The primary objective of the experimental work was to demonstrate the effectiveness

(or otherwise) of this system and of the general concept - GP optimization of TI based

trading rules - in making profitable forecasts of stock price movements.

It is necessary to demonstrate that the GP algorithm learns rules which have some predictive power beyond the training period, as opposed to merely learning the behavior of the training data. The stock data used for the experiments has variable tick rates and is sufficiently unpredictable to serve this goal. A further objective is to establish, across a range of parameter values, whether any profit is achieved, and to discern relationships between parameters and performance values.

5.2 Trading Procedure

A selection of the parameters of the GP algorithm is assigned a range of values, and the GP algorithm is then applied to the experimental input data. The American trading strategy is used: at the beginning of the experiment on each set of data, the initial number of stocks in hand is zero, and towards the end of the data a BUY decision is forced.

A trial is run with the first value of each parameter, and the performance is measured and noted. The value of one parameter is then set to the next value in its range and the process is repeated.

The time taken for the whole process, divided by the number of populations, is used as one measure of performance. This ratio has two advantages. First, it gives a reasonable estimate of how much time is needed to compute a decision for one quotation. Second, because the size of each set of quotations is different and the percentage of data used for testing changes the number of populations, raw run times would be biased; the ratio removes this bias.

The second measure of performance is the net profit at the end, that is, the difference between the net worth at the end and the initial capital.
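The two performance measures can be written down in a few lines. The sketch below is illustrative only: the function names and the sample figures (a final net worth of 101,500 and a run of 120 seconds over 40 populations) are hypothetical, while the initial capital of 100,000 and the profit ratio (profit divided by initial capital, used in Section 5.5) come from the report.

```python
def net_profit(initial_capital: float, final_net_worth: float) -> float:
    """Net profit: net worth at the end minus the initial capital."""
    return final_net_worth - initial_capital


def profit_ratio(initial_capital: float, final_net_worth: float) -> float:
    """Profit expressed as a fraction of the initial capital invested."""
    return net_profit(initial_capital, final_net_worth) / initial_capital


def time_per_population(total_seconds: float, n_populations: int) -> float:
    """Run time normalised by the number of populations that were evolved."""
    if n_populations <= 0:
        raise ValueError("n_populations must be positive")
    return total_seconds / n_populations


if __name__ == "__main__":
    # Hypothetical run: values chosen only to illustrate the arithmetic.
    print(net_profit(100_000.0, 101_500.0))
    print(profit_ratio(100_000.0, 101_500.0))
    print(time_per_population(120.0, 40))
```

Because one population is evolved per tested quotation, the time per population is comparable across data sets of different sizes and across different testing percentages.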


5.3 Experimental Input Data

The experiments used stock market price data for simulated evaluation of trading

strategies.

Experiments were conducted using share price data for AXA, Peugeot S.A. and STMicroelectronics N.V., traded on the Paris Stock Exchange (Bourse de Paris). Data for all stocks covers the same 6-day period, from 29th May 2006 to 3rd June 2006. The price values were plotted from a spreadsheet and visually inspected for anomalous values, such as negative volume values at the start of the trading day, before being used as input for the GP system as CSV files. Note that each graph contains periods represented by sloping lines. These correspond to periods of inactivity in the stock market, between the close of the market for the day and its opening the next morning.

Fig. 6: AXA Data

Fig. 7: Peugeot Data


Fig. 8: ST Microelectronics Data

5.4 Parameters

The parameters used in the GP algorithm are now explained, with their values or ranges of values and the reasons for selecting them.

Initial Capital C_0, Commission P_com, Initial Number of Stocks S_t, Decision Boundary R_dec and Trading Strategy

Initial Capital and Commission are fixed at 100,000 and 0.2% respectively.

Since the American trading strategy is used, the Initial Number of Stocks is zero throughout.

Decision Boundary is set at 0.2%, as a lower value would allow too many decisions to be taken, thus greatly increasing the commission paid. A higher value would filter too much, allowing too few decisions to be made.

Moving Average Range R_MA

The Moving Average Range determines how narrowly decisions are filtered according to price fluctuations. A low value allows decisions to be made on smaller shifts, whereas a high value is less sensitive. The values selected are 10, 15 and 30.

Buy Sell Percentage

This value is fixed at 50 to allow the effects of decisions to be more apparent.

Number of Generations N_GEN and Number of Trees in each Generation N_TRE

The number of generations and the number of trees affect both the performance and the time taken. Low values take less processing time but sacrifice performance, and vice versa. The values for Number of Generations are 5, 10 and 20, and for Number of Trees in each Generation 100, 200 and 300.

Maximum Tree Depth N_DEP

This parameter is fixed at 20, as too deep a tree would needlessly increase processing time and resources.

Percentage of Previous Population Carried Forward P_carry

This parameter is fixed at 50, as this provides an equal mix of expertise from the old population and new expertise from randomly created nodes.

Elitism P_elite

The value for elitism is fixed at 2, as too high a value would lead to convergence.

Crossover Probability P_cross and Mutation Probability P_mut

These parameters affect how much expertise is exchanged and how much is genetically modified between generations. Values are set between 80 and 90 for crossover and between 10 and 20 for mutation.

Replacement, Addition and Deletion Probability in Mutation P^{rep}_{mut}, P^{add}_{mut}, P^{del}_{mut}

These are fixed at 33.33 each.

Training Start Quotation Limit T_train

The training start time is fixed at 30, as a lower value would limit the effectiveness of some of the technical indicators.

Percentage of Quotations for Testing P_test

This parameter defines how much of the data is used for training and how much for testing. Its values are 60, 75 and 90.

Refer to figure 9 for a summary of parameter values and ranges.

Parameter        Value(s)
C_0              100,000
P_com            0.2
S_t              0
R_dec            0.2
R_MA             10, 15, 30
N_GEN            5, 10, 20
N_TRE            100, 200, 300
N_DEP            20
P_carry          50
P_elite          2
P_cross          80, 85, 90
P_mut            20, 15, 10
P^{rep}_{mut}    33.33
P^{add}_{mut}    33.33
P^{del}_{mut}    33.33
T_train          30
P_test           60, 75, 90

Fig. 9: Summary of Parameters
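One possible reading of the trial procedure of Section 5.2, applied to the values summarised in Fig. 9, is a one-parameter-at-a-time sweep: a baseline trial with the first value of every parameter, then one trial per remaining value of each varied parameter. The Python sketch below enumerates such a sweep. The pairing of crossover and mutation values (80/20, 85/15, 90/10) is an assumption read off the rows of the table, and the names `RANGES`, `PAIRED_CROSS_MUT` and `one_at_a_time_trials` are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Varied parameters and their values as listed in Fig. 9.
RANGES: Dict[str, List[int]] = {
    "R_MA": [10, 15, 30],
    "N_GEN": [5, 10, 20],
    "N_TRE": [100, 200, 300],
    "P_test": [60, 75, 90],
}

# Assumption: crossover and mutation percentages are varied together.
PAIRED_CROSS_MUT: List[Tuple[int, int]] = [(80, 20), (85, 15), (90, 10)]


def one_at_a_time_trials() -> Iterator[Dict[str, int]]:
    """Yield the baseline trial, then trials changing one parameter each."""
    baseline = {name: values[0] for name, values in RANGES.items()}
    baseline["P_cross"], baseline["P_mut"] = PAIRED_CROSS_MUT[0]
    yield dict(baseline)

    for name, values in RANGES.items():
        for value in values[1:]:
            trial = dict(baseline)
            trial[name] = value
            yield trial

    for p_cross, p_mut in PAIRED_CROSS_MUT[1:]:
        trial = dict(baseline)
        trial["P_cross"], trial["P_mut"] = p_cross, p_mut
        yield trial


if __name__ == "__main__":
    for settings in one_at_a_time_trials():
        print(settings)
```

Under these assumptions the sweep produces 11 trials per stock. A full grid over the same values would instead produce 3^5 = 243 trials, so the choice between the two readings has a large effect on run time.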


5.5 Results

Representation of Results

Performance is measured as the ratio of the profit gained to the initial capital invested.

The results are shown on scatter charts with the parameter values on the X-axis and the profit ratio on the Y-axis. Charts of this type help in determining the tendencies of profit ratios with respect to a given parameter. They are also helpful in spotting anomalies.

Buy & Hold as a Performance Indicator

The application also performs a Buy & Hold run on each stock before the trials are run.

Stocks are bought at the beginning of each business day. The amount of stock bought is determined by the Buy/Sell percentage, which is fixed during parameterization.

The profit gained during each such run is also marked on the scatter chart.

Variation of Profit

Figure 10 shows the net profit as a function of the moving average range. Profit tends to be higher at a moving average range of 30. A slight discrepancy is noticed for the AXA stock, which shows higher values of profit at a moving average range of 10. This anomaly appears isolated and can be discounted as a random occurrence.

Fig 10. Scatter charts of net profit as a function of Moving Average Range

Figure 11 shows the net profit as a function of the number of generations per population. Profit tends to be higher at the value of 10. A discrepancy is noticed for the AXA stock, which shows a slightly higher value of profit at 20.

[Fig 10 panels: AXA, Peugeot and STM Profit Ratio vs. RMA. X-axis: RMA; Y-axis: Profit Ratio; series: Profit Ratio, Buy & Hold Ratio.]

Fig 11. Scatter charts of net profit as a function of Number of Generations

Figure 12 shows the net profit as a function of the number of trees per generation. Profit tends to be higher at the value of 200.

Fig 12. Scatter charts of net profit as a function of Number of Trees

Figure 13 shows the net profit as a function of the crossover percentage; there is a trend towards high profit ratios at the 80 percent mark.

Fig 13. Scatter charts of net profit as a function of Crossover percentage

Figure 14 shows the net profit as a function of the testing percentage. Profit tends to be higher at the 75 percent mark. A marked discrepancy can be seen for STMicroelectronics, which shows higher profit values at 90.

[Fig 11 panels: AXA, Peugeot and STM Profit Ratio vs. NGEN. X-axis: Ngen; Y-axis: Profit Ratio; series: Profit Ratio, Buy & Hold Ratio.]

[Fig 12 panels: AXA, Peugeot and STM Profit Ratio vs. NTrees. X-axis: Ntrees; Y-axis: Profit Ratio; series: Profit Ratio, Buy & Hold Ratio.]

[Fig 13 panels: AXA, Peugeot and STM Profit Ratio vs. PCross. X-axis: PCross; Y-axis: Profit Ratio; series: Profit Ratio, Buy & Hold Ratio.]

Fig 14. Scatter charts of net profit as a function of Testing percentage

Measure of Profit and Time per population

Figure 15 shows the net profit as a function of the time per population in seconds. As can be seen, higher profit values lie closer to the low end of the time range. This means that higher profit values are in fact more likely to be generated in shorter amounts of time, at or below the 10 second boundary.

Fig 15. Scatter charts of net profit as a function of Time per population

The evolutionary performance of the GP algorithm was reasonably sensitive to the control parameters: varying the crossover and mutation probabilities, the number of generations, etc. had a noticeable effect on the profit values attained.

5.6 Discussion of Results

If the anomalies in the above results are disregarded, the following parameters at the following settings should give very high, if not the highest, profit values at a time ratio of less than 10 seconds per population.

[Fig 14 panels: AXA, Peugeot and STM Profit Ratio vs. Ptest. X-axis: Ptest; Y-axis: Profit Ratio; series: Profit Ratio, Buy & Hold Ratio.]

[Fig 15 panels: AXA, Peugeot and STM Profit Ratio vs. Time (seconds) per population. X-axis: Time (seconds); Y-axis: Profit Ratio; series: Profit Ratio, Buy & Hold Ratio.]

For each data set, the highest profit was obtained with parameters slightly different from the optimal settings proposed above.

Some parameters appear to follow a pattern with respect to profit on certain data sets. An example is AXA with increasing moving average ranges. The exact opposite is noted for STMicroelectronics, which shows increasing profit at decreasing moving average ranges. A few parameters seem to show no pattern at all, the number of generations for example.

Figures 19, 20 and 21 show the stock data and the BUY/SELL decisions of the GP algorithm for AXA, Peugeot and STMicroelectronics respectively. The circles represent SELL decisions and the squares represent BUY decisions. The figures show subsets of the original stock data, to make them easier to represent on paper.

Fig 19. Graph output of AXA quotes with BUY/SELL decisions

Fig 20. Graph output of Peugeot quotes with BUY/SELL decisions


Fig 21. Graph output of STM quotes with BUY/SELL decisions

6. Conclusion

This report put forth the problem of analyzing financial time series data to suggest actions to be taken in quasi-real time. A solution based on genetic programming was proposed. The idea was to create GP trees with financial technical indicators as branches and logical operators to join these branches.

In order to fully appreciate the significance of this endeavor, current systems which employ similar techniques were studied. The greatest inspiration was the Internet Bourse Experts system, which employs a genetic algorithm. An in-depth analysis was made of another GP based system called EDDIE. A development platform had to be chosen which would make the design of the software portion easier.

The initial tasks included an intensive study of evolutionary computing and stock market trading methodologies. A tentative GP algorithm was devised. The hierarchical structure of the major objects (population, generation, expert and tree) was proposed. The functionality of each object was designed such that any property or function would be accessible at any point in the program. A representation for a tree structure was researched. Functionalities such as tree construction, parsing, removal and modification of nodes, and evaluation had to be incorporated in this representation. A grammar was devised for this kind of representation, which emulated a typical GP tree structure.

The technical indicators used in the project were selected for their obvious benefits in previous work in this domain. The vast library of IBE's trading functions was an obvious source.

All experiments were conducted on real stock price data. In all cases, as was demonstrated during the experimentation phase, the results were more profitable than those of the Buy-and-hold technique.

The tick frequency of each data set was different. Although the time period for each was the same, the number of quotations was different. AXA contained 1013 quotations, Peugeot had 828 quotations, while STMicroelectronics had 860 quotations. This factor was not taken into


Appendix A

GP Algorithm Parameters

Trading-Specific Parameters

Initial Capital

This defines the initial working capital, of type double, before any trading decisions are made. It is represented as C_0, and the amount of working capital at any subsequent time period t is represented as C_t.

Commission

This is the commission charged per transaction as a percentage of the number of stocks bought or sold. It is of type double and is represented as P_com.

Buy Sell Percentage

This parameter defines the percentage of capital to use to buy stocks if the decision to buy is given, or the percentage of stocks in hand to sell if the decision to sell is given. Both are of type double. They are represented as P_buy and P_sell respectively.

Initial Number of Stocks

This parameter defines the initial number of stocks in hand at the start of the trading day. It is of type integer and is represented as S_0. The number of stocks at any given time t is represented as S_t.

Decision Boundary

This parameter defines the minimum difference in stock prices which will allow a decision to take place. It is of type double and is represented by R_dec. At any point in time, the absolute difference between consecutive stock prices must be greater than or equal to R_dec for a decision to be made.

Moving Average Range

This parameter defines the number of previous time periods used in calculating a moving average from stock prices. It is of type integer and is represented as R_MA. The commission percentage P_com of the moving average at a given time t is added and subtracted to create a boundary; if the current stock price falls within it, no decision is made. This allows decisions to be made according to fluctuations in price movement trends.

Trading Strategy

This parameter defines whether to use the American trading strategy. If the American trading strategy is used, the initial number of stocks is fixed at 0 and all stocks are sold at the end of the trading day.


Importance of Trading-Specific Parameters

Portfolio Management and Performance (Pt)

The capital at time t, C_t, and the number of stocks in hand at time t, S_t, change continuously depending on whether the decision is BUY or SELL.

A. If the decision is BUY, and the working capital is more than zero, the following formulas are used:
   1. S_buy = (P_buy * C_{t-1}) / B_t
      where S_buy is the number of stocks to buy and B_t is the stock value at time t.
   2. S_t = S_{t-1} + S_buy
   3. C_t = C_{t-1} - (B_t * S_buy) - (P_com * S_buy)

B. If the decision is SELL, and the number of stocks is more than zero, the following formulas are used:
   1. S_sell = (P_sell * C_{t-1}) / B_t
      where S_sell is the number of stocks to sell.
   2. S_t = S_{t-1} - S_sell
   3. C_t = C_{t-1} + (B_t * S_sell) - (P_com * S_sell)

C. After either of these steps, the net worth, which is used as a performance measure, is calculated:
   1. C_net(t) = C_t + (B_t * S_t)
      where C_net(t) is the net worth/performance at time t.
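The portfolio bookkeeping can be illustrated with a short Python sketch. It rests on several assumptions that are not confirmed by the report: P_buy and P_sell are passed as fractions (0.5 for 50 percent), the quantity sold is taken as a fraction of the stocks in hand (following the definition of the Buy Sell Percentage rather than the printed SELL formula), fractional shares are allowed, and the commission term is applied per share exactly as in the printed formulas. The class name `Portfolio` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Portfolio:
    capital: float        # C_t, working capital
    stocks: float = 0.0   # S_t, stocks in hand (fractional shares assumed)

    def buy(self, price: float, p_buy: float, p_com: float) -> None:
        """BUY step: spend the fraction p_buy of the capital at this price."""
        if self.capital <= 0:
            return
        s_buy = (p_buy * self.capital) / price
        self.stocks += s_buy
        self.capital -= price * s_buy + p_com * s_buy

    def sell(self, price: float, p_sell: float, p_com: float) -> None:
        """SELL step: sell the fraction p_sell of the stocks in hand.

        Assumption: the quantity is based on the stocks held, not on capital.
        """
        if self.stocks <= 0:
            return
        s_sell = p_sell * self.stocks
        self.stocks -= s_sell
        self.capital += price * s_sell - p_com * s_sell

    def net_worth(self, price: float) -> float:
        """C_net(t) = C_t + B_t * S_t, the performance measure."""
        return self.capital + price * self.stocks


if __name__ == "__main__":
    portfolio = Portfolio(capital=1000.0)
    portfolio.buy(price=10.0, p_buy=0.5, p_com=0.2)
    portfolio.sell(price=10.0, p_sell=0.5, p_com=0.2)
    print(portfolio.capital, portfolio.stocks, portfolio.net_worth(10.0))
```

In this sketch each transaction lowers the net worth by the commission even when the price does not move, which is why the number of decisions has to be limited.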

All variable portfolio indices (capital, number of stocks and net worth/performance) are kept separately for the training and testing periods. They are represented as C^{test}_t, S^{test}_t and C^{test}_{net}(t) for testing, and as C^{train}_t, S^{train}_t and C^{train}_{net}(t) for training.

As can be seen from the above two formulas, the commission, P_com, always figures into the calculation and is always deducted from the capital, regardless of the decision made. Decisions therefore have to be made carefully, as too many would deplete the capital too soon. The following section describes how such an event can be avoided.

Filtering measures

Two filtering measures are used: the moving average and the decision boundary.

For the moving average:

1. Calculate the moving average from time t to t-R_MA as follows:
   I. Set A_t := 0
   II. For Index = 0 to R_MA: A_t = (A_t * Index + B_{t-Index}) / (Index + 1)
   where A_t is the moving average at time t and Index is a counter.

2. Calculate the upper and lower limits of the moving average by adding and subtracting the commission, P_com, as a percentage of the moving average, as follows:
   I. A^{upper}_t = A_t + (P_com * A_t) / 100
   II. A^{lower}_t = A_t - (P_com * A_t) / 100
   where A^{upper}_t is the upper limit and A^{lower}_t is the lower limit.

If the stock value at time t does not lie between A^{lower}_t and A^{upper}_t, and the absolute difference between the current and previous values is greater than or equal to the decision boundary, R_dec, a request for a BUY or SELL decision is made:

IF (R_dec <= |B_t - B_{t-1}|) AND NOT (A^{lower}_t <= B_t <= A^{upper}_t)
    REQUEST DECISION.
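The two filters can be combined into a single check, sketched below in Python. The sketch assumes that prices are held in a list indexed by tick, that the loop bound "0 to R_MA" is inclusive (so R_MA + 1 prices are averaged), and that R_dec is compared with the absolute price difference. The function names are hypothetical.

```python
from typing import Sequence


def moving_average(prices: Sequence[float], t: int, r_ma: int) -> float:
    """Running mean of the prices from tick t back to tick t - r_ma."""
    if t < r_ma:
        raise ValueError("not enough history before tick t")
    average = 0.0
    for index in range(r_ma + 1):
        average = (average * index + prices[t - index]) / (index + 1)
    return average


def decision_requested(prices: Sequence[float], t: int, r_ma: int,
                       p_com: float, r_dec: float) -> bool:
    """True when a BUY or SELL decision should be requested at tick t."""
    average = moving_average(prices, t, r_ma)
    band = p_com * average / 100          # commission as a percentage
    lower, upper = average - band, average + band
    moved_enough = abs(prices[t] - prices[t - 1]) >= r_dec
    outside_band = not (lower <= prices[t] <= upper)
    return moved_enough and outside_band


if __name__ == "__main__":
    quotes = [10.0, 10.0, 10.0, 10.0, 12.0]   # illustrative prices only
    print(moving_average(quotes, t=4, r_ma=3))
    print(decision_requested(quotes, t=4, r_ma=3, p_com=0.2, r_dec=0.2))
```

A flat price series never triggers a request, since the price stays inside the band and the tick-to-tick difference is zero; only a move that is both large enough and outside the band reaches the expert.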

Genetic Programming Specific Parameters

Number of Generations

This defines the maximum number of generations in each population. It is of type integer and is represented as N_GEN.

Number of Trees in each Generation

This defines the maximum number of trees in each generation. It is of type integer and is represented as N_TRE.

Maximum Tree Depth

This defines the maximum depth of a tree. It is of type integer and is represented as N_DEP.

Percentage of Previous Population Carried Forward

This defines the percentage of the top members of the previous population which will be used to create the new population. It is of type integer and is represented as P_carry.

The very first population consists of trees which have been generated from randomly selected leaf nodes from a library. This kind of randomness is sufficient for an initial population, but for subsequent populations the expertise of a previous population is necessary, as it may provide a reasonable solution for the forthcoming sample space.

Elitism

This defines the percentage of elite trees, ranked by performance, which will be carried forward into the next generation unchanged. It is of type integer and is represented as P_elite.

Crossover Probability

This defines the percentage of the number of trees from a previous generation which are used to create new trees using the crossover process described in section. It is of type integer and is represented as P_cross.


Mutation Probability

This defines the percentage of the number of trees from the previous generation which are used to create new trees using the mutation process described in section. It is of type integer and is represented as P_mut.

Mutation consists of replacement, addition or deletion operations, and the probability of each operation occurring is defined as P^{rep}_{mut}, P^{add}_{mut} and P^{del}_{mut}. They are of type double.

Importance of Genetic Programming Specific Parameters

Hierarchical Structure

The Tree is the base object. The conceptual structure of the tree has been detailed in section. The technical indicators and logical operators are stored as strings, and the string is parsed to evaluate the tree. When a tree operates on stock market data, it gives a BUY, SELL or HOLD decision.

Trees can be created from random nodes, from crossover operations or from mutation operations.

Each Expert contains 2 trees, a BUY tree and a SELL tree. The results of both trees undergo an XOR operation to return a single result. Each expert maintains a record of parameters, performance, capital and number of stocks. In later references, an expert refers to the pair of BUY and SELL trees.
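How the two tree results might be combined is sketched below in Python. The report states only that an XOR is applied; the interpretation used here (exactly one tree firing yields that tree's decision, anything else yields HOLD) is an assumption. Trees are modelled as plain callables for brevity, whereas the report stores them as parsed strings; the names `Expert`, `Tree` and `decide` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A tree is modelled as a function of the price series and the current tick
# that returns True when its rule fires (an illustrative simplification).
Tree = Callable[[Sequence[float], int], bool]


@dataclass
class Expert:
    buy_tree: Tree
    sell_tree: Tree

    def decide(self, prices: Sequence[float], t: int) -> str:
        """Combine both tree outputs with XOR into BUY, SELL or HOLD."""
        buy = self.buy_tree(prices, t)
        sell = self.sell_tree(prices, t)
        if buy != sell:               # XOR: exactly one of the trees fired
            return "BUY" if buy else "SELL"
        return "HOLD"                 # both or neither fired


if __name__ == "__main__":
    # Toy rules standing in for trees of technical indicators.
    rising = lambda prices, t: prices[t] > prices[t - 1]
    falling = lambda prices, t: prices[t] < prices[t - 1]
    expert = Expert(buy_tree=rising, sell_tree=falling)
    print(expert.decide([1.0, 2.0, 1.0], 1))
    print(expert.decide([1.0, 2.0, 1.0], 2))
```

Under this reading, contradictory signals from the two trees cancel out, so an expert only trades when its BUY and SELL rules disagree.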

Each Generation contains N_TRE experts. The first generation, GEN_0, contains experts created from random nodes. Subsequent generations, GEN_1 to GEN_{N_GEN}, contain high ranking experts from the previous generation and new experts created by genetic operations, namely mutation and crossover.

Each Population contains N_GEN generations. The first generation in the first population is randomly created (as detailed above; subsequent generations in the same population are created through elitism and genetic operations), but the first generation in subsequent populations will consist of P_carry percent of the elite members of the fittest generation from the previous population.

Figure A.1 exhibits the hierarchical structure of the objects described above.

[Fig. A.1 diagram: Population POP_0 contains Generations GEN_0 .. GEN_{N_GEN}; each Generation contains Experts EXP_0 .. EXP_{N_TRE}; each Expert contains a Buy Tree and a Sell Tree.]

Fig. A.1: Hierarchical Structure of GP Objects

Training and Testing Specific Parameters

Training Start Quotation Limit

This defines at which point in time in the stock market sample to begin training. This number is an integer and must be at least 30, because some of the technical indicators used in the application read quotations going back several points in time. It is represented as T_train.

Percentage of Quotations for Testing

This defines what percentage of the stock market sample to use for testing. This number is an integer. It is represented as P_test.

Subsequently:

T_test = (P_test * T) / 100

where T represents the total time line in the sample space and T_test is the time at which testing will begin.

Importance of Training and Testing Specific Parameters

The sample space, in this case the stock market data, is divided into TRAINING and TESTING periods, which move forward by a single time unit as populations progress.

The first training period is between T_train and T_test - 1, and the first testing period is at T_test. This means that each expert of the first generation of the first population is applied to the stock quotations during this training period. The performance of each expert is calculated and the fittest are used to create the second generation. This process repeats until N_GEN generations have been created. The last generation will be the fittest according to the fitness measure. The fittest tree is applied to the first testing period, T_test.

At this point, a new population is created. The training and testing periods are offset by 1. In this case, therefore, the training period is between T_train + 1 and T_test - 1 + 1, and the testing period is T_test + 1. Instead of a completely random creation of the first generation in this new population, P_carry percent of the elite experts from the last generation of the previous population are carried unchanged into the new population, and the remainder are randomly generated.
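Seeding a new population from the previous one can be sketched as follows in Python. The sketch assumes that P_carry is a percentage of the generation size N_TRE, that experts are ranked by their performance value, and that a caller-supplied factory creates random experts; the names `seed_next_population` and `make_random_expert` are hypothetical.

```python
from typing import Callable, List, Tuple, TypeVar

E = TypeVar("E")


def seed_next_population(last_generation: List[Tuple[float, E]],
                         p_carry: float,
                         n_tre: int,
                         make_random_expert: Callable[[], E]) -> List[E]:
    """Build the first generation of the next population.

    last_generation holds (performance, expert) pairs. The top p_carry
    percent (of n_tre) are carried over unchanged; the remaining slots are
    filled with randomly created experts.
    """
    ranked = sorted(last_generation, key=lambda pair: pair[0], reverse=True)
    n_carry = int(n_tre * p_carry / 100)
    carried = [expert for _, expert in ranked[:n_carry]]
    fresh = [make_random_expert() for _ in range(n_tre - len(carried))]
    return carried + fresh


if __name__ == "__main__":
    previous = [(1.0, "a"), (3.0, "b"), (2.0, "c"), (0.5, "d")]
    print(seed_next_population(previous, p_carry=50, n_tre=4,
                               make_random_expert=lambda: "new"))
```

With P_carry fixed at 50, as in the experiments, half of each new first generation carries expertise from the previous window and half is new random material.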

This process repeats until the last point in the sample space has been tested, that is,

until T_test + offset == T,

where offset is an integer which is initialized at 0 and incremented by 1 each time a new population is created.
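The sliding training and testing periods can be enumerated with a small Python generator. This is a sketch under stated assumptions: ticks are numbered from 0 to T - 1, T_test is obtained by integer division, and the function name `walk_forward_windows` is hypothetical.

```python
from typing import Iterator, Tuple


def walk_forward_windows(total_ticks: int, p_test: int,
                         t_train: int = 30) -> Iterator[Tuple[int, int, int]]:
    """Yield (train_start, train_end, test_tick) for each population.

    T_test = P_test * T / 100 marks the first tested tick; both periods
    move forward by one tick per population until the data is exhausted.
    """
    t_test = (p_test * total_ticks) // 100
    offset = 0
    while t_test + offset < total_ticks:
        yield (t_train + offset, t_test - 1 + offset, t_test + offset)
        offset += 1


if __name__ == "__main__":
    # AXA has 1013 quotations in the report; P_test = 60 is one tested value.
    windows = list(walk_forward_windows(total_ticks=1013, p_test=60))
    print(len(windows), windows[0], windows[-1])
```

The number of windows equals the number of populations evolved, which is the denominator of the time-per-population measure used in the experiments.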


[Goodhart, 1995]
Goodhart, C., O'Hara, M., High Frequency Data in Financial Markets: Issues and Applications, London School of Economics, 1995.

[Gourieroux, 1997]
Gourieroux, C., ARCH Models and Financial Applications, Springer Verlag, 1997.

[Holland, 1975]
Holland, J., Adaptation in Natural and Artificial Systems, 1975.

[Hui, 2003]
Hui, A., Using Genetic Programming to Perform Time-Series Forecasting of Stock Prices, http://ww.genetic-programming.org, 2003.

[Kaboudan, 2000]
Kaboudan, M., Genetic Programming Prediction of Stock Prices, Computational Economics, Volume 16, pp. 207-236, 2000.

[Korczak, 2001]
Korczak, J., Kustner, P., A Stock Trading System using Genetic Approach and Object-Oriented Database Technology, Proceedings on Workshop on Artificial Intelligence for Financial Time Series Analysis, 2001.

[Korczak, 2004]
Korczak, J., Lipinski, P., Evolutionary building of stock trading Experts in a Real-Time System, Proceedings of the 2004 Congress on Evolutionary Computation, CEC 2004, pp. 940-947, 2004.

[Korczak, 2001]
Korczak, J., Roger, P., Stock timing using genetic algorithms, Applied Stochastic Models in Business and Industry, Volume 18, pp. 121-134, 2001.

[Koza, 1992]
Koza, J., Genetic Programming: On the Programming of Computers by Means of Natural Selection, The MIT Press, 1992.

[Koza, 1995]
Koza, J., Survey of Genetic Algorithms and Genetic Programming, Proceedings of the WESCON 95 Conference Record, 1995.


[Koza et al., 1996]
Koza, J., Bennett III, F., Andre, K., Keane, M., Artificial Intelligence in Design, http://www.genetic-programming.com, 1996.

[Krishnaswamy et al., 2000]
Krishnaswamy, C., Gilbert, E., Pashley, M., Neural Network Applications in Finance: A Practical Introduction, Financial Practice and Education, 2000.

[Langdon, 1995]
Langdon, W., Qureshi, A., Genetic Programming: Computers using "Natural Selection" to generate programs, The MIT Press, 1995.

[Lendasse et al., 2001]
Lendasse, A., Lee, J., de Bodt, E., Wertz, V., Verleysen, M., Dimension Reduction of Technical Indicators for the Prediction of Financial Time Series - Application to the BEL20 Market Index, European Journal of Economic and Social Systems 15, Vol. 2, pp. 31-48, 2001.

[Lipinski, 2003]
Lipinski, P., Evolutionary Data-Mining Methods in Discovering Stock Market Expertise from Financial Time Series, PhD Thesis, ULP Strasbourg, 2003.

[Mitchell et al., 1992]
Mitchell, M., Forrest, S., Holland, J., The royal road for genetic algorithms: Fitness landscapes and GA performance, Proceedings of the First European Conference on Artificial Life, Paris, France, pp. 245, 1992.

[Molgedey, 2000]
Molgedey, L., Ebeling, W., Intraday Patterns and Local Predictability of High Frequency Financial Time Series, Physica A: Statistical Mechanics and its Applications, Volume 287, Issues 3-4, pp. 420-428, 2000.

[Pantazopoulos et al., 1998]
Pantazopoulos, K., Tsoukalas, L., Bourbakis, N., Brun, M., Houstis, E., Financial prediction and trading strategies using neuro-fuzzy approaches, IEEE Transactions on Systems, Man and Cybernetics, Part B, Volume 28, Issue 4, pp. 520-531, 1998.


[Santini, 2000]
Santini, M., Tettamanzi, A., Genetic Programming for Financial Time Series Prediction, Proceedings of EuroGP'2001, Volume 2038, pp. 360-371, 2001.

[Sharpe, 1996]
Sharpe, W., Mutual Fund Performance, Journal of Business, pp. 119-138, 1966.

[Sortino, 1994]
Sortino, F., Price, L., Performance Measurement in a Downside Risk Framework, The Journal of Investing, pp. 59-65, 1994.

[Spears, 2003]
Spears, W., Gordon-Spears, D., Evolution of strategies for resource protection problems, Advances in evolutionary computing: theory and applications, Springer-Verlag, 2003.

[Xu et al., 2003]
Xu, Z., Leung, K., Liang, Y., Leung, Y., Efficiency Speed-up Strategies for Evolutionary Computation: Fundamentals and Fast-GAs, Applied Mathematics and Computation, v.142, pp. 341-388, 2003.

[Zitvogel, 2003]
Zitvogel, O., Développement d'un Système Multi-Agents, Interface Intelligente, Négociation et Gestion de Bases de Données, Internship Report, LSIIT-AFD, Illkirch, 2003.
