Discovery of Stock Trading Expertise Using Genetic Programming


  • 7/27/2019 Discovery of Stock Trading Expertise Using Genetic Programming

    1/42

    Laboratoire des Sciences de l'Image, de l'Informatique et de la Télédétection

    LSIIT - UMR 7005

    Fundamental and Applied Computer Science Research Master

    Internship Research Report

    Discovery of Stock Trading Expertise Using Genetic

    Programming

    By: Syed Muhammed Ali Jafri

    Supervisor: Pr. Jerzy KORCZAK

    Illkirch, September 2006


    Contents

    1. Introduction

    2. Background
      2.1 Financial Prediction
      2.2 Evolutionary Computing
      2.3 Genetic Algorithms
        2.3.1 String representation
        2.3.2 Crossover and mutation operations
        2.3.3 Fitness based selection
      2.4 Genetic Programming
        2.4.1 Tree structure representation
        2.4.2 Operators and terminals
        2.4.3 GP Crossover and mutation

    3. Internet Bourse Experts System
      3.1 Introduction
      3.2 Conceptual Flow
      3.2 GP Engine in IBE

    4. System Design and Implementation
      4.1 Problem Definition
      4.2 Implementation details
        4.2.1 Technical Indicator set
        4.2.2 Fitness Measurement
        4.2.3 Description of GP Engine

    5. Experimentation
      5.1 Experimental Aims and Objectives
      5.2 Trading Procedure
      5.3 Experimental Input Data
      5.4 Parameters
      5.5 Results
      5.6 Discussion of Results

    6. Conclusion

    Appendix A: GP Algorithm Parameters

    Bibliography


    1. Introduction

    Evolutionary computation has been extensively applied to problems whose solution space is

    irregular, i.e., too large and highly complex, so that it is difficult to employ conventional

    optimization procedures to search for the global optimum [Chen, 1998].

    Solution spaces for financial time series data are highly irregular. General acceptance of this

    property has in fact fostered the growth of financial engineering [Chen, 1998].

    Many strategies and frameworks have been employed, ranging from the traditional and more

    popular autoregressive statistical approaches such as ARCH and GARCH [Gourieroux, 1997]

    to more recent machine learning and evolutionary approaches such as neural networks

    [Krishnaswamy et al., 2000], genetic algorithms [Allen, 1999], [Korczak, 2001],

    [Lipinski, 2003], [Korczak, 2004] and genetic programming [Langdon, 1995], [Kaboudan, 1999],

    [Santini, 2000], [Hui, 2003], [Castebrunet, 2005], the last of which is the concern of this report.

    The objective is to model a process of evolution-based learning and to create a genetic

    programming based system that accepts high frequency stock market data, analyzes it

    and rapidly gives BUY/SELL/HOLD signals.

    This system will generate trees of technical trading rules, joined together by logical

    operators. Every decision signal at a given point will be the result of a training stage and a

    testing stage. The training stage will include generation of trees, performance testing and

    evolution. At the end of the training stage, a single best tree will remain, which will be used to

    generate a decision for the testing stage. For further time steps, a selection of the best trees

    from the previous time step will be reused.

    The idea is that at each time step better performing trees will be used and promoted, the lesser

    trees being discarded.

    This report begins with an introduction to the theory and practice of financial prediction and a

    description of evolutionary computation techniques, with a particular focus on Genetic

    Programming. This is followed by a review of an existing system, Internet Bourse Experts.

    Details of the design and implementation of the GP system are included next, together with a

    discussion of various design choices and also a description of the dynamically-adaptive GP

    algorithm and how this algorithm will fit into IBE.


    2. Background

    2.1 Financial Prediction

    The idea behind financial prediction is to use historical pricing data of the assets traded to

    identify unique trends and patterns in the fluctuations of prices [Pantazopoulos et al., 1998].

    These patterns and trends are used to predict what the forthcoming price movements will be

    and decisions to buy or sell an asset are made on this basis.

    Classes of patterns and/or trends are uniquely identified using technical indicators, which can

    be either quantitative or qualitative. Many technical indicators are based on moving average

    computations or on series of local minima and maxima [Lendasse et al., 2001].

    2.2 Evolutionary Computing

    Evolutionary Computing concerns itself with computer programs trying to behave as living

    organisms undergoing Darwinian mechanisms of natural selection for the purpose of

    optimization, adaptation or search [Koza, 1992], [Spears, 2003]. All evolutionary algorithms

    involve the representation of a set of possible solutions to a given problem as a population of

    individuals. The fitness of each candidate solution is tested and the best individuals are

    permitted to survive and produce offspring derived from them. This is seen to create complex

    and highly adapted organisms - optimized solutions to the problem of survival and

    reproduction in the natural environment [Xu et al., 2003].

    2.3 Genetic Algorithms

    Genetic Algorithms operate on a population of individuals represented by character strings

    [Holland 1975], [Mitchell et al 1992]. These are evaluated according to a fitness function

    appropriate to the problem in hand. Pairs of individuals, selected at random but biased

    according to fitness, are recombined to create members of a new population. Starting from an

    initial population of randomly generated candidate solutions, successive generations are

    produced until some termination criterion is reached: This may be the convergence of the

    average and maximum fitness values, or simply a limit on the number of generations.

    2.3.1 String representation

    Genetic Algorithms represent candidate solutions as strings - finite sequences of characters

    from a given alphabet (typically binary or integer numeric). The method of mapping a

    candidate solution to a GA string depends on the problem domain: The string may represent,

    for example, an ordered sequence of operations, or a set of independent parameters. However,

    a particular location in the string sequence always represents the same part or parameter of the

    solution.


    2.3.2 Crossover and mutation operations

    The string representation used in GA is analogous to the structure of biological genetic

    material - DNA. In the same way, the method of creating new GA strings mimics the

    recombination mechanisms of DNA.

    Crossover is the operation of exchanging information, or genetic material, between two

    individuals. It works by swapping the values at corresponding locations between pairs of

    strings. There are various methods for implementing crossover suited to different applications.

    The simplest method is single point crossover: a point is selected at random to divide each

    string into two sections, one of which is swapped over. Alternatively, a greater number of

    crossover points may be used so that more than one contiguous sub-sequence is exchanged

    between parent strings. Another method, uniform crossover, acts on individual locations -

    swapping each according to a fixed probability.

    Mutation is simply the action of randomly changing the value of individual locations or sub-

    strings within a GA sequence. Although crossover is the main factor in the evolutionary

    behavior in GA, mutation is important because it is the only way of introducing new genetic

    material into the overall population.
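The operators described above can be sketched in a few lines of Python. This is only an illustrative sketch on binary strings held as lists; the function names, the swap probability `p_swap` and the mutation rate `p_mut` are assumptions made for the example and are not taken from the report.

```python
import random

def single_point_crossover(a, b, rng):
    """Swap the tails of two equal-length strings at one random cut point."""
    point = rng.randrange(1, len(a))          # cut strictly inside the string
    return a[:point] + b[point:], b[:point] + a[point:]

def uniform_crossover(a, b, rng, p_swap=0.5):
    """Swap each location independently with probability p_swap."""
    c, d = list(a), list(b)
    for i in range(len(a)):
        if rng.random() < p_swap:
            c[i], d[i] = d[i], c[i]
    return c, d

def mutate(bits, rng, p_mut=0.05):
    """Flip each binary location independently with probability p_mut."""
    return [1 - g if rng.random() < p_mut else g for g in bits]

rng = random.Random(0)
p1, p2 = [0] * 8, [1] * 8                     # two hypothetical parent strings
c1, c2 = single_point_crossover(p1, p2, rng)
u1, u2 = uniform_crossover(p1, p2, rng)
m = mutate(p1, rng, p_mut=0.5)
```

With all-zero and all-one parents, every location of the two children holds one value from each parent, which makes it easy to see that crossover only rearranges existing material while mutation is what introduces new values.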

    2.3.3 Fitness based selection

    As stated previously, individuals are selected for reproduction randomly, but with the

    probability of selection weighted according to the measured fitness of the candidate solution.

    There are many methods by which fitness based selection can be implemented [Blickle, 1995].

    The following are the three most successful in terms of effectiveness and popularity:

    Fitness Proportional Selection (FPS)

    The sum total of the fitness values of all population members is calculated and a

    random number is selected between zero and this value. Running through all

    population members, the fitness values are summed a second time. When the sum

    exceeds the randomly generated number, the current population member is returned. If

    the total fitness sum is thought of as the circumference of a circle, then each individual

    is represented by a sector of the circle whose arc length equals its fitness value. If a pointer is placed at

    a random position on the wheel, the probability of it falling within any individual

    sector is proportional to the fitness of that individual. This is why FPS is also known as

    Roulette Wheel Selection. A disadvantage of this method is that a fitness proportional

    selection weighting may not always be suitable. It may be desirable to

    disproportionately bias selection in favor of individuals whose fitness is only

    marginally greater than average, or to have only a small bias towards individuals who

    have very high relative measured fitness. Another problem with this method is that it

    does not work with negative fitness values.

    Rank Selection

    This method works like FPS, only the fraction of the roulette wheel assigned to each

    individual is dependent on rank position rather than absolute fitness. The degree of bias

    can be controlled by using the rank position value raised by a chosen polynomial


    factor. This is a comparatively slow method because the population must be sorted

    according to fitness.

    Tournament Selection

    A group of individuals is selected from the population at random. The fittest member

    of this group is returned. The degree of selection bias is related to the size of the group

    or tournament - the greater the size, the greater the relative weight of fitter individuals:

    With a tournament size of two, the fittest member of the population is twice as likely to

    be selected as the median. This method is the most computationally efficient as only

    the individuals selected for the tournament need to be inspected.
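As a rough illustration of tournament selection, the sketch below draws the contestants with replacement and checks the "twice as likely as the median" statement empirically. The population of integers whose fitness is their own value is a hypothetical stand-in chosen only to make the ranking obvious.

```python
import random

def tournament_select(population, fitness, k, rng):
    """Return the fittest of k individuals drawn at random (with replacement)."""
    contestants = [rng.choice(population) for _ in range(k)]
    return max(contestants, key=fitness)

rng = random.Random(1)
population = list(range(101))          # hypothetical individuals; fitness = value
counts = {50: 0, 100: 0}               # median and best individual
for _ in range(200_000):
    winner = tournament_select(population, lambda x: x, 2, rng)
    if winner in counts:
        counts[winner] += 1
ratio = counts[100] / counts[50]       # empirical best-to-median selection ratio
```

For a tournament size of two the ratio comes out close to 2; increasing `k` raises it, which is the selection-pressure control described in the text.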

    Various other schemes also exist, such as truncation selection, linear ranking selection and

    exponential ranking selection, but these schemes are outperformed by the schemes described

    above.

    For a comprehensive review on selection schemes please refer to [Blickle, 1995].

    2.4 Genetic Programming

    Genetic Programming (GP) is an extension of GA [Koza, 1992]. GP uses a similar

    evolutionary procedure for search and optimization based on selective recombination from a

    population of candidate solutions. It differs from GA in the representation of the candidate

    solutions.

    2.4.1 Tree structure representation

    The tree structure is a hierarchical model consisting of a set of interconnected nodes (see

    figure 1). Each node can have several connections to nodes at a lower level, but only a single

    parent connection.

    The name genetic programming refers to the fact that the tree structure is usually used to

    represent a function in the style of a computer program syntax tree. The branch nodes

    represent functions - they take values passed by their immediate descendants as input

    arguments and return an output to their parent. The terminal leaf nodes represent input

    arguments or variables. The branching hierarchy denotes the evaluation ordering of functions.

    In contrast to genetic algorithms, the tree representation of GP facilitates the creation of

    candidate solutions of variable size and complexity - crossover and mutation operations can

    alter the size of individual trees. Another important difference is that, unlike GA, there is no

    specific mapping of individual parts of the tree to a part of a candidate solution. The GP

    function parse tree returns output values from a given set of input variables.
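A minimal sketch of such a parse tree in Python may help. The `Node` class, the `FUNCTIONS` table and the example rule are hypothetical names introduced for illustration; they are not the data structures of the system described in this report.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    label: str                               # function name or input variable
    children: List["Node"] = field(default_factory=list)

FUNCTIONS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "NOT": lambda a: not a,
}

def evaluate(node: Node, inputs: Dict[str, bool]) -> bool:
    """Evaluate the children first, then apply the function at this node."""
    if not node.children:                    # terminal leaf: look up the input
        return inputs[node.label]
    args = [evaluate(child, inputs) for child in node.children]
    return FUNCTIONS[node.label](*args)

# (Input 1 AND (NOT Input 2)), a hypothetical two-input rule
tree = Node("AND", [Node("Input 1"), Node("NOT", [Node("Input 2")])])
result = evaluate(tree, {"Input 1": True, "Input 2": False})
```

The recursion mirrors the description above: leaves supply input values, branch nodes combine what their children return, and the root's value is the output of the whole function.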

    2.4.2 Operators and terminals

    The GP tree structure is constructed from two sets of node types - functions and terminals.

    The branch nodes - those which have at least one connection to a child node - are taken from

    the function set. This set typically consists of simple logical (AND, OR, etc.), conditional

    (IF-THEN-ELSE), arithmetic (+, -, *, /), or comparison (<, >, =) operators. The choice of function

    set is a design decision which depends on the problem domain and on the data types that the GP

    function should take as input and return as output. The terminal set consists of all the data

    input variables which are to be evaluated by the GP function.

    Function and terminal sets must be chosen such that they are capable of expressing a solution

    to the problem. This means that the designer should have knowledge about the problem

    domain - including some idea of the likely form of solutions.

    2.4.3 GP Crossover and mutation

    Genetic Programming implements crossover and mutation operations equivalent to those used

    in GA. To carry out the process of crossover on a pair of trees, a single node is selected at

    random from each - these form the crossover points. The sub-trees originating at these nodes

    are swapped over, creating two new GP trees (shown in figure 2). If the two sub-trees contain

    a different number of nodes then the resulting offspring trees will be of different sizes from the

    parents. Crossover is easy to implement in code by swapping over pointers between parent and

    child nodes at the selected points.
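The sub-tree swap can be sketched as follows. For brevity the trees here are nested Python lists of the form `[label, child, child, ...]`, which is an assumption of this example rather than the representation used by the GP engine, and the crossover points are chosen uniformly over all nodes.

```python
import copy
import random

def nodes(tree, path=()):
    """Yield the path (tuple of child indices) of every node in the tree."""
    yield path
    for i, child in enumerate(tree[1:], start=1):
        yield from nodes(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    """Return the tree with the node at `path` replaced by `subtree`."""
    if not path:                              # replacing the root itself
        return subtree
    get(tree, path[:-1])[path[-1]] = subtree
    return tree

def subtree_crossover(parent1, parent2, rng):
    """Swap one randomly chosen sub-tree between copies of the two parents."""
    child1, child2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    path1 = rng.choice(list(nodes(child1)))
    path2 = rng.choice(list(nodes(child2)))
    sub1, sub2 = get(child1, path1), get(child2, path2)
    return replace(child1, path1, sub2), replace(child2, path2, sub1)

def size(tree):
    return sum(1 for _ in nodes(tree))

p1 = ["AND", ["OR", ["Input 1"], ["Input 2"]], ["NOT", ["Input 3"]]]
p2 = ["NOT", ["AND", ["Input 2"], ["Input 4"]]]
child1, child2 = subtree_crossover(p1, p2, random.Random(0))
```

The children may differ in size from their parents, but the total number of nodes across the pair is conserved, since material is only exchanged and never created.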

    The parent trees used are selected using the roulette wheel selection algorithm, in which

    parents are selected according to their fitness: the fitter a chromosome is, the greater its

    chance of being selected. Imagine a roulette wheel on which all the chromosomes in the

    population are placed. The size of each chromosome's section of the wheel is proportional to

    the value of its fitness function - the bigger the value, the larger the section.

    A marble is thrown onto the roulette wheel and the chromosome where it stops is selected.

    Clearly, chromosomes with a larger fitness value will be selected more often.

    This process can be described by the following algorithm.

    1. Calculate the sum of the fitness values of all chromosomes in the population - sum S.

    2. Generate a random number from the interval (0,S) - r.

    3. Go through the population, accumulating the fitness values from 0 - sum s. When the sum

    s is greater than r, stop and return the current chromosome.

    Step 1 is, of course, performed only once for each population.
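The three steps translate almost directly into Python. The sketch below assumes non-negative fitness values (the method does not work with negative ones, as noted in section 2.3.3); the function name and the final fallback line are additions made for the example.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Fitness-proportional selection; assumes non-negative fitness values."""
    total = sum(fitnesses)                     # step 1: sum S
    r = rng.uniform(0, total)                  # step 2: random r between 0 and S
    running = 0.0
    for individual, fit in zip(population, fitnesses):   # step 3: running sum s
        running += fit
        if running > r:
            return individual
    return population[-1]                      # guard for rounding when r == S

rng = random.Random(42)
draws = [roulette_select(["a", "b"], [1.0, 3.0], rng) for _ in range(20_000)]
share_b = draws.count("b") / len(draws)        # should be near 3 / (1 + 3)
```

In a real run the sum S would be computed once per population and reused for every draw, rather than recomputed inside the function as it is here.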

    Mutation works in a similar way; in the strictest definition a new randomly generated sub-tree

    is inserted at a randomly selected node and the displaced section is discarded.

    In our application, mutation can be any one of the following operations:

    1. Removing a randomly selected node from the tree (Deletion operation).

    2. Adding a node to a randomly selected point in the tree without removing any portion

    of the original tree (Addition operation).

    3. The classical definition of mutation: removing a section of a tree and replacing it with

    a randomly generated sub-tree (Replacement operation).


    Because crossover can exchange sub-trees between different locations, unlike in GA, there is

    less need for mutation in creating and maintaining diversity in the population of candidate

    solutions. Therefore the mutation operator is sometimes left out of GP algorithms if the

    population is made large enough to ensure sufficient initial diversity of available building

    blocks [Mitchell, 1998].

    Figure 1: GP parse-tree representation of two functions taking four separate input

    parameters

    [Figure: two parent trees, Parent 1 and Parent 2, built from AND, OR, NOT and IF nodes over Input 1 to Input 4, each producing a return value; Subtree 1 and Subtree 2 are marked on the parents.]

    Figure 2: Result of GP crossover.

    3. Internet Bourse Experts System

    One of the goals of this project is to analyze an existing system, namely the Internet Bourse

    Experts system, and to attempt to improve on its existing experts generator, which uses GA,

    by replacing it with a GP based system.

    3.1 Introduction

    Internet Bourse Experts (IBE) is an on-line multi-agent system, based on client-server

    architecture, which analyzes financial data and is able to generate stock trading expertise. In

    this context, this expertise is composed of trading rules encoded as GA strings [Korczak,

    Kustner, 2001], [Korczak, Lipinski, 2004], [Lipinski, 2003].

    Given a library of trading rules, the objective is to find the best-performing collection of

    trading rules and to judge its efficiency without giving much priority to economic relevance.

    IBE uses genetic algorithms, which employ the "survival of the fittest" principle, to create AI

    based experts which base their decisions on a subset of the trading rules. The result is not

    necessarily a global optimum but the most effective expert under the circumstances. The fitness

    function used to evaluate experts in the population is explicitly tailored to stock trading. The

    evolutionary approach presented here, whereby knowledge-based trading systems are built, is

    evaluated on real financial time series.


    There are a large number of trading rules based on technical analysis indicators. Using these

    rules, financial experts and market traders make decisions on the stock market: to buy, sell, or

    defer action and do nothing.

    For more details refer to [Lipinski, 2003], [Korczak, 2001], [Korczak, 2004].

    3.2 Conceptual Flow

    Within the system, a certain number of intelligent agents exist [Zitvogel, 2003]. These agents

    specialize and represent different methods to analyze and process the data as well as

    heterogeneous events. Each agent is autonomous, using its own methods to analyze the market

    and concentrating on its own objectives.

    The starting point of the system, according to the diagram (refer to figure 3), is of course the

    stock market data from which all events are conceived. This data is preprocessed by the

    database agents and then stored in the database.

    Preprocessing consists of grouping the data and calculating an average. This is necessary

    because of the large volume of data that arrives from the stock market every second: storing

    all of it would be both useless and impossible. The data is therefore preprocessed and

    stored in the database by intelligent agents, while the stock market is continuously analyzed by

    other intelligent agents, known as market watch agents, which try to detect important events

    as early as possible to keep the system on track. As a consequence, the system can adapt

    easily to a new situation.

    When the preprocessed data finally arrives in the financial database, it is treated by two classes

    of agents. The first class is focused on a global analysis of the market, such as the analysis of

    volatility, and the second concentrates on the analysis of individual stocks. Thus, the

    second class forms the specialist experts which are used to define the state of quotations of a

    particular stock. In certain cases, the agents require supplementary knowledge, which is

    stored in an experts database managed by the experts observation agents.

    The expert generator uses genetic algorithms to find the best composition of rules. Each

    composition forms an expert in competition with the others.

    In addition to these agents, which process numerical data, there are agents based on textual

    analysis. Certain agents can observe the news flashes which correspond to a particular stock.

    After the text analysis phase, they generate an additional signal for other agents, telling them

    to change certain of their parameters.

    At the end, the output of each agent is captured by the visualization agent and presented to the

    user.


    Fig. 3: Agents of IBE

    3.2 GP Engine in IBE

    The idea will be to replace the GA based engine in the Experts Generator module with our GP

    engine. The previous section states that the agents can be divided into two classes. The first

    class of agents deals with a global analysis of the market and the second class is concerned

    with individual stocks.

    Each tree will generate trading signals with respect to the current point in time of the stock

    being monitored. During the system's migration phase, when the GA engine in IBE is

    replaced with the GP engine, the first class of agents will remain unchanged; it is the second

    class of agents which will undergo a change. As the Experts Generator module receives data, it

    will use the genetic program to generate trees of trading rules with certain predefined

    parameters. As in the genetic algorithm, each tree will be a composition of trading rules.

    4. System Design and Implementation

    4.1 Problem Definition

    The objective is to create a system which optimizes a set of existing technical trading rules

    on historical quotation data using a GP based method. This system must be able to evolve

    optimized candidate solutions and also implement an appropriate dynamically adaptive GP

    learning algorithm, meaning that it must continually evolve and adapt to the changing

    dynamics of the stock market. It must be able to produce trading expertise, promoting the

    fittest experts and rejecting the weaker ones.

    It must also be taken into account that what is fit now might be weak at the next moment,

    which calls for continuous fitness evaluation.

    [Figure 3 components: Live Stock Market Data, Stock Watch Agents, Database Agents, Financial Database, Volatility Agents, Action Analysis Agents, Text Analysis Agents, Experts Generator, Experts Database, Experts Observation Agents, Visualization Agents, Security Agents, Users Database, Users]

    Price Channel Breakout (PCB)

    If the current price exceeds the maximum from the previous n time units, BUY; if it

    goes below the minimum from this period, SELL; otherwise HOLD.

    If $Price_{current} > \max_{x = current-n}^{current-1} Price_x$ Return BUY

    Else If $Price_{current} < \min_{x = current-n}^{current-1} Price_x$ Return SELL

    Else Return HOLD    [4.2]
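A small Python sketch of the Price Channel Breakout rule is given below. It assumes the channel is formed from the n prices preceding the current one, in line with "the previous n time units" in the description; the function name and the choice to return HOLD when there is too little history are assumptions of the example.

```python
def price_channel_breakout(prices, n):
    """Signal for the last price against the n prices that precede it."""
    if len(prices) < n + 1:
        return "HOLD"                  # not enough history (assumed behaviour)
    current, window = prices[-1], prices[-n - 1:-1]
    if current > max(window):
        return "BUY"
    if current < min(window):
        return "SELL"
    return "HOLD"

signal = price_channel_breakout([10.0, 10.5, 10.2, 10.9], n=3)
```

Here the last price, 10.9, is above the highest of the three preceding prices, so the rule signals BUY.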

    Simple Moving Average Crossover (SMAC)

    If a short term (5-day) moving average value crosses above a long term (50-day)

    moving average then BUY; if the short term average crosses below then SELL.

    If $MovingAverage_{ShortTerm} > MovingAverage_{LongTerm}$ Return BUY

    Else If $MovingAverage_{ShortTerm} < MovingAverage_{LongTerm}$ Return SELL

    Else Return HOLD    [4.3]
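The crossover rule can be sketched as below. Note that this example follows the prose and signals only at the step where the short average actually crosses the long one, which is stricter than a plain comparison of the two averages; the helper names and the HOLD returned for short histories are assumptions.

```python
def sma(prices, window, end):
    """Simple moving average of the `window` prices ending at index `end`."""
    return sum(prices[end - window + 1:end + 1]) / window

def smac_signal(prices, short=5, long=50):
    """BUY/SELL only on the step where the short average crosses the long one."""
    t = len(prices) - 1
    if t < long:                       # need `long` prices at both t-1 and t
        return "HOLD"
    prev = sma(prices, short, t - 1) - sma(prices, long, t - 1)
    curr = sma(prices, short, t) - sma(prices, long, t)
    if prev <= 0 < curr:
        return "BUY"
    if prev >= 0 > curr:
        return "SELL"
    return "HOLD"

# Tiny windows are used here only to keep the example readable.
signal = smac_signal([3, 2, 1, 1, 4], short=2, long=3)
```

In the example the 2-period average moves from below the 3-period average (1.0 against about 1.33) to above it (2.5 against 2.0), so the rule signals BUY.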

    Moving Average Convergence Divergence (MACD)

    The MACD is the difference between a short term and long term price Exponential

    Moving Average (EMA) values. If the MACD crosses above its own EMA value,

    return a BUY indicator; SELL if it crosses below.

    $MovingAverageCD = ExponentialMovingAverage_{ShortTerm} - ExponentialMovingAverage_{LongTerm}$

    If $MovingAverageCD > ExponentialMovingAverage_{Current}$ Return BUY

    Else If $MovingAverageCD < ExponentialMovingAverage_{Current}$ Return SELL

    Else Return HOLD    [4.4]
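A possible Python sketch of the MACD rule follows. The spans of 12, 26 and 9 periods and the smoothing factor 2 / (span + 1) are common conventions assumed for the example, not values given in this report, and the rule is implemented as a comparison of the latest MACD value with its own EMA.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha, out = 2.0 / (span + 1), [values[0]]
    for v in values[1:]:
        out.append(out[-1] + alpha * (v - out[-1]))
    return out

def macd_signal(prices, short=12, long=26, signal=9):
    """Compare the latest MACD value with its own EMA (the signal line)."""
    macd = [s - l for s, l in zip(ema(prices, short), ema(prices, long))]
    signal_line = ema(macd, signal)
    if macd[-1] > signal_line[-1]:
        return "BUY"
    if macd[-1] < signal_line[-1]:
        return "SELL"
    return "HOLD"

rising = [float(i) for i in range(1, 61)]      # hypothetical steadily rising prices
up_signal = macd_signal(rising)
```

On a steadily rising series the short EMA tracks the price more closely than the long one, so the MACD keeps growing and stays above its own lagging average, giving BUY.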

    Relative Strength Index (RSI)

    The RSI compares the magnitude of a stock's recent gains to the magnitude of its

    recent losses and turns that information into a number that ranges from 0 to 100. It

    takes a single parameter, n, the number of time periods to use in the calculation


    $AverageGain = TotalGains / n$

    $AverageLoss = TotalLoss / n$

    $FirstRelativeStrength = AverageGain / AverageLoss$

    For Count = 2 to n

    $SmoothedRelativeStrength_n = \frac{[AverageGain_{count-1} \times (count-1) + Gain_{count}] / count}{[AverageLoss_{count-1} \times (count-1) + Loss_{count}] / count}$

    $RelativeStrengthIndex = 100 - \frac{100}{1 + RelativeStrength}$

    If $RelativeStrengthIndex > 70$ then BUY

    Else If $RelativeStrengthIndex < 30$ then SELL

    Else HOLD    [4.5]
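The sketch below computes a simplified RSI in Python. It uses only the plain average gain and loss over the last n price changes and omits the smoothing loop, and it assumes the thresholds are read as "RSI above 70" for BUY and "RSI below 30" for SELL. The handling of windows with no losses (100) and completely flat windows (50) is also an assumption.

```python
def rsi(prices, n):
    """RSI from the simple average gain and loss over the last n price changes."""
    if len(prices) < n + 1:
        raise ValueError("need at least n + 1 prices")
    changes = [b - a for a, b in zip(prices[-n - 1:-1], prices[-n:])]
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = sum(-c for c in changes if c < 0) / n
    if avg_loss == 0:                  # RS undefined: neutral if flat, else saturated
        return 50.0 if avg_gain == 0 else 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

def rsi_signal(prices, n, upper=70.0, lower=30.0):
    """Threshold rule as read from the description above (assumed directions)."""
    value = rsi(prices, n)
    if value > upper:
        return "BUY"
    if value < lower:
        return "SELL"
    return "HOLD"

balanced = rsi([1, 2, 1, 2, 1], 4)     # equal gains and losses
```

With equal average gain and loss the relative strength is 1 and the index sits at 50, the middle of its 0 to 100 range.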

    K-Stochastic

    A technical momentum indicator that compares a security's closing price to its price

    range over a given time period. The oscillator's sensitivity to market movements can be

    reduced by adjusting the time period or by taking a moving average of the result. It

    takes a single parameter, n, the number of time periods to use in the calculation

    $\%K = 100 \times [(Price_{Current} - LowestPrice_n) / (HighestPrice_n - LowestPrice_n)]$

    %D = 3-Period Moving Average of %K

    If %K > %D then BUY

    Else If %K < %D then SELL

    Else HOLD    [4.6]
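As an illustration, the stochastic rule can be sketched in Python as follows. The example works on a single price series, so the highest and lowest prices are taken from the same series rather than from separate high and low quotes; that simplification, the midpoint value for a flat window and the function names are assumptions.

```python
def percent_k(prices, n, end):
    """%K at index `end`, using the n prices ending there."""
    window = prices[end - n + 1:end + 1]
    low, high = min(window), max(window)
    if high == low:
        return 50.0                    # flat window: %K undefined, use midpoint
    return 100.0 * (prices[end] - low) / (high - low)

def stochastic_signal(prices, n):
    """Compare the latest %K with %D, its 3-period simple moving average."""
    t = len(prices) - 1
    if t < n + 1:                      # three full windows are needed
        return "HOLD"
    ks = [percent_k(prices, n, t - i) for i in range(3)]
    k, d = ks[0], sum(ks) / 3
    if k > d:
        return "BUY"
    if k < d:
        return "SELL"
    return "HOLD"

signal = stochastic_signal([1, 2, 3, 2, 1, 5], n=3)
```

In the example the last three %K values are 0, 0 and 100, so the latest %K (100) is above its 3-period average and the rule signals BUY.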

    1-Day Price Change

    The 1-Day Price Change Indicator gives a BUY signal if the price has risen from the

    previous day's value and SELL if it has dropped. This is a naive trading strategy that is

    being used to benchmark the performance of the GP-evolved trading rules.

    If $Price_{Current} > Price_{Current-1}$ then BUY

    Else If $Price_{Current} < Price_{Current-1}$ then SELL

    Else HOLD    [4.7]

    4.2.2 Fitness Measurement

    The fitness of individual technical trading rules is measured directly from the returns

    generated by simulated trading using those rules [Altenberg, 1993].


    A set of ratios for measuring the performance of a stock movement has been observed and

    analyzed [Lipinski, 2003]. These ratios, while not very useful on their own, do provide

    valuable insight into the dynamics of a stock price when used in conjunction with each other.

    They include:

    Sharpe Ratio

    $Sharpe\ Ratio = \frac{r_p - r_f}{\sigma_p}$    [4.8]

    where:

    $r_p$ is the expected portfolio return

    $r_f$ is the risk-free rate

    $\sigma_p$ is the portfolio standard deviation

    Source: [Sharpe, 1996]

    The Sharpe ratio measures risk adjusted performance. On an international front, current

    Sharpe ratios range from 1.7 to 2.5 with the average being 0.9, the ratio of choice by

    modern standards being above 1.0 [Domash, 2006].

    The larger the Sharpe ratio the better (the more consistent the results). The ratio will be

    negative if the average return is less than the risk-free return. Some systems exhibit a

    Sharpe ratio of 0.5 or more, and ratios above 1.0 are sometimes seen. For a long-term

    system, open profit should be included in each month's profit and loss data in order for

    the Sharpe ratio to be meaningful. If there are less than 12 months of data, we do not

    calculate the Sharpe ratio, because such a small number of data points might not be

    statistically significant and could give misleading results.

    The one-year (short-term) Sharpe ratio provides an indication of how well a system has

    performed in the most recent 12 months. The calculation uses the average monthly

    profit/loss in excess of the risk-free return for the most recent 12 months, divided by

    the standard deviation of monthly profits and losses over the same period.
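The one-year calculation described above might be sketched as follows. The function name, the use of the sample standard deviation and the choice to return `None` for fewer than 12 months are assumptions made for the example; the monthly returns are invented figures.

```python
import statistics

def one_year_sharpe(monthly_returns, monthly_risk_free=0.0):
    """Mean excess return of the last 12 months over the std of those returns."""
    if len(monthly_returns) < 12:
        return None                    # too few data points, as noted in the text
    recent = monthly_returns[-12:]
    excess = [r - monthly_risk_free for r in recent]
    # Sample standard deviation; raises ZeroDivisionError for constant returns.
    return statistics.mean(excess) / statistics.stdev(recent)

returns = [0.02, -0.01] * 6            # hypothetical monthly profit/loss figures
ratio = one_year_sharpe(returns)
below_risk_free = one_year_sharpe(returns, monthly_risk_free=0.01)
```

The second call illustrates the remark that the ratio turns negative when the average return (0.5% per month here) is below the risk-free return.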

    Sortino ratio

    $Sortino\ Ratio = \frac{\langle R \rangle - R_f}{\sigma_d}$    [4.9]

    where:

    $\langle R \rangle$ is the expected return

    $R_f$ is the risk-free rate of return

    $\sigma_d$ is the standard deviation of negative asset returns

    Source: [Sortino, 1994]

    The larger the Sortino ratio the better. The Sortino ratio will be larger if the profit is

    high, and if the disappointments are small. For a given average disappointment, the

    Sortino ratio would be better if there were many small disappointments, rather than a

    few large disappointments (see the examples below). The ratio will be negative if the


    average return is below the risk-free return. Some systems exhibit a Sortino ratio of 1

    or more, and ratios above 2 may be seen.

    If there are less than 24 months of data, we do not calculate the Sortino ratio, because

    we feel that there may not be enough "disappointment" data to be statistically

    meaningful.

    When "average" and "standard deviation" of the disappointments are mentioned, the

    calculations include the zero values. For example, for disappointments of 1.5, 1.5, 0, 0,

    0, 0 the average is 0.5 and the standard deviation is 0.8; for disappointments of 1, 1, 1,

    0, 0, 0 the average is again 0.5 but the standard deviation is 0.5, which is significantly

    smaller than in the first example.
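The two examples can be checked with a few lines of Python. The quoted values match the sample (n - 1) standard deviation rounded to one decimal; that this is the estimator intended is an inference from the numbers, not something the text states.

```python
import statistics

def summarize(disappointments):
    """Mean and sample standard deviation (rounded), zeros included."""
    return (statistics.mean(disappointments),
            round(statistics.stdev(disappointments), 1))

few_large = summarize([1.5, 1.5, 0, 0, 0, 0])    # (0.5, 0.8)
many_small = summarize([1, 1, 1, 0, 0, 0])       # (0.5, 0.5)
```

Both series have the same average disappointment, but the one with fewer, larger disappointments has the larger deviation and would therefore give the smaller Sortino ratio for the same excess return.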

    The one-year (short-term) Sortino ratio is calculated, to provide an indication of how

    well a system has performed in the most recent

    A seemingly obvious method to evaluate the fitness of a trading rule is to see whether it

    generates any profits. Overheads such as transaction costs have to be taken into account. The

    idea that the trading rule might perform better under all conditions and time periods except for

    the current one has to be taken into account as well. This would imply giving individual

    trading rules a second chance.

    Another means to judge fitness is to compare the results of a trade made by individual trading

    rules to the results of a trade made by the BUY and HOLD strategy.

    Many authors have disputed the effectiveness of this strategy: some of the literature considers

    it a poor choice for short term investments, although it provides steady performance for

    long term portfolios. For a comparative study, it does provide an indication [Koza et al, 1996].

    Initially both these methods will be used and after due experimentation, the decision to deploy

    one or both of them will be made.

    4.2.3 Description of GP Engine

    The GP algorithm is detailed in this section. The flow chart in figure 5 details the

    components of this algorithm. In the flowchart, the functionality of each module has

    been divided according to the level of the GP hierarchy with which it is concerned.

    Furthermore, before the actual algorithm is presented, the notation and terminology used

    within it are explained to facilitate understanding of the algorithm.


    Fig. 5: Flow Chart of GP Algorithm

    Figure 5 shows a detailed flowchart representation of the algorithm.

    Specification, Notation & Terminology

    Each function of the algorithm is represented by a letter (A, B, C, etc.) and a name
    detailing its functionality. Any function can be called from another function, and
    arguments are given in italics.


    An in-depth analysis of the algorithm parameters and their significance is provided in
    Appendix A.


    Notation                     Description

    Object_Offset/Object_Count   An integer that relates to an object such as a population,
                                 generation or expert. It identifies the count or the offset
                                 of the object being worked upon.

    Node_Library                 A text file containing nodes for the GP tree. Nodes are
                                 randomly selected from this library to create GP trees.

    T, Ttest/train               The variable T is an integer representation of the total time
                                 for a stock as the number of "ticks". When it is subscripted
                                 with either test or train, it specifies the tick at which to
                                 begin testing or training.

    NTRE/GEN                     The integer N defines the maximum number of trees or
                                 generations that can be created at any instant.

    Object[Object_Count]         A direct representation of the object at an instance of
                                 Object_Count.

    Ctest/train(time)            This variable defines the capital or the performance measure.
                                 There are separate capital values for the testing period and
                                 the training period, represented, for example, as
                                 Ctestnet(Ttest-1+Population_Offset).

    Operation_Limit              This integer defines the limit for any operation, starting
                                 at 0.

    Px                           This variable defines the percentage for any context X. X can
                                 be elitism, crossover, mutation, the percentage of a
                                 generation to be carried forward, or the percentage of the
                                 stock quotations to be used for testing.

    Rand                         Any random variable.

    Rdec                         This variable represents the (decision) boundary limit.

    B[time]                      The variable B identifies the stock. When subscripted with a
                                 time value, it is the value of the stock at that time.

    Alower/upper[time]           This variable is generated by the moving average parameter.
                                 At any given time, it gives the upper and lower limits of the
                                 moving average boundary.


    Algorithm

    Initialization:

    1. Initialize parameters
    2. Initialize Population_Offset = 0
    3. Retrieve Node_Library
    4. Set Population[Population_Offset] = Population Creation from Node_Library
    5. Train, Test and Evolve Population[Population_Offset]

    Train, Test and Evolve Population

    1. While Ttest + Population_Offset < T
        i. Train, Test and Evolve Generation
        ii. Increment Population_Offset by 1
        iii. Set Population[Population_Offset] = Population Creation from Population[Population_Offset-1]

    Train, Test and Evolve Generation

    1. While Generation_Count


        iv. Mutation_Limit = Pmut * NTRE / 100

    4. For Count = 0 to Elitism_Limit
        i. Generation[Expert_Count] = Previous_Generation[Count]
        ii. Increment Expert_Count

    5. For Count = 0 to Crossover_Count
        i. Initialize Parent_Expert_1 = Roulette Wheel Selection of Expert from Previous_Generation at Population_Offset
        ii. Initialize Parent_Expert_2 = Roulette Wheel Selection of Expert from Previous_Generation at Population_Offset
        iii. Generation[Expert_Count], Generation[Expert_Count+1] = Expert Creation From Crossover of Parent_Expert_1 and Parent_Expert_2
        iv. Increment Expert_Count by 2

    6. For Count = 0 to Mutation_Count
        i. Generation[Expert_Count] = Expert Creation From Mutation of Generation[Count]
        ii. Increment Expert_Count

    Roulette Wheel Selection of Expert from Generation at Population_Offset

    1. Initialize Performance_Sum = Sum of Ctestnet(Ttest-1+Population_Offset) of each Expert in Generation
    2. Initialize Performance_Ratio = 0
    3. Initialize Count = 0
    4. Select a random double value Rand between 0 and 1.

    5. WhilePerformance_Ratio
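    The roulette wheel steps listed above can be sketched in Python as follows. The body of step 5 is not legible in this copy, so the loop below follows the usual fitness-proportionate scheme: it accumulates each expert's share of Performance_Sum until the running ratio exceeds Rand. That loop body, the function name and the requirement of non-negative performances are assumptions rather than the report's exact procedure.

```python
import random


def roulette_wheel_select(performances, rng=random):
    """Return the index of one expert, chosen with probability
    proportional to its performance (assumed non-negative)."""
    performance_sum = sum(performances)
    if performance_sum <= 0:
        raise ValueError("total performance must be positive")

    rand = rng.random()          # random double in [0, 1)
    performance_ratio = 0.0      # running share of the total
    for count, performance in enumerate(performances):
        performance_ratio += performance / performance_sum
        if performance_ratio > rand:
            return count
    # Floating-point rounding can leave the ratio just below 1.0.
    return len(performances) - 1


# Hypothetical net-worth values of four experts at the end of a test tick.
net_worths = [100500.0, 99800.0, 101200.0, 100000.0]
parent_index = roulette_wheel_select(net_worths)
```

    Because net-worth values of different experts tend to lie close together, selection pressure under this scheme is mild; scaling or ranking the performances first would be one way to sharpen it, although the report does not say whether this is done.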


    Tree Creation From Node_Library:

    1. Initialize an empty Tree

    2. While Tree_Depth


    4.2.4 Conclusion

    The problem, to rapidly analyze stock market price data for a given stock and give a
    BUY, SELL or HOLD decision, has been detailed. The proposed solution is a genetic
    programming based algorithm which incorporates certain financial technical indicator
    functions. The technical indicator functions have also been detailed and explained along
    with their context. A flow chart describing the flow of the modules of the GP algorithm
    was presented to give a clearer view of its structure. Finally, the algorithm itself was
    presented, with a technical specification.

    At this stage, we are ready to run experiments by assigning parameter values
    and to draw conclusions from these experiments.

    5. Experimentation

    5.1 Experimental Aims and Objectives

    The primary objective of the experimental work was to demonstrate the effectiveness
    (or otherwise) of this system, and of the general concept of GP optimization of TI based
    trading rules, in making profitable forecasts of stock price movements.

    It is necessary to demonstrate that the GP algorithm is learning rules which have some
    predictive power beyond the training period, as opposed to just learning the behavior of
    the training data. The stock data used for the experiment has variable tick
    rates and is sufficiently unpredictable to serve these goals. By varying the
    parameters, a further objective is to establish whether any profit is achieved and to
    discern relationships between parameter values and performance.

    5.2 Trading Procedure

    A selection of the parameters of the GP algorithm is assigned a range of values.
    The GP algorithm is then applied to the experimental input data. The American
    trading strategy is used: at the beginning of the experiment on each set of data, the
    initial number of stocks in hand is zero, and towards the end of the data a BUY
    decision is forced.

    A trial is run with the first value of each parameter. Performance is measured
    and noted. Then one parameter is assigned the next value in its range and
    the process is repeated.

    The time taken for the whole process, as a ratio of the number of populations, is
    used as one measure of performance. This ratio has two advantages. First, it gives a
    reasonable estimate of how much time is needed to calculate a decision for one
    quotation. Second, the size of the data in each set of quotations is different, and the
    parameter for the percentage of data used in testing would otherwise bias the result;
    the ratio removes such concerns.

    The second measure of performance is the net profit at the end, that is, the
    difference between the net worth at the end and the initial capital.
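    The two performance measures described above reduce to simple ratios. The sketch below is illustrative only: the function names and the sample figures are hypothetical, and the profit is divided by the initial capital, as is done for the profit ratio reported in the results section.

```python
def time_per_population(total_seconds, population_count):
    """Average processing time spent on one population, in seconds."""
    return total_seconds / population_count


def profit_ratio(initial_capital, final_net_worth):
    """Net profit at the end of a run as a fraction of the initial capital."""
    return (final_net_worth - initial_capital) / initial_capital


# Hypothetical run: 406 populations processed in 2030 seconds,
# ending with a net worth of 101,500 from an initial capital of 100,000.
seconds_each = time_per_population(2030.0, 406)
ratio = profit_ratio(100_000.0, 101_500.0)
```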


    5.3 Experimental Input Data

    The experiments used stock market price data for simulated evaluation of trading

    strategies.

    Experiments were conducted using share price data for AXA, Peugeot S.A. and
    STMicroelectronics N.V., traded on the Paris Stock Exchange (Bourse de Paris). Data for
    all stocks covers the same 6 day period, from 29th May 2006 to 3rd June 2006. The price
    values were plotted from a spreadsheet and visually inspected for anomalous values,
    such as negative volume values at the start of the trading day, before being used as
    input for the GP system as CSV files. Note also that each graph contains periods
    represented by sloping lines. These lines are periods of inactivity in the stock
    market, between the close of the market for the day and its opening
    the next morning.

    Fig. 6: AXA Data

    Fig. 7: Peugeot Data


    Fig. 8: ST Microelectronics Data

    5.4 Parameters

    The parameters used in the GP algorithm are now explained, with the range
    of values for each and the reasons for selecting those values or value ranges.

    Initial Capital C0, Commission Pcom, Initial Number of Stocks St, Decision Boundary Rdec
    and Trading Strategy

    Initial Capital and Commission are fixed at 100,000 and 0.2% respectively.
    Since the American trading strategy is used, the Initial Number of Stocks is
    zero throughout.
    Decision Boundary is set at 0.2%, as a lower value would allow too many decisions to
    be taken, thus greatly increasing the total commission paid. A higher value would
    filter too much, allowing too few decisions to be made.

    Moving Average Range RMA

    The Moving Average Range affects how narrowly decisions are filtered according to
    price fluctuations. A low value allows decisions to be made on smaller
    shifts, whereas a high value is less sensitive. The values selected for this
    parameter are 10, 15 and 30.

    Buy Sell Percentage

    This value is fixed at 50 to make the effects of decisions more apparent.

    Number of Generations NGEN and Number of Trees in each Generation NTRE

    The number of generations and the number of trees affect both performance and the time
    taken. Low values take less processing time but sacrifice performance, and vice
    versa.
    Values for Number of Generations are 5, 10 and 20, and for Number of Trees in
    each Generation 100, 200 and 300.

    Maximum Tree Depth NDEP

    This parameter is fixed at 20, as too deep a tree would needlessly increase processing
    time and resource use.


    Percentage of Previous Population Carried Forward Pcarry

    This parameter is fixed at 50, as this provides an equal mix of expertise from the
    old population and new expertise from randomly created nodes.

    Elitism Pelite

    The value for elitism is fixed at 2, as too high a value would lead to premature
    convergence.

    Crossover Probability Pcross and Mutation Probability Pmut

    These parameters affect how expertise is exchanged and genetically modified between
    generations.
    Values are set between 80 and 90 for crossover and between 10 and 20 for mutation.

    Replacement, Addition and Deletion Probability in Mutation Prepmut, Paddmut, Pdelmut

    These are fixed at 33.33 each.

    Training Start Quotation Limit Ttrain

    The training start time is fixed at 30, as a lower value would limit the effectiveness of
    some of the technical indicators.

    Percentage of Quotations for Testing Ptest

    This parameter defines how much of the data is used for training and how
    much for testing.
    Its values are 60, 75 and 90.

    Refer to figure 9 for a summary of parameter values and ranges.

    C0        Pcom   St   Rdec
    100,000   0.2    0    0.2

    RMA   NGEN   NTRE   NDEP   Pcarry   Pelite   Pcross   Pmut   Prepmut   Paddmut   Pdelmut
    10    5      100    20     50       2        80       20     33.33     33.33     33.33
    15    10     200                             85       15
    30    20     300                             90       10

    Ttrain   Ptest
    30       60
             75
             90

    Fig. 9: Summary of Parameters
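    The trial procedure of section 5.2 can be read as varying one parameter at a time around a baseline taken from the first row of figure 9. The sketch below enumerates the trials under that reading. It is only one possible interpretation (a full grid over all ranges would be another), and pairing each crossover value with the mutation value from the same row of the table is likewise an assumption. All identifiers are illustrative.

```python
# First value of each varied parameter, as listed in figure 9.
BASELINE = {"RMA": 10, "NGEN": 5, "NTRE": 100, "Pcross": 80, "Pmut": 20, "Ptest": 60}

# Value ranges from figure 9; Pmut is paired with Pcross row by row.
RANGES = {
    "RMA": [10, 15, 30],
    "NGEN": [5, 10, 20],
    "NTRE": [100, 200, 300],
    "Pcross": [80, 85, 90],
    "Ptest": [60, 75, 90],
}


def one_at_a_time_trials(baseline, ranges):
    """Yield the baseline, then each setting that changes a single parameter."""
    yield dict(baseline)
    for name, values in ranges.items():
        for value in values:
            if value == baseline[name]:
                continue  # already covered by the baseline trial
            trial = dict(baseline)
            trial[name] = value
            if name == "Pcross":
                trial["Pmut"] = 100 - value  # 80/20, 85/15, 90/10
            yield trial


trials = list(one_at_a_time_trials(BASELINE, RANGES))
print(len(trials))  # 11
```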


    5.5 Results

    Representation of Results

    Performance is measured as the ratio of the profit gained to the initial capital invested.
    The results are shown on scatter charts with the parameter values on the X-axis and the
    profit ratio on the Y-axis. Charts of this type help in determining the
    tendencies of profit ratios with respect to a given parameter. They also help in
    identifying anomalies.

    Buy & Hold as a Performance Indicator

    The application also performs a Buy & Hold run before trials are run on each stock.
    Stocks are bought at the beginning of each business day. The number of stocks bought
    is determined by the Buy/Sell percentage, which is fixed during parameterization.
    The profit gained during each such run is also marked on the scatter charts.

    Variation of Profit

    Figure 10 shows the net profit as a function of the moving average range. Profit
    tends to be higher at a moving average range of 30. A slight discrepancy is noticed
    for the AXA stock, which shows higher profit at a moving average range of 10. This
    anomaly can be treated as a random occurrence and discounted, as it appears isolated.

    Fig 10. Scatter charts of net profit as a function of Moving Average Range

    Figure 11 shows the net profit as a function of the number of generations per
    population. Profit tends to be higher at the value of 10. A discrepancy is noticed
    for the AXA stock, which shows slightly higher profit at 20.

    [Figure 10 panels: AXA, Peugeot and STM Profit Ratio vs. RMA. X-axis: RMA; Y-axis: Profit Ratio; series: Profit Ratio and Buy & Hold Ratio.]

    Fig 11. Scatter charts of net profit as a function of Number of Generations

    Figure 12 shows the net profit as a function of the number of trees per generation.
    Profit tends to be higher at the value of 200.

    Fig 12. Scatter charts of net profit as a function of Number of Trees

    Figure 13 shows the net profit as a function of the crossover percentage; there is a
    high profit ratio trend at the 80 percent mark.

    Fig 13. Scatter charts of net profit as a function of Crossover percentage

    Figure 14 shows the net profit as a function of the testing percentage. Profit tends
    to be higher at the 75 percent mark. A marked discrepancy can be
    seen for STMicroelectronics, which shows higher profit values at 90.

    [Figure 11 panels: AXA, Peugeot and STM Profit Ratio vs. NGEN. X-axis: NGEN; Y-axis: Profit Ratio; series: Profit Ratio and Buy & Hold Ratio.]

    [Figure 12 panels: AXA, Peugeot and STM Profit Ratio vs. NTrees. X-axis: NTrees; Y-axis: Profit Ratio; series: Profit Ratio and Buy & Hold Ratio.]

    [Figure 13 panels: AXA, Peugeot and STM Profit Ratio vs. PCross. X-axis: PCross; Y-axis: Profit Ratio; series: Profit Ratio and Buy & Hold Ratio.]

    Fig 14. Scatter charts of net profit as a function of Testing percentage

    Measure of Profit and Time per population

    Figure 15 shows the net profit as a function of the time per population in seconds. As
    can be seen, higher profit values lie closer to the low end of the time range. This
    means that higher profit values are in fact more likely to be generated in shorter
    times, at or before the 10 second boundary.

    Fig 15. Scatter charts of net profit as a function of Time per population

    The evolutionary performance of the GP algorithm was reasonably sensitive to the
    control parameters: varying the crossover and mutation probabilities, the number of
    generations, etc. had a noticeable effect on the profit values attained.

    5.6 Discussion of Results

    If the anomalies in the above results are disregarded, the following parameters at the
    following settings should give very high, if not the highest, profit values at a time ratio
    of less than 10 seconds per population.

    [Figure 14 panels: AXA, Peugeot and STM Profit Ratio vs. Ptest. X-axis: Ptest; Y-axis: Profit Ratio; series: Profit Ratio and Buy & Hold Ratio.]

    [Figure 15 panels: AXA, Peugeot and STM Profit Ratio vs. Time (seconds) per population. X-axis: Time (seconds); Y-axis: Profit Ratio; series: Profit Ratio and Buy & Hold Ratio.]

    The highest profit for each data set is obtained with parameters which differ slightly
    from the optimal settings proposed above.

    For some data sets, certain parameters seem to follow a pattern with respect to
    profit. An example is AXA, where profit increases with increasing moving average range.
    The exact opposite is noted for STMicroelectronics, which shows increasing profit at
    decreasing moving average ranges. A few parameters seem to show no pattern at all, the
    number of generations for example.

    Figures 19, 20 and 21 show the stock data and the BUY/SELL decisions of the GP
    algorithm for AXA, Peugeot and STMicroelectronics respectively. The circles
    represent SELL decisions and the squares represent BUY decisions. The
    figures show subsets of the original stock data, to make them easier to present on paper.

    Fig 19. Graph output of AXA quotes with BUY/SELL decisions

    Fig 20. Graph output of Peugeot quotes with BUY/SELL decisions


    Fig 21. Graph output of STM quotes with BUY/SELL decisions

    6. Conclusion

    This report put forth the problem of analyzing financial time series data to suggest actions to
    be taken in quasi-real time. A solution was proposed, based on genetic programming. The idea
    was to create GP trees with financial technical indicators as branches and logical operators to
    join these branches.

    In order to fully appreciate the significance of this endeavor, current systems which employ
    similar techniques were studied. The greatest inspiration was the Internet Bourse Experts
    system, which employs a genetic algorithm. An in-depth analysis was also made of another GP
    based system called EDDIE. A development platform had to be chosen which would make the
    design of the software portion easier.

    The initial tasks included an intensive study of evolutionary computing and stock market
    trading methodologies. A tentative GP algorithm was devised. The hierarchical structure of the
    major objects (population, generation, expert and tree) was proposed. The functionality of
    each object was designed such that any property or function could be accessed at any point in
    the program. A representation for a tree structure was researched. Functionalities such as tree
    construction, parsing, removal and modification of nodes, and evaluation had to be
    incorporated in this representation. This called for a grammar for this kind of
    representation which emulated a typical GP tree structure.

    The technical indicators used in the project were selected for their evident benefits in
    previous work in this domain. The vast library of IBE's trading functions was an obvious source.

    All experiments were conducted on real stock price data. In all cases, as was demonstrated
    during the experimentation phase, the results were more profitable than those of the
    Buy-and-hold technique.

    The tick frequency of each data set was different. Although the time period for each was the

    same, the number of quotations was different. AXA contained 1013 quotations, Peugeot had

    828 quotations, while STMicroElectronics had 860 quotations. This factor was not taken into


    Appendix A

    GP Algorithm Parameters

    Trading-Specific Parameters

    Initial Capital

    This defines the initial working capital, of type double, before any trading decisions are
    made. It is represented as C0, and the amount of working capital at any subsequent time
    period t is represented as Ct.

    Commission

    This is the commission charged per transaction as a percentage of the number of stocks
    bought or sold. It is of type double and is represented as Pcom.

    Buy Sell Percentage

    This parameter defines the percentage of capital to use to buy stocks if the decision to
    buy is given, or the percentage of stocks in hand to sell if the decision to sell is given.
    Both are of type double. They are represented as Pbuy and Psell respectively.

    Initial Number of Stocks

    This parameter defines the initial number of stocks in hand at the start
    of the trading day. It is of type integer and is represented as S0. The number of stocks at
    any given time t is represented as St.

    Decision Boundary

    This parameter defines the minimum difference in stock prices which will allow a
    decision to take place. It is of type double and is represented by Rdec. At any point in
    time, the absolute difference between consecutive stock prices must be greater than or
    equal to Rdec.

    Moving Average Range

    This parameter defines the number of previous time periods used in calculating a
    moving average from stock prices. It is of type integer and is represented as RMA. The
    commission percentage Pcom of the moving average at a given time t is added and
    subtracted to create a boundary; if the current stock price falls within it, no decision
    is made. This allows decisions to be made according to fluctuations in price
    movement trends.

    Trading Strategy

    This parameter defines whether to use the American trading strategy. If the
    American trading strategy is used, the initial number of stocks is fixed at 0 and, at the
    end of the trading day, all stocks are sold.


    Importance of Trading-Specific Parameters

    Portfolio Management and Performance (Pt)

    The capital at time t, $C_t$, and the number of stocks in hand at time t, $S_t$, change
    continuously depending on whether the decision is BUY or SELL.

    A. If the decision is BUY, and if the working capital is more than zero, the
    following formulas are used:

        1. $S_{buy} = (P_{buy} \cdot C_{t-1}) / B_t$
        2. $S_t = S_{buy} + S_{t-1}$
        3. $C_t = C_{t-1} - (B_t \cdot S_{buy}) - (P_{com} \cdot S_{buy})$        [A.1]

    where $S_{buy}$ is the number of stocks to buy and $B_t$ is the stock value at time t.

    B. If the decision is SELL, and if the number of stocks is more than zero, the
    following formulas are used:

        1. $S_{sell} = (P_{sell} \cdot C_{t-1}) / B_t$
        2. $S_t = S_{t-1} - S_{sell}$
        3. $C_t = C_{t-1} + (B_t \cdot S_{sell}) - (P_{com} \cdot S_{sell})$        [A.2]

    where $S_{sell}$ is the number of stocks to sell.

    C. After either of these steps, the net worth, which is used as a performance
    measure, is calculated:

        1. $C_{net}(t) = C_t + (B_t \cdot S_t)$        [A.3]

    where $C_{net}(t)$ is the net worth/performance at time t.

    All variable portfolio indices (capital, number of stocks and net worth/performance)
    are kept separately for the training and testing periods. They are represented as
    $C^{test}_t$, $S^{test}_t$ and $C^{test}_{net}(t)$ for testing, and as
    $C^{train}_t$, $S^{train}_t$ and $C^{train}_{net}(t)$ for training.

    As can be seen from formulas A and B above, the commission, $P_{com}$, always figures
    into the calculation and is always deducted from the capital, regardless of the decision
    made. Decisions therefore have to be made carefully, as too many would
    deplete the capital too soon. The following section describes how such an event
    can be avoided.
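    The portfolio bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's implementation: the class and function names are hypothetical, the buy and sell percentages are passed as fractions (0.5 for 50 percent), stock counts are allowed to be fractional, and the commission is charged per stock traded as in the formulas. One further assumption: the quantity sold is taken as a share of the stocks in hand, following the description of the Buy Sell Percentage parameter, rather than being derived from the capital.

```python
from dataclasses import dataclass


@dataclass
class Portfolio:
    capital: float        # working capital C_t
    stocks: float = 0.0   # stocks in hand S_t

    def net_worth(self, price):
        """C_net(t) = C_t + B_t * S_t."""
        return self.capital + price * self.stocks


def buy(portfolio, price, p_buy, p_com):
    """Spend the fraction p_buy of the capital on stocks at the given price."""
    if portfolio.capital <= 0:
        return portfolio
    quantity = p_buy * portfolio.capital / price
    return Portfolio(
        capital=portfolio.capital - price * quantity - p_com * quantity,
        stocks=portfolio.stocks + quantity,
    )


def sell(portfolio, price, p_sell, p_com):
    """Sell the fraction p_sell of the stocks in hand (assumed reading)."""
    if portfolio.stocks <= 0:
        return portfolio
    quantity = p_sell * portfolio.stocks
    return Portfolio(
        capital=portfolio.capital + price * quantity - p_com * quantity,
        stocks=portfolio.stocks - quantity,
    )


start = Portfolio(capital=100_000.0)
after_buy = buy(start, price=50.0, p_buy=0.5, p_com=0.2)
after_sell = sell(after_buy, price=50.0, p_sell=0.5, p_com=0.2)
```

    With an unchanged price of 50, the net worth in this example drops from 100,000 to 99,800 after the purchase and to 99,700 after the sale, which illustrates how every decision costs commission even when the price does not move.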

    Filtering measures

    Two filtering measures are used: the moving average and the decision boundary.

    For the moving average:

    1. Calculate the moving average from time t to t-RMA as follows:

        I. Set $A_t := 0$
        II. For Index = 0 to $R_{MA}$: $A_t = (A_t \cdot Index + B_{t-Index}) / (Index + 1)$        [A.4]

       where $A_t$ is the moving average at time t and Index is a counter.

    2. Calculate an upper limit and a lower limit of the
    moving average by adding and subtracting the
    commission, $P_{com}$, as a percentage of the moving
    average, as follows:

        I. $A^{upper}_t = A_t + (P_{com} \cdot A_t)/100$
        II. $A^{lower}_t = A_t - (P_{com} \cdot A_t)/100$        [A.5]

       where $A^{upper}_t$ is the upper limit and $A^{lower}_t$ is the lower limit.

    If a stock value at time t does not lie between $A^{lower}_t$ and $A^{upper}_t$, and the
    absolute difference between the current and previous value is greater than or equal to the
    decision boundary, $R_{dec}$, a request for a BUY or SELL decision is made:

        IF ($R_{dec} \le |B_t - B_{t-1}|$) AND NOT ($A^{lower}_t \le B_t \le A^{upper}_t$)
        REQUEST DECISION.
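    The two filtering measures can be combined into a short Python sketch. It follows the running-average recurrence and the band limits given above; the function names are illustrative, prices are held in a plain list indexed by tick, and the caller is assumed to pass a tick t that is at least RMA so that enough history exists.

```python
def moving_average(prices, t, r_ma):
    """Running average of prices[t], prices[t-1], ..., prices[t-r_ma]."""
    average = 0.0
    for index in range(r_ma + 1):
        average = (average * index + prices[t - index]) / (index + 1)
    return average


def decision_requested(prices, t, r_ma, p_com, r_dec):
    """True when the price moved by at least r_dec since the previous tick
    and lies outside the commission band around the moving average."""
    average = moving_average(prices, t, r_ma)
    upper = average + (p_com * average) / 100
    lower = average - (p_com * average) / 100
    moved_enough = abs(prices[t] - prices[t - 1]) >= r_dec
    inside_band = lower <= prices[t] <= upper
    return moved_enough and not inside_band


quotes = [10.0, 10.0, 10.0, 10.5]
jump = decision_requested(quotes, t=3, r_ma=3, p_com=0.2, r_dec=0.2)
flat = decision_requested([10.0] * 4, t=3, r_ma=3, p_com=0.2, r_dec=0.2)
```

    In this example the jump to 10.5 clears both filters, while the flat series does not, so only the first case would reach the expert for a BUY or SELL decision.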

    Genetic Programming Specific Parameters

    Number of Generations

    This defines the maximum number of generations in each population. It is of type
    integer. It is represented as NGEN.

    Number of Trees in each Generation

    This defines the maximum number of trees in each generation. It is of type integer. It
    is represented as NTRE.

    Maximum Tree Depth

    This defines the maximum depth of a tree. It is of type integer. It is represented as
    NDEP.

    Percentage of Previous Population Carried Forward

    This defines the percentage of the top members of the previous population which will
    be used to create the new population. It is of type integer. It is represented as Pcarry.
    The very first population consists of trees which have been generated from randomly
    selected leaf nodes from a library. This kind of randomness is sufficient for an initial
    population, but for subsequent populations the expertise of a previous population is
    necessary, as it may provide a reasonable solution for the forthcoming sample space.

    Elitism

    This defines the percentage of the elitist trees, based on performance, which will be
    carried forward into the next generation unchanged. It is of type integer. It is
    represented as Pelite.

    Crossover Probability

    This defines the percentage of the number of trees from a previous generation which
    are used to create new trees using the crossover process described earlier. It is of
    type integer. It is represented as Pcross.


    Mutation Probability

    This defines the percentage of the number of trees from the previous generation which
    are used to create new trees using the mutation process described earlier. It is of
    type integer. It is represented as Pmut.
    Mutation consists of replacement, addition or deletion operations, and the probability
    of each operation occurring is defined as Prepmut, Paddmut and Pdelmut respectively.
    They are of type double.

    Importance of Genetic Programming Specific Parameters

    Hierarchical Structure

    The Tree is the base object. The conceptual structure of the tree has been detailed
    earlier. The technical indicators and logical operators are stored as strings, and to
    evaluate the tree, the string is parsed. When a tree operates on stock market data, it
    gives a BUY, SELL or HOLD decision.
    Trees can be created from random nodes, from crossover operations or from
    mutation operations.

    Each Expert contains 2 trees, a BUY tree and a SELL tree. The results of both trees
    undergo an XOR operation to return a single result. Each expert maintains a record of
    parameters, performance, capital and number of stocks. In later references, an expert
    refers to the pair of BUY and SELL trees.
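    One way to read the XOR combination of the two trees is sketched below. The report does not spell out how the combined result maps to a decision, so the interpretation here is an assumption: each tree yields a boolean signal, the expert acts only when exactly one tree fires, and it holds when both or neither fire. The function name is illustrative.

```python
def expert_decision(buy_signal, sell_signal):
    """Combine the BUY-tree and SELL-tree outputs into one decision."""
    if buy_signal ^ sell_signal:          # exactly one tree fired
        return "BUY" if buy_signal else "SELL"
    return "HOLD"                         # conflicting or absent signals


decisions = [
    expert_decision(True, False),
    expert_decision(False, True),
    expert_decision(True, True),
    expert_decision(False, False),
]
print(decisions)  # ['BUY', 'SELL', 'HOLD', 'HOLD']
```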

    Each Generation contains NTRE experts. The first generation, GEN[0],
    contains experts created from random nodes. Subsequent generations, GEN[1] to
    GEN[NGEN], contain high ranking experts from the previous generation and new experts
    created by genetic operations, namely mutation and crossover.

    Each Population contains NGEN generations. The first generation in the first
    population is randomly created (as detailed above, subsequent generations in the same
    population are created through elitism and genetic operations), but the first generation
    in forthcoming populations will consist of Pcarry percent of the elitist members of the
    fittest generation from the previous population.

    Figure A.1 exhibits the hierarchical structure of the objects described above.


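    Fig. A.1: Hierarchical Structure of GP Objects. [Diagram: a Population (POP0) contains Generations GEN0 .. GEN_NGEN; each Generation contains Experts EXP0 .. EXP_NTRE; each Expert contains a Buy Tree and a Sell Tree.]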

    Training and Testing Specific Parameters

    Training Start Quotation Limit

    This defines the point in time in the stock market sample at which training begins.
    It is an integer and must be at least 30, because some of the technical indicators
    used in the application read quotations going back several points in time. It is
    represented as Ttrain.

    Percentage of Quotations for Testing

    This defines the percentage of the stock market sample to use for testing. It is
    an integer and is represented as Ptest.

    Subsequently,

        Ttest = (Ptest * T) / 100        [A.6]

    where T represents the total timeline of the sample space and Ttest is the
    time at which testing will begin.

    Importance of Training and Testing Specific Parameters

    The sample space, in this case the stock market data, is divided into TRAINING and
    TESTING periods, which move forward by a single time unit with each new population.
    The first training period runs from Ttrain to Ttest-1, and the first testing period
    is at Ttest.

    This means that each expert of the first generation of the first population will be

    applied to the stock quotations during this training period. The performance of each

    expert will be calculated and the fittest will be used to create the second generation.

    This process repeats until NGEN generations have been created. The last
    generation will be the fittest according to the fitness measure. The fittest tree
    will be applied to the first testing period, Ttest.

    At this point, a new population is created. The training and testing periods are
    offset by 1: the training period now runs from Ttrain+1 to Ttest-1+1 and the
    testing period is Ttest+1. Instead of creating the first generation of this new
    population entirely at random, Pcarry percent of the elite experts from the last
    generation of the previous population are carried over unchanged into the new
    population, and the remainder are randomly generated.
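The carry-over step described above could look roughly like the Python sketch below. It assumes that Pcarry is applied to the generation size NTRE and that the previous generation is already sorted best-first; the report leaves both points open. `make_random_expert` is a hypothetical factory for a randomly generated expert.

```python
def seed_first_generation(previous_ranked, p_carry, n_experts, make_random_expert):
    """Seed the first generation of a new population.

    previous_ranked: experts of the previous population's last generation,
    sorted best-first. p_carry: integer percentage (Pcarry).
    """
    n_carry = min(len(previous_ranked), (n_experts * p_carry) // 100)
    carried = list(previous_ranked[:n_carry])  # elite experts, unchanged
    fresh = [make_random_expert() for _ in range(n_experts - n_carry)]
    return carried + fresh
```

Carrying elite experts forward lets each new population start from strategies that already performed well on an almost identical training window.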

    This process repeats until the last point in the sample space has been tested,
    that is, until

        Ttest + offset == T        [A.7]

    where offset is an integer that is initialized to 0 and is incremented
    by 1 each time a new population is to be created.
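The sliding of the training and testing periods can be sketched as a small Python generator. The function name and the example values (Ttrain = 30, Ttest = 98, T = 100) are hypothetical, and the loop is assumed to include the final test point T itself.

```python
def walk_forward_windows(t_train, t_test, total):
    """Yield (train_start, train_end, test_point) for each population.

    offset starts at 0 and grows by 1 per population; iteration stops
    once the test point Ttest + offset has reached T.
    """
    offset = 0
    while t_test + offset <= total:
        yield (t_train + offset, t_test - 1 + offset, t_test + offset)
        offset += 1


for window in walk_forward_windows(30, 98, 100):
    print(window)
```

With these values the generator yields three windows, one per population, each shifted forward by a single quotation.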

