
Using Data Sets to Simulate Evolution within Complex Environments (Poster)


DESCRIPTION

An agent-based model of evolution and adaptation is presented, with complex genomes and both sexual and asexual reproduction. Unlike other models, it has a purposefully complex environment. Individuals are spread over a 2D space upon which is projected a complex data set derived from observations of the real world. The energy an individual extracts for life and reproduction depends on how well its genome matches the data in its locality. In this model different genomes develop to exploit different parts of the space, with some genomes acquiring greater generality than others. The model can be used to explore the trade-offs between diversity, specialization and inter-breeding among individuals.


Data points are spread over the space according to two of their component values

Using real data sets to simulate evolution within complex environments

Bruce Edmonds, Centre for Policy Modelling, Manchester Metropolitan University

Target Question: Does the complexity of the environment significantly affect evolutionary processes?

(Here “complexity” means that there are exploitable patterns in the environment, but these are difficult to discover.) Adding randomness to the environment (or to the fitness function) is not satisfactory, and the NK model of fitness adjusts the difficulty in a uniform manner (second-order uniformity).

An example simulation run using the HD data set:

[Figure panels: Start of Simulation (HD Data); Simulation – 25 ticks; Simulation – 50 ticks; Simulation – 300 ticks (100 w/o variation)]

Data points from the set are distributed over the space dependent on 2 variables: chol (x) & thalach (y)

Individuals each have a gene which is an arithmetic expression, e.g.: ca, 1, sex, slope, restecg+fbs+1, fbs/oldpeak
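The example genes shown for this run (such as ca, 1, restecg+fbs+1 and fbs/oldpeak) can be read as small expression trees. The following is a minimal sketch of one way to store and evaluate such genes; the nested-tuple representation, the `evaluate` function, the protected division and the sample record are all assumptions made for illustration, not details taken from the poster.

```python
import operator

# Hypothetical representation: a gene is a nested tuple (op, left, right),
# a variable name (str), or a numeric constant.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def safe_div(a, b):
    """Protected division: returns 0.0 when the divisor is zero (an assumption)."""
    return a / b if b != 0 else 0.0


OPS["/"] = safe_div


def evaluate(gene, record):
    """Evaluate a gene against one data record (a dict of variable values)."""
    if isinstance(gene, tuple):
        op, left, right = gene
        return OPS[op](evaluate(left, record), evaluate(right, record))
    if isinstance(gene, str):
        return record[gene]
    return float(gene)


# Some of the example genes from the poster, written in this notation.
genes = {
    "ca": "ca",
    "1": 1,
    "restecg+fbs+1": ("+", ("+", "restecg", "fbs"), 1),
    "fbs/oldpeak": ("/", "fbs", "oldpeak"),
}

# Made-up record for illustration only (not taken from the real data set).
record = {"ca": 2.0, "restecg": 1.0, "fbs": 1.0, "oldpeak": 0.0}
predictions = {name: evaluate(gene, record) for name, gene in genes.items()}
```

Each value in `predictions` is the gene's guess at the outcome variable for that record; how such guesses are scored against the real outcome is a separate modelling choice.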

Illustrative Results: Using the Reduced Cleveland Heart Disease Data Set; 20 runs with each setting; 1000 individuals; 1000 iterations; locality parameter 0.1 (radius). Comparison of Original vs Ersatz data sets (same distribution in each dimension but random). Fixed normal noise (0, 0.1) added to both data sets.
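One plausible way to build an ersatz data set with the same distribution in each dimension is to shuffle every column independently, which keeps each variable's values but breaks the relationships between variables. The sketch below assumes that construction, and also assumes that 0.1 is the standard deviation of the added noise; neither detail is confirmed by the poster, and the function names and toy rows are hypothetical.

```python
import random


def make_ersatz(rows, rng):
    """Shuffle each column independently.

    Keeps the per-variable distribution, destroys cross-variable patterns.
    """
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def add_noise(rows, rng, sd=0.1):
    """Add normal noise with mean 0 and standard deviation sd (assumed) to every value."""
    return [[value + rng.gauss(0.0, sd) for value in row] for row in rows]


rng = random.Random(0)

# Toy rows for illustration only (not real heart disease records).
original = [
    [1.0, 10.0, 0.0],
    [2.0, 20.0, 1.0],
    [3.0, 30.0, 0.0],
    [4.0, 40.0, 1.0],
]
ersatz = make_ersatz(original, rng)
noisy_original = add_noise(original, rng)
noisy_ersatz = add_noise(ersatz, rng)
```

Running the same simulation on `noisy_original` and `noisy_ersatz` would then compare an environment with discoverable patterns against one that has the same marginal statistics but no such patterns.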

[Figure panels: Fitness and Gene Complexity/Depth, for the Original and Ersatz data sets]

• Evolutionary data-mining is where ideas from biological evolution are applied to data-mining – finding patterns in data
• Data sets exist for the purpose of testing different ML algorithms that have patterns in them, albeit difficult to discover
• Reversing this... I am suggesting the use of complex data sets as a test bed to investigate how the complexity of the environment might affect evolution

Main Ideas:

The Data Set Environment:
o Find a rich data set (preferably one derived from a naturally complex system) with many independent variables
o The gene of an individual is an arbitrary arithmetic expression stored as a tree (or similar technique)
o Resource in the model is distributed to the individuals that predict the outcome variable of local data better than their competitors
o The genes are mutated and crossed as the simulation progresses
o Individuals are selected for/against depending on their total success in predicting
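Mutating and crossing genes that are stored as trees can be sketched as subtree replacement, in the style of standard genetic programming. The poster does not specify its operators, so everything below (the tuple representation, the `mutate` and `crossover` functions, the terminal set and the 50/50 choices) is an assumption for illustration.

```python
import random

# Hypothetical terminal and operator sets, using variable names from the poster.
VARIABLES = ["ca", "sex", "slope", "restecg", "fbs", "oldpeak"]
OPERATORS = ["+", "-", "*", "/"]


def random_leaf(rng):
    """Return a random terminal: a variable name or a small constant."""
    if rng.random() < 0.5:
        return rng.choice(VARIABLES)
    return rng.choice([0, 1, 2])


def subtrees(gene, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, gene
    if isinstance(gene, tuple):
        for index in (1, 2):
            yield from subtrees(gene[index], path + (index,))


def replace_at(gene, path, new):
    """Return a copy of gene with the node at path replaced by new."""
    if not path:
        return new
    parts = list(gene)
    parts[path[0]] = replace_at(gene[path[0]], path[1:], new)
    return tuple(parts)


def mutate(gene, rng):
    """Replace one randomly chosen node with a fresh leaf or a one-operator subtree."""
    path, _ = rng.choice(list(subtrees(gene)))
    if rng.random() < 0.5:
        new = random_leaf(rng)
    else:
        new = (rng.choice(OPERATORS), random_leaf(rng), random_leaf(rng))
    return replace_at(gene, path, new)


def crossover(mother, father, rng):
    """Replace a random node of mother with a random subtree of father."""
    path, _ = rng.choice(list(subtrees(mother)))
    _, donor = rng.choice(list(subtrees(father)))
    return replace_at(mother, path, donor)


def depth(gene):
    """Depth of a gene tree; a single leaf has depth 0."""
    if isinstance(gene, tuple):
        return 1 + max(depth(gene[1]), depth(gene[2]))
    return 0


rng = random.Random(0)
parent_a = ("+", ("+", "restecg", "fbs"), 1)
parent_b = ("/", "fbs", "oldpeak")
mutant = mutate(parent_a, rng)
child = crossover(parent_a, parent_b, rng)
```

Because genes are immutable tuples here, both operators return new trees and leave the parents untouched, and `depth` gives one simple measure of gene complexity.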

Questions that might be explored using this approach:

When might the complexity of the environment affect evolutionary processes?
How might the complexity of the environment affect evolutionary processes?
Will models with a simple environment tell us about evolution in the wild?

• When, and about what aspects, will models with simple environments be sufficient?
• In what ways might evolution differ when in complex environments?

What kind of complexity might we need?
How might one measure this complexity in the wild (if this is even possible)?

(1) Evaluation Phase of Model:

Individuals each have genes composed of an arithmetic expression to predict HD based on the other 13 variables.

For each data point (or a random subset of them), evaluate (a random selection of) nearby individuals to determine the share of fitness each receives (depending on predictive success).

The sum of fitness determines which individuals breed and which die.

[Diagram: Data Space, with example values 3.7, 1.1, 0.8]

(2) Selection Phase of Model:

N times:
1. probabilistically select winner(s) on fitness
2. probabilistically select a loser on lack of fitness
3. kill loser
Either
4. propagate winner locally with possible mutation
or
5. mate with another local based on fitness

[Diagram: Data Space, with example values 23.7, 17.6, 8.6, 12.3, 9.0, 15.5, 3.2, 12.5, 8.1]
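The evaluation and selection phases can be sketched together as follows. This is a simplified illustration under several assumptions that are not in the poster: the gene is a single constant prediction rather than an expression tree, each data point shares out one unit of fitness with weights 1/(1 + absolute error), a newborn inherits the winner's fitness until the next evaluation, and the names `Individual`, `evaluation_phase`, `selection_phase` and `p_mate` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(eq=False)  # compare individuals by identity, not by field values
class Individual:
    x: float
    y: float
    gene: float  # stand-in gene: a constant prediction (assumption)
    fitness: float = 0.0


def predict(gene, features):
    """Stand-in for evaluating an arithmetic-expression gene on a record."""
    return gene


def evaluation_phase(population, data, radius):
    """Share one unit of fitness per data point among nearby individuals."""
    for ind in population:
        ind.fitness = 0.0
    for px, py, features, target in data:
        near = [i for i in population if math.hypot(i.x - px, i.y - py) <= radius]
        if not near:
            continue
        # Assumed sharing rule: weight falls as absolute prediction error grows.
        weights = [1.0 / (1.0 + abs(predict(i.gene, features) - target)) for i in near]
        total = sum(weights)
        for ind, weight in zip(near, weights):
            ind.fitness += weight / total


def selection_phase(population, n_times, radius, rng, p_mate=0.5):
    """Repeat n_times: pick a winner and a loser, kill the loser, add an offspring."""
    eps = 1e-9  # keeps every selection weight strictly positive
    for _ in range(n_times):
        fits = [ind.fitness for ind in population]
        best = max(fits)
        # Steps 1 and 2: winner chosen on fitness, loser on lack of fitness.
        winner = rng.choices(population, weights=[f + eps for f in fits])[0]
        loser = rng.choices(population, weights=[best - f + eps for f in fits])[0]
        if loser is winner:
            continue
        population.remove(loser)  # step 3: kill loser
        neighbours = [
            i for i in population
            if i is not winner and math.hypot(i.x - winner.x, i.y - winner.y) <= radius
        ]
        if neighbours and rng.random() < p_mate:
            # Step 5: mate with a local partner chosen on fitness.
            mate = rng.choices(neighbours, weights=[i.fitness + eps for i in neighbours])[0]
            child_gene = (winner.gene + mate.gene) / 2.0  # stand-in crossover
        else:
            # Step 4: propagate the winner locally, with possible mutation.
            child_gene = winner.gene
            if rng.random() < 0.5:
                child_gene += rng.gauss(0.0, 0.1)  # stand-in mutation
        population.append(Individual(
            x=winner.x + rng.uniform(-radius, radius),
            y=winner.y + rng.uniform(-radius, radius),
            gene=child_gene,
            fitness=winner.fitness,  # provisional until the next evaluation (assumption)
        ))


rng = random.Random(42)
population = [Individual(rng.random(), rng.random(), rng.random()) for _ in range(20)]
# Toy data points (x, y, features, target) for illustration only.
data = [(rng.random(), rng.random(), {}, rng.choice([0.0, 1.0])) for _ in range(30)]
evaluation_phase(population, data, radius=0.1)
selection_phase(population, n_times=10, radius=0.1, rng=rng)
```

Each kill is paired with exactly one birth, so the population size stays constant, and because both parents and offspring are local, different regions of the data space can come to hold different genes.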