Mixed Order Hyper-Networks are a type of neural network with connections among groups of neurons rather than the traditional neuron pairs. This show describes their structure and some algorithms for using them to learn functions, model distributions and solve optimisation problems.
Complexity and Order
Kevin Swingler
What is Complexity?
[Figure: networks with interactions of order 1, 2 and 3 among a set of nodes]
Total Possible Interactions
• Where n is the number of nodes:
– Possible 1st order interactions = $n$
– Possible 2nd order interactions = $n(n-1)/2$
– Possible order $k$ interactions = $\binom{n}{k}$
• Summed over all orders: $\sum_{k=0}^{n} \binom{n}{k} = 2^n$
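As a quick sanity check, the counts above can be computed directly; this is a minimal Python sketch (not from the slides) using the standard binomial coefficient:

```python
import math

def interaction_counts(n):
    """Number of possible interactions of each order k among n nodes."""
    return {k: math.comb(n, k) for k in range(n + 1)}

counts = interaction_counts(4)
assert counts[1] == 4                    # n first order interactions
assert counts[2] == 6                    # n(n-1)/2 second order interactions
assert sum(counts.values()) == 2 ** 4    # all orders together: 2^n
```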
Measuring Complexity
• Enumerate the possible interactions $w_0 \dots w_{2^n - 1}$
• Count the number that are used
• Then a measure of complexity of a system might be the number of interactions used, possibly divided by $2^n$
• We might also want to consider higher order interactions as being more complex than lower ones
– For example, accounting for the number of samples needed to model them …
Function Modelling
Now we define a system more specifically: a function $f(c)$ over states $c \in \{-1,1\}^n$.
We would like to express any such function so that:
• The interactions are explicit
• It reproduces the function perfectly
• Local maxima are attractor points
• If the function is a PMF, we can sample from it and calculate any probability we like
Why?
• We can specifically manage and understand the complexity of the model
• We can find optimal points in input space that maximise the output
• We can sample from the function (which in general is difficult)
How?
• I am using Mixed Order Hyper-Networks (MOHNs)
• A type of neural network
Neural Networks
• Generally:
– A set of processing units, u, with roles of either:
• Receiving input
• Making calculations
• Providing output
– Defined by a set of weighted connections between pairs of units
– Each unit makes the same calculation:

$u_i = f\left(\sum_j w_{ij} u_j\right)$
MOHNs
• Units do not have roles – no input, output etc.
• Connections are not just between pairs of units, but also to single units and amongst subsets of any size
• Defined by a set of parameters $w_0 \dots w_{2^n - 1}$
• The unit function is a threshold: output 1 if the input is > 0, else -1
• Takes values over $c \in \{-1,1\}^n$
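To make the definition concrete, here is a minimal sketch of how a MOHN's parameters might be held in Python; the variable names and the sparse-dictionary choice are mine, not from the slides:

```python
import random

n = 4
# Indices run 0 .. 2^n - 1; index 0 is the constant term, and only the
# interactions the model actually uses need an entry (hypothetical values).
weights = {0: 0.2, 3: -0.5, 13: 1.0}

def threshold(x):
    """The unit function from the slide: 1 if the input exceeds 0, else -1."""
    return 1 if x > 0 else -1

# A network state is a point c in {-1, 1}^n
c = [random.choice([-1, 1]) for _ in range(n)]
```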
Can I See a Picture of One?
[Diagram: a four-unit MOHN over u1 .. u4, with weights W0, W1, W2, W4, W6, W7, W8, W9 and W15 attached to subsets of the units]
In What Follows …
A weight's index, written in binary (u4 leftmost, u1 rightmost), records which units it connects:
• w8 = 1000 connects u4
• w2 = 0010 connects u2
• w13 = 1101 connects u1, u3 and u4
The same notation names the subsets themselves, e.g. Q8 = 1000 and Q6 = 0110.
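In code, the subset Qi can be read straight off the binary expansion of the weight's index; in this sketch, bit j of i (counting from the least significant bit) selects unit j+1, which matches the slides' examples:

```python
def Q(i, n):
    """Subset of unit numbers connected by weight w_i."""
    return {j + 1 for j in range(n) if (i >> j) & 1}

assert Q(8, 4) == {4}           # w8  = 1000
assert Q(2, 4) == {2}           # w2  = 0010
assert Q(13, 4) == {1, 3, 4}    # w13 = 1101
assert Q(6, 4) == {2, 3}        # Q6  = 0110
```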
How? (1) Learning a Function
Qi is the subset of neurons connected to weight i.
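The extracted slide omits the update rule itself, so the following is only one plausible sketch: a plain delta rule fitted over a chosen set of interactions. The helper prod_Q and all names are mine, not the author's algorithm:

```python
def prod_Q(i, c):
    """Product of the unit values selected by weight index i (bit j <-> c[j])."""
    p = 1
    for j in range(len(c)):
        if (i >> j) & 1:
            p *= c[j]
    return p

def learn_function(samples, structure, rate=0.05, epochs=200):
    """Fit weights to (c, y) pairs; `structure` lists the interaction
    indices to use. The slides do not give their exact rule, so this
    delta-rule loop is an assumption."""
    w = {i: 0.0 for i in structure}
    for _ in range(epochs):
        for c, y in samples:
            err = y - sum(wi * prod_Q(i, c) for i, wi in w.items())
            for i in w:
                w[i] += rate * err * prod_Q(i, c)
    return w
```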
How? (2) Learning a PMF
Qi is the subset of neurons connected to weight i.
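Again the rule itself is missing from the extracted text. One standard estimator, consistent with treating the weights as Walsh coefficients of the PMF, is a scaled sample average of each Qi product; this sketch (reusing prod_Q from above) is an assumption:

```python
def learn_pmf(samples, structure):
    """Estimate each w_i from samples drawn from the target distribution.
    E[product of the Q_i values] / 2^n is the Walsh coefficient of the
    PMF, which makes f(c) sum to 1 over {-1,1}^n."""
    n = len(samples[0])
    w = {}
    for i in structure:
        avg = sum(prod_Q(i, c) for c in samples) / len(samples)
        w[i] = avg / 2 ** n
    return w
```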
Calculating Function Output
Once learning is complete, we calculate the function output by:
$f(c) = \sum_i w_i \prod_{j \in Q_i} u_j$
[Diagram: the four-unit MOHN with weights W13 and W3]

f(-1, 1, 1, -1) = u1u3u4W13 + u1u2W3 = W13 – W3

(since u1u3u4 = (-1)(1)(-1) = 1 and u1u2 = (-1)(1) = -1)
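The output calculation is then a few lines; this sketch reuses prod_Q from the learning sketch and checks the slide's worked example (the W13 and W3 values are hypothetical):

```python
def f(c, weights):
    """MOHN output: the sum of each stored weight times the product of
    the unit values it connects."""
    return sum(wi * prod_Q(i, c) for i, wi in weights.items())

W13, W3 = 1.5, 0.5   # hypothetical values
assert f((-1, 1, 1, -1), {13: W13, 3: W3}) == W13 - W3   # slide example
```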
Calculating Output Averages
• We may want to ask questions such as “If input 3 is set to one, what is the average output?”
• Or, more generally, we want to calculate $f(h)$ for $h \in \{-1, 1, *\}^n$
• For example: $f(1, 1, *, *, 1)$
• * is, of course, a wild card
Calculating Output Averages
• To calculate the average output, we sum the weights, as before, but with one change:
– Not all of the weights are used – just those permitted by the *s in h:
h = 1**0 produces Ψ = {0000, 0001, 1000, 1001} = W0, W1, W8, W9
Schemata Averages
$f(h) = \sum_{i \in \Psi} w_i \prod_{j \in Q_i} h_j$
Examples:
f(***) = W0
f(*0*) = W0 + W2
f(01*) = W0 – W2 + W4 – W6
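A sketch of the schema average, with the set Ψ computed from the bitmasks. The string convention (leftmost character is the highest unit) and the sign convention ('1' → +1, '0' → -1) are my reading of the slides; the assertion reproduces the 1**0 example above:

```python
def psi(h, n):
    """Weight indices whose connections avoid every '*' position of h."""
    fixed = {j for j, ch in enumerate(h[::-1]) if ch != '*'}
    return {i for i in range(2 ** n)
            if all(not (i >> j) & 1 or j in fixed for j in range(n))}

def schema_average(h, weights):
    """Average output over all completions of the wildcards in h."""
    bits = h[::-1]                          # bits[j] describes unit j+1
    total = 0.0
    for i, wi in weights.items():
        term = wi
        for j in range(len(bits)):
            if (i >> j) & 1:
                if bits[j] == '*':          # averages to zero over {-1, 1}
                    term = 0.0
                    break
                term *= 1 if bits[j] == '1' else -1
        total += term
    return total

assert psi('1**0', 4) == {0b0000, 0b0001, 0b1000, 0b1001}   # W0, W1, W8, W9
```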
Finding Attractors
• Attractors in a function are local maxima – the tops of hills
• They are of particular interest in optimisation problems
• If we treat the output of the function as a score, or measure of quality, then the attractor patterns are in some sense good examples of the concept being learned
Finding Attractors
• We find the attractors by starting at a random point in $\{-1,1\}^n$
• And then repeatedly apply these two steps:
– Pick a unit at random
– Set it to the sign of the summed input from every weight connected to it
For example, updating u1 in the four-unit network:
u1 = sign(w13 × u4 × u3 + w3 × u2)
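A sketch of the attractor search, again reusing prod_Q; the stopping rule (a fixed number of steps rather than convergence detection) is my simplification:

```python
import random

def find_attractor(weights, n, steps=1000):
    """Start at a random point of {-1,1}^n, then repeatedly pick a unit
    and set it to the sign of its summed input."""
    c = [random.choice([-1, 1]) for _ in range(n)]
    for _ in range(steps):
        k = random.randrange(n)
        # Each weight containing unit k+1 contributes w_i times the
        # product of the other units in its Q_i; multiplying prod_Q
        # by c[k] divides c[k] out, since c[k] * c[k] = 1.
        net = sum(wi * prod_Q(i, c) * c[k]
                  for i, wi in weights.items() if (i >> k) & 1)
        c[k] = 1 if net > 0 else -1
    return c
```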
Probabilities and Sampling
• If we learn the Probability Mass Function (PMF) from samples, we can calculate the probability of a pattern occurring as:
$p(c) = f(c)$
• Marginal and joint probabilities are calculated using the function average method described above, e.g.
$p(u_1 = 1 \wedge u_3 = 1) = f(1, *, 1, *)$
Sampling
• Let’s say we want to generate 1000 patterns, across which the distribution is the same as that of the data used to build the MOHN
• Useful in search and optimisation
• And for many other reasons
Sampling Algorithm
1. Start with h = *,*,*,*,* …
2. Pick a random location, i
3. Calculate p(h) with hi set to 1
4. Repeat:
1. Leave hi = 1 with the calculated probability, else set hi = -1
2. Choose another i (hi = *) at random
3. Calculate p(h ∧ hi = 1 | h)
5. Until all bits are set
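A sketch of the algorithm above using schema_average from earlier. Taking the conditional probability for each bit as the ratio of the two schema averages (the bit set to 1 versus 0) is my reading of steps 3 and 4.1:

```python
import random

def sample_pattern(weights, n):
    """Fix the bits of h one at a time, each with its conditional
    probability given the bits already fixed."""
    h = ['*'] * n
    for i in random.sample(range(n), n):    # visit every position once
        h[i] = '1'
        p1 = schema_average(''.join(h), weights)
        h[i] = '0'
        p0 = schema_average(''.join(h), weights)
        total = p1 + p0
        p = p1 / total if total > 0 else 0.5
        h[i] = '1' if random.random() < p else '0'
    return ''.join(h)
```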
Example 1 – Function Learning
Binary to Integer encoding
Weights and Averages
Weights:
W0 = 127.5, W1 = 0.5, W2 = 1, W4 = 2, W8 = 4, W16 = 8, W32 = 16, W64 = 32, W128 = 64
ΣWi = 255

Schema averages:
f(********) = 127.5
f(1*******) = 191.5
f(*******1) = 128
f(1111111*) = 254.5
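These numbers can be checked mechanically. This sketch builds the weights for the 8-bit binary-to-integer function (bit j, worth 2^j when set, gives W on index 2^j of 2^j / 2, plus the constant W0 = 127.5) and reuses schema_average from earlier:

```python
weights = {0: 127.5}
weights.update({2 ** j: 2 ** j / 2 for j in range(8)})   # W1 = 0.5 ... W128 = 64

assert sum(weights.values()) == 255                      # ΣWi from the slide
assert schema_average('********', weights) == 127.5
assert schema_average('1*******', weights) == 191.5
assert schema_average('*******1', weights) == 128
assert schema_average('1111111*', weights) == 254.5
```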
Example 2: Symmetry
• Function output is a measure of vertical symmetry
• No first order interactions
• Some second order
• None higher
Attractors
Example 3: Sampling