Mixed Order Hyper-Networks are a type of neural network with connections among groups of neurons rather than the traditional neuron pairs. This show describes their structure and some algorithms for using them to learn functions, model distributions and solve optimisation problems.
Complexity and Order
Kevin Swingler
What is Complexity?
[Figure: networks with interactions of order 1, 2 and 3 among a set of nodes]
Total Possible Interactions
• Where n is the number of nodes:
– Possible 1st order interactions = $n$
– Possible 2nd order interactions = $n(n-1)/2$
– Possible order $k$ interactions = $\binom{n}{k}$
• Summed over all orders: $\sum_{k=0}^{n} \binom{n}{k} = 2^n$
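As a quick sanity check, the counts above can be computed directly; this is a minimal Python sketch (not from the slides) using the standard binomial coefficient:

```python
import math

def interaction_counts(n):
    """Number of possible interactions of each order k among n nodes."""
    return {k: math.comb(n, k) for k in range(n + 1)}

counts = interaction_counts(4)
assert counts[1] == 4                    # n first order interactions
assert counts[2] == 6                    # n(n-1)/2 second order interactions
assert sum(counts.values()) == 2 ** 4    # all orders together: 2^n
```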
Measuring Complexity
• Enumerate the possible interactions $w_0 \dots w_{2^n - 1}$
• Count the number that are used
• Then a measure of complexity of a system might be the number of interactions used, possibly divided by $2^n$
• We might also want to consider higher order interactions as being more complex than lower ones
– For example, accounting for the number of samples needed to model them …
Function Modelling
Now we define a system more specifically: a function $f(c)$ over states $c \in \{-1,1\}^n$.
We would like to express any such function so that:
• The interactions are explicit
• It reproduces the function perfectly
• Local maxima are attractor points
• If the function is a PMF, we can sample from it and calculate any probability we like
Why?
• We can specifically manage and understand the complexity of the model
• We can find optimal points in input space that maximise the output
• We can sample from the function (which in general is difficult)
How?
• I am using Mixed Order Hyper-Networks (MOHNs)
• A type of neural network
Neural Networks
• Generally:
– A set of processing units, u, with roles of either:
• Receiving input
• Making calculations
• Providing output
– Defined by a set of weighted connections between pairs of units
– Each unit makes the same calculation:

$u_i = f\left(\sum_j w_{ij} u_j\right)$
MOHNs
• Units do not have roles – no input, output etc.
• Connections are not just between pairs of units, but also to single units and amongst subsets of any size
• Defined by a set of parameters $w_0 \dots w_{2^n - 1}$
• The unit function is a threshold: output 1 if the input is > 0, else -1
• Takes values over $c \in \{-1,1\}^n$
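To make the definition concrete, here is a minimal sketch of how a MOHN's parameters might be held in Python; the variable names and the sparse-dictionary choice are mine, not from the slides:

```python
import random

n = 4
# Indices run 0 .. 2^n - 1; index 0 is the constant term, and only the
# interactions the model actually uses need an entry (hypothetical values).
weights = {0: 0.2, 3: -0.5, 13: 1.0}

def threshold(x):
    """The unit function from the slide: 1 if the input exceeds 0, else -1."""
    return 1 if x > 0 else -1

# A network state is a point c in {-1, 1}^n
c = [random.choice([-1, 1]) for _ in range(n)]
```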
Can I See a Picture of One?
[Diagram: a four-unit MOHN over u1 .. u4, with weights W0, W1, W2, W4, W6, W7, W8, W9 and W15 attached to subsets of the units]
In What Follows …
A weight's index, written in binary (u4 leftmost, u1 rightmost), records which units it connects:
• w8 = 1000 connects u4
• w2 = 0010 connects u2
• w13 = 1101 connects u1, u3 and u4
The same notation names the subsets themselves, e.g. Q8 = 1000 and Q6 = 0110.
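In code, the subset Qi can be read straight off the binary expansion of the weight's index; in this sketch, bit j of i (counting from the least significant bit) selects unit j+1, which matches the slides' examples:

```python
def Q(i, n):
    """Subset of unit numbers connected by weight w_i."""
    return {j + 1 for j in range(n) if (i >> j) & 1}

assert Q(8, 4) == {4}           # w8  = 1000
assert Q(2, 4) == {2}           # w2  = 0010
assert Q(13, 4) == {1, 3, 4}    # w13 = 1101
assert Q(6, 4) == {2, 3}        # Q6  = 0110
```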
How? (1) Learning a Function
Qi is the subset of neurons connected to weight i.
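The extracted slide omits the update rule itself, so the following is only one plausible sketch: a plain delta rule fitted over a chosen set of interactions. The helper prod_Q and all names are mine, not the author's algorithm:

```python
def prod_Q(i, c):
    """Product of the unit values selected by weight index i (bit j <-> c[j])."""
    p = 1
    for j in range(len(c)):
        if (i >> j) & 1:
            p *= c[j]
    return p

def learn_function(samples, structure, rate=0.05, epochs=200):
    """Fit weights to (c, y) pairs; `structure` lists the interaction
    indices to use. The slides do not give their exact rule, so this
    delta-rule loop is an assumption."""
    w = {i: 0.0 for i in structure}
    for _ in range(epochs):
        for c, y in samples:
            err = y - sum(wi * prod_Q(i, c) for i, wi in w.items())
            for i in w:
                w[i] += rate * err * prod_Q(i, c)
    return w
```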
How? (2) Learning a PMF
Qi is the subset of neurons connected to weight i.
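Again the rule itself is missing from the extracted text. One standard estimator, consistent with treating the weights as Walsh coefficients of the PMF, is a scaled sample average of each Qi product; this sketch (reusing prod_Q from above) is an assumption:

```python
def learn_pmf(samples, structure):
    """Estimate each w_i from samples drawn from the target distribution.
    E[product of the Q_i values] / 2^n is the Walsh coefficient of the
    PMF, which makes f(c) sum to 1 over {-1,1}^n."""
    n = len(samples[0])
    w = {}
    for i in structure:
        avg = sum(prod_Q(i, c) for c in samples) / len(samples)
        w[i] = avg / 2 ** n
    return w
```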
Calculating Function Output
Once learning is complete, we calculate the function output by:
$f(c) = \sum_i w_i \prod_{j \in Q_i} u_j$
[Diagram: the four-unit MOHN with weights W13 and W3]

f(-1, 1, 1, -1) = u1u3u4W13 + u1u2W3 = W13 – W3

(since u1u3u4 = (-1)(1)(-1) = 1 and u1u2 = (-1)(1) = -1)
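The output calculation is then a few lines; this sketch reuses prod_Q from the learning sketch and checks the slide's worked example (the W13 and W3 values are hypothetical):

```python
def f(c, weights):
    """MOHN output: the sum of each stored weight times the product of
    the unit values it connects."""
    return sum(wi * prod_Q(i, c) for i, wi in weights.items())

W13, W3 = 1.5, 0.5   # hypothetical values
assert f((-1, 1, 1, -1), {13: W13, 3: W3}) == W13 - W3   # slide example
```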
Calculating Output Averages
• We may want to ask questions such as “If input 3 is set to one, what is the average output?”
• Or, more generally, we want to calculate $f(h)$ for $h \in \{-1, 1, *\}^n$
• For example: $f(1, 1, *, *, 1)$
• * is, of course, a wild card
Calculating Output Averages
• To calculate the average output, we sum the weights, as before, but with one change:
– Not all of the weights are used – just those permitted by the *s in h:
h = 1**0 produces Ψ = {0000, 0001, 1000, 1001} = W0, W1, W8, W9
Schemata Averages
$f(h) = \sum_{i \in \Psi} w_i \prod_{j \in Q_i} h_j$
Examples:
f(***) = W0
f(*0*) = W0 + W2
f(01*) = W0 – W2 + W4 – W6
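A sketch of the schema average, with the set Ψ computed from the bitmasks. The string convention (leftmost character is the highest unit) and the sign convention ('1' → +1, '0' → -1) are my reading of the slides; the assertion reproduces the 1**0 example above:

```python
def psi(h, n):
    """Weight indices whose connections avoid every '*' position of h."""
    fixed = {j for j, ch in enumerate(h[::-1]) if ch != '*'}
    return {i for i in range(2 ** n)
            if all(not (i >> j) & 1 or j in fixed for j in range(n))}

def schema_average(h, weights):
    """Average output over all completions of the wildcards in h."""
    bits = h[::-1]                          # bits[j] describes unit j+1
    total = 0.0
    for i, wi in weights.items():
        term = wi
        for j in range(len(bits)):
            if (i >> j) & 1:
                if bits[j] == '*':          # averages to zero over {-1, 1}
                    term = 0.0
                    break
                term *= 1 if bits[j] == '1' else -1
        total += term
    return total

assert psi('1**0', 4) == {0b0000, 0b0001, 0b1000, 0b1001}   # W0, W1, W8, W9
```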
Finding Attractors
• Attractors in a function are local maxima – the tops of hills
• They are of particular interest in optimisation problems
• If we treat the output of the function as a score, or measure of quality, then the attractor patterns are in some sense good examples of the concept being learned
Finding Attractors
• We find the attractors by starting at a random point in $\{-1,1\}^n$
• And then repeatedly apply these two steps:
– Pick a unit at random
– Set it to the sign of the summed input from every weight connected to it
For example, updating u1 in the four-unit network:
u1 = sign(w13 × u4 × u3 + w3 × u2)
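A sketch of the attractor search, again reusing prod_Q; the stopping rule (a fixed number of steps rather than convergence detection) is my simplification:

```python
import random

def find_attractor(weights, n, steps=1000):
    """Start at a random point of {-1,1}^n, then repeatedly pick a unit
    and set it to the sign of its summed input."""
    c = [random.choice([-1, 1]) for _ in range(n)]
    for _ in range(steps):
        k = random.randrange(n)
        # Each weight containing unit k+1 contributes w_i times the
        # product of the other units in its Q_i; multiplying prod_Q
        # by c[k] divides c[k] out, since c[k] * c[k] = 1.
        net = sum(wi * prod_Q(i, c) * c[k]
                  for i, wi in weights.items() if (i >> k) & 1)
        c[k] = 1 if net > 0 else -1
    return c
```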
Probabilities and Sampling
• If we learn the Probability Mass Function (PMF) from samples, we can calculate the probability of a pattern occurring as:
$p(c) = f(c)$
• Marginal and joint probabilities are calculated using the function average method described above, e.g.
$p(u_1 = 1 \wedge u_3 = 1) = f(1, *, 1, *)$
Sampling
• Let’s say we want to generate 1000 patterns, across which the distribution is the same as that of the data used to build the MOHN
• Useful in search and optimisation
• And for many other reasons
Sampling Algorithm
1. Start with h = *,*,*,*,* …
2. Pick a random location, i
3. Calculate p(h) with hi set to 1
4. Repeat:
1. Leave hi = 1 with the calculated probability, else set hi = -1
2. Choose another i (hi = *) at random
3. Calculate p(h ∧ hi = 1 | h)
5. Until all bits are set
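A sketch of the algorithm above using schema_average from earlier. Taking the conditional probability for each bit as the ratio of the two schema averages (the bit set to 1 versus 0) is my reading of steps 3 and 4.1:

```python
import random

def sample_pattern(weights, n):
    """Fix the bits of h one at a time, each with its conditional
    probability given the bits already fixed."""
    h = ['*'] * n
    for i in random.sample(range(n), n):    # visit every position once
        h[i] = '1'
        p1 = schema_average(''.join(h), weights)
        h[i] = '0'
        p0 = schema_average(''.join(h), weights)
        total = p1 + p0
        p = p1 / total if total > 0 else 0.5
        h[i] = '1' if random.random() < p else '0'
    return ''.join(h)
```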
Example 1 – Function Learning
Binary to Integer encoding
Weights and Averages
Weights:
W0 = 127.5, W1 = 0.5, W2 = 1, W4 = 2, W8 = 4, W16 = 8, W32 = 16, W64 = 32, W128 = 64
ΣWi = 255

Schema averages:
f(********) = 127.5
f(1*******) = 191.5
f(*******1) = 128
f(1111111*) = 254.5
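These numbers can be checked mechanically. This sketch builds the weights for the 8-bit binary-to-integer function (bit j, worth 2^j when set, gives W on index 2^j of 2^j / 2, plus the constant W0 = 127.5) and reuses schema_average from earlier:

```python
weights = {0: 127.5}
weights.update({2 ** j: 2 ** j / 2 for j in range(8)})   # W1 = 0.5 ... W128 = 64

assert sum(weights.values()) == 255                      # ΣWi from the slide
assert schema_average('********', weights) == 127.5
assert schema_average('1*******', weights) == 191.5
assert schema_average('*******1', weights) == 128
assert schema_average('1111111*', weights) == 254.5
```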
Example 2: Symmetry
• Function output is a measure of vertical symmetry
• No first order interactions
• Some second order
• None higher
Attractors
Example 3: Sampling