Levels of Abstraction in Probabilistic Modeling and Sampling


Moshe Looks
November 18th, 2005


Outline

• Global Optimization with Graphical Models

• Where Do Our Random Variables Come From?

• Incorporating Levels of Abstraction

• Results

• Conclusions


Global Optimization

• User determines
  – How instances are represented
    • E.g., fixed-length bit-strings drawn from {0,1}^n
  – How instance quality is evaluated
    • E.g., a fitness function from {0,1}^n to R
    • May be expensive to compute

• Assumes a black box
  – No additional problem knowledge


Approaches

• Blind search
  – Generate and test random instances

• Local search (hill-climbing, annealing, etc.)
  – Search from a single (best) instance seen

• Population-based search
  – Search from a collection of good instances seen


Population-Based Search

1. Generate a population of random instances

2. Recombine promising instances in the population to create new instances

3. Remove the worst instances from the population

4. Go to step 2
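A minimal Python sketch of this loop; the fitness function, the random_instance generator, and the recombine operator are problem-specific placeholders, not part of the original slides:

```python
import random

def population_search(fitness, random_instance, recombine,
                      pop_size=100, generations=50):
    """Generic population-based search skeleton (steps 1-4 above)."""
    # 1. Generate a population of random instances
    pop = [random_instance() for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Recombine promising instances to create new instances
        promising = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        children = [recombine(random.choice(promising),
                              random.choice(promising))
                    for _ in range(pop_size // 2)]
        # 3. Remove the worst instances from the population
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
        # 4. Loop back to step 2
    return max(pop, key=fitness)
```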


Population-Based Search

• How to generate new instances?

• Genetic Algorithms
  – Crossover + mutation

• Estimation of Distribution Algorithms (EDAs)
  – Generate instances by sampling from a probability distribution reflecting the good instances
  – How to represent the distribution?


Probability-Vector EDA (Bit-String Case)

• Each position in the bit-string corresponds to a random variable in the model; X = {X1, X2, …, Xn}

• Assume independence
  – For the population 001, 111, and 101:
    • P(X1=0) = 1/3, P(X1=1) = 2/3
    • P(X2=0) = 2/3, P(X2=1) = 1/3
    • P(X3=0) = 0,   P(X3=1) = 1
  – E.g., P(011) = P(X1=0) · P(X2=1) · P(X3=1) = 1/3 · 1/3 · 1 = 1/9
  – Can generate instances according to the distribution

• Population-Based Incremental Learning (PBIL) – Baluja, 1995
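A minimal sketch of the probability-vector idea, using the population from this slide; it illustrates the fully factorized model, not Baluja's exact PBIL update rule:

```python
import random

def learn_marginals(population):
    """Estimate P(Xi = 1) independently for each bit position."""
    n = len(population[0])
    return [sum(inst[i] for inst in population) / len(population)
            for i in range(n)]

def sample(probs):
    """Draw one instance from the fully factorized distribution."""
    return [1 if random.random() < p else 0 for p in probs]

# The population from the slide: 001, 111, 101
population = [[0, 0, 1], [1, 1, 1], [1, 0, 1]]
probs = learn_marginals(population)   # [2/3, 1/3, 1.0]
print(sample(probs))                  # e.g., [1, 0, 1]
```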


Graphical Models

• Probability + graph theory
  – Nodes are random variables
  – Graph structure encodes variable dependencies

• Great for
  – Uncertainty
  – Complexity
  – Learning


The Bayesian Optimization Algorithm (Pelikan, Goldberg, and Cantú-Paz, 1999)

• Dynamically learn dependencies between variables (nodes in a Bayesian network)

• Without dependencies:
  P(X1 X2 X3 X4) = P(X1) · P(X2) · P(X3) · P(X4)

• With dependencies:
  P(X1 X2 X3 X4) = P(X1) · P(X2 | X1) · P(X3) · P(X4 | X1, X3)
  – Dependencies (edges) correspond to partitions of the target node's distribution based on the source node's distribution

• Variables must now be sampled in topological order
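A minimal sketch of sampling in topological order (ancestral sampling) for the four-variable factorization above; the conditional probability tables are invented purely for illustration:

```python
import random

# Hypothetical tables for P(X1)·P(X2|X1)·P(X3)·P(X4|X1,X3); values made up.
p_x1 = 0.7                                   # P(X1 = 1)
p_x2_given_x1 = {0: 0.2, 1: 0.9}             # P(X2 = 1 | X1)
p_x3 = 0.5                                   # P(X3 = 1)
p_x4_given_x1_x3 = {(0, 0): 0.1, (0, 1): 0.4,
                    (1, 0): 0.6, (1, 1): 0.95}  # P(X4 = 1 | X1, X3)

def bernoulli(p):
    return 1 if random.random() < p else 0

def sample_instance():
    """Parents are always drawn before their children."""
    x1 = bernoulli(p_x1)
    x3 = bernoulli(p_x3)                     # X3 has no parents
    x2 = bernoulli(p_x2_given_x1[x1])
    x4 = bernoulli(p_x4_given_x1_x3[(x1, x3)])
    return (x1, x2, x3, x4)

print(sample_instance())
```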


Where Do Our Random Variables Come From?

• The real world is a mess!
  – Boundaries are fuzzy and ambiguous
  – Even in discrete domains, ambiguity remains:
    • E.g., in DNA, a gene's position is sometimes critical and sometimes irrelevant

• Consider abstracted features, defined in terms of "base-level variables"
  • E.g., contains a prime number of ones
  • E.g., does not contain the substring AATGC


Features

• Features are predicates over instances, describing the presence or absence of some pattern (sketched below)

• Base-level variables Xi, 1 ≤ i ≤ n, are a special case (i.e., "Xi = 1" is a feature)

• Any well-defined feature f may be introduced as a node in the graph for probabilistic modeling

• What about model-based instance generation?
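A small sketch of features as predicates, mirroring the examples on the previous slide; the function names are illustrative, not from the original:

```python
def prime_number_of_ones(bits):
    """Feature: instance contains a prime number of ones."""
    k = sum(bits)
    return k > 1 and all(k % d for d in range(2, int(k ** 0.5) + 1))

def base_level(i):
    """Base-level variables as the special case: the feature "Xi = 1"."""
    return lambda bits: bits[i - 1] == 1   # 1-based index, as in X1..Xn

print(prime_number_of_ones([1, 0, 1, 1]))  # True: three ones, 3 is prime
print(base_level(2)([1, 0, 1, 1]))         # False: X2 = 0
```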


Definitions for Feature-Based Sampling

• Given a set of base variables X = {X1, X2, …, Xn} and an instance x = (x1 x2 … xn):
  – S ⊆ X is sufficient for f_x if f is true for every solution with the same assignment as x for the variables in S
  – S is minimally sufficient if none of its subsets are sufficient
  – The unique grounding of f_x is the union of all minimally sufficient sets
    • If f is not present in x, the grounding of f_x is the empty set

• E.g., for f = "contains the substring 11", the grounding of f_10110111 is {X3, X4, X6, X7, X8}
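A small sketch of computing the grounding for substring features specifically, where each occurrence of the substring is one minimally sufficient set (an observation that holds for this feature class; this is not a general minimal-sufficiency algorithm):

```python
def grounding(instance, sub):
    """Grounding of the feature "contains the substring `sub`" in `instance`.

    Each occurrence of `sub` covers one minimally sufficient set of
    positions; the grounding is their union. Positions are 1-based,
    matching the X1..Xn notation on the slides.
    """
    positions = set()
    for start in range(len(instance) - len(sub) + 1):
        if instance[start:start + len(sub)] == sub:
            positions.update(range(start + 1, start + len(sub) + 1))
    return positions  # empty set if the feature is absent

# The slide's example: grounding of f_10110111 for f = "contains 11"
print(sorted(grounding("10110111", "11")))   # [3, 4, 6, 7, 8]
```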


Generalizing Variable Assignment in Instance Generation

• Assigning f_x to a new instance means:
  – When the feature is present in x:
    • Partitioning the distribution to include the grounding of f_x
  – When the feature is absent from x:
    • Partitioning the distribution to exclude the groundings of f for all instances in the population


Generalizing Variable Assignment in Instance Generation

• Consider assignment with f = "contains the substring 11"

We may now generate instances as before, although some assignments may fail (i.e., features may overlap)

[Figure: assignment with f_110 (instance i1) and with f_010 (instance i3)]
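A rough illustration of the two cases under simplifying assumptions: clamping the grounded positions when the feature is present, and rejection sampling when it is absent (which approximates excluding the population's groundings). This is an illustration only, not the paper's exact procedure:

```python
import random

def bernoulli(p):
    return 1 if random.random() < p else 0

def assign_feature(probs, x, ground, present, has_feature, tries=100):
    """Sketch of generalized assignment. probs: per-position marginals;
    x: the donor instance; ground: grounding of f in x (1-based positions);
    present: whether f is present in x; has_feature: predicate for f."""
    for _ in range(tries):
        inst = [bernoulli(p) for p in probs]
        if present:
            for i in ground:            # include the grounding of f_x:
                inst[i - 1] = x[i - 1]  # copy x's values on those positions
            return inst
        if not has_feature(inst):       # exclude the feature when absent
            return inst
    return None  # assignment failed (e.g., conflicting feature constraints)
```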


Feature-Based BOA With Motifs

• Acceptance criterion for a motif f: [formula given on the original slide], where:

• F is the set of existing features

• count(A, B) is the number of instances in the population with all features in A and none in B

• spread(f) is the relative frequency of f across possible substring positions (there are more possibilities for short strings)
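Sketches of the two quantities as defined above; the population representation, and the reading of spread as a per-position frequency averaged over the population, are assumptions:

```python
def count(pop_features, A, B):
    """count(A, B): instances with all features in A and none in B.

    `pop_features` maps each instance to the set of features it exhibits;
    the name and representation are illustrative assumptions.
    """
    return sum(1 for feats in pop_features
               if A <= feats and not (B & feats))

def spread(instances, motif):
    """spread(f): fraction of possible substring positions where the
    motif occurs, pooled over the population (one plausible reading
    of the slide's definition)."""
    total, hits = 0, 0
    for inst in instances:
        for start in range(len(inst) - len(motif) + 1):
            total += 1
            hits += inst[start:start + len(motif)] == motif
    return hits / total if total else 0.0
```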


Feature-Based BOA With Motifs

• N (the population size) random substrings are tested as possible motifs

• c = 0.4 was chosen based on ad-hoc experimentation with small (n = 30, 60) OneMax instances

• Motif learning is O(n^2·N), assuming a fixed upper bound on the number of motifs

• The general complexity of BOA modeling is O(n^3 + n^2·N)


Test Problems

• OneMax should be very easy for fbBOA
  – Features are strings of all ones

• TwoMax (aka Twin Peaks)
  – Global optima at 0^n and 1^n
  – Features are strings of all ones and strings of all zeros
  – Requires dependency learning

• 3-deceptive
  – Hard because of traps (local optima)
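For concreteness, standard formulations of these fitness functions; the 3-deceptive block values follow a common version from the BOA literature and may differ from the ones used in this work:

```python
def onemax(bits):
    """OneMax: count of ones; optimum is the all-ones string."""
    return sum(bits)

def twomax(bits):
    """TwoMax: two global optima, at all zeros and at all ones."""
    return max(sum(bits), len(bits) - sum(bits))

def three_deceptive(bits):
    """3-deceptive: concatenated 3-bit traps. Per-block values are one
    common formulation; two ones score worst, luring search toward the
    deceptive all-zeros local optimum."""
    block_value = {0: 0.9, 1: 0.8, 2: 0.0, 3: 1.0}
    return sum(block_value[sum(bits[i:i + 3])]
               for i in range(0, len(bits), 3))
```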


TwoMax ~ Results

Qualitatively similar results were obtained for OneMax and 3-deceptive


More Test Problems

• One-Dimensional Ising Model
  – Penalizes every transition between 0 and 1
  – Large flat regions in the fitness landscape

• Hierarchical IFF (H-IFF) and Hierarchical XOR (H-XOR)
  – Adjacent pairs of variables are grouped recursively (i.e., problem size is 2^k)
  – Global optima achieved when all levels are synchronized (0^n and 1^n for H-IFF; 01101001 and 10010110 for H-XOR)
  – See the H-IFF sketch below

[Figure: cross-section of H-IFF-64's fitness landscape]
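A sketch of H-IFF fitness in the usual recursive formulation (after Watson et al.), assuming the problem size is a power of two:

```python
def hiff(bits):
    """H-IFF: a block scores its length if its bits all agree, and every
    block also passes its two halves down for recursive scoring."""
    if len(bits) == 1:
        return 1
    half = len(bits) // 2
    bonus = len(bits) if len(set(bits)) == 1 else 0
    return bonus + hiff(bits[:half]) + hiff(bits[half:])

print(hiff([1] * 8))   # 32: all levels synchronized (a global optimum)
```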


Results ~ Ising Model


Results ~ H-IFF


Results ~ H-XOR


Conclusions

• Generalized probabilistic modeling and sampling with an additional level of abstraction: features

• Constraining instance generation on an abstract level can speed up the discovery of optima by an Estimation of Distribution Algorithm

• Current/Future Work: Strings to Trees
  – Dynamically rewrite trees
    • Identify meaningful variables
  – New kinds of features
    • Semantics rather than syntax

