No Free Lunch (NFL) Theorem


Many slides are based on a presentation of Y.C. Ho

Presentation by Kristian Nolde

25. August 2004– 2/29

General notes

Goal:
• Give an intuitive feeling for the NFL
• Present some mathematical background

To keep in mind:
• NFL is an impossibility theorem, such as
– Gödel's proof in mathematics (roughly: some statements can be neither proved nor disproved within a given mathematical system)
– Arrow's theorem in economics (in principle, perfect democracy is not realizable)
• Thus, its practical use is limited?!


The No Free Lunch Theorem

• Without specific structural assumptions, no optimization scheme can perform better than blind search on average
• But blind search is very inefficient!
• Prob(at least one out of N samples is in the top n for a search space of size |X|) ≈ nN/|X|, e.g. Prob ≈ 0.001 for |X| = 10^9, n = 1000, N = 1000
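The blind-search estimate can be checked numerically. The sketch below assumes the N samples are drawn uniformly and independently (with replacement), which the slide does not state; the function names are illustrative only. It compares the exact hit probability with the approximation nN/|X|.

```python
def hit_probability(space_size: int, top_n: int, samples: int) -> float:
    """Exact probability that at least one of `samples` uniform draws
    (assumed independent, with replacement) lands in the top `top_n`."""
    return 1.0 - (1.0 - top_n / space_size) ** samples


def hit_probability_approx(space_size: int, top_n: int, samples: int) -> float:
    """First-order approximation n*N/|X|, valid when n*N is much smaller than |X|."""
    return top_n * samples / space_size


# Numbers from the slide: |X| = 10^9, n = 1000, N = 1000.
exact = hit_probability(10**9, 1000, 1000)
approx = hit_probability_approx(10**9, 1000, 1000)
print(f"exact = {exact:.6g}, approx = {approx:.6g}")
```

The approximation slightly overestimates the exact value because it ignores the chance of several samples hitting the top set at once.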


Assume a finite World

Finite # of input symbols (x's) and finite # of output symbols (y's) => finite # of possible mappings from input to output (f's), namely |F| = |Y|^|X|
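A small enumeration makes the finite-world assumption concrete. This is a minimal sketch with a made-up three-input, two-output world; the names `all_mappings`, `X`, and `Y` are chosen for illustration.

```python
from itertools import product


def all_mappings(inputs, outputs):
    """Return every function f: inputs -> outputs, each as a dict."""
    return [
        dict(zip(inputs, values))
        for values in product(outputs, repeat=len(inputs))
    ]


X = ["x1", "x2", "x3"]  # hypothetical input symbols
Y = [0, 1]              # hypothetical output symbols
mappings = all_mappings(X, Y)

# The count equals |Y| ** |X|.
print(len(mappings), len(Y) ** len(X))
```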


The Fundamental Matrix F

[Matrix F: rows x1, x2, ..., x|X| (inputs); columns f1, f2, ..., f|F| (mappings); the entry in row xi and column fj is the output fj(xi), here 0 or 1.]

FACT: equal number of 0's and 1's in each row!

In each row, each value of Y appears |Y|^(|X|-1) times!

Averaged over all f, the value is independent of x!
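The row-counting facts can be verified by building F explicitly for a tiny world. The sketch below assumes |X| = 3 and Y = {0, 1}; these sizes and all variable names are illustrative choices.

```python
from collections import Counter
from itertools import product

X_SIZE = 3         # assumed number of inputs
Y_VALUES = (0, 1)  # assumed output alphabet

# Each column of F is one mapping f, stored as the tuple (f(x1), ..., f(x|X|)).
columns = list(product(Y_VALUES, repeat=X_SIZE))

# Row i of F lists the value f(x_i) for every mapping f.
F = [[f[i] for f in columns] for i in range(X_SIZE)]

row_counts = [Counter(row) for row in F]
row_means = [sum(row) / len(row) for row in F]

for i, (counts, mean) in enumerate(zip(row_counts, row_means), start=1):
    print(f"x{i}: counts={dict(counts)}, mean over all f={mean}")
```

Every row shows the same counts and the same mean, which is the sense in which the average over all f does not depend on x.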


Compare Algorithms

• Think of two algorithms, a1 and a2, e.g.
– a1 always selects from x1 to x_{0.5|X|}
– a2 always selects from x_{0.5|X|} to x|X|
• For a specific f, a1 or a2 may be better. However, if f is not known, the average performance of both is equal:

$$\sum_f P(d_y \mid f, a_1) = \sum_f P(d_y \mid f, a_2)$$

where d is a sample and d_y is the cost value associated with d.
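The equality can be checked by brute force on a tiny world. The sketch below assumes |X| = 4, Y = {0, 1}, and deterministic algorithms that never revisit a point; `a1` and `a2` here are hypothetical stand-ins, not the two algorithms from the slide. For each algorithm it counts, over all f, how often each sequence of observed cost values occurs.

```python
from collections import Counter
from itertools import product

X = range(4)  # assumed input space
Y = (0, 1)    # assumed cost values
M = 3         # number of distinct points each algorithm evaluates


def a1(history):
    """Hypothetical algorithm: always take the lowest unvisited point."""
    visited = {x for x, _ in history}
    return min(x for x in X if x not in visited)


def a2(history):
    """Hypothetical adaptive algorithm: take the highest unvisited point
    at the start or after observing a 1, otherwise the lowest."""
    visited = {x for x, _ in history}
    unvisited = [x for x in X if x not in visited]
    if not history or history[-1][1] == 1:
        return max(unvisited)
    return min(unvisited)


def cost_sequence(algorithm, f, steps):
    """Run `algorithm` on mapping `f` (a tuple indexed by x) and
    return the observed cost values in order."""
    history = []
    for _ in range(steps):
        x = algorithm(history)
        history.append((x, f[x]))
    return tuple(y for _, y in history)


def histogram(algorithm, steps=M):
    """Count each observed cost sequence over all mappings f."""
    return Counter(
        cost_sequence(algorithm, f, steps)
        for f in product(Y, repeat=len(X))
    )


h1, h2 = histogram(a1), histogram(a2)
print(h1 == h2)
```

Because the two histograms coincide, any performance measure computed from the observed cost values has the same average over all f for both algorithms.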


Comparing Algorithms Continued

• Case 1: An algorithm can be more specific, e.g. a1 assumes a certain realization fk
• Case 2: Or it can be more general, e.g. a2 assumes a more uniform distribution over the possible f
• Then the performance of a1 will be excellent for fk but catastrophic for all other cases (great performance, no robustness)
• In contrast, a2 performs in a mediocre way in all cases but does not fail (poor performance, high robustness)

Common sense says: Robustness * Efficiency = Constant, or Generality * Depth = Constant


Implication 1

• Let x be the optimization variable, f the performance function, and y the performance, i.e., y = f(x)
• Then, averaged over all possible optimization problems, the result is independent of the choice of x
• If you don't know the structure of f (which column you are dealing with), blind choice is as good as any!


Implications 2

• Let X be the space of all possible representations (as in genetic algorithms), or the space of all possible algorithms to apply to a class of problems

• Without understanding of the problem, blind choice is as good as any.

• “Understanding” means you know which column of the F matrix you are dealing with


Implications 3

• If you know which column or group of columns you are dealing with, you can specialize the choice of rows

• Even then, you must accept that you will suffer LOSSES should other choices of column occur due to uncertainties or disturbances


The Fundamental Matrix F

[Matrix F: rows x1, x2, ..., x|X| (inputs); columns f1, f2, ..., f|F| (mappings); entries 0 or 1.]

Assume a distribution of the columns, then pick a row that results in minimal expected losses or maximal performance. This is stochastic optimization.
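The row selection described here can be sketched in a few lines. The matrix, the column probabilities, and the function name `best_row` below are made-up illustrations; entries are read as performance values (1 is good), so the best row maximizes expected performance.

```python
def best_row(matrix, column_probs):
    """Return (index of the row with maximal expected value, all expected values)."""
    expected = [
        sum(p * value for p, value in zip(column_probs, row))
        for row in matrix
    ]
    best = max(range(len(matrix)), key=expected.__getitem__)
    return best, expected


# Hypothetical 2 x 4 performance matrix: rows are choices x, columns are mappings f.
F = [
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
# Assumed distribution over the columns (must sum to 1).
column_probs = [0.1, 0.6, 0.2, 0.1]

row, expected = best_row(F, column_probs)
print(row, expected)
```

With a uniform column distribution both rows would tie, which is the NFL situation; the preference for one row comes entirely from the assumed distribution.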


Implications 5

• Worse, if you estimate the probabilities incorrectly, your stochastically optimized solution may suffer catastrophically bad outcomes more frequently than you would like.

• Reason: you have already used up most of the good outcomes in your “optimal” choice. What is left are the bad ones that are not supposed to occur! (HOT design and power laws, Doyle)
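A toy calculation illustrates the effect of misestimated probabilities. All numbers below are invented for illustration: a "specialized" choice that is perfect in two scenarios and fails in the third, and a "robust" choice that is mediocre everywhere.

```python
def expected_value(row, probs):
    """Expected performance of one choice under a scenario distribution."""
    return sum(p * value for p, value in zip(probs, row))


# Hypothetical performance per scenario (columns); 0.0 is a catastrophic outcome.
specialized = [1.0, 1.0, 0.0]
robust = [0.5, 0.5, 0.5]

assumed_probs = [0.45, 0.45, 0.10]  # what the designer believes
true_probs = [0.20, 0.20, 0.60]     # what actually happens (assumed for this example)

# Under the assumed distribution the specialized choice looks clearly better.
assumed_gap = expected_value(specialized, assumed_probs) - expected_value(robust, assumed_probs)

# Under the true distribution it is worse, and its failure scenario is common.
true_gap = expected_value(specialized, true_probs) - expected_value(robust, true_probs)
failure_rate_assumed = assumed_probs[2]
failure_rate_true = true_probs[2]

print(assumed_gap, true_gap, failure_rate_assumed, failure_rate_true)
```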


Implications 6

• Generality for generality's sake is not very fruitful

• Working on a specific problem can be rewarding

• Because:
– the insight can be generalized
– the problem is practically important
– the 80-20 effect
