27
Kludged * October 5, 2010 Abstract Is there reason to believe that our brains have evolved to make effi- cient decisions so that the details of the internal process are irrelevant? I develop a model which illustrates a limitation of adaptive processes: improvements tend to come in the form of kludges. A kludge is a marginal adaptation that compensates for, but does not eliminate fun- damental design inefficiencies. When kludges accumulate the result can be perpetually sub-optimal behavior even in a model of evolu- tion in which arbitrarily large innovations occur infinitely often with probability 1. Keywords: kludge. * I thank Andy Postlewaite and three anonymous referees for suggestions that improved (drastically) the quality of this paper. A previous version has been reviewed in NAJ eco- nomics. See http://www.najecon.org/naj/v14.htm

Abstract - Northwestern University

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Abstract - Northwestern University

Kludged∗

October 5, 2010

Abstract

Is there reason to believe that our brains have evolved to make effi-cient decisions so that the details of the internal process are irrelevant?I develop a model which illustrates a limitation of adaptive processes:improvements tend to come in the form of kludges. A kludge is amarginal adaptation that compensates for, but does not eliminate fun-damental design inefficiencies. When kludges accumulate the resultcan be perpetually sub-optimal behavior even in a model of evolu-tion in which arbitrarily large innovations occur infinitely often withprobability 1.Keywords: kludge.

∗I thank Andy Postlewaite and three anonymous referees for suggestions that improved(drastically) the quality of this paper. A previous version has been reviewed in NAJ eco-nomics. See http://www.najecon.org/naj/v14.htm

Page 2: Abstract - Northwestern University

1 Introduction

In July of 2004, Microsoft announced that the release of Vista, the next gen-eration of the Windows operating system, would be delayed until late 2006.Jim Allchin famously walked into the office of Bill Gates and proclaimed,“It’s not going to work.” Development of Windows had become unman-ageable and Allchin decided that Vista would have to be re-written essen-tially from scratch.

Mr Allchin’s reforms address a problem dating to Microsoft’sbeginnings. . . . PC users wanted cool and useful features quickly.they tolerated – or didn’t notice – the bugs riddling the soft-ware. Problems could always be patched over. With each patchand enhancement, it became harder to strap new features ontothe software since new code could affect everything else in un-predictable ways.1

The Alternative Minimum Tax was introduced by the Tax Reform Actof 1969. It was intended to prevent taxpayers with very high incomes fromexploiting numerous tax exemptions and paying little or no tax at all. Overtime, the shortcomings of the AMT as a solution to the proliferation of ex-emptions have begun to appear. However, over this same time, the federaltax and budgeting system has come to depend on the AMT to the pointthat many observers think that changing the AMT, without complicatedaccompanying adjustments elsewhere, would be worse than leaving it asis.

Flat fish inhabit the sea floor, but for many this was not the originalhabitat. When their ancestors moved to the sea floor, they adapted bychanging their orientation from swimming “upright” to on their sides. Giventheir existing bone structure, this was the only way to become “flat”, butit rendered one eye useless. So, by a further adaptation many of today’sspecies of flatfish migrate one eye to the opposite side of their body duringdevelopment.2

As beautifully documented in the film The March of the Penguins, em-peror penguins spend a nearly 9 month breeding and nurturing cycle whichinvolves walking up to 100 KM away from any food source in order toavoid predators. The problem for penguins is that they are birds, and hencelay eggs; but they are flightless birds, so they find it inconvenient to move

1“Code Red: Battling Google, Microsoft Changes How it Builds Software.” The WallStreet Journal, Robert Guth, September 2005.

2For a vivid account of the evolution of bony flat fish, see Dawkins (1986).

1

Page 3: Abstract - Northwestern University

to areas where the eggs can be easily protected. They adapted not by recti-fying either of these two basic problems,3 but instead by compensating forthem by an extremely costly and risky behavior.

Each of these examples represents a kludge: an improvement upon ahighly complex system that solves an inefficiency but in a piecemeal fash-ion and without addressing the deep-rooted underlying problem. Thereare three ingredients to a kludge. First the system must be increasing incomplexity so that new problems arise that present challenges to the in-ternal workings of the system. Second, a kludge addresses the problem bypatching up any mis-coordination between the inherited infrastructure andthe new demands. Third, the kludge itself– because it makes sense only inthe presence of the disease it is there to treat– intensifies the internal ineffi-ciency, necessitating either further kludges in the future or else eventuallya complete revolution.4

Microsoft Windows is a complex system whose evolution is guided bya forward-looking dynamic optimizer. It is not surprising therefore that,after two decades worth of kludges that accompanied the expansion fromDOS to Windows to 32 bit and eventually 64 bit architecture, revolutionwas the final solution. In the case of the US Tax Code, or for that matter anysufficiently complex body of contracts that govern interactions among di-verse interests, while the evolution may be influenced by forward-lookingconsiderations, full dynamic optimization is more tenuous as a model ofthe long-run trade-offs.

But the story is very different for flat fish and penguins, and, to cometo the point, for human brains, whether we are considering the evolutionof the brain across generations or the development of the decision-makingapparatus within the life a single individual. Here, progress is adaptive.An adaptive process is not forward-looking and certainly not governed bydynamic optimization. An adaptive process inherits its raw material fromthe past, occasionally modifies it by chance (mutation or experimentation),and selects among variants according to success today.

Nevertheless there is the possibility, not completely fanciful, that anadaptive process can produce complex systems that perform as well to-day as those that were designed by an optimizer given the same set of rawmaterials. Indeed, there is a tradition in economics that accepts the dis-

3Incidentally, it has happened in evolutionary history that oviparous (egg-laying)species have adapted to vivipary (giving birth to live offspring.) Some species of sharksare important examples. Vivipary enables a long internal gestation so that the developingoffspring is protected and nourished within the body of the mother.

4See wikipedia for the history, usage, and pronunciation of the word kludge.

2

Page 4: Abstract - Northwestern University

tinction between adaptation and optimization, but rationalizes a positivemethodology based on unfettered optimization by an appeal to this un-written proposition.5

In this paper I present a model which probes that common story. I an-alyze a simple single-person decision problem. An organism is a procedurefor solving this problem. I parametrize a family of such algorithms whichincludes the optimal algorithm in addition to algorithms that perform lesswell. An adaptive process alters the organism over time, favoring improve-ments. I show conditions under which no matter how long the adaptiveprocess proceeds, an engineer, at any point in time, working only with theraw materials that presently make up the organism, could eliminate a per-sistent structural inefficiency and produce a significant improvement. Inthe model, kludges arise naturally and are the typical adaptations that im-prove the organism. A kludge always improves the organism at the mar-gin, but also increases both its complexity and its internal complementarityand as a by-product makes it harder and harder for adaptation to undothese inefficiencies in the future.

Illustrative Example The following stylized example illustrates the mainideas. An organism of complexity k has k genes each of which can havevalue −1 or +1. Thus, its genetic code π is a k-length sequence such asπ = {−1,+1,+1, . . . ,−1}. The fitness of the organism is given by the func-tion V(π) which exhibits complementarity. A simple example would beV(π) = V(κ) where κ is the number of +1 genes. If V(0) = k, V(1) = 2k,and V(κ) is U-shaped then, holding k fixed, there are two local optima: anegatively aligned organism (all genes are −1) and a positively aligned organ-ism (all genes are +1.) Refer to Figure 1.

In each period the organism is, with probability 0 < (1− q) < 1 subjectto random mutation. When a mutation occurs, each gene has an indepen-dent probability µ > 0 of changing signs. If the new version of the organ-ism is an improvement (higher V) then it replaces the existing organism.Otherwise the existing organism survives to the next period. In this model,when k is constant over time, with probability 1 a negatively aligned organ-ism will eventually be replaced and never return. It is enough that k̄ geneschange sign where V(k̄) > k = V(0). But now suppose that k is increasingover time. In particular, suppose that a second form of mutation can occur.With complementary probability q, the complexity increases by 1 and thenew gene is determined at random.

5The classic defense is Friedman (1966).

3

Page 5: Abstract - Northwestern University

(a) Drastic mutation for fixed k. (b) Increased complexity increases the sizeof a drastic mutation

Figure 1: Kludges in the illustrative example.

If the organism is initially negatively aligned, and the complementarityin V is strong, an increase in complexity survives only when the new gene issimilarly aligned. An increase in complexity thus produces an incrementalimprovement but potentially reduces the probability of the drastic changesneeded to overhaul the structure of the organism from negative to positivealignment. If so, then the negatively aligned organism is kludged.

In fact, if a drastic mutation always requires some minimum fractionof genes to switch, then as k grows over time, drastic mutation probabili-ties shrink exponentially fast. This means that the total probability that adrastic mutation ever occurs is strictly less than 1. Essentially, while at anypoint in time the probability of jumping between local optima is positive,the distance between the local optima is growing fast enough that the totalprobability remains bounded away from 1. In particular there is a pos-itive probability that the kludges perpetually accumulate and ultimatelyprevent the efficient re-alignment from taking place.

The key features in this story are the complementarity in the fitnessfunction V and the rate of increased complexity relative to mutations. Thispaper builds a model of a decision-making algorithm and its evolutionaryprocess which give rise to these features. In the model, an organism evolvesa procedure for processing and encoding data. When the procedure ispoorly adapted to the environment, improvements can still come from in-creased complexity: the organism analyzes additional data but encodes itusing the inefficient protocol. Because aggregating the data requires a com-mon encoding, increased complexity increases the dependence on the pro-tocol. Useful aggregation requires, at a minimum, that at least half of the

4

Page 6: Abstract - Northwestern University

data are encoded consistently with the protocol. Finally, unlike in the sim-ple example, complexity grows at an endogenous rate in the model. As themarginal value of additional data decreases, complexity growth becomesvery slow. Indeed the organism waits for roughly k2 periods before in-creasing complexity from k to k + 1. Nevertheless this is fast enough thatwith positive probability the kludged protocol survives forever.6

1.1 Kludges In The Brain

Neuroscience views evolution as having progressively built higher func-tioning systems on top of more primitive systems, each time leaving thelatter essentially intact.7 The resulting patchwork often bears all of the hall-marks of a kludge. The brain itself is built of neurons, which as evidencedby their presence in jellyfish and snails are evolutionarily ancient build-ing blocks of neurocircuitry. Unfortunately they don’t scale well: they areenergy-inefficient, noisy, and slow; hardly the tools one would have chosento build a complex, conscious brain.

But the most glaring clashes come in the interaction between consciousdecision-making and the more primitive systems that provide the data guid-ing our decisions. For example, the brain’s system of passive, context-driven memory is a poor match for a conscious decision-making centerwhich needs arbitrary access to memories. Hence, unlike a computer whichrecalls memories by their addresses, when we need to find lost keys we “re-trace our steps.” This is an attempt to manipulate the passive memory sys-tem, conjuring up the requisite context in hopes of shaking the lost memoryout. “Blindsight” reveals the inefficient integration between primitive vi-sual centers in the brain and more evolutionarily recent visual modules inthe cortex. Blind subjects asked to locate an object held before them are,naturally, unable to make any guess with confidence. But when asked toreach for the object, some subjects can consistently locate it and grab it.

1.2 Organism Design

Structurally inefficient decision-makers present a problem not just for pos-itive methodology, but normative as well. Much of welfare economics isfounded on revealed preference and agent sovereignty. These principles

6Sandholm and Pauzner (1998) make a similar point in a coordination game with a grow-ing population. There, even very slow population growth precludes escape from inefficientequilibria.

7Linden (2007) and Marcus (2008) are excellent popular accounts.

5

Page 7: Abstract - Northwestern University

presume that the choices we observe reveal what benefits the agent. Butwhen the adaptive process creates a wedge between observed behavior andthe underlying objective the agent is designed to satisfy, there is a corre-sponding wedge between revealed preference and true preference. Put dif-ferently, if we grant that there is some underlying objective that guides theadaptive process, then at best we can view the organism as an agent whoseefforts at achieving that objective are the result of a second-best solutiondesigned by nature, the principal. We can no better infer that underlyingobjective from the choice behavior of the organism than we can identifythe distorted choices made by an incentivized agent with the principal’sfirst-best solution.

Indeed, this principal-agent metaphor is the basis of an increasinglypopular methodology for behavioral economics. For example, Robson (2001)studies the biological rationale for hedonic utility. His model shows thatutility can be understood as an optimal compensation scheme for an agentwho has private information about the fitness consequences of various con-sumption bundles. Implicit is the interpretation that natural selection canbe equivalently regarded as a fitness-maximizing principal with a freedomto design the agent’s preferences limited only by asymmetric information.The bottom-line of such a model is an agent whose revealed preference ex-actly coincides with nature’s first-best.

By contrast, interesting non-standard preferences can be generated by asimilar model in which metaphorical nature is assumed to face additionalconstraints. For example, Samuelson and Swinkels (2006) consider a de-sign problem in which the agent necessarily makes errors in informationprocessing and nature’s incentive scheme must trade off the value of in-corporating the agent’s private information about the local environmentagainst the risks of granting too much leeway to imperfectly formed beliefs.Constrained to use the blunt instrument of utility to provide incentives tothe agent, nature’s optimal design necessarily induces behavioral biases,including self-control problems, menu-dependence, and present-bias. InRayo and Becker (2007) nature is constrained to use outcome-contingentrewards to motivate the agent and there are limits to the granularity of this“happiness” instrument. The resulting optimal incentive scheme is equiv-alent to reference-dependent preference. In dynamic decision problems,certain behavioral responses can substitute for expanded memory and inBaliga and Ely (2009), nature’s design economizes on memory capacity byutilizing this trade-off. The result is observationally equivalent to a sunk-cost bias.

In each of these models, the conclusions are driven by the particular

6

Page 8: Abstract - Northwestern University

constraint imposed on nature’s representative, the principal. It follows thatthe same evolutionary argument can turn each one of these conclusions onits head. Nature is appropriately modeled as an optimizing principal onlyif natural selection can be assumed to operate long enough to reach an op-timum. But then there should also have been ample time for nature to relaxthese constraints. In the language of incentives, because there is no intrin-sic conflict of interest between principal and agent, in the long run naturesimply “sells the firm” to the organism. Whatever residual effect of theconstraints persists should have negligible costs.8 Equivalently, observablebehavior should be arbitrarily close to the first-best, and hence free from(costly) behavioral anomalies.

The present paper provides a defense of this methodology against sucharguments. On the one hand, the model yields arbitrarily large innova-tions in the design of the organism. It follows that each “component” ofthe organism is optimally designed taking as given the existence of, andinteractions with, other components. That is, the organism is optimal sub-ject to certain design constraints. And on the other hand, these constraintsneed not be eliminated despite the fact that arbitrarily large innovations oc-cur infinitely often. Indeed, their presence can have non-vanishing shadowcosts even in the long run.

2 The Model

An organism is designed to solve a fixed decision problem, instances ofwhich are presented to the organism repeatedly over time. The decisionproblem has the following interpretation. A resource is available at a cer-tain location. The location is realized independently in each period. Signalswhich reveal the location of the resource are available to the organism. Theproblem for the organism is to input these signals, interpret them, and thenchoose a location in attempt to extract the resource. At any given point intime, the organism is subject to a complexity constraint which imposes atechnological trade-off between these activities. The optimal design of theorganism at a given level of complexity will balance this trade-off. Overtime, the organism will evolve by increasing complexity and the design of

8Of course there are physical constraints which could not be eliminated no matter howlong nature is left to act. But the appropriate comparison here is between the outcome ofthe evolutionary process as modeled by a designer subject to physical and non-physicalconstraints and the design that is optimal subject only to the physical constraints. Theargument here is that all residual internal design inefficiencies should, in the long run, havenegligible bottom-line consequences.

7

Page 9: Abstract - Northwestern University

the organism will adjust to the relaxed constraint. We describe the long runbehavior of this evolutionary process.

The Decision Problem A resource is hidden at a location θ ∈ [−1, 1]which is drawn from the uniform distribution. The organism will choose alocation a and search intensity i. The payoffs are

u(a, i, θ) = i[2aθ − a2] . (1)

These payoffs model the following trade-offs. First, for any given searchintensity, the organism would like to choose a location a as close as possibleto θ. Likewise, for any given a, the payoffs are increasing in search intensity.The complexity constraint (to be described in the next section) will implya technological trade-off between these to be balanced against the trade-offin payoffs captured by Equation 1. Beyond capturing these trade-offs, theprecise functional form is for computational convenience only. We refer tothe expression 2aθ − a2 as the payoff per-unit of intensity, or average payoff.

In the environment there is a collection of signals σ available to theorganism which convey information about the current location of the re-source. The structure of correlation betwen σ and θ is given by a fixed jointdistribution to be specified later. New realizations of both are drawn fromthis distribution in each of an infinite sequence of periods, independentlyover time.

Formally, σ = σ1, σ2, . . . ∈ {−1,+1}∞ is an infinite sequence of signalsfrom {−1,+1} and the organism is able to collect a finite sample of size k.The parameter k is referred to as the precision of the organism.

The organism has a strategy which selects a location to search as a func-tion of the sample σ1, . . . , σk. At any point in time, this strategy is char-acterized by another finite sequence π = π0, π1, . . . , πk ∈ {−1,+1}k+1 inaccordance with the following rule. When the organism observes the sam-ple σ1, . . . , σk it applies this formula

a(σ1, . . . , σk|π) =π0

k + 2

k

∑j=1

πjσj. (2)

to select a location to search. To interpret this, note that the sign of π0πjexpresses whether the organism acts as though signal σj is positively ornegatively correlated with the location θ. The organism then obtains con-ditionally expected payoff

i[2a(σ1, . . . , σk|π) · E(θ|σ1, . . . , σk)− a(σ1, . . . , σk|π)2] (3)

8

Page 10: Abstract - Northwestern University

The sequence π0, . . . , πk can be viewed as the part of the genetic codeof the organism that represents the organism’s current “model” of the truecorrelation structure in the environment. Throughout the paper I will referto π as the genetic code, noting that this is a slight misuse of terminologybecause both i and π determine the behavior of the organism and both willbe subject to mutation.

We have assumed that the organism uses an algorithm from a particularclass, namely those parametrized by Equation 2. When the parameters ofthe environment are defined next, we will show that the optimal algorithmis always a member this class.

Computation and Complexity The organism is subject to a complexityconstraint. We model this by assuming that the organism has a fixed ca-pacity x of “energy” or computational capacity with which to implement itsstrategy. The complexity cost includes the computational cost of process-ing signals plus the costs of search intensity. We model computational costsby assuming that each arithmetic operation (adding, multiplying by −1) inthe formula Equation 2 has unit cost.9 Thus, the strategy a(σ1, . . . , σk|π)requires

κ(π) = k +k

∑j=0

1{πj=−1}

computational steps to implement,10 and an organism of complexity x wouldthen have a search intensity equal to the residual

i = x− κ(π). (4)

The optimal design maximizes Equation 3 subject to this complexity “bud-get” constraint.

The Environment Recall that π describes the organism’s model of thecorrelation between σ and θ. Now I describe the true correlation structure.The environment is described by a parameter λ which specifies for each jwhether σj is positively or negatively correlated with θ. Formally, λ is an

9 As will become clear below these computational costs are best thought of as repre-senting the costs of transforming the data into a common format for aggregation and notliterally the costs of doing arithmetic.

10or convenience we exclude the cost of dividing by k + 2 as this cost will be common toall algorithms in this class.

9

Page 11: Abstract - Northwestern University

infinite sequence λ = λ1, λ2, . . . ∈ {−1,+1}∞ that is determined at the be-ginning of time according to an i.i.d. process with Prob(λj = 1) = l > 1/2.Recall that in each period, θ is drawn from the uniform distribution. Next,conditional on the realized location θ, each signal σj is chosen indepen-dently with

Prob(σj = λj) = θ̂. (5)

where θ̂ = (θ + 1)/2 is just θ ∈ [−1,+1] normalized to a probability. Tounderstand this structure, first consider a signal j for which λj = 1. Inthis case, observing σj = 1 indicates that the resource is likely to be locatedfurther to the right, whereas σj = −1 indicates that the resource is likelyto be located further to the left. However, when λj = −1, these inferencesare reversed. For this reason, if λj = −1, then we say that the jth signal isinverted.

3 The Optimal Organism

The first step in the analysis is to solve the “engineering benchmark” prob-lem of maximizing payoffs for any fixed level of complexity under the as-sumption that λ is known. This will be our point of comparison when welater analyze the evolutionary outcome when the organism must adapt toλ over time by natural selection.

Let us first ignore complexity costs and begin by characterizing the first-best algorithm for a given precision k. Inspection of the objective functionin Equation 1 reveals that for any given sample of size k, the first-best strat-egy computes the conditional expectation E(θ|σ1, . . . , σk) and selects loca-tion

a∗k (σ1, . . . , σk) := E(θ|σ1, . . . , σk).

An explicit formula for the conditional expectation follows from standardproperties of binomial sampling.11

θ̄k := E(θ|σ1, . . . , σk) =1

k + 2

k

∑j=1

λjσj. (6)

11Write θ̂ = (θ + 1)/2. Then θ̂ is distributed uniformly on [0, 1] and observation of{λjσj}k

j=1 is a binomial sampling process from {−1, 1} with unknown probability θ̂ that

λjσj = 1. It is a standard result that in this case the posterior distribution of θ̂ is a Beta dis-tribution with parameters (ζ1, ζ2) where ζ1 is one plus the number of j for which λjσj = 1and ζ2 is one plus the number of j for which λjσj = −1. The expectation of the Beta distri-

bution is ζ1ζ1+ζ2

. This yields Equation 6 after some algebra.

10

Page 12: Abstract - Northwestern University

Therefore the optimal average payoff is given by the random variable

2a∗k θ̄k − (a∗k )2 = θ̄2

k

whose expectation with respect to σ is

(θ̄2

k)= var(θ̄k) (7)

since the unconditional expectation of θ is equal to zero and therefore bythe law of total probability so is Eσ

(θ̄k).

Thus, the variance of the estimator θ̄k will play an important role in theanalysis. For future reference, in the appendix it is shown that this varianceincreases monotonically in the precision12

var(θ̄k−1

)< var

(θ̄k)< var (θ) = 1/3 (8)

for all k and moreover that the incremental variance is asymptotically onthe order of 1/k2:

var(θ̄k)− var(θ̄k−1) =1− var(θ̄k−1)

(k + 2)2 (9)

Note for future reference that the covariance between λjσj and λj′σj′ is pos-itive.

cov(λjσj, λj′σj′) > 0 (10)

Minimizing Computational Costs We have shown that the first-best de-sign for a given k achieves average payoff var(θ̄k). Using this result, afterincorporating complexity costs, we obtain the following formulation of theoptimal design for a fixed level of complexity x.

maxπ

i · var θ̄k

subject to a(·|π) = θ̄k

x = i + κ(π)

A solution to this problem necessarily economizes the complexity constraintby choosing the genetic code π to minimize the computational costs of im-plementing the estimator θ̄k. This cost-minimization problem is the centralidea of the model.

12 Intuitively, higher precision means a better estimate implying that the estimate is moresensitive to the variation in the state.

11

Page 13: Abstract - Northwestern University

Compare the class of algorithms in Equation 2 and the optimal estima-tor in Equation 6. It is easy to see that there are two types of organism thatachieve the optimum. A positively-aligned organism is one with π0 = +1and πj = λj for j = 1, . . . , k. A negatively-aligned organism is one withπ0 = −1 and πj = −λj for j = 1, . . . , k. In both cases, the resulting strategya(σ1, . . . , σk|π) is identically equal to θ̄k.

On the other hand, these two design protocols typically differ in theircomputational costs. The budget constraint for a positively aligned organ-ism reduces to

x = i +k

∑j=1

(3− λj

2

)whereas a negatively aligned organism yields the following budget con-straint

x = i +k

∑j=1

(3 + λj

2

). (11)

Thus, the gene π0 adds flexibility to the design of the organism potentiallyallowing it to economize on computation.13

The diagrams in Figure 2 illustrate the optimal design for a fixed com-plexity x. The budget lines capture the tradeoff between intensity and pre-cision for positively- (dashed) and negatively- (solid) aligned organismsrespectively. Adding the jth unit of precision requires one or two addi-tional computational steps, depending on the alignment and the value ofλj. The “indifference curve” is the set of pairs (i, k) which achieve the sameexpected total payoff i var(θ̄k).

Figure 2(a) shows a case in which the optimal organism is negativelyaligned. As the organism increases in complexity, the budget lines shiftup, potentially switching the alignment of the optimal organism. This isillustrated in Figure 2(b). Indeed, the optimal alignment depends on thesign of the moving average

L(k) :=1k

k

∑j=1

λj.

13Up to now we have assumed that the optimal design must implement θ̄k and searchedfor the algorithm that minimizes computational costs. We can see from the budget equa-tions that the optimum indeed must implement θ̄k. If instead π0πj 6= λj then the jth signalis biasing the estimate and also consuming computational steps. A better design wouldreduce precision by one and increase intensity.

12

Page 14: Abstract - Northwestern University

(a) Low x. Negative alignment (solid line)is optimal.

(b) Higher x. Budget lines shift upward andnow positive alignment is optimal.

Figure 2: Optimal organism for a fixed level of complexity x.

If it is positive, then the fraction of inverted signals up to k is less than1/2, and the optimal organism will be positively aligned. The negativelyaligned organism is optimal in the alternative case.

Recall that we have assumed that l > 1/2. This implies that for suffi-ciently complex organisms, positive alignment is optimal.14 A convenientway to visualize this is to consider k sufficiently large so that L(k) ≈ 2l − 1and the two budget lines are approximately

x ≈ i + k (2− l)

for positive alignment and

x ≈ i + k (1 + l)

for negative alignment. This is illustrated in Figure 3.

14What is important is that a particular structure is eventually optimal forever, not thatit is necessarily positive alignment. The model could easily be extended to allow the even-tually preferred alignment to be randomly determined. This would capture the point thatthe flexibility of the organism is beneficial ex ante because the preferred alignment cannotbe predicted in advance.

13

Page 15: Abstract - Northwestern University

Figure 3: Optimal alignment for large k.

4 Kludge

To this point we have been analyzing the design of the organism from theperspective of an engineer who knows λ. The focus now turns to a modelof the adaptive process whereby the genetic code π is instead shaped overtime by the force of natural selection.

First, note that for sufficiently complex organisms, positive alignmentyields a greater budget. Once this is the case, any negatively aligned organ-ism is attempting to implement the optimal decision rule via an inefficientprotocol. For this reason and reasons developed further below, we refer tosuch an organism as a kludge.

Definition 1. Suppose that the fraction of inverted signals up to k is less than1/2, i.e.

1k

k

∑j=1

λj > 0.

Then we say that an optimal negatively aligned organism with precision k is akludge.

14

Page 16: Abstract - Northwestern University

We can quantify the inefficiency of a kludge of complexity x. A switchto positive alignment would produce an organism of the same precisionbut strictly higher intensity. Indeed the intensity and therefore the payoffcan be increased by a number which (on average) increases linearly in k.

However, this measure may be hard to interpret as it depends on a car-dinal interpretation of payoffs. As an alternative, let us define the follow-ing ordinal concept of inefficiency of an organism. Say that the organismis asymptotically structurally inefficient if there is a specific gene (here π0)such that at some point in time, and forever thereafter, there exists an im-provement which involves altering this gene, and yet no such change everoccurs.15

5 The Adaptive Process

The final ingredient in the model is a description of the process by whichthe organism evolves. I adopt a simple model of mutation and naturalselection designed to capture the effects of a general class of adaptive pro-cesses. The specific assumptions are chosen mostly for expositional andanalytical convenience.

The organism O is parametrized by the pair O = (x, π). We consideradaptation of a single organism over time. Let Ot denote the organismexisting at time t. Each period t, the organism Ot = (x, π) is evaluatedaccording to its overall payoff given by

V(Ot) = Eθ,σ (x− κ(π))[2θa(σ1, . . . , σk|π)− a(σ1, . . . , σk|π)2] .

In each period, with positive probability, a variation occurs which re-sults in a new version O′ of the organism. We apply the following simplis-tic model of natural selection. If the variant O′ is more fit, i.e. V(O′) >V(Ot), then the variant replaces the existing version and survives to date

15A virtue of this definition is that it excludes “marginal inefficiencies” where at any pointin time some inefficiencies are present, but every inefficiency, once it appears, is eventuallyeliminated. For example, we may imagine that the most recently developed features of theorganism might begin in an inefficient state, but eventually as the organism matures, thesefeatures are improved to their optimal state and align optimally with the rest of the organ-ism. By contrast, asymptotic structural inefficiency identifies persistent mis-alignments.It would be desirable to sharpen the definition even further by considering dynamic effi-ciency issues. Without going into the details of such a definition, I note that the kludgesin this paper represent static as well as dynamic inefficiencies. In addition to outperform-ing a kludge at each point in time, positively aligned organisms also grow in intensity andprecision faster than kludges.

15

Page 17: Abstract - Northwestern University

t + 1, that is Ot+1 = O′. If not, then the existing version survives, i.e.Ot+1 = Ot.16

Two types of variations can occur which we call expansion and mutation.First, with probability q, the organism undergoes expansion. This involveseither increasing complexity x or increasing the length of the genetic codeπ, by one unit. Assume that the organism always expands on the dimen-sion which results in a higher payoff and that the “gene” πk+1 at the addedlocus k + 1 is set optimally given the existing π.17

To illustrate and to preview formal arguments to come, suppose at timet, the organism is a kludge with complexity x and precision k. Let Vx,kdenote the overall payoff of such an organism. If expansion occurs andVx+1,k > Vx,k+1 then the organism will increase complexity by one unit,otherwise it will increase precision (by increasing π.)

To trace out the evolution of a kludge, it is useful to define the followingsequence of numbers {xk}∞

k=1. First, set x1 = 1. Next, for k > 1, definexk to be the smallest x ≥ xk−1 such that Vx,k > Vx+1,k−1. As the organismexpands, it will improve along the relevant margin as illustrated in Figure 4(beginning from some initial level of precision k̄.)

On the other hand18, with probability (1− q) the organism instead un-dergoes a mutation which alters the genetic code π. Mutations are modeledas follows. A subset of the “loci” {0, 1, . . . , k} is selected and the genes πjat those loci change sign. We will assume that all subsets have positiveprobability.19 Apart from this, we can be quite general in terms of the prob-abilities with which various subsets are selected. One simple model wouldbe the following case of independent mutations: there is a fixed mutationprobability µ > 0 and each gene πj changes sign with independent proba-bility µ.

16It is obviously a simplification to use expected fitness as the criterion, but this is an ap-proximation to an environment in mutations arrive infrequently. In particular, we can thinkof each period as representing a length of time sufficiently long that the actual performanceapproximates the average.

17These assumptions keep the analysis simple. A general model would set the sign ofπk+1 randomly. However, whenever the sign is mis-aligned, (i.e. πk+1π0 6= λk+1) thiswould reduce the payoff of the organism, so there is no loss in ignoring these events. Aprocess in which the organism can expand on either dimension with positive probabilitydelivers the same conclusion as this one but with added notational burden.

18It is an innocuous notational simplification that some variation occurs in every periodwith probability 1.

19This is used only for the benchmark result in Proposition 1.

16

Page 18: Abstract - Northwestern University

Figure 4: Escape Process.

Returning to Figure 4, the crucial mutation events will be those thatresult in a new organism that outperforms a kludge, illustrated by thediagonal arrows. Recall that a kludge is negatively aligned despite posi-tive alignment being optimal. At a minimum, therefore, an improvementwould require changing the alignment, i.e. the sign of π0. Note however,that the kludge has all other genes πj negatively aligned. Thus a changingonly of π0 would result in a complete misalignment of π. This means thatin addition to a change in sign of π0, a mutation must also change the signof sufficiently many πj so that the organism makes an accurate estimate ofthe location of the resource and benefits from the reduced computationalburden of positive alignment. The size of such a mutation, measured bythe number of genes which would be required to change sign, will deter-mine the probability that the organism escapes being kludged.

The key to our result is in showing that the size of this mutation isbounded below by a constant fraction of k. Thus, the probability of escapewill be shrinking as the organism expands in complexity. The independentmutation model would imply that this probability shrinks exponentiallyfast, and this will enable us to show that, with positive probability, an es-cape never occurs.

While the independent mutation model is useful for building intuition,it imposes more structure than is needed for the results. Our assump-tion abstracts the key condition which is the following generalized large-

17

Page 19: Abstract - Northwestern University

deviation property governing the asymptotic probabilities of large muta-tions.

Assumption 1 (Large Deviation Property). Let ωk be a probability distributionover subsets of {0, 1, . . . , k}. We say that the family of distributions {ωk}k∈N

satisfies a large-deviation condition if there exists µ ∈ (0, 1) and a functionM : N → (0, 1) such that M(k) is an upper bound for the probability under ωkof selecting a subset with cardinality larger than µk and

lim supk

M(k + 1)M(k)

< 1.

This is a large-deviation property which asymptotically bounds the prob-ability of mutations to a large fraction m (greater than µ) of genes from anygiven large subset. This probability must shrink to zero at a sufficiently fastrate. Note that by a standard result from large-deviation theory, the inde-pendent mutation model is a special case. We now turn to a formal analysisof the adaptive process.

6 Analysis

First, we consider an instructive benchmark case in which q = 0. In thiscase, the complexity of the organism is fixed and cannot increase. Then,because the mutation probabilities are strictly positive, with probability 1the organism will be optimally adapted after some finite timespan.

Benchmark with q = 0 Consider an arbitrary organism O of complexityx. Let O∗ be an optimal organism of the same complexity. There is a lowerbound on the probability of a mutation large enough to transform any suchO intoO∗. In the worst case, a change to the entire genetic structure will berequired and the probability of such a large mutation is strictly positive byassumption. When q = 0, the organism will never increase in complexityand so this remains forever a lower bound on the probability of reachingan optimally adapted organism in a single step. It follows that with proba-bility 1 the optimal organism will appear eventually. Moreover an optimalorganism can never be replaced if the complexity of the organism cannotincrease.

Proposition 1. When q = 0, with probability 1 the organism is eventually opti-mally adapted, regardless of the initial complexity.

18

Page 20: Abstract - Northwestern University

The proposition shows that any asymptotic inefficiency that arises whenq > 0 is not due to a simple problem of local optima. The model allows forarbitrarily large mutations at any point in time. Thus, any improvement, ofany fixed size, if available for sufficiently long, will eventually be realized.On the other hand, this argument does not apply to improvements whichrequire larger and larger mutations. Potentially, the organism can gradu-ally improve at the margin by increasing in complexity, all the while inten-sifying the complementarity among its components. This would mean thatsubstantial improvements decline in probability. Whether such improve-ments will be realized will depend on the rate at which this probabilitydeclines.

The main result of the paper concerns the case of q > 0 and asymptoticstructural inefficiency.

Theorem 1. Suppose µ < 1/2. When q > 0 there is a positive probability that theorganism will be forever kludged and thus asymptotically structurally inefficient.

6.1 Proof of Theorem 1

Recall that we have assumed that l > 1/2. The parameter l determines theprobability that each λj = +1. As discussed above, what matters for theoptimal design of the organism is the sign of the moving average

L(k) =1k

k

∑j=1

λj.

Because l > 1/2, by the strong law of large numbers, there is probabilityone on the set of environments in which there exists some k̄ such that L(k)is positive for all k > k̄. Throughout the proof, we fix such an environmentλ and integer k̄ and consider the stochastic process of evolution in thatenvironment.

Let O be a kludge. We wish to bound the probability that the organismwill ever be replaced by one that is not a kludge. Since expansion can onlylead to a kludge of larger complexity or precision, the crucial events willinvolve mutations. We say that a beneficial mutation occurs if a mutationproduces an organism O′ which has strictly larger payoff. Let ηx,k be theprobability that a kludge of complexity x and precision k will undergo abeneficial mutation. Consider the following simplified stochastic process.Let the states of the process correspond to the levels of complexity andprecision (x, k) of the organism. Refer to Figure 4. At each state (x, k),

19

Page 21: Abstract - Northwestern University

three transitions are possible. First, with probability q, an expansion occursand the organism transits to the next state which is (x + 1, k) in the case ofx < xk+1 (the horizontal transitions in the figure) or (x, k + 1) in the case ofx = xk+1 (the vertical transitions.) Second, with probability (1− q)ηx,k, theorganism “escapes” and the process terminates. Finally, with the remainingprobability, the process continues and the state is unchanged at (x, k).

The process begins in state (xk̄, k̄). We refer to this process as the escapeprocess. It is a simplification of the true evolutionary process because it doesnot consider the further evolution that occurs after a beneficial mutation.We will analyze the probability that escape never occurs and this will be alower bound for the probability that the organism remains kludged forever.

Lemma 1. The probability that the organism remains asymptotically structurallyinefficient is bounded below by the probability that the escape process never termi-nates. This probability is positive if and only if

∑k=1

xk+1

∑x=xk

ηx,k < ∞.

Proof. We apply a standard technique from the theory of countable-stateMarkov chains20. With each non-terminal state (x, k), we associate a vari-able Zx,k. We consider the system of equations which expresses each Zx,kas a linear combination of other variables where the weights are the non-terminal transition probabilities. This yields the following system.

Zx,k = qZx+1,k + (1− q)(1− ηx,k)Zx,k x ∈ {xk, . . . , xk+1 − 1} (12)Zx,k = qZx,k+1 + (1− q)(1− ηx,k)Zx,k x = xk+1 (13)

The probability is positive that the system will never reach the terminalstate if and only if the system possesses a bounded, non-zero solution. Toestablish this, we set Zxk̄ ,k̄ = 1, rewrite the system as follows

Zx+1,k =

(1 +

(1− q)ηx,k

q

)Zx,k x ∈ {xk, . . . , xk+1 − 1} (14)

Zx,k+1 =

(1 +

(1− q)ηx,k

q

)Zx,k x = xk+1, (15)

and recursively substitute to obtain

Zx,k =k−1

∏l=k̄

xl+1

∏y=xl

[1 +

(1− q)q

ηy,l

x−1

∏y=xk

[1 +

(1− q)q

ηy,k

]. (16)

20See (Billingsley, 1995, Theorems 8.4 and 8.5)

20

Page 22: Abstract - Northwestern University

The solution is bounded iff∞

∏k=1

xk+1

∏x=xk

[1 +

(1− q)q

ηx,k

]< ∞ (17)

which is equivalent to21

∑k=1

xk+1

∑x=xk

[(1− q)

qηx,k

]< ∞. (18)

The latter is equivalent to the condition in the statement of the lemma.

We now present two lemmas which establish a bound on the seriesidentified in Lemma 1.

Lemma 2. There exists a function M(k) such that ηx,k ≤ M(k) for all k and x,and

lim supk

M(k + 1)M(k)

< 1. (19)

Lemma 3. There exists a function C(k) such that xk+1 ≤ C(k) for all k and

limk

C(k + 1)C(k)

= 1. (20)

Combining lemmas 2 and 3 enables us to conclude the proof of Theo-rem 1 as they establish the bound

∑k=1

xk+1

∑x=xk

ηx,k ≤∞

∑k=k̄

C(k)M(k) < ∞

which by Lemma 1 is enough to prove the theorem.We now turn to the proofs of Lemma 2 and Lemma 3.

Proof of Lemma 2. We will show that for any x, the probability ηx,k is smallerthan the probability that at least k/2 genes are selected for mutation.

Consider a mutation which resulting in the genetic code π with π0 =+1. Let A be the subset of {π1, . . . , πk} which also undergo mutation, andassume that A consists of fewer than k/2 elements. We have

πj =

{λj for πj ∈ A−λj for πj ∈ {π1, . . . , πk} \ A

21Note that for any sequence of positive numbers Rn, 1 + ∑x1 Rn ≤ ∏x

1(1 + Rn) ≤exp(∑x

1 Rn).

21

Page 23: Abstract - Northwestern University

because the organism was negatively aligned prior to the mutation. LetB be a subset of πj ∈ {π1, . . . , πk} \ A that has the same cardinality as A.(This is possible because A has fewer than k/2 elements.)

The expected average payoff to the new organism is

[2a(σ|π) · E(θ|σ)− a(σ|π)2]

which, expanding the definitions of a(σ|π) and E(θ|σ) is less than

[(2

k + 2

)2 k

∑j=1

πjσj

k

∑j=1

λjσj

]

which we can decompose into three terms

{(2

k + 2

)2 k

∑j=1

λjσj

[∑j∈A

λjσj −∑j∈B

λjσj − ∑j/∈A∪B

λjσj

]}.

Note that because the sequence {λjσj} is i.i.d. conditional on θ, by de Finetti’stheorem it is an exchangeable sequence and therefore,

{k

∑j=1

λjσj

[∑j∈A

λjσj −∑j∈B

λjσj

]}= 0.

And recalling that for any j, j′, the covariance of λjσj and λj′σj′ is positive(Equation 10), the remaining term

[−

k

∑j=1

λjσj · ∑j/∈A∪B

λjσj

]

is negative. Thus, the expected average payoff of the new organism is neg-ative. The total expected payoff is therefore lower than that of the originalorganism. This shows that a mutation is beneficial only if it affects morethan k/2 genes. By the large deviation property, the probability of such amutation is bounded by M(k) where

lim supk

M(k + 1)M(k)

< 1.

22

Page 24: Abstract - Northwestern University

Proof of Lemma 3. Recall that xk is defined to be the smallest x ≥ xk−1 suchthat Vx,k > Vx+1,k−1. Because a kludge is negatively aligned and nega-tive alignment achieves the optimal average payoff var θ̄k we can apply thebudget formula in Equation 11, to express xk as follows.[

xk − 1−k

∑j=1

3 + λj

2

]var θ̄k =

[xk −

k−1

∑j=1

3 + λj

2

]var θ̄k−1

up to an integer. Rearranging, we can obtain the bound

xk ≤3 var θ̄k

var θ̄k − var θ̄k−1+ 2k

From Equation 8 and Equation 9 the first term is bounded by a constantχ multiplied by (k + 2)2 so that

xk ≤ C(k) := (k + 2)2χ + 2k

with

limC(k + 1)

C(k)= lim

(k + 3)2χ + 2(k + 1)(k + 2)2χ + 2k

= lim(k + 3)2

(k + 2)2 = 1.

7 Extensions

The model is purposefully simple. However a number of natural and in-structive extensions are possible. Here I will sketch two.

First, the model of natural selection does not have any serious compe-tition across organisms. And indeed in its simple form here, competitioncould eliminate the inefficiency as long as we are assured that at least oneindividual in the population begins positively aligned. This will be truewith large probability in a large population. However, it is easy to en-rich the decision problem so that with probability 1 any individual will bekludged and hence so will all individuals in the population. What followsis a sketch of such an extension.

Suppose that now there are two ways in which an organism can prof-itably increase in complexity. First, it may improve its procedure for solvinga given resource extraction problem as in the present model. Second now,the organism may evolve the ability to extract a new kind of resource in ad-dition to the old. Each problem that the organism is currently equipped to

23

Page 25: Abstract - Northwestern University

solve has its own apparatus identical in structure to the one modeled here,however independently evolving. And the organism always benefits fromincreasing in complexity in this way. Then the result from this paper thatan individual problem will be kludged with positive probability impliesthat every member of the population, even the most fit, will be kludged inthe way it solves a positive, and bounded, fraction of problems.

The second extension uses competition not as a robustness test, but todraw new conclusions from the model. Suppose that organisms competefor resources within a niche. There is a hierarchy of niches ordered by thecomplexity necessary to exploit each niche. At the beginning, simple organ-isms inhabit the simplest niche and evolve to adapt to that niche. Soon, effi-cient simple organisms emerge and grow the saturation point. This createsan incentive for organisms to increase in complexity in order to move intothe unoccupied more complex niche. This process continues. The resultof this process would be that some “species” stop increasing in complexitywhile others are perpetually driven to increasing complexity. Those thatstop, the simplest organisms, will eventually become efficiently adapted totheir niche as in Proposition 1. By contrast, the leading edge organisms willbe better described by Theorem 1, potentially remaining kludged forever.

A Appendix

Here we prove the statistical result given in Equation 9 and Equation 8.Define τj = λjσj. Note that the Eτj = 0 and so by Equation 6, Eθ̄k = 0

and hence var(θ̄k) = Eσ

(θ̄2

k

). We calculate

E(τj|τ1, . . . , τj−1

)= Prob(τj = 1|τ1, . . . , τj−1)− Prob(τj = −1|τ1, . . . , τj−1)

and by Equation 5 and the law of total probability,

= E( θ + 1

2|τ1, . . . , τj−1

)− E

(1− θ

2|τ1, . . . , τj−1

)=

θ̄k−1 + 12

− 1− θ̄k−1

2= θ̄k−1. (21)

From Equation 6,

θ̄k =

(k + 1k + 2

)θ̄k−1 +

τk

k + 2,

24

Page 26: Abstract - Northwestern University

so

var(θ̄k) = E

[(k + 1k + 2

)2

θ̄2k−1 + 2

(τk

k + 2

)(k + 1k + 2

)θ̄k−1 +

1(k + 2)2

]

= Eτ1,...,τj−1 E

[(k + 1k + 2

)2

θ̄2k−1 + 2

(τk(k + 1)(k + 2)2

)θ̄k−1 +

1(k + 2)2 |τ1, . . . , τj−1

]

Applying Equation 21

= Eτ1,...,τj−1

[1

(k + 2)2 +2(k + 1) + (k + 1)2

(k + 2)2 θ̄2k−1

]and we have

var(θ̄k)− var(θ̄k−1) = E[

1(k + 2)2 −

(1− 2(k + 1) + (k + 1)2

(k + 2)2

)θ̄2

k−1

]= E

[1

(k + 2)2 −1

(k + 2)2 θ̄2k−1

]=

1− var(θ̄k−1)

(k + 2)2 .

establishing Equation 9.Now to show Equation 8, note that the sequence of random variables θ̄k

converges in distribution to the random variable θ and therefore var(θ̄k)→var(θ). Equation 9 implies that the former is monotonically increasinghence var(θ̄k) < var(θ) for all k.

References

BALIGA, S., AND J. C. ELY (2009): “Mnemonomics: Sunk Cost Bias as aMemory Kludge,” working paper.

BILLINGSLEY, P. (1995): Probability and Measure. Wiley and Sons.

DAWKINS, R. (1986): The Blind Watchmaker. W.W. Norton.

FRIEDMAN, M. (1966): “The Methodology of Positive Economics,” in Es-says in Positive Economics. University of Chicago Press.

LINDEN, D. (2007): The accidental mind. Belknap Press.

25

Page 27: Abstract - Northwestern University

MARCUS, G. (2008): Kluge: The haphazard construction of the human mind.Houghton Mifflin Harcourt.

RAYO, L., AND G. BECKER (2007): “Evolutionary Efficiency and Happi-ness,” Journal of Political Economy, 115(2), 302–337.

ROBSON, A. (2001): “Why Would Nature Give Individuals Utility Func-tions?,” Journal of Political Economy, 109, 900–914.

SAMUELSON, L., AND J. SWINKELS (2006): “Information, Evolution andUtility,” Theoretical Economics, 1(1), 119–142.

SANDHOLM, W., AND A. PAUZNER (1998): “Evolution, Population Growth,and History Dependence,” Games and Economic Behavior, 22, 84–120.

26