
Physical Separation of DNA According to Royal Road Fitness

David Harlan Wood
Department of Computer and Information Sciences
University of Delaware, Newark DE
[email protected]

Junghuei Chen
Department of Chemistry and Biochemistry
University of Delaware, Newark DE 19716
[email protected]

Abstract- We want to implement evolutionary computation using DNA, with trillions of candidate solutions being simultaneously evaluated for fitness. Unsurprisingly, the most difficult aspect is designing and implementing laboratory methods for physical separation of DNA strands according to “fitness.” We propose a DNA strand design suited to the classical Royal Road Problem, and we propose companion laboratory operations that would physically separate these DNA strands according to the Royal Road fitness criterion.

1 Introduction

Since the beginning of DNA computing, there have been calls to implement Evolutionary Computation using DNA. One envisions populations of trillions of candidates being simultaneously evaluated for fitness! An often-cited reason for believing this may be possible is that so-called “in vitro evolution” is an established part of molecular biology. In vitro evolution starts with a randomized population of DNA strands. Mutation, and sometimes crossover, is used in each generation to breed DNA strands of higher “fitness.”

Unsurprisingly, the most difficult aspect of implementing genetic algorithms in DNA is designing and implementing laboratory methods for selection by fitness.

2 Genetic Algorithms Using DNA

Recent expositions of DNA computing can be found in [14] and [24]. See also the DNA computing bibliography of Dassen [8].

There have been calls in the literature [13, 26, 32] to use molecular materials for evolutionary computations. A few preliminary proposals have been given. In a recent DIMACS Workshop an approach to the maximum clique problem was proposed [1], as well as approaches to some other classic genetic algorithm test problems [3]. Recently, preliminary results for the Max 1s problem were given [41]. The earliest design was presented in outline in 1997 [10], but no laboratory results have been obtained so far.

Of all computing paradigms inspired by evolution, genetic algorithms seem particularly suited to implementation using DNA. This is because genetic algorithms generally use bitstrings, crossover, and pointwise mutation. DNA computing could do trillions of fitness evaluations at the same time (if they are simple enough). The cost of DNA computing is proportional to the number of generations required. We attempt to minimize the number of generations required by making use of both pointwise mutation and crossover.
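The two variation operators named above can be illustrated on ordinary bitstrings. The following is a minimal software sketch, not a description of the laboratory operations; the function names, the per-bit mutation probability `q`, and the choice of single point crossover with a uniformly random cut are assumptions made for illustration.

```python
import random


def point_mutate(bits, q, rng):
    """Flip each bit independently with probability q (pointwise mutation)."""
    return [b ^ 1 if rng.random() < q else b for b in bits]


def single_point_crossover(a, b, rng):
    """Swap the tails of two equal-length parents at one random cut point."""
    if len(a) != len(b):
        raise ValueError("parents must have equal length")
    cut = rng.randrange(1, len(a))  # cut lies strictly inside the string
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


rng = random.Random(0)          # fixed seed so the run is repeatable
p1 = [0] * 12                   # hypothetical parent bitstrings
p2 = [1] * 12
c1, c2 = single_point_crossover(p1, p2, rng)
m = point_mutate(c1, q=0.1, rng=rng)
print(c1, c2, m)
```

Each child keeps the head of one parent and the tail of the other, so crossover can combine blocks that were perfected in different individuals, while mutation supplies new variation within a block.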

DNA computing techniques are desirable for genetic algorithm computations for several reasons, some of which are listed below.

• These techniques might process, in parallel, populations which are billions of times larger than is usual for conventional computers. The expectation for larger populations is that they can sustain larger ranges of genetic variation and thus can generate high-fitness individuals in fewer generations.

• Massive information storage is available using DNA. For example, grams of DNA could eventually be used. A gram of DNA contains about $10^{21}$ bases. This information content is approximately $2 \times 10^{21}$ bits, greatly exceeding the 200 petabyte storage of all the digital magnetic tape produced in one year [39].

• Modifications to the current technology of in vitro evolution suffice to implement pointwise mutation [15, 27, 34, 35, 36] and crossover [30, 31, 33].

• Biolaboratory operations on DNA inherently involve errors. These are more tolerable in executing genetic algorithms than in executing deterministic algorithms. To some extent, errors may be regarded as contributing to desirable genetic diversity.
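The storage comparison in the second item can be checked with a short calculation. The sketch below assumes 2 bits per base (four possible bases) and decimal petabytes of 10^15 bytes; both conventions are assumptions made here, since the text does not state them.

```python
# Back-of-envelope check of the DNA versus magnetic tape comparison.
BITS_PER_BASE = 2          # assumption: 4 bases (A, C, G, T) -> log2(4) = 2 bits
BASES_PER_GRAM = 10**21    # figure quoted in the text
TAPE_PETABYTES = 200       # figure quoted in the text, from [39]

dna_bits = BITS_PER_BASE * BASES_PER_GRAM
tape_bits = TAPE_PETABYTES * 10**15 * 8   # assumption: 1 PB = 10**15 bytes

ratio = dna_bits / tape_bits
print(f"DNA, one gram: {dna_bits:.1e} bits")
print(f"Tape, one year: {tape_bits:.1e} bits")
print(f"Ratio: about {ratio:.0f} to 1")
```

Under these assumptions a single gram of DNA holds on the order of a thousand times the quoted annual tape capacity.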

However, selecting DNA strands for “breeding” in genetic algorithms can be challenging because one must physically separate DNA strands according to their “fitness.”

3 The Royal Road Problem

The “Royal Road” family of problems is of particular interest because it is one of very few families of problems for which theoretical predictions are available [38].

A fixed-length target is specified consisting of N blocks, each block consisting of K bits. Each block of a candidate bitstring makes no contribution to fitness unless it is a perfect match to the corresponding block of the target. Conventionally, the fitness of a candidate is taken to be the number of such perfectly matched blocks. The objective is to evolve some bitstrings to perfectly match the target.
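The conventional fitness just defined can be written out directly. This is a minimal sketch on strings of '0' and '1'; the function name and the example target (N = 3 blocks of K = 6 ones) are illustrative assumptions.

```python
def royal_road_fitness(candidate, target, K):
    """Count the K-bit blocks of candidate that exactly match the target."""
    if len(candidate) != len(target):
        raise ValueError("candidate and target must have equal length")
    if len(target) % K != 0:
        raise ValueError("target length must be a multiple of K")
    return sum(
        candidate[i:i + K] == target[i:i + K]
        for i in range(0, len(target), K)
    )


K = 6
target = "111111" * 3                       # hypothetical target, N = 3
near_miss = "111111" + "101111" + "111111"  # one wrong bit in the middle block
print(royal_road_fitness(target, target, K))      # all blocks match
print(royal_road_fitness(near_miss, target, K))   # middle block scores nothing
```

A block with a single wrong bit contributes exactly as much as a block that is wrong everywhere, which is the all-or-nothing property the DNA design must reproduce physically.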

This family of problems got its name from the fact that it was intended to be especially suitable for genetic algorithms using crossover [25]. Distressingly, computer trials for Royal Road problems exhibit a wide variety of unpleasant convergence behaviors (see Figure 1, reproduced from [38]).

[Figure 1: nine panels, (a) through (i), each plotting average fitness ⟨f⟩ against t. Reference parameters: N=10, K=6, M=500, q=0.001. The remaining panel labels read: M raised to 5000, M raised to 7500, M lowered to 300, M lowered to 50, K lowered to 3, K raised to 15, q raised to 0.005, q raised to 0.0075.]

Figure 1: Evolution of average fitness ⟨f⟩ for a genetic algorithm for Royal Road problems varies greatly with population size M, mutation rate q, and the number of blocks N, each having K bits. Eight graphs show the effects of varying parameters from those used in graph (d). (From [38], with permission.)

Confirming an earlier conjecture [37], recent seminal papers [5, 6, 7, 38] from the Santa Fe Institute predict the previously unanticipated behaviors for the Royal Road problem, attributing them to limitations on population sizes. In future work on Royal Road problems we hope to test the predictions of the Santa Fe papers using population sizes which are too large to be practical for conventional computers.

4 DNA Implementation of Royal Road Fitness

For the Royal Road problem, the fitness of a bitstring is the number of blocks (all of the same given size) which have evolved into a perfect match with a preassigned target. In our proposed design, the (initially random) Royal Road blocks in DNA strands alternate with (distinct and unchanging) specially designed separators.

The separators ensure an alignment and partial stabilization. In such a situation, an evolving block contributes a local stabilization if and only if it is a perfect match with the target. Then, 2d denaturing gradient gel electrophoresis (2d DGGE) can provide physical separation proportional to the number of locally stabilized blocks, that is, proportional to the Royal Road fitness function.

The proposed physical separation by fitness is accomplished according to how well candidate DNA strands match (stick to) DNA “target” strands complementary to the desired outcome. The separators are designed to enforce alignment by sticking only to their complements on the target strand. Further, separators are designed to stick more tightly than do perfectly matched blocks. The key to physical separation according to Royal Road fitness is adjusting experimental conditions so that each perfectly matched block sticks to its proper place in the target strand, but imperfectly matched blocks cannot stick.

4.1 DNA Design of the Target and the Candidate Solutions

The target alternates Royal Road blocks with distinct and unchanging spacers. For an initial investigation, we consider five blocks of 10 Ts to represent the Royal Road blocks. The spacers are taken to be 10 Gs, except that we use 50 Gs on each end to avoid end effects. Gs bond more tightly to their complement than do Ts. Of course, we are temporarily ignoring concerns about the spacers ensuring correct alignment. However, by modeling this simplified design, we are able to argue that, for candidate solutions, we have a way to implement the Royal Road fitness evaluation.
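The simplified target can be spelled out as a sequence. The sketch below reads the description as 50 Gs, then five blocks of 10 Ts separated by runs of 10 Gs, then 50 Gs; this reading, the function name, and the parameter names are assumptions for illustration.

```python
def build_target(n_blocks=5, block_len=10, spacer_len=10, end_len=50,
                 block_base="T", spacer_base="G"):
    """Assemble end spacer + blocks joined by inner spacers + end spacer."""
    blocks = [block_base * block_len] * n_blocks
    inner = (spacer_base * spacer_len).join(blocks)  # spacers only between blocks
    end = spacer_base * end_len
    return end + inner + end


target = build_target()
print(len(target))        # total length in nucleotides
print(target[45:75])      # end of left spacer, first block, first inner spacer
```

Under this reading the strand is 190 nucleotides long: 100 in the two end spacers, 50 in the blocks, and 40 in the four inner spacers.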

The main idea is to use denaturing (tending to separate strands) conditions strong enough that perfectly complementary strands can just barely remain bonded. In this circumstance, even a single mismatch in a Royal Road block (consisting of Ts) is enough to disassociate the entire block. But spacer sequences (consisting of all Cs and Gs) do not disassociate because they are more tightly bound.

Of course, this description is an idealization. It needs to be confirmed experimentally that it approximately physically separates DNA strands according to the number of perfectly matched blocks, that is, by the Royal Road fitness criterion.
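The idealization can be stated as a small model: a block stays bonded only if every base in it pairs with its complement, and the number of bonded blocks is the fitness. The sketch below is this all-or-nothing abstraction only, with no thermodynamics; it ignores strand orientation by comparing the candidate base facing each target position, and the names `bonded_blocks` and `block_spans` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def bonded_blocks(target, facing, block_spans):
    """Count blocks in which every facing base complements the target base.

    facing[i] is the candidate base opposite target[i] (orientation ignored).
    """
    if len(target) != len(facing):
        raise ValueError("strands must be aligned and of equal length")
    count = 0
    for start, end in block_spans:
        if all(COMPLEMENT[t] == c
               for t, c in zip(target[start:end], facing[start:end])):
            count += 1
    return count


# Simplified design: 50 Gs, five blocks of 10 Ts split by 10 Gs, 50 Gs.
target = "G" * 50 + ("G" * 10).join(["T" * 10] * 5) + "G" * 50
block_spans = [(50 + 20 * i, 60 + 20 * i) for i in range(5)]

perfect = "".join(COMPLEMENT[b] for b in target)
one_mismatch = perfect[:55] + "C" + perfect[56:]   # one wrong base in block 0

print(bonded_blocks(target, perfect, block_spans))
print(bonded_blocks(target, one_mismatch, block_spans))
```

In this model a single wrong base removes its whole block from the count, which is the behavior the denaturing conditions are meant to approximate.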

Encouraging preliminary results can be obtained using computer simulation. Using the on-line POLAND software [29], we obtain estimates of the 2d DGGE mobility of our particular candidate design. Figure 2 shows that for temperatures of 70 °C or more a single mismatch in a Royal Road block disassociates its whole block with high probability.

4.2 2d DGGE Implements Royal Road Fitness

The most challenging part of the DNA implementation of genetic algorithms is to identify a laboratory process that will physically separate DNA strands according to their “fitness.” For this task we use so-called 2d denaturing gradient gel electrophoresis (2d DGGE), which we push far beyond its established domain of application [23]. A first important fact for DNA computing is that 2d DGGE can detect even a single base mismatch in DNA strands. Indeed, this is a common application of 2d DGGE in molecular biology [23]. In effect, our design of candidate strands magnifies the effect of a mismatch at a single point, making it functionally equivalent to mismatching its entire block.

It should be noted that our experiments with 2d DGGE [21, 41] demonstrated a surprisingly smooth transition in 2d DGGE through a large dynamic range of mismatching. This is the only example known to us of physically separating a combinatorially encoded library of DNA strands according to global criteria. (This question does not occur in molecular biology applications.)

[Figure 2: three panels (left, center, right), each plotting the probability p of disassociation against sequence position (nt) and temperature T (°C, 70 to 90).]

Figure 2: Probability of disassociation as a function of sequence position and temperature. Left: No mismatches occur and the strand tends to remain uniformly intact up to about 84 °C. Center: A single mismatch at the first position in the first Royal Road block tends to disassociate its entire block of 10 Ts. Right: A single central mismatch also tends to disassociate its entire block.

Review of 2d DGGE

Let us review the nature of DGGE. Figure 3 shows an illustrative case, a 2d DGGE from our laboratory [41] having complementary strands. The mixture of complementary strands is placed uniformly along the top of the gel. These strands travel vertically downward in the gel as a result of an applied electric field. However, their speeds of migration are determined by their conformations, which depend on their initial placement from left to right, that is, on how strongly they are denatured (pulled apart). On the left, where no denaturant is encountered, the strands move relatively quickly downward. In the center, they move more slowly because they encounter intermediate denaturing, becoming less streamlined. At the extreme right, the strands are able to move only very slowly because they are almost completely pulled apart.

[Figure 3: gel image. Horizontal axis: denaturant concentration, low to high. Vertical axis: electrophoresis, − to +.]

Figure 3: DGGE using complementary strands. DNA strands move downward from a reservoir at the top of the figure. The speed of vertical strand migration is retarded as strands come apart (denature), as shown schematically on the figure.

Computer Simulations of 2d DGGE Separations

The on-line POLAND software [29] was used to generate Figure 4. This figure predicts a clear separation of three Royal Road situations: from top to bottom, two mismatched blocks, one mismatched block, and no mismatched blocks. In addition, the two central curves correspond to quite differently located mismatches: one at the center of the center block, and one as far as possible from the center.

5 Completing the Design

One needs to design the spacer sequences of DNA to stick (hybridize) only to their complementary locations and nowhere else. This will have to be done before proceeding to the laboratory.

Correct alignment can be difficult to achieve, because exact complementary pairing is not required for pieces of DNA to stick. This appears to lead to difficulties in experiments [12, 18, 19, 20]. Methods have been proposed for finding “good encodings” [2, 4, 9, 10, 11, 16, 17, 40].

Some experimental data on highly selective hybridization have been reported [28] for sequences that were laboratory tested to ensure they stick only where they are wanted.

[Figure 4: relative mobility in electric field (0 to 1) plotted against temperature equivalent of denaturation (70 to 90).]

Figure 4: DGGE predictions for selected imperfect candidates using the POLAND software. At 78 °C, the curves correspond, from the bottom up, to (1) perfect match of all blocks, (2) single mismatch at the center, (3) single mismatch at the very first position, (4) double mismatch at both of these positions.

6 Making Generations by Breeding

This report has concentrated on the preliminary design parameters that implement the Royal Road fitness function. When this is available, one can physically separate more than $10^{12}$ candidate solutions in a single 2d DGGE operation. (Elsewhere we have ventured [41] that this could compare favorably with contemporary supercomputers.)

Physical separation by fitness is the especially challenging aspect of doing genetic algorithms with DNA. The remaining aspects of genetic algorithms (selection by fitness, pointwise mutation, and single point crossover) are more closely related to conventional in vitro evolution. We have presented approaches to these aspects of genetic algorithms in previous papers [3, 41].

Here we have reported a design and computer simulations that suggest a means of implementing DNA solutions of Royal Road problems using populations billions of times larger than is usual with conventional computers.

Acknowledgment

We want to acknowledge partial support under DARPA/NSF Grant No. 9725021.

Bibliography

[1] Thomas Bäck, Joost N. Kok, and Grzegorz Rozenberg. Evolutionary computation as a paradigm for DNA-based computing. In Laura Landweber, Erik Winfree, Richard Lipton, and Stephan Freeland, editors, Preliminary Proceedings DIMACS Workshop on Evolution as Computation, pages 67–88, DIMACS, Piscataway NJ, January 1999. Available on request from DIMACS. Paper found at URL: http://www.wi.LeidenUniv.nl/~joost.

[2] Eric B. Baum. DNA sequences useful for computation. In Landweber and Lipton [22].

[3] Junghuei Chen, Eugene Antipov, Bertrand Lemieux, Walter Cedeño, and David Harlan Wood. DNA computing implementing genetic algorithms. In Laura Landweber, Erik Winfree, Richard Lipton, and Stephan Freeland, editors, Preliminary Proceedings DIMACS Workshop on Evolution as Computation, pages 39–49, DIMACS, Piscataway NJ, January 1999. Available on request from DIMACS. Paper found at URL: http://www.cis.udel.edu/~wood/papers/DIMACS99.ps.

[4] F. H. C. Crick, J. S. Griffith, and L. E. Orgel. Codes without commas. Proceedings of the National Academy of Sciences USA, 43:416–421, 1957.

[5] James P. Crutchfield and Erik van Nimwegen. Optimizing epochal evolutionary search: Population-size independent theory. SFI Working Paper 98-06-046, 1998, 18 pages. Paper found at URL: http://www.santafe.edu/projects/evca/evabstracts.html.

[6] James P. Crutchfield and Erik van Nimwegen. Optimizing epochal evolutionary search: Population-size dependent theory. SFI Working Paper 98-10-090, 1998, 18 pages. Paper found at URL: http://www.santafe.edu/projects/evca/evabstracts.html.

[7] James P. Crutchfield and Erik van Nimwegen. The evolutionary unfolding of complexity. In Laura Landweber, Erik Winfree, Richard Lipton, and Stephan Freeland, editors, Proceedings of the DIMACS Workshop on Evolution as Computation, New York, 1999, to appear. Springer-Verlag.

[8] J. H. M. Dassen. A bibliography of molecular computation and splicing systems. HTML in: http://www.wi.LeidenUniv.nl/~jdassen/dna.html, BibTeX source: http://www.wi.LeidenUniv.nl/~jdassen/dna.bib. This bibliography is also hooked into http://liinwww.ira.uka.de/bibliography/, The Collection of Computer Science Bibliographies.

[9] R. Deaton, Max H. Garzon, R. C. Murphy, Donald R. Franceschetti, and S. E. Stevens, Jr. Genetic search of reliable encodings for DNA based computation. In First Conference on Genetic Programming, Stanford University, 1996.

[10] R. Deaton, R. C. Murphy, J. A. Rose, Max H. Garzon, Donald R. Franceschetti, and S. E. Stevens, Jr. A DNA based implementation of an evolutionary search for good encodings for DNA computation. In IEEE International Conference on Evolutionary Computation, pages 267–271, Indianapolis, Indiana, April 13–16, 1997.

[11] R. Deaton, R. C. Murphy, M. Garzon, D. R. Franceschetti, and S. E. Stevens, Jr. Good encodings for DNA-based solutions to combinatorial problems. In Landweber and Lipton [22].

[12] R. Deaton, R. C. Murphy, M. Garzon, D. R. Franceschetti, and S. E. Stevens, Jr. Good encodings for DNA-based solutions to combinatorial problems. In Landweber and Lipton [22].

[13] Alan Dove. From bits to bases: Computing with DNA. Nature Biotechnology, 16(9):830–832, September 1998.

[14] Tino Gramß, Stephan Bornholdt, Michael Gramß, Melanie Mitchell, and Thomas Pellizzari. Non-Standard Computation. Wiley-VCH, Weinheim, 1998.

[15] Rachel Green, Andrew D. Ellington, David P. Bartel, and Jack W. Szostak. In vitro genetic analysis: Selection and amplification of rare functional nucleic acids. Methods, 2:75–86, 1991.

[16] Alexander J. Hartemink, David K. Gifford, and Julia Khodor. Automated constraint-based nucleotide sequence selection for DNA computation. In Harvey Rubin and David Harlan Wood, editors, Preliminary Proceedings of the Fourth Annual Workshop on DNA Based Computers, held at the University of Pennsylvania, June 15-19, 1998. University of Pennsylvania, 1998.

[17] B. H. Jiggs. Recent results on comma-free codes. Canadian Journal of Mathematics, 15:178–187, 1963.

[18] Peter Kaplan, Guillermo Cecchi, and Albert Libchaber. DNA based molecular computation: Template-template interactions in PCR. In Landweber and Lipton [22].

[19] Peter D. Kaplan, Guillermo Cecchi, and Albert Libchaber. Molecular computation: Adleman’s experiment repeated. Technical report, NEC Research Institute, 1995.

[20] Peter D. Kaplan, Guillermo Cecchi, and Albert Libchaber. DNA based molecular computation: Template-template interactions in PCR. In Landweber and Lipton [22].

[21] Laura Landweber, Richard Lipton, Andrew Ellington, and Robert Dorit, editors. DIMACS Nucleic Selection, DIMACS, Piscataway NJ, March 1998. Abstracts only. Available on request from DIMACS. URL: http://dimacs.rutgers.edu/Workshops/NucleicAcid.

[22] Laura F. Landweber and Richard J. Lipton, editors. DNA Based Computers II: DIMACS Workshop, June 10-12, 1996, volume 44 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Providence, 1998. American Mathematical Society.

[23] L. S. Lerman, K. Silverstein, and E. Grinfeld. Searching for gene defects by denaturing gradient gel electrophoresis. Trends in Biochemical Sciences, 172(3):89–93, 1992.

[24] Carlo C. Maley. DNA computation: Theory, practice, and prospects. Evolutionary Computation, 6(3):201–229, 1998.

[25] M. Mitchell, S. Forrest, and J. H. Holland. The royal road for genetic algorithms: Fitness landscapes and GA performance. In F. V. Varela and P. Bourgine, editors, Proceedings of the First European Conference on Artificial Life, pages 245–254, Paris, France, 1991. MIT Press.

[26] Robert Pool. Forget silicon, try DNA. New Scientist, 151(2038):26–31, July 13, 1996.

[27] D. L. Robertson and G. F. Joyce. Selection in vitro of an RNA enzyme that specifically cleaves single-stranded DNA. Nature, 344(6265):467–468, March 29, 1990.

[28] D. D. Shoemaker, D. A. Lashkari, D. Morris, M. Mittmann, and R. W. Davis. Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy. Nature Genetics, 14:450–456, 1996.

[29] Gerhard Steger. Thermal denaturation of double-stranded nucleic acids: Prediction of temperatures critical for gradient gel electrophoresis and polymerase chain reaction. Nucleic Acids Research, 22(14):2760–2768, July 25, 1994. On-line server is found at the URL: http://www.biophys.uni-duesseldorf.de/POLAND/poland.html.

[30] Willem P. C. Stemmer. DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution. Proceedings of the National Academy of Sciences USA, 91:10747–10751, 1994.

[31] Willem P. C. Stemmer. Rapid evolution of a protein by DNA shuffling. Nature, 370:389–391, 1994.

[32] Willem P. C. Stemmer. The evolution of molecular computation. Science, 270:1510, December 1, 1995.

[33] Willem P. C. Stemmer. Sexual PCR and assembly PCR. In Robert M. Meyers, editor, The Encyclopedia of Molecular Biology and Molecular Medicine, volume 5, pages 447–457. VCH, New York, 1996.

[34] Jack W. Szostak. In vitro genetics. Trends in Biochemical Sciences, 172(3):89–93, 1992.

[35] H. J. Thiesen and C. Bach. Target detection assay (TDA): A versatile procedure to determine DNA-binding sites as demonstrated on SP1 protein. Nucleic Acids Research, 18(11):3203–3209, 1990.

[36] C. Tuerk and L. Gold. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage-T4 DNA-polymerase. Science, 249(4968):505–510, August 3, 1990.

[37] Erik van Nimwegen, James P. Crutchfield, and Melanie Mitchell. Finite populations induce metastability in evolutionary search. Physics Letters A, 229(3):144–150, May 12, 1997.

[38] Erik van Nimwegen, James P. Crutchfield, and Melanie Mitchell. Statistical dynamics of the Royal Road genetic algorithm. Theoretical Computer Science, to appear.

[39] Roy Williams. Data powers of ten. Web page at http://www.ccsf.caltech.edu/~roy/dataquan.

[40] David Harlan Wood. Applying error correcting codes to DNA coding. In Harvey Rubin and David Harlan Wood, editors, Preliminary Proceedings of the Fourth Annual Workshop on DNA Based Computers, held at the University of Pennsylvania, June 15-19, 1998, pages 109–110. University of Pennsylvania, 1998. (Unrefereed extended abstract.)

[41] David Harlan Wood, Junghuei Chen, Eugene Antipov, Walter Cedeño, and Bertrand Lemieux. A DNA implementation of the Max 1s problem. In Wolfgang Banzhaf, A. E. Eiben, Max H. Garzon, Vasant Honavar, Mark Jakiela, and Robert E. Smith, editors, GECCO-99: Proceedings of the Genetic and Evolutionary Computation Conference, July 13-17, 1999, Orlando, Florida USA, San Francisco, 1999. Morgan Kaufmann.
