

J Comput Electron DOI 10.1007/s10825-014-0570-3

DNA: hardware and software of life

Amand Lucas

© Springer Science+Business Media New York 2014

Abstract In this introductory paper I will first go back in history and endeavor to explain in simple terms, with the support of optical diffraction experiments, just how X-ray fiber diffraction pictures led Watson and Crick to discover the DNA double helix. Second, I will present the geometrical and chemical structures of the molecule, the “hardware of life”, emphasizing in some detail the nature of the hydrogen bonding in the Watson–Crick (WC) base pairs A–T, G–C formed by the natural bases of the genetic alphabet. I will then discuss a class of twelve artificial analogues to these bases, some of which have been successfully synthesized by organic chemists by rearranging the pattern of hydrogen bonds of the base pairs. Adopting the perspective of theoretical computer science and error-coding theory, I will finally present DNA as the “software of life”, by discussing Mac Dónaill’s recent interpretation of the optimality of the natural genetic cipher as compared to other possible alphabets selected from the artificial analogues.

Keywords DNA · X-ray diffraction · Optical simulations · Hydrogen bonding · Artificial basepairs · Genetic alphabet · Error-coding theory

1 Introduction: anniversaries

In 2013 the scientific community celebrated the 100th anniversary of Niels Bohr’s first quantum model of the atom (Fig. 1) [1]. The discovery was made at about the same time as several other breakthroughs in physics, particularly in the area of X-ray science. First, Max von Laue (Fig. 1) demonstrated the diffraction of X-rays by an atomic lattice [2], thereby revealing their electromagnetic nature. This in turn led the Braggs (father and son Lawrence, Fig. 1) to the practical use of X-rays for the determination of the atomic structure of inorganic solids [3,4]. The method of X-ray crystallography was thereby born and could be made quantitative as soon as an absolute scale of X-ray wavelengths was made available by Henry Moseley’s displacement rule (Fig. 1) for Bohr-like emission of X-rays from atomic core levels [5]. The year 2014 has been declared the International Year of Crystallography in celebration of the 100th anniversary of the Bragg method (http://www.iycr2014.org/).

A. Lucas, University of Namur, Namur, Belgium. e-mail: [email protected]

Soon after these initial discoveries, the X-ray diffraction technique was applied to biomolecules in fibre form [6,7] or in crystal form [8,9]. The hope was, and has remained ever since, that the structure of a biomolecule could hint at its function, the goal being to understand how life works at the molecular level and ultimately to answer the age-old question “what is life?”. With respect to the nature of life, Bohr entertained strong views [10] inspired by the quantum mechanical concept of complementarity. Life and death are complementary, he said somewhat cryptically. He further claimed that life is an irreducible, unexplainable fact, like Planck’s quantum h, and that it probably involves new fundamental laws compatible with, but beyond, those of physics and chemistry. Although influential in creating a research legacy [11], such ideas would not be upheld by later discoveries in biology [12]. By the time of Bohr’s death in 1962, no new laws had turned up nor been found necessary. Prosaic kinds of chemical complementarity, such as in base pairing of nucleic acids, had indeed been found to preside over all bio-molecular processes but had nothing in common with the complementarity envisaged by the great quantum creator.

Sixty years ago (another anniversary) the continuing search for bio-molecular structures and functions recorded its greatest triumph with the discovery in 1953 of the DNA double helix, boldly characterized by Watson and Crick as the “Secret of Life” [13]. In the present introductory paper I will attempt (Sect. 2) to explain in non-technical terms just how X-ray diffraction experiments on DNA fibres provided key structural information which guided Crick and Watson to their great discovery. To help with these explanations, I will show optical laser simulations (Sect. 3) which elucidate the structural origin of the major features observed in the experimental X-ray diffraction patterns.

In Sect. 4, the detailed chemical structure of DNA will be reviewed with emphasis on understanding the nature and pattern of hydrogen bonds which bind the two DNA strands together via base pairing of the genetic letters. In Sect. 4 I will also present an extension of the genetic alphabet with several analogues of the 4 natural bases synthesized by organic chemists. Following Mac Dónaill [14–16] I will introduce a 4-binary-digit representation to symbolize the hydrogen bond patterns for all the base analogues.

Finally, in Sect. 5, I will present Mac Dónaill’s interpretation of the genetic message as a stream of 4-bit letters. I will summarize his views on the optimality of the natural 4-letter genetic alphabet as opposed to extended alphabets including synthetic nucleotides, in terms of error resistance in the information transfer processes of the replication, transcription and expression machineries.

2 How X-rays cracked the DNA double helix structure

The reader may find it helpful to view a video by the author on this subject by visiting the website http://vega.org.uk/video/programme/80.

2.1 The A-DNA and B-DNA X-ray patterns

The two very different X-ray diffraction photographs of natural DNA which contributed most immediately to the discovery of the DNA structure sixty years ago [13,17–20] are shown in Fig. 2, along with the molecular models which correspond to the diffraction patterns.

Pattern B is often presented as the only key X-ray diffraction picture. However, since pattern A also played a considerable, if more subtle, role in the discovery, I will present here a comparison between the two patterns.

Both pictures in Fig. 2 are fiber diagrams obtained from a macroscopic fiber of natural DNA placed vertically in a fine monochromatic X-ray beam. The material in the fiber (a few tens of microns thick) comprises a large number of long, roughly parallel, negatively charged DNA strands, Na+ counter ions and a relatively high amount of solvating water (up to ten H2O molecules per so-called “nucleotide”, the phosphate-sugar-base repeating monomer of DNA; see Sect. 4). The solvated cations and water molecules are in a disordered state and cause only diffuse, background scattering. The diffraction tends to be dominated by the phosphate groups, PO4, as the scattering cross section of an atom for X-rays grows roughly as the square of its atomic number.

Franklin and Gosling discovered [21–23] that it was possible to pass reversibly from A to B by changing the relative water content of the fiber. The drier state giving pattern A was interpreted as that of a crystalline arrangement of the DNA molecules, with long range order across the fiber, as evidenced by the ordered reciprocal lattice of relatively sharp spots near the pattern center. The wetter state giving pattern B with its broader features was called paracrystalline and viewed as representative of a molecular arrangement lacking long range positional as well as rotational order as a result of the disordering effect of the increased solvating sheath of water and counter ions surrounding the molecules. Pattern B is that of a gel and is representative of the average scattering by a single molecule, while pattern A, at least its center part, is representative of the crystal lattice, the information on the individual molecular structure in that case being contained in the relative spot intensities.

Both patterns of spots or smears are organized in regularly spaced “layer lines”, that is, the spots sit on equidistant lines perpendicular to the molecular axis. A layer line organization is expected from a filamentous molecule having a structural unit regularly repeating every period P in the fiber direction. The layer line separation Δ reveals the value of the period P, the relationship being approximately P = Dλ/Δ near the equator layer line, where D is the distance from fiber to pattern and λ the X-ray wavelength. The spectacular reversible change in the type and distribution of spots from A to B was taken as reflecting not only the change in the long range spatial arrangement just discussed but also a reversible change of internal structure of the DNA molecules themselves (see the molecular model in Fig. 2). The molecules were said to adopt the A-DNA or B-DNA conformations, respectively. The B conformation is believed to be the one adopted by chromosomal DNA (in association with proteins) in the high humidity condition of the living cell. The A form is thought to correspond better to double stranded RNA.
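As a minimal numerical sketch of the small-angle relation between the period and the layer line separation quoted above (period = distance × wavelength / separation), the following Python snippet recovers the period from a measured spacing. The input numbers (a wavelength of 0.154 nm, a fiber-to-film distance of 30 mm and a spacing of 1.36 mm) are illustrative assumptions, not values taken from the original experiments, and the function name is hypothetical.

```python
def period_from_layer_lines(distance_mm, wavelength_nm, spacing_mm):
    """Small-angle estimate P = D * lambda / spacing (valid near the equator).

    distance_mm and spacing_mm share the same unit, so the result
    carries the unit of the wavelength (nm here).
    """
    if spacing_mm <= 0:
        raise ValueError("layer line spacing must be positive")
    return distance_mm * wavelength_nm / spacing_mm


# Illustrative (assumed) values, not data from the original photographs.
D_MM = 30.0        # fiber-to-film distance
LAMBDA_NM = 0.154  # X-ray wavelength
SPACING_MM = 1.36  # measured separation between adjacent layer lines

P_nm = period_from_layer_lines(D_MM, LAMBDA_NM, SPACING_MM)
print(f"Estimated period P = {P_nm:.2f} nm")
```

With these assumed inputs the estimate falls close to the 3.4 nm helix period discussed in Sect. 2.2.2; away from the equator the small-angle approximation degrades and the full diffraction geometry would be needed.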

2.2 Key structural information of the X-ray patterns

Here we enumerate, without detailed explanation, the main observed features of the X-ray diffraction pictures of DNA in Fig. 2 and state the corresponding structural elements of the molecule. A qualitative justification of these assignments will be given by the optical simulation experiments of Sect. 3.


Fig. 1 From left to right: Niels Bohr, Max von Laue, Lawrence Bragg and Henry Moseley

Fig. 2 The two celebrated X-ray fiber diffraction pictures of DNA in the A (left) and B (right) conformations, with corresponding molecular models

2.2.1 B-DNA: horizontal base pairs with axial repeat p = 0.34 nm

The big North and South meridian arcs seen in the B pattern are due to X-ray scattering by the flat base pairs seen edge on by the beam and acting like a diffraction grating of parallel slits. The arcing is due to the fluctuations of the base orientation around the horizontal.

2.2.2 B-DNA: helix period P = 3.4 nm

The B pattern has about ten layer line intervals between the center and the North blob, which shows that the molecule backbone repeats every 10 nucleotides (per strand), i.e. P = 10p = 3.4 nm.

2.2.3 B-DNA: external phosphate helix of radius R = 1 nm

Pattern B shows a prominent Saint Andrew cross of smears near the center. The 60° angle φ between the arms of the cross reveals the radius R of the phosphate (or phosphorus) helix via the relationship P = 2πR tan(φ/2). The absence of diffracted intensity in the meridian angles of the cross implies that the phosphate backbone lies outside of the molecule. This in turn explains why DNA is capable of crystallizing at all (as in pattern A).

Fig. 3 High resolution B-DNA X-ray diffraction pattern obtained by Langridge et al. [25]. The layer lines are numbered 0–5; notice the extinction along the 4th layer line
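The relationship between the helix period, the helix radius and the opening angle of the Saint Andrew cross given in this subsection can be checked numerically. The short Python sketch below simply solves P = 2πR tan(φ/2) for R using the values quoted in the text (P = 3.4 nm, φ = 60°); the function name is hypothetical.

```python
import math


def helix_radius(period_nm, cross_angle_deg):
    """Solve P = 2*pi*R*tan(phi/2) for the helix radius R."""
    half_angle = math.radians(cross_angle_deg) / 2.0
    return period_nm / (2.0 * math.pi * math.tan(half_angle))


# Values quoted in the text for B-DNA.
R_nm = helix_radius(period_nm=3.4, cross_angle_deg=60.0)
print(f"R = {R_nm:.2f} nm")
```

The result, a little under 1 nm, is consistent with the rounded radius of 1 nm stated in the heading.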


Fig. 4 Ball-and-stick model of the double helix structure and Watson–Crick base pairs. The counter-oriented red and green sine waves run through the two phosphorus helices (adapted from http://en.wikipedia.org/wiki/DNA)

2.2.4 A-DNA: DNA is a dyad

The A-DNA crystal lattice was shown to pertain to the C-face centered monoclinic space group [18,24] (not apparent in Fig. 2a). This meant that the lattice has a twofold symmetry axis C2 perpendicular to the (ac) face of the (abc) unit cell. This in turn entails that the molecule backbone itself is invariant under C2 and thus must be a dyad, meaning that it is made of two counter-oriented strands, each strand being known to have a definite sense (called 3′–5′ by biochemists in reference to the numbered carbon atoms of the backbone pentagonal sugar; see Sect. 4).

2.2.5 A-DNA: base pairs inclined at 20°

Together with the pattern center, the smears on the 6th–8th layer lines in the east and west quadrants form a Saint Andrew cross of 40° meridian opening, revealing a base pair inclination of 20° to the horizontal. The smears are produced by two base pairs seen edge on every half period of the double helix (see the molecular model in Fig. 2). This information became available only after the discovery of the structure.

2.2.6 B-DNA: two backbones axially shifted by 3P/8

Pattern B shows no intensity on the 4th layer line. The better resolved B pattern of Fig. 3 [25] shows this more clearly. The layer line extinction confirms the plectonemic (intertwined), dyadic nature of DNA and indicates an axial separation of 3P/8 between the two backbone helical strands. Again, this information was not available to Crick and Watson but was mentioned by Franklin and Gosling [22]. In DNA, the unequal shift creates large and small grooves in the double helix (Figs. 4, 6), giving protein enzymes access to the base sequence.
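A rough way to see why an axial shift can extinguish a layer line is the following simplified interference model, assumed here for illustration only and not the full helical diffraction theory. If the second strand is treated as an identical copy of the first, translated along the axis by a fraction s of the period, the amplitude on layer line n is multiplied by 1 + exp(2πi·n·s). The Python sketch below evaluates the squared modulus of this factor for s = 3/8; the function name is hypothetical.

```python
import cmath
import math


def layer_line_weight(n, shift_fraction):
    """Return |1 + exp(2*pi*i*n*s)|^2 for two identical strands
    displaced axially by s = shift_fraction of the period."""
    amplitude = 1 + cmath.exp(2j * math.pi * n * shift_fraction)
    return abs(amplitude) ** 2


# Relative weights of layer lines 0..5 for a 3P/8 shift.
weights = {n: layer_line_weight(n, 3 / 8) for n in range(6)}
for n, w in weights.items():
    print(f"layer line {n}: relative weight {w:.3f}")
```

Within this model the weight vanishes (to rounding error) on the 4th layer line for a 3P/8 shift, while a shift of P/2 (s = 0.5) removes every odd layer line, which matches the doubled layer line interval described for grating 8 in Table 1.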

2.3 Watson–Crick A–T, G–C base pairing

In February 1953, in a renewed attempt at model building [13], Watson and Crick built a spacious cylindrical cage made of two counter-oriented, plectonemic helical sugar-phosphate strands by using handmade mechanical models of the components. They chose a right-handed double helix, for unstated reasons, probably for easing steric constraints. Inside that cage, Watson tried to install pairs of cardboard models of the four planar bases A, T, G, C perpendicular to the cage axis and to make the bases of each pair interact via hydrogen bonds across the duplex axis (Fig. 12). In this last, successful attempt, under the recommendation of Donohue, the bases were given the correct keto tautomeric form (instead of enol, see Sect. 4) of the H-bonds [18,19]. Among the six possible couples, Watson stumbled on the A–T, G–C pairings as the only combinations with the proper H-bonding pattern and with nearly identical overall sizes (see Fig. 4) which could be made to fit between the rigid outer frame of the backbone. That clinching discovery rapidly led to the completion of the beautiful and definitive DNA double helix model. The Watson–Crick (WC) base pairing scheme rationalized the empirical biochemical rules on the concentrations of the bases ([A]=[T], [G]=[C]) discovered earlier by Chargaff [26] by chemical analysis of DNA of any origin.

The enormous achievement of Watson and Crick [27] was therefore threefold: (i) they elucidated the geometrical, double helix structure of the molecule and its role as the repository of the genetic information written as a linear sequence of bases; (ii) they discovered the fundamental WC base pairing phenomenon which weakly binds the two helices (base sequence in double, complementary copies); (iii) they stated the immediate implication of the structure, namely the probable mechanism of “semi-conservative” replication of the molecule by WC base pair templating.

Fig. 5 (Top) Diffraction slide with twelve diffraction gratings used for optical simulation of X-ray diffraction by DNA. In each grating, the diffraction motif is indicated in the upper left corner. Gratings 11 and 12 represent planar models of A-DNA and B-DNA, respectively (see Table 1 for details). (Bottom) The twelve optical diffraction patterns obtained by passing the light of a red laser pointer through the twelve gratings of the diffraction slide (see Table 1 for details). Patterns 11 and 12 can be compared with the real X-ray images of Fig. 2

Table 1 (Left) The diffraction motifs of the set of 12 diffraction gratings in Fig. 5 (top). (Right) The 12 diffraction patterns shown in Fig. 5 (bottom)

G1 A set of horizontal, equidistant slits. This grating reproduces the original Thomas Young experiment with n slits
P1 The pattern shows equidistant layer lines whose interval is inversely proportional to the slit separation

G2 A set of parallel, equidistant, oblique slits vertically aligned
P2 The pattern of equidistant layer lines shows intensity maxima aligned perpendicular to the inclined slits

G3 Mirror symmetric of G2
P3 Mirror symmetric of P2

G4 Zigzag motif combining G2 and G3
P4 Prominent Saint Andrew cross whose arms are perpendicular to the zigzagging segments

G5 Double zigzag representing the base pairs inclined at 20° in A-DNA which are seen edge-on
P5 Saint Andrew cross whose arms are perpendicular to the zigzagging segments. The intensity along one arm is modulated by the interference of the double zigzag

G6 A continuous sine wave
P6 Saint Andrew cross whose arms are perpendicular to the zigzagging sine wave. The meridional angle of the cross reveals the sine amplitude

G7 Two coaxial sines in phase but with different amplitudes
P7 The central, prominent Saint Andrew cross is that of the external sine. There is no intensity in the meridian angles of that cross

G8 Two coaxial, out of phase sines. The true motif period is P/2
P8 The layer line interval is doubled. There is no intensity in the meridian angles of the cross

G9 Two coaxial sines shifted by 3/8 of the period, as in B-DNA
P9 The Saint Andrew cross has no intensity in the meridian angles. There is no intensity along the 4th layer line

G10 A single, atomized sine with 10 atoms per period P, as in B-DNA
P10 A diamond pattern whose meridian diagonal reveals the axial atomic distance P/10 in the motif

G11 A model for A-DNA with two backbones of 11 atoms and two base pairs seen edge on
P11 There is a Saint Andrew cross produced by the double inclined base pairs of the motif. The pattern compares well with that of A-DNA in Fig. 2 (left)

G12 A model for B-DNA with two backbones shifted P/8, 10 atoms per period, and 10 edge-on base pairs
P12 The pattern has all the previous features: layer lines (period 3.4 nm), diamond pattern (nucleotide repeat 0.34 nm), Saint Andrew cross (helix radius 1 nm), no 4th layer line (3P/8 shift). Compare to Fig. 2 (right)

3 Simulations by optical diffraction

In this section we present optical simulations of the diffraction of X-rays by DNA which aim at a physical understanding of the structural content of the X-ray pictures described above. Diffraction of optical waves by suitable 2-D model gratings, the so-called optical transform method, has been used for many decades in crystallography to mimic X-ray diffraction by atomic lattices, especially before the advent of powerful methods and computers for the reconstruction of the structure from the diffraction pattern. The idea of such optical simulations originated with Lawrence Bragg himself [28,29]. Today the availability of highly coherent light from cheap laser pointers has rendered the method easily accessible to anyone, and it is of great didactic value.

Taking the known geometrical dimensions of A-DNA and B-DNA into account, I have created a set of 12 optical diffraction gratings held on a single standard 5×5 slide shown in Fig. 5. Each grating consists of a large number of identical motifs densely repeated in parallel (about 5 motifs per mm). From grating 1 to 12 the motifs are conceived to reconstruct one by one the diffracting elements of the molecular conformations. The diffraction patterns are projected on a white screen a few meters away. The corresponding twelve diffraction patterns produced with a red laser pointer are shown in Fig. 5. The gratings and patterns are briefly described in Table 1 [30–33], which gives enough detail for the reader to understand the simulations. With a green pointer the simulations work in broad daylight (a DNA diffraction kit inspired by the present demonstration has been made available by the Institute of Chemical Education, see http://ice.chem.wisc.edu/Catalog.html; the present diffraction slide may be ordered from [email protected]).

Fig. 6 The structure of the DNA double helix in its B conformation (left). The two sugar-phosphate backbones are oriented in opposite directions, one in the 5′–3′ direction and the other in the 3′–5′ direction. The flat, horizontal bases are seen edge on in the left cartoon. The A–T and G–C pairs, seen face on in the unwound section at the right, are linked by 2 H-bonds and 3 H-bonds respectively. From http://www.nature.com/scitable/topicpage/discovery-of-dna-structure-and-function-watson-397

4 The double helix

4.1 Composition and structure

The full chemical composition and the structure of DNA are sketched in Fig. 6.

DNA is a polymer whose monomer unit, called a nucleotide, is one base-sugar-phosphate complex. The name deoxyribose in DNA refers to the removal of the O atom of the OH hydroxyl group normally attached to the 2′ carbon of the ribose (in RNA). The backbone is made of a monotonous repetition of a pentose sugar and a phosphate ion. The so-called phosphodiester covalent linkage of the two groups via the 3′ and 5′ sugar carbons is shown in Fig. 6. In double stranded DNA, the two strands form a dyad, i.e. one is oriented in the direction 3′–5′ and the other in the direction 5′–3′, and are axially shifted by 3P/8, as described in Sect. 2.2.6. Watson and Crick were forced to introduce a shift of about this much when constructing their original mechanical model in order to establish the H-bonds between the edges of the flat bases across the duplex of counter-oriented backbones.

The full chemical formulae of the four bases Adenine (A), Thymine (T), Guanine (G) and Cytosine (C) are displayed in Fig. 6. They are attached to the backbones by a covalent bond linking the 1′ sugar carbon to a nitrogen atom of the pentagonal ring of the purine (A or G) or the hexagonal ring of the pyrimidine (T or C). The sequence of bases on one strand is complementary in the WC sense to the sequence on the other. The sequence constitutes the genetic message written in a 4-letter cipher.


Fig. 7 Tautomerism of the natural bases. The curved arrows indicate the shift of H protons leading from the Enol-like to the more stable Keto tautomers. Notice also the rearrangement of the double bond configuration in and around the rings

Fig. 8 Formation of a Hydrogen bond between two electronegative atoms X and Y (nitrogen, oxygen, fluorine, ...) covalently attached to two different molecules or two distinct parts of the same molecule. The δ’s indicate fractions of the electron charge. The X and Y atoms taking part in the H bonding are designated as (proton) donor and acceptor, respectively

It is important for what follows to carefully note in Fig. 6 the pattern of H-bonds linking each base pair, i.e. exactly where the H protons sit at the fringe of the bases facing each other. The natural bases can exist in different so-called tautomeric forms shown in Fig. 7. In their last historical drive towards building the double helix, Watson and Crick were attempting, in vain, to insert the base pairs in their enol forms (Fig. 7) inside the backbone helical cage. The H-bond pattern would not fit. Not until the biochemist Donohue [18], who was sharing their office, pointed out that the bases were in the more stable keto tautomeric forms did Watson finally succeed in discovering the correct WC base pairing. The tautomerism of the natural purines and pyrimidines has long been thought to play a role in spontaneous mutagenesis [34,35]. The possibility of such tautomeric mutations will become clear in the next section.

4.2 H-bond: the bonding of life

The very occurrence of liquid water, of biology based on the elements CHNOP..., of life itself on this planet and possibly on extrasolar planetary systems, hinges on the intermediate size of the binding energy of the H-bond. Recall that the scale of chemical binding energies spreads from several eV per bond (100’s of kcal/mol) for the strong covalent binding to a few meV for the weak van der Waals forces. At about 0.25 eV the H-bond strength is intermediate, yet an order of magnitude larger than the room temperature thermal energy kBT ≈ 25 meV. This means that the H-bond network of biomolecules, tissues and organisms can resist thermal disordering at room temperature but not much higher.
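The energy comparison in this paragraph can be made concrete with a few lines of Python. The sketch below computes the thermal energy kBT and the ratio of the quoted H-bond energy (0.25 eV) to it. The Boltzmann factor exp(−E/kBT) is used here only as a crude, assumed indicator of how rarely thermal agitation alone supplies enough energy to break a single bond; the two temperatures (300 K and 373 K) are illustrative choices, and the function names are hypothetical.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def thermal_energy_ev(temperature_k):
    """Thermal energy k_B * T in eV."""
    return K_B_EV_PER_K * temperature_k


def boltzmann_factor(energy_ev, temperature_k):
    """exp(-E / k_B T), a crude weight for thermally reaching energy E."""
    return math.exp(-energy_ev / thermal_energy_ev(temperature_k))


H_BOND_EV = 0.25  # H-bond strength quoted in the text

for T in (300.0, 373.0):  # illustrative temperatures (assumed)
    kT = thermal_energy_ev(T)
    print(f"T = {T:.0f} K: kBT = {1000 * kT:.1f} meV, "
          f"E/kBT = {H_BOND_EV / kT:.1f}, "
          f"exp(-E/kBT) = {boltzmann_factor(H_BOND_EV, T):.1e}")
```

At 300 K the ratio E/kBT is a little under 10, which is the "order of magnitude" margin mentioned in the text; the factor grows quickly as the temperature rises.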

A hydrogen bond tends to form when an electronegative atom Y approaches a hydrogen atom bound to another electronegative atom X, as depicted in Fig. 8 [36,37].

The bonding is partly ionic and partly covalent (i.e. directional, as the X–H···Y atoms prefer to be aligned on a straight line). The most ubiquitous example of hydrogen bonding occurs between water molecules. In liquid water a fluctuating network of multiple H-bonds forms, every water molecule being transiently H-bonded with up to four near neighbor molecules. This is responsible for the high surface tension of water, on which some insects with “hydrophobic” legs can walk. Another notable example of cooperative H-bonding is the very high stability of cellulose, the most abundant organic polymer on Earth, in plant cell walls, wood fibers, etc. Molecules (or parts of molecules) are said to be hydrophobic (insoluble in water) or hydrophilic (soluble) according to whether their interaction with water (always attractive) is weaker or stronger than the water–water H-bonds, respectively. This property, derived as it is from the very existence of the H-bond cohesion of water itself, plays an essential role in the formation of lipid bilayers of cellular membranes or of intracellular vesicles in the high humidity environment of the cell. Intra- and intermolecular H-bonds are also ubiquitous in proteins in water, where they directly influence the formation and stability of their secondary and tertiary spatial structures such as the Pauling α-helix, the β-sheets, protein coiled-coils, globular enzymes, etc.


Fig. 9 (Top) The natural WC base pairs and their H-bonding patterns described by the conventional donor–acceptor definition of Fig. 8. In the column vector representation shown under the chemical formulae (bottom), Py stands for Pyrimidine (monocyclic) and Pu for Purine (bicyclic). Alternatively each base can be given a purely binary digit representation according to the convention: D = 1, A = 0, Py = 1, Pu = 0. R represents the linkage to a sugar-phosphate backbone. Notice that Adenine in the WC A–T base pair has been replaced by an idealized aA–T pair, where aA designates amino adenine

Fig. 10 Thymine (in DNA) and uracil (in RNA) have the same H-bonding pattern and the same 4-bit representation

4.3 H-bonding patterns in the natural bases

Most importantly of course, as described previously, H-bonding is essential in the WC base pairing phenomenon where it is omnipresent in transactions involving DNA and other types of nucleic acids in replication, transcription and translation of the genetic message.

For discussing the “software of life” aspect of DNA in Sect. 5 below, let us follow Mac Dónaill [14–16] and introduce a systematic way of describing the pattern of H-bonds occurring in the genetic letters A, T, G, C. Following the definition given in Fig. 8, each of the four bases is represented by a 4-vector with components A, D, Pu or Py, or, more abstractly, by a purely binary 4-digit vector according to the (arbitrary) assignments shown in Fig. 9. It will be convenient for the following discussion to replace the natural base Adenine by amino Adenine aA. In this idealized form a second amine group NH2 (at the bottom in Fig. 9, left) has been substituted for H, which allows forming a third H-bond with the facing O of Thymine. The reason for doing this is to deal with base pairs all having triple H-bonds, as will become clearer below. The replacement is harmless as long as the stability of the A–T base pair is not under consideration and can be removed at the end of the argument [14–16].

It is important to note (Fig. 10) that the bases Thymine (used in DNA) and Uracil (used instead of T in RNA) have identical 4-bit representations, as they should since both form WC base pairs exclusively with Adenine in transcription (DNA–mRNA association) as well as in translation (mRNA–tRNA association).
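The binary convention of Fig. 9 (D = 1, A = 0, Py = 1, Pu = 0) can be made concrete in a few lines of Python. The sketch below rests on two assumptions that should be checked against Fig. 9: a bit order of three donor/acceptor bits followed by one ring-type bit, and donor/acceptor patterns read off the usual chemical structures (C: DAA, G: ADD, T and U: ADA, aA: DAD). All names are hypothetical. Under this encoding the WC partner of a base is simply its bitwise complement.

```python
# Assumed bit order: (b1, b2, b3) = H-bond sites, b4 = ring type.
DONOR, ACCEPTOR = 1, 0
PYRIMIDINE, PURINE = 1, 0

BASES = {
    "C":  (DONOR, ACCEPTOR, ACCEPTOR, PYRIMIDINE),
    "G":  (ACCEPTOR, DONOR, DONOR, PURINE),
    "T":  (ACCEPTOR, DONOR, ACCEPTOR, PYRIMIDINE),
    "U":  (ACCEPTOR, DONOR, ACCEPTOR, PYRIMIDINE),  # same pattern as T
    "aA": (DONOR, ACCEPTOR, DONOR, PURINE),         # idealized amino adenine
}


def complement(bits):
    """Flip every bit: a donor faces an acceptor, a purine faces a pyrimidine."""
    return tuple(1 - b for b in bits)


def is_wc_pair(x, y):
    """True if base y carries exactly the complementary pattern of base x."""
    return BASES[y] == complement(BASES[x])


for x, y in [("C", "G"), ("aA", "T"), ("aA", "U"), ("C", "T")]:
    verdict = "pairs" if is_wc_pair(x, y) else "does not pair"
    print(f"{x}-{y}: {verdict}")
```

Treating pairing as a bitwise complement is a deliberate simplification: it captures the donor/acceptor and size matching only, and says nothing about bond strengths or geometry.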

4.4 Extending the natural alphabet with artificial bases

Organic chemists have synthesized new, artificial bases, ana-logues to the natural bases, that can be incorporated intonucleic acid polymers [38–40]. The idea is to rationallydesign these new molecules by shuffling the normal H-bondacceptor (A) and donor (D) groups, so that they appear indifferent order at the fringe of the bases facing each other inthe base pair [41]. The ambitious program dubbed AEGIS(Artificially Expanded Genetic Information System) [39]has achieved the synthesis of several new bases which have


Fig. 11 As indicated by the curved arrows, swapping amine NH2 and oxygen O in the natural WC pair C–G creates two new paired bases isoC and isoG. The corresponding 4-bit representations of the bases are shown

Fig. 12 Eight new bases with their 4-bit representations arranged in WC base pairs [40]

been successfully incorporated into synthetic DNA and RNA polynucleotide strands.

Figure 11 shows one such artificial base pair between isoC and isoG, two isomers of C and G obtained by swapping the places occupied by the functional groups NH2 and O in the original bases, as shown by the curved arrows. Further H-bond permutations have been realized by Benner’s group [40] with the creation of three new artificial base pairs V–J, κ–X and Z–P exhibited in Fig. 12.

These extraordinary laboratory creations do not exhaust the complete alphabet of all the 2^4 = 16 possible bases (b1, b2, b3, b4), where bi = 0 or 1, created by extending the natural 4-letter alphabet A, T, G, C. The complete list is presented in a highly schematized format in Fig. 13.

5 Software of life

5.1 Life as a code

A few days after the momentous discovery of the double helix, Francis Crick wrote an enthusiastic letter to his teenage


Fig. 13 The complete alphabet of 16 4-bit letters (b1, b2, b3, b4) constructed by extending the natural alphabet (dotted rectangle on the upper left) by shuffling the H-bond pattern. The letters are arranged in 8 WC base pairs following the conventions of Fig. 9. The double dots around acceptors designate electron lone pairs. The bases and base pairs are split into parity classes (left column even letters; right column odd letters) (see text). The symbolic letters for the artificial bases are those used by Mac Dónaill [14–16]. The pairs β–δ and α–� are denoted V–J and Z–P in Fig. 12, respectively [40]

Fig. 14 The transmission of letters of the 2-bit alphabet B2. If the transmission line is subject to noise which changes one or several bits, the error will go undetected by a detector which accepts any B2 letter

son Michael in which he stated “Now we believe that the D.N.A. is a code” [42]. He was expressing tersely the key software concept, quite novel at the time but implicit in the very basic fact of heredity, that life consists primarily in the coding, processing and transmission (with variations) of digital information, a concept that the stupefied scientists [11] discovered to be inherent in the geometrical and chemical structures of the molecular “hardware” used in life processes. Here we wish to adopt this line of thought and end this paper by presenting some of Mac Dónaill’s remarkable ideas on an error-coding theory [14–16] for the evolutionary selection of the natural nucleotide alphabet of Fig. 13. In this figure, the extended alphabet of natural and artificial bases has been split into two classes according to the parity of the 4-bit representation of the letters. The parity of a letter (b1, b2, b3, b4) is simply the parity of the sum Σi bi, that is, the parity of the number of 1’s (a letter with no 1’s counts as even, since 0 is even). Mac Dónaill addresses, among other issues, three observational facts: (1) all four natural letters are even; (2) the other even, artificial letters are not used by Nature; (3) none of the 8 odd, artificial letters is used by Nature.
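As a minimal illustration of this parity split (a sketch, not Mac Dónaill’s own code), the following Python fragment enumerates the 16 letters and sorts them into the two classes. The function names are hypothetical, and the pairing rule used here, namely that a WC partner is obtained by flipping all four bits (donor facing acceptor, purine facing pyrimidine), is an assumption made for illustration rather than something read off Fig. 9.

```python
from itertools import product

def parity(letter):
    """Return 0 for an even letter and 1 for an odd one (parity of the number of 1's)."""
    return sum(letter) % 2

def complement(letter):
    """Assumed pairing rule (hypothetical): the partner has every bit flipped."""
    return tuple(1 - b for b in letter)

letters = list(product((0, 1), repeat=4))        # all 2^4 = 16 letters
even = [l for l in letters if parity(l) == 0]    # even parity class
odd = [l for l in letters if parity(l) == 1]     # odd parity class

# Each unordered pair {letter, complement} is counted once.
pairs = {frozenset((l, complement(l))) for l in letters}

print(len(letters), len(even), len(odd), len(pairs))  # 16 8 8 8
```

Under the assumed pairing rule both partners of a pair always fall in the same parity class, because flipping four bits changes the number of 1's from n to 4 − n.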

5.2 Error-coding theory

Mac Dónaill [14–16] addresses these facts by referring to several concepts in the coding and transmission of digital information. Here we shall present just one of these


Fig. 15 Principle for transforming an error-prone, mixed parity B2 alphabet into an error-resistant, even parity alphabet B3 by adding a parity bit to the alphabet letters. The even B3 letters are now the even coordinates of the corners of a cube. Single bit errors can then be detected by testing the parity of each letter and, where needed, correcting it for transmission

concepts and refer to Mac Dónaill for a discussion of other aspects of coding theory. Consider what happens when transmitting digital letters along a line subject to noise causing bit errors. Figure 14 illustrates the problem for the transmission of a stream of 2-bit words. Notice that a single bit error changes the parity of the letter. Such a bit error will go undetected if the receiver accepts any and all of the alphabet letters issued by the message source. The B2 alphabet with mixed parity letters is therefore prone to bit errors.
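The weakness of the B2 alphabet can be checked by brute force. The sketch below, with hypothetical helper names, flips each bit of each 2-bit letter in turn and counts how many corrupted letters escape detection, assuming the receiver accepts every B2 letter.

```python
from itertools import product

B2 = set(product((0, 1), repeat=2))   # the four 2-bit letters

def flip(letter, i):
    """Return a copy of letter with bit i inverted (a single-bit error)."""
    return tuple(b ^ 1 if k == i else b for k, b in enumerate(letter))

def detectable(received, alphabet):
    """A received letter can be flagged only if it lies outside the accepted alphabet."""
    return received not in alphabet

# Every (letter, bit position) combination whose corruption goes unnoticed.
undetected = [(l, i) for l in sorted(B2) for i in range(2)
              if not detectable(flip(l, i), B2)]

print(len(undetected))  # 8: all 4 letters x 2 bit positions
```

Every single-bit error turns one valid letter into another valid letter, so none of the eight possible errors can be flagged.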

The standard informatics remedy is to add a parity bit to the letters so that all letters used in the alphabet for transmitting information have a fixed parity, either even or odd but not mixed. The principle is illustrated in Fig. 15, where it can be seen that single bit errors are easily detected and corrected for by testing the parity of each word ahead of the receiver.
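A small sketch of this parity-bit construction follows. The helper names are illustrative only, and the even convention is chosen arbitrarily. Note that in this sketch the parity test only flags a corrupted word; it does not by itself identify which bit was flipped.

```python
from itertools import product

def add_parity_bit(letter):
    """Append one bit so that the total number of 1's becomes even."""
    return letter + (sum(letter) % 2,)

def is_even(letter):
    """Parity test applied to each word ahead of the receiver."""
    return sum(letter) % 2 == 0

def flip(letter, i):
    """Return a copy of letter with bit i inverted (a single-bit error)."""
    return tuple(b ^ 1 if k == i else b for k, b in enumerate(letter))

B2 = list(product((0, 1), repeat=2))
B3_even = [add_parity_bit(l) for l in B2]   # even-parity 3-bit letters

# Every single-bit error on an even letter yields an odd word, which fails the test.
all_flagged = all(not is_even(flip(l, i)) for l in B3_even for i in range(3))

print(B3_even)      # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(all_flagged)  # True
```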

The strategy of using fixed parity letters appears to have been selected by Nature in evolving a genetic alphabet with only the even parity bases shown in Fig. 13. In this interpretation of the natural genetic cipher, it appears that the size bit b4 (purine or pyrimidine) is related to the bit pattern of H-bonds (b1, b2, b3) as an added parity bit, keeping in mind, however, that the size of the base pairs is de facto imposed by the rigid DNA backbone hardware. This idea at once explains fact 3 mentioned above, that none of the odd genetic letters is used in existing DNA or RNA polynucleotides along with the even nucleotides. For example, the mixed parity alphabet extended to six letters (aA, T, G, C, X, �) would be error prone, e.g. in replication. Indeed, with such an alphabet, X could substitute for G in G–C and � for aA in aA–T since, by inspection of Fig. 13, both base pairs X–C and �–T are seen to have two favorable H-bonds and only one weak repulsion between lone pairs on opposing acceptor atoms [14–16].
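The reading of b4 as a parity bit can be made concrete with a short check. The sketch below assumes nothing about which letter encodes which base; it only verifies that, within the even class of 4-bit letters, the last bit is fixed by the first three, and that any two even letters differ in at least two bit positions, whereas the full mixed parity set contains letters differing in a single bit. Function and variable names are hypothetical.

```python
from itertools import product, combinations

def differing_bits(u, v):
    """Number of positions in which two letters differ."""
    return sum(a != b for a, b in zip(u, v))

letters = list(product((0, 1), repeat=4))
even = [l for l in letters if sum(l) % 2 == 0]

# In the even class, b4 is determined by (b1, b2, b3): it is their XOR.
b4_is_parity_bit = all(b4 == b1 ^ b2 ^ b3 for b1, b2, b3, b4 in even)

# Smallest number of differing bits between two distinct letters.
min_even = min(differing_bits(u, v) for u, v in combinations(even, 2))
min_mixed = min(differing_bits(u, v) for u, v in combinations(letters, 2))

print(b4_is_parity_bit, min_even, min_mixed)  # True 2 1
```

A separation of two bits within the even class is what allows a single-bit change to be recognized, while a separation of one bit in a mixed parity alphabet lets one letter pass for another.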

The above argument from error-coding theory does not exclude the use of two or more of the other even bases iC, iG, L and S along with the natural subset. Nor does it exclude the use of an alphabet of 4 or more letters chosen exclusively from the odd class in Fig. 13. To argue against such possibilities, Mac Dónaill [14–16] enumerates several “chemical constraints”, among them tautomeric instability (such as that illustrated in Fig. 7). This excludes the unstable tautomeric bases iC, iG, β, δ, α and � of the extended family of Fig. 13. As to the base pairs L–S and �–�, they are totally unstable in water as the 3 acceptor lone pairs are vulnerable to hydrolysis.

Mac Dónaill also discusses why natural evolution, apparently for reasons of efficiency, did not stop at using a minimal alphabet of just two letters, namely one of the three pairs of equally stable tautomers A–T, G–C or κ–X, an outstanding evolutionary problem already discussed by Crick [43]. We refer to Mac Dónaill’s papers for more detailed discussions of these fascinating issues.

Acknowledgments This paper is dedicated to the memory of my dear friend and colleague, the late geneticist Jean Vandenhaute of the University of Namur, Belgium. I thank my colleagues, professors Guy Maghuin and Jacques Pasteels, for critically reading this manuscript and for many discussions on its subject.

References

1. Bohr, N.: On the constitution of atoms and molecules. Philos. Mag. 26, 1–25 (1913)

2. von Laue, M.: Phys. Z. 14, 421–423; 1040–1041; 1075–1079 (1913)

3. Bragg, W.L.: The diffraction of electromagnetic waves by a crystal. Proc. Camb. Philos. Soc. 17, 43–57 (1913)

4. Bragg, W.L.: The structure of some crystals as indicated by their diffraction of X-rays. Proc. R. Soc. (Lond.) A89, 248–277 (1913)

5. Moseley, H.G.J.: The high frequency spectra of the elements. Philos. Mag. 27, 703–713 (1914)

6. Polanyi, M.: The X-ray fiber diagram. Z. Phys. 7, 149–180 (1921)

7. Astbury, W.T., Street, A.: X-ray studies of the structure of hair, wool and related fibers. Philos. Trans. R. Soc. Lond. A230, 75–101 (1931)

8. Bernal, J.D., Crowfoot, D.: X-ray photographs of crystalline pepsin. Nature 133, 794–795 (1934)

9. Bernal, J.D., Fankuchen, I., Perutz, M.: An X-ray study of chymotrypsin and haemoglobin. Nature 141, 523–524 (1938)

10. Bohr, N.: Light and life. Nature 131(421–423), 457–459 (1933)

11. Delbrück, M.: Light and life III. Carlsberg Res. Commun. 41(6), 299–309 (1976)

12. Stent, G.: Light and life: Niels Bohr’s legacy to contemporary biology. Science 160, 384 (1968)

13. Watson, J.D.: The Double Helix. Atheneum, New York (1968)

14. Mac Dónaill, D.A.: A parity code interpretation of nucleotide alphabet composition. Chem. Commun. 18, 2062–2063 (2002)

15. Mac Dónaill, D.A.: Why nature chose A, C, G and U/T: an error-coding perspective of nucleotide alphabet composition. Orig. Life Evol. Biosph. 33, 433–455 (2003)

16. Mac Dónaill, D.A.: IEEE Engineering in Medicine and Biology Magazine, Jan–Feb (2006)

17. Maddox, B.: Rosalind Franklin: The Dark Lady of DNA. Harper Collins, London (2002)

18. Judson, H.F.: The Eighth Day of Creation. Penguin Books, London (1979)

19. Olby, R.: The Path to the Double Helix. The Discovery of DNA. Dover Publications, New York (1994)

20. Fuller, W.: Who said “Helix”. Nature 424, 876–878 (2003)

21. Wilkins, M.H.F., Stokes, A.R., Wilson, H.R.: Molecular structure of nucleic acids: molecular structure of deoxypentose nucleic acids. Nature 171, 739 (1953)

22. Franklin, R., Gosling, R.G.: Molecular configuration in sodium thymonucleate. Nature 171, 740–741 (1953)

23. Franklin, R., Gosling, R.G.: The structure of sodium thymonucleate fibres. I. The influence of water content. Acta Cryst. 6, 673–677 (1953)

24. Crick, F.H.C.: What Mad Pursuit. Basic Books-Harper Collins, New York (1988)

25. Langridge, R., Seeds, W.E., Wilson, H.R., Hooper, C.W., Wilkins, M.H.F., Hamilton, L.D.: Molecular structure of deoxyribonucleic acid (DNA). J. Biophys. Biochem. Cytol. 3, 767 (1957)

26. Chargaff, E.: Chemical specificity of nucleic acids and the mechanism of their enzymatic degradation. Experientia 6, 201–209 (1950)

27. Watson, J.D., Crick, F.H.C.: A structure for deoxyribose nucleic acid. Nature 171, 737–738 (1953)

28. Bragg, W.L.: A new type of “X-ray Microscope”. Nature 143, 678 (1939)

29. Bragg, W.L.: Lightning calculations with light. Nature 154, 69–72 (1944)

30. Lucas, A.A., Lambin, P.H., Mairesse, R., Mathot, M.: Revealing the backbone structure of B-DNA from laser optical simulations of its X-ray diffraction diagram. J. Chem. Educ. 76, 378–383 (1999)

31. Lucas, A.A.: Rosetta stone of the genetic language. Int. J. Quantum Chem. 90, 1491–1504 (2002)

32. Lucas, A.A.: A-DNA and B-DNA: comparing their historical X-ray fiber diffraction images. J. Chem. Educ. 85, 737–744 (2008)

33. Lucas, A.A., Lambin, P.: Diffraction by DNA, carbon nanotubes and other helical nanostructures. Rep. Prog. Phys. 68, 1181–1249 (2005)

34. Shugar, D.L., Kierdaszuk, B.: New light on tautomerism of purines and pyrimidines and its biological and genetic implications. Proc. Int. Symp. Biomol. Struct. Interactions, Suppl. J. Biosci. 8, 657–668 (1985)

35. Manoj, K.S., Leszczynski, J.: Tautomerism in nucleic acid bases and base pairs: a brief overview. WIREs Comput. Mol. Sci. 3, 637–649 (2013)

36. Arunan, E., et al.: Defining the hydrogen bond: an account (IUPAC Technical Report). Pure Appl. Chem. 83, 1637–1641 (2011)

37. Gilli, G., Gilli, P.: The Nature of the Hydrogen Bond: Outline of a Comprehensive Hydrogen Bond Theory. Published to Oxford Scholarship Online: September 2009. Print ISBN-13: 9780199558964

38. Piccirilli, J.A., Krauch, T., Moroney, S.E., Benner, S.A.: Enzymatic incorporation of a new base pair into DNA and RNA extends the genetic alphabet. Nature 343, 33–37 (1990)

39. Yang, Z., Hutter, D., Sheng, P., Sismour, A.M., Benner, S.A.: Artificially expanded genetic information system: a new base pair with an alternative hydrogen bonding pattern. Nucleic Acids Res. 34(21), 6095–6101 (2006)

40. Benner, S.A., Yang, Z., Chen, F.: Synthetic biology, tinkering biology, and artificial biology. What are we learning? C. R. Chim. 14, 372–387 (2011)

41. Szathmary, E.: Perspectives. Nature Reviews Genetics 4, 995 (2003)

42. Lee, J.J.: http://newswatch.nationalgeographic.com/2013/04/11/francis-cricks-letter-to-son-describing-dna-auctioned/. Accessed 11 Apr 2013

43. Crick, F.H.C.: The origin of the genetic code. J. Mol. Biol. 38, 367–379 (1968)