

Life and Its Molecules
A Brief Introduction

Lawrence Hunter

AI Magazine, Spring 2004 (Articles)
Copyright © 2004, American Association for Artificial Intelligence. All rights reserved. 0738-4602-2004 / $2.00

■ To appreciate bioinformatics, it is necessary to understand some of the basic concepts and terminology of molecular biology. This article is a brief introduction to the extraordinarily complex phenomenon of life and to its molecular basis. We begin with the amazing diversity of life forms and the equally amazing unity in the molecules underlying life’s processes. The challenge of accounting for both the variety and the commonalities among organisms is met by evolutionary theory; despite controversies, all scientific approaches to understanding life build on a shared core that can be stated briefly. One of the great insights of the last generation of biologists was the chemical instantiation of these evolutionary theories, whose discovery has driven biology toward the study of the structure and function of biological molecules. After an introduction to some of these key molecules and to the central dogma of molecular biology, we can begin to see the outlines of how such molecules can accomplish the tasks required of simple and then more complex life forms. The introduction concludes with a brief account of some of the new instruments and model systems that are now so rapidly advancing scientific understanding of life.

Life is an extraordinarily complex phenomenon. Although the study of living things dates at least as far back as Aristotle (fourth century BCE), the advent of tools that allow the interrogation of living systems in molecular detail and genomic breadth makes this a particularly exciting era in the history of biology. The purpose of this introduction is to help you begin to understand and appreciate our growing understanding of what living things are, what they do, and how they do it.

It is perhaps the holistic nature of the subject matter that makes creating an accessible introduction to biology so difficult. Understanding any aspect of living things can seem to require understanding of dozens of other aspects. There is no easy place to begin, no simple set of problems that can be grasped in isolation as a prelude to deeper understanding. The study of life is really many studies: evolution, biochemistry, genetics, pathology,¹ and ecosystems, just to name a few. The purpose of a brief introduction such as this one is to impart enough knowledge about enough different aspects of life to provide a foundation for more detailed understanding of the particulars relevant to the other articles in this issue.

A useful metaphor to keep in mind is that learning biology is akin to learning a foreign language. First, there is an extensive specialized vocabulary that biologists use to characterize living systems and their properties. To understand the biological literature, one must learn these terms and how they are used. This article introduces many such terms, using italics to set them off. As might become clear in this introduction, language is also a useful metaphor for understanding the structure and function of biological systems at the molecular level. The “book of life” is an apt and useful idea. Learning a foreign language involves more than just learning words. Languages are an intimate part of cultures; so, for example, learning French generally involves learning something about French culture as well as French words. The same is true of biology. Biologists approach scientific problems somewhat differently than physicists, chemists, and other colleagues.

One of the major differences between biology and the physical sciences is the central role of detailed descriptions of the phenomena under study compared to general theoretical constructs. The New Zealand-born physicist Ernest Rutherford once dismissed biology as mere “stamp collecting,” poking fun at this aspect of the science. Although it is true that a central aspect of biological science aims to create detailed (and accurate) descriptions of living things and their activities, a better metaphor than stamps might be collecting biographies. Although stamps do vary, their variations are quite constrained, and most variations are not particularly tied to function. However, variation in living things, like variation in human life stories, is extraordinarily broad and so central to what it is to be alive that it is itself a phenomenon worthy of study. The many details of the complex story of an organism play a synergistic role in understanding it; reducing these details to a simpler characterization runs the risk of caricature. Let us therefore turn first to examining the diversity of life stories that make up the subject matter of biology.

Diversity

One of the most clearly distinguishing features of life as a whole is its diversity. Organisms exhibit an overwhelming collection of differences. Most people are familiar with only a tiny fraction of the kinds of life on earth, and even that small sample includes enormous variation. There are more than a million known species, and estimates of the number yet to be characterized range from 10 to 100 million additional species.

Consider the differences among just a few organisms, say, mayflies, grizzly bears, tortoises, dinosaurs, earthworms, guppies, and eagles. Some are huge, others tiny. They exhibit tremendous differences in how they feed, how they reproduce, what environments they can survive in, how long they live, what their sensory and motor abilities are, and so on. There are organisms whose home environments are so remote from our intuitions about what is hospitable to life that they are called extremophiles, for example, creatures that live in volcanic vents on the deep ocean floor or in acids so strong they can dissolve most familiar materials immediately.

Whole species differ greatly from one another, but there are also large variations among individuals within a single species. Even within a single individual, there can exist an amazing diversity of organs, tissues, and other components. This diversity in the activities and constituents of living things continues all the way down to the molecular level, where even in the simplest organisms many thousands of molecules interact with each other in as yet uncountable ways. As one becomes more familiar with the details of species, individuals, tissues, and molecules, life’s diversity becomes even more striking.

An important part of understanding biology is developing at least a moderately detailed appreciation for the many species of living things and their relationships with each other; this study is called taxonomy. In many respects, microscopic life is considerably more varied than the life forms we can see. Many of these life forms consist of a single cell, which is the fundamental unit of life. (An adult human being contains more than a trillion cells.) The diversity of microscopic species in a cubic meter of seawater can rival that of macroscopic species in a cubic kilometer of rainforest. Because most microbes do not grow well in a laboratory, it has only been with the advent of molecular technology that the extent of microbial diversity has become apparent.

It is also worth noting that human beings have been present for only a vanishingly small portion of the history of life, at most the past million years or so. In comparison, dinosaurs roamed the earth for more than 160 million years. Peering deep into fossil history shows us many organisms that thrived for millions of years but are like nothing alive today, such as the five-eyed, vacuum-nose Opabinia from the Cambrian period.

One of the major challenges of biology as a science is to account for this diversity. How did it arise? How is it maintained? Why is it this particular set of diverse entities and processes and not some other?

Unity

Given the extraordinary diversity, one of the most surprising discoveries in the history of biology is the near universality of the molecular detail underlying all living things. The instruments and experimental approaches necessary to perceive anything molecular at all are relatively recent developments, and it is on the basis of that newfound ability to investigate life at a molecular level that so much of the recent progress and excitement has come.

All living things ever encountered depend crucially on the activities of the unusual and complex family of molecules called proteins. There are hundreds of thousands of different kinds of proteins, and they work together in large groups to carry out almost every biological function. Two rather extreme examples of proteins are hemoglobin, which carries oxygen in the blood, and anthrax toxin, a lethal poison created by a microbe. As described in more detail later, proteins are the entities responsible for the near miracles of chemistry required to turn food into bodies and offspring. The proteins that accomplish a particular function in one organism are generally quite similar to the proteins that perform similar functions in many other organisms. The unity among organisms is not merely that proteins generally do most of the biochemical work required for life but that very similar sets of proteins doing very similar kinds of things are found in extraordinarily diverse organisms. Many of the proteins in human beings are remarkably similar in structure and function to those found in, say, brewer’s yeast!

The ubiquity of proteins is not the only remarkable unity among organisms. All living things make important use of another unusual and complex family of molecules, the nucleic acids. There are two distinct kinds of nucleic acids, (1) deoxyribonucleic acid (DNA) and (2) ribonucleic acid (RNA), which play somewhat different but related roles as the information carriers of life.

Together, the nucleic acids and the proteins are called biological macromolecules, based on their large size compared to most inorganic molecules. If one could stretch it out, the DNA in a single human cell would be roughly two meters long (although only about 20 angstroms wide)! Both proteins and nucleic acids are linear polymers, which are molecules made up of long strings of just a few basic components. The components of proteins are called amino acids, and the components of nucleic acids are nucleotides. It is the particular relationships among components that give an individual macromolecule its distinguishing characteristics. The specific order of components is called the sequence of the macromolecule. Macromolecular sequences can be thought of as strings of “letters” that form the words, sentences, chapters, and books of living things.

In contrast to the macromolecules, all the other many molecules in the world relevant to living things (such as water, sugars, fats, and drugs) are often called small molecules. The study of the actions of biomolecules large and small is generally termed biochemistry. Two things drive the expanding understanding of life: scientific understanding of the structure and function of macromolecules, and engineering advances in the instruments used to investigate the details of particular members of these families. Hence the term molecular biology (figure 1).

Evolution

There is another unity among all forms of life that is even more important than the molecular one: evolution. Evolution is, without a doubt, the most important concept in biology, and it was discovered long before biological molecules were even conceived. Although biological evolution is itself a complex topic, the basic idea is again simple: All organisms are part of a continuous line of ancestors and descendants. This is the only statement in biology to which there is no exception.

There are some very important consequences to this statement. Every creature that ever existed on earth is related (however distantly) to every other creature. If you go back far enough, every pair of organisms shares a common ancestor. Not only are humans related to (that is, share a common ancestor with) chimpanzees, we are also relatives of dinosaurs and even bacteria! There is, in fact, a “universal ancestor” that is the great-great-great ... great-grandparent of every organism on the planet.

The existence of common ancestors is an important part of the explanation of the similarities we see within families of organisms. For example, because the use of nucleic acids to code for proteins is universal throughout life, evolution suggests that the most recent universal ancestor must have done the same thing. Other similarities among smaller sets of organisms, such as bilateral symmetry in body shapes or the presence of oxygen-carrying hemoglobin in circulating blood, are usually shared by virtue of their inheritance from a common ancestor.

Evolutionary relatedness leads to a straightforward explanation of similarities among organisms, analogous to the observation that offspring are similar to their parents. The more difficult challenge is to balance an explanation of our similarities with an explanation of the diversity. Evolution’s success in meeting this challenge is what has made it so central to our understanding of life.

Evolution is a controversial topic with several competing theories, but the overall structure of all the competitors involves three basic phenomena. First, evolution requires self-replication. That is, entities that evolve must make copies of themselves. The entity that does the replicating is the parent, and the resulting new entity is the offspring. There is a lot of subtlety hidden in the word copy in that definition. Simply stated, offspring must share at least some of the characteristics of the parent; this sharing is called inheritance. Inheritable characteristics are called traits. Inheritance is one of the forces that drive life toward unity.

The second requirement for evolution is a source of variation: if offspring were all perfect replicas of their parents, there would be no evolution. There are many sources of inheritable variation in biology. Some of these sources of variation are random, such as mutation; others are systematic, such as the mix of inheritance from multiple parents in sexual reproduction. Variation is one of the forces that drive life toward diversity.


Figure 1. From the Cell to Protein Machines. Cells are the fundamental working units of every living system. All the instructions needed to direct their activities are contained within the chemical deoxyribonucleic acid (DNA). Although genes get a lot of attention, it’s the proteins that perform most life functions and even make up the majority of cellular structures. Proteins are large, complex molecules made up of smaller subunits called amino acids. Chemical properties that distinguish the 20 different amino acids cause the protein chains to fold up into specific three-dimensional structures that define their particular functions in the cell. Figure courtesy, U.S. Department of Energy Human Genome (www.ornl.gov/hgmis).

However, the variation we see in life is clearly not wholly random. The final key to understanding evolution is the idea of selection. Not every organism gets to reproduce. Selection is the process by which some organisms have offspring, and others don’t. There are many aspects to selection. Although some aspects of reproductive success are random, others are related to the traits of the organism. If we look at groups of closely related organisms (called populations) rather than individuals, it is possible to demonstrate that variations in traits that are positively related to reproductive success will tend to become common in the population over time, and those variations that negatively affect reproduction will tend to disappear.

The relationship between the particular set of traits of an organism and its reproductive success is termed fitness. The famous dictum “survival of the fittest,” coined by Herbert Spencer and later adopted by Charles Darwin, emphasizes the high stakes and inherent competition involved in differential reproductive success. Of course, fitness is a very complex function, which depends on many things, including the environment in which the organism lives and even other organisms in that environment. Variations that improve fitness in a particular environment are called adaptations to that environment.
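The claim that variations favoring reproductive success tend to become common over time can be illustrated with a very small numerical sketch. The model below is an assumption made for illustration, not something taken from the article: it tracks a single trait with two variants in an asexually reproducing population, gives each variant a fixed relative fitness, and ignores mutation and random effects entirely. The function names and fitness values are hypothetical.

```python
def next_frequency(p, fitness_a, fitness_b):
    """Frequency of variant A in the next generation.

    p is the current frequency of A; each variant leaves offspring in
    proportion to its (assumed, constant) relative fitness.
    """
    mean_fitness = p * fitness_a + (1 - p) * fitness_b
    return p * fitness_a / mean_fitness


def simulate(p0, fitness_a, fitness_b, generations):
    """Return the list of frequencies of A from generation 0 onward."""
    frequencies = [p0]
    for _ in range(generations):
        frequencies.append(next_frequency(frequencies[-1], fitness_a, fitness_b))
    return frequencies


if __name__ == "__main__":
    # Hypothetical numbers: A starts rare (1 percent) with a 10 percent advantage.
    history = simulate(p0=0.01, fitness_a=1.1, fitness_b=1.0, generations=200)
    for generation in (0, 50, 100, 200):
        print(generation, round(history[generation], 4))
```

Under these assumptions even a modest advantage carries a rare variant to near fixation, and a variant with no advantage stays at its starting frequency. Real populations add sexual reproduction, finite population size, and changing environments, which this sketch deliberately leaves out.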

The combination of inheritance, variation, and selection is the essence of evolution, the force that created both the unity and the diversity of living things that we observe today. Biology’s first understanding of the basic mechanisms of evolution arose from Gregor Mendel’s study of inheritance of characteristics in sexually reproducing plants. Offspring had long been thought to be blends of the traits of the parents. However, if this were true, all variation would quickly disappear. Darwin noted this paradox but never resolved it. Mendel’s answer was that traits were particulate; traits do not blend but are inherited (or not) as a unit. Evolutionarily speaking, a gene is the particle of inheritance, or the smallest inheritable unit. An organism can be said to have a genotype, that is, the complete set of genes that were inherited from its parents.

It is important to remember that not all the characteristics of an organism (that together are called its phenotype) are determined by inheritance. Organisms with precisely the same genotype can end up with quite different phenotypes, like the differences between identical twins reared apart. One of the basic concepts of biology is that genotype interacts with environment to determine phenotype.

The genotype of one individual can be different from that of another individual, even of the same species. Alternative forms of the same gene are called alleles. For example, the color of a flower might be determined by a single gene, and the phenotype pink might arise from a particular allele of the flower color gene. Organisms of a particular species all have the same genes but can have different alleles.

Mendel proposed that each sexually reproducing organism has two alleles for each gene, one from each parent. If the two alleles are the same, the phenotype reflects it. These organisms are called homozygotic for that allele. If the two alleles are different, the organism is called heterozygotic, and the phenotype reflects the dominant allele. The allele that is not dominant is called recessive. Recessive alleles are reflected in the phenotype only when they are homozygotic.
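Mendel's scheme of two alleles per gene, one inherited from each parent, can be checked by simple enumeration. The sketch below is illustrative only: it assumes a single gene with a dominant allele written "A" and a recessive allele written "a" (a notation chosen here, not one the article uses), and it assumes each parent passes on either of its two alleles with equal probability.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Count offspring genotypes for one gene.

    Each parent is a two-letter genotype such as "Aa"; every pairing of
    one allele from each parent is taken to be equally likely.
    """
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))


def phenotype(genotype, dominant="A"):
    """A single copy of the dominant allele is enough to show the dominant trait."""
    return "dominant" if dominant in genotype else "recessive"


def phenotype_counts(parent1, parent2, dominant="A"):
    """Collapse the genotype counts from cross() into phenotype counts."""
    counts = Counter()
    for genotype, n in cross(parent1, parent2).items():
        counts[phenotype(genotype, dominant)] += n
    return counts


if __name__ == "__main__":
    print(cross("Aa", "Aa"))             # genotypes from two heterozygotic parents
    print(phenotype_counts("Aa", "Aa"))  # 3 dominant to 1 recessive
```

Crossing two heterozygotic parents gives genotypes in a 1:2:1 ratio and phenotypes in a 3:1 ratio, which is the pattern expected when the recessive allele shows only in homozygotic offspring.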

Although nearly lost to history (Mendel was a contemporary of Darwin, but his results went largely unnoticed until they were rediscovered in 1900), this theory has held up through the transition from a purely evolutionary definition of a gene to the contemporary chemical one. Inheritance of genes in sexually reproducing organisms is determined by Mendel’s laws. However, only characteristics that are monogenic (related to a single gene) display Mendelian inheritance at the phenotypic level. All continuously varying quantitative traits (such as height) must be polygenic (involving several genes), which can make inheritance appear to be a blending rather than an all-or-nothing phenomenon. Most medically important traits (such as proclivity to cancer or heart disease) are polygenic. That is why claims about having found the gene for breast cancer (or intelligence or any other complex phenomenon) are generally journalistic oversimplifications.
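To see why a polygenic trait can look like blending even though each gene is inherited as a unit, it helps to count outcomes. The sketch below rests on assumptions that are not in the article: every gene has two alleles, a "plus" and a "minus"; both parents are heterozygotic at every gene; genes are inherited independently; and the trait is simply the number of plus alleles an offspring carries. The function name is hypothetical.

```python
from math import comb


def plus_allele_distribution(n_genes):
    """Probability of inheriting k plus alleles, for k = 0 .. 2 * n_genes.

    With both parents heterozygotic at every gene, each of the 2 * n_genes
    inherited alleles is a plus allele with probability 1/2, so the count
    follows a binomial distribution.
    """
    total = 2 * n_genes
    return [comb(total, k) / 2 ** total for k in range(total + 1)]


if __name__ == "__main__":
    for n_genes in (1, 3, 10):
        distribution = plus_allele_distribution(n_genes)
        print(n_genes, "genes:", len(distribution), "possible trait values")
```

With one gene there are only three possible values, in the familiar 1:2:1 proportions. With ten genes there are already 21 closely spaced values concentrated around the middle, which is hard to distinguish from a continuous, blended trait once environmental effects are added.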

The Central Dogma

Biological macromolecules have many remarkable properties, and their study is the essence of molecular biology. The most central of these, made comprehensible by the double-helix structure that James Watson and Francis Crick proposed, is that the DNA molecule is the carrier of the gene. The relationship between genes and phenotype is that nucleotide sequences in a DNA molecule code for the amino acid sequences of proteins. DNA coding for protein is the biochemical basis for the connection between genes and phenotype (a bit more biochemistry is necessary to fully appreciate that statement, but keep reading!).

The specific relationship between nucleic acids and proteins is so important to modern biology that it is called the central dogma. The central dogma itself is relatively simple, although the chemical mechanisms underlying it are not. The dogma states, “DNA molecules contain information about how to create proteins; this information is transcribed into RNA molecules, which, in turn, direct chemical machinery that translates the nucleic acid message into a protein.” One way to remember the difference between transcription and translation is to note that DNA and RNA are different mechanisms for information storage (so exchange among them is mere transcription) but that protein is a mechanism for action, so its production requires translating information into action. Although it is most often the case that a single DNA sequence specifies a single protein, sometimes the DNA sequence of a gene can specify multiple proteins (through alternative splicing) or even no protein (when the transcribed RNA plays a direct functional role).
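The two steps of the central dogma can be mimicked with ordinary string operations, which is one reason the language metaphor works so well. The sketch below makes several simplifying assumptions: it starts from the coding strand of the DNA (so transcription reduces to replacing T with U, whereas the cell actually reads the complementary template strand), it includes only a handful of entries from the standard genetic code rather than all 64 codons, and the example sequence is made up.

```python
# A small subset of the standard genetic code (RNA codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met",   # also the usual start codon
    "GCC": "Ala",
    "AAA": "Lys",
    "UGG": "Trp",
    "UAA": "Stop",
    "UAG": "Stop",
    "UGA": "Stop",
}


def transcribe(coding_strand):
    """DNA -> RNA: same letters as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """RNA -> protein: read three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


if __name__ == "__main__":
    dna = "ATGGCCAAATAA"          # made-up example sequence
    mrna = transcribe(dna)
    print(mrna)                   # AUGGCCAAAUAA
    print(translate(mrna))        # ['Met', 'Ala', 'Lys']
```

Transcription keeps the message in the same kind of alphabet, while translation maps it into a different one (three nucleotides per amino acid), which mirrors the distinction the paragraph above draws between the two terms.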

The central dogma states that the flow of information is one way, from DNA to RNA to protein. This is a good time to make the observation that there is practically no statement in all biology that is universally true, without exception. It turns out that there are even exceptions to the central dogma! For example, the AIDS virus (HIV), like its relatives in the broader family of retroviruses, is able to transcribe RNA back into DNA (a process called reverse transcription), hence the “retro” in retrovirus. However, these kinds of exceptions are quite rare, and the central dogma is about as lawlike as any statement in biology ever gets.

A profound implication of the central dogma is that nearly all the information necessary to construct and operate a living thing is contained in its DNA.² We call the complete complement of DNA (and therefore the collection of all the genes) in a particular species its genome. That is why genome sequencing projects, which determine the exact sequence of all the DNA in an organism, are so important.

Structure and Function

The components and activities of living things are studied in two distinct and complementary ways: (1) their structure and (2) their function. Structure, whether of an entire organism or of a single biomolecular component, describes physical composition and physical relationships. Function describes the role that a component plays in the processes of life. Much research in molecular biology is done to relate a known function to the (unknown) structures that instantiate it or to relate a known structure to the (unknown) function that it supports.

The pressure of evolutionary selection ensures that the main function of all living things is to turn environmentally available matter and energy into offspring more successfully than competitors. The structures that support this function consist minimally of three components: (1) boundaries separating the organism from its environment, (2) the organism’s inheritable characteristics, and (3) all the other materials necessary for survival and reproduction. Boundaries take the form of membranes and are made of a class of small molecules called lipids. The inheritable characteristics, that is, the genome, are physically embodied in structures called chromosomes that consist primarily of DNA. The other materials necessary for life form a complex and highly structured mixture loosely called the cytoplasm.

The main function that life must support (turning food into babies) is realized through a wildly complex set of chemical reactions. At its essence, bonds among the atoms in the matter that living things consume as food must be broken and remade into the molecules needed for life, all at the right times and in the right amounts. Some of the reactions that living things use to turn food into offspring are thermodynamically feasible but would happen very slowly or infrequently on their own. Other reactions that living things manage to exploit are not even thermodynamically feasible. What makes these reactions happen at the rates needed for life? This question is why chemistry is so central to understanding biology.

Proteins, functioning as enzymes, lower the activation energy needed for thermodynamically feasible reactions, thereby catalyzing them. Although they play other roles as well, many proteins have an enzymatic function. Even more importantly, reactions that are not thermodynamically feasible on their own can be made to happen by coupling them to other reactions that break down energy-rich compounds and provide the compensating free energy. This is the essence of metabolism and why organisms need energy (either from sunlight or food) to live.

Metabolism involves catabolism, the breakdown of nutrients to release energy in forms that can be used by the organism, and anabolism, the synthesis of the material components necessary for maintenance and reproduction of life (such as particular lipids, proteins, and nucleic acids). The material being acted on by an enzyme is often called its substrate, and the result of the enzymatic transformation of the substrate is the product. Generally speaking, metabolism is realized by sets of linked chemical reactions, where the product of one reaction becomes the substrate for the next. Each reaction in this chain is catalyzed by a different protein; such a set of reactions is called a metabolic pathway. There are hundreds of such pathways even in the simplest organisms, and these pathways branch and loop in various ways to form complex metabolic networks (figures 2, 3).

How is it that proteins can accomplish these amazing feats of chemistry? The details of enzymatic mechanisms depend on the quantum mechanics of electrons and bonds, but without going into that level of detail, it is still possible to gain a rough understanding of the structure and function of proteins.

The enzymatic function of a protein generally has three aspects: (1) activity, (2) specificity, and (3) regulation. The activity of an enzyme is what it does chemically; for example, it might break a particular kind of bond. There are about a dozen very broad classes of activities and many variations on these themes. Specificity is the ability of proteins to recognize and act on only particular substrates, often being able to discriminate between extremely subtle chemical differences. Finally, the activity of a protein can often be turned on or off or modulated more finely by other molecules, termed the regulation of the enzyme.

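The idea of a metabolic pathway, in which the product of one enzyme-catalyzed reaction becomes the substrate of the next, maps naturally onto a small graph. The sketch below is a toy illustration rather than a model from the article: it uses the first three steps of glycolysis, omits cofactors such as ATP, and treats a gene deletion as simply removing one enzyme. The function and variable names are hypothetical.

```python
# enzyme -> (substrate, product); cofactors are deliberately left out.
REACTIONS = {
    "hexokinase": ("glucose", "glucose-6-phosphate"),
    "phosphoglucose isomerase": ("glucose-6-phosphate", "fructose-6-phosphate"),
    "phosphofructokinase": ("fructose-6-phosphate", "fructose-1,6-bisphosphate"),
}


def reachable_metabolites(start, reactions, knocked_out=frozenset()):
    """Return every metabolite that can be made from `start`.

    Enzymes listed in `knocked_out` are treated as missing, as they would
    be in a mutant lacking the corresponding gene.
    """
    products_of = {}
    for enzyme, (substrate, product) in reactions.items():
        if enzyme not in knocked_out:
            products_of.setdefault(substrate, []).append(product)

    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for product in products_of.get(current, []):
            if product not in seen:
                seen.add(product)
                frontier.append(product)
    return seen


if __name__ == "__main__":
    print(reachable_metabolites("glucose", REACTIONS))
    print(reachable_metabolites("glucose", REACTIONS,
                                knocked_out={"phosphoglucose isomerase"}))
```

Removing a single enzyme cuts the chain at that step, which is a crude version of asking whether a single-gene deletion mutant can still make what it needs. Real network models also track reaction rates, alternative routes, and energy balance, none of which appear here.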


Figure 2. Metabolic Network Model for Escherichia coli. Metabolic maps provide a framework for studying the consequences of genotype changes and the relationships between genotypes and phenotypes. This metabolic network model for Escherichia coli incorporated data on 436 metabolic intermediates undergoing 720 possible enzyme-catalyzed reactions. In this diagram, the circles contain abbreviated names of the metabolic intermediates, and the arrows represent enzymes. The very heavy lines indicate links with high metabolic fluxes. Analyses were correct 90 percent of the time in predicting the ability of 36 mutants with single-gene deletions to grow on different media. (Image courtesy U.S. Department of Energy Genomes to Life Program, doegenomestolife.org. Figure adapted from J. S. Edwards and B. O. Palsson, Proc. Nat. Acad. Sci. 97, 5528-33 [2000].)

Figure 3. Pathway Kinetics. The pathway kinetics model depicts the mechanisms of the “decision circuit” that commits a bacterial virus (lambda) to one of two alternate pathways in its life cycle. The lytic path sets the stage for immediate replication of the virus and destruction of its Escherichia coli host cell, and the lysogenic path selects for the incorporation of viral DNA into the host genome, allowing the virus to remain in a dormant state. In the diagram, bold horizontal lines indicate stretches of double-stranded DNA, arrows over genes show the transcription direction, and dashed boxes enclose operator sites that make up a promoter control complex. The core of the decision circuit is the four-promoter, five-gene regulatory network; initiation of pathway actions involves other coupled genes not shown. (Image courtesy U.S. Department of Energy Genomes to Life Program, doegenomestolife.org. Figure adapted from A. Arkin, J. Ross, and H. H. McAdams, Genetics 149, 1633-48 [1998].)

Each of these aspects of enzymatic function is realized by a corresponding aspect of the structure of the protein. Recall that proteins are linear polymers, made up of a particular sequence of amino acids. An average protein might contain a bit more than a hundred amino acids; very large ones can have thousands. There are 20 different naturally occurring amino acids that are assembled into proteins, which means that the total number of possible proteins is enormous. Each different amino acid has somewhat different chemical properties; some are charged, some are heavy, some have aromatic rings,3 some are hydrophobic,4 and so on. The precise arrangement of amino acids determines the structure, activity, specificity, and mechanism of regulation of the protein.

Although proteins are linear polymers, the linear chain (also known as the backbone of the protein) self-assembles into quite complex three-dimensional shapes (or conformations). These three-dimensional shapes are crucial to the function of the protein. When dissolved in water, most proteins fold into either a single conformation or a small ensemble of similar conformations. However, there are an enormous number of physically possible conformations, and a complete understanding of this process, or even enough of an understanding to predict the three-dimensional shape of a protein given its sequence of amino acids, is still elusive, although much progress has been made recently. The correct folding of a protein is crucial for its function; the recently discovered prion diseases (such as mad cow disease) are caused by misfolded proteins. How particular conformations impart particular activities, specificities, and regulation mechanisms is also the subject of intense scrutiny. For example, such an understanding is often important in the development of new pharmaceuticals.
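To give a rough sense of scale for the remark that the number of possible proteins is enormous, the short Python sketch below counts the distinct chains that can be built from 20 amino acids at a few lengths. The function names are illustrative only, and the count treats every sequence as equally possible, which is a simplifying assumption rather than a biological claim.

```python
AMINO_ACIDS = 20  # naturally occurring amino acids, as stated in the text


def possible_sequences(length):
    """Number of distinct amino acid chains of exactly `length` residues."""
    return AMINO_ACIDS ** length


def order_of_magnitude(n):
    """Power of ten of a positive integer (number of decimal digits minus one)."""
    return len(str(n)) - 1


if __name__ == "__main__":
    for length in (10, 100, 1000):
        count = possible_sequences(length)
        print(f"length {length}: roughly 10^{order_of_magnitude(count)} sequences")
```

Even at the length of a fairly typical protein, the count is far too large to enumerate, which is one reason structure prediction cannot simply try every possibility.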

The Molecular Biology of the Gene

The central dogma connects the Mendelian idea of the function of a gene (that is, as the unit of genetic transmission) with a particular structure: Genes specify proteins. How does this work? DNA, which is the structure that embodies the genome, has to support two functions: First, it must be copied with high fidelity so that those instructions can be passed to offspring. Second, it must contain the specification of the proteins that ultimately determine (with the environment) the phenotype of an organism.

First, consider how inheritance works. DNA is also a linear polymer, this time of nucleotides. In some senses, the structure of DNA is simpler than that of proteins because there are only four different kinds of nucleotides found in DNA, and no matter what its sequence, DNA forms pretty much the same three-dimensional structure, the famous double helix. However, the remarkable aspect of DNA’s structure is how it supports replication. The nucleotide elements of DNA are adenine, guanine, cytosine, and thymine, abbreviated A, G, C, and T, respectively. Each nucleotide forms chemical bonds with one of the other nucleotides, which is called its complement: A is complementary to T and G to C. Each nucleotide element of the polymer is always matched with its complement; the term for one of these units is a base pair. A sequence of nucleotides (called a strand) is also directional in that the “head” end can be distinguished from the “tail” end.5 Thus, each DNA molecule actually embodies two sequences: (1) one going from head to tail and (2) the complementary strand going the other way:

head ACTGACTG tail
tail TGACTGAC head

The replicative machinery takes advantage of this complementarity. To copy a DNA molecule, it is unzipped (starting at either end), and a complementary nucleotide is bonded to each of the two unzipped segments. By the time the entire molecule is unzipped, new base pairs have been attached to each position in both of the original strands, which then form two copies of the original double-stranded DNA.
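The copying scheme just described can be sketched in a few lines of Python. This is a toy model under stated assumptions: strands are plain strings of A, C, G, and T, and the function names (`complement_strand`, `reverse_complement`, `replicate`) are invented for illustration; real replication involves many enzymes and proceeds differently on the two strands.

```python
# Pairing rule from the text: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand):
    """Partner strand written position by position under the original
    (so it reads tail to head when the original reads head to tail)."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand):
    """Partner strand rewritten in its own head-to-tail direction."""
    return complement_strand(strand)[::-1]


def replicate(top, bottom):
    """Unzip a duplex and give each old strand a newly built partner,
    producing two double-stranded copies."""
    copy_one = (top, complement_strand(top))        # old top + new bottom
    copy_two = (complement_strand(bottom), bottom)  # new top + old bottom
    return copy_one, copy_two


if __name__ == "__main__":
    top = "ACTGACTG"
    bottom = complement_strand(top)
    print(top, bottom)
    print(replicate(top, bottom))
```

Because each strand fully determines its partner, both copies come out identical to the original duplex, which is the point of the complementarity.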

The second function that DNA must serve is to specify proteins. In fact, the chemical definition of a gene is “a sequence of nucleotides in DNA that codes for an amino acid sequence.” DNA is a linear polymer whose constituents are drawn from four nucleotides, and DNA sequences are often written down as strings of their abbreviations, for example, ACCATAGGACTT. The genetic code is the mapping by which nucleotide sequences are translated into amino acid sequences. Recall that there are 20 different amino acids but only 4 nucleotides. Specifying 1 of 20 amino acids therefore requires at least 3 nucleotides (because there are only 16 possible combinations of 2 nucleotides). Nucleotide triplets are called codons, and each codon specifies a particular amino acid. Because there are 64 possible codons and only 20 amino acids, there is some redundancy in the code; for example, both the GGT codon and GGC codon specify the amino acid glycine. There are also three stop codons that are used to indicate the end of a protein sequence and a commonly used start codon that also codes for the amino acid methionine.
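The counting argument and the codon lookup can be illustrated with a small Python sketch. The table below is deliberately partial: it contains only a handful of codons from the standard genetic code (the start codon, the glycine codons, two others, and the three stop codons), and the helper names are assumptions made for this example rather than part of any library.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def min_codon_length(n_amino_acids=20):
    """Smallest word length whose 4**length combinations cover all amino acids."""
    length = 1
    while len(NUCLEOTIDES) ** length < n_amino_acids:
        length += 1
    return length


# Every possible nucleotide triplet: 4 * 4 * 4 = 64 codons.
ALL_CODONS = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]

# Partial table for illustration only; None marks a stop codon.
PARTIAL_CODE = {
    "ATG": "Met",  # commonly used start codon, also codes for methionine
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",  # redundancy
    "TTT": "Phe", "AAA": "Lys",
    "TAA": None, "TAG": None, "TGA": None,
}


def translate(dna):
    """Read codons left to right until a stop codon or the end of the string.
    Raises KeyError for codons missing from the partial table."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = PARTIAL_CODE[dna[i:i + 3]]
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    print(min_codon_length(), len(ALL_CODONS))
    print(translate("ATGGGTGGCTAA"))
```

In the usage example, GGT and GGC both translate to glycine, showing the redundancy mentioned above, and translation halts at the stop codon TAA.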

The mapping from DNA sequences to proteins closes the loop in understanding how phenotype can be influenced by genotype. The phenotype of an organism is highly dependent on how its components are synthesized, that is, on its metabolism. Metabolism, in turn, is determined by the precise details of the organism’s constituent proteins. Those details of protein structure and function can be traced to the protein’s precise amino acid sequences, and from there to the organism’s DNA, which exactly encodes the amino acid sequence of each protein an organism can make. That, briefly stated, is the connection between the structure of DNA and the function of Mendel’s gene. Following the metaphor of the book of life, consider genomic DNA to be the text, proteins the words, metabolic processes the sentences and paragraphs, and the phenotype a movie based on the text.

Not all the DNA in an organism codes for proteins. The remainder, however, is not just junk. Some RNA molecules have roles other than transcriptional messengers and can be functional end products themselves. A key role of functional RNA molecules is as part of the ribosome, an assembly of proteins and RNA molecules that carries out the translation of messenger RNA into protein. Even DNA that doesn’t code for protein or RNA can play an important role. One function of noncoding DNA is the regulation of gene expression, that is, the amount of each protein that is being made from a given gene at any particular time. The synthesis of proteins from the DNA code has a cost in matter and energy, and the organism doesn’t need the same amount of each protein at all times. Some proteins (say, those specialized for digesting unusual food sources or managing temperature stress) might not be needed at all in most circumstances. An important aspect of the way an organism controls how much protein is present is to regulate the synthesis of the protein (other mechanisms of control include the regulation of degradation and transport). The synthesis of a particular protein coded for by a particular gene is called the expression of that gene.

The expression of a gene begins with the transcription of genomic DNA into RNA. That process is itself controlled by the binding of particular proteins, called transcription factors, to the DNA molecule. Transcription factors recognize particular sequences of nucleotides that occur just upstream from (before) the start of the coding sequence of the gene. Transcriptional control is combinatorial, in that one transcription factor can influence the expression of many genes, and most genes are influenced by more than one transcription factor. Genes that are influenced by multiple transcription factors have multiple upstream DNA sequences that those transcription factors recognize. Regions that are required for transcription are called promoters; optional additional regions called repressors bind transcription factors that reduce the expression level, and others, called enhancers, bind transcription factors that increase it. Because transcription factors are themselves proteins, their activities are under the control of still other transcription factors, forming complex feedback loops that combine stability with responsiveness to environmental and other signals. These relationships among genes and transcription factors form the genetic regulatory network.

Prokaryotes

Armed with an appreciation of evolution, chemistry, and the central dogma, one can begin to understand some of life’s simplest forms: the bacteria. Bacteria, or to be more technically precise, the prokaryotes, are important in science (making up the vast majority of the biomass on the planet), economics (as tools in bioengineering), and human health (as pathogens). They are also utterly ubiquitous. The human gut is populated with the prokaryote Escherichia coli, which is necessary for people to properly digest their food. Small variations in that E. coli bacterium can turn it into a nasty pathogen, responsible for sometimes lethal cases of food poisoning. The shower curtain in your bathroom is probably home to billions of bacteria of thousands of different species, as is the dirt in your front yard, the sponge in your kitchen sink, and nearly every other place on the planet.

Prokaryotes are simple in a variety of respects. Prokaryotes are microscopic, single-celled organisms with minimal internal structure. Their fitness is largely determined by the speed at which they can reproduce. They contain fewer genes, organized in simpler regulatory patterns than other organisms. In fact, the study of certain bacteria was central in identifying the chemical components and processes that are absolutely necessary to life.

However, being as simple as possible still involves a fair degree of complexity. Prokaryotic molecular systems accomplish most of the key tasks in all living things: the capture, transport, and application of energy; the synthesis of all the molecules necessary for life from environmentally available materials; sensation, awareness, and response to the environment; the processing and even exchange of nucleic acid instructions; and, of course, reproduction, in the form of cell division (called binary fission in prokaryotes; the corresponding process in eukaryotic cells is mitosis). All these cellular processes are carried out by complex networks of proteins and catalyzed reactions. Despite their relative simplicity and the decades of research done on E. coli, a complete mechanistic understanding of all the activities of even these organisms is still elusive. In addition, it is important to keep in mind that prokaryotes are evolving organisms that are able to find novel and complex mechanisms to increase their fitness in the contemporary world, for example, for resisting antibiotics, eating plastics, or otherwise living in human-dominated environments.

The Eukaryotes and the Eukaryotic Cell

Despite the ubiquity of bacteria, most people are more familiar with the broad class of organisms that include plants and animals; that is, the eukaryotes. Eukaryotes include a tremendous range of organisms, from tiny free-living single-celled organisms to human beings consisting of more than a trillion cells. All multicellular organisms (and therefore all that are visible to the naked eye) are eukaryotic. Even some single-celled eukaryotes are familiar, such as brewer’s yeast or athlete’s foot fungus. (Although not discussed further in this introduction, there is also a third main branch of life called archaea. These organisms are all single celled but are more like eukaryotes than prokaryotes in a variety of ways; many are extremophiles that live in environments that once had been thought to be incompatible with life.)


At the cellular level, eukaryotes are easily distinguishable from bacteria by their greatly elaborated internal structure. Eukaryotic cells have a variety of internal compartments separated by membranes and various specialized internal components called organelles. The origins of the eukaryotic cell are particularly interesting; they appear to be the result of symbiotic communities of prokaryotes losing their individuality and merging into a single entity. Organelles are remnants of this merging; one even has kept its own genome, which reproduces and is inherited separately from the rest of the organism’s genome.

The most striking organelle of the eukaryotic cell is the nucleus, which contains all the genetic material of the cell. Under typical microscopic stains, the nucleus appears as a large, dark (or false-colored blue) central sphere in a light (or false-colored pink) background of cytoplasm. The nucleus is where gene expression is controlled and where DNA replication occurs. The initial transcription of genomic DNA into RNA also occurs in the nucleus, but that messenger RNA (mRNA) is then transported to another organelle in the cytoplasm, called a ribosome, for translation into protein. Unlike in bacteria, the DNA that codes for protein in eukaryotes can be interrupted by noncoding regions of DNA called introns. Special proteins remove the noncoding regions from the mRNA before it is translated into protein at the ribosome. The parts of the gene that are not spliced out and that continue to be translated into proteins are called exons. One way to remember which is which is to think that exons are expressed, and introns interrupt. The functional role that introns play is not entirely clear. At least one function is to allow a single gene to code for multiple related proteins through the use of alternative splicing, that is, using different subsets of exons from a single gene to code for entire families of related proteins.
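Splicing and alternative splicing can be pictured as simple string operations. The Python sketch below is a toy model under stated assumptions: the transcript is invented, exons are written in uppercase and introns in lowercase purely for readability, and the names `build_transcript` and `splice` are hypothetical helpers, not a description of the cellular machinery.

```python
# An invented pre-mRNA laid out as alternating exon and intron segments.
SEGMENTS = [
    ("exon", "AUGGCC"),
    ("intron", "guaaguuag"),
    ("exon", "AAAUUU"),
    ("intron", "gucccag"),
    ("exon", "GGGUAA"),
]


def build_transcript(segments):
    """Join the segments and record the (start, end) span of each exon."""
    pre_mrna = ""
    exons = []
    for kind, sequence in segments:
        start = len(pre_mrna)
        pre_mrna += sequence
        if kind == "exon":
            exons.append((start, len(pre_mrna)))
    return pre_mrna, exons


def splice(pre_mrna, exons, keep=None):
    """Concatenate the chosen exons in order; by default keep all of them."""
    if keep is None:
        keep = range(len(exons))
    return "".join(pre_mrna[exons[i][0]:exons[i][1]] for i in keep)


if __name__ == "__main__":
    pre_mrna, exons = build_transcript(SEGMENTS)
    print(splice(pre_mrna, exons))               # all three exons
    print(splice(pre_mrna, exons, keep=[0, 2]))  # alternative: skip exon 2
```

Keeping different subsets of exons yields different mature messages from the same gene, which is the essence of alternative splicing.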

There are many other differences between prokaryotes and the eukaryotic cell. Another important difference is the mechanism by which the expression of genes is regulated. In prokaryotes, functionally related genes are adjacent to each other in the genome, and a single promoter can control the whole group of genes. Regulation of the expression of eukaryotic genes is much more complex.

Remarkably enough, the biochemistry of nearly all eukaryotic organisms is quite similar. The mechanisms of metabolism, protein coding, gene regulation, genome replication, and so on are quite similar in humans and, say, earthworms.

Multicellular Organisms

Every organism visible to the naked eye contains many cells and is called multicellular. All multicellular organisms are eukaryotes. What differentiates a multicellular organism from a colony of single-celled organisms is that a multicellular organism is composed of cells of many different types, each of which specializes to accomplish a particular task. Cellular specialization is the hallmark of multicellularity.

All the cells in a multicellular organism are descendants of a single fertilized egg cell and, therefore, have exactly the same DNA. However, the cells differentiate from one another into lineages. Most cells become committed once they are differentiated; that is, they and their descendants cannot change to become any other cell type. Differentiated cells specialize in particular functions; for example, muscle cells contract, nerve cells process signals, and so on. Cellular specialization is a clear example of the importance of differential gene expression. Different cell types in an organism all have exactly the same genome; it is differences among the expression levels of their genes that make a muscle cell different from a nerve cell.

The most basic division among cell types is between germline cells (that is, reproductive cells such as eggs and sperm) and somatic cells. Although the somatic cells sometimes divide, none of their genomes will make it directly into another generation of the entire organism. Any variation that arises in a somatic cell will be lost forever, although variations that arise in germline cells will be passed along to the next generation. From the somatic cell’s perspective, this is a remarkable loss of function. The grand deal that made cellular specialization possible is a fascinating evolutionary story that also likely involves the origin of sexual reproduction. The breakdown of this deal, when somatic cells begin to reproduce without regard to their being part of an organism, is what we call cancer.

Most multicellular organisms reproduce sexually, although some can also reproduce by budding. Sexual reproduction involves a different mechanism than mitosis, the process by which other cells divide. This special process, called meiosis, is the mechanism by which two parental genomes are transformed into a single offspring’s genome, and it underlies the Mendelian nature of inheritance.

Reproduction is not the only place that cells in multicellular organisms have to cooperate. Nearly every organismal function depends on tight coordination among cells, and an elaborate mechanism for sending and receiving signals has developed in response to this need. This mechanism involves both the release and reception of signaling molecules and the creation of an appropriate response to signals within a cell. Cell membranes are studded with an enormous variety of molecules called receptors that receive these signals. Receptors respond only to very specific molecular signals; the molecule that a particular receptor responds to is called its ligand. The process set into motion by the binding of a ligand to its receptor is called signal transduction. This process involves interacting cascades of reactions in which proteins called secondary messengers chemically modify one another, integrating multiple incoming signals over time and ultimately activating various transcription factors, causing changes in gene expression within the cell.

Signal transduction plays a key role in the function of many pharmaceuticals. Because the function of cell membranes is largely to keep undesired substances (such as drugs) out of the cell, it is much easier to find a chemical that interacts with a receptor than it is to find one that can get inside a cell. Receptors can trigger an enormous number of responses (even cell suicide, or apoptosis), and hence, drugs that bind to them can mediate effects such as reducing blood pressure, suppressing immune responses, or even alleviating depression.

Web Sites to Visit

Larry Hunter’s Web Site
compbio.uchsc.edu/hunter/
My web site has pointers to many useful teaching and learning resources.

Human Genome Project (DoE) Education
www.ornl.gov/hgmis/education/education.html

National Human Genome Research Institute (NIH) Education Pages
www.genome.gov/page.cfm?pageID=10000002

The National Library of Medicine (NIH)
www.nlm.nih.gov

NCBI
www.ncbi.nlm.nih.gov
NCBI provides GenBank and many other databases.

Molecular Biology Textbooks
www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books

AcademicInfo.net Biology Research Pages
www.academicinfo.net/biology.html

AcademicInfo.net Biology Education Pages
www.academicinfo.net/biologyed.html

All Species Foundation
www.all-species.org/
The All Species Foundation is attempting to catalog all the species on earth.

The University of California Museum of Paleontology History of Life
www.ucmp.berkeley.edu/historyoflife/histoflife.html

The Hooper Virtual Paleontological Museum
hannover.park.org/Canada/Museum/lobby.html

Mass Extinctions
hannover.park.org/Canada/Museum/extinction/homepg.html
This museum has a very nice site on mass extinctions.

Anthropogenic Mass Extinction
www.well.com/user/davidu/extinction.html
This is an excellent web site collecting information about the possible current anthropogenic mass extinction event.

The Tree of Life Web Project
tolweb.org/tree/phylogeny.html
This is a good taxonomy web site.

AccessExcellence
www.accessexcellence.org/BF/bf02/lipps/
The origin of multicellular organisms is well described by Jere Lipps on this site, which also has great pictures.

Protein Databank
www.rcsb.org/pdb/

Protein Databank Molecule of the Month
www.rcsb.org/pdb/molecules/molecule_list.html

Protein Databank Ribosome
www.rcsb.org/pdb/molecules/pdb10_1.html

RNA Structure
www.rnabase.org
RNAbase is a database of RNA structures.

RNA Structure Primer
www.rnabase.org/primer
RNAbase has an excellent primer on RNA structure.

Visible Human Anatomy
www.uchsc.edu/sm/chs/open.html

Tissues, Organs, and Development

Not only are cells in multicellular organisms specialized, they are precisely arrayed in particular spatial patterns. Tissues are collections of cells of a particular type in a particular spatial distribution. The cells in a tissue all arise from the same lineage. Tissues that work together to execute a particular biological function are called organs, such as a kidney or a leaf. Organs, in turn, are grouped into organ systems.

The four main human tissue types are (1) epithelium, (2) connective tissue, (3) muscle tissue, and (4) nerve tissue. Epithelium is the tissue covering all body surfaces and the lining of internal organs and glands. There are three subclasses of simple epithelium and then various combinations. Connective tissue is distinguished not only by the type of cells it contains but also by the extracellular material around the cells. Different types of connective tissue include supportive tissues such as bone and cartilage as well as fat, blood, and lymphatic (or immune system) tissue. There are three distinct kinds of muscle tissue: First, striated muscle is the kind under voluntary control, such as that in arm or leg muscles. Second, smooth muscle is found in places such as blood vessel walls and the intestines and is generally not under voluntary control. Third, the heart is made of a special kind of cardiac muscle. Nervous tissue is specialized for sending and receiving signals and makes up the constituents of the nerves, brain, and sense organs.

Specifying where one organ (or even organ system) ends and another begins is largely a matter of definition. Consider just a few of the dozen or so human organ systems. The circulatory system includes the heart, blood, and blood vessels. The digestive system includes the mouth, teeth, tongue, esophagus, stomach, and intestines. Some would also include the glands of digestion (the pancreas, the gallbladder, and the liver) in the digestive system, but others would include them in a glandular system.

The transformation from fertilized egg to mature adult is called the process of development. The study of development pursues three main questions: (1) differentiation, or how a single cell gives rise to all the many cell types found in adult organisms; (2) morphogenesis, or how tissues are organized spatially to make organs and how organs are arranged into a body plan; and (3) growth, or how proliferation (and cell death) is regulated and how cells know when to divide and when not to. Developmental mechanisms, although still not completely understood, evolve very slowly and tend to be widely shared across large numbers of organisms.

The structure of the tissues and organs of the body is the subject of the study of anatomy, and their function is the study of physiology, both of which are beyond the scope of this introduction.

Instrumentation and Experimental Systems

Biology involves not only its subject matter but also the methods used to study it. A great deal of the excitement in contemporary biology comes from the rapid pace of innovation in biological instrumentation, which for the first time is producing data about living systems that is both molecular in detail and genomewide in scope. Such instrumentation is generally referred to as high throughput, meaning that whatever is being assayed can be measured quickly enough to look at a large proportion of the biomolecules in an organism.

The first high-throughput instrumentation, and in some ways the most fundamental, provides the ability to determine the specific sequence of a molecule of DNA. Longer sequences are harder to obtain, so although there are thousands of viral genomes that have been determined, and hundreds of bacterial ones, there are fewer than two dozen completely sequenced eukaryotic genomes.

However, these completely sequenced organisms are not chosen at random. They represent organisms that are either of significant economic importance themselves (for example, rice) or are particularly amenable to experimentation and explanation. Such creatures are called model organisms and include mice, the fruit fly Drosophila melanogaster, a simple multicellular worm called Caenorhabditis elegans, and the single-celled eukaryote brewer’s yeast. Although the complete genomic sequence is available for only a tiny fraction of the world’s organisms, the sequences of at least some genes of particular interest (for example, spider silk) have been determined for tens of thousands of organisms.

DNA sequencing not only provides information about the genome but also can be used to look at variations, or polymorphisms, among different individuals of the same species. Assaying a particular set of polymorphisms among different individuals, whether by DNA sequencing or other means, is called genotyping. The smallest possible difference is a single nucleotide polymorphism (SNP), and technology for high-throughput SNP genotyping is beginning to be used in molecular genetics laboratories.

Gene sequences generally contain all the information necessary to specify the three-dimensional structure, hence the chemical function, of a protein; however, there is as yet no practical method for mapping from an amino acid sequence to its folded structure. Instead, instrumentation involving X-ray crystallography or nuclear magnetic resonance can be used to empirically determine a protein’s structure. Although still difficult, protein structure determination is rapidly increasing in speed. Structural genomics is the name for the effort now under way to determine a representative set of three-dimensional structures, including all medically important human proteins and proteins from important pathogens and model organisms.

Recall that the differences between various cell types and tissues in an organism are not determined directly by genotype (which is the same for all its cells) but instead by differences in expression levels among the various genes over time. Once the sequence of the genes of an organism is known, it is possible to fabricate a device, called an expression array or gene chip, that takes snapshots of the expression level of all genes simultaneously.

Even at a particular instant, snapshots of gene expression do not tell the whole story of what is going on in a cell. Recall also that signal transduction, which is very important for the invention of new drugs, involves the modification of existing proteins rather than the synthesis of new ones. Mass spectrometry can be used to assay proteins and modifications to them. Although the technology is still evolving, this approach is also becoming high throughput and forms the basis for proteomics, or the study of the complete set of proteins in a living system.

Despite these spectacular innovations, biology is still quite limited in its measurement abilities (and their associated costs). For example, although gene expression can vary quite significantly among individual cells in a tissue, current expression array technology requires RNA from a large number of cells combined to get a signal. Innovation in molecular instrumentation is proceeding very rapidly, and it is reasonable to expect that new methods will provide a great deal more valuable information, and the insights into the functions of living things that come from them, in coming years.

Conclusions

Learning molecular biology is a process that goes in a spiral; one repeatedly studies the same aspects of living systems but each time in more detail and with new perspective. This introduction is intended to bring the reader around the circuit once, providing only the coarsest generalities and few examples or details. Nevertheless, the careful reader should now be able to put into context many of the biological problems addressed by the computational methods described in this issue.

For those whose appetite has been whetted, an abundance of excellent textbooks and references are available as well, some online (see sidebar). The two most widely used college-level textbooks are Molecular Cell Biology (W. H. Freeman, 2003), by Harvey Lodish et al., and Molecular Biology of the Cell (Garland, 2002), by Bruce Alberts et al. For a somewhat more gentle introduction, I recommend either the wonderfully illustrated The Way Life Works (Three Rivers Press, 1998), by Mahlon Hoagland and Bert Dodson, or Molecular Biology Made Simple and Fun (Cache River, 2000), by David Clark and Lonnie Russell.

Notes

1. Mechanisms of disease.

2. There is also a role for the maternal proteins in the fertilized egg cell, but this is relatively minor compared to the contribution of the DNA.

3. Chemical structures when the atoms are bonded together to form a loop.

4. Water hating, such as oil.

5. One end is called 5′ (pronounced five-prime), and the other end is called 3′ (three prime).

Lawrence Hunter is one of the founders of bioinformatics. After graduating with a Ph.D. in computer science from Yale University in 1989, he spent more than a decade as a research scientist at the National Institutes of Health before joining the faculty of the University of Colorado School of Medicine, where he now directs the Center for Computational Pharmacology. He is writing a book-length introduction to molecular biology to be published by The MIT Press in 2004. His e-mail address is [email protected].