Reverse engineering the genotype– phenotype map with

Rockman layout.indd NS OLD.inddMore than a decade into the genomic era, it remains easier to collect genomic data sets than to understand them. The research community has obtained vast quantities of data on genes, transcripts, proteins, metabolites and so on, but has discerned only faint outlines of the networks that connect these factors. Biologists are justifiably enthusiastic about the ability to describe networks of biological molecules that are co-expressed or co-localized, but a central goal of contemporary biology is to connect these observable patterns to form models that predict how biological networks operate as systems. The networks that matter in this context are networks of causal relationships, which can be uncovered by using experiments in which biological systems are perturbed1–3. The translation of genotype into phenotype depends solely on these causal relationships; many of the relationships are shaped by the co-expression of genes and physical interactions between cellular components, but many others are determined by intricate networks of cause and effect that are mediated by an organism’s physiology, behaviour, and interactions with the environment4 (Box 1). Inferring causal networks from observations is often called reverse engineering, because the goal is not merely to identify components that are functionally related or situated near to one another but to understand how the system works as an integrated whole.
The classic method for reverse engineering a system is to poke a component with a stick and then to characterize the effect of the perturbation. An alternative is to poke many components simultaneously and at random, repeating the experiment over many random sets of components. Ever since R. A. Fisher put forward his ideas5 in the 1920s, statisticians have recognized such randomized multifactorial perturbation as the ideal experimental design for uncovering causation. Con veniently, the genetic variation that occurs naturally within a population is a source of multifactorial perturbation6,7. The use of natural genetic variation to probe the causal network that links genotype and phenotype has grown recently as large data sets have been generated for many experimental model species, crops and humans8–10. In this Review, I discuss recent progress in the application of natural genetic variation to reverse engineer the ‘genotype–phenotype map’. After introducing the basic experimental approach, I describe its advantages over traditional genetic screens and show how the resultant data allow tentative inferences to be made
about causation. Finally, I discuss the steps that are being taken to gain a mechanistic understanding of the network that connects genotype to phenotype, and I point out potential obstacles to this process, as well as potential shortcuts.
Quantitative genetics of transcript abundance Genetically characterized populations are the central tool for uncovering the genetic variants that underlie phenotypic variation. A common approach is to cross two inbred lines, each homozygous at every locus, to yield a hybrid that is heterozygous at every locus that differs between the strains. In a typical cross, thousands to millions of genetic loci differ. The ordinary process of meiotic recombination rearranges these polymorphisms within the hybrid germ line, and segments of each of the initial genomes are passed on randomly to the progeny of the hybrids. By tracking the genomic segments with molecular markers, the regions of the genome that contain genetic variants that affect phenotypes (known as quantitative trait loci, QTLs) can be identified11 (Fig. 1a–d).
An important recent advance in connecting the links between genotype and phenotype has been to measure the ‘phenotypic states’ of the links, most notably the abundance of the transcripts corresponding to each gene of interest8–10,12. Quantitative genetic analysis of genome-wide transcript abundance is sometimes called genetical genomics6 or expression QTL mapping9, and the results from this type of analysis — the correlations between genes and transcript phenotypes — can be represented by plotting the physical position in the genome of the gene corresponding to each transcript against the position of the loci associated with variation in transcript abundance (Fig. 1d). In a properly controlled cross, an association between genotype and phenotype implicates genetic variation as the cause of the phenotypic variation; as is the case for all claims based on empirical data, confidence in such a causal inference is defined statistically. The genetic analysis of genome-wide transcript abundance poses distinct technical, computational and analytical challenges that necessitate careful avoidance of potential artefacts13,14 and that drive innovation in statistical genetics methods15–19.
Analyses typically show many linkages between the abundances of transcripts and the regions in the genomes where the structural genes for those transcripts reside. Such genes contain QTLs for their
Reverse engineering the genotype– phenotype map with natural genetic variation Matthew V. Rockman1
The genetic variation that occurs naturally in a population is a powerful resource for studying how genotype affects phenotype. Each allele is a perturbation of the biological system, and genetic crosses, through the processes of recombination and segregation, randomize the distribution of these alleles among the progeny of a cross. The randomized genetic perturbations affect traits directly and indirectly, and the similarities and differences between traits in their responses to common perturbations allow inferences about whether variation in a trait is a cause of a phenotype (such as disease) or whether the trait variation is, instead, an effect of that phenotype. It is then possible to use this information about causes and effects to build models of probabilistic ‘causal networks’. These networks are beginning to define the outlines of the ‘genotype–phenotype map’.
1Center for Genomics and Systems Biology, Department of Biology, New York University, 100 Washington Square East, New York, New York 10003, USA.
738
INSIGHT REVIEW NATURE|Vol 456|11 December 2008|doi:10.1038/nature07633
own transcript abundance. Consequently, the data points align on the diagonal in Fig. 1d. Most of these local linkages will be attributable to cis-acting regulatory polymorphisms, although there will be some contribution from polymorphisms that affect the transcript abundance by acting in trans10,20,21.
Data points aligning in vertical bands in Fig. 1d indicate linkage hotspots: that is, regions of the genome at which variation alters the abundance of a large number of transcripts. The loci responsible for these large phenotypic effects might be highly influential regulatory loci22, or they might be loci at which variation has a marked but non- specific effect on transcript abundances. Such pleiotropic alleles might alter cellular homeostasis in such a way as to shift the steady state for a large number of traits, without this shift being a regulatory effect10,23.
The quantitative genetics approach described earlier, using natural genetic variation as a source of perturbations, has striking advantages over the classic (one gene at a time) approach7. First, the quantitative genetics approach involves massive hidden replication. The effect of each of the alleles present in the initial cross is measured repeatedly because each allele is present in a large number of the phenotyped progeny. A small phenotyping panel of 100 individuals represents on average a 50-fold replication in studying the effect of every allele; similar amounts of replication are not feasible for a one-at-a-time approach.
Second, the presence of simultaneous variation at multiple loci allows the interactions between perturbations (that is, genetic variants) to be uncovered. In the context of gene-expression genetics, such interactions are likely to be common24,25, and observations have shown that the inheritance of many transcript-abundance traits involves interactions between genes16,26,27. A prime example of this gene-interaction phenomenon is genetic redundancy; only by varying the redundant loci simultaneously can their effects be detected.
Last, the simultaneous perturbation of a large number of factors results in the exploration of a larger ‘space’ of variation than when carrying out single perturbations. Genetically complex traits, which are shaped by variation at many loci, often show transgressive segregation: that is, the random assignment of alleles to the progeny of hybrids results in individuals with unusual collections of alleles, which yield extreme phenotypes. Transgressive segregation characterizes the majority of the transcript- abundance traits in crosses in which its frequency has been examined26,28. The large space of phenotypes covered by genetically segregating populations increases the ability to detect relationships between traits. Tran- script-abundance data from such populations, for example, are unusually successful at predicting functional relationships between genes29–33.
Causal ordering A QTL can affect some traits directly and can affect others indirectly through the effects of intermediate traits. Of particular interest is whether variation in an organismal phenotype, such as a disease state or a behaviour, is an effect of transcript-abundance variation or a cause34. Only if the transcript-abundance trait is a cause (not an effect) can it be a target for perturbations that shape variation in the organismal trait, whether in the laboratory, the clinic or an evolving population.
Under a broad set of assumptions, causality shapes correlations in a recognizable way, and its signature is conditional independence. This can be shown by considering three causally related traits: A, B and C. Under standard Markov assumptions, if variation in A causes variation in B, which in turn causes variation in C (this can be written as A → B → C), then when the distribution of B is known, A provides no additional information about C. A and C are independent conditional on B. This statement of conditional independence would not hold if the causal ordering were B → C → A, for example. Conditional independence is by itself insufficient to order causal links uniquely: A ← B → C yields the same conditional independence statement as A → B → C. Nevertheless, conditional independence is a powerful tool for distinguishing direct links between traits from indirect links, even when causal ordering is not possible1,35,36.
The key advantage of studying genetic perturbations is that many causal orderings are prohibited by the central dogma that genotypic
variation can cause phenotypic variation, but, at least within an individual, phenotype does not feed back to affect genotype. Therefore, in a properly controlled cross, genetic perturbations are causally upstream of phenotypes and provide a terra firma into which causal networks can be rooted34,37–40 (Box 2).
There are several methods for causally ordering pairs of phenotypes measured in segregating populations, including approaches that simultaneously map QTLs and fit causal models18, approaches that apply for- mal statistical tests to identify direct causal links39, and approaches that fit various causal models to triplets that comprise two traits and a QTL and then compare the fit of the models using information-theory criteria34.
Analysis of the correlations between multiple traits that share an underlying QTL can also help to identify the causal gene within a QTL interval33,37,41–44 — the major challenge in quantitative genetics today. Many such analyses incorporate sources of information in addition to the correlations, including data on the binding sites of transcription factors,
The concepts of causality and networks are controversial. Many biologists are sceptical about inferring causality from statistics and about how general and useful network models might be, so it is valuable to consider how causality and networks can be interpreted in the context of the relationship between genotype and phenotype.
When carrying out a biological experiment, most people are content with the everyday theory of causation that dictates that one fact precedes a second and alters its probability (setting aside questions about the ontological status of probabilities). Empirical claims about links between causes and effects rely on assumptions (often implicit) and on statistical measures of confidence. Even when testing a simple single-gene perturbation, such as in a gene-knockout organism, it is assumed that inductive reasoning will lead to knowledge, that genotypes are fixed and that they causally precede phenotypes, and that all variables are fully controlled (for example, by comparing the experimental organisms with wild-type full siblings, by blinding observers to differences in treatment, and by carrying out randomized replications of the entire experiment). A researcher’s confidence that the wild-type organism and the mutant organism show real differences is influenced by statistical tests (for example, the P value from a t-test), which are themselves typically laden with assumptions.
In inferring probabilistic causal networks, the assumptions are more numerous, but the conceptual framework is the same. At the end of an analysis, the outcome is a claim about cause and effect, and this claim is accepted to an extent that is defined by the researcher’s comfort with the assumptions and the statistics. For genetics experiments, the limits of comfort for most researchers lie not far beyond the single- gene perturbation experiment, and making inferences about a causal network is typically seen as a technique for nominating candidate genes for follow-up experiments.
Whether networks exist is a popular topic at biology department happy hours, but it is not necessary to subscribe to the reality of a Platonic Network, an ideal form independent of the material world, in order to embrace the idea that there is a many-to-many relationship between causes and effects in biology. The more pressing question is which components are needed to represent such a network in a predictive model: molecules, interactions, dynamics, all of these, or more? Conveniently, the set of variables in a causal network is entirely circumscribed by the set of things that vary. A molecule or an event can be required for a biological process, but if it does not vary, then it cannot be a cause. In that sense, causal networks in genetics are analogous to the geneticist’s concept of heritability, which describes not the dependence of a trait on inherited genes but the proportion of the trait’s variation that can be explained by genetic variation in the observed sample. A causal network that is inferred from a genetically segregating population will therefore depend on the genetic variation that is present in the population and on the distribution of variation in the environment across the sampled individuals. Within that framework, a causal network constitutes a predictive model of the consequences of perturbing the represented variables.
Box 1 | Do causal networks exist?
739
NATURE|Vol 456|11 December 2008 REVIEW INSIGHT
the interactions between proteins, and the presence of polymorphisms in the sequences of each gene in the QTL. As a starting point, a gene that is found at the same location as a QTL for its own abundance is a strong candidate for the causal gene underlying variation in other traits that link to the same location, with the gene’s transcript abundance as a candidate causal trait29.
The use of phenotypic correlations to identify causal transcripts is especially promising in the context of genome-wide association studies. These studies have recently uncovered a wealth of high-confidence, rep- licated associations between genetic variants and diseases in humans (see page 728), but the disease-associated variants are often in non-co ding regions with unknown function45. The mechanisms that link genotype and disease in these cases can be identified by taking advantage of the structure of the correlations among transcript-abundance traits and disease states in human populations32,46–48. In population-based association mapping, however, correlations between genotype and phenotype can arise from external causes that are common to the correlated variables, for example from the stratification of populations by age or ethnicity. Consequently, the causal ‘anchor’ provided by genotype in these cases is less secure than in the experimental setting of inbred line crosses. But studies in animal models can be used to corroborate findings, providing reassurance31,49.
Causal networks To gain a predictive, systems-level understanding of biological causation, researchers need to integrate the entire ensemble of genetic variants and phenotypic traits (and not just to causally order trait pairs). Several approaches aim at this more general goal, and Bayesian networks provide the most popular framework1. A Bayesian network is a graph of random variables, each representing a phenotype in this case, that are connected by directed edges. A set of probability distributions describes the state of each variable conditional on the variables with edges leading to it. The graph and probability distributions define a conditional probability statement. One problem with using Bayesian network graphs is that several directed graphs can be described by a
Recombinant inbred lines
at a QTL
to a QTL
*
* *
Physical position of gene 3
Positions of QTLs that explain variation in gene 3 transcript abundance
Marker position along the genome
1 2
* * * * * * *A B C D E F G
1 2 3 4 5 6 7 8 9 10Transcript- abundance phenotype
QTL genotype
a
b
c
d
e
f
Significance threshold
Figure 1 | From genetic randomization to causal network. A genetically randomized population, such as a panel of recombinant inbred lines whose chromosomes carry random segments of genome from two progenitor strains (depicted in orange and blue) (a), is a starting point for linkage analysis of a phenotype. In this case, the phenotype is the abundance of transcripts corresponding to a gene denoted as gene 3. The abundance of gene 3 transcripts varies between the recombinant lines (b, left). Each point along the genome is tested to see whether it affects the abundance of gene 3 transcripts (b, centre and right), and statistical evidence is uncovered for the linkage of gene 3 with two regions (indicated by asterisks) (c). These regions are called QTLs. If a similar experiment is carried out for many transcript-abundance phenotypes (not just for gene 3, but for genes 1–10), the positions of the QTLs that affect transcript abundance (asterisks) can be plotted against the physical positions of the gene corresponding to each transcript (green stars) (d). In such a plot, the data (red dots) along the diagonal line represent local linkages, typically due to cis-acting regulatory polymorphisms. Vertical alignments in the plot indicate linkage hotspots. The plot depicted implies a high- level causal network (shown in e), in which QTL variation is the cause of variation in transcript-abundance phenotype. Transcript-abundance QTLs can co-localize with QTLs for organismal phenotypes such as a disease (not shown); for illustrative purposes, disease is shown linked to QTL G. A goal of the reverse engineering of causal networks is to include phenotypes as variables, for example to determine whether the transcript abundances that are affected by QTL G are causes or effects of the disease. Although the traits are densely connected by correlations — as is evident from the hypothetical correlation network that is depicted (f), which connects all traits that share perturbations — a causal network (f) reveals that QTL G acts directly on gene 9, the transcript abundance of which affects genes 1, 2, 4 and 5. The transcript abundance of gene 2 is a cause of disease, which in turn alters the transcript abundances of genes 4, 7 and 10. Many transcripts are correlated with disease, but only perturbations of genes 2 and 9 will affect disease outcome.
740
NATURE|Vol 456|11 December 2008INSIGHT REVIEW
single conditional probability statement (as discussed earlier), so observations of the random variables (the transcript abundances in this case) cannot uniquely identify the directed network that underlies the graph. Moreover, a second problem is that Bayesian network graphs are acyclic and therefore cannot model feedback regulation; the consequences of this limitation for the utility of Bayesian network are unclear1,31,37. A third problem is that the space of possible network graphs is large, making causal-network inference a computationally intractable problem1. These difficulties notwithstanding, transcript-abundance measurements from genetically segregating populations are uniquely suited to uncovering directed Bayesian networks for two main reasons29,37. First, a trait that is caused by another trait should share an underlying genetic perturbation: a QTL. This simple filter excludes a huge proportion of the space of possible networks, making the problem tractable. Second, genetic perturbations anchor causal networks (as discussed earlier), giving direction to the edges. Although large-scale causal-network inference remains challenging, the incorporation of genetic data clearly improves the quality of the predictions over those derived solely from trait correlations50.
In parallel with Bayesian network models, structural equation models have been applied to transcript-abundance data from segregating popu- lations38,51. These models involve systems of linear equations organized into a network structure; a linear model is fitted with variables that simultaneously function as predictors and responses. Although structural equations, unlike Bayesian networks, have the advantage of allowing feedback cycles to be modelled, they require the standard assumptions of linear modelling. Therefore, when nonlinear causal dynamics underlie transcript abundances, problems can arise. Bayesian networks typically deal with nonlinearity incidentally, by classifying all of the data into simple discrete categories (for example, upregulated, downregulated and unchanged), a simplification that has its own drawbacks50.
An alternative approach is to generate a simple network from pairwise trait correlations and then to trim this network by testing for conditional dependence relationships35. The resultant undirected graph can then be directed by anchoring the edges in QTLs40. There are clear computational advantages to starting with a network that is derived from pairwise correlations. Because correlations between genes typically show modu larity — with clusters of highly correlated genes being largely uncorrelated with other such clusters — pairwise analyses can break intractably large problems into problems that focus on individual modules29,52,53.
Two recent studies on disease phenotypes in humans and mice used this shortcut of partitioning the transcript-abundance data into modules of correlated traits31,32. Breaking the problem down further, the authors compared causal orderings for pairs of transcript abundances and disease phenotypes, to see whether each transcript could be placed causally upstream of the disease state. A single module, evident in both mouse and human data, was significantly enriched for putatively causal traits; subsequent experimental manipulations corroborated these inferences. Although this is far from a complete reverse engineering of the genotype–phenotype map, these empirical successes (reducing genomic data to the two-trait ordering problem) point to a coming age in which prediction will be a common tool.
Quantitative genetics is evolutionary genetics The central dogma that genes are causes of phenotypes within an individual aids in the anchoring of directed networks. But among individuals, phenotypes feed back by selection to shape genes. Natural variation therefore samples a biased subset of possible genetic perturbations, a subset that is enriched for those variants that are not strongly deleterious. Under the classic infinitesimal model of the genotype–phenotype map, variation derives from mutations of small effect with limited pleiotropy 54. If this model holds, the modularity of networks inferred from natural variation29,31,32,37 might be an epiphenomenon of natural genetic perturbations, the effects of which are less systemic than those of random mutations.
The effect of selection is evident in the numbers and types of QTL detected in typical studies. James Ronald and Joshua Akey found that the proportion of genes showing local QTLs in a yeast cross is smaller than expected under neutrality, implying that negative selection keeps certain perturbations at low frequency55. Similarly, genes that are crucial regu lators of essential processes are likely to be under-represented among genetically variable genes. Genes that encode transcription factors, for example, are clearly candidates when looking for the genes involved in varying gene expression, but these genes are largely absent from expression QTLs18,56. Nevertheless, links between transcription factors and target genes can be detected by causal inference approaches, even when the transcription- factor locus contains no genetic variation. The only requirement is that some phenotypic measure of the transcription factor’s activity (such as the abundance of the corresponding transcript) is causally intermediate between the genotype and the target gene’s phenotype31.
The basis for causal inference can be shown graphically. Consider a population of haploid individuals with a single causal locus (G) that has two alleles. The allelic state at this locus causes variation in the abundance of the corresponding transcript (T1), and additional sources of variation (genetic, environmental and stochastic) also influence T1. Data can be simulated with the allelic effect modelled as β1 and the additional variation modelled as normally distributed noise, ε. Thus, T1 = β1G + ε, where G is in a indicator variable for genotype. Variation in the abundance of T1 causes variation in a downstream trait, T2, which is also affected by other sources of variation, so T2 = β2T1 + γ, where γ is an additional noise term. The causal ordering is shown as a scheme in panel a of the figure, and simulated data are plotted in panel b of the figure.
Data were simulated to represent 300 individuals. Each data point is coloured according to genotype (blue for one allele of G and red for the
other), and the mean values for each trait are indicated by a coloured line for each genotype. (For this simulation, genotypes were assigned indicator variables: red was assigned –1, and blue was assigned 1. The parameters used were β1 = 0.5 and ε ~ N(0, 1), yielding a zero-mean trait T1. For T2, β2 = 1 and γ ~ N(0, 1), so T2 is simply T1 with additional noise.)
The causal links mean that the traits, T1 and T2, are correlated with one another and that both are correlated with genotype (figure, b). Nevertheless, the relationship between T2 and G is entirely mediated by T1. T2 conditional on T1 is independent of genotype; the mean phenotype for each genotype is the same (figure, c). Conversely, the distribution of T1 conditional on T2 remains dependent on genotype (figure, d). It is the noise component of T1 variation, ε, propagated through T2 that makes this mode of analysis possible, and it is the causal anchor of the genotype that gives it direction.
Box 2 | Causal ordering yields conditional independence
b c d
G T1 T2
–4
–2
0
2
4
T1
T2
–2
–1
0
1
2
T1
T2 |
T1
–4
–2
0
2
4
The effect of selection on the filtering of genetic perturbations varies according to the type of experimental population studied. Inbred line crosses often involve genetically divergent lines, chosen to maximize phenotypic or genotypic differences. The alleles that contribute to divergence are likely to differ in their allelic effects from those that contribute to standing variation (the ordinary genetic diversity present within a population)57–59. Crosses between divergent inbred lines are more likely to uncover rare large-effect mutations (that is, linkage hotspots) than are samples of individuals from large populations32,55,60. Such crosses might also be biased towards a subset of perturbations with large effect, not individually but in combinations, as a result of coadaptation61. Pervasive genetic interaction means that causal links between pairs of genes will be poorly modelled, although there has been progress towards solving this problem recently16.
The suite of naturally occurring perturbations is also shaped by population genetics phenomena that are unrelated to the fitness effects of the perturbations themselves. Genetic variants tightly linked to other variants that are the target of selection are evolutionarily coupled to them, yielding a correlation between local recombination rate and levels of genetic vari- ation62,63, and mutagenic recombination can produce the same pattern64. Genes present in regions that undergo recombination at a low rate are less likely to contribute to the pool of genetic perturbations than genes in regions where recombination occurs frequently. If gene location is non- random with respect to recombination rate, then there might be fewer perturbations belonging to particular functional classes61.
Prospects for genetical systems biology Causal network inference faces many difficulties, and its application to gene expression in segregating populations introduces additional challenges. Despite many published reports of empirical successes, it is important to consider the pitfalls that unpublished studies might have encountered. Many of these pitfalls are now well recognized, and there are clear paths around them, on both the experimental front and the analytical front, towards a richer understanding of the genotype– phenotype map.
One of the most problematic assumptions that is made when drawing causal inferences from gene-expression data is that measurement errors are similarly distributed across traits. If a causal trait is poorly measured and the trait it affects is well measured, then the measurements of the ‘effect trait’ might report the true values of the causal trait more accu- rately than measurements made directly on the causal trait itself 34,39,51. In such cases, conditional correlation approaches can yield inverted inferences (Fig. 2). There are good biological reasons to be concerned about this situation. For example, regulatory molecules are often present at low abundances in a cell, but their effects are amplified by the dynamics of gene regulation. Thus, variation in the number of transcripts encoding a low-abundance transcription factor might be the cause of variation
in the number of transcripts encoding a high-abundance structural protein. The difficulty arises in that the high-abun dance transcripts might be easy to measure with great precision, whereas the low- abundance transcripts might be present at the threshold of detection. One possible solution is to return to the early designs of microarray experiments, when technical replicates were routine. Taking multiple independent measurements of transcript abundances from a single biological sample would generate empirical parameters for gene-specific error models. The potential to be misled when making causal inferences underscores the point that a causal inference yields an assumption-laden probability statement, and stronger claims about causality need to be experimentally validated3.
Another concern is that data for transcript-abundance mapping are derived from mixed populations of cells. Consequently, the measurements describe cellular mixtures, the characteristics of which are determined by developmental and cellular demographics31,32,65. This is as true for yeast, which has been studied by measuring unsynchro- nized cultures (which contain cells at different stages of the cell cycle), as it is for animals and plants, in which tissues or whole organisms are assessed. A study of mice that varied genetically in the cell-cycle timing of their haematopoietic stem cells took advantage of the issue of mixed cell populations to identify which genes were expressed differentially in the different cell populations65.
A related issue is that networks inferred from segregating populations are static representations. Without time-series data, there can be no account of dynamics. Inferred causal links correspond to perturbations that alter the steady state. In a system with feedback, which is likely to include almost all biological steady states, the causal links will describe only the causes that ‘win out’ over others in shifting the phenotypic bal- ance31. For practical considerations of whether it is possible to predict how novel perturbations will affect steady states, these low-dimension projections of the network might be adequate; however, for true reverse engineering, more-complete models are needed. Time-series data have proved exceptionally important for solving the general problem of causal-network inference2, and it is clear that integrating such information into studies of genetic variation will improve the knowledge gained from these studies.
Many network-inference methods depend on the inclusion of all causal variables in the models1, but such completeness is not plausible for most biological networks. Some phenotypic causes will be overlooked because they are not measured: for example, unannotated genes, genes present in structural polymorphisms that are absent from reference genomes, and transcripts of small RNAs. In addition, the abundances of metabolites are typically not considered, although much progress has been made in this area recently40,66. Regulatory events for which there are no transcriptional indications (for example, post-translational regulation) will also be missed, although again progress is occurring on
G T1 T2
–4
–2
0
2
4
T1′
T2 ′
–4
–2
0
2
4
T1′
T2 ′ |
–4
–2
0
2
4
T2′T1′
Figure 2 | Measurement error can confuse causal inference. The effect of errors in measurement on causal inference is depicted for the population, parameters and conditions set out in Box 2. In brief, a population of haploid individuals has a single causal locus (G) with two alleles, and the allelic state at this locus causes variation in the abundance of the corresponding transcript (T1), which subsequently affects the abundance of another transcript (T2). a, The measured values of T1 and T2 in this example were simulated as their true values from Box 2 plus normally distributed error, yielding T1 and T2. For T1, the error is normally distributed with a variance of 2, whereas the variance of T2 is tenfold lower. The causal ordering of this scheme is shown. b, T1 and T2 are correlated with one
another and linked to the genotype, which is represented by the colours (blue for one allele of G and red for the other). However, the conditional correlations are now misleading with respect to the true causal network (shown in a). c, T2 remains dependent on genotype after taking T1 into account, which is unexpected given the causal ordering (a) (and given that T2 conditional on T1 does not depend on genotype; Box 2 figure, panel c). d, T1 is nearly independent of genotype after taking T2 into account, which is also unexpected given the causal ordering (a) (and given that T1 has been shown to depend on genotype; Box 2 figure, panel c). In total, the effect of the differences in measurement error is to make T2 a better measure of the true T1 than T1 itself.
742
NATURE|Vol 456|11 December 2008INSIGHT REVIEW
22. Morley, M. et al. Genetic analysis of genome-wide variation in human gene expression. Nature 430, 743–747 (2004).
23. Churchill, G. A. The genetics of gene expression. Mamm. Genome 17, 465 (2006). 24. Omholt, S. W., Plahte, E., Øyehaug, L. & Xiang, K. Gene regulatory networks generating the
phenomena of additivity, dominance and epistasis. Genetics 155, 969–980 (2000). 25. Gjuvsland, A. B., Hayes, B. J., Omholt, S. W. & Carlborg, O. Statistical epistasis is a generic
feature of gene regulatory networks. Genetics 175, 411–420 (2007). 26. Brem, R. B. & Kruglyak, L. The landscape of genetic complexity across 5,700 gene
expression traits in yeast. Proc. Natl Acad. Sci. USA 102, 1572–1577 (2005). 27. Brem, R. B., Storey, J. D., Whittle, J. & Kruglyak, L. Genetic interactions between
polymorphisms that affect gene expression in yeast. Nature 436, 701–703 (2005). 28. West, M. A. et al. Global eQTL mapping reveals the complex genetic architecture of
transcript-level variation in Arabidopsis. Genetics 175, 1441–1450 (2007). 29. Lum, P. Y. et al. Elucidating the murine brain transcriptional network in a segregating mouse
population to identify core functional modules for obesity and diabetes. J. Neurochem. 97 (suppl. 1), 50–62 (2006).
30. Huttenhower, C. et al. Nearest neighbor networks: clustering expression data based on gene neighborhoods. BMC Bioinformatics 8, 250 (2007).
31. Chen, Y. et al. Variations in DNA elucidate molecular networks that cause disease. Nature 452, 429–435 (2008).
32. Emilsson, V. et al. Genetics of gene expression and its effect on disease. Nature 452, 423–428 (2008). References 31 and 32 integrated association mapping in human populations and linkage mapping in mice to identify suites of functionally related genes that are causally implicated in disease.
33. Zhu, J. et al. Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nature Genet. 40, 854–861 (2008).
34. Schadt, E. E. et al. An integrative genomics approach to infer causal associations between gene expression and disease. Nature Genet. 37, 710–717 (2005).
35. de la Fuente, A., Bing, N., Hoeschele, I. & Mendes, P. Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 20, 3565–3574 (2004).
36. Magwene, P. M. & Kim, J. Estimating genomic coexpression networks using first-order conditional independence. Genome Biol. 5, R100 (2004).
37. Zhu, J. et al. An integrative genomics approach to the reconstruction of gene networks in segregating populations. Cytogenet. Genome Res. 105, 363–374 (2004). This paper was the first to integrate expression QTL data and phenotypic correlation data into causal modelling, as well as to describe the crucial role of genetic perturbations in anchoring causal links in the Bayesian network context.
38. Li, R. et al. Structural model analysis of multiple quantitative traits. PLoS Genet. 2, e114 (2006).
39. Chen, L. S., Emmert-Streib, F. & Storey, J. D. Harnessing naturally randomized transcription to infer regulatory relationships among genes. Genome Biol. 8, R219 (2007). This paper details a conservative analysis pipeline for uncovering high-confidence causal links with a well-defined false-discovery rate.
40. Ferrara, C. T. et al. Genetic networks of liver metabolism revealed by integration of metabolic and transcriptional profiling. PLoS Genet. 4, e1000034 (2008).
41. Bing, N. & Hoeschele, I. Genetical genomics analysis of a yeast segregant population for transcription network inference. Genetics 170, 533–542 (2005).
42. Li, H. et al. Inferring gene transcriptional modulatory relations: a genetical genomics approach. Hum. Mol. Genet. 14, 1119–1125 (2005).
43. Tu, Z. et al. An integrative approach for causal gene identification and gene regulatory pathway inference. Bioinformatics 22, e489–e496 (2006).
44. Suthram, S. et al. eQED: an efficient method for interpreting eQTL associations using protein networks. Mol. Syst. Biol. 4, 162 (2008).
45. McCarthy, M. I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Rev. Genet. 9, 356–369 (2008).
46. Dixon, A. L. et al. A genome-wide association study of global gene expression. Nature Genet. 39, 1202–1207 (2007).
47. Goring, H. H. et al. Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes. Nature Genet. 39, 1208–1216 (2007).
48. Stranger, B. E. et al. Population genomics of human gene expression. Nature Genet. 39, 1217–1224 (2007).
49. Schadt, E. E. et al. Mapping the genetic architecture of gene expression in human liver. PLoS Biol. 6, e107 (2008).
50. Zhu, J. et al. Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations. PLoS Comput. Biol. 3, e69 (2007).
51. Liu, B., de la Fuente, A. & Hoeschele, I. Gene network inference via structural equation modeling in genetical genomics experiments. Genetics 178, 1763–1776 (2008).
52. Ghazalpour, A. et al. Integrating genetic and network analysis to characterize genes related to mouse weight. PLoS Genet. 2, e130 (2006).
53. Lee, S. I. et al. Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification. Proc. Natl Acad. Sci. USA 103, 14062–14067 (2006).
54. Fisher, R. A. The Genetical Theory of Natural Selection (Oxford Univ. Press, 1930). 55. Ronald, J. & Akey, J. M. The evolution of gene expression QTL in Saccharomyces cerevisiae.
PLoS ONE 2, e678 (2007). This paper is a founding contribution to the field of functional population genomics; it addresses the genomic basis of phenotypic evolution from the perspective of the functional alleles segregating in populations.
56. Yvert, G. et al. Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nature Genet. 35, 57–64 (2003).
57. Barton, N. H. & Keightley, P. D. Understanding quantitative genetic variation. Nature Rev. Genet. 3, 11–21 (2002).
58. Ohta, T. Origin of the neutral and nearly neutral theories of evolution. J. Biosci. 28, 371–377 (2003).
59. Wittkopp, P. J., Haerum, B. K. & Clark, A. G. Regulatory changes underlying expression differences within and between Drosophila species. Nature Genet. 40, 346–350 (2008).
this front67,68. Some missing causal links will arise from an organism’s history, because genotypic variables act across an individual’s lifespan. For example, the expression of a gene that acts early in an organism’s life might show no correlation with the phenotypes it affected at the time of measurement, but the gene–trait relationship will remain6. The catalogue of genetic causes will also be incomplete. For large numbers of gene-expression traits, the underlying genetic variants that affect their expression will be undetected because of insufficient power26. A simple solution to this problem is to use larger sample sizes, and on this front progress is also being made69.
Finally, the realm of biological causes is enormous4, and most experiments limit exploration to a tiny controlled corner of this realm (Box 1). Studies of common lines in multiple environments are now providing the first steps towards integrating genetic perturbations and environmental perturbations into a single view of causal networks70,71. Analysing transcript abundances across multiple tissues and sexes is equally important for studying context-dependent causal networks, because each cell type provides a distinct environment for the genome32,72–74. Ultimately, as data collection becomes more rapid and less expensive, researchers will be able to study a broader range of conditions.
Natural genetic variation is the stuff of evolution and the cause of herit- able susceptibility to diseases. Its properties are fundamentally important to a wide range of biological disciplines. It is therefore fortunate that natural genetic variation has the character of an ideal multifactorial perturbation, providing a natural experimental design that is helping researchers to uncover the mechanistic basis of the map that connects genotype to phenotype. As the molecular catalogues of genomics yield to the integrated models of systems biology, natural genetic variation will have an increasingly central role.
1. Friedman, N., Linial, M., Nachman, I. & Pe’er, D. Using Bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–620 (2000). This paper provides a clear overview of Bayesian network formalisms, the main framework for causal network inference at present, and demonstrates how they can be applied to gene-expression data.
2. Bonneau, R. et al. A predictive model for transcriptional control of physiology in a free living cell. Cell 131, 1354–1365 (2007).
3. Sieberts, S. K. & Schadt, E. E. Moving toward a system genetics view of disease. Mamm. Genome 18, 389–401 (2007).
4. Oyama, S. The Ontogeny of Information: Developmental Systems and Evolution (Duke Univ. Press, 2000).
5. Fisher, R. A. The arrangement of field experiments. J. Ministry Agric. Great Britain 33, 503–511 (1926).
6. Jansen, R. C. & Nap, J. P. Genetical genomics: the added value from segregation. Trends Genet. 17, 388–391 (2001). This was the first article in which the many advantages of using natural variation to probe gene-expression networks were articulated.
7. Jansen, R. C. Studying complex biological systems using multifactorial perturbation. Nature Rev. Genet. 4, 145–151 (2003).
8. Brem, R. B., Yvert, G., Clinton, R. & Kruglyak, L. Genetic dissection of transcriptional regulation in budding yeast. Science 296, 752–755 (2002).
9. Schadt, E. E. et al. Genetics of gene expression surveyed in maize, mouse and man. Nature 422, 297–302 (2003). References 8 and 9 provided the first empirical results showing the power of genetic analysis of genome-wide gene expression.
10. Rockman, M. V. & Kruglyak, L. Genetics of global gene expression. Nature Rev. Genet. 7, 862–872 (2006).
11. Lynch, M. & Walsh, B. Genetics and Analysis of Quantitative Traits (Sinauer, 1998). 12. Stamatoyannopoulos, J. A. The genomics of gene expression. Genomics 84, 449–457
(2004). 13. Perez-Enciso, M. In silico study of transcriptome genetic variation in outbred populations.
Genetics 166, 547–554 (2004). 14. Alberts, R. et al. A statistical multiprobe model for analyzing cis and trans genes in genetical
genomics experiments with short-oligonucleotide arrays. Genetics 171, 1437–1439 (2005).
15. Carlborg, O. et al. Methodological aspects of the genetic dissection of gene expression. Bioinformatics 21, 2383–2393 (2005).
16. Storey, J. D., Akey, J. M. & Kruglyak, L. Multiple locus linkage analysis of genomewide expression in yeast. PLoS Biol. 3, e267 (2005).
17. Kendziorski, C. M. et al. Statistical methods for expression quantitative trait loci (eQTL) mapping. Biometrics 62, 19–27 (2006).
18. Kulp, D. C. & Jagalur, M. Causal inference of regulator–target pairs by gene mapping of expression phenotypes. BMC Genomics 7, 125 (2006).
19. Jia, Z. & Xu, S. Mapping quantitative trait loci for expression abundance. Genetics 176, 611–623 (2007).
20. Doss, S., Schadt, E. E., Drake, T. A. & Lusis, A. J. Cis-acting expression quantitative trait loci in mice. Genome Res. 15, 681–691 (2005).
21. Ronald, J., Brem, R. B., Whittle, J. & Kruglyak, L. Local regulatory variation in Saccharomyces cerevisiae. PLoS Genet. 1, e25 (2005).
743
60. Schliekelman, P. Statistical power of expression quantitative trait loci for mapping of complex trait loci in natural populations. Genetics 178, 2201–2216 (2008).
61. Petkov, P. M. et al. Evidence of a large-scale functional organization of mammalian chromosomes. PLoS Genet. 1, e33 (2005).
62. Begun, D. J. & Aquadro, C. F. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356, 519–520 (1992).
63. Charlesworth, B., Morgan, M. T. & Charlesworth, D. The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303 (1993).
64. Kulathinal, R. J., Bennett, S. M., Fitzpatrick, C. L. & Noor, M. A. Fine-scale mapping of recombination rate in Drosophila refines its correlation to diversity and divergence. Proc. Natl Acad. Sci. USA 105, 10051–10056 (2008).
65. Bystrykh, L. et al. Uncovering regulatory pathways that affect hematopoietic stem cell function using ‘genetical genomics’. Nature Genet. 37, 225–232 (2005).
66. Wentzell, A. M. et al. Linking metabolic QTLs with network and cis-eQTLs controlling biosynthetic pathways. PLoS Genet. 3, 1687–1701 (2007).
67. Foss, E. J. et al. Genetic basis of proteome variation in yeast. Nature Genet. 39, 1369–1375 (2007).
68. Stylianou, I. M. et al. Applying gene expression, proteomics and single-nucleotide polymorphism analysis for complex trait gene identification. Genetics 178, 1795–1805 (2008).
69. Churchill, G. A. et al. The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nature Genet. 36, 1133–1137 (2004).
70. Li, Y. et al. Mapping determinants of gene expression plasticity by genetical genomics in C. elegans. PLoS Genet. 2, e222 (2006).
71. Smith, E. N. & Kruglyak, L. Gene–environment interaction in yeast gene expression. PLoS Biol. 6, e83 (2008).
72. Hubner, N. et al. Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nature Genet. 37, 243–253 (2005).
73. Cotsapas, C. J. et al. Genetic dissection of gene regulation in multiple mouse tissues. Mamm. Genome 17, 490–495 (2006).
74. Wang, S. et al. Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet. 2, e15 (2006).
Acknowledgements I thank the Jane Coffin Childs Memorial Fund for Medical Research and New York University for support, and L. Chen for discussion.
Author Information Reprints and permissions information is available at www.nature.com/reprints. The author declares no competing financial interests. Correspondence should be addressed to the author ([email protected]).
744
<< /ASCII85EncodePages false /AllowTransparency false /AutoPositionEPSFiles true /AutoRotatePages /None /Binding /Left /CalGrayProfile (Dot Gain 20%) /CalRGBProfile (sRGB IEC61966-2.1) /CalCMYKProfile (U.S. Web Coated \050SWOP\051 v2) /sRGBProfile (sRGB IEC61966-2.1) /CannotEmbedFontPolicy /Error /CompatibilityLevel 1.3 /CompressObjects /Off /CompressPages true /ConvertImagesToIndexed true /PassThroughJPEGImages true /CreateJobTicket false /DefaultRenderingIntent /Default /DetectBlends false /DetectCurves 0.0000 /ColorConversionStrategy /LeaveColorUnchanged /DoThumbnails false /EmbedAllFonts true /EmbedOpenType false /ParseICCProfilesInComments true /EmbedJobOptions true /DSCReportingLevel 0 /EmitDSCWarnings false /EndPage -1 /ImageMemory 1048576 /LockDistillerParams true /MaxSubsetPct 100 /Optimize false /OPM 1 /ParseDSCComments true /ParseDSCCommentsForDocInfo true /PreserveCopyPage false /PreserveDICMYKValues true /PreserveEPSInfo true /PreserveFlatness true /PreserveHalftoneInfo false /PreserveOPIComments false /PreserveOverprintSettings true /StartPage 1 /SubsetFonts false /TransferFunctionInfo /Apply /UCRandBGInfo /Remove /UsePrologue false /ColorSettingsFile (None) /AlwaysEmbed [ true ] /NeverEmbed [ true ] /AntiAliasColorImages false /CropColorImages true /ColorImageMinResolution 300 /ColorImageMinResolutionPolicy /OK /DownsampleColorImages true /ColorImageDownsampleType /Bicubic /ColorImageResolution 450 /ColorImageDepth -1 /ColorImageMinDownsampleDepth 1 /ColorImageDownsampleThreshold 1.00000 /EncodeColorImages true /ColorImageFilter /DCTEncode /AutoFilterColorImages true /ColorImageAutoFilterStrategy /JPEG /ColorACSImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /ColorImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /JPEG2000ColorACSImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 >> /JPEG2000ColorImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 >> /AntiAliasGrayImages false /CropGrayImages true /GrayImageMinResolution 300 /GrayImageMinResolutionPolicy /OK /DownsampleGrayImages true /GrayImageDownsampleType /Bicubic /GrayImageResolution 450 /GrayImageDepth -1 /GrayImageMinDownsampleDepth 2 /GrayImageDownsampleThreshold 1.00000 /EncodeGrayImages true /GrayImageFilter /DCTEncode /AutoFilterGrayImages true /GrayImageAutoFilterStrategy /JPEG /GrayACSImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /GrayImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> /JPEG2000GrayACSImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 >> /JPEG2000GrayImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 >> /AntiAliasMonoImages false /CropMonoImages true /MonoImageMinResolution 1200 /MonoImageMinResolutionPolicy /OK /DownsampleMonoImages true /MonoImageDownsampleType /Bicubic /MonoImageResolution 2400 /MonoImageDepth -1 /MonoImageDownsampleThreshold 1.00000 /EncodeMonoImages true /MonoImageFilter /CCITTFaxEncode /MonoImageDict << /K -1 >> /AllowPSXObjects false /CheckCompliance [ /PDFX1a:2001 ] /PDFX1aCheck true /PDFX3Check false /PDFXCompliantPDFOnly true /PDFXNoTrimBoxError true /PDFXTrimBoxToMediaBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXSetBleedBoxToMediaBox false /PDFXBleedBoxToTrimBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXOutputIntentProfile (U.S. Web Coated \050SWOP\051 v2) /PDFXOutputConditionIdentifier (CGATS TR 001) /PDFXOutputCondition () /PDFXRegistryName (http://www.color.org) /PDFXTrapped /False /CreateJDFFile false /Description << /CHS <FEFF4f7f75288fd94e9b8bbe5b9a521b5efa7684002000410064006f0062006500200050004400460020658768637b2654080020005000440046002f0058002d00310061003a0032003000300031002089c4830330028fd9662f4e004e2a4e1395e84e3a56fe5f6251855bb94ea46362800c52365b9a7684002000490053004f0020680751c6300251734e8e521b5efa7b2654080020005000440046002f0058002d00310061002089c483037684002000500044004600206587686376848be67ec64fe1606fff0c8bf753c29605300a004100630072006f00620061007400207528623763075357300b300260a853ef4ee54f7f75280020004100630072006f0062006100740020548c002000410064006f00620065002000520065006100640065007200200034002e003000204ee553ca66f49ad87248672c676562535f00521b5efa768400200050004400460020658768633002> /CHT <FEFF4f7f752890194e9b8a2d7f6e5efa7acb7684002000410064006f006200650020005000440046002065874ef67b2654080020005000440046002f0058002d00310061003a00320030003000310020898f7bc430025f8c8005662f70ba57165f6251675bb94ea463db800c5c08958052365b9a76846a196e96300295dc65bc5efa7acb7b2654080020005000440046002f0058002d003100610020898f7bc476840020005000440046002065874ef676848a737d308cc78a0aff0c8acb53c395b1201c004100630072006f00620061007400204f7f7528800563075357201d300260a853ef4ee54f7f75280020004100630072006f0062006100740020548c002000410064006f00620065002000520065006100640065007200200034002e003000204ee553ca66f49ad87248672c4f86958b555f5df25efa7acb76840020005000440046002065874ef63002> /DAN <FEFF004200720075006700200069006e0064007300740069006c006c0069006e006700650072006e0065002000740069006c0020006100740020006f007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e007400650072002c00200064006500720020006600f800720073007400200073006b0061006c00200073006500730020006900670065006e006e0065006d00200065006c006c0065007200200073006b0061006c0020006f0076006500720068006f006c006400650020005000440046002f0058002d00310061003a0032003000300031002c00200065006e002000490053004f002d007300740061006e0064006100720064002000740069006c00200075006400760065006b0073006c0069006e00670020006100660020006700720061006600690073006b00200069006e00640068006f006c0064002e00200059006400650072006c006900670065007200650020006f0070006c00790073006e0069006e0067006500720020006f006d0020006f007000720065007400740065006c007300650020006100660020005000440046002f0058002d00310061002d006b006f006d00700061007400690062006c00650020005000440046002d0064006f006b0075006d0065006e007400650072002000660069006e006400650072002000640075002000690020006200720075006700650072006800e5006e00640062006f00670065006e002000740069006c0020004100630072006f006200610074002e0020004400650020006f007000720065007400740065006400650020005000440046002d0064006f006b0075006d0065006e0074006500720020006b0061006e002000e50062006e00650073002000690020004100630072006f00620061007400200065006c006c006500720020004100630072006f006200610074002000520065006100640065007200200034002e00300020006f00670020006e0079006500720065002e> /DEU <FEFF00560065007200770065006e00640065006e0020005300690065002000640069006500730065002000450069006e007300740065006c006c0075006e00670065006e0020007a0075006d002000450072007300740065006c006c0065006e00200076006f006e0020005000440046002f0058002d00310061003a0032003000300031002d006b006f006d00700061007400690062006c0065006e002000410064006f006200650020005000440046002d0044006f006b0075006d0065006e00740065006e002e0020005000440046002f0058002d003100610020006900730074002000650069006e0065002000490053004f002d004e006f0072006d0020006600fc0072002000640065006e002000410075007300740061007500730063006800200076006f006e0020006700720061006600690073006300680065006e00200049006e00680061006c00740065006e002e0020005700650069007400650072006500200049006e0066006f0072006d006100740069006f006e0065006e0020007a0075006d002000450072007300740065006c006c0065006e00200076006f006e0020005000440046002f0058002d00310061002d006b006f006d00700061007400690062006c0065006e0020005000440046002d0044006f006b0075006d0065006e00740065006e002000660069006e00640065006e002000530069006500200069006d0020004100630072006f006200610074002d00480061006e00640062007500630068002e002000450072007300740065006c006c007400650020005000440046002d0044006f006b0075006d0065006e007400650020006b00f6006e006e0065006e0020006d006900740020004100630072006f00620061007400200075006e0064002000410064006f00620065002000520065006100640065007200200034002e00300020006f0064006500720020006800f600680065007200200067006500f600660066006e00650074002000770065007200640065006e002e> /ESP <FEFF005500740069006c0069006300650020006500730074006100200063006f006e0066006900670075007200610063006900f3006e0020007000610072006100200063007200650061007200200064006f00630075006d0065006e0074006f00730020005000440046002000640065002000410064006f00620065002000710075006500200073006500200064006500620065006e00200063006f006d00700072006f0062006100720020006f002000710075006500200064006500620065006e002000630075006d0070006c006900720020006c00610020006e006f0072006d0061002000490053004f0020005000440046002f0058002d00310061003a00320030003000310020007000610072006100200069006e00740065007200630061006d00620069006f00200064006500200063006f006e00740065006e00690064006f00200067007200e1006600690063006f002e002000500061007200610020006f006200740065006e006500720020006d00e1007300200069006e0066006f0072006d00610063006900f3006e00200073006f0062007200650020006c0061002000630072006500610063006900f3006e00200064006500200064006f00630075006d0065006e0074006f0073002000500044004600200063006f006d00700061007400690062006c0065007300200063006f006e0020006c00610020006e006f0072006d00610020005000440046002f0058002d00310061002c00200063006f006e00730075006c007400650020006c006100200047007500ed0061002000640065006c0020007500730075006100720069006f0020006400650020004100630072006f006200610074002e002000530065002000700075006500640065006e00200061006200720069007200200064006f00630075006d0065006e0074006f00730020005000440046002000630072006500610064006f007300200063006f006e0020004100630072006f006200610074002c002000410064006f00620065002000520065006100640065007200200034002e003000200079002000760065007200730069006f006e0065007300200070006f00730074006500720069006f007200650073002e> /FRA <FEFF005500740069006c006900730065007a00200063006500730020006f007000740069006f006e00730020006100660069006e00200064006500200063007200e900650072002000640065007300200064006f00630075006d0065006e00740073002000410064006f006200650020005000440046002000710075006900200064006f006900760065006e0074002000ea0074007200650020007600e9007200690066006900e900730020006f0075002000ea00740072006500200063006f006e0066006f0072006d00650073002000e00020006c00610020006e006f0072006d00650020005000440046002f0058002d00310061003a0032003000300031002c00200075006e00650020006e006f0072006d0065002000490053004f00200064002700e9006300680061006e0067006500200064006500200063006f006e00740065006e00750020006700720061007000680069007100750065002e00200050006f0075007200200070006c007500730020006400650020006400e9007400610069006c007300200073007500720020006c006100200063007200e9006100740069006f006e00200064006500200064006f00630075006d0065006e00740073002000500044004600200063006f006e0066006f0072006d00650073002000e00020006c00610020006e006f0072006d00650020005000440046002f0058002d00310061002c00200076006f006900720020006c00650020004700750069006400650020006400650020006c0027007500740069006c0069007300610074006500750072002000640027004100630072006f006200610074002e0020004c0065007300200064006f00630075006d0065006e00740073002000500044004600200063007200e900e90073002000700065007500760065006e0074002000ea0074007200650020006f007500760065007200740073002000640061006e00730020004100630072006f006200610074002c002000610069006e00730069002000710075002700410064006f00620065002000520065006100640065007200200034002e0030002000650074002000760065007200730069006f006e007300200075006c007400e90072006900650075007200650073002e> /ITA (Utilizzare queste impostazioni per creare documenti Adobe PDF che devono essere conformi o verificati in base a PDF/X-1a:2001, uno standard ISO per lo scambio di contenuto grafico. Per ulteriori informazioni sulla creazione di documenti PDF compatibili con PDF/X-1a, consultare la Guida dell'utente di Acrobat. I documenti PDF creati possono essere aperti con Acrobat e Adobe Reader 4.0 e versioni successive.) /JPN <FEFF30b030e930d530a330c330af30b330f330c630f330c4306e590963db306b5bfe3059308b002000490053004f00206a196e96898f683c306e0020005000440046002f0058002d00310061003a00320030003000310020306b6e9662e03057305f002000410064006f0062006500200050004400460020658766f830924f5c62103059308b305f3081306b4f7f75283057307e30593002005000440046002f0058002d0031006100206e9662e0306e00200050004400460020658766f84f5c6210306b306430443066306f3001004100630072006f006200610074002030e630fc30b630ac30a430c9309253c2716730573066304f30603055304430023053306e8a2d5b9a30674f5c62103055308c305f0020005000440046002030d530a130a430eb306f3001004100630072006f0062006100740020304a30883073002000410064006f00620065002000520065006100640065007200200034002e003000204ee5964d3067958b304f30533068304c3067304d307e30593002> /KOR <FEFFc7740020c124c815c7440020c0acc6a9d558c5ec0020c791c131d558b294002000410064006f0062006500200050004400460020bb38c11cb2940020d655c778c7740020d544c694d558ba700020adf8b798d53d0020cee8d150d2b8b97c0020ad50d658d558b2940020bc29bc95c5d00020b300d55c002000490053004f0020d45cc900c7780020005000440046002f0058002d00310061003a0032003000300031c7580020addcaca9c5d00020b9dec544c57c0020d569b2c8b2e4002e0020005000440046002f0058002d003100610020d638d65800200050004400460020bb38c11c0020c791c131c5d00020b300d55c0020c790c138d55c0020c815bcf4b2940020004100630072006f0062006100740020c0acc6a90020c124ba85c11cb97c0020cc38c870d558c2edc2dcc624002e0020c774b807ac8c0020c791c131b41c00200050004400460020bb38c11cb2940020004100630072006f0062006100740020bc0f002000410064006f00620065002000520065006100640065007200200034002e00300020c774c0c1c5d0c11c0020c5f40020c2180020c788c2b5b2c8b2e4002e> /NLD (Gebruik deze instellingen om Adobe PDF-documenten te maken die moeten worden gecontroleerd of moeten voldoen aan PDF/X-1a:2001, een ISO-standaard voor het uitwisselen van grafische gegevens. Raadpleeg de gebruikershandleiding van Acrobat voor meer informatie over het maken van PDF-documenten die compatibel zijn met PDF/X-1a. De gemaakte PDF-documenten kunnen worden geopend met Acrobat en Adobe Reader 4.0 en hoger.) /NOR <FEFF004200720075006b00200064006900730073006500200069006e006e007300740069006c006c0069006e00670065006e0065002000740069006c002000e50020006f0070007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e00740065007200200073006f006d00200073006b0061006c0020006b006f006e00740072006f006c006c0065007200650073002c00200065006c006c0065007200200073006f006d0020006d00e50020007600e6007200650020006b006f006d00700061007400690062006c00650020006d006500640020005000440046002f0058002d00310061003a0032003000300031002c00200065006e002000490053004f002d007300740061006e006400610072006400200066006f007200200075007400760065006b0073006c0069006e00670020006100760020006700720061006600690073006b00200069006e006e0068006f006c0064002e00200048007600690073002000640075002000760069006c0020006800610020006d0065007200200069006e0066006f0072006d00610073006a006f006e0020006f006d002000680076006f007200640061006e0020006400750020006f007000700072006500740074006500720020005000440046002f0058002d00310061002d006b006f006d00700061007400690062006c00650020005000440046002d0064006f006b0075006d0065006e007400650072002c0020007300650020006200720075006b00650072006800e5006e00640062006f006b0065006e00200066006f00720020004100630072006f006200610074002e0020005000440046002d0064006f006b0075006d0065006e00740065006e00650020006b0061006e002000e50070006e00650073002000690020004100630072006f00620061007400200065006c006c00650072002000410064006f00620065002000520065006100640065007200200034002e003000200065006c006c00650072002000730065006e006500720065002e> /PTB <FEFF005500740069006c0069007a006500200065007300730061007300200063006f006e00660069006700750072006100e700f50065007300200064006500200066006f0072006d00610020006100200063007200690061007200200064006f00630075006d0065006e0074006f0073002000410064006f00620065002000500044004600200063006100700061007a0065007300200064006500200073006500720065006d0020007600650072006900660069006300610064006f00730020006f0075002000710075006500200064006500760065006d00200065007300740061007200200065006d00200063006f006e0066006f0072006d0069006400610064006500200063006f006d0020006f0020005000440046002f0058002d00310061003a0032003000300031002c00200075006d0020007000610064007200e3006f002000640061002000490053004f002000700061007200610020006f00200069006e007400650072006300e2006d00620069006f00200064006500200063006f006e0074006500fa0064006f00200067007200e1006600690063006f002e002000500061007200610020006f00620074006500720020006d00610069007300200069006e0066006f0072006d006100e700f50065007300200073006f00620072006500200063006f006d006f00200063007200690061007200200064006f00630075006d0065006e0074006f0073002000500044004600200063006f006d00700061007400ed007600650069007300200063006f006d0020006f0020005000440046002f0058002d00310061002c00200063006f006e00730075006c007400650020006f0020004700750069006100200064006f002000750073007500e100720069006f00200064006f0020004100630072006f006200610074002e0020004f007300200064006f00630075006d0065006e0074006f00730020005000440046002000630072006900610064006f007300200070006f00640065006d0020007300650072002000610062006500720074006f007300200063006f006d0020006f0020004100630072006f006200610074002000650020006f002000410064006f00620065002000520065006100640065007200200034002e0030002000650020007600650072007300f50065007300200070006f00730074006500720069006f007200650073002e> /SUO <FEFF004b00e40079007400e40020006e00e40069007400e4002000610073006500740075006b007300690061002c0020006b0075006e0020006c0075006f0074002000410064006f0062006500200050004400460020002d0064006f006b0075006d0065006e007400740065006a0061002c0020006a006f0074006b00610020007400610072006b0069007300740065007400610061006e00200074006100690020006a006f006900640065006e0020007400e400790074007900790020006e006f00750064006100740074006100610020005000440046002f0058002d00310061003a0032003000300031003a007400e400200065006c0069002000490053004f002d007300740061006e006400610072006400690061002000670072006100610066006900730065006e002000730069007300e4006c006c00f6006e00200073006900690072007400e4006d00690073007400e4002000760061007200740065006e002e0020004c0069007300e40074006900650074006f006a00610020005000440046002f0058002d00310061002d00790068007400650065006e0073006f00700069007600690065006e0020005000440046002d0064006f006b0075006d0065006e0074007400690065006e0020006c0075006f006d0069007300650073007400610020006f006e0020004100630072006f0062006100740069006e0020006b00e400790074007400f6006f0070007000610061007300730061002e00200020004c0075006f0064007500740020005000440046002d0064006f006b0075006d0065006e00740069007400200076006f0069006400610061006e0020006100760061007400610020004100630072006f0062006100740069006c006c00610020006a0061002000410064006f00620065002000520065006100640065007200200034002e0030003a006c006c00610020006a006100200075007500640065006d006d0069006c006c0061002e> /SVE <FEFF0041006e007600e4006e00640020006400650020006800e4007200200069006e0073007400e4006c006c006e0069006e006700610072006e00610020006f006d002000640075002000760069006c006c00200073006b006100700061002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e007400200073006f006d00200073006b00610020006b006f006e00740072006f006c006c006500720061007300200065006c006c0065007200200073006f006d0020006d00e50073007400650020006d006f0074007300760061007200610020005000440046002f0058002d00310061003a0032003000300031002c00200065006e002000490053004f002d007300740061006e00640061007200640020006600f6007200200075007400620079007400650020006100760020006700720061006600690073006b007400200069006e006e0065006800e5006c006c002e00200020004d0065007200200069006e0066006f0072006d006100740069006f006e0020006f006d00200068007500720020006d0061006e00200073006b00610070006100720020005000440046002f0058002d00310061002d006b006f006d00700061007400690062006c00610020005000440046002d0064006f006b0075006d0065006e0074002000660069006e006e00730020006900200061006e007600e4006e00640061007200680061006e00640062006f006b0065006e002000740069006c006c0020004100630072006f006200610074002e002000200053006b006100700061006400650020005000440046002d0064006f006b0075006d0065006e00740020006b0061006e002000f600700070006e00610073002000690020004100630072006f0062006100740020006f00630068002000410064006f00620065002000520065006100640065007200200034002e00300020006f00630068002000730065006e006100720065002e> /ENU (Use these settings to create Adobe PDF documents that are to be checked or must conform to PDF/X-1a:2001, an ISO standard for graphic content exchange. For more information on creating PDF/X-1a compliant PDF documents, please refer to the Acrobat User Guide. Created PDF documents can be opened with Acrobat and Adobe Reader 4.0 and later.) >> /Namespace [ (Adobe) (Common) (1.0) ] /OtherNamespaces [ << /AsReaderSpreads false /CropImagesToFrames true /ErrorControl /WarnAndContinue /FlattenerIgnoreSpreadOverrides false /IncludeGuidesGrids false /IncludeNonPrinting false /IncludeSlug false /Namespace [ (Adobe) (InDesign) (4.0) ] /OmitPlacedBitmaps false /OmitPlacedEPS false /OmitPlacedPDF false /SimulateOverprint /Legacy >> << /AddBleedMarks false /AddColorBars false /AddCropMarks false /AddPageInfo false /AddRegMarks false /ConvertColors /ConvertToCMYK /DestinationProfileName () /DestinationProfileSelector /DocumentCMYK /Downsample16BitImages true /FlattenerPreset << /PresetSelector /HighResolution >> /FormElements false /GenerateStructure false /IncludeBookmarks false /IncludeHyperlinks false /IncludeInteractive false /IncludeLayers false /IncludeProfiles false /MultimediaHandling /UseObjectSettings /Namespace [ (Adobe) (CreativeSuite) (2.0) ] /PDFXOutputIntentProfileSelector /DocumentCMYK /PreserveEditing true /UntaggedCMYKHandling /LeaveUntagged /UntaggedRGBHandling /UseDocumentProfile /UseDocumentBleed false >> ] >> setdistillerparams << /HWResolution [2400 2400] /PageSize [665.858 854.929] >> setpagedevice

Documents

Reverse engineering the genotype– phenotype map with