Upload
others
View
6
Download
1
Embed Size (px)
Citation preview
Training program on
Approaches to Plant Genomics March 4 - March 8, 2013
Coordinator Dr. P. Nagarajan
Professor & Head (DPMB&B)
Co-coordinator Dr. J. Ramalingam
Professor
Compiled by Dr. P. Nagarajan, Professor & Head
Dr. J. Ramalingam, Professor
Distributed Information Sub Centre (DISC) Department of Plant Molecular Biology and Bioinformatics
Centre for Plant Molecular Biology and Biotechnology Tamil Nadu Agricultural University
Coimbatore- 641003
TABLE OF CONTENTS
S.No Title Page No.
Theory Sessions
1. Plant Genome Databases 1
2. Application of Next Generation Sequencing in Plant Genomics 7
3. Gene annotation 14
4. Transcriptomics for miRNAs: Innovative Frontiers in Regulatory Genomics 18
5. Next Generation Sequencing Technology platform for development of SNP markers and association mapping studies
29
6. Functional genomics 38
7. Genome Mapping: Establishing route maps of Genomes 41
8. Nutrigenomics in Agriculture 55
9. Targeting Induced Local Lesions IN Genomes (TILLING): Application of Traditional mutagenesis for Plant Functional Genomics
60
10. Genomic approaches of nitrogen and phosphorus efficient rice 69
Practical Sessions
1. Plant Genome Databases and Comparison of Genome Sequences 111
2. Prediction of genes in genomic sequences 115
3. Comparative Genomics 119
4. Microarray-based gene expression analysis 123
1
Plant Genome Databases
Dr. J.Ramalingam, Professor Department of Plant Molecular Biology and Bioinformatics, CPMB&B,
Tamil Nadu Agricultural University, Coimbatore. [email protected]
Databases are capable of storing vast amounts of data. A database processes the data
stored in its files and transform into usable information when the need arises. The plant
genomics research group focuses on the analysis of plant genomes, using bioinformatics
techniques. To store and manage the data, many databases were formed which provides a data
and information resource for individual plant species. In addition the plant databases can
provide a platform for integrative and comparative plant genome research. Number of plant
databases is available which are repository for morphological, markers and sequence data. The
information on fully sequenced organism like Rice, Arabidopsis, Maize and Soybean are
valuable for any scientific work.
Rice
Rice (Oryza sativa L.) is the staple food for more than three billion people, over half the
world’s population. It provides 27% of dietary energy and 20% of dietary protein in the
developing world. About 90% of the world’s rice is grown and consumed in Asia (Zeigler &
Barclay, 2008). Research into rice is crucial for the development of technologies that will
increase the productivity for farmers who rely on rice for their livelihood. Research in rice is
aimed at increasing the yield, yield stability and improving resistance against abiotic and biotic
stresses. There exist a huge collection of rice germplasm carrying useful genes for the above
traits and have been continuously exploited by the plant breeders towards developing
improved rice varieties. Detailed documentation of the germplasm/varieties facilitated by an
easy and efficient search and retrieval system is an absolute necessity considering the wide
range of breeding activities taken up globally in rice, and other crops. There are a selective
public/private databases carrying molecular and genetic data of specific model species such as
2
maize (Lawrence et al 2005), Medicago (Cannon et al 2005), rice (Sakata et al 2002) and
Arabidopsis (Poole, 2007) which caters to more of an investigative nature of research. However,
when it comes to applied varietal developmental programs, familiarity of the plant breeders on
aspects such as, pedigree, yield, resistance levels to biotic and abiotic stresses etc., is important
for evolving a successful hybridization program. TNAURice, a database on rice was developed
from this centre to serve as ready reckoner to the rice breeders; it is a compilation of the rice
varieties developed from Tamil Nadu Agricultural University, Coimbatore.
The use of database technologies has drawn the attention of a subset of the biological
community, as of now it is limited to a small sector of the scientific community. However, more
and more individual institutions are generating experimental data on a large scale and are in
need of developing and managing databases of their own. Effective use of information may
strongly promote biological studies, and may lead to many important findings. Plant germplasm
resources play an important role in crop improvement programs. The use and benefits derived
from conserved germplasm is the sole criterion for assessing the genetic conservation program
of a crop. The use of rice germplasm includes both direct use in rice production and indirect use
in rice breeding programs as parents. Presently, information regarding the available and
evaluated germplasm is rarely documented in the form of a well structured database. It is
therefore, time-consuming and laborious for individual researchers to extract and organize
varietal information from sources such as, notebooks/records or from typed electronic pages.
Thus the primitive nature of documentation of the varietal information is a major limitation and
restricts its utility by time and space. This draws an urgent need to consolidate rice germplasm
through a database system.
There are few Rice Genome Databases
1. Oryza Base
2. Gramene
Oryzabase
The Oryzabase (http://www.nig.ac.jp) is a comprehensive rice science database established
in 2000 by rice researcher's committee in Japan. The database is originally aimed to gather as
3
much knowledge as possible ranging from classical rice genetics to recent genomics and from
fundamental information to hot topics. The Oryzabase consists of five parts, (1) genetic
resource stock information, (2) gene dictionary, (3) chromosome maps, (4) mutant images, and
(5) fundamental knowledge of rice science. This database provides more extensive cross-
referencing of Oryzabase to the major DNA sequence database, literature database and other
plant databases in order to provide the wealth of information to rice researchers. It currently
contains information on NBRP Strains: 17,229, Strains: 10,938, Markers: 4,904, Trait genes:
4,753, References for new gene and sequences: 5,739
Gramene
Gramene (http://www.gramene.org) is a curated, open-source, data resource for
comparative genome analysis in the grasses. Its goal is to facilitate the study of cross-species
homology relationships using information derived from public projects involved in genomic
and EST sequencing, protein structure and function analysis, genetic and physical mapping,
interpretation of biochemical pathways, gene and QTL localization and descriptions of
phenotypic characters and mutations
The rice genome is more than a resource for understanding the biology of a single
species. It is a window into the structure and function of genes in other crop grasses as well.
Using rice as the sequenced reference genome, researchers can identify and understand the
relationships among genes, pathways and phenotypes in a wide range of grass species.
Extensive work over the past two decades has shown remarkably consistent conservation of
gene order within large segments of linkage groups in rice, maize, sorghum, barley, wheat, rye,
sugarcane and other agriculturally important grasses. A substantial body of data supports the
notion that the rice genome is substantially colinear at both large and short scales with other
crop grasses, opening the possibility of using rice synteny relationships to rapidly isolate and
characterize homologues in maize, wheat, barley and sorghum.
As an information resource, Gramene's purpose is to provide added value to data sets
available within the public sector, which will facilitate researchers' ability to understand the
rice genome and leverage the rice genomic sequence for identifying and understanding
4
corresponding genes, pathways and phenotypes in other crop grasses. This is achieved by
building automated and curated relationships between rice and other cereals for both
sequence and biology. The automated and curated relationships are queried and displayed
using controlled vocabularies and web-based displays. The controlled vocabularies
(Ontologies), currently being utilized includes Gene ontology, Plant ontology, Trait ontology,
Environment ontology and Gramene Taxonomy ontology. The web-based displays for
phenotypes include the Genes and Quantitative Trait Loci (QTL) modules. Sequence based
relationships are displayed in the Genomes module using the genome browser adapted from
Ensembl, in the Maps module using the comparative map viewer (CMap) from GMOD, and in
the Proteins module displays. BLAST is used to search for similar sequences. Literature
supporting all the above data is organized in the Literature database.
Database for Arabidopsis - TAIR
The Arabidopsis Information Resource (TAIR) (http://www.arabidopsis.org) maintains a
database of genetic and molecular biology data for Arabidopsis thaliana, a widely used model
plant. TAIR is located at the Carnegie Institution for Science Department of Plant Biology,
Stanford, California. Funding is provided by the National Science Foundation, (Grant No. DBI-
0850219). Additional support is provided by corporate and nonprofit organizations through the
TAIR Sponsorship Program.TAIR collaborates with the Arabidopsis Biological Resource Center
(ABRC) to provide researchers with the ability to search and order stocks. The ABRC's mission is
to aquire, preserve and distribute seed and DNA resources that are useful to the Arabidopsis
research community.TAIR datasets can also be downloaded for your convenience. In addition,
pages on news, information on the Arabidopsis Genome Initiative (AGI), Arabidopsis lab
protocols, and useful links are provided.
Data available from TAIR includes the complete genome sequence along with gene
structure, gene product information, metabolism, gene expression, DNA and seed stocks,
genome maps, genetic and physical markers, publications, and information about the
Arabidopsis research community. Gene product function data is updated every two weeks from
the latest published research literature and community data submissions. TAIR also provides
5
extensive linkouts from our data pages to other Arabidopsis resources. It currently holds the
information on AA Sequence: 72149, Clone: 1998885, Clone End: 1883475, Gene: 76850,
Database Reference: 1602, Flanking Sequence: 1160572, Genetic Marker: 4087, Nucleotide
sequence: 2766322, Germplasm: 547838, Protein Domain: 27203, Restriction Enzyme: 619,
Vector: 503
Database for Maize: MaizeGDB
The Maize Genetics and Genomics Database (MaizeGDB) is a central repository for
maize sequence, stock, phenotype, genotypic and karyotypic variation, and chromosomal
mapping data. MaizeGDB represents the synthesis of all data available previously from ZmDB
(Zea mays DataBase) and from MaizeDB-databases that have been superseded by MaizeGDB.
MaizeGDB is a USDA/ARS funded project which develops an effective interface to access this
data and develop additional tools to make data analysis easier. Its goal in the long term is to
have a true next-generation online maize database. In addition, MaizeGDB provides contact
information for over 2400 maize cooperative researchers, facilitating interactions between
members of the rapidly expanding maize community.
MaizeGDB (http://www.maizegdb.org) is the community database for biological
information about the crop plant Zea mays ssp. mays. Genetic, genomic, sequence, gene
product, functional characterization, literature reference, and person/organization contact
information are among the data types accessible through this database. MaizeGDB provides
web-based tools for ordering maize stocks from several organizations including the Maize
Genetics Cooperation Stock Center and the North Central Regional Plant Introduction Station
(NCRPIS). Sequence searches yield records displayed with embedded links to facilitate ordering
cloned sequences from various groups including the Maize Gene Discovery Project and the
Clemson University Genomics Institute. MaizeGDB can be accessed at
http://www.maizegdb.org.
6
Database for Soyabean - Soybean Genome Database (SoyGD)
The soybean genome is comparatively large, approximately 1,112 MB; a diploidized
tetraploid, the product of two genome duplications that occurred 8-10 MYA(Million years ago)
and 40-50 MYA; extensively segmented and reshuffled; and, 64 percent euchromatic and 36
percent heterochromatic. The Soybean Genome Database (SoyGD) genome browser
(http://soybeangenome.siu.edu) has, since 2002, integrated and served the publicly available
soybean physical map; bacterial artificial chromosome (BAC) fingerprint database and genetic
map associated genomic data.
Above is the information on some of the agricultural important organisms. Numerous
plant genomes were sequenced and are in sequencing process and are available online
(www.genomesonline.org).
References:
Zeigler, R.S., and Barclay, A., 2008 Rice 1: 3.
Lawrence, C.J., et al., 2005 Plant Physiol 138: 55.
Cannon, S.B., et al., 2005 Plant Physiol 138: 38.
Sakata, K., et al., 2002 Nucleic Acids Res 30: 98.
Poole, R.L., 2007 Methods Mol Biol 406:179.
7
Application of Next Generation Sequencing in Plant Genomics
Dr. K.K. Kumar, Assistant Professor Department of Plant Biotechnology, CPMB&B,
Tamil Nadu Agricultural University, Coimbatore. [email protected]
DNA sequencing was performed almost exclusively by the Sanger Automated DNA
sequencing method before 2005, which has excellent accuracy and reasonable read length but
very low throughput. Sanger sequencing was used to produce the reference genome of
Arabidopsis thaliana, Oryza sativa, Escherichia coli, Drosophila melanogaster, Caenorahabditis
elegans, and human. Since 2005, new types of sequencing instruments have been introduced
that permit amazing acceleration of the data collection rates for DNA sequencing, hence called
Next Generation Sequencer (NGS). The next-generation sequencing platforms are also referred
to as ‘second-generation platforms’, as they constitute the second phase of technologies after
Sanger sequencing. The next generation sequencing can produce over 100 times more data
compared to the most sophisticated capillary sequencer based on the Sanger Method. Hence,
NGS is a high throughput DNA sequence reducing the time and cost of sequencing. High
throughput sequencing machines and advancement of modern bioinformatics tools at
unprecedented pace, the target goal of sequencing individual genomes of living organism at a
cost of $1,000 each is seemed to be realistically feasible in the near future. A common strategy
for NGS is to use DNA synthesis or ligation process to read through many different DNA
templates in parallel (Fuller et al., 2009). Hence, Next generation sequencer also termed as
massively parallel sequencing. Second-generation sequencing platforms that produce large
amounts (typically millions) of short DNA sequence reads of length typically between 25 and
400 bp. While these reads are shorter than the traditional Sanger sequence reads with 700-
1200 bp.
There are five Next generation sequencing platforms available commercially. Among
them, the Roche/454 FLX, the Illumina/Solexa Genome Analyzer, and the Applied Biosystems
(ABI) SOLiD Analyzer are currently dominating the market. The other two platforms, the
8
Polonator and the Helicos have just recently been introduced and are not widely used. Very
recently, newer next-generation sequencing platforms have emerged that are referred to as
‘third-generation platforms’, as they represent a further paradigm shift in sequencing
technology compared with the Sanger platform and the Illumin , SoLiD and Roche 454
platforms. Two such platforms commercially available are the Pacific Biosciences PacBio RS and
the Ion Torrent Personal Genome Machine
Application of Next Generation Sequencing
Next generation sequencing technologies are being utilized for de novo sequencing of
bacterial and lower eukaryotic genomes, genome re-sequencing and transcriptome analysis
(RNA-sequencing). Compared to Sanger sequencing there are many different uses for next-
generation sequencer. The NGS is also used for genome-wide profiling of epigenetic marks and
chromatin structure using other seq-based methods (ChIP–seq, methyl– seq and DNase–seq),
and species classification and gene discovery by metagenomics studies (Metzker, 2010) . Ever
declining sequencing costs and increased data output and sample throughput for Next
generation sequencing and Third generation sequencing technologies enable the plant
genomics and breeding community to undertake genotyping-by-sequencing (GBS).
a. Plant genome sequencing
The preferred method of Plant whole genome sequencing is to identify the overlapping
BAC clones covering the entire genome that had been isolated and located on a physical map,
followed by sequencing the individual BAC clones, and then genome assembly is done. This
method is termed the BAC by BAC sequencing method. The benefit of BAC by BAC genome
sequencing is the reduction of complexity, as only relatively short regions are assembled at one
time, and this approach is still favoured for genomes which contain a large number of repetitive
elements. This strategy was used for the sequencing the rice, Arabidopsis, Poplar, Grape, Peach,
Sorghum (Paterson et al. 2009) and Eucalyptus using the standard Sanger sequencing. BAC
sequencing can use traditional Sanger or NGS platforms like Roche 454 or Illumina GAII or
potentially combinations of these methods. After the availability of NGS platforms, the
9
approaches to genome sequencing have also changed, moving from laborious BAC by BAC to
whole genome shotgun sequencing (WGS) methods (reviewed by Imelfort & Edwards, 2009) .
The cost of producing and mapping the BAC library as well as the significant cost associated
with the sequencing of large numbers of BACs makes BAC by BAC sequencing approaches
increasingly unfavourable and several genome sequencing projects which started with the BAC
by BAC approach has now moved to whole genome shotgun (WGS) methods using NGS.
Recently, different NGS platforms are used for sequencing the plant genome (reviewed
by Hamilton and Buell, 2012). For example, NGS platform, Roche 454 technology is being used
to sequence the 430 Mbp genome of Theobroma cacao, while a combination of Sanger and
Roche 454 Sanger sequencing and 454 (4 X versus 12 X coverage, respectively) is being used to
interrogate the apple genome. Illumina Solexa and Roche 454 sequencing has also been used
to characterize the genomes of cotton. Roche 454 sequencing has been used to survey the
genome of Miscanthus, while Sanger, Illumina Solexa and Roche 454 sequencing is being used
to interrogate the banana genome. The Roche 454 platform has also been utilized for de novo
sequencing of plant genomes. The genomes of the cucumber (Cucumis sativus, 376 Mb) and A.
thaliana (157 Mb) were sequenced as a proof-of-principle using the 454 FLX Titanium platforms,
with mate-paired libraries being essential to the assembly process. Roche 454 has been applied
for the sequencing of complex BACs from barley and this has been complemented by repeat
characterization using WGS Illumina Solexa data. However, the size of the wheat genome is
much larger (17 000 Mbp) than related cereal genomes such as barley (Hordeum vulgare, 5000
Mbp), rye (Secale cereale, 9100 Mbp) and oat (Avena sativa, 11 000 Mbp). The size and
hexaploid nature of the wheat genome creates significant problems in elucidating its genome
sequence. However, bread wheat (Triticum aestivum) draft genome sequencing was done
using the Roche 454 sequencer (Brenchley et al., 2012)
b. Plant Genotyping with Next Generation Sequencing
The genotyping of plants has progressed through a range of technologies like RFLP,
RAPD, AFLP followed by microsatellites or simple sequence repeats SSR. Since 2000, single
10
nucleotide polymorphism (SNP) markers have increasingly replaced the SSR markers as more
sequence data and techniques for SNP discovery and analysis have advanced. SNP analysis
methods with greater repeatability compared to SSR marker methods have been developed
facilitating much more widespread adoption of SNP methods. By comparing the DNA sequence
of any plant genotype with its reference genome help to identify the SNP marker. The discovery
of genetic polymorphism between plants has been based upon a range of mutation detection
techniques and DNA sequencing. With the availability of NGS platforms, the DNA sequencing
has recently become a more cost-effective method for discovery of sequence differences
(Mardis, 2008).
Genome sequencing projects focus on obtaining the full genome sequence (reference
genome) of a single organism as a representative of a species, polymorphism studies focus on
genome-wide variation present within a species. Although the full genome sequence of a
species provides valuable information regarding homology and distribution of genes across a
genome, it does not provide information regarding the dynamic processes that affect a genome
and the genes therein. Next generation offer the scope for re-sequencing the individual plant
cultivar genome for which reference gene is present at low cost and time. Comparison of the
individual plant cultivar genome sequence with the reference gene will help to identify these
genetic variants, including single nucleotide variants (SNVs) or single nucleotide polymorphisms
(SNPs), small insertions and deletions (indels, 1–1000 bp), and structural and genomic variants
(>1000 bp). As increasing numbers of reference genomes become available, it is expected that
whole genome re-sequencing of crop genomes will become common. Ossowski et al (2010)
reported the re-sequencing of reference Arabidopsis accession Col-0 and two divergent strains,
Bur-0 and Tsu-1 using Illumina Solexa technology to produce between 15 X and 25 X coverage.
As well as finding over 2000 potential errors in the reference genome sequence, they identified
more than 800 000 unique Single nucleotide polymorphism (SNPs) and almost 80 000 1–3 bp
indels. The resequencing of rice and Medicago truncatula has also been undertaken using the
Illumina Solexa for SNP discovery. A potential direct application of genome-wide SNP studies in
11
plant breeding is the generation of markers for the construction of (dense) genetic linkage
maps.
c. Transcriptomics
Next-generation high-throughput RNA sequencing technology (RNA-Seq) is a recently-
developed method for discovering, profiling, and quantifying RNA transcripts with several
advantages over other expression profiling technologies including higher sensitivity and the
ability to detect splicing isoforms and somatic mutations. Transcriptional profiling using high
throughput next generation sequencing (RNA-Seq) has emerged as a superior alternative to
microarrays. With the availability of faster and cheaper next generation sequencing platforms,
more transcriptomic analyses are performed using a recently-developed deep sequencing
approach, RNA-Seq (Wang et al., 2009). RNA-seq is a NGS method that sequences the
transcriptome (all RNA transcripts). RNA-Seq, with its high resolution and sensitivity, has
revealed many novel transcribed regions and Splicing isoforms of known genes. In addition,
results from RNA-Seq suggest the existence of a large number of novel transcribed regions in
every genome surveyed, including those of A. thaliana (Lister et al., 2008). These novel
transcribed regions, combined with many undiscovered novel splicing variants, suggest that
there is considerably more transcriptomic complexity than previously appreciated.
A related application of next-generation sequencing technologies to the analysis of
transcriptomes is small RNA such as miRNA and siRNA discovery and profiling. High-throughput
sequencing offers a greater potential for the identification of novel small RNAs as well as
profiling of known and novel small RNA genes. Small RNA profiling with 454 pyrosequencing
technology has been widely reported, which include studies in the moss Physcomitrella patens
(Axtell et al., 2006), A. thaliana (Henderson et al., 2006)
d. Other application of NGS
Next-generation sequencing methods with single-base resolution have begun to
characterize the epigenome, i.e. modifications to the DNA that do not affect the DNA sequence.
12
This includes DNA methylation, histone modification, nucleosome density and chromatin state.
Examination of the A. thaliana epigenome, which was surveyed at the single-base level using
the Illumina platform, revealed that DNA methylation was not evenly dispersed throughout the
genome or the genes, and that the location and abundance of small RNA targets were positively
associated with cytosine methylation (Lister et al., 2008).
References:
Axtell, M. J., Snyder, J. A., and Bartel, D. P., 2007 Common functions for small RNAs of land
plants. Plant Cell 19: 1750-1769.
Brenchley, R., Spannag, M., Pfeifer, M., et al., 2012 Analysis of the breadwheat genome using
whole-genome shotgun sequencing. Nature 491: 705-710.
Fuller, C. W., Middendorf, L. R., Benner, S. A, Church, G. M., Harris, T., Huang, X., Jovanovich, S.
B., Nelson, J. R., Schloss, J. A., Schwartz, D. C., and Vezenov, D. V., 2009 The challenges of
sequencing by synthesis. Nat.Biotechnol. 27:1013–1023.
Hamilton, J. P., and Buell, C. R., 2012 Advances in plant genome sequencing. The Plant J 70:
177-190.
Henderson, I. R., Zhang, X., Lu, C., Johnson, L., Meyers, B. C., Green, P. J., and Jacobsen, S. E.,
2006 Dissecting Arabidopsis thaliana DICER function in small RNA processing, gene silencing and
DNA methylation patterning. Nat Genet. 38: 721-725.
Imelfort, M., and Edwards, D., 2009 De Novo seuqencing of plant genomes using second-
generation technologies. Briefings in Bioinformatics 10: 609-618.
Lister, R., O’Malley, R.C., Tonti-Filippini, J., Gregory, B. D., Berry, C. C., Millar, A. H. and Ecker, J.
R. 2008 Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell
133: 523–536.
Mardis, E. R., 2008 Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet.
9: 387-402.
13
Metzker, M. L., 2010 Sequencing technologies- the next generation. Nature Genetics 11: 31-46.
Paterson, A. H., Bowers, J. E., Bruggmann, R. et al., 2009 The Sorghum bicolor genome and the
diversification of grasses. Nature 457: 551–556.
14
Gene annotation
Dr. L. Arul, Associate Professor Department of Plant Molecular Biology and Bioinformatics, CPMB&B,
Tamil Nadu Agricultural University, Coimbatore. [email protected]
The “Genome” is universally defined as the total nuclear DNA content of a haploid cell
or half the DNA content of a diploid cell of any given organism. Sequencing genome(s) is
regarded the way for an ultimate characterization of an organism. The first genome to be
sequenced was a small phage Φ-X174 of size 5.38 Kb in 1977, only after the advent of the
Sanger’s dideoxy method of DNA sequencing. The next major happening was the sequencing of
the first bacterial genome, Haemophilus influenzae of 1.8 Mb in size during 1995. These were
however remarkable achievements in the field of genomics which paved way for the successful
sequencing of hundreds of prokaryotic and a few eukaryotic genomes so far.
Concept of model plant genomes
Model organisms (Drosophila melanogaster, Caenorhabditis elegans, and
Saccharomyces cerevisiae) provide genetic and molecular insights into the biology of more
complex species. The effort to determine the nucleotide sequence of a plant genome has issues
with respect to the genetic makeup of the plants. The range of plant genome size is very large
extending from approximately the same size as the genome of many small animals through
more than five times as large as the human genome for many of the domesticated crops to
almost forty times as large as the human genome for some ornamental flowers. Hence, the
plant biologists started resorting to the concept of model organism which share features such
as being diploid and appropriate for genetic analysis, being amenable to genetic
transformation, having a (relatively) small genome and a short growth cycle, having commonly
available tools and resources, and being the focus of research by a large scientific community.
The species now used as model organisms for mono- and dicotyledonous plants are rice (Oryza
sativa) and Arabidopsis (Arabidopsis thaliana) respectively. And today we have a long list of
plants for which the complete genome sequence has been published (see table 1).
15
Table 1: The plant genome with the complete sequence published
Sl. No Plant Chr. No (n) Size (Mb)
1 Arabidopsis thaliana (thale cress) 5 120
2 Oryza sativa (rice) 12 430
3 Glycine max (soybean) 20 950
4 Medicago truncatula (barrel medic) 8 500
5 Populus trichocarpa (black cottonwood) 19 480
6 Sorghum bicolor 10 690
7 Vitis vinifera (wine grape) 19 500
8 Zea mays (corn) 10
Gene annotation
Whole genome sequencing of plants and animals had resulted in huge volumes of
sequence data which pose to be highly incomprehensible. Understanding the genome
constitution, growth and development, evolution and ultimately every other relevant issues are
presently being addressed by a combination of experimental and bioinformatics tools. As a
result of this synergy, there has been a tremendous progress in some of the areas which
includes: gene finding and genome annotation, comparative genomics, allele mining, protein-
protein interactions, structure-function relationships, in silico dissection of quantitative trait
loci, global transcript profiling, deciphering metabolic and signaling pathway, drug discovery,
diversity analysis and phylogeny.
Gene finding typically refers to the area of computational biology that is concerned with
algorithmically identifying stretches of sequence that are biologically functional. This especially
includes protein-coding genes, but may also include other functional elements such as RNA
genes and regulatory regions. Gene finding is one of the first and most important steps in
understanding the genome of a species once it has been sequenced. In the genomes of
prokaryotes, genes have specific and relatively well-understood promoter sequences (signals),
such as the Pribnow box and transcription factor binding sites, which are easy to systematically
16
identify. Also, the sequence coding for a protein occurs as one contiguous open reading frame
(ORF), which is typically many hundreds or thousands of base pairs long. Furthermore, protein-
coding DNA has certain periodicities and other statistical properties that are easy to detect in
sequence of this length. These characteristics make prokaryotic gene finding relatively
straightforward, and well designed systems are able to achieve high levels of accuracy.
However, gene finding in eukaryotes, especially complex organisms like humans, is
considerably more challenging for several reasons. First, the promoter and other regulatory
signals in these genomes are more complex and less well-understood than in prokaryotes,
making them more difficult to reliably recognize. Two classic examples of signals identified by
eukaryotic gene finders are CpG islands and binding sites for a poly(A) tail. Second, splicing
mechanisms employed by eukaryotic cells mean that a particular protein-coding sequence in
the genome is divided into several parts (exons), separated by non-coding sequences (introns).
Splice sites are themselves another signal that eukaryotic gene finders are often designed to
identify. A typical protein-coding gene in humans might be divided into a dozen exons, each less
than two hundred base pairs in length, and some as short as twenty to thirty. It is therefore
much more difficult to detect periodicities and other known content properties of protein-
coding DNA in eukaryotes.
Gene finders for both prokaryotic and eukaryotic genomes typically use complex
probabilistic models, such as Hidden Markov Models, in order to combine information from a
variety of different signal and content measurements. The GLIMMER system is a widely used
and highly accurate gene finder for prokaryotes. Eukaryotic ab initio gene finders, by
comparison, have also achieved a reasonable level of success notable example is GENSCAN
(Figure 1).
17
Figure 1: GENSCAN model indicating states and transition, states are parameterized with signal
and content features
References:
Durbin, Krogh, and Mitchison. 1998 Biological sequence analysis: Probabilistic models of
proteins and nucleic acids. Cambridge University Press
Zhang, M. Q., 2002 Computational prediction of eukaryotic protein-coding genes. Nature
Reviews Genetics 3: 698-709
Mathe, C et al., 2002 Current methods of gene prediction, their strengths and weaknesses.
Nucleic Acids Research 30: 4103-4117
Yandell, M et al., 2012 A beginner’s guide to eukaryotic genome annotation. Nature Reviews
Genetics 13: 329-342
18
Transcriptomics for miRNAs: Innovative Frontiers in Regulatory Genomics
Dr. N. Manikanda Boopathi, Assistant Professor,
Department of Plant Molecular Biology and Bioinformatics, CPMB&B, Tamil Nadu Agricultural University, Coimbatore.
[email protected]; [email protected]
Prelude
MicroRNAs (miRNAs) are small, endogenous RNAs (~20 – 24 bp in length) that can play
important regulatory roles in plant growth and development under normal and stressed
conditions (Bartel, 2004). Usually miRNAs are endogenously expressed as non-translated RNAs
and are processed by Dicer-like proteins from stem loop regions of longer RNA precursors
(Figure 1).It comprises one of abundant classes of gene regulatory molecules in multicellular
organisms and likely influence the output of many protein-coding genes. MiRNAs are chemically
and functionally similar to small interfering RNAs. MiRNAs are deeply conserved within both the
plant and animal kingdoms. However, there are substantial differences between the two
lineages with regard to the mechanism and scope of miRNA-mediated gene regulation (Jones-
Rhoades et al., 2006).
Historical perspectives
In 1993, laboratories of Drs.Ambros and Ruvkunindependently investigated the first
endogenous 22 bp RNAs in Caenorhabditis elegans. Theyfound that a small 22 nucleotides RNA
derived from a gene called lin-4 suppressed the expression of lin-14, which controls the timing
of Caenorhabditis elegans larval development (Lee et al., 1993; Wightman et al., 1993); lin-4 is
recognized as the founding member of a new class of small RNAs called miRNAs and it was
clearly noticed the transcript of these genes were not produced for protein. Since lin-4 was
recognized as a miRNA in 2001, thousands of miRNAs have been identified from various
organisms including C. elegans, Drosophila melanogaster, fish, mouse, rat, plants and human
(Lee and Ambrose, 2001). Although investigations of plant miRNAs are less extensive, the
19
versatile functions of plant miRNAs are becoming clear. A majority of miRNA functions are
currently investigated by over-expression or lowered expression of miRNAs. Plant miRNAs are
mostly involved in transcriptional regulation, developmental timing, cell death, cell proliferation
and even nutrient metabolism.
Biogenesis of plant miRNAs
MiRNA biogenesis in plants requires multiple steps in order to form mature miRNAs from
miRNA genes (there are excellent reviews on this such as Bartel, 2004 and Jones-Rhoades,
2006). Briefly, what has been known so far in biogenesis of plant miRNAs can be described
below and illustrated in Figure 1.First, a miRNA gene is transcribed to a primary miRNA (pri-
miRNA), which is usually a long sequence of more than several hundred nucleotides. This step is
controlled by Polymerase II enzymes. Second, the pri-miRNA is cleaved to a stem loop
intermediate called miRNA precursor or pre-miRNA. This step is controlled by the Dicer-like 1
enzyme (DCL1). In animals, pre-miRNAs are then transported by exportin 5 from the nucleus
into the cytoplasm. Only the active miRNA strand of the duplex, but not the passenger strand
(miRNA*) is incorporated into the RNA induced silencing complex (RISC). In this step, however,
plant miRNAs differ from animals in that they cleaved into miRNA: miRNA* duplex possibly by
dicer-like enzyme 1 (DCL1) in the nucleus rather than in the cytoplasm, then the duplex is
translocated into the cytoplasm by HASTY, the plant orthologue of exportin 5. The
accumulation of mature miRNAs is thought to be affected by DCL1 in response to different
stresses and phytohormone like GA and ABA. In the cytoplasm, miRNAs are unwound into
single strand, mature miRNAs by helicase. Finally, the mature miRNAs enter a ribonucleoprotein
complex known as the RNA induced silencing complex (RISC) where they regulate targeted gene
expression. This suggests that miRNA biogenesis is required several enzymes for processing of
long pri-miRNA to ~20–24 bp mature miRNAs. The mature miRNA is typically 19–22 bplong.
20
a) b)
Figure 1. Schematic illustration of miRNA a) biogenesis and b) mechanism of action
General features of plant miRNAs
Since, the first report on plant miRNA was published in 2002, thousands of plant miRNAs
were reported by employing several methods (see below) and a list of experimentally validated
miRNAs is being updated at plant miRNA database, PMRD
(http://bioinformatics.cau.edu.cn/PMRD),miRNAdatabase (http://www.sanger.ac.uk/Software/
Rfam/mima/), miRNEST (http://lemur.amu.edu.pl/share/php/mirnest /home.php) and other
crop specific databases such as Arabidopsis (http://sundarlab.ucdavis.edu/mirna/).
Experimental and computational analysis has indicated that many plant miRNAs and their
targets are conserved between monocots and dicots (Sunkar et al., 2008). Axtell and Bartel
(2005) employed microarray technology to analyse the expression of several miRNAs in
different plant species, and found that some miRNAs existed not only in dicots and monocots
but also in ferns, lycipods, and mosses. A set of 21 abundant miRNAs were found to be
conserved among the angiosperms (Axtell and Bowman, 2008), many of which are involved in
transcription factor regulation in developmental control. In another report, based on the
analysis of ESTs for miRNA targets in many plant species Zhang et al. (2006) found that they are
omnipresent. Both miRNA conservation and miRNA target conservation also indicate that plant
miRNAs have a very deep origin in plant phylogeny, at least since the last common ancestor of
bryophytes and seed plants. This suggests that miRNA-mediated gene regulation existed more
than 425 million years ago in the plant kingdom.
Conserved miRNAs do not necessarily exhibit the same levels of expression, patterns of
stage of expression in different species. Therefore, sequence and expression divergence in
21
miRNAs between species may affect miRNA accumulation and target regulation leading to
unique developmental changes and phenotypic variation. The conservation of miRNAs and
other small RNAs between the conifers and the angiosperms indicates important RNA silencing
processes were highly developed in the earliest spermatophytes (Morin et al., 2008). On the
other hand some miRNAs are species specific (Ng et al., 2012). Non-conserved miRNAs may
play more explicit roles in the given plant species (Rhoades, 2012) such as the differentiation
and elongation of cotton fibres (Boopathi and Pathmanaban, 2012), nodulation regulated
miRNAs in soybean root (Subramanian et al., 2008).
Plant transcriptome regulation by miRNAs
Although much is known about their biogenesis and biological functions, the
mechanisms allowing miRNAs to silence gene expression in plant cells are still under debate.
The current models for miRNA-mediated gene silencing can be explained as below. In the RISC
complex, miRNAs bind to target mRNA and inhibit gene expression through its cleavage or
stalling translation or by some unknown mechanisms (see Bartel 2004, for review). This causes
gene silencing and termed RNA interference (RNAi) in animals, quelling in fungi and
posttranscriptional gene silencing in plants (Baulcombe, 2004). In plants, most target mRNAs
contain a single miRNA complementary site. Most corresponding miRNAs typically and perfectly
complement to these sites and cleave the target mRNAs (Bartel, 2009). Unlike animal miRNA
targets, the complementary sites in plants can exist anywhere along the target mRNA rather
than at the 3’-UTR. Another mechanism was also identified in plant miRNA regulation where it
regulated gene expression by repressing translation possibly, through inhibition of ribosome
movement (Bartel, 2009). This suggests that miRNAs mediated regulation of gene expression in
plants is more complex than in animals.
MiRNAs regulate gene expression at the posttranscriptional level and act as key
regulators in diverse regulatory pathways. MiRNAs binding to target mRNAs often leads to
blockade of translation or degradation of the target mRNAs. Thus, identification of target
mRNAs is essential for understanding the biological functions of miRNAs. MiRNAs from plants
induce direct cleavage and degradation by binding to the target sequences with perfect base
22
pairing. Targets of miRNAs are often difficult to predict, because few of them match to their
target mRNAs perfectly. Their miRNA: mRNA duplexes often contain several mismatches, gaps
and G:U base pairs in many positions. It is known that a so-called miRNA “seed region” (usually,
nucleotides 2-7 at the 5’-end of miRNA) is the most important determinant for target specificity
(Bartel, 2009). miRNA-mediated repression often depends on perfect or near-perfect base
pairing of seed regions to their targets. A conventional way to search for miRNA targets is by
using bioinformatics. The classical model for specific miRNA target recognition by most
algorithms was mainly depended on (a) the detection of seed matches and (b) thermodynamic
stability of miRNA: mRNA duplexes. Different algorithms always produce divergent results. To
circumvent this problem to some extent, stacking binding matrix uses both the information
about the miRNAs as well as experimentally validated target sequences in the search for
candidate target sequences (Moxon et al., 2008). Further, methods do vary depending on
whether a method is for animals, plants, or viruses, since the biology of miRNAs is somewhat
different in each case. In addition, much work has been done to develop molecular tools to
identify miRNA targets, such as HITS-CHIP and microarray technique (Huang et al., 2011). Those
tools have been proven to be useful in miRNA targets research, but they are not widely applied
because their processes are too complicated. Huang et al., (2011) demonstrated the hybrid-PCR
as an effective and rapid approach for screening putative miRNA targets, with much more
advantage of simplicity, low cost, and ease of implementation. Target mRNA candidates can be
obtained through hybrid-PCR from any kind of cells at different development stages or from
different tissues. Hybrid-PCR can be used as a quick screen tool in miRNA research, although
more experimental validations are needed in further study.
Tools for miRNA identification
Methods for identifying miRNAs are based on their following major characteristics: (i) all
miRNAs are small non-coding RNAs, usually consisting of ~20–24 nt in plants (ii) All miRNA
precursors have a well-predicted stem loop hairpin structure and this fold-back hairpin
structure has a low free energy as predicted by MFOLD or other computational programs and
23
(iii) Many miRNAs are evolutionarily conserved from ferns to higher plants (Kim and Nam,
2006). To avoid designating other small RNAs or fragments of other RNAs as miRNAs, Ambros et
al. (2003) developed combined criteria to identify new miRNAs. These combined criteria include
both biogenesis and expression measures, neither of which on its own is sufficient for
identifying a candidate gene for a new miRNA. So far, all newly predicted/identified miRNAs
have confirmed to these rules.
Both experimental methods and computational approaches have been adopted to
identify miRNAs in plants. Recently, as deep sequencing technologies have become more
available and affordable, more and more studies have been employing deep sequencing to
discover and functionally analyse miRNAs in plants. In general, there are four approaches for
identifying miRNAs: genetic screening, direct cloning after isolation of small RNAs,
computational strategy or in silico methods and expressed sequence tags (ESTs) analysis. Sun
(2011) summarized the major advantages and disadvantages of these methods. A range of
techniques are available for miRNA quantification, including northern blotting, dot blotting,
RNase protection assay, primer extension analysis, Invader assay and quantitative PCR.
It is worth noting that standards of evidence for annotating miRNAs have been applied
unevenly over the years. In particular, a substantial number of annotations were added to
miRbase using the older paradigm, and have not been removed, even in cases where
subsequent experiments provide strong evidence that they are in fact siRNAs rather than
miRNAs. Therefore, analyses using miRbase as a set of verified miRNAs should take into account
that not all annotated miRNAs are equally well supported by evidence (Rhoades, 2012). To
experimentally identify and validate identified miRNAs, virtuous and efficient protocols are
necessary to isolate them, which sometimes may be challenging due to the composition of
specific tissues of certain plant species.
24
Key Software for miRNA identification
miRanda:
miRanda is an algorithm for finding genomic targets for microRNAs. This algorithm has
been written in C and is available as an open-source method under the GPL. MiRanda was
developed at the Computational Biology Center of Memorial Sloan-Kettering Cancer Center.
miRBse:
The miRBase database is a searchable database of published miRNA sequences and
annotation. Each entry in the miRBase Sequence database represents a predicted hairpin
portion of a miRNA transcript (termed mir in the database), with information on the location
and sequence of the mature miRNA sequence (termed miR). Both hairpin and mature
sequences are available for searching and browsing, and entries can also be retrieved by name,
keyword, references and annotation. All sequence and annotation data are also available for
download.
PicTar:
PicTar is an algorithm for the identification of microRNA targets. This searchable website
provides details on 3' UTR alignments with predicted sites, links to various public database.
TargetScan:
Target Scan predicts biological targets of miRNAs by searching for the presence of
conserved 8mer and 7mer sites that match the seed region of each miRNA. As an option, non-
conserved sites are also predicted. Also identified are sites with mismatches in the seed region
that are compensated by conserved 3' pairing.
MicroCosm:
MicroCosm Targets (formerly miRBase Targets) is a web resource developed by the
Enright Lab at the EMBL-EBI containing computationally predicted targets for microRNAs across
many species. The miRNA sequences are obtained from the miRBase Sequence database and
most genomic sequence from EnsEMBL. It provides the most up-to-date and accurate
predictions of miRNA targets.
25
MapMi:
MapMi is a tool designed to locate miRNA precursor sequences in existing genomic
sequences (e.gEnsembl and EnsemblMetazoa), using potential mature miRNA sequences as
input. After searching the genome for the provided mature sequences, these hits are extended
and classified taking into account major structural properties of known miRNA precursors.
MapMi uses Bowtie and RNAfold third-party tools.
Sylamer
Sylamer is a system for finding significantly over or under-represented words in
sequences according to a sorted gene list. Typically it is used to find significant enrichment or
depletion of microRNA or siRNA seed sequences from microarray expression data. Sylamer is
extremely fast and can be applied to genome-wide datasets with ease.
Potentials, problems and perspectives
At the time of this writing, there are some gaps with respects to the phylogenetic
distribution of miRNAs and understanding on the way in which miRNAs have evolved. The
addition of novel sequencing data from key phylogenetic positions will likely accomplish many
demands. For example, we now have data on small miRNAs and miRNA targets in
representative species from green algae, non-seed plants and gymnosperms. But we do not
have much information about the extent of diversity or conservation of miRNA expression
within any of these lineages. Similarly, the analysis of more examples of closely related species
will provide a clearer picture of how miRNAs evolve over shorter time frames. Several
contrasting results have shown that the same miRNA have up- or down- regulated in different
species in response to given abiotic or biotic stress (examples are provided in Sunker et al.,
2012). These conflicting results highlight the need for more in-depth and detailed
characterizations of stress-responsive miRNAs in plants. It is likely that a more comprehensive
analysis, including the impact of such regulation on miRNA targets, would provide better
insights into miRNA-guided gene regulation. It is also imperative to analyse differential
regulation of miRNAs in different tissues since it is important for adaptation to stress and that
26
tissue-specific regulation may be missed in analyses of whole plants. MiRNA*, the
complementary strand of mature functional miRNA, is thought to degrade rapidly or
accumulate at only very low levels, suggesting that it may not be functional. However, several
recent studies have shown that plant miRNA* tends to accumulate at high levels under certain
conditions and this also warrants additional investigation. Besides degradation of transcripts,
several miRNAs have been shown to regulate targets via translational repression under normal
conditions. However, it is unclear whether plants under stress use both modes of target gene
regulation or whether one mode is preferred over the other. Identification of stress-responsive
miRNAs is largely dependent on sequence-based profiling, which is known to have some bias,
and thus requires independent validation. Small RNA blot analysis, although lacking sensitivity is
a gold standard for validation. Implementation of highly reliable and rigorous assays is essential
for firm characterization of stress-responsive miRNAs in plants. Further, examining the effect of
stress regulated miRNA on its mRNA target using degradome analysis can provide robust
confirmation of the stress responsiveness of miRNA. In addition to identifying miRNA targets,
by analysing degradome libraries from control and stressed samples it should be possible to
quantify the impact of a stress responsive miRNA on its mRNA target.
In summary, plant miRNA dramatically alter gene expression during cellular
developmental phases and under all types of stress. In order to assure plant growth and
development under extreme environmental conditions, the metabolic network has to be re-
programmed. To this end, a comprehensive understanding of plant cell development is
required to decipher the molecular mechanisms of plant stress responses. This can be achieved
by successful integration of miRNA data with metabolomics, transcriptomics, proteomics and
other ‘omics’ results.
Acknowledgement
This work is financially supported by Department of Biotechnology, Government of
India.
27
References
Ambros, V. B., Bartel, B. D., Bartel, P., Burge, C. B., Carrington, J. C., Chen, X., Dreyfuss, G., Eddy,
S. R., Griffiths-Jones, S., Marshall, M., Matzke, M., Ruvkun, G., and Tuschl, T., 2003 A uniform
system for microRNA annotation. RNA 9: 277-279.
Axtell, M. J. and Bartel, D. P., 2005 Antiquity of MicroRNAs and Their Targets in Land Plants.
Plant Cell 17: 1658-1673.
Axtell, M. J., and Bowman, J. L., 2008 Evolution of plant microRNAs and their targets. Trends
Plant Sci 13: 343-349.
Bartel, D.P., 2004 MicroRNAs: Genomics, Biogenesis, Mechanism, and Function. Cell 116: 281–
297.
Bartel, D.P., 2009 MicroRNAs: target recognition and regulatory functions. Cell 136: 215-233.
Baulcombe, D., 2004 RNA silencing in plants. Nature 431: 356-363.
Boopathi, N.M., and Pathmanaban, R., 2012 Additional insights into the adaptation of cotton
plants under abiotic stresses by in silico analysis of conserved miRNAs in cotton expressed
sequence tag database (dbEST). African Journal of Biotechnology Vol. 11(76): 14054-14063.
Huang, Y., Qi Y., Ruan, Q., Ma, Y., He, R., Ji, Y., Sun, Z., 2011 A rapid method to screen putative
mRNA targets of any known microRNA. Virology Journal 8: 8.
Kim, V.N., and Nam, J.W., 2006 Genomics of microRNA. Trends in genetics 22(3): 165-176.
Lee, R.C., Ambros, V., 2001 An extensive class of small RNAs in Caenorhabditis elegans. Science
294: 862–64.
Lee, R.C., Feinbaum, R.L., and Ambros, V., 1993 The C. elegans heterochronic gene lin-4
encodes small RNAs with antisense complementarity to lin-14. Cell 75: 843–854.
28
Jones-RhoadesM, W., David, P., Bartel, and Bonnie, Bartel, 2006 MicroRNAs and Their
Regulatory Roles in Plants. Annu. Rev. Plant Biol. 57: 19–53.
Morin, R. D., Aksay, G., Dolgosheina, E., Ebhardt, H.A., Magrini, V., Mardis, E. R., Sahinalp, S. C,
Unrau, P. J., 2008 Comparative analysis of the small RNA transcriptomes of Pinuscontorta and
Oryzasativa. Genome Res 18 (4): 571-84.
Moxon, S., Moulton, V., Kim, J. T., 2008 A scoring matrix approach to detecting miRNA target
sites. Algorithms Mol. Biol 3 (1): 3.
Ng, D.W., Lua, J., and Chen, Z.J., 2012 Big roles for small RNAs in polyploidy, hybrid vigor, and
hybrid incompatibility. Current Opinion in Plant Biology 15: 154–161.
Rhoades, M.W.J., 2012 Conservation and divergence in plant microRNAs. Plant MolBiol 80: 3–
16.
Subramanian, S., Fu, Y., Sunkar, R., Barbazuk, W.B., Zhu, J.K., Yu, O., 2008 Novel and
nodulation-regulated microRNAs in soybean roots. BMC Genomics 9: 160.
Sun, G., 2011 MicroRNAs and their diverse functions in plants. Plant Mol. Biol. DOI
10.1007/s11103-011-9817-6.
Sunkar, R., Jagadeeswaran, G., 2008 In silico identification of conserved microRNAs in large
number of diverse plant species. BMC Plant Biol 8(1): 37.
Wightman, B., Ha, I., Ruvkun, G., 1993 Posttranscriptional regulation of the heterochronic gene
lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75: 855–62.
Zhang, B., Pan, X., Cobb, G. P., Anderson, T. A., 2006 Plant microRNA: a small regulatory
molecule with big impact. Dev Biol 289(1):3-16.
29
Next Generation Sequencing Technology platform for development of
SNP markers and association mapping studies
Dr. N.Senthil, Professor Department of Plant Biotechnology, CPMB&B,
Tamil Nadu Agricultural University, Coimbatore. [email protected]
The decreasing cost along with rapid progress in next-generation sequencing and
related bioinformatics computing resources has facilitated large-scale discovery of SNPs in
various model and non-model plant species. Large numbers and genome-wide availability of
SNPs make them the marker of choice in partially or completely sequenced genomes.
Molecular markers are widely used in plant genetic research and breeding. Single Nucleotide
Polymorphisms (SNPs) are currently the marker of choice due to their large numbers in virtually
all populations of individuals. The applications of SNP markers have clearly been demonstrated
in human genomics where complete sequencing of the human genome led to the discovery of
several million SNPs and technologies to analyze large sets of SNPs (up to 1 million) have been
developed. SNPs have been applied in areas as diverse as human forensics and diagnostics,
aquaculture, marker assisted-breeding of dairy cattle, crop improvement, conservation, and
resource management in fisheries. Functional genomic studies have capitalized upon SNPs
located within regulatory genes, transcripts, and Expressed Sequence Tags (ESTs). Until recently
large scale SNP discovery in plants was limited to maize, Arabidopsis, and rice. Genetic
applications such as linkage mapping, population structure, association studies, map-based
cloning, marker-assisted plant breeding, and functional genomics continue to be enabled by
access to large collections of SNPs. Arabidopsis thaliana was the first plant genome sequenced
followed soon after by rice. In the year 2011 alone, the number of plant genomes sequenced
doubled as compared to the number sequenced in the previous decade, resulting in currently,
31 and counting, publicly released sequenced plant genomes (http://www.phytozome.net/).
With the ever increasing throughput of next-generation sequencing (NGS), de novo and
reference-based SNP discovery and application are now feasible for numerous plant species.
30
The major sequencing technologies are focused here,
Roche (454) Sequencing
Pyrosequencing was the first of the new highly parallel sequencing technologies to
reach the market. It is commonly referred to as 454 sequencing after the name of the company
that first commercialized it. It is an SBS method where single fragments of DNA are hybridized
to a capture bead array and the beads are emulsified with regents necessary to PCR amplifying
the individually bound template. Each bead in the emulsion acts as an independent PCR where
millions of copies of the original template are produced and bound to the capture beads which
then serve as the templates for the subsequent sequencing reaction. The individual beads are
deposited into a picotiter plate along with DNA polymerase, primers, and the enzymes
necessary to create fluorescence through the consumption of inorganic phosphate produced
during sequencing. The instrument washes the picotiter plate with each of the DNA bases in
turn. As template-specific incorporation of a base by DNA polymerase occurs, a pyrophosphate
(PPi) is produced. This pyrophosphate is detected by an enzymatic luminometric inorganic
pyrophosphate detection assay (ELIDA) through the generation of a light signal following the
conversion of PPi into ATP. Thus, the wells in which the current nucleotides are being
incorporated by the sequencing reaction occurring on the bead emit a light signal proportional
to the number of nucleotides incorporated, whereas wells in which the nucleotides are not
being incorporated do not. The instrument repeats the sequential nucleotide wash cycle
hundreds of times to lengthen the sequences. The 454 GS FLX Titanium XL+ platform currently
generates up to 700MB of raw 750bp reads in a 23 hour run. The technology has difficulty
quantifying homopolymers resulting in insertions/deletions and has an overall error rate of
approximately 1%. Reagent costs are approximately $6,200 per run.
Illumina Sequencing
Illumina technology, acquired by Illumina from Solexa, followed the release of 454
sequencing. With this sequencing approach, fragments of DNA are hybridized to a solid
substrate called a flow cell. In a process called bridge amplification, the bound DNA template
31
fragments are amplified in an isothermal reaction where copies of the template are created in
close proximity to the original. This results in clusters of DNA fragments on the flow cell
creating a “lawn” of bound single strand DNA molecules. The molecules are sequenced by
flooding the flow cell with a new class of cleavable fluorescent nucleotides and the reagents
necessary for DNA polymerization. A complementary strand of each template is synthesized
one base at a time using fluorescently labeled nucleotides. The fluorescent molecule is excited
by a laser and emits light, the colour of which is different for each of the four bases. The
fluorescent label is then cleaved off and a new round of polymerization occurs. Unlike 454
sequencing, all four bases are present for the polymerization step and only a single molecule is
incorporated per cycle. The flagship HiSeq2500 sequencing instrument from Illumina can
generate up to 600GB per run with a read length of 100nt and 0.1% error rate. The Illumina
technique can generate sequence from opposite ends of a DNA fragment, so called paired-end
(PE) reads. Reagent costs are approximately $23,500 per run.
Applied Biosystems (SOLiD) Sequencing
The SOLiD system was jointly developed by the Harvard Medical School and the Howard
Hughes Medical Institute. The library preparation in SOLiD is very similar to Roche/454 in which
clonal bead populations are prepared in microreactors containing DNA template, beads,
primers, and PCR components. Beads that contain PCR products amplified by emulsion PCR are
enriched by a proprietary process. The DNA templates on the beads are modified at their 3′ end
to allow attachment to glass slides. A primer is annealed to an adapter on the DNA template
and a mixture of fluorescently tagged oligonucleotides is pumped into the flow cell. When the
oligonucleotide matches the template sequence, it is ligated onto the primer and the
unincorporated nucleotides are washed away. A charged couple device (CCD) camera captures
the different colours attached to the primer. Each fluorescence wavelength corresponds to a
particular dinucleotide combination. After image capture, the fluorescent tag is removed and
new set of oligonucleotides are injected into the flow cell to begin the next round of DNA
ligation. This sequencing-by-ligation method in SOLiD-5500x1 platform generates up to 1,410
32
million PE reads of nucleotide each with an error rate of 0.01% and reagent cost of
approximately $10,500 per run.
Although widely accepted and used, the NGS platforms suffer from amplification biases
introduced by PCR and dephasing due to varying extension of templates. The TGS technologies
use single molecule sequencing which eliminates the need for prior amplification of DNA thus
overcoming the limitations imposed by NGS. The advantages offered by TGS technology are (i)
lower cost, (ii) high throughput, (iii) faster turnaround, and (iv) longer reads. The TGS can
broadly be classified into three different categories: (i) SBS where individual nucleotides are
observed as they incorporate (Pacific Biosciences single molecule real time (SMART), Heliscope
true single molecule sequencing (tSMS), and Life Technologies/Starlight and Ion Torrent), (ii)
nanopore sequencing where single nucleotides are detected as they pass through a nanopore
(Oxford/Nanopore), and (iii) direct imaging of individual molecules (IBM).
Helicos Biosciences Corporation (Heliscope) Sequencing
Heliscope sequencing involves DNA library preparation and DNA shearing followed by
addition of a poly-A tail to the sheared DNA fragments. This poly-A tailed DNA fragments are
attached to flow cells through poly-T anchors. The sequencing proceeds by DNA extension with
one out of 4 fluorescent tagged nucleotides incorporated followed by detection by the
Heliscope sequencer. The fluorescent tag on the incorporated nucleotide is then chemically
cleaved to allow subsequent elongation of DNA. Heliscope sequencers can generate up to 28GB
of sequence data per run (50 channels) with maximum read length of 55bp at 99% accuracy.
The cost per run per channel is approximately $360.
Pacific Biosciences SMART Sequencing
The Pacific Biosciences sequencer uses glass anchored DNA polymerases which are
housed at the bottom of a zero-mode waveguide (ZMW). DNA fragments are added into the
ZMW chamber with the anchored DNA polymerase and nucleotides, each labeled with a
different colour fluorophore, and are diffused from above the ZMW. As the nucleotides
33
circulate through the ZMW, only the incorporated nucleotides remain at the bottom of the
ZMW while unincorporated nucleotides diffuse back above the ZMW. A laser placed below the
ZMW excites only the fluorophores of the incorporated nucleotides as the ZMW entraps the
light and does not allow it to reach the unincorporated nucleotides above. The Pacific
Biosciences sequencers can generate up to 140MB of sequences per run (per smart cell) with
reads of 2.5Kbp at 85% accuracy. The cost per run per smart cell is approximately $600.
Among the TGS technologies, Pacific Biosciences SMART and Heliscope tSMS have been
used in characterizing bacterial genomes and in human-disease-related studies; however, TGS
has yet to be capitalized upon in plant genomes. The Heliscope generates short reads (55bp)
which may cause ambiguous read mapping due to the presence of paralogous sequences and
repetitive elements in plant genomes. The Pacific Biosciences reads have high error rates which
limit their direct use in SNP discovery. However, their long reads offer a definite advantage to
fill gaps in genomic sequences and, at least in bacterial genomes, NGS reads have proven
capable of “correcting” the base call errors of this TGS technology. Hybrid assemblies
incorporating short (Illumina, SOLiD), medium (454/Roche), and long reads (Pac-Bio) have the
potential to yield better quality reference genomes and, as such, would provide an improved
tool for SNP discovery.
The choice of a sequencing strategy must take into account the research goals, ability to
store and analyze data, the ongoing changes in performance parameters, and the cost of
NGS/TGS platforms. Some key considerations include cost per raw base, cost per consensus
base, raw and consensus accuracy of bases, read length, cost per read, and availability of PE or
single end reads. The pre- and postprocessing protocols such as library construction and
pipeline development and implementation for data analysis are also important.
RNA and ChIP Sequencing
Genome-wide analyses of RNA sequences and their qualitative and quantitative
measurements provide insights into the complex nature of regulatory networks. RNA
34
sequencing has been performed on a number of plant species including Arabidopsis, soybean,
rice, and maize for transcript profiling and detection of splice variants. RNA sequencing has
been used in de novo assemblies followed by SNP discovery performed in nonmodel plants
such as Eucalyptus grandis Brassica napus, and Medicago sativa.
RNA deep-sequencing technologies such as digital gene expression and Illumina RNASeq
are both qualitative and quantitative in nature and permit the identification of rare transcripts
and splice variants. RNA sequencing may be performed following its conversion into cDNA that
can then be sequenced as such. This method is, however, prone to error due to (i) the
inefficient nature of reverse transcriptases (RTs), (ii) DNA-dependent DNA polymerase activity
of RT causing spurious second strand DNA , and (iii) artifactual cDNA synthesis due to template
switching. Direct RNA sequencing (DRS) developed by Helicos Biosciences Corporation is a high
throughput and cost-effective method which eliminates the need for cDNA synthesis and
ligation/amplification leading to improved accuracy.
Chromatin immunoprecipitation (ChIP) is a specialized sequencing method that was
specifically designed to identify DNA sequences involved in in vivo protein DNA interaction.
ChIP-sequencing (ChIP-Seq) is used to map the binding sites of transcription factors and other
DNA binding sites for proteins such as histones. As such, ChIP-Seq does not aid SNP discovery,
but the availability of SNP data along with ChIP-Seq allows the study of allele-specific states of
chromatin organization. Deep sequence coverage leading to dense SNP maps permits the
identification of transcription factor binding sites and histone-mediated epigenetic
modifications. ChIP-Seq can be performed on serial analysis of gene expression (SAGE) tags or
PE using Sanger, 454, and Illumina platforms.
Applications of SNP markers in Association Mapping
SNPs are increasingly becoming the marker of choice for a wide range of applications
including genetic mapping, MAS, diversity analysis and association mapping. Fast and efficient
35
generation of these SNPs has been facilitated by high throughput genotyping methods, as
mentioned above thus making SNPs a marker of choice for association genetics studies.
As the availability of SNPs increases, they are displacing other forms of molecular
markers for association mapping studies. As costs associated with SNP discovery and detection
continue to fall, SNPs will increasingly be associated with agronomic traits and will be applied
for crop improvement through Association Mapping.
General Scheme of Association Mapping for Tagging a Gene of Interest using
Germplasm Accessions.
36
Types of Association Mapping
Genome-wide Association Mapping (GWAS)
It is a comprehensive approach to systematically search the genome for causal genetic
variation. A larger number of markers are tested for association with various complex traits,
and prior information regarding candidate gene is not required. It works best for a research
consortium with complementary expertise and adequate funding.
Candidate- gene association mapping
Candidate genes are selected based on prior knowledge from mutational analysis,
biochemical pathway, or linkage analysis of the trait of interest. An independent set of random
markers needs to be scored to infer genetic relationships. It is a low cost, hypothesis driven,
and trait specific approach but will miss other unknown loci. (Zhu et al., 2010).
Genome Scans and Candidate Genes
Association studies with high density SNP coverage, large sample size, and minimum
population structure offer great promise in complex trait dissection. To date, candidate-gene
association studies have searched only a tiny fraction of the genome. As genomic technologies
continue to evolve, we would certainly expect to see more genome-wide association analyses
conducted in different plant species. So far, there have been few successful results from
candidate-gene association mapping. But for many research groups, starting with candidate-
gene sequences and background markers will provide firm understanding of population
structure, familial relatedness, nucleotide diversity, LD decay, and many other aspects of
association mapping. Afterward, this knowledge can be built on through comprehensive
genome scans with intensive sequencing and high-density genotyping.
Another reason for the promising but still limited success found in the candidate-gene
approach is the way candidate genes were selected. Obviously, many candidate genes were
discovered though comparisons of severe mutants and the wild-type lines. We do not have a
37
strong understanding of naturally occurring effects of alleles at such loci. Even if the loss-of-
function allele results in a significant phenotypic change, we can only expect that mild
mutations would have a somewhat modest effect on the phenotype; those changes, in turn,
could be detected with the assembled association mapping population. Moreover, both the
frequency and effect of the allele effect whether variation explained by a locus is detectable. A
skewed allele frequency would make it difficult to detect an association even though the
candidate gene polymorphism is truly underlying the phenotypic variation.
Nested Association Mapping
Ultimately, it is desirable to have both candidate-gene and genome-wide approaches to
exploit in a species along with traditional linkage mapping. Joint linkage and linkage
disequilibrium mapping have been proposed as “Fine Mapping’’ approach in theory (Mott and
Flint, 2002; Wu et al., 2002) and demonstrated in practice (Blott et al., 2003; Meuwissen et al.,
2002). Nested association mapping (NAM), is currently implemented in maize, could be an even
more powerful strategy for dissecting the genetic basis of quantitative traits in species with low
LD (Yu et al., 2008). For other crop species, different genetic designs (e.g., diallel, design II,
eight-way cross, single round robin, or double round robin) could be used to accommodate the
level of LD, practicality of creating the population and phenotyping a large number of RILs, and
resources available (Churchill et al., 2004; Stich et al., 2007).
In essence, by integrating genetic design, natural diversity, and genomics technologies,
the NAM strategy allows high power, cost-effective genome scans, and facilitates community
endeavours to link molecular variation with complex trait variation.
38
Functional genomics
Dr. M. Raveendran, Associate Professor, Departmemt of Plant Biotechnolgy, CPMB&B,
Tamil Nadu Agricultural University, Coimbatore. [email protected]
Functional genomics is a field of molecular biology that attempts to describe gene (and
protein) functions and interactions. Functional genomics focuses on the dynamic aspects such
as gene transcription, translation, and protein–protein interactions. A key characteristic of
functional genomics studies is their genome-wide approach to these questions, generally
involving high-throughput methods rather than a more traditional “gene-by-gene” approach.
This approach is based on data mining, large scale experimental methodologies combined with
statistical and computational analyses of the results.
The goal of functional genomics is to understand the relationship between an
organism's genome and its phenotype. Functional genomics involves studies of natural
variation in genes, RNA, and proteins over time (such as an organism's development) or space
(such as its body regions), as well as studies of natural or experimental functional disruptions
affecting genes, chromosomes, RNA, or proteins.
Techniques and applications
Functional genomics includes function-related aspects of the genome gene expression
level (molecular activities), mutation and polymorphism (such as SNP) analysis. Expression
analysis (profiling) comprise a number of "-omics" such as transcriptomics (gene expression),
proteomics (protein expression), and metabolomics. Functional genomics uses mostly multiplex
techniques to measure the abundance of many or all gene products such as mRNAs or proteins
within a biological sample.
39
Microarrays
Microarrays measure the amount of mRNA in a sample that corresponds to a given gene
or probe DNA sequence. Probe sequences are immobilized on a solid surface and allowed to
hybridize with fluorescently labeled “target” mRNA. The intensity of fluorescence of a spot is
proportional to the amount of target sequence that has hybridized to that spot, and therefore
to the abundance of that mRNA sequence in the sample. Microarrays allow for identification of
candidate genes involved in a given process based on variation between transcript levels for
different conditions and shared expression patterns with genes of known function.
SAGE
SAGE (Serial analysis of gene expression) is an alternate method of gene expression
analysis based on RNA sequencing rather than hybridization. SAGE relies on the sequencing of
10–17 base pair tags which are unique to each gene. These tags are produced from poly-A
mRNA and ligated end-to-end before sequencing. SAGE gives an unbiased measurement of the
number of transcripts per cell.
At the protein level: protein–protein interactions
Yeast two-hybrid system
A yeast two-hybrid (Y2H) screen tests a "bait" protein against many potential interacting
proteins ("prey") to identify physical protein–protein interactions. This system is based on a
transcription factor, originally GAL4, which possess separate DNA-binding and transcription
activation domains required in order for the protein to cause transcription of a reporter gene.
In a Y2H screen, the "bait" protein is fused to the binding domain of GAL4, and a library of
potential "prey" (interacting) proteins is recombinantly expressed in a vector with the
activation domain. In vivo interaction of bait and prey proteins in yeast cell brings the activation
and binding domains of GAL4 close enough together to result in expression of a reporter gene.
It is also possible to systematically test a library of bait proteins against a library of prey
proteins to identify all possible interactions in a cell.
40
AP/MS
Affinity purification and mass spectrometry (AP/MS) is able to identify proteins that
interact with one another in complexes. Complexes of proteins are allowed to form around a
particular “bait” protein. The bait protein is identified using an antibody or a recombinant tag
which allows it to be extracted along with any proteins that have formed a complex with it. The
proteins are then digested into short peptide fragments and mass spectrometry is used to
identify the proteins based on the mass-to-charge ratios of those fragments.
Loss-of-function techniques Mutagenesis
Gene function can be investigated by systematically “knocking out” genes one by one.
This is done by either deletion or disruption of function (such as by insertional mutagenesis)
and the resulting organisms are screened for phenotypes that provide clues to the function of
the disrupted gene.
RNAi
RNA interference (RNAi) methods can be used to transiently silence or knock down gene
expression using ~20 base-pair double-stranded RNA typically delivered by transfection of
synthetic ~20-mer short-interfering RNA molecules (siRNAs) or by virally encoded short-hairpin
RNAs (shRNAs). RNAi screens, typically performed in cell culture-based assays or experimental
organisms (such as C. elegans) can be used to systematically disrupt nearly every gene in a
genome or subsets of genes (sub-genomes); possible functions of disrupted genes can be
assigned based on observed phenotypes.
Functional genomics and bioinformatics
Because of the large quantity of data produced by these techniques and the desire to
find biologically meaningful patterns, bioinformatics is crucial to analysis of functional genomics
data. Examples of techniques in this class are data clustering or principal component analysis
for unsupervised machine learning (class detection) as well as artificial neural networks or
support vector machines for supervised machine learning.
41
Genome Mapping: Establishing route maps of Genomes
Dr. M. Maheswaran, Professor Centre for Plant Breeding and Genetics
Tamil Nadu Agricultural University, Coimbatore. [email protected]
Introduction
Genome of each organism is the blue print that determines the ultimate life. In general
genomes are grouped into two categories viz. prokaryotic and eukaryotic. The eukaryotic
genome is safeguarded by the nuclear membrane from exposure to cytoplasm. The constitution
of nuclear genome of different organisms has different sequences, variable amount of DNA and
the number of chromosomes. However, certain features are unique to eukaryotic genomes
compared with prokaryotic genomes (see Table 1).
Table 1: Major differences between prokaryotic and eukaryotic genomes
Characteristic Prokaryotes Eukaryotes
Genome size 600kb to 9.5 Mb 3 Mb to 140000 Mb
Average gene size/number 950bp/4300 2500bp/19000
Operon-like regulatory unit General No
Horizontal gene transfer Significant Negligible
Rate of non-coding sequences Low High
Intron Rare General
Redundant gene number Rare General
Ploidy level Haploidy Haploidy to Polyploidy
Chromosome number One More than one
Heterozygosity No Yes
What is a genome?
The conventional definition of genome is number of chromosomes representing
haploidy. But the recent definition on a genome states the following: The term genome is used
to describe the totality of chromosomes (in molecular term DNA) unique to a particular
42
organism or any cell within the organism, as distinct from genotype, which is the information
contained within those chromosomes.
The first genomes to be studied in depth at the molecular level were those of the
bacterium Eschercia coli and its bacteriophage. The second group of genomes to be studied in
great detail at the molecular level was those of mitochondria and chloroplasts. Finally attention
has turned to a detailed study of the nuclear genomes of selected eukaryotes: yeast
(Saccharomyces cerevisiae), a nematode (Caenorhabditis elegans), a mammal (Homo sapiens)
and plants (Arabidopsis thaliana) and rice (Oryza sativa).
Figure 1. Genomes A) Arabidopsis thaliana (2n = 10),
B) Oryza sativa (2n = 24) and C) Homo sapiens (2n = 46)
Nuclear DNA content and genomes
The total amount of DNA in the (haploid) genome is a characteristic of each living
species known as its C value. The fact that the DNA content of an organism is much greater
than that required to code for and regulate the production of all necessary proteins has been
termed the C value paradox. In eukaryotes, the C value is defined as the amount of DNA per
genome (1 C = haploid nucleus, 2C = dipliod nucleus and 4C = nucleus which is just about to
43
divide by mitosis). There is enormous variation in the range of C values, from as little as a mere
106 bp for a mycoplasma to as much as 1011 bp for some plants and animals. Table 2
summarizes the range of C values found in different plant species.
Table 2. Variation in nuclear DNA content of various plant species
Species pg/2C Mbp/1C
Arabidopsis thaliana 0.30 145
Arachis hypogaea 5.83 2813
Brassica juncea 2.29 1105
Cicer arietinum 1.53 738
Glycine max 2.31 1115
Gossypium hirsutum 4.39 2118
Helainthus annuus 5.95 2871
Hordeum vulgare 10.10 4873
Lycopersicon esculentum 1.88 907
Oryza sativa 0.87 419
Phaselous vulgaris 1.32 637
Pisum sativum 8.18 3947
Saccharum officinarum 5.28 2547
Triticum aestivum 33.09 15966
Vigna mungo 1.19 574
Vigna radiata 1.20 579
Zea mays 4.75 2292
1 picogram (pg) = 965 Million base pairs (Mbp)
Genome or Chromosome Mapping
Genome mapping, otherwise also called as chromosome mapping, is establishing the
road map of a genome. Mapping of chromosomes helps to locating important genes.
Chromosome mapping helps to identify the molecular environment of both coding and non
coding sequences. The map of a chromosome can be of two types: genetic map and physical
map.
44
Figure 2. Diagrammatic representation of genetic map and portion of physical map of a chromosome
Genetic map: Genetic map identifies the linear arrangement of genes on a chromosome and
is assembled from meiotic recombination data. These are theoretical maps which give the order
of genes on a chromosome. They cannot pinpoint the physical whereabouts of genes or
determine how far apart they are. Distances on this map are not directly equivalent to physical
distances. The unit of measurement is the centi-Morgon (cM).
Physical map: Physical maps identify the actual physical position of genes on a chromosome.
Distances are measured in base pairs (bp), kilobases (kb = 1000 bases) or megabases (mb = 10,
00, 000 bases).
Construction of a Genetic Map
The construction of a genetic map involves the linkage analysis of genes segregating
among the progenies of a sexual cross. According to the Mendel’s Law of Independent
Assortment, members of different pairs of alleles assort independently of each other when the
germ cell is formed. This is only true if the genes are on different chromosomes. If they are on
the same chromosome, they are said to be linked. The frequency of recombination of a pair of
linked gene is constant and characteristic for that pair of genes. The strength of linkage and
recombination observed is a function of the distance on the chromosome between the genes in
question. The greater the distance recombination is expected between genes. In other words,
45
the frequency of recombinants (not similar to both the parents) determines the distance
between genes. When multiple genes are considered for their recombination frequency, one
can group all the genes, which are linked, and the resultant group is called as linkage group. The
number of linkage groups equals the haploid number of chromosomes. The early work on
linkage map construction was mainly based on the segregation of easily observable
morphological traits among the progenies of a cross. There are several examples of linkage
maps resulted from the morphological traits, otherwise called as morphological markers.
Morphological markers are easily observable traits with discrete phenotype. Though these
markers are highly advantageous, the number is the limitation. Further, these markers are
influenced by the environment and the genetic background of the individual.
Genes were the markers for earlier genetic maps
The first genetic maps, constructed in the early decades of the 20th century for
organisms such as the fruit fly, used genes as markers. This was many years before it was
understood that genes are segments of DNA molecules. Instead, genes were looked on as
abstract entities responsible for the transmission of heritable characteristics from parent to
offspring. To be useful in genetic analysis, a heritable characteristic had to exist in tow
alternative forms or phenotypes, an example being tall and dwarf statures in the pea plants
originally studied by Mendel. By 1922 over 50 genes had been mapped on the four fruit fly
chromosomes. Figure 3 shows the genome and genetic map based on various genes of
Drosophila melanogaster
46
Figure 3. Diagrammatic representation of genetic map and portion of physical map of a
chromosome
Concept behind the genetic map construction
Genes located on the same chromosome tend to remain together and all the genes do
not assort independently as Mendel proposed. These genes on the same chromosome that do
not show independent assortment are said to be linked. The strength of linkage between a pair
of genes can be determined based on the recombination between genes by a process called
crossing over between two homologous chromosomes. This results in the production of
gametes similar to parental types and some not similar to parental types in a heterozygous
47
organism. The non-parental type of gametes is called recombinant types. The frequency of
recombinants is a measure of whether genes are linked or not, if so, how close together they
are on the chromosome and this process is called as linkage analysis.
Figure 4. Diagrammatic representation of crossing over and recombination between two
homologous chromosomes.
Based on the linkage analysis one can determine the distance between two genes on a
chromosome and establish the order of genes with other genes. Two-point analysis is a method
by which the genetic distance between two linked genes is determined. The order of genes is
established based on three-point analysis. When many genes are involved it is called multipoint
analysis.
DNA markers hastened the process of genetic map construction
In the past, genetic maps of different plant and animal genomes were constructed
mostly based on morphological markers (phenotypes showing Mendelian segregation).
However, the availability of fewer markers slowed down the approach of linkage map
construction in several species. The advent of DNA markers - polymorphic and locus
characterized by a number of variable lengths paved the way for easy and faster linkage map
48
construction. These DNA markers possess several positive features over the morphological
markers. See Table 3 for the differences between the morphological and molecular markers.
Table 3. Differences between morphological and molecular markers
Morphological markers Molecular markers
Less in number
Interact with the environment
Show differential expression across
genotypes
Not distributed throughout the
genome
Expression depends on the ontogeny of
the individual
Dominant-recessive expression
More in number
No environmental influence
Phenotypically neutral
Distributed throughout the genome
Possibility of using these markers at
any growth stage of the individual
Codominant or dominant
Codominant or dominant
Construction of genetic maps using molecular markers generally involves the following
steps viz., 1) Identifying suitable DNA marker system for the map construction, 2) Parental
survey to identify polymorphic DNA markers, 3) Establishing a suitable mapping population, 4)
Segregation analysis of polymorphic markers on the individuals of the mapping population, 5)
Linkage analysis to establish linkage map of each chromosome based on marker-segregation
data and 6) Assigning chromosome numbers to linkage groups.
Identifying suitable DNA marker system for the map construction
DNA markers behave in the same way the genes behave as for as their inheritance thus
giving the opportunity to locate them on specific chromosomes. These DNA markers can be
grouped into two groups viz. single locus marker system and multi-loci marker system. Markers
belonging to the first category are more useful in genetic map construction [e.g restriction
fragment length polymorphism (RFLP) and simple sequence repeats (SSR)]. However, multi-loci
49
markers such randomly amplified polymorphic DNA (RAPD) and amplified fragment length
polymorphism (AFLP) are also used in genetic map construction of various organisms.
Figure 5. (A) Single locus markers (top RFLP; bottom SSR), (B) multi-loci marker system (top
RAPD; bottom AFLP)
As with gene markers, to be useful a DNA marker must exist in at least two allelic forms.
Markers having differences in their allelic forms are said to be polymorphic and others not
having the differences are considered monomorphic. For some of the DNA markers, the
alternative allele may not exist. In other words, any one of the parents will have the marker.
This kind of markers will not be detected in the hybrid of the parental combination in which the
marker is polymorphic. These markers are called as dominant markers. The other category of
markers are said to be codominant since both the alleles of a parental combination can be
detected in the hybrid. The codominant markers thus differentiate the heterozygotes from the
homozygotes thus they are more informative than the dominant markers.
50
Figure 6. A) Polymorphic marker, B) Monomorphic marker,C) Polymorphic co-dominant, D)
Polymorphic dominant
Parental survey to identify polymorphic DNA markers
To construct a genetic map using DNA markers, the foremost requirement is a set of
polymorphic markers covering the entire genome of an organism. This can be achieved by
selecting suitable parents which are not only divergent in their phenotypes but also in their
genotypes. A survey of different individuals with selected DNA marker system will help to
establish a suitable parental combination for developing the mapping population.
Establishing a suitable mapping population
The group of progenies derived from two parents employed to assess the segregation of
DNA markers for the purpose of linkage map construction is called as mapping population. It
would be always advantageous using populations of early generations such as F2, F3, BC
population etc, since these populations can be established in shorter time scale. Apart from
these populations, populations of recombinant inbred lines (RIL) and doubled haploid lines
(DHL) are also used in many of the mapping projects. The maps constructed using the RIL and
DHL populations remain the best choice considering the utility of genetic maps in various
genetic studies.
C
P F P
B) D A
A
B
A
D A
A A
B
51
Figure 7. Various routes to develop mapping populations for the purpose of genetic map
construction
Segregation analysis of polymorphic markers across the mapping population
When a set of polymorphic markers are identified for a particular parental combination
then all those markers are to be surveyed on the segregating population of those parents. The
segregating markers are given scores for their alleles. If the allele in an offspring is similar to P1
the score to be given is ‘1’. Score ‘3’ will be given when the allele is similar to P2. If both alleles
are present then the score to be given is ‘2’.
Figure 8. Segregation of an RFLP marker on a set of recombinant inbred lines.
P1 P2 X
F1 P1 X
F1 P2 X
B1
B2
F2 F4 F3 F8
Backcross Populations
RIL
DHL
52
Linkage analysis to establish linkage map of each chromosome
When the segregation pattern for a set of polymorphic markers is generated, the data
set is subjected to two-point analysis followed by multi-point analysis using standard software
packages. There are several software packages available that made the process of linkage map
construction simple. An alphabetic list of genetic analysis software is available at
http://www.nslij-genetics.org/soft/ ;http://linkage.rockefeller.edu/soft/. The routinely used
software for linkage map construction is from Lander et al. (1987). The online tutorial is
available at http://www-genome.wi.mit.edu/genome_software.
Assigning chromosome numbers to linkage groups
During early phase of linkage map construction, chromosomal assignment to linkage
groups are done based on the establishment of chromosome specific markers using cytogenetic
stocks such trisomics and monosomics. Most of the RFLP markers are assigned with
chromosome numbers based on the dosage analysis using trisomics. With the advent of multi-
loci markers for genetic map construction, the chromosomal assignment is done by involving
markers in the linkage analysis for which the chromosomal positions are known.
Figure 8. A molecular linkage map of rice chromosome # 6.
53
Conclusion
Demonstration of the role of DNA markers in the construction of genetic maps has been
done in major crop species such as tomato (Bernatzky and Tanksley, l986) maize (Helentzaris et
al., l986), lettuce (Landry et al., l987), potato (Bonierbale et al., l988) and rice (McCouch et al.,
l988). The recent developments in molecular biology simplified many aspects in the genome
mapping based on the position of certain land marks (often referred to as genetic markers) in
the genome. Genetic maps are the basic route maps locate both major and minor genes. But
the genetic maps are not just for locating the important genes, it is also about the nature,
organization and interaction of sequences on a chromosome. They also provide the key to
understanding the nature of genes and their action across genomes. The molecular marker
based genetic maps give the clue to clone the genes of agronomic importance. The increased
use of genome maps in crop species is realized in various crop species which resulted in the
better understanding on the molecular environment of the genomes.
References
Bernatzky, R., and Tanksley, S. D., 1986 Toward a saturated linkage map in tommato based on
isozymes and random cDNA sequences. Genetics 112: 887-898.
Bonierbalae, M. B., Plaisted, R. L., and Tanksley, S. D., l988 RFLP maps based on common set of
clones reveal modes of chromosomal evolution in potato and tomato. Genetics 120: 1095-
1103.
Helentzaris, T., Slocum, M., Wright, S., Shaefer, A., and Nienhuis, J., 1986 Construction of
genetic linkage maps in maize and toamto using restriction fragment length polymorphism.
Theor. Appl. Genet., 72: 761-769.
Lander, E. S., Green, P., Abrahamson, J., Barlow, A., Daly, M., Lincoln, S. E., and Newburg, L.,
1987 Mapmaker: an interactive computer package for constructing primary genetic linkage
maps of experimental and natural populations. Genomics 1: 174-181.
54
Landry, B.S., Kesseli, R.V., Farrara, B., and Michelmore, R. W., l987 A genetic map of lettuce
(Lactuca sativa L.) with restriction fragment length polymorphismm, isozyme, disease
resistance, and morphological markers. Genetics 116: 331-337.
Mccouch, S. R., Kochert, G., Yu, Z. H., Wang, Z.Y., Khush, G. S., Coffman, W. R., and Tanksley, S.
D., 1988 Molecular mapping rice chromosomes. Theor. Appl. Genet., 76: 815-829.
55
Nutrigenomics in Agriculture
Dr. P. Nagarajan* and Devkar Vikas Suresh *Professor and Head
Department of Plant Molecular Biology and Bioinformatics, CPMB&B, Tamil Nadu Agricultural University, Coimbatore.
In modern molecular biology and genetics, the genome is the entirety of an organism's
hereditary information. It is encoded either in DNA or for many types of virus, in RNA. The
genome includes both the genes and the non-coding sequences of the DNA. The term was
adapted in 1920 by Hans Winkler, Professor of Botany at the University of Hamburg, Germany.
It is a portmanteau word i.e., blending the sounds and the meaning of the two other words
gene and chromosome, (Gene + ome (for mass), large number of genes gathered. The term
genome can be applied specifically to mean that stored on a complete set of nuclear DNA (i.e.,
the "nuclear genome"). Although plants possess mitochondrial and chloroplast genomes, their
nuclear genome is the largest and most complex. Until very recently, the molecular analysis of
plants often focused on the single gene level. Recent technological advances have changed this
paradigm, enabling the analysis of organisms in terms of genome organization, expression and
interaction. The study of the way genes and genetic information are organized within the
genome, the methods of collecting and analyzing this information and how this organization
determines their biological functionality is referred to as genomics. Genomics is reversing the
previous paradigm of identifying genes behind biological functions and instead focuses on
finding biological functions behind genes and also reduces the gap between phenotype and
genotype.
Effects of the Human Genome Project are surfacing in anticipated and unanticipated
areas. One of the key discoveries from the project is the existence of individual differences in
gene sequences that result in differential response to environmental factors, such as diet.
Those genetic differences, single nucleotide polymorphisms (SNPs, pronounced snips), are the
key genetic enabler of the emerging scientific discipline called nutrigenomics or nutritional
56
genomics. In the broad arena of health and wellness, one of the major impacts on the food
industry, throughout the value chain from agricultural crops through consumer products, is this
emerging understanding of human nutrition at the molecular level of gene expression. The
science of nutrigenomics is the study of how naturally occurring chemicals in foods alter
molecular expression of genetic information in each individual. In addition to bioinformatics,
three scientific methodological approaches underpin nutrigenomics. Those methodological
approaches are based on nutrition, molecular biology, and genomics. Integration of these
disciplines are leading to identification and understanding of individual and population
differences and similarities in gene expression, or phenotype, in response to diet. When a gene
is activated or expressed, a protein is produced. Through molecular biology and the tools of
genomics have identified genes responsible for production of nutritionally important proteins
such as digestive enzymes, transport molecules responsible for ferrying nutrients and cofactors
to their site of use, and numerous other molecules responsible for metabolism and utilization
of our dietary components, including macronutrients, vitamins, minerals, and phytochemicals.
Gene expression patterns produce a phenotype, which represents the physical characteristics
or observable traits of an organism, e.g., hair color, weight, or the presence or absence of a
disease. Phenotypic traits are not necessarily produced by genes alone. Phenotypic expression
is influenced by nutrition.
India has the largest number of vitamin-A deficient children in the world with 3, 30,000
children dying directly or indirectly every year, due to this deficiency. Food sources containing
natural carotenoids may be more beneficial than vitamin supplements (Van den Berg et al.,
2000). Folate is required for synthesis of purine, pyrimidine, nucleoproteins and for
methylation reactions that play an important role in cell division and development. Folates also
plays important role in proper growth and for optimal functioning of the bone marrow. Folate
deficiency is one of the world’s most common human nutrition problems because humans and
animals are unable to make folates de novo and hence depend on dietary sources, especially on
plants (Blancquaert et al., 2010, Scott et al., 2000).
57
Pulses are the predominant source of vegetarian protein in Indian diet. Pulses contain
18 - 31 percent of protein. Pulses are grown for direct human consumption, available all year
and important sources of dietary proteins, dietary fiber, iron, zinc, and calcium. Pulses have
contributed significantly in providing nutritionally balanced food predominantly in vegetarian
population of India. Keeping above in mind, to contribute towards nutrigenomis an attempt
was made to enrich the β-carotene and folate in pulses through genomics approaches. Study
was carried out on estimation of total carotenoids by spectrophotometer, β-carotene
quantification by HPLC and 5-methyltetrahydrofolate (5-CH3-H4-folate) extraction by trienzyme
treatment (α-amylase, protease, folateconjugase) and quantification of its content by HPLC in
different pulses namely bengal gram, red gram, green gram, black gram and soybean.
In bengal gram, the variety CO 3 was found to contain highest total carotenoids (1003
µg/100g) and β-carotene (99.4 µg/100g), whereas 5-CH3-H4-folate was found to be more in CO
4 (25.11µg/100g). Among all the red gram varieties tested, CO 7 was found to contain highest
total carotenoids (1161µg/100g), β-carotene (105.6 µg/100g) and 5-CH3-H4-folate (46.08
µg/100g). In green gram, the CO 7 variety was found to be contain highest total carotenoids
(974 µg/100g) and β-carotene (165.3 µg/100g), whereas CO 5 was found to contain highest 5-
CH3-H4-folate (46.08 µg/100g). In black gram, the variety VBN 5 showed highest total
carotenoids (661 µg/100g), highest in β-carotene (70.3 µg/100g), while 5-CH3-H4-folate
content was more in CO 6 variety (23.84µg/100g). Among the soybean cultivar the total
carotenoids was found to be highest in CO 2 (233 µg/100g) and β-carotene (70.3 µg/100g),
while 5-CH3-H4-folate content was more in CO 1 (46.08 µg/100g).
Several studies have demonstrated that carotenoid accumulation is mainly controlled by
the transcriptional regulation of carotenoid biosynthetic genes like Lycopene is cyclized by ε
and/or β-cyclase to give rise to the yellow-orange pigments, (Matthews et al., 2003). Bai et al.
(2009) suggested that higher levels of LCYB in the endosperm would result in increased β-
carotene content. The Arabidopsis aminodeoxychorismate synthase (AtADCS) gene is
introduced in tomato plant with E8 promoter. Due to this gene incorporation there was 19 fold
58
enhancements in PABA. There crossing between the AtADCS plants and T1 folE gene (GCH)
plants. The GCHI/ AtADCS transgenic fruit accumulated enough folates (840 g per 100 g) to
provide the whole recommended dietary of pregnant women (Storozhenko et al., 2005).
The degenerate primers were designed for β-carotene biosynthesizing gene (Lcy-β) and
two of the folate biosynthesizing genes (ADCS and DHNA). The PCR conditions were
standardized using genomic DNA from all the pulse crops tested. The expression level of β-
carotene and folate biosynthesizing genes was determined by reverse transcriptase PCR (RT-
PCR) analysis. The β-carotene biosynthesizing gene β-lycopene cyclase (Lcy-β) expression was
observed same in all pulse crops, but in folate biosynthesizing genes such as
aminodeoxychorismate synthase (ADCS) and dihydroneopterinaldolase gene (DHNA) shown
more expression in soybean followed by green gram and bengal gram.
References
Bai, L. k., Della, P. D., and Brutnell, T. P., 2009 Novel lycopene epsilon cyclase activities in maize
revealed through perturbation of carotenoid biosynthesis. The Plant Journal 59: 588–599.
Blancquaert, D., Storozhenko, S., Loizeau, K., De Steur, H., and De Brouwer, V., 2010 Folates and
folic acid: from fundamental research toward sustainable health. Critical Reviews in Plant
Sciences 29: 14–35.
Matthews, P. D., Luo, R., and Wurtzel, E.T., 2003 Maize phytoene desaturase and zeta
carotenedesaturase catalyze a poly-Z desaturation path- way: implications for genetic
engineeringof carotenoid content among cereal crops. Journal of Experimental Botany 54:
2215–2230.
Scott, J., Rebeille, F., and Fletcher, J., 2000 Folic acid and folates: the feasibility for nutritional
enhancement in plant foods. Journal of the Science of Food and Agriculture 80: 795–824.
Storozhenkoa, S., Ravanelb, S., Zhangc, G. F., Rebeille, F., Lambertc, W., and Straeten, D., 2005
Folate enhancement in staple crops by metabolic engineering. Trends in Food Science &
Technology 16: 271–281.
59
Van den Berg, H., Faulks, R., Granado, H.F., Hirschberg, J., Olmedilla, B., Sandmann, G., Southon,
S., and Stahl, W., 2000. The potential for the improvement of carotenoid levels in foods and the
likely systemic effects. Journal of the Science Food and Agriculture 80: 880–912.
60
Targeting Induced Local Lesions IN Genomes (TILLING): Application of Traditional mutagenesis for Plant Functional Genomics
Madasu Viswanath, V. Anusheela, A. John Joel and Dr. S. Ganesh Ram
Department of Plant Genetic Resources, Centre for Plant Breeding and Genetics, Tamil Nadu Agricultural University, Coimbatore.
Introduction
Crop improvement has a long history as key agronomic traits have been selected over
thousands of years during the domestication of crops. More recently, this progress has been
accelerated as the Green Revolution has brought about great increases in crop yields (Khush,
2001). With the advent of genomics in the last 25 years, opportunities for crop improvement
have continued to grow and may help to meet future challenges of food production and land
sustainability. Novel DNA sequence information allows the development of additional
molecular markers for breeding as well as providing targets for transgenic alteration of gene
expression and introduction of new traits. Recently, a method called Targeting Induced Local
Lesions IN Genomes (TILLING) was developed to take advantage of this new DNA sequence
information and to investigate the functions of specific genes. TILLING also shows promise as a
non- transgenic tool to improve domesticated crops by introducing and identifying novel
genetic variation in genes that affect key traits.
The TILLING method
TILLING applies advances in molecular biology and genomics to the identification of
genetic variation at the level of a single base pair. In the first step of the TILLING process, novel
single base pair changes are induced in a population of plants by treating seeds (or pollen) with
a chemical mutagen, and then advancing plants to a generation where mutations will be stably
inherited. DNA is extracted, and seeds are stored from all members of the population to create
a resource that can be accessed repeatedly over time. Establishing a good population and
preparing DNA samples from what typically constitutes thousands of plants for a TILLING
‘library’ can easily take the better part of two years for crop plants. However, chemical
61
mutagenesis is readily applicable to many diverse plants and animals, and so has the potential
for widespread use (Kodym & Afza, 2003). In crops that propagate vegetatively or plants with
long generation times (e.g., mint, coffee, and trees), creating such a population may be more
difficult. However, even in these cases, natural variation in crop and non-crop species can by
uncovered by ‘ecoTILLING’ (Comai et al., 2004).
For a TILLING assay, PCR primers are designed to specifically amplify a single gene target
of interest. Specificity is especially important if a target is a member of a gene family or part of
a polyploid genome. The application of bioinformatics tools can also help leverage functional
genomics data across species for investigation of targets in novel crops. Next, dye-labelled
primers are used to amplify PCR products from pooled DNA of multiple individuals. These PCR
products are denatured and reannealed to allow the formation of mismatched base pairs.
Mismatches, or heteroduplexes, represent both naturally occurring single nucleotide
polymorphisms (SNPs) (i.e., several plants from the population are likely to carry the same
polymorphism) and induced SNPs (i.e., only rare individual plants are likely to display the
mutation). After heteroduplex formation, the use of an endonuclease, Cel I, that recognizes and
cleaves mismatched DNA is the key to discovering novel SNPs within a TILLING population
(Oleykowski, et al., 1998; Colbert et al., 2001). Using this approach, many thousands of plants
(or animals) can be screened to identify any individual with a single base change as well as small
insertions or deletions (1–30 bp) in any gene or specific region of the genome (Comai et al.,
2004). To increase throughput, DNA is typically pooled from 4 to 8 individuals. Genomic
fragments being assayed can range in size anywhere from 0.3 to 1.6 Kb. At 8-fold pooling, 1.4
Kb fragments (discounting the ends of fragments where SNP detection is problematic due to
noise) and 96 lanes per assay, this combination allows up to a million base pairs of genomic
DNA to be screened per single assay, making TILLING a high-throughput technique.
TILLING by sequencing
TILLING constitutes a desirable alternative as a functional genomic tool and indeed its
application has now become widespread as multiple TILLING projects focused on a variety of
crops are underway in the US, Europe, Australia, and Asia. Currently, TILLING efforts employ the
62
CEL I-LiCor technology, or close variations. This method is time consuming and very expensive
as it employs the use of fluorescent primers for initiating the mutation screens. The fluorescent
primers vary in quality and are intrinsically less efficient at PCR than regular ones. Operating the
LiCor apparatus is labour intensive and the technology is susceptible to performance
fluctuations. Ultra-high throughput sequencing is becoming available at low cost and it is a very
attractive approach for TILLING. A new and advantageous model for a TILLING service is now
possible. Academic laboratories, public and private breeding programs add their favourite
genes to the TILLING service list. A resequencing run is initiated on a set of genes and
additionally in this method, genes from different organisms can be mixed and processed in one
run so that one does not have to wait for a critical number of targets to emerge in any given
crop system. Indeed, the different genes provide biological barcoding that facilitates the
analysis of massively parallel experiments. This method is particularly advantageous for pilot
testing, where fewer individuals can now be sampled at many genes.
TILLING as a functional genomics tool
Once mutations are identified by resequencing, the predicted impact on gene function is
assessed and the data are posted to the community of users. Each user then chooses which
mutations to characterize. For a gene of average composition and structure, about 5% of
mutations are predicted to result in protein truncation because they are nonsense or affect
splicing. About 50% of discovered mutations result in missense changes whose impact on
function can be estimated in silico. The rest of the mutations are silent. For uncharacterized
genes, complete knock-outs are desirable and so truncation mutations will be selected first. For
other genes, allelic series of varying severity may be desirable and so a range of missense alleles
will be selected.
Arabidopsis TILLING
Since the inception of TILLING, this method has been widely used for the study of
functional genomics in plants, especially for the model plant Arabidopsis thaliana. Greene et al.
(2003) reported that the Arabidopsis TILLING Project (ATP), which was set up and introduced as
63
a public service for the Arabidopsis community (Till et al 2003), had detected 1,890 mutations
in 192 target gene fragments. Heterozygote mutations were detected at twice the rate of
homozygote mutations. Therefore, the mutational density for treatment of Arabidopsis with
EMS was approximately 1 mutation / 300 kb of DNA screened with these mutations distributed
throughout the genome. The numerous mutations in Arabidopsis thaliana that have been
identified via TILLING have provided an allelic series of phenotypes and genotypes to elucidate
gene and protein function throughout the genome for Arabidopsis researchers.
TILLING in crops
Even though TILLING was originally designed for and applied mainly in Arabidopsis in the
early years of its inception, it has been demonstrated to be an extremely versatile approach
compared to many other reverse genetic approaches. This method has proven to be successful
to rapidly identify variant genotypes and determine gene function in plants that are diploid and
have relatively small genomes such as Arabidopsis. Additionally, it also can be easily applied to
other crop plants with very large genomes that are further complicated by various ploidy levels
such as wheat. Moreover, the uses of chemical mutagens in diploid and polyploidy plants
produce a range of various mutations and a high density of mutations throughout the genome.
Obtaining an allelic series for a gene of interest greatly assists in the overall determination of
gene function by providing multiple types of phenotypic variants to be analyzed.
Maize
In 2004, maize, which is an important staple crop with a large genome, was shown to be
conducive to the TILLING method (Till et al 2004). A total of 11 genes were examined in a
population of 750 mutagenized plants and six of these 11 genes had detectable induced
mutations. One of the genes examined in this study, DMT102 (chromomethylase gene), has
been previously suggested to play a role in non-CpG DNA methylation and gene silencing in
Arabidopsis. A Maize TILLING Project established in 2005 at Purdue University has already
identified 319 mutations in 62 genes, which has greatly assisted functional genomic studies in
maize (Weil and Monde 2007). Barley, which is also an important cereal crop with a fairly large
64
genome size of ~5,300 Mb, was evaluated for the ability of induced mutations to be detected
by TILLING. Two genes (Hin-a and HvFor1) were examined and 10 variants were identified, six of
which were missense mutations. Phenotyping the M3 individuals demonstrated that 20% had
visible phenotypes.
Wheat
Wheat is an extremely important agronomic staple crop with an estimated production
level of 600 million tons per year (Bagge et al 2007). A polyploid plant investigation to locate
variants in the waxy locus (granule-bound starch synthase I, GBSSI) in wheat was implemented.
Partial waxy wheat cultivars are desirable because production of amylose starch is reduced,
which leads to the production of superior flour and noodle products for human consumption.
Wheat genetics can be complicated because its genome is complex, it is an allohexaploid, and
the total genome size is quite large (17,000 Mb). A total of 246 alleles were uncovered in three
waxy gene homoeologues (Wx-A1, Wx-B1, and Wx-D1) from allohexaploid and allotetraploid
wheat via TILLING. This comprehensive allelic series provided 84 missense, three nonsense and
five splice site mutations. Phenotyping of M3 progeny demonstrated reduction of amylose
production. Detecting genetic variants via Phenotyping in wheat can be difficult because
redundant copies of loci in the genome can mask expression. This study identified more
extensive allelic variation in GBSSI than was identified in any report produced in the last 25
years (Slade et al 2005).
Rice
Rice, which is also a staple and important economic crop around the world, currently
estimated to provide 80% of the caloric intake for three billion people (Storozhenko et al 2007),
has been the focus of a few TILLING studies. The rice genome has been predicted to contain
~50,000 genes, of which gene function needs to be determined empirically. In 2005, a report
was published on the generation of a large mutation population (60,000) using multiple
chemical mutagens on IR64, a widely grown indica rice (Wu et al 2005). This study
demonstrated that TILLING was suitable for reverse genetic studies with mutations detected in
65
two genes; albeit, the mutational density in the population was fairly low. In addition, extensive
phenotypic variation was assessed for the various chemical mutagens used to develop the
mutant population and albinism was a common phenotype no matter which mutagen was
applied. In a separate study, EMS and Az-MNU were used to induce an elevated mutational
density in rice, with 57 polymorphisms identified from 10 target genes by TILLING (Till et al
2007). Another report on rice TILLING published in 2007, demonstrated the efficacy of TILLING
to detect mutations by separation of products on Agarose gels (Raghavan et al 2007). Results
were analogous to pooling DNA and detecting mutations on a LI-COR DNA Analyzer.
Pearl Millet
TILLING populations were developed in the inbred line “P1449-2-P1”. Assuming a
population size of 10,000 M2 lines to be ideal for mining allelic variants in drought- responsive
candidate genes, a total of 31,000 seeds of inbred line “P1449-2-P1” were mutagenized in three
different batches using 5, 7.5, 9 and 10 mM EMS and 10,000 TILLING lines, were developed. A
total of 9 pooled plates, 5 plates from the DNA isolated from the initial set of mutagenesis
experiments and 4 pooled plated from mutant lines generated during 2007 were prepared.
Further, all the plate records were uploaded into LIMS (Laboratory Information Management
System) followed at ICRISAT for data storage and retrieval. Thus the pools of DNA from mutant
population will be made available for international pearl millet community for allelic mining
candidate genes. Further, efforts are underway to mine allelic variants for drought responsive
candidate genes (viz, DREB2A) in the mutant DNA pools prepared < http://www.icrisat.org/bt-
gene-discovery.htm>.
Soybean
Soybean contains approximately 35-50% protein and has been shown to be beneficial
for human health. It is an important economic crop that can improve soil quality by fixing
nitrogen. Four mutant populations from two genetic backgrounds (Forrest and Williams 82)
were created for soybean by treatment with EMS or NMU and evaluated for induced mutations
(Cooper et al 2008). Several of the target genes initially tested amplified more than one target.
66
Further work was carried out to produce a single product to employ TILLING so that mutation
detection functioned optimally. A total of 116 mutations were identified via TILLING from seven
target genes. The majority of the mutations uncovered by TILLING were determined to be the
expected G/C to A/T transitions. This study demonstrated that soybean is suitable for TILLING
studies.
Conclusion
Nucleotide sequences contain hidden information about the forces for conservation and
variation that shaped their evolutionary history. It is known that detection of a gene
responsible for a mutation is a challenging process. TILLING and Eco TILLING are inexpensive
and swift natural polymorphism detection and genotyping methods . They have advantages for
determining the range of variation for genetic mapping based on linkage analysis. In these
techniques, if a mutation is detected in a pool, the individual DNA samples that went into the
pool can be individually analyzed to identify the individual that carries the mutation. Once this
individual has been identified, its phenotype can be determined. These techniques work with
good results, even if a population contains pre-existing mutations that would compromise SNP
discovery by other methodologies. The use of these approaches can facilitate the handling of
the large-scale discovery of induced point mutations through populations that are required.
These new screening methods can be applied to several plant species, whether small or large,
diploid or allohexaploid in nature. They may serve as rapid approaches to identify the induced
and naturally occurring variation in many species. Now the successes have been reported in a
variety of important plant species, the next challenge is to use these technologies to develop
improved crop varieties.
References
Bagge, M., Xia, X., Lübberstedt, T., 2007 Functional markers in wheat. Curr Opin Plant Biol.
10(2): 211- 216.
67
Chitra, Raghavan, M. A., Elizabeth, B., Naredo, Hehe, Wang, Genelou, Atienza, Bin Liu, Fulin Qiu,
Kenneth, L., McNally Hei Leung., 2007 Rapid method for detecting SNPs on agarose gels and its
application in candidate gene mapping. Mol Breeding 19: 87–101.
Colbert, T., Till, B. J., Tompa, R., Reynolds, S., Steine, M. N., Yeung, A. T., McCallum, C. M.,
Comai, L., and Henikoff, S., 2001 High throughput screening for induced point mutations. Plant
Physiol 126: 480–484.
Comai, L.,, Young K, Till B. J., Reynolds, S. H., Greene, E. A., Codomo, C. A., Enns, L. C., Johnson,
J. E., Burtner, C., Odden, A.R., and Henikoff, S., 2004 Efficient discovery of DNA polymorphisms
in natural populations by Ecotilling. Plant J 37: 778–786
Cooper, J. L., Till, B. J., Laport, R. G., Darlow, M. C., Kleffner, J. M., Jamai, A., El-Mellouki, T., Liu,
S., Ritchie, R., Nielsen, N., et al., 2008 TILLING to detect induced mutations in soybean. BMC
Plant Biol 8:9.
Greene, E. A., Codomo, C. A., Taylor, N. E., Henikoff, J. G., Till, B. J., Reynolds, S. H., Enns LC, et
al., 2003 Spectrum of chemically induced mutations from a large-scale reverse-genetic screen
in Arabidopsis. Genetics. 164(2): 731-40
Jain-Li Wu, Chanjian Wu, Cailin Leil, Marietta Baraoidan, Alicia Bordeos, et al., 2005 Chemical
and irradiation-induced mutants of indica rice IR64 for forward and reverse genetics. Plant
Molecular Biology.59: 85-97
Khush, G.S., 2001 Challenges for meeting the global food and nutrient needs in the new
millennium. Proc Nutr Soc. 60(1): 15-26.
Kodym, A., and Afza, R., 2003 Physical and chemical mutagenesis.Methods Mol Biol 236: 189–
204.
Oleykowski, C. A., Bronson Mullins, C. R., Godwin, A. K., and Yeung, A. T., 1998 Mutation
detection using a novel plant endonuclease. Nucleic Acids Res 26: 4597–4602.
68
Slade, A. J., Fuerstenberg, S. I., Loeffler, D., Steine, M. N., Facciotti, D., 2005 A reverse genetic,
nontransgenic approach to wheat crop improvement by TILLING. Nat Biotechnol , 23(1): 75-81.
Storozhenko, S., De Brouwer, V., Volckaert, M., Navarrete, O., Blancquaert, D., Zhang, G. F.,
Lambert, W., Van Der Straeten, D., 2007 Folate fortification of rice by metabolic engineering.
Nat Biotechnol 11: 1277-9.
Till, B. J., Cooper, J., Tai, T. H., Colowit, P., Greene, E. A., Henikoff, S., Comai, L., 2007 Discovery
of chemically induced mutations in rice by TILLING. BMC Plant Biol 7:19.
Weil. C. F., and Monde, R., 2007 Getting the point - mutations in maize. Crop Sci. 47: 60-67.
69
Genomic approaches of nitrogen and phosphorus efficient rice
Dr. K K Vinod IARI, Rice Breeding and Genetics Research Centre
Aduthurai, Tanjavur District. [email protected]
Introduction
Life in Asia depends on rice not only because it provides more than 70% of thedaily
calories for the population, butalso because of its important role as a source of income for
millions of rice farmersand landless workers (Dawe 2000).Global food security is at stake since
the demand for rice is exceeding production. Furthermore, increase in rice production shows
adiminishing trend, fallingfrom 1.6 per cent per annum in the 1990s toless than 1.0% by 2010
(FAO2003). Having surpassed the seven billion mark, the global population growth is thelone
serious factor that influence increased demand for rice (UNFPA 2011). With theincreasing
urbanization, land suitable and available for agriculture is diminishing and an increase in rice
production has to be obtained by increasing productivity, that is, increasedyield per unit land.
In addition to improved and sustainable agro-management options, higher yielding and more
nutrient-efficient genotypes have to be developed in order to secure rice production.
Mineral nutrition in rice requiressixteen essential elements (De Datta 1981), of which
nitrogen (N), phosphorus (P) and potassium (K) are applied to rice fields as chemical fertilizers
in large quantities. N and P are fundamental to crop development because they form the basic
component of many organic molecules, nucleic acids, and proteins (Lea and Mifflin 2011).
Estimates suggest that, from 2008–2012, global fertilizer demand will increase by 1.7% annually
amounting to about 15 million tons, of which 69% are required in Asia alone (FAO 2008). By
2012, global demand for N fertilizer will increase by about 1.4% (7.3 M t) and by about 2% (4.2
M t) for P fertilizer. Since Asia’s share in global rice production is more than 90% (Gulati and
Narayanan 2002) a substantial proportion of the increased fertilizer demand would be utilized
for rice production (Witt et al., 2009). This is of growing concern because recent estimates (FAO
70
2008) indicate a decliningtrend in nutrient-use efficiency as a result of fertilizer consumption
exceeding grain production. In addition, fertilizer prices are increasing due to high energy costs
and because natural resources are limited and increasingly difficult to assess, as will be outlined
in more detail below. Therefore, a balanced and sustainable use of fertilizers is of utmost
importance. The food crisis in 2008 (Von Braun 2008) which triggered socio-political unrests
worldwide has given us a first glimpse on what there is to come if we do not succeed in
providing sufficient food at affordable costs, long term and in a sustainable way.
Nitrogen – A diminutively plant availablenutrient
In spite of being the most abundant element in the atmosphere (78%), N is one of the
most limiting nutrients in natural and agricultural ecosystems. It enters into the soil only in
miniscule amounts through natural precipitation and biological N fixation. Although significant
N reserves are present in the soil (2–20 tha-1) (Bockman et al., 1990), only a limited amount is
available to plants. Since plants require large quantities of N, greater than any other primary
nutrient, plant assimilation of soil N often exceeds the amount being replenished (Epstein and
Bloom 2005). Native sources of soil N include pre-existing inorganic and organic forms, net N
mineralization from organic matter, biological N fixation and N inputs from irrigation waters,
and atmosphere deposition (Cassman et al., 1996). The majority of plant-useable N is
consumed as nitrate (NO3) from well-aerated soils and as ammonium (NH4+) from poorly
aerated, submerged soils (Huang et al., 2000). Although NH4+uptake demands less energy than
NO3, only few plant species, such as rice, are capable of growing exclusively with NH4+
(Kronzucker et al., 1999). Since rice is capable of assimilating both forms of N (Wang et al.,
1993a; Kronzucker et al., 1998) it is adapted to aerobic as well as to anaerobic growth
conditions.
In traditional rice ecosystems, low-N stress is a problem in marginal areas where no or
sub-optimal levels of N are applied (Laffite and Edmeades 1994a,b; Agrama et al., 1999)
because farmers lack resources to purchase fertilizer and agricultural practices are subsistence
farming (Ortiz-Monasterio et al., 2001). Apart from the immense benefit of mineral fertilizer,
71
when applied in excess it causes eutrophication of fresh water estuaries and coastal water
ecosystems (Raven and Taylor 2003)and increased emission of greenhouse gases, such as
nitrous oxide (N2O, Matson et al., 1998). If this practice continues, problems of nutrient
leaching and atmospheric contamination soon will become a wide-spread problem also in
developing countries (Ortiz-Monasterio et al., 2001).
Furthermore, diminishing non-renewable global energy resources, such as petroleum
and natural gas, demands for a more efficient fertilizer use for two compelling reasons; (a)
conservation of energy, because the chemical synthesis of N fertilizers requires ammonia
produced by the Haber-Bosch process (Travis 1993) for which natural methane is the major
hydrogen source and (b) for curtailing the ever-increasing fertilizer costs.
Although, N deficiency in particular, can be managed to a certain extent by addition of
organic manures and by cultivation of fallow legumes (Balasubramanian et al., 2004), a
significant reduction in N-fertilizer application can be achieved by optimizing rate and timing of
fertilizer application to synchronize N supply and demand (Peng and Bouman 2007). To further
reduce N applications, an alternative approach is the development of varieties that use N more
efficiently, either physiologically, that is increased carbon gain per unit plant N and time, or
agronomically, that is greater dry matter production and protein yield per unit N (Laungani and
Knops 2009).
Crop varieties can respond to nutrient supply in four different ways (Figure 1), viz.,
efficient and inefficient under nutrient deficiency, and responder and non-responder under
nutrient sufficiency (Ortiz-Monasterio et al., 2001). Efficient genotypes are those that possess
high external (uptake) efficiency, while responders are characterised by high internal
(utilization) efficiency. Since nutrient uptake and utilization are interdependent, it is difficult to
distinguisha responder from an efficient cultivar. It is therefore important to developefficient
selection criteria for low-nutrient tolerance, to differentiate between external and internal
efficiency (Agrama 2006) and to assess genetic variability (Broadbent et al., 1987; DeDatta and
Broadbent 1988). This necessitates a careful integration of both physiological and agronomic
72
evaluation of cultivars under low and high-nutrient regimes. For high-input systems, remarkable
breeding efforts have already been carried out for the selection of responder varieties with high
internal efficiency. This now has to be complemented with breeding of cultivars that are
tolerant of nutrient-deficiency and cultivars that maintain a high yield with reduced fertilizer
inputs.
Selection in environments with low nutrient status is often plagued by problems of low
heritability and high environmental variability. Although, nutrient-scarce environments are not
normally preferred by breeders, selection under such conditions will be more effective for yield
per se rather than selection for yield potential alone (Muruli and Paulsen 1981; Blum 1988).
Evaluation under field conditions is preferable over screenings in nutrient solution since the
latter cannot simulate the complex soil-plant interaction. Furthermore, in the case of N,
screening of genotypes at low-N levels would additionally avoid problems of vulnerability to
pests and diseases and lodging, which creates artefacts at high N levels (Tirol-Padre et al.,
1996).
To establish selection criteria for higher uptake efficiency, it is essential to study yield
components, biomass production, as well as nutrient assimilation in relation to traits
determining uptake efficiency that is a large root volume, efficient nutrient absorption, and
nutrient transport (Bassirirad 2000; Wang et al., 2006). In addition, a clear understanding of
environmental parameters, such as location effect, weather, radiation etc., and their effect on
nutrient uptake in different genotypes can leverage an objective-based distinction between
uptake vs. utilization efficiency in the target environment. This is particularly important
becausehigh nutrient-uptake efficiency without repletion of nutrient might accelerate nutrient
depletion, while low-nutrient efficient genotypes may lead to increased nutrient leaching or
volatilization under high nutrient situations (Ortiz-Monasterio et al., 2001).
73
Figure 1. Crop response to nutrient supply
Genes and pathways of nitrogen uptake and metabolism
In rice, excessive N stimulates shoot growth, root inhibition, delayed flowering, and
senescence (Bernier et al., 1993; Stitt 1999), while deficiency results instunted growth,
chlorosis, poor yield, and anthocyanin pigmentation due to carbohydrateaccumulation (Martin
et al., 2002). Under low N conditions, rice plants attempt to acquire more N by increasing the
root surface area which increases the root to shoot ratio (Marschner et al., 1986) with varying
degree depending on the phenological stage (Sheehy et al., 1998). Rice genotypes show
significant variability for N uptake (external efficiency) and utilization (internal efficiency) with
yield being predominantly determined by the uptake process, particularly under low N
conditions (Singh et al., 1998; Witcombe et al., 2008). The most N-efficient rice genotypes are
those capable of accumulating N in the first 35 days of transplanting (Peng et al., 1994).
External efficiency declines as the crop progresses to maturity with reduction in daily uptake of
N towards terminal stages due to increasingly inefficient roots (Sheehy et al., 1998) and internal
Yiel
d un
der
nutr
ient
suf
ficie
ncy
Yield under nutrient deficiency
Efficient responder
Inefficient responder
Efficient non-responder
Inefficient non-responder
Nutrient use
74
N recycling from senescing tissues to the developing panicle (Mae and Ohira 1981). Under low-
N supply, the internal recycling accounts for 70–90% of the total panicle N (Mae 1997; Tabuchi
et al., 2007). On the other hand, N concentrations in the straw at crop maturity is not
significantly affected by changes in N supply at terminal stages (Tirol-Padre et al., 1996), hence,
N uptake prior to panicle initiation is critical in building up the internal N reservoir (Figure 2).
In young rice plants, roots play a significant role in N absorption with root density and
distribution in the soil as the major determinants. Although there are many studies related to
variation in nitrogen (N) uptake in rice, there seems to be little information on differences in
root morphology that may contribute to this variation. A recent study on the N-use efficiency of
two rice cultivars by Fan et al., (2010) indicates a significant influence of root morphological
parameters and root physiological characteristics at different growth stages. The uptake
process and N homeostasis are complex processes that involve recycling of N (especially amino
acids) from shoots to roots via the phloem, and from shoots to roots via the xylem (Imsande
and Touraine 1994; Marschner et al., 1997).
Figure 2. Field management and plant-related traits relevant for improved nitrogen and
phosphorus efficiency
Deep roots for access to water
Terminal nutrient recycling
Basal fertilizer
N+P
External nutrient use efficiency Nutrient uptake
Early root vigor with large shallow root system
for P uptake
Internal nutrient-use efficiency Translocation potential Alternative pathways
Novel alleles
Weed management Top dress fertilizer (N)
Nutrient transfer from senescing roots
Tolerance to abiotic/ bioic stresses
High-affinity nutrient uptake
Mobilization and foraging of P
Mycorrhizae
High grain N Low grain P Low phytates Early seedling vigour
Functional stay green
75
This is under complex genetic control and the molecular genetics of N utilization is well
studied in model species such as Arabidopsis. Many gene families, including NO3 and NH4+
transporters and primary assimilation genes, amino acid transporters, as well as transcription
factors, and other regulatory genes have been identified by different approaches. With the
identification of orthologous genes from rice, opportunities are now emerging of utilising these
genes in marker-assisted breeding for N-efficiency (for reviews, see Li et al., 2009b; Kant et al.,
2011). N uptake in roots is mainly regulated by high-affinity transport system (HATS) that
regulates uptake at N levels below 1 mM and a low-affinity transport system (LATS) which
functions at higher N concentrations (Forde and Clarkson 1999; Glass et al., 2001; Williams and
Miller 2001). It has been shown that the low-affinity NO3 transporter OsNRT1 contributes
predominantly to N uptake in the root epidermis and in root hairs (Lin et al., 2000) acting in
conjunction with the high affinity and NO3-inducible transporters NRT2 and NAR2 (Cai et al.,
2008). Recent data provided further insight into the complexity of N uptake showing that the
mRNA of OsNRT2.3 is alternatively spliced (OsNRT2.3a/b) during the uptake process, and that
OsNAR2.1 interacts with two other NRT proteins(Yan et al., 2011). Furthermore, it was shown
that most transporter genes are up-regulated by NO3 and suppressed by NH4+, with an
exception of OsNRT2.3b being insensitive to NH4+ (Feng et al., 2011). Similar to Arabidopsis,
high-affinity rice NH4+ transporters are encoded by members of the AMT1 and AMT2 gene
families (Gazzarini et al., 1999; Howitt and Udvardi 2000; Loqué and von Wirén 2004). So far,
twelve AMT genes have been identified located on different chromosomes and grouped into
five sub-families (AMT1 – AMT5) with one to three gene members (Li et al., 2009b). Gene
expression analyses showed that OsAMT1;1 is constitutively expressed in shoots and roots
(Ding et al., 2011). In contrast, OsAMT1;2 shows root-specific expression and is NH4+ inducible,
whereas OsAMT1;3 is root-specific and N-suppressible. In addition, a high-affinity urea
transporter (OsDUR3) encoding for an integral membrane-protein that is up-regulated under N-
deficiency has recently been identified in rice roots (Wang et al., 2012). OsDUR3 over-
expression improved growth on low urea and this gene might therefore play a significant role in
N uptake in rice. Urea is converted to NH4+, and carbon dioxide in the presence of urease,
which requires nickel (Ni) as co-factor. Interestingly, application of Ni has a marked effect on
76
plant growth in rice when urea was provided as the sole N source (Gerendas et al., 1998).
Further, an early nodulin gene (OsENOD93-1) with potential function in amino acid
accumulation and transport has been identified in rice (Bi et al., 2009).
To enable plants to accumulate sufficient internal N, it is critically important to apply N
fertilizer at the right developmental stages. In irrigated rice fields, N fertilizer is usually applied
in three splits, that is, at transplanting (basal), at maximum tillering, and at panicle initiation.
This practice is recommended to rice farmers (Williams et al., 2010) and is now widely adopted.
Urea is the major source of N in rice ecosystems since it is rapidly converted to NH4+ or NO3 by
soil micro-organisms. In cereals, depending on the genotype and environmental conditions, the
root or shoot can be the main site for NO3 assimilation, with root assimilation dominating at
soil-nitrate concentrations below 1 mM and shoot assimilation at concentrations above 1 mM
(Andrews 1986; Andrews et al., 1995). Upon absorption, NO3 is reduced to nitrite by nitrate
reductase (NR) and subsequently to NH4+ by nitrite reductase (NiR). NH4+ from both the
nitrate reduction pathway and direct absorption is subsequently incorporated into amino acids
through synthesis of glutamine and glutamate (Crawford and Glass 1998; Meyer and Stitt 2001;
Campbell 2002) primarily in the chloroplasts and plastids via the glutamine synthetase (GS)/
glutamate synthase (GOGAT) cycle (Tobin and Yamaya 2001; Andrews et al., 2004) and
alternatively via pathways involving glutamate dehydrogenase (GDH) and asparagine
synthetase (Hirel and Lea 2002; Dubois et al., 2003). Although roots have high constitutive
levels of GS and GOGAT, both enzymes are also induced by NH4+ (Ishiyama et al., 2003). For the
conversion of glutamine to glutamate by GOGAT, either ferredoxin (Fd-GOGAT) or NADH
(NADH-GOGAT) is used. For GS, two major forms are known, that is cytosolic GS (GS1),
expressed in roots and shoots, and plastidic GS (GS2), expressed in chloroplasts and plastids.
GS1 constitutes a complex gene family of three to six genes (Hirel and Lea 2002).
Remobilization of internal N during grain filling is another key process in N metabolism.
Among the primary N assimilation genes (for review see Lea and Mifflin 2011), physiological
and biochemical evidences indicate that GS1 plays a major role in the synthesis of glutamine in
77
older leaves which is transported to the growing tissues (Kamachi et al., 1991; Habash et al.,
2001; Masclaux et al., 2001), a process positively related to yield and N use efficiency. The GS1
genes from rice, OsGS1; 1OsGS1; 2 are expressed in all organs but with higher expression in leaf
blades and roots respectively, while OsGS1;3 is found specifically in the spikelet (Tabuchi et al.,
2005). Prior to senescence, GS1 activity is reduced which leads to rapid accumulation of NH4+
and senescence since accumulated NH4+ is highly toxic to plant cells (Chen and Kao 1996). In
addition, it was shown that a knock-out mutation in OsGS1.1 resulted in reduced plant growth
and poor yield (Tabuchi et al., 2005), whereas over-expression of the GS2 gene increased yield
in wheat and Arabidopsis (Habash et al., 2001; Martin et al., 2006), and enhanced photo-
respiratory capacity and salt tolerance in rice (Hoshida et al., 2000). Re-assimilation of
glutamine to glutamate by GOGAT is tissue-specific with Fd-GOGAT predominantly active in
photosynthetic tissues (Lea 1997) whereas NADH-GOGAT has a significant role in developing
sink organs in the reutilization of glutamine (Tabuchi et al., 2007). Over-expression of NADH-
GOGAT increased panicle weight in rice in agreement with its important role in transporting
glutamate to major sink tissues during grain filling (Yamaya et al., 1992), and therefore is
considered a key enzyme in N utilization and grain filling in rice (Andrews et al., 2004). Two
genes of NADH-GOGAT have been reported in rice; OsNADH-GOGAT1 expressed in growing
tissues such as root tips, young spikelets and developing leaf blades and OsNADH-GOAGT2
mainly expressed in mature leaves and leaf sheaths, of which NADH-GOGAT1 is important in N
remobilization (Tabuchi et al., 2007).
Gene-expression profiling of seedlings from the indica-type variety Minghui 63 grown in
N-deficient hydroponics solution (Lian et al., 2006) revealed rapid down-regulation of genes
involved in photosynthesisand energy metabolism in response to N stress. Interestingly, genes
involved in earlyresponses to biotic and abiotic stresses, as well as transcription factors and
other regulatory genes were responsive to N, especially in roots. In contrast, genes known to be
involved in N uptake andassimilation showed little or no response to low N. However, since this
study was conducted in hydroponics with very young seedlings, the data might not represent
the actual situation in soil or the field.
78
Grain development in rice depends on the establishment and maintenance of a
photosynthetically active canopy which acts as a major N storage before internal N is
translocated to the panicle. Since this process occurs at the expense of the photosynthetic
machinery, canopy longevity and maintenance of the photosynthetic capacity are vital for
continuous remobilization of N and starch accumulation (Hawkesford and Howarth 2011). In
this context the functional stay-green (FSG) trait is of interest since it delays leaf senescence
and sustains photosynthesis in maturing plants which leads to increased biomass production
and grain yield (Fu and Lee 2008). Recently, quantitative trait loci (QTLs) have been mapped for
FSG (Yoo et al., 2007; Fu et al., 2011), that confer yield advantage and hold promise for marker-
assisted introgression of the stay-green trait into rice varieties.
Phosphorus - Declining natural reserves and accessibility
In contrast to nitrogen, P is a non-renewable natural resource and concerns are rising
that rock phosphate reserves are limited. A recent study conducted by the International
Fertilizer Development Centre (IFDC) concluded that currently known and easily accessible
world rock phosphate reserves will last approximately another 300-400 years (van
Kauwenbergh 2010). Investments are therefore made to discover new rock phosphate reserves
and to develop alternative technologies to isolate P from e.g., marine sediments and faeces
(see below).Another concern is the unequal geographical distribution of P reserves with the
vast majority of phosphate rock located in Morocco, followed by the USA and China (van
Kauwenbergh 2010). A dependence of the world’s food production on few countries is
problematic and policies should be put in place in time to ensure equal access to P reserves in
the future. In fact, China is already imposing seasonal export taxes to secure national supply of
P fertilizer (http://agfax.com/2011/11/18/chinese-government-imposes-seasonal-export-
taxes).
Whereas P fertilizer is applied in excess in the western world and some Asian countries
(e.g., China, Japan, Korea; MacDonald et al., 2011), P deficiency is a major problem in many
Asian and African countries, as well as in South America. The two main reasons for this are (i)
79
insufficient application of P in form of P-fertilizer or manure and (ii) P-fixing soil properties
which renders P unavailable to plants even if present in sometimes large amounts. P
imbalances in the world with too much in some countries and too little or inaccessible P in
others (MacDonald et al., 2011) will require different measures and breeding strategies.
In modern agriculture, inorganic P fertilizer has widely replaced the application of
manure. A large quantity of P present in animal and human faeces is therefore removed from
the nutrient cycle and ends up in waterways where it is no longer available and difficult to
recycle (Cordell et al., 2009). In realization of this, technologies are being developed in Europe
to extract P from sewage in urban centres to produce Struvite (ammonium magnesium
phosphate) that can be used as P and N fertilizer (http://www.ceep-
phosphates.org/Files/Newsletter/scope50.pdf). In light of decreasing rock phosphate reserves,
these efforts are extremely important and, once applied at a large scale, this technology will
become more competitive in terms of production costs.
The high concentration of P in human and animal faeces is due to the consumption of
phytic acid (PA, inositol hexaphosphate) which is the major (50-80%) storage form of P and
present in large quantities in cereal grains, legumes, soybean, and other plants (for a review
seeLott et al., 2000). PA is not digestible by humans and animals and additionally binds iron,
zinc, and other minerals thereby reducing their bioavailability. Research is therefore on-going
aiming at the reduction of PA in crops (for review see Raboy 2009). Apart from its role as an
“anti-nutrition”, PA is the main source of P drainage from fields. Estimates suggest that50-80%
of the P in cereal grains and legume seeds is stored as PA (Lott et al., 2009). Studies on low PA
(lpa) mutants in rice (Larson et al., 2000; Liu et al., 2007) and barley (Bregitzer and Raboy 2006),
however, showed that total grain P was generally higher in lpa mutants compared with wild
type and this is approach is therefore not directly applicable to reduce P drainage.
For soils naturally low in P or depleted of P due to continuous cropping without
repletion of P (and other nutrients), fertilizer or manure application is inevitable to maintain
productivity and prevent soil degradation. However, continuous cropping of poor soil is often
80
related to poverty and breeding of efficient crops, therefore, has to be complemented with
policy measures providing poor farmers with agricultural inputs. With regard to breeding for
poor soils, crops with high P-uptake and high internal P-use efficiency need to be developed to
maximize yield in such low-input systems. Besides, a combination of both, uptake and internal,
efficiency is equally desirable for high-input systems since it would facilitate reduction of
fertilizer doses without yield penalty. In rice, the P fertilizer-use efficiency is about 25%
(Dobermann and Fairhurst 2000), providing a considerable scope for improvement.
For areas with P fixing soils, high fertilizer applications are currently necessary to
provide sufficient plant-available P. Soils with P-fixing properties are wide-spread in the Asia
Pacific and occur on 5% of the total land area (Bot et al., 2000). In China, Indonesia, Japan,
Thailand and Vietnam, P-fixing soils cover 9-15% of the total land area. These numbers are even
higher in Laos (24%) and Myanmar (16%). In Africa, P fixing soils are especially widespread in
Burundi, Congo, Liberia, Swaziland, and Rwanda (16-29%). Similar numbers are reported from
South America, where P fixation occurs on 14-25% of the total land area in e.g., Brazil,
Colombia, Venezuela, Peru, and Ecuador. In French Guyana, 79% of the total land area has P
fixing properties (Bot et al., 2000). The development of crops that can access P-reserves in
these soils and that are highly efficient in P-fertilizer uptake should therefore be a global
breeding priority. In addition, it is critically important to develop crops with tolerance of
multiple stresses because P deficiency is often a secondary effect in soils with high
concentrations of iron and aluminium, and low pH, which restricts root growth even if P is
available (Ismail et al., 2007).
Genes and pathways of Phosphorus uptake and metabolism
In irrigated rice systems, P fertilizer is generally applied only at the beginning of the
season (basal) whereas N is applied in three splits (basal/after transplanting, maximum tillering,
panicle initiation/booting). This is common practise in Asia and Africa (Haefele et al., 2005;
Hossain et al., 2005) indicating that P acquisition and requirement are highest during early
growth stages. Under P-deficient conditions, plant development is delayed and P-deficiency
81
symptoms (reduced tillering, darker leaves due to anthocyanine accumulation) are easily
recognized and cause significant yield losses (Dobermann and Fairhust 2000). Another
important factor is that P, in contrast to N and K, is not transported with the soil solution (mass
flow) but mainly by diffusion (Schachtman et al., 1998). A large root surface area is therefore
particularly important for P uptake since plants gain access to a larger soil area and thereby to P
(Lynch 2007). In agreement with that, induction of root growth under P-deficiency has been
described in many species (Hernández et al., 2007), though it should be noted that root growth
under –P conditions is generally reduced compared with plants grown under +P conditions
(Chin et al., 2010; Li et al., 2009a).
For yield stability under P-deficient conditions, long root hairs and highly branched root
systems, especially in the top soil where P is mainly located, are considered beneficial
(Ramaekers et al., 2010). In agreement with this, the Pup1 major QTL for tolerance of P-
deficiency is an enhancer of root growth rather than a regulator of specific P-uptake
mechanisms (Heuer et al., unpublished). The same was observed in the P-deficiency tolerant
maize mutant line (99038; Li et al., 2008) which likewise developed a larger root system.
P is transported into the plant by P transporters located in the root plasma membrane.
In the rice reference genome, thirteen P transporter genes are present (Paszkowski et al.,
2002). Two of these transporters have been functionally characterized revealing that OsPT2
encodes a low affinity and OsPT6 a high affinity transporter (Ai et al., 2009). High affinity
transporters are generally induced under low P conditions (Paszkowski et al., 2002) and are
therefore considered more important for P uptake under field conditions since the soil P
concentration is about 1000 times lower compared with intracellular concentrations
(Schachtman et al., 1998). In agreement with this OsPT6, but not OsPT2, was shown to be
expressed in the root epidermis (Ai et al., 2009). In another study (Jia et al., 2011), the effect of
the rice P transporter OsPht1;8 was analyzed by over-expression and RNAi. The authors showed
that P uptake was enhanced and decreased according to the expectation, however, both
approaches lead to a significant reduction in the number and size of panicles as well as to >80%
82
spikelet sterility. In another study on rice, transgenic plants over-expressing the tobacco
transporter NtPT1 were generated but, although some lines outperformed the controls, on
average transgenic lines yielded less (Park et al., 2010). Likewise, it was shown in barley that
over-expression of the transporter gene HORvu; Pht1; 1 did not increase P uptake
(Rae et al., 2004). Taken together, the data suggest that over-expression of high-affinity
transporters alone is not sufficient to improve P efficiency. In addition, because P is rapidly
depleted at the soil-root interphase, it will be critically important to combine high-affinity
uptake with increased access and mobilization of P, e.g., a large root-surface area, exudation of
organic acids, acid phosphatases, and phytases (for review see Gahoonia and Nielson 2004).
For an extended root system and thereby better access to P, mycorrhiza are generally
considered important and the positive effect of arbuscular mycorrhiza (AM) symbiosis on
nutrient uptake, especially P and N, has been demonstrated in several crops (for review see
Karandashov and Bucher 2005; Sawers et al., 2008). In rice, surprisingly few studies on the
effect of AM are available since research focussed on intensive rice systems which are flooded
and anaerobic conditions are generally considered unfavourable for fungi. However, using the
irrigated rice variety Nipponbare, it was shown that the P transporter OsPT11 is specifically
induced in roots colonized with AM (Paszkowski et al., 2002). A study from Japan additionally
showed that AM inoculation of Nipponbare seedlings significantly increased yield in flooded
fields and that the fungi survived under these conditions (Solaiman and Hirata 1997). The latter
was also found in a study by Hajiboland et al., (2009) that additionally reported a significant
growth advantage of plants colonized with mycorrhiza. Surprisingly, this was observed only
under flooded conditions, whereas growth was severely inhibited under aerobic conditions.
Growth depression due to mycorrhiza colonization has also been reported from wheat,
especially under P deficient conditions (Li et al., 2005). However, another study on rice that
analysed the effect of mycorrhiza in four different upland crop-rotationsystems showed a
significant increase in P uptake and grain yield in the system with the highest concentration of
mycorrhiza (Maiti et al., 2012). These data are encouraging and more detailed studies should be
83
conducted to assess the potential of mycorrhiza for enhancement of P uptake in irrigated and
rain-fed rice systems, and to assess genetic diversity for mycorrhiza interaction (Figure 2).
In contrast to enhanced P uptake, improvement of internal P-use efficiency would target
genes/pathways that enable plants to maintain cellular processes and productivity under low
P conditions. Since P is an indispensible component of virtually all cellular functions and
required in large amounts for e.g., ATP, NADPH, nuclear acids, and phosphoproteins, it can be
expected that modification of P-related pathways will affect the whole plant. This is well
reflected in microarray gene expression studies in rice (Wasaki et al., 2003; Pariasca-Tanaka
2009) and Arabidopsis (Wu et al., 2003; Morcuende et al., 2007) showing that, depending on
the study, between 220 and 5800 genes were responsive to P. For breeding applications, it will
be critically important to identify genes that act upstream of this systemic response in order to
reduce complexity and the number of genes/QTLs needed to modify internal P-use efficiency.
The regulatory network of P homeostasis has been well studied in Arabidopsis andfor
somegenes rice orthologs have been identified. The MYB-type transcription factor AtPHR1
(Rubio et al., 2001) and its rice ortholog OsPHR2 (Zhou et al., 2008) act as positive regulators of
P-transporters and other P-responsive genes, whereas other genes act as suppressors of P
starvation genes, e.g., theubiquitin conjugating enzyme AtPHO2 and its rice
orthologOsLTN1(Bari et al., 2006; Aung et al., 2006; Hu et al., 2011). Expression of PHO2 is
negatively regulatedby a micro RNA (miR399) which itself is “sequestered” by target mimicry of
the IPS1 gene (Bari et al., 2006; Aung et al., 2006; Franco-Zorrilla et al., 2007). Two recent
reviews provide an excellent and comprehensive overview over the genes involved and their
interaction (Nilsson et al., 2010; Hammond and White 2011).
Alternative pathways have been described that are up-regulated under P-starvation
utilizing PPi rather than ATP (for reviews see Hammond et al., 2004). This includes UDP-glucose
pyrophosphorylase (PPi-dependent conversion of glucose to hexose-P) and
phosphofructokinase (PPi-dependent phosphorylation of Fructose-6-P). Other adaptive
processes include the substitution of phospholipids with galacto- and sulpholipids, and up-
84
regulation of ribonucleases to mobilize P from nucleic acids (for review see Hammond et al.,
2004). Many other genes and pathways are altered under low P conditions, which is not
surprising in light of the central role of P in living cells. The challenge will be to identify genes
that enhance internal P-use efficiency without causing an imbalance in P homeostasis and
negatively affect plant development.
Interestingly, it has recently been shown in Arabidopsis that down-regulation of the
PHO1 gene (an SPX protein) conferred tolerance via suppression of the P starvation response
(Rouached et al., 2011). A similar observation has been made in tolerant Pup1 rice plants which
did not differentially express P starvation genes in comparison with non-Pup1 controls
(Pariasca-Tanaka et al., 2009; Heuer et al., unpublished). In this context it is important to note
that most studies on P-starvation responses in rice were conducted in the intolerant variety
Nipponbare. The identified genes and pathways, therefore, represent the intolerant response.
In agreement with that genes not formerly related to P-starvation tolerance have been
identified in the Pup1 donor variety Kasalath (Heuer et al., unpublished) as well as in a QTL-
mapping study using a tolerant Arabidopsisaccession (Reymond et al., 2006). It therefore seems
important to further explore genetic diversity, in Arabidopsis as well as in rice, and to identify
additional tolerant genotypes in order to gain access to large-effect genes and QTLs.
Quantitative trait loci for molecular breeding
Molecular breeding now provides a real opportunity to develop varieties with multiple
tolerance traits - provided that large-effect QTLs/genes are available. The number of reported
QTLs is steadily increasing, but still very few are applied in breeding programs. This is largely
due to lack of data that validate QTLs/tolerance genes in different genetic backgrounds and
environments (i.e., in field trials), which is a prerequisite for a large-scale application of QTLs.
In rice, molecular marker-assisted breeding is at an advanced stage for few large-effect
QTLs that confer tolerance of submergence (Sub1; Singh et al., 2010; Septiningsih et al., 2009),
drought (e.g., qtl12.1; Bernier et al., 2009a,b; Swamy et al., 2011), salinity (SalTol; Thomson et
85
al., 2010), and P deficiency tolerance (Chin et al., 2011). The latter QTL, Phosphorus uptake 1
(Pup1), has been identified more than ten years ago (Wissuwa et al., 2002) and is, to our
knowledge, currently the only P-related QTL for which molecular markers are available and
which has been evaluated in different genetic backgrounds under field conditions (Chin et al.,
2010; Chin et al., 2011; Heuer, unpublished data). Additional P related QTLs with smaller effect
have been identified and are summarized in Table 1.
Table 1: Quantitative trait loci for phosphorus-deficiency tolerance in rice
Traits Population Cross No. of QTL
Reference MQTL EQTL
PUP, PDW, TN, PUE NIL Nipponbare / Kasalath
8 - Wissuwa et al., 1998
RTA, RSDW, RRDW RIL IR20 / IR55178 4 - Ni et al., 1998
RE, SDW, RPC, RIC F8 Gimbozu / Kasalath 6 - Shimizu et al., 2004
REP CSSL Nipponbare / Kasalath CSSL29
1 - Shimizu et al., 2008
PH, MRL, RN, RV, RFW, RDW,SDW, TDW, RS
ILs Yuefa / IRAT109 24 29 Li et al., 2009a
PUP, Phosphorus uptake; DW, plant dry weight; TN, tiller number; PUE, Phosphorus use
efficiency; RTA, relative tillering ability; RSDW, relative shoot dry weight: RRDW, relative root
dry weight; RE, root elongation; SDW, ; RPC, relative phosphorus content; RIC, relative Fe
content; REP, root elongation under phosphorus deficiency, PH, plant height; MRL, maximum
root length; RN, root number; RV, root volume; RFW, root fresh weight; RDW, root dry weight;
SDW, shoot dry weight; TDW, total dry weight; RS, root-shoot dry weight ratio.
A QTL on chromosome 6 was mapped in two independent studies (Wissuwa et al., 1998;
Ni et al., 1998) but has, compared with Pup1, a smaller effect. However, this QTL has gained
importance after it was shown that a cluster of P-responsive genes is located in this region
(Heuer et al., 2009). Among these is the transcription factor gene OsPTF1 which was shown to
86
confer tolerance of P deficiency (Yi et al., 2005). Within the larger-effect QTL Pup1, no P-
responsive gene has been identified and, in agreement with that, Pup1-based tolerance does
not seem to employ currently known P-starvation response pathways as indicated by two
independent gene array analyses (Pariasca-Tanaka et al., 2009; Heuer et al., unpublished).
Interestingly, the region on chromosome 12, where Pup1 is located, has been associated with
tolerance to several biotic stresses (Ramalingam et al., 2003; Li et al., 2006), as well as to
drought (Babu et al., 2003; Bernier et al., 2007), aluminium toxicity (Wu et al., 2000) and cold
(Andaya and Mackill 2003). Similarly in tropical maize, drought-tolerant selections are also
found tolerant to low N (Bänziger et al., 2002). Whether this is a direct effect of Pup1 remains
to be shown.
Figure 3. Future approaches in breeding nutrient-efficient varieties essentially require
integration of classical and modern tools
Mutant screens
Trait discovery
Germplasm screening
QTL mapping + validation
Genes in QTLs
Low-nutrient tolerant varieties
Marker-assisted breeding
Transgenics
Genes and markers
Conventional breeding
Novel alleles
Transcript profiling
Reverse genetics
Genome sequencing
Genes with known function
Forward genetics
87
With respect to N, several genomic regions associated with N use and response have
been mapped in rice (Fang and Wu 2001; Ishimaru et al., 2001; Obara et al., 2001), following
the pioneering work on mapping QTLs for N-use efficiency in corn (Agrama et al., 1999). An
overview of the published rice QTLs is provided in Table 2. Indications so far suggest possible
links between very few of these QTLs with primary N assimilation genes and
transporters,especially of GS structural genes and yield components (Obara et al., 2001; Obara
et al., 2004; Senthilvel et al., 2008; Feng et al., 2010; Vinodet al., 2011). In a recent study, stably
expressed QTLs for yield and associated traits at different N levels, especially those expressed
at low N, were reported in recombinant inbred line populations (Tong et al., 2011).
Table 2: Quantitative trait loci for nitrogen response and associated traits in rice
Traits Population Cross No. of QTL
Reference MQTL EQTL
PH DHL IR64/ Azucena 10 - Fang and Wu, 2001 Rubisco, TLN, SPC BIL Nipponbare/
Kasalath 15 - Ishimaru et al.,
2001 GS, GOGAT BIL Nipponbare/
Kasalath 13 - Obara et al., 2001
GS, PN, PW NIL Koshihikari/ Kasalath
1 - Obara et al., 2004
TGN, TSN, NUP, NUE, NTE
F3 Basmati370/ ASD16 43 - Senthilvel et al., 2004
RDW, SDW, BM RIL Zhenshan 97/ Minghui 63
52 103 Lian et al., 2005
PH, PN, CC, SDW CSSL Teqing/ Lemont 31 - Tong et al., 2006 TGN, TLN, TSN, NUP, SLN
RIL IR69093-4-3-2 / IR72
32 - Laza et al., 2006
RL, RT, RM, BM etc. RIL Bala/ Azucena 17 - MacMillan et al., 2006
TGN, TLN, TSN, PNUE, BM
RIL Dasanbyeo / TR22183
20 58 Cho et al., 2007
TPN, NUE DHL IR64/ Azucena 16 - Senthilvel et al., 2008
TPN, NDMPE, NGPE, TGN
RIL Dasanbyeo / TR22183
28 23 Piao et al., 2009
88
PH, NR,GS, GOGAT, BM etc
RIL Basmati 370/ ASD16
15 44 Vinod et al., 2011
GYP, BM, HI etc. RIL IR64/ INRC10192 46 - Srividya et al., 2010 PH, RDW, SDW, CC, RL, BM
RIL R9308/ Xieqingzao B
7 - Feng et al., 2010
GYP, GNP RIL Zhenshan 97/ HR5 19 11 Tong et al., 2011
BIL, backcross inbred lines; BM, biomass; CC, chlorophyll content; CSSL, chromosomal
segment substitution lines; DHL, doubled haploid lines; EQTL, epistatic QTL; GNP, grain number
per panicle; GOGAT, glutamate synthase; GYP, grain yield per plant; GS, glutamine synthetase;
HI, harvest index; MQTL, main-effect QTL; NDMPE, nitrogen dry matter production efficiency;
NGPE, nitrogen grain production efficiency; NHI, nitrogen harvest index; NIL, near isogenic lines;
NR, nitrate reductase; NTE, nitrogen translocation efficiency; NUE, nitrogen use efficiency; NUP,
nitrogen uptake; PH, plant height; PN, panicle number per plant, PW, panicle weight; PNUE,
physiological nitrogen use efficiency; RDW, root dry weight; RIL, recombinant inbred lines; RL,
root length; RT, root thickness; RM, root biomass; SDW, shoot dry weight; SLN, specific leaf
nitrogen; SPC, soluble protein content; TGN, total grain nitrogen; TLN, total leaf nitrogen; TSN,
total shoot nitrogen; TPN, total plant nitrogen
One of these QTLs, for number of grains per panicle under low N level, is located in the
same region as the Pup1 locus on chromosome 12, prospecting the use of Pup1 materials for
testing low-N tolerance. Attempts to map loci for associative rhizosphere N fixation was also
reported in rice and independent QTLs linked to the activity of different N fixing bacterial
strains were identified on chromosome 2 (Ji et al., 2005). In independent studies (Lian et al.,
2005; Cho et al., 2007; Vinod, unpublished data), many small-effect epistatic QTLs have been
mapped accounting for a large cumulative proportion of variation for traits under low and
normal N conditions, many of which also show significant QTL x environment interaction,
emphasising the importance of validating QTLs in multiple environments and genetic
backgrounds before using them in selection.
89
Summary and outlook
In light of scarce resources, increasing fertilizer-production costs, and the need to keep
rice production at a pace with the growing demand, the development of nutrient-efficient crops
is increasingly important. As outlined above, both, nutrient uptake and metabolic pathways are
under the control of a complex regulatory network involving many genes (Figure 3). The
identification of large-effect QTLs/genes is therefore a challenge. However, examples such as
the submergence-tolerance OsSUB1Agene (Xu et al., 2006; Fukao et al., 2008) demonstrate that
a single gene can modify many downstream responses without affecting plant performance
under non-stressed conditions (Mackill et al., 2012). In addition to Pup1, large-effect QTLs have
also been identified for drought tolerance (Vikram et al., 2011; Swamy et al., 2011). Cloning of
the genes is in progress and it can be expected that upstream regulatory genes.Sub1, Pup1, and
the drought QTLs have been identified by reverse genetic approaches using tolerant rice
genotypes and screenings were mainly conducted under field conditions. Given the success of
this approach for a complex trait like drought, it seems advisable to apply it for the
identification of QTLs/genes for N and P use efficiency.
With the experiences gained in QTL mapping and the rapid development of genome-
sequencing and molecular-marker technologies, more high-impact, large-effect QTLs will surely
be identified in the future. These efforts require expertise in different disciplines and,
therefore, modern breeding is more and more implemented in multi-disciplinary teams
involving breeders, physiologists, and molecular biologists/geneticists. With the rapid advance
in the development ofmolecular markers and other breeding tools, breeders now gain access to
un-adapted genotypes that were difficult to use in breeding programs due to crossing barriers
and poor agronomic performance of landraces. Molecular breeding now provides real
opportunities to develop well-adapted and high yielding rice varieties, and will widen the gene
pool that breeders use in their programs.
References
90
Agrama, H. A., Zakaria, A. G., Said, F. B., and Tuinstra, M., 1999 Identification of quantitative
trait loci for nitrogen use efficiency in maize. Molecular Breeding 5: 187-195.
Agrama, H. A., 2006 Application of molecular markers in breeding for nitrogen use efficiency. J
Crop Improvement 15: 175–211.
Ai, P., Sun, S., Zhao, J., Fan, X., Xin, W., Guo, Q., Yu, L., Shen, Q., Wu, P., Miller, A. J., and Xu, G.,
2009 Two rice phosphate transporters, OsPht1;2 and OsPht1;6, have different functions and
kinetic properties in uptake and translocation. Plant Journal 57: 798–809.
Andaya, V. C., Mackill, D. J., 2003 Mapping of QTLs associated with cold tolerance during the
vegetative stage in rice. JExp Bot 54: 2579–2585.
Andrews, M., Lea, P. J., Raven, J. A., and Lindsey, K., 2004 Can genetic manipulation of plant
nitrogen assimilation enzymes result in increased crop yield and greater N-use efficiency? An
assessment.Annals Appl Bot1 45: 25-40.
Andrews, M., Raven, J. A., and Sprent, J.I., 1995 Site of nitrate assimilation in grain legumes in
relation to low temperature sensitivity: An assessment. In Proceedings of the 2nd European
Conference on Grain Legumes, Paris: L’Association Européenne de recherche sur les
protéagineuse, France 120–121.
Andrews, M., 1986 The partitioning of nitrate assimilation between root and shoot of higher
plants.Plant Cell Environment 9: 511–519.
Aung, K., Lin, S., Wu, C., Huang, Y., Su, C., and Chiou, T.. 2006 pho2, a phosphate
overaccumulator, is caused by a nonsense mutation in a microRNA399 target gene. Plant
Physiol 141: 1000–1011.
Babu, R. C., Nguyen, B. D., Chamarerk, V., Shanmugasundaram, P., Chezhian, P., Jeyaprakash,
P., Ganesh, S. K., Palchamy, A., Sadasivam, S., Sarkarung, S., Wade, L. J., and Nguyen, H.T., 2003
91
Genetic analysis of drought resistance in rice by molecular markers: association between
secondary traits and field performance.Crop Science 43: 1457–1469.
Balasubramanian, V., Alves, B., Aulakh, M., Bekunda Me, Cai, Z., Drinkwater, L., Mugendi, D.,
van Kessel, C., and Oenema, O., 2004 Crop, environmental, and management factors affecting
nitrogen use efficiency. In: Mosier AR, Syers JK, Freney JR (eds) Agriculture and the nitrogen
cycle: Assessing the impacts of fertilizer use on food production and the environment,
Washington: Island Press 19–34.
Bänziger, M., Edmeades, G. O., and Lafitte, H. R., 2002 Physiological mechanisms contributing
to the increased N stress tolerance of tropical maize selected for drought tolerance. Field Crop
Res 75: 223–233.
Bari, R., Datt Pant, B., Stitt, M., and Scheible, W. R., 2006 PHO2, microRNA399, and PHR1 define
a phosphate-signaling pathway in plants. Plant Physiol 141: 988–999.
Bassirirad, H., 2000 Kinetics of nutrient uptake by roots: Responses to global change. New
Phytologist 147: 155–169.
Bernier, G., Havelange, A., Houssa, C., Petitjean, A., Lejeune, P., 1993 Physiological signals that
induce flowering. Plant Cell 5: 1147–1155.
Bernier, J., Kumar, A., Ramaiah, V., Spaner, D., and Atlin, G., 2007 A large-effect QTL for grain
yield under reproductive-stage drought stress in upland rice.Crop Sci 47: 507–517.
Bernier, J., Kumar, A., Venuprasad, R., Spaner, D., Verulkar, S., Mandal, N. P., Sinha, P. K.,
Peeraju, P., Dongre, P. R., Mahto, R. N., and Atlin, G., 2009b Characterization of the effect of a
QTL for drought resistance in rice, qtl12.1, over a range of environments in the Philippines and
eastern India.Euphytica 166: 207–217.
92
Bernier, J., Serraj, R., Kumar, A., Venuprasad, R., Impa, S., Gowda, R. P.V., Oane, R., Spaner, D.,
and Atlin, G., 2009a The large-effect drought-resistance QTL qtl12.1 increases water uptake in
upland rice. Field Crops Res 110: 139–146.
Bi, Y. M., Kant, S., Clark, J., Gidda, S., Ming, F., Xu, J., Rochon, A., Shelp, B. J., Hao, L., Zhao, R.,
Mullen, R. T., Zhu, T., and Rothstein, S. J., 2009 Increased nitrogen-use efficiency in transgenic
rice plants over-expressing a nitrogen responsive early nodulin gene identified from rice
expression profiling. Plant Cell Environ 32: 1749–1760.
Blum, A., 1988 Plant breeding for stress environments.Boca Raton; CRC Press.
Bockman, O. C., Kaarstard, O., Lie, O. H., and Richards, I., 1990 Agriculture and fertilizers. Oslo:
Agriculture group Norsk Hydro, Norway.
Bot, A. J., Nachtergaele, O. F., and Young, A., 2000 Land resource potential and constraints at
regional and country levels. Food and Agriculture Organisation, FAO, Rome, World Soil
Resources Report No. 90, 114p;
Bregitzer, P., and Raboy, V., 2006 Effects of four independent low-phytate mutations on barley
agronomic performance, Crop Sci 46:1318–1322.
Broadbent, F. E., DeDatta, S. K., and Laureles, E. V., 1987 Measurement of nitrogen utilization
efficiency in rice genotypes. Agronomy J 79: 786–791.
Cai, C., Wang, J. Y., Zhu, Y. G., Shen, Q. R., Li, B., Tong, Y. P., and Li, Z. S., 2008 Gene structure
and expression of the high-affinity nitrate transport system in rice roots. J Integrat Plant Biol 50:
443–451.
Campbell, W. H., 2002 Molecular control of nitrate reductase and other enzymes involved in
nitrate assimilation. In: Foyer CH, NoctorG, eds. Photosynthetic nitrogen assimilation and
associated carbon and respiratory metabolism, Kluwer Academic, Netherlands 35–48.
93
Cassman, K. G., Gines, G. C., Dizon, M. A., Samson, M. I., and Alcantara, J. M., 1996 Nitrogen-use
efficiency in tropical lowland rice systems: contributions from indigenous and applied nitrogen.
Field Crops Res 47: 1–12.
Chen, S. J., and Kao, C. H., 1996 Ammonium accumulation in relation to senescence of detached
maize leaves. Botanical Bulletin of Academia Sinica 37: 255–259.
Chin, J. H., Gamuyao, R., Dalid, C., Bustamam, M., Prasetiyono, J., Moeljopawiro, S., Wissuwa,
M., and Heuer, S., 2011 Developing rice with high yield under phosphorus deficiency: Pup1
sequence to application.Plant Physiol 156: 1202–1216.
Chin, J. H., Lu, X., Haefele, S. M., Gamuyao, R., Ismail, A., Wissuwa, M., and Heuer, S., 2010
Development and application of gene-based markers for the major rice QTL Phosphorus uptake
1.Theor Appl Genet 120: 1073–1086.
Cho, Y., Jiang, W., Chin, J. H., Piao, Z., Cho, Y. G., McCouch, S. R., and Koh, H. J., 2007
Identification of QTLs associated with physiological nitrogen use efficiency in rice. Molecules
and Cells 23: 72–79.
Cordell, D., Drangert, J. O., and White, S., 2009 The story of phosphorus: Global food security
and food for thought. Global Environmental Change19: 292–305.
Crawford, N. M., and Glass, A. D. M., 1998 Molecular and physiological aspects of nitrate uptake
in plants.Trends Plant Sci 3: 389–395.
Dawe, D., 2000 The potential role of biological nitrogen fixation in meeting future demand for
rice and fertilizer. In: Ladha, J. K., and Reddy, P. M., ed. The quest for nitrogen fixation in rice.
International Rice Research Institute, Philippines. pp. 1–9.
DeDatta, S. K., and Broadbent, F. E., 1988 Methodology for evaluating nitrogen utilization
efficiency by rice genotypes. Agronomy J 80: 793–798.
94
DeDatta, S. K., 1981 Principles and practice of rice production. Singapore: John Wiley and Sons,
348–419.
Ding, Z., Wang, C., Chen, S., and Yu, S., 2011 Diversity and selective sweep in the OsAMT1;1
genomic region of rice. BMC Evolutionary Biology 11: 61.
Dobermann, A., and Fairhurst, T., 2000 Rice: Nutrient Disorders & Nutrient Management.
Potash & Phosphate Institute, Potash & Phosphate Institute of Canada and International Rice
Research Institute.
Dubois, F., Tercé-Laforgue, T., Gonzalez-Moro, M. B., Estavillo, J. M., Sangwan, R., Gallais, A.,
and Hirel, B., 2003 Glutamate dehydrogenase in plants: Is there a new story for an old enzyme?
Plant Physiology and Biochemistry 41: 565–576.
Epstein, E., and Bloom, A. J., 2005 Mineral Nutrition of Plants: Principles and Perspectives, 2nd
edn. Sunderland: Sinauer Associates.
Fan, J. B., Zhang, Y. L., Turner, D., Duan, Y. H., Wang, D. S., and Shen, Q. R., 2010 Root
physiological and morphological characteristics of two rice cultivars with different nitrogen-use
efficiency. Pedosphere 20: 446–455.
Fang, P., Wu, P., 2001 QTL × N-level interaction for plant height in rice (Oryza sativa L.).Plant
and Soil 236: 237–242.
FAO. 2003 Medium-term prospects for agricultural commodities - Projections to the year 2010.
Rome: Food and Agriculture Organization of the United Nations, Italy.
FAO 2008 Current world fertilizer trends and outlook to 2011/12. Rome: Food and Agriculture
Organization of the United Nations, Italy.
95
Feng, H., Yan, M., Fan, X., Li, B., Shen, Q., Miller, A. J., and Xu, G., 2011 Spatial expression and
regulation of rice high-affinity nitrate transporters by nitrogen and carbon status. J Exp Bot 62 :
2319–2332.
Feng, Y., Cao, L. Y., Wu, W. M., Shen, X. H., Zhan, X. D., Zhai, R. R., Wang, R. C., Chen, D. B., and
Cheng, S. H., 2010 Mapping QTLs for nitrogen-deficiency tolerance at seedling stage in rice
(Oryza sativa L.).Plant Breeding 129: 652–656.
Forde, B. G., and Clarkson, D. T., 1999 Nitrate and ammonium nutrition of plants: physiological
and molecular perspectives. Advances in Botanical Research 30: 1–90.
Franco-Zorrilla, J. M., Valli, A., Todesco, M., Mateos, I., Puga, M. I., Rubio-Somoza, I., Leyva, A.,
Weigel, D., García, J. A., and Paz-Ares, J., 2007 Target mimicry provides a new mechanism for
regulation of microRNA activity. Nature Genet 39: 1033–1037.
Fu, J. D., and Lee, B. W., 2008 Changes in photosynthetic characteristics during grain filling of
functional stay-green rice SNUSG1 and its F1 hybrids. J Crop Sci Biotechnology 11: 75–82.
Fu, J. D., Yan, Y. F., Kim, M. Y., Lee, S. H., and Lee, B. W., 2011 Population-specific quantitative
trait loci mapping for functional stay-green trait in rice (Oryza sativa L.). Genome 54: 235–243.
Fukao, T., and Bailey-Serres, J., 2008 Submergence tolerance conferred by Sub1A is mediated
by SLR1 and SLRL1 restriction of gibberellin responses in rice. Proc Natl Acad Sci USA 105:
16814–16819.
Gahoonia, T. S., and Nielsen, N. E., 2004 Root traits as tools for creating phosphorus efficient
varieties. Plant Soil 260: 47–57.
Gazzarini, S., Lejay, L., Gojon, A., Ninnemann, O., Frommer, W. B., and von Wiren, N., 1999
Three functional transporters for constitutive, diurnally regulated, and starvation-induced
uptake of ammonium into Arabidopsis roots.Plant Cell 11: 937–948.
96
Gerendas, J. G., Zhu, Z., and Sattelmacher, B., 1998 Influence of N and Ni supply on nitrogen
metabolism and urease activity in rice (Oryza sativa L.). J Exp Bot 49: 1545–1554.
Glass, A. D. M., Britto, D. T., Kaiser, B. N., Kronzucker, H. J., Kumar, A., Okamoto, M., Rawat, S.
R., Siddiqi, M. Y., Silim, S. M., Vidmar, J. J., and Zhuo, D., 2001 Nitrogen transport in plants, with
an emphasis on the regulation of fluxes to match plant demand.J Plant Nutrition Soil Sci 164:
199–207.
Gulati, A., and Narayanan, S., 2002 Rice trade liberalization and poverty. Washington:
International Food Policy Research Institute.
Habash, D. Z., Massiah, A. J., Rong, H. L., Wallsgrove, R. M., and Leigh, R. A., 2001 The role of
cytosolic glutamine synthetase in wheat. Annals Appl Biol 138: 83–89.
Haefele, S. M., Wopereis, M. C. S., 2005 Spatial variability of indigenous supplies for N, P and K
and its impact on fertilizer strategies for irrigated rice in West Africa. Plant and Soil 270: 57–72.
Hajiboland, R., Afiasgharzad, N., Barzeghar, R., 2009 Influence of arbuscular mycorrhizal fungi
on uptake of Zn and P by two contrasting rice genotypes. Plant Soil Environm 55: 93–100.
Hammond, J. P., Broadley, M. R., White, P. J., 2004 Genetic responses to phosphorus deficiency.
Annals Botany 94: 323–332.
Hammond, J. P., White, P. J., 2011 Sugar signaling in root responses to low phosphorus
availability. Plant Physiol 156: 1033–1040.
Hawkesford, M. J., Howarth, J.R., 2011 Transcriptional profiling approaches for studying
nitrogen use efficiency. In: Foyer C and Zhang H (ed.) Nitrogen Metabolism in Plants in the Post-
genomic Era, Blackwell Publishing Ltd, Annual Plant Reviews 42: 41–62.
97
Hayakawa, T., Yamaya, T., Mae, T., Ojima, K., 1993 Changes in the content of two glutamate
synthase proteins in spikelets of rice (Oryza sativa) plants during ripening.Plant Physiol 101:
1257–1262.
Hernández, G., Ramírez, M., Valdés-López, O., Tesfaye, M., Graham, M. A., Czechowski, T.,
Schlereth, A., Wandrey, M., Erban, A., Cheung, F., Wu, H. C., Lara, M., Town, C. D., Kopka, J.,
Udvardi, M. K., Vance, C.P., 2007 Phosphorus stress in common bean: Root transcript and
metabolic responses. Plant Physiol 144: 752–767
Heuer, S., Lu, X., Chin, J. H., Tanaka, J. P., Kanamori, H., Matsumoto, T., De Leon, T., Ulat, V. J.,
Ismail, A. M., Yano, M., Wissuwa, M., 2009 Comparative sequence analyses of the major
quantitative trait locus phosphorus uptake 1 (Pup1) reveal a complex genetic structure. Plant
Biotech J: 456–457.
Hirel, B., Lea, P. J., 2002 The biochemistry, molecular biology and genetic manipulation of
primary ammonia assimilation. In: Foyer H, Noctor G, eds. Photosynthetic nitrogen assimilation
and associated carbon and respiratory metabolism. Kluwer Academic Netherlands, 71–92.
Hoshida, H., Tanaka, Y., Hibino, T., Hayashi, Y., Tanaka, A., Takabe, T., Takabe, T., 2000
Enhanced tolerance to salt stress in transgenic rice that overexpresses chloroplast glutamine
synthetase.Plant Mol Bio l43: 103–111.
Hossain, M. F., White, S. K., Elahi, S. F., Sultana, N., Choudhury, N. M. K., Alam, Q. K., Rother, J.
A., Gaunt, J. L., 2005 The efficiency of nitrogen fertiliser for rice in Bangladeshi farmers’ fields.
FieldCrops Res 93: 94–107.
Howitt, S., Udvardi, M. K., 2000 Structure, function and regulation of ammonium transporters in
plants.Biochimica et Biophysica Acta 1465: 152–170.
98
Hu, B., Zhu, C., Li, F., Tang, J., Wang, Y., Lin, A., Liu, L., Che, R., Chu, C., 2011 LEAF TIP
NECROSIS1 plays a pivotal role in the regulation of multiple phosphate starvation responses in
rice. Plant Physiol 156: 1101–1115.
Huang, Y. Z., Feng, Z. W., Zhang, F. Z., 2000 Study on loss of nitrogen fertilizer from agricultural
fields and countermeasure. J Graduate School Academia Sinica 17: 49–58.
Imsande, J., Touraine, B., 1994 N demand and the regulation of nitrate uptake.Plant Physiol
105: 3–7.
Ishimaru K, Kobayashi N, Ono K, Yano M, Ohsugi R. 2001. Are contents of Rubisco, soluble
protein and nitrogen in flag leaves of rice controlled by the same genetics? J Exp Bot52: 1827–
1833.
Ishiyama K, Kojima S, Takahashi H, Hayakawa T, Yamaya T. 2003. Cell type distinct accumulation
of mRNA and protein for NADH-dependent glutamate synthase in rice roots in response to the
supply of NH4+. Plant Physiol Biochem41: 643–647.
Ismail AM, Heuer S, Thomson MJ, Wissuwa M. 2007.Genetic and genomic approaches to
develop rice germplasm for problem soils.Plant Mol Biol65: 547–570
Ji TW, Fang P, XING YZ, JIA XM. 2005. Mapping quantitative trait loci for associative nitrogen
fixation ability in rhizosphere of rice seedling. Plant Nutrition Fertilizer Sci11:394-398.
Jia H, Ren H, Gu M, Zhao J, Sun S, Zhang X, Chen J, Wu P, Xu G. 2011. The phosphate transporter
gene OsPht1;8 is involved in phosphate homeostasis in rice. Plant Physiol156: 1164-1175.
Kamachi K, Yamaya T, Mae T, Ojima K. 1991. A role for glutamine synthetase in the
remobilisation of leaf nitrogen during natural senescence in rice leaves. Plant Physiol96: 411–
417.
99
Kant S, Bi YM, Rothstein SJ. 2011. Understanding plant response to nitrogen limitation for the
improvement of crop nitrogen use efficiency. J Exp Bot 62: 1499–1509
Karandashov V, Bucher M. 2005.Symbiotic phosphate transport in arbuscular
mycorrhizas.Trends in Plant Science10: 22–29.
Kronzucker HJ, Kirk GJD, Siddiqi MY, Glass ADM. 1998.Effects of hypoxia on 13NH4+ fluxes in
rice roots: Kinetics and compartmental analysis.Plant Physiol 116: 581–587.
Kronzucker HJ, Siddiqi MY, Glass ADM, Kirk GJD. 1999. Nitrate-ammonium synergism in rice: A
subcellular flux analysis. Plant Physiol119: 1041–1045.
Laffite HR, Edmeades GO. 1994a. Improvement for tolerance to low soil nitrogen in tropical
maize. I. Selection criteria. Field Crops Res39: 1–14.
Laffite HR, Edmeades GO. 1994b. Improvement for tolerance to low soil nitrogen in tropical
maize. II. Grain yield, biomass production, and N accumulation. Field Crops Res39:15–25.
Larson SR, Rutger JN, Young KA, Raboy V. 2000.Isolation and genetic mapping of a non-lethal
rice (Oryza sativa L.) low phytic acid 1 mutation.Crop Sci40: 1397–1405.
Laungani R, Knops JMH. 2009. Species-driven changes in nitrogen cycling can control plant
invasions. Proc Nat Acad Sci USA106: 12400–12405.
Laza MR, Kondo M, Ideta O, Barlaan E, Imbe T. 2006. Identification of quantitative trait loci for
δ13C and productivity in irrigated lowland rice. Crop Sci46: 763–773.
Lea PJ, Miflin BJ. 2011. Nitrogen assimilation and its relevance to crop improvement. In: Foyer C
and Zhang H (ed.) Nitrogen Metabolism in Plants in the Post-genomic Era, Blackwell Publishing
Ltd, Annual Plant Reviews42: 1–40.
100
Lea PJ. 1997. Primary nitrogen metabolism. In: Dey PM, Harborne JB, eds. Plant biochemistry.
London: Academic Press, 273–313.
Li BZ, Merrick M, Li SM, Li HY, Zhu SW, Shi WM, Su YH. 2009b. Molecular basis and regulation of
ammonium transporter in rice.Rice Sci16: 314–322.
Li HY, Zhu YG, Marschner P, Smith FA, Smith SE. 2005. Wheat responses to arbuscular
mycorrhizal fungi in a highly calcareous soil differ from those of clover, and change with plant
development and P supply. Plant and Soil277: 221–232.
Li J, Xie Y, Dai A, Liu L, Li Z. 2009a. Root and shoot traits responses to phosphorus deficiency
and QTL analysis at seedling stage using introgression lines of rice. J Genet Gen36: 173–183.
Li K, Xu C, Li Z, Zhang K, Yang A, Zhang J. 2008. Comparative proteome analyses of phosphorus
responses in maize (Zea mays L.) roots of wild-type and a low-P-tolerant mutant reveal root
characteristics associated with phosphorus efficiency. Plant J55: 927–939.
Li ZK, Arif M, Zhong DB, Fu BY, Xu JL, Domingo-Rey J, Ali J, Vijayakumar CHM, Yu SB, Khush GS.
2006. Complex genetic networks underlying the defensive system of rice (Oryza sativa L.) to
Xanthomonas oryzaepv. oryzae. Proc Natl Acad SciUSA 103: 7994–7999.
Lian X, Wang S, Zhang J, Feng Q, Zhang L, Fan D, Li X, Yuan D, Han B, Zhang Q. 2006. Expression
profiles of 10,422 genes at early stage of low nitrogen stress in rice assayed using a cDNA
microarray. Plant Mol Biol60: 617–631.
Lian X, Xing Y, Yan H, Xu C, Li X, Zhang Q. 2005. QTLs for low nitrogen tolerance at seedling
stage identified using a recombinant inbred line population derived from an elite rice hybrid.
Theor Appl Genet112: 85–96.
Lin CM, Koh S, Stacey G, Yu SM, Lin TY, Tsay YF. 2000. Cloning and functional characterization of
a constitutively expressed nitrate transporter gene, OsNRT1, from rice. Plant Physiol122: 379–
388.
101
Liu QL, Xu XH, Ren XL, Fu HW, Wu DX, Shu QY. 2007. Generation and characterization of low
phytic acid germplasm in rice (Oryza sativa L.). Theor Appl Genet114: 803–814.
Loqué D, von Wirén N. 2004. Regulatory levels for the transport of ammonium in plant roots. J
Expl Bot55: 1293–1305.
Lott JNA, Bojarski M, Kolasa J, Batten GD, Campbell LC. 2009. A review of the phosphorus
content of dry cereal and legume crops of the world. Internat J Agricult Resources, Governance
and Ecology8: 351–370.
Lott JNA, Ockenden I, Raboy V, Batten GD. 2000. Phytic acid and phosphorus in crop seeds and
fruits: a global estimate. Seed Sci Res10: 11–33.
Lynch JP. 2007.Roots of the Second Green Revolution. Turner Review No. 14, Austral J Bot 55:
493–512.
MacDonald GK, Bennett EM, Potter PA, Ramankutty N. 2011. Agronomic phosphorus
imbalances across the world's croplands.Proc Nat Acad Sci USA 108: 3086–3091.
MacMillan K, Emrich K, Piepho H-P, Mullins CE, Price AH. 2006. Assessing the importance of
genotype × environment interaction for root traits in rice using a mapping population II:
conventional QTL analysis. Theor Appl Genet113: 953–964.
Mae T, Ohira K. 1981. The remobilization of nitrogen related to leaf growth and senescence in
rice plants (Oryza sativa L.). Plant Cell Physiol22:1067–1074.
Mae T. 1997. Physiological nitrogen efficiency in rice: Nitrogen utilization, photosynthesis, and
yield potential. Plant and Soil196: 201–210.
Mackill DJ, Ismail AM, Singh US, Labios RV, Paris TR. 2012. Development and rapid adoption of
submergence-tolerant (Sub1) rice varieties. Adv Agron115: 299–352.
102
Maiti D, Singh RK, Variar M. 2012. Rice-based crop rotation for enhancing native arbuscular
mycorrhizal (AM) activity to improve phosphorus nutrition of upland rice (Oryza sativa L.). Biol
Fertil Soils48: 67–73.
Marschner H, Kirby EA, Engels C. 1997.Importance of cycling and recycling of mineral nutrients
within plants for growth and development.Botanical Acta110: 265–273.
Marschner H, Römheld V, Horst WJ, Martin P. 1986. Root-induced changes in the rhizosphere:
importance of the mineral nutrition in plants. Zeitschrift für Pflanzenernährung und
Bodenkdunde 149: 441–456.
Martin A, Lee J, Kichey T, Gerentes D, Zivy M, Tatout C, Dubois F, Balliau T, Valot B, Davanture
M, Tercé-Laforgue T, Quilleré I, Coque M, Gallais A, Gonzalez-Moro MB, Bethencourt L, Habash
DZ, Lea PJ, Charcosset A, Perez P, Murigneux A, Sakakibara H, Edwards KJ, Hirel B. 2006. Two
cytosolic glutamine synthetase isoforms of maize are specifically involved in the control of grain
production. Plant Cell8: 3252–3274.
Martin T, Oswald O, Graham IA. 2002. Arabidopsis seedling growth, storage lipid mobilization,
and photosynthetic gene expression are regulated by carbon: nitrogen availability. Plant
Physiol128: 472–481.
Masclaux C, Quillere I, Gallais A, Hirel B. 2001. The challenge of remobilisation in plant nitrogen
economy: A survey of physio-agronomic and molecular approaches. Annals Appl Biol138: 69–
81.
Matson PA, Naylor R, Ortiz-Monasterio I. 1998.Integration of environmental, agronomic, and
economic aspects of fertilizer management.Science280: 112–115.
Meyer C, Stitt M. 2001. Nitrate reduction and signaling. In: Lea PJ, Morot-Gaudry J-F, eds. Plant
nitrogen, Berlin: Springer, pp37–59.
103
Morcuende R, Bari R, Gibon Y, Zheng W, Pant BD, Blasing O, Usadel B, Czechowski T, Udvardi
MK, Stitt M, Scheible WR. 2007. Genome-wide reprogramming of metabolism and regulatory
networks of Arabidopsis in response to phosphorus. Plant Cell Environ30: 85–112.
Muruli BI, Paulsen GM. 1981. Improvement of nitrogen use efficiency and its relationship to
other traits in maize. Maydica26: 63–73.
Ni JJ, Wu P, Senadhira D, Huang N. 1998.Mapping QTLs for phosphorus deficiency tolerance in
rice (Oryza sativa L.).Theor Appl Genet97:1361–1369
Nilsson L, Műller R, Nielsen TH. 2010. Dissecting the plant transcriptome and the regulatory
responses to phosphate deprivation. Physiol Plantarum139: 129–143.
Obara M, Kajiura M, Fukuta Y, Yano M, Hayashi M, Yamaya T, Sato T. 2001. Mapping of QTLs
associated with cytosolic glutamine synthetase and NADH-glutamate synthase in rice (Oryza
sativa L.). J Expl Bot52: 1209–1217.
Obara M, Sato T, Sasaki S, Kashiba K, Nagano A, Nakamura I, Ebitani T, Yano M, Yamaya T. 2004.
Identification and characterization of a QTL on chromosome 2 for cytosolic glutamine
synthetase content and panicle number in rice. Theor Appl Gen110: 1–11.
Ortiz-Monasterio JI, Manske GGB, van Ginkel M. 2001. Nitrogen and phosphorus use efficiency.
In: Reynolds MP, Ortiz-Monasterio JI, McNab A, eds. Application of physiology in wheat
breeding. CIMMYT, Mexico, 200–207.
Pariasca-Tanaka J, Satoh K, Rose T, Mauleon R, Wissuwa M. 2009. Stress response versus stress
tolerance: a transcriptome analysis of two rice lines contrasting in tolerance to phosphorus
deficiency. Rice2: 167–185.
Park MR, Tyagi K, Baek SH, Kim YJ, Rehman S, Yun SJ. 2010. Agronomic characteristics of
transgenic rice with enhanced phosphate uptake ability by overexpressed tobacco high affinity
phosphate transporter. Pakistan J Bot42: 3265–3273.
104
Paszkowski U, Kroken S, Roux C, Briggs SP. 2002. Rice phosphate transporters include an
evolutionarily divergent gene specifically activated in arbuscular mycorrhizal symbiosis. Proc
Nat Acad Sci USA 99: 13324–13329.
Peng S, Bouman BAM. 2007. Prospects for genetic improvement to increase lowland rice yields
with less water and nitrogen. In: Spiertz JHJ, Struik PC, van Laar HH, eds. Scale and complexity in
plant systems research: Gene-Plant-Crop relations. Springer, 251–266.
Peng S, Khush GS, Cassman KG. 1994. Evolution of the new plant ideotype for increased yield
potential. In: Cassman KG, ed. Breaking the Yield Barrier. Manila: International Rice Research
Institute, Philippines, 57–60.
Piao Z, Li M, Li P, Zhang J, Zhu C, Wang H, Luo Z, Lee J, Yang R. 2009. Bayesian dissection for
genetic architecture of traits associated with nitrogen utilization efficiency in rice. African J
Biotechnol8: 6834–6839.
Raboy V. 2009. Approaches and challenges to engineering seed phytate and total phosphorus.
Plant Sci177: 281–296.
Rae AL, Jarmey JM, Mudge SR, Smith FW. 2004. Over-expression of a high-affinity phosphate
transporter in transgenic barley plants does not enhance phosphate uptake rates. Functional
Plant Biol31: 141–148.
Ramaekers L, Remans R, Rao IM, Blair MW, Vanderleyden J. 2010.Strategies for improving
phosphorus acquisition efficiency of crop plants.Field Crops Res117: 169–176.
Ramalingam J, Vera Cruz CM, Kukreja K, Chittoor JM, Wu JL, Lee SW, Baraoidan M, George ML,
Cohen MB, Hulbert SH, Leach JE, Leung H. 2003.Candidate defense genes from rice, barley, and
maize and their association with qualitative and quantitative resistance in rice.Mol Plant-
Microbe Interact16:14–24.
105
Raven JA, Taylor R. 2003. Macroalgal growth in nutrient enriched estuaries: a biogeochemical
perspective. Water, Air and Soil Pollution: Focus3: 7–26.
Reymond M, Svistoonoff S, Loudet O, Nussaume L, Desnos T. 2006.Identification of QTL
controlling root growth response to phosphate starvation in Arabidopsis thaliana.Plant Cell
Environ29: 115–125.
Rouached H, Stefanovic A, Secco D, Arpat AB, Gout E, Bligny R, Poirier Y. 2011. Uncoupling
phosphate deficiency from its major effects on growth and transcriptome via PHO1 expression
in Arabidopsis.Plant J65: 557–570.
Rubio V, Linhares F, Solano R, Martin AC, Iglesias J, Leyva A, Paz-Ares J. 2001. A conserved MYB
transcription factor involved in phosphate starvation signaling both in vascular plants and in
unicellular algae. Genes Dev15: 2122–2133.
Sawers RJ, Gutjahr C, Paszkowski U. 2008. Cereal mycorrhiza: an ancient symbiosis in modern
agriculture. Trends Plant Sci13: 93–97.
Schachtman DP, Reid RJ, Ayling SM. 1998. Phosphorus Uptake by Plants: From Soil to Cell. Plant
Physiol116: 447–453.
Senthilvel S, Govindaraj P, Arumugachamy S, Latha R, Malarvizhi P, Gopalan A, Maheswaran M.
2004. Mapping genetic loci associated with nitrogen use efficiency in rice (Oryza sativa L.).
Brisbane: 4th International Crop Science Congress, Queensland, Australia.
Senthilvel S, Vinod KK, Malarvizhi P, Maheswaran M. 2008. QTL and QTL×Environment effects
on agronomic and nitrogen acquisition traits in rice. J Integrat Plant Biol50: 1108–1117.
Septiningsih EM, Pamplona AM, Sanchez DL, Neeraja CN, Vergara GV, Heuer S, Ismail AM,
Mackill DJ. 2009. Development of submergence-tolerant rice cultivars: the Sub1 locus and
beyond. Annals Bot103: 151–160.
106
Sheehy JE, Dionora MJA, Mitchell PL, Peng S, Cassman KG, Lemaire G, Williams RL. 1998. Critical
nitrogen concentrations: implications for high-yielding rice (Oryza sativa L.) cultivars in the
tropics. Field Crops Res59: 31–41.
Shimizu A, Yanagihara S, Kawasaki S, Ikehashi H. 2004. Phosphorus deficiency-induced root
elongation and its QTL in rice (Oryza sativa L.). Theor Appl Genet109:1361–1368
Shimizu A, Kato K, Komatsu A, Motomura K, Ikehashi H. 2008. Genetic analysis of root
elongation induced by phosphorus deficiency in rice (Oryza sativa L.): Fine QTL mapping and
multivariate analysis of related traits. Theor Appl Genet117:987–996
Singh N, Dang TTM, Vergara GV, Pandey DM, Sanchez D, Neeraja CN, Septiningsih EM,
Mendioro M, Tecson-Mendoza EM, Ismail AM, Mackill DJ, Heuer S. 2010.Molecular marker
survey and expression analyses of the rice submergence-tolerance gene SUB1A.Theor Appl
Genet121: 1441–1453.
Singh U, Ladha JK, Castillo EG, Punzalan G, Tirol-Padre A, Duqueza M. 1998. Genotypic variation
in nitrogen use efficiency in medium- and long-duration rice. Field Crops Res58: 35–53.
Solaiman MZ, Hirata H. 1997. Effect of arbuscular mycorrhizal fungi inoculation of rice seedlings
at the nursery stage upon performance in the paddy field and greenhouse.Plant Soil 191: 1–12.
Srividya A, Vemireddy LR, Hariprasad AS, Jayaprada M, Sridhar S, Ramanarao PV, Anuradha G,
Siddiq EA. 2010. Identification and Mapping of Landrace Derived QTL Associated with Yield and
its Components in Rice under Different Nitrogen Levels and Environments. Internat J Plant
Breed Genet4: 210–227.
Stitt M. 1999. Nitrate regulation of metabolism and growth. Current Opinion Plant Biol2: 178–
186.
Swamy BPM, Vikram P, Dixit S, Ahmed HU, Kumar A. 2011. Meta-analysis of grain yield QTL
identified during agricultural drought in grasses showed consensus. BMC Genomics12: 319.
107
Tabuchi M, Sugiyama K, Ishiyama K, Inoue E, Sato T, Takahashi H, Yamaya T. 2005. Severe
reduction in growth rate and grain filling of rice mutants lacking OsGS1;1, a cytosolic glutamine
synthetase1;1. The Plant Journal42: 641–651.
Tabuchi M, Abiko T, Yamaya T. 2007.Assimilation of ammonium ions and reutilization of
nitrogen in rice (O. sativa L.).J Exp Bot58: 2319–2327.
Thomson MJ, Ocampo M, Egdane J, Rahman MA, Sajise AG, Adorada DL, Tumimbang-Raiz E,
Blumwald E, Seraj ZI, Singh RK, Gregorio GB, Ismail AM. 2010. Characterizing the Saltol
quantitative trait locus for salinity tolerance in rice. Rice3: 148–160.
Tirol-Padre A, Ladha JK, SinghU, Laureles E, Punzalan G, Akita S. 1996.Grain yield performance
of rice genotypes at suboptimal levels of soil N as affected by N uptake and utilization
efficiency.Field Crops Res46: 127–143.
Tobin AK, Yamaya T. 2001.Cellular compartmentation of ammonium assimilation in rice and
barley.J Exp Bot53: 591–604.
Tong H, Chen L, Li W, Mei H, Xing Y, Yu X, Xu X, Zhang S, Luo L. 2011.Identification and
characterization of quantitative trait loci for grain yield and its components under different
nitrogen fertilization levels in rice (Oryza sativa L.).Mol Breeding28: 495–509.
Tong HH, Mei HW, Yu XQ, Xu XY, Li MS, Zhang SQ, Luo LJ. 2006. Identification of related QTLs at
late developmental stage in rice (Oryza sativa L.) under two nitrogen levels. Acta Genetica
Sinica33: 458–467.
Travis T. 1993. The Haber-Bosch process: exemplar of 20th century chemical industry.
Chemistry and Industry15: 581–585.
UNFPA. 2011. The State of World Population 2011: People and possibilities in a world of 7
billion. United Nations Population Fund, New York. 124p.
108
van Kauwenbergh SJ. 2010. World phosphate rock reserves and resources. International
Fertilizer Development Center, IFDC Technical Bulletin No. 75. Muscle Shoals, Alabama, USA, 58
pp.
Vikram P, Swamy BPM, Dixit S, Ahmed HU, Cruz MTS, Singh AK, Kumar A. 2011.qDTY1.1, a
major QTL for rice grain yield under reproductive-stage drought stress with a consistent effect
in multiple elite genetic backgrounds.BMC Genet12:89
Vinod KK, Meenakshisundaram P, Maheswaran M, Malarvizhi P, Gopalan A. 2011. Quantitative
trait loci for shoot biomass under low nitrogen stress may be putatively linked to loci
influencing glutamine synthetase (GS) activity in rice (Oryza sativa L.). International symposium
on plant biotechnology towards tolerance to stresses and enhancing yield. Ranchi: Birla
Institute of Technology.
Von Braun J. 2008. The food crisis isn’t over. Nature456: 701.
Wang H, Inukai Y, Yamauchi A. 2006. Root development and nutrient uptake. Critical Reviews
Plant Sci25: 279–301.
Wang MY, Siddiqi MY, Ruth TJ, Glass ADM. 1993a.Ammonium uptake by rice roots. I. Fluxes and
subcellular distribution of 13NH4+. Plant Physiol 103: 1249–1258.
Wang WH, Köhler B, Cao FQ, Liu GW, Gong YY, Sheng S, Song QC, Cheng XY, Garnett T, Okamoto
M, Qin R, Mueller-Roeber B, Tester M, Liu LH. 2012. Rice DUR3 mediates high-affinity urea
transport and plays an effective role in improvement of urea acquisition and utilization when
expressed in Arabidopsis. New Phytologist 193: 432–444.
Wasaki J, Yonetani R, Shinano T, Kai M, Osaki M. 2003. Expression of the OsPI1 gene, cloned
from rice roots using cDNA microarray, rapidly responds to phosphorus status. New Phytol 158:
239–248.
109
Williams JF, Mutters RG, Greer CA, Horwath WR. 2010. Rice nutrient management in California.
Agriculture and Natural Resources, University of California.
Williams LE, Miller AJ. 2001. Transporters responsible for the uptake and partitioning of
nitrogenous solutes. Annual Review Plant Physiol Plant Mol Biol52: 659–688.
Wissuwa M, Wegner J, Ae N, Yano M. 2002. Substitution mapping of Pup1: a major QTL
increasing phosphorus uptake of rice from a phosphorus-deficient soil. Theor Appl Genet105:
890–897.
Wissuwa M, Yano M, Ae N. 1998.Mapping of QTLs for phosphorus-deficiency tolerance in rice
(Oryza sativa L.).Theor Appl Genet97: 777–783.
Witcombe JR, Hollington PA, Howarth CJ, Reader S, Steele KA. 2008. Breeding for abiotic
stresses for sustainable agriculture. Philosophical Transactions Royal Society B363: 703–716.
Witt C, Pasuquin JM, Sulewsk G. 2009.Predicting agronomic boundaries of future fertilizer
needs in AgriStats.Better Crops93: 16–18.
Wu P, Liao CY, Hu B, Yi KK, Jin WZ, Ni JJ, He C. 2000. QTLs and epistasis for aluminum tolerance
in rice (Oryza sativa L.) at different seedling stages.Theor Appl Genet 100: 1295–1303.
Wu P, Ma L, Hou X, Wang M, Wu Y, Liu F, Deng XW. 2003. Phosphate starvation triggers distinct
alterations of genome expression in Arabidopsis roots and leaves. Plant Physiol132: 1260–1271.
Xu K, Xu X, Fukao T, Canlas P, Maghirang-Rodriguez R, Heuer S, Ismail AM, Bailey-Serres J,
Ronald PC, Mackill DJ. 2006.Sub1A is an ethylene-response-factor-like gene that confers
submergence tolerance to rice. Nature 442: 705–708.
Yamaya T, Hayakawa T, Tanasawa K, Kamachi K, Mae T, Ojima K. 1992. Tissue distribution of
glutamate synthase and glutamine synthetase in rice leaves: occurrence of NADH-dependent
110
glutamate synthase protein and activity in the unexpanded, non-green leaf blades. Plant Physiol
100: 1427-1432.
Yan M, Fan X, Feng H, Miller AJ, SHEN Q, XU G. 2011. Rice OsNAR2.1 interacts with OsNRT2.1,
OsNRT2.2 and OsNRT2.3a nitrate transporters to provide uptake over high and low
concentration ranges. Plant Cell Environm34: 1360–1372.
Yi K, Wu Z, Zhou J, Du L, Guo L, Wu Y, Wu P. 2005.OsPTF1, a novel transcription factor involved
in tolerance to phosphate starvation in rice.Plant Physiol138: 2087–2096.
Yoo SC, Cho SH, Zhang H, Paik HC, Lee CH, Li J, Yoo JH, Lee BW, Koh HJ, Seo HS, Paek NC. 2007.
Quantitative trait loci associated with functional stay-green SNU-SG1 in rice. Molecules and
Cells24: 83-94.
Zhou J, Jiao F, Wu Z, Li Y, Wang X, He X, Zhong W, Wu P. 2008,OsPHR2 is involved in phosphate-
starvation signaling and excessive phosphate accumulation in shoots of plants. Plant
Physiol146: 1673–1686.
111
Plant Genome Databases and Comparison of Genome Sequences
Dr. J. Ramalingam, Professor Department of Plant Molecular Biology and Bioinformatics, CPMB&B,
Dr. S. Subashini, Senior Research Fellow Tamil Nadu Agricultural University, Coimbatore.
Aim:
To get familiarized with few plant genome databases and to compare the gene of interest in
those plants.
Description:
Genome Database- Rice
Database for Rice: Oryzabase
Oryzabase is an integrated database of rice genome resources. The Oryzabase consists
of the following parts:
(1) Strain stock information
(2) Mutants information
(3) Chromosome maps
(4) Gene dictionary
(5) Reference list
(6) Fundamental knowledge of rice science
(7) Featured links
Procedure:
1. Open Oryzabase using the URL http://www.nig.ac.jp/labs/PlantGen/english/home-e.html
2. Search for your gene of interest
112
3. Collect the information about its position, Tanscript info, Exon info or the peptide info
Gramene
Gramene is a curated, open-source, data resource for comparative genome analysis in the
grasses. Its goal is to facilitate the study of cross-species homology relationships using
information derived from public projects involved in genomic and EST sequencing, protein
structure and function analysis, genetic and physical mapping, interpretation of biochemical
pathways, gene and QTL localization and descriptions of phenotypic characters and mutations.
Gramene release 36 contains the following information:
1. Genomes
2. Maps and Markers
3. Proteins
4. Ontologies
5. Genes
6. QTL
7. Pathways
8. Diversity
9. Germplasm
10. Literature
11. Website.
Procedure:
1. Open Gramene database using the URL http://www.gramene.org/
2. Select the Genome as your search option.
3. Select any one species of Oryza.
113
4. Select the Chromosome and the location of the gene you are looking for.
5. In the resulting page click on the Locus and get the information about that particular
locus.
6. Then click on the Transcript info, Exon info or the peptide info of your interest.
Genome Database- Maize
Database for Maize: MaizeGDB
MaizeGDB (Maize Genetics and Genomics Database) is a community-oriented, long-term,
federally funded informatics service to researchers focused on the crop plant and model
organism Zea mays. Genetic, genomic, sequence, gene product, functional characterization,
literature reference, and person/organization contact information are among the data types
accessible through this database.
Procedure
1. Open MaizeGDB using the URL: http://www.maizegdb.org/
2. Select the All data option an enter the query of your interest and click go you can get all
information in the MaizeGDB database.
3. Analyze the phenotypic characters and the genes responsible for particular trait.
Genome Database- Arabidopsis
Database for Arabidopsis: TAIR
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and
molecular biology data for the model higher plant Arabidopsis thaliana. Data available from
TAIR includes the complete genome sequence along with gene structure, gene product
information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and
physical markers, publications, and information about the Arabidopsis research community.
114
Gene product function data is updated every two weeks from the latest published research
literature and community data submissions. TAIR also provides extensive link outs from our
data pages to other Arabidopsis resources.
Procedure:
1. Open the Arabidopsis database, The Arabidopsis Information Resource using the URL http://www.arabidopsis.org/
2. From search option, select gene
3. Find the details of the gene of your interest.
Genome Database- Soybean
Database for Soybean
SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for
professionally curated genetics, genomics and related data resources for soybean. SoyBase
contains the most current genetic, physical and genomic sequence maps integrated with
qualitative and quantitative traits. SoyBase also contains the well-annotated ‘Williams 82’
genomic sequence and associated data mining tools. The genetic and sequence views of the
soybean chromosomes and the extensive data on traits and phenotypes are extensively
interlinked. This allows entry to the database using almost any kind of available information,
such as genetic map symbols, soybean gene names or phenotypic traits. SoyBase is the
repository for controlled vocabularies for soybean growth, development and trait terms, which
are also linked to the more general plant ontologies.
Procedure:
1. Open Soybase - soybean genetics and genomics database using the URL http://soybase.org.
2. Search for gene in the Soybase toolbox provided.
115
Prediction of genes in genomic sequences
Dr.L. Arul, Associate Professor Department of Plant Molecular Biology and Bioinformatics, CPMB&B,
V. Srividhya, Senior Research Fellow Tamil Nadu Agricultural University, Coimbatore
Aim
To computationally predict protein encoding region in a given DNA sequence
Objectives
1. To predict open reading frames (ORFs) in a prokaryotic example
2. To predict the possible start, stop and intron/exon structures in a eukaryotic example.
And, evaluate the gene finding accuracy, as well.
Brief methodology
Gene finding in prokaryotes: Gene finding in prokaryotes is restricted to the search for
open reading frame (ORF) in a DNA sequence. It is that region of the prokaryotic genome which
could potentially encode a protein. An ORF is characterized by the occurrence of a start codon
“ATG” and ending with any of the three termination codons (TAA, TAG, and TGA). Depending
on the starting point, there are six possible ways (three on forward strand and three on reverse
strand) of translating any nucleotide sequence into amino acid sequence according to the
genetic code. These are called reading frames. ORFs are usually encountered when sifting
through pieces of DNA while trying to locate a gene. Since there are altered genetic codes even
among the prokaryotes, the ORF will be identified differently. A typical ORF finder will employ
algorithms based on existing genetic codes and on all possible reading frames.
Gene finding in eukaryotes: The gene structure and the gene expression mechanism in
eukaryotes are far more complicated than in prokaryotes. In typical eukaryotes, the region of
the DNA coding for a protein is usually not continuous. This region is composed of alternating
stretches of exons and introns. During transcription, both exons and introns are transcribed
116
onto the RNA, in their linear order. Thereafter, a process called splicing takes place, in which
the intron sequences are excised and discarded from the RNA sequence. The remaining RNA
segments, the ones corresponding to the exons, are ligated to form the mature mRNA strand.
Gene finding typically refers to the area of computational biology that is concerned with
algorithmically identifying features, usually in a genomic DNA. Gene finding is one of the first
and most important steps in understanding the genome of a species once it has been
sequenced. These include protein-coding genes as well as other functional elements such as
RNA genes and regulatory regions. Three types of information are used in gene prediction
process, i) Signals in the sequence, such as splice sites; ii) Content statistics, such as codon bias;
and iii) Similarity to known genes. The first two types have been used since the early days of
gene prediction, whereas similarity information has been used routinely only in recent years.
One of the reasons that the accuracy of gene-prediction programs have improved in the last
few years is the enormous increase in the number of examples of known coding sequences.
Following gene prediction, determining the function of gene(s) is vital and it is done primarily
through experimentation, although frontiers of bioinformatics research are making it
increasingly possible to predict the function of a gene based on its sequence alone.
Procedure
Step 1: Find manually the presence/absence of an ORF in the sequence below.
Find in which reading frame does ORF falls.
5’ – TGCTTGTCTGATGTGCTGGCTGACGTGATGACATTTGTAGTTGATAG – 3’
117
Step 2: Go to following URL www.ncbi.nlm.nih.gov/gorf/gorf.html. Identify the ORFs in the
given DNA sequences (Test sequence 2 & Test sequence 3)
Step 3: Predict gene using Rice HMM, http://rgp.dna.affrc.go.jp/RiceHMM/ for the “test
sequence 3”. Find out whether your prediction matches with the experimental information on
gene structure known to this sequence (GenBank AF323175).
118
Step 4: Predict gene using “Rice HMM” for the “test sequence 4”. Appreciate the deviations
between predictions and experimentally published results (AB050986). Consult your results
obtained in the light of the following questions, how many exons are predicted? Where do the
predicted exons start and end? If there is a difference, where does the difference lie?
Inference
Discuss your results with respect to the success in predicting genes given the prokaryotic and
eukaryotic examples.
119
Comparative Genomics
Mrs. N. Bharathi, Assistant Professor, Department of Plant Molecular Biology and Bioinformatics, CPMB&B,
Tamil Nadu Agricultural University, Coimbatore. [email protected]
Introduction
The rapidly emerging field of comparative genomics has yielded dramatic results.
Comparative genome analysis has become feasible with the availability of a number of
completely sequenced genomes. Comparison of complete genomes between organisms allow
for global views on genome evolution and the availability of many completely sequenced
genomes increases the predictive power in deciphering the hidden information in genome
design, function and evolution. The rapid progress in genome sequencing demands more
comparative analysis to gain new insights into evolutionary, biochemical, genetic, metabolic,
and physiological pathways. Comparative genomics is the direct comparison of complete
genetic material of one organism against that of another to gain a better understanding of how
species evolved and to determine the function of genes and noncoding regions in genomes. It
includes a comparison of gene number, gene content, and gene location, the length and
number of coding regions (called exons) within genes, the amount of non coding DNA in each
genome, and conserved regions maintained in both prokaryotic and eukaryotic groups of
organisms. Comparative genomics not only can trace out the evolutionary relationship between
organisms but also differences and similarities within and between species.
Applications:
Gene identification
Once genome correspondence is established, comparative genomics can aid gene
identification. Comparative genomics can recognize real genes based on their patterns of
nucleotide conservation across evolutionary time. With the availability of genome-wide
alignments across the genomes compared, the different ways by which sequences change in
120
known genes and in intergenic regions can be analyzed. The alignments of known genes will
reveal the conservation of the reading frame of protein translation (Fitch et al., 1970).
Regulatory motif discovery
Regulatory motifs are short DNA sequences about 6 to 15bp long that are used to
control the expression of genes, dictating the conditions under which a gene will be turned on
or off. Each motif is typically recognized by a specific DNA-binding protein called a transcription
factor (TF). A transcription factor binds precise sites in the promoter region of target genes in a
sequence-specific way, but this contact can tolerate some degree of sequence variation.
Thus, different binding sites may contain slight variations of the same underlying motif,
and the definition of a regulatory motif should capture these variations while remaining as
specific as possible. Comparative genomics provides a powerful way to distinguish regulatory
motifs from non-functional patterns based on their conservation. One such example is the
identification of TF DNA-binding motif using comparative genomics and denovo motif (Fitch et
al., 1995).
Other applications:
Comparative genomics has wide applications in the field of molecular medicine and
molecular evolution. The most significant application of comparative genomics in molecular
medicine is the identification of drug targets of many infectious diseases. For example,
comparative analyses of fungal genomes have led to the identification of many putative targets
for novel antifungal (Batzoglou et al., 2000). This discovery can aid in target based drug design
to cure fungal diseases in human. Comparative analysis of genomes of individuals with genetic
disease against healthy individuals may reveal clues of eliminating that disease. Comparative
genomics helps in selecting model organisms. A model system is a simple, idealized system that
can be accessible and easily manipulated. For example, a comparison of the fruit fly genome
with the human genome discovered that about 60 percent of genes are conserved between fly
and human.
121
Tools for Comparative Genomics
1. GLIMMER is a system for finding genes in microbial DNA, especially the genomes of
bacteria and archaea. GLIMMER (Gene Locator and Interpolated Markov ModelER) uses
interpolated Markov models to identify coding regions.
2. CMR The Comprehensive Microbial Resource (CMR) gives access to a central repository
of the sequence and annotation of all complete public prokaryotic genomes as well as
comparative genomics tools across all of the genomes in the database.
3. CisMols - (Cis-regulatory Modules) is a tool that identifies compositionally predicted cis-
clusters that occur in groups of co-regulated genes within each of their ortholog-pair
evolutionarily conserved cis-regulatory regions.
4. DAVID Bioinformatic Resources - The Database for Annotation, Visualization and
Integrated Discovery (DAVID) provides a comprehensive set of functional annotation
tools for investigators to understand biological meaning behind large list of genes.
5. DCODE.ORG The dcode.org website provides access to tools for comparative genomic
analyses developed by the Comparative Genomics Center at the Lawerence Livermore
National Laboratory. Tools include: zPicture, Mulan, eShadow, rVista, CREME, and the
ECR Browser
6. EnteriX- EnteriX is a collection of tools for viewing pairwise and multiple alignments for
bacterial genome sequences.
7. Mauve - Mauve is a stand-alone software tool for constructing multiple genome
alignments.
8. MicroFootPrinter - MicroFootPrinter identifies the conserved motifs in regulatory
regions of prokaryotic genomes using the phylogenetic footprinting program
FootPrinter.
9. Viral Bioinformatics - Viral Bioinformatics provides access to viral genomes and a variety
of tools for comparative genome analyses.
122
10. TIGR Software Tools - A list of open-source software packages available for free from
The Institute for Genomic Research (TIGR).
11. TraFaC- TraFaC (Transcription Factor Binding Site Comparison) is a tool that identifes
regulatory regions using a comparative sequence analysis approach.
References:
Fitch, W. M., et al., 1970 Syst Zool. 19:99 [PMID: 5449325].
Fitch, W. M., et al., Philos Trans, R., Soc Lond, B., 1995 BiolSci. 349:93 [PMID: 8748022].
Batzoglou, S., et al., 2000 Genome Res. 10:950 [PMID: 10899144].
Mao, L., & Zheng, W. J., 2006 BMC Bioinformatics. 7:s21 [PMID: 17217514].
Sivashankari, S., et al., 2007 Comparative genomics - A perspective Bioinformation. 1(9): 376-
378.
123
Microarray-based gene expression analysis
M. Mugilisai, IV yr. B. Tech (Bioinformatics) S. Saravanakumar, Senior Research Fellow
Dr.L. Arul, Associate Professor Department of Plant Molecular Biology and Bioinformatics, CPMB&B,
Tamil Nadu Agricultural University, Coimbatore [email protected]
Aim
To identify gene(s) those are differentially expressed under varied experimental
conditions in a microarray dataset.
Objectives
1. To download microarray data from Gene Expression Omnibus (GEO) database
2. To extract expression values for a select set of genes of choice from the microarray data
file using array id/element id of the gene(s)
3. Visualization of the differential expression pattern using MeV
Brief methodology
The key step in the differential expression analysis is “Clustering”. It basically group’s
similar samples or genes together. Hierarchical clustering (HCL) is a commonly used clustering
algorithm. A hierarchical clustering of samples is a tree representation of their relative similarity
in expression profiled of the features over a set of samples or groups. The hierarchical
clustering algorithm requires distance measure and a cluster linkage. The cluster distance
metric specifies the distance between two clusters. The default distance metric used for HCL is
Pearson correlation. The most common measure of correlation in statistics is the Pearson
Correlation, which shows the linear relationship between two variables. Correlation between
variables is a measure of how well the variables are related. Cluster linkage, this parameter is
used to indicate the cluster-to-cluster distances when constructing the hierarchical tree. In
124
single linkage, the distances are measured between each member of one cluster and each
member of the other cluster.
Multi Experiment Viewer (MeV) is the tool which is used for this purpose, it is an open
ware. MeV is a desktop application meant for the analysis, visualization and data mining of
large scale genomic data. It is a versatile microarray tool, for clustering, visualization,
classification, statistical analysis and biological theme discovery. MeV can be freely downloaded
from the site http://www.tm4.org/mev/.
Procedure
Step 1: Open GEO database (URL: http://www.ncbi.nlm.nih.gov/geo/) and enter the following
GSE ID (GSE6901) in the tab GEO accession.
125
Step 2: Download the “Series Matrix File” which contains the microarray experimental data
Step 3: Extracting the expression values of a few select gene(s) from the entire microarray data
file was carried out via searching the file for element/array id’s of the corresponding genes.
Towards this, firstly the array id/element ids need to be decoded. Since the present exercise
deal with the expression of rice genes, Rice Oligonucleotide Array Database (URL:
http://www.ricearray.org) will be used to find the array id/element ids.
Click, “Element Search” “Single Element Search”, enter the GenBanK id (eg. Os01g0818400)
of the gene(s) of interest (this information should be known prior to us).
Example GenBank id’s: Os01g0818400, Os01g0818600, Os01g0818800, Os01g0818900,
Os01g0819000 & (Os06g0681400, ubiquitin an internal control gene)
126
The following are the corresponding element Id’s for above said gene ids:
Os.24893.1.S1_a_at, Os.34757.1.S1_at, Os.54860.1.S1_at, Os.51620.1.S1_at, Os.25065.3.S1_at
& Os_Ubiquitin_M_f_at
Note: The element id is unique for each of the microarray platforms eg (Agilent, Affymetrix and
NSF etc.).
Step 4: Subsequently, the microarray expression data was extracted by copying the values into
a new file. A word search using “find” option has to be carried out as below.
127
Step 5: Instead of retaining the columns by GSM E ids, rename the column by experimental
descriptors for easy interpretation (eg. Drought stress or control)
Step 6: The new file carrying the extracted data should be save in text format (*.txt ,Text tab
delimited). Example file name: GSE 6901 data.txt
128
Step 7: Download MeV Tool from the URL: www.tm4.org/mev. After downloading the MeV,
open the folder and then click Windows batch file to run MeV
Step 8: Working in MeV
MeV, go to File Load Data, on the expression loader, select Browse tab
Choose your file Example: GSE 6901 data.txt
129
A diagram will pop up with lots of tiny green and red boxes as shown below. This is the Heat
map View. Each row represents a specific gene, whereas each column represents each sample,
or experiment. The lightest green boxes are the most “under expressed” genes, the brightest
red being the most “over expressed” genes. This is called result tree
Step 9: For Hierarchical Clustering (HCL), on the left top click the “Clustering” drop down menu.
Choose the first option, Hierarchical Clustering
130
Step 10: Choose 1- “Pearson correlation” and “Single linkage” method to get hierarchical
clustering output. To view the clusters, double click “HCL” on the navigation tree on the left
side of the window.
Inference
Analyze the heat map generated for up/down (differential) regulation of genes under the
given experimental conditions.
Note
Gene expression under specific conditions can also be analyzed by using .soft files from
GEO database.