
Training program on

Approaches to Plant Genomics March 4 - March 8, 2013

Coordinator Dr. P. Nagarajan

Professor & Head (DPMB&B)

Co-coordinator Dr. J. Ramalingam

Professor

Compiled by Dr. P. Nagarajan, Professor & Head

Dr. J. Ramalingam, Professor

Distributed Information Sub Centre (DISC) Department of Plant Molecular Biology and Bioinformatics

Centre for Plant Molecular Biology and Biotechnology Tamil Nadu Agricultural University

Coimbatore- 641003

TABLE OF CONTENTS

S.No Title Page No.

Theory Sessions

1. Plant Genome Databases 1

2. Application of Next Generation Sequencing in Plant Genomics 7

3. Gene annotation 14

4. Transcriptomics for miRNAs: Innovative Frontiers in Regulatory Genomics 18

5. Next Generation Sequencing Technology platform for development of SNP markers and association mapping studies 29

6. Functional genomics 38

7. Genome Mapping: Establishing route maps of Genomes 41

8. Nutrigenomics in Agriculture 55

9. Targeting Induced Local Lesions IN Genomes (TILLING): Application of Traditional mutagenesis for Plant Functional Genomics 60

10. Genomic approaches of nitrogen and phosphorus efficient rice 69

Practical Sessions

1. Plant Genome Databases and Comparison of Genome Sequences 111

2. Prediction of genes in genomic sequences 115

3. Comparative Genomics 119

4. Microarray-based gene expression analysis 123

Plant Genome Databases

Dr. J.Ramalingam, Professor Department of Plant Molecular Biology and Bioinformatics, CPMB&B,

Tamil Nadu Agricultural University, Coimbatore.

Databases can store vast amounts of data. A database processes the data stored in its files and transforms it into usable information when the need arises. Plant genomics research focuses on the analysis of plant genomes using bioinformatics techniques. To store and manage the resulting data, many databases have been created, each providing a data and information resource for an individual plant species. In addition, plant databases can provide a platform for integrative and comparative plant genome research. A number of plant databases are available that serve as repositories for morphological, marker and sequence data. The information on fully sequenced organisms such as rice, Arabidopsis, maize and soybean is valuable for any scientific work.

Rice

Rice (Oryza sativa L.) is the staple food for more than three billion people, over half the world’s population. It provides 27% of dietary energy and 20% of dietary protein in the developing world. About 90% of the world’s rice is grown and consumed in Asia (Zeigler & Barclay, 2008). Research into rice is crucial for the development of technologies that will increase productivity for farmers who rely on rice for their livelihood. Rice research is aimed at increasing yield and yield stability and at improving resistance to abiotic and biotic stresses. There exists a huge collection of rice germplasm carrying useful genes for these traits, which plant breeders have continuously exploited to develop improved rice varieties. Detailed documentation of the germplasm and varieties, supported by an easy and efficient search and retrieval system, is an absolute necessity considering the wide range of breeding activities taken up globally in rice and other crops. A few public and private databases carry molecular and genetic data for specific model species such as maize (Lawrence et al., 2005), Medicago (Cannon et al., 2005), rice (Sakata et al., 2002) and Arabidopsis (Poole, 2007); these cater mainly to investigative research. However, in applied varietal development programs, the breeder's familiarity with aspects such as pedigree, yield and levels of resistance to biotic and abiotic stresses is important for designing a successful hybridization program. TNAURice, a database on rice, was developed at this centre to serve as a ready reckoner for rice breeders; it is a compilation of the rice varieties developed at Tamil Nadu Agricultural University, Coimbatore.

The use of database technologies has drawn the attention of only a subset of the biological community; as of now, it is limited to a small sector of the scientific community. However, more and more individual institutions are generating experimental data on a large scale and need to develop and manage databases of their own. Effective use of information may strongly promote biological studies and may lead to many important findings. Plant germplasm resources play an important role in crop improvement programs. The use of, and benefits derived from, conserved germplasm are the sole criterion for assessing the genetic conservation program of a crop. The use of rice germplasm includes both direct use in rice production and indirect use as parents in rice breeding programs. At present, information on the available and evaluated germplasm is rarely documented in the form of a well-structured database. It is therefore time-consuming and laborious for individual researchers to extract and organize varietal information from sources such as notebooks, records or typed electronic pages. This primitive form of documentation is a major limitation that restricts the utility of varietal information in time and space. There is therefore an urgent need to consolidate rice germplasm information in a database system.
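The kind of search and retrieval system described above can be illustrated with a small relational table. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout, the column names, the function find_parents and the four sample rows are hypothetical and are not taken from TNAURice or any other database mentioned here.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE variety (
        name       TEXT PRIMARY KEY,
        pedigree   TEXT,
        yield_t_ha REAL,     -- grain yield, tonnes per hectare
        duration_d INTEGER,  -- crop duration, days
        resistance TEXT      -- stresses the variety tolerates
    )
    """
)
conn.executemany(
    "INSERT INTO variety VALUES (?, ?, ?, ?, ?)",
    [
        ("Variety A", "Parent 1 / Parent 2", 5.8, 110, "blast"),
        ("Variety B", "Parent 3 / Parent 4", 6.4, 135, "drought"),
        ("Variety C", "Parent 1 / Parent 5", 6.1, 120, "blast; salinity"),
        ("Variety D", "Parent 2 / Parent 6", 4.2, 105, "blast"),
    ],
)


def find_parents(conn, stress, min_yield):
    """Names of varieties tolerant to `stress` with yield >= min_yield, best first."""
    cursor = conn.execute(
        "SELECT name FROM variety "
        "WHERE resistance LIKE ? AND yield_t_ha >= ? "
        "ORDER BY yield_t_ha DESC",
        (f"%{stress}%", min_yield),
    )
    return [row[0] for row in cursor]


# Candidate parents for a blast-resistance cross yielding at least 5 t/ha.
print(find_parents(conn, "blast", 5.0))
```

A single query of this kind replaces a manual search through notebooks or typed pages, which is the limitation the paragraph above describes.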

A few rice genome databases are described below:

1. Oryzabase

2. Gramene

Oryzabase

Oryzabase (http://www.nig.ac.jp) is a comprehensive rice science database established in 2000 by the rice researchers' committee in Japan. The database was originally intended to gather as much knowledge as possible, ranging from classical rice genetics to recent genomics and from fundamental information to hot topics. Oryzabase consists of five parts: (1) genetic resource stock information, (2) gene dictionary, (3) chromosome maps, (4) mutant images, and (5) fundamental knowledge of rice science. The database is extensively cross-referenced to the major DNA sequence databases, literature databases and other plant databases in order to provide a wealth of information to rice researchers. It currently contains information on NBRP strains: 17,229; strains: 10,938; markers: 4,904; trait genes: 4,753; and references for new genes and sequences: 5,739.

Gramene

Gramene (http://www.gramene.org) is a curated, open-source, data resource for

comparative genome analysis in the grasses. Its goal is to facilitate the study of cross-species

homology relationships using information derived from public projects involved in genomic

and EST sequencing, protein structure and function analysis, genetic and physical mapping,

interpretation of biochemical pathways, gene and QTL localization and descriptions of

phenotypic characters and mutations.

The rice genome is more than a resource for understanding the biology of a single

species. It is a window into the structure and function of genes in other crop grasses as well.

Using rice as the sequenced reference genome, researchers can identify and understand the

relationships among genes, pathways and phenotypes in a wide range of grass species.

Extensive work over the past two decades has shown remarkably consistent conservation of

gene order within large segments of linkage groups in rice, maize, sorghum, barley, wheat, rye,

sugarcane and other agriculturally important grasses. A substantial body of data supports the

notion that the rice genome is substantially colinear at both large and short scales with other

crop grasses, opening the possibility of using rice synteny relationships to rapidly isolate and

characterize homologues in maize, wheat, barley and sorghum.

As an information resource, Gramene's purpose is to add value to data sets available within the public sector, facilitating researchers' ability to understand the rice genome and to leverage the rice genomic sequence for identifying and understanding corresponding genes, pathways and phenotypes in other crop grasses. This is achieved by building automated and curated relationships between rice and other cereals for both sequence and biology. These relationships are queried and displayed using controlled vocabularies and web-based displays. The controlled vocabularies (ontologies) currently in use include the Gene Ontology, Plant Ontology, Trait Ontology, Environment Ontology and Gramene Taxonomy Ontology. The web-based displays for phenotypes include the Genes and Quantitative Trait Loci (QTL) modules. Sequence-based relationships are displayed in the Genomes module using the genome browser adapted from Ensembl, in the Maps module using the comparative map viewer (CMap) from GMOD, and in the Proteins module. BLAST is used to search for similar sequences. Literature supporting all of the above data is organized in the Literature database.

Database for Arabidopsis - TAIR

The Arabidopsis Information Resource (TAIR) (http://www.arabidopsis.org) maintains a database of genetic and molecular biology data for Arabidopsis thaliana, a widely used model plant. TAIR is located at the Carnegie Institution for Science Department of Plant Biology, Stanford, California. Funding is provided by the National Science Foundation (Grant No. DBI-0850219). Additional support is provided by corporate and nonprofit organizations through the TAIR Sponsorship Program. TAIR collaborates with the Arabidopsis Biological Resource Center (ABRC) to enable researchers to search for and order stocks. The ABRC's mission is to acquire, preserve and distribute seed and DNA resources that are useful to the Arabidopsis research community. TAIR datasets can also be downloaded. In addition, TAIR provides news pages, information on the Arabidopsis Genome Initiative (AGI), Arabidopsis lab protocols and useful links.

Data available from TAIR include the complete genome sequence along with gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications, and information about the Arabidopsis research community. Gene product function data are updated every two weeks from the latest published research literature and community data submissions. TAIR also provides extensive linkouts from its data pages to other Arabidopsis resources. It currently holds the following records: AA sequence: 72,149; clone: 1,998,885; clone end: 1,883,475; gene: 76,850; database reference: 1,602; flanking sequence: 1,160,572; genetic marker: 4,087; nucleotide sequence: 2,766,322; germplasm: 547,838; protein domain: 27,203; restriction enzyme: 619; vector: 503.

Database for Maize: MaizeGDB

The Maize Genetics and Genomics Database (MaizeGDB) is a central repository for maize sequence, stock, phenotype, genotypic and karyotypic variation, and chromosomal mapping data. MaizeGDB represents the synthesis of all data previously available from ZmDB (Zea mays DataBase) and from MaizeDB, databases that have been superseded by MaizeGDB. MaizeGDB is a USDA/ARS-funded project that develops an effective interface for accessing these data and additional tools to make data analysis easier. Its long-term goal is to be a true next-generation online maize database. In addition, MaizeGDB provides contact information for over 2400 cooperative maize researchers, facilitating interactions between members of the rapidly expanding maize community.

MaizeGDB (http://www.maizegdb.org) is the community database for biological

information about the crop plant Zea mays ssp. mays. Genetic, genomic, sequence, gene

product, functional characterization, literature reference, and person/organization contact

information are among the data types accessible through this database. MaizeGDB provides

web-based tools for ordering maize stocks from several organizations including the Maize

Genetics Cooperation Stock Center and the North Central Regional Plant Introduction Station

(NCRPIS). Sequence searches yield records displayed with embedded links to facilitate ordering

cloned sequences from various groups including the Maize Gene Discovery Project and the

Clemson University Genomics Institute.

Database for Soybean - Soybean Genome Database (SoyGD)

The soybean genome is comparatively large, approximately 1,112 Mb. It is a diploidized tetraploid, the product of two genome duplications that occurred 8-10 and 40-50 million years ago (MYA); it is extensively segmented and reshuffled, and is 64 percent euchromatic and 36 percent heterochromatic. The Soybean Genome Database (SoyGD) genome browser (http://soybeangenome.siu.edu) has, since 2002, integrated and served the publicly available soybean physical map, the bacterial artificial chromosome (BAC) fingerprint database and the genomic data associated with the genetic map.

The above is information on some agriculturally important organisms. Numerous other plant genomes have been sequenced or are being sequenced, and are listed online (www.genomesonline.org).

References:

Zeigler, R.S., and Barclay, A., 2008 Rice 1: 3.

Lawrence, C.J., et al., 2005 Plant Physiol 138: 55.

Cannon, S.B., et al., 2005 Plant Physiol 138: 38.

Sakata, K., et al., 2002 Nucleic Acids Res 30: 98.

Poole, R.L., 2007 Methods Mol Biol 406: 179.

Application of Next Generation Sequencing in Plant Genomics

Dr. K.K. Kumar, Assistant Professor Department of Plant Biotechnology, CPMB&B,


Before 2005, DNA sequencing was performed almost exclusively by the automated Sanger method, which has excellent accuracy and reasonable read length but very low throughput. Sanger sequencing was used to produce the reference genomes of Arabidopsis thaliana, Oryza sativa, Escherichia coli, Drosophila melanogaster, Caenorhabditis elegans and human. Since 2005, new types of sequencing instruments have been introduced that greatly accelerate data collection for DNA sequencing; they are called next generation sequencers (NGS). The next-generation sequencing platforms are also referred to as ‘second-generation platforms’, as they constitute the second phase of technologies after Sanger sequencing. Next generation sequencing can produce over 100 times more data than the most sophisticated capillary sequencer based on the Sanger method. NGS is therefore a high-throughput approach to DNA sequencing that reduces the time and cost of sequencing. With high-throughput sequencing machines and modern bioinformatics tools advancing at an unprecedented pace, the goal of sequencing individual genomes at a cost of $1,000 each seems realistically feasible in the near future. A common strategy in NGS is to use a DNA synthesis or ligation process to read through many different DNA templates in parallel (Fuller et al., 2009); hence, next generation sequencing is also termed massively parallel sequencing. Second-generation sequencing platforms produce large numbers (typically millions) of short DNA sequence reads, typically between 25 and 400 bp in length. These reads are shorter than traditional Sanger sequence reads, which are 700-1200 bp long.

Five next generation sequencing platforms are commercially available. Among them, the Roche/454 FLX, the Illumina/Solexa Genome Analyzer and the Applied Biosystems (ABI) SOLiD Analyzer currently dominate the market. The other two platforms, the Polonator and the Helicos, have only recently been introduced and are not widely used. Very recently, newer sequencing platforms have emerged that are referred to as ‘third-generation platforms’, as they represent a further paradigm shift in sequencing technology compared with the Sanger platform and the Illumina, SOLiD and Roche 454 platforms. Two such commercially available platforms are the Pacific Biosciences PacBio RS and the Ion Torrent Personal Genome Machine.

Application of Next Generation Sequencing

Next generation sequencing technologies are being used for de novo sequencing of bacterial and lower eukaryotic genomes, genome re-sequencing and transcriptome analysis (RNA sequencing). Next-generation sequencers have many more uses than Sanger sequencing. NGS is also used for genome-wide profiling of epigenetic marks and chromatin structure using other seq-based methods (ChIP-seq, methyl-seq and DNase-seq), and for species classification and gene discovery in metagenomics studies (Metzker, 2010). Ever-declining sequencing costs and the increased data output and sample throughput of next generation and third generation sequencing technologies enable the plant genomics and breeding community to undertake genotyping-by-sequencing (GBS).

a. Plant genome sequencing

The preferred method of whole genome sequencing in plants has been to identify overlapping BAC clones that cover the entire genome and have been isolated and located on a physical map, sequence the individual BAC clones, and then assemble the genome. This is termed the BAC by BAC sequencing method. Its benefit is the reduction of complexity, as only relatively short regions are assembled at one time, and the approach is still favoured for genomes that contain a large number of repetitive elements. This strategy was used for sequencing the rice, Arabidopsis, poplar, grape, peach, sorghum (Paterson et al., 2009) and Eucalyptus genomes using standard Sanger sequencing. BAC sequencing can use traditional Sanger or NGS platforms such as Roche 454 or Illumina GAII, or potentially combinations of these methods. With the availability of NGS platforms, approaches to genome sequencing have also changed, moving from the laborious BAC by BAC method to whole genome shotgun (WGS) sequencing (reviewed by Imelfort & Edwards, 2009). The cost of producing and mapping the BAC library, together with the significant cost of sequencing large numbers of BACs, makes BAC by BAC approaches increasingly unfavourable, and several genome sequencing projects that started with the BAC by BAC approach have now moved to WGS methods using NGS.

Recently, different NGS platforms have been used for sequencing plant genomes (reviewed by Hamilton and Buell, 2012). For example, Roche 454 technology is being used to sequence the 430 Mbp genome of Theobroma cacao, while a combination of Sanger and Roche 454 sequencing (4X versus 12X coverage, respectively) is being used to interrogate the apple genome. Illumina Solexa and Roche 454 sequencing have also been used to characterize the genomes of cotton. Roche 454 sequencing has been used to survey the genome of Miscanthus, while Sanger, Illumina Solexa and Roche 454 sequencing are being used to interrogate the banana genome. The Roche 454 platform has also been used for de novo sequencing of plant genomes. The genomes of cucumber (Cucumis sativus, 376 Mb) and A. thaliana (157 Mb) were sequenced as a proof of principle using the 454 FLX Titanium platform, with mate-paired libraries being essential to the assembly process. Roche 454 has been applied to the sequencing of complex BACs from barley, and this has been complemented by repeat characterization using WGS Illumina Solexa data. The wheat genome (17 000 Mbp) is much larger than related cereal genomes such as barley (Hordeum vulgare, 5000 Mbp), rye (Secale cereale, 9100 Mbp) and oat (Avena sativa, 11 000 Mbp). The size and hexaploid nature of the wheat genome create significant problems in elucidating its sequence. Nevertheless, a draft genome sequence of bread wheat (Triticum aestivum) was produced using the Roche 454 sequencer (Brenchley et al., 2012).
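The coverage figures quoted above (for example 4X and 12X) relate the amount of sequence generated to the genome size: mean coverage is the number of reads multiplied by the read length, divided by the genome size. The sketch below applies this relation to the cereal genome sizes given in the text. The 400 bp read length and the 5X target are assumptions chosen only for illustration, and the calculation ignores uneven coverage and repeats.

```python
import math


def mean_coverage(n_reads, read_len_bp, genome_size_bp):
    """Average number of times each base of the genome is sequenced."""
    return n_reads * read_len_bp / genome_size_bp


def reads_needed(target_coverage, read_len_bp, genome_size_bp):
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_len_bp)


# Genome sizes in Mbp as quoted in the text.
GENOME_SIZE_MBP = {"barley": 5000, "rye": 9100, "oat": 11000, "wheat": 17000}

READ_LEN_BP = 400     # assumed read length
TARGET_COVERAGE = 5   # assumed target, for illustration only

for species, size_mbp in GENOME_SIZE_MBP.items():
    n = reads_needed(TARGET_COVERAGE, READ_LEN_BP, size_mbp * 1_000_000)
    print(f"{species}: {n:,} reads of {READ_LEN_BP} bp for {TARGET_COVERAGE}X")
```

Because the number of reads scales linearly with genome size, the wheat genome needs more than three times as many reads as barley for the same coverage, before its hexaploid nature is even considered.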

b. Plant Genotyping with Next Generation Sequencing

The genotyping of plants has progressed through a range of technologies such as RFLP, RAPD and AFLP, followed by microsatellites or simple sequence repeats (SSRs). Since 2000, single nucleotide polymorphism (SNP) markers have increasingly replaced SSR markers as more sequence data have become available and techniques for SNP discovery and analysis have advanced. SNP analysis methods with greater repeatability than SSR marker methods have been developed, facilitating much more widespread adoption of SNP methods. Comparing the DNA sequence of any plant genotype with its reference genome helps to identify SNP markers. The discovery of genetic polymorphism between plants has been based on a range of mutation detection techniques and on DNA sequencing. With the availability of NGS platforms, DNA sequencing has recently become a more cost-effective method for discovering sequence differences (Mardis, 2008).

Genome sequencing projects focus on obtaining the full genome sequence (reference genome) of a single organism as a representative of a species, whereas polymorphism studies focus on the genome-wide variation present within a species. Although the full genome sequence of a species provides valuable information on the homology and distribution of genes across a genome, it does not provide information on the dynamic processes that affect a genome and the genes therein. Next generation sequencing offers scope for re-sequencing the genome of an individual plant cultivar for which a reference genome is available, at low cost and in little time. Comparing the cultivar's genome sequence with the reference genome helps to identify genetic variants, including single nucleotide variants (SNVs) or single nucleotide polymorphisms (SNPs), small insertions and deletions (indels, 1–1000 bp), and structural and genomic variants (>1000 bp). As increasing numbers of reference genomes become available, whole genome re-sequencing of crop genomes is expected to become common. Ossowski et al. (2010) reported the re-sequencing of the reference Arabidopsis accession Col-0 and two divergent strains, Bur-0 and Tsu-1, using Illumina Solexa technology to produce between 15X and 25X coverage. As well as finding over 2000 potential errors in the reference genome sequence, they identified more than 800 000 unique SNPs and almost 80 000 indels of 1–3 bp. The re-sequencing of rice and Medicago truncatula has also been undertaken using Illumina Solexa for SNP discovery. A potential direct application of genome-wide SNP studies in plant breeding is the generation of markers for the construction of dense genetic linkage maps.
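The comparison of a re-sequenced cultivar with the reference genome can be sketched as follows. The example assumes that the two sequences are already aligned, gap-free and of equal length, which real re-sequencing pipelines do not require; the function names and the short test sequences are hypothetical. The size classes follow the ranges given above (indels of 1–1000 bp, structural variants above 1000 bp).

```python
def find_snps(reference, sample):
    """List (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are aligned, gap-free and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


def classify_variant(ref_allele, alt_allele):
    """Classify a variant by the size difference between its two alleles."""
    size = abs(len(ref_allele) - len(alt_allele))
    if size == 0:
        # Equal-length alleles: a single base is a SNP; longer substitutions
        # fall outside the classes named in the text.
        return "SNP" if len(ref_allele) == 1 else "multi-base substitution"
    if size <= 1000:
        return "small indel"
    return "structural variant"


reference = "ACGTACGT"   # made-up reference fragment
cultivar = "ACGTTCGT"    # made-up re-sequenced fragment
for pos, ref_base, alt_base in find_snps(reference, cultivar):
    print(pos, ref_base, alt_base, classify_variant(ref_base, alt_base))
```

Each SNP found in this way is a candidate marker position for the linkage maps mentioned above.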

c. Transcriptomics

Next-generation high-throughput RNA sequencing (RNA-Seq) is a recently developed method for discovering, profiling and quantifying RNA transcripts. It has several advantages over other expression profiling technologies, including higher sensitivity and the ability to detect splicing isoforms and somatic mutations, and it has emerged as a superior alternative to microarrays. With the availability of faster and cheaper next generation sequencing platforms, more transcriptomic analyses are being performed using this deep sequencing approach (Wang et al., 2009). RNA-Seq is an NGS method that sequences the transcriptome (all RNA transcripts). With its high resolution and sensitivity, RNA-Seq has revealed many novel transcribed regions and splicing isoforms of known genes. In addition, results from RNA-Seq suggest the existence of a large number of novel transcribed regions in every genome surveyed, including that of A. thaliana (Lister et al., 2008). These novel transcribed regions, combined with many undiscovered novel splicing variants, suggest that there is considerably more transcriptomic complexity than previously appreciated.

A related application of next-generation sequencing technologies to the analysis of transcriptomes is the discovery and profiling of small RNAs such as miRNAs and siRNAs. High-throughput sequencing offers greater potential for the identification of novel small RNAs as well as for profiling known and novel small RNA genes. Small RNA profiling with 454 pyrosequencing technology has been widely reported, including studies in the moss Physcomitrella patens (Axtell et al., 2006) and A. thaliana (Henderson et al., 2006).

d. Other applications of NGS

Next-generation sequencing methods with single-base resolution have begun to

characterize the epigenome, i.e. modifications to the DNA that do not affect the DNA sequence.

This includes DNA methylation, histone modification, nucleosome density and chromatin state.

Examination of the A. thaliana epigenome, which was surveyed at the single-base level using

the Illumina platform, revealed that DNA methylation was not evenly dispersed throughout the

genome or the genes, and that the location and abundance of small RNA targets were positively

associated with cytosine methylation (Lister et al., 2008).

References:

Axtell, M. J., Snyder, J. A., and Bartel, D. P., 2007 Common functions for small RNAs of land

plants. Plant Cell 19: 1750-1769.

Brenchley, R., Spannagl, M., Pfeifer, M., et al., 2012 Analysis of the bread wheat genome using

whole-genome shotgun sequencing. Nature 491: 705-710.

Fuller, C. W., Middendorf, L. R., Benner, S. A, Church, G. M., Harris, T., Huang, X., Jovanovich, S.

B., Nelson, J. R., Schloss, J. A., Schwartz, D. C., and Vezenov, D. V., 2009 The challenges of

sequencing by synthesis. Nat.Biotechnol. 27:1013–1023.

Hamilton, J. P., and Buell, C. R., 2012 Advances in plant genome sequencing. The Plant J 70:

177-190.

Henderson, I. R., Zhang, X., Lu, C., Johnson, L., Meyers, B. C., Green, P. J., and Jacobsen, S. E.,

2006 Dissecting Arabidopsis thaliana DICER function in small RNA processing, gene silencing and

DNA methylation patterning. Nat Genet. 38: 721-725.

Imelfort, M., and Edwards, D., 2009 De novo sequencing of plant genomes using second-generation technologies. Briefings in Bioinformatics 10: 609-618.

Lister, R., O’Malley, R.C., Tonti-Filippini, J., Gregory, B. D., Berry, C. C., Millar, A. H. and Ecker, J.

R. 2008 Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell

133: 523–536.

Mardis, E. R., 2008 Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet.

9: 387-402.

Metzker, M. L., 2010 Sequencing technologies - the next generation. Nature Reviews Genetics 11: 31-46.

Paterson, A. H., Bowers, J. E., Bruggmann, R. et al., 2009 The Sorghum bicolor genome and the

diversification of grasses. Nature 457: 551–556.

Gene annotation

Dr. L. Arul, Associate Professor Department of Plant Molecular Biology and Bioinformatics, CPMB&B,


The “genome” is generally defined as the total nuclear DNA content of a haploid cell, or half the DNA content of a diploid cell, of any given organism. Sequencing the genome is regarded as the route to the ultimate characterization of an organism. The first genome to be sequenced, in 1977, was that of the small phage Φ-X174 (5.38 kb), following the advent of Sanger’s dideoxy method of DNA sequencing. The next major event was the sequencing of the first bacterial genome, that of Haemophilus influenzae (1.8 Mb), in 1995. These were remarkable achievements in the field of genomics, which paved the way for the successful sequencing of hundreds of prokaryotic and a few eukaryotic genomes so far.

Concept of model plant genomes

Model organisms (Drosophila melanogaster, Caenorhabditis elegans and Saccharomyces cerevisiae) provide genetic and molecular insights into the biology of more complex species. The effort to determine the nucleotide sequence of a plant genome faces difficulties arising from the genetic makeup of plants. The range of plant genome sizes is very large, extending from approximately the size of the genomes of many small animals, through more than five times the size of the human genome for many domesticated crops, to almost forty times the size of the human genome for some ornamental flowers. Hence, plant biologists resorted to the concept of model organisms, which share features such as being diploid and appropriate for genetic analysis, being amenable to genetic transformation, having a (relatively) small genome and a short growth cycle, having commonly available tools and resources, and being the focus of research by a large scientific community. The species now used as model organisms for mono- and dicotyledonous plants are rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana), respectively. Today there is a long list of plants for which the complete genome sequence has been published (see Table 1).

Table 1: Plant genomes with a published complete sequence

Sl. No  Plant                                    Chr. No (n)  Size (Mb)
1       Arabidopsis thaliana (thale cress)       5            120
2       Oryza sativa (rice)                      12           430
3       Glycine max (soybean)                    20           950
4       Medicago truncatula (barrel medic)       8            500
5       Populus trichocarpa (black cottonwood)   19           480
6       Sorghum bicolor                          10           690
7       Vitis vinifera (wine grape)              19           500
8       Zea mays (corn)                          10

Gene annotation

Whole genome sequencing of plants and animals has produced huge volumes of sequence data that are difficult to comprehend on their own. Questions of genome constitution, growth and development, evolution and other relevant issues are presently being addressed by a combination of experimental and bioinformatics tools. As a result of this synergy, there has been tremendous progress in areas that include gene finding and genome annotation, comparative genomics, allele mining, protein-protein interactions, structure-function relationships, in silico dissection of quantitative trait loci, global transcript profiling, deciphering of metabolic and signaling pathways, drug discovery, diversity analysis and phylogeny.

Gene finding typically refers to the area of computational biology that is concerned with

algorithmically identifying stretches of sequence that are biologically functional. This especially

includes protein-coding genes, but may also include other functional elements such as RNA

genes and regulatory regions. Gene finding is one of the first and most important steps in

understanding the genome of a species once it has been sequenced. In the genomes of

prokaryotes, genes have specific and relatively well-understood promoter sequences (signals),

such as the Pribnow box and transcription factor binding sites, which are easy to systematically

identify. Also, the sequence coding for a protein occurs as one contiguous open reading frame

(ORF), which is typically many hundreds or thousands of base pairs long. Furthermore, protein-

coding DNA has certain periodicities and other statistical properties that are easy to detect in

sequence of this length. These characteristics make prokaryotic gene finding relatively

straightforward, and well designed systems are able to achieve high levels of accuracy.
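Because a prokaryotic protein-coding sequence occurs as one contiguous ORF, a first approximation to prokaryotic gene finding is a scan for long stretches between a start codon and an in-frame stop codon. The following is a minimal sketch of such a scan, not the method used by any particular gene finder. It assumes ATG as the only start codon, examines only the forward strand, and uses a hypothetical default minimum length of 300 bp.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len_bp=300):
    """Return (start, end, frame) for forward-strand ORFs of at least min_len_bp.

    Coordinates are 0-based and half-open; each ORF runs from the first ATG
    in a frame to the next in-frame stop codon, stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len_bp:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ATG
    return orfs


# Made-up 40 bp sequence containing one short ORF in frame 2.
toy = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"
print(find_orfs(toy, min_len_bp=30))
```

A practical gene finder would also scan the reverse strand and score codon usage and promoter signals; the length threshold alone only filters out the short ORFs that arise by chance.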

However, gene finding in eukaryotes, especially complex organisms like humans, is

considerably more challenging for several reasons. First, the promoter and other regulatory

signals in these genomes are more complex and less well-understood than in prokaryotes,

making them more difficult to reliably recognize. Two classic examples of signals identified by

eukaryotic gene finders are CpG islands and binding sites for a poly(A) tail. Second, splicing

mechanisms employed by eukaryotic cells mean that a particular protein-coding sequence in

the genome is divided into several parts (exons), separated by non-coding sequences (introns).

Splice sites are themselves another signal that eukaryotic gene finders are often designed to

identify. A typical protein-coding gene in humans might be divided into a dozen exons, each less

than two hundred base pairs in length, and some as short as twenty to thirty. It is therefore

much more difficult to detect periodicities and other known content properties of protein-

coding DNA in eukaryotes.

Gene finders for both prokaryotic and eukaryotic genomes typically use complex

probabilistic models, such as Hidden Markov Models, in order to combine information from a

variety of different signal and content measurements. The GLIMMER system is a widely used

and highly accurate gene finder for prokaryotes. Eukaryotic ab initio gene finders, by

comparison, have also achieved a reasonable level of success; a notable example is GENSCAN

(Figure 1).


Figure 1: GENSCAN model indicating states and transitions; states are parameterized with signal

and content features
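How a Hidden Markov Model turns content measurements into a labelling of the sequence can be sketched with a toy two-state model decoded by the Viterbi algorithm. This is not the GENSCAN or GLIMMER model: the two states, and every start, transition and emission probability below, are invented purely for illustration.

```python
import math

# Toy HMM: N = non-coding (AT-rich), C = coding (GC-rich).
# All probabilities are hypothetical values chosen for the example.
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

def viterbi(seq):
    """Return the most probable state path for seq under the toy HMM."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # one dict per step: best predecessor of each state
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

print(viterbi("ATATATGCGCGCGCATATAT"))
```

Real gene finders use many more states (exons in each phase, introns, splice sites, promoters) and length distributions, but the decoding principle is the same.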

References:

Durbin, Krogh, and Mitchison. 1998 Biological sequence analysis: Probabilistic models of

proteins and nucleic acids. Cambridge University Press

Zhang, M. Q., 2002 Computational prediction of eukaryotic protein-coding genes. Nature

Reviews Genetics 3: 698-709

Mathe, C et al., 2002 Current methods of gene prediction, their strengths and weaknesses.

Nucleic Acids Research 30: 4103-4117

Yandell, M et al., 2012 A beginner’s guide to eukaryotic genome annotation. Nature Reviews

Genetics 13: 329-342


Transcriptomics for miRNAs: Innovative Frontiers in Regulatory Genomics

Dr. N. Manikanda Boopathi, Assistant Professor,

Department of Plant Molecular Biology and Bioinformatics, CPMB&B, Tamil Nadu Agricultural University, Coimbatore.

[email protected]; [email protected]

Prelude

MicroRNAs (miRNAs) are small, endogenous RNAs (~20–24 nt in length) that can play
important regulatory roles in plant growth and development under normal and stressed
conditions (Bartel, 2004). miRNAs are usually expressed as non-translated RNAs and are
processed by Dicer-like proteins from the stem-loop regions of longer RNA precursors
(Figure 1). They comprise one of the most abundant classes of gene regulatory molecules in
multicellular organisms and likely influence the output of many protein-coding genes. MiRNAs are chemically

and functionally similar to small interfering RNAs. MiRNAs are deeply conserved within both the

plant and animal kingdoms. However, there are substantial differences between the two

lineages with regard to the mechanism and scope of miRNA-mediated gene regulation (Jones-

Rhoades et al., 2006).

Historical perspectives

In 1993, the laboratories of Drs. Ambros and Ruvkun independently characterized the first
endogenous 22 nt RNA in Caenorhabditis elegans. They found that a small 22-nucleotide RNA
derived from a gene called lin-4 suppressed the expression of lin-14, which controls the timing
of Caenorhabditis elegans larval development (Lee et al., 1993; Wightman et al., 1993). lin-4 is
recognized as the founding member of a new class of small RNAs called miRNAs, and it was
clear that the transcripts of these genes do not encode proteins. Since lin-4 was
recognized as a miRNA in 2001, thousands of miRNAs have been identified from various
organisms including C. elegans, Drosophila melanogaster, fish, mouse, rat, plants and human
(Lee and Ambros, 2001). Although investigations of plant miRNAs are less extensive, the


versatile functions of plant miRNAs are becoming clear. A majority of miRNA functions are

currently investigated by over-expression or lowered expression of miRNAs. Plant miRNAs are

mostly involved in transcriptional regulation, developmental timing, cell death, cell proliferation

and even nutrient metabolism.

Biogenesis of plant miRNAs

MiRNA biogenesis in plants requires multiple steps to form mature miRNAs from
miRNA genes (for excellent reviews see Bartel, 2004 and Jones-Rhoades et al., 2006). Briefly,
what is known so far about the biogenesis of plant miRNAs is described below and illustrated
in Figure 1. First, a miRNA gene is transcribed into a primary miRNA (pri-miRNA), which is
usually a long sequence of more than several hundred nucleotides. This step is carried out by
RNA polymerase II. Second, the pri-miRNA is cleaved to a stem-loop intermediate called the
miRNA precursor or pre-miRNA. This step is carried out by the Dicer-like 1 enzyme (DCL1). In
animals, pre-miRNAs are then transported by exportin 5 from the nucleus into the cytoplasm,
where they are processed into a miRNA:miRNA* duplex. Plant miRNAs differ from those of
animals at this step: the pre-miRNA is cleaved into the miRNA:miRNA* duplex, possibly by
DCL1, in the nucleus rather than in the cytoplasm, and the duplex is then translocated into the
cytoplasm by HASTY, the plant orthologue of exportin 5. The accumulation of mature miRNAs
is thought to be affected by DCL1 in response to different stresses and to phytohormones such
as GA and ABA. In the cytoplasm, the duplex is unwound into single-stranded mature miRNAs
by a helicase. Finally, the mature miRNA enters a ribonucleoprotein complex known as the
RNA-induced silencing complex (RISC), where it regulates targeted gene expression; only the
active miRNA strand of the duplex, and not the passenger strand (miRNA*), is incorporated
into RISC. Thus, miRNA biogenesis requires several enzymes to process the long pri-miRNA
into a short mature miRNA, typically about 20–24 nt long.


Figure 1. Schematic illustration of miRNA (a) biogenesis and (b) mechanism of action

General features of plant miRNAs

Since the first report on plant miRNAs was published in 2002, thousands of plant miRNAs
have been reported using several methods (see below), and lists of experimentally validated
miRNAs are maintained at the plant miRNA database, PMRD
(http://bioinformatics.cau.edu.cn/PMRD), the miRNA database
(http://www.sanger.ac.uk/Software/Rfam/mima/), miRNEST
(http://lemur.amu.edu.pl/share/php/mirnest/home.php) and other species-specific databases
such as that for Arabidopsis (http://sundarlab.ucdavis.edu/mirna/).

Experimental and computational analysis has indicated that many plant miRNAs and their

targets are conserved between monocots and dicots (Sunkar et al., 2008). Axtell and Bartel

(2005) employed microarray technology to analyse the expression of several miRNAs in

different plant species, and found that some miRNAs existed not only in dicots and monocots

but also in ferns, lycopods, and mosses. A set of 21 abundant miRNAs was found to be

conserved among the angiosperms (Axtell and Bowman, 2008), many of which are involved in

transcription factor regulation in developmental control. In another report, based on the

analysis of ESTs for miRNA targets in many plant species Zhang et al. (2006) found that they are

omnipresent. Both miRNA conservation and miRNA target conservation also indicate that plant

miRNAs have a very deep origin in plant phylogeny, at least since the last common ancestor of

bryophytes and seed plants. This suggests that miRNA-mediated gene regulation existed more

than 425 million years ago in the plant kingdom.

Conserved miRNAs do not necessarily exhibit the same levels, patterns or

stages of expression in different species. Therefore, sequence and expression divergence in


miRNAs between species may affect miRNA accumulation and target regulation leading to

unique developmental changes and phenotypic variation. The conservation of miRNAs and

other small RNAs between the conifers and the angiosperms indicates important RNA silencing

processes were highly developed in the earliest spermatophytes (Morin et al., 2008). On the

other hand, some miRNAs are species-specific (Ng et al., 2012). Non-conserved miRNAs may
play more specific roles in a given plant species (Rhoades, 2012), such as the differentiation
and elongation of cotton fibres (Boopathi and Pathmanaban, 2012) or the regulation of
nodulation in soybean roots (Subramanian et al., 2008).

Plant transcriptome regulation by miRNAs

Although much is known about their biogenesis and biological functions, the

mechanisms allowing miRNAs to silence gene expression in plant cells are still under debate.

The current models for miRNA-mediated gene silencing can be explained as below. In the RISC

complex, miRNAs bind to target mRNA and inhibit gene expression through its cleavage or

stalling translation or by some unknown mechanisms (see Bartel 2004, for review). The
resulting gene silencing is termed RNA interference (RNAi) in animals, quelling in fungi and
posttranscriptional gene silencing in plants (Baulcombe, 2004). In plants, most target mRNAs
contain a single miRNA complementary site, and most miRNAs pair perfectly or near-perfectly
with these sites and cleave the target mRNAs (Bartel, 2009). Unlike animal miRNA
targets, the complementary sites in plants can exist anywhere along the target mRNA rather
than only in the 3’-UTR. A second mechanism has also been identified in plants, in which the
miRNA regulates gene expression by repressing translation, possibly through inhibition of
ribosome movement (Bartel, 2009). This suggests that miRNA-mediated regulation of gene
expression in plants is more complex than in animals.

MiRNAs regulate gene expression at the posttranscriptional level and act as key

regulators in diverse regulatory pathways. MiRNAs binding to target mRNAs often leads to

blockade of translation or degradation of the target mRNAs. Thus, identification of target

mRNAs is essential for understanding the biological functions of miRNAs. MiRNAs from plants

induce direct cleavage and degradation by binding to the target sequences with perfect base


pairing. Targets of animal miRNAs, in contrast, are often difficult to predict, because few of
them match their target mRNAs perfectly. Their miRNA:mRNA duplexes often contain several
mismatches, gaps and G:U base pairs at many positions. It is known that the so-called miRNA
“seed region” (usually nucleotides 2-7 from the 5’-end of the miRNA) is the most important
determinant of target specificity (Bartel, 2009). miRNA-mediated repression often depends on
perfect or near-perfect base pairing of the seed region to the target. A conventional way to
search for miRNA targets is by using bioinformatics. The classical model for miRNA target
recognition used by most algorithms depends mainly on (a) the detection of seed matches and
(b) the thermodynamic stability of miRNA:mRNA duplexes. Different algorithms often produce
divergent results. To

circumvent this problem to some extent, stacking binding matrix uses both the information

about the miRNAs as well as experimentally validated target sequences in the search for

candidate target sequences (Moxon et al., 2008). Further, methods do vary depending on

whether a method is for animals, plants, or viruses, since the biology of miRNAs is somewhat

different in each case. In addition, much work has been done to develop molecular tools to
identify miRNA targets, such as HITS-CLIP and microarray techniques (Huang et al., 2011). These
tools have proven useful in miRNA target research, but they are not widely applied
because their procedures are complicated. Huang et al. (2011) demonstrated hybrid-PCR
as an effective and rapid approach for screening putative miRNA targets, with the
advantages of simplicity, low cost, and ease of implementation. Target mRNA candidates can be
obtained through hybrid-PCR from any kind of cell at different developmental stages or from
different tissues. Hybrid-PCR can be used as a quick screening tool in miRNA research, although

more experimental validations are needed in further study.
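The distinction between perfect pairing, G:U wobble pairs and mismatches in a miRNA:mRNA duplex can be made concrete with a small Python sketch. It assumes an ungapped duplex of equal-length sequences and does not compute thermodynamic stability; the function names and the 22-nt example sequence are chosen only for illustration.

```python
# Classify every base pair in an ungapped miRNA:target duplex (illustrative only).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def perfect_site(mirna):
    """Target site (5'->3') that pairs perfectly with the miRNA."""
    return mirna.translate(COMPLEMENT)[::-1]

def score_duplex(mirna, site):
    """Count matches, wobbles and mismatches; both sequences are written 5'->3'."""
    if len(mirna) != len(site):
        raise ValueError("an ungapped duplex needs sequences of equal length")
    counts = {"match": 0, "wobble": 0, "mismatch": 0}
    kinds = []
    # The duplex is antiparallel: miRNA position 1 pairs with the last site base.
    for pair in zip(mirna, reversed(site)):
        if pair in WATSON_CRICK:
            kind = "match"
        elif pair in WOBBLE:
            kind = "wobble"
        else:
            kind = "mismatch"
        counts[kind] += 1
        kinds.append(kind)
    # Seed region: miRNA nucleotides 2-7, counted from the 5' end.
    counts["seed_perfect"] = all(k == "match" for k in kinds[1:7])
    return counts

mirna = "UGAGGUAGUAGGUUGUAUAGUU"     # example 22-nt sequence
site = perfect_site(mirna)
print(score_duplex(mirna, site))
print(score_duplex(mirna, site[:-1] + "G"))  # G:U wobble opposite miRNA position 1
```

A plant-style target would typically score as all or nearly all matches, whereas an animal-style target might show a perfect seed with mismatches elsewhere.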

Tools for miRNA identification

Methods for identifying miRNAs are based on the following major characteristics: (i) all
miRNAs are small non-coding RNAs, usually consisting of ~20–24 nt in plants; (ii) all miRNA
precursors have a well-predicted stem-loop hairpin structure, and this fold-back hairpin
structure has a low free energy as predicted by MFOLD or other computational programs; and


(iii) Many miRNAs are evolutionarily conserved from ferns to higher plants (Kim and Nam,

2006). To avoid designating other small RNAs or fragments of other RNAs as miRNAs, Ambros et

al. (2003) developed combined criteria to identify new miRNAs. These combined criteria include

both biogenesis and expression measures, neither of which on its own is sufficient for

identifying a candidate gene for a new miRNA. So far, all newly predicted/identified miRNAs

have conformed to these rules.

Both experimental methods and computational approaches have been adopted to

identify miRNAs in plants. Recently, as deep sequencing technologies have become more

available and affordable, more and more studies have been employing deep sequencing to

discover and functionally analyse miRNAs in plants. In general, there are four approaches for

identifying miRNAs: genetic screening, direct cloning after isolation of small RNAs,

computational strategy or in silico methods and expressed sequence tags (ESTs) analysis. Sun

(2011) summarized the major advantages and disadvantages of these methods. A range of

techniques are available for miRNA quantification, including northern blotting, dot blotting,

RNase protection assay, primer extension analysis, Invader assay and quantitative PCR.

It is worth noting that standards of evidence for annotating miRNAs have been applied

unevenly over the years. In particular, a substantial number of annotations were added to
miRBase under the older paradigm and have not been removed, even in cases where
subsequent experiments provide strong evidence that they are in fact siRNAs rather than
miRNAs. Therefore, analyses using miRBase as a set of verified miRNAs should take into account
that not all annotated miRNAs are equally well supported by evidence (Rhoades, 2012). To
identify and validate miRNAs experimentally, robust and efficient protocols are
necessary to isolate them, which may sometimes be challenging because of the composition of
specific tissues in certain plant species.


Key Software for miRNA identification

miRanda:

miRanda is an algorithm for finding genomic targets for microRNAs. This algorithm has

been written in C and is available as an open-source method under the GPL. MiRanda was

developed at the Computational Biology Center of Memorial Sloan-Kettering Cancer Center.

miRBase:

The miRBase database is a searchable database of published miRNA sequences and

annotation. Each entry in the miRBase Sequence database represents a predicted hairpin

portion of a miRNA transcript (termed mir in the database), with information on the location

and sequence of the mature miRNA sequence (termed miR). Both hairpin and mature

sequences are available for searching and browsing, and entries can also be retrieved by name,

keyword, references and annotation. All sequence and annotation data are also available for

download.

PicTar:

PicTar is an algorithm for the identification of microRNA targets. This searchable website

provides details on 3' UTR alignments with predicted sites and links to various public databases.

TargetScan:

TargetScan predicts biological targets of miRNAs by searching for the presence of

conserved 8mer and 7mer sites that match the seed region of each miRNA. As an option, non-

conserved sites are also predicted. Also identified are sites with mismatches in the seed region

that are compensated by conserved 3' pairing.
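The 8mer and 7mer site categories can be sketched in Python as follows. This is not TargetScan's code and it ignores conservation entirely; it assumes the commonly described definitions (7mer-m8: pairing to miRNA positions 2-8; 7mer-A1: pairing to positions 2-7 followed by an A in the target; 8mer: both), and the function names are hypothetical.

```python
# Classify seed-match sites in a UTR by type (illustrative only).
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def revcomp(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]

def site_types(mirna, utr):
    """Return (position, type) for every match to miRNA nucleotides 2-7.

    position is the 0-based start of the 6-nt seed match in the UTR.
    """
    seed_site = revcomp(mirna[1:7])   # pairs with miRNA positions 2-7
    pos8_base = revcomp(mirna[7])     # pairs with miRNA position 8
    hits = []
    i = utr.find(seed_site)
    while i != -1:
        m8 = i > 0 and utr[i - 1] == pos8_base          # extra pair at position 8
        a1 = i + 6 < len(utr) and utr[i + 6] == "A"     # A opposite miRNA position 1
        if m8 and a1:
            kind = "8mer"
        elif m8:
            kind = "7mer-m8"
        elif a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        hits.append((i, kind))
        i = utr.find(seed_site, i + 1)
    return hits

mirna = "UGAGGUAGUAGGUUGUAUAGUU"          # example 22-nt sequence
print(site_types(mirna, "GGCUACCUCAGG"))  # made-up UTR fragment
```

Scanning real 3' UTRs would additionally require alignment across species to decide which sites are conserved.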

MicroCosm:

MicroCosm Targets (formerly miRBase Targets) is a web resource developed by the

Enright Lab at the EMBL-EBI containing computationally predicted targets for microRNAs across

many species. The miRNA sequences are obtained from the miRBase Sequence database and

most genomic sequences from Ensembl. It aims to provide up-to-date

predictions of miRNA targets.


MapMi:

MapMi is a tool designed to locate miRNA precursor sequences in existing genomic

sequences (e.g. Ensembl and Ensembl Metazoa), using potential mature miRNA sequences as

input. After searching the genome for the provided mature sequences, these hits are extended

and classified taking into account major structural properties of known miRNA precursors.

MapMi uses Bowtie and RNAfold third-party tools.

Sylamer:

Sylamer is a system for finding significantly over or under-represented words in

sequences according to a sorted gene list. Typically it is used to find significant enrichment or

depletion of microRNA or siRNA seed sequences from microarray expression data. Sylamer is

extremely fast and can be applied to genome-wide datasets with ease.
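The notion of a word being over-represented near the top of a sorted gene list can be illustrated with a hypergeometric test. The sketch below is a simplified presence/absence version written for this illustration; it is not Sylamer's implementation, and the function names and toy sequences are assumptions.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n items are drawn from N, of which K carry the word."""
    upper = min(n, K)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)

def word_enrichment(ranked_utrs, word, top_n):
    """Test whether `word` occurs in more of the top_n sequences than expected.

    ranked_utrs is a list of sequences sorted by, for example, fold change.
    Returns (hits in the top, hits overall, one-sided p-value).
    """
    has_word = [word in utr for utr in ranked_utrs]
    N = len(ranked_utrs)
    K = sum(has_word)
    k = sum(has_word[:top_n])
    return k, K, hypergeom_tail(k, top_n, K, N)

# Toy ranked list: the first two sequences contain the word of interest.
utrs = ["AAUACCUCAA", "GGUACCUCGG", "CCCCCCCC", "GGGGGGGG", "AAAAAAAA", "UUUUUUUU"]
print(word_enrichment(utrs, "UACCUC", top_n=2))
```

Repeating the test for increasing values of `top_n` gives an enrichment profile along the ranked list, which is the kind of landscape such tools report.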

Potentials, problems and perspectives

At the time of this writing, there are some gaps with respect to the phylogenetic

distribution of miRNAs and the understanding of how miRNAs have evolved. The

addition of novel sequencing data from key phylogenetic positions will likely fill many of these

gaps. For example, we now have data on miRNAs and miRNA targets in

representative species from green algae, non-seed plants and gymnosperms. But we do not

have much information about the extent of diversity or conservation of miRNA expression

within any of these lineages. Similarly, the analysis of more examples of closely related species

will provide a clearer picture of how miRNAs evolve over shorter time frames. Several

contrasting results have shown that the same miRNA can be up- or down-regulated in different

species in response to a given abiotic or biotic stress (examples are provided in Sunkar et al.,

2012). These conflicting results highlight the need for more in-depth and detailed

characterizations of stress-responsive miRNAs in plants. It is likely that a more comprehensive

analysis, including the impact of such regulation on miRNA targets, would provide better

insights into miRNA-guided gene regulation. It is also imperative to analyse differential
regulation of miRNAs in different tissues, since it is important for adaptation to stress and
since tissue-specific regulation may be missed in analyses of whole plants. MiRNA*, the

complementary strand of mature functional miRNA, is thought to degrade rapidly or

accumulate at only very low levels, suggesting that it may not be functional. However, several

recent studies have shown that plant miRNA* tends to accumulate at high levels under certain

conditions and this also warrants additional investigation. Besides degradation of transcripts,

several miRNAs have been shown to regulate targets via translational repression under normal

conditions. However, it is unclear whether plants under stress use both modes of target gene

regulation or whether one mode is preferred over the other. Identification of stress-responsive

miRNAs is largely dependent on sequence-based profiling, which is known to have some bias,

and thus requires independent validation. Small RNA blot analysis, although lacking sensitivity, is

a gold standard for validation. Implementation of highly reliable and rigorous assays is essential

for firm characterization of stress-responsive miRNAs in plants. Further, examining the effect of

stress regulated miRNA on its mRNA target using degradome analysis can provide robust

confirmation of the stress responsiveness of miRNA. In addition to identifying miRNA targets,

by analysing degradome libraries from control and stressed samples it should be possible to

quantify the impact of a stress responsive miRNA on its mRNA target.

In summary, plant miRNAs can substantially alter gene expression during cellular

developmental phases and under many types of stress. In order to assure plant growth and

development under extreme environmental conditions, the metabolic network has to be re-

programmed. To this end, a comprehensive understanding of plant cell development is

required to decipher the molecular mechanisms of plant stress responses. This can be achieved

by successful integration of miRNA data with metabolomics, transcriptomics, proteomics and

other ‘omics’ results.

Acknowledgement

This work is financially supported by Department of Biotechnology, Government of

India.


References

Ambros, V., Bartel, B., Bartel, D. P., Burge, C. B., Carrington, J. C., Chen, X., Dreyfuss, G., Eddy,

S. R., Griffiths-Jones, S., Marshall, M., Matzke, M., Ruvkun, G., and Tuschl, T., 2003 A uniform

system for microRNA annotation. RNA 9: 277-279.

Axtell, M. J. and Bartel, D. P., 2005 Antiquity of MicroRNAs and Their Targets in Land Plants.

Plant Cell 17: 1658-1673.

Axtell, M. J., and Bowman, J. L., 2008 Evolution of plant microRNAs and their targets. Trends

Plant Sci 13: 343-349.

Bartel, D.P., 2004 MicroRNAs: Genomics, Biogenesis, Mechanism, and Function. Cell 116: 281–

297.

Bartel, D.P., 2009 MicroRNAs: target recognition and regulatory functions. Cell 136: 215-233.

Baulcombe, D., 2004 RNA silencing in plants. Nature 431: 356-363.

Boopathi, N.M., and Pathmanaban, R., 2012 Additional insights into the adaptation of cotton

plants under abiotic stresses by in silico analysis of conserved miRNAs in cotton expressed

sequence tag database (dbEST). African Journal of Biotechnology Vol. 11(76): 14054-14063.

Huang, Y., Qi Y., Ruan, Q., Ma, Y., He, R., Ji, Y., Sun, Z., 2011 A rapid method to screen putative

mRNA targets of any known microRNA. Virology Journal 8: 8.

Kim, V.N., and Nam, J.W., 2006 Genomics of microRNA. Trends in genetics 22(3): 165-176.

Lee, R.C., Ambros, V., 2001 An extensive class of small RNAs in Caenorhabditis elegans. Science

294: 862–64.

Lee, R.C., Feinbaum, R.L., and Ambros, V., 1993 The C. elegans heterochronic gene lin-4

encodes small RNAs with antisense complementarity to lin-14. Cell 75: 843–854.


Jones-Rhoades, M. W., Bartel, D. P., and Bartel, B., 2006 MicroRNAs and Their

Regulatory Roles in Plants. Annu. Rev. Plant Biol. 57: 19–53.

Morin, R. D., Aksay, G., Dolgosheina, E., Ebhardt, H.A., Magrini, V., Mardis, E. R., Sahinalp, S. C,

Unrau, P. J., 2008 Comparative analysis of the small RNA transcriptomes of Pinus contorta and

Oryza sativa. Genome Res 18 (4): 571-84.

Moxon, S., Moulton, V., Kim, J. T., 2008 A scoring matrix approach to detecting miRNA target

sites. Algorithms Mol. Biol 3 (1): 3.

Ng, D.W., Lua, J., and Chen, Z.J., 2012 Big roles for small RNAs in polyploidy, hybrid vigor, and

hybrid incompatibility. Current Opinion in Plant Biology 15: 154–161.

Rhoades, M.W.J., 2012 Conservation and divergence in plant microRNAs. Plant Mol Biol 80: 3–16.

Subramanian, S., Fu, Y., Sunkar, R., Barbazuk, W.B., Zhu, J.K., Yu, O., 2008 Novel and

nodulation-regulated microRNAs in soybean roots. BMC Genomics 9: 160.

Sun, G., 2011 MicroRNAs and their diverse functions in plants. Plant Mol. Biol. DOI

10.1007/s11103-011-9817-6.

Sunkar, R., Jagadeeswaran, G., 2008 In silico identification of conserved microRNAs in large

number of diverse plant species. BMC Plant Biol 8(1): 37.

Wightman, B., Ha, I., Ruvkun, G., 1993 Posttranscriptional regulation of the heterochronic gene

lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75: 855–62.

Zhang, B., Pan, X., Cobb, G. P., Anderson, T. A., 2006 Plant microRNA: a small regulatory

molecule with big impact. Dev Biol 289(1):3-16.


Next Generation Sequencing Technology platform for development of

SNP markers and association mapping studies

Dr. N.Senthil, Professor Department of Plant Biotechnology, CPMB&B,


The decreasing cost along with rapid progress in next-generation sequencing and

related bioinformatics computing resources has facilitated large-scale discovery of SNPs in

various model and non-model plant species. Large numbers and genome-wide availability of

SNPs make them the marker of choice in partially or completely sequenced genomes.

Molecular markers are widely used in plant genetic research and breeding. Single Nucleotide

Polymorphisms (SNPs) are currently the marker of choice due to their large numbers in virtually

all populations of individuals. The applications of SNP markers have clearly been demonstrated

in human genomics where complete sequencing of the human genome led to the discovery of

several million SNPs and technologies to analyze large sets of SNPs (up to 1 million) have been

developed. SNPs have been applied in areas as diverse as human forensics and diagnostics,

aquaculture, marker assisted-breeding of dairy cattle, crop improvement, conservation, and

resource management in fisheries. Functional genomic studies have capitalized upon SNPs

located within regulatory genes, transcripts, and Expressed Sequence Tags (ESTs). Until recently

large scale SNP discovery in plants was limited to maize, Arabidopsis, and rice. Genetic

applications such as linkage mapping, population structure, association studies, map-based

cloning, marker-assisted plant breeding, and functional genomics continue to be enabled by

access to large collections of SNPs. Arabidopsis thaliana was the first plant genome sequenced

followed soon after by rice. In the year 2011 alone, the number of plant genomes sequenced

doubled compared with the number sequenced in the previous decade, bringing the current total of

publicly released sequenced plant genomes to 31 and counting (http://www.phytozome.net/).

With the ever increasing throughput of next-generation sequencing (NGS), de novo and

reference-based SNP discovery and application are now feasible for numerous plant species.


The major sequencing technologies are described below.

Roche (454) Sequencing

Pyrosequencing was the first of the new highly parallel sequencing technologies to

reach the market. It is commonly referred to as 454 sequencing after the name of the company

that first commercialized it. It is a sequencing-by-synthesis (SBS) method in which single
fragments of DNA are hybridized to a capture bead array and the beads are emulsified with the
reagents necessary to PCR-amplify the individually bound template. Each bead in the emulsion
acts as an independent PCR in which millions of copies of the original template are produced
and bound to the capture beads, which then serve as the templates for the subsequent
sequencing reaction. The individual beads are deposited into a picotiter plate along with DNA
polymerase, primers, and the enzymes necessary to generate light through the consumption of
the inorganic pyrophosphate produced

during sequencing. The instrument washes the picotiter plate with each of the DNA bases in

turn. As template-specific incorporation of a base by DNA polymerase occurs, a pyrophosphate

(PPi) is produced. This pyrophosphate is detected by an enzymatic luminometric inorganic

pyrophosphate detection assay (ELIDA) through the generation of a light signal following the

conversion of PPi into ATP. Thus, the wells in which the current nucleotides are being

incorporated by the sequencing reaction occurring on the bead emit a light signal proportional

to the number of nucleotides incorporated, whereas wells in which the nucleotides are not

being incorporated do not. The instrument repeats the sequential nucleotide wash cycle

hundreds of times to lengthen the sequences. The 454 GS FLX Titanium XL+ platform currently

generates up to 700 Mb of raw 750 bp reads in a 23-hour run. The technology has difficulty

quantifying homopolymers resulting in insertions/deletions and has an overall error rate of

approximately 1%. Reagent costs are approximately $6,200 per run.
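The relationship between nucleotide flows and light signal, and the reason homopolymers are hard to quantify, can be seen in an idealised simulation. The sketch below assumes a noise-free signal exactly equal to the number of bases incorporated and a repeating flow order of T, A, C, G; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def flowgram(read, flow_order="TACG", cycles=4):
    """Simulate idealised pyrosequencing signals for one bead.

    `read` is the sequence of the strand being synthesised (5'->3').
    Each flow supplies one nucleotide species; the signal is the number of
    consecutive bases of that species incorporated in the flow (0 if none).
    """
    signals = []
    pos = 0
    for _ in range(cycles):
        for base in flow_order:
            run = 0
            while pos < len(read) and read[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
    return signals

# A made-up read with a TT and a GGG homopolymer.
for base, signal in flowgram("TTAGGGC", cycles=2):
    print(base, signal)
```

In a real instrument the signal is an analogue light intensity, so distinguishing a run of, say, seven bases from eight is error-prone, which is the source of the insertion/deletion errors mentioned above.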

Illumina Sequencing

Illumina technology, acquired by Illumina from Solexa, followed the release of 454

sequencing. With this sequencing approach, fragments of DNA are hybridized to a solid

substrate called a flow cell. In a process called bridge amplification, the bound DNA template


fragments are amplified in an isothermal reaction where copies of the template are created in

close proximity to the original. This results in clusters of DNA fragments on the flow cell

creating a “lawn” of bound single strand DNA molecules. The molecules are sequenced by

flooding the flow cell with a new class of cleavable fluorescent nucleotides and the reagents

necessary for DNA polymerization. A complementary strand of each template is synthesized

one base at a time using fluorescently labeled nucleotides. The fluorescent molecule is excited

by a laser and emits light, the colour of which is different for each of the four bases. The

fluorescent label is then cleaved off and a new round of polymerization occurs. Unlike 454
sequencing, all four bases are present for the polymerization step and only a single nucleotide is
incorporated per cycle. The flagship HiSeq2500 sequencing instrument from Illumina can
generate up to 600 Gb per run with a read length of 100 nt and a 0.1% error rate. The Illumina
technique can generate sequence from opposite ends of a DNA fragment, so-called paired-end
(PE) reads. Reagent costs are approximately $23,500 per run.

Applied Biosystems (SOLiD) Sequencing

The SOLiD system was jointly developed by the Harvard Medical School and the Howard

Hughes Medical Institute. The library preparation in SOLiD is very similar to Roche/454 in which

clonal bead populations are prepared in microreactors containing DNA template, beads,

primers, and PCR components. Beads that contain PCR products amplified by emulsion PCR are

enriched by a proprietary process. The DNA templates on the beads are modified at their 3′ end

to allow attachment to glass slides. A primer is annealed to an adapter on the DNA template

and a mixture of fluorescently tagged oligonucleotides is pumped into the flow cell. When the

oligonucleotide matches the template sequence, it is ligated onto the primer and the

unincorporated nucleotides are washed away. A charge-coupled device (CCD) camera captures

the different colours attached to the primer. Each fluorescence wavelength corresponds to a

particular dinucleotide combination. After image capture, the fluorescent tag is removed and

a new set of oligonucleotides is injected into the flow cell to begin the next round of DNA

ligation. This sequencing-by-ligation method on the SOLiD 5500xl platform generates up to 1,410


million PE reads of nucleotide each with an error rate of 0.01% and reagent cost of

approximately $10,500 per run.
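The statement that each colour corresponds to a dinucleotide combination can be illustrated with the commonly described two-base (colour-space) encoding, in which four colours cover the sixteen dinucleotides. The sketch below assumes that scheme, expressed as an XOR of 2-bit base codes; the numeric colour labels 0-3 and the function names are illustrative choices.

```python
# Two-base colour-space encoding (illustrative sketch).
# With A=0, C=1, G=2, T=3, the colour of a dinucleotide is the XOR of its codes:
#   0: AA CC GG TT   1: AC CA GT TG   2: AG GA CT TC   3: AT TA CG GC
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = {v: k for k, v in CODE.items()}

def encode_colors(seq):
    """Return (first base, list of colours), one colour per adjacent base pair."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors

def decode_colors(first_base, colors):
    """Rebuild the base sequence from the first base and the colour calls."""
    bases = [first_base]
    for c in colors:
        bases.append(BASE[CODE[bases[-1]] ^ c])
    return "".join(bases)

first, colors = encode_colors("ATGGC")
print(first, colors)
print(decode_colors(first, colors))
```

Because every base contributes to two adjacent colours, a true single-base variant changes two consecutive colours, whereas an isolated colour change is more likely a measurement error; this property is usually given as the reason for the platform's low error rate.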

Although widely accepted and used, the NGS platforms suffer from amplification biases
introduced by PCR and dephasing due to varying extension of templates. Third-generation
sequencing (TGS) technologies use single-molecule sequencing, which eliminates the need for
prior amplification of DNA and thus overcomes these limitations of NGS. The advantages
offered by TGS technology are (i) lower cost, (ii) high throughput, (iii) faster turnaround, and
(iv) longer reads. TGS can broadly be classified into three categories: (i) sequencing by
synthesis (SBS), where individual nucleotides are observed as they are incorporated (Pacific
Biosciences single molecule real time (SMRT), Heliscope true single molecule sequencing
(tSMS), and Life Technologies/Starlight and Ion Torrent), (ii) nanopore sequencing, where
single nucleotides are detected as they pass through a nanopore (Oxford/Nanopore), and
(iii) direct imaging of individual molecules (IBM).

Helicos Biosciences Corporation (Heliscope) Sequencing

Heliscope sequencing involves DNA library preparation and DNA shearing followed by

addition of a poly-A tail to the sheared DNA fragments. These poly-A tailed DNA fragments are

attached to flow cells through poly-T anchors. The sequencing proceeds by DNA extension with

one out of 4 fluorescent tagged nucleotides incorporated followed by detection by the

Heliscope sequencer. The fluorescent tag on the incorporated nucleotide is then chemically

cleaved to allow subsequent elongation of DNA. Heliscope sequencers can generate up to 28GB

of sequence data per run (50 channels) with maximum read length of 55bp at 99% accuracy.

The cost per run per channel is approximately $360.

Pacific Biosciences SMRT Sequencing

The Pacific Biosciences sequencer uses glass anchored DNA polymerases which are

housed at the bottom of a zero-mode waveguide (ZMW). DNA fragments are added into the

ZMW chamber with the anchored DNA polymerase and nucleotides, each labeled with a

different colour fluorophore, and are diffused from above the ZMW. As the nucleotides

circulate through the ZMW, only the incorporated nucleotides remain at the bottom of the

ZMW while unincorporated nucleotides diffuse back above the ZMW. A laser placed below the

ZMW excites only the fluorophores of the incorporated nucleotides as the ZMW entraps the

light and does not allow it to reach the unincorporated nucleotides above. The Pacific

Biosciences sequencers can generate up to 140MB of sequences per run (per SMRT cell) with

reads of 2.5Kbp at 85% accuracy. The cost per run per SMRT cell is approximately $600.

Among the TGS technologies, Pacific Biosciences SMRT and Heliscope tSMS have been

used in characterizing bacterial genomes and in human-disease-related studies; however, TGS

has yet to be capitalized upon in plant genomes. The Heliscope generates short reads (55bp)

which may cause ambiguous read mapping due to the presence of paralogous sequences and

repetitive elements in plant genomes. The Pacific Biosciences reads have high error rates which

limit their direct use in SNP discovery. However, their long reads offer a definite advantage to

fill gaps in genomic sequences and, at least in bacterial genomes, NGS reads have proven

capable of “correcting” the base call errors of this TGS technology. Hybrid assemblies

incorporating short (Illumina, SOLiD), medium (454/Roche), and long reads (Pac-Bio) have the

potential to yield better quality reference genomes and, as such, would provide an improved

tool for SNP discovery.

The choice of a sequencing strategy must take into account the research goals, ability to

store and analyze data, the ongoing changes in performance parameters, and the cost of

NGS/TGS platforms. Some key considerations include cost per raw base, cost per consensus

base, raw and consensus accuracy of bases, read length, cost per read, and availability of PE or

single end reads. The pre- and postprocessing protocols such as library construction and

pipeline development and implementation for data analysis are also important.
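
As a rough illustration of the "cost per raw base" consideration, the figures quoted above for the two TGS platforms can be compared directly. This is only a sketch: it assumes that "GB" and "MB" mean gigabases and megabases, that 1 GB equals 1,000 MB, and that Heliscope output is spread evenly over the 50 channels. The function name is hypothetical.

```python
# Figures quoted in the text above; "GB"/"MB" are read as gigabases/megabases
# and 1 GB is taken as 1,000 MB (both are assumptions of this sketch).
RUNS = {
    "Heliscope, one channel": {"megabases": 28_000 / 50, "cost_usd": 360.0},
    "Pacific Biosciences, one cell": {"megabases": 140.0, "cost_usd": 600.0},
}


def cost_per_raw_megabase(megabases, cost_usd):
    """Return the reagent cost (USD) per raw megabase of sequence."""
    if megabases <= 0:
        raise ValueError("megabases must be positive")
    return cost_usd / megabases


if __name__ == "__main__":
    for name, run in RUNS.items():
        usd_per_mb = cost_per_raw_megabase(run["megabases"], run["cost_usd"])
        print(f"{name}: about ${usd_per_mb:.2f} per raw Mb")
```

Under these assumptions the quoted numbers work out to roughly $0.64 per raw megabase for Heliscope and $4.29 for Pacific Biosciences. Raw cost alone does not settle the choice, since read length and accuracy differ greatly between the two.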

RNA and ChIP Sequencing

Genome-wide analyses of RNA sequences and their qualitative and quantitative

measurements provide insights into the complex nature of regulatory networks. RNA

sequencing has been performed on a number of plant species including Arabidopsis, soybean,

rice, and maize for transcript profiling and detection of splice variants. RNA sequencing has

been used in de novo assemblies followed by SNP discovery performed in nonmodel plants

such as Eucalyptus grandis, Brassica napus, and Medicago sativa.

RNA deep-sequencing technologies such as digital gene expression and Illumina RNASeq

are both qualitative and quantitative in nature and permit the identification of rare transcripts

and splice variants. RNA sequencing may be performed following its conversion into cDNA that

can then be sequenced as such. This method is, however, prone to error due to (i) the

inefficient nature of reverse transcriptases (RTs), (ii) DNA-dependent DNA polymerase activity

of RT causing spurious second strand DNA, and (iii) artifactual cDNA synthesis due to template

switching. Direct RNA sequencing (DRS) developed by Helicos Biosciences Corporation is a high

throughput and cost-effective method which eliminates the need for cDNA synthesis and

ligation/amplification leading to improved accuracy.

Chromatin immunoprecipitation (ChIP) followed by sequencing is a specialized method that was

specifically designed to identify DNA sequences involved in in vivo protein-DNA interactions.

ChIP-sequencing (ChIP-Seq) is used to map the binding sites of transcription factors and of other

DNA-binding proteins such as histones. As such, ChIP-Seq does not aid SNP discovery,

but the availability of SNP data along with ChIP-Seq allows the study of allele-specific states of

chromatin organization. Deep sequence coverage leading to dense SNP maps permits the

identification of transcription factor binding sites and histone-mediated epigenetic

modifications. ChIP-Seq can be performed on serial analysis of gene expression (SAGE) tags or

PE using Sanger, 454, and Illumina platforms.

Applications of SNP markers in Association Mapping

SNPs are increasingly becoming the marker of choice for a wide range of applications

including genetic mapping, MAS, diversity analysis and association mapping. Fast and efficient

generation of these SNPs has been facilitated by high throughput genotyping methods, as

mentioned above, thus making SNPs a marker of choice for association genetics studies.

As the availability of SNPs increases, they are displacing other forms of molecular

markers for association mapping studies. As costs associated with SNP discovery and detection

continue to fall, SNPs will increasingly be associated with agronomic traits and will be applied

for crop improvement through Association Mapping.

General Scheme of Association Mapping for Tagging a Gene of Interest using

Germplasm Accessions.

Types of Association Mapping

Genome-wide Association Mapping (GWAS)

It is a comprehensive approach to systematically search the genome for causal genetic

variation. A large number of markers are tested for association with various complex traits,

and prior information regarding candidate genes is not required. It works best for a research

consortium with complementary expertise and adequate funding.

Candidate-gene association mapping

Candidate genes are selected based on prior knowledge from mutational analysis,

biochemical pathway, or linkage analysis of the trait of interest. An independent set of random

markers needs to be scored to infer genetic relationships. It is a low cost, hypothesis driven,

and trait specific approach but will miss other unknown loci (Zhu et al., 2010).

Genome Scans and Candidate Genes

Association studies with high density SNP coverage, large sample size, and minimum

population structure offer great promise in complex trait dissection. To date, candidate-gene

association studies have searched only a tiny fraction of the genome. As genomic technologies

continue to evolve, we would certainly expect to see more genome-wide association analyses

conducted in different plant species. So far, there have been few successful results from

candidate-gene association mapping. But for many research groups, starting with candidate-

gene sequences and background markers will provide firm understanding of population

structure, familial relatedness, nucleotide diversity, LD decay, and many other aspects of

association mapping. Afterward, this knowledge can be built on through comprehensive

genome scans with intensive sequencing and high-density genotyping.

Another reason for the promising but still limited success found in the candidate-gene

approach is the way candidate genes were selected. Obviously, many candidate genes were

discovered through comparisons of severe mutants and the wild-type lines. We do not have a

strong understanding of naturally occurring effects of alleles at such loci. Even if the loss-of-

function allele results in a significant phenotypic change, we can only expect that mild

mutations would have a somewhat modest effect on the phenotype; those changes, in turn,

could be detected with the assembled association mapping population. Moreover, both the

frequency and effect of the allele affect whether the variation explained by a locus is detectable. A

skewed allele frequency would make it difficult to detect an association even though the

candidate gene polymorphism is truly underlying the phenotypic variation.
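
The joint role of allele frequency and allele effect can be made concrete with a small calculation. The sketch below is not taken from the text: it assumes a purely additive, biallelic locus in a random-mating population, for which standard quantitative genetics gives the variance contributed by the locus as 2p(1 - p)a^2, where p is the allele frequency and a the additive effect. The function name is hypothetical, and inbred association panels would need a different constant.

```python
def additive_variance(p, a):
    """Variance contributed by a biallelic additive locus.

    Assumes random mating (Hardy-Weinberg proportions); p is the frequency
    of one allele and a is the additive effect of substituting that allele.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 2.0 * p * (1.0 - p) * a * a


if __name__ == "__main__":
    # Same effect size, balanced versus skewed allele frequency.
    for p in (0.5, 0.2, 0.05):
        print(f"p = {p:.2f}: variance = {additive_variance(p, 1.0):.3f}")
```

With the same effect size, a locus at p = 0.05 contributes less than a fifth of the variance contributed at p = 0.5, which is one way to see why a skewed allele frequency makes a true association hard to detect.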

Nested Association Mapping

Ultimately, it is desirable to have both candidate-gene and genome-wide approaches to

exploit in a species along with traditional linkage mapping. Joint linkage and linkage

disequilibrium mapping have been proposed as a “fine mapping” approach in theory (Mott and

Flint, 2002; Wu et al., 2002) and demonstrated in practice (Blott et al., 2003; Meuwissen et al.,

2002). Nested association mapping (NAM), currently implemented in maize, could be an even

more powerful strategy for dissecting the genetic basis of quantitative traits in species with low

LD (Yu et al., 2008). For other crop species, different genetic designs (e.g., diallel, design II,

eight-way cross, single round robin, or double round robin) could be used to accommodate the

level of LD, practicality of creating the population and phenotyping a large number of RILs, and

resources available (Churchill et al., 2004; Stich et al., 2007).

In essence, by integrating genetic design, natural diversity, and genomics technologies,

the NAM strategy allows high power, cost-effective genome scans, and facilitates community

endeavours to link molecular variation with complex trait variation.

Functional genomics

Dr. M. Raveendran, Associate Professor, Department of Plant Biotechnology, CPMB&B,


Functional genomics is a field of molecular biology that attempts to describe gene (and

protein) functions and interactions. Functional genomics focuses on the dynamic aspects such

as gene transcription, translation, and protein–protein interactions. A key characteristic of

functional genomics studies is their genome-wide approach to these questions, generally

involving high-throughput methods rather than a more traditional “gene-by-gene” approach.

This approach is based on data mining, large scale experimental methodologies combined with

statistical and computational analyses of the results.

The goal of functional genomics is to understand the relationship between an

organism's genome and its phenotype. Functional genomics involves studies of natural

variation in genes, RNA, and proteins over time (such as an organism's development) or space

(such as its body regions), as well as studies of natural or experimental functional disruptions

affecting genes, chromosomes, RNA, or proteins.

Techniques and applications

Functional genomics includes function-related aspects of the genome, such as gene expression

levels (molecular activities) and mutation and polymorphism (such as SNP) analysis. Expression

analysis (profiling) comprises a number of "-omics" such as transcriptomics (gene expression),

proteomics (protein expression), and metabolomics. Functional genomics uses mostly multiplex

techniques to measure the abundance of many or all gene products such as mRNAs or proteins

within a biological sample.

Microarrays

Microarrays measure the amount of mRNA in a sample that corresponds to a given gene

or probe DNA sequence. Probe sequences are immobilized on a solid surface and allowed to

hybridize with fluorescently labeled “target” mRNA. The intensity of fluorescence of a spot is

proportional to the amount of target sequence that has hybridized to that spot, and therefore

to the abundance of that mRNA sequence in the sample. Microarrays allow for identification of

candidate genes involved in a given process based on variation between transcript levels for

different conditions and shared expression patterns with genes of known function.
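
The comparison of transcript levels between conditions is often summarised as a log2 ratio of spot intensities. The following is a minimal sketch with made-up intensities and hypothetical function names; the pseudocount and the two-fold threshold are illustrative assumptions, and a real analysis would also need background correction, normalization and replicate-based statistics.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of two spot intensities; the pseudocount avoids log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def candidate_genes(intensities, threshold=1.0):
    """Return genes whose absolute log2 fold change reaches the threshold.

    intensities maps a gene name to a (control, treated) intensity pair.
    """
    hits = {}
    for gene, (control, treated) in intensities.items():
        lfc = log2_fold_change(treated, control)
        if abs(lfc) >= threshold:
            hits[gene] = lfc
    return hits


if __name__ == "__main__":
    # Hypothetical intensities for three probes.
    data = {"geneA": (99, 399), "geneB": (199, 199), "geneC": (399, 99)}
    for gene, lfc in candidate_genes(data).items():
        print(gene, round(lfc, 2))
```

Genes passing the threshold are only candidates; shared expression patterns with genes of known function, as described above, are usually examined next.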

SAGE

SAGE (Serial analysis of gene expression) is an alternate method of gene expression

analysis based on RNA sequencing rather than hybridization. SAGE relies on the sequencing of

10–17 base pair tags which are unique to each gene. These tags are produced from poly-A

mRNA and ligated end-to-end before sequencing. SAGE gives an unbiased measurement of the

number of transcripts per cell.

At the protein level: protein–protein interactions

Yeast two-hybrid system

A yeast two-hybrid (Y2H) screen tests a "bait" protein against many potential interacting

proteins ("prey") to identify physical protein–protein interactions. This system is based on a

transcription factor, originally GAL4, which possesses separate DNA-binding and transcription

activation domains required in order for the protein to cause transcription of a reporter gene.

In a Y2H screen, the "bait" protein is fused to the binding domain of GAL4, and a library of

potential "prey" (interacting) proteins is recombinantly expressed in a vector with the

activation domain. In vivo interaction of bait and prey proteins in a yeast cell brings the activation

and binding domains of GAL4 close enough together to result in expression of a reporter gene.

It is also possible to systematically test a library of bait proteins against a library of prey

proteins to identify all possible interactions in a cell.

AP/MS

Affinity purification and mass spectrometry (AP/MS) is able to identify proteins that

interact with one another in complexes. Complexes of proteins are allowed to form around a

particular “bait” protein. The bait protein is identified using an antibody or a recombinant tag

which allows it to be extracted along with any proteins that have formed a complex with it. The

proteins are then digested into short peptide fragments and mass spectrometry is used to

identify the proteins based on the mass-to-charge ratios of those fragments.

Loss-of-function techniques

Mutagenesis

Gene function can be investigated by systematically “knocking out” genes one by one.

This is done by either deletion or disruption of function (such as by insertional mutagenesis)

and the resulting organisms are screened for phenotypes that provide clues to the function of

the disrupted gene.

RNAi

RNA interference (RNAi) methods can be used to transiently silence or knock down gene

expression using ~20 base-pair double-stranded RNA typically delivered by transfection of

synthetic ~20-mer short-interfering RNA molecules (siRNAs) or by virally encoded short-hairpin

RNAs (shRNAs). RNAi screens, typically performed in cell culture-based assays or experimental

organisms (such as C. elegans) can be used to systematically disrupt nearly every gene in a

genome or subsets of genes (sub-genomes); possible functions of disrupted genes can be

assigned based on observed phenotypes.

Functional genomics and bioinformatics

Because of the large quantity of data produced by these techniques and the desire to

find biologically meaningful patterns, bioinformatics is crucial to analysis of functional genomics

data. Examples of techniques in this class are data clustering or principal component analysis

for unsupervised machine learning (class detection) as well as artificial neural networks or

support vector machines for supervised machine learning.

Genome Mapping: Establishing route maps of Genomes

Dr. M. Maheswaran, Professor Centre for Plant Breeding and Genetics


Introduction

The genome of each organism is the blueprint that determines its life. In general,

genomes are grouped into two categories, viz. prokaryotic and eukaryotic. The eukaryotic

genome is separated from the cytoplasm by the nuclear membrane. The nuclear genomes

of different organisms differ in their sequences, in the amount of DNA and in

the number of chromosomes. However, certain features are unique to eukaryotic genomes

compared with prokaryotic genomes (see Table 1).

Table 1: Major differences between prokaryotic and eukaryotic genomes

Characteristic                 Prokaryotes         Eukaryotes
Genome size                    600 kb to 9.5 Mb    3 Mb to 140,000 Mb
Average gene size/number       950 bp/4,300        2,500 bp/19,000
Operon-like regulatory unit    General             No
Horizontal gene transfer       Significant         Negligible
Rate of non-coding sequences   Low                 High
Intron                         Rare                General
Redundant gene number          Rare                General
Ploidy level                   Haploidy            Haploidy to polyploidy
Chromosome number              One                 More than one
Heterozygosity                 No                  Yes

What is a genome?

The conventional definition of a genome is the set of chromosomes representing the

haploid complement. A more recent definition states the following: the term genome is used

to describe the totality of chromosomes (in molecular terms, DNA) unique to a particular

organism or any cell within the organism, as distinct from genotype, which is the information

contained within those chromosomes.

The first genomes to be studied in depth at the molecular level were those of the

bacterium Escherichia coli and its bacteriophage. The second group of genomes to be studied in

great detail at the molecular level were those of mitochondria and chloroplasts. Finally, attention

has turned to a detailed study of the nuclear genomes of selected eukaryotes: yeast

(Saccharomyces cerevisiae), a nematode (Caenorhabditis elegans), a mammal (Homo sapiens)

and plants such as Arabidopsis thaliana and rice (Oryza sativa).

Figure 1. Genomes A) Arabidopsis thaliana (2n = 10),

B) Oryza sativa (2n = 24) and C) Homo sapiens (2n = 46)

Nuclear DNA content and genomes

The total amount of DNA in the (haploid) genome is a characteristic of each living

species known as its C value. The fact that the DNA content of an organism is much greater

than that required to code for and regulate the production of all necessary proteins has been

termed the C value paradox. In eukaryotes, the C value is defined as the amount of DNA per

genome (1C = haploid nucleus, 2C = diploid nucleus and 4C = nucleus which is just about to

divide by mitosis). There is enormous variation in the range of C values, from as little as a mere

10^6 bp for a mycoplasma to as much as 10^11 bp for some plants and animals. Table 2

summarizes the range of C values found in different plant species.

Table 2. Variation in nuclear DNA content of various plant species

Species                    pg/2C    Mbp/1C
Arabidopsis thaliana        0.30       145
Arachis hypogaea            5.83      2813
Brassica juncea             2.29      1105
Cicer arietinum             1.53       738
Glycine max                 2.31      1115
Gossypium hirsutum          4.39      2118
Helianthus annuus           5.95      2871
Hordeum vulgare            10.10      4873
Lycopersicon esculentum     1.88       907
Oryza sativa                0.87       419
Phaseolus vulgaris          1.32       637
Pisum sativum               8.18      3947
Saccharum officinarum       5.28      2547
Triticum aestivum          33.09     15966
Vigna mungo                 1.19       574
Vigna radiata               1.20       579
Zea mays                    4.75      2292

1 picogram (pg) = 965 million base pairs (Mbp)
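
The two columns of Table 2 are linked by the conversion factor given beneath it: the 2C value in picograms is halved to give 1C and then multiplied by 965 Mbp per pg. A minimal sketch of that arithmetic follows; the function name is hypothetical, and a few table entries differ from the computed value by one unit, presumably because the published pg values were rounded.

```python
MBP_PER_PG = 965.0  # conversion factor stated beneath Table 2


def mbp_per_1c(pg_per_2c):
    """Convert nuclear DNA content from pg per 2C nucleus to Mbp per 1C."""
    if pg_per_2c < 0:
        raise ValueError("DNA content cannot be negative")
    return pg_per_2c / 2.0 * MBP_PER_PG


if __name__ == "__main__":
    # A few pg/2C values taken from Table 2.
    for species, pg in [("Arabidopsis thaliana", 0.30),
                        ("Hordeum vulgare", 10.10),
                        ("Triticum aestivum", 33.09)]:
        print(f"{species}: {mbp_per_1c(pg):.0f} Mbp/1C")
```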

Genome or Chromosome Mapping

Genome mapping, also called chromosome mapping, is the process of establishing the

road map of a genome. Mapping of chromosomes helps to locate important genes.

Chromosome mapping helps to identify the molecular environment of both coding and non-coding

sequences. The map of a chromosome can be of two types: genetic map and physical

map.

Figure 2. Diagrammatic representation of genetic map and portion of physical map of a chromosome

Genetic map: Genetic map identifies the linear arrangement of genes on a chromosome and

is assembled from meiotic recombination data. These are theoretical maps which give the order

of genes on a chromosome. They cannot pinpoint the physical whereabouts of genes or

determine how far apart they are. Distances on this map are not directly equivalent to physical

distances. The unit of measurement is the centimorgan (cM).

Physical map: Physical maps identify the actual physical position of genes on a chromosome.

Distances are measured in base pairs (bp), kilobases (kb = 1,000 bases) or megabases (Mb =

1,000,000 bases).

Construction of a Genetic Map

The construction of a genetic map involves the linkage analysis of genes segregating

among the progenies of a sexual cross. According to Mendel’s Law of Independent

Assortment, members of different pairs of alleles assort independently of each other when the

germ cell is formed. This is only true if the genes are on different chromosomes. If they are on

the same chromosome, they are said to be linked. The frequency of recombination of a pair of

linked genes is constant and characteristic for that pair of genes. The strength of linkage and

recombination observed is a function of the distance on the chromosome between the genes in

question. The greater the distance, the more recombination is expected between genes. In other words,

the frequency of recombinants (individuals unlike either parent) determines the distance

between genes. When multiple genes are considered for their recombination frequency, one

can group all the genes that are linked, and the resultant group is called a linkage group. The

number of linkage groups equals the haploid number of chromosomes. The early work on

linkage map construction was mainly based on the segregation of easily observable

morphological traits among the progenies of a cross. There are several examples of linkage

maps constructed from morphological traits, otherwise called morphological markers.

Morphological markers are easily observable traits with discrete phenotypes. Though these

markers are useful, their limited number is a constraint. Further, these markers are

influenced by the environment and the genetic background of the individual.

Genes were the markers for earlier genetic maps

The first genetic maps, constructed in the early decades of the 20th century for

organisms such as the fruit fly, used genes as markers. This was many years before it was

understood that genes are segments of DNA molecules. Instead, genes were looked on as

abstract entities responsible for the transmission of heritable characteristics from parent to

offspring. To be useful in genetic analysis, a heritable characteristic had to exist in two

alternative forms or phenotypes, an example being tall and dwarf statures in the pea plants

originally studied by Mendel. By 1922 over 50 genes had been mapped on the four fruit fly

chromosomes. Figure 3 shows the genome and genetic map based on various genes of

Drosophila melanogaster.

Figure 3. Diagrammatic representation of genetic map and portion of physical map of a

chromosome

Concept behind the genetic map construction

Genes located on the same chromosome tend to remain together, so not all genes

assort independently as Mendel proposed. Genes on the same chromosome that do

not show independent assortment are said to be linked. The strength of linkage between a pair

of genes can be determined based on the recombination between genes by a process called

crossing over between two homologous chromosomes. This results in the production of

gametes similar to parental types and some not similar to parental types in a heterozygous

organism. The non-parental types of gametes are called recombinant types. The frequency of

recombinants is a measure of whether genes are linked and, if so, how close together they

are on the chromosome; this process is called linkage analysis.

Figure 4. Diagrammatic representation of crossing over and recombination between two

homologous chromosomes.

Based on the linkage analysis one can determine the distance between two genes on a

chromosome and establish the order of genes with other genes. Two-point analysis is a method

by which the genetic distance between two linked genes is determined. The order of genes is

established based on three-point analysis. When many genes are involved it is called multipoint

analysis.
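
The two-point and three-point ideas described above can be sketched in a few lines. The example is illustrative only: the function names are hypothetical, the conversion of recombination frequency to centimorgans as a simple percentage is an approximation that holds only for closely linked genes (no mapping function is applied), and gene order is inferred from pairwise frequencies rather than from the double-crossover classes used in a classical three-point test cross.

```python
def recombination_frequency(recombinants, total):
    """Two-point analysis: fraction of progeny that are recombinant."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total


def approx_map_distance_cm(rf):
    """Approximate map distance in cM (1% recombination taken as 1 cM)."""
    return rf * 100.0


def order_three_genes(pairwise_rf):
    """Order three linked genes from their pairwise recombination frequencies.

    pairwise_rf maps frozenset({gene1, gene2}) to a recombination frequency.
    The pair with the largest frequency is taken as the two outer genes.
    """
    outer = max(pairwise_rf, key=pairwise_rf.get)
    genes = set().union(*pairwise_rf)
    middle = (genes - outer).pop()
    left, right = sorted(outer)
    return [left, middle, right]


if __name__ == "__main__":
    # Hypothetical counts: recombinants out of 1,000 progeny for each pair.
    rf = {
        frozenset("AB"): recombination_frequency(100, 1000),
        frozenset("BC"): recombination_frequency(150, 1000),
        frozenset("AC"): recombination_frequency(230, 1000),
    }
    print(order_three_genes(rf))
    print(approx_map_distance_cm(rf[frozenset("AB")]), "cM between A and B")
```

In the hypothetical data the A-C frequency is the largest, so B is placed between A and C. Multipoint analysis extends the same reasoning to many markers at once and is normally left to dedicated software.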

DNA markers hastened the process of genetic map construction

In the past, genetic maps of different plant and animal genomes were constructed

mostly based on morphological markers (phenotypes showing Mendelian segregation).

However, the limited number of such markers slowed linkage map

construction in several species. The advent of DNA markers, polymorphic loci

characterized by fragments of variable length, paved the way for easier and faster linkage map

construction. These DNA markers possess several positive features over the morphological

markers. See Table 3 for the differences between the morphological and molecular markers.

Table 3. Differences between morphological and molecular markers

Morphological markers                                  Molecular markers
Less in number                                         More in number
Interact with the environment                          No environmental influence
Show differential expression across genotypes          Phenotypically neutral
Not distributed throughout the genome                  Distributed throughout the genome
Expression depends on the ontogeny of the individual   Can be used at any growth stage of the individual
Dominant-recessive expression                          Codominant or dominant

Construction of genetic maps using molecular markers generally involves the following

steps viz., 1) Identifying suitable DNA marker system for the map construction, 2) Parental

survey to identify polymorphic DNA markers, 3) Establishing a suitable mapping population, 4)

Segregation analysis of polymorphic markers on the individuals of the mapping population, 5)

Linkage analysis to establish linkage map of each chromosome based on marker-segregation

data and 6) Assigning chromosome numbers to linkage groups.

Identifying suitable DNA marker system for the map construction

DNA markers behave in the same way as genes as far as their inheritance is concerned, thus

giving the opportunity to locate them on specific chromosomes. These DNA markers can be

grouped into two groups, viz. single-locus marker systems and multi-loci marker systems. Markers

belonging to the first category are more useful in genetic map construction [e.g restriction

fragment length polymorphism (RFLP) and simple sequence repeats (SSR)]. However, multi-loci

markers such as randomly amplified polymorphic DNA (RAPD) and amplified fragment length

polymorphism (AFLP) are also used in genetic map construction of various organisms.

Figure 5. (A) Single locus markers (top RFLP; bottom SSR), (B) multi-loci marker system (top

RAPD; bottom AFLP)

As with gene markers, to be useful a DNA marker must exist in at least two allelic forms.

Markers having differences in their allelic forms are said to be polymorphic and others not

having the differences are considered monomorphic. For some DNA markers, the

alternative allele may not exist or cannot be detected. In other words, only one of the parents shows the marker.

In the hybrid of such a parental combination the marker is present, so the heterozygote

cannot be told apart from the parent carrying the marker. These markers are called dominant markers. The other category of

markers is said to be codominant since both the alleles of a parental combination can be

detected in the hybrid. Codominant markers thus differentiate the heterozygotes from the

homozygotes and are therefore more informative than dominant markers.

Figure 6. A) Polymorphic marker, B) Monomorphic marker, C) Polymorphic co-dominant, D)

Polymorphic dominant

Parental survey to identify polymorphic DNA markers

To construct a genetic map using DNA markers, the foremost requirement is a set of

polymorphic markers covering the entire genome of an organism. This can be achieved by

selecting suitable parents which are not only divergent in their phenotypes but also in their

genotypes. A survey of different individuals with selected DNA marker system will help to

establish a suitable parental combination for developing the mapping population.

Establishing a suitable mapping population

The group of progenies derived from two parents and used to assess the segregation of

DNA markers for the purpose of linkage map construction is called a mapping population. It

is generally advantageous to use populations of early generations such as F2, F3 and BC

populations, since these populations can be established on a shorter time scale. Apart from

these populations, populations of recombinant inbred lines (RIL) and doubled haploid lines

(DHL) are also used in many of the mapping projects. The maps constructed using the RIL and

DHL populations remain the best choice considering the utility of genetic maps in various

genetic studies.

Figure 7. Various routes to develop mapping populations for the purpose of genetic map

construction

Segregation analysis of polymorphic markers across the mapping population

Once a set of polymorphic markers has been identified for a particular parental combination,

all of those markers are surveyed on the segregating population of those parents. The

segregating markers are given scores for their alleles. If the allele in an offspring is similar to P1,

the score given is ‘1’. A score of ‘3’ is given when the allele is similar to P2. If both alleles

are present, the score given is ‘2’.
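
The scoring scheme above translates directly into code. The sketch below scores each individual as 1, 2 or 3 and then checks the observed scores against an expected ratio. The 1:2:1 expectation is an assumption that applies to a codominant marker in an F2 population and is not stated in the text (a RIL population would instead be expected to segregate close to 1:1 with few heterozygotes), and the function names are hypothetical.

```python
from collections import Counter


def score_marker(has_p1_allele, has_p2_allele):
    """Return 1 (P1 allele only), 3 (P2 allele only), 2 (both) or None (missing)."""
    if has_p1_allele and has_p2_allele:
        return 2
    if has_p1_allele:
        return 1
    if has_p2_allele:
        return 3
    return None


def chi_square_1_2_1(scores):
    """Chi-square statistic against a 1:2:1 ratio (assumed F2, codominant marker)."""
    counts = Counter(s for s in scores if s in (1, 2, 3))
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no scorable individuals")
    expected = {1: n / 4, 2: n / 2, 3: n / 4}
    return sum((counts[k] - expected[k]) ** 2 / expected[k] for k in (1, 2, 3))


if __name__ == "__main__":
    # Hypothetical band patterns: (P1 band present, P2 band present).
    bands = [(True, False)] * 30 + [(True, True)] * 50 + [(False, True)] * 20
    scores = [score_marker(p1, p2) for p1, p2 in bands]
    print("chi-square:", chi_square_1_2_1(scores))
```

With two degrees of freedom, a statistic above 5.99 would indicate distorted segregation at the 5% level; markers that pass are carried forward to linkage analysis.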

Figure 8. Segregation of an RFLP marker on a set of recombinant inbred lines.

(Figure 7 diagram labels: P1 x P2; F1 x P1 and F1 x P2, with B1 and B2 as the backcross populations; F2, F3, F4 and F8 generations; RIL; DHL.)

Linkage analysis to establish linkage map of each chromosome

When the segregation pattern for a set of polymorphic markers is generated, the data

set is subjected to two-point analysis followed by multi-point analysis using standard software

packages. Several software packages are available that make the process of linkage map

construction simple. An alphabetic list of genetic analysis software is available at

http://www.nslij-genetics.org/soft/ and http://linkage.rockefeller.edu/soft/. The routinely used

software for linkage map construction is Mapmaker (Lander et al., 1987). The online tutorial is

available at http://www-genome.wi.mit.edu/genome_software.

Assigning chromosome numbers to linkage groups

During the early phase of linkage map construction, chromosomal assignment of linkage

groups was done by establishing chromosome-specific markers using cytogenetic

stocks such as trisomics and monosomics. Most of the RFLP markers were assigned

chromosome numbers based on dosage analysis using trisomics. With the advent of multi-loci

markers for genetic map construction, chromosomal assignment is done by including in the

linkage analysis markers for which the chromosomal positions are known.

Figure 8. A molecular linkage map of rice chromosome # 6.

Conclusion

Demonstration of the role of DNA markers in the construction of genetic maps has been

done in major crop species such as tomato (Bernatzky and Tanksley, 1986), maize (Helentjaris et

al., 1986), lettuce (Landry et al., 1987), potato (Bonierbale et al., 1988) and rice (McCouch et al.,

1988). Recent developments in molecular biology have simplified many aspects of genome

mapping based on the position of certain landmarks (often referred to as genetic markers) in

the genome. Genetic maps are the basic route maps used to locate both major and minor genes. But

genetic maps are not just for locating important genes; they also reveal the nature,

organization and interaction of sequences on a chromosome. They also provide the key to

understanding the nature of genes and their action across genomes. Molecular marker

based genetic maps give the clue to cloning genes of agronomic importance. The increased

use of genome maps in various crop species has resulted in a

better understanding of the molecular environment of their genomes.

References

Bernatzky, R., and Tanksley, S. D., 1986 Toward a saturated linkage map in tommato based on

isozymes and random cDNA sequences. Genetics 112: 887-898.

Bonierbalae, M. B., Plaisted, R. L., and Tanksley, S. D., l988 RFLP maps based on common set of

clones reveal modes of chromosomal evolution in potato and tomato. Genetics 120: 1095-

1103.

Helentzaris, T., Slocum, M., Wright, S., Shaefer, A., and Nienhuis, J., 1986 Construction of

genetic linkage maps in maize and toamto using restriction fragment length polymorphism.

Theor. Appl. Genet., 72: 761-769.

Lander, E. S., Green, P., Abrahamson, J., Barlow, A., Daly, M., Lincoln, S. E., and Newburg, L.,

1987 Mapmaker: an interactive computer package for constructing primary genetic linkage

maps of experimental and natural populations. Genomics 1: 174-181.

Landry, B. S., Kesseli, R. V., Farrara, B., and Michelmore, R. W., 1987 A genetic map of lettuce

(Lactuca sativa L.) with restriction fragment length polymorphism, isozyme, disease

resistance, and morphological markers. Genetics 116: 331-337.

McCouch, S. R., Kochert, G., Yu, Z. H., Wang, Z. Y., Khush, G. S., Coffman, W. R., and Tanksley, S.

D., 1988 Molecular mapping of rice chromosomes. Theor. Appl. Genet., 76: 815-829.

Nutrigenomics in Agriculture

Dr. P. Nagarajan* and Devkar Vikas Suresh

*Professor and Head

Department of Plant Molecular Biology and Bioinformatics, CPMB&B, Tamil Nadu Agricultural University, Coimbatore.

[email protected]

In modern molecular biology and genetics, the genome is the entirety of an organism's

hereditary information. It is encoded either in DNA or, for many types of virus, in RNA. The

genome includes both the genes and the non-coding sequences of the DNA. The term was

coined in 1920 by Hans Winkler, Professor of Botany at the University of Hamburg, Germany.

It is a portmanteau word, i.e., one blending the sounds and meanings of two other words,

gene and chromosome (gene + -ome, denoting a mass or large collection of genes). The term

genome can also be applied specifically to the information stored on a complete set of nuclear DNA (i.e.,

the "nuclear genome"). Although plants possess mitochondrial and chloroplast genomes, their

nuclear genome is the largest and most complex. Until very recently, the molecular analysis of

plants often focused on the single gene level. Recent technological advances have changed this

paradigm, enabling the analysis of organisms in terms of genome organization, expression and

interaction. The study of the way genes and genetic information are organized within the

genome, the methods of collecting and analyzing this information and how this organization

determines their biological functionality is referred to as genomics. Genomics is reversing the

previous paradigm of identifying genes behind biological functions and instead focuses on

finding biological functions behind genes and also reduces the gap between phenotype and

genotype.

Effects of the Human Genome Project are surfacing in anticipated and unanticipated

areas. One of the key discoveries from the project is the existence of individual differences in

gene sequences that result in differential response to environmental factors, such as diet.

Those genetic differences, single nucleotide polymorphisms (SNPs, pronounced snips), are the

key genetic enabler of the emerging scientific discipline called nutrigenomics or nutritional

genomics. In the broad arena of health and wellness, one of the major impacts on the food

industry, throughout the value chain from agricultural crops through consumer products, is this

emerging understanding of human nutrition at the molecular level of gene expression. The

science of nutrigenomics is the study of how naturally occurring chemicals in foods alter

molecular expression of genetic information in each individual. In addition to bioinformatics,

three scientific methodological approaches underpin nutrigenomics. Those methodological

approaches are based on nutrition, molecular biology, and genomics. Integration of these

disciplines is leading to the identification and understanding of individual and population

differences and similarities in gene expression, or phenotype, in response to diet. When a gene

is activated or expressed, a protein is produced. Through molecular biology and the tools of

genomics, researchers have identified genes responsible for production of nutritionally important proteins

such as digestive enzymes, transport molecules responsible for ferrying nutrients and cofactors

to their site of use, and numerous other molecules responsible for metabolism and utilization

of our dietary components, including macronutrients, vitamins, minerals, and phytochemicals.

Gene expression patterns produce a phenotype, which represents the physical characteristics

or observable traits of an organism, e.g., hair color, weight, or the presence or absence of a

disease. Phenotypic traits are not necessarily produced by genes alone. Phenotypic expression

is influenced by nutrition.

India has the largest number of vitamin A-deficient children in the world, with 330,000

children dying every year, directly or indirectly, due to this deficiency. Food sources containing

natural carotenoids may be more beneficial than vitamin supplements (Van den Berg et al.,

2000). Folate is required for synthesis of purine, pyrimidine, nucleoproteins and for

methylation reactions that play an important role in cell division and development. Folates also

play an important role in proper growth and in the optimal functioning of the bone marrow. Folate

deficiency is one of the world’s most common human nutrition problems because humans and

animals are unable to make folates de novo and hence depend on dietary sources, especially on

plants (Blancquaert et al., 2010, Scott et al., 2000).

Pulses are the predominant source of vegetarian protein in the Indian diet. Pulses contain

18 - 31 percent protein. They are grown for direct human consumption, are available all year

and are important sources of dietary protein, dietary fiber, iron, zinc, and calcium. Pulses have

contributed significantly to providing nutritionally balanced food, predominantly for the vegetarian

population of India. With this in mind, and as a contribution towards nutrigenomics, an attempt

was made to enrich β-carotene and folate in pulses through genomics approaches. The study

involved estimation of total carotenoids by spectrophotometry, β-carotene

quantification by HPLC, and 5-methyltetrahydrofolate (5-CH3-H4-folate) extraction by trienzyme

treatment (α-amylase, protease, folate conjugase) followed by quantification of its content by HPLC in

different pulses, namely bengal gram, red gram, green gram, black gram and soybean.

In bengal gram, the variety CO 3 was found to contain highest total carotenoids (1003

µg/100g) and β-carotene (99.4 µg/100g), whereas 5-CH3-H4-folate was found to be more in CO

4 (25.11µg/100g). Among all the red gram varieties tested, CO 7 was found to contain highest

total carotenoids (1161µg/100g), β-carotene (105.6 µg/100g) and 5-CH3-H4-folate (46.08

µg/100g). In green gram, the CO 7 variety was found to contain the highest total carotenoids

(974 µg/100g) and β-carotene (165.3 µg/100g), whereas CO 5 was found to contain highest 5-

CH3-H4-folate (46.08 µg/100g). In black gram, the variety VBN 5 showed highest total

carotenoids (661 µg/100g) and β-carotene (70.3 µg/100g), while 5-CH3-H4-folate

content was higher in the CO 6 variety (23.84 µg/100g). Among the soybean cultivars, total

carotenoids (233 µg/100g) and β-carotene (70.3 µg/100g) were highest in CO 2,

while 5-CH3-H4-folate content was more in CO 1 (46.08 µg/100g).

Several studies have demonstrated that carotenoid accumulation is mainly controlled by

the transcriptional regulation of carotenoid biosynthetic genes. For example, lycopene is cyclized by ε-

and/or β-cyclase to give rise to the yellow-orange pigments (Matthews et al., 2003). Bai et al.

(2009) suggested that higher levels of LCYB in the endosperm would result in increased β-carotene

content. The Arabidopsis aminodeoxychorismate synthase (AtADCS) gene was

introduced into tomato plants under the E8 promoter. This gene incorporation resulted in a 19-fold

enhancement in PABA. The AtADCS plants were then crossed with T1 folE gene (GCHI)

plants. The GCHI/AtADCS transgenic fruit accumulated enough folates (840 µg per 100 g) to

provide the whole recommended dietary allowance of pregnant women (Storozhenko et al., 2005).

Degenerate primers were designed for the β-carotene biosynthesis gene (Lcy-β) and

two of the folate biosynthesis genes (ADCS and DHNA). The PCR conditions were

standardized using genomic DNA from all the pulse crops tested. The expression level of the β-carotene

and folate biosynthesis genes was determined by reverse transcriptase PCR (RT-PCR)

analysis. Expression of the β-carotene biosynthesis gene lycopene β-cyclase (Lcy-β) was

similar in all pulse crops, whereas the folate biosynthesis genes

aminodeoxychorismate synthase (ADCS) and dihydroneopterin aldolase (DHNA) showed

higher expression in soybean, followed by green gram and bengal gram.
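
Degenerate primers contain IUPAC ambiguity codes so that one primer design can match sequence variants of the same gene in several species. The following Python sketch illustrates how the degeneracy of such a primer can be counted and expanded. The primer sequence is hypothetical and is not one of the primers used in this study; only the IUPAC code table is standard.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer, for illustration only (N = 4, Y = 2, R = 2 options)
example_primer = "GGNTAYCARTGG"
print(degeneracy(example_primer))  # 16
print(expand(example_primer)[:4])
```

Keeping the degeneracy low is one common design consideration, because each concrete sequence is present at only a fraction of the total primer concentration.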

References

Bai, L., Kim, E. H., DellaPenna, D., and Brutnell, T. P., 2009 Novel lycopene epsilon cyclase activities in maize

revealed through perturbation of carotenoid biosynthesis. The Plant Journal 59: 588–599.

Blancquaert, D., Storozhenko, S., Loizeau, K., De Steur, H., and De Brouwer, V., 2010 Folates and

folic acid: from fundamental research toward sustainable health. Critical Reviews in Plant

Sciences 29: 14–35.

Matthews, P. D., Luo, R., and Wurtzel, E.T., 2003 Maize phytoene desaturase and zeta

carotene desaturase catalyze a poly-Z desaturation pathway: implications for genetic

engineering of carotenoid content among cereal crops. Journal of Experimental Botany 54:

2215–2230.

Scott, J., Rebeille, F., and Fletcher, J., 2000 Folic acid and folates: the feasibility for nutritional

enhancement in plant foods. Journal of the Science of Food and Agriculture 80: 795–824.

Storozhenko, S., Ravanel, S., Zhang, G. F., Rebeille, F., Lambert, W., and Van Der Straeten, D., 2005

Folate enhancement in staple crops by metabolic engineering. Trends in Food Science &

Technology 16: 271–281.

Van den Berg, H., Faulks, R., Granado, H.F., Hirschberg, J., Olmedilla, B., Sandmann, G., Southon,

S., and Stahl, W., 2000. The potential for the improvement of carotenoid levels in foods and the

likely systemic effects. Journal of the Science of Food and Agriculture 80: 880–912.

Targeting Induced Local Lesions IN Genomes (TILLING): Application of Traditional mutagenesis for Plant Functional Genomics

Madasu Viswanath, V. Anusheela, A. John Joel and Dr. S. Ganesh Ram

Department of Plant Genetic Resources, Centre for Plant Breeding and Genetics, Tamil Nadu Agricultural University, Coimbatore.

[email protected]

Introduction

Crop improvement has a long history as key agronomic traits have been selected over

thousands of years during the domestication of crops. More recently, this progress has been

accelerated as the Green Revolution has brought about great increases in crop yields (Khush,

2001). With the advent of genomics in the last 25 years, opportunities for crop improvement

have continued to grow and may help to meet future challenges of food production and land

sustainability. Novel DNA sequence information allows the development of additional

molecular markers for breeding as well as providing targets for transgenic alteration of gene

expression and introduction of new traits. Recently, a method called Targeting Induced Local

Lesions IN Genomes (TILLING) was developed to take advantage of this new DNA sequence

information and to investigate the functions of specific genes. TILLING also shows promise as a

nontransgenic tool to improve domesticated crops by introducing and identifying novel

genetic variation in genes that affect key traits.

The TILLING method

TILLING applies advances in molecular biology and genomics to the identification of

genetic variation at the level of a single base pair. In the first step of the TILLING process, novel

single base pair changes are induced in a population of plants by treating seeds (or pollen) with

a chemical mutagen, and then advancing plants to a generation where mutations will be stably

inherited. DNA is extracted, and seeds are stored from all members of the population to create

a resource that can be accessed repeatedly over time. Establishing a good population and

preparing DNA samples from what typically constitutes thousands of plants for a TILLING

‘library’ can easily take the better part of two years for crop plants. However, chemical

mutagenesis is readily applicable to many diverse plants and animals, and so has the potential

for widespread use (Kodym & Afza, 2003). In crops that propagate vegetatively or plants with

long generation times (e.g., mint, coffee, and trees), creating such a population may be more

difficult. However, even in these cases, natural variation in crop and non-crop species can be

uncovered by ‘ecoTILLING’ (Comai et al., 2004).

For a TILLING assay, PCR primers are designed to specifically amplify a single gene target

of interest. Specificity is especially important if a target is a member of a gene family or part of

a polyploid genome. The application of bioinformatics tools can also help leverage functional

genomics data across species for investigation of targets in novel crops. Next, dye-labelled

primers are used to amplify PCR products from pooled DNA of multiple individuals. These PCR

products are denatured and reannealed to allow the formation of mismatched base pairs.

Mismatches, or heteroduplexes, represent both naturally occurring single nucleotide

polymorphisms (SNPs) (i.e., several plants from the population are likely to carry the same

polymorphism) and induced SNPs (i.e., only rare individual plants are likely to display the

mutation). After heteroduplex formation, the use of an endonuclease, Cel I, that recognizes and

cleaves mismatched DNA is the key to discovering novel SNPs within a TILLING population

(Oleykowski et al., 1998; Colbert et al., 2001). Using this approach, many thousands of plants

(or animals) can be screened to identify any individual with a single base change as well as small

insertions or deletions (1–30 bp) in any gene or specific region of the genome (Comai et al.,

2004). To increase throughput, DNA is typically pooled from 4 to 8 individuals. Genomic

fragments being assayed can range in size from 0.3 to 1.6 kb. With 8-fold pooling, 1.4

kb of usable sequence per fragment (discounting the fragment ends, where SNP detection is problematic due to

noise) and 96 lanes per assay, roughly a million base pairs of genomic

DNA can be screened in a single assay, making TILLING a high-throughput technique.
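
The throughput figure quoted above follows from a simple product of pool size, fragment length and lane count. The short Python sketch below reproduces that arithmetic. It assumes that the 1.4 kb figure is the length actually screened per fragment and that every lane carries one pool; the function name is illustrative only.

```python
def bases_screened_per_assay(pool_size, fragment_bp, lanes):
    """Base pairs of genomic DNA interrogated in one gel run.

    pool_size   -- individuals whose DNA is combined in each lane
    fragment_bp -- screened length of the amplified fragment, in bp
    lanes       -- number of lanes (pools) run per assay
    """
    return pool_size * fragment_bp * lanes


# Values taken from the text: 8-fold pooling, 1.4 kb fragments, 96 lanes
total_bp = bases_screened_per_assay(pool_size=8, fragment_bp=1400, lanes=96)
print(f"{total_bp:,} bp per assay")  # 1,075,200 bp per assay
```

With 4-fold pooling the same calculation gives half that figure, which shows the trade-off between pooling depth and throughput.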

TILLING by sequencing

TILLING constitutes a desirable alternative as a functional genomic tool and indeed its

application has now become widespread as multiple TILLING projects focused on a variety of

crops are underway in the US, Europe, Australia, and Asia. Currently, TILLING efforts employ the

CEL I-LiCor technology, or close variations. This method is time consuming and very expensive

as it employs the use of fluorescent primers for initiating the mutation screens. The fluorescent

primers vary in quality and are intrinsically less efficient at PCR than regular ones. Operating the

LiCor apparatus is labour intensive and the technology is susceptible to performance

fluctuations. Ultra-high throughput sequencing is becoming available at low cost and it is a very

attractive approach for TILLING. A new and advantageous model for a TILLING service is now

possible. Academic laboratories, public and private breeding programs add their favourite

genes to the TILLING service list. A resequencing run is then initiated on a set of genes. In

addition, genes from different organisms can be mixed and processed in one

run so that one does not have to wait for a critical number of targets to emerge in any given

crop system. Indeed, the different genes provide biological barcoding that facilitates the

analysis of massively parallel experiments. This method is particularly advantageous for pilot

testing, where fewer individuals can now be sampled at many genes.

TILLING as a functional genomics tool

Once mutations are identified by resequencing, the predicted impact on gene function is

assessed and the data are posted to the community of users. Each user then chooses which

mutations to characterize. For a gene of average composition and structure, about 5% of

mutations are predicted to result in protein truncation because they are nonsense or affect

splicing. About 50% of discovered mutations result in missense changes whose impact on

function can be estimated in silico. The rest of the mutations are silent. For uncharacterized

genes, complete knock-outs are desirable and so truncation mutations will be selected first. For

other genes, allelic series of varying severity may be desirable and so a range of missense alleles

will be selected.
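
The proportions given above can be turned into rough planning numbers. The Python sketch below estimates how many mutations of each class to expect among a set of discovered mutations, and how many mutations would need to be discovered before at least one truncation allele is likely. It assumes that mutations fall into the classes independently and at the approximate rates quoted in the text; the function names are illustrative.

```python
# Approximate proportions quoted in the text for a gene of average
# composition and structure; the silent share is taken as the remainder.
TRUNCATION = 0.05
MISSENSE = 0.50
SILENT = 1.0 - TRUNCATION - MISSENSE


def expected_classes(n_mutations):
    """Expected number of mutations in each functional class."""
    return {
        "truncation": n_mutations * TRUNCATION,
        "missense": n_mutations * MISSENSE,
        "silent": n_mutations * SILENT,
    }


def mutations_needed_for_truncation(confidence=0.95):
    """Smallest number of discovered mutations for which the chance of
    seeing at least one truncation reaches the requested confidence."""
    n = 1
    while 1 - (1 - TRUNCATION) ** n < confidence:
        n += 1
    return n


print(expected_classes(100))
print(mutations_needed_for_truncation(0.95))  # 59
```

Under these assumptions, about 59 mutations in a target would be needed for a 95% chance of recovering a knock-out allele, which is why allelic series of missense changes are often used as well.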

Arabidopsis TILLING

Since the inception of TILLING, this method has been widely used for the study of

functional genomics in plants, especially for the model plant Arabidopsis thaliana. Greene et al.

(2003) reported that the Arabidopsis TILLING Project (ATP), which was set up and introduced as

a public service for the Arabidopsis community (Till et al 2003), had detected 1,890 mutations

in 192 target gene fragments. Heterozygote mutations were detected at twice the rate of

homozygote mutations. Overall, the mutational density for treatment of Arabidopsis with

EMS was approximately 1 mutation per 300 kb of DNA screened, with these mutations distributed

throughout the genome. The numerous mutations in Arabidopsis thaliana that have been

identified via TILLING have provided an allelic series of phenotypes and genotypes to elucidate

gene and protein function throughout the genome for Arabidopsis researchers.
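
The reported figures can be related to each other with a little arithmetic. The Python sketch below computes the average number of mutations per target fragment and the amount of DNA that this implies was screened per fragment at the stated density. The final line uses a hypothetical screen (3,000 plants and a 1 kb fragment) that is not taken from the text.

```python
MUTATIONS_DETECTED = 1890   # reported for the Arabidopsis TILLING Project
FRAGMENTS_SCREENED = 192    # target gene fragments
KB_PER_MUTATION = 300       # about 1 mutation per 300 kb screened

# Average number of mutations found in each target fragment
mutations_per_fragment = MUTATIONS_DETECTED / FRAGMENTS_SCREENED

# Total DNA screened per fragment (summed over all plants) implied by the density
kb_screened_per_fragment = mutations_per_fragment * KB_PER_MUTATION


def expected_mutations(n_plants, fragment_kb, kb_per_mutation=KB_PER_MUTATION):
    """Expected number of induced mutations in one fragment across a population."""
    return n_plants * fragment_kb / kb_per_mutation


print(round(mutations_per_fragment, 1))   # 9.8
print(round(kb_screened_per_fragment))    # 2953
print(expected_mutations(3000, 1.0))      # 10.0 (hypothetical screen)
```

A denser mutation load (fewer kb per mutation) reduces the number of plants that must be screened to obtain the same number of alleles.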

TILLING in crops

Even though TILLING was originally designed for and applied mainly in Arabidopsis in the

early years of its inception, it has been demonstrated to be an extremely versatile approach

compared to many other reverse genetic approaches. This method has proven to be successful

to rapidly identify variant genotypes and determine gene function in plants that are diploid and

have relatively small genomes such as Arabidopsis. Additionally, it also can be easily applied to

other crop plants with very large genomes that are further complicated by various ploidy levels

such as wheat. Moreover, the use of chemical mutagens in diploid and polyploid plants

produces a range of mutations and a high density of mutations throughout the genome.

Obtaining an allelic series for a gene of interest greatly assists in the overall determination of

gene function by providing multiple types of phenotypic variants to be analyzed.

Maize

In 2004, maize, which is an important staple crop with a large genome, was shown to be

conducive to the TILLING method (Till et al 2004). A total of 11 genes were examined in a

population of 750 mutagenized plants and six of these 11 genes had detectable induced

mutations. One of the genes examined in this study, DMT102 (chromomethylase gene), has

been previously suggested to play a role in non-CpG DNA methylation and gene silencing in

Arabidopsis. A Maize TILLING Project established in 2005 at Purdue University has already

identified 319 mutations in 62 genes, which has greatly assisted functional genomic studies in

maize (Weil and Monde 2007).

Barley

Barley, which is also an important cereal crop with a fairly large

genome size of ~5,300 Mb, was evaluated for the ability of induced mutations to be detected

by TILLING. Two genes (Hin-a and HvFor1) were examined and 10 variants were identified, six of

which were missense mutations. Phenotyping the M3 individuals demonstrated that 20% had

visible phenotypes.

Wheat

Wheat is an extremely important agronomic staple crop with an estimated production

level of 600 million tons per year (Bagge et al 2007). A TILLING study was carried out to locate

variants in the waxy locus (granule-bound starch synthase I, GBSSI) of this polyploid crop.

Partial waxy wheat cultivars are desirable because production of amylose starch is reduced,

which leads to the production of superior flour and noodle products for human consumption.

Wheat genetics can be complicated because its genome is complex, it is an allohexaploid, and

the total genome size is quite large (17,000 Mb). A total of 246 alleles were uncovered in three

waxy gene homoeologues (Wx-A1, Wx-B1, and Wx-D1) from allohexaploid and allotetraploid

wheat via TILLING. This comprehensive allelic series provided 84 missense, three nonsense and

five splice site mutations. Phenotyping of M3 progeny demonstrated reduction of amylose

production. Detecting genetic variants via phenotyping in wheat can be difficult because

redundant copies of loci in the genome can mask expression. This study identified more

extensive allelic variation in GBSSI than was identified in any report produced in the last 25

years (Slade et al 2005).

Rice

Rice, which is also a staple and important economic crop around the world, currently

estimated to provide 80% of the caloric intake for three billion people (Storozhenko et al 2007),

has been the focus of a few TILLING studies. The rice genome has been predicted to contain

~50,000 genes, whose functions need to be determined empirically. In 2005, a report

was published on the generation of a large mutant population (60,000) using multiple

chemical mutagens on IR64, a widely grown indica rice (Wu et al 2005). This study

demonstrated that TILLING was suitable for reverse genetic studies with mutations detected in

two genes; albeit, the mutational density in the population was fairly low. In addition, extensive

phenotypic variation was assessed for the various chemical mutagens used to develop the

mutant population and albinism was a common phenotype no matter which mutagen was

applied. In a separate study, EMS and Az-MNU were used to induce an elevated mutational

density in rice, with 57 polymorphisms identified from 10 target genes by TILLING (Till et al

2007). Another report on rice TILLING published in 2007, demonstrated the efficacy of TILLING

to detect mutations by separation of products on agarose gels (Raghavan et al 2007). Results

were analogous to pooling DNA and detecting mutations on a LI-COR DNA Analyzer.

Pearl Millet

TILLING populations were developed in the inbred line “P1449-2-P1”. Assuming a

population size of 10,000 M2 lines to be ideal for mining allelic variants in drought-responsive

candidate genes, a total of 31,000 seeds of inbred line “P1449-2-P1” were mutagenized in three

different batches using 5, 7.5, 9 and 10 mM EMS, and 10,000 TILLING lines were developed. A

total of 9 pooled plates were prepared: 5 plates from the DNA isolated from the initial set of mutagenesis

experiments and 4 pooled plates from mutant lines generated during 2007.

Further, all the plate records were uploaded into LIMS (Laboratory Information Management

System) used at ICRISAT for data storage and retrieval. Thus the pools of DNA from the mutant

population will be made available to the international pearl millet community for allele mining of

candidate genes. Further, efforts are underway to mine allelic variants for drought-responsive

candidate genes (viz., DREB2A) in the mutant DNA pools prepared

<http://www.icrisat.org/bt-gene-discovery.htm>.

Soybean

Soybean contains approximately 35-50% protein and has been shown to be beneficial

for human health. It is an important economic crop that can improve soil quality by fixing

nitrogen. Four mutant populations from two genetic backgrounds (Forrest and Williams 82)

were created for soybean by treatment with EMS or NMU and evaluated for induced mutations

(Cooper et al 2008). Primers for several of the target genes initially tested amplified more than one product.

Further work was carried out to produce a single product to employ TILLING so that mutation

detection functioned optimally. A total of 116 mutations were identified via TILLING from seven

target genes. The majority of the mutations uncovered by TILLING were determined to be the

expected G/C to A/T transitions. This study demonstrated that soybean is suitable for TILLING

studies.
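
The expected mutation spectrum mentioned above can be checked programmatically once mutations have been called. The Python sketch below classifies single-base changes as transitions or transversions and flags the G/C to A/T transitions described in the text as expected. The list of calls is hypothetical and serves only to illustrate the calculation.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError("expected two different bases from A, C, G, T")
    # A transition keeps the base class (purine to purine or
    # pyrimidine to pyrimidine); anything else is a transversion.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def is_gc_to_at_transition(ref, alt):
    """True for G>A or C>T, the G/C to A/T transition seen from either strand."""
    return (ref.upper(), alt.upper()) in {("G", "A"), ("C", "T")}


# Hypothetical mutation calls (reference base, mutant base)
calls = [("G", "A"), ("C", "T"), ("A", "T"), ("G", "C")]
share = sum(is_gc_to_at_transition(r, a) for r, a in calls) / len(calls)
print(share)  # 0.5
```

A share well below what is expected for the mutagen used may indicate contamination with natural polymorphisms or errors in mutation calling.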

Conclusion

Nucleotide sequences contain hidden information about the forces for conservation and

variation that shaped their evolutionary history. It is known that detection of a gene

responsible for a mutation is a challenging process. TILLING and EcoTILLING are inexpensive

and rapid methods for detecting and genotyping induced and natural polymorphisms. They have advantages for

determining the range of variation for genetic mapping based on linkage analysis. In these

techniques, if a mutation is detected in a pool, the individual DNA samples that went into the

pool can be individually analyzed to identify the individual that carries the mutation. Once this

individual has been identified, its phenotype can be determined. These techniques work with

good results, even if a population contains pre-existing mutations that would compromise SNP

discovery by other methodologies. These approaches make it practical to handle

the large populations required for large-scale discovery of induced point mutations.

These new screening methods can be applied to several plant species, whether small or large,

diploid or allohexaploid in nature. They may serve as rapid approaches to identify the induced

and naturally occurring variation in many species. Now that successes have been reported in a

variety of important plant species, the next challenge is to use these technologies to develop

improved crop varieties.
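
The two-stage logic described above (screen pools first, then test the individuals from positive pools) can be sketched in a few lines of Python. The sample names, the mutant set and the two test functions are hypothetical stand-ins for the laboratory assays; the sketch only illustrates how pooling reduces the number of assays.

```python
def make_pools(sample_ids, pool_size=8):
    """Split sample IDs into consecutive pools of the given size."""
    return [sample_ids[i:i + pool_size]
            for i in range(0, len(sample_ids), pool_size)]


def find_carriers(pools, pool_is_positive, sample_is_positive):
    """Screen pools first, then test individuals only from positive pools.

    pool_is_positive and sample_is_positive stand in for the laboratory
    assays (hypothetical callables). Returns the carriers found and the
    total number of assays performed.
    """
    carriers, assays = [], 0
    for pool in pools:
        assays += 1                      # one assay on the pooled DNA
        if pool_is_positive(pool):
            for sample in pool:
                assays += 1              # one assay per individual in the pool
                if sample_is_positive(sample):
                    carriers.append(sample)
    return carriers, assays


# Toy population: 96 plants, two of which carry a mutation (hypothetical)
plants = [f"M2_{i:03d}" for i in range(1, 97)]
mutants = {"M2_010", "M2_075"}
pools = make_pools(plants, pool_size=8)
carriers, assays = find_carriers(
    pools,
    pool_is_positive=lambda pool: any(s in mutants for s in pool),
    sample_is_positive=lambda s: s in mutants,
)
print(carriers, assays)  # ['M2_010', 'M2_075'] 28
```

In this toy case 28 assays identify both carriers, compared with 96 assays if every plant were tested individually.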

References

Bagge, M., Xia, X., Lübberstedt, T., 2007 Functional markers in wheat. Curr Opin Plant Biol.

10(2): 211- 216.

Raghavan, C., Naredo, M. E. B., Wang, H., Atienza, G., Liu, B., Qiu, F., McNally, K. L., and Leung, H.,

2007 Rapid method for detecting SNPs on agarose gels and its

application in candidate gene mapping. Mol Breeding 19: 87–101.

Colbert, T., Till, B. J., Tompa, R., Reynolds, S., Steine, M. N., Yeung, A. T., McCallum, C. M.,

Comai, L., and Henikoff, S., 2001 High throughput screening for induced point mutations. Plant

Physiol 126: 480–484.

Comai, L., Young, K., Till, B. J., Reynolds, S. H., Greene, E. A., Codomo, C. A., Enns, L. C., Johnson,

J. E., Burtner, C., Odden, A.R., and Henikoff, S., 2004 Efficient discovery of DNA polymorphisms

in natural populations by Ecotilling. Plant J 37: 778–786

Cooper, J. L., Till, B. J., Laport, R. G., Darlow, M. C., Kleffner, J. M., Jamai, A., El-Mellouki, T., Liu,

S., Ritchie, R., Nielsen, N., et al., 2008 TILLING to detect induced mutations in soybean. BMC

Plant Biol 8:9.

Greene, E. A., Codomo, C. A., Taylor, N. E., Henikoff, J. G., Till, B. J., Reynolds, S. H., Enns, L. C., et

al., 2003 Spectrum of chemically induced mutations from a large-scale reverse-genetic screen

in Arabidopsis. Genetics. 164(2): 731-40

Wu, J. L., Wu, C., Lei, C., Baraoidan, M., Bordeos, A., et al., 2005 Chemical

and irradiation-induced mutants of indica rice IR64 for forward and reverse genetics. Plant

Molecular Biology 59: 85-97.

Khush, G.S., 2001 Challenges for meeting the global food and nutrient needs in the new

millennium. Proc Nutr Soc. 60(1): 15-26.

Kodym, A., and Afza, R., 2003 Physical and chemical mutagenesis. Methods Mol Biol 236: 189–

204.

Oleykowski, C. A., Bronson Mullins, C. R., Godwin, A. K., and Yeung, A. T., 1998 Mutation

detection using a novel plant endonuclease. Nucleic Acids Res 26: 4597–4602.

Slade, A. J., Fuerstenberg, S. I., Loeffler, D., Steine, M. N., Facciotti, D., 2005 A reverse genetic,

nontransgenic approach to wheat crop improvement by TILLING. Nat Biotechnol 23(1): 75-81.

Storozhenko, S., De Brouwer, V., Volckaert, M., Navarrete, O., Blancquaert, D., Zhang, G. F.,

Lambert, W., Van Der Straeten, D., 2007 Folate fortification of rice by metabolic engineering.

Nat Biotechnol 25(11): 1277-9.

Till, B. J., Cooper, J., Tai, T. H., Colowit, P., Greene, E. A., Henikoff, S., Comai, L., 2007 Discovery

of chemically induced mutations in rice by TILLING. BMC Plant Biol 7:19.

Weil, C. F., and Monde, R., 2007 Getting the point - mutations in maize. Crop Sci. 47: 60-67.

Genomic approaches of nitrogen and phosphorus efficient rice

Dr. K K Vinod IARI, Rice Breeding and Genetics Research Centre

Aduthurai, Tanjavur District. [email protected]

Introduction

Life in Asia depends on rice not only because it provides more than 70% of the daily

calories for the population, but also because of its important role as a source of income for

millions of rice farmers and landless workers (Dawe 2000). Global food security is at stake since

the demand for rice is exceeding production. Furthermore, the increase in rice production shows

a diminishing trend, falling from 1.6 per cent per annum in the 1990s to less than 1.0% by 2010

(FAO 2003). With the world population having surpassed the seven billion mark, population growth is the single most

serious factor driving increased demand for rice (UNFPA 2011). With increasing

urbanization, land suitable and available for agriculture is diminishing, and an increase in rice

production has to be obtained by increasing productivity, that is, increased yield per unit land.

In addition to improved and sustainable agro-management options, higher yielding and more

nutrient-efficient genotypes have to be developed in order to secure rice production.

Mineral nutrition in rice requires sixteen essential elements (De Datta 1981), of which

nitrogen (N), phosphorus (P) and potassium (K) are applied to rice fields as chemical fertilizers

in large quantities. N and P are fundamental to crop development because they form the basic

component of many organic molecules, nucleic acids, and proteins (Lea and Mifflin 2011).

Estimates suggest that, from 2008–2012, global fertilizer demand will increase by 1.7% annually

amounting to about 15 million tons, of which 69% are required in Asia alone (FAO 2008). By

2012, global demand for N fertilizer will increase by about 1.4% (7.3 M t) and by about 2% (4.2

M t) for P fertilizer. Since Asia’s share in global rice production is more than 90% (Gulati and

Narayanan 2002) a substantial proportion of the increased fertilizer demand would be utilized

for rice production (Witt et al., 2009). This is of growing concern because recent estimates (FAO

2008) indicate a declining trend in nutrient-use efficiency as a result of fertilizer consumption

exceeding grain production. In addition, fertilizer prices are increasing due to high energy costs

and because natural resources are limited and increasingly difficult to access, as will be outlined

in more detail below. Therefore, a balanced and sustainable use of fertilizers is of utmost

importance. The food crisis in 2008 (Von Braun 2008), which triggered socio-political unrest

worldwide, has given us a first glimpse of what is to come if we do not succeed in

providing sufficient food at affordable costs, long term and in a sustainable way.

Nitrogen – A diminutively plant available nutrient

In spite of being the most abundant element in the atmosphere (78%), N is one of the

most limiting nutrients in natural and agricultural ecosystems. It enters into the soil only in

minuscule amounts through natural precipitation and biological N fixation. Although significant

N reserves are present in the soil (2–20 t ha⁻¹) (Bockman et al., 1990), only a limited amount is

available to plants. Since plants require large quantities of N, greater than any other primary

nutrient, plant assimilation of soil N often exceeds the amount being replenished (Epstein and

Bloom 2005). Native sources of soil N include pre-existing inorganic and organic forms, net N

mineralization from organic matter, biological N fixation and N inputs from irrigation waters,

and atmospheric deposition (Cassman et al., 1996). The majority of plant-usable N is

taken up as nitrate (NO3−) from well-aerated soils and as ammonium (NH4+) from poorly

aerated, submerged soils (Huang et al., 2000). Although NH4+ uptake demands less energy than

NO3− uptake, only a few plant species, such as rice, are capable of growing exclusively with NH4+

(Kronzucker et al., 1999). Since rice is capable of assimilating both forms of N (Wang et al.,

1993a; Kronzucker et al., 1998) it is adapted to aerobic as well as to anaerobic growth

conditions.

In traditional rice ecosystems, low-N stress is a problem in marginal areas where no or

sub-optimal levels of N are applied (Laffite and Edmeades 1994a,b; Agrama et al., 1999)

because farmers lack the resources to purchase fertilizer and practise subsistence

farming (Ortiz-Monasterio et al., 2001). Despite the immense benefit of mineral fertilizer,

excess application causes eutrophication of freshwater, estuarine, and coastal water

ecosystems (Raven and Taylor 2003) and increased emission of greenhouse gases, such as

nitrous oxide (N2O, Matson et al., 1998). If this practice continues, nutrient

leaching and atmospheric contamination will soon become widespread also in

developing countries (Ortiz-Monasterio et al., 2001).

Furthermore, diminishing non-renewable global energy resources, such as petroleum

and natural gas, demand more efficient fertilizer use for two compelling reasons: (a)

conservation of energy, because the chemical synthesis of N fertilizers requires ammonia

produced by the Haber-Bosch process (Travis 1993), for which natural methane is the major

hydrogen source, and (b) curtailing the ever-increasing fertilizer costs.

Although N deficiency, in particular, can be managed to a certain extent by addition of

organic manures and by cultivation of fallow legumes (Balasubramanian et al., 2004), a

significant reduction in N-fertilizer application can be achieved by optimizing rate and timing of

fertilizer application to synchronize N supply and demand (Peng and Bouman 2007). To further

reduce N applications, an alternative approach is the development of varieties that use N more

efficiently, either physiologically, that is, increased carbon gain per unit plant N and time, or

agronomically, that is, greater dry matter production and protein yield per unit N (Laungani and

Knops 2009).

Crop varieties can respond to nutrient supply in four different ways (Figure 1), viz.,

efficient and inefficient under nutrient deficiency, and responder and non-responder under

nutrient sufficiency (Ortiz-Monasterio et al., 2001). Efficient genotypes are those that possess

high external (uptake) efficiency, while responders are characterised by high internal

(utilization) efficiency. Since nutrient uptake and utilization are interdependent, it is difficult to

distinguish a responder from an efficient cultivar. It is therefore important to develop efficient

selection criteria for low-nutrient tolerance, to differentiate between external and internal

efficiency (Agrama 2006) and to assess genetic variability (Broadbent et al., 1987; DeDatta and

Broadbent 1988). This necessitates a careful integration of both physiological and agronomic

evaluation of cultivars under low and high-nutrient regimes. For high-input systems, remarkable

breeding efforts have already been carried out for the selection of responder varieties with high

internal efficiency. This now has to be complemented with breeding of cultivars that are

tolerant of nutrient-deficiency and cultivars that maintain a high yield with reduced fertilizer

inputs.
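The four response classes described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the trial mean under each nutrient regime serves as the cut-off between classes. The genotype names and yield values are hypothetical and are not taken from any study cited here.

```python
from dataclasses import dataclass


@dataclass
class Genotype:
    name: str
    yield_low: float   # grain yield under nutrient deficiency (t/ha), hypothetical
    yield_high: float  # grain yield under nutrient sufficiency (t/ha), hypothetical


def classify(genotypes):
    """Assign each genotype to one of the four response classes.

    Assumption: the trial mean under each regime is the class cut-off.
    """
    mean_low = sum(g.yield_low for g in genotypes) / len(genotypes)
    mean_high = sum(g.yield_high for g in genotypes) / len(genotypes)
    classes = {}
    for g in genotypes:
        # Efficiency is judged under nutrient deficiency.
        efficiency = "efficient" if g.yield_low >= mean_low else "inefficient"
        # Responsiveness is judged under nutrient sufficiency.
        response = "responder" if g.yield_high >= mean_high else "non-responder"
        classes[g.name] = f"{efficiency} {response}"
    return classes


# Hypothetical trial with one genotype per class.
trial = [
    Genotype("A", 3.0, 7.0),
    Genotype("B", 1.5, 7.5),
    Genotype("C", 3.2, 4.5),
    Genotype("D", 1.3, 4.0),
]
result = classify(trial)
for name, label in result.items():
    print(name, label)
```

In practice the cut-off (trial mean, a check variety, or a statistical threshold) is a breeding decision; the sketch only shows the two-way logic of judging efficiency under deficiency and responsiveness under sufficiency.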

Selection in environments with low nutrient status is often plagued by problems of low

heritability and high environmental variability. Although nutrient-scarce environments are not

normally preferred by breeders, selection under such conditions will be more effective for yield

per se rather than selection for yield potential alone (Muruli and Paulsen 1981; Blum 1988).

Evaluation under field conditions is preferable over screenings in nutrient solution since the

latter cannot simulate the complex soil-plant interaction. Furthermore, in the case of N,

screening of genotypes at low-N levels would additionally avoid problems of vulnerability to

pests and diseases and lodging, which create artefacts at high N levels (Tirol-Padre et al.,

1996).

To establish selection criteria for higher uptake efficiency, it is essential to study yield

components, biomass production, as well as nutrient assimilation in relation to traits

determining uptake efficiency, that is, a large root volume, efficient nutrient absorption, and

nutrient transport (Bassirirad 2000; Wang et al., 2006). In addition, a clear understanding of

environmental parameters, such as location effect, weather, radiation etc., and their effect on

nutrient uptake in different genotypes can leverage an objective-based distinction between

uptake vs. utilization efficiency in the target environment. This is particularly important

because high nutrient-uptake efficiency without repletion of nutrients might accelerate nutrient

depletion, while low-nutrient efficient genotypes may lead to increased nutrient leaching or

volatilization under high nutrient situations (Ortiz-Monasterio et al., 2001).

Figure 1. Crop response to nutrient supply

[Figure 1 axes: yield under nutrient sufficiency and yield under nutrient deficiency. Classes shown: efficient responder, inefficient responder, efficient non-responder, inefficient non-responder. A further label reads "Nutrient use".]

Genes and pathways of nitrogen uptake and metabolism

In rice, excessive N stimulates shoot growth, inhibits root growth, and delays flowering and

senescence (Bernier et al., 1993; Stitt 1999), while deficiency results in stunted growth,

chlorosis, poor yield, and anthocyanin pigmentation due to carbohydrate accumulation (Martin

et al., 2002). Under low N conditions, rice plants attempt to acquire more N by increasing the

root surface area, which increases the root to shoot ratio (Marschner et al., 1986) to a varying

degree depending on the phenological stage (Sheehy et al., 1998). Rice genotypes show

significant variability for N uptake (external efficiency) and utilization (internal efficiency) with

yield being predominantly determined by the uptake process, particularly under low N

conditions (Singh et al., 1998; Witcombe et al., 2008). The most N-efficient rice genotypes are

those capable of accumulating N in the first 35 days after transplanting (Peng et al., 1994).

External efficiency declines as the crop progresses to maturity with reduction in daily uptake of

N towards terminal stages due to increasingly inefficient roots (Sheehy et al., 1998) and internal

N recycling from senescing tissues to the developing panicle (Mae and Ohira 1981). Under low-

N supply, the internal recycling accounts for 70–90% of the total panicle N (Mae 1997; Tabuchi

et al., 2007). On the other hand, N concentration in the straw at crop maturity is not

significantly affected by changes in N supply at terminal stages (Tirol-Padre et al., 1996), hence,

N uptake prior to panicle initiation is critical in building up the internal N reservoir (Figure 2).

In young rice plants, roots play a significant role in N absorption with root density and

distribution in the soil as the major determinants. Although there are many studies related to

variation in nitrogen (N) uptake in rice, there seems to be little information on differences in

root morphology that may contribute to this variation. A recent study on the N-use efficiency of

two rice cultivars by Fan et al. (2010) indicates a significant influence of root morphological

parameters and root physiological characteristics at different growth stages. The uptake

process and N homeostasis are complex processes that involve recycling of N (especially amino

acids) from shoots to roots via the phloem, and from roots to shoots via the xylem (Imsande

and Touraine 1994; Marschner et al., 1997).

Figure 2. Field management and plant-related traits relevant for improved nitrogen and phosphorus efficiency

[Figure 2 labels: deep roots for access to water; terminal nutrient recycling; basal fertilizer (N+P); external nutrient-use efficiency; nutrient uptake; early root vigor with large shallow root system for P uptake; internal nutrient-use efficiency; translocation potential; alternative pathways; novel alleles; weed management; top-dress fertilizer (N); nutrient transfer from senescing roots; tolerance to abiotic/biotic stresses; high-affinity nutrient uptake; mobilization and foraging of P; mycorrhizae; high grain N; low grain P; low phytates; early seedling vigour; functional stay-green.]

This is under complex genetic control and the molecular genetics of N utilization is well

studied in model species such as Arabidopsis. Many gene families, including NO3 and NH4+

transporters and primary assimilation genes, amino acid transporters, as well as transcription

factors, and other regulatory genes have been identified by different approaches. With the

identification of orthologous genes from rice, opportunities are now emerging of utilising these

genes in marker-assisted breeding for N-efficiency (for reviews, see Li et al., 2009b; Kant et al.,

2011). N uptake in roots is mainly regulated by a high-affinity transport system (HATS) that

operates at N levels below 1 mM and a low-affinity transport system (LATS) which

functions at higher N concentrations (Forde and Clarkson 1999; Glass et al., 2001; Williams and

Miller 2001). It has been shown that the low-affinity NO3 transporter OsNRT1 contributes

predominantly to N uptake in the root epidermis and in root hairs (Lin et al., 2000) acting in

conjunction with the high affinity and NO3-inducible transporters NRT2 and NAR2 (Cai et al.,

2008). Recent data provided further insight into the complexity of N uptake showing that the

mRNA of OsNRT2.3 is alternatively spliced (OsNRT2.3a/b) during the uptake process, and that

OsNAR2.1 interacts with two other NRT proteins (Yan et al., 2011). Furthermore, it was shown

that most transporter genes are up-regulated by NO3 and suppressed by NH4+, with an

exception of OsNRT2.3b being insensitive to NH4+ (Feng et al., 2011). Similar to Arabidopsis,

high-affinity rice NH4+ transporters are encoded by members of the AMT1 and AMT2 gene

families (Gazzarini et al., 1999; Howitt and Udvardi 2000; Loqué and von Wirén 2004). So far,

twelve AMT genes have been identified located on different chromosomes and grouped into

five sub-families (AMT1 – AMT5) with one to three gene members (Li et al., 2009b). Gene

expression analyses showed that OsAMT1;1 is constitutively expressed in shoots and roots

(Ding et al., 2011). In contrast, OsAMT1;2 shows root-specific expression and is NH4+ inducible,

whereas OsAMT1;3 is root-specific and N-suppressible. In addition, a high-affinity urea

transporter (OsDUR3) encoding an integral membrane protein that is up-regulated under

N-deficiency has recently been identified in rice roots (Wang et al., 2012). OsDUR3

over-expression improved growth on low urea and this gene might therefore play a significant role in

N uptake in rice. Urea is converted to NH4+ and carbon dioxide in the presence of urease,

which requires nickel (Ni) as co-factor. Interestingly, application of Ni has a marked effect on

plant growth in rice when urea was provided as the sole N source (Gerendas et al., 1998).

Further, an early nodulin gene (OsENOD93-1) with potential function in amino acid

accumulation and transport has been identified in rice (Bi et al., 2009).
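The division of labour between HATS and LATS described above can be visualised with a simple kinetic sketch. It assumes a saturable Michaelis-Menten term for HATS and a linear, non-saturating term for LATS. This is a common textbook simplification, not a model taken from the studies cited here, and all parameter values (vmax, km, lats_slope) are hypothetical.

```python
def uptake_components(conc_mM, vmax=1.0, km=0.05, lats_slope=0.2):
    """Return (HATS, LATS) contributions to uptake at an external N concentration.

    Assumptions (illustrative only):
      HATS follows saturable Michaelis-Menten kinetics: vmax * C / (km + C)
      LATS increases linearly with concentration:       lats_slope * C
    Rates are in arbitrary relative units.
    """
    if conc_mM < 0:
        raise ValueError("concentration must be non-negative")
    hats = vmax * conc_mM / (km + conc_mM)
    lats = lats_slope * conc_mM
    return hats, lats


# Compare the two systems below and above the 1 mM range mentioned in the text.
for conc in (0.1, 1.0, 10.0):
    hats, lats = uptake_components(conc)
    dominant = "HATS" if hats > lats else "LATS"
    print(f"{conc:5.1f} mM  HATS={hats:.3f}  LATS={lats:.3f}  dominant: {dominant}")
```

With these assumed parameters the saturable term dominates at low concentrations and the linear term takes over at high concentrations; the exact crossover point depends entirely on the chosen values.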

To enable plants to accumulate sufficient internal N, it is critically important to apply N

fertilizer at the right developmental stages. In irrigated rice fields, N fertilizer is usually applied

in three splits, that is, at transplanting (basal), at maximum tillering, and at panicle initiation.

This practice is recommended to rice farmers (Williams et al., 2010) and is now widely adopted.

Urea is the major source of N in rice ecosystems since it is rapidly converted to NH4+ or NO3 by

soil micro-organisms. In cereals, depending on the genotype and environmental conditions, the

root or shoot can be the main site for NO3 assimilation, with root assimilation dominating at

soil-nitrate concentrations below 1 mM and shoot assimilation at concentrations above 1 mM

(Andrews 1986; Andrews et al., 1995). Upon absorption, NO3 is reduced to nitrite by nitrate

reductase (NR) and subsequently to NH4+ by nitrite reductase (NiR). NH4+ from both the

nitrate reduction pathway and direct absorption is subsequently incorporated into amino acids

through synthesis of glutamine and glutamate (Crawford and Glass 1998; Meyer and Stitt 2001;

Campbell 2002) primarily in the chloroplasts and plastids via the glutamine synthetase (GS)/

glutamate synthase (GOGAT) cycle (Tobin and Yamaya 2001; Andrews et al., 2004) and

alternatively via pathways involving glutamate dehydrogenase (GDH) and asparagine

synthetase (Hirel and Lea 2002; Dubois et al., 2003). Although roots have high constitutive

levels of GS and GOGAT, both enzymes are also induced by NH4+ (Ishiyama et al., 2003). For the

conversion of glutamine to glutamate by GOGAT, either ferredoxin (Fd-GOGAT) or NADH

(NADH-GOGAT) is used. For GS, two major forms are known, that is cytosolic GS (GS1),

expressed in roots and shoots, and plastidic GS (GS2), expressed in chloroplasts and plastids.

GS1 constitutes a complex gene family of three to six genes (Hirel and Lea 2002).
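The assimilation steps described above can be summarised as a small lookup structure. This is a minimal sketch that lists only the enzyme, main substrate, and main product of each step; co-substrates and cellular compartments are deliberately omitted, and the function name is hypothetical.

```python
# Simplified steps of primary N assimilation as (enzyme, substrate, product).
# Co-substrates (e.g. glutamate for GS, 2-oxoglutarate for GOGAT) are omitted.
ASSIMILATION_STEPS = [
    ("NR", "NO3", "NO2"),                # nitrate reductase: nitrate -> nitrite
    ("NiR", "NO2", "NH4+"),              # nitrite reductase: nitrite -> ammonium
    ("GS", "NH4+", "glutamine"),         # glutamine synthetase
    ("GOGAT", "glutamine", "glutamate"), # glutamate synthase
]


def trace_assimilation(start):
    """Follow the steps from `start` and return a list of (enzyme, product)."""
    path = []
    current = start
    for enzyme, substrate, product in ASSIMILATION_STEPS:
        if substrate == current:
            path.append((enzyme, product))
            current = product
    return path


# Nitrate passes through all four steps; directly absorbed ammonium
# enters at the GS step.
from_nitrate = trace_assimilation("NO3")
from_ammonium = trace_assimilation("NH4+")
print(from_nitrate)
print(from_ammonium)
```

The sketch makes explicit that NH4+ from nitrate reduction and NH4+ from direct absorption converge on the same GS/GOGAT route.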

Remobilization of internal N during grain filling is another key process in N metabolism.

Among the primary N assimilation genes (for review see Lea and Mifflin 2011), physiological

and biochemical evidence indicates that GS1 plays a major role in the synthesis of glutamine in

older leaves which is transported to the growing tissues (Kamachi et al., 1991; Habash et al.,

2001; Masclaux et al., 2001), a process positively related to yield and N use efficiency. The GS1

genes from rice, OsGS1;1 and OsGS1;2, are expressed in all organs but with higher expression in leaf

blades and roots, respectively, while OsGS1;3 is found specifically in the spikelet (Tabuchi et al.,

2005). Prior to senescence, GS1 activity is reduced which leads to rapid accumulation of NH4+

and senescence since accumulated NH4+ is highly toxic to plant cells (Chen and Kao 1996). In

addition, it was shown that a knock-out mutation in OsGS1;1 resulted in reduced plant growth

and poor yield (Tabuchi et al., 2005), whereas over-expression of the GS2 gene increased yield

in wheat and Arabidopsis (Habash et al., 2001; Martin et al., 2006), and enhanced photo-

respiratory capacity and salt tolerance in rice (Hoshida et al., 2000). Re-assimilation of

glutamine to glutamate by GOGAT is tissue-specific with Fd-GOGAT predominantly active in

photosynthetic tissues (Lea 1997) whereas NADH-GOGAT has a significant role in developing

sink organs in the reutilization of glutamine (Tabuchi et al., 2007). Over-expression of NADH-

GOGAT increased panicle weight in rice in agreement with its important role in transporting

glutamate to major sink tissues during grain filling (Yamaya et al., 1992), and therefore is

considered a key enzyme in N utilization and grain filling in rice (Andrews et al., 2004). Two

genes of NADH-GOGAT have been reported in rice; OsNADH-GOGAT1 expressed in growing

tissues such as root tips, young spikelets and developing leaf blades and OsNADH-GOGAT2

mainly expressed in mature leaves and leaf sheaths, of which NADH-GOGAT1 is important in N

remobilization (Tabuchi et al., 2007).

Gene-expression profiling of seedlings from the indica-type variety Minghui 63 grown in

N-deficient hydroponics solution (Lian et al., 2006) revealed rapid down-regulation of genes

involved in photosynthesis and energy metabolism in response to N stress. Interestingly, genes

involved in early responses to biotic and abiotic stresses, as well as transcription factors and

other regulatory genes were responsive to N, especially in roots. In contrast, genes known to be

involved in N uptake and assimilation showed little or no response to low N. However, since this

study was conducted in hydroponics with very young seedlings, the data might not represent

the actual situation in soil or the field.

Grain development in rice depends on the establishment and maintenance of a

photosynthetically active canopy which acts as a major N storage before internal N is

translocated to the panicle. Since this process occurs at the expense of the photosynthetic

machinery, canopy longevity and maintenance of the photosynthetic capacity are vital for

continuous remobilization of N and starch accumulation (Hawkesford and Howarth 2011). In

this context the functional stay-green (FSG) trait is of interest since it delays leaf senescence

and sustains photosynthesis in maturing plants which leads to increased biomass production

and grain yield (Fu and Lee 2008). Recently, quantitative trait loci (QTLs) have been mapped for

FSG (Yoo et al., 2007; Fu et al., 2011), that confer yield advantage and hold promise for marker-

assisted introgression of the stay-green trait into rice varieties.

Phosphorus - Declining natural reserves and accessibility

In contrast to nitrogen, P is a non-renewable natural resource and concerns are rising

that rock phosphate reserves are limited. A recent study conducted by the International

Fertilizer Development Centre (IFDC) concluded that currently known and easily accessible

world rock phosphate reserves will last approximately another 300-400 years (van

Kauwenbergh 2010). Investments are therefore made to discover new rock phosphate reserves

and to develop alternative technologies to isolate P from e.g., marine sediments and faeces

(see below). Another concern is the unequal geographical distribution of P reserves with the

vast majority of phosphate rock located in Morocco, followed by the USA and China (van

Kauwenbergh 2010). A dependence of the world’s food production on few countries is

problematic and policies should be put in place in time to ensure equal access to P reserves in

the future. In fact, China is already imposing seasonal export taxes to secure national supply of

P fertilizer (http://agfax.com/2011/11/18/chinese-government-imposes-seasonal-export-

taxes).

Whereas P fertilizer is applied in excess in the western world and some Asian countries

(e.g., China, Japan, Korea; MacDonald et al., 2011), P deficiency is a major problem in many

Asian and African countries, as well as in South America. The two main reasons for this are (i)

insufficient application of P in the form of P-fertilizer or manure and (ii) P-fixing soil properties

which render P unavailable to plants even if present in sometimes large amounts. P

imbalances in the world with too much in some countries and too little or inaccessible P in

others (MacDonald et al., 2011) will require different measures and breeding strategies.

In modern agriculture, inorganic P fertilizer has widely replaced the application of

manure. A large quantity of P present in animal and human faeces is therefore removed from

the nutrient cycle and ends up in waterways where it is no longer available and difficult to

recycle (Cordell et al., 2009). In realization of this, technologies are being developed in Europe

to extract P from sewage in urban centres to produce struvite (ammonium magnesium

phosphate) that can be used as P and N fertilizer (http://www.ceep-

phosphates.org/Files/Newsletter/scope50.pdf). In light of decreasing rock phosphate reserves,

these efforts are extremely important and, once applied at a large scale, this technology will

become more competitive in terms of production costs.

The high concentration of P in human and animal faeces is due to the consumption of

phytic acid (PA, inositol hexaphosphate) which is the major (50-80%) storage form of P and

present in large quantities in cereal grains, legumes, soybean, and other plants (for a review

see Lott et al., 2000). PA is not digestible by humans and animals and additionally binds iron,

zinc, and other minerals thereby reducing their bioavailability. Research is therefore on-going

aiming at the reduction of PA in crops (for review see Raboy 2009). Apart from its role as an

“anti-nutrient”, PA is the main source of P drainage from fields. Estimates suggest that 50-80%

of the P in cereal grains and legume seeds is stored as PA (Lott et al., 2009). Studies on low PA

(lpa) mutants in rice (Larson et al., 2000; Liu et al., 2007) and barley (Bregitzer and Raboy 2006),

however, showed that total grain P was generally higher in lpa mutants compared with wild

type and this approach is therefore not directly applicable to reduce P drainage.

For soils naturally low in P or depleted of P due to continuous cropping without

repletion of P (and other nutrients), fertilizer or manure application is inevitable to maintain

productivity and prevent soil degradation. However, continuous cropping of poor soil is often

related to poverty and breeding of efficient crops, therefore, has to be complemented with

policy measures providing poor farmers with agricultural inputs. With regard to breeding for

poor soils, crops with high P-uptake and high internal P-use efficiency need to be developed to

maximize yield in such low-input systems. Besides, a combination of both, uptake and internal,

efficiency is equally desirable for high-input systems since it would facilitate reduction of

fertilizer doses without yield penalty. In rice, the P fertilizer-use efficiency is about 25%

(Dobermann and Fairhurst 2000), providing a considerable scope for improvement.
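A short worked example shows what an efficiency of about 25% implies. It reads the figure as the share of applied fertilizer P that the crop takes up, which is an assumed interpretation, and the application rate used is hypothetical.

```python
def recovered_p(applied_kg_per_ha, efficiency=0.25):
    """Return fertilizer P taken up by the crop (kg/ha).

    Assumption: `efficiency` is the fraction of applied P recovered by the
    crop; 0.25 corresponds to the approximate value quoted in the text.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return applied_kg_per_ha * efficiency


applied = 20.0                       # hypothetical application rate (kg P/ha)
taken_up = recovered_p(applied)      # P recovered by the crop
not_recovered = applied - taken_up   # P remaining in the soil or lost

print(f"applied: {applied} kg/ha, taken up: {taken_up} kg/ha, "
      f"not recovered: {not_recovered} kg/ha")
```

Under this reading, three quarters of the applied P is not recovered by the crop, which is the scope for improvement referred to above.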

For areas with P fixing soils, high fertilizer applications are currently necessary to

provide sufficient plant-available P. Soils with P-fixing properties are wide-spread in the Asia

Pacific and occur on 5% of the total land area (Bot et al., 2000). In China, Indonesia, Japan,

Thailand and Vietnam, P-fixing soils cover 9-15% of the total land area. These numbers are even

higher in Laos (24%) and Myanmar (16%). In Africa, P fixing soils are especially widespread in

Burundi, Congo, Liberia, Swaziland, and Rwanda (16-29%). Similar numbers are reported from

South America, where P fixation occurs on 14-25% of the total land area in e.g., Brazil,

Colombia, Venezuela, Peru, and Ecuador. In French Guyana, 79% of the total land area has P

fixing properties (Bot et al., 2000). The development of crops that can access P-reserves in

these soils and that are highly efficient in P-fertilizer uptake should therefore be a global

breeding priority. In addition, it is critically important to develop crops with tolerance of

multiple stresses because P deficiency is often a secondary effect in soils with high

concentrations of iron and aluminium, and low pH, which restricts root growth even if P is

available (Ismail et al., 2007).

Genes and pathways of Phosphorus uptake and metabolism

In irrigated rice systems, P fertilizer is generally applied only at the beginning of the

season (basal) whereas N is applied in three splits (basal/after transplanting, maximum tillering,

panicle initiation/booting). This is common practice in Asia and Africa (Haefele et al., 2005;

Hossain et al., 2005) indicating that P acquisition and requirement are highest during early

growth stages. Under P-deficient conditions, plant development is delayed and P-deficiency

symptoms (reduced tillering, darker leaves due to anthocyanin accumulation) are easily

recognized and cause significant yield losses (Dobermann and Fairhurst 2000). Another

important factor is that P, in contrast to N and K, is not transported with the soil solution (mass

flow) but mainly by diffusion (Schachtman et al., 1998). A large root surface area is therefore

particularly important for P uptake since plants gain access to a larger soil area and thereby to P

(Lynch 2007). In agreement with that, induction of root growth under P-deficiency has been

described in many species (Hernández et al., 2007), though it should be noted that root growth

under –P conditions is generally reduced compared with plants grown under +P conditions

(Chin et al., 2010; Li et al., 2009a).

For yield stability under P-deficient conditions, long root hairs and highly branched root

systems, especially in the top soil where P is mainly located, are considered beneficial

(Ramaekers et al., 2010). In agreement with this, the Pup1 major QTL for tolerance of P-

deficiency is an enhancer of root growth rather than a regulator of specific P-uptake

mechanisms (Heuer et al., unpublished). The same was observed in the P-deficiency tolerant

maize mutant line (99038; Li et al., 2008) which likewise developed a larger root system.

P is transported into the plant by P transporters located in the root plasma membrane.

In the rice reference genome, thirteen P transporter genes are present (Paszkowski et al.,

2002). Two of these transporters have been functionally characterized revealing that OsPT2

encodes a low affinity and OsPT6 a high affinity transporter (Ai et al., 2009). High affinity

transporters are generally induced under low P conditions (Paszkowski et al., 2002) and are

therefore considered more important for P uptake under field conditions since the soil P

concentration is about 1000 times lower compared with intracellular concentrations

(Schachtman et al., 1998). In agreement with this OsPT6, but not OsPT2, was shown to be

expressed in the root epidermis (Ai et al., 2009). In another study (Jia et al., 2011), the effect of

the rice P transporter OsPht1;8 was analyzed by over-expression and RNAi. The authors showed

that P uptake was enhanced and decreased according to the expectation, however, both

approaches led to a significant reduction in the number and size of panicles as well as to >80%

spikelet sterility. In another study on rice, transgenic plants over-expressing the tobacco

transporter NtPT1 were generated but, although some lines outperformed the controls, on

average transgenic lines yielded less (Park et al., 2010). Likewise, it was shown in barley that

over-expression of the transporter gene HORvu;Pht1;1 did not increase P uptake

(Rae et al., 2004). Taken together, the data suggest that over-expression of high-affinity

transporters alone is not sufficient to improve P efficiency. In addition, because P is rapidly

depleted at the soil-root interface, it will be critically important to combine high-affinity

uptake with increased access and mobilization of P, e.g., a large root-surface area, exudation of

organic acids, acid phosphatases, and phytases (for review see Gahoonia and Nielson 2004).

For an extended root system and thereby better access to P, mycorrhiza are generally

considered important and the positive effect of arbuscular mycorrhiza (AM) symbiosis on

nutrient uptake, especially P and N, has been demonstrated in several crops (for review see

Karandashov and Bucher 2005; Sawers et al., 2008). In rice, surprisingly few studies on the

effect of AM are available since research focussed on intensive rice systems which are flooded

and anaerobic conditions are generally considered unfavourable for fungi. However, using the

irrigated rice variety Nipponbare, it was shown that the P transporter OsPT11 is specifically

induced in roots colonized with AM (Paszkowski et al., 2002). A study from Japan additionally

showed that AM inoculation of Nipponbare seedlings significantly increased yield in flooded

fields and that the fungi survived under these conditions (Solaiman and Hirata 1997). The latter

was also found in a study by Hajiboland et al. (2009) that additionally reported a significant

growth advantage of plants colonized with mycorrhiza. Surprisingly, this was observed only

under flooded conditions, whereas growth was severely inhibited under aerobic conditions.

Growth depression due to mycorrhiza colonization has also been reported from wheat,

especially under P deficient conditions (Li et al., 2005). However, another study on rice that

analysed the effect of mycorrhiza in four different upland crop-rotation systems showed a

significant increase in P uptake and grain yield in the system with the highest concentration of

mycorrhiza (Maiti et al., 2012). These data are encouraging and more detailed studies should be

conducted to assess the potential of mycorrhiza for enhancement of P uptake in irrigated and

rain-fed rice systems, and to assess genetic diversity for mycorrhiza interaction (Figure 2).

In contrast to enhanced P uptake, improvement of internal P-use efficiency would target

genes/pathways that enable plants to maintain cellular processes and productivity under low

P conditions. Since P is an indispensable component of virtually all cellular functions and

required in large amounts for e.g., ATP, NADPH, nucleic acids, and phosphoproteins, it can be

expected that modification of P-related pathways will affect the whole plant. This is well

reflected in microarray gene expression studies in rice (Wasaki et al., 2003; Pariasca-Tanaka et al.,

2009) and Arabidopsis (Wu et al., 2003; Morcuende et al., 2007) showing that, depending on

the study, between 220 and 5800 genes were responsive to P. For breeding applications, it will

be critically important to identify genes that act upstream of this systemic response in order to

reduce complexity and the number of genes/QTLs needed to modify internal P-use efficiency.

The regulatory network of P homeostasis has been well studied in Arabidopsis and for

some genes rice orthologs have been identified. The MYB-type transcription factor AtPHR1

(Rubio et al., 2001) and its rice ortholog OsPHR2 (Zhou et al., 2008) act as positive regulators of

P-transporters and other P-responsive genes, whereas other genes act as suppressors of P

starvation genes, e.g., the ubiquitin conjugating enzyme AtPHO2 and its rice

ortholog OsLTN1 (Bari et al., 2006; Aung et al., 2006; Hu et al., 2011). Expression of PHO2 is

negatively regulated by a micro RNA (miR399) which itself is “sequestered” by target mimicry of

the IPS1 gene (Bari et al., 2006; Aung et al., 2006; Franco-Zorrilla et al., 2007). Two recent

reviews provide an excellent and comprehensive overview over the genes involved and their

interaction (Nilsson et al., 2010; Hammond and White 2011).
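The regulatory relationships described above can be written as a small signed network, in which the net effect of one component on another is the product of the signs along the path. This is a minimal sketch that uses only the interactions named in the text; the node labels and the function name are chosen here for illustration.

```python
# Signed regulatory edges: +1 = positive regulation, -1 = repression/sequestration.
EDGES = {
    ("IPS1", "miR399"): -1,               # IPS1 sequesters miR399 (target mimicry)
    ("miR399", "PHO2"): -1,               # miR399 negatively regulates PHO2
    ("PHO2", "P-starvation genes"): -1,   # PHO2 suppresses P-starvation genes
    ("PHR", "P-starvation genes"): +1,    # AtPHR1/OsPHR2 act as positive regulators
}


def net_effect(path):
    """Return +1 or -1 for the combined effect along a path of nodes."""
    sign = 1
    for source, target in zip(path, path[1:]):
        sign *= EDGES[(source, target)]
    return sign


mir399_effect = net_effect(["miR399", "PHO2", "P-starvation genes"])
ips1_effect = net_effect(["IPS1", "miR399", "PHO2", "P-starvation genes"])
print("miR399 on P-starvation genes:", mir399_effect)
print("IPS1 on P-starvation genes:", ips1_effect)
```

Two successive repressions give a net positive effect of miR399 on the P-starvation genes, while adding the IPS1 step reverses the sign. The sketch ignores the strength and timing of each interaction.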

Alternative pathways have been described that are up-regulated under P-starvation

utilizing PPi rather than ATP (for reviews see Hammond et al., 2004). This includes UDP-glucose

pyrophosphorylase (PPi-dependent conversion of glucose to hexose-P) and

phosphofructokinase (PPi-dependent phosphorylation of Fructose-6-P). Other adaptive

processes include the substitution of phospholipids with galacto- and sulpholipids, and

up-regulation of ribonucleases to mobilize P from nucleic acids (for review see Hammond et al.,

2004). Many other genes and pathways are altered under low P conditions, which is not

surprising in light of the central role of P in living cells. The challenge will be to identify genes

that enhance internal P-use efficiency without causing an imbalance in P homeostasis and

negatively affecting plant development.

Interestingly, it has recently been shown in Arabidopsis that down-regulation of the

PHO1 gene (an SPX protein) conferred tolerance via suppression of the P starvation response

(Rouached et al., 2011). A similar observation has been made in tolerant Pup1 rice plants which

did not differentially express P starvation genes in comparison with non-Pup1 controls

(Pariasca-Tanaka et al., 2009; Heuer et al., unpublished). In this context it is important to note

that most studies on P-starvation responses in rice were conducted in the intolerant variety

Nipponbare. The identified genes and pathways, therefore, represent the intolerant response.

In agreement with this, genes not previously associated with P-starvation tolerance have been

identified in the Pup1 donor variety Kasalath (Heuer et al., unpublished) as well as in a QTL-

mapping study using a tolerant Arabidopsis accession (Reymond et al., 2006). It therefore seems

important to further explore genetic diversity, in Arabidopsis as well as in rice, and to identify

additional tolerant genotypes in order to gain access to large-effect genes and QTLs.

Quantitative trait loci for molecular breeding

Molecular breeding now provides a real opportunity to develop varieties with multiple

tolerance traits - provided that large-effect QTLs/genes are available. The number of reported

QTLs is steadily increasing, but still very few are applied in breeding programs. This is largely

due to lack of data that validate QTLs/tolerance genes in different genetic backgrounds and

environments (i.e., in field trials), which is a prerequisite for a large-scale application of QTLs.

In rice, molecular marker-assisted breeding is at an advanced stage for a few large-effect

QTLs that confer tolerance of submergence (Sub1; Singh et al., 2010; Septiningsih et al., 2009),

drought (e.g., qtl12.1; Bernier et al., 2009a,b; Swamy et al., 2011), salinity (SalTol; Thomson et

al., 2010), and P deficiency tolerance (Chin et al., 2011). The latter QTL, Phosphorus uptake 1

(Pup1), was identified more than ten years ago (Wissuwa et al., 2002) and is, to our

knowledge, currently the only P-related QTL for which molecular markers are available and

which has been evaluated in different genetic backgrounds under field conditions (Chin et al.,

2010; Chin et al., 2011; Heuer, unpublished data). Additional P-related QTLs with smaller effect

have been identified and are summarized in Table 1.

Table 1: Quantitative trait loci for phosphorus-deficiency tolerance in rice

Traits                                   Population  Cross                         No. of MQTL  No. of EQTL  Reference
PUP, PDW, TN, PUE                        NIL         Nipponbare / Kasalath         8            -            Wissuwa et al., 1998
RTA, RSDW, RRDW                          RIL         IR20 / IR55178                4            -            Ni et al., 1998
RE, SDW, RPC, RIC                        F8          Gimbozu / Kasalath            6            -            Shimizu et al., 2004
REP                                      CSSL        Nipponbare / Kasalath CSSL29  1            -            Shimizu et al., 2008
PH, MRL, RN, RV, RFW, RDW, SDW, TDW, RS  ILs         Yuefa / IRAT109               24           29           Li et al., 2009a

PUP, phosphorus uptake; PDW, plant dry weight; TN, tiller number; PUE, phosphorus use

efficiency; RTA, relative tillering ability; RSDW, relative shoot dry weight; RRDW, relative root

dry weight; RE, root elongation; RPC, relative phosphorus content; RIC, relative Fe

content; REP, root elongation under phosphorus deficiency; PH, plant height; MRL, maximum

root length; RN, root number; RV, root volume; RFW, root fresh weight; RDW, root dry weight;

SDW, shoot dry weight; TDW, total dry weight; RS, root-shoot dry weight ratio.
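
The counts in Table 1 can also be tallied programmatically, for example to see how many main-effect and epistatic QTLs were reported and which studies tested for epistasis. The following is a minimal Python sketch only: the record layout and the names QtlStudy, TABLE1, total_qtls and studies_with_epistasis are illustrative assumptions, the numbers are simply transcribed from Table 1, and a dash in the table is read here as "none reported".

```python
# Illustrative sketch: QTL counts transcribed from Table 1.
# All class, field and function names are assumptions made for this example.
from typing import List, NamedTuple, Optional, Tuple


class QtlStudy(NamedTuple):
    reference: str
    population: str
    mqtl: int              # number of main-effect QTLs
    eqtl: Optional[int]    # number of epistatic QTLs; None where Table 1 shows "-"


TABLE1 = [
    QtlStudy("Wissuwa et al., 1998", "NIL", 8, None),
    QtlStudy("Ni et al., 1998", "RIL", 4, None),
    QtlStudy("Shimizu et al., 2004", "F8", 6, None),
    QtlStudy("Shimizu et al., 2008", "CSSL", 1, None),
    QtlStudy("Li et al., 2009a", "ILs", 24, 29),
]


def total_qtls(studies: List[QtlStudy]) -> Tuple[int, int]:
    """Return (reported main-effect QTLs, reported epistatic QTLs) summed over studies."""
    mqtl = sum(s.mqtl for s in studies)
    eqtl = sum(s.eqtl for s in studies if s.eqtl is not None)
    return mqtl, eqtl


def studies_with_epistasis(studies: List[QtlStudy]) -> List[str]:
    """Return the references of studies that report epistatic QTLs."""
    return [s.reference for s in studies if s.eqtl is not None]


if __name__ == "__main__":
    print(total_qtls(TABLE1))
    print(studies_with_epistasis(TABLE1))
```

Such totals count reports, not distinct loci: the same region can be detected in more than one study, as is the case for the QTL on chromosome 6 discussed below.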

A QTL on chromosome 6 was mapped in two independent studies (Wissuwa et al., 1998;

Ni et al., 1998) but has, compared with Pup1, a smaller effect. However, this QTL has gained

importance after it was shown that a cluster of P-responsive genes is located in this region

(Heuer et al., 2009). Among these is the transcription factor gene OsPTF1 which was shown to

confer tolerance of P deficiency (Yi et al., 2005). Within the larger-effect QTL Pup1, no P-

responsive gene has been identified and, in agreement with that, Pup1-based tolerance does

not seem to employ currently known P-starvation response pathways as indicated by two

independent gene array analyses (Pariasca-Tanaka et al., 2009; Heuer et al., unpublished).

Interestingly, the region on chromosome 12, where Pup1 is located, has been associated with

tolerance to several biotic stresses (Ramalingam et al., 2003; Li et al., 2006), as well as to

drought (Babu et al., 2003; Bernier et al., 2007), aluminium toxicity (Wu et al., 2000) and cold

(Andaya and Mackill 2003). Similarly, in tropical maize, drought-tolerant selections were also

found to tolerate low N (Bänziger et al., 2002). Whether the associations on chromosome 12 are a direct effect of Pup1 remains

to be shown.

Figure 3. Future approaches in breeding nutrient-efficient varieties essentially require integration of classical and modern tools

[Figure 3 diagram labels: Mutant screens; Trait discovery; Germplasm screening; QTL mapping + validation; Genes in QTLs; Low-nutrient tolerant varieties; Marker-assisted breeding; Transgenics; Genes and markers; Conventional breeding; Novel alleles; Transcript profiling; Reverse genetics; Genome sequencing; Genes with known function; Forward genetics]

With respect to N, several genomic regions associated with N use and response have

been mapped in rice (Fang and Wu 2001; Ishimaru et al., 2001; Obara et al., 2001), following

the pioneering work on mapping QTLs for N-use efficiency in corn (Agrama et al., 1999). An

overview of the published rice QTLs is provided in Table 2. Indications so far suggest possible

links between very few of these QTLs and primary N assimilation genes and

transporters, especially between GS structural genes and yield components (Obara et al., 2001; Obara

et al., 2004; Senthilvel et al., 2008; Feng et al., 2010; Vinod et al., 2011). In a recent study, stably

expressed QTLs for yield and associated traits at different N levels, especially those expressed

at low N, were reported in recombinant inbred line populations (Tong et al., 2011).

Table 2: Quantitative trait loci for nitrogen response and associated traits in rice

Traits                      Population  Cross                     No. of MQTL  No. of EQTL  Reference
PH                          DHL         IR64 / Azucena            10           -            Fang and Wu, 2001
Rubisco, TLN, SPC           BIL         Nipponbare / Kasalath     15           -            Ishimaru et al., 2001
GS, GOGAT                   BIL         Nipponbare / Kasalath     13           -            Obara et al., 2001
GS, PN, PW                  NIL         Koshihikari / Kasalath    1            -            Obara et al., 2004
TGN, TSN, NUP, NUE, NTE     F3          Basmati 370 / ASD16       43           -            Senthilvel et al., 2004
RDW, SDW, BM                RIL         Zhenshan 97 / Minghui 63  52           103          Lian et al., 2005
PH, PN, CC, SDW             CSSL        Teqing / Lemont           31           -            Tong et al., 2006
TGN, TLN, TSN, NUP, SLN     RIL         IR69093-4-3-2 / IR72      32           -            Laza et al., 2006
RL, RT, RM, BM etc.         RIL         Bala / Azucena            17           -            MacMillan et al., 2006
TGN, TLN, TSN, PNUE, BM     RIL         Dasanbyeo / TR22183       20           58           Cho et al., 2007
TPN, NUE                    DHL         IR64 / Azucena            16           -            Senthilvel et al., 2008
TPN, NDMPE, NGPE, TGN       RIL         Dasanbyeo / TR22183       28           23           Piao et al., 2009
PH, NR, GS, GOGAT, BM etc.  RIL         Basmati 370 / ASD16       15           44           Vinod et al., 2011
GYP, BM, HI etc.            RIL         IR64 / INRC10192          46           -            Srividya et al., 2010
PH, RDW, SDW, CC, RL, BM    RIL         R9308 / Xieqingzao B      7            -            Feng et al., 2010
GYP, GNP                    RIL         Zhenshan 97 / HR5         19           11           Tong et al., 2011

BIL, backcross inbred lines; BM, biomass; CC, chlorophyll content; CSSL, chromosomal

segment substitution lines; DHL, doubled haploid lines; EQTL, epistatic QTL; GNP, grain number

per panicle; GOGAT, glutamate synthase; GYP, grain yield per plant; GS, glutamine synthetase;

HI, harvest index; MQTL, main-effect QTL; NDMPE, nitrogen dry matter production efficiency;

NGPE, nitrogen grain production efficiency; NHI, nitrogen harvest index; NIL, near isogenic lines;

NR, nitrate reductase; NTE, nitrogen translocation efficiency; NUE, nitrogen use efficiency; NUP,

nitrogen uptake; PH, plant height; PN, panicle number per plant; PW, panicle weight; PNUE,

physiological nitrogen use efficiency; RDW, root dry weight; RIL, recombinant inbred lines; RL,

root length; RT, root thickness; RM, root biomass; SDW, shoot dry weight; SLN, specific leaf

nitrogen; SPC, soluble protein content; TGN, total grain nitrogen; TLN, total leaf nitrogen; TSN,

total shoot nitrogen; TPN, total plant nitrogen

One of these QTLs, for number of grains per panicle under low N level, is located in the

same region as the Pup1 locus on chromosome 12, which suggests that Pup1 materials could be

tested for low-N tolerance. Attempts to map loci for associative rhizosphere N fixation have also

been reported in rice, and independent QTLs linked to the activity of different N-fixing bacterial

strains were identified on chromosome 2 (Ji et al., 2005). In independent studies (Lian et al.,

2005; Cho et al., 2007; Vinod, unpublished data), many small-effect epistatic QTLs have been

mapped accounting for a large cumulative proportion of variation for traits under low and

normal N conditions, many of which also show significant QTL x environment interaction,

emphasising the importance of validating QTLs in multiple environments and genetic

backgrounds before using them in selection.

Summary and outlook

In light of scarce resources, increasing fertilizer-production costs, and the need to keep

rice production at a pace with the growing demand, the development of nutrient-efficient crops

is increasingly important. As outlined above, both nutrient uptake and metabolic pathways are

under the control of a complex regulatory network involving many genes (Figure 3). The

identification of large-effect QTLs/genes is therefore a challenge. However, examples such as

the submergence-tolerance gene OsSUB1A (Xu et al., 2006; Fukao and Bailey-Serres, 2008) demonstrate that

a single gene can modify many downstream responses without affecting plant performance

under non-stressed conditions (Mackill et al., 2012). In addition to Pup1, large-effect QTLs have

also been identified for drought tolerance (Vikram et al., 2011; Swamy et al., 2011). Cloning of

the genes is in progress and it can be expected that upstream regulatory genes will be identified. Sub1, Pup1, and

the drought QTLs have been identified by forward genetic approaches using tolerant rice

genotypes, and screening was mainly conducted under field conditions. Given the success of

this approach for a complex trait like drought, it seems advisable to apply it for the

identification of QTLs/genes for N and P use efficiency.

With the experiences gained in QTL mapping and the rapid development of genome-

sequencing and molecular-marker technologies, more high-impact, large-effect QTLs will surely

be identified in the future. These efforts require expertise in different disciplines and,

therefore, modern breeding is increasingly carried out by multi-disciplinary teams

involving breeders, physiologists, and molecular biologists/geneticists. With the rapid advance

in the development of molecular markers and other breeding tools, breeders now gain access to

un-adapted genotypes that were difficult to use in breeding programs due to crossing barriers

and poor agronomic performance of landraces. Molecular breeding now provides real

opportunities to develop well-adapted and high yielding rice varieties, and will widen the gene

pool that breeders use in their programs.

References

Agrama, H. A., Zakaria, A. G., Said, F. B., and Tuinstra, M., 1999 Identification of quantitative

trait loci for nitrogen use efficiency in maize. Molecular Breeding 5: 187-195.

Agrama, H. A., 2006 Application of molecular markers in breeding for nitrogen use efficiency. J

Crop Improvement 15: 175–211.

Ai, P., Sun, S., Zhao, J., Fan, X., Xin, W., Guo, Q., Yu, L., Shen, Q., Wu, P., Miller, A. J., and Xu, G.,

2009 Two rice phosphate transporters, OsPht1;2 and OsPht1;6, have different functions and

kinetic properties in uptake and translocation. Plant Journal 57: 798–809.

Andaya, V. C., Mackill, D. J., 2003 Mapping of QTLs associated with cold tolerance during the

vegetative stage in rice. J Exp Bot 54: 2579–2585.

Andrews, M., Lea, P. J., Raven, J. A., and Lindsey, K., 2004 Can genetic manipulation of plant

nitrogen assimilation enzymes result in increased crop yield and greater N-use efficiency? An

assessment. Annals Appl Biol 145: 25–40.

Andrews, M., Raven, J. A., and Sprent, J.I., 1995 Site of nitrate assimilation in grain legumes in

relation to low temperature sensitivity: An assessment. In Proceedings of the 2nd European

Conference on Grain Legumes, Paris: L’Association Européenne de recherche sur les

protéagineuse, France 120–121.

Andrews, M., 1986 The partitioning of nitrate assimilation between root and shoot of higher

plants. Plant Cell Environment 9: 511–519.

Aung, K., Lin, S., Wu, C., Huang, Y., Su, C., and Chiou, T., 2006 pho2, a phosphate

overaccumulator, is caused by a nonsense mutation in a microRNA399 target gene. Plant

Physiol 141: 1000–1011.

Babu, R. C., Nguyen, B. D., Chamarerk, V., Shanmugasundaram, P., Chezhian, P., Jeyaprakash,

P., Ganesh, S. K., Palchamy, A., Sadasivam, S., Sarkarung, S., Wade, L. J., and Nguyen, H.T., 2003

Genetic analysis of drought resistance in rice by molecular markers: association between

secondary traits and field performance. Crop Science 43: 1457–1469.

Balasubramanian, V., Alves, B., Aulakh, M., Bekunda, M., Cai, Z., Drinkwater, L., Mugendi, D.,

van Kessel, C., and Oenema, O., 2004 Crop, environmental, and management factors affecting

nitrogen use efficiency. In: Mosier AR, Syers JK, Freney JR (eds) Agriculture and the nitrogen

cycle: Assessing the impacts of fertilizer use on food production and the environment,

Washington: Island Press 19–34.

Bänziger, M., Edmeades, G. O., and Lafitte, H. R., 2002 Physiological mechanisms contributing

to the increased N stress tolerance of tropical maize selected for drought tolerance. Field Crop

Res 75: 223–233.

Bari, R., Datt Pant, B., Stitt, M., and Scheible, W. R., 2006 PHO2, microRNA399, and PHR1 define

a phosphate-signaling pathway in plants. Plant Physiol 141: 988–999.

Bassirirad, H., 2000 Kinetics of nutrient uptake by roots: Responses to global change. New

Phytologist 147: 155–169.

Bernier, G., Havelange, A., Houssa, C., Petitjean, A., Lejeune, P., 1993 Physiological signals that

induce flowering. Plant Cell 5: 1147–1155.

Bernier, J., Kumar, A., Ramaiah, V., Spaner, D., and Atlin, G., 2007 A large-effect QTL for grain

yield under reproductive-stage drought stress in upland rice. Crop Sci 47: 507–517.

Bernier, J., Kumar, A., Venuprasad, R., Spaner, D., Verulkar, S., Mandal, N. P., Sinha, P. K.,

Peeraju, P., Dongre, P. R., Mahto, R. N., and Atlin, G., 2009b Characterization of the effect of a

QTL for drought resistance in rice, qtl12.1, over a range of environments in the Philippines and

eastern India. Euphytica 166: 207–217.

Bernier, J., Serraj, R., Kumar, A., Venuprasad, R., Impa, S., Gowda, R. P.V., Oane, R., Spaner, D.,

and Atlin, G., 2009a The large-effect drought-resistance QTL qtl12.1 increases water uptake in

upland rice. Field Crops Res 110: 139–146.

Bi, Y. M., Kant, S., Clark, J., Gidda, S., Ming, F., Xu, J., Rochon, A., Shelp, B. J., Hao, L., Zhao, R.,

Mullen, R. T., Zhu, T., and Rothstein, S. J., 2009 Increased nitrogen-use efficiency in transgenic

rice plants over-expressing a nitrogen responsive early nodulin gene identified from rice

expression profiling. Plant Cell Environ 32: 1749–1760.

Blum, A., 1988 Plant breeding for stress environments. Boca Raton: CRC Press.

Bockman, O. C., Kaarstard, O., Lie, O. H., and Richards, I., 1990 Agriculture and fertilizers. Oslo:

Agriculture group Norsk Hydro, Norway.

Bot, A. J., Nachtergaele, O. F., and Young, A., 2000 Land resource potential and constraints at

regional and country levels. Food and Agriculture Organisation, FAO, Rome, World Soil

Resources Report No. 90, 114 p.

Bregitzer, P., and Raboy, V., 2006 Effects of four independent low-phytate mutations on barley

agronomic performance. Crop Sci 46: 1318–1322.

Broadbent, F. E., DeDatta, S. K., and Laureles, E. V., 1987 Measurement of nitrogen utilization

efficiency in rice genotypes. Agronomy J 79: 786–791.

Cai, C., Wang, J. Y., Zhu, Y. G., Shen, Q. R., Li, B., Tong, Y. P., and Li, Z. S., 2008 Gene structure

and expression of the high-affinity nitrate transport system in rice roots. J Integrat Plant Biol 50:

443–451.

Campbell, W. H., 2002 Molecular control of nitrate reductase and other enzymes involved in

nitrate assimilation. In: Foyer CH, Noctor G, eds. Photosynthetic nitrogen assimilation and

associated carbon and respiratory metabolism, Kluwer Academic, Netherlands 35–48.

Cassman, K. G., Gines, G. C., Dizon, M. A., Samson, M. I., and Alcantara, J. M., 1996 Nitrogen-use

efficiency in tropical lowland rice systems: contributions from indigenous and applied nitrogen.

Field Crops Res 47: 1–12.

Chen, S. J., and Kao, C. H., 1996 Ammonium accumulation in relation to senescence of detached

maize leaves. Botanical Bulletin of Academia Sinica 37: 255–259.

Chin, J. H., Gamuyao, R., Dalid, C., Bustamam, M., Prasetiyono, J., Moeljopawiro, S., Wissuwa,

M., and Heuer, S., 2011 Developing rice with high yield under phosphorus deficiency: Pup1

sequence to application. Plant Physiol 156: 1202–1216.

Chin, J. H., Lu, X., Haefele, S. M., Gamuyao, R., Ismail, A., Wissuwa, M., and Heuer, S., 2010

Development and application of gene-based markers for the major rice QTL Phosphorus uptake

1. Theor Appl Genet 120: 1073–1086.

Cho, Y., Jiang, W., Chin, J. H., Piao, Z., Cho, Y. G., McCouch, S. R., and Koh, H. J., 2007

Identification of QTLs associated with physiological nitrogen use efficiency in rice. Molecules

and Cells 23: 72–79.

Cordell, D., Drangert, J. O., and White, S., 2009 The story of phosphorus: Global food security

and food for thought. Global Environmental Change 19: 292–305.

Crawford, N. M., and Glass, A. D. M., 1998 Molecular and physiological aspects of nitrate uptake

in plants. Trends Plant Sci 3: 389–395.

Dawe, D., 2000 The potential role of biological nitrogen fixation in meeting future demand for

rice and fertilizer. In: Ladha, J. K., and Reddy, P. M., ed. The quest for nitrogen fixation in rice.

International Rice Research Institute, Philippines. pp. 1–9.

DeDatta, S. K., and Broadbent, F. E., 1988 Methodology for evaluating nitrogen utilization

efficiency by rice genotypes. Agronomy J 80: 793–798.

DeDatta, S. K., 1981 Principles and practice of rice production. Singapore: John Wiley and Sons,

348–419.

Ding, Z., Wang, C., Chen, S., and Yu, S., 2011 Diversity and selective sweep in the OsAMT1;1

genomic region of rice. BMC Evolutionary Biology 11: 61.

Dobermann, A., and Fairhurst, T., 2000 Rice: Nutrient Disorders & Nutrient Management.

Potash & Phosphate Institute, Potash & Phosphate Institute of Canada and International Rice

Research Institute.

Dubois, F., Tercé-Laforgue, T., Gonzalez-Moro, M. B., Estavillo, J. M., Sangwan, R., Gallais, A.,

and Hirel, B., 2003 Glutamate dehydrogenase in plants: Is there a new story for an old enzyme?

Plant Physiology and Biochemistry 41: 565–576.

Epstein, E., and Bloom, A. J., 2005 Mineral Nutrition of Plants: Principles and Perspectives, 2nd

edn. Sunderland: Sinauer Associates.

Fan, J. B., Zhang, Y. L., Turner, D., Duan, Y. H., Wang, D. S., and Shen, Q. R., 2010 Root

physiological and morphological characteristics of two rice cultivars with different nitrogen-use

efficiency. Pedosphere 20: 446–455.

Fang, P., Wu, P., 2001 QTL × N-level interaction for plant height in rice (Oryza sativa L.). Plant

and Soil 236: 237–242.

FAO, 2003 Medium-term prospects for agricultural commodities - Projections to the year 2010.

Rome: Food and Agriculture Organization of the United Nations, Italy.

FAO, 2008 Current world fertilizer trends and outlook to 2011/12. Rome: Food and Agriculture

Organization of the United Nations, Italy.

Feng, H., Yan, M., Fan, X., Li, B., Shen, Q., Miller, A. J., and Xu, G., 2011 Spatial expression and

regulation of rice high-affinity nitrate transporters by nitrogen and carbon status. J Exp Bot 62:

2319–2332.

Feng, Y., Cao, L. Y., Wu, W. M., Shen, X. H., Zhan, X. D., Zhai, R. R., Wang, R. C., Chen, D. B., and

Cheng, S. H., 2010 Mapping QTLs for nitrogen-deficiency tolerance at seedling stage in rice

(Oryza sativa L.). Plant Breeding 129: 652–656.

Forde, B. G., and Clarkson, D. T., 1999 Nitrate and ammonium nutrition of plants: physiological

and molecular perspectives. Advances in Botanical Research 30: 1–90.

Franco-Zorrilla, J. M., Valli, A., Todesco, M., Mateos, I., Puga, M. I., Rubio-Somoza, I., Leyva, A.,

Weigel, D., García, J. A., and Paz-Ares, J., 2007 Target mimicry provides a new mechanism for

regulation of microRNA activity. Nature Genet 39: 1033–1037.

Fu, J. D., and Lee, B. W., 2008 Changes in photosynthetic characteristics during grain filling of

functional stay-green rice SNUSG1 and its F1 hybrids. J Crop Sci Biotechnology 11: 75–82.

Fu, J. D., Yan, Y. F., Kim, M. Y., Lee, S. H., and Lee, B. W., 2011 Population-specific quantitative

trait loci mapping for functional stay-green trait in rice (Oryza sativa L.). Genome 54: 235–243.

Fukao, T., and Bailey-Serres, J., 2008 Submergence tolerance conferred by Sub1A is mediated

by SLR1 and SLRL1 restriction of gibberellin responses in rice. Proc Natl Acad Sci USA 105:

16814–16819.

Gahoonia, T. S., and Nielsen, N. E., 2004 Root traits as tools for creating phosphorus efficient

varieties. Plant Soil 260: 47–57.

Gazzarini, S., Lejay, L., Gojon, A., Ninnemann, O., Frommer, W. B., and von Wiren, N., 1999

Three functional transporters for constitutive, diurnally regulated, and starvation-induced

uptake of ammonium into Arabidopsis roots. Plant Cell 11: 937–948.

Gerendas, J. G., Zhu, Z., and Sattelmacher, B., 1998 Influence of N and Ni supply on nitrogen

metabolism and urease activity in rice (Oryza sativa L.). J Exp Bot 49: 1545–1554.

Glass, A. D. M., Britto, D. T., Kaiser, B. N., Kronzucker, H. J., Kumar, A., Okamoto, M., Rawat, S.

R., Siddiqi, M. Y., Silim, S. M., Vidmar, J. J., and Zhuo, D., 2001 Nitrogen transport in plants, with

an emphasis on the regulation of fluxes to match plant demand. J Plant Nutrition Soil Sci 164:

199–207.

Gulati, A., and Narayanan, S., 2002 Rice trade liberalization and poverty. Washington:

International Food Policy Research Institute.

Habash, D. Z., Massiah, A. J., Rong, H. L., Wallsgrove, R. M., and Leigh, R. A., 2001 The role of

cytosolic glutamine synthetase in wheat. Annals Appl Biol 138: 83–89.

Haefele, S. M., Wopereis, M. C. S., 2005 Spatial variability of indigenous supplies for N, P and K

and its impact on fertilizer strategies for irrigated rice in West Africa. Plant and Soil 270: 57–72.

Hajiboland, R., Aliasgharzad, N., Barzeghar, R., 2009 Influence of arbuscular mycorrhizal fungi

on uptake of Zn and P by two contrasting rice genotypes. Plant Soil Environm 55: 93–100.

Hammond, J. P., Broadley, M. R., White, P. J., 2004 Genetic responses to phosphorus deficiency.

Annals Botany 94: 323–332.

Hammond, J. P., White, P. J., 2011 Sugar signaling in root responses to low phosphorus

availability. Plant Physiol 156: 1033–1040.

Hawkesford, M. J., Howarth, J.R., 2011 Transcriptional profiling approaches for studying

nitrogen use efficiency. In: Foyer C and Zhang H (ed.) Nitrogen Metabolism in Plants in the Post-

genomic Era, Blackwell Publishing Ltd, Annual Plant Reviews 42: 41–62.

Hayakawa, T., Yamaya, T., Mae, T., Ojima, K., 1993 Changes in the content of two glutamate

synthase proteins in spikelets of rice (Oryza sativa) plants during ripening. Plant Physiol 101:

1257–1262.

Hernández, G., Ramírez, M., Valdés-López, O., Tesfaye, M., Graham, M. A., Czechowski, T.,

Schlereth, A., Wandrey, M., Erban, A., Cheung, F., Wu, H. C., Lara, M., Town, C. D., Kopka, J.,

Udvardi, M. K., Vance, C.P., 2007 Phosphorus stress in common bean: Root transcript and

metabolic responses. Plant Physiol 144: 752–767

Heuer, S., Lu, X., Chin, J. H., Tanaka, J. P., Kanamori, H., Matsumoto, T., De Leon, T., Ulat, V. J.,

Ismail, A. M., Yano, M., Wissuwa, M., 2009 Comparative sequence analyses of the major

quantitative trait locus phosphorus uptake 1 (Pup1) reveal a complex genetic structure. Plant

Biotech J: 456–457.

Hirel, B., Lea, P. J., 2002 The biochemistry, molecular biology and genetic manipulation of

primary ammonia assimilation. In: Foyer CH, Noctor G, eds. Photosynthetic nitrogen assimilation

and associated carbon and respiratory metabolism. Kluwer Academic Netherlands, 71–92.

Hoshida, H., Tanaka, Y., Hibino, T., Hayashi, Y., Tanaka, A., Takabe, T., Takabe, T., 2000

Enhanced tolerance to salt stress in transgenic rice that overexpresses chloroplast glutamine

synthetase. Plant Mol Biol 43: 103–111.

Hossain, M. F., White, S. K., Elahi, S. F., Sultana, N., Choudhury, N. M. K., Alam, Q. K., Rother, J.

A., Gaunt, J. L., 2005 The efficiency of nitrogen fertiliser for rice in Bangladeshi farmers’ fields.

Field Crops Res 93: 94–107.

Howitt, S., Udvardi, M. K., 2000 Structure, function and regulation of ammonium transporters in

plants. Biochimica et Biophysica Acta 1465: 152–170.

Hu, B., Zhu, C., Li, F., Tang, J., Wang, Y., Lin, A., Liu, L., Che, R., Chu, C., 2011 LEAF TIP

NECROSIS1 plays a pivotal role in the regulation of multiple phosphate starvation responses in

rice. Plant Physiol 156: 1101–1115.

Huang, Y. Z., Feng, Z. W., Zhang, F. Z., 2000 Study on loss of nitrogen fertilizer from agricultural

fields and countermeasure. J Graduate School Academia Sinica 17: 49–58.

Imsande, J., Touraine, B., 1994 N demand and the regulation of nitrate uptake. Plant Physiol

105: 3–7.

Ishimaru K, Kobayashi N, Ono K, Yano M, Ohsugi R. 2001. Are contents of Rubisco, soluble

protein and nitrogen in flag leaves of rice controlled by the same genetics? J Exp Bot 52: 1827–1833.

Ishiyama K, Kojima S, Takahashi H, Hayakawa T, Yamaya T. 2003. Cell type distinct accumulation

of mRNA and protein for NADH-dependent glutamate synthase in rice roots in response to the

supply of NH4+. Plant Physiol Biochem 41: 643–647.

Ismail AM, Heuer S, Thomson MJ, Wissuwa M. 2007. Genetic and genomic approaches to

develop rice germplasm for problem soils. Plant Mol Biol 65: 547–570.

Ji TW, Fang P, Xing YZ, Jia XM. 2005. Mapping quantitative trait loci for associative nitrogen

fixation ability in rhizosphere of rice seedling. Plant Nutrition Fertilizer Sci 11: 394–398.

Jia H, Ren H, Gu M, Zhao J, Sun S, Zhang X, Chen J, Wu P, Xu G. 2011. The phosphate transporter

gene OsPht1;8 is involved in phosphate homeostasis in rice. Plant Physiol 156: 1164–1175.

Kamachi K, Yamaya T, Mae T, Ojima K. 1991. A role for glutamine synthetase in the

remobilisation of leaf nitrogen during natural senescence in rice leaves. Plant Physiol 96: 411–417.

Kant S, Bi YM, Rothstein SJ. 2011. Understanding plant response to nitrogen limitation for the

improvement of crop nitrogen use efficiency. J Exp Bot 62: 1499–1509.

Karandashov V, Bucher M. 2005. Symbiotic phosphate transport in arbuscular

mycorrhizas. Trends in Plant Science 10: 22–29.

Kronzucker HJ, Kirk GJD, Siddiqi MY, Glass ADM. 1998. Effects of hypoxia on 13NH4+ fluxes in

rice roots: Kinetics and compartmental analysis. Plant Physiol 116: 581–587.

Kronzucker HJ, Siddiqi MY, Glass ADM, Kirk GJD. 1999. Nitrate-ammonium synergism in rice: A

subcellular flux analysis. Plant Physiol 119: 1041–1045.

Lafitte HR, Edmeades GO. 1994a. Improvement for tolerance to low soil nitrogen in tropical

maize. I. Selection criteria. Field Crops Res 39: 1–14.

Lafitte HR, Edmeades GO. 1994b. Improvement for tolerance to low soil nitrogen in tropical

maize. II. Grain yield, biomass production, and N accumulation. Field Crops Res 39: 15–25.

Larson SR, Rutger JN, Young KA, Raboy V. 2000. Isolation and genetic mapping of a non-lethal

rice (Oryza sativa L.) low phytic acid 1 mutation. Crop Sci 40: 1397–1405.

Laungani R, Knops JMH. 2009. Species-driven changes in nitrogen cycling can control plant

invasions. Proc Natl Acad Sci USA 106: 12400–12405.

Laza MR, Kondo M, Ideta O, Barlaan E, Imbe T. 2006. Identification of quantitative trait loci for

δ13C and productivity in irrigated lowland rice. Crop Sci 46: 763–773.

Lea PJ, Miflin BJ. 2011. Nitrogen assimilation and its relevance to crop improvement. In: Foyer C

and Zhang H (ed.) Nitrogen Metabolism in Plants in the Post-genomic Era, Blackwell Publishing

Ltd, Annual Plant Reviews 42: 1–40.

Lea PJ. 1997. Primary nitrogen metabolism. In: Dey PM, Harborne JB, eds. Plant biochemistry.

London: Academic Press, 273–313.

Li BZ, Merrick M, Li SM, Li HY, Zhu SW, Shi WM, Su YH. 2009b. Molecular basis and regulation of

ammonium transporter in rice. Rice Sci 16: 314–322.

Li HY, Zhu YG, Marschner P, Smith FA, Smith SE. 2005. Wheat responses to arbuscular

mycorrhizal fungi in a highly calcareous soil differ from those of clover, and change with plant

development and P supply. Plant and Soil 277: 221–232.

Li J, Xie Y, Dai A, Liu L, Li Z. 2009a. Root and shoot traits responses to phosphorus deficiency

and QTL analysis at seedling stage using introgression lines of rice. J Genet Gen 36: 173–183.

Li K, Xu C, Li Z, Zhang K, Yang A, Zhang J. 2008. Comparative proteome analyses of phosphorus

responses in maize (Zea mays L.) roots of wild-type and a low-P-tolerant mutant reveal root

characteristics associated with phosphorus efficiency. Plant J 55: 927–939.

Li ZK, Arif M, Zhong DB, Fu BY, Xu JL, Domingo-Rey J, Ali J, Vijayakumar CHM, Yu SB, Khush GS.

2006. Complex genetic networks underlying the defensive system of rice (Oryza sativa L.) to

Xanthomonas oryzae pv. oryzae. Proc Natl Acad Sci USA 103: 7994–7999.

Lian X, Wang S, Zhang J, Feng Q, Zhang L, Fan D, Li X, Yuan D, Han B, Zhang Q. 2006. Expression

profiles of 10,422 genes at early stage of low nitrogen stress in rice assayed using a cDNA

microarray. Plant Mol Biol 60: 617–631.

Lian X, Xing Y, Yan H, Xu C, Li X, Zhang Q. 2005. QTLs for low nitrogen tolerance at seedling

stage identified using a recombinant inbred line population derived from an elite rice hybrid.

Theor Appl Genet 112: 85–96.

Lin CM, Koh S, Stacey G, Yu SM, Lin TY, Tsay YF. 2000. Cloning and functional characterization of

a constitutively expressed nitrate transporter gene, OsNRT1, from rice. Plant Physiol 122: 379–388.

Liu QL, Xu XH, Ren XL, Fu HW, Wu DX, Shu QY. 2007. Generation and characterization of low

phytic acid germplasm in rice (Oryza sativa L.). Theor Appl Genet 114: 803–814.

Loqué D, von Wirén N. 2004. Regulatory levels for the transport of ammonium in plant roots. J

Exp Bot 55: 1293–1305.

Lott JNA, Bojarski M, Kolasa J, Batten GD, Campbell LC. 2009. A review of the phosphorus

content of dry cereal and legume crops of the world. Internat J Agricult Resources, Governance

and Ecology 8: 351–370.

Lott JNA, Ockenden I, Raboy V, Batten GD. 2000. Phytic acid and phosphorus in crop seeds and

fruits: a global estimate. Seed Sci Res 10: 11–33.

Lynch JP. 2007. Roots of the Second Green Revolution. Turner Review No. 14, Austral J Bot 55:

493–512.

MacDonald GK, Bennett EM, Potter PA, Ramankutty N. 2011. Agronomic phosphorus

imbalances across the world's croplands. Proc Natl Acad Sci USA 108: 3086–3091.

MacMillan K, Emrich K, Piepho H-P, Mullins CE, Price AH. 2006. Assessing the importance of

genotype × environment interaction for root traits in rice using a mapping population II:

conventional QTL analysis. Theor Appl Genet 113: 953–964.

Mae T, Ohira K. 1981. The remobilization of nitrogen related to leaf growth and senescence in

rice plants (Oryza sativa L.). Plant Cell Physiol 22: 1067–1074.

Mae T. 1997. Physiological nitrogen efficiency in rice: Nitrogen utilization, photosynthesis, and

yield potential. Plant and Soil 196: 201–210.

Mackill DJ, Ismail AM, Singh US, Labios RV, Paris TR. 2012. Development and rapid adoption of

submergence-tolerant (Sub1) rice varieties. Adv Agron 115: 299–352.

Maiti D, Singh RK, Variar M. 2012. Rice-based crop rotation for enhancing native arbuscular

mycorrhizal (AM) activity to improve phosphorus nutrition of upland rice (Oryza sativa L.). Biol

Fertil Soils 48: 67–73.

Marschner H, Kirkby EA, Engels C. 1997. Importance of cycling and recycling of mineral nutrients

within plants for growth and development. Botanica Acta 110: 265–273.

Marschner H, Römheld V, Horst WJ, Martin P. 1986. Root-induced changes in the rhizosphere:

importance of the mineral nutrition in plants. Zeitschrift für Pflanzenernährung und

Bodenkunde 149: 441–456.

Martin A, Lee J, Kichey T, Gerentes D, Zivy M, Tatout C, Dubois F, Balliau T, Valot B, Davanture

M, Tercé-Laforgue T, Quilleré I, Coque M, Gallais A, Gonzalez-Moro MB, Bethencourt L, Habash

DZ, Lea PJ, Charcosset A, Perez P, Murigneux A, Sakakibara H, Edwards KJ, Hirel B. 2006. Two

cytosolic glutamine synthetase isoforms of maize are specifically involved in the control of grain

production. Plant Cell 18: 3252–3274.

Martin T, Oswald O, Graham IA. 2002. Arabidopsis seedling growth, storage lipid mobilization,

and photosynthetic gene expression are regulated by carbon:nitrogen availability. Plant

Physiol 128: 472–481.

Masclaux C, Quillere I, Gallais A, Hirel B. 2001. The challenge of remobilisation in plant nitrogen

economy: A survey of physio-agronomic and molecular approaches. Annals Appl Biol 138: 69–81.

Matson PA, Naylor R, Ortiz-Monasterio I. 1998. Integration of environmental, agronomic, and

economic aspects of fertilizer management. Science 280: 112–115.

Meyer C, Stitt M. 2001. Nitrate reduction and signaling. In: Lea PJ, Morot-Gaudry J-F, eds. Plant

nitrogen, Berlin: Springer, pp. 37–59.

103

Morcuende R, Bari R, Gibon Y, Zheng W, Pant BD, Blasing O, Usadel B, Czechowski T, Udvardi

MK, Stitt M, Scheible WR. 2007. Genome-wide reprogramming of metabolism and regulatory

networks of Arabidopsis in response to phosphorus. Plant Cell Environ30: 85–112.

Muruli BI, Paulsen GM. 1981. Improvement of nitrogen use efficiency and its relationship to

other traits in maize. Maydica26: 63–73.

Ni JJ, Wu P, Senadhira D, Huang N. 1998.Mapping QTLs for phosphorus deficiency tolerance in

rice (Oryza sativa L.).Theor Appl Genet97:1361–1369

Nilsson L, Műller R, Nielsen TH. 2010. Dissecting the plant transcriptome and the regulatory

responses to phosphate deprivation. Physiol Plantarum139: 129–143.

Obara M, Kajiura M, Fukuta Y, Yano M, Hayashi M, Yamaya T, Sato T. 2001. Mapping of QTLs

associated with cytosolic glutamine synthetase and NADH-glutamate synthase in rice (Oryza

sativa L.). J Expl Bot52: 1209–1217.

Obara M, Sato T, Sasaki S, Kashiba K, Nagano A, Nakamura I, Ebitani T, Yano M, Yamaya T. 2004.

Identification and characterization of a QTL on chromosome 2 for cytosolic glutamine

synthetase content and panicle number in rice. Theor Appl Gen110: 1–11.

Ortiz-Monasterio JI, Manske GGB, van Ginkel M. 2001. Nitrogen and phosphorus use efficiency.

In: Reynolds MP, Ortiz-Monasterio JI, McNab A, eds. Application of physiology in wheat

breeding. CIMMYT, Mexico, 200–207.

Pariasca-Tanaka J, Satoh K, Rose T, Mauleon R, Wissuwa M. 2009. Stress response versus stress

tolerance: a transcriptome analysis of two rice lines contrasting in tolerance to phosphorus

deficiency. Rice2: 167–185.

Park MR, Tyagi K, Baek SH, Kim YJ, Rehman S, Yun SJ. 2010. Agronomic characteristics of

transgenic rice with enhanced phosphate uptake ability by overexpressed tobacco high affinity

phosphate transporter. Pakistan J Bot42: 3265–3273.

104

Paszkowski U, Kroken S, Roux C, Briggs SP. 2002. Rice phosphate transporters include an

evolutionarily divergent gene specifically activated in arbuscular mycorrhizal symbiosis. Proc

Nat Acad Sci USA 99: 13324–13329.

Peng S, Bouman BAM. 2007. Prospects for genetic improvement to increase lowland rice yields

with less water and nitrogen. In: Spiertz JHJ, Struik PC, van Laar HH, eds. Scale and complexity in

plant systems research: Gene-Plant-Crop relations. Springer, 251–266.

Peng S, Khush GS, Cassman KG. 1994. Evolution of the new plant ideotype for increased yield

potential. In: Cassman KG, ed. Breaking the Yield Barrier. Manila: International Rice Research

Institute, Philippines, 57–60.

Piao Z, Li M, Li P, Zhang J, Zhu C, Wang H, Luo Z, Lee J, Yang R. 2009. Bayesian dissection for

genetic architecture of traits associated with nitrogen utilization efficiency in rice. African J

Biotechnol8: 6834–6839.

Raboy V. 2009. Approaches and challenges to engineering seed phytate and total phosphorus.

Plant Sci177: 281–296.

Rae AL, Jarmey JM, Mudge SR, Smith FW. 2004. Over-expression of a high-affinity phosphate

transporter in transgenic barley plants does not enhance phosphate uptake rates. Functional

Plant Biol31: 141–148.

Ramaekers L, Remans R, Rao IM, Blair MW, Vanderleyden J. 2010.Strategies for improving

phosphorus acquisition efficiency of crop plants.Field Crops Res117: 169–176.

Ramalingam J, Vera Cruz CM, Kukreja K, Chittoor JM, Wu JL, Lee SW, Baraoidan M, George ML,

Cohen MB, Hulbert SH, Leach JE, Leung H. 2003.Candidate defense genes from rice, barley, and

maize and their association with qualitative and quantitative resistance in rice.Mol Plant-

Microbe Interact16:14–24.

105

Raven JA, Taylor R. 2003. Macroalgal growth in nutrient enriched estuaries: a biogeochemical

perspective. Water, Air and Soil Pollution: Focus3: 7–26.

Reymond M, Svistoonoff S, Loudet O, Nussaume L, Desnos T. 2006.Identification of QTL

controlling root growth response to phosphate starvation in Arabidopsis thaliana.Plant Cell

Environ29: 115–125.

Rouached H, Stefanovic A, Secco D, Arpat AB, Gout E, Bligny R, Poirier Y. 2011. Uncoupling

phosphate deficiency from its major effects on growth and transcriptome via PHO1 expression

in Arabidopsis.Plant J65: 557–570.

Rubio V, Linhares F, Solano R, Martin AC, Iglesias J, Leyva A, Paz-Ares J. 2001. A conserved MYB

transcription factor involved in phosphate starvation signaling both in vascular plants and in

unicellular algae. Genes Dev15: 2122–2133.

Sawers RJ, Gutjahr C, Paszkowski U. 2008. Cereal mycorrhiza: an ancient symbiosis in modern

agriculture. Trends Plant Sci13: 93–97.

Schachtman DP, Reid RJ, Ayling SM. 1998. Phosphorus Uptake by Plants: From Soil to Cell. Plant

Physiol116: 447–453.

Senthilvel S, Govindaraj P, Arumugachamy S, Latha R, Malarvizhi P, Gopalan A, Maheswaran M.

2004. Mapping genetic loci associated with nitrogen use efficiency in rice (Oryza sativa L.).

Brisbane: 4th International Crop Science Congress, Queensland, Australia.

Senthilvel S, Vinod KK, Malarvizhi P, Maheswaran M. 2008. QTL and QTL×Environment effects

on agronomic and nitrogen acquisition traits in rice. J Integrat Plant Biol50: 1108–1117.

Septiningsih EM, Pamplona AM, Sanchez DL, Neeraja CN, Vergara GV, Heuer S, Ismail AM,

Mackill DJ. 2009. Development of submergence-tolerant rice cultivars: the Sub1 locus and

beyond. Annals Bot103: 151–160.

106

Sheehy JE, Dionora MJA, Mitchell PL, Peng S, Cassman KG, Lemaire G, Williams RL. 1998. Critical

nitrogen concentrations: implications for high-yielding rice (Oryza sativa L.) cultivars in the

tropics. Field Crops Res59: 31–41.

Shimizu A, Yanagihara S, Kawasaki S, Ikehashi H. 2004. Phosphorus deficiency-induced root

elongation and its QTL in rice (Oryza sativa L.). Theor Appl Genet109:1361–1368

Shimizu A, Kato K, Komatsu A, Motomura K, Ikehashi H. 2008. Genetic analysis of root

elongation induced by phosphorus deficiency in rice (Oryza sativa L.): Fine QTL mapping and

multivariate analysis of related traits. Theor Appl Genet117:987–996

Singh N, Dang TTM, Vergara GV, Pandey DM, Sanchez D, Neeraja CN, Septiningsih EM,

Mendioro M, Tecson-Mendoza EM, Ismail AM, Mackill DJ, Heuer S. 2010.Molecular marker

survey and expression analyses of the rice submergence-tolerance gene SUB1A.Theor Appl

Genet121: 1441–1453.

Singh U, Ladha JK, Castillo EG, Punzalan G, Tirol-Padre A, Duqueza M. 1998. Genotypic variation

in nitrogen use efficiency in medium- and long-duration rice. Field Crops Res58: 35–53.

Solaiman MZ, Hirata H. 1997. Effect of arbuscular mycorrhizal fungi inoculation of rice seedlings

at the nursery stage upon performance in the paddy field and greenhouse.Plant Soil 191: 1–12.

Srividya A, Vemireddy LR, Hariprasad AS, Jayaprada M, Sridhar S, Ramanarao PV, Anuradha G,

Siddiq EA. 2010. Identification and Mapping of Landrace Derived QTL Associated with Yield and

its Components in Rice under Different Nitrogen Levels and Environments. Internat J Plant

Breed Genet4: 210–227.

Stitt M. 1999. Nitrate regulation of metabolism and growth. Current Opinion Plant Biol2: 178–

186.

Swamy BPM, Vikram P, Dixit S, Ahmed HU, Kumar A. 2011. Meta-analysis of grain yield QTL

identified during agricultural drought in grasses showed consensus. BMC Genomics12: 319.

107

Tabuchi M, Sugiyama K, Ishiyama K, Inoue E, Sato T, Takahashi H, Yamaya T. 2005. Severe

reduction in growth rate and grain filling of rice mutants lacking OsGS1;1, a cytosolic glutamine

synthetase1;1. The Plant Journal42: 641–651.

Tabuchi M, Abiko T, Yamaya T. 2007.Assimilation of ammonium ions and reutilization of

nitrogen in rice (O. sativa L.).J Exp Bot58: 2319–2327.

Thomson MJ, Ocampo M, Egdane J, Rahman MA, Sajise AG, Adorada DL, Tumimbang-Raiz E,

Blumwald E, Seraj ZI, Singh RK, Gregorio GB, Ismail AM. 2010. Characterizing the Saltol

quantitative trait locus for salinity tolerance in rice. Rice3: 148–160.

Tirol-Padre A, Ladha JK, SinghU, Laureles E, Punzalan G, Akita S. 1996.Grain yield performance

of rice genotypes at suboptimal levels of soil N as affected by N uptake and utilization

efficiency.Field Crops Res46: 127–143.

Tobin AK, Yamaya T. 2001.Cellular compartmentation of ammonium assimilation in rice and

barley.J Exp Bot53: 591–604.

Tong H, Chen L, Li W, Mei H, Xing Y, Yu X, Xu X, Zhang S, Luo L. 2011.Identification and

characterization of quantitative trait loci for grain yield and its components under different

nitrogen fertilization levels in rice (Oryza sativa L.).Mol Breeding28: 495–509.

Tong HH, Mei HW, Yu XQ, Xu XY, Li MS, Zhang SQ, Luo LJ. 2006. Identification of related QTLs at

late developmental stage in rice (Oryza sativa L.) under two nitrogen levels. Acta Genetica

Sinica33: 458–467.

Travis T. 1993. The Haber-Bosch process: exemplar of 20th century chemical industry.

Chemistry and Industry15: 581–585.

UNFPA. 2011. The State of World Population 2011: People and possibilities in a world of 7

billion. United Nations Population Fund, New York. 124p.

108

van Kauwenbergh SJ. 2010. World phosphate rock reserves and resources. International

Fertilizer Development Center, IFDC Technical Bulletin No. 75. Muscle Shoals, Alabama, USA, 58

pp.

Vikram P, Swamy BPM, Dixit S, Ahmed HU, Cruz MTS, Singh AK, Kumar A. 2011.qDTY1.1, a

major QTL for rice grain yield under reproductive-stage drought stress with a consistent effect

in multiple elite genetic backgrounds.BMC Genet12:89

Vinod KK, Meenakshisundaram P, Maheswaran M, Malarvizhi P, Gopalan A. 2011. Quantitative

trait loci for shoot biomass under low nitrogen stress may be putatively linked to loci

influencing glutamine synthetase (GS) activity in rice (Oryza sativa L.). International symposium

on plant biotechnology towards tolerance to stresses and enhancing yield. Ranchi: Birla

Institute of Technology.

Von Braun J. 2008. The food crisis isn’t over. Nature456: 701.

Wang H, Inukai Y, Yamauchi A. 2006. Root development and nutrient uptake. Critical Reviews

Plant Sci25: 279–301.

Wang MY, Siddiqi MY, Ruth TJ, Glass ADM. 1993a.Ammonium uptake by rice roots. I. Fluxes and

subcellular distribution of 13NH4+. Plant Physiol 103: 1249–1258.

Wang WH, Köhler B, Cao FQ, Liu GW, Gong YY, Sheng S, Song QC, Cheng XY, Garnett T, Okamoto

M, Qin R, Mueller-Roeber B, Tester M, Liu LH. 2012. Rice DUR3 mediates high-affinity urea

transport and plays an effective role in improvement of urea acquisition and utilization when

expressed in Arabidopsis. New Phytologist 193: 432–444.

Wasaki J, Yonetani R, Shinano T, Kai M, Osaki M. 2003. Expression of the OsPI1 gene, cloned

from rice roots using cDNA microarray, rapidly responds to phosphorus status. New Phytol 158:

239–248.

109

Williams JF, Mutters RG, Greer CA, Horwath WR. 2010. Rice nutrient management in California.

Agriculture and Natural Resources, University of California.

Williams LE, Miller AJ. 2001. Transporters responsible for the uptake and partitioning of

nitrogenous solutes. Annual Review Plant Physiol Plant Mol Biol52: 659–688.

Wissuwa M, Wegner J, Ae N, Yano M. 2002. Substitution mapping of Pup1: a major QTL

increasing phosphorus uptake of rice from a phosphorus-deficient soil. Theor Appl Genet105:

890–897.

Wissuwa M, Yano M, Ae N. 1998.Mapping of QTLs for phosphorus-deficiency tolerance in rice

(Oryza sativa L.).Theor Appl Genet97: 777–783.

Witcombe JR, Hollington PA, Howarth CJ, Reader S, Steele KA. 2008. Breeding for abiotic

stresses for sustainable agriculture. Philosophical Transactions Royal Society B363: 703–716.

Witt C, Pasuquin JM, Sulewsk G. 2009.Predicting agronomic boundaries of future fertilizer

needs in AgriStats.Better Crops93: 16–18.

Wu P, Liao CY, Hu B, Yi KK, Jin WZ, Ni JJ, He C. 2000. QTLs and epistasis for aluminum tolerance

in rice (Oryza sativa L.) at different seedling stages.Theor Appl Genet 100: 1295–1303.

Wu P, Ma L, Hou X, Wang M, Wu Y, Liu F, Deng XW. 2003. Phosphate starvation triggers distinct

alterations of genome expression in Arabidopsis roots and leaves. Plant Physiol132: 1260–1271.

Xu K, Xu X, Fukao T, Canlas P, Maghirang-Rodriguez R, Heuer S, Ismail AM, Bailey-Serres J,

Ronald PC, Mackill DJ. 2006.Sub1A is an ethylene-response-factor-like gene that confers

submergence tolerance to rice. Nature 442: 705–708.

Yamaya T, Hayakawa T, Tanasawa K, Kamachi K, Mae T, Ojima K. 1992. Tissue distribution of

glutamate synthase and glutamine synthetase in rice leaves: occurrence of NADH-dependent

110

glutamate synthase protein and activity in the unexpanded, non-green leaf blades. Plant Physiol

100: 1427-1432.

Yan M, Fan X, Feng H, Miller AJ, SHEN Q, XU G. 2011. Rice OsNAR2.1 interacts with OsNRT2.1,

OsNRT2.2 and OsNRT2.3a nitrate transporters to provide uptake over high and low

concentration ranges. Plant Cell Environm34: 1360–1372.

Yi K, Wu Z, Zhou J, Du L, Guo L, Wu Y, Wu P. 2005.OsPTF1, a novel transcription factor involved

in tolerance to phosphate starvation in rice.Plant Physiol138: 2087–2096.

Yoo SC, Cho SH, Zhang H, Paik HC, Lee CH, Li J, Yoo JH, Lee BW, Koh HJ, Seo HS, Paek NC. 2007.

Quantitative trait loci associated with functional stay-green SNU-SG1 in rice. Molecules and

Cells24: 83-94.

Zhou J, Jiao F, Wu Z, Li Y, Wang X, He X, Zhong W, Wu P. 2008,OsPHR2 is involved in phosphate-

starvation signaling and excessive phosphate accumulation in shoots of plants. Plant

Physiol146: 1673–1686.

111

Plant Genome Databases and Comparison of Genome Sequences

Dr. J. Ramalingam, Professor

Dr. S. Subashini, Senior Research Fellow

Department of Plant Molecular Biology and Bioinformatics, CPMB&B,

Tamil Nadu Agricultural University, Coimbatore.

[email protected]

Aim:

To become familiar with a few plant genome databases and to compare a gene of interest across those plants.

Description:

Genome Database- Rice

Database for Rice: Oryzabase

Oryzabase is an integrated database of rice genome resources. It consists of the following parts:

(1) Strain stock information

(2) Mutants information

(3) Chromosome maps

(4) Gene dictionary

(5) Reference list

(6) Fundamental knowledge of rice science

(7) Featured links

Procedure:

1. Open Oryzabase using the URL http://www.nig.ac.jp/labs/PlantGen/english/home-e.html

2. Search for your gene of interest

3. Collect the information about its position, transcript info, exon info or peptide info.

Gramene

Gramene is a curated, open-source, data resource for comparative genome analysis in the

grasses. Its goal is to facilitate the study of cross-species homology relationships using

information derived from public projects involved in genomic and EST sequencing, protein

structure and function analysis, genetic and physical mapping, interpretation of biochemical

pathways, gene and QTL localization and descriptions of phenotypic characters and mutations.

Gramene release 36 contains the following information:

1. Genomes

2. Maps and Markers

3. Proteins

4. Ontologies

5. Genes

6. QTL

7. Pathways

8. Diversity

9. Germplasm

10. Literature

11. Website.

Procedure:

1. Open Gramene database using the URL http://www.gramene.org/

2. Select the Genome as your search option.

3. Select any one species of Oryza.

4. Select the Chromosome and the location of the gene you are looking for.

5. In the resulting page click on the Locus and get the information about that particular

locus.

6. Then click on the Transcript info, Exon info or the peptide info of your interest.

Genome Database- Maize

Database for Maize: MaizeGDB

MaizeGDB (Maize Genetics and Genomics Database) is a community-oriented, long-term,

federally funded informatics service to researchers focused on the crop plant and model

organism Zea mays. Genetic, genomic, sequence, gene product, functional characterization,

literature reference, and person/organization contact information are among the data types

accessible through this database.

Procedure

1. Open MaizeGDB using the URL: http://www.maizegdb.org/

2. Select the “All data” option, enter your query and click “Go” to retrieve all matching information in the MaizeGDB database.

3. Analyze the phenotypic characters and the genes responsible for a particular trait.

Genome Database- Arabidopsis

Database for Arabidopsis: TAIR

The Arabidopsis Information Resource (TAIR) maintains a database of genetic and

molecular biology data for the model higher plant Arabidopsis thaliana. Data available from

TAIR includes the complete genome sequence along with gene structure, gene product

information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and

physical markers, publications, and information about the Arabidopsis research community.

Gene product function data is updated every two weeks from the latest published research literature and community data submissions. TAIR also provides extensive links from its data pages to other Arabidopsis resources.

Procedure:

1. Open the Arabidopsis database, The Arabidopsis Information Resource using the URL http://www.arabidopsis.org/

2. From the Search option, select Gene.

3. Find the details of the gene of your interest.

Genome Database- Soybean

Database for Soybean: SoyBase

SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for

professionally curated genetics, genomics and related data resources for soybean. SoyBase

contains the most current genetic, physical and genomic sequence maps integrated with

qualitative and quantitative traits. SoyBase also contains the well-annotated ‘Williams 82’

genomic sequence and associated data mining tools. The genetic and sequence views of the

soybean chromosomes and the extensive data on traits and phenotypes are extensively

interlinked. This allows entry to the database using almost any kind of available information,

such as genetic map symbols, soybean gene names or phenotypic traits. SoyBase is the

repository for controlled vocabularies for soybean growth, development and trait terms, which

are also linked to the more general plant ontologies.

Procedure:

1. Open SoyBase, the soybean genetics and genomics database, using the URL http://soybase.org.

2. Search for the gene of interest using the SoyBase toolbox provided.

Prediction of genes in genomic sequences

Dr. L. Arul, Associate Professor

V. Srividhya, Senior Research Fellow

Department of Plant Molecular Biology and Bioinformatics, CPMB&B,

Tamil Nadu Agricultural University, Coimbatore

[email protected]

Aim

To computationally predict protein-coding regions in a given DNA sequence.

Objectives

1. To predict open reading frames (ORFs) in a prokaryotic example

2. To predict the possible start, stop and intron/exon structures in a eukaryotic example, and to evaluate the gene finding accuracy.

Brief methodology

Gene finding in prokaryotes: Gene finding in prokaryotes is largely restricted to the search for open reading frames (ORFs) in a DNA sequence. An ORF is a region of the genome that could potentially encode a protein. It begins with a start codon (typically ATG) and ends with one of the three termination codons (TAA, TAG or TGA) in the same reading frame. Depending on the starting point, there are six possible ways (three on the forward strand and three on the reverse strand) of translating any nucleotide sequence into an amino acid sequence according to the genetic code. These are called reading frames. ORFs are usually encountered when sifting through pieces of DNA while trying to locate a gene. Because some prokaryotes use altered genetic codes, the ORFs identified depend on the genetic code chosen. A typical ORF finder therefore employs algorithms based on the existing genetic codes and examines all possible reading frames.
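The six reading frames mentioned above can be listed with a short script. The following is a minimal Python sketch and not part of the original exercise: it assumes the standard genetic code and plain A/C/G/T input, and the function names (`translate`, `six_frame_translation`) are illustrative only.

```python
from itertools import product

# Standard genetic code, built in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse strand read in the 5' to 3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return the translations of frames +1, +2, +3, -1, -2 and -3."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    # Short made-up sequence, used only to show the six frames.
    for name, protein in six_frame_translation("ATGAAATAG").items():
        print(name, protein)
```

An organism that uses an altered genetic code would need a different `AMINO` string. This is why ORF finders usually ask the user to select a genetic code.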

Gene finding in eukaryotes: The gene structure and the gene expression mechanism in

eukaryotes are far more complicated than in prokaryotes. In typical eukaryotes, the region of

the DNA coding for a protein is usually not continuous. This region is composed of alternating

stretches of exons and introns. During transcription, both exons and introns are transcribed

into the RNA, in their linear order. Thereafter, a process called splicing takes place, in which

the intron sequences are excised and discarded from the RNA sequence. The remaining RNA

segments, the ones corresponding to the exons, are ligated to form the mature mRNA strand.
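The splicing step can be pictured as simple string manipulation. The sketch below is only an illustration under stated assumptions: exon coordinates are 1-based and inclusive, the pre-mRNA sequence and coordinates are made up, and the function names are hypothetical.

```python
def introns_from_exons(exons):
    """Return the intron intervals lying between consecutive exons."""
    exons = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def splice(pre_mrna, exons):
    """Join the exon segments in linear order, discarding the introns."""
    return "".join(pre_mrna[start - 1:end] for start, end in sorted(exons))


# Made-up transcript: exon 1 (1-6), intron (7-17, GU...AG), exon 2 (18-23).
PRE_MRNA = "AUGGCC" + "GUAAGUUUCAG" + "GAAUAA"
EXONS = [(1, 6), (18, 23)]

if __name__ == "__main__":
    print("introns:", introns_from_exons(EXONS))
    print("mature mRNA:", splice(PRE_MRNA, EXONS))
```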

Gene finding typically refers to the area of computational biology that is concerned with algorithmically identifying features, usually in genomic DNA. These features include protein-coding genes as well as other functional elements such as RNA genes and regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced. Three types of information are used in the gene prediction process: i) signals in the sequence, such as splice sites; ii) content statistics, such as codon bias; and iii) similarity to known genes. The first two types have been used since the early days of gene prediction, whereas similarity information has been used routinely only in recent years. One of the reasons that the accuracy of gene-prediction programs has improved in the last few years is the enormous increase in the number of examples of known coding sequences. Following gene prediction, determining the function of the gene(s) is vital. This is done primarily through experimentation, although frontiers of bioinformatics research are making it increasingly possible to predict the function of a gene based on its sequence alone.
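Content statistics such as codon bias can be illustrated by counting how often each codon occurs in a known coding sequence. The sketch below is a simplified illustration and not the method of any particular gene finder; the coding sequence is made up and the function name is hypothetical.

```python
from collections import Counter


def codon_usage(cds):
    """Return the relative frequency of each codon in a coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # Made-up CDS: alanine is encoded twice by GCT and once by GCC.
    for codon, freq in sorted(codon_usage("ATGGCTGCTGCCTAA").items()):
        print(codon, round(freq, 2))
```

A gene finder can compare such frequencies, collected from many known genes, with those of a candidate region to judge whether the region looks like coding sequence.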

Procedure

Step 1: Manually determine whether an ORF is present in the sequence below. If one is present, find the reading frame in which it falls.

5’ – TGCTTGTCTGATGTGCTGGCTGACGTGATGACATTTGTAGTTGATAG – 3’
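A manual answer to Step 1 can be cross-checked with a small script. The following Python sketch makes several simplifying assumptions: only ATG is treated as a start codon, only complete ORFs (ATG followed by an in-frame stop codon) are reported, and coordinates are 1-based on the strand being read. The function name `find_orfs` is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse strand read in the 5' to 3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq):
    """Scan all six frames; return (strand, frame, start, end, sequence)."""
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    orfs.append((strand, frame + 1, start + 1, i + 3,
                                 s[start:i + 3]))
                    start = None  # look for a further ORF downstream
    return orfs


# The Step 1 sequence from this exercise.
STEP1_SEQ = "TGCTTGTCTGATGTGCTGGCTGACGTGATGACATTTGTAGTTGATAG"

if __name__ == "__main__":
    for orf in find_orfs(STEP1_SEQ):
        print(orf)
```

Web tools such as the ORF finder used in Step 2 offer further options (alternative start codons, minimum length, alternative genetic codes), so their output may differ from this sketch.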

Step 2: Go to the following URL: www.ncbi.nlm.nih.gov/gorf/gorf.html. Identify the ORFs in the given DNA sequences (Test sequence 2 and Test sequence 3).

Step 3: Predict genes in “test sequence 3” using Rice HMM (http://rgp.dna.affrc.go.jp/RiceHMM/). Check whether your prediction matches the experimentally known gene structure of this sequence (GenBank AF323175).

Step 4: Predict genes in “test sequence 4” using “Rice HMM”. Note the deviations between the prediction and the experimentally published results (AB050986). Examine your results in the light of the following questions: How many exons are predicted? Where do the predicted exons start and end? If there is a difference, where does the difference lie?
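The comparison between predicted and experimentally known exons can also be expressed in numbers. The sketch below computes nucleotide-level sensitivity and specificity as these terms are commonly used in the gene-finding literature (specificity here is the fraction of predicted coding bases that are truly coding). The exon coordinates are hypothetical and are not taken from AF323175 or AB050986.

```python
def covered_positions(exons):
    """Return the set of base positions covered by 1-based inclusive exons."""
    return {pos for start, end in exons for pos in range(start, end + 1)}


def compare_exons(predicted, annotated):
    """Summarise the agreement between predicted and annotated exons."""
    pred = covered_positions(predicted)
    true = covered_positions(annotated)
    shared = len(pred & true)
    return {
        # fraction of annotated coding bases that were predicted
        "sensitivity": shared / len(true) if true else 0.0,
        # fraction of predicted coding bases that are annotated
        "specificity": shared / len(pred) if pred else 0.0,
        "exact_exons": sorted(set(predicted) & set(annotated)),
        "missed_exons": sorted(set(annotated) - set(predicted)),
        "wrong_exons": sorted(set(predicted) - set(annotated)),
    }


# Hypothetical coordinates, for illustration only.
ANNOTATED = [(101, 200), (301, 400)]
PREDICTED = [(101, 200), (311, 400), (501, 520)]

if __name__ == "__main__":
    for key, value in compare_exons(PREDICTED, ANNOTATED).items():
        print(key, value)
```

An exon whose boundaries are shifted by even one base is listed under both `missed_exons` and `wrong_exons`, which helps in answering where the difference lies.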

Inference

Discuss your results with respect to the success in predicting genes given the prokaryotic and

eukaryotic examples.

Comparative Genomics

Mrs. N. Bharathi, Assistant Professor, Department of Plant Molecular Biology and Bioinformatics, CPMB&B,


Introduction

The rapidly emerging field of comparative genomics has yielded dramatic results. Comparative genome analysis has become feasible with the availability of a number of completely sequenced genomes. Comparison of complete genomes between organisms allows for global views on genome evolution, and the availability of many completely sequenced genomes increases the predictive power in deciphering the hidden information in genome design, function and evolution. The rapid progress in genome sequencing demands more comparative analysis to gain new insights into evolutionary, biochemical, genetic, metabolic, and physiological pathways. Comparative genomics is the direct comparison of the complete genetic material of one organism against that of another to gain a better understanding of how species evolved and to determine the function of genes and noncoding regions in genomes. It includes a comparison of gene number, gene content and gene location, the length and number of coding regions (called exons) within genes, the amount of noncoding DNA in each genome, and conserved regions maintained in both prokaryotic and eukaryotic groups of organisms. Comparative genomics can not only trace the evolutionary relationships between organisms but also reveal differences and similarities within and between species.

Applications:

Gene identification

Once genome correspondence is established, comparative genomics can aid gene

identification. Comparative genomics can recognize real genes based on their patterns of

nucleotide conservation across evolutionary time. With the availability of genome-wide

alignments across the genomes compared, the different ways by which sequences change in

known genes and in intergenic regions can be analyzed. The alignments of known genes will

reveal the conservation of the reading frame of protein translation (Fitch et al., 1970).

Regulatory motif discovery

Regulatory motifs are short DNA sequences, about 6 to 15 bp long, that are used to

control the expression of genes, dictating the conditions under which a gene will be turned on

or off. Each motif is typically recognized by a specific DNA-binding protein called a transcription

factor (TF). A transcription factor binds precise sites in the promoter region of target genes in a

sequence-specific way, but this contact can tolerate some degree of sequence variation.

Thus, different binding sites may contain slight variations of the same underlying motif,

and the definition of a regulatory motif should capture these variations while remaining as

specific as possible. Comparative genomics provides a powerful way to distinguish regulatory

motifs from non-functional patterns based on their conservation. One such example is the identification of TF DNA-binding motifs using comparative genomics and de novo motif discovery (Fitch et al., 1995).
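The idea that binding sites are slight variations of one underlying motif can be sketched in a few lines of Python. The example below derives a consensus from a handful of aligned sites and then scans a sequence while tolerating a limited number of mismatches. The binding sites and the promoter fragment are made up for illustration, and real motif-discovery tools use more elaborate statistical models.

```python
from collections import Counter


def position_counts(sites):
    """Count the bases seen at each position of equal-length binding sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all binding sites must have the same length")
    return [Counter(site[i] for site in sites) for i in range(length)]


def consensus(sites):
    """Return the most frequent base at each position."""
    return "".join(col.most_common(1)[0][0] for col in position_counts(sites))


def scan(sequence, motif, max_mismatches=1):
    """Return (1-based position, window, mismatches) for near matches."""
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
    return hits


# Made-up binding sites and promoter fragment, for illustration only.
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
PROMOTER = "AAATGAGTCAGG"

if __name__ == "__main__":
    motif = consensus(SITES)
    print("consensus:", motif)
    print("hits:", scan(PROMOTER, motif, max_mismatches=1))
```

Raising `max_mismatches` captures more variant sites at the cost of specificity, which is the trade-off described in the paragraph above.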

Other applications:

Comparative genomics has wide applications in the fields of molecular medicine and molecular evolution. The most significant application of comparative genomics in molecular medicine is the identification of drug targets for many infectious diseases. For example, comparative analyses of fungal genomes have led to the identification of many putative targets for novel antifungals (Batzoglou et al., 2000). This discovery can aid target-based drug design to cure fungal diseases in humans. Comparative analysis of the genomes of individuals with a genetic disease against those of healthy individuals may reveal clues for eliminating that disease. Comparative genomics also helps in selecting model organisms. A model system is a simple, idealized system that is accessible and easily manipulated. For example, a comparison of the fruit fly genome with the human genome revealed that about 60 percent of genes are conserved between fly and human.

Tools for Comparative Genomics

1. GLIMMER - GLIMMER (Gene Locator and Interpolated Markov ModelER) is a system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. It uses interpolated Markov models to identify coding regions.

2. CMR - The Comprehensive Microbial Resource (CMR) gives access to a central repository of the sequence and annotation of all complete public prokaryotic genomes as well as comparative genomics tools across all of the genomes in the database.

3. CisMols - CisMols (Cis-regulatory Modules) is a tool that identifies compositionally predicted cis-clusters that occur in groups of co-regulated genes within each of their ortholog-pair evolutionarily conserved cis-regulatory regions.

4. DAVID Bioinformatics Resources - The Database for Annotation, Visualization and Integrated Discovery (DAVID) provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes.

5. DCODE.ORG - The dcode.org website provides access to tools for comparative genomic analyses developed by the Comparative Genomics Center at the Lawrence Livermore National Laboratory. Tools include zPicture, Mulan, eShadow, rVista, CREME and the ECR Browser.

6. EnteriX - EnteriX is a collection of tools for viewing pairwise and multiple alignments for bacterial genome sequences.

7. Mauve - Mauve is a stand-alone software tool for constructing multiple genome alignments.

8. MicroFootPrinter - MicroFootPrinter identifies the conserved motifs in regulatory regions of prokaryotic genomes using the phylogenetic footprinting program FootPrinter.

9. Viral Bioinformatics - Viral Bioinformatics provides access to viral genomes and a variety of tools for comparative genome analyses.

10. TIGR Software Tools - A list of open-source software packages available for free from

The Institute for Genomic Research (TIGR).

11. TraFaC - TraFaC (Transcription Factor Binding Site Comparison) is a tool that identifies regulatory regions using a comparative sequence analysis approach.

References:

Fitch, W. M., et al., 1970 Syst Zool. 19:99 [PMID: 5449325].

Fitch, W. M., et al., 1995 Philos Trans R Soc Lond B Biol Sci. 349:93 [PMID: 8748022].

Batzoglou, S., et al., 2000 Genome Res. 10:950 [PMID: 10899144].

Mao, L., & Zheng, W. J., 2006 BMC Bioinformatics. 7:s21 [PMID: 17217514].

Sivashankari, S., et al., 2007 Comparative genomics - A perspective. Bioinformation. 1(9): 376-378.

Microarray-based gene expression analysis

M. Mugilisai, IV yr. B. Tech (Bioinformatics)

S. Saravanakumar, Senior Research Fellow

Dr. L. Arul, Associate Professor

Department of Plant Molecular Biology and Bioinformatics, CPMB&B,

Tamil Nadu Agricultural University, Coimbatore [email protected]

Aim

To identify gene(s) that are differentially expressed under varied experimental conditions in a microarray dataset.

Objectives

1. To download microarray data from Gene Expression Omnibus (GEO) database

2. To extract expression values for a select set of genes of choice from the microarray data

file using array id/element id of the gene(s)

3. To visualize the differential expression pattern using MeV

Brief methodology

The key step in differential expression analysis is clustering, which groups similar samples or genes together. Hierarchical clustering (HCL) is a commonly used clustering algorithm. A hierarchical clustering of samples is a tree representation of their relative similarity in the expression profiles of the features over a set of samples or groups. The hierarchical clustering algorithm requires a distance measure and a cluster linkage. The distance metric specifies how the distance between two expression profiles, and hence between clusters, is measured. The default distance metric used for HCL is Pearson correlation, the most common measure of correlation in statistics, which reflects the linear relationship between two variables. Correlation between variables is a measure of how well the variables are related. The cluster linkage parameter specifies how cluster-to-cluster distances are computed when constructing the hierarchical tree. In single linkage, the distances are measured between each member of one cluster and each member of the other cluster, and the smallest of these distances is taken as the distance between the two clusters.
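The two ingredients named above, a Pearson correlation distance and single linkage, can be tried out on a toy dataset. The following is a minimal pure-Python sketch and is not MeV's implementation: it takes the distance as 1 minus the Pearson correlation, the expression values are made up, and the function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def pearson_distance(x, y):
    """0 for perfectly correlated profiles, 2 for perfectly anti-correlated."""
    return 1.0 - pearson(x, y)


def single_linkage(profiles):
    """Agglomerative clustering; returns a list of (cluster, cluster, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: smallest member-to-member distance
                d = min(pearson_distance(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Made-up expression profiles across four samples.
PROFILES = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

if __name__ == "__main__":
    for left, right, dist in single_linkage(PROFILES):
        print(left, right, round(dist, 4))
```

The merge order corresponds to the tree drawn by a hierarchical clustering tool: the profiles joined first are the most similar.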

MultiExperiment Viewer (MeV), an open-source tool, is used for this purpose. MeV is a desktop application meant for the analysis, visualization and data mining of large-scale genomic data. It is a versatile microarray tool for clustering, visualization, classification, statistical analysis and biological theme discovery. MeV can be freely downloaded from http://www.tm4.org/mev/.

Procedure

Step 1: Open the GEO database (URL: http://www.ncbi.nlm.nih.gov/geo/) and enter the
GSE ID (GSE6901) in the “GEO accession” field.


Step 2: Download the “Series Matrix File”, which contains the microarray experimental data.
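For scripted downloads, the location of the Series Matrix File can usually be derived from the accession. The sketch below assumes the directory layout commonly seen on the GEO FTP site, where series are grouped into folders such as GSE6nnn, and assumes a single-platform series; series with several platforms use different file names. The function name is hypothetical, and the resulting URL should be checked in a browser before relying on it.

```python
def series_matrix_url(accession):
    """Build the assumed GEO FTP URL of a single-platform Series Matrix File."""
    if not accession.startswith("GSE") or not accession[3:].isdigit():
        raise ValueError(f"not a GSE accession: {accession!r}")
    digits = accession[3:]
    # Folder stub: the last three digits are replaced by "nnn" (GSE6901 -> GSE6nnn).
    stub = "GSE" + digits[:-3] + "nnn"
    return (
        "https://ftp.ncbi.nlm.nih.gov/geo/series/"
        f"{stub}/{accession}/matrix/{accession}_series_matrix.txt.gz"
    )


print(series_matrix_url("GSE6901"))
```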

Step 3: To extract the expression values of a few selected genes from the complete microarray
data file, search the file for the array/element IDs of those genes. The array/element IDs must
first be looked up. Since the present exercise deals with the expression of rice genes, use the
Rice Oligonucleotide Array Database (URL: http://www.ricearray.org) to find them.

Click “Element Search” > “Single Element Search” and enter the GenBank ID (e.g., Os01g0818400)
of each gene of interest (these IDs must be known beforehand).

Example GenBank IDs: Os01g0818400, Os01g0818600, Os01g0818800, Os01g0818900,
Os01g0819000 and Os06g0681400 (ubiquitin, an internal control gene)


The following are the corresponding element IDs for the gene IDs listed above:

Os.24893.1.S1_a_at, Os.34757.1.S1_at, Os.54860.1.S1_at, Os.51620.1.S1_at, Os.25065.3.S1_at
and Os_Ubiquitin_M_f_at

Note: Element IDs are specific to each microarray platform (e.g., Agilent, Affymetrix,
NSF).

Step 4: Extract the microarray expression data for these element IDs by copying the values into
a new file. Use the “Find” option to search the file for each element ID, as shown below.
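The manual search-and-copy in Steps 3 and 4 can also be scripted. The sketch below assumes the usual layout of a GEO Series Matrix File: tab-delimited text in which the data table sits between the lines `!series_matrix_table_begin` and `!series_matrix_table_end`, with `ID_REF` as the first column. The function name, the GSM column names and the expression values in the demo are hypothetical.

```python
def extract_rows(lines, element_ids):
    """Return (header, rows) for the requested element IDs.

    `lines` is any iterable of text lines, for example an open file.
    Values are kept as strings, exactly as they appear in the file.
    """
    wanted = set(element_ids)
    header, rows, in_table = None, {}, False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
            continue
        if line.startswith("!series_matrix_table_end"):
            break
        if not in_table or not line:
            continue  # skip metadata lines and blank lines
        fields = [field.strip('"') for field in line.split("\t")]
        if header is None:
            header = fields  # first table line: ID_REF plus sample IDs
        elif fields[0] in wanted:
            rows[fields[0]] = fields[1:]
    return header, rows


# Hypothetical miniature file; real files hold thousands of rows.
demo = [
    '!Series_title\t"demo"',
    "!series_matrix_table_begin",
    '"ID_REF"\t"GSM000001"\t"GSM000002"',
    '"Os.24893.1.S1_a_at"\t7.1\t9.8',
    '"Os.00000.1.S1_at"\t5.0\t5.1',
    '"Os_Ubiquitin_M_f_at"\t12.0\t12.1',
    "!series_matrix_table_end",
]
header, rows = extract_rows(demo, ["Os.24893.1.S1_a_at", "Os_Ubiquitin_M_f_at"])
print(header)
print(rows)
```

With a downloaded file, the same function can be called as `extract_rows(open(path, encoding="utf-8"), ids)`, where `path` points to the unzipped Series Matrix File.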


Step 5: Instead of keeping the GSM IDs as column names, rename the columns with experimental
descriptors for easy interpretation (e.g., Drought stress or Control).

Step 6: Save the new file carrying the extracted data in text format (*.txt, Text tab
delimited). Example file name: GSE 6901 data.txt
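Steps 5 and 6 can be sketched as follows: the sample columns are renamed and the table is written as tab-delimited text. The GSM IDs, the mapping from sample to descriptor and the values are hypothetical placeholders; the real mapping must be read from the sample descriptions of the GEO series.

```python
import csv
import io


def write_renamed_table(header, rows, labels, out):
    """Write rows as tab-delimited text, replacing sample IDs with descriptors.

    header: column names, starting with the ID column.
    rows:   dict mapping element ID to a list of values.
    labels: dict mapping sample ID to descriptor; unmapped IDs are kept.
    out:    any writable text stream (a file or io.StringIO).
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([header[0]] + [labels.get(col, col) for col in header[1:]])
    for element_id, values in rows.items():
        writer.writerow([element_id] + list(values))


# Hypothetical extracted data and sample descriptors.
header = ["ID_REF", "GSM000001", "GSM000002"]
rows = {"Os.24893.1.S1_a_at": ["7.1", "9.8"]}
labels = {"GSM000001": "Control", "GSM000002": "Drought stress"}

buffer = io.StringIO()
write_renamed_table(header, rows, labels, buffer)
print(buffer.getvalue())
```

To produce the file itself, pass an open file instead of the buffer, for example `open("GSE 6901 data.txt", "w", newline="")`.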


Step 7: Download the MeV tool from the URL: www.tm4.org/mev. After downloading MeV,
open the folder and click the Windows batch file to run MeV.

Step 8: Working in MeV

In MeV, go to File > Load Data. In the expression file loader, click the Browse tab and
choose your file. Example: GSE 6901 data.txt


A diagram will pop up with many small green and red boxes, as shown below. This is the heat
map view. Each row represents a specific gene, and each column represents a sample
or experiment. The brightest green boxes mark the most “under-expressed” genes, and the
brightest red boxes mark the most “over-expressed” genes. This is called the result tree.

Step 9: For Hierarchical Clustering (HCL), click the “Clustering” drop-down menu at the top left
and choose the first option, Hierarchical Clustering.


Step 10: Choose “Pearson correlation” as the distance metric and “Single linkage” as the linkage
method to obtain the hierarchical clustering output. To view the clusters, double-click “HCL” in
the navigation tree on the left side of the window.

Inference

Analyze the heat map generated for up/down (differential) regulation of genes under the

given experimental conditions.
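The visual reading of the heat map can be cross-checked numerically with a log2 fold change between a treated and a control sample. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the expression values are made up, and the two-fold cut-off (a log2 fold change of 1) is a common convention rather than something prescribed by this exercise. If the downloaded values are already log2-transformed, set `already_log2=True` so that the values are subtracted instead of divided.

```python
from math import log2


def log2_fold_change(treated, control, already_log2=False):
    """log2(treated / control); a plain difference if values are already log2."""
    if already_log2:
        return treated - control
    return log2(treated / control)


def classify(lfc, threshold=1.0):
    """Label a gene by its log2 fold change (assumed cut-off: two-fold)."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical linear-scale intensities: (drought stress, control).
examples = {
    "gene_1": (8.0, 2.0),
    "gene_2": (1.0, 4.0),
    "gene_3": (5.0, 5.0),
}
for name, (treated, control) in examples.items():
    lfc = log2_fold_change(treated, control)
    print(name, f"{lfc:+.2f}", classify(lfc))
```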

Note

Gene expression under specific conditions can also be analyzed by using .soft files from

GEO database.
