Molecular Genetics Genome Sequencing

Page 1: Genome sequencing

Molecular Genetics

Genome Sequencing

Page 2: Genome sequencing

At a glance

• What is a genome
• Types of genomes
• What is genomics
• How is genomics different from genetics
• Types of genomics
• Genome sequencing
• Milestones in genomic sequencing
• Technical foundations of genomics
• Steps of genome sequencing
• DNA sequencing approaches
• Hierarchical shotgun sequencing
• Markers used in mapping large genomes
• Whole genome shotgun sequencing
• New technologies
• Genome sequencing achievement in Bangladesh
• Benefits of genome research

Page 3: Genome sequencing

WHAT IS A GENOME?

Genome: One complete set of genetic information (the total amount of DNA) from a haploid set of chromosomes of a single cell in eukaryotes, in the single chromosome of bacteria, or in the DNA or RNA of viruses.

The basic set of chromosomes in an organism.

“The whole hereditary information of an organism that is encoded in the DNA”

• In cytogenetics, genome means a single set of chromosomes.

• It is denoted by x. The number of genomes in a cell depends on the ploidy level of the organism.

• In Drosophila melanogaster (2n = 2x = 8), the genome is x = 4.
• In hexaploid Triticum aestivum (2n = 6x = 42), the genome is x = 7.
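The arithmetic behind the two examples can be checked with a few lines of Python. This is only an illustrative sketch; the function name is made up for this note and is not part of any library.

```python
def basic_chromosome_number(somatic_count: int, ploidy: int) -> int:
    """Return x, the number of chromosomes in one genome (one basic set)."""
    if ploidy <= 0 or somatic_count % ploidy != 0:
        raise ValueError("somatic count must be a positive multiple of the ploidy level")
    return somatic_count // ploidy

# Drosophila melanogaster: 2n = 2x = 8
fly_x = basic_chromosome_number(8, 2)
# Hexaploid Triticum aestivum: 2n = 6x = 42
wheat_x = basic_chromosome_number(42, 6)
print(fly_x, wheat_x)
```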

Page 4: Genome sequencing

The genome is found inside every cell; in cells that have a nucleus, it is located inside the nucleus. Organelles such as mitochondria and chloroplasts carry their own genomes, each consisting of all the DNA in that organelle.

The term genome was introduced by H. Winkler in 1920 to denote the complete set of chromosomal and extrachromosomal genes present in an organism, including a virus.

Page 5: Genome sequencing

How many types of genomes are there?

1. Prokaryotic genomes
2. Eukaryotic genomes

• Nuclear genomes
• Mitochondrial genomes
• Chloroplast genomes

If not specified, “genome” usually refers to the nuclear genome.

WHAT IS GENOMICS?

• Genomics is the study of the structure and function of whole genomes.

• Genomics is the comprehensive study of whole sets of genes and their interactions rather than single genes or proteins.

• According to T.H. Roderick, genomics is mapping and sequencing to analyze the structure and organization of the genome.

Page 6: Genome sequencing

Origin of terminology

• The term genome was used by the German botanist Hans Winkler in 1920
• Collection of genes in a haploid set of chromosomes
• Now it encompasses all DNA in a cell

Genomics is the subdiscipline of molecular genetics devoted to the

The field includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome.

Page 7: Genome sequencing

The sequence information of the genome will show:

• the position of every gene along the chromosome,
• the regulatory regions that flank each gene, and
• the coding sequence that determines the protein produced by each gene.

How is Genomics different from Genetics?

Genetics is the study of inheritance; genomics is the study of genomes.
– Genetics looks at single genes, one at a time, like a picture or snapshot.
– Genomics looks at the big picture and examines all the genes as an entire system.

Page 8: Genome sequencing

Types of Genomics

1. Structural: It deals with the determination of the complete sequence of genomes and of the gene map.

This has progressed in steps as follows:

(i) construction of high-resolution genetic and physical maps,

(ii) sequencing of the genome, and

(iii) determination of the complete set of proteins in an organism.

2. Functional: It refers to the study of the functioning of genes, their regulation and their products (metabolic pathways), i.e., the gene expression patterns in an organism.

3. Comparative: It compares genes from different genomes to elucidate functional and evolutionary relationships.

Page 9: Genome sequencing

Genome Sequencing

Genome sequencing is the technique that allows researchers to read the genetic information found in the DNA of anything from bacteria to plants to animals. Sequencing involves determining the order of bases, the nucleotide subunits adenine (A), guanine (G), cytosine (C) and thymine (T), found in DNA.

In short, genome sequencing is figuring out the order of DNA nucleotides.

Challenges of genome sequencing

• Data are produced in the form of short reads, which have to be assembled correctly into large contigs and chromosomes.

• The short reads may contain low-quality bases and vector/adaptor contamination.

• Several genome assemblers are available, but their performance has to be compared to find the best one.
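The second challenge (low-quality bases and adaptor contamination) is usually handled by cleaning each read before assembly. The sketch below is a minimal illustration, not the method of any particular tool: the adaptor string, the quality values and the threshold of 20 are all hypothetical choices made for this example.

```python
def trim_read(seq: str, quals: list[int], adapter: str, min_q: int = 20) -> str:
    """Remove a 3' adaptor, then trim low-quality bases from the 3' end."""
    # Cut the read at the first occurrence of the adaptor, if present.
    cut = seq.find(adapter)
    if cut != -1:
        seq, quals = seq[:cut], quals[:cut]
    # Drop trailing bases whose quality score is below the threshold.
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]

# Hypothetical read: 14 genomic bases followed by a 7-base adaptor.
read = "ACGTACGTTTAGGCAGATCGG"
quals = [38] * 10 + [15, 10, 9, 5] + [30] * 7
clean = trim_read(read, quals, adapter="AGATCGG")
print(clean)
```

Only the first ten bases survive here: the adaptor is removed first, and the four low-scoring bases in front of it are trimmed afterwards.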

Page 10: Genome sequencing

Milestones in Genomic SequencingMilestones in Genomic Sequencing

1977; Fred Sanger; φX174 bacteriophage (first sequenced genome); 5,375 bp
• Amino acid sequence of phage proteins
• Overlapping genes only in viruses

Fig: The genetic map of phage φX174 (Overlapping reading frames)

Page 11: Genome sequencing

1995; Craig Venter & Hamilton Smith; Haemophilus influenzae (first free-living organism), 1,830,137 bp; Mycoplasma genitalium (smallest free-living organism), 580,000 bp, 470 genes

1996; Saccharomyces cerevisiae (first eukaryote); 12,068,000 bp

1997; Escherichia coli; 4,639,221 bp; a genetically important model organism

1999; Human chromosome 22; 53,000,000 bp

2000; Drosophila melanogaster; 180,000,000 bp

2001; Human; working draft; 3,200,000,000 bp

2002; Plasmodium falciparum; 23,000,000 bp

2002; Anopheles gambiae; 278,000,000 bp

2002; Mus musculus; 2,500,000,000 bp

2003; Human; finished sequence; 3,200,000,000 bp

2005; Oryza sativa (first cereal grain); 489,000,000 bp

2006; Populus trichocarpa (first tree); 485,000,000 bp

Page 12: Genome sequencing

Technical foundations of genomics

Molecular biology: Almost all of the underlying techniques of genomics originated with recombinant-DNA technology.

DNA sequencing: In particular, almost all DNA sequencing is still performed using the approach pioneered by Sanger.

Library construction: Also essential to high-throughput sequencing is the ability to generate libraries of genomic clones and then cut portions of these clones and introduce them into other vectors.

PCR amplification: The use of the polymerase chain reaction (PCR) to amplify DNA, developed in the 1980s, is another technique at the core of genomics approaches.

Fig: Log MW plotted against distance

Hybridization techniques: Finally, the hybridization of one nucleic acid to another is used to detect and quantitate DNA and RNA (for example, in Southern blotting). This method remains the basis for genomics techniques such as microarrays.

Page 13: Genome sequencing

Steps of genome sequencing

1. Break the genome into smaller fragments
2. Sequence those smaller pieces
3. Piece the sequences of the short fragments together

DNA sequencing approaches

Two different methods are used:

1. Hierarchical shotgun sequencing
- Useful for sequencing genomes of higher vertebrates that contain repetitive sequences

2. Whole genome shotgun sequencing
- Useful for smaller genomes
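The three steps on this slide (break, sequence, piece together) can be illustrated with a toy example. The code below is a deliberately naive greedy overlap merge written for this note; real assemblers use far more robust algorithms, and the 24-base "genome", the read positions and the minimum overlap of 3 are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)] + [merged]

genome = "ATGCGTACGTTAGCCTAGGATCCA"
# Steps 1 and 2: three overlapping 10-base "reads", listed out of order.
reads = [genome[14:24], genome[0:10], genome[7:17]]
# Step 3: piece the reads back together.
contigs = greedy_assemble(reads)
print(contigs)
```

With repetitive sequence the longest overlap is not always the correct one, which is one reason the slides recommend a map-based approach for large, repeat-rich genomes.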

Page 14: Genome sequencing

Hierarchical Shotgun Sequencing

• The method preferred by the Human Genome Project is the hierarchical shotgun sequencing method.

• Also known as
– the clone-by-clone strategy
– the map-based method
– map first, sequence later
– top-down sequencing

• The Human Genome Project adopted a map-based strategy
– Start with a well-defined physical map
– Produce the shortest tiling path for large-insert clones
– Assemble the sequence for each clone
– Then assemble the entire sequence, based on the physical map

Page 15: Genome sequencing

In the Clone-by-Clone Strategy

1) Markers for regions of the genome are identified.

2) The genome is split with restriction (cutting) enzymes into large fragments (50-200 kb), each of which contains a known marker.

3) These fragments are cloned in bacteria (E. coli) using BACs (Bacterial Artificial Chromosomes), where they are replicated and stored.

4) The BAC inserts are isolated and the whole genome is mapped by finding markers regularly spaced along each chromosome to determine the order of the cloned fragments.

5) The fragments contained in these clones have different ends, so with enough coverage a scaffold of overlapping BAC clones (contigs) can be found. The set of BACs in a contig that covers the entire genomic area of interest is called the tiling path.

6) Each BAC fragment in the tiling path (the "Golden Path") is fragmented randomly into smaller pieces, and these pieces are individually sequenced on both strands using automated Sanger sequencing.

7) These sequences are aligned so that identical sequences overlap. Assembly of the genome is done on the basis of prior knowledge of the markers used to localize sequenced fragments to their genomic location. A computer stitches the sequences together using the markers as a reference guide.
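Choosing a tiling path from mapped clones (step 5) can be pictured as an interval-covering problem. The following is a minimal sketch under the assumption that every BAC already has start and end coordinates on the physical map; the clone names and positions (in kb) are invented for illustration.

```python
def minimal_tiling_path(clones: dict[str, tuple[int, int]], start: int, end: int) -> list[str]:
    """Pick the fewest clones whose mapped intervals cover the region [start, end)."""
    path, covered = [], start
    while covered < end:
        # Among clones spanning the current position, take the one reaching furthest right.
        candidates = [(e, name) for name, (s, e) in clones.items() if s <= covered < e]
        if not candidates:
            raise ValueError(f"gap in clone coverage at position {covered}")
        reach, name = max(candidates)
        path.append(name)
        covered = reach
    return path

# Hypothetical BAC clones with map positions in kb.
bacs = {
    "BAC_A": (0, 150),
    "BAC_B": (40, 190),
    "BAC_C": (120, 280),
    "BAC_D": (170, 300),
    "BAC_E": (260, 420),
}
path = minimal_tiling_path(bacs, 0, 420)
print(path)
```

BAC_B and BAC_D are redundant in this toy map, so only three of the five clones would need to be sequenced.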

Page 16: Genome sequencing

Fig: Hierarchical shotgun sequencing

In this approach, every part of the genome is actually sequenced roughly 4-5 times to ensure that no part of the genome is left out.
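How much a given redundancy protects against missing sequence can be estimated with a simple model. The sketch below assumes reads land uniformly at random (the Lander-Waterman idealization, which these slides do not mention), so that the expected unread fraction at depth c is e^(-c); the read counts are hypothetical and chosen to give 4x and 5x coverage of a 150 kb clone.

```python
import math

def coverage_depth(n_reads: int, read_len: int, target_len: int) -> float:
    """Average number of times each base of the target is sequenced."""
    return n_reads * read_len / target_len

def expected_uncovered_fraction(depth: float) -> float:
    """Fraction of bases expected to be missed if reads land uniformly at random."""
    return math.exp(-depth)

for n_reads in (1200, 1500):  # 500-base reads from a 150 kb clone
    depth = coverage_depth(n_reads, 500, 150_000)
    missed = expected_uncovered_fraction(depth)
    print(f"{depth:.0f}x coverage leaves about {missed:.2%} of the bases unread")
```

Under this model a small fraction is still missed at 4-5x, which is consistent with the later finishing step that fills the remaining gaps.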

Page 17: Genome sequencing

The clone-by-clone strategy was used for S. cerevisiae (yeast), C. elegans (nematode), Arabidopsis thaliana (mustard weed), Oryza sativa (rice), Drosophila melanogaster (fruit fly) and Homo sapiens (human), among others.

Each 150,000 bp fragment is inserted into a BAC (bacterial artificial chromosome). A BAC can replicate inside a bacterial cell. A set of BACs containing an entire human genome is called a BAC library.

Page 18: Genome sequencing

The Clone-by-Clone Strategy: Markers used in mapping large genomes

Different types of markers are used in mapping large genomes, such as:

A. Restriction Fragment Length Polymorphisms (RFLP) 

B. Variable Number of Tandem Repeats (VNTRs)

C. Sequence Tagged Sites (STS)

D. Microsatellites, etc. 

Page 19: Genome sequencing

A. Restriction Fragment Length Polymorphisms (RFLP) 

Polymorphism means that a genetic locus has different forms, or alleles.

Cutting the DNA from any two individuals with a restriction enzyme may yield fragments of different lengths, called restriction fragment length polymorphisms (RFLPs). The abbreviation is usually pronounced “rifflip”.

The pattern of RFLPs generated will depend mainly on:

– 1) The differences in the DNA of the selected strains or species
– 2) The restriction enzymes used
– 3) The DNA probe employed for Southern hybridization

Steps:

a. Consider the restriction enzyme HindIII, which recognizes the sequence AAGCTT.

b. Of two individuals, one contains three HindIII sites in a region of a chromosome, so cutting the DNA with HindIII yields two fragments, 2 and 4 kb long.

Page 20: Genome sequencing

Figure:  Detecting a RFLP

c. Another individual may lack the middle site but have the other two, so cutting the DNA with HindIII yields one fragment 6 kb long. This difference in fragment lengths is the RFLP.

Page 21: Genome sequencing

d. These restriction fragments of different lengths between the genotypes can be detected on Southern blots by the use of a suitable probe. An RFLP is detected as a differential movement of a band in the gel lanes from different species and strains. Each such band is regarded as a single RFLP locus. So any differences among the DNA of individuals are easy to see.

e. This RFLP is used as a marker in chromosomal mapping.
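The HindIII example in steps a to c can be reproduced in silico. This is a toy sketch: the two "chromosome regions" are synthetic strings built for illustration (GC filler cannot contain AAGCTT), the middle site of the second individual is destroyed by a hypothetical point mutation, and fragment length is measured simply as the distance between site start positions.

```python
HINDIII_SITE = "AAGCTT"

def site_positions(dna: str, site: str = HINDIII_SITE) -> list[int]:
    """Start index of every occurrence of the recognition site."""
    positions, i = [], dna.find(site)
    while i != -1:
        positions.append(i)
        i = dna.find(site, i + 1)
    return positions

def internal_fragments(dna: str, site: str = HINDIII_SITE) -> list[int]:
    """Lengths of the fragments lying between consecutive sites."""
    pos = site_positions(dna, site)
    return [b - a for a, b in zip(pos, pos[1:])]

def spacer(n: int) -> str:
    """Filler DNA of even length n that contains no A or T, hence no HindIII site."""
    return "GC" * (n // 2)

# Individual 1: three sites spaced 2 kb and 4 kb apart.
individual_1 = HINDIII_SITE + spacer(1994) + HINDIII_SITE + spacer(3994) + HINDIII_SITE
# Individual 2: the middle site is lost through a point mutation (AAGCTT -> AGGCTT).
individual_2 = HINDIII_SITE + spacer(1994) + "AGGCTT" + spacer(3994) + HINDIII_SITE

print(internal_fragments(individual_1))
print(internal_fragments(individual_2))
```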

Limitations

• Requires a relatively large amount of highly pure DNA
• Laborious and expensive to identify suitable marker and restriction enzyme combinations
• Time consuming
• Requires expertise in autoradiography because radioactively labeled probes are used

Page 22: Genome sequencing

B. Variable Number of Tandem Repeats (VNTRs)

The greater the degree of polymorphism of a marker, the more useful it is for mapping. Mapping with ordinary RFLPs becomes very tedious, so the more polymorphic variable number tandem repeats (VNTRs) are more useful.

Tandem repeats occur in DNA when a pattern of one or more nucleotides is repeated and the repetitions are directly adjacent to each other.

An example would be:

ATTCGCCAATC ATTCGCCAATC ATTCGCCAATC

in which the sequence ATTCGCCAATC is repeated three times.

• A variable number tandem repeat (VNTR) is a location in a genome where a short nucleotide sequence is organized as a tandem repeat.
• The repeated sequence is relatively long, about 10-100 base pairs.
• The full genetic profiles of individuals reveal many differences.
• Most human genes are the same from person to person, but VNTRs tend to differ among different people.

Page 23: Genome sequencing

• While the repeated sequences  themselves are usually the same from person to person, the number of times they are repeated tends to vary.

• VNTRs are highly polymorphic. They can be isolated from an individual’s DNA and are therefore relatively easy to map.

• However, VNTRs have a disadvantage as genetic markers: They tend to bunch together at the ends of chromosomes, leaving the interiors of the chromosomes relatively devoid of markers.
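Because individuals differ in the number of copies rather than in the repeat unit itself, a VNTR allele can be described by a single integer. The sketch below counts back-to-back copies of the ATTCGCCAATC unit used as the tandem-repeat example in these slides; the flanking sequences and the two "individuals" are hypothetical.

```python
def count_tandem_repeats(dna: str, unit: str) -> int:
    """Return the longest run of directly adjacent copies of unit found in dna."""
    best = 0
    i = dna.find(unit)
    while i != -1:
        run, j = 0, i
        # Walk forward one unit at a time while copies keep following each other.
        while dna.startswith(unit, j):
            run += 1
            j += len(unit)
        best = max(best, run)
        i = dna.find(unit, i + 1)
    return best

unit = "ATTCGCCAATC"
person_a = "GGATCC" + unit * 3 + "TTGACA"  # allele with 3 repeats
person_b = "GGATCC" + unit * 5 + "TTGACA"  # allele with 5 repeats
print(count_tandem_repeats(person_a, unit), count_tandem_repeats(person_b, unit))
```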

Page 24: Genome sequencing

C. Sequence Tagged Sites (STS)

Another kind of genetic marker, which is very useful to genome mappers, is the sequence-tagged site (STS).

• STSs are short sequences, about 60–1000 bp long, that can be easily detected by PCR using specific primers.
• The sequences of small areas of this DNA must be known, so that one can design primers that will hybridize to these regions and allow PCR to produce double-stranded fragments of predictable lengths. If a fragment of the proper size appears, then the DNA has the STS of interest.
• One great advantage of STSs as a mapping tool is that no DNA needs to be cloned and examined.
• Instead, the sequences of the primers used to generate an STS are published, and then anyone in the world can order those same primers and find the same STS in an experiment that takes just a few hours.

Page 25: Genome sequencing

In this example, two PCR primers (red) spaced 250 bp apart have been used. Several cycles of PCR generate many double-stranded PCR products that are precisely 250 bp long. Electrophoresis of this product allows one to measure its size exactly and confirm that it is the correct one.

Figure : Sequence-tagged sites
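The 250 bp example can be mimicked with a small in-silico PCR check. This is a sketch under simplifying assumptions (exact primer matches only, a single template strand given 5' to 3'); the primer sequences and the GC filler are invented for illustration and do not correspond to any published STS.

```python
from typing import Optional

def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def sts_product_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Predicted PCR product length, or None if the STS is absent from the template."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on this strand
    # we look for its reverse complement downstream of the forward primer.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start

forward_primer = "ACTGACTGAATTCCGGATAG"
reverse_primer = reverse_complement("TTGACCATAGGCTAACGTCA")
# Hypothetical template: the two primer sites enclose 210 bases of filler.
template = "GC" * 15 + forward_primer + "GC" * 105 + "TTGACCATAGGCTAACGTCA" + "GC" * 20

product = sts_product_length(template, forward_primer, reverse_primer)
print(product)
```

A product of the expected size indicates that the clone or genome carries the STS; a missing product (None here) indicates that it does not.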

Page 26: Genome sequencing

Making a physical map using Sequence Tagged Sites (STS)

1. Geneticists interested in physically mapping or sequencing a given region of a genome aim to assemble a set of clones called a contig, which contains contiguous (actually overlapping) DNAs spanning long distances.

2. It is essential to have vectors like BACs and YACs that hold big chunks of DNA. Assuming we have a BAC library of the human genome, we need some way to identify the clones that contain the region we want to map.

3. A more reliable method is to look for STSs in the BACs. It is best to screen the BAC library for at least two STSs, spaced hundreds of kilobases apart, so that BACs spanning a long distance are selected.

4. After we have found a number of positive BACs, we begin mapping by screening them for several additional STSs, so we can line them up in an overlapping fashion as shown in the following figure. This set of overlapping BACs is our new contig. We can now begin finer mapping, and even sequencing, of the contig.

Page 27: Genome sequencing

At top left, several representative BACs are shown, with different symbols representing different STSs placed at specific intervals. In step (a) of the mapping procedure, screen for two or more widely spaced STSs; in this case, screen for STS1 and STS4. All those BACs with either STS1 or STS4 are shown at top right. The identified STSs are shown in color. In step (b), each of these positive BACs is further screened for the presence of STS2, STS3, and STS5. The colored symbols on the BACs at bottom right denote the STSs detected in each BAC. In step (c), align the STSs in each BAC to form the contig. Measuring the lengths of the BACs by pulsed-field gel electrophoresis helps to pin down the spacing between pairs of BACs.

Fig: Mapping with STSs.
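Steps (a) to (c) of the figure can be sketched as a small screening routine. Everything specific here is an assumption made for illustration: the library contents, the BAC names, and the idea that the order STS1 to STS5 along the chromosome is already known.

```python
STS_ORDER = ["STS1", "STS2", "STS3", "STS4", "STS5"]  # assumed order along the chromosome

# Hypothetical library: which STSs a PCR screen detects in each BAC.
library = {
    "BAC1": {"STS1", "STS2"},
    "BAC2": {"STS1", "STS2", "STS3"},
    "BAC3": {"STS2", "STS3", "STS4"},
    "BAC4": {"STS4", "STS5"},
    "BAC5": {"STS3"},   # lacks both anchor STSs, so it is not picked in step (a)
    "BAC6": set(),      # clone from elsewhere in the genome
}

def build_contig(library: dict[str, set[str]], anchors=("STS1", "STS4")) -> list[str]:
    # Step (a): keep BACs that contain at least one of the widely spaced anchor STSs.
    positives = {name: stss for name, stss in library.items() if stss & set(anchors)}
    # Step (b): the remaining STS content of each positive BAC is already in the dict.
    rank = {sts: i for i, sts in enumerate(STS_ORDER)}

    def span(name: str) -> tuple[int, int]:
        idx = sorted(rank[s] for s in positives[name])
        return idx[0], idx[-1]

    # Step (c): line the BACs up by their leftmost and rightmost STS.
    ordered = sorted(positives, key=span)
    for left, right in zip(ordered, ordered[1:]):
        if not positives[left] & positives[right]:
            raise ValueError(f"{left} and {right} share no STS, so the contig is broken")
    return ordered

contig = build_contig(library)
print(contig)
```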

Page 28: Genome sequencing

D. Microsatellites

STSs are very useful in physical mapping or locating specific sequences in the genome. But sometimes it is not possible to use them for genetic mapping. Fortunately, geneticists have discovered a class of STSs called microsatellites.

GCTTGGTGTGATGTAGAAGGCGCCAATGCATCTCGACGTATGCGTATACGGGTTACCCCCTTTGCAATCAGTGCACACACAC

ACACACACACACACACACACACACACACACAGTGCCAAGCAAAAATAACGCCAAGCAGAACGAAGACGTTCTCGAGAACACC

Microsatellites are similar to minisatellites in that they consist of a core sequence repeated over and over many times in a row.

The core sequence in typical microsatellites is smaller—usually only 2–4 bp long.

Microsatellites are highly polymorphic; they are also widespread and relatively uniformly distributed in the human genome.

The number of repeats varies quite a bit from one individual to another. Thus, they are ideal as markers for both linkage and physical mapping.

Page 29: Genome sequencing

In 1992, Jean Weissenbach et al produced a linkage map of the entire human genome based on 814 microsatellites containing a C–A dinucleotide repeat.

The most common way to detect microsatellites is to design PCR primers that are unique to one locus in the genome and that base-pair on either side of the repeated portion.

Therefore, a single pair of PCR primers will work for every individual in the species and produce different-sized products for each of the different-length microsatellites.

The PCR products are then separated by either gel or capillary electrophoresis. Either way, the investigator can determine the size of the PCR product and thus how many times the dinucleotide ("CA") was repeated for each allele.
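Converting a measured product size into a repeat count is simple arithmetic once the length of the non-repeat part of the product (primers plus unique flanking sequence) is known. The numbers below are hypothetical and serve only to show the calculation.

```python
def repeat_count(product_len: int, flank_len: int, unit_len: int = 2) -> int:
    """Number of repeat units in a PCR product of product_len bp.

    flank_len is the combined length of everything in the product that is
    not part of the repeat (both primers and any unique flanking sequence).
    """
    repeat_bases = product_len - flank_len
    if repeat_bases < 0 or repeat_bases % unit_len:
        raise ValueError("product size is inconsistent with the flank length and repeat unit")
    return repeat_bases // unit_len

# Hypothetical locus with 120 bp of non-repeat sequence; a heterozygous
# individual gives products of 160 bp and 172 bp.
alleles = [repeat_count(size, flank_len=120) for size in (160, 172)]
print(alleles)
```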

Page 30: Genome sequencing

Whole Genome Shotgun Sequencing

The shotgun-sequencing strategy, first proposed by Craig Venter, Hamilton Smith, and Leroy Hood in 1996, bypasses the mapping stage and goes right to the sequencing stage.

This method was employed by Celera Genomics, a private company that was trying to monopolise the human genome sequence by patenting it. To do so, it had to try to beat the publicly funded project, and it therefore adopted whole genome shotgun sequencing.

1. BAC library: A BAC library of random fragments of the human genome is generated using restriction digestion followed by cloning.

The sequencing starts with a set of BAC clones containing very large DNA inserts, averaging about 150 kb. The insert in each BAC is sequenced on both ends using an automated sequencer that can usually read about 500 bases at a time, so 500 bases at each end of the clone will be determined.

Assuming that 300,000 clones of human DNA are sequenced this way, that would generate 300 million bases of sequence, or about 10% of the total human genome. These 500-base sequences serve as an identity tag, called a sequence-tagged connector (STC), for each BAC clone. This is the origin of the term connector: each clone should be “connected” via its STCs to about 30 other clones.
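The figures quoted for the STC step follow from a short calculation. The sketch below uses only numbers given in these slides (300,000 clones, 500-base end reads, 150 kb inserts, and the 3,200,000,000 bp human genome size from the milestones list); the variable names are illustrative.

```python
clones = 300_000
ends_per_clone = 2
read_len = 500                 # bases read from each end
insert_len = 150_000           # average BAC insert, in bp
genome_size = 3_200_000_000    # human genome size from the milestones list

stc_bases = clones * ends_per_clone * read_len        # total end sequence
fraction = stc_bases / genome_size                    # share of the genome covered
stc_spacing = genome_size / (clones * ends_per_clone) # average distance between STCs
stcs_per_bac = insert_len / stc_spacing               # STCs expected inside one BAC

print(f"{stc_bases:,} bases of STC sequence, {fraction:.1%} of the genome")
print(f"one STC every {stc_spacing:,.0f} bp, about {stcs_per_bac:.0f} per 150 kb BAC")
```

The last figure is where the "about 30 other clones" comes from: a 150 kb insert is expected to contain the end tags of roughly that many other BACs.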

Page 31: Genome sequencing

Fig: Whole Genome Shotgun Sequencing Method

Steps:

1. BAC library
2. Fingerprinting
3. Plasmid library
4. BAC walking
5. Powerful computer program

Page 32: Genome sequencing

2. Fingerprinting: Each clone is fingerprinted by digesting it with a restriction enzyme. This serves two important purposes. First, it gives the insert size (the sum of the sizes of all the fragments generated by the restriction enzyme). Second, it allows one to eliminate aberrant clones whose fragmentation patterns do not fit the consensus of the overlapping clones. Note that this clone fingerprinting is not the same as mapping; it is just a simple check before sequencing begins.

3. Plasmid library: A seed BAC is selected for sequencing. The seed BAC is subcloned into a plasmid vector by subdividing the BAC into smaller clones of only about 2 kb. A plasmid library is prepared by transforming E. coli strains with the plasmids.

4. BAC walking: Three thousand of the plasmid clones are sequenced, and the sequences are ordered by their overlaps, producing the sequence of the whole 150-kb BAC. This whole BAC sequence allows the identification of the 30 or so other BACs that overlap with the seed: they are the ones with STCs that occur somewhere in the seed BAC. These overlapping BACs are then compared by fingerprinting to find those with minimal overlaps, which are sequenced next. This strategy, called BAC walking, would in principle allow one laboratory to sequence the whole human genome.
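The two uses of a fingerprint described above (insert size, and picking the overlapping BAC with minimal overlap) can be sketched as follows. This is one simple, hypothetical way to compare fingerprints by counting shared fragment sizes; the fragment lists, the clone names and the 0.2 kb matching tolerance are invented for illustration.

```python
def insert_size(fingerprint: list[float]) -> float:
    """Insert size in kb: the sum of all restriction fragment sizes."""
    return sum(fingerprint)

def shared_bands(fp_a: list[float], fp_b: list[float], tol: float = 0.2) -> int:
    """Count fragments of fp_a that match a not-yet-used fragment of fp_b within tol kb."""
    remaining = sorted(fp_b)
    shared = 0
    for size in sorted(fp_a):
        match = next((r for r in remaining if abs(r - size) <= tol), None)
        if match is not None:
            remaining.remove(match)
            shared += 1
    return shared

def next_walk_step(seed_fp: list[float], candidates: dict[str, list[float]]) -> str:
    """Among candidates that overlap the seed, choose the one sharing the fewest bands."""
    overlaps = {name: shared_bands(seed_fp, fp) for name, fp in candidates.items()}
    overlaps = {name: k for name, k in overlaps.items() if k > 0}
    return min(overlaps, key=overlaps.get)

seed = [40.0, 35.0, 30.0, 25.0, 20.0]          # fragment sizes in kb
candidates = {
    "BAC_X": [30.0, 25.0, 20.0, 45.0, 28.0],   # shares three bands with the seed
    "BAC_Y": [20.0, 50.0, 42.0, 38.0],         # shares one band: minimal overlap
    "BAC_Z": [60.0, 55.0, 33.0],               # shares none: does not overlap
}
print(insert_size(seed), next_walk_step(seed, candidates))
```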

Page 33: Genome sequencing

5. Powerful computer program: BAC walking by a single laboratory would take too long, so Venter and colleagues modified the procedure by sequencing BACs at random until they had about 35 billion bp of sequence. In principle that should cover the human genome ten times over, giving a high degree of coverage and accuracy. Then they fed all the sequence into a computer with a powerful program that found areas of overlap between clones and fit their sequences together, building the sequence of the whole genome.

Page 34: Genome sequencing

Finishing

• Process of assembling raw sequence reads into accurate contiguous sequence
– Required to achieve 1/10,000 accuracy

• Manual process
– Look at sequence reads at positions where programs can’t tell which base is the correct one
– Fill gaps
– Ensure adequate coverage

Fig: Gap and single-stranded regions in the assembly
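The 1/10,000 accuracy target can be expressed on the logarithmic Phred quality scale, Q = -10 log10(p), where p is the probability that a base is wrong. The slides do not mention Phred scores; the conversion is added here only as an illustration of what the target means numerically.

```python
import math

def phred_quality(error_probability: float) -> float:
    """Phred-scaled quality for a given per-base error probability."""
    return -10 * math.log10(error_probability)

def error_probability(quality: float) -> float:
    """Per-base error probability for a given Phred-scaled quality."""
    return 10 ** (-quality / 10)

target = phred_quality(1 / 10_000)
print(f"1 error in 10,000 bases corresponds to a quality of about Q{target:.0f}")
```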

Page 35: Genome sequencing

Finishing

• To fill gaps in sequence, design primers and sequence from primer

• To ensure adequate coverage, find regions where there is not sufficient coverage and use specific primers for those areas

Fig: Primers flanking a gap

Page 36: Genome sequencing

Verification

• Region verified for the following:
– Coverage
– Sequence quality
– Contiguity

• Determine restriction-enzyme cleavage sites
– Generate a restriction map of the sequenced region
– Must agree with the fingerprint of the clone generated during the mapping step

Page 37: Genome sequencing

New technologies

• A high-priority goal at the beginning of the Human Genome Project was to develop new mapping and sequencing technologies

• To date, no major breakthrough technology has been developed
– Possible exception: whole-genome shotgun sequencing applied to large genomes (Celera)

Automated sequencers

• Perhaps the most important contribution to large-scale sequencing was the development of automated sequencers
– Most use the Sanger sequencing method
– Fluorescently labeled reaction products
– Capillary electrophoresis for separation

Page 38: Genome sequencing

Automated sequencers: ABI 3700

Fig: MegaBACE and ABI 3700 automated sequencers, showing the 96-well plate, robotic arm and syringe, 96 glass capillaries, and load bar

Page 39: Genome sequencing

Automatic gel reading

Fig: Computer image of sequence read by automated sequencer

Page 40: Genome sequencing

Sequence assembly readout

Consensus building

Page 41: Genome sequencing

Genome sequencing achievement in Bangladesh

• Genome sequencing of Macrophomina phaseolina
• Genome sequencing of jute

Page 42: Genome sequencing

Genome of the destructive pathogen Macrophomina phaseolina unraveled by Maqsudul Alam & BJRI associates

Macrophomina phaseolina is a soil- and seed-borne fungus.

It can infect more than 500 cultivated and wild plant species.

It causes seedling blight, dry root rot, wilt, leaf blight, stem blight, and root and stem rot in different cultivated and wild plant species.

The fungus can remain viable for more than 4 years in soil and crop.

Page 43: Genome sequencing

• The Basic and Applied Research on Jute (BARJ) project team, led by Prof Maqsudul Alam, took up this unique challenge and, for the first time in the world, decoded the genome of this most dangerous fungus.

• They have identified the proteins and their networks that the fungus uses to attack and kill the plant. This fundamental knowledge will help to defend and fight against this fungus and to promote the development of resistant varieties of jute as well as other crops. 

Page 44: Genome sequencing

Genome sequencing of Tossa jute (Corchorus olitorius)

• Jute was called the Golden Fiber of Bangladesh, as Bangladesh was the largest jute-producing country in the world.

• Genome sequencing of jute has been accomplished by Bangladeshi scientists.

Page 45: Genome sequencing

• For the first time in the world, the country decoded the jute genome.

• The research team was led by Professor Maqsudul Alam from the University of Hawaii, who also successfully led the genome sequencing of papaya in the USA and rubber in Malaysia.

• This was done under the Basic & Applied Research on Jute (BARJ) project.

• The team also included a group of Bangladeshi researchers from Dhaka University's Biochemistry and Biotechnology departments, the Bangladesh Jute Research Institute (BJRI) and the software firm Data Soft, who decoded the jute genome in collaboration with the Centre for Chemical Biology, University of Science, Malaysia, and the University of Hawaii.

Page 46: Genome sequencing

Fig: Internationally famed geneticist Maqsudul Alam and other scientists of the jute genome project

Page 47: Genome sequencing

Anticipated Benefits of Genome Research

Molecular Medicine
• improve diagnosis of disease
• detect genetic predispositions to disease
• create drugs based on molecular information

Microbial Genomics
• rapidly detect and treat pathogens (disease-causing microbes) in clinical practice
• develop new energy sources (biofuels)
• monitor environments to detect pollutants
• clean up toxic waste safely and efficiently

Risk Assessment
• evaluate the health risks faced by individuals who may be exposed to radiation and to cancer-causing chemicals and toxins

Bio-archaeology, Anthropology, Evolution
• study evolution through mutations in lineages
• study migration of different population groups based on maternal inheritance

Page 48: Genome sequencing

• compare breakpoints in the evolution of mutations with ages of populations and historical events

Agriculture, Livestock Breeding, and Bio-processing
• grow disease-, insect-, and drought-resistant crops
• breed healthier, more productive, disease-resistant farm animals
• grow more nutritious produce
• develop biopesticides
• incorporate edible vaccines into food products

DNA Identification (Forensics)
• identify potential suspects whose DNA may match evidence left at crime scenes
• identify crime victims
• establish paternity and other family relationships
• identify endangered and protected species as an aid to wildlife officials
• detect bacteria and other organisms that may pollute air, water, soil, and food
• match organ donors with recipients in transplant programs
