The Human Genome Project Lecture 4 Strachan and Read Chapter 8


The Human Genome Project

Lecture 4

Strachan and Read Chapter 8


The HGP’s primary aims

• The main aims of the Human Genome Project (HGP) were to:
– Construct maps of the genome (genetic and physical)
– Identify all the genes (now known to be about 30,000)
– Determine the entire DNA sequence (3,000,000,000 bp)


Other aims of HGP

• As well as the genome sequence, the aims were:

• Technology development

• Model organism genome projects (E. coli, yeast, mouse, fruit fly, C. elegans)

• Ethical, legal and societal implications (ELSI)


The linkage map

• The map was built by linkage studies in 60 large families with grandparents and large numbers of children, collected by the University of Utah and the Centre d'Étude du Polymorphisme Humain (CEPH), Paris

• Families were typed with over 5000 polymorphic DNA sequences: 60% were microsatellite repeats (mostly dinucleotide (CA) repeats, also some tri- and tetranucleotides). Only about 400 of them were actual genes

• Construction of the genetic map:
– Obtain genotypes of all markers on all family members (PCR and gel electrophoresis, using robots and automated gel apparatus)
– Calculate recombination fractions between markers
– Observe crossovers between closely linked markers, and use this information to confirm the order of markers

• Construction of the linkage map is a very large computational problem; sophisticated software was used to work out the "best fit" map of all the markers, with advanced statistical methods and algorithms
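The recombination-fraction step can be illustrated with a small calculation. This is only a minimal sketch, not the software used to build the CEPH map: it assumes the meioses have already been scored as recombinant or non-recombinant for a pair of markers, and it uses Haldane's map function (an assumption, not named in the lecture) to turn the fraction into a map distance. The counts are invented for illustration.

```python
import math

def recombination_fraction(recombinants, total_meioses):
    """Fraction of informative meioses in which the two markers recombined."""
    if total_meioses <= 0:
        raise ValueError("need at least one informative meiosis")
    return recombinants / total_meioses

def haldane_cM(theta):
    """Map distance in centimorgans under Haldane's map function.

    Assumption: crossovers occur independently (no interference).
    """
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in the range [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)

# Hypothetical counts: 5 recombinants observed in 100 informative meioses.
theta = recombination_fraction(recombinants=5, total_meioses=100)
print(f"theta = {theta:.3f}, distance = {haldane_cM(theta):.2f} cM")
```

For closely linked markers the distance in cM is close to 100 × theta; the two diverge as theta approaches 0.5, which is why a map function is needed at all.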


STSs and ESTs

• Sequence tagged sites (STSs) are specific loci in the genome for which enough DNA sequence is available to make PCR primers to amplify the locus (usually as a fragment of a few hundred bp). These include microsatellites (e.g. CA repeats) that can be used for linkage studies.

• The information required to use an STS is just the sequences of the PCR primers; therefore it is very easy to make databases of STSs that can be used by anyone. No actual bits of DNA need change hands. This is crucial in allowing genome projects to proceed as international collaborations, with many laboratories participating in a co-ordinated way.

• Expressed sequence tags (ESTs) act as specific tags for each human gene, since they are derived by sequencing cDNA clones which came from mRNA and therefore represent the actual transcribed sequences (as opposed to STSs, which can be derived from anywhere in the genome and are mostly non-coding). They allow rapid access to the actual genes, ignoring introns and "junk" DNA.
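The point that an STS is fully described by its two primer sequences can be sketched as a tiny database record plus an in-silico PCR lookup. This is an illustrative sketch only: the `STS` class, the `amplicon` function, the marker name, and all sequences are hypothetical, and the search looks at one strand of the template with exact matching, which real primer-matching tools do not require.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

@dataclass(frozen=True)
class STS:
    """Hypothetical database record: a name and two primer sequences."""
    name: str
    forward_primer: str
    reverse_primer: str

def amplicon(sts, template):
    """Return the region of `template` the primers would amplify, or None.

    Assumption: exact primer matches on the given strand only.
    """
    start = template.find(sts.forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on this strand
    # its binding site appears as the primer's reverse complement.
    site = reverse_complement(sts.reverse_primer)
    end = template.find(site, start + len(sts.forward_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]

# Invented marker flanking a short CA repeat.
marker = STS("demo_CA_repeat", "GACCGTAAGC", "AGCTGAATCC")
template = "TT" + "GACCGTAAGC" + "CACACACACA" + "GGATTCAGCT" + "AA"
product = amplicon(marker, template)
print(marker.name, len(product), product)
```

Because the record holds only text, it can be shared between laboratories without exchanging any DNA, which is the property the slide emphasises.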


ESTs can be 3' or 5', depending on which end of the cDNA was sequenced. Because of the methods used to make cDNA libraries, parts of the 5' end of the gene are often lost during cloning, whereas the 3' end is more reliably represented. Therefore the same gene may give different 5' ESTs, and it will be difficult to deduce whether they have come from the same gene. This is shown on the diagram by the white boxes, representing cDNA clones, being of different lengths. Another complication is alternative splicing. On the left of the diagram is the genomic structure of a gene, with the exons as boxes; the red one is subject to alternative splicing.


X-ray hybrid mapping

• X-ray hybrids are made by irradiating a human cell line with 3000 rad of X-rays, fusing the irradiated cells to hamster cells, and isolating hybrid cell lines in culture

• A panel of 100-200 hybrids with 5-10 different fragments of human DNA in each gives about 1000 fragments in total, i.e. the human genome has been divided into 1000 bits.

• The closer together 2 markers are in the genome, the more likely it is that they will be present in the same hybrids (since they are less likely to be separated by an X-ray induced break).

• By doing a PCR assay for each marker on all the hybrids, a map can be made. The units are called cR (centiray, where 1cR is a 1% chance that the markers will be separated by X-ray breakage).


For each pair of markers in turn the "co-retention frequency" is the number of hybrids in which both markers are present, divided by the number of hybrids in which one or other (or both) markers are present. On the figure, there are 5 hybrids containing both markers B and C, and 6 containing B and/or C. Therefore the co-retention frequency is 5/6 or 0.83. Likewise it is 6/7 for markers E and F, and 2/10 for markers C and E. This shows that B and C are close together, E and F are close together, but C and E are further apart. The analysis is extended to all the markers and their order is worked out by considering all the co-retention frequencies.
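The co-retention calculation can be sketched in a few lines. The panel below is hypothetical: the lecture's figure is not reproduced here, so the retention patterns were invented so that the counts match the ones quoted above (5/6 for B and C, 6/7 for E and F, 2/10 for C and E). The function name `co_retention` is likewise just an illustrative choice.

```python
from fractions import Fraction

# Hypothetical panel: each set lists the markers retained in one hybrid.
# Invented data, chosen only to reproduce the counts quoted in the text.
PANEL = [
    {"B", "C"}, {"B", "C"}, {"B", "C"},
    {"B", "C", "E", "F"}, {"B", "C", "E", "F"},
    {"B"},
    {"E", "F"}, {"E", "F"}, {"E", "F"}, {"E", "F"},
    {"E"},
    set(),  # a hybrid that retained none of these markers
]

def co_retention(panel, a, b):
    """Hybrids containing both markers / hybrids containing either marker."""
    both = sum(1 for hybrid in panel if a in hybrid and b in hybrid)
    either = sum(1 for hybrid in panel if a in hybrid or b in hybrid)
    if either == 0:
        raise ValueError("neither marker is retained in any hybrid")
    return Fraction(both, either)

for a, b in [("B", "C"), ("E", "F"), ("C", "E")]:
    freq = co_retention(PANEL, a, b)
    print(f"{a}-{b}: {freq} = {float(freq):.2f}")
```

`Fraction` keeps the ratios exact, so 2/10 is reported in its reduced form, 1/5. Ranking all marker pairs by this value is the starting point for working out the marker order.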


Clone contigs

• A clone contig is a series of cloned DNA segments that overlap each other, assembled in the correct order along the genome

• The clones are made using vectors:
– cosmids (capacity 45 kb)
– BACs or YACs (Bacterial or Yeast Artificial Chromosomes), which can clone 100s of kb of DNA and are more suitable for dealing with large stretches of mammalian DNA.


Making a clone contig by fingerprinting
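The idea behind fingerprinting can be sketched as follows, on the assumption (not spelled out on the slide) that each clone's fingerprint is the list of fragment sizes from a restriction digest and that clones sharing several fragments are taken to overlap. The clone names, fragment sizes, and the `min_shared` threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical fingerprints: restriction fragment sizes (bp) for each clone.
FINGERPRINTS = {
    "cloneA": [1200, 2500, 3100, 4800],
    "cloneB": [3100, 4800, 5200, 700],
    "cloneC": [5200, 700, 9000, 1500],
}

def shared_fragments(sizes_a, sizes_b, tolerance=0):
    """Count fragments of sizes_a matched by an unused fragment of sizes_b."""
    remaining = list(sizes_b)
    shared = 0
    for size in sizes_a:
        for candidate in remaining:
            if abs(size - candidate) <= tolerance:
                remaining.remove(candidate)  # each fragment is matched once
                shared += 1
                break
    return shared

def overlapping_pairs(fingerprints, min_shared=2, tolerance=0):
    """Return (clone, clone, shared count) for pairs that appear to overlap."""
    pairs = []
    for a, b in combinations(sorted(fingerprints), 2):
        n = shared_fragments(fingerprints[a], fingerprints[b], tolerance)
        if n >= min_shared:
            pairs.append((a, b, n))
    return pairs

print(overlapping_pairs(FINGERPRINTS))
```

Here cloneA overlaps cloneB and cloneB overlaps cloneC, but cloneA and cloneC share nothing, which suggests the order A, B, C along the contig. A real analysis would allow for gel sizing error (the `tolerance` parameter) and for chance matches between unrelated clones.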


Putting it together

• The physical map consists of 1000s of cloned genomic DNA fragments, in E. coli host cells (BACs, cosmids, 40-250 kb) or yeast (100-1500 kb: "Yeast artificial chromosomes" or YACs), X-ray hybrids, and hundreds of thousands of STSs and ESTs.

• The linkage map contains several thousand STSs.

• All of these can be linked together to produce an integrated genome map.

• The presence or absence of each STS or EST in each X-ray hybrid and cloned DNA is simply determined by PCR.

• Because of the huge numbers involved, automation of the assays is required.


Sequencing

• There was a great deal of human genome to sequence (3000 Mb, or 3 x 10^9 bp).

• Due to the limitations of the techniques, each sequencing reaction can only generate up to 700 bp of DNA sequence.

• So the total sequence must be assembled from millions of short, overlapping bits of sequence. The starting point for this is the contigs of overlapping BAC clones.

• Each clone in the contig is subcloned into 100s of smaller fragments, using a plasmid vector suitable for preparing templates for the DNA sequencing reactions.
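The scale of the problem follows directly from the two numbers above. The sketch below works out how many 700 bp reads are needed for a given redundancy and shows, on a toy example, how two overlapping reads are joined. The 8-fold coverage figure and the read sequences are assumptions for illustration, and the simple exact-match merge stands in for real assembly software, which must tolerate sequencing errors and repeats.

```python
GENOME_BP = 3_000_000_000   # genome size from the slide
READ_BP = 700               # maximum read length from the slide

def reads_needed(genome_bp, read_bp, coverage=1):
    """Minimum number of reads to cover the genome `coverage` times over."""
    total = genome_bp * coverage
    return -(-total // read_bp)  # ceiling division on integers

def merge_by_overlap(left, right, min_overlap=3):
    """Join two reads on the longest suffix of `left` that is a prefix of `right`.

    Returns None if the overlap is shorter than min_overlap.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None

print("reads at 1x coverage:", reads_needed(GENOME_BP, READ_BP))
# Assumption: 8-fold redundancy, chosen only as an example.
print("reads at 8x coverage:", reads_needed(GENOME_BP, READ_BP, coverage=8))
print(merge_by_overlap("ATGCGTAC", "GTACCTGA"))
```

Even with no redundancy and perfectly placed reads, more than four million reads would be required, which is why the work was organised around BAC contigs rather than attempted in one piece.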
