Genomics and Personalized Care in Health Systems, Lecture 1: Introduction. Leming Zhou, PhD, Department of Health Information Management, School of Health and Rehabilitation Sciences, The University of Pittsburgh

Page 1: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Genomics and Personalized Care in Health Systems

Lecture 1: Introduction

Leming Zhou, PhD

Department of Health Information Management

School of Health and Rehabilitation Sciences

The University of Pittsburgh

Page 2: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Text Books

• Jonathan Pevsner, Bioinformatics and Functional Genomics, Second Edition, Wiley-Blackwell, 2009.

• Ebook: Genes and Disease, searchable and freely available at http://www.ncbi.nlm.nih.gov/books/bookres.fcgi/gnd/gnd.pdf or http://www.ncbi.nlm.nih.gov/disease/

Page 3: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Course Description

• This course focuses on a general introduction to genomics, gene structure and annotation, and gene and disease association.

• Other topics such as RNA and protein structure, and microarray experiments will also be briefly covered.

• Students will understand gene structure and be familiar with various genome analysis tools by working on novel gene annotation projects.

Page 4: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Course Objectives (1/2)

• Explain eukaryotic gene structure and the central dogma of molecular biology

• Demonstrate the skills of annotating eukaryotic genes using online tools

• Demonstrate the skills of performing sequence similarity searches using BLAST

• Demonstrate the skills of collecting evidence from the UCSC Genome Browser

• Describe major DNA and protein databases and the methods of extracting data from them

Page 5: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Course Objectives (2/2)

• Explain major gene finding methods, their advantages and disadvantages

• Describe different types of genetic diseases and the relationship between genetic variations and diseases

• Demonstrate the skills of determining protein and RNA secondary structures using online tools

• Explain basic ideas behind microarray and DNA sequencing technologies

Page 6: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Method of Presentation

• Lectures

• In-Class Laboratory Sessions

• Student Projects and Presentations

• Term Paper (graduate students)

Page 7: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Course Outline (Tentative)

Page 8: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Basic Concepts

Page 9: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

DNA (1/3)

• DNA (deoxyribonucleic acid): a helical molecule comprising a sequence of four nucleotides (bases)

– Adenine (A) – purine; Thymine (T) – pyrimidine

– Guanine (G) – purine; Cytosine (C) – pyrimidine

Figure labels: Cytosine, Thymine, Adenine, Guanine

Page 10: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

DNA (2/3)

• A is always paired with T, while G is always paired with C
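The pairing rule can be turned into a few lines of code. The sketch below is only an illustration: the function names are made up for this example, and it assumes the input contains only the letters A, C, G, and T. Because the two strands run in opposite directions, the partner strand is usually written as the reverse complement.

```python
# Watson-Crick pairing from the slide: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the base-by-base complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq):
    """Return the partner strand, read in its own 5' to 3' direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    print(complement("ATGGCT"))          # TACCGA
    print(reverse_complement("ATGGCT"))  # AGCCAT
```

Applying `reverse_complement` twice gives back the original sequence, which is a quick sanity check on the pairing table.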

Page 11: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

DNA (3/3)

• A DNA sequence can be either single-stranded or double-stranded

• DNA sequences have an orientation: from 5’ to 3’ or from 3’ to 5’ (chemical conventions)

Page 12: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Nucleotides

Page 13: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

RNA

• RNA (ribonucleic acid): usually a single-stranded molecule

• It comprises four nucleotides

– A, C, G, and U (Uracil)

• Produced by copying one of the two strands of a DNA molecule in the 5’ to 3’ direction

• Different types of RNAs

– Messenger RNA (mRNA)

– Transfer RNA (tRNA)

– Ribosomal RNA (rRNA)

– …

Uracil
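As a rough illustration of how RNA is copied from DNA, the sketch below shows two equivalent views of the same operation. The function names are hypothetical, and the input is assumed to contain only A, C, G, and T.

```python
def transcribe_coding_strand(coding_5to3):
    """The mRNA reads like the coding strand, with U in place of T."""
    return coding_5to3.upper().replace("T", "U")


# Pairing used when RNA is built against the template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template_strand(template_3to5):
    """Pair each base of a template written 3' to 5'; the RNA comes out 5' to 3'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


if __name__ == "__main__":
    # Coding strand 5'-ATGGCT-3' and its template strand 3'-TACCGA-5'.
    print(transcribe_coding_strand("ATGGCT"))    # AUGGCU
    print(transcribe_template_strand("TACCGA"))  # AUGGCU
```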

Page 14: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Protein

• A molecule comprising a long chain of amino acids connected by peptide bonds

• There are 20 standard amino acids encoded by the universal genetic code

Molecular Biology of the Cell, Alberts et al., 2002

Page 15: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Cell Types

• Prokaryotes: a group of organisms that lack a nuclear membrane, such as blue-green algae and common bacteria (e.g., Escherichia coli). Prokaryotes comprise two major taxa: Archaea and Bacteria

• Eukaryotes: unicellular and multicellular organisms, such as yeast, fruit fly, mouse, plants, and human

Page 16: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Gene

• A stretch of DNA containing the information necessary for coding a protein/polypeptide

• Promoter region

• Transcription Factor Binding Site

• Translation Start Site

• Exon: coding (informative) regions of the DNA

• Intron: noninformative regions between exons

• Untranslated region (UTR)

• Codons

Page 17: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Eukaryotic Gene Structure

http://www.nslij-genetics.org/pic/dna-rna-protein.jpg

Page 18: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Eukaryotes

• In eukaryotes, transcription is complex:

– Many genes contain alternating exons and introns

– Introns are spliced out of the primary transcript to make the mRNA

– mRNA then leaves the nucleus to be translated by ribosomes

• Genomic DNA: entire gene including exons and introns

– The same genomic DNA can produce different proteins by alternative splicing of exons

• Complementary DNA (cDNA): spliced sequence containing only exons

– cDNA can be manufactured by capturing mRNA and performing reverse transcription
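The relationship between genomic DNA, exons, and the spliced sequence can be sketched with a toy gene. Everything here is invented for illustration: the sequence, the exon coordinates, and the function name `splice`. Exons are written in upper case and introns in lower case only to make them easy to see.

```python
def splice(genomic, exons):
    """Join exon segments given as (start, end) pairs (0-based, end-exclusive)."""
    return "".join(genomic[start:end] for start, end in exons)


# Toy gene with three exons separated by two introns (illustrative only).
GENOMIC = "ATGGCC" + "gtaagtcag" + "GATAAA" + "gtatgtcag" + "TTTTGA"
EXONS = [(0, 6), (15, 21), (30, 36)]

# Using every exon gives one spliced product.
full_product = splice(GENOMIC, EXONS)
# Skipping exon 2 is one simple form of alternative splicing.
skipped_product = splice(GENOMIC, [EXONS[0], EXONS[2]])

print(full_product)     # ATGGCCGATAAATTTTGA
print(skipped_product)  # ATGGCCTTTTGA
```

The same genomic sequence yields two different exon-only sequences, which is why one gene can give rise to more than one protein.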

Page 19: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Central Dogma of Molecular Biology

• DNA → RNA → Protein

– Transcription: DNA → RNA

– Translation: RNA → protein

Page 20: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

DNA Transcription

• RNA molecules are synthesized by RNA polymerase

• RNA polymerase binds to the promoter region on DNA

• The promoter region contains the start site

• Transcription ends at the termination signal site

• Primary transcript: the RNA copied directly from DNA, before any processing

• RNA splicing: introns are removed to make the mRNA

• mRNA: contains the sequence of codons that code for a protein

• Splicing and alternative splicing

• Post-transcriptional modification

Page 21: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

DNA Translation

• Ribosomes are made of protein and rRNA

• mRNA goes through the ribosomes

• Initiation factors: proteins that catalyze the start of translation

• tRNA brings the different amino acids to the ribosome complex so that the amino acids can be attached to the growing amino acid chain

• When a STOP codon is encountered, the ribosome releases the mRNA and synthesis ends

• An open reading frame (ORF): a contiguous sequence of DNA starting at a start codon and ending at a STOP codon

http://www.youtube.com/watch?v=5bLEDd-PSTQ
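The ORF definition above can be sketched as a small scan over the three forward reading frames, using the standard genetic code. This is a simplified illustration, not a gene finder: the function name and the `min_codons` parameter are assumptions made for this example, only ATG is treated as a start codon, the reverse strand is ignored, and the input is assumed to contain only A, C, G, and T.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_orfs(dna, min_codons=2):
    """Return (start, end, protein) for each ATG...STOP stretch on the forward strand.

    start and end are 0-based, end-exclusive, and end includes the STOP codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE[codon] == "*":
                protein = "".join(
                    CODON_TABLE[dna[j:j + 3]] for j in range(start, i, 3)
                )
                if len(protein) >= min_codons:
                    orfs.append((start, i + 3, protein))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGGCTTGATAA"))  # [(2, 11, 'MA')]
```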

Page 22: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Chromosomes

• A chromosome is a long and tightly wound DNA string (visible under a microscope)

• Chromosomes can be linear or circular

• Prokaryotes usually have a single chromosome, often a circular DNA molecule

• Eukaryotic chromosomes appear in pairs (diploid), one inherited from each parent

– Homologous chromosomes carry the same genes

– Some genes are the same in both parents

– Some genes appear in different forms called alleles, e.g., human blood type has three alleles: A, B, and O

• All genes are present in all cells, but a given cell type expresses only a small portion of the genes

Page 23: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Chromosomal Location

Page 24: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Genome

• The genome is formed by one or more chromosomes

• A genome is the entire set of DNA contained in a cell

• A human cell has 46 chromosomes (23 pairs)

• The total length of a human genome is about 3 billion bases

Page 25: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Genome Sequences

Species      Complete   Draft Assembly (Almost complete)   In process   Total
All          1153       1285                               889          3327
Eukaryotes   36         319                                294          649

Source: http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html (retrieved on 1/8/2012)

Page 26: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Genome Sequence Sizes

DNA sequence size is measured in base pairs (bp)

• Phage phiX174 5,386

• HIV virus 9,193

• SARS 29,751

• Haemophilus influenzae (bacteria) 1,830,000

• Escherichia coli K12 4,600,000

• Saccharomyces cerevisiae (yeast) 12,500,000

• Drosophila melanogaster (fruit fly) 180,000,000

• Arabidopsis thaliana (thale cress) 125,000,000

• Homo sapiens (human) 3,000,000,000

Page 27: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

The Whole Picture

Page 28: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Genomics

• The definition of genomics may differ from person to person

• Genomics involves large data sets (whole genome sequences) and high-throughput methods (DNA sequencing technologies)

– By contrast, genetics research focuses on one gene or a set of genes

• Genomics may or may not include other specific research areas, such as proteomics, transcriptomics, variomics, metabolomics, etc.

• In this course, genomics includes DNA sequence analysis, genomic variations, gene expression, and proteomics.

Page 29: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Topics in This Course

• Molecular Biology Databases

• Sequence Alignment

• BLAST Search

• Genome Browser

• Gene Finding Methods

• Genomic Variations and Disease

• Protein and RNA Secondary Structure

• High-throughput Technologies

Page 30: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Molecular Biology Databases

Page 31: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Important Databases

• Genome

– NCBI

– European Molecular Biology Laboratory (EMBL)

– DNA Data Bank of Japan (DDBJ)

– GO (Gene Ontology)

– Consortium of databases

• FlyBase, Mouse Genome Database (MGD)

• Protein

– Protein Data Bank (PDB)

– EMBL-EBI (European Bioinformatics Institute)

• UniProt, ExPASy, Swiss-Prot

• KEGG: Kyoto Encyclopedia of Genes and Genomes

Page 32: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

NCBI (www.ncbi.nlm.nih.gov)

• NCBI – National Center for Biotechnology Information

• Established in 1988 as a national resource for molecular biology information

• NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information

• Databases

– GenBank, dbSNP, RefSeq, etc.

– PubMed, OMIM, MMDB, UniGene

– The Taxonomy Browser

• Tools

– BLAST, Cn3D, etc.

– Entrez is NCBI’s search and retrieval system that provides users with integrated access to sequence, mapping, taxonomy, and structural data

Page 33: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

PDB (www.pdb.org)

• The Protein Data Bank (PDB) is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids.

• Understanding the shape of a molecule helps to understand how it works.

• The PDB was established in 1971 at Brookhaven National Lab and originally contained 7 structures

• In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB

• PDB provides

– Sequence, atomic coordinates, derived geometric data, secondary structure, and annotations about protein literature references

Page 34: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

KEGG

• KEGG: Kyoto Encyclopedia of Genes and Genomes

• Contains pathway information as well as other data (as of 1/10/2011)

– KEGG PATHWAY: 126,336 pathways generated from 379 reference pathways

– KEGG GENES: 6,121,933 genes in 139 eukaryotes + 1144 bacteria + 94 archaea

– KEGG GENOME: 1,508 organisms

– KEGG DISEASE: 375 diseases

– KEGG DRUG: 9,316 drugs

Page 35: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Sequence Alignment

Page 36: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Sequence Similarity

• Similarity: The extent to which nucleotide or protein sequences are related. It is based upon identity plus conservation.

• Identity: The extent to which two sequences are invariant.

• Conservation: Changes at a specific position of a DNA or amino acid sequence that preserve the properties of the original residue.

• The distance between two sequences, based on an evolutionary model, estimates how long ago the two sequences had a common ancestor
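To make "identity plus conservation" concrete, the sketch below computes percent identity and percent similarity for two already-aligned protein sequences. Several choices here are assumptions for illustration only: the function name, the grouping of amino acids treated as conservative replacements, and the use of the alignment length as the denominator. Real tools rely on substitution matrices such as BLOSUM or PAM instead of fixed groups.

```python
# Hypothetical groups of amino acids with similar properties (illustration only).
CONSERVATIVE_GROUPS = [
    set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("ST"), set("NQ"),
]


def identity_and_similarity(a_aln, b_aln, gap_chars="-."):
    """Return (percent identity, percent similarity) over all alignment columns."""
    if len(a_aln) != len(b_aln) or not a_aln:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = conserved = 0
    for x, y in zip(a_aln, b_aln):
        if x in gap_chars or y in gap_chars:
            continue  # a gap column is neither identical nor conserved
        if x == y:
            identical += 1
        elif any(x in group and y in group for group in CONSERVATIVE_GROUPS):
            conserved += 1
    length = len(a_aln)
    return 100.0 * identical / length, 100.0 * (identical + conserved) / length


if __name__ == "__main__":
    # Columns: K/K identical, V/I conserved, L/L identical,
    # E/D conserved, gap, T/S conserved.
    print(identity_and_similarity("KVLE-T", "KILDAS"))
```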

Page 37: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Sequence Alignment

Sequence alignment is the procedure of comparing two or more DNA or protein sequences by searching for a series of individual characters or character patterns that are in the same order in the sequences.

Given two sequences A and B, an alignment is a pair of sequences A’ and B’ such that:

1. A’ is obtained from A by inserting gap characters ‘-’

2. B’ is obtained from B by inserting gap characters ‘-’

3. A’ and B’ have the same length: |A’|=|B’|

4. No position has gap characters in both A’ and B’

Example:

A = ATGGCT

B = TGCTA

A’= ATGGCT-

B’= -TG-CTA

Goal: given two sequences, find the “best” alignment according to some scoring function
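The four conditions and the idea of a scoring function can be checked directly in code. The sketch below is illustrative: the function names and the score values (match +1, mismatch -1, gap -2) are assumptions, not values given in the lecture.

```python
def is_valid_alignment(a, b, a_aln, b_aln, gap="-"):
    """Check the four conditions that define an alignment of a and b."""
    return (
        a_aln.replace(gap, "") == a          # 1. A' is A plus gaps
        and b_aln.replace(gap, "") == b      # 2. B' is B plus gaps
        and len(a_aln) == len(b_aln)         # 3. same length
        and all(not (x == gap and y == gap)  # 4. no column with two gaps
                for x, y in zip(a_aln, b_aln))
    )


def score_alignment(a_aln, b_aln, match=1, mismatch=-1, gap_penalty=-2, gap="-"):
    """Sum column scores under a simple, hypothetical scoring function."""
    total = 0
    for x, y in zip(a_aln, b_aln):
        if x == gap or y == gap:
            total += gap_penalty
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


if __name__ == "__main__":
    print(is_valid_alignment("ATGGCT", "TGCTA", "ATGGCT-", "-TG-CTA"))  # True
    print(score_alignment("ATGGCT-", "-TG-CTA"))  # -2 (4 matches, 3 gaps)
```

Different scoring functions can rank the same pair of alignments differently, which is why the choice of scores matters when searching for the "best" one.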

Page 38: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Types of Sequence Alignment

• Pairwise Alignment – compare two sequences

• Multiple Alignment – compare three or more sequences

For each of the above we can do

• Local Alignment – compare similar parts of two sequences

• Global Alignment – compare the whole sequence

For the different types of alignments there are different assumptions and methods

Page 39: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Global Alignment vs. Local Alignment

• Local alignment: finds continuous or gapped high-scoring regions which do not span the entire length of the sequences being aligned

• Global alignment: finds the optimal full-length alignment between the two sequences being aligned

Page 40: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Pairwise Alignment

• The process of lining up two sequences to achieve maximal levels of identity/similarity for the purpose of assessing the degree of similarity and the possibility of homology.

• It is used to decide if two genes are structurally or functionally related

• It is used to identify domains or motifs that are shared between proteins

• It is used in the analysis of genomes

Page 41: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

An Example of Pairwise Alignment

  1 MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG 50 RBP
    . ||| | . |. . . | : .||||.:| :
  1 ...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD. 44 LAC

 51 LFLQDNIVAEFSVDETGQMSATAKGRVR.LLNNWD..VCADMVGTFTDTE 97 RBP
    : | | | | :: | .| . || |: || |.
 45 ISLLDAQSAPLRV.YVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTK 93 LAC

 98 DPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAV...........QYSC 136 RBP
    || ||. | :.|||| | . .|
 94 IPAVFKIDALNENKVL........VLDTDYKKYLLFCMENSAEPEQSLAC 135 LAC

• Symbols between two sequences (Ssearch format): bar: identical; one dot: somewhat similar; two dots: very similar

• Dots in sequences: gaps

Page 42: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Multiple Sequence Alignment

• Multiple sequence alignment is an alignment of three or more sequences such that each column of the alignment is an attempt to represent the evolutionary changes in one sequence position, including substitutions, insertions, and deletions.

• It is believed that over time the functional components embedded within the sequences are conserved in order to retain function

– One of the most important elements of sequences is the phylogenetic information that similarities represent

– The sequence similarities give insight into the evolution of families of protein or DNA sequences

Page 43: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

An Example of Multiple Sequence Alignment

fly        GAKKVIISAP SAD.APM..F VCGVNLDAYK PDMKVVSNAS CTTNCLAPLA
human      GAKRVIISAP SAD.APM..F VMGVNHEKYD NSLKIISNAS CTTNCLAPLA
plant      GAKKVIISAP SAD.APM..F VVGVNEHTYQ PNMDIVSNAS CTTNCLAPLA
bacterium  GAKKVVMTGP SKDNTPM..F VKGANFDKY. AGQDIVSNAS CTTNCLAPLA
yeast      GAKKVVITAP SS.TAPM..F VMGVNEEKYT SDLKIVSNAS CTTNCLAPLA
archaeon   GADKVLISAP PKGDEPVKQL VYGVNHDEYD GE.DVVSNAS CTTNSITPVA

fly        KVINDNFEIV EGLMTTVHAT TATQKTVDGP SGKLWRDGRG AAQNIIPAST
human      KVIHDNFGIV EGLMTTVHAI TATQKTVDGP SGKLWRDGRG ALQNIIPAST
plant      KVVHEEFGIL EGLMTTVHAT TATQKTVDGP SMKDWRGGRG ASQNIIPSST
bacterium  KVINDNFGII EGLMTTVHAT TATQKTVDGP SHKDWRGGRG ASQNIIPSST
yeast      KVINDAFGIE EGLMTTVHSL TATQKTVDGP SHKDWRGGRT ASGNIIPSST
archaeon   KVLDEEFGIN AGQLTTVHAY TGSQNLMDGP NGKP.RRRRA AAENIIPTST

fly        GAAKAVGKVI PALNGKLTGM AFRVPTPNVS VVDLTVRLGK GASYDEIKAK
human      GAAKAVGKVI PELNGKLTGM AFRVPTANVS VVDLTCRLEK PAKYDDIKKV
plant      GAAKAVGKVL PELNGKLTGM AFRVPTSNVS VVDLTCRLEK GASYEDVKAA
bacterium  GAAKAVGKVL PELNGKLTGM AFRVPTPNVS VVDLTVRLEK AATYEQIKAA
yeast      GAAKAVGKVL PELQGKLTGM AFRVPTVDVS VVDLTVKLNK ETTYDEIKKV
archaeon   GAAQAATEVL PELEGKLDGM AIRVPVPNGS ITEFVVDLDD DVTESDVNAA

Page 44: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Evolutionary Basis of Sequence Comparison

• The simplest molecular mechanisms of evolution are substitution, insertion, and deletion

• If a sequence alignment represents the evolutionary relationship of two sequences, residues that are aligned but do not match represent substitutions

• Residues that are aligned with a gap in the sequence represent insertions or deletions

Page 45: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Homology

• Homology: Similarity attributed to descent from a common ancestor.

• There are two types of homologs: paralogs and orthologs

• Orthologs:

– Homologous sequences in different species that arose from a common ancestral gene during speciation

– May or may not be responsible for a similar function

– Members of a gene family in various organisms

• Paralogs:

– Homologous sequences within a single species that arose by gene duplication

– Members of a gene family within a species

• Genes either are homologous, or they are not. There are no degrees of homology

Page 46: Genomics and Personalized Care in Health Systems Lecture 1: Introduction
Page 47: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

BLAST Search

Page 48: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Similarity Search

• Find statistically significant matches to a protein or DNA sequence of interest.

• Obtain information on inferred function of the gene

• Sequence alignment algorithms

– Dynamic Programming

• Needleman-Wunsch Global Alignment (1970)

• Smith-Waterman Local Alignment (1981)

• Guaranteed to find the best alignment under the chosen scoring scheme

• Slow, especially when searching against a large database
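The dynamic programming idea behind both algorithms can be sketched in one short function. This is a simplified illustration rather than the published algorithms in full: it returns only the best score (no traceback), uses a linear gap penalty, and the function name and score values (match +1, mismatch -1, gap -2) are assumptions made for this example.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score by dynamic programming.

    local=False fills the table in the Needleman-Wunsch (global) style;
    local=True fills it in the Smith-Waterman (local) style.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("ATGGCT", "TGCTA"))              # -2
    print(alignment_score("ATGGCT", "TGCTA", local=True))  # 3
```

The table has (len(a) + 1) × (len(b) + 1) cells and every cell is filled once, so the work grows with the product of the two sequence lengths. That cost is what makes this approach slow against a large database.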

Page 49: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

FASTA and BLAST

• Sequence Alignment Heuristics

– FASTA and BLAST: heuristic approximations to Smith-Waterman

• Fast, with results comparable to the Smith-Waterman algorithm

• FASTA and BLAST also calculate the statistical significance of the alignments in the search results

Page 50: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

BLAST

• Basic Local Alignment Search Tool: A sequence comparison algorithm optimized for speed, used to search sequence databases for optimal local alignments to a query.

• Expected Value (E)

– The number of matches expected to occur randomly with a given score.

– The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance.

– The lower the E value, the more significant the match.

– The Expect value can be any positive real number.
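The link between an alignment score and E can be sketched with the usual Karlin-Altschul relations: a raw score S is converted to a bit score S' = (λS − ln K) / ln 2, and E = m · n · 2^(−S'), where m is the query length and n is the database length. The function names below are made up for this example, and the numbers in the usage lines are invented; λ and K depend on the scoring system and are not given in the lecture.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw score S to a bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """E = m * n * 2 ** (-S'): expected number of chance hits scoring at least S'."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Hypothetical search: a 100-residue query against a 1,000,000-residue database.
    for bits in (20, 40, 60):
        print(bits, expect_value(bits, 100, 1_000_000))
```

Each additional bit of score halves E, and doubling the database size doubles E, so the same alignment looks less significant in a larger database.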

Page 51: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

BLAST Search

>seq example
GAKKVIISAPSADAPMFVCGVNLDAYKPDMKVVSNASCTTNCLAPLAKV
INDNFEIVEGLMTTVHATTATQKTVDGPSGKLWRDGRGAAQNIIPASTGAAKAVGKVIPALNGKLTGMAFRVPTPNVSVVDLTVRLGKGASYDEIKAK
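The query above is in FASTA format: a header line starting with ">" followed by one or more lines of sequence. A minimal reader might look like the sketch below; the function name is made up for this example, and the parser deliberately ignores less common details of the format.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = ">seq example\nGAKKVIISAP\nSADAPMFVCG\n"
    print(parse_fasta(example))  # [('seq example', 'GAKKVIISAPSADAPMFVCG')]
```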

Page 52: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Genome Browser

Page 53: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Genome Browser

• A genome browser is a computer program that helps to display gene maps, browse the chromosomes, and align genes or gene models with ESTs, contigs, etc.

UCSC Genome Browser (http://genome.ucsc.edu)

Page 54: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

NCBI Mapviewer

Page 55: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Gene Finding Methods

Page 56: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Gene Prediction Methods

• Ab initio gene prediction programs

• Programs using expressed sequences

• Programs using evolutionary conservation

Page 57: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Evolution

• Evolution works in two ways:

– Mutation

– Selection pressure to eliminate random mutations

• Mutations that cause frame shifts in the coding exon regions of important proteins will most likely not survive.

• Mutations in introns or in non-gene regions usually have very little effect on the survival of the species, and therefore they tend to be kept in the sequence.

• When two sequences are aligned and compared, the regions that are conserved are most likely the gene regions.

Page 58: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Gene Annotation

http://www.pggrc.co.nz/Portals/0/Mbb%20ruminantium%20genome%20DIAGRAM.jpg

Page 59: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Genomic Variations and Disease

Page 60: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

DNA Variations

• DNA Mutation

– Synonymous mutations

– Non-synonymous mutations
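The two categories can be told apart by translating the codon before and after the change: a synonymous mutation leaves the encoded amino acid unchanged, while a non-synonymous mutation alters it (or introduces a STOP). The sketch below uses the standard genetic code; the function name is made up for this example, and the input is assumed to be a DNA codon over A, C, G, and T.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_point_mutation(codon, position, new_base):
    """Label a single-base change in a codon (position is 0, 1, or 2)."""
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"


if __name__ == "__main__":
    print(classify_point_mutation("GCT", 2, "C"))  # synonymous (Ala -> Ala)
    print(classify_point_mutation("GAG", 1, "T"))  # non-synonymous (Glu -> Val)
```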

Page 61: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Genome Sequences and Diseases

http://genomics.energy.gov

Page 62: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Single Nucleotide Polymorphisms

• Genomic sequences from two unrelated individuals are 99.9% identical.

• The 0.1% difference is due to genetic variations, mainly one form of variation called single nucleotide polymorphisms (single-base changes).

• Other genetic variations may be produced by nucleotide insertions and deletions (tandem repeat polymorphisms and insertion/deletion polymorphisms)

• These polymorphisms are considered one of the key factors that make each of us different, and they can have a major impact on how we respond to diseases; to environmental insults such as bacteria, viruses, and chemicals; and to drugs and other therapies.

Page 63: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

SNPs and Mutations

• Terminology for variation at a single nucleotide position is defined by allele frequency.

– A single base change occurring in a population at a frequency of >1% is termed a single nucleotide polymorphism (SNP)

– When a single base change occurs at <1% it is considered to be a mutation
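The 1% rule is simple arithmetic on allele counts. In the sketch below, the function name and the sample numbers are invented for illustration. The slide does not say how to label a frequency of exactly 1%, so treating that case as a mutation is an assumption of this example.

```python
def classify_variant(allele_count, total_alleles, threshold=0.01):
    """Apply the 1% allele-frequency rule to a single-base change.

    Returns (label, frequency). A frequency exactly at the threshold is
    labeled "mutation" here, which is an arbitrary choice.
    """
    frequency = allele_count / total_alleles
    label = "SNP" if frequency > threshold else "mutation"
    return label, frequency


if __name__ == "__main__":
    # 1,000 diploid individuals carry 2,000 alleles at each position.
    print(classify_variant(30, 2000))  # ('SNP', 0.015)
    print(classify_variant(3, 2000))   # ('mutation', 0.0015)
```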

Page 64: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Protein/RNA Structure

Page 65: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

RNA Structure

• RNA can have a complicated secondary structure

Gene VIII, Lewin, 2004

Page 66: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Protein Structure

• Primary structure: amino acid sequence

• Secondary structure: local structure such as alpha helices and beta sheets

• Tertiary structure: 3D structure of a protein monomer

• Quaternary structure: 3D structure of a fully functional protein (protein complexes)

Page 67: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Protein Secondary Structure

• Proteins can have secondary structure

• Alpha helix and Beta sheet

Molecular Cell Biology, Lodish et al. 2000

Page 68: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Protein 3D Structure

• Protein structure is closely related to its biological function/activity

• One protein may have multiple domains which are used to have functional interactions with different molecules

– Domains in one protein may interact extensively or simply be connected by the protein sequence

Human P53 core domain, MMDB ID: 69151, PDB ID: 3D0A

Page 69: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

High-Throughput Technologies

Page 70: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

DNA Sequencing Technologies

• Sanger method, 1977

– Used in Human Genome Project

– Slow and expensive ($300m/genome)

• Whole Genome Shotgun sequencing (1990s)

– Break the genome into short pieces

– Sequence all the pieces in parallel

– Put all the pieces back together (sequence assembly)

– Faster and cheaper (~$10m/genome)

• Next generation sequencing technologies (2000s)

– Much faster speed & lower cost (<$5k/genome, 2010)

– May be used for personal genomics
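The "put all the pieces back together" step of shotgun sequencing can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest overlap. This is only a teaching sketch under strong assumptions: reads are error-free, come from one strand, and contain no repeats. The function names, the `min_len` parameter, and the example reads are all invented here; real assemblers use far more elaborate methods.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps; the remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Three overlapping pieces of the toy sequence ATGGCTTACGGATCCA.
    print(greedy_assemble(["CGGATCCA", "ATGGCTTA", "CTTACGGA"]))
    # ['ATGGCTTACGGATCCA']
```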

Page 71: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

http://beespotter.mste.uiuc.edu/topics/genome/Honey%20bee%20genome.html

Page 72: Genomics and Personalized Care in Health Systems Lecture 1: Introduction

Microarray

http://www.coriell.org/index.php/content/view/93/184/