Approach            Component examined    Techniques
Genomics            Genes                 Sequencing programs
Transcriptomics     mRNA                  DNA arrays, GeneChip
Proteomics          Proteins              2D PAGE, MALDI-MS, ESI-MS
Metabolomics        Metabolites           GC-MS

mRNA level does not equal expressed protein level, nor does it indicate the nature of the functional protein product

Genomic Sequence -> (Transcriptional Control) -> mRNA -> (Translational Control) -> Protein Product -> (Post-Translational Control) -> Functional Protein Product

Temporal Changes in mRNA and protein

When you measure expression affects what you find

[Figure: gene expression and protein expression plotted against time (t).]

Does mRNA level correlate with protein level?

Anderson & Seilhamer, Electrophoresis 1997, 18:533-537

Anderson & Anderson, Electrophoresis 1998, 19:1853-1861

[Figure: two scatter plots of mRNA level against protein level on logarithmic axes.]

Glutathione-S-transferase in 60 human cell lines (from Tew et al 1996): mRNA (Northern) vs. protein (affinity-HPLC), R = 0.43. Cell line types: lung, ovarian, CNS, leukemia, renal, melanoma, breast.

20 liver proteins and corresponding mRNAs: mRNA (EST clones) vs. protein (2D gels), R = 0.48.
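The R values quoted for these plots are correlation coefficients on log-scaled axes. A minimal sketch of such a calculation follows, assuming a Pearson correlation on log10-transformed abundances; the slides do not state the exact method. The function name and the sample numbers are hypothetical and are not data from the cited studies.

```python
import math

def log_pearson_r(mrna, protein):
    """Pearson correlation of log10-transformed, paired abundances."""
    if len(mrna) != len(protein) or len(mrna) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    x = [math.log10(v) for v in mrna]
    y = [math.log10(v) for v in protein]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical abundances in arbitrary units (illustration only).
mrna_levels = [0.5, 2.0, 8.0, 40.0, 300.0]
protein_levels = [3.0, 1.0, 90.0, 12.0, 150.0]
r = log_pearson_r(mrna_levels, protein_levels)
```

A value of R near 0.4 to 0.5, as on the slides, means mRNA abundance explains only a modest share of the variation in protein abundance.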

Challenges of proteins vs DNA

DNA
• Static
• Can be amplified
• Little complexity: single component
• Good solubility characteristics

Protein
• Very dynamic
• Cannot be amplified
• Very complex: post-translational modification
• Variable solubility

Identifying new protein complexes:

Isolation of proteins using:
• Classical Purification + 1D PAGE
• Tag Purification + 1D PAGE

Phenotypic Complexity of the Eukaryotic Proteome

[Diagram: sources of protein diversity and how they lead to functional diversity and biological processes. Labels: Evolution, Somatic, de novo, Systems; Horizontal Transfer, Domain Accretion, Domain Expansion, Paralogous Expansion, Somatic Rearrangement, Alternative Splicing, Modifications, Protein Interactions; Protein Architecture, Protein Diversity, Functional Diversity, Biological Processes.]

• Duplication
• Divergence
• Recombination

Eukaryotic Proteomes

Proteome            Human    Fly      Worm     Yeast    Mustard Weed
Number of Genes     31,778   13,338   18,266   6,144    25,706
% of DB Matches*    51       56       50       50       52

(* Similarity search of protein sequences in the database)

Comparative Analysis of Proteomic Pheno-Complexity

[Diagram: Eubacteria, Archaea and Eukarya; within Eukarya, unicellular organisms -> invertebrates -> vertebrates -> mammals -> human. Labels: Conserved Core Proteins, Lineage-Specific Proteins, Vertebrate-Specific Proteins; Domain Accretion, Protein Architecture, Protein Diversity, Functional Diversity.]

Protein Sequence Homology

[Figure: a query sequence aligned to a database match, in two cases.]

(1) Protein Match with Known or Unknown Function

(2) Domain Match with Known or Unknown Function

Ortholog: An evolutionarily conserved gene that arose during speciation

Paralogs: Genes that arose due to intra-genome duplication in a species

Protein Sequence Comparison

(I) Homology
• > 40 %: Same Function
• 25-40 %: Similar Function
• < 25 %: Different Function

(II) Distance
• Phylogenetic Tree
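The homology thresholds above amount to a simple lookup rule. A minimal sketch, assuming the percentage is a pairwise similarity score between query and match and that a value of exactly 40 % falls in the 25-40 % band. The function name is hypothetical, and the rule is the slide's rule of thumb rather than a guarantee of function.

```python
def predicted_relationship(percent):
    """Map a percent similarity score to the slide's rule-of-thumb category."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent > 40:
        return "same function"
    if percent >= 25:
        return "similar function"
    return "different function"
```

For example, `predicted_relationship(62)` returns `"same function"` and `predicted_relationship(18)` returns `"different function"`.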

Comparative Proteomics

Domain/Protein*        Yeast   Worm   Fly   Weed   Human
Eukaryote-specific     1       1      1     1      1
Metazoan-specific      0       1      1     1      1
Animal-specific        0       1      1     0      1
Vertebrate-specific    0       0      0     0      1

*: The domain/protein is present (1) or absent (0) in the proteome.
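The presence/absence profiles in this table can be treated as lookup keys. A minimal sketch, assuming the row labels pair with the 0/1 rows in the order listed on the slide. The names `SPECIES`, `PROFILE_LABELS` and `classify_profile` are hypothetical.

```python
# Column order follows the slide: Yeast, Worm, Fly, Weed, Human.
SPECIES = ("yeast", "worm", "fly", "weed", "human")

# Profiles copied from the slide's table (1 = present, 0 = absent).
PROFILE_LABELS = {
    (1, 1, 1, 1, 1): "eukaryote-specific",
    (0, 1, 1, 1, 1): "metazoan-specific",
    (0, 1, 1, 0, 1): "animal-specific",
    (0, 0, 0, 0, 1): "vertebrate-specific",
}

def classify_profile(present_in):
    """Classify a domain/protein from the set of proteomes that contain it."""
    profile = tuple(1 if species in present_in else 0 for species in SPECIES)
    return PROFILE_LABELS.get(profile, "unclassified")
```

Profiles that do not appear in the table, such as a domain found only in yeast, come back as `"unclassified"`.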

Eukaryotic Proteomes Shared with Humans

[Figure: overlap of the fly, yeast and worm proteomes with the human proteome; percentages shown: 61%, 43%, 46%.]

Conserved Core Groups in Eukaryotes

Conserved Core Proteins in 1,308 Groups
• Human (3,109 Proteins)
• Fly (1,445 Proteins)
• Yeast (1,441 Proteins)
• Worm (1,503 Proteins)
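One way to read these counts is as an average number of proteins per conserved group, which hints at how far each lineage has expanded the core set. A minimal sketch of that arithmetic, using only the figures on the slide; the variable names are hypothetical.

```python
CORE_GROUPS = 1308  # conserved core groups shared by the four proteomes

# Proteins belonging to the conserved core groups, per species (from the slide).
core_proteins = {"human": 3109, "fly": 1445, "yeast": 1441, "worm": 1503}

# Average number of proteins per conserved group, rounded to two decimals.
proteins_per_group = {
    species: round(count / CORE_GROUPS, 2)
    for species, count in core_proteins.items()
}
```

Human averages a little over two proteins per group, while fly, yeast and worm stay close to one.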

Vertebrate-specific Proteins

[Pie chart; segments as labeled on the slide.]
• Human: 22%
• Vertebrates and Other Animals: 24%
• Other Eukaryotes and Animals: 32%
• Eukaryote and Prokaryote: 21%

Comparative Pheno-Complexity

[Diagram: Bacteria, Archaea and Eukarya; within Eukarya, unicellular organisms -> invertebrates -> vertebrates -> mammals -> human. Labels: Conserved Core Proteins, Lineage-Specific Proteins, Vertebrate-Specific Proteins; Domain Accretion, Protein Architecture, Protein Diversity, Functional Diversity.]

Housekeeping Functions
• Energy/Metabolism
• DNA Replication/Repair
• Translation

Physiological Differences
• Defense & Immunity
• Cell-Cell Communications
• Nervous System

Protein Diversity in Eukaryotes

• Horizontal Gene Transfer
• Invention of Protein Domain
• Expansion of Protein/Domain Families
• Evolution of New Protein Architectures

Lateral Gene Transfer

Human / Bacteria: 223 Genes
• Hydrolase
• Oxidoreductase
• Dehydrogenase
• Monoamine Oxidase
• Transporter

• Lineage Specific
• Intron Acquisition


Protein Function Assignment

12 Function Categories (Gene Ontology Project)

1. Cellular Processes
2. Metabolism
3. DNA Replication/Modification
4. Transcription/Translation
5. Intracellular Signaling
6. Cell-Cell Communication
7. Protein Folding/Degradation
8. Transport
9. Multifunctional Proteins
10. Cytoskeletal/Structural
11. Defense and Immunity
12. Miscellaneous Function

Classification of Proteome

(1) Functional Categories
(2) Evolutionary Conservation
(3) Structural Classification

Protein Sequence -> Domain/Motif Databases (PRINTS, Prosite, Pfam, Prosite Profile) -> Functional Annotation -> Cellular Function

~50% of Eukaryotes

New Proteins and Domains in Vertebrates

[Diagram: Bacteria, Archaea and Eukarya; within Eukarya, unicellular organisms -> invertebrates -> vertebrates -> mammals -> human, with Vertebrate-Specific Proteins marked.]

94 (7%) of 1,262 InterPro Families
• 70 Proteins
• 24 Domains

Functions (Physiological Differences)
• Defense & Immunity
• Cell-Cell Communications
• Nervous System

Yeast, Worm, Fly
• Few new protein domains invented
• Common ancestral domains in animals

Protein Domain

• An evolutionary unit
• The coding sequence can be duplicated and/or recombined
• ~100 to 250 residues
• In small proteins or parts of large ones in a domain family
• Descending from a common ancestor

• Duplication: to give rise to one or more domains
• Divergence: to generate modified proteins by mutations or In/Del
• Recombination: to produce new domain arrangements

Protein Domain Architecture

[Figure: proteins drawn as combinations of domains A, B, C and D.]

(I) Single-domain Protein

(II) Multi-domain Protein

• Prokaryotic Proteome: 2/3 of proteins have 2 or more domains
• Eukaryotic Proteome: 4/5 of proteins are multi-domain

Invention of Protein Domain

Number of Proteins

Domain              Yeast   Worm   Fly   Human   Weed
C2H2 zinc finger    48      151    357   706     115
Leu-rich repeats    7       54     115   188     392
EGF-like            0       113    81    222     17
TIR                 0       2      8     18      131
Immunoglobulin      0       64     140   765     0
KRAB box            0       0      0     171     0
Q14 repeats         0       0      0     1       0

• Expansion of paralogous proteins in metazoans
• Invention of new domains in eukaryotic genome evolution

Domain Expansion: Duplication

Number of Proteins (Domains)

Domain    Yeast    Worm      Fly       Human      Weed
RasGAP    3        8         5         11         0
RhoGAP    9        20        19        59         8
ArfGAP    6        8         9         16         15
Ig        0        67(323)   125(291)  381(930)   0
PH        24       65(68)    72(78)    193(212)   23
SH3       23(27)   46(61)    55(75)    143(182)   4
Ank       12(20)   75(223)   72(269)   145(404)   66(111)

Domains are expandable in metazoans!
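The "proteins(domains)" cells can be turned into a domains-per-protein ratio, which makes the expansion within proteins easier to compare across species. A minimal sketch, assuming a cell such as `381(930)` means 381 proteins carrying 930 copies of the domain in total. The helper names are hypothetical; the Ank values are copied from the table.

```python
import re

def parse_count(cell):
    """Parse '381(930)' into (proteins, domains); a bare '11' gives (11, None)."""
    match = re.fullmatch(r"\s*(\d+)(?:\((\d+)\))?\s*", cell)
    if match is None:
        raise ValueError(f"unrecognised cell: {cell!r}")
    proteins = int(match.group(1))
    domains = int(match.group(2)) if match.group(2) else None
    return proteins, domains

def domains_per_protein(cell):
    """Average domain copies per protein, or None if the cell has no domain count."""
    proteins, domains = parse_count(cell)
    if domains is None or proteins == 0:
        return None
    return domains / proteins

# Ankyrin-repeat (Ank) row from the slide's table.
ank_row = {"yeast": "12(20)", "worm": "75(223)", "fly": "72(269)",
           "human": "145(404)", "weed": "66(111)"}
ank_ratio = {species: round(domains_per_protein(cell), 2)
             for species, cell in ank_row.items()}
```

On these figures the animal proteomes carry roughly three to four ankyrin repeats per Ank protein, against fewer than two in yeast and weed.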

Rosetta Stone

Similarity Search of Protein Databases

• Protein A: Function 1
• Protein B: Function 2
• Protein X: Functions 1 and 2 due to domain recombination
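The Rosetta Stone idea can be sketched as a search over domain architectures: a protein X that combines the domains of two separate proteins A and B suggests a functional link between A and B. The sketch below assumes domain assignments are already available as lists of domain names, whereas real searches work from sequence similarity. The function name and the example domain names are hypothetical.

```python
def rosetta_stone_links(architectures):
    """Return {fusion_protein: [(protein_a, protein_b), ...]}.

    `architectures` maps a protein name to its ordered list of domain names.
    A fusion protein is one that contains two different domains which each
    also occur alone in some other, single-domain-type protein.
    """
    # Index proteins that are built from a single kind of domain.
    single = {}
    for name, domains in architectures.items():
        if len(set(domains)) == 1:
            single.setdefault(domains[0], []).append(name)

    links = {}
    for name, domains in architectures.items():
        kinds = sorted(set(domains))
        if len(kinds) < 2:
            continue  # not a candidate fusion protein
        for i, first in enumerate(kinds):
            for second in kinds[i + 1:]:
                for a in single.get(first, []):
                    for b in single.get(second, []):
                        links.setdefault(name, []).append((a, b))
    return links

# Toy example mirroring the slide: Protein X combines the domains of A and B.
example = {
    "Protein A": ["domain_1"],
    "Protein B": ["domain_2"],
    "Protein X": ["domain_1", "domain_2"],
}
links = rosetta_stone_links(example)
```

Here `links` pairs Protein A with Protein B through Protein X, matching the slide's reading that X carries Functions 1 and 2.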

Domain Accretion: Recombination

Ancestral Domains in Different Proteins: A B C D

[Figure: recombined arrangements shown as A C B D and A C B D?]

Combinatorial Architecture

[Figure: multi-domain proteins built from different combinations of the Rho, X, PH, ArfGAP, Ank, SH3 and PBS domains, for example X PH ArfGAP Ank Ank and ArfGAP Ank Ank Ank PBS.]

Superdomain: Domain recombination in sequential order

[Figure: the same architectures with the recurring ordered combinations highlighted, such as X PH ArfGAP and ArfGAP Ank.]

Classification of Multi-domain ArfGAP Gene Family

[Figure: ArfGAP proteins grouped into classes by domain architecture.]

Rho      Ras-like GTPases
X        Domain X
PH       Pleckstrin homology domain
ArfGAP   Zinc finger domain
Ank      Ankyrin repeat
SH3      Src homology 3 domain
PBS      Paxillin-binding subdomain

[Figure: exon structures of splice variants KIAA1099.1 and KIAA1099.0. Recoverable labels: exons 1, 11, 12, 15, 16, 17; C1; C6; AW993140 (159).]

Expression of Variants in Multiple Human Tissues: KIAA1099.0 and .1

[Figure: expression of KIAA1099.1 and KIAA1099.0 in lanes 1-16. Tissue labels: leukocytes, LN, spleen, amygdala, brain, S. muscle, heart, S. I., stomach, M. gland, liver, kidney, lung, uterus, testis, placenta.]

[Figure: exon structures of splice variants KIAA1099.2 and KIAA1099.3. Recoverable labels: exons 1, 11, 12, 15; C1; C5; BE780934 (395); AW993140 (159).]

Expression of Variants in Multiple Human Tissues: KIAA1099.2 and .3

[Figure: expression of KIAA1099.3 and KIAA1099.2 in lanes 1-16. Tissue labels: leukocytes, LN, spleen, amygdala, brain, S. muscle, heart, S. I., stomach, M. gland, liver, kidney, lung, uterus, testis, placenta.]

Expressed Diversities of Functional Domains

Class II ArfGAP: KIAA 1099

Transcription -> Alternatively Spliced Variant Transcripts

• One alternatively spliced transcript lacks ankyrin repeats.
• Other variants have an altered PH domain.

Architectures shown:
Rho X PH ArfGAP Ank Ank
Rho X PH ArfGAP
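Domain ablation by alternative splicing can be expressed as a comparison between the full-length architecture and a variant. A minimal sketch, assuming architectures are written as ordered lists of domain names; the two architectures come from the slide, and the function name is hypothetical.

```python
# Full-length architecture of the Class II ArfGAP, as drawn on the slide.
FULL_LENGTH = ["Rho", "X", "PH", "ArfGAP", "Ank", "Ank"]

def ablated_domains(full_length, variant):
    """List domains present in the full-length protein but missing from a variant."""
    remaining = list(variant)
    missing = []
    for domain in full_length:
        if domain in remaining:
            remaining.remove(domain)  # match each domain copy only once
        else:
            missing.append(domain)
    return missing

# Spliced variant that lacks the ankyrin repeats.
variant_without_ank = ["Rho", "X", "PH", "ArfGAP"]
lost = ablated_domains(FULL_LENGTH, variant_without_ank)
```

Domain alteration, such as the variants with an altered PH domain, would need sequence-level comparison and is not captured by this name-based check.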

Eukaryotic Protein Diversity

(I) Genome Evolution (Germ-line)
• Lateral Gene Transfer: Bacterial Genes
• Domain Invention: Vertebrate-specific Proteins
• New Architecture: Combinatorial Domain Accretion
• Domain Expansion: Multiple Domains in a Protein
• Paralogous Expansion: Gene Duplication

(II) Gene Expression (Somatic)
• Somatic Rearrangement: Ig & TCR Gene Families
• Alternative Splicing: Protein Isoforms

Alternative Splicing: Domain Ablation or Alteration

Phenotypic Complexity of the Eukaryotic Proteome

[Diagram repeated from earlier in the deck, with the same labels: Evolution, Somatic, de novo, Systems; Horizontal Transfer, Domain Accretion, Domain Expansion, Paralogous Expansion, Somatic Rearrangement, Alternative Splicing, Modifications, Protein Interactions; Protein Architecture, Protein Diversity, Functional Diversity, Biological Processes.]

• Duplication
• Divergence
• Recombination

Added in this version:
• Domain ablation
• Domain alteration

Integrated Life Sciences in the Post-Genomic Era

[Diagram: layers from top to bottom: Physiome and Patholome; Cellome and Metabolome; Proteome; Genome. Other labels: Gene Repertoire, Protein Repertoire, Structural Proteomics, Functional Proteomics, Systems Biology; Protein Diversity, Functional Diversity, Biological Processes.]
