
Page 1: bork@embl.de bork.embl-heidelberg.de

bork@embl.de http://www.bork.embl-heidelberg.de/

Peer Bork, EMBL & MDC, Heidelberg & Berlin

Proteome analysis in silico

Page 2: bork@embl.de bork.embl-heidelberg.de

‘omes: use and misuse

Original intention, exemplified by the genome:

‘ome: entirety of biomolecular objects (ALL genes etc.)

‘omics: research on an entirety of biomolecular objects

Proteomics: research on the entirety of proteins (so far in an organism); coined at the beginning of the 1990s

Common practice:

‘omics: used to describe large-scale approaches (whereby large is sometimes 1)

Proteomics: used for research on many proteins (whereby many might mean 3)

Page 3: bork@embl.de bork.embl-heidelberg.de

Protein profiling and interaction proteomics

Originally two main directions:

Protein profiling: establishment of protein inventories under controlled conditions (organelles, tissues, organisms).

Interaction proteomics: identification of temporally and spatially defined functional modules formed by proteins.

Bioinformatics analysis is essential in both areas.

Page 4: bork@embl.de bork.embl-heidelberg.de

Proteome analysis in silico

Part I: Protein detection and annotation by homology and orthology (function in 1D)

Part II: Protein interactions and protein networks (function in 2D)

Temporal and spatial considerations (function in 3D+4D)

Page 5: bork@embl.de bork.embl-heidelberg.de

Alternative splicing

Genome annotation

Domain analysis

Protein networks

Literature mining coupled to genomic data

Bork et al. JMolBiol 1998

Page 6: bork@embl.de bork.embl-heidelberg.de

70% prediction accuracy is great!

Prediction of | acc*cov | %acc | %cov of reference set | reference
Human promoters | .35 | 50% | 70% of annotated test set | Prestidge, 1995; Bucher, pers. comm.
Human regulatory RNA elements | .34 | 85% | 40% of new DNA | Dandekar & Sharma, 1998
Human genes (only presence) | .49 | 70% | 70% of chromosome 22 | Dunham et al., 1999 and refs therein
Human SNPs by EST comparison | .21 | 70% | 30% of all proteins with SNP | Sunyaev et al., 2000; Buetow et al., 1999
Human alternative splicing | .45 | 90% | 50% of all splice sites | Hanke et al., 1999
Transmembranes (only presence) | .85 | 85% | 99% of annotated test set | Tusnady & Simon, 1998 and refs therein
Signal peptides (only presence) | .90 | 90% | 100% of annotated test set | Nielsen et al., 1999
GPI anchors (incl. cleavage site) | .72 | 72% | 100% of annotated test set | Eisenhaber et al., 1999
Coiled coil (only presence) | .81 | 90% | 90% of annotated coiled coil | Lupas, 1996
Secondary structure (3 states) | .77 | 77% | 100% of 3D test set | Jones, 1999 and refs therein
Buried or exposed residues | .74 | 74% | 100% of 3D test set | Rost, 1996
Residue hydration | .72 | 72% | 100% of 3D test set | Ehrlich et al., 1998
Protein folds (in Mycoplasma) | .49 | 98% | 50% of Mycoplasma ORFs | Teichmann et al., 1999 and refs therein
Homology (several methods) | .49 | 98% | 50% of 3D test set | Muller et al., 1999 and refs therein
Functional features by homology | .63 | 90% | 70% unicellular genomes | Bork and Koonin, 98; Brenner, 99
Function association by context | .25 | 50% | 10% ‘high confidence’ in yeast | Marcotte et al., 1999b
Cellular localization (2 states) | .77 | 77% | 100% of annotated test set | Andrade et al., 1998
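
The first numeric column in the listing above is the product of accuracy and coverage. A minimal Python sketch of that arithmetic, assuming both values are given as fractions; the function name `acc_cov` is illustrative and not from the slides:

```python
def acc_cov(accuracy: float, coverage: float) -> float:
    """Return accuracy * coverage, both given as fractions in [0, 1]."""
    if not (0.0 <= accuracy <= 1.0 and 0.0 <= coverage <= 1.0):
        raise ValueError("accuracy and coverage must be fractions in [0, 1]")
    return accuracy * coverage


# A few rows taken from the listing: (prediction, accuracy, coverage)
rows = [
    ("Human promoters", 0.50, 0.70),
    ("Human alternative splicing", 0.90, 0.50),
    ("Secondary structure (3 states)", 0.77, 1.00),
]

for name, acc, cov in rows:
    print(f"{name}: {acc_cov(acc, cov):.2f}")
```

A method with 90% accuracy on half of the reference set therefore scores lower than one with 77% accuracy on all of it.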

Page 7: bork@embl.de bork.embl-heidelberg.de

Concepts in function prediction

Homology-based (intrinsic molecular features)
- Sequence and domain DBs (Blast, Pfam, Smart)
- Function transfer by orthology
- Feature analysis

Gene context (functional associations)
- Gene neighbourhood, fusion, co-occurrence
- Shared regulatory elements

Other (residue level, functional class)
- Correlated mutations
- Interaction threading
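
Gene co-occurrence, one of the gene-context signals listed above, can be illustrated by comparing in which genomes each gene is found. The following Python sketch uses Jaccard similarity on presence sets; the gene names, genome labels, similarity measure, and the 0.8 threshold are all assumptions made for illustration, not the method used in the talk:

```python
def jaccard(profile_a: set[str], profile_b: set[str]) -> float:
    """Jaccard similarity of two sets of genomes in which a gene occurs."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


# Hypothetical presence profiles: gene -> genomes in which an ortholog is found
profiles = {
    "geneA": {"g1", "g2", "g4"},
    "geneB": {"g1", "g2", "g4"},
    "geneC": {"g3"},
}


def co_occurring(profiles: dict[str, set[str]], threshold: float = 0.8):
    """Return gene pairs whose presence profiles are at least `threshold` similar."""
    genes = sorted(profiles)
    return [
        (a, b)
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
        if jaccard(profiles[a], profiles[b]) >= threshold
    ]


print(co_occurring(profiles))
```

Genes that are consistently present or absent together are candidates for a functional association; the signal says nothing about the molecular function itself.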

Page 8: bork@embl.de bork.embl-heidelberg.de

www.bork.embl-heidelberg.de

I. Homology-based protein annotation

Homology detection and domain annotation (current section)
Metazoan genome annotation: the dark side…
Metazoan proteome analysis: human vs chicken
Evolution of protein function

Page 9: bork@embl.de bork.embl-heidelberg.de

Status of homology-based function prediction

Many homologues, an increasing number of predictable folds, but tough times for automatic function prediction

Page 10: bork@embl.de bork.embl-heidelberg.de

Molecular functions have to be defined on a domain basis, i.e. separately for each structurally independent unit within a sequence

Henikoff et al. 1997 Science 278, 609

Page 11: bork@embl.de bork.embl-heidelberg.de
Page 12: bork@embl.de bork.embl-heidelberg.de

History of signaling domain discovery

[Figure: cytoplasmic domains and nuclear domains; x-axis in two-year periods from <1985, 85/86, 87/88 ... 01/02 to 03/now; y-axis 0 to 40]

Systematic discovery by
1) searching ‘in between’ regions
2) starting with repeats

Doerks et al. 2002 Genome Res.
Ponting et al. 2001 Genome Res.

Page 13: bork@embl.de bork.embl-heidelberg.de

Domain discovery in disease genes

gene/protein | disease | domains | reference
dystrophin | Muscular dystrophy | WW | Bork & Sudol: TIBS 19 (94) 531
X11 | Friedreich's ataxia (c) | PI/PTB+PDZ | Bork & Margolis: Cell 80 (95) 693
PKD1 | Polycystic kidney | many (PKD1) | Int. PKD1 consortium: Cell 81 (95) 298
HD | Huntington's | HEAT repeats | Andrade & Bork: Nat. Genet. 11 (95) 115
BRCA2 | Breast cancer | BRC repeats | Bork et al.: Nat. Genet. 13 (96) 22
BRCA1 | Breast cancer | BRCT | Koonin et al.: Nat. Genet. 13 (96) 266
dsh | DiGeorge syndrome | DEP | Ponting & Bork: TIBS 21 (96) 245
X25 (FRDA) | Friedreich's ataxia | CyaY | Gibson et al.: TINS 19 (96) 465
beige/CH | Chediak-Higashi | BEACH | Nagle et al.: Nat. Genet. 14 (96) 307
RB | Retinoblastoma | BRCT | Bork et al.: FASEB J. 11 (97) 68
9 incl. HML1 | Colon cancer | HSP90 | Mushegian et al.: PNAS 94 (97) 5831
TSG101 | Breast cancer | UBC | Ponting, Cai & Bork: JMM 75 (97) 467
WRN/BLM | Werner + Bloom syn. | HRDC | Morozov et al.: TIBS 22 (97) 417
2 incl. pyrin | Mediterranean fever | SPRY | Schultz et al.: PNAS 95 (98) 5857
p73 | various tumors? | SAM | Bork & Koonin: Nat. Genet. 18 (98) 313
mahogany | Obesity | PSI | Nagle et al.: Nature 398 (99) 148
Parkin | AP-J Parkinsonism | IBR | Morett & Bork: TIBS 24 (99) 229

Page 14: bork@embl.de bork.embl-heidelberg.de

SMART: Blast-like input

- Access to different databases
- Domain annotation & architecture
- Alerting

www.smart.embl-heidelberg.de

Collaboration with Chris Ponting

Page 15: bork@embl.de bork.embl-heidelberg.de

SMART: Digested output

- signal sequence, coiled coil and TM
- Pfam integrated
- comparison of domain context

www.smart.embl-heidelberg.de

Page 16: bork@embl.de bork.embl-heidelberg.de

A putative transport-associated microtubule-binding domain

Proteins shown with an MIT domain:
• Calpain 7
• Spastin
• SKD1 protein
• VPS4p ATPase (Vacuolar protein sorting factor 4A and 4B)
• Tobacco mosaic virus helicase domain-binding protein
• Sorting nexin 15
• RSK-like protein
• Similar to ribosomal protein S6 kinase
• CG8866
• Spartin

[Figure labels: MIT; Mutation; Plant-related]

Unifying disorders associated with hereditary spastic paraplegia?

Ciccarelli, F. D., et al. Genomics 81 (03) 437
Patel, H. et al. Nat Genet 31 (02) 347

Page 17: bork@embl.de bork.embl-heidelberg.de

www.bork.embl-heidelberg.de

I. Homology-based genome annotation

Homology detection and domain annotation
Metazoan genome annotation: the dark side… (current section)
Metazoan proteome analysis: human vs chicken
Evolution of protein function

Page 18: bork@embl.de bork.embl-heidelberg.de

Number of human genes in time

[Figure: number of human genes in thousands (y-axis 0 to 120) from Feb00 to Apr01. Series labels: HGS, Incyte and co; Textbooks, public opinion; Celera; HGP; HGS; others. Annotation: Basis for Feb 01 publications. Values shown: 21, 38, 32, 52, 39, 27, 24, 22. Additional panels: NEMAX50 index and TecDAX index (axes 2T to 10T), Jan05]

Page 19: bork@embl.de bork.embl-heidelberg.de

Improvement of gene cluster predictions

Mouse chr4:94-94.6 Mb, p450 (CYP2J) region: 8 genes / 11 pseudogenic fragments

Known genes: cyp2j6, cyp2j9, cyp2j5, cyp2j13

ESTs

Twinscan (1 gene)
GeneID (3 genes)
fgenesh++ (13 genes)
ENSEMBL (9 genes)
Manual (8 genes)

(comparison performed in 2004)

Page 20: bork@embl.de bork.embl-heidelberg.de

BLAST2GENE finds independent gene copies

BLAST of cyp2j13 protein vs. Mouse chr4:94-94.6 Mb: ~150 alignments, grouped by BLAST2GENE

[Figure: query Mm.cyp2j.pep (len=501), axis 100 to 400, aligned against the gene copies below; GENE_7 is shown a second time in the figure]

copy | region | cov | id%
GENE_1 | 4764..13967 | 0.441 | 60.4
GENE_2 | 35993..77274 | 0.972 | 66.5
GENE_3 | 87921..106913 | 0.166 | 59.4
GENE_4 | 126547..126708 | 0.108 | 68.0
GENE_5 | 131666..172308 | 0.980 | 63.2
GENE_6 | 181333..220995 | 0.441 | 50.2
GENE_7 | 241542..295291 | 0.978 | 63.4
GENE_8 | 302976..378293 | 0.451 | 53.5
GENE_9 | 391323..454110 | 0.992 | 59.9
GENE_10 | 462789..462893 | 0.070 | 57.0
GENE_11 | 464757..500175 | 0.996 | 67.2
GENE_12 | 515451..538069 | 0.986 | 61.0
GENE_13 | 552820..562733 | 0.184 | 62.7
GENE_14 | 576195..588175 | 0.547 | 87.8

Hundreds of often considerable differences to current gene prediction pipelines!
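
The coverage (cov) values reported for the individual copies on this slide span almost the full range from 0.07 to 0.996. As a rough illustration, the Python sketch below separates copies that cover nearly the whole query protein from partial ones. The coordinates, cov and id% values are read from the slide; the 0.9 cutoff and the names `Copy` and `FULL_LENGTH_COV` are assumptions for illustration and not part of BLAST2GENE:

```python
from typing import NamedTuple


class Copy(NamedTuple):
    name: str
    start: int
    end: int
    cov: float       # fraction of the query protein covered
    identity: float  # percent identity to the query


# Values as shown on the slide (query: Mm.cyp2j.pep, len=501)
copies = [
    Copy("GENE_1", 4764, 13967, 0.441, 60.4),
    Copy("GENE_2", 35993, 77274, 0.972, 66.5),
    Copy("GENE_3", 87921, 106913, 0.166, 59.4),
    Copy("GENE_4", 126547, 126708, 0.108, 68.0),
    Copy("GENE_5", 131666, 172308, 0.980, 63.2),
    Copy("GENE_6", 181333, 220995, 0.441, 50.2),
    Copy("GENE_7", 241542, 295291, 0.978, 63.4),
    Copy("GENE_8", 302976, 378293, 0.451, 53.5),
    Copy("GENE_9", 391323, 454110, 0.992, 59.9),
    Copy("GENE_10", 462789, 462893, 0.070, 57.0),
    Copy("GENE_11", 464757, 500175, 0.996, 67.2),
    Copy("GENE_12", 515451, 538069, 0.986, 61.0),
    Copy("GENE_13", 552820, 562733, 0.184, 62.7),
    Copy("GENE_14", 576195, 588175, 0.547, 87.8),
]

FULL_LENGTH_COV = 0.9  # assumed cutoff, chosen only for this illustration

full = [c.name for c in copies if c.cov >= FULL_LENGTH_COV]
partial = [c.name for c in copies if c.cov < FULL_LENGTH_COV]

print(len(full), "near full-length:", full)
print(len(partial), "partial:", partial)
```

Coverage alone does not decide whether a copy is a functional gene or a pseudogenic fragment; the slides use further checks for that.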

Page 21: bork@embl.de bork.embl-heidelberg.de

Annotation of pseudogenes changes gene numbers

1. Similarity search in intergenic regions

- Masking of known repeats and already predicted genes: 1.5-2 million fragments
- BLASTX vs nr prot. db, E-value < 0.001: fragments with significant sequence similarity
- Exclusion of transposon and virus derived sequence
- Merging of fragments of the same element: regions containing independent elements
- Closest known protein (first blast hit)
- GENEWISE
- Ka/Ks functionality check

Ca. 20,000 detectable pseudogenes in each: human, mouse, rat

Torrents, Suyama, Bork Genome Res. 13 (2003) 2550
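
Two of the steps named on this slide, the E-value filter and the merging of fragments of the same element, can be sketched in Python as follows. Only the E-value threshold of 0.001 comes from the slide; the `Hit` record, the merging rule (same closest protein and a gap of at most `max_gap` bases), and the example data are assumptions made for illustration and do not reproduce the published pipeline:

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 0.001  # threshold stated on the slide


@dataclass
class Hit:
    protein: str   # closest known protein (first BLAST hit)
    start: int     # genomic start of the matching fragment
    end: int       # genomic end of the matching fragment
    evalue: float


def merge_fragments(hits, max_gap=10_000):
    """Keep significant hits and merge nearby fragments matching the same protein.

    `max_gap` is an illustrative assumption, not a value from the slide.
    """
    significant = sorted(
        (h for h in hits if h.evalue < E_VALUE_CUTOFF),
        key=lambda h: (h.protein, h.start),
    )
    merged = []
    for h in significant:
        last = merged[-1] if merged else None
        if last and last.protein == h.protein and h.start - last.end <= max_gap:
            # Extend the current region instead of starting a new one
            last.end = max(last.end, h.end)
            last.evalue = min(last.evalue, h.evalue)
        else:
            merged.append(Hit(h.protein, h.start, h.end, h.evalue))
    return merged


# Hypothetical hits for demonstration
hits = [
    Hit("P1", 100, 400, 1e-20),
    Hit("P1", 900, 1300, 1e-8),
    Hit("P1", 50_000, 50_300, 1e-5),
    Hit("P2", 2_000, 2_200, 0.5),  # not significant, dropped
]

for region in merge_fragments(hits):
    print(region)
```

Each merged region would then be passed on to the later steps listed on the slide (GENEWISE alignment against the closest protein and the Ka/Ks check), which are not modelled here.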

Page 22: bork@embl.de bork.embl-heidelberg.de

Annotation of pseudogenes changes gene numbers

2. Consistency check of gene predictions

Still >3000 pseudogenes among the predicted human genes mid 2004 (build 34)

[Figure: predicted gene, Mm chr1:7608644-7681026 (80 kb), exons e1 to e6, with stop codon or frameshift marked. Exons e1 and e2: processed pseudogene, Genewise prediction using sptrembl|Q9HBM5. Exons e3 to e6: processed pseudogene, Genewise prediction using SwissProt|RS2_RAT]

Arrays, chips et al.: 20% off?

Page 23: bork@embl.de bork.embl-heidelberg.de

What do we count?

Protein diversity:
20-40k genes
>100k transcripts
>1000k proteins?

Page 24: bork@embl.de bork.embl-heidelberg.de

Rate of detectable alternative splicing depends on EST coverage and library range

[Figure: %AS (y-axis 0 to 50) against number of ESTs (x-axis 0 to 3,500,000) for mouse and human; second axis: AS per mRNA (x), 2.0 to 2.8]

Brett et al. Nature Genet. 30 (2002) 29

Page 25: bork@embl.de bork.embl-heidelberg.de

www.bork.embl-heidelberg.de

Boue et al. Bioessays 03

Page 26: bork@embl.de bork.embl-heidelberg.de

Homology-based predictions of exons and alternative transcripts (www.smart.embl-heidelberg.de)

SMART domain DB links to genomes

Page 27: bork@embl.de bork.embl-heidelberg.de

Top 10 domains* in human: 30% diff.!

domain | human | fly | worm
Immunoglobulin | 765 (381) | 140 | 64
C2H2 zinc finger | 706 (607) | 357 | 151
Protein kinase | 575 (501) | 319 | 437
Rhod.-like GPCR | 569 (616) | 97 | 358
P-loop NTPase | 433 | 198 | 183
Rev. transcriptase | 350 | 10 | 50
RRM (RNA-binding) | 300 (224) | 157 | 96
WD40 (G-protein) | 277 (136) | 162 | 102
Ankyrin repeat | 276 (145) | 105 | 107
Homeobox | 267 (160) | 148 | 109
Total no. genes | 26500 (26500) | 13300 | 18200

*Only no. of genes given, no. of domains higher; note that only around 90% is sequenced

Nature 409 (01) 860; Science 291 (01) 1304

Page 28: bork@embl.de bork.embl-heidelberg.de

Metazoan genome annotation: an ongoing process and far from complete

>2000 pseudogenes in mammalian gene sets: only now are they about to be included in prediction pipelines

Ca. 150 retro-related genes in mammalian gene sets (>1000 in 2004), but true human genes sometimes suppressed

Annotation of gene clusters needs considerable improvement

Alternative splicing still a major unknown

Considerable human factor in annotation

Page 29: bork@embl.de bork.embl-heidelberg.de

www.bork.embl-heidelberg.de

I. Homology-based genome annotation

Homology detection and domain annotation
Metazoan genome annotation: the dark side…
Metazoan proteome analysis: human vs chicken (current section)
Evolution of protein function

Page 30: bork@embl.de bork.embl-heidelberg.de

Human: Nature Feb 2001
Mouse: Nature Dec 2002
Mosquito: Science Oct 2002
Rat: Nature Apr 2004
Chicken: Nature Dec 2004

[Figure: phylogeny of human, chimp, mouse, rat, chicken, fugu, mosquito, D.mela. and C.eleg., with divergence times labelled 5, 40, 75, 250MY, 310MY, 450MY, 600-1200MY? and ?]

Page 31: bork@embl.de bork.embl-heidelberg.de

Chicken genome analysis

[Figure labels: 15%; 45%]

Zdobnov et al. Science 02
Hillier et al. Nature 04

Page 32: bork@embl.de bork.embl-heidelberg.de

Chicken genome analysis: orthology and cellular processes

75.4% identity (median) between chicken and human 1:1 orthologs

Immune response evolves fastest

Page 33: bork@embl.de bork.embl-heidelberg.de

www.bork.embl-heidelberg.de

Chicken genome analysis: Innovation and expansion of domain families

Page 34: bork@embl.de bork.embl-heidelberg.de

Orthology analysis reveals more subtle functional changes

Page 35: bork@embl.de bork.embl-heidelberg.de

Evolution by duplication: Burst of an olfactory receptor family

…221 copies in chicken

…given ca. 300 ORs in chicken and 450 in human

…thought to recognize MHC diversity

[Figure labels: chicken; human]

Page 36: bork@embl.de bork.embl-heidelberg.de

Chicken genome analysis: Evolution of function by domain accretion

Scavenger receptor cysteine-rich domain acquired by a fibrinogen-domain containing protein (identified and displayed by SMART)

Page 37: bork@embl.de bork.embl-heidelberg.de

www.bork.embl-heidelberg.de

I. Homology-based genome annotation

Homology detection and domain annotation
Metazoan genome annotation: the dark side…
Metazoan proteome analysis: human vs chicken
Evolution of protein function (current section)

Page 38: bork@embl.de bork.embl-heidelberg.de

Phylogenetic distribution of orthologs

- Losses

Page 39: bork@embl.de bork.embl-heidelberg.de

Gene loss in diptera

gene | D | A | P | Y | W | H | M

Sterol metabolism
Squalene monooxygenase (EC 1.14.99.7) | - | - | x | x | - | x | x
7-dehydrocholesterol reductase (EC 1.3.1.21) | - | - | x | x | x | x | x
Farnesyl-diphosphate farnesyltransferase (EC 2.5.1.21) | - | - | x | x | - | x | x
Lanosterol synthase (EC 5.4.99.7) | - | - | x | x | - | x | x
3-oxo-5-alpha-steroid 4-dehydrogenase 1 (EC 1.3.99.5) | - | - | x | - | x | x | x
C-5 sterol desaturase (EC 1.3.3.2), ergosterol biosynthesis | - | - | x | x | - | x | x
Cytochrome P450 P51, sterol 14-alpha demethylase | - | - | x | x | - | x | x
diminuto/24-dehydrocholesterol reductase ('seladin1') | - | - | x | - | x | x | x

Biosynthesis of NAD
Kynureninase (EC 3.7.1.3) | - | - | - | x | x | x | x
3-hydroxyanthranilate 3,4-dioxygenase (EC 1.13.11.6), synthesis of excitotoxin quinolinic acid | - | - | - | x | x | x | x
Quinolinate phosphoribosyltransferase (EC 2.4.2.19) | - | - | x | x | - | x | x

DNA methylation and repair
DNA (cytosine-5)-methyltransferase 1 | - | - | x | - | - | x | x
uracil-DNA glycosylases | - | - | x | - | x | x | x
DNA-(apurinic or apyrimidinic site) lyase (EC 4.2.99.18) | - | - | - | x | x | - | -
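
Presence/absence rows like those on this slide can be screened for lineage-specific loss with a few lines of Python. The sketch below copies four rows from the slide and keeps those that are absent in the first two columns but present in the last two. Reading D and A as the two dipteran genomes and H and M as human and mouse is an assumption, since the slide gives only the column letters; the helper names `parse` and `lost_in` are illustrative:

```python
COLUMNS = ["D", "A", "P", "Y", "W", "H", "M"]

# Presence ("x") / absence ("-") rows copied from the slide
rows = {
    "Squalene monooxygenase": "- - x x - x x",
    "7-dehydrocholesterol reductase": "- - x x x x x",
    "Kynureninase": "- - - x x x x",
    "DNA-(apurinic or apyrimidinic site) lyase": "- - - x x - -",
}


def parse(profile: str) -> dict[str, bool]:
    """Turn a row such as '- - x x - x x' into a column -> present mapping."""
    marks = profile.split()
    if len(marks) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(marks)}")
    return {col: mark == "x" for col, mark in zip(COLUMNS, marks)}


def lost_in(profile: dict[str, bool], absent, present) -> bool:
    """True if the gene is missing in all `absent` columns and found in all `present` ones."""
    return not any(profile[c] for c in absent) and all(profile[c] for c in present)


candidates = [
    name
    for name, row in rows.items()
    if lost_in(parse(row), absent=("D", "A"), present=("H", "M"))
]
print(candidates)
```

The last row is not selected because the gene is also absent in the H and M columns, so its pattern is not specific to the first two columns.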

Page 40: bork@embl.de bork.embl-heidelberg.de

Functional changes at evolutionary time scales

Orthologs mapped onto metazoan phylogeny

Page 41: bork@embl.de bork.embl-heidelberg.de

Summary (homology-based function prediction)

Emphasis in homology-based genome annotation shifts from sensitivity (e.g. domain identification) to selectivity issues (orthology assignment for 1:1 function transfer).

Metazoan genome annotation is far from complete, and caution is needed when using incomplete and partially erroneous parts lists (e.g. when predicting networks).

Yet, with the incoming number of metazoan genomes, our understanding of functional diversification at the protein level will increase dramatically ... although the proteome remains far from being deciphered.