The Havana-Gencode annotation
GENCODE CONSORTIUM
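Region name | Region type | Region length (kb) | Chr | Known | Novel protein | Novel transcript | Putative | Pseudogene (processed) | Pseudogene (unprocessed) | Artefact | TEC
ENr114 | Random | 500 | 10 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0
ENr231 | Random | 500 | 1 | 12 | 0 | 0 | 2 | 1 | 0 | 0 | 0
ENr232 | Random | 500 | 9 | 9 | 0 | 3 | 4 | 1 | 0 | 1 | 0
ENr324 | Random | 500 | X | 3 | 0 | 2 | 1 | 4 | 0 | 0 | 0
ENr334 | Random | 500 | 6 | 8 | 0 | 1 | 1 | 1 | 0 | 0 | 2
ENr323 | Random | 500 | 6 | 5 | 0 | 0 | 2 | 5 | 0 | 0 | 0
ENr111 | Random | 500 | 13 | 1 | 1 | 3 | 3 | 2 | 0 | 0 | 2
ENr222 | Random | 500 | 6 | 3 | 0 | 1 | 2 | 1 | 0 | 0 | 0
ENr132 | Random | 500 | 13 | 4 | 0 | 1 | 3 | 1 | 0 | 0 | 0
ENr333 | Random | 500 | 20 | 16 | 0 | 1 | 4 | 3 | 1 | 0 | 0
ENm004 | Manual | 1700 | 22 | 19 | 0 | 3 | 5 | 9 | 4 | 1 | 0
ENm006 | Manual | 1338.447 | X | 42 | 5 | 1 | 1 | 5 | 4 | 1 | 2
ENr223 | Random | 500 | 6 | 6 | 3 | 2 | 2 | 7 | 0 | 0 | 0
ENm002 | Manual | 1000 | 5 | 21 | 1 | 4 | 7 | 0 | 2 | 2 | 0
ENr112 | Random | 500 | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0
ENr113 | Random | 500 | 4 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 0
ENr121 | Random | 500 | 2 | 2 | 0 | 1 | 3 | 0 | 1 | 0 | 1
ENr131 | Random | 500 | 2 | 13 | 1 | 3 | 1 | 4 | 4 | 0 | 1
ENr212 | Random | 500 | 5 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 0
ENr221 | Random | 500 | 5 | 3 | 1 | 3 | 0 | 2 | 0 | 0 | 1
ENr313 | Random | 500 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
ENr321 | Random | 500 | 8 | 2 | 0 | 1 | 2 | 0 | 0 | 0 | 1
ENr331 | Random | 500 | 2 | 5 | 5 | 1 | 5 | 2 | 0 | 0 | 0
ENr122 | Random | 500 | 18 | 9 | 1 | 0 | 0 | 1 | 0 | 1 | 0
ENr233 | Random | 500 | 15 | 15 | 0 | 2 | 0 | 1 | 4 | 0 | 0
ENr311 | Random | 500 | 14 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
ENr123 | Random | 500 | 12 | 3 | 0 | 0 | 1 | 1 | 0 | 0 | 0
ENr322 | Random | 500 | 14 | 2 | 0 | 0 | 0 | 2 | 0 | 0 | 0
ENr213 | Random | 500 | 18 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0
ENm001 | Manual | 1877.426 | 7 | 12 | 1 | 3 | 5 | 6 | 0 | 0 | 0
ENm003 | Manual | 500 | 11 | 5 | 2 | 2 | 1 | 1 | 0 | 0 | 0
ENm005 | Manual | 1695.985 | 21 | 23 | 0 | 10 | 2 | 6 | 0 | 1 | 2
ENm007 | Manual | 1000.876 | 19 | 39 | 0 | 6 | 3 | 6 | 14 | 0 | 1
ENm008 | Manual | 500 | 16 | 21 | 2 | 3 | 1 | 2 | 3 | 0 | 0
ENm009 | Manual | 1001.592 | 11 | 44 | 1 | 2 | 1 | 3 | 28 | 0 | 0
ENm010 | Manual | 500 | 7 | 15 | 0 | 4 | 3 | 6 | 0 | 1 | 2
ENm011 | Manual | 606.048 | 11 | 15 | 1 | 2 | 5 | 2 | 0 | 4 | 0
ENm012 | Manual | 1000 | 7 | 2 | 0 | 1 | 0 | 2 | 0 | 0 | 0
ENm013 | Manual | 1114.424 | 7 | 7 | 0 | 4 | 1 | 3 | 0 | 0 | 1
ENm014 | Manual | 1163.197 | 7 | 5 | 0 | 3 | 0 | 3 | 0 | 0 | 2
ENr133 | Random | 500 | 21 | 6 | 1 | 2 | 2 | 4 | 0 | 1 | 0
ENr211 | Random | 500 | 16 | 1 | 0 | 1 | 0 | 2 | 0 | 0 | 0
ENr312 | Random | 500 | 11 | 1 | 0 | 1 | 2 | 0 | 0 | 1 | 0
ENr332 | Random | 500 | 11 | 13 | 0 | 0 | 2 | 1 | 0 | 1 | 0
TOTAL | | 29997.995 | | 416 | 26 | 82 | 78 | 104 | 66 | 15 | 18
Total, first 13 regions | | 8538.447 | | 129 | 9 | 19 | 30 | 40 | 9 | 3 | 6
Total, 31 other regions | | 21459.548 | | 287 | 17 | 63 | 48 | 64 | 57 | 12 | 12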
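As a consistency check, the "Total, first 13 regions" row can be recomputed from the per-region rows. A minimal Python sketch, assuming the first 13 regions are the first 13 rows of the table; the column keys are hypothetical names chosen for this example, and lengths are converted to bp so the sums stay exact integers.

```python
# Per-region rows for the first 13 ENCODE regions, copied from the table above.
# Lengths are in bp (table value in kb x 1000) so that sums are exact integers.
COLUMNS = ["known", "novel_protein", "novel_transcript", "putative",
           "pseudogene_processed", "pseudogene_unprocessed", "artefact", "tec"]

FIRST_13 = {
    "ENr114": (500_000, [1, 0, 1, 0, 0, 0, 0, 0]),
    "ENr231": (500_000, [12, 0, 0, 2, 1, 0, 0, 0]),
    "ENr232": (500_000, [9, 0, 3, 4, 1, 0, 1, 0]),
    "ENr324": (500_000, [3, 0, 2, 1, 4, 0, 0, 0]),
    "ENr334": (500_000, [8, 0, 1, 1, 1, 0, 0, 2]),
    "ENr323": (500_000, [5, 0, 0, 2, 5, 0, 0, 0]),
    "ENr111": (500_000, [1, 1, 3, 3, 2, 0, 0, 2]),
    "ENr222": (500_000, [3, 0, 1, 2, 1, 0, 0, 0]),
    "ENr132": (500_000, [4, 0, 1, 3, 1, 0, 0, 0]),
    "ENr333": (500_000, [16, 0, 1, 4, 3, 1, 0, 0]),
    "ENm004": (1_700_000, [19, 0, 3, 5, 9, 4, 1, 0]),
    "ENm006": (1_338_447, [42, 5, 1, 1, 5, 4, 1, 2]),
    "ENr223": (500_000, [6, 3, 2, 2, 7, 0, 0, 0]),
}


def column_totals(rows):
    """Sum the region lengths and each locus-type column over the given rows."""
    total_length = sum(length for length, _ in rows.values())
    totals = [sum(counts[i] for _, counts in rows.values())
              for i in range(len(COLUMNS))]
    return total_length, dict(zip(COLUMNS, totals))


length_bp, totals = column_totals(FIRST_13)
print(f"{length_bp / 1000} kb", totals)
```

Applying the same function to the remaining 31 rows should likewise reproduce the "Total, 31 other regions" row.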
Figure: Loci annotated in the 44 ENCODE regions, by locus type (Known 416, Novel protein 26, Novel transcript 82, Putative 78, TEC 18).

Experimental validations of the manual annotations
- 5' RACEs to obtain full-length mRNA(s)
- RT-PCRs to check 360 junctions
- Bidirectional RACEs to obtain full-length mRNAs
- Experimental validation of the single-exon annotations
The annotations produced by the Havana team at Sanger are being verified experimentally through RT-PCRs and RACEs (University of Geneva).
Workflow: initial annotation → experimental validations → updated annotation → new set of confirmed genes.
5' RACEs to extend Known and Novel protein genes
- 214/426 loci (50%) gave a positive RACE for at least one primer.
- About 10% of the successful RACEs extend the loci at the 5' end (and some provide new exon junctions).
(Some RACE products are still being analysed.)
RT-PCRs on VEGA Novel_transcript and Putative loci
Level | Tested | Validated | % validated
Junction level | 360 | 73 | 20.3%
Transcript level | 214 | 59 | 27.6%
Locus level | 161 | 48 | 29.8%
Putative loci | 78 | 11 | 14.1%
Novel transcript loci | 81 | 36 | 44.4%
Novel transcript loci have a higher success rate than Putative loci, in accordance with their definitions.
When more than one junction was submitted for the same transcript, all the junctions agreed in 2/3 of the cases (mostly all negative).
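The "% validated" column is validated/tested expressed as a percentage. A short Python check of the figures in the table; the dictionary keys are names chosen here for illustration.

```python
# (tested, validated) pairs copied from the RT-PCR table above.
RESULTS = {
    "junction": (360, 73),
    "transcript": (214, 59),
    "locus": (161, 48),
    "putative_loci": (78, 11),
    "novel_transcript_loci": (81, 36),
}


def percent_validated(tested: int, validated: int) -> float:
    """Share of tested items confirmed by RT-PCR, in percent, rounded to 0.1."""
    return round(100 * validated / tested, 1)


for level, (tested, validated) in RESULTS.items():
    print(f"{level}: {percent_validated(tested, validated)}%")
```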
RT-PCRs on non-canonical splice sites
43 non-canonical splice sites (neither GT-AG nor GC-AG) were detected in the 13 training ENCODE regions.
32 could be tested by RT-PCR (for the others, the exons were too short for primer picking).
1 was confirmed: it is actually a canonical U12 intron (AT-AC).
6 provided canonical junctions (already present in other annotated splice forms).
25 were negative.
=> None of the non-canonical splice sites could be validated experimentally.
(83 other splice sites are being checked in the 31 other regions.)
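The canonical/non-canonical distinction above depends only on the first and last two bases of the intron. A minimal Python sketch, assuming the intron sequence is given on the transcribed strand, 5' to 3'. AT-AC ends are typical of U12-type introns, but the dinucleotides alone cannot prove that an intron is U12-type, so the sketch only reports "AT-AC".

```python
CANONICAL = {"GT-AG", "GC-AG"}


def splice_site_class(intron_seq: str) -> str:
    """Label an intron by its donor and acceptor dinucleotides.

    intron_seq: intron sequence on the transcribed strand, 5' to 3'.
    Returns "canonical", "AT-AC" or "non-canonical".
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron shorter than 4 nt")
    pair = f"{seq[:2]}-{seq[-2:]}"
    if pair in CANONICAL:
        return "canonical"
    if pair == "AT-AC":
        # Typical of, but not diagnostic for, U12-type introns.
        return "AT-AC"
    return "non-canonical"


for intron in ["GTAAGTCTTTCAG", "gcaagtttgcag", "ATATCCTTTAAC", "GGTAAGTTTTAG"]:
    print(intron, splice_site_class(intron))
```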
Gene predictions outside of Havana-Gencode annotations
In the 13 training ENCODE regions, 1255 introns predicted by one or more of the 9 methods are not annotated in VEGA:
- 380 (30%) extend VEGA objects (1)
- 530 (42%) are in introns of VEGA objects (2)
- 11 (1%) link exons from distinct VEGA objects (3)
- 334 (27%) are completely outside of VEGA annotations (4)
Figure: schematic of cases (1) to (4), with Havana-Gencode annotations shown above the predictions.
The 9 methods are 6 computational gene prediction programs (geneid, genscan, SGP, twinscan, fgenesh, exonify) and 3 EST-based methods (acembly, ECgene, Ensembl EST).
1255 predicted introns tested:
=> 16 RT-PCRs confirmed the predicted junction and 9 provided another junction (excluding pseudogenes).
=> Only 3 are intergenic (new loci?) and are being extended by RACE.
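The four categories can be approximated with simple interval tests. A Python sketch, assuming each VEGA object is reduced to its genomic span (1-based, inclusive coordinates) on the same chromosome and strand. This simplification ignores the exon structure inside each object, so it is not necessarily how the classification above was computed. The last lines recompute the rounded percentages from the reported counts.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def classify_predicted_intron(intron: Interval, objects: List[Interval]) -> str:
    """Place an unannotated predicted intron relative to annotated object spans."""
    hits = [obj for obj in objects if overlaps(intron, obj)]
    if not hits:
        return "outside"   # case (4)
    if len(hits) > 1:
        return "links"     # case (3): touches two distinct objects
    start, end = hits[0]
    if start <= intron[0] and intron[1] <= end:
        return "within"    # case (2): inside the span of one object
    return "extends"       # case (1): runs past the edge of one object


vega = [(1_000, 5_000), (8_000, 12_000)]
for intron in [(2_000, 3_000), (4_000, 6_000), (4_500, 9_000), (6_000, 7_000)]:
    print(intron, classify_predicted_intron(intron, vega))

# Rounded shares of the 1255 predicted introns reported above.
counts = {"extends": 380, "within": 530, "links": 11, "outside": 334}
total = sum(counts.values())
shares = {k: round(100 * v / total) for k, v in counts.items()}
print(total, shares)
```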
Gene predictions outside of Havana-Gencode annotations: RT-PCRs on exon junctions
ENCODE region | Predictor | Identifier | Intron length (bp) | Intron type | Confirmed by RT-PCR*
ENr232 | Twinscan | chr9.128.008.a-intron-1 | 685 | intronic | 1
ENr333 | Ecgene:acembly | H20C3836.184.ENr333.-1-intron-2:H20C3836.185.ENr333.-1-intron-2:H20C3836.186.ENr333.-1-intron-2:H20C3836.187.ENr333.-1-intron-2:H20C3836.188.ENr333.-1-intron-2:H20C3836.189.ENr333.-1-intron-2:H20C3836.190.ENr333.-1-intron-2:H20C3836.191.ENr333.-1-intron-2 | 327 | intronic | 1
ENr334 | Geneid | chr6_676.1-intron-1 | 32543 | intronic | 1
ENr323 | Genscan | NT_025741.185.ENr323.-1-intron-7 | 8327 | intronic | 1
ENr231 | Acembly | PIK4CB.pDec03-intron-1 | 118 | intronic | 1
ENr231 | Acembly | TUFT1.cDec03-intron-2 | 425 | extending | 1
ENr334 | Ecgene | H6C6026.1-intron-1 | 183 | intergenic | 1
ENr333 | Ecgene:fgenesh:geneid:genscan:twinscan | C20000553.ENr333.-1-intron-3:H20C3776.1.ENr333.-1-intron-3:NT_028392.98.ENr333.-1-intron-3:chr20.35.017.a.ENr333.-1-intron-3:chr20_378.1.ENr333.-1-intron-3 | 148 | intronic | 2
ENr333 | Ecgene:fgenesh:geneid:genscan:twinscan | C20000553.ENr333.-1-intron-2:H20C3776.1.ENr333.-1-intron-2:NT_028392.98.ENr333.-1-intron-2:chr20.35.017.a.ENr333.-1-intron-2:chr20_378.1.ENr333.-1-intron-2 | 148 | intronic | 2
ENr334 | Ecgene | H6C6029.1-intron-1:H6C6029.2-intron-1 | 307 | extending | 2
ENr333 | Ecgene | H20C3671.96.ENr333.-1-intron-1 | 625 | intronic | 2
ENr223 | genscan_outof_VEGA | NT_007299.150.ENr223.-1-intron-5 | 38 | intronic | 1
ENm004 | ECgene_outof_VEGA:acembly_outof_VEGA | H22C3076.70.ENm004.-1-intron-2:PISD.uDec03.ENm004.-1-intron-2 | 2247 | intronic | 1
ENm004 | ECgene_outof_VEGA:acembly_outof_VEGA | H22C3172.3.ENm004.-1-intron-1:RFPL2.bDec03.ENm004.-1-intron-1 | 2156 | external | 1
ENm004 | geneid_outof_VEGA | chr22_380.1.ENm004.+1-intron-2 | 1807 | external | 1
ENm006 | ECgene_outof_VEGA | HXC8372.3.ENm006.+1-intron-1 | 1226 | intronic | 1
ENm006 | ECgene_outof_VEGA | HXC8602.5.ENm006.-1-intron-1 | 27179 | external | 1
ENm006 | genscan_outof_VEGA | NT_025307.19.ENm006.+1-intron-3 | 522 | external | 1
ENr223 | acembly_outof_VEGA | EIF3S6P1.aDec03.ENr223.+1-intron-1 | 6981 | external | 1
ENr223 | ECgene_outof_VEGA:acembly_outof_VEGA | EIF3S6P1.fDec03.ENr223.+1-intron-1:H6C8411.2.ENr223.+1-intron-1 | 35640 | external | 1
ENr223 | ECgene_outof_VEGA:acembly_outof_VEGA:ensEST_outof_VEGA | ENSESTT00000043495.0.ENr223.+1-intron-1:ENSESTT00000043496.0.ENr223.+1-intron-1:H6C8448.68.ENr223.+1-intron-1:H6C8448.69.ENr223.+1-intron-1:H6C8448.70.ENr223.+1-intron-1:H6C8448.71.ENr223.+1-intron-1:H6C8448.72.ENr223.+1-intron-1:H6C8448.73.ENr223.+1-intron-1:H6C8448.74.ENr223.+1-intron-1:H6C8448.75.ENr223.+1-intron-1:H6C8448.76.ENr223.+1-intron-1:H6C8448.77.ENr223.+1-intron-1:H6C8448.78.ENr223.+1-intron-1:H6C8448.79.ENr223.+1-intron-1:H6C8448.80.ENr223.+1-intron-1:H6C8448.81.ENr223.+1-intron-1:H6C8448.82.ENr223.+1-intron-1:H6C8448.83.ENr223.+1-intron-1:H6C8448.84.ENr223.+1-intron-1:H6C8448.85.ENr223.+1-intron-1:H6C8448.86.ENr223.+1-intron-1:H6C8448.87.ENr223.+1-intron-1:MTO1.jDec03.ENr223.+1-intron-1 | 4040 | intronic | 2
ENm004 | SGP_outof_VEGA | chr22_376.1.ENm004.-1-intron-1 | 414 | intronic | 2
ENr223 | geneid_outof_VEGA | chr6_921.1.ENr223.+1-intron-3 | 33810 | intronic | 2
ENm004 | genscan_outof_VEGA | NT_011520.388.ENm004.+1-intron-7 | 1764 | intergenic | 2
ENm004 | acembly_outof_VEGA | sparter.ENm004.+1-intron-1 | 3534 | intergenic | 2
* 1: RT-PCR successful; 2: RT-PCR provided a product with a different exon junction.
Gene predictions outside of Havana-Gencode annotations: the 31 last regions
- About 3500 introns predicted by standard programs (UCSC tracks) are outside the Havana-Gencode annotation (about 900 of them intergenic). Very few of these may correspond to real positives (=> need to prioritize).
- Additionally, the EGASP predictions add about 7000 other new introns (about 1000 intergenic).
Description of the annotations: gene density
Figure: number of loci per Mb (axis 0 to 60) in each of the 44 ENCODE regions, ENr111 to ENr334 and ENm001 to ENm014.
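A locus density in loci per Mb is the locus count divided by the region length in Mb. The figure does not state which locus types are counted; the Python sketch below assumes, for illustration only, Known + Novel protein + Novel transcript + Putative (pseudogenes, artefacts and TEC excluded), using three rows of the table.

```python
# (length in kb, known, novel protein, novel transcript, putative) from the table.
# Assumption: pseudogenes, artefacts and TEC are not counted as loci here.
REGIONS = {
    "ENm008": (500, 21, 2, 3, 1),
    "ENm009": (1001.592, 44, 1, 2, 1),
    "ENr333": (500, 16, 0, 1, 4),
}


def loci_per_mb(length_kb: float, *counts: int) -> float:
    """Number of loci divided by the region length in megabases."""
    return sum(counts) / (length_kb / 1000)


for name, (length_kb, *counts) in REGIONS.items():
    print(name, round(loci_per_mb(length_kb, *counts), 1))
```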
Description of the annotations: alternative splicing
Figure: (a) Alternative splicing: percentage of loci that are not alternatively spliced (single-exon gene), not alternatively spliced (several exons), or alternatively spliced, for known and novel CDS loci versus novel transcript and putative loci. (b) Number of transcripts per locus and (c) number of exons per transcript, for the same two groups.
Average: 4.2 transcripts per locus and 6.7 exons per transcript.
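The three splicing categories and the two averages can be computed from transcript structures. A Python sketch on invented toy loci, assuming that a locus counts as alternatively spliced whenever it has more than one distinct exon structure (a simplification: transcripts that differ only in their UTR ends also count).

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]          # (start, end), 1-based and inclusive
Transcript = Tuple[Exon, ...]   # exons in genomic order


def splicing_class(transcripts: List[Transcript]) -> str:
    """Assign one of the three categories of the 'Alternative splicing' panel."""
    structures = set(transcripts)
    if len(structures) > 1:
        return "locus alternatively spliced"
    (only,) = structures
    if len(only) == 1:
        return "not alternatively spliced, single-exon gene"
    return "not alternatively spliced, several exons"


def averages(loci: Dict[str, List[Transcript]]) -> Tuple[float, float]:
    """Mean transcripts per locus and mean exons per transcript."""
    transcripts = [t for ts in loci.values() for t in ts]
    per_locus = len(transcripts) / len(loci)
    per_transcript = sum(len(t) for t in transcripts) / len(transcripts)
    return per_locus, per_transcript


toy = {
    "A": [((1, 100),)],
    "B": [((1, 100), (200, 300))],
    "C": [((1, 100), (200, 300), (400, 500)), ((1, 100), (400, 500))],
}
for name, ts in toy.items():
    print(name, splicing_class(ts))
print(averages(toy))
```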
Description of the annotations: coding loci
424 coding loci in the 44 ENCODE regions; on average, 44.6% of the transcripts are annotated as coding.
Description of the annotations: lengths of exons, introns, CDS, UTRs…
Measure | Value
Mean exon length (all transcripts) | 235.55 bp
Mean intron length (all transcripts) | 4611.38 bp
Mean exon length (coding transcripts) | 238.16 bp
Mean CDS exon length | 143.66 bp
Mean CDS-UTR exon length | 667.20 bp
Mean UTR exon length | 238.75 bp
Mean locus length | 33210.48 bp
Mean intergenic length | 64808.53142 bp
Mean % NT covered | 32.57%
Figure: coding transcripts: mean exon length (all types), mean CDS exon length, mean CDS-UTR exon length and mean UTR exon length, in bp (axis 0 to 800).
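The "% NT covered" value is not defined on the poster. One plausible reading, roughly consistent with the mean locus and intergenic lengths, is the fraction of each region covered by locus spans. A Python sketch under that assumption, merging overlapping loci so that shared bases are counted once; coordinates are invented.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def percent_covered(loci: List[Interval], region_length: int) -> float:
    """Percentage of the region's bases that lie inside at least one locus."""
    covered = sum(end - start + 1 for start, end in merge(loci))
    return 100 * covered / region_length


loci = [(1, 100), (50, 150), (301, 400)]
print(merge(loci), percent_covered(loci, 1_000))
```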
Comparison between the Havana-Gencode annotation and other sets: ENSEMBL, REFSEQ, MGC, CCDS
Set | Nbr of genes | Nbr of transcripts
Havana-Gencode | 579 | 2632
REFSEQ | 386 | 539
ENSEMBL | 456 | 747
MGC | 266 | 424
CCDS | 280 | 334
Figure: Locus (number of loci only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.66, Havana/ENSEMBL SN=0.73, Havana/MGC SN=0.45, Havana/CCDS SN=0.49; REFSEQ/Havana SP=1, ENSEMBL/Havana SP=0.92, MGC/Havana SP=0.98, CCDS/Havana SP=1.
Figure: Locus, CDS only (number of loci only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.89, Havana/ENSEMBL SN=0.95, Havana/MGC SN=0.61, Havana/CCDS SN=0.65; REFSEQ/Havana SP=0.99, ENSEMBL/Havana SP=0.87, MGC/Havana SP=0.98, CCDS/Havana SP=1.
=> Most of the genes from the other sets are contained in the Havana-Gencode annotation (less so for ENSEMBL).
Gene level
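Each panel pairs a Havana/X SN with an X/Havana SP. The locus counts suggest SN = shared / Havana-Gencode features and SP = shared / features of the other set: for MGC, 0.98 × 266 ≈ 261 ≈ 0.45 × 579. A Python sketch of that computation using exact feature identity; for loci, "common" presumably means overlapping rather than identical, which this sketch does not model. The feature tuples are invented.

```python
def sn_sp(havana: set, other: set) -> tuple:
    """SN = shared / |Havana|, SP = shared / |other| (as the panels appear to use)."""
    shared = len(havana & other)
    return shared / len(havana), shared / len(other)


# Features as (chromosome, start, end, strand); the values are made up.
havana = {("chr7", 100, 200, "+"), ("chr7", 300, 400, "+"),
          ("chr7", 500, 600, "+"), ("chr7", 700, 800, "+")}
other = {("chr7", 300, 400, "+"), ("chr7", 500, 600, "+"),
         ("chr7", 900, 950, "+")}
print(sn_sp(havana, other))
```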
Figure: Transcripts (number of transcripts only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.026, Havana/ENSEMBL SN=0.027, Havana/MGC SN=0.006, Havana/CCDS SN=0.001; REFSEQ/Havana SP=0.126, ENSEMBL/Havana SP=0.096, MGC/Havana SP=0.038, CCDS/Havana SP=0.009.
Figure: Transcripts, CDS only (number of transcripts only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.423, Havana/ENSEMBL SN=0.418, Havana/MGC SN=0.283, Havana/CCDS SN=0.325; REFSEQ/Havana SP=0.712, ENSEMBL/Havana SP=0.511, MGC/Havana SP=0.568, CCDS/Havana SP=0.868.
=> Very few full transcripts are exactly identical; the coding parts of the transcripts are better conserved.
Transcript level
=> Few transcripts are exactly identical, but most transcripts from the other sets are included in Havana-Gencode transcripts, especially MGC transcripts (which are not as extended as the annotation).
Figure: schematic of a Havana-Gencode transcript and of transcripts from the other sets, either supporting or not supporting the annotated transcript.
Relaxed criterion: transcripts from the other sets may be included in, rather than identical to, Havana-Gencode transcripts.
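The strict and relaxed criteria can be stated on exon coordinates. A Python sketch, assuming "included" means that the other transcript's introns form a consecutive run of the Havana-Gencode transcript's introns and that its outer ends fall inside the corresponding Havana-Gencode exons. This is one reading of the schematic, not a documented definition; coordinates are invented.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based, inclusive, sorted by start


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Introns between consecutive exons."""
    return [(a[1] + 1, b[0] - 1) for a, b in zip(exons, exons[1:])]


def strict_match(other: List[Exon], ref: List[Exon]) -> bool:
    """Identical exon structure."""
    return other == ref


def relaxed_match(other: List[Exon], ref: List[Exon]) -> bool:
    """`other` fits inside `ref`: same consecutive introns, ends within ref exons."""
    o_introns, r_introns = introns(other), introns(ref)
    if not o_introns:  # single-exon transcript: must sit inside one ref exon
        start, end = other[0]
        return any(s <= start and end <= e for s, e in ref)
    n = len(o_introns)
    for k in range(len(r_introns) - n + 1):
        if r_introns[k:k + n] == o_introns:
            # The first exon may start later and the last exon may end earlier.
            return ref[k][0] <= other[0][0] and other[-1][1] <= ref[k + n][1]
    return False


ref = [(100, 200), (300, 400), (500, 600), (700, 800)]
shorter = [(150, 200), (300, 400), (500, 550)]
overhang = [(150, 200), (300, 450)]
print(strict_match(shorter, ref), relaxed_match(shorter, ref), relaxed_match(overhang, ref))
```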
Transcript level: relaxed criterion
Figure: Transcripts, relaxed criterion (number of transcripts only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.122, Havana/ENSEMBL SN=0.139, Havana/MGC SN=0.111, Havana/CCDS SN=0.139; REFSEQ/Havana SP=0.594, ENSEMBL/Havana SP=0.491, MGC/Havana SP=0.887, CCDS/Havana SP=0.491.
Figure: Transcripts, CDS only, relaxed criterion (number of transcripts only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.210, Havana/ENSEMBL SN=0.223, Havana/MGC SN=0.139, Havana/CCDS SN=0.152; REFSEQ/Havana SP=0.904, ENSEMBL/Havana SP=0.704, MGC/Havana SP=0.940, CCDS/Havana SP=0.961.
Figure: All exons (number of exons only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.37, Havana/ENSEMBL SN=0.4, Havana/MGC SN=0.2, Havana/CCDS SN=0.23; REFSEQ/Havana SP=0.85, ENSEMBL/Havana SP=0.75, MGC/Havana SP=0.76, CCDS/Havana SP=0.79.
Figure: All introns (number of introns only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.58, Havana/ENSEMBL SN=0.64, Havana/MGC SN=0.33, Havana/CCDS SN=0.39; REFSEQ/Havana SP=0.98, ENSEMBL/Havana SP=0.9, MGC/Havana SP=0.98, CCDS/Havana SP=1.
Figure: CDS exons (number of exons only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.71, Havana/ENSEMBL SN=0.75, Havana/MGC SN=0.42, Havana/CCDS SN=0.52; REFSEQ/Havana SP=0.96, ENSEMBL/Havana SP=0.85, MGC/Havana SP=0.95, CCDS/Havana SP=0.98.
Figure: CDS introns (number of introns only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.75, Havana/ENSEMBL SN=0.8, Havana/MGC SN=0.43, Havana/CCDS SN=0.55; REFSEQ/Havana SP=0.98, ENSEMBL/Havana SP=0.89, MGC/Havana SP=0.98, CCDS/Havana SP=0.99.
=> There are more common introns than common exons, which could be explained by the fact that most differences are in the UTRs (last exons).
Exon/intron level
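A toy example of why more introns than exons are shared: if two transcripts differ only in where their last exon ends (a UTR difference), every intron matches but the last exon does not. A Python sketch with invented coordinates:

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based and inclusive


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Introns between consecutive exons."""
    return [(a[1] + 1, b[0] - 1) for a, b in zip(exons, exons[1:])]


havana = [(100, 200), (300, 400), (500, 900)]  # last exon extends further
other = [(100, 200), (300, 400), (500, 650)]

shared_exons = set(havana) & set(other)
shared_introns = set(introns(havana)) & set(introns(other))
print(f"exons shared: {len(shared_exons)}/{len(havana)}")
print(f"introns shared: {len(shared_introns)}/{len(introns(havana))}")
```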
Figure: Nucleotide (number of nucleotides only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.53, Havana/ENSEMBL SN=0.57, Havana/MGC SN=0.27, Havana/CCDS SN=0.23; REFSEQ/Havana SP=0.98, ENSEMBL/Havana SP=0.95, MGC/Havana SP=0.99, CCDS/Havana SP=1.
Figure: Nucleotide, CDS only (number of nucleotides only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.82, Havana/ENSEMBL SN=0.88, Havana/MGC SN=0.43, Havana/CCDS SN=0.58; REFSEQ/Havana SP=0.99, ENSEMBL/Havana SP=0.92, MGC/Havana SP=0.98, CCDS/Havana SP=1.
Nucleotide level
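At nucleotide level, the features are individual annotated bases, so overlapping exons are counted once. A Python sketch, assuming SN = shared bases / Havana-Gencode bases and SP = shared bases / bases of the other set; coordinates are invented. Expanding intervals into position sets is simple but memory-hungry, and interval arithmetic would scale better to whole regions.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def bases(intervals: List[Interval]) -> Set[int]:
    """All positions covered by at least one interval."""
    return {pos for start, end in intervals for pos in range(start, end + 1)}


def nucleotide_sn_sp(havana: List[Interval],
                     other: List[Interval]) -> Tuple[float, float]:
    """Base-level SN and SP between two sets of exon intervals."""
    h, o = bases(havana), bases(other)
    shared = len(h & o)
    return shared / len(h), shared / len(o)


havana_exons = [(1, 100), (201, 300)]
other_exons = [(51, 100), (201, 250), (401, 410)]
print(nucleotide_sn_sp(havana_exons, other_exons))
```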
Conclusions
- The Havana-Gencode annotation is richer than the other data sets.
- REFSEQ, MGC and CCDS are almost completely contained in Havana-Gencode, especially CCDS (smaller set).
- ENSEMBL contains more “false positives” (bigger set).
- Transcripts from the other sets are less extended than Havana-Gencode transcripts, especially MGC transcripts (very few transcripts are completely identical).
Figure: Exon pairs (number of exon pairs only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.568, Havana/ENSEMBL SN=0.591, Havana/MGC SN=0.332, Havana/CCDS SN=0.366; REFSEQ/Havana SP=0.845, ENSEMBL/Havana SP=0.773, MGC/Havana SP=0.778, CCDS/Havana SP=0.784.
Figure: Exon pairs, relaxed criterion (number of exon pairs only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.666, Havana/ENSEMBL SN=0.702, Havana/MGC SN=0.414, Havana/CCDS SN=0.491; REFSEQ/Havana SP=0.960, ENSEMBL/Havana SP=0.888, MGC/Havana SP=0.975, CCDS/Havana SP=0.999.
Figure: Exon pairs, CDS only (number of exon pairs only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.805, Havana/ENSEMBL SN=0.825, Havana/MGC SN=0.474, Havana/CCDS SN=0.578; REFSEQ/Havana SP=0.961, ENSEMBL/Havana SP=0.862, MGC/Havana SP=0.956, CCDS/Havana SP=0.986.
Figure: Exon pairs, CDS only, relaxed criterion (number of exon pairs only in the other set, only in Havana, or common to both sets). Havana/REFSEQ SN=0.675, Havana/ENSEMBL SN=0.699, Havana/MGC SN=0.417, Havana/CCDS SN=0.699; REFSEQ/Havana SP=0.987, ENSEMBL/Havana SP=0.906, MGC/Havana SP=0.982, CCDS/Havana SP=0.906.
Exon pair level (exon-intron-exon)
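At exon-pair level, each unit is two consecutive exons of a transcript together with the intron between them. A Python sketch on invented transcripts, assuming exact coordinate matching (the strict criterion), SN = shared pairs / Havana-Gencode pairs and SP = shared pairs / pairs of the other set:

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based and inclusive
Pair = Tuple[Exon, Exon]


def exon_pairs(transcripts: List[List[Exon]]) -> Set[Pair]:
    """Distinct pairs of consecutive exons (exon-intron-exon units)."""
    pairs: Set[Pair] = set()
    for exons in transcripts:
        pairs.update(zip(exons, exons[1:]))
    return pairs


havana = [[(100, 200), (300, 400), (500, 600)],
          [(100, 200), (500, 600)]]          # exon-skipping isoform
other = [[(100, 200), (300, 400), (500, 650)]]

h, o = exon_pairs(havana), exon_pairs(other)
shared = h & o
print(len(h), len(o), len(shared))
print("SN", len(shared) / len(h), "SP", len(shared) / len(o))
```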