RESEARCH at
GENOME BIOINFORMATICS LAB
Josep F. Abril Ferrando and
Genís Parra Farré
Genome BioInformatics Research Lab
RGBI @ ( IMIM – UPF – CRG )
Introduction
Visualization of Genomic Annotations
Comparative Genomics
Human and Mouse Genomes
Exon Structural Selection
BIOINFORMÀTICA UPF T23 – 2003/03/06 – J.F. Abril and G. Parra @ Genome BioInformatics Lab – RGBI (IMIM-UPF-CRG)
SUMMARY
Computational Analysis of Genomic Sequences
Sequencing: DNA SEQUENCE
Assembling: ASSEMBLED SEQUENCE
Analyzing: ANNOTATED SEQUENCE
From Genes to Genomes: Single Genes
From Genes to Genomes: Chromosomes
From Genes to Genomes: Whole Genomes
Comparative Genomics: Single Genes
Comparative Genomics: Syntenic Regions
Programming in PostScript (I)
%!PS
%
%% Variable Definition: $counter = 0
/counter 0 def
%
%% Function Definition: sub box(x,y) {...}
/box { %%% y x box
gsave %
20 mul % y X
0 % y X 0
moveto % y
20 mul % Y
dup % Y Y
10 0 % Y Y 10 0
rlineto % Y Y
0 % Y Y 0
exch % Y 0 Y
rlineto % Y
-10 0 % Y -10 0
rlineto % Y
neg % -Y
0 % -Y 0
exch % 0 -Y
rlineto %
closepath %
0 1 0 % 0 1 0
setrgbcolor % "green-color"
fill %
grestore %
} def %
Vector Graphics Language
Postfix Notation
Stacks: exec, paths, dicts, ...
Dictionaries: Identifier -> Object
%
%% Initialization
100 100 translate % New Coords Origin
2 5 scale % Re-scaling: x-axis*2
% % y-axis*5
%
%% BaseLine
gsave %
0 0 moveto %
90 0 lineto %
0 setgray %
1 setlinewidth %
stroke %
grestore %
Programming in PostScript (II)
%
%% Main Loop
mark % mark
0.25 0.35 0.15 % mark 0.25 0.35 0.15
counttomark % mark 0.25 0.35 0.15 3
{ %%%%%%%%%%%%%% begin loop (x3)
/counter %%
counter %%
1 add %%
def %% $counter = $counter + 1
counter %
% 1st loop: mark 0.25 0.35 0.15 counter==1
% 2nd loop: mark 0.25 0.35 counter==2
% 3rd loop: mark 0.25 counter==3
box % mark ...
} repeat %%%%%%%%%%%%%% finish loop (x3)
pop % clean up stack (removes "mark")
%
showpage
%%EOF%%
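The stack comments in the listing can be checked with a small model outside PostScript. The Python sketch below is an illustration only and not part of the original slides: `box_corners` and `run_main_loop` are hypothetical names, the constants (20 units per step, bars 10 units wide) are taken from the listing, and the `translate`/`scale` setup is ignored. It consumes one height per iteration, as `repeat` does, and returns the corner points that each call to `box` would trace.

```python
def box_corners(y, x, unit=20, width=10):
    """Corner points traced by `y x box`, in user-space units before scaling."""
    left = x * unit      # "20 mul" applied to x
    height = y * unit    # "20 mul" applied to y
    return [
        (left, 0),               # moveto
        (left + width, 0),       # 10 0 rlineto
        (left + width, height),  # 0 Y rlineto
        (left, height),          # -10 0 rlineto
    ]                            # 0 -Y rlineto goes back to the start; closepath


def run_main_loop(heights):
    """Mimic: mark h1 h2 ... counttomark { counter += 1; counter box } repeat pop."""
    stack = ["mark"] + list(heights)
    counter = 0
    boxes = []
    for _ in range(len(heights)):    # counttomark ... repeat
        counter += 1                 # /counter counter 1 add def
        y = stack.pop()              # box consumes the topmost height
        boxes.append(box_corners(y, counter))
    assert stack.pop() == "mark"     # the final pop removes the mark
    return boxes


if __name__ == "__main__":
    for corners in run_main_loop([0.25, 0.35, 0.15]):
        print(corners)
```

Because the last value pushed is the first one consumed, the 0.15 bar is drawn first (counter 1) and the 0.25 bar last (counter 3).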
GFF2PS and GFF2APLOT
Visualizing Genomic Annotations
J.F. Abril and R. Guigó.
"gff2ps: visualizing genomic annotations"
Bioinformatics 16(8):743-744 (2000).
M.G. Reese, G. Hartzell, N.L. Harris, U. Ohler, J.F. Abril and S.E. Lewis.
"Genome Annotation Assessment in Drosophila melanogaster"
Genome Research 10(4):483-501 (2000).
M.D. Adams et al. (including J.F. Abril).
"The Genome Sequence of Drosophila melanogaster"
Science 287(5461):2185-2195 (2000).
J.C. Venter et al. (including J.F. Abril and R. Guigó).
"The Sequence of the Human Genome"
Science 291(5507):1304-1351 (2001).
R.A. Holt et al. (including J.F. Abril and R. Guigó).
"The Genome Sequence of the Malaria Mosquito Anopheles gambiae"
Science 298(5591):129-149 (2002).
http://genome.imim.es/software/gfftools/GFF2PS.html
Whole Genome Gene-Finding
[Figure labels: Homo sapiens; GENES; ab initio; DATABASE; homology]
Whole Genome Gene-Finding: Comparative Approach
Whole Genome Gene-Finding: Comparative Approach
[Figure labels: Homo sapiens; Mus musculus; GENES; gene prediction; homology]
Whole Genome Gene-Finding Results Analysis
Human and Mouse Comparative Genomics
Mouse Genome Sequencing Consortium (including J.F. Abril, G. Parra and R. Guigó).
"Initial sequencing and comparative analysis of the mouse genome"
Nature 420(6915):520-562 (2002).
G. Parra, P. Agarwal, J.F. Abril, T. Wiehe, J.W. Fickett and R. Guigó.
"Comparative gene prediction in human and mouse"
Genome Research 13(1):108-117 (2003).
R. Guigó, E.T. Dermitzakis, P. Agarwal, C.P. Ponting, G. Parra, A. Reymond, J.F. Abril, E. Keibler, R. Lyle, C. Ucla, S.E. Antonarakis and M.R. Brent.
"Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes"
PNAS 100(3):1140-1145 (2003).
Predicting “Novel” Genes in the Mouse Genome (I)
[Figure labels: golden path annotations; additional blastn matches to ENSEMBL + REFSEQ]
Predicting “Novel” Genes in the Mouse Genome (II)
[Figure labels: tblastx; geneid exons; sgp genes]
Homology and Gene Structure Filtering
[Figure labels: additional blastn matches to ENSEMBL + REFSEQ; Homo sapiens Predictions; Mus musculus Predictions; Homology (Blastp); Structural Alignment (Exstral); GENES; Enriched Pool]
Exon Structure over an Alignment
RT-PCR Validation
           Number of predictions   Tested   Success Rate
Enriched   1428                    214      62.15%
Similar    2125                    38       10.53%
Other      3659                    63       3.17%
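The success rates in the table can be related back to whole numbers of positive RT-PCR assays. The Python sketch below is a consistency check under stated assumptions, not data from the slides: it assumes the success rate is positives divided by tested predictions, and the positive counts it derives are inferred by rounding rather than reported values. The names `TABLE`, `implied_positives` and `recomputed_rate` are illustrative.

```python
# Rows as shown on the slide: (set, number of predictions, tested, success rate in %).
TABLE = [
    ("Enriched", 1428, 214, 62.15),
    ("Similar", 2125, 38, 10.53),
    ("Other", 3659, 63, 3.17),
]


def implied_positives(tested, rate_percent):
    """Nearest whole number of positives matching the rate (inferred, not reported)."""
    return round(tested * rate_percent / 100)


def recomputed_rate(positives, tested):
    """Success rate in percent, rounded to two decimals as on the slide."""
    return round(100 * positives / tested, 2)


if __name__ == "__main__":
    for name, predicted, tested, rate in TABLE:
        positives = implied_positives(tested, rate)
        share_tested = 100 * tested / predicted
        print(name, positives, recomputed_rate(positives, tested),
              f"{share_tested:.1f}% of predictions tested")
```

Under that assumption, each recomputed rate matches the percentage on the slide, which suggests the table is internally consistent.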
Results of the Experimental Validation
Example of a Bash Script
http://genome.imim.es/