Data integration and visualization
http://genomeview.org/
http://www.broadinstitute.org/igv/
Stand-alone genome browsers
http://genome.ucsc.edu/
http://www.ensembl.org/
http://www.ncbi.nlm.nih.gov/mapview/
Web-based genome browsers
Web-based Genome Browsers
• Software designed to enable a user to access and display genome sequence data
• Visual integration and correlation of different types of information
• Organize large amounts of genome sequence data
• UCSC, Ensembl and NCBI are based on the same reference genome assemblies
• The three genome browsers differ mainly in their interface and in the annotations available
• Some genomes are available in one genome browser but not in the others
UCSC genome browser http://genome.ucsc.edu/
88 species annotated in the UCSC genome browser:
• Mammals
• Vertebrates
• Deuterostomes
• Nematodes
• Other (sea hare, yeast)
No plants
UCSC genome browser
• Allows aligning sequences to the genome via BLAT
• Table Browser
• Creation of PDF images of the current view
• Provides access to all the data produced by the project, and to the software used to analyze and present it
• Site produces and maintains annotation tracks
Annotation tracks
• Genomic data: known genes, predicted genes, ESTs, mRNAs, CpG islands, assembly gaps and coverage, chromosomal bands, homologies, ChIP-seq data, DNase-seq data, expression data, …
• Annotation tracks are both computed at UCSC from publicly available sequence data and provided by collaborators
• Users can also add their own custom tracks to the browser
UCSC genome browser layout http://genome.ucsc.edu/
Current chromosomal location, navigation and search bar.
UCSC genome browser http://genome.ucsc.edu/
Graphical visualization of the chromosome and current view location
UCSC genome browser layout http://genome.ucsc.edu/
Annotation tracks
Genes
SNPs
H3K27Ac (epigenetic marker)
ChIP-Seq data (TF binding sites)
Conservation of sequence in mammals (PhyloP)
Multiple alignments
Repeated sequences
Tracks available
Below the main view of the UCSC genome browser many more tracks are available. Tracks are grouped in:
• Mapping and sequencing tracks
• Phenotype and Disease Associations
• Genes and Gene Prediction Tracks
• Literature
• mRNA and EST Tracks
• Expression
• Regulation
• Comparative Genomics
• Neandertal Assembly and Analysis
• Denisova Assembly and Analysis
• Variation and Repeats
Track groups
Different visualization options:
- Hide
- Dense
- Full
- Squish
- Pack
Get DNA for
Right-clicking on a feature (for example a gene) opens the feature menu. Selecting “Get DNA for [gene name]” downloads the gene sequence.
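The same operation can be reproduced offline once the chromosome sequence is available: slice the sequence with BED-style (0-based, half-open) coordinates and reverse-complement features on the minus strand. A minimal sketch, assuming a small in-memory genome dictionary; `get_dna` and the toy chromosome `chrT` are hypothetical, and real data would be read from a FASTA file:

```python
# Complement table for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def get_dna(genome: dict, chrom: str, start: int, end: int, strand: str = "+") -> str:
    """Return the sequence of [start, end) on chrom, 0-based half-open as in BED.

    Minus-strand features are returned as the reverse complement.
    """
    if not 0 <= start <= end <= len(genome[chrom]):
        raise ValueError("coordinates out of range")
    seq = genome[chrom][start:end]
    return reverse_complement(seq) if strand == "-" else seq


if __name__ == "__main__":
    genome = {"chrT": "AAACCCGGGTTT"}  # hypothetical toy chromosome
    print(get_dna(genome, "chrT", 3, 9))       # CCCGGG
    print(get_dna(genome, "chrT", 0, 4, "-"))  # GTTT
```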
Track formats
• BED: flexible format to define data lines that are displayed in an annotation track
• GTF: Gene Transfer Format, generally used to display gene annotation data
• WIG: used to display continuous-valued data
• BAM: standard alignment format produced by NGS sequence aligners
• VCF: Variant Call Format, used to display sequence variants
• Many more…
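WIG data are written in blocks. In the fixedStep variant, a declaration line gives the chromosome, a 1-based start position and a step, and is followed by one value per line. A minimal sketch that formats evenly spaced values this way; the function name is hypothetical, and the variableStep variant is not covered:

```python
from typing import Optional, Sequence


def to_fixed_step(chrom: str, start: int, step: int,
                  values: Sequence[float], span: Optional[int] = None) -> str:
    """Format evenly spaced values as one fixedStep WIG block.

    start is 1-based, as in the WIG fixedStep declaration line.
    span (optional) is the number of bases each value covers.
    """
    header = f"fixedStep chrom={chrom} start={start} step={step}"
    if span is not None:
        header += f" span={span}"
    # One value per line; ':g' drops trailing zeros (1.0 -> "1").
    return "\n".join([header] + [f"{v:g}" for v in values])


print(to_fixed_step("chr7", 127471197, 100, [0.5, 1.0, 2.25]))
```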
BED file format
Name Description
chrom* The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).
chromStart* The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
chromEnd* The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the feature (coordinates are 0-based, half-open).
name Defines the name of the BED line.
score A score between 0 and 1000.
strand Defines the strand - either '+' or '-'.
thickStart The starting position at which the feature is drawn thickly (for example, the start codon in gene displays).
thickEnd The ending position at which the feature is drawn thickly (for example, the stop codon in gene displays).
itemRgb An RGB value of the form R,G,B (e.g. 255,0,0).
blockCount The number of blocks (exons) in the BED line.
blockSizes A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.
blockStarts A comma-separated list of block starts, relative to chromStart. The number of items in this list should correspond to blockCount.
* Required fields
track name=junctions description="TopHat junctions"
test_chromosome 180 402 JUNC00000001 46 + 180 402 255,0,0 2 70,52 0,170
test_chromosome 349 550 JUNC00000002 38 + 349 550 255,0,0 2 51,50 0,151
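blockCount, blockSizes and blockStarts describe the pieces of a feature relative to chromStart. The sketch below converts them to absolute coordinates and, for a two-block TopHat junction line such as JUNC00000001, takes the gap between the two blocks as the intron. The function names are hypothetical; the check that blocks start at chromStart and end at chromEnd follows the UCSC BED rules:

```python
def block_intervals(fields):
    """Return absolute (start, end) pairs for each block of a 12-column BED line."""
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    count = int(fields[9])
    # The list fields may end with a trailing comma, so skip empty items.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != count or len(starts) != count:
        raise ValueError("blockSizes/blockStarts length differs from blockCount")
    # blockStarts are offsets from chromStart.
    blocks = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    # First block must begin at chromStart, last block must end at chromEnd.
    if starts[0] != 0 or blocks[-1][1] != chrom_end:
        raise ValueError("blocks do not span chromStart..chromEnd")
    return blocks


def junction_intron(fields):
    """For a two-block junction line, return the gap between the blocks."""
    (_, left_end), (right_start, _) = block_intervals(fields)
    return left_end, right_start


line = "test_chromosome 180 402 JUNC00000001 46 + 180 402 255,0,0 2 70,52 0,170"
print(block_intervals(line.split()))  # [(180, 250), (350, 402)]
print(junction_intron(line.split()))  # (250, 350)
```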
Flexible format, easy to generate from BLAST and BLAT output…
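As an illustration of adapting alignment output, the sketch below converts one line of BLAST tabular output (`-outfmt 6`, default 12 columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore) into a 6-column BED line. BLAST reports 1-based, inclusive subject coordinates and gives sstart > send for minus-strand hits, while BED needs 0-based, half-open coordinates. Using percent identity times 10 as the BED score is an arbitrary choice for this sketch, not a convention, and the function name is hypothetical:

```python
def blast_hit_to_bed(line: str) -> str:
    """Convert one BLAST -outfmt 6 line into a 6-column BED line (tab-separated)."""
    cols = line.rstrip("\n").split("\t")
    qseqid, sseqid = cols[0], cols[1]
    pident = float(cols[2])
    sstart, send = int(cols[8]), int(cols[9])
    # Minus-strand hits are reported with sstart > send.
    strand = "+" if sstart <= send else "-"
    lo, hi = min(sstart, send), max(sstart, send)
    # Assumed scoring: percent identity scaled to the BED 0-1000 range.
    score = min(1000, round(pident * 10))
    # 1-based inclusive [lo, hi] -> 0-based half-open [lo - 1, hi).
    return "\t".join([sseqid, str(lo - 1), str(hi), qseqid, str(score), strand])


hit = "query2\tchr7\t100.00\t50\t0\t0\t1\t50\t500\t451\t1e-20\t92.5"
print(blast_hit_to_bed(hit))  # chr7  450  500  query2  1000  -   (tab-separated)
```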
BED file example
chr7 127471196 127472363 Pos1 0 +
chr7 127472363 127473530 Pos2 100 +
chr7 127473530 127474697 Pos3 100 +
chr7 127474697 127475864 Pos4 1000 +
chr7 127475864 127477031 Neg1 1000 -
chr7 127477031 127478198 Neg2 0 -
chr7 127478198 127479365 Neg3 0 -
chr7 127479365 127480532 Pos5 0 +
chr7 127480532 127481699 Neg4 0 -
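A minimal parser for the 6-column example above, assuming whitespace-separated columns and skipping track, browser and comment lines. `BedRecord` and `parse_bed` are hypothetical names; the parsed records show that each feature is 1,167 bp long and that five are on the + strand and four on the - strand:

```python
from dataclasses import dataclass

EXAMPLE = """\
chr7 127471196 127472363 Pos1 0 +
chr7 127472363 127473530 Pos2 100 +
chr7 127473530 127474697 Pos3 100 +
chr7 127474697 127475864 Pos4 1000 +
chr7 127475864 127477031 Neg1 1000 -
chr7 127477031 127478198 Neg2 0 -
chr7 127478198 127479365 Neg3 0 -
chr7 127479365 127480532 Pos5 0 +
chr7 127480532 127481699 Neg4 0 -
"""


@dataclass
class BedRecord:
    chrom: str
    start: int          # 0-based, included
    end: int            # excluded
    name: str = ""
    score: int = 0
    strand: str = "."

    @property
    def length(self) -> int:
        return self.end - self.start


def parse_bed(lines):
    """Parse BED lines with 3 to 6 columns into BedRecord objects."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"BED line needs at least 3 fields: {line!r}")
        rec = BedRecord(fields[0], int(fields[1]), int(fields[2]))
        if rec.start < 0 or rec.end < rec.start:
            raise ValueError(f"invalid interval: {line!r}")
        if len(fields) > 3:
            rec.name = fields[3]
        if len(fields) > 4:
            rec.score = int(fields[4])
        if len(fields) > 5:
            rec.strand = fields[5]
        records.append(rec)
    return records


records = parse_bed(EXAMPLE.splitlines())
print(len(records), sum(r.strand == "+" for r in records))  # 9 5
```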
Display of custom tracks
By right-clicking on the track it’s possible to change how it is displayed:
Display of custom tracks By clicking on the name of the track it’s possible to configure how the track is displayed
Display of custom tracks
For example, it’s possible to change the track name and description, and also to color the track differently depending on the strand:
track name='my data' description='this is an example track' colorByStrand="255,0,0 0,0,255"
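colorByStrand takes two RGB triplets: the first is used for features on the + strand, the second for features on the - strand. The sketch below parses the track line above and assigns a color to each strand. Splitting with Python's `shlex` only approximates how the browser reads quoted values, and all function names are hypothetical:

```python
import shlex

TRACK_LINE = (
    "track name='my data' description='this is an example track' "
    'colorByStrand="255,0,0 0,0,255"'
)


def parse_track_line(line):
    """Split a track line into a dict of key=value options (quotes removed)."""
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track line")
    return dict(tok.split("=", 1) for tok in tokens[1:])


def _rgb(text):
    """Parse 'R,G,B' into an (R, G, B) tuple of ints."""
    r, g, b = (int(c) for c in text.split(","))
    return (r, g, b)


def strand_colors(value):
    """First triplet colors the + strand, second triplet the - strand."""
    plus, minus = value.split()
    return {"+": _rgb(plus), "-": _rgb(minus)}


def feature_color(strand, colors, default=(0, 0, 0)):
    """Color for a feature; features without a strand get the default."""
    return colors.get(strand, default)


opts = parse_track_line(TRACK_LINE)
colors = strand_colors(opts["colorByStrand"])
print(opts["name"], colors)  # my data {'+': (255, 0, 0), '-': (0, 0, 255)}
```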
Display of custom tracks
track name='my data' description='this is an example track' useScore=1
With the useScore=1 option, features are shown in shades of gray according to the value in the score column: the higher the score, the darker the feature.
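The idea behind useScore=1 is a mapping from the 0-1000 score to a gray level. The browser uses its own fixed score ranges (described in UCSC's BED format documentation); the sketch below replaces them with a simple linear binning into nine shades, so the exact gray values are illustrative assumptions, not what UCSC draws:

```python
LIGHTEST = 224  # assumed lightest gray (0 = black, 255 = white)
LEVELS = 9      # assumed number of gray shades


def gray_for_score(score: int) -> int:
    """Map a BED score (0-1000) to a gray value; higher scores are darker."""
    if not 0 <= score <= 1000:
        raise ValueError("BED score must be between 0 and 1000")
    # Linear binning: level 0 (lightest) .. LEVELS - 1 (darkest).
    level = score * (LEVELS - 1) // 1000
    return LIGHTEST - level * LIGHTEST // (LEVELS - 1)


for s in (0, 100, 500, 1000):
    print(s, gray_for_score(s))
```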
Extracting information with the Table Browser
All the data displayed in the UCSC genome browser is stored in a public MySQL server (relational database), which can be accessed:
• directly, with the standard mysql client
• through the Table Browser
The Table Browser is useful to retrieve the data associated with a track in text format, to calculate intersections between tracks, and to retrieve the DNA sequence covered by a track.
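A minimal sketch of direct database access from Python, assuming the third-party PyMySQL driver, the public read-only `genome` account and the host name below (an assumption; check UCSC's current documentation for the host). The `refGene` table and its columns are used for illustration and the function names are hypothetical. Only `region_query` runs without network access:

```python
HOST = "genome-mysql.soe.ucsc.edu"  # assumed public host; verify before use
USER = "genome"                     # public read-only account, no password


def region_query(table: str = "refGene") -> str:
    """SQL for transcripts overlapping a half-open region [start, end)."""
    # Table names cannot be query parameters, so restrict them to identifiers.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    return (f"SELECT name, name2, chrom, strand, txStart, txEnd FROM {table} "
            "WHERE chrom = %s AND txStart < %s AND txEnd > %s")


def fetch_genes(db: str, chrom: str, start: int, end: int, table: str = "refGene"):
    """Run the overlap query against the public server (needs network access)."""
    import pymysql  # third-party driver, imported here so region_query works without it

    conn = pymysql.connect(host=HOST, user=USER, database=db)
    try:
        with conn.cursor() as cur:
            # Overlap test for half-open intervals: txStart < end and txEnd > start.
            cur.execute(region_query(table), (chrom, end, start))
            return cur.fetchall()
    finally:
        conn.close()


print(region_query())
# Example (requires network): fetch_genes("hg19", "chr7", 127471196, 127481699)
```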
Extracting information with the Table Browser
The Table Browser also allows intersecting the selected track with another track.
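The core of a track intersection is an overlap test between 0-based, half-open intervals on the same chromosome. The Table Browser offers more options than this; the sketch below only returns the overlapping pieces, uses hypothetical names, and is quadratic in the worst case, which is fine for small tracks:

```python
from collections import defaultdict


def intersect(track_a, track_b):
    """Return (chrom, start, end) pieces where intervals of track_a and track_b overlap.

    Both tracks are lists of (chrom, start, end), 0-based half-open.
    """
    by_chrom_b = defaultdict(list)
    for chrom, s, e in track_b:
        by_chrom_b[chrom].append((s, e))
    for intervals in by_chrom_b.values():
        intervals.sort()
    out = []
    for chrom, s, e in sorted(track_a):
        for bs, be in by_chrom_b.get(chrom, []):
            if bs >= e:
                break  # b is sorted by start: no later interval can overlap
            lo, hi = max(s, bs), min(e, be)
            if lo < hi:
                out.append((chrom, lo, hi))
    return out


a = [("chr7", 127471196, 127473530)]
b = [("chr7", 127473000, 127474000), ("chr7", 127472000, 127472500), ("chr1", 0, 10)]
print(intersect(a, b))
# [('chr7', 127472000, 127472500), ('chr7', 127473000, 127473530)]
```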
Ensembl http://www.ensembl.org
71 species in the Ensembl database
Includes automatic annotation of eukaryotic genomes produced by Ensembl
EnsemblGenomes http://www.ensemblgenomes.org/
Extension to standard Ensembl:
• Ensembl Bacteria
• Ensembl Fungi
• Ensembl Metazoa
• Ensembl Plants
• Ensembl Protists
Ensembl genome browser layout
Chromosome view
Region view
Clicking on a gene name shows more information and links
Configuring displayed tracks
By clicking on the “Configure this page” button it’s possible to configure the tracks shown
NCBI Map Viewer
NCBI is also a source of data for Ensembl and UCSC.
Genomes available:
• 28 vertebrates
• 17 invertebrates
• 19 protozoa
• 118 plants
http://www.ncbi.nlm.nih.gov/mapview/
NCBI Map Viewer layout
Chromosome selection
Chromosome ideogram
UniGene clusters
Zoom
“Navigation” ideogram
Stand-alone Genome Browsers
Integrative Genomics Viewer
- Stand-alone (Java)
- Easily configurable
- Useful when you want to work on your own data
- No need to upload the data to public servers
- Memory limited to the Java instance: memory-hungry if many tracks are loaded
http://www.broadinstitute.org/igv/