Genome Browser
The Plot
Deepak Purushotham, Hamid Reza Hassanzadeh, Haozheng Tian, Juliette Zerick, Lavanya Rishishwar, Piyush Ranjan, Lu Wang
Why A Genome Browser?
I want to analyze this organism:
• Gene Functions
• Protein Domains
• Metabolic Pathways
• Comparative Analysis
• Synteny
The Genome Browser
“Genome browsers facilitate genomic analysis by presenting alignment, experimental and annotation data in the context of genomic DNA sequences.”
Melissa S Cline & James W Kent, 2009
Genome browsers aggregate data
Taken from Andy Conley’s slides without permission
A Brief Time Travel
• FlyBase, SGD, MGD, and WormBase: four model organism databases (MODs)
• Setting up an MOD is expensive and time-consuming.
• The four MODs agreed in the fall of 2000 to pool their resources and to make reusable components available to the community free of charge under an open source license.
• The goal of this NIH-funded project, christened GMOD, is “…to generate a model organism database construction set that would allow a new model organism to be assembled by mixing and matching various components.”
JBrowse
• Smooth, fast navigation (think Google Maps for genomes)
• Supports BED, GFF, Bio::DB::*, Chado, WIG, BAM, UCSC (intron/exon structure, name lookups, quantitative plots)
• Relies on pre-indexing to minimize security exposure and runtime bandwidth/CPU load on the server (future versions are more likely to do some server-side work at runtime)
• Has an API for customized track/glyph extensions
• Is stably funded by NHGRI, with many interesting innovations implemented and pending integration
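The pre-indexing idea can be illustrated with a minimal sketch: features are bucketed into fixed-size bins ahead of time, so a region request only touches the few bins it overlaps instead of scanning every feature. The bin size, tuple layout, and function names are assumptions for illustration; this is not JBrowse's actual on-disk format.

```python
from collections import defaultdict

# Hypothetical bin size chosen for illustration only.
BIN_SIZE = 10_000


def build_index(features, bin_size=BIN_SIZE):
    """Pre-index (name, start, end) features into fixed-size bins.

    Each feature is stored in every bin its span touches, so a later
    query only needs to read the bins overlapping the requested region.
    """
    index = defaultdict(list)
    for name, start, end in features:
        for b in range(start // bin_size, end // bin_size + 1):
            index[b].append((name, start, end))
    return index


def query(index, start, end, bin_size=BIN_SIZE):
    """Return features overlapping [start, end], reading only the relevant bins."""
    seen = set()
    hits = []
    for b in range(start // bin_size, end // bin_size + 1):
        for feat in index.get(b, []):  # .get avoids creating empty bins
            _, fs, fe = feat
            # Keep true overlaps; a feature spanning several bins is reported once.
            if fs <= end and fe >= start and feat not in seen:
                seen.add(feat)
                hits.append(feat)
    return hits
```

Because the bins are built once, the per-request work grows with the size of the viewed region rather than with the size of the whole annotation set.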
First look: Live Demo
A couple of JBrowses around the web:
• http://intron.ccam.uchc.edu/JBrowse/Dmel/
• http://jbrowse.org/ucsc/hg19/
Cons
• No support for user-uploaded data
• Slow for large numbers of reference sequences (e.g., 5,000 annotated contigs)
• Few glyph options; feature tracks are limited by being drawn as HTML <div> elements
GBrowse
• Most popular web-based genome browser
• Visualizes genome features along a reference sequence
• Open source
• Highly customizable
• Excellent usability
• Rich set of “glyphs”:
  – Genome features
  – Quantitative data
  – Sequence alignments
Under The Hood
• Client-Server Architecture
• GBrowse Architecture
• Installation Issues
• Input Data
• Configuration File
• Customization
Client-Server Architecture
3. The web server receives the request and “serves” the client, i.e., starts GBrowse.
4. On success, the relevant hypertext and multimedia content is generated by accessing the database.
6. The whole process repeats when the user interacts with the browser.
©2002 by Cold Spring Harbor Laboratory Press
Stein L D et al. Genome Res. 2002;12:1599-1610
GBrowse Architecture
Data file (.gff3): nine tab-separated columns
1. Reference sequence (chromosome/clone/contig)
2. Source, e.g., Prodigal/Glimmer
3. Type (Sequence Ontology (SO) terms)
4. Start
5. End
6. Score, e.g., E-value
7. Strand
8. Phase (0/1/2)
9. Attributes, format: tag=value
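Reading the columns above mostly means splitting a line on tabs. A minimal sketch, assuming well-formed input; the attributes column is kept as a raw string here, and the contig name, gene ID, and E-value used in the test are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GffFeature:
    seqid: str               # reference sequence (chromosome/clone/contig)
    source: str              # e.g. Prodigal or Glimmer
    type: str                # Sequence Ontology term, e.g. "CDS"
    start: int               # 1-based, inclusive
    end: int                 # 1-based, inclusive
    score: Optional[float]   # e.g. an E-value; None when the column is "."
    strand: str              # "+", "-", "." or "?"
    phase: Optional[int]     # 0/1/2 for CDS features; None when "."
    attributes: str          # raw "tag=value;tag=value" string


def parse_gff3_line(line: str) -> Optional[GffFeature]:
    """Split one GFF3 data line into its nine tab-separated columns."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None  # skip blank lines, comments, and ## directives
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    return GffFeature(
        seqid=seqid,
        source=source,
        type=ftype,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        phase=None if phase == "." else int(phase),
        attributes=attrs,
    )
```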
Attributes (Data file)
Different tags have predefined meanings:
• ID: Gives the feature a unique identifier. Useful when grouping features together (such as all the exons in a transcript).
• Name: Display name for the feature. This is the name to be displayed to the user.
• Alias: A secondary name for the feature. It is suggested that this tag be used whenever a secondary identifier for the feature is needed, such as locus names and accession numbers.
• Note: A descriptive note to be attached to the feature. This will be displayed as the feature's description.
Alias and Note fields can have multiple values separated by commas. For example: Alias=M19211,gna-12,GAMMA-GLOBULIN
• Other custom tag=value pairs can also go into the attributes field.
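The attribute rules above (tag=value pairs separated by semicolons, commas separating multiple values) translate into a small parser. A sketch that, following the slide, treats only Alias and Note as multi-valued; the GFF3 specification also allows lists for some other tags, such as Parent, so a real loader would need a broader rule. Values are percent-decoded because GFF3 escapes reserved characters such as commas that way.

```python
from urllib.parse import unquote

# Assumption taken from the slide: only these tags hold comma-separated lists.
MULTI_VALUE_TAGS = {"Alias", "Note"}


def parse_attributes(column9: str) -> dict:
    """Parse a GFF3 attributes column (tag=value;tag=value) into a dict."""
    attrs = {}
    if column9 in ("", "."):
        return attrs  # "." means no attributes
    for pair in column9.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        if tag in MULTI_VALUE_TAGS:
            # Split first, then decode, so an escaped comma (%2C) stays in one value.
            attrs[tag] = [unquote(v) for v in value.split(",")]
        else:
            attrs[tag] = unquote(value)
    return attrs
```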
GBrowse Configuration File
• Global Website Settings
• Additional HTML Pages
• JavaScript
• jQuery
• Global Database Settings
• Data Source Definitions
Making a new Track
### TRACK CONFIGURATION ###
[ExampleFeatures]
feature = remark
glyph = generic
stranded = 1
bgcolor = orange
height = 10
key = Example Features
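When a project needs many similar tracks (for example one per evidence source), stanzas like the one above can be generated rather than typed by hand. `track_stanza` is a hypothetical helper: it emits whatever options it receives and does not validate them, so the option names must be ones GBrowse recognizes, such as those in the example above.

```python
def track_stanza(label: str, **options) -> str:
    """Render a GBrowse-style track section: a [label] header, then key = value lines.

    Options are written in the order given (keyword arguments keep their order).
    No validation is done; unknown option names are passed through unchanged.
    """
    lines = [f"[{label}]"]
    for key, value in options.items():
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"
```

Calling it with the options from the example above reproduces that stanza, and a loop over feature types can emit one stanza per track.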
Searching for Features
• Gene symbols
• Gene IDs
• Sequence IDs
• Genetic markers
• Relative nucleotide coordinates
• Absolute nucleotide coordinates
• etc.
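A search box has to decide whether an entry is a coordinate range or a name to look up. The sketch below assumes a `name:start..end` landmark syntax and treats everything else as a name lookup; the regular expression and the returned tuples are illustrative, and resolving the name (to a chromosome for absolute coordinates, or to another landmark for relative ones) is left to the database.

```python
import re

# Assumed landmark syntax: "name:start..end", digits may contain commas.
RANGE_RE = re.compile(r"^(?P<name>[^:]+):(?P<start>[\d,]+)\.\.(?P<end>[\d,]+)$")


def classify_search(term: str):
    """Classify a search-box entry as a coordinate range or a plain name lookup."""
    term = term.strip()
    m = RANGE_RE.match(term)
    if m:
        start = int(m.group("start").replace(",", ""))
        end = int(m.group("end").replace(",", ""))
        if start > end:
            start, end = end, start  # tolerate reversed coordinates
        # The name may be a contig/chromosome or another landmark.
        return ("range", m.group("name"), start, end)
    # Anything else: gene symbol, gene ID, sequence ID, marker name, ...
    return ("name", term)
```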
In short…
• Main features (determination of protein-coding and non-coding,…)
• Quantitative data (E-value, percent identity)
• Other evidence (InterPro, COGs, etc.)
• GC content and other useful measurements
• Protein and DNA sequences
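GC content is typically shown as a sliding-window plot. A minimal sketch; the window and step sizes are hypothetical defaults, and ambiguous bases such as N are left out of the denominator. The resulting (start, end, value) tuples could then be written out as a quantitative track.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_windows(seq: str, window: int = 1000, step: int = 500):
    """Yield (start, end, gc) per window; start is 1-based and end is inclusive.

    A sequence shorter than the window yields a single window covering all of it.
    """
    for i in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[i:i + window]
        yield (i + 1, i + len(chunk), gc_fraction(chunk))
```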
[Figure: per-strain Total Genes, Pangenome Hits, and UniProt for strains M19107, M19501, M21127, M21621, M21639, M21709; y-axis 0 to 3000]
Quality Value Integration
It distinguishes between different databases…
However, for matches from the same database…
Quality Scores
Origin of Database Matches
A color code will also be used for matches of different quality…
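The color scheme described above (hue by source database, shade by match quality) can be sketched as a small function. The palette, the database names used as keys, and the E-value cut-offs are all hypothetical choices, not values from the project.

```python
# Hypothetical palette: one base RGB hue per source database.
DB_COLORS = {"UniProt": (31, 119, 180), "InterPro": (44, 160, 44), "COG": (214, 39, 40)}

# Hypothetical E-value cut-offs, strongest first, each with a shade strength.
EVALUE_TIERS = [(1e-50, 1.0), (1e-10, 0.6), (1e-3, 0.3)]


def match_color(database: str, evalue: float) -> str:
    """Return a hex color: the hue comes from the database, the shade from the E-value."""
    r, g, b = DB_COLORS.get(database, (128, 128, 128))  # gray for unknown databases
    strength = 0.1  # palest shade for matches above the last cut-off
    for cutoff, s in EVALUE_TIERS:
        if evalue <= cutoff:
            strength = s
            break

    def mix(c: int) -> int:
        # Blend toward white: strength 1.0 keeps the base hue, lower is paler.
        return round(255 - (255 - c) * strength)

    return "#{:02x}{:02x}{:02x}".format(mix(r), mix(g), mix(b))
```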
Organism Summary Page
• At this point in the course, we have gathered a lot of information for the strains we are dealing with
• Not all of this information can be represented inside the genome browser
• We propose a separate section in the browser containing strain-wise summarized information
• Conceptually, the page could contain:
  – Biological information
  – Assembly information: genome size, number of contigs, N50, sequencing platform
  – Gene prediction information: number of protein-coding and non-protein-coding genes, links to the 16S rRNA gene
  – Annotation information: percent annotation, function distribution pie chart
  – Comparative information: unique protein clusters, etc.
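Of the assembly figures listed, N50 is the one that needs a small computation: sort contig lengths in decreasing order and report the length at which the running total first reaches half of the assembly. A minimal sketch, using the total contig length as the genome size:

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def assembly_summary(contig_lengths):
    """Summary fields for the organism page: total length, contig count, N50."""
    lengths = list(contig_lengths)
    return {"genome_size": sum(lengths), "contigs": len(lengths), "n50": n50(lengths)}
```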
Operons
• Operon: “…is a functioning unit of genomic DNA containing a cluster of genes under the control of a single regulatory signal or promoter”
• ~70% of the genes have been assigned a unique OperonID
• OperonID will provide an additional browsing mechanism for biologists, connecting co-transcribed and co-regulated genes.
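Browsing by OperonID amounts to inverting a gene-to-operon mapping. A sketch under the assumption that each gene carries at most one OperonID (or `None` for genes without one); the gene and operon names in the test are placeholders.

```python
from collections import defaultdict


def build_operon_index(gene_to_operon):
    """Invert a gene -> OperonID mapping into OperonID -> [genes]."""
    operons = defaultdict(list)
    for gene, operon_id in gene_to_operon.items():
        if operon_id is not None:  # genes without an OperonID are skipped
            operons[operon_id].append(gene)
    return operons


def co_transcribed(gene, gene_to_operon, operons):
    """Return the other genes sharing this gene's OperonID (empty if it has none)."""
    operon_id = gene_to_operon.get(gene)
    if operon_id is None:
        return []
    return [g for g in operons[operon_id] if g != gene]
```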
BRIG Patterns
• Concept: either generate BRIG (BLAST Ring Image Generator) images at run time or load static images when the user requests a BRIG pattern between two species
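The run-time and static options can be combined into a generate-on-first-request cache: serve a stored image if one exists, otherwise produce it once and keep it. The file naming scheme and the `generate` callback are hypothetical (the callback stands in for whatever actually runs BRIG). The pair is kept in order because a BRIG image is drawn around a reference genome, so swapping the two species gives a different picture.

```python
from pathlib import Path
from typing import Callable


def brig_image(reference: str, query: str, image_dir: Path,
               generate: Callable[[str, str, Path], None]) -> Path:
    """Return the image path for a (reference, query) pair, generating it only once.

    `generate` is a hypothetical callback expected to write the image to the given path.
    """
    # Hypothetical naming scheme; order matters because BRIG centers on the reference.
    path = image_dir / f"brig_{reference}_vs_{query}.png"
    if not path.exists():
        generate(reference, query, path)  # slow path: first request for this pair
    return path  # fast path afterwards: the stored static image
```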