Objectives
• What is Ensembl?
• What type of data can you get in Ensembl?
• How to navigate the Ensembl browser website.
• Where to go for help and documentation.
This webinar course
Date | Webinar topic | Instructor
6th April | Introduction to Ensembl | Helen Sparrow
13th April | Ensembl genes | Emily Perry
20th April | Data export with BioMart | Victoria Newman
27th April | Variation data in Ensembl and the Ensembl VEP | Victoria Newman
4th May | Comparing genes and genomes with Ensembl Compara | Ben Moore
11th May | Finding features that regulate genes – the Ensembl Regulatory Build | Ben Moore
18th May | Uploading your data to Ensembl and advanced ways to access Ensembl data | Emily Perry
Structure
Presentation: Where Ensembl genes come from
Demo: Getting gene data
Exercises: On the Train online course
Questions?
• We’ve muted all the mics.
• Ask questions in the Chat box in the webinar interface.
• My Ensembl colleagues will respond during the talk.
• There’s no threading, so please reply using @name.
Ben Moore, Victoria Newman
Course exercises
http://www.ebi.ac.uk/training/online/course/ensembl-browser-webinar-series-2016
On the course page, this text will be replaced by a YouTube video of the webinar (with a YouKu link) and a PDF of the slides.
The “next page” will contain the exercises, and a link to the exercises and their solutions will appear in the page hierarchy.
Get help with the exercises
• Use the exercise solutions in the online course
• Join our Facebook group and discuss the exercises with everybody (see the online course for the link)
• Email us [email protected]
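Gene views
[Figure: transcript structures shown in the gene view]
Legend: merged transcript; protein-coding transcript; non-coding transcript; coding exon; intron; non-coding exon.
Transcript names ending in 2## come from Ensembl annotation; names ending in 0## come from Havana annotation.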
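The legend's naming convention can be expressed as a small lookup. This is a minimal Python sketch that assumes transcript names take the form GENE-NNN, as in the legend; the function name `annotation_source` is hypothetical, and the rule only encodes the convention shown on this slide, so it may not hold for other Ensembl releases.

```python
import re

def annotation_source(transcript_name):
    """Guess which team annotated a transcript from its name (hypothetical helper).

    Assumes names of the form GENE-NNN, e.g. 'ESPN-201', and the slide's
    convention: 2## = Ensembl annotation, 0## = Havana annotation.
    """
    # Greedy (.+) lets gene symbols that contain hyphens keep them.
    match = re.fullmatch(r"(.+)-(\d{3})", transcript_name)
    if match is None:
        raise ValueError(f"unexpected transcript name: {transcript_name!r}")
    suffix = match.group(2)
    if suffix.startswith("2"):
        return "Ensembl"
    if suffix.startswith("0"):
        return "Havana"
    return "unknown"  # any other prefix is outside the slide's convention

print(annotation_source("ESPN-201"))  # Ensembl
print(annotation_source("ESPN-001"))  # Havana
```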
Automatic gene annotation
• Genome-wide determination using the Ensembl automated pipeline
• Predictions based on experimental (biological) data
• Known proteins/cDNAs aligned to the genome using sequence matching
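To make aligning a cDNA to the genome concrete, here is a toy Python sketch of spliced exact matching: it greedily finds the longest stretch of the cDNA present in the genome, records it as an exon block, and continues downstream. It is an illustration only, assuming error-free sequences and exact matches; the function `toy_spliced_match` and its `min_exon` parameter are hypothetical, and real annotation pipelines use dedicated alignment tools that tolerate mismatches and model splice sites.

```python
def toy_spliced_match(cdna, genome, min_exon=5):
    """Return (start, end) genome blocks (0-based, end-exclusive) that spell out
    the cDNA in order, or None if no such layout is found.

    Toy greedy algorithm: at each step, take the longest remaining cDNA prefix
    (at least min_exon bases) that occurs in the genome after the last block.
    """
    blocks = []
    c_pos = 0  # current position in the cDNA
    g_pos = 0  # genome position where the next block may start
    while c_pos < len(cdna):
        hit = None
        # Try the longest prefix first, shrinking down to min_exon bases.
        for end in range(len(cdna), c_pos + min_exon - 1, -1):
            start = genome.find(cdna[c_pos:end], g_pos)
            if start != -1:
                hit = (start, end)
                break
        if hit is None:
            return None
        start, end = hit
        block_len = end - c_pos
        blocks.append((start, start + block_len))
        g_pos = start + block_len
        c_pos = end
    return blocks

# Made-up sequences: two exons separated by a GT...AG intron.
genome = "AAAA" + "ATGGCC" + "GTAAGTTTTTAG" + "TTCGGA" + "CCCC"
print(toy_spliced_match("ATGGCCTTCGGA", genome))  # [(4, 10), (22, 28)]
```

The gap between the two blocks (positions 10 to 22) is the intron that the cDNA lacks.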
Biological Evidence
• International Nucleotide Sequence databases
• Protein sequence databases
  • Swiss-Prot: manually curated
  • TrEMBL: unreviewed translations
• NCBI RefSeq
  • Manually annotated proteins and mRNAs (NP, NM)
Other species
• Infer genes from homology to other species
  • e.g. predict genes in one species by mapping cDNAs/proteins from a related species to its genome
• RNA-seq data
Manual gene annotation
• Genes are annotated individually, case by case, by expert curators (the Havana team)
• Genome-wide
• Gene lists
GENCODE
• The GENCODE gene set is made up of:
• Ensembl automatically annotated genes
• Havana manually annotated genes
• The merged gene set
• GENCODE is the default gene set used by ENCODE, 1000 Genomes and other major projects.
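The idea of a merged gene set can be sketched in Python as matching models from the two sources. This toy version assumes that an Ensembl and a Havana transcript are merged when their intron chains (the ordered intron coordinates) are identical, and that everything else stays source-specific; the `Transcript` class and `merge_gene_sets` function are hypothetical, and the real Ensembl/Havana merge applies further rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transcript:
    name: str
    introns: tuple  # ordered ((start, end), ...) intron coordinates

def merge_gene_sets(ensembl, havana):
    """Toy merge of two transcript sets for one gene.

    Assumes multi-exon transcripts with unique intron chains within each source.
    Returns (merged_pairs, ensembl_only, havana_only), where merged_pairs holds
    (havana_name, ensembl_name) for models with identical intron chains.
    """
    havana_by_chain = {t.introns: t for t in havana}
    merged, ensembl_only, matched_chains = [], [], set()
    for t in ensembl:
        partner = havana_by_chain.get(t.introns)
        if partner is not None and t.introns not in matched_chains:
            merged.append((partner.name, t.name))
            matched_chains.add(t.introns)
        else:
            ensembl_only.append(t.name)
    havana_only = [t.name for t in havana if t.introns not in matched_chains]
    return merged, ensembl_only, havana_only

# Made-up transcripts for illustration only.
ens = [Transcript("T-201", ((100, 200), (300, 400))), Transcript("T-202", ((100, 250),))]
hav = [Transcript("T-001", ((100, 200), (300, 400))), Transcript("T-002", ((500, 600),))]
print(merge_gene_sets(ens, hav))
# ([('T-001', 'T-201')], ['T-202'], ['T-002'])
```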
CCDS transcripts
• Consensus coding DNA sequence set
• Coding regions annotated consistently by EBI, WTSI, UCSC and NCBI
• http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi
Which transcript to use?
• GENCODE Basic: a subset that keeps only the “complete” transcripts for genes that have them (http://www.ensembl.org/Help/Glossary?id=500)
• Transcript support level (TSL): scored 1-5 for quality, where 1 is the best (http://www.ensembl.org/Help/Glossary?id=492)
• APPRIS principal isoform: the major isoform(s), identified by combining protein structural information, functionally important residues and evidence from cross-species alignments (http://www.ensembl.org/Help/Glossary?id=521)
• Also consider CCDS transcripts and gold (merged Ensembl/Havana) transcripts
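One way to apply these criteria is to rank a gene's transcripts by them. The Python sketch below is a hypothetical ranking, not an Ensembl recommendation: it assumes you already have each transcript's APPRIS, CCDS, GENCODE Basic and TSL flags, and the priority order it uses (APPRIS, then CCDS, then GENCODE Basic, then TSL) is an illustrative choice. `TranscriptInfo` and `rank_transcripts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TranscriptInfo:
    name: str
    appris_principal: bool
    ccds: bool
    gencode_basic: bool
    tsl: Optional[int]  # 1 (best) to 5 (worst); None if not assessed

def rank_key(t: TranscriptInfo):
    # Tuples compare element by element and False sorts before True,
    # so "not flag" puts flagged transcripts first. Unassessed TSL sorts last.
    return (
        not t.appris_principal,
        not t.ccds,
        not t.gencode_basic,
        t.tsl if t.tsl is not None else 6,
    )

def rank_transcripts(transcripts: List[TranscriptInfo]) -> List[TranscriptInfo]:
    """Return transcripts ordered from most to least preferred (hypothetical criteria)."""
    return sorted(transcripts, key=rank_key)

# Made-up transcripts for illustration only.
candidates = [
    TranscriptInfo("TX-C", False, False, False, None),
    TranscriptInfo("TX-B", False, True, True, 2),
    TranscriptInfo("TX-A", True, True, True, 1),
]
print([t.name for t in rank_transcripts(candidates)])  # ['TX-A', 'TX-B', 'TX-C']
```

Changing the order of fields in `rank_key` changes which criterion wins when transcripts disagree.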
Ensembl stable IDs
• ENSG########### Ensembl Gene ID
• ENST########### Ensembl Transcript ID
• ENSP########### Ensembl Peptide ID
• ENSE########### Ensembl Exon ID
• For non-human species, a three-letter species code is inserted after “ENS”:
  MUS (Mus musculus) for mouse: ENSMUSG###
  DAR (Danio rerio) for zebrafish: ENSDARG###
http://www.ensembl.org/info/genome/stable_ids/index.html
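The ID pattern above can be checked mechanically. This minimal Python sketch assumes the format described on the slide: “ENS”, an optional three-letter species code, one feature-type letter (G, T, P or E) and 11 digits. It also accepts an optional “.N” version suffix, which Ensembl can append to stable IDs. The function `parse_stable_id` is hypothetical, and IDs for other feature types or with non-ENS prefixes are deliberately rejected.

```python
import re

# Feature-type letters from the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# ENS + optional 3-letter species code (absent for human) + feature letter
# + 11 digits + optional ".N" version. Assumption: 11 digits, as on the slide.
STABLE_ID = re.compile(r"ENS([A-Z]{3})?([GTPE])(\d{11})(?:\.(\d+))?")

def parse_stable_id(stable_id):
    """Split an Ensembl-style stable ID into its parts (hypothetical helper)."""
    match = STABLE_ID.fullmatch(stable_id)
    if match is None:
        raise ValueError(f"not a recognised stable ID: {stable_id!r}")
    species, letter, number, version = match.groups()
    return {
        "species_code": species or "",  # empty string means human
        "feature": FEATURE_TYPES[letter],
        "number": number,
        "version": int(version) if version else None,
    }

print(parse_stable_id("ENSMUSG00000017167.6"))
# {'species_code': 'MUS', 'feature': 'gene', 'number': '00000017167', 'version': 6}
```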
Why Gene Ontology (GO)?
Gene descriptions use many overlapping terms, e.g. innate immunity, non-specific immunity, phagocyte, complement, cytokines, natural killer cells, mast cells:
• Multiple terms for the same thing
• Gene descriptions too specific
GO terms form a controlled vocabulary
GO:0045087 - innate immune response: Innate immune responses are defense responses mediated by germline encoded components that directly recognise components of potential pathogens.
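A controlled vocabulary means that many free-text labels resolve to one identifier. This Python sketch uses a small hypothetical synonym table built from the labels on the previous slide; it is not the GO synonym list, which has to be taken from the ontology itself, and `to_go_id` is a hypothetical helper.

```python
# Hypothetical synonym table: a few free-text labels mapped to one GO ID.
GO_NAMES = {"GO:0045087": "innate immune response"}
SYNONYMS = {
    "innate immune response": "GO:0045087",
    "innate immunity": "GO:0045087",
    "non-specific immunity": "GO:0045087",
}

def to_go_id(label):
    """Map a free-text label to a GO ID, or None if the table has no entry."""
    return SYNONYMS.get(label.strip().lower())

for label in ["Innate immunity", "non-specific immunity ", "mast cells"]:
    go_id = to_go_id(label)
    print(repr(label), "->", go_id, GO_NAMES.get(go_id))
```

Once every label resolves to the same ID, searches and comparisons work on the ID rather than on the wording.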
GO terms are hierarchical
Parent term: GO:0006955 immune response
Term: GO:0045087 innate immune response
More specific and related terms:
• GO:0006957 complement activation, alternative pathway
• GO:0001867 complement activation, lectin pathway
• GO:0009814 defence response, incompatible interaction
• GO:0042381 hemolymph coagulation
• GO:0009682 induced systemic resistance
• GO:0002227 innate immune response in mucosa
• GO:0035420 MAPK cascade involved in innate immune response
• GO:0035006 melanisation defence response
• GO:0002228 natural killer cell mediated immunity
• GO:0045824 negative regulation of innate immune response
• GO:0009626 plant-type hypersensitive response
• GO:0045089 positive regulation of innate immune response
• GO:0045088 regulation of innate immune response
• GO:0034341 response to interferon-gamma
• GO:0034340 response to type I interferon
• GO:0034342 response to type II interferon
• GO:0009616 virus induced gene silencing
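Because GO terms are hierarchical, a gene annotated to a specific term is implicitly annotated to all of that term's ancestors as well. This Python sketch walks child-to-parent links to collect those ancestors. The `PARENTS` table is a simplified assumption based on the layout of the slide (the real ontology has intermediate terms and several relationship types), and `ancestors` and `propagate` are hypothetical helpers.

```python
from collections import deque

# Simplified child -> parent links, following the slide (assumption:
# the real GO graph has more intermediate terms and relationship types).
PARENTS = {
    "GO:0006955": [],               # immune response
    "GO:0045087": ["GO:0006955"],   # innate immune response
    "GO:0001867": ["GO:0045087"],   # complement activation, lectin pathway
    "GO:0002228": ["GO:0045087"],   # natural killer cell mediated immunity
}

def ancestors(term):
    """Return every term reachable by following parent links (breadth-first)."""
    found = set()
    queue = deque(PARENTS.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in found:
            found.add(parent)
            queue.extend(PARENTS.get(parent, []))
    return found

def propagate(annotations):
    """Expand gene -> directly annotated terms into gene -> terms plus all ancestors."""
    return {
        gene: set(terms).union(*(ancestors(t) for t in terms))
        for gene, terms in annotations.items()
    }

print(sorted(ancestors("GO:0001867")))  # ['GO:0006955', 'GO:0045087']
```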
Hands on
• We’re going to look at an Ensembl gene, ESPN, and find out information about it and its transcripts.
Next webinar – Data export with BioMart
Ensembl data can be exported in bulk using BioMart, a flexible tool that lets you specify which Ensembl features you want data for and which data you want to see about them, then export the results as a table or as sequences.
Learn the basics of running a BioMart query, and explore some of the options that are available.
Victoria Newman
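Behind the web interface, a BioMart query is a choice of dataset, filters (which features you want) and attributes (what data to return about them). The Python sketch below writes such a query in the XML form that BioMart can export. The dataset, filter and attribute names (hsapiens_gene_ensembl, external_gene_name, ensembl_gene_id, ensembl_transcript_id) are assumptions to check against the BioMart interface, and `biomart_query_xml` is a hypothetical helper that only builds the text, without contacting a server.

```python
import xml.etree.ElementTree as ET

def biomart_query_xml(dataset, filters, attributes):
    """Build BioMart-style query XML (hypothetical helper; no network access)."""
    query = ET.Element(
        "Query", virtualSchemaName="default", formatter="TSV", header="1", uniqueRows="1"
    )
    data = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # Filters restrict which features are returned.
    for name, value in filters.items():
        ET.SubElement(data, "Filter", name=name, value=value)
    # Attributes choose the columns of the output table.
    for name in attributes:
        ET.SubElement(data, "Attribute", name=name)
    return ET.tostring(query, encoding="unicode")

print(biomart_query_xml(
    "hsapiens_gene_ensembl",
    {"external_gene_name": "ESPN"},
    ["ensembl_gene_id", "ensembl_transcript_id"],
))
```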
Help and documentation
Course online: http://www.ebi.ac.uk/training/online/subjects/11
Tutorials: www.ensembl.org/info/website/tutorials
Flash animations: www.youtube.com/user/EnsemblHelpdesk, http://u.youku.com/Ensemblhelpdesk
Email us: [email protected]
Ensembl public mailing lists: [email protected], [email protected]
Publications
• Aken, B. et al. Ensembl 2017. Nucleic Acids Research (2017). http://europepmc.org/articles/PMC5210575
• Fernández-Suárez, X.M. and Schuster, M.K. Using the Ensembl Genome Server to Browse Genomic Sequence Data. Current Protocols in Bioinformatics 1.15.1-1.15.48 (2010). www.ncbi.nlm.nih.gov/pubmed/20521244
• Spudich, G.M. and Fernández-Suárez, X.M. Touring Ensembl: A practical guide to genome browsing. BMC Genomics 11:295 (2010). www.biomedcentral.com/1471-2164/11/295
• More: http://www.ensembl.org/info/about/publications.html