Analysis of the Positively Selected and Non-Positively Selected Non-Protein Coding Sequences of Chromosome 16. Kyle Tretina with a team led by Dr. Pattle P. Pun in collaboration with Mr. Ross Leung of CUHK. Introduction: Story of Evolutionary History.
Analysis of the Positively Selected and Non-Positively Selected Non-Protein Coding Sequences of Chromosome 16
Kyle Tretina, with a team led by Dr. Pattle P. Pun, in collaboration with Mr. Ross Leung of CUHK
Introduction: Story of Evolutionary History
Bacteria < Fish < Primate < Human
Story: increasing organismal complexity as evolution proceeds
WHY?
But little Mouse, you are not alone,
In proving foresight may be in vain:
The best laid schemes of mice and men
Go often askew,
And leave us nothing but grief and pain,
For promised joy!
Robert Burns (1785)
Fellow mortal
Man's dominion
The mouse is blessed compared to man: it can only see the present. I look at the future and fear.
Genetics
Central Dogma: DNA -> RNA -> Protein
Complexity ~ Number of Genes?
Humans ~30,000
Flies ~14,000
G-Value Paradox
Complexity (K) ~ Gene Number (N)?
Relationship?
proportional: K ~ N
polynomial: K ~ N^a
exponential: K ~ a^N
factorial: K ~ N!
Jean-Michel Claverie: ON/OFF states
2^30,000 / 2^14,000 ≈ 3 x 10^4816
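
The ON/OFF figure above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming each gene contributes two states (ON or OFF), so an organism with N genes has 2^N possible states. The function name `on_off_ratio` is made up for this illustration. The calculation works in logarithms because the ratio has far too many digits to handle comfortably as a plain number.

```python
from math import floor, log10

HUMAN_GENES = 30_000  # approximate gene count from the slide
FLY_GENES = 14_000    # approximate gene count from the slide


def on_off_ratio(n_a, n_b):
    """Return (mantissa, exponent) of 2**n_a / 2**n_b in scientific notation."""
    # log10(2**n_a / 2**n_b) = (n_a - n_b) * log10(2)
    log_ratio = (n_a - n_b) * log10(2)
    exponent = floor(log_ratio)
    mantissa = 10 ** (log_ratio - exponent)
    return mantissa, exponent


mantissa, exponent = on_off_ratio(HUMAN_GENES, FLY_GENES)
print(f"{mantissa:.1f} x 10^{exponent}")  # 3.0 x 10^4816
```

Under the exponential model, a roughly twofold difference in gene number becomes a difference of thousands of orders of magnitude in possible states.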
Goal
Determine the role of non-coding DNA in gene regulation by looking at the functions of non-coding SNPs that are positively selected or non-positively selected on chromosome 16
Definitions
SNP: single nucleotide polymorphism
Variable between populations
Importance likely due to stability of variation
Selection: the phenomenon whereby only the organisms best adapted to their environment tend to survive and produce progeny
Gene-selection algorithm and neutral selection theory (wrench)
Methods Overview
HapMap Database -> Selection Data, List of Chr16 SNPs
UCSC Genome Database Mirror -> SNP flanking sequence
TRANSFAC -> related transcription factor data for each SNP flanking sequence
PReMod -> confirm results
HapMap Phase I Data
HapMap Project: an international effort to identify and catalog genetic similarities and differences in human beings (Haplotype Maps), also includes:
Selection Data, List of Chr16 SNPs
~25,000 non-positively selected
~5,000 positively selected
UCSC Genome Browser
Genome.UCSC.edu: a website containing several reference sequences and tools for visual and computational analysis
Methods:
Enter each RSID (SNP identifier) from the list
Note intersecting sequences
Copy/paste sequences
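
To make "SNP flanking sequence" concrete, here is a minimal sketch of cutting the flanks out of a reference sequence held as a Python string. It is not the procedure used in the study. The function name `flanking_sequence`, the 0-based position convention, and the flank width are all assumptions made for illustration.

```python
def flanking_sequence(chrom_seq, snp_start, flank=25):
    """Return (left_flank, allele, right_flank) around a 0-based SNP position.

    Flanks are shortened when the SNP lies near either end of the sequence.
    """
    if not 0 <= snp_start < len(chrom_seq):
        raise ValueError("SNP position lies outside the sequence")
    left = chrom_seq[max(0, snp_start - flank):snp_start]
    allele = chrom_seq[snp_start]
    right = chrom_seq[snp_start + 1:snp_start + 1 + flank]
    return left, allele, right


# Toy reference sequence, purely illustrative
toy = "ACGTACGTAC"
left, allele, right = flanking_sequence(toy, 4, flank=3)
print(left, allele, right)  # CGT A CGT
```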
UCSC Genome Browser Mirror
Efficiency
~70 seq/hr for 1.5 yrs = ~1/3 of sequences gathered
2 hrs
Online Instructions, but Complicated Data Structure
Henry Ford: 1.1 million lines of source code
Many thanks to Dr. Hayward (Wheaton College CS Faculty)
We have directions, but we need to alter the machine, and know WHERE to alter it -> Dr. Hayward
Sequences Collected
Graph 1. The distributions of the positively selected SNPs used in the study across human chromosome 16
Graph 2. The distributions of the non-positively selected SNPs used in the study across human chromosome 16
TRANSFAC
TRANSFAC: a relational database, available via the web as six flat files including various data concerning transcription factors, DNA-binding sites, and target genes
Automation at CUHK
PReMod
PReMod: a new database of genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes.
Enter ranges for SNP sequences
Look for same pattern as TRANSFAC
Exploits the fact that many known CRMs are made of clusters of phylogenetically conserved and repeated transcription factor (TF) binding sites
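
Checking whether a SNP falls inside a predicted module amounts to an interval lookup. The sketch below is one way to do it, not PReMod's own interface. It assumes modules are given as non-overlapping half-open (start, end) ranges on one chromosome, and the name `snps_in_modules` and the toy coordinates are made up.

```python
from bisect import bisect_right


def snps_in_modules(snp_positions, modules):
    """Map each SNP position to the (start, end) module containing it, or None.

    Assumes the modules are non-overlapping half-open intervals [start, end).
    """
    modules = sorted(modules)
    starts = [start for start, _ in modules]
    hits = {}
    for pos in snp_positions:
        # Index of the last module whose start is <= pos
        i = bisect_right(starts, pos) - 1
        hit = None
        if i >= 0:
            start, end = modules[i]
            if start <= pos < end:
                hit = (start, end)
        hits[pos] = hit
    return hits


# Toy coordinates, purely illustrative
modules = [(100, 200), (500, 650)]
hits = snps_in_modules([50, 150, 200, 649], modules)
print(hits)
```

Sorting the module starts once and using binary search keeps each lookup fast even for tens of thousands of SNPs.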
Analysis
MySQL Tables
Programmed Scripts:
Word Patterns: i.e. keywords, recurring identifiers
Unique Entries
Progress Statistics
Overlap between non-positively selected and positively selected SNPs
Results
SNP Selection    RS Numbers    Sequence Gathered
Non-Positive     25,622        6,173 (24%)
Positive         4,750         4,750 (100%)
Table 1. A summary of the manual SNP flanking sequence gathering from the UCSC Genome Browser
Results
SNP Selection    Total     No Sites      Unique         Matches in Other Dataset    TRANSFAC Entries to Be Looked Up
Non-Positive     25,594    1,611 (6%)    3,218 (13%)    20,765 (81%)                82 (