nadeem-akhter
Introduction to Bioinformatics
Bioinformatics
Bioinformatics = Biology + Informatics: the science of collecting, analyzing and conceptualizing biological data by applying informatics techniques.
What is Bioinformatics?
Biological Data + Computer Analysis
Mouse genome: about 2.7 billion base pairs
Human genome: about 3 billion base pairs
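To put these genome sizes in perspective, here is a minimal Python sketch of one compact representation: packing DNA at 2 bits per base. It assumes the sequence contains only A, C, G and T. Real genome files also have to record ambiguous N bases, which this sketch ignores. `pack`, `unpack` and `packed_size_bytes` are illustrative names, not a library API. At 2 bits per base, 3 billion base pairs need about 750 million bytes.

```python
# Hypothetical 2-bit encoding; assumes the sequence contains only A, C, G, T.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # BASES[i] is the base whose 2-bit code is i


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (first base in the low bits)."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        byte = 0
        for i, base in enumerate(seq[start:start + 4]):
            byte |= CODE[base] << (2 * i)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Decode packed bytes back into the first `length` bases."""
    bases = []
    for byte in data:
        for i in range(4):
            bases.append(BASES[(byte >> (2 * i)) & 0b11])
    return "".join(bases[:length])


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed at 2 bits per base, rounded up to a whole byte."""
    return (n_bases + 3) // 4


print(packed_size_bytes(3_000_000_000))  # 750000000
```

The explicit `length` argument to `unpack` is needed because the last byte may be only partly filled.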
Goals of Bioinformatics
• Manage and organize biological information using databases
• Process, analyze, and visualize biological data
• Share biological information with the public over the Internet
Definition
Bio-informatics: bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, computer science, and statistics) to understand and organize the information associated with these molecules, on a large scale.
Bioinformatics is a practical discipline with many applications.
Bioinformatics
Bioinformatics overlaps with computational biology, systems biology and genomics.
Biological Information
Central Dogma of Molecular Biology: DNA -> RNA -> Protein -> Phenotype -> DNA
• Molecules: sequence, structure, function, interaction
• Processes: mechanism, specificity, regulation
Central Paradigm for Bioinformatics
Genomic Sequence Information -> mRNA (level) -> Protein Sequence -> Protein Structure -> Protein Function -> Protein Interaction -> Phenotype
Large amounts of information call for statistical computer processing.
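The DNA -> RNA -> protein steps of the central dogma can be sketched in a few lines of Python. This is a simplified illustration that makes three assumptions: it uses the standard genetic code (NCBI translation table 1), the input is coding-strand DNA that starts at the first codon, and there are no introns or splicing. `transcribe` and `translate` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated in T, C, A, G order,
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """DNA -> mRNA: copy the coding strand, replacing T with U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein: read codons from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # the table is keyed on DNA letters
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))  # AUGGCCUAA MA
```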
Methods of Analyzing Data
• Systems analysis
• Information theory
• Graph theory
• Robotics
• Algorithms
• Artificial intelligence
• Statistics
Domains of Bioinformatics
• Bioinformaticists: development of new software and algorithms
• Bioinformaticians: use of different algorithms and computer software
Human Genome Project
Could not have been achieved without bioinformatics.
Goals:
• Sequence all 3 billion DNA subunits (base pairs)
• Discover all the human genes
• Make them accessible for further biological study
This required bringing together and storing vast amounts of information from:
• Lab equipment and experiments
• Computer analysis
• Human analysis
and making it visible to the world’s scientists.
How to Analyze Information
Data:
– Management
– Analysis
– Derive a hypothesis
– Design and implement an in silico experiment
– Confirm in the wet lab
Why Bioinformatics?
1. Find an answer quickly: most in silico biology is faster than in vitro work.
2. Massive amounts of data to analyze:
• Need to make use of all the information
• Analysis by hand is not possible
• Information cannot be organized and stored using lab notebooks alone
• Automation is key
However, in silico results still need verification.
Applications
1. Computational biology: computing methods for classical biology, primarily concerned with evolutionary, population and theoretical biology rather than cellular/molecular biology.
2. Medical informatics: computing methods to improve the communication, understanding, and management of medical data (data manipulation).
3. Cheminformatics: combining chemical and biological technology for drug design and development.
4. Genomics: analysis and comparison of the entire genome of a single species or of multiple species. Genomics existed before any genomes were completely sequenced, but in a very primitive state.
5. Proteomics: study of how the genome is expressed in proteins, and of how these proteins function and interact. It is concerned with the actual states of specific cells, rather than the potential states described by the genome.
6. Pharmacogenomics: the application of genomic methods to identify drug targets, for example by searching entire genomes for potential drug receptors or by studying gene expression patterns in tumors.
7. Pharmacogenetics: the use of genomic methods to determine what causes variations in individual response to drug treatments. The goal is to identify drugs that may be effective only for subsets of patients, or to tailor drugs to specific individuals or groups.
The “post-genomics” era
Main goals:
• Annotation
• Comparative genomics
• Structural genomics
• Functional genomics
Annotation
• Identify the genes within a given sequence of DNA
• Identify the sites which regulate the gene
• Predict the function
How do we identify a gene in a genome?
A gene is characterized by several features (promoter, ORF…); some are easier to detect than others.
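One of the easier features to detect is the open reading frame (ORF). The sketch below is a deliberately naive ORF scan. It assumes an ATG start codon and the standard stop codons TAA, TAG and TGA, and it searches only the forward strand (the reverse complement is not scanned). Real gene finders combine many more signals, such as promoters and splice sites. `find_orfs` and its `min_codons` cutoff are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for forward-strand ORFs from ATG to an in-frame stop.

    `min_codons` counts the codons before the stop (ATG included); 30 is an arbitrary default.
    Nested ATGs inside an ORF are not reported separately.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop downstream: nothing more in this frame
            i += 3
    return orfs


print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [(2, 14, 'ATGAAATTTTAG')]
```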
Comparative Genomics
How Human Are Chimps?
Comparison between the full draft sequences of the human and chimpanzee genomes revealed that they differ by only 1.23% in single-nucleotide substitutions.
Perhaps not surprising!
So where are we different?
Human  ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp  ATAGGGG--GGATGCGGGCCCTATACCC
Mouse  ATAGCG---GGATGCGGCGC-TATACCA
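A simple way to quantify the human/chimp/mouse comparison above is percent identity. The sketch below counts only the columns where neither sequence has a gap (`-`). This is one of several definitions in use, and `percent_identity` is a hypothetical helper.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"
mouse = "ATAGCG---GGATGCGGCGC-TATACCA"
print(round(percent_identity(human, chimp), 1))  # 96.2
print(round(percent_identity(human, mouse), 1))  # 87.5
```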
Structural Genomics
A protein’s three-dimensional structure can tell much more than its sequence alone:
• Protein-ligand complexes
• Functional sites
• Fold and evolutionary relationships
• Shape and electrostatics
• Active sites
• Protein complexes
• Biological processes
Resources and Databases
The different types of data are collected in databases:
• Sequence databases
• Structural databases
• Databases of experimental results
All databases are connected.
Sequence Databases
• Gene databases
• Genome databases
• Disease-related mutation databases
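Sequence databases typically let you download records in FASTA format: a `>` header line followed by one or more sequence lines. The minimal parser below assumes each header is unique, so a repeated header would overwrite the earlier record. `parse_fasta` is an illustrative name, not a library function.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # flush the last record
    return records


example = ">seq1 demo\nACGT\nACGT\n>seq2\nTTGA\n"
print(parse_fasta(example))  # {'seq1 demo': 'ACGTACGT', 'seq2': 'TTGA'}
```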
Structure Databases
• 3-dimensional structures of proteins, nucleic acids, molecular complexes, etc.
• 3-D data is available through techniques such as NMR spectroscopy and X-ray crystallography
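Coordinates from X-ray and NMR experiments are commonly distributed in the PDB file format, whose ATOM records use fixed column positions. The sketch below assumes the classic fixed-width PDB format (not the newer mmCIF format). It extracts C-alpha atoms and measures the distance between two of them. The function names are hypothetical.

```python
import math


def parse_ca_atoms(pdb_text: str) -> list[tuple[str, int, float, float, float]]:
    """Extract (chain, residue number, x, y, z) for C-alpha atoms in ATOM records.

    Fixed PDB columns (1-based): atom name 13-16, chain 22, residue number 23-26,
    x 31-38, y 39-46, z 47-54.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            res_seq = int(line[22:26])
            x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))
            atoms.append((chain, res_seq, x, y, z))
    return atoms


def ca_distance(a: tuple, b: tuple) -> float:
    """Euclidean distance in angstroms between two parsed atoms."""
    return math.dist(a[2:], b[2:])
```

Only `ATOM` records are read, so ligands and other `HETATM` records are skipped.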
Databases of Experimental Results
• Experimental microarray images: gene expression data
• Proteomic data: protein expression data
• Metabolic pathways, protein-protein interaction data, regulatory networks
Literature Databases
PubMed: a service of the National Library of Medicine
http://www.ncbi.nlm.nih.gov/pubmed/
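PubMed can also be queried programmatically through NCBI’s E-utilities. This sketch only builds an ESearch request URL with the standard library. It assumes the public `esearch.fcgi` endpoint and its `db`, `term`, `retmax` and `retmode` parameters, and it does not perform the network call. Check NCBI’s current E-utilities usage guidelines before sending real requests.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL asking PubMed for up to `retmax` IDs matching `term`, as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(pubmed_search_url("bioinformatics[Title]", retmax=5))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=bioinformatics%5BTitle%5D&retmax=5&retmode=json
```

Opening the resulting URL, for example with `urllib.request.urlopen`, returns a JSON document that includes the matching PubMed IDs.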
Putting It All Together
Each database contains specific information. Like other biological systems, these databases are interrelated.
GENOMIC DATA: GenBank, DDBJ, EMBL
ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR
PROTEIN: PIR, SWISS-PROT
STRUCTURE: PDB, MMDB, SCOP
LITERATURE: PubMed
PATHWAY: KEGG, COG
DISEASE: LocusLink, OMIM, OMIA
GENES: RefSeq, AllGenes, GDB
SNPs: dbSNP
ESTs: dbEST, UniGene
MOTIFS: BLOCKS, Pfam, Prosite
GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress
Application I -- Genomics
• Finding genes in genomic DNA: introns, exons, promoters
• Characterizing repeats in genomic DNA: statistics, patterns
• Expression analysis: time-course clustering, identifying regulatory regions, measuring differences
• Genome comparisons: ortholog families, genome annotation, evolutionary (phylogenetic) trees
• Characterizing intergenic regions: finding pseudogenes, patterns
• Duplications in the genome: large-scale genomic alignment
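Characterizing repeats often starts with simple statistics such as k-mer counts. Short words that occur more often than expected hint at repeated elements. The sketch below counts every overlapping k-mer and reports those seen at least `min_count` times. The values of `k` and `min_count` are illustrative, and real repeat annotation relies on dedicated tools and repeat libraries.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeated_kmers(seq: str, k: int, min_count: int = 2) -> dict[str, int]:
    """k-mers occurring at least `min_count` times, in order of first appearance."""
    return {kmer: n for kmer, n in kmer_counts(seq, k).items() if n >= min_count}


print(repeated_kmers("CAGCAGCAGTT", 3))  # {'CAG': 3, 'AGC': 2, 'GCA': 2}
```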
Application II -- Protein Sequence
• Sequence alignment: non-exact string matching, gaps
  – How to align two strings optimally via dynamic programming
  – Local vs global alignment
  – Suboptimal alignment
  – Hashing to increase speed (BLAST, FASTA)
  – Amino acid substitution scoring matrices
• Multiple alignment and consensus patterns
  – How to align more than one sequence and then fuse the result into a consensus representation
  – Transitive comparisons
  – HMMs, profiles
  – Motifs
• Scoring schemes and matching statistics
  – How to tell whether a given alignment or match is statistically significant
  – A p-value (or an E-value)?
  – Score distributions (extreme value distribution)
  – Low-complexity sequences
• Evolutionary issues
  – Rates of mutation and change
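The dynamic-programming idea behind optimal pairwise alignment can be sketched as follows. This score-only version makes two simplifying assumptions: it uses a flat match/mismatch score instead of an amino acid substitution matrix, and a linear gap penalty instead of the affine penalties most tools use. With `local=False` it follows the global (Needleman-Wunsch) recurrence. With `local=True` it applies the Smith-Waterman rule of flooring scores at zero. `align_score` is a hypothetical name.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score by dynamic programming with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # F[i][j]: best score for a[:i] vs b[:j] (global), or best score of an
    # alignment ending at a[i-1], b[j-1] (local).
    F = [[0] * cols for _ in range(rows)]
    if not local:
        for i in range(1, rows):
            F[i][0] = i * gap  # a[:i] aligned entirely against gaps
        for j in range(1, cols):
            F[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
            if local:
                score = max(score, 0)  # start a fresh local alignment
                best = max(best, score)
            F[i][j] = score
    return best if local else F[rows - 1][cols - 1]


print(align_score("GATTACA", "GATTACA"))               # 7
print(align_score("TTACGTT", "GGACGGG", local=True))   # 3
```

Recovering the alignment itself needs a traceback through the same matrix, which is omitted here to keep the sketch short.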
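Hashing speeds up database search by first finding short exact word matches (seeds) and only then running the expensive alignment around them. The sketch below shows just that seeding step. It indexes every k-mer of a database sequence in a dictionary and then looks up the query’s k-mers. This is a simplification of what BLAST and FASTA do (BLAST, for instance, also uses scored neighborhood words and then extends hits), and the helper names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(db_seq: str, k: int) -> dict[str, list[int]]:
    """Hash table mapping each k-mer of the database sequence to its start positions."""
    index: defaultdict[str, list[int]] = defaultdict(list)
    for i in range(len(db_seq) - k + 1):
        index[db_seq[i:i + k]].append(i)
    return dict(index)


def find_seeds(query: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Exact k-mer hits as (query_pos, db_pos) pairs: candidates for extension."""
    hits = []
    for q in range(len(query) - k + 1):
        for d in index.get(query[q:q + k], []):
            hits.append((q, d))
    return hits


index = build_kmer_index("ACGTACGTTG", 4)
print(find_seeds("TACGT", index, 4))  # [(0, 3), (1, 0), (1, 4)]
```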
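Once several sequences are aligned, a consensus can be read off column by column. The sketch below applies a simple majority vote to the human/chimp/mouse alignment from the comparative genomics slide. Real tools often weight sequences or use profiles instead, and `consensus` is an illustrative helper.

```python
from collections import Counter


def consensus(alignment: list[str], drop_gaps: bool = False) -> str:
    """Majority-vote consensus: the most frequent symbol in each alignment column.

    Ties go to the tied symbol seen first in the column (Counter.most_common
    keeps first-encountered order for equal counts).
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*alignment):
        symbol, _ = Counter(column).most_common(1)[0]
        if drop_gaps and symbol == "-":
            continue
        result.append(symbol)
    return "".join(result)


human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"
mouse = "ATAGCG---GGATGCGGCGC-TATACCA"
print(consensus([human, chimp, mouse]))  # ATAGCGG--GGATGCGGGCCCTATACCC
```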
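For the statistical-significance question, ungapped local alignment scores are usually assessed with Karlin-Altschul statistics, which are based on the extreme value distribution and underlie BLAST E-values. The parameters $K$ and $\lambda$ depend on the scoring matrix and residue composition and are not given in these slides; for gapped alignments they are estimated empirically. With those caveats, the expected number of chance hits $E$ with score at least $S$, between a query of length $m$ and a database of length $n$, the corresponding p-value, and the normalized bit score $S'$ are:

```latex
\begin{aligned}
E &= K\,m\,n\,e^{-\lambda S} \\
P(\text{score} \ge S) &= 1 - e^{-E} \\
S' &= \frac{\lambda S - \ln K}{\ln 2}, \qquad E = m\,n\,2^{-S'}
\end{aligned}
```

For small $E$, $P \approx E$, which is why BLAST reports E-values directly.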
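Low-complexity sequences, for example long runs of one residue, can produce high-scoring but biologically meaningless matches. One simple way to flag them is to compute the Shannon entropy of a sliding window, as sketched below. The window width and threshold are arbitrary illustrative choices. Dedicated filters such as SEG (proteins) and DUST (DNA) use their own complexity measures.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy of the residue composition, in bits per residue."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def low_complexity_windows(seq: str, width: int = 12, threshold: float = 1.5) -> list[int]:
    """Start positions of windows whose entropy is below `threshold` (illustrative values)."""
    return [i for i in range(len(seq) - width + 1)
            if shannon_entropy(seq[i:i + width]) < threshold]


print(shannon_entropy("ACGT"), shannon_entropy("AACC"))  # 2.0 1.0
```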
Application III -- Protein Structure
• Secondary structure “prediction”:
  – via propensities
  – neural networks, genetic algorithms
  – simple statistics
  – transmembrane regions
  – assessing secondary structure prediction
• Tertiary structure prediction: fold recognition, threading, ab initio
• Function prediction: active site identification
• Relation of sequence similarity to structural similarity
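Transmembrane regions are classically spotted from hydropathy, because a stretch of roughly 20 strongly hydrophobic residues can span a lipid bilayer. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a 1.6 cutoff, the values commonly cited for this scan. It is a simple statistical method, not one of the neural-network predictors listed above, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each sliding window along the sequence."""
    scores = [KD[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """Start positions of windows whose mean hydropathy exceeds `threshold`."""
    return [i for i, h in enumerate(hydropathy_profile(protein, window)) if h > threshold]


seq = "K" * 10 + "L" * 19 + "K" * 10  # toy sequence: hydrophobic core between charged ends
print(10 in candidate_tm_windows(seq))  # True
```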
Example Application IV: Finding Homologs
Example Application IV: Overall Genome Characterization
Core:
• Overall occurrence of a certain feature in the genome, e.g. how many kinases there are in yeast
• Comparison of organisms and tissues, e.g. expression levels in cancerous vs normal tissues
Methods: databases, statistics
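Comparing expression levels in cancerous and normal tissue often starts with a log2 fold change. The sketch below assumes one expression value per gene and tissue, and adds a pseudocount of 1 to avoid taking the log of zero. The gene names and values are made up for illustration. Real analyses use replicates and proper statistical tests.

```python
import math


def log2_fold_change(tumor: float, normal: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of tumor to normal expression; the pseudocount avoids log(0)."""
    return math.log2((tumor + pseudocount) / (normal + pseudocount))


def differential_genes(tumor: dict[str, float], normal: dict[str, float],
                       min_abs_lfc: float = 1.0) -> dict[str, float]:
    """Genes whose |log2 fold change| is at least `min_abs_lfc` (1.0 means two-fold)."""
    result = {}
    for gene, t_level in tumor.items():
        if gene not in normal:
            continue
        lfc = log2_fold_change(t_level, normal[gene])
        if abs(lfc) >= min_abs_lfc:
            result[gene] = lfc
    return result


# Hypothetical expression levels for two made-up genes.
tumor = {"GENE_A": 15.0, "GENE_B": 5.0}
normal = {"GENE_A": 3.0, "GENE_B": 5.0}
print(differential_genes(tumor, normal))  # {'GENE_A': 2.0}
```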
Thanks