CS 7010: Computational Methods in Bioinformatics (course introduction)
Dong Xu
Computer Science Department
109 Engineering Building West
E-mail: xudong@missouri.edu
573-882-7064
http://digbio.missouri.edu
Challenges of Our Civilization-1
Top 125 unsolved problems in science over the next quarter-century (http://www.sciencemag.org/sciext/125th/)
The Top 25
What Is the Universe Made Of?
What is the Biological Basis of Consciousness?
Why Do Humans Have So Few Genes?
To What Extent Are Genetic Variation and Personal Health Linked?
Can the Laws of Physics Be Unified?
How Much Can Human Life Span Be Extended?
What Controls Organ Regeneration?
How Can a Skin Cell Become a Nerve Cell?
Challenges of Our Civilization-2
How Does a Single Somatic Cell Become a Whole Plant?
How Does Earth's Interior Work?
Are We Alone in the Universe?
How and Where Did Life on Earth Arise?
What Determines Species Diversity?
What Genetic Changes Made Us Uniquely Human?
How Are Memories Stored and Retrieved?
How Did Cooperative Behavior Evolve?
How Will Big Pictures Emerge from a Sea of Biological Data?
Challenges of Our Civilization-3
How Far Can We Push Chemical Self-Assembly?
What Are the Limits of Conventional Computing?
Can We Selectively Shut Off Immune Responses?
Do Deeper Principles Underlie Quantum Uncertainty and Nonlocality?
Is an Effective HIV Vaccine Feasible?
How Hot Will the Greenhouse World Be?
What Can Replace Cheap Oil -- and When?
Will Malthus Continue to Be Wrong?
Lecture Outline
What does bioinformatics do?
Course topics
Course Organization
Workload/grades
Technical Definitions
NIH (http://www.bisti.nih.gov/)
Bioinformatics: “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, represent, describe, store, analyze, or visualize such data”.
Computational Biology: “the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems”.
Scope of Bioinformatics: Studying Biology on Computer
data management; data mining; modeling; prediction; theory formulation
engineering aspect
scientific aspect
bioinformatics
an indispensable part of biological science with its own methodology
genes, proteins, protein complexes, pathways, cells, organisms, ecosystem
computer science, biology, statistics, physics, mathematics, chemistry, engineering, …
Why Is Bioinformatics So Hot? (I)
More than 80 universities offer graduate degrees in bioinformatics
At the intersection of two of the most active fields: computer science and molecular biology
Exponential growth in computer technologies (hardware, Internet) paves the way for bioinformatics development
Why Is Bioinformatics So Hot? (II)
Analytical technology
High-throughput data
Biological knowledge
Medicine & bioengineering
What Can Computing Do for Biology?
Data interpretation in analytical technologies
Data management and computational infrastructure
Discovery from data mining
Modeling, prediction and design
Theoretical / in silico biology
Covers almost every area of computer science
Data Interpretation in Analytical Technologies (I)
Analytical technologies are the driving force of new (large-scale) biology:
DNA sequencing (genomics)
X-ray / NMR structure determination (structural genomics)
Protein identification using mass spectrometry (proteomics)
Microarray chips (functional genomics)
Data Interpretation in Analytical Technologies (II)
NMR protein structure determination
[Figure: NMR spectra -> peak assignment -> structural restraint extraction -> structure calculation -> protein structure]
Data Interpretation in Analytical Technologies (III)
From image to data (image processing)
Large-scale data cannot be handled without computers
Noisy data (optimization with under-constraint / over-constraint)
Computer algorithms/programs can mimic the human interpretation process and do it much faster
Automation of experimental data interpretation
Data Management and Computational Infrastructure
Track instruments, experiment conditions and results at each step of a complicated biological experiment (LIMS at modern wet labs)
Data storage and retrieval (database)
Data visualization
Data query and analysis pipeline
Discovery from Data Mining (I)
Pattern/knowledge discovery from data
many biological data are generated by biological processes which are not well understood
interpretation of such data requires discovery of convoluted relationships hidden in the data
which segment of a DNA sequence represents a gene, a regulatory region
which genes are possibly responsible for a particular disease
Complicated data
Large-scale, high-dimension
Noisy (false positives and false negatives)
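As one illustration of the question "which segment of a DNA sequence represents a gene", the sketch below scans a sequence for open reading frames (ORFs). This is a deliberately simplified stand-in for real gene finding, not a method taken from the slides. It assumes the standard start codon ATG and stop codons TAA/TAG/TGA, looks at the forward strand only, and uses a hypothetical `min_codons` threshold; the function name `find_orfs` is also just for illustration.

```python
# Minimal sketch: find open reading frames (ORFs) on the forward strand.
# Assumptions (not from the slides): start codon ATG, stop codons TAA/TAG/TGA,
# no reverse-complement strand, and a hypothetical minimum ORF length.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) pairs; end is exclusive and includes the stop codon."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # three possible reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the candidate and keep scanning
    return sorted(orfs)


if __name__ == "__main__":
    seq = "CCATGAAATGA"
    for begin, end in find_orfs(seq):
        print(begin, end, seq[begin:end])
```

Real gene prediction must also deal with the noise mentioned above (false positives such as ORFs that arise by chance), which is why practical tools rely on statistical models rather than a plain codon scan.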
Discovery from Data Mining (II)
Modeling, Prediction and Design (I)
Modeling and prediction of biological objects/processes
modeling of biochemistry: enzyme reaction rates
modeling of biophysics: dynamics of biomolecules
modeling of evolution: prediction of phylogeny
Prediction of outcomes of biological processes
computing will become an integral part of modern biology through an iterative process of model formulation, computational prediction, and experimental validation
From prediction to engineering design
Protein structure prediction to protein engineering
Design genetically modified species
Modeling, Prediction and Design (II)
Theoretical / In Silico Biology
Generate new hypotheses; formulate and test fundamental theories of biology
new hypothesis about detailed evolutionary history, through mining genomic sequence data?
new hypothesis about a particular signaling network, through data mining?
new hypothesis about protein folding pathways, through simulations?
Bioinformatics Application to Biological Systems
plants (Arabidopsis)
bacteria (Synechococcus)
viruses (SARS)
yeast (Saccharomyces cerevisiae)
neural systems (neurons)
Can Biology Help Computing?
Computational techniques inspired by biology:
Neural network (artificial intelligence)
Genetic algorithm, automata
A new driver of computer science:
Better hardware (supercomputers)
New data representation
Develop new theoretical framework:
DNA computing
Network communication (communication between ants, see http://news-service.stanford.edu/news/2003/may7/antchat-57.html)
Computing versus Biology
what computer science is to molecular biology is like what mathematics has been to physics ...... -- Larry Hunter, ISMB’94
molecular biology is (becoming) an information science .......
-- Leroy Hood, RECOMB’00
Bioinformatics is still in its infancy!
Course Topics
Data interpretation in analytical technologies
Data management and computational infrastructure
Discovery from data mining
Modeling, prediction and design
Theoretical / in silico biology
Cover classical/mainstream bioinformatics problems from a computer science perspective
Course Schedule
See http://digbio.missouri.edu/cs7010/
First take-home exam: given on 9/29; due on 10/6
Second take-home exam: given on 11/17; due on 11/29
Three phases of project: due 9/22, 10/20, and 11/17; final report due 12/8
What I Will Teach
A general introduction to a few major problems in the field of bioinformatics
problem definitions: from biological problem to computable problem
some key computational techniques
A way of thinking: tackling a “biological problem” computationally
how to look at a biological problem from a computational point of view
how to formulate a computational problem to address a biological issue
how to collect statistics from biological data
how to build a computational model
how to design algorithms for the model
how to test and evaluate a computational algorithm
how to assess confidence of a prediction result
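To make "how to test and evaluate a computational algorithm" concrete, here is a minimal sketch that compares binary predictions against known labels and reports a few common measures. The measures chosen (sensitivity, specificity, precision) and all function names are assumptions for illustration; the slides do not say which evaluation measures the course uses.

```python
# Minimal sketch: evaluate binary predictions against known answers.
# Assumption (not from the slides): predictions and labels are booleans,
# e.g. "is this sequence segment a gene?".


def confusion_counts(predicted, actual):
    """Return (tp, fp, tn, fn) for two equal-length sequences of booleans."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1  # true positive
        elif p and not a:
            fp += 1  # false positive
        elif not p and not a:
            tn += 1  # true negative
        else:
            fn += 1  # false negative
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Safe division: return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(predicted, actual):
    tp, fp, tn, fn = confusion_counts(predicted, actual)
    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of real positives found
        "specificity": ratio(tn, tn + fp),  # fraction of real negatives rejected
        "precision": ratio(tp, tp + fp),    # fraction of predictions that are right
    }


if __name__ == "__main__":
    predicted = [True, True, False, False, True]
    actual = [True, False, False, True, True]
    print(evaluate(predicted, actual))
```

Reporting more than one measure matters because a method can score well on one (for example, by predicting everything as positive) while doing poorly on another.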
New Ways of Thinking
Critical thinking
Analytical thinking
Quantitative thinking
Algorithmic thinking
A Brief Survey
Register for the course?
Academic department?
Computer background?
Biology background?
Statistical background?
Taken another bioinformatics course?
Prerequisites
CS 2050 (Algorithm Design and Programming II) or equivalent training
Statistics 2500 (Introduction to Probability and Statistics I) or equivalent training
Programming skills in any programming language are required
No biology background is necessary
Course Info
Co-instructor: Trupti Joshi (joshitr@missouri.edu)
Course Web Site:
http://digbio.missouri.edu/cs7010/
Reference Books - 1
• Neil C. Jones and Pavel A. Pevzner: An Introduction to Bioinformatics Algorithms (Computational Molecular Biology). MIT Press, 2004.
• Pavel Pevzner: Computational Molecular Biology - An Algorithmic Approach. MIT Press, 2000.
• Current Topics in Computational Molecular Biology, edited by Tao Jiang, Ying Xu, and Michael Zhang. MIT Press, 2002.
Reference Books - 2
• Pierre Baldi and Soren Brunak: Bioinformatics – The Machine Learning Approach (second edition). MIT Press, 2001.
• Dan Gusfield: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.
• Warren J. Ewens and Gregory R. Grant: Statistical Methods in Bioinformatics – An Introduction. Springer, 2001.
• Terry Speed: Statistical Analysis of Gene Expression Microarray Data. Chapman & Hall/CRC, 2003.
Lectures
3:30pm – 4:45pm, Tuesday and Thursday
PowerPoint slides for each lecture (posted before the lecture)
Questions/answers at the beginning and end of each lecture
Discussions are encouraged during the lecture (a topic discussion may be held at the end of a lecture)
Office Hours
4:45pm-5:35pm, Tuesdays and Thursdays
The instructor who delivers the lecture will hold the office hour
Dong Xu: Room 109, Engineering Building West (882-7064)
Trupti Joshi: Room 317, Engineering Building North (884-3528)
Special office hours will be arranged close to the final
Appointments at other times
Minimum Requirement
Attend class regularly
Read suggested class handout after class
Deliver the two take-home exams
Deliver final project (for graduate students)
Expected workload: 5-6 hours / week in addition to class attendance
How to Get Maximum out of the Course
Study suggested reading/slides before class
Study optional reading
Ask questions in class
Visit office hours frequently
Do the homework assignments (not graded)
Not required (not counted in the final grade) but encouraged.
Grading
A final grade of A, B, C, etc. will be assigned
2 take-home exams (20% each)
Project: 3 phase reports (5% each), final report (15%), software demo (15%), presentation (15%)
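The weights above add up to 100%. As a quick check, the sketch below combines component scores into a weighted total. The component names and the 0-100 score scale are assumptions for illustration; the slides give only the weights and do not state letter-grade cutoffs, so none are applied here.

```python
# Minimal sketch: weighted course total from the grading weights on the slide.
# Assumptions (not from the slides): component names and a 0-100 score scale.

WEIGHTS = {
    "exam_1": 20,
    "exam_2": 20,
    "phase_1_report": 5,
    "phase_2_report": 5,
    "phase_3_report": 5,
    "final_report": 15,
    "software_demo": 15,
    "presentation": 15,
}


def weighted_total(scores):
    """Return the weighted total (0-100) for a dict of component scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    print(sum(WEIGHTS.values()))  # weights should sum to 100
    example = {name: 80 for name in WEIGHTS}
    example["exam_1"] = 100
    example["exam_2"] = 100
    print(weighted_total(example))
```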
Final project
A working bioinformatics program that can be used by biologists, or a comprehensive computational analysis of bioinformatics tool outputs
One student one project (independent development) with consultation from instructors
Potential for publication
Three Phases of Project
Phase 1 (due 9/22): Define your project subject. A brief literature survey and illustration of its importance.
Phase 2 (due 10/20): Describe key methods.
Phase 3 (due 11/17): Present key results.
Final report: due 12/8
Discussion
What do you expect from this course?
- content?
- ways of teaching?
- how the instructors can help?
-…
Assignments
Suggested reading:
http://bioinfo.mbb.yale.edu/e-print/whatis-mim/text.pdf
Bioboxes in “Neil C. Jones and Pavel A. Pevzner: An Introduction to Bioinformatics Algorithms (Computational Molecular Biology). MIT Press, 2004.”
Optional reading:
Chapter 1 in “Current Topics in Computational Molecular Biology, edited by Tao Jiang, Ying Xu, and Michael Zhang. MIT Press, 2002.”
http://www.ncbi.nih.gov/About/primer/bioinformatics.html