CS 7010: Computational Methods in Bioinformatics (course introduction) Dong Xu Computer Science Department 109 Engineering Building West E-mail: [email protected] 573-882-7064 http://digbio.missouri.edu


Page 1: CS 7010: Computational  Methods in Bioinformatics (course introduction)

CS 7010: Computational Methods in Bioinformatics (course introduction)

Dong Xu

Computer Science Department
109 Engineering Building West
E-mail: [email protected]
573-882-7064
http://digbio.missouri.edu

Page 2: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Challenges of Our Civilization - 1

Top 125 unsolved problems in science over the next quarter-century (http://www.sciencemag.org/sciext/125th/)

The Top 25

What Is the Universe Made Of?

What is the Biological Basis of Consciousness?

Why Do Humans Have So Few Genes?

To What Extent Are Genetic Variation and Personal Health Linked?

Can the Laws of Physics Be Unified?

How Much Can Human Life Span Be Extended?

What Controls Organ Regeneration?

How Can a Skin Cell Become a Nerve Cell?

Page 3: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Challenges of Our Civilization - 2

How Does a Single Somatic Cell Become a Whole Plant?

How Does Earth's Interior Work?

Are We Alone in the Universe?

How and Where Did Life on Earth Arise?

What Determines Species Diversity?

What Genetic Changes Made Us Uniquely Human?

How Are Memories Stored and Retrieved?

How Did Cooperative Behavior Evolve?

How Will Big Pictures Emerge from a Sea of Biological Data?

Page 4: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Challenges of Our Civilization - 3

How Far Can We Push Chemical Self-Assembly?

What Are the Limits of Conventional Computing?

Can We Selectively Shut Off Immune Responses?

Do Deeper Principles Underlie Quantum Uncertainty and Nonlocality?

Is an Effective HIV Vaccine Feasible?

How Hot Will the Greenhouse World Be?

What Can Replace Cheap Oil -- and When?

Will Malthus Continue to Be Wrong?

Page 5: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Lecture Outline

What does bioinformatics do?

Course topics

Course Organization

Workload/grades

Page 6: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Technical Definitions

NIH definitions (http://www.bisti.nih.gov/):

Bioinformatics: “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, represent, describe, store, analyze, or visualize such data”.

Computational Biology: “the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems”.

Page 7: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Scope of Bioinformatics: Studying Biology on Computer

[Diagram labels]

bioinformatics: an indispensable part of biological science with its own methodology

engineering aspect; scientific aspect

data management; data mining; modeling; prediction; theory formulation

genes, proteins, protein complexes, pathways, cells, organisms, ecosystem

computer science, biology, statistics, physics, mathematics, chemistry, engineering, …

Page 8: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Why Is Bioinformatics So Hot? (I)

More than 80 universities offer graduate degrees in bioinformatics

At the intersection of two of the most active fields: computer science and molecular biology

Exponential growth in computer technologies (hardware, Internet) paves the way for bioinformatics development

Page 9: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Why Is Bioinformatics So Hot? (II)

Analytical technology

High-throughput data

Biological knowledge

Medicine & bioengineering

Page 10: CS 7010: Computational  Methods in Bioinformatics (course introduction)

What Can Computing Do for Biology?

Data interpretation in analytical technologies

Data management and computational infrastructure

Discovery from data mining

Modeling, prediction and design

Theoretical / in silico biology

Covers almost every area of computer science

Page 11: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Data Interpretation in Analytical Technologies (I)

Analytical technologies are the driving force of new (large-scale) biology:

DNA sequencing (genomics)

X-ray / NMR structure determination (structural genomics)

Protein identification using mass spectrometry (proteomics)

Microarray chips (functional genomics)

Page 12: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Data Interpretation in Analytical Technologies (II)

NMR protein structure determination

[Figure: polypeptide backbone diagram with residues labeled i-1 to i+4. Pipeline labels: NMR spectra, peak assignment, structural restraint extraction, structure calculation, protein structure]

Page 13: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Data Interpretation in Analytical Technologies (III)

From image to data (image processing)

Large-scale data cannot be handled without computers

Noisy data (optimization for under-constrained / over-constrained problems)

Computer algorithms/programs can mimic the human interpretation process and do it much faster

Automation of experimental data interpretation

Page 14: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Data Management and Computational Infrastructure

Track instruments, experimental conditions and results at each step of a complicated biological experiment (laboratory information management systems, LIMS, in modern wet labs)

Data storage and retrieval (database)

Data visualization

Data query and analysis pipeline

Page 15: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Discovery from Data Mining (I)

Page 16: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Discovery from Data Mining (II)

Pattern/knowledge discovery from data

much biological data is generated by biological processes which are not well understood

interpretation of such data requires discovery of convoluted relationships hidden in the data

which segment of a DNA sequence represents a gene or a regulatory region?

which genes are possibly responsible for a particular disease?

Complicated data

Large-scale, high-dimensional

Noisy (false positives and false negatives)
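To make the question "which segment of a DNA sequence represents a gene" concrete, here is a minimal sketch of the simplest computational formulation: scanning for open reading frames (ORFs), meaning an ATG start codon followed in frame by a stop codon. This is only an illustration and not the method taught in the course; practical gene finders rely on statistical models. The function name `find_orfs`, the `min_codons` parameter, and the toy sequence are assumptions made for this example.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) pairs of ORFs on the forward strand.

    An ORF here runs from an ATG up to and including the first in-frame
    stop codon; `end` is exclusive, so dna[start:end] is the ORF itself.
    Hypothetical helper for illustration; the reverse strand is ignored.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return sorted(orfs)


# Made-up toy sequence, used only for illustration.
toy_dna = "CCATGAAATAGGG"
toy_orfs = find_orfs(toy_dna)
```

On the toy sequence the only ORF found is `toy_dna[2:11]`, that is `ATGAAATAG`. Noisy data, as noted on the slide, is exactly where such a rule-based scan produces false positives and false negatives.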

Page 17: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Modeling, Prediction and Design (I)

Modeling and prediction of biological objects/processes

modeling of biochemistry: enzyme reaction rates

modeling of biophysics: dynamics of biomolecules

modeling of evolution: prediction of phylogeny

Page 18: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Modeling, Prediction and Design (II)

Prediction of outcomes of biological processes

computing will become an integral part of modern biology through an iterative process of model formulation, computational prediction, and experimental validation

From prediction to engineering design

Protein structure prediction to protein engineering

Design genetically modified species

Page 19: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Theoretical / In Silico Biology

Generate new hypotheses, formulate and test fundamental theories of biology

new hypotheses about detailed evolutionary history, through mining genomic sequence data?

new hypotheses about a particular signaling network, through data mining?

new hypotheses about protein folding pathways, through simulations?

Page 20: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Bioinformatics Application to Biological Systems

plants (Arabidopsis)

bacteria (Synechococcus)

viruses (SARS)

yeast (Saccharomyces cerevisiae)

neural systems (neurons)

Page 21: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Can Biology Help Computing?

Computational techniques inspired by biology:

Neural network (artificial intelligence)

Genetic algorithm, automata

A new driver of computer science:

Better hardware (supercomputers)

New data representation

Develop new theoretical framework:

DNA computing

Network communication (communication between ants, see http://news-service.stanford.edu/news/2003/may7/antchat-57.html)

Page 22: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Computing versus Biology

what computer science is to molecular biology is like what mathematics has been to physics ...... -- Larry Hunter, ISMB’94

molecular biology is (becoming) an information science .......

-- Leroy Hood, RECOMB’00

Bioinformatics is still in its infancy!

Page 23: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Lecture Outline

What does bioinformatics do?

Course topics

Course Organization

Workload/grades

Page 24: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Course Topics

Data interpretation in analytical technologies

Data management and computational infrastructure

Discovery from data mining

Modeling, prediction and design

Theoretical / in silico biology

Cover classical/mainstream bioinformatics problems from a computer science perspective

Page 25: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Course Schedule

See http://digbio.missouri.edu/cs7010/

First take-home exam: given on 9/29; due on 10/6

Second take-home exam: given on 11/17; due on 11/29

Three phases of project: due 9/22, 10/20, and 11/17; final report due 12/8

Page 26: CS 7010: Computational  Methods in Bioinformatics (course introduction)

What I Will Teach

A general introduction to a few major problems in the field of bioinformatics

problem definitions: from biological problem to computable problem

some key computational techniques

A way of thinking: tackling “biological problems” computationally

how to look at a biological problem from a computational point of view

how to formulate a computational problem to address a biological issue

how to collect statistics from biological data

how to build a computational model

how to design algorithms for the model

how to test and evaluate a computational algorithm

how to assess the confidence of a prediction result
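As a small illustration of "how to collect statistics from biological data", the sketch below counts base frequencies and GC content in a DNA string. It is an assumed example, not course material: the helper names `base_composition` and `gc_content` and the toy sequence are made up for this illustration.

```python
from collections import Counter


def base_composition(sequence):
    """Return the relative frequency of each symbol in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {base: count / len(seq) for base, count in counts.items()}


def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    freqs = base_composition(sequence)
    return freqs.get("G", 0.0) + freqs.get("C", 0.0)


# Made-up toy sequence, used only for illustration.
toy_sequence = "ATGCGCGATATCGC"
composition = base_composition(toy_sequence)
gc = gc_content(toy_sequence)
```

The toy sequence has 14 bases (3 A, 3 T, 4 G, 4 C), so its GC content is 8/14. Simple counts like these are typically the starting point for building and testing a computational model.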

Page 27: CS 7010: Computational  Methods in Bioinformatics (course introduction)

New Ways of Thinking

Critical thinking

Analytical thinking

Quantitative thinking

Algorithmic thinking

Page 28: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Lecture Outline

What does bioinformatics do?

Course topics

Course Organization

Workload/grades

Page 29: CS 7010: Computational  Methods in Bioinformatics (course introduction)

A Brief Survey

Register for the course?

Academic department?

Computer background?

Biology background?

Statistical background?

Taken another bioinformatics course?

Page 30: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Prerequisites

CS 2050 (Algorithm Design and Programming II) or equivalent training

Statistics 2500 (Introduction to Probability and Statistics I) or equivalent training

Programming skills in any programming language are required

No biology background is necessary

Page 31: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Course Info

Co-instructor: Trupti Joshi ([email protected])

Course Web Site:

http://digbio.missouri.edu/cs7010/

Page 32: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Reference Books - 1

• Neil C. Jones and Pavel A. Pevzner: An Introduction to Bioinformatics Algorithms (Computational Molecular Biology). MIT Press, 2004.

• Pavel Pevzner: Computational Molecular Biology - An Algorithmic Approach. MIT Press, 2000.

• Current Topics in Computational Molecular Biology, edited by Tao Jiang, Ying Xu, and Michael Zhang. MIT Press, 2002.

Page 33: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Reference Books - 2

• Pierre Baldi and Soren Brunak: Bioinformatics – The Machine Learning Approach (second edition). MIT Press, 2001.

• Dan Gusfield: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.

• Warren J. Ewens and Gregory R. Grant: Statistical Methods in Bioinformatics – An Introduction. Springer, 2001.

• Terry Speed: Statistical Analysis of Gene Expression Microarray Data. Chapman & Hall/CRC, 2003.

Page 34: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Lectures

3:30pm – 4:45pm, Tuesday and Thursday

PowerPoint slides for each lecture (posted before the lecture)

Questions/answers at the beginning and end of each lecture

Discussions are encouraged during the lecture (a topic discussion may be held at the end of a lecture)

Page 35: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Office Hours

4:45pm-5:35pm, Tuesdays and Thursdays

The instructor who delivers the lecture will hold the office hour

Dong Xu: Room 109, Engineering Building West (882-7064)

Trupti Joshi: Room 317, Engineering Building North (884-3528)

Special office hours will be arranged close to the final

Appointments at other times

Page 36: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Lecture Outline

What does bioinformatics do?

Course topics

Course Organization

Workload/grades

Page 37: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Minimum Requirement

Attend class regularly

Read suggested class handout after class

Deliver the two take-home exams

Deliver final project (for graduate students)

Expected workload: 5-6 hours / week in addition to class attendance

Page 38: CS 7010: Computational  Methods in Bioinformatics (course introduction)

How to Get the Most out of the Course

Study suggested reading/slides before class

Study optional reading

Ask questions in class

Visit office hours frequently

Do homework assignments (not graded)

Not required (not counted in the final grade) but encouraged.

Page 39: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Grading

A final grade of A, B, C, etc. will be assigned

2 take-home exams (20% each)

Project: 3 phase reports (5% each), final report (15%), software demo (15%), presentation (15%)

Final project

A working bioinformatics program that can be used by biologists, or a comprehensive computational analysis of bioinformatics tool outputs

One student one project (independent development) with consultation from instructors

Potential for publication
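The grading weights on this slide add up to 100%: 40% for the two exams and 60% for the project components. The sketch below shows the arithmetic as a weighted sum. The weights come from the slide; the component names, the `weighted_total` helper, and the example scores are assumptions for illustration, and no letter-grade cutoffs are implied because the slide does not give any.

```python
# Weights taken from the Grading slide; component names are illustrative.
WEIGHTS = {
    "exam_1": 0.20,
    "exam_2": 0.20,
    "phase_1": 0.05,
    "phase_2": 0.05,
    "phase_3": 0.05,
    "final_report": 0.15,
    "software_demo": 0.15,
    "presentation": 0.15,
}


def weighted_total(scores):
    """Return the weighted course total (0-100) from component scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: 90 on each exam, 80 on each project component.
example_scores = {
    name: (90 if name.startswith("exam") else 80) for name in WEIGHTS
}
total = weighted_total(example_scores)
```

For this hypothetical student the total is 0.40 × 90 + 0.60 × 80 = 84, up to floating-point rounding.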

Page 40: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Three Phases of Project

Phase 1 (due 9/22): Define your project subject. Provide a brief literature survey and illustrate the subject's importance.

Phase 2 (due 10/20): Describe key methods.

Phase 3 (due 11/17): Present key results.

Final report: due 12/8

Page 41: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Discussion

What do you expect from this course?

- content?

- ways of teaching?

- how the instructors can help?

-…

Page 42: CS 7010: Computational  Methods in Bioinformatics (course introduction)

Assignments

Suggested reading:

http://bioinfo.mbb.yale.edu/e-print/whatis-mim/text.pdf

Bioboxes in “Neil C. Jones and Pavel A. Pevzner: An Introduction to Bioinformatics Algorithms (Computational Molecular Biology). MIT Press, 2004.”

Optional reading:

Chapter 1 in “Current Topics in Computational Molecular Biology, edited by Tao Jiang, Ying Xu, and Michael Zhang. MIT Press. 2002.”

http://www.ncbi.nih.gov/About/primer/bioinformatics.html