
CSE891-002 Selected Topics in Bioinformatics

Jin Chen, 232 Plant Biology Bld.

2011 Spring

About me

• Jin Chen, Assistant Professor in CSE and PRL since 2009

• Office: 232 Plant Biology Lab. Tel: (517) 355-5015. Email: [email protected]

Outline

• Course Description

• Introduction to Computational Network Biology

Course Description

• Course objectives: study interesting problems in computational network biology and their algorithms, with a focus on the principles used to design those algorithms. (3 credits)

• Instructor: Jin Chen, Office: 232 Plant Biology Bld. Email: [email protected]

• Office hours: Thursday 2PM-3PM. If you cannot attend office hours, email me about scheduling a different time.

• Web page: http://www.msu.edu/~jinchen/cse891a

Course Description

• Course work: one 80-minute lecture and 80 minutes of discussion and student presentations each week

• Grading policies: The course will be graded on attendance (10%), participation (20%), and presentation (70%).

• No Final Exam

Course Description

• Prerequisites: Graduate students in science or engineering.

Note: an override is necessary for non-CSE graduate students; please send your PID & NetID to me.

• No prior knowledge of biology is required. Computationally inclined biology graduate students are encouraged to take the class as well.

Suggested books

• A.-L. Barabási, Linked: The New Science of Networks

• U. Alon, An Introduction to Systems Biology

• B. Palsson, Systems Biology: Properties of Reconstructed Networks

• K. Kaneko, Life: An Introduction to Complex Systems Biology

Course Description

[Figure: the two areas covered by the course. Network Biology: protein-protein interaction network, gene regulatory network, metabolic network, integrative study. Graph Mining: graph model, graph clustering, subgraph mining.]

Date | Title | Topics
1/11/11 | Course Organization. Introduction. | Introduction to Computational Network Biology
1/13/11 | Protein-Protein Interaction Networks I | PPI network construction and false-positive detection
1/18/11 | Student Presentation & Discussion 1
1/20/11 | Protein-Protein Interaction Networks II | Topological analysis of PPI networks. Network motifs.
1/25/11 | Student Presentation & Discussion 2
1/27/11 | Protein-Protein Interaction Networks III | Applications of PPI networks (protein function prediction, network comparison)
2/1/11 | Student Presentation & Discussion 3
2/3/11 | Gene Correlation Networks I | Gene co-expression study
2/8/11 | Student Presentation & Discussion 4
2/10/11 | Gene Correlation Networks II | Gene co-regulation study
2/15/11 | Student Presentation & Discussion 5
2/17/11 | Gene Transcriptional Regulation Networks I | cis-elements and gene co-regulation
2/22/11 | Student Presentation & Discussion 6
2/24/11 | Gene Transcriptional Regulation Networks II | Bayesian networks for GRN construction
3/1/11 | Student Presentation & Discussion 7
3/3/11 | Gene Transcriptional Regulation Networks III | ChIP-seq and its applications in GRN construction
3/7/11-3/11/11 | Spring Break
3/15/11 | Student Presentation & Discussion 8
3/17/11 | Gene Transcriptional Regulation Networks IV | GRN topological study
3/22/11 | Student Presentation & Discussion 9
3/24/11 | Metabolic Networks I | Flux balance analysis and metabolic control analysis
3/29/11 | Student Presentation & Discussion 10
3/31/11 | Metabolic Networks II | Integrative study: r-FBA model
4/5/11 | Student Presentation & Discussion 11
4/7/11 | Graph Mining I | Graph models
4/12/11 | Student Presentation & Discussion 12
4/14/11 | Graph Mining II | Graph clustering and partitioning
4/19/11 | Student Presentation & Discussion 13
4/21/11 | Graph Mining III | Frequent subgraph mining
4/26/11 | Student Presentation & Discussion 14

Paper list

1. Chua et al. Exploiting indirect neighbours and topological weight to predict protein function from protein–protein interactions. Bioinformatics 22(13): 1623-1630, 2006.

2. Kashani et al. Kavosh: a new algorithm for finding network motifs. BMC Bioinformatics 10: 318, 2009.

3. Deng et al. Prediction of Protein Function Using Protein–Protein Interaction Data. Journal of Computational Biology 10(6): 947-960, December 2003.

4. Hu et al. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics 21(Suppl. 1): i213-i221, 2005.

5. Xu et al. Mining Shifting-and-Scaling Co-Regulation Patterns on Gene Expression Profiles. ICDE, 2006.

6. Xu et al. Discovering cis-Regulatory RNAs in Shewanella Genomes by Support Vector Machines. PLoS Computational Biology 5(4), 2009.

7. Huang et al. Large-scale regulatory network analysis from microarray data: modified Bayesian network learning and association rule mining. Decision Support Systems 43: 1207-1225, 2007.

8. Honkela et al. Model-based method for transcription factor target identification with limited data. PNAS 107(17): 7793-7798, 2010.

9. Vermeirssen et al. Transcription factor modularity in a Gene-Centered C. elegans Protein-DNA interaction network. Genome Research 17: 1061-1071, 2007.

10. Covert et al. Transcriptional Regulation in Constraints-Based Metabolic Models of Escherichia coli. Journal of Biological Chemistry 277(31): 28058-28064, 2002.

11. Herrgard et al. Integrated analysis of regulatory and metabolic networks reveals novel regulatory mechanisms in Saccharomyces cerevisiae. Genome Research 16: 627-635, 2006.

12. Barabási et al. Network Biology: Understanding the Cell's Functional Organization. Nature Reviews Genetics 5: 101-113, 2004.

13. van Dongen. A cluster algorithm for graphs. Technical Report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000.

14. Huan et al. Mining Family Specific Residue Packing Patterns from Protein Structure Graphs. RECOMB, pp. 308-315, 2004.

Course Description

• Select at least one paper from the paper list to present. Email me which paper you will present by next Monday (1/17/2011).

• Each presentation is 45 min, including 15 min of Q&A, and is followed by a discussion

• Your grade will be largely determined by the presentation (70%)

• Presentations start next Tuesday (1/18/2011)

Important Days:

• Class begins: 1/10/2011

• Open adds end: 1/14/2011

• Last day to drop with refund: 2/3/2011

• Last day to drop with no grade reported: 3/2/2011

• Class ends: 5/6/2011

Introduction to Computational Network Biology

• Network biology is a branch of systems biology, which in turn is part of genomics

• It is interested in the relations between entities (e.g., genes, proteins, metabolites) rather than in the entities themselves

http://bionet.bioapps.biozentrum.uni-wuerzburg.de/

Networks are everywhere

• Internet, social networks, anti-terrorism networks

• Biological networks
  – Protein-protein interaction (PPI) network
  – Protein-DNA interaction network
  – Gene correlation network
  – Gene regulatory network
  – Metabolic network
  – Signaling network
  – …

• Networks are a tool for understanding complex systems

• Network models explain network properties and support the study of network behavior

• Network measures provide a quantitative analysis of complex systems

Definition of network (graph)

[Figure: a graph G(V,E) with its parts labeled: node (vertex), edge, self-loop, and a multi-set of edges (multi-edge)]

Simple graph: does not have loops (self-edges) and does not have multi-edges.
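As a small illustration of this definition, the sketch below checks whether an edge list describes a simple graph. It assumes a hypothetical input format (a list of (u, v) pairs) and treats (u, v) and (v, u) as the same edge unless `directed=True`; the function name is illustrative only.

```python
def is_simple_graph(edges, directed=False):
    """Return True if the edge list has no self-loops and no multi-edges.

    edges: iterable of (u, v) pairs (hypothetical edge-list format).
    In an undirected graph, (u, v) and (v, u) are the same edge.
    """
    seen = set()
    for u, v in edges:
        if u == v:  # self-loop
            return False
        key = (u, v) if directed else frozenset((u, v))
        if key in seen:  # the same edge appears twice: multi-edge
            return False
        seen.add(key)
    return True


print(is_simple_graph([("a", "b"), ("b", "c")]))                 # True
print(is_simple_graph([("a", "a")]))                             # False
print(is_simple_graph([("a", "b"), ("b", "a")]))                 # False
print(is_simple_graph([("a", "b"), ("b", "a")], directed=True))  # True
```

In the directed case, (a, b) and (b, a) are two distinct arcs, so that graph is still simple.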

Definition of network (graph)

Directed graph vs. undirected graph

Labeled graph vs. unlabeled graph

Symmetric graph vs. asymmetric graph

Webpage layout

M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)

Pages on a web site and the hyperlinks between them

[Figure adopted from R. Albert's slides]

Biological networks

[Figure: Yeast protein-protein interaction network (Hawoong Jeong)]

[Figure: Gene regulation network of sea urchin (Eric Davidson)]

[Figure: Metabolic flux analysis of E. coli (Abhishek Murarka)]

Why study networks?

• Complex systems cannot be fully described from a reductionist view

• Studying the behavior of a complex system starts with understanding its network topology

• Network-related questions:
  – How do we reconstruct a network?
  – How can we quantitatively describe large networks?
  – How did networks get to be the way they are?

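Simple measures

• Node degree: the number of edges connected to the node
  – In a directed graph, each node has an in-degree and an out-degree
  – Every edge adds one to some node's out-degree and one to some node's in-degree, so total in-degree == total out-degree

• Average degree: the average of the node degrees over all nodes in the network, denoted as

$$\langle k \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i$$

• Degree distribution: the degree distribution P(k) gives the fraction of nodes that have exactly k edges,

$$P(k) = \frac{N_k}{N}$$

where N is the number of nodes in the network, k_i is the degree of node i, and N_k is the number of nodes with degree k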
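These measures can be computed directly from an edge list. The sketch below assumes a simple graph stored as (u, v) pairs; `in_out_degrees`, `degree_stats`, and the optional `nodes` argument (for isolated nodes, which a bare edge list cannot express) are illustrative names, not part of any library.

```python
from collections import Counter


def in_out_degrees(arcs):
    """In- and out-degree of every node of a directed graph given as (u, v) arcs."""
    out_deg = Counter(u for u, _ in arcs)  # u -> v adds 1 to u's out-degree
    in_deg = Counter(v for _, v in arcs)   # ... and 1 to v's in-degree
    return in_deg, out_deg


def degree_stats(edges, nodes=()):
    """Degrees k_i, average degree <k>, and distribution P(k) of an undirected graph.

    nodes: optional extra nodes (e.g. isolated ones) that start at degree 0.
    """
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        # each undirected edge adds 1 to the degree of both endpoints
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    n = len(deg)
    avg_k = sum(deg.values()) / n  # equals 2|E| / N
    counts = Counter(deg.values())  # N_k: number of nodes with degree k
    p_k = {k: c / n for k, c in sorted(counts.items())}
    return deg, avg_k, p_k


# Star with centre a and leaves b, c, d, plus an isolated node e.
deg, avg_k, p_k = degree_stats([("a", "b"), ("a", "c"), ("a", "d")], nodes=["e"])
print(avg_k)  # 1.2
print(p_k)    # {0: 0.2, 1: 0.6, 3: 0.2}
```

For the star, <k> = 2 * 3 / 5 = 1.2, and three of the five nodes have degree 1, so P(1) = 0.6.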

Simple measures

• Shortest path: a path between two nodes such that the sum of the weights of its constituent edges is minimized (in an unweighted graph, the path with the fewest edges)

• Graph diameter: the length of the longest shortest path between any pair of nodes in the graph

• Connected graph: a graph in which any two vertices can be joined by a path

• Bridge: an edge whose removal disconnects the graph (more generally, increases the number of connected components)
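A sketch of the four definitions on this slide, assuming an undirected simple graph with non-negative edge weights stored as a dict of dicts (the `build_graph` input format is hypothetical). Shortest paths use Dijkstra's algorithm, the diameter runs it from every node, and bridges are found with Tarjan's depth-first "low-link" technique; these are standard choices, not necessarily the ones used in the course.

```python
import heapq
from collections import defaultdict


def build_graph(weighted_edges):
    """Undirected adjacency map {u: {v: w}} from (u, v, w) triples."""
    adj = defaultdict(dict)
    for u, v, w in weighted_edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def dijkstra(adj, source):
    """Shortest-path distances from source (non-negative weights assumed)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; u was already settled with a shorter distance
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def is_connected(adj):
    """True if every node is reachable from an arbitrary start node."""
    nodes = list(adj)
    return not nodes or len(dijkstra(adj, nodes[0])) == len(nodes)


def diameter(adj):
    """Longest shortest-path distance over all node pairs."""
    if not is_connected(adj):
        raise ValueError("the diameter of a disconnected graph is infinite")
    return max(max(dijkstra(adj, s).values()) for s in adj)


def bridges(adj):
    """Edges whose removal increases the number of connected components."""
    disc, low, found = {}, {}, []
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]  # discovery time of u
        timer[0] += 1
        for v in adj[u]:
            if v not in disc:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:  # v's subtree has no back edge above u
                    found.append((u, v))
            elif v != parent:
                low[u] = min(low[u], disc[v])  # back edge

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return found


g = build_graph([("a", "b", 1), ("b", "c", 1), ("a", "c", 5), ("c", "d", 1)])
print(dijkstra(g, "a"))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
print(diameter(g))       # 3
print(bridges(g))        # [('c', 'd')]
```

Here the direct a-c edge (weight 5) loses to the path a-b-c (weight 2), so the weighted diameter is 3, between a and d. The triangle a-b-c has no bridges, while c-d is the only link to d. The recursive DFS is fine for small graphs; very large graphs would need an iterative version.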

Simple measures

• Betweenness centrality: for every node pair (i, j), find all the shortest paths between nodes i and j; denote their number as C(i,j), and the number of them that pass through node k as C_k(i,j). The betweenness centrality of node k is

$$B(k) = \sum_{i < j,\; i, j \neq k} \frac{C_k(i,j)}{C(i,j)}$$

• Calculating betweenness requires the shortest paths between all pairs of vertices in the graph. For a sparse graph, Johnson's algorithm does this in $O(V^2 \log V + VE)$ time.

L. C. Freeman, Sociometry 40, 35 (1977); P. E. Black, Dictionary of Algorithms and Data Structures (2004)
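For unweighted graphs the all-pairs computation can be avoided. The sketch below swaps in Brandes' (2001) accumulation algorithm, which computes the betweenness defined above (the sum of C_k(i,j)/C(i,j) over unordered pairs) in O(VE) time by running one breadth-first search per source and back-propagating shortest-path counts. It assumes an unweighted, undirected simple graph given as a dict of neighbour lists; the function name is illustrative.

```python
from collections import deque


def betweenness(adj):
    """Sum over unordered pairs (i, j), i, j != k, of C_k(i, j) / C(i, j).

    adj: {node: iterable of neighbours}, unweighted and undirected.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s; sigma[v] counts shortest s-v paths, i.e. C(s, v).
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness(path))  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

In the path a-b-c, the only pair not involving b is (a, c), and its single shortest path passes through b, so b scores 1. In a 4-cycle every node scores 0.5, because each opposite pair has two shortest paths.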

Complex measures

• Frequent subgraph mining

• Graph comparison & classification

• Graph isomorphism testing
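To make isomorphism testing concrete, here is a brute-force sketch for two small undirected simple graphs given as edge lists (a hypothetical input format; isolated vertices are not represented). It tries every bijection between the vertex sets, so it costs O(n!) and serves only as an illustration. Practical tools such as Nauty rely on canonical labeling instead.

```python
from itertools import permutations


def are_isomorphic(edges1, edges2):
    """True if some vertex bijection maps the edge set of graph 1 onto graph 2."""

    def to_sets(edges):
        nodes = list({x for e in edges for x in e})
        return nodes, {frozenset(e) for e in edges}

    n1, e1 = to_sets(edges1)
    n2, e2 = to_sets(edges2)
    if len(n1) != len(n2) or len(e1) != len(e2):
        return False  # cheap invariants differ
    for perm in permutations(n2):
        mapping = dict(zip(n1, perm))  # candidate bijection V1 -> V2
        if {frozenset(mapping[x] for x in e) for e in e1} == e2:
            return True
    return False


print(are_isomorphic([("a", "b"), ("b", "c")], [(1, 3), (3, 2)]))          # True
print(are_isomorphic([(1, 2), (2, 3), (3, 4)], [(0, 1), (0, 2), (0, 3)]))  # False
```

The second pair has equal vertex and edge counts, but a 4-node path cannot be mapped onto a 3-leaf star, so every permutation fails.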

Useful software

• Visualization & topological analysis
  – Cytoscape (www.cytoscape.org)
  – Pajek (vlado.fmf.uni-lj.si/pub/networks/pajek)

• Graph-related programming
  – LEDA (www.algorithmic-solutions.com)
  – Nauty (www.cs.sunysb.edu/~algorith/implement/nauty/implement.shtml)

[Figure: timeline marked with the years 1960, 1999, and 2002]

Real networks are much more complex

• Transcription regulatory networks of yeast and E. coli show an interesting example of mixed characteristics:
  – how many genes a transcription factor (TF) interacts with (out-degree): scale-free
  – how many TFs interact with a given gene (in-degree): exponential

Modularity and network motifs

• Cellular functions are likely carried out in a highly modular manner

• Module: a group of genes/proteins that work together to achieve a distinct function

• Biology is full of examples of modularity

Remaining challenges

• Discovery of network motifs is closely tied to how random networks are generated

• The structure of a network motif does not necessarily determine its function

• The relation between higher-level organizational and functional states and the underlying networks has not yet been studied

Voigt, W. et al. Genetics 2005; Ingram, P. J. et al. BMC Genomics 2006

Eric Werner. Nature 2007
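To see why motif discovery depends on random networks: a subgraph is usually called a motif when it appears more often in the real network than in randomized versions of it, so the answer depends on the null model. The sketch below assumes one common choice, degree-preserving edge switching, which keeps every node's in- and out-degree, and uses the feed-forward loop (X→Y, Y→Z, X→Z) as the example subgraph. The function names, the number of swaps, and the z-score summary are illustrative assumptions.

```python
import random


def count_ffl(arcs):
    """Count feed-forward loops X->Y, Y->Z, X->Z in a directed simple graph."""
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
    count = 0
    for x, ys in succ.items():
        for y in ys:
            for z in succ.get(y, ()):
                if z != x and z in ys:  # X also points directly to Z
                    count += 1
    return count


def randomize(arcs, n_swaps, rng):
    """Degree-preserving edge switching: (a,b),(c,d) -> (a,d),(c,b)."""
    arcs = list(arcs)
    present = set(arcs)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(arcs)), rng.randrange(len(arcs))
        (a, b), (c, d) = arcs[i], arcs[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue  # swap would create a self-loop or a duplicate arc
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        arcs[i], arcs[j] = (a, d), (c, b)
    return arcs


def motif_zscore(arcs, n_random=100, seed=0):
    """Compare the real FFL count with counts in randomized networks."""
    rng = random.Random(seed)
    real = count_ffl(arcs)
    rand = [count_ffl(randomize(arcs, 10 * len(arcs), rng)) for _ in range(n_random)]
    mean = sum(rand) / n_random
    sd = (sum((x - mean) ** 2 for x in rand) / n_random) ** 0.5
    z = (real - mean) / sd if sd > 0 else None  # undefined if no variation
    return real, mean, z


print(count_ffl([("x", "y"), ("y", "z"), ("x", "z")]))  # 1
```

Each swap moves the heads of two arcs, so every node keeps the same number of outgoing and incoming arcs. A different null model (for example, one that ignores degrees) can produce a very different z-score for the same network.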

Next class

• PPI network construction

• False-positive detection