CAP 5510: Introduction to Bioinformatics


Giri Narasimhan, ECS 254; Phone: x3748

giri@cis.fiu.edu

www.cis.fiu.edu/~giri/teach/BioinfS07.html

[Figure: a gene g represented on the array by probes Probe 1, Probe 2, …, Probe N]


Microarray Data

A microarray data set is a matrix of expression levels: each row is a gene (Gene1, Gene2, Gene3, …) and each column is a sample/experiment. Here Expt 1, Expt 2, Expt 3, …, Expt n measure treated samples at times t1, t2, t3, …, tn, so the matrix can be used to study the effect of a treatment over time.


http://www.arabidopsis.org/info/2010_projects/comp_proj/AFGC/RevisedAFGC/Friday/


How to compare 2 cell samples with Two-Color Microarrays?

mRNA from sample 1 is extracted and labeled with a red fluorescent dye.

mRNA from sample 2 is extracted and labeled with a green fluorescent dye.

Mix the samples and apply the mixture to every spot on the microarray; hybridize the sample mixture to the probes.

Use an optical detector to measure the amount of green and red fluorescence at each spot.
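The slides stop at measuring the red and green fluorescence per spot; a common way to summarize the two channels into a single per-spot value (an assumption beyond the slides, with illustrative names) is the log2 ratio, sketched below.

```python
# Per-spot log2(red/green) ratio for a two-color array; the pseudocount
# is an assumption to guard against zero intensities.
import numpy as np

def log_ratio(red, green, pseudo=1.0):
    """red, green: arrays of fluorescence intensities, one value per spot."""
    return np.log2((red + pseudo) / (green + pseudo))

# positive values: more sample-1 (red) mRNA; negative: more sample-2 (green) mRNA
print(log_ratio(np.array([200.0, 50.0]), np.array([100.0, 100.0])))
```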


Sources of Variations & Experimental Errors

Variations in cells/individuals.

Variations in mRNA extraction, isolation, introduction of dye, variation in dye incorporation, dye interference.

Variations in probe concentration, probe amounts, substrate surface characteristics.

Variations in hybridization conditions and kinetics.

Variations in optical measurements, spot misalignments, discretization effects, noise due to scanner lens and laser irregularities.

Cross-hybridization of sequences with high sequence identity.

Precision of results is limited to about a factor of 2.

Variation changes with intensity: larger variation at low or high expression levels.

The data therefore needs to be normalized (a minimal sketch follows).
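As a rough illustration of what normalization can mean in practice (not taken from the slides), here is a deliberately simple per-array median-centering of log ratios; real pipelines typically use intensity-dependent methods (e.g., loess) precisely because of the intensity-dependent variation noted above.

```python
# Median-center each array's log2 ratios so systematic dye/array offsets
# are removed; a simple stand-in for full normalization.
import numpy as np

def median_normalize(log_ratios):
    """log_ratios: spots x arrays matrix; subtract each array's median."""
    return log_ratios - np.median(log_ratios, axis=0, keepdims=True)

m = np.random.normal(loc=0.3, scale=1.0, size=(1000, 4))   # hypothetical log ratios
print(np.median(median_normalize(m), axis=0))              # ~0 for every array
```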


Clustering

Clustering is a general method to study patterns in gene expressions.

Several known methods:

Hierarchical Clustering (Bottom-Up Approach)

K-means Clustering (Top-Down Approach)

Self-Organizing Maps (SOM)


Hierarchical Clustering: Example

[Figure: three panels (axes 0–10 by 0–10) showing points in the plane being merged step by step into clusters]

A Dendrogram


Hierarchical Clustering [Johnson, SC, 1967]

Given n points in R^d, compute the distance between every pair of points.

While (not done):
  Pick the closest pair of points s_i and s_j and make them part of the same cluster.
  Replace the pair by their average, s_ij.

Try the applet at: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html

A minimal sketch of this merge procedure appears below.
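The following is a minimal Python/NumPy sketch of the merge-closest-pair procedure above (points replaced by the average of the merged pair); the function name and stopping rule (merge down to a requested number of clusters) are my own choices.

```python
# Greedy bottom-up clustering: repeatedly merge the closest pair of
# cluster representatives and replace them by their average.
import numpy as np

def hierarchical_cluster(points, n_clusters):
    clusters = [[i] for i in range(len(points))]                 # original point indices
    centers = [points[i].astype(float) for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(centers)):                            # find the closest pair
            for j in range(i + 1, len(centers)):
                d = np.linalg.norm(centers[i] - centers[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))              # record for a dendrogram
        centers[i] = (centers[i] + centers[j]) / 2.0              # replace pair by average
        clusters[i] = clusters[i] + clusters[j]
        del centers[j], clusters[j]
    return clusters, merges

pts = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [9, 9]])
print(hierarchical_cluster(pts, 2)[0])
```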


Distance Metrics

For clustering, define a distance function:

Euclidean (more generally, $L_k$) distance metrics:
$D_k(X, Y) = \left( \sum_{i=1}^{d} |X_i - Y_i|^k \right)^{1/k}$, where $k = 2$ gives the Euclidean distance.

Pearson correlation coefficient:
$\rho_{xy} = \frac{1}{d} \sum_{i=1}^{d} \frac{X_i - \bar{X}}{\sigma_x} \cdot \frac{Y_i - \bar{Y}}{\sigma_y}$, with $-1 \le \rho_{xy} \le 1$.
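A small sketch of the two measures for a pair of expression vectors (NumPy); note that Pearson correlation is a similarity, so clustering code often uses $1 - \rho_{xy}$ as the distance.

```python
# Euclidean distance and Pearson correlation between two expression vectors.
import numpy as np

def euclidean(x, y):
    return np.linalg.norm(x - y)           # (sum (x_i - y_i)^2)^(1/2)

def pearson(x, y):
    xc, yc = x - x.mean(), y - y.mean()
    return np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
print(euclidean(x, y), pearson(x, y))       # far apart in space, yet perfectly correlated
```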


Clustering of gene expressions

Represent each gene as a vector or a point in d-space, where d is the number of arrays or experiments being analyzed.


Clustering: Random vs. Biological Data

From Eisen MB et al., PNAS 1998, 95(25):14863–8.


K-Means Clustering: Example

Example from Andrew Moore’s tutorial on Clustering.

[Figure: successive K-means iterations, from the Start configuration to the End (converged) clustering]

K-Means Clustering [MacQueen, 1967]

Start with randomly chosen cluster centers.
Repeat
  Assign points to clusters so as to give the greatest increase in score.
  Recompute the cluster centers.
  Reassign points.
until (no changes).

Try the applet at: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html

A minimal sketch follows.
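Below is a minimal NumPy sketch of the loop above; as an assumption, "score" is taken to be the usual within-cluster sum of squared distances, so assignment is simply nearest-center.

```python
# Minimal K-means: assign points to the nearest center, recompute centers,
# stop when the centers no longer change.
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                      # nearest-center assignment
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)                              # keep old center if cluster empty
        ])
        if np.allclose(new_centers, centers):              # no changes: stop
            break
        centers = new_centers
    return labels, centers

pts = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(pts, k=2)
print(centers)
```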


Comparisons

Hierarchical clustering:
  Number of clusters not preset.
  Produces a complete hierarchy of clusters.
  Not very robust, not very efficient.

K-Means:
  Needs a definition of a mean. What about categorical data?
  More efficient, and often finds the optimum clustering.


Functionally related genes behave similarly across experiments.


Self-Organizing Maps [Kohonen]

A kind of neural network.

Clusters data and finds complex relationships between clusters.

Helps reduce the dimensionality of the data.

Produces a map of 1 or 2 dimensions.

Unsupervised clustering.

Like K-Means, except that it also supports visualization.


SOM Architectures

2-D Grid

3-D Grid

Hexagonal Grid


SOM Algorithm

Select the SOM architecture, and initialize weight vectors and other parameters.

While (stopping condition not satisfied), do for each input point x:
  The winning node q has the weight vector closest to x.
  Update the weight vector of q and its neighbors.
  Reduce the neighborhood size and learning rate.


SOM Algorithm Details

Distance between x and weight vector $w_i$:  $\lVert x - w_i \rVert$

Winning node:  $q(x) = \arg\min_i \lVert x - w_i \rVert$

Weight update function (for neighbors):  $w_i(k+1) = w_i(k) + \mu(k, x, i)\,[x(k) - w_i(k)]$

Learning rate (with $r_i$ the grid position of node i):  $\mu(k, x, i) = \mu_0(k)\,\exp\!\left(-\dfrac{\lVert r_i - r_{q(x)} \rVert^2}{2\sigma^2}\right)$
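The following is a small NumPy sketch of these update rules on a rectangular grid; the schedules for shrinking the learning rate and neighborhood width are illustrative assumptions, not from the slides.

```python
# Minimal SOM: Gaussian neighborhood around the winning node, with the
# learning rate and neighborhood width reduced every epoch.
import numpy as np

def train_som(data, grid_shape=(6, 6), n_epochs=20, mu0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    positions = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = rng.normal(size=(rows * cols, data.shape[1]))    # one w_i per grid node
    for epoch in range(n_epochs):
        mu = mu0 * (1.0 - epoch / n_epochs)                    # shrink learning rate
        sigma = sigma0 * (1.0 - epoch / n_epochs) + 1e-3       # shrink neighborhood
        for x in rng.permutation(data):
            q = np.argmin(np.linalg.norm(weights - x, axis=1)) # winning node
            grid_dist2 = np.sum((positions - positions[q]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))       # Gaussian neighborhood
            weights += (mu * h)[:, None] * (x - weights)       # w_i += mu*h_i*(x - w_i)
    return positions, weights

data = np.random.randn(200, 4)          # e.g., 200 genes measured in 4 experiments
positions, weights = train_som(data)
print(weights.shape)                    # one prototype vector per grid node
```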


World Bank Statistics

Data: World Bank statistics of countries in 1992.

39 indicators considered, e.g., health, nutrition, educational services, etc.

The complex joint effect of these factors can be visualized by organizing the countries using the self-organizing map.


World Poverty PCA


World Poverty SOM


World Poverty Map


Viewing SOM Clusters on PCA axes
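As a rough sketch of how such a view can be produced (this is not from the slides): project the trained SOM prototype vectors onto the first two principal components.

```python
# Project SOM prototype vectors onto the first two PCA axes for plotting;
# assumes a weights matrix such as the one from the train_som() sketch above.
import numpy as np

def pca_project(vectors, n_components=2):
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)   # principal axes
    return centered @ vt[:n_components].T

# usage: coords = pca_project(weights); plot coords[:, 0] against coords[:, 1]
```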


SOM Example [Xiao-rui He]


Neural Networks

[Figure: a single unit: input X and synaptic weights W, plus a bias, feed the activation function ƒ(•), producing the output y]
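Read as an equation, the diagram computes y = ƒ(W·X + bias); here is a tiny sketch with a sigmoid as the (assumed) activation function.

```python
# Single-unit forward pass: weighted sum of inputs plus bias, then activation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, bias, f=sigmoid):
    return f(np.dot(w, x) + bias)

print(neuron_output(np.array([1.0, 0.5]), np.array([0.2, -0.4]), 0.1))
```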


Learning NN

[Figure: adaptive algorithm: the input X (with a constant 1 for the bias) and weights W produce an output that is compared with the desired response; the resulting error drives the weight updates]
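A minimal error-driven update in the spirit of this diagram (an LMS/delta-rule style step; the linear unit and learning rate are my assumptions):

```python
# One adaptive step: predict, compare with the desired response, and
# nudge the weights in the direction that reduces the error.
import numpy as np

def lms_step(w, x, desired, lr=0.1):
    output = np.dot(w, x)            # linear unit for simplicity
    error = desired - output         # desired response minus actual output
    return w + lr * error * x        # weight update driven by the error

w = np.zeros(3)
for x, d in [(np.array([1.0, 0.0, 1.0]), 1.0), (np.array([1.0, 1.0, 0.0]), 0.0)]:
    w = lms_step(w, x, d)
print(w)
```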


Types of NNs

Recurrent NN

Feed-forward NN

Layered

Other issues:
  Hidden layers possible.
  Different activation functions possible.


Application: Secondary Structure Prediction
