Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo

Molecular Biology:

from sequence analysis to signal processing

Junior Barrera

University of Sao Paulo

Layout

• Introduction

• Knowledge evolution in Genetics

• Data acquisition

• Data Analysis

• A system for genetic data analysis

• Applications

Introduction

• Some medical signals are: EEG, ECG, ultrasound, tomography, etc.

• These signals are a great source of information about the human body

• To fully explore these data, Digital Signal Processing (DSP) techniques are necessary

• DSP: Algebra + Statistics + Computation

Introduction

• Techniques for identifying the genetic code are well known

• Soon the code of all genes will be known

• This knowledge opens the way to one of the greatest challenges of science: understanding the function of genes

Introduction

• Sets of genes constitute dynamical systems that control sequences of biochemical reactions, called pathways

• The pathways in a cell define its activities

• The states of large sets of genes can be observed with microarray technology

• Gene states observed over time are digital signals

Introduction

• Gene states may describe properties of tissues. Pattern recognition techniques make it possible to predict tissue properties. Example: cancer classification

• System identification techniques make it possible to estimate network architectures and dynamics. Example: control of cell division

Knowledge evolution in genetics

• Heredity - Mendel (1866)

• The phenotype of an individual depends on the genes of its parents.

Knowledge evolution in genetics

• Chromosome Theory - Morgan (1910)

• Genes are located on chromosomes

Introduction

• The molecular structure of chromosomes (Watson and Crick - 1953)

• DNA structure: the double helix

• Four bases: adenine (A), guanine (G), thymine (T), cytosine (C)

• Genes are sequences of nucleotides

Introduction

Introduction

• DNA manipulation

• cut, replication and decoding

Introduction

• Genetic engineering

• species modification, drug production

Introduction

• Genes control the metabolism

• Metabolism occurs by sequences of enzyme-catalyzed reactions.

• Enzymes are specified by one or more genes

Introduction

Introduction

• Gene expression

Data acquisition

Data acquisition

Data acquisition

Quantization - {-1,0,1}
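A minimal sketch of this quantization, assuming the input is a log2 expression ratio and that a symmetric threshold separates down-regulated, unchanged and up-regulated genes. The slide does not state the rule, so the function name `quantize` and the threshold of 1.0 (a two-fold change) are illustrative assumptions.

```python
def quantize(log_ratio, threshold=1.0):
    """Map a log2 expression ratio to -1 (down), 0 (unchanged) or 1 (up).

    The threshold is an assumption made for illustration only.
    """
    if log_ratio >= threshold:
        return 1
    if log_ratio <= -threshold:
        return -1
    return 0


# A short, made-up time course of log2 ratios for one gene.
signal = [0.2, 1.4, 2.1, -0.3, -1.7]
states = [quantize(v) for v in signal]
print(states)
```

Applied to a whole time course, the result is a ternary digital signal of the kind the later slides analyze.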

Data Analysis

• Data classes definition

• Relational search

• Data transformation

• Mining

• Integration of information

• Interpretations

Data classes definition

[Diagram: levels of resolution: molecule, cell, tissue, organism, population. Data classes shown: protein structure and dynamics; DNA, protein, gene expression, gene networks]

Relational Search

• Get a subset of the available data

• Define relations between categories of data

• Select by logical operations on relations
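The three steps above can be sketched with plain Python records. The two data categories, their field names and the helper `select` are hypothetical and chosen only for illustration; a real system would issue the equivalent query against its database.

```python
# Hypothetical records for two categories of data.
samples = [
    {"id": "s1", "tissue": "breast", "tumor": True},
    {"id": "s2", "tissue": "breast", "tumor": False},
    {"id": "s3", "tissue": "colon", "tumor": True},
]
expression = [
    {"sample": "s1", "gene": "p53", "state": -1},
    {"sample": "s2", "gene": "p53", "state": 0},
    {"sample": "s3", "gene": "p53", "state": -1},
    {"sample": "s3", "gene": "p21", "state": 1},
]


def select(samples, expression, sample_pred, expr_pred):
    """Relate the two categories on the sample id, then filter by predicates."""
    by_id = {s["id"]: s for s in samples}
    return [
        (by_id[e["sample"]], e)
        for e in expression
        if e["sample"] in by_id
        and sample_pred(by_id[e["sample"]])
        and expr_pred(e)
    ]


# Logical selection: breast tumors in which p53 is in state -1.
hits = select(
    samples,
    expression,
    lambda s: s["tumor"] and s["tissue"] == "breast",
    lambda e: e["gene"] == "p53" and e["state"] == -1,
)
print([s["id"] for s, _ in hits])
```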

Data Transformations

• Image analysis

• Measures on DNA sequences

• Measures on protein sequences

• ...
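As an illustration of a measure on a sequence, the sketch below computes GC content and k-mer counts. The slides do not name specific measures, so both are assumed examples, and the function names are hypothetical.

```python
from collections import Counter


def gc_content(dna):
    """Fraction of G and C bases in a DNA string (case-insensitive)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def kmer_counts(seq, k):
    """Count every overlapping substring of length k (DNA or protein)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


print(gc_content("ATGCGC"))
print(kmer_counts("ATATG", 2))
```

Either measure turns a sequence into numbers, which is what places it in an attribute space for mining.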

Mining

Classifier

Examples: DNA assembly, protein and DNA homology, DNA phylogeny, gene characterization of tissue, time pattern similarity

Mining

• Attribute Space

[Figure: a point x = (x1, x2) in a plane with axes x1 and x2]

Mining

• Clustering

[Figure: two scatter plots with axes x1 and x2]

Mining

• Classification

[Figure: two scatter plots with axes x1 and x2]

Mining

• Attribute Space Dimension

[Figure: axes x1 and x2]

Mining

Dynamical System

Examples: gene networks, protein structure, cells, organisms, populations, drug reaction

[Figure: trajectories in the state space (x1 = y1, x2 = y2) over time t, leading from the initial states u and u' to final states]
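A toy sketch of such a dynamical system, assuming ternary gene states in {-1, 0, 1} and synchronous updates. The two-gene wiring in `step` is invented for illustration and does not come from the slides.

```python
def clip(v):
    """Clamp an integer to the ternary range {-1, 0, 1}."""
    return max(-1, min(1, v))


def step(state):
    """One synchronous update of a hypothetical two-gene network.

    x1 is excited by itself and by x2; x2 copies the previous x1.
    """
    x1, x2 = state
    return (clip(x1 + x2), clip(x1))


def run(initial, max_steps=20):
    """Iterate until a state repeats; return the visited trajectory."""
    trajectory = [initial]
    seen = {initial}
    for _ in range(max_steps):
        nxt = step(trajectory[-1])
        trajectory.append(nxt)
        if nxt in seen:
            break
        seen.add(nxt)
    return trajectory


# Two different initial states, each settling into its own final state.
print(run((1, 0)))
print(run((-1, 1)))
```

System identification, as used later in the deck, works in the opposite direction: given observed trajectories, estimate the update rule.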

Mining

Integration of Information

• data classes at different resolutions

• transformed data

• selected data

• mined data

Interpretation

• Integrated information

• Known concepts

• Propose hypotheses

• Confirm or refute hypotheses

A system for genetic data analysis

• Database

• Analytical procedures

• Data mining

• High performance computing

System

[Diagram: procedures P1, P2, ..., Pn on top of an object-oriented database]

Pi: analytical and mining procedures (parallel kernel)

System Architecture

[Diagram components: databases DB/2, Sybase, Oracle, Informix; slices Slice 1 ... Slice N (N-2 slices in between); applications Appl. 1, Appl. 2, Appl. 3, ..., Appl. n; client languages Java, PB, Delphi, MATLAB; query rules and libraries]

Data Warehouse

[Diagram components, in layers numbered 1 to 4: users; analysis tools (writer, spreadsheet, data mining); MOLAP/ROLAP server; M_database; metadata and data mart; integrator; wrappers; sources (IMS, RDMS, others)]

Applications

• Cancer tissue characterization

• Cell cycle simulation

• Inference from clustering

• Gene regulation

Cancer tissue characterization

• Problem: from a small set (20) of microarrays, find the minimum number of genes sufficient to separate cancers A and B.

[Figure: misclassification error versus number of variables d, showing E[εn(d)], εd, the estimate ε̂n(d), and its +bound and -bound]

• Approach: randomize the data, compute a classifier for each gene subset, measure the error for different dispersions, and choose the subset that balances small error and high dispersion. A supercomputer is required.
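A rough sketch of the approach, under stated assumptions: "randomizing" is taken to mean adding Gaussian noise of standard deviation `sigma` to each sample, the classifier is a nearest-centroid rule (linear for two classes), and the search is exhaustive over gene subsets of a fixed size. None of these choices is confirmed by the slides, and the data below are invented.

```python
import itertools
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dispersed_error(data, labels, genes, sigma, trials=200, seed=0):
    """Error of a nearest-centroid rule on noisy copies of the samples."""
    rng = random.Random(seed)
    proj = [[row[g] for g in genes] for row in data]
    cents = {
        c: centroid([p for p, lab in zip(proj, labels) if lab == c])
        for c in sorted(set(labels))
    }
    wrong = 0
    for _ in range(trials):
        for p, lab in zip(proj, labels):
            noisy = [v + rng.gauss(0.0, sigma) for v in p]
            pred = min(cents, key=lambda c: dist2(noisy, cents[c]))
            wrong += pred != lab
    return wrong / (trials * len(proj))


def best_subset(data, labels, size, sigma):
    """Gene subset of the given size with the smallest dispersed error."""
    n_genes = len(data[0])
    return min(
        itertools.combinations(range(n_genes), size),
        key=lambda genes: dispersed_error(data, labels, genes, sigma),
    )


# Toy data: 4 samples x 3 genes; only gene 1 separates class A from class B.
data = [[0.1, -2.0, 0.3], [0.0, -1.8, -0.2], [0.2, 2.1, 0.1], [-0.1, 1.9, 0.0]]
labels = ["A", "A", "B", "B"]
print(best_subset(data, labels, size=1, sigma=0.5))
```

The number of subsets grows combinatorially with the number of genes, which is consistent with the slide's remark about needing a supercomputer.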

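• Linear classifier

• Dispersion centered on the sample

• Flat round dispersion model

• Error computed analytically (faster)

[Figure: sample xk with a round dispersion of radius R at distance h from the separating hyperplane; the hyperplane section Vn-1 has radius sqrt(R^2 - h^2); axes Xj [R1] and Xi [Rn-1]; misclassification error εn(h,R); label rk]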
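One possible reading of the flat round dispersion model is a uniform ball of radius R around each sample, with the sample's error contribution equal to the fraction of the ball lying on the wrong side of the hyperplane at distance h. The slides do not give the formula, so the sketch below is an assumption; it evaluates that fraction by numerical integration rather than in closed form, and `cap_fraction` is a hypothetical name.

```python
def cap_fraction(h, R, n, steps=10000):
    """Fraction of an n-ball of radius R lying beyond a hyperplane at distance h.

    The slice of the ball at offset t from its centre is an (n-1)-ball of
    radius sqrt(R^2 - t^2), so its volume is proportional to
    (R^2 - t^2) ** ((n - 1) / 2). Integrals use the midpoint rule.
    """
    if h >= R:
        return 0.0
    if h <= -R:
        return 1.0

    def integral(a, b):
        width = (b - a) / steps
        total = 0.0
        for i in range(steps):
            t = a + (i + 0.5) * width
            total += (R * R - t * t) ** ((n - 1) / 2)
        return total * width

    whole = 2.0 * integral(0.0, R)  # the ball is symmetric about its centre
    if h >= 0:
        return integral(h, R) / whole
    return 1.0 - integral(-h, R) / whole


# A sample half a radius away from the hyperplane, in 3 dimensions.
print(cap_fraction(0.5, 1.0, 3))
```

A sample lying on the hyperplane (h = 0) contributes an error of 0.5, and a sample at distance R or more on the correct side contributes none.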

• Robustness analysis

[Figure labels: rj, rk]

[Figure: error curves under various dispersion levels σ. Misclassification error ε (0 to 0.12) versus number of variables d (2 to 20), with one curve for each of σ = 0.20, 0.30, 0.40, 0.50, 0.60, 0.70]

[Figure: linear classifier (dispersed Gaussian) with σ = 0.600. Scatter plot of "phosphofructokinase, platelet" (horizontal axis) against "v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 2" (vertical axis); classes: BRCA1 and BRCA2/sporadic; marked point: the tumor sample from Patient 20; legend: 1-th (0.389593), 7-th (0.416607)]

• Cell cycle simulation

[Diagram: blocks External Signal, Receptors, Ingredients Control, p53 and Division Steps; edge types: Excitation, Excitation or Inhibition, Inhibition]

Inference from clustering

• Examine the precision of sample-based clustering relative to population inference

• Compare the number of replicates of microarray experiments

• Compare the various clustering methods

[Figure: different views of microarray data. Panels: time course data (log2(ratio) versus time, clusters C1, C2, C3); KL plot (clusters C1, C2, C3); multidimensional space; dendrogram. Annotations: original clusters; clustered by dendrogram; mis-classified]

Time Course Model

Class 1

[low variance]

measurement

Class 2

[high variance]

measurement

Clustering algorithms

• K-means

• Fuzzy c-means

• Self Organizing Map

• Hierarchical (dendrogram)
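As a reference point for the first method in the list, here is a minimal k-means (Lloyd's algorithm) in plain Python. It is a generic textbook sketch, not the implementation used for the results in these slides; the initial centres are passed in explicitly so the run is deterministic, and the points are invented.

```python
def kmeans(points, centers, iters=50):
    """Plain k-means on equal-length tuples, starting from given centres."""
    centers = list(centers)
    k = len(centers)
    assign = None
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        new_assign = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        if new_assign == assign:
            break  # assignments are stable, so the centres are too
        assign = new_assign
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign, centers


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
# Deliberately poor start: both initial centres lie in the same group.
assign, centers = kmeans(points, [points[0], points[1]])
print(assign)
```

Fuzzy c-means replaces the hard assignment step with graded memberships, while self-organizing maps and hierarchical clustering follow different schemes altogether.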

Clustering errors

• How do clustering errors change as the number of replicates increases?

• How differently does each clustering algorithm perform?

K-means

[Figure: clustering errors for k-means. Number of misclassifications (0 to 100) versus number of replicates (0 to 50); legend: 0.25, 0.5, 1.0, 2.0, 3.0]

Fuzzy c-means

[Figure: clustering errors for fuzzy c-means. Number of misclassifications (0 to 100) versus number of replicates (0 to 50); legend: 0.25, 0.5, 1.0, 2.0, 3.0]

Self Organizing Maps

[Figure: clustering errors for SOMs. Number of misclassifications (0 to 100) versus number of replicates (0 to 50); legend: 0.25, 0.5, 1.0, 2.0, 3.0]

Variances of Data x Replicates

• The number of replicates required to get a reasonable clustering result varies, depending on the variance of gene expression levels

• The clustering algorithm must also be chosen accordingly to get the best clustering result. There is no universal clustering algorithm!

No replicate, variance = 0.25

[Figure: results from fuzzy c-means, with misclassifications marked]

Tighter clusters due to small variance

No replicate, variance = 1.0

Many misclassifications

Clusters start mixing

No replicate, variance = 2.0

Lots of misclassifications

Looser clusters due to large variance

5 replicates, variance = 0.25

No misclassifications

Much tighter clusters due to the replications

5 replicates, variance = 1.0

Very few misclassifications

Clusters well separated due to the replications and relatively small variance

5 replicates, variance = 2.0

Very few misclassifications

Clusters tightened due to the replications

Real Data Application

• Initial clustering to generate templates

– means

– variance

• Simulate time course data based on the templates generated by initial clustering

– different number of replicates

• Apply various clustering methods

– expected clustering error for each method
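A simplified sketch of this simulation loop. It assumes Gaussian noise around each template's mean time course and averages the replicates before grouping; nearest-template assignment stands in for a real clustering algorithm, which keeps the example short but is not what the slides evaluate. The templates below are invented.

```python
import random


def simulate_error(templates, variance, replicates, genes_per_class=50, seed=0):
    """Fraction of simulated genes assigned to the wrong template.

    Each gene is a template plus Gaussian noise; `replicates` noisy
    measurements are averaged before assignment to the nearest template.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    wrong = total = 0
    for true_class, mean in enumerate(templates):
        for _ in range(genes_per_class):
            reps = [
                [m + rng.gauss(0.0, sigma) for m in mean]
                for _ in range(replicates)
            ]
            avg = [sum(col) / replicates for col in zip(*reps)]
            pred = min(
                range(len(templates)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(avg, templates[c])),
            )
            wrong += pred != true_class
            total += 1
    return wrong / total


# Three made-up template time courses (log2 ratios at 5 time points).
templates = [(-1, 0, 1, 0, -1), (1, 0, -1, 0, 1), (0, 1, 0, -1, 0)]
for n in (1, 5):
    print(n, simulate_error(templates, variance=2.0, replicates=n))
```

Averaging n replicates divides the noise variance by n, which is why the simulated clusters tighten as replicates are added.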

5 templates by HC

[Figure: five template panels, each with a vertical axis from -4 to 4; panel labels: 28, 12, 208, 194, 592]

Clustering errors on HC

[Figure: HC-EU-CL, 5 templates (non-exclusive). Error (%) (0 to 25) versus repetition N (0 to 20) for FCM, K-means, SOM, HC-CC-CL and HC-EU-CL]

5 templates by FCM

[Figure: five template panels, each with a vertical axis from -2 to 2; panel labels: 130, 166, 134, 330, 274]

Clustering errors on FCM

[Figure: FCM, 5 templates (non-exclusive). Error (%) (0 to 35) versus repetition N (0 to 20) for FCM, K-means, SOM, HC-CC-CL and HC-EU-CL]

Gene Regulation

• Problem: find the architecture of a gene regulation network from microarray data.

• Approach: choose small subsets of genes (2, 3 or 4), design a classifier for each, compute its empirical error, and choose the minimum-error classifier. A supercomputer is required.

Gene Regulation

[Diagram: a target gene linked to Predictor 1, Predictor 2 and Predictor 3, with the links labeled NMSE 1, NMSE 2 and NMSE 3]

Gene Regulation

x1   x2   p(-1,x1,x2)   p(0,x1,x2)   p(1,x1,x2)   p(x1,x2)   y    Error
-1   -1   0.05          0.1          0.05         0.2        0    0.1
-1   0    0.03          0.03         0.04         0.1        1    0.06
-1   1    0.02          0.01         0.07         0.1        1    0.03
0    -1   0.01          0.01         0.03         0.05       1    0.02
0    0    0.03          0.01         0.01         0.05       -1   0.02
0    1    0.07          0.1          0.03         0.2        0    0.1
1    -1   0.04          0.06         0.1          0.2        1    0.1
1    0    0.03          0.01         0.01         0.05       -1   0.02
1    1    0.02          0.02         0.01         0.05       -1   0.03

Total error: 0.48
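The y and Error columns of the table follow from the joint probabilities: for each pair (x1, x2), the predicted y is the most probable target state, and the error is the remaining probability mass. The sketch below recomputes both from the table's values; the variable and function names are illustrative, and ties are resolved toward the lowest state, which matches the last row of the table.

```python
# Joint probabilities from the table: keys are (x1, x2), values are
# (p(y=-1, x1, x2), p(y=0, x1, x2), p(y=1, x1, x2)).
joint = {
    (-1, -1): (0.05, 0.10, 0.05),
    (-1, 0): (0.03, 0.03, 0.04),
    (-1, 1): (0.02, 0.01, 0.07),
    (0, -1): (0.01, 0.01, 0.03),
    (0, 0): (0.03, 0.01, 0.01),
    (0, 1): (0.07, 0.10, 0.03),
    (1, -1): (0.04, 0.06, 0.10),
    (1, 0): (0.03, 0.01, 0.01),
    (1, 1): (0.02, 0.02, 0.01),
}
STATES = (-1, 0, 1)


def optimal_predictor(joint):
    """Most probable target state per input, plus the total error."""
    predictor, total_error = {}, 0.0
    for x, probs in joint.items():
        best = max(range(3), key=lambda i: probs[i])  # first maximum wins ties
        predictor[x] = STATES[best]
        total_error += sum(probs) - probs[best]
    return predictor, total_error


predictor, error = optimal_predictor(joint)
print(predictor)
print(round(error, 2))
```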

Gene regulation

Cell line Condition RCH1 BCL3 FRA1 REL-B ATF3 IAP-1 PC-1 MBP-1 SSAT MDM2 p21 p53 AHA OHO IR MMS UV

ML-1 IR -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0

ML-1 MMS 0 0 0 0 1 0 0 0 0 1 1 1 1 0 0 1 0

Molt4 IR -1 0 0 1 1 0 1 0 0 1 1 1 1 1 1 0 0

Molt4 MMS 0 0 1 0 1 0 0 0 0 0 1 1 1 0 0 1 0

SR IR -1 0 0 1 1 1 1 1 0 1 1 1 1 1 1 0 0

SR MMS 0 0 0 0 1 0 0 0 0 1 1 1 1 0 0 1 0

A549 IR 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 0

A549 MMS 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 1 0

A549 UV 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 1

MCF7 IR -1 0 1 1 0 0 0 0 0 1 1 1 0 1 1 0 0

MCF7 MMS 0 0 1 0 1 0 0 0 0 1 1 1 1 0 0 1 0

MCF7 UV 0 0 1 1 1 0 0 0 0 1 1 1 1 0 0 0 1

RKO IR 0 1 0 1 1 1 1 0 0 1 1 1 1 0 1 0 0

RKO MMS 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 1 0

RKO UV 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 1

CCRF-CEM IR -1 1 1 1 1 0 1 0 0 0 0 -1 -1 0 1 0 0

CCRF-CEM MMS 0 0 0 0 1 0 0 0 0 0 0 -1 0 0 0 1 0

HL60 IR -1 1 0 1 1 0 1 0 1 0 1 -1 -1 -1 1 0 0

HL60 MMS 0 0 1 0 1 0 0 0 0 1 1 -1 0 1 0 1 0

K562 IR 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 1 0 0

K562 MMS 0 0 0 0 1 0 0 0 0 0 0 -1 0 0 0 1 0

H1299 IR 0 0 0 1 0 0 1 0 0 0 0 -1 0 0 1 0 0

H1299 MMS 0 0 0 0 1 0 0 0 0 0 1 -1 0 1 0 1 0

H1299 UV 0 0 0 0 1 0 1 0 0 0 1 -1 0 1 0 0 1

RKO/E6 IR -1 1 0 1 0 1 1 0 0 0 0 -1 -1 0 1 0 0

RKO/E6 MMS -1 0 0 0 1 0 0 0 0 0 1 -1 -1 1 0 1 0

RKO/E6 UV -1 0 0 0 1 0 0 0 0 0 1 -1 -1 1 0 0 1

T47D IR 0 0 0 1 0 0 0 0 0 0 1 -1 0 -1 1 0 0

T47D MMS 0 0 0 0 1 0 0 0 0 0 1 -1 0 1 0 1 0

T47D UV 0 0 0 0 1 0 0 0 0 0 1 -1 0 1 0 0 1

Rows are cell lines subjected to different experimental conditions.

Comparisons are to the same cell line not exposed to the experimental treatment.

The first 14 value columns are genes; the last three (IR, MMS, UV) indicate the condition.

Gene regulation

• Split the data into two parts: 2/3 and 1/3

• 2/3: training the predictor

• 1/3: empirical error measure

• Create all predictors with fewer than 4 genes and measure their empirical error

Gene regulation

• Repeat for 256 random splits and take the mean empirical error

• Choose the predictors with error less than 75%
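The procedure can be sketched as follows, assuming each experiment is a dictionary of ternary gene states and each predictor is a majority-vote lookup table over its predictor genes. The predictor form, the default prediction for unseen inputs and the toy data are assumptions; the 2/3 to 1/3 split, the limit of fewer than 4 genes and the 256 repetitions come from the slides.

```python
import itertools
import random
from collections import Counter, defaultdict


def train_lookup(rows, target, genes):
    """Majority-vote lookup table: predictor states -> target state."""
    votes = defaultdict(Counter)
    for row in rows:
        votes[tuple(row[g] for g in genes)][row[target]] += 1
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}


def empirical_error(table, rows, target, genes, default=0):
    """Fraction of rows whose target state the table predicts wrongly."""
    wrong = sum(
        table.get(tuple(row[g] for g in genes), default) != row[target]
        for row in rows
    )
    return wrong / len(rows)


def mean_errors(rows, target, candidates, max_genes=3, splits=256, seed=0):
    """Mean test error of every predictor set with at most max_genes genes."""
    rng = random.Random(seed)
    subsets = [
        s
        for k in range(1, max_genes + 1)
        for s in itertools.combinations(candidates, k)
    ]
    totals = dict.fromkeys(subsets, 0.0)
    cut = (2 * len(rows)) // 3  # 2/3 for training, 1/3 for testing
    for _ in range(splits):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:cut], shuffled[cut:]
        for s in subsets:
            table = train_lookup(train, target, s)
            totals[s] += empirical_error(table, test, target, s)
    return {s: t / splits for s, t in totals.items()}


# Toy data: the target y copies gene a; gene b is unrelated.
rows = [{"a": a, "b": b, "y": a} for a in (-1, 0, 1) for b in (-1, 0, 1)] * 2
errs = mean_errors(rows, "y", ["a", "b"])
print(min(errs, key=errs.get))
```

With G candidate genes the number of predictor sets grows roughly as G cubed, and each one is retrained 256 times, which is where the computational cost comes from.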

Gene Regulation

[Diagram: three predictor graphs. AHA with p53, RCH1 and PC1 (values 0.476, 0.203, 0.153); OHO with RCH1, BCL3 and MDM2 (values 0.993, 0.743, 0.723); p53 with p21 and MDM2 (values 0.805, 0.612, 0.810)]

Gene Regulation

• Some well-known paths of the graph were verified

• Several unknown ones were suggested

• The possible new paths should be tested by specific biochemical experiments