
Boosting and Predictive modeling


Page 1: Boosting and Predictive modeling


Boosting and predictive modeling

Yoav Freund, Columbia University

Page 2: Boosting and Predictive modeling


What is “data mining”?

Lots of data - complex models
• Classifying customers using transaction logs.
• Classifying events in high-energy physics experiments.
• Object detection in computer vision.
• Predicting gene regulatory networks.
• Predicting stock prices and portfolio management.

Page 3: Boosting and Predictive modeling


Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science, 2001

• The data modeling culture (Generative modeling)
  – Assume a stochastic model (5-50 parameters).
  – Estimate model parameters.
  – Interpret model and make predictions.
  – Estimated population: 98% of statisticians

• The algorithmic modeling culture (Predictive modeling)
  – Assume the relationship between predictor variables and response variables has a functional form (10^2 - 10^6 parameters).
  – Search (efficiently) for the best prediction function.
  – Make predictions.
  – Interpretation / causation - mostly an afterthought.
  – Estimated population: 2% of statisticians (many in other fields).

Page 4: Boosting and Predictive modeling


Toy Example

• Computer receives telephone call
• Measures pitch of voice
• Decides gender of caller

[Diagram: human voice goes into the classifier, which outputs Male or Female.]

Page 5: Boosting and Predictive modeling


Generative modeling

[Figure: probability density over voice pitch; two Gaussians fitted to the classes, with parameters (mean1, var1) and (mean2, var2).]

Page 6: Boosting and Predictive modeling


Discriminative approach

[Figure: number of mistakes as a function of the voice-pitch threshold.]

Page 7: Boosting and Predictive modeling


Ill-behaved data

[Figure: for ill-behaved data, the fitted probability densities (mean1, mean2) over voice pitch are compared with the number-of-mistakes curve; the generative fit suggests a poor threshold.]

Page 8: Boosting and Predictive modeling


Plan of talk
• Boosting
• Alternating Decision Trees
• Data-mining AT&T transaction logs.
• The I/O bottleneck in data-mining.
• Resistance of boosting to over-fitting.
• Confidence rated prediction.
• Confidence-rating for object recognition.
• Gene regulation modeling.
• Summary

Page 9: Boosting and Predictive modeling


Plan of talk
• Boosting: Combining weak classifiers.
• Alternating Decision Trees
• Data-mining AT&T transaction logs.
• The I/O bottleneck in data-mining.
• Resistance of boosting to over-fitting.
• Confidence rated prediction.
• Confidence-rating for object recognition.
• Gene regulation modeling.
• Summary

Page 10: Boosting and Predictive modeling


batch learning for binary classification

Data distribution: $(x,y) \sim D$; $y \in \{-1,+1\}$

Generalization error: $\varepsilon(h) \doteq P_{(x,y)\sim D}\left[h(x) \neq y\right]$

Training set: $T = (x_1,y_1), (x_2,y_2), \ldots, (x_m,y_m)$; $T \sim D^m$

Training error: $\hat\varepsilon(h) \doteq \frac{1}{m} \sum_{(x,y)\in T} 1\left[h(x) \neq y\right] \doteq P_{(x,y)\sim T}\left[h(x) \neq y\right]$

GOAL: find $h : X \to \{-1,+1\}$ that minimizes $\varepsilon(h)$.

Page 11: Boosting and Predictive modeling


A weighted training set

$(x_1,y_1,w_1), (x_2,y_2,w_2), \ldots, (x_m,y_m,w_m)$

Feature vectors $x_i$, binary labels $y_i \in \{-1,+1\}$, positive weights $w_i$.

Page 12: Boosting and Predictive modeling


A weak learner

The weak learner receives a weighted training set $(x_1,y_1,w_1), (x_2,y_2,w_2), \ldots, (x_m,y_m,w_m)$, reads the instances $x_1, x_2, \ldots, x_m$, and outputs a weak rule $h$ with predictions $\hat y_1, \hat y_2, \ldots, \hat y_m$; $\hat y_i \in \{0,1\}$.

The weak requirement:

$\frac{\sum_{i=1}^{m} y_i \hat y_i w_i}{\sum_{i=1}^{m} w_i} > \gamma > 0$

Page 13: Boosting and Predictive modeling


The boosting process

At each round $t$ the weak learner is run on the reweighted training set $(x_1,y_1,w_1^{t-1}), (x_2,y_2,w_2^{t-1}), \ldots, (x_n,y_n,w_n^{t-1})$ and returns a weak rule $h_t$; round 1 uses unit weights $(x_1,y_1,1), (x_2,y_2,1), \ldots, (x_n,y_n,1)$.

Final rule: $F_T(x) = \alpha_1 h_1(x) + \alpha_2 h_2(x) + \cdots + \alpha_T h_T(x)$

$f_T(x) = \mathrm{sign}\left(F_T(x)\right)$

Page 14: Boosting and Predictive modeling


Adaboost (Freund, Schapire 1997)

$F_0(x) \equiv 0$

for $t = 1 \ldots T$:

  $w_i^t = \exp\left(-y_i F_{t-1}(x_i)\right)$

  Get $h_t$ from weak learner

  $\alpha_t = \ln\left(\frac{\sum_{i:\, h_t(x_i)=1,\, y_i=+1} w_i^t}{\sum_{i:\, h_t(x_i)=1,\, y_i=-1} w_i^t}\right)$

  $F_{t+1} = F_t + \alpha_t h_t$
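A minimal Python sketch of this loop, assuming a `weak_learner` callback that returns a rule with predictions in {0,1} as above. The callback interface is our own invention for illustration; a real implementation would also guard against an empty denominator in the alpha computation.

```python
import numpy as np

def adaboost(X, y, weak_learner, T):
    """Sketch of the Adaboost loop from the slide.

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    weak_learner(X, y, w) is assumed to return a function
    h(X) -> predictions in {0, 1} (the weak rule "fires" on 1).
    """
    n = len(y)
    F = np.zeros(n)                      # F_0 == 0
    rules, alphas = [], []
    for t in range(T):
        w = np.exp(-y * F)               # w_i^t = exp(-y_i F_{t-1}(x_i))
        h = weak_learner(X, y, w)        # get h_t from the weak learner
        fires = h(X) == 1
        w_pos = w[fires & (y == +1)].sum()
        w_neg = w[fires & (y == -1)].sum()
        alpha = np.log(w_pos / w_neg)    # alpha_t from the slide
        F = F + alpha * h(X)             # F_{t+1} = F_t + alpha_t h_t
        rules.append(h)
        alphas.append(alpha)
    # Final classifier f_T(x) = sign(F_T(x)).
    return lambda Xnew: np.sign(sum(a * h(Xnew) for a, h in zip(alphas, rules)))
```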

Page 15: Boosting and Predictive modeling


Main property of Adaboost

If the advantages of the weak rules over random guessing are $\gamma_1, \gamma_2, \ldots, \gamma_T$, then the training error of the final rule is at most

$\hat\varepsilon\left(f_T\right) \leq \exp\left(-\sum_{t=1}^{T} \gamma_t^2\right)$

Page 16: Boosting and Predictive modeling


Boosting block diagram

[Block diagram: inside the Strong Learner, the Booster sends example weights to the Weak Learner, which returns a weak rule; the loop repeats, and the Strong Learner outputs an accurate rule.]

Page 17: Boosting and Predictive modeling


Adaboost as gradient-descent

[Figure: loss curves plotted against the margin $y\,F_t(x)$; mistakes lie to the left of zero, correct predictions to the right. The 0-1 loss is shown for reference.]

Adaboost: $e^{-y F_t(x)}$

Logitboost: $\ln\left(1 + e^{-y F_t(x)}\right)$

Brownboost: $\frac{1}{2}\left(1 - \mathrm{erf}\left(\frac{y F_t(x) + c - t}{\sqrt{c - t}}\right)\right)$
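The three losses are easy to compare numerically as functions of the margin. A small sketch (the Brownboost parameters `c` and `t` below are illustrative placeholders, not values from the talk):

```python
import numpy as np
from scipy.special import erf

def boosting_losses(margin, c=1.0, t=0.0):
    """Loss curves from the slide as functions of the margin y*F_t(x)."""
    return {
        "0-1":        (margin <= 0).astype(float),
        "adaboost":   np.exp(-margin),
        "logitboost": np.log1p(np.exp(-margin)),
        "brownboost": 0.5 * (1.0 - erf((margin + c - t) / np.sqrt(c - t))),
    }

# Example: evaluate all four losses on a grid of margins.
margins = np.linspace(-2, 2, 5)
for name, loss in boosting_losses(margins).items():
    print(name, np.round(loss, 3))
```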

Page 18: Boosting and Predictive modeling


Plan of talk
• Boosting
• Alternating Decision Trees: a hybrid of boosting and decision trees
• Data-mining AT&T transaction logs.
• The I/O bottleneck in data-mining.
• Resistance of boosting to over-fitting.
• Confidence rated prediction.
• Confidence-rating for object recognition.
• Gene regulation modeling.
• Summary

Page 19: Boosting and Predictive modeling


Decision Trees

[Figure: a decision tree over features X and Y, and the axis-parallel partition of the plane it induces. Root test X>3: "no" predicts -1; "yes" leads to the test Y>5, where "no" predicts -1 and "yes" predicts +1. The plane is split at X=3 and Y=5 into regions labeled +1, -1, -1.]

Page 20: Boosting and Predictive modeling


A decision tree as a sum of weak rules.

[Figure: the same tree rewritten as the sign of a sum of weak rules: a constant -0.2, a rule for X>3 contributing -0.1 (no) or +0.1 (yes), and a rule for Y>5 contributing -0.3 (no) or +0.2 (yes). The plane partition shows the summed scores, whose signs recover the +1, -1, -1 regions.]

Page 21: Boosting and Predictive modeling


An alternating decision tree (Freund, Mason 1997)

[Figure: an alternating decision tree with root prediction node -0.2; a splitter Y>5 with prediction nodes -0.3 (no) and +0.2 (yes); a splitter X>3 with prediction nodes -0.1 (no) and +0.1 (yes); and, below the +0.1 node, a splitter Y<1 with prediction nodes 0.0 (no) and +0.7 (yes). The score of an instance is the sum of the prediction nodes along all paths consistent with it, and the sign gives the label; the induced partition of the plane has regions +1, -1, -1, +1.]
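A small sketch of how an ADT scores an instance, using the toy numbers above. The node layout and field names are our reconstruction for illustration, not code from the talk:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ADTNode:
    """One splitter node in an alternating decision tree."""
    test: Callable[[dict], bool]          # e.g. lambda x: x["Y"] > 5
    yes_score: float
    no_score: float
    yes_children: list = field(default_factory=list)
    no_children: list = field(default_factory=list)

def adt_score(x, root_score, splitters):
    """Sum prediction nodes along all paths consistent with x."""
    total = root_score
    for s in splitters:
        if s.test(x):
            total += s.yes_score + adt_score(x, 0.0, s.yes_children)
        else:
            total += s.no_score + adt_score(x, 0.0, s.no_children)
    return total

# The toy ADT from the slide: root -0.2, splitters Y>5 and X>3,
# with Y<1 nested under the "yes" side of X>3.
y_lt_1 = ADTNode(lambda x: x["Y"] < 1, yes_score=+0.7, no_score=0.0)
tree = [
    ADTNode(lambda x: x["Y"] > 5, yes_score=+0.2, no_score=-0.3),
    ADTNode(lambda x: x["X"] > 3, yes_score=+0.1, no_score=-0.1,
            yes_children=[y_lt_1]),
]
x = {"X": 4, "Y": 0.5}
print(adt_score(x, -0.2, tree))   # 0.3; the sign (+1) is the prediction
```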

Page 22: Boosting and Predictive modeling


Example: Medical Diagnostics

• Cleve dataset from UC Irvine database.

• Heart disease diagnostics (+1=healthy,-1=sick)

• 13 features from tests (real valued and discrete).

• 303 instances.

Page 23: Boosting and Predictive modeling


AD-tree for heart-disease diagnostics

>0: Healthy, <0: Sick

Page 24: Boosting and Predictive modeling


Plan of talk
• Boosting
• Alternating Decision Trees
• Data-mining AT&T transaction logs.
• The I/O bottleneck in data-mining.
• Resistance of boosting to over-fitting.
• Confidence rated prediction.
• Confidence-rating for object recognition.
• Gene regulation modeling.
• Summary

Page 25: Boosting and Predictive modeling


AT&T “buisosity” problem

• Distinguish business/residence customers from call detail information (time of day, length of call, ...).
• 230M telephone numbers, label unknown for ~30%.
• 260M calls / day.
• Required computer resources:
  – Huge: counting log entries to produce statistics; uses specialized I/O-efficient sorting algorithms (Hancock).
  – Significant: calculating the classification for ~70M customers.
  – Negligible: learning (2 hours on 10K training examples on an off-line computer).

Freund, Mason, Rogers, Pregibon, Cortes 2000

Page 26: Boosting and Predictive modeling


AD-tree for “buisosity”

Page 27: Boosting and Predictive modeling


AD-tree (Detail)

Page 28: Boosting and Predictive modeling


Quantifiable results

• At 94% accuracy, coverage increased from 44% to 56%.
• Saved AT&T $15M in the year 2000 in operations costs and missed opportunities.

[Figure: precision vs. recall curve.]

Define, for a varying score threshold $\theta$:

$\mathrm{Precision}(\theta) = \frac{\mathrm{TPos}(\theta)}{\mathrm{FPos}(\theta) + \mathrm{TPos}(\theta)}$

$\mathrm{Recall}(\theta) = \frac{\mathrm{TPos}(\theta)}{\mathrm{AllPos}}$
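A small sketch of computing these curves from classifier scores (toy data below, not the AT&T logs):

```python
import numpy as np

def precision_recall(scores, labels, thresholds):
    """Precision and recall as functions of the score threshold theta,
    following the slide's definitions (labels in {-1, +1})."""
    all_pos = np.sum(labels == +1)
    out = []
    for theta in thresholds:
        predicted_pos = scores > theta
        tpos = np.sum(predicted_pos & (labels == +1))
        fpos = np.sum(predicted_pos & (labels == -1))
        precision = tpos / (tpos + fpos) if tpos + fpos else 1.0
        recall = tpos / all_pos
        out.append((theta, precision, recall))
    return out

# Toy usage with made-up scores:
scores = np.array([2.1, 0.3, -0.5, 1.2, -1.7])
labels = np.array([+1, -1, -1, +1, -1])
for theta, p, r in precision_recall(scores, labels, [-1, 0, 1]):
    print(f"theta={theta:+.1f}  precision={p:.2f}  recall={r:.2f}")
```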

Page 29: Boosting and Predictive modeling


Plan of talk
• Boosting
• Alternating Decision Trees
• Data-mining AT&T transaction logs.
• The I/O bottleneck in data-mining.
• Resistance of boosting to over-fitting.
• Confidence rated prediction.
• Confidence-rating for object recognition.
• Gene regulation modeling.
• Summary

Page 30: Boosting and Predictive modeling


The database bottleneck
• Physical limit: a disk "seek" takes 0.01 sec
  – Same time to read/write 10^5 bytes
  – Same time to perform 10^7 CPU operations
• Commercial DBMS are optimized for varying queries and transactions.
• Statistical analysis requires evaluation of fixed queries on massive data streams.
• Keeping disk I/O sequential is key.
• Data compression improves I/O speed but restricts random access.

Page 31: Boosting and Predictive modeling


CS theory regarding very large data-sets
• Massive datasets: "You pay 1 per disk block you read/write, ε per CPU operation. Internal memory can store N disk blocks."
  – Example problem: given a stream of line segments (in the plane), identify all segment pairs that intersect.
  – Vitter, Motwani, Indyk, ...
• Property testing: "You can only look at a small fraction of the data."
  – Example problem: decide whether a given graph is bipartite by testing only a small fraction of the edges.
  – Rubinfeld, Ron, Sudan, Goldreich, Goldwasser, ...

Page 32: Boosting and Predictive modeling


Plan of talk
• Boosting
• Alternating Decision Trees
• Data-mining AT&T transaction logs.
• The I/O bottleneck in data-mining.
• Resistance of boosting to over-fitting.
• Confidence rated prediction.
• Confidence-rating for object recognition.
• Gene regulation modeling.
• Summary

Page 33: Boosting and Predictive modeling


A very curious phenomenon

Boosting decision trees

Using <10,000 training examples we fit >2,000,000 parameters

Page 34: Boosting and Predictive modeling


Large margins

$\mathrm{margin}_{F_T}(x,y) \doteq y\,\frac{\sum_{t=1}^{T} \alpha_t h_t(x)}{\sum_{t=1}^{T} \alpha_t} = y\,\frac{F_T(x)}{\|\vec\alpha\|_1}$

$\mathrm{margin}_{F_T}(x,y) > 0 \iff f_T(x) = y$

Thesis: large margins => reliable predictions. Very similar to SVM.
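As a quick illustration, the normalized margin of a single example can be computed directly from the rule weights and the weak predictions (a sketch, not code from the talk):

```python
import numpy as np

def normalized_margin(alphas, weak_preds, y):
    """margin_{F_T}(x, y) = y * sum_t alpha_t h_t(x) / sum_t alpha_t.

    alphas: (T,) rule weights; weak_preds: (T,) values h_t(x) in {-1, +1};
    y: true label in {-1, +1}.
    """
    alphas = np.asarray(alphas, dtype=float)
    weak_preds = np.asarray(weak_preds, dtype=float)
    return y * np.dot(alphas, weak_preds) / np.abs(alphas).sum()

# A margin of +1 means every weak rule votes correctly:
print(normalized_margin([0.5, 1.0, 0.2], [+1, +1, +1], y=+1))  # 1.0
```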

Page 35: Boosting and Predictive modeling


Experimental Evidence

Page 36: Boosting and Predictive modeling


Theorem (Schapire, Freund, Bartlett & Lee, Annals of Statistics 1998)

H: set of binary functions with VC-dimension d

$C = \left\{\sum_i \alpha_i h_i \,\middle|\, h_i \in H,\ \alpha_i > 0,\ \sum_i \alpha_i = 1\right\}$

$T = (x_1,y_1), (x_2,y_2), \ldots, (x_m,y_m)$; $T \sim D^m$

$\forall c \in C, \forall \theta > 0$, with probability $1-\delta$ w.r.t. $T \sim D^m$:

$P_{(x,y)\sim D}\left[\mathrm{sign}\left(c(x)\right) \neq y\right] \leq P_{(x,y)\sim T}\left[\mathrm{margin}_c(x,y) \leq \theta\right] + \tilde O\left(\frac{\sqrt{d/m}}{\theta}\right) + O\left(\log\frac{1}{\delta}\right)$

No dependence on the number of combined functions!

Page 37: Boosting and Predictive modeling


Idea of Proof

Page 38: Boosting and Predictive modeling


Plan of talk
• Boosting
• Alternating Decision Trees
• Data-mining AT&T transaction logs.
• The I/O bottleneck in data-mining.
• Resistance of boosting to over-fitting.
• Confidence rated prediction.
• Confidence-rating for object recognition.
• Gene regulation modeling.
• Summary

Page 39: Boosting and Predictive modeling


A motivating example

[Figure: a two-dimensional training set of + and - examples forming well-separated clusters. Three query points marked "?" lie near the boundary between the clusters or far from all of the training data; those regions are labeled "Unsure".]

Page 40: Boosting and Predictive modeling


The algorithm

Parameters: $\eta > 0$, $\Delta > 0$

Hypothesis weight: $w(h) \doteq e^{-\eta\,\hat\varepsilon(h)}$

Empirical log ratio: $\hat l_\eta(x) \doteq \frac{1}{\eta}\ln\left(\frac{\sum_{h:\, h(x)=+1} w(h)}{\sum_{h:\, h(x)=-1} w(h)}\right)$

Prediction rule:

$\hat p_{\eta,\Delta}(x) = \begin{cases} +1 & \text{if } \hat l(x) > \Delta \\ \{-1,+1\} & \text{if } |\hat l(x)| \leq \Delta \\ -1 & \text{if } \hat l(x) < -\Delta \end{cases}$

Freund, Mansour, Schapire, Annals of Statistics, August 2004
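A sketch of this prediction rule in Python. Hypotheses here are plain functions mapping an instance to {-1, +1}; a real implementation would also guard against one side of the vote having zero total weight:

```python
import numpy as np

def confidence_rated_predict(x, hypotheses, train_errors, eta, delta):
    """Sketch of the log-ratio prediction rule from the slide.

    hypotheses: list of functions h(x) -> {-1, +1}
    train_errors: empirical error eps_hat(h) for each hypothesis
    Returns +1, -1, or the set {-1, +1} ("unsure").
    """
    w = np.exp(-eta * np.asarray(train_errors))   # w(h) = exp(-eta * eps_hat(h))
    votes = np.array([h(x) for h in hypotheses])
    w_plus = w[votes == +1].sum()
    w_minus = w[votes == -1].sum()
    l_hat = np.log(w_plus / w_minus) / eta        # empirical log ratio
    if l_hat > delta:
        return +1
    if l_hat < -delta:
        return -1
    return {-1, +1}                               # abstain / unsure
```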

Page 41: Boosting and Predictive modeling


Suggested tuning

Setting (H is a finite set, m = size of training set, δ = probability of failure):

$0 < \theta < \frac{1}{4}, \qquad \eta = \frac{m^{1/2-\theta}}{\ln |H|}, \qquad \Delta = \frac{\ln\left(|H|/\delta\right)}{m^{1/2+\theta}}$

Yields:

1) $P(\text{mistake}) = P_{(x,y)\sim D}\left(y \notin \hat p(x)\right) = 2\,\varepsilon\left(h^*\right) + O\left(\frac{\ln m}{m^{1/2-\theta}}\right)$

2) for $m = \Omega\left(\left(\ln(1/\delta)\,\ln|H|\right)^{1/\theta}\right)$:

$P(\text{abstain}) = P_{(x,y)\sim D}\left(\hat p(x) = \{-1,+1\}\right) = 5\,\varepsilon\left(h^*\right) + O\left(\frac{\ln(1/\delta) + \ln|H|}{m^{1/2-\theta}}\right)$

Page 42: Boosting and Predictive modeling


Confidence Rating block diagram

[Block diagram: training examples $(x_1,y_1), (x_2,y_2), \ldots, (x_m,y_m)$ and a set of candidate rules feed a Rater-Combiner, which outputs a confidence-rated rule.]

Page 43: Boosting and Predictive modeling


Summary of Confidence-Rated Classifiers

• A frequentist explanation for the benefits of model averaging.
• Separates inherent uncertainty from uncertainty due to the finite training set.
• Computational hardness: unknown other than in a few special cases.
• Margins from boosting or SVM can be used as an approximation.
• Many practical applications!

Page 44: Boosting and Predictive modeling


Plan of talk
• Boosting
• Alternating Decision Trees
• Data-mining AT&T transaction logs.
• The I/O bottleneck in data-mining.
• Resistance of boosting to over-fitting.
• Confidence rated prediction.
• Confidence-rating for object recognition.
• Gene regulation modeling.
• Summary

Page 45: Boosting and Predictive modeling


Face Detection - Using confidence to save time

• Paul Viola and Mike Jones developed a face detector that can work in real time (15 frames per second).

Viola & Jones 1999


Page 46: Boosting and Predictive modeling


Image Features

"Rectangle filters"

Similar to Haar wavelets (Papageorgiou et al.)

$60{,}000 \times 100 = 6{,}000{,}000$ unique binary features

$h_t(x_i) = \begin{cases} \alpha_t & \text{if } f_t(x_i) > \theta_t \\ \beta_t & \text{otherwise} \end{cases}$
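Rectangle filters are cheap because an integral image turns any rectangle sum into four array lookups. A sketch of one two-rectangle filter (the coordinates, threshold, and alpha/beta values are illustrative placeholders, not from the talk):

```python
import numpy as np

def integral_image(img):
    """Cumulative sums so any rectangle sum costs four lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using the integral image ii."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_feature(ii, r0, c0, r1, c1):
    """A horizontal two-rectangle filter: left half minus right half."""
    cm = (c0 + c1) // 2
    return rect_sum(ii, r0, c0, r1, cm) - rect_sum(ii, r0, cm, r1, c1)

img = np.random.rand(24, 24)
f = two_rect_feature(integral_image(img), 4, 4, 12, 12)
# A weak rule thresholds the filter response, as in h_t above:
alpha_t, beta_t, theta_t = 1.0, -1.0, 0.0   # illustrative values
h = alpha_t if f > theta_t else beta_t
```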

Page 47: Boosting and Predictive modeling


Example Classifier for Face Detection

A classifier with 200 rectangle features was learned using AdaBoost.

95% correct detection on the test set, with 1 in 14,084 false positives.

Not quite competitive...

[Figure: ROC curve for the 200-feature classifier.]

Page 48: Boosting and Predictive modeling


Employing a cascade to minimize average detection time

The accurate detector combines 6000 simple features using Adaboost.

In most boxes, only 8-9 features are calculated.

[Diagram: all boxes pass through features 1-3; boxes judged "definitely not a face" are rejected, while those that "might be a face" continue to features 4-10, and so on down the cascade.]
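A sketch of the cascade idea: cheap stages reject most boxes early, so the average number of features computed per box stays small. The stage functions below are illustrative stand-ins, not the actual rectangle-feature stages:

```python
import numpy as np

def cascade_detect(box, stages):
    """Each stage is (score_fn, threshold); reject at the first failure."""
    for score_fn, threshold in stages:
        if score_fn(box) < threshold:
            return False        # definitely not a face; stop early
    return True                 # survived every stage: report a face

# Illustrative stages (placeholders for "features 1-3", "features 4-10", ...):
stages = [
    (lambda b: b.mean(), 0.1),
    (lambda b: b.std(), 0.05),
]
box = np.random.rand(24, 24)
print(cascade_detect(box, stages))
```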

Page 49: Boosting and Predictive modeling


Using confidence to avoid labeling (Levin, Viola, Freund 2003)

Page 50: Boosting and Predictive modeling


Image 1

Page 51: Boosting and Predictive modeling


Image 1 - diff from time average

Page 52: Boosting and Predictive modeling


Image 2

Page 53: Boosting and Predictive modeling


Image 2 - diff from time average

Page 54: Boosting and Predictive modeling


Co-training

[Diagram: highway images provide two views, the raw B/W image and the difference image. A partially trained B/W-based classifier and a partially trained difference-based classifier each feed their confident predictions to the other as training labels.]

Blum and Mitchell 98
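A sketch of the co-training loop in the style of Blum and Mitchell. The classifier interface here (`predict_with_confidence`, `update`) is our own invention for illustration:

```python
def co_train(unlabeled, clf_a, clf_b, rounds, threshold):
    """Two partially trained classifiers over two views teach each other.

    Each classifier is assumed to expose
    predict_with_confidence(x) -> (label, confidence) and update(x, label).
    """
    for _ in range(rounds):
        for x in unlabeled:
            label_a, conf_a = clf_a.predict_with_confidence(x)
            label_b, conf_b = clf_b.predict_with_confidence(x)
            # Each view labels the other's training data,
            # but only on examples it is confident about.
            if conf_a > threshold:
                clf_b.update(x, label_a)
            if conf_b > threshold:
                clf_a.update(x, label_b)
    return clf_a, clf_b
```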

Page 55: Boosting and Predictive modeling


[Figure: scatter plot of grey-scale detection score vs. subtract-average detection score, with cars and non-cars marked.]

Page 56: Boosting and Predictive modeling


Co-Training Results

[Figure: ROC curves for the raw-image detector and the difference-image detector, before and after co-training.]

Page 57: Boosting and Predictive modeling


Plan of talk
• Boosting
• Alternating Decision Trees
• Data-mining AT&T transaction logs.
• The I/O bottleneck in data-mining.
• Resistance of boosting to over-fitting.
• Confidence rated prediction.
• Confidence-rating for object recognition.
• Gene regulation modeling.
• Summary

Page 58: Boosting and Predictive modeling


Gene Regulation

• Regulatory proteins bind to the non-coding regulatory sequence of a gene to control the rate of transcription.

[Diagram: regulators bind to binding sites in the DNA upstream of a gene; the measurable quantity is the mRNA transcript.]

Page 59: Boosting and Predictive modeling


From mRNA to Protein

[Diagram: the mRNA transcript passes through the nucleus wall; a ribosome translates it into a protein sequence, which folds into the final protein.]

Page 60: Boosting and Predictive modeling


Protein Transcription Factors

[Figure: a regulator protein bound to DNA, acting as a transcription factor.]

Page 61: Boosting and Predictive modeling


Genome-wide Expression Data

• Microarrays measure mRNA transcript expression levels for all of the ~6000 yeast genes at once.
• Very noisy data.
• Rough time slice over all compartments of many cells.
• Protein expression not observed.

Page 62: Boosting and Predictive modeling


Partial “Parts List” for Yeast

Many known and putative:
– Transcription factors
– Signaling molecules that activate transcription factors
– Known and putative binding site "motifs"

In yeast, regulatory sequence = 500 bp upstream region.

[Diagram: signaling molecules (SM) activate transcription factors (TF), which bind motifs (M) in the upstream region.]

Page 63: Boosting and Predictive modeling


Predict target gene regulatory response from regulator activity and binding site data

GeneClass: Problem Formulation

[Diagram: a microarray image gives "parent" (regulator) gene expression R1, R2, R3, R4, ..., Rp and target gene expression G1, G2, G3, G4, ..., Gt; each target gene also has binding sites (motifs) in its upstream region.]

M. Middendorf, A. Kundaje, C. Wiggins, Y. Freund, C. Leslie. Predicting Genetic Regulatory Response Using Classification. ISMB 2004.

Page 64: Boosting and Predictive modeling


Role of quantization

[Figure: expression values quantized into three classes: -1, 0, +1.]

By quantizing expression into three classes we reduce noise but maintain most of the signal.

Weighting +1/-1 examples linearly with expression level performs slightly better.
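A sketch of the three-way quantization (the cutoff values are illustrative placeholders; the talk does not state the thresholds used):

```python
import numpy as np

def quantize_expression(log_ratios, down=-0.5, up=0.5):
    """Quantize log expression ratios into {-1, 0, +1}."""
    q = np.zeros_like(log_ratios, dtype=int)
    q[log_ratios <= down] = -1    # repressed
    q[log_ratios >= up] = +1      # induced
    return q                      # everything else stays 0 (baseline)

print(quantize_expression(np.array([-1.2, 0.1, 0.8])))   # [-1  0  1]
```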

Page 65: Boosting and Predictive modeling


Problem setup

• Data point = target gene × microarray
• Input features:
  – Parent state {-1,0,+1}
  – Motif presence {0,1}
• Predicted output:
  – Target gene {-1,+1}

Page 66: Boosting and Predictive modeling


Boosting with Alternating Decision Trees (ADTs)

• Use boosting to build a single ADT, a margin-based generalization of a decision tree.

[Figure: an ADT with splitter nodes such as "Is motif MIG1 present AND parent XBP1 up?" and prediction nodes. F(x) is given by the sum of prediction nodes along all paths consistent with x.]

Page 67: Boosting and Predictive modeling


Statistical Validation
• 10-fold cross-validation experiments, ~50,000 (gene/microarray) training examples.
• Significant correlation between prediction score and true log expression ratio on held-out data.
• Prediction accuracy on +1/-1 labels: 88.5%

Page 68: Boosting and Predictive modeling


Biological Interpretation: From correlation to causation

• Good prediction only implies correlation.
• To infer causation we need to integrate additional knowledge.
• Comparative case studies: train on similar conditions (stresses), test on related experiments.
• Extract significant features from the learned model:
  – Iteration score (IS): the boosting iteration at which a feature first appears. Identifies significant motifs and motif-parent pairs.
  – Abundance score (AS): the number of nodes in the ADT containing the feature. Identifies important regulators.
• In silico knock-outs: remove a significant regulator and retrain.

Page 69: Boosting and Predictive modeling


Case Study: Heat Shock and Osmolarity

Training set: heat shock, osmolarity, amino acid starvation

Test set: stationary phase, simultaneous heat shock + osmolarity

Results:
– Test error = 9.3%. Supports the Gasch hypothesis: the heat shock and osmolarity pathways are independent and additive.
– High scoring parents (AS): USV1 (stationary phase and heat shock), PPT1 (osmolarity response), GAC1 (response to heat)

Page 70: Boosting and Predictive modeling


Case Study: Heat Shock and Osmolarity

Results:
– High scoring binding sites (IS): the MSN2/MSN4 STRE element. Heat shock related: HSF1 and RAP1 binding sites. Osmolarity/glycerol pathways: CAT8, MIG1, GCN4. Amino acid starvation: GCN4, CHA4, MET31.
– High scoring motif-parent pair (IS): the TPK1~STRE pair (TPK1 is a kinase that regulates MSN2 via cellular localization) - an indirect effect.

[Diagram: three ways a motif-parent pair can arise: direct binding, indirect effect, co-occurrence.]

Page 71: Boosting and Predictive modeling


Case Study: In silico knockout

• Training and test sets: same as the heat shock and osmolarity case study.
• Knockout: remove USV1 from the regulator list and retrain.
• Results:
  – Test error: 12% (an increase from 9%).
  – Identify putative downstream targets of USV1: target genes that change from a correct to an incorrect label.
  – GO annotation analysis reveals putative functions: nucleoside transport, cell-wall organization and biogenesis, heat-shock protein activity.
  – Putative functions match those identified in the wet-lab USV1 knockout (Segal et al., 2003).

Page 72: Boosting and Predictive modeling


Conclusions: Gene Regulation

• A new predictive model for the study of gene regulation:
  – First gene regulation model to make quantitative predictions.
  – Uses actual expression levels - no clustering.
  – Strong prediction accuracy on held-out experiments.
  – Interpretable hypotheses: significant regulators, binding motifs, regulator-motif pairs.
• A new methodology for biological analysis: comparative training/test studies, in silico knockouts.

Page 73: Boosting and Predictive modeling


Plan of talk
• Boosting
• Alternating Decision Trees
• Data-mining AT&T transaction logs.
• The I/O bottleneck in data-mining.
• Resistance of boosting to over-fitting.
• Confidence rated prediction.
• Confidence-rating for object recognition.
• Gene regulation modeling.
• Summary

Page 74: Boosting and Predictive modeling


Summary
• Moving from density estimation to classification can make hard problems tractable.
• Boosting is an efficient and flexible method for constructing complex and accurate classifiers.
• I/O is the main bottleneck to data-mining:
  – Sampling, data localization and parallelization help.
• Correlation -> causation is still a hard problem; it requires domain-specific expertise and integration of data sources.

Page 75: Boosting and Predictive modeling


Future work
• New applications:
  – Bio-informatics.
  – Vision / speech and signal processing.
  – Information retrieval and information extraction.
• Theory:
  – Improving the robustness of learning algorithms.
  – Utilization of unlabeled examples in confidence-rated classification.
  – Sequential experimental design.
  – Relationships between learning algorithms and stochastic differential equations.

Page 76: Boosting and Predictive modeling


Extra

Page 77: Boosting and Predictive modeling


Plan of talk
• Boosting
• Alternating Decision Trees
• Data-mining AT&T transaction logs.
• The I/O bottleneck in data-mining.
• High-energy physics.
• Resistance of boosting to over-fitting.
• Confidence rated prediction.
• Confidence-rating for object recognition.
• Gene regulation modeling.
• Summary

Page 78: Boosting and Predictive modeling


Analysis for the MiniBooNE experiment

• Goal: to test for neutrino mass by searching for neutrino oscillations.
• Important because it may lead us to physics beyond the Standard Model.
• The BooNE project began in 1997.
• The first beam-induced neutrino events were detected in September 2002.

[Photo: the MiniBooNE detector (Fermi Lab).]

Page 79: Boosting and Predictive modeling


MiniBooNE Classification Task

[Diagram: a simulation of the MiniBooNE detector feeds event reconstruction, producing a feature vector of 52 reals; a neural network (52 inputs, 26 hidden units) scores each event, and a threshold $x > \theta$ labels it $\nu_e$ (yes) or Other (no).]

Ion Stancu, UC Riverside

Page 80: Boosting and Predictive modeling


Page 81: Boosting and Predictive modeling


Results

Page 82: Boosting and Predictive modeling


Using confidence to reduce labeling

[Diagram: a partially trained classifier scores unlabeled data; a sample of unconfident examples is sent for labeling and added to the labeled examples, which retrain the classifier.]

Query-by-committee: Seung, Opper & Sompolinsky; Freund, Seung, Shamir & Tishby

Page 83: Boosting and Predictive modeling


Discriminative approach

[Figure: number of mistakes as a function of the voice-pitch threshold.]

Page 84: Boosting and Predictive modeling


Results from Yotam Abramson.

Page 85: Boosting and Predictive modeling
