Training Conditional Random Fields using Virtual Evidence Boosting

Lin Liao, Tanzeem Choudhury†, Dieter Fox, and Henry Kautz

University of Washington †Intel Research


Approaches to Training Conditional Random Fields (CRFs)

Maximum Likelihood

• Run numerical optimization to find the optimal weights, which requires inference at each iteration

• Inefficient for complex structures

• Inadequate for continuous observations and feature selection

Maximum Pseudo-Likelihood

• Convert a CRF into separate patches; each consists of a hidden node and the true values of its neighbors

• Run ML learning on the separate patches

• Efficient, but may over-estimate inter-dependency

• Inadequate for continuous observations and feature selection
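The patch construction behind pseudo-likelihood can be sketched for a binary linear-chain CRF: each hidden node is scored conditioned on the true labels of its chain neighbors, and the per-patch log-probabilities are summed. This is a minimal illustration under invented potentials, labels, and observations, not the poster's implementation.

```python
import math

# Hypothetical pairwise potential: rewards equal neighboring labels.
def pair_pot(a, b):
    return 1.0 if a == b else 0.0

# Hypothetical local potential: observation x in [0, 1] votes for label 1.
def local_pot(y, x):
    return 2.0 * x if y == 1 else 2.0 * (1 - x)

def log_pseudo_likelihood(labels, obs):
    """Sum over patches: each hidden node is conditioned on the TRUE
    labels of its chain neighbors (the MPL approximation)."""
    total = 0.0
    n = len(labels)
    for i in range(n):
        scores = {}
        for y in (0, 1):  # candidate label for node i
            s = local_pot(y, obs[i])
            if i > 0:
                s += pair_pot(labels[i - 1], y)
            if i < n - 1:
                s += pair_pot(y, labels[i + 1])
            scores[y] = s
        log_z = math.log(sum(math.exp(s) for s in scores.values()))
        total += scores[labels[i]] - log_z
    return total

print(round(log_pseudo_likelihood([1, 1, 0], [0.9, 0.8, 0.2]), 3))  # → -0.933
```

Because each patch is a small local normalization, no global inference is needed, which is why MPL is efficient; conditioning on true neighbor labels is also why it can over-estimate inter-dependency.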

Our Approach: Virtual Evidence Boosting

• Convert a CRF into separate patches; each consists of a hidden node and virtual evidence of neighbors

• Run boosting (to select features) and belief propagation (to update virtual evidence) alternately

• Efficient and unified approach to feature selection and parameter estimation

• Suitable for both discrete and continuous observations

Extension of LogitBoost with Virtual Evidence

Algorithms

• Traditional boosting algorithms assume feature values to be deterministic

• We extend the LogitBoost algorithm to handle virtual evidence, i.e., a feature value may also be a likelihood or a probability distribution

INPUTS: training samples

OUTPUT: F (linear combination of features)
FOR each iteration
    FOR each sample
        Compute likelihood
        Compute sample weight
        Compute working response
    END
    Obtain best weak learner by solving the weighted least-squares problem
    Add the weak learner to F
END
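The loop above can be sketched in Python for a single discrete feature, where each sample carries a distribution over a finite value set (its virtual evidence) instead of one deterministic value, and the weak learner is a lookup table fit by weighted least squares. The names and the tabular weak learner are illustrative assumptions, not the authors' code.

```python
import math

def logitboost_ve(ve_list, labels, rounds=10):
    """Minimal sketch of LogitBoost with virtual evidence.

    ve_list: one dict per sample mapping feature value -> probability
             (the virtual evidence ve(x_i)).
    labels:  y_i in {0, 1}.
    Returns F as a dict mapping feature value -> accumulated score.
    """
    F = {}  # F starts at 0 (empty table)

    def score(ve):
        # Expected value of F under the virtual evidence.
        return sum(p * F.get(v, 0.0) for v, p in ve.items())

    for _ in range(rounds):
        ws, zs = [], []
        for ve, y in zip(ve_list, labels):
            p = 1.0 / (1.0 + math.exp(-2.0 * score(ve)))  # likelihood p_i
            p = min(max(p, 1e-6), 1 - 1e-6)
            w = p * (1 - p)        # sample weight  w_i = p_i(1 - p_i)
            z = (y - p) / w        # working response z_i
            ws.append(w)
            zs.append(z)
        # Weighted least-squares weak learner: minimizing
        # sum_i w_i sum_v ve_i(v) (f(v) - z_i)^2 separately per value v
        # gives f(v) = sum_i w_i ve_i(v) z_i / sum_i w_i ve_i(v).
        num, den = {}, {}
        for ve, w, z in zip(ve_list, ws, zs):
            for v, pv in ve.items():
                num[v] = num.get(v, 0.0) + w * pv * z
                den[v] = den.get(v, 0.0) + w * pv
        for v in num:
            # Factor 1/2 matches the p = 1/(1 + e^{-2F}) parameterization.
            F[v] = F.get(v, 0.0) + 0.5 * num[v] / max(den[v], 1e-12)
    return F
```

When every ve_i puts all mass on a single value, the update reduces to ordinary LogitBoost, which is the sense in which this is a strict extension.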

Virtual Evidence Boosting for CRFs

Boosted Random Fields versus VEB

• Closest related work to VEB is Boosted Random Fields (Torralba 2004)

• BRFs combine boosting and belief propagation but assume dense graph structure and weak pair-wise influence

• We compare the two as the pair-wise influence changes

• VEB performs significantly better when pairwise relations are strong

Experiments

Feature Selection

VEB can be used to extract sparse structure from complex models. In this experiment it recovers the exact order of a high-order HMM, and thus outperforms the other learning alternatives.

Indoor Activities

• Activities: computer usage, meal, TV, meeting, and sleeping

• Linear-chain CRF with 315 continuous input features

• 1100 minutes of data over 12 days

Physical Activities and Spatial Contexts

• Context: indoors, outdoors, and vehicles

• Activities: stationary, walking, running, driving, and going up/down stairs

• Approximately 650 continuous input features

• 400 minutes of data over 12 episodes

INPUTS: structure of the CRF and training samples
OUTPUT: F (linear combination of features)
FOR each iteration
    Run BP using current F to get virtual evidence ve(x_i, n(y_i))
    FOR each sample
        Compute likelihood
        Compute sample weight
        Compute working response
    END
    Obtain best weak learner by solving the weighted least-squares problem
    Add the weak learner to F
END
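On a linear chain, the BP step that produces the neighbor part of each node's virtual evidence can be sketched with sum-product messages: a forward and a backward pass, whose product at node i is the evidence its neighbors contribute. The potentials here are illustrative placeholders for scores coming from the current F; this is a sketch of the message passing, not the authors' implementation.

```python
import math

def chain_virtual_evidence(local_scores, pair_score):
    """Sum-product messages on a binary-label chain.

    For each hidden node i, returns the normalized product of incoming
    neighbor messages -- the neighbor part of the virtual evidence that
    VEB feeds to its boosting step.

    local_scores: list of [score(y=0), score(y=1)] from the current F.
    pair_score:   2x2 compatibility scores between neighboring labels.
    """
    n = len(local_scores)
    phi = [[math.exp(s) for s in row] for row in local_scores]
    psi = [[math.exp(s) for s in row] for row in pair_score]
    fwd = [[1.0, 1.0] for _ in range(n)]  # message into i from the left
    bwd = [[1.0, 1.0] for _ in range(n)]  # message into i from the right
    for i in range(1, n):
        for y in (0, 1):
            fwd[i][y] = sum(fwd[i - 1][yp] * phi[i - 1][yp] * psi[yp][y]
                            for yp in (0, 1))
        s = sum(fwd[i])
        fwd[i] = [m / s for m in fwd[i]]
    for i in range(n - 2, -1, -1):
        for y in (0, 1):
            bwd[i][y] = sum(bwd[i + 1][yn] * phi[i + 1][yn] * psi[y][yn]
                            for yn in (0, 1))
        s = sum(bwd[i])
        bwd[i] = [m / s for m in bwd[i]]
    ve = []
    for i in range(n):
        prod = [fwd[i][y] * bwd[i][y] for y in (0, 1)]
        s = sum(prod)
        ve.append([p / s for p in prod])
    return ve
```

For example, with a 3-node chain whose endpoint scores favor label 1 and an attractive pairwise potential, the middle node's virtual evidence also leans toward label 1 even though its own local score is neutral, which is exactly the neighbor influence the boosting step consumes.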

Training Algorithm        Average accuracy
VEB                       88.8%
MPL + all observations    72.1%
MPL + boosting            70.9%
HMM + AdaBoost            85.8%

Training Algorithm        Average accuracy
VEB                       94.1%
BRF                       88.0%
ML + all observations     87.7%
ML + boosting             88.5%
MPL + all observations    87.9%
MPL + boosting            88.5%

Quantities used in both training algorithms:

• Training samples $(\mathrm{ve}(x_i), y_i)$, with $y_i \in \{0,1\}$, $i = 1, \dots, N$, and $F = 0$ initially

• Likelihood: $p_i = p(y_i \mid \mathrm{ve}(x_i))$

• Sample weight: $w_i = p_i(1 - p_i)$

• Working response: $z_i = (y_i - p_i) / \big(p_i(1 - p_i)\big)$

• Best weak learner: $f = \arg\min_f \sum_{i=1}^{N} w_i \sum_{x_i \in X} \mathrm{ve}(x_i)\,(f(x_i) - z_i)^2$
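Assuming the standard LogitBoost forms for the per-sample quantities, $w_i = p_i(1 - p_i)$ and $z_i = (y_i - p_i)/w_i$, a quick numeric check in Python:

```python
# One sample with true label y = 1 and current model belief p = 0.8.
y, p = 1, 0.8

w = p * (1 - p)    # sample weight  w_i = p_i(1 - p_i)
z = (y - p) / w    # working response z_i = (y_i - p_i) / w_i

print(w, z)  # ~0.16  ~1.25
```

Note that confident mistakes (p far from y) get large working responses, which is what steers the next weak learner toward the samples the current F gets wrong.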

Goal: To develop an efficient feature selection and parameter estimation technique for Conditional Random Fields (CRFs)

Application domain: To learn human activity models from continuous, multi-modal sensory inputs

Introduction

Application: Human Activity Recognition

Model human activities and select discriminative features from multimodal sensor data. Sensors include accelerometer, audio, light, temperature, etc.

(Figure: CRF over the context sequence and activity sequence)