
High Dimensional Nonparametric Modeling Using Two-Dimensional Polynomial Cascades

Greg Grudic
University of Colorado, Boulder
[email protected]
www.cs.colorado.edu/~grudic

July 17, 2002


Outline

• Applications of Very High Dimension Nonparametric Modeling

• Define the problem domain

• One solution: Polynomial Cascade Algorithm

• Conclusion


Applications of High Dimensional Non-Parametric Models

1. Human-to-Robot Skill Transfer (ICRA96)
2. Mobile Robot Localization (IROS98)
3. Strength Prediction of Boards
4. Defect Classification in Lumber
5. Activity recognition for the cognitively impaired

The same PC algorithm is used in all of these applications (with no parameter tuning).


Human-to-Robot Skill Transfer (ICRA96)

• Problem: Human demonstrates a task via teleoperation.
  – Object locate and approach task.
  – 1024 raw pixel inputs and 2 actuator outputs.
• Learning Data: 4 demonstrations of the task sequence.
  – 2000 to 5000 learning examples (~2 to 5 min).
• Learning Time: ~5 min. on a SPARC 20.
• Model Size / Evaluation Speed: < 500 KB, ~5 Hz.
• Autonomous control of the robot using the model:
  – No failures in 30 random trials.


Mobile Robot Localization (IROS98)

• Goal: Use on-board camera images to obtain position/orientation.
• Workspace: 6 x 5 meters in a research lab.
• Desired accuracy: 0.2 meters in position and 10 degrees in orientation.
• Inputs: 3 raw pixel images (160 by 120) => 19,200 inputs.
• Learning Data: 2000 image inputs with robot position/orientation.
• Learning time: ~2 hours on a SPARC 20.
• Model Size / Evaluation Speed: ~2.0 MB, 7 Hz.


Strength Prediction of Boards

• Goal: Predict the strength of a board (2x4) using nondestructive scans (Slope of Grain, Elasticity, XRay).

• Current Wood Processing Industry Standard: correlation of 0.5 to 0.65.

• The Learning Data: Scanned 300 boards and broke them each in 3 to 4 different places.

• Model Inputs: ~5000 statistical features.
• Learning time and model size: ~40 min. / ~1 MB.
• Model Accuracy (correlation): 0.8.


Defect Classification in Lumber

• Problem: Classify board defects using “images”.
• < 10 ms per classification (speed ~12 ft / sec).
• ~20 classes: 4 types of knots, pitch pockets, etc.
• Many attempted solutions: analytical methods, learning methods, etc.
• Model Inputs: > 1000.
• Learning Examples: > 1000.
• Model Accuracy: > 92%.


Activity Recognition for the Cognitively Impaired

• Goal:
  – Keep track of what activity a person is doing using cameras
    • e.g., which room is the person in; what are they doing; what have they completed?
  – Minimal engineering of the environment
• Solution: Attach a video camera to the person as tasks are accomplished
  – Label camera images accordingly
  – Build a model that classifies the images

• ~4000 raw pixels as inputs

• Preliminary results: success rate of 90% for identifying 4 different tasks


Problem Domain Characteristics

• thousands of relevant input variables
  – each contributing a small but significant amount to the final model

• no subset of these variables can adequately describe the desired function

• the relevant variables are confounded by thousands of irrelevant variables


Why is this a difficult domain?

• Very large input space!

• Problem is intrinsically nonparametric
  – Don’t know which inputs are significant
  – Don’t know an optimal model structure

• Problems are in general nonlinear


Constructing Models from Data

• Given N input/output examples of some phenomenon (a regression/classification function) $y = f(\mathbf{x})$:

  $\{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)\}$

• Construct an approximate mapping $\hat{f}$ such that, for some unseen $\mathbf{x}_{new}$:

  $\hat{y} = \hat{f}(\mathbf{x}_{new})$


Polynomial Cascade Algorithm: Conceptual Motivation (IJCAI 97)

• Problem #1: Simultaneous construction of the model is infeasible.
  – Solution: use low dimensional projections (building blocks).
    • simplest approach: 2-dimensional blocks $g_l(u, v)$
• Problem #2: Finding the best low dimensional projection is infeasible.
  – Solution: Don’t find the best; use selection criteria that are independent of dimension.
    • simplest approach: random building block selection.


PC Algorithm: Conceptual Motivation (continued)

• Problem #3: Low dimensional projections tend to be flat (i.e., $g_l(u, v) = C$).
  – Solution: Subdivide the input space.
    • simplest approach: random subdivision (bootstrap samples).


Polynomial Cascade Structure

[Figure: Polynomial Cascade structure. The inputs $x_{k_0}, x_{k_1}, \ldots, x_{k_L}$, taken in a random (repeated) order, feed the two-dimensional blocks $g_1, g_2, \ldots, g_L$, with each block receiving the previous block's output and one new input; the block outputs, weighted by $a_1, a_2, \ldots, a_L$, are summed ($\Sigma$) to give the model output $y$.]
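Read as code, the cascade above might look like the following minimal sketch (the chain wiring, with each block's raw output feeding the next level, and the $a_l$-weighted sum are taken from the figure as drawn; the function name eval_cascade and its arguments are illustrative, not part of the original):

    def eval_cascade(x, blocks, weights, input_order):
        """Evaluate a fitted cascade on one input vector x.

        blocks[l] is a fitted two-dimensional block g_{l+1}(u, v),
        weights[l] is its coefficient a_{l+1}, and input_order holds the
        (possibly repeated) input indices k_0, k_1, ..., k_L chosen at
        training time.
        """
        u = x[input_order[0]]                    # the first block sees two raw inputs
        y_hat = 0.0
        for level, (g, a) in enumerate(zip(blocks, weights), start=1):
            v = x[input_order[level]]            # next input in the random order
            u = g(u, v)                          # each block's output feeds the next level
            y_hat += a * u                       # a_l-weighted sum of the block outputs
        return y_hat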


Main PC Algorithm Characteristics

1. Building blocks are 3rd-order polynomials:

   $g_l(u, v) = a_{00} + a_{01}v + a_{10}u + a_{11}uv + a_{20}u^2 + a_{02}v^2 + a_{21}u^2v + a_{12}uv^2 + a_{30}u^3 + a_{03}v^3$

2. Blocks $g_l(u, v)$ are added one at a time, in order.
3. Inputs $(x_1, \ldots, x_d)$ are used in a random (repeated) order.
4. Each $g_l(u, v)$ is constructed using a bootstrap sample.
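A minimal least-squares sketch of fitting one such block on a bootstrap sample (NumPy assumed; the helper names poly_features and fit_block are illustrative, not from the original):

    import numpy as np

    def poly_features(u, v):
        """Columns matching the terms of g_l(u, v): 1, v, u, uv, u^2, v^2, u^2 v, u v^2, u^3, v^3."""
        return np.column_stack([np.ones_like(u), v, u, u * v,
                                u**2, v**2, u**2 * v, u * v**2,
                                u**3, v**3])

    def fit_block(u, v, y, rng):
        """Fit the coefficients a_ij of one block on a bootstrap sample of (u, v, y)."""
        boot = rng.integers(0, len(y), size=len(y))      # bootstrap sample
        coef, *_ = np.linalg.lstsq(poly_features(u[boot], v[boot]), y[boot], rcond=None)
        return coef                                      # predict with: poly_features(u, v) @ coef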


PC Algorithm:

STEP 1: Initialize algorithm:
  – Divide the learning data into a training set and a validation set
  – Choose a random order of inputs

STEP 2: Construct a new section (multiple levels):
  – Use a bootstrap sample to fit $g_l(u, v)$
  – Set $a_l$ to the normalized inverse MSE of $g_l(u, v)$ on the training set
  – Stop when the error on the validation set stops decreasing


PC Algorithm: (continued)

STEP 3: Prune section: Prune back to the block with the smallest error on the validation set.

STEP 4: Update learning outputs: Replace the outputs with the residual errors of the section's $s$ blocks (where $g_{l,i}$ is the output of block $l$ on example $i$):

  $y_i \leftarrow y_i - \sum_{l=1}^{s} a_l\, g_{l,i}$

STEP 5: Check stopping condition: GOTO STEP 2 if further error reduction is possible, otherwise STOP.
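Putting STEPs 1 through 5 together, a rough and simplified sketch of the whole construction loop (assumptions: each section is grown to a fixed maximum length and then pruned to its best validation error, a fixed section count stands in for the STEP 5 test, and only in-sample predictions are tracked; a full implementation would also store the fitted coefficients and input orders so that new points could be evaluated as in the structure sketch earlier):

    import numpy as np

    def poly_features(u, v):
        # same 3rd-order terms as in the building-block sketch above
        return np.column_stack([np.ones_like(u), v, u, u * v, u**2, v**2,
                                u**2 * v, u * v**2, u**3, v**3])

    def build_section(X, resid, tr, val, rng, max_levels=50):
        """STEPs 2-3: grow one cascade section on the current residuals, then prune."""
        d = X.shape[1]
        order = rng.integers(0, d, size=max_levels + 1)        # random (repeated) input order
        u = X[:, order[0]].astype(float)
        g_out, inv_mse, val_err = [], [], []

        for level in range(1, max_levels + 1):
            v = X[:, order[level]].astype(float)
            boot = rng.choice(tr, size=tr.size, replace=True)  # bootstrap sample
            coef, *_ = np.linalg.lstsq(poly_features(u[boot], v[boot]),
                                       resid[boot], rcond=None)
            g = poly_features(u, v) @ coef                     # block output for all examples
            g_out.append(g)
            inv_mse.append(1.0 / (np.mean((resid[tr] - g[tr]) ** 2) + 1e-12))

            a = np.array(inv_mse) / np.sum(inv_mse)            # normalized inverse MSE weights
            section = (a[:, None] * np.array(g_out)).sum(axis=0)
            val_err.append(np.mean((resid[val] - section[val]) ** 2))
            u = g                                              # cascade: output feeds the next level

        best = int(np.argmin(val_err)) + 1                     # STEP 3: prune back
        a = np.array(inv_mse[:best]) / np.sum(inv_mse[:best])
        return (a[:, None] * np.array(g_out[:best])).sum(axis=0)

    def pc_fit(X, y, n_sections=10, seed=0):
        """STEPs 1, 4, 5 (simplified): fit residuals, section by section."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))                          # STEP 1: train / validation split
        val, tr = idx[:len(y) // 4], idx[len(y) // 4:]
        resid = y.astype(float)
        y_hat = np.zeros(len(y))
        for _ in range(n_sections):                            # fixed count instead of the STEP 5 test
            section = build_section(X, resid, tr, val, rng)
            y_hat += section
            resid = resid - section                            # STEP 4: residual update
        return y_hat

With an N x d data matrix X and targets y, usage would then simply be y_hat = pc_fit(X, y).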


Why does PC work?

• Overfitting is avoided via an appropriate injection of randomness, as in Random Forests (Breiman, 1999):
  – Bootstrap sampling
  – Random order of inputs
• Irrelevant inputs are not excluded from the cascade
  – Treated as noise and averaged out

• No explicit variable selection is used


Why does PC work? (continued)

• Produces stable high dimensional models
  – Projections onto 2-dimensional structures
• Low dimensional projections are unlikely to be flat
  – Bootstrap sampling avoids $g_l(u, v) \equiv C$
  – The PC algorithm effectively deals with parity problems of greater than 2 dimensions
    • e.g., the 10-bit parity problem, where $g_l(u, v) \equiv C$ for all levels without random sampling
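A quick numerical check of the parity claim (a sketch; it uses the first two bits and the same cubic block terms as earlier, with NumPy assumed):

    import itertools
    import numpy as np

    # All 1024 examples of the 10-bit parity problem.
    X = np.array(list(itertools.product([0.0, 1.0], repeat=10)))
    y = X.sum(axis=1) % 2

    def poly_features(u, v):
        return np.column_stack([np.ones_like(u), v, u, u * v, u**2, v**2,
                                u**2 * v, u * v**2, u**3, v**3])

    u, v = X[:, 0], X[:, 1]

    # Fit on the full data set: conditioned on any two bits, parity is 0 or 1
    # equally often, so the least-squares block is the constant 0.5.
    coef, *_ = np.linalg.lstsq(poly_features(u, v), y, rcond=None)
    print(np.ptp(poly_features(u, v) @ coef))      # ~0: a flat block, g(u, v) = C

    # Fit on a bootstrap sample: the cell proportions fluctuate, so the block is
    # no longer flat and the cascade has a signal to build on.
    rng = np.random.default_rng(0)
    boot = rng.integers(0, len(y), size=len(y))
    coef, *_ = np.linalg.lstsq(poly_features(u[boot], v[boot]), y[boot], rcond=None)
    print(np.ptp(poly_features(u, v) @ coef))      # clearly > 0: not flat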


PC Effective on Low Dimensional Problems (surprise?)

• Does as well or better than most algorithms on low dimensional regression problems (IJCAI97)

• Produces competitive models without the need for parameter tuning or kernel selection

• HOWEVER:
  – Models are not sparse!


Theoretical Results

1. PCs are universal approximators
2. Conditions for convergence to zero error:
   • Uncorrelated errors from level to level
   • Similar to bagging and random forests (Breiman)

3. Rate of convergence (to some local error minimum), as a function of the number of learning examples, is independent of the dimension of the input space


Conclusion

• There are many application areas for very high dimension, nonlinear, nonparametric modeling algorithms!

• Cascaded low dimensional polynomials produce effective nonparametric models

• Polynomial Cascades are most effective in problem domains where
  – there are thousands of relevant input variables
    • each contributing a small but significant amount to the final model
  – no subset of these variables can adequately describe the desired function
  – the relevant variables are confounded by thousands of irrelevant variables