
Sequential Aggregation-Disaggregation Optimization Methods for Data Stream Mining

Michael Hahsler¹ and Young Woong Park²

¹Lyle School of Engineering, SMU

²Cox School of Business, SMU

2016 INFORMS Annual Meeting, November 2016

Hahsler & Park (SMU) Sequential AID INFORMS16 1 / 23


Table of Contents

1 Motivation

2 Iterative Aggregation-Disaggregation

3 Sequential Aggregation-Disaggregation

4 Preliminary Experiments


Motivation

Algorithms for many optimization problems scale poorly for large data.

Standard Optimization Algorithm: Data → Algorithm → Opt. solution

Issues:

Data does not fit into memory.

Many iterations over the data.

Algorithms typically have super-linear run-time complexity.


Motivation

When data size is large, solving an optimization problem may be hard/intractable.


Can we optimize with aggregates? Optimality?


Motivation

Iterative aggregation-disaggregation schemes have been shown to be effective for large data (Rogers et al., 1991; Park and Klabjan, 2016).

[Figure: Iterative Aggregation/Disaggregation Framework. The data is aggregated into aggregates that yield a solution; repeated disaggregation steps produce new aggregates and an improved solution until the stop condition gives the final solution.]

Iterative Aggregation-Disaggregation

The algorithms start by aggregating the original data and solving the problem on the aggregated data. In subsequent steps they gradually disaggregate the aggregated data to find a good (potentially optimal) solution.


Motivation

Data Stream

A data stream is a potentially unbounded sequence of observations. Processing streams is now common in many applications: GPS data from smart phones, web click-stream data, telecommunication connection data, readings from sensor nets, stock quotes.

Limited storage combined with the potentially unbounded size of data streams poses the following challenges:

✓ Store only summaries (e.g., clusters).

✗ Real-time processing. Only a single pass over the data is possible.

✗ Concept drift: data distributions change over time.


Motivation

Sequential Aggregation-Disaggregation

We propose a sequential aggregation-disaggregation optimization method where the disaggregation steps cannot be explicitly performed on past data. The method has the following properties:

1 Anticipates disaggregation via partial aggregation.

2 Performs partial aggregation sequentially as new data arrives.

3 Places more weight on newer data.

For data streams:

✓ Stores only summaries (e.g., clusters).

✓ Real-time processing. Only a single pass over the data.

✓ Follows changing distributions.


2 Iterative Aggregation-Disaggregation


IAD: Algorithm

Components of the algorithm (each needs to be tailored to the particular problem):

Definition of the aggregation/clustering procedure.

Disaggregation procedure: How to partition the current clusters?

Stopping/Optimality conditions

AID: algorithmic framework

Initialization: Define clusters and the aggregated data
While stopping/optimality condition is not satisfied
    Solve the problem with the aggregated data
    Check optimality condition / Decluster / Update the aggregated data
End While


AID for LAD Regression

Least absolute deviation (LAD) regression

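Given explanatory data $x \in \mathbb{R}^{n \times m}$ and response data $y \in \mathbb{R}^{n}$, find the minimizer $\beta \in \mathbb{R}^{m}$ of

$$E^{*} = \min_{\beta \in \mathbb{R}^{m}} \sum_{i \in I} \Big| y_i - \sum_{j \in J} x_{ij} \beta_j \Big|$$

where $I = \{1, \ldots, n\}$ indexes the observations and $J = \{1, \ldots, m\}$ indexes the features.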
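The LAD objective can be made concrete with a small sketch. It assumes the two-parameter case (a constant column plus one feature) and finds the exact minimizer by brute force, using the fact that when the x values are not all equal some optimal LAD line passes through two data points. The function names `lad_objective` and `lad_line` are hypothetical, and the enumeration is only an illustration for tiny inputs; it is not the solver used in the talk (LAD is usually solved as a linear program).

```python
import itertools
import numpy as np

def lad_objective(beta, x, y, w=None):
    """Sum of (optionally weighted) absolute residuals for y ~ beta[0] + beta[1] * x."""
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    return float(np.sum(w * np.abs(y - (beta[0] + beta[1] * x))))

def lad_line(x, y, w=None):
    """Exact LAD line found by enumerating all lines through two points with distinct x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_beta, best_val = None, np.inf
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        beta = np.array([y[i] - slope * x[i], slope])
        val = lad_objective(beta, x, y, w)
        if val < best_val:
            best_beta, best_val = beta, val
    return best_beta, best_val

# Four points on the line y = x plus one outlier.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 2.0, 3.0, 20.0])
beta, E_star = lad_line(x, y)
print(beta, E_star)
```

In this toy data set the fitted line stays on the four collinear points, and the whole objective value comes from the single outlier.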

LAD illustration


Aggregated data: Average vector for each cluster


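Aggregated problem: Minimize $F^{t} = 6e_1^{t} + 8e_2^{t} + 5e_3^{t} + 5e_4^{t} + 9e_5^{t}$

[Figure: five clusters labeled 5, 5, 6, 8, and 9, with aggregated errors $e_1^{t}, \ldots, e_5^{t}$ measured against the fitted line $\beta^{t}$]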
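A minimal sketch of the aggregation step, assuming the coefficients 6, 8, 5, 5, 9 in $F^{t}$ are the cluster sizes and that each cluster is represented by its average point. The helper names, the random data, and the fixed `beta` are illustrative assumptions. The sketch also checks a property that follows from the triangle inequality: for any fixed $\beta$, the aggregated objective is never larger than the objective on the original points.

```python
import numpy as np

def aggregate(x, y, labels):
    """Cluster means of x and y plus cluster sizes (used as weights in F^t)."""
    ids = np.unique(labels)
    x_bar = np.array([x[labels == k].mean() for k in ids])
    y_bar = np.array([y[labels == k].mean() for k in ids])
    sizes = np.array([(labels == k).sum() for k in ids])
    return x_bar, y_bar, sizes

def aggregated_objective(beta, x_bar, y_bar, sizes):
    """F: size-weighted absolute errors of the cluster means."""
    return float(np.sum(sizes * np.abs(y_bar - (beta[0] + beta[1] * x_bar))))

def original_objective(beta, x, y):
    """E: absolute errors summed over all original points."""
    return float(np.sum(np.abs(y - (beta[0] + beta[1] * x))))

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=33)
y = 5 + 3 * x + rng.normal(0, 5, size=33)
labels = np.repeat(np.arange(5), [6, 8, 5, 5, 9])  # cluster sizes as on the slide

x_bar, y_bar, sizes = aggregate(x, y, labels)
beta = np.array([5.0, 3.0])  # any candidate coefficient vector
F = aggregated_objective(beta, x_bar, y_bar, sizes)
E = original_objective(beta, x, y)
print(sizes, F, E)
```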


Solution to the original problem: $E^{t} = \sum_{i=1}^{n} e_i$, where $e_i = |\beta^{t} x_i - y_i|$


Optimality condition: Are all observations in a cluster on the same side of the current regression line? (Park and Klabjan, 2016)
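The same-side check can be sketched as follows for the intercept-plus-slope case. The function names and the tolerance `tol`, which treats points lying on the line as belonging to either side, are assumptions made for illustration.

```python
import numpy as np

def same_side(residuals, tol=1e-9):
    """True if no two residuals in the cluster have strictly opposite signs."""
    return not (np.any(residuals > tol) and np.any(residuals < -tol))

def violating_clusters(beta, x, y, labels, tol=1e-9):
    """Ids of clusters whose points lie on both sides of the line beta."""
    r = y - (beta[0] + beta[1] * x)
    return [int(k) for k in np.unique(labels) if not same_side(r[labels == k], tol)]

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 1.0, 4.0])
labels = np.array([0, 0, 1, 1])

# Line y = x: cluster 1 has one point below and one point above the line.
print(violating_clusters(np.array([0.0, 1.0]), x, y, labels))
# Line y = x - 1: every point is on or above the line, so no cluster violates.
print(violating_clusters(np.array([-1.0, 1.0]), x, y, labels))
```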


AID for LAD Regression: Illustration

While optimality condition is not satisfied
    Solve the problem with the aggregated data
    Check optimality criteria and decluster
End While

Illustrated steps:

1 Solve with the aggregated data ($\beta^{t}$)

2 Check optimality criteria

3 Decluster

4 Create new aggregated data

5 Solve with the aggregated data ($\beta^{t+1}$)

6 Check optimality criteria (optimal)
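The loop above can be sketched end to end for a line with intercept and slope. Several choices here are assumptions rather than the authors' implementation: the initial clusters are contiguous ranges of sorted x, a violating cluster is split by residual sign, and the aggregated problem is solved by brute-force enumeration (`weighted_lad_line`, a hypothetical helper) instead of a linear program. The loop stops when the same-side condition holds for every cluster.

```python
import itertools
import numpy as np

def weighted_lad_line(x, y, w):
    """Exact weighted LAD line via enumeration of lines through two points."""
    best_beta, best_val = None, np.inf
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        beta = np.array([y[i] - slope * x[i], slope])
        val = np.sum(w * np.abs(y - (beta[0] + beta[1] * x)))
        if val < best_val:
            best_beta, best_val = beta, val
    return best_beta

def aid_lad(x, y, n_init=4, tol=1e-9):
    """Aggregate, solve, check, and decluster until all clusters are one-sided."""
    clusters = list(np.array_split(np.argsort(x), n_init))  # index arrays
    while True:
        # Create the aggregated data: cluster means weighted by cluster size.
        x_bar = np.array([x[c].mean() for c in clusters])
        y_bar = np.array([y[c].mean() for c in clusters])
        sizes = np.array([len(c) for c in clusters], dtype=float)
        beta = weighted_lad_line(x_bar, y_bar, sizes)
        # Check the same-side condition and split clusters that violate it.
        r = y - (beta[0] + beta[1] * x)
        new_clusters, split = [], False
        for c in clusters:
            if np.any(r[c] > tol) and np.any(r[c] < -tol):
                new_clusters += [c[r[c] > 0], c[r[c] <= 0]]
                split = True
            else:
                new_clusters.append(c)
        if not split:
            return beta, len(clusters)
        clusters = new_clusters

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 40)
y = 5 + 3 * x + rng.normal(0, 5, 40)
beta, n_clusters = aid_lad(x, y)
print(beta, n_clusters)
```

Each split strictly increases the number of clusters, so the loop ends after at most n − 1 splits. On termination the aggregated objective equals the original objective at the returned `beta`, which is why the result matches a direct LAD fit on all points.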


3 Sequential Aggregation-Disaggregation


Motivation

[Figure: Iterative Aggregation/Disaggregation Framework. The data is aggregated into aggregates that yield a solution; repeated disaggregation steps produce new aggregates and an improved solution until the stop condition gives the final solution.]

IAD is a powerful framework, but it needs repeated access to some data to perform disaggregation. This is not possible for data streams.


Batch Processing

Why not just do batch processing?

[Figure: Batch Processing Framework. The data is split into Batch 1 through Batch 4, and each batch produces its own solution.]

Batch needs to be large enough to find a good solution.

Information is not preserved over batches.

Aggregating several solutions (e.g., by parameter averaging) does not optimize the overall objective function.
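A toy example of the last point, using the simplest LAD model (a constant only), whose minimizer is the median. The two batches are made-up numbers chosen for illustration.

```python
from statistics import median

def lad_cost(c, values):
    """Sum of absolute deviations of the values from the constant c."""
    return sum(abs(v - c) for v in values)

batch_1 = [0, 0, 10]
batch_2 = [10, 10, 10]
all_data = batch_1 + batch_2

# Parameter averaging: solve each batch separately, then average the solutions.
averaged = (median(batch_1) + median(batch_2)) / 2
# Solving the overall problem on all data at once.
overall = median(all_data)

print(averaged, lad_cost(averaged, all_data))
print(overall, lad_cost(overall, all_data))
```

The averaged solution has a strictly larger overall cost than the solution computed on all data, even though each batch solution is optimal for its own batch.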


AID for Streams

[Figure: Partial Aggregation Framework for Streams. Batch 1 through Batch 4 each pass through a partial aggregation step that produces aggregates and a solution.]

Partial aggregation: Use a data stream clustering algorithm.

Example for LAD: Do not aggregate

1 points from different sides of the current regression line, and

2 points close to the current regression line.

Decay in data stream clustering will remove aggregation mistakes over time and allow the model to adapt to changes in the data.
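One way to picture the two rules and the decay is the hypothetical sketch below. It is not a specific data stream clustering algorithm and not the authors' implementation: the class `PartialAggregator` and its parameters `margin`, `radius`, `decay`, and `min_weight` are invented for illustration. Points near the line are kept as individual support points, points far from the line are merged only into a nearby aggregate on the same side, and all weights fade with every batch.

```python
class PartialAggregator:
    """Hypothetical sketch: weighted support points [x_mean, y_mean, weight]."""

    def __init__(self, margin=1.0, radius=1.0, decay=0.9, min_weight=0.1):
        self.margin, self.radius = margin, radius
        self.decay, self.min_weight = decay, min_weight
        self.points = []

    def add_batch(self, x, y, beta):
        # Decay: older aggregates lose weight and are eventually dropped.
        self.points = [[px, py, w * self.decay] for px, py, w in self.points
                       if w * self.decay >= self.min_weight]
        for xi, yi in zip(x, y):
            r = yi - (beta[0] + beta[1] * xi)
            target = None
            if abs(r) > self.margin:  # rule 2: points close to the line stay separate
                for p in self.points:
                    pr = p[1] - (beta[0] + beta[1] * p[0])
                    same_side = pr * r > 0 and abs(pr) > self.margin  # rule 1
                    if same_side and abs(p[0] - xi) <= self.radius:
                        target = p
                        break
            if target is None:
                self.points.append([xi, yi, 1.0])
            else:  # merge the new point into the aggregate as a weighted mean
                w = target[2]
                target[0] = (w * target[0] + xi) / (w + 1.0)
                target[1] = (w * target[1] + yi) / (w + 1.0)
                target[2] = w + 1.0

agg = PartialAggregator()
beta = (0.0, 1.0)  # current regression line y = x
agg.add_batch([0.0, 0.5, 0.2, 5.0], [3.0, 3.5, -3.0, 5.1], beta)
print(agg.points)
```

In a full pipeline, a weighted LAD solve on `agg.points` would produce the next `beta` after each batch; that step is omitted here.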


4 Preliminary Experiments


Simple Data

1 million random data points with $x \in [0, 10]$ following

$$y = 5 + 3x + \varepsilon$$

with $\varepsilon \sim N(0, 5)$.
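A sketch of how such a data set could be generated. Two details are not stated on the slide and are assumed here: x is drawn uniformly from [0, 10], and the 5 in N(0, 5) is read as the standard deviation. The function name `make_simple_data` is hypothetical.

```python
import numpy as np

def make_simple_data(n=1_000_000, seed=0):
    """Generate (x, y) with y = 5 + 3x + noise."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, size=n)   # assumption: x is uniform on [0, 10]
    eps = rng.normal(0.0, 5.0, size=n)   # assumption: 5 is the standard deviation
    return x, 5.0 + 3.0 * x + eps

# A smaller sample is enough for a quick look.
x, y = make_simple_data(n=10_000)
print(x[:3], y[:3])
```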

[Figure: scatter plot of the simple data, x from 0 to 10 and y from about −10 to 40]


Simple Data Set

n = 1 million points

batch size b = 500

support points s = 200

[Figure: two panels plotted against points in 1000s (0 to 1000). First panel: used points for all, batch, and stream. Second panel: time [s] for all, batch, and stream.]

[Figure: simple data set, optimality gap [%] against points in 1000s (0 to 1000) for batch and stream]

Difficult Data Set

1 million random data points, 10 dimensions

True $\beta_i$, $i = 1, 2, \ldots, 10$, is randomly chosen from $\{-5, 5\}$. $x_i \sim N(\mu_i, \sigma_i)$ is a randomly generated feature, where $\mu_i$ is uniformly chosen from $[-5, 5]$ and $\sigma_i$ is chosen from $[1, 3]$.

$$y = \sum_{i=1}^{10} \beta_i x_i + \varepsilon$$

with $\varepsilon \sim N(0, 0.2)$.
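A sketch of a generator for this data set. The slide leaves some details open, so the following are assumptions: $\{-5, 5\}$ is read literally as the two values −5 and 5, $\sigma_i$ is drawn uniformly from [1, 3], and the second argument of each normal distribution is the standard deviation. The function name `make_difficult_data` is hypothetical.

```python
import numpy as np

def make_difficult_data(n=1_000_000, m=10, seed=0):
    """Generate (X, y, beta) with y = X @ beta + noise."""
    rng = np.random.default_rng(seed)
    beta = rng.choice([-5.0, 5.0], size=m)       # read literally: each beta_i is -5 or 5
    mu = rng.uniform(-5.0, 5.0, size=m)
    sigma = rng.uniform(1.0, 3.0, size=m)        # assumption: uniform on [1, 3]
    X = rng.normal(mu, sigma, size=(n, m))       # column i ~ N(mu_i, sigma_i)
    y = X @ beta + rng.normal(0.0, 0.2, size=n)  # assumption: 0.2 is the std. dev.
    return X, y, beta

# A smaller sample is enough for a quick look.
X, y, beta = make_difficult_data(n=5_000)
print(X.shape, y.shape, beta)
```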


Difficult Data Set

n = 1 million points

batch size b = 500

support points s = 200

[Figure: two panels plotted against points in 1000s (0 to 200). First panel: used points for all, batch, and stream. Second panel: time [s] for all, batch, and stream.]

[Figure: difficult data set, optimality gap [%] against points in 1000s (0 to 200) for batch and stream]

Conclusion and Future Work

Advantages:

Partial aggregation anticipates future disaggregation needs.

Partial aggregation is appropriate for data streams and leverages research from data stream clustering.

Partial aggregation can help to improve quality over simple batch processing.

Future Work:

Test different strategies to select which points should not be aggregated.

Perform a comprehensive study.

Apply the idea to other optimization problems (SVM, etc.).
