1
University of Southern California Center for Energy Informatics USC Center for Energy Informatics cei.usc.edu University of Southern California Center for Energy Informatics Machine Learning Approach An informatics approach for forecasting electricity demand Use direct and indirect indicators of power usage Use scalable Machine Learning for data intensive workloads Use Hadoop Map-Reduce on Clouds platforms for performance, scalability and reliability Experiment on the USC campus microgrid testbed Saima Aman, Wei Yin, Yogesh Simmhan, and Viktor Prasanna University of Southern California, Los Angeles, CA 300 350 400 450 500 550 600 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Energy Use (KWh) Thousands Day of the Year Observed Daily Energy Use for 2008 and 2009 used for Training the Model 2008 Energy Use (Observed) 2009 Energy Use (Observed) 300 350 400 450 500 550 600 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Energy Use (KWh) Thousands Day of the Year Daily Energy Use for 2010 - Observed and Predicted 2010 Energy Use (Observed) 2010 Energy Use (Predicted) Regression Tree Machine Learnt Models Use CV-RMSE to compare model accuracy – root mean square error between observed and predicted normalized to the mean of observed Ongoing Work Dynamic feature selection Moving windows of predictions Data feeds from information repository Effect of data granularity on prediction Multiple models for buildings/consumers From Medium term » Short term Models Features used in models were evaluated for their impact on prediction Scalable Forecasting AMI’s collect TB of data from millions of customers @ 15min Several forecasting models New data arrives constantly Models operate on large data sets Performance Limitations of Centralized Machine Learning 0 1 2 3 4 5 6 7 8 9 0 10000 20000 30000 40000 50000 60000 Memory Usage (GB) Building × Year Memory Usage(GB) 0 5000 10000 15000 20000 25000 30000 35000 40000 0 10000 20000 30000 40000 50000 60000 70000 Running Time (sec) Building × Year Running Time(sec) For 15-min interval data, we need 32 days & 700GB memory to train the model Solution Parallel processing on Cluster/ Cloud using Hadoop Map-Reduce Benefits Divide and conquer: Scalable as the training data size increases High Performance, Lower time for data intensive modeling Reliable operation for continuous execution PLANET Scalable Learning Algorithm Master worker worker worker worker worker worker Control- ler MR_Initia lization MR_Expan dNode MR_InMem oryGrow Model File Real energy consumption Predict energy consumption Ongoing work Migrate prototype to Eucalyptus Automatic feature selection Software infrastructure for launching ensemble models Google’s Regression Tree Algorithm Combine MR+Weka Master Latest model structure Scan and Analyze data Model Used CV-RMSE Annual Mean 11.32% Day of Week Mean 14.39% Day of Year Mean 12.62% Regression Tree 7.45%% Campus-level Daily Model Model Used CV-RMSE Annual 15-min Mean 17.37% Time of Week Mean 16.00% Time of Year Mean 15.07% Regression Tree 13.70% Campus-level 15-MinModel Baseline Models Baseline Models Campus & Building- level features Building-level features Weekday Semester Holiday Max temp Avg. temp Gross Building Area Net Area in Use Year of construction Type of building Demand Forecasting using Machine Learning Improving Energy Use Forecast for Campus Micro-grids using Indirect Indicators, Saima Aman, Yogesh Simmhan and Viktor K. Prasanna, DDDM, 2011 Acknowledgement This material is based upon work supported by the Department of Energy under Award Number DE-OE0000192. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Saima Aman, Wei Yin, Yogesh Simmhan, and Viktor Prasannasaimacs.github.io/pubs/2011-socalsgs-poster.pdf · Yogesh Simmhan and Viktor K. Prasanna, DDDM, 2011 Acknowledgement This material

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Saima Aman, Wei Yin, Yogesh Simmhan, and Viktor Prasannasaimacs.github.io/pubs/2011-socalsgs-poster.pdf · Yogesh Simmhan and Viktor K. Prasanna, DDDM, 2011 Acknowledgement This material

University of Southern California

Center for Energy Informatics

USC Center for Energy Informatics cei.usc.edu

University of Southern California

Center for Energy Informatics

Machine Learning Approach • An informatics approach for forecasting electricity demand • Use direct and indirect indicators of power usage • Use scalable Machine Learning for data intensive workloads • Use Hadoop Map-Reduce on Clouds platforms for performance,

scalability and reliability • Experiment on the USC campus microgrid testbed

Saima Aman, Wei Yin, Yogesh Simmhan, and Viktor Prasanna University of Southern California, Los Angeles, CA

300

350

400

450

500

550

600

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov

Ener

gy U

se (K

Wh)

Thou

sand

s

Day of the Year

Observed Daily Energy Use for 2008 and 2009 used for Training the Model

2008 Energy Use (Observed)

2009 Energy Use (Observed)

300

350

400

450

500

550

600

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov

Ener

gy U

se (K

Wh)

Thou

sand

s

Day of the Year

Daily Energy Use for 2010 - Observed and Predicted

2010 Energy Use (Observed)

2010 Energy Use (Predicted)

Regression Tree Machine Learnt Models • Use CV-RMSE to compare model accuracy – root mean square error

between observed and predicted normalized to the mean of observed

Ongoing Work • Dynamic feature selection • Moving windows of predictions • Data feeds from information repository • Effect of data granularity on prediction • Multiple models for buildings/consumers • From Medium term » Short term Models

Features used in models were evaluated for their impact on prediction

Scalable Forecasting • AMI’s collect TB of data from

millions of customers @ 15min • Several forecasting models • New data arrives constantly • Models operate on large data

sets

Performance Limitations of Centralized Machine Learning

0

1

2

3

4

5

6

7

8

9

0 10000 20000 30000 40000 50000 60000

Mem

ory

Usa

ge

(GB

)

Building × Year

Memory Usage(GB)

0

5000

10000

15000

20000

25000

30000

35000

40000

0 10000 20000 30000 40000 50000 60000 70000

Runnin

g T

ime

(sec

)

Building × Year

Running Time(sec)

For 15-min interval data, we need 32 days & 700GB memory to train the model

Solution • Parallel processing on Cluster/

Cloud using Hadoop Map-Reduce Benefits • Divide and conquer: Scalable as

the training data size increases • High Performance, Lower time for

data intensive modeling • Reliable operation for continuous

execution

PLANET Scalable Learning Algorithm

Master

worker

worker

worker

worker

worker

worker

Control-ler

MR_Initialization

MR_ExpandNode

MR_InMemoryGrow

Model File

Real energy consumption

Predict energy consumption

Ongoing work • Migrate prototype to Eucalyptus • Automatic feature selection • Software infrastructure for

launching ensemble models

• Google’s Regression Tree Algorithm

• Combine MR+Weka

Master

Latest model structure

Scan and Analyze data

Model Used CV-RMSE Annual Mean 11.32%

Day of Week Mean 14.39% Day of Year Mean 12.62% Regression Tree 7.45%%

Campus-level Daily Model

Model Used CV-RMSE Annual 15-min Mean 17.37% Time of Week Mean 16.00% Time of Year Mean 15.07%

Regression Tree 13.70%

Campus-level 15-MinModel

Baseline Models

Baseline Models

Campus & Building-level features

Building-level features

Weekday Semester Holiday

Max temp Avg. temp

Gross Building Area Net Area in Use

Year of construction Type of building

Demand Forecasting using Machine Learning†

†Improving Energy Use Forecast for Campus Micro-grids using Indirect Indicators, Saima Aman, Yogesh Simmhan and Viktor K. Prasanna, DDDM, 2011

Acknowledgement This material is based upon work supported by the Department of Energy under Award Number DE-OE0000192. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.