Learning, selecting, and control in residential demand response for grid reliability

Yingying Li, Qinran Hu, Na Li, Alison Su, Jun Shinmada

08/04/2018




DR Capacity for 2015/2016 in PJM

Source: PJM Interconnection, “Demand response strategy,” Tech. Rep., 2017.

Residential demand constitutes the largest share, yet it is underutilized in demand response (DR).



Source: U.S. Energy Information Administration (EIA), “Annual Electric Power Industry Report”.



Commercial and Other





Previous pilots

▪ Location: Arizona, California, Colorado, Hawaii, Massachusetts, Michigan, New Mexico, New York, Texas, Utah, Virginia, etc.

▪ Device: Non-intrusive, Switch, ThinkEco, Ecobee, NEST, etc.

▪ Reward: Time of use, real-time pricing, coupon, discounted bill, gift card, check, raffle, other recognition, etc.

▪ Control: Direct load control, voluntary, etc.


▪ Monetary incentive is low

▪ Users quit if pushed too hard

▪ User uncertainty is high and unknown

▪ Users prefer simple DR



This Talk: Learning for Residential DR

1. A pilot study: Learning the customers (Data → Model → Decision)
2. Real-time learning/decision making (Data → Model → Decision)

▪ Learn user behavior
▪ Select the “right” users
▪ Set the “right” control actions
▪ Reliable aggregated DR (e.g., target vs. individual reduction)


The pilot study by ThinkEco, Inc

▪ Tech: SmartAC kits for window ACs, app/web control

▪ Data Resolution: 1 minute

▪ Time: Jun-Sep 2015 to now

▪ DR: multiple events in the afternoon/evening throughout the summer

Two control types: (i) reset the temperature target, (ii) adjust the cycling rate

*Opt-out option available

▪ Incentives: a simple example is $5 for setup and $15 for keeping the device online

Other types exist as well


Data of the pilot

▪ For each AC unit (minute level)

▪ For each DR event (four events)

[Figure: time series of target temperature, room temperature, and energy consumption, with the DR event marked]


Targets of the pilot

Can we improve DR performance by analyzing data?

Data → Model → Decision

[Figure: DR events. The green curve is a baseline using the full summer’s dataset (KEMA).]


What can we learn?

[Figure: room temperature over time, past and now]

✓ AC operation pattern
✓ House thermal model
✓ Temperature preference
✓ Opt-out rate
✓ User type (classification)






AC operation pattern follows Bernoulli distribution

[Figure panels: AC#1’s power consumption (fan and compressor levels); Bernoulli distribution; histogram of aggregated power consumption of ACs]
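To illustrate how on/off (Bernoulli) behavior of individual ACs shapes the aggregated histogram, here is a minimal simulation sketch. The on-probabilities, the 0.5 kW compressor draw, and the function name `simulate_aggregate` are hypothetical values chosen for illustration, not numbers from the pilot, and the fan load is ignored for simplicity.

```python
import random
from collections import Counter

def simulate_aggregate(on_probs, compressor_kw, n_samples, seed=0):
    """Sample the total power of independent on/off (Bernoulli) ACs."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_samples):
        # Each compressor is on with its own probability, otherwise off.
        n_on = sum(1 for p in on_probs if rng.random() < p)
        totals.append(n_on * compressor_kw)
    return totals

# Hypothetical fleet: 50 ACs with assumed on-probabilities.
on_probs = [0.3, 0.5, 0.7, 0.4, 0.6] * 10
compressor_kw = 0.5
totals = simulate_aggregate(on_probs, compressor_kw, n_samples=5000)

# Mean and variance implied by independent Bernoulli variables.
mean_theory = compressor_kw * sum(on_probs)
var_theory = compressor_kw ** 2 * sum(p * (1 - p) for p in on_probs)

mean_emp = sum(totals) / len(totals)
var_emp = sum((x - mean_emp) ** 2 for x in totals) / len(totals)
print(f"mean: theory {mean_theory:.2f} kW, simulated {mean_emp:.2f} kW")
print(f"variance: theory {var_theory:.3f}, simulated {var_emp:.3f}")

# Text histogram of the aggregated power.
hist = Counter(totals)
for level in sorted(hist):
    print(f"{level:5.1f} kW | {'#' * (hist[level] // 25)}")
```

The aggregate concentrates around the sum of the individual means, and its spread is driven by the terms p(1 - p), which is why customers with probabilities near 0 or 1 contribute less uncertainty.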


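Learn house thermal model with regression

$$\text{room temperature change} = \underbrace{a\,(\theta_A - \theta_{room})}_{\text{heat loss rate}} + \underbrace{u_k Q}_{\text{cooling effect}} + w_k$$

▪ Apply linear regression

Use thermal model to design better AC control

For example, smooth response

[Figure: +7F offset]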
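A minimal sketch of the regression step, on synthetic data. It assumes the left-hand side of the thermal model is the one-step change in room temperature, that `theta_A` is the ambient temperature, and that `u_k` is a 0/1 compressor state; these readings, the function name `fit_thermal_model`, and all numeric values are illustrative assumptions rather than the pilot's actual setup.

```python
import math
import random

def fit_thermal_model(theta_room, theta_amb, u):
    """Least-squares estimate of (a, Q) in the assumed model
    theta_room[k+1] - theta_room[k] = a*(theta_amb[k] - theta_room[k]) + Q*u[k] + noise."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for k in range(len(theta_room) - 1):
        x1 = theta_amb[k] - theta_room[k]        # heat exchange with ambient
        x2 = u[k]                                # AC on (1) or off (0)
        y = theta_room[k + 1] - theta_room[k]    # observed temperature change
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y
        b2 += x2 * y
    det = s11 * s22 - s12 * s12                  # 2x2 normal equations
    a = (b1 * s22 - s12 * b2) / det
    q = (s11 * b2 - s12 * b1) / det
    return a, q

# Synthetic minute-level data from a simple thermostat with hysteresis.
rng = random.Random(1)
a_true, q_true = 0.05, -1.5                      # hypothetical parameters
theta_room, theta_amb, u = [80.0], [], []
ac_on = True
for k in range(600):
    amb = 90.0 + 3.0 * math.sin(k / 50.0)
    if theta_room[-1] > 76.0:
        ac_on = True
    elif theta_room[-1] < 74.0:
        ac_on = False
    uk = 1.0 if ac_on else 0.0
    change = a_true * (amb - theta_room[-1]) + q_true * uk + rng.gauss(0.0, 0.05)
    theta_amb.append(amb)
    u.append(uk)
    theta_room.append(theta_room[-1] + change)

a_hat, q_hat = fit_thermal_model(theta_room, theta_amb, u)
print(f"a: true {a_true}, estimated {a_hat:.4f}")
print(f"Q: true {q_true}, estimated {q_hat:.4f}")
```

Once `a` and `Q` are estimated per household, the same model can be rolled forward to predict how the room temperature responds to a proposed setpoint offset before it is applied.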


Predicting Opt Out

Data → Model → Decision

▪ Raw Data → Inputs:

- Users’ occupancy rate

- Decision in previous DR events

- Temperature preference

- Ambient temperature

* Recent day and same weekday effect

▪ Output:

- Opt-out probability

ANN Prediction Model: ~85% detection rate for opt-out
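The slides do not preserve the network architecture, so the following is only a generic sketch of how an ANN could map the listed inputs to an opt-out probability: one tanh hidden layer and a sigmoid output, trained with full-batch gradient descent. The feature encoding, the synthetic labels, and every name and hyperparameter here are assumptions for illustration, and the result says nothing about the reported 85% detection rate.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    W1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return h, sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)

def log_loss(data, params):
    total = 0.0
    for x, y in data:
        p = min(max(forward(x, params)[1], 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def init_params(n_in, hidden, rng):
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    return W1, [0.0] * hidden, w2, 0.0

def train(data, params, epochs=150, lr=0.5):
    W1, b1, w2, b2 = params
    n, hidden, n_in = len(data), len(w2), len(W1[0])
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(hidden)]
        gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in data:
            h, p = forward(x, (W1, b1, w2, b2))
            d_out = p - y                                  # gradient at the output logit
            gb2 += d_out
            for j in range(hidden):
                gw2[j] += d_out * h[j]
                d_h = d_out * w2[j] * (1 - h[j] ** 2)      # back through tanh
                gb1[j] += d_h
                for i in range(n_in):
                    gW1[j][i] += d_h * x[i]
        for j in range(hidden):                            # averaged gradient step
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i] / n
        b2 -= lr * gb2 / n
    return W1, b1, w2, b2

# Synthetic events: [occupancy, opted out last time, temperature preference, ambient],
# all scaled to [0, 1]; the label rule below is invented for the demo.
rng = random.Random(0)
data = []
for _ in range(200):
    occ, prev = rng.random(), float(rng.random() < 0.3)
    pref, amb = rng.random(), rng.random()
    p_opt = sigmoid(-3.0 + 2.0 * occ + 2.5 * prev - 1.5 * pref + 2.0 * amb)
    data.append(([occ, prev, pref, amb], 1.0 if rng.random() < p_opt else 0.0))

params = init_params(4, 4, rng)
loss_before = log_loss(data, params)   # computed first: train() updates the lists in place
params = train(data, params)
loss_after = log_loss(data, params)
probs = [forward(x, params)[1] for x, _ in data]
print(f"log loss before {loss_before:.3f}, after {loss_after:.3f}")
```

The predicted probabilities could then feed the selection step, for example by avoiding users who are likely to opt out of a given event.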


This Talk: Learning for Residential DR

1. A pilot study: Learning the customers (Data → Model → Decision)
2. Real-time learning/decision making (Data → Model → Decision)

Multi-armed bandit learning algorithm in aggregating demands


Introduction to multi-armed bandit (MAB) problem

MAB is about Exploration vs Exploitation

▪ Ex. Slot machine: Select the top K arms to maximize the expected reward

▪ Demand Response: Select a number of customers to maximize reliability (minimize variance)


A MAB model for reliability: Nonmonotone objective function

Ramping Support Ancillary Service


A MAB model for reliability in DR (simplified)

➢ A set of customers S

➢ Each customer i reduces one unit of load with probability p_i (Bernoulli distribution)

➢ A target total reduction D_t at time step t

➢ Objective: choose a subset S_t of customers to minimize reliability cost

➢ Performance analysis:

Regret = online reliability cost − offline optimal reliability cost
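One way to write this model down, as a sketch: the slides do not preserve the cost formula, so the expected squared deviation from the target is assumed here as the reliability cost, customers are assumed to respond independently, and the symbols $X_{i,t}$ and $L_t$ are notation introduced for this illustration.

```latex
% Response of customer i at event t (1 = reduces one unit of load)
X_{i,t} \sim \mathrm{Bernoulli}(p_i), \quad \text{independent across } i

% Assumed reliability cost of selecting subset S: squared bias plus variance
L_t(S) = \mathbb{E}\Big[\Big(\sum_{i \in S} X_{i,t} - D_t\Big)^{2}\Big]
       = \Big(\sum_{i \in S} p_i - D_t\Big)^{2} + \sum_{i \in S} p_i (1 - p_i)

% Regret after T events: online cost minus offline optimal cost
\mathrm{Regret}(T) = \mathbb{E}\Big[\sum_{t=1}^{T} L_t(S_t)\Big] - \sum_{t=1}^{T} \min_{S} L_t(S)
```

Under this cost, adding a customer always adds variance $p_i(1-p_i)$ but may reduce the bias term, so the objective is not monotone in the selected set.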




[Figure panels: 100 users, D = 35; 100 users, D_t ∈ [10, 30]]

Yingying Li, Qinran Hu, Na Li, "Learning and Selecting the Right Customers for Reliability: A Multi-armed Bandit Approach", IEEE Conference on Decision and Control (CDC), 2018

Theorem: Our DR algorithm CUCB-Avg achieves O(log T) regret, where T is the total number of DR events.


This Talk: Learning for Residential DR

1. A pilot study: Learning the customers (Data → Model → Decision)
2. Real-time learning: Multi-armed Bandit (Data → Model → Decision)

Ongoing work with ThinkEco:

▪ Residential DR field study in New York City with 40K+ AC devices this summer

▪ Human-machine interaction; Engineering-Learning integration

DR pilots produce a huge amount of valuable data.

The examples show that learning techniques are helpful for DR.



Offline optimal selection algorithm

If we know p_i for all i:

[Figure: customers 1 to 4 with reduction probabilities P1, P2, P3, P4]

Theorem (Li, Hu, Li, 2018): Algorithm 1 produces an optimal output for the offline optimization problem.
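The steps of Algorithm 1 are not preserved in the slides. The sketch below shows one selection rule consistent with the setting, assuming the reliability cost is the expected squared deviation from the target with independent responses: rank customers by p_i and keep adding them while the running sum of probabilities is at most D - 1/2 (under that cost, adding a customer with probability p to a set with sum s changes the cost by 2p(s - D + 1/2)). The function names are hypothetical, and a brute-force search is included only to check the small example.

```python
from itertools import combinations

def reliability_cost(probs, target):
    """Assumed cost: (expected total - target)^2 + variance of the total."""
    mean = sum(probs)
    var = sum(p * (1 - p) for p in probs)
    return (mean - target) ** 2 + var

def offline_select(p, target):
    """Rank by probability, then add customers until the sum exceeds target - 1/2."""
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        if total > target - 0.5:
            break
        chosen.append(i)
        total += p[i]
    return chosen

def brute_force_cost(p, target):
    """Exhaustive minimum over all subsets (only feasible for tiny instances)."""
    best = reliability_cost([], target)
    for r in range(1, len(p) + 1):
        for subset in combinations(p, r):
            best = min(best, reliability_cost(subset, target))
    return best

p = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]   # hypothetical response probabilities
target = 2
chosen = offline_select(p, target)
greedy_cost = reliability_cost([p[i] for i in chosen], target)
best_cost = brute_force_cost(p, target)
print(chosen)                         # indices of the selected customers: [0, 1]
print(round(greedy_cost, 4), round(best_cost, 4))
```

In this example the rule stops after two customers even though their expected total (1.7) is below the target of 2, because any further customer would add more variance than it removes in bias.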



Online Algorithm

Define: sample average of arm i = (total observed reduction of arm i) / (number of events when arm i is selected)

Greedy Algorithm: Use the sample average to run the offline optimal algorithm.

➢ Too much exploitation

➢ Not enough exploration


Online Algorithm: UCB: Upper Confidence Bound (Auer et al. 2002)

Define: Upper Confidence Bound = sample average (exploitation) + confidence bonus (exploration)

UCB Algorithm: Use the UCB to run the offline optimal algorithm.

➢ Popular algorithm for K-arm maximization MAB with O(log T) regret

➢ Performs poorly in our reliability problem:

➢ Tends to select fewer arms → larger variance and less exploration


The Proposed Online Algorithm: UCB-Average (Li, Hu, Li, 2018)

Define: Upper Confidence Bound = sample average (exploitation) + confidence bonus (exploration)

UCB-Average Algorithm:

➢ Use the UCB to rank the arms

➢ Use the sample average to determine the number K of selected arms
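A small simulation sketch of the two steps above: rank by UCB, then size the selection with sample averages. Several details are assumptions because the slide formulas are not preserved: the confidence bonus sqrt(alpha * ln t / (2 * n_i)) capped so the UCB stays at most 1, selecting every arm in the first event, the stopping threshold D - 1/2, and the squared-deviation cost used to report regret. All names and parameter values are hypothetical, and this is not the authors' implementation.

```python
import math
import random

def expected_cost(probs, target):
    # Assumed cost: squared bias plus variance of the total reduction.
    return (sum(probs) - target) ** 2 + sum(p * (1 - p) for p in probs)

def pick_prefix(order, estimates, target):
    # Walk down the ranking; stop once the estimated total exceeds target - 1/2.
    selected, est_total = [], 0.0
    for i in order:
        if est_total > target - 0.5:
            break
        selected.append(i)
        est_total += estimates[i]
    return selected

def ucb_average(p_true, target, n_events, alpha=2.0, seed=0):
    rng = random.Random(seed)
    n = len(p_true)
    counts = [0] * n      # events in which each arm was selected
    means = [0.0] * n     # sample average of each arm's observed reductions
    history = []
    for t in range(1, n_events + 1):
        if t == 1:
            selected = list(range(n))   # initial step: observe every arm once
        else:
            ucb = [min(1.0, means[i] + math.sqrt(alpha * math.log(t) / (2 * counts[i])))
                   for i in range(n)]
            order = sorted(range(n), key=lambda i: ucb[i], reverse=True)  # rank by UCB
            selected = pick_prefix(order, means, target)                   # size by averages
        for i in selected:
            x = 1 if rng.random() < p_true[i] else 0   # simulated Bernoulli response
            counts[i] += 1
            means[i] += (x - means[i]) / counts[i]     # running average update
        history.append(selected)
    return history, means, counts

# Hypothetical instance: 20 customers, target of 5 units, 300 DR events.
gen = random.Random(42)
p_true = [round(gen.uniform(0.1, 0.9), 2) for _ in range(20)]
target, n_events = 5, 300
history, means, counts = ucb_average(p_true, target, n_events)

# Per-event regret against the offline selection that knows p_true.
best_order = sorted(range(len(p_true)), key=lambda i: p_true[i], reverse=True)
best_set = pick_prefix(best_order, p_true, target)
best_cost = expected_cost([p_true[i] for i in best_set], target)
regrets = [expected_cost([p_true[i] for i in s], target) - best_cost for s in history]
print(f"regret, first 50 events: {sum(regrets[:50]):.2f}")
print(f"regret, last 50 events:  {sum(regrets[-50:]):.2f}")
```

Ranking by UCB keeps rarely observed customers in rotation, while sizing the set with sample averages avoids the under-selection that results when optimistic estimates are used to decide how many customers are needed.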


Regret Analysis of the Algorithm (Li, Hu, Li, 2018)

Regret = online cost − offline optimal cost

Time-varying case




Interpretation of the regret





Proof Sketch: the regret is split into four parts.

Part I: Initial time step

Part II: Sample average (estimation) is far away from the true value

Part III: An arm that is currently under-explored is selected

Part IV: The other events




Future work:

➢ Use the historical data as prior information

➢ Introduce heterogeneity and sub-arms in modeling the arms

➢ Reduce the uncertainty by taking side information (contextual bandits)

➢ Group arms into super-arms based on these inputs

➢ Use mechanism design/prices/rewards to “influence” users


Targets of the pilot

▪ Higher, predictable demand reductions

▪ Lower opt-out rates / customer fatigue

Can we improve DR performance by analyzing data?

Data → Model → Decision