Learning, selecting, and control in residential demand response for grid reliability
Yingying Li, Qinran Hu, Na Li, Alison Su, Jun Shinmada
08/04/2018
Residential demand makes up the largest share of electricity demand, yet it is underutilized in demand response (DR).
▪ DR capacity for 2015/2016 in PJM: Residential 14%
Source: PJM Interconnection, “Demand response strategy,” Tech. Rep., 2017.
▪ Demand share by sector: Residential 38%, Commercial and Other 37%, Industrial 25%
Source: U.S. Energy Information Administration (EIA), “Annual Electric Power Industry Report”.
Previous pilots
▪ Location: Arizona, California, Colorado, Hawaii, Massachusetts, Michigan, New Mexico, New York, Texas, Utah, Virginia, etc.
▪ Device: Non-intrusive, Switch, ThinkEco, Ecobee, NEST, etc.
▪ Reward: Time of use, real-time pricing, coupon, discounted bill, gift card, check, raffle, other recognition, etc.
▪ Control: Direct load control, voluntary, etc.
Issues
▪ Monetary incentives are low
▪ Users quit if pushed too hard
▪ User uncertainty is high and unknown
▪ Users prefer simple DR programs
This Talk: Learning for Residential DR
1. A pilot study: Learning the customers
2. Real-time learning/decision making
Data → Model → Decision
▪ Learn user behavior
▪ Select the “right” users
▪ Set the “right” control actions
Goal: reliable aggregated DR, e.g. [Figure labels: target, individual reduction]
The pilot study by ThinkEco, Inc
▪ Tech: SmartAC kits for window ACs, app/web control
▪ Data Resolution: 1 minute
▪ Time: Jun-Sep 2015 to now
▪ DR: multiple events in the afternoon/evening throughout the summer
Two control types: i) reset temperature target, ii) cycling rate
*Opt-out option available
▪ Incentives: a simple example is $5 for setup and $15 for keeping the device online
Other types exist as well
Data of the pilot
▪ For each AC unit (minute level)
▪ For each DR event (four events)
[Figure legend: DR event, target temperature, room temperature, energy consumption. Panel: DR events, 2016]
Targets of the pilot
Can we improve DR performance by analyzing data?
Data → Model → Decision
Aggregated behavior
Green curve is a baseline using the full summer’s dataset (KEMA)
What can we learn?
[Figure: room temperature, now vs. past]
✓AC operation pattern
✓House thermal model
✓Temperature preference
✓Occupancy
✓Opt-out rate
✓User type (classification)
Etc…
AC operation pattern follows Bernoulli distribution
[Figure: histogram of AC#1’s power consumption (vertical axis: probability; modes: rest, fan, compressor), shown as a combination of Bernoulli distributions]
[Figure: histogram of aggregated power consumption of ACs]
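As a rough illustration of the Bernoulli view of AC operation, the sketch below labels each minute-level power reading as "compressor on" or "off" and estimates the on-probability as the sample mean. The 300 W threshold, the function name `estimate_on_probability`, and the synthetic readings are assumptions made for this example; none of them come from the pilot data.

```python
import random


def estimate_on_probability(power_watts, threshold_watts=300.0):
    """Estimate the Bernoulli parameter p = P(compressor on).

    threshold_watts is a hypothetical cutoff separating compressor-on
    readings from rest/fan readings; it is not taken from the pilot.
    """
    if not power_watts:
        raise ValueError("need at least one reading")
    on_flags = [1 if w >= threshold_watts else 0 for w in power_watts]
    return sum(on_flags) / len(on_flags)


# Synthetic minute-level readings (made-up numbers): the compressor runs
# (~500 W) with probability 0.4, otherwise the unit rests or fans (~50 W).
rng = random.Random(0)
readings = [
    rng.gauss(500, 20) if rng.random() < 0.4 else rng.gauss(50, 10)
    for _ in range(5000)
]
p_hat = estimate_on_probability(readings)
print(f"estimated on-probability: {p_hat:.2f}")
```

A two-state on/off model ignores the separate fan mode visible in the histogram; a three-state model would need a categorical rather than a Bernoulli distribution.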
Learn house thermal model with regression
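$\frac{\Delta\theta_k}{\Delta t_k} = a \cdot (\theta_A - \theta_{room}) + u_k Q + w_k$
where $a \cdot (\theta_A - \theta_{room})$ is the heat loss rate and $u_k Q$ is the cooling effect.
▪ Apply linear regression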
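The regression step can be sketched as an ordinary least-squares fit of the temperature rate on two regressors, the temperature gap and the AC input. This is a minimal sketch under stated assumptions: $\theta_A$ is read as the ambient temperature, $u_k$ as an on/off indicator, and $w_k$ as zero-mean noise; the function name, the synthetic data, and the "true" values 0.02 and -0.5 are hypothetical.

```python
import random


def fit_thermal_model(rates, temp_gaps, ac_on):
    """Least-squares fit of rate_k = a * temp_gap_k + Q * ac_on_k (no intercept).

    Solves the 2x2 normal equations directly with Cramer's rule.
    """
    s11 = sum(x * x for x in temp_gaps)
    s12 = sum(x * u for x, u in zip(temp_gaps, ac_on))
    s22 = sum(u * u for u in ac_on)
    b1 = sum(x * y for x, y in zip(temp_gaps, rates))
    b2 = sum(u * y for u, y in zip(ac_on, rates))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; cannot identify a and Q")
    a = (b1 * s22 - b2 * s12) / det
    q = (s11 * b2 - s12 * b1) / det
    return a, q


# Synthetic data (assumed values): ambient 90 F, room starts at 75 F,
# one-minute steps, AC on in a random half of the minutes.
rng = random.Random(1)
a_true, q_true, ambient, room = 0.02, -0.5, 90.0, 75.0
rates, gaps, inputs = [], [], []
for _ in range(2000):
    u = 1 if rng.random() < 0.5 else 0
    gap = ambient - room
    rate = a_true * gap + q_true * u + rng.gauss(0, 0.01)
    rates.append(rate)
    gaps.append(gap)
    inputs.append(u)
    room += rate  # time step of one minute

a_hat, q_hat = fit_thermal_model(rates, gaps, inputs)
print(f"a = {a_hat:.4f}, Q = {q_hat:.4f}")
```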
Use thermal model to design better AC control
For example, smooth response
+7°F offset
Predicting Opt Out
Data → Model → Decision
▪ Raw Data → Inputs:
- Users’ occupancy rate
- Decision in previous DR events
- Temperature preference
- Ambient temperature
* Recent day and same weekday effect
▪ Model: ANN prediction model
▪ Output:
- Opt-out probability
~85% detection rate for opt-out
Multi-armed bandit learning algorithm in aggregating demands
Introduction to multi-armed bandit (MAB) problem
Ex. Slot machine
MAB is about exploration vs. exploitation
▪ Classic MAB: select the top K arms to maximize the expected reward
▪ Demand response: select a number of customers to maximize reliability (minimize variance)
A MAB model for reliability: Nonmonotone objective function
[Figure labels: ramping support, ancillary service]
A MAB model for reliability in DR (simplified)
➢ A set of customers $S$
➢ Each customer $i$ reduces one unit of load with probability $p_i$ (Bernoulli distribution)
➢ A target total reduction $D_t$ at time step $t$
➢ Objective: choose a subset $S_t$ of customers to minimize the reliability cost
➢ Performance analysis:
Regret = online reliability cost − offline optimal reliability cost
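To make the reliability cost concrete, the sketch below assumes it is the expected squared deviation of the total reduction from the target, $E[(\sum_{i \in S} X_i - D)^2]$ with independent $X_i \sim \mathrm{Bernoulli}(p_i)$. This slide does not state the cost formula, so that choice is an assumption. Under it the cost splits into a squared bias term plus a variance term, which the second function checks by brute-force enumeration. The function names and the example probabilities are hypothetical.

```python
from itertools import product


def expected_cost(probs, target):
    """E[(sum_i X_i - target)^2] for independent X_i ~ Bernoulli(p_i).

    Uses the decomposition (mean - target)^2 + variance.
    """
    mean = sum(probs)
    variance = sum(p * (1 - p) for p in probs)
    return (mean - target) ** 2 + variance


def expected_cost_by_enumeration(probs, target):
    """Same quantity, summed over all 2^n outcomes (small n only)."""
    total = 0.0
    for outcome in product([0, 1], repeat=len(probs)):
        weight = 1.0
        for x, p in zip(outcome, probs):
            weight *= p if x == 1 else 1 - p
        total += weight * (sum(outcome) - target) ** 2
    return total


selected = [0.9, 0.8, 0.5]  # hypothetical response probabilities
cost = expected_cost(selected, target=2)
check = expected_cost_by_enumeration(selected, target=2)
print(cost, check)
```

The variance term grows with every added customer while the bias term can shrink or grow, which is why selecting more customers is not always better.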
[Figure: customers 1-4 with response probabilities $P_1$, $P_2$, $P_3$, $P_4$]
Results:
[Figures: 100 users, $D = 35$; 100 users, $D_t \in [10, 30]$]
Theorem: Our DR algorithm CUCB-Avg achieves O(log T) regret, where T is the total number of DR events.
Yingying Li, Qinran Hu, Na Li, "Learning and Selecting the Right Customers for Reliability: A Multi-armed Bandit Approach", IEEE Conference on Decision and Control (CDC), 2018
This Talk: Learning for Residential DR
1. A pilot study: Learning the customers
2. Real-time learning: Multi-armed Bandit
Ongoing work with ThinkEco:
▪ Residential DR field study in New York City with 40K+ AC devices this summer
▪ Human-machine interaction; engineering-learning integration
Conclusion:
▪ DR pilots generate a large amount of valuable data.
▪ The examples show that learning techniques are helpful for DR.
Backup
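Offline optimal selection algorithm
Objective: choose a subset of customers to minimize the reliability cost
If we know $p_i$ for all $i$:
▪ Rank the customers by $p_i$
▪ Determine the number $k$
[Figure: customers 1-4 with probabilities $P_1$, $P_2$, $P_3$, $P_4$]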
Theorem (Li, Hu, Li, 2018): Algorithm 1 produces an optimal output for the offline optimization problem.
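The rank-then-count idea can be sketched as follows. This is not a transcription of Algorithm 1 but a hedged sketch assuming the squared-deviation cost $E[(\sum_{i \in S} X_i - D)^2]$. Under that assumption, adding a customer with probability $p$ to a set whose probabilities sum to $\mu$ changes the cost by $2p(\mu - D + 1/2)$, so customers are added in decreasing order of $p_i$ while $\mu < D - 1/2$. The stopping rule and the function names are assumptions; the brute-force search is included only to check small instances.

```python
from itertools import combinations


def expected_cost(probs, target):
    """Squared-deviation cost: (mean - target)^2 + variance (assumed cost)."""
    mean = sum(probs)
    variance = sum(p * (1 - p) for p in probs)
    return (mean - target) ** 2 + variance


def offline_select(probs, target):
    """Rank customers by p_i (descending), then decide how many to take."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        # Adding another customer lowers the cost only while the running
        # sum of probabilities is below target - 0.5.
        if total >= target - 0.5:
            break
        chosen.append(i)
        total += probs[i]
    return chosen


def brute_force_cost(probs, target):
    """Minimum cost over all subsets (exponential; for checking only)."""
    best = expected_cost([], target)
    for r in range(1, len(probs) + 1):
        for subset in combinations(probs, r):
            best = min(best, expected_cost(subset, target))
    return best


probs = [0.9, 0.8, 0.5, 0.3]  # hypothetical response probabilities
picked = offline_select(probs, target=2)
print(picked, expected_cost([probs[i] for i in picked], 2))
```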
Online Algorithm
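Define: sample average
$\bar{p}_i(t) = \frac{1}{T_i(t)} \sum_{\tau \in I_i(t)} X_{\tau,i}$
where $T_i(t)$ is the number of times arm $i$ has been selected, $I_i(t)$ is the set of events when arm $i$ is selected, and $X_{\tau,i}$ is the realization.
Greedy Algorithm: Use the sample average to run the offline optimal algorithm.
➢ Too much exploitation, not enough exploration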
Online Algorithm: UCB: Upper Confidence Bound (Auer et al. 2002)
Define: Upper Confidence Bound = sample average (exploitation) + confidence term (exploration)
UCB Algorithm: Use the UCB to run the offline optimal algorithm.
➢ Popular algorithm for K-arm maximization MAB with log(T) regret
➢ Performs poorly in our reliability problem:
➢ Tends to select fewer arms → larger variance and less exploration
The Proposed Online Algorithm: UCB-Average (Li, Hu, Li, 2018)
Define: Upper Confidence Bound = sample average (exploitation) + confidence term (exploration)
UCB-Average Algorithm:
➢ Use the UCB to rank the arms
➢ Use the sample average to determine the number K of selected arms
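The two steps above can be sketched as a small simulation. Several details are assumptions rather than the authors' specification: the confidence term uses a UCB1-style bonus $\sqrt{\alpha \ln t / (2 T_i)}$ capped at 1, `alpha` is a hypothetical tuning parameter, the first round selects every arm to initialize the averages, and the count is chosen with the stopping rule that follows from a squared-deviation cost (add arms while the running sum of sample averages is below $D - 1/2$).

```python
import math
import random


def ucb_average_select(counts, means, target, t, alpha=2.0):
    """One round: rank arms by UCB, pick the count using sample averages."""
    ucb = [
        min(1.0, m + math.sqrt(alpha * math.log(t) / (2 * n)))
        for m, n in zip(means, counts)
    ]
    order = sorted(range(len(means)), key=lambda i: ucb[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        # Stopping rule uses sample averages, not UCBs (assumed cost model).
        if total >= target - 0.5:
            break
        chosen.append(i)
        total += means[i]
    return chosen


def simulate(true_probs, target, rounds, seed=0):
    """Run the sketch against simulated Bernoulli customers."""
    rng = random.Random(seed)
    n = len(true_probs)
    counts, means, history = [0] * n, [0.0] * n, []
    for t in range(1, rounds + 1):
        if t == 1:
            chosen = list(range(n))  # initialization: try every arm once
        else:
            chosen = ucb_average_select(counts, means, target, t)
        for i in chosen:
            x = 1 if rng.random() < true_probs[i] else 0
            counts[i] += 1
            means[i] += (x - means[i]) / counts[i]  # running average
        history.append(chosen)
    return counts, means, history


counts, means, history = simulate([0.9, 0.8, 0.5, 0.3, 0.2], target=2, rounds=500)
print(counts, [round(m, 2) for m in means])
```

Ranking by UCB keeps rarely selected arms near the top of the order so they continue to be sampled, while sizing the selection with sample averages avoids the optimistic UCB values cutting the set short.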
Regret Analysis of the Algorithm (Li, Hu, Li, 2018)
Define: Regret = online cost − offline optimal cost
▪ Time varying
▪ Stationary
Interpretation of the regret
▪ Stationary
Proof Sketch (stationary case):
Part I: Initial time step
Part II: The sample average (estimate) is far from the true value
Part III: An arm that is currently under-explored is selected
Part IV: The other events
Future work:
➢ Use the historical data as prior information
➢ Introduce heterogeneity and sub-arms in modeling the arms
➢ Reduce the uncertainty by taking side information (contextual bandits)
➢ Group arms into super-arms based on these inputs
➢ Use mechanism design/prices/rewards to “influence” users
➢ …
Targets of the pilot
Can we improve DR performance by analyzing data?
▪ Higher, predictable demand reductions
▪ Lower opt-out rates / customer fatigue