CAN I SOLAR?
HELPING YOU DECIDE IF SOLAR POWER IS RIGHT FOR YOU
Gabriel J. Michael
MOTIVATION
• Residential solar sector grew 51% from 2013 to 2014
• Projected market value of $3.7 billion in 2015
• Complex decision with many variables
• Homeowners want to know:
  • How much money can I save?
  • When will I break even?
CAN I SOLAR?
A DATA-DRIVEN WEB APPLICATION
http://www.canisolar.com
MODELING INSTALLATION COSTS
• Data on 400,000 installations obtained from the National Renewable Energy Laboratory
• Cost of solar installations varies by:
  • size of the array
  • year of installation
  • location of installation
• Multiple linear regression provides a good fit and is easily interpretable
• Also tried multilevel modeling and random forest regression
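The multiple linear regression above, log(cost) ~ log(size_kw) + state + year, can be sketched as an ordinary least-squares fit with one indicator column per state (minus a reference state). The sketch below is a minimal pure-Python version that solves the normal equations directly. The installs are hypothetical and noiseless so that the recovered coefficients can be checked; year is treated as numeric here, which is an assumption, and the function names are illustrative only.

```python
import math


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design_row(size_kw, state, year, states, base_year):
    """One design-matrix row for log(cost) ~ log(size_kw) + state + year.

    states[0] is the reference level, so it gets no dummy column.
    Year is treated as numeric (years since base_year) in this sketch.
    """
    dummies = [1.0 if state == s else 0.0 for s in states[1:]]
    return [1.0, math.log(size_kw)] + dummies + [float(year - base_year)]


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical noiseless installs generated from known coefficients, so the fit can be checked.
states = ["CA", "NY", "TX"]
state_effect = {"CA": 0.0, "NY": 0.10, "TX": -0.15}
installs = []  # (cost, size_kw, state, year)
for state in states:
    for year in (2010, 2012, 2014):
        for size_kw in (3.0, 5.0, 8.0):
            log_cost = 8.0 + 0.9 * math.log(size_kw) + state_effect[state] - 0.05 * (year - 2010)
            installs.append((math.exp(log_cost), size_kw, state, year))

rows = [design_row(size_kw, state, year, states, 2010) for _, size_kw, state, year in installs]
log_costs = [math.log(cost) for cost, _, _, _ in installs]
coef = fit_ols(rows, log_costs)
print([round(c, 3) for c in coef])  # [8.0, 0.9, 0.1, -0.15, -0.05]
```

Because both cost and size are logged, the log(size_kw) coefficient reads as an elasticity: in this made-up example a 1% larger array costs about 0.9% more, holding state and year fixed.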
MODELING FUTURE ELECTRICITY PRICES
• 15 years of monthly historical electricity prices by state obtained from the Energy Information Administration
• Prices and trends vary significantly by state, so no single model works best for all states
• Developed a pipeline to automatically test, validate, and select an appropriate time-series model for each state, e.g.:
  • linear
  • ARIMA
  • exponential smoothing
WHERE CAN I SOLAR?
GABRIEL J. MICHAEL
• Ph.D., Political Science, George Washington University
  • Used survival regression to model countries' adoption of intellectual property laws
• Postdoc, Yale Law School
  • Used NLP with SVMs to classify tweets and regulatory comments on political topics
Exploring the since-demolished PEPCO Benning Generating Station, Washington, DC
Urban explorer, electronics hobbyist
Visualization of Twitter users' connections and sentiment about net neutrality
MODELS OF INSTALLATION COSTS
• Simple linear regression
  • Model form: log(cost) ~ log(size_kw)
  • R² or pseudo-R²: 0.81; 10-fold CV MSE: 0.089
  • Easy to interpret and explain
• Multiple linear regression
  • Model form: log(cost) ~ log(size_kw) + state + year
  • R² or pseudo-R²: 0.89; 10-fold CV MSE: 0.053
• Multilevel model
  • Model form: log(cost) ~ log(size_kw) + (log(size_kw) | state/year_installed)
  • R² or pseudo-R²: 0.89; 10-fold CV MSE: 0.050
  • Confidence and prediction intervals for multilevel models are difficult to interpret
• Random forest regression
  • Model form: log(cost) ~ log(size_kw)
  • R² or pseudo-R²: 0.93; 10-fold CV MSE: 0.050
  • scikit-learn's random forest regressor doesn't support factors, and the R packages are too slow
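The 10-fold CV MSE figures compare models on held-out data rather than on the data they were fit to. A minimal sketch of that computation for the simple log-log model, assuming a hypothetical list of (size_kw, cost) pairs and a seeded shuffle to assign folds; how the original analysis split its folds is not stated, so this is illustrative only. Errors are measured on the log scale here.

```python
import math
import random


def fit_simple(xs, ys):
    """Closed-form least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def kfold_mse(pairs, k=10, seed=0):
    """k-fold cross-validated MSE of log(cost) ~ log(size_kw), on the log scale."""
    xs = [math.log(size) for size, _ in pairs]
    ys = [math.log(cost) for _, cost in pairs]
    idx = list(range(len(pairs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, roughly equal folds
    sq_err, count = 0.0, 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_simple([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:  # score only the observations the model did not see
            sq_err += (ys[i] - (a + b * xs[i])) ** 2
            count += 1
    return sq_err / count


# Hypothetical installs: log(cost) = 8 + 0.9 * log(size_kw) + noise with sd 0.3.
rng = random.Random(42)
pairs = []
for _ in range(200):
    size_kw = rng.uniform(2.0, 10.0)
    log_cost = 8.0 + 0.9 * math.log(size_kw) + rng.gauss(0.0, 0.3)
    pairs.append((size_kw, math.exp(log_cost)))

cv_mse = kfold_mse(pairs, k=10)
print(f"10-fold CV MSE (log scale): {cv_mse:.3f}")  # should land near 0.3 ** 2 = 0.09
```

A model that merely memorizes its training data scores well on R² but poorly here, which is why the CV row is the fairer comparison between the four models.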
Per-capita electricity consumption has flattened and even declined in recent years
[Figure: United States: kWh per capita, 1960 to 2011]
PHOTOVOLTAIC PERFORMANCE DECLINE OVER TIME
• Industry-standard warranties guarantee 90% of rated output at 10 years and 80% at 25 years
• I use a simple exponential decay curve to calculate performance from month 0 to month 360 (30 years)
Performance = e^(-0.005322 - 0.008935 * Years)
[Figure: Performance (0 to 1.0) vs. Year (0 to 30)]
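The decay curve is easy to tabulate month by month. A minimal sketch, assuming the month-to-year conversion is simply Years = month / 12; the coefficients are the fitted values from the slide, and the names are illustrative.

```python
import math

# Fitted curve from the slide: Performance = e^(-0.005322 - 0.008935 * Years)
INTERCEPT = -0.005322
RATE_PER_YEAR = -0.008935


def performance(month):
    """Expected fraction of rated output in a given month after installation."""
    years = month / 12.0  # assumption: months map to fractional years
    return math.exp(INTERCEPT + RATE_PER_YEAR * years)


# Month 0 through month 360 (30 years).
curve = [performance(m) for m in range(361)]

print(round(curve[120], 3))  # 0.91  (10 years)
print(round(curve[300], 3))  # 0.796 (25 years)
```

Multiplying a hypothetical month-one production estimate by `curve[m]` gives the degraded estimate for month m. The fitted curve starts slightly below 1.0 (about 0.995 at month 0), sits above the 90% warranty point at 10 years, and lands just under 80% at 25 years.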
WITHIN VS BETWEEN GROUP VARIANCE IN ELECTRICITY PRICES
[Figure: Residential Electricity Prices by State, one box plot per state (AK through WY, plus DC and PR); y-axis: cents per kWh, 0 to 30]
There is more variance between states than within states
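"More variance between states than within states" can be made precise with a one-way decomposition: the total sum of squares splits exactly into a between-group part (each state's mean around the grand mean) and a within-group part (each observation around its own state's mean). A minimal sketch, assuming a hypothetical dict mapping state codes to price lists; the prices below are made up for illustration.

```python
def variance_decomposition(groups):
    """Split the total sum of squares into between-group and within-group parts.

    groups: dict mapping a group label (e.g. a state code) to a list of values.
    Returns (between_ss, within_ss, total_ss); between_ss + within_ss == total_ss.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    between_ss = 0.0
    within_ss = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        between_ss += len(values) * (mean - grand_mean) ** 2  # group mean vs grand mean
        within_ss += sum((v - mean) ** 2 for v in values)     # values vs their group mean
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    return between_ss, within_ss, total_ss


# Hypothetical monthly prices (cents per kWh) for three states.
prices = {
    "HI": [30.0, 34.0, 38.0, 36.0],
    "NY": [16.0, 18.0, 21.0, 19.0],
    "WA": [7.5, 8.5, 9.5, 10.0],
}
between, within, total = variance_decomposition(prices)
print(f"share of variance between states: {between / total:.3f}")
```

A share close to 1 means knowing the state explains most of the variation in price, which is the pattern the box plots show for electricity prices.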
WITHIN VS BETWEEN GROUP VARIANCE IN INSTALLATION COSTS (3 - 5 KW)
[Figure: Costs of Solar Installations by State, one box plot per state (AK through WY, plus DC) for 3 to 5 kW systems; y-axis: install cost ($), 0 to 100,000]
Significant variance between states, but also within states
BACKEND
• Python 3 + pandas for core classes and program logic
• R for modeling, with rpy2 as the Python interface to R
• MySQL for storage of electricity consumption and price data, and solar installation cost/size data
• MongoDB for storage and retrieval of geolocated insolation data
• Code on GitHub: https://github.com/langelgjm/canisolar
ASSUMPTIONS OF LINEAR REGRESSION
• Independence of errors
• Homoskedasticity (constant variance of errors)
  • Some evidence of heteroskedasticity
  • Could use robust standard errors for intervals, although the robust confidence intervals are not much wider than the conventional ones
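Robust (sandwich) standard errors replace the single pooled error variance with each observation's own squared residual. A minimal sketch of the HC0 (White) estimator, shown for a one-predictor regression to keep it short; the multiple-regression version applies the same idea with matrices. The data are hypothetical, with noise that grows with x.

```python
import math
import random


def slope_standard_errors(xs, ys):
    """Conventional and HC0 (White) heteroskedasticity-robust standard errors
    for the slope of a simple linear regression y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Conventional SE: one common error variance, estimated with n - 2 degrees of freedom.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)
    # HC0: each observation contributes its own squared residual as its error variance.
    se_hc0 = math.sqrt(sum((x - mx) ** 2 * e * e for x, e in zip(xs, resid)) / sxx ** 2)
    return b, se_conventional, se_hc0


# Hypothetical data whose noise grows with x (heteroskedastic errors).
rng = random.Random(1)
xs = [i / 10 for i in range(1, 201)]
ys = [1.0 + 0.5 * x + rng.gauss(0.0, 0.2 * x) for x in xs]
b, se_conv, se_hc0 = slope_standard_errors(xs, ys)
for label, se in (("conventional", se_conv), ("HC0 robust", se_hc0)):
    print(f"{label}: slope {b:.3f}, approx. 95% CI [{b - 1.96 * se:.3f}, {b + 1.96 * se:.3f}]")
```

The point estimate is unchanged; only the interval width moves, which is why robust errors are a cheap check on whether heteroskedasticity matters for the reported intervals.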
• Normality of residuals
  • Evidence of a non-normal (heavy-tailed) error distribution
  • This assumption is only necessary for confidence intervals and p-values, not for best linear unbiased estimates
  • Could use robust regression with t-distributed errors
• True linear relationship
  • Holds for the simple regression of cost ~ size
• No significant multicollinearity
  • Variance inflation factors are relatively low
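A variance inflation factor is computed by regressing each predictor on all the others: VIF_j = 1 / (1 - R²_j). A self-contained sketch, assuming the predictors arrive as hypothetical lists of column values (for example log(size_kw) and a year offset); a statistics package would normally do this, and which one the original analysis used is not stated.

```python
import math
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(target, predictors):
    """R^2 from regressing `target` on the `predictors` columns plus an intercept."""
    n = len(target)
    cols = [[1.0] * n] + predictors
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    xty = [sum(c[t] * target[t] for t in range(n)) for c in cols]
    beta = solve(xtx, xty)
    fitted = [sum(bj * c[t] for bj, c in zip(beta, cols)) for t in range(n)]
    mean = sum(target) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


def vifs(columns):
    """VIF of each column: 1 / (1 - R^2_j), with R^2_j from regressing
    column j on all the other columns."""
    result = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        result.append(1.0 / (1.0 - r_squared(col, others)))
    return result


# Hypothetical predictors: arrays grow a little in later years, inducing mild collinearity.
rng = random.Random(7)
years = [rng.randrange(2008, 2015) for _ in range(300)]
log_size = [math.log(3.0 + 0.3 * (y - 2008) + rng.uniform(0.0, 4.0)) for y in years]
year_offset = [float(y - 2008) for y in years]
print([round(v, 2) for v in vifs([log_size, year_offset])])
```

A VIF of 1 means a predictor is uncorrelated with the rest; common rules of thumb start to worry somewhere above 5 or 10.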
TIME SERIES MODELING
• No other predictors (time is the only variable)
• Strong a priori reason to believe most states will have an increasing, roughly linear trend in future electricity prices, often with seasonality
• States vary significantly from one another in historical prices, trends, and seasonality
• We cannot expect the same model to perform well for all states!
• Automatic model fitting is a bad idea for long-term forecasts
LONG-TERM FORECASTING: A SOLUTION
1. Create a handcrafted list of 7 possible models (1 linear, 4 ARIMA, and 2 exponential smoothing):
   Model                  Parameters  Seasonal parameters  Note
   Linear                 n/a         n/a
   ARIMA                  (1,0,0)     none                 include drift
   ARIMA                  (1,1,0)     none                 include drift
   ARIMA                  (1,0,0)     (1,0,0)
   ARIMA                  (1,0,0)     (1,1,0)
   Exponential smoothing  M           M                    no damping
   Exponential smoothing  A           A                    no damping
2. Train each model on the first 1/3, 1/2, and 2/3 of the historical data; test each on the remaining 2/3, 1/2, and 1/3, respectively (2 models shown)
3. Select the model with the lowest MSE across all tests
4. Repeat for every U.S. state + DC
5. Sanity check the resulting models
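Steps 2 and 3 amount to a small rolling-origin backtest. The sketch below assumes each candidate is a hypothetical function `forecast(train, horizon) -> list`, and reads "lowest MSE across all tests" as the lowest mean MSE over the three splits. To stay self-contained it uses two simple stand-in forecasters (a linear trend and a seasonal naive forecast) rather than the ARIMA and exponential smoothing models in the table, which the application fits in R.

```python
import math


def linear_trend(train, horizon):
    """Fit y = a + b * t by least squares and extrapolate `horizon` steps ahead."""
    n = len(train)
    ts = range(n)
    mt, my = (n - 1) / 2, sum(train) / n
    b = sum((t - mt) * (y - my) for t, y in zip(ts, train)) / sum((t - mt) ** 2 for t in ts)
    a = my - b * mt
    return [a + b * (n + h) for h in range(horizon)]


def seasonal_naive(train, horizon, period=12):
    """Stand-in seasonal model: repeat the last observed cycle (needs len(train) >= period)."""
    last_cycle = train[-period:]
    return [last_cycle[h % period] for h in range(horizon)]


def backtest_mse(series, forecast, fractions=(1 / 3, 1 / 2, 2 / 3)):
    """Mean test MSE when training on the first fraction of the series
    and forecasting the remainder, averaged over the fractions."""
    mses = []
    for frac in fractions:
        cut = int(len(series) * frac)
        train, test = series[:cut], series[cut:]
        pred = forecast(train, len(test))
        mses.append(sum((p - y) ** 2 for p, y in zip(pred, test)) / len(test))
    return sum(mses) / len(mses)


def select_model(series, candidates):
    """Return (name of the lowest-scoring candidate, dict of all scores)."""
    scores = {name: backtest_mse(series, fn) for name, fn in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical 15 years (180 months) of prices: upward trend plus an annual cycle.
prices = [10.0 + 0.02 * t + 0.5 * math.sin(2 * math.pi * t / 12) for t in range(180)]
candidates = {"linear": linear_trend, "seasonal naive": seasonal_naive}
best, scores = select_model(prices, candidates)
print(best, {name: round(mse, 3) for name, mse in scores.items()})
```

Step 4 is then a loop over states, e.g. `{state: select_model(series, candidates)[0] for state, series in by_state.items()}` for a hypothetical `by_state` dict. Testing on long held-out horizons (up to 2/3 of the history) is what penalizes models that fit the past well but extrapolate badly, which is the concern with automatic fitting for long-term forecasts.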
[Figure: example long-term forecasts, 2000 to 2040, panels labeled NH and MS: "Forecasts from ARIMA(1,0,0)(1,0,0)[12] with non-zero mean" and "Forecasts from ETS(A,A,A)"]