david-stier
Leverage cities’ open data
and apply machine learning
To show who’s got solar
To predict who wants solar
THE GOAL
Level the playing field for installers and consumers
Reduce the cost of selling solar
Improve ease of selecting solar
Encourage cities to provide open data
BENEFITS
Using publicly available information, how well can we model the adoption of solar panel systems among residential homeowners?
THE PROJECT
HYPOTHESIS
Machine learning applied to parcel-level data will generate significantly greater profit than a ‘naïve’ approach
The NREL SEEDS program has funded more than a dozen studies to understand who will adopt residential solar
To date, no study has used parcel-level information as its main data source
BACKGROUND
HELLO, I’m Boston
Study focused on solar panel use in Boston:
• 48 square miles | 660k+ population
• Largest city in New England | 23rd largest city in the US
• Significant number of green initiatives at the local level, and the state has been an active sponsor of solar
2015 RESIDENTIAL SOLAR
Market size (installations)
107,000 United States
8,700 MA
650 Boston
400 Owner-occupied single-family homes
BACKGROUND
BACKGROUND
Boston solar panel installations, 2011-2015
Year | Single family | Two-three family
2011 | 25 | 16
2012 | 142 | 94
2013 | 177 | 72
2014 | 295 | 97
2015 | 438 | 216
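As a quick consistency check, the yearly totals can be computed from the chart's counts. This sketch assumes the first series in the chart is single-family and the second is two-three family, following the legend order. Under that assumption the 2015 total (654) lines up with the roughly 650 Boston installations cited for 2015.

```python
# Boston residential solar installations per year, read off the 2011-2015 chart.
# Assumption: the first plotted series is single family, the second two-three family.
YEARS = [2011, 2012, 2013, 2014, 2015]
SINGLE_FAMILY = [25, 142, 177, 295, 438]
TWO_THREE_FAMILY = [16, 94, 72, 97, 216]


def yearly_totals(a, b):
    """Element-wise sum of two per-year series."""
    return [x + y for x, y in zip(a, b)]


def single_family_share(single, totals):
    """Fraction of each year's installations that were on single-family homes."""
    return [round(s / t, 2) for s, t in zip(single, totals)]


totals = yearly_totals(SINGLE_FAMILY, TWO_THREE_FAMILY)
shares = single_family_share(SINGLE_FAMILY, totals)
for year, total, share in zip(YEARS, totals, shares):
    print(f"{year}: {total} installations, {share:.0%} single family")
```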
DP04 Selected Housing Characteristics 570 fields
B25096 Mortgage Status By Value 37 fields
B25093 AGE OF HOUSEHOLDER BY SELECTED MONTHLY OWNER COSTS AS A % OF HOUSEHOLD INCOME IN PAST 12 MONTHS 63 fields
B01001 Population by Age 103 fields
B05001 Citizenship Status 17 fields
B19049 MEDIAN HOUSEHOLD INCOME IN PAST 12 MONTHS (INFLATION-ADJUSTED) BY AGE OF HOUSEHOLDER 15 fields
B25040 House Heating Fuel 25 fields
DATASETS CHOSEN
US CENSUS TRACT LEVEL DATA ON HOUSING AND POPULATION
Total of 7 tables with 830 fields
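Because the census data is tract-level while the model works on parcels, each parcel has to inherit its tract's fields. The following is a minimal sketch of that join, assuming each parcel record already carries a tract ID. The record schema, the `acs_` prefix and the function name `attach_tract_features` are hypothetical, not taken from the project. It also checks that the per-table field counts listed above sum to 830.

```python
# Fields per ACS table on the "Datasets chosen" slide.
ACS_TABLE_FIELDS = {
    "DP04": 570, "B25096": 37, "B25093": 63, "B01001": 103,
    "B05001": 17, "B19049": 15, "B25040": 25,
}
TOTAL_FIELDS = sum(ACS_TABLE_FIELDS.values())  # should match "7 tables with 830 fields"


def attach_tract_features(parcels, tract_features, prefix="acs_"):
    """Copy each tract's census fields onto every parcel in that tract.

    parcels: list of dicts with a "tract" key (hypothetical schema)
    tract_features: dict mapping tract ID -> dict of census fields
    Census fields get a prefix so they cannot overwrite parcel columns.
    """
    joined = []
    for parcel in parcels:
        row = dict(parcel)
        # Parcels whose tract has no census record simply get no census columns.
        census = tract_features.get(parcel["tract"], {})
        row.update({prefix + name: value for name, value in census.items()})
        joined.append(row)
    return joined


# Hypothetical example data.
parcels = [{"parcel_id": "P1", "tract": "T1"}, {"parcel_id": "P2", "tract": "T2"}]
tract_features = {"T1": {"median_income": 75000, "pct_oil_heat": 0.21}}
joined = attach_tract_features(parcels, tract_features)
print(TOTAL_FIELDS, joined)
```

Prefixing the census columns is a design choice for this sketch; with 830 census fields landing next to parcel attributes, name collisions are otherwise easy to miss.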
BY THE NUMBERS
6 datasets / 1.4M records / 4k installations / 6-year period
Final cleaned dataset: 25,303 x 1,628
GEOGRAPHIC DATA
STRUCTURING AND SUMMARIZING LOCATION DATA
STREETS ARE THE KEY
- STREET NAME AND TYPE
- HOUSE NUMBER
Although house numbers are ‘ordered’, significant variations exist; still useful for 0-100 increments
Example: Number of parcels per block
- GRID-BASED GROUPING
Create ½ mile sq. grids and also ¼ mile subgrids
Example: Half-mile grid based on lat/lon
- SOCIAL/POLITICAL GROUPING
Census tract (Federal)
Neighborhood (Municipal)
Example: 2010 US Census Tracts
5,748 ¼ mile combos | 5,369 address blocks | 3,453 streets | 761 ½ mile grids | 165 census tracts | 16 neighborhoods
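The street and grid groupings can be sketched as two keys computed per parcel: an address-block key (street name and type plus a 0-100 house-number bucket) and a square grid cell from lat/lon. This is an illustration, not the author's code. The function names, the parcel fields and the study-area origin are hypothetical. The grid uses a flat (equirectangular) approximation with about 69 miles per degree of latitude, which is usually adequate at city scale.

```python
import math
from collections import Counter

MILES_PER_DEG_LAT = 69.0  # rough average; an assumption, not from the slides


def address_block(house_number, street_name, street_type):
    """Bucket a parcel into a 100-number address block, e.g. '0-100 MAIN ST'."""
    low = (house_number // 100) * 100
    return f"{low}-{low + 100} {street_name.upper()} {street_type.upper()}"


def grid_cell(lat, lon, cell_miles, origin_lat, origin_lon):
    """(row, col) of a square cell roughly cell_miles on a side, counted from an origin."""
    # Degrees of longitude shrink with latitude, so scale by cos(latitude).
    miles_per_deg_lon = MILES_PER_DEG_LAT * math.cos(math.radians(origin_lat))
    row = math.floor((lat - origin_lat) * MILES_PER_DEG_LAT / cell_miles)
    col = math.floor((lon - origin_lon) * miles_per_deg_lon / cell_miles)
    return row, col


ORIGIN = (42.22, -71.20)  # hypothetical south-west corner of the study area
parcels = [  # hypothetical parcels
    {"number": 12, "street": "Main", "type": "St", "lat": 42.3300, "lon": -71.0600},
    {"number": 48, "street": "Main", "type": "St", "lat": 42.3302, "lon": -71.0598},
    {"number": 130, "street": "Main", "type": "St", "lat": 42.3320, "lon": -71.0580},
]

# Example aggregate: number of parcels per address block.
parcels_per_block = Counter(
    address_block(p["number"], p["street"], p["type"]) for p in parcels
)
# Half-mile cells and quarter-mile subgrid cells share the same origin.
half_mile = [grid_cell(p["lat"], p["lon"], 0.5, *ORIGIN) for p in parcels]
quarter_mile = [grid_cell(p["lat"], p["lon"], 0.25, *ORIGIN) for p in parcels]
print(parcels_per_block, half_mile, quarter_mile)
```

Because both grids share an origin, a quarter-mile cell `(r, c)` always falls inside the half-mile cell `(r // 2, c // 2)`, so the two levels nest cleanly.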
Feature-level contribution to classification path: top 100 features, grouped by source, type & aggregation level
‘Raw’ = parcel-level source fields; all other columns are ‘Engineered’ features
Data source | Parcel (raw) | Address block (e.g. 0-100 Main) | Parcel (engineered) | ¼ mile | Census tract | Neighborhood | ½ mile | Total
permit | | 0.137 | 0.013 | 0.048 | 0.002 | | 0.010 | 0.210
assessor | 0.052 | 0.045 | 0.030 | 0.022 | 0.016 | 0.031 | | 0.196
311 | 0.011 | 0.017 | 0.056 | 0.016 | 0.008 | 0.004 | | 0.111
Assessor webscrape | 0.010 | 0.003 | | | | | | 0.013
Top 100 total | 0.073 | 0.201 | 0.099 | 0.087 | 0.026 | 0.035 | 0.010 | 0.530
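The table is a roll-up: each feature's contribution is summed by data source (rows) and by aggregation level (columns). The slides don't say how per-feature contributions were computed, so this sketch covers only the grouping step. Every feature name and score below is hypothetical.

```python
from collections import defaultdict

# Hypothetical per-feature contributions, tagged with (data source, aggregation level).
features = [
    ("permits_per_block", "permit", "address_block", 0.090),
    ("permit_recency_block", "permit", "address_block", 0.047),
    ("assessor_living_area", "assessor", "parcel_raw", 0.052),
    ("calls_311_quarter_mile", "311", "quarter_mile", 0.016),
]


def roll_up(features):
    """Sum feature contributions by data source and by aggregation level."""
    by_source = defaultdict(float)
    by_level = defaultdict(float)
    for _name, source, level, contribution in features:
        by_source[source] += contribution
        by_level[level] += contribution
    return dict(by_source), dict(by_level)


by_source, by_level = roll_up(features)
for source, total in sorted(by_source.items()):
    print(f"{source}: {total:.3f}")
```

Both groupings sum the same contributions, so the row totals and column totals share one grand total, as the "Top 100 total" row does.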
Features that I thought might have more impact...
- Heat type: 145th (address-block average)
- Roof type: 411th
FIND PROFIT IN YOUR CONFUSION MATRIX
BUYERS
NO SALE
GLAD I PITCHED THEM
ALL TALK, NO SALE
Inaccuracy = opportunity
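Each prediction threshold splits homes into the four confusion-matrix cells: every home scoring at or above the threshold is flagged and gets a call. This is a minimal sketch of that counting step using standard TP/FP/FN/TN names. The slides don't spell out which quadrant label corresponds to which cell beyond "Inaccuracy = opportunity", so no mapping is assumed. The scores and labels are hypothetical.

```python
def confusion_at_threshold(scores, adopted, threshold):
    """Count confusion-matrix cells when every home scoring >= threshold is flagged."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for score, has_solar in zip(scores, adopted):
        flagged = score >= threshold
        if flagged and has_solar:
            counts["tp"] += 1
        elif flagged:
            counts["fp"] += 1
        elif has_solar:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical model scores and ground truth for five homes.
scores = [0.30, 0.12, 0.08, 0.04, 0.20]
adopted = [True, False, True, False, False]

for threshold in (0.15, 0.10, 0.05):
    c = confusion_at_threshold(scores, adopted, threshold)
    calls = c["tp"] + c["fp"]  # every flagged home gets a call
    print(f"threshold {threshold:.2f}: {calls} calls, {c}")
```

Lowering the threshold flags more homes, so both the call list and the number of flagged adopters can only grow. This is the trade-off the threshold table below prices out.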
Prediction threshold | Sales | Machine learning: # calls | Machine learning: cumul. cost | Naïve method: # calls | Naïve method: cumul. cost
15% | 85 | 1,181 | $10,039 | 4,285 | $36,423
10% | 123 | 1,811 | $15,394 | 7,856 | $66,776
6.75% | 146 | 2,421 | $20,579 | 11,843 | $100,666
5% | 163 | 2,880 | $24,480 | 13,749 | $116,867
SAVE $90K WITH MACHINE LEARNING
Assumes a direct-sales cold call costs $8.50 per contact (40 houses/hour x 5% at home = 2 contacts/hour; $17/hour ÷ 2 = $8.50)
$92,387 savings (5% threshold: $116,867 - $24,480)
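The cost columns follow directly from the $8.50-per-contact assumption: cumulative cost is the number of calls times $8.50, rounded to whole dollars. This sketch reproduces that arithmetic from the slide's three inputs. It uses `Decimal` with half-up rounding because Python's built-in `round` rounds halves to even, which would turn $116,866.50 into $116,866 instead of the table's $116,867. The constant and function names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURLY_WAGE = Decimal("17")      # $/hour for a canvasser (slide assumption)
HOUSES_PER_HOUR = Decimal("40")  # houses reached per hour (slide assumption)
AT_HOME_RATE = Decimal("0.05")   # share of houses where someone is home (slide assumption)


def cost_per_contact():
    """Dollars per actual conversation: $17 / (40 houses x 5%) = $8.50."""
    return HOURLY_WAGE / (HOUSES_PER_HOUR * AT_HOME_RATE)


def campaign_cost(calls):
    """Cumulative cost of `calls` contacts, rounded half-up to whole dollars."""
    cost = cost_per_contact() * calls
    return int(cost.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# threshold -> (sales, ML calls, naive calls), copied from the table
RESULTS = {
    "15%": (85, 1181, 4285),
    "10%": (123, 1811, 7856),
    "6.75%": (146, 2421, 11843),
    "5%": (163, 2880, 13749),
}

for threshold, (sales, ml_calls, naive_calls) in RESULTS.items():
    ml_cost, naive_cost = campaign_cost(ml_calls), campaign_cost(naive_calls)
    print(f"{threshold}: {sales} sales, ML ${ml_cost:,} vs naive ${naive_cost:,}, "
          f"saves ${naive_cost - ml_cost:,}")
```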
PROOF OF CONCEPT
whowantssolar.com
Show solar installers which homeowners in their area are most likely to adopt solar
Provide the methodology to run this analysis in new markets
Beta map showing Boston results
PROOF OF CONCEPT
whosgotsolar.com
Most people considering solar want to talk to someone in their neighborhood who has already installed solar.
whosgotsolar.com shows homeowners the names & addresses of nearby neighbors who have installed solar
‘Shovel-ready’ cities: San Francisco, Seattle, LA County, Austin, Minnesota, Washington DC, New York City
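The slides don't say how whosgotsolar.com decides which neighbors count as "nearby". One simple approach, assumed here, is a great-circle (haversine) radius filter around the homeowner's location, sorted by distance. The function names, the 0.5-mile default radius, and the example addresses are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearby_installations(home, installations, radius_miles=0.5):
    """Installations within radius_miles of home, nearest first, as (miles, address)."""
    hits = []
    for inst in installations:
        d = haversine_miles(home[0], home[1], inst["lat"], inst["lon"])
        if d <= radius_miles:
            hits.append((round(d, 2), inst["address"]))
    return sorted(hits)


# Hypothetical data: one install about 0.07 miles away, one about 3.5 miles away.
installations = [
    {"address": "12 Example St", "lat": 42.3510, "lon": -71.0700},
    {"address": "400 Example Ave", "lat": 42.4000, "lon": -71.0700},
]
result = nearby_installations((42.3500, -71.0700), installations)
print(result)
```

For a city-sized list of permits, a linear scan like this is usually fast enough. A spatial index would only matter at much larger scale.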