Who’s Got Solar? David Stier Principal David Stier + Associates

Applying Machine Learning to Open Data Sets to find new customers



THE GOAL

Leverage cities’ open data and apply machine learning:

To show who’s got solar

To predict who wants solar

BENEFITS

Level the playing field for installers and consumers

Reduce the cost of selling solar

Make selecting solar easier

Encourage cities to provide open data

THE PROJECT

Using publicly available information, how well can we model the adoption of solar panel systems among residential homeowners?

HYPOTHESIS

Machine learning applied to parcel-level data will generate significantly greater profit than a ‘naïve’ approach

BACKGROUND

NREL’s SEEDS program has funded over a dozen studies to understand who will adopt residential solar

To date, no study has used parcel-level information as its main data source

HELLO, I’m Boston

Study focused on solar panel use in Boston:

• 48 square miles | 660k+ population
• Largest city in New England | 23rd largest city in the US
• Significant number of green initiatives at the local level, and the state has been an active sponsor of solar

2015 RESIDENTIAL SOLAR MARKET SIZE (INSTALLATIONS)

United States: 107,000

MA: 8,700

Boston: 650

Owner-occupied single-family homes: 400


Boston solar panel installations, 2011-2015 (chart values: single family / two-three family)

2011: 25 / 16

2012: 142 / 94

2013: 177 / 72

2014: 295 / 97

2015: 438 / 216

DATASETS CHOSEN

US census tract-level data on housing and population: 7 tables with 830 fields in total

DP04 Selected Housing Characteristics (570 fields)

B25096 Mortgage Status by Value (37 fields)

B25093 Age of Householder by Selected Monthly Owner Costs as a Percentage of Household Income in the Past 12 Months (63 fields)

B01001 Population by Age (103 fields)

B05001 Citizenship Status (17 fields)

B19049 Median Household Income in the Past 12 Months (Inflation-Adjusted) by Age of Householder (15 fields)

B25040 House Heating Fuel (25 fields)
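Because these census fields exist only at the tract level, attaching them to parcels is a keyed join on the census tract. A minimal sketch in plain Python, assuming each parcel record carries a hypothetical `tract` key holding the tract GEOID and each table has been loaded as a dict of GEOID to field values; the table-ID prefix on column names is an illustrative choice, not the author's schema:

```python
from typing import Dict, List

# Hypothetical shapes: a parcel is a dict with a "tract" key (census tract GEOID);
# a census table maps GEOID -> {field_name: value}.
Parcel = Dict[str, object]
CensusTable = Dict[str, Dict[str, float]]


def attach_tract_features(parcels: List[Parcel],
                          tables: Dict[str, CensusTable]) -> List[Parcel]:
    """Return copies of the parcels with every census field attached.

    Columns are prefixed with the table ID (e.g. "B25040.gas") so identically
    named fields from different tables cannot collide. Parcels whose tract is
    missing from a table get None for that table's fields.
    """
    # Full field list per table, so parcels in missing tracts still get every column.
    fields = {tid: sorted({f for row in t.values() for f in row})
              for tid, t in tables.items()}
    enriched = []
    for parcel in parcels:
        row = dict(parcel)  # copy so the caller's records are untouched
        tract = parcel.get("tract")
        for tid, table in tables.items():
            values = table.get(tract, {})
            for f in fields[tid]:
                row[f"{tid}.{f}"] = values.get(f)
        enriched.append(row)
    return enriched
```

Every parcel in the same tract receives identical census values, so these columns describe the neighborhood rather than the household.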

BY THE NUMBERS

6 datasets | 1.4M records | 4k installations | 6-year period

Final cleaned dataset: 25,303 × 1,628

GEOGRAPHIC DATA: STRUCTURING AND SUMMARIZING LOCATION DATA

STREETS ARE THE KEY

- STREET NAME AND TYPE

- HOUSE NUMBER

Although house numbers are ‘ordered’, significant variations exist; they are still useful for grouping into 0-100 increments (address blocks)

Example: Number of parcels per block
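Grouping by street name plus a 0-100 house-number increment can be sketched as below. This is an illustration, not the author's code: it assumes house numbers arrive as free text (e.g. "12A", "12-14") and uses only their leading digits; the label format "0-100 MAIN ST" is a hypothetical choice.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple


def address_block(house_number: str, street: str, size: int = 100) -> Optional[str]:
    """Map an address to a block label such as '0-100 MAIN ST'.

    Only the leading digits of the house number are used, so '12A' and
    '12-14' both land in the 0-100 block. Returns None if no digits are found.
    """
    match = re.match(r"\s*(\d+)", house_number)
    if match is None:
        return None
    start = int(match.group(1)) // size * size
    return f"{start}-{start + size} {street.strip().upper()}"


def parcels_per_block(addresses: Iterable[Tuple[str, str]]) -> Counter:
    """Count parcels per address block, skipping unparseable house numbers."""
    counts: Counter = Counter()
    for number, street in addresses:
        block = address_block(number, street)
        if block is not None:
            counts[block] += 1
    return counts
```

Odd and even sides of the street fall in the same block here; splitting them would be a one-line change if that turned out to matter.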


- GRID-BASED GROUPING

Create ½-mile square grids and also ¼-mile subgrids

Example: Half-mile grid based on lat/lon (shown)
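A lat/lon grid like this can be built with a flat-earth approximation, which is adequate over a 48-square-mile city. A minimal sketch, assuming a hypothetical south-west `origin` point and roughly 69 miles per degree of latitude (the author's exact projection is not stated):

```python
import math
from typing import Tuple

MILES_PER_DEG_LAT = 69.0  # approximate; fine at city scale


def grid_cell(lat: float, lon: float, origin: Tuple[float, float],
              cell_miles: float) -> Tuple[int, int]:
    """Return (row, col) of the square cell containing (lat, lon).

    Uses an equirectangular approximation around the origin's latitude:
    rows increase northward, columns increase eastward.
    """
    lat0, lon0 = origin
    miles_per_deg_lon = MILES_PER_DEG_LAT * math.cos(math.radians(lat0))
    north = (lat - lat0) * MILES_PER_DEG_LAT
    east = (lon - lon0) * miles_per_deg_lon
    return math.floor(north / cell_miles), math.floor(east / cell_miles)


def half_and_quarter(lat: float, lon: float,
                     origin: Tuple[float, float]) -> Tuple[Tuple[int, int], int]:
    """Return the ½-mile cell and which of its four ¼-mile subcells holds the point.

    Subcells are numbered 0 = SW, 1 = SE, 2 = NW, 3 = NE.
    """
    half = grid_cell(lat, lon, origin, 0.5)
    q_row, q_col = grid_cell(lat, lon, origin, 0.25)
    sub = (q_row - 2 * half[0]) * 2 + (q_col - 2 * half[1])
    return half, sub
```

Keying a parcel by the pair (½-mile cell, subcell) gives one plausible reading of the "¼ mile combos" counted on the next slide.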


- SOCIAL/POLITICAL GROUPING

Census tract (federal) | Neighborhood (municipal)

Example: 2010 US Census Tracts


Number of groups at each level: 5,748 ¼-mile combos | 5,369 address blocks | 3,453 streets | 761 ½-mile grids | 165 census tracts | 16 neighborhoods
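Each grouping level can then feed an engineered feature such as "share of parcels in my group that have solar". A minimal sketch, assuming parcels are dicts with one key per level and a hypothetical boolean `has_solar` label; the leave-one-out step (excluding the parcel's own label) is this sketch's design choice to avoid leaking the target, not something the slides state:

```python
from collections import defaultdict
from typing import Dict, List


def add_group_rates(parcels: List[dict], levels: List[str],
                    label: str = "has_solar") -> List[dict]:
    """Attach, for each grouping level, the share of *other* parcels in the same
    group that have solar. Leave-one-out keeps a parcel's own label out of its
    features; a group with a single parcel gets 0.0."""
    out = [dict(p) for p in parcels]
    for level in levels:
        totals: Dict[object, int] = defaultdict(int)
        adopters: Dict[object, int] = defaultdict(int)
        for p in parcels:
            totals[p[level]] += 1
            adopters[p[level]] += int(bool(p[label]))
        for p, row in zip(parcels, out):
            others = totals[p[level]] - 1
            own = int(bool(p[label]))
            row[f"{level}_solar_rate"] = (adopters[p[level]] - own) / others if others else 0.0
    return out
```

Calling it with levels such as address block, ¼-mile combo, census tract and neighborhood yields one rate column per level.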

NAÏVE MODEL

Sort all address blocks by the % of parcels that have solar, then by the number of prospects
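The naïve baseline is just a two-key sort. A sketch, assuming each block is summarized as a hypothetical (parcels with solar, total parcels) pair and that "prospects" means parcels without solar:

```python
from typing import Dict, List, Tuple


def naive_call_order(blocks: Dict[str, Tuple[int, int]]) -> List[str]:
    """Rank address blocks for cold calling.

    blocks maps a block label to (parcels_with_solar, total_parcels).
    Order: highest solar share first, ties broken by the most prospects
    (parcels without solar), then by label so the order is deterministic.
    """
    def key(item: Tuple[str, Tuple[int, int]]):
        label, (solar, total) = item
        share = solar / total if total else 0.0
        prospects = total - solar
        return (-share, -prospects, label)

    return [label for label, _ in sorted(blocks.items(), key=key)]
```

A rep would then work down this list, calling every prospect in each block before moving to the next.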

RANDOM FOREST | STRATIFIED SHUFFLE SPLIT

Holdout: 2016 installations (n=219)

Split: 50/50 train/test
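Because adopters are a small minority of parcels, a stratified split keeps the solar share equal in train and test. scikit-learn offers this as `StratifiedShuffleSplit`; the stdlib sketch below shows the idea for a single 50/50 split. How the 2016 holdout was applied is an assumption here: parcels whose (hypothetical) install year is 2016 are set aside before splitting and kept for an out-of-time check.

```python
import random
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_size: float = 0.5,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices within each class and send test_size of each class to
    test, so both sides keep roughly the same share of adopters."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def split_with_holdout(labels: Sequence[int], install_years: Sequence[Optional[int]],
                       holdout_year: int = 2016, seed: int = 0
                       ) -> Tuple[List[int], List[int], List[int]]:
    """Set aside parcels installed in holdout_year, then stratify the rest."""
    holdout = [i for i, y in enumerate(install_years) if y == holdout_year]
    rest = [i for i, y in enumerate(install_years) if y != holdout_year]
    tr, te = stratified_split([labels[i] for i in rest], seed=seed)
    return [rest[i] for i in tr], [rest[i] for i in te], holdout
```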

Feature-level contribution to classification path
Top 100 features, grouped by source, type & aggregation level

Raw = parcel level; Engineered = address block (e.g. 0-100 Main), parcel level, ¼ mile, census tract, neighborhood, ½ mile

Data source | Raw: parcel | Eng: address block | Eng: parcel | Eng: ¼ mile | Eng: census tract | Eng: neighborhood | Eng: ½ mile | Total
permit | - | 0.137 | 0.013 | 0.048 | 0.002 | - | 0.010 | 0.210
assessor | 0.052 | 0.045 | 0.030 | 0.022 | 0.016 | 0.031 | - | 0.196
311 | 0.011 | 0.017 | 0.056 | 0.016 | 0.008 | 0.004 | - | 0.111
Assessor webscrape | 0.010 | 0.003 | - | - | - | - | - | 0.013
Top 100 | 0.073 | 0.201 | 0.099 | 0.087 | 0.026 | 0.035 | 0.010 | 0.530
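A table like this can be produced by summing per-feature importances over the top 100 features by (source, aggregation level). A sketch, assuming the importances come from something like scikit-learn's `RandomForestClassifier.feature_importances_` zipped with column names, and that columns follow a hypothetical `source|level|name` naming scheme:

```python
from collections import defaultdict
from typing import Dict, Tuple


def group_importances(importances: Dict[str, float], top_n: int = 100
                      ) -> Dict[Tuple[str, str], float]:
    """Sum the importance of the top_n features by (data source, level).

    Assumes hypothetical feature names of the form 'source|level|name',
    e.g. 'permit|address_block|solar_rate'; other names raise ValueError.
    """
    top = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    grouped: Dict[Tuple[str, str], float] = defaultdict(float)
    for name, value in top:
        source, level, _ = name.split("|", 2)
        grouped[(source, level)] += value
    return dict(grouped)
```

Row totals (per source) and column totals (per level) are then simple sums over the returned dict.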

Features that I thought might have more impact…
- Heat type: ranked 145th (address block average)
- Roof type: ranked 411th

FIND PROFIT IN YOUR CONFUSION MATRIX

BUYERS

NO SALE

GLAD I PITCHED THEM

ALL TALK, NO SALE

Inaccuracy = opportunity

Prediction threshold | Sales | Machine learning: # calls | Machine learning: cumul. cost | Naïve method: # calls | Naïve method: cumul. cost
15% | 85 | 1,181 | $10,039 | 4,285 | $36,423
10% | 123 | 1,811 | $15,394 | 7,856 | $66,776
6.75% | 146 | 2,421 | $20,579 | 11,843 | $100,666
5% | 163 | 2,880 | $24,480 | 13,749 | $116,867
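On the machine-learning side, each row corresponds to calling every parcel whose predicted adoption probability is at or above the threshold. A sketch of that sweep, assuming hypothetical lists of model probabilities and observed adoption outcomes:

```python
from typing import List, Sequence, Tuple


def calls_and_sales(probs: Sequence[float], adopted: Sequence[bool],
                    thresholds: Sequence[float]) -> List[Tuple[float, int, int]]:
    """For each threshold, count the parcels to call (predicted probability at
    or above the threshold) and how many of those actually adopted solar."""
    rows = []
    for t in thresholds:
        calls = sales = 0
        for p, a in zip(probs, adopted):
            if p >= t:
                calls += 1
                sales += int(a)
        rows.append((t, calls, sales))
    return rows
```

Lowering the threshold buys more sales at the price of more calls; the naïve column instead counts how far down the sorted address-block list a rep must go to reach the same number of sales.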

SAVE $90K WITH MACHINE LEARNING

Assume a direct-sales cold call costs $8.50 per contact (40 houses/hour | 5% at home | $17/hour)

$92,387 savings (at the 5% threshold: $116,867 for the naïve method minus $24,480 with machine learning)
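The cost assumption works out as 40 houses/hour × 5% at home = 2 contacts/hour, so $17/hour ÷ 2 = $8.50 per contact; each cumulative cost above is calls × $8.50 rounded to the dollar (e.g. 1,181 × $8.50 = $10,038.50). A small sketch of that arithmetic; the function names are illustrative:

```python
def cost_per_contact(houses_per_hour: float, at_home_rate: float,
                     wage_per_hour: float) -> float:
    """Labor cost of each homeowner actually reached by a door-to-door rep."""
    contacts_per_hour = houses_per_hour * at_home_rate
    return wage_per_hour / contacts_per_hour


def savings(ml_calls: int, naive_calls: int, per_contact: float) -> float:
    """Cost difference between working the naïve list and the ML list
    to reach the same number of sales."""
    return (naive_calls - ml_calls) * per_contact
```

At the 5% threshold, (13,749 − 2,880) × $8.50 = $92,386.50; the slide's $92,387 comes from subtracting the already-rounded cumulative costs.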

PROOF OF CONCEPT

whowantssolar.com

Show solar installers which homeowners in their area are most likely to adopt solar

Provide the methodology to run this analysis in new markets

Beta map showing Boston results


whosgotsolar.com

Most people considering solar want to talk to someone in their neighborhood who has already installed solar.

whosgotsolar.com shows homeowners the names & addresses of nearby neighbors who have installed solar
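Finding "nearby neighbors who have installed solar" is a radius search around the homeowner's location. A minimal sketch using great-circle (haversine) distance, assuming installations are available as hypothetical (address, (lat, lon)) pairs, for example geocoded from permit records; the 0.5-mile default radius and limit of 5 are illustrative:

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_MILES = 3958.8
LatLon = Tuple[float, float]


def haversine_miles(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def nearby_installs(home: LatLon, installs: Sequence[Tuple[str, LatLon]],
                    radius_miles: float = 0.5, limit: int = 5) -> List[Tuple[str, float]]:
    """Return up to `limit` installations within the radius, closest first,
    as (address, distance in miles rounded to 0.01)."""
    hits = []
    for address, loc in installs:
        d = haversine_miles(home, loc)
        if d <= radius_miles:
            hits.append((d, address))
    hits.sort()
    return [(address, round(d, 2)) for d, address in hits[:limit]]
```

A linear scan is fine for a few thousand installations; a spatial index would only matter at much larger scale.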

‘Shovel-ready’ cities: San Francisco, Seattle, LA County, Austin, Minnesota, Washington DC, New York City

Thank you!