Data science and engineering for local weather forecasts
Nikhil R Podduturi, Data {Scientist, Engineer}
November, 2016
Agenda
● About MeteoGroup
● Introduction to weather data
● Problem description
● Data science and weather forecasting
● Engineering
● Verification
● Results
● Questions
How many of you check weather forecasts frequently?
Weather data
1.5 TB/day
Types of data
Observations:
● WMO weather stations (e.g. surface, upper-air, ships, drifting buoys, aircraft, etc.)
● MeteoGroup measurement network
Satellite data
Radar data
User data
Numerical weather prediction model data
Numerical weather prediction models
● Complex and multidimensional data
● 5 NWP models from different providers
● Data size per day: 0.5 TB
Data science and weather forecasting
Outcome
● Took 24 hours for 24-hour forecasts
● Grid interval: 736 km
● Poor results
MeteoGroup forecasting system
[Diagram labels: 3 years of NWP data; 3 years of observation data; machine learning model; trained model; daily NWP data; forecasts]
MeteoGroup forecasting system
Written in Pascal
Runs on an in-house high-performance computing cluster
Limitations
● Hard to maintain
● Not very transparent
● Scalability
Problem description
Next generation forecasting system
● Cloud-based solution
● Transparent
● Scalable
● Improve forecasting accuracy
Baseline model
NWP data → Downscale to location → Interpolate missing values → Linear model
Outcome:
● Very fast
● Poor accuracy
● Multicollinearity
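As a rough illustration of the last two baseline steps (interpolate missing values, then fit a linear model), here is a minimal sketch in plain Python. The function names `interpolate_missing` and `fit_line`, the single-predictor setup, and the sample numbers are all assumptions made for illustration; the slides do not show the actual implementation.

```python
def interpolate_missing(values):
    """Fill None gaps by linear interpolation between known neighbours.

    Leading or trailing gaps are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one known value")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical NWP temperatures (deg C) at one location, with gaps,
# and hypothetical station observations for the same hours.
nwp_temp = [10.0, None, 14.0, 15.0, None, None, 12.0]
observed = [11.0, 13.0, 15.0, 16.0, 15.0, 14.0, 13.0]

nwp_filled = interpolate_missing(nwp_temp)
slope, intercept = fit_line(nwp_filled, observed)
forecast = [slope * x + intercept for x in nwp_filled]
```

With a single predictor the fit is trivially fast, which matches the "very fast" outcome; the multicollinearity issue only appears once several correlated NWP variables are used as predictors together.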
Iteration 1
● Address multicollinearity using feature selection
● Scale the features
NWP data → Downscale to location → Interpolate missing values → Feature selection → Scale features → Linear model
Outcome:
● Improved accuracy
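One simple way to picture the two additions in this iteration is a correlation filter followed by standardization. The sketch below is an assumption about what such steps could look like, not the method used in the talk: `drop_collinear`, the 0.95 threshold, and the feature names are hypothetical.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def drop_collinear(features, threshold=0.95):
    """Greedily keep a feature only if its |correlation| with every
    already-kept feature is below the threshold."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) < threshold for other in kept.values()):
            kept[name] = column
    return kept


def standardize(column):
    """Scale a column to zero mean and unit (population) standard deviation."""
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]


# Hypothetical NWP features; temperature in Kelvin duplicates Celsius.
features = {
    "temp_c": [10.0, 12.0, 14.0, 15.0, 13.0],
    "temp_k": [283.15, 285.15, 287.15, 288.15, 286.15],
    "wind_ms": [3.0, 2.0, 6.0, 1.0, 4.0],
}

selected = drop_collinear(features)
scaled = {name: standardize(col) for name, col in selected.items()}
```

Here `temp_k` is dropped because it is perfectly correlated with `temp_c`, while `wind_ms` is kept.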
Iteration 2
● Model selection between linear and non-linear models
● Advanced feature selection
NWP data → Downscale to location → Interpolate missing values → Advanced feature selection → Scale features → Model selection (linear and non-linear models)
Outcome:
● On par with existing forecasting system
● Slow training
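Model selection between a linear and a non-linear candidate can be sketched as "fit each candidate on training data, keep the one with the lowest validation error". The example below is only an illustration under assumptions: the slides do not say which non-linear models were tried, so a tiny k-nearest-neighbours regressor stands in for them, and `select_model` and the toy data are hypothetical.

```python
def fit_linear(x, y):
    """Least-squares line; returns a prediction function."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return lambda q: my + slope * (q - mx)


def fit_knn(x, y, k=2):
    """k-nearest-neighbours regressor (a stand-in non-linear model)."""
    pairs = list(zip(x, y))

    def predict(q):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - q))[:k]
        return sum(target for _, target in nearest) / k

    return predict


def mae(model, x, y):
    """Mean absolute error of a prediction function on (x, y)."""
    return sum(abs(model(a) - b) for a, b in zip(x, y)) / len(x)


def select_model(candidates, train, valid):
    """Fit every candidate on train, score on valid, return the best."""
    fitted = {name: fit(*train) for name, fit in candidates.items()}
    scores = {name: mae(model, *valid) for name, model in fitted.items()}
    best = min(scores, key=scores.get)
    return best, fitted[best], scores


# Toy data with a non-linear (quadratic) relationship.
train = ([-3, -2, -1, 0, 1, 2, 3], [9, 4, 1, 0, 1, 4, 9])
valid = ([-2.5, -0.5, 1.5, 2.5], [6.25, 0.25, 2.25, 6.25])

best, model, scores = select_model(
    {"linear": fit_linear, "knn": fit_knn}, train, valid
)
```

Repeating this comparison for every location, attribute, and forecast hour is one plausible reason training became slow at this stage.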
Engineering to scale the product
Model engineering
(Scikit-learn, NumPy, Keras with TensorFlow)
Good:
● Python ML ecosystem
● Familiarity among the team
● Test-driven and Agile development
● Fail fast
Bad:
● Not scalable
47000 * 15 * 360 model runs
Locations * Weather attributes (e.g. temperature, wind, etc.) * Hours
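To make the scale concrete, the product can be worked out directly. The sketch below assumes the three factors map, in order, to locations, weather attributes, and hours, as the slide labels suggest; `enumerate_runs` and the sample names are hypothetical.

```python
from itertools import product

N_LOCATIONS = 47_000   # assumed: first factor on the slide
N_ATTRIBUTES = 15      # assumed: second factor (temperature, wind, ...)
N_HOURS = 360          # assumed: third factor

total_runs = N_LOCATIONS * N_ATTRIBUTES * N_HOURS


def enumerate_runs(locations, attributes, hours):
    """Yield one (location, attribute, hour) key per independent model run."""
    return product(locations, attributes, hours)


# Tiny hypothetical subset to show the shape of the work list.
sample = list(enumerate_runs(["loc_a", "loc_b"], ["temperature", "wind"], range(3)))

print(f"{total_runs:,} model runs in total")  # 253,800,000 model runs in total
print(len(sample), "runs in the sample")      # 12 runs in the sample
```

Because each key is independent of the others, the work list can be split across many workers, which is what motivates a scheduler and a cluster.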
Scaling with Apache Airflow
Apache Airflow
• By Airbnb
• Apache product since early 2016
Directed Acyclic Graph (DAG)
Components
• UI
• Scheduler
• Executor(s)
Apache Airflow DAG
● Hooks (connections)
● Operators (tasks)
● Schedule
● Dependencies
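The idea of tasks plus dependencies forming a directed acyclic graph can be shown with the Python standard library alone. This is deliberately not Airflow's API: it only illustrates how declared dependencies determine a valid execution order. The task names are hypothetical, loosely following the pipeline steps named earlier in the deck.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each task maps to the set of tasks it depends on (its upstream tasks).
dependencies = {
    "downscale_to_location": {"load_nwp_data"},
    "interpolate_missing": {"downscale_to_location"},
    "select_features": {"interpolate_missing"},
    "scale_features": {"select_features"},
    "train_model": {"scale_features"},
    "persist_to_s3": {"train_model"},
}


def run_order(deps):
    """Return one valid execution order.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    i.e. if the graph is not acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
for step, task in enumerate(order, start=1):
    print(step, task)
```

In Airflow itself, operators play the role of the task names here, and the scheduler decides when each task whose upstream tasks have finished may be handed to an executor.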
Airflow and Mesos
[Diagram labels: continuous integration; deploy; Airflow scheduler; Mesos cluster; persist; AWS S3]
Verification
Model improvement cycle
Deploy DAG → Verify model → Improve DAG → Deploy DAG (repeat)
Forecast verification
[Diagram labels: AWS S3 with models; Forecast Engine; JSON-LD]
Verification metrics
● Mean absolute error
● Root mean squared error
● Mean error
● Heidke skill score
● Equitable threat score
● Probability density functions
● Error percentiles
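Several of these metrics have short standard definitions. The sketch below shows the three continuous error measures and the two categorical skill scores computed from a 2x2 contingency table (hits, false alarms, misses, correct negatives). The function names and sample values are illustrative assumptions, not the verification service's code.

```python
from math import sqrt


def mean_error(forecast, observed):
    """Bias: positive means the forecast is too high on average."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)


def mean_absolute_error(forecast, observed):
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)


def root_mean_squared_error(forecast, observed):
    return sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast))


def contingency(forecast_event, observed_event):
    """Count (hits, false alarms, misses, correct negatives) for a yes/no event."""
    pairs = list(zip(forecast_event, observed_event))
    hits = sum(1 for f, o in pairs if f and o)
    false_alarms = sum(1 for f, o in pairs if f and not o)
    misses = sum(1 for f, o in pairs if not f and o)
    correct_negatives = sum(1 for f, o in pairs if not f and not o)
    return hits, false_alarms, misses, correct_negatives


def heidke_skill_score(a, b, c, d):
    """HSS = 2(ad - bc) / [(a + c)(c + d) + (a + b)(b + d)]."""
    return 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))


def equitable_threat_score(a, b, c, d):
    """ETS = (a - a_r) / (a + b + c - a_r), with a_r the hits expected by chance."""
    a_random = (a + b) * (a + c) / (a + b + c + d)
    return (a - a_random) / (a + b + c - a_random)


# Hypothetical temperature forecasts and observations (deg C).
fc = [12.0, 15.0, 9.0, 20.0]
ob = [11.0, 15.0, 11.0, 18.0]
me = mean_error(fc, ob)                 # 0.25
mae = mean_absolute_error(fc, ob)       # 1.25
rmse = root_mean_squared_error(fc, ob)  # 1.5

# Hypothetical yes/no event (e.g. rain) forecasts and observations.
fc_rain = [True, True, False, False, True, False, False, False]
ob_rain = [True, False, True, False, True, False, False, False]
table = contingency(fc_rain, ob_rain)   # (2, 1, 1, 4)
hss = heidke_skill_score(*table)
ets = equitable_threat_score(*table)
```

Mean error exposes systematic bias that mean absolute error hides, which is why both appear on the list; the two skill scores measure how much better the yes/no forecasts are than chance.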
Mean absolute error for different models (Temperature)
Probability distribution function for multiple models (Temperature)
Percentile graphs for each model (Temperature)
For a demo, please stop by the MG booth
Results
Cloud-based solution
● AWS S3, EC2, ElastiCache
Transparent
● Verification microservice
Scalable
● Mesos cluster
● Training time: from a month to 5 hours (approx.)
Improve forecasting accuracy
● On par or better
Improvements
Hyperlocal
AWS Lambda integration
Iterate for more accuracy
Questions?
We are hiring!