Data science and engineering for local weather… · Data science and engineering for local weather forecasts Nikhil R Podduturi Data {Scientist ... Apache Airflow • By AirBnB


Data science and engineering for local weather forecasts

Nikhil R Podduturi

Data {Scientist, Engineer}

November, 2016

Agenda

● About MeteoGroup

● Introduction to weather data

● Problem description

● Data science and weather forecasting

● Engineering

● Verification

● Results

● Questions

How many of you check weather forecasts frequently?

Weather data

1.5 TB/day

Types of data

Observations:

● WMO weather stations (e.g. surface, upper-air, ships, drifting buoys, aircraft, etc.)

● MeteoGroup measurement network

Satellite data

Radar data

User data

Numerical weather prediction model data

Numerical weather prediction models

● Complex and multidimensional data

● 5 NWP models from different providers

● Data size per day: 0.5 TB

Data science and weather forecasting

Outcome

● Took 24 hours for 24-hour forecasts

● Grid interval: 736 km

● Poor results

MeteoGroup forecasting system

[Diagram: 3 years of NWP data and 3 years of observation data train a machine learning model; the trained model takes daily NWP data and produces forecasts]

Written in Pascal

Runs on an in-house high-performance computing cluster

Limitations

● Hard to maintain

● Not very transparent

● Scalability

Problem description

Next generation forecasting system

● Cloud-based solution

● Transparent

● Scalable

● Improve forecasting accuracy

Baseline model

[Pipeline diagram: NWP data, downscale to location, interpolate missing values, linear model]

Outcome:

● Very fast

● Poor accuracy

● Multicollinearity
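The baseline stages can be sketched in plain Python. This is only an illustration under stated assumptions, not MeteoGroup's implementation: the function names, the single-predictor setup, and the sample values are all hypothetical, and the gap filling shown is simple linear interpolation.

```python
from typing import List, Optional, Tuple


def interpolate_missing(series: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between known neighbours.

    Gaps at either end are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    filled: List[float] = []
    for i, value in enumerate(series):
        if value is not None:
            filled.append(value)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(series[right])
        elif right is None:
            filled.append(series[left])
        else:
            weight = (i - left) / (right - left)
            filled.append(series[left] + weight * (series[right] - series[left]))
    return filled


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical NWP temperatures downscaled to one location, with gaps.
nwp_temperature = interpolate_missing([10.0, None, 14.0, 16.0, None])
# Hypothetical station observations for the same hours.
observed = [11.0, 13.0, 15.0, 17.0, 17.0]
slope, intercept = fit_line(nwp_temperature, observed)
```

With several correlated NWP predictors instead of one, the least-squares coefficients become unstable, which is the multicollinearity problem listed in the outcome.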

Iteration 1

● Address multicollinearity using feature selection

● Scale the features

[Pipeline diagram: NWP data, downscale to location, interpolate missing values, feature selection, scale features, linear model]

Outcome:

● Improved accuracy
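One simple way to carry out both steps is to drop any feature that is almost perfectly correlated with one already kept, then standardize what remains. The slides do not say which selection method was used, so the greedy correlation filter, the 0.95 threshold, and the feature names below are assumptions for illustration.

```python
from math import sqrt
from typing import Dict, List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def drop_collinear(features: Dict[str, List[float]], threshold: float = 0.95) -> List[str]:
    """Keep a feature only if it is not highly correlated with any kept feature."""
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def standardize(values: List[float]) -> List[float]:
    """Scale to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


# Hypothetical predictors for one location.
features = {
    "temperature_2m": [10.0, 12.0, 14.0, 16.0],
    "temperature_850hpa": [2.0, 4.1, 5.9, 8.0],  # nearly a linear copy of temperature_2m
    "wind_speed": [5.0, 3.0, 6.0, 2.0],
}
selected = drop_collinear(features)
scaled = {name: standardize(features[name]) for name in selected}
```

Here `temperature_850hpa` is dropped because it carries almost the same information as `temperature_2m`, while `wind_speed` is kept.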

Iteration 2

● Model selection between linear and non-linear models

● Advanced feature selection

[Pipeline diagram: NWP data, downscale to location, interpolate missing values, advanced feature selection, scale features, model selection (linear and non-linear models)]

Outcome:

● On par with existing forecasting system

● Slow training
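Model selection of this kind usually means fitting each candidate on training data and keeping the one with the lowest error on held-out data. The sketch below assumes that approach; the nearest-neighbour regressor stands in for "a non-linear model", and all names and data are hypothetical rather than taken from the talk.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], float]
Dataset = Tuple[List[float], List[float]]


def fit_linear(x: List[float], y: List[float]) -> Model:
    """Least-squares line through the training points."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda value: slope * value + intercept


def fit_nearest_neighbours(x: List[float], y: List[float], k: int = 2) -> Model:
    """Non-linear stand-in: average the targets of the k closest training points."""
    pairs = list(zip(x, y))

    def predict(value: float) -> float:
        nearest = sorted(pairs, key=lambda p: abs(p[0] - value))[:k]
        return sum(target for _, target in nearest) / len(nearest)

    return predict


def mean_absolute_error(model: Model, x: List[float], y: List[float]) -> float:
    return sum(abs(model(a) - b) for a, b in zip(x, y)) / len(x)


def select_model(train: Dataset, validation: Dataset) -> Tuple[str, Dict[str, float]]:
    """Fit every candidate on train and return the name with the lowest validation MAE."""
    candidates = {"linear": fit_linear, "nearest_neighbours": fit_nearest_neighbours}
    scores = {
        name: mean_absolute_error(fit(*train), *validation)
        for name, fit in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical data with a curved relationship, so the non-linear model should win.
train = ([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0])
validation = ([0.5, 2.5, 4.5], [0.25, 6.25, 20.25])
best, scores = select_model(train, validation)
```

Repeating this comparison for every location, attribute, and forecast hour multiplies the training cost, which is consistent with the slow training noted in the outcome.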

Engineering to scale the product

Baseline model engineering

(Scikit-learn, NumPy, Keras with TensorFlow)

Model engineering

Good:

● Python ML ecosystem

● Familiarity among the team

● Test-driven and agile development

● Fail fast

Bad:

● Not scalable

47000 * 15 * 360 model runs

Locations × Weather attributes (e.g. temperature, wind, etc.) × Hours
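The product works out to roughly a quarter of a billion model runs. The short calculation below makes that explicit; the batching helper and the batch size of one million are hypothetical, added only to show how such a workload might be split into schedulable tasks.

```python
LOCATIONS = 47_000
ATTRIBUTES = 15
HOURS = 360

# One model run per (location, attribute, forecast hour) combination.
total_runs = LOCATIONS * ATTRIBUTES * HOURS


def batches_needed(total: int, batch_size: int) -> int:
    """Number of batches when each task handles batch_size runs (ceiling division)."""
    return -(-total // batch_size)


print(f"{total_runs:,} model runs")
print(batches_needed(total_runs, 1_000_000), "batches of one million runs")
```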

Scaling with Apache Airflow

Apache Airflow

• By Airbnb

• Apache product since early 2016

Directed Acyclic Graph (DAG)

Components

• UI

• Scheduler

• Executor(s)

Apache Airflow DAG

● Hooks (connections)

● Operators (tasks)

● Schedule

● Dependencies
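The idea of tasks plus dependencies can be shown without Airflow itself. The sketch below uses Python's standard-library `graphlib` (Python 3.9+), not Airflow's API, to derive a valid execution order from a dependency graph; the task names are hypothetical and loosely follow the pipeline stages from the talk. In Airflow, operators would play the role of the tasks and the scheduler would walk the graph.

```python
from graphlib import TopologicalSorter

# Each key is a task; its value is the set of tasks it depends on.
# Task names are hypothetical.
dependencies = {
    "downscale_to_location": {"fetch_nwp_data"},
    "interpolate_missing_values": {"downscale_to_location"},
    "select_features": {"interpolate_missing_values"},
    "scale_features": {"select_features"},
    "train_model": {"scale_features"},
    "persist_model": {"train_model"},
}

# static_order() yields every task after all of its dependencies.
# It raises graphlib.CycleError if the graph is not acyclic.
execution_order = list(TopologicalSorter(dependencies).static_order())

for step, task in enumerate(execution_order, start=1):
    print(step, task)
```

Because the graph is acyclic, independent branches (for example, one per location) could run in parallel, which is what makes the DAG model a fit for scaling out the training.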

Airflow and Mesos

[Diagram: continuous integration, deploy, Airflow scheduler, Mesos cluster, persist, AWS S3]

Verification

Model improvement cycle

[Cycle diagram: Deploy DAG, Verify model, Improve DAG]

Forecast verification

[Diagram: AWS S3 with models, Forecast Engine, JSON-LD]

Verification metrics

● Mean absolute error

● Root mean squared error

● Mean error

● Heidke skill score

● Equitable threat score

● Probability density functions

● Error percentiles

Mean absolute error for different models (Temperature)

Probability distribution function for multiple models (Temperature)

Percentile graphs for each model (Temperature)

For a demo, please stop by the MG booth

Results

Cloud-based solution

● AWS S3, EC2, ElastiCache

Transparent

● Verification microservice

Scalable

● Mesos cluster

● Training time: from a month to 5 hours (approx.)

Improve forecasting accuracy

● On par or better

Improvements

Hyperlocal

AWS Lambda integration

Iterate for more accuracy

Questions?

We are hiring!