Environmental Research with RapidMiner

Environmental Research with RapidMiner · 2019-08-05 · Warming up. How to solve these challenges? • Apply a Hydrodynamic Model to review tide directions. • Apply Predictive

About Me

Rodrigo Fuentealba Cartes

Lead Data Scientist and Senior Software Developer at Pegasus

Mr. Fuentealba has been using and developing open source technologies since 1995. His career in data science started in 2008, when he began building models for healthcare and for predictive maintenance of vessels.

Use Case

This project has been in development since 2016 as an effort to address environmental issues in the salmon farming process.

The Pegasus Group provides data science services and technology support to this project.

Background

• Chile

• World's 2nd largest farmed salmon exporter.

• Salmon farming is the 3rd largest economic activity.

• In 2017, it produced USD 4.5 billion in revenue.

Problem

• Sea Lice

• Deadly parasite that infests and damages salmonids.

• Threatens the environment, the communities and the local economy, both directly and indirectly.

• USD 350 million is spent to address it.

Sea Lice

Challenge

• Understanding how Caligus spreads.

• Predicting which salmon farms are in immediate danger.

• Evaluating the best antibiotic treatments against Caligus.

Warming up

How to solve these challenges?

• Apply a Hydrodynamic Model to review tide directions.

• Apply Predictive Analytics to detect farms in danger.

• Apply Machine Learning to evaluate the best treatments.

Methodology

• RMDS: Rod's Methodology for Data Science

• Understanding the Context.

• Asking the right Questions.

• Identifying the Nouns.

• Taking action with Verbs.

• Interpreting Answers.

RMDS vs CRISP-DM

• Context

• Questions

• Nouns (Data)

• Verbs (Processes)

• Answers

Infrastructure

GIS

DWH

CMM

Hydra

12 DBs

API

Connie

Dashboard

Applying Nouns and Verbs

GIS (Noun)

DWH (Verb)

CMM (Verb)

Hydra (Noun)

9 DBs (Noun)

API (Noun)

Connie (Noun)

Dashboards (Verb)

No Big Data

But there are massive amounts of it.

How much data do we have?

100 GB (stable)

47 GB (hourly)

10 GB (yearly)

300 GB (stable)

1 GB (hourly)

Challenge 1: How is the parasite spread?

How is Caligus spread?

• Hydrodynamic Model

• Streaming a 4D representation of the ocean (latitude, longitude, depth and time) in time-series format.

• Processing this representation with the Navier-Stokes equations and map/reducing it into the Connie Matrix.

(Think of an automatic BMP-to-SVG transformation, only a few million times heavier.)
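The reduction into the Connie Matrix can be pictured with a small sketch. It is not the project's pipeline: it assumes the hydrodynamic model has already produced simulated particle tracks, each stored as a hypothetical (source farm, final position) pair. The farm names, coordinates and arrival radius are invented for illustration.

```python
from collections import Counter
from math import hypot

# Hypothetical farm positions (x, y) in km; not real project data.
FARMS = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
ARRIVAL_RADIUS_KM = 2.0  # assumed distance at which a particle "reaches" a farm


def nearest_farm(position):
    """Return the closest farm within the arrival radius, or None."""
    best_name, best_dist = None, ARRIVAL_RADIUS_KM
    for name, (fx, fy) in FARMS.items():
        d = hypot(position[0] - fx, position[1] - fy)
        if d <= best_dist:
            best_name, best_dist = name, d
    return best_name


def connectivity_matrix(particles):
    """Reduce particle tracks into matrix[source][destination] arrival fractions."""
    released = Counter(source for source, _ in particles)
    arrivals = Counter()
    for source, final_position in particles:       # map: particle -> (source, dest)
        destination = nearest_farm(final_position)
        if destination is not None:
            arrivals[(source, destination)] += 1   # reduce: count per pair
    return {
        src: {dst: arrivals[(src, dst)] / released[src] for dst in FARMS}
        for src in released
    }


# Four particles released at farm A: two drift to B, one stays, one is lost at sea.
tracks = [("A", (9.5, 0.5)), ("A", (10.2, -0.3)), ("A", (0.1, 0.2)), ("A", (50.0, 50.0))]
matrix = connectivity_matrix(tracks)
print(matrix["A"])  # {'A': 0.25, 'B': 0.5, 'C': 0.0}
```

Each row of the result reads as "of the particles released here, what fraction ended up near each farm", which is the kind of farm-to-farm exposure a connectivity matrix is meant to capture.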

4D Representation of the Ocean

[Figure: grids of DATA cells laid out along the X, Y and Z dimensions, repeated along the TIME axis]

Performance of Hydrodynamic Model

[Chart: vertical axis from 0 to 3000; horizontal axis from 1500 down to 300, ending at "Hit"]

Connectivity Matrix

Challenge 2: Which farms are in danger?

Which farms are in danger?

• Answer: the ones in the path of Caligus!

• Combine the operational databases, the GIS database and the Connie Matrix in the data warehouse.

• Run Time-Series analysis and k-Means on different pairs, 360 times on each block.

• A manually trained Decision Tree helps categorize the threat level on a scale from 0 to 10.
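The slide names k-Means and a manually trained Decision Tree but not their inputs, so the following is only a rough sketch of how the two could fit together. The two features (connectivity to infected farms, average lice per fish), the sample farms and every threshold in `threat_level` are assumptions made for illustration; the real process runs in RapidMiner, not in this code.

```python
import random
from math import dist


def k_means(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def threat_level(connectivity, lice_per_fish):
    """Hand-written decision tree; all thresholds are illustrative assumptions."""
    if connectivity < 0.1:
        return 0 if lice_per_fish < 1.0 else 3
    if connectivity < 0.5:
        return 4 if lice_per_fish < 1.0 else 7
    return 8 if lice_per_fish < 1.0 else 10


# Hypothetical farms: (connectivity to infected farms, average lice per fish).
farms = [(0.05, 0.2), (0.08, 0.4), (0.60, 2.5), (0.70, 3.0)]
centroids, labels = k_means(farms, k=2)
# Score each cluster by passing its centroid through the tree.
cluster_threat = [threat_level(*c) for c in centroids]
farm_threat = [cluster_threat[lab] for lab in labels]
print(farm_threat)
```

With these made-up inputs the two low-exposure farms land in one cluster and the two high-exposure farms in the other, so the tree assigns them the lowest and highest threat levels respectively.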

RapidMiner: Getting Operational DBs

RapidMiner: Joining Operational and GIS

RapidMiner: Joining Connie Matrix

• Same ol', same ol',

• Except that it's done with PostgreSQL and PostGIS.

• So, no pictures of this process.

RapidMiner: k-Means + Decision Tree

Reports

Results

• Find farms that might be attacked within 2 weeks.

• Trained on data from 2016, tested on data from 2017.

• This has been pretty consistent with data from 2018.

                    True Hit    True Miss    Class Precision (%)
Pred. Hit               4982         1845                  72.97
Pred. Miss               890       192817                  99.54
Class Recall (%)       84.84        99.05
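The percentages in the table follow directly from the four counts, which a few lines of arithmetic confirm. The overall accuracy at the end is not on the slide; it is derived here from the same counts.

```python
# Confusion matrix counts taken from the Results table.
tp, fp = 4982, 1845      # predicted Hit:  true Hit, true Miss
fn, tn = 890, 192817     # predicted Miss: true Hit, true Miss


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


precision_hit = pct(tp, tp + fp)     # 72.97, row "Pred. Hit"
precision_miss = pct(tn, tn + fn)    # 99.54, row "Pred. Miss"
recall_hit = pct(tp, tp + fn)        # 84.84, column "True Hit"
recall_miss = pct(tn, tn + fp)       # 99.05, column "True Miss"
accuracy = pct(tp + tn, tp + fp + fn + tn)  # derived, not shown on the slide

print(precision_hit, precision_miss, recall_hit, recall_miss, accuracy)
```

Because misses outnumber hits by roughly 33 to 1, the precision and recall on the Hit class say more about the model than overall accuracy does.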

Challenge 3: What is the best treatment?

Data Model for Production/Mortality

Challenge

• Explore the operational databases for the following:

• Maximized production and minimized mortality rate.

• Analyze diseases, Caligus reports, treatments and vaccinations.

• Retrieve the patterns applied in the best farms and apply them to the worst ones.
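One simple way to picture "patterns from the best farms" is to rank farms and compare which treatments each group used. The sketch below does that on made-up records; the field layout, the treatment names and the scoring formula (tonnes times survival rate) are all assumptions, not the project's actual model.

```python
from collections import Counter

# Made-up records: (farm, harvested tonnes, mortality rate, treatments used).
records = [
    ("F1", 1200, 0.04, {"vaccine_x", "bath_a"}),
    ("F2", 1100, 0.05, {"vaccine_x", "bath_a"}),
    ("F3", 700, 0.18, {"bath_b"}),
    ("F4", 650, 0.21, {"bath_b", "feed_c"}),
]


def score(record):
    """Assumed score: higher production and lower mortality rank higher."""
    _, tonnes, mortality, _ = record
    return tonnes * (1.0 - mortality)


ranked = sorted(records, key=score, reverse=True)
half = len(ranked) // 2
best, worst = ranked[:half], ranked[half:]


def treatment_counts(group):
    """Count how many farms in the group used each treatment."""
    return Counter(t for _, _, _, treatments in group for t in treatments)


best_counts, worst_counts = treatment_counts(best), treatment_counts(worst)
# Treatments seen in the best farms but absent from the worst ones.
candidates = sorted(t for t in best_counts if t not in worst_counts)
print(candidates)  # ['bath_a', 'vaccine_x']
```

A set difference like this only flags candidates; whether a treatment actually causes the better outcome still has to be checked, which is what the later real-life testing is for.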

Notice

While I designed the structure of the database entirely myself, the data it contains is proprietary and I cannot share it with you. That doesn't mean I can't obfuscate the data to show you how we performed the analysis.

Also, it has been simplified from nearly a thousand processes to just two, as proper data extraction and classification was quite difficult.

Preparation Process

Analytics Process

Results

Real Life Testing

• Sample: 20 farms out of nearly 5,800.

• The combination of treatments was designed through SVM, Neural Networks and Time-Series (too complex to show here).

• Mortality reduced by 46.1% (73.7% in Caligus).

• USD 97,565 saved in treatments.

• Expected to save USD 24 million by 2019.

Conclusions

• #DataSci is about solving challenges with technology: we apply it in many other use cases.

• Proper data prep overcomes the limits imposed by technical debt. Public-sector developments suffer a lot from this.

• A quick process model (20%) helps us fail fast and achieve results earlier.

• RapidMiner excels at both. We couldn't have done this without it.

RapidMiner: Data Science, Fast and Simple

Contact Information

Rodrigo Fuentealba Cartes

E-mail: [email protected]

Twitter: @datasciencegems

LinkedIn: https://www.linkedin.com/in/rodrigofuentealbacartes/