Digital Infrastructure and Novel Computational Methods for Analyzing and Mining Climate and Remote Sensing Large Databases to Improve Agricultural Monitoring and Forecasting

Luciana Alvim S. Romani (PI), Embrapa Agricultural Informatics
Jurandir Zullo Jr. (Co-PI), Cepagri/Unicamp
Campinas, SP (2015-2016)

Computational framework to analyze agrometeorological ... · 1. Distance function between time series. 2. Geospatial proximity (using lat/long). Unlabeled nodes classification: Label



Motivation

Big data from agrometeorology poses challenges to computer science and opens room for improvements in agriculture

Diversity of available data, including several scales and long-term series

Platform to integrate computer scientists and agrometeorologists


Teamwork

Institutions:

Related projects:

2002 - 2015

2010 - 2012

2011 - 2014


Goals


Expected results from Agrocomputing.net

1. Organize and integrate data from meteorological stations, climate change scenario models and remote sensors into a single platform, which can also be applied to other related systems;

2. Develop data mining and fractal correlation techniques to analyze time series of climate and satellite images;

3. Develop classification methods to be applied to high and medium spatial resolution satellite images;

4. Evaluate climate fitness in the productive coffee areas with altimetry data and future scenario models;

5. Develop methods for temporal monitoring of sugarcane crops using images with low spatial and high temporal resolution.


Climate data, data from models, images

Decentralized storage, different data sources

Improvement of computational methods

Improvement of agrometeorological models


Sources of data

DSpace

Satellite images: low, medium and high spatial resolution. Source: NOAA, MODIS, RapidEye, Landsat, GeoEye

Climate models: Eta 20 km and 10 km. Source: CPTEC/INPE

Meteorological stations: Agritempo. Source: Embrapa and Cepagri/Unicamp

Challenges: deal with very large and heterogeneous data sources in acceptable time (knowledge for decision making)


Data management policy

The team is defining the policy

◦ Free access to data

◦ Use license

◦ Embrapa Agricultural Informatics will be responsible for maintaining the infrastructure and databases generated in this proposal, to guarantee free access for anyone intending to conduct research in this field, as has been done for Agritempo (12 years) and other repositories.


PRELIMINARY RESULTS


Result 1: Organize and integrate data

DSpace

◦ Datasets and results available to the community

◦ Preserves and enables easy and open access to all types of digital content, including text, images and datasets.

◦ Communities, collections and metadata were defined.

Research in databases

◦ Similarity queries in Database Management Systems

◦ Data structures to index time series

◦ Content-based image retrieval

www.agrocomp-rep.cnptia.embrapa.br

[Pola_IS_2015, Santos_ISM_2015]


DSpace


Result 2: Developed tools [SatImagExplorer]


Result 2: Developed tools: SatImagExplorer

1st version of SatImagExplorer software

◦ release in 2016

Functionalities:

◦ Input: Satellite Image Time Series, file in .TXT or .CSV format

◦ Selection of regions of interest

◦ Clustering algorithms: K-Means, BIRCH, CLARANS, K-Medoids

◦ Classification: KNN, LNP, HC-LGT

◦ Distance functions: DTW, Manhattan, Euclidean
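The three distance functions listed for the tool can be illustrated with a minimal Python sketch. This is not SatImagExplorer code: the function names are hypothetical, and the DTW variant shown (absolute difference as local cost, no warping window) is only one common formulation.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

def dtw(a, b):
    """Dynamic time warping cost; the series may differ in length."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

For two series with the same shape but shifted in time, DTW typically returns a smaller value than the Euclidean or Manhattan distance, which is why it is often preferred for vegetation index profiles whose seasons start at different dates.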


Semi-supervised Classification using Geospatial Proximity (CSPG, from the Portuguese "Classificação Semisupervisionada usando Proximidade Geoespacial")

◦ is a semi-supervised classification method

◦ classifies the unlabeled instances of the satellite image time series

◦ graph-based approach

Graph construction: connects nodes based on

1. Distance function between time series.

2. Geospatial proximity (using lat/long).

Unlabeled nodes classification: Label propagation.
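The two steps named above (graph construction, then label propagation) can be sketched as follows. This is a generic illustration under stated assumptions, not the CSPG implementation: the mixing weight `alpha`, the k-nearest-neighbor rule, the exponential edge weight and the use of plain Euclidean distance on lat/long pairs are all hypothetical choices made for the example.

```python
import numpy as np

def _scaled_pairwise(points):
    """Pairwise Euclidean distances scaled to [0, 1]."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    top = dist.max()
    return dist / top if top > 0 else dist

def build_graph(series, coords, k=3, alpha=0.5):
    """Symmetric k-NN affinity matrix mixing both criteria.

    alpha (assumed parameter) weights the time-series distance
    against the geospatial distance.
    """
    d_ts = _scaled_pairwise(np.asarray(series, dtype=float))
    d_geo = _scaled_pairwise(np.asarray(coords, dtype=float))
    d = alpha * d_ts + (1.0 - alpha) * d_geo
    n = len(d)
    w = np.zeros((n, n))
    for i in range(n):
        neighbors = [j for j in np.argsort(d[i]) if j != i][:k]
        w[i, neighbors] = np.exp(-d[i, neighbors])
    return np.maximum(w, w.T)  # keep an edge if either end chose it

def propagate_labels(w, labels, n_classes, n_iter=100):
    """Iterative label propagation; labels use -1 for unlabeled nodes."""
    labels = np.asarray(labels)
    known = labels >= 0
    f = np.zeros((len(labels), n_classes))
    f[known, labels[known]] = 1.0
    clamp = f.copy()
    p = w / w.sum(axis=1, keepdims=True)  # row-normalized transitions
    for _ in range(n_iter):
        f = p @ f
        f[known] = clamp[known]  # labeled nodes keep their class
    return f.argmax(axis=1)
```

Treating lat/long as planar coordinates is a simplification that is only reasonable over small areas; a great-circle distance would be the safer choice for larger regions.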

Result 2: Development of data mining techniques

Classification results for CSPG, LNP and KNN

                 CSPG                        LNP                         KNN
                 Sugarcane   Not Sugarcane   Sugarcane   Not Sugarcane   Sugarcane   Not Sugarcane
Sugarcane        72.1%       27.9%           67.5%       32.5%           60.8%       39.2%
Not Sugarcane    32.4%       64.6%           42.3%       57.7%           43.9%       56.1%

* Validation using the Canasat/INPE mask.
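Percentages of this kind can be derived from a reference mask and a classifier output as in the sketch below. It assumes that each row is a reference class and each column a predicted class, which the slide does not state; the function name and the toy labels in the usage note are hypothetical.

```python
def confusion_percentages(reference, predicted, classes):
    """Row-normalized confusion matrix, in percent.

    Rows are reference classes and columns are predicted classes
    (an assumption about the table layout).
    """
    counts = {r: {p: 0 for p in classes} for r in classes}
    for ref, pred in zip(reference, predicted):
        counts[ref][pred] += 1
    result = {}
    for r in classes:
        total = sum(counts[r].values())
        result[r] = {
            p: (100.0 * counts[r][p] / total if total else 0.0)
            for p in classes
        }
    return result
```

With a reference list of pixel classes taken from a mask and a list of classes predicted for the same pixels, `confusion_percentages(reference, predicted, ["sugarcane", "not sugarcane"])` returns one row of percentages per reference class.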


Result 2: Fractal-based techniques

Fractal-based Clustering of Data Stream Framework (FCDS): clusters sensors with the same behavior in a time interval

[Bones_SBBD_2015]


Result 3: Classification methods (high and medium spatial resolution images)

Multi-resolution correlation clustering (MrCC) method

Input: remote sensing images of high and medium spatial resolution (GeoEye, RapidEye and Landsat satellite images)

Labels defined by specialist (class 1: coffee; class 2: soil)

Feature extraction

Cluster identification

Output: classified GeoEye, RapidEye and Landsat images


Result 3: Classification methods: definition of labels by specialists

Unsupervised classification and supervised classification (by specialists)

Landsat image (30 m)

RapidEye image (5 m)

GeoEye-1 image (1.65 m)

Labeled images (two panels); one panel under construction


Result 5: Methods for temporal sugarcane monitoring

• Change detection in satellite image time series using the Mann-Kendall method [Silva_SBIAgro_2015]

Inputs: satellite images, reference coordinates

Steps: data preprocessing; extraction of time series (NDVI); statistical analysis (Mann-Kendall); results of statistical analysis presented in map format

Tools: SatImagExplorer, R software (mannk-autosmannk-auto), Quantum GIS

Maps: land use change (harvest); dynamics of land use change (2001-2009); intensity of change (2001-2009)
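The Mann-Kendall trend test named on this slide can be sketched in a few lines of Python. This is a textbook version (S statistic, tie-corrected variance, normal approximation with continuity correction), written for illustration only; the project ran the analysis in R, and the function name below is hypothetical.

```python
import math
from collections import Counter

def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for a monotonic trend test."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    # S counts increasing pairs minus decreasing pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, corrected for groups of tied values
    tie_sizes = Counter(series).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18.0
    # Continuity-corrected standard normal statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value
```

Applied to the NDVI series of one pixel, a small p-value with positive Z suggests an increasing trend and with negative Z a decreasing one; mapping Z or the p-value per pixel gives a change map of the kind listed above.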


Validation and dissemination

Comparison with similar computational methods

Results on crops: sugarcane and coffee, important commodities that are affected, positively and negatively, by temperature increase.

Among all the evaluation processes, the greatest importance will be assigned to the domain experts' feedback.


Clustering algorithm implemented in SatImagExplorer to analyze NDVI time series

Validation by specialists

[Scrivani_SBSR_2015]
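The slide does not say which clustering algorithm was used in this validation. As an illustration only, the sketch below clusters NDVI time series with K-Means, one of the algorithms the tool lists; the deterministic farthest-point initialization and the function name are assumptions made for the example.

```python
import numpy as np

def kmeans(series, k, n_iter=100):
    """Plain K-Means on equal-length time series (one series per row)."""
    x = np.asarray(series, dtype=float)
    # Farthest-point initialization: start at the first series, then
    # repeatedly add the series farthest from the chosen centers.
    centers = [x[0]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        updated = np.array([
            x[assign == c].mean(axis=0) if np.any(assign == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return assign, centers
```

Each returned center is itself a mean NDVI profile, which is what a specialist would inspect when validating whether a cluster corresponds to a crop.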


Validation by specialists

Correlation of time series from AVHRR/NOAA and MODIS/Terra using clustering

◦ strong correlation between NDVI data from the AVHRR 1 km and MODIS 1 km sensors;

◦ NDVI data from AVHRR can be used to monitor agricultural crops cultivated in large fields without loss of information or mapping mistakes when compared to results obtained by the MODIS sensor.

[Scrivani_IGARSS_2015]


Validation by specialists

Numerical Models to Forecast the Sugarcane Production

◦ based on time series of NDVI/AVHRR images and agrometeorological data

◦ models using the variables planted area, NDVI and WRSI presented coefficients of determination (R²) around 0.9 and are able to estimate sugarcane production for the state of São Paulo, Brazil

[Gonçalves_Multitemp_2015]
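The slide does not give the model form. Assuming, purely for illustration, an ordinary least-squares linear model on the three variables, the fit and its R² could be computed as in this sketch; the function name is hypothetical and any input data would have to come from the project's own series.

```python
import numpy as np

def fit_linear_model(features, target):
    """Least-squares fit with intercept; returns (coefficients, R²).

    features: one row per season, columns such as planted area,
    NDVI and WRSI (assumed layout). target: observed production.
    """
    x = np.column_stack([np.ones(len(features)),
                         np.asarray(features, dtype=float)])
    y = np.asarray(target, dtype=float)
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    residual = y - x @ coef
    ss_res = float(residual @ residual)         # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return coef, r2
```

The first coefficient is the intercept and the remaining ones follow the column order of `features`; an R² near 0.9 means the model explains about 90% of the variance in the observed production.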


Contributions

Computer Science:

new methods and algorithms to properly, quickly and efficiently process, handle and analyze data, as well as understand their inter-relationships

computational platform to integrate different researchers

proposition of a mechanism to provide autonomy for agricultural meteorologists to access and parameterize datasets, to define new research needs, and to reformulate, inter-compare and integrate agroenvironmental models

Agrometeorology:

upgrade models to analyze data in the current and future climate perspective

new tools to evaluate Satellite Image Time Series (SITS) in an agricultural context

new tools to deal with a huge volume of agrometeorological data


Published Papers


Workshops and meetings

April, 2015 – Embrapa Agricultural Informatics (Campinas)

May, 2015 – Cepagri (Campinas)

November, 2015 – ICMC/USP (São Carlos)


Environment to promote the team’s communication and collaboration

Agropedia brasilis: environment provided by Embrapa


International collaboration

Prof. Mihai Datcu

German Aerospace Center (DLR)

◦ Panel "Techniques for analyzing satellite image time series" (SBSR, 2015): Prof. Jurandir Zullo Jr. (Cepagri/Unicamp), Prof. Agma J. M. Traina (ICMC/USP) and Prof. Mihai Datcu (German Aerospace Center (DLR))

◦ Lectures and meetings (April, 2015): Embrapa Agricultural Informatics (Campinas), UFSCar (São Carlos), USP (São Carlos), Lanapre-Embrapa (São Carlos)


Embrapa Agricultural Informatics (Campinas):

Luciana Alvim S. Romani (Coordinator)

Adriano F. Otavian

Alan Nakai

Aryeverton Fortes

Eduardo Assad

Giampaolo Pellegrino

Glauber Vaz

Jayme Barbedo

José Eduardo Monteiro

Luciano V. Koenigkan

Silvio R. M. Evangelista

Cepagri-Unicamp (Campinas):

Jurandir Zullo Jr.

Priscila P. Coltri

Renata R. V. Gonçalves

and students

CPTEC-INPE (Cachoeira Paulista):

Chou Sin Chan

and students

ICMC-USP (São Carlos):

Agma J. M. Traina (coordinator)

Caetano Traina Jr.

Elaine Parros M. Sousa

Robson L. Cordeiro

and students

UFSCar (São Carlos):

Marcela X. Ribeiro

and students

UFU (Uberlândia):

Maria Camila Nardini Barioni

Humberto Luiz Razente

and students

UFABC (Santo André):

Alexandre Noma


Thanks for your attention!
