A comparison of learning methods to predict N2O fluxes and N leaching


DESCRIPTION

April, 4th, 2012 Séminaire BIA, INRA d’Auzeville, Toulouse


Nathalie Villa-Vialaneix (http://www.nathalievilla.org)

Joint work with Marco Follador, Marco Ratto and Adrian Leip (EC, Ispra, Italy)

April 27th, 2012 - BIA, INRA Auzeville

SAMM (Université Paris 1) &

IUT de Carcassonne (Université de Perpignan)

Nathalie Villa-Vialaneix (April 27th, 2012) Comparison of metamodels SAMM & UPVD 1 / 27

Outline

1 DNDC-Europe model description

2 Methodology

3 Results


DNDC-Europe model description

General overview

Modern issues in agriculture

• fight against the food crisis;

• while preserving environments.

The EC needs simulation tools to

• link direct aid payments to compliance with standards ensuring proper management;

• quantify the environmental impact of European policies (“Cross Compliance”).


Cross Compliance Assessment Tool

DNDC is a biogeochemical model.


Zoom on DNDC-EUROPE


Moving from DNDC-Europe to metamodeling

Needs for metamodeling:

• easier integration into CCAT

• faster execution and more responsive scenario analysis


Data [Villa-Vialaneix et al., 2012]

Data extracted from the biogeochemical simulator DNDC-EUROPE: about 19,000 HSMU (Homogeneous Soil Mapping Units, roughly 1 km² each, although the area varies quite a lot) used for corn cultivation:

• corn corresponds to about 4.6% of the UAA;

• HSMU for which at least 10% of the agricultural land was used for corn were selected.

11 input (explanatory) variables (selected by experts and previous simulations):

• N_FR (N input through fertilization; kg/ha/y);

• N_MR (N input through manure spreading; kg/ha/y);

• Nfix (N input from biological fixation; kg/ha/y);

• Nres (N input from root residue; kg/ha/y);

• BD (bulk density; g/cm³);

• SOC (soil organic carbon in topsoil; mass fraction);

• PH (soil pH);

• Clay (ratio of soil clay content);

• Rain (annual precipitation; mm/y);

• Tmean (annual mean temperature; °C);

• Nr (concentration of N in rain; ppm).

2 outputs to be estimated (independently) from the inputs:

• N2O fluxes (a greenhouse gas);

• N leaching (one major cause of water pollution).

Methodology


Purpose: comparison of several metamodeling approaches (accuracy, computational time, ...).

For every data set, every output and every method:

1 the data set was split into a training set and a test set (on an 80%/20% basis);

2 the regression function was learned from the training set (with a full validation process for hyperparameter tuning);

3 the performances were computed on the test set: predictions were made from the inputs and compared to the true outputs.
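The three-step protocol can be sketched as follows. This is a minimal illustration on synthetic data using scikit-learn, not the actual code, data or models from the study:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the DNDC-EUROPE simulations: 11 inputs, 1 output.
X = rng.normal(size=(1000, 11))
y = X[:, 0] ** 2 + np.log1p(np.abs(X[:, 1])) + rng.normal(scale=0.1, size=1000)

# Step 1: 80%/20% train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Step 2: learn the regression function on the training set
# (the hyperparameter tuning by validation would go here).
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Step 3: evaluate on the held-out test set against the true outputs.
print("test R^2:", r2_score(y_test, model.predict(X_test)))
```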


Methods

• 2 linear models:

  • one with the 11 explanatory variables;

  • one with the 11 explanatory variables plus several nonlinear transformations of these variables (square, log, ...): stepwise AIC was used to train the model;

• MLP (multilayer perceptrons);

• SVM (support vector machines);

• RF (random forests);

• 3 approaches based on splines: ACOSSO (ANOVA splines), SDR (an improvement of the previous one) and DACE (a kriging-based approach).


Regression

Consider the problem where:

• Y ∈ ℝ has to be estimated from X ∈ ℝ^d;

• we are given a learning set, i.e., N i.i.d. observations of (X, Y): (x1, y1), ..., (xN, yN).

Example: predict N2O fluxes from PH, climate, the concentration of N in rain, fertilization, ..., for a large number of HSMU.


Multilayer perceptrons (MLP)

A “one-hidden-layer perceptron” takes the form:

\[ \Phi_w : x \in \mathbb{R}^d \mapsto \sum_{i=1}^{Q} w_i^{(2)}\, G\!\left(x^T w_i^{(1)} + w_i^{(0)}\right) + w_0^{(2)} \]

where:

• the w are the weights of the MLP, which have to be learned from the learning set;

• G is a given activation function: typically, \( G(z) = \frac{1 - e^{-z}}{1 + e^{-z}} \);

• Q is the number of neurons on the hidden layer. It controls the flexibility of the MLP and is a hyper-parameter that is usually tuned during the learning process.
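The formula above translates directly into code. The sketch below uses randomly initialized weights purely to show the forward pass; in practice the weights are learned:

```python
import numpy as np

def G(z):
    # Activation from the slide: G(z) = (1 - exp(-z)) / (1 + exp(-z)),
    # which is exactly tanh(z / 2).
    return (1.0 - np.exp(-z)) / (1.0 + np.exp(-z))

def mlp(x, w1, w0, w2, w2_0):
    # One-hidden-layer perceptron:
    # Phi_w(x) = sum_i w2[i] * G(x^T w1[:, i] + w0[i]) + w2_0
    return w2 @ G(x @ w1 + w0) + w2_0

rng = np.random.default_rng(0)
d, Q = 11, 5                      # 11 inputs, Q hidden neurons
w1 = rng.normal(size=(d, Q))      # first-layer weights w^(1)
w0 = rng.normal(size=Q)           # first-layer biases w^(0)
w2 = rng.normal(size=Q)           # second-layer weights w^(2)
w2_0 = rng.normal()               # output bias w^(2)_0

x = rng.normal(size=d)
print(mlp(x, w1, w0, w2, w2_0))   # a single real-valued prediction
```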

Symbolic representation of MLP

[Figure: inputs x1, ..., xd feed Q hidden neurons through the first-layer weights w^(1); the neuron outputs are combined by the second-layer weights w^(2), plus a bias, to produce Φ_w(x).]

Learning MLP

• Learning the weights: the w are learned by minimizing the mean squared error, penalized by a weight decay to avoid overfitting (and ensure a better generalization ability):

\[ w^* = \arg\min_w \sum_{i=1}^{N} L\big(y_i, \Phi_w(x_i)\big) + C \|w\|^2 . \]

Problem: the MSE is not quadratic in w, so the optimization can end up in local minima.

• Tuning the hyper-parameters C and Q: simple validation was used.
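In practice this minimization is rarely coded by hand. For instance, scikit-learn's MLPRegressor exposes the weight-decay penalty as its alpha parameter, so a simple-validation tuning of Q and the decay strength can be sketched as below (illustrative only; the study's actual software and settings may differ):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 11))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25,
                                            random_state=0)

# Simple validation over the two hyper-parameters:
# Q (hidden units) and the weight-decay strength (alpha plays the role of C).
best = None
for Q in (5, 20):
    for alpha in (1e-4, 1e-1):
        net = MLPRegressor(hidden_layer_sizes=(Q,), alpha=alpha,
                           max_iter=2000, random_state=0).fit(X_tr, y_tr)
        score = net.score(X_val, y_val)  # validation R^2
        if best is None or score > best[0]:
            best = (score, Q, alpha)

print("best (R^2, Q, alpha):", best)
```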

SVM

SVM is also an algorithm based on penalized error loss minimization:

1 Basic linear SVM for regression: \(\Phi_{(w,b)}\) is of the form \(x \mapsto w^T x + b\), with \((w, b)\) the solution of

\[ \arg\min_{(w,b)} \sum_{i=1}^{N} L_\varepsilon\big(y_i, \Phi_{(w,b)}(x_i)\big) + \lambda \|w\|^2 \]

where

• λ is a regularization (hyper-)parameter (to be tuned);

• \( L_\varepsilon(y, \hat{y}) = \max\{|y - \hat{y}| - \varepsilon, 0\} \) is an ε-insensitive loss function.

2 Nonlinear SVM for regression are the same, except that a nonlinear (fixed) transformation of the inputs is applied first: \(\varphi(x) \in \mathcal{H}\) is used instead of x.

Kernel trick: in fact, φ is never made explicit but is used through a kernel \(K : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\) with \(K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)\).

Common kernel: the Gaussian kernel

\[ K_\gamma(u, v) = e^{-\gamma \|u - v\|^2} \]

is known to have good theoretical properties for both accuracy and generalization.


Learning SVM

• Learning (w, b): \(w = \sum_{i=1}^{N} \alpha_i K(x_i, \cdot)\) and b are computed by an exact optimization scheme (quadratic programming). The only step that can be time-consuming is the computation of the kernel matrix \(K(x_i, x_j)\) for \(i, j = 1, \ldots, N\).

The resulting \(\Phi_N\) is known to be of the form

\[ \Phi_N(x) = \sum_{i=1}^{N} \alpha_i K(x_i, x) + b \]

where only a few \(\alpha_i\) are nonzero. The corresponding \(x_i\) are called support vectors.

• Tuning the hyper-parameters C = 1/λ, ε and γ: simple validation was used. To save time, ε was not tuned in our experiments but set to the default value (1), which ensured at most 0.5N support vectors.
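A minimal ε-SVR sketch with the Gaussian kernel, using scikit-learn's SVR (where C corresponds to 1/λ and gamma to the kernel's γ); this is illustrative synthetic-data code, not the study's own:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=300)

# epsilon-insensitive regression with the Gaussian (RBF) kernel
# K_gamma(u, v) = exp(-gamma * ||u - v||^2).
svr = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.1).fit(X, y)

# Only the support vectors enter the prediction
# Phi_N(x) = sum_i alpha_i K(x_i, x) + b.
print(len(svr.support_), "support vectors out of", len(X))
```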


From regression tree to random forest

Example of a regression tree

[Figure: a regression tree splitting on SOCt, PH, FR and clay; each terminal node predicts a value (here between about 2.7 and 59.3).]

Each split is made such that the two induced subsets are as homogeneous as possible. The prediction of a terminal node is the mean of the Y values of the observations belonging to this node.
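The splitting rule can be illustrated with a small scikit-learn regression tree on synthetic data (the feature names are borrowed from the example tree purely as labels):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = np.where(X[:, 0] < 0.5, 1.0, 5.0) + 0.1 * rng.normal(size=200)

# Each split minimizes the within-node variance of y;
# each leaf predicts the mean y of its observations.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["SOCt", "PH"]))
```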


Random forest

Basic principle: combination of a large number of individually weak regression trees (the prediction is the mean prediction over all trees).

For each tree, two randomizations of the original method are performed:

1 a given number of observations are randomly chosen from the training set: this subset is called the in-bag sample, whereas the other observations are called out-of-bag and are used to control the error of the tree;

2 for each node of the tree, a given number of variables are randomly chosen among all possible explanatory variables.

The best split is then computed on the basis of these variables and these observations. The chosen observations are the same for a given tree, whereas the variables taken into account change at each split.


Learning a random forest

Random forests are not very sensitive to their hyper-parameters (number of observations for each tree, number of variables for each split): the default values were used.

The number of trees should be large enough for the mean squared error computed on out-of-bag observations to stabilize:

[Figure: out-of-bag (training) and test error versus the number of trees (0 to 500); both curves stabilize as trees are added.]
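This stabilization check can be mirrored with scikit-learn's out-of-bag score (a sketch on synthetic data, not the study's experiment):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 11))
y = X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

# Track the out-of-bag R^2 as the forest grows; it should stabilize
# once the number of trees is large enough.
for n_trees in (10, 100, 500):
    rf = RandomForestRegressor(n_estimators=n_trees, oob_score=True,
                               random_state=0).fit(X, y)
    print(n_trees, "trees -> OOB R^2 =", round(rf.oob_score_, 3))
```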


Results


Influence of the training sample size

[Figure: test R² versus the log of the training-set size, for N2O prediction (left) and N leaching prediction (right), comparing LM1, LM2, Dace, SDR, ACOSSO, MLP, SVM and RF.]


Computational time

Use          LM1     LM2      Dace      SDR       ACOSSO
Train        <1 s    50 min   80 min    4 hours   65 min
Prediction   <1 s    <1 s     90 s      14 min    4 min

Use          MLP        SVM       RF
Train        2.5 hours  5 hours   15 min
Prediction   1 s        20 s      5 s

Time for DNDC: about 200 hours on a desktop computer, and about 2 days using a cluster!


Further comparisons

Evaluation of the different steps (time/difficulty)

         Training   Validation   Test
LM1      ++         +
LM2      +          +
ACOSSO   =          +            -
SDR      =          +            -
DACE     =          -            -
MLP      -          -            +
SVM      =          -            -
RF       +          +            +


Understanding which inputs are important

Importance: a measure of the importance of the input variables can be defined as follows:

• for a given input variable, randomly permute its values and compute the predictions from these permuted inputs;

• compare the accuracy of these predictions to the accuracy of the predictions obtained with the true inputs: the increase in mean squared error is called the importance.

This comparison is made on data that were not used to build the model: either the validation set or the out-of-bag observations.

Example (N leaching, SVM):

[Figure: importance (decrease in MSE) versus rank for the 11 inputs; N_MR, N_FR, Nres and pH rank highest.]

The variables N_MR, N_FR, Nres and pH are the most important for accurate predictions.


Example (N2O, RF):

[Figure: importance (mean decrease in MSE) versus rank for the 11 inputs; SOC and pH rank highest.]

The variables SOC and PH are the most important for accurate predictions.
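The permutation importance described above is only a few lines of code. Below is a generic sketch (scikit-learn also ships this as sklearn.inspection.permutation_importance); for brevity it scores on the training data, whereas the slide recommends held-out data:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = 3 * X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=400)  # inputs 2, 3 are noise

model = SVR(kernel="rbf", C=10.0).fit(X, y)
baseline = mean_squared_error(y, model.predict(X))

# Importance of input j = increase in MSE after permuting column j.
importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importances.append(mean_squared_error(y, model.predict(X_perm)) - baseline)

print([round(imp, 3) for imp in importances])
```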


Thank you for your attention

Any questions?


Villa-Vialaneix, N., Follador, M., Ratto, M., and Leip, A. (2012). A comparison of eight metamodeling techniques for the simulation of N2O fluxes and N leaching from corn crops. Environmental Modelling and Software, 34:51–66.
