
Source: …halieutique.agrocampus-ouest.fr/memoires/201503.pdf

Estimates of mesopelagic fish biomass according to environmental data in the Mediterranean Sea

By: Morane CLAVEL-HENRY

Defended in Rennes on 11.09.2015

Before the jury composed of:

President: Olivier Le Pape

Internship supervisors: Villy Christensen (IOF) and Chiara Piroddi (CSM-CSIS)

Academic tutors: Didier Gascuel and Olivier Le Pape (AGROCAMPUS OUEST)

Other jury members: Sylvain Bonhommeau (IFREMER)

The analyses and conclusions of this student work are the sole responsibility of its author and not of AGROCAMPUS OUEST.

AGROCAMPUS OUEST

CFR Angers

CFR Rennes

Academic year: 2014-2015

Specialty: Agronomy

Specialization (and option, if any): Halieutique (fisheries science), option REA

Final-year dissertation (Mémoire de Fin d'Études)

for the Engineering degree of the Institut Supérieur des Sciences agronomiques, agroalimentaires, horticoles et du paysage

for the Master's degree of the Institut Supérieur des Sciences agronomiques, agroalimentaires, horticoles et du paysage

from another institution (student who joined in M2)

Acknowledgement

Thanking Villy Christensen and Chiara Piroddi for the whole internship is not enough to express my gratitude. I am indebted to them for the way they looked after me, for their help and their time, and for everything they did for me during the internship. I discovered a bunch of new methods for reaching my goals and was reminded how rigorous science needs to be. Villy and Chiara also helped me with the decisions I had to make, which were driving me crazy. Sometimes the praise I received felt like too much, as if I did not deserve it, but in the end it was really motivating.

I also owe a lot to all the researchers I met at IOF, with whom I enjoyed talking about numerous marine topics and discussing my own research. Thanks to Daniel, Gabriel, David and Deng, who gave me scientific papers after our talks. I discovered how data can be managed and understood that data are not lacking, only difficult to exploit. I promise to remember throughout my working life the “Soup of stones”, that famous way of gathering data and methods when starting from nothing. I would like to thank the master's students (Catarina, Yaying), the doctoral students (Shannon, Ana, Bias, Roberto, Nicolás), the post-doctoral researchers and even the French (Marie, Lola), with whom life at UBC was a piece of heaven and who truly let me enjoy the Canadian way of life. Through all the seminars, coffee breaks, lunches and workshops, I caught a glimpse of the wonderful field of fisheries and marine ecosystems. In Vancouver, I spent a delightful time as a visiting student at IOF and discovered a way of doing scientific research that I could not have experienced in France.

Special thanks to my closest relative, who supported me even while I was far away from her and on my own; I hope I did not make you too worried. I also keep in mind how grateful I am to my laptop, which did not abandon me but supported me with all its power.

Summary

The mesopelagic zone is the part of the water column between 200 and 1,000 m depth. Light reaches it only weakly and is not sufficient to sustain photosynthesis.

Mesopelagic fish, the species associated with this layer, are often characterized by a diel vertical migration (DVM) towards surface waters to feed on plankton. During the day, these species are part of the 'Deep Scattering Layer' (DSL), which lies at around 400-600 m depth in the Mediterranean Sea. The DSL is a pattern that is easily identified in scientific acoustic surveys. It is made up of several marine species and is targeted during scientific cruises on the mesopelagic zone.

Mesopelagic fish represent a total global biomass of just under 1 billion tonnes according to Gjøsaeter and Kawaguchi (1980), a paper published shortly after the Food and Agriculture Organization (FAO) took an interest in still-unexplored commercial marine resources. However, this resource remains lightly exploited: mesopelagic species have a high fatty acid content, which makes them of little commercial interest for human consumption but attractive for the production of fish oil or fishmeal. The real value of mesopelagic fish lies in their importance within marine ecosystems. They make up a large share of the diet of commercially important fish such as tuna, but also of seabirds, cetaceans and other marine mammals. Their trophic position gives them a key place in marine ecosystems: their DVM links the rich surface zones of the oceans with the poor deep zones, thereby accelerating the transport of carbon. The DVM therefore plays an important role in what is called the biological carbon pump.

The Mediterranean Sea, one of the most saline seas in the world, is physically enclosed by the European, Asian and African continents. It is a biodiversity hotspot despite very intense pressure from human activities (fisheries, pollution and land-based discharges, tourism) and a changing climate that affects Mediterranean waters. Recent decades have been marked by changes in environmental parameters such as salinity and temperature, which may have disturbed the marine biomass present. This sea reflects, on its own, what can happen in the oceans and is therefore a suitable example for understanding, at a small scale, what is happening in marine waters.

Mesopelagic fish are assumed to be sensitive to certain environmental parameters. The biomasses estimated in 1980 were evaluated only from catch and acoustic data, or from the relationship with their prey. Estimating mesopelagic fish biomass from environmental parameters with statistical analyses has not yet been done. Such an estimate matters for knowledge of marine environments and of the biogeochemical carbon cycle. It can also be integrated into ecosystem models such as Ecopath with Ecosim. Because recent studies have suggested that mesopelagic biomass may be underestimated owing to the methods used, a more accurate estimate of this biomass is of renewed interest.

We therefore asked which environmental factors could affect mesopelagic fish biomass and how this biomass might have changed over a period of about 30 years (between 1980 and 2011). The core of the study consisted in building a model with two methods that use the biomass observed in the Mediterranean Sea, taken from the scientific literature, and environmental data from biogeochemical models: salinity, temperature, oxygen, primary production and bathymetry. The two methods were compared on their performance and on their predictions.

The first method, Random Forest, is an ensemble learning technique based on decision trees. Its prediction is the average of the predictions of many decision trees. Each decision tree in the Random Forest algorithm is built from subsamples of the training set (the observed data). The subsamples themselves are determined by a randomly chosen subset of variables: at a node of the tree, a sample is split according to the value of the selected variables. This method is increasingly used in ecological modelling, and we sought to test its performance.
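The tree-averaging idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation used in the study: the toy data (a step change in biomass with temperature), the parameter names `n_try` and `min_split`, and all function names are hypothetical.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean of `values`."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets, features):
    """Return the (feature, threshold) pair with the lowest squared error, or None."""
    best, best_err = None, float("inf")
    for f in features:
        # Skipping the smallest value guarantees two non-empty children.
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [t for r, t in zip(rows, targets) if r[f] < threshold]
            right = [t for r, t in zip(rows, targets) if r[f] >= threshold]
            err = sse(left) + sse(right)
            if err < best_err:
                best, best_err = (f, threshold), err
    return best


def grow_tree(rows, targets, n_try, min_split, rng):
    """Grow one regression tree; a leaf is a float, an inner node is a tuple."""
    leaf = sum(targets) / len(targets)
    if len(rows) < min_split:
        return leaf
    # A random subset of candidate variables is drawn afresh at every node.
    features = rng.sample(range(len(rows[0])), n_try)
    split = best_split(rows, targets, features)
    if split is None:
        return leaf
    f, threshold = split
    left = [i for i, r in enumerate(rows) if r[f] < threshold]
    right = [i for i, r in enumerate(rows) if r[f] >= threshold]
    return (f, threshold,
            grow_tree([rows[i] for i in left], [targets[i] for i in left],
                      n_try, min_split, rng),
            grow_tree([rows[i] for i in right], [targets[i] for i in right],
                      n_try, min_split, rng))


def predict_tree(node, row):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] < threshold else right
    return node


def grow_forest(rows, targets, n_trees=25, n_try=1, min_split=5, seed=0):
    """Fit each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                n_try, min_split, rng))
    return forest


def predict_forest(forest, row):
    """The forest prediction is the average of the tree predictions."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Hypothetical toy data: (temperature in °C, salinity) -> biomass in t/km².
data_rng = random.Random(1)
rows = [(data_rng.uniform(13, 20), data_rng.uniform(37, 39)) for _ in range(120)]
targets = [0.5 if temp < 16 else 0.1 for temp, _ in rows]

forest = grow_forest(rows, targets)
cold = predict_forest(forest, (14.0, 38.0))
warm = predict_forest(forest, (19.0, 38.0))
print(f"predicted biomass at 14 °C: {cold:.2f}, at 19 °C: {warm:.2f}")
```

In this sketch the bootstrap resampling and the random choice of candidate variables at each node are the two sources of randomness that decorrelate the trees before their predictions are averaged.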

The second method, the generalized additive model (GAM), performs a regression on data whose relationships may be non-parametric. It uses a link function, here the 'identity' function, to relate the dependent variable to the environmental variables. Unspecified functions are estimated by local smoothing techniques to determine the relationship between each environmental variable and the response variable (the biomass). Linear relationships between explanatory variables and the response variable also remain possible.
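The general form of such a model can be written out as follows. The notation is an assumption introduced here for illustration (it is not taken from the source): $B_i$ is the biomass of sample $i$, $x_{ij}$ the value of environmental variable $j$ for that sample, $f_j$ an unspecified smooth function, and $\beta_0$ an intercept.

```latex
g\big(\mathbb{E}[B_i]\big) = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}),
\qquad g(\mu) = \mu \quad \text{(identity link)}
```

A linear relationship with a given variable is the special case $f_j(x) = \beta_j x$, which is why linear and smooth terms can coexist in the same model.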

Both methods used a data set of 905 biomass values scattered across the Mediterranean Sea. The spatial coverage, limited to a few areas (Balearic Sea, Aegean Sea, Ionian Sea, Alboran Sea), may have limited the results.

The statistical analyses showed that, in terms of performance, Random Forest was the better method. The environmental variable that most influenced the biomass estimate was salinity for both methods. GAM was much more influenced by this parameter than Random Forest. Among other things, there was an event between 1988 and 1998 in which the increase in salinity around Crete could have caused an increase in mesopelagic biomass in that region. The RF estimates over the same period were not disturbed.

With Random Forest, the relationships between the environmental variables and the biomass were the most consistent with the literature: a partly positive correlation with primary production and oxygen and a rather negative correlation with temperature. With GAM, some inconsistencies appeared in the relationships with biomass, such as a negative correlation between biomass and primary production. We also observed that the biomass distribution predicted by RF in the Mediterranean Sea was sensitive to the distribution of primary production.

The two methods also estimate mesopelagic biomass differently. Between 1980 and 2011, mesopelagic biomass does not appear to have been marked by a sustained increase or decrease. More precisely, no trend emerges with GAM. With RF, however, we observe a gradual increase in biomass until 2000, followed by a decrease in biomass in 2011.

In terms of performance and of sensitivity to certain variables, Random Forest would be an appropriate method for predicting mesopelagic biomass. However, its prediction of relatively high biomass on the continental shelves (i.e., 0-200 m depth) calls its use into question. In addition, the difficulty of obtaining confidence and prediction intervals around the estimated values may mean that the predictions vary widely and are not reliable. GAM, in contrast, gives more satisfactory results for the variation of the biomass around the prediction.

The biomass estimated for the 31 years by the two methods lies between 20,000 t and 35,000 t for the whole Mediterranean. This is roughly 100 times lower than the estimate of Gjøsaeter and Kawaguchi (2.4 million tonnes). The difference may be attributable to the calculation methods used to extract the biomass from the scientific articles, or to the fact that the articles used did not specifically target mesopelagic species. Our sample potentially under-represented the biomass of mesopelagic fish caught. We plan to repeat the exercise at a global scale, which would help identify the origin of the problem: if the biomasses obtained are of a higher order, the articles used for the Mediterranean Sea would be giving erroneous biomasses.
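As a quick check of the "order of 100" gap, the ratio can be computed directly from the three figures quoted in this summary. The variable names are illustrative only.

```python
# Figures quoted in the summary (tonnes, whole Mediterranean Sea).
reference_t = 2_400_000             # Gjøsaeter and Kawaguchi estimate
study_low_t, study_high_t = 20_000, 35_000  # range obtained with RF and GAM

# The reference estimate divided by each end of the study's range.
ratio_max = reference_t / study_low_t
ratio_min = reference_t / study_high_t

print(f"reference is {ratio_min:.0f} to {ratio_max:.0f} times larger")
```

The reference estimate is thus about 69 to 120 times larger than the study's range, which is consistent with the stated factor of roughly 100.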

List of Acronyms

ACE: Alternating Conditional Expectations
BRT: Boosted Regression Tree
C: Carbon
CIT: Conditional Inference Tree
CV: Coefficient of Variation
d: day
DMW: Deep Mediterranean Water
DSL: Deep Scattering Layer
DVM: Diel Vertical Migration
EMED: Eastern Mediterranean
FAO: Food and Agriculture Organization
GAM: Generalized Additive Model
GCV: Generalized Cross Validation
LIW: Levantine Intermediate Water
MAW: Modified Atlantic Water
MSE: Mean of squared residuals
N: Nitrogen
OOB: Out of Bag
PPR: Primary Production
RF: Random Forest
RMSE: Root Mean Square Error
WGS: World Geodetic System
WMED: Western Mediterranean

List of Figures

Figure 1: Clusters of biogeochemical regions of the mesopelagic layer in the Mediterranean Sea (Reygondeau et al., 2015).

Figure 2: Variation of the environmental parameters over the study period (1980-2011), averaged over the whole Mediterranean Sea.

Figure 3: Distribution of the samples in the Mediterranean Sea and biomass values (t/km²) of each sample.

Figure 4: Random Forest process, from n trees to the final model.

Figure 5: ACE transformation of the independent variables included in the regression.

Figure 6: Scatter plots of observed versus predicted mesopelagic fish biomass from a) the Random Forest model and b) GAM. The blue dashed line is the 1:1 line and the black line is the fit. The black dashed line in a) is the fit without the bias correction.

Figure 7: Partial dependence plot from the Random Forest model. The y-axis is the absolute value of the prediction when the other variables are fixed.

Figure 8: Predictor variable importance in the Random Forest approach, with confidence intervals.

Figure 9: Form of the smooth functions for the selected covariates. The black solid line is the smooth function estimate, with grey bands representing the 95% confidence intervals.

Figure 10: Boxplot of the RMSE obtained from a 10-fold cross-validation repeated 100 times.

Figure 11: Distribution of the predicted biomass in 1980 with a) RF and b) GAM.

Figure 12: Biomass of mesopelagic fish from 1980 to 2011 estimated by RF (continuous line) and GAM (dashed line).

Figure 13: Variance of the mesopelagic biomass for 31 years (1980-2011), extracted from a) RF and b) GAM.

Figure 14: Rate of climate change pressure on the Mediterranean Sea according to Reygondeau et al. (2015).

Figure 15: Spatial distribution of the 7 clusters of chlorophyll dynamics in the Mediterranean Sea (source: D’Ortenzio and Ribera d’Alcalá, 2009).

Figure 16: Maps of salinity (left) and of the biomass estimated by GAM (right) for a) 1988, b) 1993 and c) 1998.

List of Tables

Table 1: Main biogeochemical characteristics of the Mediterranean Sea.

Table 2: Environmental predictors used in the analysis of the biomass.

Table 3: Contribution of the parameters to the deviance of the GAM.

Table 4: GAM significance levels of model terms.

Table 5: Comparison between the results of this study and those of Gjøsaeter and Kawaguchi.

Table of Contents

Acknowledgement
Summary
List of Acronyms
List of Figures
List of Tables

1. INTRODUCTION AND CONTEXT

2. MATERIAL AND METHODS

2.1. Study site: Mediterranean Sea

2.1.1. General 3-dimensional geochemical features and the global changes

2.1.2. Biological features

2.2. Environmental predictors

2.3. Biomass information

2.4. Biomass distribution models

2.4.1. Random Forest

2.4.2. General non parametric regression model

2.5. Prediction of biomass

RESULTS

2.6. Comparison of model performance

2.7. Comparison of the estimation of the biomass in Mediterranean Sea

2.7.1. Biomass in 1980, an example

2.7.2. Temporal biomass estimates

3. DISCUSSION

3.1. Methodological considerations: validity of the data

3.1.1. Reliability of the environmental data

3.1.2. Miscalculation of the observed biomass

3.1.3. Catchability cause

3.2. Model performance

3.3. Estimates of biomass

4. CONCLUSIONS AND PERSPECTIVES

5. BIBLIOGRAPHY

Annex I: Maps of the environmental predictors
Annex II: Mesopelagic fish families and species of the Mediterranean Sea
Annex III: Distribution of variables from the samples
Annex IV: Scatterplots of environmental variables and the biomass of samples
Annex V: 1) Analyses of GAM residuals and uncertainty of 2) GAM and 3) RF estimates
Annex VI: Scientific literature used for extracting the biomass

1. INTRODUCTION AND CONTEXT

Mesopelagic fish are species that spend the day in the mesopelagic zone, between 200 and 1,000 m depth (Gjøsaeter and Kawaguchi, 1980). This layer is defined as disphotic: there is enough sunlight to distinguish day from night but not enough to support photosynthesis. Mesopelagic species are found in all oceans (Gjøsaeter and Kawaguchi, 1980), although some species are endemic to specific areas (e.g., Electrona antarctica (Myctophidae), south of the Antarctic Polar Front (Hulley, 1990)). The Myctophidae are the main family in mesopelagic waters and the most abundant family in the ocean (Catul et al., 2011). A few species have commercial value because of their high fatty acid content (Lea et al., 2002) and are used for the production of fishmeal and fish oil (e.g., Benthosema pterotum (Myctophidae) (Valinassab et al., 2007)). In general, though, fishing for mesopelagic fish is not yet economically viable.

Most of these species perform a diel vertical migration (DVM). They come up to shallow water at night to feed and migrate back to depth before daytime, where they are less active. These concentrations of fish form several layers in the water column, one of which, between 400 and 600 m depth (Olivar et al., 2010), is known as the deep scattering layer (DSL). Other species do not migrate at all, like the genus Cyclothone (Gonostomatidae), or migrate only to intermediate depth layers (Legand et al., 1972). The DVM is hardly affected by seasonal variation (Pearcy and Laurs, 1965), although biomasses are higher in summer than in winter (Pearcy et al., 1976). Whereas vertical migration is frequent, horizontal migration is seldom observed (Gjøsaeter and Kawaguchi, 1980; Reid et al., 1991).

Mesopelagic fishes are particularly important for their trophic role in the food web. Their prey consists of zooplankton (William et al., 2011), and they are a main prey of several top predators such as penguins, cetaceans (Cherel et al., 2008) and seabirds (Barrett et al., 2002), and also feature in the diet of commercially valuable species (Würtz, 2010) such as scombrids (Allain, 2005; Ménard et al., 2000). With a trophic level ranging between 2.9 and 4 (Valls et al., 2014), they are therefore an important link between low and higher trophic level species. Mesopelagic fishes are also recognized as an important component of the ‘biological pump’ of carbon into deep water. Through their vertical migration and/or their feeding on migrant zooplankton, they actively transport organic carbon below the depth at which carbon is remineralized (Davison, 2011; Irigoien et al., 2014). In the Northeast Pacific, mesopelagic fishes were estimated to export 23.9 mg C m⁻² d⁻¹ into deep layers (Davison et al., 2013). The organic carbon compounds from their metabolism are then used by microbial fauna (Irigoien et al., 2014). Estimating their biomass can thus improve knowledge of the biogeochemical cycles of the oceans (Irigoien et al., 2014) and be useful for the construction of ecosystem models (e.g., Ecopath with Ecosim (Christensen et al., 2009)).

According to several studies, mesopelagic fishes dominate the global fish biomass (Mann, 1984). Gjøsaeter and Kawaguchi (1980), in a study conducted for the Food and Agriculture Organization of the United Nations (FAO) on unexplored commercial resources, estimated a biomass of approximately 1,000 million tonnes. However, several studies, such as those of Irigoien et al. (2014) and Kaartvedt et al. (2012), suggest that this figure is a significant underestimate because of the methods used to estimate the biomass.

For example, for the first biomass estimates of mesopelagic fish, Gjøsaeter and Kawaguchi based their analyses on catches from sampling nets (commercial trawls, micronekton nets), acoustic surveys, and egg and larval surveys to compute the abundance per area. Since some species are able to avoid nets or escape during trawl surveys (Kaartvedt et al., 2014; Mann, 1984) and some do not have swim bladders, such methods may not be sufficient to assess their actual biomass in the ocean. Moreover, at the time the study was conducted, knowledge of the target strength of mesopelagic fish was insufficient for acoustic techniques. This parameter, which depends on swim bladder features, is essential for a reliable interpretation of acoustic data. Tseitlin also evaluated the biomass of mesopelagic species. He first estimated the biomass of mesoplankton and then derived the mesopelagic fish biomass from its correlation with the mesoplankton (Evgeny Pakhomov, personal communication). His estimate was slightly higher than that of Gjøsaeter and Kawaguchi but did not reach the 10-fold factor suggested by Irigoien et al. (2014).

Lastly, since a relationship between mesopelagic fish and their environment has been demonstrated (Norheim, 2014), their biomass may have changed because of global warming and stronger pressure from human activities. Furthermore, species distribution models are needed in the Mediterranean Sea to understand the effects of changing environmental conditions (The MerMex Group, 2011). This report therefore aims to estimate mesopelagic fish biomass with approaches different from those used previously. In particular, using the Mediterranean as a case study, statistical analyses estimate biomass as a variable dependent on physical, chemical and biological parameters. The overall objective is to identify strong and reliable statistical approaches as a first step toward a global implementation. In this report, we assessed two statistical methods: a regression-tree method and a classical regression method. The first, random forest, is increasingly used in marine ecological research because of its robustness and the user-friendly implementation of the algorithm in R packages. The second, the generalized additive model, is frequently used in ecological modelling. The main question of this study was: which of the random forest model and the generalized additive model is the more suitable tool for estimating mesopelagic fish biomass, taking into account environmental parameters and time?

The outcomes could help to better understand the dynamics of mesopelagic fishes in relation to environmental parameters and to evaluate the spatio-temporal distribution of these fish. The first part of this report explains the relationship between mesopelagic fish biomass and the environmental parameters. The second part gives the biomass estimates as a time series between 1980 and 2011 and the distribution of mesopelagic fish biomass in the Mediterranean Sea.

2. MATERIAL AND METHODS

2.1. Study site: Mediterranean Sea.

With a surface area of approximately 2,510,000 km² and an average depth of 1,500 m, the Mediterranean Sea is considered a semi-enclosed sea because of its limited exchange of water with other oceans and seas (the Atlantic Ocean via the Strait of Gibraltar, the Black Sea via the Bosporus and the Dardanelles, and the Red Sea via the Suez Canal). These features make the basin the world’s largest, deepest and most highly pressured enclosed sea.

The Mediterranean Sea has two basins, connected by the Strait of Sicily, that are geographically distinct in terms of circulation and water masses. These basins, of nearly equal size, are divided into several sub-basins characterized by the biogeochemistry of their water (e.g., Ionian Sea, Adriatic Sea, Ligurian Sea). The general circulation of water masses follows an anticlockwise flow along the continental slope, but each basin has its own independent internal circulation. Across both basins, the literature distinguishes three layers of water masses: the Modified Atlantic Water (MAW), the Levantine Intermediate Water (LIW) and the Deep Mediterranean Water (DMW), found at around 0-150 m, 150-400 m and below 400 m depth, respectively (Zavatarelli and Mellor, 1995). The water masses of these layers have slightly different characteristics, which explain their formation (Lascaratos et al., 1999). The deep-sea oceanic domain of interest here, the bathyal zone, covers 72 % of the Mediterranean Sea (Emig and Geistdoerfer, 2004).

The Mediterranean Sea is considered to be a region strongly affected by global warming. All the patterns of the ongoing warming of the world’s oceans occur there at a smaller scale (Lejeusne et al., 2009).

2.1.1. General 3-dimensional geochemical features and the global changes

The geochemical features of the Mediterranean Sea are the subject of many scientific articles. The semi-enclosed sea can be characterized by basin and by layer of the water column (Table 1).

Table 1: Main biogeochemical characteristics of the Mediterranean Sea.

| Parameter | General values by basin (WMED / EMED) in the water column (except for primary production) | Values associated with the observed biomass used to train the model |
|---|---|---|
| Salinity | 38.4 / 39.1 in LIW; 38.4 / 38.7 in DMW (Zavatarelli and Melior, 1995) | Mean: 38.2; Min / Max: 37.2 / 39.1 |
| Temperature | 13.0 / 15.5 °C in LIW; 12.7 / 13.6 °C in DMW (Zavatarelli and Melior, 1995) | Mean: 16.1 °C; Min / Max: 14.4 / 20.6 °C |
| Primary production | 20 / 25 g C/m²/year (Danovaro et al., 2001); 0.3 / 0.6 g C/m²/day WMED (Bosc et al., 2004); 0.2 / 0.4 g C/m²/day EMED in 1999 | Mean: 12.4 mmol N/m²/d; Min / Max: 6.9 / 22.3 mmol N/m²/d |
| Oxygen | 4.28 / 4.91 mL/L in LIW; 4.24 / 4.48 mL/L in DMW (Manca et al., 2003) | Mean: 220.3 µmol/kg; Min / Max: 202.3 / 235.5 µmol/kg |
| Bathymetry | Max depth: ~3500 m / 5121 m | Mean: -1002 m; Min / Max: -4181 / -59 m |

In the deep Mediterranean, the water masses are homothermal from 300-500 m to the bottom. The temperature varies with longitude: the deep water in the Eastern Mediterranean (EMED) is warmer (15.5 °C) (Table 1) than in the Western Mediterranean (WMED) (13 °C) (Danovaro et al., 2010). However, in shallow water the column is stratified in late spring and summer, with a thermocline at around 20 to 50 m depth (Danovaro et al., 2010). The temperature of the water below 200 m depth has also been observed to increase, by up to 0.19 °C/decade since 1961 (Borghini et al., 2014). Between 1970 and 2000, however, the WMED became warmer throughout the water column, whereas the EMED became colder in shallow and mid-water (0-600 m) and warmer in deep water (below 600 m) (Rixen et al., 2005; Schroeder et al., 2012). Themelis (1997) and Olivar et al. (2012) showed that mesopelagic fish assemblages might be correlated with temperature. In particular, Olivar et al. (2012) stated that the DSL could be located under the thermocline.

The Mediterranean Sea is the saltiest sea in the world and is becoming saltier each year in all its layers (Borghini et al., 2014; Rixen et al., 2005). The eastern part is saltier than the western part because of strong net evaporation (50 to 100 cm/year; Bryden and Kinder, 1991). The WMED waters are also less dense because the thermohaline circulation begins at the Strait of Gibraltar with water masses from the Atlantic (Schroeder et al.; Borghini et al., 2014). Themelis (1997) established that the distribution of mesopelagic fishes is affected by salinity.

A high oxygen concentration is also one of the main hydrological features of the deep Mediterranean Sea (Danovaro et al., 2010). The concentration is higher in the MAW and LIW than in the DMW, where it is stable. Over recent decades, the concentration of dissolved oxygen has increased in the deep layers, whereas it has decreased in water above 1000 m depth (Lascaratos et al., 1999). Mesopelagic fish live in the oxygen-deficient layer during daytime (Kinzer et al., 1993). Their abundance could therefore decline if the oxygen concentration decreases (Bianchi and Morri, 2000; Koslow et al., 2011).

Mesopelagic fish might be related to the physical and chemical variables mentioned above. It is also important to remember that these fish are part of the mesopelagic zone, a layer which can itself be described by variables other than those to which mesopelagic fishes are sensitive. Temperature, euphotic depth, nutrients and salinity are the parameters which differentiate the mesopelagic layer from the epipelagic and bathypelagic layers in the Mediterranean Sea (Reygondeau et al., 2015). Only two of them, temperature and salinity, are included in our models.

Figure 1: Clusters of biogeochemical regions of the mesopelagic layer in

Mediterranean Sea. (Reygondeau et al., 2015)


According to Reygondeau et al. (2015), the mesopelagic layer can be clustered into eleven regions based on temperature, salinity, oxygen, pH, NO3 and PO4 values. Each region is an area whose water masses share a unique environmental pattern. For example, the Adriatic Sea (violet zone 9 in Figure 1) has salinity and oxygen values concentrated around 38.8 psu and 5.75 mL/L respectively. The Balearic Sea (green zone 1 in Figure 1) also has salinity values concentrated around 38.8 psu, but its oxygen values fluctuate between 3.5 and 5.75 mL/L. These clusters can help in understanding the biomass distribution of mesopelagic fish.

2.1.2. Biological features:

The low nutrient levels of the Mediterranean characterize the sea as oligotrophic. However, these levels contrast with an intermediate level of primary production. Primary production is generally higher in the WMED than in the EMED, and occurs in early spring and in summer respectively (Bosc et al., 2004). The EMED is also one of the most oligotrophic areas of the world. This pattern is explained by the decrease in nutrients from west (where productive waters flow in from the Atlantic Ocean) to east (Danovaro et al., 2010). Between 1980 and 2011, the period studied in this report, the interannual changes in primary production did not follow a general trend but varied partly with the occurrence of exceptional blooms (Bosc et al., 2004). The relationship between primary production and mesopelagic fish has been shown by several studies: the diet of these fish is composed of an abundant quantity and wide variety of meso- and macroplankton (Kozlov, 1995), and Irigoien et al. (2014) recently showed a positive relationship between primary production and mesopelagic fish.

The distribution of mesopelagic fish is primarily driven by predator-prey relationships. As mentioned in the introduction, mesopelagic fish migrate vertically, mainly to feed. Some schools of mesopelagic fish have also been reported to distribute themselves according to predation pressure (Saunders et al., 2013). A static deep scattering layer, where the majority of the mesopelagic species occur, was identified between 200 and 400 m depth by Peña et al. (2014) in Balearic waters. This layer was a permanent feature observed acoustically at 38 kHz. The DVM of the fish was better observed at a lower frequency (18 kHz).

The biomass of mesopelagic fish could be affected by physical structures such as seamounts (isolated elevations rising 1000 m or more above the seafloor (Menard, 1964)) and sea-ice coverage. In the Mediterranean Sea, 101 seamounts have been counted (Morato et al., 2013), the majority located in the Tyrrhenian Sea and along the south-eastern Ionian Sea up to the south of Crete. These topographic features negatively affect the abundance and community structure of mesopelagic fish because of high predation (De Forest and Drazen, 2008; Pusch et al., 2004).

2.2. Environmental predictors

Climatological data were extracted from a biogeochemical model (Macias et al., 2014) and from datasets freely available online (Table 2). They consisted of averages of selected parameters (e.g., sea surface temperature, salinity) over the whole water column from 1970 to 2011. The models then used the parameters of a given year, for example 1980, to predict the biomass in that year.


As the salinity dataset was not available from 1980, the average between 1987 and 1992 was used to fill in the missing years (1980-1986). This assumes that salinity did not change significantly during the 1980s. The spatial resolution used for modelling was 0.2°, to optimize the running time of the algorithms, although the resolution of the dataset was 0.0625°.

Table 2: Environmental predictors used in the analysis of the biomass

| Environmental variable (unit) | Resolution | Model | Available years | Source |
|---|---|---|---|---|
| Primary production (mmol N m⁻² d⁻¹) | 0.0625° | Biogeochemical model | 1970-2011 | Macias et al., 2014 |
| Oxygen (µmol/kg) | 0.0625° | Biogeochemical model | 1970-2011 | Macias et al., 2014 |
| Temperature (°C) | 0.0625° | Biogeochemical model | 1970-2011 | Macias et al., 2014 |
| Bathymetry (m) | | Global terrain models | / | General Bathymetric Chart of the Oceans (www.gebco.net) |
| Salinity (PSU) | | NEMO-OPA | 1987-2011 | MyOcean follow-on (marine.copernicus.eu) |

For salinity, oxygen and temperature, the time series of our environmental datasets more or less follow the trends described in the previous paragraphs (Figure 2).

Figure 2: Variation of the environmental parameters (salinity, oxygen, temperature and primary production, PPR) over the study period (1980-2011), averaged over the whole Mediterranean Sea.

Temperature and oxygen increase over the whole study period (from 16.4 °C to 16.9 °C and from 214 µmol/kg to 218 µmol/kg respectively). In the case of oxygen, only the deepest layers are known to show a rise, which could explain the increase seen in Figure 2. Salinity decreases until 1997 and then increases from 38.40 to 38.54. No specific trend appears in primary production, which varies between 9.3 and 11 mmol N/m²/d over our time series, in line with Bosc et al. (2004).

Other environmental variables, such as pH and dissolved organic carbon, could influence mesopelagic fish biomass and might have been included in the models. However, either their availability was restricted in time (e.g., the 1980-2011 period was only partially covered) or the resolution of their grids was coarser than 0.2°. Therefore, we could not consider those parameters.

2.3. Biomass information

All biomass data were gathered from the scientific literature, from peer-reviewed articles to scientific reports (Annex VI). To be included, a source had to provide coordinates and estimates of the mesopelagic fish biomass. Literature mentioning fish caught by longlines or found in the diet of a predator could therefore not be taken into consideration. Mesopelagic fish at the larval stage were not included in the biomass estimates.

The species included in the analysis were those defined in FishBase (Froese and Pauly, 2015) as mesopelagic or, in some cases, bathypelagic fishes. Indeed, some fishes defined as bathypelagic can occur in both the mesopelagic and bathypelagic layers (e.g., Arctozenus risso, a bathypelagic fish living between 0 and 2200 m depth) (see Annex II). Benthopelagic, reef and epipelagic species were discarded.

Biomass (in tonnes) was recorded by station (i.e., by longitude and latitude). If only abundance data were available, biomass was computed using the length-weight relationship:

$$W = a \cdot L^{b} \qquad (1)$$

where W is the weight (g), L the total length (cm) of the fish, and a and b coefficients extracted from FishBase or from the literature.

A catchability rate was also included in the analyses. In particular, we applied a catchability of 0.07 for lengths of 7 cm or less, 0.06 for lengths around 8 cm and 0.2 for lengths equal to or greater than 12 cm, as observed by May and Blaber (1989).

Finally, to obtain biomass per unit area, we divided the biomass by the trawled area ($S_{trawled}$), which can be expressed as:

$$S_{trawled} = h \cdot v \cdot t \qquad (2)$$

where h is the width (m) of the mouth of the trawl, v the speed of the boat (m/s) and t the time spent trawling (s).
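The chain from individual lengths to a biomass density can be sketched as follows. This is only an illustration of Equations (1) and (2): the function names and the example values are hypothetical, the speed is taken in m/s so that h·v·t is an area, and dividing the catch by the catchability is an assumption, since the text does not state how the catchability rate was applied.

```python
def fish_weight_g(length_cm, a, b):
    """Equation (1): weight in g from total length in cm."""
    return a * length_cm ** b


def trawled_area_km2(mouth_width_m, speed_m_s, duration_s):
    """Equation (2): swept area h * v * t, converted from m2 to km2."""
    return mouth_width_m * speed_m_s * duration_s / 1e6


def biomass_density_t_km2(lengths_cm, a, b, catchability,
                          mouth_width_m, speed_m_s, duration_s):
    """Biomass per unit area (t/km2) for one haul.

    Assumption: the caught biomass is divided by the catchability to
    estimate the biomass actually present in the trawled volume.
    """
    caught_g = sum(fish_weight_g(length, a, b) for length in lengths_cm)
    corrected_t = caught_g / catchability / 1e6  # g -> tonnes
    return corrected_t / trawled_area_km2(mouth_width_m, speed_m_s, duration_s)


# Hypothetical haul: two 5 cm fish, a = 0.01, b = 3, catchability 0.07,
# 10 m wide trawl mouth towed at 1.5 m/s for 30 minutes.
density = biomass_density_t_km2([5.0, 5.0], 0.01, 3.0, 0.07, 10.0, 1.5, 1800.0)
```

Keeping the catchability as an explicit argument avoids hard-coding a rule for the length classes that the text leaves unspecified (between 8 and 12 cm).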


Figure 3: Distribution of the samples in the Mediterranean Sea and values of the

biomass (t/km²) in each sample.

In total, we gathered 905 samples (including biomass estimates, depth, coordinates and years of survey), mainly distributed along the northern shore of the Mediterranean Sea (Figure 3), particularly in the western part. The biomass varied between 5.36 × 10⁻⁹ t/km² and 5.64 t/km². Thirteen mesopelagic families were assessed; three of them, well known in the mesopelagic layer (gonostomatids, myctophids and stomiids), represented 71.8 % of the biomass sampled.

The biomass extracted and/or estimated in this study represents the total biomass of

mesopelagic fish in the water column.

2.4. Biomass distribution models

To examine the relationship between mesopelagic fish biomass and environmental data, two kinds of statistical algorithms were fitted to the dataset: random forest and the generalized additive model. In both models, each biomass point was weighted in proportion to the number of years for which biomass was available at that coordinate point, giving it more importance. For instance, a point where biomass was sampled in 3 years has a weight of 3. According to Pearson’s correlation coefficient, our predictors were not strongly correlated with each other (i.e., the coefficient was lower than 75 % for every pair of predictors).
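The pairwise correlation screening described above could look like the following sketch. The data values are invented for illustration and are not the thesis data; comparing the absolute value of the coefficient with the 0.75 threshold is an assumption, as the text only gives the threshold.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(predictors, threshold=0.75):
    """Return the predictor pairs whose |r| reaches the threshold."""
    flagged = []
    for (name_1, v1), (name_2, v2) in combinations(predictors.items(), 2):
        r = pearson(v1, v2)
        if abs(r) >= threshold:
            flagged.append((name_1, name_2, r))
    return flagged


# Hypothetical values: temperature and oxygen are made strongly
# (negatively) correlated on purpose, salinity is not.
toy_predictors = {
    "temperature": [14.0, 15.0, 16.0, 17.0, 18.0],
    "oxygen": [235.0, 230.0, 220.0, 215.0, 205.0],
    "salinity": [38.0, 38.9, 37.5, 38.6, 38.1],
}
flagged = correlated_pairs(toy_predictors)
```

In this toy example only the temperature/oxygen pair is flagged; in the study itself no pair of predictors reached the threshold.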

All the statistical analyses and maps were done with R (R Core Team) and QGIS (QGIS

Development Team) respectively.

2.4.1. Random Forest

Traditional Random Forest (hereafter RF) was chosen over Conditional Inference Trees (CIT) and Boosted Regression Trees (BRT), all of which are tools used in ecological modelling (De’ath, 2006; Prasad, 2006). All three grow many decision trees and combine them (by bagging or boosting) for prediction. Whereas RF randomly selects the predictors considered for splitting the observations, BRT selects the variables which maximise the difference between the two groups. In addition, BRT combines additive regression with boosting techniques, which would overlap with the other method used in this study (Hastie et al., 2009). CIT differs from traditional random forest in that a test of statistical significance is used when partitioning the training data, and in the aggregation scheme of the n trees (Hothorn et al., 2006b): it tests whether the variable selected for a split significantly improves the general performance of the model. However, except for data frames containing categorical variables, an ensemble of CIT gives the same results as traditional RF.

2.4.1.1. General information about Random Forest

Random Forest is a bagging-based machine learning method which builds many decision trees from bootstrap samples of the dataset. The response variable (in our case, log-transformed biomass) is partitioned by successive binary splits, each based on a single value of a predictor. For example, suppose that temperature and salinity are used for the partitioning, with split values of 16 °C and 37.2 respectively: all samples with lower temperature and salinity are grouped on one side, and samples with higher values on the other. The first group is then split again using two new variables (for example, oxygen and salinity), while the second group is processed in the same way, independently, with other variables.

At each node of a tree (i.e., the group of data formed between two splits), the split retained is the best one among all the values of the variables selected for splitting (e.g., splitting at a temperature of 16 °C rather than 17 °C). The RF algorithm then averages all the decision trees, which have been grown independently on the training dataset, i.e. the dataset used to fit the model (Figure 4).
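How the "best" split value of one predictor is chosen at a node can be illustrated with a minimal sketch. It assumes the usual regression-tree criterion (minimizing the summed residual sum of squares of the two child nodes); the data are invented and the function names are hypothetical.

```python
def rss(values):
    """Residual sum of squares around the mean of a group."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, rss) of the split 'x <= threshold' minimizing RSS.

    Candidate thresholds are the midpoints between consecutive distinct
    values of the predictor.
    """
    pairs = sorted(zip(x, y))
    best_threshold, best_rss = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = rss(left) + rss(right)
        if total < best_rss:
            best_threshold, best_rss = threshold, total
    return best_threshold, best_rss


# Hypothetical node: log biomass jumps between 15.5 and 16.5 degrees C.
temperature = [14.5, 15.0, 15.5, 16.5, 17.0, 17.5]
log_biomass = [-8.0, -8.2, -7.9, -3.0, -3.1, -2.8]
threshold, split_rss = best_split(temperature, log_biomass)
```

In a random forest the same search is repeated at every node, but only over the randomly selected subset of predictors.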

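Figure 4: Random Forest process. n bootstrapped samples are drawn from the dataset and one tree is grown on each; the MSE of each tree is computed on its out-of-bag (OOB) sample, and the model is the average of the n trees.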

RF automatically performs a form of cross-validation, with about one third of the dataset kept out of the training data of each tree. This third, called the Out-of-Bag (OOB) data, is used to assess the predictive accuracy of the model by computing the mean squared error (MSE), denoted MSEOOB. The MSEOOB is used to choose the number of trees needed in the forest; it generally decreases as the number of trees increases.
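The origin of the out-of-bag data can be made concrete with a short sketch of one bootstrap draw. Sampling n rows with replacement leaves each row out with probability (1 - 1/n)^n, which tends to 1/e, about 37 %, the "one third" mentioned above. The sample size 905 and the seed 1000 are taken from the text; the function name is hypothetical.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample and return (in-bag, out-of-bag) indices."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    oob = [i for i in range(n_samples) if i not in chosen]
    return in_bag, oob


rng = random.Random(1000)
in_bag, oob = bootstrap_split(905, rng)
oob_fraction = len(oob) / 905  # expected to be close to 1/e
```

Each tree is grown on its in-bag rows only, so its predictions for the out-of-bag rows are independent of the data used to fit it.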

Unlike linear regression, RF does not require the data to be normally distributed. Random Forest can model and predict non-linear response variables with a small number of samples and a large number of predictors, even if interactions and correlations among variables exist. The predictors can be a mix of continuous and categorical variables (Breiman, 2001) or of a single type. However, RF does not tolerate NA values. The data need to be fixed either by performing a local average, by using a function (‘na.roughfix’) included in the R package randomForest, or by removing the NA values.

For regression RF, the model can be understood through the following equation (Hastie et al., 2009):

$$F(x) = \frac{1}{J}\sum_{j=1}^{J} c_{j}^{\mathrm{full}} + \sum_{k=1}^{K}\left(\frac{1}{J}\sum_{j=1}^{J} \mathrm{contrib}_{j}(x, k)\right) \qquad (3)$$

where J is the number of trees in the forest, K the number of nodes, $c_{j}^{\mathrm{full}}$ the value of the response at the root of tree j (average value of the response in the training dataset) and $\mathrm{contrib}_{j}(x, k)$ the contribution of the kth feature in the feature vector x (i.e., the difference between the averaged response at node k and at node k-1).

The theoretical prediction function of Random Forest, in the case of regression, is thus the average of the bias terms plus the average contribution of each feature. This form is quite similar to multiple linear regression, except that coefficients for the independent variables used in the splits cannot be provided.

The performance of the model (percent variance explained) is useful for comparing this model with the generalized additive model (GAM). It is computed from the MSE between the observations and the OOB predictions:

$$MSE_{OOB} = n^{-1} \sum_{i=1}^{n} \left(y_i - \hat{y}_i^{OOB}\right)^2 \qquad (4)$$

and then:

$$R^2 = 1 - \frac{MSE_{OOB}}{\hat{\sigma}_y^2} \qquad (5)$$

where n is the number of samples, $\hat{y}_i^{OOB}$ the average of the Out-of-Bag predictions for sample i and $\hat{\sigma}_y^2$ the variance of the response, computed with n as divisor (Liaw and Wiener, 2002).
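Equations (4) and (5) translate directly into a few lines of code. The sketch below follows the Liaw and Wiener definition, with the variance of the observed response computed with n as divisor; the function names and the numbers are illustrative only.

```python
def mse_oob(observed, oob_predicted):
    """Equation (4): mean squared error between observations and OOB predictions."""
    n = len(observed)
    return sum((y - p) ** 2 for y, p in zip(observed, oob_predicted)) / n


def pseudo_r2(observed, oob_predicted):
    """Equation (5): 1 - MSE_OOB / variance of the observed response."""
    n = len(observed)
    mean_y = sum(observed) / n
    variance_y = sum((y - mean_y) ** 2 for y in observed) / n  # divisor n
    return 1.0 - mse_oob(observed, oob_predicted) / variance_y


# Hypothetical log-biomass values and their OOB predictions.
obs = [-8.0, -6.0, -4.0, -2.0]
pred = [-7.0, -6.0, -4.0, -3.0]
r2 = pseudo_r2(obs, pred)  # MSE = 0.5, variance = 5, so R2 = 0.9
```

Because the OOB predictions come from trees that never saw the corresponding sample, this pseudo-R² can be negative when the model predicts worse than the mean of the response.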

Although RF is increasingly used in research, a bias has been noticed: small values are overestimated and large values are underestimated. To correct this bias, we followed a methodology from the literature (Xu, 2013; Zhang and Lu, 2012).

The steps described by Zhang and Lu are:

1- Random Forest is applied to the response ($Y_{obs}$) with the predictors ($X_i$, $i = 1, \dots, n$).

2- The bias $\hat{B}$ is obtained: $\hat{B} = \hat{Y} - Y_{obs}$

3- Random Forest is applied to the bias $\hat{B}$ with the predictors ($X_i$, $i = 1, \dots, n$).

4- A corrected estimate of the biomass is obtained for each sample of the training data: $\hat{Y}_{bc} = \hat{Y} - \hat{B}_{RF}$
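The four steps above can be sketched with any regression learner. To keep the example self-contained, a small k-nearest-neighbour regressor stands in for Random Forest (an assumption made only for illustration, not the method used in the study); like RF, it pulls predictions towards the mean, so it overestimates small values and underestimates large ones. All names and data are hypothetical.

```python
def fit_knn(x_train, y_train, k=3):
    """Stand-in learner (NOT random forest): mean response of the k nearest points."""
    def predict(x_new):
        predictions = []
        for x0 in x_new:
            order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x0))
            predictions.append(sum(y_train[i] for i in order[:k]) / k)
        return predictions
    return predict


def bias_corrected_predictions(x, y_obs, fit=fit_knn):
    """Apply the four bias-correction steps on the training data."""
    y_hat = fit(x, y_obs)(x)                          # step 1: fit on the response
    bias = [p - o for p, o in zip(y_hat, y_obs)]      # step 2: B = Y_hat - Y_obs
    bias_hat = fit(x, bias)(x)                        # step 3: fit on the bias
    y_bc = [p - b for p, b in zip(y_hat, bias_hat)]   # step 4: Y_bc = Y_hat - B_hat
    return y_hat, y_bc


def mean_squared_error(predicted, observed):
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 4.0]
y_hat, y_bc = bias_corrected_predictions(x, y)
```

In this toy case the uncorrected predictions are [2, 2, 3, 3]; the correction moves the extremes back towards the observations and lowers the training MSE, although it does not remove the bias entirely.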


Linear regression provides the coefficients of the predictors and their contribution to the variance explained by the model. RF, in contrast, only provides the variable importance, i.e. how much the model performance is affected when a predictor is discarded from the model. Because it is impossible to interpret how a variable affects the response, RF is considered a black box.

For regression random forest, two kinds of variable importance are provided. The first is the increase in MSE when the values of a predictor are permuted, i.e. the difference between the MSE of the model with all variables intact and the MSE of the same model when one variable has been permuted. The second is the increase in node purity, measured by the residual sum of squares (RSS), which looks at how much splitting on a predictor reduces the RSS at the nodes of the trees (Hastie et al., 2009). Variable importance shows the contribution of the predictors to the model accuracy, but it must be handled carefully: if a highly correlated variable is present among the predictors, its importance is affected and becomes relatively large (Gregorutti et al., 2013). In that case, the predictor must be discarded.
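The permutation-based importance can be illustrated on a toy model. In the sketch below, the "fitted model" is a hypothetical function that depends on temperature only, and a fixed cyclic shift stands in for the random permutation so that the result is reproducible; neither choice comes from the study.

```python
def model_mse(model, rows, y):
    """Mean squared error of a model over a list of predictor rows."""
    return sum((model(row) - target) ** 2 for row, target in zip(rows, y)) / len(y)


def permutation_importance(model, rows, y, column):
    """Increase in MSE when the values of one predictor are shuffled."""
    baseline = model_mse(model, rows, y)
    values = [row[column] for row in rows]
    shifted = values[1:] + values[:1]  # deterministic stand-in for a permutation
    permuted_rows = [dict(row, **{column: v}) for row, v in zip(rows, shifted)]
    return model_mse(model, permuted_rows, y) - baseline


def toy_model(row):
    """Hypothetical fitted model: uses temperature, ignores salinity."""
    return 2.0 * row["temperature"]


rows = [
    {"temperature": 1.0, "salinity": 38.1},
    {"temperature": 2.0, "salinity": 38.7},
    {"temperature": 3.0, "salinity": 37.9},
    {"temperature": 4.0, "salinity": 38.4},
]
y = [2.0, 4.0, 6.0, 8.0]
importance_temperature = permutation_importance(toy_model, rows, y, "temperature")
importance_salinity = permutation_importance(toy_model, rows, y, "salinity")
```

Permuting the predictor the model relies on degrades the MSE, whereas permuting an unused predictor leaves it unchanged; in a real forest the comparison is made on the out-of-bag samples of each tree.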

2.4.1.2. Parametrisation of Random Forest

Random Forest was run with the package randomForest (Cutler and Wiener, 2012). Among the packages implementing random forest (e.g., party, quantregForest), this one provided understandable outputs and fast results. It is also well documented, and its advantages and drawbacks are well known. A short run of CIT with ‘cforest’ from the party package (Hothorn et al., 2015, version 1.0-22) gave similar results. Random Forest needs two inputs: the number of variables considered at each split and the number of trees.

The number of predictors considered at each split was fixed to 1 (e.g., temperature or salinity) using the function ‘tuneRF’ from the package randomForest. No pruning, i.e. limiting the number of nodes in a tree, was applied. The data were then bootstrap-resampled to generate a large number of unpruned decision trees. The number of trees was set to 400. With fewer trees, the performance of the model was unstable and the ranking of the variables by importance varied between runs. With more trees, a risk of overfitting the model could arise. With 400 trees, the pseudo-R² was stabilized (Breiman, 2001).

Once the parameters were defined, the model was run with the random seed set to 1000. The expected outputs were the variable importance, the mean squared error and the pseudo-R². Partial dependence plots, which plot the response against one predictor while the other predictors are held at their mean values, can also be obtained (Elith et al., 2008).

2.4.2. General non parametric regression model

2.4.2.1. Optimal transformation with ACE

The Alternating Conditional Expectation (ACE) algorithm seeks the transformations of the variables which maximize the correlation between the response (log-transformed biomass) and the predictors (temperature, salinity, etc.) (Breiman and Friedman, 1985). It allows non-monotonic transformations of all variables. The shape of the plot of each transformed variable against its observed values indicates the type of transformation required for regression analysis. ACE was applied with the package acepack (Spector et al., 2014). The outputs are presented in Figure 5.

Figure 5: ACE transformations of the independent variables included in the regression (panels: oxygen, bathymetry, PPR, salinity and temperature; y-axis: ACE-transformed biomass).

The relationship between biomass and the environmental parameters can be expected to be linear for oxygen and primary production. Some discontinuities in the plots of oxygen (~220 µmol/kg) and PPR (12.5 mmol N/m²/d) could be associated with unusual values of the predictors. Although the shape of the temperature plot seems to be segmented, a logarithmic transformation could be applied to this parameter and also to the bathymetry variable. For salinity, linearity cannot be assumed without segmenting the variable or allowing a non-linear relationship. The distribution of this variable shows two modes (Annex III). This result led us to use a generalized additive model (GAM).

2.4.2.2. Generalized additive model regression

GAM is an extension of the linear model that allows several transformations on the

independent variables. Like RF, it can deal with non-linear data.


The results of the optimal transformation with ACE and the distribution of the observations (Annex III) led us to fit the data with a semi-parametric regression model. Indeed, except for salinity, all the environmental parameters could have a linear relationship with the response, with or without a logarithmic transformation.

The semi-parametric GAM is derived from the original equation of Hastie and Tibshirani (1986):

$$g(y) = \mu + \beta_1 x_1 + \dots + \beta_k x_k + m_{k+1}(x_{k+1}) + \dots + m_n(x_n) + \varepsilon \qquad (6)$$

where g() is the monotone link function, $\mu$ the intercept, $\beta_k$ the coefficients of the parametric variables, $m_k$ the partial-regression functions, fitted using smoothing splines for the non-parametric variables, and $\varepsilon$ the normally distributed random error.

GAM is fitted by penalized likelihood maximization. Smoothing parameters, when unknown, are estimated using the Generalized Cross Validation (GCV) criterion (Wood, 2015):

$$GCV = \frac{n \cdot D}{(n - DoF)^2} \qquad (7)$$

where n is the number of samples, D the deviance and DoF the effective degrees of freedom.

GAM was implemented using the ‘mgcv’ package (version 1.8-7; Wood, 2015). The selection of the final model followed several criteria: (i) the estimated degrees of freedom must not be close to 1, (ii) the confidence interval should not be close to 0 along the smooth function plot, (iii) the GCV score decreases if one of the covariates is dropped. In addition, the selected model should not give (i) unrealistic biomass values or (ii) errors in the estimation of the uncertainty of the biomass (Wood, 2001).

With an identity link function, the Gaussian model structure was:

$$\log(\mathrm{Biomass}) = \mathrm{PPR} + \mathrm{Oxygen} + \log(\mathrm{Bathymetry}) + \log(\mathrm{Temperature}) + s(\mathrm{Salinity}) + \log(\mathrm{Bathymetry}) : \mathrm{Oxygen} \qquad (8)$$

where s(), the smooth function, is a thin plate regression spline.

The basis dimension parameter k was set to 5, thereby limiting the maximum degrees of freedom to 4. This choice is consistent with the conditions mentioned above. A low value for this parameter also has the advantage of avoiding overfitting and is better suited to an ecological question. The parameter gamma, an ad hoc correction which inflates the degrees of freedom in the GCV, was set to 1.4 to avoid overfitting the data, as suggested by Kim and Gu (2004).
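The effect of gamma on the GCV criterion of Equation (7) can be shown numerically. The sketch assumes that gamma simply multiplies the effective degrees of freedom in the denominator, which is how the text describes it; the function name and the input values are hypothetical.

```python
def gcv_score(n, deviance, edf, gamma=1.0):
    """GCV = n * D / (n - gamma * DoF)^2; gamma > 1 penalizes complexity more."""
    return n * deviance / (n - gamma * edf) ** 2


# Hypothetical fit: 100 samples, deviance 50, 10 effective degrees of freedom.
plain = gcv_score(100, 50.0, 10.0)                # 5000 / 90^2
inflated = gcv_score(100, 50.0, 10.0, gamma=1.4)  # 5000 / 86^2
```

For the same deviance, the score with gamma = 1.4 is higher, so a wigglier smooth must reduce the deviance by a larger amount before it is preferred.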

Interaction terms between bathymetry and the other independent variables were considered as potential effects on the biomass. However, only one interaction was included in the model, since the other terms did not fulfil the conditions.

Outliers, identified by a Cook’s distance greater than a threshold of 0.05, were discarded from the dataset.


Finally, the deviance explained by each predictor was computed with a step-wise approach in which each term in turn was dropped from the final model: the difference between the residual deviance of the final model and that of the model without the variable was then taken.

The comparison between GAM and RF was mainly based on the performance of the models and on their predictions. To complete the study of performance, a 10-fold cross-validation was repeated 100 times, and the root-mean-square error (RMSE) of the two models was compared.
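A repeated k-fold cross-validation of this kind can be sketched as follows. The learner is passed in as a function, so the same routine could score either model; here a trivial mean predictor is used as a placeholder, and all names, the seed and the data are illustrative assumptions.

```python
import random
from math import sqrt


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_cv_rmse(x, y, fit, k=10, repeats=100, seed=1000):
    """Average RMSE over 'repeats' rounds of k-fold cross-validation."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in k_fold_indices(len(y), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            predict = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
            squared = [(predict(x[i]) - y[i]) ** 2 for i in test_idx]
            scores.append(sqrt(sum(squared) / len(squared)))
    return sum(scores) / len(scores)


def fit_mean(x_train, y_train):
    """Placeholder learner: always predicts the mean of the training response."""
    mean = sum(y_train) / len(y_train)
    return lambda x_new: mean


# Hypothetical data: a constant response, so the mean predictor is exact.
rmse = repeated_cv_rmse(list(range(20)), [-5.0] * 20, fit_mean, k=10, repeats=3)
```

Repeating the procedure with different random partitions reduces the dependence of the RMSE on one particular split of the 905 samples.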

2.5. Prediction of biomass

Once calibrated, the models were used to predict the total biomass for the Mediterranean Sea on a generated grid of 0.2° x 0.2° resolution. First, the biomass was predicted for each year from 1980 to 2011. The total biomass per year was obtained by integration with the following equation:

$$\sum_{\mathrm{cell}} \left(\mathrm{Area}_{\mathrm{cell}} \times \mathrm{Biomass}_{\mathrm{cell}}\right) \qquad (9)$$

Given the resolution and the latitude and longitude range of the Mediterranean Sea, the area of each cell was 22.24 x $d_{\mathrm{cell}}$ km², where 22.24 km is the distance spanned by 0.2° of latitude.

The distance spanned by 0.2° of latitude does not change with longitude in the Mediterranean Sea, whereas the distance spanned by 0.2° of longitude changes with latitude. To account for this variation, $d_{\mathrm{cell}}$ was computed as follows:

$$d_{\mathrm{cell}} = R \cdot \arccos\left(\sin^{2}(\mathrm{rad}(Lat_{\mathrm{cell}})) + \cos^{2}(\mathrm{rad}(Lat_{\mathrm{cell}})) \times \cos(\mathrm{rad}(0.2))\right) \qquad (10)$$

where, for one cell, R is the radius of the Earth (6371 km), $Lat_{\mathrm{cell}}$ the latitude of the cell and ‘rad’ denotes the conversion from degrees to radians.
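Equations (9) and (10) can be checked numerically with a short sketch. The constants come from the text (R = 6371 km, 0.2° cells); the function names and the example cells are hypothetical.

```python
from math import acos, cos, radians, sin

EARTH_RADIUS_KM = 6371.0
CELL_DEG = 0.2


def cell_width_km(lat_deg, cell_deg=CELL_DEG):
    """Equation (10): east-west extent of a cell at a given latitude."""
    lat = radians(lat_deg)
    cos_angle = sin(lat) ** 2 + cos(lat) ** 2 * cos(radians(cell_deg))
    return EARTH_RADIUS_KM * acos(min(1.0, cos_angle))  # guard against rounding


def cell_height_km(cell_deg=CELL_DEG):
    """North-south extent of a cell: the same at every longitude (about 22.24 km)."""
    return EARTH_RADIUS_KM * radians(cell_deg)


def cell_area_km2(lat_deg):
    return cell_height_km() * cell_width_km(lat_deg)


def total_biomass_t(cells):
    """Equation (9): sum over cells of area (km2) times biomass density (t/km2)."""
    return sum(cell_area_km2(lat) * biomass for lat, biomass in cells)


# Hypothetical grid of two cells given as (latitude, biomass in t/km2).
total = total_biomass_t([(35.0, 0.5), (40.0, 1.2)])
```

At the equator the width equals the height, and it shrinks roughly with the cosine of the latitude, which is why a single cell area cannot be used for the whole basin.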

To identify where the biomass could have changed significantly, the variance per cell across all years was computed (Equation 11) and mapped. All projections used the World Geodetic System (WGS) 84 reference.

$$\frac{\sum_{i} (x_i - \bar{x})^{2}}{n} \qquad (11)$$

where $x_i$ is the biomass (t/km²) in year i, $\bar{x}$ the average biomass of the cell and n the number of years.
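Equation (11) is the population variance (divisor n) of the yearly predictions of a cell. A minimal sketch, with a hypothetical mapping from cell coordinates to yearly biomass values:

```python
from statistics import pvariance


def variance_by_cell(biomass_by_cell):
    """Equation (11) applied to each cell: variance across years, divisor n."""
    return {cell: pvariance(series) for cell, series in biomass_by_cell.items()}


# Hypothetical predictions (t/km2) for two cells over three years,
# keyed by (longitude, latitude).
predictions = {
    (3.0, 40.0): [1.0, 2.0, 3.0],
    (18.0, 35.0): [2.0, 2.0, 2.0],
}
variances = variance_by_cell(predictions)
```

A cell whose predicted biomass never changes has a variance of zero, so the mapped variance highlights the areas where the predictions fluctuate most over the period.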


3. RESULTS

3.1. Comparison of model performance

Random Forest performs with a pseudo-R² of 76.5 %. After adding the bias correction, the model overestimates the large values and underestimates the low values (Figure 6a). The bias correction does not substantially change the results. GAM performs less well than RF, with an adjusted R² of 47.2 % and a GCV of 9.66. According to Figure 6b, GAM tends to overestimate the low values and underestimate the large values. Most of the biomass lies in a range between -10 and 0 log t/km² with both Random Forest and GAM (though the latter is less centred on the mean).

The analysis of the residuals indicates that the GAM is acceptable. The homoscedasticity of the residuals is respected, although their distribution departs from normality for the smaller residual values (e.g., residuals from -5 to -10 log t/km²), which may be related to overfitting of the small values. The residuals also appear to be independent and generally follow a normal distribution, except in the range between -5 and -10 log t/km² (Annex IV).

Figure 6: Scatter plots of observed versus predicted mesopelagic fish biomass from a)

Random Forest model and b) GAM. The blue dashed line is the 1:1 line, the black line

is the fit. The black dashed line in a) is the fit without the bias correction.

Random Forest treats all the predictors as non-linear. In Figure 7, no linear relationship appears except over segments of primary production (from 10 to 17 mmol N/m²/d) and oxygen (from 205 to 217.5 µmol/kg). The flat response observed above 17 mmol N/m²/d for primary production and below 37.8 for salinity is due only to a lack of data.


These results are similar to those obtained from the ACE algorithm. The salinity curve has almost the same shape as the salinity curve in Figure 5. The same holds for temperature, which shows a distinct change around 16 °C. The similarity for oxygen and bathymetry is less clear. The primary production trends, however, are completely different: with RF the response mainly rises with primary production, whereas the ACE plot of this parameter shows a decrease in biomass as the predictor increases.

Figure 7: Partial dependence plots from the Random Forest model. The y-axis is the absolute value of the prediction when the other variables are held fixed.

A summary of variable importance is given in Figure 8. Salinity is the most influential parameter in the Random Forest model: the MSE increases by 40 % when salinity is permuted in the training data. Oxygen, with 38.3 %, is the second most influential parameter. Temperature and bathymetry have a similar influence. Finally, primary production is the variable with the least influence on the model, with 30.6 %. The absence of any dominant variable importance is consistent with a lack of correlation between the predictors.

Moreover, the ranking of variable importance obtained with the second method, the increase in node purity, is the same.
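The idea behind the permutation-based importance can be sketched as follows. This is a simplified illustration, not the randomForest implementation (which works on out-of-bag samples per tree); the stand-in model `toy_model`, its coefficients and the simulated data are all hypothetical.

```python
import random


def mse(model, X, y):
    """Mean squared error of a prediction function over rows X and targets y."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, column, n_repeats=20, seed=0):
    """Average percent increase in MSE after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:column] + [value] + row[column + 1:]
                  for row, value in zip(X, shuffled)]
        increases.append(100.0 * (mse(model, X_perm, y) - baseline) / baseline)
    return sum(increases) / n_repeats


def toy_model(row):
    """Hypothetical fitted model: strong salinity effect, weak oxygen effect."""
    salinity, oxygen = row
    return 2.0 * (salinity - 38.0) + 0.05 * (oxygen - 210.0)


data_rng = random.Random(1)
X = [[data_rng.uniform(37.5, 39.0), data_rng.uniform(200.0, 220.0)]
     for _ in range(200)]
y = [toy_model(row) + data_rng.gauss(0.0, 0.5) for row in X]

importance = {name: permutation_importance(toy_model, X, y, col)
              for col, name in enumerate(["salinity", "oxygen"])}
print(importance)
```

Shuffling a column breaks its link with the response while keeping its distribution, so the resulting loss of accuracy measures how much the model relies on that predictor.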


Figure 8: Predictor variable importance in the Random Forest approach, with confidence intervals.

The deviance explained by each predictor (Table 3) also indicates that salinity explains most of the model deviance. Indeed, the residual deviance changes by 8864 when salinity is excluded from the GAM, roughly 100 times more than for any other term. With GAM, the interaction is the second most important term, whereas with RF it is oxygen. Primary production is once again among the variables that contribute the least to the model.

Table 3: Contribution of each parameter to the deviance of the GAM.

| Parameter | ΔResidual deviance |
|---|---|
| Salinity | 8864.0 |
| Interaction | 93.64 |
| Log(Bathymetry) | 88.99 |
| Oxygen | 53.12 |
| Log(Temperature) | 52.28 |
| PPR | 39.54 |

Figure 9 shows how the log-transformed biomass changes with salinity when the other variables of the model are held constant. The shape of the curve is similar to the one obtained with the ACE transformation (see Figure 5), and its trend resembles the one in the partial dependence plot of Random Forest (Figure 7).


Figure 9: Form of the smooth functions for the selected covariate. Black solid line is

the smooth function estimate with grey intervals representing the 95% confidence

intervals.

Finally, GAM returns statistically significant coefficients (Table 4) whose signs correspond to the shapes of the plots obtained from the ACE algorithm. For instance, primary production had a negative linear relationship with ACE, and its GAM coefficient is also negative. The only exception is the negative coefficient of oxygen, which might be due to the interaction with bathymetry.

Table 4: Significance levels of the GAM terms.

Parametric coefficients:

| Term | Estimate | Standard error | t-value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 70.68376 | 22.98611 | 3.075 | 0.00217 |
| Oxygen | -0.22877 | 0.09551 | -2.395 | 0.01681 |
| PPR | -0.04155 | 0.01942 | -2.140 | 0.03262 |
| Log(Bathymetry) | -9.40316 | 3.05023 | -3.083 | 0.00211 |
| Log(Temperature) | -9.38056 | 4.08206 | -2.298 | 0.02179 |
| Log(Bathymetry)/Oxygen | 0.04360 | 0.01376 | 3.169 | 0.00158 |

Approximate significance of smooth terms:

| Term | Estimated df | Ref. df | F | p-value |
|---|---|---|---|---|
| s(Salinity) | 3.972 | 4 | 225.9 | <2e-16 |

[Figure 9 plot: x-axis, Salinity (37.5 to 39.0); y-axis, s(Salinity) (-2 to 4).]


Judged on RMSE (Figure 10), the Random Forest model performs best, with the lowest RMSE. Under cross-validation, the average RMSE is 1.79 for RF and 2.80 for GAM.

Figure 10: Boxplot of the RMSE obtained from a 10-fold cross-validation repeated 100 times.
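A repeated k-fold cross-validation of this kind can be sketched as follows. The two stand-in models (a mean-only baseline and a one-predictor least-squares line), the simulated data and the choice to pool the held-out errors into one RMSE per repetition are assumptions made for illustration; the study compared RF and GAM.

```python
import math
import random


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares with a single predictor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def repeated_kfold_rmse(fit, xs, ys, k=10, repeats=100, seed=0):
    """Return one pooled RMSE per repetition of a shuffled k-fold split."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)
        squared_errors = []
        for fold_number in range(k):
            test_idx = indices[fold_number::k]
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
            squared_errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
        scores.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return scores


# Simulated data: log biomass driven by a salinity-like predictor plus noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(37.5, 39.0) for _ in range(80)]
ys = [-4.0 + 3.0 * (x - 38.0) + data_rng.gauss(0.0, 0.5) for x in xs]

rmse_line = repeated_kfold_rmse(fit_line, xs, ys)
rmse_mean = repeated_kfold_rmse(fit_mean, xs, ys)
print("mean RMSE, line:", sum(rmse_line) / len(rmse_line))
print("mean RMSE, baseline:", sum(rmse_mean) / len(rmse_mean))
```

Each repetition yields one RMSE per model, and the 100 values per model are what a boxplot such as Figure 10 summarises.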

3.2. Comparison of the estimation of the biomass in Mediterranean Sea

3.2.1. Biomass in 1980, an example

The distribution of the mesopelagic biomass differs between the two models (Figure 11). Both models seem to predict less biomass in the Ionian basin and a biomass of more than 0.1 t/km² in the Alboran Sea. Along the Lebanese and Egyptian coasts, both models estimate the presence of mesopelagic fish, but at different levels of biomass. With Random Forest, the biomass of mesopelagic fish is concentrated in the Ligurian Sea and between the Balearic Islands and Sardinia/Corsica.

Unexpectedly, Random Forest predicts a biomass between 0.01 and 0.05 t/km² in the Adriatic Sea (Figure 11.a)), where the bathymetry is less than 200 m. The Gulf of Gabès also appears to have a relatively high biomass although it lies on the continental shelf. With GAM (Figure 11.b)), a mesopelagic biomass between 0.01 and 0.05 t/km² is noticeable along the Algerian coast and in the Aegean Sea as far as Crete. High biomass occurs in the Gulf of Lions and along the Spanish coast in the western Balearic Sea.

However, the uncertainty measured with the coefficient of variation (CV) indicates that those predictions are not reliable. In those areas, the CV (Annex V) is between 30 % and 40 %. Similar values are also seen in the northern part of the Adriatic Sea and in the centre of the Aegean Sea. The CV is under 10 % in the other parts of the Mediterranean Sea.


Figure 11: Distribution of the predicted biomass in 1980 with a) RF and b) GAM.

3.2.2. Temporal biomass estimates

The temporal estimates of biomass from both models show some variability (Figure 12).

With RF, mesopelagic biomass seems to increase gradually from 1980 to 2000 and to decrease strongly from 2004 to 2011. The biomass ranges approximately between 25,000 t and 35,000 t over the 31 estimated years.

With GAM, no clear trend can be observed. However, biomass seems to have increased between 1988 (19,500 t) and 1993 (27,900 t) and then decreased again until 1998. With GAM, biomass varies between 19,500 t and 27,900 t over the 31 estimated years (1980-2011), with an average uncertainty of 100 t.

Although the range of biomass is quite similar between the models, the RF estimates are larger than the GAM estimates, by 35 % on average. The variations in biomass are also not synchronous: in particular, RF shows no marked variation of the biomass between 1988 and 1998. These differences show that the two models behave differently and yield different predictions.


Figure 12: Biomass of mesopelagic fish from 1980 to 2011 estimated by RF

(continuous line) and GAM (dashed line).

The change in biomass over the 31 estimated years differs between the two models. Except in the Gulf of Gabès, where there is a variance of 0.8 along the continental slope, the areas of high variance do not coincide between the models. With Random Forest (Figure 13.a)), the variance is highest (> 1.8 × 10⁻³ (t/km²)²) where the biomass was already relatively high (Figure 11.a)): in the North Aegean Sea, along the Egyptian coast and in the Ligurian Sea.


Figure 13: Variance of the mesopelagic biomass over the 31 estimated years (1980-2011) from a) RF and b) GAM.

For the GAM estimates (Figure 13.b)), the pattern seen in Figure 11 is again noticeable. The Alboran Sea and the south-west of the Balearic Sea have a higher variance (above 1 × 10⁻³ (t/km²)²). The biomass in the Aegean Sea around Crete has been stable during the study period, with only a slight variance (1 to 1.6 × 10⁻³ (t/km²)²) in the centre of this sea. Variation of the biomass is also visible along the continental slope, at the border of the 200 m depth contour. It could therefore be supposed that the increase in biomass observed between 1988 and 1998 is due to a substantial change in the areas of high variance.


4. DISCUSSION

4.1. Methodological considerations: validity of the data

4.1.1. Reliability of the environmental data

Since the environmental data were extracted from biogeochemical models, uncertainties surround our results and their interpretation. Primary production, for example, is surprisingly low in some parts of the Mediterranean Sea, such as along the Egyptian coast, on the Tunisian continental shelf and in the shallow Adriatic waters. Some samples were taken within those ranges of low PPR values (see Figure 3 and Annex I: east of the Aegean Sea). Those samples could have led the models to predict relatively high biomass in places with low PPR and, therefore, shallow bathymetry. Moreover, our training dataset covered only 6 % of the Mediterranean Sea. Such low coverage means that much of the predicted biomass could be an extrapolation.

Using time series of survey data for all the variables (instead of modelled climatological data)

would probably result in a better model performance; however this would be a challenging

task for modelling biomass at larger scale (global estimates).

The environmental data were values averaged over the layers of the water column. Because some environmental datasets covered different depth ranges (PPR occurs in shallow waters, whereas salinity was provided for deeper layers), the integration was useful to homogenize the datasets into a single layer. The environmental datasets were therefore integrated over the entire water column rather than restricted to the mesopelagic layer. This computation strongly distorts the conditions actually experienced by mesopelagic fish. Indeed, as seen in Table 1, the variables associated with the biomass samples differ from the values in the layer where deep-sea fish are usually found. For instance, the temperature of the samples is 16.1 °C on average, while the temperature of the LIW ranges from 13.0 to 15.5 °C (Zavatarelli and Mellor, 1994).

4.1.2. Miscalculation of the observed biomass.

A possible source of bias comes from the scientific literature and the parameters extracted from it. Mesopelagic biomass was obtained from publications (Annex VI) on the marine ecology of the Mediterranean Sea (e.g., size relationships, species assemblages or diet). Estimating mesopelagic fish biomass was generally not the aim of such articles. In addition, the mesopelagic biomass reported represented only a small part of the collected biomass, as the authors selected only the most significant species in the trawls. The most common family, Myctophidae, was often mentioned as present in the trawl but with no data or information provided.

The second source of bias might be related to the coefficients used in the weight-length relationship. Although the most appropriate coefficients were used to compute fish biomass when only abundance data were available, in some cases a lack of information (e.g., sizes) led us to use default values such as those provided by FishBase. For example, Stomias boa boa, whose size was not available in the scientific article, was given a standard length of 32.2 cm, as provided by Gibbs (1990) in FishBase. Applying the length-weight relationship and a catchability of 0.2 gives a biomass of 313.5 g for one Stomias boa boa. This weight corresponds to the maximum weight of the fish but does not represent the real weight of the


catch, as we used a general size. One way or another, the biomass of mesopelagic species could have been either overestimated or underestimated.
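The sensitivity of the computed biomass to the assumed length can be illustrated with a short sketch. It assumes the usual power-law form W = a·L^b and a simple division by the catchability; the coefficients a and b below are hypothetical placeholders, not FishBase values for any species.

```python
def weight_from_length(length_cm, a, b):
    """Length-weight relationship W = a * L**b (W in g, L in cm)."""
    return a * length_cm ** b


def corrected_biomass(caught_biomass, catchability):
    """Scale a caught biomass up to the biomass assumed present in the water."""
    if not 0 < catchability <= 1:
        raise ValueError("catchability must be in (0, 1]")
    return caught_biomass / catchability


# Hypothetical coefficients for illustration only.
A_COEF, B_COEF = 0.005, 3.0
CATCHABILITY = 0.2

weight_default = weight_from_length(32.2, A_COEF, B_COEF)  # default (large) length
weight_half = weight_from_length(16.1, A_COEF, B_COEF)     # half that length

print("weight at 32.2 cm (g):", weight_default)
print("weight at 16.1 cm (g):", weight_half)
print("ratio:", weight_default / weight_half)
print("corrected for catchability (g):",
      corrected_biomass(weight_default, CATCHABILITY))
```

With an exponent close to 3, halving the assumed length divides the individual weight by about 8, so a default length taken from a large specimen can inflate the biomass considerably.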

4.1.3. Catchability cause.

In this study, three constant catchabilities were used to estimate the biomass present in the water column. Each was accurate for a given size of a given species with a given gear (May and Blaber, 1989). For instance, Diaphus danae, whose common size is around 12 cm (FishBase), has a catchability of 0.2, meaning that only 20 % of the total biomass in the water column was caught. In May and Blaber's study, it was caught with an Engel trawl and a Simrad FB trawl eye. Keeping the catchability unchanged when the species, the gear, the size and behaviour of the fish, or the type of net change is a strong assumption, because catchability depends on biological and technical factors (Arreguín-Sánchez, 1996). For instance, Maurolicus muelleri has a catchability of 0.06 with an Engel trawl (May and Blaber, 1989) and of 0.15 with an Akra trawl (Heino et al., 2010).

An alternative way to correct these catchabilities is to use a ratio. The approach assumes that two independent methods are available for estimating the biomass in an area, for example trawl surveys and acoustic surveys. The ratio between the two estimates quantifies the difference between the two methods:

$$\frac{B_{acc}}{B_{trawl}} = a \cdot q \qquad (12)$$

where $B_{acc}$ and $B_{trawl}$ are the biomass estimates from the acoustic and trawl surveys, q is the catchability and a is the coefficient that adjusts the catchability used in the data set.
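Rearranging Equation 12 gives a = B_acc / (B_trawl · q). A minimal sketch of that rearrangement follows; the function name and the survey values are hypothetical, since no paired acoustic and trawl surveys were available for the Mediterranean Sea.

```python
def catchability_adjustment(acoustic_biomass, trawl_biomass, catchability):
    """Solve Equation 12 for a: a = B_acc / (B_trawl * q)."""
    if trawl_biomass <= 0 or catchability <= 0:
        raise ValueError("trawl biomass and catchability must be positive")
    return acoustic_biomass / (trawl_biomass * catchability)


# Hypothetical paired estimates for the same area (same units, e.g. t/km²).
a_coef = catchability_adjustment(acoustic_biomass=1.5,
                                 trawl_biomass=3.0,
                                 catchability=0.2)
print("adjustment coefficient a:", a_coef)
```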

However, this approach could not be applied here because no publications relating acoustic surveys to trawl surveys of mesopelagic fish in the Mediterranean Sea were found. To obtain the coefficient a, we would need to look at other regions investigated with both methods, such as the Chatham Rise off New Zealand or the North American coast off Oregon or California.

Besides the change in catchability, gear avoidance was not considered, although it is an important source of underestimation of the biomass (Kaartvedt et al., 2012).

4.2. Model performance

RF has been reported to perform better than the other models used to predict a dependent variable: it predicts biomass with the lowest error (Knudby et al., 2009). In this study, RF also performs better than GAM: the variance explained by RF is larger and the RMSE smaller. The coefficients estimated by GAM also have large standard errors (Table 4), which points to substantial uncertainty around the parameters. All the statistical analyses favour RF over GAM.

However, the difficulty lies in the statistical interpretation of the RF model, for which only the partial dependence plots and the variable importance are available to understand the effect of an environmental variable. A further difficulty concerns the uncertainty of the predictions and estimates. Whereas confidence and prediction intervals are easy to obtain with a traditional regression, this is more complicated for RF.

Confidence and prediction intervals can be computed from the distribution of the response. In the case of RF, the function only keeps in memory the predictions of the n trees grown during the run (e.g., in our case, 400 trees were grown, so we had 400 predicted biomasses for each


data point). The confidence and/or prediction intervals would be built from the distribution of these 400 predicted biomasses, which are also the values used to construct the forest prediction. The randomForest package offers no way to obtain the uncertainty around the mean for each tree. The 'quantregForest' package (Meinshausen and Schiesser, 2005, version 1.0) might help to obtain the percentile statistics of each tree, but it did not provide the standard deviation or the coefficient of variation.

We could nevertheless compute a coefficient of variation from the 400 trees built for each point of the Mediterranean Sea; however, this resulted in unrealistic values (Annex V). Low CV values with Random Forest (0-20 %) are found in the Levantine basin and in the Ionian Sea. Values can exceed 100 % along the continental shelf and where the bathymetry is relatively shallow. However, the CV computed from the trees cannot be trusted (Wager et al., 2014). One method proposed to obtain those uncertainty values is the jackknife approach.
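The per-cell coefficient of variation across trees can be sketched as follows. The function name and the per-tree predictions are hypothetical (four trees per cell instead of 400), and the sketch assumes the predictions are on a scale where the mean is not close to zero, since the ratio becomes unstable otherwise.

```python
import statistics


def tree_cv_percent(tree_predictions):
    """CV (%) of the individual tree predictions for one grid cell."""
    mean = statistics.mean(tree_predictions)
    if mean == 0:
        return float("nan")  # the ratio is undefined for a zero mean
    return 100.0 * statistics.pstdev(tree_predictions) / abs(mean)


# Hypothetical per-tree biomass predictions (t/km²) for two cells.
deep_cell = [0.020, 0.022, 0.018, 0.021]     # trees agree
shallow_cell = [0.001, 0.030, 0.002, 0.015]  # trees disagree strongly

print("deep cell CV (%):", round(tree_cv_percent(deep_cell), 1))
print("shallow cell CV (%):", round(tree_cv_percent(shallow_cell), 1))
```

This spread reflects disagreement between trees rather than a calibrated sampling variance, which is one reason to treat it with caution.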

4.3. Estimates of biomass.

Considering the biomass estimates for 1980 and the variance over the 31 years, GAM seems to give the more plausible distribution of mesopelagic biomass: i.e., it does not predict a large biomass along the coasts or in the shallow waters of the Adriatic Sea. However, since the boundary between the mesopelagic and epipelagic zones might lie at different depths (Reygondeau et al., 2015) because of more turbulent water and a shallower euphotic zone, the high biomass estimates from RF in those shallow waters could be realistic.

With respect to the report of Reygondeau et al. (2015), the areas of high mesopelagic biomass predicted by RF (Figure 11.a)) fall within biogeochemical regions (Figure 1) where the temperature is low (~13.5 °C), the salinity is close to 38.5 psu on average and the oxygen ranges between 4 and 5.5 mL/L.

Figure 14: Rate of the climate change pressure on Mediterranean Sea according to

Reygondeau et al. (2015)

The areas of high biomass predicted by GAM (Figure 11.b)) lie in the same biogeochemical regions, plus the Adriatic Sea, where salinity is higher (38.5-39 psu) and oxygen is limited to a narrower range (5-5.5 mL/L). We can also establish that biomass is higher in the WMED than in the EMED where the climate change pressure is greater (Figure 14; Reygondeau et al., 2015) and the environmental parameters less extreme.


Gjøsaeter and Kawaguchi (1980) also estimated a higher biomass of mesopelagic fish in the WMED than in the EMED, with 2.1 × 10⁶ t and 0.35 × 10⁶ t respectively.

Compared with Gjøsaeter and Kawaguchi, the biomass estimated by these models is far smaller (Table 5), by a factor of about 100. These results indicate an error in the models or in the computation of the biomass, which can be related to the catchability and to the lack of data in some parts of the Mediterranean Sea. To test whether the models were at fault, a basic computation was done: an observed biomass from a specific area was extrapolated to the whole Mediterranean Sea. For example, the biomass of Lampanyctus crocodilus was 0.002 g/m² on a swept area of 250,000 m². Extrapolated to 2.5 × 10⁶ km², this gives a biomass of 24,000 t, which is in the same range as the model estimates (e.g., 25,000 to 35,000 t with RF; 19,000 to 27,000 t with GAM). The models are therefore unlikely to be the source of the low biomass; our biomass data could be.
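The unit conversion behind this kind of extrapolation can be sketched as follows. The text does not state whether the 0.002 g/m² density already includes a catchability correction, so the `catchability` argument is an assumption: with the rounded inputs quoted above, the plain product is about 5,000 t, and dividing by a catchability of 0.2 would give about 25,000 t, the order of magnitude reported.

```python
G_PER_TONNE = 1_000_000
M2_PER_KM2 = 1_000_000


def density_g_m2_to_t_km2(density_g_m2):
    """Convert g/m² to t/km² (numerically identical: 10^6 m²/km² over 10^6 g/t)."""
    return density_g_m2 * M2_PER_KM2 / G_PER_TONNE


def extrapolate_t(density_g_m2, area_km2, catchability=1.0):
    """Extrapolate a biomass density to an area, optionally dividing by q."""
    return density_g_m2_to_t_km2(density_g_m2) * area_km2 / catchability


MEDITERRANEAN_AREA_KM2 = 2.5e6  # area used in the text

print("uncorrected (t):", extrapolate_t(0.002, MEDITERRANEAN_AREA_KM2))
print("with q = 0.2 (t):", extrapolate_t(0.002, MEDITERRANEAN_AREA_KM2, 0.2))
```

Either way the result stays in the range of thousands to tens of thousands of tonnes, far below the 10⁶ t order of the earlier estimates.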

Table 5: Comparison between the results of this study and those of Gjøsaeter and Kawaguchi (1980).

| Area | Gjøsaeter and Kawaguchi (1980) | RF (1980) | GAM (1980) |
|---|---|---|---|
| Mediterranean area between -5° and 10° longitude | 3 t/km² | 0.022 t/km² | 0.016 t/km² |
| Total Mediterranean Sea | 2.45 × 10⁶ t | 0.29 × 10⁵ t | 0.21 × 10⁵ t |

Regarding the results for 1980 (Figure 11), some of the unexpected areas of high biomass might be realistic. Indeed, the literature mentions the presence of larvae in several areas where the models predict high biomass, such as north of Cyprus (Çoker and Cihangir, 2015), in the Adriatic Sea (Dulčić and Ahnelt, 2007), along the Egyptian coast (Fowler, 1986) and in the Gulf of Gabès (Bradai and El Ouaer, 2012; Zarrad et al., 2013).

In the RF results (Figure 11.a)), one feature is particularly interesting: high biomass estimates (0.01-0.1 t/km²) are concentrated in the Ligurian Sea, in the north of the WMED. D’Ortenzio and Ribera d’Alcalá (2009) highlight a pattern in the distribution of chlorophyll a (Figure 15) in that same area. The pattern is a cluster representing a place where a bloom of primary production occurs annually. High biomass estimates where a bloom occurs might show that the mesopelagic fish biomass predicted with RF is related to primary production. This could call into question the variable importance (Figure 8) and the small difference in the increase in MSE between salinity and PPR. Indeed, it may imply that the importance measure does not fully reflect a variable's role in the prediction of the biomass. Given the well-known correlation between primary production and mesopelagic fish established by scientific research, this spatial match between biomass estimates and the primary production bloom is a positive point for the validation of the model.

In the Adriatic Sea, there is a coastal distribution of chlorophyll a (Figure 15 and Annex I), which might partially explain the high RF estimates in this area. However, a lack of data could also drive the model to estimate the same biomass in that area as in another area with similar environmental parameters.


Figure 15: Spatial distribution of the 7 clusters of the chlorophyll dynamics in

Mediterranean Sea (Sources: D’Ortenzio and Ribera d’Alcalá, 2009).

The Alboran Sea is a place where Atlantic and Mediterranean waters mix. The high biomass values estimated there by both models might be related to those waters. An analysis of the biomass distribution in the neighbouring Atlantic waters could help determine the origin of these patterns.

Interpreting a possible increase or decrease of the biomass is hazardous. The 1980-2011 period obviously cannot describe the biomass before 1980. Relative to earlier years, the biomass represented may also be at a plateau, either because the biomass was higher before or because it was lower at the beginning of the twentieth century. Nor can we link the biomass evolution to human pressure, which lies outside the scope of this study. Therefore, the variability observed during the 31 estimated years is only partially explained by our variables. The pattern produced by GAM in Figure 12 is puzzling. It might be due to a higher salinity that occurred specifically in those years (Figure 16).

Between 1988 and 1998, there was a climate shift in the Aegean Sea (Theocharis et al., 1999; Zervakis and Georgopoulos, 2000). The Adriatic Sea is usually the source of water in the EMED, but an outflow of dense water from the Aegean Sea sank into the deep layer of the EMED in 1987 and continued to sink in the following years. The larger biomass that GAM estimates in 1993 south of Crete could be due to this switch of source water and, therefore, to the higher salinity and lower temperature that mixed into the deep layer. Danovaro et al. (2004) also reported a general increase and then decrease of the biodiversity indexes and abundance between these years. This pattern shows how sensitive GAM is to salinity. In contrast, RF was not influenced by this phenomenon although salinity was one of its most important variables.


Figure 16: Maps of salinity (left) and of the biomass estimated by GAM (right) for a) 1988, b) 1993 and c) 1998.

5. CONCLUSIONS AND PERSPECTIVES

Mesopelagic fish biomass is a response variable that requires a non-linear model. A generalized additive model or a decision-tree-based model appears to meet this need. Random Forest can handle several combined relationships between the response and the


independent variables and is a user-friendly model that readily performs well. However, it can overfit the data and is a black box in which the influence of the variables is difficult to interpret. The generalized additive model accounts for the non-linearity between the variables and gives the expected relationships established by the alternating conditional expectations (ACE) transformation. It is easy to increase the degrees of freedom in the GAM, but the model will then overfit the data rather than predict the biomass. Working with these two tools simultaneously allowed us to check for mistakes either in the data or in the models and for ecologically meaningless model outputs. Finally, both approaches result in simple models that use primary production, salinity, temperature, oxygen and bathymetry. With GAM, an interaction was added. Both appear to be sensitive to salinity, the only non-linear variable in the GAM. The statistical analyses show that Random Forest performs better, although the GAM diagnostics remain acceptable. The main issue is the difficulty of obtaining uncertainty estimates with Random Forest.

The two models gave very different results for the distribution and the variation of the biomass in the Mediterranean Sea. Random Forest estimates a biomass 35 % higher than GAM. Both models estimate a higher biomass in the Western Mediterranean Sea, as could be expected from the literature. However, neither model showed a trend in the variation of biomass over the study period. Given that climate change modifies the waters where mesopelagic fish live, either a decrease or an increase of the estimates was expected. Only GAM estimates a marked change, related to changes in salinity. With these estimates, the reliability of Random Forest is questionable: it estimates biomass in areas of the Mediterranean where the bathymetry is shallower than 200 m. This was not expected given the distribution of mesopelagic fish in the water column, and it favours the selection of GAM.

In addition, the biomass estimated by both models is 100 times smaller than the biomass estimated by previous studies. This could be due to several miscalculations of the biomass, such as the catchability used or the limited information given in the literature.

The results obtained here do not favour the use of either Random Forest or GAM in particular. Keeping the two methods and applying them to different case studies or to the global ocean could ultimately help to assess which model is the most accurate for estimating mesopelagic fish biomass. As Random Forest works by bagging, enlarging the training data could let it perform better than restricting it to a semi-enclosed area where the variability in the environmental predictors is quite high.

Another approach to evaluating the biomass of mesopelagic fish is to go through a presence/absence model using the same methods. For instance, a GAM with a different family (binomial) and an ensemble of decision trees such as boosted regression trees or Random Forest can predict the occurrence of mesopelagic fishes in the Mediterranean Sea, using the same environmental parameters (e.g., salinity, temperature, etc.). The idea would be to predict the occurrence species by species and then to apply the weight-length relationship with an average length for each species. Summing the biomass over all species could lead to a global estimate of the mesopelagic fish biomass in the Mediterranean Sea. This alternative could be implemented with the biomod2 or dismo R packages, which apply several algorithms specified by the user to obtain the occurrence.


6. BIBLIOGRAPHY

Allain V., 2005, Diet of four tuna species of the Western and Central Pacific Ocean, SPC Fisheries Newsletter, Vol.114, p.30-33.

Arreguín-Sánchez F., 1996. Catchability: A key parameter for fish stock assessment. Reviews in Fish Biology and Fisheries 6, p.221-242. doi:10.1007/BF00182344

Barrett R.T., Anker-Nilssen T., Gabrielsen G.W. and Chapdelaine G., 2002. Food consumption by seabirds in Norwegian waters. ICES J. Mar. Sci. 59, p.43-57. doi:10.1006/jmsc.2001.1145

Bianchi C.N. and Morri C., 2000. Marine biodiversity of the Mediterranean Sea: situation, problems and prospects for future research. Marine pollution bulletin,Vol.40,No.5,p.367-376.

Borghini M., Bryden H., Schroeder K. Sparnocchia S. and Vetrano A., 2014, The Mediterranean is becoming saltier, Ocean Science, Vol.10, p.693-700.

Bosc E., Bricaud A. and Antoine D., 2004. Seasonal and interannual variability in algal biomass and primary production in the Mediterranean Sea, as derived from 4 years of SeaWiFS observations. Global Biogeochemical Cycles, Vol.18, No.1, 17 p., doi:10.1029/2003GB002034

Bradai M.N. and Ouaer A.E., 2012. New record of the scalloped ribbon fish, Zu cristatus (Osteichthyes: Trachipteridae) in Tunisian waters (central Mediterranean). Marine Biodiversity Records, Vol.5, 59p.

Breiman L., 2001. Random forests. Machine Learning, Vol.45, No.1, p.5-32.

Breiman L. and Friedman J.H., 1985. Estimating Optimal Transformations for Multiple Regression and Correlation. Journal of the American Statistical Association, Vol.80, 580p. doi:10.2307/2288473

Catul V., Gauns M. and Karuppasamy P.K., 2011. A review on mesopelagic fishes belonging to family Myctophidae. Reviews in Fish Biology and Fisheries Vol.21, p.339-354. doi:10.1007/s11160-010-9176-4

Cherel Y., Ducatez S., Fontaine C., Richard P. and Guinet C., 2008. Stable isotopes reveal the trophic position and mesopelagic fish diet of female southern elephant seals breeding on the Kerguelen Islands. Mar. Ecol. Prog. Ser., Vol.370, p.239-247. doi:10.3354/meps07673

Christensen V., Walters C.J., Ahrens R., Alder J., Buszowski J., Christensen L.B., Cheung W.W.L., Dunne J., Froese R., Karpouzi V., Kaschner K., Kearney K., Lai S., Lam V., Palomares M.L.D., Peters-Mason A., Piroddi C., Sarmiento J.L., Steenbeek J., Sumaila R., Watson R., Zeller D. and Pauly D., 2009. Database-driven models of the world’s Large Marine Ecosystems. Ecological Modelling, Vol.220, p.1984-1996. doi:10.1016/j.ecolmodel.2009.04.041

Çoker T. and Cihangir B., 2015. Distribution of Ichthyoplankton during the Summer period in the Northern Cyprus Marine areas. Turkish Journal of Fisheries and Aquatic Sciences, Vol.15, p.233-245. doi:10.4194/1303-2712

Emig C.C. and Geistdoerfer P., 2004, Faune profonde en Mer Méditerranée : les échanges historiques, géographiques et bathymétriques, Carnets de géologie/Notebooks on Geology, Maintenon, Article 2004/01 (CG2004_A01_CCE-PG)

Fowler S.W., 1986, Trace metal monitoring of pelagic organisms from the open Mediterranean Sea, Environmental monitoring and assessment, Vol.7, No.1, p.59-78.

Danovaro R., Dell’Anno A., Fabiano M., Pusceddu A. and Tselepides A., 2001, Deep-sea ecosystem response to climate changes: the eastern Mediterranean case study, TRENDS in Ecology & Evolution, Vol.16, No.9, p.505-510.

Danovaro R., Dell’Anno A. and Pusceddu A., 2004, Biodiversity response to climate change in a warm deep sea, Ecology Letters, Vol.7, p.821-828. doi:10.1111/j.1461-0248.2004.00634.x

Danovaro R., Company J.B., Corinaldesi C. et al., 2010, Deep-sea biodiversity in the Mediterranean Sea: The known, the unknown and the unknowable, PLoS ONE, Vol.5, No.8, 25p.


Davison P.C., 2011, The export of Carbon mediated by mesopelagic fishes in the Northeast Pacific Ocean, Doctorate in Oceanography, University of California, San Diego, 166p.

Davison P.C., Checkley Jr. D.M., Koslow J.A. and Barlow J., 2013, Carbon export mediated by mesopelagic fishes in the Northeast Pacific Ocean, Progress in Oceanography, Vol.116, p.14-30.

Davison P.C., Koslow J.A. and Kloser R.J., 2015. Acoustic biomass estimation of mesopelagic fish: backscattering from individuals, populations, and communities. ICES J. Mar. Sci., Vol.72, No.5, p.1413-1424. doi:10.1093/icesjms/fsv023

De’ath G., 2007. Boosted trees for ecological modeling and prediction, Ecology, Vol.88, p.243–251.

De Forest L. and Drazen J., 2009. The influence of a Hawaiian seamount on mesopelagic micronekton. Deep Sea Research Part I: Oceanographic Research Papers, Vol.56, p.232-250. doi:10.1016/j.dsr.2008.09.007

D’Ortenzio F. and Ribera d’Alcalà M., 2009. On the trophic regimes of the Mediterranean Sea: a satellite analysis. Biogeosciences, Vol.6, p.139-148.

Dulčić, J. and Ahnelt H., 2007. How many specimens of the crested oarfish, Lophotus lacepede Giorna, 1809 (Pisces: Lophotidae), were caught in the Adriatic Sea? Acta Adriatica, Vol.48, p.39-43.

Elith J., Leathwick J.R. and Hastie T., 2008, A working guide to boosted regression trees, Journal of Animal Ecology, Vol.77, p.802-813. doi:10.1111/j.1365-2656.2008.01390.x

Evgeny Pakhomov, Professor, Biological & Fisheries Oceanography, UBC, Department of Earth, Ocean & Atmospheric Sciences

Froese R. and Pauly D., 2015. FishBase. World Wide Web electronic publication, www.fishbase.org, version (04/2015).

Gibbs R.H. Jr., 1990, Stomiidae. In: J.C. Quero, J.C. Hureau, C. Karrer, A. Post and L. Saldanha (eds.), Check-list of the fishes of the eastern tropical Atlantic (CLOFETA). JNICT, Lisbon; SEI, Paris; and UNESCO, Paris. Vol.1, p.296-299.

Gjøsaeter J. and Kawaguchi K., 1980, A review of the world resources of mesopelagic fish, Food and Agriculture Org., Fisheries Technical Paper, No.193, FIRM/TI93, 151p.

Gregorutti B., Michel B. and Saint-Pierre P., 2013. Correlation and variable importance in random forests. arXiv:1310.5726 [stat], 31p.

Hastie T. and Tibshirani R., 1986. Generalized additive models. Statistical Science, p.297-310.

Hastie T., Tibshirani R. and Friedman J.H., 2009, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. In: Springer Series in Statistics, Edition 2, Springer-Verlag, New York, 745p.

Heino M., Porteiro F.M., Sutton T.T., Falkenhaug T., Godø O.R. and Piatkowski U., 2010, Catchability of pelagic trawls for sampling deep-living nekton in the mid-North Atlantic, ICES J. Mar. Sci., 13p.

Hothorn T., Hornik K. and Zeileis A., 2006b, Unbiased recursive partitioning: a conditional inference framework, Journal of Computational and Graphical Statistics, Vol.15, No.3, p.651-674.

Hulley P.A., 1990, Myctophidae. In: O. Gon and P.C. Heemstra (eds.), Fishes of the Southern Ocean, J.L.B. Smith Institute of Ichthyology, Grahamstown, p.146-178.

Irigoien X., Klevjer T.A., Røstad A., Martinez U., Boyra G., Acuña J.L., Bode A., Echevarria F., Gonzalez-Gordillo J.I., Hernandez-Leon S., Agusti S., Aksnes D.L., Duarte C.M. and Kaartvedt S., 2014, Large mesopelagic fishes biomass and trophic efficiency in the open ocean, Nat. Commun., Vol.5, 10p. doi:10.1038/ncomms4271

Kaartvedt S., Staby A. and Aksnes D.L., 2012, Efficient trawl avoidance by mesopelagic fishes causes large underestimation of their biomass, Marine Ecology Progress Series, Vol.456, p.1-6. doi:10.3354/meps09785

Kim Y.-J. and Gu C., 2004, Smoothing spline Gaussian regression: More scalable computation via efficient approximation, J. Roy. Statist. Soc. Ser. B, Vol.66, p.337-356.

Kinzer J., Böttger-Schnack R. and Schulz K., 1993. Aspects of horizontal distribution and diet of myctophid fish in the Arabian Sea with reference to the deep water oxygen deficiency. Deep Sea Research Part II: Topical Studies in Oceanography, Vol.40, p.783-800.

Knudby A., Brenning A. and LeDrew E., 2010. New approaches to modelling fish–habitat relationships. Ecological Modelling, Vol. 221, p.503-511. doi:10.1016/j.ecolmodel.2009.11.008

Koslow J.A., Goericke R., Lara-Lopez A. and Watson W., 2011, Impact of declining intermediate-water oxygen on deepwater fishes in the California Current, Marine Ecology Progress Series, Vol.436, p. 207-218.

Kozlov A., 1995. A review of the trophic role of mesopelagic fish of the family Myctophidae in the Southern Ocean ecosystem. CCAMLR Science, Vol.2, p.71-77.

Lascaratos A., Roether W., Nittis K. and Klein B., 1999, Recent changes in deep water formation and spreading in the eastern Mediterranean Sea: a review, Progress in Oceanography, Vol.44, p.5-36.

Lea M.-A., Nichols P.D. and Wilson G., 2002, Fatty acid composition of lipid-rich myctophids and mackerel icefish (Champsocephalus gunnari) - Southern Ocean food-web implications, Polar Biology, Vol.25, p.843-854.

Legand M., Bourret P., Fourmanoir P., Grandperrin R., Guérédrat J.A., Michel A., Rancurel P., Repelin R. and Roger C., 1972, Relations trophiques et distributions verticales en milieu pélagique dans l’océan Pacifique intertropical, Cahier O.R.S.T.O.M., série Océanographie, Vol.10, No.4, p.303-393.

Lejeusne C., Chevaldonné P., Pergent-Martini C., Boudouresque C.F. and Pérez T., 2009, Climate change effects on a miniature ocean: the highly diverse, highly impacted Mediterranean Sea, Trends in Ecology and Evolution, Vol. 25, No.4, p.250-260.

Lancraft T.M., Torres J.J. and Hopkins T.L., 1989. Micronekton and macrozooplankton in the open waters near Antarctic ice edge zones (AMERIEZ 1983 and 1986). Polar Biol., Vol.9, p.225-233. doi:10.1007/BF00263770

Liaw A. and Wiener M., 2002. Classification and regression by randomForest. R News, Vol.2, p.18-22.

Macias D., Garcia-Gorriz E., Piroddi C. and Stips A., 2014. Biogeochemical control of marine productivity in the Mediterranean Sea during the last 50 years. Global Biogeochem. Cycles, Vol.28, 2014GB004846. doi:10.1002/2014GB004846

Manca B., Burca M., Giorgetti A., Coatanoan C., Garcia M.-J. and Iona A., 2003, Physical and biochemical averaged vertical profiles in the Mediterranean regions: an important tool to trace the climatology of water masses and to validate incoming data from operational oceanography, Journal of Marine Systems, 2004, Vol.48, p.83-116.

Mann K.H., 1984, Fish production in open ocean ecosystems. In : Fasham MJR (ed.), Flows of energy and materials in marine ecosystems : theory and practice, Plenum Press, New York, p.435-458.

May J.L. and Blaber S.J.M., 1989. Benthic and pelagic fish biomass of the upper continental slope off eastern Tasmania. Marine Biology Vol.101, p.11-25.

Meinshausen N. and Schiesser L., 2015, Quantile Regression Forests, quantregForest, version 1.0

Menard H.W., Marine Geology of the Pacific, McGraw-Hill, New York, 1964, 271p.

Ménard F., Stéquert B., Rubin A., Herrera M. and Marchal E., 2000, Food consumption of tuna in the Equatorial Atlantic ocean: FAD-associated versus unassociated schools, Aquatic Living Resources, Vol.13, p.233-240.

Morato T., Kvile K., Taranto G.H., Tempera F., Narayanaswamy B.E., Hebbeln D., Menezes G.M., Wienberg C., Santos R.S. and Pitcher T.J., 2013. Seamount physiography and biology in the north-east Atlantic and Mediterranean Sea. Biogeosciences, Vol.10, p.3039-3054. doi:10.5194/bg-10-3039-2013

My Ocean follow-on, 2014, Ocean Monitoring and Forecasting, www.myocean.eu, last consulted on 2015.03.23

Norheim E., 2014. Distribution of mesopelagic scattering layer (MSL) in relation to the physical environment in the Norwegian Sea. Master of Science, Program of Biodiversity, Evolution and Ecology, University of Bergen.


Olivar M.P., Bernal A., Molí B., Peña M., Balbín R., Castellón A., Miquel J. and Massutí E., 2012. Vertical distribution, diversity and assemblages of mesopelagic fishes in the western Mediterranean. Deep Sea Research Part I: Oceanographic Research Papers, Vol.62, p.53-69. doi:10.1016/j.dsr.2011.12.014

Pearcy W.G. and Laurs R.M., 1965, Vertical migration and distribution of mesopelagic fishes off Oregon, Deep-Sea Research (1966), Vol.13, p.153-165.

Pearcy W.G., Krygier F.E., Mesecar R. and Ramsey F., 1976, Vertical distribution and migration of oceanic micronekton off Oregon, Deep-Sea Research (1977), Vol.24, p.224-245.

Prasad A.M., Iverson L.R. and Liaw A., 2006, Newer classification and regression tree techniques: bagging and random forest for ecological prediction, Ecosystems, Vol.9, p.181-199. doi:10.1007/s10021-005-0054-1

QGIS Development Team, 2015, QGIS Geographic Information System, Open Source Geospatial Foundation Project, URL: http://qgis.osgeo.org

R Core Team, 2015, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, URL: www.R-project.org

Pusch C., Beckmann C., Porteiro F.M. and von Westernhagen H., 2004. The influence of seamounts on mesopelagic fish communities. Archive of fishery and marine research, Vol.51, p.165–186.

Reid S.B., Hirota J., Young R.E. and Hallacher L.E., 1991, Mesopelagic-boundary community in Hawaii: micronekton at the interface between neritic and oceanic ecosystems, Marine Biology, Vol.109, p.427-440.

Reygondeau G., Irisson J.-O., Ayata S.-K., Gasparini S., Benedetti F., Albouy C., Hattab T., Guieu C. and Koubbi P., 2015. Definition of the Mediterranean eco-regions and maps of potential pressures in these eco-regions. Perseus Deliverable Nr. 1.6, 45 p.

Rixen M., Beckers J.-M., Levitus S., Antonov K., Boyer T., Maillard C., Fichaut M., Balopoulos E., Iona S., Dooley H., Garcia M.-J., Manca B., Giorgetti A., Manzella G., Mikhailov N., Pinardi N. and Zavatarelli M., 2005, The western Mediterranean deep water: a proxy for climate change, Geophysical Research Letters, Vol.32, Issue 12, 4p.

Saunders R.A., Fielding S., Thorpe S.E. and Tarling G.A., 2013, School characteristics of mesopelagic fish at South Georgia, Deep Sea Research Part I: Oceanographic Research Papers, Vol.81, p.62-77. doi:10.1016/j.dsr.2013.07.007

Schroeder K., Millot C., Bengara L., Ben Ismail S., Bensi M., et al., 2012, Long-term monitoring programme of the hydrological variability in the Mediterranean Sea: a first overview of the HYDROCHANGES network. Ocean Science, Vol.9 (2013), p.301-324.

Spector P., Friedman J., Tibshirani R. and Lumley T., 2014. acepack: ace() and avas() for selecting regression transformations. Version 1.3-3.3.

The MerMex Group et al., 2011, Marine ecosystems’ responses to climatic and anthropogenic forcings in the Mediterranean, Progress in Oceanography, Vol.91, Issue 2, p.97-166.

Themelis D.E., 1997. Variations in the abundance and distribution of mesopelagic fishes in the Slope Sea off Atlantic Canada. Doctorate of Philosophy, Dalhousie University, Halifax, NS, Canada.

Theocharis A., Nittis K., Kontoyiannis H., Papageorgiou E. and Balopoulos E., 1999. Climatic changes in the Aegean Sea influence the eastern Mediterranean thermohaline circulation (1986-1997). Geophysical Research Letters, Vol.26, p.1617–1620. doi:10.1029/1999GL900320

Valinassab T., Pierce G.J. and Johannesson K., 2007, Lantern fish (Benthosema pterotum) resources as a target for commercial exploitation in the Oman Sea, Journal of Applied Ichthyology, Vol.23, p.573-577.

Valls M., Olivar M.P., Fernández de Puelles M.L., Molí B., Bernal A. and Sweeting C.J., 2014, Trophic structure of mesopelagic fishes in the western Mediterranean based on stable isotopes of carbon and nitrogen, Journal of Marine Systems, Vol.138, p.160-170.


Wager S., Hastie T. and Efron B., 2014. Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. The Journal of Machine Learning Research, Vol.15, p.1625–1651.

Williams A., Koslow J.A., Terauds A., and Haskard K., 2001. Feeding ecology of five fishes from the mid-slope micronekton community off southern Tasmania, Australia. Marine Biology 139, 1177–1192. doi:10.1007/s002270100671

Wood S.N., 2001. mgcv: GAMs and generalized ridge regression for R. R News, Vol.1, p.20-25.

Wood S. and Wood M.S., 2015. Package “mgcv”. R package version 1–7.

Würtz M., 2010. Mediterranean Pelagic Habitat. Oceanographic and Biological Processes, An Overview. IUCN.

Xu R., 2013. Improvements to random forest methodology, Doctorate of Philosophy, major: Statistics, Iowa State University, Ames, United States.

Zarrad R., Alemany F., Rodriguez J.-M., Jarboui O., Lopez-Jurado J.-L. and Balbin R., 2013. Influence of summer conditions on the larval fish assemblage in the eastern coast of Tunisia (Ionian Sea, Southern Mediterranean). Journal of Sea Research, Vol.76, p.114–125. doi:10.1016/j.seares.2012.08.001

Zavatarelli M. and Mellor G.L., 1994, A numerical study of the Mediterranean Sea Circulation, Journal of physical oceanography, Vol.25, p.1384-1414.

Zervakis V. and Georgopoulos D., 2000, The role of the North Aegean in triggering the recent Eastern Mediterranean climatic changes, Journal of Geophysical Research, Vol.105, No.C11, p.103-126.

Zhang G. and Lu Y., 2012. Bias-corrected random forests in regression. Journal of Applied Statistics, Vol.39, p.151–160. doi:10.1080/02664763.2011.578621


Annex I: Maps of the environmental predictors


Annex II : Mesopelagic fish families and species of the Mediterranean Sea

| Family | Species | Family biomass (t/km²) | Criterion used to decide whether bathypelagic species are included in the study |
|---|---|---|---|
| Centrolophidae | Centrolophus niger, Centrolophus nigerrimus, Schedophilus medusophagus | 0.53 | Depth range |
| Evermannellidae | Evermannella balbo | 0.005 | Mesopelagic distribution |
| Gonostomatidae | Cyclothone braueri, C. pygmaea, Gonostoma denudatum, Maurolicus muelleri | 30.84 | Depth range |
| Macrouridae | Nezumia sclerorhynchus | 24.11 | Depth range* |
| Microstomatidae | Nansenia oblita | 3.6 × 10⁻⁶ | Depth range |
| Myctophidae | Benthosema glaciale, Ceratoscopelus maderensis, Diaphus holti, D. metopoclampus, D. rafinesquii, Electrona risso, Gonichthys coccoi, Hygophum benoiti, H. hygomii, Lampanyctus crocodilus, L. pusillus, Lobianchia dofleini, L. gemellarii, Myctophum punctatum, Notoscopelus elongatus, N. bolini, N. kroyeri, Symbolophorus veranyi | 54.11 | Mesopelagic distribution |
| Nemichthyidae | Nemichthys scolopaceus | 0.022 | Mesopelagic distribution |
| Notacanthidae | Notacanthus bonapartei | 0.419 | Depth range* |
| Paralepididae | Arctozenus risso, Lestidiops jayakari, Paralepis speciosa, Sudis hyalina | 0.192 | Depth range |
| Phosichthyidae | Ichthyococcus ovatus, Vinciguerria attenuata, Vinciguerria poweriae | 0.394 | Depth range |
| Regalecidae | Regalecus glesne | 0.028 | Mesopelagic distribution |
| Sternoptychidae | Argyropelecus hemigymnus | 2.962 | Depth range |
| Stomiidae | Bathophilus nigerrimus, Borostomias antarcticus, Chauliodus sloani, Stomias boa boa | 7.605 | Mesopelagic distribution |

* Macrouridae and Notacanthidae species are far from being mesopelagic. However, considering the biology of these fish, they have been included in the study.


Annex III: Distribution of variables from the samples

[Figure: histograms of the sampled variables, one panel each: Distribution of Log-transformed Biomass (log(Biomass)), Distribution of Biomass, Distribution of Salinity, Distribution of Oxygen, Distribution of Bathymetry, Distribution of Temperature, Distribution of Primary production.]


Annex IV : Scatterplots of environmental variables and the biomass of samples

[Figure: scatterplot matrix of oxygen (Oxy), log-transformed bathymetry (log..bathy.), primary production (PPR), salinity (Sal), temperature (Temp) and log-transformed biomass (logbiomass).]


Annex V: 1) Analyses of GAM residuals and uncertainty of 2) GAM and 3) RF estimates

[Figure, panels a) to c). The GAM residual diagnostics comprise four plots: deviance residuals against theoretical quantiles, residuals against the linear predictor ("Resids vs. linear pred."), "Histogram of residuals" (frequency of residuals), and "Response vs. Fitted Values".]


Annex VI: Scientific literature used to extract the biomass data.

V. Andersen, F. François, J. Sardou, M. Picheral, M. Scotto, P. Nival, 1998, Vertical distributions of macroplankton and micronekton in the Ligurian and Tyrrhenian Seas (Northwestern Mediterranean), Oceanologica Acta, Vol.21(5) (1998), 655-676.

V. Andersen, J. Sardou, 1992, The diel migrations and vertical distributions of zooplankton and micronekton in the Northwestern Mediterranean Sea. 1. Euphausiids, mysids, decapods and fishes, Journal of Plankton Research, Vol.14(8) (1992), 1129-1154.

A. Bernal Bajo, 2014, Feeding ecology and community structure of mesopelagic fishes in the western Mediterranean, Doctora de Ciències del Mar, Universitat Politècnica de Catalunya, Barcelona.

B. Busalacchi, P. Rinelli, F. De Domenico, A. Profeta, F. Perdichizzi, T. Bottati, 2010, Analysis of demersal fish assemblages off the Southern Tyrrhenian Sea (Central Mediterranean), Hydrobiologia, Vol.654 (2010), 111-124.

J.E. Cartes, E. Fanelli, V. Papiol, L. Zucca, 2010, Distribution and diversity of open-ocean, near-bottom macroplankton in the western Mediterranean: Analysis at different spatio-temporal scales, Deep-Sea Research I, Vol.57 (2010), 1485-1498.

B. Cihangir, E.M. Tirasin, A. Unlüöglu, H.A. Benli, K. Can Bizsel, 2002, New records of three deep-sea fishes: Diaphus rafinesquei (Myctophidae), Lobianchia gemellarii (Myctophidae), and Notolepis rissoi (Paralepididae) from the Aegean Sea (Turkish Coast), Journal of Ichthyology, Vol.43(6) (2003), 486-489.

C. Dalyan, L. Eryilmaz, 2008, A new deepwater fish, Chauliodus sloani Bloch & Schneider, 1801 (Osteichthyes: Stomiidae), from the Turkish waters of Levant Sea (Eastern Mediterranean), Black Sea/Mediterranean Environment, Vol.14 (2008), 33-37.

M.C. Deval, O. Güven, I. Saygu, T. Kabapçioglu, 2013, New records and uncommon occurrences of deep-water fishes in the Turkish Mediterranean Sea (Osteichthyes), Zoology in the Middle East, Vol.59(4), 308-313.

G. D'Onghia, F. Mastrototaro, A. Matarrese, C.-Y. Politou, Ch. Mytilineou, 2003, Biodiversity of the upper slope demersal community in the Eastern Mediterranean: Preliminary comparison between two areas with and without trawl fishing, Journal of Northwest Atlantic Fishery Science, Vol.31, 263-273.

G. D'Onghia, P. Maiorano, L. Sion, A. Giove, F. Capezzuto, R. Carlucci, A. Tursi, 2009, Effects of deep-water coral banks on the abundance and size structure of the megafauna in the Mediterranean Sea, Deep-Sea Research II, Vol.57 (2010), 397-411.

G. D'Onghia, A. Tursi, P. Maiorano, A. Matarrese, M. Panza, 1998, Demersal fish assemblages from the bathyal grounds of the Ionian Sea (middle-eastern Mediterranean), Italian Journal of Zoology, Vol.65 (1998), Suppl.: 287-292.

M.C. Follesa, C. Porcu, S. Cabiddu, A. Mulas, A.M. Deiana, A. Cau, 2010, Deep-water fish assemblages in the central-western Mediterranean (south Sardinian deep-waters), Journal of Applied Ichthyology, Vol.27 (2011), 129-135.

B.S. Galil, M. Goren, 1994, The Deep Sea Levantine fauna. New records and rare occurrences, Senckenbergiana maritima, Vol.25, No.1, p.41-52

C. Garcia-Ruiz, D. Lloris, J.L. Rueda, M.C. Garcia-Martinez, L. Gil de Sola, 2015, Spatial distribution of ichthyofauna in the northern Alboran Sea (Western Mediterranean), Journal of Natural History, Vol.49, No.19-20, 1191-1224.

R.H. Goodyear, B.J. Zahuranec, W.L. Pugh, R.H. Gibbs Jr., 1972, Mediterranean biological studies, Final Report, Vol.I, Washington, D.C. (USA): Smithsonian Institution.

A. Kallianiotis, K. Sophronidis, P. Vidoris, A. Tselepides, 2000, Demersal fish and megafaunal assemblages on the Cretan continental shelf and slope (NE Mediterranean): seasonal variation in species density, biomass and diversity, Progress in Oceanography, Vol.46 (2000), 429-455.

M. Kaya, M. Bilecenoglu, 1999, New records of deep-sea fish in Turkish seas and the Eastern Mediterranean, Journal of Ichthyology, Vol.40(7) (2000), 543-547.


P. Laval, J.-C. Braconnot, C. Carré, J. Goy, P. Morand, C.E. Mills, 1989, Small-scale distribution of macroplankton and micronekton in the Ligurian Sea (Mediterranean Sea) as observed from the manned submersible Cyana, Journal of Plankton Research, Vol.11(4) (1989), 665-685.

A. Lourie, 1969, Occurrence of two lanternfishes (Myctophidae) in the open sea off Mount Carmel (Israel), Israel Journal of Zoology, Vol.18, No.4, 379-380.

J. Moranta, M. Palmer, E. Massuti, C. Stefanescu, B. Morales-Nin, 2004, Body fish size tendencies within and among species in the deep-sea of the Western Mediterranean, Scientia Marina, Vol.68 (Suppl.3), 141-152

C. Stefanescu, J.E. Cartes, 1992, Benthopelagic habits of adult specimens of Lampanyctus crocodilus (Risso, 1810) (Osteichthyes, Myctophidae) in Western deep slope, Scientia Marina, Vol.56(1) (1992), 69-74.

C.-Y. Politou, S. Kavadas, Ch. Mytilineou, A. Tursi, R. Carlucci, G. Lembo, 2003, Fisheries Resources in the Deep Waters of the Eastern Mediterranean (Greek Ionian Sea), Journal of Northwest Atlantic Fishery Science, Vol.31 (2003), 35-46.

C. Papaconstantinou, K. Anastasopoulou, E. Caragitsou, 1997, Comments on the Mesopelagic fauna of the North Aegean Sea, Cybium, Vol.21(3) (1997), 281-288.

C.-Y. Politou, C. Mytilineou, G. D'Onghia, J. Dokos, 2008, Demersal faunal assemblages in the deep waters of the eastern Ionian Sea, Journal of Natural History, Vol.43(5-8) (2008), 661-672.

P. Mateu, V. Nardi, N. Fraija-Fernandez, S. Mattiucci, L. Gil de Sola, J.A. Raga, M. Fernandez, F.J. Aznar, 2015, The role of the lantern fish (Myctophidae) in the life-cycle of cetacean parasites from Western Mediterranean Waters, Deep Sea Research I: Oceanographic Research Papers, Vol.95, 115-121.

S. Tecchio, E. Ramirez-Llodra, F. Sarda, J.B. Company, 2011, Biodiversity of deep-sea demersal megafauna in western and Central Mediterranean basins, Scientia Marina, Vol.75(2), 341-350.

N. Ungaro, G. Marano, R. Marsan, K. Osmani, 1998, Demersal fish assemblage biodiversity as an index of fishery resources exploitation, Italian Journal of Zoology, Vol.65, 511-516.


Diploma: Diplôme d’Ingénieur of the Institut Supérieur des Sciences agronomiques, agroalimentaires, horticoles et du paysage

Specialty: Fisheries science (Halieutique)

Specialization / option: Aquatic resources and ecosystems

Academic supervisor: RIVOT Etienne

Author(s): CLAVEL-HENRY Morane

Date of birth*: 16.04.1992

Host organization: Institute for the Oceans and Fisheries (IOF)

Address: 2202 Main Mall, Vancouver V6T 1Z4

Canada

Internship supervisor: Villy Christensen

Number of pages: 34 Annex(es): 6

Year of defence: 2015

French title: Estimation de la biomasse des poissons mésopélagiques à partir de données environnementales en mer Méditerranée.

English title: Estimates of the mesopelagic fish biomass according to environmental data in Mediterranean Sea

Résumé (1600 characters maximum): Mesopelagic fish biomass is a useful parameter in ecosystem models and in the carbon cycle. In the Mediterranean Sea, the mesopelagic zone is subject to climate change and to a change in the marine fauna. The influence of environmental parameters on mesopelagic biomass (salinity, temperature, primary production, oxygen and bathymetry, taken from biogeochemical models) is studied between 1980 and 2011 with two predictive models, the generalized additive model (GAM) and Random Forest (RF). The statistical analysis used 905 biomass values extracted from the scientific literature on the Mediterranean. The most influential variable, salinity, strongly affects biomass with GAM, whereas RF is also sensitive to primary production. Both methods indicate a large biomass in the western basin, and the total biomass turns out to be low: 20,000 to 35,000 t, with RF estimating 35% more than GAM. Over the 31 years, the GAM biomass varies without a clear increase or decrease. With RF, biomass increases gradually up to 36,000 t in 2000 and then decreases. Although RF is statistically a satisfactory method for estimating biomass, the reliability of its results calls for caution. Since RF and GAM give different results, keeping both methods and extending this study to all oceans would make it possible to conclude which of the two tools gives the model that best represents the ecological distribution of mesopelagic fish.

Abstract (1600 characters maximum):

Mesopelagic fish biomass is a useful parameter for ecosystem modelling and for understanding the carbon cycle. In the Mediterranean Sea, the mesopelagic layer is affected by climate change and by a change in the marine fauna. The impact of environmental parameters on mesopelagic fish biomass (salinity, temperature, primary production, oxygen and bathymetry, taken from biogeochemical models) is modelled with a generalized additive model (GAM) and Random Forest (RF) between 1980 and 2011. The statistical analysis was done on 905 samples extracted from the scientific literature on the Mediterranean Sea. The most influential variable, salinity, has a strong impact on biomass with GAM, whereas RF is also sensitive to primary production. Both methods gave a large biomass in the western basin. The total biomass varies between 20,000 t and 35,000 t, with RF estimating 35% more than GAM. There is no clear trend in the GAM estimates, while the RF estimates rose gradually from 1980 to 2000 and then decreased. Although RF is statistically a good method, the reliability of its results needs to be considered carefully. Given the different results of GAM and RF, both methods will be kept for estimates over the global oceans in order to determine which of them better represents the ecological distribution of mesopelagic fish biomass.

Keywords (French): poisson mésopélagique, Méditerranée, Random Forest, Modèle additif généralisé, estimation de biomasse

Key Words: Mesopelagic fish, Mediterranean, Random Forest, Generalized Additive Model, Biomass estimate

* Item used to register author records in the university library catalogue

Document to be submitted on moodle in .txt format