

A Marine Species Benchmark Dataset for Ecological Modelling

Samuel Bosch 1,2, Sofie Vranken 1, Lennert Tyberghein 2, Olivier De Clerck 1

1 Phycology Research Group, Ghent University, Krijgslaan 281-S8, 9000 Ghent, Belgium. E-mail: [email protected]

2 Flanders Marine Institute (VLIZ), InnovOcean site, Wandelaarskaai 7, B-8400 Oostende.

Environment

Environmental data from Bio-ORACLE and MARSPEC were added to all species distribution records with the sdmpredictors R package.
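As an illustration of what adding environmental data to a distribution record involves, the sketch below looks up the grid cell that contains each record and copies the layer values onto it. It is a minimal Python sketch, not the sdmpredictors workflow used for the dataset: the functions `cell_index` and `annotate`, the layer name `sst`, and the coarse regular longitude/latitude grid are all assumptions made for the example.

```python
def cell_index(lon, lat, res_deg):
    """Return (row, col) of the global grid cell containing a lon/lat point.

    Assumes a regular lon/lat grid: row 0 is the northernmost row and
    col 0 starts at longitude -180 (hypothetical layout for this sketch).
    """
    ncols = round(360 / res_deg)
    nrows = round(180 / res_deg)
    # Clamp so that lon = 180 and lat = -90 fall in the last column/row.
    col = min(int((lon + 180.0) // res_deg), ncols - 1)
    row = min(int((90.0 - lat) // res_deg), nrows - 1)
    return row, col


def annotate(records, layers, res_deg):
    """Attach the value of every environmental layer to each record.

    records: list of dicts with "lon" and "lat" keys (degrees)
    layers:  dict mapping a layer name to a 2D list (rows x cols)
    """
    annotated = []
    for rec in records:
        row, col = cell_index(rec["lon"], rec["lat"], res_deg)
        values = {name: grid[row][col] for name, grid in layers.items()}
        annotated.append({**rec, **values})
    return annotated


# Toy usage: one hypothetical layer on a 90-degree grid (2 rows x 4 columns).
toy_layers = {"sst": [[1, 2, 3, 4], [5, 6, 7, 8]]}
toy_records = [{"lon": 10.0, "lat": 45.0}, {"lon": -170.0, "lat": -10.0}]
toy_annotated = annotate(toy_records, toy_layers, 90.0)
```

Real layers are much finer and contain no-data cells over land, which a production workflow would have to handle explicitly.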

Download

At marinespeed.org you can:

• Download one of 3 versions of the dataset:

    • Raw version: all distribution records with environmental data and random background data

    • Pre-processed data, filtered on a 25 km² grid, with 5-fold random cross-validation

    • Pre-processed data, filtered on a 25 km² grid, with 5-fold spatial cross-validation

  All versions include taxonomic and traits information from WoRMS for all species.

• Visualize all species and their distribution records, and overlay environmental layers and ecoregions.
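The pre-processed versions combine two steps: thinning records to one per 25 km² grid cell, and assigning records to 5 cross-validation folds either at random or spatially. The poster does not say how either step is implemented, so the following Python sketch is only one plausible reading. It assumes 5 km x 5 km cells on a Lambert cylindrical equal-area projection, keeps the first record seen in each cell, and builds spatial folds as longitudinal bands; all function names are hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def equal_area_cell(lon, lat, cell_km=5.0):
    """Map lon/lat (degrees) to an integer cell id on a cylindrical
    equal-area grid; 5 km x 5 km cells each cover 25 km²."""
    x = EARTH_RADIUS_KM * math.radians(lon)
    y = EARTH_RADIUS_KM * math.sin(math.radians(lat))
    return (math.floor(x / cell_km), math.floor(y / cell_km))


def grid_filter(records, cell_km=5.0):
    """Keep only the first record encountered in each grid cell."""
    kept = {}
    for rec in records:
        kept.setdefault(equal_area_cell(rec["lon"], rec["lat"], cell_km), rec)
    return list(kept.values())


def random_folds(records, k=5, seed=42):
    """Assign each record to one of k folds at random, with near-equal sizes."""
    order = list(range(len(records)))
    random.Random(seed).shuffle(order)
    folds = [0] * len(records)
    for position, index in enumerate(order):
        folds[index] = position % k
    return folds


def spatial_folds(records, k=5):
    """Assign folds as k longitudinal bands holding near-equal numbers of
    records, so that each test fold is geographically contiguous."""
    order = sorted(range(len(records)), key=lambda i: records[i]["lon"])
    folds = [0] * len(records)
    for position, index in enumerate(order):
        folds[index] = position * k // len(records)
    return folds
```

Random folds tend to give optimistic scores when nearby records end up in both training and test sets; spatial folds reduce that leakage at the cost of testing on environmental conditions the model may not have seen.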

Species

Goals: (1) the compilation of a quality-controlled dataset of marine species occurrence data that can be used for benchmarking Species Distribution Model (SDM) algorithms, linked to (2) environmental data sets and (3) trait databases.

Hurdles: selection of a set of identifiable species, evenly distributed over taxonomic groups, range sizes and ecoregions, with sufficient unique distribution records.

Sources and criteria: distribution records from public repositories such as OBIS, GBIF, EMODnet and Reef Life Survey, filtered for emblematic species with sufficient records and known distributions, allowing identification of gaps and errors.

Links: taxa are linked to WoRMS and species trait databases.

Applications

The benchmark dataset is primarily designed to test the efficiency of species distribution modelling algorithms, but its use need not be limited to SDMs. Current applications include:

• Predictor selection for different taxa

• Comparison of different methods for correcting sampling bias

• Comparison of different SDM algorithms

• Defining optimal parameter settings

• Comparing results from different papers

• Easy-to-access data for SDM courses

• Setting up a Kaggle competition
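Comparing SDM algorithms on a benchmark requires a shared score computed on held-out folds. The poster does not name a metric, so the Python sketch below assumes AUC (the probability that a presence record scores higher than a background record), a metric commonly used in SDM evaluation, averaged over test folds. The functions `auc` and `mean_auc_over_folds` are hypothetical helpers, and the model scores are taken as given.

```python
def auc(presence_scores, background_scores):
    """Probability that a randomly chosen presence record receives a higher
    score than a randomly chosen background record (ties count half)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def mean_auc_over_folds(scores, labels, folds):
    """Average the AUC over all test folds.

    scores: predicted suitability for each record, made by a model that
            was trained without that record's fold
    labels: 1 for presence, 0 for background
    folds:  fold id of each record
    Folds lacking either presences or background points are skipped.
    """
    per_fold = []
    for fold in sorted(set(folds)):
        pres = [s for s, l, f in zip(scores, labels, folds) if f == fold and l == 1]
        back = [s for s, l, f in zip(scores, labels, folds) if f == fold and l == 0]
        if pres and back:
            per_fold.append(auc(pres, back))
    if not per_fold:
        raise ValueError("no fold contains both presence and background records")
    return sum(per_fold) / len(per_fold)
```

Running every algorithm on the same species, predictors and folds and reporting the same averaged score is what makes results from different studies comparable.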

Results

• 541 species

• 3 million records with environmental data

• 1 million records filtered on a 25 km² grid

[Screenshot of the MarineSPEED (Marine SPEcies with Environment Dataset) website, with Overview, Download and Species detail tabs. The species detail view shows the starry triggerfish, Abalistes stellatus (Animalia > Chordata > Actinopteri > Tetraodontiformes > Balistidae > Abalistes), with links to WoRMS, OBIS, GBIF and Wikipedia, and filters by kingdom, phylum, class, order, family and species.]