
EEA/IDM/R0/17/003

Technical specifications for implementation

of a new land-monitoring concept based on

EAGLE

D3: Draft design concept and CLC-Backbone,

CLC-Core technical specifications, including

requirements review.

Version 3.0

10.11.2017


Version history

| Version | Date | Author | Status and description | Distribution |
|---|---|---|---|---|
| 2.0 | 04/10/2017 | S. Kleeschulte, G. Banko, G. Smith, S. Arnold, J. Scholz, B. Kosztra, G. Maucha. Reviewers: G-H Strand, G. Hazeu, M. Bock, M. Caetano, L. Hallin-Pihlatie | Draft for review by NRC LC | EEA, NRC LC |
| 3.0 | 10/11/2017 | S. Kleeschulte, G. Banko, G. Smith, S. Arnold, J. Scholz, B. Kosztra, G. Maucha. Reviewers: G-H Strand, G. Hazeu, M. Bock, M. Caetano, L. Hallin-Pihlatie | Updated draft integrating the comments following the NRC LC workshop | For distribution at Copernicus User Day |


Table of Contents

1 Context
2 Introduction
2.1 Background
2.2 Concept
2.3 Role of industry
2.4 Potential role of Member States
2.5 Engagement with stakeholder community
3 Requirements analysis
4 CLC-Backbone – the industry call for tender
4.1 Spatial scale / Minimum Mapping Unit
4.2 Reference year
4.3 EO data
4.3.1 Pre-processing of Sentinel-2 data
4.4 Copernicus and European ancillary data sets
4.4.1 Completeness OSM road data
4.5 National data
4.5.1 Example: Rivers and lakes
4.5.2 Example: roads
4.5.3 Example: land parcel identification system (LPIS)
4.6 Specification for geometric delineation
4.6.1 Level 1 Objects – hardbones
4.6.2 Level 2 Objects – softbones
4.6.3 Accuracy of geometric delineation:
4.7 Specification for thematic attribution
4.7.1 Thematic classes
4.8 Proposal for production methodology
4.8.1 Step 1: Definition and Selection of persistent boundaries
4.8.2 Step 2: Image segmentation
5 CLC-Core – the grid approach
5.1 Background
5.2 Populating the database
5.3 Data modelling in CLC-Core
5.4 Database implementation approach for CLC-Core
5.4.1 Database concepts revisited: from Relational to NoSQL and Triple Stores
5.4.2 CLC-Core Database: spatio-temporal Triple Store Approach
5.4.3 Processing and Publishing of CLC-Core Products
5.4.4 Conclusion and critical remarks
6 CLC+ - the long-term vision
7 CLC-Legacy
7.1 Experiences to be considered
8 Technical specifications
8.1 CLC-Backbone
8.2 CLC-Core
9 Annex 1: OSM data
9.1 Crosswalk between OSM land use tags and CLC nomenclature Level 2
9.2 OSM road nomenclature
9.3 OSM roads – completeness
10 Annex 2: LPIS data in Europe
11 Annex 3: illustration of Step 1 “hardbone geometry” of CLC-Backbone
12 Annex 4: CLC-Backbone data model
12.1.1 Integration of temporal dimension into LISA
12.1.2 Integration of multi-temporal observations into LISA
13 Annex 5 – CLC nomenclature


List of Figures

Figure 2-1: Conceptual design for the products and stages required to deliver improved European land monitoring (2nd generation CLC).
Figure 2-2: A scale versus format schematic for the current and proposed CLMS products.
Figure 4-1: illustration of a merger of current local component layers covering (with overlaps) 26.3% of EEA39 territory (legend: red – UA, yellow - N2K, blue – RZ LC)
Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori information, (2) the delineation of objects on Level 2 using image segmentation techniques and (3) the pixel-based classification of EAGLE land cover components and attribution of Level 2 objects based on this classification
Figure 4-3: Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Nova), Romania (near Parta) and Serbia (near Indija) (top to bottom). © Bing/Google and EOX Sentinel-2 cloud free services as Background (left to right).
Figure 4-4: Search result for the availability for national INSPIRE transport network (road) services using the ELF interface (Nov. 2017).
Figure 4-5: Overview of available GIS datasets LPIS and their thematic content © Synergise, project LandSense.
Figure 4-6: illustration of results of step 1 – delineation of Level 1 landscape objects. The border defines the area for which Urban Atlas data is available (shaded) versus the area where no UA data was available (not shaded).
Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between encoding a particular unit as raster pixel (centre) or a grid cell (bottom). “daa” is a Norwegian unit: 10 daa = 1 ha.
Figure 5-2: Representation of real world data in the CLC-Core.
Figure 5-3: Example of an RDF triple (subject - predicate - object).
Figure 5-4: GeoSPARQL query for Airports near the City of London.
Figure 5-5: Result of the GeoSPARQL of Figure 5-4 as map and in the JSON format.
Figure 5-6: Schematic view of the distributed SPARQL endpoints communicating with each other. The arrows indicate the flow of information from a query directed to the French CLC SPARQL endpoint.
Figure 5-7: Intended generation of CLC+ products based on the distributed triplestore architecture.
Figure 7-1: Generalisation technique applied in Norway based on expanding and subsequently shrinking. The technique exists for polygon as well as raster data.
Figure 7-2: Generalization levels used in LISA generalization in Austria
Figure 7-3: Examples of 25 ha, 10 ha and 1 ha MMU CLC for Germany
Figure 7-4: Result of the generalisation process in Germany with technical changes in different classes
Figure 10-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit
LandSense
Figure 12-1: INSPIRE main feature types: land cover dataset and land cover unit
Figure 12-2: EAGLE extension to land cover unit and new feature type land cover component
Figure 12-3: The new CLC-Basis model (based on LISA)
Figure 12-4: illustration of temporal NDVI profile and derived land cover components throughout a vegetation season (Example taken from project Cadaster Env Austria, © Geoville 2017)
Figure 12-5: Draft sketch to illustrate the principles of the combined temporal information that is stored in the data model

List of Tables

Table 2-1: Overview of key characteristics proposed for the four elements / products of the 2nd generation CLC.
Table 2-2: Summary (matrix) of potential roles associated with each element / product.
Table 4-1: Overview of existing Copernicus land monitoring and other free and open products which were analysed as potential input to support the construction of the geometric structure (Level 1 “hard bones”) of the CLC-Backbone. The products finally proposed for further processing are marked in green.
Table 4-2: Main characteristic of Level 1 objects (hardbone) and level 2 objects (softbones)
Table 5-1: Comparison of selected Triplestores with respect to spatial and temporal data.
Table 6-1: List of CLC classes and requirements for external information


1 CONTEXT

The European Environment Agency (EEA) and European Commission DG Internal Market, Industry,

Entrepreneurship and SMEs (DG GROW) have decided to develop and design a conceptual

strategy and associated technical specifications for a new series of products within the Copernicus

Land Monitoring Service (CLMS) portfolio, which should meet the current and future requirements

for European Land Use Land Cover (LULC) monitoring. These products are nominally called the "2nd

generation CORINE Land Cover (CLC)".

After a call for tender in 2017, the EEA has tasked the EIONET Action Group on Land monitoring in

Europe (EAGLE Group) with developing an initial response to fulfilling these needs through a

conceptual design and technical specifications. The approach adopted represents a sequence of

development stages, where separate single elements of the whole concept could be developed

relatively independently and at different rates, to allow time for broad consultation with

stakeholders. This also allowed the inclusion of Member State (MS) input, the exploitation of industrial production

capacity, and the necessary feedback, lessons learnt and refinement to reach the ultimate goal of a

coherent and harmonized European Land Monitoring Framework.

The first stage of this process outlined the conceptual strategy and proposed a draft technical

specification for the first product (CLC-Backbone) to be developed. The first stage also involved a

presentation of the concept to the EIONET NRCs Land Cover and the collection of the NRCs' feedback,

which was then carefully taken into consideration for the continuation of the process into the

second stage. The first stage was undertaken within the constraints outlined by EEA and DG GROW

for the initial product:

• Industrial production by service providers,

• Outcome product in vector format,

• Highly automated production process,

• Short timeframe of production phase,

• Driven by Earth Observation (Sentinel-2),

• Complete the picture started by the Local Component1 products (EEA-39).

This document represents the second stage of development to further expand on the conceptual

strategy and to extend the current technical specifications, to propose additional follow-up products

in more detail and to continue to communicate these to the stakeholders involved in the field of

European land monitoring. This second stage also aims at enlarging the circle of stakeholders to all

interested Copernicus users and beyond. It is vital for the success of the project and the long-term

evolution of land monitoring in Europe that all the relevant stakeholders communicate their

requirements and opinions to this process.

Revised versions of this document will continue to be produced at specific milestones towards a

final version in early 2018.

1 The term “component” has a twofold function, 1) as the Copernicus local component products, 2) as a term

for Land Cover Component as an element within the EAGLE data model


2 INTRODUCTION

This chapter provides the background to the concept, an outline of the proposed approach to be

adopted, an overview of the elements of the concept and the current status of the developments.

2.1 Background

Monitoring of Land Use and Land Cover (LULC) and their evolving nature is among the most

fundamental environmental survey efforts required to support policy development and effective

environmental management2. Information on LULC plays a key role in a large number of European

environmental directives and regulations. Many current environmental issues are directly related to

the land surface, such as habitats, biodiversity, phenology and distribution of plant species,

ecosystem services, as well as other issues relevant to climate change. Human activities and

behaviour in space (living, working, education, supplying, recreation, mobility & communication,

socializing) have significant impacts on the environment through settlements, transportation and

industrial infrastructure, agriculture, forestry, exploitation of natural resources and tourism. The

land surface, whose negative change of state can only – if at all – be reversed with huge efforts, is

therefore a crucial ecological factor, an essential economic resource, and a key societal determinant

for all spatially relevant basic functions of human existence and, not least, nations’ sense of identity.

Land thus plays a central role in all three factors of sustainable development: ecology, economy, and

society.

LULC products so far tend to be produced independently of each other at the global, European,

national and sub-national levels, each with a similar but still distinct emphasis in its

thematic content. Such diversity leads to reduced interoperability and duplication of work, and

thereby, inefficient use of resources. At the European level CORINE Land Cover (CLC) is the flagship

programme for long-term land monitoring and is now part of the Copernicus Land Monitoring

Service (CLMS). CLC has been produced for reference years of 1990, 2000, 2006 and 2012, with 2018

under preparation and expected to be available by late 2018. The CLC specification aims to provide

consistent localized geographical information on LULC using 44 classes at level-3 in the

nomenclature (see Annex 5). The vector databases have a minimum mapping unit (MMU) of 25 ha

and minimum feature width (MFW) of 100 m with a single thematic class attribute per land parcel.

At the European level, the database is also made available on a 100 x 100 m and 250 x 250 m raster

which has been aggregated from the original vector data at 1:100 000 scale. CLC also includes a

change layer which records changes between two of the 44 thematic classes with a MMU of 5 ha.
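As a quick check on the figures quoted above, the following sketch converts the mapping units into raster cells. The helper names `cells_per_mmu` and `meets_mmu` are illustrative and not part of the CLC specification, and the 6 ha test area is an arbitrary example.

```python
# Unit conversions behind the CLC figures quoted above (1 ha = 10,000 m^2).
HECTARE_M2 = 10_000


def cells_per_mmu(mmu_ha: float, cell_size_m: float) -> float:
    """Number of square raster cells whose combined area equals the MMU."""
    return mmu_ha * HECTARE_M2 / (cell_size_m ** 2)


def meets_mmu(area_m2: float, mmu_ha: float) -> bool:
    """True if a polygon is large enough to be mapped at the given MMU."""
    return area_m2 >= mmu_ha * HECTARE_M2


# 25 ha status MMU and 5 ha change MMU expressed in 100 m x 100 m cells.
status_cells = cells_per_mmu(25, 100)
change_cells = cells_per_mmu(5, 100)
print(status_cells, change_cells)  # 25.0 5.0

# A hypothetical 6 ha patch is recorded as a change but not as a status polygon.
print(meets_mmu(60_000, 5), meets_mmu(60_000, 25))  # True False
```

In other words, a status polygon at the 25 ha MMU corresponds to 25 cells of the 100 m raster, which gives a sense of how much landscape detail falls below the mapping threshold.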

Although CLC has become well established and has been successfully used, mainly at the pan-

European level, there are a number of deficiencies and limitations that restrict its wider exploitation,

particularly at the Member State (MS) level and below. This is partly due to the fact that many MS

have access to more detailed, precise and timely information from national programs, but also to

the fact that the MMU of CLC (25 ha) is too coarse to capture the fine details of the landscape at the

local and regional scales. In consequence, all features of smaller size that represent landscape

diversity and complexity are not mapped either because of geometric generalization or because they

are absorbed by thematically mixed classes with very broad definitions. Moreover, changes of

features and landscape dynamics that are highly relevant to locally decided but globally effective

policy, such as small-scale forest rotation, changes in agricultural practices and urban in-filling, may

be missed due to low spatial resolution and/or thematic depth of CLC class definitions. The

successful use of CLC in combination with higher spatial resolution products to detect and document

urban in-filling has given a clear indication of the required direction of development for European

land monitoring.

2 Harmonised European Land Monitoring - Findings and Recommendations of the HELM Project.

To address some of the above issues, the CLMS has expanded its portfolio of products beyond CLC to

include the High Resolution Layers (HRLs) and local component (LoCo) products. The HRLs provide

pan-European information on selected surface characteristics in a 20 m raster format, also available

aggregated to 100 m raster cells. They provide information on basic surface properties such as

imperviousness, tree cover density and permanent grassland and can be described as intermediate

products. The LoCo products, delivered in vector format, are based in part on very high spatial resolution EO

data and are tailored to a specific landscape monitoring purpose (e.g. Urban Atlas). They provide detailed

thematic LULC information at polygon level with an MMU in the range of 0.25 to 1 ha. However, the

LoCo products altogether do not provide wall to wall coverage of the EEA-39 countries, even when

combined. Their nomenclatures are not harmonised and thus cause interoperability problems.

2.2 Concept

Given the above issues a revised concept for European Land Monitoring is required which both

provides improved spatial and thematic performance and builds on the existing heritage. Also,

recent developments in the field of land monitoring (e.g. improved Earth Observation (EO) input data

from the Sentinel programme, national bottom-up approaches, processing methods, additional

reporting requirements, the INSPIRE Directive) and in desktop and cloud-computing capacities offer

opportunities to deliver these improvements effectively and efficiently. The EEA in conjunction with

DG GROW has identified this need and now aims to harmonise and integrate some of the CLMS

activities by investigating the concept and technical specifications for a higher performance pan-

European mapping product under the banner of "2nd generation CLC".

It was determined that such a 2nd generation CLC should build upon recent conceptual development

and expertise, while still guaranteeing backwards compatibility with the conventional CLC datasets.

Also, the proposed approach should be able to serve the needs arising from recent developments

in European policies, such as reporting obligations on land use, land use change and forestry (LULUCF),

plans for an upcoming Energy Union or long-term climate mitigation objectives. After a successful

establishment of 2nd generation CLC there would then also be knock-on benefits for a broad range of

other European policy requirements, land monitoring activities and reporting obligations.

The proposed conceptual strategy consists of a number of interlinked elements (Figure 2-1) which

stand for separate products and therefore can be delivered independently in separate stages. Each

of the products has its own technical specification and production methodology and can be

produced through its own funding / resourcing mechanisms. Throughout the documents delivered

by this project the conceptual design given in Figure 2-1 will be used as a key graphic to identify the

product and stage being described.

Figure 2-1: Conceptual design for the products and stages required to deliver improved European

land monitoring (2nd generation CLC).


The structure of the conceptual design proposed by the EAGLE Group for the 2nd generation CLC is

based on the four elements shown in Figure 2-1. The reports associated with this work will expand

incrementally on the conceptual design and provide technical specifications of increasing

elaboration for the products. Although the four elements / products will be described in more detail

later in this, and subsequent, reports they can be summarised as follows:

1. CLC-Backbone is a spatially detailed, large scale inventory in vector format providing a

geometric spatial structure for landscape features with limited, but robust EO-based land

cover thematic detail on which to build other products.

2. CLC-Core is a consistent, multi-use grid3 database repository for environmental information

populated with a broad range of land cover, land use and ancillary data, forming the

information engine to deliver and support tailored thematic information requirements.

3. CLC+ is the end point for this specific exercise and is a derived vector and raster product

from the CLC-Core and CLC-Backbone and will be a LULC monitoring product with improved

spatial and thematic performance, relative to the current CLC, for reporting and assessment.

4. The final element of the conceptual design, although not strictly a new product, is the ability

to continue producing the existing CLC (which may be referred to as CLC-Legacy in the

future), which already has a well-established and agreed specification.
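To make the CLC-Core idea of a grid whose cells are linked to a data model more concrete, the following is a minimal in-memory sketch. It is not the proposed implementation: the class names `GridCell` and `GridDatabase`, the 100 m cell size, the coordinates and the attribute names are all hypothetical, chosen only to show how information from different sources could be attached to the same cell.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GridCell:
    """One cell of a regular grid; attributes are grouped by data source."""
    row: int
    col: int
    attributes: dict = field(default_factory=lambda: defaultdict(dict))


class GridDatabase:
    """Hypothetical in-memory stand-in for a CLC-Core-like grid repository."""

    def __init__(self, cell_size_m: float):
        self.cell_size_m = cell_size_m
        self.cells = {}

    def cell_index(self, x_m: float, y_m: float) -> tuple:
        """Map projected coordinates (metres) to a (row, col) cell index."""
        return int(y_m // self.cell_size_m), int(x_m // self.cell_size_m)

    def populate(self, x_m, y_m, source, name, value) -> GridCell:
        """Attach one attribute from one source to the cell containing (x, y)."""
        key = self.cell_index(x_m, y_m)
        cell = self.cells.setdefault(key, GridCell(*key))
        cell.attributes[source][name] = value
        return cell


db = GridDatabase(cell_size_m=100)
# Two observations from different (hypothetical) sources fall into the same cell.
db.populate(4321050.0, 3210020.0, "HRL", "imperviousness_pct", 35)
db.populate(4321080.0, 3210090.0, "national", "land_use", "residential")

cell = db.cells[db.cell_index(4321050.0, 3210020.0)]
print(cell.row, cell.col, dict(cell.attributes))
```

The point of the design is that the cell, not the source dataset, is the unit of storage, so new information can be added to a cell at any time without restructuring what is already there.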

Table 2-1 provides a first overview of the main characteristics of the four elements that are

developed in more detail in the following chapters of this document. The table allows the reader to

make comparisons between the key characteristics of the products.

3 A data structure whose grid cells are linked to a data model that can be populated with the information from

the different sources.


Table 2-1: Overview of key characteristics proposed for the four elements / products of the 2nd

generation CLC.

| | CLC-Backbone | CLC-Core | CLC+ | CLC-Legacy |
|---|---|---|---|---|
| Description | Detailed wall to wall (EEA-39) geometric vector reference layer with basic thematic content. | All-in-one data container for land monitoring information according to EAGLE data model. | Thematically and geometrically detailed LULC product. | A more generalised LULC product consistent with the CLC specification. |
| Role / purpose | Support to CLMS products and services at the pan-European and local levels. | Thematic characterisation of CLMS products and services at the pan-European and local levels. | Support to EU and national reporting and policy requirements. | Maintain the time series (backwards compatibility) and support legacy business systems. |
| Format | Vector. | GRID database. | Raster and vector. | Raster and vector. |
| Thematic detail | <10 basic land cover classes, few spectral-temporal attributes. | Rich attribution of LU, LC and ancillary information. | High thematic detail including LC and LU with improvements compared to CLC. | CLC-nomenclature with 44 classes + changes between the 44. |
| Geometric detail (MMU, grid size) | 0.25 to 5.0 ha. | 10 x 10 m to 1 x 1 km. | TBD, related to CLC-Backbone. | 25 ha for status, 5 ha for changes (raster: 100 x 100 m). |
| Update cycle | 3 - 6 years. | Dynamic update as new information becomes available. | 3 - 6 years. | Standard 6 years. |
| Reference year | 2018 | 2018 | 2018 | 2018 / 2024 |
| Production year | 2018 | 2019 | TBD | TBD |
| Method | Geospatial data integration and image segmentation for geometric boundaries and attribution / labelling by pixel-based EO derived land cover | EAGLE-Grid approach: population and attribution of regular GRID | Derived by SQL from CLC-Core | Derived from CLC+ or directly from CLC-Core (TBD) |
| Input data | Geospatial data, EO images and ancillary data selected from LoCo (with visual control) and HRL products. | CLC-Backbone, all available HRLs, LoCo, ancillary data, national data provisions (e.g. LU). | CLC-Core and CLC-Backbone | CLC+ and / or CLC-Core |
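Table 2-1 gives the CLC+ method as "Derived by SQL from CLC-Core". The sketch below illustrates what such a derivation could look like in principle, using an in-memory SQLite database. The table `core_cells`, its columns, the class labels and the majority rule are all assumptions made for illustration; the specification does not define them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE core_cells (
    cell_id   TEXT,     -- grid cell identifier (hypothetical naming)
    object_id INTEGER,  -- CLC-Backbone polygon the cell falls into
    lc_class  TEXT      -- land cover label stored for the cell
);
INSERT INTO core_cells VALUES
    ('r1c1', 1, 'woody'), ('r1c2', 1, 'woody'), ('r1c3', 1, 'herbaceous'),
    ('r2c1', 2, 'sealed'), ('r2c2', 2, 'sealed');
""")

# Count cells per (object, class); the largest count per object comes first.
rows = conn.execute("""
    SELECT object_id, lc_class, COUNT(*) AS n_cells
    FROM core_cells
    GROUP BY object_id, lc_class
    ORDER BY object_id, n_cells DESC, lc_class
""").fetchall()

# Keep only the first (most frequent) class for each object.
dominant = {}
for object_id, lc_class, n_cells in rows:
    dominant.setdefault(object_id, (lc_class, n_cells))

print(dominant)  # {1: ('woody', 2), 2: ('sealed', 2)}
```

A majority rule is only one of many possible attribution rules; the same grid content could equally be queried for class proportions or for combinations of land cover and land use.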


Figure 2-2 is a schematic description of the conceptual design showing the relationship of the new

elements / products to existing CLMS products in terms of their format and level of spatial detail. In

this representation:

1. The current or conventional CLC (CLC2000, CLC2006, CLC2012, CLC2018 etc.), a polygon map

with fewer details.

2. The LoCos (Urban Atlas, Riparian Zones, N2000 etc.), polygon maps with more spatial and

thematic details.

3. The HRLs (Imperviousness, Forest, Grassland, Wetland etc.), raster products for specific

surface characteristics with high spatial details.

4. The proposed CLC-Core, a grid product where the level of detail is still to be decided.

5. The CLC-Backbone and CLC+, both polygon maps with more details.

Figure 2-2: A scale versus format schematic for the current and proposed CLMS products.

Given the context, requirements and issues, the aim should be to find a means to deliver the

concept and its proposed products with an efficient mixture of industrially produced material backed

up by auxiliary information from various national programmes. It is important to propose a viable

system in which there is flexibility to adapt the later steps to issues of feasibility and practicality once

the implementation of the earlier steps is underway. For instance, the outcomes of the CLC-Backbone

production should be able to influence thematic content of the CLC-Core and / or the technical

specifications of the CLC+.


2.3 Role of industry

The development and production of CLC to this point has had only a limited role for industry,

mainly focused on the production of pre-processed image datasets (e.g. IMAGE2006, IMAGE2012

etc.), the validation of the CLC2012 products and in some MS the subcontracting of the actual

production. Conversely, the production of the non-CLC products within the CLMS has been

dominated by industry through a series of service contracts to generate consistent products across

Europe.

Industry has the ability to produce operational solutions which exploit automation, can handle

large data volumes and are scalable to European-wide requirements. As the amount of available EO

and GI data increases, highly efficient and effective mechanisms for production will be required for

at least parts of the European land monitoring process and economies of scale must be exploited.

Industry therefore offers a number of capabilities which will be required at selected points within

the 2nd generation CLC.

DG GROW has expressed the wish that industry should have an initial role in the production of CLC-

Backbone which has therefore been designed in part to exploit industrial capabilities. Further

opportunities for industrial involvement are shown in Table 2-2.

2.4 Potential role of Member States

The MS have always been intimately linked with the CLC as its production has been the responsibility

of the nominated authorities through EIONET. The MS have provided the production teams for the

actual mapping to exploit local knowledge, familiarity with native landscape types and access to

national datasets which may not have been possible to share further. This bottom-up production

approach has been key to successful delivery of this important time series.

With respect to the non-CLC products within the CLMS the MS involvement has been limited so far

to verification and, in the case of the 2012 products, enhancement. Although the MS have provided

valuable feedback and enhancement in some cases, particularly on the HRLs, their capabilities have

not been fully exploited within the CLMS.

Table 2-2 shows the potential for a greater role for the MS across all elements / products. It is

important that the MS experts have oversight of the specification, as is happening within this

project, and opportunities to contribute data, experience and location knowledge to the production

and validation activities where appropriate.


Table 2-2: Summary (matrix) of potential roles associated with each element / product.

| Actor | CLC-Backbone | CLC-Core | CLC+ | CLC-Legacy |
|---|---|---|---|---|
| EEA / DG GROW / EC | Definition, coordination, main user. | Definition, coordination, main user. | Definition, coordination, main user. | Definition, coordination, main user. |
| MS | Review of specification, validation, user. | Review of specification, population with national datasets, user. | Review of specification, support to production, validation, user. | Production, validation, user. |
| Industry | Production. | Implementation and maintenance DB infrastructure. | Support production, validation. | Validation. |

2.5 Engagement with stakeholder community

The success of the project and the long-term development of land monitoring in Europe are

intrinsically linked to the involvement of the stakeholder community. It is vital that all the relevant

stakeholders are aware of this activity and contribute their requirements and opinions to this

process. Revised versions of this document are foreseen at specific milestones towards a final

version for delivery in early 2018.

This specific deliverable, D3, is the second step towards the definition of the conceptual strategy and

the potential technical specifications for a series of new CLMS products. It aims at communicating

these details to the stakeholders involved in European land monitoring to elicit feedback, comments

and questions. The work reported here begins with a requirements review which goes beyond the

remit of the call for tender. The four elements and potential products of the 2nd generation CLC

within the CLMS are described in further detail in the following chapters. This version already

includes the first feedback from the NRC LC meeting in Copenhagen in October, 2017. For CLC-

Backbone, this document updates the draft versions of the technical specifications and the outline

implementation methodology provided in D2. These details are still open for discussion and able to

be reviewed by stakeholders and will ultimately be used as input for an open call for tenders to

industrial service providers for a production to start in 2018. For CLC-Core, this document provides a

more advanced outline of the technical specification and proposes a number of options for the

implementation. The technical design of CLC-Core will continue to be developed based on

stakeholder feedback in future steps. Similarly, the current thinking around CLC+ will continue to be

developed to elicit feedback to guide expansion of the technical specifications at a future step within

the project.


3 REQUIREMENTS ANALYSIS

In line with the principle of Copernicus, this analysis in support of the development of new products

within the CLMS is driven by user needs rather than the current technical capabilities of EO sensors and

processing systems. The technical issues will be dealt with in the chapters related to the products

focusing on specification and methodological development. This analysis was initially based around

the requirements set out in the original call for tender, but this has now been extended through

inclusion of recent work by the EC and ETC to provide a broader range of stakeholder needs. As this

project develops further, requirements will be integrated to support the increasingly detailed

specifications towards those for a complete 2nd generation CLC.

Although the MMU and thematic issues of CLC have been extensively reported for some time, the

target specifications for a future improved harmonised European land monitoring product are less

clear. The proposed products should support European policies, particularly assist reporting

obligations on European level (and not on national level) on land use, land use change and forestry

(LULUCF), plans for an upcoming Energy Union and long-term climate mitigation objectives.

However, many of the policies that could exploit EO-based land monitoring information rarely give a

clear quantitative requirement for spatial, temporal or thematic specifications. The higher-level

strategic policies, such as the EU Energy Union, often refer to the monitoring of other activities such

as REDD+ and LULUCF. Where actual quantitative analysis and assessment takes place, they tend to

rely on the products currently available. For instance, LULUCF assessments in Europe (independently

from national reporting obligations) use CLC, while at the global scale they use Global Land Cover

2000 (GLC 2000) with a 1 km spatial resolution (or 100 ha MMU). Some of these initiatives have

explored the potential of improved monitoring performance, such as the use of SPOT4 imagery with

a 20 m spatial resolution to show land-use change between 2000 and 2010 for REDD+ reporting, but

until very recently it has not been practical for these types of monitoring to become operational

globally.

A more fruitful avenue for user requirements in support of the 2nd generation CLC is to consider the

initiatives which are now addressing habitats, biodiversity and ecosystem services. The EU

Biodiversity Strategy to 2020 was adopted by the European Commission to halt the loss of biodiversity

and improve the state of Europe’s species, habitats, ecosystems and their services. It defines six major

targets with the second target focusing on maintaining and enhancing ecosystem services, and

restoring degraded ecosystems across the EU, in line with the global goal set in 2010. Within this

target, Action 5 is directed at improving knowledge of ecosystems and their services in the EU.

Member States, with the assistance of the Commission, will map and assess the state of ecosystems

and their services in their national territory, assess the economic value of such services, and

promote the integration of these values into accounting and reporting systems. The Mapping and

Assessment of Ecosystems and their Services (MAES) initiative is supporting the implementation of

Action 5 and, although much of the early work in this area on European level has used inputs such as

CLC, it is obvious that to fully address this action an improved approach is required especially for a

better characterisation of land, freshwater and coastal habitats, particularly in watershed and

landscape approaches. The spatial resolution at which ecosystems and services should be mapped

and assessed will vary depending on the context and the purpose for which the mapping/assessment

is carried out. However, information from a more detailed thematic characterisation and

classification at a higher spatial resolution is required, which is compatible with the European-

wide classification and could be aggregated in a consistent manner if needed.

A first version of a European ecosystem map covering spatially explicit ecosystem types for land and

freshwater has been produced at 1 ha spatial resolution using CLC (100 m raster), the predecessor of


the HRL imperviousness with 20 m resolution, JRC forest layer with 25 m spatial resolution plus a

range of other ancillary datasets (e.g. ECRINS water bodies) using a wide variety of spatial

resolutions (from detailed Open Street Map data to 10 x 10 km grid data used in Article 17 reporting

of the Habitats Directive). It is clear that ecosystem mapping and assessment could more fully

exploit the recently available Copernicus Sentinel data and land products, and move down to a

higher spatial resolution.

The recent EC-funded NextSpace project collected user requirements for the next generation of

Sentinel satellites to be launched in the 2030 time horizon. These requirements were analysed on a

domain basis and from the land domain those requirements which were given a context of “land

cover (including vegetation)” were initially considered. This was extended so that related contexts

such as “glaciers, lakes, above-ground biomass, leaf area index and snow” were also considered. A

broad range of spatial resolutions were requested from sub 2.5 m to 10 km, but two thirds of the

collated requirements wanted data in the 10 – 30 m region. As would be expected the requested

MMUs were also broadly distributed, but with a preference for 0.5 to 5 ha, which represents field /

city block level for much of Europe. The temporal resolution / update frequency was dominated by

yearly revisions; some users could accept 5-yearly updates and some wished for monthly

updates, although the reason was unclear. The users were less specific about the thematic detail

required and the few references given were to CLC, EUNIS, LCCS and “basic land cover”. The

preferred accuracy of the products was in the range 85 to 90%.

The European Topic Centre on Urban, Land and Soil systems (ETC/ULS) undertook a survey of

EIONET members involved with the CLMS and CLC production considering a range of topics in

advance of the 2018 activities. Some of the questions referred to the shortcomings of CLC and the

potential improvements that could be made towards a 2nd generation CLC product. As expected, it

was noted that some CLC classes cause problems because of their mixed nature and instances on the

ground that are sometimes complex and difficult to disentangle. It was suggested that this situation

could be improved by the use of a smaller MMU so that the mapped features have more

homogeneous characteristics. Also, a MMW reduction, particularly for highways, would allow linear

features to be represented more realistically. Of the 32 countries that responded to the

questionnaire, 25 would support a finer spatial resolution, showing that there is, in general,

a national demand for high spatial resolution LULC data. The proposals ranged from 0.05 ha to

remaining at the current 25 ha, but the majority favoured 0.5 [4] to 5 ha. Thematic refinement is also

supported by around one third of the respondents, who requested improved thematic detail,

separation of land cover from land use, splitting formerly mixed CLC classes and the addition of

further attributes to the spatial polygons.

From the requirements review so far and considering the requirements put forward by LULUCF, it is

suggested that the ultimate 2nd generation product would have a MMU of 0.5 ha and be

based on EO data with a spatial resolution of between 10 and 20 m. The thematic content would be

a refinement of the current CLC nomenclature to cope with the changing MMU, with a separation of land cover

and land use for the needs of ecosystem mapping and assessment. The temporal update could come

down from 6 years to 3 years in the short-term, and potentially to 1 year in the longer-term.

[4] 0.5 ha being required by LULUCF


4 CLC-BACKBONE – THE INDUSTRY CALL FOR TENDER

CLC-Backbone has been conceived as a new high spatial resolution vector product. It represents a

baseline object delineation with the emphasis on geometric rather than thematic detail. The wall-to-

wall coverage of CLC-Backbone shall draw on, complete and amend the picture started by the current

local component (LoCo) products (Urban Atlas, Riparian Zones and Natura 2000), which cover approx.

1/3 of the total area, as shown in Figure 4-1. It will be comprehensive and effectively complete

the coverage of the EU-28 as a minimum, but preferably the whole EEA-39.

Figure 4-1: illustration of a merger of current local component layers covering (with overlaps) 26,3%

of EEA39 territory (legend: red – UA, yellow - N2K, blue – RZ LC)

CLC-Backbone will initially be produced by a mostly automated, industrial approach, where

production can be separated into different levels, including different degrees and sequential order of

automation and human interaction (Figure 4-2). Within CLC-Backbone landscape objects are defined

as vector polygons and identified on different levels.

Step 1: The first level of object borders (skeleton) represents persistent objects (“hard bones”) in the landscape.

Step 2: On a second level a subdivision of the persistent landscape features (Level 1) will be

achieved through image segmentation (“soft bones”), based on multi-temporal Sentinel data within a

defined observation period (resulting in Level 2 landscape features = polygons).


Step 3: The output of CLC-Backbone – the delineated objects – represent spectrally and/or texturally

homogeneous features that are further characterised and attributed using the EAGLE land cover

component concept. The characterization of objects (segments) can be achieved using two options:

• Attributing the segments using summary indicators based on a pixel-based classification of

fairly simple land cover classes (e.g. dominating class, percentage mixture of classes for each

segment)

• And / or attributing the segments based on spectral mean values of segments
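The two attribution options above can be sketched as follows. This is a minimal illustration in plain Python, assuming that every pixel already carries a segment id, a class label from the pixel-based classification and one spectral value. The function name `attribute_segments`, the class labels and the sample values are hypothetical and not part of the specification.

```python
from collections import Counter, defaultdict


def attribute_segments(segment_ids, class_labels, band_values):
    """Summarise per-pixel results for each segment.

    All three arguments are flat sequences of equal length, one entry per pixel.
    """
    class_counts = defaultdict(Counter)  # segment id -> pixel count per class
    band_sums = defaultdict(float)       # segment id -> sum of spectral values
    for seg, cls, val in zip(segment_ids, class_labels, band_values):
        class_counts[seg][cls] += 1
        band_sums[seg] += val

    attributes = {}
    for seg, counts in class_counts.items():
        n_pixels = sum(counts.values())
        dominant, _ = counts.most_common(1)[0]
        attributes[seg] = {
            # Option 1: summary indicators from the pixel-based classification
            "dominant_class": dominant,
            "class_share_pct": {c: 100.0 * k / n_pixels for c, k in counts.items()},
            # Option 2: spectral mean value of the segment
            "spectral_mean": band_sums[seg] / n_pixels,
        }
    return attributes


# Hypothetical sample: six pixels in two segments
segments = [1, 1, 1, 1, 2, 2]
labels = ["woody", "woody", "herbaceous", "woody", "water", "water"]
red = [0.05, 0.06, 0.10, 0.07, 0.02, 0.02]
attrs = attribute_segments(segments, labels, red)
```

In this sample, segment 1 is attributed as dominantly "woody" with a 75 % / 25 % class mixture, while segment 2 is pure "water".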

Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori

information, (2) the delineation of objects on Level 2 using image segmentation techniques and (3)

the pixel-based classification of EAGLE land cover components and attribution of Level 2 objects

based on this classification

For implementation of CLC-Backbone the existing European land cover monitoring framework has to

be considered:

• Heritage: The concept has to take into account the existing pan-European and local Copernicus

layers and the spatial and thematic representation of the landscape. The technical specifications

to be provided by this contract are therefore based on a thorough review of available and

feasible datasets and their geometry and thematic content.

• Integration: CLC-Backbone will integrate selected geometric information from existing

Copernicus products at different steps within the process.

• Sequence of production: Ideally, the production of CLC-Backbone and of the other Local

Component data sets would be arranged in a sequential order so that CLC-Backbone can build on and

integrate the most up-to-date Local Component data to achieve best consistency among the

layers.

Realistically, the production will need to make use of selected elements of the existing Local

Component layers and High Resolution Layers and of existing ancillary data whose reference year might

not be identical with that of the production. The time lag between input data and production

reference year does not constitute a limiting factor. As the final delineation of polygons is achieved by

image segmentation (“soft bones”) of Sentinel-2 images of the actual reference year, any deviation

of the “hard bone” based geometry from the actual landscape structure (due to outdated

data) will be compensated by the image segmentation step. Considering an annual land cover

change of approx. 0,5% (on CLC spatial scale), the existing layers and datasets will still be able to provide a

reliable input that is adapted by the image segmentation step.
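The order of magnitude behind this argument can be checked with a short calculation. The sketch below assumes a constant annual change rate of 0,5% applied independently each year, which is a simplification introduced here for illustration; the function name `changed_share` is hypothetical, and the lags of 3 and 6 years are example values.

```python
def changed_share(annual_rate, years):
    """Share of the area that has changed at least once after `years`,
    assuming a constant, independent annual change rate."""
    return 1.0 - (1.0 - annual_rate) ** years


# Example time lags between input data and production reference year
for lag in (3, 6):
    print(f"{lag} years: {100 * changed_share(0.005, lag):.2f} % of the area changed")
```

Under these assumptions roughly 1,5% of the area would be outdated after 3 years and about 3% after 6 years, which is the part of the geometry left to the image segmentation step to correct.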

4.1 Spatial scale / Minimum Mapping Unit

CLC-Backbone shall address landscape features with a predefined MMU of 0.5 ha and a minimum

mapping width (MMW) of 10m (these values are subject to the ongoing consultation process and

might be revised).
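To relate the MMU to the pixel sizes of the input imagery, the following sketch converts between pixel size, pixel area and the number of pixels needed to reach the MMU. The helper names are hypothetical; the only inputs taken from this document are the 0.5 ha MMU and the 10 m and 20 m pixel sizes.

```python
import math

M2_PER_HA = 10_000  # 1 ha = 10 000 square metres


def pixel_area_ha(pixel_size_m):
    """Area of one square pixel in hectares."""
    return pixel_size_m ** 2 / M2_PER_HA


def pixels_per_mmu(mmu_ha, pixel_size_m):
    """Smallest whole number of pixels whose combined area reaches the MMU."""
    return math.ceil(mmu_ha * M2_PER_HA / pixel_size_m ** 2)


for size in (10, 20):
    print(size, "m pixel:", pixel_area_ha(size), "ha,",
          pixels_per_mmu(0.5, size), "pixels per 0.5 ha MMU")
```

A 0.5 ha object therefore corresponds to 50 pixels at 10 m and to 13 pixels (12.5 rounded up) at 20 m.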

Each object (delineated polygon) in CLC-Backbone will be encoded according to a fairly basic land

cover nomenclature (5-15 pure land cover classes, in line with EAGLE Land Cover

Components) and additionally characterized – depending on the user requirements - by a number of

attributes (e.g. NDVI time series) giving more detailed information about the land cover and its

dynamics inside the polygon.
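As an illustration of such an attribute, the sketch below derives a mean NDVI time series for one polygon using the standard definition NDVI = (NIR - Red) / (NIR + Red). The input layout (a list of dates, each with the reflectance pairs of the clear pixels inside the polygon), the function names and the sample values are assumptions made for this example only.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return None  # undefined, e.g. for a no-data pixel
    return (nir - red) / total


def polygon_ndvi_series(observations):
    """Return [(date, mean NDVI)] for one polygon.

    observations: list of (date, pixels), where pixels is a list of
    (nir, red) reflectance pairs of the clear pixels on that date.
    Dates without usable pixels are skipped.
    """
    series = []
    for date, pixels in observations:
        values = [v for v in (ndvi(n, r) for n, r in pixels) if v is not None]
        if values:
            series.append((date, sum(values) / len(values)))
    return series


# Hypothetical observations; the last date is fully cloud covered
obs = [
    ("2018-04-01", [(0.30, 0.10), (0.34, 0.10)]),
    ("2018-07-01", [(0.50, 0.05), (0.45, 0.05)]),
    ("2018-10-01", []),
]
series = polygon_ndvi_series(obs)
```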

4.2 Reference year

The production of CLC-Backbone shall generally make use of images from the year 2017/2018, i.e.

Sentinel-2 HR layer stack. It is suggested that the image stack covers one full vegetation season

ranging from 2017 to 2018, noting that the vegetation season in the south of Europe starts already

in October of the previous year.

4.3 EO data

The Sentinel-2 data for the production of CLC-Backbone will need to make use of all available

observations from the ESA service hubs to provide a multi-spectral, multi-date, 10-20 m spatial

resolution imagery as input. As the full Sentinel-2 constellation (2A and 2B) will be

available operationally from late 2017 onwards, it is anticipated that the last month of 2017 and the full year

2018 will be the first full vegetation period for a comprehensive multi-temporal coverage of Europe.

In case of availability of a full European-wide coverage of VHR data for 2018, its integration in the

processing chain should be considered.

The synergistic use of Sentinel-1 (SAR) imagery shall be considered for enhancing the thematic

information, especially on thematic issues like soil properties (wetness), tillage and harvesting

activities. Recent studies on the use of merged optical-SAR imagery as well as SAR visual products

and the experiences in HRL production have confirmed their applicability for both semi-automated

and visual interpretation.


4.3.1 Pre-processing of Sentinel-2 data

As Sentinel-2 is an optical system the average cloud coverage influences the number of observations

significantly. Nowadays scenes are no longer selected according to their average cloud coverage;

instead, all cloud-free (and shadow-free) pixels of an image can be analysed. The constellation of both

Sentinel-2 satellites improves the revisit time to 3-4 days on average in Europe, with more

frequent coverage in the north owing to the overlaps of the S-2 path-footprints.
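The pixel-based view can be illustrated by counting, for every pixel, how many acquisitions are clear. This is a minimal sketch assuming one boolean mask per acquisition date (True = cloud-free and shadow-free); the function names, the tiny 2 x 2 example and the minimum of two observations are hypothetical.

```python
def clear_observation_count(masks):
    """Count clear observations per pixel over a stack of per-date masks.

    masks: list of 2D lists of booleans with identical shape.
    """
    rows, cols = len(masks[0]), len(masks[0][0])
    counts = [[0] * cols for _ in range(rows)]
    for mask in masks:
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    counts[r][c] += 1
    return counts


def sparse_pixels(counts, minimum):
    """Pixels (row, col) with fewer clear observations than `minimum`."""
    return [(r, c) for r, row in enumerate(counts)
            for c, n in enumerate(row) if n < minimum]


# Three acquisition dates over a 2 x 2 pixel window
masks = [
    [[True, False], [True, True]],
    [[True, False], [False, True]],
    [[True, True], [False, True]],
]
counts = clear_observation_count(masks)
gaps = sparse_pixels(counts, minimum=2)
```

Pixels returned by `sparse_pixels` are candidates for gap filling with archive data or additional sensors.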

Each scene has to be pre-processed to correct for atmospheric conditions. ESA has evaluated two

different atmospheric correction algorithms for producing an L2A product:

• Sen2Cor and

• MAJA

The Sen2Cor algorithm identifies clouds and shadows scene by scene, using each scene on its own,

whereas MAJA (a combination of MACCS (CNES/CESBIO) and ATCOR) identifies clouds and shadows

from the complete image stack over time.

ESA will decide by the end of 2017 which algorithm and which data processing will be implemented for

atmospheric correction. Current plans foresee starting atmospheric correction using MAJA in Europe

from 1/2018 onwards and to reassess the algorithms in 2019.

The cloud detection is an important element in the processing chain, but for northern countries the

typical cloud coverage may still be a very limiting factor. Therefore, either archive data from longer

periods or data from additional sensors (optical and radar) should be considered to fill gaps due to

cloud coverage.

4.4 Copernicus and European ancillary data sets

There are a number of potential input sources beyond EO image data that can be used in the

delineation of landscape objects. Table 4-1 shows the potential datasets that were analysed to form

the majority of the “hard bones” or persistent boundaries in the landscape. Only those data can be

considered that provide an adequate spatial resolution. Those data that are finally suggested to

contribute to CLC-Backbone are marked in green, whereas all other data will be used to attribute the

thematic content of the GRID-database in CLC-Core.

Geospatial data (especially on roads, rivers, buildings, LPIS and land cover/land use) are in many

cases held at national level in a higher level of detail. Some of the European data might be

substituted by national data given that the criteria concerning technical and licensing issues as

described in chapter 0 are met.

Besides free and open data on European level also commercial data could be considered as input

e.g. street data (e.g. Navmart – HERE maps, etc.) or the TanDEM-X DEM for small woody features

and structure information for tree covered areas.


Table 4-1: Overview of existing Copernicus land monitoring and other free and open products which were analysed as potential input to support the

construction of the geometric structure (Level 1 “hard bones”) of the CLC-Backbone. The products finally proposed for further processing are marked in

green.

| Product | MMU | Minimum width | Format | Potential use for constructing basic geometry | Reference year |
|---|---|---|---|---|---|
| Pan-European | | | | | |
| CORINE Land Cover and accounting layers | 25 ha status layer; 5 ha change layer | 100 m | Vector | Outlines of basic vegetation types in remote areas without roads and settlement network | 2012, 6-year update cycle |
| HRL imperviousness | 20 m pixel (0,04 ha) | - | Raster | 20 m HR satellite imagery | 2015, 3-year update cycle |
| HRL tree cover density | 20 m pixel (0,04 ha) | - | Raster | | 2015, 3-year update cycle |
| HRL forest types | 0,5 ha | - | Raster | Minimum crown cover 10% | 2015, 3-year update cycle |
| HRL Permanent water bodies | 20 m | - | Raster | 20 m HR satellite imagery | 2015, 3-year update cycle |
| European Settlement Map (ESM) | 10*10 m pixel (0,01 ha) | - | Raster | JRC one-off scientific product: 2.5 m VHR imagery; scientific product reference year 2012 | 2014, no update foreseen |
| GUF+ | 10*10 m | | Raster | S1 10 m and Landsat 30 m, GUF+ 2018 will be based on S1/S2 | 2014-15 |
| HRL Small woody feature | 0,002 ha (raster product) | Linear elements | Vector and Raster (5 m and 100 m) | Streamlining halted due to VHR 2015 concerns | 2015 |
| HRL phenology | 10 m | - | Raster | Only attribution in CLC-Core | 2018, likely 3-years |
| Local Components | | | | | |
| Urban Atlas | 0,25 ha – 1 ha | 10 m | Vector | Delineation of roads as polygon feature and outer border of settlement structure (large cities) | |
| Riparian Zone | 0,5 ha | 10 m | Vector | Delineation of rivers and roads as polygon feature. | |
| N2K product | 0,5 ha | 10 m | Vector | | |
| RPZ Green linear elements | 0,05 ha – 0,5 ha | < 10 m width, < 100 m length | Vector | Delineation of small woody vegetation elements | |
| National products | | | | | |
| Variety of products for land cover, land use, population, environmental variables | | | | Aggregated data according to EAGLE data model | Irregular |
| Accompanying/Ancillary Layers | | | | | |
| IACS/LPIS | Varying from block to parcel level, depending on national system | - | Vector | Freely available and accessible for less than 1/3 of Europe. Increasing availability from year to year due to INSPIRE regulation. Varying thematic content (from reference parcel only to detailed crop types) | 2017, yearly updates |
| Open Street Map – roads | n.d. | | Line | Linear transportation network (centre lines) | Up-to-date; full time history |
| Open Street Map – buildings | n.d. | | Vector - Polygon | Delineation of single buildings (polygons) | Up-to-date; full time history |
| Open Street Map – land use | TBD. | | Vector - Polygon | Polygon selection of settlement areas and transportation infrastructure according to OSM land use tags http://osmlanduse.org | Up-to-date; full time history |
| HERE maps (Navmart) | n.d. | | Line | Commercial layer to replace OSM or national road database | Regular updates |
| EU Hydro (Copernicus reference layer) | | | Line | Geometric quality derived from VHR images, very high location accuracy | Tbd. |
| WISE WFD – surface water bodies | Waterbodies as polygons; waterbodyline as line geometry | | Polygon + line | Based on WFD2016 reporting (UK and Slovenia only for viewing) | 6 years (next update 2022) |
| European coastline | | | Line | Has been produced by Copernicus for HRL production. | |
| Crowd-sourced data (citizen observatories) | n.a. | | Point | Observations from citizens as promoted through Horizon 2020 research initiatives (e.g. LandSense, groundtruth 2.0) | Irregular |


Selected classes of the Local Component products, Urban Atlas, Riparian Zones, and Natura 2000,

obviously have valuable information to offer for the production of CLC-Backbone. The HRL layers will

mainly be used for populating CLC-Core, with only one exception where they are also used in CLC-Backbone.

The usage of additional geospatial information for constructing Level 1 “hard bones” is based on the

following arguments:

• The European landscape is highly transformed by human activity, with transport and

river networks (among others) forming the basic subdivision of the landscape

• The locations of transport and river networks are fairly well known thanks to geospatial data on

national and/or European scale

• Many of the borders defined by transport and river networks can also be identified as

features directly from Sentinel-2 images, but this identification comes with partly

very high costs and lower accuracies

The geospatial datasets are used in the delineation of level 1 “hard bones” in two different forms:

• As linear networks that define the border of landscape objects (e.g. parcels that are

surrounded by roads and rivers)

• As polygons that define a-priori landscape objects (e.g. wide roads, wide rivers)
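The choice between the two forms can be sketched as a simple width test. The example assumes that the 10 m minimum mapping width from section 4.1 is the deciding threshold, in line with the distinction this document makes between roads wider and narrower than 10 m; how a feature of exactly 10 m is treated is not specified and is an assumption here, as are the function name and the feature widths.

```python
MMW_M = 10.0  # minimum mapping width (section 4.1, still subject to consultation)


def hard_bone_form(width_m, mmw_m=MMW_M):
    """Decide how a linear feature enters the Level 1 delineation.

    Features at least as wide as the MMW become polygon objects of their
    own; narrower ones only act as border lines between adjacent objects.
    """
    return "polygon" if width_m >= mmw_m else "border line"


# Hypothetical features with illustrative widths in metres
features = [("motorway", 28.0), ("local road", 6.0),
            ("wide river", 45.0), ("stream", 3.0)]
forms = {name: hard_bone_form(width) for name, width in features}
```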

Concretely, we suggest the following use of existing Copernicus and ancillary data for the

production of CLC-Backbone (i.e. mainly for the provision of geometric information) and CLC-Core

(i.e. thematic related information).

For more detailed information on the suggested production methodology, please refer to chapter

4.8.

• CLC-Backbone:
    o Urban Atlas:
        ▪ Use of outer delineation of linear transport infrastructure (class 122xx roads and class 122) for Level 1 “hard bone” delineation
        ▪ Use of outer delineation of water areas (class 50000 water) for Level 1 “hard bone” delineation
        ▪ Use of outer delineation of cities as “persistent” segments (Level 1) for the delineation of settlement areas (class 111xx continuous urban fabric, class 112xx discontinuous urban fabric, 11300 isolated structures and class 12100 industrial and commercial units).
    o Riparian Zones:
        ▪ Use of river delineation (class 911 interconnected water courses and 912 highly modified water courses and 913 separated water bodies) for Level 1 “hard bone” delineation.
        ▪ Use of outer delineation of cities as “persistent” segments (Level 1) for the delineation of settlement areas (class 1111 continuous urban fabric, class 1112 dense urban fabric, 1113 low density fabric, 1120 industrial and commercial units).
    o Natura 2000
        ▪ Using, in analogy to RZ and UA, the same classes for delineation of rivers, transport infrastructure and urban outline.
    o HRL forest types:
        ▪ Creation of tree mask based on the HRL forest types, as this layer already applies a useful tree cover density threshold (>30% TCD) and a MMU of 0,5 ha for the Level 1 “hard bone” delineation of forest boundaries.
    o OSM – roads
        ▪ Roads > 10 m: delineation of roads as polygons according to S-2 pixel based information
        ▪ Roads < 10 m: transportation network outside the LoCo as a-priori border between adjacent polygons (Level 1 object)
    o EU-Hydro
        ▪ River network outside the LoCo as a-priori border between adjacent polygons (Level 1 object)
    o HRL imperviousness
        ▪ Derivation of outer boundary of settlements (Level 1 objects)
        ▪ Refined with ESM – European settlement map
    o OSM – buildings
        ▪ Enhancement of settlement mask using buffer methods to delineate and integrate OSM single buildings
    o OSM – land use
        ▪ Enhancement of road and railway infrastructure using polygon area from land use dataset with specific OSM tags (e.g. “roads”); full OSM tag-list available in Annex 1
        ▪ Enhancement of settlement mask in areas where OSM-building is not complete, using polygon area from land use dataset with specific OSM tags (e.g. “residential”); full OSM tag-list available in Annex 1

• CLC-Core:
    o National data (MS contribution)
        ▪ National datasets (generalized / resampled if not freely available in original resolution)
        ▪ Manual input by MS without readily available sufficient national datasets
        ▪ Land Use data
        ▪ Crop types
        ▪ Land Use intensity
    o Specific data input to be defined for CLC-Core to maintain CLC-heritage and to improve thematic content (e.g. detailed LU, LC or habitat information).
    o UA, RZ, N2K
        ▪ Use of thematic information (aggregated to one single harmonized nomenclature) for populating the grid database.
    o all HRLs
        ▪ Use of thematic information (assigned to EAGLE land cover components) for populating the grid database.
    o OSM – land use
        ▪ Attribution of land use according to existing cross-walk between OSM land use tags and CLC nomenclature Level 2
    o Crowd-sourced data
        ▪ Data from citizen observatories become increasingly available either as validation/verification dataset (e.g. LacoWiki) or as attributive data source (e.g. FotoQuest GO app, specific campaigns and social media like FlickR, foursquare, etc.)
    o CLC accounting layers
        ▪ Data from CLC used for identification of thematic features not covered by any other (finer resolution) datasets, e.g. mineral extraction areas, airports, glaciers.
    o CLC
        ▪ Outlines of basic LC types in remote areas without road and settlement network and without forest
        ▪ Mineral extraction sites
    o EO parameters (from Sentinel-2 image stacks)
        ▪ This includes daily vegetation trajectories as well as key indicators for the characterization of the growing season (start day, end day) and biological productivity (maximum, amplitude, integral)
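The EO parameters named in the last item of the list above can be illustrated with a small calculation on a vegetation index trajectory. The sketch uses a simple threshold approach (season defined as the period in which the index is at or above a fixed fraction of the seasonal amplitude) and a trapezoidal integral. Both choices, the 50 % threshold, the function name and the sample trajectory are assumptions for illustration and are not prescribed by this specification.

```python
def season_metrics(days, values, threshold=0.5):
    """Derive growing season indicators from a vegetation index trajectory.

    days:   day-of-year of each observation, in ascending order
    values: vegetation index value (e.g. NDVI) for each day
    threshold: fraction of the amplitude that marks start and end of season
    """
    vmin, vmax = min(values), max(values)
    amplitude = vmax - vmin
    level = vmin + threshold * amplitude
    in_season = [d for d, v in zip(days, values) if v >= level]

    # Trapezoidal integral of the whole trajectory as a productivity proxy
    points = list(zip(days, values))
    integral = sum((d1 - d0) * (v0 + v1) / 2.0
                   for (d0, v0), (d1, v1) in zip(points, points[1:]))

    return {
        "start_day": in_season[0],
        "end_day": in_season[-1],
        "maximum": vmax,
        "amplitude": amplitude,
        "integral": integral,
    }


# Coarse hypothetical trajectory (one value every 60 days instead of daily)
days = [0, 60, 120, 180, 240, 300, 360]
ndvi_values = [0.2, 0.3, 0.6, 0.8, 0.6, 0.3, 0.2]
metrics = season_metrics(days, ndvi_values)
```

With this sample the season runs from day 120 to day 240, with a maximum of 0.8 and an amplitude of 0.6.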

Any geometric information ingested in CLC-Backbone from existing products (partially 3-6 years

older than current RS data) will be “cross-checked” and corrected during the image segmentation

process. Every geometry that is outdated due to the time lag of the ancillary data - and where recent

land cover changes occurred - is updated in the phase of the Level 2 “soft bone” segmentation,

where actual EO data are used.

4.4.1 Completeness of OSM road data

One major criticism of crowd-sourced data is their incompleteness and varying quality. For the OSM road dataset, recent scientific reviews (e.g. Barrington-Leigh and Millard-Ball, 2017 [5]) have shown a completeness in Europe of well above 96% for the large majority of the EEA-39 countries (the figures for the completeness per country are available in the Annex). According to their study, only Serbia, Bosnia-Hercegovina and Turkey have completeness values below 90%. These three countries may be subject to a refinement of roads using additional national data sources.

[5] http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0180698

Figure 4-3 gives an impression of the completeness of OSM in four countries (Estonia, Portugal, Romania and Serbia), based on aerial images and Sentinel-2 background data.

Figure 4-3: Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Nova), Romania

(near Parta) and Serbia (near Indija) (top to bottom). © Bing/Google and EOX Sentinel-2 cloud-free services as

background (left to right).

4.5 National data

National data could replace the foreseen European geospatial data for the creation of the CLC-Backbone if they fulfil the following prerequisites:

• Provision of a consistent set of information across European countries, i.e. similar scale and

amount of geometric detail, comparable thematic information,

• Provision of data at no cost (or at least at reasonable cost, if the cost reduction in the overall process justifies it)

• Provision of data in a technically useful form (i.e. roads as a connected network, not as DXF-like line work for drawing).

• Provision of data to industrial service providers, potentially outside the country of the data

provider.

• Support of the free and open data policy of the end product, including the nationally

provided input data.

• Deficiencies in the European dataset concerning completeness and fitness for purpose in the respective country

In case these prerequisites are fulfilled, the national data can be included based on an evaluation of

economic (e.g. work load for creation of harmonized datasets) and technical criteria.

Candidates for national data (to be used for Level 1 “hard bone” delineation) are first of all the datasets that fall under INSPIRE Annex I, which should be available in harmonized form from November 2017 onwards:

• Transportation network (including information on tunnels)

• Buildings

• Water courses (running and standing water)

• Coastline

4.5.1 Example: Rivers and lakes

One example of a joint European (EU 28) dataset is the reference spatial dataset of the Water Framework Directive (WFD), which is part of the Water Information System for Europe (WISE)⁶. It compiles national information reported by the EU Member States into one homogeneous dataset. Following the reporting cycle of the WFD, which requires an update every 6 years, the dataset is available in 2010 and 2016 versions. Such datasets may be a valuable data source for the generation of “hard bones”, but they are still not an optimal solution. The coverage of the WISE WFD dataset is in principle restricted to the EU 28, and it is not even complete within the EU 28, as countries like the UK, Ireland, Slovenia and Lithuania are not included yet (UK and Slovenia are only available to the EEA and EC for visualisation purposes). Compared to the EU-Hydro dataset it provides a higher geometric quality but is less complete, as smaller rivers with catchments of less than 10 km² are by definition not included.

6 https://www.eea.europa.eu/data-and-maps/data/wise-wfd-spatial

4.5.2 Example: roads

The ELF project (European Location Framework⁷) has compiled overviews of existing national services, mainly for INSPIRE Annex I themes (roads, buildings, hydrology). It aims to serve as a single point of access for harmonised reference data from National Mapping, Cadastre and Land Registry Authorities. The INSPIRE regulation will certainly lead to increased availability and accessibility of European reference data, but at present the data situation even for core national data is still quite patchy and requires large efforts for searching, finding, compiling and harmonizing (semantically and geometrically) the various datasets. The advantage of using national data instead of data available Europe-wide has to be carefully evaluated with regard to costs and benefits. First of all, it is still a challenge to find the appropriate dataset, and even specialised projects like ELF that would like to serve as a single entry point do not manage to provide the full picture. In the case of roads, according to the ELF project only Iceland, Norway, Finland, Denmark and the Czech Republic provide adequate services (Figure 4-4). It is very likely that other national services exist but cannot be found easily. This has to be considered when replacing existing Europe-wide data sources with national data.

Figure 4-4: Search result for the availability for national INSPIRE transport network (road) services

using the ELF interface (Nov. 2017).

4.5.3 Example: land parcel identification system (LPIS)

The land parcel identification system (LPIS) in Europe is not suited for integration into the Level 1 “hard bone” process, as the data are currently

• not available at national level (approx. 50%),

• partly available only at regional level (Spain, Germany),

• not at the same spatial scale across countries,

• not available with the same semantic information (e.g. grassland/arable land differentiation),

• not accessible for all users / uses.

7 http://locationframework.eu/

Figure 4-5 illustrates the availability of LPIS datasets across Europe. Availability in this sense means

the possibility to physically download the GIS data in the respective country / region.


LandSense.

4.6 Specification for geometric delineation

The vector geometry of the CLC-backbone is derived using a two-step approach (Table 4-2).

• The first level (“hardbones”) is derived using the partition of landscape based on existing

geospatial information (transport network, river network, urban outline)

• The second level (“softbones”) is derived by applying an automatic image segmentation

within the first level objects

The resulting polygons have to comply with the MMU and MWU and will be further attributed

according to the thematic specification (chapter 4.7).

Persistency in landscape is always a relative term, as landscape objects are subject to various temporal dynamics. Within the traditional CORINE Land Cover mapping approach, changes were perceived as changes of land cover (a change between two categories of the defined nomenclature) within a 6-year period. Such changes can be referred to as “status changes”, meaning a change between two points in time a specific number of years apart, but not changes within a selected year. In parallel, however, changes occur at a much higher temporal frequency, e.g. in agricultural management. Those changes are referred to as “cyclic changes” and represent frequent land cover changes, mostly within one year or one vegetation cycle (e.g. on arable fields from bare soil to vegetated areas and vice versa). Due to the recent availability of multi-temporal observations at intervals in the order of 1-2 weeks, these kinds of changes can be observed at an appropriate level of scale (e.g. farm management unit).

Therefore, within CLC-Backbone those features of the landscape are defined as persistent objects

that are very unlikely to be altered within one year. These structures (roads, settlements) are very

unlikely to be removed, but may be subject to expansion (linear networks: roads, railways and rivers;

physical boundaries like urban outline). A robust reference geometry (“hardbones”) built up of

transportation network, river network (incl. lakes) and urban outline (and potentially forest outline)

will form the first level of division of landscape based on a-priori data.

Table 4-2: Main characteristics of Level 1 objects (hard bones) and Level 2 objects (soft bones)

Name
   Level 1 objects: Hard bones
   Level 2 objects: Soft bones

Method
   Level 1 objects: Partition of landscape according to existing spatial data
   Level 2 objects: Image segmentation

Input data
   Level 1 objects: Persistent landscape features: transport infrastructure (roads, railways excluding tunnels); river network, lakes, coast line, urban outline, forest outline
   Level 2 objects: Sentinel-2 complete time-stacks of one vegetation period

Special issue
   Level 1 objects: Differentiation between linear networks as border of segments and (larger) linear networks that comprise polygons
   Level 2 objects: Pre-processing (atmospheric and topographic) of S-2 images; cloud detection and cloud free observation in Nordic countries

Advantage
   Level 1 objects: Very good cost/benefit ratio; no doubling of efforts, reuse of existing European geospatial data infrastructure
   Level 2 objects: Independent data source; clearly defined observation period

Disadvantage
   Level 1 objects: Actuality of data might be outdated; large vector processing facilities needed; effort to integrate and collect appropriate data
   Level 2 objects: Reliability and reproducibility of segmentation algorithms

Concerns
   Level 1 objects: European data vs. national data
   Level 2 objects: Technical feasibility (big data processing); cooperation with data centres necessary

4.6.1 Level 1 Objects – hardbones

The idea behind the Level 1 objects (“hardbones”) is to derive a partition of landscape that reflects relatively stable (persistent) structures and borders. The borders of this landscape partition are built from anthropogenic and natural features. The transportation network, i.e. roads and railways (without tunnels), forms the major component, followed by the partly natural and partly man-made structures of rivers and lakes. Other features like the borders of settlements and forest are not as persistent as roads and rivers, but give an additional partition of landscape that is fairly well known through existing geospatial data. As geospatial data can be outdated depending on the reference year, the hard bone partition of landscape will not be perfect. It does not have to be perfect, however, as it is only an initial step: missing elements and corrections of outdated polygons will be integrated within the second level (soft bones) of the production process, which is based on current and independent image segmentation. The a-priori partition of landscape is promoted here because of its better cost-benefit ratio.

If an image segmentation is used from the very beginning (without the “hard bone” working step),

much more effort is needed to derive already known features. In addition, some of the features (small

roads) are not recognisable at a 10 by 10 m pixel scale, thus leading to polygons that are not

acceptable for most of the foreseen applications.

When building the spatial framework, there is a long list of potential sources for persistent object geometries. Some can be derived from existing Copernicus products (such as the HRL imperviousness and Urban Atlas) or European reference layers (EU-Hydro), others from additional data sources like crowd-sourced data (OSM) or non-operational scientific products (e.g. European Settlement Mask - ESM). National data may replace the named input data if the criteria listed in chapter 4.5 are met.

From the long list of input sources, an optimal selection (see Table 4-1) and the most appropriate integration process are chosen to provide the best available spatial framework at pan-European level. A limited visual inspection and correction of the input data is necessary in this phase for all data that do not comply with the requested accuracy (10 m positional accuracy). As a priority, the wider OSM roads (European major highways and roads wider than 10 m) and the accuracy of the delineation of the local components have to be analysed.

4.6.1.1 Types of borders

The first step will therefore be to build the Level 1 objects (“hard bones”) of the geometric borders

from the ancillary data capturing the persistent features of the landscape such as roads, rivers,

railway lines etc. The information source for the hard bones will be existing vector data for roads,

railroads (OSM) and river networks (CCM / Riparian Zones), supported by ancillary datasets (OSM,

HRL-imperviousness, ESM) and Copernicus Local components for outer boundaries of settlement

areas and HRL for tree covered areas (HRL forest types).

Technically speaking the first level objects will identify both

• persistent line objects and

• persistent polygon objects

It is important to note that in this stage only geometric information is taken over from existing

geospatial datasets, but without any thematic information. The information, if a specific border line

constitutes a road, river or urban border is irrelevant for the further process in CLC-backbone, as the

thematic classification is derived in the last step completely independently from any existing

geospatial input data, solely based on the spectral and textural information from the Sentinel-2

images.

The majority of the transportation network (roads and railways) as well as smaller rivers are represented as a line network, where each line identifies the centreline of the object and represents a border of a Level 1 landscape object.

Roads and rivers wider than the defined MMW of 10 m (still to be defined through the consultation process) have to be represented as polygons.

• Rivers: For wider rivers, the delineation in the Riparian Zone 2012 is considered complete

and quite accurate within Europe, as the dataset covers all rivers above Strahler level 3 (and

will be extended to Strahler level 2 until Q4/2018).

• Roads and railway: As a normal 2-lane road does not meet the requirement of being wider

than 10m the selection of roads as polygons is restricted to the primary road infrastructure

(highways). All OSM-roads with the special tags “motorway” and “trunk” have to be

transformed from line features to polygon features using a mixed automated buffering

method combined with a visual inspection and correction step based on the SENTINEL-2

data. Roads and railways within tunnels have to be excluded from the hardbone, as they do

not establish a recognisable landscape feature on the ground.
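The selection rules for roads and railways can be sketched as a simple tag filter. The sketch assumes that each OSM way is available as a dictionary of its tags, that the “motorway” and “trunk” tags are values of the OSM `highway` key, and that tunnels carry `tunnel=yes`; the function name and return labels are hypothetical, and the buffering and visual correction steps are not shown.

```python
# Highway values that are converted from centrelines to polygons
POLYGON_HIGHWAY_TAGS = {"motorway", "trunk"}


def hardbone_role(tags):
    """Return 'excluded', 'polygon' or 'line' for one OSM way.

    tags: dict of OSM key/value pairs of the way.
    """
    if tags.get("tunnel") == "yes":
        return "excluded"  # no recognisable landscape feature on the ground
    if tags.get("highway") in POLYGON_HIGHWAY_TAGS:
        return "polygon"   # to be buffered and visually checked
    return "line"          # centreline acts as border of Level 1 objects
```

For example, `hardbone_role({"highway": "motorway"})` yields `"polygon"`, while a railway or a residential road is kept as a line.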

Besides the typical linear networks, the outer borders of settlements and forest borders are identified as persistent features in the landscape.

• Settlement borders: A best estimate of the outer borders of settlements can be derived by combining existing datasets: first of all the LoCo Urban Atlas and, outside the coverage of the LoCo Urban Atlas, the HRL imperviousness (refined with the ESM) in combination with a building layer (e.g. OSM). Morphological operations are needed to adapt the combined input layer to the MMUs defined in the CLC-Backbone product specification and to transform the mostly vector-oriented input data into a raster dataset in line with the Sentinel-2 grid system.

• Forest borders: For the partition of landscape through forest borders, all forest polygons from the LoCo product suite (UA, RZ, N2K) can be used. Outside the LoCo coverage the HRL forest types (FTY) layer will be used, as it already applies a specific forest definition with a density threshold of >30% and an MMU of 0.5 ha.

4.6.1.2 Hierarchical order of borders

As the input borders partly overlap it is important to define a relative weighting and importance of

the input layers. The ordering is derived from the quality, type and geometric representation (line or

polygon) of the input data. The following order is specified (with increasing importance):

• HRL forest type (as polygon)

• LoCo forest classes (as polygon, from UA, RZ, N2K)

• Settlement outline (as polygon, from HRL imp, ESM, OSM)

• Road + railway (as line, excluding tunnels)

• Rivers (as line)

• Road + railway (as polygon)

o from UA …higher level streets

o from UA …lower level streets

• Rivers (as polygons)

o from RZ (code 911)

• Lakes (as polygons, from EU-hydro, WFD, national data)

• Coastline (as line)

• AoI (as line)
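The ordering above can be read as a priority table: where borders from several layers overlap, the border of the layer listed last wins. A minimal sketch of this rule follows; the layer keys are hypothetical shorthand for the list entries, and the sub-ordering of the Urban Atlas street levels is omitted.

```python
# Layers in the order given in the text, with increasing importance
LAYER_ORDER = [
    "hrl_forest_type",
    "loco_forest",
    "settlement_outline",
    "road_rail_line",
    "river_line",
    "road_rail_polygon",
    "river_polygon",
    "lake_polygon",
    "coastline",
    "aoi",
]
PRIORITY = {name: rank for rank, name in enumerate(LAYER_ORDER)}


def winning_layer(candidates):
    """Return the layer whose border is kept where the candidates overlap."""
    return max(candidates, key=PRIORITY.__getitem__)
```

For instance, where a forest border from the HRL and a river centreline coincide, `winning_layer(["hrl_forest_type", "river_line"])` keeps the river line.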

4.6.2 Level 2 Objects – softbones

The aim of this second level segmentation is to derive objects that can be differentiated spatially

according to their land cover and land use aspects that have an influence on the spectral response

and behaviour throughout a year. With the increasing number of observation dates per pixel

throughout a year the likelihood to find homogeneous management units increases, but at the same

time pixel based noise is increased as environmental conditions within a field parcel may vary

significantly throughout a year (e.g. soil moisture content). Service providers will have to balance the

number of acquisitions to maximize the identification of single field parcels and on the other side to

reduce “salt and pepper” effects due to varying spectral response.

The level 2 landscape objects represent in the ideal case a consistent management of field parcels,

and/or objects with a unique vegetation cover and homogeneous vegetation dynamics throughout a

year. As an example, different settlement structures (e.g. single houses or building blocks) should

appear spectrally different (due to the different levels of sealing, shape, spatial extent and spatial

built-up pattern) and will therefore be delineated as different polygons. The actual cultivation

measures applied on the land will also influence the type of delineation as e.g. in agriculture the field

structure is visible in multi-temporal images due to the differences in temporal-spectral response

according to the applied management practices on field level (mowing, ploughing, sowing).

4.6.3 Accuracy of geometric delineation:

The geometric accuracy is defined with

• Appropriate size:

o Too large polygons: not more than 15% of all polygons

o Too small polygons: not more than 20% of all polygons

▪ Any deviation of more than 20% of polygon area is considered too small or too large

• Appropriate delineation

o Shift of border: max. 20m shift

▪ not more than 10% of the perimeter is shifted by more than 20m
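One possible reading of these criteria is sketched below. It assumes that each mapped polygon is compared with a reference polygon of known area, and that the length of border shifted by more than 20 m has already been measured; the function names and the pairing with reference polygons are assumptions, not part of the specification.

```python
def size_check(pairs, tol=0.20, max_large=0.15, max_small=0.20):
    """Check the 'appropriate size' criterion.

    pairs: list of (mapped_area, reference_area) tuples in the same unit.
    A polygon deviating by more than tol from its reference counts as
    too large or too small.
    """
    n = len(pairs)
    too_large = sum(1 for mapped, ref in pairs if mapped > ref * (1 + tol))
    too_small = sum(1 for mapped, ref in pairs if mapped < ref * (1 - tol))
    return {
        "share_too_large": too_large / n,
        "share_too_small": too_small / n,
        "passed": too_large / n <= max_large and too_small / n <= max_small,
    }


def border_check(shifted_length_m, perimeter_m, max_share=0.10):
    """True if at most max_share of the perimeter is shifted by more than 20 m."""
    return shifted_length_m / perimeter_m <= max_share
```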

4.7 Specification for thematic attribution

The polygons (hard bones and soft bones) derived from the geometric specification are attributed

according to a set of simple land cover classes. There are various options to attribute the thematic

content of the polygons

1. Option: Attribution with basic land cover classes based on a pixel-based classification (10m

or 20m) of all pixels within the object

2. Option: Attribution with basic land cover classes based on spectral properties of the object

itself

3. Option: combination of Option 1 and Option 2

Option 1: pixel based classification

Similar to the HRLs a pixel based classification of crisp land cover classes (no intensities) is conducted

based on the SENTINEL-2 images. The number and type of land cover classes is discussed below. The

object (polygon) is attributed with various statistics derived from the pixel based classification

• Dominating class (majority)

• Percentage of class i…j

• Ordered list of class percentages (3-5 most dominating classes)

• Count of regions per class i…j (one region is one spatially homogeneous area with only one

class type)
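The four statistics listed for Option 1 can be sketched for a single object as follows. The sketch assumes that the classified pixels of the object are available as a small 2D grid (with `None` outside the object) and that a region is a 4-connected group of pixels of one class; the connectivity rule and all names are assumptions.

```python
from collections import Counter, deque


def object_statistics(grid, top_n=3):
    """Summarise a pixel-based classification within one object.

    grid: 2D list of class codes; None marks pixels outside the object.
    """
    pixels = [c for row in grid for c in row if c is not None]
    counts = Counter(pixels)
    percentages = {c: 100.0 * n / len(pixels) for c, n in counts.items()}
    ranked = sorted(percentages.items(), key=lambda kv: (-kv[1], str(kv[0])))

    # Count 4-connected regions per class with a breadth-first flood fill
    regions, seen = Counter(), set()
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is None or (r, c) in seen:
                continue
            cls = grid[r][c]
            regions[cls] += 1
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and grid[ny][nx] == cls):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
    return {
        "majority": ranked[0][0],
        "percentages": percentages,
        "top_classes": ranked[:top_n],
        "regions": dict(regions),
    }
```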

Option 2: Object based classification

The objects (polygons) are classified using the same crisp land cover classes as above, but not on a

per-pixel base, but based on the spectral values of the object itself. Statistical parameters like mean

value, standard deviation and combination of bands (e.g. NDVI) are helpful to define the class.
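As an illustration of such object-level parameters, the sketch below computes mean values, a standard deviation and the mean NDVI for one object. It assumes two equally long lists of red and near-infrared reflectances for the pixels of the object and uses the standard definition NDVI = (NIR - Red) / (NIR + Red); the choice of features and the function name are hypothetical.

```python
from statistics import mean, pstdev


def object_features(red, nir):
    """Spectral statistics of one object.

    red, nir: reflectance values of the object's pixels, in the same order.
    """
    # Skip pixels where both bands are zero to avoid a division by zero
    ndvi = [(n - r) / (n + r) for r, n in zip(red, nir) if (n + r) != 0]
    return {
        "mean_red": mean(red),
        "mean_nir": mean(nir),
        "std_nir": pstdev(nir),
        "mean_ndvi": mean(ndvi),
    }
```

A classifier would then assign the land cover class from these features instead of from the individual pixels.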

Option 3: combined pixel based and object based classification

Combination of Option 1 and Option 2. In Option 3 a polygon is described using the pixel-based approach and, in addition, a classification of the polygon itself.

4.7.1 Thematic classes

Accuracy:

The accuracy for the thematic differentiation is defined with

• Option 1: 5 classes

o 90% pixel based overall accuracy

o 95% object based overall accuracy

• Option 2: 9 classes

o 85% pixel based overall accuracy

o 90% object based overall accuracy
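Overall accuracy is commonly computed from a confusion matrix as the share of correctly labelled validation samples. A minimal sketch, assuming a square matrix with reference classes in rows and mapped classes in columns (the function name is hypothetical):

```python
def overall_accuracy(confusion):
    """Share of validation samples on the diagonal of the confusion matrix.

    confusion[i][j]: number of samples of reference class i mapped as class j.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total
```

The result can then be compared with the required threshold, e.g. `overall_accuracy(matrix) >= 0.90`.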

Thematic detail:

After obtaining the hard bones and soft bones of the geometric skeleton, each single landscape

object is then labelled by simple land cover classes. The descriptive nomenclature which is applied

for the labelling contains the following EAGLE derived Land Cover Components:
Option 1 (5 classes):


1. water,

2. permanent vegetation,

3. transient vegetation,

4. bare ground / sealed surface,

5. snow / ice
Option 2 (9 classes):


1. Sealed surface (buildings and flat sealed surfaces)

2. Woody – coniferous

3. Woody – broadleaved

4. permanent herbaceous (i.e. grasslands)

5. Periodically herbaceous (i.e. arable land)

6. Permanent bare soil

7. Non-vegetated bare surfaces (i.e. rock and screes, mineral extraction sites)

8. Water surfaces

9. Snow & ice

Remark to class woody:

Member countries have expressed the wish to further differentiate the class woody into “trees” and

“shrubs”. This differentiation would be useful but can currently not be derived with sufficient

accuracy from spectral attributes alone. A robust differentiation is only possible by using a

normalized digital surface model (nDSM).

In the CLC-Backbone nomenclature code list no land use terminology has been used. This is on

purpose to avoid semantic confusions between land cover and land use. From a purist’s view point

only land cover can be derived from remote sensing data. However, due to the multi-temporal

observations a number of land use and land management characteristics can be derived as well.

Nevertheless, for land cover semantic terms like “grassland” or “arable land” have to be avoided. We

therefore suggest to characterize landscape objects (Level 2 polygons) according to the classes that

are characterized by land cover definitions only. This is especially true for agricultural areas, where a

differentiation of arable and grassland areas is principally a key element. What can be observed

using RS images is however only if within one year a specific agricultural parcel is ploughed; as

ploughing means a reduction of biomass and the occurrence of bare soil. Therefore, ploughed

parcels can be differentiated from non-ploughed parcels as the bare soil count is a good indication

for the separation of these two classes:

• permanent herbaceous

o Permanent herbaceous areas are characterized by a continuous vegetation cover

throughout a year. No bare soil occurs within a year. These areas are either

unmanaged or extensively managed natural grasslands or permanently managed

grasslands, or arable areas with a permanent vegetation cover (e.g. fodder crops) or

even set-aside land in agriculture. For managed grasslands, the biomass will vary

over the year, depending on the number of mowing (grassland cuts) or grazing

events.

• periodically herbaceous

o Periodically herbaceous areas are characterized by at least one land cover change (in

the sense of EAGLE land cover components) between bare soil and herbaceous

vegetation within one year. Depending on the management intensities these areas

can have up to several changes between these two EAGLE land cover components

within a year. Normally these areas are managed as arable areas.
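The separation of the two classes by the bare soil count can be sketched as follows. The sketch assumes that each parcel has a chronological list of per-date labels for one year, restricted to the hypothetical labels "bare" and "herbaceous"; how these labels are obtained from Sentinel-2 is outside its scope.

```python
def herbaceous_type(observations):
    """Classify a parcel from one year of per-date land cover labels.

    observations: chronological list containing 'bare' or 'herbaceous'.
    """
    bare_count = sum(1 for label in observations if label == "bare")
    changes = sum(1 for a, b in zip(observations, observations[1:]) if a != b)
    if bare_count == 0:
        return "permanent herbaceous"      # continuous vegetation cover
    if changes >= 1:
        return "periodically herbaceous"   # at least one change bare <-> herbaceous
    return "undetermined"                  # bare all year, not a herbaceous class
```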

A permanent and managed grassland as defined in IACS/LPIS may be ploughed every 3-5 years for amelioration purposes, followed by an artificial seeding phase and a renewal of the vegetation cover. Thus, a managed grassland may even show a phase of bare soil within a time frame of 5-6 years. This has to be considered when evaluating shorter time phases, as a managed grassland may by chance be subject to renewal within the observation period of 1 year. Such grasslands differ from arable areas, which show much higher frequencies of ploughing events (typically more than once a year, but at least 2 times within 3 years).

All kinds of complex classes either as mixture of existing land cover classes (e.g. CLC-mixed classes) or

complex classes in the sense of ecological highly diverse classes (e.g. wetlands) are avoided by the

classification, as it solely concentrates on the classification according to the surface structure and

properties.

A detailed technical description how to model the temporal changes is given in Annex 1.

Option to be discussed:

In addition to the class labels the following spectral attributes are attached to the landscape objects

(in the data model: parametric observations):

• Vegetation trajectories

o NDVI profiles (e.g. weekly observations) within the vegetation season

• Aggregated indicators for vegetation indices (within vegetation season)

o Median of NDVI

o 20% percentile

o 80% percentile

• Further indicators tbd.

o Band specific indicators

o Time specific indicators

o Texture indicators
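The aggregated NDVI indicators can be sketched as follows, assuming that the NDVI profile of an object within the vegetation season is given as a plain list of values. The nearest-rank percentile used here is only one of several common definitions; the specification does not state which one applies.

```python
import math
from statistics import median


def percentile(values, p):
    """Nearest-rank percentile of a list of values (p between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def ndvi_indicators(profile):
    """Median, 20% and 80% percentile of one NDVI profile."""
    return {
        "median": median(profile),
        "p20": percentile(profile, 20),
        "p80": percentile(profile, 80),
    }
```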

Important note:

As the main emphasis is on the geometric delineation of landscape objects neighbouring objects

may have the same land cover code. They should be kept as unique entities, as they can be

differentiated in specific attributes that may be added in a later stage to this polygon (e.g.

broadleaved trees vs. needle-leaved coniferous trees).

Apart from the technical specifications for the foreseen tender for CLC-Backbone additional

attribution by various data sources is an option for future enrichment of classes. First of all, member

countries can be asked to populate the resulting geometries with national data following the

prescribed classification system or even more detailed EAGLE land cover components. In addition,

crowd-sourced data may be used to characterize the land use within these polygons (either by in-

situ point observations or by wall-to-wall datasets e.g. OSM land use).

4.8 Proposal for production methodology

Within this consultation version (Nov. 2017) of the CLC-Backbone specifications an outline of a

possible production methodology is included to illuminate the processing that is envisaged to

generate a detailed geometric structure as a geometric vector skeleton which is subsequently

described according to a simple land cover classification (approx. 10 classes) on a per-pixel level.

The detailed geometric structure is derived via a hierarchical 2-step approach. In a first step

(hardbone) boundaries of relatively stable areas are derived based on existing (European wide or

comparable national) data sets. In a second step image segmentation techniques based on multi-

temporal Sentinel-2 data are applied to further subdivide the Level 1 objects into smaller units (Level

2 objects). The third step contains an attribution of the landscape objects derived from steps 1 and 2

based on pixel-based pure land cover classification (EAGLE land cover components).

The term “geometric skeleton” is used to stress that the added value of CLC-Backbone is the

breakdown of the landscape into homogenous entities. These entities represent zones (delineated

areas) that can be attributed using spectral indices and classified according to basic land cover

classes. The basic land cover classes are identical with a spatial aggregation and temporal sequence

of EAGLE land cover components.

4.8.1 Step 1: Definition and Selection of persistent boundaries

As various geospatial data are integrated, their order and hierarchy are of major importance. In chapter

4.6.1.2 the hierarchy of input data is defined that may be adapted to improve the quality of the

results.

OSM data delivers in addition to the centreline of roads and railways, a polygon layer expressing the

land use. As the land use is labelled using a free text assignment (actually any text can be attributed

to the polygons) the land use is not modelled using a controlled vocabulary. However, the project

OSM-land use (http://osmlanduse.org) has shown that it is possible to transform the land use OSM

labels into CORINE Land Cover nomenclature (see Annex 1). For roads, the land use labels “roads”

and “traffic” and for railways the label “railway” were used to define the transportation areas (land

use class). Tunnels are identified with a special tag in OSM and have to be EXCLUDED from the

analysis.

Woody vegetation and settlements are as well rather persistent landscape features. Their outer

boundary can be initiated from HRL forest types and the Local component products (boundary taken

from aggregation of polygons on Level 1 of nomenclature). The geometric outline of settlements can

be initiated using a density threshold (e.g. >5%) of the HRL imperviousness (refined with European

Settlement Mask ESM) combined with a buffer and shrink gap filling algorithm (e.g. 50m) to reduce

the typical “salt and pepper” effect especially in settlements due to smaller green garden areas

within built-up areas. The commission errors in the HRL imp and ESM are relatively low, but

omission errors (settlements that are not detected) occur occasionally. The settlement outline

border will be amended using the OSM building vector layer and the OSM land use layer (residential

and industrial). As the built-up area is always larger than the area occupied by buildings, a buffer of

10m can be applied to derive from single OSM buildings a first built-up outline. The same buffer and

shrink operation (using 40m buffer) can be applied to this built-up outline. All urban data sets

together will form a realistic outer boundary of settlements. The HRL imperviousness has to be

refined in this step, as it also includes sealed areas OUTSIDE of settlements (road infrastructure). The

combination of the two different settlement layers (HRL and ESM, OSM) should reveal the most

realistic⁸ outer boundary of settlements. The inner differentiations of settlements are derived (1)

through the linear networks (transportation and rivers) and (2) through the subsequent image

segmentation.

8 But not perfect outer boundary – post-processing steps will be needed to refine these boundaries.
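The density threshold and the buffer and shrink operation described for the settlement outline correspond to a threshold followed by a morphological closing. The sketch below works on a small raster held as a list of lists and uses a square window as a stand-in for a circular buffer; with 10 m pixels a 50 m buffer would correspond to a radius of 5 pixels. The function names and the treatment of the raster edge are assumptions.

```python
def settlement_mask(imperviousness, threshold=5):
    """Binary mask of pixels whose imperviousness (in %) exceeds the threshold."""
    return [[int(v > threshold) for v in row] for row in imperviousness]


def _window(grid, r, c, radius):
    """Values in the square window around (r, c), clipped at the raster edge."""
    rows, cols = len(grid), len(grid[0])
    return [grid[y][x]
            for y in range(max(0, r - radius), min(rows, r + radius + 1))
            for x in range(max(0, c - radius), min(cols, c + radius + 1))]


def dilate(grid, radius):
    """Buffer: a pixel is set if any pixel in its window is set."""
    return [[int(any(_window(grid, r, c, radius))) for c in range(len(grid[0]))]
            for r in range(len(grid))]


def erode(grid, radius):
    """Shrink: a pixel stays set only if its whole window is set."""
    return [[int(all(_window(grid, r, c, radius))) for c in range(len(grid[0]))]
            for r in range(len(grid))]


def buffer_and_shrink(grid, radius):
    """Morphological closing: fills gaps smaller than the window."""
    return erode(dilate(grid, radius), radius)
```

Applied to a built-up block with a one-pixel garden in its centre, the closing fills the garden while the outer boundary of the block stays where it was.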

The urban outline can be automatically checked for omissions (entirely missing villages or larger

parts of villages) using the “Global Urban Footprint +” (GUF+) raster data set on sealed areas from

DLR. Data access conditions to GUF+ are considered as free and open.

For the delineation of the forest outline outside the local components the HRL forest types layer will

be used as it applies already a specific forest definition with a density threshold of >30% and a MMU

of 0.5 ha.

The outcome of this first step shall be a coarse but robust delineation of landscape objects, including

the a-priori information of the kind of land cover according to the integrated datasets, which can be

used as further indication for the subsequent classification of smaller landscape objects. Any

potential error (systematic or random) that is introduced by the method (“burning in” from ancillary

data) can be corrected in the subsequent Step 2, given that the error is larger than the MMU and is

detectable from S-2 images.

The final delineated objects have to be cleaned according to the mandatory MMU of 0.5 ha. All Level

1 landscape objects smaller than the MMU have to be merged with the neighbouring polygons,

using the method “the longest common border”. In Annex 2 an illustration of the working procedure

is provided.
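The merging rule can be sketched as follows. An MMU of 0.5 ha equals 5,000 m², i.e. 50 Sentinel-2 pixels of 10 m by 10 m. The sketch assumes that polygon areas and the lengths of shared borders between neighbouring polygons are already known; the data structures and the function name are hypothetical.

```python
MMU_HA = 0.5  # mandatory minimum mapping unit in hectares


def merge_target(polygon_id, areas_ha, shared_border_m):
    """Return the neighbour a too-small polygon is merged with, else None.

    areas_ha: dict polygon id -> area in hectares.
    shared_border_m: dict (id_a, id_b) -> length of the common border in metres.
    """
    if areas_ha[polygon_id] >= MMU_HA:
        return None  # polygon already complies with the MMU
    best, best_length = None, 0.0
    for (a, b), length in shared_border_m.items():
        if polygon_id not in (a, b):
            continue
        other = b if a == polygon_id else a
        if length > best_length:  # keep the longest common border
            best, best_length = other, length
    return best
```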

A look-and-feel illustration of the final geometric delineation in step 1 (colour coded according to the

source data) is provided in Figure 4-6.

Figure 4-6: illustration of results of step 1 – delineation of Level 1 landscape objects. The border

defines the area for which Urban Atlas data is available (shaded) versus the area where no UA data

was available (not shaded).

A cleaning of endpoints is the last step in the process. All lines with open endpoints have to be

removed from the final geometric skeleton.
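The endpoint cleaning can be sketched on a simple node and edge representation of the line work. The sketch assumes that each line is given by the identifiers of its two end nodes and repeats the removal until no open endpoint remains, since removing one dangling line can expose another; this iteration is an assumption, as is the omission of any special handling for lines ending at the AoI border.

```python
from collections import Counter


def remove_dangles(edges):
    """Remove all lines that have an open endpoint.

    edges: list of (node_a, node_b) tuples, one per line.
    """
    edges = list(edges)
    while True:
        # Number of line ends meeting at each node
        degree = Counter(node for edge in edges for node in edge)
        kept = [e for e in edges if degree[e[0]] > 1 and degree[e[1]] > 1]
        if len(kept) == len(edges):
            return kept
        edges = kept
```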

4.8.2 Step 2: Image segmentation

A second landscape object level (Level 2) represents the “soft bones” of landscape and is based on

the automatic segmentation of available EO data. These second level objects subdivide the first level

landscape objects into spectrally and/or texturally homogeneous subunits according to the

predefined minimum mapping unit of 0.5 ha. In the long term, assistance and available data from

MS can be foreseen for this second level; in the short term, the image segmentation derived

directly from Sentinel-2 HR data can provide a first, intermediate solution.

The segmentation based on spectral information from multi-temporal EO-imagery to cover spatial

differences caused by seasonal variation will be used to subdivide the area into spectrally and/or

texturally homogeneous and stable spatial units, within the constraint set by a pre-defined MMU

and the hard bones framework (Level 1 landscape objects). The resulting segments still represent

pixel-based borders and will be vectorised and integrated with the hard bones to produce a

“polygon map” of “spectrally homogeneous areas”. There is no classification involved at this stage –

only segmentation. The Level 2 landscape objects will be used in step 3 as input for the labelling and

description of these polygons.


5 CLC-CORE – THE GRID APPROACH

The CLC-Core product should be seen as an underpinning semantically and geometrically

harmonized “data container” and “data modeller” that holds, or allows referencing of, land

monitoring information from different sources in a grid-based information system. The grid

database will essentially store geospatial and thematic data sources, such as CLC-Backbone, CLC,

Local Component data, HRLs, in-situ as well as Member State provided information. The contents of

the grid database can be exploited directly within grid-based analyses or alternatively subsets of the

grid database can be extracted and analysed by spatial queries driven by objects from an external

vector dataset. In this way, the CLC-Core can support the other elements of the 2nd generation CLC,

for instance, a CLC-Backbone object could gain additional thematic information by aggregating or

analysing the equivalent portion of the CLC-Core as a starting point for CLC+ production (see next

chapter).

5.1 Background

Textbooks usually differentiate between “raster GIS” and “vector GIS” (Couclelis 1992⁹, Congalton

1997¹⁰). The “vector GIS” does, like the paper map, attempt to represent individual objects that can

be identified, measured, classified and digitized. The “raster GIS” attempts to represent continuous

spatial fields approximated by a partition of the land into small spatial units with uniform size and

shape. The “raster GIS” therefore has structural properties that make it particularly suitable for

use in modelling exercises.

A grid is a spatial model falling somewhere between the vector model and the raster model. The grid

looks like a raster. The difference is the information attached to the grid cell. A raster cell (pixel) is

classified to a particular land cover class or single characteristic. The grid cell, on the other hand, is

characterized by how much it contains of each land cover class or other information. The grid is, as a

result, a more information-rich data model than the raster.

The difference between the raster and the grid is illustrated in Figure 5-1, using an example based on

CLC. The original CLC vector data set can be converted into a new data set with uniform spatial units

of, in this case, 1 x 1 km. The result can be either a “raster”, where a single CLC class is assigned to

each pixel (representing for example the dominant land cover class inside the pixel; or the CLC class

found at the centre of the pixel unit). Alternatively, the result is a “grid” where an attribute vector

representing the complete composition of land cover classes inside the unit is assigned to each grid

cell. Geometric information on a more detailed scale than the size of the spatial unit is in both cases

lost, but the overall loss of information is less in the grid than in the raster.
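The two encodings can be contrasted in a few lines of code. The sketch below takes the areas of the land cover classes found inside one spatial unit and returns both the raster value (here the dominant class, one of the two options mentioned above) and the grid attribute vector. The function name `encode_cell` and the sample areas are illustrative assumptions.

```python
def encode_cell(class_areas_ha):
    """Return (raster value, grid attribute vector) for one spatial unit.

    class_areas_ha -- {class code: area in ha inside the unit}
    Raster: a single class per unit (the dominant class).
    Grid: the share of every class inside the unit.
    """
    total = sum(class_areas_ha.values())
    raster_value = max(class_areas_ha, key=class_areas_ha.get)
    grid_vector = {code: area / total for code, area in class_areas_ha.items()}
    return raster_value, grid_vector


# Hypothetical 1 x 1 km unit (100 ha) with three CLC classes.
raster_value, grid_vector = encode_cell({"311": 55.0, "211": 30.0, "512": 15.0})
```

The raster keeps only the code "311", whereas the grid keeps the shares of all three classes, which is why less information is lost.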

The loss of information when using a grid structure can be further reduced if the grid size is selected

to be less than the minimum mapping unit of the input product. For instance, with a MMU of 0.25 ha

(50 x 50 m), the grid size should be around 10 x 10 m. Consequently, any downstream products

derived from the grid-based information must also consider grid size in relation to its requirements.

9 Couclelis, H. 1992. People manipulate objects (but cultivate fields): Beyond the raster-vector debate in GIS, in Frank, A.U., Campari, I. and Formentini, U. (eds) Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639: 65 – 77, Springer Berlin Heidelberg

10 Congalton, R.G. 1997. Exploring and evaluating the consequences of vector-to-raster and raster-to-vector conversion. Photogrammetric Engineering and Remote Sensing, 63: 425-434.


Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between

encoding a particular unit as raster pixel (centre) or a grid cell (bottom). “daa” is a Norwegian unit:

10 daa = 1 ha.

A grid data model is not the result of mapping in any traditional sense. The grid is “populated” with

thematic information from (one or more) existing sources (Figure 5-2). One method frequently used

to populate grids with LULC information is to calculate spatial coverage of each LULC class based on

a geometric overlay between the grid and detailed polygon datasets. The assignment of attributes

can furthermore involve counting or statistical processing of registered data for observations made

inside each grid cell (for instance, not only the area of a class, but also the number of objects it

represents). Ancillary databases, which provide important and relevant information about the

content and context of the grid cells, can also be added, for instance the number of buildings,

length of roads or number of people found within the grid cells.
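The geometric overlay mentioned above can be illustrated without a GIS library. The sketch below clips each input polygon to each grid cell with the Sutherland-Hodgman algorithm and converts the clipped area into a share of the cell. All function names, the coordinate convention (metres, origin at the lower-left corner of the grid) and the sample polygons are assumptions for illustration; a production system would more likely rely on an established geometry library.

```python
def polygon_area(pts):
    """Shoelace formula for the area of a simple polygon."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))) / 2.0


def clip_to_cell(pts, xmin, ymin, xmax, ymax):
    """Clip a polygon to an axis-aligned cell (Sutherland-Hodgman)."""
    def clip(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def x_cut(x):
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def y_cut(y):
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    edges = [(lambda p: p[0] >= xmin, x_cut(xmin)),
             (lambda p: p[0] <= xmax, x_cut(xmax)),
             (lambda p: p[1] >= ymin, y_cut(ymin)),
             (lambda p: p[1] <= ymax, y_cut(ymax))]
    for inside, intersect in edges:
        if not pts:
            break
        pts = clip(pts, inside, intersect)
    return pts


def populate_grid(polygons, cell_size, n_cols, n_rows):
    """Return {(col, row): {class: share of the cell area}}.

    polygons -- list of (class code, [(x, y), ...]) tuples.
    """
    grid = {}
    cell_area = cell_size * cell_size
    for col in range(n_cols):
        for row in range(n_rows):
            x0, y0 = col * cell_size, row * cell_size
            shares = {}
            for lc_class, pts in polygons:
                clipped = clip_to_cell(pts, x0, y0, x0 + cell_size, y0 + cell_size)
                area = polygon_area(clipped)
                if area > 0:
                    shares[lc_class] = shares.get(lc_class, 0.0) + area / cell_area
            grid[(col, row)] = shares
    return grid
```

Counts of objects or ancillary statistics per cell could be added to the same dictionary in a similar loop.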

5.2 Populating the database

The grid as a spatial model consists of square spatial units of uniform size and shape that can have

an (internally) heterogeneous thematic composition, i.e. the grids are “geometrically uniform

polygons”. Each grid cell has a unique ID that is related to a database containing the attribute data.


In a first step, CLC-Core will be populated with existing thematic information from the CLMS (see

also chapter 4.4):

• Local Component data (Urban Atlas, N2000, Riparian Zones)

• High Resolution Layers (Imperviousness, Forest, Grassland, Water & Wetness)

• CLC-Backbone

• Any other data, such as CLC (25 ha), in-situ data, …

Upon agreement, Member States are invited to provide also their national information to populate

the information system, mainly on land use and agricultural themes.

Figure 5-2: Representation of real world data in the CLC-Core.

5.3 Data modelling in CLC-Core

CLC-Core will be the effective “engine” to model, process and integrate the individual input data sets

to create new, value added information. The modelling will make use of synergies (the same

information provided for one cell based on different sources) as well as disagreements (different

information provided for one cell by different sources) during the data processing.

We would like to refer to the modelling and combination of different layers of the database,

including the modelling parameters, as an “instance” of the database. In this understanding, any

output of a database modelling process using different input layers and different parameters will

create a specific instance.
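The notion of an instance as "input layers plus modelling parameters" can be made concrete with a small sketch. The combination rule used here (a weighted vote per grid cell) is purely an assumption for illustration and is not specified in the text; the function name `build_instance` and the layer and source names are likewise hypothetical. The example records whether the sources agree (synergy) or disagree for each cell.

```python
from collections import Counter


def build_instance(layers, weights):
    """Combine several layers into one instance of the grid database.

    layers  -- {source name: {cell id: label}}
    weights -- {source name: weight}, the modelling parameters
    Returns {cell id: {"label": ..., "sources_agree": bool}}.
    """
    cell_ids = set().union(*(layer.keys() for layer in layers.values()))
    instance = {}
    for cell in sorted(cell_ids):
        votes = Counter()
        for source, layer in layers.items():
            if cell in layer:
                votes[layer[cell]] += weights.get(source, 1.0)
        instance[cell] = {
            "label": max(votes, key=votes.get),
            "sources_agree": len(votes) == 1,
        }
    return instance


layers = {
    "HRL": {1: "forest", 2: "forest"},
    "national": {1: "forest", 2: "grassland"},
}
instance_a = build_instance(layers, {"HRL": 1.0, "national": 2.0})
instance_b = build_instance(layers, {"HRL": 1.0, "national": 0.5})
```

The same layers yield two different instances because the parameters differ: cell 2 is labelled "grassland" in `instance_a` and "forest" in `instance_b`, while cell 1 is flagged as agreed in both.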

Technically CLC-Core shall:

• Allow further characterisation (e.g. by additional input data / knowledge, see also chapter

5.2 - Populating the database).

• Allow for (fully or partial) inclusion of Copernicus High Resolution Layers and local

component products.

• Allow the best possible transformation of National datasets into CLC+ (capturing the wealth

of local knowledge existing in the countries).


Storing information in a grid structure will also help to overcome the problem of different

geometries: as the data are converted to grid cells, differences in vector geometries will have a

lesser impact.

5.4 Database implementation approach for CLC-Core

The implementation approach for the database of CLC-Core is based on paradigms originating from

the semantic web and knowledge engineering. This opens up new possibilities beyond the classical

object-relational database management system concept. In the following subsections, we describe

the pros and cons of contemporary object-relational database models, not-only SQL (NoSQL)

concepts, and in particular Triple Stores. In addition, we elaborate on the intended approach that

utilizes a spatio-temporal Triple Store to store, manage and query CLC-Core data. Furthermore, we

propose strategies to generate CLC-Core products – i.e. “flat” shape-files or GeoJSON files – based

on a spatio-temporal Triple store.

5.4.1 Database concepts revisited: from Relational to NoSQL and Triple Stores

The contemporary object-relational database model relies on the work of Codd (1970)11. Although

object-relational database management systems (ORDBMS) were developed further since the 1970s,

their relational concept remains. Due to the emergence of the World Wide Web, Web 2.0 and the

Semantic Web the requirements for databases have changed drastically, which led to the

development of NoSQL database concepts (Friedland et al., 2011; Sadalage & Fowler, 2012)12,13.

ORDBMSs are databases that follow the relational model, published by Codd (1970). Any relational

model defines how data are stored, retrieved and altered within databases – based on n-ary relations

on sets. Formally speaking, any relation is a subset of the Cartesian product of sets. In such an

environment “reasoning” is done using a two-valued predicate logic. Data, stored utilizing the

relational model, are viewed as tables, where each row represents an n-tuple of the relation itself.

Each column is labelled with the name of the corresponding set (domain). Operations are performed

using relational algebra, which allows data transactions to be expressed in a formal way. ORDBMSs are

designed to adhere to the ACID principles – atomicity, consistency, isolation and durability – which

are defined by Gray (1981)14. ACID principles ensure that database transactions are performed in a

“reliable” way. In contemporary ORDBMSs consistency is the central issue limiting performance and

the ability to scale horizontally – which is crucial for Web 2.0 or Big Data requirements. In addition,

ORDBMSs rely on a fixed schema to which all data have to adhere. Hence, alterations of the data

model are hardly possible, which leads to the fact that data models in an ORDBMS are regarded as

static. Other downsides of contemporary ORDBMSs are the data load performance, and the handling

of large data volumes with high velocity with replication. Nevertheless, ORDBMSs are still widely used

in business applications, as they guarantee consistency, which is crucial for e.g. bank accounting. In

addition, commercial and open-source ORDBMSs are mature products and offer a variety of options

and extensions.

11 Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6): 377. doi:10.1145/362384.362685

12 Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education.

13 Friedland, A., Hampe, J., Brauer, B., Brückner, M., & Edlich, S. (2011). NoSQL: Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken. Hanser Publishing.

14 Gray, J. (1981): The Transaction Concept: Virtues and Limitations. In: Proceedings of the 7th International Conference on Very Large Databases, pp. 144–154, Cannes, France.

The term “NoSQL” database emerged in 2009 (Evans, 2009)15, as a name for a meeting of database

specialists on distributed structured data storage in San Francisco on June 11, 2009. Since then the

term has grown into a widely known umbrella term for a number of different database concepts

(Friedland et al., 2011). NoSQL databases have the following characteristics in common:

- Non-relational data model

- relaxed ACID guarantees, especially regarding consistency, in favour of BASE (basically

available, soft state, eventually consistent)

- flexible schema

- simple APIs (no join operations)

- simple replication approach

- tailored towards distributed and horizontal scalability

NoSQL databases are especially designed for Web 2.0 applications and cloud platforms that require

the handling of large data volumes with replication. In addition, NoSQL databases are designed to

support horizontal scalability, which is necessary for handling big data with high turnover rates16.

Currently, NoSQL concepts are employed by e.g. the following companies: Amazon, Google, Yahoo,

Facebook, and Twitter. Types of NoSQL databases are as follows (see e.g. Scholz (2011)17, Sadalage &

Fowler (2012)):

- Column databases: have tables, rows and columns, but the column names and their format

can change – i.e. each row in a table can be different (Khoshafian et al., 1987¹⁸; Abadi,

2007¹⁹). Generally speaking, a wide column store can be regarded as a two-dimensional key-value store.

Examples are: Apache Cassandra, Apache HBase, Apache Accumulo, Hypertable, Google’s

Bigtable

- Key-value databases: store data in the form of a key and an associated value (similar to a

hash), and strictly no relations. Key value stores show exceptional performance in terms of

horizontal scaling, and handling big data volumes. Examples of key-value stores are e.g.

OrientDB, Dynamo (Amazon), Redis, or Berkeley DB.

- Document databases: rely on the “document” metaphor that implies that the

data/information is contained in documents – that share a given format or encoding. Thus,

encodings like XML or JSON are used to represent documents. Each document can be

schema free, which is a key advantage for Web 2.0 applications. Examples of document

databases are Apache CouchDB, MongoDB, Cosmos DB (Microsoft), or IBM Domino.

15 Evans, E. (2017): NOSQL 2009. Url: http://blog.sym-link.com/2009/05/12/nosql_2009.html, visited: 06-11-2017.

16 Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education.

17 Scholz, J. (2011). Coping with Dynamic, Unstructured Data Sets - NoSQL: a Buzzword or a Savior?. In: M. Schrenk, V. Popovich & P. Zeile (Eds.) Proceedings of the Real CORP 2011, May 18-20, 2011, Essen, Germany: pp. 121 - 129.

18 Khoshafian, S., Copeland, G., Jagodis, T., Boral, H., Valduriez, P. (1987). A query processing strategy for the decomposed storage model. In: ICDE, pp. 636–643.

19 Abadi, D. (2007). Column-Stores for Wide and Sparse Data. In: Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research (CIDR), Url: http://db.csail.mit.edu/projects/cstore/abadicidr07.pdf, visited: Nov. 08, 2017.


- Graph databases: utilize the notion of a mathematical graph structure for database

purposes (Robinson et al., 2015)20. A graph consists of nodes and edges that connect nodes.

In a graph database, the data/information is stored in nodes which are connected via

relationships, represented by edges. The relations can be annotated in a way that a graph

database may contain semantic information. The concept allows modelling of real-world

phenomena in terms of graphs, e.g. supply-chains, road networks, or medical history for

populations. Graph databases are capable of storing semantics and ontologies – especially in

the field of Geographic Information Science (Lampoltshammer & Wiegand, 2015)21. They

became very popular due to their suitability for social networking, and are used in systems

like Facebook Open Graph, Google Knowledge Graph, or Twitter FlockDB (Miller, 2013)22.

- Multi-model: such databases are capable of supporting multiple NoSQL models/types.

Examples are: Cosmos DB (Microsoft), Couchbase, Oracle, OrientDB.

Additionally, the term Linked Data has emerged as part of the Semantic Web initiative (Bizer et al.,

2009, Heath & Bizer, 2011)23,24. Originating from a web of documents, Bizer et al. (2009) describe an

approach to develop a web of data (i.e. the data driven web) – by publishing structured data such

that data from different sources can be interlinked with typed links. Additionally, they formulate

four characteristics of linked data:

- They are published in machine-readable form

- They are published in a way that their meaning is explicitly defined

- They are linked to other datasets

- They can be linked from other data sets

In order to technically fulfil these characteristics, Berners-Lee (2009)25 established a number of

Linked Data principles:

- Uniform Resource Identifiers (URIs) shall be used to denote things

- HTTP URIs shall be used so that things can be referred to and dereferenced

- W3C standards like the Resource Description Framework (RDF) should be used to provide

information (Subject – Predicate – Object)

- Data about anything should link out to other data, so that it contributes to the Linked Data Cloud

RDF is a family of W3C specifications that was originally designed as a metadata model. Over the years

it has evolved into a general method to describe resources. RDF is based on the notion of making

statements about resources – which follow the form: subject – predicate – object, known as a triple.

Here the subject denotes the resource, the predicate represents aspects of the resource, and

therefore is regarded as an expression of the relationship between the subject and object (see

Figure 5-3).

20 Robinson, I., Webber, J., Eifrem, E. (2015). Graph Databases: New Opportunities for Connected Data, O'Reilly Media Inc.

21 Lampoltshammer, T. J. & Wiegand, S. (2015). “Improving the Computational Performance of Ontology-Based Classification Using Graph Databases” Remote Sensing, vol. 7(7): 9473-9491.

22 Miller, J.J. (2013). Graph database applications and concepts with Neo4j, Proceedings of the Southern Association for Information Systems Conference, Atlanta, vol. 2324, pp. 141-147.

23 Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data-the story so far. Semantic services, interoperability and web applications: emerging concepts, 205-227.

24 Heath, T. and Bizer, C. 2011. Linked data: Evolving the web into a global data space. Synthesis lectures on the semantic web: theory and technology, 1(1), 1-36.

25 Berners-Lee, T. 2009. Linked Data - Design Issues. Online: http://www.w3.org/DesignIssues/LinkedData.html; visited: Nov. 09, 2017.


Figure 5-3: Example of an RDF triple (subject - predicate - object).
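To relate the triple structure to CLC-Core, the sketch below writes two statements about a single grid cell and serialises them in the N-Triples syntax. The namespace `http://example.org/clc-core/`, the cell identifier and the property names are hypothetical placeholders, not part of any agreed CLC-Core vocabulary. Plain Python strings are used so that the example does not depend on an RDF library.

```python
BASE = "http://example.org/clc-core/"  # hypothetical namespace
XSD_DECIMAL = "<http://www.w3.org/2001/XMLSchema#decimal>"


def uri(local_name):
    """Wrap a local name into a full URI reference in N-Triples syntax."""
    return f"<{BASE}{local_name}>"


# Each statement is a (subject, predicate, object) tuple.
triples = [
    (uri("cell/E4321N2765"), uri("hasDominantLandCover"),
     uri("class/BroadleavedForest")),
    (uri("cell/E4321N2765"), uri("forestShare"),
     f'"0.55"^^{XSD_DECIMAL}'),
]


def to_ntriples(statements):
    """Serialise triples as N-Triples: one statement per line, ending in ' .'"""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in statements)


print(to_ntriples(triples))
```

The first statement links the cell to another resource, the second attaches a typed literal value to it; both share the same subject, which is how a graph of statements about one grid cell is built up.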

To make use of semantics and ontologies in RDF, an RDF Schema (RDFS) or the Web Ontology

Language (OWL) can be integrated. These technologies can be utilized to allow semantic

reasoning over RDF triples. Both RDF and RDFS data need to be stored in a dedicated data store,

called a triplestore. Triplestores are databases for the storage and retrieval of (RDF) triples

through semantic queries. Triplestores are specific graph databases, document-oriented stores, or

specific relational data stores that are designed for RDF data. Many of the available triplestores

are able to store spatially and temporally annotated data. The following table gives an overview of

selected triplestores (both open-source and commercial products):

Table 5-1: Comparison of selected Triplestores with respect to spatial and temporal data.

Product name         Supports spatial data    Supports temporal data

Apache Jena          Yes                      No

Apache Marmotta      Yes                      Yes

OpenLink Virtuoso    Yes                      Yes

Oracle               Yes                      Yes

Parliament           Yes                      Yes

Sesame/RDF4J         Yes                      No

AllegroGraph         No                       Yes

Strabon              Yes                      Yes (stSPARQL)

GraphDB              Yes                      No

To query and manipulate RDF data, the W3C-standardized language SPARQL is used. SPARQL allows

queries to be written that use quantitative (typically proximity-based) relations as well as qualitative

relations (spatial “on” and temporal “after”, “from”). Spatial and temporal extensions of SPARQL

(such as stSPARQL and GeoSPARQL) supporting qualitative relations have been proposed but

expressivity and dealing with uncertainty are still major challenges (Belussi & Migliorini, 2014)26.

Nevertheless, a spatial query using GeoSPARQL can look as depicted in Figure 5-4, which shows the

GeoSPARQL query “list airports near the city of London”. The result of the query is shown

in a map metaphor and as JSON (Figure 5-5).

26 Belussi, A., & Migliorini, S. (2014). A framework for managing temporal dimensions in archaeological data. In Temporal Representation and Reasoning (TIME), 2014 21st International Symposium on (pp. 81-90). IEEE.

One major feature of triplestores and SPARQL is the possibility to create federated queries. A

federated query is a single query that is distributed to several SPARQL endpoints (capable of

accepting SPARQL queries and returning results), that individually compute results. Finally, the

originating SPARQL endpoint gathers the individual results, compiles them in one single output, and

returns the output accordingly. This makes it possible to query a large number of different data stores that

can be geographically dispersed (e.g. over Europe).
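In SPARQL 1.1, federation is expressed with the `SERVICE` keyword, which delegates a graph pattern to a remote endpoint. The sketch below only assembles such a query as a string; it does not send it anywhere. The endpoint URLs and the property URI are hypothetical, and combining the national results with `UNION` is one possible design choice rather than a prescribed one.

```python
def federated_query(endpoints, pattern, variables):
    """Build a SPARQL 1.1 query that sends the same graph pattern to
    several endpoints and unites the results.

    endpoints -- list of SPARQL endpoint URLs
    pattern   -- graph pattern evaluated at every endpoint
    variables -- list of variable names to select, e.g. ["?cell"]
    """
    blocks = [f"  {{ SERVICE <{url}> {{ {pattern} }} }}" for url in endpoints]
    body = "\n  UNION\n".join(blocks)
    return f"SELECT {' '.join(variables)} WHERE {{\n{body}\n}}"


# Hypothetical national endpoints and vocabulary.
query = federated_query(
    ["http://example.org/at/sparql", "http://example.org/fr/sparql"],
    "?cell <http://example.org/clc-core/hasDominantLandCover> ?cls",
    ["?cell", "?cls"],
)
print(query)
```

The resulting text can be submitted to any single endpoint that supports federated queries, which then contacts the listed services and merges their answers.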

Figure 5-4: GeoSPARQL query for Airports near the City of London.

Figure 5-5: Result of the GeoSPARQL of Figure 5-4 as map and in the JSON format.

PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX spatial: <http://jena.apache.org/spatial#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT ?link ?name ?lat ?lon
WHERE {
  ?link spatial:nearby(51.139725 -0.895386 50 'km') .
  ?link gn:name ?name .
  ?link gn:featureCode gn:S.AIRP .
  ?link geo:lat ?lat .
  ?link geo:long ?lon
}


5.4.2 CLC-Core Database: spatio-temporal Triple Store Approach

The approach followed for the CLC-Core database can be based on a triplestore approach. This

approach could exploit the advantages of NoSQL databases (especially triplestores) while

overcoming the drawbacks of contemporary ORDBMS – especially addressing horizontal scalability

and consistency.

Hence, we propose a distributed triplestore approach, where each member country of the European

Union could host their own CLC-core data in an RDF format. Each member country could host a

triplestore that provides a SPARQL endpoint. This approach can help to overcome data integrity and

replication issues that might arise with contemporary ORDBMS. Hence, each member country can

host their own data, and offer them to be queried via the SPARQL endpoint. The databases of the

member countries can be connected via federated queries, which requires only one SPARQL query at

an endpoint at hand (i.e. at exactly one member country’s CLC-endpoint) – see Figure 5-6.

If there are member countries that are not sure whether they should operate a triplestore (and a SPARQL

endpoint) with the CLC-Core data, they can also ask another country to host their data on its

infrastructure. Such a concept would give CLC+ the flexibility to react to changing situations in EU

member countries regarding e.g. the ability to host the CLC dataset. For example, Austria

could host the CLC+ data of another country if that country does not have the resources to develop the

infrastructure for hosting and publishing the data.

Figure 5-6: Schematic view of the distributed SPARQL endpoints communicating with each other. The

arrows indicate the flow of information from a query directed to the French CLC SPARQL endpoint.

Such an approach necessarily requires the development of an underlying shared semantic model, i.e.

an ontology. An abstract semantic model, in RDFS or OWL, can be developed that describes the

abstract model of CLC+ (e.g. classes, properties, etc.), which can be stored in a triplestore. This

allows the utilization of the abstract semantic model in queries (semantic reasoning based on the

developed knowledge basis). An example for semantic reasoning could be the determination of the

land cover type, based on a number of properties of grid cells.
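A very reduced stand-in for such reasoning is a list of ordered rules evaluated against the properties of a grid cell. The sketch below is not OWL or RDFS reasoning; it only illustrates the idea of deriving a land cover type from cell properties. The property names, the rule order and all thresholds are hypothetical, except that the tree cover rule borrows the >30% density threshold mentioned earlier for the HRL forest layer.

```python
# Ordered rules: the first matching rule determines the land cover type.
# Property names and thresholds are illustrative assumptions.
RULES = [
    ("sealed", lambda cell: cell.get("imperviousness_pct", 0) >= 50),
    ("water", lambda cell: cell.get("water_pct", 0) >= 50),
    ("forest", lambda cell: cell.get("tree_cover_pct", 0) > 30),
]


def infer_land_cover(cell):
    """Return the label of the first rule satisfied by the cell properties."""
    for label, condition in RULES:
        if condition(cell):
            return label
    return "unclassified"


cells = {
    "c1": {"imperviousness_pct": 80, "tree_cover_pct": 5},
    "c2": {"tree_cover_pct": 45},
    "c3": {"tree_cover_pct": 10, "water_pct": 20},
}
labels = {cell_id: infer_land_cover(props) for cell_id, props in cells.items()}
```

In a triplestore the equivalent knowledge would be expressed in the shared ontology, so that every national endpoint applies the same definitions.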

[Figure 5-6 diagram labels: CLC+ SPARQL endpoints for France, Germany, Austria and Italy; arrows for the SPARQL query, the federated SPARQL queries and the flow of result data.]


5.4.3 Processing and Publishing of CLC-Core Products

To generate CLC+ products based on the distributed triplestore architecture, we need to define the

products requested by the customers. Based on the defined products we need to develop a

migration/transition strategy describing the data flow from triplestores to the desired product.

We are of the opinion that the RDF approach is well suited for distributed storage and ad-hoc

queries and analyses, whereas customers would prefer a single “flat” file containing the CLC+ data.

Due to historic (and practical) reasons such a file could be an ESRI shapefile, an ESRI Geodatabase

file, a GRID file or a GeoJSON file. Such files containing the CLC+ data could be generated in an

automatic way by using a spatial Extract, Transform and Load (ETL) tool like SAFE FME27, geokettle28

(open source) or GDAL/OGR29 (open source). The migration can be performed with e.g. different

geographical or temporal coverage. These files can be hosted and published on a central server,

where they are available to the public (see Figure 5-7).

Figure 5-7: Intended generation of CLC+ products based on the distributed triplestore architecture.
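The last step of such a workflow, writing a "flat" file, can be sketched with the Python standard library alone. The example converts a list of grid cell records into a GeoJSON FeatureCollection. It assumes that the records have already been retrieved from the triplestores and that the cell corners are given in longitude and latitude (WGS 84), as GeoJSON expects; the record layout and the function name `cells_to_geojson` are illustrative assumptions, and an ETL tool would normally handle reprojection and file writing.

```python
import json


def cells_to_geojson(cells):
    """Serialise grid cell records as a GeoJSON FeatureCollection string.

    cells -- list of dicts with keys "west", "south", "east", "north"
             (degrees, WGS 84) and "properties" (thematic attributes).
    """
    features = []
    for cell in cells:
        w, s, e, n = cell["west"], cell["south"], cell["east"], cell["north"]
        # Exterior ring, counter-clockwise, closed by repeating the first point.
        ring = [[w, s], [e, s], [e, n], [w, n], [w, s]]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": cell["properties"],
        })
    return json.dumps({"type": "FeatureCollection", "features": features},
                      indent=2)


# Hypothetical cell record.
example = cells_to_geojson([
    {"west": 16.0, "south": 48.0, "east": 16.001, "north": 48.001,
     "properties": {"forest_share": 0.55, "reference_year": 2018}},
])
```

Filtering the input list by country or reference year before calling the function would correspond to the different geographical or temporal coverages mentioned above.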

5.4.4 Conclusion and critical remarks

The intended fine spatial and temporal resolution of CLC+ will result in high data

volumes with high turnover rates (velocity), which are two characteristics of Big Data³⁰. It therefore

seems advisable to take the nature of these datasets into account when planning a sustainable architecture

for CLC+. As contemporary ORDBMS are not designed for Big Data applications, which manifests in

their weaknesses regarding horizontal scaling, consistency and replication, we advise taking a

deeper look into NoSQL concepts. Especially triplestores with RDF schemas seem appropriate to

cope with the requirements of CLC+, with respect to data volume, data turnover rate, distributed

architecture and semantic queries (and reasoning). Although NoSQL data stores are widely used by

companies like Facebook, Twitter, Amazon, and Google, the technology and its theoretical

foundation are still under active development. Hence, NoSQL databases have not yet reached the

technological maturity of contemporary ORDBMSs, like Oracle or PostgreSQL. In our eyes, this is both a

weakness and an opportunity. By using NoSQL databases, CLC+ would be at the leading edge of

technological developments and could contribute to the future of NoSQL databases, in particular by

strengthening their spatio-temporal abilities.

27 https://www.safe.com/fme/fme-server/

28 http://www.spatialytics.org/projects/geokettle/

29 http://www.gdal.org

30 Hilbert, M. (2015): Big Data for Development: A Review of Promises and Challenges. Development Policy Review. martinhilbert.net, visited: Nov. 08, 2017.

[Figure 5-7 diagram labels: Triple Store #1, Triple Store #2, …, Triple Store #n; Spatial ETL Tool; Web Server (public) with SHAPE files and GeoJSON files for France, Germany, Austria, Finland, …]


6 CLC+ - THE LONG-TERM VISION

CLC+ is expected to represent the new long-term vision for land cover / land use monitoring in

Europe. CLC+ shall meet the requirements of today and tomorrow for European LC/LU information

needs, building upon the EAGLE concept and expertise, while still retaining capabilities within the 2nd

generation CLC to guarantee backwards compatibility with the original CLC datasets.

CLC+ shall be understood as one instance (i.e. realisation) of the CLC-Core (possibly within the

context of CLC-Backbone), enabling it to address various reporting obligations, such as the EU Energy

Union, Paris Agreement, MAES and LULUCF to name just some examples. Many other land

monitoring products with various nomenclatures and spatial frameworks may also be defined on the

basis of CLC-Core.

Technically CLC+ is proposed as a raster data set with 1 ha spatial resolution (100 x 100 m),

aggregated from CLC-Core. Due to the higher geometric resolution, the use of some of the often

debated heterogeneous classes, like 2.4.2 (complex cultivation pattern) and 2.4.3 (land principally

occupied by agriculture, with significant areas of natural vegetation) will be significantly reduced, if

not disappear.

As the original CLC nomenclature is more than a pure land cover classification, there will be the need

to make use also of other information than currently provided by the Copernicus Land Monitoring

Service. Table 6-1 provides an overview of the 44 CLC classes and which type of information would

still be needed to derive a specific CLC class in addition to the information provided by the Local

Components, the High Resolution Layers and the land cover information collected by CLC-Backbone.

The Hungarian test case31 for implementing the EAGLE concept in a bottom up CLC production has

clearly shown that content and quality of a bottom up derived CLC is ultimately dependent on input

data on:

• Land Use (critical);

• Crops and use intensity (e.g. from LPIS).

Obviously, the situation is different for each CLC class and also dependent on the available

Copernicus data (i.e. UA, RZ or N2K), but Table 6-1 clearly shows that almost no CLC class can be

derived without additional information on the use aspect of the area. In almost all cases this

information gap can be closed with data from Member States. In case MS data is not available, a few

classes can be approximated via European level information, but most certainly at the expense of

data accuracy and precision. The national information should be provided as input to CLC-Core. In

case of missing national land use information as well as no other European-level replacement data,

traditional CLC data will be used as a back-up solution. Areas with obvious contradictions in land cover

(from CLC-Backbone) vs. traditional OBS classes will need to undergo special post-processing during

the creation phase of CLC+.

31 Negotiated procedure No EEA/IDM/R0/16/001 based on Regulation (EC) No 401/2009 of 23.4.2009 on the

European Environment Agency and the European Environment Information and Observation Network. Final

report: Implementation of the EAGLE approach for deriving the European CLC from national databases on a

selected area in one EU country


Table 6-1: List of CLC classes and requirements for external information


The update cycle of CLC+ is proposed to be consistent with the update cycles of the major input data

sets (i.e. the 3 year cycle of the LoCo or the 3-6 year cycle of CLC-Backbone). Apart from that, as

CLC+ is “just” an instance of CLC-Core, it can be derived dynamically upon request.


7 CLC-LEGACY

The history of European land monitoring is a story of permanent evolution of approaches and

concepts, resulting in different solutions for individual needs and specifications. In this context, the

CORINE Land Cover (CLC) specifications provided the first set of European-wide accepted de-facto

standards, i.e. a nomenclature of land cover classes, a geometric specification, an approach for land

cover change mapping and a conceptual data model (Heymann et al, 1994). In this sense, CLC

represents a unique European-wide, if not global, consensus on land monitoring, supported by 39

countries.

Being the thematically most detailed wall-to-wall European LC/LU dataset and representing a time
series that started in the early 1990s, CLC data became the basis of several analyses and indicator
developments, such as Land & Ecosystem Accounts (LEAC) for Europe. It is an obvious requirement that
the planned new European land monitoring system (CLC-Core & CLC+) ensures the greatest possible
backwards compatibility with standard CLC data, i.e. CLC-Legacy.

On the other hand, comparing the different CLC production methods, i.e. traditional visual
photo-interpretation vs. national bottom-up solutions, shows that those specific methodologies result in
CLC data with slightly different characteristics in terms of LC/LU statistics or fine geometric detail,
even though the general CLC specifications are kept. Consequently, it should be specified in
what sense the outputs of the new land monitoring system may and should be consistent with
CLC-Legacy.

Although the original CLC vector data are still used, mainly for cartographic purposes, the most
commonly used versions of CLC data are the raster CLC status and change layers. Moreover, for the
purposes of LEAC analysis, CLC information has been transformed into CLC Accounting
Layers.

The specific characteristics and the variety of CLC creation and update methodologies mean
that, while CLC change data represent land cover changes between two given reference dates
well, there are significant "breaks" in the continuity of the CLC time series, both in a
statistical and in a geometric sense. The solution applied for the harmonization of the CLC time series is
to combine CLC status and change information to create a time series of CLC / CLC-change
layers of homogeneous quality for accounting purposes, fulfilling the relation:

CLC change = CLC accounting new status – CLC accounting old status

Because the Minimum Mapping Unit (MMU) differs between CLC status (25 ha) and CLC
change (5 ha) layers, the resulting CLC accounting layers may in several locations include more spatial
detail than the original CLC status layers. Although this contradicts the original CLC specifications,
CLC accounting layers are still considered the most consistent harmonized time series of CLC
data.

CLC accounting layers are always created on the basis of the latest raster CLC status layer. Fine
details are added to this layer from the formation codes of previous CLC-change inventories. All
former CLC accounting status layers are calculated backwards by overlaying CLC-change
consumption codes.
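The backward calculation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that rasters are small nested lists of CLC codes and that each cell of the change layer holds a (consumption code, formation code) pair or None; the function names and the sample grids are hypothetical and are not taken from the ETC/ULS methodology.

```python
from typing import List, Optional, Tuple

Raster = List[List[int]]
# Each change cell holds (consumption_code, formation_code), or None if unchanged.
ChangeRaster = List[List[Optional[Tuple[int, int]]]]


def latest_accounting_layer(status_new: Raster, change: ChangeRaster) -> Raster:
    """Add fine detail to the latest status layer by burning in formation codes."""
    return [
        [cell[1] if cell is not None else code for code, cell in zip(s_row, c_row)]
        for s_row, c_row in zip(status_new, change)
    ]


def previous_accounting_layer(acc_new: Raster, change: ChangeRaster) -> Raster:
    """Step back one reference date by overlaying consumption codes."""
    return [
        [cell[0] if cell is not None else code for code, cell in zip(a_row, c_row)]
        for a_row, c_row in zip(acc_new, change)
    ]


# Hypothetical 2 x 2 example: a small 211 -> 112 change that the 25 ha status
# layer did not resolve, but the 5 ha change layer did.
status_2012 = [[211, 211], [311, 311]]
change_2006_2012 = [[None, (211, 112)], [None, None]]

acc_2012 = latest_accounting_layer(status_2012, change_2006_2012)
acc_2006 = previous_accounting_layer(acc_2012, change_2006_2012)
```

In this toy case the two accounting layers differ exactly where the change layer reports a change, which is the relation the accounting layers are meant to fulfil.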


The backwards compatibility of CLC+ with CLC-Legacy is foreseen to be ensured in a similar way. This
method is already applied in practice, since some of the Member States produce CLC layers in a
bottom-up way and these data are then combined with manually interpreted CLC changes. The
applied methods of bottom-up CLC creation differ, but are all variations of the basic idea on which
CLC+ relies.

In the sense of the above, the aim of CLC-Legacy is NOT to reproduce as closely as possible the original
25 ha MMU vector CLC as it would be if created continuously by visual photo-interpretation. Instead,
CLC-Legacy will be created by considering the following aspects:

• CLC-Legacy is to be created as a 100m resolution raster layer, representing the CLC (like)

status at a certain reference date.

• CLC-Legacy will be derived from CLC-Core (and potentially CLC+) in a way, which ensures

homogenous and consistent content and quality across Europe. Note, several CLC layers are

already derived based on similar principles in countries using bottom-up methods; all the

outputs correspond to basic CLC specifications, but have slightly different characteristics in

terms of geometry and statistics.

• Instead of 25 ha MMU vector CLC, CLC accounting layers are considered as reference layers

concerning targeted content:

o Applying standardized raster generalization on CLC-core outputs to reach 5ha MMU

(instead of 25ha), corresponding to CLC-change layers.

o Less significance of complex CLC classes, like 243. Standard methodology is to be

developed to reproduce relevant information contained by some other complex CLC

classes, like 244.

o CLC status layers of past reference dates (2012, 2006, 2000, ..) are to be created by

combining CLC-legacy with previous CLC-change data.

The above concept assumes that in the future, besides the update of CLC-Core and CLC+ status,
CLC-compatible change information is to be derived as well. Obviously, there are several
methodological and technical details to be clarified and solved during upcoming practical test
exercises.

7.1 Experiences to be considered

The following experiences will be considered for establishing the grounds for creating CLC-Legacy:

• CLC accounting layers are used as input for EEA's Land and Ecosystem Accounting (LEAC)
system. ETC/ULS developed the methodology to create these layers, as well as for
the raster generalization of CLC accounting layers32.

• Methodologies for bottom-up creation of CLC data have been developed by several CLC

national teams. This includes spatial and thematic aggregation of in-situ based high

resolution data (e.g. Norway, Finland, Germany, The Netherlands, United Kingdom, Spain).

In most cases, rules were developed to allow thematic conversion in an automated or semi-

automated fashion. For spatial generalisation different techniques are available, depending

on the starting point in each of the countries.

Figure 7-1 to Figure 7-3 illustrate different techniques and results of national generalisation

approaches to produce standard CLC from more detailed data.

32 Generalization of adjusted CLC layers. ETC/ULS report 2016.


Figure 7-1: Generalisation technique applied in Norway based on expanding and subsequently

shrinking. The technique exists for polygon as well as raster data.

Original 3000m² Level III 1ha Level III 5ha Level III 25ha Level III

Figure 7-2: Generalization levels used in LISA generalization in Austria

Figure 7-3: Examples of 25 ha, 10 ha and 1 ha MMU CLC for Germany

Specific example

During the transition phase from traditional CLC to CLC-Legacy, special attention needs to be paid to the
handling of geometries and the resulting geometric differences vs. changes. Figure 7-4 shows the
results from Germany, where the process changed in 2012 from mapping to a bottom-up approach using
existing national data. While there are very few changes on the left side of the river (located
in Luxembourg), the German side clearly shows some differences due to the changed approach. The
CLC change layer is then used to differentiate technical changes from real changes. This step is
needed once to make this transition. Future updates using the bottom-up approach will be
consistent with the 2012 situation.

CLC 2012

CLC 2006

Figure 7-4: Result of the generalisation process in Germany with technical changes in different

classes


8 TECHNICAL SPECIFICATIONS

8.1 CLC-Backbone

CLC-Backbone

[Example image if available.]

Description

CLC-Backbone is a spatially detailed, large scale, EO-based land cover inventory in a vector format

providing a geometric backbone with limited, but robust thematic detail on which to build other

products.

Input data sources (EO & ancillary)

EO Overview

Spatial resolution – 10 to 20 m
Spectral content – Visible, NIR, SWIR, SAR
Temporal revisit – bi-monthly, to achieve monthly cloud-free acquisitions
Acquisition approach – likely to be multi-mission
Candidate missions – Sentinel-2 (Landsat 8)
Contributing data – Sentinel-1

Ancillary data

OSM roads and railways (or comparable national datasets)

EU – Hydro (or WISE WFD)

European coastline

OSM buildings

OSM land use

HRL imperviousness (refined with ESM)

LoCo Riparian Zones

LoCo Urban Atlas


Thematic Content

EAGLE Land Cover Components (Option 2 – 9 classes):

• Sealed

• Woody vegetation - coniferous

• Woody vegetation - broadleaved

• Permanent herbaceous (i.e. grasslands)

• Periodically herbaceous (i.e. arable land)

• Permanent bare soil

• Non-vegetated bare surfaces (i.e. rock and screes)

• Water surfaces

• Snow & ice

Methodology

The detailed geometric structure is derived via a hierarchical 2-step approach. In the first step
(hardbone), boundaries of relatively stable larger areas (Level 1 objects) are detected based on
existing data sets. In the second step, image segmentation techniques based on multi-temporal
Sentinel-2 data are applied to further subdivide the Level 1 objects into smaller units (Level 2
objects). A subsequent third step comprises a classification and spectral-temporal attribution
(Sentinel-1 and Sentinel-2) of the landscape objects derived from steps 1 and 2, using a pixel-based
classification of the land cover components. The pixel-based classification is generalized to object
level using indicators such as the dominant land cover class and the percentage mixture of land
cover classes.
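The object-level generalization named in the last sentence can be sketched as follows. This is only an illustration under stated assumptions: the pixel labels inside one Level 2 object are assumed to be available as a plain list of class names, and the function name summarise_object and the class labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarise_object(pixel_classes: List[str]) -> Tuple[str, Dict[str, float]]:
    """Return the dominant class and the percentage share of every class."""
    if not pixel_classes:
        raise ValueError("object contains no classified pixels")
    counts = Counter(pixel_classes)
    total = len(pixel_classes)
    shares = {cls: 100.0 * n / total for cls, n in counts.items()}
    # most_common(1) returns the class with the highest pixel count.
    dominant = counts.most_common(1)[0][0]
    return dominant, shares


# Hypothetical object made of ten classified pixels.
pixels = ["woody_broadleaved"] * 6 + ["permanent_herbaceous"] * 3 + ["sealed"]
dominant, shares = summarise_object(pixels)
```

How ties between equally frequent classes should be resolved is not specified in the text; the sketch simply takes whichever class Counter reports first.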

Geometric resolution (Scale)

~1:10 000

Geographic projection / Reference system

European Terrestrial Reference System 1989 (ETRS89)

Coverage

EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria,
Belgium, Bosnia-Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland,
France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council
Resolution 1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, Former Yugoslavian Republic of
Macedonia, Malta, Montenegro, the Netherlands, Norway, Poland, Portugal, Romania, Serbia,
Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom), which are participating in
the regular CLC dataflow. The number of countries involved is 39. The area covered is
approximately 6 million km².

Geometric accuracy (positioning accuracy)

Less than half a pixel based on 10 m spatial resolution input data

Spatial resolution / Minimum Mapping Unit (MMU)

0.5 to 5 ha for final product

Min. Width of linear features

10 – 50 m

Topological correctness

[TBD]


Raster coding

[n/a]

Thematic accuracy (in %) / quality method

The target will be set at 90 %. A quantitative approach will be used, based on a set of stratified
random point samples compared against external datasets (e.g. Google Earth, national orthophotos or
national grassland datasets).
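As a hedged illustration of how such a check could be computed, the sketch below estimates overall accuracy from stratified point samples as the area-weighted sum of per-stratum agreement rates and compares it with the 90 % target. The strata, area weights and sample counts are invented for the example, and the specification does not prescribe this particular estimator.

```python
from typing import Dict, Tuple

TARGET_ACCURACY = 90.0  # per cent, target stated in the specification

# stratum -> (area weight, correctly classified samples, total samples)
Strata = Dict[str, Tuple[float, int, int]]


def overall_accuracy(strata: Strata) -> float:
    """Area-weighted overall accuracy in per cent."""
    total_weight = sum(weight for weight, _, _ in strata.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("area weights must sum to 1")
    return 100.0 * sum(w * correct / n for w, correct, n in strata.values())


# Hypothetical validation result for three strata.
samples = {
    "sealed": (0.1, 18, 20),
    "woody": (0.5, 19, 20),
    "herbaceous": (0.4, 17, 20),
}
accuracy = overall_accuracy(samples)
meets_target = accuracy >= TARGET_ACCURACY
```

Weighting by stratum area matters because rare classes are usually oversampled; an unweighted share of correct points would then misrepresent the map as a whole.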

Target / reference year

2018 (full vegetation season, starting partly in autumn 2017 in southern Europe)

Up-date frequency

[TBD]

Information actuality

Plus and minus 1 year from target year (2017 – 2019). A longer time window may be required for

the ancillary datasets.

Delivery format

Vector

Data type

Vector

Naming conventions

[TBD]

Medium

[TBD]

Delivery reliability

[TBD]

Delivery time

[TBD]

Archive

[TBD]


8.2 CLC-Core

CLC-Core

[Example image if available.]

Description

CLC-Core is a consistent, multi-use grid database repository for environmental information

populated with a broad range of land cover, land use and ancillary data, forming the information

engine to deliver and support tailored thematic information requirements.

Input data sources

CLMS products

Corine Land Cover, CLC1990, CLC2000, CLC2006, CLC2012

High Resolution Layers (Imperviousness, Forest, Grassland, Wetland & Water)

Reference datasets (EU-DEM, EU-Hydro)

Related pan-European products (ESM)

Local components (Urban Atlas, Riparian Zones, Natura 2000, Coastal)

Ancillary data

Open Street Map

Member State data, particularly land use information

Thematic Content

Input data sources stored within a version of the EAGLE data model.

Methodology

TBD, see chapter 5.4

Geometric resolution (Scale)

TBD

Geographic projection / Reference system

European Terrestrial Reference System 1989 (ETRS89)

Coverage

EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria,
Belgium, Bosnia-Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland,
France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council
Resolution 1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, Former Yugoslavian Republic of
Macedonia, Malta, Montenegro, the Netherlands, Norway, Poland, Portugal, Romania, Serbia,
Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom), which are participating in
the regular CLC dataflow. The number of countries involved is 39. The area covered is
approximately 6 million km².


Geometric accuracy (positioning accuracy)

Less than half a pixel based on 10 m spatial resolution input data

Spatial resolution / Minimum Mapping Unit (MMU)

0.5 ha to 1 km2 for final product

Min. Width of linear features

Grid size

Topological correctness

Grid spatial structure

Raster coding

[n/a]

Thematic accuracy (in %) / quality method

TBD

Target / reference year

2018

Up-date frequency

[TBD]

Information actuality

Plus and minus 1 year from target year (2017 – 2019). A longer time window may be required for

the ancillary datasets.

Delivery format

Grid

Data type

Grid

Naming conventions

[TBD]

Medium

[TBD]

Delivery reliability

[TBD]

Delivery time

[TBD]

Archive

[TBD]


9 ANNEX 1: OSM DATA

9.1 Crosswalk between OSM land use tags and CLC nomenclature Level 2

Source: Schultz et al. (2017): Open land cover from OpenStreetMap and remote sensing, Int J Appl
Earth Obs Geoinformation 63 (2017) 206-213


9.2 OSM road nomenclature

The following table identifies all OSM road types that should be considered for CLC-Backbone Level 1
delineation. Road types that are subject to visual control, because they are large enough to be
represented as area features (>= 10 m width), are marked in the second column. In many countries,
most roads are tagged as “unclassified”. Only highway = motorway/motorway_link implies anything
about quality.

| CLC-Backbone | Potential 10m width | Key | Value | Comment |
|---|---|---|---|---|
| y | 10m | highway | motorway | A restricted access major divided highway, normally with 2 or more running lanes plus emergency hard shoulder. Equivalent to the Freeway, Autobahn, etc. |
| y | | highway | trunk | The most important roads in a country's system that aren't motorways. (Need not necessarily be a divided highway.) |
| y | 10m | highway | primary | The next most important roads in a country's system. (Often link larger towns.) |
| y | | highway | secondary | The next most important roads in a country's system. (Often link towns.) |
| y | | highway | tertiary | The next most important roads in a country's system. (Often link smaller towns and villages.) |
| y | | highway | unclassified | The least important through roads in a country's system, i.e. minor roads of a lower classification than tertiary, but which serve a purpose other than access to properties. Often link villages and hamlets. (The word 'unclassified' is a historical artefact of the UK road system and does not mean that the classification is unknown; you can use highway=road for that.) |
| y | | highway | residential | Roads which serve as an access to housing, without function of connecting settlements. Often lined with housing. |
| y | | highway | service | For access roads to, or within an industrial estate, camp site, business park, car park etc. Can be used in conjunction with service=* to indicate the type of usage and with access=* to indicate who can use it and in what circumstances. |
| y | | Link roads | | |
| y | 10m | highway | motorway_link | The link roads (sliproads/ramps) leading to/from a motorway from/to a motorway or lower class highway. Normally with the same motorway restrictions. |
| y | | highway | trunk_link | The link roads (sliproads/ramps) leading to/from a trunk road from/to a trunk road or lower class highway. |
| y | | highway | primary_link | The link roads (sliproads/ramps) leading to/from a primary road from/to a primary road or lower class highway. |
| y | | highway | secondary_link | The link roads (sliproads/ramps) leading to/from a secondary road from/to a secondary road or lower class highway. |
| y | | highway | tertiary_link | The link roads (sliproads/ramps) leading to/from a tertiary road from/to a tertiary road or lower class highway. |
| y | | Special road types | | |
| y | | highway | track | Roads for mostly agricultural or forestry uses. To describe the quality of a track, see tracktype=*. Note: Although tracks are often rough with unpaved surfaces, this tag is not describing the quality of a road but its use. Consequently, if you want to tag a general use road, use one of the general highway values instead of track. |
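A minimal sketch of how the table could be applied when preparing the Level 1 delineation is given below. It assumes OSM ways are already available as dictionaries with an id and a tags mapping (a hypothetical in-memory structure, not the output format of any particular OSM library); the two sets simply restate the table.

```python
from typing import Dict, List, Tuple

# highway=* values marked "y" for CLC-Backbone in the table above
LEVEL1_HIGHWAY_VALUES = {
    "motorway", "trunk", "primary", "secondary", "tertiary",
    "unclassified", "residential", "service",
    "motorway_link", "trunk_link", "primary_link",
    "secondary_link", "tertiary_link", "track",
}
# values flagged "10m" in the table, i.e. candidates for visual control
POTENTIAL_10M_VALUES = {"motorway", "primary", "motorway_link"}


def select_level1_roads(ways: List[Dict]) -> List[Tuple[int, bool]]:
    """Return (way id, needs_visual_control) for every road type in the table."""
    selected = []
    for way in ways:
        value = way.get("tags", {}).get("highway")
        if value in LEVEL1_HIGHWAY_VALUES:
            selected.append((way["id"], value in POTENTIAL_10M_VALUES))
    return selected


# Hypothetical input: two roads from the table, a footway and a building.
ways = [
    {"id": 1, "tags": {"highway": "motorway"}},
    {"id": 2, "tags": {"highway": "residential"}},
    {"id": 3, "tags": {"highway": "footway"}},
    {"id": 4, "tags": {"building": "yes"}},
]
roads = select_level1_roads(ways)
```

Values not listed in the table, such as footway, are silently skipped in this sketch.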


9.3 OSM roads – completeness

Estimation of OSM completeness according to the parametric estimation in the study by Barrington-Leigh
and Millard-Ball (2017). A sigmoid curve was fitted to the cumulative length of contributions within
OSM data and used to estimate the saturation level for each country.

| Country | ISO code | Estimated completeness (frcComplete_fit) | OSM length [km] |
|---|---|---|---|
| Serbia | SRB | 74,3% | 61.233 |
| Bosnia and Herzegovina | BIH | 86,5% | 37.877 |
| Turkey | TUR | 89,7% | 501.384 |
| Sweden | SWE | 92,7% | 280.107 |
| Norway | NOR | 93,6% | 132.527 |
| Hungary | HUN | 93,8% | 90.670 |
| Romania | ROU | 94,3% | 162.844 |
| Ireland | IRL | 96,0% | 107.263 |
| Spain | ESP | 96,2% | 509.937 |
| Liechtenstein | LIE | 96,3% | 359 |
| Croatia | HRV | 96,8% | 60.119 |
| Bulgaria | BGR | 96,9% | 80.983 |
| Cyprus | CYP | 97,0% | 12.091 |
| Albania | ALB | 97,6% | 22.875 |
| Italy | ITA | 97,9% | 594.384 |
| Belgium | BEL | 98,3% | 103.778 |
| Lithuania | LTU | 98,5% | 69.885 |
| Poland | POL | 98,5% | 358.733 |
| Iceland | ISL | 98,6% | 17.871 |
| France | FRA | 99,9% | 1.164.238 |
| Denmark | DNK | 100,3% | 96.656 |
| Estonia | EST | 100,3% | 42.878 |
| Portugal | PRT | 100,4% | 150.176 |
| United Kingdom | GBR | 100,4% | 443.265 |
| Netherlands | NLD | 100,5% | 136.118 |
| Greece | GRC | 100,8% | 156.586 |
| Slovakia | SVK | 101,4% | 41.566 |
| Austria | AUT | 101,4% | 128.934 |
| Luxembourg | LUX | 101,5% | 6.210 |
| Germany | DEU | 101,6% | 744.304 |
| Latvia | LVA | 101,7% | 61.066 |
| Montenegro | MNE | 102,2% | 12.346 |
| Macedonia (FYROM) | MKD | 102,4% | 15.174 |
| Switzerland | CHE | 103,0% | 73.066 |
| Czech Republic | CZE | 104,0% | 112.441 |
| Finland | FIN | 105,3% | 231.298 |
| Kosovo | XKO | 107,4% | 13.297 |
| Malta | MLT | 109,8% | 2.241 |
| Slovenia | SVN | NoData | 36.041 |
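The idea behind the completeness figures can be sketched as follows, assuming a logistic curve as the sigmoid (the exact functional form used in the cited study is not given here) and reading the completeness fraction as mapped length divided by the fitted saturation level. All parameter values are hypothetical.

```python
import math


def logistic_length(t: float, saturation_km: float, rate: float, midpoint: float) -> float:
    """Cumulative mapped length at time t under an assumed logistic growth curve."""
    return saturation_km / (1.0 + math.exp(-rate * (t - midpoint)))


def fraction_complete(mapped_km: float, saturation_km: float) -> float:
    """Mapped length relative to the fitted saturation level."""
    if saturation_km <= 0:
        raise ValueError("saturation level must be positive")
    return mapped_km / saturation_km


# Hypothetical country: fitted saturation of 1000 km, 1020 km currently mapped.
saturation = 1000.0
half = logistic_length(t=2012.0, saturation_km=saturation, rate=0.8, midpoint=2012.0)
ratio = fraction_complete(1020.0, saturation)
```

Under this reading, values above 100 % in the table would correspond to countries where the mapped length already exceeds the fitted saturation level.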


10 ANNEX 2: LPIS DATA IN EUROPE

The Land Parcel Identification System (LPIS) is the geographic information system component of IACS
(Integrated Administration and Control System). It contains the delineation of the reference areas,
ecological focus areas and agricultural management areas that are eligible for CAP payments. Due to
different national priorities in the implementation of the CAP, LPIS data are not homogeneous
throughout Europe. Some countries use rather aggregated physical blocks as the geometric reference,
while other countries use single parcel management units as the smallest geographic entity within LPIS.

Three types of spatial objects can in principle be differentiated within LPIS (Figure 10-1):

• Physical block

• Field parcel

• Single management unit

A physical block is defined as the area that is surrounded by permanent non-agricultural objects like

roads, rivers, settlement, forest or other features. A physical block normally contains several field

management units that may belong to various farmers.

A field parcel is owned by a single farmer and is the dedicated management unit for a specific

agricultural main category. It contains either grassland or arable land, but never both categories. It

may be split into several single sub-management units.

A single management unit contains a specific crop type (e.g. winter wheat, maize) or a specific

grassland management type (e.g. one cut meadow, two cut meadow, 3 and more cut meadow).

The relation between field parcels and single management units is a 1:m relation, where one field
parcel can contain one or many single management units.
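The three object types and their 1:m relations can be sketched as nested data classes. The class and attribute names below are illustrative assumptions, not part of any national LPIS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SingleManagementUnit:
    crop_or_management: str  # e.g. "winter wheat" or "two cut meadow"
    area_ha: float


@dataclass
class FieldParcel:
    farmer_id: str
    main_category: str  # either "arable" or "grassland", never both
    units: List[SingleManagementUnit] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.main_category not in ("arable", "grassland"):
            raise ValueError("a field parcel is either arable land or grassland")

    @property
    def area_ha(self) -> float:
        return sum(unit.area_ha for unit in self.units)


@dataclass
class PhysicalBlock:
    block_id: str
    parcels: List[FieldParcel] = field(default_factory=list)

    def farmers(self) -> List[str]:
        """Farmers with at least one field parcel inside the block."""
        return sorted({parcel.farmer_id for parcel in self.parcels})


# Hypothetical block with two farmers; the first parcel holds two units.
parcel_a = FieldParcel("farmer_a", "arable", [
    SingleManagementUnit("winter wheat", 2.0),
    SingleManagementUnit("maize", 1.5),
])
parcel_b = FieldParcel("farmer_b", "grassland", [
    SingleManagementUnit("two cut meadow", 4.0),
])
block = PhysicalBlock("block_001", [parcel_a, parcel_b])
```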

Remark to the relation with the real estate database:

Land properties are recorded in many countries in the real estate database. An elementary part of
the real estate database is the cartographic representation of real estate parcel boundaries. These
parcels should not be confused with field parcels. The parcels in the real estate database
constitute ownership rights and are documented as official boundaries. The boundaries of
agricultural management units, however, are delineated to derive correct area figures as the basis for
payments under the CAP. In most cases the parcel boundaries in the real estate database and in
the LPIS will be the same, but not in all cases.


Figure 10-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit

The availability of national LPIS data is increasing from year to year. An overview of available and
downloadable datasets has been created based on a previous ETC-ULS report by Synergise within
the Horizon 2020 project LandSense (Figure 10-2). The minimum thematic detail that should be
harmonized from LPIS data is the classification into (1) arable land, (2) permanent grassland, (3)
permanent crops, (4) ecological focus areas and (5) others. Harmonization is challenging, as these
categories contain quite heterogeneous classes at national level and are applied
inconsistently across Europe. Nevertheless, the geometric delineation of field parcels at any
level could already provide valuable input, given that LPIS is available for the large majority of
countries. Unless a critical number of countries with available LPIS data is reached, the inclusion of
LPIS will add more bias to the resulting geometric CLC-Backbone than a method based solely
on image segmentation.

[Figure 10-2: Overview of available and downloadable national LPIS datasets (source: LandSense)]

11 ANNEX 3: ILLUSTRATION OF STEP 1 “HARDBONE GEOMETRY” OF CLC-BACKBONE


12 ANNEX 4: CLC-BACKBONE DATA MODEL

The complete list of feature types within CLC-Backbone contains:

• INSPIRE

o Land cover dataset

o Land cover unit

• EAGLE

o Land cover component

Each land cover dataset can contain several land cover units (in our case Level 2 landscape objects).
Each land cover unit contains a valid geometry (in our case a polygon border). A specific land cover
unit can consist of one or more land cover components (in time and/or space), each
with a specific lifespan.


Figure 12-1: INSPIRE main feature types: land cover dataset and land cover unit


Figure 12-2: EAGLE extension to land cover unit and new feature type land cover component


Each land cover unit can contain one or more so-called parametric observations. According to
INSPIRE, these parametric observations are limited to percentage, countable and presence
parameters. In order to widen the usage and to integrate the required single-date observations
(e.g. mean NDVI or vegetation trajectories), this parameter type is extended with the data type
“IndexParameter”.

Therefore, the CLC-Backbone data model includes the single-date observations as ParameterType
(with a specific observationDate and a measurement stored as a real number, 0…1 or 0…100). The new
CLC-Backbone base model is shown in Figure 12-3.

Figure 12-3: The new CLC-Basis model (based on LISA)

The new CLC-Backbone data model includes now three feature types:

• LandCoverDataset

• LandCoverUnit

• LandCoverComponent

And two additional datatypes:

• LandCoverUnit
  o ParameterType
    ▪ Name: name of the parameter type (e.g. NDVI_max, NDVI_mean, NDVI_min, …)
    ▪ ObservationDate: date of the observation (e.g. acquisition date of the satellite scene)
    ▪ IndexParameter: value of e.g. NDVI_max, etc. to be stored here
• LandCoverComponent
  o Occurrence
    ▪ Cover percentage of a single land cover component within the LC unit
  o Overlaying
    ▪ 1: the LC component is overlaying another component (e.g. temporarily covered with water, snow)
    ▪ 0: default value, the component is not overlaying
  o validFrom: the time when the land cover component started to exist in the real world.
  o validTo: the time when the land cover component no longer exists in the real world.
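The feature types and data types listed above can be sketched as plain Python data classes. This is a minimal illustration only: attribute names are rendered in snake_case, the geometry is assumed to be held as a WKT string, and none of this is the normative INSPIRE or EAGLE encoding.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ParameterType:
    name: str                 # e.g. "NDVI_max"
    observation_date: date    # e.g. acquisition date of the satellite scene
    index_parameter: float    # measured value, 0..1 or 0..100


@dataclass
class LandCoverComponent:
    name: str
    occurrence: float                 # cover percentage within the LC unit
    overlaying: int = 0               # 1 if overlaying another component
    valid_from: Optional[date] = None
    valid_to: Optional[date] = None


@dataclass
class LandCoverUnit:
    unit_id: str
    geometry_wkt: str                 # assumed representation of the polygon
    components: List[LandCoverComponent] = field(default_factory=list)
    observations: List[ParameterType] = field(default_factory=list)


@dataclass
class LandCoverDataset:
    name: str
    units: List[LandCoverUnit] = field(default_factory=list)


# Hypothetical unit with one component and one single-date observation.
unit = LandCoverUnit(
    unit_id="L2-000001",
    geometry_wkt="POLYGON ((0 0, 0 100, 100 100, 100 0, 0 0))",
    components=[LandCoverComponent("permanent herbaceous", 100.0,
                                   valid_from=date(2018, 1, 1))],
    observations=[ParameterType("NDVI_max", date(2018, 6, 1), 0.82)],
)
dataset = LandCoverDataset("CLC-Backbone sample", [unit])
```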

12.1 Integration of temporal dimension

12.1.1 Integration of temporal dimension into LISA

As LISA was historically set up as a single snapshot within a 3-year interval (the repetition cycle of
the orthophoto campaign), no temporal phenomena could be modelled. The underlying assumption for
the adaptation of the model is that the land cover unit establishes the more or less stable geometry
over time (>1 year continuity).

Within the land cover unit several land cover components can exist with a specific life time. The land

cover unit can be considered as stable container for these changing land cover components.

Types of changes:

Changes within a land cover unit are in general regarded as status changes.

Changes within the land cover components are considered cyclic changes.

Long term ecosystem changes (conditional changes) will be rather recorded in various attributes that

will be defined within the products nr. 5.

The implementation of the temporal dynamics is realized using a 1:many relation between the

LandCoverUnit and the LandCoverComponent in combination with the defined Lifespan of the

specific component.

Example

For a typical agricultural field, the land cover unit could have the following LC components:

• Bare soil January – March
• Herbaceous vegetation April – June
• Bare soil 2nd half of June
• Herbaceous vegetation July – September
• Bare soil 2nd half of September
• Herbaceous vegetation October – December

A web-based application within the GSE Cadaster Environment project (financed by ESA) is available

under: http://demo.geoville.com/shiny/cadenv/p2/wien/ to demonstrate the above-mentioned

concept (Figure 12-4).


Figure 12-4: illustration of temporal NDVI profile and derived land cover components throughout a

vegetation season (Example taken from project Cadaster Env Austria, © Geoville 2017)

Important note:

It is not foreseen in the technical specifications that each temporal LC component is stored in the
underlying CLC Base production database. The production chain may just provide the final
classification of the LC unit in the final CLC-Backbone product, without giving information on the
appearance of land cover components throughout the vegetation season.

12.1.2 Integration of multi-temporal observations into LISA

Spectral profiles over time constitute a valuable information source (Figure 12-5), not only for the
automated classification, but also for the visual interpretation of object characteristics. For each
CLC-Backbone object, various time series of NDVI indices (NDVI_mean, NDVI_max, NDVI_min, etc.)
can be stored in the data model using the parametricObservation.
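A small sketch of how such per-object NDVI statistics could be derived is shown below. It uses the standard NDVI definition, (NIR - Red) / (NIR + Red), and assumes that the red and near-infrared reflectances of the pixels inside one object are available per acquisition date; the input structure, dates and values are hypothetical.

```python
from datetime import date
from statistics import mean
from typing import Dict, List, Tuple


def ndvi(red: float, nir: float) -> float:
    """Normalised Difference Vegetation Index for one pixel."""
    if nir + red == 0:
        raise ValueError("NDVI is undefined when NIR + Red is zero")
    return (nir - red) / (nir + red)


def object_ndvi_stats(pixels: List[Tuple[float, float]]) -> Dict[str, float]:
    """Aggregate pixel NDVI values of one object for a single scene."""
    values = [ndvi(red, nir) for red, nir in pixels]
    return {
        "NDVI_mean": mean(values),
        "NDVI_max": max(values),
        "NDVI_min": min(values),
    }


# Hypothetical (red, nir) reflectances of one object on two acquisition dates.
scenes = {
    date(2018, 4, 10): [(0.10, 0.30), (0.10, 0.40)],
    date(2018, 6, 20): [(0.20, 0.25), (0.20, 0.30)],
}

# One (name, observation date, value) record per statistic and scene.
observations = [
    (name, day, value)
    for day, pixels in sorted(scenes.items())
    for name, value in object_ndvi_stats(pixels).items()
]
```

Each resulting record corresponds to one parametric observation with a name, an observation date and an index value.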


Figure 12-5: Draft sketch to illustrate the principles of the combined temporal information that is

stored in the data model

The sketch above illustrates the principles of the combined temporal information that is stored in
the data model:

• The graph illustrates the variation within a year within a LandCoverUnit. The
geometric borders of the unit are pre-fixed based on the Level 2 landscape object
geometries.
  o Red points: single observations from each S-2 scene calculated at object level (Level
  2 landscape polygon).
    ▪ From these observations (stored in the parametricObservation of the
    LandCoverUnit), the complete temporal profile for the specific object can be
    reconstructed as a graph and presented visually to domain experts.
  o Brown and green bars: lifespan of the single LandCoverComponents (green:
  herbaceous vegetation; brown: bare soil) within a land cover unit.
  o Dark green bar: classification of the land cover unit according to the information on
  the sequential appearance of land cover components.

Within the LandCoverComponent, the attributes validFrom and validTo are used to store the lifetime
information of the component. If only the start, but not the end, of a LandCoverComponent can be
observed (e.g. due to missing observations in time), only validFrom is set. The component
will automatically be set to invalid if a new LandCoverComponent appears (with 100% coverage of
the land cover unit).
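The lifespan handling described above can be sketched as follows. One assumption is made explicit: "set to invalid" is interpreted here as closing every open-ended component by setting its validTo to the validFrom of the new, fully covering component. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Component:
    name: str
    occurrence: float               # cover percentage within the unit
    valid_from: date
    valid_to: Optional[date] = None  # None while the end is not yet observed


def add_component(components: List[Component], new: Component) -> None:
    """Append a component; a 100 % component closes all open-ended ones."""
    if new.occurrence >= 100.0:
        for existing in components:
            if existing.valid_to is None:
                existing.valid_to = new.valid_from
    components.append(new)


def components_on(components: List[Component], day: date) -> List[Component]:
    """Components whose lifespan covers the given day."""
    return [
        c for c in components
        if c.valid_from <= day and (c.valid_to is None or day < c.valid_to)
    ]


# Hypothetical agricultural field: bare soil, then herbaceous vegetation.
field_unit: List[Component] = []
add_component(field_unit, Component("bare soil", 100.0, date(2018, 1, 1)))
add_component(field_unit, Component("herbaceous vegetation", 100.0, date(2018, 4, 1)))
```

Querying components_on(field_unit, date(2018, 5, 15)) then returns only the herbaceous component, while the bare soil component has received an end date.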


13 ANNEX 5 – CLC NOMENCLATURE
