EEA/IDM/R0/17/003
Technical specifications for implementation
of a new land-monitoring concept based on
EAGLE
D3: Draft design concept and CLC-Backbone,
CLC-Core technical specifications, including
requirements review.
Version 3.0
10.11.2017
Version history

Version 2.0, 04/10/2017
Authors: S. Kleeschulte, G. Banko, G. Smith, S. Arnold, J. Scholz, B. Kosztra, G. Maucha
Reviewers: G-H Strand, G. Hazeu, M. Bock, M. Caetano, L. Hallin-Pihlatie
Status and description: Draft for review by NRC LC
Distribution: EEA, NRC LC

Version 3.0, 10/11/2017
Authors: S. Kleeschulte, G. Banko, G. Smith, S. Arnold, J. Scholz, B. Kosztra, G. Maucha
Reviewers: G-H Strand, G. Hazeu, M. Bock, M. Caetano, L. Hallin-Pihlatie
Status and description: Updated draft integrating the comments following the NRC LC workshop
Distribution: For distribution at Copernicus User Day
Table of Contents
1 Context
2 Introduction
2.1 Background
2.2 Concept
2.3 Role of industry
2.4 Potential role of Member States
2.5 Engagement with stakeholder community
3 Requirements analysis
4 CLC-Backbone – the industry call for tender
4.1 Spatial scale / Minimum Mapping Unit
4.2 Reference year
4.3 EO data
4.3.1 Pre-processing of Sentinel-2 data
4.4 Copernicus and European ancillary data sets
4.4.1 Completeness OSM road data
4.5 National data
4.5.1 Example: Rivers and lakes
4.5.2 Example: roads
4.5.3 Example: land parcel identification system (LPIS)
4.6 Specification for geometric delineation
4.6.1 Level 1 Objects – hardbones
4.6.2 Level 2 Objects – softbones
4.6.3 Accuracy of geometric delineation
4.7 Specification for thematic attribution
4.7.1 Thematic classes
4.8 Proposal for production methodology
4.8.1 Step 1: Definition and Selection of persistent boundaries
4.8.2 Step 2: Image segmentation
5 CLC-Core – the grid approach
5.1 Background
5.2 Populating the database
5.3 Data modelling in CLC-Core
5.4 Database implementation approach for CLC-Core
5.4.1 Database concepts revisited: from Relational to NoSQL and Triple Stores
5.4.2 CLC-Core Database: spatio-temporal Triple Store Approach
5.4.3 Processing and Publishing of CLC-Core Products
5.4.4 Conclusion and critical remarks
6 CLC+ - the long-term vision
7 CLC-Legacy
7.1 Experiences to be considered
8 Technical specifications
8.1 CLC-Backbone
8.2 CLC-Core
9 Annex 1: OSM data
9.1 Crosswalk between OSM land use tags and CLC nomenclature Level 2
9.2 OSM road nomenclature
9.3 OSM roads – completeness
10 Annex 2: LPIS data in Europe
11 Annex 3: illustration of Step 1 “hardbone geometry” of CLC-Backbone
12 Annex 4: CLC-Backbone data model
12.1.1 Integration of temporal dimension into LISA
12.1.2 Integration of multi-temporal observations into LISA
13 Annex 5 – CLC nomenclature
List of Figures
Figure 2-1: Conceptual design for the products and stages required to deliver improved European
land monitoring (2nd generation CLC).
Figure 2-2: A scale versus format schematic for the current and proposed CLMS products.
Figure 4-1: illustration of a merger of current local component layers covering (with overlaps) 26,3%
of EEA39 territory (legend: red – UA, yellow - N2K, blue – RZ LC)
Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori
information, (2) the delineation of objects on Level 2 using image segmentation techniques and (3)
the pixel-based classification of EAGLE land cover components and attribution of Level 2 objects
based on this classification
Figure 4-3: Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Nova),
Romania (near Parta) and Serbia (near Indija) (top to bottom). © Bing/Google and EOX Sentinel-2
cloud free services as Background (left to right).
Figure 4-4: Search result for the availability for national INSPIRE transport network (road) services
using the ELF interface (Nov. 2017).
Figure 4-5: Overview of available GIS datasets LPIS and their thematic content © Synergise, project
LandSense.
Figure 4-6: illustration of results of step 1 – delineation of Level 1 landscape objects. The border
defines the area for which Urban Atlas data is available (shaded) versus the area where not UA data
was available (not shaded).
Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between
encoding a particular unit as raster pixel (centre) or a grid cell (bottom). “daa” is a Norwegian unit:
10 daa = 1 ha.
Figure 5-2: Representation of real world data in the CLC-Core.
Figure 5-3: Example of an RDF triple (subject - predicate - object).
Figure 5-4: GeoSPARQL query for Airports near the City of London.
Figure 5-5: Result of the GeoSPARQL of Figure 5-4 as map and in the JSON format.
Figure 5-6: Schematic view of the distributed SPARQL endpoints communicating with each other.
The arrows indicate the flow of information from a query directed to the French CLC SPARQL
endpoint.
Figure 5-7: Intended generation of CLC+ products based on the distributed triplestore architecture.
Figure 7-1: Generalisation technique applied in Norway based on expanding and subsequently
shrinking. The technique exists for polygon as well as raster data.
Figure 7-2: Generalization levels used in LISA generalization in Austria
Figure 7-3: Examples of 25 ha, 10 ha and 1 ha MMU CLC for Germany
Figure 7-4: Result of the generalisation process in Germany with technical changes in different
classes
Figure 10-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit
Figure 10-2: Overview of available GIS datasets LPIS and their thematic content © Synergise, project
LandSense
Figure 12-1: INSPIRE main feature types: land cover dataset and land cover unit
Figure 12-2: EAGLE extension to land cover unit and new feature type land cover component
Figure 12-3: The new CLC-Basis model (based on LISA)
Figure 12-4: illustration of temporal NDVI profile and derived land cover components throughout a
vegetation season (Example taken from project Cadaster Env Austria, © Geoville 2017)
Figure 12-5: Draft sketch to illustrate the principles of the combined temporal information that is
stored in the data model
List of Tables
Table 2-1: Overview of key characteristics proposed for the four elements / products of the 2nd
generation CLC.
Table 2-2: Summary (matrix) of potential roles associated with each element / product.
Table 4-1: Overview of existing Copernicus land monitoring and other free and open products which
were analysed as potential input to support the construction of the geometric structure (Level 1
“hard bones”) of the CLC-Backbone. The products finally proposed for further processing are marked
in green.
Table 4-2: Main characteristic of Level 1 objects (hardbone) and level 2 objects (softbones)
Table 5-1: Comparison of selected Triplestores with respect to spatial and temporal data.
Table 6-1: List of CLC classes and requirements for external information
1 CONTEXT
The European Environment Agency (EEA) and European Commission DG Internal Market, Industry,
Entrepreneurship and SMEs (DG GROW) have decided to develop a conceptual
strategy and associated technical specifications for a new series of products within the Copernicus
Land Monitoring Service (CLMS) portfolio, which should meet the current and future requirements
for European Land Use Land Cover (LULC) monitoring. These products are nominally called the "2nd
generation CORINE Land Cover (CLC)".
After a call for tender in 2017, the EEA has tasked the EIONET Action Group on Land monitoring in
Europe (EAGLE Group) with developing an initial response to fulfilling these needs through a
conceptual design and technical specifications. The approach adopted represents a sequence of
development stages, where separate single elements of the whole concept could be developed
relatively independently and at different rates, to allow time for broad consultation with
stakeholders. This also allowed the inclusion of Member State (MS) input, the exploitation of industrial production
capacity, and the necessary feedback, lessons learnt and refinement to reach the ultimate goal of a
coherent and harmonized European Land Monitoring Framework.
The first stage of this process outlined the conceptual strategy and proposed a draft technical
specification for the first product (CLC-Backbone) to be developed. The first stage also involved a
presentation of the concept to the EIONET National Reference Centres (NRCs) for Land Cover and
the collection of their feedback, which was then carefully taken into consideration for the
continuation of the process into the second stage. The first stage was undertaken within the constraints outlined by EEA and DG GROW
for the initial product:
• Industrial production by service providers,
• Outcome product in vector format,
• Highly automated production process,
• Short timeframe of production phase,
• Driven by Earth Observation (Sentinel-2),
• Complete the picture started by the Local Component1 products (EEA-39).
This document represents the second stage of development to further expand on the conceptual
strategy and to extend the current technical specifications, to propose additional follow-up products
in more detail and to continue to communicate these to the stakeholders involved in the field of
European land monitoring. This second stage also aims at enlarging the circle of stakeholders to all
interested Copernicus users and beyond. It is vital for the success of the project and the long-term
evolution of land monitoring in Europe that all the relevant stakeholders communicate their
requirements and opinions to this process.
Revised versions of this document will continue to be produced at specific milestones, leading to a
final version in early 2018.
1 The term “component” is used in two senses: 1) for the Copernicus local component products, and 2) for a
Land Cover Component as an element within the EAGLE data model.
2 INTRODUCTION
This chapter provides the background to the concept, an outline of the proposed approach to be
adopted, an overview of the elements of the concept and the current status of the developments.
2.1 Background
Monitoring of Land Use and Land Cover (LULC) and their evolving nature is among the most
fundamental environmental survey efforts required to support policy development and effective
environmental management2. Information on LULC plays a key role in a large number of European
environmental directives and regulations. Many current environmental issues are directly related to
the land surface, such as habitats, biodiversity, phenology and distribution of plant species,
ecosystem services, as well as other issues relevant to climate change. Human activities and
behaviour in space (living, working, education, supplying, recreation, mobility & communication,
socializing) have significant impacts on the environment through settlements, transportation and
industrial infrastructure, agriculture, forestry, exploitation of natural resources and tourism. The
land surface, whose negative change of state can only, if at all, be reversed with huge effort, is
therefore a crucial ecological factor, an essential economic resource, and a key societal determinant
for all spatially relevant basic functions of human existence and, not least, nations’ sense of identity.
Land thus plays a central role in all three factors of sustainable development: ecology, economy, and
society.
LULC products so far tend to be produced independently of each other at the global, European,
national and sub-national levels, each with a similar but distinct thematic emphasis. Such
diversity leads to reduced interoperability and duplication of work, and
thereby to inefficient use of resources. At the European level CORINE Land Cover (CLC) is the flagship
programme for long-term land monitoring and is now part of the Copernicus Land Monitoring
Service (CLMS). CLC has been produced for reference years of 1990, 2000, 2006 and 2012, with 2018
under preparation and expected to be available by late 2018. The CLC specification aims to provide
consistent localized geographical information on LULC using 44 classes at level-3 in the
nomenclature (see Annex 5). The vector databases have a minimum mapping unit (MMU) of 25 ha
and minimum feature width (MFW) of 100 m with a single thematic class attribute per land parcel.
At the European level, the database is also made available on a 100 x 100 m and 250 x 250 m raster
which has been aggregated from the original vector data at 1:100 000 scale. CLC also includes a
change layer which records changes between two of the 44 thematic classes with a MMU of 5 ha.
Although CLC has become well established and has been successfully used, mainly at the pan-
European level, there are a number of deficiencies and limitations that restrict its wider exploitation,
particularly at the Member State (MS) level and below. This is partly due to the fact that many MS
have access to more detailed, precise and timely information from national programs, but also to
the fact that the MMU of CLC (25 ha) is too coarse to capture the fine details of the landscape at the
local and regional scales. In consequence, all features of smaller size that represent landscape
diversity and complexity are not mapped, either because of geometric generalization or because they
are absorbed by thematically mixed classes with very broad definitions. Moreover, changes of
features and landscape dynamics that are highly relevant to locally decided but globally effective
policy, such as small-scale forest rotation, changes in agricultural practices and urban in-filling, may
be missed due to low spatial resolution and/or thematic depth of CLC class definitions. The
successful use of CLC in combination with higher spatial resolution products to detect and document
2 Harmonised European Land Monitoring - Findings and Recommendations of the HELM Project.
urban in-filling has given a clear indication of the required direction of development for European
land monitoring.
To address some of the above issues, the CLMS has expanded its portfolio of products beyond CLC to
include the High Resolution Layers (HRLs) and local component (LoCo) products. The HRLs provide
pan-European information on selected surface characteristics in a 20 m raster format, also available
aggregated to 100 m raster cells. They provide information on basic surface properties such as
imperviousness, tree cover density and permanent grassland and can be described as intermediate
products. The LoCo products, in vector format, are based in part on very high spatial resolution EO
data and are each tailored to a specific landscape monitoring purpose (e.g. Urban Atlas). They provide
detailed thematic LULC information at polygon level with an MMU in the range of 0.25 to 1 ha. However,
the LoCo products do not provide wall to wall coverage of the EEA-39 countries, even when
combined. Their nomenclatures are not harmonised and thus cause interoperability problems.
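The aggregation of a 20 m HRL raster to 100 m cells mentioned above can be illustrated with a minimal sketch. It assumes that aggregation is done by averaging non-overlapping 5 x 5 blocks, which would suit a density layer such as imperviousness; the source does not state the aggregation rule actually used, and the function name and sample values are hypothetical.

```python
from statistics import mean


def aggregate_blocks(raster, factor=5):
    """Average non-overlapping factor x factor blocks of a 2-D raster.

    With factor=5, a 20 m raster becomes a 100 m raster.
    Block averaging is an assumption, not the documented HRL method.
    """
    rows, cols = len(raster), len(raster[0])
    if rows % factor or cols % factor:
        raise ValueError("raster dimensions must be multiples of the factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect the 25 fine cells that fall inside one coarse cell.
            block = [raster[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            out_row.append(mean(block))
        out.append(out_row)
    return out


# Hypothetical 10 x 10 raster at 20 m (200 m x 200 m on the ground):
# the western half is fully sealed (100 %), the eastern half unsealed (0 %).
fine = [[100 if j < 5 else 0 for j in range(10)] for i in range(10)]
coarse = aggregate_blocks(fine)  # 2 x 2 raster at 100 m
```

For a categorical layer a majority rule would be the more natural choice than a mean; the block structure of the loop stays the same.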
2.2 Concept
Given the above issues a revised concept for European Land Monitoring is required which both
provides improved spatial and thematic performance and builds on the existing heritage. Also,
recent developments in the field of land monitoring (e.g. improved Earth Observation (EO) input data
from the Sentinel programme, national bottom-up approaches, new processing methods, additional
reporting requirements and the INSPIRE directive) and in desktop and cloud-computing capacities offer
opportunities to deliver these improvements effectively and efficiently. The EEA in conjunction with
DG GROW has identified this need and now aims to harmonise and integrate some of the CLMS
activities by investigating the concept and technical specifications for a higher performance pan-
European mapping product under the banner of "2nd generation CLC".
It was determined that such a 2nd generation CLC should build upon recent conceptual development
and expertise, while still guaranteeing backwards compatibility with the conventional CLC datasets.
Also, the proposed approach should be able to address the needs arising from recent developments
in European policies, such as reporting obligations on land use, land use change and forestry (LULUCF),
plans for an upcoming Energy Union or long-term climate mitigation objectives. After a successful
establishment of 2nd generation CLC there would then also be knock-on benefits for a broad range of
other European policy requirements, land monitoring activities and reporting obligations.
The proposed conceptual strategy consists of a number of interlinked elements (Figure 2-1) which
stand for separate products and therefore can be delivered independently in separate stages. Each
of the products has its own technical specification and production methodology and can be
produced through its own funding / resourcing mechanisms. Throughout the documents delivered
by this project the conceptual design given in Figure 2-1 will be used as a key graphic to identify the
product and stage being described.
Figure 2-1: Conceptual design for the products and stages required to deliver improved European
land monitoring (2nd generation CLC).
The structure of the conceptual design proposed by the EAGLE Group for the 2nd generation CLC is
based on the four elements shown in Figure 2-1. The reports associated with this work will expand
incrementally on the conceptual design and provide technical specifications of increasing
elaboration for the products. Although the four elements / products will be described in more detail
later in this, and subsequent, reports they can be summarised as follows:
1. CLC-Backbone is a spatially detailed, large scale inventory in vector format providing a
geometric spatial structure for landscape features with limited, but robust EO-based land
cover thematic detail on which to build other products.
2. CLC-Core is a consistent, multi-use grid3 database repository for environmental information
populated with a broad range of land cover, land use and ancillary data, forming the
information engine to deliver and support tailored thematic information requirements.
3. CLC+ is the end point for this specific exercise and is a derived vector and raster product
from the CLC-Core and CLC-Backbone and will be a LULC monitoring product with improved
spatial and thematic performance, relative to the current CLC, for reporting and assessment.
4. The final element of the conceptual design, although not strictly a new product, is the ability
to continue producing the existing CLC, which already has a well-established and agreed
specification and may be referred to as CLC-Legacy in the future.
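The idea behind CLC-Core in item 2, a grid whose cells are linked to a data model that can be populated from different sources, can be sketched as a small data structure. This is only an illustration under stated assumptions: the class, the attribute names and the sample values are hypothetical and are not taken from the EAGLE data model.

```python
from dataclasses import dataclass, field


@dataclass
class GridCell:
    """One cell of a regular grid, located by its lower-left corner (metres)."""
    easting: int
    northing: int
    size_m: int
    # attribute name -> list of (source, value); several sources may
    # contribute information about the same cell.
    attributes: dict = field(default_factory=dict)

    def add(self, name, value, source):
        """Attach a value to the cell and remember where it came from."""
        self.attributes.setdefault(name, []).append((source, value))

    def sources(self, name):
        """List the sources that contributed a given attribute."""
        return [source for source, _ in self.attributes.get(name, [])]


# Hypothetical 100 m cell populated from three kinds of input.
cell = GridCell(easting=4321000, northing=3210000, size_m=100)
cell.add("imperviousness_pct", 35, source="HRL")
cell.add("land_use", "residential", source="national dataset")
cell.add("land_cover", "built-up", source="CLC-Backbone")
```

Keeping the source next to each value reflects the role of the grid as a repository: information from the different inputs is stored side by side rather than merged into a single class at load time.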
Table 2-1 provides a first overview of the main characteristics of the four elements that are
developed in more detail in the following chapters of this document. The table allows the reader to
make comparisons between the key characteristics of the products.
3 A data structure whose grid cells are linked to a data model that can be populated with the information from
the different sources.
Table 2-1: Overview of key characteristics proposed for the four elements / products of the 2nd
generation CLC.

Description
  CLC-Backbone: Detailed wall to wall (EEA-39) geometric vector reference layer with basic thematic content.
  CLC-Core: All-in-one data container for land monitoring information according to EAGLE data model.
  CLC+: Thematically and geometrically detailed LULC product.
  CLC-Legacy: A more generalised LULC product consistent with the CLC specification.

Role / purpose
  CLC-Backbone: Support to CLMS products and services at the pan-European and local levels.
  CLC-Core: Thematic characterisation of CLMS products and services at the pan-European and local levels.
  CLC+: Support to EU and national reporting and policy requirements.
  CLC-Legacy: Maintain the time series (backwards compatibility) and support legacy business systems.

Format
  CLC-Backbone: Vector.
  CLC-Core: GRID database.
  CLC+: Raster and vector.
  CLC-Legacy: Raster and vector.

Thematic detail
  CLC-Backbone: <10 basic land cover classes, few spectral-temporal attributes.
  CLC-Core: Rich attribution of LU, LC and ancillary information.
  CLC+: High thematic detail including LC and LU with improvements compared to CLC.
  CLC-Legacy: CLC-nomenclature with 44 classes + changes between the 44.

Geometric detail (MMU, grid size)
  CLC-Backbone: 0.25 to 5.0 ha.
  CLC-Core: 10 x 10 m to 1 x 1 km.
  CLC+: TBD, related to CLC-Backbone.
  CLC-Legacy: 25 ha for status, 5 ha for changes (raster: 100 x 100 m).

Update cycle
  CLC-Backbone: 3 - 6 years.
  CLC-Core: Dynamic update as new information becomes available.
  CLC+: 3 - 6 years.
  CLC-Legacy: Standard 6 years.

Reference year
  CLC-Backbone: 2018
  CLC-Core: 2018
  CLC+: 2018
  CLC-Legacy: 2018 / 2024

Production year
  CLC-Backbone: 2018
  CLC-Core: 2019
  CLC+: TBD
  CLC-Legacy: TBD

Method
  CLC-Backbone: Geospatial data integration and image segmentation for geometric boundaries and attribution / labelling by pixel-based EO derived land cover
  CLC-Core: EAGLE-Grid approach: population and attribution of regular GRID
  CLC+: Derived by SQL from CLC-Core
  CLC-Legacy: Derived from CLC+ or directly from CLC-Core (TBD.)

Input data
  CLC-Backbone: Geospatial data, EO images and ancillary data selected from LoCo (with visual control) and HRL products.
  CLC-Core: CLC-Backbone, all available HRLs, LoCo, ancillary data, national data provisions (e.g. LU).
  CLC+: CLC-Core and CLC-Backbone
  CLC-Legacy: CLC + and / or CLC-Core
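Table 2-1 states that CLC+ is to be "derived by SQL from CLC-Core". A minimal sketch of what such a derivation could look like is given below, using an in-memory SQLite database. The table layout, column names, class labels and the 80 % threshold are illustrative assumptions only; the actual CLC-Core schema and CLC+ derivation rules are not defined in this section.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for the CLC-Core grid database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grid_cell ("
    " cell_id TEXT PRIMARY KEY,"
    " land_cover TEXT,"
    " land_use TEXT,"
    " imperviousness_pct REAL)"
)
conn.executemany(
    "INSERT INTO grid_cell VALUES (?, ?, ?, ?)",
    [
        ("c1", "built-up", "residential", 85.0),
        ("c2", "built-up", "residential", 40.0),
        ("c3", "herbaceous", "agriculture", 0.0),
    ],
)

# Illustrative rule: combine land use with a land cover characteristic
# to obtain a thematic class per cell (threshold chosen for illustration).
query = """
SELECT cell_id,
       CASE
           WHEN land_use = 'residential' AND imperviousness_pct >= 80
               THEN 'continuous urban fabric'
           WHEN land_use = 'residential'
               THEN 'discontinuous urban fabric'
           ELSE 'other'
       END AS derived_class
FROM grid_cell
ORDER BY cell_id
"""
derived = dict(conn.execute(query).fetchall())
conn.close()
```

The point of the sketch is that the thematic product is a query result rather than stored data, so a different rule set can be applied to the same repository without repopulating it.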
Figure 2-2 is a schematic description of the conceptual design showing the relationship of the new
elements / products to existing CLMS products in terms of their format and level of spatial detail. In
this representation:
1. The current or conventional CLC (CLC2000, CLC2006, CLC2012, CLC2018 etc.), a polygon map
with fewer details.
2. The LoCos (Urban Atlas, Riparian Zones, N2000 etc.), polygon maps with more spatial and
thematic details.
3. The HRLs (Imperviousness, Forest, Grassland, Wetland etc.), raster products for specific
surface characteristics with high spatial details.
4. The proposed CLC-Core, a grid product where the level of detail is still to be decided.
5. The CLC-Backbone and CLC+, both polygon maps with more details.
Figure 2-2: A scale versus format schematic for the current and proposed CLMS products.
Given the context, requirements and issues, the aim should be to find a means to deliver the
concept and its proposed products with an efficient mixture of industrially produced material backed
up by auxiliary information from various national programmes. It is important to propose a viable
system in which there is flexibility to adapt the later steps to issues of feasibility and practicality once
the implementation of the earlier steps is underway. For instance, the outcomes of the CLC-Backbone
production should be able to influence thematic content of the CLC-Core and / or the technical
specifications of the CLC+.
2.3 Role of industry
The development and production of CLC to this point has had only a limited role for industry,
mainly focused on the production of pre-processed image datasets (e.g. IMAGE2006, IMAGE2012
etc.), the validation of the CLC2012 products and in some MS the subcontracting of the actual
production. Conversely, the production of the non-CLC products within the CLMS has been
dominated by industry through a series of service contracts to generate consistent products across
Europe.
Industry has the ability to produce operational solutions which exploit automation, can handle
large data volumes and are scalable to European-wide requirements. As the amount of available EO
and GI data increases, highly efficient and effective mechanisms for production will be required for
at least parts of the European land monitoring process and economies of scale must be exploited.
Industry therefore offers a number of capabilities which will be required at selected points within
the 2nd generation CLC.
DG GROW has expressed the wish that industry should have an initial role in the production of CLC-
Backbone which has therefore been designed in part to exploit industrial capabilities. Further
opportunities for industrial involvement are shown in Table 2-2.
2.4 Potential role of Member States
The MS have always been intimately linked with the CLC as its production has been the responsibility
of the nominated authorities through EIONET. The MS have provided the production teams for the
actual mapping to exploit local knowledge, familiarity with native landscape types and access to
national datasets which may not have been possible to share further. This bottom-up production
approach has been key to successful delivery of this important time series.
With respect to the non-CLC products within the CLMS the MS involvement has been limited so far
to verification and, in the case of the 2012 products, enhancement. Although the MS have provided
valuable feedback and enhancement in some cases, particularly on the HRLs, their capabilities have
not been fully exploited within the CLMS.
Table 2-2 shows the potential for a greater role for the MS across all elements / products. It is
important that the MS experts have oversight of the specification, as is happening within this
project, and opportunities to contribute data, experience and location knowledge to the production
and validation activities where appropriate.
Table 2-2: Summary (matrix) of potential roles associated with each element / product.
Actor | CLC-Backbone | CLC-Core | CLC+ | CLC-Legacy
EEA / DG GROW / EC | Definition, coordination, main user. | Definition, coordination, main user. | Definition, coordination, main user. | Definition, coordination, main user.
MS | Review of specification, validation, user. | Review of specification, population with national datasets, user. | Review of specification, support to production, validation, user. | Production, validation, user.
Industry | Production. | Implementation and maintenance DB infrastructure. | Support production, validation. | Validation.
2.5 Engagement with stakeholder community
The success of the project and the long-term development of land monitoring in Europe is
intrinsically linked to the involvement of the stakeholder community. It is vital that all the relevant
stakeholders are aware of this activity and contribute their requirements and opinions to this
process. Revised versions of this document are foreseen at specific milestones towards a final
version for delivery in early 2018.
This specific deliverable, D3, is the second step towards the definition of the conceptual strategy and
the potential technical specifications for a series of new CLMS products. It aims at communicating
these details to the stakeholders involved in European land monitoring to elicit feedback, comments
and questions. The work reported here begins with a requirements review which goes beyond the
remit of the call for tender. The four elements and potential products of the 2nd generation CLC
within the CLMS are described in further detail in the following chapters. This version already
includes the first feedback from the NRC LC meeting in Copenhagen in October, 2017. For CLC-
Backbone, this document updates the draft versions of the technical specifications and the outline
implementation methodology provided in D2. These details are still open for discussion and able to
be reviewed by stakeholders and will ultimately be used as input for an open call for tenders to
industrial service providers for a production to start in 2018. For CLC-Core, this document provides a
more advanced outline of the technical specification and proposes a number of options for the
implementation. The technical design of CLC-Core will continue to be developed based on
stakeholder feedback in future steps. Similarly, the current thinking around CLC+ will continue to be
developed to elicit feedback to guide expansion of the technical specifications at a future step within
the project.
3 REQUIREMENTS ANALYSIS
In line with the principle of Copernicus, this analysis in support of the development of new products
within the CLMS is driven by user needs rather than the current technical capabilities of EO sensors and
processing systems. The technical issues will be dealt with in the chapters related to the products
focusing on specification and methodological development. This analysis was initially based around
the requirements set out in the original call for tender, but this has now been extended through
inclusion of recent work by the EC and ETC to provide a broader range of stakeholder needs. As this
project develops further, requirements will be integrated to support the increasingly detailed
specifications towards those for a complete 2nd generation CLC.
Although the MMU and thematic issues of CLC have been extensively reported for some time, the
target specification for a future improved harmonised European land monitoring product is less
clear. The proposed products should support European policies, particularly assist reporting
obligations on European level (and not on national level) on land use, land use change and forestry
(LULUCF), plans for an upcoming Energy Union and long-term climate mitigation objectives.
However, many of the policies that could exploit EO-based land monitoring information rarely give a
clear quantitative requirement for spatial, temporal or thematic specifications. The higher-level
strategic policies, such as the EU Energy Union, often refer to the monitoring of other activities such
as REDD+ and LULUCF. Where actual quantitative analysis and assessment takes place, they tend to
rely on the products currently available. For instance, LULUCF assessments in Europe (independently
from national reporting obligations) use CLC, while at the global scale they use Global Land Cover
2000 (GLC 2000) with a 1 km spatial resolution (or 100 ha MMU). Some of these initiatives have
explored the potential of improved monitoring performance, such as the use of SPOT4 imagery with
a 20 m spatial resolution to show land-use change between 2000 and 2010 for REDD+ reporting, but
until very recently it has not been practical for these types of monitoring to become operational
globally.
A more fruitful avenue for user requirements in support of the 2nd generation CLC is to consider the
initiatives which are now addressing habitats, biodiversity and ecosystem services. The EU
Biodiversity Strategy to 2020 was adopted by the European Commission to halt the loss of biodiversity
and improve the state of Europe’s species, habitats, ecosystems and their services. It defines six major
targets with the second target focusing on maintaining and enhancing ecosystem services, and
restoring degraded ecosystems across the EU, in line with the global goal set in 2010. Within this
target, Action 5 is directed at improving knowledge of ecosystems and their services in the EU.
Member States, with the assistance of the Commission, will map and assess the state of ecosystems
and their services in their national territory, assess the economic value of such services, and
promote the integration of these values into accounting and reporting systems. The Mapping and
Assessment of Ecosystems and their Services (MAES) initiative is supporting the implementation of
Action 5 and, although much of the early work in this area on European level has used inputs such as
CLC, it is obvious that to fully address this action an improved approach is required especially for a
better characterisation of land, freshwater and coastal habitats, particularly in watershed and
landscape approaches. The spatial resolution at which ecosystems and services should be mapped
and assessed will vary depending on the context and the purpose for which the mapping/assessment
is carried out. However, information based on a more detailed thematic characterisation and
classification and at higher spatial resolution is required, which is compatible with the European-
wide classification and can be aggregated in a consistent manner if needed.
A first version of a European ecosystem map covering spatially explicit ecosystem types for land and
freshwater has been produced at 1 ha spatial resolution using CLC (100 m raster), the predecessor of
the HRL imperviousness with 20 m resolution, JRC forest layer with 25 m spatial resolution plus a
range of other ancillary datasets (e.g. ECRINS water bodies) using a wide variety of spatial
resolutions (from detailed Open Street Map data to 10 x 10 km grid data used in Article 17 reporting
of the Habitats Directive). It is clear that ecosystem mapping and assessment could more fully
exploit the recently available Copernicus Sentinel data and land products, and move down to a
higher spatial resolution.
The recent EC-funded NextSpace project collected user requirements for the next generation of
Sentinel satellites to be launched in the 2030 time horizon. These requirements were analysed on a
domain basis and from the land domain those requirements which were given a context of “land
cover (including vegetation)” were initially considered. This was extended so that related contexts
such as “glaciers, lakes, above-ground biomass, leaf area index and snow” were also considered. A
broad range of spatial resolutions were requested from sub 2.5 m to 10 km, but two thirds of the
collated requirements wanted data in the 10 – 30 m region. As would be expected the requested
MMUs were also broadly distributed, but with a preference for 0.5 to 5 ha, which represents field /
city block level for much of Europe. The temporal resolution / update frequency was dominated by
yearly revisions, although some users could accept 5 yearly updates and some wished for monthly
updates, although the reason was unclear. The users were less specific about the thematic detail
required and the few references given were to CLC, EUNIS, LCCS and “basic land cover”. The
preferred accuracy of the products was in the range 85 to 90%.
The European Topic Centre on Urban, Land and Soil systems (ETC/ULS) undertook a survey of
EIONET members involved with the CLMS and CLC production considering a range of topics in
advance of the 2018 activities. Some of the questions referred to the shortcomings of CLC and the
potential improvements that could be made towards a 2nd generation CLC product. As expected, it
was noted that some CLC classes cause problems because of their mixed nature and instances on the
ground that are sometimes complex and difficult to disentangle. It was suggested that this situation
could be improved by the use of a smaller MMU so that the mapped features have more
homogeneous characteristics. Also, an MMW reduction, particularly for highways, would allow linear
features to be represented more realistically. Of the 32 countries that responded to the
questionnaire, 25 would support a finer spatial resolution, showing that there is, in general,
a national demand for high spatial resolution LULC data. The proposals ranged from 0.05 ha to
remaining at the current 25 ha, but the majority favoured 0.5 ha (see footnote 4) to 5 ha. Thematic refinement is also
supported by around one third of the respondents, who requested improved thematic detail,
separation of land cover from land use, splitting formerly mixed CLC classes and the addition of
further attributes to the spatial polygons.
From the requirements review so far and considering the requirements put forward by LULUCF, it is
suggested that the ultimate 2nd generation product would have an MMU of 0.5 ha and be
based on EO data with a spatial resolution of between 10 and 20 m. The thematic content would be
a refinement of the current CLC nomenclature to cope with changing MMU, separation of land cover
and land use for the needs of ecosystem mapping and assessment. The temporal update could come
down from 6 years to 3 years in the short-term, and potentially to 1 year in the longer-term.
4 0.5 ha being required by LULUCF
4 CLC-BACKBONE – THE INDUSTRY CALL FOR TENDER
CLC-Backbone has been conceived as a new high spatial resolution vector product. It represents a
baseline object delineation with the emphasis on geometric rather than thematic detail. The wall-to-
wall coverage of CLC-Backbone shall draw on, complete and amend the picture started by the current
local component (LoCo) products (Urban Atlas, Riparian Zones and Natura 2000), which cover approx.
1/3 of the total area as currently shown in Figure 4-1. It will be comprehensive and effectively complete
the coverage of the EU-28 as a minimum, but preferably the whole EEA-39.
Figure 4-1: illustration of a merger of current local component layers covering (with overlaps) 26,3%
of EEA39 territory (legend: red – UA, yellow - N2K, blue – RZ LC)
CLC-Backbone will initially be produced by a mostly automated, industrial approach, where
production can be separated into different levels, including different degrees and sequential order of
automation and human interaction (Figure 4-2). Within CLC-Backbone landscape objects are defined
as vector polygons and identified on different levels.
Step 1: The first level of object borders (skeleton) represents persistent objects (“hard bones”) in
the landscape.
Step 2: On a second level a subdivision of the persistent landscape features (Level 1) will be
achieved through image segmentation (“soft bones”), based on multi-temporal Sentinel data within a
defined observation period (resulting in Level 2 landscape features = polygons).
Step 3: The output of CLC-Backbone – the delineated objects – represent spectrally and/or texturally
homogeneous features that are further characterised and attributed using the EAGLE land cover
component concept. The characterization of objects (segments) can be achieved using two options:
• Attributing the segments using summary indicators based on a pixel-based classification of
fairly simple land cover classes (e.g. dominating class, percentage mixture of classes for each
segment)
• And / or attributing the segments based on spectral mean values of segments
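The two attribution options above can be illustrated with a minimal sketch. It assumes that the segment ids, the pixel-based class labels and one spectral band are available as equally sized grids; the function name, the class labels and the use of plain Python lists (chosen only to keep the example free of dependencies) are illustrative assumptions and not part of the specification.

```python
from collections import Counter, defaultdict


def attribute_segments(segments, classes, band):
    """Summarise pixel classes and one spectral band per segment.

    segments: 2-D list of segment ids (one id per pixel)
    classes:  2-D list of pixel class labels from a pixel-based classification
    band:     2-D list of values of a single spectral band
    """
    class_counts = defaultdict(Counter)
    band_sums = defaultdict(float)
    for seg_row, cls_row, band_row in zip(segments, classes, band):
        for seg_id, cls, value in zip(seg_row, cls_row, band_row):
            class_counts[seg_id][cls] += 1
            band_sums[seg_id] += value

    attributes = {}
    for seg_id, counts in class_counts.items():
        n_pixels = sum(counts.values())
        attributes[seg_id] = {
            # Option 1: summary indicators from the pixel-based classification
            "dominant_class": counts.most_common(1)[0][0],
            "class_share_pct": {
                cls: round(100.0 * n / n_pixels, 1) for cls, n in counts.items()
            },
            # Option 2: spectral mean value of the segment
            "spectral_mean": band_sums[seg_id] / n_pixels,
        }
    return attributes


# Hypothetical 2 x 3 pixel example with two segments
segments = [[1, 1, 2], [1, 2, 2]]
classes = [["woody", "woody", "herbaceous"], ["water", "herbaceous", "herbaceous"]]
band = [[0.1, 0.2, 0.3], [0.3, 0.5, 0.7]]
attrs = attribute_segments(segments, classes, band)
```

In an operational chain the same summaries would be computed with a raster library on full Sentinel-2 tiles; the logic per segment stays the same.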
Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori
information, (2) the delineation of objects on Level 2 using image segmentation techniques and (3)
the pixel-based classification of EAGLE land cover components and attribution of Level 2 objects
based on this classification
For implementation of CLC-Backbone the existing European land cover monitoring framework has to
be considered:
• Heritage: The concept has to take into account the existing pan-European and local Copernicus
layers and the spatial and thematic representation of the landscape. The technical specifications
to be provided by this contract are therefore based on a thorough review of available and
feasible datasets and their geometry and thematic content.
• Integration: CLC-Backbone will integrate selected geometric information from existing
Copernicus products at different steps within the process.
• Sequence of production: Ideally, the production of CLC-Backbone and of the other Local
Component data sets would be arranged in a sequential order so that CLC-Backbone can build on and
integrate the most up-to-date Local Component data to achieve the best consistency among the
layers.
Realistically, the production will need to make use of selected elements of the existing Local
Component layers and High Resolution Layers and of existing ancillary data whose reference year might
not be identical with that of the production. The time lag between input data and production
reference year does not constitute a limiting factor. As the final delineation of polygons is achieved by
image segmentation (“soft bones”) of Sentinel-2 images of the actual reference year, any deviation
of the “hard bone” based geometry from the actual landscape structure (due to outdated
data) will be compensated by the image segmentation step. Considering an annual land cover
change of approx. 0,5% (on CLC spatial scale), the existing layers and datasets will still be able to provide a
reliable input that is adapted by the image segmentation step.
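As a rough plausibility check of this argument, the share of area affected by change during the time lag can be estimated from the annual rate. The sketch below assumes a constant annual change rate of 0.5% that acts independently each year, which is a simplification introduced here and not stated in the specification; the lags of 3 and 6 years correspond to the age range of the input products mentioned in this chapter.

```python
def share_changed(annual_rate, years):
    """Share of area changed at least once after the given number of years.

    Assumes a constant annual change rate acting independently each year.
    """
    return 1.0 - (1.0 - annual_rate) ** years


for lag in (3, 6):
    print(f"{lag} years: {100 * share_changed(0.005, lag):.2f} % of area affected")
```

Under these assumptions roughly 1.5% (3 years) to 3% (6 years) of the area would differ from the input geometry, which is the part the segmentation step has to correct.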
4.1 Spatial scale / Minimum Mapping Unit
CLC-Backbone shall address landscape features with a predefined MMU of 0.5 ha and a minimum
mapping width (MMW) of 10 m (these values are subject to the ongoing consultation process and
might be revised).
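To relate these values to the input imagery, the MMU can be expressed as a number of pixels. The helper below is an illustrative sketch (the function name is an assumption); it only uses the relation 1 ha = 10 000 m².

```python
def pixels_per_mmu(mmu_ha, pixel_size_m):
    """Number of square pixels of the given edge length that make up the MMU."""
    return mmu_ha * 10_000 / pixel_size_m ** 2  # 1 ha = 10 000 m²


pixels_10m = pixels_per_mmu(0.5, 10)  # Sentinel-2 10 m bands
pixels_20m = pixels_per_mmu(0.5, 20)  # Sentinel-2 20 m bands
```

An MMU of 0.5 ha therefore corresponds to 50 pixels at 10 m and 12.5 pixels at 20 m resolution, while the MMW of 10 m equals a single 10 m pixel.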
Each object (delineated polygon) in CLC-Backbone will be encoded according to a quite basic land
cover nomenclature (between 5 and 15 pure land cover classes, in line with EAGLE Land Cover
Components) and additionally characterized, depending on the user requirements, by a number of
attributes (e.g. NDVI time series) giving more detailed information about the land cover and its
dynamics inside the polygon.
4.2 Reference year
The production of CLC-Backbone shall generally make use of images from the year 2017/2018, i.e.
Sentinel-2 HR layer stack. It is suggested that the image stack covers one full vegetation season
ranging from 2017 to 2018, noting that the vegetation season in the south of Europe starts already
in October of the previous year.
4.3 EO data
The Sentinel-2 data for the production of CLC-Backbone will need to make use of all available
observations from the ESA service hubs to provide a multi-spectral, multi-date, 10-20 m spatial
resolution imagery as input. As the full Sentinel-2 satellite constellation (2A and 2B) will be
available operationally from late 2017 onwards, it is anticipated that the last month of 2017 and the full year
2018 will form the first full vegetation period for a comprehensive multi-temporal coverage of Europe.
In case of availability of a full European-wide coverage of VHR data for 2018, its integration in the
processing chain should be considered.
The synergistic use of Sentinel-1 (SAR) imagery shall be considered for enhancing the thematic
information, especially on thematic issues like soil properties (wetness), tillage and harvesting
activities. Recent studies on the use of merged optical-SAR imagery as well as SAR visual products
and the experiences in HRL production have confirmed their applicability for both semi-automated
and visual interpretation.
4.3.1 Pre-processing of Sentinel-2 data
As Sentinel-2 is an optical system the average cloud coverage influences the number of observations
significantly. Nowadays scenes are not ordered anymore according to their average cloud coverage,
but all cloud free (and shadow-free) pixels of an image can be analysed. The constellation of both
Sentinel-2 satellites improves the revisiting time to 3-4 days on average in Europe, having a more
frequent coverage in the north according to the overlaps of the S-2 path-footprints.
Each scene has to be pre-processed to correct for atmospheric conditions. ESA has evaluated two
different atmospheric correction algorithms in order to produce a L2A product:
• Sen2Cor and
• MAJA
The Sen2Cor algorithm identifies clouds and shadows in a single scene-by-scene approach,
whereas MAJA (a combination of MACCS (CNES/CESBIO) and ATCOR) identifies clouds and shadows
according to the complete image stack over time.
ESA will decide by the end of 2017 which algorithm and which data processing will be implemented for
atmospheric correction. Current plans foresee to start atmospheric correction using MAJA in Europe
from 1/2018 onwards and to reassess the algorithms in 2019.
The cloud detection is an important element in the processing chain, but for northern countries the
typical cloud coverage may still be a very limiting factor. Therefore, either archive data from longer
periods or data from additional sensors (optical and radar) should be considered to fill gaps due to
cloud coverage.
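A simple way to locate such gaps is to count, for every pixel, the number of cloud-free and shadow-free observations within the observation period. The sketch below assumes that one boolean cloud/shadow mask per acquisition is available from the L2A processing; the function names and the threshold `min_clear` are hypothetical choices for illustration, not values from the specification.

```python
def clear_counts(cloud_masks):
    """Count clear observations per pixel.

    cloud_masks: list of 2-D lists (one per acquisition),
                 True where the pixel is flagged as cloud or shadow.
    """
    rows, cols = len(cloud_masks[0]), len(cloud_masks[0][0])
    return [
        [sum(not mask[r][c] for mask in cloud_masks) for c in range(cols)]
        for r in range(rows)
    ]


def needs_gap_filling(cloud_masks, min_clear=3):
    """Flag pixels with fewer than `min_clear` clear observations (assumed threshold)."""
    return [[n < min_clear for n in row] for row in clear_counts(cloud_masks)]


# Three acquisitions over a 1 x 2 pixel area; the second pixel is always cloudy
masks = [[[False, True]], [[False, True]], [[True, True]]]
counts = clear_counts(masks)
gaps = needs_gap_filling(masks, min_clear=2)
```

Pixels flagged in this way are the candidates for archive data from longer periods or for data from additional sensors.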
4.4 Copernicus and European ancillary data sets
There are a number of potential input sources beyond EO image data that can be used in the
delineation of landscape objects. Table 4-1 shows the potential datasets that were analysed to form
the majority of the “hard bones” or persistent boundaries in the landscape. Only those data can be
considered that provide an adequate spatial resolution. Those data that are finally suggested to
contribute to CLC-Backbone are marked in green, whereas all other data will be used to attribute the
thematic content of the GRID-database in CLC-Core.
Geospatial data (especially on roads, rivers, buildings, LPIS and land cover/land use) are in many
cases held at national level in a higher level of detail. Some of the European data might be
substituted by national data given that the criteria concerning technical and licensing issues as
described in chapter 0 are met.
Besides free and open data on European level, commercial data could also be considered as input,
e.g. street data (e.g. Navmart – HERE maps, etc.) or the TanDEM-X DEM for small woody features
and structure information for tree covered areas.
Table 4-1: Overview of existing Copernicus land monitoring and other free and open products which were analysed as potential input to support the
construction of the geometric structure (Level 1 “hard bones”) of the CLC-Backbone. The products finally proposed for further processing are marked in
green.
Product | MMU | Minimum width | Format | Potential use for constructing basic geometry | Reference year
Pan-European | | | | |
CORINE Land Cover and accounting layers | 25 ha status layer, 5 ha change layer | 100 m | Vector | Outlines of basic vegetation types in remote areas without roads and settlement network | 2012, 6-year update cycle
HRL imperviousness | 20 m pixel (0,04 ha) | - | Raster | 20 m HR satellite imagery | 2015, 3-year update cycle
HRL tree cover density | 20 m pixel (0,04 ha) | - | Raster | | 2015, 3-year update cycle
HRL forest types | 0,5 ha | - | Raster | Minimum crown cover 10% | 2015, 3-year update cycle
HRL Permanent water bodies | 20 m | - | Raster | 20 m HR satellite imagery | 2015, 3-year update cycle
European Settlement Map (ESM) | 10*10 m pixel (0,01 ha) | - | Raster | JRC one-off scientific product: 2.5 m VHR imagery; scientific product reference year 2012 | 2014, no update foreseen
GUF+ | 10*10 m | | Raster | S1 10 m and Landsat 30 m, GUF+ 2018 will be based on S1/S2 | 2014-15
HRL Small woody feature | 0,002 ha (raster product) | Linear elements | Vector and Raster (5 m and 100 m) | Streamlining halted due to VHR 2015 concerns | 2015
HRL phenology | 10 m | - | Raster | Only attribution in CLC-Core | 2018, likely 3-years
Local Components | | | | |
Urban Atlas | 0,25 ha – 1 ha | 10 m | Vector | Delineation of roads as polygon feature and outer border of settlement structure (large cities) |
Riparian Zone | 0,5 ha | 10 m | Vector | Delineation of rivers and roads as polygon feature. |
N2K product | 0,5 ha | 10 m | Vector | |
RPZ Green linear elements | 0,05 ha – 0,5 ha | < 10 m width, < 100 m length | Vector | Delineation of small woody vegetation elements |
National products | | | | |
Variety of products for land cover, land use, population, environmental variables | | | | Aggregated data according to EAGLE data model | irregular
Accompanying/Ancillary Layers | | | | |
IACS/LPIS | Varying from block to parcel level, depending on national system | - | Vector | Freely available and accessible for less than 1/3 of Europe. Increasing availability from year-to-year due to INSPIRE regulation. Varying thematic content (from reference parcel only to detailed crop types) | 2017, Yearly updates
Open Street Map - roads | n.d. | | Line | Linear transportation network (centre lines) | Up-to-date; full time history
Open Street Map – buildings | n.d. | | Vector - Polygon | Delineation of single buildings (polygons) | Up-to-date; full time history
Open Street Map – land use | TBD. | | Vector - Polygon | Polygon selection of settlement areas and transportation infrastructure according to OSM land use tags http://osmlanduse.org | Up-to-date; full time history
HERE maps (Navmart) | n.d. | | Line | Commercial layer to replace OSM or national road database | Regular updates
EU Hydro (Copernicus reference layer) | | | Line | Geometric quality derived from VHR images, very high location accuracy | Tbd.
WISE WFD – surface water bodies | Waterbodies as polygons, Waterbodyline as line geometry | | Polygon + line | Based on WFD2016 reporting (UK and Slovenia only for viewing) | 6 years (next update 2022)
European coastline | | | Line | Has been produced by Copernicus for HRL production. |
Crowd-sourced data (citizen observatories) | n.a. | | Point | Observations from citizens as promoted through Horizon 2020 research initiatives (e.g. LandSense, groundtruth 2.0) | Irregular
Selected classes of the Local Component products, Urban Atlas, Riparian Zones, and Natura 2000,
obviously have valuable information to offer for the production of CLC-Backbone. The HRL layers will
be mainly used for populating CLC-Core and, only as an exception, also in CLC-Backbone.
The usage of additional geospatial information for constructing Level 1 “hard bones” is based on the
following arguments:
• The European landscape is a highly anthropogenically transformed landscape, where transport and
river networks (among others) form the basic subdivision of the landscape
• The location of transport and river networks is fairly well known due to geospatial data on
national and/or European scale
• Many of the borders defined by transport and river networks can also be identified as
features directly from Sentinel-2 images, but this identification comes at partly
very high cost and lower accuracy
The geospatial datasets are used in the delineation of Level 1 “hard bones” in two different forms:
• As linear networks that define the border of landscape objects (e.g. parcels that are
surrounded by roads and rivers)
• As polygons that define a-priori landscape objects (e.g. wide roads, wide rivers)
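The distinction between the two forms can be sketched as a simple rule on feature width, following the treatment of roads wider and narrower than 10 m proposed for OSM roads in this chapter. The function name and the example records are hypothetical, and the handling of a width of exactly 10 m (treated as a border here) is an assumption, as the text does not specify it.

```python
MMW_M = 10.0  # proposed minimum mapping width, still subject to consultation


def hard_bone_role(width_m, mmw_m=MMW_M):
    """Return how a linear feature enters the Level 1 geometry.

    Features wider than the MMW become a-priori polygons; narrower ones
    only define the border between adjacent landscape objects.
    """
    return "polygon" if width_m > mmw_m else "border"


# Hypothetical feature records: (name, width in metres)
features = [("motorway", 28.0), ("local road", 6.0), ("wide river", 45.0), ("stream", 3.0)]
roles = {name: hard_bone_role(width) for name, width in features}
```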
Concretely, we suggest the following use of existing Copernicus and ancillary data for the
production of CLC-Backbone (i.e. mainly for the provision of geometric information) and CLC-Core
(i.e. thematic related information).
For more detailed information on the suggested production methodology, please refer to chapter
4.8.
• CLC-Backbone:
o Urban Atlas:
▪ Use of outer delineation of linear transport infrastructure (class 122xx roads
and class 122) for Level 1 “hard bone” delineation
▪ Use of outer delineation of water areas (class 50000 water) for Level 1
“hard bone” delineation
▪ Use of outer delineation of cities as “persistent” segments (Level 1) for the
delineation of settlement areas (class 111xx continuous urban fabric, class
112xx discontinuous urban fabric, 11300 isolated structures and class 12100
industrial and commercial units).
o Riparian Zones:
▪ Use of river delineation (class 911 interconnected water courses and 912
highly modified water courses and 913 separated water bodies) for Level 1
“hard bone” delineation.
▪ Use of outer delineation of cities as “persistent” segments (Level 1) for the
delineation of settlement areas (class 1111 continuous urban fabric, class
1112 dense urban fabric, 1113 low density fabric, 1120 industrial and
commercial units).
o Natura 2000
▪ Using, in analogy to RZ and UA, the same classes for delineation of rivers,
transport infrastructure and urban outline.
o HRL forest types:
▪ Creation of tree mask based on the HRL forest types, as this layer already
applies a useful tree cover density threshold (>30% TCD) and a MMU of 0,5
ha for the Level 1 “hard bone” delineation of forest boundaries.
o OSM – roads
▪ Roads > 10 m: delineation of roads as polygons according to S-2 pixel based
information
▪ Roads < 10 m: transportation network outside the LoCo as a-priori border
between adjacent polygons (Level 1 object)
o EU-Hydro
▪ River network outside the LoCo as a-priori border between adjacent
polygons (Level 1 object)
o HRL imperviousness
▪ Derivation of outer boundary of settlements (Level 1 objects)
▪ Refined with ESM – European settlement map
o OSM – buildings
▪ Enhancement of settlement mask using buffer methods to delineate and
integrate OSM single buildings
o OSM – land use
▪ Enhancement of road and railway infrastructure using polygon area from
land use dataset with specific OSM tags (e.g. “roads”); full OSM tag-list
available in Annex 1
▪ Enhancement of settlement mask in areas where OSM-building is not
complete, using polygon area from land use dataset with specific OSM tags
(e.g. “residential”); full OSM tag-list available in Annex 1
• CLC-Core:
o National data (MS contribution)
▪ National datasets (generalized / resampled if not freely available in original
resolution)
▪ Manual input by MS without readily available sufficient national datasets
▪ Land Use data
▪ Crop types
▪ Land Use intensity
o Specific data input to be defined for CLC-Core to maintain CLC-heritage and to
improve thematic content (e.g. detailed LU, LC or habitat information).
o UA, RZ, N2K
▪ Use of thematic information (aggregated to one single harmonized
nomenclature) for populating the grid database.
o all HRLs
▪ Use of thematic information (assigned to EAGLE land cover components) for
populating the grid database.
o OSM – land use
▪ Attribution of land use according to existing cross-walk between OSM land
use tags and CLC nomenclature Level 2
o Crowd-sourced data
▪ Data from citizen observatories become increasingly available either as
validation/verification dataset (e.g. LacoWiki) or as attributive data source
(e.g. FotoQuest GO app, specific campaigns and social media like Flickr,
Foursquare, etc.)
o CLC accounting layers
▪ Data from CLC used for identification of thematic features not covered by
any other (finer resolution) datasets, e.g. mineral extraction areas, airport,
glaciers.
o CLC
▪ Outlines of basic LC types in remote areas without road and settlement
network and without forest
▪ Mineral extraction sites
o EO parameters (from Sentinel-2 image stacks)
▪ This includes daily vegetation trajectories as well as key indicators for the
characterization of the growing season (start day, end day) and biological
productivity (maximum, amplitude, integral)
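The EO parameters named in the last item can be illustrated with a minimal sketch that derives them from a vegetation index trajectory of one object. The specification does not prescribe a method; the definition of season start and end as the first and last day on which the index reaches a given fraction of the amplitude, the trapezoidal integral and the function name are assumptions made here for illustration.

```python
def season_metrics(days, ndvi, threshold=0.5):
    """Derive growing season and productivity indicators from an NDVI trajectory.

    days:      sorted day numbers of the observations
    ndvi:      NDVI values matching `days`
    threshold: fraction of the amplitude that defines the season (assumed)
    """
    maximum, minimum = max(ndvi), min(ndvi)
    amplitude = maximum - minimum
    level = minimum + threshold * amplitude
    in_season = [d for d, v in zip(days, ndvi) if v >= level]
    # Trapezoidal rule over the whole trajectory
    integral = sum(
        (d1 - d0) * (v0 + v1) / 2.0
        for d0, d1, v0, v1 in zip(days, days[1:], ndvi, ndvi[1:])
    )
    return {
        "start_day": in_season[0],
        "end_day": in_season[-1],
        "maximum": maximum,
        "amplitude": amplitude,
        "integral": integral,
    }


# Hypothetical trajectory with observations every 10 days
metrics = season_metrics([0, 10, 20, 30, 40], [0.2, 0.6, 0.8, 0.6, 0.2])
```

With real Sentinel-2 stacks the trajectory would first be gap-filled and smoothed to obtain daily values; the indicators are then read from the smoothed curve in the same way.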
Any geometric information ingested in CLC-Backbone from existing products (partially 3-6 years
older than current RS data) will be “cross-checked” and corrected during the image segmentation
process. Every geometry that is outdated due to the time lag of the ancillary data - and where recent
land cover changes occurred - is updated in the phase of the Level 2 “soft bone” segmentation,
where actual EO data are used.
4.4.1 Completeness of OSM road data
One major criticism of crowd-sourced data is its incompleteness and varying quality. For the
OSM road dataset, recent scientific reviews (e.g. Barrington-Leigh and Millard-Ball, 2017, see footnote 5) have
shown a completeness in Europe of well over 96% for the large majority of the EEA 39
countries (the figures for the completeness per country are available in the Annex). According to
their study only Serbia, Bosnia-Herzegovina and Turkey have completeness values below 90%. These
three countries may require a refinement of roads using additional national data sources.
5 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0180698
Figure 4-3 below gives an impression of the completeness of OSM in four countries (Estonia,
Portugal, Romania and Serbia), based on aerial images and Sentinel-2 background data.
Figure 4-3: Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Nova), Romania
(near Parta) and Serbia (near Indija) (top to bottom). © Bing/Google and EOX Sentinel-2 cloud free services as
background (left to right).
4.5 National data
The use of national data replacing the foreseen European geospatial data for the creation of the CLC-
Backbone could be considered if the national data fulfil the following prerequisites:
• Provision of a consistent set of information across European countries, i.e. similar scale and
amount of geometric detail, comparable thematic information,
• Provision of data at no cost (or at least at reasonable cost, if the cost reduction in the overall
process justifies these costs)
• Provision of data in a technically useful form (i.e. roads as a connected network, not as DXF-
like line work for drawing).
• Provision of data to industrial service providers, potentially outside the country of the data
provider.
• Support of the free and open data policy of the end product, including the nationally
provided input data.
• Deficiencies in European dataset concerning completeness and fitness for purpose in the
respective country
In case these prerequisites are fulfilled, the national data can be included based on an evaluation of
economic (e.g. work load for creation of harmonized datasets) and technical criteria.
Candidates for national data (to be used for Level 1 “hard bone” delineation) are first of all the
datasets that fall under INSPIRE Annex I and should be available in a harmonized version from
November 2017 onwards:
• Transportation network (including information on tunnels)
• Buildings
• Water courses (running and standing water)
• Coastline
4.5.1 Example: Rivers and lakes
One example of a joint European (EU 28) dataset is the reference spatial dataset of the Water
Framework Directive (WFD), which is part of the Water Information System for Europe (WISE; see
footnote 6). It compiles national information reported by the EU Member States into one homogeneous
dataset. In line with the WFD reporting cycle, which requires an update every 6 years, the dataset is
available in a 2010 and a 2016 version. Such datasets may provide a valuable data source for the
generation of “hard bones”, but still do not provide the optimal solution. The coverage of the WISE
WFD dataset is in principle restricted to EU 28, and it is not even complete within EU 28, as
countries like UK, Ireland, Slovenia and Lithuania are not included yet (UK and Slovenia are only
available to EEA and EC for visualisation purposes). Compared to the EU-Hydro dataset, it provides
a higher geometric quality but is less complete, as smaller rivers with catchments of less than
10 km² are not included by definition.
6 https://www.eea.europa.eu/data-and-maps/data/wise-wfd-spatial
4.5.2 Example: roads
The ELF project (European Location Framework; see footnote 7) has compiled overviews of existing
national services, mainly for INSPIRE Annex I themes (roads, buildings, hydrology). It aims to serve
as a single point of access for harmonised reference data from National Mapping, Cadastre and Land
Registry Authorities. The INSPIRE regulation will certainly lead to an increased availability and
accessibility of European reference data, but at the current point in time the data situation even
for core national data is still quite patchy and requires large efforts for searching, finding,
compiling and harmonizing (semantically and geometrically) the various datasets. The advantage of
using national data instead of
European wide available data has to be carefully evaluated with regard to costs and benefit. First of
all, it is still a challenge to find the appropriate dataset, and even specialised projects like ELF that
would like to serve as a single-entry point do not manage to provide the full picture. In the case of
roads, according to the ELF project, only Iceland, Norway, Finland, Denmark and the Czech Republic
provide adequate services (Figure 4-4). It is very likely that other national services exist but
cannot be found easily. This has to be considered when replacing data sources that are available
Europe-wide with national data.
Figure 4-4: Search result for the availability for national INSPIRE transport network (road) services
using the ELF interface (Nov. 2017).
4.5.3 Example: land parcel identification system (LPIS)
The land parcel identification system (LPIS) in Europe is not suited to be integrated into the Level 1
“hard bone” process, as the data are currently
• not available at national level (approx. 50%)
• partly available only at regional level (Spain, Germany)
• not at the same spatial scale across countries
• not provided with the same semantic information (e.g. grassland/arable land
differentiation)
• not accessible for all users / uses
7 http://locationframework.eu/
Figure 4-5 illustrates the availability of LPIS datasets across Europe. Availability in this sense means
the possibility to physically download the GIS data in the respective country / region.
Figure 4-5: Overview of available GIS datasets LPIS and their thematic content © Synergise, project
LandSense.
4.6 Specification for geometric delineation
The vector geometry of the CLC-backbone is derived using a two-step approach (Table 4-2).
• The first level (“hardbones”) is derived using the partition of landscape based on existing
geospatial information (transport network, river network, urban outline)
• The second level (“softbones”) is derived by applying an automatic image segmentation
within the first level objects
The resulting polygons have to comply with the MMU and MWU and will be further attributed
according to the thematic specification (chapter 4.7).
Persistency in landscape is always a relative term, as landscape objects are subject to various
temporal dynamics. Within the traditional CORINE Land Cover mapping approach changes were
perceived as changes of land cover (change between two categories of the defined nomenclature)
within a 6-year period – such changes can be referred to as “status changes”, meaning a change
between a specific number of years, but not the changes within a selected year. However, in parallel
changes occur in much higher temporal frequency e.g. in agricultural management. Those changes
are referred to as “cyclic changes” and represent frequent land cover changes mostly within one
year or one vegetation cycle (e.g. on arable fields from bare soil to vegetated areas and vice versa).
Due to the recent availability of multi-temporal observations in the order of 1-2 weeks, these kinds of
changes can be observed on an appropriate level of scale (e.g. farm management unit).
Therefore, within CLC-Backbone those features of the landscape are defined as persistent objects
that are very unlikely to be altered within one year. These structures (roads, settlements) are very
unlikely to be removed, but may be subject to expansion (linear networks: roads, railways and rivers;
physical boundaries like urban outline). A robust reference geometry (“hardbones”) built up of
transportation network, river network (incl. lakes) and urban outline (and potentially forest outline)
will form the first level of division of landscape based on a-priori data.
Table 4-2: Main characteristics of Level 1 objects (hard bones) and Level 2 objects (soft bones)
| | Level 1 objects | Level 2 objects |
|---|---|---|
| Name | Hard bones | Soft bones |
| Method | Partition of landscape according to existing spatial data | Image segmentation |
| Input data | Persistent landscape features: transport infrastructure (roads, railways excluding tunnels); river network, lakes, coast line, urban outline, forest outline | Sentinel-2 complete time-stacks of one vegetation period |
| Special issue | Differentiation between linear networks as border of segments and (larger) linear networks that comprise polygons | Pre-processing (atmospheric and topographic) of S-2 images; cloud detection and cloud free observation in Nordic countries |
| Advantage | Very good cost/benefit ratio; no doubling of efforts, reuse of existing European geospatial data infrastructure | Independent data source; clearly defined observation period |
| Disadvantage | Actuality of data might be outdated; large vector processing facilities needed; effort to integrate and collect appropriate data | Reliability and reproducibility of segmentation algorithms |
| Concerns | European data vs. national data | Technical feasibility (big data processing); cooperation with data centres necessary |
4.6.1 Level 1 Objects – hardbones
The idea behind the Level 1 objects (“hardbones”) is to derive a partition of landscape that is
reflecting relatively stable (persistent) structures and borders. The borders of this landscape
partition are built from anthropogenic and natural features. The transportation network, such as roads
and railways (without tunnels), forms the major component, followed by the partly natural and partly
man-made structures of rivers and lakes. Other features like the borders of settlements and forest are not
as persistent as roads and rivers, but give an additional partition of landscape that is fairly known
through existing geospatial data. As geospatial data can be outdated depending on the reference
year the hard bone partition of landscape will not be perfect. But it does not have to be perfect, as it
is only an initial step and missing elements or correction of outdated polygons will be integrated
within the second level (soft bones) of the production process, as the second level is based on actual
and independent image segmentation. The a-priori partition of landscape is promoted here because of
its better cost-benefit ratio.
If image segmentation is used from the very beginning (without the “hard bone” working step),
much more effort is needed to derive already known features. Moreover, some of the features (small
roads) are not recognisable at a 10 by 10 m pixel scale, which leads to polygons that are not
acceptable for most of the foreseen applications.
When building the spatial framework, there is a long list of potential sources for persistent object
geometries. Some can be derived from existing Copernicus products (such as the HRL
imperviousness and Urban Atlas) or European reference layers (EU-Hydro), others from additional data
sources like crowd-sourced data (OSM) or non-operational scientific products (e.g. European
settlement mask - ESM). National data may replace the named input data in case the criteria
mentioned under chapter 4.5 are met.
From the long list of input sources, the above-mentioned optimal selection (see Table 4-1) and the most
appropriate integration process are chosen to provide the best available spatial framework at
pan-European level. A limited visual inspection and correction of the input data in this phase is necessary
for all data that do not comply with the requested accuracies (10m positional accuracy). As a priority,
the wider OSM roads (European major highways and roads wider than 10m) and the accuracy of the
delineation of the local components have to be analysed.
4.6.1.1 Types of borders
The first step will therefore be to build the Level 1 objects (“hard bones”) of the geometric borders
from the ancillary data capturing the persistent features of the landscape such as road, rivers,
railway lines etc. The information source for the hard bones will be existing vector data for roads,
railroads (OSM) and river networks (CCM / Riparian Zones), supported by ancillary datasets (OSM,
HRL-imperviousness, ESM) and Copernicus Local components for outer boundaries of settlement
areas and HRL for tree covered areas (HRL forest types).
Technically speaking the first level objects will identify both
• persistent line objects and
• persistent polygon objects
It is important to note that in this stage only geometric information is taken over from existing
geospatial datasets, but without any thematic information. The information, if a specific border line
constitutes a road, river or urban border is irrelevant for the further process in CLC-backbone, as the
thematic classification is derived in the last step completely independently from any existing
geospatial input data, solely based on the spectral and textural information from the Sentinel-2
images.
The majority of the transportation network (roads and railways) as well as smaller rivers are
represented as a line network, where each line identifies the centreline of the object and represents a
border of a Level 1 landscape object.
Roads and rivers wider than the defined MMW of 10 m (still to be defined through the consultation
process) have to be represented as polygons.
• Rivers: For wider rivers, the delineation in the Riparian Zone 2012 is considered complete
and quite accurate within Europe, as the dataset covers all rivers above Strahler level 3 (and
will be extended to Strahler level 2 until Q4/2018).
• Roads and railway: As a normal 2-lane road does not meet the requirement of being wider
than 10m the selection of roads as polygons is restricted to the primary road infrastructure
(highways). All OSM-roads with the special tags “motorway” and “trunk” have to be
transformed from line features to polygon features using a mixed automated buffering
method combined with a visual inspection and correction step based on the SENTINEL-2
data. Roads and railways within tunnels have to be excluded from the hardbone, as they do
not establish a recognisable landscape feature on the ground.
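The selection rule for roads described above can be sketched as a simple tag filter. This is a minimal illustration, not the specified production workflow: the feature structure (a list of dicts with a "tags" mapping) and the function name `split_roads` are assumptions, and only the common OSM tagging `highway=motorway`, `highway=trunk` and `tunnel=yes` is considered. The buffering to polygons and the visual correction step are not shown.

```python
# OSM "highway" values that the text above assigns to polygon representation.
POLYGON_ROAD_CLASSES = {"motorway", "trunk"}


def split_roads(features):
    """Split OSM-style road features into polygon and line candidates.

    features: list of dicts, each with a "tags" dict (hypothetical structure).
    Features tagged tunnel=yes are dropped, as tunnels are excluded from the
    hard bones. Other tunnel values are not handled in this sketch.
    """
    as_polygon, as_line = [], []
    for feature in features:
        tags = feature["tags"]
        if tags.get("tunnel") == "yes":
            continue
        if tags.get("highway") in POLYGON_ROAD_CLASSES:
            as_polygon.append(feature)  # to be buffered into a polygon later
        else:
            as_line.append(feature)     # centreline acts as a border
    return as_polygon, as_line


roads = [
    {"id": 1, "tags": {"highway": "motorway"}},
    {"id": 2, "tags": {"highway": "residential"}},
    {"id": 3, "tags": {"highway": "trunk", "tunnel": "yes"}},
]
polygons, lines = split_roads(roads)
print([f["id"] for f in polygons], [f["id"] for f in lines])
```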
Besides the typical linear networks, further persistent features in the landscape are the outer
borders of settlements and the forest borders.
• Settlement borders: A best estimate of the outer borders of settlements can be derived by
combining existing datasets: first of all the LoCo Urban Atlas and, outside the coverage of
the LoCo Urban Atlas, the HRL imperviousness (refined with the ESM) in
combination with a building layer (e.g. OSM). Morphological operations are needed to adapt
the combined input layer to comply with the MMUs defined in the CLC backbone product
specification and to transform the mostly vector oriented input data to a raster dataset in
line with the Sentinel-2 grid system.
• Forest borders: For the partition of landscape through forest borders all forest polygons
from the LoCo product suite (UA, RZ, N2K) can be used. Outside of the LoCo the HRL forest
types (FTY) layer will be used, as it already applies a specific forest definition with a density
threshold of >30% and a MMU of 0.5 ha.
4.6.1.2 Hierarchical order of borders
As the input borders partly overlap it is important to define a relative weighting and importance of
the input layers. The ordering is derived from the quality, type and geometric representation (line or
polygon) of the input data. The following order is specified (with increasing importance):
• HRL forest type (as polygon)
• LoCo forest classes (as polygon, from UA, RZ, N2K)
• Settlement outline (as polygon, from HRL imp, ESM, OSM)
• Road + railway (as line, excluding tunnels)
• Rivers (as line)
• Road + railway (as polygon)
o from UA …higher level streets
o from UA …lower level streets
• Rivers (as polygons)
o from RZ (code 911)
• Lakes (as polygons, from EU-hydro, WFD, national data)
• Coastline (as line)
• AoI (as line)
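Where several input layers propose a border at the same location, the ordering above can be applied as a priority lookup. The following is a minimal sketch under assumptions: the short layer labels and the function `resolve_border` are hypothetical names introduced here for illustration, and the sub-levels for Urban Atlas streets are collapsed into one entry.

```python
# Priority list following the ordering above (increasing importance).
# The labels are illustrative names, not identifiers from the specification.
BORDER_PRIORITY = [
    "hrl_forest_type_polygon",
    "loco_forest_polygon",
    "settlement_outline_polygon",
    "road_rail_line",
    "river_line",
    "road_rail_polygon",
    "river_polygon",
    "lake_polygon",
    "coastline_line",
    "aoi_line",
]
RANK = {name: position for position, name in enumerate(BORDER_PRIORITY)}


def resolve_border(candidates):
    """Return the candidate source layer with the highest importance."""
    if not candidates:
        raise ValueError("at least one candidate source is required")
    return max(candidates, key=RANK.__getitem__)


winner = resolve_border(["hrl_forest_type_polygon", "river_line", "lake_polygon"])
print(winner)  # lake_polygon
```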
4.6.2 Level 2 Objects – softbones
The aim of this second level segmentation is to derive objects that can be differentiated spatially
according to their land cover and land use aspects that have an influence on the spectral response
and behaviour throughout a year. With the increasing number of observation dates per pixel
throughout a year the likelihood to find homogeneous management units increases, but at the same
time pixel based noise is increased as environmental conditions within a field parcel may vary
significantly throughout a year (e.g. soil moisture content). Service providers will have to balance the
number of acquisitions to maximize the identification of single field parcels and on the other side to
reduce “salt and pepper” effects due to varying spectral response.
The level 2 landscape objects represent in the ideal case a consistent management of field parcels,
and/or objects with a unique vegetation cover and homogeneous vegetation dynamics throughout a
year. As an example, different settlement structures (e.g. single houses or building blocks) should
appear spectrally different (due to the different levels of sealing, shape, spatial extent and spatial
built-up pattern) and will therefore be delineated as different polygons. The actual cultivation
measures applied on the land will also influence the type of delineation as e.g. in agriculture the field
structure is visible in multi-temporal images due to the differences in temporal-spectral response
according to the applied management practices on field level (mowing, ploughing, sowing).
4.6.3 Accuracy of geometric delineation:
The geometric accuracy is defined with
• Appropriate size:
o Too large polygons: not more than 15% of all polygons
o Too small polygons: not more than 20% of all polygons
▪ Any deviation of more than 20% of polygon area is considered too small or
too large
• Appropriate delineation:
o Shift of border: max. 20m shift
▪ not more than 10% of perimeters is shifted more than 20m
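The thresholds above could be checked against reference data along the following lines. This is only a sketch under assumptions: it presumes that each mapped polygon has been paired with a reference polygon and that border shifts were sampled at equal intervals along the perimeter; neither the pairing nor the sampling procedure is defined in this specification, and the function names are hypothetical.

```python
def size_check(mapped_areas, reference_areas, tolerance=0.20,
               max_too_large=0.15, max_too_small=0.20):
    """Compare mapped polygon areas with paired reference areas (same unit).

    Returns (share too large, share too small, passed).
    """
    too_large = too_small = 0
    for mapped, ref in zip(mapped_areas, reference_areas):
        deviation = (mapped - ref) / ref
        if deviation > tolerance:
            too_large += 1
        elif deviation < -tolerance:
            too_small += 1
    n = len(reference_areas)
    share_large, share_small = too_large / n, too_small / n
    passed = share_large <= max_too_large and share_small <= max_too_small
    return share_large, share_small, passed


def shift_check(shifts_m, max_shift_m=20.0, max_share=0.10):
    """shifts_m: border shifts in metres, sampled evenly along the perimeter.

    Returns (share of samples shifted by more than max_shift_m, passed).
    """
    share = sum(s > max_shift_m for s in shifts_m) / len(shifts_m)
    return share, share <= max_share


mapped = [1.0, 1.3, 0.7, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0, 1.0]
reference = [1.0] * 10
print(size_check(mapped, reference))
print(shift_check([5, 12, 25, 8, 3, 0, 19, 20, 21, 4]))
```

In the example, one of ten polygons is too large and one is too small, so the size criteria are met, whereas two of ten perimeter samples exceed 20 m and the delineation criterion is not.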
4.7 Specification for thematic attribution
The polygons (hard bones and soft bones) derived from the geometric specification are attributed
according to a set of simple land cover classes. There are various options to attribute the thematic
content of the polygons:
1. Option: Attribution with basic land cover classes based on a pixel-based classification (10m
or 20m) of all pixels within the object
2. Option: Attribution with basic land cover classes based on spectral properties of the object
itself
3. Option: combination of Option 1 and Option 2
Option 1: pixel based classification
Similar to the HRLs a pixel based classification of crisp land cover classes (no intensities) is conducted
based on the SENTINEL-2 images. The number and type of land cover classes is discussed below. The
object (polygon) is attributed with various statistics derived from the pixel based classification
• Dominating class (majority)
• Percentage of class i…j
• Ordered list of class percentages (3-5 most dominating classes)
• Count of regions per class i…j (one region is one spatially homogeneous area with only one
class type)
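The object statistics listed above can be derived from a per-pixel classification as in the following minimal sketch. It assumes, for simplicity, that the pixels of one object are available as a small rectangular grid of class labels and that regions are 4-connected; the function names and the class labels in the example are illustrative only.

```python
from collections import Counter, deque


def count_regions(grid):
    """Count 4-connected regions of identical class, per class."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = Counter()
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            label = grid[r][c]
            regions[label] += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # flood fill over neighbours of the same class
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return dict(regions)


def object_statistics(grid, top_n=5):
    """grid: 2D list of class labels for the pixels of one object."""
    pixels = [label for row in grid for label in row]
    counts = Counter(pixels)
    percentages = {c: 100.0 * n / len(pixels) for c, n in counts.items()}
    ranked = sorted(percentages.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return {
        "majority": ranked[0][0],
        "percentages": percentages,
        "ranked": ranked[:top_n],
        "regions": count_regions(grid),
    }


grid = [
    ["veg", "veg", "water"],
    ["veg", "bare", "water"],
    ["bare", "bare", "veg"],
]
stats = object_statistics(grid)
print(stats["majority"], stats["regions"])
```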
Option 2: Object based classification
The objects (polygons) are classified using the same crisp land cover classes as above, but not on a
per-pixel base, but based on the spectral values of the object itself. Statistical parameters like mean
value, standard deviation and combination of bands (e.g. NDVI) are helpful to define the class.
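As an illustration of such object-level parameters, the sketch below computes the mean and standard deviation of two bands and the mean NDVI for the pixels of one object. It assumes surface reflectance values for a red and a near-infrared band (for Sentinel-2, bands 4 and 8) and non-zero band sums; the function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def object_features(red, nir):
    """red, nir: reflectance values of all pixels inside one object."""
    # NDVI = (NIR - Red) / (NIR + Red), computed per pixel.
    ndvi = [(n - r) / (n + r) for r, n in zip(red, nir)]
    return {
        "red_mean": mean(red),
        "red_std": pstdev(red),
        "nir_mean": mean(nir),
        "nir_std": pstdev(nir),
        "ndvi_mean": mean(ndvi),
    }


features = object_features(red=[0.05, 0.05], nir=[0.45, 0.45])
print(features)
```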
Option 3: combined pixel based and object based classification
Combination of Option 1 and Option 2. In Option 3 a polygon is described using the pixel-based
approach and, in addition, a classification of the polygon itself.
4.7.1 Thematic classes
Accuracy:
The accuracy for the thematic differentiation is defined with
• Option 1: 5 classes
o 90% pixel based overall accuracy
o 95% object based overall accuracy
• Option 2: 9 classes
o 85% pixel based overall accuracy
o 90% object based overall accuracy
Thematic detail:
After obtaining the hard bones and soft bones of the geometric skeleton, each single landscape
object is then labelled by simple land cover classes. The descriptive nomenclature which is applied
for the labelling contains the following EAGLE derived Land Cover Components:
• Option 1: 5 classes
1. water,
2. permanent vegetation,
3. transient vegetation,
4. bare ground / sealed surface,
5. snow / ice
• Option 2: 9 classes
1. Sealed surface (buildings and flat sealed surfaces)
2. Woody – coniferous
3. Woody – broadleaved
4. permanent herbaceous (i.e. grasslands)
5. Periodically herbaceous (i.e. arable land)
6. Permanent bare soil
7. Non-vegetated bare surfaces (i.e. rock and screes, mineral extraction sites)
8. Water surfaces
9. Snow & ice
Remark to class woody:
Member countries have expressed the wish to further differentiate the class woody into “trees” and
“shrubs”. This differentiation would be useful but can currently not be derived with sufficient
accuracy from spectral attributes alone. A robust differentiation is only possible by using a
normalized digital surface model (nDSM).
In the CLC-Backbone nomenclature code list no land use terminology has been used. This is on
purpose to avoid semantic confusions between land cover and land use. From a purist’s view point
only land cover can be derived from remote sensing data. However, due to the multi-temporal
observations a number of land use and land management characteristics can be derived as well.
Nevertheless, for land cover semantic terms like “grassland” or “arable land” have to be avoided. We
therefore suggest to characterize landscape objects (Level 2 polygons) according to the classes that
are characterized by land cover definitions only. This is especially true for agricultural areas, where a
differentiation of arable and grassland areas is principally a key element. What can be observed
using RS images is however only if within one year a specific agricultural parcel is ploughed; as
ploughing means a reduction of biomass and the occurrence of bare soil. Therefore, ploughed
parcels can be differentiated from non-ploughed parcels as the bare soil count is a good indication
for the separation of these two classes:
• permanent herbaceous
o Permanent herbaceous areas are characterized by a continuous vegetation cover
throughout a year. No bare soil occurs within a year. These areas are either
unmanaged or extensively managed natural grasslands or permanently managed
grasslands, or arable areas with a permanent vegetation cover (e.g. fodder crops) or
even set-aside land in agriculture. For managed grasslands, the biomass will vary
over the year, depending on the number of mowing (grassland cuts) or grazing
events.
• periodically herbaceous
o Periodically herbaceous areas are characterized by at least one land cover change (in
the sense of EAGLE land cover components) between bare soil and herbaceous
vegetation within one year. Depending on the management intensities these areas
can have up to several changes between these two EAGLE land cover components
within a year. Normally these areas are managed as arable areas.
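The distinction between the two classes can be sketched as a rule on the within-year sequence of land cover observations. The sketch assumes that each observation date of a parcel has already been labelled as either "bare" or "herbaceous"; how these labels are obtained from the imagery is not covered here, and the function name is hypothetical. A parcel that is bare on all dates is reported as permanent bare soil, in line with the 9-class option.

```python
def classify_herbaceous(observations):
    """observations: chronological labels of one parcel within one year,
    each either "bare" or "herbaceous".

    Returns (class label, number of changes between the two components).
    """
    if not observations:
        raise ValueError("at least one observation is required")
    changes = sum(1 for a, b in zip(observations, observations[1:]) if a != b)
    if "bare" not in observations:
        label = "permanent herbaceous"      # no bare soil within the year
    elif changes >= 1:
        label = "periodically herbaceous"   # at least one change
    else:
        label = "permanent bare soil"       # bare on every date
    return label, changes


print(classify_herbaceous(["herbaceous"] * 6))
print(classify_herbaceous(["bare", "bare", "herbaceous", "herbaceous", "bare"]))
```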
A permanent and managed grassland as defined in IACS/LPIS may be ploughed every 3-5 years
for amelioration purposes, followed by an artificial seeding phase and a renewal of the vegetation
cover. Thus, a managed grassland may even show a phase of bare soil within a time frame of 5-6
years. This has to be considered when evaluating shorter time phases, as a managed
grassland may by chance be subject to renewal within the observation period of 1 year. Such
grassland can be differentiated from arable areas by the much higher frequency of ploughing events
on arable land (ranging from more than once a year to a minimum of 2 times within 3 years).
All kinds of complex classes, either as mixture of existing land cover classes (e.g. CLC-mixed classes) or
complex classes in the sense of ecological highly diverse classes (e.g. wetlands) are avoided by the
classification, as it solely concentrates on the classification according to the surface structure and
properties.
A detailed technical description how to model the temporal changes is given in Annex 1.
Option to be discussed:
In addition to the class labels the following spectral attributes are attached to the landscape objects
(in the data model: parametric observations):
• Vegetation trajectories
o NDVI profiles (e.g. weekly observations) within the vegetation season
• Aggregated indicators for vegetation indices (within vegetation season)
o Median of NDVI
o 20% percentile
o 80% percentile
• Further indicators tbd.
o Band specific indicators
o Time specific indicators
o Texture indicators
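The aggregated NDVI indicators listed above could be computed per object as follows. This is a minimal sketch assuming one NDVI value per observation date within the vegetation season; the percentile definition (linear interpolation, "inclusive" method of the Python standard library) is an assumption, as the specification does not prescribe one.

```python
from statistics import median, quantiles


def ndvi_indicators(profile):
    """profile: NDVI values of one object within the vegetation season."""
    # Deciles of the profile; index 1 is the 20% and index 7 the 80% percentile.
    deciles = quantiles(profile, n=10, method="inclusive")
    return {"median": median(profile), "p20": deciles[1], "p80": deciles[7]}


profile = [0.2, 0.35, 0.5, 0.7, 0.8, 0.75, 0.6, 0.4, 0.3, 0.25, 0.2]
indicators = ndvi_indicators(profile)
print(indicators)
```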
Important note:
As the main emphasis is on the geometric delineation of landscape objects neighbouring objects
may have the same land cover code. They should be kept as unique entities, as they can be
differentiated in specific attributes that may be added in a later stage to this polygon (e.g.
broadleaved trees vs. needle-leaved coniferous trees).
Apart from the technical specifications for the foreseen tender for CLC-Backbone additional
attribution by various data sources is an option for future enrichment of classes. First of all, member
countries can be asked to populate the resulting geometries with national data following the
prescribed classification system or even more detailed EAGLE land cover components. In addition,
crowd-sourced data may be used to characterize the land use within these polygons (either by in-
situ point observations or by wall-to-wall datasets e.g. OSM land use).
4.8 Proposal for production methodology
Within this consultation version (Nov. 2017) of the CLC-Backbone specifications an outline of a
possible production methodology is included to illuminate the processing that is envisaged to
generate a detailed geometric structure as a geometric vector skeleton which is subsequently
described according to a simple land cover classification (approx. 10 classes) on a per-pixel level.
The detailed geometric structure is derived via a hierarchical 2-step approach. In a first step
(hardbone) boundaries of relatively stable areas are derived based on existing (European wide or
comparable national) data sets. In a second step image segmentation techniques based on multi-
temporal Sentinel-2 data are applied to further subdivide the Level 1 objects into smaller units (Level
2 objects). The third step contains an attribution of the landscape objects derived from steps 1 and 2
based on pixel-based pure land cover classification (EAGLE land cover components).
The term “geometric skeleton” is used to stress that the added value of CLC-Backbone is the
breakdown of the landscape into homogenous entities. These entities represent zones (delineated
areas) that can be attributed using spectral indices and classified according to basic land cover
classes. The basic land cover classes are identical with a spatial aggregation and temporal sequence
of EAGLE land cover components.
4.8.1 Step 1: Definition and Selection of persistent boundaries
As various geospatial data are integrated, their order and hierarchy are of major importance.
Chapter 4.6.1.2 defines the hierarchy of input data, which may be adapted to improve the quality
of the results.
OSM data delivers in addition to the centreline of roads and railways, a polygon layer expressing the
land use. As the land use is labelled using a free text assignment (actually any text can be attributed
to the polygons) the land use is not modelled using a controlled vocabulary. However, the project
OSM-land use (http://osmlanduse.org) has shown that it is possible to transform the land use OSM
labels into CORINE Land Cover nomenclature (see Annex 1). For roads, the land use labels “roads”
and “traffic” and for railways the label “railway” were used to define the transportation areas (land
use class). Tunnels are identified with a special tag in OSM and have to be EXCLUDED from the
analysis.
Woody vegetation and settlements are as well rather persistent landscape features. Their outer
boundary can be initiated from HRL forest types and the Local component products (boundary taken
from aggregation of polygons on Level 1 of nomenclature). The geometric outline of settlements can
be initiated using a density threshold (e.g. >5%) of the HRL imperviousness (refined with European
Settlement Mask ESM) combined with a buffer and shrink gap filling algorithm (e.g. 50m) to reduce
the typical “salt and pepper” effect especially in settlements due to smaller green garden areas
within built-up areas. The commission errors in the HRL imp and ESM are relatively low, but
omission errors (settlements that are not detected) occur occasionally. The settlement outline
border will be amended using the OSM building vector layer and the OSM land use layer (residential
and industrial). As the built-up area is always larger than the area occupied by buildings, a buffer of
10m can be applied to derive from single OSM buildings a first built-up outline. The same buffer and
shrink operation (using 40m buffer) can be applied to this built-up outline. All urban data sets
together will form a realistic outer boundary of settlements. The HRL imperviousness has to be
refined in this step, as it also includes sealed areas OUTSIDE of settlements (road infrastructure). The
combination of the two different settlement layers (HRL and ESM, OSM) should reveal the most
realistic outer boundary of settlements (see footnote 8). The inner differentiations of settlements are derived (1)
through the linear networks (transportation and rivers) and (2) through the subsequent image
segmentation.
8 But not perfect outer boundary – post-processing steps will be needed to refine these boundaries.
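The “buffer and shrink” gap filling mentioned above corresponds to a morphological closing (dilation followed by erosion). The sketch below illustrates the idea on a small binary raster with a square window; it is not the specified implementation. On a 10 m grid a 50 m buffer would correspond to a radius of 5 pixels, whereas the example uses a radius of 1 pixel to stay readable. Function names are hypothetical, and windows are simply clipped at the raster edge.

```python
def _morph(grid, radius, rule):
    """Apply a square window of the given radius to a binary grid.

    rule=any gives a dilation (buffer), rule=all gives an erosion (shrink).
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [
                grid[y][x]
                for y in range(max(0, r - radius), min(rows, r + radius + 1))
                for x in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            row.append(1 if rule(window) else 0)
        out.append(row)
    return out


def buffer_and_shrink(grid, radius):
    """Morphological closing: buffer by radius pixels, then shrink again."""
    return _morph(_morph(grid, radius, any), radius, all)


# A built-up block with a one-pixel "garden" gap in the middle.
sealed = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
closed = buffer_and_shrink(sealed, radius=1)
for row in closed:
    print(row)
```

In the example the gap inside the block is filled while the outer extent of the block stays unchanged.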
The urban outline can be automatically checked for omissions (entirely missing villages or larger
parts of villages) using the “Global Urban Footprint +” (GUF+) raster data set on sealed areas from
DLR. Data access conditions to GUF+ are considered as free and open.
For the delineation of the forest outline outside the local components the HRL forest types layer will
be used, as it already applies a specific forest definition with a density threshold of >30% and a MMU
of 0.5 ha.
The outcome of this first step shall be a coarse but robust delineation of landscape objects, including
the a-priori information of the kind of land cover according to the integrated datasets, which can be
used as further indication for the subsequent classification of smaller landscape objects. Any
potential error (systematic or random) that is introduced by the method (“burning in” from ancillary
data) can be corrected in the subsequent Step 2, given that the error is larger than the MMU and is
detectable from S-2 images.
The final delineated objects have to be cleaned according to the mandatory MMU of 0.5 ha. All Level
1 landscape objects smaller than the MMU have to be merged with the neighbouring polygons,
using the method “the longest common border”. In Annex 2 an illustration of the working procedure
is provided.
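The merging rule (“the longest common border”) can be sketched on a simple adjacency table. This is an illustrative sketch only: it assumes that polygon areas (in ha) and the lengths of shared borders (in m) have already been computed by a GIS, processes the smallest polygon first, and uses hypothetical names and values throughout.

```python
def merge_small_polygons(areas, shared_border, mmu=0.5):
    """Merge polygons below the MMU into the neighbour sharing the longest border.

    areas: {polygon_id: area in ha}
    shared_border: {frozenset({id_a, id_b}): common border length in m}
    Returns (areas after merging, {merged polygon_id: direct target id}).
    """
    areas = dict(areas)
    borders = dict(shared_border)
    merged_into = {}
    isolated = set()  # small polygons without any neighbour
    while True:
        small = [p for p, a in areas.items() if a < mmu and p not in isolated]
        if not small:
            return areas, merged_into
        p = min(small, key=lambda k: (areas[k], k))
        neighbours = {}
        for pair, length in borders.items():
            if p in pair:
                (other,) = pair - {p}
                neighbours[other] = length
        if not neighbours:
            isolated.add(p)
            continue
        target = max(neighbours, key=lambda k: (neighbours[k], k))
        areas[target] += areas.pop(p)
        merged_into[p] = target
        # Hand the remaining borders of the merged polygon over to the target.
        for other, length in neighbours.items():
            del borders[frozenset((p, other))]
            if other != target:
                key = frozenset((target, other))
                borders[key] = borders.get(key, 0.0) + length


areas = {"A": 3.0, "B": 0.2, "C": 1.5, "D": 0.3}
shared = {
    frozenset(("A", "B")): 40.0,
    frozenset(("B", "C")): 120.0,
    frozenset(("C", "D")): 30.0,
    frozenset(("A", "D")): 80.0,
    frozenset(("A", "C")): 100.0,
}
result, merged = merge_small_polygons(areas, shared)
print(result, merged)
```

In the example, polygon B (0.2 ha) is merged into C and polygon D (0.3 ha) into A, because these are the neighbours with the longest common border.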
A look-and-feel illustration of the final geometric delineation in step 1 (colour coded according to the
source data) is provided in Figure 4-6.
Figure 4-6: Illustration of results of step 1 – delineation of Level 1 landscape objects. The border
defines the area for which Urban Atlas data is available (shaded) versus the area where no UA data
was available (not shaded).
A cleaning of endpoints is the last step in the process. All lines with open endpoints have to be
removed from the final geometric skeleton.
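The cleaning of endpoints can be sketched as the removal of dangling lines from a line network. The sketch assumes that each line is given by the identifiers of its two end nodes; it removes lines repeatedly, since dropping one dangling line can expose another. Whether the removal is meant to be iterative is an assumption made here, and the function name is hypothetical.

```python
from collections import Counter


def remove_dangles(lines):
    """lines: list of (start_node, end_node) tuples.

    Repeatedly drops every line that has an end node touched by no other line.
    """
    lines = list(lines)
    while True:
        degree = Counter(node for line in lines for node in line)
        kept = [l for l in lines if degree[l[0]] > 1 and degree[l[1]] > 1]
        if len(kept) == len(lines):
            return kept
        lines = kept


# A closed square (nodes 1-4) with a two-segment spur (3-5, 5-6).
network = [(1, 2), (2, 3), (3, 4), (4, 1), (3, 5), (5, 6)]
print(remove_dangles(network))  # [(1, 2), (2, 3), (3, 4), (4, 1)]
```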
4.8.2 Step 2: Image segmentation
A second landscape object level (Level 2) represents the “soft bones” of landscape and is based on
the automatic segmentation of available EO data. These second level objects subdivide the first level
landscape objects into spectrally and/or texturally homogeneous subunits according to the
predefined minimum mapping unit of 0.5 ha. In the long term, assistance and data from Member States
(MS) can be foreseen for this second level; as a short-term solution, the image segmentation derived
directly from Sentinel-2 HR data can provide a first, intermediate result.
The segmentation based on spectral information from multi-temporal EO-imagery to cover spatial
differences caused by seasonal variation will be used to subdivide the area into spectrally and/or
texturally homogeneous and stable spatial units, within the constraint set by a pre-defined MMU
and the hard bones framework (Level 1 landscape objects). The resulting segments still represent
pixel-based borders and will be vectorised and integrated with the hard bones to produce a
“polygon map” of “spectrally homogeneous areas”. There is no classification involved at this stage –
only segmentation. The Level 2 landscape objects will be used in step 3 as input for the labelling and
description of these polygons.
5 CLC-CORE – THE GRID APPROACH
The CLC-Core product should be seen as an underpinning semantically and geometrically
harmonized “data container” and “data modeller” that holds, or allows referencing of, land
monitoring information from different sources in a grid-based information system. The grid
database will essentially store geospatial and thematic data sources, such as CLC-Backbone, CLC,
Local Component data, HRLs, in-situ as well as Member State provided information. The contents of
the grid database can be exploited directly within grid-based analyses or alternatively subsets of the
grid database can be extracted and analysed by spatial queries driven by objects from an external
vector dataset. In this way, the CLC-Core can support the other elements of the 2nd generation CLC,
for instance, a CLC-Backbone object could gain additional thematic information by aggregating or
analysing the equivalent portion of the CLC-Core as a starting point for CLC+ production (see next
chapter).
5.1 Background
Textbooks usually differentiate between “raster GIS” and “vector GIS” (Couclelis 1992⁹, Congalton
1997¹⁰). The “vector GIS” does, like the paper map, attempt to represent individual objects that can
be identified, measured, classified and digitized. The “raster GIS” attempts to represent continuous
spatial fields approximated by a partition of the land into small spatial units with uniform size and
shape. The “raster GIS” therefore has structural properties that make it particularly suitable for
use in modelling exercises.
A grid is a spatial model falling somewhere between the vector model and the raster model. The grid
looks like a raster. The difference is the information attached to the grid cell. A raster cell (pixel) is
classified to a particular land cover class or single characteristic. The grid cell, on the other hand, is
characterized by how much it contains of each land cover class or other information. The grid is, as a
result, a more information-rich data model than the raster.
The difference between the raster and the grid is illustrated in Figure 5-1, using an example based on
CLC. The original CLC vector data set can be converted into a new data set with uniform spatial units
of, in this case, 1 x 1 km. The result can either be a “raster”, where a single CLC class is assigned to
each pixel (representing, for example, the dominant land cover class inside the pixel or the CLC class
found at the centre of the pixel unit), or a “grid”, where an attribute vector
representing the complete composition of land cover classes inside the unit is assigned to each grid
cell. Geometric information on a more detailed scale than the size of the spatial unit is in both cases
lost, but the overall loss of information is less in the grid than in the raster.
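The two encodings can be compared in a few lines of code. The sketch below assumes that the content of one spatial unit is given as a list of finer observations labelled with CLC class codes; the function name and the input layout are illustrative only.

```python
from collections import Counter


def encode_cell(fine_pixels):
    """Return the raster value and the grid attribute vector for one spatial unit.

    fine_pixels: CLC class codes observed inside the unit at a finer resolution.
    """
    counts = Counter(fine_pixels)
    total = len(fine_pixels)
    # Raster encoding: a single class, here the dominant one.
    dominant = max(sorted(counts), key=counts.get)
    # Grid encoding: the share of every class present in the unit.
    composition = {cls: n / total for cls, n in counts.items()}
    return dominant, composition


# A unit that is 60 % class 311, 30 % class 211 and 10 % class 512.
unit = ["311"] * 6 + ["211"] * 3 + ["512"]
raster_value, grid_vector = encode_cell(unit)
```

The raster keeps only `raster_value`, whereas the grid keeps the complete `grid_vector`, which is why less information is lost in the grid.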
The loss of information when using a grid structure can be further reduced if the grid size is selected
to be less than the minimum mapping unit of the input product. For instance, with a MMU of 0.25 ha
(50 x 50 m), the grid size should be around 10 x 10 m. Consequently, any downstream products
derived from the grid-based information must also consider grid size in relation to its requirements.
9 Couclelis, H. 1992. People manipulate objects (but cultivate fields): Beyond the raster-vector debate in GIS, in Frank, A.U.,
Campari, I. and Formentini, U. (eds) Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Lecture
Notes in Computer Science 639: 65 – 77, Springer Berlin Heidelberg.
10 Congalton, R.G. 1997. Exploring and evaluating the consequences of vector-to-raster and raster-to-vector conversion.
Photogrammetric Engineering and Remote Sensing, 63: 425-434.
Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between
encoding a particular unit as raster pixel (centre) or a grid cell (bottom). “daa” is a Norwegian unit:
10 daa = 1 ha.
A grid data model is not the result of mapping in any traditional sense. The grid is “populated” with
thematic information from (one or more) existing sources (Figure 5-2). One method frequently used
to populate grids with LULC information is to calculate spatial coverage of each LULC class based on
a geometric overlay between the grid and detailed polygon datasets. The assignment of attributes
can furthermore involve counting or statistical processing of registered data for observations made
inside each grid cell (for instance, not only the area of a class, but also the number of objects
that represent it). Ancillary databases, which provide important and relevant information about the
content and context of the grid cells, can also be added, for instance the number of buildings, the
length of roads or the number of people found within the grid cells.
5.2 Populating the database
The grid as a spatial model consists of square spatial units of uniform size and shape that can have
an (internally) heterogeneous thematic composition, i.e. the grids are “geometrically uniform
polygons”. Each grid cell has a unique ID that is related to a database containing the attribute data.
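As an illustration of how a cell ID links observations to the attribute database, the sketch below assigns point observations (e.g. buildings) to grid cells and counts them per cell. The ID scheme (built from the lower-left corner of the cell) and the function names are hypothetical choices for this example, and the cell size is assumed to be an integer number of metres.

```python
from collections import Counter, defaultdict


def cell_id(x, y, size):
    """Hypothetical ID built from the lower-left corner of the cell containing (x, y)."""
    easting = int(x // size) * size
    northing = int(y // size) * size
    return f"E{easting}N{northing}"


def count_per_cell(observations, size):
    """Count observations per grid cell and theme.

    observations: iterable of (x, y, theme) tuples in projected coordinates.
    Returns {cell_id: Counter({theme: count})}.
    """
    table = defaultdict(Counter)
    for x, y, theme in observations:
        table[cell_id(x, y, size)][theme] += 1
    return dict(table)
```

The resulting table can be joined to any other attribute stored under the same cell ID, such as the land cover composition of the cell.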
In a first step, CLC-Core will be populated with existing thematic information from the CLMS (see
also chapter 4.4):
• Local Component data (Urban Atlas, N2000, Riparian Zones)
• High Resolution Layers (Imperviousness, Forest, Grassland, Water & Wetness)
• CLC-Backbone
• Any other data, such as CLC (25 ha), in-situ data, …
Upon agreement, Member States are invited to provide also their national information to populate
the information system, mainly on land use and agricultural themes.
Figure 5-2: Representation of real world data in the CLC-Core.
5.3 Data modelling in CLC-Core
CLC-Core will be the effective “engine” to model, process and integrate the individual input data sets
to create new, value added information. The modelling will make use of synergies (the same
information provided for one cell based on different sources) as well as disagreements (different
information provided for one cell by different sources) during the data processing.
We refer to a particular modelling and combination of different layers of the database,
including the modelling parameters, as an “instance” of the database. In this understanding, any
output of a database modelling process using different input layers and different parameters
creates a specific instance.
Technically CLC-Core shall:
• Allow further characterisation (e.g. by additional input data / knowledge, see also chapter
5.2 - Populating the database).
• Allow for (full or partial) inclusion of Copernicus High Resolution Layers and local
component products.
• Allow the best possible transformation of National datasets into CLC+ (capturing the wealth
of local knowledge existing in the countries).
Storing information in a grid structure will also help to overcome the problem of different
geometries: as the data are converted to grid cells, differences in vector geometries will have a
lesser impact.
5.4 Database implementation approach for CLC-Core
The implementation approach for the database of CLC-Core is based on paradigms originating from
the semantic web and knowledge engineering. This opens up new possibilities beyond the classical
object-relational database management system concept. In the following subsections, we describe
the pros and cons of contemporary object-relational database models, not-only SQL (NoSQL)
concepts, and in particular Triple Stores. In addition, we elaborate on the intended approach that
utilizes a spatio-temporal Triple Store to store, manage and query CLC-Core data. Furthermore, we
propose strategies to generate CLC-Core products – i.e. “flat” shape-files or GeoJSON files – based
on a spatio-temporal Triple store.
5.4.1 Database concepts revisited: from Relational to NoSQL and Triple Stores
The contemporary object-relational database model relies on the work of Codd (1970)11. Although
object-relational database management systems (ORDBMS) have been developed further since the 1970s,
their relational concept remains. Due to the emergence of the World Wide Web, Web 2.0 and the
Semantic Web the requirements for databases have changed drastically, which led to the
development of NoSQL database concepts (Friedland et al., 2011; Sadalage & Fowler, 2012)12,13.
ORDBMSs are databases that follow the relational model published by Codd (1970). The relational
model defines how data are stored, retrieved and altered within databases, based on n-ary relations
on sets. Formally speaking, any relation is a subset of the Cartesian product of sets. In such an
environment “reasoning” is done using a two-valued predicate logic. Data stored using the
relational model are viewed as tables, where each row represents an n-tuple of the relation itself.
Each column is labelled with the name of the corresponding set (domain). Operations are performed
using relational algebra, which allows data transactions to be expressed in a formal way. ORDBMSs are
designed to adhere to the ACID principles – atomicity, consistency, isolation and durability – which
are defined by Gray (1981)14. ACID principles ensure that database transactions are performed in a
“reliable” way. In contemporary ORDBMSs consistency is the central issue limiting performance and
the ability to scale horizontally, which is crucial for Web 2.0 or Big Data requirements. In addition,
ORDBMSs rely on a fixed schema to which all data have to adhere. Hence, alterations of the data
model are difficult, and data models in an ORDBMS are therefore regarded as
static. Other downsides of contemporary ORDBMSs are data load performance and the handling
of large, high-velocity data volumes with replication. Nevertheless, ORDBMSs are still widely used
in business applications, as they guarantee consistency, which is crucial for e.g. bank accounting. In
11 Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM. 13
(6): 377. doi:10.1145/362384.362685
12 Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot
persistence. Pearson Education.
13 Friedland, A., Hampe, J., Brauer, B., Brückner, M., & Edlich, S. (2011). NoSQL: Einstieg in die Welt
nichtrelationaler Web 2.0 Datenbanken. Hanser Publishing.
14 Gray, J. (1981): The Transaction Concept: Virtues and Limitations. In: Proceedings of the 7th International
Conference on Very Large Databases, pp. 144–154, Cannes, France.
addition, commercial and open-source ORDBMSs are mature products and offer a variety of options
and extensions.
The term “NoSQL” database emerged in 2009 (Evans, 2009)15, as a name for a meeting of database
specialists on distributed structured data storage in San Francisco on June 11, 2009. Since then the
term has grown into a widely known umbrella term for a number of different database concepts
(Friedland et al., 2011). NoSQL databases have the following characteristics in common:
- Non-relational data model
- absence of ACID guarantees (especially consistency), replaced by BASE: basically
available, soft state, eventually consistent
- flexible schema
- simple APIs (no join operations)
- simple replication approach
- tailored towards distributed and horizontal scalability
NoSQL databases are especially designed for Web 2.0 applications and cloud platforms that require
the handling of large data volumes with replication. In addition, NoSQL databases are designed to
support horizontal scalability, which is necessary for handling big data with high turnover rates16.
Currently, NoSQL concepts are employed by e.g. the following companies: Amazon, Google, Yahoo,
Facebook, and Twitter. Types of NoSQL databases are as follows (see e.g. Scholz (2011)17, Sadalage &
Fowler (2012)):
- Column databases: have tables, rows and columns, but the column names and their format
can change – i.e. each row in a table can be different (Khoshafian et al., 1987¹⁸; Abadi,
2007¹⁹). Generally speaking, a wide column store can be regarded as a 2D key-value store.
Examples are: Apache Cassandra, Apache HBase, Apache Accumulo, Hypertable, Google’s
Bigtable
- Key-value databases: store data in the form of a key and an associated value (similar to a
hash), and strictly no relations. Key value stores show exceptional performance in terms of
horizontal scaling, and handling big data volumes. Examples of key-value stores are e.g.
OrientDB, Dynamo (Amazon), Redis, or Berkeley DB.
- Document databases: rely on the “document” metaphor that implies that the
data/information is contained in documents – that share a given format or encoding. Thus,
encodings like XML or JSON are used to represent documents. Each document can be
schema free, which is a key advantage for Web 2.0 applications. Examples of document
databases are Apache CouchDB, MongoDB, Cosmos DB (Microsoft), or IBM Domino.
15 Evans, E. (2017): NOSQL 2009. Url: http://blog.sym-link.com/2009/05/12/nosql_2009.html, visited: 06-11-
2017.
16 Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot
persistence. Pearson Education.
17 Scholz J. (2011). Coping with Dynamic, Unstructured Data Sets - NoSQL: a Buzzword or a Savior?. In: M.
Schrenk, V. Popovich & P. Zeile (Eds.) Proceedings of the Real CORP 2011, May 18-20, 2011, Essen, Germany:
pp. 121 - 129.
18 Khoshafian, S., Copeland, G., Jagodis, T., Boral, H., Valduriez, P. (1987). A query processing strategy
for the decomposed storage model. In: ICDE, pp. 636–643.
19 Abadi, D. (2007). Column-Stores for Wide and Sparse Data. In: Proceedings of the 3rd Biennial Conference on
Innovative Data Systems Research (CIDR), Url: http://db.csail.mit.edu/projects/cstore/abadicidr07.pdf, visited:
Nov. 08, 2017.
- Graph databases: utilize the notion of a mathematical graph structure for database
purposes (Robinson et al., 2015)20. A graph consists of nodes and edges that connect nodes.
In a graph database, the data/information is stored in nodes which are connected via
relationships, represented by edges. The relations can be annotated in a way that a graph
database may contain semantic information. The concept allows modelling of real-world
phenomena in terms of graphs, e.g. supply-chains, road networks, or medical history for
populations. Graph databases are capable of storing semantics and ontologies – especially in
the field of Geographic Information Science (Lampoltshammer & Wiegand, 2015)21. They
became very popular due to their suitability for social networking, and are used in systems
like Facebook Open Graph, Google Knowledge Graph, or Twitter FlockDB (Miller, 2013)22.
- Multi-model: such databases are capable of supporting multiple NoSQL models/types.
Examples are: Cosmos DB (Microsoft), Couchbase, Oracle, OrientDB.
Additionally, the term Linked Data has emerged as part of the Semantic Web initiative (Bizer et al.,
2009, Heath & Bizer, 2011)23,24. Originating from a web of documents, Bizer et al. (2009) describe an
approach to develop a web of data (i.e. the data driven web) – by publishing structured data such
that data from different sources can be interlinked with typed links. Additionally, they formulate
four characteristics of linked data:
- They are published in machine-readable form
- They are published in a way that their meaning is explicitly defined
- They are linked to other datasets
- They can be linked from other data sets
In order to technically fulfil these characteristics, Berners-Lee (2009)25 established a number of
Linked Data principles:
- Uniform Resource Identifiers (URIs) shall be used to denote things
- HTTP URIs shall be used so that things can be referred to and dereferenced
- W3C standards like the Resource Description Framework (RDF) should be used to provide
information (Subject – Predicate – Object)
- Data about anything should link out to other data, so that it contributes to the Linked Data Cloud
RDF is a family of W3C specifications that was originally designed as a metadata model. Over the years
it has evolved into a general method to describe resources. RDF is based on the notion of making
statements about resources, which follow the form subject – predicate – object, known as a triple.
Here the subject denotes the resource, while the predicate represents an aspect of the resource and
is therefore regarded as an expression of the relationship between the subject and the object (see
Figure 5-3).
20 Robinson, I., Webber, J., Eifrem, E. (2015). Graph Databases: New Opportunities for Connected Data,
O'Reilly Media Inc.
21 Lampoltshammer, T. J. & Wiegand, S. (2015). “Improving the Computational Performance of Ontology-Based
Classification Using Graph Databases” Remote Sensing, vol. 7(7): 9473-9491.
22 Miller, J.J. (2013). Graph database applications and concepts with Neo4j, Proceedings of the Southern
Association for Information Systems Conference, Atlanta, vol. 2324, pp. 141-147.
23 Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data-the story so far. Semantic services, interoperability
and web applications: emerging concepts, 205-227.
24 Heath, T. and Bizer, C. 2011. Linked data: Evolving the web into a global data space. Synthesis lectures on the
semantic web: theory and technology, 1(1), 1-36.
25 Berners-Lee, T. 2009. Linked Data - Design Issues. Online:
http://www.w3.org/DesignIssues/LinkedData.html; visited: Nov. 09, 2017.
Figure 5-3: Example of an RDF triple (subject - predicate - object).
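To make the triple structure concrete for CLC-Core, the following sketch writes statements about one grid cell in the line-based N-Triples syntax. The `example.org` URIs, the property names and the helper functions are purely hypothetical; no CLC-Core vocabulary has been defined yet.

```python
def uri(value):
    """Wrap a URI in angle brackets, as required by N-Triples."""
    return f"<{value}>"


def literal(value):
    """Encode a plain string literal, escaping backslashes and double quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Serialise one subject - predicate - object statement."""
    return f"{subject} {predicate} {obj} ."


# Hypothetical identifiers for a grid cell and two of its properties.
CELL = uri("http://example.org/clc-core/cell/E4321000N3210000")
statements = [
    triple(CELL, uri("http://example.org/clc-core/dominantClass"), literal("311")),
    triple(CELL, uri("http://example.org/clc-core/partOf"),
           uri("http://example.org/clc-core/country/AT")),
]
```

Each statement is self-describing, so further properties of a cell can be added later without changing a schema.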
To make use of semantics and ontologies in RDF, an RDF Schema (RDFS) or the Web Ontology
Language (OWL) can be integrated. These technologies can be utilized to allow semantic
reasoning within RDF triples. Both RDF and RDFS data need to be stored in a specific type of data
store, called a triplestore. Triplestores are databases for storing and retrieving (RDF) triples
through semantic queries. Triplestores are specific graph databases, document-oriented stores, or
specific relational data stores that are designed for RDF data. Many of the available triplestores
are able to store spatially and temporally annotated data. The following table gives an overview of
selected triplestores (both open-source and commercial products):
Table 5-1: Comparison of selected Triplestores with respect to spatial and temporal data.
Product name        Supports spatial data    Supports temporal data
Apache Jena         Yes                      No
Apache Marmotta     Yes                      Yes
OpenLink Virtuoso   Yes                      Yes
Oracle              Yes                      Yes
Parliament          Yes                      Yes
Sesame/RDF4J        Yes                      No
AllegroGraph        No                       Yes
Strabon             Yes                      Yes (stSPARQL)
GraphDB             Yes                      No
To query and manipulate RDF data the W3C standardized language SPARQL is used. SPARQL allows
queries to be written that use quantitative (typically proximity based) relations as well as qualitative
relations (spatial “on” and temporal “after”, “from”). Spatial and temporal extensions of SPARQL
(such as stSPARQL and GeoSPARQL) supporting qualitative relations have been proposed, but
expressivity and dealing with uncertainty are still major challenges (Belussi & Migliorini, 2014)26.
Nevertheless, spatial queries using GeoSPARQL can look as depicted in Figure 5-4, which shows the
GeoSPARQL query “list airports near the city of London”. The result of the query is shown
in a map metaphor and as JSON (Figure 5-5).
26 Belussi, A., & Migliorini, S. (2014). A framework for managing temporal dimensions in archaeological data. In
Temporal Representation and Reasoning (TIME), 2014 21st International Symposium on (pp. 81-90). IEEE.
One major feature of triplestores and SPARQL is the possibility to create federated queries. A
federated query is a single query that is distributed to several SPARQL endpoints (capable of
accepting SPARQL queries and returning results), that individually compute results. Finally, the
originating SPARQL endpoint gathers the individual results, compiles them in one single output, and
returns the output accordingly. This makes it possible to query a large number of different data stores that
can be geographically dispersed (e.g. over Europe).
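A federated query of this kind uses the SERVICE keyword of SPARQL 1.1 to delegate a graph pattern to a remote endpoint. The sketch below only assembles the query text; the endpoint URLs, the function name and the triple pattern are hypothetical, and sending the query to a real endpoint is left out.

```python
def federated_query(endpoints, pattern, variables):
    """Build one SPARQL query that evaluates the same pattern at several endpoints.

    endpoints: list of SPARQL endpoint URLs (hypothetical in this example).
    pattern:   graph pattern evaluated remotely, e.g. "?cell ?p ?o".
    variables: list of variables to return, e.g. ["?cell"].
    """
    # One SERVICE block per endpoint, combined with UNION.
    services = " UNION ".join(
        f"{{ SERVICE <{url}> {{ {pattern} }} }}" for url in endpoints
    )
    return f"SELECT {' '.join(variables)} WHERE {{ {services} }}"


query = federated_query(
    ["http://example.org/fr/sparql", "http://example.org/at/sparql"],
    "?cell ?p ?o",
    ["?cell"],
)
```

The endpoint that receives `query` forwards each SERVICE block to the named endpoint and merges the partial results before returning them.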
Figure 5-4: GeoSPARQL query for Airports near the City of London.
Figure 5-5: Result of the GeoSPARQL of Figure 5-4 as map and in the JSON format.
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX spatial: <http://jena.apache.org/spatial#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?link ?name ?lat ?lon
WHERE {
  ?link spatial:nearby(51.139725 -0.895386 50 'km') .
  ?link gn:name ?name .
  ?link gn:featureCode gn:S.AIRP .
  ?link geo:lat ?lat .
  ?link geo:long ?lon
}
5.4.2 CLC-Core Database: spatio-temporal Triple Store Approach
The approach followed for the CLC-Core database can be based on a triplestore approach. This
approach could exploit the advantages of NoSQL databases (especially triplestores) while
overcoming the drawbacks of contemporary ORDBMS – especially addressing horizontal scalability
and consistency.
Hence, we propose a distributed triplestore approach, where each member country of the European
Union could host their own CLC-Core data in an RDF format. Each member country could host a
triplestore that provides a SPARQL endpoint. This approach can help to overcome data integrity and
replication issues that might arise with contemporary ORDBMS. Each member country can thus
host their own data, and offer them to be queried via the SPARQL endpoint. The databases of the
member countries can be connected via federated queries, which requires only one SPARQL query at
an endpoint at hand (i.e. at exactly one member country’s CLC-endpoint) – see Figure 5-6.
If there are member countries that are not sure if they should operate a triplestore (and a SPARQL
endpoint) with the CLC-Core data, they can also ask another country to host their data on its
infrastructure. Such a concept would give CLC+ the flexibility to react to changing situations in EU
member countries regarding e.g. the ability to host the CLC dataset. For example, Austria
could host the CLC+ data of another country if that country does not have the resources to develop the
infrastructure for hosting and publishing the data.
Figure 5-6: Schematic view of the distributed SPARQL endpoints communicating with each other. The
arrows indicate the flow of information from a query directed to the French CLC SPARQL endpoint.
Essential for such an approach is the development of an underlying shared semantic model – i.e.
an ontology. An abstract semantic model – in RDFS or OWL – can be developed that describes the
abstract model of CLC+ (e.g. classes, properties, etc.), which can be stored in a triplestore. This
allows the utilization of the abstract semantic model in queries (semantic reasoning based on the
developed knowledge basis). An example for semantic reasoning could be the determination of the
land cover type, based on a number of properties of grid cells.
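The kind of reasoning meant here can be mimicked with a small rule set. The sketch below is plain Python rather than RDFS or OWL, and the property names, thresholds and labels are invented for illustration; in the proposed architecture such rules would be expressed in the shared ontology.

```python
# Each rule pairs a land cover label with a condition on the cell properties.
# Property names and thresholds are hypothetical examples, not agreed values.
RULES = [
    ("sealed surface", lambda cell: cell.get("imperviousness", 0) >= 50),
    ("woody vegetation", lambda cell: cell.get("tree_cover", 0) >= 50),
    ("water", lambda cell: cell.get("water_share", 0) >= 50),
]


def classify(cell):
    """Return the label of the first rule satisfied by the cell properties."""
    for label, condition in RULES:
        if condition(cell):
            return label
    return "unclassified"
```

Because the rules are data rather than code paths, a different rule set applied to the same cells yields a different derived product.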
[Figure 5-6 labels: CLC+ SPARQL endpoints for France, Germany, Austria and Italy; legend: SPARQL query, federated SPARQL queries, flow of result data]
5.4.3 Processing and Publishing of CLC-Core Products
To generate CLC+ products based on the distributed triplestore architecture, we need to define the
products requested by the customers. Based on the defined products we need to develop a
migration/transition strategy describing the data flow from triplestores to the desired product.
We are of the opinion that the RDF approach is well suited for distributed storage and ad-hoc
queries and analyses, whereas customers would prefer a single “flat” file containing the CLC+ data.
Due to historic (and practical) reasons such a file could be an ESRI shapefile, an ESRI Geodatabase
file, a GRID file or a GeoJSON file. Such files containing the CLC+ data could be generated
automatically by using a spatial Extract, Transform and Load (ETL) tool like SAFE FME27, GeoKettle28
(open source) or GDAL/OGR29 (open source). The migration can be performed for e.g. different
geographical or temporal coverages. These files can be hosted and published on a central server,
where they are available to the public (see Figure 5-7).
Figure 5-7: Intended generation of CLC+ products based on the distributed triplestore architecture.
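The last step of such an export can be sketched with the Python standard library alone. The example assumes that the grid cells have already been retrieved from the triplestores as dictionaries with a lower-left corner and a set of properties; these field names are hypothetical, and the reprojection to geographic coordinates that a published GeoJSON file would normally require is omitted.

```python
import json


def cell_to_feature(easting, northing, size, properties):
    """Turn one square grid cell into a GeoJSON Feature with a Polygon geometry."""
    ring = [
        [easting, northing],
        [easting + size, northing],
        [easting + size, northing + size],
        [easting, northing + size],
        [easting, northing],  # a linear ring must be closed
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": properties,
    }


def export_geojson(cells, size):
    """Serialise a list of cells as one 'flat' GeoJSON FeatureCollection."""
    features = [
        cell_to_feature(c["easting"], c["northing"], size, c["properties"])
        for c in cells
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})
```

A production workflow would more likely rely on one of the ETL tools named above; the sketch only shows that the target format itself is simple to generate.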
5.4.4 Conclusion and critical remarks
The intended fine spatial and temporal resolution of CLC+ will result in high data
volumes with high turnover rates (velocity), which are two characteristics of Big Data30. Thus, it
seems advisable to adhere to the nature of these datasets when planning a sustainable architecture
for CLC+. As contemporary ORDBMS are not designed for Big Data applications, which manifests in
their weaknesses regarding horizontal scaling, consistency and replication, we advise taking a
deeper look into NoSQL concepts. Especially triplestores with RDF schemas seem appropriate to
cope with the requirements of CLC+ with respect to data volume, data turnover rate, distributed
architecture and semantic queries (and reasoning). Although NoSQL datastores are widely used in
business applications by companies like Facebook, Twitter, Amazon and Google, the technology and its
theoretical foundation are still under active development. Hence, NoSQL databases do not reach the
technological maturity of contemporary ORDBMS, like Oracle or PostgreSQL. In our eyes, this is both a
weakness and an opportunity. By using NoSQL databases, CLC+ would be at the leading edge of
technological developments and able to contribute to the future of NoSQL databases, and to
strengthen their spatio-temporal abilities.
27 https://www.safe.com/fme/fme-server/
28 http://www.spatialytics.org/projects/geokettle/
29 http://www.gdal.org
30 Hilbert, M. (2015): Big Data for Development: A Review of Promises and Challenges. Development Policy
Review. martinhilbert.net, visited: Nov. 08, 2017.
[Figure 5-7 labels: Triple Store #1, Triple Store #2, …, Triple Store #n; Spatial ETL Tool; Web Server (public) with SHAPE files and GeoJSON files for France, Germany, Austria, Finland, …]
6 CLC+ - THE LONG-TERM VISION
CLC+ is expected to represent the new long-term vision for land cover / land use monitoring in
Europe. CLC+ shall meet the requirements of today and tomorrow for European LC/LU information
needs, building upon the EAGLE concept and expertise, while retaining capabilities within the 2nd
generation CLC to guarantee backwards compatibility with the original CLC datasets.
CLC+ shall be understood as one instance (i.e. realisation) of the CLC-Core (possibly within the
context of CLC-Backbone), enabling it to address various reporting obligations, such as the EU Energy
Union, Paris Agreement, MAES and LULUCF to name just some examples. Many other land
monitoring products with various nomenclatures and spatial frameworks may also be defined on the
basis of CLC-Core.
Technically CLC+ is proposed as a raster data set with 1 ha spatial resolution (100 x 100 m),
aggregated from CLC-Core. Due to the higher geometric resolution, the use of some of the often
debated heterogeneous classes, like 2.4.2 (complex cultivation pattern) and 2.4.3 (land principally
occupied by agriculture, with significant areas of natural vegetation) will be significantly reduced, if
not disappear.
As the original CLC nomenclature is more than a pure land cover classification, there will be the need
to make use also of other information than currently provided by the Copernicus Land Monitoring
Service. Table 6-1 provides an overview of the 44 CLC classes and which type of information would
still be needed to derive a specific CLC class in addition to the information provided by the Local
Components, the High Resolution Layers and the land cover information collected by CLC-Backbone.
The Hungarian test case31 for implementing the EAGLE concept in a bottom up CLC production has
clearly shown that content and quality of a bottom up derived CLC is ultimately dependent on input
data on:
• Land Use (critical);
• Crops and use intensity (e.g. from LPIS).
Obviously, the situation is different for each CLC class and also dependent on the available
Copernicus data (i.e. UA, RZ or N2K), but Table 6-1 clearly shows that almost no CLC class can be
derived without additional information on the use aspect of the area. In almost all cases this
information gap can be closed with data from Member States. In case MS data is not available, a few
classes can be approximated via European level information, but most certainly at the expense of
data accuracy and precision. The national information should be provided as input to CLC-Core. In
case of missing national land use information as well as no other European-level replacement data,
traditional CLC data will be used as back-up solution. Areas with obvious contradictions in land cover
(from CLC-Backbone) vs. traditional OBS classes will need to undergo special post-processing during
the creation phase of CLC+.
31 Negotiated procedure No EEA/IDM/R0/16/001 based on Regulation (EC) No 401/2009 of 23.4.2009 on the
European Environment Agency and the European Environment Information and Observation Network. Final
report: Implementation of the EAGLE approach for deriving the European CLC from national databases on a
selected area in one EU country
Table 6-1: List of CLC classes and requirements for external information
The update cycle of CLC+ is proposed to be consistent with the update cycles of the major input data
sets (i.e. the 3 year cycle of the LoCo or the 3-6 year cycle of CLC-Backbone). Apart from that, as
CLC+ is “just” an instance of CLC-Core, it can be derived dynamically upon request.
7 CLC-LEGACY
The history of European land monitoring is a story of permanent evolution of approaches and
concepts, resulting in different solutions for individual needs and specifications. In this context, the
CORINE Land Cover (CLC) specifications provided the first set of European-wide accepted de-facto
standards, i.e. a nomenclature of land cover classes, a geometric specification, an approach for land
cover change mapping and a conceptual data model (Heymann et al, 1994). In this sense, CLC
represents a unique European-wide, if not global, consensus on land monitoring, supported by 39
countries.
Being the thematically most detailed wall-to-wall European LC/LU dataset and representing a time
series that started in the early 1990s, CLC data became the basis of several analyses and indicator
developments, like Land & Ecosystem Accounts (LEAC) for Europe. It is an obvious requirement that
the planned new European land monitoring system (CLC-Core & CLC+) has to ensure the
maximum possible backwards compatibility with standard CLC data, i.e. CLC-Legacy.
On the other hand, comparing the different CLC production methods, i.e. traditional visual photo-
interpretation vs. national bottom-up solutions, shows that those specific methodologies result in
CLC data with slightly different characteristics in terms of LC/LU statistics or fine geometric detail,
even though the general CLC specifications are kept. As a consequence, it should be specified in
what sense the outputs of the new land monitoring system may and should be consistent with CLC-
legacy.
Although the original CLC vector data are still used, mainly for cartographic purposes, the most
commonly used versions of CLC data are the raster CLC status and change layers. Moreover, for the
purposes of LEAC analysis, CLC information has been transformed into CLC Accounting
Layers.
The specific characteristics and the variety of the CLC creation and update methodologies have had
the result that, while CLC change data represent land cover changes between two given reference
dates well, there are significant "breaks" in the continuity of the CLC time series, both in a
statistical and in a geometric sense. The solution applied for the harmonization of the CLC time
series is to combine CLC status and change information to create a time series of CLC / CLC-change
layers of homogeneous quality for accounting purposes, fulfilling the relation:
CLC change = CLC accounting new status – CLC accounting old status
Because the Minimum Mapping Unit (MMU) differs between CLC status (25 ha) and CLC
change (5 ha) layers, the resulting CLC accounting layers may in several locations include more
spatial detail than the original CLC status layers. Although this contradicts the original CLC
specifications, CLC accounting layers are still considered the most consistent harmonized time
series of CLC data.
CLC accounting layers are always created on the basis of the latest raster CLC status layer. Fine
details are added to this layer from the formation codes of previous CLC-change inventories. All
the former CLC accounting status layers are then calculated backwards by overlaying the CLC-change
consumption codes.
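The backward calculation described above can be sketched on a toy raster. This is a minimal illustration, not the ETC/ULS implementation: it assumes that the formation code is the class a changed cell has after the change, that the consumption code is the class it had before, and that cells without change carry no code. The function name `overlay` and the example grids are hypothetical.

```python
from typing import List, Optional

Grid = List[List[int]]                 # one CLC class code per 100 m cell
CodeGrid = List[List[Optional[int]]]   # change code per cell, None = no change


def overlay(status: Grid, codes: CodeGrid) -> Grid:
    """Return a copy of `status` in which every cell with a change code takes that code."""
    return [
        [code if code is not None else cls for cls, code in zip(status_row, code_row)]
        for status_row, code_row in zip(status, codes)
    ]


# Latest CLC status (25 ha MMU): the small urban patch was generalised away.
latest_status = [[211, 211, 311],
                 [211, 211, 311]]
# CLC-change layer (5 ha MMU): one cell changed from 211 (consumption) to 112 (formation).
formation = [[None, 112, None], [None, None, None]]
consumption = [[None, 211, None], [None, None, None]]

# Accounting "new" status: latest status plus fine detail from the formation codes.
new_status = overlay(latest_status, formation)
# Accounting "old" status: calculated backwards by overlaying the consumption codes.
old_status = overlay(new_status, consumption)
# Cells where new and old accounting status differ are exactly the CLC-change cells.
changed = [[n != o for n, o in zip(new_row, old_row)]
           for new_row, old_row in zip(new_status, old_status)]
```

In this toy case the two accounting layers differ only in the changed cell, which is the sense in which the relation between CLC change and the two accounting status layers is fulfilled.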
The backwards compatibility of CLC+ with CLC-Legacy is foreseen to be ensured in a similar way. This
method is already applied in practice, since some of the member states produce CLC layers in a
bottom-up way and these data are then combined with manually interpreted CLC changes. The
applied methods of bottom-up CLC creation differ, but all are variations of the basic idea on which
CLC+ relies.
In the sense of the above, the aim of CLC-Legacy is NOT to reproduce, as closely as possible, the
original 25 ha MMU vector CLC as it would be if created continuously by visual photo-interpretation.
Instead, CLC-Legacy will be created by considering the following aspects:
• CLC-Legacy is to be created as a 100 m resolution raster layer, representing the CLC (like)
status at a certain reference date.
• CLC-Legacy will be derived from CLC-Core (and potentially CLC+) in a way that ensures
homogeneous and consistent content and quality across Europe. Note that several CLC layers are
already derived based on similar principles in countries using bottom-up methods; all the
outputs correspond to the basic CLC specifications, but have slightly different characteristics in
terms of geometry and statistics.
• Instead of 25 ha MMU vector CLC, CLC accounting layers are considered as reference layers
concerning targeted content:
o Applying standardized raster generalization to CLC-Core outputs to reach a 5 ha MMU
(instead of 25 ha), corresponding to the CLC-change layers.
o Less significance of complex CLC classes, like 243. A standard methodology is to be
developed to reproduce the relevant information contained in some other complex CLC
classes, like 244.
o CLC status layers of past reference dates (2012, 2006, 2000, ...) are to be created by
combining CLC-Legacy with previous CLC-change data.
The above concept assumes that in the future, besides the update of CLC-Core and CLC+ status,
CLC-compatible change information is to be derived as well. Obviously, there are several
methodological and technical details to be clarified and solved during upcoming practical test
exercises.
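To make the 5 ha MMU target concrete: on a 100 m raster one cell covers 1 ha, so a 5 ha MMU corresponds to patches of at least 5 cells. The following is a deliberately simplified sketch of such a raster generalization, not the standardized method referred to above. It assumes 4-connectivity and a single pass in which every undersized patch takes the class that occurs most often along its border; the function name `generalise_to_mmu` is hypothetical.

```python
from collections import Counter, deque
from typing import List

Grid = List[List[int]]  # one class code per 100 m cell (1 cell = 1 ha)


def generalise_to_mmu(grid: Grid, min_cells: int = 5) -> Grid:
    """Merge 4-connected patches smaller than `min_cells` into their most frequent neighbour class."""
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            cls = grid[r0][c0]
            patch, border = [], Counter()
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:  # breadth-first flood fill of one patch
                r, c = queue.popleft()
                patch.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if grid[nr][nc] == cls:
                        if not seen[nr][nc]:
                            seen[nr][nc] = True
                            queue.append((nr, nc))
                    else:
                        border[grid[nr][nc]] += 1  # count neighbouring classes
            if len(patch) < min_cells and border:
                replacement = border.most_common(1)[0][0]
                for r, c in patch:
                    result[r][c] = replacement
    return result


example = [
    [211, 211, 211, 211],
    [211, 112, 112, 211],
    [211, 211, 211, 211],
    [311, 311, 311, 311],
    [311, 311, 311, 311],
]
generalised = generalise_to_mmu(example)
```

Here the 2 ha patch of class 112 is absorbed by the surrounding class 211, while the 10 ha and 8 ha patches are kept. An operational procedure would additionally need priority rules between classes and repeated passes, which are among the details still to be clarified.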
7.1 Experiences to be considered
The following experiences will be considered for establishing the grounds for creating CLC-Legacy:
• CLC accounting layers are used as input for the EEA's Land and Ecosystem Accounting (LEAC)
system. ETC/ULS developed the methodology to create these layers, as well as for
the raster generalization of CLC accounting layers (see footnote 32).
• Methodologies for the bottom-up creation of CLC data have been developed by several CLC
national teams. This includes spatial and thematic aggregation of in-situ based high
resolution data (e.g. Norway, Finland, Germany, The Netherlands, United Kingdom, Spain).
In most cases, rules were developed to allow thematic conversion in an automated or
semi-automated fashion. For spatial generalisation, different techniques are available, depending
on the starting point in each of the countries.
Figure 7-1 to Figure 7-3 illustrate different techniques and results of national generalisation
approaches to produce standard CLC from more detailed data.
32 Generalization of adjusted CLC layers. ETC/ULS report 2016.
Figure 7-1: Generalisation technique applied in Norway based on expanding and subsequently
shrinking. The technique exists for polygon as well as raster data.
[Figure panels: Original 3000 m²; Level III 1 ha; Level III 5 ha; Level III 25 ha]
Figure 7-2: Generalization levels used in LISA generalization in Austria
Figure 7-3: Examples of 25 ha, 10 ha and 1 ha MMU CLC for Germany
Specific example
During the transition phase from traditional CLC to CLC-Legacy, special attention needs to be paid to
the handling of geometries and to separating the resulting geometric differences from real changes.
Figure 7-4 shows the results from Germany, where the process changed in 2012 from mapping to a
bottom-up approach using existing national data. While there are very few changes on the left side
of the river (located in Luxembourg), the German side clearly shows some differences due to the
changed approach. The CLC change layer is then used to differentiate technical changes from real
changes. This step is needed only once to make the transition. Future updates using the bottom-up
approach will be consistent with the 2012 situation.
[Figure panels: CLC 2012; CLC 2006]
Figure 7-4: Result of the generalisation process in Germany with technical changes in different
classes
8 TECHNICAL SPECIFICATIONS
8.1 CLC-Backbone
CLC-Backbone
[Example image if available.]
Description
CLC-Backbone is a spatially detailed, large scale, EO-based land cover inventory in a vector format
providing a geometric backbone with limited, but robust thematic detail on which to build other
products.
Input data sources (EO & ancillary)
EO Overview
Spatial resolution – 10 – 20 m
Spectral content – Visible, NIR, SWIR, SAR
Temporal revisit – Bi-monthly to achieve monthly cloud-free acquisitions
Acquisition approach – likely to be multi-mission
Candidate missions – Sentinel-2 (Landsat 8)
Contributing data – Sentinel-1
Ancillary data
OSM roads and railways (or comparable national datasets)
EU-Hydro (or WISE WFD)
European coastline
OSM buildings
OSM land use
HRL imperviousness (refined with ESM)
LoCo Riparian Zones
LoCo Urban Atlas
Thematic Content
EAGLE Land Cover Components (Option 2 – 9 classes):
• Sealed
• Woody vegetation – coniferous
• Woody vegetation – broadleaved
• Permanent herbaceous (i.e. grasslands)
• Periodically herbaceous (i.e. arable land)
• Permanent bare soil
• Non-vegetated bare surfaces (i.e. rock and screes)
• Water surfaces
• Snow & ice
Methodology
The detailed geometric structure is derived via a hierarchical 2-step approach. In the first step
(hardbone), boundaries of relatively stable larger areas (Level 1 objects) are detected based on
existing data sets. In the second step, image segmentation techniques based on multi-temporal
Sentinel-2 data are applied to further subdivide the Level 1 objects into smaller units (Level 2
objects). A third step then performs the classification and spectral-temporal attribution
(Sentinel-1 and Sentinel-2) of the landscape objects derived in steps 1 and 2, using a pixel-based
classification of the land cover components. The pixel-based classification is generalized to object
level using indicators such as the dominant land cover class and the percentage mixture of land
cover classes.
Geometric resolution (Scale)
~1:10 000
Geographic projection / Reference system
European Terrestrial Reference System 1989 (ETRS89)
Coverage
EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria,
Belgium, Bosnia-Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland,
France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council
Resolution 1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, the former Yugoslav Republic of
Macedonia, Malta, Montenegro, the Netherlands, Norway, Poland, Portugal, Romania, Serbia,
Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom), which participate in
the regular CLC dataflow. The number of countries involved is 39. The area covered is
approximately 6 million km².
Geometric accuracy (positioning accuracy)
Less than half a pixel based on 10 m spatial resolution input data
Spatial resolution / Minimum Mapping Unit (MMU)
0.5 to 5 ha for final product
Min. Width of linear features
10 – 50 m
Topological correctness
[TBD]
Raster coding
[n/a]
Thematic accuracy (in %) / quality method
The target will be set at 90 %. A quantitative approach will be used, based on a set of stratified
random point samples compared to external datasets (e.g. Google Earth, national orthophotos or
national grassland datasets).
Target / reference year
2018 (full vegetation season, starting partly in autumn 2017 in southern Europe)
Up-date frequency
[TBD]
Information actuality
Plus and minus 1 year from target year (2017 – 2019). A longer time window may be required for
the ancillary datasets.
Delivery format
Vector
Data type
Vector
Naming conventions
[TBD]
Medium
[TBD]
Delivery reliability
[TBD]
Delivery time
[TBD]
Archive
[TBD]
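The object-level generalization named in the Methodology entry of the CLC-Backbone specification above (dominant land cover class and percentage mixture per object) can be sketched as follows. This is an illustrative assumption of how the two indicators might be computed from the classified pixels inside one Level 2 object; the function name `summarise_object` and the class labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarise_object(pixel_classes: Iterable[str]) -> Tuple[str, Dict[str, float]]:
    """Return the dominant class and the percentage share of each class within one object."""
    counts = Counter(pixel_classes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("object contains no classified pixels")
    dominant = counts.most_common(1)[0][0]
    mixture = {cls: 100.0 * n / total for cls, n in counts.items()}
    return dominant, mixture


# Ten classified 10 m pixels falling inside one hypothetical Level 2 object.
pixels = ["woody_broadleaved"] * 6 + ["permanent_herbaceous"] * 3 + ["sealed"]
dominant, mixture = summarise_object(pixels)
```

For this object the dominant class is `woody_broadleaved`, with a mixture of 60 %, 30 % and 10 % across the three classes present.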
8.2 CLC-Core
CLC-Core
[Example image if available.]
Description
CLC-Core is a consistent, multi-use grid database repository for environmental information
populated with a broad range of land cover, land use and ancillary data, forming the information
engine to deliver and support tailored thematic information requirements.
Input data sources
CLMS products
Corine Land Cover, CLC1990, CLC2000, CLC2006, CLC2012
High Resolution Layers (Imperviousness, Forest, Grassland, Wetland & Water)
Reference datasets (EU-DEM, EU-Hydro)
Related pan-European products (ESM)
Local components (Urban Atlas, Riparian Zones, Natura 2000, Coastal)
Ancillary data
OpenStreetMap
Member State data, particularly land use information
Thematic Content
Input data sources stored within a version of the EAGLE data model.
Methodology
TBD, see chapter 5.4
Geometric resolution (Scale)
TBD
Geographic projection / Reference system
European Terrestrial Reference System 1989 (ETRS89)
Coverage
EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria,
Belgium, Bosnia-Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland,
France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council
Resolution 1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, the former Yugoslav Republic of
Macedonia, Malta, Montenegro, the Netherlands, Norway, Poland, Portugal, Romania, Serbia,
Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom), which participate in
the regular CLC dataflow. The number of countries involved is 39. The area covered is
approximately 6 million km².
Geometric accuracy (positioning accuracy)
Less than half a pixel based on 10 m spatial resolution input data
Spatial resolution / Minimum Mapping Unit (MMU)
0.5 ha to 1 km2 for final product
Min. Width of linear features
Grid size
Topological correctness
Grid spatial structure
Raster coding
[n/a]
Thematic accuracy (in %) / quality method
TBD
Target / reference year
2018
Up-date frequency
[TBD]
Information actuality
Plus and minus 1 year from target year (2017 – 2019). A longer time window may be required for
the ancillary datasets.
Delivery format
Grid
Data type
Grid
Naming conventions
[TBD]
Medium
[TBD]
Delivery reliability
[TBD]
Delivery time
[TBD]
Archive
[TBD]
9 ANNEX 1: OSM DATA
9.1 Crosswalk between OSM land use tags and CLC nomenclature Level 2
Source: Schulz et al (2017): Open land cover from OpenStreetMap and remote sensing, Int J Appl
Earth Obs Geoinformation 63 (2017) 206-213
9.2 OSM road nomenclature
The following table identifies all OSM road types that should be considered for the CLC-Backbone
Level 1 delineation. Road types that are subject to visual control, because they are wide enough to
be represented as area features (>= 10 m width), are marked in the second column. In many countries,
most roads are tagged as “unclassified”. Only highway = motorway/motorway_link implies anything
about quality.
CLC-Backbone | Potential 10m width | Key | Value | Comment
y | 10m | highway | motorway | A restricted access major divided highway, normally with 2 or more running lanes plus emergency hard shoulder. Equivalent to the Freeway, Autobahn, etc.
y | - | highway | trunk | The most important roads in a country's system that aren't motorways. (Need not necessarily be a divided highway.)
y | 10m | highway | primary | The next most important roads in a country's system. (Often link larger towns.)
y | - | highway | secondary | The next most important roads in a country's system. (Often link towns.)
y | - | highway | tertiary | The next most important roads in a country's system. (Often link smaller towns and villages.)
y | - | highway | unclassified | The least important through roads in a country's system, i.e. minor roads of a lower classification than tertiary, but which serve a purpose other than access to properties. Often link villages and hamlets. (The word 'unclassified' is a historical artefact of the UK road system and does not mean that the classification is unknown; you can use highway=road for that.)
y | - | highway | residential | Roads which serve as an access to housing, without function of connecting settlements. Often lined with housing.
y | - | highway | service | For access roads to, or within an industrial estate, camp site, business park, car park etc. Can be used in conjunction with service=* to indicate the type of usage and with access=* to indicate who can use it and in what circumstances.
y | - | Link roads | - | -
y | 10m | highway | motorway_link | The link roads (sliproads/ramps) leading to/from a motorway from/to a motorway or lower class highway. Normally with the same motorway restrictions.
y | - | highway | trunk_link | The link roads (sliproads/ramps) leading to/from a trunk road from/to a trunk road or lower class highway.
y | - | highway | primary_link | The link roads (sliproads/ramps) leading to/from a primary road from/to a primary road or lower class highway.
y | - | highway | secondary_link | The link roads (sliproads/ramps) leading to/from a secondary road from/to a secondary road or lower class highway.
y | - | highway | tertiary_link | The link roads (sliproads/ramps) leading to/from a tertiary road from/to a tertiary road or lower class highway.
y | - | Special road types | - | -
y | - | highway | track | Roads for mostly agricultural or forestry uses. To describe the quality of a track, see tracktype=*. Note: Although tracks are often rough with unpaved surfaces, this tag is not describing the quality of a road but its use. Consequently, if you want to tag a general use road, use one of the general highway values instead of track.
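The selection expressed by the table above can be sketched as a simple tag filter. This is a minimal illustration that assumes each OSM way has already been read into a dictionary of its tags; the function name `select_backbone_roads` and the `check_as_area` flag are hypothetical and not part of any OSM library.

```python
from typing import Dict, List

# highway=* values listed in the table as relevant for CLC-Backbone Level 1 delineation.
BACKBONE_HIGHWAY_VALUES = {
    "motorway", "trunk", "primary", "secondary", "tertiary",
    "unclassified", "residential", "service",
    "motorway_link", "trunk_link", "primary_link", "secondary_link", "tertiary_link",
    "track",
}
# Values marked "10m" in the table: candidates for representation as area features.
AREA_CANDIDATES = {"motorway", "primary", "motorway_link"}


def select_backbone_roads(ways: List[Dict]) -> List[Dict]:
    """Keep ways whose highway tag is in the table and flag those needing visual control."""
    selected = []
    for way in ways:
        value = way.get("highway")
        if value in BACKBONE_HIGHWAY_VALUES:
            selected.append({**way, "check_as_area": value in AREA_CANDIDATES})
    return selected


ways = [
    {"id": 1, "highway": "motorway"},
    {"id": 2, "highway": "footway"},      # not in the table, dropped
    {"id": 3, "highway": "residential"},
    {"id": 4, "building": "yes"},         # not a road, dropped
]
selected = select_backbone_roads(ways)
```

The 10 m flag reproduces the marks in the table as extracted; whether further classes (e.g. trunk) should also be checked as area features is left open here.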
9.3 OSM roads – completeness
Estimated OSM completeness according to the parametric estimation in the study by Barrington-Leigh
and Millard-Ball (2017). A sigmoid curve was fitted to the cumulative length of contributions within
OSM data and used to estimate the saturation level for each country.
country ISO-Code frcComplete_fit OSM length [km]
Serbia SRB 74,3% 61.233
Bosnia and Herzegovina BIH 86,5% 37.877
Turkey TUR 89,7% 501.384
Sweden SWE 92,7% 280.107
Norway NOR 93,6% 132.527
Hungary HUN 93,8% 90.670
Romania ROU 94,3% 162.844
Ireland IRL 96,0% 107.263
Spain ESP 96,2% 509.937
Liechtenstein LIE 96,3% 359
Croatia HRV 96,8% 60.119
Bulgaria BGR 96,9% 80.983
Cyprus CYP 97,0% 12.091
Albania ALB 97,6% 22.875
Italy ITA 97,9% 594.384
Belgium BEL 98,3% 103.778
Lithuania LTU 98,5% 69.885
Poland POL 98,5% 358.733
Iceland ISL 98,6% 17.871
France FRA 99,9% 1.164.238
Denmark DNK 100,3% 96.656
Estonia EST 100,3% 42.878
Portugal PRT 100,4% 150.176
United Kingdom GBR 100,4% 443.265
Netherlands NLD 100,5% 136.118
Greece GRC 100,8% 156.586
Slovakia SVK 101,4% 41.566
Austria AUT 101,4% 128.934
Luxembourg LUX 101,5% 6.210
Germany DEU 101,6% 744.304
Latvia LVA 101,7% 61.066
Montenegro MNE 102,2% 12.346
Macedonia (FYROM) MKD 102,4% 15.174
Switzerland CHE 103,0% 73.066
Czech Republic CZE 104,0% 112.441
Finland FIN 105,3% 231.298
Kosovo XKO 107,4% 13.297
Malta MLT 109,8% 2.241
Slovenia SVN NoData 36.041
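How a fitted saturation level translates into a completeness figure can be sketched as follows. This is an illustration only: it assumes a plain logistic curve as the sigmoid and assumes that frcComplete_fit is the ratio of the currently mapped length to the fitted saturation level (which would also explain values above 100 %). The exact functional form used in the cited study is not reproduced here, and all function names are hypothetical.

```python
import math


def logistic(t: float, saturation: float, rate: float, midpoint: float) -> float:
    """Cumulative mapped length at time t for a logistic growth curve."""
    return saturation / (1.0 + math.exp(-rate * (t - midpoint)))


def fraction_complete(current_length_km: float, saturation_km: float) -> float:
    """Completeness as the ratio of the mapped length to the fitted saturation level."""
    return current_length_km / saturation_km


# Serbia in the table: 61 233 km mapped at 74.3 % completeness, which under the
# above assumption implies a fitted saturation level of about 82 400 km.
implied_saturation_km = 61233 / 0.743
serbia_fraction = fraction_complete(61233, implied_saturation_km)

# At the midpoint of a logistic curve, half of the saturation level has been mapped.
half_way = logistic(t=2012.0, saturation=100.0, rate=0.8, midpoint=2012.0)
```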
10 ANNEX 2: LPIS DATA IN EUROPE
The Land Parcel Identification System (LPIS) is the geographic information system component of IACS
(Integrated Administration and Control System). It contains the delineation of the reference areas,
ecological focus areas and agricultural management areas that are eligible for CAP payments. Due to
different national priorities in the implementation of the CAP, the LPIS data are not homogeneous
throughout Europe. Some countries use rather aggregated physical blocks as the geometric reference,
while other countries use single management units as the smallest geographic entity within LPIS.
Three types of spatial objects can in principle be differentiated within LPIS (Figure 10-1):
• Physical block
• Field parcel
• Single management unit
A physical block is defined as the area that is surrounded by permanent non-agricultural objects like
roads, rivers, settlements, forest or other features. A physical block normally contains several
field parcels that may belong to various farmers.
A field parcel is owned by a single farmer and is the dedicated management unit for a specific
agricultural main category. It contains either grassland or arable land, but never both categories. It
may be split into several single management units.
A single management unit contains a specific crop type (e.g. winter wheat, maize) or a specific
grassland management type (e.g. one-cut meadow, two-cut meadow, meadow with three or more cuts).
The relation between field parcels and single management units is a 1:m relation, where one field
parcel can contain one or many single management units.
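The three LPIS object types and their containment relations can be sketched as a small data structure. This is an illustrative model only; the class and attribute names are hypothetical and do not correspond to any national LPIS schema.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SingleManagementUnit:
    crop_or_management_type: str   # e.g. "winter wheat" or "two-cut meadow"
    area_ha: float


@dataclass
class FieldParcel:
    farmer_id: str                 # a field parcel belongs to a single farmer
    main_category: str             # either "arable land" or "grassland", never both
    units: List[SingleManagementUnit] = field(default_factory=list)  # 1:m relation

    def area_ha(self) -> float:
        return sum(unit.area_ha for unit in self.units)


@dataclass
class PhysicalBlock:
    block_id: str
    parcels: List[FieldParcel] = field(default_factory=list)

    def farmers(self) -> Set[str]:
        """A physical block may contain parcels of several farmers."""
        return {parcel.farmer_id for parcel in self.parcels}


block = PhysicalBlock("B-001", [
    FieldParcel("farmer-A", "arable land", [
        SingleManagementUnit("winter wheat", 2.5),
        SingleManagementUnit("maize", 1.5),
    ]),
    FieldParcel("farmer-B", "grassland", [
        SingleManagementUnit("two-cut meadow", 3.0),
    ]),
])
```

Countries that publish only physical blocks would populate just the top level of such a structure, which is one reason why LPIS data are not homogeneous across Europe.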
Remark on the relation with the real estate database:
In many countries, land properties are recorded in the real estate database. An elementary part of
the real estate database is the cartographic representation of real estate parcel boundaries. These
parcels should not be confused with the field parcels. The parcels in the real estate database
constitute ownership rights and are documented as official boundaries. The boundaries of
agricultural management units, however, are delineated to derive correct area figures as the basis
for payments under the CAP. In most cases the parcel boundaries in the real estate database and
in the LPIS will be the same, but not in all cases.
Figure 10-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit
The availability of national LPIS data is increasing from year to year. An overview of available and
downloadable datasets has been created, based on a previous ETC-ULS report, by Synergise within
the Horizon 2020 project LandSense (Figure 10-2). The minimum thematic detail that should be
harmonized from LPIS data is the classification into (1) arable land, (2) permanent grassland, (3)
permanent crops, (4) ecological focus areas and (5) others. As these categories contain quite
heterogeneous classes at national level and are applied inconsistently across Europe, a
harmonization is challenging. Nevertheless, the geometric delineation of field parcels at any
level could already provide valuable input, given that LPIS is available for the large majority of
countries. Unless a critical number of countries with available LPIS data is reached, the inclusion of
LPIS will add more bias to the resulting geometric CLC-Backbone than a method based solely
on image segmentation.
Figure 10-2: Overview of available GIS datasets LPIS and their thematic content © Synergise, project
LandSense
11 ANNEX 3: ILLUSTRATION OF STEP 1 “HARDBONE GEOMETRY” OF CLC-BACKBONE
12 ANNEX 4: CLC-BACKBONE DATA MODEL
The complete list of feature types within CLC-Backbone contains:
• INSPIRE
o Land cover dataset
o Land cover unit
• EAGLE
o Land cover component
Each land cover dataset can contain several land cover units (in our case Level 2 landscape objects).
Each land cover unit contains a valid geometry (polygon border in our case). A specific land cover
unit can consist of one or more land cover components (either in time and/or space), each of which
with a specific lifespan.
Figure 12-1: INSPIRE main feature types: land cover dataset and land cover unit
Figure 12-2: EAGLE extension to land cover unit and new feature type land cover component
Each land cover unit can contain one or more so-called parametric observations. According to
INSPIRE, these parametric observations are limited to percentage, countable and presence
parameters. In order to broaden the usage and to integrate the required single-date observations
(e.g. mean NDVI or vegetation trajectories), this parameter type is extended with the data type
“IndexParameter”.
Therefore, the CLC-Backbone data model includes the single-date observations as ParameterType,
with a specific observationDate and a measurement stored as a real number (0…1 or 0…100). The new
CLC-Backbone base model is shown in Figure 12-3.
Figure 12-3: The new CLC-Basis model (based on LISA)
The new CLC-Backbone data model now includes three feature types:
• LandCoverDataset
• LandCoverUnit
• LandCoverComponent
and two additional data types:
• LandCoverUnit
o ParameterType
▪ Name: name of the parameter type (e.g. NDVI_max, NDVI_mean, NDVI_min, …)
▪ ObservationDate: date of the observation (e.g. acquisition date of the satellite scene)
▪ IndexParameter: value of e.g. NDVI_max, etc. to be stored here
• LandCoverComponent
o Occurrence
▪ Cover percentage of a single land cover component within the LC unit
o Overlaying
▪ 1: if the LC component is overlaying another component (e.g. temporarily
covered with water, snow)
▪ 0: default value, if the component is not overlaying
o validFrom: The time when the activity complex started to exist in the real world.
o validTo: The time when the activity complex no longer exists in the real world.
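The feature types and data types listed above can be sketched as plain data classes. This is a minimal illustration, not the INSPIRE or EAGLE application schema: the attribute names follow the list above, while the geometry is reduced to a WKT string and the `land_cover` attribute is an assumption added for readability.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ParameterType:
    name: str                 # e.g. "NDVI_max", "NDVI_mean", "NDVI_min"
    observation_date: date    # e.g. acquisition date of the satellite scene
    index_parameter: float    # value of the index, 0..1 or 0..100


@dataclass
class LandCoverComponent:
    land_cover: str                    # assumed attribute naming the component
    occurrence: float                  # cover percentage within the LC unit
    overlaying: int = 0                # 1 if overlaying another component, 0 otherwise
    valid_from: Optional[date] = None
    valid_to: Optional[date] = None


@dataclass
class LandCoverUnit:
    geometry_wkt: str                  # polygon border of the Level 2 landscape object
    components: List[LandCoverComponent] = field(default_factory=list)
    parametric_observations: List[ParameterType] = field(default_factory=list)


@dataclass
class LandCoverDataset:
    name: str
    units: List[LandCoverUnit] = field(default_factory=list)


unit = LandCoverUnit(
    geometry_wkt="POLYGON((0 0, 0 100, 100 100, 100 0, 0 0))",
    components=[LandCoverComponent("herbaceous vegetation", 100.0,
                                   valid_from=date(2018, 4, 1))],
    parametric_observations=[ParameterType("NDVI_mean", date(2018, 5, 10), 0.72)],
)
dataset = LandCoverDataset("CLC-Backbone 2018 (example)", [unit])
```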
12.1 Integration of temporal dimension
12.1.1 Integration of temporal dimension into LISA
As LISA was historically set up as a single snapshot within a 3-year interval (the repetition cycle
of the orthophoto campaign), no temporal phenomena could be modelled. The underlying assumption for
the adaptation of the model is that the land cover unit establishes the more or less stable geometry
over time (>1 year continuity).
Within the land cover unit several land cover components can exist with a specific life time. The land
cover unit can be considered as stable container for these changing land cover components.
Types of changes:
Changes within a land cover unit are in general regarded as status changes.
We consider changes within the land cover components as cyclic changes.
Long-term ecosystem changes (conditional changes) will rather be recorded in various attributes that
will be defined within the products nr. 5.
The implementation of the temporal dynamics is realized using a 1:many relation between the
LandCoverUnit and the LandCoverComponent in combination with the defined Lifespan of the
specific component.
Example
For a typical agricultural field, the land cover unit could have the following LC components:
• Bare soil January – March
• Herbaceous vegetation April – June
• Bare soil 2nd half of June
• Herbaceous vegetation July – September
• Bare soil 2nd half of September
• Herbaceous vegetation October – December
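The agricultural field example above can be sketched as a list of components with lifespans. The exact dates are hypothetical (the year 2018 and the mid-month split points are assumptions), and the rule that an alternation of bare soil and herbaceous vegetation yields "periodically herbaceous" is an illustrative simplification rather than the specified classification method.

```python
from datetime import date
from typing import List, Optional, Tuple

Component = Tuple[str, date, date]  # (land cover, validFrom, validTo)

components: List[Component] = [
    ("bare soil", date(2018, 1, 1), date(2018, 3, 31)),
    ("herbaceous vegetation", date(2018, 4, 1), date(2018, 6, 15)),
    ("bare soil", date(2018, 6, 16), date(2018, 6, 30)),
    ("herbaceous vegetation", date(2018, 7, 1), date(2018, 9, 15)),
    ("bare soil", date(2018, 9, 16), date(2018, 9, 30)),
    ("herbaceous vegetation", date(2018, 10, 1), date(2018, 12, 31)),
]


def component_on(day: date, comps: List[Component]) -> Optional[str]:
    """Return the land cover component whose lifespan contains the given day."""
    for land_cover, valid_from, valid_to in comps:
        if valid_from <= day <= valid_to:
            return land_cover
    return None


def classify_unit(comps: List[Component]) -> str:
    """Derive a unit-level class from the set of components seen during the year."""
    covers = {land_cover for land_cover, _, _ in comps}
    if covers == {"bare soil", "herbaceous vegetation"}:
        return "periodically herbaceous"   # i.e. arable land
    if len(covers) == 1:
        return next(iter(covers))
    return "unclassified"


cover_in_may = component_on(date(2018, 5, 10), components)
cover_late_june = component_on(date(2018, 6, 20), components)
unit_class = classify_unit(components)
```

The 1:many relation between LandCoverUnit and LandCoverComponent corresponds here to one list of components per unit.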
A web-based application within the GSE Cadaster Environment project (financed by ESA) is available
under: http://demo.geoville.com/shiny/cadenv/p2/wien/ to demonstrate the above-mentioned
concept (Figure 12-4).
Figure 12-4: illustration of temporal NDVI profile and derived land cover components throughout a
vegetation season (Example taken from project Cadaster Env Austria, © Geoville 2017)
Important note:
It is not foreseen in the technical specifications that each temporal LC component is stored in the
underlying CLC Base production database. The production chain may just provide the final
classification of the LC unit in the final CLC-Backbone product, without giving information on the
appearance of land cover components throughout the vegetation season.
12.1.2 Integration of multi-temporal observations into LISA
Spectral profiles over time constitute a valuable information source (Figure 12-5), not only for the
automated classification, but also for the visual interpretation of object characteristics. For each
CLC-Backbone object, various time series of NDVI indices (NDVI_mean, NDVI_max, NDVI_min, etc.)
can be stored in the data model using the parametricObservation.
Figure 12-5: Draft sketch to illustrate the principles of the combined temporal information that is
stored in the data model
The sketch above illustrates the principles of the combined temporal information that is stored in
the data model:
• The graph illustrates the variation within a year within a LandCoverUnit. The
geometric borders of the unit are pre-fixed based on the Level 2 landscape object
geometries.
o Red points: single observations from each S-2 scene calculated on object level (Level
2 landscape polygon)
▪ From these observations (stored in the parametricObservation of the
LandCoverUnit), the complete temporal profile for the specific object can be
reconstructed as a graph and visually illustrated to domain experts.
o Brown and green bars: lifespan of the single LandCoverComponents (green –
herbaceous vegetation and brown – bare soil) within a land cover unit
o Dark green bar: classification of the land cover unit according to the information of
the sequential appearance of land cover components
Within the LandCoverComponent, the attributes validFrom and validTo are used to store the lifetime
information of the component. If only the start, but not the end of a LandCoverComponent can be
observed (e.g. due to missing observations in time), only validFrom can be used. The element
will automatically be set to invalid if a new LandCoverComponent appears (with 100% coverage of
the land cover unit).
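The rule for open-ended components can be sketched as follows. This is an assumption about one possible implementation: "set to invalid" is interpreted here as filling the missing validTo with the validFrom of the newly appearing component. The class `Component` and the function `add_component` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Component:
    land_cover: str
    occurrence: float                 # cover percentage within the land cover unit
    valid_from: date
    valid_to: Optional[date] = None   # None = end of lifespan not (yet) observed


def add_component(components: List[Component], new: Component) -> None:
    """Append a component; a new 100 % component closes all still-open ones."""
    if new.occurrence >= 100.0:
        for existing in components:
            if existing.valid_to is None:
                existing.valid_to = new.valid_from  # assumed end of validity
    components.append(new)


history: List[Component] = []
add_component(history, Component("bare soil", 100.0, date(2018, 1, 1)))
add_component(history, Component("herbaceous vegetation", 100.0, date(2018, 4, 1)))
```

After the second call the bare soil component ends on 1 April 2018, while the herbaceous component remains open until a later observation closes it.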
13 ANNEX 5 – CLC NOMENCLATURE