VTL: Validation and Transformation Language ESTP training course Item 4 Luxembourg, 21-22 Nov 2017 [email protected] Eurostat, Unit B1



VTL: Validation and Transformation Language

ESTP training course

Item 4

Luxembourg, 21-22 Nov 2017

[email protected]

Eurostat, Unit B1


VTL: the origin

• Based on a generic information model that can be used with different standards: SDMX, DDI, GSIM or others

• VTL is maintained by the VTL Task Force, composed of members of Eurostat, ECB, ILO, INEGI, Bank of Italy, ISTAT

• The VTL Task Force works under the umbrella of the SDMX Technical Working Group


SDMX, which stands for Statistical Data and Metadata eXchange, is an international initiative that aims at standardising and modernising (“industrialising”) the mechanisms and processes for the exchange of statistical data and metadata among international organisations and their member countries.

SDMX is sponsored by seven international organisations: the Bank for International Settlements (BIS), the European Central Bank (ECB), Eurostat (the Statistical Office of the European Union), the International Monetary Fund (IMF), the Organisation for Economic Co-operation and Development (OECD), the United Nations Statistics Division (UNSD), and the World Bank.


VTL – purposes

1. provide an unambiguous language to communicate validation rules between different statistical organisations

2. provide a high-level language to document the data transformations

3. provide an efficient language for implementing data validation services

4. provide an efficient language for implementing data transformations


Versions of VTL

• VTL 1.0 published in March 2015
  • Collection of comments (public review)

• VTL 1.1 published in November 2016
  • Collection of comments (public review)

• VTL 2.0 will be published in December 2017

• SDMX web site: http://sdmx.org/?page_id=5096


VTL – main principles

Most of the VTL operators operate on datasets

A dataset is described by dimensions, measures and attributes

Example:

ds_bop_1

REF_AREA     PARTNER      TIME         OBS_VALUE    OBS_STATUS
(Dimension)  (Dimension)  (Dimension)  (Measure)    (Attribute)
EU25         CA           2010         20           D
BG           CA           2010         1            P
RO           CA           2010         1            P
EU27         CA           2010         23           P


VTL – main principles

Example of a typical VTL operation:

ds3 := ds1 + ds2

Operations carried out by VTL:

• join the data points of ds1 and ds2 using the dimension values

• apply the scalar function "+" to all pairs of numeric measures of ds1 and ds2 having the same name

• if desired, execute an attribute propagation function defined by the user (e.g. concatenate the "flag" attribute of the two data points)

• create a temporary dataset containing the resulting data points
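The four steps above can be sketched in Python. This is only an illustration of the semantics, not how a VTL engine is implemented: the dictionary layout, the function names `add_datasets` and `concat_flags`, and the sample values are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

# Assumed layout: a dataset maps a tuple of dimension values
# (REF_AREA, PARTNER, TIME) to a data point holding named measures
# and a single "flag" attribute.
Key = Tuple[str, str, int]
Dataset = Dict[Key, dict]


def concat_flags(flag1: str, flag2: str) -> str:
    """Hypothetical user-defined attribute propagation: concatenate the flags."""
    return flag1 + flag2


def add_datasets(ds1: Dataset, ds2: Dataset,
                 propagate: Callable[[str, str], str] = concat_flags) -> Dataset:
    result: Dataset = {}
    # Join: keep only data points whose dimension values match in both datasets.
    for key in sorted(ds1.keys() & ds2.keys()):
        p1, p2 = ds1[key], ds2[key]
        # Apply "+" to every pair of measures having the same name.
        shared = p1["measures"].keys() & p2["measures"].keys()
        measures = {name: p1["measures"][name] + p2["measures"][name]
                    for name in shared}
        # Optional attribute propagation, then store the resulting data point.
        result[key] = {"measures": measures,
                       "flag": propagate(p1["flag"], p2["flag"])}
    return result


# Illustrative values only.
ds1 = {("BG", "CA", 2010): {"measures": {"OBS_VALUE": 1}, "flag": "P"},
       ("RO", "CA", 2010): {"measures": {"OBS_VALUE": 1}, "flag": "P"}}
ds2 = {("BG", "CA", 2010): {"measures": {"OBS_VALUE": 4}, "flag": "D"},
       ("EU27", "CA", 2010): {"measures": {"OBS_VALUE": 23}, "flag": "P"}}

ds3 = add_datasets(ds1, ds2)
print(ds3)
```

Only the BG data point appears in the result, because it is the only one whose dimension values are present in both inputs.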


Example of VTL validation rules

• Hierarchical validation rules

• Data point validation rules

• Time-series rules

• Boolean conditions

check ( ds1#obs_value >= 0 )


VTL – hierarchical ruleset

Hierarchical ruleset: hr_euro_agg

N.  Antecedent variables: time    Rule variables: ref_area
1                                 EU15 = AT + BE + LU + DE + ES + FI + FR + EL + IE + IT + NL + PT + DK + UK + SE
2                                 EU25 = EU15 + CY + CZ + EE + HU + LT + LV + MT + PL + SK + SI
3                                 EU27 = EU25 + BG + RO
4                                 EU28 = EU27 + HR
5   time between 1995 and 2003    EU = EU15
6   time between 2004 and 2005    EU = EU25
7   time between 2006 and 2012    EU = EU27
8   time >= 2013                  EU = EU28

VTL syntax:

define hierarchical ruleset hr_euro_agg ( antecedent variable = time, variable = ref_area ) is
    EU15 = AT + BE + LU + DE + ES + FI + FR + EL + IE + IT + NL + PT + DK + UK + SE ;
    EU25 = EU15 + CY + CZ + EE + HU + LT + LV + MT + PL + SK + SI ;
    EU27 = EU25 + BG + RO ;
    EU28 = EU27 + HR ;
    when time between 1995 and 2003 then EU = EU15 ;
    when time between 2004 and 2005 then EU = EU25 ;
    when time between 2006 and 2012 then EU = EU27 ;
    when time >= 2013 then EU = EU28 ;
end hierarchical ruleset
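To make the meaning of such a ruleset concrete, the following Python sketch evaluates a few of the hr_euro_agg rules against a small dataset. It is an illustration only: the data layout, the function `check_hierarchy`, the sample values, and the choice to skip a rule when any of its codes is missing from the data are all assumptions, and the PARTNER dimension is left out for brevity.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A rule is (optional condition on time, left-hand code, component codes).
Rule = Tuple[Optional[Callable[[int], bool]], str, List[str]]

# A subset of hr_euro_agg, transcribed by hand for the example.
RULES: List[Rule] = [
    (None, "EU27", ["EU25", "BG", "RO"]),
    (None, "EU28", ["EU27", "HR"]),
    (lambda t: 2006 <= t <= 2012, "EU", ["EU27"]),
    (lambda t: t >= 2013, "EU", ["EU28"]),
]

Data = Dict[Tuple[str, int], float]  # (ref_area, time) -> obs_value


def check_hierarchy(data: Data, rules: List[Rule]) -> List[Tuple[str, int, float, float]]:
    """Return (code, time, reported, expected) for every rule that fails."""
    failures = []
    for t in sorted({time for _, time in data}):
        for condition, left, parts in rules:
            # The antecedent condition decides whether the rule applies at time t.
            if condition is not None and not condition(t):
                continue
            # Assumption: a rule is skipped when any of its codes has no data.
            if (left, t) not in data or any((p, t) not in data for p in parts):
                continue
            expected = sum(data[(p, t)] for p in parts)
            if data[(left, t)] != expected:
                failures.append((left, t, data[(left, t)], expected))
    return failures


# Illustrative values only.
data: Data = {
    ("EU25", 2010): 100, ("BG", 2010): 3, ("RO", 2010): 4,
    ("EU27", 2010): 110, ("EU", 2010): 110,
    ("EU25", 2011): 100, ("BG", 2011): 3, ("RO", 2011): 4,
    ("EU27", 2011): 107,
}

failures = check_hierarchy(data, RULES)
print(failures)
```

In this sample, the 2010 value reported for EU27 differs from the sum of its components, so that data point is returned; the 2011 data satisfy the rule and produce no failure.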


VTL – datapoint validation ruleset

create datapoint ruleset dr_flow_positive ( flow, obs_value )
    when flow = "IMP" or flow = "EXP" then obs_value > 0 ;
end datapoint ruleset

The datapoint ruleset:

• is defined on the variables flow and obs_value

• verifies that in each data point of the dataset to be validated (not shown here) the component obs_value is greater than zero when the flow is "IMP" or "EXP"

• is created by the above syntax as a permanent object named "dr_flow_positive"
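The behaviour of the datapoint ruleset above can be mimicked in a few lines of Python. This is a sketch under assumptions: the list-of-dicts layout, the helper `check_datapoints`, the flow code "BAL" and the sample values are invented for the example, and the result is assumed to contain only the data points that violate the rule.

```python
from typing import Callable, Dict, List

DataPoint = Dict[str, object]


def dr_flow_positive(point: DataPoint) -> bool:
    """when flow = "IMP" or flow = "EXP" then obs_value > 0"""
    if point["flow"] in ("IMP", "EXP"):
        return point["obs_value"] > 0
    # The condition does not hold, so the rule does not constrain this point.
    return True


def check_datapoints(dataset: List[DataPoint],
                     rule: Callable[[DataPoint], bool]) -> List[DataPoint]:
    """Return the data points for which the rule is not satisfied."""
    return [point for point in dataset if not rule(point)]


# Illustrative values only; "BAL" is a hypothetical flow code.
dataset = [
    {"ref_area": "AA", "flow": "IMP", "obs_value": 12},
    {"ref_area": "BB", "flow": "EXP", "obs_value": -3},
    {"ref_area": "CC", "flow": "BAL", "obs_value": -7},
]

failing = check_datapoints(dataset, dr_flow_positive)
print(failing)
```

The "CC" data point has a negative value but is not reported, because the rule only applies to the flows "IMP" and "EXP".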


VTL – checking boolean conditions

ds_result := check ( ds1 # obs_value > 1000, errorcode ( "Value must be greater than 1000" ) )


Exercise 1

VTL code:

ds_result := check ( ds_bop # time_period between 2008 and 2015, errorcode("_____"), errorlevel("Error") ) ;

ds_bop is the dataset containing the data to be validated.

Question: What is the correct text (error message) to be inserted in _____ ?


Exercise 2

VTL code:

ds_result := check ( ds_bop1 # obs_value, hr_euro_agg ) ;

hr_euro_agg is the hierarchical ruleset defined earlier (slide 9).

Question: Which data point is contained in the ds_result dataset?

ds_bop1

REF_AREA   PARTNER   TIME   OBS_VALUE   OBS_STATUS
EU25       CA        2010   20          D
BG         CA        2010   1           P
RO         CA        2010   1           P
EU27       CA        2010   23          P


Exercise 3

VTL code:

ds_result := check ( ds_bop1, dr_flow_positive ) ;

dr_flow_positive is the datapoint ruleset defined earlier (slide 10).

Question: Which data point is contained in the ds_result dataset?

ds_bop1

REF_AREA   PARTNER   FLOW   TIME   OBS_VALUE   OBS_STATUS
EU25       CA        IMP    2010   20          D
BG         CA        IMP    2010   1           P
RO         CA        IMP    2010   0           P
EU27       CA        IMP    2010   23          P


VTL – assessment of usability

Assessment of usability by statisticians:

• Covering several domains: Animal Production, Asylum, International Trade in Services, National Accounts, Short Term Statistics

• Participation of 8 countries + Eurostat

Some comments received:

• Rules in plain English and examples of bad/good data are both essential

• Rules in VTL may be useful as a complement (to limit risks of ambiguity)

• Need to agree on the way to express the rule (negative or positive)


Development of VTL tools

IT tools and services under development:

• ECB VTL parser

• Norway Java API based on JSON-stat format: https://github.com/statisticsnorway/java-vtl

• Poland VTL to SQL translator (UNECE paper)

• Istat VTL Editor

• ESTAT Compiler (part of the Validation Service)

• ESTAT Validation Rule Manager

• ESTAT Sandbox: simple GUI + VTL translator to SQL


Use of VTL

Use of VTL:

• ECB BIRD portal: VTL is used to document the data validations and transformations of the statistical process: http://banks-integrated-reporting-dictionary.eu/bird-group

• Continuous Capture of Metadata: there is a proposal to use VTL as a common language to describe data transformations: http://c2metadata.org/


Thank you for your attention!

Any questions?