Revision Project of the Business Register (BR) and Business Statistics in 2009-2014

- General overview of the project (Sami)
- Data Linking and Data Warehousing, statistical point of view (Sami)
- Technical point of view (Riitta)

6 April 2011

Sami Saarikivi
Riitta Piela


Background / Results from the present status description (1)

- Overlapping work is done in the different business statistics systems and units
- Data are copied and the copies are processed differently
- Changes are made to several different databases or, in the worst case, errors are corrected in only one place!
- Different definitions are used in different databases and production systems:
  - Unit structures (establishment, enterprise, etc.)
  - Industry (NACE), sector, turnover, employees...

→ Inconsistent statistics

Background / Results from the present status description (2)

Pressure for a comprehensive development of the data system also comes from:

- Statistics Finland's operational strategy and strategy for economic statistics
- International regulations and decrees
- The previous data system revision was made in 1995-1998

→ The technical service life is coming to an end


Statistical systems in the integrated system (defining phase 2009)

1. "Dependent" statistics of the integrated production database
   A. Business Register (Business Trends department)
   B. Some SBS and other statistics (Business Structures department):
      * Financial statement
      * Regional and industrial statistics
      * International trade in services
      * FATS statistics (inward and outward)
      * Industrial output / Commodity (PRODCOM)
   C. STS statistics (Business Trends department)

2. "Independent and small" statistics to the warehouse:
   - Business services (the first system)
   - Others in the near future

BR into the core of data on enterprises – timetable for 2009-2014

ID  Started    Concluded   Name of task
1   1.1.2009   31.12.2009  Defining phase
2   1.12.2009  31.12.2010  Planning and implementation of a system for receiving administrative data
3   2.3.2010   30.12.2011  Planning and implementation of databases
4   3.1.2011   28.12.2012  Planning and implementation of application programs
5   3.1.2011   28.12.2012  Planning and implementation of direct data collections
6   3.1.2011   28.12.2012  Conversion of data
7   2.3.2010   31.12.2013  Planning and implementation of the warehouse of data on enterprises
8   2.1.2012   31.12.2013  Planning and implementation of statistics and products (Internet service)

YR45U project organisation in 2011-2014

[Figure: organisation chart. Recoverable labels:]

- Steering group: top authority and project marketing
- Main project (YR45U)
- Supporting group of statistics
- Supporting group of IT
- Other experts: help when needed
- Temporary unit
- Sub-projects 1 to n, each with a Project Manager and members

Tasks in 2011:

- YR45U (main project): Databases
- Sub-projects:
  - YR45L: Applications 1
  - YR45M: Applications 2
  - YR45N: Direct data collections
  - YR45P: Conversion
  - YR45X: Introduction


Collected results of the desired state


System part: Reception of administrative data
  Key development targets:
    • Utilisation of metadata
    • Process management
    • Variable editor and editing
  Benefits to be gained:
    • Intensification / Introduction of uniform practices to pre-checking of data

System part: Direct data collections
  Key development targets:
    • Uniform tools
    • Facilitation of responding
  Benefits to be gained:
    • Improvement of quality and intensification of activity

System part: Databases
  Key development targets:
    • Adoption of the enterprise concept
    • Uniform production database
  Benefits to be gained:
    • Improvement of quality
    • International comparability
    • Non-response and response burden get smaller

System part: Application programs and processing of data
  Key development targets:
    • Intensification of processing (decrease of applications)
    • Introduction of a process management application
  Benefits to be gained:
    • Improvement of quality and intensification of activity

System part: Warehouse of data on enterprises
  Key development targets:
    • Uniform location for data on enterprises
    • Basic data search
    • Comparison of data
  Benefits to be gained:
    • Coherence of statistics

System part: Statistics and products
  Key development targets:
    • Introduction of a unit information service (Internet)
    • Adoption of a chargeable target group definition service (Internet)
  Benefits to be gained:
    • Improvement of quality and intensification of activity
    • Improvement of customer service

Data Linking and Data Warehousing (DW)

Statistical point of view

- Timetable
- Basic features
- Benefits
- Challenges
- Other requirements

6 April 2011

Sami Saarikivi

Timetable of the DW

Years 2009-2010
- Basic ideas and requirements for the structure and process
- Analysis of stakeholders and exploration of the needs

Year 2011 (spring)
- Basic structure of the database (architecture)
- Choice of technology

Years 2011 (autumn) – 2012
- Designing and implementation of application programs

Year 2013
- Ready for use

Basic features of the DW

- Covers all business data at Statistics Finland (in the long run)
- Includes all needed unit structures (enterprise groups, legal units, enterprises, establishments (LKAU), local unit, KAU) and other relevant variables for the statistical process
- Passive / read-only (changes are made to the production/operational database)
- Up-to-date information from the Business Register (updated once a day)
- Ready/validated unit data from other business statistics (permanent / compatible with releases/publications)
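The combination of a read-only warehouse and a daily update from the production side can be illustrated with a minimal Python sketch. The `Warehouse` class, the unit identifier and the field names are hypothetical and only stand in for the real databases; the sketch assumes that the whole snapshot is replaced at each refresh.

```python
from datetime import date
from types import MappingProxyType


class Warehouse:
    """Read-only copy of unit data, replaced as a whole at each refresh (hypothetical)."""

    def __init__(self):
        self._units = MappingProxyType({})
        self.loaded_on = None

    def refresh(self, production_units, today):
        # Copy every record so that later production edits do not leak in,
        # and wrap the copies in read-only views.
        frozen = {uid: MappingProxyType(dict(rec))
                  for uid, rec in production_units.items()}
        self._units = MappingProxyType(frozen)
        self.loaded_on = today

    def get(self, unit_id):
        return self._units[unit_id]


# Invented example data: one unit in the production database.
production = {"1234567-8": {"nace": "62.01", "employees": 10}}

warehouse = Warehouse()
warehouse.refresh(production, date(2011, 4, 5))

# A correction is made in the production database only.
production["1234567-8"]["employees"] = 12
before = warehouse.get("1234567-8")["employees"]  # warehouse still has the old value

# The next daily refresh brings the correction into the warehouse.
warehouse.refresh(production, date(2011, 4, 6))
after = warehouse.get("1234567-8")["employees"]
```

In this sketch a direct write to a warehouse record raises `TypeError`, so every change has to go through the production side and reach the warehouse at the next refresh.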

Benefits concerning the DW

- Uniform location for data on enterprises
- Basic data search and data comparisons
- The same unit structures are used
- The same classification of variables
- Unification of technical solutions and modes of action

→ easy to use, coherence of statistics

- Makes further processing of business data easier:
  - National accounts
  - Research and information services
  - Publishing and archiving of data

Benefits concerning the integrated production/operational databases

- Comparison of data at the beginning of the statistical process
  → Intensification of activity
  → Coherence of statistics
- No different kinds of production systems and databases; error checking and changes are made in one place
  → Intensification of activity
  → Quality of data
  → Reduction in overlapping work
  → Unification of the production process
- Benefits from other systems are easier to implement

Overall challenges

- Diverse use and requirements
  - Single unit base (validating, information services/legal unit)
  - Samples and research
  - For statistical purposes
- "Transition from several old systems to one new system"
  - Consensus between different kinds of business statistics
  - Variable definitions (naming and values of variables)
  - Uniform processes and modes of action
  - Employees' work tasks are going to change
- Lack of examples
  → MEETS/ESSnet (3.1) results? Timetable conflict

More challenges, solutions and benefits

Management of a large system and process?

- A uniform process-controlling application will be developed
  - Improving the transparency of the data process
  - Reducing person-dependency
  - New possibilities to divide work
- Use is made of the XML-based statistical metadata system
  - Data concerning the process management
  - Data concerning the data content, e.g. names and descriptions of variables, validation rules, ...
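How variable descriptions and validation rules held as XML metadata could drive a check is sketched below in Python. The XML layout, the element and attribute names and the rule types (`min`, `max`) are assumptions made for illustration; they are not the schema of the actual metadata system.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document: variable names, descriptions and simple rules.
METADATA_XML = """
<variables>
  <variable name="turnover" description="Annual turnover, EUR thousand">
    <rule type="min" value="0"/>
  </variable>
  <variable name="employees" description="Number of employees">
    <rule type="min" value="0"/>
    <rule type="max" value="500000"/>
  </variable>
</variables>
"""


def load_rules(xml_text):
    """Return {variable name: [(rule type, limit), ...]} read from the metadata."""
    root = ET.fromstring(xml_text)
    rules = {}
    for var in root.findall("variable"):
        rules[var.get("name")] = [
            (rule.get("type"), float(rule.get("value")))
            for rule in var.findall("rule")
        ]
    return rules


def validate(record, rules):
    """Check one record against the rules and return a list of error messages."""
    errors = []
    for name, checks in rules.items():
        if name not in record:
            errors.append(f"{name}: missing")
            continue
        value = record[name]
        for kind, limit in checks:
            if kind == "min" and value < limit:
                errors.append(f"{name}: {value} is below {limit:g}")
            elif kind == "max" and value > limit:
                errors.append(f"{name}: {value} is above {limit:g}")
    return errors


rules = load_rules(METADATA_XML)
ok_errors = validate({"turnover": 120, "employees": 15}, rules)
bad_errors = validate({"turnover": -5, "employees": 15}, rules)
```

Keeping the rules in metadata rather than in program code means that a rule can be changed in one place and every application that reads the metadata follows it.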

Requirements of the process-controlling application

- Covers the whole process (from data collection to publishing)
- The progression of the process
  - Tasks needed for the process, their order, whether performed successfully (traffic lights?) and when
  - Responsible persons
- Reading more precise descriptions (e.g. process descriptions, processing rules)
- Starting tasks and batch runs (e.g. data updates, limited rights)
- Automatic email messages (e.g. when a certain task is done or behind schedule)
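The requirements above (ordered tasks, traffic-light status, responsible persons, automatic messages) can be sketched as a small Python model. All class, task and person names are hypothetical, and the `outbox` list is only a stand-in for real email delivery.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Task:
    name: str
    responsible: str
    due: date
    done_on: Optional[date] = None

    def light(self, today):
        """Traffic light: green = done, red = behind schedule, yellow = pending."""
        if self.done_on is not None:
            return "green"
        return "red" if today > self.due else "yellow"


class Process:
    def __init__(self, tasks):
        self.tasks = list(tasks)  # list order is the required execution order
        self.outbox = []          # stand-in for automatic email messages

    def next_task(self):
        for task in self.tasks:
            if task.done_on is None:
                return task
        return None

    def complete(self, name, today):
        task = self.next_task()
        if task is None or task.name != name:
            raise ValueError(f"{name} is not the next task in the process")
        task.done_on = today
        self.outbox.append((task.responsible, f"{name} done"))

    def check_schedule(self, today):
        for task in self.tasks:
            if task.light(today) == "red":
                self.outbox.append((task.responsible, f"{task.name} behind schedule"))

    def status(self, today):
        return [(task.name, task.light(today)) for task in self.tasks]


process = Process([
    Task("data collection", "person A", date(2011, 3, 1)),
    Task("editing", "person B", date(2011, 4, 1)),
    Task("publishing", "person C", date(2011, 5, 1)),
])
process.complete("data collection", date(2011, 2, 25))
process.check_schedule(date(2011, 4, 6))
lights = process.status(date(2011, 4, 6))
```

Here `lights` shows the first task as green, the overdue editing task as red and the publishing task as yellow, and `outbox` holds one "done" and one "behind schedule" message.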

Basic requirements for the structure of the DW

- Clarity, ease of use and understandability
- Flexibility and expandability
- Convenience of use: speed of inquiries
- Compatibility with existing data systems
- Security: access will be subject to granted user rights

Further information about the revision:

• Mr Sami Saarikivi
• tel. +358 9 1734 3345
• email: [email protected]

Thank you!

Data Warehouse Architecture in the Revision Project of the Business Register and Business Statistics

Technical point of view

6 April 2011

Riitta Piela

Two definitions

A data warehouse is

- a copy of transaction data specifically structured for query and analysis (Ralph Kimball)
- a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process (Bill Inmon)

Data Warehouse Architecture

[Figure: the layers of the data warehouse shown against the phases of the statistical process]

Statistical process: Data Collection, Editing and Analyzing, Publication

Layers:
- Data Source Layer
- Data Extraction Layer
- Staging Area
- ETL Layer
- Data Storage Layer
- Data Logic Layer
- Data Presentation Layer
- Metadata Layer

- Data Source Layer: represents the different data sources that feed data into the system
- Data Extraction Layer: data gets pulled from the data sources into the system
- Staging Area: where data sits prior to being scrubbed and transformed into a data warehouse
- ETL Layer: where data gains its intelligence as logic is applied to transform the data from a transactional nature to an analytical nature

- Data Storage Layer: where the transformed and cleansed data sit
- Data Logic Layer: where business rules are stored
- Data Presentation Layer: refers to the information that reaches the users
- Metadata Layer: where information about the data stored in the data warehouse system is stored; also information on how the data warehouse system operates, such as ETL job status or process phase (technical metadata)
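The way data moves through the layers described above can be sketched with Python's built-in `sqlite3` module. This is a toy illustration only: the table names, column names and records are invented, and the project itself names SQL Server and SAS rather than SQLite.

```python
import sqlite3

# Data Source Layer: invented records (unit id, NACE code, raw turnover text).
SOURCE_ROWS = [
    ("1234567-8", "62.01", " 1 200 "),
    ("2345678-9", "47.11", "850"),
    ("3456789-0", "62.01", "n/a"),
]


def run_pipeline(rows):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE staging (unit_id TEXT, nace TEXT, turnover_raw TEXT)")
    db.execute("CREATE TABLE storage (unit_id TEXT PRIMARY KEY, nace TEXT, turnover INTEGER)")
    # Metadata Layer (technical metadata): one log row per ETL step.
    db.execute("CREATE TABLE etl_log (step TEXT, row_count INTEGER)")

    # Data Extraction Layer: pull the source rows into the Staging Area as they are.
    db.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
    db.execute("INSERT INTO etl_log VALUES ('extract', ?)", (len(rows),))

    # ETL Layer: scrub and transform the staged rows, then load the Data Storage Layer.
    loaded = 0
    staged = db.execute("SELECT unit_id, nace, turnover_raw FROM staging").fetchall()
    for unit_id, nace, raw in staged:
        digits = raw.replace(" ", "")
        if digits.isdigit():  # simple rule: turnover must be a whole number
            db.execute("INSERT INTO storage VALUES (?, ?, ?)",
                       (unit_id, nace, int(digits)))
            loaded += 1
    db.execute("INSERT INTO etl_log VALUES ('load', ?)", (loaded,))
    return db


def turnover_by_industry(db):
    # Data Presentation Layer: the aggregated view that reaches the users.
    return dict(db.execute("SELECT nace, SUM(turnover) FROM storage GROUP BY nace"))


db = run_pipeline(SOURCE_ROWS)
summary = turnover_by_industry(db)
log = db.execute("SELECT step, row_count FROM etl_log ORDER BY rowid").fetchall()
```

The staging table keeps the data exactly as received, so a rejected value such as `"n/a"` can still be inspected there after the load.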

Data Collection

[Figure: the data collection phase. Recoverable labels:]

- Data Source Layer: Admin data; Survey data (XML data)
- Data Extraction Layer: Technical validation process (SAS); SAS files; Admin Data + Survey Data Database (relational database)
- Metadata Layer

Workflow of receiving administrative data

[Figure: workflow diagram. Recoverable label: "Statistics production begins"]

Editing and analyzing

[Figure: the editing and analyzing phase. Recoverable labels:]

- Admin Data Database (original data)
- Operational Database
- A start-up application for process management and batch runs (.NET + SAS)
- Data Warehouse
- Layers involved: Staging Area, ETL Layer, Metadata Layer

Publication

[Figure: the publication phase. Recoverable labels:]

- Data Warehouse (unit data)
- Cubes (aggregated data): EnterpriseCube, EstablishmentCube
- Information Service Application
- Browsers (ProClarity, SAS EG, Excel)
- Layers involved: Data Storage Layer, Data Logic Layer, Data Presentation Layer, Metadata Layer

[Figure: storage modes for cube data. Recoverable labels:]

- ROLAP: data stay in the Data Warehouse (relational database); real-time data
- HOLAP: "formats there in between"
- MOLAP: data in Analysis Services' own data structure; low response time

Year labels in the figure: 2009, 2008, 2007-2000

Browsing the Data

- Measures
- Classifications
- Classification hierarchies
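How measures, classifications and classification hierarchies fit together when browsing a cube can be illustrated with a short Python sketch. The unit records are invented, and taking the leading characters of a NACE code as the hierarchy level is a simplification made for the example.

```python
from collections import defaultdict

# Invented unit data: one classification (NACE) and two measures per unit.
UNITS = [
    {"nace": "62.01", "turnover": 1200, "employees": 10},
    {"nace": "62.02", "turnover": 300, "employees": 4},
    {"nace": "47.11", "turnover": 850, "employees": 25},
]


def roll_up(units, level, measures=("turnover", "employees")):
    """Sum the measures by the first `level` characters of the NACE code.

    level=2 groups by division ("62"), level=5 by the full class ("62.01").
    """
    cube = defaultdict(lambda: dict.fromkeys(measures, 0))
    for unit in units:
        key = unit["nace"][:level]
        for measure in measures:
            cube[key][measure] += unit[measure]
    return dict(cube)


by_division = roll_up(UNITS, 2)
by_class = roll_up(UNITS, 5)
```

Moving from `by_class` to `by_division` corresponds to going one step up the classification hierarchy, while the measures stay the same.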

Goal

Low response time

Harmonise all statistical processes at Statistics Finland

Universal interface for all purposes

Speed the Statistical Process through good technical solutions

Optimal database structures for different purposes

Extensive Data Content


Technical challenges:

Components: SAS, .NET, SQL Server, eXist, Analysis Services

1. To combine all technical components into one functional entity

2. To hide the technological diversity from the end user