Revision Project of the Business Register (BR) and Business Statistics in 2009-2014
- General overview of the project (Sami)
- Data Linking and Data Warehousing, statistical point of view (Sami)
- Technical point of view (Riitta)
6 April 2011
Sami Saarikivi
Riitta Piela
Background / Results from the present status description (1)
Overlapping work is done in different business statistics systems and units:
- Data copies are made and the data are processed differently
- Changes are made to several different databases or, in the worst case, errors are corrected in only one place!
Different definitions are used in different databases and production systems:
- Unit structures (establishment, enterprise, etc.)
- Industry (NACE), sector, turnover, employees...
→ Inconsistent statistics
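The inconsistency problem can be illustrated with a minimal sketch. It assumes two hypothetical database copies keyed by a unit identifier; the field names and sample values are made up for illustration and do not come from Statistics Finland's systems.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]

def find_conflicts(copy_a: Dict[str, Record],
                   copy_b: Dict[str, Record]) -> List[Tuple[str, str, object, object]]:
    """Return (unit_id, variable, value_a, value_b) for every disagreement."""
    conflicts = []
    for unit_id in sorted(copy_a.keys() & copy_b.keys()):
        rec_a, rec_b = copy_a[unit_id], copy_b[unit_id]
        for variable in sorted(rec_a.keys() & rec_b.keys()):
            if rec_a[variable] != rec_b[variable]:
                conflicts.append((unit_id, variable, rec_a[variable], rec_b[variable]))
    return conflicts

# Hypothetical copies of the same units held by two production systems.
register_copy = {
    "U001": {"nace": "2511", "employees": 40},
    "U002": {"nace": "4711", "employees": 12},
}
sbs_copy = {
    "U001": {"nace": "2511", "employees": 40},
    "U002": {"nace": "4719", "employees": 12},  # corrected in one place only
}

conflicts = find_conflicts(register_copy, sbs_copy)
for unit_id, variable, a, b in conflicts:
    print(f"{unit_id}: {variable} differs ({a!r} vs {b!r})")
```

With a single production database this kind of comparison would not be needed, because each correction is made in only one place.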
Background / Results from the present status description (2)
Pressure for a comprehensive development of the data system also comes from:
- Statistics Finland's operational strategy and strategy for economic statistics
- International regulations and decrees
- The previous data system revision was made in 1995-1998
→ The technical service life is coming to an end
20/04/23 5Information Services
Statistical systems in the integrated system (defining phase 2009)
1. "Dependent" statistics of the integrated production database
   A. Business Register (Business Trends department)
   B. Some SBS and other statistics (Business Structures department):
      * Financial statement
      * Regional and industrial statistics
      * International trade in services
      * FATS statistics (inward and outward)
      * Industrial output / Commodity (PRODCOM)
   C. STS statistics (Business Trends department)
2. "Independent and small" statistics to the warehouse:
   * Business services (the first system)
   * Others in the near future
BR into the core of data on enterprises: timetable for 2009-2014
ID | Name of task | Started | Concluded
1 | Defining phase | 1.1.2009 | 31.12.2009
2 | Planning and implementation of a system for receiving administrative data | 1.12.2009 | 31.12.2010
3 | Planning and implementation of databases | 2.3.2010 | 30.12.2011
4 | Planning and implementation of application programs | 3.1.2011 | 28.12.2012
5 | Planning and implementation of direct data collections | 3.1.2011 | 28.12.2012
6 | Conversion of data | 3.1.2011 | 28.12.2012
7 | Planning and implementation of the warehouse of data on enterprises | 2.3.2010 | 31.12.2013
8 | Planning and implementation of statistics and products (Internet service) | 2.1.2012 | 31.12.2013
YR45U project organisation in 2011-2014
Organisation chart (labelled "Temporary unit"):
- Steering group: top authority and project marketing
- Main project (YR45U)
- Other experts: help when needed
- Supporting group of statistics
- Supporting group of IT
- Sub-project 1 ... Sub-project n, each with a Project Manager + members
Tasks in 2011:
- YR45U (main project): Databases
- Sub-projects:
  YR45L: Applications 1
  YR45M: Applications 2
  YR45N: Direct data collections
  YR45P: Conversion
  YR45X: Introduction
20.04.23 8 Business Trends / Business Register
Collected results of the desired state
Common matters of business statistics / Definition of the warehouse of data on enterprises

System part: Reception of administrative data
Key development targets:
• Utilisation of metadata
• Process management
• Variable editor and editing
Benefits to be gained:
• Intensification / Introduction of uniform practices to pre-checking of data

System part: Direct data collections
Key development targets:
• Uniform tools
• Facilitation of responding
Benefits to be gained:
• Improvement of quality and intensification of activity

System part: Databases
Key development targets:
• Adoption of the enterprise concept
• Uniform production database
Benefits to be gained:
• Improvement of quality
• International comparability
• Non-response and response burden get smaller

System part: Application programs and processing of data
Key development targets:
• Intensification of processing (decrease of applications)
• Introduction of a process management application
Benefits to be gained:
• Improvement of quality and intensification of activity

System part: Warehouse of data on enterprises
Key development targets:
• Uniform location for data on enterprises
• Basic data search
• Comparison of data
Benefits to be gained:
• Coherence of statistics

System part: Statistics and products
Key development targets:
• Introduction of a unit information service (Internet)
• Adoption of a chargeable target group definition service (Internet)
Benefits to be gained:
• Improvement of quality and intensification of activity
• Improvement of customer service
Data Linking and Data Warehousing (DW)
Statistical point of view
- Timetable
- Basic features
- Benefits
- Challenges
- Other requirements
6 April 2011
Sami Saarikivi
Timetable of the DW
Years 2009-2010
- Basic ideas and requirements for the structure and process
- Analysis of stakeholders and exploration of the needs
Year 2011 (spring)
- Basic structure of the database (architecture)
- Choice of technology
Year 2011 (autumn) to 2012
- Design and implementation of application programs
Year 2013
- Ready for use
Basic features of the DW
- Covers all business data at Statistics Finland (in the long run)
- Includes all needed unit structures (enterprise groups, legal units, enterprises, establishments (LKAU), local units, KAU) and other relevant variables for the statistical process
- Passive / read-only (changes are made to the production/operational database)
- Up-to-date information from the Business Register (updated once a day)
- Ready/validated unit data from other business statistics (permanent / compatible with releases/publications)
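The read-only and daily-update features can be sketched as follows. This is only an illustration under stated assumptions: SQLite stands in for the real database technology (which the slides do not name at this point), and the table and column names are hypothetical.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
operational_path = os.path.join(workdir, "operational.db")
warehouse_path = os.path.join(workdir, "warehouse.db")

# Operational (production) database: the only place where changes are made.
op = sqlite3.connect(operational_path)
op.execute("CREATE TABLE unit (unit_id TEXT PRIMARY KEY, nace TEXT)")
op.execute("INSERT INTO unit VALUES ('U001', '2511')")
op.commit()

def refresh_warehouse() -> None:
    """Daily batch: rebuild the warehouse copy from the operational database."""
    wh = sqlite3.connect(warehouse_path)
    wh.execute("DROP TABLE IF EXISTS unit")
    wh.execute("CREATE TABLE unit (unit_id TEXT PRIMARY KEY, nace TEXT)")
    rows = op.execute("SELECT unit_id, nace FROM unit").fetchall()
    wh.executemany("INSERT INTO unit VALUES (?, ?)", rows)
    wh.commit()
    wh.close()

refresh_warehouse()

# Users open the warehouse read-only, so edits cannot be made there.
reader = sqlite3.connect(f"file:{warehouse_path}?mode=ro", uri=True)
try:
    reader.execute("UPDATE unit SET nace = '9999'")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True

# A correction goes to the operational database and is visible after the next refresh.
op.execute("UPDATE unit SET nace = '2512' WHERE unit_id = 'U001'")
op.commit()
before = reader.execute("SELECT nace FROM unit").fetchone()[0]
reader.close()

refresh_warehouse()
reader = sqlite3.connect(f"file:{warehouse_path}?mode=ro", uri=True)
after = reader.execute("SELECT nace FROM unit").fetchone()[0]
reader.close()
print(write_blocked, before, after)
```

The design choice shown is that the warehouse never receives corrections directly; it only reflects the state of the operational database at the latest refresh.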
Benefits concerning the DW
- Uniform location for data on enterprises
- Basic data search and data comparisons
- The same unit structures are used
- The same classification of variables
- Unification of technical solutions and modes of action
→ Easy to use, coherence of statistics
Makes further processing of business data easier:
- National accounts
- Research and information services
- Publishing and archiving of data
Benefits concerning the integrated production/operational databases
Comparison of data at the beginning of the statistical process
→ Intensification of activity
→ Coherence of statistics
No different kinds of production systems and databases; error checking and changes are made in one place
→ Intensification of activity
→ Quality of data
→ Reduction in overlapping work
→ Unification of the production process
Benefits from other systems are easier to implement
Overall challenges
Diverse use and requirements
- Single unit base (validating, information services/legal unit)
- Samples and research
- For statistical purposes
"Transition from several old systems to one new system"
- Consensus between different kinds of business statistics
- Variable definitions (naming and values of variables)
- Uniform processes and modes of action
- Employees' work tasks are going to change
Lack of examples
→ MEETS/ESSnet (3.1) results? Timetable conflict
More challenges, solutions and benefits
Management of a large system and process?
A uniform process-controlling application will be developed
- Improving the transparency of the data process
- Reducing person-dependency
- New possibilities to divide work
Use is made of the XML-based statistical metadata system
- Data concerning the process management
- Data concerning the data content, e.g. names and descriptions of variables, validation rules, ...
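How metadata-driven validation might look can be sketched as below. The XML layout (element and attribute names), the rule types and the sample record are all assumptions made for illustration; the slides do not describe the actual schema of the statistical metadata system.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document; element and attribute names are assumptions.
METADATA_XML = """
<variables>
  <variable name="turnover" description="Annual turnover, EUR thousand">
    <rule type="min" value="0"/>
  </variable>
  <variable name="employees" description="Number of employees">
    <rule type="min" value="0"/>
    <rule type="max" value="100000"/>
  </variable>
</variables>
"""

def load_rules(xml_text):
    """Map each variable name to its list of (rule_type, limit) pairs."""
    root = ET.fromstring(xml_text)
    rules = {}
    for var in root.findall("variable"):
        rules[var.get("name")] = [
            (rule.get("type"), float(rule.get("value"))) for rule in var.findall("rule")
        ]
    return rules

def validate(record, rules):
    """Return a list of human-readable rule violations for one record."""
    errors = []
    for name, checks in rules.items():
        if name not in record:
            errors.append(f"{name}: missing")
            continue
        for rule_type, limit in checks:
            if rule_type == "min" and record[name] < limit:
                errors.append(f"{name}: {record[name]} is below {limit:g}")
            if rule_type == "max" and record[name] > limit:
                errors.append(f"{name}: {record[name]} is above {limit:g}")
    return errors

rules = load_rules(METADATA_XML)
errors = validate({"turnover": -5, "employees": 12}, rules)
print(errors)
```

Keeping the rules in metadata rather than in program code means a rule can be changed without touching the applications that apply it.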
Requirements of the process-controlling application
- Covers the whole process (from data collection to publishing)
- The progression of the process: tasks needed for the process, their order, whether performed successfully (traffic lights?) and when
- Responsible persons
- Reading more precise descriptions (e.g. process descriptions, processing rules)
- Starting tasks and batch runs (e.g. data updates, limited rights)
- Automatic email messages (e.g. when a certain task is done or behind schedule)
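A minimal sketch of the task tracking behind these requirements, assuming a simple ordered task list. The task names, dates, addresses and the exact traffic-light rule are hypothetical, and sending the email itself is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class Task:
    name: str
    responsible: str                   # responsible person (hypothetical address)
    due: date
    finished: Optional[date] = None
    succeeded: Optional[bool] = None

def traffic_light(task: Task, today: date) -> str:
    """green = done successfully, red = failed or overdue, yellow = pending."""
    if task.finished is not None:
        return "green" if task.succeeded else "red"
    return "red" if today > task.due else "yellow"

def notifications(tasks: List[Task], today: date) -> List[str]:
    """Build message texts; actually sending e-mail is outside this sketch."""
    messages = []
    for task in tasks:  # list order is the process order
        light = traffic_light(task, today)
        if task.finished is not None and task.succeeded:
            messages.append(f"To {task.responsible}: '{task.name}' is done")
        elif light == "red":
            messages.append(f"To {task.responsible}: '{task.name}' needs attention")
    return messages

process = [
    Task("Receive administrative data", "a@example.org", date(2011, 4, 1),
         finished=date(2011, 3, 30), succeeded=True),
    Task("Technical validation", "b@example.org", date(2011, 4, 4)),
    Task("Load to operational database", "c@example.org", date(2011, 4, 8)),
]

today = date(2011, 4, 6)
lights = [traffic_light(t, today) for t in process]
print(lights)
for message in notifications(process, today):
    print(message)
```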
Basic requirements for the structure of the DW
- Clarity, ease of use and understandability
- Flexibility and expandability
- Convenience of use: speed of queries
- Compatibility with existing data systems
- Security: access will be subject to granted user rights
Further information about the revision:
• Mr Sami Saarikivi
• tel. +358 9 1734 3345
• email: [email protected]
Thank you!
Data Warehouse Architecture in the Revision Project of the Business Register and Business Statistics
Technical point of view
6 April 2011
Riitta Piela
Two definitions
A data warehouse is
- a copy of transaction data specifically structured for query and analysis (Ralph Kimball)
- a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process (Bill Inmon)
20/04/23 20Riitta Piela
Data Warehouse Architecture
Statistical process: Data Collection → Editing and Analyzing → Publication
Layers: Data Source Layer, Data Extraction Layer, Staging Area, ETL Layer, Data Storage Layer, Data Logic Layer, Data Presentation Layer, Metadata Layer
Data Source Layer
- represents the different data sources that feed data into the system
Data Extraction Layer
- data gets pulled from data sources into the system
Staging Area
- where data sits prior to being scrubbed and transformed into a data warehouse
ETL Layer
- data gains its intelligence as logic is applied to transform the data from a transactional nature to an analytical nature
Data Storage Layer
- where the transformed and cleansed data sit
Data Logic Layer
- where business rules are stored
Data Presentation Layer
- refers to the information that reaches the users
Metadata Layer
- where information about the data stored in the data warehouse system is stored
- also information on how the data warehouse system operates, such as ETL job status or process phase (technical metadata)
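The layers can be walked through in a toy pipeline. This is a sketch under stated assumptions: plain Python lists stand in for databases, and the source names, field names, cleansing rule and business rule are all invented for illustration.

```python
from typing import Dict, List

# Data Source Layer: hypothetical raw records as they arrive from two sources.
SOURCES: Dict[str, List[dict]] = {
    "admin": [{"id": "U001", "turnover": " 1200 "}, {"id": "U002", "turnover": "n/a"}],
    "survey": [{"id": "U003", "turnover": "300"}],
}

def extract(sources: Dict[str, List[dict]]) -> List[dict]:
    """Data Extraction Layer: pull rows from every source into the staging area."""
    return [dict(row, source=name) for name, rows in sources.items() for row in rows]

def transform(staging: List[dict]) -> List[dict]:
    """ETL Layer: scrub values and drop rows that cannot be cleansed."""
    cleaned = []
    for row in staging:
        value = row["turnover"].strip()
        if value.isdigit():
            cleaned.append({"id": row["id"], "turnover": int(value), "source": row["source"]})
    return cleaned

metadata = {}  # Metadata Layer: technical metadata such as ETL job status

staging_area = extract(SOURCES)    # Staging Area: raw data before scrubbing
storage = transform(staging_area)  # Data Storage Layer: transformed, cleansed data
metadata["etl_job"] = {"read": len(staging_area), "loaded": len(storage), "status": "ok"}

# Data Logic Layer: a business rule kept apart from the stored data.
def is_large(row: dict) -> bool:
    return row["turnover"] >= 1000

# Data Presentation Layer: what finally reaches the user.
report = [row["id"] for row in storage if is_large(row)]
print(report, metadata["etl_job"])
```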
Data Collection (diagram)
Elements: Admin data; Survey data (XML data); Technical validation process (SAS); SAS files; Admin Data + Survey Data Database (Relational Database)
Layers shown: Metadata Layer, Data Source Layer, Data Extraction Layer
Workflow of receiving administrative data
Statistics production begins
Editing and analyzing (diagram)
Elements: Admin Data Database (original data); Operational Database; a start-up application for process management and batch runs (.NET + SAS); Data Warehouse
Layers shown: Metadata Layer, Staging Area, ETL Layer
Publication (diagram)
Elements: Data Warehouse (unit data); Cubes (aggregated data): EnterpriseCube, EstablishmentCube; Information Service Application; Browsers (ProClarity, SAS EG, Excel)
Layers shown: Metadata Layer, Data Storage Layer, Data Logic Layer, Data Presentation Layer
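The step from unit data to aggregated cubes can be sketched in a few lines. The rows, dimension names and measures below are hypothetical, and a plain dictionary stands in for the OLAP cube technology used in the actual system.

```python
from collections import defaultdict

# Hypothetical unit-level rows as they might sit in the data warehouse.
unit_data = [
    {"enterprise": "E1", "nace": "25", "region": "Uusimaa", "turnover": 500, "employees": 10},
    {"enterprise": "E2", "nace": "25", "region": "Uusimaa", "turnover": 300, "employees": 4},
    {"enterprise": "E3", "nace": "47", "region": "Lappi", "turnover": 200, "employees": 3},
]

def build_cube(rows, dimensions, measures):
    """Aggregate unit rows into cells keyed by the chosen classification values."""
    cube = defaultdict(lambda: {m: 0 for m in measures})
    for row in rows:
        cell = cube[tuple(row[d] for d in dimensions)]
        for m in measures:
            cell[m] += row[m]
    return dict(cube)

enterprise_cube = build_cube(unit_data, ("nace", "region"), ("turnover", "employees"))
print(enterprise_cube[("25", "Uusimaa")])
```

Browsers then query the aggregated cells rather than the unit rows, which is what keeps response times low.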
ROLAP: data stays in the Data Warehouse (relational database); real-time data
HOLAP: "formats there in between"
MOLAP: data in Analysis Services' own data structure; low response time
Years shown: 2009, 2008, 2007-2000
Browsing the Data
- Measures
- Classifications
- Classification hierarchies
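Browsing along a classification hierarchy amounts to summing a measure at a coarser level. The sketch below assumes 4-digit NACE class codes whose leading digits give the group and division; the turnover values are made up.

```python
from collections import Counter

# Measure: turnover per NACE class (4-digit codes; the values are made up).
turnover_by_class = {"2511": 120, "2512": 80, "2550": 50, "4711": 300}

# Classification hierarchy: number of leading digits kept at each level.
LEVELS = {"division": 2, "group": 3, "class": 4}

def roll_up(values, level):
    """Sum a measure up to the requested level of the classification."""
    digits = LEVELS[level]
    totals = Counter()
    for code, value in values.items():
        totals[code[:digits]] += value
    return dict(totals)

by_division = roll_up(turnover_by_class, "division")
by_group = roll_up(turnover_by_class, "group")
print(by_division)
print(by_group)
```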
Goal
Low response time
Harmonise all statistical processes at Statistics Finland
Universal interface for all purposes
Speed up the statistical process through good technical solutions
Optimal database structures for different purposes
Extensive Data Content