Statistics New Zealand’s Case Study
"Creating a New Business Model for a National Statistical Office for the 21st Century"
Craig Mitchell, Gary Dunnet, Matjaz Jug
Overview
• Introduction: organization, programme, strategy
• The Statistical Metadata Systems and the Statistical Cycle: description of the metainformation systems, overview of the process model, description of different metadata groups
• Statistical Metadata in each phase of the Statistical Cycle: metadata produced & used
• Systems and Design issues: IT architecture, tools, standards
• Organizational and cultural issues: user groups
• Lessons learned
[Organisation chart, last updated 20/06/07: Government Statistician Geoff Bascand; Deputy Government Statisticians Rachael Milicich (Acting – Macro-Economic, Environment, Regional & Geography Statistics), Dallas Welch (Industry & Labour Statistics) and Cathryn Ashley-Jones (Social & Population Statistics); General Managers David Archer (Corporate Services), Vince Galvin (Statistical & Methodological Services), Gary Dunnet (Business & Dissemination Services), Nancy McBeth (Strategy & Communication), Sharleen Forbes (Statistical Education & Research), Whetu Wereta (Maori Statistics Unit), Ray Freeman (Auckland Office) and Andrew Hunter (Christchurch Office); Chief Information Officer Matjaz Jug. Individual business units and contact extensions are omitted here.]
Business Model Transformation Strategy
1. A number of standard, generic, end-to-end processes for the collection, analysis and dissemination of statistical data and information:
– includes statistical methods
– covers the business process life-cycle
– enables statisticians to focus on data quality, implement best-practice methods, and achieve greater coordination and effective resource utilisation
2. A disciplined approach to data and metadata management, using a standard information lifecycle
3. An agreed enterprise-wide technical architecture
BmTS & Metadata
The Business Model Transformation Strategy (BmTS) is designing a metadata management strategy that ensures metadata:
– fits into a metadata framework that can adequately describe all of Statistics New Zealand's data and, under the Official Statistics Strategy (OSS), the data of other agencies
– documents all the stages of the statistical life cycle, from conception to archiving and destruction
– is centrally accessible
– is automatically populated during the business process, wherever possible
– is used to drive the business process
– is easily accessible by all potential users
– is populated and maintained by data creators
– is managed centrally
A - Existing Metadata Issues
• metadata is not kept up to date
• metadata maintenance is considered a low priority
• metadata is not held in a consistent way
• relevant information is unavailable
• there is confusion about what metadata needs to be stored
• the existing metadata infrastructure is under-utilised
• the metadata needs of advanced data users are not being met
• it is difficult to find information unless you have some expertise or know it exists
• there is inconsistent use of classifications/terminology
• in some instances there is little information about data: where it came from, the processes it has been through, or even the question to which it relates
B - Target Metadata Principles
• metadata is centrally accessible
• metadata structure is strongly linked to data
• metadata is shared between data sets
• content structure conforms to standards
• metadata is managed end-to-end in the data life cycle
• there is a registration process (workflow) associated with each metadata element
• metadata is captured at source, automatically
• the cost to producers is justified by the benefit to users
• metadata is considered active
• metadata is managed at as high a level as possible
• metadata is readily available and usable in the context of clients' information needs (internal or external)
• the use of some types of metadata (e.g. classifications) is tracked
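As a concrete illustration of the registration-workflow and capture-at-source principles, here is a minimal Python sketch; the element fields, statuses and helper function are hypothetical, not Statistics NZ's actual design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical registration statuses for a metadata element's workflow.
STATUSES = ("draft", "under_review", "registered", "superseded")

@dataclass
class MetadataElement:
    """A single, centrally managed metadata element (illustrative only)."""
    name: str
    definition: str
    owner: str                      # data creator responsible for maintenance
    content: dict = field(default_factory=dict)
    status: str = "draft"           # position in the registration workflow
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def advance(self) -> None:
        """Move the element one step through the registration workflow."""
        position = STATUSES.index(self.status)
        if position < len(STATUSES) - 1:
            self.status = STATUSES[position + 1]

def capture_at_source(process_step: str, parameters: dict) -> MetadataElement:
    """Create operational metadata automatically as a by-product of a process step."""
    return MetadataElement(
        name=f"{process_step}.parameters",
        definition=f"Operational metadata captured automatically by '{process_step}'",
        owner=process_step,
        content=parameters,
    )

element = capture_at_source("collect.sample_selection", {"strata": 12, "sample_size": 3000})
element.advance()                   # draft -> under_review
print(element.name, element.status, element.content)
```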
How to get from A to B?
1. Identified the ten key components of our information model.
2. Service Oriented Architecture.
3. Developed Generic Business Process Model.
4. Development approach from ‘stove-pipes’ to ‘components’ and ‘core’ teams.
5. Governance – Architectural Reviews & Staged Funding Model.
6. Re-use of components.
10 Components within BmTS
1. Input Data Store
2. Output Data Store
3. Metadata Store / Statistical Process Knowledge Base
4. Analytical Environment
5. Information Portal
6. Transformations
7. Respondent Management
8. Customer Management
9. Reference Data Stores
10. Dashboard / Workflow
[Diagram: multi-modal collection (e-form, CAI, imaging, administrative data) feeds the components above, with data moving through raw, clean, aggregate, summary and 'UR' stages; output channels include RADL, CURFs, INFOS and the web; the whole sits within the Official Statistics System & Data Archive.]
Statistics New Zealand Current Information Framework
[Diagram: the generic business process (Need, Design/Build, Collect, Process, Analyse, Disseminate) is currently supported by a range of information stores organised by subject area (silos) – e.g. QMS, Ag, HES – plus an ICS store, a Time Series store (& INFOS) and a Web store, alongside a statistical Metadata Store (e.g. SIM), Reference Data Stores (e.g. BF, CARS), a Software Register, a Document Register, and Management Information (HR & Finance data stores).]
Statistics New Zealand Future Information Framework
[Diagram: the same generic business process (Need, Design/Build, Collect, Process, Analyse, Disseminate), now supported by a single Input Data Store holding raw, clean and summary data, an Output Data Store (a confidentialised copy of the IDS, physically separated) feeding TS, ICS and WEB outputs, a combined Metadata Store (statistical / process / knowledge), a Reference Data Store, a Software Register, a Document Register, and Management Information (HR & Finance data stores).]
CMF – gBPM Mapping

CMF Lifecycle Model                       | Statistics NZ gBPM (sub-process level)
1 - Survey planning and design            | Need (sub-processes 1.1 - 1.5) + Develop & Design (sub-processes 2.1 - 2.6)
2 - Survey preparation                    | Build (sub-processes 3.1 - 3.7) + Collect (sub-process 4.1)
3 - Data collection                       | Collect (sub-processes 4.2 - 4.4)
4 - Input processing                      | Collect (sub-process 4.5) + Process (sub-processes 5.1 - 5.3)
5 - Derivation, estimation, aggregation   | Process (sub-processes 5.4 - 5.7)
6 - Analysis                              | Analyse (sub-processes 6.1 - 6.6)
7 - Dissemination                         | Disseminate (sub-processes 7.1 - 7.5)
8 - Post-survey evaluation                | Not an explicit process, but seen as a vital feedback loop
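Expressed as data, this mapping can drive tooling that labels gBPM sub-processes with their CMF stage. A minimal Python sketch of the lookup (the dictionary simply restates the table above):

```python
# CMF lifecycle stage -> Statistics NZ gBPM phases and sub-process ranges (from the table above).
CMF_TO_GBPM = {
    "survey planning and design": [("Need", "1.1-1.5"), ("Develop & Design", "2.1-2.6")],
    "survey preparation": [("Build", "3.1-3.7"), ("Collect", "4.1")],
    "data collection": [("Collect", "4.2-4.4")],
    "input processing": [("Collect", "4.5"), ("Process", "5.1-5.3")],
    "derivation, estimation, aggregation": [("Process", "5.4-5.7")],
    "analysis": [("Analyse", "6.1-6.6")],
    "dissemination": [("Disseminate", "7.1-7.5")],
    # post-survey evaluation is not an explicit gBPM process, but a feedback loop
}

def gbpm_phases(cmf_stage: str) -> list[str]:
    """Return the gBPM phase names that cover a given CMF lifecycle stage."""
    return [phase for phase, _ in CMF_TO_GBPM.get(cmf_stage.lower(), [])]

print(gbpm_phases("Input processing"))  # ['Collect', 'Process']
```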
Metadata: End-to-End
Need
– capture requirements, e.g. usage of data, quality requirements
– access existing data element concept definitions to clarify requirements
Design
– capture constraints and basic dissemination plans, e.g. products
– capture design parameters that could be used to drive automated processes, e.g. stratification
– capture descriptive metadata about the collection – methodologies used
– reuse or create required data definitions, questions, classifications
Build
– capture operational metadata about the selection process, e.g. number in each stratum
– access design metadata to drive the selection process
Collect
– capture metadata about the process
– access procedural metadata about rules used to drive processes
– capture metadata, e.g. quality metrics
Metadata: End-to-End (2)
Process
– capture metadata about the operation of processes
– access procedural metadata, e.g. edit parameters
– create and/or reuse derivation definitions and imputation parameters
Analyse
– capture metadata, e.g. quality measures
– access design parameters to drive estimation processes
– capture information about quality assurance and sign-off of products
– access definitional metadata to be used in the creation of products
Disseminate
– capture operational metadata
– access procedural metadata about customers
– metadata is needed to support Search, Acquire, Analyse (incl. integrate) and Report
– capture re-use requirements, including the importance of the data – fitness for purpose
– archive or destruction – detail on the length of the data life cycle
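To make the "capture once, drive later phases" idea concrete, here is a small Python sketch in which hypothetical stratification parameters captured at Design drive both sample selection (Build/Collect) and estimation weights (Analyse); the collection name, strata and sizes are invented for illustration.

```python
import random

# Hypothetical design metadata captured in the Design phase; in the BmTS these
# parameters would live in the central metadata store, not in code.
design_metadata = {
    "collection": "example_business_survey",
    "strata": {
        "small":  {"population": 9000, "sample_size": 300},
        "medium": {"population": 1500, "sample_size": 150},
        "large":  {"population": 200,  "sample_size": 200},  # take-all stratum
    },
}

def select_sample(design: dict, frame: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build/Collect: draw the sample for each stratum from the stored design parameters."""
    return {
        stratum: random.sample(frame[stratum], spec["sample_size"])
        for stratum, spec in design["strata"].items()
    }

def expansion_weights(design: dict) -> dict[str, float]:
    """Analyse: derive estimation weights from the same design metadata."""
    return {
        stratum: spec["population"] / spec["sample_size"]
        for stratum, spec in design["strata"].items()
    }

frame = {s: [f"{s}-{i}" for i in range(spec["population"])]
         for s, spec in design_metadata["strata"].items()}
sample = select_sample(design_metadata, frame)
print(expansion_weights(design_metadata))  # e.g. {'small': 30.0, 'medium': 10.0, 'large': 1.0}
```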
Metadata: End-to-End – Worked Example
Question text: "Are you employed?"
Need
– concept discussed with users
– check international standards
– assess existing collections & questions
Design
– design question text, answers & methodologies
– align with output variables (e.g. ILO classifications)
– data model, supported through the meta-model
– develop the Business Process Model – process & data / metadata flows
Build
– Concept Library – questions, answers & methods
– 'Plug & Play' methods, with parameters (metadata) the key
– system of linkages (no hard-coding)
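The 'Plug & Play' point can be sketched as a method registry in which processing steps are chosen and parameterised entirely from metadata rather than hard-coded; the method names and parameters below are illustrative only, not the actual BmTS concept library.

```python
# A minimal sketch of the 'plug & play' idea: statistical methods are registered
# once and selected by name, with parameters supplied as metadata rather than
# hard-coded into each survey's processing system.

METHOD_REGISTRY = {}

def register(name):
    """Decorator that adds a method to the shared registry (the 'concept library')."""
    def wrap(fn):
        METHOD_REGISTRY[name] = fn
        return fn
    return wrap

@register("imputation.mean")
def mean_imputation(values, **params):
    """Replace missing values (None) with the mean of the reported values."""
    reported = [v for v in values if v is not None]
    mean = sum(reported) / len(reported)
    return [mean if v is None else v for v in values]

@register("imputation.constant")
def constant_imputation(values, *, constant=0, **params):
    """Replace missing values (None) with a constant taken from the metadata."""
    return [constant if v is None else v for v in values]

def run_step(step_metadata, values):
    """Drive the processing step entirely from metadata: method name plus parameters."""
    method = METHOD_REGISTRY[step_metadata["method"]]
    return method(values, **step_metadata.get("parameters", {}))

# Two collections share the same code but behave differently via their metadata.
print(run_step({"method": "imputation.mean"}, [10, None, 14]))
print(run_step({"method": "imputation.constant", "parameters": {"constant": 0}}, [10, None, 14]))
```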
Metadata: End-to-End – Worked Example (continued)
Question text: "Do you live in Wellington?"
Collect
– question, answers & methods rendered to the questionnaire
– deliver the question to respondents
– confirm the quality of the concept
Process
– draw questions, answers & methods from the meta-store
– business logic drawn from the 'rules engine'
Analyse
– deliver question text, answers & methods to the analyst
– search & discover data through metadata
– access the knowledge base (metadata)
Disseminate
– deliver question text, answers & methods to the user
– archive question text, answers & methods
Conceptual View of Metadata
Anything related to data, but not dependent on the data, is metadata.
There are four types of metadata in the model, as defined by MetaNet: Conceptual (including contextual), Operational, Quality and Physical.
Implementation: Dimensional Model
[Diagram: a central FACT table surrounded by dimensions – standard classifications and standard variables; survey, instruments and survey mode; standard data definitions; standard questions.]
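A minimal sketch of the star-schema idea, assuming made-up keys and dimension contents: each fact carries only its value plus keys into the surrounding dimensions.

```python
# Illustrative only: each reported value is stored as a 'fact' that references
# the surrounding dimensions (question, classification, collection instance,
# unit of interest) by key, rather than repeating their content.

dimensions = {
    "question":            {1: "Do you live in Wellington?"},
    "classification":      {1: ("CITY", "WGTN")},
    "collection_instance": {1: "Census 2006"},
    "unit_of_interest":    {1: "respondent 000123"},
}

facts = [
    # fact value plus foreign keys into each dimension
    {"value": "yes", "question": 1, "classification": 1,
     "collection_instance": 1, "unit_of_interest": 1},
]

def describe(fact: dict) -> str:
    """Resolve a fact's keys against the dimensions to produce a readable record."""
    parts = [f"{dim}={dimensions[dim][fact[dim]]}" for dim in dimensions]
    return f"value={fact['value']!r} ({', '.join(parts)})"

print(describe(facts[0]))
```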
Input Data Environment Metadata Architecture
[Diagram: FACT tables in the Input Data Environment sit behind a service layer, alongside reference data and classifications, with user access through the Information Portal.]
IDE/MetaStore data model (version 2.0.06)
[Entity-relationship diagram of the IDE/MetaStore. Dimension groups cover questions & variables, fact definitions, collections & instruments, respondents and units of interest, versioning, time and generic dimension hierarchies, plus static reference tables and the IDE operational and exceptions areas. Tables include question, answer_part, question_answer_part, instrument, instrument_question_map, instrument_variable_map, instrument_instance, instrument_mode, instrument_attribute, instrument_attribute_type, mode, collection, collection_instance, unit_of_interest, supplying_unit, response, response_attribute, response_attribute_type, strata, strata_attribute, weight, variable_library, fact_definition, fact_definition_classification, classification_used, fact_classification, fact, fact_life_cycle, exception_fact, reason_for_change, additional_dimension, dim_level, dim_member, fact_defn_dimension, domain_value and time. Relationships for the exception_fact table are not depicted; they are implied between parent-table primary keys and the corresponding foreign keys in exception_fact.]
Goal: Overall Metadata Environment
[Diagram: search and discovery, and metadata and data access services, sit over frames/reference stores, schemas, data definitions, classification management, business logic, a question library, passive metadata store/s and the data itself.]
Metadata: Recent Practical Experiences
• Generic data model – federated cluster design
  – metadata is the key
  – corporately agreed dimensions
  – data is integrateable, rather than integrated
• Blaise to Input Data Environment
  – exporting Blaise metadata
• 'Rules Engine'
  – currently based around a spreadsheet
  – working with a workflow engine to improve it (BPM based)
• IDE metadata tool
  – currently spreadsheet based
• Audience model
  – public, professional, technical – plus an added 'system' audience
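A minimal Python sketch of the spreadsheet-style rules idea: edit rules held as rows of data and applied generically, rather than hard-coded per survey. The rule identifiers, fields and checks are invented for illustration.

```python
# Edit rules expressed as data (metadata): each row names a field, a check and a
# message, and a single generic function applies them to any unit record.

edit_rules = [
    # rule_id, field,       check,                                       message
    ("E01",    "age",       lambda v: v is not None and 0 <= v <= 120,   "age outside plausible range"),
    ("E02",    "employees", lambda v: v is not None and v >= 0,          "employee count cannot be negative"),
    ("E03",    "turnover",  lambda v: v is not None,                     "turnover is missing"),
]

def apply_edits(record: dict, rules=edit_rules) -> list[str]:
    """Run every rule against a unit record and return the failures."""
    failures = []
    for rule_id, field, check, message in rules:
        if not check(record.get(field)):
            failures.append(f"{rule_id}: {message}")
    return failures

print(apply_edits({"age": 130, "employees": 5, "turnover": None}))
# ['E01: age outside plausible range', 'E03: turnover is missing']
```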
SOA
[Diagram: a service layer (message and data bus) connects application services (transaction management, directory services, resource management, execution engine, load management), support functions (security, application administration, system monitoring), process management services (queuing, workflow, scheduling), business rules (rules engine) and transformation/database services. Existing tools – Blaise, respondent management (CRM), customer management (CRM), call centre, SAS, ETL tools, SQL Server and others – connect to the bus through adapters; a data warehouse (BI cubes, SAS etc.) supports analytics; channel interfaces cover intranet, extranet, web services and the internet.]
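The adapter layer in the diagram can be sketched as a common service interface wrapped around each tool, with callers going through the service layer rather than tool-specific APIs; the class names and message format below are invented, not the actual BmTS services.

```python
from abc import ABC, abstractmethod

class ServiceAdapter(ABC):
    """Common interface every tool adapter exposes to the service layer."""
    @abstractmethod
    def handle(self, message: dict) -> dict: ...

class CollectionToolAdapter(ServiceAdapter):
    def handle(self, message: dict) -> dict:
        # would delegate to the real collection instrument (e.g. CAI software)
        return {"status": "collected", "unit": message["unit"]}

class AnalyticsToolAdapter(ServiceAdapter):
    def handle(self, message: dict) -> dict:
        # would delegate to the analytical environment (e.g. a stats package)
        return {"status": "analysed", "dataset": message["dataset"]}

class ServiceLayer:
    """Routes messages to whichever adapter is registered for the service name."""
    def __init__(self):
        self._services: dict[str, ServiceAdapter] = {}

    def register(self, name: str, adapter: ServiceAdapter) -> None:
        self._services[name] = adapter

    def send(self, service: str, message: dict) -> dict:
        return self._services[service].handle(message)

bus = ServiceLayer()
bus.register("collect", CollectionToolAdapter())
bus.register("analyse", AnalyticsToolAdapter())
print(bus.send("collect", {"unit": "000123"}))
```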
Standards & Models – The MetaNet Reference Model™
Two-level model based on:
• Concepts = basic ideas, the core of the model
• Characteristics = elements and attributes that make concepts unique
• Terms and descriptions can be adapted
• Concepts must stay the same
• Concepts should be distinct and consistent
• Concepts have hierarchy and relationships
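A minimal sketch of the two-level idea, with illustrative concept identifiers: the concept's identity stays fixed, its terms and descriptions can be adapted, and its characteristics and relationships are what make it distinct.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Characteristic:
    """An element/attribute that helps make a concept unique."""
    name: str
    value: str

@dataclass
class Concept:
    concept_id: str                      # fixed: must stay the same everywhere
    preferred_term: str                  # adaptable: local terms may differ
    description: str                     # adaptable
    characteristics: tuple = ()          # what distinguishes this concept
    broader: "Concept | None" = None     # hierarchy and relationships between concepts

data_element = Concept("C-DE", "Data element", "A unit of data described by metadata")
variable = Concept(
    "C-VAR", "Variable", "A characteristic of a statistical unit",
    characteristics=(Characteristic("has_classification", "yes"),),
    broader=data_element,
)
print(variable.concept_id, "->", variable.broader.preferred_term)
```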
Defining Metadata Concepts: Example
[Diagram: a Collection (e.g. Census, frequency = 5-yearly) has Collection Instances (e.g. Census 2006), which use questionnaires (A and B) containing questions. Each question maps to a fact definition, which in turn links to classifications. For example, the question "Do you live in Wellington?" maps to the fact definition "Person lives in Wellington", classified under Classification CITY (category WGTN) and Classification NZ Island (category NTH ISL); the questions "How old are you?" and "What is your age?" both map to the fact definition "Age of person".]
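The same example can be sketched in code: several question wordings resolve to one fact definition, which carries its classifications, so responses collected under either wording can be analysed together. The keys and the age classification shown are invented for illustration.

```python
# Fact definitions carry the concept and its classifications; keys are illustrative.
fact_definitions = {
    "FD-WGTN": {"description": "Person lives in Wellington",
                "classifications": [("CITY", "WGTN"), ("NZ Island", "NTH ISL")]},
    "FD-AGE":  {"description": "Age of person",
                "classifications": [("AGE", "single year")]},   # invented classification
}

# Several question texts can point at the same underlying fact definition.
questions = {
    "Do you live in Wellington?": "FD-WGTN",
    "How old are you?": "FD-AGE",
    "What is your age?": "FD-AGE",
}

def fact_definition_for(question_text: str) -> dict:
    """Resolve a question wording to its shared fact definition."""
    return fact_definitions[questions[question_text]]

assert fact_definition_for("How old are you?") is fact_definition_for("What is your age?")
print(fact_definition_for("Do you live in Wellington?")["classifications"])
```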
How will we use MetaNet?
1. Use it to guide the development of a Stats NZ model
2. Another model (SDMX) will be used for additional support where there are gaps
3. Provides the base for consistency across systems and frameworks
4. Will allow for better use and understanding of data
5. Will highlight duplications and gaps in current storage
Metainformation systems
[Diagram: a concept-based model spanning the main metadata stores – SIM, CARS and the IDE – covering data collections, variables, statistical units, sample design, classifications, categories, concordances, domain values, collections, fact classifications and responses. Other metadata is stored in the Business Frame, survey systems, BmTS components, etc.]
Metadata Users - External
• Government
• Public
• External Statisticians (incl. international organisations)
Metadata Users - Internal
– Statistical Analysts
– IT Personnel (business analysts, IT designers & technical leads, developers, testers etc.)
– Management
– Data Managers / Custodians / Archivists
– Statistical Methodologists
– External Statisticians (researchers etc.)
– Architects – data, process & application
– Respondent Liaison
– Survey Developers
– Metadata and Interoperability Experts
– Project Managers & Teams
– IT Management
– Product Development and Publishing
– Information Customer Services
Lessons Learnt – Metadata Concepts
• Apart from the 'basic' principles, metadata principles are quite difficult to get a good understanding of, and this makes communicating them even harder.
• Everyone has a view on what metadata they need – the list of metadata requirements / elements can be endless. Given the breadth of metadata, an incremental approach to the delivery of storage facilities is fundamental.
• Establish a metadata framework, one that best fits your organisation, on which discussions can be based – we have agreed on MetaNet, supplemented with SDMX.
Lessons Learnt – BPM
• To make data re-use a reality there is a need to go back to first principles, i.e. what is the concept behind the data item? Surprisingly, it can be difficult for some subject-matter areas to identify these first principles, particularly if the collection has been in existence for some time.
• Be prepared for survey-specific requirements: the BPM exercise is absolutely needed to define the common processes and identify potentially required survey-specific features.
Lessons Learnt – Implementation
• Without significant governance it is very easy to start with a generic service concept and yet still deliver a silo solution. Ongoing upgrade of all generic services is needed to avoid this.
• Expecting delivery of generic services from input / output specific projects leads to significant tensions, particularly in relation to added scope elements within fixed resource schedules. Delivery of business services at the same time as developing and delivering the underlying architecture services adds significant complexity to implementation.
Lessons Learnt – Implementation
• A well-defined relationship between data and metadata is very important. The approach of directly connecting each data element, defined as a statistical fact, to its metadata dimensions proved successful because we were able to test and utilise the concept before the (costly) development of metadata management systems.
Lessons Learnt – SOA
• The adoption and implementation of SOA as a Statistical Information Architecture requires a significant mind shift from data processing to enabling enterprise business processes through the delivery of enterprise services.
• Skilled resources familiar with SOA concepts and their application are very difficult to recruit, and equally difficult to grow.
Lessons Learnt – Governance
• The move from 'silo systems' to a BmTS-type model is a major challenge that should not be under-estimated.
• Having an active Standards Governance Committee, made up of senior representatives from across the organisation (ours has the 3 DGSs on it), is very useful to have in place. This forum provides an environment in which standards can be discussed and agreed, and the Committee can take on the role of the 'authority to answer to' if need be.
Lessons Learnt – Other
• There is a need to consider the audience of the metadata.
• Some metadata is better than no metadata - as long as it is of good quality.
• Do not expect to get it 100% right the very first time.
Questions?