
05. Physical Data Specification Template


DESCRIPTION

A template defining an outline structure for the clear and unambiguous definition of the discrete data elements (tables, columns, fields) within the physical data management layers of the required data solution.


http://informationaction.blogspot.com

Tw: @Alan_D_Duncan

Information Strategy | Data Governance | Analytics | Better Business Outcomes

Example Data Specifications & Information Requirements Framework

PHYSICAL DATA SPECIFICATION TEMPLATE

Alan D. Duncan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


1 Purpose

This document template defines an outline structure for the clear and unambiguous definition of the discrete data elements (tables, columns, fields etc.) within the physical data management layers of the required data solution.

This template forms part of an example data specification & information requirements framework. The framework offers a set of outline principles, standards and guidelines to describe and clarify the semantic meaning of data terms in support of an Information Requirements Management process.

(See the Framework Overview for further details.)


2 Physical Data Specification

The template should be completed for each individual data element required within the Logical Data Model layer (to the extent that this information can be defined).

Data Specification Item

Purpose

System Table (File) Name

The database table name (or file name in file-based data stores)

System Column Name

The database or file structure column.

Definition of what characteristic of each row the column describes from a business perspective. This should be based on the concept of what data the column should contain. Sometimes judgement will be required to draw a line between what is normal use of a column and what constitutes a quality issue.

It is important here to concentrate on the relation of the column to the table rows and not on how the column is used. Whilst the latter may be of interest, it should never be a substitute for the former.

Whilst we should strive for consistency, the language of the system “owners” should generally be used here.

Typically definitions will need to refer to other tables or data entities. At a level of detail, these entities may have several definitions. It is important that the references are explicit when referring to a specific definition.

Examples should be included wherever this aids understanding.

Where a column is found to contain de-normalised data, the path of de-normalisation should be fully described.

If a column has multiple definitions dependent on row, this should be clearly described together with an indication of how to determine the actual definition for each row.

Column Description

Elaboration of the purpose for the column.

Column Domain Type

See Appendix A for a suggested list of Column Domains.

Data Type & Length

e.g. Varchar 12, Numeric 9.2

Required Status

Record whether a value is always required, from both a physical and a logical perspective (NULL/NOT NULL constraint).

Primary Key Definition

List of columns which constitute the primary key, plus any other information pertinent to the identification of rows.

Column Relationships

Linkages from the column to other tables (e.g. Foreign Key relationships to other table/columns).

Typically definitions will need to refer to other tables or data entities. At a level of detail, these entities may have several definitions. It is important that the references are explicit when referring to a specific definition.

Column Constraints

Any constraint rules to be applied (e.g. Primary Key, Unique Key rules).

Data Row Definitions

Definition of what each row represents from a business perspective.

Any detailed technical rules or semantic encoding defined at the record/row level of the data store (e.g. Data Quality cleansing rules).

This should be based on the concept of what data the table should contain. Sometimes judgement will be required to draw a line between what is normal use of a table and what constitutes a quality issue.

It is important here to concentrate on what the rows represent and not on how they are used. Whilst the latter may be of interest, it should never be a substitute for the former.

Whilst we should strive for consistency, the language of the system “owners” should generally be used here.

If a table has sets of rows with quite different characteristics, this should be clearly described together with an indication of how the sets can be differentiated. Where appropriate a name should be allocated to each set and potentially multiple definitions of the table may need to be recorded. The Master/Copy status of each set should be recorded.

Data Scope should be recorded for all significant dimensions (inclusions & exclusions).

Value Range

The valid set of values for the Column. (Or valid range in case of Date & Number fields).

(Could be defined as a link or pointer to the location of an underlying master data set.)

Related Logical Model Data Element(s)

The supporting elements of the canonical model and their lineage with the physical columns.

Expected Data Volumes

Number of records

Size per record

Master/Copy Status

Is this an originating master source (System of Record) for this data set, or a copy of the originating source?

Data Quality Indications

Any initial indications of poor data quality at row level + Cause + Business Impact.

(NB: Not at a detailed level. This is an indicative assessment only, and should trigger a more detailed DQ investigation by the Data Governance Unit if indicated.)
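As a rough illustration only, one completed template entry could be held as a structured record so that blank items are easy to spot. The sketch below is a minimal Python example covering a subset of the template items; the class name, the field names and the CUSTOMER.CUST_ID column are hypothetical and are not part of the framework.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class PhysicalColumnSpec:
    """One completed template entry for a single physical column.

    Field names are illustrative assumptions mapped loosely onto the
    template items; they are not defined by the framework itself.
    """
    table_name: str                   # System Table (File) Name
    column_name: str                  # System Column Name
    description: str                  # Column Description
    domain_type: str                  # Column Domain Type, e.g. "Code"
    data_type: str                    # Data Type & Length, e.g. "Varchar 12"
    required: bool                    # Required Status (True = NOT NULL)
    primary_key: list[str] = field(default_factory=list)       # Primary Key Definition
    relationships: list[str] = field(default_factory=list)     # Column Relationships
    value_range: Optional[str] = None                           # Value Range
    logical_elements: list[str] = field(default_factory=list)  # Related Logical Model Data Element(s)
    master_copy_status: Optional[str] = None                    # Master/Copy Status

    def missing_items(self) -> list[str]:
        """Return the names of items still left blank (None, "" or [])."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in (None, "", [])]


# Hypothetical example entry; the table and column are invented for illustration.
spec = PhysicalColumnSpec(
    table_name="CUSTOMER",
    column_name="CUST_ID",
    description="Identifier assigned to each customer record",
    domain_type="Code",
    data_type="Varchar 12",
    required=True,
    primary_key=["CUST_ID"],
)
print(spec.missing_items())
# ['relationships', 'value_range', 'logical_elements', 'master_copy_status']
```

Listing the blank items in this way fits the template's allowance that entries are completed "to the extent that this information can be defined".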

Good data management and data governance practices require that the physical data storage of any data solution aligns with the Enterprise Logical (Canonical) Model.

Data designers must therefore clearly demonstrate that the data structures within any data system or business application map to and align with the Logical Model.
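One simple way to evidence this alignment is to list any physical columns that have no related logical model element recorded against them. The following is a minimal sketch in Python; the function name, the column names and the logical element names are all hypothetical.

```python
def unmapped_columns(physical_columns: list[str],
                     logical_mapping: dict[str, list[str]]) -> list[str]:
    """Return physical columns with no logical model element recorded."""
    return [col for col in physical_columns if not logical_mapping.get(col)]


# Hypothetical physical columns and their recorded logical model elements.
physical_columns = ["CUSTOMER.CUST_ID", "CUSTOMER.CUST_NM", "CUSTOMER.LEGACY_FLG"]
logical_mapping = {
    "CUSTOMER.CUST_ID": ["Customer.Customer Identifier"],
    "CUSTOMER.CUST_NM": ["Customer.Customer Name"],
}

gaps = unmapped_columns(physical_columns, logical_mapping)
print(gaps)  # ['CUSTOMER.LEGACY_FLG']
```

An empty result would indicate that every listed column has at least one logical element recorded; it does not by itself show that the recorded mapping is correct.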


Beyond this requirement, data specification is not concerned with the technical details of data management implementation, and therefore takes no specific interest in the physical design or technical structure of any data stores or data processing layers therein. Notwithstanding, the expectations of auditability, integrity, traceability and persistence must be demonstrated.

Note that this physical data definition schema is suitable for both “Source” and “Target” data definitions.

Examples should be included wherever this aids understanding.


Appendix A: Column Domains – Candidate list

| Name | Definition |
| --- | --- |
| Amount | A Monetary Amount, i.e. a Quantity of a CURRENCY. |
| Code | A character string or number which is used for identification purposes. Has no explicit natural language meaning, i.e. not an English word. |
| Cost/Revenue Amount | An Amount of a Currency where positive = Revenue and negative = Cost. |
| Count | |
| Date | |
| Date/Time | Specification of seconds? |
| Day of Month | |
| Day of Week | |
| Days | A number of days. |
| Description | A brief text description. |
| Details | Data with embedded meaning and of a complex format but for which the meaning cannot be consistently interpreted by a computer system. |
| Direction | Direction of an accounted balance. |
| DR/CR Amount | An Amount of a Currency where positive = DR and negative = CR. |
| Email Address | |
| External Reference | A code or reference for which the format is specified by an external party. |
| Factor | A rate/proportion/ratio in the range 0 to a maximum value. |
| Frequency | Examples: Annual, Half Year, Quarterly, Monthly, Weekly, Daily, Ad Hoc. |
| Indicator | Binary Indicator: Yes or No. |
| Name | A meaningful word or phrase used for identification purposes. |
| Notes | Textual Notes. |
| Ordinal | A number indicating a position within a sequence of numbers. |
| Phone | International |
| Quantity | A number of units. |
| Rate | A rate/proportion/percentage. |
| Status | A number or character string used to indicate a state which is likely to change over time. |
| Time | |
| Type | A number or character string used for classification with a discrete set of values per column. Could be an English word or phrase. |
| Year | A calendar year, e.g. 2002. |
| Year/Month | A month in a specific year, e.g. November 2002. |
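To show how a Column Domain might be put to work, the sketch below checks sample values against a handful of the candidate domains. It is an illustrative Python example only: the value encodings (for instance "Y"/"N" for Indicator and a four-digit integer for Year) are assumptions, since the candidate list does not prescribe physical encodings, and the function and variable names are hypothetical.

```python
from typing import Callable


def _is_int(value: object) -> bool:
    """True for integers, excluding booleans (which Python treats as ints)."""
    return isinstance(value, int) and not isinstance(value, bool)


# Checks for a few candidate domains only; every encoding here is an assumption.
DOMAIN_CHECKS: dict[str, Callable[[object], bool]] = {
    "Indicator": lambda v: v in ("Y", "N"),              # assumed Yes/No encoding
    "Day of Month": lambda v: _is_int(v) and 1 <= v <= 31,
    "Year": lambda v: _is_int(v) and 1000 <= v <= 9999,  # assumed four-digit year
}


def invalid_values(domain: str, values: list) -> list:
    """Return the values that fail the check registered for the given domain."""
    if domain not in DOMAIN_CHECKS:
        raise ValueError(f"No check defined for domain: {domain}")
    check = DOMAIN_CHECKS[domain]
    return [v for v in values if not check(v)]


print(invalid_values("Day of Month", [1, 15, 32, 0]))   # [32, 0]
print(invalid_values("Indicator", ["Y", "N", "Maybe"]))  # ['Maybe']
print(invalid_values("Year", [2002, 99]))                # [99]
```

Failures found this way would be candidates for the Data Quality Indications item rather than a detailed data quality assessment.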


About the author

Alan D. Duncan is an evangelist for information and analytics as enablers of better business outcomes, and a member of the Advisory Board for QFire Software.

An executive-level leader in the field of Information and Data Management Strategy, Governance and Business Analytics, he has over 20 years of international business experience, working with blue-chip companies in a range of industry sectors. Alan was named by Information-Management.com in their 2012 list of “Top 12 Data Governance gurus you should be following on Twitter”.

Twitter: @Alan_D_Duncan

Blog: http://informationaction.blogspot.com.au/


Intellectual curiosity

Skeptical scrutiny

Critical thinking

http://www.informationaction.blogspot.com.au/

@Alan_D_Duncan

http://www.linkedin.com/in/alandduncan