
Ds04 data quality


Page 1: Ds04   data quality

Previously known as

Think Big. Move Fast.


Page 3: Ds04   data quality

SolidQ

• Born in 2002 in USA and Spain

• Established in 2007 in Italy

• More than 1000 customers and more than 200 consultants worldwide

• Dedicated to Data Management on the Microsoft Platform

• Book Authors, Conference Speakers, SQL Server MVPs and Regional Directors

• www.solidq.com

Page 4: Ds04   data quality

Davide Mauri

• 18 Years of experience on the SQL Server Platform

• Specialized in Data Solution Architecture, Database Design, Performance Tuning, Business Intelligence

• Microsoft SQL Server MVP

• President of UGISS (Italian SQL Server UG)

• Mentor @ SolidQ

• Video, Book & Article Author

• Regular Speaker @ SQL Server events

• Projects, Consulting, Mentoring & Training

Page 5: Ds04   data quality

Data Quality

Page 6: Ds04   data quality

The BIG problem

• What’s the key asset of a company?

• Data that leads to Information and then to Knowledge

• With the mass adoption of Business Intelligence / Analytics, Data Quality problems arise and become evident

• Wrong, incomplete or incoherent data leads to wrong decisions

• Managers cannot “trust” native data

• Data needs to be reworked a lot in order to be usable

• In my experience, almost 50% of the time spent developing a BI solution is used just to solve Data Quality problems.

Page 7: Ds04   data quality

The BIG problem

• A Gartner research study states that “Organizations estimated they are losing an average of $8.2 million annually as a result of data quality issues”

• 22% report estimated losses of $20 million

• 4% report estimated losses of $100 million

Page 8: Ds04   data quality

Data Quality Concepts

| Data Quality Issue | Sample Data Problem |
| Standard: Are data elements consistently defined and understood? | Gender code = M, F, U in one system and Gender code = 0, 1, 2 in another system |
| Complete: Is all necessary data present? | 20% of customers’ last names are blank, 50% of zip codes are 99999 |
| Accurate: Does the data accurately represent reality or a verifiable source? | A Supplier is listed as ‘Active’ but went out of business six years ago |
| Valid: Do data values fall within acceptable ranges? | Salary values should be between 60,000 and 120,000 |
| Unique: Data appears several times | Both John Ryan and Jack Ryan appear in the system. Are they the same person? |
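As a rough illustration of the concepts above, the following Python sketch checks a few records against the Standard, Complete, Valid and Unique dimensions. The record layout, the field names and the 0/1/2 to M/F/U mapping are assumptions made for this example only; Accurate is left out because it needs an external, verifiable source to compare against.

```python
# Hypothetical sample records; field names are illustrative only.
customers = [
    {"id": 1, "last_name": "Ryan", "gender": "M", "zip": "98052", "salary": 75000},
    {"id": 2, "last_name": "", "gender": "1", "zip": "99999", "salary": 81000},
    {"id": 3, "last_name": "Ryan", "gender": "F", "zip": "10001", "salary": 250000},
]

# Assumed mapping from the second system's numeric codes to M/F/U.
GENDER_MAP = {"0": "M", "1": "F", "2": "U"}


def standardize_gender(code):
    """Standard: map every system's code to one shared representation."""
    if code in {"M", "F", "U"}:
        return code
    return GENDER_MAP.get(code, "U")  # unknown codes fall back to 'U'


def is_complete(rec):
    """Complete: last name present and zip code is not the 99999 placeholder."""
    return bool(rec["last_name"].strip()) and rec["zip"] != "99999"


def is_valid(rec, low=60_000, high=120_000):
    """Valid: salary falls inside the acceptable range."""
    return low <= rec["salary"] <= high


def duplicate_candidates(records):
    """Unique: group ids sharing the same last name, for manual review."""
    groups = {}
    for rec in records:
        key = rec["last_name"].strip().lower()
        if key:
            groups.setdefault(key, []).append(rec["id"])
    return {name: ids for name, ids in groups.items() if len(ids) > 1}


for rec in customers:
    print(rec["id"], standardize_gender(rec["gender"]), is_complete(rec), is_valid(rec))
print(duplicate_candidates(customers))
```

Grouping by exact last name only produces candidates; deciding whether two records describe the same person still needs fuzzy matching or a data steward.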

Page 9: Ds04   data quality

Master Data Management

• A solution to the problem is offered by Master Data Management (MDM)

• It is a Discipline and a Process supported by Technology

• MDM aims to discover and define non-transactional lists of data, with the goal of compiling maintainable master lists that will become the reference data.

Page 10: Ds04   data quality

Master Data Management

• What are Master Data?

• Master Data: “Slowly changing reference data shared across systems”

• Master Data != Transactional Data

• Master Data != Metadata

• Reference Data:

• Products, Customers, Suppliers, Geography, etc.

• The Dimensions of a Data Warehouse

Page 11: Ds04   data quality

Master Data Services

• Introduced with SQL Server 2008 R2

• With SQL Server 2012, lots of improvements

• Web Interface improved *a lot*

• Silverlight based

• Integrated with Excel 2007 and later

• Killer Application!

• Installed with SQL Server 2012 but must be configured before use

• Needs IIS, WCF and so on…

• No Changes in 2014

Page 12: Ds04   data quality

Master Data Services

• Allows the management and the definition of Master Data

• “Model” Definition

• Entities, Attributes, Hierarchies, etc.

• Business Rules

• Data Stewardship

• Through the Excel Add-in or the Web Portal

• Integration

• Batch and/or WCF Service
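To make the ideas of a model, its entities, attributes and business rules more concrete, here is a minimal Python sketch. It is not the Master Data Services API: the `Entity` and `BusinessRule` classes, the `Product` entity and its rules are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BusinessRule:
    name: str
    check: Callable[[dict], bool]  # returns True when the member passes


@dataclass
class Entity:
    name: str
    attributes: List[str]
    rules: List[BusinessRule] = field(default_factory=list)
    members: Dict[str, dict] = field(default_factory=dict)

    def add_member(self, code, **values):
        """Store a member, rejecting attributes the entity does not define."""
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        self.members[code] = values

    def validate(self):
        """Return {member code: [names of failed rules]} for failing members."""
        failures = {}
        for code, values in self.members.items():
            failed = [r.name for r in self.rules if not r.check(values)]
            if failed:
                failures[code] = failed
        return failures


# Hypothetical "Product" entity with two illustrative business rules.
product = Entity(
    name="Product",
    attributes=["Name", "Category", "ListPrice"],
    rules=[
        BusinessRule("Name is required", lambda m: bool(m.get("Name"))),
        BusinessRule("ListPrice must be positive", lambda m: (m.get("ListPrice") or 0) > 0),
    ],
)
product.add_member("BK-001", Name="Road Bike", Category="Bikes", ListPrice=1200.0)
product.add_member("BK-002", Name="", Category="Bikes", ListPrice=0)
print(product.validate())
```

Members that fail a rule are reported rather than rejected, which mirrors the stewardship idea: someone reviews and fixes them later.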

Page 13: Ds04   data quality

What is Master Data Management?

[Diagram labels: ERP, CRM, Warehouse MGMT, Invoicing System, BI, External System, Master Data Hub, Integration, Web Service, Data Hub, Data Steward]

Page 14: Ds04   data quality

Master Data Services

Page 15: Ds04   data quality

Data Quality Services

• A new Service introduced with SQL Server 2012

• Enables the verification of «new» data against an established Knowledge Base

• Has its own client

• Must be installed after SQL Server 2012 installation

• «Data Quality Server Installer»

• Three dedicated SQL Server databases (DQS prefix)

Page 16: Ds04   data quality

Data Quality Services

• Helps to

• Define a Knowledge Base

• Through «Knowledge Discovery» and «Domain Management»

• Perform Data Cleansing & Data Matching (De-Duplication)

• Integrated with

• Integration Services

• Master Data Services (via Excel Add-in)

Page 17: Ds04   data quality

Data Cleansing

• Master (Reference) Data is needed for Data Cleansing

• Can be provided directly from our data (Customer Names, for example)

• Can be supplied by third-party companies: Azure Marketplace

• *Very* nice feature.
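The idea of cleansing incoming values against reference data can be sketched in a few lines of Python. This is only an illustration of the principle, not how Data Quality Services is implemented: the knowledge base contents, the `cleanse` function and the status names are assumptions made for the example.

```python
# Hypothetical knowledge base for a "Country" domain.
KNOWLEDGE_BASE = {
    "correct": {"Italy", "Spain", "United States"},
    "synonyms": {"USA": "United States", "Italia": "Italy", "Espana": "Spain"},
    "invalid": {"N/A", "Unknown"},
}


def cleanse(value, kb=KNOWLEDGE_BASE):
    """Return (output value, status) for one raw input value."""
    raw = value.strip()
    if raw in kb["correct"]:
        return raw, "correct"
    if raw in kb["synonyms"]:
        return kb["synonyms"][raw], "corrected"
    if not raw or raw in kb["invalid"]:
        return None, "invalid"
    # Not in the knowledge base: keep it and leave the decision to a steward.
    return raw, "new"


for raw in ["Italy", " USA ", "N/A", "France"]:
    print(repr(raw), "->", cleanse(raw))
```

Values flagged as "new" are the interesting ones: once a data steward approves or corrects them, they can be added to the knowledge base so the next run handles them automatically.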

Page 18: Ds04   data quality

Data Quality Services

Page 19: Ds04   data quality

Identity Mapping & De-Duplication

• DQS is not the only solution for Identity Mapping & De-Duplication

• MDS has some built-in functions

• SSIS Fuzzy Lookup is great for this

• Great Performance & Results!
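The principle behind fuzzy de-duplication can be shown with a small Python sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure; this is not the algorithm used by SSIS Fuzzy Lookup, DQS or MDS, and the sample names and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(names, threshold=0.8):
    """Return (name_a, name_b, score) for every pair at or above the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Illustrative input only.
names = ["John Ryan", "Jack Ryan", "Davide Mauri", "David Mauri"]

for a, b, score in find_duplicates(names):
    print(f"{a!r} ~ {b!r} (score {score})")
```

Comparing every pair is quadratic in the number of records, so this only works for small lists. The threshold is a trade-off: at 0.8 the "John Ryan" / "Jack Ryan" pair is not flagged, while a lower threshold would flag it at the cost of more false positives.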

Page 20: Ds04   data quality

De-Duplication with Integration Services

Page 21: Ds04   data quality

Conclusions

• Bad Data Quality means Bad Business

• Start working on Data Quality ASAP

• Define a business process to achieve Data Quality

• MDS and DQS will help to support it

• Integration with existing applications via

• Batch

• SOA

• High Data Quality will be a must-have!

Page 22: Ds04   data quality

Link

• Free Data Quality eBook: http://bit.ly/mMJgKv

Page 23: Ds04   data quality

Link

• MDS Homepage: http://msdn.microsoft.com/en-us/sqlserver/ff943581.aspx

• 3rd Party Client Application: http://profisee.com/

• Data Quality & Data Science: http://www.solidq.com/consulting/
