dotnetcampus
Previously known as
Think Big. Move Fast.
Template designed by
brought to you by
SolidQ
• Born in 2002 in USA and Spain
• Established in 2007 in Italy
• More than 1000 customers and more than 200 consultants worldwide
• Dedicated to Data Management on the Microsoft Platform
• Books Authors, Conference Speakers, SQL Server MVPs and Regional Directors
• www.solidq.com
Davide Mauri
• 18 Years of experience on the SQL Server Platform
• Specialized in Data Solution Architecture, Database Design, Performance Tuning, Business Intelligence
• Microsoft SQL Server MVP
• President of UGISS (Italian SQL Server UG)
• Mentor @ SolidQ
• Video, Book & Article Author
• Regular Speaker @ SQL Server events
• Projects, Consulting, Mentoring & Training
Data Quality
The BIG problem
• What’s the key asset of a company?
  • Data that leads to Information and then to Knowledge
• With the mass adoption of Business Intelligence / Analytics, problems with Data Quality arise and become evident
  • Wrong, incomplete or incoherent data leads to wrong decisions
  • Managers cannot “trust” native data
• Data needs to be reworked a lot in order to be usable
  • In my experience, almost 50% of the time spent developing a BI solution is used just to solve Data Quality problems.
The BIG problem
• A Gartner research report states that “Organizations estimated they are losing an average of $8.2 million annually as a result of data quality issues”
  • 22% report estimated losses of $20 million
  • 4% report estimated losses of $100 million
Data Quality Concepts
Data Quality Issue / Sample Data Problem
• Standard: Are data elements consistently defined and understood?
  • Gender code = M, F, U in one system and Gender code = 0, 1, 2 in another system
• Complete: Is all necessary data present?
  • 20% of customers’ last names are blank, 50% of zip codes are 99999
• Accurate: Does the data accurately represent reality or a verifiable source?
  • A Supplier is listed as ‘Active’ but went out of business six years ago
• Valid: Do data values fall within acceptable ranges?
  • Salary values should be between 60,000 and 120,000
• Unique: Data appears several times
  • Both John Ryan and Jack Ryan appear in the system: are they the same person?
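The checks in the table above can be sketched in a few lines of Python. This is only an illustration: the sample records, field names and the (last_name, zip) duplicate key are assumptions made for the example, and the "Accurate" dimension is left out because it needs an external, verifiable source to compare against.

```python
# Hypothetical sample records; field names and values are illustrative only.
customers = [
    {"id": 1, "last_name": "Ryan", "gender": "M", "zip": "20100", "salary": 75000},
    {"id": 2, "last_name": "", "gender": "1", "zip": "99999", "salary": 90000},
    {"id": 3, "last_name": "Ryan", "gender": "F", "zip": "20100", "salary": 250000},
]

ALLOWED_GENDER = {"M", "F", "U"}    # Standard: one agreed code set
PLACEHOLDER_ZIPS = {"99999"}        # Complete: a placeholder counts as missing
SALARY_RANGE = (60_000, 120_000)    # Valid: acceptable range from the table


def check_record(rec):
    """Return the list of data quality issues found in one record."""
    issues = []
    if rec["gender"] not in ALLOWED_GENDER:
        issues.append("standard")
    if not rec["last_name"].strip() or rec["zip"] in PLACEHOLDER_ZIPS:
        issues.append("complete")
    low, high = SALARY_RANGE
    if not low <= rec["salary"] <= high:
        issues.append("valid")
    return issues


def possible_duplicates(records):
    """Unique: group the ids of records sharing the same (last_name, zip) key."""
    groups = {}
    for rec in records:
        if rec["last_name"].strip():
            key = (rec["last_name"].lower(), rec["zip"])
            groups.setdefault(key, []).append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


report = {rec["id"]: check_record(rec) for rec in customers}
duplicates = possible_duplicates(customers)
```

Exact-key grouping only finds candidates that share identical key values; names that differ slightly (John vs. Jack) need fuzzy matching.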
Master Data Management
• A solution to the problem is offered by Master Data Management (MDM)
  • It is a Discipline and a Process supported by Technology
• MDM aims to discover and define non-transactional lists of data, with the goal of compiling maintainable master lists that will become the reference data.
Master Data Management
• What are Master Data?
  • Master Data: “Slowly changing reference data shared across systems”
  • Master Data != Transactional Data
  • Master Data != Metadata
• Reference Data:
  • Products, Customers, Suppliers, Geography, etc.
  • The Dimensions of a Data Warehouse
Master Data Services
• Introduced with SQL Server 2008 R2
• A lot of improvements with SQL Server 2012
  • Web Interface improved *a lot*
  • Silverlight based
• Integrated with Excel 2007 and later
  • Killer Application!
• Installed with SQL Server 2012 but must be configured before use
  • Needs IIS, WCF and so on…
• No changes in SQL Server 2014
Master Data Services
• Allows the management and the definition of Master Data
  • “Model” Definition
  • Entities, Attributes, Hierarchies, etc.
  • Business Rules
• Data Stewardship
  • Through the Excel Add-in or the Web Portal
• Integration
  • Batch and/or WCF Service
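To make the idea of a model with entities, attributes and business rules more concrete, here is a minimal conceptual sketch in Python (3.9+). It is not the MDS API: the `Entity` class, the "Product" entity, its attributes and both rules are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Entity:
    """A master data entity: a named set of attributes plus business rules."""
    name: str
    attributes: list[str]
    rules: list[tuple[str, Callable[[dict], bool]]] = field(default_factory=list)

    def add_rule(self, description, predicate):
        """Register a rule; the predicate returns True when a member complies."""
        self.rules.append((description, predicate))

    def validate(self, member):
        """Return the descriptions of the rules this member violates."""
        return [desc for desc, ok in self.rules if not ok(member)]


# Hypothetical "Product" entity with two illustrative business rules.
product = Entity("Product", ["Code", "Name", "Category"])
product.add_rule("Code is required", lambda m: bool(m.get("Code")))
product.add_rule(
    "Category must be known",
    lambda m: m.get("Category") in {"Bikes", "Accessories"},
)

violations = product.validate({"Code": "", "Name": "Helmet", "Category": "Toys"})
```

A data steward would review members that come back with violations before they become part of the reference data.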
What is Master Data Management?
[Diagram labels: ERP, CRM, Warehouse Management, Invoicing System, BI, External System, Integration, Web Service, Master Data Hub, Data Steward]
Master Data Services
Data Quality Services
• A new Service introduced with SQL Server 2012
  • Enables the verification of “new” data against an established Knowledge Base
• Has its own client
• Must be installed after SQL Server 2012 installation
  • “Data Quality Server Installer”
• Three dedicated SQL Server databases (DQS prefix)
Data Quality Services
• Helps to
  • Define a Knowledge Base
  • Through “Knowledge Discovery” and “Domain Management”
  • Perform Data Cleansing & Data Matching (De-Duplication)
• Integrated with
  • Integration Services
  • Master Data Services (via Excel Add-in)
Data Cleansing
• Master (Reference) Data is needed for Data Cleansing
  • Can be provided directly from our data (Customer Names, for example)
  • Can be supplied by third party companies: Azure Marketplace
  • *Very* nice feature.
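The idea of cleansing incoming values against reference data can be sketched as follows. This is a conceptual illustration in Python, not how DQS is implemented: the "City" domain, its values, the synonym list, the status labels and the 0.8 cutoff are all assumptions made for the example.

```python
import difflib

# Hypothetical knowledge base for a "City" domain: valid values plus synonyms.
VALID_CITIES = ["Milano", "Roma", "Torino"]
SYNONYMS = {"Milan": "Milano", "Rome": "Roma", "Turin": "Torino"}


def cleanse(value, cutoff=0.8):
    """Return (cleansed_value, status) for one incoming value."""
    candidate = value.strip()
    if candidate in VALID_CITIES:
        return candidate, "correct"
    if candidate in SYNONYMS:
        # Known synonym: replace it with the leading (reference) value.
        return SYNONYMS[candidate], "corrected"
    # Otherwise look for a sufficiently similar reference value.
    close = difflib.get_close_matches(candidate, VALID_CITIES, n=1, cutoff=cutoff)
    if close:
        return close[0], "suggested"
    return candidate, "invalid"


results = [cleanse(v) for v in ["Roma", "Milan", "Torno", "Napoli"]]
```

Values that end up as "suggested" or "invalid" are the ones a person should review; the reference list grows as reviewed values are accepted.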
Data Quality Services
Identity Mapping & De-Duplication
• DQS is not the only solution for Identity Mapping & De-Duplication
  • MDS has some built-in functions
• SSIS Fuzzy Lookup is great for this
  • Great Performance & Results!
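Fuzzy matching for de-duplication can be illustrated with Python's standard `difflib` module. This is a stand-in for the general technique, not the algorithm used by SSIS Fuzzy Lookup, DQS or MDS; the sample names and the 0.85 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical customer names; the first two echo the "Unique" example.
names = ["John Ryan", "Jack Ryan", "Jon Ryan", "Maria Rossi"]


def similarity(a, b):
    """Similarity score in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_pairs(values, threshold=0.85):
    """Return the pairs whose similarity reaches the threshold, best first."""
    pairs = []
    for a, b in combinations(values, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


matches = candidate_pairs(names)
```

With this threshold, "John Ryan" and "Jon Ryan" are flagged as likely duplicates while "John Ryan" and "Jack Ryan" are not, which is why the threshold has to be tuned on real data and borderline pairs left to a data steward. Comparing every pair is quadratic, so larger lists usually need some form of blocking first.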
De-Duplication with Integration Services
Conclusions
• Bad Data Quality means Bad Business
• Start working on Data Quality ASAP
  • Define a business process to achieve Data Quality
  • MDS and DQS will help to support it
• Integration with existing applications via
  • Batch
  • SOA
• High Data Quality will be a must-have!
Links
• MDS Homepage: http://msdn.microsoft.com/en-us/sqlserver/ff943581.aspx
• 3rd Party Client Application: http://profisee.com/
• Data Quality & Data Science: http://www.solidq.com/consulting/