\\ AMALGAM INSIGHTS Twitter: @hyounpark\\
The Business Imperative For Data-Driven Context
Hyoun Park, Founder and Chief Analyst
As data professionals and quantitative executives, we’ve been told to build the foundation for the “Data-Driven Enterprise.”
And the Results:
Faster Data: Reduced Data Half-Life, Lower Predictive Value
Bigger Data: Too Much to Analyze, Too Hard to Access
More Usage: Lots of Silos, Lack of Consistency
“Metadata is a love note to the future.”
Jason Scott, Textfiles.com
We started with the basics:
Building a data and metadata dictionary of basic terms.
Challenges of the Data Dictionary
Often built for data practitioners, not data users
Limited to Structured Data
Focused on specific departments by design
Hard to maintain
The Biggest Challenge is just that Data grows too fast!
Structured Data Growth
Mobile Data Growth
All Data Growth
Structured data is estimated to double every 18-24 months, in
step with Moore’s Law.
According to its 2019 report, the CTIA states that data use is up over 73x since 2010.
Compounded annually, each year's data use averages roughly 170% of the prior year's, an increase of about 70% every year!
Including unstructured data, we estimate that 90% of the world’s data has been created in the
last two years.
This growth rate is equivalent to over 200% increase in data every year!
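The growth figures on these slides can be cross-checked with compound-growth arithmetic. The sketch below is illustrative only: the helper name `annual_multiplier` is made up, and the eight-year window for the CTIA figure (2010 through 2018) is an assumption rather than something stated on the slide.

```python
def annual_multiplier(total_multiplier: float, years: float) -> float:
    """Constant yearly growth factor that compounds to total_multiplier over `years`."""
    return total_multiplier ** (1.0 / years)

# Structured data: doubling every 18 to 24 months.
fast = annual_multiplier(2, 18 / 12)  # doubling in 1.5 years
slow = annual_multiplier(2, 24 / 12)  # doubling in 2 years

# Mobile data: 73x growth; the 8-year window is an assumption.
mobile = annual_multiplier(73, 8)

# All data: if 90% is under two years old, the total grew 10x in two years.
all_data = annual_multiplier(1 / (1 - 0.90), 2)

for label, m in [
    ("structured (18 months)", fast),
    ("structured (24 months)", slow),
    ("mobile", mobile),
    ("all data", all_data),
]:
    print(f"{label}: {m:.2f}x per year, a {100 * (m - 1):.0f}% annual increase")
```

Under these assumptions the mobile figure works out to about 1.7x per year, and the all-data figure to a little over 3x per year, which is consistent with an increase of over 200% every year.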
The Battle for Context means that “Big Data” needs to be categorized, curated, or tossed
away.
The future of AI requires Data Management at massive scale.
THE NEXT GENERATION OF METADATA MANAGEMENT:
DATA CATALOGUING
“Data cataloguing” needs to be an ongoing process, not a one-time build like a data warehouse or an ETL job.
The Challenge of Discovery
TOO MANY data and metadata catalogs, repositories, glossaries, taxonomies, and
ontologies scattered across your organization
[Diagram labels: Data Sources (Marts, Lakes, Clouds, etc.) | Data Catalogs | Consolidated Data | Holistic Business Data]
Departmental Catalog 1: Source 1, Source 2
Departmental Catalog 2: Source 3, Source 4
Data Pipelines | Data Virtualization
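The roll-up from departmental catalogs into one consolidated view could be sketched as follows. This is a minimal illustration, not any vendor's implementation: the catalog contents, the `synonyms` table, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical departmental catalogs: local term -> source that holds it.
departmental_catalogs = {
    "Departmental Catalog 1": {"Cust_ID": "Source 1", "Revenue": "Source 2"},
    "Departmental Catalog 2": {"customer id": "Source 3", "Net Revenue": "Source 4"},
}

# Hypothetical synonym table kept by people: normalized local name -> shared term.
synonyms = {"cust id": "customer id"}

def normalize(term: str) -> str:
    """Lower-case and unify separators, then apply any known synonym."""
    key = " ".join(term.replace("_", " ").replace("-", " ").lower().split())
    return synonyms.get(key, key)

def consolidate(catalogs):
    """Merge departmental entries into one index keyed by the shared term."""
    merged = defaultdict(list)
    for catalog, entries in catalogs.items():
        for local_name, source in entries.items():
            merged[normalize(local_name)].append((catalog, local_name, source))
    return dict(merged)

consolidated = consolidate(departmental_catalogs)
for term, locations in sorted(consolidated.items()):
    print(term, "->", locations)
```

The local names are kept alongside the shared term, so each department can still find data under the name it already uses.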
The Challenge of Categorization
The initial categorization of data requires human expertise and context. AI can supplement
human effort, but cannot fully replace human experience.
So Many Stakeholders!
Catalog Stakeholders
Data Analysts
Data Engineers
Data Scientists
Developers
Line-of-Business Managers
Executives
The Challenge of Curation
From now on, there will always be too much data from too many sources for humans to fully curate. AI will be necessary to provide an initial
pass at curation.
The Answer?
A combination of top-down management, machine-learning aided discovery and curation,
& dedicated efforts to support metadata consistency.
The Virtuous Cycle of Context
Discovery: Data Sources and Dictionaries
Categorization: Business Experts and Subject Matter Experts
Curation: AI to provide initial results based on rules and confidence levels
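One way to picture "initial results based on rules and confidence levels" is a triage step: suggestions above a confidence threshold are accepted automatically, and everything else is queued for business and subject matter experts. The sketch below is an assumption-laden illustration; the keyword rules, the `AUTO_ACCEPT` threshold, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    column: str
    category: str
    confidence: float  # 0.0 to 1.0

# Hypothetical keyword rules: (keyword, category, confidence assigned on a match).
RULES = [
    ("email", "Contact Info", 0.95),
    ("phone", "Contact Info", 0.90),
    ("amt", "Financial", 0.60),
]

AUTO_ACCEPT = 0.85  # assumed threshold; each organization would tune this

def suggest(column: str) -> Suggestion:
    """Return the highest-confidence rule match, or an uncategorized suggestion."""
    name = column.lower()
    matches = [Suggestion(column, cat, conf) for kw, cat, conf in RULES if kw in name]
    if not matches:
        return Suggestion(column, "Uncategorized", 0.0)
    return max(matches, key=lambda s: s.confidence)

def triage(columns):
    """Split suggestions into auto-accepted ones and ones queued for expert review."""
    accepted, review = [], []
    for column in columns:
        s = suggest(column)
        (accepted if s.confidence >= AUTO_ACCEPT else review).append(s)
    return accepted, review

accepted, review = triage(["customer_email", "order_amt", "misc_notes"])
print("auto-accepted:", [(s.column, s.category) for s in accepted])
print("needs review:", [(s.column, s.category) for s in review])
```

Only the uncertain or unmatched columns reach people, which keeps expert time focused on the cases where human context matters.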
This is why:
AI Needs Data Management
and
Data Management Needs AI
So, the real goal?
The Context-Driven Business. Data is a means to an end, not the end goal.
1. Discover all of your metadata stores across all your departments.
Embrace the diversity of names!
Deal with the politics!
2. Identify Data Experts within each key business unit and department.
Cheat Code: Check with finance to see which units and departments have the biggest P/Ls. Start with them.
3. After you have a working and standardized taxonomy,
automate and conduct long-tail categorization with AI.
You’ll still need people to verify categorization, but AI can find interesting and relevant
groupings.
4. Use these steps as a Virtuous Cycle for Context.
People come in for scheduled reviews, but can’t spend all their time on discovery and
categorization. The goal is to maximize the value of human insights, not to turn humans into
machines.
The “Data Driven Enterprise” is the Past
The “Context Driven Enterprise” is the
Future
© Informatica. Proprietary and Confidential.
Data Drives All Digital Transformation Priorities
Governance
Self-Service/Advanced Analytics
Cloud Modernization
Customer Experience
Machine Learning/AI
Explosion in Data Volume
New Data Types (mobile, social, IoT)
New Users
Data in the Cloud
500 million business data users and growing
Over 94% of data center traffic will come from the Cloud
20 billion connected devices
1 billion workers will be assisted by machine learning or AI
20.6 zettabytes per year in global data center traffic
Growing Complexity of Data Landscape
Intelligent Data Cataloging is the First Step
Governance: Lineage, Change Notification; Business Context Association
Self-Service/Advanced Analytics: Lineage, Related Data, Recommendations; Collaboration (Ratings, Reviews, Certification)
Cloud Modernization: Lineage, Impact Analysis; Detailed Data Usage Information
Customer Experience: Lineage; Discovery and Onboarding
Enterprise Data Catalog: Data Map for the Enterprise
Find the data you need with simple, powerful semantic search
Understand your enterprise data with a holistic view
Trust your data by understanding its lineage and quality
Enterprise Data Catalog: AI, Human Knowledge and Collaboration
AI-powered automatic discovery, enrichment and curation
Business context via intelligent business term association
Collaboration & social curation to tap into shared data knowledge
Open APIs for Extensibility: Extend EDC Capabilities into Your Environment
EDC Tableau Extension - understand data in context within the native Tableau UI
EDC + Cloud Data Integration - accelerate development; discover and select assets, auto-populate connection values
Informatica: PowerCenter | DQ | MDM | BDM | DIH | BG | ILM | Axon | Informatica Cloud
Databases: Oracle | DB2 | DB2 for z/OS | SQL Server | Sybase | Teradata | Netezza | JDBC | SQL Scripts | SAP HANA | Stored Procedures
Applications: SAP R/3 | Salesforce | Oracle | Workday
Big Data: HIVE (Cloudera, Hortonworks, MapR, IBM BigInsights, EMR, HDI) | HDFS | MapRFS | Cloudera Navigator | Atlas
Cloud Platforms: AWS S3 | AWS Redshift | Azure SQL DB | Azure SQL DW | Azure ADLS | Azure Blob | Google BigQuery | ADLS Gen 2
File Formats: CSV | Delimited | XML | JSON | Avro | Parquet | MS Excel | Adobe PDF | Flat File | MS PowerPoint | MS Word
Business Intelligence: Tableau | IBM Cognos | SAP BusinessObjects | MicroStrategy | OBIEE
Other: Microsoft SSIS | Erwin Models | PowerDesigner | Oracle Data Integrator | IBM DataStage | Custom Scanner Framework
Enterprise Data Catalog
Metadata Intelligence
• Semantic Search • Domain Discovery • Similarity Clustering • Business Term Association
• Relationships • Business Context • Glossary Integration • Custom Annotations
• Discovery • Profiling • Lineage • Impact Analysis
• Reviews/Ratings • Questions/Answers • Data Certifications • Change Notifications
Analytics | Data Governance | Master Data Management | Cloud Modernization | Data Integration | Data Quality
The Catalog of Catalogs
On-prem Databases | File Systems | BI Tools | On-prem/SaaS Apps | ETL | AWS Glue | Azure Data Catalog | ADLS | Google Data Catalog
Knowledge Graph + Powered by AI/ML
Breadth of Active Metadata
Open APIs, Full Access
Enterprise Data Catalog
Enterprise | Unified Metadata | Intelligence
Schema Inference
Recommendations
Data Tagging
Entity Discovery
Relationship Discovery
Natural Language Translation
Data Similarity
Anomaly Detection
©New York Life Insurance Co., 2019
About New York Life Insurance
Since 1845, people have worked with New York Life to protect their families and futures. We believe in the importance of human guidance and in trusted relationships built on being there when our customers need us most.
Fast Facts
• Headquarters in Manhattan
• 11,000 employees across 15 corporate locations
• 12,000 agents across over 200 field sales offices around the United States
• Providing customers with a range of products and services, including life insurance, annuities, long term care insurance, mutual funds, exchange traded funds, institutional investments, and investment services
Supporting a Logical Data Model
Axon | Enterprise Data Catalog | Informatica Data Quality
Business Analysts | Subject Matter Experts | Data Modeler
Data Sources
Attributes | Systems | Glossary | Data Quality | People | Data Sets
Data Steward
• Data elements are published in Axon and linked to source systems and attributes, preferred source identified
• Data steward prioritizes data elements for respective domain
• Metadata & lineage for each source database is scanned in EDC
• EDC columns linked to business terminology and definitions from Axon
• Data quality mapplets are created in IDQ and linked to local data quality rules in Axon
• Data discovery profiles with primary key and foreign key analysis executed on key sources
• Data stewards and subject matter experts define data elements, standardize reference values, formats, etc.
• Axon attributes are linked to columns from EDC
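The primary key and foreign key analysis mentioned in this workflow can be pictured as a uniqueness check plus a containment check. The sketch below is a generic illustration and not how EDC or IDQ performs profiling; the function names and sample rows are made up.

```python
def primary_key_candidates(rows):
    """Columns whose values are all non-null and unique across the rows."""
    if not rows:
        return []
    candidates = []
    for col in rows[0]:
        values = [row[col] for row in rows]
        if None not in values and len(set(values)) == len(values):
            candidates.append(col)
    return candidates

def foreign_key_candidates(child_rows, parent_rows, parent_key):
    """Child columns whose non-null values all appear in the parent's key column."""
    if not child_rows:
        return []
    parent_values = {row[parent_key] for row in parent_rows}
    candidates = []
    for col in child_rows[0]:
        values = {row[col] for row in child_rows if row[col] is not None}
        if values and values <= parent_values:
            candidates.append(col)
    return candidates

# Made-up sample data for illustration.
policies = [
    {"policy_id": 1, "holder_id": 10, "product": "life"},
    {"policy_id": 2, "holder_id": 11, "product": "annuity"},
    {"policy_id": 3, "holder_id": 10, "product": "life"},
]
holders = [{"holder_id": 10, "name": "A"}, {"holder_id": 11, "name": "B"}]

print(primary_key_candidates(policies))
print(foreign_key_candidates(policies, holders, "holder_id"))
```

Checks like these only produce candidates. On small samples a column can look unique by coincidence, so a data steward would still need to confirm each suggested key.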
Data Governance Capabilities
Metadata Management
• Glossary - Business terminology
• Catalog - Technical metadata & lineage
• Auto-discovery
Data Quality Management
• Profiling
• Rules & metrics
• Identifying, prioritizing, & remediating data defects
Privacy, Risk, & Compliance
• Decision making
• Policies
• Data Access
• Appropriate use
Master Data Management
• Modern data model
• Processes to create & update
• Controls to ensure quality
Tableau and Informatica
Solution Theater Presenter:
EDC Native Extension for Tableau
Hundreds of Joint Customers
EDC Extension & Scanner for Tableau
• Discover and understand data in context within the native Tableau user interface
• Detailed information and lineage for Tableau analytics with EDC’s native scanners
Learn More
Don’t miss the customer perspectives and demos at the AI-Powered Data Cataloging Virtual Summit:
• Data Cataloging for Data Governance: Maersk
• Data Curation & Collaboration for Self-Service Analytics: Nissan North America
• Data Lineage – The Foundational Use Case: Rabobank
• EDC Adoption Best Practices: Biogen