Apache Atlas & Open Metadata
Dataworks Sydney 2017
Nigel Jones
Software Architect
IBM
Ferd Scheepers
Chief Information Architect
ING
What is Open Metadata and Governance?
Open Metadata and Governance will allow metadata to be captured when the data is created, to move with the data, and to be augmented and processed by any vendor's tools.
Open Metadata and Governance consists of:
1. A standardized, extensible set of metadata types
2. Metadata exchange APIs and notifications
3. Frameworks for automated governance
Open Metadata and Governance will allow you to have:
1. An enterprise data catalogue that lists all of your data: where it is located, its origin (lineage), owner, structure, meaning, classification and quality
2. New data tools (from any vendor) that connect to your data catalogue out of the box
3. Metadata added automatically to the catalogue as new data is created and analysed
4. Subject matter experts collaborating around the data
5. Automated governance processes that protect and manage your data
Role of Apache Atlas
[Figure: Positioning of Apache Atlas for Open Metadata. Metadata repositories (Apache Atlas, IBM, SAS) linked by a metadata highway to provide open and unified metadata; the Open Metadata Repository Service (OMRS) and Open Metadata Access Service (OMAS) are components defined and being developed by the Open Metadata & Governance project.]
• Apache Atlas provides an open community for developing the reference implementation for open metadata and governance. In essence, Apache Atlas delivers two main capabilities:
  • it acts as a metadata repository (graph database) for end-user metadata tools
  • it delivers the federated/unified metadata layer across the entire landscape of an enterprise
• The software development governance of the Apache Software Foundation (ASF) creates confidence that the technology will be maintained and enhanced as appropriate, in an equitable manner.
Doing all of this under the Apache Atlas flag is not enough…
… because Apache is mostly focused on development, and we are missing a governance body to manage adoption of, and compliance with, the Open Metadata and Governance standards. We envision the following roles for ODPI:
1. Be an advocate of the Open Metadata and Governance standards: make them visible and their value understood.
2. Facilitate discussions around the evolution, maintenance and development of the Open Metadata and Governance standards.
3. Test vendor offerings and sign off on their compliance with the Open Metadata and Governance standards.
Who is in?
1. Hands-on community members:
• ING
• IBM
• Hortonworks
2. Companies we have had conversations with:
• CIBC
• SAS
• Microsoft
• Oracle
• Informatica
• Waterline
• RBC
• DBS
Timeline and next steps
1. Ambition level:
• End of September 2017: Open Metadata working demo.
• Mid-November/December 2017: first version of user access.
• Google for Data
2. Next steps:
• End of Q2 2018: production-ready version of the Virtual Data Connector.
About Me
https://www.linkedin.com/in/nigelljones
https://www.twitter.com/planetf1
Atlas Architecture
[Figure: Atlas architecture: storage repository, graph, type system, REST API, models, UI & apps, hooks & bridges]
https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance
Common Core Data Model
[Figure: areas of the common core data model: Base Types, Systems & Infrastructure; Data Assets; Governance; Lineage; Glossary; Collaboration; Models & Reference Data; Metadata Discovery]
https://cwiki.apache.org/confluence/display/ATLAS/Building+out+the+Open+Metadata+Typesystem
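The first slide promises a standardized, extensible set of metadata types, and the common core data model is the shared base they build on. A minimal sketch of what "extensible" can mean, assuming single inheritance between types: a vendor adds a subtype with its own attributes while reusing the core ones unchanged. `TypeDef`, `TypeRegistry` and the `VendorReport` type are hypothetical, and `Referenceable`/`Asset` are used only as illustrative names; this is not the Apache Atlas type system API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TypeDef:
    """Illustrative type definition: a name, an optional supertype and its own attributes."""
    name: str
    supertype: Optional[str] = None
    attributes: List[str] = field(default_factory=list)


class TypeRegistry:
    """Hypothetical in-memory registry; not the Apache Atlas type system API."""

    def __init__(self) -> None:
        self._types: Dict[str, TypeDef] = {}

    def register(self, typedef: TypeDef) -> None:
        # Reject duplicates, and only allow extending types that already exist.
        if typedef.name in self._types:
            raise ValueError(f"type {typedef.name!r} is already registered")
        if typedef.supertype is not None and typedef.supertype not in self._types:
            raise ValueError(f"unknown supertype {typedef.supertype!r}")
        self._types[typedef.name] = typedef

    def all_attributes(self, name: str) -> List[str]:
        # Walk up the supertype chain, then list inherited attributes before own ones.
        chain: List[TypeDef] = []
        current: Optional[TypeDef] = self._types[name]
        while current is not None:
            chain.append(current)
            current = self._types[current.supertype] if current.supertype else None
        attributes: List[str] = []
        for typedef in reversed(chain):
            attributes.extend(typedef.attributes)
        return attributes


registry = TypeRegistry()
registry.register(TypeDef("Referenceable", attributes=["qualifiedName"]))
registry.register(TypeDef("Asset", "Referenceable", ["name", "description", "owner"]))
# A vendor-specific extension adds attributes without changing the shared core types.
registry.register(TypeDef("VendorReport", "Asset", ["reportUrl"]))
print(registry.all_attributes("VendorReport"))
# prints ['qualifiedName', 'name', 'description', 'owner', 'reportUrl']
```

Requiring the supertype to exist before a subtype is registered is one simple way to keep the shared core stable while letting each vendor layer its own types on top.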
Open APIs - OMRS
[Figure: OMRS components: metadata highway, adapter, plugin, Open Connector Framework]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258803
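A minimal sketch of the federation idea behind the metadata highway, assuming every repository is wrapped by an adapter that implements one common query interface. `RepositoryAdapter`, `InMemoryAdapter`, `MetadataHighway` and `find_entities_by_type` are hypothetical names for illustration, not the OMRS API; connectors, security and event exchange are left out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entity:
    guid: str
    type_name: str
    name: str
    home_repository: str


class RepositoryAdapter(ABC):
    """Hypothetical common interface that each connected repository implements."""

    @abstractmethod
    def find_entities_by_type(self, type_name: str) -> List[Entity]:
        ...


class InMemoryAdapter(RepositoryAdapter):
    """Stand-in for a vendor repository (Atlas, IBM, SAS, ...) in this sketch."""

    def __init__(self, entities: List[Entity]) -> None:
        self._entities = entities

    def find_entities_by_type(self, type_name: str) -> List[Entity]:
        return [e for e in self._entities if e.type_name == type_name]


class MetadataHighway:
    """Fans one query out to every connected repository and merges the answers."""

    def __init__(self) -> None:
        self._adapters: List[RepositoryAdapter] = []

    def connect(self, adapter: RepositoryAdapter) -> None:
        self._adapters.append(adapter)

    def find_entities_by_type(self, type_name: str) -> List[Entity]:
        merged: Dict[str, Entity] = {}
        for adapter in self._adapters:
            for entity in adapter.find_entities_by_type(type_name):
                # The same GUID may be held by several repositories; keep the first copy.
                merged.setdefault(entity.guid, entity)
        return list(merged.values())


highway = MetadataHighway()
highway.connect(InMemoryAdapter([Entity("g1", "Asset", "customers", "atlas")]))
highway.connect(InMemoryAdapter([
    Entity("g2", "Asset", "orders", "vendor-x"),
    Entity("g1", "Asset", "customers", "atlas"),  # reference copy of an Atlas entity
]))
print([e.name for e in highway.find_entities_by_type("Asset")])  # prints ['customers', 'orders']
```

Merging by GUID is only one simple policy; a real federation layer would also have to reconcile conflicting copies of the same entity.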
Open APIs - OMAS
[Figure: Open Metadata Access Services layered over OMRS: Governance Engine OMAS, Glossary OMAS, Asset OMAS, Information View OMAS, and more]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258799
OMAS – detail
[Figure: metadata services shown in the OMAS detail: Project List, Community, Landscape Definition, Asset Catalog, Classification and Mapping, Information View, Connector Directory, Governance Definitions, Information Process, Glossary and Taxonomy, Asset, Discovery, Governance Action, Roles and Access, Models and Schema; also labelled: Data/Asset, Connector]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258799
New glossary function for semantic processing
[Figure: business metadata (glossary terms Employee, Employee Id, Employee Name, Work Location, Annual Salary, Job Title, Hourly Pay Rate, Manager, Compensation Plan and Sensitive Data, linked by HAS-A and IS-A relationships) mapped to structural metadata for a data store (an EMPLOYEE RECORD with columns EMPNAME, EMPNO, JOBCODE and SALARY, shown with sample values)]
https://cwiki.apache.org/confluence/display/ATLAS/Area+3+-+Glossary
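The glossary slide maps business terms onto the columns of a data store. The exact arrows in the figure cannot be recovered from the extracted text, so the relationships below are hypothetical and chosen only to show the mechanism: once a column is assigned to a term, anything that term IS-A (here, Sensitive Data) also applies to the column, and HAS-A structure can be inherited along IS-A links. `IS_A`, `HAS_A`, `COLUMN_TO_TERM` and the helper functions are illustrative names, not Atlas glossary APIs.

```python
from typing import Dict, List, Set

# Hypothetical glossary fragment built from terms in the figure.
IS_A: Dict[str, List[str]] = {
    "Manager": ["Employee"],
    "Annual Salary": ["Sensitive Data"],
    "Hourly Pay Rate": ["Sensitive Data"],
}
HAS_A: Dict[str, List[str]] = {
    "Employee": ["Employee Id", "Employee Name", "Work Location", "Annual Salary", "Job Title"],
}
# Semantic assignment: structural metadata (columns) -> business metadata (terms).
COLUMN_TO_TERM: Dict[str, str] = {
    "EMPNAME": "Employee Name",
    "EMPNO": "Employee Id",
    "JOBCODE": "Job Title",
    "SALARY": "Annual Salary",
}


def generalizations(term: str) -> Set[str]:
    """Every term reachable from `term` by following IS-A links transitively."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def has_a(term: str) -> Set[str]:
    """HAS-A parts declared on `term` plus those inherited from its IS-A parents."""
    parts = set(HAS_A.get(term, []))
    for parent in generalizations(term):
        parts.update(HAS_A.get(parent, []))
    return parts


def sensitive_columns(column_to_term: Dict[str, str]) -> List[str]:
    """Columns whose assigned term is (transitively) a kind of Sensitive Data."""
    return sorted(column for column, term in column_to_term.items()
                  if "Sensitive Data" in generalizations(term))


print(sensitive_columns(COLUMN_TO_TERM))  # prints ['SALARY']
print("Annual Salary" in has_a("Manager"))  # prints True
```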
Glossary: replacing the v1 Taxonomy (tech preview)
• Categories
• Terms
• Hierarchies
• Rich relationships
• Classifications
https://cwiki.apache.org/confluence/display/ATLAS/Area+3+-+Glossary
Open Discovery Framework
• Open framework
• Plugins characterize data and relationships
• Updates metadata with results
• Initial implementation in master
https://cwiki.apache.org/confluence/display/ATLAS/Automated+metadata+discovery
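A minimal sketch of the plugin pattern described for the Open Discovery Framework, assuming a plugin receives sample values for one column and returns annotations that the framework merges into the metadata store. `DiscoveryPlugin`, `DataTypeProfiler`, `EmailDetector`, `run_discovery` and the annotation keys are hypothetical; the sketch only characterizes individual columns and does not attempt relationship discovery.

```python
import re
from typing import Dict, List, Protocol

# column name -> {annotation key: annotation value}
Annotations = Dict[str, Dict[str, str]]


class DiscoveryPlugin(Protocol):
    """Hypothetical plugin contract: characterize one column from sample values."""

    def analyse(self, column: str, values: List[str]) -> Dict[str, str]:
        ...


class DataTypeProfiler:
    def analyse(self, column: str, values: List[str]) -> Dict[str, str]:
        is_integer = bool(values) and all(re.fullmatch(r"-?\d+", v) for v in values)
        return {"inferredType": "integer" if is_integer else "string"}


class EmailDetector:
    EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]+")

    def analyse(self, column: str, values: List[str]) -> Dict[str, str]:
        if values and all(self.EMAIL.fullmatch(v) for v in values):
            return {"suggestedClassification": "Email"}
        return {}


def run_discovery(sample: Dict[str, List[str]], plugins: List[DiscoveryPlugin],
                  metadata: Annotations) -> Annotations:
    # Run every plugin on every column and merge its findings into the metadata store.
    for column, values in sample.items():
        for plugin in plugins:
            metadata.setdefault(column, {}).update(plugin.analyse(column, values))
    return metadata


sample = {"EMPNO": ["1001", "1002"], "CONTACT": ["a@example.com", "b@example.org"]}
store = run_discovery(sample, [DataTypeProfiler(), EmailDetector()], {})
print(store)
# prints {'EMPNO': {'inferredType': 'integer'}, 'CONTACT': {'inferredType': 'string', 'suggestedClassification': 'Email'}}
```

Keeping plugins unaware of the store, and letting the framework do the merge, is what makes it possible to add a new analysis without touching existing ones.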
Governance Action Framework
• Metadata drives enforcement
• Classification (tag) based: scalable and glossary-driven
• Access, masking and filtering
• Supports Apache Ranger, with open APIs for others
• Audit, exception management, rights and privacy (to look at in future)
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258801
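A minimal sketch of classification (tag) based masking, assuming each column already carries tags from the metadata catalogue and a policy names the groups allowed to see data with a given tag unmasked. `TagPolicy`, `enforce`, the `Sensitive` tag and the group names are hypothetical; this is not the Apache Ranger policy model or API, and row filtering is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

MASK = "****"  # fixed-width mask, so the original value's length is not revealed


@dataclass
class TagPolicy:
    """Hypothetical rule: only members of `allowed_groups` see `tag`-ged data unmasked."""
    tag: str
    allowed_groups: Set[str]


def enforce(row: Dict[str, str], column_tags: Dict[str, Set[str]],
            policies: List[TagPolicy], user_groups: Set[str]) -> Dict[str, str]:
    result: Dict[str, str] = {}
    for column, value in row.items():
        tags = column_tags.get(column, set())
        # Mask when any policy covering one of the column's tags admits none of the user's groups.
        denied = any(p.tag in tags and not (p.allowed_groups & user_groups) for p in policies)
        result[column] = MASK if denied else value
    return result


# Classifications as they might come from the metadata catalogue (illustrative values).
column_tags = {"EMPNAME": set(), "SALARY": {"Sensitive"}}
policies = [TagPolicy("Sensitive", {"hr"})]
row = {"EMPNAME": "A. Smith", "SALARY": "45000"}

print(enforce(row, column_tags, policies, {"analysts"}))  # prints {'EMPNAME': 'A. Smith', 'SALARY': '****'}
print(enforce(row, column_tags, policies, {"hr"}))        # prints {'EMPNAME': 'A. Smith', 'SALARY': '45000'}
```

Because the rule is attached to the tag rather than to individual columns, every column tagged `Sensitive` is covered by one policy, which is the scalability point the slide makes.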
Summary
[Figure: summary keywords: open metadata; enterprise catalog; discovery; multi-vendor; open, layered APIs; metadata store integration; open source & governance; ubiquitous; standard models]
How can I get involved?
• Discuss: mailing list
• Document, explain: wiki
• Report, design: Jira
• Face-to-face
• Code
• Vendors!
https://cwiki.apache.org/confluence/display/ATLAS/Getting+Involved