Big Data Governance in Hadoop with Cloudera Navigator Emre Sevinç emre.sevinc@bigindustries.be

Big Data Governance in Hadoop Environments with Cloudera Navigator (Feb 2017 meetup)



Agenda

● Introduction

● What is data governance and why should you care about it?

● What is Cloudera Navigator and how does it fit in?

● Cloudera Navigator Demonstration

● What’s new in Cloudera 5.10, the latest release?

Where are You with Hadoop? 1/2

Your relationship with Hadoop ...

● Still learning

● Evaluating distributions

● Testing / Development / Prototyping

● In production

Where are You with Hadoop? 2/2

You are using / planning to use Hadoop in ...

● Banking

● Telecom

● Healthcare

● Media / entertainment

● Internet services ...

Data Governance...

“... refers to the overall management of the availability, usability, integrity, auditability, and security of the data employed in an enterprise.

A sound data governance program includes a governing body, a defined set of procedures and policies, and a plan to execute them.

Data governance is used by organizations to exercise control over processes and methods used by their data stewards in order to improve data quality.”

Cloudera Big Data Maturity Survey 2016

https://goo.gl/d3A0ps

Data Governance & Challenges

● Compliance Officers: how to track, understand, and protect access to sensitive data?
  ○ Am I prepared for an audit?
  ○ Who’s accessing what data?
  ○ What are they doing with the data?
  ○ Is sensitive data governed and protected?

● Data Stewards and Curators: how to manage and organize data assets at Hadoop scale?
  ○ How to efficiently manage the data lifecycle from ingest to purge?
  ○ How to classify data efficiently?
  ○ How to make data available to end users efficiently?

● Data Scientists and BI Users: how to effortlessly find and trust the data that matter the most?
  ○ How can I explore data on my own?
  ○ Can I trust what I find?
  ○ How to find related data sets?

● Hadoop Administrators and DBAs: how to boost user productivity and cluster performance?
  ○ How is data being used today?
  ○ How can I optimize for future workloads?
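The compliance questions above ("Who’s accessing what data?", "What are they doing with the data?") are, at heart, aggregations over audit events. A minimal sketch in Python, assuming a hypothetical in-memory event format: the field names `user`, `operation`, and `resource` are illustrative only and are not Cloudera Navigator's audit schema or API.

```python
from collections import defaultdict

# Hypothetical audit events; field names are illustrative only,
# not the schema used by Cloudera Navigator.
AUDIT_EVENTS = [
    {"user": "alice", "operation": "read", "resource": "/data/customers"},
    {"user": "bob", "operation": "delete", "resource": "/data/customers"},
    {"user": "alice", "operation": "read", "resource": "/data/orders"},
    {"user": "alice", "operation": "read", "resource": "/data/customers"},
]


def access_summary(events):
    """Map each resource to {user: set of operations performed}."""
    summary = defaultdict(lambda: defaultdict(set))
    for event in events:
        summary[event["resource"]][event["user"]].add(event["operation"])
    # Convert the nested defaultdicts to plain dicts for predictable output.
    return {resource: dict(users) for resource, users in summary.items()}


def who_accessed(events, resource):
    """Return the sorted list of users who touched a resource."""
    return sorted({e["user"] for e in events if e["resource"] == resource})
```

In a real deployment the events would come from the governance tool's audit store rather than a Python list; the grouping logic stays the same.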

Your Hadoop data management concern is...

● Compliance, e.g. EU General Data Protection Regulation (GDPR)

● Stewardship (lifecycle management)

● Curation (metadata tagging)

● Enabling end-user self-service

● Administration (optimization)

● Other

What is Cloudera Navigator?

How does Cloudera Navigator fit into the Big Data Governance picture?

Cloudera Navigator Governance Foundation

● Unified Auditing

● Comprehensive Lineage

● Unified Metadata

● Universal Policies

Cloudera Navigator

● Trusted for production: deployed at hundreds of customers in various industries, running in production for 4 years

● Compliance-ready: Cloudera is the first Hadoop distribution to pass an independent PCI audit

● Integrates well with industry-leading partner solutions

Integration with Others 1/2

Integration with Others 2/2

https://github.com/cloudera/navigator-sdk

Lineage

Metadata - Business & Technical

Cloudera Navigator Demo

Unified Auditing


What’s new in Cloudera 5.10 (1/3)

● Comprehensive Governance for the Cloud

○ Cataloging, metadata management, and comprehensive lineage for data on Amazon S3

○ The only big data governance solution for data stored on-premises as well as in the cloud

What’s new in Cloudera 5.10 (2/3)

Comprehensive Governance for the Cloud

What’s new in Cloudera 5.10 (3/3)

● Policy-based business metadata assignment and validation

● Major performance optimizations

● Refreshed look-and-feel for increased data stewardship productivity

● Solr indexing has been optimized to improve search speed and reduce memory requirements.
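To make "policy-based business metadata assignment and validation" concrete, here is a minimal, tool-agnostic sketch in Python. It is not Cloudera Navigator's policy engine or SDK: the `Entity` and `Policy` classes, the path-prefix matching rule, and the `required_properties` check are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A hypothetical catalog entry (not a Navigator SDK class)."""
    path: str
    tags: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Policy:
    """Assign tags to every entity whose path starts with a prefix."""
    path_prefix: str
    tags: frozenset
    required_properties: tuple = ()

    def matches(self, entity):
        return entity.path.startswith(self.path_prefix)


def apply_policies(entities, policies):
    """Tag matching entities in place and return validation errors."""
    errors = []
    for entity in entities:
        for policy in policies:
            if not policy.matches(entity):
                continue
            # Assignment: add the policy's business tags to the entity.
            entity.tags |= policy.tags
            # Validation: every required property must be present.
            for key in policy.required_properties:
                if key not in entity.properties:
                    errors.append(f"{entity.path}: missing '{key}'")
    return errors
```

For example, a policy such as `Policy("/data/pii", frozenset({"sensitive"}), ("owner",))` would tag everything under `/data/pii` as sensitive and report any entity there that lacks an `owner` property.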

Thanks for attending!

Questions? Comments?

emre.sevinc@bigindustries.be