
Dynamic Column Masking and Row-Level Filtering in HDP


Page 1: Dynamic Column Masking and Row-Level Filtering in HDP

© Hortonworks Inc. 2011 – 2016. All Rights Reserved

Dynamic Row Filtering and Column Masking with Apache Ranger

Srikanth Venkat Senior Director, Product Management

Page 2: Dynamic Column Masking and Row-Level Filtering in HDP


Disclaimer

This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.

Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.

This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.

Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.

Page 3: Dynamic Column Masking and Row-Level Filtering in HDP


Agenda

Background

Dynamic Column Masking and Row Filtering

Spark SQL Security via Hive LLAP/Ranger

Demo

Page 4: Dynamic Column Masking and Row-Level Filtering in HDP


Security Challenges of Today’s Data Platforms

Grey Data: Central repository of critical and sensitive data

Forever: Data maintained over long duration

The Zoo: External ecosystem is in flux

Democratization: Users can access and analyze data in new and different ways

Page 5: Dynamic Column Masking and Row-Level Filtering in HDP


Apache Ranger

Authorization

• Centralized platform to define, administer and manage security policies consistently across Hadoop components
  • HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas
• Extensible Architecture
  • Custom policy conditions, user context enrichers
  • Easy to add new component types for authorization

Auditing

• Central audit location for all access requests
• Support multiple destination sources (HDFS, Solr, etc.)
• Real-time visual query interface

Ranger KMS

• Store and manage encryption keys
• Support HDFS Transparent Data Encryption
• Integration with HSM
  • Safenet LUNA

Page 6: Dynamic Column Masking and Row-Level Filtering in HDP


Ranger Architecture

[Architecture diagram. Labels: Enterprise Users; Legacy Tools and Data Governance; Ranger Administration Portal; Ranger Policy Server; Ranger Audit Server; Integration API; Hadoop Components, each with a Ranger Plugin: HDFS, HBase, Hive Server2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas]

Page 7: Dynamic Column Masking and Row-Level Filtering in HDP


Apache Ranger - Intuitive and Granular Policy Management

⬢ Simple Intuitive UI for Policy Editing and Setup

⬢ Fine-grained specificity by resource type, user context, tags, and operation

⬢ Supports Access, Tag Based, Dynamic Data Masking, and Row Filtering Policy Types

Page 8: Dynamic Column Masking and Row-Level Filtering in HDP


Apache Ranger Audits - Data Access

⬢ Comprehensive scalable audit logging
⬢ Audits for:
  ⬢ Resource Access Events with user context
  ⬢ Policy Edits/Creation/Deletion
  ⬢ User session information
  ⬢ Component plugin policy sync operations

Page 9: Dynamic Column Masking and Row-Level Filtering in HDP


Row Filtering in Hive

Control Access to Rows in Hive Tables based on Context!

Goal: Improve reliability and robustness of HDP by providing Row Level Security to Hive tables and reducing surface area of security system

⬢ Capabilities
– Restrict data row access based on
  – user characteristics (e.g. group membership) AND
  – runtime context

⬢ Access restriction logic at Hive layer => No changes to apps!
– Hive applies the access restrictions every time that data access is attempted
– Seamless behind the scenes enforcement of row level segmentation without having to add this logic to the predicate of the query
– No need for multiple views to filter rows for different groups and users!

⬢ Core Technologies: Ranger, Hive

HDP 2.5
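The idea of restricting rows by group membership, without the application adding any predicate itself, can be sketched in plain Python. This is only an illustration, not Ranger or Hive code: the `User` class, the `ROW_FILTER_POLICY` list, the sample rows and `select_rows` are all hypothetical names, and picking the first policy item that matches one of the user's groups is an assumption of the sketch. Runtime context and table-level access policies are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset


# Hypothetical row-filter policy: ordered (group, predicate) items.
# Each predicate receives one row (a dict) and returns True if the row is visible.
ROW_FILTER_POLICY = [
    ("us_employees", lambda row: row["country"] == "US"),
    ("eu_employees", lambda row: row["country"] == "Germany"),
]

# Illustrative table contents.
ROWS = [
    {"country": "US", "name": "John Doe"},
    {"country": "US", "name": "Jane Doe"},
    {"country": "Germany", "name": "Ernie Schwarz"},
]


def select_rows(user, rows):
    """Apply the first filter whose group the user belongs to.

    Assumption of this sketch: if no item matches, rows are returned
    unfiltered, because whether the user may read the table at all would be
    decided by a separate access policy that is not modeled here.
    """
    for group, predicate in ROW_FILTER_POLICY:
        if group in user.groups:
            return [row for row in rows if predicate(row)]
    return list(rows)
```

The caller always asks for the whole table; the filter is chosen from who is asking, which is why no per-group views are needed.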

Page 10: Dynamic Column Masking and Row-Level Filtering in HDP


Row Filtering in Hive

Control Access to Rows in Hive Tables based on Context!

⬢ Use Cases: Cross-industry application for data protection:

Healthcare

• A hospital can create a security policy that allows doctors to view data rows only for their own patients

• Insurance claims administrators can view only specific rows for their specific site.

Financial Services

• A bank can create a policy to restrict access to rows of financial data based on the employee’s business division, locale, or role

• Employees in the finance department are allowed to see customer invoices, payments, and accrual data

• European HR employees can see European employee data.

Information Technology

• A multi-tenant application can create logical separation of each tenant’s data so that each tenant can see only their own data rows.

HDP 2.5

Page 11: Dynamic Column Masking and Row-Level Filtering in HDP


Dynamic Data Masking of Hive Columns

Protect Sensitive Data in real-time with Dynamic Data Masking/Obfuscation!

Goal: Mask or anonymize sensitive columns of data (e.g. PII, PCI, PHI) from Hive query output

⬢ Benefits
– Does not physically alter the data, or make a copy of it
– Original sensitive data also does not leave the data store, but is obfuscated when presented to the user
– No changes are required at the application or Hive layer
– No need to produce additional protected duplicate versions of datasets
– Simple & easy to set up masking policies

⬢ Core Technologies: Ranger, Hive

HDP 2.5

Page 12: Dynamic Column Masking and Row-Level Filtering in HDP


Dynamic Masking and Row Level Filtering

Source data:

Country | National ID | CC No | Name | DOB | MRN | Policy ID
US | 232323233 | 4539067047629850 | John Doe | 9/12/1969 | 8233054331 | nj23j424
US | 333287465 | 5391304868205600 | Jane Doe | 8/13/1979 | 3736885376 | cadsd984
Germany | T22000129 | 4532786256545550 | Ernie Schwarz | 3/5/1963 | 876452830A | KK-2345909

Ranger Policy Enforcement

Users from US customer support group see row filtered data for US persons with CC and National ID (SSN) as masked values and MRN is nullified:

Country | National ID | CC No | MRN | Name
US | xxxxx3233 | 4539 xxxx xxxx xxxx | null | John Doe
US | xxxxx7465 | 5391 xxxx xxxx xxxx | null | Jane Doe

EU Health Policy Admins view relevant columns of data unmasked but are restricted by row filtering policies to see data for EU persons only:

Country | National ID | Name | MRN
Germany | T22000129 | Ernie Schwarz | 876452830A

HDP 2.5
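The view that the US customer support group receives in this example can be reproduced with a short Python sketch that combines a row filter with per-column masks. The helper names (`show_last4`, `show_first4_grouped`, `us_support_view`) are hypothetical and are not Ranger or Hive functions; the spacing of the masked card number simply mimics the slide's sample output.

```python
# Source rows copied from the example (only the columns used below).
SOURCE_ROWS = [
    {"country": "US", "national_id": "232323233",
     "cc_no": "4539067047629850", "name": "John Doe", "mrn": "8233054331"},
    {"country": "US", "national_id": "333287465",
     "cc_no": "5391304868205600", "name": "Jane Doe", "mrn": "3736885376"},
    {"country": "Germany", "national_id": "T22000129",
     "cc_no": "4532786256545550", "name": "Ernie Schwarz", "mrn": "876452830A"},
]


def show_last4(value):
    """Replace everything except the last four characters with 'x'."""
    return "x" * max(len(value) - 4, 0) + value[-4:]


def show_first4_grouped(cc_no):
    """Keep the first four digits and mask the rest, in groups of four."""
    groups = [cc_no[i:i + 4] for i in range(0, len(cc_no), 4)]
    return " ".join([groups[0]] + ["x" * len(g) for g in groups[1:]])


def us_support_view(rows):
    """Row filter (US persons only) plus column masks; MRN is nullified."""
    return [
        {
            "country": row["country"],
            "national_id": show_last4(row["national_id"]),
            "cc_no": show_first4_grouped(row["cc_no"]),
            "mrn": None,
            "name": row["name"],
        }
        for row in rows
        if row["country"] == "US"
    ]
```

The stored rows are never modified; the masked values exist only in the result returned to the caller.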

Page 13: Dynamic Column Masking and Row-Level Filtering in HDP


SparkSQL Security via Hive LLAP

Page 14: Dynamic Column Masking and Row-Level Filtering in HDP


Spark SQL Security: Row Filtering and Column Masking

⬢ Spark SQL + Hive enables users to explore very large data sets using SQL

⬢ Enterprises want to enable Spark SQL for ad-hoc analysis using BI tools with fine-grained security

⬢ Spark provides strong authentication via Kerberos and wire encryption via SSL, but as a general-purpose compute engine it has no built-in authorization sub-system

⬢ Spark also does not have any way to define a pluggable module that contains policies for fine-grained authorization
– With structured data in columns and rows, as with Hive, fine-grained security becomes a challenge

⬢ Co-mingled data in the same table may belong to two different groups, each with their own regulatory requirements.

⬢ Data may have regional restrictions, time-based availability restrictions, departmental restrictions, etc.

Page 15: Dynamic Column Masking and Row-Level Filtering in HDP


Hive 2 with LLAP: Open Interfaces

Page 16: Dynamic Column Masking and Row-Level Filtering in HDP


Key Features: Spark Column Security with LLAP

Fine-Grained Column Level Access Control for SparkSQL.

Fully dynamic policies per user. Doesn’t require views.

Use Standard Ranger policies and tools to control access and masking policies.

Flow:

1. SparkSQL gets data locations known as “splits” from HiveServer2 and plans query.

2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.

3. Spark gets a modified query plan based on dynamic security policy.

4. Spark reads data from LLAP. Filtering / masking guaranteed by LLAP server.

[Diagram labels: Spark Client; HiveServer2 (Authorization); Hive Metastore (Data Locations, View Definitions); LLAP (Data Read, Filter Pushdown); Ranger Server (Dynamic Policies); arrows numbered 1 to 4 matching the flow steps]

Page 17: Dynamic Column Masking and Row-Level Filtering in HDP


Example: Per-User Row Filtering by Region in SparkSQL

Original Query:
    SELECT * from CUSTOMERS
    WHERE total_spend > 10000

Query Rewrites based on Dynamic Ranger Policies

Spark User 2 (East Region), Dynamic Rewrite:
    SELECT * from CUSTOMERS
    WHERE total_spend > 10000
    AND region = 'east'

Spark User 1 (West Region), Dynamic Rewrite:
    SELECT * from CUSTOMERS
    WHERE total_spend > 10000
    AND region = 'west'

LLAP Data Access:

User ID | Region | Total Spend
1 | East | 5,131
2 | East | 27,828
3 | West | 55,493
4 | West | 7,193
5 | East | 18,193
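The per-user rewrite can be tried out with Python's built-in `sqlite3` module standing in for the SQL engine. This is a sketch under stated assumptions, not how HiveServer2 or LLAP implement it: the user names, the `ROW_FILTERS` mapping and the `rewrite` and `run_as` helpers are hypothetical, and the sketch wraps the original query in a subquery instead of appending `AND ...`, which gives the same result for this query. `lower(region)` is used because the sample data stores "East" and "West" capitalized.

```python
import sqlite3

# Hypothetical per-user row filters, written as SQL predicates.
ROW_FILTERS = {
    "spark_user_1": "lower(region) = 'west'",
    "spark_user_2": "lower(region) = 'east'",
}

ORIGINAL_QUERY = "SELECT * FROM customers WHERE total_spend > 10000"

# In-memory table holding the sample data from the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (user_id INTEGER, region TEXT, total_spend INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "East", 5131), (2, "East", 27828), (3, "West", 55493),
     (4, "West", 7193), (5, "East", 18193)],
)


def rewrite(query, user):
    """Wrap the user's query so that the user's row filter always applies."""
    row_filter = ROW_FILTERS.get(user)
    if row_filter is None:
        return query  # no filter defined for this user in the sketch
    return f"SELECT * FROM ({query}) AS filtered WHERE {row_filter}"


def run_as(user, query):
    """Execute the rewritten query and return all result rows."""
    return conn.execute(rewrite(query, user)).fetchall()
```

Both users submit the same `ORIGINAL_QUERY`; `run_as("spark_user_2", ORIGINAL_QUERY)` returns only East rows above the spend threshold, and `run_as("spark_user_1", ORIGINAL_QUERY)` only West rows.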

Page 18: Dynamic Column Masking and Row-Level Filtering in HDP


Agenda

Demo

Page 19: Dynamic Column Masking and Row-Level Filtering in HDP


Demo Setup

⬢ Hortonia – mid-size financial services company expanding from US to international markets

⬢ Employees in EU and US

⬢ Multiple business units need access to customer data: Analysts, HR

⬢ Customer data is co-mingled as well as isolated

⬢ Needs to have rational security policies to provide the right level of access control to customer data across geographies, business functions, and to comply with external regulations (PII, HIPAA, EU Privacy etc.)

Page 20: Dynamic Column Masking and Row-Level Filtering in HDP


Demo Data

Customer data in hortoniabank DB

• 2 Customer Tables: 50K customer records each with 38 fields (PII, PHI, PCI & non-sensitive data)
– us_customers: USA person data only
– ww_customers: multi-language, multi-country, localized person data across the world

• 1 Reference table: eu_countries (reference table for looking up EU country codes to country mappings – with Brexit etc.)

all user passwords: hadoop

Page 21: Dynamic Column Masking and Row-Level Filtering in HDP


Ranger Policies Setup for Demo

⬢ Only US employees can see data in us_customers table and only from locations within the US (access_us_customers)

⬢ Only US employees can see data rows of US persons in ww_customers table (filter_ww_customers_table + access_ww_customers)

⬢ Only EU employees can see rows with EU person data in ww_customers table (filter_ww_customers_table + access_ww_customers)

⬢ US HR team members can see all original unmasked data (PCI, PII, …)

⬢ Analysts can view masked versions of sensitive data from WW customers table but are prohibited from viewing PII data in US tables (all masking policies are under the Masking tab of Resource based policies)

⬢ No combination of zip code, MRN, and bloodgroup data is permitted to be joined in any query (prohibition policy)

Page 22: Dynamic Column Masking and Row-Level Filtering in HDP


Personas Setup for Demo

User | Group | Access Privileges
joe-analyst | us_employees, analyst | US Data Only, non-sensitive data only, rest masked or forbidden depending on sensitivity
kate-hr | us_employees, hr | US Data Only, All sensitive data (PCI, PII, PHI)
ivana-eu-hr | eu_employees, hr | EU Data Only, All sensitive data

Page 23: Dynamic Column Masking and Row-Level Filtering in HDP


Data Masking Policies setup for us_customers data for analyst group

Data Column | Data Column Description | Masking Type | Sample Output | Ranger Masking Policy
password | Password | Hash | 237672b21819462ff39fcea7d990c3e5 | mask_password_hash
nationalid | National ID | Show Last 4 | xx-xx-9324 | mask_nationalid_last4
ccnumber | Credit Card Number | Show First 4 | 4532xxxxxxxxxxxx | mask_ccnumber_first4
streetaddress | Street Address | Redact | nnn Xxxxxx Xxxxx | mask_streetaddress_redact
MRN | MRN | Nullify | null | mask_mrn_nullify
age | Age | CUSTOM | (Adds a random number below 20 to actual age) | mask_age_custom
birthday | Date of Birth | CUSTOM | 01-01-1987 (Keep year of birth and make date & month 01-01) | mask_dob_custom
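Several of the masking types in this table are easy to illustrate in Python. The functions below are hypothetical stand-ins, not the expressions Ranger or Hive actually run. Using MD5 for the hash is an assumption (the 32-character sample output is consistent with an MD5-length digest, but the deck does not name the algorithm), and the redact rule (uppercase to "X", lowercase to "x", digit to "n") is inferred from the sample output "nnn Xxxxxx Xxxxx".

```python
import hashlib
import random
from datetime import date


def mask_hash(value):
    """Replace the value with a hex digest (MD5 is an assumption here)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def mask_redact(value):
    """Uppercase -> 'X', lowercase -> 'x', digit -> 'n', others unchanged."""
    out = []
    for ch in value:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("n")
        else:
            out.append(ch)
    return "".join(out)


def mask_nullify(_value):
    """Always hide the value entirely."""
    return None


def mask_age_custom(age, rng=None):
    """Add a random integer below 20 to the actual age."""
    rng = rng or random.Random()
    return age + rng.randrange(20)


def mask_dob_custom(birthday):
    """Keep the year of birth and set day and month to 01-01."""
    return f"01-01-{birthday.year}"
```

For example, `mask_redact("123 Market Lanes")` yields a string of the same shape as the Redact sample in the table, and `mask_dob_custom(date(1987, 6, 15))` yields "01-01-1987".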

Page 24: Dynamic Column Masking and Row-Level Filtering in HDP


Backup