1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Dynamic Row Filtering and Column Masking with Apache Ranger
Srikanth Venkat Senior Director, Product Management
Disclaimer
This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.
This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.
Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
Agenda
Background
Dynamic Column Masking and Row Filtering
Spark SQL Security via Hive LLAP/Ranger
Demo
Security Challenges of Today’s Data Platforms
• Grey Data: central repository of critical and sensitive data
• Forever: data maintained over long duration
• The Zoo: external ecosystem is in flux
• Democratization: users can access and analyze data in new and different ways
Apache Ranger
Authorization
• Centralized platform to define, administer and manage security policies consistently across Hadoop components
  • HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas
• Extensible architecture
  • Custom policy conditions, user context enrichers
  • Easy to add new component types for authorization
Auditing
• Central audit location for all access requests
• Support multiple destination sources (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Store and manage encryption keys
• Support HDFS Transparent Data Encryption
• Integration with HSM
  • Safenet LUNA
Ranger Architecture
[Architecture diagram. Labels shown: Enterprise Users; Ranger Administration Portal; Ranger Policy Server; Ranger Audit Server; Integration API; Legacy Tools and Data Governance; Hadoop Components, each with a Ranger Plugin: HDFS, HBase, HiveServer2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas; HDFS and Solr also appear as separate boxes.]
Apache Ranger - Intuitive and Granular Policy Management
⬢ Simple, intuitive UI for policy editing and setup
⬢ Fine-grained specificity by resource type, user context, tags, and operation
⬢ Supports Access, Tag Based, Dynamic Data Masking, and Row Filtering policy types
Apache Ranger Audits - Data Access
⬢ Comprehensive scalable audit logging
⬢ Audits for:
  ⬢ Resource access events with user context
  ⬢ Policy edits/creation/deletion
  ⬢ User session information
  ⬢ Component plugin policy sync operations
Row Filtering in Hive
Control Access to Rows in Hive Tables Based on Context!
Goal: Improve reliability and robustness of HDP by providing row-level security to Hive tables and reducing the surface area of the security system
⬢ Capabilities
  – Restrict data row access based on
    – user characteristics (e.g. group membership) AND
    – runtime context
⬢ Access restriction logic at Hive layer => No changes to apps!
  – Hive applies the access restrictions every time that data access is attempted
  – Seamless behind-the-scenes enforcement of row-level segmentation without having to add this logic to the predicate of the query
  – No need for multiple views to filter rows for different groups and users!
⬢ Core Technologies: Ranger, Hive
HDP 2.5
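The capability list above (row access restricted by group membership and runtime context) can be illustrated with a toy model in Python. This is a minimal sketch under stated assumptions, not Ranger's or Hive's implementation: `RowFilterPolicy`, `visible_rows`, the group names, and the sample rows are all hypothetical, and returning no rows when no policy item matches is a choice made for this sketch only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Row = Dict[str, str]
Context = Dict[str, str]


@dataclass
class RowFilterPolicy:
    """Hypothetical policy item: which rows a group may see, given runtime context."""
    group: str
    predicate: Callable[[Row, Context], bool]


def visible_rows(rows: List[Row], user_groups: Set[str], context: Context,
                 policies: List[RowFilterPolicy]) -> List[Row]:
    """Apply the first policy item that matches one of the user's groups.

    Assumption of this sketch: a user matched by no policy item sees no rows.
    """
    for policy in policies:
        if policy.group in user_groups:
            return [row for row in rows if policy.predicate(row, context)]
    return []


# Hypothetical sample data and policies.
PATIENTS = [
    {"patient": "p1", "doctor": "dr_a", "site": "north"},
    {"patient": "p2", "doctor": "dr_b", "site": "north"},
    {"patient": "p3", "doctor": "dr_a", "site": "south"},
]

POLICIES = [
    # Doctors see only rows for their own patients.
    RowFilterPolicy("doctors", lambda row, ctx: row["doctor"] == ctx["user"]),
    # Claims administrators see only rows for their own site.
    RowFilterPolicy("claims_admins", lambda row, ctx: row["site"] == ctx["site"]),
]

if __name__ == "__main__":
    for row in visible_rows(PATIENTS, {"doctors"}, {"user": "dr_a"}, POLICIES):
        print(row)
```

The application-side call stays the same for every user; only the groups and context passed in change, which is the point of keeping the restriction logic out of the query itself.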
Row Filtering in Hive
Control Access to Rows in Hive Tables Based on Context!
⬢ Use Cases: Cross-industry application for data protection
HDP 2.5
Healthcare
• A hospital can create a security policy that allows doctors to view data rows only for their own patients.
• Insurance claims administrators can view only specific rows for their specific site.
Financial Services
• A bank can create a policy to restrict access to rows of financial data based on the employee's business division, locale, or role.
• Employees in the finance department are allowed to see customer invoices, payments, and accrual data.
• European HR employees can see European employee data.
Information Technology
• A multi-tenant application can create logical separation of each tenant's data so that each tenant can see only their own data rows.
Dynamic Data Masking of Hive Columns
Protect Sensitive Data in Real Time with Dynamic Data Masking/Obfuscation!
Goal: Mask or anonymize sensitive columns of data (e.g. PII, PCI, PHI) from Hive query output
⬢ Benefits
  – Does not physically alter the data, or make a copy of it
  – Original sensitive data does not leave the data store, but is obfuscated when presented to the user
  – No changes are required at the application or Hive layer
  – No need to produce additional protected duplicate versions of datasets
  – Simple and easy to set up masking policies
⬢ Core Technologies: Ranger, Hive
HDP 2.5
Dynamic Masking and Row Level Filtering

Source data:
Country | National ID | CC No | Name | DOB | MRN | Policy ID
US | 232323233 | 4539067047629850 | John Doe | 9/12/1969 | 8233054331 | nj23j424
US | 333287465 | 5391304868205600 | Jane Doe | 8/13/1979 | 3736885376 | cadsd984
Germany | T22000129 | 4532786256545550 | Ernie Schwarz | 3/5/1963 | 876452830A | KK-2345909

Ranger Policy Enforcement

Users from the US customer support group see row-filtered data for US persons, with CC and National ID (SSN) as masked values and MRN nullified:
Country | National ID | CC No | MRN | Name
US | xxxxx3233 | 4539 xxxx xxxx xxxx | null | John Doe
US | xxxxx7465 | 5391 xxxx xxxx xxxx | null | Jane Doe

EU Health Policy Admins view relevant columns of data unmasked but are restricted by row filtering policies to see data for EU persons only:
Country | National ID | Name | MRN
Germany | T22000129 | Ernie Schwarz | 876452830A

HDP 2.5
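The two result sets on this slide can be reproduced with a small Python sketch. It is an illustration under stated assumptions rather than Ranger's enforcement code: the function names, the dictionary keys, and the `eu_countries` default are hypothetical, and the mask output format simply mimics the values printed on the slide.

```python
from typing import Dict, FrozenSet, List, Optional

Row = Dict[str, Optional[str]]

# Source rows from the slide (DOB and Policy ID omitted for brevity).
CUSTOMERS: List[Row] = [
    {"country": "US", "national_id": "232323233", "cc_no": "4539067047629850",
     "name": "John Doe", "mrn": "8233054331"},
    {"country": "US", "national_id": "333287465", "cc_no": "5391304868205600",
     "name": "Jane Doe", "mrn": "3736885376"},
    {"country": "Germany", "national_id": "T22000129", "cc_no": "4532786256545550",
     "name": "Ernie Schwarz", "mrn": "876452830A"},
]


def show_last_4(value: str) -> str:
    """Mask everything except the last four characters."""
    return "x" * max(len(value) - 4, 0) + value[-4:]


def show_first_4_cc(value: str) -> str:
    """Keep the first four digits of a 16-digit card number, slide-style."""
    return value[:4] + " xxxx" * 3


def us_support_view(rows: List[Row]) -> List[Row]:
    """Row filter (US only) plus column masks; the stored rows are not modified."""
    return [
        {
            "country": r["country"],
            "national_id": show_last_4(r["national_id"]),
            "cc_no": show_first_4_cc(r["cc_no"]),
            "mrn": None,  # nullified
            "name": r["name"],
        }
        for r in rows
        if r["country"] == "US"
    ]


def eu_health_admin_view(rows: List[Row],
                         eu_countries: FrozenSet[str] = frozenset({"Germany"})) -> List[Row]:
    """Row filter (EU only), selected columns unmasked."""
    return [
        {"country": r["country"], "national_id": r["national_id"],
         "name": r["name"], "mrn": r["mrn"]}
        for r in rows
        if r["country"] in eu_countries
    ]


if __name__ == "__main__":
    for row in us_support_view(CUSTOMERS):
        print(row)
    for row in eu_health_admin_view(CUSTOMERS):
        print(row)
```

Both views are computed from the same stored rows at read time, so no masked copy of the dataset is ever created.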
SparkSQL Security via Hive LLAP
Spark SQL Security: Row Filtering and Column Masking
⬢ Spark SQL + Hive enables users to explore very large data sets using SQL
⬢ Enterprises want to enable Spark SQL for ad-hoc analysis using BI tools with fine-grained security
⬢ Spark provides strong authentication via Kerberos and wire encryption via SSL, but as a general-purpose compute engine it has no built-in authorization subsystem
⬢ Spark also does not have any way to define a pluggable module that contains policies for fine-grained authorization
  – With structured data (columns and rows) in Hive, fine-grained security becomes a challenge
⬢ Co-mingled data in the same table may belong to two different groups, each with its own regulatory requirements
⬢ Data may have regional restrictions, time-based availability restrictions, departmental restrictions, etc.
Hive 2 with LLAP: Open Interfaces
Key Features: Spark Column Security with LLAP
⬢ Fine-grained column-level access control for SparkSQL.
⬢ Fully dynamic policies per user. Doesn't require views.
⬢ Use standard Ranger policies and tools to control access and masking policies.
Flow:
1. SparkSQL gets data locations, known as “splits”, from HiveServer2 and plans the query.
2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.
3. Spark gets a modified query plan based on dynamic security policy.
4. Spark reads data from LLAP. Filtering / masking guaranteed by LLAP server.
[Diagram. Labels shown: Spark Client; HiveServer2 (Authorization); Hive Metastore (Data Locations, View Definitions); LLAP (Data Read, Filter Pushdown); Ranger Server (Dynamic Policies); arrows numbered 1 to 4 matching the flow steps.]
Example: Per-User Row Filtering by Region in SparkSQL
Original Query:
SELECT * FROM customers WHERE total_spend > 10000

Query rewrites based on dynamic Ranger policies:

Spark User 1 (West Region), dynamic rewrite:
SELECT * FROM customers WHERE total_spend > 10000 AND region = 'west'

Spark User 2 (East Region), dynamic rewrite:
SELECT * FROM customers WHERE total_spend > 10000 AND region = 'east'

LLAP Data Access:
User ID | Region | Total Spend
1 | East | 5,131
2 | East | 27,828
3 | West | 55,493
4 | West | 7,193
5 | East | 18,193
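The per-user rewrite in this example can be tried out with Python's built-in `sqlite3` module standing in for the Hive/LLAP stack. This is a sketch under stated assumptions: the `REGION_BY_USER` mapping, `rewrite`, `run_as`, and `make_demo_db` are hypothetical names, and the sketch wraps the original query in a subquery instead of appending an `AND` clause as the slide does, since wrapping does not depend on how the original query is written.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical per-user filter values; on the slide these come from Ranger policies.
REGION_BY_USER = {"spark_user_1": "west", "spark_user_2": "east"}

ORIGINAL_QUERY = "SELECT * FROM customers WHERE total_spend > 10000"


def make_demo_db() -> sqlite3.Connection:
    """Load the five sample rows from the slide into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (user_id INTEGER, region TEXT, total_spend INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "East", 5131), (2, "East", 27828), (3, "West", 55493),
         (4, "West", 7193), (5, "East", 18193)],
    )
    return conn


def rewrite(original_query: str) -> str:
    """Wrap the user's query so the region filter always applies."""
    return f"SELECT * FROM ({original_query}) WHERE lower(region) = ?"


def run_as(conn: sqlite3.Connection, user: str, original_query: str) -> List[Tuple]:
    """Run the same original query on behalf of a given user."""
    region = REGION_BY_USER[user]
    return conn.execute(rewrite(original_query), (region,)).fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    for user in REGION_BY_USER:
        print(user, sorted(run_as(db, user, ORIGINAL_QUERY)))
```

Both users submit the identical original query; the filter value is bound as a parameter on the server side, so it cannot be altered by the query text.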
Agenda
Demo
Demo Setup
⬢ Hortonia – mid-size financial services company expanding from US to international markets
⬢ Employees in EU and US
⬢ Multiple business units need access to customer data: Analysts, HR
⬢ Customer data is co-mingled as well as isolated
⬢ Needs to have rational security policies to provide the right level of access control to customer data across geographies and business functions, and to comply with external regulations (PII, HIPAA, EU Privacy etc.)
Demo Data
⬢ Customer data in hortoniabank DB
• 2 customer tables: 50K customer records each, with 38 fields (PII, PHI, PCI & non-sensitive data)
  – us_customers: USA person data only
  – ww_customers: multi-language, multi-country, localized person data across the world
• 1 reference table: eu_countries (reference table for looking up EU country codes to country mappings – with Brexit etc.)
all user passwords: hadoop
Ranger Policies Setup for Demo
⬢ Only US employees can see data in the us_customers table, and only from locations within the US (access_us_customers)
⬢ Only US employees can see data rows of US persons in the ww_customers table (filter_ww_customers_table + access_ww_customers)
⬢ Only EU employees can see rows with EU person data in the ww_customers table (filter_ww_customers_table + access_ww_customers)
⬢ US HR team members can see all original unmasked data (PCI, PII, ...)
⬢ Analysts can view masked versions of sensitive data from the WW customers table but are prohibited from viewing PII data in US tables (all masking policies are under the Masking tab of Resource based policies)
⬢ No combination of zip code, MRN, and blood group data is permitted to be joined in any query (prohibition policy)
Personas Setup for Demo
User | Group | Access Privileges
joe-analyst | us_employees, analyst | US data only; non-sensitive data only, rest masked or forbidden depending on sensitivity
kate-hr | us_employees, hr | US data only; all sensitive data (PCI, PII, PHI)
ivana-eu-hr | eu_employees, hr | EU data only; all sensitive data
Data Masking Policies Setup for us_customers Data for analyst Group
Data Column | Data Column Description | Masking Type | Sample Output | Ranger Masking Policy
password | Password | Hash | 237672b21819462ff39fcea7d990c3e5 | mask_password_hash
nationalid | National ID | Show Last 4 | xx-xx-9324 | mask_nationalid_last4
ccnumber | Credit Card Number | Show First 4 | 4532xxxxxxxxxxxx | mask_ccnumber_first4
streetaddress | Street Address | Redact | nnn Xxxxxx Xxxxx | mask_streetaddress_redact
MRN | MRN | Nullify | null | mask_mrn_nullify
age | Age | CUSTOM | (Adds a random number below 20 to actual age) | mask_age_custom
birthday | Date of Birth | CUSTOM | 01-01-1987 (Keep year of birth and make date & month 01-01) | mask_dob_custom
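Each masking type in this table can be approximated with a few lines of Python. The sketch below only mimics the sample outputs shown on the slide; it is not Ranger's or Hive's masking code. All function names are hypothetical, and MD5 is an assumption chosen solely because the sample hash is 32 hexadecimal characters long (the slide does not name the algorithm).

```python
import hashlib
import random
from datetime import date
from typing import Optional


def mask_hash(value: str) -> str:
    """Hash: replace the value with a hex digest (MD5 is an assumption)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def _mask_char(c: str) -> str:
    """Redaction rule inferred from the sample: X/x for letters, n for digits."""
    if c.isupper():
        return "X"
    if c.islower():
        return "x"
    if c.isdigit():
        return "n"
    return c


def redact(value: str) -> str:
    """Redact: mask every letter and digit, keep spacing and punctuation."""
    return "".join(_mask_char(c) for c in value)


def show_last_4(value: str) -> str:
    """Show Last 4: mask letters and digits with 'x' except the last four characters."""
    head, tail = value[:-4], value[-4:]
    return "".join("x" if c.isalnum() else c for c in head) + tail


def show_first_4(value: str) -> str:
    """Show First 4: keep the first four characters, mask the rest with 'x'."""
    head, tail = value[:4], value[4:]
    return head + "".join("x" if c.isalnum() else c for c in tail)


def nullify(value: object) -> Optional[object]:
    """Nullify: always return null."""
    return None


def mask_age(age: int, rng: random.Random) -> int:
    """CUSTOM: add a random number below 20 to the actual age."""
    return age + rng.randrange(20)


def mask_birthday(birthday: date) -> str:
    """CUSTOM: keep the year of birth, set month and day to 01-01."""
    return birthday.replace(month=1, day=1).strftime("%m-%d-%Y")


if __name__ == "__main__":
    print(redact("123 Maple Drive"))
    print(show_last_4("12-34-9324"))
    print(show_first_4("4532015112830366"))
    print(mask_birthday(date(1987, 6, 15)))
```

The input values in the usage lines are made up for illustration; only the output shapes are taken from the table.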
Backup