trihug
DESCRIPTION
Deploying enterprise grade security for Hadoop with Apache Sentry (incubating). Apache Hive is deployed in the vast majority of Hadoop use cases despite the major practical flaws in its most secure operational mode (Kerberos + User Impersonation). In this talk we will discuss these flaws and how Apache Sentry addresses them. We will then enable Apache Sentry on an existing cluster. Additional topics will include Hadoop security and Role Based Access Control (RBAC).
Deploying enterprise grade security for Hadoop
Brock Noland | Software Engineer, Cloudera
February 27, 2014
Outline
• Introduction
• Hadoop security primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo
Introduction
Tonight's focus is SQL-on-Hadoop
• Vast majority of Hadoop users use Hive or Cloudera Impala
• Data warehouse offload is the most common use case
• Data warehouse offload is a two step process
  1. Automatic transformations moved to Hadoop
  2. Data analysts given query access
Data warehouse use case
[Diagram: Online Database, Data Warehouse, Hadoop]
Authentication
• Authentication is who you are
• Hadoop models
  • Default - "trusted network"
  • Strong - Kerberos
Default Authentication – trusted network
• Default security mechanism
• Hadoop client uses local username
• Used in
  • POCs
  • Startups
  • Demos
  • Pre-prod environments
Default Authentication – trusted network
Client Host:
$ whoami
brock
$ cat a.txt
some data
$ hadoop fs -put a.txt .
Hadoop:
User: brock
File: a.txt
Contents: some data
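The weakness of the trusted-network model can be sketched in a few lines of Python. This is a toy model, not Hadoop code: the class `TrustedNetworkStore` and its `put` method are hypothetical names invented for illustration. The point is only that the service records whatever username the client claims.

```python
from dataclasses import dataclass, field


@dataclass
class TrustedNetworkStore:
    """Toy file store (hypothetical) that believes whatever username the client sends."""
    files: dict = field(default_factory=dict)

    def put(self, claimed_user: str, name: str, contents: str) -> None:
        # No credential is checked: the claimed username simply becomes the owner.
        self.files[name] = {"owner": claimed_user, "contents": contents}


store = TrustedNetworkStore()
# The honest case from the slide: local user "brock" uploads a.txt.
store.put("brock", "a.txt", "some data")
# Nothing stops the same client from claiming to be a different user.
store.put("manager1", "b.txt", "written by a client claiming to be manager1")

print(store.files["a.txt"]["owner"])
print(store.files["b.txt"]["owner"])
```

In this sketch the second upload succeeds even though nothing proves the client is "manager1", which is the gap Kerberos is meant to close.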
Strong Authentication – Kerberos
• Hadoop is secured with Kerberos
  • Provides mutual authentication
  • Protects against eavesdropping and replay attacks
• Every user and service has a Kerberos "principal"
  • Service: impala/[email protected]
  • User: [email protected]
• Credentials
  • Service: keytabs
  • User: password
Strong Authentication – Kerberos
Client Host:
$ whoami
brock
$ kinit
Password: *******
$ cat a.txt
some data
$ hadoop fs -put a.txt .
Hadoop:
<kerberos ticket>
<encrypted data> *
* RPC Encryption must be enabled
Strong Authentication – Kerberos
• Keytab
  • Encrypted key for servers (similar to a "password")
  • Generated by a server such as MIT Kerberos or Active Directory
Strong Authentication – Kerberos
• Impersonation
  • Services such as Hive Server 2 impersonate users
  • Data loaded by "joe" via HS2 is owned by "joe"
  • Oozie jobs submitted by "brock" are run as "brock"
Hive Server 2 and Oozie
[Diagram: clients (Beeline (Hive CLI), Tableau, JDBC, Oozie CLI, Control-M), services (Hive Server 2 (HS2), Oozie), Hadoop]
Authorization
• HDFS permissions
  • Unix style
  • Read/Write/Execute for Owner/Group/Other
  • Coarse grained
• Other Hadoop components have authorization
  • MapReduce: who can use which job queues
  • HBase: table ACLs
HDFS Permissions
$ hadoop fs -ls file
-rw-r----- 1 analyst1 analysts 2244 2014-01-19 12:15 file
• Permissions
  • Unix style permissions
  • Read/Write/Execute
  • Owner/Group/Other
• Owner
  • One and only one owner
• Group
  • One and only one group
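As a rough illustration of the owner/group/other model, the sketch below evaluates the mode string from the listing above. It is a simplified model in Python, not HDFS code: the function `allowed` and the user "analyst2" are hypothetical, and special bits such as the sticky bit are deliberately ignored.

```python
def allowed(mode: str, owner: str, group: str,
            user: str, user_groups: set, action: str) -> bool:
    """Unix-style check: exactly one permission class applies (owner, else group, else other).

    `action` is one of "r", "w", "x". Sticky/setuid letters are not handled in this sketch.
    """
    bits = mode[1:]  # drop the file-type character, leaving nine rwx positions
    if user == owner:
        triplet = bits[0:3]
    elif group in user_groups:
        triplet = bits[3:6]
    else:
        triplet = bits[6:9]
    return action in triplet


# Values taken from the listing: -rw-r----- analyst1 analysts
MODE, OWNER, GROUP = "-rw-r-----", "analyst1", "analysts"

owner_can_write = allowed(MODE, OWNER, GROUP, "analyst1", {"analysts"}, "w")
member_can_read = allowed(MODE, OWNER, GROUP, "analyst2", {"analysts"}, "r")
member_can_write = allowed(MODE, OWNER, GROUP, "analyst2", {"analysts"}, "w")
outsider_can_read = allowed(MODE, OWNER, GROUP, "jranalyst1", {"jranalyst1"}, "r")
print(owner_can_write, member_can_read, member_can_write, outsider_can_read)
```

Because a file carries a single owner and a single group, the only other lever in this model is the "other" class, which applies to everyone else at once.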
Back to our use case
• Scenario facts
  • ETL offload is a success
  • Data warehouse is expensive and at capacity
  • Same data is in Hadoop
• Next step
  • End users start using Hadoop to augment the DW
  • Security becomes primary concern
End users need to share data
• Unlike automated ETL jobs, end users want to share data with peers
• Must manage HDFS permissions manually
  • Each file has a single group
  • End result is users set permissions to world readable/writeable
Hive: Security holes
CREATE TEMPORARY FUNCTION custom_udf AS 'com.mycompany.MaliciousClass';
SELECT TRANSFORM(stuff) USING 'malicious-script.pl' AS thing1, thing;
CREATE EXTERNAL TABLE external_table(column1 string) LOCATION '/path/to/any/table';
Hive: Security holes
CREATE TABLE test (c1 string) ROW FORMAT SERDE 'com.mycompany.MaliciousClass';
FROM (
  FROM t1
  MAP t1.c1 USING 'malicious-script1.pl'
  CLUSTER BY key) map_output
INSERT OVERWRITE TABLE t2
  REDUCE t2.c1 USING 'malicious-script2.pl' AS c2;
Default: Authorization
• Hive ships with an "advisory" authorization system
  • All users see all databases/tables/columns
  • Does not fix any security holes
  • Users grant themselves permissions
Kerberos with impersonation: Sharing data
The user "manager1" wants to share the table "manager1_table" with senior analysts but not junior analysts.
# hadoop fs -ls -R /user/hive/warehouse
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T - manager1 manager1 0 manager1_table
Kerberos with impersonation: Sharing data
IT must create a group
# groupadd senioranalysts
Then add the appropriate members to the group
# usermod -G analyst,senioranalysts analyst1
# usermod -G management,analyst,senioranalysts manager1
Kerberos with impersonation: Sharing data
Then "manager1" can manually change the file permissions
$ hadoop fs -chgrp -R senioranalysts …/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T - manager1 senioranalysts 0 manager1_table
Kerberos with impersonation: Sharing data
Now any senior-level analyst can query the data
$ whoami
analyst1
$ beeline
...
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;
+------------+
| count(*)   |
+------------+
| 47         |
+------------+
Kerberos with impersonation: Sharing data
Junior analysts cannot query the data:
$ whoami
jranalyst1
$ beeline
....
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select * from manager1_table;
Error: java.io.IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=jranalyst1, access=READ_EXECUTE, inode="/user/hive/warehouse/manager1_table":manager1:senioranalysts:drwxr-x--T
Kerberos with impersonation: Sharing data
What happens in the real world?
Kerberos with impersonation: Sharing data
Table "manager1_table" is owned by user/group "manager1"
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T - manager1 manager1 0 manager1_table
Kerberos with impersonation: Sharing data
User "manager1" makes "manager1_table" world readable/writable
$ hadoop fs -chmod -R 777 /user/hive/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxrwxrwt - manager1 manager1 0 manager1_table
Kerberos with impersonation: Summary
• Securing Hive with Kerberos and impersonation makes Hive unusable for DW offload
  • Manual file permission management
  • End state is world writable/readable
  • No ability to restrict access to columns or rows
  • All users see all databases/tables/columns
Fine Grained Security: Apache Sentry
Authorization module for Hive, Search, & Impala
• Unlocks Key RBAC Requirements
  • Secure, fine-grained, role-based authorization
  • Multi-tenant administration
• Open Source
  • Apache Incubator project
• Ecosystem Support
  • Apache SOLR, HiveServer2, & Impala 1.1+
Key Benefits of Sentry
• Store Sensitive Data in Hadoop
• Extend Hadoop to More Users
• Comply with Regulations
Key Capabilities of Sentry
• Fine-Grained Authorization
  • Specify security for SERVERS, DATABASES, TABLES & VIEWS
• Role-Based Authorization
  • SELECT privilege on views & tables
  • INSERT privilege on tables
  • ALL privilege on the server, databases, tables & views
  • ALL privilege is needed to create/modify schema
• Multi-Tenant Administration
  • Separate policies for each database/schema
  • Can be maintained by separate admins
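The role-based model listed above can be sketched as a lookup from groups to roles to privileges. The Python below is a minimal illustration under stated assumptions, not Sentry's API or policy format: the dictionaries `GROUP_ROLES` and `ROLE_PRIVILEGES`, the role names, and the function `is_authorized` are all hypothetical. The group and table names are reused from the earlier sharing example.

```python
# Hypothetical policy: group -> roles, role -> set of (action, database, table) privileges.
GROUP_ROLES = {
    "senioranalysts": {"senior_analyst_role"},
    "management": {"manager_role"},
}
ROLE_PRIVILEGES = {
    "senior_analyst_role": {("SELECT", "default", "manager1_table")},
    "manager_role": {("ALL", "default", "*")},  # "*" stands for every table in the database
}


def is_authorized(user_groups, action, database, table):
    """Return True if any role held through the user's groups grants the action."""
    for group in user_groups:
        for role in GROUP_ROLES.get(group, ()):
            for granted_action, granted_db, granted_table in ROLE_PRIVILEGES.get(role, ()):
                action_ok = granted_action in ("ALL", action)
                db_ok = granted_db == database
                table_ok = granted_table in ("*", table)
                if action_ok and db_ok and table_ok:
                    return True
    return False


senior_select = is_authorized({"senioranalysts"}, "SELECT", "default", "manager1_table")
senior_insert = is_authorized({"senioranalysts"}, "INSERT", "default", "manager1_table")
junior_select = is_authorized({"jranalyst1"}, "SELECT", "default", "manager1_table")
manager_insert = is_authorized({"management"}, "INSERT", "default", "manager1_table")
print(senior_select, senior_insert, junior_select, manager_insert)
```

In this sketch, sharing a table means editing a policy entry rather than changing file groups or modes, which is the contrast with the impersonation workflow shown earlier.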
Sentry Architecture
[Diagram labels: Binding Layer, Impala, Hive, Search, Pig, …, Policy Engine, Policy Provider, File, Database, Authorization Provider, HiveServer2, SOLR, Local FS/HDFS, Query, MR, SQL]
Query Execution Flow
• Parse: Validate SQL grammar
• Build: Construct statement tree
• Check (Sentry): Validate statement objects
  • First check: Authorization
• Plan: Forward to execution planner
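The ordering of the four stages can be sketched as a small pipeline in which authorization is checked before anything reaches the planner. This is a toy Python model, not HiveServer2 or Impala code: the functions `parse`, `build`, `check`, `plan` and `run`, the `grants` mapping, and the single-statement grammar are all assumptions made for illustration.

```python
import re


class AuthorizationError(Exception):
    """Raised when the check stage rejects a statement."""


def parse(sql):
    """Stand-in for grammar validation: accepts only 'select <cols> from <table>'."""
    match = re.fullmatch(r"\s*select\s+(.+?)\s+from\s+(\w+)\s*;?\s*", sql, re.IGNORECASE)
    if match is None:
        raise ValueError("syntax error")
    return match


def build(match):
    """Construct a minimal statement tree from the parsed pieces."""
    return {"action": "SELECT", "columns": match.group(1), "table": match.group(2)}


def check(statement, user, grants):
    """Authorization is the first validation step, before any planning happens."""
    if (statement["action"], statement["table"]) not in grants.get(user, set()):
        raise AuthorizationError(f"{user} may not {statement['action']} {statement['table']}")


def plan(statement):
    """Stand-in for handing the statement to the execution planner."""
    return f"scan {statement['table']}"


def run(sql, user, grants):
    statement = build(parse(sql))
    check(statement, user, grants)  # an unauthorized statement never reaches plan()
    return plan(statement)


grants = {"analyst1": {("SELECT", "manager1_table")}}  # hypothetical grants
result = run("select count(*) from manager1_table;", "analyst1", grants)
print(result)
```

Calling `run` with the same query as "jranalyst1" raises `AuthorizationError` in this sketch, so the rejected statement is never planned or executed.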