
August 2014 HUG : Hive 13 Security


Page 1: August 2014 HUG : Hive 13 Security

© Hortonworks Inc. 2014

August 2014

Page 1

Use cases and security solutions

Apache Hive

Thejas Nair @thejasn

Page 2: August 2014 HUG : Hive 13 Security

What are we talking about?

• Introduce key security concepts
• Use cases
• Authorization solutions
• Followed by specific use cases and experience at Yahoo!

Page 3: August 2014 HUG : Hive 13 Security

Authentication vs Authorization

• Authentication
  – Verifying your identity
  – Enabled in Hadoop using Kerberos
  – More options with HiveServer2
• Authorization
  – Verifying if you have permissions to perform this action

Pic1 - https://flic.kr/p/5qQiJR
Pic2 - https://flic.kr/p/3i4SW

Page 4: August 2014 HUG : Hive 13 Security

What is Apache Hive?

It depends on who you ask!

https://flic.kr/p/nff9gY

Page 5: August 2014 HUG : Hive 13 Security

What is Apache Hive?

It's a table-oriented storage layer!

It is a SQL database!

Page 6: August 2014 HUG : Hive 13 Security

Components - The table storage layer

[Diagram: Pig/MR reach tables through HCatalog; HDFS holds the data, the Metastore holds the metadata]

Page 7: August 2014 HUG : Hive 13 Security

Authorization – The table storage layer

• Data – FileSystem
  – /hive/warehouse/…/table1/
  – Traditional POSIX permissions
    – rwxr-x--- owner: thejas, group: dev
  – More flexibility with Access Control Lists
  – More flexibility with Apache Argus (incubating)

Page 8: August 2014 HUG : Hive 13 Security

Authorization – The table storage layer

• Metadata
  – {name : table1, storage_info : {dir : /hive/…/table1}, columns: {..}, .. }
  – Authorization ?

Page 9: August 2014 HUG : Hive 13 Security

Storage Based Authorization

• Don’t add another source of truth for authorization!
• Metadata access based on corresponding data access.
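A minimal sketch of this idea, written in Python as a toy model and not as Hive's actual implementation: a metadata operation on a table is allowed only when the user holds the matching POSIX permission on the table's directory. The `TableDir` type, the `GROUPS` table, the users `alice` and `bob`, and the operation-to-permission mapping in `REQUIRED` are all illustrative assumptions; the mode, owner, and group reuse the `rwxr-x--- owner: thejas, group: dev` example from the data slide.

```python
from dataclasses import dataclass

# Hypothetical group membership; a real cluster resolves this elsewhere.
GROUPS = {"thejas": {"dev"}, "alice": {"dev"}, "bob": {"analysts"}}


@dataclass(frozen=True)
class TableDir:
    """Directory backing a table, with POSIX-style ownership and mode."""
    path: str
    owner: str
    group: str
    mode: str  # nine characters, e.g. "rwxr-x---"


def has_fs_permission(user: str, d: TableDir, need: str) -> bool:
    """Check one permission letter ('r', 'w' or 'x').

    As in POSIX, only the first matching class applies:
    owner, then group, then other.
    """
    if user == d.owner:
        bits = d.mode[0:3]
    elif d.group in GROUPS.get(user, set()):
        bits = d.mode[3:6]
    else:
        bits = d.mode[6:9]
    return need in bits


# Assumed mapping from metadata operation to the permission it needs.
REQUIRED = {"describe_table": "r", "alter_table": "w", "drop_table": "w"}


def authorize_metadata(user: str, operation: str, d: TableDir) -> bool:
    """Metadata access follows data access; there is no separate policy store."""
    return has_fs_permission(user, d, REQUIRED[operation])


# The path keeps the elision from the slide; it is only a label here.
table1 = TableDir("/hive/warehouse/…/table1", "thejas", "dev", "rwxr-x---")

print(authorize_metadata("thejas", "drop_table", table1))    # True
print(authorize_metadata("alice", "describe_table", table1))  # True
print(authorize_metadata("alice", "alter_table", table1))     # False
print(authorize_metadata("bob", "describe_table", table1))    # False
```

The point of the design is that changing the directory's permissions is the only administrative action; the metadata decision is derived from it each time.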

Page 10: August 2014 HUG : Hive 13 Security

Enabling Storage Based Authorization

• Update configuration in metastore
  – http://s.apache.org/SBA
  – Ensure that only metastore server has access to its RDBMS

Page 11: August 2014 HUG : Hive 13 Security

Hive as a SQL query engine

• Hive command line
  – bin/hive -e 'select * from ..'
  – Same use case as Pig, MapReduce
  – Storage Based Authorization applicable here

Page 12: August 2014 HUG : Hive 13 Security

Hive as a SQL query engine

• ODBC/JDBC application/tools
  – Adds HiveServer2 at the front
  – Query processing – same way as command line
  – Storage Based Authorization applicable here
  – Have query run as end user
    – Default configuration hive.server2.enable.doAs=true

Page 13: August 2014 HUG : Hive 13 Security

SBA: What is great about it?

• Simple.
  – One source of truth. Just manage the FileSystem permissions.
• Flexible HDFS ACL support
  – Requires upcoming Hive 0.14 release.

Page 14: August 2014 HUG : Hive 13 Security

SBA: What is missing?

• Access control at row and column level
  – FileSystem permissions are at the level of directories and files

Page 15: August 2014 HUG : Hive 13 Security

Fine grained control: pre-requisites

• Data access API should be fine grained
  – API needs support for row/column concept
• HiveServer2 ?
  – Data server for ODBC/JDBC
  – SQL as API supports selecting rows, columns

Page 16: August 2014 HUG : Hive 13 Security

SQL standards based authorization

• Fine grained authorization with HiveServer2
• Grant/Revoke statements
• Based on SQL standard

Page 17: August 2014 HUG : Hive 13 Security

SQL std based auth: How it works

• Compile Query
• -> Query Plan
• -> Actions required on objects
  – (e.g. READ : table1, DROP : table2)
• -> Privileges on objects
  – (e.g. SELECT : table1, OWNER : table2)
• Check if user has required privileges
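The last three steps of this pipeline can be sketched in Python as follows. This is a toy model, not Hive's code: query compilation is not modeled, the plan's actions are supplied by hand, and the action-to-privilege mapping contains only the two pairs given as examples on the slide (READ to SELECT, DROP to OWNER). The privilege sets `analyst` and `owner` are hypothetical.

```python
# Mapping from plan-level action to required privilege, limited to the
# two examples on the slide; a real system has many more entries.
ACTION_TO_PRIVILEGE = {"READ": "SELECT", "DROP": "OWNER"}


def required_privileges(actions):
    """Translate (action, object) pairs into (privilege, object) pairs."""
    return {(ACTION_TO_PRIVILEGE[action], obj) for action, obj in actions}


def is_authorized(user_privileges, actions):
    """Allow the query only if the user holds every required privilege."""
    return required_privileges(actions) <= user_privileges


# Hand-written stand-in for a query plan: read table1 and drop table2.
plan_actions = [("READ", "table1"), ("DROP", "table2")]

# Hypothetical privilege sets for two users.
analyst = {("SELECT", "table1")}
owner = {("SELECT", "table1"), ("OWNER", "table2")}

print(is_authorized(analyst, plan_actions))  # False
print(is_authorized(owner, plan_actions))    # True
```

Using a subset test means a single missing privilege rejects the whole query, which matches the "check if user has required privileges" step.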

Page 18: August 2014 HUG : Hive 13 Security

Authorization Policy

• GRANT <PRIVILEGE> ON <OBJECT> TO <USERS> / REVOKE <PRIVILEGE> ON <OBJECT> FROM <USERS>
• <USERS> can be a user or a role
• Delegate management of privileges/roles
• Hive ‘DBA’ can be added to ‘ADMIN’ role
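To illustrate how grants to users and roles combine, here is a small in-memory sketch in Python. The `Policy` class and its method names are hypothetical and exist only for this example; unlike Hive, it keeps users and roles in one namespace and does not model delegation or the ADMIN role.

```python
from collections import defaultdict


class Policy:
    """Toy policy store; structure and names are illustrative only."""

    def __init__(self):
        self.grants = defaultdict(set)  # principal -> {(privilege, object)}
        self.roles = defaultdict(set)   # user -> {role}

    def grant(self, privilege, obj, principal):
        self.grants[principal].add((privilege, obj))

    def revoke(self, privilege, obj, principal):
        self.grants[principal].discard((privilege, obj))

    def add_to_role(self, user, role):
        self.roles[user].add(role)

    def has_privilege(self, user, privilege, obj):
        """True if held directly or through any role the user belongs to."""
        principals = {user} | self.roles[user]
        return any((privilege, obj) in self.grants[p] for p in principals)


policy = Policy()
policy.grant("SELECT", "table1", "analysts")  # grant to a role
policy.add_to_role("bob", "analysts")

print(policy.has_privilege("bob", "SELECT", "table1"))    # True
print(policy.has_privilege("alice", "SELECT", "table1"))  # False

policy.revoke("SELECT", "table1", "analysts")
print(policy.has_privilege("bob", "SELECT", "table1"))    # False
```

Granting to a role and managing membership separately is what keeps the number of individual grants small as the user base grows.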

Page 19: August 2014 HUG : Hive 13 Security

Fine Grained Authorization

• Supported using views
  – Grant access to view, not base table
  – Select clause – select columns
  – Where clause – select rows
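The column and row filtering that a view provides can be tried out with Python's built-in `sqlite3` module. SQLite stands in for Hive here purely to show the view mechanics; it has no GRANT statement, so the step of granting access on the view (and not on the base table) is not shown. The `employees` table, its columns, and the `dev_directory` view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('ann', 'dev', 120), ('raj', 'dev', 110), ('li', 'sales', 90);

    -- The SELECT list limits the columns, the WHERE clause limits the rows.
    CREATE VIEW dev_directory AS
        SELECT name, dept FROM employees WHERE dept = 'dev';
""")

# A reader of the view sees no salary column and no rows outside 'dev'.
rows = conn.execute("SELECT * FROM dev_directory ORDER BY name").fetchall()
print(rows)  # [('ann', 'dev'), ('raj', 'dev')]
```

In a system with grants, users would receive SELECT on `dev_directory` only, so the salary column and the sales rows stay out of reach.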

Page 20: August 2014 HUG : Hive 13 Security

Restrictions

• Disallows features that bypass the fine grained authorization checks.
  – dfs commands, transform clause, create UDFs
• Admin can add permanent UDFs

Page 21: August 2014 HUG : Hive 13 Security

SQL std based auth: Query processing

• Grant access on files for HiveServer2 process user
• Run queries as this user
  – Configure hive.server2.enable.doAs=false
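The effect of the `hive.server2.enable.doAs` setting on file access can be summarized in a short Python sketch. The function name and the process user name `hive` are assumptions made for illustration; only the true/false behavior comes from the slides.

```python
def filesystem_identity(connecting_user: str, server_user: str, do_as: bool) -> str:
    """Return the identity whose file permissions apply when the query runs.

    do_as mirrors hive.server2.enable.doAs: True runs the query as the
    connecting end user, False runs it as the HiveServer2 process user.
    """
    return connecting_user if do_as else server_user


# "hive" as the HiveServer2 process user is an assumption for this example.
print(filesystem_identity("alice", "hive", do_as=True))   # alice
print(filesystem_identity("alice", "hive", do_as=False))  # hive
```

With `do_as=False`, end users need no permissions on the underlying files, so the SQL-level checks in HiveServer2 become the only path to the data.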

Page 22: August 2014 HUG : Hive 13 Security

Extending Hive Authorization

• Authorization plugin API
• Apache Argus first user

Page 23: August 2014 HUG : Hive 13 Security

Hive default authorization

• Grant/revoke based access control
• Insecure/incomplete model
• Insecure model for Hive command line

Page 24: August 2014 HUG : Hive 13 Security

Conclusion

• Playing well with each other
  1. Metadata authorization using Storage Based Authorization
  2. Fine grained authorization options in HiveServer2
  3. Both 1 & 2

Page 25: August 2014 HUG : Hive 13 Security

Use Cases at Yahoo!

Presented by Chris Drome | August 20, 2014

Page 26: August 2014 HUG : Hive 13 Security

Overview of Use Cases

26 Yahoo Confidential & Proprietary

§ Column and row level access controls
  › Hive 0.13 SQL Standards Based Hive Authorization
    • Authorization model managed by metastore
  › HiveServer2
    • Serving engine with authorization plugin
  › Views
    • Fine grain authorization on a table
§ (Limited) Authorization for Hive CLI
  › HCatalog server-side security
  › HDFS file permission based authorization (StorageBasedAuthorizationProvider)
  › HiveMetastoreAuthorizationProvider plugin

Page 27: August 2014 HUG : Hive 13 Security

The Players

§ Producers
  › ETL jobs load data to grid
  › Primarily Pig jobs
  › Some MR jobs
  › Owners of the data (read/write file permissions)
    • Owner of directories and files
§ Consumers
  › Consumes some sub-set of data
  › Readers of the data (read-only file permissions)
    • Member of group with read-only permissions

Page 28: August 2014 HUG : Hive 13 Security

The Challenges

§ Producers
  › Latency SLAs on a large volume of data
  › Responsible for managing data
    • Reloading data, rolling up data, archiving data
  › Responsible for managing access to data (groups)
§ Consumers
  › Access controlled by membership in consumer group
  › Access controls at column or row level not possible
  › Limited to one group per table
  › Access may be through Pig, Hive, MR, BI tools, etc.

Page 29: August 2014 HUG : Hive 13 Security

Fine Grain Access Control with HiveServer2

§ HiveServer2 as query execution engine
§ HiveServer2 responsible for verifying authorization
§ HiveServer2 runs as “super-user” with read privileges
  › Connecting user doesn’t have access permissions on underlying files
  › Executes query on behalf of connecting user
§ Define arbitrary access controls with views on tables
  › Able to restrict by columns and/or rows
  › Grant access to individual users
§ Prototype with Sentry as proof-of-concept

Page 30: August 2014 HUG : Hive 13 Security

(Limited) Authorization for Hive CLI

§ Not practical to prevent use of Hive CLI
§ Hive CLI could be used to circumvent HS2-based authorization
§ HCatalog server-side security uses StorageBasedAuthorizationProvider to check HDFS access permissions
  › Chain with an authorization plugin (HiveMetastoreAuthorizationProvider)
§ Perform HCatalog-based authorization of DDL tasks
  › Prevent users from creating/dropping objects in databases without authorization
§ Perform HCatalog-based authorization for data access
§ Simple prototype as proof-of-concept
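The chaining idea can be sketched in Python as below. This is not the Hive plugin interface: the `Authorizer` signature, the operation names, and both stand-in checks are hypothetical, and the sketch assumes that a chained request is allowed only when every authorizer in the chain allows it.

```python
from typing import Callable, Iterable

# An authorizer takes (user, operation, database) and returns True to allow.
Authorizer = Callable[[str, str, str], bool]


def chain(authorizers: Iterable[Authorizer]) -> Authorizer:
    """Combine authorizers; assumed semantics: every one must allow."""
    checks = list(authorizers)

    def combined(user: str, operation: str, database: str) -> bool:
        return all(check(user, operation, database) for check in checks)

    return combined


# Stand-in for the storage based check: hypothetical (user, database)
# pairs that have HDFS access to the database directory.
HDFS_ACCESS = {("etl", "sales"), ("bob", "sales")}


def storage_based(user: str, operation: str, database: str) -> bool:
    return (user, database) in HDFS_ACCESS


# Stand-in for a custom plugin: hypothetical per-database list of users
# who may create or drop objects.
DDL_ALLOWED = {"sales": {"etl"}}


def ddl_plugin(user: str, operation: str, database: str) -> bool:
    if operation in {"CREATE_TABLE", "DROP_TABLE"}:
        return user in DDL_ALLOWED.get(database, set())
    return True  # non-DDL operations are left to the other checks


authorize = chain([storage_based, ddl_plugin])

print(authorize("etl", "DROP_TABLE", "sales"))  # True
print(authorize("bob", "DROP_TABLE", "sales"))  # False
print(authorize("bob", "READ", "sales"))        # True
```

Here `bob` passes the file permission check but is still blocked from dropping tables, which is the kind of DDL restriction the slide describes.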