Deploying enterprise grade security for Hadoop
Brock Noland | Software Engineer, Cloudera
February 27, 2014

TriHUG 2/14: Apache Sentry


DESCRIPTION

Deploying enterprise grade security for Hadoop with Apache Sentry (incubating). Apache Hive is deployed in the vast majority of Hadoop use cases despite the major practical flaws in its most secure operational mode (Kerberos + User Impersonation). In this talk we will discuss these flaws and how Apache Sentry addresses them. We will then enable Apache Sentry on an existing cluster. Additional topics will include Hadoop security and Role Based Access Control (RBAC).


Page 2: TriHUG 2/14: Apache Sentry

Outline

• Introduction
• Hadoop security primer
    • Authentication
    • Authorization
• Security options
    • Default
    • Kerberos with Impersonation
    • Kerberos with Sentry
• Demo

Page 3: TriHUG 2/14: Apache Sentry

Introduction

Tonight's focus is SQL-on-Hadoop
• Vast majority of Hadoop users use Hive or Cloudera Impala
• Data warehouse offload is the most common use case
• Data warehouse offload is a two-step process
    1. Automatic transformations moved to Hadoop
    2. Data analysts given query access

Page 4: TriHUG 2/14: Apache Sentry

Data warehouse use case

[Diagram: Online Database, Data Warehouse, Hadoop]


Page 6: TriHUG 2/14: Apache Sentry

Authentication

• Authentication is who you are
• Hadoop models
    • Default - “trusted network”
    • Strong - Kerberos

Page 7: TriHUG 2/14: Apache Sentry

Default Authentication - trusted network

• Default security mechanism
• Hadoop client uses local username
• Used in
    • POCs
    • Startups
    • Demos
    • Pre-prod environments

Page 8: TriHUG 2/14: Apache Sentry

Default Authentication - trusted network

Client Host:
    $ whoami
    brock
    $ cat a.txt
    some data
    $ hadoop fs -put a.txt .

Hadoop receives:
    User: brock
    File: a.txt
    Contents: some data
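The weakness of the trusted-network model can be sketched in a few lines. This is a minimal model, not Hadoop's client code: the function name `resolve_hadoop_user` is hypothetical, and it assumes that under simple authentication the client asserts its local username unless the `HADOOP_USER_NAME` environment variable overrides it. Nothing in this path is verified by the cluster.

```python
import getpass
import os


def resolve_hadoop_user(environ=None, local_user=None):
    """Return the username a client would assert under simple authentication.

    Assumption: an explicit HADOOP_USER_NAME override wins; otherwise the
    local OS username is used. The server simply trusts the result.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("HADOOP_USER_NAME")
    if override:
        return override
    return local_user if local_user is not None else getpass.getuser()


# The client is "brock" locally, as in the slide above.
print(resolve_hadoop_user(environ={}, local_user="brock"))

# Anyone who controls the client host can assert a different identity.
print(resolve_hadoop_user(environ={"HADOOP_USER_NAME": "hdfs"}, local_user="brock"))
```

Because the identity is chosen entirely on the client side, this mode only suits environments where every host on the network is trusted, such as the POCs and demos listed earlier.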

Page 9: TriHUG 2/14: Apache Sentry

Strong Authentication - Kerberos

• Hadoop is secured with Kerberos
    • Provides mutual authentication
    • Protects against eavesdropping and replay attacks
• Every user and service has a Kerberos “principal”
    • Service: impala/[email protected]
    • User: [email protected]
• Credentials
    • Service: keytabs
    • User: password
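Kerberos principals follow the general shape `primary/instance@REALM`, where the instance part is optional for users. The sketch below splits a principal into those parts. The host name and realm in the examples are hypothetical placeholders, not the values from the slide, and the parser ignores the escaping rules that real principals allow.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # service or user name, e.g. "impala"
    instance: Optional[str]  # usually the host for a service; None for users
    realm: str               # Kerberos realm, conventionally upper case


def parse_principal(text: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts (no escape handling)."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a principal with a realm: {text!r}")
    primary, _, instance = name.partition("/")
    if not primary:
        raise ValueError(f"missing primary in {text!r}")
    return Principal(primary, instance or None, realm)


# Hypothetical service principal: one per service per host.
print(parse_principal("impala/node1.example.com@EXAMPLE.COM"))

# Hypothetical user principal: no instance component.
print(parse_principal("brock@EXAMPLE.COM"))
```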

Page 10: TriHUG 2/14: Apache Sentry

Strong Authentication - Kerberos

Client Host:
    $ whoami
    brock
    $ kinit
    Password: *******
    $ cat a.txt
    some data
    $ hadoop fs -put a.txt .

Sent to Hadoop:
    <kerberos ticket>
    <encrypted data> *

* RPC Encryption must be enabled

Page 11: TriHUG 2/14: Apache Sentry

Strong Authentication - Kerberos

• Keytab
    • Encrypted key for servers (similar to a “password”)
    • Generated by a server such as MIT Kerberos or Active Directory

Page 12: TriHUG 2/14: Apache Sentry

Strong Authentication - Kerberos

• Impersonation
    • Services such as Hive Server 2 (HS2) impersonate users
    • Data loaded by “joe” via HS2 is owned by “joe”
    • Oozie jobs submitted by “brock” are run as “brock”

Page 13: TriHUG 2/14: Apache Sentry

Hive Server 2 and Oozie

[Diagram, top to bottom]
• Clients: Beeline (Hive CLI), Tableau, JDBC, Oozie CLI, Control-M
• Services: Hive Server 2 (HS2), Oozie
• Hadoop

Page 14: TriHUG 2/14: Apache Sentry

Authorization

• HDFS permissions
    • Unix style
    • Read/Write/Execute for Owner/Group/Other
    • Coarse grained
• Other Hadoop components have authorization
    • MapReduce: who can use which job queues
    • HBase: table ACLs

Page 15: TriHUG 2/14: Apache Sentry

HDFS Permissions

    $ hadoop fs -ls file
    -rw-r----- 1 analyst1 analysts 2244 2014-01-19 12:15 file

• Permissions
    • Unix style permissions
    • Read/Write/Execute
    • Owner/Group/Other
• Owner
    • One and only one owner
• Group
    • One and only one group
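The owner/group/other model can be expressed as a short check. This is a simplified sketch of Unix-style permission evaluation, not HDFS source code: the function name `can_access` is hypothetical, the users `analyst2` and `jranalyst1` and their group memberships are made up for illustration, and superuser bypass and special bits are ignored.

```python
def can_access(user, user_groups, owner, group, mode, want):
    """Evaluate one of 'r', 'w', 'x' against a 9-character mode string.

    Exactly one permission class applies: owner first, then the file's
    single group, then other. There is no way to name a second group.
    """
    if user == owner:
        bits = mode[0:3]
    elif group in user_groups:
        bits = mode[3:6]
    else:
        bits = mode[6:9]
    return bits["rwx".index(want)] == want


# The file from the listing above: -rw-r----- analyst1 analysts
FILE = dict(owner="analyst1", group="analysts", mode="rw-r-----")

print(can_access("analyst1", {"analysts"}, want="w", **FILE))   # owner may write
print(can_access("analyst2", {"analysts"}, want="r", **FILE))   # group may read
print(can_access("analyst2", {"analysts"}, want="w", **FILE))   # group may not write
print(can_access("jranalyst1", {"juniors"}, want="r", **FILE))  # other has no access
```

Since each file carries only one owner and one group, the only way to widen access beyond that group is to open up the "other" class, which is what makes this model coarse grained.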

Page 16: TriHUG 2/14: Apache Sentry

Back to our use case

• Scenario facts
    • ETL offload is a success
    • Data warehouse is expensive and at capacity
    • Same data is in Hadoop
• Next step
    • End users start using Hadoop to augment the DW
    • Security becomes the primary concern

Page 17: TriHUG 2/14: Apache Sentry

End users need to share data

• Unlike automated ETL jobs, end users want to share data with peers
• Must manage HDFS permissions manually
    • Each file has a single group
    • End result is users set permissions to world readable/writable


Page 19: TriHUG 2/14: Apache Sentry

Hive: Security holes

    -- Loads an arbitrary user-supplied Java class
    CREATE TEMPORARY FUNCTION custom_udf AS 'com.mycompany.MaliciousClass';

    -- Runs an arbitrary user-supplied script
    SELECT TRANSFORM(stuff) USING 'malicious-script.pl' AS thing1, thing;

    -- Points a table at any path
    CREATE EXTERNAL TABLE external_table(column1 string)
    LOCATION '/path/to/any/table';

Page 20: TriHUG 2/14: Apache Sentry

Hive: Security holes

    CREATE TABLE test (c1 string)
    ROW FORMAT SERDE 'com.mycompany.MaliciousClass';

    FROM (
      FROM t1
      MAP t1.c1 USING 'malicious-script1.pl'
      CLUSTER BY key
    ) map_output
    INSERT OVERWRITE TABLE t2
    REDUCE t2.c1 USING 'malicious-script2.pl' AS c2;

Page 21: TriHUG 2/14: Apache Sentry

Default: Authorization

• Hive ships with an “advisory” authorization system
    • All users see all databases/tables/columns
    • Does not fix any security holes
    • Users grant themselves permissions


Page 23: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

The user “manager1” wants to share the table “manager1_table” with senior analysts but not junior analysts.

    # hadoop fs -ls -R /user/hive/warehouse
    drwxr-x--T - analyst1   analyst1   0 analyst1_table
    drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
    drwxr-x--T - manager1   manager1   0 manager1_table

Page 24: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

IT must create a group:

    # groupadd senioranalysts

Then add the appropriate members to the group:

    # usermod -G analyst,senioranalysts analyst1
    # usermod -G management,analyst,senioranalysts manager1

Page 25: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

Then “manager1” can manually change the file permissions:

    $ hadoop fs -chgrp -R senioranalysts …/warehouse/manager1_table
    $ hadoop fs -ls /user/hive/warehouse/
    Found 3 items
    drwxr-x--T - analyst1   analyst1       0 analyst1_table
    drwxr-x--T - jranalyst1 jranalyst1     0 jranalyst1_table
    drwxr-x--T - manager1   senioranalysts 0 manager1_table

Page 26: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

Now any senior-level analyst can query the data:

    $ whoami
    analyst1
    $ beeline
    ...
    Connected to: Hive (version 0.10.0)
    0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;
    +------------+
    | count(*)   |
    +------------+
    | 47         |
    +------------+

Page 27: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

Junior analysts cannot query the data:

    $ whoami
    jranalyst1
    $ beeline
    ....
    Connected to: Hive (version 0.10.0)
    0: jdbc:hive2://localhost:10000/default> select * from manager1_table;
    Error: java.io.IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=jranalyst1, access=READ_EXECUTE, inode="/user/hive/warehouse/manager1_table":manager1:senioranalysts:drwxr-x--T

Page 28: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

What happens in the real world?

Page 29: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

Table “manager1_table” is owned by user/group “manager1”:

    $ hadoop fs -ls /user/hive/warehouse/
    Found 3 items
    drwxr-x--T - analyst1   analyst1   0 analyst1_table
    drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
    drwxr-x--T - manager1   manager1   0 manager1_table

Page 30: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

User “manager1” makes “manager1_table” world readable/writable:

    $ hadoop fs -chmod -R 777 /user/hive/warehouse/manager1_table
    $ hadoop fs -ls /user/hive/warehouse/
    Found 3 items
    drwxr-x--T - analyst1   analyst1   0 analyst1_table
    drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
    drwxrwxrwt - manager1   manager1   0 manager1_table
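The permission strings in these listings can be reproduced from octal modes with Python's standard `stat` module, which makes the effect of the `chmod` easier to read. This is a sketch under one assumption: that the trailing `T` and `t` in the listings reflect a sticky bit on the table directories (modes 1750 and 1777). The helper names `listing_mode` and `is_world_writable` are hypothetical.

```python
import stat


def listing_mode(octal_mode: int, is_dir: bool = True) -> str:
    """Render permission bits the way an ls-style listing shows them."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | octal_mode)


def is_world_writable(octal_mode: int) -> bool:
    """True when the 'other' class has the write bit set."""
    return bool(octal_mode & stat.S_IWOTH)


before = 0o1750  # rwx for owner, r-x for group, nothing for other, sticky bit
after = 0o1777   # rwx for everyone, sticky bit kept

print(listing_mode(before), is_world_writable(before))  # drwxr-x--T False
print(listing_mode(after), is_world_writable(after))    # drwxrwxrwt True
```

A periodic scan for world-writable warehouse directories using a check like `is_world_writable` is one way to notice when users have taken this shortcut.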

Page 31: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Summary

• Securing Hive with Kerberos and impersonation makes Hive unusable for DW offload
    • Manual file permission management
    • End state is world writable/readable
    • No ability to restrict access to columns or rows
    • All users see all databases/tables/columns


Page 33: TriHUG 2/14: Apache Sentry

Fine Grained Security: Apache Sentry

Authorization module for Hive, Search, & Impala

• Unlocks Key RBAC Requirements
    • Secure, fine-grained, role-based authorization
    • Multi-tenant administration
• Open Source
    • Apache Incubator project
• Ecosystem Support
    • Apache SOLR, HiveServer2, & Impala 1.1+

Page 34: TriHUG 2/14: Apache Sentry

Key Benefits of Sentry

• Store Sensitive Data in Hadoop
• Extend Hadoop to More Users
• Comply with Regulations

Page 35: TriHUG 2/14: Apache Sentry

Key Capabilities of Sentry

• Fine-Grained Authorization
    • Specify security for SERVERS, DATABASES, TABLES & VIEWS
• Role-Based Authorization
    • SELECT privilege on views & tables
    • INSERT privilege on tables
    • ALL privilege on the server, databases, tables & views
    • ALL privilege is needed to create/modify schema
• Multi-Tenant Administration
    • Separate policies for each database/schema
    • Can be maintained by separate admins
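The role-based model described on this slide can be illustrated with a small in-memory check. This is a toy model and not Sentry's API or policy format: the group, role, server, and database names are hypothetical, and it assumes that a privilege granted on a server or database also covers the objects beneath it and that ALL implies SELECT and INSERT.

```python
# Groups map to roles; roles map to (scope, action) privileges.
# A scope is a path: (server,), (server, database) or (server, database, table).
GROUP_ROLES = {
    "senioranalysts": {"senior_analyst_role"},
    "etl": {"etl_role"},
    "admins": {"admin_role"},
}

ROLE_PRIVILEGES = {
    "senior_analyst_role": {(("server1", "default", "manager1_table"), "SELECT")},
    "etl_role": {(("server1", "default"), "INSERT")},
    "admin_role": {(("server1",), "ALL")},
}


def is_allowed(user_groups, action, target):
    """Return True if any role of any group grants `action` on `target`."""
    for group in user_groups:
        for role in GROUP_ROLES.get(group, ()):
            for scope, granted in ROLE_PRIVILEGES.get(role, ()):
                covers = target[:len(scope)] == scope  # scope is a prefix of target
                if covers and granted in (action, "ALL"):
                    return True
    return False


table = ("server1", "default", "manager1_table")
print(is_allowed({"senioranalysts"}, "SELECT", table))  # granted on the table
print(is_allowed({"senioranalysts"}, "INSERT", table))  # not granted
print(is_allowed({"jranalysts"}, "SELECT", table))      # group has no role
print(is_allowed({"admins"}, "INSERT", table))          # ALL on the server
```

In contrast to the HDFS model, several groups can hold privileges on the same table, and no file ownership or mode bits need to change when access is granted or revoked.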

Page 36: TriHUG 2/14: Apache Sentry

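Sentry Architecture

[Architecture diagram; labels recovered from the slide]
• Components: Impala, HiveServer2, Search (SOLR), Pig, …
• Binding Layer: Impala, Hive
• Authorization Provider: Policy Engine, Policy Provider
• Policy storage: File (Local FS/HDFS), Database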

Page 37: TriHUG 2/14: Apache Sentry

Query Execution Flow

SQL -> Parse -> Build -> Check (Sentry) -> Plan -> MR query

1. Parse: Validate SQL grammar
2. Build: Construct statement tree
3. Check: Validate statement objects
    • First check: Authorization
4. Plan: Forward to execution planner
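The ordering of these stages is the important part: authorization runs during the check stage, so a rejected statement never reaches the planner. The sketch below mirrors that ordering with a deliberately tiny grammar. It is not how Hive or Impala is implemented; every function name, the grant table, and the plan string are hypothetical.

```python
class AuthorizationError(Exception):
    """Raised when the check stage rejects a statement."""


def parse(sql):
    """Parse: validate a tiny grammar, SELECT <anything> FROM <table>."""
    tokens = sql.strip().rstrip(";").split()
    upper = [t.upper() for t in tokens]
    if not tokens or upper[0] != "SELECT" or "FROM" not in upper:
        raise SyntaxError("only SELECT ... FROM <table> is supported here")
    if upper.index("FROM") + 1 >= len(tokens):
        raise SyntaxError("missing table name after FROM")
    return tokens


def build(tokens):
    """Build: construct a (very small) statement tree."""
    upper = [t.upper() for t in tokens]
    return {"action": "SELECT", "table": tokens[upper.index("FROM") + 1]}


def check(statement, user, grants):
    """Check: authorization is the first validation, before any planning."""
    wanted = (statement["action"], statement["table"])
    if wanted not in grants.get(user, set()):
        raise AuthorizationError(f"{user} may not {wanted[0]} {wanted[1]}")


def plan(statement):
    """Plan: hand the authorized statement to the execution planner."""
    return f"PLAN scan {statement['table']}"


def run_query(sql, user, grants):
    statement = build(parse(sql))
    check(statement, user, grants)
    return plan(statement)


GRANTS = {"analyst1": {("SELECT", "manager1_table")}}
print(run_query("select count(*) from manager1_table;", "analyst1", GRANTS))
```

Calling `run_query` with the same statement as `jranalyst1` raises `AuthorizationError` before `plan` is ever invoked.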
