Transcript
Page 1: TriHUG 2/14: Apache Sentry

Deploying enterprise grade security for Hadoop
Brock Noland | Software Engineer, Cloudera
February 27, 2014

Page 2: TriHUG 2/14: Apache Sentry

Outline

• Introduction
• Hadoop security primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo

Page 3: TriHUG 2/14: Apache Sentry

Introduction

Tonight's focus is SQL-on-Hadoop
• Vast majority of Hadoop users use Hive or Cloudera Impala
• Data warehouse offload is the most common use case
• Data warehouse offload is a two step process
  1. Automatic transformations moved to Hadoop
  2. Data analysts given query access

Page 4: TriHUG 2/14: Apache Sentry

Data warehouse use case

[Diagram: Online Database, Data Warehouse, Hadoop]

Page 5: TriHUG 2/14: Apache Sentry

Outline

• Introduction
• Hadoop Security Primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo

Page 6: TriHUG 2/14: Apache Sentry

Authentication

• Authentication is who you are
• Hadoop models
  • Default - “trusted network”
  • Strong - Kerberos

Page 7: TriHUG 2/14: Apache Sentry

Default Authentication – trusted network

• Default security mechanism
• Hadoop client uses local username
• Used in
  • POCs
  • Startups
  • Demos
  • Pre-prod environments

Page 8: TriHUG 2/14: Apache Sentry

Default Authentication – trusted network

Client Host:
$ whoami
brock
$ cat a.txt
some data
$ hadoop fs -put a.txt .

Hadoop:
User: brock
File: a.txt
Contents: some data
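The trusted-network behaviour on this slide can be sketched as a toy server that records whatever username the client sends. This is an illustrative model only, not Hadoop code; `ToyNameNode` and `put` are hypothetical names introduced for the example.

```python
class ToyNameNode:
    """Toy stand-in for a file system service that trusts the client-supplied username."""

    def __init__(self):
        self.files = {}  # path -> (owner, contents)

    def put(self, claimed_user, path, contents):
        # No credential check: the claimed username is accepted as-is.
        self.files[path] = (claimed_user, contents)
        return self.files[path]


namenode = ToyNameNode()

# An honest client sends its real local username.
honest = namenode.put("brock", "a.txt", "some data")

# Any other client can send the same name, and the server cannot tell the difference.
spoofed = namenode.put("brock", "b.txt", "not written by brock")
```

Because the identity is only a string chosen on the client side, this mode fits the POC, demo, and pre-prod settings listed earlier, where every host on the network is trusted.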

Page 9: TriHUG 2/14: Apache Sentry

Strong Authentication – Kerberos

• Hadoop is secured with Kerberos
  • Provides mutual authentication
  • Protects against eavesdropping and replay attacks
• Every user and service has a Kerberos “principal”
  • Service: impala/[email protected]
  • User: [email protected]
• Credentials
  • Service: keytabs
  • User: password

Page 10: TriHUG 2/14: Apache Sentry

Strong Authentication – Kerberos

Client Host:
$ whoami
brock
$ kinit
Password: *******
$ cat a.txt
some data
$ hadoop fs -put a.txt .

Hadoop:
<kerberos ticket>
<encrypted data> *

* RPC Encryption must be enabled

Page 11: TriHUG 2/14: Apache Sentry

Strong Authentication – Kerberos

• Keytab
  • Encrypted key for servers (similar to a “password”)
  • Generated by a server such as MIT Kerberos or Active Directory

Page 12: TriHUG 2/14: Apache Sentry

Strong Authentication – Kerberos

• Impersonation
  • Services such as Hive Server 2 impersonate users
  • Data loaded by “joe” via HS2 is owned by “joe”
  • Oozie jobs submitted by “brock” are run as “brock”

Page 13: TriHUG 2/14: Apache Sentry

Hive Server 2 and Oozie

[Diagram labels: Hadoop; Hive Server 2 (HS2), Oozie; Beeline (Hive CLI), Tableau, JDBC, Oozie CLI, Control-M]

Page 14: TriHUG 2/14: Apache Sentry

Authorization

• HDFS permissions
  • Unix style
  • Read/Write/Execute for Owner/Group/Other
  • Coarse grained
• Other Hadoop components have authorization
  • MapReduce: who can use which job queues
  • HBase: table ACLs

Page 15: TriHUG 2/14: Apache Sentry

HDFS Permissions

$ hadoop fs -ls file
-rw-r----- 1 analyst1 analysts 2244 2014-01-19 12:15 file

• Permissions
  • Unix style permissions
  • Read/Write/Execute
  • Owner/Group/Other
• Owner
  • One and only one owner
• Group
  • One and only one group
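The owner/group/other check described on this slide can be sketched as follows. This is a simplified model under stated assumptions: `can_access` is a hypothetical helper, it ignores the superuser and the sticky bit (`t`/`T`), and it only looks at the file itself, not at parent directories.

```python
def can_access(mode, owner, group, user, user_groups, action):
    """Return True if `user` may perform `action` ('r', 'w' or 'x').

    `mode` is the nine-character permission string, e.g. 'rw-r-----'.
    Exactly one class applies: owner, else group, else other.
    """
    if user == owner:
        bits = mode[0:3]
    elif group in user_groups:
        bits = mode[3:6]
    else:
        bits = mode[6:9]
    return action in bits


# Values taken from the listing on this slide: -rw-r----- analyst1 analysts
MODE, OWNER, GROUP = "rw-r-----", "analyst1", "analysts"

owner_can_write = can_access(MODE, OWNER, GROUP, "analyst1", ["analyst1"], "w")
group_can_read = can_access(MODE, OWNER, GROUP, "analyst2", ["analysts"], "r")
other_can_read = can_access(MODE, OWNER, GROUP, "jranalyst1", ["jranalysts"], "r")
```

With a single owner and a single group per file, sharing with a second set of users means either creating a new group or opening the "other" bits, which is the limitation the later slides walk through.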

Page 16: TriHUG 2/14: Apache Sentry

Back to our use case

• Scenario facts
  • ETL offload is a success
  • Data warehouse is expensive and at capacity
  • Same data is in Hadoop
• Next step
  • End users start using Hadoop to augment the DW
  • Security becomes primary concern

Page 17: TriHUG 2/14: Apache Sentry

End users need to share data

• Unlike automated ETL jobs, end users want to share data with peers
• Must manage HDFS permissions manually
  • Each file has a single group
  • End result is users set permissions to world readable/writeable

Page 18: TriHUG 2/14: Apache Sentry

Outline

• Introduction
• Hadoop Security Primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo

Page 19: TriHUG 2/14: Apache Sentry

Hive: Security holes

CREATE TEMPORARY FUNCTION custom_udf AS 'com.mycompany.MaliciousClass';

SELECT TRANSFORM(stuff) USING 'malicious-script.pl' AS thing1, thing;

CREATE EXTERNAL TABLE external_table(column1 string) LOCATION '/path/to/any/table';

Page 20: TriHUG 2/14: Apache Sentry

Hive: Security holes

CREATE TABLE test (c1 string) ROW FORMAT SERDE 'com.mycompany.MaliciousClass';

FROM (
  FROM t1
  MAP t1.c1 USING 'malicious-script1.pl'
  CLUSTER BY key) map_output
INSERT OVERWRITE TABLE t2
  REDUCE t2.c1 USING 'malicious-script2.pl' AS c2;

Page 21: TriHUG 2/14: Apache Sentry

Default: Authorization

• Hive ships with an “advisory” authorization system
  • All users see all databases/tables/columns
  • Does not fix any security holes
  • Users grant themselves permissions

Page 22: TriHUG 2/14: Apache Sentry

Outline

• Introduction
• Hadoop Security Primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo

Page 23: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

The user “manager1” wants to share the table “manager1_table” with senior analysts but not junior analysts.

# hadoop fs -ls -R /user/hive/warehouse
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T - manager1 manager1 0 manager1_table

Page 24: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

IT must create a group

# groupadd senioranalysts

Then add the appropriate members to the group

# usermod -G analyst,senioranalysts analyst1
# usermod -G management,analyst,senioranalysts manager1

Page 25: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

Then “manager1” can manually change the file permissions

$ hadoop fs -chgrp -R senioranalysts …/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T - manager1 senioranalysts 0 manager1_table

Page 26: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

Now any senior-level analyst can query the data

$ whoami
analyst1
$ beeline
...
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;
+------------+
| count(*)   |
+------------+
| 47         |
+------------+

Page 27: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

Junior analysts cannot query the data:

$ whoami
jranalyst1
$ beeline
....
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select * from manager1_table;
Error: java.io.IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=jranalyst1, access=READ_EXECUTE, inode="/user/hive/warehouse/manager1_table":manager1:senioranalysts:drwxr-x--T

Page 28: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

What happens in the real world?

Page 29: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

Table “manager1_table” is owned by user/group “manager1”

$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T - manager1 manager1 0 manager1_table

Page 30: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Sharing data

User “manager1” makes “manager1_table” world readable/writable

$ hadoop fs -chmod -R 777 /user/hive/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxrwxrwt - manager1 manager1 0 manager1_table

Page 31: TriHUG 2/14: Apache Sentry

Kerberos with impersonation: Summary

• Securing Hive with Kerberos and impersonation makes Hive unusable for DW offload
  • Manual file permission management
  • End state is world writable/readable
  • No ability to restrict access to columns or rows
  • All users see all databases/tables/columns

Page 32: TriHUG 2/14: Apache Sentry

Outline

• Introduction
• Hadoop Security Primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo

Page 33: TriHUG 2/14: Apache Sentry

Fine Grained Security: Apache Sentry

Authorization module for Hive, Search, & Impala

Unlocks Key RBAC Requirements
  Secure, fine-grained, role-based authorization
  Multi-tenant administration

Open Source
  Apache Incubator project

Ecosystem Support
  Apache SOLR, HiveServer2, & Impala 1.1+

Page 34: TriHUG 2/14: Apache Sentry

Key Benefits of Sentry

• Store Sensitive Data in Hadoop
• Extend Hadoop to More Users
• Comply with Regulations

Page 35: TriHUG 2/14: Apache Sentry

Key Capabilities of Sentry

Fine-Grained Authorization
  Specify security for SERVERS, DATABASES, TABLES & VIEWS

Role-Based Authorization
  SELECT privilege on views & tables
  INSERT privilege on tables
  ALL privilege on the server, databases, tables & views
  ALL privilege is needed to create/modify schema

Multi-Tenant Administration
  Separate policies for each database/schema
  Can be maintained by separate admins
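The role-based model on this slide (groups hold roles, roles hold privileges scoped to a server, database, or table) can be sketched roughly as below. The privilege strings are loosely modeled on the `server=...->db=...->table=...->action=...` style of Sentry's file-based policies, but the parsing and matching rules are simplified assumptions, not Sentry's implementation, and every group, role, and server name is hypothetical.

```python
# Hypothetical policy for illustration: groups map to roles, roles to privileges.
GROUP_ROLES = {
    "senioranalysts": ["senior_analyst_role"],
    "admins": ["admin_role"],
}
ROLE_PRIVILEGES = {
    "senior_analyst_role": [
        "server=server1->db=default->table=manager1_table->action=select",
    ],
    "admin_role": ["server=server1"],  # no narrower scope: treated as ALL on the server
}

SCOPE_KEYS = ("server", "db", "table")


def parse(privilege):
    """Turn 'server=s->db=d->action=a' into {'server': 's', 'db': 'd', 'action': 'a'}."""
    return dict(part.split("=", 1) for part in privilege.split("->"))


def implies(granted, requested):
    """True if the granted privilege covers the requested object and action."""
    for key in SCOPE_KEYS:
        if key not in granted:
            break  # grant stops at a wider scope, so it covers everything below
        if granted[key] != "*" and granted[key] != requested.get(key):
            return False
    action = granted.get("action", "all")
    return action in ("all", "*") or action == requested["action"]


def is_authorized(user_groups, requested):
    """Check a request against every privilege reachable through the user's groups."""
    return any(
        implies(parse(privilege), requested)
        for group in user_groups
        for role in GROUP_ROLES.get(group, [])
        for privilege in ROLE_PRIVILEGES.get(role, [])
    )


SELECT_MANAGER_TABLE = {
    "server": "server1", "db": "default",
    "table": "manager1_table", "action": "select",
}
```

In this sketch a senior analyst can read `manager1_table` without any change to file groups or modes, while a user whose groups have no matching role is denied, which is the contrast with the manual `chgrp`/`chmod` workflow shown earlier.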

Page 36: TriHUG 2/14: Apache Sentry

Sentry Architecture

[Diagram labels: Impala, HiveServer2, SOLR (Search); Binding Layer (Impala, Hive, Search, Pig, …); Authorization Provider; Policy Engine; Policy Provider (File, Database); Local FS/HDFS]

Page 37: TriHUG 2/14: Apache Sentry

Query Execution Flow

[Diagram: SQL → Parse → Build → Check → Plan → Query / MR]

• Parse: Validate SQL grammar
• Build: Construct statement tree
• Check (Sentry): Validate statement objects
  • First check: Authorization
• Plan: Forward to execution planner
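The ordering of these stages can be sketched as a small pipeline in which the authorization check runs before anything reaches the planner. This is a toy model, not Hive or Impala code: the grammar accepts only `SELECT ... FROM <table>`, and `ALLOWED_SELECT` and the stage functions are hypothetical names.

```python
import re

# Hypothetical grants for illustration: user -> tables the user may SELECT from.
ALLOWED_SELECT = {"analyst1": {"manager1_table"}}


def parse(sql):
    """Validate grammar (toy: only 'SELECT <cols> FROM <table>')."""
    match = re.fullmatch(
        r"\s*select\s+(.+?)\s+from\s+(\w+)\s*;?\s*", sql, re.IGNORECASE
    )
    if match is None:
        raise SyntaxError(f"unsupported statement: {sql!r}")
    return match


def build(match):
    """Construct a minimal statement tree."""
    return {"op": "select", "columns": match.group(1), "table": match.group(2)}


def check(user, tree):
    """Authorization is the first validation applied to the statement objects."""
    if tree["table"] not in ALLOWED_SELECT.get(user, set()):
        raise PermissionError(f"{user} may not select from {tree['table']}")
    return tree


def plan(tree):
    """Stand-in for handing the statement to the execution planner."""
    return f"SCAN {tree['table']}"


def run(user, sql):
    # Stages run in order; a failed check means plan() is never reached.
    return plan(check(user, build(parse(sql))))
```

Placing the check ahead of planning means an unauthorized statement is rejected before any execution work is scheduled.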

Page 38: TriHUG 2/14: Apache Sentry

Outline

• Introduction
• Hadoop Security Primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo
