
Big Data Security with Hadoop


Big Data Security
Joey Echeverria | Principal Solutions Architect [email protected] | @fwiffo

©2013 Cloudera, Inc.

EARLY DAYS

Hadoop File Permissions

• Added in HADOOP-1298
• Hadoop 0.16
• Early 2008
• Authorization without authentication
• POSIX-like RWX bits
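The combination of POSIX-like RWX bits and missing authentication can be illustrated with a minimal sketch. It is not HDFS code: the `FileStatus` class, the `is_allowed` function, and the sample users are hypothetical, and details such as the superuser are left out. The point it shows is that the permission check itself is sound, but the caller's identity is simply taken on trust.

```python
from dataclasses import dataclass

# Permission bits, as in POSIX: read = 4, write = 2, execute = 1.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical stand-in for a file's ownership and mode."""
    owner: str
    group: str
    mode: int  # e.g. 0o640


def is_allowed(status, user, groups, wanted):
    """Return True if `user` (as claimed by the caller) may do `wanted`."""
    if user == status.owner:
        bits = (status.mode >> 6) & 0o7   # owner bits
    elif status.group in groups:
        bits = (status.mode >> 3) & 0o7   # group bits
    else:
        bits = status.mode & 0o7          # "other" bits
    return (bits & wanted) == wanted


report = FileStatus(owner="alice", group="finance", mode=0o640)

print(is_allowed(report, "bob", {"finance"}, READ))   # group member may read
print(is_allowed(report, "bob", {"finance"}, WRITE))  # but may not write
# Without authentication nothing stops a client from claiming to be alice:
print(is_allowed(report, "alice", set(), WRITE))
```

The last call is the weakness the slide refers to: authorization is enforced, but against an identity that nobody verified.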


MapReduce ACLs

• Added in HADOOP-3698
• Hadoop 0.19
• Late 2008
• ACLs per job queue
• Set a list of allowed users or groups per operation
  • Job submission
  • Job administration
• No authentication
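A per-queue, per-operation ACL can be sketched roughly as follows. The queue name, the users and groups, and the `QueueAcl` and `check_queue_access` names are all made up for illustration and are not Hadoop's configuration keys or classes.

```python
from enum import Enum


class QueueOp(Enum):
    """The two operations the slide lists."""
    SUBMIT_JOB = "submit-job"
    ADMINISTER_JOBS = "administer-jobs"


class QueueAcl:
    """A list of allowed users and groups (hypothetical helper)."""

    def __init__(self, users=(), groups=()):
        self.users = frozenset(users)
        self.groups = frozenset(groups)

    def permits(self, user, user_groups):
        return user in self.users or bool(self.groups & set(user_groups))


# One ACL per (queue, operation) pair; the entries are illustrative only.
QUEUE_ACLS = {
    ("etl", QueueOp.SUBMIT_JOB): QueueAcl(users=["alice"], groups=["etl-devs"]),
    ("etl", QueueOp.ADMINISTER_JOBS): QueueAcl(groups=["ops"]),
}


def check_queue_access(queue, op, user, user_groups):
    """Deny by default when no ACL is defined for the queue and operation."""
    acl = QUEUE_ACLS.get((queue, op))
    return acl is not None and acl.permits(user, user_groups)


print(check_queue_access("etl", QueueOp.SUBMIT_JOB, "bob", ["etl-devs"]))
print(check_queue_access("etl", QueueOp.ADMINISTER_JOBS, "bob", ["etl-devs"]))
```

As with file permissions at this stage, the user and group names reaching such a check were not authenticated.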


Securing a Cluster Through a Gateway

• Hadoop cluster runs on a private network
• Gateway server dual-homed (Hadoop network and public network)
• Users SSH onto gateway
• Optionally can create an SSH proxy for jobs to be submitted from the client machine
• Provides minimum level of protection

WHY SECURITY MATTERS

Prevent Accidental Access

• Don’t let users shoot themselves in the foot
• Main driver for early features
• Not security per se, but a critical first step
• Doesn’t require strong authentication


Stop Malicious Users

• Early features were necessary, but not sufficient
• Security has to get real
• Hadoop runs arbitrary code
• Implicit trust doesn’t prevent the insider threat


Co-mingle All Your Data

• Often overlooked
• Big data means getting rid of stovepipes
• Scalability and flexibility are only 50% of the problem
• Trust your data in a multi-tenant environment
• Most critical driver

AN EVOLVING STORY

Authorization

• Files
• MapReduce/YARN job queues
• Service-level authorization
  • Whitelists and blacklists of hosts and users
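One way to picture whitelists and blacklists of hosts and users is the sketch below. It assumes that a blacklist entry always wins and that `*` means "anyone"; those rules, the `ServiceAcl` class, and the sample names are illustrative assumptions rather than Hadoop's actual semantics or configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAcl:
    """Hypothetical ACL guarding one service protocol."""
    allowed_users: frozenset = frozenset({"*"})
    blocked_users: frozenset = frozenset()
    allowed_hosts: frozenset = frozenset({"*"})
    blocked_hosts: frozenset = frozenset()

    def permits(self, user, host):
        # Assumption: a blacklist match overrides any whitelist match.
        if user in self.blocked_users or host in self.blocked_hosts:
            return False
        user_ok = "*" in self.allowed_users or user in self.allowed_users
        host_ok = "*" in self.allowed_hosts or host in self.allowed_hosts
        return user_ok and host_ok


# Illustrative policy: only alice and bob, only from the gateway, bob suspended.
client_acl = ServiceAcl(
    allowed_users=frozenset({"alice", "bob"}),
    blocked_users=frozenset({"bob"}),
    allowed_hosts=frozenset({"gateway01"}),
)

print(client_acl.permits("alice", "gateway01"))
print(client_acl.permits("bob", "gateway01"))
print(client_acl.permits("alice", "laptop42"))
```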


Authentication

• HADOOP-4487
• Hadoop 0.22 and 0.20.205
• Late 2010
• Based on Kerberos and internal delegation tokens
• Provides strong user authentication
• Also used for service-to-service authentication
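The slide does not describe how delegation tokens are built, so the following is only a general sketch of the idea: after a user authenticates once (for example with Kerberos), a service hands back a token that it can later verify by itself using a secret key. The `TokenIssuer` class, the identifier layout, and the choice of HMAC-SHA256 are assumptions for illustration, not Hadoop's token format.

```python
import hashlib
import hmac
import os


class TokenIssuer:
    """Hypothetical service that issues and verifies delegation tokens."""

    def __init__(self):
        # Secret known only to the issuing service.
        self._master_key = os.urandom(32)

    def _sign(self, identifier):
        return hmac.new(self._master_key, identifier, hashlib.sha256).digest()

    def issue(self, owner, renewer, max_date):
        """Called after the owner has authenticated by some strong means."""
        identifier = f"{owner}|{renewer}|{max_date}".encode()
        return identifier, self._sign(identifier)

    def verify(self, identifier, password, now):
        """Return the token owner, or None if forged or expired."""
        if not hmac.compare_digest(self._sign(identifier), password):
            return None
        owner, _renewer, max_date = identifier.decode().split("|")
        return owner if now <= int(max_date) else None


issuer = TokenIssuer()
ident, password = issuer.issue("joe", "jobtracker", max_date=1000)

print(issuer.verify(ident, password, now=500))                     # joe
print(issuer.verify(ident, password, now=2000))                    # None
print(issuer.verify(b"root|jobtracker|1000", password, now=500))   # None
```

A task running on the user's behalf can present such a token instead of the user's long-lived credentials, which is the role the `delg(joe)` label plays in the dataflow figure.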

 


2.2 High Level Use Cases

1. Applications accessing files on HDFS clusters. Non-MapReduce applications, including hadoop fs, access files stored on one or more HDFS clusters. An application should only be able to access the files and services it is authorized to access. See figure 1. Variations:

(a) Access HDFS directly using HDFS protocol.
(b) Access HDFS indirectly through HDFS proxy servers via the HFTP FileSystem or HTTP get.

[Figure 1: HDFS High-level Dataflow. Nodes: Application, MapReduce Task, Name Node, Data Node. Edge labels: kerb(joe), kerb(hdfs), delg(joe), block token.]

2. Applications accessing third-party (non-Hadoop) services. Non-MapReduce applications and MapReduce tasks access files or operations supported by third-party services. An application should only be able to access the services it is authorized to access. Examples of third-party services:

(a) Access NFS files
(b) Access ZooKeeper

3. User submitting jobs to MapReduce clusters. A user submits jobs to one or more MapReduce clusters. Jobs can only be submitted to queues the user is authorized to use. The user can disconnect after job submission and may re-connect to get job status. Jobs may need to access files stored on HDFS clusters as the user, as described in case 1. The user needs to specify the list of HDFS clusters for a job at job submission. Jobs should be able to access only those HDFS files or third-party services authorized for the submitting user. See figure 2. Variations:

(a) Job is submitted via JobClient protocol
(b) Job is submitted via Web Services protocol (Phase 2)

Encryption

• Over the wire encryption for some socket connections
  • RPC encryption added soon after Kerberos
  • Shuffle encryption (HTTPS) added in Hadoop 2.0.2-alpha, back ported to CDH4 MR1
  • HDFS block streamer encryption added in Hadoop 2.0.2-alpha
• Volume-level encryption for data at rest

SECURITY FOR KEY VALUE STORES

Apache Accumulo

• Robust, scalable, high performance data storage and retrieval system
• Built by NSA, now an Apache project
• Based on Google’s BigTable
• Built on top of HDFS, ZooKeeper and Thrift
• Iterators for server-side extensions
• Cell labels for flexible security models


Data Model

• Multi-dimensional, persistent, sorted map
• Key/Value store with a twist
• A single primary key (Row ID)
• Secondary key (Column) internal to a row
  • Family
  • Qualifier
• Per-cell timestamp
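The data model above can be approximated in memory as a map kept sorted by (row ID, column family, column qualifier, timestamp). The sketch below is an illustration only: the `SortedTable` class and the sample rows are hypothetical, persistence is ignored, and newer timestamps are ordered first by storing the timestamp negated.

```python
import bisect


class SortedTable:
    """Toy sorted map keyed by (row, family, qualifier, timestamp)."""

    def __init__(self):
        self._keys = []     # kept sorted at all times
        self._values = {}   # key -> value

    def put(self, row, family, qualifier, timestamp, value):
        # Negating the timestamp makes newer versions sort first.
        key = (row, family, qualifier, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_row(self, row):
        """Yield ((row, family, qualifier, timestamp), value) in key order."""
        start = bisect.bisect_left(self._keys, (row,))
        for key in self._keys[start:]:
            if key[0] != row:
                break
            r, family, qualifier, neg_ts = key
            yield (r, family, qualifier, -neg_ts), self._values[key]


table = SortedTable()
table.put("joe", "contact", "phone", 1, "555-0100")
table.put("joe", "contact", "phone", 2, "555-0199")
table.put("joe", "contact", "email", 1, "joe@example.com")
table.put("amy", "contact", "email", 1, "amy@example.com")

for key, value in table.scan_row("joe"):
    print(key, value)
```

Because all cells of a row are adjacent in the sorted order, reading one row is a single contiguous scan, and the per-cell timestamp lets several versions of the same column coexist.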


Cell-Level Security

• Labels stored per cell
• Labels consist of Boolean expressions (AND, OR, nesting)
• Labels associated with each user
• Cell labels checked against user’s labels with a built-in iterator
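Checking a cell label against a user's labels amounts to evaluating a small Boolean expression. The sketch below writes AND as `&` and OR as `|` with parentheses for nesting, and requires parentheses when the two operators are mixed. The tokenizer, the `is_visible` function, and the sample labels are written for this illustration and are not Accumulo's implementation.

```python
import re

_TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(label):
    """Split a label such as 'admin|(finance&audit)' into tokens."""
    label = label.strip()
    tokens, pos = [], 0
    while pos < len(label):
        match = _TOKEN.match(label, pos)
        if not match:
            raise ValueError(f"bad label near {label[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(label, authorizations):
    """Return True if a user holding `authorizations` may see the cell."""
    tokens = tokenize(label)
    if not tokens:
        return True  # assumption: an unlabeled cell is visible to everyone
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r}")
        return token in authorizations

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


print(is_visible("admin|(finance&audit)", {"finance", "audit"}))
print(is_visible("admin|(finance&audit)", {"finance"}))
```

A server-side filter would apply such a check to every cell during a scan and silently drop the cells that fail it.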


Pluggable Authentication

• Currently supports username/password authentication backed by ZooKeeper
• ACCUMULO-259
• Targeted for Accumulo 1.5.0
• Authentication info replaced with generic tokens
• Supports multiple implementations (e.g. Kerberos)


Application Level

• Accumulo often paired with application level authentication/authorization
• Accumulo users created per application
• Each application granted access level of most permitted user
• Application authenticates users, grabs user authorizations, passes user labels with requests
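The application-level pattern can be sketched as follows: the application's own account carries the broadest set of labels, and for each request the application narrows that set to the labels of the end user it has authenticated. The label sets, the user directory, and the function name are invented for this example, and the authentication step is reduced to a flag.

```python
# Labels granted to the application's own (hypothetical) Accumulo account:
# at least as broad as its most permitted user.
APP_AUTHORIZATIONS = frozenset({"finance", "audit", "hr"})

# Stand-in for whatever directory the application consults for its users.
USER_DIRECTORY = {
    "joe": frozenset({"finance"}),
    "amy": frozenset({"hr", "audit"}),
    "eve": frozenset({"finance", "legal"}),  # "legal" exceeds the app's grant
}


def authorizations_for_request(user, authenticated):
    """Labels the application should pass along with this user's request."""
    if not authenticated:
        raise PermissionError(f"{user} has not been authenticated")
    user_labels = USER_DIRECTORY.get(user, frozenset())
    # Never pass more than the application account itself was granted.
    return user_labels & APP_AUTHORIZATIONS


print(sorted(authorizations_for_request("amy", authenticated=True)))
print(sorted(authorizations_for_request("eve", authenticated=True)))
```

The trade-off in this design is that the data store trusts the application to pass the right labels, so the application becomes part of the security boundary.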


Apache HBase

• Also based on Google’s BigTable
• Started as a Hadoop contrib project
• Supports column-level ACLs
• Kerberos for authentication
• Discussion and early prototypes of cell-level security ongoing

FUTURE

Encryption for Data at Rest

• Need multiple levels of granularity
• Encryption keys tied to authorization labels (like Accumulo labels or HBase ACLs)
• APIs for file-level, block-level, or record-level encryption


Hive Security

• Column-level ACLs
• Kerberos authentication
• AccessServer
