Big Data Security
Joey Echeverria | Principal Solutions Architect
[email protected] | @fwiffo
©2013 Cloudera, Inc.

EARLY DAYS

Hadoop File Permissions

• Added in HADOOP-1298
• Hadoop 0.16
• Early 2008

• Authorization without authentication
• POSIX-like RWX bits
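
The POSIX-like check can be illustrated with a minimal sketch. This is not HDFS code: `FileStatus`, `is_allowed`, and the sample file are hypothetical names used only for illustration, and the user name is taken on trust, which is what "authorization without authentication" means in practice.

```python
from dataclasses import dataclass

# Permission bits within one owner/group/other triplet.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical stand-in for a file's ownership and mode."""
    owner: str
    group: str
    mode: int  # octal mode, e.g. 0o640


def is_allowed(status, user, groups, wanted):
    """Return True if `user` may perform `wanted` (a bitmask of READ/WRITE/EXECUTE).

    Exactly one triplet applies: owner, else group, else other.
    Nothing here verifies that the caller really is `user`.
    """
    if user == status.owner:
        bits = (status.mode >> 6) & 7
    elif status.group in groups:
        bits = (status.mode >> 3) & 7
    else:
        bits = status.mode & 7
    return (bits & wanted) == wanted


report = FileStatus(owner="alice", group="analysts", mode=0o640)
```

With mode 640, `alice` can read and write, members of `analysts` can only read, and everyone else is denied. Any client that claims to be `alice` gets the owner's access.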

MapReduce ACLs

• Added in HADOOP-3698
• Hadoop 0.19
• Late 2008

• ACLs per job queue
• Set a list of allowed users or groups per operation
  • Job submission
  • Job administration
• No authentication
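
A per-queue ACL lookup along these lines might look like the sketch below. The queue name, the operation names `submit` and `administer`, and the `Acl` class are illustrative assumptions, not Hadoop's configuration keys or classes.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    """Hypothetical ACL: a list of allowed users plus a list of allowed groups."""
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)

    def permits(self, user, user_groups):
        # Access is granted if the user is named or belongs to a named group.
        return user in self.users or bool(self.groups & set(user_groups))


# One ACL per (queue, operation) pair; the names are made up for this example.
QUEUE_ACLS = {
    "etl": {
        "submit": Acl(users={"alice"}, groups={"analysts"}),
        "administer": Acl(users={"alice"}),
    },
}


def check_queue_access(queue, operation, user, user_groups):
    """Return True if the claimed user may perform `operation` on `queue`."""
    acl = QUEUE_ACLS.get(queue, {}).get(operation)
    return acl is not None and acl.permits(user, user_groups)
```

As with file permissions at this stage, the check trusts whatever user name the client presents.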

Securing a Cluster Through a Gateway

• Hadoop cluster runs on a private network
• Gateway server is dual-homed (Hadoop network and public network)
• Users SSH onto the gateway
  • Optionally, create an SSH proxy so jobs can be submitted from the client machine
• Provides a minimum level of protection

WHY SECURITY MATTERS

Prevent Accidental Access

• Don’t let users shoot themselves in the foot
• Main driver for early features
• Not security per se, but a critical first step
• Doesn’t require strong authentication

Stop Malicious Users

• Early features were necessary, but not sufficient
• Security has to get real
• Hadoop runs arbitrary code
• Implicit trust doesn’t prevent the insider threat

Co-mingle All Your Data

• Often overlooked
• Big data means getting rid of stovepipes
  • Scalability and flexibility are only 50% of the problem
  • Trust your data in a multi-tenant environment
• Most critical driver

AN EVOLVING STORY

Authorization

• Files
• MapReduce/YARN job queues
• Service-level authorization
  • Whitelists and blacklists of hosts and users
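
One way to picture whitelist/blacklist evaluation is the sketch below. `ServicePolicy` and its fields are hypothetical, and the rule that a blacklist entry always overrides a whitelist entry is an assumption of this example rather than a statement about Hadoop's behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ServicePolicy:
    """Hypothetical policy for one service: allowed and blocked users and hosts."""
    allowed_users: set = field(default_factory=set)
    blocked_users: set = field(default_factory=set)
    allowed_hosts: set = field(default_factory=set)
    blocked_hosts: set = field(default_factory=set)

    def permits(self, user, host):
        # Assumption: a blacklist match denies access even if a whitelist matches.
        if user in self.blocked_users or host in self.blocked_hosts:
            return False
        return user in self.allowed_users and host in self.allowed_hosts


policy = ServicePolicy(
    allowed_users={"alice", "bob"},
    blocked_users={"bob"},
    allowed_hosts={"gateway01"},
)
```

Here `alice` connecting from `gateway01` is admitted, `bob` is rejected because he is blacklisted, and any connection from an unlisted host is rejected.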

Authentication

• HADOOP-4487
• Hadoop 0.22 and 0.20.205
• Late 2010

• Based on Kerberos and internal delegation tokens
• Provides strong user authentication
• Also used for service-to-service authentication
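
The general idea behind a delegation token, a credential a service can verify later without another Kerberos exchange, can be sketched with a generic HMAC-signed token. This is not Hadoop's token format: the identifier layout, the use of SHA-256, and the names `issue_token` and `verify_token` are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

# Hypothetical master secret, known only to the issuing service.
MASTER_KEY = secrets.token_bytes(32)


def issue_token(owner, renewer, expires_at, key=MASTER_KEY):
    """Issue a token after the owner has authenticated by some strong means."""
    identifier = f"{owner}|{renewer}|{expires_at}".encode()
    # The "password" is a keyed hash of the identifier; only the key holder can mint it.
    password = hmac.new(key, identifier, hashlib.sha256).digest()
    return identifier, password


def verify_token(identifier, password, now, key=MASTER_KEY):
    """Recompute the keyed hash and check expiry; no Kerberos round trip needed."""
    expected = hmac.new(key, identifier, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, password):
        return False
    expires_at = int(identifier.decode().rsplit("|", 1)[1])
    return now < expires_at
```

A task holding the identifier and password can authenticate on the user's behalf until the token expires. A tampered identifier fails verification because the keyed hash no longer matches.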

Encryption

• Over-the-wire encryption for some socket connections
  • RPC encryption added soon after Kerberos
  • Shuffle encryption (HTTPS) added in Hadoop 2.0.2-alpha, backported to CDH4 MR1
  • HDFS block streamer encryption added in Hadoop 2.0.2-alpha
• Volume-level encryption for data at rest

SECURITY FOR KEY VALUE STORES

Apache Accumulo

• Robust, scalable, high-performance data storage and retrieval system
• Built by NSA, now an Apache project
• Based on Google’s BigTable
• Built on top of HDFS, ZooKeeper and Thrift
• Iterators for server-side extensions
• Cell labels for flexible security models

Data Model

• Multi-dimensional, persistent, sorted map
• Key/Value store with a twist
• A single primary key (Row ID)
• Secondary key (Column) internal to a row
  • Family
  • Qualifier
• Per-cell timestamp
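
The key structure above can be mimicked with a small in-memory sorted map. This is only a sketch of the data model, not Accumulo's API: `Key`, `SortedTable`, and `scan_row` are hypothetical names, and the visibility label that real keys also carry is left out. Timestamps sort newest first within a column, as they do in Accumulo.

```python
import bisect
from typing import NamedTuple


class Key(NamedTuple):
    row: str        # primary key (Row ID)
    family: str     # column family
    qualifier: str  # column qualifier
    timestamp: int  # per-cell timestamp


def sort_key(key):
    # Negate the timestamp so the newest version of a cell sorts first.
    return (key.row, key.family, key.qualifier, -key.timestamp)


class SortedTable:
    """Hypothetical in-memory table kept in key order."""

    def __init__(self):
        self._keys = []   # sort keys, always in ascending order
        self._cells = []  # parallel list of (Key, value) pairs

    def put(self, key, value):
        sk = sort_key(key)
        i = bisect.bisect_left(self._keys, sk)
        self._keys.insert(i, sk)
        self._cells.insert(i, (key, value))

    def scan_row(self, row):
        """Yield every cell in `row`, in sorted order."""
        start = bisect.bisect_left(self._keys, (row,))
        for sk, cell in zip(self._keys[start:], self._cells[start:]):
            if sk[0] != row:
                break
            yield cell
```

Because the map is sorted, all cells of one row are contiguous, so a row lookup is a binary search followed by a short sequential read.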

Cell-Level Security

• Labels stored per cell
• Labels consist of Boolean expressions (AND, OR, nesting)
• Labels associated with each user
• Cell labels checked against user’s labels with a built-in iterator
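
To make the label check concrete, here is a minimal standalone evaluator. It is not Accumulo's built-in iterator. It assumes `&` for AND, `|` for OR, parentheses for nesting, and that mixing `&` and `|` at one level without parentheses is rejected. `is_visible` is a hypothetical name.

```python
import re

# A token is a parenthesis, an operator, or an alphanumeric label term.
TOKEN = re.compile(r"\s*([()&|]|[A-Za-z0-9_]+)")


def tokenize(label):
    label = label.strip()
    tokens, pos = [], 0
    while pos < len(label):
        match = TOKEN.match(label, pos)
        if not match:
            raise ValueError(f"bad label at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(label, authorizations):
    """Return True if a user holding `authorizations` satisfies the cell's `label`."""
    tokens = tokenize(label)
    if not tokens:
        return True  # assumption: an unlabeled cell is visible to everyone
    pos = 0

    def parse_expr():
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in (")", "&", "|"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in authorizations  # a bare term is true if the user holds it

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

For example, a cell labeled `admin|(audit&finance)` is returned to a user holding both `audit` and `finance`, but filtered out for a user holding only `audit`.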

Pluggable Authentication

• Currently supports username/password authentication backed by ZooKeeper
• ACCUMULO-259
• Targeted for Accumulo 1.5.0
• Authentication info replaced with generic tokens
• Supports multiple implementations (e.g. Kerberos)

Application Level

• Accumulo often paired with application-level authentication/authorization
  • Accumulo users created per application
  • Each application granted access level of most permitted user
  • Application authenticates users, grabs user authorizations, passes user labels with requests
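
The application-level pattern can be sketched as follows. Everything here is hypothetical: the application's authorization set, the in-memory user directory, and `authorizations_for_request` stand in for whatever account setup and identity store a real deployment uses.

```python
# Hypothetical: labels granted to the application's own Accumulo account,
# sized to cover its most permitted user.
APP_AUTHORIZATIONS = frozenset({"public", "finance", "audit"})

# Hypothetical stand-in for the application's user store.
USER_DIRECTORY = {
    "alice": {"public", "finance"},
    "bob": {"public"},
}


def authorizations_for_request(username):
    """Return the labels to pass with a scan made on behalf of `username`.

    Assumes the application has already authenticated the user.
    """
    user_auths = USER_DIRECTORY.get(username)
    if user_auths is None:
        raise PermissionError(f"unknown user: {username}")
    # Never pass more than the application account itself holds.
    return frozenset(user_auths) & APP_AUTHORIZATIONS
```

The store then filters cells against the passed labels, so `bob` sees only `public` cells even though the application account could read `finance` and `audit` data.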

Apache HBase

• Also based on Google’s BigTable
• Started as a Hadoop contrib project
• Supports column-level ACLs
• Kerberos for authentication
• Discussion and early prototypes of cell-level security ongoing

FUTURE

Encryption for Data at Rest

• Need multiple levels of granularity
• Encryption keys tied to authorization labels (like Accumulo labels or HBase ACLs)
• APIs for file-level, block-level, or record-level encryption

Hive Security

• Column-level ACLs
• Kerberos authentication
• AccessServer
