25
+ Hadoop Security Landscape Sujee Maniyam Founder / Principal http://elephantscale.com/ [email protected]

Hadoop security landscape

Embed Size (px)

Citation preview

+

Hadoop Security Landscape

Sujee Maniyam

Founder / Principal

http://elephantscale.com/

[email protected]

+Approach to Security in Hadoop Until Recently…

+But Security Picture Has Improved Rapidly…

n  Lot of work going on in the eco system

n  Hadoop vendors (Cloudera / HortonWorks ..) have been very actively working on security features

n  ‘the core’ features are in

n  Ease of use improving as well

+What Does It Mean to be ‘Secure’?

n  1) Control who can get in?

n  2) Verify the person’s identity

n  3) safeguard communications with user

n  4) What is allowed for this user

n  5) Audit / log access

n  6) Secure NOSQL

n  7) And finally… Protect data at rest

+1) Who can get in

n  Control which machines can connect to NoSQL cluster

n  Don’t expose the cluster to public n  Too many open ports

n  Too vulnerable

n  Solutions: n  Run cluster behind firewall

n  Restrict which machines can connect to cluster

n  Linux / Network level security

n  Outside the actual NoSQL

+Trusted Environment

+Apache Knox Gateway

+2) User Authentication

n  How can we verify the user? n  Username / password (gmail)

n  Or use a third person (referee) n  Kerberos

Source : http://1.bp.blogspot.com/

Wolf : Knock Knock…

Wolf : It’s me, little piggy

Pig :who is it?

+Kerberos : Quick Primer

n  Kerberos is a authentication protocol for networked machines

n  Validates client to server and vice-versa

n  Strong crypto algorithms (AES, 3DES…)

+Kerberos Protocol for Getting a Beer in a Carnival / Fair J_

+Kerberos Protocol Explained : Getting Beer @ Fair / Party

n  Prove your age (identity) to wrist-band issuer n  Ticket Granting Ticket

n  Get a wristband à qualifies you to get beer n  Service Ticket

n  Go to bartender and ask for beer using your wrist-band n  Service Request

n  Get Beer ! J

n  For technically correct explanation see : http://www.roguelynn.com/words/explain-like-im-5-kerberos/

+3) Secure Client Communication

n  Guard client / server communication (‘on the wire’)

n  Done by using SASL (certificates)

n  Prevents snooping by third parties

+4) What Is Allowed For This User?

n  In unsecured environment users can read / write to any table n  à not very secure!

n  Control which data users can see..

+5) Audit logging

n  See what is going on…

USER : tim, resource = hdfs:/data/logs , type = read, time=….

USER : tim, resource = hive:click_logs , type = read, query = “select *….”

+6) Secure NOSQL

n  NoSQL solutions : n  On Hadoop : Hbase, Accumulo

n  Other : Cassandra, ……

n  Access control n  Table level access : can I read / can I write-update-insert ?

n  Within a table, column level access

n  Who can read column ‘social_security_number’ ?

+Accumulo : Quick Intro

n  Developed by the National Security Agency (NSA) !

n  Google Big Table implementation

n  Nosql store on top of HDFS

n  Security is a first grade concept

HDFS

Accumulo

+Accumulo Data Model

Family : info

Columns à name email Last 4 ssn Ssn Gmail password

Visibility tokens à

Level 1 Level 1 Level 1 Level 2 OR Top clearance

Top clearance

•  Every thing in HBase data model •  Plus each row has a ‘Visibility Token’

+Users Are Assigned ‘Visibility Tokens’

User id Visibility levels

User 1 Level 1

User 2 Level 1 + Level 2

Edward Snowden Level 1 + Level 2 + Top Clearance

+Accumulo only returns cells visible to user

family

Columns à name email Last 4 SSN Full SSN Gmail password

person1 Joe [email protected]

6789 123-45-6789

JoeSuperMan!

Visibility tokens à

Level 1 Level 1 Level 1 Level 2 OR Top clearance

Top clearance

+What Users Can See…

User Visibility Privilage Visible Cells

User 1 Level 1 Name Email Last 4 ssn

User 2 Level 1 + Level 2

Name Email Last 4 SSN Full SSN

Edward Snowden Level 1 + Level 2 + Top Clearance

Name Email Last 4 SSN Full SSN Gmail Password

+6) Final Step : Encrypt Data At Rest

n  Eventually data ends up in disk

n  We need to protect the ‘raw data’ on disk

n  To prevent n  Users going to disk directly

n  Theft of hardware

+Transparent Encryption

+OK, so where are we…

Project / Solution

Purpose Status Vendor

kerberos Identity management

Available neutral

Knox Secure gateway Hortonworks CLoudera ?

Sentry Access control incubating Cloudera

Ranger (similar to Sentry)

Access control + Audit

In development (HDP 2.2) (originally XA secure)

Hortonworks

Rhino Secure HDFS data at rest

Available from Hadoop 2.6

Neutral (originally from Intel)

Accumulo Secure nosql Available neutral

+Future….

n  Really need a unified standard (no fragmentation)

n  Ease of use n  Easy to setup policies

n  Integrate with outside systems

n  Easy audit tools

+Thanks! & Questions?

Sujee Maniyam

Founder / Principal

http://elephantscale.com/

[email protected]