+Hadoop Security Landscape
Sujee Maniyam
Founder / Principal
http://elephantscale.com/
+But Security Picture Has Improved Rapidly…
• A lot of work is going on in the ecosystem
• Hadoop vendors (Cloudera / Hortonworks ..) have been actively working on security features
• The ‘core’ features are in
• Ease of use is improving as well
+What Does It Mean to be ‘Secure’?
• 1) Control who can get in
• 2) Verify the person’s identity
• 3) Safeguard communications with the user
• 4) Decide what this user is allowed to do
• 5) Audit / log access
• 6) Secure NoSQL
• 7) And finally… protect data at rest
+1) Who can get in
• Control which machines can connect to the NoSQL cluster
• Don’t expose the cluster to the public
  • Too many open ports
  • Too vulnerable
• Solutions:
  • Run the cluster behind a firewall
  • Restrict which machines can connect to the cluster
• Linux / network-level security
• Outside the actual NoSQL
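The "restrict which machines can connect" idea can be sketched as an allowlist check. This is only an illustration: the network ranges and the `may_connect` helper are hypothetical, and in practice the rule would normally live in a firewall or security group rather than in application code.

```python
import ipaddress

# Hypothetical allowlist: only these networks may reach the cluster.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/24"),      # e.g. an internal application subnet
    ipaddress.ip_network("192.168.1.10/32"),  # e.g. a single admin workstation
]


def may_connect(client_ip: str) -> bool:
    """Return True if the client address falls inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for ip in ("10.0.0.17", "192.168.1.10", "203.0.113.5"):
        print(ip, "allowed" if may_connect(ip) else "blocked")
```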
+2) User Authentication
• How can we verify the user?
  • Username / password (Gmail)
• Or use a third party (referee)
  • Kerberos
Source : http://1.bp.blogspot.com/
Wolf : Knock knock…
Pig : Who is it?
Wolf : It’s me, little piggy
+Kerberos : Quick Primer
• Kerberos is an authentication protocol for networked machines
• Authenticates the client to the server and vice versa
• Strong crypto algorithms (AES, 3DES…)
+Kerberos Protocol Explained : Getting Beer @ Fair / Party
• Prove your age (identity) to the wristband issuer
  • Ticket Granting Ticket
• Get a wristband → qualifies you to get beer
  • Service Ticket
• Go to the bartender and ask for beer using your wristband
  • Service Request
• Get beer! :)
• For a technically correct explanation see: http://www.roguelynn.com/words/explain-like-im-5-kerberos/
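The wristband flow can be imitated in a toy sketch. It is deliberately not real Kerberos: real Kerberos never sends the password over the wire and relies on encryption and session keys, whereas this toy only signs tickets with HMAC. All names, keys and lifetimes below are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secrets, for illustration only.
USER_PASSWORDS = {"tim": b"tim-password"}
TGS_KEY = b"key-known-only-to-the-ticket-issuer"
SERVICE_KEYS = {"hdfs": b"key-shared-by-issuer-and-hdfs"}


def _sign(key: bytes, payload: str) -> str:
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def issue_tgt(user: str, password: bytes, now: int):
    """Step 1: prove identity, receive a ticket-granting ticket (1 hour)."""
    if user not in USER_PASSWORDS or not hmac.compare_digest(
        USER_PASSWORDS[user], password
    ):
        raise PermissionError("bad credentials")
    payload = f"tgt|{user}|{now + 3600}"  # assumes no '|' in user names
    return payload, _sign(TGS_KEY, payload)


def issue_service_ticket(tgt, service: str, now: int):
    """Step 2: trade a valid TGT for a short-lived service ticket (5 minutes)."""
    payload, sig = tgt
    if service not in SERVICE_KEYS or not hmac.compare_digest(
        sig, _sign(TGS_KEY, payload)
    ):
        raise PermissionError("invalid ticket-granting ticket")
    kind, user, expires = payload.split("|")
    if kind != "tgt" or now > int(expires):
        raise PermissionError("expired ticket-granting ticket")
    ticket = f"st|{user}|{service}|{now + 300}"
    return ticket, _sign(SERVICE_KEYS[service], ticket)


def service_accepts(service: str, ticket, now: int):
    """Step 3: the service checks the ticket; returns the user name or None."""
    payload, sig = ticket
    if not hmac.compare_digest(sig, _sign(SERVICE_KEYS[service], payload)):
        return None
    kind, user, svc, expires = payload.split("|")
    if kind != "st" or svc != service or now > int(expires):
        return None
    return user
```

The point of the design is that the service never sees the user's password; it only trusts tickets signed by the issuer.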
+3) Secure Client Communication
• Guard client / server communication (‘on the wire’)
• Done by using SASL, or SSL/TLS with certificates
• Prevents snooping by third parties
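One common way to guard traffic on the wire is a TLS connection that verifies the server's certificate. The sketch below uses only Python's standard `ssl` module; the `make_client_context` helper and its `ca_file` parameter are hypothetical, and how a given Hadoop or NoSQL client is actually configured will differ.

```python
import ssl


def make_client_context(ca_file=None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate.

    ca_file is an optional path to a CA bundle (hypothetical parameter);
    when omitted, the system's default trusted CAs are used.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    # Refuse old protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    ctx = make_client_context()
    print(ctx.verify_mode, ctx.check_hostname)
```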
+4) What Is Allowed For This User?
• In an unsecured environment, users can read / write any table
  • → not very secure!
• Control which data each user can see
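A minimal sketch of table-level authorization, assuming a simple in-memory grant table. The users, table names and the `is_allowed` helper are hypothetical; real deployments would delegate this to the store's own access control.

```python
# Hypothetical grants: (user, table) -> set of allowed actions.
PERMISSIONS = {
    ("tim", "click_logs"): {"read"},
    ("ann", "click_logs"): {"read", "write"},
}


def is_allowed(user: str, table: str, action: str) -> bool:
    """Deny by default: only explicitly granted actions are allowed."""
    return action in PERMISSIONS.get((user, table), set())


if __name__ == "__main__":
    print(is_allowed("tim", "click_logs", "read"))   # granted
    print(is_allowed("tim", "click_logs", "write"))  # not granted
```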
+5) Audit logging
• See what is going on…
USER : tim, resource = hdfs:/data/logs , type = read, time=….
USER : tim, resource = hive:click_logs , type = read, query = “select *….”
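Audit records like the ones above could be produced by a small formatter. This is a sketch only: the `audit_record` helper and its field layout are assumptions modelled on the slide's sample lines, not the output format of any particular tool.

```python
from datetime import datetime, timezone


def audit_record(user, resource, access_type, when=None, query=None) -> str:
    """Format one audit line: who touched which resource, how, and when."""
    when = when or datetime.now(timezone.utc)
    parts = [
        f"USER : {user}",
        f"resource = {resource}",
        f"type = {access_type}",
        f"time = {when.isoformat()}",
    ]
    if query is not None:
        parts.append(f'query = "{query}"')
    return ", ".join(parts)


if __name__ == "__main__":
    print(audit_record("tim", "hdfs:/data/logs", "read"))
```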
+6) Secure NOSQL
• NoSQL solutions:
  • On Hadoop: HBase, Accumulo
  • Other: Cassandra, ……
• Access control
  • Table-level access: can I read / can I write-update-insert?
  • Within a table, column-level access
  • Who can read column ‘social_security_number’?
+Accumulo : Quick Intro
• Developed by the National Security Agency (NSA)!
• An implementation of Google’s Bigtable design
• NoSQL store on top of HDFS
• Security is a first-class concept
[Diagram: Accumulo running on top of HDFS]
+Accumulo Data Model
Family : info
Column | Visibility token
name | Level 1
email | Level 1
Last 4 SSN | Level 1
Full SSN | Level 2 OR Top clearance
Gmail password | Top clearance
• Everything in the HBase data model
• Plus each cell has a ‘Visibility Token’
+Users Are Assigned ‘Visibility Tokens’
User id | Visibility levels
User 1 | Level 1
User 2 | Level 1 + Level 2
Edward Snowden | Level 1 + Level 2 + Top Clearance
+Accumulo only returns cells visible to user
Row : person1
Column | Value | Visibility token
name | Joe | Level 1
email | [email protected] | Level 1
Last 4 SSN | 6789 | Level 1
Full SSN | 123-45-6789 | Level 2 OR Top clearance
Gmail password | JoeSuperMan! | Top clearance
+What Users Can See…
User | Visibility privilege | Visible cells
User 1 | Level 1 | Name, Email, Last 4 SSN
User 2 | Level 1 + Level 2 | Name, Email, Last 4 SSN, Full SSN
Edward Snowden | Level 1 + Level 2 + Top Clearance | Name, Email, Last 4 SSN, Full SSN, Gmail Password
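The filtering shown in these tables can be sketched in a few lines. This is a simplified model, not Accumulo's API: each cell's visibility is treated as a set of alternative labels (an OR), while real Accumulo visibility expressions also support AND and are evaluated on the server. The names `CELL_VISIBILITY`, `USERS` and `visible_columns` are hypothetical.

```python
# Column -> labels, any ONE of which is enough to see the cell (OR semantics).
CELL_VISIBILITY = {
    "name": {"Level 1"},
    "email": {"Level 1"},
    "last_4_ssn": {"Level 1"},
    "full_ssn": {"Level 2", "Top clearance"},
    "gmail_password": {"Top clearance"},
}

# User -> visibility tokens assigned to that user.
USERS = {
    "User 1": {"Level 1"},
    "User 2": {"Level 1", "Level 2"},
    "Edward Snowden": {"Level 1", "Level 2", "Top clearance"},
}


def visible_columns(user_tokens: set) -> list:
    """Return the columns whose visibility shares a label with the user's tokens."""
    return [
        column
        for column, labels in CELL_VISIBILITY.items()
        if labels & user_tokens  # non-empty intersection means visible
    ]


if __name__ == "__main__":
    for user, tokens in USERS.items():
        print(user, "->", visible_columns(tokens))
```

Cells the user cannot see are simply left out of the result, which mirrors the slide's point that Accumulo only returns cells visible to the user.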
+7) Final Step : Encrypt Data At Rest
• Eventually data ends up on disk
• We need to protect the ‘raw data’ on disk
• To prevent
  • Users going to the disk directly
  • Theft of hardware
+OK, so where are we…
Project / Solution | Purpose | Status | Vendor
Kerberos | Identity management | Available | Neutral
Knox | Secure gateway | | Hortonworks, Cloudera ?
Sentry | Access control | Incubating | Cloudera
Ranger (similar to Sentry) | Access control + Audit | In development (HDP 2.2) (originally XA Secure) | Hortonworks
Rhino | Secure HDFS data at rest | Available from Hadoop 2.6 | Neutral (originally from Intel)
Accumulo | Secure NoSQL | Available | Neutral
+Future….
• Really need a unified standard (no fragmentation)
• Ease of use
  • Easy to set up policies
  • Integrate with outside systems
  • Easy audit tools