chicago-hadoop-users-group
Hadoop and Big Data Security - Kevin T. Smith
Hadoop and Big Data Security
Kevin T. Smith, 11/14/2013
Ksmith <AT> Novetta.COM
Big Data Security – Why Should We Care?
New Challenges related to Data Management, Security, and Privacy
• As data growth is explosive, so is the complexity of our IT environments
• Many organizations are required to enforce access control & privacy restrictions on data sets (HIPAA, Privacy Laws) – or face steep penalties & fines
• Organizations are increasingly required to enforce access control for their data scientists based on Need-to-Know, User Authorization levels, and what data they are allowed to see – especially in Healthcare, Finance, and Government
• Organizations are struggling to understand what data they can release
Mismanagement of Data Sets is Costly
AOL Research “Data Valdez” Incident
• CNNMoney - “101 Dumbest Moments in Business”
• $5 Million Settlement, plus $100 to each member of AOL between 3/2006-5/2006, + $50 to each member who believed their data was in the released data; Fired employees, CTO Resignation
The Netflix Contest Anonymized Data Set Incident
• Class-Action Lawsuit, $9 Million Settlement
Massachusetts Hospital Record Incident
Cyber Security Attacks are on the Rise
• Ponemon Institute – the Average Cost of a Data Breach in the U.S. is 5.4 Million dollars*
• PlayStation (2011) – Experts predict costs between 2.2 and 2.4 Billion dollars
* (Breach Study: Global Analysis, May 2013)
A (Brief) History of Hadoop Security
Hadoop was developed without Security in Mind
Originally No Security model
• No authentication of users or services
• Anyone could submit arbitrary code to be executed
• Later authorization was added, but any user could impersonate other users with a command-line switch
In 2009, Yahoo! focused on Hadoop Authentication, and did a Hadoop Redesign, But…
• Resulting Security Model is Complex
• Security Configuration is Complex & Easy to Mess Up
• No Data at Rest Encryption
• Kerberos-Centric
• Limited Authorization Capabilities
Things are Changing, But Slowly..
• It is important to understand how Hadoop Security is Currently Implemented & Configured
• It is important to understand how to meet your organization’s security requirements
Hadoop Security Data Flow
• Distributed Security is a Challenge
• Since the 0.20.20x distributions of Hadoop, much of the model is Kerberos-Centric, as you see to the right
• Model is quite complex, as you will see on the next slide
Token Delegation & Hadoop Security Flow
Token | Used For
Kerberos TGT | Kerberos initial authentication to the KDC.
Kerberos service ticket | Kerberos authentication between users, client processes, and services.
Delegation token | Token issued by the NameNode to the client, used by the client or any services working on the client’s behalf to authenticate them to the NameNode.
Block Access token | Token issued by the NameNode after validating authorization to a particular block of data, based on a shared secret with the DataNode. Clients (and services working on the client’s behalf) use the Block Access token to request blocks from the DataNode.
Job token | Token issued by the JobTracker to TaskTrackers. Tasks communicating with TaskTrackers for a particular job use this token to prove they are associated with the job.
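The Block Access token row describes a token that the DataNode can check without calling back to the NameNode, because both hold a shared secret. A minimal Python sketch of that idea follows. It is a simplified illustration, not Hadoop's actual token format or code: the secret value, the field names, and the functions `issue_block_token` and `verify_block_token` are all hypothetical.

```python
import hashlib
import hmac
import json
import time

# Hypothetical secret shared by the NameNode and the DataNodes (assumption).
SHARED_SECRET = b"example-shared-secret"


def issue_block_token(user, block_id, modes, lifetime_s=600, now=None):
    """NameNode side: sign a grant once the authorization check has passed."""
    now = time.time() if now is None else now
    grant = {
        "user": user,
        "block": block_id,
        "modes": sorted(modes),
        "expires": now + lifetime_s,
    }
    payload = json.dumps(grant, sort_keys=True).encode("utf-8")
    tag = hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_block_token(token, block_id, mode, now=None):
    """DataNode side: recompute the tag, then check block, mode, and expiry."""
    now = time.time() if now is None else now
    expected = hmac.new(SHARED_SECRET, token["payload"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["tag"]):
        return False  # payload was altered or signed with a different secret
    grant = json.loads(token["payload"])
    return (
        grant["block"] == block_id
        and mode in grant["modes"]
        and now < grant["expires"]
    )
```

A client that presents the token for a different block, for a mode it was not granted, or after the expiry time is rejected, as is any token whose payload was modified after signing.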
Some Vendor Activity in Hadoop Security
• Cloudera Sentry – Fine Grained Access Control for Apache Hive & Cloudera Impala
• IBM InfoSphere Optim Data Masking – Optim Data Masking provides “De-identification” of data by obfuscating corporate secrets; Guardium provides monitoring & auditing
• Intel’s Secure Hadoop Distribution – Encryption in transit & at rest, Granular access control with HBase
• DataStax Enterprise – Encryption in Transit & at Rest (using Cassandra for storage)
• DataGuise for Hadoop – Detects & protects sensitive data, setting access permission, masking or encrypting data, authorization based access
• Knox Gateway (Hortonworks) – Perimeter security, integration with IDAM environments, manage security across multiple clusters – now an Apache Project
• Protegrity – Big Data Protector provides Encryption & tokenization; Enterprise Security Administrator provides central policy, key mgmt, auditing, reporting
• Sqrrl – Builds on Apache Accumulo’s security capabilities for Hadoop
• Zettaset Secure Orchestrator – security wrapper around Hadoop
Seems to be a New One Every Week!
Apache Accumulo
• Cell-Level Access Control via visibility
• By default, uses its own database for users & credentials
• Can be extended in code to use other Identity & Access Management Infrastructure
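To make cell-level access control via visibility more concrete, here is a small Python sketch of how a visibility label on a cell could be checked against a user's authorizations. It is an illustration only and does not use Accumulo's API: the function `visible`, the restricted label syntax (alphanumeric terms, `&`, `|`, parentheses), and the rule that an empty label is readable by everyone are assumptions made for this sketch.

```python
import re

# Terms are restricted to letters, digits, and underscores in this sketch.
TOKEN = re.compile(r"[A-Za-z0-9_]+|[&|()]")


def tokenize(expr):
    tokens = TOKEN.findall(expr)
    if "".join(tokens) != expr.replace(" ", ""):
        raise ValueError("unexpected character in visibility expression")
    return tokens


def visible(expr, authorizations):
    """Return True if a user holding `authorizations` may read a cell labeled `expr`."""
    if expr.strip() == "":
        return True  # assumption: an unlabeled cell is visible to everyone
    tokens = tokenize(expr)
    pos = 0

    def parse_expr():
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError("unexpected token: " + tok)
        return tok in authorizations  # a plain term: does the user hold it?

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

For example, a cell labeled `admin|(finance&audit)` would be readable by a user holding `{"finance", "audit"}` or `{"admin"}`, but not by a user holding only `{"finance"}`.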
Project Rhino
Intel launched this open source effort to improve security capabilities of Hadoop & contributed code to Apache in early 2013.
• Encrypted Data at Rest - JIRA Tasks HADOOP-9331 (Hadoop Crypto Codec Framework and Crypto Codec Implementation) and MAPREDUCE-5025 (Key Distribution and Management for Supporting Crypto Codec in MapReduce). ZOOKEEPER-1688 will provide the ability for transparent encryption of snapshots and commit logs on disk, protecting against the leakage of sensitive information from files at rest.
• Token-Based Authentication & Unified Authorization Framework - JIRA Tasks HADOOP-9392 (Token-Based Authentication and Single Sign-On) and HADOOP-9466 (Unified Authorization Framework)
• Improved Security in HBase - The JIRA Task HBASE-6222 (Add Per-KeyValue Security) adds cell-level authorization to HBase – something that Apache Accumulo has but HBase does not. HBASE-7544 builds on the encryption framework being developed, extending it to HBase, providing transparent table encryption.
What’s the Best Guidance Now?
Identify and Understand the Sensitivity Levels of Your Data
• Are there access control policies associated with your data?
Understand the Impact of the Release of Your Data
• Netflix example – Could someone couple your data with open source data to gain new (and unintended) insight?
Develop Policies & Procedures relating to Security & Privacy of Your Data Sets
• Data Ingest
• Access Control within Your Organization
• Cleansing/Sanitization/Destruction
• Auditing
• Monitoring Procedures
• Incident Response
Develop a Technical Security Approach that Complements Hadoop Security
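The Netflix example in the guidance above asks whether someone could couple released data with open source data to gain unintended insight. The Python sketch below shows how such a coupling could work on a toy scale. Every record, field name, and the function `link` are invented for illustration; none of it is drawn from the incidents mentioned in this deck.

```python
# Made-up "anonymized" release: names removed, quasi-identifiers kept.
released = [
    {"zip": "60601", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "60601", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "60614", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]

# Made-up openly available data set that still carries names.
public = [
    {"name": "Pat Example", "zip": "60601", "birth_year": 1982, "sex": "M"},
    {"name": "Sam Sample", "zip": "60614", "birth_year": 1975, "sex": "F"},
]

# Fields present in both data sets (an assumption of this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Project a record onto the fields shared by both data sets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(released_records, public_records):
    """Pair each named person with a released record when the match is unique."""
    by_key = {}
    for rec in released_records:
        by_key.setdefault(key(rec), []).append(rec)
    matches = []
    for person in public_records:
        candidates = by_key.get(key(person), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((person["name"], candidates[0]))
    return matches


for name, record in link(released, public):
    print(name, "->", record["diagnosis"])
```

Running a check like this against your own candidate release, with whatever open data sets an outsider could plausibly obtain, is one way to estimate the impact of releasing it.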
Questions?
Ksmith <AT> Novetta.COM