Hadoop Security Today & Tomorrow with Apache Knox

DESCRIPTION

Hadoop security: what you can do today and what is coming. Details on Apache Knox, REST API security for Hadoop.

Text of Hadoop Security Today & Tomorrow with Apache Knox

  • Hortonworks Inc. 2014. Hadoop Security Today & Tomorrow. Amsterdam, April 3rd, 2014. Vinay Shukla, Twitter: @NeoMythos
  • Agenda. What is Hadoop security? (4 security pillars & rings of defense.) What security elements exist today? (Authentication, authorization, audit, data protection.) What is on the security roadmap? (Coming soon; longer-term projects.) Securing Hadoop with Apache Knox Gateway (Knox overview; demo). How to get involved.
  • Two Reasons for Security in Hadoop. 1. Hadoop contains sensitive data: as Hadoop adoption grows, so too do the types of data organizations look to store. Often the data is proprietary or personal and must be protected. In this context, Hadoop is governed by the same security requirements as any data center platform. 2. Hadoop is subject to compliance requirements: organizations must often comply with regulations such as HIPAA, PCI DSS, and FISMA that require protection of personal information, and must adhere to other corporate security policies.
  • What is Apache Hadoop Security? Security in Apache Hadoop is defined by four key pillars: authentication, authorization, accountability, and data protection.
  • Security: Rings of Defense. Perimeter Level Security: Network Security (e.g. firewalls), Apache Knox (e.g. gateways). Data Protection: Core Hadoop, Partners. Authentication: Kerberos. OS Security. Authorization: MR ACLs, HDFS Permissions, HDFS ACLs, Hive ATZ-NG, HBase ACLs, Accumulo Label Security.
  • Authentication in Hadoop Today. The four pillars: Authentication (Who am I/prove it? Control access to cluster.); Authorization (Restrict access to explicit data); Audit (Understand who did what); Data Protection (Encrypt data at rest & motion). Available today: Kerberos in native Apache Hadoop; perimeter security with Apache Knox Gateway.
  • Kerberos Authentication in Hadoop. For more than 20 years, Kerberos has been the de facto standard for strong authentication; no other option exists. The design and implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworker Owen O'Malley in 2010. What does Kerberos do? Establishes identity for clients, hosts and services. Prevents impersonation; passwords are never sent over the wire. Integrates with enterprise identity management tools such as LDAP & Active Directory. Enables more granular auditing of data access and job execution.
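To make "establishes identity for clients, hosts and services" concrete: Kerberos names every identity with a principal of the form `primary/instance@REALM`, where service principals usually carry the host name as the instance. The following is a minimal sketch of splitting such a name into its parts. The helper name `parse_principal` and the example host and realm are hypothetical, and the sketch does not handle escaped separators.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "nn"
    instance: Optional[str]  # usually the host for service principals
    realm: Optional[str]     # Kerberos realm, e.g. "EXAMPLE.COM"


def parse_principal(name: str) -> Principal:
    """Split 'primary/instance@REALM' into parts (no escape handling)."""
    rest, _, realm = name.partition("@")
    primary, _, instance = rest.partition("/")
    if not primary:
        raise ValueError(f"principal has no primary component: {name!r}")
    # Missing instance or realm is reported as None rather than "".
    return Principal(primary, instance or None, realm or None)


# Hypothetical service and user principals.
service = parse_principal("nn/namenode01.example.com@EXAMPLE.COM")
user = parse_principal("alice@EXAMPLE.COM")
```

A service principal therefore identifies both the service and the host it runs on, while a user principal typically has no instance component.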
  • Perimeter Security with Apache Knox. Incubated and led by Hortonworks, Apache Knox provides a simple and open framework for Hadoop perimeter security. Single, simple point of access for a cluster: single Hadoop access point; REST API hierarchy; consolidated API calls; multi-cluster support; eliminates SSH edge node. Central controls ensure consistency across one or more clusters: central API management; central audit control; simple service-level authorization. Integrated with existing systems to simplify identity maintenance: SSO integration (Siteminder, API Key*, OAuth* & SAML*); LDAP & AD integration.
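The "single access point" and "REST API hierarchy" ideas can be illustrated by how a client addresses WebHDFS through the gateway instead of contacting the NameNode directly. The sketch below only builds the URL and an HTTP Basic header; it makes no network call. The port `8443`, the `gateway` path segment and the topology name `default` are commonly documented Knox defaults but are assumptions here, as are the host name and credentials. Check them against your own gateway configuration.

```python
import base64
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, hdfs_path, op, topology="default",
                     gateway_port=8443, gateway_path="gateway", **params):
    """Build a WebHDFS URL that goes through a Knox gateway.

    The client only needs the gateway address and a topology (cluster)
    name, not the NameNode or DataNode hosts behind it.
    """
    query = urlencode({"op": op, **params})
    path = quote(hdfs_path.lstrip("/"))
    return (f"https://{gateway_host}:{gateway_port}/{gateway_path}/"
            f"{topology}/webhdfs/v1/{path}?{query}")


def basic_auth_header(user, password):
    """HTTP Basic credentials, which the gateway can check against LDAP/AD."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


# Hypothetical gateway host and demo credentials.
url = knox_webhdfs_url("knox.example.com", "/tmp/data", "LISTSTATUS")
headers = basic_auth_header("guest", "guest-password")
```

With these assumed defaults, `url` is `https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp/data?op=LISTSTATUS`. Only the topology segment changes when the same gateway fronts a second cluster.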
  • Authentication & Audit in Hadoop today. The four pillars: Authentication (Who am I/prove it? Control access to cluster.); Authorization (Restrict access to explicit data); Audit (Understand who did what); Data Protection (Encrypt data at rest & motion). Available today: Kerberos in native Apache Hadoop; perimeter security with Apache Knox Gateway; native in Apache Hadoop: MapReduce Access Control Lists, HDFS Permissions, process execution audit trail; cell-level access control in Apache Accumulo.
  • Authorization: Who can do what in Hadoop? Access control services exist for each of the Hadoop components: HDFS has file permissions; YARN, MapReduce, and HBase have Access Control Lists (ACLs); Accumulo provides more granular label/cell-level security. Improvements to these services are being led by the Hortonworks team. HDFS improvements: extended ACLs, more flexible via multiple policies on the same file or directory. Hive improvements: a Hortonworks initiative called Hive ATZ-NG; better integration allows familiar SQL/database syntax (GRANT/REVOKE) and allows more clients (including partner integrations) to be secure.
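The "multiple policies on the same file or directory" that extended HDFS ACLs allow can be sketched as a POSIX-style evaluation order: owner first, then named users, then the owning group and named groups (both filtered by the mask), then everyone else. The code below is a simplified model of that order, not the NameNode's implementation. It ignores the superuser, default ACLs, the sticky bit and path traversal, and all class, function and user names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FileAcl:
    owner: str
    group: str
    owner_perms: str   # permission strings such as "rw-"
    group_perms: str
    other_perms: str
    named_users: Dict[str, str] = field(default_factory=dict)
    named_groups: Dict[str, str] = field(default_factory=dict)
    mask: str = "rwx"  # upper bound for named users and all groups


def _allows(perms: str, mask: str, wanted: str) -> bool:
    return wanted in perms and wanted in mask


def check_access(acl: FileAcl, user: str, groups: Set[str], wanted: str) -> bool:
    """Return True if `user` may perform `wanted` ('r', 'w' or 'x')."""
    if wanted not in ("r", "w", "x"):
        raise ValueError("wanted must be one of 'r', 'w', 'x'")
    if user == acl.owner:
        return wanted in acl.owner_perms            # owner is not masked
    if user in acl.named_users:
        return _allows(acl.named_users[user], acl.mask, wanted)
    matching = []
    if acl.group in groups:
        matching.append(acl.group_perms)
    matching += [p for g, p in acl.named_groups.items() if g in groups]
    if matching:
        # Any matching group entry may grant access; if none does, deny.
        return any(_allows(p, acl.mask, wanted) for p in matching)
    return wanted in acl.other_perms


acl = FileAcl(owner="alice", group="sales", owner_perms="rw-",
              group_perms="r--", other_perms="---",
              named_users={"bruce": "rwx"}, named_groups={"exec": "rw-"},
              mask="r--")
```

In this example the named user `bruce` is listed with `rwx`, but the mask `r--` caps him at read access, which is the kind of per-file policy plain owner/group/other permissions cannot express.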
  • Data Protection in Hadoop today. The four pillars: Authentication (Who am I/prove it? Control access to cluster.); Authorization (Restrict access to explicit data); Audit (Understand who did what); Data Protection (Encrypt data at rest & motion). Available today: Kerberos in native Apache Hadoop; perimeter security with Apache Knox Gateway; native in Apache Hadoop: MapReduce Access Control Lists, HDFS Permissions, process execution audit trail; cell-level access control in Apache Accumulo; wire encryption in native Apache Hadoop; orchestrated encryption with 3rd party tools.
  • Data Protection. Data protection must be applied at three different layers in Apache Hadoop. Storage: encrypt data while it is at rest; direct data flows into and out of 3rd party encryption tools and/or rely upon hardware-specific techniques (e.g. drive-level encryption). Transmission: encrypt data as it is in motion; native Apache Hadoop 2.0 provides wire encryption. Upon access: apply restrictions when accessed; direct data flows into and out of 3rd party encryption tools.
  • Data Protection Details - Today. Encryption of data at rest: Option 1: OS or hardware-level encryption (out of the box); Option 2: custom development; Option 3: certified partners. Work is underway for encryption in Hive, HDFS and HBase as core platform capabilities. Encryption of data on the wire: all wire protocols can be encrypted by the HDP platform (2.x); wire-level encryption enhancements are led by the HWX team. Column-level encryption: no current out-of-the-box support in Hadoop; certified partners provide these capabilities.
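As an illustration of wire encryption in native Hadoop 2.x, two commonly used settings are `hadoop.rpc.protection` (set to `privacy` in core-site.xml to encrypt RPC traffic) and `dfs.encrypt.data.transfer` (set to `true` in hdfs-site.xml to encrypt the DataNode block transfer protocol). The sketch below only renders such properties in the Hadoop `<configuration>` XML layout. The helper name `render_site_xml` is hypothetical, and a real deployment needs further settings (for example SSL keystores for the HTTP endpoints) that are not shown.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict of settings as a Hadoop *-site.xml document string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# RPC encryption between clients and Hadoop daemons (core-site.xml).
core_site = render_site_xml({"hadoop.rpc.protection": "privacy"})

# Encryption of HDFS block data sent to and from DataNodes (hdfs-site.xml).
hdfs_site = render_site_xml({"dfs.encrypt.data.transfer": "true"})
```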
  • What can be done today? The four pillars: Authentication (Who am I/prove it? Control access to cluster.); Authorization (Restrict access to explicit data); Audit (Understand who did what); Data Protection (Encrypt data at rest & motion). Available today: Kerberos in native Apache Hadoop; perimeter security with Apache Knox Gateway; native in Apache Hadoop: MapReduce Access Control Lists, HDFS Permissions, process execution audit trail; cell-level access control in Apache Accumulo; service-level authorization with Knox; access audit with Knox; wire encryption in native Apache Hadoop; wire encryption with Knox; orchestrated encryption with 3rd party tools.
  • Hadoop Security: Hortonworks is Delivering Secure Hadoop for the Enterprise. Security for Hadoop must be addressed within every layer of the stack and integrated into existing frameworks. For a full description of what is available in Enterprise Hadoop today across Authentication, Authorization, Accountability and Data Protection, please visit our security labs page. Stack layers: Governance & Integration, Security, Operations, Data Access, Data Management. HDP 2.1, new: Apache Knox. Perimeter security for Hadoop; a common place to perform authentication across Hadoop and all related projects; integrated with LDAP and AD; currently supports WebHDFS, WebHCAT, Oozie, Hive & HBase; broad community effort, incubated with Microsoft, broad set of developers involved. Security Investments. Phase 3: audit event correlation and audit viewer; data encryption in HDFS, Hive & HBase; Knox for HDFS HA, Ambari & Falcon; support for token-based AuthN beyond Kerberos. Phase 2: ACLs for HDFS; Knox: Hadoop REST API security; SQL-style Hive AuthZ (GRANT, REVOKE); SSL support for Hive Server 2; SSL for DN/NN UI & WebHDFS; PAM support for Hive. Phase 1: strong AuthN with Kerberos; HBase, Hive, HDFS basic AuthZ; encryption with SSL for NN, JT, etc.; wire encryption with Shuffle, HDFS, JDBC.
  • Hadoop Security: Phase 2. HDP 2.1 Features. Release theme: REST API security, improved AuthZ, wire encryption. Specific features: Hadoop REST API security with Apache Knox (eliminates SSH edge node; single Hadoop access point; LDAP, AD based authentication; service-level authorization; audit support for REST access); SQL-style Hive authorization with fine-grained access; HDFS Access Control Lists; SSL support in HiveServer2; SSL support in NN/DN UI & WebHDFS; Pluggable Authentication Module (PAM) in Hive. Included components: Apache Knox, Hive, HDFS.
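Service-level authorization at the gateway can be pictured as a per-service rule that names the users, groups and client addresses allowed to call that service. The sketch below is loosely modeled on the `users;groups;ipaddresses` triple used by Knox's ACL authorization provider, with `*` as a wildcard. Treat the exact format and matching semantics as assumptions to verify against the Knox documentation. The function names are hypothetical, and only the case where all three parts must match is modeled.

```python
def parse_acl(value):
    """Parse 'users;groups;ips' into three sets of allowed values."""
    parts = value.split(";")
    if len(parts) != 3:
        raise ValueError(f"expected 'users;groups;ips', got {value!r}")
    return tuple(
        {item.strip() for item in part.split(",") if item.strip()}
        for part in parts
    )


def _matches(allowed, candidates):
    # "*" admits anyone; otherwise at least one candidate must be listed.
    return "*" in allowed or bool(allowed & candidates)


def is_authorized(acl_value, user, groups, ip):
    """Allow the call only if user, groups and client IP all match."""
    users, acl_groups, ips = parse_acl(acl_value)
    return (_matches(users, {user})
            and _matches(acl_groups, set(groups))
            and _matches(ips, {ip}))


# Hypothetical rule for one service: only user "guest", from anywhere.
webhdfs_acl = "guest;*;*"
allowed = is_authorized(webhdfs_acl, "guest", {"users"}, "10.0.0.5")
denied = is_authorized(webhdfs_acl, "mallory", {"users"}, "10.0.0.5")
```

Because the rule is evaluated at the gateway, it applies to a whole service (for example WebHDFS) before any request reaches the cluster, and it complements rather than replaces the file-level and table-level authorization inside Hadoop.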
  • Why Knox? (Image from fb.com/hadoopmemes.) Apache Knox Gateway: REST/HTTP API security for Hadoop. Eliminates SSH edge node; single REST API access point; centralized authentication, authorization, and audit for Hadoop REST/HTTP services (LDAP/AD authentication, service authorization, audit, etc.). Knox eliminates the need for clients to have intimate knowledge of the cluster topology.
  • Knox Deployment with Hadoop Cluster Application Tier