Transcript
Page 1: Hadoop Security Today & Tomorrow with Apache Knox

© Hortonworks Inc. 2014

Hadoop Security Today & Tomorrow
Amsterdam - April 3rd, 2014
Vinay Shukla
Twitter: @NeoMythos

Page 2: Hadoop Security Today & Tomorrow with Apache Knox

Agenda
• What is Hadoop Security?
  – 4 Security Pillars & Rings of Defense
• What security elements exist today?
  – Authentication
  – Authorization
  – Audit
  – Data Protection
• What is on the security roadmap?
  – Coming soon
  – Longer-term projects
• Securing Hadoop with Apache Knox Gateway
  – Knox overview
  – Demo
• How to get involved

Page 3: Hadoop Security Today & Tomorrow with Apache Knox

Two Reasons for Security in Hadoop

1. Hadoop contains sensitive data
– As Hadoop adoption grows, so do the types of data organizations look to store. Often the data is proprietary or personal, and it must be protected.
– In this context, Hadoop is governed by the same security requirements as any data center platform.

2. Hadoop is subject to compliance requirements
– Organizations are often required to comply with regulations such as HIPAA, PCI DSS, and FISMA that require protection of personal information.
– Adherence to other corporate security policies.

Page 4: Hadoop Security Today & Tomorrow with Apache Knox

What is Apache Hadoop Security?

Security in Apache Hadoop is defined by four key pillars: authentication, authorization, accountability, and data protection.

Page 5: Hadoop Security Today & Tomorrow with Apache Knox

Security: Rings of Defense

Perimeter Level Security
• Network Security (e.g. Firewalls)
• Apache Knox (e.g. Gateways)

Data Protection
• Core Hadoop
• Partners

Authentication
• Kerberos

OS Security

Authorization
• MR ACLs
• HDFS Permissions
• HDFS ACLs
• Hive ATZ-NG
• HBase ACLs
• Accumulo Label Security

Page 6: Hadoop Security Today & Tomorrow with Apache Knox

Authentication in Hadoop Today…

Authentication: Who am I/prove it? Control access to cluster.
• Kerberos in native Apache Hadoop
• Perimeter Security with Apache Knox Gateway

Authorization: Restrict access to explicit data

Audit: Understand who did what

Data Protection: Encrypt data at rest & in motion

Page 7: Hadoop Security Today & Tomorrow with Apache Knox

Kerberos Authentication in Hadoop

For more than 20 years, Kerberos has been the de-facto standard for strong authentication. …no other option exists.

The design and implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworker Owen O’Malley in 2010.

What does Kerberos do?
– Establishes identity for clients, hosts and services
– Prevents impersonation; passwords are never sent over the wire
– Integrates with enterprise identity management tools such as LDAP & Active Directory
– Enables more granular auditing of data access/job execution
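As a rough illustration of the identities Kerberos establishes for users and services, the sketch below builds principal names in the usual primary/instance@REALM form. The realm, host, and user names are placeholders and do not come from this deck. The kinit step is shown only as a comment because it needs a real KDC.

```shell
#!/usr/bin/env bash
# Sketch only: EXAMPLE.COM, the host name, and the user name are placeholder assumptions.

# Build a Kerberos service principal: primary/instance@REALM
service_principal() {
  local service="$1" host="$2" realm="$3"
  printf '%s/%s@%s\n' "$service" "$host" "$realm"
}

# Build a Kerberos user principal: primary@REALM
user_principal() {
  local user="$1" realm="$2"
  printf '%s@%s\n' "$user" "$realm"
}

service_principal nn namenode.example.com EXAMPLE.COM
user_principal alice EXAMPLE.COM

# On a kerberized cluster, a user would first obtain a ticket and then run commands:
#   kinit alice@EXAMPLE.COM
#   hdfs dfs -ls /user/alice
```

Because the password is only used locally by kinit to obtain a ticket, later requests carry the ticket rather than the password.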

Page 8: Hadoop Security Today & Tomorrow with Apache Knox

Perimeter Security with Apache Knox

Incubated and led by Hortonworks, Apache Knox provides a simple and open framework for Hadoop perimeter security.

Single, simple point of access for a cluster
• Single Hadoop access point
• REST API hierarchy
• Consolidated API calls
• Multi-cluster support
• Eliminates SSH “edge node”

Central controls ensure consistency across one or more clusters
• Central API management
• Central audit control
• Simple service-level authorization

Integrated with existing systems to simplify identity maintenance
• SSO integration – Siteminder, API Key*, OAuth* & SAML*
• LDAP & AD integration

Page 9: Hadoop Security Today & Tomorrow with Apache Knox

Authentication & Audit in Hadoop today…

Authentication: Who am I/prove it? Control access to cluster.
• Kerberos in native Apache Hadoop
• Perimeter Security with Apache Knox Gateway

Authorization: Restrict access to explicit data
Audit: Understand who did what
• Native in Apache Hadoop
  – MapReduce Access Control Lists
  – HDFS Permissions
  – Process Execution audit trail
• Cell level access control in Apache Accumulo

Data Protection: Encrypt data at rest & in motion

Page 10: Hadoop Security Today & Tomorrow with Apache Knox

Authorization: Who can do what in Hadoop?

• Access control services exist for each of the Hadoop components:
  – HDFS has file permissions
  – YARN, MapReduce, and HBase have Access Control Lists (ACLs)
  – Accumulo provides more granular label/cell level security

• Improvements to these services are being led by the Hortonworks team:
  – HDFS improvements: extended ACLs, which are more flexible because multiple policies can apply to the same file or directory
  – Hive improvements: a Hortonworks initiative called Hive ATZ-NG, whose better integration allows familiar SQL/database syntax (GRANT/REVOKE) and allows more clients (including partner integrations) to be secure
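To make the two improvements above concrete, here is a minimal dry-run sketch that prints an HDFS extended-ACL command and a SQL-style Hive GRANT statement instead of running them. The user `alice`, the path `/data/sales`, and the table `sales` are hypothetical, and the helper function names are invented for this example.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints commands instead of running them.
# The user, path, and table names are hypothetical.

# HDFS extended ACL: add (-m) an entry for one user on a path
hdfs_acl_grant_cmd() {
  local user="$1" perms="$2" path="$3"
  printf 'hdfs dfs -setfacl -m user:%s:%s %s\n' "$user" "$perms" "$path"
}

# SQL-style Hive authorization statement
hive_grant_stmt() {
  local privilege="$1" table="$2" user="$3"
  printf 'GRANT %s ON TABLE %s TO USER %s;\n' "$privilege" "$table" "$user"
}

hdfs_acl_grant_cmd alice r-x /data/sales
hive_grant_stmt SELECT sales alice
```

The ACL entry sits alongside the file's owner/group/other permission bits, which is what allows several policies on the same file or directory.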

Page 11: Hadoop Security Today & Tomorrow with Apache Knox

Data Protection in Hadoop today…

Authentication: Who am I/prove it? Control access to cluster.
• Kerberos in native Apache Hadoop
• Perimeter Security with Apache Knox Gateway

Authorization: Restrict access to explicit data
Audit: Understand who did what
• Native in Apache Hadoop
  – MapReduce Access Control Lists
  – HDFS Permissions
  – Process Execution audit trail
• Cell level access control in Apache Accumulo

Data Protection: Encrypt data at rest & in motion
• Wire encryption in native Apache Hadoop
• Orchestrated encryption with 3rd party tools

Page 12: Hadoop Security Today & Tomorrow with Apache Knox

Data Protection in Hadoop

Data protection must be applied at three different layers in Apache Hadoop:

Storage: encrypt data while it is at rest
Direct data flows “into” and “out of” 3rd party encryption tools and/or rely upon hardware specific techniques (e.g. drive-level encryption).

Transmission: encrypt data as it is in motion
Native Apache Hadoop 2.0 provides wire encryption.

Upon Access: apply restrictions when accessed
Direct data flows “into” and “out of” 3rd party encryption tools.

Page 13: Hadoop Security Today & Tomorrow with Apache Knox

Data Protection – Details - Today

• Encryption of Data at Rest
  – Option 1: OS or Hardware Level Encryption (Out of the Box)
  – Option 2: Custom Development
  – Option 3: Certified Partners
  – Work underway for encryption in Hive, HDFS and HBase as core platform capabilities.

• Encryption of Data on the Wire
  – All wire protocols can be encrypted by HDP platform (2.x). Wire-level encryption enhancements led by HWX Team.

• Column Level Encryption
  – No current out of the box support in Hadoop.
  – Certified Partners provide these capabilities.
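For the wire-encryption item above, the sketch below prints two Hadoop configuration properties that, as far as I know from the Apache Hadoop documentation, turn on RPC and DataNode data-transfer encryption: `hadoop.rpc.protection` (core-site.xml) and `dfs.encrypt.data.transfer` (hdfs-site.xml). The deck does not name these properties, so treat them as an assumption to confirm against your distribution. The helper function is invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: emit Hadoop <property> elements for wire encryption.
# Property names are an assumption based on Apache Hadoop docs, not on this deck.

hadoop_property() {
  local name="$1" value="$2"
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$name" "$value"
}

# core-site.xml: "privacy" requests authentication, integrity, and encryption for RPC
hadoop_property hadoop.rpc.protection privacy

# hdfs-site.xml: encrypt block data transferred to and from DataNodes
hadoop_property dfs.encrypt.data.transfer true
```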

Page 14: Hadoop Security Today & Tomorrow with Apache Knox

What can be done today?

Authentication: Who am I/prove it? Control access to cluster.
• Kerberos in native Apache Hadoop
• Perimeter Security with Apache Knox Gateway

Authorization: Restrict access to explicit data
Audit: Understand who did what
• Native in Apache Hadoop
  – MapReduce Access Control Lists
  – HDFS Permissions
  – Process Execution audit trail
• Cell level access control in Apache Accumulo
• Service level Authorization with Knox
• Access Audit with Knox

Data Protection: Encrypt data at rest & in motion
• Wire encryption in native Apache Hadoop
• Wire Encryption with Knox
• Orchestrated encryption with 3rd party tools

Page 15: Hadoop Security Today & Tomorrow with Apache Knox

Hadoop Security

Hortonworks is Delivering Secure Hadoop for the Enterprise
• Security for Hadoop must be addressed within every layer of the stack and integrated into existing frameworks.
• For a full description of what is available in Enterprise Hadoop today across Authentication, Authorization, Accountability and Data Protection, please visit our security labs page.

[Diagram: HDP 2.1 stack. Layers: Governance & Integration, Data Access, Data Management, Security, Operations]

New: Apache Knox
Perimeter security for Hadoop
• A common place to perform authentication across Hadoop and all related projects
• Integrated with LDAP and AD
• Currently supports: WebHDFS, WebHCat, Oozie, Hive & HBase
• Broad community effort, incubated with Microsoft, broad set of developers involved

Security Investments

Security Phase 3:
• Audit event correlation and Audit viewer
• Data Encryption in HDFS, Hive & HBase
• Knox for HDFS HA, Ambari & Falcon
• Support Token-Based AuthN beyond Kerberos

Security Phase 2 (HDP 2.1):
• ACLs for HDFS
• Knox: Hadoop REST API Security
• SQL-style Hive AuthZ (GRANT, REVOKE)
• SSL support for Hive Server 2
• SSL for DN/NN UI & WebHDFS
• PAM support for Hive

Phase 1:
• Strong AuthN with Kerberos
• HBase, Hive, HDFS basic AuthZ
• Encryption with SSL for NN, JT, etc.
• Wire encryption with Shuffle, HDFS, JDBC

Page 16: Hadoop Security Today & Tomorrow with Apache Knox

Hadoop Security: Phase 2

HDP 2.1 Features

Release Theme: REST API Security, Improve AuthZ, Wire Encryption

Specific Features:
• Hadoop REST API Security with Apache Knox
  – Eliminates SSH edge node
  – Single Hadoop access point
  – LDAP, AD based Authentication
  – Service-level Authorization
  – Audit support for REST access
• SQL style Hive Authorization with fine grain access
• HDFS Access Control Lists
• SSL support in HiveServer2
• SSL support in NN/DN UI & WebHDFS
• Pluggable Authentication Module (PAM) in Hive

Included Components: Apache Knox, Hive, HDFS

Page 17: Hadoop Security Today & Tomorrow with Apache Knox

Why Knox?

From fb.com/hadoopmemes

Apache Knox Gateway
• REST/HTTP API security for Hadoop
• Eliminates SSH edge node
• Single REST API access point
• Centralized Authentication, Authorization, and Audit for Hadoop REST/HTTP services
• LDAP/AD Authentication, Service Authorization, Audit etc.

Knox Eliminates
• The client's need for intimate knowledge of cluster topology

Page 18: Hadoop Security Today & Tomorrow with Apache Knox

Knox Deployment with Hadoop Cluster

[Diagram: deployment topology. Labels: Application Tier, Web Tier, DMZ, LB, Knox, Hadoop CLIs, switches, Master Nodes (Rack 1: NN, SNN), Slave Nodes (Rack 2 … Rack N: DN)]

Page 19: Hadoop Security Today & Tomorrow with Apache Knox

Hadoop REST API Security: Drill-Down

[Diagram: REST API request flow. Labels: REST Client, Firewall, DMZ, LB, Knox Gateway (GW, GW), Firewall, Enterprise Identity Provider (LDAP/AD), Edge Node/Hadoop CLIs; protocols shown: HTTP, LDAP, RPC. Hadoop Cluster 1 and Hadoop Cluster 2 each show Masters (RM, NN, WebHCat, Oozie, HS2, HBase) and Slaves (DN, NM)]

Page 20: Hadoop Security Today & Tomorrow with Apache Knox

What is Knox? Client > Knox > Hadoop Cluster

Knox Gateway request path: REST Client > Protocol Listener > Service Selector > Service Specific Filter Chain > Hadoop Service

• Protocol Listener: listens for requests on the appropriate protocols (e.g. HTTP/HTTPS)
• Service Selector: selects the appropriate service filter chain based on request URL mapping rules
• Service Specific Filter Chain:
  – AuthN Filter: challenges the client for credentials and authenticates, or validates an SSO token (Enterprise Identity Provider, Enterprise/Cloud SSO Provider)
  – Identity Asserter Filter: enforces propagation of the authenticated identity to Hadoop by modifying the request
  – Rewrite Filter: translates URLs in request and response between external and internal URLs based on service specific rules
  – Dispatch: streams request and response to and from the Hadoop service based on rewritten URLs

Service filter chains are composed and configured at deployment time by service specific plugins.
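The URL translation step described on this slide can be pictured with a small conceptual sketch. This is not Knox code: the function name, host names, and the single WebHDFS rule are illustrative assumptions, and port 50070 is assumed as the NameNode HTTP port of a Hadoop 2 cluster.

```shell
#!/usr/bin/env bash
# Conceptual sketch of an external-to-internal URL rewrite. Not Knox code;
# host names, port, and the rule itself are illustrative assumptions.

# Map an external gateway URL for WebHDFS to an internal NameNode URL.
rewrite_webhdfs_url() {
  local external="$1" internal_base="$2"
  # Keep everything after "/webhdfs/" (path and query string).
  local suffix="${external#*/webhdfs/}"
  if [ "$suffix" = "$external" ]; then
    echo "not a webhdfs URL: $external" >&2
    return 1
  fi
  printf '%s/webhdfs/%s\n' "$internal_base" "$suffix"
}

rewrite_webhdfs_url \
  'https://gateway.example.com:8443/gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS' \
  'http://namenode.example.com:50070'
# prints: http://namenode.example.com:50070/webhdfs/v1/tmp?op=LISTSTATUS
```

The client only ever sees the gateway address; the internal host and port stay hidden behind the rewrite, which is why clients need no knowledge of the cluster topology.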

Page 21: Hadoop Security Today & Tomorrow with Apache Knox

Knox Gateway in action
Submit MR job via Knox

Page 22: Hadoop Security Today & Tomorrow with Apache Knox

HDFS & MR Operations with Knox

• Create a few directories
    curl -iku guest:guest-password -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=MKDIRS&permission=777'
    curl -iku guest:guest-password -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input?op=MKDIRS&permission=777'
    curl -iku guest:guest-password -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib?op=MKDIRS&permission=777'

• Upload files
    curl -iku guest:guest-password -L -T samples/hadoop-examples.jar -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib/hadoop-examples.jar?op=CREATE'
    curl -iku guest:guest-password -L -T README -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input/README?op=CREATE'

• Run MR job
    curl -iku guest:guest-password -X POST -d arg=/user/guest/test/input -d arg=/user/guest/test/output -d jar=/user/guest/test/lib/hadoop-examples.jar -d class=org.apache.hadoop.examples.WordCount 'https://localhost:8443/gateway/sandbox/templeton/v1/mapreduce/jar'

• Query the jobs for a user
    curl -iku guest:guest-password 'https://localhost:8443/gateway/sandbox/templeton/v1/queue'

• Query the status of a given job
    curl -iku guest:guest-password 'https://localhost:8443/gateway/sandbox/templeton/v1/queue/<job_id>'

• Read the output file
    curl -iku guest:guest-password -L -X GET 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/output/part-r-00000?op=OPEN'

• Remove a directory
    curl -iku guest:guest-password -X DELETE 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=DELETE&recursive=true'
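The demo commands repeat the same gateway address and topology name in every URL. A minimal sketch of factoring that out is shown here. The defaults mirror the demo (localhost:8443, topology `sandbox`); the variable and function names are invented for this example, and `knox_mkdir` is defined but not called because it needs a running gateway.

```shell
#!/usr/bin/env bash
# Sketch: set the gateway address and topology once, then build WebHDFS URLs.
# Variable and function names are hypothetical; defaults mirror the demo.

KNOX_BASE="${KNOX_BASE:-https://localhost:8443/gateway}"
KNOX_TOPOLOGY="${KNOX_TOPOLOGY:-sandbox}"

# Build <base>/<topology>/webhdfs/v1<path>?op=<OP>[&<extra>]
knox_webhdfs_url() {
  local path="$1" op="$2" extra="${3:-}"
  local url="${KNOX_BASE}/${KNOX_TOPOLOGY}/webhdfs/v1${path}?op=${op}"
  if [ -n "$extra" ]; then
    url="${url}&${extra}"
  fi
  printf '%s\n' "$url"
}

# Not called here: requires a running gateway.
# -k skips TLS certificate verification, which is acceptable only for a demo sandbox.
knox_mkdir() {
  local credentials="$1" path="$2"
  curl -iku "$credentials" -X PUT "$(knox_webhdfs_url "$path" MKDIRS 'permission=777')"
}

knox_webhdfs_url /user/guest/test MKDIRS 'permission=777'
# prints: https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=MKDIRS&permission=777
```

With this in place, a call such as `knox_mkdir guest:guest-password /user/guest/test/input` would issue the same request as the second directory-creation command above.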

Page 23: Hadoop Security Today & Tomorrow with Apache Knox

How to get Involved

Resource: Location
• Security Labs: http://hortonworks.com/labs/security/
• Security Blogs: http://hortonworks.com/blog/category/innovation/security/
• Apache Knox Tutorial: http://hortonworks.com/hadoop-tutorial/securing-hadoop-infrastructure-apache-knox/
• Need help? http://hortonworks.com/community/forums/forum/security/ or [email protected]

Page 24: Hadoop Security Today & Tomorrow with Apache Knox

Thank you!
Amsterdam - April 3rd, 2014
Vinay Shukla
Twitter: @NeoMythos

