Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

This presentation was included in a 30-minute webinar with Balaji Ganesan, Hortonworks senior director of enterprise security strategy, and Vinay Shukla, director of product management. They discussed the features in Hortonworks Data Platform 2.2 for delivering comprehensive security, covering Apache Ranger and Apache Knox and how the two are integrated in HDP 2.2 to provide fine-grained authorization, auditing, and API security that can be centrally administered.

Page 1: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Page 1 © Hortonworks Inc. 2014

Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger & Apache Knox

Hortonworks. We do Hadoop.

Page 2: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Speakers

Justin Sears

Hortonworks Product Marketing Manager

Vinay Shukla

Hortonworks Director of Product Management

Balaji Ganesan

Hortonworks Senior Director of Enterprise Security Strategy

Page 3: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Agenda

•  Overview of Security in HDP 2.2:
   §  Centralized security with Apache Ranger
   §  API security with Apache Knox

•  Demo

•  Q & A

We’ll move quickly:
•  Attendee phone lines are muted
•  Text any questions to Vinay Shukla using Webex chat
•  Questions answered at the end
•  Unanswered questions and answers in upcoming blog post

Page 4: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Big Data, Hadoop & Data Center Re-platforming

Business Drivers

•  From reactive analytics to proactive interactions

•  Insights that drive competitive advantage & optimal returns

Financial Drivers

•  Cost of data systems, as % of IT spend, continues to grow

•  Cost advantages of commodity hardware & open source software

Technical Drivers

•  Data is growing exponentially & existing systems overwhelmed

•  Predominantly driven by NEW types of data that can inform analytics

There is an inequitable balance between vendor and customer in the market

Page 5: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Hadoop Value: New Types of Data

•  Clickstream: Capture and analyze website visitors’ data trails and optimize your website
•  Sensors: Discover patterns in data streaming automatically from remote sensors and machines
•  Server Logs: Research logs to diagnose process failures and prevent security breaches
•  Sentiment: Understand how your customers feel about your brand and products – right now
•  Geographic: Analyze location-based data to manage operations where they occur
•  Unstructured: Understand patterns in files across millions of web pages, emails, and documents

Page 6: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

A Shift from Reactive to Proactive Interactions

HDP and Hadoop allow organizations to use data to shift interactions from Reactive (Post Transaction) to Proactive (Pre Decision):

•  A shift in Advertising: From mass branding …to 1x1 Targeting
•  A shift in Financial Services: From Educated Investing …to Automated Algorithms
•  A shift in Healthcare: From mass treatment …to Designer Medicine
•  A shift in Retail: From static branding …to Real-time Personalization
•  A shift in Telco: From break then fix …to repair before break

Page 7: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Enterprise Goals for the Modern Data Architecture

•  Consolidate siloed data sets, structured and unstructured

•  Central data set on a single cluster

•  Multiple workloads across batch, interactive and real time

•  Central services for security, governance and operation

•  Preserve existing investment in current tools and platforms

•  Single view of the customer, product, supply chain

[Diagram: Modern Data Architecture. APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications. DATA SYSTEM: RDBMS, EDW, MPP, and Hadoop with YARN: Data Operating System (Interactive, Real-Time, Batch) over HDFS (Hadoop Distributed File System). SOURCES: Existing Systems (CRM, ERP, Other), Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured.]

Page 8: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

YARN Transformed Hadoop & Opened a New Era

YARN The Architectural Center of Hadoop

•  Common data platform, many applications

•  Support multi-tenant access & processing

•  Batch, interactive & real-time use cases

[Diagram: BATCH, INTERACTIVE & REAL-TIME DATA ACCESS engines on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System). Engines shown: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines), with Tez and Slider as supporting layers.]

Page 9: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

YARN Extends Hadoop to Other Data Center Leaders

YARN The Architectural Center of Hadoop

•  Common data platform, many applications

•  Support multi-tenant access & processing

•  Batch, interactive & real-time use cases

•  Supports 3rd-party ISV tools

(ex. SAS, Syncsort, Actian, etc.)

YARN Ready Applications Facilitates ongoing innovation and enterprise adoption via ecosystem of new and existing “YARN Ready” solutions

[Diagram: same YARN data access stack as on Page 8.]

Page 10: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Enterprise Hadoop: Central Set of Services

Enables Apache Hadoop to be an Enterprise Data Platform with centralized services for:

•  Governance

•  Operations

•  Security

Everything that plugs into Hadoop inherits these services

[Diagram: enterprise services surrounding the data access stack. GOVERNANCE: load data and manage according to policy. OPERATIONS: deploy and effectively manage the platform (Provision, Manage & Monitor: Ambari, Zookeeper; Scheduling: Oozie). SECURITY: provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection. BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines), with Tez and Slider, on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System).]

Page 11: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Hortonworks Development Investment for the Enterprise

Vertical Integration with YARN and HDFS

[Diagram: same enterprise services and data access stack as on Page 10, highlighting vertical integration of the engines with YARN and HDFS.]

•  Ensure engines can run reliably and respectfully in a YARN-based cluster
•  Implement features throughout the stack to accommodate

Page 12: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Hortonworks Development Investment for the Enterprise

Horizontal Integration for Enterprise Services

[Diagram: same enterprise services and data access stack as on Page 10, highlighting horizontal integration of SECURITY, GOVERNANCE and OPERATIONS across the stack.]

•  Ensure consistent enterprise services are applied across the entire Hadoop stack
•  Integrate with and extend existing data center solutions for these key requirements

Page 13: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Hortonworks Data Platform 2.2

HDP Delivers Enterprise Hadoop

[Diagram: HDP 2.2 component stack. BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines), with Tez and Slider, on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System). GOVERNANCE (Data Workflow, Lifecycle & Governance): Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS. OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie). SECURITY: Authentication, Authorization, Audit, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox). Deployment Choice: Linux, Windows, On-Premises, Cloud.]

YARN is the architectural center of HDP

•  Common data set across all applications

•  Batch, interactive & real-time workloads

•  Multi-tenant access & processing

Provides comprehensive enterprise capabilities

•  Governance

•  Security

•  Operations

Enables broad ecosystem adoption

•  ISVs can plug directly into Hadoop

The widest range of deployment options

•  Linux & Windows

•  On premises & cloud

Page 14: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Hortonworks Data Platform 2.2

HDP Delivers Enterprise Hadoop

[Diagram: same HDP 2.2 component stack as on Page 13 (data access engines on YARN and HDFS, with GOVERNANCE and OPERATIONS services).]

YARN is the architectural center of HDP

•  Common data set across all applications

•  Batch, interactive & real-time workloads

•  Multi-tenant access & processing

Provides comprehensive enterprise capabilities

•  Governance

•  Security

•  Operations

Enables broad ecosystem adoption

•  ISVs can plug directly into Hadoop

The widest range of deployment options

•  Linux & Windows

•  On premises & cloud

[Diagram callout: SECURITY: Authentication, Authorization, Audit, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox). Others: ISV Engines. Deployment Choice: Linux, Windows, On-Premises, Cloud.]

Page 15: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Apache Ranger for Centralized Security

Page 16: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Apache Ranger (formerly Apache Argus)

Central Security Administration, Authorization and Auditing for Hadoop

Page 17: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Central Security Administration

•  Delivers a ‘single pane of glass’ for the security administrator
•  Centralizes administration of security policy
•  Ensures consistent coverage across the entire Hadoop stack

Page 18: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Set Up Authorization Policies

File level access control with flexible definition

Control user and group permissions
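
As a rough illustration of what a file-level policy with user and group permissions involves, here is a minimal Python sketch. The `PathPolicy` record, the `is_allowed` function, the example users, groups and paths, and the deny-by-default behaviour are all assumptions made for illustration; they are not Ranger's actual data model or API.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class PathPolicy:
    """Hypothetical policy record: a path pattern plus who may do what."""
    path_pattern: str                      # e.g. "/data/finance/*"
    users: frozenset = frozenset()         # users named directly in the policy
    groups: frozenset = frozenset()        # groups named in the policy
    permissions: frozenset = frozenset()   # e.g. {"read", "write"}


def is_allowed(policies, user, user_groups, path, action):
    """Return True if any policy grants `action` on `path` to the user
    directly or through one of the user's groups."""
    for policy in policies:
        if not fnmatchcase(path, policy.path_pattern):
            continue  # policy does not cover this path
        if action not in policy.permissions:
            continue  # policy does not grant this action
        if user in policy.users or policy.groups & set(user_groups):
            return True
    return False  # no matching policy: deny (an assumption of this sketch)


# Example data (hypothetical): alice and the analysts group may read finance data.
EXAMPLE_POLICIES = [
    PathPolicy(
        path_pattern="/data/finance/*",
        users=frozenset({"alice"}),
        groups=frozenset({"analysts"}),
        permissions=frozenset({"read"}),
    )
]
```

With the example policy, `is_allowed(EXAMPLE_POLICIES, "bob", ["analysts"], "/data/finance/2014/q3.csv", "read")` returns `True` because bob belongs to the analysts group, while the same call with `"write"` returns `False`.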

Page 19: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Monitor User Activity with Auditing
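
To give a feel for what monitoring user activity can look like in practice, the sketch below records each access decision as a structured event and then counts denied attempts per user. The field names, the `audit_event` and `denied_by_user` helpers, and the sample values are hypothetical and do not reflect Ranger's actual audit schema.

```python
from datetime import datetime, timezone


def audit_event(user, component, resource, action, allowed):
    """Build one audit record for an access decision (hypothetical schema)."""
    return {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "component": component,   # e.g. "hdfs", "hive", "hbase"
        "resource": resource,
        "action": action,
        "result": "ALLOWED" if allowed else "DENIED",
    }


def denied_by_user(events):
    """Count denied access attempts per user across a list of audit records."""
    counts = {}
    for event in events:
        if event["result"] == "DENIED":
            counts[event["user"]] = counts.get(event["user"], 0) + 1
    return counts
```

A report built this way lets an administrator spot, for example, a user who repeatedly attempts to read a restricted path.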

Page 20: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

New Apache Ranger Features in HDP 2.2

New Components in Centralized Administration
•  Apache Storm Authorization & Auditing
•  Apache Knox Authorization & Auditing

Deeper Integration with the Hadoop Stack
•  Windows Support
•  Integration with the new Hive authorization API, with support for grant/revoke commands
•  Support for grant/revoke commands in HBase

Enterprise Readiness
•  REST APIs for policy manager
•  Store audit logs locally in HDFS

Page 21: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

About Apache Knox API Security for Hadoop

Page 22: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Knox: Securely Share the Data Lake w/ Many Users

•  Securely extends the reach of Hadoop APIs to anyone on any device

•  Serves as a gateway for Hadoop’s REST APIs
   •  Different REST APIs have varying levels of authentication, authorization, SSL and SSO capabilities

•  For enterprise authentication, applies enterprise capabilities to all REST APIs: IDM integration, SSO, OAuth, SAML

•  Avoids exposing the cluster port and hostnames to all users
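
To make the gateway idea concrete, the sketch below builds the URL a client would call for a WebHDFS operation through Knox instead of addressing a cluster node directly. The host name `knox.example.com` is hypothetical, and the port `8443`, the `gateway` path and the `default` topology name are assumed defaults that a real deployment may change; the helper `knox_webhdfs_url` is illustrative only.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, hdfs_path, op,
                     gateway_port=8443, gateway_path="gateway",
                     topology="default", **params):
    """Build a WebHDFS URL that goes through the gateway.

    The client sees only the gateway host and port; internal NameNode and
    DataNode hostnames and ports never appear in the URL.
    """
    query = urlencode({"op": op, **params})
    return (
        f"https://{gateway_host}:{gateway_port}/{gateway_path}/{topology}"
        f"/webhdfs/v1{quote(hdfs_path)}?{query}"
    )


# Listing a directory through the gateway (hypothetical host).
url = knox_webhdfs_url("knox.example.com", "/tmp/data", "LISTSTATUS")
print(url)
# https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp/data?op=LISTSTATUS
```

Because every request takes this one HTTPS entry point, authentication and SSL can be applied at the gateway rather than separately for each service.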

Page 23: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Extend Hadoop API Reach with Knox

[Diagram: access paths into the Hadoop Cluster. Labels: Load Balancer; Knox; Application Tier (App A, App B, App C, App N); Data Ingest; ETL; Falcon, Oozie, Sqoop, Flume; Admin/Operator; Bastion Node; SSH; RPC Call; Data Operator; Business User; Hadoop Admin; JDBC/ODBC; REST/HTTP.]

Page 24: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

New Apache Knox Features in HDP 2.2

•  Use Ambari for Knox install, start, stop and configuration

•  New support for:
   •  YARN REST API
   •  HDFS HA
   •  SSL to Hadoop cluster services (WebHDFS, HBase, Hive and Oozie)

•  Knox Management REST API

•  Integration with Apache Ranger for service level authorization

Page 25: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Demo

Page 26: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Q & A

Page 27: Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

Thank you! Learn more at: hortonworks.com/labs/security/

Register for the remaining 7 Discover HDP 2.2 Webinars

Hortonworks.com/webinars