DESCRIPTION
This presentation accompanied a 30-minute webinar with Balaji Ganesan, Hortonworks senior director of enterprise security strategy, and Vinay Shukla, Hortonworks director of product management. They discussed the features in Hortonworks Data Platform (HDP) 2.2 for delivering comprehensive security, focusing on Apache Ranger and Apache Knox and how the two are integrated in HDP 2.2 to provide fine-grained authorization, auditing and API security that can be centrally administered.
Page 1 © Hortonworks Inc. 2014
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger & Apache Knox
Hortonworks. We do Hadoop.
Page 2 © Hortonworks Inc. 2014
Speakers
Justin Sears
Hortonworks Product Marketing Manager
Vinay Shukla
Hortonworks Director of Product Management
Balaji Ganesan
Hortonworks Senior Director of Enterprise Security Strategy
Page 3 © Hortonworks Inc. 2014
Agenda
• Overview of Security in HDP 2.2:
  § Centralized security with Apache Ranger
  § API security with Apache Knox
• Demo
• Q & A
We’ll move quickly:
• Attendee phone lines are muted
• Text any questions to Vinay Shukla using Webex chat
• Questions will be answered at the end
• Unanswered questions will be answered in an upcoming blog post
Page 4 © Hortonworks Inc. 2014
Big Data, Hadoop & Data Center Re-platforming
Business Drivers
• From reactive analytics to proactive interactions
• Insights that drive competitive advantage & optimal returns
Financial Drivers
• Cost of data systems, as % of IT spend, continues to grow
• Cost advantages of commodity hardware & open source software
Technical Drivers
• Data is growing exponentially & existing systems overwhelmed
• Predominantly driven by NEW types of data that can inform analytics
There is an inequitable balance between vendor and customer in the market
Page 5 © Hortonworks Inc. 2014
New Types of Data, Hadoop Value:
• Clickstream: Capture and analyze website visitors’ data trails and optimize your website
• Sensors: Discover patterns in data streaming automatically from remote sensors and machines
• Server Logs: Research logs to diagnose process failures and prevent security breaches
• Sentiment: Understand how your customers feel about your brand and products, right now
• Geographic: Analyze location-based data to manage operations where they occur
• Unstructured: Understand patterns in files across millions of web pages, emails, and documents
Page 6 © Hortonworks Inc. 2014
A Shift from Reactive to Proactive Interactions
HDP and Hadoop allow organizations to use data to shift interactions from…
…Reactive (Post Transaction) to Proactive (Pre Decision)
• A shift in Advertising: from mass branding to 1x1 targeting
• A shift in Financial Services: from educated investing to automated algorithms
• A shift in Healthcare: from mass treatment to designer medicine
• A shift in Retail: from static branding to real-time personalization
• A shift in Telco: from break then fix to repair before break
Page 7 © Hortonworks Inc. 2014
Enterprise Goals for the Modern Data Architecture
• Consolidate siloed data sets, structured and unstructured
• Central data set on a single cluster
• Multiple workloads across batch, interactive and real time
• Central services for security, governance and operation
• Preserve existing investment in current tools and platforms
• Single view of the customer, product, supply chain
[Diagram: Modern Data Architecture]
• Applications: Business Analytics, Custom Applications, Packaged Applications
• Data System: RDBMS, EDW and MPP alongside Hadoop (YARN: Data Operating System running interactive, real-time and batch workloads on nodes 1 to N, over HDFS (Hadoop Distributed File System))
• Sources: existing systems (CRM, ERP, other) plus Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
Page 8 © Hortonworks Inc. 2014
YARN Transformed Hadoop & Opened a New Era
YARN: The Architectural Center of Hadoop
• Common data platform, many applications
• Support multi-tenant access & processing
• Batch, interactive & real-time use cases
[Diagram: BATCH, INTERACTIVE & REAL-TIME DATA ACCESS on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System). Engines shown: Script (Pig), SQL (Hive), Java/Scala (Cascading), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), In-Memory (Spark), Others (ISV Engines), with Tez and Slider as layers beneath several of the engines.]
Page 9 © Hortonworks Inc. 2014
YARN Extends Hadoop to Other Data Center Leaders
YARN: The Architectural Center of Hadoop
• Common data platform, many applications
• Support multi-tenant access & processing
• Batch, interactive & real-time use cases
• Supports 3rd-party ISV tools (e.g., SAS, Syncsort, Actian)
YARN Ready Applications: Facilitate ongoing innovation and enterprise adoption via an ecosystem of new and existing “YARN Ready” solutions
[Diagram: BATCH, INTERACTIVE & REAL-TIME DATA ACCESS on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System). Engines shown: Script (Pig), SQL (Hive), Java/Scala (Cascading), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), In-Memory (Spark), Others (ISV Engines), with Tez and Slider as layers beneath several of the engines.]
Page 10 © Hortonworks Inc. 2014
Enterprise Hadoop: Central Set of Services
Enables Apache Hadoop to be an Enterprise Data Platform with centralized services for:
• Governance
• Operations
• Security
Everything that plugs into Hadoop inherits these services
[Diagram: enterprise services around the Hadoop data access stack]
SECURITY: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection
GOVERNANCE: Load data and manage according to policy
OPERATIONS: Deploy and effectively manage the platform
• Provision, Manage & Monitor: Ambari, Zookeeper
• Scheduling: Oozie
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines), with Tez and Slider layers, on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System)
Page 11 © Hortonworks Inc. 2014
Hortonworks Development Investment for the Enterprise
Vertical Integration with YARN and HDFS
[Diagram: the Enterprise Hadoop stack from the previous slide: SECURITY, GOVERNANCE and OPERATIONS services (Ambari, Zookeeper, Oozie) around the BATCH, INTERACTIVE & REAL-TIME DATA ACCESS engines (Pig, Hive, Cascading, Storm, Solr, HBase, Accumulo, Spark, ISV Engines, with Tez and Slider layers), all on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System).]
• Ensure engines can run reliably and respectfully in a YARN based cluster
• Implement features throughout the stack to accommodate
Page 12 © Hortonworks Inc. 2014
Hortonworks Development Investment for the Enterprise
Horizontal Integration for Enterprise Services
[Diagram: the Enterprise Hadoop stack from the previous slides: SECURITY, GOVERNANCE and OPERATIONS services (Ambari, Zookeeper, Oozie) around the BATCH, INTERACTIVE & REAL-TIME DATA ACCESS engines (Pig, Hive, Cascading, Storm, Solr, HBase, Accumulo, Spark, ISV Engines, with Tez and Slider layers), all on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System).]
• Ensure consistent enterprise services are applied across the entire Hadoop stack
• Integrate with and extend existing data center solutions for these key requirements
Page 13 © Hortonworks Inc. 2014
Hortonworks Data Platform 2.2
HDP Delivers Enterprise Hadoop
[Diagram: HDP 2.2 component stack]
• BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive), Java/Scala (Cascading), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), In-Memory (Spark), with Tez and Slider layers, on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System)
• GOVERNANCE (Data Workflow, Lifecycle & Governance): Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS
• SECURITY (Authentication, Authorization, Audit, Data Protection): Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox
• OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
• Deployment Choice: Linux, Windows, Cloud
YARN is the architectural center of HDP
• Common data set across all applications
• Batch, interactive & real-time workloads
• Multi-tenant access & processing
Provides comprehensive enterprise capabilities
• Governance
• Security
• Operations
Enables broad ecosystem adoption
• ISVs can plug directly into Hadoop
The widest range of deployment options
• Linux & Windows
• On premises & cloud
Page 14 © Hortonworks Inc. 2014
Hortonworks Data Platform 2.2
HDP Delivers Enterprise Hadoop
[Diagram: HDP 2.2 component stack]
• BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive), Java/Scala (Cascading), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), In-Memory (Spark), with Tez and Slider layers, on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System)
• GOVERNANCE (Data Workflow, Lifecycle & Governance): Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS
• OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
YARN is the architectural center of HDP
• Common data set across all applications
• Batch, interactive & real-time workloads
• Multi-tenant access & processing
Provides comprehensive enterprise capabilities
• Governance
• Security
• Operations
Enables broad ecosystem adoption
• ISVs can plug directly into Hadoop
The widest range of deployment options
• Linux & Windows
• On premises & cloud
SECURITY
• Authentication, Authorization, Audit, Data Protection
• Storage: HDFS
• Resources: YARN
• Access: Hive
• Pipeline: Falcon
• Cluster: Ranger
• Cluster: Knox
Deployment Choice: Linux, Windows, On-Premises, Cloud
Page 15 © Hortonworks Inc. 2014
Apache Ranger for Centralized Security
Page 16 © Hortonworks Inc. 2014
Apache Ranger (formerly Apache Argus)
Central Security Administration, Authorization and Auditing for Hadoop
Page 17 © Hortonworks Inc. 2014
Central Security Administration
• Delivers a ‘single pane of glass’ for the security administrator
• Centralizes administration of security policy
• Ensures consistent coverage across the entire Hadoop stack
Page 18 © Hortonworks Inc. 2014
Set Up Authorization Policies
File level access control with flexible definition
Control user and group permissions
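The idea of file-level policies that grant permissions to users and groups can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Policy` record, its field names, the wildcard matching and the `is_allowed` helper are all hypothetical and are not Apache Ranger's schema, API or evaluation logic.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record; field names are illustrative, not Ranger's schema."""
    name: str
    path_pattern: str              # wildcard pattern, e.g. "/data/finance/*"
    permissions: frozenset         # e.g. frozenset({"read", "write"})
    users: frozenset = frozenset()
    groups: frozenset = frozenset()


def is_allowed(policies, user, user_groups, path, permission):
    """Return True if any policy grants `permission` on `path`
    to `user` directly or to one of the user's groups."""
    for policy in policies:
        if not fnmatchcase(path, policy.path_pattern):
            continue  # policy does not cover this path
        if permission not in policy.permissions:
            continue  # policy does not grant this permission
        if user in policy.users or policy.groups & set(user_groups):
            return True
    return False  # nothing matched: deny by default in this sketch


policies = [
    Policy("finance-read", "/data/finance/*", frozenset({"read"}),
           groups=frozenset({"analysts"})),
    Policy("etl-write", "/data/raw/*", frozenset({"read", "write"}),
           users=frozenset({"etl_svc"})),
]

print(is_allowed(policies, "alice", ["analysts"], "/data/finance/q1.csv", "read"))
print(is_allowed(policies, "alice", ["analysts"], "/data/finance/q1.csv", "write"))
```

Keeping the policies in one list and evaluating them in one function mirrors the "central administration" idea of the preceding slides: a single definition is consulted for every access check, rather than each component keeping its own rules.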
Page 19 © Hortonworks Inc. 2014
Monitor User Activity with Auditing
Page 20 © Hortonworks Inc. 2014
New Apache Ranger Features in HDP 2.2
New Components in Centralized Administration
• Apache Storm Authorization & Auditing
• Apache Knox Authorization & Auditing
Deeper Integration with the Hadoop Stack
• Windows Support
• Integration with the new Hive authorization API, with support for grant/revoke commands
• Support for grant/revoke commands in HBase
Enterprise Readiness
• REST APIs for policy manager
• Store audit logs locally in HDFS
Page 21 © Hortonworks Inc. 2014
About Apache Knox: API Security for Hadoop
Page 22 © Hortonworks Inc. 2014
Knox: Securely Share the Data Lake w/ Many Users
• Securely extends the reach of Hadoop APIs to anyone on any device
• Serves as a gateway for Hadoop’s REST APIs
• Different REST APIs have varying levels of authentication, authorization, SSL and SSO capabilities
• For enterprise authentication, applies enterprise capabilities to all REST APIs: IDM integration, SSO, OAuth, SAML
• Avoids exposing the cluster ports and hostnames to all users
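The gateway idea in the bullets above can be illustrated with a small Python sketch that maps an internal service URL to the single gateway address a client would call instead. The hostnames are made up, and the defaults used here (port 8443, a `gateway` context path and a topology named `default`) are assumptions based on common Knox setups; real Knox deployments apply configurable per-service rewrite rules, so treat `to_gateway_url` as a hypothetical helper rather than Knox's behavior.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gateway_url(internal_url, gateway_host, topology="default",
                   gateway_port=8443, gateway_path="gateway"):
    """Map an internal cluster URL to a gateway URL (illustrative only).

    The internal scheme, hostname and port are discarded, so a client
    only ever sees the gateway address; the service path and query
    string are preserved under /<gateway_path>/<topology>.
    """
    parts = urlsplit(internal_url)
    path = f"/{gateway_path}/{topology}{parts.path}"
    netloc = f"{gateway_host}:{gateway_port}"
    return urlunsplit(("https", netloc, path, parts.query, ""))


# Hypothetical internal WebHDFS endpoint on a NameNode host.
internal = "http://namenode.internal:50070/webhdfs/v1/tmp?op=LISTSTATUS"
print(to_gateway_url(internal, "knox.example.com"))
```

Because every service is reached through the same HTTPS host and port, authentication and SSL can be enforced in one place, and the internal hostnames and ports never need to be shared with end users.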
Page 23 © Hortonworks Inc. 2014
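Extend Hadoop API Reach with Knox
[Diagram: users (Business User, Data Operator, Hadoop Admin, Admin/Operator) and an application tier (App A, App B, App C, App N) accessing a Hadoop cluster. REST/HTTP and JDBC/ODBC traffic passes through a load balancer and Knox; SSH access goes through a bastion node, with RPC calls into the cluster. Data ingest and ETL use Falcon, Oozie, Sqoop and Flume.]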
Page 24 © Hortonworks Inc. 2014
New Apache Knox Features in HDP 2.2
• Use Ambari for Knox install, start, stop and configuration
• New support for:
  § YARN REST API
  § HDFS HA
  § SSL to Hadoop cluster services (WebHDFS, HBase, Hive and Oozie)
• Knox Management REST API
• Integration with Apache Ranger for service level authorization
Page 25 © Hortonworks Inc. 2014
Demo
Page 26 © Hortonworks Inc. 2014
Q & A
Page 27 © Hortonworks Inc. 2014
Thank you! Learn more at: hortonworks.com/labs/security/
Register for the remaining 7 Discover HDP 2.2 Webinars
Hortonworks.com/webinars