Curb Your Insecurity with HDP: Tips for a Secure Cluster (with Spark too)
Hadoop Summit – San Jose, June 29th, 2016


# Hortonworks Inc. 2011 2016. All Rights Reserved

Hortonworks: Powering the Future of Data

Pardeep Kumar, Sr. Systems Architect, NA Prof. Services

4+ years in Hadoop. Helps Fortune 500 customers succeed in their Hadoop journey: sets up, implements, migrates and secures some of the largest clusters in North America. Security and migration SME, HCC Guru. Loves Hadoop, cricket and Kerberos ;)

[email protected]

@hadooptutor

linkedin.com/in/pardeepkumarmishra

Ancil McBarnett, Sr. Solutions Engineer, NorthEast

Helps organizations design, implement, operate and consume Hadoop and big data solutions. Specializes in security and Hive tuning. HCC Guru.

Loves cricket and DJ Bravo's "Champion" :D

[email protected]

@mcbkingdom

linkedin.com/in/mcbkingdom


Hadoop Security in 4 Steps

Comprehensive Approach to Security

In order to protect any data system, you must implement the following:
- Administration: central management and consistent security. How do I set policy across the entire cluster?
- Authentication: authenticate users and systems. Who am I? Prove it.
- Authorization: provision access to data. What can I do?
- Audit: maintain a record of data access. What did I do?
- Data Protection: protect data at rest and in motion. How can I encrypt at rest and over the wire?


HDP Security: Comprehensive, Complete, Extensible

Perimeter-level security: network security (e.g. firewalls) and Apache Knox (e.g. gateways)

Authentication: LDAP/AD, Kerberos

Data protection: encrypts data in motion and data at rest with HDFS TDE and Ranger KMS; refer to partner encryption solutions for broader needs

Authorization & audit: consistent authorization controls across all Apache components within HDP with Apache Ranger
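The four layers above can be read as a checklist. The sketch below is hypothetical: the component labels are plain strings chosen for illustration, not names from Ambari or any HDP API. The slide pairs Kerberos with LDAP/AD; this sketch treats Kerberos as the component that satisfies the authentication layer.

```python
from typing import Dict, List, Set

# Hypothetical checklist: each HDP security layer from the slide, mapped to
# components that satisfy it. The labels are illustrative strings only.
LAYERS: Dict[str, Set[str]] = {
    "Perimeter": {"firewall", "knox"},
    "Authentication": {"kerberos"},
    "Authorization & Audit": {"ranger"},
    "Data Protection": {"hdfs-tde", "ranger-kms"},
}


def missing_layers(enabled: Set[str]) -> List[str]:
    """Return, in slide order, every layer with no enabled component."""
    return [layer for layer, components in LAYERS.items()
            if not components & enabled]


print(missing_layers({"knox", "kerberos", "ranger"}))  # ['Data Protection']
```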

Authentication with Kerberos

Kerberos is a necessary evil: just do it!


Security Without Kerberos

Configure Kerberos: Ambari Wizard

Security With Kerberos
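After Kerberos is enabled, every user and service is identified by a principal of the form primary/instance@REALM. Hadoop maps principals to short OS user names through the `hadoop.security.auth_to_local` rules in core-site.xml. The sketch below approximates only the built-in DEFAULT rule: a principal in the cluster's default realm maps to its first component. The principals and realm are hypothetical, and real deployments usually add custom rules.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "nn"
    instance: Optional[str]  # usually a host name for service principals
    realm: str               # Kerberos realm, e.g. "EXAMPLE.COM"


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its components."""
    identity, at, realm = name.rpartition("@")
    if not at or not identity or not realm:
        raise ValueError(f"not a fully qualified principal: {name!r}")
    primary, slash, instance = identity.partition("/")
    return Principal(primary, instance if slash else None, realm)


def default_short_name(name: str, default_realm: str) -> str:
    """Approximate Hadoop's DEFAULT auth_to_local rule: principals in the
    default realm map to their first component; others need explicit rules."""
    principal = parse_principal(name)
    if principal.realm != default_realm:
        raise ValueError(f"no auth_to_local rule matches {name!r}")
    return principal.primary


print(default_short_name("nn/master1.example.com@EXAMPLE.COM", "EXAMPLE.COM"))  # nn
```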

Apache Ranger

HDFS File Security

Hive Database and Table Security

Authorization and Audit

Authorization: fine-grained access control
- HDFS: folder, file
- Hive: database, table, column
- HBase: table, column family, column
- Storm, Knox and more

Audit: extensive user access auditing in HDFS, Hive and HBase, recording:
- IP address
- Resource type/resource
- Timestamp
- Access granted or denied

Control access into the system, with flexibility in defining policies.
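Here is a toy model of authorization combined with audit. It is not Ranger's policy engine or API. A request is allowed only if some policy matches its resource, user and access type. Every decision, allowed or denied, is recorded with the fields listed above (IP address, resource type/resource, timestamp, result). The policy shape, the glob matching and all names are assumptions. Ranger itself also supports groups, deny conditions and, for HDFS, fallback to native HDFS permissions, none of which is modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from typing import FrozenSet, List


@dataclass(frozen=True)
class Policy:
    resource_type: str        # e.g. "hdfs-path" or "hive-table" (toy labels)
    pattern: str              # glob over resource names, e.g. "/data/finance/*"
    users: FrozenSet[str]
    accesses: FrozenSet[str]  # e.g. {"read", "write"}


@dataclass(frozen=True)
class AuditEvent:
    user: str
    ip: str
    resource_type: str
    resource: str
    access: str
    timestamp: str
    allowed: bool


def check_access(policies: List[Policy], audit_log: List[AuditEvent], user: str,
                 ip: str, resource_type: str, resource: str, access: str) -> bool:
    """Default-deny check: allow only if a policy matches; audit every decision."""
    allowed = any(
        p.resource_type == resource_type
        and fnmatchcase(resource, p.pattern)
        and user in p.users
        and access in p.accesses
        for p in policies
    )
    audit_log.append(AuditEvent(user, ip, resource_type, resource, access,
                                datetime.now(timezone.utc).isoformat(), allowed))
    return allowed


policies = [Policy("hdfs-path", "/data/finance/*", frozenset({"john"}), frozenset({"read"}))]
log: List[AuditEvent] = []
check_access(policies, log, "john", "10.0.0.5", "hdfs-path", "/data/finance/q1.csv", "read")   # True
check_access(policies, log, "john", "10.0.0.5", "hdfs-path", "/data/finance/q1.csv", "write")  # False
```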

REST API Security with Apache Knox

Hadoop REST APIs
- Useful for connecting to Hadoop from outside the cluster
- Useful when more client language flexibility is required, i.e. a Java binding is not an option

Challenges
- The client must have knowledge of the cluster topology
- Ports must be opened outside the cluster (in some cases, on every host)

Services and their APIs
- WebHDFS: supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
- WebHCat: job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
- Hive: Hive REST API operations.
- HBase: HBase REST API operations.
- Oozie: job submission and management, and Oozie administration.
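The WebHDFS entry is easiest to understand as a URL. Here is a minimal sketch. It assumes the NameNode host name used elsewhere in this deck and a cluster with simple (non-Kerberos) authentication; on a Kerberized cluster the client authenticates with SPNEGO instead of the `user.name` parameter. The `/webhdfs/v1` prefix and the `op` parameter come from the WebHDFS REST API, while the helper name is hypothetical. The client has to know the NameNode's host and port, which is exactly the topology challenge listed above.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base: str, path: str, op: str, **params: str) -> str:
    """Build <base>/webhdfs/v1<path>?op=<OP>[&extra params] for the WebHDFS REST API."""
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute")
    query = urlencode({"op": op, **params})
    return f"{base.rstrip('/')}/webhdfs/v1{quote(path)}?{query}"


# Direct access to a (hypothetical) NameNode on a cluster without Kerberos;
# user.name is only honored with simple authentication.
print(webhdfs_url("http://namenode-host:50070", "/tmp", "LISTSTATUS",
                  **{"user.name": "johndoe"}))
# http://namenode-host:50070/webhdfs/v1/tmp?op=LISTSTATUS&user.name=johndoe
```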

API Security with Knox

Incubated and led by Hortonworks, Apache Knox extends the reach of the Hadoop REST API without Kerberos complexities:
- Authentication integrated with existing systems to simplify identity maintenance: SSO integration (SiteMinder and OAM), LDAP and AD integration
- Single, simple point of access for a cluster: Kerberos encapsulation, single Hadoop access point, REST API hierarchy, consolidated API calls, multi-cluster support
- Central controls ensure consistency across one or more clusters: eliminates the SSH edge node, central API management, central audit control, service-level authorization


Hadoop REST API with Knox

| Service | Direct URL | Knox URL |
|---|---|---|
| WebHDFS | http://namenode-host:50070/webhdfs | https://knox-host:8443/webhdfs |
| WebHCat | http://webhcat-host:50111/templeton | https://knox-host:8443/templeton |
| Oozie | http://ooziehost:11000/oozie | https://knox-host:8443/oozie |
| HBase | http://hbasehost:60080 | https://knox-host:8443/hbase |
| Hive | http://hivehost:10001/cliservice | https://knox-host:8443/hive |
| YARN | http://yarn-host:yarn-port/ws | https://knox-host:8443/resourcemanager |

Without Knox, masters could be on many different hosts. With Knox: one host, one port, consistent paths, and SSL configured at one host.
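The Knox URLs in the table are abbreviated. In a typical Knox deployment, the gateway path (by default `gateway`) and a topology name sit between the port and the service segment, for example `https://knox-host:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS`. The sketch below assumes a topology named `default`, and the helper and its parameters are hypothetical. It shows the "one host, one port, consistent paths" property, since only the service segment changes from one service to the next.

```python
from urllib.parse import urlparse

# Service path segments taken from the Knox URL column of the table above.
KNOX_SERVICE_PATHS = {
    "webhdfs": "webhdfs",
    "webhcat": "templeton",
    "oozie": "oozie",
    "hbase": "hbase",
    "hive": "hive",
    "yarn": "resourcemanager",
}


def knox_url(service: str, resource: str = "", host: str = "knox-host",
             port: int = 8443, gateway_path: str = "gateway",
             topology: str = "default") -> str:
    """Build a Knox gateway URL; the topology name "default" is an assumption."""
    base = f"https://{host}:{port}/{gateway_path}/{topology}/{KNOX_SERVICE_PATHS[service]}"
    return f"{base}/{resource.lstrip('/')}" if resource else base


print(knox_url("webhdfs", "v1/tmp?op=LISTSTATUS"))
# https://knox-host:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS

# Every service is reached through the same host and port.
assert {urlparse(knox_url(s)).netloc for s in KNOX_SERVICE_PATHS} == {"knox-host:8443"}
```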


Hadoop REST API Security: Drill-Down

Figure: REST client; firewalls; load balancer (LB) and Knox gateway (GW) instances in the DMZ; enterprise identity provider (LDAP/AD); edge node / Hadoop CLIs (RPC); Hadoop Cluster 1 and Hadoop Cluster 2, each with masters (RM, NN, WebHCat, Oozie), slaves (DN, NM), HS2 and HBase, reached over HTTP.

Note: the arrows to the Hadoop clusters are simplifications.

Data Protection


HDP allows you to apply data protection policy at different layers across the Hadoop stack:

| Layer | What? | How? |
|---|---|---|
| Storage and access | Encrypt data while it is at rest | HDFS Transparent Data Encryption, partners, HBase encryption, OS-level encryption |
| Transmission | Encrypt data as it moves | SSL, SASL, RPC |


Points of Communication

Figure: channels over which data moves between clients, cluster nodes and the Hadoop cluster: WebHDFS, DataTransferProtocol, RPC, MapReduce shuffle between nodes, and JDBC/ODBC.
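Each channel in the figure has its own switch in the Hadoop configuration. Here is a minimal sketch that renders the relevant properties as Hadoop-style XML. The property names are standard Hadoop and Hive settings. The channel grouping, the choice of `3des` and the helper function are assumptions made for this sketch. The SSL-based settings also need keystores and truststores, which are not shown. On an Ambari-managed cluster, these values would normally be set through Ambari rather than by editing files.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Wire-encryption settings per config file, commented with the channel they protect.
WIRE_ENCRYPTION: Dict[str, Dict[str, str]] = {
    "core-site.xml": {
        "hadoop.rpc.protection": "privacy",             # RPC: SASL with encryption
    },
    "hdfs-site.xml": {
        "dfs.encrypt.data.transfer": "true",            # DataTransferProtocol
        "dfs.encrypt.data.transfer.algorithm": "3des",  # or "rc4"
        "dfs.http.policy": "HTTPS_ONLY",                # WebHDFS / HTTP
    },
    "mapred-site.xml": {
        "mapreduce.shuffle.ssl.enabled": "true",        # MapReduce shuffle
    },
    "hive-site.xml": {
        "hive.server2.use.SSL": "true",                 # JDBC/ODBC to HiveServer2
    },
}


def to_hadoop_xml(props: Dict[str, str]) -> str:
    """Render properties as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


for filename, props in WIRE_ENCRYPTION.items():
    print(f"<!-- {filename} -->")
    print(to_hadoop_xml(props))
```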

Data Protection - HDFS Encryption

Figure: HDFS clients and the NameNode work with encrypted files in an HDFS encryption zone. Keys are served by a Key Management System (KMS) through the KeyProvider API, which is the integration point for security partners (including stateless key management).

- Leverage native HDFS Transparent Data Encryption, or commercial solutions such as Protegrity.
- Hortonworks is collaborating with partners to deliver enterprise-scale key management and more choices to customers:
  - Open source key management: the open source KMS with Ranger
  - Or partner with commercial KMS solutions, e.g. Voltage KMS: joint engineering resources, with Voltage Stateless Key Management integrated with the KeyProvider API
- Only HDP offers both open source and commercial choices for key management.


Demo: Transparent Data Encryption
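A TDE demo of this kind usually comes down to a handful of standard Hadoop CLI calls. Here is a minimal sketch that prints them (dry run) or executes them. It assumes the client's key provider configuration points at Ranger KMS and that each step runs as a user with the needed privileges: creating and listing zones requires the HDFS superuser, and key creation must be allowed by Ranger KMS policies. The key name, zone path and helper functions are hypothetical.

```python
import shlex
import subprocess
from typing import List


def tde_demo_commands(key_name: str, zone_path: str) -> List[List[str]]:
    """CLI steps to create a key and an (empty) HDFS encryption zone."""
    return [
        ["hadoop", "key", "create", key_name],       # key created via the key provider (KMS)
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],  # zone directory must be empty
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        ["hdfs", "crypto", "-listZones"],            # verify the new zone
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))  # show what would run
        else:
            subprocess.run(argv, check=True)


run(tde_demo_commands("demokey", "/secure"))
```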

Securing Spark Deployments

Spark: Authentication

Spark leverages Kerberos on YARN.

Figure: John Doe runs kinit against the KDC and gets a service ticket (ST) for Spark, then uses the Spark ST to submit a Spark job to the Hadoop cluster. Spark gets a NameNode (NN) service ticket, YARN launches the Spark AM and executors using John Doe's identity, and executors read from HDFS using John Doe's delegation token.

John Doe first authenticates to Kerberos, then submits a Spark job:

kinit -kt /etc/security/keytabs/johndoe.keytab [email protected]

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10

Spark Authorization

Figure: John Doe's client gets a service ticket for Spark from the KDC and uses it to submit a Spark job to the YARN cluster. Spark gets a NameNode (NN) service ticket, and executors read from HDFS. Ranger checks: Can John launch this job? Can John read this file?

- Controlling HDFS authorization is easy and done.
- Controlling Hive row/column-level authorization in Spark is a work in progress.

Spark Channel Encryption: Example

| Channel | Protection |
|---|---|
| Shuffle data (shuffle block transfer) | spark.authenticate.enableSaslEncryption = true |
| Control/RPC | spark.authenticate = true; leverage YARN to distribute keys |
| Read/write data (FS) | Depends on the data source; for HDFS, RPC (RC4 or 3DES), or SSL for WebHDFS |
| Broadcast, file download | spark.ssl.enabled = true |
| NM > executor | Leverages YARN-based SSL |
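The Spark properties on this slide can be passed per job as `--conf` flags, or set once in spark-defaults.conf. Here is a minimal sketch that turns them into spark-submit arguments. The helper is hypothetical. `spark.ssl.enabled` also needs keystore and truststore settings, which are not shown.

```python
from typing import Dict, List

# Channel-encryption properties named on the slide.
SPARK_CHANNEL_ENCRYPTION: Dict[str, str] = {
    "spark.authenticate": "true",                       # control/RPC; secret distributed by YARN
    "spark.authenticate.enableSaslEncryption": "true",  # shuffle block transfer
    "spark.ssl.enabled": "true",                        # broadcast, file download
}


def conf_args(settings: Dict[str, str]) -> List[str]:
    """Turn settings into repeated `--conf key=value` spark-submit arguments."""
    args: List[str] = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(conf_args(SPARK_CHANNEL_ENCRYPTION)))
```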

Gotchas with Spark Security
- Client -> Spark Thrift Server (STS) -> Spark executors: no identity propagation on the second hop (https://issues.apache.org/jira/browse/SPARK-5159).
  - This forces STS to run as the hive user so it can read all data, which reduces security.
  - Workaround: use SparkSQL via the shell or the programmatic API.
- SparkSQL granular security is unavailable.
  - Ranger integration will solve this problem, bringing row/column-level security and masking features to SparkSQL (refer to the talk in Room 210A on security in Spark and Hive).
- Spark + HBase with Kerberos: issue fixed in Spark 1.4 (SPARK-6918).
- Spark Streaming + Kafka + Kerberos + SSL: issues fixed in HDP 2.4.x.
- Spark jobs running longer than 72 hours: Kerberos token not renewed; fixed in Spark 1.5+.
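For the long-running-job gotcha, spark-submit on YARN accepts `--principal` and `--keytab`. These options let Spark log in again from the keytab and obtain new delegation tokens, instead of depending on tickets that expire. Here is a sketch that assumes a Spark version with these options. The class, jar and principal are hypothetical, and the keytab path mirrors the kinit example earlier.

```python
import shlex
import subprocess
from typing import List, Sequence


def long_running_submit_argv(main_class: str, app_jar: str, principal: str,
                             keytab: str, app_args: Sequence[str] = ()) -> List[str]:
    """spark-submit command that ships a keytab so Spark on YARN can log in
    again and obtain fresh delegation tokens for a long-running job."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--principal", principal,  # Kerberos principal the job runs as
        "--keytab", keytab,        # copied to the AM for periodic re-login
        "--class", main_class,
        app_jar,
        *app_args,
    ]


def submit(argv: Sequence[str]) -> None:
    """Run the command; requires a Spark client configured for the secure cluster."""
    subprocess.run(list(argv), check=True)


argv = long_running_submit_argv(
    "com.example.StreamingJob",              # hypothetical application class
    "streaming-job.jar",                     # hypothetical jar
    "johndoe@EXAMPLE.COM",                   # hypothetical principal
    "/etc/security/keytabs/johndoe.keytab",
)
print(shlex.join(argv))
```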

Questions?