View Hadoop Administration Course at www.edureka.co/hadoop-admin
Advanced Security in Hadoop Cluster
www.edureka.co/hadoop-admin | Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
Objectives
At the end of this module, you will be able to understand:
» Hadoop Cluster introduction
» Recommended Configuration for cluster
» Hadoop cluster running modes
» Hadoop Security with Kerberos
» HDFS Security with ACLs (Access Control Lists)
» Hadoop Admin Responsibilities
» Demo on Security
Hadoop Core Components
Hadoop 2.x Core Components
» HDFS (Storage): Active NameNode and Standby NameNode (Master), DataNode (Slave)
» YARN (Processing): Resource Manager (Master), Node Manager (Slave)
Hadoop Cluster: A Typical Use Case
» Active NameNode: RAM 64 GB; Hard disk 1 TB; Processor: Xeon with 8 cores; Ethernet: 3 x 10 GB/s; OS: 64-bit CentOS; Power: Redundant Power Supply
» Secondary NameNode: RAM 32 GB; Hard disk 1 TB; Processor: Xeon with 4 cores; Ethernet: 3 x 10 GB/s; OS: 64-bit CentOS; Power: Redundant Power Supply
» StandBy NameNode (optional): RAM 64 GB; Hard disk 1 TB; Processor: Xeon with 8 cores; Ethernet: 3 x 10 GB/s; OS: 64-bit CentOS; Power: Redundant Power Supply
» DataNodes (each): RAM 16 GB; Hard disk 6 x 2 TB; Processor: Xeon with 2 cores; Ethernet: 3 x 10 GB/s; OS: 64-bit CentOS
Slave Nodes: Recommended Configuration
Higher-performance vs lower-performance components
Save the Money, Buy more Nodes!
“A cluster with more nodes performs better than one with fewer, slightly faster nodes”
General Configuration
General ‘base’ configuration for a slave node (depends on requirements):
» 4 x 1 TB or 2 TB hard drives, in a JBOD (Just a Bunch Of Disks) configuration
» Do not use RAID!
» 2 x Quad-core CPUs
» 24-32 GB RAM
» Gigabit Ethernet
Special Configuration
Multiples of (1 hard drive + 2 cores + 6-8 GB RAM) generally work well for many types of applications
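The "multiples of (1 hard drive + 2 cores + 6-8 GB RAM)" rule can be checked with a little arithmetic. The sketch below is only an illustration: the function name `balanced_units` and its parameters are hypothetical, and it assumes the scarcest resource limits how many units a node provides.

```python
def balanced_units(drives, cores, ram_gb, ram_per_unit_gb=6):
    """Count how many (1 drive + 2 cores + N GB RAM) units fit in one node.

    ram_per_unit_gb is the N from the slide's 6-8 GB range.
    The scarcest resource decides the result.
    """
    if not 6 <= ram_per_unit_gb <= 8:
        raise ValueError("the slide suggests 6-8 GB of RAM per unit")
    return min(drives, cores // 2, ram_gb // ram_per_unit_gb)


# The 'base' slave node: 4 drives, 2 x quad-core CPUs (8 cores), 24-32 GB RAM.
print(balanced_units(drives=4, cores=8, ram_gb=24, ram_per_unit_gb=6))  # 4
print(balanced_units(drives=4, cores=8, ram_gb=32, ram_per_unit_gb=8))  # 4
```

Under these assumptions the base configuration works out to exactly four units at either end of the RAM range, so no single resource sits idle.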
Hadoop Cluster Modes
Hadoop can run in any of the following three modes:
» Standalone (or Local) Mode: no daemons, everything runs in a single JVM; suitable for running MapReduce programs during development; has no DFS
» Pseudo-Distributed Mode: Hadoop daemons run on the local machine
» Fully-Distributed Mode: Hadoop daemons run on a cluster of machines
Security issues in Hadoop Cluster
Unauthorized clients can impersonate authorized users and access the cluster
Clients can fetch blocks directly from the DataNodes, bypassing the NameNode
Data packets sent by DataNodes to clients can be eavesdropped
Not all users should have access to sensitive data
There is no user verification for MapReduce code execution, so malicious users could submit a job
Insecure Network Transport
No Message level security
Hadoop security considerations
Authentication
Authorization
Access control
Data masking and encryption
Network security
Integrity
Confidentiality
Audits and event monitoring
Kerberos to the rescue
Network authentication protocol
Developed at MIT in the mid 1980s
Easy for administrators to manage passwords by storing them centrally
Enhance security by ensuring no clear text passwords are transmitted
Allow users to access different services with the same password
Available as open source or in supported commercial software
Kerberos Design Requirements
Interactions between hosts and clients should be encrypted.
Must be convenient for users (or they won’t use it).
Protect against intercepted credentials.
Kerberos is based on the Secret-Key Distribution Model
- Keys are the basis of authentication in Kerberos
- A key is typically a short sequence of bytes
- The same key is used to both encrypt and decrypt
Kerberos Components & Terminology
Kerberos Client
Kerberos Server
Kerberos Key Distribution Center ( KDC )
Authentication Server ( AS )
Ticket-Granting Server ( TGS )
Users and Services in a Kerberos realm are known as Principals.
Kerberos to the rescue
Kerberos Integration
» User Authentication
» User and Group access control list at cluster level
» Tokens: Delegation, Job, Block Access
» Simple Authentication and Security Layer (SASL) with RPC digest mechanism
The Client, the Kerberos Key Distribution Center (Authentication Server, Ticket Granting Server) and the Server interact in three stages:
1: Authentication (Get TGT)
2: Authorization (Get Service Ticket)
3: Service Request (Start Service Session)
Kerberos to the rescue
Participants: Client, Kerberos Key Distribution Center (Authentication Server, Ticket Granting Server), Server
1. Client requests a TGT from the Authentication Server (Auth)
2. Authentication Server responds with an encrypted session key + TGT (TGT + Sk1)
3. Client requests a Service Ticket from the Ticket Granting Server by providing the TGT (Auth + TGT)
4. Ticket Granting Server grants an encrypted session key and ticket for service access (TGT + Sk2)
5. Client authenticates to the Server with the Service Ticket (Auth + TGT)
6. Server responds with an encrypted timestamp (Sk2 + Auth)
Legend: Auth -> Authenticator; TGT -> Ticket Granting Ticket; Sk1, Sk2 -> Session Keys
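The six-step exchange can be traced with a toy model. This is a sketch, not real Kerberos and not real cryptography: the `Sealed` class merely stands in for symmetric encryption, and every class, method and principal name here (`KDC`, `Service`, `run_exchange`, "alice", "hdfs") is hypothetical.

```python
import secrets


class Sealed:
    """Toy stand-in for symmetric encryption: only the same key opens it."""

    def __init__(self, key, payload):
        self._key = key
        self._payload = payload

    def open(self, key):
        if key != self._key:
            raise PermissionError("wrong key")
        return self._payload


class KDC:
    def __init__(self):
        self.tgs_key = secrets.token_bytes(16)  # known only to the KDC
        self.keys = {}                          # principal -> long-term secret key

    def add_principal(self, name):
        self.keys[name] = secrets.token_bytes(16)
        return self.keys[name]

    def request_tgt(self, user):
        # Steps 1-2: return Sk1 sealed for the user, plus a TGT sealed for the TGS.
        sk1 = secrets.token_bytes(16)
        tgt = Sealed(self.tgs_key, {"user": user, "key": sk1})
        return Sealed(self.keys[user], sk1), tgt

    def request_service_ticket(self, tgt, authenticator, service):
        # Steps 3-4: check the authenticator against the TGT, then issue Sk2.
        granted = tgt.open(self.tgs_key)
        if authenticator.open(granted["key"])["user"] != granted["user"]:
            raise PermissionError("authenticator does not match TGT")
        sk2 = secrets.token_bytes(16)
        ticket = Sealed(self.keys[service], {"user": granted["user"], "key": sk2})
        return Sealed(granted["key"], sk2), ticket


class Service:
    def __init__(self, key):
        self.key = key

    def accept(self, ticket, authenticator):
        # Steps 5-6: open the ticket, check the authenticator, echo the timestamp.
        granted = ticket.open(self.key)
        auth = authenticator.open(granted["key"])
        if auth["user"] != granted["user"]:
            raise PermissionError("authenticator does not match ticket")
        return Sealed(granted["key"], auth["timestamp"])


def run_exchange(timestamp=1000):
    kdc = KDC()
    user_key = kdc.add_principal("alice")
    service = Service(kdc.add_principal("hdfs"))

    sealed_sk1, tgt = kdc.request_tgt("alice")
    sk1 = sealed_sk1.open(user_key)  # the password-derived key never leaves the client

    auth = Sealed(sk1, {"user": "alice", "timestamp": timestamp})
    sealed_sk2, ticket = kdc.request_service_ticket(tgt, auth, "hdfs")
    sk2 = sealed_sk2.open(sk1)

    auth = Sealed(sk2, {"user": "alice", "timestamp": timestamp})
    return service.accept(ticket, auth).open(sk2)


print(run_exchange())  # 1000
```

In this model the client never opens the TGT or the service ticket, and only session keys and sealed tickets are passed between parties.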
Kerberos advantages
A password never travels over the network. Only time-sensitive tickets travel over the network.
Passwords or secret keys are only known to the KDC and the principal.
Kerberos allows passwords or secret keys to be stored in a centralized, LDAP-compliant credential store. This makes it easy for administrators to manage the system and the users.
Servers don't have to store any tickets or any client-specific details to authenticate a client.
HDFS Permissions ( ACLs )
HDFS supports a permission model equivalent to traditional Unix permissions.
For each file or directory, permissions are managed for a set of 3 distinct user classes: Owner, Group, Others
There are 3 different permissions controlled for each user class: Read, Write, Execute
For files : The r permission is required to read the file, and the w permission is required to write or append to the file.
For directories : the r permission is required to list the contents of the directory, the w permission is required to create or delete files or directories, and the x permission is required to access a child of the directory.
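The file and directory rules above amount to a small lookup table. The sketch below assumes a nine-character symbolic permission string such as `rwxr-x---`; the operation names and the functions `class_permissions` and `allows` are hypothetical labels chosen for illustration.

```python
# Permission letter required for each operation, as described above.
REQUIRED = {
    ("file", "read"): "r",
    ("file", "write"): "w",
    ("file", "append"): "w",
    ("directory", "list"): "r",
    ("directory", "create"): "w",
    ("directory", "delete"): "w",
    ("directory", "access_child"): "x",
}


def class_permissions(symbolic):
    """Split e.g. 'rwxr-x---' into the owner, group and others triplets."""
    if len(symbolic) != 9:
        raise ValueError("expected 9 characters, e.g. 'rwxr-x---'")
    return {
        "owner": symbolic[0:3],
        "group": symbolic[3:6],
        "others": symbolic[6:9],
    }


def allows(symbolic, user_class, kind, operation):
    """True if the given user class holds the permission the operation needs."""
    return REQUIRED[(kind, operation)] in class_permissions(symbolic)[user_class]


print(allows("rwxr-x---", "group", "directory", "list"))    # True
print(allows("rwxr-x---", "group", "directory", "create"))  # False
print(allows("rw-r-----", "others", "file", "read"))        # False
```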
HDFS Permissions ( ACLs )
Each client process that accesses HDFS has a two-part identity composed of the user name, and groups list.
Whenever HDFS must do a permissions check for a file or directory foo accessed by a client process:
1. If the user name matches the owner of foo, then the owner permissions are tested
2. Else if the group of foo matches any member of the groups list, then the group permissions are tested
3. Otherwise the other permissions of foo are tested.
4. If a permissions check fails, the client operation fails.
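The check order above can be sketched as follows. This is a minimal model of the rule as stated, not HDFS source code; `Inode`, `permitted` and the octal mode encoding (as in Unix `chmod`) are assumptions made for illustration.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class Inode:
    owner: str
    group: str
    mode: int  # Unix-style octal mode, e.g. 0o640


def permitted(inode, user, groups, wanted):
    """Pick exactly one user class, then test the wanted bits against it."""
    if user == inode.owner:            # 1. owner permissions
        bits = (inode.mode >> 6) & 0o7
    elif inode.group in groups:        # 2. group permissions
        bits = (inode.mode >> 3) & 0o7
    else:                              # 3. other permissions
        bits = inode.mode & 0o7
    return (bits & wanted) == wanted   # 4. False means the operation fails


foo = Inode(owner="alice", group="analysts", mode=0o640)
print(permitted(foo, "alice", ["staff"], READ | WRITE))  # True
print(permitted(foo, "bob", ["analysts"], WRITE))        # False
print(permitted(foo, "carol", ["guests"], READ))         # False
```

Only one class is ever tested: an owner whose owner bits deny access is refused even if the group bits would allow it.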
ACLs Shell Commands
hdfs dfs -getfacl [-R] <path>
Displays the Access Control Lists (ACLs) of files and directories. If a directory has a default ACL, then getfacl also displays the default ACL.
hdfs dfs -setfacl [-R] [-b |-k -m |-x <acl_spec> <path>] |[--set <acl_spec> <path>]
Sets Access Control Lists (ACLs) of files and directories.
hdfs dfs -ls <args>
The output of ls will append a ‘+’ character to the permissions string of any file or directory that has an ACL.
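When scripting around these commands, two small helpers can be handy. The sketch below does not call HDFS. It assumes `<acl_spec>` is a comma-separated list of entries of the form `[default:]type:name:perms` (my understanding of the HDFS entry syntax; the slides do not spell it out), and the function names are hypothetical.

```python
def has_acl(permission_string):
    """True if an ls permissions string carries the trailing '+' ACL marker."""
    return permission_string.endswith("+")


def parse_acl_spec(acl_spec):
    """Parse an ACL spec such as 'user:bruce:rwx,group::r--' into dicts."""
    entries = []
    for raw in acl_spec.split(","):
        fields = raw.strip().split(":")
        is_default = fields[0] == "default"
        if is_default:
            fields = fields[1:]
        if len(fields) != 3:
            raise ValueError(f"malformed ACL entry: {raw!r}")
        entry_type, name, perms = fields
        if entry_type not in ("user", "group", "mask", "other"):
            raise ValueError(f"unknown ACL entry type: {entry_type!r}")
        # Each position must be its own letter (r, w, x) or '-'.
        if len(perms) != 3 or any(p not in (flag, "-") for p, flag in zip(perms, "rwx")):
            raise ValueError(f"bad permissions: {perms!r}")
        entries.append({
            "default": is_default,
            "type": entry_type,
            "name": name or None,  # an empty name means the owning user or group
            "perms": perms,
        })
    return entries


print(has_acl("-rw-r-----+"))  # True
print(parse_acl_spec("user:bruce:rwx,group::r--"))
```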
Hadoop Admin Responsibilities
Responsible for implementation and administration of Hadoop infrastructure.
Testing HDFS, Hive, Pig and MapReduce access for Applications.
Cluster maintenance tasks like Backup, Recovery, Upgrade, Patching.
Performance tuning and Capacity planning for Clusters.
Monitor Hadoop cluster and deploy security.
How it Works?
» LIVE Online Class
» Class Recording in LMS
» 24/7 Post Class Support
» Module Wise Quiz
» Project Work
» Verifiable Certificate
Questions
Course Topics
Module 1 » Hadoop Cluster Administration
Module 2 » Hadoop Architecture and Cluster setup
Module 3 » Hadoop Cluster: Planning and Managing
Module 4 » Backup, Recovery and Maintenance
Module 5 » Hadoop 2.0 and High Availability
Module 6 » Advanced Topics: QJM, HDFS Federation and Security
Module 7 » Oozie, HCatalog/Hive and HBase Administration
Module 8 » Project: Hadoop Implementation