SECURITY
IMPLEMENTATION
IN
HADOOP
By
Narsimha Chary (200607008)
Siddalinga K M (200950034)
Rahman (200950032)
AGENDA
What is security?
Security in distributed file systems
Current level of security in Hadoop
Security features to be incorporated in HDFS to make it robust
What is Security?
Protection of information and property from theft, corruption, or natural disaster, while allowing the information and property to remain accessible and productive to its intended users.
Processes and mechanisms by which sensitive and valuable information and services are protected from publication, tampering, or collapse by unauthorized activities, untrustworthy individuals, and unplanned events.
Security in Distributed File Systems
Private clouds are more or less secure, as they are deployed within the premises of the organization and are also protected by its firewall.
Public clouds are prone to all sorts of danger, as you never know where your data is residing at any instant.
Current level of security in Hadoop
The current version of Hadoop has only a rudimentary implementation of security: an advisory access-control mechanism.
Hadoop doesn't strongly authenticate the client; it simply asks the underlying Unix system by executing the `whoami` command.
Anyone can communicate directly with a Datanode (without ever talking to the Namenode) and ask for blocks, provided you have the block-location details (this was demonstrated at the recent Cloudera Hadoop Hackathon).
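As a concrete illustration of the advisory check above, a client could determine its identity the same way early Hadoop did: by trusting whatever the local Unix account reports. The function name below is ours; this is only a sketch of the behaviour described, not Hadoop's actual code.

```python
import getpass
import subprocess

def unix_user_identity() -> str:
    """Determine the client identity the way early Hadoop did:
    trust whatever the local Unix account reports via `whoami`.
    Nothing here is verified by the cluster, so any user who can
    run a modified client can claim another identity."""
    try:
        return subprocess.run(
            ["whoami"], capture_output=True, text=True, check=True
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # Fall back to the login name derived from the environment.
        return getpass.getuser()
```

Because the cluster never checks this value, impersonation is as easy as running the client under a different account name.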
With current provisions…
A Hadoop cluster may be prone to the following attacks:
Unauthorized clients can impersonate authorized users and access the cluster.
One can get blocks directly from the Datanodes, bypassing the Namenode.
Eavesdropping/sniffing of data packets being sent by Datanodes to the client.
(Can this be resolved by using a secure socket over a regular socket? Yes, but it adds quite an overhead and hinders performance.)
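On the secure-socket point: wrapping the Datanode connection in TLS is straightforward in principle, as this minimal Python sketch shows. Certificate distribution and the actual connection are elided, and the names are ours.

```python
import ssl

def make_client_context() -> ssl.SSLContext:
    # A TLS context a DFS client could use to wrap the raw socket
    # to a Datanode; certificate distribution is elided.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

ctx = make_client_context()
# wrapped = ctx.wrap_socket(raw_sock, server_hostname="datanode.example")
```

The per-record encryption on every block transfer is what makes this costly at HDFS data rates, which is why the slides treat it as an option of last resort.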
Proposed Solutions
Authentication of users/clients accessing the Hadoop cluster using the Kerberos protocol
Authorization for accessing data residing on HDFS (by granting and revoking capabilities)
A little about the Kerberos Protocol
Network authentication protocol
Developed at MIT in the mid 1980s
Available as open source or in supported
commercial software
How does Kerberos work?
Instead of the client sending its password to the application server:
– Request a ticket from the authentication server
– Ticket and encrypted request sent to the application server
How to request tickets without repeatedly sending credentials?
– Ticket-granting ticket (TGT)
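The flow above can be modelled in a few lines. This is a toy illustration only: real Kerberos seals tickets with symmetric encryption (e.g. AES), for which an HMAC tag stands in here, and every name below is ours.

```python
import hashlib
import hmac
import os
import time
from dataclasses import dataclass

@dataclass
class Ticket:
    client: str
    service: str
    issued: float
    tag: bytes  # sealed by the KDC with the service's key

def seal(key: bytes, *fields: str) -> bytes:
    # Stand-in for Kerberos encryption: an HMAC over the fields.
    return hmac.new(key, "|".join(fields).encode(), hashlib.sha256).digest()

# Long-term keys known only to the KDC and to each service.
kdc_service_keys = {"hdfs/datanode": os.urandom(32)}

def issue_ticket(client: str, service: str) -> Ticket:
    # The client authenticates to the KDC once; its password never
    # travels to the application server.
    issued = time.time()
    key = kdc_service_keys[service]
    return Ticket(client, service, issued,
                  seal(key, client, service, str(issued)))

def service_verify(ticket: Ticket, service_key: bytes) -> bool:
    # The service re-seals the fields with its own key and compares:
    # only the KDC could have produced a matching tag.
    expected = seal(service_key, ticket.client, ticket.service,
                    str(ticket.issued))
    return hmac.compare_digest(expected, ticket.tag)
```

A client that swaps its name into a stolen ticket fails verification, which is the property the TGT mechanism relies on.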
Some of the notations used
A -> B : M — a message M from node A to node B
KU_A and KR_A : the public and private keys of node A
K_AB : a key shared between A and B
{M}K_AB : a message M encrypted with K_AB
<M>KR_A : a message M signed with KR_A
C, N, D : Client, Namenode, Datanode
1. Authentication
The Namenode checks the details of the request and, if the client is a valid user, issues (or refuses to issue) the ticket T.
C -> N : request_ticket, TS, hash<request_ticket, TS, K_CN>
N -> C : T
Authentication contd…
Message exchange between the client C and a Datanode D to establish a shared key between them:
C -> D : {(K_CD, TS, nonce)KR_C}KU_D, T
D -> C : nonce', hash(nonce', K_CD)
The client sends the ticket T along with a shared key K_CD that it wants to establish with the Datanode D; the client also sends a nonce so that the Datanode can verify the freshness of the message.
Authentication contd…
To complete the ticket-establishment step, the Datanode has to respond to a nonce challenge.
T = <ID_U, KU_C, IV, TS, TE>KR_M
K_CD = hash<IV, KU_D, random_data>
Authentication contd…
T contains the user ID, the client's public key, an initialization vector (IV), and the ticket's lifetime (TS, TE). The shared key is computed by hashing the IV with the Datanode's public key and some random data.
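The key derivation and nonce challenge above can be sketched as follows; the hash is assumed to be SHA-256, and the function and parameter names are ours.

```python
import hashlib
import os

def derive_shared_key(iv: bytes, datanode_pubkey: bytes):
    # K_CD = hash<IV, KU_D, random_data>, per the formula above.
    random_data = os.urandom(16)
    k_cd = hashlib.sha256(iv + datanode_pubkey + random_data).digest()
    return k_cd, random_data

def nonce_response(nonce: bytes, k_cd: bytes) -> bytes:
    # D -> C : nonce', hash(nonce', K_CD). Answering the challenge
    # correctly proves the Datanode recovered K_CD from the
    # client's encrypted message.
    return hashlib.sha256(nonce + k_cd).digest()
```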
2. Capabilities
To read data from HDFS, the client has to obtain block locations and capabilities from the Namenode before it goes to the Datanodes.
C -> N : read(path), TS, hash<read(path), TS, K_CN>
N -> C : block_locations, hash<block_locations>
The capabilities are embedded in the block-location information and signed by the Namenode. The Datanode verifies the capabilities and accordingly allows or denies the read.
C -> D : read(block), TS,
Capabilities cont…
Description of the capability information embedded in the block-location record. The signature (under the Namenode's private key) over the capability and the block ID is also embedded:
C = ID, permissions, path
Sign = <C, block_id>KR_N
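A sketch of how such a capability record might be built and checked. The slides call for a signature under the Namenode's private key KR_N; to keep this self-contained, an HMAC under a Namenode-held secret stands in for the asymmetric signature, and all names are ours.

```python
import hashlib
import hmac
import json

def sign_capability(namenode_secret: bytes, user_id: str,
                    permissions: str, path: str, block_id: int) -> dict:
    # C = ID, permissions, path; Sign = <C, block_id>KR_N.
    cap = {"id": user_id, "permissions": permissions, "path": path}
    payload = json.dumps(cap, sort_keys=True) + str(block_id)
    sig = hmac.new(namenode_secret, payload.encode(),
                   hashlib.sha256).hexdigest()
    return {"capability": cap, "block_id": block_id, "sign": sig}

def datanode_verify(namenode_secret: bytes, record: dict) -> bool:
    # The Datanode recomputes the tag and rejects tampered records.
    payload = (json.dumps(record["capability"], sort_keys=True)
               + str(record["block_id"]))
    expected = hmac.new(namenode_secret, payload.encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["sign"])
```

Binding the block ID into the signed payload is what stops a client from replaying a capability for one block against a different block.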
Revocation of capabilities
Capabilities could otherwise be reused by clients to read data from HDFS at any time after they were issued; however, file permissions change over time.
Revocation of capabilities needs to be done in order to prevent replay attacks.
Capabilities issued by the Namenode will have an expiry period (say, 1 hour), which can be configured in hadoop-site.xml.
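On the Datanode side, the expiry check reduces to a simple TTL comparison; a sketch under the assumed 1-hour default, with names of our choosing.

```python
import time

CAP_TTL_SECONDS = 3600  # the assumed 1-hour expiry from hadoop-site.xml

def capability_expired(issued_at: float, now=None,
                       ttl: int = CAP_TTL_SECONDS) -> bool:
    # Datanode-side check: reject capabilities older than the TTL,
    # forcing the client back to the Namenode for a renewal ticket.
    if now is None:
        now = time.time()
    return now - issued_at > ttl
```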
Revocation of Capabilities contd…
The client has to get a renewal ticket issued by the Namenode and present it to the Datanode with every request after the capabilities expire. If the renewal ticket is not presented, the Datanode will deny the request.
Revocation of capabilities can also be done actively by the Namenode, by sending a message to the Datanodes to deny the particular capabilities.
Difficulties faced
Integrating the Kerberos protocol with the HDFS framework is quite a task!
A more efficient design for granting and revoking capabilities is needed.
Conclusion
The overhead from introducing capabilities is low, and data access is restricted to only those clients that have been issued capabilities by the Namenode.
However, if the file size is less than 64 MB, the overhead remains the same as for a single block; for many small files, the overhead would therefore be substantial.
Although the performance overhead at the Datanode isn't significant for a 64 MB or larger block size, it can be reduced further by caching the capabilities for each block.
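The caching idea in the last point can be as simple as memoising verification results per (block, capability) pair; a sketch with hypothetical names.

```python
# Memoise verification results so the Datanode pays the signature
# check once per (block_id, capability-signature) pair.
verified_cache = {}

def check_capability(block_id, sign, verify):
    # `verify` is the (hypothetical) expensive signature check.
    key = (block_id, sign)
    if key not in verified_cache:
        verified_cache[key] = verify(block_id, sign)
    return verified_cache[key]
```

A real cache would also need to honour the expiry period above, evicting entries when their capabilities time out.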