IT-SDC : Support for Distributed Computing
HDFS and S3 plugins
Andrea Manzi, Martin Hellmich
13/12/2013
DPM Workshop
Plugin functionalities
Frontends: NFS, HTTP/DAV, XROOT, GridFTP, RFIO
Namespace Management: Legacy DPM, MySQL, Oracle, HDFS, Memcache
Pool Management: Legacy DPM, MySQL, Oracle
Pool Driver: Legacy DPM, HDFS, S3
I/O: Legacy DPM, HDFS
HDFS plugin
dmlite plugin implementing I/O, pool driver and namespace functionality through Apache Hadoop HDFS, ensuring:
  Automatic data replication
  Fault tolerance for client reads and for the death of a Datanode or the Namenode
  Scalability
Deployment with Lcgdm-dav
DPM Head Node: Lcgdm-dav + dmlite HDFS plugin
HDFS Namenode
HDFS Datanode(s): Lcgdm-dav + dmlite HDFS plugin
Some details
The HDFS C API (libhdfs) does not implement functions to retrieve the available (LIVE) datanodes
  Patch implemented and submitted to Hadoop
  Patched hadoop-libhdfs rpm available from our repo
A first version of the Puppet installation is available
  To be adapted to recent dav/dmlite module changes
On-going issues
Tested with the new dmlite-based GridFTP plugin
  Same deployment model as the HTTP/DAV frontend, or a single node writing to HDFS
But HDFS does not support multiple write streams or random writes
  OSG developed in-memory stream reordering in GridFTP to avoid this limitation (gridftp-hdfs DSI, also available in the Globus Toolkit)
  Integration still to be tested and understood
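The stream-reordering idea can be illustrated with a small sketch. This is not the OSG gridftp-hdfs code: the class name ReorderBuffer and its methods are hypothetical, and an in-memory BytesIO object stands in for an append-only HDFS output stream. Blocks arriving at arbitrary offsets are held in memory until the block the sink expects next shows up, then flushed in order.

```python
import io


class ReorderBuffer:
    """Hold out-of-order blocks and flush them to an append-only sink in order.

    Hypothetical illustration only, not the gridftp-hdfs implementation.
    """

    def __init__(self, sink):
        self.sink = sink          # any object with a write(bytes) method
        self.next_offset = 0      # next byte offset the sink expects
        self.pending = {}         # offset -> block waiting for its turn

    def write(self, offset, data):
        """Accept a block at any offset and flush whatever is now contiguous."""
        if offset < self.next_offset:
            raise ValueError("block overlaps data already written")
        self.pending[offset] = data
        # Drain every buffered block that continues the sequential stream.
        while self.next_offset in self.pending:
            block = self.pending.pop(self.next_offset)
            self.sink.write(block)
            self.next_offset += len(block)

    def close(self):
        """Fail loudly if a gap was never filled."""
        if self.pending:
            raise IOError("missing data before offset %d" % min(self.pending))


sink = io.BytesIO()               # stands in for a sequential HDFS output stream
buf = ReorderBuffer(sink)
# Blocks arrive out of order, as they might from parallel transfer streams.
buf.write(4, b"EFGH")
buf.write(8, b"IJ")
buf.write(0, b"ABCD")
buf.close()
result = sink.getvalue()
```

The cost of this approach is memory: in the worst case everything after a missing first block has to be buffered, so a real implementation would need a bound on the pending data.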
On-going issues
The SRM frontend does not speak dmlite
SRM calls through the old dpm daemons do not properly handle new pool types (such as HDFS)
  A patch to the dpm daemon is still to be implemented
Future steps
Distribution: need to understand how to distribute the plugin
  HDFS client packaged only in Fedora 20 and Rawhide
  https://apps.fedoraproject.org/packages/libhdfs
Support for security-enabled HDFS clusters (Kerberos)
Performance
Tests through LCGDM-DAV:
  HDFS namespace: stat/s rate is half that of the MySQL namespace plugin
    To be optimized with Memcached in front
  ROOT analysis with massive vector I/O and TTreeCache
    Comparable performance with standard disk pools
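Putting a cache in front of the namespace can be sketched as a cache-aside lookup. This is a minimal illustration, not the dmlite Memcache plugin: the class CachedNamespace, the slow_stat function, the example path and the returned metadata are all hypothetical, and a plain dict stands in for a Memcached server.

```python
import time


class CachedNamespace:
    """Cache-aside wrapper around a slow stat call (hypothetical sketch)."""

    def __init__(self, backend_stat, ttl=60.0, clock=time.monotonic):
        self.backend_stat = backend_stat   # slow lookup, e.g. against HDFS
        self.ttl = ttl                     # seconds an entry stays valid
        self.clock = clock
        self.cache = {}                    # path -> (expiry, metadata); stands in for Memcached

    def stat(self, path):
        now = self.clock()
        hit = self.cache.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]                  # fresh entry: skip the backend
        meta = self.backend_stat(path)     # miss or expired: ask the backend
        self.cache[path] = (now + self.ttl, meta)
        return meta


calls = []


def slow_stat(path):
    """Hypothetical backend lookup; records each call so the saving is visible."""
    calls.append(path)
    return {"path": path, "size": 1024}


ns = CachedNamespace(slow_stat, ttl=60.0)
first = ns.stat("/dpm/example/file")
second = ns.stat("/dpm/example/file")      # served from the cache
backend_calls = len(calls)
```

Repeated stat calls on the same path reach the backend only once per TTL window, which is the effect expected from a Memcached layer. The trade-off is that metadata can be stale for up to the TTL.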
S3 plugin
Key Facts
Data directly to the cloud
HTTP/HTTPS only
DPM provides the namespace
Data in the Cloud
[Diagram: GET to DPM, REDIRECT, GET to the S3 provider, DATA]
No data goes through DPM
Inherits all capabilities from the S3 provider
  Amazon: range header, no multi-range, multi-stream download only, no 3rd-party copy, HTTP access only
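One consequence of "range header, no multi-range" is that a client wanting several byte ranges has to issue one request per range. A minimal sketch of that split follows. The helper name single_range_headers and the example ranges are hypothetical. The header syntax is the standard HTTP form bytes=first-last, with both ends inclusive.

```python
def single_range_headers(ranges):
    """Return one HTTP Range header per (first_byte, last_byte) pair, both inclusive.

    Hypothetical helper: it only builds headers and sends nothing.
    """
    headers = []
    for first, last in ranges:
        if first < 0 or last < first:
            raise ValueError("invalid byte range: %r" % ((first, last),))
        # One range per request, because multi-range requests are not supported.
        headers.append({"Range": "bytes=%d-%d" % (first, last)})
    return headers


wanted = [(0, 99), (4096, 8191)]           # example ranges, chosen arbitrarily
requests_to_send = single_range_headers(wanted)
```

Each dictionary would be attached to its own GET request against the redirected URL, and the requests can run in parallel, which matches the multi-stream download capability.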
How to install an S3 pool
yum install dmlite-plugins-s3
dmlite-shell
> pooladd poolaws s3
> poolmodify poolaws bucketsalt xFVlsrg
> poolmodify poolaws s3accesskeyid <ID>
> poolmodify poolaws s3secretaccesskey <SK>
<create an s3 bucket on your storage>
More info
HDFS plugin: https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite/Plugins/HDFS
S3 plugin: https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite/Plugins/S3
Thanks!
Questions?