Trends and Evolution in Data Storage
HELLO!
• Jefferson – Storage Specialist at Walmart.com
• Data Processing, Fatec de São Paulo
• Experience with high-criticality data storage systems
• SNIA Certificate
AGENDA
• Storage architecture overview
• Ceph overview
• Rados Gateway overview
• Rados Gateway architecture
• RBD
• CephFS
• Object Storage Multi-site
Storage Overview
BEFORE S3 AFTER S3
“ Before S3 ”
(Diagram: RAID 5 and RAID 6 stripes – data blocks such as R5A1 and R6B1 plus parity blocks p and q distributed across the disks of an array)
(Diagram: traditional storage array – two controllers in front of shelves of disks, grouped into RAID Group 0, 1 and 2 for protection and exposed as LUNs or volumes)
(Diagram: many commodity computers, each with its own local disks)
Storage Features
• Snapshots
• Map to any
• Clone
• Volume copy
• Virtual storage
• Thin provisioning
• Dedup
• Replication
2 Types of Storage
(Diagram: the same dual-controller disk array exposed in two ways)
• Block: SAN (FC/iSCSI) – presented to the host as /dev/sda or Drive F:
• Unified: NAS (Ethernet) – shared as CIFS \\IP\SHARE or NFS IP:/MOUNT
A block device still needs a local filesystem on top (e.g. Ext4 with 4 KB blocks):
• Ext4
• XFS
• Btrfs
• ZFS
“ S3 ”
The Concept
March 2006 – Simple Storage Service
Scalable
No single point of failure
Fast
Cheap
Simple
(Diagram: Ceph building blocks – RBD, CephFS, radosgw, MON, OSD, PG, CRUSH)
Sage A. Weil
Community [email protected] [email protected]
Ceph Versions
• Argonaut – July 3, 2012
• Bobtail (v0.56) – January 1, 2013
• Cuttlefish (v0.61) – May 7, 2013
• Dumpling (v0.67) – August 14, 2013
• Emperor (v0.72) – November 9, 2013
• Firefly (v0.80) – May 7, 2014
• Giant (v0.87) – October 29, 2014
• Hammer (v0.94) – April 7, 2015
• Infernalis (v9.2.0) – November 6, 2015
• Jewel (v10.2.0) – April 21, 2016
• Open source (LGPL license)
• Software-defined, distributed storage
• No single point of failure
• Massively scalable
• Self-healing
• Unified storage: object, block and file
Ceph
Ceph architecture
CEPHFS – A distributed file system with POSIX semantics and scale-out metadata management
RGW – A web services gateway for object storage, compatible with S3 and Swift
RBD – A reliable, fully-distributed block device with cloud platform integration
LIBRADOS – A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS – the common object store underneath
(Diagram: APP consumes RGW/LIBRADOS, HOST consumes RBD, CLIENT consumes CephFS)
RADOS – Reliable Autonomic Distributed Object Store
• Replication
• Flat object namespace within each pool (see the example below)
• Strong consistency (CP system)
• Infrastructure-aware, dynamic topology
• Hash-based placement (CRUSH)
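As a quick illustration of the flat per-pool namespace, objects can be stored and listed directly with the rados CLI; the pool name, object name and file below are only examples.
$ ceph osd pool create mypool 128
$ rados -p mypool put greeting ./hello.txt
$ rados -p mypool ls
$ rados -p mypool get greeting /tmp/hello.txt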
OSD (Object Storage Daemon)
• 3 to 10,000 OSDs per cluster
• One per disk (or per server)
• Serves stored objects to clients
• Intelligently peers with other OSDs for replication
Monitor node (MON)
• Maintains cluster membership and state
• Provides consensus for distributed decision-making (see the quorum check below)
• Small, odd number of monitors
• Does not serve stored objects to clients
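To see the monitors and their quorum on a running cluster, the standard ceph CLI can be used (purely illustrative):
$ ceph mon stat
$ ceph quorum_status
$ ceph -s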
Object Placement
Pool
placement group (PG)
CRUSH(pg, cluster state, rule) = [A, B]
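You can ask the cluster where a given object would land; the pool and object names are the same illustrative ones used above:
$ ceph osd map mypool greeting    # prints the placement group and the OSDs ([A, B]) chosen by CRUSH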
(Diagram: many objects (OBJ) hashed into placement groups, which CRUSH then maps to OSDs)
An object (OBJ) = ID + binary data + metadata
CrushMap
(Diagram: CRUSH hierarchy – Data Center → Rack1 {Host1, Host2}, Rack2 {Host3, Host4}, Rack3 {Host5, Host6}, each host holding OSDs)
# ceph osd setcrushmap -i crushmap-filename
# begin crush map
# devices
device 1 osd.1
device 2 osd.2

host Host1 {
        id -1
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 3.500
}
host Host2 {
        id -2
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 3.500
}
rack Rack1 {
        id -3
        alg straw
        hash 0  # rjenkins1
        item Host1 weight 3.500
}
rack Rack2 {
        id -4
        alg straw
        hash 0  # rjenkins1
        item Host2 weight 3.500
}
datacenter DataCenter {
        id -5
        alg straw
        hash 0  # rjenkins1
        item Rack1 weight 3.500
        item Rack2 weight 3.500
}
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 3
        step take DataCenter
        step chooseleaf firstn 0 type rack
        step emit
}
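In practice the map is extracted and edited as text, then recompiled before being injected back; the file names below are arbitrary:
$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -d crushmap.bin -o crushmap.txt    # decompile to editable text
$ crushtool -c crushmap.txt -o crushmap.new    # recompile after editing hosts, racks, rules
$ ceph osd setcrushmap -i crushmap.new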
Rados Gateway overview
RGW – A web services gateway for object storage, compatible with S3 and Swift
LIBRADOS – A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS
(Diagram: APPLICATION → REST → RADOSGW (on LIBRADOS), which talks over a socket to the RADOS CLUSTER and its monitors)
RGW Components
• Frontend: FastCGI (external web servers) or Civetweb (embedded web server; see the ceph.conf fragment below)
• REST dialects: S3, Swift, other APIs
• Execution layer – common layer for all dialects
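A minimal ceph.conf fragment for running the embedded Civetweb frontend; the section name and port are just an example:
[client.rgw.gateway1]
rgw frontends = "civetweb port=7480"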
$ radosgw-admin user create --display-name="johnny rotten" --uid=johnny
access_key": "TCICW53D9BQ2VGC46I44”secret_key": "tfm9aHMI8X76L3UdgE+ZQaJag1vJQmE6HDb5Lbrz”
APIs: Java, Python, C++, Ruby, Perl, C#
HTTP REST:
DELETE/GET/PUT /{bucket} HTTP/1.1
Host: cname.domain.com
Authorization: AWS {access-key}:{hash-of-header-and-secret}
S3 CLOUD CLIENT:
s3cmd, Cyberduck, s3fs
s3cmd ls s3://bucket_name/file.txt
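A sketch of pointing s3cmd at a RADOS Gateway instead of AWS; the host name is a placeholder, and the keys are the ones created with radosgw-admin above:
$ s3cmd --configure                        # enter the access/secret key of the RGW user
$ s3cmd --host=rgw.example.com --host-bucket="%(bucket)s.rgw.example.com" mb s3://bucket_name
$ s3cmd put file.txt s3://bucket_name/
$ s3cmd ls s3://bucket_name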
RBD
RBD – A reliable, fully-distributed block device with cloud platform integration
LIBRADOS – A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS
(Diagram: the HOST consumes RBD on top of LIBRADOS and RADOS)
RBD
• Thinly provisioned
• Resizable images
• Image import/export
• Image copy or rename
• Read-only snapshots (see the example below)
• Revert to snapshots
• Ability to mount with Linux or QEMU/KVM clients
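A minimal snapshot-and-clone sketch with the standard rbd CLI; pool, image and snapshot names are placeholders:
$ rbd snap create POOL/IMAGE@snap1
$ rbd snap protect POOL/IMAGE@snap1            # clones require a protected snapshot
$ rbd clone POOL/IMAGE@snap1 POOL/IMAGE-clone
$ rbd snap rollback POOL/IMAGE@snap1           # revert the image to the snapshot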
RBD connectors
(Diagram 1: VM/Host → RBD kernel module → Librados → RADOS)
(Diagram 2: Libvirt → RBD module → RADOS)
rbd create --size 1024 POOL/IMAGE
rbd resize --size 2048 IMAGE                 (to increase)
rbd resize --size 2048 IMAGE --allow-shrink  (to decrease)
$ sudo apt-get install ceph-common
$ sudo modprobe rbd
ceph-authtool --print-key /etc/ceph/keyring.admin
sudo echo "mon:6789 name=admin,secret=AQDVGc5P0LXzJhAA5C019tbdrgypFNXUpG2cqQ== rbd IMAGE" | sudo tee /sys/bus/rbd/add
$ sudo mkfs.xfs /dev/rbd0 $ sudo mount /dev/rbd0 /mnt/
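Writing to /sys/bus/rbd/add works, but the same mapping is usually done with the rbd CLI; a sketch assuming the default admin keyring:
$ sudo rbd map POOL/IMAGE --id admin
$ rbd showmapped                 # lists the /dev/rbdX devices
$ sudo rbd unmap /dev/rbd0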
CephFS
CEPHFS – A distributed file system with POSIX semantics and scale-out metadata management
LIBRADOS – A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS
(Diagram: the CLIENT mounts CephFS on top of RADOS)
CephFS Overview
(Diagram: filesystem lookup by inode – a file is split into objects; the inode number (INO) plus the object number (ONO) form the object id (OID), which hashes to a placement group (PGID) that CRUSH maps to OSDs; metadata servers handle the lookup, data goes through Librados to RADOS)
• POSIX-compliant file system
• Linux kernel client
  mount -t ceph 1.2.3.4:/ /mnt
• Export via NFS or Samba (CIFS)
• ceph-fuse
• Recursive directory stats
  • File size
  • File and directory count
  • Modification time
• libcephfs.so – used by your app, Samba, Hadoop, Ganesha (NFS)
CephFS in Practice:
ceph-deploy mds create myserver
ceph osd pool create fs_data 64
ceph osd pool create fs_metadata 64
ceph fs new myfs fs_metadata fs_data
mount -t ceph x.x.x.x:6789:/ /mnt/ceph
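If cephx authentication is enabled (the default), the kernel mount needs credentials; ceph-fuse is the userspace alternative. The monitor address and keyring path are placeholders:
$ sudo mount -t ceph x.x.x.x:6789:/ /mnt/ceph -o name=admin,secret=$(ceph-authtool --print-key /etc/ceph/keyring.admin)
$ sudo ceph-fuse -m x.x.x.x:6789 /mnt/ceph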
Object Storage Multi-site
• Replication
• 3 million objects
(Diagram: Site A – Ceph cluster + RADOSGW ↔ Site B – Ceph cluster + RADOSGW, both speaking S3)
• 800 objects/s
• Active/standby
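A hedged sketch of the Jewel-style multi-site configuration on the master site; the realm, zonegroup and zone names are placeholders, endpoints are omitted, and the secondary site would pull the realm and create its own zone:
$ radosgw-admin realm create --rgw-realm=myrealm --default
$ radosgw-admin zonegroup create --rgw-zonegroup=us --master --default
$ radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east --master --default
$ radosgw-admin period update --commit
$ radosgw-admin sync status        # check replication state from either site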
THANKS!
Any questions?
You can find me at: Jefferson · [email protected]