How Do we do Ceph @ CSC
Karan Singh


Page 1

How Do we do Ceph @ CSC

Page 2

#whoami

Karan Singh

System Specialist, Cloud Storage

CSC - IT Center for Science, Finland

[email protected]


•  Author of Learning Ceph (Packt Publishing, 2015)

•  Author of Ceph Cookbook (Packt Publishing, 2016)

•  Technical reviewer for Mastering Ceph (Packt Publishing, 2016)

•  www.ksingh.co.in - tune in for my blog

Page 3

CSC - IT Center for Science


•  Founded in 1971

•  Finnish non-profit organization, funded by the Ministry of Education

•  Connected Finland to the Internet in 1988

•  Most powerful academic computing facility in the Nordics

•  ISO 27001:2013 certified

•  Public cloud offering: Pouta Cloud Services

More information:
o  https://www.csc.fi/
o  https://research.csc.fi/cloud-computing

Page 4

CSC Cloud Offering


•  Pouta Cloud Service [IaaS]
o  cPouta - public cloud, general purpose
o  ePouta - public cloud, purpose-built for sensitive data

•  Built using OpenStack

•  Uses upstream OpenStack packages, not a vendor distribution

•  Storage: both Ceph and non-Ceph

Page 5

Our Need for Ceph


•  To build our own storage, not to buy a black box

•  Software-defined, on commodity hardware

•  Unified: block, object, (file)

•  Tightly Integrates with OpenStack

•  Open Source, no vendor lock-in

•  Scalable and highly available

Page 6

Our Need for Ceph (cont.)


•  Remove the SPOF for storage in OpenStack

•  OpenStack alone is complex enough - let's make it a bit less so
o  By using Ceph for storage needs

•  To stay up to date with the community
o  Ceph is the most used storage backend for OpenStack

•  Need for Object storage

Page 7

Storage Complexity


[Diagram: a traditional OpenStack storage layout - an enterprise array exposing LUNs through Gateway-1 and Gateway-2 as storage for Cinder, local disks and NFS providing storage for Nova instances and Glance, spread across the OpenStack compute and controller nodes]

Page 8

This is why we chose Ceph

•  One storage to rule them all

•  Goes hand-in-hand with OpenStack

•  Supports instance live migration and copy-on-write (CoW) clones (see the RBD sketch below)

•  Bonus for using Ceph
o  OpenStack Manila (shared filesystem)
o  On the way

http://www.slideshare.net/ircolle/what-is-a-ceph-and-why-do-i-care-openstack-storage-colorado-openstack-meetup-october-14-2014
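To illustrate the copy-on-write point above: RBD snapshots and clones are what allow instances and volumes to be spun up from Glance images without full copies. A minimal sketch with hypothetical pool and image names (images/base-image, vms/instance-disk); the rbd subcommands themselves are standard, and cloning requires format 2 images:

# snapshot the master image and protect the snapshot so it cannot be deleted
rbd snap create images/base-image@golden
rbd snap protect images/base-image@golden
# clone it as a new instance disk - the clone shares unchanged data with the parent (CoW)
rbd clone images/base-image@golden vms/instance-disk
# list the clones that depend on the snapshot
rbd children images/base-image@golden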

Page 9

Ceph Infrastructure


Production Cluster

•  10 x HP DL380
o  E5-2450, 8c, 2.10 GHz
o  24 GB memory
o  12 x 3 TB SATA
o  2 x 40 GbE

•  Ceph Firefly 0.80.8

•  CentOS 6.6
o  kernel 3.10.69

•  360 TB Raw

Test Cluster

•  5 x HP DL380
o  E5-2450, 8c, 2.10 GHz
o  24 GB memory
o  12 x 3 TB SATA
o  2 x 40 GbE

•  Ceph Hammer 0.94.3

•  CentOS 6.6
o  kernel 3.10.69

•  180 TB Raw

Development Cluster

•  4 x HP SL4540
o  2 x E5-2470, 8c, 2.30 GHz
o  192 GB memory
o  60 x 4 TB SATA
o  2 x 10 GbE

•  Ceph Hammer 0.94.3

•  CentOS 6.6
o  kernel 3.10.69

•  960 TB Raw

ePouta Cloud Service

Page 10

Ceph Infrastructure (cont.)


Pre-Production Cluster

•  4 x HP SL4540
o  2 x E5-2470, 8c, 2.30 GHz
o  192 GB memory
o  60 x 4 TB SATA
o  2 x 10 GbE

•  Object Storage Service

•  Ceph Firefly 0.80.10

•  CentOS 6.5
o  kernel 2.6.32

•  240 OSD / 870 TB Available

cPouta Cloud Service

Fujitsu Eternus CD10000

•  4 x Primergy RX300 S8
o  2 x E5-2640, 8c, 2.00 GHz
o  128 GB memory
o  1 x 10 GbE / 1 x 40 GbE
o  15 x 900 GB SAS 2.5" 10K
o  1 x 800 GB Fusion ioDrive2 PCIe SSD

•  4 x Eternus JX40 JBOD
o  24 x 900 GB SAS 2.5" 10K

•  Ceph Firefly 0.80.7

•  CentOS 6.6
o  kernel 3.10.42

•  156 OSD / 126 TB Available

Proof of Concept

Page 11

Our toolkit for Ceph

•  OS deployment, package management

o  Spacewalk

•  Ansible
o  End-to-end system configuration

o  Network, kernel, packages, OS tuning, NTP
o  Metric collection, monitoring, central logging, etc.
o  Entire Ceph deployment
o  System / Ceph administration

•  Performance metrics & dashboards
o  Collectd, Graphite, Grafana

•  Monitoring and log management
o  Opsview, ELK stack

•  Version control
o  Git, GitHub


Page 12

Live Demo


Page 13

Near Future

•  CSC Espoo DC [ePouta Cloud Storage]
o  Next 8-12 months -> 3 PB raw
o  Introduce a storage POD layout for scalability and better failure domains
o  Dedicated monitor nodes
o  SSD journals
o  Erasure coding

•  CSC Kajaani DC [cPouta Cloud Storage]
o  Early next year -> add ~850 TB of new capacity (total ~1.8 PB raw)
o  Enable full OpenStack support (Nova, Glance, Cinder, Swift)
o  Erasure coding

•  Miscellaneous
o  Multi-DC replication [Espoo - Kajaani]


Page 14


Long Term

Build a Ceph environment that is
•  Multi-petabyte (~10 PB usable)
•  Hyper-scalable
•  Multi-rack fault tolerant

Storage PODs
•  Currently a design on paper
•  Still thinking about the best approach
•  Interested to know what others are doing

Page 15


Disks, Nodes, Racks

[Diagram: disks grouped into a storage node, and storage nodes grouped into racks]

Page 16


More Racks ... Hyper Scale

[Diagram: many racks forming a single Ceph cluster - how to manage them effectively?]

Page 17


Storage POD

•  A storage POD is a group of racks
•  Ease of management in a hyper-scale environment
•  Scalable, modular design
•  Can sustain multi-rack failures
•  Requires CRUSH failure-domain changes (see the sketch below)

•  Primary copy -> one POD
•  Secondary & tertiary copies -> the other two PODs
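A minimal sketch of the CRUSH changes a POD layout implies, using the CLI only. The bucket names (pod-1, rack-1), pool name (volumes) and rule id are hypothetical; 'pod' is one of the default CRUSH bucket types in recent releases (otherwise it can be added by editing the decompiled CRUSH map):

# group racks into pod buckets and hang the pods off the default root
ceph osd crush add-bucket pod-1 pod
ceph osd crush move pod-1 root=default
ceph osd crush move rack-1 pod=pod-1
# create a rule whose failure domain is the pod, so each replica lands in a different POD
ceph osd crush rule create-simple replicated-pod default pod
# point a pool at the new rule (look up the rule id with 'ceph osd crush rule dump')
ceph osd pool set volumes crush_ruleset 1

With size=3 and a pod failure domain, the primary, secondary and tertiary copies end up in three different PODs, which is the placement described above.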

Page 18


Storage POD in action

[Diagram: one Ceph cluster spanning POD-1, POD-2 and POD-3, each POD containing a rack of storage nodes]

Page 19

Scaling up Multi Rack


[Diagram: each POD grows by adding racks; all three PODs remain part of the same Ceph cluster]

Page 20


Scaling up…even more racks

[Diagram: POD-1, POD-2 and POD-3 each scaled out to many racks within the same Ceph cluster]

Page 21


Scaling up ... several PODs

[Diagram: the cluster scaled out further, to several PODs]

Page 22

Some Recommendations

•  Monitor nodes
o  Use dedicated monitor nodes; avoid sharing them with OSDs
o  Use SSDs for the Ceph monitor LevelDB store

•  OSD nodes
o  Avoid overloading your SSD journals, or you might not get the performance you expect (see the sketch after this list)
o  Node preference:

o  #1 Thin node (10-16 disks)
o  #2 Thick node (16-30 disks)
o  #3 Fat node (more than 30 disks)

o  If using fat nodes, use several of them
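As a sketch of what "not overloading the journal" looks like in practice with ceph-disk (the standard tool for this Ceph generation); the device names are hypothetical and the ratio is a common rule of thumb, not a CSC figure:

# one SATA data disk, journal on a partition of a shared SSD
ceph-disk prepare /dev/sdd /dev/sdb
ceph-disk activate /dev/sdd1
# keep the number of journals per SSD modest (often quoted as 4-6 per SATA SSD);
# past that point the SSD becomes the bottleneck and you might not get what you expect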


Page 23

Operational Experience

•  Use dedicated disks for OS, OSD data & OSD journal (can be shared)

•  Plan your requirements well and choose the PG count wisely for a production cluster (worked example below)
o  Increasing the PG count is one of the most intensive operations
o  Decreasing the PG count is not allowed

•  Ceph version upgrades / rolling upgrades work like a charm

•  For thick and fat OSD nodes, tune the kernel (see the sysctl commands below)
o  kernel.pid_max=4194303
o  kernel.threads-max=200000
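A worked example of both points, with hypothetical numbers (a 240-OSD cluster and a pool named volumes); the PG rule of thumb is the usual one from the Ceph documentation, roughly (OSDs x 100) / replica count, rounded up to a power of two:

# 240 OSDs x 100 / 3 replicas = 8000 -> next power of two = 8192 PGs
ceph osd pool create volumes 8192 8192
# PGs can only be increased later, never decreased; pgp_num must follow pg_num,
# and the increase triggers heavy data movement, so do it in a quiet period
ceph osd pool set volumes pg_num 16384
ceph osd pool set volumes pgp_num 16384

# kernel tuning for thick/fat OSD nodes (values from the slide above);
# persist them in /etc/sysctl.conf so they survive a reboot
sysctl -w kernel.pid_max=4194303
sysctl -w kernel.threads-max=200000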


Page 24

Operational Experience

•  If you are seeing blocked ops / slow OSDs / slow requests, don't worry, you are not alone (triage commands sketched after this list)
o  ceph health detail -> find the OSD -> find the node -> check EVERYTHING on that node -> mark it out
o  If the problem is on most of the nodes -> check the NETWORK

o  Interface errors, MTU, configuration, network blocking, architecture, switch logs, removing an interface, bonding
o  Even a cable change worked for us (after a switch firmware upgrade our cable type was no longer supported)

•  Tune CRUSH to the optimal profile
o  # ceph osd crush tunables optimal
o  Caution: this will trigger a lot of data movement

•  Ceph recovery/backfilling can starve your clients of I/O; you may want to throttle it:

ceph tell osd.\* injectargs '--osd_recovery_max_active 1 --osd_recovery_max_single_start 1 --osd_recovery_op_priority 50 --osd_recovery_max_chunk 1048576 --osd_recovery_threads 1 --osd_max_backfills 1 --osd_backfill_scan_min 4 --osd_backfill_scan_max 8'
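A sketch of the triage flow described above; OSD id 123 and the node-level checks are only examples, the ceph commands themselves are standard:

# 1. which OSDs are involved?
ceph health detail | grep -E 'blocked|slow'
# 2. where does the suspect OSD live?
ceph osd find 123
# 3. check EVERYTHING on that node, for example:
dmesg | tail -50        # disk / controller errors
iostat -x 1 5           # a single saturated disk
ip -s link show         # interface errors, drops, MTU mismatches
# 4. if a single node is at fault, mark its OSD out and let Ceph recover
ceph osd out 123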


Page 25

[Screenshots: #1 cluster health OK, #2]

Page 26

Operational Experience

•  Increasing the filestore max_sync and min_sync values helped to a certain extent (see the injectargs example below)

o  filestore_max_sync_interval = 140
o  filestore_min_sync_interval = 100

•  A firmware upgrade on the network switches, together with replacing the physical network cables, fixed the issue.
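For reference, the same filestore change expressed as a runtime command in the style used earlier; the values are the ones from the slide, and to keep them across restarts they also go under [osd] in ceph.conf:

ceph tell osd.\* injectargs '--filestore_max_sync_interval 140 --filestore_min_sync_interval 100'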


Advice: always check your network TWICE!

Page 27

THANK YOU
