
Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)


Page 1

Ceph Object Storage at Spreadshirt

How we started

July 2015

Jens Hadlich, Chief Architect
Ansgar Jazdzewski, System Engineer

Ceph Berlin Meetup

Page 2

About Spreadshirt


Spread it with Spreadshirt

A global e-commerce platform for everyone to create, sell and buy ideas on clothing and accessories across many points of sale.
•  12 languages, 11 currencies
•  19 markets
•  150+ shipping regions
•  community of >70,000 active sellers
•  € 72M revenue (2014)
•  >3.3M items shipped (2014)

Page 3

Object Storage at Spreadshirt

•  Our main use case
   –  Store and read primarily user-generated content, mostly images
•  Some tens of terabytes (TB) of data
•  2 typical sizes:
   –  a few dozen KB
   –  a few MB
•  Up to 50,000 uploads per day
•  Read > Write

Page 4

Object Storage at Spreadshirt

•  "Never change a running system"?
   –  Current solution (from our early days):
      •  Big storage, well-branded vendor
      •  Lots of files / directories / sharding
   –  Problems:
      •  Regular UNIX tools are unusable in practice
      •  Not designed for "the cloud" (e.g. replication is an issue)
      •  Performance bottlenecks
   –  Challenges:
      •  Growing number of users → more content
      •  Build a truly global platform (multiple regions and data centers)

Page 5

Ceph

•  Why Ceph?
   –  Vendor independent
   –  Open source
   –  Runs on commodity hardware
   –  Local installation for minimal latency
   –  Existing knowledge and experience
   –  S3 API
      •  Simple bucket-to-bucket replication
   –  A good fit also for < petabyte scale
   –  Easy to add more storage
   –  (Can be used later for block storage)
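Because RadosGW speaks the S3 API, ordinary S3 client libraries can talk to it directly. A minimal sketch using the Python boto library; the gateway host, credentials, bucket and key names below are made-up placeholders, not Spreadshirt's actual setup.

```python
# Minimal S3-API access to a RadosGW endpoint via boto.
# Endpoint, credentials, bucket and key names are placeholders.
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id="ACCESS_KEY",              # RadosGW user key (placeholder)
    aws_secret_access_key="SECRET_KEY",          # placeholder
    host="rgw.example.local",                    # hypothetical gateway / load balancer
    is_secure=False,                             # plain HTTP inside the LAN
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket("designs")           # hypothetical bucket
key = bucket.new_key("user-123/motif.png")       # hypothetical object key
key.set_contents_from_filename("motif.png")      # upload a local file
print(key.generate_url(3600))                    # pre-signed URL, valid for 1 hour
```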

Page 6

Ceph Object Storage Architecture


Overview

[Diagram: Client → HTTP (S3 or Swift API) → Ceph Object Gateway; behind it RADOS (reliable autonomic distributed object store) with 3 Monitors and a lot of OSD nodes and disks, connected via a public network and a separate cluster network.]

Page 7

Ceph Object Storage Architecture


A little more detailed

[Diagram: Client → HTTP (S3 or Swift API) → RadosGW (Ceph Object Gateway, built on librados) → RADOS (reliable autonomic distributed object store); 3 Monitors (an odd number, for quorum); many OSD nodes, each with some SSDs for journals and more HDDs as JBOD (no RAID); 1G public network, 10G cluster network (the more the better).]

Page 8

Ceph Object Storage at Spreadshirt


Initial Setup

[Diagram: Clients reach HAProxy over HTTP (S3 or Swift API); HAProxy balances across a RadosGW running on each of the 5 cluster nodes, 3 of which also run Monitors; each cluster node has 3 x SSD (journal / index) and 9 x HDD (data, xfs); 2 x 1G public network, 2 x 10G cluster network for OSD replication.]

Page 9

Ceph Object Storage at Spreadshirt


Initial Setup

•  Hardware configuration
   –  5 x Dell PowerEdge R730xd
      •  Intel Xeon E5-2630v3, 2.4 GHz, 8C/16T
      •  64 GB RAM
      •  9 x 4 TB NL-SAS HDD, 7.2K
      •  3 x 200 GB SSD (mixed use)
      •  2 x 120 GB SSD for boot & Ceph Monitors (LevelDB)
      •  2 x 1 Gbit + 4 x 10 Gbit network

Page 10

Performance – First smoke tests

Page 11

Ceph Object Storage Performance


First smoke tests

•  How fast is RadosGW?
   –  Response times (read / write)
      •  Average?
      •  Percentiles (P99)?
   –  Throughput?
   –  Compared to AWS S3?
•  A first (very minimalistic) test setup
   –  3 VMs (KVM), all running RadosGW, a Monitor and 1 OSD
      •  2 cores, 4 GB RAM, 1 OSD each (15 GB + 5 GB), SSD, 10G network between nodes, HAProxy (round-robin), LAN, HTTP
   –  No further optimizations (see the measurement sketch below)
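A latency smoke test along these lines fits in a few dozen lines of Python. The sketch below is only an illustration of the method, not the tool actually used for the numbers that follow; the endpoint, credentials and request counts are placeholders. It uploads a small pool of 4 KB objects per thread and then measures average and P99 read latency across parallel client threads.

```python
# Rough 4 KB read-latency smoke test against an S3-compatible endpoint.
# Placeholders throughout; not the benchmark used for the numbers in this deck.
import random
import threading
import time

import boto
import boto.s3.connection

ENDPOINT = "rgw.example.local"    # hypothetical RadosGW / HAProxy address
N_THREADS = 16                    # parallel client threads
REQUESTS_PER_THREAD = 500
OBJECT_SIZE = 4 * 1024            # 4 KB objects, as in the tests above

def connect():
    return boto.connect_s3(
        aws_access_key_id="ACCESS_KEY",          # placeholder
        aws_secret_access_key="SECRET_KEY",      # placeholder
        host=ENDPOINT,
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

latencies = []                    # response times in ms, across all threads
lock = threading.Lock()

def worker(thread_id):
    bucket = connect().get_bucket("smoke-test")   # one connection per thread
    payload = "x" * OBJECT_SIZE
    keys = []
    for i in range(32):                           # small pool of 4 KB objects
        k = bucket.new_key("smoke/%d-%d" % (thread_id, i))
        k.set_contents_from_string(payload)
        keys.append(k)
    local = []
    for _ in range(REQUESTS_PER_THREAD):
        start = time.time()
        random.choice(keys).get_contents_as_string()   # random 4 KB read
        local.append((time.time() - start) * 1000.0)
    with lock:
        latencies.extend(local)

connect().create_bucket("smoke-test")             # create the test bucket once
threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_THREADS)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

latencies.sort()
print("avg %.1f ms, p99 %.1f ms, %.0f requests/s" % (
    sum(latencies) / len(latencies),
    latencies[int(len(latencies) * 0.99)],
    len(latencies) / elapsed))
```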

Page 12

Ceph Object Storage Performance


First smoke tests

•  How fast is RadosGW?
   –  Random read and write
   –  Object size: 4 KB
•  Results: pretty promising!
   –  E.g. 16 parallel threads, read:
      •  Avg 9 ms
      •  P99 49 ms
      •  > 1,300 requests/s

Page 13

Ceph Object Storage Performance


First smoke tests

•  Compared to Amazon S3?
   –  Comparing apples and oranges (unfair, but interesting)
      •  HTTP vs. HTTPS, LAN vs. WAN etc.
•  Response times
   –  Random read, object size: 4 KB, 4 parallel threads, client location: Leipzig

                 Ceph S3 (test)   AWS S3 eu-central-1   AWS S3 eu-west-1
   Location      Leipzig          Frankfurt             Ireland
   Avg           6 ms             25 ms                 56 ms
   P99           47 ms            128 ms                374 ms
   Requests/s    405              143                   62

Page 14

Performance – Now with the final hardware

Page 15

Ceph Object Storage Performance


Now with the final hardware

•  How fast is RadosGW?
   –  Random read and write
   –  Object size: 4 KB
•  Results:
   –  E.g. 16 parallel threads, read:
      •  Avg 4 ms
      •  P99 43 ms
      •  > 2,800 requests/s

Page 16

Ceph Object Storage Performance


Now with the final hardware

[Chart: average response times in ms (y-axis, 0-350) for read and write vs. client threads (1, 2, 4, 8, 16, 32), 4 KB object size]

Page 17

Ceph Object Storage Performance


Now with the final hardware

[Chart: read response times in ms (y-axis, 0-50), avg and p99, vs. client threads (1, 2, 4, 8, 16, 32, 32+32), 4 KB object size]

Page 18

Ceph Object Storage Performance


Now with the final hardware

[Chart: read requests/s (y-axis, 0-10,000) vs. client threads (1, 2, 4, 8, 16, 32, 32+32), for 4 KB and 128 KB object sizes]

1 client / 8 threads: the client's 1G network is almost saturated at ~115 MB/s (1 Gbit/s is ~125 MB/s raw)

2 clients: the 1G network is saturated again, but scale-out works :)

Page 19

Monitoring

Page 20

Monitoring


Grafana rulez :)

Page 21

Global availability

Page 22

Global Availability


•  1 Ceph cluster per data center

•  S3 bucket-to-bucket replication (see the sketch below)

•  Multiple regions, local delivery
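To illustrate the bucket-to-bucket idea: the sketch below does a naive one-way copy between two clusters over their S3 APIs. It is not radosgw-agent, and all endpoints, credentials and bucket names are placeholders.

```python
# Naive one-way bucket-to-bucket copy between two Ceph clusters via their S3 APIs.
# Illustration only; endpoints, credentials and bucket names are placeholders.
import boto
import boto.s3.connection

def connect(host, access_key, secret_key):
    return boto.connect_s3(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        host=host,
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

src = connect("rgw.dc1.example.local", "SRC_KEY", "SRC_SECRET").get_bucket("designs")
dst = connect("rgw.dc2.example.local", "DST_KEY", "DST_SECRET").get_bucket("designs")

for key in src.list():                                     # walk the source bucket
    existing = dst.get_key(key.name)
    if existing is None or existing.etag != key.etag:      # new or changed object
        dst.new_key(key.name).set_contents_from_string(
            key.get_contents_as_string())
```

A real solution would also need to handle deletes, metadata and multipart objects (whose ETag is not a plain MD5), which is part of why radosgw-agent is on the test-drive list later in this deck.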

Page 23

Currently open issues / operational tasks

Page 24

Open issues / operational tasks


•  Backup
   –  s3fs-fuse too slow
   –  Set up another Ceph cluster?
•  Security
   –  Users
   –  ACLs
•  Migration of old data
   –  Upload all existing files via script (see the sketch below)
   –  Use the old system as fallback / in parallel
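A minimal sketch of such a migration script: walk the legacy file tree and upload everything that is not yet in the bucket. The root path, bucket name and endpoint are hypothetical placeholders.

```python
# One-off migration sketch: walk the legacy file tree and upload each file
# to the Ceph S3 endpoint. All paths and names are placeholders.
import os

import boto
import boto.s3.connection

SRC_ROOT = "/data/old-storage"                   # hypothetical legacy storage root

conn = boto.connect_s3(
    aws_access_key_id="ACCESS_KEY",              # placeholder
    aws_secret_access_key="SECRET_KEY",          # placeholder
    host="rgw.example.local",                    # hypothetical gateway address
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
bucket = conn.lookup("legacy-images") or conn.create_bucket("legacy-images")

for dirpath, _, filenames in os.walk(SRC_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        key_name = os.path.relpath(path, SRC_ROOT)       # keep the directory layout
        if bucket.get_key(key_name) is None:             # skip already-migrated files
            bucket.new_key(key_name).set_contents_from_filename(path)
```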

Page 25

Open issues / operational tasks


•  Replication
   –  Test-drive radosgw-agent
   –  s3cmd? Custom tool?
   –  Metadata (users)
   –  Data

•  Performance?

•  Bucket notification
   –  Currently unsupported by RadosGW
   –  Build a custom solution? (see the polling sketch below)
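One possible stop-gap is to poll the bucket listing and emit an event for every key not seen before. The toy sketch below is not a real notification mechanism (no persistence, no delete handling), and all names are placeholders.

```python
# Toy polling-based substitute for bucket notifications: list the bucket
# periodically and report keys that have not been seen before.
# Placeholders only; a real solution needs persistence and delete handling.
import time

import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id="ACCESS_KEY",              # placeholder
    aws_secret_access_key="SECRET_KEY",          # placeholder
    host="rgw.example.local",                    # hypothetical gateway address
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
bucket = conn.get_bucket("designs")              # hypothetical bucket

seen = set()
while True:
    for key in bucket.list():
        if key.name not in seen:
            seen.add(key.name)
            print("new object: %s" % key.name)   # hand off to a real event handler
    time.sleep(30)                               # poll interval
```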

Page 26

Open issues / operational tasks


•  Scrubbing
•  Rebuild

Page 27

To be continued ...

[Spreadshirt logo] + [Ceph logo] = ?