Metadata Performance Bottlenecks in Gluster
Poornima G, Rajesh Joseph

Performance bottlenecks for metadata workload in Gluster with Poornima Gurusiddaiah, Rajesh Joseph



Page 1: Performance bottlenecks for metadata workload in Gluster with Poornima Gurusiddaiah, Rajesh Joseph

Metadata Performance Bottlenecks in Gluster

Poornima G, Rajesh Joseph

Page 2

Agenda
- Existing performance xlators
- Metadata operations:
  - create files
  - mkdir recursive
  - ls -lR
  - rm files
  - rmdir recursive
  - mv files
  - Small file create / PUT
  - Small file read / GET

Page 3

Performance Xlators
- Write-behind and flush-behind
- Io-cache
- Read-ahead
- Quick-read
- Md-cache
- Readdir-ahead
- Open-behind
- Symlink-cache

Page 4

Test Setup and Tools
- CentOS 7.2 clients and servers
- 3 FUSE clients simultaneously performing operations on their home directories
- Latest master, with the volume options below:
  cluster.lookup-optimize on
  readdir-optimize on
  consistent-metadata no
- Performance metrics collected using volume profile
- The workload is generated by simple scripts.
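The workload scripts themselves are not shown in the deck; a minimal Python sketch of the kind of metadata workload described (create files, mkdir recursive, ls -lR, rm), with hypothetical directory and file counts, might look like:

```python
import os, time, shutil, tempfile

def run_workload(root, ndirs=10, nfiles=100):
    """Create files, mkdir recursively, walk with stat (ls -lR), then remove."""
    t0 = time.time()
    for d in range(ndirs):
        path = os.path.join(root, "dir%d" % d, "sub")
        os.makedirs(path)                                     # mkdir recursive
        for f in range(nfiles):
            open(os.path.join(path, "f%d" % f), "w").close()  # create files
    created = time.time() - t0

    t0 = time.time()
    for dirpath, dirnames, filenames in os.walk(root):        # ls -lR equivalent
        for name in filenames:
            os.stat(os.path.join(dirpath, name))              # stat every entry
    listed = time.time() - t0

    t0 = time.time()
    shutil.rmtree(root)                                       # rm -rf
    os.makedirs(root)
    removed = time.time() - t0
    return created, listed, removed

root = tempfile.mkdtemp()
print(run_workload(root, ndirs=2, nfiles=5))
```

Run on a FUSE-mounted Gluster volume, each phase maps onto one of the profiled workloads in the following slides.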

Page 5

#Create Files


STAT/LOOKUP - 13% (negative lookups ~12%)
CREATE - 12%
REMOVEXATTR - 13%
SETATTR - 8%
INODELK - 25% (REMOVEXATTR + SETATTR)
ENTRYLK - 10%
FLUSH - 10%
READDIRP - 1%

Page 6

Create Files
- Negative lookups - 11-12%
- Removexattr (security.ima) - 23%
- A lease on the parent directory and on newly created files can eliminate sending inodelk (25%) and entrylk (10%) to the bricks, but this only benefits single-client use cases.
- Create is not a function of the number of bricks.
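One way to cut the ~12% negative lookups is to cache ENOENT results on the client, in the spirit of md-cache's negative-lookup caching. A toy Python sketch (class and method names are hypothetical, not Gluster APIs) with TTL expiry and invalidation on create:

```python
import time

class NegativeLookupCache:
    """Toy cache of paths recently observed to not exist (hypothetical sketch,
    loosely modelled on client-side negative-lookup caching)."""
    def __init__(self, ttl=1.0):
        self.ttl = ttl
        self.absent = {}          # path -> time the ENOENT was observed

    def lookup_is_negative(self, path):
        """True if a recent ENOENT lets us skip the network lookup entirely."""
        seen = self.absent.get(path)
        return seen is not None and time.time() - seen < self.ttl

    def record_enoent(self, path):
        self.absent[path] = time.time()

    def invalidate(self, path):
        """Must be called on create/rename so stale negatives don't linger."""
        self.absent.pop(path, None)

cache = NegativeLookupCache(ttl=5.0)
cache.record_enoent("/vol/newfile")
print(cache.lookup_is_negative("/vol/newfile"))   # True: no wire call needed
cache.invalidate("/vol/newfile")                  # the create happened
print(cache.lookup_is_negative("/vol/newfile"))   # False
```

The hard part in a real multi-client deployment is the invalidation, which is why the deck ties these wins to leases and to single-client use cases.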

Page 7

#mkdir recursive


STAT/LOOKUP - 13% (negative lookups ~12%)
MKDIR - 10%
SETXATTR - 9%
INODELK - 43% (SETXATTR and MKDIR)
ENTRYLK - 21%

Page 8

mkdir recursive
- Negative lookups ~10%, plus cache misses.
- In DHT, can setxattr be compounded with mkdir? That would save the inodelks and the separate setxattr calls.
- A lease on the parent directory and on newly created directories can eliminate sending inodelk and entrylk to the bricks, but this only benefits single-client use cases.
- mkdir is not a function of the number of DHT subvolumes, as it is sent first to the hashed subvolume and then in parallel to the rest.
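The hashed-subvolume-first ordering described above can be sketched as follows, with subvolumes mocked as local directories (the function name and structure are illustrative, not DHT's actual code):

```python
import os, tempfile
from concurrent.futures import ThreadPoolExecutor

def dht_mkdir(subvols, hashed_index, name):
    """Sketch of DHT's mkdir ordering: create on the hashed subvolume first,
    then fan out to the remaining subvolumes in parallel."""
    os.mkdir(os.path.join(subvols[hashed_index], name))   # step 1: hashed subvol
    rest = [s for i, s in enumerate(subvols) if i != hashed_index]
    with ThreadPoolExecutor() as pool:                    # step 2: parallel fan-out
        list(pool.map(lambda s: os.mkdir(os.path.join(s, name)), rest))

subvols = [tempfile.mkdtemp() for _ in range(4)]
dht_mkdir(subvols, hashed_index=1, name="photos")
print(all(os.path.isdir(os.path.join(s, "photos")) for s in subvols))  # True
```

Because the fan-out is parallel, the wall-clock cost is roughly two round trips regardless of subvolume count, which is the deck's point that mkdir does not scale with the number of subvolumes.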

Page 9

#ls -lR


STAT/LOOKUP - 34% (~20% on directories, ~10% cache misses)
OPENDIR - 19%
READDIRP - 47%

Page 10

ls -lR
- Readdirp is sequential and a function of the number of DHT subvolumes, hence not scalable. E.g., reading 800 empty directories:
  10x1 volume = 16000 sequential readdirp calls
  30x1 volume = 48000 sequential readdirp calls
- Readdir-ahead only masks part of this latency; hence, parallel readdirp in DHT.
- Eliminate the end-of-directory detection readdirp call.
- DHT - do not NULL out the readdirp entries of directories.
- Md-cache - eliminate revalidate lookups.
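The parallel-readdirp idea can be illustrated with subvolumes mocked as local directories; in the sequential form each subvolume is queried in series (one wire round trip at a time), while the parallel form issues all queries concurrently:

```python
import os, tempfile
from concurrent.futures import ThreadPoolExecutor

def readdirp_sequential(subvols):
    entries = []
    for s in subvols:                   # one round trip per subvol, in series
        entries.extend(os.listdir(s))
    return entries

def readdirp_parallel(subvols):
    with ThreadPoolExecutor() as pool:  # all subvols queried concurrently
        return [e for chunk in pool.map(os.listdir, subvols) for e in chunk]

subvols = [tempfile.mkdtemp() for _ in range(3)]
for i, s in enumerate(subvols):
    open(os.path.join(s, "f%d" % i), "w").close()
print(sorted(readdirp_sequential(subvols)) == sorted(readdirp_parallel(subvols)))  # True
```

A real implementation additionally has to merge per-subvolume results in a stable directory-offset order; this sketch only shows why the concurrent fan-out removes the per-subvolume latency serialization.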

Page 11

#rm files in one large dir


STAT/LOOKUP - 2%
ENTRYLK - 36%
UNLINK - 19%
READDIRP - 44%

Page 12

Delete Files
- Parallel readdirp may not improve performance drastically for large directories.
- A lease on the parent directory can make entrylk client-side only, but this helps only in single-client use cases.

Page 13

#rmdir small directories recursively


STAT/LOOKUP - 12%
OPENDIR - 9%
INODELK - 55% (rmdir takes sequential inodelks)
RMDIR - 3%
ENTRYLK - 5%
READDIRP - 18%

Page 14

rmdir Recursive
- Rmdir is a function of the number of DHT subvolumes, hence not scalable.
- Inodelks are taken sequentially on all DHT subvolumes before the rmdir (the rmdir and the inodelk unlock are parallel). Work is in progress to remove the inodelk.
- A lease on the parent directory can make the inodelk and entrylk client-side only, but this helps only in single-client use cases.
- Revalidate lookups - md-cache.
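The lock pattern described above, sequential lock acquisition across subvolumes followed by parallel rmdir and unlock, can be mocked with in-process locks (purely illustrative; real inodelks are network round trips, which is why the sequential phase dominates at 55%):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def locked_rmdir(subvol_locks, do_rmdir):
    """Sketch of the rmdir pattern: inodelk acquired sequentially on every
    subvolume, then rmdir and unlock issued in parallel."""
    for lk in subvol_locks:   # sequential lock phase: cost grows with subvol count
        lk.acquire()
    with ThreadPoolExecutor() as pool:
        list(pool.map(do_rmdir, range(len(subvol_locks))))     # parallel rmdir
        list(pool.map(lambda lk: lk.release(), subvol_locks))  # parallel unlock

locks = [threading.Lock() for _ in range(4)]
removed = []
locked_rmdir(locks, removed.append)
print(sorted(removed))  # [0, 1, 2, 3]
```

With each sequential acquire standing in for a full round trip, total latency grows linearly with the number of subvolumes, matching the "not scalable" observation.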

Page 15

Small file create


STAT/LOOKUP - 20% (negative lookups 15-16%)
ENTRYLK - 21%
CREATE - 10%
WRITE - 1%
FLUSH - 50% (FINODELK + FXATTROP + WRITE + FLUSH)

Page 16

Small file Create / PUT
- ENTRYLK and INODELK can be made client-side only by using leases, but this helps only single-client use cases.
- Compound fop: XATTROP + WRITE.
- PUT fop: only gfapi clients (Swift, coreutils) will be able to leverage it.
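The PUT idea is to collapse the sequence a small-file upload currently generates (CREATE, WRITE, FLUSH, each a separate network round trip) into one logical operation. A sketch in plain Python file I/O, standing in for the FOP sequence (this is not the gfapi API, just an illustration of the collapsed call shape):

```python
import os, tempfile

def put_small_file(path, data):
    """One logical call covering what FUSE issues as separate FOPs:
    CREATE + WRITE + FLUSH, each of which costs a round trip in Gluster."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC)
    try:
        os.write(fd, data)
        os.fsync(fd)          # stands in for the FLUSH
    finally:
        os.close(fd)

path = os.path.join(tempfile.mkdtemp(), "obj")
put_small_file(path, b"small payload")
print(open(path, "rb").read())  # b'small payload'
```

A FUSE client cannot batch these, since the kernel issues open/write/flush separately; a gfapi client such as Swift sees the whole object up front and could send it as a single PUT fop.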

Page 17

Small file Read


STAT/LOOKUP - 95%
OPEN - 2.5%
READ - 1.2%
FLUSH - 0.5%

Page 18

Conclusion

We have not yet exhausted the possible performance improvements. The performance of these metadata operations can be improved greatly, without compromising consistency.