Efficient data maintenance in GlusterFS using databases

  1. Efficient data maintenance in GlusterFS using databases. Joseph Fernandes, Dan Lambright
  2. Who we are: Joseph Fernandes (Senior Engineer, Red Hat Storage); Dan Lambright (Principal Engineer, Red Hat Storage)
  3. Agenda: quick GlusterFS overview; data maintenance challenges; existing solutions; proposed solution: an optimized database; case study: the GlusterFS data cache tier; lessons learned; what's next
  4. What is GlusterFS: a distributed file system; software-defined NAS; runs over TCP/IP or RDMA; accessed through the native client, SMB, or NFS
  5. What is data maintenance: maintenance tasks performed on data for protection, performance, and optimum storage utilization
  6. Challenges in data maintenance: data maintenance imposes overhead on CPU, memory, storage, and network. It therefore needs fast search, rich metadata, distribution, and load balancing.
  7. Existing solutions: file system crawl; file system log; metadata databases; in-memory inode caches
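To make the cost of the first existing solution concrete, here is a minimal Python sketch of a file system crawl that looks for files modified since a given time. It is a generic illustration, not GlusterFS code, and the function name `crawl_modified_since` is hypothetical. Every run stats every file, so its cost grows with the size of the namespace rather than with the number of files that actually changed.

```python
import os
from typing import List


def crawl_modified_since(root: str, since: float) -> List[str]:
    """Walk every directory under root; return files with mtime newer than since."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # One stat() per file, whether or not the file changed.
            if os.stat(path).st_mtime > since:
                changed.append(path)
    return changed
```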
  8. Proposed solution: an optimized DB for GlusterFS
  9. Optimized DB for GlusterFS: "record now, consume later"; a database optimized for fast recording; good querying capabilities; an embedded database; crash consistent (eventually)
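The "record now, consume later" idea can be sketched with Python's built-in `sqlite3` module standing in for the embedded database. The `file_heat` table, its columns, and both helper functions are hypothetical and do not reflect the actual libgfdb schema. The IO path only performs a cheap insert-or-update, and the analysis is deferred to a query that runs later.

```python
import sqlite3
import time

# Hypothetical schema, not the libgfdb schema: one row per file, keyed by GFID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_heat (
    gfid        TEXT PRIMARY KEY,
    last_write  REAL NOT NULL,
    write_count INTEGER NOT NULL DEFAULT 0
)
"""


def record_write(conn, gfid, now=None):
    """Record path: runs on every write, so it only does a cheap upsert."""
    now = time.time() if now is None else now
    # Create the row if it is new, then bump its counter and timestamp.
    conn.execute(
        "INSERT OR IGNORE INTO file_heat (gfid, last_write, write_count) "
        "VALUES (?, ?, 0)",
        (gfid, now),
    )
    conn.execute(
        "UPDATE file_heat SET last_write = ?, write_count = write_count + 1 "
        "WHERE gfid = ?",
        (now, gfid),
    )


def files_written_since(conn, since):
    """Consume path: run later, e.g. by a maintenance scanner."""
    rows = conn.execute(
        "SELECT gfid FROM file_heat WHERE last_write >= ? ORDER BY gfid",
        (since,),
    )
    return [gfid for (gfid,) in rows]
```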
  10. LibgfDB: API abstraction; rich search filters; performance optimization options
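One way to picture an API abstraction with rich search filters is an interface that callers program against, plus a backend that turns filter objects into SQL. This Python sketch is an illustrative assumption only: `HeatFilter`, `DataStore`, and `SqliteDataStore` are hypothetical names and are not the libgfdb C API.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeatFilter:
    """Hypothetical search filter: unset fields are ignored, set ones are ANDed."""
    written_since: Optional[float] = None
    min_writes: Optional[int] = None


class DataStore(ABC):
    """Callers code against this interface, so the backend can be swapped."""

    @abstractmethod
    def record_write(self, gfid: str, when: float) -> None: ...

    @abstractmethod
    def query(self, flt: HeatFilter) -> List[str]: ...


class SqliteDataStore(DataStore):
    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS heat "
            "(gfid TEXT PRIMARY KEY, last_write REAL, writes INTEGER)"
        )

    def record_write(self, gfid: str, when: float) -> None:
        self.conn.execute("INSERT OR IGNORE INTO heat VALUES (?, ?, 0)", (gfid, when))
        self.conn.execute(
            "UPDATE heat SET last_write = ?, writes = writes + 1 WHERE gfid = ?",
            (when, gfid),
        )

    def query(self, flt: HeatFilter) -> List[str]:
        # Build the WHERE clause only from the filter fields that are set.
        clauses, params = [], []
        if flt.written_since is not None:
            clauses.append("last_write >= ?")
            params.append(flt.written_since)
        if flt.min_writes is not None:
            clauses.append("writes >= ?")
            params.append(flt.min_writes)
        where = " WHERE " + " AND ".join(clauses) if clauses else ""
        sql = "SELECT gfid FROM heat" + where + " ORDER BY gfid"
        return [row[0] for row in self.conn.execute(sql, params)]
```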
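  11. Architecture (diagram): on the Gluster brick, client IO passes through the CTR xlator to the POSIX xlator; the CTR xlator inserts and updates records in the datastore through LIBGFDB, and data maintenance scanners query the datastore through LIBGFDB.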
  12. Datastore optimization: Sqlite3 PRAGMAs. page_size: align the page size; cache_size: increase the cache size; journal_mode: change to WAL; wal_autocheckpoint: checkpoint less often; synchronous: set to NORMAL; auto_vacuum: set to NONE
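The PRAGMA changes listed above can be applied from Python's `sqlite3` module as follows. The pragma names are SQLite's own; the concrete numbers (4096-byte pages, a cache of roughly 16 MiB, a 5000-page autocheckpoint threshold) are illustrative assumptions, since the slide gives no values, and `open_tuned_db` is a hypothetical helper.

```python
import sqlite3


def open_tuned_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # page_size and auto_vacuum only take effect before the database file is
    # initialized (or after a VACUUM), so set them first.
    conn.execute("PRAGMA page_size = 4096")    # assumed value: match the FS block size
    conn.execute("PRAGMA auto_vacuum = NONE")  # no vacuum work on the write path
    conn.execute("PRAGMA journal_mode = WAL")  # append to a WAL instead of a rollback journal
    # In WAL mode, NORMAL syncs at checkpoints rather than on every commit.
    conn.execute("PRAGMA synchronous = NORMAL")
    conn.execute("PRAGMA cache_size = -16000")  # assumed value: negative means KiB, ~16 MiB
    conn.execute("PRAGMA wal_autocheckpoint = 5000")  # assumed value: default is 1000 pages
    return conn
```

The trade-off: with `synchronous = NORMAL` in WAL mode, SQLite stays consistent after a crash, but the most recent commits may be rolled back after a power loss, which fits the "crash consistent (eventually)" goal stated earlier.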
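  13. DataStore optimization: Sqlite3 (diagram). Components: buffer cache, shared-memory file, write-ahead log (WAL), database file. Inserts/updates pass through the buffer cache and are synced to the WAL; a checkpoint moves the WAL contents into the database file.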
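The write path in this diagram (commit to the WAL first, copy into the database file at a checkpoint) can be observed directly. This sketch assumes a writable local path and uses the hypothetical helper name `wal_demo`. It commits a batch of rows, measures the `-wal` file SQLite keeps next to the database, forces a checkpoint with `PRAGMA wal_checkpoint(TRUNCATE)`, and measures again.

```python
import os
import sqlite3
from typing import Tuple


def wal_demo(db_path: str) -> Tuple[int, int]:
    """Return the WAL file size in bytes after a commit and after a checkpoint."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode = WAL").fetchone()
    conn.execute("CREATE TABLE IF NOT EXISTS heat (gfid TEXT PRIMARY KEY, writes INTEGER)")
    with conn:  # one transaction; the commit appends frames to the WAL, not the DB file
        conn.executemany(
            "INSERT OR REPLACE INTO heat VALUES (?, ?)",
            ((f"gfid-{i}", i) for i in range(1000)),
        )
    wal_path = db_path + "-wal"
    before = os.path.getsize(wal_path)
    # Checkpoint: copy committed WAL frames into the database file,
    # then truncate the WAL to zero bytes.
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    after = os.path.getsize(wal_path)
    conn.close()
    return before, after
```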
  14. Cache tiering (Gluster 3.7 feature). Tiering: a logical volume composed of diverse storage units (secure/non-secure, compressed/uncompressed, etc.). Cache tiering: fast storage acts as a cache for slow storage, e.g. fast SSD over slow HDD, or a fast 2x-replicated tier over a slow erasure-coded tier. What goes in the cache? The DB tracks usage patterns, and files migrate between tiers according to usage. Migration is slow.
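Deciding what to move can be phrased as two queries over the usage data the DB tracks. The sketch below assumes a hypothetical `heat` table with a `tier` column, a last-access timestamp, and a hit counter that is reset after each migration pass; `migration_candidates` and its thresholds are illustrative parameters, not values taken from GlusterFS.

```python
import sqlite3

# Hypothetical schema for a two-tier volume.
HEAT_SCHEMA = (
    "CREATE TABLE IF NOT EXISTS heat "
    "(gfid TEXT PRIMARY KEY, tier TEXT, last_access REAL, hits INTEGER)"
)


def migration_candidates(conn, now, window, min_hits):
    """Return (promote, demote) lists of GFIDs.

    Promote: cold-tier files accessed within the last `window` seconds with
    at least `min_hits` hits this cycle. Demote: hot-tier files not accessed
    within the window.
    """
    cutoff = now - window
    promote = [r[0] for r in conn.execute(
        "SELECT gfid FROM heat WHERE tier = 'cold' AND last_access >= ? "
        "AND hits >= ? ORDER BY gfid", (cutoff, min_hits))]
    demote = [r[0] for r in conn.execute(
        "SELECT gfid FROM heat WHERE tier = 'hot' AND last_access < ? "
        "ORDER BY gfid", (cutoff,))]
    return promote, demote
```

Because migration is slow, a real policy would also cap how many candidates each pass acts on.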
  15. Policies for smart migration: file size; access rate; migration frequency; breaking files into chunks (the Gluster sharding feature)
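The policy criteria on this slide can be combined into a single promotion check, and chunking turns whole-file migration into per-chunk migration. In this Python sketch the `Policy` thresholds, `should_promote`, and `hot_chunks` are hypothetical, and the chunk size is a parameter rather than a claim about Gluster's actual shard size.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical thresholds: the slide names the criteria, not values."""
    max_file_size: int      # bytes; larger files are too costly to move whole
    min_access_rate: float  # accesses per hour needed to earn promotion
    min_interval: float     # seconds a file must stay put after migrating


def should_promote(policy, size, accesses_per_hour, last_migrated, now):
    if size > policy.max_file_size:
        return False  # too big to move whole; with chunking, consider chunks instead
    if now - last_migrated < policy.min_interval:
        return False  # limit migration frequency to avoid ping-ponging between tiers
    return accesses_per_hour >= policy.min_access_rate


def hot_chunks(accessed_offsets, chunk_size):
    """Map accessed byte offsets to chunk indexes, so only those chunks move."""
    return sorted({off // chunk_size for off in accessed_offsets})
```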
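  16. Tier architecture (diagram). Client side: the tier xlator sits above a HOT DHT and a COLD DHT, with replication and other client xlators. Server side: the bricks of both the HOT tier and the COLD tier run the POSIX xlator, the CTR xlator, and other server xlators over brick storage, each with a heat data store. Files move from the cold tier to the hot tier by promotion and from the hot tier to the cold tier by demotion.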
  17. Lessons learned: DB updates can be expensive; DB queries may have scalability problems
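A common way to reduce per-update cost is to buffer updates in memory and write them to the database in a single transaction. The Python sketch below illustrates that general technique; `BatchedRecorder` and its `heat` table are hypothetical, and this is not the design of the iMeTaL or PeTal logs. Buffered counts are lost if the process dies before `flush()`, which trades some durability for fewer, larger writes.

```python
import sqlite3
from collections import Counter


class BatchedRecorder:
    """Hypothetical write-behind buffer: count updates in memory and flush
    them in one transaction instead of issuing one statement per IO."""

    def __init__(self, conn: sqlite3.Connection, flush_every: int = 1000):
        self.conn = conn
        self.flush_every = flush_every
        self.pending = Counter()  # gfid -> writes not yet in the DB
        self.buffered = 0

    def record_write(self, gfid: str) -> None:
        self.pending[gfid] += 1
        self.buffered += 1
        if self.buffered >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        with self.conn:  # a single transaction for the whole batch
            self.conn.executemany(
                "INSERT OR IGNORE INTO heat (gfid, writes) VALUES (?, 0)",
                ((g,) for g in self.pending),
            )
            self.conn.executemany(
                "UPDATE heat SET writes = writes + ? WHERE gfid = ?",
                ((n, g) for g, n in self.pending.items()),
            )
        self.pending.clear()
        self.buffered = 0
```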
  18. What's next. Libgfdb performance options: iMeTaL (in-memory transaction log); PeTal (persistent transaction log); Sqlite3 database sharding. Ceph tier implementation: Bloom filters
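A Bloom filter records set membership in a fixed-size bit array, which makes it a compact way to remember which objects were accessed during an interval. Below is a minimal, generic Python sketch; the class name, bit count, and hash count are arbitrary choices and do not describe Ceph's implementation. It derives its k bit positions from one SHA-256 digest by double hashing.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash positions per key.

    Lookups can return false positives but never false negatives.
    """

    def __init__(self, m_bits: int = 8192, k: int = 4):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: position_i = h1 + i * h2 (mod m), from one digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

A heat tracker could keep one filter per time interval and count how many recent filters report an object. A false positive may cause an occasional unnecessary promotion, but an accessed object is never missed.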
  19. Feature page: http://www.gluster.org/community/documentation/index.php/Features/ ; Gluster GitHub: https://github.com/gluster/glusterfs ; Email: Joseph Fernandes, Dan Lambright
  20. THANK YOU