
  • Resilient and Fast Persistent Container Storage: Leveraging Linux’s Storage Functionalities

    Philipp Reisner, CEO LINBIT

  • LINBIT - the company behind it

    COMPANY OVERVIEW

    • Developer of DRBD
    • 100% founder owned
    • Offices in Europe and US
    • Team of 30 highly experienced Linux experts
    • Partner in Japan

    [Slide also carries REFERENCES and TECHNOLOGY OVERVIEW panels; graphics not preserved]

  • Linux Storage Gems

    LVM, RAID, SSD cache tiers, deduplication, targets & initiators

  • Linux's LVM

    [Diagram: a Volume Group built from physical volumes; logical volumes and a snapshot are allocated out of it]

  • Linux's LVM

    • based on device mapper
    • original objects
      • PVs, VGs, LVs, snapshots
      • LVs can scatter over PVs in multiple segments
    • thin LVs
      • thin pools are LVs themselves
      • thin LVs live in thin pools
      • multiple snapshots became efficient!
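
    A minimal sketch of these objects on the command line (device paths, VG and LV names are placeholders):

        # create physical volumes and aggregate them into a volume group
        pvcreate /dev/sdb /dev/sdc
        vgcreate vg0 /dev/sdb /dev/sdc

        # a classic (thick) logical volume plus a snapshot of it
        lvcreate --name lv_data --size 50G vg0
        lvcreate --snapshot --name lv_data_snap --size 5G vg0/lv_data

        # a thin pool and a thin LV living in it; thin snapshots need no size
        lvcreate --type thin-pool --name pool0 --size 100G vg0
        lvcreate --thin --name thin1 -V 200G vg0/pool0
        lvcreate --snapshot --name thin1_snap vg0/thin1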

  • Linux's LVM

    [Diagram: a VG over three PVs containing a thin pool LV; thin LVs and a thin snapshot LV live inside the pool]

  • Linux's RAID

    • original MD code
      • mdadm command
    • RAID levels: 0, 1, 4, 5, 6, 10
    • now available in LVM as well
      • device mapper interface for MD code
      • do not call it ‘dmraid’; that is software for hardware fake-RAID
      • lvcreate --type raid6 --size 100G VG_name

    [Diagram: RAID1 mirroring blocks A1..A4 across two disks]
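
    Both routes in a hedged sketch (device names are illustrative):

        # classic MD route: a two-disk RAID1 via mdadm
        mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

        # LVM route: the same redundancy expressed as an LV type
        lvcreate --type raid1 --mirrors 1 --size 100G --name lv_mirror vg0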

  • SSD cache for HDD

    • dm-cache
      • device mapper module
      • accessible via LVM tools
    • bcache
      • generic Linux block device
      • slightly ahead in the performance game
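
    For the dm-cache route through the LVM tools, a possible sequence (assumes the SSD /dev/nvme0n1 was already added to vg0 as a PV; all names are placeholders):

        # carve a cache pool out of the SSD
        lvcreate --type cache-pool --name cpool --size 50G vg0 /dev/nvme0n1

        # attach it to an existing HDD-backed LV
        lvconvert --type cache --cachepool vg0/cpool vg0/lv_data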

  • 31

    Linux’s DeDupe• Virtual Data Optimizer (VDO) since RHEL 7.5

    • Red hat acquired Permabit and is GPLing VDO

    • Linux upstreaming is in preparation

    • in-line data deduplication

    • kernel part is a device mapper module

    • indexing service runs in user-space

    • async or synchronous writeback

    • Recommended to be used below LVM
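
    A sketch using the RHEL 7.5-era vdo manager tool (device, names and sizes are placeholders):

        # create a deduplicated volume on a raw device
        vdo create --name=vdo0 --device=/dev/sdb --vdoLogicalSize=10T

        # then put LVM on top, per the recommendation above
        pvcreate /dev/mapper/vdo0
        vgcreate vg_dedup /dev/mapper/vdo0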

  • Linux’s targets & initiators

    • Open-iSCSI initiator
    • IETD, STGT, SCST
      • mostly historical
    • LIO
      • iSCSI, iSER, SRP, FC, FCoE
      • SCSI pass-through, block IO, file IO, user-specific IO
    • NVMe-oF
      • target & initiator

    [Diagram: initiator sends IO requests to the target; the target returns data and completions]
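
    Initiator-side session setup, sketched for both protocols (portal address, IQN and NQN are placeholders):

        # iSCSI with Open-iSCSI
        iscsiadm -m discovery -t sendtargets -p 192.168.1.10
        iscsiadm -m node -T iqn.2018-01.com.example:store1 -p 192.168.1.10 --login

        # NVMe-oF with nvme-cli over RDMA
        nvme discover -t rdma -a 192.168.1.10 -s 4420
        nvme connect -t rdma -n nqn.2018-01.com.example:nvme1 -a 192.168.1.10 -s 4420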

  • ZFS on Linux

    • Ubuntu ecosystem only
    • has its own
      • logical volume manager (zvols)
      • thin provisioning
      • RAID (RAIDz)
      • caching for SSDs (ZIL, SLOG)
      • and a file system!
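
    A brief sketch (pool name, devices and sizes are placeholders):

        # pool with single-parity RAIDz plus an SSD log device (SLOG)
        zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd log /dev/nvme0n1

        # a sparse (thin-provisioned) zvol and a plain file system
        zfs create -s -V 100G tank/vol1
        zfs create tank/data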

  • Put in simplest form

  • DRBD – think of it as ...

    [Diagram: RAID1 over the network; an initiator/target pair carries IO requests and data/completions between the two mirrored disks]
  • DRBD Roles: Primary & Secondary

    [Diagram: a Primary node replicating to a Secondary node]
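
    A minimal DRBD resource file, sketched for two nodes (host names, devices and addresses are placeholders):

        resource r0 {
            device    /dev/drbd0;
            disk      /dev/vg0/lv_r0;
            meta-disk internal;

            on alpha {
                node-id 0;
                address 10.0.0.1:7789;
            }
            on bravo {
                node-id 1;
                address 10.0.0.2:7789;
            }
        }

    After drbdadm up r0 on both nodes, drbdadm primary r0 promotes one side to Primary; writes there are replicated to the Secondary.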

  • DRBD – multiple Volumes

    • consistency group

    [Diagram: a Primary replicating several volumes of one resource to a Secondary]

  • DRBD – up to 32 replicas

    • each may be synchronous or async

    [Diagram: one Primary replicating to two Secondaries]

  • DRBD – Diskless nodes

    • intentional diskless (no change tracking bitmap)
    • disks can fail

    [Diagram: a diskless Primary attached to two diskful Secondaries]

  • DRBD - more about

    • a node knows the version of the data it exposes
    • automatic partial resync after connection outage
    • checksum-based verify & resync
    • split brain detection & resolution policies
    • fencing
    • quorum
    • multiple resources per node possible (1000s)
    • dual Primary for live migration of VMs only!
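
    Quorum is configured per resource; a hedged sketch of the option block (the values shown are common choices, not the only ones):

        resource r0 {
            options {
                quorum majority;         # allow IO only while a majority of nodes is connected
                on-no-quorum io-error;   # fail IO instead of blocking when quorum is lost
            }
            # ... device/disk/on sections as before ...
        }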

  • 31

    DRBD Roadmap• performance optimizations (2018)

    • meta-data on PMEM/NVDIMMS

    • zero copy receive on diskless (RDMA-transport)

    • no context switch send (RDMA & TCP transport)

    • Eurostars grant: DRBD4Cloud• erasure coding (2019)

  • The combination is more than the sum of its parts

  • LINSTOR - goals

    • storage built from generic (x86) nodes
    • for SDS consumers (K8s, OpenStack, OpenNebula)
    • building on existing Linux storage components
      • LVM, thin LVM or ZFS for volume management (Stratis later)
    • multiple tenants possible
    • deployment architectures
      • distinct storage nodes
      • hyperconverged with hypervisors / container hosts
    • Open Source, GPL
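
    A hedged walk-through of provisioning a replicated volume with the linstor client (node names, addresses, pool and resource names are placeholders):

        # register nodes and a storage pool backed by an LVM thin pool
        linstor node create alpha 10.0.0.1
        linstor node create bravo 10.0.0.2
        linstor storage-pool create lvmthin alpha pool1 vg0/thinpool
        linstor storage-pool create lvmthin bravo pool1 vg0/thinpool

        # define a resource with one 20 GiB volume, auto-placed on 2 nodes
        linstor resource-definition create res1
        linstor volume-definition create res1 20G
        linstor resource create res1 --auto-place 2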

  • LINSTOR

    [Diagram: two hypervisors running VMs consume volumes from four storage nodes; DRBD replicates between the storage nodes]

  • LINSTOR w. failed Hypervisor

    [Diagram: one hypervisor has failed; the VMs run on the remaining hypervisor, still reaching their DRBD-replicated volumes]

  • LINSTOR w. failed storage node

    [Diagram: one storage node has failed; DRBD keeps serving the data from the surviving replica]

  • LINSTOR - Hyperconverged

    [Diagram: six combined hypervisor & storage nodes; VMs run next to their replicated volumes]

  • LINSTOR - VM migrated

    [Diagram: a VM migrated to another node, initially accessing its replicas over the network]

  • LINSTOR - add local storage

    [Diagram: a local replica is added on the VM's new node]

  • LINSTOR - remove 3rd copy

    [Diagram: the now superfluous third replica is removed; see the client calls sketched below]
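
    The last two steps map onto client calls roughly like these (node and resource names are placeholders; exact flags may differ between LINSTOR releases):

        # add a diskful replica on the VM's new host
        linstor resource create charlie res1 --storage-pool pool1

        # drop the replica that is no longer needed
        linstor resource delete alpha res1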

  • LINSTOR Architecture

    [Architecture diagram not preserved]

  • LINSTOR Roadmap

    • Swordfish API
      • volume management
      • access via NVMe-oF
      • inventory sync from Redfish/Swordfish
    • support for multiple sites & DRBD-Proxy (Dec 2018)
    • north bound drivers
      • Kubernetes, OpenStack, OpenNebula, Proxmox, XenServer (Kubernetes sketch below)
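
    On the Kubernetes side, consumption could look roughly like the StorageClass below; the provisioner name and parameter keys follow the LINSTOR CSI driver and are assumptions here, not a confirmed interface:

        # linstor-sc.yaml - apply with: kubectl apply -f linstor-sc.yaml
        apiVersion: storage.k8s.io/v1
        kind: StorageClass
        metadata:
          name: linstor-replicated
        provisioner: linstor.csi.linbit.com   # assumed driver name
        parameters:
          autoPlace: "2"        # assumed key: number of replicas
          storagePool: pool1    # assumed key: LINSTOR storage pool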

  • Case study - Intel

    LINBIT working together with Intel

    LINSTOR is a storage orchestration technology that brings storage from generic Linux servers and SNIA Swordfish enabled targets to containerized workloads as persistent storage. LINBIT is working with Intel to develop a Data Management Platform that includes a storage backend based on LINBIT’s software. LINBIT adds support for the SNIA Swordfish API and NVMe-oF to LINSTOR.

    Intel® Rack Scale Design (Intel® RSD) is an industry-wide architecture for disaggregated, composable infrastructure that fundamentally changes the way a data center is built, managed, and expanded over time.

  • Thank you

    https://www.linbit.com
