(Redhat) Linux Important Stuff (34)

Embed Size (px)

Citation preview

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    1/50

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    2/50

    Picking the Right File & Storage

    System for your Application

    Ric Wheeler

    Architect & Manager, Red HatJune 23, 2010

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    3/50

    Overview

    Introduction Local File Systems

    Networked File Systems

    Shared Disk File Systems

    Storage Overview

    New RHEL6 FS Features

    Performance Results

    Futures

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    4/50

    File System Types

    Local file systems ext3, ext4, xfs and btrfs

    Network file systems

    NFS and CIFS

    Shared disk file systems

    GFS2

    Cloud file systems

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    5/50

    Which File System is Best?

    It always depends on your specific application andcircumstances

    Budget?

    Performance requirements?

    Capacity needs?

    Availability?

    Robustness in the face of power outages & crashes?

    IO Workload generated by your application? Different answers for every combination of answers!

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    6/50

    Data Integrity over System Crash

    Systems can fail for multiple reasons Power outage, hardware fault, software failure

    Modern file systems use a journal mechanism tomaintain consistent state

    Similar to a database transaction

    Correctness tied to order that data makes it to safestorage

    barrier support manages volatile storage device writecache

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    7/50

    Alignment on Storage

    Most storage has a preferred IO size and alignment Simple disks have a 512 byte IO size and alignment

    need

    New drives move to 4096 byte IO and alignment

    Historic default to sector 63

    Does not work for some storage at all

    Can be a big performance hit for some sophisticatedstorage devices

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    8/50

    Discard Support

    File systems now issue discard hints to block layer Informs storage of unused ranges of blocks

    Allows storage to keep an accurate picture of what isutilized

    SSD devices see this as a TRIM command

    Used for wear leveling, pre-erase, etc

    SCSI devices see this as an UNMAP command

    Used for thinly provisioned LUNs

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    9/50

    Overview

    Introduction Local File Systems

    Networked File Systems

    Shared Disk File Systems

    Storage Overview

    New RHEL6 FS Features

    Performance Results

    Futures

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    10/50

    EXT3 Pros & Cons

    10

    ext3 is the most common file system in Linux

    Most distributions have used it as their default

    Applications tuned to its specific behaviors

    Familiar to most system administrators

    ext3 challenges

    File system repair (fsck) time can be extremely long

    Limited scalability - maximum file system size of 16TB

    Can be significantly slower than other local file systems

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    11/50

    EXT4 Pros & Cons

    11

    Ext4 has many compelling new features

    Extent based allocation

    Faster fsck time (up to 10x over ext3)

    Delayed allocation

    Higher bandwidth

    Should be relatively familiar for existing ext3 users

    Ext4 challenges

    Large device support not finished in its user space tools

    Limits supported maximum file system size to 16TB

    Has different behavior over system failure

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    12/50

    XFS Pros and Cons

    XFS is very robust and scalable Very good performance for large storage

    configurations and large servers

    Many years of use on large (> 16TB) storage

    Red Hat tests & supports up to 100TB

    XFS challenges

    Not as well known by many customers and field

    support people Performance issues with meta-data intensive (small

    file creation) workloads

    12

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    13/50

    BTRFS

    Btrfs is the newest local file system Has its own internal RAID and snapshot support

    Does full data integrity checks for metadata anduser data

    Can dynamically grow and shrink

    Supported in RHEL6 as a tech preview item

    Developers very interested in feedback and testing

    Not meant for production use!

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    14/50

    ext3 is our default file system for RHEL5

    ext4 is supported as a tech preview in (5.4)

    xfs offered as a layered product (5.5+)

    RHEL5 Local File Systems

    14

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    15/50

    RHEL6 Local FS Summary

    FS write barrier enabled for ext3, ext4, gfs2 and xfs

    FS tools warn about unaligned partitions

    parted/anaconda responsible for alignment

    Size Limitations XFS for any single node & GFS2 for clusters up to

    100TB

    Ext3 & ext4 supported < 16TB

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    16/50

    Overview

    Introduction

    Local File Systems

    Networked File Systems

    Shared Disk File Systems

    Storage Overview

    New RHEL6 FS Features

    Performance Results

    Futures

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    17/50

    NFS Overview

    Supported by a huge range of hardware NFS servers range from consumer devices up to high

    end NAS arrays

    Performance varies with network & hardware

    Scales up to very large file systems Popular uses

    Users' home directories

    Read-mostly workloads in scale out configurations ofdozens of nodes

    See Steve Dickson's talk on NFS for details

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    18/50

    NFS Limitations

    Traditional NFS servers can be a bottleneck Parallel NFS (pNFS) is a new standard that allows

    direct client to data connections

    Object, block and file versions

    Does not provide SMP-like coherency for clients Client A needs to wait to see data written by client B

    Similar issue with newly created files in a directory

    NFS V4.0 delegations improve this situation

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    19/50

    CIFS and Samba

    Samba is a server that speaks Microsoft SMBprotocols

    Allows RHEL to provide networked storage for windowsguests

    CIFS is the client side file system that provides accessto SMB servers

    Allows RHEL clients of windows or Samba servers

    See Jeff Layton's CIFS or Simo Sorce's Samba talk for

    details

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    20/50

    Overview

    Introduction

    Local File Systems

    Networked File Systems

    Shared Disk File Systems

    Storage Overview

    New RHEL6 FS Features

    Performance Results

    Futures

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    21/50

    Shared Disk File Systems

    Design goal is to provide tight coherence and highavailability

    Avoids most of the issues and lags seen with NFSclients and servers

    Achieves this by aggressive use of distributed locks Requires shared storage

    Shared disk file systems pay for this tighter coherency

    Tend be slower than a dedicated local file system

    Complex to set up and maintain

    Application tuning needed to avoid lock thrashing

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    22/50

    Choosing Between NFS & GFS2?

    GFS2 is a layered product aimed at deployments thatneed high availability

    Supported on clusters from 2-16 nodes

    GFS1 support is dropped in RHEL6

    Maximum FS size is 100TB Users are encouraged to review configuration with Red

    Hat

    NFS deployments are much easier to set up and

    configure

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    23/50

    Overview

    Introduction

    Local File Systems

    Networked File Systems

    Shared Disk File Systems

    Storage Overview

    New RHEL6 FS Features

    Performance Results

    Futures

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    24/50

    Storage Systems Overview

    Different types of storage have wildly varyingperformance characteristics

    Random write?

    Random read?

    Streaming read? Streaming write?

    File systems historically have been tuned to run beston traditional, single rotating disk drives

    See Tom Coughlan's talk on storage for details

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    25/50

    Traditional Spinning Disk

    Spinning platters store data Modern drives have a large, volatile write cache (16+

    MB)

    Streaming read/write performance of a single S-ATA

    drive can sustain roughly 100MB/sec Seek latency bounds random IO to the order of 50-100

    random IO's/sec

    This is the classic platform that operating systems &

    applications are designed for Write barrier support needed on these devices

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    26/50

    External Disk Arrays

    External disk arrays can be extremely sophisticated Large non-volatile cache used to store data

    IO from a host normally lands in this cache withouthitting spinning media

    Performance changes Streaming reads and writes are vastly improved

    Random writes and reads are fast when they hit cache

    Random reads can be very slow when they miss cache No need for write barrier support on these devices

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    27/50

    SSD Devices

    S-ATA interface SSD's Streaming reads & writes are reasonable

    Random writes normally slow

    Random reads great!

    PCI-e interface SSD's enhance performance acrossthe board

    Both types of devices tend to use internal DRAM as abuffer

    Some might need write barrier support

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    28/50

    Overview

    Introduction

    Local File Systems

    Networked File Systems

    Shared Disk File Systems

    Storage Overview

    New RHEL6 FS Features

    Performance Results

    Futures

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    29/50

    RHEL6 Support for Alignment

    New standards allow storage to inform OS of preferredalignment and IO sizes

    Few storage devices currently export the information

    Partitions must be aligned using the new alignmentvariables

    fdisk, parted, etc snap to proper alignment

    FS tools warn of misaligned partitions

    Red Hat engineering is actively working with partnersto verify and enhance this for our customers

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    30/50

    RHEL6 Support for Discard

    File system level feature that informs storage ofregions no longer in active use

    SSD devices see this as a TRIM command and use it todo wear leveling, etc

    Arrays see this as a SCSI UNMAP command and canenhance thin lun support

    Discard support is off by default

    Some devices handle TRIM poorly

    Might have performance impact Test carefully and consult with your storage provider!

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    31/50

    RHEL6 NFS Features

    NFS version 4 is the default Per client configuration file can override version 4

    Negotiates downwards to V3, V2, etc

    Support for industry standard encryption types IPV6 Support added for NFS and CIFS

    NFS clients fully supported in 6.0

    NFS server support for IPV6 aimed at 6.1

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    32/50

    Overview

    Introduction

    Local File Systems

    Networked File Systems

    Shared Disk File Systems

    Storage Overview

    New RHEL6 FS Features

    Performance Results

    Futures

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    33/50

    Performance & Measurement

    Workload, storage device and server type all have ahuge impact

    Always measure your actual application on your realsystem if possible!

    Same test run on different storage can give oppositeresults

    Various file systems have special tuning that can help

    See talks by our performance team Rao & Wagner

    and Shakshober & Woodman

    Making a File System Elapsed Time (sec)

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    34/50

    Making a File System Elapsed Time (sec)Smaller is Better

    S-ATA Disk - 1TB FS PCI-E SSD - 75GB FS

    0

    50

    100

    150

    200

    250

    300

    EXT3EXT4

    XFS

    BTRFS

    Making a File System Elapsed Time (sec)

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    35/50

    Making a File System Elapsed Time (sec)Smaller is Better (Zooming in on SSD)

    PCI-E SSD - 75GB FS

    0

    0.5

    1

    1.5

    2

    2.5

    3

    3.5

    4

    EXT3EXT4

    XFS

    BTRFS

    Creating Lots of Small Files Elapsed Time (sec)

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    36/50

    Creating Lots of Small Files Elapsed Time (sec)Smaller is Better

    S-ATA Disk - 100GB File PCI-E SSD - 50GB File

    0

    2000

    4000

    6000

    8000

    10000

    12000

    EXT3EXT4

    XFS

    BTRFS

    Creating Lots of Small Files Elapsed Time (sec)

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    37/50

    Creating Lots of Small Files Elapsed Time (sec)Smaller is Better (Zooming in on SSD)

    PCI-E SSD - 50GB File

    0

    50

    100

    150

    200

    250

    300

    350

    400

    EXT3EXT4

    XFS

    BTRFS

    Creating Lots of Small Files Elapsed Time (sec)

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    38/50

    Creating Lots of Small Files Elapsed Time (sec)Smaller is Better

    S-ATA Disk - 100GB File PCI-E SSD - 50GB File

    0

    2000

    4000

    6000

    8000

    10000

    12000

    EXT3EXT4

    XFS

    BTRFS

    Creating Lots of Small Files Elapsed Time (sec)

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    39/50

    Creating Lots of Small Files Elapsed Time (sec)Smaller is Better (Zooming in on SSD)

    PCI-E SSD - 50GB File

    0

    50

    100

    150

    200

    250

    300

    350

    400

    EXT3EXT4

    XFS

    BTRFS

    File System Repair Elapsed Time (secs)

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    40/50

    y p p ( )Smaller is Better

    S-ATA Disk - FSCK 1 Million Files PCI-E SSD - FSCK 1 Million Files

    0

    200

    400

    600

    800

    1000

    1200

    EXT3EXT4

    XFS

    BTRFS

    File System Repair Elapsed Time (secs)

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    41/50

    y p p ( )Smaller is Better (Zooming in on SSD)

    PCI-E SSD - FSCK 1 Million Files

    0

    10

    20

    30

    40

    50

    60

    70

    80

    EXT3EXT4

    XFS

    BTRFS

    Writing a Few Medium Files Elapsed Time (secs)

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    42/50

    g p ( )Smaller is Better

    S-ATA Disk - 20 1GB Files PCI-E SSD - 20 1GB Files

    0

    50

    100

    150

    200

    250

    300

    EXT3EXT4

    XFS

    BTRFS

    Writing 1 Really Big File MB/sec

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    43/50

    g y gBigger is Better

    S-ATA Disk - 100GB File PCI-E SSD - 50GB File

    0

    50

    100

    150

    200

    250

    300

    350

    400

    450

    EXT3

    EXT4

    XFS

    BTRFS

    Block Device

    RHEL5 3 IO EXT3 EXT4 XFS l

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    44/50

    RHEL5.3 IOzone EXT3, EXT4, XFS evalBigger is Better

    EXT4DEV EXT4 BARRIER=0 XFS XFS Barrier=0

    0

    5

    10

    15

    20

    25

    30

    35

    40

    RHEL53 (120), IOzone PerformanceGeo Mean 1k points, Intel 8cpu, 16GB, FC

    In Cache

    Direct I/O

    > Cache

    PercentRe

    lativetoEXT3

    RHEL5 Oracle 10.2 Performance FilesystemsIntel 8 cp 16GB 2 FC MPIO AIO/DIO

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    45/50

    Intel 8-cpu, 16GB, 2 FC MPIO, AIO/DIOBigger is Better

    10 U 20 U 40 U 60 U

    0.00

    10000.00

    20000.00

    30000.00

    40000.00

    50000.00

    60000.00

    70000.00

    80000.00

    90000.00

    100000.00

    RHEL53 Base 8 cpus

    ext3

    RHEL53 Base 8 cpus

    xfs

    RHEL53 Base 8 cpus

    ext4

    Performance Summary

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    46/50

    Performance Summary

    Always measure performance of your application onyour real system!

    No single file system out performs every other one

    Expensive storage can hide performance issues

    Retest when moving to a new OS or applicationversion

    Faster is not always better

    Trade offs include reduced data integrity

    Less features like extended attributes, system security

    Overview

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    47/50

    Overview

    Introduction

    Local File Systems

    Networked File Systems

    Shared Disk File Systems

    Storage Overview

    New RHEL6 FS Features

    Performance Results

    Futures

    Upcoming Local File System Features

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    48/50

    Upcoming Local File System Features

    Union mounts

    Allow a read-write overlay on top of a read-only base filesystem

    Useful for virt guests storage, thin clients, etc

    Continuing to help lead btrfs development towards anenterprise ready state

    Support for ext4 on larger storage

    Enhanced XFS performance for meta-data intensive

    workloads

    Upcoming NFS Features

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    49/50

    Upcoming NFS Features

    PNFS support pNFS and more 4.1 features aimed at a minor 6.x

    release

    No commercial arrays support pNFS yet

    Ongoing work on open source (GFS2, object, etc)pNFS servers

    Working with standards body to add support forpassing extended attributes over NFS

    Goal is to enable SELinux over NFS

  • 8/6/2019 (Redhat) Linux Important Stuff (34)

    50/50