An Overview of On-Premise File and Object Storage Access Protocols
Dean Hildebrand - Research Staff Member, IBM Research
Bill Owen - Senior Engineer, IBM
v1.2


  • An Overview of On-Premise File and Object

    Storage Access Protocols

    Dean Hildebrand - Research Staff Member, IBM Research
    Bill Owen - Senior Engineer, IBM

    v1.2

  • Introduction

    File and Object Discussion

    NFS

    Object Storage Introduction

    Swift

    S3

    File to Object and Object to File

    Comparison Discussion

    Conclusion

    Outline

    2

  • Dean Hildebrand - Research Staff Member

    IBM Research

    Bill Owen - Senior Engineer

    IBM

    3

  • Attendance Poll

    SysAdmin / Storage Architect / Manager, Developers, Students, Researchers

    4

  • Software Storage Market Growth

    5

  • Accessing Data in On-Premise Storage Systems

    6

  • Local vs Shared Storage

    7

  • Local Storage

    Most common for laptops, desktops, mobile devices, and server OS boot disks
    Typically formatted with a file system, e.g., ext4, XFS, NTFS, HFS+, Btrfs, ZFS
    Invaluable for managing a single device (or maybe a few with LVM)
    Varying levels of availability, durability, scalability, etc. are supported, but all are limited to a single node
    E.g., cannot support VM or container migration, support 1000s of applications, etc.

    In your research, think about the real benefits of further optimizing local storage: how many pressing problems are left to be solved? Only incremental gains?
    Commonly used as a building block in higher-level storage systems

    8

  • Shared Storage

    [Diagram: clients connect over a network to tiered storage - SSD, fast disk, slow disk, tape]

    Supports any kind of storage device
    Supports any type of network and network/file protocol
    Supports any kind of client device
    Independent scaling of clients
    Independent scaling of storage bandwidth and capacity

    9

  • Block Shared Storage

    Used to dominate, now mostly shrinking... except:
    FC continues to have very low latency, and so is finding new life with flash storage systems
    iSCSI is still very popular for VMs

    [Diagram: servers access SSD/fast disk/slow disk/tape arrays over Fibre Channel/Ethernet using iSCSI/FC/FCoE (and others); arrays are typically deployed in pairs for H/A]

    10

  • Parallel and Scale-out File Systems

    Scalability (all dimensions)
    Performance (all dimensions)
    Support general applications and middleware
    Make managing billions of files, TB/s of bandwidth, and PBs of data *easy*

    [Diagram: HPC and commercial clients use a proprietary file access protocol over InfiniBand/Ethernet to a scale-out cluster backed by SSD, fast disk, slow disk, and tape; scale out as needed]

    11

  • Distributed Access Protocols

    [Diagram: clients reach a range of storage systems - from a single slow-disk server over Ethernet to scale-out SSD/disk/tape clusters over InfiniBand/Ethernet]

    Wide variety of solutions
    Vast range of performance and scalability options
    Standard and non-standard protocols

    12

  • Distributed Access Protocols: Portability and Lock-In

    Standard APIs help maximize application portability and minimize vendor lock-in

    Numerous benefits of standard protocols:
    Standard protocol clients ship in most OSs
    Promote predictability of semantics and application behavior
    Minimize changes to applications and system infrastructure when switching to a new storage system (many times due to reasons out of your control)
    Applications can move between on-premise and off-premise (cloud) systems
    A wider and broader user base makes it easier to find support and also hardens implementations

    13

  • Distributed Access Protocols: Standards Are Not A Silver Bullet

    For file, while applications use POSIX, they are sensitive to implementation: no common set of commands guarantees crash consistency [***]
    For distributed file systems, it becomes even more complicated: different crash consistency semantics, cache consistency semantics, locking semantics, security levels/APIs/tools, etc.
    For object, each implementation varies w.r.t. level of eventual consistency, security, versioning, etc.
    Even what we consider standards are not actually well defined, e.g., SMB, S3
    Examples: CIFS/SMB provides sequential consistency whereas NFS has close-to-open semantics; versioning is quite different between object protocols

    [***] - All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI '14

    14

  • Distributed Access Protocols: One Storage Protocol CANNOT Do It All

    There are so many vendors... each claiming they have *solved* data storage (or is it world hunger?)
    Vendors sell what they have, not what you need: a storage seller takes what they have and makes it fit practically any requirement and use case, which leads to many unsatisfied customers soon after deployment
    Many protocols have existed: DDN WOS, EMC Atmos, CDMI, AFS, DFS, RFS, 9P, etc.

    Tips:
    Attend sessions like this to learn more about reality and not hype :)
    Dig into advertised feature support: how many customers use a feature, will the customer talk about it, in what context do they use it, etc.
    Validate the system on-premise using realistic workloads (do you know your workloads?)
    Remember there is no guarantee for what you haven't tried (x- and y-axes have an upper bound for a reason)
    Don't buy H/W first and then expect any storage S/W vendor to support it efficiently

    15

  • On-Premise Data Access Protocols: NFS and now Swift, S3

    File winners: NFS and SMB are the clear winners
    SMB is being discussed in a SNIA tutorial this week, so we'll focus on NFS
    Note: HDFS is also dominant for analytics

    Object winners: the industry appears to be centralizing around Swift and S3
    S3: Amazon + many, many apps/tools
    Swift: open source + API + 3 cloud vendors (or more)
    Easily repatriate apps due to cost

    16

  • Tutorial Goals

    Audience: SysAdmin / Storage Architect / Manager, Developers, Students, Researchers

    Understand which protocols are best for which applications
    Understand tradeoffs between protocols
    Introduction to the vendor landscape
    Be able to determine which file-based applications are good candidates for using an object protocol
    Understand how to choose the best protocol for an application (and consequences of choosing the wrong protocol)
    Introduction to NAS and Object history and vendor landscape
    Understand challenges of on-premise distributed data access
    Understand on-premise data center challenges
    Introduction to distributed data access research potential

    17

  • Introduction

    File and Object Discussion

    NFS

    Object Storage Introduction

    Swift

    S3

    File to Object and Object to File

    Comparison Discussion

    Conclusion

    Outline

    18

  • File and Object: Both Can Do Anything

    Fantasy

    19

  • File and Object: Each has its strengths and weaknesses

    vs

    Reality

    20

  • File and Object: Each has its strengths and weaknesses

    vs

    Reality

    Confusion

    21

  • Object vs File Summary

    File:
    Target most workloads (except HPC)
    Medium to high performance
    Typically scales to medium scalability
    Low to high cost
    Limited capability for global distribution
    Standard file data access
    POSIX + snapshots
    Strong(er) consistency

    Object store:
    Target cold data (backup, read-only, archive)
    Low to medium performance
    Typically scales to large capacity
    Low cost
    Global and ubiquitous/mobile access
    Data access through REST APIs
    Immutable objects and versioning
    Loose/eventual consistency

    22

  • Introduction

    File and Object Discussion

    NFS

    Object Storage Introduction

    Swift

    S3

    File to Object and Object to File

    Comparison Discussion

    Conclusion

    Outline

    23

  • NFS: A Little History...

    NFSv2 in 1983: synchronous, stable writes... outdated; finally removed from Fedora
    NFSv3 in 1995: still the default on many distros...
    NFSv4 in 2003, updated 2015: default in RHEL... possibly others
    NFSv4.1 and pNFS in 2010: many structural changes and new features
    NFSv4.2 practically complete now: many new features, VM workloads specifically targeted
    Now going to try per-feature releases

    24

  • Deployment

    The beauty is that it is everywhere (even Windows)... well, mostly; more on that later with object
    Most NFS servers are in-kernel or proprietary, but Ganesha is the first open-source user-level NFS server daemon
    For the enterprise, scale-out NAS is now a requirement for capacity and availability

    New clients and environments are emerging:
    VMware announces support for NFSv4.1 as a client for storing VMDKs
    Amazon announces support for NFSv4.0 in AWS Elastic File System (EFS)
    OpenStack Manila is a shared file service with NFS as the initial file protocol
    Docker has volume plugins that support NFS

    25
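    As a minimal illustration of that built-in client (not from the original slides; hostname, export path and mount point are placeholders), an NFSv4.1 export can be mounted on a reasonably recent Linux host with nothing more than the in-kernel client:

    $ mount -t nfs -o vers=4.1 nfs-server.example.com:/export/data /mnt/data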

  • NFS Caching Semantics

    Not POSIX, but a single client with exclusive data access should see POSIX semantics
    v2 could not cache data: sync writes
    v3 can cache data, but... weak cache consistency; revalidates on open and periodically (30s in Linux); data must be kept in cache until committed by the server (just in case the server fails)
    v4 standardizes close-to-open cache consistency: similar to v3, but guarantees the cache is revalidated on OPEN and flushed at CLOSE; also checked periodically and at LOCK/LOCKU
    Note: granularity is typically 1 second
    Delegations reduce the number of revalidations required...

    26

  • NFSv3

    Collection of protocols (file, mount, lock, status), each on their own port
    Stateless (mostly): locks add state; the server must keep a request cache to prevent duplicate non-idempotent RPCs
    UNIX-centric, but seen in Windows too: 32-bit numeric uids/gids; UNIX permissions, but Kerberos also possible
    Works over UDP, TCP
    Needs a-priori agreement on character sets

    27

  • NFSv4 New Features

    Finally standardized almost everything
    Custom export tree with pseudo-namespace
    Mandatory use of a congestion-controlled protocol (TCP)
    Delegations: clients become the server for a file, coordinating multi-threaded access; less communication and better caching; also includes callbacks from server to client; Linux only implements read-only delegations
    Uses a universal character set for file names
    Integrated and well-defined locking: removes the need for additional ports and daemons; share reservations for Windows; mandatory locks supported; much easier to support consistency across failures
    Security: NFSv4 ACLs (much more full featured than POSIX ACLs); use of named strings instead of 32-bit integers; lofty goals with the new GSS-API, but essentially the benefit is that Kerberos is officially supported and easier to configure; Kerberos V5 must always be supported (but not necessarily used)
    Compound RPCs: the dream was to reduce the number of messages, but due to state operations and the POSIX API, the number of messages actually increases in some cases
    Referrals: the server can refer clients to other servers for a subtree; migration, load balancing
    Increased create rate through asynchronous ops
    Better inter-protocol support: the OPEN operation allows coordination with CIFS, etc.

    28

  • NFSv4 Statefulness Implies Talkative

    [Chart: OPEN & CLOSE account for 56% / 43% of operations in the measured workloads]

    *From Newer Is Sometimes Better: An Evaluation of NFSv4.1, SIGMETRICS '15

    29

    http://doi.acm.org/10.1145/2745844.2745845

  • NFSv3 vs NFSv4

    NFSv3: Statelessness - "the state of being immortal"
    NFSv4: Lease-based state

    30

  • But Does Statelessness Really Justify Lack of Innovation?

    Are We Frozen In Time?

    31

  • *From Newer Is Sometimes Better: An Evaluation of NFSv4.1, SIGMETRICS '15    32

    http://doi.acm.org/10.1145/2745844.2745845

  • *From Newer Is Sometimes Better: An Evaluation of NFSv4.1, SIGMETRICS '15

    Reading Small Files

    33

    http://doi.acm.org/10.1145/2745844.2745845

  • NFSv4.1 New Features

    Introduces a session layer: exactly-once semantics; vastly simplifies locking
    Multipathing via trunking: utilize more paths by using multiple IPs that can be identified as the same server; retry failed requests over other paths
    Retention attributes for compliance
    Delegations are easier to manage: recall-ANY semantics allow clients to decide which delegations are best to recall; re-acquisition without re-open
    pNFS: scalable data access to scale-out storage systems; improved load balancing

    34

  • What is pNFS? Scalable access to YOUR data

    Direct and parallel data access: scale with the underlying storage system; better load balancing; if NFS can access it, then so can pNFS
    Standard file access (part of the OS): open client, no client licensing issues
    Layouts
    Metadata: clients always issue metadata requests to an NFSv4.1 server; scale-out systems can support multiple metadata servers to the same data
    Data: the file layout is part of NFSv4.1; object and block variants are in separate Internet Drafts
    Security and access control: the control path uses NFSv4.1 security; the data path uses the security of the I/O protocol

    [Diagram: pNFS clients accessing back-ends such as GPFS, HDFS, Lustre, ZFS, PanFS, NetApp, dCache]

    35

  • What's Coming in NFSv4.2

    Sparse file support: hole punching to reclaim space; avoid transferring zeros and unallocated space across the wire on reads
    Space reservation: ensure an application does not run out of space
    Server-side copy: finally stop copying data through the client machine
    Labeled NFS: allows (partial) SELinux support for Mandatory Access Control (MAC)
    Client can inform the server of I/O patterns: provides an fadvise-like mechanism over a network
    Application Data Blocks: allow definition of a file format, e.g., initializing a 30G database takes a single over-the-wire operation instead of 30G of traffic; great for managing virtual disk images

    36

  • Other Notable NFS Features

    RDMA: support possible for all versions of NFS, but best with NFSv4.1
    Federated File System: enables file access and namespace traversal across independent file servers, across organizations or within a single organization; a suite of standards including DNS, NSDB, ADMIN, and file-access (NFS)
    Extended attribute (xattr) support is on track to be the first post-NFSv4.2 feature: existing named attributes did not work well with modern OS xattrs; new NFS xattrs will interoperate with existing OS xattr support

    37

  • Ganesha: User Space NFS Server

    Ganesha history: developed by Philippe Deniel (CEA)

    Ganesha features:
    Efficient caching device for data and metadata
    Scalable, high performance
    Per-file-system namespace (FSAL) modules: an abstraction that allows each file system to perform its own optimizations; also allows for proxy support and other non-traditional back-ends

    User space makes life easier:
    Security managers (like Kerberos) reside in user space and can be accessed directly via GSSAPI (no need for rpc_pipefs)
    ID mappers (NIS, LDAP) reside in user space and are accessed directly (the daemon is linked with libnfsidmap)
    Fewer constraints on memory allocation than in kernel space; managing huge pieces of memory is easy
    Developing in user space is soooooooooooo much easier
    Hopefully increased community support

    Great open source community that includes IBM, Panasas, DDN, CEA, Red Hat

    38

  • Ganesha FSAL: File System Abstraction Layer

    Namespace-independent API
    Translation layer between the NFS server and the underlying file system
    Allows a file system to customize how files and directories are managed
    Allows for file system value-add
    Handle-based API (lookup, readdir, ...)
    Implements namespace-specific authentication mechanisms
    Many FSAL modules exist: GPFS, HPSS, Proxy, Lustre, XFS, ZFS, GlusterFS, etc.

    39

  • NFS Security

    NFSv3 first relied on ONC RPC:
    AUTH_SYS is trivial to exploit
    AUTH_DES is trivial to exploit by someone with a degree in mathematics
    AUTH_KERB is better, but it isn't standard: no written specification to refer to; like AUTH_SYS and AUTH_DES, there is no integrity or privacy protection

    All NFS versions now support RPCSEC_GSS

    NFSv4 added:
    Mandatory support for Kerberos V5: krb5 (authentication), krb5i (auth+integrity), krb5p (auth+integrity+privacy)
    Removed the external mount protocol
    NFSv4 ACLs

    40

  • Quick Basics on ACLs (Authorization)

    Linux permissions are too coarse: a single user is too narrow, a group too broad
    POSIX ACLs are very basic: allow multiple users/groups per file/directory; files/directories inherit the ACLs of their parent directory; use standard userids
    NFSv4 ACLs are richer, close to subsuming Windows ACLs: a user/group (at an org) is defined by a text name
    4 types of Access Control Entries (ACEs):
    ALLOW - grant access
    DENY - deny access
    AUDIT - log access to any file or directory
    ALARM - generate an alarm on an attempt to access any file or directory
    Can control inheritance, among other things
    Works well with enterprise directory services

    Example 1 (POSIX): give myuser read permission on file1:
    $ setfacl -m user:myuser:r file1

    Example 2 (NFSv4): give myuser read permission on file1 (the principal is shown here as a generic myuser@domain placeholder):
    $ nfs4_setfacl -a "A::myuser@domain:R" file1

    41

  • So Do I Just Need an NFS Server and I'm in Business?

    42

  • Maybe...

    but how important are performance, scalability, availability, durability, multi-protocol access, backup, disaster recovery, encryption, compression, cost, ease of management, tiering, archiving, etc. to you?

    43

  • If so, you need Scale-Out NAS

    High availability: tricky with NFSv4, since state must be migrated; failure requires a grace period to recover state
    Capacity and performance scaling
    Much, much more...
    Many good options depending on requirements and budget

    44

  • Workloads and Benchmarks

    Modern NAS systems can support 100k+ IOPS from 1000s of clients, so the range of workloads they are currently handling is practically everything
    SPEC SFS 2008 only represents a very specific metadata-heavy workload

    45

  • Current vs New NAS Benchmarks

    [Diagram: current NAS benchmarks run applications on physical machines talking NFS/SMB to a NAS appliance (GPFS, WAFL, ZFS); new NAS benchmarks run the applications inside virtual machines, with the hypervisor host talking NFS/SMB to the NAS appliance]

    Metadata ops: SPECsfs2008: 72%; virtual setup: < 1%

    46

  • VM Workload Changes

    Workload property        | Physical NAS clients        | Virtual NAS clients
    File and directory count | Many files and directories  | Few files per VM
    Directory tree depth     | Deep and non-uniform        | Shallow and uniform
    File size                | Lean towards small files    | Multi-gigabyte, but sparse
    Meta-data operations     | Many                        | Almost none
    I/O synchronization      | Async and sync              | All writes are sync
    In-file randomness       | Workload-dependent          | Increased randomness
    Cross-file randomness    | Workload-dependent          | Predictable
    I/O sizes                | Workload-dependent          | Increased and decreased
    Read-modify-write        | Infrequent                  | More frequent
    Think time               | Workload-dependent          | Increased

    47

  • Workloads: SPEC SFS 2014

    4 separate workloads that support any POSIX interface: number of simultaneous builds; number of video streams that can be captured; number of simultaneous databases; number of virtual desktops
    So SPEC SFS 2014 is a step forward, but it still only represents a very, very marginal slice of possible workloads
    Makes assumptions on architecture and use of features: sparse/allocated files, file size, direct I/O, data ingest rates, etc.
    The client now plays a pivotal role in results
    NAS systems rarely support a single dedicated workload; locking?
    Doesn't cover day-to-day operations such as copying files, find, grep, etc.
    Won't see a big performance difference between NFSv4 and NFSv3: NFSv4 is more than just performance enhancements (pNFS an exception :)

    48

  • Summary Comparison of NFSv4 over NFSv3

    Benefits:
    Single protocol
    Coherent locking
    Security: NFSv4 ACLs, enhanced Kerberos support
    Eliminate hotspots with pNFS
    Ride wave of NFS enhancements: exactly-once semantics, asynchronous creates, close-to-open semantics

    Drawbacks:
    More work for NFS developers

    49

  • Summary

    In order for NFS to advance, we need to move to NFSv4: let's work together to stop implementing new v3 servers
    I do love the 90s though... but not everything is worth keeping
    Ask your NAS vendor if they have a path from NFSv3 to NFSv4.1, and to NFSv4.2 and beyond

    NFS *can* do most anything, but is really good at the following use cases:
    easy access to data within a LAN, since laptops and servers have built-in clients
    plug-n-play for any file-based application
    very good performance without installing extra specialized clients
    small to moderate amounts of data
    interoperability with SMB
    storage for virtualization (and other emerging areas like containers)

    NFS continues to struggle with several areas: mobile, WAN, HPC, cost (for H/A), scalability, searching for files and data

    50

  • Introduction

    File and Object Discussion

    NFS

    Object Storage Introduction

    Swift

    S3

    File to Object and Object to File

    Comparison Discussion

    Conclusion

    Outline

    51

  • Why Do Clients Need Object Storage?

    Significantly reduced complexity: simplified data scaling through a flat namespace; easy-to-use REST-based data access; storage automation; user-defined metadata and search capabilities

    Highly scalable with low cost: software-defined storage flexibility; leverage low-cost commodity hardware; high-density storage; handles ever-increasing storage requirements

    Global, secure multi-tenant access: global data access and distribution; multi-tenant management and data access; role-based authentication; encryption of data in flight and at rest

    Supports emerging workloads: unstructured immutable data store; social, mobile, analytics

    52

  • Sample Object Storage Use Cases

    Backup and disaster recovery: private, public, or hybrid backup repository; recover after a data loss event (corruption, deletion); leverage a copy on object storage to recover from a disaster

    Archive: active archive; compliant archive; cold archive

    Content storage and distribution: big data storage / analytics; content management repository; global collaboration and distribution

    Cloud Service Provider (CSP): non-ephemeral data store for cloud compute; public cloud storage; static web content

    53

  • How Do Clients Access Object Storage?

    Two APIs are emerging for on-premise object storage deployments: OpenStack Swift and Amazon S3
    Many products/public clouds support proprietary APIs: Microsoft Azure, Google Cloud Storage, EMC Atmos, DDN WOS
    CDMI is an attempt to standardize, but support is fading
    Concepts are similar across all APIs - we will focus on Swift and S3

    54

  • Object Storage Introduction

    Some questions we'll answer:

    1. Object APIs are built using RESTful APIs - what does that mean?
    2. What are the commands Object APIs support? Are there extensions?
    3. What does an object command look like? How do I know if my client request succeeded?
    4. What is object data?
    5. What is eventual consistency? Is it the same for every kind of object storage?
    6. How do I make my object store secure and protected?

    55

  • Just Enough REST

    REpresentational State Transfer

    Defined: resource-based, stateless, client-server, cacheable, layered system

    In practice:
    Simple interfaces
    Resources uniquely identified by URIs
    Relationships are identified by URIs
    Can access from ANY HTTP client

    Note: There is no REST standard. It is an architectural style with plenty of best practices defined. It is typically composed using standards like HTTP, XML, JSON, etc.

    56

  • Object Resources

    Resource                                                                              | Swift     | S3
    Your data!                                                                            | Object    | Object
    Collections of objects                                                               | Container | Bucket
    Collections of containers/buckets in an organizational unit (department, company, site) | Account | Service (implicit)
    Discoverability - provides a listing of configuration information                    | Info      | n/a
    Location information - provides a URI to access the resource directly from the storage server | Endpoints | n/a
    Bucket sub-resources                                                                  | (features provided with middleware) | acl, policy, lifecycle, version, replication, tagging, cors, website, ...
    Object sub-resources                                                                  | n/a       | acl, torrent

    57

  • Object Namespace - Super Simple

    [Diagram: a Swift Account contains Containers, which contain Objects; the Amazon S3 Service contains Buckets, which contain Objects]

    58

  • Object REST Operations

    Operation | Description                                                                   | Idempotent? | Safe?
    GET       | Return the contents of the resource                                           | Yes         | Yes
    HEAD      | Return the metadata for a resource                                            | Yes         | Yes
    PUT       | Create or update the contents of the resource                                 | Yes         | No
    POST      | Create, update or delete metadata for the resource, or create a sub-resource  | No          | No
    DELETE    | Remove the resource from the system                                           | Yes         | No
    COPY      | Copy an object to another location (Swift only)                               | Yes         | No

    59

  • Example Command Format

    General format: command uri[?query-string] headers [data]

    Swift URI: http(s)://server:port/api_version/account/container/object
    Example:
    GET https://192.168.56.101:8080/v1/AUTH_acct/Demo-Container/object1 -H "X-Auth-Token: xxxxxx"

    S3 URI: http(s)://server:port/bucket/object
    Example:
    GET https://192.168.56.101:8080/s3_test_bucket/object1 -H 'Date: Sat, 06 Feb 2016 19:25:22 +0000' -H 'Authorization: AWS s3key:xxxx'

    60

  • And some common response codes...

    Description  | Code | Client retry?       | Common examples
    Success      | 20x  | No effect           | 200: Success (GET); 201: New resource created (PUT); 202: Accepted (POST); 204: No content (HEAD)
    Client error | 4xx  | No, will still fail | 400: Bad Request (incorrectly formatted request, e.g., non-numeric quota specification); 401: Unauthorized (wrong credentials); 403: Forbidden (no access to resource); 404: Not Found (wrong URL); 405: Method Not Allowed (PUT to a resource that doesn't support PUT)
    Server error | 5xx  | Yes, in most cases  | 500: Internal Server Error (system problem - can be transient); 503: Service Not Available (often due to loading - internal timeout)

    S3 details: http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
    Swift details: http://developer.openstack.org/api-ref-objectstorage-v1.html

    61

  • Some Simple Example Clients/Libraries

    Swift:

    curl, boto (python library), poster (Firefox browser plugin), swiftclient

    S3:

    curl/s3curl, boto, poster, s3sh, s3cmd

    62

    Note that client caching is not common in object libraries/clients today
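    For instance, a minimal python-swiftclient sketch (not from the original slides; the auth URL, credentials, and container/object names are placeholders) might look like:

    # Minimal python-swiftclient sketch: authenticate, upload an object, then list the container.
    import swiftclient

    conn = swiftclient.client.Connection(
        authurl='http://keystone.example.com:5000/v3',
        user='demo', key='secret',
        os_options={'project_name': 'demo',
                    'user_domain_name': 'Default',
                    'project_domain_name': 'Default'},
        auth_version='3')

    conn.put_object('Demo-Container', 'object1', contents=b'hello object world')
    headers, objects = conn.get_container('Demo-Container')
    for obj in objects:
        print("{0} {1}".format(obj['name'], obj['bytes']))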

  • Some Example Requests - Firefox Poster

    [Screenshot: Poster - when you want full control - a container HEAD request]

    63

  • Some Example Requests - Firefox Poster

    [Screenshot: results of the HEAD request]

    64

  • Some Example Requests: GET

    Get a list of containers in a Swift account using the swift command line (hiding all command details), or using curl directly (the original screenshots are not reproduced here)

    Note: We will talk about authentication details later

    65
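    A hedged sketch of the two approaches (account name, endpoint and token are placeholders taken from the other examples in this deck):

    $ swift list                                    # swift CLI: list containers in the authenticated account
    $ curl -i http://util5:8080/v1/acct -H "X-Auth-Token: $TOKEN"   # curl: a GET on the account URI returns the container names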

  • Some Example Requests: GET

    Using curl and formatting the output as JSON or XML (the format query parameter, e.g. ?format=json, controls the response format)

    66

  • Some Example Requests: GET

    Get a list of all objects in an S3 bucket using boto (the original screenshot of the code and its output is not reproduced here)

    67
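    A hedged boto sketch of that request against an on-premise S3-compatible endpoint (host, port, credentials, and bucket name are placeholders):

    # List the objects in a bucket with boto (boto 2.x style).
    import boto
    from boto.s3.connection import OrdinaryCallingFormat

    conn = boto.connect_s3(
        aws_access_key_id='s3key',
        aws_secret_access_key='xxxx',
        host='192.168.56.101', port=8080,
        is_secure=False,
        calling_format=OrdinaryCallingFormat())

    bucket = conn.get_bucket('s3_test_bucket')
    for key in bucket.list():
        print("{0} {1}".format(key.name, key.size))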

  • Additional Object API Features

    Access Control Lists

    Quotas

    Versioned Objects

    Expiring Objects

    Automatic Storage Tiering

    Storage Policies (placement, durability, etc.)

    Upload Multipart Large Objects

    Container Synchronization

    Notification Infrastructure

    Metadata Search 68

  • Object Storage Metadata

    Useful for flexibly organizing data in the flat namespace and enriching data

    System metadata on objects: creation time; etag (md5sum of object contents)

    User metadata on objects (and accounts and containers in Swift):
    Attribute/value pair passed as a header in a PUT or POST request
    Objects: new metadata overwrites all previous metadata for that object
    Accounts & containers (Swift only): new metadata is added to existing metadata

    Coming soon: metadata search

    69
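    As an illustration of the header convention, a hedged curl sketch that attaches user metadata to an existing object (endpoint, token, and values are placeholders; note that a POST replaces all existing user metadata on the object):

    $ curl -i -X POST http://util5:8080/v1/acct/Demo-Container/bill_selfie.jpg \
           -H "X-Auth-Token: $TOKEN" \
           -H "X-Object-Meta-Camera-Model: iphone6" \
           -H "X-Object-Meta-Latitude: 117.2303"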

  • Object Storage Metadata - An Example

    bill_selfie.jpg

    System metadata:
    Content-Length: 68351
    Content-Type: image/jpeg
    Etag: 1f32161a3c3baefb9a548a72daffa7ab
    X-Timestamp: 1455144452.21496
    X-Object-Meta-Mtime: 1455144440.207139

    User (client) metadata:
    X-Object-Meta-Brightness: 10.5
    X-Object-Meta-Latitude: 117.2303
    X-Object-Meta-Longitude: 33.03279
    X-Object-Meta-Altitude: 2322.16
    X-Object-Meta-Aperture: 2.275
    X-Object-Meta-Camera-Model: iphone6.0.1.3

    Metadata can be as valuable as the data itself!

    70

  • Eventual Consistency - CAP Theorem

    CAP Theorem: pick any 2
    1. Consistency
    2. Availability
    3. Partition tolerance

    Object storage systems are typically AP: consistency is eventual; no standard
    Note that a POSIX-based distributed file system would require CP...

    71

  • Eventual Consistency: I/O Characteristics

    Typically no locking
    Object operations are atomic: the entire object must be written successfully to be committed; reads will always return a consistent object or no object
    Range reads are supported - not range writes: this is an artifact of HTTP GET/PUT, and derives from the consistency model
    Last writer (creator) wins: for concurrent creates of the same object, the one with the latest timestamp wins
    Container/bucket listings may be updated asynchronously

    72

  • Eventual Consistency

    Consistency is a characteristic of the object store implementation: no standard; different products and architectures = different consistency models
    When writing an object: the container listing will not show the object until container updates are completed
    When deleting an object: the object may continue to appear in the container listing until container updates are completed
    When replacing an object: reads may return the existing version until the new version is propagated across the entire system [1]

    [1] If the storage backend is strongly consistent (like a parallel file system), the new or updated object is available to all nodes as soon as the write is committed.

    73

  • Object Storage Architectures

    Community Swift: object PUTs are 3x replicated
    A majority of writes must succeed for a success status
    Consistency daemons ensure that failed replicas are eventually written
    Reads try each replica sequentially until success
    Account & container listings are updated asynchronously

    74

  • Object Storage Architectures

    Swift with clustered file system storage:
    Object storage writes a single replica; the file system is responsible for data replication
    Account and container listings are updated asynchronously
    Reads always go to a single replica

    [Diagram: object nodes on top of a clustered file system]

    75

  • Object Security

    Production Object Storage systems typically interface with a dedicated identity service like OpenStack Keystone

    Simpler schemes can be used for proof of concept (Swift tempauth)

    Authentication: does the user in a request have a valid password or security token?

    Authentication service may integrate with enterprise directory service using LDAP or Microsoft Active Directory

    Authorization: does the user in a request have permission to execute that request?

    76

  • Authentication/Authorization Example using OpenStack Swift

    A client wants to upload an object to a container in project MYACCOUNT:

    1. The client sends credentials to the Keystone identity service
    2. Keystone verifies the credentials, creates a new token and returns it to the client
    3. The token contains authorization information:
       a. Endpoint catalog (a list of available services)
       b. Projects the requesting user is assigned to
       c. Role for that project
       d. Token expiration time
    4. The client sends the upload request (including the token) to the object storage service
    5. The object storage service verifies the token with Keystone (or with a cached copy of the token)
    6. If the client has a valid role for MYACCOUNT, the upload request is implemented

    77

  • Object Security

    Secure data in flight:
    SSL can be enabled from the client to the identity service, and to the object storage service
    A load balancer can also provide SSL termination

    Secure data at rest:
    Data encryption can be provided by the object storage software or by the storage backend (or by the client)
    Not all data needs to be encrypted - enable encryption on a bucket or container basis
    Consider maturity of the encryption implementation: external key manager vs. integral key encoding

    78

  • Object Data Protection

    Object storage data protection is typically implemented with:
    3x replication - local or geo-dispersed
    Erasure coding - local or geo-dispersed
    Either approach can be implemented by the object storage software or delegated to the storage backend

    How to protect against user error? Or application bugs?
    Backups and snapshots still have their place: snapshot and/or back up critical portions of your data; easy to select by container but can also select by metadata values

    79

  • Introduction

    File and Object Discussion

    NFS

    Object Storage Introduction

    Swift

    S3

    File to Object and Object to File

    Comparison Discussion

    Conclusion

    Outline

    BREAK

    80

  • Introduction

    File and Object Discussion

    NFS

    Object Storage Introduction

    Swift

    S3

    File to Object and Object to File

    Comparison Discussion

    Conclusion

    Outline

    81

  • OpenStack: Open Source IaaS Platform & Global Collaboration

    Mission: create a ubiquitous open source cloud computing platform that is simple to implement and massively scalable

    Scalable: massive-scale design goals - 1 million physical machines, 60 million VMs, billions of objects stored
    Controlled by the OpenStack Foundation; IBM is proud to be a Platinum Sponsor
    Open: all code is Apache 2 licensed
    Simple: the architecture is modular
    Composed of multiple projects around the four capabilities: compute, network, storage, shared services

    Mar 2013: 859 contributors, 8,500 members
    Oct 2014: 2,556 contributors, 16,000+ members
    Exponential growth in ~1 yr

    82

  • 83

  • History of OpenStack Swift

    Early OpenStack History: http://www.tiki-toki.com/timeline/entry/138134/OpenStack-History/

    Date     | Release                | Description
    Aug 2009 | n/a                    | Swift development started by Rackspace
    Jul 2010 | n/a                    | OpenStack launches with 25 member companies
    Oct 2010 | Swift 1.1.0 (Austin)   | First OpenStack release includes Swift & Nova
    Jun 2012 | Swift 1.6.0 (Essex)    | Integration with Keystone
    Jun 2014 | Swift 2.0.0 (Icehouse) | Add Swift Storage Policy support
    Jan 2016 | Swift 2.6.0 (Liberty)  | Current release

    84

  • History of OpenStack Swift

    As of June 2015: over 300 PB of Swift storage deployed

    85

  • Swift API and Semantics

    OpenStack Swift is two parts: an API specification & middleware description, and an object storage implementation

    Two choices for object storage implementation:
    Native Swift: can be extended, but the core is Swift
    API emulation: can never be 100% compatible; especially difficult to emulate middleware

    API & middleware links:
    http://developer.openstack.org/api-ref-objectstorage-v1.html
    http://docs.openstack.org/developer/swift/middleware.html

    86

  • High-Level on OpenStack Swift

    Load balancer (e.g., HAProxy) to balance requests; each request is stateless

    Proxy nodes (public face) authorize requests and forward them to the appropriate storage server(s) using the ring

    Storage nodes (account, container and object) store, serve and manage data and metadata, partitioned based upon the ring

    Object mapping and layout: objects are mapped to partitions by a hash on the fully qualified object name; partitions are mapped to virtual devices using a consistent hashing ring

    Keystone authentication service (public face): authenticates credentials and provides an access token for future requests; users can be defined locally or in an external LDAP or AD system; also defines user roles for accounts/projects

    Additional Swift services maintain eventual consistency in the distributed object storage environment: account, container & object updaters, replicators, auditors, reaper

    87

  • Proxy Server Architecture

    Processes all user requests
    Requests & responses pass through the WSGI pipeline: community and custom middleware
    Requests are delegated to a controller module
    The controller forwards requests to an account, container or object server
    Responses are received by the controller & passed to the client

    [Diagram: proxy server with a WSGI pipeline feeding account, container and object controllers]

    88

  • Storage Server Architecture

    Object server:
    Reads and writes object files onto storage
    Pipeline for community or custom middleware
    Pluggable backend interface: diskfile controls the object's layout on the filesystem; the SwiftOnFile diskfile provides file access to object data

    Account and container servers:
    Manage the listing db for each account and container
    Pipeline for community or custom middleware
    Pluggable backend interface: specified but no community implementations; could allow the use of directory listings instead of account and container dbs for the SwiftOnFile layout

    [Diagram: object, account and container servers, each with a WSGI pipeline and a pluggable backend]

    89

  • Anatomy of an Object Write: Client Gets a Token

    1. The client sends a token request to Keystone with credentials
    2. Keystone authenticates the credentials using a local or external identity server
    3. If the credentials are OK, Keystone returns a token to the client

    Example:
    curl -i \
      -H "Content-Type: application/json" \
      -d @mycreds.json \
      http://localhost:5000/v3/auth/tokens

    [Diagram: client, Keystone, and object nodes on a clustered file system]

    90
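    The slide does not show the contents of mycreds.json; a hedged sketch of a Keystone v3 password-auth body (user, project, and password are placeholders) would be:

    {
      "auth": {
        "identity": {
          "methods": ["password"],
          "password": {
            "user": {
              "name": "demo",
              "domain": {"name": "Default"},
              "password": "secret"
            }
          }
        },
        "scope": {
          "project": {
            "name": "MYACCOUNT",
            "domain": {"name": "Default"}
          }
        }
      }
    }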

  • Anatomy of an Object Write: Client Issues PUT Request

    1. The client sends a PUT request to the proxy-server with the token, object URI and object data
    2. The client saves the token for use until the token expires

    Example:
    curl -i -X PUT -H "X-Auth-Token: $TOKEN" \
      http://util5:8080/v1/acct/container/newobject \
      -T vacation.mp4

    [Diagram: client, Keystone, and object nodes on a clustered file system]

    91

  • Anatomy of an Object Write: Proxy Processes PUT Request

    1. The proxy-server receives the request, and each middleware in the pipeline looks at and optionally acts on the request
    2. The authtoken and keystoneauth middleware authenticate and authorize the request (against data in memcached if possible)

    [Diagram: client, Keystone, and object nodes on a clustered file system]

    92

  • Anatomy of an Object Write: Proxy Processes PUT Request (continued)

    1. The proxy-server adds an X-Timestamp header to the request with the current system time
    2. It uses the ring to determine where each replica of the object is to be placed (object-server IP, virtual device, partition, object URI hash)
    3. It passes the PUT request to the designated object-server(s)
    4. The embedded WSGI server manages reading data a chunk at a time from the client and passing it on to the object-server

    Example: the URI http://util5:8080/v1/acct/container/newobject is placed here:
    192.167.12.22:$mount/object_fileset/o/z1device42/objects/13540/3bd/d39381ea07419cec19ae196149a943bd/

    93
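    To make the ring lookup concrete, here is a simplified Python sketch of hash-based partition mapping; the real Swift ring additionally mixes in a cluster-wide hash prefix/suffix and then maps partitions to devices, so this is illustrative only:

    # Simplified sketch: map an object path to a ring partition (not the exact Swift code).
    import hashlib
    import struct

    PART_POWER = 14  # example value: 2**14 partitions

    def get_partition(account, container, obj):
        path = '/%s/%s/%s' % (account, container, obj)
        digest = hashlib.md5(path.encode('utf-8')).digest()
        # Take the top PART_POWER bits of the first 4 bytes of the MD5 digest
        return struct.unpack_from('>I', digest)[0] >> (32 - PART_POWER)

    print(get_partition('acct', 'container', 'newobject'))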

  • Anatomy of an Object Write: Object Server Processes PUT Request

    1. The object-server receives the PUT request and checks that it satisfies object constraints (valid timestamp, object name length within limits, etc.)
    2. Creates a diskfile instance for the new object
    3. The diskfile creates a tmp file and begins writing to it
    4. Calculates the length & md5sum for the new object as the object is written
    5. When the object write is complete, writes system metadata to the object as file xattrs
    6. Moves the data to the location specified by the ring; the filename is <timestamp>.data
    7. Removes any older files for the object

    Example tmp file location:
    $mount/object_fileset/o/z1device42/tmp/tmpVkeXj

    Example object location:
    $mount/object_fileset/o/z1device42/objects/13540/3bd/d39381ea07419cec19ae196149a943bd/1442395677.59514.data

    [Diagram: object nodes on a clustered file system]

    94

  • Anatomy of an Object Write: Update Container and Return Status

    1. Send a request to the container server to add the new object to the container listing
    2. Wait for a short time (2 sec) for the container server response
    3. If the container update times out, write the update into the async_pending directory
       Note: the object-updater is responsible for updating container dbs with async_pending entries
    4. Return status to the proxy server, and on to the client

    Example async_pending location:
    $mount/object_fileset/o/z1device42/async_pending/

    [Diagram: proxy, container servers, object servers, Keystone, clustered file system]

    95

  • Extending Swift - Diskfile Interface

    Object server diskfile: on-disk abstraction layer
    Deployers can implement their own storage interface
    Specialized classes for Manager, Reader & Writer

    Example diskfiles:
    Community (default)
    SwiftOnFile: Red Hat, IBM
    Swift-ceph
    Seagate Kinetic
    Isilon
    In-memory

    SwiftOnFile provides native access to object data through the filesystem interface.

    [Diagram: object server with a WSGI pipeline, diskfile, and pluggable backend]

    96

  • Extending Swift - WSGI Middleware

    API? or Implementation?

    Web Server Gateway Interface (WSGI):
    Python standard PEP 3333
    Chain of modules that process requests
    Used by all OpenStack services

    Middleware:
    Pluggable modules that can be configured in the request pipeline
    Specified in the service configuration file
    Each middleware module has a chance to process (or change) a request coming in, and to process (or change) the response on the way out

    [Diagram: proxy server WSGI pipeline feeding account, container and object controllers]

    97
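    A minimal sketch of what a custom piece of Swift-style WSGI middleware might look like (the class name and the header it adds are invented for illustration):

    # Minimal WSGI middleware sketch in the paste-deploy style used by Swift's proxy pipeline.
    class AddHeaderMiddleware(object):
        def __init__(self, app, conf):
            self.app = app    # next WSGI app/middleware in the pipeline
            self.conf = conf

        def __call__(self, env, start_response):
            # Inspect or modify the incoming request here (env is the WSGI environ)
            def custom_start_response(status, headers, exc_info=None):
                headers.append(('X-Example-Middleware', 'seen'))
                return start_response(status, headers, exc_info)
            # Pass the request down the pipeline and the response back up
            return self.app(env, custom_start_response)

    def filter_factory(global_conf, **local_conf):
        conf = dict(global_conf, **local_conf)
        def add_header_filter(app):
            return AddHeaderMiddleware(app, conf)
        return add_header_filter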

  • Proxy Server Middleware

    [Diagram: client requests (GET, PUT, POST, HEAD, DELETE) flow through the WSGI pipeline (mware-1, mware-2, ... mware-n, proxy-server) configured in proxy-server.conf, then on to the controllers]

    98

  • Extending Swift - WSGI Middleware

    API? or Implementation?

    authentication & authorization: auth_token, keystoneauth
    multi-part upload: slo, dlo
    quotas: account-quotas, container-quotas
    protocol emulation: swift3, s3token
    bulk operations: expand archive on upload
    object versioning
    container sync
    rate limiting
    domain remapping
    static web & temporary url
    profiling & monitoring
    your custom middleware

    http://docs.openstack.org/developer/swift/middleware.html

    [Diagram: proxy server WSGI pipeline feeding account, container and object controllers]

    99

  • Storage Policies

    Used by the object server only
    Allow you to specify:
    Durability levels: 1, 2 or 3x replication
    Storage backends: cost vs performance tradeoffs; storage features - encryption, compression, ...
    Grouping of storage nodes, including multi-region

    Containers are permanently assigned to a policy on creation (default or explicit)
    Policies can be deprecated - no new containers assigned

    100
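    Policies are defined in swift.conf and selected per container; a hedged sketch (policy names and indices are placeholders):

    [storage-policy:0]
    name = standard-3x
    default = yes

    [storage-policy:1]
    name = reduced-2x

    A client then pins a new container to a policy at creation time, e.g.:
    $ curl -i -X PUT http://util5:8080/v1/acct/archive-container \
           -H "X-Auth-Token: $TOKEN" -H "X-Storage-Policy: reduced-2x"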

  • Geo-Distributed Object Clusters: Building an Active-Active Multi-Site Storage Cloud

    Global distribution: ingest and access from any data center
    Multi-site availability: objects replicated across 2 or more sites
    Flexible: async or sync replication

    101

  • Geo-Distributed Object Clusters: Architecture Details

    Disaster recovery from data center failures - active-active storage cloud
    Binds geo-distributed sites into an extended-capacity storage cloud
    Leverages Swift replication between sites
    Objects are stored in one or more regions depending on: required durability - data copies can be 1 to N (typically a max of 3); required number of supported data center failures
    Objects are accessible from ANY site: if an object is not local, the system retrieves it from a remote region
    Asynchronous or synchronous replication
    Research on WAN acceleration technologies: Aspera or TransferSoft are examples

    [Diagram: Regions A, B and C spanning Data Centers 1, 2 and 3]

    102

  • Swift Authentication: Pluggable Authentication and Authorization

    Three common flavors, one choice for production environments:

    1. Keystone: production-ready identity system; models users, roles, projects, domains (v3) & groups (v3); supports integration with backend LDAP and AD; authtoken (authentication) and keystoneauth (authorization) middleware; authentication through the separate Keystone API
    2. tempauth, aka version 1: super simple; user credentials & project assignment stored in proxy-server.conf
    3. swauth: user credentials & project assignment stored in Swift

    103

  • Swift Authentication: Role Based Access Control

    Two Swift authorization roles today:

    1. operator
       a. Can create, update and delete containers and objects in projects where the role is assigned
       b. Can assign ACLs to control other users' access
       c. The operator_roles config value (proxy-server.conf) specifies the Keystone roles
    2. reseller_admin
       a. Can operate on any account
       b. The reseller_admin config value (proxy-server.conf) specifies the Keystone roles

    Finer access control with Swift container ACLs

    104

  • Swift Additional Features

    Quotas on accounts and containers: must have the reseller_admin role to set account quotas
    StaticWeb - serve container contents as a static web site
    Versioning: current version in the current container; older versions in a dedicated container; implemented in middleware (as of Swift version 2.4)
    Static and Dynamic Large Objects - multi-part upload
    RateLimit - limit operations on accounts and containers
    Object expiration

    105

  • Some OpenStack Swift Issues

    Community software is hard to install & manage
    Performance:
    Standard Swift daemons scan directory metadata every 30s, decreasing performance of the entire system by increasing CPU and disk utilization
    No data caching
    Upcoming erasure coding can hurt performance for small objects; slow to rebuild
    Inefficient to scale capacity: Swift must re-balance partitions to add additional storage, creating the potential for out-of-space conditions and requiring excessive over-provisioning and data movement
    Lack of enterprise features: backup/snapshots/encryption; no ILM for tiering or to external storage (tape); RAS, etc.

    106

  • Get Involved!

    Core Swift community: weekly meetings on IRC; fix bugs, improve tests, improve docs; single-process optimizations; container sharding; improved versioning; encryption; erasure codes
    swiftonfile: unified file and object access; bi-weekly meetings on IRC
    swift3: Amazon S3 emulation middleware; bi-weekly meetings on IRC

    107

  • Introduction

    File and Object Discussion

    NFS

    Object Storage Introduction

    Swift

    S3

    File to Object and Object to File

    Comparison Discussion

    Conclusion

    Outline

    108

  • History of Amazon S3 Storage & API

    Date      | Description
    June 2006 | Amazon launches Simple Storage Service
    2008      | Amazon reports over 29 billion objects hosted by S3
    2010      | S3 API support for versioning, bucket policies, notifications, multi-part upload
    2011      | S3 API support for server side encryption, multi-object delete & object expiration
    2012      | S3 API support for CORS & archiving to Glacier
    2013      | Amazon reports over 2 trillion objects hosted by S3
    2014      | S3 API support for lifecycle versioning policies, sigv4, event notification
    2015      | S3 API support for cross-region replication, infrequent access storage class

    [Chart: approximate object count in S3 (billions)]

    109

  • Why Use S3 for On-Premise Storage?

    Run the same apps against on-premise and cloud storage
    Repatriate S3 cloud data & applications to reduce cost
    Rich API and tool set
    Swift3 middleware provides an emulation layer in a Swift environment

    But...
    Some APIs may not apply on premise: e.g., torrents, payments
    The API is controlled by Amazon with no published extension points
    On-premise implementations will not be 100% compatible

    110

  • S3 Models Features Explicitly

    Middleware is not required: each resource/sub-resource is managed explicitly from the REST API (GET, PUT, DELETE)
    But, how do you get changes into the API spec?

    111

  • S3 Authentication

    S3 requests are authenticated using credentials:
    Access Key ID (AWSAccessKeyId)
    Secret Access Key (AWSSecretKey)

    Two signing algorithms today:
    AWS Signature V2: the Secret Access Key is used to sign a request string
    AWS Signature V4: the Secret Access Key is used to create a signing key (valid for 7 days)

    Each S3 request passes an Authorization header constructed using one of these algorithms
    Both are tedious to construct - let your client create the signature for you!

    Swift3 middleware today only supports AWS Signature V2.

    112
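    To give a feel for what the client library does under the covers, a simplified AWS Signature V2 sketch in Python (access key, secret, and request fields are placeholders; a real string-to-sign also includes Content-MD5, Content-Type, and any canonicalized x-amz- headers):

    import base64, hashlib, hmac

    access_key = 's3key'
    secret_key = 'xxxx'
    date = 'Sat, 06 Feb 2016 19:25:22 +0000'
    string_to_sign = 'GET\n\n\n%s\n/s3_test_bucket/object1' % date

    signature = base64.b64encode(
        hmac.new(secret_key.encode('utf-8'),
                 string_to_sign.encode('utf-8'),
                 hashlib.sha1).digest()).decode('ascii')

    print('Authorization: AWS %s:%s' % (access_key, signature))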

  • S3 Lifecycle and Bucket Policies

    Policy resources to automate and manage object storage resources

    Lifecycle policies:
    Expire aged objects or object versions (example: automatically delete versions older than 90 days)
    Transition objects to another storage class (example: move objects from Standard to Glacier after 30 days)
    Combining policies (example: move from Standard to Standard_IA to Glacier to Expired)

    Bucket policies - another way to control access to bucket resources:
    Allow read-only access to an anonymous user
    Require MFA for bucket resources
    Restrict access to specific client IP addresses

    113
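    For illustration, a hedged sketch of a lifecycle configuration along the lines of the published S3 XML schema (rule ID, prefix and day counts are placeholders):

    <LifecycleConfiguration>
      <Rule>
        <ID>archive-then-expire</ID>
        <Prefix>logs/</Prefix>
        <Status>Enabled</Status>
        <Transition>
          <Days>30</Days>
          <StorageClass>GLACIER</StorageClass>
        </Transition>
        <Expiration>
          <Days>365</Days>
        </Expiration>
      </Rule>
    </LifecycleConfiguration>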

  • S3 Access Control

    S3 ACLs manage access to buckets and objects
    Every bucket and object has an ACL sub-resource; if no ACL is specified on create, a default ACL is used giving the owner full control
    ACLs consist of grants (grantee + permission), with up to 100 grants per ACL
    Grantee types:
    User: user id, user email
    Group: Authenticated Users, All Users, Log Delivery group (note that these are the only possible groups)
    Permissions: READ, WRITE, READ_ACP, WRITE_ACP, FULL_CONTROL
    Canned ACLs are predefined ACLs that simplify access control definition

    114

  • S3 Access Control - Permissions

    Permission   | Granted on a bucket                                                            | Granted on an object
    READ         | Allows grantee to list objects in the bucket                                   | Allows grantee to read object data and its metadata
    WRITE        | Allows grantee to create, overwrite, and delete any object in the bucket       | Not applicable
    READ_ACP     | Allows grantee to read the bucket ACL                                          | Allows grantee to read the object ACL
    WRITE_ACP    | Allows grantee to write the ACL for the applicable bucket                      | Allows grantee to write the ACL for the applicable object
    FULL_CONTROL | Allows grantee READ, WRITE, READ_ACP, and WRITE_ACP permissions on the bucket  | Allows grantee READ, READ_ACP, and WRITE_ACP permissions on the object

    115

  • S3 Access Control - Example Default ACL

    A single grant giving the owner full control: the default ACL names the owner's canonical user ID and display name as both the Owner and the sole Grantee, with Permission FULL_CONTROL.

    116
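    The original slide showed the ACL XML; a hedged reconstruction of that default ACL body (IDs and display names are placeholders):

    <AccessControlPolicy>
      <Owner>
        <ID>Owner-Canonical-User-ID</ID>
        <DisplayName>owner-display-name</DisplayName>
      </Owner>
      <AccessControlList>
        <Grant>
          <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser">
            <ID>Owner-Canonical-User-ID</ID>
            <DisplayName>owner-display-name</DisplayName>
          </Grantee>
          <Permission>FULL_CONTROL</Permission>
        </Grant>
      </AccessControlList>
    </AccessControlPolicy>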

  • S3 Access Control - Example ACL

    An ACL with 2 user grants and 1 group grant:
    Owner (canonical user ID, display name): FULL_CONTROL
    user1 (canonical user ID, display name): WRITE
    Group http://acs.amazonaws.com/groups/global/AllUsers: READ

    117

  • S3 Access Control - Canned ACLs

    Canned ACL                | Applies to      | Permissions added to ACL
    private                   | Bucket & Object | Owner gets FULL_CONTROL. No one else has access rights (default).
    public-read               | Bucket & Object | Owner gets FULL_CONTROL. The AllUsers group gets READ access.
    public-read-write         | Bucket & Object | Owner gets FULL_CONTROL. The AllUsers group gets READ and WRITE access. Granting this on a bucket is generally not recommended.
    aws-exec-read             | Bucket & Object | Owner gets FULL_CONTROL. Amazon EC2 gets READ access to GET an Amazon Machine Image (AMI) bundle from Amazon S3.
    authenticated-read        | Bucket & Object | Owner gets FULL_CONTROL. The AuthenticatedUsers group gets READ access.
    bucket-owner-read         | Object only**   | Object owner gets FULL_CONTROL. Bucket owner gets READ access.
    bucket-owner-full-control | Object only**   | Both the object owner and the bucket owner get FULL_CONTROL over the object.
    log-delivery-write        | Bucket only     | The LogDelivery group gets WRITE and READ_ACP permissions on the bucket.

    ** If you specify this canned ACL when creating a bucket, Amazon S3 ignores it.

    118

  • S3 Access Control - Limitations

    Object PUTs reset the object ACL to the default (unless an ACL is specified in the PUT request)
    If you give another user WRITE access to a bucket you own, they will be the owner of any objects they create: you will not have READ access to those objects, and won't be able to see metadata like size; you still have WRITE access from the bucket ACL, so you can delete or overwrite them
    Caution: when granting WRITE access at the bucket level, there is no object-level WRITE access; with bucket WRITE access, I can create or delete objects that you created
    Caution: be especially careful giving bucket WRITE access to groups

    119

  • S3 Object Versioning

    Versioning is enabled at the bucket level
    Objects in these buckets have a current object and 0 or more versions
    PUT creates a new instance that becomes the current object
    GET bucket?versions lists all object versions
    GET bucket?versions&prefix=myobject lists all versions of myobject
    DELETE inserts a "delete marker" but no objects are removed
    DELETE myobject?versionId=1001 permanently deletes that object version
    Undelete by deleting the marker: DELETE myobject?versionId=9876
    GET myobject?versionId=1001 retrieves an older version

    [Diagram: myobject version stack - id=1001, id=1002, id=9876 (delete marker)]

    120

  • Validating the API

    ceph-s3 tests: open source compatibility tests for S3 clones
    Approximately 350 tests
    Swift3 v1.9 passes approx 75% of the tests

    https://github.com/ceph/s3-tests

    121

  • Comparing Swift and S3 Features

    Feature                                        | Swift                        | S3
    Access Control Lists                           | Container                    | Container & Object, plus policies
    Quotas                                         | Account & Container          | No API support
    Versioned Objects                              | Y (limited functionality)    | Y
    Expiring Objects                               | Y                            | Y (with lifecycle policies)
    Automatic Storage Tiering                      | Y (based on storage backend) | Y (with lifecycle policies)
    Storage Policies (placement, durability, etc.) | Y                            | No API support
    Upload Multipart Large Objects                 | SLO & DLO                    | Y
    Container Synchronization                      | Y                            | Y (cross-region replication)
    Notification Infrastructure                    | Future                       | SNS, SQS, AWS Lambda (cloud only)
    Metadata Search                                | Future                       | Future?

    122

  • Swift & S3 Summary

    Swift                                                                              | S3
    100% open source with an active community that is steadily adding features        | Closed source implementation (except Swift3)
    Deployers and customers can influence the API and features                        | API controlled by a single company
    Documented ways that you can extend with middleware and diskfile changes          | No documented extension points
    Vendor extensions can address many of the management issues on the earlier Swift slide | No documented extension points
    Large and growing support community                                               | Limited options to support S3 on-premise deployments

    123

  • Swift & S3 Summary (continued)

    Swift                                                                              | S3
    API and middleware provide the feature set                                         | Well-defined API, with features explicitly modelled; more complete feature set: ACL and access control model, versioning support, notification service
    On-premise deployment allows repatriating apps & data from the cloud               | On-premise deployment allows repatriating apps & data from the cloud
    Native Swift deployments are 100% compliant; API-only deployments may lack key features, especially middleware | On-premise vendors have different levels of compliance - each says "we support core features", but what are those?
    Improving development ecosystem                                                    | Rich development ecosystem

    124

  • Get Involved with S3 also!

    swift3: Amazon S3 emulation middleware
    - Bi-weekly meetings on IRC
    - S3 versioning
    - Lifecycle policies
    - Bucket policies

    ceph/s3-tests
    - Improve test coverage
    - Fix compliance bugs in Swift3

    125

  • Introduction

    File and Object Discussion

    NFS

    Object Storage Introduction

    Swift

    S3

    File to Object and Object to File

    Comparison Discussion

    Conclusion

    Outline

    126

  • Object Storage Challenge...

    The world is not object today!

    (and never will be completely)

    Multi-Protocol Access to the Same Dataset Can Provide Value (S3/Swift/NFS/SMB/POSIX/HDFS)

    127

  • Using File to Access Objects - Primary Use Cases

    1. Transition period: use the file API as a transition to the object API

    2. Single management pane: manage file and object within a single system

    3. Sharing data globally: create data via the file interface and share it globally using the object interface

    4. Analysis: many analysis tools are not a good match for object immutability semantics

    5. Connecting NAS clients to object storage: home directories, shared storage from Linux clusters, etc.

    128

  • File Access to Objects - NAS Gateways and Accessors

    [Diagram: NAS clients reach a Swift/S3 object store through a Gateway or an Accessor, optionally with a disk cache]

    GW and Accessor use cases:
    - Good for browsing files
    - OK for migration into the object store
    - OK for a backup tool
    - Optional disk cache

    Caution:
    - Can't control users
    - How are users to know what works well and what doesn't?
    - Scalability issues

    129

  • File Access to Objects - Gateway and Accessor Vendors

    Example NAS Gateway Vendors:
    - Panzura: NAS front-end to cloud; distributed caching and link to off-premise cloud (solution includes disks)
    - Avere: NAS front-end to cloud
    - Maldivica: NAS gateway
    - Nasuni: NAS front-end to cloud
    - Riverbed: backup of branch offices
    - Ctera: consolidation of branch offices

    Example File Accessor Vendors:
    - Storage Made Easy: sync-and-share; direct integration with Windows Explorer and Mac Finder; only Swift mobile access app
    - Cloudberry: Windows-only object access; separate application; supports all clouds; has backup apps as well
    - Cyberduck/Swift Explorer: separate app for Mac, Windows, Linux; supports Swift, S3, etc.; open source
    - Expandrive: virtual USB drive that provides Dropbox-style access to most cloud providers

    130

  • File and Object Access - Integrated Solutions

    Several solutions exist that offer file and object in a single solution

    Object solutions with an integrated NAS gateway:
    - Object storage solution that directly integrates a NAS gateway
    - Same advantages and disadvantages as with NAS gateways
    - This is offered by almost every object storage vendor

    Full integration of file and object support:
    - NAS support is just as good as a native NAS storage solution
    - Object support is just as good as a native object storage solution
    - This can include separate or the same datasets
    - Examples include IBM Spectrum Scale (GPFS) and Red Hat GlusterFS

    131

  • File and Object Access to the Same Data - What Should It Look Like?

    Research challenge: the dream of full simultaneous access
    - How to achieve a unified user namespace?
    - Possible to achieve behavior similar to NFSv4 + SMB3?

    Should File see file semantics, and Object see object semantics?
    - For workflows, this works quite well
    - e.g., ingest through file, read through object
    - e.g., ingest through object, analyze and update, read results through object

    It's all semantics: eventual semantics vs. file semantics
    - Objects are allowed to just disappear...how would File deal with that?
    - Buckets/containers are supposed to scale without limit...but directories typically do not
    - Objects do not respect locks, but how does this fit with file? Should object protocols wait on a lock? How would Object deal with the delay?
    - How in sync do the namespaces need to be? Across sites, maintaining strong file semantics is a challenge
    - Separate security, e.g., ACLs, authentication servers, interpreting LDAP/AD users

    Do we need a new set of semantics?

    132

  • A Way Forward: Swift-On-File

    An OpenStack Swift per-bucket/container storage policy
    - Stores objects on any cluster/parallel file system
    - Objects created using the object API can be accessed as files, and vice versa
    - Newly created files are immediately accessible via Swift/S3
    - Newly created objects are immediately available for editing

    Challenges it overcomes:
    - Hardens object visibility semantics to ensure read-after-write (object namespace is eventually consistent; object data is strongly consistent)
    - Common LDAP/AD user database for both file and object
    - Maintaining file attributes on a new object PUT
    - Currently working on further integrating ACLs, metadata and xattrs, etc.

    Leverages file system data protection

    Part of IBM Spectrum Scale 4.2 and experimental with Red Hat GlusterFS

    Swift code available at https://github.com/openstack/swiftonfile

    133

  • Co-Existence of Traditional and Swift-On-File

    [Diagram: a common Proxy Tier routes requests from a Swift/S3 user to two object rings - Object Ring 1 using the traditional Swift storage policy and Object Ring 2 using the Swift-On-File storage policy on Spectrum Scale]

    Object storage path:
    -rwxr-xr-x 1 swift swift 29 Aug 22 09:25 /mnt/sdb1/2/node/sdb2/objects/981/f79/d7b843bf79/1401254393.89313.data

    File system storage path:
    -rwxr-xr-x 1 swift swift 29 Aug 22 09:25 /mnt/fs/container/object1

    134

  • File in Object:

    http://swift.example.com/v1/acct/cont/obj

    Object in File:

    /mnt/fs/acct/cont/obj
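    A minimal python-swiftclient sketch (an assumption, not from the deck) of what this mapping enables: an object written through the Swift API is read straight back as a file under the Swift-On-File mount. The auth URL, credentials, and the /mnt/fs mount point are placeholders.

```python
# Hypothetical illustration of the object-URL to file-path mapping shown above.
import swiftclient

conn = swiftclient.client.Connection(
    authurl='http://swift.example.com/auth/v1.0',   # placeholder endpoint
    user='acct:user', key='secret')                 # placeholder credentials

conn.put_container('cont')
conn.put_object('cont', 'obj',
                contents=b'hello from the object API\n',
                content_type='text/plain')

# With a Swift-On-File policy, the object lands at <mount>/<account>/<container>/<object>
with open('/mnt/fs/acct/cont/obj', 'rb') as f:
    print(f.read())
```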

    135

  • Analytics for File and Object

    Analytics on File is well established

    Is Object storage storing Big Data or Dead Data?
    - If data cannot be analyzed, might as well use tape (tape is still much cheaper)

    Running directly through the Swift/S3 API limits functionality:
    - Hive and HBase (among others) lack efficient support due to the file append requirement
    - Load imbalance due to inefficient data distribution
    - Large data movement on name changes
    - HTTP slower than RPC
    - Multiple network hops when writing data
    - Loss of data locality
    - Plus many more...

    136

  • Analytic Possibilities on Object Storage - No Single Solution

    1. Use the object storage solution's HDFS APIs: mileage will vary; performance results are specific to the analytics framework

    2. Spark: targeted at in-memory analytics; lower demands on storage, depending on the application

    3. Analytics tool + Tachyon: Tachyon creates an in-memory distributed storage system; not yet for production; can lower demands on the storage solution

    4. Use a File + Object solution: realize native file performance

    137

  • Introduction

    File and Object Discussion

    NFS

    Object Storage Introduction

    Swift

    S3

    File to Object and Object to File

    Comparison Discussion

    Conclusion

    Outline

    138

  • Between File and Object...

    So are NFSv4, S3, and Swift really all needed?

    139

  • Gross Generalization of Target Workloads

    Object:
    - Immutable object storage
    - Backup (write mostly)
    - Distribution/streaming (read mostly)
    - Archive (write mostly): rarely accessed data, but when needed it must be retrieved quickly
    ***Note that this is what Object is today, not necessarily where it will be tomorrow

    File:
    - Can do object workloads and much more...
    - User data and home directories
    - Applications with small to medium performance and scalability requirements
    - Analytics
    ***Note that NFS (without pNFS) is still not ideal for scientific applications that require high-throughput data access from medium to large compute clusters

    140

  • Applications

    Object:
    - Converse in whole objects
    - Simple API that doesn't have complicated concepts like hard links, crash-consistency operations, etc.
    - Many short-lived TCP connections: adds latency but increases parallelism
    - Must tolerate eventual consistency: must be willing to retry; objects could temporarily disappear...but highly available...
    - Simple hierarchy makes objects hard to find: many vendors disable even listing containers/buckets; many apps keep a separate database
    - Must tolerate low bandwidth/high latency (this is today, so could change in the future)

    File:
    - Converse in bytes, files, inodes, file descriptors: complicated yet now familiar
    - Single long-lived TCP connection: a benefit, but one TCP connection is not good in the WAN
    - Stronger consistency, but that makes it confusing
    - Must be aware of scaling issues, e.g., too many files in a single directory
    - Data sharing has shortcomings: locking is typically only advisory and creates delays during failure (due to state)
    - High performance, but NFS has inherent load imbalances without pNFS
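    Since the Object column above requires tolerating eventual consistency, here is a small, hypothetical boto3 sketch of the "must be willing to retry" point: a GET that backs off and retries when a just-written object is not yet visible. Bucket, key, and backoff values are placeholders.

```python
# Hypothetical retry-with-backoff reader for an eventually consistent object store.
import time
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')  # credentials/endpoint assumed to be configured

def get_with_retry(bucket, key, attempts=5, delay=0.5):
    for i in range(attempts):
        try:
            return s3.get_object(Bucket=bucket, Key=key)['Body'].read()
        except ClientError as err:
            # A freshly written object may not be visible yet; back off and retry.
            if err.response['Error']['Code'] in ('NoSuchKey', '404'):
                time.sleep(delay * (2 ** i))
                continue
            raise
    raise RuntimeError(f'{key} still not visible after {attempts} attempts')
```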

    141

  • Ease of Access

    Object:
    - Access data from anywhere on the globe
    - Very thin client with no optimizations
    - Mobile integration (the iPhone includes an S3 client)
    - More and more applications supporting native object access
    - To ease user transition, several startups have file-based viewers for Mac/Windows/Linux (Storage Made Easy, Cloudberry, Cyberduck, etc.)
    - Several S3/Swift mobile apps exist as well (Storage Made Easy among many others)
    - Use curl and build your own HTTP request (see the sketch below)

    File:
    - NFS clients available in all OSs for laptops, desktops, servers - but not for mobile devices
    - Most applications today natively support POSIX
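    A tiny sketch of the "build your own HTTP request" bullet above, using Python's requests library against a Swift-style URL; the endpoint and token are placeholders, and the header name assumes Swift's X-Auth-Token convention.

```python
# Hypothetical raw HTTP GET of an object; any HTTP client works the same way.
import requests

token = '<auth token from Keystone or tempauth>'    # placeholder
url = 'http://swift.example.com/v1/acct/cont/obj'   # placeholder object URL

resp = requests.get(url, headers={'X-Auth-Token': token})
resp.raise_for_status()
print(resp.headers.get('Content-Type'), len(resp.content))
```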

    142

  • Data Protection - What Can Go Wrong...

    Coordinated H/W failures

    Server Failure

    Disk Failure/Corruption

    Rack Failure

    Data Center Failure

    Accidental User Error

    Data Transfer Corruption between storage client and storage device

    Storage software bugs

    143

  • Data Protection

    Object:
    - Object vendors are writing software from scratch; it is very new
    - Support 3-way replication and erasure coding
    - Object vendors are currently focused on being the backup, not on backing up their own data (little attention to backup; more focus on DR support)
    - Beware the snake-oil salesman: triplication and erasure coding do not prevent data loss
    - Versioning, but no ability to capture the entire dataset

    File/NAS:
    - NAS vendors support a wide variety of storage systems: software-based, controller-based with specialized H/W, controller-based with commodity H/W
    - Backup and DR support widely available
    - Snapshots widely available

    144

  • Security

    Object:
    - Typically provide multi-tenancy at the level of user authentication
    - No client software required
    - Few if any provide data isolation
    - Encryption becoming more common
    - Each protocol has its own ACL format and granularity
    - HTTP-based token mechanisms work nicely for web and global access
    - Privacy through HTTPS

    File:
    - Variety of authentication mechanisms; Kerberos is now standard and supports multi-tenancy, but requires client-side support
    - Typically used in the LAN, but can work in the WAN
    - Rich ACL format
    - Data transfer encryption supported
    - True multi-tenancy (network and data isolation) available from some vendors

    145

  • Cost and Features

    Object:
    - Current solutions are sold as SW-only or SW + commodity H/W
    - Currently priced low (relative to what the market will bear); OpenStack Swift is *free* (minus blood, sweat, and tears)
    - Typically simply storing data at this point; analytics support mostly in name only
    - Relatively easy to manage (only applies to supported vendor solutions; note this correlates with fewer features)

    File:
    - Cost can vary widely: roll your own, SW-only, SW + commodity H/W, SW + specialized H/W
    - Many have tape support
    - Viable analytics support
    - Enterprise vendors support multi-protocol access
    - Block-storage support for VMs; can support the entire OpenStack storage ecosystem, VMware, Hyper-V

    146

  • Each Protocol Has Purpose and Real Value

    [Venn diagram: S3, Swift, NFS]

    147

  • Require POSIX?

    File

    Unique to File:
    - Proprietary applications
    - In-place updates
    - File append
    - Locking
    - Strong consistency

    [Venn diagram: S3, Swift, NFS]

    148

  • Require Mobile or Cloud?

    Object

    Unique to Object:
    - Smartphone/tablet access
    - Cloud-friendly security
    - Cloud-friendly tools

    [Venn diagram: S3, Swift, NFS]

    149

  • The Overlap...today

    [Venn diagram: S3, Swift, NFS]

    Chances are you have applications that fit in the middle as well

    Today, stark differences exist between vendors, so the choice is relatively easy
    - Object vendors by and large have lower cost/capacity, targeting the backup/archive market
    - NAS vendors by and large have higher performance and are feature-rich

    150

  • The Overlap...tomorrow

    [Venn diagram: S3, Swift, NFS]

    Remember that NFS/Swift/S3 are simply protocols to access data
    - Nothing in Swift or S3 limits performance or future possible features
    - Most enterprise and advanced features are independent of protocol

    Object vendors are busy working their way up the application chain
    - Even in-place updates can be mitigated to some degree
    - Many videos are stored frame by frame, with each frame updated in its entirety
    - With small files, updating the entire file isn't a big deal, e.g., IoT

    With better integration, maybe you won't have to decide :)

    151

  • Metadata Search

    It is hard to find data in both File and Object
    - A key issue with Object's flat namespace is finding data
    - Even File can become difficult with billions of files
    - Scalable search is becoming required to realize the value of data: find needles in unstructured haystacks

    Goal is to dynamically index objects/files
    - Create structure from well-known system and user attributes
    - Tags and attributes are automatically added to a database
    - Useful for both users and administrators: users search based on their tags; administrators search based on system attributes (e.g., account_last_activity_time, container_read_permissions, object_content_type)
    - REST-based search API

    IBM has built an open-source solution with OpenStack Swift using RabbitMQ and Elasticsearch

    [Diagram: TAG IT -> FIND IT]

    General use cases: data mining; data warehousing; selective data retrieval, data backup, data archival, data migration; management/reporting
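    A minimal python-swiftclient sketch (an assumption, not the IBM solution itself) of the "TAG IT" half: user attributes are attached to an object as X-Object-Meta-* headers, which an indexing service could later push into a search engine. Container, object, and attribute names are placeholders.

```python
# Hypothetical tagging of an object with searchable user metadata.
import swiftclient

conn = swiftclient.client.Connection(
    authurl='http://swift.example.com/auth/v1.0',   # placeholder endpoint
    user='acct:user', key='secret')                 # placeholder credentials

# "TAG IT": attach user attributes as X-Object-Meta-* headers
conn.post_object('experiments', 'run-42.csv',
                 headers={'X-Object-Meta-Project': 'genomics',
                          'X-Object-Meta-Owner': 'dean'})

# "FIND IT": a search service would index these; here we simply read them back
headers = conn.head_object('experiments', 'run-42.csv')
print(headers.get('x-object-meta-project'))
```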

    152

  • File vs. Object Summary

    So it's not cut and dried
    - File is very mature, but can be complicated
    - Object is very immature, but all disruptive technologies are

    The real question is how much of the NAS pie will Object eat?

    153

  • Introduction

    File and Object Discussion

    NFS

    Object Storage Introduction

    Swift

    S3

    File to Object and Object to File

    Comparison Discussion

    Conclusion

    Outline

    154

  • Whew...that was a lot of info
    - The 5 Ws of File and Object
    - NFS, Swift, S3
    - Industry file and object solutions

    There are few easy decisions
    - There are some now, but it's getting harder as object vendors mature

    NFS
    - A long history...but let's work together to advance the technology
    - Check out NFSv4.2 and help make it the new default!

    Swift/S3 on-premise are still emerging as standards
    - Object access will become an essential data access mechanism for ALL data

    Get Involved! Swift and NFS have active open-source communities

    struct CLOSE4args { /* CURRENT_FH: object */ seqid4 seqid; stateid4 open_stateid; };

    155

  • More Information

    NFSv4 IETF working group: https://datatracker.ietf.org/wg/nfsv4
    NFSv4 RFC: http://www.ietf.org/rfc/rfc3530.txt
    NFSv4.1 RFC: http://www.ietf.org/rfc/rfc5661.txt
    NFSv4.2 RFC Draft: https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41
    Ganesha: http://nfs-ganesha.sourceforge.net
    SNIA white papers & tutorials on NFS:
    - https://www.brighttalk.com/search?duration=0..&keywords[]=nfs&q=snia&rank=webcast_relevance
    - http://www.snia.org/sites/default/files/SNIA_An_Overview_of_NFSv4-3_0.pdf
    - http://www.snia.org/sites/default/files/Migrating_to_NFSv4_v04_-Final.pdf
    - http://www.snia.org/sites/default/files/ChuckLever_Introducing_FedFS_On_Linux.pdf
    Original pNFS paper - Exporting Storage Systems in a Scalable Manner with pNFS, MSST05: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.76.3177&rep=rep1&type=pdf
    NFS XATTR Draft: https://tools.ietf.org/html/draft-naik-nfsv4-xattrs-02

    156


  • More Information

    NFS FAQ: http://nfs.sourceforge.net/
    Virtual Machine Workloads: The Case for New Benchmarks for NAS, FAST13: https://www.usenix.org/system/files/conference/fast13/fast13-final84.pdf
    Newer Is Sometimes Better: An Evaluation of NFSv4.1, SIGMETRICS15: https://www.fsl.cs.sunysb.edu/docs/nfs4perf/nfs4perf-sigm15.pdf
    All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI14: http://research.cs.wisc.edu/wind/Publications/alice-osdi14.pdf
    Boosting the Power of Swift Using Metadata Search: https://www.youtube.com/watch?v=_bODZWvIprY
    From Archive to Insight: Debunking Myths of Analytics on Object Stores: https://www.youtube.com/watch?v=brhEUptD3JQ
    Swift 101: Technology and Architecture for Beginners: https://www.openstack.org/summit/openstack-paris-summit-2014/session-videos/presentation/swift-101-technology-and-architecture-for-beginners
    Building Applications with Swift: The Swift Developer On-Ramp: https://www.openstack.org/summit/openstack-paris-summit-2014/session-videos/presentation/building-applications-with-swift-the-swift-developer-on-ramp

    157


  • More Information

    Building web applications using OpenStack Swift: https://www.openstack.org/summit/tokyo-2015/videos/presentation/building-web-applications-using-openstack-swift
    SwiftOnFile project: https://github.com/openstack/swiftonfile
    Swift3 project: https://github.com/openstack/swift3
    ceph/s3-tests project: https://github.com/ceph/s3-tests

    158


  • BACKUP

    159

  • What is Object Storage?

    - Multi-site cloud storage
    - Multi-tenancy
    - Simpler management and flatter namespace
    - Simple APIs and semantics (Swift/S3 and whole-file updates)
    - Scalable metadata access
    - Scalable and highly available
    - Versioning
    - Ubiquitous access

    160

  • Data Protection - In the Context of What Can Actually Go Wrong (and not what is only likely to go wrong)

    Failure | Object | File/NAS
    Disk Failure/Corruption | Per-object auditing common; low coverage | Per file or block; auditing is vendor specific; typically high coverage; erasure coding
    Server Failure | Erasure coding or triplication | High-end supports erasure coding; low-end has no support
    Rack Failure | Erasure coding or triplication | High-end supports erasure coding
    Data Center Failure | Erasure coding or replication; scalability can be a concern... | High-end supports replication at the file or block level
    User Error | Per-object versioning; S3 supports undelete | Snapshots (dataset consistent); backup
    Data Transfer Corruption | End-to-end checksums vendor specific | End-to-end checksums vendor specific; backup
    Storage Software Bugs | Typically lack scalable backup | Backup
    Coordinated H/W Failures | Typically lack scalable backup | Backup

    161

  • File and Object Security Comparison

    Category | Object | File
    Authentication | Standard APIs, both standard and custom implementations; designed for global access; userid/password or certificate; many support an enterprise directory service (LDAP/AD) | Standard (Kerberos); typically not globally accessible; userid/password or certificate; many support an enterprise directory service (LDAP/AD)
    Authorization | ACLs (of varying granularity) | NFSv4 and POSIX ACLs
    Data privacy | HTTPS | Kerberos, IPsec
    Multi-Tenancy | Typically software-based separation; shared servers and storage for everyone | Typically software-based separation; high-end vendors can provide physical separation as well

    162

  • S3 Authentication Signing V2 (Backup)

    Access Key ID (AWSAccessKeyID)
    Secret Access Key (AWSSecretKey)

    signature = Base64( HMAC-SHA1( AWSSecretKey, UTF-8-Encoding-Of( StringToSign )))

    StringToSign = HTTP-Verb + "\n" + Content-MD5 + "\n" + Content-Type + "\n" + Date + "\n" + CanonicalizedAmzHeaders + CanonicalizedResource

    -H Authorization: AWS awsaccesskeyid:signature

    Authorization Header
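    A short Python sketch of the V2 signing recipe above for a simple GET with no Content-MD5, Content-Type, or x-amz-* headers; the access key, secret key, bucket, and object are placeholders.

```python
# Hypothetical construction of the V2 Authorization header shown above.
import base64, hashlib, hmac
from email.utils import formatdate

access_key = 'AKIDEXAMPLE'                 # placeholder AWSAccessKeyID
secret_key = 'wJalrXUtnFEMI/EXAMPLEKEY'    # placeholder AWSSecretKey

date = formatdate(usegmt=True)
string_to_sign = ('GET\n'                  # HTTP-Verb
                  '\n'                     # Content-MD5 (empty)
                  '\n'                     # Content-Type (empty)
                  + date + '\n'            # Date
                  + '/mybucket/myobject')  # CanonicalizedResource (no amz headers)

signature = base64.b64encode(
    hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
).decode()

print({'Date': date, 'Authorization': 'AWS %s:%s' % (access_key, signature)})
```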

    http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html

    163

  • S3 Authentication Signature V4 (Backup)

    Access Key ID (AWSAccessKeyID)
    Secret Access Key (AWSSecretKey)

    -H Authorization: AWS4-HMAC-SHA256 Credential=awsaccesskeyid/20160220/us-east-1/s3/aws4_request, SignedHeaders=host;range;x-amz-date, Signature=signature

    http://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html

    Authorization Header

    164