Library of Congress Storage Environment Update 2019
September 9-10, 2019
Carl Watts, Information Technology Specialist
IT Services Operations / Operations and Maintenance / Unix Systems

  • Converged Storage Tiers (old)

    [Diagram: Tier 0, Tier 1, Tier 2, Tier 3, and Tier 4; axis label: High Capacity / Low Cost. Workload labels: shared storage / processing space; archival cache space; VM OS / VM apps; database / application data; user data.]

    Hierarchical Storage Management: Oracle HSM (Oracle SL8500 library, Oracle T10000D drives) and IBM Spectrum Archive (IBM TS3500 and TS4500 libraries, IBM TS1140 and TS1155 drives)

    Back-up Environment: disk-to-disk and tape (IBM TS1140, IBM LTO7); applications: CommVault, Symantec

  • Converged Data Center

    [Diagram: DC1 (DevOps); DC2; DC4 (on-prem) with on-prem object store; DC2 and DC3 long-term storage (primary and secondary); DC5 object store (AWS, Azure, GCS). DC = Data Center.]

  • Content Growth – Preservation

    Unique file count: 536M total files

    Long-term storage, single copy, and annual growth (TiB):
    Year   Single copy   Annual growth
    2014      6,856.40               -
    2015      8,273.92        1,417.52
    2016     10,955.96        2,682.04
    2017     13,956.69        3,000.73
    2018     16,448.03        2,491.34
    2019     19,504.30        3,056.27
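    The annual-growth column is the successive difference of the cumulative single-copy totals. The minimal Python check below uses only the numbers from the table and reproduces that column. The helper name `annual_growth` is illustrative.

```python
# Annual growth as the first difference of the cumulative single-copy totals
# (TiB, copied from the table above).
single_copy_tib = {2014: 6_856.40, 2015: 8_273.92, 2016: 10_955.96,
                   2017: 13_956.69, 2018: 16_448.03, 2019: 19_504.30}

def annual_growth(totals: dict[int, float]) -> dict[int, float]:
    """Return {year: totals[year] - totals[previous year]}, rounded to 0.01 TiB."""
    years = sorted(totals)
    return {y: round(totals[y] - totals[prev], 2)
            for prev, y in zip(years, years[1:])}

for year, growth in annual_growth(single_copy_tib).items():
    print(f"{year}: {growth:,.2f}")
```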

  • Content Growth – Preservation

    Overall long-term storage, all copies (TiB):
    Year   All copies
    2014       14,741
    2015       17,789
    2016       23,555
    2017       30,007
    2018       35,363
    2019       41,934

    About 18% annual growth
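    The "about 18%" figure can be checked against the table with a simple year-over-year ratio. In the sketch below (the helper name is hypothetical), the 2018 and 2019 steps come out near 18%, while the earlier years grew faster.

```python
# Year-over-year growth of overall long-term storage (all copies, TiB),
# using the values from the table above.
all_copies_tib = {2014: 14_741, 2015: 17_789, 2016: 23_555,
                  2017: 30_007, 2018: 35_363, 2019: 41_934}

def yoy_growth_pct(totals: dict[int, float]) -> dict[int, float]:
    """Percent change from each year to the next: 100 * (t[y] / t[y-1] - 1)."""
    years = sorted(totals)
    return {y: 100 * (totals[y] / totals[prev] - 1)
            for prev, y in zip(years, years[1:])}

for year, pct in yoy_growth_pct(all_copies_tib).items():
    print(f"{year}: {pct:.1f}%")
```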

  • Content Growth – Presentation

    Unique file count: 374M total files

    Access storage and annual growth (TiB):
    Year   Access storage   Annual growth
    2013           236.40          236.40
    2014           546.60          310.20
    2015         1,095.10          548.50
    2016         1,620.70          525.60
    2017         2,086.30          465.60
    2018         2,624.99          538.69
    2019         3,069.88          444.88

    About 18% annual growth
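    This calculation assumes growth holds at a constant annual rate r. The slides give only an approximate recent rate, so the constant rate is an assumption. Under it, stored volume compounds as shown below, and the doubling time follows from setting V_n = 2 V_0. At about 18%, stored volume would double roughly every four years.

```latex
V_n = V_0 (1 + r)^n
\quad\Rightarrow\quad
n_{2\times} = \frac{\ln 2}{\ln(1 + r)} = \frac{\ln 2}{\ln 1.18}
\approx \frac{0.6931}{0.1655} \approx 4.2 \text{ years}
```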

  • Migrations Continue

    Data migrations and propagation are now in constant churn.

    • Completed the consolidation of preservation storage: combined the resources of three data centers into two to reduce cost

    • Preparing to propagate Data Center 2 to Data Center 4: data will be replicated to the new data center once the high-bandwidth network is established

    • Migrating tape technology: migrated one copy of IBM TS1140 tape to TS1155 tape (2019)

    • Propagating access storage to AWS: completed 24 of 48 AWS Snowball transfers (over 1.4 PB moved in two months); the project is ongoing and should be completed in six weeks

    • Propagating preservation storage to AWS and Azure: a project starting in early December will propagate all preservation content to AWS via Snowmobile and to Azure via Data Box Heavy devices
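    The Snowball figures imply an average effective transfer rate. The back-of-the-envelope Python sketch below makes three assumptions that are not on the slide: decimal units (1 PB = 10^15 bytes), a 60-day "two months", and an even split across the 24 completed devices. Because the slide says "over 1.4 PB", both results are lower bounds.

```python
# Back-of-the-envelope throughput for the AWS Snowball transfers.
# Assumptions (not from the slides): decimal units, "two months" = 60 days,
# and the 1.4 PB split evenly across the 24 completed devices.
PB = 10**15               # bytes in a decimal petabyte
SECONDS_PER_DAY = 86_400

def effective_rate_gbps(bytes_moved: float, days: float) -> float:
    """Average rate in gigabits per second over the whole period."""
    return bytes_moved * 8 / (days * SECONDS_PER_DAY) / 1e9

def per_device_tb(bytes_moved: float, devices: int) -> float:
    """Average payload per device in decimal terabytes."""
    return bytes_moved / devices / 1e12

moved = 1.4 * PB
print(f"{effective_rate_gbps(moved, 60):.2f} Gbit/s")   # 2.16 Gbit/s
print(f"{per_device_tb(moved, 24):.1f} TB per device")  # 58.3 TB per device
```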

  • Building out Content Abstraction Layer

    Installation of StrongLink, which will become the Content Abstraction Layer (CAL)

    The CAL will provide:
    • A persistent namespace and access method for data
    • Management of the curated data
    • Management of file fixity and fixity checking
    • Automation of content processing
    • Movement and orchestration of data across multiple systems, data centers, cloud providers, and external entities
    • Data migration between old and new storage platforms
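    A small in-memory Python sketch can illustrate two of the listed CAL duties: a persistent namespace and fixity checking. Everything in it is hypothetical and is not StrongLink's interface, including the class, the `loc:obj/0001` identifier, and the location labels. It only shows the idea of resolving one persistent ID to several replicas and verifying each against a checksum recorded at ingest.

```python
# Hypothetical sketch of two CAL duties: resolving a persistent identifier to
# replicas in several locations, and fixity checking against a checksum recorded
# at ingest. Names and labels are illustrative; this is not StrongLink's API.
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class CatalogEntry:
    expected_sha256: str                                      # recorded once, at ingest
    replicas: dict[str, bytes] = field(default_factory=dict)  # location -> stored bytes


class ContentAbstractionLayer:
    def __init__(self) -> None:
        self._catalog: dict[str, CatalogEntry] = {}           # persistent ID -> entry

    def ingest(self, pid: str, data: bytes, locations: list[str]) -> None:
        """Record the checksum and place one copy at each location."""
        entry = CatalogEntry(sha256_hex(data))
        for loc in locations:
            entry.replicas[loc] = data
        self._catalog[pid] = entry

    def fixity_report(self, pid: str) -> dict[str, bool]:
        """Recompute every replica's checksum and compare it with the recorded one."""
        entry = self._catalog[pid]
        return {loc: sha256_hex(blob) == entry.expected_sha256
                for loc, blob in entry.replicas.items()}

    def read(self, pid: str) -> bytes:
        """Serve the first replica that passes fixity, wherever it is stored."""
        entry = self._catalog[pid]
        for blob in entry.replicas.values():
            if sha256_hex(blob) == entry.expected_sha256:
                return blob
        raise IOError(f"no valid replica for {pid}")


cal = ContentAbstractionLayer()
cal.ingest("loc:obj/0001", b"digitized page", ["DC2", "DC3", "DC5"])
cal._catalog["loc:obj/0001"].replicas["DC2"] = b"bit rot"  # simulate silent corruption
print(cal.fixity_report("loc:obj/0001"))  # {'DC2': False, 'DC3': True, 'DC5': True}
print(cal.read("loc:obj/0001"))           # b'digitized page'
```

    A production system would recompute checksums on a schedule and repair a failed replica from a good one. This sketch only reports the failed replica and skips it.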

  • Adding On-Prem Cloud Type Storage (STaaS)

    • Acquiring a Storage-as-a-Service (STaaS) solution

    • Replaces:

      • Active archive: Oracle HSM and IBM Spectrum Archive

      • Access storage: IBM Spectrum Scale

    • Cost is equivalent to cloud storage vendors' cool/cold tiers (about $2/TB per month)

    • Access is about the same as the current NAS and better than the HSM products in use

    • No egress fees for access

    • Managed solution, which lowers staff administration requirements
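    The slides measure content in TiB but quote the price per TB, so the unit conversion matters when estimating cost. The illustrative Python sketch below assumes the ~$2 rate is per decimal terabyte per month, applies uniformly, and excludes any other charges. The 1,000 TiB volume is hypothetical.

```python
# Illustrative cost arithmetic for "about $2/TB per month".
# Assumptions (not from the slides): the rate is per decimal TB per month,
# applies uniformly to all stored data, and no other charges apply.
TB_PER_TIB = 2**40 / 10**12  # 1 TiB = 1.099511627776 TB

def monthly_cost_usd(stored_tib: float, usd_per_tb_month: float = 2.0) -> float:
    return stored_tib * TB_PER_TIB * usd_per_tb_month

def annual_cost_usd(stored_tib: float, usd_per_tb_month: float = 2.0) -> float:
    return 12 * monthly_cost_usd(stored_tib, usd_per_tb_month)

# Hypothetical volume of 1,000 TiB
print(f"${monthly_cost_usd(1_000):,.2f} per month")  # $2,199.02 per month
print(f"${annual_cost_usd(1_000):,.2f} per year")    # $26,388.28 per year
```

    Ignoring the conversion would understate the cost by about 10%, since 1 TiB is roughly 1.0995 TB.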


  • Content Storage

    Content is defined as a single copy of a digital object and its associated derivative(s).

    Preservation copies (current):
    • Standard collections: two (2) copies distributed across two (2) data centers
    • Special collections: two (2) copies on two (2) different platforms, distributed across two (2) data centers

    Presentation copies:
    • Currently a single online copy
    • Near future: two (2) copies across two (2) data centers
    • Future: multiple copies across data centers and “cloud” providers
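    The placement rules above can be expressed as a simple policy check. The hypothetical Python sketch below reads "two copies" as "at least two", which is an assumption. Its data-center and platform labels are made up.

```python
# Hypothetical policy check for preservation copy placement (not a LoC system API).
from dataclasses import dataclass

@dataclass(frozen=True)
class Copy:
    data_center: str   # e.g. "DC2" (illustrative label)
    platform: str      # e.g. "tape" or "object store" (illustrative label)

def meets_policy(copies: list[Copy], special: bool) -> bool:
    """Standard collections: at least 2 copies across at least 2 data centers.
    Special collections: the same, plus at least 2 different platforms."""
    if len(copies) < 2:
        return False
    if len({c.data_center for c in copies}) < 2:
        return False
    if special and len({c.platform for c in copies}) < 2:
        return False
    return True

same_platform = [Copy("DC2", "tape"), Copy("DC3", "tape")]
mixed = [Copy("DC2", "tape"), Copy("DC3", "object store")]
print(meets_policy(same_platform, special=False))  # True
print(meets_policy(same_platform, special=True))   # False: one platform only
print(meets_policy(mixed, special=True))           # True
```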

  • Quad ‘P’ Dataflow (Proposed)

    Stages: Procure, Preserve, Process, Present, System Backup

    [Diagram components: Workflow Engine(s); esubmit.loc.gov (external push); Media Shuttle (push/pull); CTS via ingest servers; Fetcher (internal pull); delivered content (portable HD); client sFTP; web site; in-house digitization; web capture; House video encoders; House Recording Studio; transitory storage pools; processing VMs; processing storage pools; On-Prem Object Storage (Storage-as-a-Service); Content Abstraction Layer; policy management; object discovery and classification; quota management; storage analytics; object audit; data tiering; data validation and verification; access protocols (sFTP, NFS, S3, SMB/CIFS, HTTPS, REST); long-term storage (large file and special collections); tape; off-site cloud storage and off-site cold cloud storage (DC5: AWS, Azure, Google, other…); public datasets cloud storage (DC5: AWS, Azure, Google, other…); shared datasets (agency, academia, other...); CDN; ChronAmer.; web servers; other; DMS Workflow; PCWA; eCO NAS; eCO Submitter Server; VMs; DB; Backup Server]

  • [Diagram: Data Center 1, 2, 3, and 4 storage; DC5 Cloud Provider A, Cloud Provider B, Cloud Provider ...; Web Services Environment; Back-up Environment; Preservation Systems; Procurement Systems; Processing Systems; Content Abstraction Layer]