September 9-10, 2019
Library of Congress Storage Environment Update 2019
Carl Watts
Information Technology Specialist
IT Services Operations / Operations and Maintenance / Unix Systems
September 2019
Converged Storage Tiers (old)
Tier 0
Tier 1
Tier 2
Tier 3
Tier 4
High Capacity, Low Cost
Shared Storage, Processing Space
Archival Cache Space
VM OS, VM Apps
Database, Application Data
User Data
Oracle HSM, Oracle SL8500
Oracle T10000D
Hierarchical Storage Management
IBM Spectrum Archive, IBM TS3500 and TS4500, IBM TS1140 and TS1155
Back-up Environment
Disk-to-Disk, Tape
IBM TS1140, IBM LTO7
Applications: CommVault, Symantec
Converged Data Center
DC5 (AWS)
DC5 (Azure)
DC5 (GCS)
DC2 On-prem Object Store
DC4 On-prem Object Store
DC3 Long-term Storage
DC2 Long-term Storage
Primary
Secondary
DC1 DevOps
DC = Data Center
Content Growth – Preservation
Unique File Count: 536M Total Files

Longterm Storage (single copy in TiB)
Year    TiB
2014     6,856.40
2015     8,273.92
2016    10,955.96
2017    13,956.69
2018    16,448.03
2019    19,504.30

Annual Growth (in TiB)
Year    TiB
2015    1,417.52
2016    2,682.04
2017    3,000.73
2018    2,491.34
2019    3,056.27
Content Growth – Preservation
Overall Long-term Storage (all copies in TiB)
Year    TiB
2014    14,741
2015    17,789
2016    23,555
2017    30,007
2018    35,363
2019    41,934

About 18% annual growth
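The growth figure can be checked against the all-copies totals with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the slide does not say whether "about 18%" refers to the latest year or to a multi-year average, so both a year-over-year and a compound rate are computed.

```python
# Illustrative sketch: growth rates from the all-copies totals on this slide.
# Function names are hypothetical; the data values are taken from the slide.

ALL_COPIES_TIB = {
    2014: 14741,
    2015: 17789,
    2016: 23555,
    2017: 30007,
    2018: 35363,
    2019: 41934,
}


def year_over_year_growth(totals):
    """Return {year: percent growth over the previous year}."""
    years = sorted(totals)
    return {
        year: (totals[year] / totals[prev] - 1.0) * 100.0
        for prev, year in zip(years, years[1:])
    }


def compound_annual_growth(totals):
    """Return the compound annual growth rate (percent) over the whole series."""
    years = sorted(totals)
    first, last = years[0], years[-1]
    ratio = totals[last] / totals[first]
    return (ratio ** (1.0 / (last - first)) - 1.0) * 100.0


if __name__ == "__main__":
    for year, pct in year_over_year_growth(ALL_COPIES_TIB).items():
        print(f"{year}: {pct:.1f}% over previous year")
    print(f"Compound rate 2014-2019: {compound_annual_growth(ALL_COPIES_TIB):.1f}%")
```

With these inputs the two most recent years (2018 and 2019) each come out close to 18%, while the compound rate over the full period is higher because 2016 and 2017 grew faster.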
Content Growth – Presentation
Unique File Count: 374M Total Files

Access Storage (in TiB)
Year    TiB
2013      236.40
2014      546.60
2015    1,095.10
2016    1,620.70
2017    2,086.30
2018    2,624.99
2019    3,069.88

Annual Growth (in TiB)
Year    TiB
2013    236.40
2014    310.20
2015    548.50
2016    525.60
2017    465.60
2018    538.69
2019    444.88

About 18% annual growth
Migrations Continue
• Data migrations and propagation are now at a constant churn
• Completed the consolidation of Preservation Storage
  • Combined the resources of three data centers into two to reduce cost
• Preparing to propagate Data Center 2 to Data Center 4
  • Preparing to replicate data to the new data center once a high-bandwidth network is established
• Migrating tape technology
  • Migrated one copy of IBM TS1140 tape to TS1155 tape (2019)
• Propagating Access Storage to AWS
  • Completed 24 of 48 AWS Snowball transfers (over 1.4 PB moved in two months)
  • Project is ongoing and should be completed in six weeks
• Propagating Preservation Storage to AWS and Azure
  • Project to start early December to propagate all preservation content to AWS via Snowmobile and Azure via Data Box Heavies
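For a sense of scale, the Snowball figure above can be converted into an average transfer rate. This is a rough sketch under stated assumptions: "two months" is taken as 60 days, 1 PB as 10^15 bytes, and the function name is hypothetical.

```python
# Rough sketch: average rate implied by "over 1.4 PB moved in two months".
# Assumptions (not from the slide): 60 days, decimal units (1 PB = 1e15 bytes).

SECONDS_PER_DAY = 86_400


def average_throughput(bytes_moved, days):
    """Return the average rate as TB per day and as sustained Gbit/s."""
    seconds = days * SECONDS_PER_DAY
    return {
        "tb_per_day": bytes_moved / 1e12 / days,
        "gbit_per_s": bytes_moved * 8 / seconds / 1e9,
    }


if __name__ == "__main__":
    rate = average_throughput(bytes_moved=1.4e15, days=60)
    print(f"{rate['tb_per_day']:.1f} TB/day, {rate['gbit_per_s']:.2f} Gbit/s sustained")
```

Under those assumptions the shipments averaged roughly 23 TB per day, the equivalent of a network link sustained at a little over 2 Gbit/s.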
Building out Content Abstraction Layer
• Installation of StrongLink, which will become the Content Abstraction Layer
• The Content Abstraction Layer (CAL) will:
  • Provide a persistent namespace and access method to data
  • Manage the curated data
  • Manage file fixity and fixity checking
  • Manage the automation of content processing
  • Manage the movement / orchestration of data across multiple:
    • Systems
    • Data centers
    • Cloud providers
    • External entities
  • Manage the data migration between old and new storage platforms
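Fixity checking, one of the functions listed above, generally means recomputing a file's checksum and comparing it with a stored value. The sketch below shows that idea with the Python standard library; it is not StrongLink's API, and the function names and the choice of SHA-256 are assumptions made for illustration.

```python
# Minimal sketch of a fixity check using only the standard library.
# Not StrongLink's interface; names and the SHA-256 default are assumptions.
import hashlib


def compute_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Stream the file in chunks and return its hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, expected_hex, algorithm="sha256"):
    """Return True when the file's current digest matches the recorded one."""
    return compute_digest(path, algorithm) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat for large files. A scheduled job would call `check_fixity` for each stored copy and flag any mismatch for repair from another copy.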
Adding On-Prem Cloud Type Storage (STaaS)
• Acquiring Storage-as-a-Service (STaaS)
• Replaces:
  • active archive: Oracle HSM and Spectrum Archive
  • access storage: Spectrum Scale
• Cost is equal to cloud storage vendors' cool/cold tiers (about $2/TB per month)
• Access is about the same as the current NAS and better than the HSM products in use
• No egress fees for access
• Managed solution, which lowers staff administration requirements
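The quoted rate can be turned into a monthly estimate for a given capacity. The sketch below is illustrative arithmetic only: it assumes the $2 rate is per decimal terabyte, converts from the TiB used in the growth charts, and applies the rate to the 2019 single-copy preservation total purely as an example. The slide does not say how much capacity the service will hold.

```python
# Illustrative arithmetic: monthly cost at a flat per-TB rate.
# Assumptions: rate is per decimal TB; the example capacity is the 2019
# single-copy preservation total from the growth chart, used only as a sample.

BYTES_PER_TIB = 2 ** 40
BYTES_PER_TB = 10 ** 12


def tib_to_tb(tib):
    """Convert binary tebibytes to decimal terabytes."""
    return tib * BYTES_PER_TIB / BYTES_PER_TB


def monthly_cost_usd(capacity_tib, usd_per_tb_month=2.0):
    """Return the monthly storage cost in US dollars for a capacity in TiB."""
    return tib_to_tb(capacity_tib) * usd_per_tb_month


if __name__ == "__main__":
    print(f"${monthly_cost_usd(19_504.30):,.0f} per month")
```

For that sample capacity the estimate is roughly $42,900 per month, before any retrieval or request charges a cloud vendor might add.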
Content Storage
Content is equal to a single copy of a digital object and its associated derivative(s)
• Preservation Copies (currently)
  • Standard Collections – two (2) copies distributed across two (2) datacenters
  • Special Collections – two (2) different platforms holding two (2) copies distributed across two (2) datacenters
• Presentation Copies
  • Currently – single online copy
  • Near future – two (2) copies across two (2) datacenters
  • Future – multiple copies across datacenters and "cloud" providers
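The preservation copy rules above can be expressed as a small policy check. This is a hypothetical sketch: the class names, policy keys, and the datacenter and platform labels in the usage note are illustrative and do not describe any Library system.

```python
# Hypothetical sketch: checking a set of copies against the stated copy rules.
# Class names, policy keys, and labels are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    datacenter: str
    platform: str


@dataclass(frozen=True)
class CopyPolicy:
    min_copies: int
    min_datacenters: int
    min_platforms: int = 1


# Current preservation rules as stated on the slide.
PRESERVATION_POLICIES = {
    "standard": CopyPolicy(min_copies=2, min_datacenters=2),
    "special": CopyPolicy(min_copies=2, min_datacenters=2, min_platforms=2),
}


def meets_policy(copies, policy):
    """Return True when the copies satisfy count, datacenter, and platform minimums."""
    datacenters = {copy.datacenter for copy in copies}
    platforms = {copy.platform for copy in copies}
    return (
        len(copies) >= policy.min_copies
        and len(datacenters) >= policy.min_datacenters
        and len(platforms) >= policy.min_platforms
    )
```

For example, two copies on the same platform in two different datacenters would pass the "standard" rule but fail the "special" rule, which also requires two different platforms.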
Quad ‘P’ Dataflow (Proposed)
[Diagram: proposed dataflow across the stages Procure, Preserve, Process, Present, and System Backup. Component labels recovered from the figure are grouped below.]
Workflow Engine(s)
Content sources and ingest paths: esubmit.loc.gov (external push); Media Shuttle (push/pull); CTS via ingest servers; Fetcher (internal pull); Delivered Content (portable HD); Client; sFTP; Web Site; In House Digitization; House Video Encoders; House Recording Studio
eCO: eCO NAS; eCO Submitter Server
Working storage and compute: Transitory Storage Pools; Processing Storage Pools; Processing VM(s)
On-Prem Object Storage (Storage-as-a-Service)
Presentation: CDN; Web Server(s); Web Capture; ChronAmer.; Other
DMS Workflow; PCWA
Content Abstraction Layer functions: Policy Management; Object Discovery & Classification; Quota Management; Storage Analytics; Object Audit; Workflow Engine; Data Tiering; Data Validation and Verification
Access protocols: sFTP; NFS; S3; SMB/CIFS; HTTPS; REST
Storage targets: Long-term Storage (Large File and Special Collections); Tape Tech; Off-Site Cloud Storage (DC5) (AWS, Azure, Google, other…); Off-Site Cold Cloud Storage (DC5) (AWS, Azure, Google, other…); Public Datasets Cloud Storage (DC5) (AWS, Azure, Google, other…); Shared Datasets (Agency, Academia, other...)
System Backup: VMs; DB; Backup Server
Data Center 1 Storage
Data Center 2 Storage
Data Center 3 Storage
Data Center 4 Storage
DC5 Cloud Provider A
DC5 Cloud Provider B
DC5 Cloud Provide ...
Web Services Environment
Back-up Environment
Preservation Systems
Procurement Systems
Processing Systems
Content Abstraction Layer