
U.S. Department of the Interior
U.S. Geological Survey

Mission Support Team: Storage Architectures

Presented by Ken Gacke, SAIC*
U.S. Geological Survey, EROS Data Center, Sioux Falls, SD
July 2004

* Work performed under U.S. Geological Survey contract 03CRCN0001


Storage Architecture Agenda

- Online/Nearline Storage Architecture
- System Backup Architecture
  - Onsite: short-term system recovery
  - Offsite: disaster recovery
- Archive Storage: long-term data preservation



Storage Architecture

- Online
  - Direct Attached Storage (DAS)
    - Just a Bunch Of Disks (JBOD): intermediate processing
    - Redundant Array of Independent Disks (RAID): database, web/FTP, and product generation
  - Network Attached Storage (NAS): office automation
  - Storage Area Network (SAN): clustered file system for high-performance processing
- Nearline: online disk cache with a high-performance tape backend


Storage Architecture

EDC's historical nearline experience:

  Media                                        HSM Software       Years
  EPOCH                                        AMASS              1987-1993
  Optical                                      AMASS              1992-2000
  Quantum DLT 2000                             UniTree            1992-2001
  StorageTek 3480/3490/D-3/9840, 9840/9940B    DMF, AMASS, LAM    2000-Present


Storage Architecture

Multi-tiered storage vision:

- Online
  - Supported configurations:
    - DAS: local processing, such as image processing
    - NAS: data sharing, such as office automation
    - SAN: production processing
  - Data accessed frequently
- Nearline
  - Integrated within the SAN
  - Scalable for large datasets and infrequently accessed data
  - Multiple copies and/or offsite storage


Storage Architecture Decisions

- Optimized by the individual program and program manager, not the enterprise
- Requirements factors:
  - Reliability: data preservation
  - Performance: data access
  - Cost: $/GB, engineering support, O&M
  - Scalability: data growth, multi-mission support, etc.
  - Compatibility with the current architecture
- Evaluated and recommended through engineering white papers and weighted decision matrices


High Performance RAID Weighted Matrix

Raw scores (RW # = criterion weight):

  Selection Criteria  RW #  EMC CX300  EMC CX500  STK D240  STK D220  Ciprico FibreSt  Adaptec SANbloc  NexSan Ataboy
  Initial Cost          9       8          5         6         7            4               10              10
  Support Cost          9       4          4         6         6            5               10              10
  Vendor Support        8       9          9         9         9            8                7               5
  EDC Experience        6       7          7         8         8            7                5               6
  Performance           8       8          8         9         8            6                7               6
  Reliability           9       9          9         9         9            6                7               6
  Manageability         8       7          7         9         9            6                7               7
  Scalability           6       7          8         8         7            7                7               5
  SAN Ready             4       8          8         8         8            6                8               8
  Upgradeable           4       9          9         8         8            7                7               5

Weighted scores (weight x raw score):

  Selection Criteria        EMC CX300  EMC CX500  STK D240  STK D220  Ciprico FibreSt  Adaptec SANbloc  NexSan Ataboy
  Initial Cost                 72         45        54        63           36               90              90
  Support Cost                 36         36        54        54           45               90              90
  Vendor Support               72         72        72        72           64               56              40
  EDC Experience               42         42        48        48           42               30              36
  Performance                  64         64        72        64           48               56              48
  Reliability                  81         81        81        81           54               63              54
  Manageability                56         56        72        72           48               56              56
  Scalability                  42         48        48        42           42               42              30
  SAN Ready                    32         32        32        32           24               32              32
  Upgradeable                  36         36        32        32           28               28              20
  Total Weighted Score        533        512       565       560          431              543             496
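The weighted-matrix method used throughout these evaluations is simple: multiply each candidate's raw score by the criterion weight and sum down the column. A minimal sketch in Python, with weights and scores transcribed from the High Performance RAID matrix above (only two of the seven candidates shown for brevity):

```python
# Weighted decision matrix: total = sum(weight * raw score) per candidate.
# Weights (RW #) and raw scores transcribed from the matrix above.
weights = {
    "Initial Cost": 9, "Support Cost": 9, "Vendor Support": 8,
    "EDC Experience": 6, "Performance": 8, "Reliability": 9,
    "Manageability": 8, "Scalability": 6, "SAN Ready": 4, "Upgradeable": 4,
}
raw_scores = {
    "EMC CX300": [8, 4, 9, 7, 8, 9, 7, 7, 8, 9],
    "STK D240":  [6, 6, 9, 8, 9, 9, 9, 8, 8, 8],
}

def weighted_total(scores):
    # Scores are listed in the same criterion order as the weights dict.
    return sum(w * s for w, s in zip(weights.values(), scores))

for vendor, scores in raw_scores.items():
    print(vendor, weighted_total(scores))
# EMC CX300 -> 533, STK D240 -> 565, matching the totals in the matrix
```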


Bulk RAID Weighted Matrix

Raw scores (RW # = criterion weight):

  Selection Criteria  RW #  Nexsan Ataboy2  CE SATA  EMC SATA  STK B220  STK D240
  Initial Cost         10        10            10        7         9         6
  Support Cost         10        10             9        5         8         6
  Vendor Support        2         5             3        9         9         9
  EDC Experience        0         6             5        6         7         8
  Performance           5         6             6        7         7         9
  Reliability           1         6             3        7         7         9
  Manageability         5         7             4        7         9         9
  Scalability           1         5             5        7         7         8
  SAN Ready             1         8             0        8         8         8
  Upgradeable           1         5             3        9         9         8

Weighted scores (weight x raw score):

  Selection Criteria        Nexsan Ataboy2  CE SATA  EMC SATA  STK B220  STK D240
  Initial Cost                  100           100       70        90        60
  Support Cost                  100            90       50        80        60
  Vendor Support                 10             6       18        18        18
  EDC Experience                  0             0        0         0         0
  Performance                    30            30       35        35        45
  Reliability                     6             3        7         7         9
  Manageability                  35            20       35        45        45
  Scalability                     5             5        7         7         8
  SAN Ready                       8             0        8         8         8
  Upgradeable                     5             3        9         9         8
  Total Weighted Score          299           257      239       299       261


CR1 Storage in Terabytes (May 2004)

[Pie chart: CR1 storage by type (Nearline, JBOD, RAID), with segment values 8.4, 53.7, and 62.181 terabytes]


CR1 SAN/Nearline Architecture

[Diagram] Components:
- DMF server, connected over Ethernet and 1 Gb / 2 Gb Fibre Channel
- Product distribution
- Tape drives: 8x 9840, 2x 9940B
- Disk cache:
    /dmf/edc     68 GB
    /dmf/doqq   547 GB
    /dmf/guo     50 GB
    /dmf/pds    223 GB
    /dmf/pdsc  1100 GB


Future Seamless/Silo Architecture

[Diagram] Components:
- DMF and PDS servers on Ethernet
- Tape library: 8x 9840, 3x 9940B
- FTP (lxs37)
- Web/Extract: TP9300S, 3 TB
- TP9400
- CIFS mount
- Data servers


Storage Architecture Agenda

- Online/Nearline Storage Architecture
- System Backup Architecture
  - Onsite: short-term system recovery
  - Offsite: disaster recovery
- Archive Storage: long-term data preservation


System Backup Architecture

- ITS is responsible for generating system backups to maintain system integrity
- Promotes a centralized data backup solution to the projects
  - Legato is used for automated system backups on the Unix (SGI, Sun, Linux) platforms
  - ArcServe is used for automated system backups on the Windows platform
- Fully automated backup solution
  - Tapes located within the tape library
  - Retention period is three months


System Backup Architecture

Unix servers:
- Weekly full backups with daily incrementals:
  - System partitions
  - Local and third-party software packages
  - Databases (DORRAN, Earth Explorer, Inventory, Seamless)
    - Legato Oracle Module for very large databases
- Quarterly full backups with daily incrementals:
  - RAID datasets (DRG, Browse, anonymous FTP); backups exclude image files and large files
  - User data file systems


System Backup Architecture

- Windows servers: typically full backups with daily incrementals (no exclusions)
- Workstations and PCs: generally no system backups
  - Production workstations within CR1 are backed up (International, WBV)


System Backup Resources

- Legato (Unix): Sun E450 (4 CPU) with a StorageTek L700 library, six SDLT 220 drives
- ArcServe (Windows): Dell 2550 (2 CPU) with an Overland Storage NEO 4100 library, three LTO-2 drives


Legato Monthly Data Backups

[Chart: GB stored per month, October 1999 through April 2004, on a 0-20,000 GB scale; series: Quarterly, IT, Landsat, WebMap, Total]


Offsite Backup Architecture

- ITS is responsible for generating offsite backups for disaster recovery
- Mission-essential data is written to media and stored offsite
  - One LTO-2 tape generated per week
  - Data written in an open format (tar)
  - Retention period is three months
- Projects currently using offsite storage: DORRAN, Inventory, EarthExplorer, Digital Archive, LAS source code, web servers


Storage Architecture Agenda

- Online/Nearline Storage Architecture
- System Backup Architecture
  - Onsite: short-term system recovery
  - Offsite: disaster recovery
- Archive Storage: long-term data preservation


Archive Storage

Digital Archive Media Trade Study: analyze offline digital archive technologies and recommend the next EDC archive medium of choice.

Criteria, in decreasing order of importance:
- Reliability: a second copy reduces risk somewhat, but a reliable technology is mandatory; reliability is proven over time
- Performance: high capacity saves significant space, and high transfer rates speed up transcription
- Cost: the drive cost is fairly insignificant, but the media cost is quite important


Archive Media Weighted Matrix (FY04 Revision)

Raw scores (Wt = criterion weight; Transfer rate and Vendor analyses carried zero weight):

  Selection Criteria  Wt   STK 9940B  HP LTO2  IBM LTO2  SDLT 600  Sony SAIT  IBM 3592
  Design criteria     50     10.0       7.1      7.1       6.5       6.5        7.9
  Capacity            10      4.1       4.1      4.1       6.8      10.0        6.0
  Media cost/TB       25     10.0       9.8      9.8       9.7       9.3        7.9
  Compatibility       15      1.7       8.3      8.3      10.0      10.0        6.7
  Transfer rate        0      8.2       8.4      8.9      10.0       9.0        6.9
  Drive cost          10      2.4      10.0      6.6       7.7       3.8        1.2
  Vendor analyses      0      4.1       5.8      7.3       9.6       7.6       10.0
  Scenario cost       20      4.7      10.0      9.2       9.4       7.6        4.1

Weighted scores (weight x raw score):

  Selection Criteria       STK 9940B  HP LTO2  IBM LTO2  SDLT 600  Sony SAIT  IBM 3592
  Design criteria            500.0      355.0    355.0     325.0     325.0      395.0
  Capacity                    41.0       41.0     41.0      68.0     100.0       60.0
  Media cost/TB              250.0      245.0    245.0     242.5     232.5      197.5
  Compatibility               25.5      124.5    124.5     150.0     150.0      100.5
  Drive cost                  24.0      100.0     66.0      77.0      38.0       12.0
  Scenario cost               94.0      200.0    184.0     188.0     152.0       82.0
  Total Weighted Score       934.5     1065.5   1015.5    1050.5     997.5      847.0


System Overview

Quantity of data to be copied (2 copies):

  Data Set   Scenes    Data Volume        DCTs / HDTs    9940B
  MSS-P       65,128     3.2 terabytes      118 DCTs        36
  MSS-A      262,088     9.5 terabytes      277 DCTs       100
  TM-A        13,733     3.6 terabytes      108 DCTs        40
  TM-R       386,934   102.2 terabytes    2,357 DCTs     1,040
  TM-R      ~150,000   ~40.4 terabytes   ~7,500 HDTs       420
  Total      877,883   158.9 terabytes   10,758 tapes     1,636

Number of HDTs currently transcribed to DCT on TMACS: 30,500
Quantity of HDTs that can be landfilled after conversion: 38,000+
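The 9940B cartridge count in the table is roughly what the data volumes imply at the 9940B's 200 GB native cartridge capacity (the per-tape fill assumed here is an approximation; actual fill varies with the data). A quick arithmetic check:

```python
# Per-dataset volumes in terabytes, from the System Overview table above.
volumes_tb = {"MSS-P": 3.2, "MSS-A": 9.5, "TM-A": 3.6,
              "TM-R (DCT)": 102.2, "TM-R (HDT)": 40.4}

total_tb = sum(volumes_tb.values())   # 158.9 TB, matching the table's total
copies_tb = total_tb * 2              # two copies -> ~317.8 TB to write
est_tapes = copies_tb * 1000 / 200    # assuming ~200 GB native per 9940B cartridge

print(round(total_tb, 1), round(est_tapes))
# -> 158.9 1589; close to the table's 1,636 cartridges (tapes are not filled
#    to exactly their native capacity, so the real count runs slightly higher)
```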


Big Changes

Your Order is in What Box?!


Impact

- 38,000 HDTs (1 copy): 13 semi trailers
- < 1,800 STK 9940B tapes (2 copies): half the cargo space of an SUV


Impact

- 38,000 HDTs (1 copy): 13 semi trailers
- < 1,800 STK 9940B tapes (2 copies): one-third the space of a STK silo