Simulation of Hierarchical Storage Systems for TCO

Jakob Luttgau and Julian Kunkel

German Climate Compute Center (DKRZ)

June 22, 2017

Jakob Luttgau DKRZ Modeling and Simulation of Tape Libraries June 22, 2017 1 / 20

Overview

1. Motivation and Background

2. Modeling and Simulation of Tape Storage Systems

3. Evaluation

4. Conclusion / Discussion


Motivation
Long-term storage and upcoming challenges for exascale supercomputers.

- Storing data from a supercomputer is a common bottleneck
- Deep storage hierarchies to balance cost and performance
- Tape is among the most affordable storage solutions
- RAIT and object semantics change how tapes are used
- Innovation mostly depends on vendors

But given a simulator, it would be possible to:

- Experiment with alternative configurations
- Better understand costs and make more informed deployment decisions
- Explore strategies for rolling out LTO generations

This is still a work in progress.


Automated Tape Libraries
Archives; data reduction and compression; encryption; self-describing tape formats

IBM TS3500 Library Complex (IBM, 2011b)

StorageTek SL8500 Library Complex (Oracle, 2015)

TFinity Library Complex (Spectralogic, 2016b)

Scalar i6000 Library Complex (Quantum, 2015)

2. Modeling and Simulation of Tape Storage Systems

Model Overview
Hardware and software components in a combined overview.

[Figure: model overview. Labeled components: Client Group, Client, Switch, I/O Servers, Cache, Shared Cache, Tape Silo, Drives, Network, I/O Scheduling, Tape Manager, File Manager, Direct RAIT, Cache Policies, Robot Sched., Library Topologies, Workload Generation, Load Balancing.]

Handling READ Requests
Staging of recently accessed files for reads.

[Figure: READ request paths between Client Group, Client, Switch, I/O Server, Cache, Shared Cache, and Tape Drive. Two cases are labeled: READ (cached) and READ (not in cache).]
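The staging behavior for reads can be illustrated with a small cache model. This is a minimal sketch, not the simulator's code: the class name `SharedCache`, the LRU eviction policy, and a capacity counted in files are all assumptions made for illustration.

```python
from collections import OrderedDict


class SharedCache:
    """Hypothetical disk cache that stages recently read files (LRU eviction assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity          # maximum number of cached files (assumption)
        self.files = OrderedDict()        # file name -> True, ordered by recency

    def read(self, name):
        """Return 'cache' for a hit or 'tape' when the file had to be staged first."""
        if name in self.files:
            self.files.move_to_end(name)  # refresh recency on a hit
            return "cache"
        # Miss: the file is read from tape and staged into the cache.
        self.files[name] = True
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)  # evict the least recently used file
        return "tape"


cache = SharedCache(capacity=2)
sources = [cache.read(n) for n in ["a", "b", "a", "c", "b"]]
```

With a capacity of two files, the second read of "a" is served from the cache, while the second read of "b" goes back to tape because "b" was evicted when "c" was staged.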

Handling WRITE Requests
Two-phase write with delayed persistence on tape.

[Figure: WRITE request path between Client Group, Client, Switch, I/O Server, Cache, Shared Cache, and Tape Drive. Labeled steps: WRITE (Phase 1), delay, WRITE (Phase 2).]
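One way to picture the two-phase write is a queue of dirty files that are flushed to tape once a delay has elapsed. The sketch below assumes a fixed delay per file and a hypothetical `TwoPhaseWriter` class; the talk does not specify how the delay is chosen.

```python
import heapq


class TwoPhaseWriter:
    """Hypothetical model: phase 1 lands in the cache, phase 2 persists to tape later."""

    def __init__(self, delay):
        self.delay = delay      # seconds a file stays dirty before the tape write (assumption)
        self.dirty = []         # min-heap of (due_time, file name)
        self.on_tape = []       # files persisted to tape, in flush order

    def write(self, now, name):
        """Phase 1: store in the shared cache and schedule the delayed tape write."""
        heapq.heappush(self.dirty, (now + self.delay, name))

    def advance(self, now):
        """Phase 2: flush every dirty file whose delay has elapsed by time `now`."""
        while self.dirty and self.dirty[0][0] <= now:
            _, name = heapq.heappop(self.dirty)
            self.on_tape.append(name)


w = TwoPhaseWriter(delay=60)
w.write(now=0, name="a")
w.write(now=30, name="b")
w.advance(now=70)   # only "a" is due (0 + 60 <= 70); "b" is due at 90
```

The client sees the write complete after phase 1; persistence on tape follows only when `advance` passes the due time.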

Network Topology Model
Not packet-based but allocated for the duration of a transfer.

[Figure: network topology graph. Nodes: 0::Dummy, 1::Node Switch, 2::Switch, 3::Drive Switch, 4::I/O:A, 6::Drive:A, 7::Drive:B, 9::Drive 1 to 13::Drive 5, ..., 15::mistralpp.dkrz.de, 16::lobster1.dkrz.de, 17::mistralpp03.dkrz.de, ..., and a Disk Cache. Edges carry weights of 10.0, 100.0, 200.0, 1000.0, or 15000.0.]
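A flow-level model of this kind can be sketched as follows. The sketch assumes that edge weights are bandwidths and that a transfer reserves the bottleneck bandwidth along its path until it finishes; the `Network` class, the node names, the units, and the assignment of the values 1000.0, 15000.0, and 200.0 to particular links are all hypothetical.

```python
class Network:
    """Hypothetical flow-level network: links hold a capacity, transfers reserve a share."""

    def __init__(self, capacities):
        # capacities: {(node_a, node_b): bandwidth}; units are an assumption (e.g. MB/s)
        self.free = dict(capacities)

    def allocate(self, path, size):
        """Reserve the bottleneck bandwidth along `path`; return (rate, duration)."""
        links = list(zip(path, path[1:]))
        rate = min(self.free[link] for link in links)
        if rate <= 0:
            raise RuntimeError("no free bandwidth on path")
        for link in links:
            self.free[link] -= rate       # held until release() is called
        return rate, size / rate

    def release(self, path, rate):
        """Give the reserved bandwidth back once the transfer has finished."""
        for link in zip(path, path[1:]):
            self.free[link] += rate


net = Network({("client", "switch"): 1000.0, ("switch", "io"): 15000.0, ("io", "drive"): 200.0})
path = ["client", "switch", "io", "drive"]
rate, duration = net.allocate(path, size=4000.0)
```

Because no individual packets are simulated, the cost of a transfer is a single reservation and a single completion event, which keeps the event count low for long-running tape transfers.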

Robot Tape Library
Example: Model of the SL8500 library with robot hands and elevators.

[Figure: SL8500 library model with elements labeled R_{r,i}.]

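Serpentine Tape Model
Estimating spool and seek times for tape access.

$$T_{\text{seek}}(pos_j, pos_i) = \max\left(\frac{|pos_{i,x} - pos_{j,x}|}{v_{\text{spool}}},\ \frac{|pos_{i,t} - pos_{j,t}|}{v_{\text{head}}}\right)$$

$$T_{\text{read/write}}(bytes) = \frac{bytes}{v_{\text{read/write}}}$$

$$T_{\text{busy}} = T_{\text{mount}} + \sum_{(pos_i,\, pos_{i+1})}^{BOT, \ldots, BOT} \Big( T_{\text{seek}}(pos_i, pos_{i+1}) + T_{\text{read/write}}(bytes_i) \Big) + T_{\text{unmount}}$$

The sum runs over consecutive positions of an access sequence that starts and ends at the beginning of tape (BOT).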
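The timing model can be turned into a few lines of code. The sketch below assumes each position is an `(x, t)` pair, with `x` read as the position along the tape and `t` as the track, and that BOT is `(0, 0)`; all function names and example values are illustrative. Transfers are attributed to the position they are served at, which gives the same total as the sum over consecutive position pairs.

```python
def seek_time(pos_a, pos_b, v_spool, v_head):
    """Seek time between two (x, t) positions; spooling and head movement overlap."""
    (xa, ta), (xb, tb) = pos_a, pos_b
    return max(abs(xa - xb) / v_spool, abs(ta - tb) / v_head)


def transfer_time(nbytes, v_rw):
    """Time to read or write `nbytes` at the streaming rate `v_rw`."""
    return nbytes / v_rw


def busy_time(requests, t_mount, t_unmount, v_spool, v_head, v_rw, bot=(0.0, 0.0)):
    """Total drive busy time for `requests`, a list of ((x, t), nbytes) in service order."""
    total = t_mount
    current = bot                          # the tape starts at BOT after mounting
    for pos, nbytes in requests:
        total += seek_time(current, pos, v_spool, v_head)
        total += transfer_time(nbytes, v_rw)
        current = pos
    total += seek_time(current, bot, v_spool, v_head)  # return to BOT before unmounting
    return total + t_unmount


# Illustrative values only: two requests, 10 s mount and unmount times.
requests = [((100.0, 2.0), 3e8), ((40.0, 5.0), 1.5e8)]
t_busy = busy_time(requests, t_mount=10.0, t_unmount=10.0,
                   v_spool=10.0, v_head=1.0, v_rw=3e8)
```

For these values the three seeks take 10 s, 6 s, and 5 s, the transfers take 1 s and 0.5 s, and mounting and unmounting add 20 s, for a total of 42.5 s.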

Scheduling and Request Queues
Chaining specialized request queues makes resource allocation manageable.

[Figure: chained request queues. Labeled elements: IN, OUT, Disk I/O, Dirty Queue, Tape I/O, Robots (R_{1,1} ... R_{r,i}), and Drives. Labeled flows: cached reads and writes, uncached reads, serve, write, move tape, service.]

3. Evaluation

[Figure: two time series over roughly three weeks (x-axis: days, Saturday to Friday; y-axis: 0 to 60). Panels: FTP Jobs and Stages.]

[Figure: Stages over roughly three weeks for 30, 45, and 75 drives, one panel each (x-axis: days, Saturday to Friday; y-axis: 0 to 60).]

[Figure: Number of Jobs over roughly three weeks, broken down by wait time, for 30, 45, and 75 drives, one panel each. Wait-time classes: < 1m, < 2m, < 3m, < 4m, < 5m, < 8m, < 10m, < 15m, < 20m, < 30m, < 1h, < 2h, < 4h, < 8h, more. The y-axis extends to 1000 for 30 drives, 20 for 45 drives, and 12.5 for 75 drives.]

Example: QoS for Total Wait Time
E.g.: How many drives are needed to serve x % of requests in under y minutes?

[Figure: four panels comparing read and write requests, e.g. for fewer vs. more drives. Two panels show density over duration in seconds (0 to 200); two panels show y (0.00 to 1.00) over duration in seconds (0 to 200).]
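The QoS question can be answered directly from simulated wait times. The sketch below assumes one list of wait times per simulated drive count; the function names are hypothetical and the numbers are made up for illustration, not results from the talk.

```python
def fraction_within(wait_times, limit):
    """Share of requests whose wait time is below `limit` (same unit as wait_times)."""
    return sum(1 for w in wait_times if w < limit) / len(wait_times)


def min_drives(results, share, limit):
    """Smallest drive count whose simulated waits meet the target, or None."""
    for drives in sorted(results):
        if fraction_within(results[drives], limit) >= share:
            return drives
    return None


# Made-up wait times in seconds per drive count; not results from the talk.
results = {
    30: [20, 90, 400, 1200, 3600],
    45: [10, 40, 100, 130, 600],
    75: [5, 20, 30, 60, 110],
}
needed = min_drives(results, share=0.8, limit=120)
```

Here the target is that 80 % of requests wait less than two minutes. With the sample data, 30 drives reach 40 %, 45 drives reach 60 %, and only 75 drives meet the target.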

Example: Power Consumption Estimates
Approximate power consumption for different configurations.

Conclusion and Discussion

Summary

- Potential helper for procurement of HSM and tape systems
- Enabler for open research otherwise only performed by vendors
- Depth of the stack makes development of a discrete-event simulation (DES) very complicated

Future Work

- Fine-tuning of various simulation parameters
- Conducting experiments related to drive and tape placement, LTO generation, disk cache capacities, and RAIT deployments
- Comparison with different deployments at other sites

Bibliography

Fontana, R. E., Decad, G. M., and Hetzler, S. R. (2013). The Impact of Areal Density and Millions of Square Inches (MSI) of Produced Memory on Petabyte Shipments of TAPE, NAND Flash, and HDD Storage Class Memories. Proceedings of the 29th IEEE Symposium on Massive Storage Systems and Technologies.

IBM (2011a). High Performance Storage System. Technical report.

IBM (2011b). IBM System Storage TS3500 Tape Library Connector and TS1140 Tape Drive support for the IBM TS3500 Tape Library. Pages 1–15.

Oracle (2015). StorageTek SL8500 Modular Library System User’s Guide.

Quantum (2015). Quantum Scalar i6000 Datasheet.

Spectralogic (2016a). LTO Roadmap. https://www.spectralogic.com/features/lto-7/. [Online; accessed 2016-01-24].

Spectralogic (2016b). Spectralogic TFinity - Enterprise Performance. https://www.spectralogic.com/products/spectra-tfinity/tfinity-features-enterprise-performance/. [Online; accessed 2016-02-12].

Sun (2006). StorageTek StreamLine SL8500 - User Guide. (96154).
