Building the Storage Internet Dispersed Storage Overview – March 2008

Project Overview

An Open Source Project with Commercial Backing

Dispersed Storage – an Open Source Project

• Hosted at www.cleversafe.org

• Includes the complete protocol and algorithms

• Incorporates and/or enhances additional open source software

– Bouncy Castle – Cryptography

– JSAP – Java Simple Argument Parser

– Bzip2 – Data Compressor

– Apache Commons – Logging, Statistics, basic Internet protocols

– JUnit – Testing Framework

– Log4j – Logging Utility

– MINA – Network Application Framework

– SLF4J – Simple Logging Façade for Java

– SVNKit – Java Subversion library

– Wrapper – Java Service Wrapper

– ws-commons – Webservices Common Utilities

– jSCSI – iSCSI Initiator

Commercial Support

• Product Vendors: Cleversafe

• Resellers: 5 Certified Resellers, to be announced

• Service Providers: 6 Active Dispersed Storage Providers, to be announced

Mission: Create and establish a standard to store and distribute the world’s data

Data Storage Growth

Huge growth in data storage, driven by digital content

Traditional Data

– Documents

– Character & numerical databases

Additional, New Data

– Images – 500 KB per picture

– Audio – 5,000 KB per song

– Video – 5,000,000 KB per movie

Current High Availability Scenario

300% Disk Storage Overhead + Tape Backup

– Total bytes stored = 4x usable capacity

200% Bandwidth Overhead

– Each node supports full operational requirement

– Total bandwidth required = 3x operational requirement

Higher Cost

– More Power

– More Management

– More Space

– More Equipment

More Security Risks

[Figure: the same data ("The quick brown fox jumps over the lazy brown dog.") is copied in full to Server1 @ Location 1, Server2 @ Location 2 and Server3 @ Location 3, each over its own Internet connection. At each server a RAID3 controller stripes the copy across disks A1, A2 and A3 plus a Parity disk.]
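The 4x and 3x figures on this slide follow from simple arithmetic, sketched below in Python. The function name is illustrative, and the RAID layout of three data disks plus one parity disk is an assumption read off the A1/A2/A3/Parity labels in the diagram; the tape backup mentioned on the slide is not modeled.

```python
from fractions import Fraction

def replication_overhead(copies, data_disks=None, parity_disks=0):
    """Return (storage_overhead, bandwidth_overhead) as fractions of usable size.

    Assumption: every copy is a full replica. If data_disks is given, each
    replica sits on a RAID set with that many data disks plus parity_disks.
    """
    raid_factor = Fraction(1)
    if data_disks:
        raid_factor = Fraction(data_disks + parity_disks, data_disks)
    storage = copies * raid_factor - 1   # extra bytes stored per usable byte
    bandwidth = Fraction(copies) - 1     # every replica receives the full stream
    return storage, bandwidth

# Three full copies, each on a 3+1 RAID 3 set, as in the slide's diagram.
storage, bandwidth = replication_overhead(copies=3, data_disks=3, parity_disks=1)
print(f"storage overhead {float(storage):.0%}, bandwidth overhead {float(bandwidth):.0%}")
```

Under these assumptions three copies store 3 x 4/3 = 4 bytes per usable byte and move 3 bytes per byte written, which matches the 300% and 200% overheads quoted above.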


Digital Data Storage - An Antiquated Approach

Currently Data Storage = Data Copies

• Not Secure

– 200 major announced security breaches since 2004

• Not Private

– Data copies are… data copies

• Not Long Term

– Tied to hardware that doesn’t last over 5 years

• More Reliable = More Cost

– Additional copies, synchronization traffic, high-cost hardware

• Not Scalable

– Performance and management degrade as scale increases

Information Dispersal

Information Dispersal Algorithms (IDAs) have traditionally been used to store extremely sensitive data elements, like cryptographic keys and weapon launch codes

– Inherently secure

– Inherently private

– Inherently reliable

– Inherently long term

With the emergence of broadband, IDAs can be used to store the world’s data.

How Information Dispersal Works

Information Dispersal Algorithms

– Quick mathematical transformation

“Slices” are to data storage what “packets” are to data communications

– Provide inherently reliable, private, secure and long-term storage

Example: 36 characters (36 total bytes) are transformed into 16 slices (58 total bytes)

This slicing example has a 60% storage overhead

– Total bytes stored = 1.6x usable capacity
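To make the slicing idea concrete, here is a minimal Python sketch of a threshold dispersal scheme in the spirit of polynomial-based IDAs. It is not the Cleversafe algorithm, which the slides do not describe; the function names, the prime field GF(257) and the zero padding are all assumptions chosen for brevity. Each block of `threshold` bytes is treated as the coefficients of a polynomial, and slice number x stores the polynomial's value at x, so any `threshold` slices determine the block.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def disperse(data: bytes, width: int, threshold: int) -> list[list[int]]:
    """Split data into `width` slices; any `threshold` of them rebuild it."""
    padded = data + bytes(-len(data) % threshold)  # zero-pad to a whole block
    slices = [[] for _ in range(width)]
    for i in range(0, len(padded), threshold):
        block = padded[i:i + threshold]            # polynomial coefficients
        for x in range(1, width + 1):
            y = sum(c * pow(x, j, P) for j, c in enumerate(block)) % P
            slices[x - 1].append(y)
    return slices

def rebuild(available: dict[int, list[int]], threshold: int, length: int) -> bytes:
    """Recover the data from any `threshold` slices, keyed by 1-based slice number."""
    xs = sorted(available)[:threshold]
    out = bytearray()
    for pos in range(len(available[xs[0]])):
        # Solve the Vandermonde system over GF(P) by Gauss-Jordan elimination.
        rows = [[pow(x, j, P) for j in range(threshold)] + [available[x][pos]]
                for x in xs]
        for col in range(threshold):
            pivot = next(r for r in range(col, threshold) if rows[r][col])
            rows[col], rows[pivot] = rows[pivot], rows[col]
            inv = pow(rows[col][col], -1, P)       # modular inverse (Python 3.8+)
            rows[col] = [v * inv % P for v in rows[col]]
            for r in range(threshold):
                if r != col and rows[r][col]:
                    f = rows[r][col]
                    rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
        out.extend(row[-1] for row in rows)        # recovered coefficients = bytes
    return bytes(out[:length])

message = b"The quick brown fox jumps over the lazy brown dog."
slices = disperse(message, width=16, threshold=10)
# Pretend six of the sixteen slices are lost and rebuild from the other ten.
survivors = {n: slices[n - 1] for n in (2, 3, 5, 7, 8, 11, 12, 13, 15, 16)}
assert rebuild(survivors, threshold=10, length=len(message)) == message
```

With a width of 16 and a threshold of 10, each slice holds about one tenth of the data, so the 16 slices together hold roughly 1.6x the original, in line with the 60% overhead quoted above. Slice values in this sketch can reach 256 and so need more than one byte each; production erasure codes commonly work in GF(2^8) to avoid that.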

dsNet Components

Slicestor: Slice Server

• 3 TB “raw” storage per 1U slice server

• 30 TB usable storage with 16/10 IDA config

• Unlimited vaults (similar to LUNs) per dsNet

• Can be deployed in a single rack or geographically distributed around the world

Accesser: Slice Router

• Disperses and retrieves data to/from slice servers

• iSCSI or Block interfaces

• Ideal for digital content loading

• Can deploy in redundant configurations

dsNet Client Software

• Disperses and retrieves data to/from slice servers

• Approximately 3 MB of Java code

• Ideal for content distribution

Data Storage with Information Dispersal

[Figure: clients send data through the dsNet client software or an Accesser, which disperses it to the Slicestors.]

dsNet Width = 16, Threshold = 12

Max. Delivery: 16 at once

Delivery choices: thousands

Bandwidth Needed: 15-60%

Storage Overhead: 15-60%

TCS = “Total Content Size”

Data Retrieval with Information Dispersal

[Figure: clients retrieve data from the Slicestors through the dsNet client software or an Accesser.]

dsNet Width = 16, Threshold = 12

Max. Delivery: 16 at once

Delivery choices: thousands

Bandwidth Overhead: 15-60%

Storage Overhead: 15-60%

Data delivered via thousands of choices
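With a width of 16 and a threshold of 12, a reader needs slices from only 12 of the 16 Slicestors, which is where the many delivery choices come from. The slides do not say how a client picks among them; the Python sketch below assumes one plausible policy, taking the reachable Slicestors with the lowest measured latency. The function name, the latency map and the policy itself are hypothetical.

```python
def pick_slicestors(latency_ms, threshold):
    """Return the `threshold` reachable Slicestors with the lowest latency.

    latency_ms maps Slicestor number -> measured latency in milliseconds,
    or None when that Slicestor is unreachable. Hypothetical policy and
    data structure, used only for illustration.
    """
    reachable = {n: ms for n, ms in latency_ms.items() if ms is not None}
    if len(reachable) < threshold:
        raise RuntimeError(
            f"only {len(reachable)} Slicestors reachable, need {threshold}")
    fastest = sorted(reachable, key=reachable.get)[:threshold]
    return sorted(fastest)

# Sixteen Slicestors with made-up latencies; numbers 1 to 3 are down.
latencies = {n: 10 * n for n in range(1, 17)}
for down in (1, 2, 3):
    latencies[down] = None

print(pick_slicestors(latencies, threshold=12))
```

Retrieval in this sketch fails only when fewer than 12 Slicestors respond, that is, after more than 4 simultaneous failures for this width and threshold.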

Dispersal versus Replication

Dispersal (Slice Storage)

Width  Threshold  Nines of Reliability  Storage Overhead  Bandwidth Overhead  Access Choices
8      4          13                    100%              100%                70
8      6          7                     33%               33%                 28
16     10         >16                   60%               60%                 8008
16     12         11                    33%               33%                 1820
32     24         >16                   33%               33%                 11 million
64     56         >16                   14%               14%                 214 million

Replication (Copies + Parity Storage), Typical Configurations

Copies  Parity  Nines of Reliability  Storage Overhead  Bandwidth Overhead  Access Choices
2       No      5                     100%              100%                2
2       Yes     10                    167%              100%                2
3       No      7                     200%              200%                3
3       Yes     >16                   300%              200%                3

[Figure: bars comparing source data size with storage overhead size for each configuration. Dispersal rows are drawn as numbered slices (8, 16 or 32 wide); replication rows are drawn as Copy and Parity blocks.]
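The dispersal overhead and access-choice columns can be approximated from width and threshold alone. The Python sketch below assumes that overhead is width/threshold - 1 and that "access choices" counts the distinct sets of threshold slices, the binomial coefficient C(width, threshold); the slides do not state either formula, so both are inferences from the table values.

```python
from math import comb

def dispersal_stats(width, threshold):
    """Return (overhead, access_choices) for a width/threshold configuration.

    Assumed formulas: overhead = width / threshold - 1, and access choices
    = number of ways to choose `threshold` slices out of `width`.
    """
    overhead = width / threshold - 1
    choices = comb(width, threshold)
    return overhead, choices

for width, threshold in [(8, 4), (8, 6), (16, 10), (16, 12), (32, 24), (64, 56)]:
    overhead, choices = dispersal_stats(width, threshold)
    print(f"{width}/{threshold}: overhead {overhead:.0%}, access choices {choices:,}")
```

These formulas reproduce every overhead figure and the first five access-choice figures (32/24 gives 10,518,300, about 11 million). For 64/56 the binomial coefficient is about 4.4 billion, so the 214 million in the table presumably comes from a different count.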

Performance

[Figure: bar chart, "Time to Read / Write a 1GB file", with a horizontal axis in seconds from 0 to 60. Bars are shown for Read and Write on each of Local Hard Drive, Local Network Server and Local dsNet.]

dsNet speed is equivalent to the speed of a local hard drive.

dsNet – Standard Interfaces

dsNet looks and acts like a hard drive via a standard iSCSI interface

– Works in Windows, Linux, Mac, Solaris, etc.

– Works with any application

Multiple standard dsNet interfaces

– iSCSI, Block, NFS, CIFS/SMB, FTP, Object, etc.

A lightweight dsNet Java client also enables embedded devices to access a dsNet

– Media players, phones, set-top boxes, security cameras, sensors, etc.

30 Terabyte Example Configuration

16 Slicestors in 1 to 16 locations

16 “wide” dsNet with a “threshold” of 10

[Figure: Slicestors spread across Locations 1 to 8.]

Failure Tolerance: up to 6 simultaneous node failures

Bandwidth Cost: 32 x 10 Mbps Internet connections (2 per Slicestor)

Storage Cost: 16 x 1U storage servers

Throughput Capacity: 200 Mbps with 6 node failures

Storage Capacity: 48 total / 30 usable terabytes

60% Storage Overhead

– Total bytes stored = 1.6x usable capacity

60% Bandwidth Overhead

– Total bandwidth required = 1.6x operational requirement
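The figures in this configuration can be derived from a handful of inputs, as the Python sketch below shows. It assumes one slice per Slicestor (so the width equals the server count), 3 TB of raw storage per 1U server as stated on the components slide, and degraded throughput equal to the combined link speed of the minimum number of servers that must remain; the function name is illustrative.

```python
def dsnet_capacity(servers, raw_tb_per_server, threshold,
                   links_per_server, mbps_per_link):
    """Derive capacity figures for a dsNet with one slice per server.

    Returns (raw_tb, usable_tb, tolerated_failures, degraded_mbps).
    Assumption: with the maximum tolerated failures, throughput is the
    combined link speed of the `threshold` servers that remain.
    """
    raw_tb = servers * raw_tb_per_server
    usable_tb = raw_tb * threshold / servers
    tolerated_failures = servers - threshold
    degraded_mbps = threshold * links_per_server * mbps_per_link
    return raw_tb, usable_tb, tolerated_failures, degraded_mbps

raw, usable, failures, mbps = dsnet_capacity(
    servers=16, raw_tb_per_server=3, threshold=10,
    links_per_server=2, mbps_per_link=10)
print(f"{raw} TB raw, {usable:.0f} TB usable, "
      f"{failures} failures tolerated, {mbps} Mbps when degraded")
```

With these inputs the sketch gives 48 TB raw, 30 TB usable, 6 tolerated failures and 200 Mbps, the same values listed for this example.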

Replication versus Dispersal - 30 TB example

                     Replication                           Dispersal
Locations            3                                     8
Storage Overhead     300%                                  60%
Bandwidth Overhead   200%                                  60%
Failure Tolerance    Up to 2 simultaneous node failures    Up to 6 simultaneous node failures
Bandwidth Cost *     84 x 10 Mbps Internet connections     32 x 10 Mbps Internet connections
Storage Cost         42 x 1U storage servers               16 x 1U storage servers
Throughput           200 Mbps with 2 node failures         200 Mbps with 6 node failures
Storage Capacity     120 total / 30 usable terabytes       48 total / 30 usable terabytes

* 2 x 10 Mbps Internet connections per 1U storage server

Replication versus Dispersal - 30 TB example

Replication: Total Raw Storage = 117 TB

– 30 TB usable + 30 TB replicated + 30 TB replicated

– 9 TB parity + 9 TB parity + 9 TB parity

Dispersal: Total Raw Storage = 48 TB

– 30 TB usable within 48 TB raw capacity

About 2/5 of the raw capacity delivers the same usable storage (48 TB versus 117 TB)

Replication: Total Servers = 39

Dispersal: Total Servers = 16

About 2/5 of the servers deliver the same reliability (16 versus 39)
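The totals on this slide can be checked with a few lines of Python. The sketch assumes 3 TB of raw storage per 1U server (from the components slide), three replicated copies with 9 TB of parity each, and a 16/10 dispersal configuration; the function name is illustrative.

```python
from math import ceil

TB_PER_SERVER = 3  # raw capacity per 1U server, as stated on the components slide

def servers_needed(raw_tb):
    """Number of 3 TB servers needed to hold raw_tb terabytes."""
    return ceil(raw_tb / TB_PER_SERVER)

usable_tb = 30
replication_raw = 3 * (usable_tb + 9)   # three copies, each with 9 TB of parity
dispersal_raw = usable_tb * 16 / 10     # width 16, threshold 10

print(f"replication: {replication_raw} TB raw, {servers_needed(replication_raw)} servers")
print(f"dispersal:   {dispersal_raw:.0f} TB raw, {servers_needed(dispersal_raw)} servers")
print(f"raw capacity ratio: {dispersal_raw / replication_raw:.2f}")
```

Both the raw capacity ratio (48/117) and the server ratio (16/39) come out at about 0.41 under these assumptions.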

Deployment Options

Internal dsNets

– Organizations build and operate private dsNets for their own internal use

Hosted dsNets

– Hosting companies and ISPs provide and operate dedicated dsNets for their customers

The Storage Internet™

– Dispersed Storage Providers (“DSPs”) provide Dispersed Storage service on an interconnected dsNet

Benefits – Limitless Storage

Scalability Without Limits

– Grid-based architecture with no centralized servers, and with access processing distributed to clients, allows unlimited scalability

– Data occupies fewer bits on a dsNet than on traditional storage solutions, allowing storage systems to scale without significant overhead

Security/Reliability Without Limits

– No full copy of the data exists on any single server, which is inherently secure and private

– Tolerates multiple failures of hardware, storage locations or administrators while maintaining access to data as long as a threshold number of slices remains available

Longevity Without Limits

– Rebuilding and integrity agents refresh data onto new hardware without disruption, allowing data to exist for extended periods of time

– Cleversafe storage is fault-tolerant, resulting in seamless access to data even when servers are unavailable

Cost-Effectiveness Without Limits

– Reduces storage and bandwidth expansion

– Utilizes cost-effective commodity storage hardware

Getting Dispersed Storage

Open Source Project (www.cleversafe.org)

Open Source Software

– Full software protocol

– Dispersed Storage client and server software

Commercial Offerings

Commercial Products

– Software and hardware to build Dispersed Storage grids

Commercial Storage Services

– Dispersed Storage Providers (“DSPs”) will offer dispersed storage services

Open Source versus Commercial

Cleversafe / Dispersed Storage

Open Source

Interoperability Protocols

– Standards

– Open Source software

Commercial

Products

– Integrated hw/sw appliances

– Customized OS

– Additional hardware features

– Performance

Services

– Training

– Certification

– Support

Additional Capabilities

– Management

– Reporting

[Figure labels: Open Source; Commercial; Commercial Operations; Internet Equipment Providers; "Heavy influence on and contribution to standards efforts".]

Partner Opportunities

Technology Providers

– Participate in the open source project at www.cleversafe.org

– Develop new products and technologies using Dispersed Storage

Resellers / Distributors / Integrators

– Develop and deploy Dispersed Storage solutions

Thank You