2
Project Overview
An Open Source Project with Commercial Backing
Dispersed Storage – an Open Source Project
• Hosted at www.cleversafe.org
• Includes the complete protocol and algorithms
• Incorporates and/or enhances additional open source software
  – Bouncy Castle – Cryptography
  – JSAP – Java Simple Argument Parser
  – Bzip2 – Data Compressor
  – Apache Commons – Logging, Statistics, basic Internet protocols
  – JUnit – Testing Framework
  – Log4j – Logging Utility
  – MINA – Network Application Framework
  – SLF4J – Simple Logging Façade for Java
  – SVNKit – Java Subversion library
  – Wrapper – Java Service Wrapper
  – ws-commons – Webservices Common Utilities
  – jSCSI – iSCSI Initiator
Commercial Support
• Product Vendors: Cleversafe
• Resellers: 5 Certified Resellers, to be announced
• Service Providers: 6 Active Dispersed Storage Providers, to be announced
Mission: Create and establish a standard to store and distribute the world’s data
3
Data Storage Growth
Huge growth in data storage, driven by digital content
Traditional Data
• Documents
• Character & numerical databases
Additional, New Data
• Images – 500 KB per picture
• Audio – 5,000 KB per song
• Video – 5,000,000 KB per movie
4
Current High Availability Scenario
• 300% Disk Storage Overhead + Tape Backup
  – Total bytes stored = 4x usable capacity
• 200% Bandwidth Overhead
  – Each node supports full operational requirement
  – Total bandwidth required = 3x operational requirement
• Higher Cost
  – More Power
  – More Management
  – More Space
  – More Equipment
• More Security Risks
[Figure: Server1 @ Location 1, Server2 @ Location 2 and Server3 @ Location 3, linked by Internet connections. Each server has a RAID3 controller with disks A1, A2, A3 and Parity, and each holds a full copy of the example data (“The quick brown fox jumps over the lazy brown dog.”).]
5
Digital Data Storage - An Antiquated Approach
Currently Data Storage = Data Copies
• Not Secure
  – 200 major announced security breaches since 2004
• Not Private
  – Data copies are… data copies
• Not Long Term
  – Tied to hardware which doesn’t last over 5 years
• More Reliable = More Cost
  – Additional copies, synchronization traffic, high cost hardware
• Not Scalable
  – Performance and management degrade as scale increases
6
Information Dispersal
Information Dispersal Algorithms (IDAs) have traditionally been used to store extremely sensitive data elements, like cryptographic keys and weapon launch codes
– Inherently secure
– Inherently private
– Inherently reliable
– Inherently long term
With the emergence of broadband, IDAs can be used to store the world’s data.
7
How Information Dispersal Works
Information Dispersal Algorithms
– Quick mathematical transformation
“Slices” are to data storage what “packets” are to data communications
– Provide inherently reliable, private, secure and long-term storage
Example: 36 characters = 36 total bytes, dispersed into 16 slices = 58 total bytes
This slicing example has a 60% storage overhead
– Total bytes stored = 1.6x usable capacity
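The slide does not spell out the transformation itself, so the following is only a minimal sketch of one well-known way to build an information dispersal code: a Reed-Solomon-style polynomial code over the prime field GF(257). It is not claimed to be the algorithm used in the Cleversafe software. The names `disperse`, `rebuild` and `_lagrange_eval` are hypothetical. The sketch is systematic (the first `threshold` slices hold raw data bytes), so it illustrates only the reliability property, that any `threshold` of the `width` slices are enough to rebuild the data, and not the privacy property.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def disperse(data: bytes, width: int, threshold: int):
    """Return `width` slices, each a list of symbols in the range 0..256."""
    padded = data + bytes(-len(data) % threshold)  # zero-pad to whole blocks
    slices = [[] for _ in range(width)]
    for off in range(0, len(padded), threshold):
        block = padded[off:off + threshold]
        # The block's bytes are the polynomial's values at x = 1..threshold.
        points = [(x + 1, b) for x, b in enumerate(block)]
        for s in range(width):
            slices[s].append(_lagrange_eval(points, s + 1))
    return slices


def rebuild(available: dict, threshold: int, length: int) -> bytes:
    """Rebuild the data from any `threshold` slices: {slice_index: symbols}."""
    if len(available) < threshold:
        raise ValueError("not enough slices to reach the threshold")
    chosen = sorted(available.items())[:threshold]
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(idx + 1, symbols[pos]) for idx, symbols in chosen]
        out.extend(_lagrange_eval(points, x) for x in range(1, threshold + 1))
    return bytes(out[:length])


data = b"The quick brown fox jumps over the lazy dog"
slices = disperse(data, width=16, threshold=10)
# Pretend six slice servers are unreachable and keep only ten slices.
survivors = {i: slices[i] for i in (1, 3, 6, 7, 9, 10, 12, 13, 14, 15)}
recovered = rebuild(survivors, threshold=10, length=len(data))
```

Because parity symbols can take the value 256, this sketch stores symbols as integers rather than bytes; a production code would more likely work in GF(256) so that every symbol fits in one byte.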
8
dsNet Components
Slicestor: Slice Server
• 3 TB “Raw” storage per 1U slice server
• 30 TB usable storage with 16/10 IDA config
• Unlimited vaults (similar to LUNs) per dsNet
• Can be deployed in a single rack or geographically distributed around the world
Accesser: Slice Router
• Disperses and retrieves data to/from slice servers
• iSCSI or Block interfaces
• Ideal for digital content loading
• Can deploy in redundant configurations
dsNet Client Software
• Disperses and retrieves data to/from slice servers
• Approximately 3 MB of Java code
• Ideal for content distribution
9
Data Storage with Information Dispersal
[Figure: clients send data through an Accesser or through dsNet client software, which disperses it as slices to the Slicestors.]
Max. Delivery: 16 at once
Delivery choices: thousands
Bandwidth Needed: 15-60%
Storage Overhead: 15-60%
TCS = ‘Total Content Size’
dsNet Width = 16, Threshold = 12
10
Data Retrieval with Information Dispersal
[Figure: clients retrieve data from the Slicestors through an Accesser or through dsNet client software.]
Max. Delivery: 16 at once
Delivery choices: thousands
Bandwidth Overhead: 15-60%
Storage Overhead: 15-60%
dsNet Width = 16, Threshold = 12
Data delivered via thousands of choices
11
Dispersal versus Replication
Typical Configurations

Dispersal (Slice Storage)
Width  Threshold  Nines of Reliability  Storage Overhead  Bandwidth Overhead  Access Choices
8      4          13                    100%              100%                70
8      6          7                     33%               33%                 28
16     10         >16                   60%               60%                 8008
16     12         11                    33%               33%                 1820
32     24         >16                   33%               33%                 11 million
64     56         >16                   14%               14%                 214 million

Replication (Copies + Parity Storage)
Copies  Parity  Nines of Reliability  Storage Overhead  Bandwidth Overhead  Access Choices
2       No      5                     100%              100%                2
2       Yes     10                    167%              100%                2
3       No      7                     200%              200%                3
3       Yes     >16                   300%              200%                3

[Figure: bar diagrams comparing source data size with storage overhead size, showing numbered slices for each dispersal configuration and Copy/Parity blocks for each replication configuration.]
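Most of the overhead and access-choice figures on this slide can be reproduced with simple arithmetic. The sketch below rests on two assumptions that the slide does not state: that access choices count the distinct threshold-sized subsets of slices, and that the "Parity: Yes" rows use one parity disk per three data disks. The function names are hypothetical.

```python
from math import comb


def dispersal_stats(width, threshold):
    """Return (overhead percent, access choices) for a width/threshold pair."""
    overhead_pct = round((width / threshold - 1) * 100)
    # Assumption: any `threshold` of the `width` slices can serve a read.
    return overhead_pct, comb(width, threshold)


def replication_overhead_pct(copies, parity=False, data_disks=3):
    """Overhead percent for full copies, optionally with one parity disk
    per `data_disks` data disks (an assumed RAID-style layout)."""
    per_copy = (data_disks + 1) / data_disks if parity else 1.0
    return round((copies * per_copy - 1) * 100)


for width, threshold in [(8, 4), (8, 6), (16, 10), (16, 12), (32, 24)]:
    print(width, threshold, dispersal_stats(width, threshold))
```

Under these assumptions the first five dispersal rows and all four replication rows match the slide, with the 32/24 row coming out at 10,518,300, which the slide rounds to 11 million. For 64/56 the same count is about 4.4 billion rather than the 214 million shown, so that row may have been counted differently.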
12
Performance
[Figure: bar chart, “Time to Read / Write a 1GB file”, in seconds (axis 0 to 60), comparing Read and Write on a local hard drive, a local network server and a local dsNet.]
dsNet speed is equivalent to the speed of a local hard drive.
13
dsNet – Standard Interfaces
dsNet looks and acts like a hard drive via a standard iSCSI interface
– Works in Windows, Linux, Mac, Solaris, etc.
– Works with any application
Multiple standard dsNet interfaces:
– iSCSI, Block, NFS, CIFS/SMB, FTP, Object, etc.
Lightweight dsNet Java client also enables embedded devices to access a dsNet:
– Media players, phones, set-top boxes, security cameras, sensors, etc.
14
30 Terabyte Example Configuration
16 Slicestors
– In 1 to 16 locations
16 “Wide” dsNet with a “Threshold” of 10
32 x 10 Mbps Internet Connections
– 2 per Slicestor
60% Storage Overhead
– Total bytes stored = 1.6x usable capacity
60% Bandwidth Overhead
– Total bandwidth required = 1.6x operational requirement
[Figure: Slicestors spread across Locations 1 through 8.]
Storage Capacity      48 Total / 30 Usable Terabytes
Storage Cost          16 x 1U Storage Servers
Bandwidth Cost        32 x 10 Mbps Internet Connections
Throughput Capacity   200 Mbps with 6 node failures
Failure Tolerance     Up to 6 simultaneous node failures
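The figures for this configuration follow from the width, the threshold and the per-server numbers quoted earlier in the deck (3 TB raw per 1U Slicestor, two 10 Mbps links each). The sketch below shows the arithmetic; the function name `dsnet_sizing` is hypothetical, and the throughput line assumes that worst-case throughput is the combined link capacity of the `threshold` servers that remain after the maximum tolerated number of failures.

```python
def dsnet_sizing(width, threshold, raw_tb_per_server, links_per_server, mbps_per_link):
    """Derive capacity, failure tolerance and bandwidth for one dsNet."""
    raw_tb = width * raw_tb_per_server
    return {
        "raw_tb": raw_tb,
        # Only threshold/width of the stored bytes are usable data.
        "usable_tb": raw_tb * threshold / width,
        # Reads succeed while at least `threshold` servers are reachable.
        "tolerated_failures": width - threshold,
        "links": width * links_per_server,
        # Assumption: worst case is exactly `threshold` servers still up.
        "worst_case_mbps": threshold * links_per_server * mbps_per_link,
    }


example = dsnet_sizing(width=16, threshold=10, raw_tb_per_server=3,
                       links_per_server=2, mbps_per_link=10)
print(example)
```

With these inputs the sketch gives 48 TB raw, 30 TB usable, 6 tolerated failures, 32 links and 200 Mbps, which agrees with the slide.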
15
Replication versus Dispersal - 30 TB example
                    Replication                          Dispersal
Locations           3 (Locations 1 to 3)                 8 (Locations 1 to 8)
Storage Overhead    300%                                 60%
Bandwidth Overhead  200%                                 60%
Storage Capacity    120 Total / 30 Usable Terabytes      48 Total / 30 Usable Terabytes
Storage Cost        42 x 1U Storage Servers              16 x 1U Storage Servers
Bandwidth Cost *    84 x 10 Mbps Internet Connections    32 x 10 Mbps Internet Connections
Throughput          200 Mbps with 2 node failures        200 Mbps with 6 node failures
Failure Tolerance   Up to 2 simultaneous node failures   Up to 6 simultaneous node failures
* 2 x 10 Mbps Internet Connections per 1U Storage Server
16
Replication versus Dispersal - 30 TB example
Replication: Total Raw Storage = 117 TB
– 30 TB Usable + 30 TB Replicated + 30 TB Replicated
– 3 x 9 TB Parity
Dispersal: Total Raw Storage = 48 TB
– 30 TB Usable within 48 TB Raw Capacity
1/3 of the raw capacity delivers the same usable storage
Replication: Total Servers = 39
Dispersal: Total Servers = 16
~2/5 of the servers deliver the same reliability
17
Deployment Options
Internal dsNets
– Organizations build and operate private dsNets for their own internal use
Hosted dsNets
– Hosting companies and ISPs provide and operate dedicated dsNets for their customers
The Storage Internet™
– Dispersed Storage Providers (“DSPs”) provide Dispersed Storage service on an interconnected dsNet
18
Benefits – Limitless Storage
Scalability Without Limits
– Grid-based architecture with no centralized servers and access processing distributed to clients allows unlimited scalability
– Data occupies fewer bits on a dsNet than on traditional storage solutions, allowing storage systems to scale without significant overhead
Security/Reliability Without Limits
– No full copy of the data exists on any single server, which makes storage inherently secure and private
– Tolerates multiple failures of hardware, storage locations or administrators while maintaining access to data, as long as a threshold number of slices remains available
Longevity Without Limits
– Rebuilding and integrity agents refresh data on new hardware without disruption, allowing data to exist for extended periods of time
– Cleversafe storage is fault-tolerant, resulting in seamless access to data even when servers are unavailable
Cost-Effectiveness Without Limits
– Reduces storage and bandwidth expansion
– Utilizes cost-effective commodity storage hardware
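The failure-tolerance claim above can be made concrete with a simple probability model. The sketch below assumes that servers fail independently and that each one is reachable with the same probability `node_up`; both are simplifying assumptions, the 0.99 value is purely hypothetical, and the deck does not say how its own "nines of reliability" figures were derived. The function name is hypothetical as well.

```python
from math import comb


def unavailability(width, threshold, node_up):
    """Probability that fewer than `threshold` of `width` servers are up,
    assuming independent servers with equal availability `node_up`."""
    node_down = 1 - node_up
    return sum(
        comb(width, up) * node_up**up * node_down**(width - up)
        for up in range(threshold)
    )


# Hypothetical per-server availability of 99%.
dispersed = unavailability(width=16, threshold=10, node_up=0.99)
three_copies = unavailability(width=3, threshold=1, node_up=0.99)
print(f"16/10 dispersal: {dispersed:.2e}  three full copies: {three_copies:.2e}")
```

Three full copies are modelled here as width 3 with threshold 1, since any single copy is enough to read the data.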
19
Getting Dispersed Storage
Open Source Project (www.cleversafe.org)
• Open Source Software
  – Full software protocol
  – Dispersed Storage client and server software
Commercial Offerings
• Commercial Products
  – Software and hardware to build Dispersed Storage grids
• Commercial Storage Services
  – Dispersed Storage Providers (‘DSPs’) will offer dispersed storage services
20
Open Source versus Commercial
Open Source: Dispersed Storage
• Interoperability Protocols
  – Standards
  – Open Source software
Commercial: Cleversafe
• Products
  – Integrated hw/sw Appliances
  – Customized OS
  – Additional hardware features
  – Performance
• Services
  – Training
  – Certification
  – Support
• Additional Capabilities
  – Management
  – Reporting
[Figure labels: Heavy influence on and contribution to standards efforts; Commercial Operations; Internet Equipment Providers]
21
Partner Opportunities
Technology Providers
– Participate in the open source project at www.cleversafe.org
– Develop new products and technologies using Dispersed Storage
Resellers / Distributors / Integrators
– Develop and deploy Dispersed Storage solutions