
No Slide Title - Aventri...4 Cloud Storage Presents New Challenges • Multi-terabyte to petabyte scale • Distributed across geographies • Housing unstructured content – CT Scans,



Data Security: Leveraging Information Dispersal as an Alternative to RAID and Replication

Chris Gladwin, CEO & Founder

Cleversafe

Data Storage is Transforming

*IDC Digital Universe Report

Methods used in the past 50 years won’t be adequate for the next 50 years

• Over 90% of future storage = unstructured digital content*

• Data storage is growing 10x every 5 years*

Numbers: 5 KB / record

Text: 500 KB / record

Images: 1,000 KB / picture

Audio: 5,000 KB / song

Video: 5,000,000 KB / movie

Hi-Res: 50,000,000 KB / HD movie, CT scan, etc.

Traditional Data New Data


Security Breaches Increasing

Sources: Identity Theft Resource Center, Bank Info Security, Cisco Security Expert

Selected Examples

• Chase Bank, JPMorgan Chase - 2009

• Chase Bank is notifying customers that a backup tape of system information is missing from a secure offsite storage unit. It may have included names, addresses, and SSNs. The information “can be read only with special equipment and software…”

• BlueCross BlueShield - 2009

• Between 57 and 68 hard drives are missing from the BlueCross BlueShield office in Eastgate, TN. BCBS announced that the theft affects about 2 million clients.

• Virginia State Prescription Monitoring Program Records - 2009

• Hackers stole 8.3 million patient records and 35 million drug prescription records, erased the originals, and created an encrypted backup of VPMP's database.

[Chart: Industries represented by percentage of breaches]

Data Security & Threats

Information Security CIA model:

Confidentiality: data is never accessed by unauthorized parties
Example threats:
• Key or credential mismanagement
• Accidental loss of media or devices
• Malicious access
• Remote compromise or theft
• Interception of packets

Integrity: data is always accurate, and cannot be modified without authorization
Example threats:
• Bit errors in drives, memory, connections, or flash
• Physical read and write errors
• Accidental data corruption
• Malicious data tampering

Availability: data is always available to authorized parties
Example threats:
• Drive, location, server, and connection failures
• Maintenance operations
• Denial of service attacks

Replication increases Availability, but making copies of data increases the risk of attack


Challenges with Replicated Storage

1. Because systems often fail, you need multiple copies in different places
• Total bits stored = 3.5 times the original data (assumes RAID arrays)
• Total bandwidth consumed = 3.5 times the original data
• Requires 3.5 times the equipment, cooling, power, and floor space

2. Multiple copies also significantly decrease security
• 3 copies = seven security vulnerabilities (3 copies + 4 data moves)
• Hundreds of millions of personal records lost

3. Replication does not protect from silent error corruption and results in lower data integrity
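The 3.5x figure is roughly the number of copies multiplied by the per-copy RAID overhead. A k-of-n dispersal, by contrast, stores n/k times the data. The minimal sketch below assumes the "2 extra copies plus 20% RAID overhead" configuration from the cost comparison later in the deck. That gives 3.6x, close to the rounded 3.5x above. The function names are illustrative, not from any product.

```python
def replication_expansion(copies: int, raid_overhead: float) -> float:
    """Raw bytes stored per usable byte when every copy sits on a RAID array."""
    return copies * (1 + raid_overhead)


def dispersal_expansion(k: int, n: int) -> float:
    """Raw bytes stored per usable byte for k-of-n dispersal (n slices, each 1/k of the data)."""
    return n / k


if __name__ == "__main__":
    print(f"3 copies on RAID (+20%): {replication_expansion(3, 0.20):.2f}x")
    print(f"10-of-16 dispersal:      {dispersal_expansion(10, 16):.2f}x")
    print(f"5-of-9 dispersal:        {dispersal_expansion(5, 9):.2f}x")
```

The same factor roughly applies to write bandwidth, since each stored copy or slice has to be transferred to its location.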

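RAID is failing at petabyte scale

• Growing drive sizes are decreasing reliability
• Drive capacities are approaching the expected number of bits read per unrecoverable read error (URE), so reading a whole drive is increasingly likely to hit one
• Rebuild times are increasing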
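The URE point can be made concrete with an assumed rate. Suppose a drive specifies one unrecoverable read error per 10^14 bits read, a figure commonly quoted for consumer SATA drives and used here only as an assumption. A RAID 5 rebuild must read every surviving drive in full, so the chance of hitting at least one URE is 1 - (1 - r)^bits. The sketch below uses a hypothetical 8-drive group and hypothetical drive sizes.

```python
import math


def p_ure_during_rebuild(surviving_drives: int, drive_tb: float,
                         ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading all surviving drives."""
    bits_read = surviving_drives * drive_tb * 1e12 * 8  # decimal TB -> bits
    # 1 - (1 - r)^bits, computed stably with log1p/expm1 for tiny r
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    for size_tb in (0.5, 1, 2, 4):
        p = p_ure_during_rebuild(surviving_drives=7, drive_tb=size_tb)
        print(f"8-drive RAID 5, {size_tb} TB drives: {p:.0%} chance of a URE during rebuild")
```

Under these assumptions the risk grows quickly with drive size. Longer rebuilds also extend the window in which a second drive can fail.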


Cloud Storage Presents New Challenges

• Multi-terabyte to petabyte scale
• Distributed across geographies
• Housing unstructured content
  – CT scans, HD movies, photo libraries, etc.
• Storage may be accessible via the public internet
  – Can’t put a firewall around it
• Must deliver information security (confidentiality, integrity, and availability) to aid adoption

[Figure: Traditional Storage → Copies & Replication → Information Dispersal (packet switching applied to storage)]

Applying the Internet to Storage

Increasing scale drove a transformation in data communications

[Figure: Telephony (circuit switching) vs. Internet (packet switching), each shown against system growth]

Dispersal is to storage what packet switching is to networking


History of Information Dispersal

Deep Academic Roots:

1960 Reed-Solomon codes developed by Irving S. Reed and Gustave Solomon at the MIT Lincoln Laboratory.

1969 Elwyn Berlekamp and James Massey develop the Berlekamp-Massey decoding algorithm.

1979 Adi Shamir (MIT) publishes “How to Share a Secret” in the Communications of the ACM

1989 Michael Rabin (Harvard) publishes “Efficient dispersal of information for security, load balancing, and fault tolerance”

1997 Ron Rivest (MIT) publishes “All-or-Nothing Encryption and the Package Transform”

Information Dispersal 101

[Figure: digital content passes through an IDA and is split into unrecognizable slices (e.g., 8h$1, vD@-, fMq&) stored at Sites 1 to 4; a subset of the slices passes back through the IDA to rebuild the content]

• Digital assets divided into slices using Information Dispersal Algorithms
• Slices distributed to separate storage devices
• Real-time data retrieval is always bit-perfect as long as a threshold number of slices are available

Dispersal is packet switching applied to data storage
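The slicing and threshold-retrieval steps can be sketched with a Rabin-style IDA:

- Each group of k bytes is treated as the coefficients of a polynomial over the prime field GF(257).
- Slice x stores that polynomial evaluated at point x.
- Any k slices determine the polynomial, because they form a solvable Vandermonde system, so any k of the n slices rebuild the data.

This is a minimal, unoptimized sketch. The function names are hypothetical, and slice values range over 0 to 256 rather than fitting in a byte. As the "Examining IDA methods" slide notes, plain dispersal like this is not by itself a confidentiality mechanism.

```python
P = 257  # prime modulus: every byte value 0..255 is an element of GF(257)


def disperse(data: bytes, k: int, n: int) -> dict[int, list[int]]:
    """Split data into n slices keyed by evaluation point 1..n; any k rebuild it."""
    if not 0 < k <= n < P:
        raise ValueError("need 0 < k <= n < 257")
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices: dict[int, list[int]] = {x: [] for x in range(1, n + 1)}
    for start in range(0, len(padded), k):
        coeffs = padded[start:start + k]
        for x, values in slices.items():
            acc = 0
            for c in reversed(coeffs):  # Horner's rule, highest coefficient first
                acc = (acc * x + c) % P
            values.append(acc)
    return slices


def _solve_mod_p(matrix: list[list[int]], rhs: list[int]) -> list[int]:
    """Gauss-Jordan elimination over GF(P); assumes the matrix is invertible."""
    size = len(matrix)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if a[r][col])
        a[col], a[pivot] = a[pivot], a[col]
        inv = pow(a[col][col], P - 2, P)  # Fermat inverse, valid because P is prime
        a[col] = [v * inv % P for v in a[col]]
        for r in range(size):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(vr - f * vc) % P for vr, vc in zip(a[r], a[col])]
    return [row[-1] for row in a]


def reconstruct(slices: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k (or more) slices."""
    if len(slices) < k:
        raise ValueError("below threshold: need at least k slices")
    points = sorted(slices)[:k]
    vandermonde = [[pow(x, j, P) for j in range(k)] for x in points]
    out = bytearray()
    for i in range(len(slices[points[0]])):
        out.extend(_solve_mod_p(vandermonde, [slices[x][i] for x in points]))
    return bytes(out[:length])


if __name__ == "__main__":
    msg = b"Dispersal is packet switching applied to storage"
    all_slices = disperse(msg, k=10, n=16)
    # Lose six slices, e.g. a failed site plus two bad drives
    survivors = {x: v for x, v in all_slices.items() if x not in {1, 2, 3, 4, 9, 15}}
    assert reconstruct(survivors, k=10, length=len(msg)) == msg
    print("rebuilt from", len(survivors), "of", len(all_slices), "slices")
```

Any 10 of the 16 slices work, because Vandermonde rows for distinct points are always invertible mod 257. Production erasure codes typically use GF(2^8) arithmetic so that each slice symbol fits in one byte.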


Information Dispersal Configuration

• Better Reliability – tolerates the loss or unavailability of up to n-k slices

• Better Security – tolerates up to k-1 compromised slices (an attacker needs k slices)

5-of-9 Configuration
• 5 stores needed to break Confidentiality or Integrity
• 5 stores needed to break Availability
• 126 combinations

10-of-16 Configuration
• 10 stores needed to break Confidentiality or Integrity
• 7 stores needed to break Availability
• 8008 combinations
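The numbers on this slide follow directly from k and n:

- Confidentiality or integrity falls only when an attacker holds k slices.
- Availability falls once n-k+1 slices are gone.
- The number of distinct k-slice sets that can rebuild the data is C(n, k).

A short sketch reproducing both configurations (the function name and dictionary keys are hypothetical):

```python
from math import comb


def dispersal_profile(k: int, n: int) -> dict[str, float]:
    """Threshold arithmetic for a k-of-n dispersal configuration."""
    return {
        "stores_to_break_confidentiality_or_integrity": k,  # attacker needs k slices
        "stores_to_break_availability": n - k + 1,  # losing n-k+1 leaves fewer than k
        "slice_losses_tolerated": n - k,
        "combinations": comb(n, k),  # distinct k-slice subsets that rebuild the data
        "storage_expansion": n / k,
    }


if __name__ == "__main__":
    for k, n in ((5, 9), (10, 16)):
        print(f"{k}-of-{n}:", dispersal_profile(k, n))
```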

Scale out capacity and performance independently

Information Dispersal Seamless Access

[Diagram labels: storage nodes at Sites 1 to n; access layer; Info. Dispersal routers; protocols: REST/HTTP, FTP, NAS protocols, Java SDK; Object Access, Object Store, File access, Block Store; direct application integration; dispersal; massive content distribution with edge clients]

Object Storage delivers scalability, efficiency and mobility


Information Dispersal for Cloud Storage

• Typically a multi-site configuration, with slices residing across 3 to 4 data centers

• Geographic redundancy and availability are achieved without the overhead of replication

[Diagram: an access device writing slices to Data Centers 1 through 4]

Slices, not copies of data, are stored on each node
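Geographic redundancy without replication comes down to counting slices per site. The sketch below assumes slices are assigned round-robin to data centers, which is a hypothetical placement policy. It uses the 10-of-16 configuration from the earlier slide and checks whether the data is still readable after losing the worst-case site(s).

```python
from collections import Counter


def slices_per_site(n: int, sites: int) -> Counter:
    """Round-robin assignment of slice indices 0..n-1 to sites 0..sites-1."""
    return Counter(i % sites for i in range(n))


def survives_site_loss(k: int, n: int, sites: int, lost_sites: int = 1) -> bool:
    """True if losing the `lost_sites` fullest data centers still leaves at least k slices."""
    counts = sorted(slices_per_site(n, sites).values(), reverse=True)
    worst_case_remaining = n - sum(counts[:lost_sites])
    return worst_case_remaining >= k


if __name__ == "__main__":
    for sites in (3, 4):
        print(sites, "data centers:", dict(slices_per_site(16, sites)),
              "survives one site loss:", survives_site_loss(10, 16, sites))
```

With four sites each holds 4 slices, so a full site outage leaves 12 of 16, above the threshold of 10. Losing two sites (8 left) would not be survivable in this configuration.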

Examining IDA methods

• Information dispersal uses forward error correction techniques (such as Reed-Solomon codes) that form n segments, any m of which are needed to recreate the data (m of n)

• Look for approaches that transform the data so that slices do not represent the original data, to guarantee security
  – Example: credit card number with a 4-of-6 configuration

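[Figure: the card number 5466 1610 4539 4439 dispersed with a 4-of-6 configuration. Visible Data IDA Method: digit groups of the number (e.g., 5466, 1610, 4439) appear in the clear in some slices. Secure IDA Method: every slice is an unrecognizable string]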
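One way to get the "secure IDA" behavior is to apply an all-or-nothing transform before dispersal, in the style of Rivest's package transform listed in the history slide:

1. Encrypt the data under a random key.
2. Append that key masked by a hash of the ciphertext.

No key, and therefore no plaintext, can be recovered until the entire package is available. After dispersal, that means at least m slices. The sketch below is illustrative only. It uses SHA-256 in counter mode as a stand-in for a real cipher, and all names are hypothetical. Its output would then be handed to an m-of-n IDA.

```python
import hashlib
import secrets

KEY_LEN = 32  # bytes; equals the SHA-256 digest size so one digest masks the key


def _keystream(key: bytes, length: int) -> bytes:
    """SHA-256 in counter mode, standing in for a real stream cipher (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def aont_package(data: bytes) -> bytes:
    """Encrypt under a random key, then append the key masked by a hash of the ciphertext."""
    key = secrets.token_bytes(KEY_LEN)
    body = _xor(data, _keystream(key, len(data)))
    tail = _xor(key, hashlib.sha256(body).digest())
    return body + tail


def aont_unpackage(package: bytes) -> bytes:
    """Recover the key (which requires the entire body), then decrypt."""
    body, tail = package[:-KEY_LEN], package[-KEY_LEN:]
    key = _xor(tail, hashlib.sha256(body).digest())
    return _xor(body, _keystream(key, len(body)))


if __name__ == "__main__":
    card = b"5466 1610 4539 4439"
    pkg = aont_package(card)
    print(pkg.hex())  # changes on every run because the key is random
    assert aont_unpackage(pkg) == card
```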


Information Dispersal Improves Reliability

[Chart: Annual Chance of Data Loss in a 1,000 Disk System; y-axis: Probability of Data Loss]
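The reliability argument can be sketched with a simple binomial model. If each slice is lost independently with probability p during some window, a k-of-n dispersal loses data only when more than n-k slices are lost:

P(loss) = Σ_{i=n-k+1}^{n} C(n, i) p^i (1-p)^(n-i).

Independence and the value of p are assumptions here, not figures from the deck. Three-way replication is modeled as 1-of-3, since any one copy suffices, and RAID inside each copy is ignored.

```python
from math import comb


def p_data_loss(k: int, n: int, p_fail: float) -> float:
    """Probability that more than n-k of n independent slices are lost (data unreadable)."""
    return sum(comb(n, i) * p_fail**i * (1 - p_fail) ** (n - i)
               for i in range(n - k + 1, n + 1))


if __name__ == "__main__":
    p = 0.01  # hypothetical chance a given drive fails before its slice is rebuilt
    for label, k, n in (("3-way replication", 1, 3),
                        ("5-of-9 dispersal", 5, 9),
                        ("10-of-16 dispersal", 10, 16)):
        print(f"{label:20s} expansion {n / k:.1f}x  P(loss) = {p_data_loss(k, n, p):.2e}")
```

Under this model, 10-of-16 tolerates six simultaneous losses with only 1.6x storage. Three-way replication tolerates two losses at 3x.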

[Chart: Storage Efficiency – 1 PB usable example]


Information Dispersal Improves Costs (Capacity Optimized)

                                  Traditional Storage (low-cost RAID 5)   Information Dispersal System
Number of Extra Copies            2 plus 20% RAID overhead                None (10-of-16 dispersal)
Expected Data Integrity           6 nines over 5 years                    12 nines over 5 years
Raw Storage Capacity              2,873 TB                                960 TB
Usable Storage Capacity           600 TB                                  600 TB

                                  IDC Average End User Price              Info. Dispersal End User Price
$ per TB Raw Capacity             $1,090*                                 $732 (incl. commodity hardware)
$ per TB Usable Capacity          $5,219**                                $1,172
Price as Configured               $3,131,352                              $702,933
Electrical Power                  $97,606                                 $14,709
Space                             $124,488                                $18,547
Total Cost of Ownership (year 1)  $3,353,446                              $736,189

* Traditional storage $/TB from IDC 2009 cost for capacity-optimized storage
** Assumes 33% physical storage increase from non-virtualized storage containers

Roughly 1/5 the cost (year-1 TCO of $736,189 vs. $3,353,446)
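The table's derived figures can be checked from its inputs. The sketch reads footnote ** as a 1.33 multiplier on raw capacity. That reading is an assumption, chosen because it reproduces the 2,873 TB figure. Year-1 TCO is taken as price plus power plus space, which matches both totals in the table.

```python
USABLE_TB = 600

# Traditional: 3 total copies (2 extra), 20% RAID overhead, and (assumed reading of
# footnote **) a 33% physical-capacity increase for non-virtualized containers
trad_raw_tb = USABLE_TB * 3 * 1.20 * 1.33
# Dispersal: 10-of-16, so each usable TB occupies 16/10 TB of raw capacity
disp_raw_tb = USABLE_TB * 16 / 10

# Year-1 TCO = price as configured + electrical power + space (figures from the table)
trad_tco = 3_131_352 + 97_606 + 124_488
disp_tco = 702_933 + 14_709 + 18_547

print(f"Raw capacity: {trad_raw_tb:,.0f} TB vs {disp_raw_tb:,.0f} TB")
print(f"Year-1 TCO:   ${trad_tco:,} vs ${disp_tco:,} (ratio {trad_tco / disp_tco:.2f}x)")
```

The ratio comes out near 4.6x, which the slide rounds to "1/5 the cost".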

IDA – Ideal for Cloud Storage

• Scales to Petabytes with distributed architecture

• End users are in control of data since it exists only where and when they want it to

• Works over the public internet, since slices are transformed from the actual data into an unrecognizable form


Thank you

[email protected]