Huawei SmartDisk based Object Storage --- Universal Distributed Storage
Qingchao Luo
Huawei Technologies Co.


2013 Storage Developer Conference. © Insert Your Company Name. All Rights Reserved.



Agenda


- Object Storage Understanding
- UDS System Design Philosophy
- UDS Hardware Design
- UDS Software Design
- Future Works


Why Object Storage


[Figure: block storage (iSCSI/FC protocol layer over the storage layer), file storage (NFS/CIFS over a file system) and object storage (S3 over objects, each carrying system, key and customized metadata)]

Block storage: Logical Unit Number (LUN), Logical Block Address (LBA), SCSI commands. High cost, low latency (<10 ms); not easy to manage, hard to scale.

File storage: tree structure, directory/file operations, Access Control Lists (ACL), quotas. Low cost, high latency (<100 ms); easy to manage, can be scaled out.

Object storage: flat structure; each object has metadata and data and supports CRUD (Create, Read, Update, Delete) operations over HTTP. Very cheap, higher latency (>100 ms); easy to manage and maintain, natively scale-out architecture.


UDS (Universal Distributed Storage) for Object Storage
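To make the object model above concrete, here is a toy in-memory sketch of a flat namespace with CRUD operations, with each method annotated by the HTTP verb an S3-style interface would use. It is not UDS code and not an S3 client; the class names and the "size" system-metadata field are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class FlatObjectStore:
    """Toy flat namespace: a key is an opaque string, '/' has no special meaning."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, metadata: Optional[Dict[str, str]] = None) -> None:
        # Create or Update (HTTP PUT): the whole object is replaced.
        meta = dict(metadata or {})          # customized (user) metadata
        meta["size"] = str(len(data))        # system metadata (hypothetical field)
        self._objects[key] = StoredObject(data, meta)

    def get(self, key: str) -> StoredObject:
        # Read (HTTP GET): data plus metadata.
        return self._objects[key]

    def head(self, key: str) -> Dict[str, str]:
        # Read metadata only (HTTP HEAD).
        return dict(self._objects[key].metadata)

    def delete(self, key: str) -> None:
        # Delete (HTTP DELETE); deleting a missing key is a no-op here.
        self._objects.pop(key, None)


store = FlatObjectStore()
store.put("photos/2013/a.jpg", b"\xff\xd8", {"owner": "alice"})
print(store.head("photos/2013/a.jpg"))   # {'owner': 'alice', 'size': '2'}
```

Although the key contains slashes, there is no directory tree to walk: lookup is a single key match, which is what lets object storage scale out by hashing keys.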


UDS Key Features


Unlimited Scalability

Low TCO

Extreme Reliability



System Architecture

Highlights:
- Addressing by DHT
- Fully decentralized system
- Small-unit storage node (Smart Disk)

[Figure: clients, the UDS access layer, the distributed hash table (DHT) ring of partitions (P5, P12, P19, P26, ...), and the storage layer of Smart Disks with SoD (Self Organization Disk)]


Hardware Architecture

- Access nodes: external interface, data flow control, hash calculation, centralized management.
- Switches: exchange channel for internal and external data.
- Storage nodes: store data and metadata.

[Figure: networking of one basic cabinet and several expansion cabinets (42U racks holding 4U enclosures plus 2U and 1U units), each uplinked with 4×10GE to the aggregation and access layer of the IP bearer network, with 2×10GE links inside the cabinets and S3 access for clients]


Hardware Components

- Cabinet: 2.1 PB per cabinet, up to 84 cabinets, GE/10GE networking.
- High-density intelligent enclosure: 4U, 75 slots, GE/10GE.
- Smart Disk: 3.9 W/TB, GE.

Easy expansion, low TCO, high reliability: well suited to storing massive amounts of data for a long time.
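A rough capacity and power check from the figures above, assuming decimal units (1 PB = 1000 TB) and that the 3.9 W/TB Smart Disk figure applies across a cabinet's full 2.1 PB (the slide does not state this):

```latex
\begin{aligned}
C_{\max} &= 84 \times 2.1\,\text{PB} = 176.4\,\text{PB} \\
P_{\text{cabinet}} &\approx 2100\,\text{TB} \times 3.9\,\text{W/TB} = 8190\,\text{W} \approx 8.2\,\text{kW}
\end{aligned}
```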


Smart Disk

Good reliability. There is one ARM processor per HDD, so a chip or HDD failure causes at most several TB of data to be rebuilt. This failure mode suits scale-out system recovery.

Good scalability. Each Smart Disk has one IP address on one Ethernet port, so every access node can read and write it directly.

Easy maintenance. If the chip, memory, or HDD fails, only the Smart Disk needs to be replaced.


Future Full ARM Design

[Figure: 4U enclosure with numbered slots holding SoP modules (8-core ARM) and SoD modules (ARM-based Smart Disks), plus GE switches, switch and interface boards, power supplies and fans]

SoP: System over Processor.

Easy deployment and maintenance: compute nodes and storage nodes have the same form factor, and both plug into the same enclosure.



DHT (Distributed Hash Table)

The hash key space, 0 to 2^32 - 1, is split into N partitions (the figure has 20).

Virtual nodes manage the partitions (the figure has virtual nodes A to T, one partition each).

One physical node (one Smart Disk) may host more than one virtual node.

After hashing, a key (K1) maps to exactly one partition, so its value can be stored on, and later found on, the corresponding Smart Disk.
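A minimal sketch of the key, partition, virtual node, Smart Disk lookup described above. Assumptions: CRC32 stands in for the unspecified 32-bit hash, partitions are equal-width ranges of the key space, and the round-robin virtual-node assignment and disk names (disk-0, ...) are hypothetical.

```python
import zlib
from typing import Dict, List

N_PARTITIONS = 20   # as in the figure; keys range over 0 .. 2^32 - 1


def key_hash(key: str) -> int:
    # CRC32 is a stand-in 32-bit hash (unsigned in Python 3).
    return zlib.crc32(key.encode("utf-8"))


def partition_of(key: str, n_partitions: int = N_PARTITIONS) -> int:
    # Equal-width ranges: partition p covers [p * 2^32 / N, (p + 1) * 2^32 / N).
    return (key_hash(key) * n_partitions) >> 32


# One virtual node per partition: A..T for 20 partitions.
VNODES: List[str] = [chr(ord("A") + p) for p in range(N_PARTITIONS)]


def assign_vnodes(disks: List[str]) -> Dict[str, str]:
    # Hypothetical round-robin placement: with fewer disks than virtual
    # nodes, each physical Smart Disk hosts several virtual nodes.
    return {v: disks[i % len(disks)] for i, v in enumerate(VNODES)}


def locate(key: str, vnode_to_disk: Dict[str, str]) -> str:
    # key -> partition -> virtual node -> physical Smart Disk
    return vnode_to_disk[VNODES[partition_of(key)]]


table = assign_vnodes(["disk-0", "disk-1", "disk-2", "disk-3", "disk-4"])
print(partition_of("K1"), locate("K1", table))
```

Because keys map to partitions rather than directly to disks, adding a Smart Disk in this scheme only means moving some virtual nodes (with their partitions) onto it; the key-to-partition mapping itself never changes.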


Smart Disk layout for K-V

[Figure: Smart Disk layout, in order: Reserve, OS, Configure, System log, SoD Redolog, SoD K-V DB, Reserve. The SoD K-V DB region holds metadata (stored redundantly), Index-Primary, Index-Secondary, Free Block Bitmap and Data areas, separated by reserved gaps.]

The Smart Disk provides a K-V interface: put a value by key and retrieve a value by key.

The disk layout holds the key index and the value store.

Redundant metadata and key index improve reliability.

K-V: Key-Value.


KV Data Base for Key Index

[Figure: key index layout in the SoD K-V DB: a head followed by 8 KB pages, divided into Hash Static Pages (20 GB) and Hash Collision Pages (1 GB)]

Index-Primary is written with direct I/O to guarantee consistency.

Index-Secondary provides redundancy and is written with asynchronous page I/O for performance.

Each index page is 8 KB, a multiple of the 4 KB OS page size, so pages stay aligned.
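With 8 KB pages, the 20 GB static area holds 2,621,440 pages and the 1 GB collision area holds 131,072 pages. The following is a minimal in-memory sketch of such an index, assuming the key hash selects one static page and a full page chains into collision pages. The 64-byte entry size, SHA-1 as the bucket hash and the (offset, length) value location are hypothetical, and the Index-Primary/Index-Secondary write paths are not modeled.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 8 * 1024                      # index page size from the slide
ENTRY_SIZE = 64                           # hypothetical on-disk entry size
SLOTS_PER_PAGE = PAGE_SIZE // ENTRY_SIZE  # 128 entries per page under that assumption

Location = Tuple[int, int]                # (offset, length) of the value in the Data area


@dataclass
class IndexPage:
    entries: Dict[bytes, Location] = field(default_factory=dict)
    overflow: Optional[int] = None        # index of the next collision page, if any


class HashIndex:
    def __init__(self, n_static: int, n_collision: int, slots: int = SLOTS_PER_PAGE):
        self.static = [IndexPage() for _ in range(n_static)]
        self.collision = [IndexPage() for _ in range(n_collision)]
        self.next_collision = 0           # collision pages are handed out in order
        self.slots = slots

    def _chain(self, key: bytes) -> List[IndexPage]:
        # Static page chosen by hash, followed by its linked collision pages.
        h = int.from_bytes(hashlib.sha1(key).digest()[:8], "big")
        page = self.static[h % len(self.static)]
        chain = [page]
        while page.overflow is not None:
            page = self.collision[page.overflow]
            chain.append(page)
        return chain

    def put(self, key: bytes, loc: Location) -> None:
        chain = self._chain(key)
        for page in chain:                # update in place if the key exists
            if key in page.entries:
                page.entries[key] = loc
                return
        for page in chain:                # otherwise use the first free slot
            if len(page.entries) < self.slots:
                page.entries[key] = loc
                return
        if self.next_collision >= len(self.collision):
            raise RuntimeError("hash collision area exhausted")
        chain[-1].overflow = self.next_collision   # link a fresh collision page
        self.collision[self.next_collision].entries[key] = loc
        self.next_collision += 1

    def get(self, key: bytes) -> Optional[Location]:
        for page in self._chain(key):
            if key in page.entries:
                return page.entries[key]
        return None


idx = HashIndex(n_static=1, n_collision=2, slots=4)
for i in range(6):
    idx.put(f"obj-{i}".encode(), (i * 4096, 4096))
print(idx.get(b"obj-5"))   # (20480, 4096)
```

Keeping the static area large relative to the collision area means most lookups touch a single 8 KB page, i.e. one disk read.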


KV Data Base for Data

[Figure: data area layout in the SoD K-V DB: a head of 4 KB chunk descriptors mapping to 64 MB chunks; each chunk is divided into slices (1 MB slices shown)]

The Free Block Bitmap uses a 4 KB bitmap to manage each 64 MB data chunk.

Each chunk can be split into slices of 4 KB to 4 MB.
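A minimal sketch of slice allocation inside one 64 MB chunk with a free-block bitmap and first-fit search. The slide does not give the bit granularity; here one bit per 4 KB block is assumed, which needs 16,384 bits (2 KB) per chunk. The first-fit policy and the class name are also assumptions.

```python
from typing import Optional

CHUNK_SIZE = 64 * 1024 * 1024            # 64 MB data chunk (from the slide)
BLOCK_SIZE = 4 * 1024                    # assumed allocation unit: one bit per 4 KB
N_BLOCKS = CHUNK_SIZE // BLOCK_SIZE      # 16,384 blocks -> 16,384 bits = 2 KB
MIN_SLICE, MAX_SLICE = 4 * 1024, 4 * 1024 * 1024   # slice range from the slide


class ChunkBitmap:
    def __init__(self) -> None:
        self.bits = bytearray(N_BLOCKS // 8)  # bit set = block in use

    def _used(self, i: int) -> bool:
        return (self.bits[i >> 3] >> (i & 7)) & 1 == 1

    def _mark(self, start: int, count: int, used: bool) -> None:
        for i in range(start, start + count):
            if used:
                self.bits[i >> 3] |= 1 << (i & 7)
            else:
                self.bits[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    def alloc(self, size: int) -> Optional[int]:
        """First-fit allocation of a contiguous slice; returns its byte offset."""
        if not MIN_SLICE <= size <= MAX_SLICE:
            raise ValueError("slice size must be between 4 KB and 4 MB")
        need = -(-size // BLOCK_SIZE)         # round up to whole blocks
        run = 0
        for i in range(N_BLOCKS):
            run = 0 if self._used(i) else run + 1
            if run == need:
                start = i - need + 1
                self._mark(start, need, True)
                return start * BLOCK_SIZE
        return None                           # no contiguous free range left

    def free(self, offset: int, size: int) -> None:
        self._mark(offset // BLOCK_SIZE, -(-size // BLOCK_SIZE), False)


bm = ChunkBitmap()
a = bm.alloc(1024 * 1024)   # 0
b = bm.alloc(4096)          # 1048576
bm.free(a, 1024 * 1024)
c = bm.alloc(8192)          # 0: first fit reuses the freed range
```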


DRM (Disk Reliable Module)

[Figure: I/O stack with the K-V Data Base in user land and the DRM, block, SCSI and HBA layers in kernel space]

DRM functions:
- HDD error processing: based on SCSI sense data and status codes, retry the I/O, reset the device, etc.
- Online HDD diagnosis: analysis of S.M.A.R.T., error and log information.
- Bad sector recovery.
- Disk health analysis.
- Slow disk detection.
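A sketch of what the first DRM function, choosing an action from the SCSI status and sense key, could look like. The status codes and sense-key values are the standard SCSI ones; the mapping from each condition to an action and the retry budget are a hypothetical policy, not the DRM's actual rules.

```python
from enum import Enum
from typing import Optional

# SCSI status codes (SAM)
GOOD, CHECK_CONDITION, BUSY = 0x00, 0x02, 0x08
# SCSI sense keys (SPC)
NO_SENSE, RECOVERED_ERROR, NOT_READY, MEDIUM_ERROR = 0x0, 0x1, 0x2, 0x3
HARDWARE_ERROR, ILLEGAL_REQUEST, UNIT_ATTENTION, ABORTED_COMMAND = 0x4, 0x5, 0x6, 0xB

MAX_RETRIES = 3   # hypothetical retry budget before escalating to a device reset


class Action(Enum):
    COMPLETE = "complete"              # I/O succeeded (possibly after drive-internal recovery)
    RETRY = "retry"                    # transient condition: reissue the I/O
    RECOVER_SECTOR = "recover_sector"  # medium error: rebuild the sector from redundancy
    RESET_DEVICE = "reset_device"      # escalate: reset the device
    FAIL = "fail"                      # not retryable: report to the K-V layer


def decide(status: int, sense_key: Optional[int] = None, attempt: int = 0) -> Action:
    """Hypothetical policy mapping SCSI status + sense key to a DRM action."""
    retry_or_reset = Action.RETRY if attempt < MAX_RETRIES else Action.RESET_DEVICE
    if status == GOOD:
        return Action.COMPLETE
    if status == BUSY:
        return retry_or_reset
    if status != CHECK_CONDITION:
        return Action.FAIL
    if sense_key in (NO_SENSE, RECOVERED_ERROR):
        return Action.COMPLETE
    if sense_key == MEDIUM_ERROR:
        return Action.RECOVER_SECTOR
    if sense_key in (NOT_READY, UNIT_ATTENTION, ABORTED_COMMAND):
        return retry_or_reset
    if sense_key == HARDWARE_ERROR:
        return Action.RESET_DEVICE
    return Action.FAIL   # ILLEGAL_REQUEST and anything unrecognized


print(decide(CHECK_CONDITION, MEDIUM_ERROR))   # Action.RECOVER_SECTOR
```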


EC (Erasure Code) in UDS

Placement algorithm that reduces the probability of putting chunks of the same EC stripe on one SmartDisk.

For example, a 12 MB object stored with 12+5 (N+M) erasure coding (12 data chunks + 5 parity chunks, 1 MB chunk size) should not have any 2 of its 17 chunks on the same SmartDisk.
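In the 12+5 example, a 12 MB object occupies 17 MB (about 1.42× overhead) and, assuming a maximum-distance-separable code such as Reed-Solomon, survives the loss of any 5 of its 17 chunks, which only holds if no disk carries two of them. The slide does not describe its placement algorithm; the sketch below swaps in rendezvous (highest-random-weight) hashing, with hypothetical function and disk names.

```python
import hashlib
from typing import List


def _score(object_key: str, disk: str) -> bytes:
    # Highest-random-weight (rendezvous) score for an (object, disk) pair.
    return hashlib.sha256(f"{object_key}|{disk}".encode()).digest()


def place_chunks(object_key: str, disks: List[str],
                 n_data: int = 12, n_parity: int = 5) -> List[str]:
    """Return the disk for each of the n_data + n_parity chunks, in chunk order."""
    total = n_data + n_parity
    ranked = sorted(disks, key=lambda d: _score(object_key, d), reverse=True)
    # Chunk i goes to the i-th ranked disk. With at least `total` disks every chunk
    # lands on a different disk; with fewer, wrap-around spreads them as evenly as possible.
    return [ranked[i % len(ranked)] for i in range(total)]


disks = [f"sd-{i:03d}" for i in range(40)]
layout = place_chunks("bucket/photo.jpg", disks)
print(len(set(layout)))   # 17
```

The ranking depends only on the object key and the disk set, so any access node computes the same layout without a central table, and removing one disk only moves the chunks that were on it.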

Partial bad data recovery. When only part of a chunk is bad, e.g. 64 KB of a 1 MB chunk, data may still be recovered with N+M EC even if more than M chunks are partially bad, as long as the bad regions are not aligned (do not overlap at the same offsets).
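Why unaligned damage can still be repaired: assuming a code applied independently at each byte offset across the 17 chunks (as byte-wise Reed-Solomon over GF(2^8) is), each offset only needs at most M bad chunks. The sketch below checks that condition at a hypothetical 64 KB tracking granularity; it tests recoverability only and does not decode.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 1024 * 1024   # 1 MB chunks, as in the 12+5 example
REGION = 64 * 1024         # hypothetical tracking granularity (64 KB, as in the example)


def recoverable(bad: Dict[int, List[Tuple[int, int]]],
                n_data: int = 12, n_parity: int = 5,
                chunk_size: int = CHUNK_SIZE, region: int = REGION) -> bool:
    """bad maps chunk index -> list of damaged (offset, length) byte ranges."""
    n_regions = chunk_size // region
    bad_chunks_per_region = [set() for _ in range(n_regions)]
    for chunk, ranges in bad.items():
        if not 0 <= chunk < n_data + n_parity:
            raise ValueError(f"chunk index {chunk} out of range")
        for offset, length in ranges:
            first, last = offset // region, (offset + length - 1) // region
            for r in range(first, last + 1):
                bad_chunks_per_region[r].add(chunk)
    # Each region is decoded independently, so only the worst region matters.
    return all(len(chunks) <= n_parity for chunks in bad_chunks_per_region)


# Six chunks (more than M = 5) each lose a different 64 KB region: recoverable.
scattered = {c: [(c * REGION, REGION)] for c in range(6)}
# Six chunks all lose the same first 64 KB: region 0 has 6 > 5 bad chunks.
aligned = {c: [(0, REGION)] for c in range(6)}
print(recoverable(scattered), recoverable(aligned))   # True False
```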

Intel ISA-L (Intelligent Storage Acceleration Library): leverages hardware acceleration instructions to improve performance.


DeDup & Compress

File type                    .txt    .wma    .pdf    .mp3    .doc
Size before compression      6M      2.6M    1.6M    5.1M    2.1M
Size after compression       251K    2.5M    1.4M    5.1M    1.2M
Compression ratio            24:1    1.04:1  1.15:1  1:1     1.75:1
Compression time (ms)        205     240     117     399     127
Decompression time (ms)      60      71      32      117     33

Concurrent requests          50      100     200     600
Object size                  1M      1M      1M      1M
CPU usage (%)                4       7.1     12.75   34.62

Post-process deduplication. Single-instance technology: each object has a hash value, and objects with the same hash value are deduplicated.

Compression. Object data is compressed.
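A minimal sketch of post-process single-instance deduplication combined with compression. Assumptions: SHA-256 as the object hash (the slide only says "a hash value"), zlib as a stand-in compressor (the slide does not name one), and reference counting to decide when a shared copy can be reclaimed; the class is hypothetical.

```python
import hashlib
import zlib
from typing import Dict


class SingleInstanceStore:
    def __init__(self) -> None:
        self.pending: Dict[str, bytes] = {}   # written, not yet post-processed
        self.objects: Dict[str, str] = {}     # object key -> content hash
        self.blobs: Dict[str, bytes] = {}     # content hash -> compressed data
        self.refs: Dict[str, int] = {}        # content hash -> reference count

    def put(self, key: str, data: bytes) -> None:
        self.delete(key)                      # an overwrite drops the old reference
        self.pending[key] = data              # the write path stores data as-is

    def post_process(self) -> None:
        # Background pass: hash, deduplicate, and compress pending objects.
        for key, data in self.pending.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blobs:      # first copy: compress and keep it
                self.blobs[digest] = zlib.compress(data)
                self.refs[digest] = 0
            self.refs[digest] += 1
            self.objects[key] = digest
        self.pending.clear()

    def get(self, key: str) -> bytes:
        if key in self.pending:
            return self.pending[key]
        return zlib.decompress(self.blobs[self.objects[key]])

    def delete(self, key: str) -> None:
        self.pending.pop(key, None)
        digest = self.objects.pop(key, None)
        if digest is not None:
            self.refs[digest] -= 1
            if self.refs[digest] == 0:        # last reference gone: reclaim the blob
                del self.refs[digest], self.blobs[digest]


store = SingleInstanceStore()
payload = b"log line\n" * 1000
store.put("a.log", payload)
store.put("b.log", payload)
store.post_process()
print(len(store.blobs))   # 1
```

Doing the work after the write keeps hashing and compression off the request path, at the cost of temporarily holding undeduplicated data.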


S3 FS

[Figure: NAS gateway (NFS, Samba) built on FUSE with a cache and managed through a ManagerServer GUI; it exports NFS/CIFS to clients and stores data in UDS over S3]

FUSE-based S3FS, with a cache to improve performance.

Exported over NFS/CIFS, so clients can migrate massive numbers of files into object storage.

Directories and files map to objects, with internal translation between the dir/file format and the object format.
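A sketch of one common way to translate between a directory tree and a flat object namespace, assuming a directory is represented by a key ending in "/" and a directory listing is emulated by prefix matching. This convention and the function names are assumptions; the slide does not specify UDS's actual mapping.

```python
from typing import Iterable, List


def path_to_key(path: str, is_dir: bool = False) -> str:
    """Map a POSIX path to an object key; directories get a trailing '/' marker."""
    key = path.strip("/")
    return key + "/" if is_dir and key else key


def list_dir(keys: Iterable[str], dir_path: str) -> List[str]:
    """Emulate readdir by prefix + '/' delimiter matching over a flat key space."""
    prefix = path_to_key(dir_path, is_dir=True)   # "" for the root directory
    children = set()
    for key in keys:
        if key.startswith(prefix) and key != prefix:
            name, sep, _ = key[len(prefix):].partition("/")
            children.add(name + sep)               # "sub/" marks a subdirectory
    return sorted(children)


keys = ["docs/", "docs/a.txt", "docs/sub/", "docs/sub/b.txt", "top.txt"]
print(list_dir(keys, "/docs"))   # ['a.txt', 'sub/']
print(list_dir(keys, "/"))       # ['docs/', 'top.txt']
```

A real gateway would page through keys with a prefix query instead of scanning them all, which is one reason the cache matters for directory-heavy workloads.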


WORM for Archive

[Figure: S3 API clients reach UDS over the Internet; a SHA-256 engine on the SoD and a TSA client connect through a TSA gateway to an external TSA service on the Internet]

Leverages GuardTime KSI (Keyless Signature Infrastructure) technology: no key management is needed, so it is easy to deploy.

Trusted Time Stamping Authority (TSA): guarantees that timestamps are always correct, and the signature detects data tampering.
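A minimal sketch of the write-once, verify-later part of WORM. Assumptions: SHA-256 digests as in the figure, and a hypothetical in-memory class. The KSI/TSA signing step is not modeled, so here the digest is only stored locally; in the described design the time-stamped signature is what prevents an attacker from replacing the data and its digest together.

```python
import hashlib
from typing import Dict


class WormStore:
    """Write-once store: an object can be written once and never overwritten."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._digest: Dict[str, str] = {}

    def put(self, key: str, data: bytes) -> str:
        if key in self._data:
            raise PermissionError(f"WORM: {key!r} is already committed")
        digest = hashlib.sha256(data).hexdigest()
        # In the described design this digest would be sent to the TSA/KSI
        # service and the returned signature stored with the object.
        self._data[key] = data
        self._digest[key] = digest
        return digest

    def verify(self, key: str) -> bool:
        # Recompute the hash and compare with the digest recorded at write time.
        return hashlib.sha256(self._data[key]).hexdigest() == self._digest[key]


w = WormStore()
w.put("contracts/2013-09.pdf", b"signed contract")
print(w.verify("contracts/2013-09.pdf"))   # True
```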



Slow HDD

A slow HDD increases latency.

How can a slow HDD be detected?

Isolate slow HDDs quickly to reduce the penalty.
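One possible detection heuristic, offered as an assumption rather than the UDS answer to the question above: compare each disk's recent tail latency with the median of its peers and flag outliers. The approximate 90th percentile, the 3× factor and the minimum sample count are hypothetical tuning choices.

```python
from statistics import median
from typing import Dict, List


def slow_disks(latency_ms: Dict[str, List[float]], factor: float = 3.0,
               min_samples: int = 20) -> List[str]:
    """Flag disks whose ~90th-percentile latency exceeds factor x the peer median."""
    p90 = {}
    for disk, samples in latency_ms.items():
        if len(samples) >= min_samples:            # skip disks with too little data
            s = sorted(samples)
            p90[disk] = s[int(0.9 * (len(s) - 1))]
    if len(p90) < 3:                               # too few peers for a baseline
        return []
    baseline = median(p90.values())
    return sorted(d for d, v in p90.items() if v > factor * baseline)


samples = {f"sd-{i}": [10.0] * 30 for i in range(5)}
samples["sd-3"] = [80.0] * 30
print(slow_disks(samples))   # ['sd-3']
```

Comparing against peers rather than a fixed threshold keeps the check meaningful when the whole cluster slows down under load; a flagged disk could then be isolated from new writes while the DRM diagnoses it.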


QoS (Quality of Service)

Multiple tenants have different SLAs.

Reserve resources for each tenant.

Schedule by tenant priority.

Guarantee QoS even when the load on one SmartDisk is very heavy.
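A sketch combining the two mechanisms listed above, per-tenant reservations and priority scheduling, in one dispatch tick on a heavily loaded SmartDisk. The two-phase policy, the class and the parameters are hypothetical; the slide does not describe the actual scheduler.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple


@dataclass
class TenantQueue:
    name: str
    reserved: int        # requests guaranteed per tick (the tenant's reservation)
    priority: int        # higher value is served first from spare capacity
    queue: Deque[str] = field(default_factory=deque)


def schedule_tick(tenants: List[TenantQueue], capacity: int) -> List[Tuple[str, str]]:
    """Dispatch up to `capacity` requests for one scheduling tick."""
    dispatched: List[Tuple[str, str]] = []
    # Phase 1: honor every tenant's reservation first.
    for t in tenants:
        for _ in range(min(t.reserved, len(t.queue))):
            if len(dispatched) == capacity:
                return dispatched
            dispatched.append((t.name, t.queue.popleft()))
    # Phase 2: hand out any spare capacity in priority order.
    for t in sorted(tenants, key=lambda q: q.priority, reverse=True):
        while t.queue and len(dispatched) < capacity:
            dispatched.append((t.name, t.queue.popleft()))
    return dispatched


gold = TenantQueue("gold", reserved=2, priority=1)
bronze = TenantQueue("bronze", reserved=1, priority=0)
for i in range(10):
    gold.queue.append(f"g{i}")
    bronze.queue.append(f"b{i}")
out = schedule_tick([gold, bronze], capacity=5)
print([name for name, _ in out])   # ['gold', 'gold', 'bronze', 'gold', 'gold']
```

Even when the disk is saturated, the low-priority tenant still receives its reserved request each tick, which is the guarantee the reservation is meant to provide.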


Performance (Latency)

Access nodes and SmartDisks currently use write-through caching.

Add a BBU (Battery Backup Unit) to the enclosure so that SmartDisks can use write-back caching.

Add a capacitor to each SmartDisk to enable write-back caching.

Use hybrid hard disks for SmartDisks, putting metadata (key index and bitmap) in flash.

Internal data repair tasks affect performance because they may take a long time.


Thank You Q & A