Distributed storage system

DISTRIBUTED STORAGE SYSTEM

Mr. Dương Công Lợi

Company: VNG-Corp

Tel: +84989510016

Email: loiduongcong@gmail.com

CONTENTS

1. What is a distributed computing system?

2. Principle of distributed database/storage system

3. Distributed storage system paradigm

4. UniversalDistributedStorage

1. WHAT IS A DISTRIBUTED COMPUTING SYSTEM?

Distributed computing is the process of solving a computational problem using a distributed system.

A distributed system is a computing system in which a number of components on multiple computers cooperate by communicating over a network to achieve a common goal.

DISTRIBUTED DATABASE/STORAGE SYSTEM

In a distributed database system, the database is stored on several computers.

A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network.

DISTRIBUTED SYSTEM ADVANTAGES

Advantages

Avoids bottlenecks and single points of failure

Better scalability

Higher availability

Routing model

Client routing: the client sends each request directly to the appropriate server to read or write data

Server routing: a server forwards the client's request to the appropriate server and returns the result to the client

* The two models above can be combined in one system
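
The two routing models can be sketched as follows. This is a minimal illustration, not the author's implementation: the `Server` class, the `owner_of` and `client_put` helpers, and the modulo placement over an MD5 digest are all assumptions made for the example.

```python
import hashlib


def owner_of(key, names):
    """Pick the server responsible for `key` (hypothetical placement rule)."""
    digest = hashlib.md5(key.encode()).digest()
    return names[int.from_bytes(digest[:8], "big") % len(names)]


class Server:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster  # shared dict: server name -> Server
        self.data = {}
        cluster[name] = self

    def handle(self, op, key, value=None):
        """Server routing: forward the request if another server owns the key."""
        owner = owner_of(key, sorted(self.cluster))
        if owner != self.name:
            return self.cluster[owner].handle(op, key, value)
        if op == "put":
            self.data[key] = value
            return value
        return self.data.get(key)


def client_put(cluster, key, value):
    """Client routing: the client computes the owner and contacts it directly."""
    owner = owner_of(key, sorted(cluster))
    return cluster[owner].handle("put", key, value)


cluster = {}
for name in ("server1", "server2", "server3"):
    Server(name, cluster)

client_put(cluster, "user:1", "alice")               # client routing
cluster["server1"].handle("put", "user:2", "bob")    # server routing via server1
print(cluster["server3"].handle("get", "user:1"))    # any server can answer
```

With client routing the client needs to know the server list and the placement rule; with server routing it only needs the address of one server, at the cost of an extra hop.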

DISTRIBUTED STORAGE SYSTEM

Store some data {1,2,3,4,6,7,8} on 1 server:

[Figure: one server holding 1,2,3,4,6,7,8]

Or store it across 3 distributed servers:

[Figure: three servers holding {1,2,3}, {4,6}, and {7,8}]

2. PRINCIPLE OF DISTRIBUTED DATABASE/STORAGE SYSTEM

Shard data by key and store each item on the appropriate server, using a Distributed Hash Table (DHT)

The DHT must use consistent hashing:

Uniform distribution of generated hash values

Consistent

Jenkins and Murmur are good choices; MD5 and SHA are slower
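
As an illustration of the kind of hash function named above, here is a minimal sketch of Jenkins's one-at-a-time hash, one of several functions published by Bob Jenkins. Whether this is the variant the author had in mind is an assumption. The `shard_counts` helper and the sample keys are hypothetical and only show how the keys spread over a few equal ranges of the hash space.

```python
from collections import Counter

MASK32 = 0xFFFFFFFF


def jenkins_one_at_a_time(data: bytes) -> int:
    """32-bit Jenkins one-at-a-time hash of a byte string."""
    h = 0
    for byte in data:
        h = (h + byte) & MASK32
        h = (h + (h << 10)) & MASK32
        h ^= h >> 6
    h = (h + (h << 3)) & MASK32
    h ^= h >> 11
    h = (h + (h << 15)) & MASK32
    return h


def shard_counts(keys, buckets):
    """Count how many keys fall into each of `buckets` equal hash ranges."""
    width = (MASK32 + 1) // buckets
    return Counter(
        min(jenkins_one_at_a_time(k.encode()) // width, buckets - 1)
        for k in keys
    )


counts = shard_counts([f"item-{i}" for i in range(10_000)], 4)
print(sorted(counts.items()))
```

A hash with good uniformity should give roughly similar counts per range; running the same check with another hash function is a quick way to compare candidates.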

CANONICAL PROBLEMS IN DISTRIBUTED SYSTEMS

Distributed data independence

Distributed transactions: ACID (Atomicity, Consistency, Isolation, Durability) requirements

Fault tolerance

Transparency

3. DISTRIBUTED STORAGE SYSTEM PARADIGM

Data Hashing/Addressing

Determine which server each piece of data is stored on

Data Replication

Store data on multiple server nodes for higher availability and fault tolerance

DISTRIBUTED STORAGE SYSTEM ARCHITECTURE

Data Hashing/Addressing

Use the DHT to map each server (by server name) to a number, placed on a circle called the key space

Use the DHT to address each data key and find the server that stores it: successor(k) = ceiling(addressing(k))

successor(k): the server that stores k

[Figure: key-space circle starting at 0, with server1, server2, and server3 placed on it]
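
The addressing scheme above can be sketched as a sorted list of server positions searched with a binary search. This is a minimal sketch under assumptions: `addressing` uses MD5 from the standard library only as a stand-in for the DHT's hash function, the key space is taken to be 0 to 2^32 - 1, and the `Ring` class is a name invented for the example.

```python
import bisect
import hashlib

KEY_SPACE = 2 ** 32


def addressing(name: str) -> int:
    """Map a server name or data key to a point on the key-space circle."""
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % KEY_SPACE


class Ring:
    def __init__(self, servers):
        # Server positions, sorted clockwise around the circle.
        self.points = sorted((addressing(s), s) for s in servers)
        self.positions = [p for p, _ in self.points]

    def successor(self, key: str) -> str:
        """Return the first server at or after the key's position (ceiling)."""
        i = bisect.bisect_left(self.positions, addressing(key))
        return self.points[i % len(self.points)][1]  # wrap past the last server


ring = Ring(["server1", "server2", "server3"])
for key in ("k1", "k2", "k3"):
    print(key, "->", ring.successor(key))
```

Because a key always goes to the next server clockwise, removing one server only moves the keys that server held; keys on the other servers stay where they are.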

DISTRIBUTED STORAGE SYSTEM ARCHITECTURE

Addressing – Virtual node

Each server node is assigned several node IDs so that data is distributed evenly and load is balanced

Server1: n1, n4, n6

Server2: n2, n7

Server3: n3, n5

[Figure: key-space circle starting at 0, with the virtual nodes n1 to n7 of server1, server2, and server3 spread around it]
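
Virtual nodes can be sketched by placing each server on the circle several times. The code below is an illustration under assumptions: the `VirtualRing` class, the `"server#i"` naming of virtual node IDs, and MD5 as a stand-in hash are not taken from the slides.

```python
import bisect
import hashlib
from collections import Counter


def addressing(name: str) -> int:
    """Map a node ID or data key to a point on a 32-bit key space."""
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest[:4], "big")


class VirtualRing:
    def __init__(self, vnodes_per_server):
        # vnodes_per_server: dict of server name -> number of virtual nodes.
        self.points = sorted(
            (addressing(f"{server}#{i}"), server)
            for server, count in vnodes_per_server.items()
            for i in range(count)
        )
        self.positions = [p for p, _ in self.points]

    def successor(self, key: str) -> str:
        """Return the physical server owning the next virtual node clockwise."""
        i = bisect.bisect_left(self.positions, addressing(key))
        return self.points[i % len(self.points)][1]


keys = [f"item-{i}" for i in range(10_000)]
few = VirtualRing({"server1": 3, "server2": 2, "server3": 2})
many = VirtualRing({"server1": 100, "server2": 100, "server3": 100})
print("few virtual nodes: ", Counter(few.successor(k) for k in keys))
print("many virtual nodes:", Counter(many.successor(k) for k in keys))
```

The first ring mirrors the 3/2/2 split listed above. Giving each server more virtual nodes generally brings the per-server counts closer together, and giving a larger machine more virtual nodes than a smaller one is one way to weight the load.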

DISTRIBUTED STORAGE SYSTEM ARCHITECTURE

Data Replication

Data k1 is stored on server1 as the master copy and on server2 as the slave copy

[Figure: key-space circle starting at 0, with server1, server2, server3, and key k1]
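
One common way to choose the master and slave for a key is to take the key's successor as master and the next distinct server clockwise as slave. The slides do not say how the slave is chosen, so this placement rule, the `replicas` function, and the MD5 stand-in hash are assumptions.

```python
import bisect
import hashlib


def addressing(name: str) -> int:
    """Map a server name or data key to a point on a 32-bit key space."""
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest[:4], "big")


def replicas(key, servers, copies=2):
    """Return [master, slave, ...]: distinct servers clockwise from the key."""
    points = sorted((addressing(s), s) for s in servers)
    positions = [p for p, _ in points]
    start = bisect.bisect_left(positions, addressing(key))
    chosen = []
    for step in range(len(points)):
        server = points[(start + step) % len(points)][1]
        if server not in chosen:
            chosen.append(server)
        if len(chosen) == copies:
            break
    return chosen


stores = {name: {} for name in ("server1", "server2", "server3")}
master, slave = replicas("k1", list(stores))
stores[master]["k1"] = "value"   # write to the master copy
stores[slave]["k1"] = "value"    # replicate to the slave copy
print("master:", master, "slave:", slave)
```

If the master becomes unreachable, a reader can fall back to the slave, which is what gives the availability and fault tolerance mentioned earlier.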

UNIVERSALDISTRIBUTEDSTORAGE

a distributed storage system

4. UNIVERSALDISTRIBUTEDSTORAGE

UniversalDistributedStorage is a distributed storage system developed for:

Distributed data independence

Distributed transactions (ACID)

Fault tolerance

Leader election (decides when a server node joins or leaves)

Multi-master replication

Transparency

UNIVERSALDISTRIBUTEDSTORAGE ARCHITECTURE

Overview

[Figure: architecture overview showing three stacks, each made of a Business Layer, a Distributed Layer, and a Storage Layer]

UNIVERSALDISTRIBUTEDSTORAGE FEATURE

Data hashing/addressing

Uses the Murmur hash function

UNIVERSALDISTRIBUTEDSTORAGE FEATURE

Leader election

Uses the Bully leader election algorithm
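
The Bully algorithm elects the live node with the highest ID: a node that notices the leader has failed challenges every higher-numbered node, and any node that responds takes over the election. A simplified, single-process sketch follows. Integer node IDs and the `bully_election` function are assumptions, and real message passing and timeouts are left out.

```python
def bully_election(starter, alive):
    """Simulate a Bully election started by `starter` among the `alive` node IDs."""
    higher = sorted(n for n in alive if n > starter)
    if not higher:
        # No live node outranks the starter, so it announces itself as leader.
        return starter
    # A live higher node answers the challenge and runs its own election.
    return bully_election(higher[0], alive)


alive = {1, 2, 3, 5}                # node 7, the previous leader, has failed
print(bully_election(2, alive))     # prints 5
```

Whichever node starts the election, the outcome is the same: the highest-numbered live node becomes the leader.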

UNIVERSALDISTRIBUTEDSTORAGE FEATURE

Multi-master replication

Problem of multi-master replication

UNIVERSALDISTRIBUTEDSTORAGE FEATURE

Multi-master replication

Data is first stored on the main master (called the sub-leader), then posted to a queue to be synced to the other masters.
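
The write path described above can be sketched with an in-memory queue. This is an illustration only: the `Master` and `ReplicaGroup` classes are hypothetical names, and a real system would drain the queue asynchronously over the network rather than in a local loop.

```python
from collections import deque


class Master:
    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicaGroup:
    def __init__(self, sub_leader, others):
        self.sub_leader = sub_leader   # main master that accepts the write first
        self.others = others           # remaining masters, updated via the queue
        self.queue = deque()           # pending (key, value) updates to sync

    def write(self, key, value):
        """Store on the sub-leader, then queue the update for the other masters."""
        self.sub_leader.data[key] = value
        self.queue.append((key, value))

    def sync(self):
        """Drain the queue in order, applying each update to every other master."""
        applied = 0
        while self.queue:
            key, value = self.queue.popleft()
            for master in self.others:
                master.data[key] = value
            applied += 1
        return applied


m1, m2, m3 = Master("m1"), Master("m2"), Master("m3")
group = ReplicaGroup(sub_leader=m1, others=[m2, m3])
group.write("k1", "v1")
group.write("k1", "v2")
print(m2.data.get("k1"))                     # not synced yet
group.sync()
print(m2.data.get("k1"), m3.data.get("k1"))  # both masters now hold the latest value
```

Funnelling each write through one sub-leader and replaying the queue in order gives every master the same sequence of updates, at the cost of a short window in which the other masters lag behind.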

UNIVERSALDISTRIBUTEDSTORAGE STATISTICS

System information:

3 machines with 8 GB RAM and Core i5 3.220 GHz CPUs

LAN/WAN network

7 physical servers on the 3 machines above

Concurrent write: 16,500,000 items in 3,680 s, rate ≈ 4,480 req/s (measured at the client)

Concurrent read: 16,500,000 items in 1,458 s, rate ≈ 11,320 req/s (measured at the client)

* These figures are not the limit of the system; the limit is at the clients (this test used 3 client threads)
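
The reported rates follow from dividing the item count by the elapsed time. A quick check using only the figures above:

```python
items = 16_500_000
write_seconds = 3680
read_seconds = 1458

write_rate = items / write_seconds   # requests per second for writes
read_rate = items / read_seconds     # requests per second for reads
print(f"write: {write_rate:.0f} req/s, read: {read_rate:.0f} req/s")
```

The results, about 4,484 and 11,317 req/s, agree with the rounded figures on the slide.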

Q & A

Contact:

Duong Cong Loi

loiduongcong@gmail.com

https://www.facebook.com/duongcong.loi
