Copyright © 2014 NTT DATA Corporation

Yuji Hagiwara

[email protected]

Platform Engineer, NTT DATA Corp.

Introduction to OpenStack Swift CloudOpen Japan 2014

Agenda

1. What is Swift?

2. Swift’s Latest Information

3. Swift’s Future

Who am I

Yuji Hagiwara – Platform Engineer, NTT DATA Corp.

• Since 2011: Using OpenStack

• Since 2013: Developing search on Swift

[Image: Demo app for searching on Swift]

Background

Data explosion in the enterprise: the amount of unstructured data has been growing exponentially, toward PB or EB scale.

[Chart: amount of unstructured data, 2004 to 2016]

Examples of unstructured data

• Media (images, video, audio)

• Web content

• Documents

• Backups/archives

Where should we store this data?

• We need storage with scalability, durability, and availability.

One of the solutions is Swift.

What is Swift?

Swift is...

• A storage system with scalability, durability, and availability.

• A RESTful distributed object store, like Amazon S3.

• One of the OpenStack core components.

• Implemented in Python.

• Open source software.
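As a rough illustration of what "RESTful" means here, the sketch below only builds (and does not send) HTTP requests for storing and fetching an object. It assumes the usual Swift URL layout of `/v1/<account>/<container>/<object>` and token authentication via the `X-Auth-Token` header; the endpoint, account, container, object name, and token are hypothetical placeholders.

```python
from urllib.parse import quote
from urllib.request import Request


def swift_object_url(endpoint, account, container, obj):
    """Build <endpoint>/v1/<account>/<container>/<object>."""
    path = "/".join(quote(part) for part in (account, container, obj))
    return endpoint.rstrip("/") + "/v1/" + path


def build_put(url, token, body):
    # PUT stores (or overwrites) the object at this URL.
    return Request(url, data=body, method="PUT",
                   headers={"X-Auth-Token": token})


def build_get(url, token):
    # GET retrieves the object stored at this URL.
    return Request(url, method="GET", headers={"X-Auth-Token": token})


# Hypothetical values, for illustration only.
url = swift_object_url("http://swift.example.com:8080",
                       "AUTH_demo", "photos", "cat.jpg")
put_req = build_put(url, "hypothetical-token", b"image bytes")
get_req = build_get(url, "hypothetical-token")
print(put_req.get_method(), put_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a real cluster; that step is left out so the sketch runs without one.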

① Block Storage (Cinder)

② Object Storage (Swift)

Use cases of Swift

Swift as storage for a variety of applications

Applications reach Swift through its REST API:

• System backup

• CMS

• Cyberduck (FTP-like use)

• Digital distribution

• Web apps

• …

OpenStack Swift deployments and use cases

Name of enterprise | Product/service | Description
Rackspace (USA) | Cloud Files | Cloud file sharing service run by Rackspace itself. It uses the same code as the OSS release except for features such as authentication, accounting, and CDN (<500 PB).
Korea Telecom (South Korea) | ucloud storage service | Object storage service using OpenStack Swift (16 PB+).
Sina (China) | Sina App Engine (SAE) | Public storage service. Moved to OpenStack from another technology (MongoDB) in 2012.
San Diego Supercomputer Center (USA) | SDSC Cloud Storage Services | Cloud storage service at SDSC. Users can select Amazon S3 or Rackspace Swift.
SME Storage (USA) | SMEStorage Open Cloud Platform | Cloud storage service based on Rackspace Cloud Files.
SoftLayer (USA) | SoftLayer Object Storage | Public object storage service. Acquired by IBM.
SwiftStack (USA) | SwiftStack | Provides professional services and an operations and management product.
HP (USA) | HP Cloud | Private cloud storage service that uses OpenStack.
Wikimedia (USA) | Wikimedia storage | Media file store for Wikipedia.
NII (Japan) | Academic Cloud service | Academic cloud service by the National Institute of Informatics in Japan (NII), integrated and supported by NTT DATA.

Inside Swift

Architecture: Nodes

Swift consists of two types of nodes: proxy nodes and storage nodes.

• Proxy node: forwards data to the storage nodes.

• Storage node: stores the data.

[Diagram: the application sends requests through an HTTP load balancer to the proxy nodes, which forward them to the storage nodes.]

Architecture: The Ring

The Ring (a static table for allocating data to storage nodes) decides the optimal storage nodes for each piece of data by its name.

[Diagram: the application reaches the proxy nodes through an HTTP load balancer. Every proxy node and every storage node holds a copy of the Ring.]
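A minimal sketch of a name-based lookup in a static table, to make the idea concrete. It hashes the name with MD5, takes the top bits as a partition number, and reads the replica nodes from a prebuilt table. The partition count, node names, and the simple round-robin table are assumptions for illustration; a real ring is built by Swift's own tooling and also accounts for devices, zones, and weights.

```python
import hashlib
import struct

PART_POWER = 4          # 2**4 = 16 partitions (illustrative; real rings use far more)
REPLICAS = 3
NODES = ["Storage Node 1", "Storage Node 2", "Storage Node 3", "Storage Node 4"]

# Static table built ahead of time: for each replica, partition -> node index.
# Offsetting by the replica number keeps the replicas of a partition on
# distinct nodes in this toy layout.
REPLICA2PART2NODE = [
    [(part + replica) % len(NODES) for part in range(2 ** PART_POWER)]
    for replica in range(REPLICAS)
]


def get_partition(name):
    """Map a name to a partition using the top PART_POWER bits of its MD5."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return struct.unpack(">I", digest[:4])[0] >> (32 - PART_POWER)


def get_nodes(name):
    """Return the storage nodes that hold the replicas of this name."""
    part = get_partition(name)
    return [NODES[row[part]] for row in REPLICA2PART2NODE]


print(get_nodes("/AUTH_demo/photos/A"))
```

Because the table is static and the hash is deterministic, every proxy node and storage node holding the same table resolves a name to the same nodes without consulting a central service.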

Architecture: The role of the Ring

If you request to store data “A”, the three replica nodes store data “A”.

[Diagram: according to the Ring, “A” must be located at storage nodes “1”, “2”, and “4”, so the proxy node forwards the data to those three nodes. Data “B” is likewise stored on three nodes.]

Architecture: The role of the Ring

If you request to get data “A”, one of the replica nodes returns data “A”.

[Diagram: according to the Ring, “A” must be located at storage nodes “1”, “2”, and “4”. The proxy node reads the data from one of them and returns it to the application.]
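The store and get paths can be sketched with a small in-memory model. This is a simplification under stated assumptions: the placement of “A” on nodes 1, 2, and 4 follows the slides, the placement of “B” is made up, and the class and function names are hypothetical. Details such as how many writes must succeed before a store is acknowledged are left out.

```python
# Replica placement as the Ring would report it.
PLACEMENT = {"A": [1, 2, 4], "B": [2, 3, 4]}  # "B" placement is an assumption


class StorageNode:
    def __init__(self):
        self.objects = {}   # object name -> bytes
        self.up = True      # False simulates an unreachable node


nodes = {n: StorageNode() for n in (1, 2, 3, 4)}


def put(name, data):
    """Proxy-side store: write the object to every replica node."""
    for n in PLACEMENT[name]:
        nodes[n].objects[name] = data


def get(name):
    """Proxy-side read: return the first copy found on a reachable replica."""
    for n in PLACEMENT[name]:
        node = nodes[n]
        if node.up and name in node.objects:
            return node.objects[name]
    raise KeyError(name)


put("A", b"payload")
nodes[1].up = False        # one replica node becomes unreachable
print(get("A"))            # still served by another replica
```

Reading from any one replica is what lets a get succeed while one of the replica nodes is down.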

Scalability

(1) Expand the proxy servers for more throughput.

(2) Expand the storage servers or disks for more volume.

[Diagram: a cluster of proxy and storage servers, shown once with an added proxy server and once with an added storage server.]

Many processes working together

Swift runs the following processes:

• proxy-server

• account-server, account-replicator, account-auditor, account-reaper

• container-server, container-replicator, container-auditor, container-updater, container-sync

• object-server, object-replicator, object-auditor, object-updater, object-expirer

Replicator

Normal

(1) Each node checks the data on the other nodes.

Failure

(2) A disk fails.

(3) The replicator detects the disk trouble.

(4) The data is copied to another node.

Recovery

(5) The disk is recovered.

(6) The data is recovered to the original node.

(7) The temporary data is deleted.

[Diagram: nodes 1 to 4, each with a replicator and a disk, shown in the normal, failure, and recovery phases.]
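The seven steps above can be walked through with a toy simulation. It is only a sketch: the dictionaries stand in for disks, node 3 is assumed to be the spare node that receives the temporary copy, and the single `replicate_once` pass is a hypothetical stand-in for the replicator processes that each node runs continuously.

```python
PRIMARY = {"A": [1, 2, 4]}   # replica nodes for "A", as in the slides
HANDOFF = {"A": 3}           # assumed spare node for the temporary copy

disks = {1: {"A": b"data"}, 2: {"A": b"data"}, 3: {}, 4: {"A": b"data"}}
failed = set()               # nodes whose disk is currently broken


def replicate_once():
    """One replication pass over every object (steps 1, 3, 4, 6, 7)."""
    for name, primaries in PRIMARY.items():
        healthy = [n for n in primaries
                   if n not in failed and name in disks[n]]
        if not healthy:
            continue                      # no surviving copy to replicate from
        source = disks[healthy[0]][name]
        # (6) restore the copy on primaries that are back but empty
        for n in primaries:
            if n not in failed and name not in disks[n]:
                disks[n][name] = source
        if any(n in failed for n in primaries):
            disks[HANDOFF[name]][name] = source   # (3)+(4) temporary copy
        else:
            disks[HANDOFF[name]].pop(name, None)  # (7) delete temporary copy


failed.add(4)
disks[4] = {}                # (2) the disk fails and its data is lost
replicate_once()
temporary_copy_exists = "A" in disks[3]

failed.discard(4)            # (5) the disk is replaced with a fresh one
replicate_once()
print(sorted(n for n, d in disks.items() if "A" in d))
```

After the second pass the object is back on nodes 1, 2, and 4 and the temporary copy on node 3 is gone, which matches the recovery phase of the slides.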

Normal state

Each piece of data has been replicated.

[Diagram: the application reaches the proxy nodes through an HTTP load balancer. Every proxy node and storage node holds a copy of the Ring. Each storage node runs a server, an auditor, and a replicator. Data “A” and data “B” are each stored on three storage nodes.]

Failure state

If a disk breaks...

[Diagram: the same cluster of proxy nodes and storage nodes, with the disk holding one copy of data “B” marked as broken.]

Failure state

The replicator detects the lost data and temporarily replicates it to another node.

[Diagram: the same cluster with the broken disk. When a replicator detects the lost copy of data “B”, it replicates the data to another storage node as temporary data.]

Recovery state

When the broken disk is replaced with a fresh disk...

[Diagram: the same cluster with the disk replaced. The temporary copy of data “B” is still on the other storage node.]

Recovery state

The replicator replicates the data to the correct node and removes the temporary data.

[Diagram: the same cluster. Data “B” is replicated to the storage node with the fresh disk, and the temporary copy is marked as removed.]

Latest Information

History and Trend of Community

History of OpenStack

• 2010.6 OpenStack project starts

• 2010.10 First release, “Austin”

• 2013.4 “Grizzly”

• 2013.10 “Havana”

• 2014.4 “Icehouse”

• 2014.10 “Juno”

Development trend in Swift (timeline of each function, marked as developing or supported)

• Fundamental

• Global Cluster

• Erasure Coding

• Storage Policy

Hot topics now: Erasure Coding and Storage Policy.

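Latest info: Erasure Coding

Method | Replication | Erasure Coding
Size | 3x original | 2x original

[Diagram: with replication, three full copies of the data are distributed. With erasure coding, the data is split into partial data, parity 1 and parity 2 are computed, and the fragments are distributed.]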
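The size figures in the comparison follow from simple arithmetic, sketched below. Replication is modelled as one data fragment plus two extra full copies, and the erasure-coded case as two data fragments plus two parity fragments, which is an assumption read off the diagram. The XOR part is a deliberately simpler single-parity scheme added only to show how a lost fragment can be rebuilt; it is not the coding scheme Swift uses, and the two-parity layout on the slide needs a stronger code.

```python
def storage_overhead(data_fragments, extra_fragments):
    """Stored size as a multiple of the original object size."""
    return (data_fragments + extra_fragments) / data_fragments


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


replication_size = storage_overhead(1, 2)   # 1 original + 2 extra copies
erasure_size = storage_overhead(2, 2)       # 2 data + 2 parity fragments

# Single-parity illustration: split, compute parity, lose one half, rebuild it.
data = b"swiftobj"
half = len(data) // 2
d1, d2 = data[:half], data[half:]
parity = xor_bytes(d1, d2)
rebuilt_d1 = xor_bytes(parity, d2)          # d1 recovered without reading d1

print(replication_size, erasure_size, rebuilt_d1 + d2 == data)
```

The trade-off is that a read or repair under erasure coding has to gather several fragments and compute, whereas a replica can be served as-is.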

Latest info: Storage Policy

Before: the same policy across the cluster. Every piece of data is stored as three replicas.

After: a variety of policies on one cluster. Some data is stored as three replicas, some as two replicas, and some as erasure-coded partial data with parity 1 and parity 2.

More flexibility, but more complexity.
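From the client side, a policy is typically chosen per container. The sketch below builds (without sending) two container-creation requests that differ only in the `X-Storage-Policy` header, which is the header Swift releases with storage policy support read when a container is created. The endpoint, account, container names, token, and policy names are hypothetical; actual policy names are whatever the cluster operator has configured.

```python
from urllib.request import Request


def build_container_put(endpoint, account, container, token, policy):
    """Build a PUT request that creates a container under a given policy."""
    url = f"{endpoint}/v1/{account}/{container}"
    headers = {"X-Auth-Token": token, "X-Storage-Policy": policy}
    return Request(url, method="PUT", headers=headers)


# Hypothetical endpoint, token, and policy names.
ENDPOINT = "http://swift.example.com:8080"
replicated = build_container_put(ENDPOINT, "AUTH_demo", "photos",
                                 "hypothetical-token", "three-replicas")
erasure = build_container_put(ENDPOINT, "AUTH_demo", "archives",
                              "hypothetical-token", "erasure-coded")

for req in (replicated, erasure):
    print(req.get_method(), req.full_url, dict(req.header_items()))
```

Objects stored in each container would then follow that container's policy, which is how one cluster can mix replication and erasure coding.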

Future Direction of Swift

2 concepts:

Integrated Searchable Storage

Intelligent Resource Management

Integrated Searchable Storage

Swift should be integrated with search.

This means search needs to be as scalable, durable, and available as Swift itself.

[Diagram: users, operators, and managers store, get, and search data in Swift, the storage system.]

Use cases of Search

1. Content search

2. Detection for de-duplication: compare the hash of incoming data with the hashes of stored data. Is it already stored?

3. Tiered storage: keep hot content (modified more often) where it is and move cold content (modified less often) to cheap storage. Example query: is the modified date older than 05/20/2014?
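Use cases 2 and 3 both reduce to a lookup over metadata, as the sketch below suggests. The `HashIndex` class, the `cold_objects` helper, and the sample object names are hypothetical; MD5 is used only because it is a familiar content hash, and the cutoff date is the one from the slide.

```python
import hashlib
from datetime import date


def content_hash(data):
    """Hash of the object body, used as the de-duplication key."""
    return hashlib.md5(data).hexdigest()


class HashIndex:
    """Answers "already stored?" by searching on the content hash."""

    def __init__(self):
        self._by_hash = {}   # content hash -> name of the first object seen

    def store(self, name, data):
        """Return the name of an identical stored object, or None if new."""
        key = content_hash(data)
        existing = self._by_hash.get(key)
        if existing is None:
            self._by_hash[key] = name
        return existing


def cold_objects(modified_dates, cutoff):
    """Names of objects last modified before the cutoff date."""
    return sorted(name for name, modified in modified_dates.items()
                  if modified < cutoff)


index = HashIndex()
first = index.store("a.jpg", b"same bytes")
second = index.store("b.jpg", b"same bytes")

cold = cold_objects(
    {"old.log": date(2014, 1, 5), "new.log": date(2014, 5, 21)},
    cutoff=date(2014, 5, 20),
)
print(first, second, cold)
```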

Future: Integrated Searchable Storage

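How do we implement it?

• Internal: the search feature and its index live inside Swift, next to the storage feature. Store and search requests both go to Swift.

• External: Swift keeps only the storage feature and hooks into a pre-existing search engine, which holds the index. Store requests go to Swift and search requests go to the search engine.

Item | Internal | External
Where search runs | Swift with a search library (such as Lucene) | Search engine (such as Solr)
Redundancy | High | Depends on the search engine
Availability | High | Depends on the search engine
Scalability | High | Depends on the search engine
Difficulty of implementation | Hard | Easy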

Our Implementation

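Internal Approach

• Hack Swift to embed the search library.

[Diagram: the application sends data and metadata through Swift’s ordinary API and sends queries through a new search API, both on the proxy node. Behind the proxy node are the account-server, container-server, and object-server. The container-server handles indexing and searching: next to each container’s SQLite DB (Container A DB, Container B DB, …) it keeps a Lucene index (Container A index, Container B index, …). The DBs and indexes are distributed by the Ring.]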
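The idea of one index per container can be sketched without any search library. The code below is not the implementation described on the slide: a tiny in-memory inverted index stands in for Lucene, and `on_object_put` and `search` are hypothetical hooks for the indexing and search paths.

```python
from collections import defaultdict


class ContainerIndex:
    """Toy inverted index over object metadata for a single container."""

    def __init__(self):
        self._postings = defaultdict(set)   # term -> object names

    def index(self, obj_name, metadata):
        for value in metadata.values():
            for term in value.lower().split():
                self._postings[term].add(obj_name)

    def search(self, query):
        """Return the objects whose metadata contains every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(
            *(self._postings.get(term, set()) for term in terms)
        )
        return sorted(hits)


indexes = defaultdict(ContainerIndex)       # container name -> its own index


def on_object_put(container, obj_name, metadata):
    """Indexing path: called when an object and its metadata are stored."""
    indexes[container].index(obj_name, metadata)


def search(container, query):
    """Search path: query the index that belongs to one container."""
    return indexes[container].search(query)


on_object_put("ContainerA", "cat.jpg", {"title": "Sleeping cat", "place": "Tokyo"})
on_object_put("ContainerA", "dog.jpg", {"title": "Running dog", "place": "Tokyo"})
print(search("ContainerA", "tokyo"), search("ContainerA", "tokyo cat"))
```

Keeping the index per container means it can be placed and replicated alongside the container DB, which is what lets it be distributed by the Ring.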

Future: Intelligent Resource Management

Swift has more and more different functions.

Existing processes:

• proxy-server

• account-server, account-replicator, account-auditor, account-reaper

• container-server, container-replicator, container-auditor, container-updater, container-sync

• object-server, object-replicator, object-auditor, object-updater, object-expirer

Erasure Coding:

• ec-auditor, ec-reconstructor, ec-stripe-auditor

Storage Policy:

• Multi-ring support

Other arbitrary processes:

• Search...?

• Compression...?

• Encryption...?

Future: Intelligent Resource Management

Resources are drained: IOPS, CPU, network, memory.

The performance priorities of these functions differ by requirement.

Ex1) Store performance vs. search performance

Ex2) Service level during business hours vs. outside business hours

More intelligent resource management, with cgroups, is necessary.

[Diagram: a 24-hour clock (0, 6, 12, 18). During business hours, high priority goes to processing requests. Outside business hours, high priority goes to checking durability.]
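One way such time-based priorities could be expressed with cgroups is by adjusting CPU shares per group of processes. The sketch below is an assumption-laden illustration, not an existing Swift feature: the 09:00 to 18:00 business hours, the group names, and the share values are made up, and it writes `cpu.shares` files (the cgroup v1 CPU controller knob) into a temporary directory rather than a real cgroup hierarchy.

```python
import os
import tempfile

BUSINESS_HOURS = range(9, 18)   # assumption: the slide gives no exact hours


def cpu_shares_for(hour):
    """Pick relative CPU weights for the two groups of Swift processes."""
    if hour in BUSINESS_HOURS:
        # Favor request handling (e.g. proxy-server, object-server).
        return {"swift-requests": 2048, "swift-durability": 256}
    # Favor durability checks (e.g. auditors, replicators).
    return {"swift-requests": 512, "swift-durability": 2048}


def apply_shares(cgroup_root, shares):
    """Write each group's weight to <cgroup_root>/<group>/cpu.shares."""
    for group, value in shares.items():
        group_dir = os.path.join(cgroup_root, group)
        os.makedirs(group_dir, exist_ok=True)
        with open(os.path.join(group_dir, "cpu.shares"), "w") as handle:
            handle.write(str(value))


# Dry run against a temporary directory instead of a mounted cgroup tree.
with tempfile.TemporaryDirectory() as root:
    apply_shares(root, cpu_shares_for(hour=10))
    with open(os.path.join(root, "swift-requests", "cpu.shares")) as handle:
        daytime_request_shares = int(handle.read())

print(daytime_request_shares)
```

On a real host the root would be the mounted CPU controller directory, writing there needs privileges, and the processes would also have to be assigned to the groups. Coordinating this across many hosts is one of the open questions raised at the end of the talk.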

Summary

1. What is Swift? Swift is great OSS for storing unstructured data.

2. Swift’s Latest Information: Erasure Coding, Storage Policy

3. Swift’s Future: Integrated Searchable Storage, Intelligent Resource Management

PR: Demonstration is Now Available!

We are exhibiting a demo application (a content delivery system) built with Swift.

• On-demand delivery of a large amount of content (pictures or movies) stored in Swift.

• Search on Swift is implemented (our original implementation).

(map for demo booth)

Thank you for your attention!

Please contact [email protected] if you have any questions or comments.

Q&A: Do you have any questions?

Challenges and Questions

How to integrate Swift with cgroups?

• How to use cgroups?

• What is the best toolset for cgroups: VFS, libcgroup, or systemd?

• How to control multiple hosts with cgroups dynamically?

How to integrate Swift with search?

• What is the best way to implement it?

• What is the best search middleware?

• How to search multilingual content?