Software Defined Storage based on OpenStack
Open Frontier Lab. Manseok (Mario) Cho
[email protected]


Page 1

Software Defined Storage based on OpenStack

Open Frontier Lab. Manseok (Mario) Cho

[email protected]

Page 2

Who am I?

Development Experience
◆ Bio-Medical Data Processing based on HPC for Human Brain Mapping
◆ Medical Image Reconstruction (Computed Tomography)
◆ Enterprise System Architect
◆ Open Source Software Developer

Open Source Development
◆ Linux Kernel (ARM, x86, ppc)
◆ LLVM (x86, ARM, custom)
◆ OpenStack: Orchestration (Heat)
◆ SDN (OpenDaylight, OVS, DPDK)
◆ OPNFV (DPACC, Genesis, Functest, Doctor)

Technical Book
◆ Unix V6 Kernel

Open Frontier Lab. Manseok (Mario) Cho

[email protected]

Page 3

Open Source S/W developer community

http://kernelstudy.net
- Linux Kernel (ARM, x86)
- LLVM Compiler
- SDN/NFV

Page 4

What is it?

Page 5

* http://www.containerstore.com/s/kitchen/1
** http://cool.conservation-us.org/coolaic/sg/bpg/annual/v11/bp11-38.html

Storage (storing, keeping, a warehouse)

Page 6

Just one more than the rest combined

*http://www.funnyjunk.com/Computer+storage+throughout+time+part+2/funny-pictures/5465540/

Page 7

The Technical Challenge

[Figure: growth of users and business value by era, from back-office automation toward real-time optimization]

Era         | Technology          | Users                 | Scale | Business Value
1960s-1970s | Mainframe (OS/360)  | Few employees         | 10^2  | Back-office automation
1980s       | Client-Server       | Many employees        | 10^4  | Front-office productivity
1990s       | Web                 | Customers/consumers   | 10^6  | E-commerce, line-of-business
2007        | Cloud               | Business ecosystems   | 10^7  | Self-service
2011        | Social              | Communities & society | 10^9  | Social engagement
2016        | Internet of Things  | Devices & machines    | 10^11 | Real-time optimization

* http://www.slideshare.net/SanjeevKumar17/tech-mahindra-i5sanjeevdec2013/

Page 8

The Technical Challenge

[Figure repeated from Page 7, adding a process view]

Processes: stand-alone projects (corporate-IT driven) → data infrastructure (LOB driven) → data ecosystem (people + products & things)

Data integration is becoming the barrier to business success.

* http://www.slideshare.net/SanjeevKumar17/tech-mahindra-i5sanjeevdec/

Page 9

* http://www.toonpool.com/artists/toons_589

Page 10

File System

* http://computerrepair-vancouver.org/deleted-file-recovery-dos-and-donts/
** http://www.ibm.com/developerworks/tivoli/library/t-tamessomid/
*** http://www.informatics.buzdo.com/p778-debian-root-boot-bin-lib-dev.htm

Page 11

Operating System focus on Storage

[Stack diagram]
- User Space: Application | Application | Application
- System Call interface
- Operating System (Kernel Space): scheduler / process manager, memory manager, file system, logical block layer, I/O interface & device drivers
- Hardware Space: Processor, Memory, SSD/HDD, Network

Page 12

Redundant Arrays of Independent Disks

[Stack diagram: the same layering, with a hardware RAID layer beneath the kernel]
- User Space: Application | Application | Application
- Operating System (Kernel Space): resource manager (VFS), file system, logical block layer
- Hardware Space: Hardware Block Layer (RAID Controller) managing many SSD/HDD devices
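The redundancy a RAID controller provides can be illustrated with the XOR parity used by levels such as RAID 5. A minimal sketch, not tied to any real controller: if one data block is lost, it is rebuilt from the surviving blocks and the parity block.

```python
from functools import reduce

def raid5_parity(blocks):
    """XOR all equal-length data blocks byte-by-byte into a parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover_block(surviving, parity):
    """Rebuild a lost block: XOR of the survivors and the parity block."""
    return raid5_parity(surviving + [parity])

data = [b"disk0data", b"disk1data", b"disk2data"]
parity = raid5_parity(data)
rebuilt = recover_block([data[0], data[2]], parity)  # "disk 1" failed
```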

Page 13

RAID: The First Software Defined Storage, in 1988

* Source: Anil Vasudeva, "A Case for Disk Arrays", presented at conference, Santa Clara, CA, Aug 1988

Page 14

OpenStack

OpenStack is a collection of software for setting up a massive IaaS (Infrastructure as a Service) environment.

OpenStack consists of six main components.

OpenStack supports Block Storage (Cinder) & Object Storage (Swift).

* http://www.openstack.org/software/

Page 15

Storage System on OpenStack

[Diagram: applications run in VMs under the Virtual Machine Manager (Nova); storage is provided by the Block Storage Manager (Cinder), the Object Storage Manager (Swift), and the Shared File System (Manila), each spreading data across many storage nodes]

Page 16

Comparison of OpenStack Storage

OpenStack Component | Swift                   | Cinder                 | Manila
Storage Type        | Object                  | Block                  | File
Primary Interface   | REST API                | iSCSI                  | NFS, CIFS/SMB
Use Cases           | Large datasets; movies, images, sounds; storage of VM files; archiving | High performance; DBs; VM guest storage; snapshots; VM clones | VM live migration; storage of VM files; use with legacy applications
Benefit             | Scalability, durability | Manageability          | Compatibility

* http://www.openstack.org/openstack-manuals/openstack-ops/content/storage_decision.html

Page 17

Cinder: Block Storage Layer

Cinder provides persistent block storage resources to the virtual machines running on Nova compute.

Cinder uses plug-ins to support multiple types of backend storage.

[Diagram: Cinder serving Nova compute nodes #1-#3, each hosting VMs]
Operations: create a volume, delete a volume, snapshot, attach a volume, detach a volume.

Page 18

Cinder: Volume Manage APIs

API no. | Work                      | Function
(1)     | Volume Operation          | Create Volume
(2)     |                           | Create Volume from Volume
(3)     |                           | Extend Volume
(4)     |                           | Delete Volume
(5)     | Connection Operation      | Attach Volume
(6)     |                           | Detach Volume
(7)     | Volume Snapshot Operation | Create Snapshot
(8)     |                           | Create Volume from Snapshot
(9)     |                           | Delete Snapshot
(10)    | Volume Image Operation    | Create Volume from Image
(11)    |                           | Create Image from Volume

[Diagram: Nova VMs, a Glance image, and Cinder volumes connected by the numbered operations: 1) create volume, 2) create volume from volume, 3) extend volume, 5) attach volume, 7) create snapshot, 8) create volume from snapshot, 10) create volume from image, 11) create image from volume]

Page 19

Cinder: Requirements of the backend

Life cycle of a VM: Create VM → Launch VM → Running VM → Stop VM → Delete VM

Cinder work: Create / Attach → Extend / Snapshot → Detach → Delete

Technical requirements per phase:
- Create / Attach: 1. when needed, quickly prepare block space; 2. copy and reuse existing blocks
- Extend / Snapshot: 1. flexibly add capacity; 2. automatically extend blocks
- Detach / Delete: 1. preserve important data; 2. safely delete unnecessary confidential data

Page 20

Cinder: Volume Manage (Scheduler)

[Diagram: volume services 1-5 pass through the Filters; the survivors (services 2, 4, 5) are scored by the Weighers (weights 25, 20, 41) and the highest weight wins]

Filters:
• AvailabilityZoneFilter
• CapabilitiesFilter
• JsonFilter
• CapacityFilter
• RetryFilter

Weighers:
• CapacityWeigher
• AllocatedVolumesWeigher
• AllocatedSpaceWeigher

* http://www.intel.com/
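The filter-then-weigh idea above can be sketched in a few lines. This is an illustrative model with made-up backend data, not the real cinder.scheduler API: filters drop backends that cannot serve the request, then a weigher picks the best survivor.

```python
# Hypothetical backend inventory; "free_gb" and "zone" stand in for the
# capabilities Cinder backends actually report.
backends = [
    {"name": "volume-service-1", "free_gb": 25, "zone": "az1"},
    {"name": "volume-service-2", "free_gb": 20, "zone": "az1"},
    {"name": "volume-service-3", "free_gb": 41, "zone": "az2"},
]

def capacity_filter(backend, request):
    # CapacityFilter: the backend must have enough free space
    return backend["free_gb"] >= request["size_gb"]

def availability_zone_filter(backend, request):
    # AvailabilityZoneFilter: the backend must sit in the requested zone
    return backend["zone"] == request["zone"]

def capacity_weigher(backend):
    # CapacityWeigher: more free space scores higher
    return backend["free_gb"]

def schedule(backends, request):
    survivors = [b for b in backends
                 if capacity_filter(b, request)
                 and availability_zone_filter(b, request)]
    return max(survivors, key=capacity_weigher) if survivors else None

winner = schedule(backends, {"size_gb": 10, "zone": "az1"})
```

With this data, both az1 services pass the filters and the one with more free capacity wins, mirroring the "Winner!" step in the diagram.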

Page 21

Cinder: Create Volume

Create volume
- User: POST http://volume1.server.net:8776/v2/{tenant_id}/volumes
- Cinder-API: CALL cinder.volume.API().create()
- Cinder.volume.API: RPC CAST cinder.scheduler()
- Cinder.scheduler: SCHEDULE volume host
- Cinder.scheduler: RPC CAST cinder.volume.create_volume()
- Cinder.volume.manager: CALL cinder.volume.driver.create_volume()
- Cinder.volume.manager: CALL cinder.volume.driver.create_export()

Attach volume
- User: POST http://novacompute1.server.net:8776/v2/{tenant_id}/servers/{vm_uuid}/os-volume_attachments
- Nova-API: CALL Nova.compute.API.attach_volume()
- Nova.compute.API: RPC CAST Nova.compute.manager.attach_volume()
- Nova.compute.manager.attach_volume: RPC CALL cinder.volume.initialize_connection()
- Nova.compute.manager.attach_volume: RPC CALL virt volume driver attach_volume()
  - libvirt.driver.attach_volume() -> volume_driver.connect_volume()
- Nova.compute.manager.attach_volume: RPC CALL cinder.volume.attach()

* Source: https://tw.pycon.org/2013/site_media/media/proposal_files/cinder_2013.pdf
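The user-facing POST at the top of that chain can be sketched as an HTTP request. The tenant id and token below are placeholders; the endpoint and the `{"volume": {...}}` body shape follow the Cinder v2 volumes API.

```python
import json
import urllib.request

tenant_id = "demo-tenant"   # placeholder tenant id
url = f"http://volume1.server.net:8776/v2/{tenant_id}/volumes"

# Minimal create-volume body: a 10 GB volume with a name.
payload = {"volume": {"size": 10, "name": "data-vol"}}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "X-Auth-Token": "AUTH_TOKEN"},
    method="POST",
)
# urllib.request.urlopen(req) would submit this to a live Cinder API,
# which then drives the scheduler/volume-manager RPC chain shown above.
```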

Page 22

Cinder: Plug-Ins

[Diagram: plug-in taxonomy]
- Software-based
  - File-system-based: NFS, DFS (GlusterFS, GPFS), Ceph
  - Block-based: LVM
- Hardware-based
  - Fibre Channel, iSCSI, NFS
  - Storage-vendor-specific plug-ins such as EMC, Hitachi, HP, Dell, ...

Page 23

Cinder Plug-In: LVM case

[Diagram]
- Cinder: Cinder API → Cinder Scheduler → Volume (LVM plug-in), which creates/deletes/extends logical volumes (LV#1..LV#4) in LVM and exports them as iSCSI targets
- Nova: each compute node runs a hypervisor (KVM, VMware, ...) whose iSCSI initiator sees the volume as /dev/sdx; volumes are attached/detached to VMs via the hypervisor

Page 24

Cinder Plug-In: FC case

[Diagram]
- Cinder: Cinder API → Cinder Scheduler → Volume (FC plug-in), which creates/deletes/extends LUNs (LUN1..LUN4) on the storage controller
- Nova: each compute node runs a hypervisor (KVM, VMware, ...) that sees the LUNs over Fibre Channel as /dev/sdx (LUN1) and /dev/sdy (LUN2); volumes are attached/detached to VMs via the hypervisor

Page 25

Cinder: Compare LVM vs FC

                      | LVM                 | FC                                  | Remark
Volume Implementation | Managed by LVM      | Managed by storage controller       |
Volume Operation      | LVM (software)      | FC (hardware)                       | LVM is more flexible
Supported Storage     | Storage-independent | Specific storage (requires plug-in) | LVM has better support coverage
Access Path           | iSCSI (software)    | Fibre Channel (hardware)            | FC has better performance

Page 26

Swift: Object Storage

[Diagram: Client → HTTP (REST API) → Proxy Node → HTTP (REST API) → Storage Node (Account Node, Container Node, Object Node)]

Reliable
- Configurable replica model with zones & regions
- Easy-to-use HTTP API – developers don't shard
- High concurrency (supports lots of users)

Highly Scalable
- Multi-tenant: each account has its own namespace
- Tier & scale any component in the system

Hardware Proof
- No single point of failure (high availability)
- Assumes unreliable hardware
- Mix & match hardware vendors

* https://www.openstack.org/assets/presentation-media/Swift-Workshop-OSS-Atlanta-2014.pdf

Page 27

Swift: Ring Hash

* Source: https://ihong5.wordpress.com/tag/consistent-hashing-algorithm/
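The consistent-hashing idea behind Swift's ring can be sketched in a few lines. This is a generic toy ring in the spirit of the diagram, not Swift's actual swift.common.ring implementation: each node owns many points ("virtual nodes") on an md5 ring, and an object lands on the first node point at or after its own hash.

```python
import bisect
import hashlib

class Ring:
    """Toy consistent-hash ring: nodes own many virtual points on a ring."""

    def __init__(self, nodes, vnodes=64):
        self._keys = []
        self._map = {}
        for node in nodes:
            for i in range(vnodes):
                # Hash "node-i" to place each virtual node on the ring.
                h = int(hashlib.md5(f"{node}-{i}".encode()).hexdigest(), 16)
                self._keys.append(h)
                self._map[h] = node
        self._keys.sort()

    def get_node(self, obj_name):
        # Walk clockwise from the object's hash to the next node point.
        h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._map[self._keys[idx]]

ring = Ring(["node1", "node2", "node3"])
```

Because every node holds many small slices of the ring, adding or removing a node only remaps the objects in its slices instead of rehashing everything, which is the property the slide's ring-hash figure illustrates.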

Page 28

Swift architecture

[Diagram]
- User Space: applications reach the cluster through the network and an HTTP load balancer
- Swift Proxy Nodes: expand proxy servers to grow "throughput"
- Swift Storage Nodes: expand storage servers to grow "volume"

Page 29

Swift Account Node

On-disk layout:
/srv/node/
  Disk#1, Disk#2, Disk#3
    account (alongside container, object)
      Partition #1 ... Partition #4
        hash-low-value directories
          Hash #1, Hash #2
            Hash.db, Hash.db.pending

Page 30

Swift Container Node

On-disk layout:
/srv/node/
  Disk#1, Disk#2, Disk#3
    container (alongside account, object)
      Partition #1 ... Partition #4
        hash-low-value directories
          Hash #1, Hash #2
            Hash.db, Hash.db.pending

Page 31

Swift Object Node

On-disk layout:
/srv/node/
  Disk#1, Disk#2, Disk#3
    object (alongside account, container), plus async_pending, tmp, quarantined
      Partition #1 ... Partition #4 (with Hashes.pkl)
        hash-low-value directories
          Hash #1, Hash #2
            Timestamp.data, Timestamp.meta, Timestamp.ts
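The partition directories above come from hashing the object's path. A rough sketch of the mapping (PART_POWER and the hash suffix here are illustrative values, not a real cluster's configuration): md5 the full /account/container/object path plus a per-cluster suffix, and keep the top bits as the partition number.

```python
import hashlib

PART_POWER = 10            # illustrative: 2**10 = 1024 partitions
HASH_SUFFIX = b"changeme"  # illustrative per-cluster hash suffix

def get_partition(account, container, obj):
    """Map an object path to one of 2**PART_POWER partitions."""
    path = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(path + HASH_SUFFIX).digest()
    # Top PART_POWER bits of the hash select the partition directory.
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)

part = get_partition("AUTH_account", "container", "object")
```

The partition number is then looked up in the ring to find which disks on which storage nodes hold that partition directory.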

Page 32

Swift Replicator

[Diagram: five nodes, Node #1 ... Node #5]
1. Each node checks its peers
2. Find defective data
3. Copy data to another node
4. Recover data to the original node
5. Delete temporary data
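The check-and-repair loop in those five steps can be sketched as a comparison of partition hashes. This is a simplified model (dicts standing in for on-disk partitions), not Swift's replicator code: a node pushes any partition a peer is missing or holds with a different content hash.

```python
def replicate(local, peers):
    """local/peers: dicts mapping partition name -> content hash.

    Push every partition a peer lacks or holds with a stale hash;
    return the list of partitions that were pushed.
    """
    pushed = []
    for peer in peers:
        for partition, digest in local.items():
            if peer.get(partition) != digest:
                peer[partition] = digest  # "copy data to another node"
                pushed.append(partition)
    return pushed

node1 = {"p1": "aaa", "p2": "bbb"}
node2 = {"p1": "aaa"}              # p2 missing: the "defective data" case
pushed = replicate(node1, [node2])
```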

Page 33

Swift: hash synchronize

[Diagram, three stages: Node #1 holds Data #1 and Data #3 under its hash directories while Node #2 holds Data #2 and a tmp entry; during Sync each node pushes the data the other is missing; afterwards both nodes hold Data #1, Data #2, and Data #3]

Page 34

Swift: Object Update

[Diagram: Data #1 is uploaded into the node's tmp directory, the new version Data #1' is written there, the completed object is moved from tmp into the HASH directory, and the tmp copy is deleted]

Page 35

Swift: Object Update

[Diagram: when new data arrives for an existing object, the node writes the new Data #1 into the HASH directory and deletes the old version]

Page 36

Swift: REST APIs

https://swift.server.net/v1/AUTH_account/container/object

URL anatomy: prefix (https://swift.server.net) / API version (v1) / account (AUTH_account) / container / object
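The anatomy above maps directly onto a URL split. A small sketch using the slide's example URL:

```python
from urllib.parse import urlparse

url = "https://swift.server.net/v1/AUTH_account/container/object"
parts = urlparse(url)

prefix = f"{parts.scheme}://{parts.netloc}"  # the server prefix
# The path carries the remaining four components in order.
version, account, container, obj = parts.path.lstrip("/").split("/", 3)
```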

Page 37

Swift: Using REST for Object handling

Basic form
- http://swift.server.net/v1/account/container/object

Get a list of all containers in an account
- GET http://swift.server.net/v1/account/

Create a new container
- PUT http://swift.server.net/v1/account/new_container

List all objects in a container
- GET http://swift.server.net/v1/account/container

Create a new object
- PUT http://swift.server.net/v1/account/container/new_object

* Source: https://tw.pycon.org/2013/site_media/media/proposal_files/cinder_2013.pdf
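Those verbs can be built as HTTP requests directly. A sketch with the slide's example host (the auth token is a placeholder; a real request needs a Keystone/TempAuth token):

```python
import urllib.request

BASE = "http://swift.server.net/v1/account"  # account URL from the slide

def swift_request(method, path="", token="AUTH_TOKEN"):
    """Build a Swift request; token is a placeholder credential."""
    return urllib.request.Request(BASE + path, method=method,
                                  headers={"X-Auth-Token": token})

list_containers = swift_request("GET", "/")                    # list containers
create_container = swift_request("PUT", "/new_container")      # create a container
list_objects = swift_request("GET", "/container")              # list objects
create_object = swift_request("PUT", "/container/new_object")  # upload an object
# urllib.request.urlopen(...) would send any of these against a live cluster.
```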

Page 38

Swift vs Ceph

[Diagram]
- Swift: Client → Load balancer → Proxy nodes → storage nodes
- Ceph: Client talks to OSDs directly, guided by the cluster map from the Monitor/Metadata servers; objects are grouped into placement groups

* http://japan.zdnet.com/article/35072972/

Page 39

Data Science work flow

[Diagram: a layered pipeline]
- Data Sources: network elements, content, network logs, social media, external data, transactions
- Collection Layer: real-time streaming (CEP), batch, replication
- Staging Layer / Data Integration Layer: Hadoop grid, MDM, data quality
- Data stores: EDW, NoSQL, Hadoop, archival
- Report Layer: business intelligence, data distribution, Hadoop data exploration

* http://www.slideshare.net/SanjeevKumar17/tech-mahindra-i5sanjeevdec/

Page 40

Data analysis with OpenStack storage

[Diagram: Storage APIs (storage nodes) feed Analysis APIs (compute nodes), arranged as a lambda-style pipeline]
- Batch Layer: all data → pre-compute views → batch views
- Real-time Analysis Layer: new data stream → stream processing → realtime view
- Serving Layer: queries merge the batch views with the realtime view
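The serving layer's merge step can be sketched with toy data. The metric name and counts below are illustrative, not from any real pipeline: a query combines the precomputed batch view with the realtime view built from the stream since the last batch run.

```python
# Illustrative views: batch covers all historical data, realtime covers
# only events that arrived after the last batch recompute.
batch_view = {"clicks": 1000}
realtime_view = {"clicks": 42}

def query(metric):
    """Serving-layer merge: batch result plus the realtime delta."""
    return batch_view.get(metric, 0) + realtime_view.get(metric, 0)
```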

Page 41

Thank you!

Q&A

Page 42

The OpenStack® Word Mark and OpenStack Logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.

• GPFS is a trademark of International Business Machines Corporation in the United States, other countries, or both.

• GlusterFS, the Gluster ant logo, and the Gluster Community logo are all trademarks of Red Hat, Inc. All other trademarks, registered trademarks, and product names may be trademarks of their respective owners.

• Dell is a trademark of Dell Inc.

• EMC and CLARiiON are registered trademarks of EMC Corporation.

• HP is a trademark of Hewlett-Packard Development Company, L.P. in the U.S. and other countries.

• Other company, product, or service names may be trademarks or service marks of others.