35
www.safespring.com SUNET Distribuerad Lagring Current status and plan ahead Gabriel Paues, Safespring Cloud Architect 2019-04-03

SUNET Distribuerad Lagring - Safespring

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SUNET Distribuerad Lagring - Safespring

www.safespring.com

SUNET Distribuerad LagringCurrent status and plan ahead

Gabriel Paues, Safespring Cloud Architect2019-04-03

Page 2: SUNET Distribuerad Lagring - Safespring

SUNETservice delivery

BILLING

REPORTING

MAINTENANCE

MONITORING

OPERATIONS

SUNET Compute

SUNET Backup

SUNET Storage

PORTAL

Customizable per customer or

reseller with your own

logos, colors.User

SWAMID

Page 3: SUNET Distribuerad Lagring - Safespring

Standardizedbuilding blocks

Applications

Data

Runtime

Middleware

Operating System

Virtualization

Servers

Storage

Networking}

Page 4: SUNET Distribuerad Lagring - Safespring

Safespring core competency

User

Business

Technology

Page 5: SUNET Distribuerad Lagring - Safespring

Identifying newbuilding blocks

Applications

Data

Runtime

Middleware

Operating System

Virtualization

Servers

Storage

Networking}

Page 6: SUNET Distribuerad Lagring - Safespring

SUNET Object Storage

Page 7: SUNET Distribuerad Lagring - Safespring

What isobject storage?

Object storage is a computer storage architecture that manages data as objects, as opposed to other storage architectures like file systems which manage data as a file hierarchy.

Typically this is implemented by storing a binary object in a container together with metadata describing it.

https://en.wikipedia.org/wiki/Object_storage

Page 8: SUNET Distribuerad Lagring - Safespring

What isobject storage?

Typical data storage user requirements

● Availability● Consistency● Resilience● Cost

It is significantly cheaper to provide guarantees per object instead of across all operations and/or objects as a filesystem.

Page 9: SUNET Distribuerad Lagring - Safespring

What isobject storage?

S3 API

A de facto standard based on representational state transfer (REST) over HTTP - the Amazon Simple Storage Service application programming interface.

Well-written specification has been available for over 7 years.

Page 10: SUNET Distribuerad Lagring - Safespring

What is SUNETobject storage?

A solution to data gravityOn premisesData stay close to its users

FastReliableCheap

Page 11: SUNET Distribuerad Lagring - Safespring

SUNET Object Storage

FASTRELIABLECHEAP

Page 12: SUNET Distribuerad Lagring - Safespring

Proven BGP onlynetwork design

Internet as a design pattern for the datacenter cluster network ensure scalability and predictable performance. No surprises now, no surprises in the future.

Technically, this means BGP is used everywhere, even for the last hop to each server node.

Page 13: SUNET Distribuerad Lagring - Safespring
Page 14: SUNET Distribuerad Lagring - Safespring

Load balancing as a primary concern

Safespring service operational knowledge combined with SUNET networking chops let us use technology patterns normally not accessible outside of hyperscale DCs.

Page 15: SUNET Distribuerad Lagring - Safespring

SUNET and Safespring S3 fast data path design

Page 16: SUNET Distribuerad Lagring - Safespring

SUNET Object Storage

FASTRELIABLECHEAP

Page 17: SUNET Distribuerad Lagring - Safespring

Well-known open source solution

The Ceph Object Gateway is an open source implementation of the S3 storage API. It is in use worldwide at large academic institutions to help solve petabyte-scale storage needs.

Multiple commercial software vendors (Redhat, SUSE) and service providers has backed the solution for several years.

SUNET Object Storage builds on top this openly available knowledge pool.

Page 18: SUNET Distribuerad Lagring - Safespring

Node 3Node 2Node 1

Software defined resilience levels

As a user you can decide what storage resilience policy you want to implement.

3x replication of each object is the standard, but erasure encoding or forward error makes it possible to further reduce costs at the expense of computational complexity.

Chunk

Repl 1 Repl 2 Repl 3

Node 3Node 2Node 1

Chunk

1

Node 4 Node 5 Node 6

2 3 4 C1 C2

200 % overhead

50 % overhead

3x replication

Erasure

Page 19: SUNET Distribuerad Lagring - Safespring

SUNET Object Storage

FASTRELIABLECHEAP

Page 20: SUNET Distribuerad Lagring - Safespring

End user pricing

Goal:

Pricing on par or better than comparable products from public cloud vendors Google, Microsoft, and Amazon.

● Currently 0,06 SEK per GB/month● SLA designed for volume storage

Page 21: SUNET Distribuerad Lagring - Safespring

Efficient development and operations model

The Safespring devops model specifically targets highly skilled senior engineers with already proven field experience.

Operational costs are kept low due to best practice knowledge sharing and every responsibility being a team effort.

We build it, we operate it.

Page 22: SUNET Distribuerad Lagring - Safespring

Hardware knowledge

Hardware is really boring. But Safespring know all the boring stuff, so you don’t have to.

For example, this Supermicro box takes 24x12TB drives. It also holds enough RAM and CPU for all the drives to reliably work with it as a Ceph cluster node.

This is the current sweet spot for our operational parameters. Entry level site will have 12 nodes like this for ~2PB capacity.

Page 23: SUNET Distribuerad Lagring - Safespring

Managed Compute

Page 24: SUNET Distribuerad Lagring - Safespring

SUNET Private Cloudservice delivery

BILLING

REPORTING

MAINTENANCE

MONITORING

OPERATIONS

SUNET Compute

SUNET Storage

PORTAL

Customizable per customer

or reseller with your

ownlogos, colors.User

SWAMID

Page 25: SUNET Distribuerad Lagring - Safespring

Differences from Public Compute

Public Compute:● Catering for different needs● Windows and Linux● Large boot volumes● Reliability before performance

(storage separated from compute hosts)

Control nodes

Compute nodesBlock + Object Storage nodes

Page 26: SUNET Distribuerad Lagring - Safespring

Differences from Public Compute

Managed Compute (adapted for science)● Focus on performance● Local block storage on compute

nodes● Mostly Linux● Block storage used for temporary

storage during analysis● Object storage for long term storage

Control nodes

Compute + block storage

nodesObject Storage

nodes

Page 27: SUNET Distribuerad Lagring - Safespring

Hardware - Compute

Hardware designed for Compute intensive application, HPC & Datacenter

● Dual CPU (64 cores per CPU 1:2HT) ● Local NvME disk - Fast IOPS● Reference VM 16 vCPU and 500GB disk

Page 28: SUNET Distribuerad Lagring - Safespring

Second gen IaC - Safespring DevOps - Advantages

● Update systems faster○ Lower barrier to changes

● Reproduce systems as needed○ Build everything with as few dependencies

as possible

● Add or change easily○ Target the affected nodes easily

● Verify that software works as intended

● Scales better with many operators

Page 29: SUNET Distribuerad Lagring - Safespring

Hybrid Cloud

Page 30: SUNET Distribuerad Lagring - Safespring

On the drawing board: GPU power

Two variants:● Virtualized for sharing GPU resources

between different projects● Physical dedicated to one project at a

time● Based on sector requirements

Page 31: SUNET Distribuerad Lagring - Safespring

On the drawing board: GPU power virtualized

Virtualized● Good for testing● Many users can share the same

resources● Using KWM and Hardware Assist● Lower performance

GPU node

VM

VM

VM

VM

VM

VM

Page 32: SUNET Distribuerad Lagring - Safespring

On the drawing board: GPU power - physical

GPU node

Application

Physical

● Production● Dedicated to one project● Using OpenStack Ironic for

bare-metal provisioning● High Performance = More expensive

Page 33: SUNET Distribuerad Lagring - Safespring

Reference group

SUNET is organizing a technical review group. It will work with the project to review the processes, designs and SLA.

It is crucial to SUNET & Safespring as a vendor to make sure the service meets expectations and requirements.

https://wiki.sunet.se/display/SDL

Page 34: SUNET Distribuerad Lagring - Safespring

Current project status

The SUNET Object Storage project is currently validating the service designs by building a site together with SUNET.

This work will let us develop the service further by combining our experience at all stages of delivery.

The goal is to have a MVP (minimum viable product) solution on air by 2019-04-12.

Page 35: SUNET Distribuerad Lagring - Safespring

www.safespring.com

QAFollow us

linkedin.com/company/safespringtwitter.com/safespring

2019-04-03