20
Alluxio (formerly Tachyon) Memory Speed Virtual Distributed Storage 1 Haoyuan Li June 1 st , 2016 @ AMPLab 2016 Summer Retreat

Alluxio Presentation at AMPLab Summer Retreat 2016

Embed Size (px)

Citation preview

Page 1: Alluxio Presentation at AMPLab Summer Retreat 2016

Alluxio (formerly Tachyon) Memory Speed Virtual Distributed Storage

1

HaoyuanLiJune1st,2016

@AMPLab2016SummerRetreat

Page 2: Alluxio Presentation at AMPLab Summer Retreat 2016

What is Alluxio?

•  Memory Speed Virtual Distributed Storage

•  Enables Virtualized Data Across Multiple Types of Storage

2

Page 3: Alluxio Presentation at AMPLab Summer Retreat 2016

Alluxio Open Source Contributor Growth

3

•  Over 250 contributors from over 100 organizations

•  3x growth over the last year!

Page 4: Alluxio Presentation at AMPLab Summer Retreat 2016

Introducing Alluxio Open Source Governance

4

•  Alibaba

•  Alluxio

•  Baidu

•  Fosun International

•  Google

•  Huawei

•  IBM

•  Intel

•  Nanjing University

•  UC Berkeley

Page 5: Alluxio Presentation at AMPLab Summer Retreat 2016

Performance Trend: Memory is Fast

•  RAM throughput increasing exponentially

•  Disk throughput increasing slowly

•  Memory-locality key to interactive response times

5

Page 6: Alluxio Presentation at AMPLab Summer Retreat 2016

Price Trend: Memory is Cheaper

6

Source:jcmit.com

Page 7: Alluxio Presentation at AMPLab Summer Retreat 2016

The Big Data Ecosystem Today

7

Page 8: Alluxio Presentation at AMPLab Summer Retreat 2016

The Big Data Ecosystem Today

8

Page 9: Alluxio Presentation at AMPLab Summer Retreat 2016

Alluxio Approach

9

Page 10: Alluxio Presentation at AMPLab Summer Retreat 2016

•  Flexibility

–  Enable new workloads across any storage systems

–  Unified Name Space enable application to access data in any storage system

–  Future Proven Architecture

•  Technology of your choice –  Work with the framework of your choice

–  Work with the storage of your choice

•  Performance

–  High performance data access

–  Efficient data sharing among different computation frameworks and applications

•  Cost Saving

–  Scale storage and compute independently

10

Alluxio Benefits

Alluxio: Any application accesses any data from any storage at memory speed.

Page 11: Alluxio Presentation at AMPLab Summer Retreat 2016

•  Tiered Storage

•  Transparent Naming

•  Unified Namespace

•  Native Amazon S3, Google Cloud Storage, Open Stack Swift, Alibaba OSS

integrations

•  Fuse Connector, K/V Interface

•  One Command Cluster Deployment

•  Metrics Reporting

11

New Features

Page 12: Alluxio Presentation at AMPLab Summer Retreat 2016

12

The Storage Tier Hierarchy

MEM

SSD

HDD

Page 13: Alluxio Presentation at AMPLab Summer Retreat 2016

•  Data can be evicted to lower layers if it is “cooling down”

•  Data can be promoted to upper layers if it is “warming up”

13

Automatic Data Migration

EvictstaledatatolowerLer

PromotehotdatatoupperLer

Page 14: Alluxio Presentation at AMPLab Summer Retreat 2016

•  Applications can transparently and efficiently interact with remote

storage through Alluxio.

•  Applications do not need to use different APIs for interacting with

different storage systems.

14

Transparent Naming

alluxio://host:port/

data users

reports sales alice bob

s3n://bucket/directory

data users

reports sales alice bob

Alluxio StorageSystem

Page 15: Alluxio Presentation at AMPLab Summer Retreat 2016

•  Applications can read and write different storage systems

•  Decouples data location from application

15

Unified Namespace

alluxio://host:port/

data users

reports sales alice bob

hdfs://host:port/

users

alice bob

s3n://bucket/directory

reports sales

Alluxio StorageSystemA

StorageSystemB

Page 16: Alluxio Presentation at AMPLab Summer Retreat 2016

•  Framework: Spark

•  Under Storage: Baidu’s File System

•  Storage Media: MEM + HDD

•  200+ nodes deployment

•  2PB+ managed space

16

+

Page 17: Alluxio Presentation at AMPLab Summer Retreat 2016

•  Framework: Spark

•  Storage Media: MEM

•  Improvement from Hours to Seconds

17

+

Page 18: Alluxio Presentation at AMPLab Summer Retreat 2016

•  Framework: Spark Streaming & Flink

•  Under Storage: HDFS & Ceph

•  Storage Media: MEM + HDD

•  200 nodes deployment

•  Alluxio enables previously impossible jobs to finish

•  10x Performance Improvement on average

•  300x Performance Improvement during peak time.

18

+

Page 19: Alluxio Presentation at AMPLab Summer Retreat 2016

Contacts

•  Alluxio Open Source Project: www.alluxio.org

•  Alluxio, Inc: www.alluxio.com

•  Development: www.github.com/Alluxio/alluxio

•  Meet Friends: www.meetup.com/Alluxio

•  Contact: [email protected] ; [email protected]

19

Page 20: Alluxio Presentation at AMPLab Summer Retreat 2016

Thank You