UNIFY DATA AT MEMORY SPEED Haoyuan (HY) Li, CEO @ Alluxio Inc. VAULT Conference 2017 March 2017


HISTORY

• Started at UC Berkeley AMPLab in summer 2012

• Originally named Tachyon

• Rebranded to Alluxio in early 2016

• Open sourced in 2013

• Apache License 2.0

• Latest stable release: Alluxio 1.4.0

• Alluxio 1.5.0 planned for Q2 2017


© 2017 Alluxio Confidential

BIG DATA ECOSYSTEM YESTERDAY


BIG DATA ECOSYSTEM TODAY


BIG DATA ECOSYSTEM ISSUES


BIG DATA ECOSYSTEM WITH ALLUXIO

FUSE Compatible File System

Hadoop Compatible File System

Native Key-Value Interface

Native File System

GlusterFS Interface

Amazon S3 Interface

Swift Interface

HDFS Interface


Enabling Applications to Access Data from Any Storage System at Memory Speed


FASTEST-GROWING BIG DATA PROJECT


• Formerly named Tachyon, born in the AMPLab

• 500+ contributors from 100+ organizations

• Running world’s largest production clusters


WHY ALLUXIO

Co-located compute and data with memory-speed access to data

Virtualized across different storage systems under a unified namespace

Scale-out architecture

File system API, software only
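
The "unified namespace" point can be pictured as a mount table that maps one logical path tree onto several storage systems. The sketch below is a toy illustration only: the class `UnifiedNamespace`, its methods, and the example URIs are hypothetical names chosen for this example and are not the Alluxio API.

```python
from typing import Dict


class UnifiedNamespace:
    """Toy mount table (hypothetical; not the Alluxio API)."""

    def __init__(self) -> None:
        # Maps a logical mount point to the URI of an underlying store.
        self._mounts: Dict[str, str] = {}

    def mount(self, mount_point: str, storage_uri: str) -> None:
        # Normalize trailing slashes so lookups compare cleanly.
        key = mount_point.rstrip("/") or "/"
        self._mounts[key] = storage_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        # Pick the longest mount point that is a prefix of the path.
        best = None
        for mount_point in self._mounts:
            prefix = mount_point.rstrip("/")
            if path == mount_point or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no storage system mounted for {path}")
        return self._mounts[best or "/"] + path[len(best):]


# Example URIs are made up for illustration.
ns = UnifiedNamespace()
ns.mount("/logs", "hdfs://namenode:9000/logs")
ns.mount("/images", "s3://example-bucket/images")
print(ns.resolve("/logs/2017/a.log"))
print(ns.resolve("/images/cat.png"))
```

An application only sees paths such as `/logs/...` and `/images/...`; which storage system serves each path is decided by the mount table.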


ALLUXIO BENEFITS

Unification: New workflows across any data in any storage system

Performance: Orders of magnitude improvement in run time

Flexibility: Choice in compute and storage – grow each independently, buy only what is needed


ALLUXIO DEPLOYMENTS


ALLUXIO USE CASES

On-Demand Analytics & Accelerating I/O to and from remote storage

Managing data across disparate storage systems

Sharing data across workloads at memory speed


MANAGE DATA ACROSS STORAGE SYSTEMS

“We’ve been running in production for over 9 months. Alluxio’s enabled different applications & frameworks to easily interact with data from different storage systems.”

RESULTS

• Efficient data sharing among Spark Streaming, Spark batch, and Flink jobs

• Improved system performance with 15x–300x speedups

• Tiered storage feature manages storage resources including memory, SSD and disk

Qunar uses real-time machine learning for their website ads

• 200+ nodes deployment

• 6 billion logs (4.5 TB) daily

• Mix of Memory + HDD


ON-DEMAND ANALYTICS & ACCELERATE I/O TO/FROM REMOTE STORAGE

“The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds.”

RESULTS

• Data queries are now 30x faster with Alluxio

• Alluxio cluster runs stably, providing over 50TB of RAM space

• By using Alluxio, batch queries that usually lasted over 15 minutes became interactive queries taking less than 30 seconds
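
The quote and the results bullets cite different measurements, so it helps to work out the speedup each one implies. The short sketch below does that arithmetic from the figures quoted on this slide. The helper name `speedup` is introduced here for illustration, and pairing the extremes of the quoted ranges is an assumption, not something the slide states.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Ratio of the old latency to the new latency."""
    return before_seconds / after_seconds


# Figures from the quote: 100-150 s with Spark SQL alone, 10-15 s with Alluxio.
spark_sql_alone = (100, 150)
with_alluxio = (10, 15)

# Assumption: pair the range extremes to bound the implied speedup.
quote_low = speedup(spark_sql_alone[0], with_alluxio[1])   # 100 / 15
quote_high = speedup(spark_sql_alone[1], with_alluxio[0])  # 150 / 10

# Figures from the batch-query bullet: over 15 minutes down to under 30 seconds.
batch = speedup(15 * 60, 30)

print(f"quoted queries: {quote_low:.1f}x to {quote_high:.1f}x")
print(f"batch to interactive: at least {batch:.0f}x")
```

Under these assumptions the quoted query times imply roughly a 7x to 15x improvement, while the batch-to-interactive figures give at least 30x, which is one possible reading of the "30x faster" bullet.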

PMs run interactive queries to gain insights into their products & business

• 200+ nodes deployment

• 2+ petabytes of storage

• Mix of memory + HDD

[Diagram labels: ALLUXIO; Baidu File System]


SHARE DATA ACROSS JOBS @ MEMORY SPEED

“Thanks to Alluxio, we now have the raw data immediately available at every iteration & can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity.”

RESULTS

• Barclays workflow iteration time decreased from hours to seconds

• Alluxio enabled workflows that were impossible before

• With data kept only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds

Barclays uses query & machine learning to train models for risk management

• 6 node deployment

• 1TB of storage

• Memory only

[Diagram labels: ALLUXIO; Relational Database: Teradata]


Thank you!

Contact: {haoyuan}@alluxio.com or [email protected]
