Upload
alluxio-inc
View
29
Download
2
Embed Size (px)
Citation preview
About Me
• Gene Pang
• Software engineer @ Alluxio, Inc.
• Alluxio open source PMC member
• Ph.D. from AMPLab @ UC Berkeley
• Worked at Google before UC Berkeley
• Twitter: @unityxx
• Github: @gpang
©2017 Alluxio, Inc. All Rights Reserved 2
History of Alluxio
Started at UC Berkeley AMPLab In Summer 2012
• Originally named as Tachyon
• Rebranded to Alluxio in early 2016
Open Sourced in 2013
• Apache License 2.0
• Latest Release: Alluxio 1.6.0
©2017 Alluxio, Inc. All Rights Reserved 4
5©2017 Alluxio, Inc. All Rights Reserved
Alluxio: Unify Data at Memory Speed
Namespace Unification
Architecture Flexibility
IO Performance
Data Ecosystem with Alluxio
• Apps only talk to Alluxio
• Simple Add/Remove
• No App Changes
• In-Memory Performance
Native File System Hadoop Compatible File System
Native Key-Value Interface
Fuse Compatible File System
HDFS Interface Amazon S3 Interface Swift Interface GlusterFS Interface
©2017 Alluxio, Inc. All Rights Reserved 6
Next Gen Analytics with Alluxio
Native File System Hadoop Compatible File System
Native Key-Value Interface
Fuse Compatible File System
HDFS Interface Amazon S3 Interface Swift Interface GlusterFS Interface
Apps, Data & Storage at Memory Speed
ü Big Data/IoT ü AI/ML ü Deep Learning ü Cloud Migration ü Multi Platform ü Autonomous
©2017 Alluxio, Inc. All Rights Reserved 7
Fastest Growing Big Data Open Source Projects
Fastest Growing open-source project in the big data ecosystem
Running in large production clusters
Now: 600+ Contributors from 100+ organizations
0
100
200
300
400
500 0 10 20 30 40 45
Num
ber
of C
ontr
ibut
ors
Github Open Source Contributors by Month
Alluxio
Spark
Kafka
Redis
HDFS
Cassandra
Hive
©2017 Alluxio, Inc. All Rights Reserved 8
1 0©2017 Alluxio, Inc. All Rights Reserved
Data Processing Pipeline
Stage 1 Stage 2 Stage 3 Stage 4
Output of stage is input of next stage
1 1©2017 Alluxio, Inc. All Rights Reserved
Data Processing Pipeline
Sharing via common storage
Shared Storage
Stage 1 Stage 2 Stage 3 Stage 4
1 3©2017 Alluxio, Inc. All Rights Reserved
Data Processing Pipeline in the Cloud
S3 (object store)
Stage 1 Stage 2 Stage 3 Stage 4
1 5©2017 Alluxio, Inc. All Rights Reserved
Cloud Pipeline with Alluxio
Sharing via Alluxio memory
S3 (object store)
Stage 1 Stage 2 Stage 3 Stage 4
Sharing Data in the Cloud
Previous stage writes output to storage
Next stage reads input from storage
…
©2017 Alluxio, Inc. All Rights Reserved 1 6
Sharing Data in the Cloud with Alluxio
Previous stage writes output to storage memory
Next stage reads input from storage memory
…
©2017 Alluxio, Inc. All Rights Reserved 1 7
1 8©2017 Alluxio, Inc. All Rights Reserved
Alluxio enablesin-memory data sharing
Faster pipeline performance
Alluxio – Fast Durable Writes
Improves write performance, without sacrificing fault tolerance
©2017 Alluxio, Inc. All Rights Reserved 1 9
Alluxio – Fast Durable Writes
Synchronously write to replicas in Alluxio memory
Asynchronously write to underlying storage
©2017 Alluxio, Inc. All Rights Reserved 2 0
Alluxio – Fast Durable Writes
©2017 Alluxio, Inc. All Rights Reserved 2 1
client
Storage
Write is completed
Alluxio – Fast Durable Writes
©2017 Alluxio, Inc. All Rights Reserved 2 2
client
Storage
Asynchronously written to
storage
Outline
©2017 Alluxio, Inc. All Rights Reserved 2 3
1
2
3
Alluxio Overview
Data Pipelines
Experiments
Log Pipeline in Amazon Web Services
©2017 Alluxio, Inc. All Rights Reserved 2 4
Generate Parquet Transform Aggregate
Generate: [MapReduce] Create random csv log data
Parquet: [MapReduce] Convert csv to parquet format
Transform: [Spark] Update column values
Aggregate: [Spark] Compute group by / aggregate
Log Pipeline Environment
• r4.2xlarge instances (61 GB ram, 8 CPUs) • 1 master, 3 workers
• Apache Spark 2.2.0 • Apache Hadoop 2.7.2 • Alluxio 1.6.0 • Generate 12 GB of logs • Compare AWS S3 vs Alluxio w/ Fast Durable Writes
©2017 Alluxio, Inc. All Rights Reserved 2 5
Alluxio and Pipelines in the Cloud
Alluxio enables in-memory sharing for data pipelines in the cloud Alluxio’s Fast Durable Write feature increases performance without sacrificing fault tolerance
©2017 Alluxio, Inc. All Rights Reserved 2 8
Thank you!
Gene Pang [email protected] Twitter: @unityxx
Twi$er.com/alluxio
Linkedin.com/alluxio
Website www.alluxio.com
E-mail [email protected]
@
Social Media
á
�
©2017 Alluxio, Inc. All Rights Reserved 2 9