Caching in the multiverse - USENIX · Alluxio Ceph TPC-H 10 Prefetch/Evict Script. Results •2X...

Caching in the multiverse

Mania Abdi¥, Amin Mosayyebzadeh⥾, Mohammad Hossein Hajkazemi¥,Ata Turkꬷ, Orran Krieger⥾, Peter Desnoyers¥

Northeastern University¥, State Streetꬷ, Boston University⥾

Execution Frameworke.g. Hadoop, Spark, Tez

Typical Data analytic Cluster

Analytic Frameworks

Datacenter Network

Cluster Network Cluster Network

TOR TOR TOR

Storage

Execution Framework

Typical Data analytic Cluster

• Storage disaggregation enables:• Scalability• Flexibility

• Execution frameworks perform better when data is local to cluster

• A distributed cache can help, e.g. AlluxioStorage

Distributed Cache

Datacenter Network

Cluster Network Cluster Network

TOR TOR TOR TOR

Analytic Frameworks

Data analytic executions

User queries -> Query optimizer -> Job DAG e.g. PIG

• DAGs have complex and deep structure, e.g. 200 or more nodes for complex jobs. In DAGs:• vertices represent analytic jobs• Edges represent dependencies between jobs.

• Dependency within a DAG + run time →

estimated critical path.Query #8 from TPC-H benchmark

Taxonomy of caching

Frequent Management Approaches

History based Informed hint based

LRUMost practice today

e.g. Alluxio

All-or-nothinge.g. Pacman (NSDI12)

Deadline aware

DAG aware

e.g. NetCo (SOCC18)

MRD (ICPP18), LRC (InfoComm 17)

Application hintse.g. TIP (SOSP95), MC2 (TOCS11)

Optimizing Data Analytic caches

Cache performance metric:

Query completion time = Tjob finish − Tjob submission

Goal: minimize query completion time

Our approachAdapt with

timeAdapt with schedule

The first system that solve the joint problem of DAGs scheduling and caching

Goal: Calculate optimize cache plan• Prefetch control• Admission control• Eviction control

Execution Framework

DAGs of Jobs Cache Status

Execution History

Storage bandwidth

How to find cache/prefetch plan?

• Predict job run time with and without prefetching.

• Find critical path based on dependencies and history of execution.

• Incorporate dependencies, bandwidth to storage, current cache status to choose:• Dataset to be cached

• Dataset to be prefetched

• Dataset to be evicted

DAGs of Jobs Cache Status

Execution History

Storage bandwidth

Experimental environment

Analytic framework: Pig

Execution framework: MapReduce

• 4 bare metal nodes:

• 60GB RAM

• 16 CPU

• 10 Gbps Ethernet

Distributed cache: Alluxio

• 4 bare metal nodes:

• Collocate with Hadoop nodes.

• 6BG per cache = 24GB cache

Remote storage: Ceph

Benchmark: TPC-H queries

• 30 GB dataset size

MapReduce

Alluxio

Prefetch/EvictScript

Results

• 2X improvement over LRU.

• Up to 22% improvement over MRD

Current work: implementation

Workflow DAG

Job runtime estimator

Critical path estimator

Job improvement estimator

Distrib

I/O planner

Remote Storage

History of execution

Discussion / Challenges

• Benchmark:• New benchmark to evaluate our approach.

• Multi-query execution:• Approaches: two step cache management

• Create plan for a single query• Create plan for multiple queries.

• Competition queries with contradicting requests• Initial approaches: prioritize nearest future access.

• Bandwidth allocation for multiple queries• Initial approaches: prioritize shortest job first.

• Where to prefetch?

Conclusion

Goal: minimize end-to-end latency of query execution

Approach: scheduling aware cache management policy

Key insight: incorporate execution history and current cache status to optimize the critical path through caching and prefetching.

Results: 2X improvement over LRU

Caching in the multiverse - USENIX · Alluxio Ceph TPC-H 10 Prefetch/Evict Script. Results •2X...

Documents

Spark Pipelines in the Cloud with Alluxio

Accelerating Spark Workloads in a Mesos Environment with Alluxio

Evaluation of the Suitability of Alluxio for Hadoop Processing … · 2016-09-16 · Software: Cloudera Manager: 5.8.1, Spark: 1.6.0, Hadoop: 2.6.0-cdh5.7.1, Alluxio: 1.2.0. 2 Installation

Lecture 15 Process and Memory Integration · – evict victim: evict victim line from cache and write victim to memory – refill: refill requested line by reading line from memory

Bringing Data to Life - NVIDIA€¦ · Alluxio Alluxio holds a unique place in the big data ecosystem, residing between storage systems such as Amazon S3, Apache HDFS or OpenStack

SDI/ISTC Seminar · Alluxio, formerly Tachyon, is an open source memory speed virtual distributed storage system. In this talk, I will first introduce the Alluxio project and then

They want to evict us. WHAT NOW?

SDI/ISTC Seminar€¦ · Alluxio, Inc. Jiří is a software engineer at Alluxio, Inc. and one of the top committers and a project management committee member of the Alluxio open source

Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016

Accelerating Spark Workloads in an Apache Mesos Environment with Alluxio

Alluxio Use Cases at Strata+Hadoop World Beijing 2016

Best Practices for Using Alluxio with Apache Spark with Gene Pang

Alluxio (formerly Tachyon) - SNIA · 2019-12-21 · Alluxio Benefits • Flexibility – Enable new workloads across any storage systems – Unified Name Space enable application

基于Alluxio提升Spark和Hadoop HDFS的系统性能与 … MEM Worker3 SSD HDD MEM Worker2 SSD HDD. Alluxio ... checkPointPath : hdfs://xxx:yyy/zzz... Alluxio

Squatting: Lifting the Heavy Burden to Evict Unwanted …

Getting Started with Alluxio + Spark + S3

Get InsIGhts Faster wIth alluxIo and Intel · 2020-04-27 · Get InsIGhts Faster wIth alluxIo and Intel In today’s data centers, bounded storage and compute resources on Hadoop

10 Ways to Evict Creative Block from Your Head

Alluxio Presentation at Strata San Jose 2016

Resource Evict On