The Practice of Presto & Alluxio in E-Commerce Big Data Platform · Stores MapReduce/Spark shuffle data, to reduce disk storage pressure and speed up access to shuffle data We are

The Practice of Presto & Alluxio in E-Commerce Big Data Platform

2019-06-20

Tao Huang, JD.com Big Data Platform Engineer

Contents

1. JD BDP: Introduction to the JD.com BDP architecture
2. Practice with Presto in BDP: Introduction to Presto and our practice in JD BDP
3. Presto & Alluxio Stack: Our use case of Presto & Alluxio
4. Ongoing Exploration: The features we are exploring

1. JD BDP

Cluster scale
• Tens of thousands of nodes
• Thousands of users

Computing ability
• Tens of PB of offline data daily
• Millions of jobs daily

Storage capacity
• Hundreds of PB of data
• Tens of PB of daily increase

Business scale
• Tens of business units
• Hundreds of data models

BDP architecture


2. Practice with Presto in BDP

Presto Architecture

Our Work on Presto

01 Cluster Scaling
02 Job Isolation
03 ERP Authorization
04 Operation & Maintenance

Presto on YARN

• YARN: unified resource management
• Dynamic Resource: Presto worker scaling
• Configuration: configure Presto through the web
• Plugin: load/unload plugins

PowerServer for operation and maintenance

• Plugin Manager: export query results, update plugins
• Dynamic Configuration: route queries to clusters, adjust resource groups
• ERP Authorization: track users' queries, security
• Auto Maintenance: dynamic auto-scaling, start/stop clusters
• Intelligent Scheduler

Periodical Queries

◉ controllable data range

◉ high query frequency

◉ high data reuse rate

◉ high proportion

Unpredictable Queries

◉ uncontrollable data range

◉ low query frequency

◉ low data reuse rate

◉ low proportion

Application Scenario

Presto Jobs in BDP

3. Presto & Alluxio Stack

Data Ecosystem with Alluxio


Presto + Alluxio = Better Together


Higher query throughput

Consistent low query latency

Eliminates network traffic

JD Contribution to Alluxio


Business Strategy

• New Web UI: ui-grid based sort/pagination/filter; add an input field
• Watermark Evict Strategy: start evicting at the high watermark; stop evicting at the low watermark
• Cache Consistency: check at startup; check every time
• JVM Pause Monitor: monitor JVM pauses periodically; log messages and metrics
• Shell Command: cp/ls/load/rm/format
• Bugfix: deadlock, thrift, add timeout time…
• Change Log Level: shell, RESTful API
• Test: SyncQuery, AlluxioTools…
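The JVM Pause Monitor item follows a well-known pattern (Hadoop ships a similar `JvmPauseMonitor`): a background thread repeatedly sleeps for a fixed interval and treats any extra elapsed time as a process-wide stall, such as a long GC pause. The sketch below illustrates that pattern in Python rather than Java; the class name, the 500 ms / 1 s / 10 s thresholds, and the counters standing in for metrics are illustrative assumptions, not JD's or Alluxio's actual code.

```python
import logging
import threading
import time

log = logging.getLogger("pause_monitor")


def extra_sleep_ms(expected_ms: float, observed_ms: float) -> float:
    """Time spent beyond the requested sleep; a large value means the process stalled."""
    return max(0.0, observed_ms - expected_ms)


class PauseMonitor:
    # Illustrative defaults (assumptions): sleep 500 ms, INFO above 1 s, WARN above 10 s.
    def __init__(self, sleep_ms=500, info_ms=1_000, warn_ms=10_000):
        self.sleep_ms = sleep_ms
        self.info_ms = info_ms
        self.warn_ms = warn_ms
        self.info_pauses = 0  # metric: pauses above the INFO threshold
        self.warn_pauses = 0  # metric: pauses above the WARN threshold
        self._stop = threading.Event()

    def check(self, observed_ms: float) -> float:
        """Classify one measured sleep, log it, and update the counters."""
        extra = extra_sleep_ms(self.sleep_ms, observed_ms)
        if extra > self.warn_ms:
            self.warn_pauses += 1
            log.warning("Detected pause of approximately %.0f ms", extra)
        elif extra > self.info_ms:
            self.info_pauses += 1
            log.info("Detected pause of approximately %.0f ms", extra)
        return extra

    def _run(self):
        while True:
            start = time.monotonic()
            # wait() returns True as soon as stop() is called, ending the loop.
            if self._stop.wait(self.sleep_ms / 1000):
                break
            self.check((time.monotonic() - start) * 1000)

    def start(self) -> threading.Thread:
        thread = threading.Thread(target=self._run, name="pause-monitor", daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()
```

In a long-running server one would call `PauseMonitor().start()` once at startup and `stop()` on shutdown; measuring with `time.monotonic()` keeps wall-clock adjustments from being mistaken for pauses.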

Watermark Evict Strategy

• Sync Evict Strategy
• Async Evict Strategy
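The watermark strategy starts evicting when cache usage reaches a high watermark and stops once usage falls to a low watermark, so eviction happens in batches rather than on every write. Below is a minimal Python sketch of the synchronous variant. It assumes LRU order for choosing victims and 90%/70% watermarks; the slides state neither the victim policy nor the thresholds, and `WatermarkCache` is a hypothetical name.

```python
from collections import OrderedDict


class WatermarkCache:
    """Block cache that evicts from the high watermark down to the low watermark.

    LRU victim selection and the default 90%/70% watermarks are assumptions.
    """

    def __init__(self, capacity: int, high: float = 0.9, low: float = 0.7):
        if not 0 < low < high <= 1:
            raise ValueError("require 0 < low < high <= 1")
        self.high_bytes = capacity * high
        self.low_bytes = capacity * low
        self.used = 0
        self.blocks = OrderedDict()  # block_id -> size, least recently used first

    def access(self, block_id) -> bool:
        """Mark a block as recently used; return False on a cache miss."""
        if block_id not in self.blocks:
            return False
        self.blocks.move_to_end(block_id)
        return True

    def put(self, block_id, size: int) -> list:
        """Cache a block, then evict synchronously if the high watermark is reached."""
        if block_id in self.blocks:
            self.used -= self.blocks.pop(block_id)
        self.blocks[block_id] = size
        self.used += size
        return self.maybe_evict()

    def maybe_evict(self) -> list:
        """Evict LRU blocks only once usage reaches the high watermark,
        and keep going until usage drops to the low watermark."""
        evicted = []
        if self.used < self.high_bytes:
            return evicted
        while self.used > self.low_bytes and self.blocks:
            block_id, size = self.blocks.popitem(last=False)
            self.used -= size
            evicted.append(block_id)
        return evicted
```

An asynchronous variant would call `maybe_evict` from a background thread instead of inside `put`. That keeps eviction off the write path, at the cost of usage briefly exceeding the high watermark between runs.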

Cache Consistency

Keep Alluxio & HDFS Consistency

To ensure that dirty data is not read, there are three ways to trigger a file consistency check:

• Client requests: when a client requests metadata through getFileId, getFileInfo, listStatus, etc., the Alluxio master checks file cache consistency.
• RPC API / RESTful API: calling reloadMetaData triggers Alluxio to reload all metadata.
• Alluxio Master startup: the master checks file cache consistency when it starts up.
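The three triggers can be sketched as a metadata cache that re-validates entries against the under storage (HDFS). This sketch assumes consistency is judged by comparing file length and modification time. `FileMeta`, `ConsistentMetadataCache`, and `ufs_stat` are hypothetical names, and `reload_metadata` only mirrors the role of the `reloadMetaData` call described above, not its real signature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FileMeta:
    length: int
    mtime_ms: int


class ConsistentMetadataCache:
    """Hypothetical master-side metadata cache kept consistent with HDFS."""

    def __init__(self, ufs_stat: Callable[[str], Optional[FileMeta]]):
        # ufs_stat returns the file's current HDFS metadata, or None if it was deleted.
        self.ufs_stat = ufs_stat
        self.cache: Dict[str, FileMeta] = {}

    def _check(self, path: str) -> Optional[FileMeta]:
        current = self.ufs_stat(path)
        if current is None:
            self.cache.pop(path, None)  # deleted in HDFS: drop the stale entry
        elif self.cache.get(path) != current:
            # New or changed in HDFS: refresh the entry. A real system would also
            # invalidate any cached data blocks for this path (omitted here).
            self.cache[path] = current
        return current

    def get_file_info(self, path: str) -> Optional[FileMeta]:
        """Trigger 1: a client metadata request checks the path it touches."""
        return self._check(path)

    def reload_metadata(self) -> None:
        """Trigger 2: an explicit API call re-validates every cached entry."""
        for path in list(self.cache):
            self._check(path)

    def on_startup(self) -> None:
        """Trigger 3: the master checks all cached entries when it starts."""
        self.reload_metadata()
```

Checking on every request gives the strongest guarantee but costs one HDFS metadata call per request. The bulk reload and the startup check amortize that cost over many files.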

Presto on Alluxio

Why Presto on Alluxio?

High Performance

Consistent Low Query Latency

Eliminate Network Traffic

Others: Fault-tolerant & Pluggable

When using Alluxio with Presto, we made some changes and gained several useful features:

• Alluxio led to a 10x performance improvement
• Hundreds of nodes
• More than 2 years in the production environment

Presto on Alluxio


4. Ongoing Exploration

Presto Exploration

Presto Master Load Balancing

Thread Level Resource Isolation

Unify Larger Clusters

As the amount of data grows, clusters become larger and the number of query tasks keeps increasing, so the master (coordinator) becomes a performance bottleneck. How to improve Presto to balance this load will be a challenge.

Execution tasks running on the workers compete for resources, especially jobs in the test phase. If we can restrict execution tasks with cgroups, it will reduce the mutual impact among queries.
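To illustrate the cgroups idea, the sketch below caps a group of threads with the cgroup v1 CPU controller. There, a group's CPU share is `cpu.cfs_quota_us` (CPU time allowed per period) divided by `cpu.cfs_period_us`, and writing a thread ID to the `tasks` file moves that single thread into the group. The directory layout, the function names, and grouping threads per query are assumptions. The code needs root and a mounted cpu controller, and mapping Presto's JVM task threads to native thread IDs is not shown.

```python
import os


def cfs_quota_us(cores: float, period_us: int = 100_000) -> int:
    """CPU time (microseconds) the group may use per period: cores * period."""
    if cores <= 0 or period_us <= 0:
        raise ValueError("cores and period_us must be positive")
    return int(round(cores * period_us))


def limit_threads(cgroup_dir: str, cores: float, thread_ids, period_us: int = 100_000) -> None:
    """Cap the given threads at `cores` CPUs using the cgroup v1 cpu controller.

    cgroup_dir is a hypothetical per-query group such as
    /sys/fs/cgroup/cpu/presto/<query_id>; creating it requires root.
    """
    os.makedirs(cgroup_dir, exist_ok=True)  # mkdir in cgroupfs creates the group
    with open(os.path.join(cgroup_dir, "cpu.cfs_period_us"), "w") as f:
        f.write(str(period_us))
    with open(os.path.join(cgroup_dir, "cpu.cfs_quota_us"), "w") as f:
        f.write(str(cfs_quota_us(cores, period_us)))
    tasks = os.path.join(cgroup_dir, "tasks")
    for tid in thread_ids:
        # Writing a thread ID to "tasks" moves just that thread, not its whole process.
        with open(tasks, "w") as f:
            f.write(str(tid))
```

For example, calling `limit_threads` with `cores=2` on a test-phase query's threads would let them use at most two CPUs' worth of time per 100 ms period, leaving the rest of the worker for production queries.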

Large-scale clusters help improve resource utilization. In the past year, we reduced the number of clusters from more than 100 to 20. While ensuring query efficiency, we will further increase the cluster size to reduce the number of clusters.

Alluxio Exploration

Exploring more application scenarios

Porting HDFS Authentication to Alluxio

HDFS RBF or Alluxio

Store MapReduce/Spark shuffle data in Alluxio to reduce disk storage pressure and speed up access to shuffle data.

We plan to port the custom authentication used on our HDFS to Alluxio.

We have tried HDFS Router-Based Federation (RBF), but its performance does not meet our online requirements. Alluxio also has forwarding capabilities, and we hope it will perform better; that is what we are working on.

Thank You!

