The Practice of Presto & Alluxio in E-Commerce Big Data Platform 2019-06-20 Tao Huang, JD.com Big Data Platform Engineer


The Practice of Presto & Alluxio in E-Commerce Big Data Platform

2019-06-20

Tao Huang, JD.com Big Data Platform Engineer

Contents

1. JD BDP: Introduction of JD.com BDP architecture
2. Practice with Presto in BDP: Introduction of Presto and practice in JD BDP
3. Presto & Alluxio Stack: Our use case of Presto & Alluxio
4. Ongoing Exploration: The features we are exploring

1. JD BDP

Cluster scale
• Tens of thousands of nodes
• Thousands of users

Computing ability
• Tens of PB offline data daily
• Millions of jobs daily

Storage capacity
• Hundreds of PB data
• Tens of PB daily increase

Business scale
• Tens of business units
• Hundreds of data models


BDP architecture

2. Practice with Presto in BDP


Presto Architecture

Our Work on Presto

01 Cluster Scaling
02 Job Isolation
03 ERP Authorization
04 Operation & Maintenance

Presto on YARN

• YARN: unified resource management
• Dynamic Resource: Presto worker scaling
• Configuration: configure Presto from the Web
• Plugin: load/unload plugins

PowerServer for operation and maintenance

Plugin manager
• export query result
• update plugin

Dynamic Configuration
• route query to cluster
• adjust resource group

ERP Authorization
• track users' queries
• security

Auto Maintenance
• dynamic auto-scale
• start/stop cluster


Intelligent Scheduler

Application Scenario

Periodic Queries
◉ controllable data range
◉ high query frequency
◉ high data reuse rate
◉ high proportion

Unpredictable Queries
◉ controllable data range
◉ low query frequency
◉ low data reuse rate
◉ low proportion


Presto Jobs in BDP

3. Presto & Alluxio Stack


Data Ecosystem with Alluxio


Presto + Alluxio = Better Together


Higher query throughput

Consistent low query latency

Eliminates network traffic

JD Contribution to Alluxio

Business Strategy

New Web UI
• ui-grid based sort/pagination/filter
• add an input field

Watermark Evict Strategy
• high watermark: start evict
• low watermark: stop evict

Cache Consistency
• check at startup
• check every time

JVM Pause Monitor
• monitor JVM pause periodically
• log message and metrics

Shell Command
• cp/ls/load/rm/format

Bugfix
• Deadlock
• thrift add timeout time
• …

Change Log Level
• shell
• RESTful API

Test
• SyncQuery
• AlluxioTools
• …

Watermark Evict Strategy

• Sync Evict Strategy
• Async Evict Strategy
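
The watermark rule named on the contribution slide (start evicting at the high watermark, stop at the low watermark) can be illustrated with a small sketch. This is not the Alluxio implementation: the class name `WatermarkCache`, the byte thresholds, and the least-recently-written victim order are all assumptions made for illustration, since the slides do not say how victims are chosen.

```python
from collections import OrderedDict


class WatermarkCache:
    """Toy block cache (hypothetical) that evicts between two watermarks."""

    def __init__(self, high, low):
        if not 0 <= low < high:
            raise ValueError("require 0 <= low < high")
        self.high = high             # usage above this starts eviction
        self.low = low               # eviction stops once usage is at or below this
        self.used = 0
        self.blocks = OrderedDict()  # block_id -> size, oldest entry first

    def put(self, block_id, size):
        """Store a block, then evict synchronously if the high watermark is crossed."""
        if block_id in self.blocks:
            self.used -= self.blocks.pop(block_id)
        self.blocks[block_id] = size
        self.used += size
        return self.evict_if_needed()

    def evict_if_needed(self):
        """Evict oldest blocks until usage falls to the low watermark."""
        evicted = []
        if self.used <= self.high:
            return evicted           # below the high watermark: nothing to do
        while self.used > self.low and self.blocks:
            block_id, size = self.blocks.popitem(last=False)
            self.used -= size
            evicted.append(block_id)
        return evicted
```

Here eviction runs inside `put`, which corresponds loosely to a synchronous strategy. An asynchronous variant would call `evict_if_needed` from a background thread so that writers do not wait for space to be freed.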

Cache Consistency

Keep Alluxio & HDFS Consistency

To ensure that dirty data is not read, there are three ways to trigger a file consistency check:

• RPC API: when a client requests metadata through getFileId, getFileInfo, listStatus, etc., the Alluxio master checks file cache consistency.
• RESTful API: calling reloadMetaData triggers Alluxio to reload all metadata.
• Alluxio Master startup: file cache consistency is checked while the master starts up.
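
The slide does not describe how the consistency check itself works. As a minimal sketch, assume a cached file counts as consistent when its length and modification time match the copy in the under store (HDFS). The names `FileMeta`, `check_consistency`, and `read_metadata` are hypothetical and are not Alluxio APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    length: int
    mtime: int  # last modification time, e.g. epoch milliseconds


def check_consistency(cached, under_store):
    """Return cached paths whose metadata no longer matches the under store.

    Both arguments map path -> FileMeta. A path is stale when it is missing
    from the under store or when its length or mtime differs.
    This mirrors a full check, such as one run at master startup.
    """
    return sorted(p for p, meta in cached.items() if under_store.get(p) != meta)


def read_metadata(path, cached, under_store):
    """Per-request check: verify one entry before serving it to a client."""
    fresh = under_store.get(path)
    if cached.get(path) != fresh:
        if fresh is None:
            cached.pop(path, None)  # file was deleted in the under store
        else:
            cached[path] = fresh    # reload the stale or missing entry
    return cached.get(path)
```

A full reload of all metadata, as the RESTful trigger does, would amount to replacing every stale entry reported by `check_consistency`.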


Presto on Alluxio

Why Presto on Alluxio?

High Performance

Consistent Low Query Latency

Eliminate Network Traffic

Others: Fault-tolerant & Pluggable

In adopting Alluxio for Presto, we made some changes and added some useful features:

• Alluxio led to a 10x performance improvement
• Hundreds of nodes
• More than 2 years in the production environment


4. Ongoing Exploration

Presto Exploration

Presto Master Load Balancing

As the amount of data grows, clusters get larger, and the number of query tasks increases, the Master will become a performance bottleneck. Improving Presto so that load on the Master is balanced will be a challenge.

Thread Level Resource Isolation

The execution tasks running on the workers compete for resources, especially jobs in the test phase. If we can restrict execution tasks with CGroups, it will reduce the mutual impact among queries.

Unify Larger Clusters

Large-scale clusters help improve resource utilization. In the past year, we have reduced the number of clusters from more than 100 to 20. While ensuring query efficiency, we will further increase the cluster size to reduce the number of clusters.

Alluxio Exploration

Exploring more application scenarios

Store MapReduce/Spark shuffle data in Alluxio to reduce disk storage pressure and speed up access to shuffle data.

Porting HDFS Authentication to Alluxio

We are going to port the custom authentication used on our HDFS to Alluxio.

HDFS RBF or Alluxio

We have tried HDFS router-based federation (RBF), but its performance does not meet our online requirements. We found that Alluxio also has forwarding capabilities, and we hope that it will perform better. That is what we are working on.


Thank You!
