Making HTAP Real
TiFlash: A Native Columnar Extension for TiDB


About Me

Dongxu (Ed) Huang, Co-founder & CTO @ PingCAP, a distributed system engineer / open source enthusiast / guitar hobbyist

● Projects: TiDB / TiKV

● Location: Beijing / San Francisco Bay Area

● Research interest: Distributed consensus algorithm / transaction / storage engine design / data structure

● Email: [email protected]

● Twitter: @dxhuang


PingCAP.com

Data Platform - What Application Devs Think It Is

[Diagram components: App, Databases, Reporting, BI, Console, Ad hoc]


Data Platform - What It Really Is

[Diagram components: App, OLTP DBs, ETL, Data Warehouse, Analytical DBs, Reporting, BI, Console, Ad hoc]


Why is it so complicated?

● Different access patterns

○ OLTP

■ Short / point access to a small number of records

■ Row-based format

○ OLAP

■ Large / batch processing of a subset of columns

■ Column-based format

● Workload interference/Resource isolation

○ OLAP queries can easily occupy a large amount of system resources

○ OLTP latency / concurrency will be dramatically impacted


A Popular Solution

● Different types of databases for different workloads

○ OLTP specialized database for transactional data

○ Hadoop / analytical database for historical data

● Offload transactional data via ETL process into Hadoop / analytical database

○ Periodically, usually per day


Good enough!

Really?


Data processing stack today

● Data cannot be seamlessly connected between different data stores


Data processing stack today

● Adding a new data source is hard

● It is painful to maintain multiple systems


Data processing stack today

● The data pipeline may easily become the bottleneck


Data processing stack today

● The ETL process usually loses the transaction info


A little bit about TiDB

● Full-featured SQL

○ MySQL compatibility

● ACID compliance

● HA with strong consistency

○ Multi-Raft

● Elastic scalability

● Open-source!

WASM TiDB: https://play.tidb.io


TiDB Architecture

[Diagram: TiDB / TiSpark form the Stateless SQL Layer on top of the Key-Value Layer. The Key-Value Layer consists of TiKV nodes 1-4 (Stores 1-4); replicas of Regions 1-4 are spread across the stores, and the replicas of one Region form a Raft Group.]


“Several TB” 1 TB ~ 200 TB


Well, TiDB is a Row-based Database!


If your database stores 100TB of data, and your database claims it supports full-featured SQL...


Row-based or Column-based?


Columnstore VS Rowstore

SELECT AVG(age) FROM emp;

Rowstore (values stored row by row):

id    name   age
0962  Jane   30
7658  John   45
3589  Jim    20
5523  Susan  52

Columnstore (values stored column by column):

id:   0962, 7658, 3589, 5523
name: Jane, John, Jim, Susan
age:  30, 45, 20, 52
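To make the difference concrete, here is a minimal Python sketch of the emp table in both layouts. It is only an illustration of the access pattern, not how TiKV or TiFlash lay out data on disk, and all variable and function names are made up for the example.

```python
# Toy illustration of the two layouts; all names are hypothetical.
rowstore = [
    ("0962", "Jane", 30),
    ("7658", "John", 45),
    ("3589", "Jim", 20),
    ("5523", "Susan", 52),
]

columnstore = {
    "id": ["0962", "7658", "3589", "5523"],
    "name": ["Jane", "John", "Jim", "Susan"],
    "age": [30, 45, 20, 52],
}


def avg_age_rowstore(rows):
    # Every row (all three fields) is visited just to pick out the age.
    total = 0
    for _id, _name, age in rows:
        total += age
    return total / len(rows)


def avg_age_columnstore(columns):
    # Only the "age" column is read; id and name are never touched.
    ages = columns["age"]
    return sum(ages) / len(ages)


print(avg_age_rowstore(rowstore))        # 36.75
print(avg_age_columnstore(columnstore))  # 36.75
```

Both functions return the same answer; the columnar version simply reads one third of the fields for this query.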


Columnstore VS Rowstore

● Columnstore

○ Bad for small random reads and random writes

○ Suitable for analytical workload

○ Efficient CPU utilization using vectorized processing

○ High compression rate

● Rowstore

○ Good for random reads and random writes

○ Researched and optimized for OLTP scenarios for decades

○ Cumbersome for analytical workload


Why not both?


TiFlash


What Is TiFlash?

● An extended analytical engine for TiDB

○ Columnar storage and vectorized processing

○ Partially based on ClickHouse with tons of modifications

○ Enterprise offering (may be open-sourced in the future)

● Data synchronization via extended Raft consensus algorithm

○ Strong consistency

○ Trivial overhead

○ Raft learner, a non-voting member in the Raft group

● Strict workload isolation to eliminate the impact on OLTP

● Native integration with TiDB


TiDB with TiFlash Architecture

[Diagram: TiDB / TiSpark on top of TiKV nodes 1-4 (Stores 1-4) and TiFlash nodes 1-2 (Stores 5-6). Replicas of Regions 1-4 are spread across the stores; the replicas of one Region form a Raft Group.]


Low-cost Data Replication

● Data is replicated to TiFlash via Raft Learner

○ Non-voting member

○ Extended Raft consensus algorithm

○ Async replication

○ Almost zero overhead to OLTP workload
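The reason a non-voting member adds so little overhead can be shown with a small Python sketch. It models only the commit rule (a majority of voters must acknowledge an entry) and is not TiKV's actual Raft implementation; the class, method, and node names are hypothetical.

```python
# Simplified model: a log entry commits once a majority of VOTERS
# acknowledge it. Learners receive the same log but never count.
class RaftGroup:
    def __init__(self, voters, learners):
        self.voters = set(voters)
        self.learners = set(learners)

    def quorum(self):
        # Majority of voting members only.
        return len(self.voters) // 2 + 1

    def is_committed(self, acks):
        # Acks from learners are ignored when counting votes.
        voter_acks = self.voters & set(acks)
        return len(voter_acks) >= self.quorum()


group = RaftGroup(
    voters=["tikv-1", "tikv-2", "tikv-3"],
    learners=["tiflash-1"],
)

# Committed with two TiKV acks, even if the learner has not replied yet.
print(group.is_committed(["tikv-1", "tikv-2"]))     # True
# A learner ack cannot replace a voter ack.
print(group.is_committed(["tikv-1", "tiflash-1"]))  # False
```

Because the learner never sits on the commit path, a slow or unavailable TiFlash replica does not delay OLTP writes in this model.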


Low-cost Data Replication

[Diagram: Region 1 has a Leader and two Followers on TiKV nodes, plus a Learner on TiFlash node 1. A TiKV Client talks to the Raft group.]


Strong Consistency

● Logically the same view as in rowstore

○ Same data

○ Same isolation level (SI)

● TiFlash keeps causal consistency via async replication

○ 99.99…% in-sync

○ 0.00…1% out-of-sync

● Read operation guarantees strong consistency

○ Learner Read

○ Multiversion concurrency control (MVCC)

[Diagram: a Raft Learner holding MVCC versions at TS:1, TS:2, TS:3]


Learner Read + MVCC

[Diagram: Raft Leader and Raft Learner, labeled 4 and 2. The Learner's MVCC versions: ..., TS:10 = $100. Read at Timestamp 11: balance = ?]


Learner Read + MVCC

[Diagram: Raft Leader and Raft Learner, both labeled 4. The Learner's MVCC versions: ..., TS:10 = $100, TS:11 = $200, TS:12 = $150. Read at Timestamp 11: balance = $200]
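The two slides can be read as a two-step protocol: first the learner catches up to the leader, then MVCC picks the right version. The Python sketch below follows that reading. It assumes the numbers next to the Leader and Learner are Raft log positions, which the slides do not state, and the class, method, and parameter names are hypothetical. A real learner would wait for replication rather than pull from a list.

```python
class Learner:
    def __init__(self):
        self.applied_index = 0
        self.versions = {}  # key -> list of (commit_ts, value), ascending ts

    def apply(self, index, key, commit_ts, value):
        # Apply one replicated log entry to the local MVCC store.
        self.versions.setdefault(key, []).append((commit_ts, value))
        self.applied_index = index

    def read(self, key, read_ts, leader_commit_index, pending_log):
        # Step 1 (learner read): catch up to the leader's commit index
        # before serving. Here the missing entries come from pending_log.
        while self.applied_index < leader_commit_index:
            self.apply(*pending_log.pop(0))
        # Step 2 (MVCC): return the newest version with commit_ts <= read_ts.
        visible = [v for ts, v in self.versions.get(key, []) if ts <= read_ts]
        return visible[-1] if visible else None


learner = Learner()
learner.apply(2, "balance", 10, 100)  # learner is at position 2

# Entries the leader has committed but the learner has not applied yet.
pending = [(3, "balance", 11, 200), (4, "balance", 12, 150)]

print(learner.read("balance", read_ts=11,
                   leader_commit_index=4, pending_log=pending))  # 200
```

Without step 1 the learner would have answered $100 from its stale data; without step 2 it would have answered $150, a version newer than the read timestamp.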


Vectorized Processing

SELECT SUM(foo) FROM Table WHERE foo > bar

[Diagram: Table Scan → Selection → Aggregation, each operating on Column Vector Blocks.
Table Scan: foo = (1, 10, 4, 5, 42), bar = (4, 1, 10, NIL, -5)
Selection (>): c1 = (0, 1, 0, NIL, 1), foo = (10, NIL, 42)
Aggregation (SUM): c2 = 52]
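The pipeline in the diagram can be sketched in plain Python, with each operator working on a whole column vector at a time instead of one row at a time. This is a toy model, not TiFlash code; the function names and the block size are assumptions, and NIL is represented as None.

```python
def scan_blocks(columns, block_size):
    # Table Scan: yield the table as blocks of column vectors.
    n = len(next(iter(columns.values())))
    for start in range(0, n, block_size):
        yield {name: col[start:start + block_size]
               for name, col in columns.items()}


def greater_than(left, right):
    # Selection: compare two vectors element-wise.
    # The result is NULL (None) when either side is NULL.
    return [None if a is None or b is None else int(a > b)
            for a, b in zip(left, right)]


def filter_vector(values, mask):
    # Keep values whose mask is 1; rows with 0 or NULL are dropped.
    return [v for v, m in zip(values, mask) if m == 1]


table = {"foo": [1, 10, 4, 5, 42], "bar": [4, 1, 10, None, -5]}

total = 0
for block in scan_blocks(table, block_size=1024):
    c1 = greater_than(block["foo"], block["bar"])
    total += sum(filter_vector(block["foo"], c1))

print(total)  # 52
```

Each step is a tight loop over one or two vectors, which is the property that lets a real engine use CPU caches and SIMD instructions efficiently.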


DeltaMerge Engine

[Diagram: a segment in DeltaMerge Engine, made of a Delta Space (inserts / deletes, appended as Ins / Del entries, with a DeltaIndex) and a Stable Space.]


DeltaMerge Engine

[Diagram: LSM-Tree VS DeltaMerge for SELECT ... WHERE x BETWEEN 150 AND 160.
LSM-Tree: writes go into Level 0, Level 1, ..., Level n.
DeltaMerge: appends go into per-segment Stable + Delta pairs: Segment 0 [-inf, 100), Segment 1 [100, 200), ..., Segment n [200, +inf).]
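The contrast with an LSM-Tree is that a range query only needs the segments whose key range overlaps it. The Python sketch below models that idea with range-partitioned segments, each holding a stable part and an append-only delta. It is a heavily simplified illustration under assumed names (Segment, DeltaMergeTable, split_points), not the actual DeltaMerge implementation, and it omits the DeltaIndex entirely.

```python
from bisect import bisect_right


class Segment:
    def __init__(self):
        self.stable = {}  # key -> value, already merged data
        self.delta = []   # append-only log of ("ins" | "del", key, value)

    def append(self, op, key, value=None):
        self.delta.append((op, key, value))

    def rows(self):
        # Read path: stable data overlaid with the delta, in append order.
        merged = dict(self.stable)
        for op, key, value in self.delta:
            if op == "ins":
                merged[key] = value
            else:
                merged.pop(key, None)
        return merged

    def merge_delta(self):
        # Background step: fold the delta into the stable space.
        self.stable = self.rows()
        self.delta = []


class DeltaMergeTable:
    def __init__(self, split_points):
        # split_points [100, 200] -> [-inf, 100), [100, 200), [200, +inf)
        self.split_points = split_points
        self.segments = [Segment() for _ in range(len(split_points) + 1)]

    def segment_index(self, key):
        return bisect_right(self.split_points, key)

    def insert(self, key, value):
        self.segments[self.segment_index(key)].append("ins", key, value)

    def delete(self, key):
        self.segments[self.segment_index(key)].append("del", key)

    def scan(self, low, high):
        # Only segments overlapping [low, high] are touched.
        first, last = self.segment_index(low), self.segment_index(high)
        touched = list(range(first, last + 1))
        result = {}
        for i in touched:
            for key, value in self.segments[i].rows().items():
                if low <= key <= high:
                    result[key] = value
        return touched, dict(sorted(result.items()))


table = DeltaMergeTable([100, 200])
for key in (50, 120, 155, 158, 250):
    table.insert(key, f"row-{key}")
table.delete(158)

touched, rows = table.scan(150, 160)
print(touched, rows)  # [1] {155: 'row-155'}
```

The query for keys 150 to 160 reads only Segment 1, whereas a leveled LSM-Tree may have to consult every level because any level can contain keys in that range.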


TiFlash is beyond columnar storage


Scalability

● TiDB relies on Multi-Raft for scalability

○ One command to add / remove a node

○ Scaling is fully automatic

○ Smooth and painless data rebalancing

● TiFlash fully inherits these abilities


Isolation

● Perfect resource isolation to prevent workload interference

● Dedicated nodes for TiFlash

● Nodes are clustered into “zones”

○ TP Zone

■ TiKV nodes, for OLTP workload

○ AP Zone

■ TiFlash nodes, for OLAP workload


Isolation

[Diagram: one TiDB / TiSpark instance sends the OLTP workload to the TP Zone (TiKV nodes 1-4, Stores 1-4), another sends the OLAP workload to the AP Zone (TiFlash nodes 1-2, Stores 5-6). Replicas of Regions 1-4 are spread across the stores.]


Integration

● TiDB / TiSpark might choose to read from either side

○ Based on cost

○ Columnstore is treated as a special kind of index

● Upon TiFlash replica failure, reads fall back to the TiKV replica transparently

● Join data from both sides in a single query
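A toy Python sketch of cost-based selection with fallback is shown below. The cost formulas, function names, and all numbers are invented for illustration and do not reflect TiDB's optimizer; the only ideas taken from the slide are that the choice is made by cost and that the row store remains available when the columnar replica is not.

```python
def rowstore_cost(table_rows, matched_rows, total_columns, has_index):
    # Hypothetical model: with a usable index only the matched rows are
    # read, but every read fetches the full row.
    rows_read = matched_rows if has_index else table_rows
    return rows_read * total_columns


def columnstore_cost(table_rows, needed_columns):
    # Hypothetical model: the whole table is scanned, but only the
    # columns the query needs.
    return table_rows * needed_columns


def choose_path(table_rows, matched_rows, total_columns, needed_columns,
                has_index, tiflash_available=True):
    costs = {"tikv": rowstore_cost(table_rows, matched_rows,
                                   total_columns, has_index)}
    if tiflash_available:
        costs["tiflash"] = columnstore_cost(table_rows, needed_columns)
    # Pick the cheapest path among those that are currently available.
    return min(costs, key=costs.get)


# Selective lookup through an index: the row store wins.
print(choose_path(1_000_000, 10, 20, 2, has_index=True))            # tikv
# Full scan over two columns: the column store wins.
print(choose_path(100_000_000, 100_000_000, 20, 2, has_index=False))  # tiflash
# Same scan with the TiFlash replica down: fall back to the row store.
print(choose_path(100_000_000, 100_000_000, 20, 2, has_index=False,
                  tiflash_available=False))                          # tikv
```

Treating the column store as just another access path, alongside indexes, is what lets one query mix both sides, as in the join example on the next slide.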


Integration

SELECT AVG(s.price) FROM prod p, sales s WHERE p.pid = s.pid AND p.batch_id = 'B1328';

[Diagram: TiDB / TiSpark over TiKV nodes 1-4 (Stores 1-4) and TiFlash nodes 1-2 (Stores 5-6). The query combines IndexScan prod(pid, batch_id = 'B1328') with TableScan sales(price, pid).]


Performance

● Comparable performance against Parquet format

○ Underlying storage format supports Multi-Raft + MVCC

● Benchmark against Apache Spark 2.3 on Parquet

○ Pre-POC version of TiFlash + Spark


WIP Items

● Native MPP

● Direct Write as Columnar format


Massively Parallel Processing (MPP) Support

● TiFlash nodes form an MPP cluster by themselves

● Full computation support will

○ Further speed up TiDB by pushing down more computations

○ Speed up TiSpark by avoiding disk writes during shuffle
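As a rough picture of what an in-memory shuffle between workers looks like, the Python sketch below runs a grouped SUM in MPP style: rows are exchanged by hash of the group key, each worker aggregates its share, and a coordinator merges the results. This is a generic MPP pattern under assumed names, not the design of TiFlash's MPP layer, which was still work in progress at the time of the talk.

```python
from collections import defaultdict


def exchange(partitions, num_workers):
    # Data exchange: reshuffle rows in memory so that equal keys end up
    # on the same worker.
    shuffled = [[] for _ in range(num_workers)]
    for rows in partitions:
        for key, value in rows:
            shuffled[hash(key) % num_workers].append((key, value))
    return shuffled


def worker_sum(rows):
    # Each worker aggregates only the keys routed to it.
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def coordinator(partitions, num_workers):
    partials = [worker_sum(rows) for rows in exchange(partitions, num_workers)]
    merged = {}
    for partial in partials:
        # Key sets are disjoint after the exchange, so a plain update works.
        merged.update(partial)
    return merged


# Three nodes, each holding a slice of (group_key, value) rows.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5)],
]

result = coordinator(partitions, num_workers=3)
print(sorted(result.items()))  # [('a', 4), ('b', 7), ('c', 4)]
```

The shuffle never touches disk here, which is the property the last bullet refers to.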


MPP Support

[Diagram: TiDB / TiSpark (Coordinator) above TiFlash nodes 1-3, each running an MPP Worker. Labels: Plan Segment, Data Exchange.]


Direct Write as Columnar format

● For now, TiFlash is only mirroring data from TiKV

○ Users need to write data into TiKV (row-wise format) first

● We are working on direct columnar write support

○ Faster batch writes for data warehousing applications

○ No mandatory 3 TiKV replicas anymore

○ Cheaper historical data storage


TiDB Data Platform


Data Platform - What It Could Be

[Diagram components: App, TiDB with TiFlash, Reporting, BI, Console, Ad hoc]


“What happened yesterday?”

VS

“What’s going on right now?”


Roadmap

● 1st round Beta now

○ With columnar engine and isolation ready

■ Access only via Spark

● Nov 2nd round Beta

○ With CBO and TiDB integration ← We're here.

● GA with TiDB 4.0 (around 2020 1Q)

○ Unified coprocessor layer

■ Ready for both TiDB / TiSpark

■ Cost based access path selection