Growing in the Wild. The story by CUBRID Database Developers. Esen Sagynov (@CUBRID), NHN Corporation, Service Platform Development Center; Eugen Stoianovici, NHN Corporation, CUBRID Development Lab. Monday, April 2, 2012.


DESCRIPTION

The presentation the CUBRID team gave at the Russian Internet Technologies Conference in 2012. It covers *WHY* CUBRID was developed, *WHY* the developers did not fork an existing solution, *WHY* it was necessary to develop a new RDBMS from scratch, and *HOW* CUBRID Database has evolved over the years.


Page 1: Growing in the Wild. The story by CUBRID Database Developers

Growing in the Wild. The story by CUBRID Database Developers.

Esen Sagynov (@CUBRID), NHN Corporation, Service Platform Development Center

Eugen Stoianovici, NHN Corporation, CUBRID Development Lab

Monday, April 2, 2012

Page 2: Growing in the Wild. The story by CUBRID Database Developers

Who are we?

• Eugen Stoianovici
  – CUBRID Engine Team
  – [email protected]

• Esen Sagynov @CUBRID
  – CUBRID Project Manager
  – [email protected]

Page 3: Growing in the Wild. The story by CUBRID Database Developers

Purpose of this presentation

This is what I remember from every presentation that I’ve attended. Not the details.

1. “Some guys talked about some cool stuff they encountered in applications (don't remember what)”

2. “There's a database that they use for this type of application, it's open source and saves you a lot of trouble (don't remember what trouble exactly).”

3. “They're really keen on doing things right.”

Page 4: Growing in the Wild. The story by CUBRID Database Developers

You will learn…

Reasons behind CUBRID development.

What CUBRID has to offer. Benefits & advantages.

What we have learnt so far. Where we are heading to.

Page 5: Growing in the Wild. The story by CUBRID Database Developers

CUBRID Facts

• RDBMS
• True Open Source @ www.cubrid.org
• Optimized for Web services
• High performance 3-tier architecture
• Large DB support
• High-Availability feature
• DB Sharding support
• MySQL compatible SQL syntax
• ACID Transactions
• Online Backup

Page 6: Growing in the Wild. The story by CUBRID Database Developers

Reasons Behind CUBRID Development

Page 7: Growing in the Wild. The story by CUBRID Database Developers
Page 8: Growing in the Wild. The story by CUBRID Database Developers

[Map: Korea, Japan, USA, China]

30,000+ Web Servers

150+ Web Services

Page 9: Growing in the Wild. The story by CUBRID Database Developers

[Figure: services labeled by region (Korea, Japan, USA) and platform (iOS & Android)]

30,000+ Web Servers

150+ Web Services

Oracle, MSSQL, MySQL, CUBRID, NoSQL

Page 10: Growing in the Wild. The story by CUBRID Database Developers

Disadvantages of existing solutions

1. High License Cost
   1. Over 10,000 servers @ NHN

2. Third-party solution
   1. No ownership of the code base
   2. Additional $$$ for customizations
   3. Branch tech support is not enough
   4. Communication barriers w/ vendors
   5. Slow updates & fixes

Page 11: Growing in the Wild. The story by CUBRID Database Developers

Fork or Start from Scratch?

Fork:
• No full ownership
• Time to learn the code base
• Fixed architecture
• Understand the design philosophy

Start from scratch:
• Full ownership
• Time to develop
• Custom more advanced architecture and design

Page 12: Growing in the Wild. The story by CUBRID Database Developers

Benefits of in-house solution

Existing solutions:

1. High License Cost
   1. Over 10,000 servers @ NHN

2. Third-party solution
   1. No ownership of the code base
   2. Additional $$$ for customizations
   3. Communication barriers w/ vendors
   4. Slow updates & fixes

In-house solution:

1. No License Cost

2. Core Technological Asset
   1. Complete control of the code base
   2. No additional $$$ for customizations
   3. No communication barriers
   4. Fast updates & fixes

3. Key Storage Technology Skills
   1. Grow our developers
   2. Export developers

4. New Database Solution Service
   1. Provide CUBRID service to other platforms
   2. Instant reaction to customer issues

5. Recurring Key Technology
   1. High-Availability
   2. Sharding
   3. Rebalancing
   4. Cluster
   5. etc.

Page 13: Growing in the Wild. The story by CUBRID Database Developers

CUBRID Goal

Stability
• Human vs. DB Errors
• # of customers

Performance
• Smart Index Optimizations
• Shared Query Caching
• Web Optimized Features
• Load Balancer

Scalability
• High-Availability w/ auto fail-over
• Sharding
• Data Rebalancer
• Cluster

Ease of Use
• SQL & API Compatibility
• Native Migration Tool
• Native GUI DB Management Tools
• Monitoring Tools

Page 14: Growing in the Wild. The story by CUBRID Database Developers

#1

Performance

Page 15: Growing in the Wild. The story by CUBRID Database Developers

Client Requests: Performance UP!

Types of Web Services

Main operations        Example
READ > 95%             News, Wiki, Blog, etc.
READ:WRITE = 70:30%    SNS, Push services, etc.
WRITE > 90%            Log monitoring, Analytics.

90% of Web Services

How & What to improve

CRUD      WHY?
SELECT    Fast searching, avoid sequential scan and ORDER BY
INSERT    Concurrent WRITE performance, reduce I/O, and fast searching
UPDATE    Fast searching, improve lock mechanism
DELETE    Fast searching

Page 16: Growing in the Wild. The story by CUBRID Database Developers

Phase               Goal                               Feature
Phase 1 v1.0 ~ 2.0  SELECT Performance +               Shared Query Plan Caching
Phase 2 v8.2.2      INSERT & DELETE Performance +      Space Reusability Improvement
Phase 3 v8.4.0      SELECT Performance ++              Covering Index, Key limit, etc.
Phase 4 v8.4.1      INSERT & UPDATE Performance ++     Memory Buffer Mgmt. Improvements
Phase 5 Apricot     INSERT Performance +++             Filter index, Skip index, etc.
Phase 6 Banana      SELECT Performance ++++            Optimize JOINs

Also shown on the slide:
• DB & Index Volume Optimizations
• API Performance +
• Windows Performance +
• TPS: 15%, 10%, 270%, 70%
• Smart Indexing
• MySQL SELECT performance < CUBRID SELECT performance
• MySQL INSERT performance < CUBRID INSERT performance

Page 17: Growing in the Wild. The story by CUBRID Database Developers

Random INSERT Performance

CREATE TABLE users(
  id INTEGER UNIQUE,
  username VARCHAR(255),
  last_posted INTEGER
);

CREATE TABLE forum_posts(
  user_id INTEGER,
  post_moment INTEGER,
  post_text VARCHAR(64)
);

CREATE INDEX i_forum_posts_post_moment ON forum_posts (post_moment);
CREATE INDEX i_forum_posts_post_moment_user_id ON forum_posts (post_moment, user_id);

SELECT username FROM users WHERE id = ?;

INSERT INTO forum_posts(user_id, post_moment, post_text) VALUES (?, ?, ?);

UPDATE users SET last_posted = ? WHERE id = ?;

Page 18: Growing in the Wild. The story by CUBRID Database Developers

Random INSERT Performance

• Users
  – 100,000 rows prepopulated

• Test
  – CUBRID vNext (code name Apricot)
  – MySQL 5.5.21
  – 40 workers
  – 1 hour
  – Record QPS every 2 minutes

Page 19: Growing in the Wild. The story by CUBRID Database Developers

Random INSERT Performance

[Chart: CUBRID QPS decrease with DataSet size. X axis: data set size, 0 to 13,085,280. Y axis: queries per second, 0 to 5000.]

Average = 3685, Max = 4469, Min = 2821

Page 20: Growing in the Wild. The story by CUBRID Database Developers

Random INSERT Performance

[Chart: MySQL QPS decrease with DataSet size. X axis: data set size, 0 to 6,334,749. Y axis: queries per second, 0 to 10000.]

Average = 1796, Max = 8951, Min = 1122

Page 21: Growing in the Wild. The story by CUBRID Database Developers

Random INSERT Performance

[Chart: PostgreSQL QPS decrease with DataSet size. X axis: data set size, 0 to 2,117,610. Y axis: queries per second, 0 to 7000.]

Average = 594, Max = 6217, Min = 181

Page 22: Growing in the Wild. The story by CUBRID Database Developers

Random INSERT Performance

[Chart: QPS decline over one hour. Series: MySQL QPS, CUBRID QPS, PostgreSQL QPS. X axis: data set size, 0 to 13,085,280. Y axis: queries per second, 0 to 10000.]

Page 23: Growing in the Wild. The story by CUBRID Database Developers

CUBRID Optimizations

Index Features
• Reverse Index
• Prefix Index
• Function Index
• Filter Index
• Unique Index
• Primary Key
• Foreign Key

Query Features
• Multi-range key limit
• Index skip scan
• Skip order by
• Skip group by
• Range Scan optimizations
• Query rewrites
• Covering Index
• Descending Index

Server level optimizations
• Log compression
• Shared Query Plan cache
• Locking Optimizations
• Transaction concurrency

Page 24: Growing in the Wild. The story by CUBRID Database Developers

Filter Index

• Interesting (open) tickets fit into a very small index.
• No overhead for INSERT/UPDATE
• Very fast results for open tickets

CREATE INDEX ON tickets(component, assignee)
WHERE status = 'open';

SELECT title, component, assignee FROM tickets
WHERE register_date > '2008-01-01' AND status = 'open';
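The filter index idea can be tried out locally. The sketch below is an illustration under stated assumptions, not CUBRID itself: it uses SQLite's partial indexes through Python's standard sqlite3 module, the index name i_tickets_open and the sample data are made up, and the query filters on component rather than register_date so that the index columns are actually used.

```python
import sqlite3


def build_demo():
    """Create a tickets table in which only 1 row in 100 is open."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tickets(id INTEGER PRIMARY KEY, title TEXT, "
        "component TEXT, assignee TEXT, status TEXT)"
    )
    rows = [
        (f"ticket {i}", f"comp{i % 10}", f"dev{i % 7}",
         "open" if i % 100 == 0 else "closed")
        for i in range(10_000)
    ]
    conn.executemany(
        "INSERT INTO tickets(title, component, assignee, status) VALUES (?, ?, ?, ?)",
        rows,
    )
    # Partial (filter) index: only rows with status = 'open' get an index entry.
    conn.execute(
        "CREATE INDEX i_tickets_open ON tickets(component, assignee) "
        "WHERE status = 'open'"
    )
    conn.commit()
    return conn


def query_plan(conn, sql):
    """Return SQLite's textual query plan for a statement."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = build_demo()
plan = query_plan(
    conn,
    "SELECT title FROM tickets WHERE component = 'comp0' AND status = 'open'",
)
open_count = conn.execute(
    "SELECT COUNT(*) FROM tickets WHERE status = 'open'"
).fetchone()[0]
print(plan)
print(open_count, "of 10000 rows are covered by the filter index")
```

The plan names the partial index because the query repeats the index's own condition as a literal; the index holds entries only for the open rows, which is what keeps it small.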

Page 25: Growing in the Wild. The story by CUBRID Database Developers

QPS Filter vs. Full index

[Chart: series QPS Full Index and QPS Filter Index. X axis: 0 to 10,000,000. Y axis: queries per second, 0 to 7000.]

Page 26: Growing in the Wild. The story by CUBRID Database Developers

CUBRID Architecture

API: CCI, JDBC, ADO.NET, OLEDB, ODBC, PHP, Perl, Python, Ruby

Broker: Query Parser, Query Optimizer, Query Planner

Server: Query Manager, Query Executor, Transaction Manager, Lock Manager, Log Manager, Storage Manager, File Manager

Page 27: Growing in the Wild. The story by CUBRID Database Developers

Parameterized Queries & Filter Index

• PostgreSQL: Will not use partial index
• MS SQL Server: Provides workaround
• ORACLE: Less flexible, has to be the exact expression
• CUBRID: “Shared” Query Plan Cache

SELECT title, component, assignee FROM tickets
WHERE register_date > ? AND status = ?;

SELECT name, email FROM users
WHERE register_date > ? AND age < ? AND age < 18;

Page 28: Growing in the Wild. The story by CUBRID Database Developers

Query Plan Cache

• PostgreSQL: Cache a plan for the life-span of a driver level prepared statement
• MySQL: No query plan cache
• CUBRID: “Shared” Query Plan Cache

Page 29: Growing in the Wild. The story by CUBRID Database Developers

Query Plan Cache

Query Execution without Plan Cache:
Parse SQL → Name Resolving → Semantic check → Query Optimize → Query Plan → Query Execution

Query Execution with Plan Cache:
Parse SQL → Get Cached Plan → Query Execution

Page 30: Growing in the Wild. The story by CUBRID Database Developers

Auto Parameterization

SELECT title, component, assignee FROM tickets
WHERE register_date > '2008-01-01' AND status = 'open';

is rewritten to:

SELECT title, component, assignee FROM tickets
WHERE register_date > ? AND status = ?;
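How auto parameterization lets different statements share one cached plan can be sketched in a few lines. This is a toy model under stated assumptions, not CUBRID's implementation: literals are found with a simple regular expression rather than a SQL parser, and the names auto_parameterize and SharedPlanCache, as well as the planner callback, are hypothetical.

```python
import re

# Matches single-quoted string literals (with '' escapes) and bare numeric literals.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def auto_parameterize(sql):
    """Replace literals with '?' and return (template, extracted literals)."""
    params = []

    def _swap(match):
        params.append(match.group(0))
        return "?"

    return _LITERAL.sub(_swap, sql), params


class SharedPlanCache:
    """Plan cache keyed by the parameterized template, shared by all callers."""

    def __init__(self, planner):
        self._planner = planner  # hypothetical callback: template -> plan
        self._plans = {}
        self.hits = 0
        self.misses = 0

    def get_plan(self, sql):
        template, params = auto_parameterize(sql)
        if template in self._plans:
            self.hits += 1
        else:
            self.misses += 1
            self._plans[template] = self._planner(template)
        return self._plans[template], params
```

Two statements that differ only in their literals map to the same template, so the second one skips the optimizer and reuses the cached plan.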

Page 31: Growing in the Wild. The story by CUBRID Database Developers

#2

Scalability

Page 32: Growing in the Wild. The story by CUBRID Database Developers

Scalability challenges

• How to synchronize?
  – Async

• Load balancing?
  – Third-party solution

• Who handles Fail-over?
  – Application
  – Third-party solution

• Cost?

Page 33: Growing in the Wild. The story by CUBRID Database Developers

HA solutions

DBMS                             Cost   Disk-shared        Replication      Consistency             Auto-Failover
Oracle RAC                       +++++  Shared everything  N/A              N/A                     O
MS-SQL Cluster                   +++    Shared everything  N/A              N/A                     O
MySQL Cluster                    ++     Shared nothing     Log Based        Async, Sync             O
MySQL Replication + Third-party  Free   Shared nothing     Statement Based  Async                   O
CUBRID                           Free   Shared nothing     Log Based        Sync, Semi-sync, Async  O

Page 34: Growing in the Wild. The story by CUBRID Database Developers

Client Requests

1. Non-stop 24/7 service uptime
2. No missing data between nodes

Phases: Phase 1 (v8.1.0), Phase 2 (v8.2.x), Phase 4 (v8.3.x), Phase 5 (v8.4.x), Phase 6 (Apricot)

Milestones and features across the phases:
• Replication
• HA Support
• Extended HA features
• HA Monitoring + Easy Admin Scripts
• Async Auto Fail-over
• HA Status Monitoring
• HA Performance +
• Reduce Replication Delay Time
• CUBRID Heartbeat
• HA + Replica
• Admin Scripts
• Read-Write Service during DB maintenance
• Async, Semi-sync, Sync
• Broker Modes (RW, RO)

Page 35: Growing in the Wild. The story by CUBRID Database Developers

N:N Master:Slave

http://www.cubrid.org/cubrid_ha_oscon

• 1:1 M:S
• 1:N M:S
• 1:1:N M:S:R
• N:N M:S
• N:1 M:S

Page 36: Growing in the Wild. The story by CUBRID Database Developers

CUBRID HA: Benefits

• Non-stop maintenance
• Auto Fail-over
• Large Installations are Easy
• Load balancing
• Accurate and reliable Failure detection
• Various Master-Slave Configurations:
  – 3 replication modes
  – 3 broker modes

Page 37: Growing in the Wild. The story by CUBRID Database Developers

Database Sharding

• Partitioning
  Divide the data between multiple tables within one Database Instance

• Sharding
  Divide the data between multiple tables created in separate Database Instances

[Diagram: with partitioning, one DB holds X, Y, and Z; with sharding, X, Y, and Z each live in a separate DB (shard)]
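The difference between the two comes down to where a row's key sends it: to a table inside one instance, or to a separate instance. A minimal routing sketch follows, under stated assumptions: the ShardDirectory class, the instance names, and the CRC32-modulo rule are all illustrative and do not describe how CUBRID SHARD assigns shard IDs.

```python
import zlib


class ShardDirectory:
    """Hypothetical metadata directory: maps a shard key to one database instance."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one database instance is required")
        self._instances = list(instances)

    def shard_id(self, shard_key):
        # CRC32 is stable across processes, unlike Python's built-in hash() for strings.
        return zlib.crc32(str(shard_key).encode("utf-8")) % len(self._instances)

    def instance_for(self, shard_key):
        return self._instances[self.shard_id(shard_key)]


def group_by_shard(directory, keys):
    """Group keys by the instance that owns them, e.g. to batch per-shard queries."""
    groups = {}
    for key in keys:
        groups.setdefault(directory.instance_for(key), []).append(key)
    return groups
```

A directory built as ShardDirectory(["db_x", "db_y", "db_z"]) always sends the same key to the same instance. A plain modulo rule like this one remaps most keys when the number of instances changes, which is one reason a separate data rebalancing step is needed when shards are added.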

Page 38: Growing in the Wild. The story by CUBRID Database Developers

Without Database Sharding

[Diagram labels: App, Broker, DB, Tbl1, Tbl2, Tbl3, Tbl4]

Page 39: Growing in the Wild. The story by CUBRID Database Developers

With Database Sharding

[Diagram labels: App, Broker, Metadata Directory, DB, Tbl1, Tbl2, Tbl3, Tbl4]

Page 40: Growing in the Wild. The story by CUBRID Database Developers

CUBRID SHARD

Phases: Phase 1 (Apricot), Phase 2 (Banana)

• Unlimited Shards
• Data Rebalancing
• Multiple Shard ID Gen. Algorithm
• Connection & Statement Pooling
• Load Balancing
• HA Support
• CUBRID, MySQL, Oracle Support

Page 41: Growing in the Wild. The story by CUBRID Database Developers

Sharding: Benefits

• Developer friendly
  – Single database view
  – No more application logic
  – No application changes

• Multiple sharding strategies
• Native scale-out support
• Load balancing
• Support for heterogeneous databases

Page 42: Growing in the Wild. The story by CUBRID Database Developers

#3

Ease of Use

Page 43: Growing in the Wild. The story by CUBRID Database Developers

Client Requests: SQL Compatibility

Phases: Phase 1 (v.8.2.x), Phase 2 (v.8.3.x), Phase 4 (v8.4.x), Phase 6 (Apricot)

Compatibility targets, in phase order: Oracle, MySQL, MySQL, MySQL and Oracle

Features and milestones across the phases:
• Hierarchical Query
• SQL: 60+, PHP: 20+
• SQL: 70+, PHP: 20+
• Currency SQL
• LOB, API++
• Implicit Type Conversion+
• Usability+
• Usability+++
• RegExpr
• MSSQL win-back
• MySQL, Oracle win-back: Monitoring system
• Oracle: Ads, Shopping

> 90% MySQL SQL Compatibility

Page 44: Growing in the Wild. The story by CUBRID Database Developers

Client Requests

1. API Support
2. Ease of Migration
3. Usability

Tools, by phase (Phase 1 v.8.1.x, Phase 2 v.8.3.x, Phase 3 v.8.4.x, Phase 3 Apricot):
• CM
• CM, CQB, CMT
• CUNITOR
• Web manager
• CM Monitoring ++

API support:

Phase 1 v.8.1.x    CCI, JDBC, OLEDB
Phase 2 v.8.2.x    PHP, Python, Ruby
Phase 3 v.8.3.x    ODBC
Phase 4 v.8.4.x    Perl, ADO.NET

Page 45: Growing in the Wild. The story by CUBRID Database Developers

MSSQL Win-Back in 2010

[Step1] Dual Write
  Application → Dual Read/Writer → MS SQL and CUBRID

[Step2] Dual Write and Read
  Application → Dual Read/Writer → MS SQL and CUBRID

[Step3] Win-back Complete
  Application → CUBRID (Read/Write)

• 16 Master/Slave servers and 1 Archive server
• DB size:
  – 0.4~0.5 billion/DB, Total 4 billion records
  – Total 3.2 TB
  – Total 4,000 ~ 5,000 QPS
• Saved money on MSSQL License and SAN Storage

Page 46: Growing in the Wild. The story by CUBRID Database Developers

Oracle Win-Back in 2011

System Monitoring Service

[Diagram: ORACLE Enterprise and ORACLE Standard servers replaced by CUBRID servers. Labels: 1 server, 40 servers, 25 servers]

• DB size:
  – 1.5 ~ 2.0 TB/DB, Total 40 TB
  – 10~100K Inserts per second
• Saved money on Oracle License and SAN Storage

Page 47: Growing in the Wild. The story by CUBRID Database Developers

What we have learnt so far and where we are heading

Page 48: Growing in the Wild. The story by CUBRID Database Developers

What we have learnt so far

• Not easy to break users’ habits.
• Need time.
• Technical support is the key to acceptance!
• Some services don’t deserve Oracle.

Page 49: Growing in the Wild. The story by CUBRID Database Developers

CUBRID Deployment in NHN

[Chart: cumulative number of services (axis 0 to 140) and deployments (axis 0 to 500) per quarter]

Period     ∑ services   ∑ deployments
~2009      42           166
2010-1Q    50           181
2010-2Q    60           208
2010-3Q    69           259
2010-4Q    77           273
2011-1Q    82           283
2011-2Q    94           312
2011-3Q    100          326
2011-4Q    107          346
2012-1Q    117          500

Page 50: Growing in the Wild. The story by CUBRID Database Developers

CUBRID Achievements

Stability
• Human vs. DB Errors
• # of customers

Performance
• Smart Index Optimizations
• Shared Query Caching
• Web Optimized Features
• Load Balancer

Scalability
• High-Availability w/ auto fail-over
• Sharding
• Data Rebalancer
• Cluster

Ease of Use
• > 90% MySQL SQL Compatibility
• Native Migration Tool
• Native GUI DB Management Tools
• Monitoring Tools

Page 51: Growing in the Wild. The story by CUBRID Database Developers

CUBRID Roadmap

8.4.x

• Performance++: Covering index, Key limit, Range scan
• SQL Compatibility+: 70+ new syntax
• HA++: Monitoring tools
• I18N, L10N: 2~3 European charsets
• SQL Compatibility++: Cursor holdability, Mass table UPDATE & DELETE
• DB SHARDING
• I18N, L10N+: more charsets
• Performance+++
• SQL monitoring performance+
• SQL Compatibility+++
• Table Partitioning Improvements
• DB SHARDING+
• Performance++++
• CUBRID Lite
• SQL Compatibility++++
• DB Monitoring Improvements
• Arcus Caching Integration

Page 52: Growing in the Wild. The story by CUBRID Database Developers

CUBRID is Big now.

What can you do?

1. Keep watching it
2. Consider using it
3. Discuss, talk, write about CUBRID
4. Support CUBRID in your apps
5. Contribute to CUBRID
6. Provide CUBRID service

Page 54: Growing in the Wild. The story by CUBRID Database Developers

. . .

• How do CUBRID developers cope with stress?
  – Join MySQL issue tracker ;)

• Want more?
  – Follow us to the next room. We’ll have more discussions!