20141206 4 q14_dataconference_i_am_your_db

I’m your DB (I need a database that scales)

FB/hyeongchae.lee

4Q14 DataConference.IO 1


I’m your DB! May the oracle be with you

Agenda

• About me

• DBMS vs NoSQL

• Local vs Global

• So... which databases scale?

• Amazon Aurora


ABOUT ME

----------------------------



INERVIT : MobileLite

NHN : CUBRID

TELCOWARE : Telcobase

ALTIBASE : Altibase

TIBERO : Tibero


Global Open Frontier Full-time

• Project : MySQL Redis Plug-in ( +MariaDB, +MaxScale )

– https://github.com/sql2/MySQL_Redis_Plugin_Dev


MySQL Memcached Plug-in


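[Diagram: the Application reaches the InnoDB Storage Engine inside mysqld by two paths. SQL: MySQL Server → Handler API → InnoDB Storage Engine. Memcached protocol: Memcached plugin (innodb_memcache, with an optional local cache) → InnoDB API → InnoDB Storage Engine.]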

MySQL Redis Plug-in


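[Diagram: the Application reaches the InnoDB Storage Engine inside mysqld by two paths. SQL: MySQL Server → Handler API → InnoDB Storage Engine. Redis protocol: Redis plugin (innodb_redis, with an optional local cache) → InnoDB API → InnoDB Storage Engine.]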

2015 : MaxScale Redis Cluster Plug-in


URL : https://mariadb.com/blog/maxscale-proxy-mysql-replication-relay

DBMS VS NoSQL


Rank  Last Month  DBMS                  Database Model     Score     Changes
1     1           Oracle                Relational DBMS    1452.13   -19.77
2     2           MySQL                 Relational DBMS    1279.08   +16.11
3     3           Microsoft SQL Server  Relational DBMS    1220.20   +0.59
4     4           PostgreSQL            Relational DBMS    257.36    -0.36
5     5           MongoDB               Document store     244.73    +4.33
6     6           DB2                   Relational DBMS    206.23    -1.44
7     7           Microsoft Access      Relational DBMS    138.84    -2.80
8     8           SQLite                Relational DBMS    95.28     +0.33
9     10          Cassandra             Wide column store  91.99     +6.29
10    9           Sybase ASE            Relational DBMS    84.62     -2.17

DB-Engines Ranking


2014.11.24

http://db-engines.com/en/ranking


http://db-engines.com/en/ranking_categories

Winner !!


Magic Quadrant for Operational Database Management Systems


1 Oracle's Letter to the EU Concerning MySQL

After an antitrust investigation, the European Commission approved Oracle's acquisition of Sun Microsystems, including MySQL, on 21 January 2010.

WikiLeaks subsequently published cables indicating that the Obama administration pressured the EU to approve the deal.

Concerns about the MySQL acquisition had been addressed in Oracle's 14 December 2009 pledges to customers, which were to extend for five years — thus expiring in early 2015.

Oracle's pledges included commitments to maintain certain APIs, extensions of licenses to then-current licensees, continued use of GPL licensing, and others. The expiration of these commitments may change the nature of Oracle's relationships with a number of hardware and software vendors, as well as its posture regarding product investment, support for purchasing requirements, and other aspects of MySQL's business model.

LOCAL VS GLOBAL


Korea vs Japan

Population : 50M vs 127M


Korea vs Japan


[Diagram: two master-slave replication groups, each with one Master and three Slaves; annotated "x3"]

KakaoTalk vs LINE




We Love FusionIO !!


• facebook/flashcache

Dolphinics’ Dolphin Interconnect Solutions


MEMSCALE


SO... WHICH DATABASES SCALE?


Read Caching

• Pros : Read caching can absorb a large share of read operations. If reads make up most of your workload, this helps a lot. Even with a heavy write workload, read caching might be enough to keep you from having to scale out to handle writes, because the database no longer spends its capacity on reads.

• Cons : Read caching, by nature, involves a memory store. If your data-access patterns are really random, or touch a large percentage of records, you might wind up with a pretty expensive memory footprint. Figuring out the right cache invalidation for your app can also be really tricky. Many memory stores are pretty basic in terms of functionality: without support for transactions and joins, you may need multiple processing steps or network round trips between the app and the cache.


http://spiegela.com/2014/04/28/but-i-need-a-database-that-scales-part-1
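The read-caching trade-off can be sketched in a few lines. This is a minimal illustration, not any particular product's API: the class name ReadThroughCache, the dict standing in for the database, and the hit/miss counters are all assumptions made for the example. It shows reads being served from memory and the simplest invalidation policy (drop the cached entry on every write).

```python
class ReadThroughCache:
    """Hypothetical read-through cache in front of a slower backing store."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stand-in for the database
        self.cache = {}                     # stand-in for the memory store
        self.hits = 0
        self.misses = 0

    def get(self, key):
        # Serve from memory when possible; otherwise read the database once
        # and remember the result for later readers.
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing_store[key]
        self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the database, then invalidate so no reader sees stale data.
        self.backing_store[key] = value
        self.cache.pop(key, None)


db = {"user:1": "alice", "user:2": "bob"}
cache = ReadThroughCache(db)

cache.get("user:1")           # miss: goes to the database
cache.get("user:1")           # hit: served from memory
cache.put("user:1", "carol")  # write invalidates the cached entry
result = cache.get("user:1")  # miss again, but returns the fresh value
```

Invalidate-on-write is only one possible policy; expiry times or write-through updates trade freshness against hit rate differently.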

Write Coalescing

• Pros : In short: you can achieve better throughput of incoming writes. With many caching systems, you can also query the data in the cache, which enables a set of real-time use cases including event processing, triggers & real-time analytics.

• Cons : Coalescing writes inherently means that your persistence layer lags behind your ingestion layer. To take advantage of this technique, you’ll need to consider a lot of questions:

– Which data to query: cached, persisted, both?

– Does this data need to be made durable (survive a reboot)? How quickly?

– Are there consistency concerns? Unique indices? Atomic transactions?
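A small sketch makes the lag between ingestion and persistence concrete. Everything here is hypothetical: WriteCoalescer, the max_pending threshold, and the persist_batch callback are names invented for the example. Repeated writes to the same key collapse into one pending entry, and the whole buffer is persisted as a single batch.

```python
class WriteCoalescer:
    """Hypothetical in-memory buffer that merges writes before persisting."""

    def __init__(self, persist_batch, max_pending=3):
        self.persist_batch = persist_batch  # callback that writes one batch
        self.max_pending = max_pending      # flush once this many keys wait
        self.pending = {}

    def write(self, key, value):
        # A later write to the same key replaces the earlier pending one.
        self.pending[key] = value
        if len(self.pending) >= self.max_pending:
            self.flush()

    def read(self, key, persisted):
        # "Which data to query?": prefer the buffer, fall back to storage.
        if key in self.pending:
            return self.pending[key]
        return persisted.get(key)

    def flush(self):
        if self.pending:
            self.persist_batch(dict(self.pending))
            self.pending.clear()


persisted = {}
batches = []

def persist_batch(batch):
    batches.append(batch)
    persisted.update(batch)

coalescer = WriteCoalescer(persist_batch, max_pending=3)
for i in range(5):
    coalescer.write("counter", i)   # five writes, one pending entry
coalescer.write("a", 1)

fresh = coalescer.read("counter", persisted)  # latest value, from the buffer
stale = persisted.get("counter")              # storage has not seen it yet
coalescer.flush()                             # six writes become one batch
```

Until flush runs, the pending values exist only in memory, which is exactly the durability question raised above.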


Connection Scaling

• Pros : Connection scaling increases the number of concurrent connections (obviously, I think?). Its biggest benefit, though, is in reliability, since any cluster node can fail and clients can simply reconnect to another one.

• Cons : Connection scaling requires shared storage. RAC, for example, typically uses OCFS, a clustered file system, and SAN storage. The ability to handle more I/O transactions depends on scaling up that shared storage tier, which can be very expensive. Connection scaling also doesn’t help much with capacity or analysis scaling, since the data is shared, not spread out across nodes.
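The reliability benefit can be sketched as a client that retries against the next node. This is an illustrative model only: Node, FailoverClient, and the single shared dict standing in for the shared storage tier are assumptions, not a real cluster driver.

```python
class Node:
    """Hypothetical cluster node; every node sees the same shared storage."""

    def __init__(self, name, shared_storage):
        self.name = name
        self.storage = shared_storage
        self.alive = True

    def query(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.storage[key]


class FailoverClient:
    """Tries each node in order and returns the first successful answer."""

    def __init__(self, nodes):
        self.nodes = nodes

    def query(self, key):
        for node in self.nodes:
            try:
                return node.name, node.query(key)
            except ConnectionError:
                continue  # reconnect to the next node
        raise ConnectionError("no cluster node available")


shared = {"order:7": "shipped"}  # one copy of the data, shared by all nodes
nodes = [Node("node-a", shared), Node("node-b", shared)]
client = FailoverClient(nodes)

before = client.query("order:7")  # answered by node-a
nodes[0].alive = False            # node-a fails
after = client.query("order:7")   # same data, now answered by node-b
```

Note that both nodes read the same dict: adding nodes adds connections and resilience, but not capacity, which matches the cons listed above.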


Master-Slave Replication

• Pros : While there’s some setup involved, it’s pretty seamless to your application. There’s still only a single node that has control over the data, so there are no new concerns around consistency. For read-constrained applications, nodes can be added quickly and the architecture remains relatively simple.

• Cons : MSR solves one problem: read transactions. If you need to scale other aspects, you’re not doing it here. MSR offloads the read transactions from the master, but writes are still limited to a single node, so it does not add write throughput. Also, slaves can lag in their updates from the master. If you need absolute consistency between the two, you’ll need to investigate options for synchronous replication, which can impact the performance of the master node.
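Read/write splitting and replication lag can be modelled together in a short sketch. The names are hypothetical (ReplicatedStore, replicate, the round-robin slave picker), and replication is triggered by hand here so that the lag is visible; in a real deployment it runs continuously.

```python
class ReplicatedStore:
    """Hypothetical master with asynchronously updated read-only slaves."""

    def __init__(self, slave_count):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        self.replication_log = []  # changes not yet applied to the slaves
        self.next_slave = 0

    def write(self, key, value):
        # All writes go to the single master.
        self.master[key] = value
        self.replication_log.append((key, value))

    def read(self, key):
        # Reads are spread over the slaves in round-robin order.
        slave = self.slaves[self.next_slave]
        self.next_slave = (self.next_slave + 1) % len(self.slaves)
        return slave.get(key)

    def replicate(self):
        # Apply the logged changes to every slave, in order.
        for key, value in self.replication_log:
            for slave in self.slaves:
                slave[key] = value
        self.replication_log.clear()


store = ReplicatedStore(slave_count=2)
store.write("x", 1)
lagging = store.read("x")    # the slave has not received the change yet
store.replicate()
caught_up = store.read("x")  # now visible on the slaves
```

A synchronous variant would call replicate inside write, removing the lag at the cost of slower writes on the master.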


Vertical Partitioning ( aka cluster )

• Pros : Having smaller databases makes indices perform better, and allows you to improve just about any aspect of scaling.

• Cons : If your model requires relationships between most or all of your tables for basic operations, vertical partitioning may not be a fit. Even when your model fits well into partitions today, these divisions can limit the flexibility to perform joins across models in the future.
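As a rough sketch of vertical partitioning, tables can be mapped to separate databases by a placement table. The names below (VerticalRouter, accounts_db, sales_db) are invented for the example. The last lines show the join cost: once two tables live in different databases, the application has to combine the rows itself.

```python
class VerticalRouter:
    """Hypothetical router that places whole tables on separate databases."""

    def __init__(self, placement):
        self.placement = placement  # table name -> database name
        self.databases = {name: {} for name in set(placement.values())}

    def table(self, table_name):
        # Look up the database that owns the table, creating it on first use.
        database = self.databases[self.placement[table_name]]
        return database.setdefault(table_name, [])

    def same_database(self, first, second):
        # Only tables in the same database can be joined by that database.
        return self.placement[first] == self.placement[second]


router = VerticalRouter({
    "users": "accounts_db",
    "profiles": "accounts_db",
    "orders": "sales_db",
})
router.table("users").append({"id": 1, "name": "alice"})
router.table("orders").append({"id": 10, "user_id": 1, "total": 30})

# users and orders sit in different databases, so the join is done in the app.
joined = [
    (user["name"], order["total"])
    for order in router.table("orders")
    for user in router.table("users")
    if user["id"] == order["user_id"]
]
```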


Horizontal Partitioning ( aka shard )

• Pros : This type of partitioning provides scaling for all of the elements of scale, allowing for very large data-sets and very good performance.

• Cons : Sharding can have a lot of drawbacks depending on the implementation. For one thing, the client must be aware of the partition key. When implementing sharding in MySQL, for example, an application will typically infer the partition key, and address the desired partition. Increasing the number of nodes, or changing the key requires an update to the app each time. Other trade-offs like database features are up for grabs too:

– Joins: if my data for two collections is distributed across multiple nodes, when I fetch the data back, I may need to join data across more than one node, which is likely to be slower.

– Transactions: if I have a transaction that involves two nodes of the cluster, how do I execute it atomically? Do I lock multiple nodes? All of them?

– Bulk commits: if I update records in bulk across multiple nodes, this is really two or more transactions executed separately.
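Client-side sharding can be sketched with a hash of the partition key. This is one common scheme (hash modulo shard count), chosen for illustration; the source does not say which scheme an application would use, and shard_for and ShardedStore are hypothetical names. The sketch also counts how many keys would change shards when a fifth node is added, which is why changing the node count is disruptive.

```python
import hashlib


def shard_for(key, shard_count):
    # Stable hash of the partition key, reduced to a shard index.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Hypothetical store where the client picks the shard for each key."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def count_all(self):
        # A query without the partition key has to visit every shard.
        return sum(len(shard) for shard in self.shards)


keys = [f"user:{i}" for i in range(1000)]
store = ShardedStore(shard_count=4)
for key in keys:
    store.put(key, key.upper())

# How many keys would land on a different shard after growing from 4 to 5?
moved = sum(1 for key in keys if shard_for(key, 4) != shard_for(key, 5))
```

With plain modulo hashing, most keys move when the shard count changes; schemes such as consistent hashing exist to reduce that movement.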


So... which databases scale?

• Scale Out Reads

• Capacity

• Scale Out Analysis

• Scale Out Writes

• Bulk Commits

• Joins

• Transactions

• Durability

• Consistency



Scaling Storytime

• http://en.wikipedia.org/wiki/Brad_Fitzpatrick


One Server


[Diagram: Internet → Apache → MySQL]

• Simple

Two Servers

[Diagram: Internet → Apache → MySQL]

• Two SPOFs (single points of failure)

• Replication!

Five Servers

[Diagram: Internet → three Apache servers; writes go to the Master, reads go to the Slave, and the Master replicates to the Slave]

More Servers

• Chaos!

[Diagram: Internet → six Apache servers; one Master replicating to six Slaves]

Cluster vs Shard

Multi-Master

Cluster

Shard

Cluster + Shard

MySQL Recruit

• Big Table ( X )

Small Table ( O )

• Performance ( X )

Scale-up ( O ) Distributed ( O )

• Query Tuning

hard ...

• Clustering & Sharding

mission ...


AMAZON AURORA


http://www.theregister.co.uk/2014/11/26/inside_aurora_how_disruptive_is_amazons_mysql_clone/


OSSCON 4Q14 42
