I’m your DB (I need a database that scales)
FB/hyeongchae.lee
4Q14 DataConference.IO 1
I’m your DB! May the oracle be with you
Agenda
• About me
• DBMS vs NoSQL
• Local vs Global
• So... which databases scale?
• Amazon Aurora
ABOUT ME
• INERVIT: MobileLite
• NHN: CUBRID
• TELCOWARE: Telcobase
• ALTIBASE: Altibase
• TIBERO: Tibero
Global Open Frontier Full-time
• Project : MySQL Redis Plug-in (+MariaDB, +MaxScale)
– https://github.com/sql2/MySQL_Redis_Plugin_Dev
MySQL Memcached Plug-in
[Diagram: inside mysqld, the Application reaches the InnoDB Storage Engine either over SQL (MySQL Server → Handler API) or over the Memcached protocol (Memcached plugin → innodb_memcache, with an optional local cache → InnoDB API)]
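The Memcached path in the diagram lets a client read and write InnoDB rows without SQL, using the ordinary memcached text protocol. The helpers below are a minimal sketch, assuming the standard text protocol (`set`/`get` requests terminated by CRLF); they only build request bytes, leave out socket handling and the `innodb_memcache` table mapping, and the key `user:1` is hypothetical.

```python
def memcached_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached text-protocol 'set' request.

    Layout: set <key> <flags> <exptime> <bytes> CRLF <data> CRLF
    """
    # ASCII-only keys in this sketch (UnicodeEncodeError is a ValueError).
    raw_key = key.encode("ascii")
    # Text-protocol keys are limited to 250 bytes and may not contain
    # whitespace or control characters.
    if not raw_key or len(raw_key) > 250 or any(b <= 32 or b == 127 for b in raw_key):
        raise ValueError(f"invalid memcached key: {key!r}")
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def memcached_get(*keys: str) -> bytes:
    """Build a memcached text-protocol 'get' request for one or more keys."""
    if not keys:
        raise ValueError("at least one key is required")
    return ("get " + " ".join(keys) + "\r\n").encode("ascii")


if __name__ == "__main__":
    # Hypothetical key; with the InnoDB plugin it would map to a row in the
    # table configured in innodb_memcache.containers.
    print(memcached_set("user:1", b"alice"))
    print(memcached_get("user:1"))
```

Because the plugin goes through the InnoDB API, requests like these skip SQL parsing and optimization entirely, which is where its speed advantage for simple key lookups comes from.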
MySQL Redis Plug-in
[Diagram: same layout with Redis: inside mysqld, the Application reaches the InnoDB Storage Engine either over SQL (MySQL Server → Handler API) or over the Redis protocol (Redis plugin → innodb_redis, with an optional local cache → InnoDB API)]
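Like the Memcached plug-in, the Redis plug-in is reached over a cache protocol instead of SQL, here the Redis protocol. As a minimal sketch of how a client frames a request in RESP (an array `*<count>` of bulk strings `$<length>`), assuming the plug-in accepts standard RESP request framing; the slides do not say which commands it supports, so `SET user:1 alice` is only illustrative.

```python
def encode_resp_command(*args) -> bytes:
    """Frame a Redis command as a RESP array of bulk strings."""
    if not args:
        raise ValueError("a command needs at least one argument")
    parts = [f"*{len(args)}\r\n".encode("ascii")]
    for arg in args:
        # Bulk strings are binary-safe, so bytes pass through unchanged;
        # anything else is sent as its UTF-8 text form.
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(f"${len(data)}\r\n".encode("ascii") + data + b"\r\n")
    return b"".join(parts)


if __name__ == "__main__":
    print(encode_resp_command("SET", "user:1", "alice"))
    print(encode_resp_command("GET", "user:1"))
```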
2015 : MaxScale Redis Cluster Plug-in
URL : https://mariadb.com/blog/maxscale-proxy-mysql-replication-relay
DBMS VS NoSQL
DB-Engines Ranking (2014.11.24)
Rank | Last month | DBMS | Database model | Score | Change vs. last month
1 | 1 | Oracle | Relational DBMS | 1452.13 | -19.77
2 | 2 | MySQL | Relational DBMS | 1279.08 | +16.11
3 | 3 | Microsoft SQL Server | Relational DBMS | 1220.20 | +0.59
4 | 4 | PostgreSQL | Relational DBMS | 257.36 | -0.36
5 | 5 | MongoDB | Document store | 244.73 | +4.33
6 | 6 | DB2 | Relational DBMS | 206.23 | -1.44
7 | 7 | Microsoft Access | Relational DBMS | 138.84 | -2.80
8 | 8 | SQLite | Relational DBMS | 95.28 | +0.33
9 | 10 | Cassandra | Wide column store | 91.99 | +6.29
10 | 9 | Sybase ASE | Relational DBMS | 84.62 | -2.17
http://db-engines.com/en/ranking
http://db-engines.com/en/ranking_categories
Winner !!
Magic Quadrant for Operational Database Management Systems
1 Oracle's Letter to the EU Concerning MySQL
After an antitrust investigation, the European Commission approved Oracle's acquisition of Sun Microsystems, including MySQL, on 21 January 2010.
Wikileaks subsequently published cables indicating that the Obama administration applied pressure to the EU to approve the deal.
Concerns about the MySQL acquisition had been addressed in Oracle's 14 December 2009 pledges to customers, which were to extend for five years — thus expiring in early 2015.
Oracle's pledges included commitments to maintain certain APIs, extensions of licenses to then-current licensees, continued use of GPL licensing, and others. The expiration of these commitments may change the nature of Oracle's relationships with a number of hardware and software vendors, as well as its posture regarding product investment, support for purchasing requirements, and other aspects of MySQL's business model.
LOCAL VS GLOBAL
Korea vs Japan
Population: 50M vs 127M
Korea vs Japan
[Diagram: one master/slave replication group (a master with several slaves) vs. the same setup ×3]
KakaoTalk vs LINE
We Love FusionIO !!
• facebook/flashcache
Dolphinics’ Dolphin Interconnect Solutions
MEMSCALE
SO... WHICH DATABASES SCALE?
Read Caching
• Pros : Read caching can take over a large share of read operations. If reads make up most of your workload, this will obviously help a lot. Even with a heavy write workload, read caching might be enough to keep you from having to scale out to handle writes.
• Cons : Read caching, by nature, involves a memory store. If your data-access patterns are very random, or touch a large percentage of records, you might wind up with a pretty expensive memory footprint. Figuring out the right cache invalidation for your app can also be really tricky. Many memory stores are pretty basic in functionality: lack of support for transactions and joins can mean multiple process or network round-trips between the app and the cache.
http://spiegela.com/2014/04/28/but-i-need-a-database-that-scales-part-1
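The read-caching trade-offs above (offloaded reads vs. memory footprint and invalidation) can be seen in a small cache-aside sketch. It assumes a `loader` callable standing in for a database query, a fixed TTL, and explicit invalidation after writes; the class name and parameters are illustrative, not any particular product's API, and the dict is unbounded, which is exactly the memory-footprint risk the Cons mention.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class CacheAside:
    """Cache-aside read cache with a TTL and explicit invalidation."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # e.g. a function that runs a SELECT
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired: go to the database and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Call after writing key to the database so readers see the new value."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    db = {"user:1": "alice"}          # stand-in for the real database
    cache = CacheAside(db.__getitem__, ttl_seconds=30)
    cache.get("user:1")
    cache.get("user:1")
    print(cache.hits, cache.misses)   # one miss, then one hit
```

If a write forgets to call `invalidate`, readers keep seeing the old value until the TTL expires; choosing between short TTLs and disciplined invalidation is the "tricky" part the slide refers to.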
Write Coalescing
• Pros : In short, you can achieve better throughput for incoming writes. With many caching systems you can also query the data in the cache, which enables real-time use cases such as event processing, triggers and real-time analytics.
• Cons : Coalescing writes inherently means that your persistence layer lags behind your ingestion layer. To take advantage of this technique, you’ll need to consider a lot of questions:
– Which data to query: cached, persisted, or both?
– Does this data need to be made durable (survive a reboot)? How quickly?
– Are there consistency concerns? Unique indices? Atomic transactions?
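A minimal write-behind sketch of write coalescing, assuming a `flush_fn` callable that persists a whole batch in one round-trip (for example one multi-row `INSERT ... ON DUPLICATE KEY UPDATE` in MySQL). The class and its parameters are hypothetical; it shows the two questions above in code: reads must check the buffer first, and anything not yet flushed is not durable.

```python
from typing import Any, Callable, Dict, Hashable


class WriteCoalescer:
    """Buffer writes in memory, keep only the latest value per key,
    and hand them to the persistence layer in batches."""

    def __init__(self, flush_fn: Callable[[Dict[Hashable, Any]], None],
                 max_pending: int = 1000) -> None:
        self._flush_fn = flush_fn
        self._max_pending = max_pending
        self._pending: Dict[Hashable, Any] = {}

    def write(self, key: Hashable, value: Any) -> None:
        # A later write to the same key replaces the earlier one, so
        # N updates to a hot key cost a single write at flush time.
        self._pending[key] = value
        if len(self._pending) >= self._max_pending:
            self.flush()

    def read(self, key: Hashable, load_persisted: Callable[[Hashable], Any]) -> Any:
        # Answer from the buffer first: the persisted copy may be behind.
        if key in self._pending:
            return self._pending[key]
        return load_persisted(key)

    def flush(self) -> int:
        """Persist all pending writes and return how many keys were written.
        Anything still pending is lost if the process dies before this runs."""
        if not self._pending:
            return 0
        batch, self._pending = self._pending, {}
        self._flush_fn(batch)
        return len(batch)


if __name__ == "__main__":
    batches = []
    wc = WriteCoalescer(batches.append, max_pending=100)
    for i in range(5):
        wc.write("counter", i)
    wc.write("user:1", "alice")
    print(wc.flush(), batches)  # 2 [{'counter': 4, 'user:1': 'alice'}]
```

Six incoming writes become one batch of two rows; the price is that a crash before `flush()` loses them, and unique-index or transactional checks only happen at flush time.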
Connection Scaling
• Pros : Connection scaling increases the number of concurrent connections (obviously, I think?). Its biggest benefit, though, is reliability, since any cluster node can fail and clients can simply reconnect to another node.
• Cons : Connection scaling requires shared storage. RAC, for example, typically uses OCFS, a clustered file system, and SAN storage. The ability to handle more I/O transactions depends on scaling up that shared storage tier, which can be very expensive. Connection scaling also doesn’t help much with capacity or analysis scaling, since the data is shared, not spread out across nodes.
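The reliability benefit ("any cluster node can fail and clients can simply reconnect") amounts to a client-side loop over nodes. A minimal sketch, assuming the driver raises Python's built-in `ConnectionError` when a node is unreachable; the node names and the `execute` callable are hypothetical, and real cluster drivers usually provide their own failover settings.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def run_with_failover(nodes: Sequence[str], execute: Callable[[str], T]) -> T:
    """Try each cluster node in order until one succeeds.

    With shared storage every node sees the same data, so any node
    that accepts the connection can serve the request.
    """
    errors = []
    for node in nodes:
        try:
            return execute(node)
        except ConnectionError as exc:
            errors.append(f"{node}: {exc}")
    raise ConnectionError("all nodes failed: " + "; ".join(errors))


if __name__ == "__main__":
    down = {"db-node-1"}  # pretend this node just failed

    def query(node: str) -> str:
        if node in down:
            raise ConnectionError("connection refused")
        return f"result from {node}"

    print(run_with_failover(["db-node-1", "db-node-2"], query))  # result from db-node-2
```

Blind retries are only safe for idempotent work: if a write fails after the server applied it, re-running it on another node can apply it twice.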
Master-Slave Replication
• Pros : While there’s some setup involved, it’s pretty seamless to your application. There’s still only a single node that has control over the data, so there are no new concerns around consistency. For read-constrained applications, nodes can be added quickly and the architecture remains relatively simple.
• Cons : MSR solves one problem: read transactions. If you need to scale other aspects, you’re not doing it here. If you need more write throughput, MSR offloads the read transactions from the master, but writes are still limited to a single node. Also, slaves can lag behind the master; if you need absolute consistency between the two, you’ll need to investigate synchronous replication, which can impact the performance of the master node.
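In master-slave replication the application (or a proxy) splits traffic: writes to the master, reads to the slaves. A minimal read/write-splitting sketch, assuming a simple rule (plain `SELECT` outside a transaction goes to a slave, everything else to the master); the class and node names are hypothetical, and real proxies parse SQL far more carefully.

```python
import itertools
from typing import Sequence


class ReplicationRouter:
    """Send writes to the master and spread plain reads over the slaves."""

    def __init__(self, master: str, slaves: Sequence[str]) -> None:
        self.master = master
        # Round-robin over slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves or [master])

    def route(self, sql: str, in_transaction: bool = False) -> str:
        words = sql.split()
        normalized = " ".join(words).upper()
        is_plain_read = bool(words) and words[0].upper() == "SELECT" \
            and "FOR UPDATE" not in normalized
        # Writes, locking reads and anything inside a transaction go to
        # the master, the only node that has control over the data.
        if in_transaction or not is_plain_read:
            return self.master
        return next(self._slaves)


if __name__ == "__main__":
    router = ReplicationRouter("master", ["slave-1", "slave-2"])
    for stmt in ["INSERT INTO t VALUES (1)", "SELECT * FROM t", "SELECT * FROM t"]:
        print(router.route(stmt), "<-", stmt)
```

Because slaves lag, a read issued right after a write may not see it; sessions that need read-your-writes consistency should be pinned to the master (the `in_transaction` flag is one crude way to do that).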
Vertical Partitioning (aka cluster)
• Pros : Having smaller databases makes indices perform better, and allows you to improve just about any aspect of scaling.
• Cons : If your model requires relationships between most or all of your tables for basic operations, vertical partitioning may not be a fit. Even when your model fits well into partitions today, these divisions can limit the flexibility of performing joins across models in the future.
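Vertical partitioning means each group of related tables lives in its own database, so the application needs a table-to-database map, and any query touching tables from two groups can no longer be joined by the database. A minimal sketch, assuming a hypothetical four-table layout; the table and database names are illustrative only.

```python
from typing import Dict, Iterable

# Hypothetical layout: each group of related tables lives in its own database.
TABLE_TO_DB: Dict[str, str] = {
    "users": "accounts_db",
    "sessions": "accounts_db",
    "orders": "commerce_db",
    "order_items": "commerce_db",
}


def database_for(tables: Iterable[str], layout: Dict[str, str] = TABLE_TO_DB) -> str:
    """Return the single database that holds every table a query touches.

    Raises ValueError when the tables live in different partitions,
    because the database can no longer join them for you.
    """
    dbs = {layout[t] for t in tables}  # unknown tables raise KeyError
    if not dbs:
        raise ValueError("no tables given")
    if len(dbs) > 1:
        raise ValueError(f"query spans partitions {sorted(dbs)}; join in the application instead")
    return dbs.pop()


if __name__ == "__main__":
    print(database_for(["users", "sessions"]))   # accounts_db
    print(database_for(["orders", "order_items"]))  # commerce_db
```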
Horizontal Partitioning (aka shard)
• Pros : This type of partitioning provides scaling for all of the elements of scale, allowing for very large data sets and very good performance.
• Cons : Sharding can have a lot of drawbacks depending on the implementation. For one thing, the client must be aware of the partition key. When implementing sharding in MySQL, for example, an application will typically derive the partition key and address the desired partition itself. Increasing the number of nodes or changing the key requires an update to the app each time. Other trade-offs, like database features, are up for grabs too:
– Joins: if the data for two collections is distributed across multiple nodes, fetching it back may require joining data from more than one node, which is likely to be slower.
– Transactions: if a transaction involves two nodes of the cluster, how do I execute it atomically? Do I lock multiple nodes? All of them?
– Bulk commits: if I update records in bulk across multiple nodes, this is really several transactions, one per node, executed separately.
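The claim that adding nodes forces an application update is easy to quantify for the simplest scheme. A minimal sketch, assuming hash-modulo placement (shard = hash(key) mod N) with SHA-256 as the stable hash; this is one common approach, not the only one, and the `user:<id>` keys are hypothetical. Going from 4 to 5 shards, a key stays put only when hash mod 20 is 0 to 3, so about 80% of keys move.

```python
import hashlib
from typing import Sequence


def shard_for(key: str, num_shards: int) -> int:
    """Map a partition key to a shard with hash-modulo placement."""
    # A stable hash (not Python's per-process randomized hash()) so every
    # application server computes the same shard for the same key.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys: Sequence[str], old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    print(shard_for("user:42", 4))
    # Growing from 4 to 5 shards relocates roughly 80% of the keys.
    print(round(moved_fraction(keys, 4, 5), 2))
```

Schemes such as consistent hashing (not shown) reduce the movement to roughly 1/N of the keys, but the client still has to know the key and the current shard map.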
So... which databases scale?
• Scale Out Reads
• Capacity
• Scale Out Analysis
• Scale Out Writes
• Bulk Commits
• Joins
• Transactions
• Durability
• Consistency
Scaling Storytime
• http://en.wikipedia.org/wiki/Brad_Fitzpatrick
One Server
[Diagram: Internet → Apache → MySQL, all on one server]
• Simple
Two Servers
[Diagram: Internet → Apache server → MySQL server]
• Two SPOFs (single points of failure)
• Replication !
Five Servers
[Diagram: Internet → three Apache servers; writes go to the Master, reads go to a Slave, and the Master replicates to the Slave]
More Servers
• Chaos !
[Diagram: Internet → many Apache servers, a single Master, and many Slaves]
Cluster vs Shard
[Diagram: Multi-Master, Cluster, Shard, Cluster + Shard]
MySQL Recruit
• Big Table (X), Small Table (O)
• Performance (X); Scale-up (O), Distributed (O)
• Query Tuning: hard ...
• Clustering & Sharding: mission ...
AMAZON AURORA
http://www.theregister.co.uk/2014/11/26/inside_aurora_how_disruptive_is_amazons_mysql_clone/
OSSCON 4Q14 42