Conquering “big data”
An Introduction to Shard-Query
An MPP distributed middleware solution for MySQL databases
Big Data is a buzzword
Shard-Query works with big data, but it works
with small data too
You don’t have to have big data to have big
performance problems with queries
Big performance problems
MySQL typically has performance problems on OLAP workloads for even tens of gigabytes* of data
Analytics
Reporting
Data mining
MySQL is generally not scalable*^ for these workloads
• * By itself. The point of this talk is to show how Shard-Query fixes this :)
• ^ Another presentation goes into depth as to why MySQL doesn't scale for OLAP
Not only MySQL has these issues
All major open source databases have problems
with these workloads
Why?
Single threaded queries
When all data is in memory, accessing X rows is
generally X times as expensive as accessing one row,
even when multiple CPUs could be used
MySQL scalability model is good for OLTP
MySQL was created at a time when commodity machines
Had few (usually one) CPU cores
Had small amounts of memory and limited disk IOPS
Managed a small amount of data
It did not make sense to code intra-query
parallelism for these servers. They couldn’t take
advantage of it anyway.
The new age of multi-core
“If your time to you is worth saving, then you better start swimming,
or you'll sink like a stone, for the times they are a-changing.”
- Bob Dylan
[Diagram: four CPUs with eight cores each]
It is 2013. Still only single threaded queries.
Building a multi-threaded query plan is very
different from building a single-threaded query
plan
The time investment to build a parallel query interface
inside of MySQL would be very high
MySQL has continued to focus on excellence for OLTP
workloads while leaving the OLAP market untapped
Just adding basic subquery options to the optimizer has
taken many years
MySQL scales great for OLTP because
MySQL has been improved significantly, especially
in 5.5 and 5.6
Many small queries are “balanced” over many
CPUs naturally
Large memories allow vast quantities of hot data
And very fast disk IO means that
The penalty for cache miss is lower
No seek penalty for SSD especially reduces cost of
concurrent misses from multiple threads (no head
movement)
But not for OLAP
Big queries "peg" one CPU and can use no more
CPU resources (low efficiency queries)
Numerous large queries can "starve" smaller
queries
This is often when innodb_thread_concurrency needs
to be set > 0
But not for OLAP (cont)
When the data set is significantly larger than
memory, single threaded queries often cause the
buffer pool to "churn"
While SSD helps somewhat, one thread can not read
from an SSD at maximum device capacity
Disk may be capable of 1000s of MB/sec, but the single
thread is generally limited to <100MB/sec
A multi-threaded workload could much better utilize the disk
Response similar to the NoSQL movement
Rather than fix the database or build complex
software, users just change the underlying
database
Many closed source vendors have stepped in and
provided OLAP SQL solutions
Hardware: IBM Netezza, Oracle Exadata
Software: HP Vertica, Vectorwise, Teradata, Greenplum
Response similar to the NoSQL movement (cont)
Or SQL => map/reduce interfaces
Apache Hadoop/Apache Hive
Impala
Map/R
Cloudera CDH
Google built a SQL interface to BigTable too…
Limitations
No correlated subqueries for example
What do those map/reduce things do?
Split data up over multiple servers (HDFS)
During query processing
Map (fetch/extract/select/etc) raw data from files or
tables on HDFS
Write the data into temporary areas
Shuffle temporary data to reduce workers
Final reduce written
Return results
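Those steps can be sketched in miniature — an illustrative Python word count standing in for a real Hadoop job (the data and phases here are fabricated for the sketch, not HDFS or Hive internals):

```python
# Minimal sketch of the map -> shuffle -> reduce phases described above.
from collections import defaultdict

lines = ["the quick fox", "the lazy dog", "the fox"]

# Map: emit (key, value) pairs from the raw records.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group intermediate pairs by key (the network-heavy step).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: combine each key's values into the final result.
result = {key: sum(values) for key, values in groups.items()}
print(result)  # {'the': 3, 'quick': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Note that the map phase re-reads and re-parses the raw input every time a new query runs, which is exactly the repeated expense the next slide complains about.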
Those sound expensive…
It is (in terms of dollars for closed solutions)
It is (in terms of execution time for open solutions)
The map is especially expensive when data is
unstructured and it must be done repeatedly for
each different query you run
And complicated…
You get
a whole new toolchain
A new set of data management tools
A new set of high availability tools
And all new monitoring tools to learn!
Even if MySQL supported parallel query:
MySQL* doesn’t do distributed queries
Those Map/Reduce solutions (and the closed
source databases) can use more than one server!
Building a query plan for queries that must
execute over a sharded data set has additional
challenges:
SELECT AVG(expr)
must be computed as:
SUM(expr)/COUNT(expr) AS `AVG(expr)`
* Again, Shard-Query does. Almost there.
Probably the simplest example of a necessary rewrite
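A small sketch (fabricated values, plain Python standing in for the SQL) of why this rewrite is necessary — averaging per-shard averages is wrong unless every shard holds the same number of rows:

```python
# Why AVG() must be rewritten for distributed execution.
# Shard contents are fabricated for illustration.

shard1 = [10.0, 20.0]          # expr values on shard 1
shard2 = [30.0, 40.0, 50.0]    # expr values on shard 2

# Wrong: averaging the per-shard averages weights shards unequally.
avg_of_avgs = (sum(shard1) / len(shard1) + sum(shard2) / len(shard2)) / 2

# Right (the rewrite): each shard returns SUM(expr) and COUNT(expr);
# the coordinator divides the combined sum by the combined count.
partials = [(sum(s), len(s)) for s in (shard1, shard2)]
total_sum = sum(p[0] for p in partials)
total_count = sum(p[1] for p in partials)
avg = total_sum / total_count

print(avg_of_avgs)  # 27.5 -- incorrect
print(avg)          # 30.0 -- matches AVG over all rows
```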
MySQL network storage engines
Don't these engines claim to be parallel?
Fetching of data from remote machines may be done in
parallel, but query processing is coordinated by a serial
query thread
A sum still has to examine each individual row from every
server serially
Joins are still evaluated serially (in many cases)
The engine is parallel, but the SQL layer using the
engine is not.
NDB
NDB is bad for star schema
Dimension table rows are not usually co-located with
fact rows.
Engine condition pushdown may help somewhat to
alleviate network traffic but joins still have to traverse
the network which is expensive
Aggregation still serial
SPIDER
SPIDER is bad for star schema too
Nested loops may be very bad for SPIDER and star
schema if the fact table isn't scanned first (must use
STRAIGHT_JOIN hint extensively).
MRR/BKA in MariaDB might help?
Still no parallel aggregation or join.
CONNECT
Has ECP
No ICP or ability to expose remote indexes
Always uses join buffer (BNLJ) or BKAJ
Fetches in parallel
No parallel join
No parallel aggregation
Those are not parallel query solutions
Those engines do not provide OLAP parallel query
They are for OLTP lookup and/or filtering performance. They often can't sort in parallel.
They can offer improved performance when large numbers of rows are filtered from many machines in parallel
When aggregating, a query must return a small resultset before aggregation to get good performance
Star schemas should be avoided
Enter Shard-Query
Massively parallel query execution for MySQL variants
Enter Shard-Query
Keep using MySQL
Choose a row store like XtraDB, InnoDB or TokuDB*
Choose a column store like ICE*, Groonga**
Use CSV, TAB, XML, or other data with the CONNECT**
engine in MariaDB 10
* These engines work, but with some limitations due to bugs
** These engines have not been thoroughly tested
Shard-Query connects to 3306…
Shard-Query can use any MySQL variant as a data
source
You continue to use regular SQL, no map/reduce
Is built on MySQL, PHP and Gearman – well proven
technologies
You probably already know these things.
Shard-Query re-writes SQL
Flexible
Does not have to re-implement complex SQL
functionality because it uses SQL directly
Hundreds of MySQL functions and features available out
of the box
Small subset* of functions not available
last_insert_id(), get_lock(), etc.
* https://code.google.com/p/shard-query/wiki/UnsupportedFeatures
Shard-Query re-writes SQL
Familiar SQL
ORDER BY, GROUP BY, LIMIT, HAVING, subqueries, even
WITH ROLLUP, all continue to work as normal
Support for all MySQL aggregate functions including
count(distinct)
Aggregation and join happens in parallel
* https://code.google.com/p/shard-query/wiki/UnsupportedFeatures
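count(distinct) is the hardest aggregate to distribute. A small sketch with fabricated values shows why per-shard distinct counts can't simply be summed — the per-shard distinct sets must be merged first (this illustrates the problem, not Shard-Query's actual implementation):

```python
# Why COUNT(DISTINCT expr) needs special handling across shards.
shard1_values = [1, 2, 2, 3]
shard2_values = [3, 4, 4]

# Wrong: summing per-shard distinct counts double-counts value 3,
# which appears on both shards.
wrong = len(set(shard1_values)) + len(set(shard2_values))

# Right: merge the per-shard distinct sets, then count.
right = len(set(shard1_values) | set(shard2_values))

print(wrong)  # 5
print(right)  # 4
```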
You don't have to know
PHP to use Shard-Query!
Just use SQL
You can still connect to 3306 (and more)!
Shard-Query has multiple ways of interacting with
your application
The PHP OO API is the underlying interface.
The other interfaces are built on it:
MySQL Proxy Lua script (virtual database)
HTTP or HTTPS web/REST interface
Access the database directly from Javascript?
Submit Gearman jobs (as SQL) directly from almost any
programming language
MySQL Proxy
Web Interface
Command line (with explain plan)
echo "select * from (select count(*) from lineorder) sq;" | php run_query --verbose

SQL SET TO SEND TO SHARDS:
Array ( [0] => SELECT COUNT(*) AS expr_2942896428 FROM lineorder AS `lineorder` WHERE 1=1 ORDER BY NULL )
SENDING GEARMAN SET to: 2 shards

SQL FOR COORDINATOR NODE:
SELECT SUM(expr_2942896428) AS `count(*)` FROM `aggregation_tmp_21498632`

SQL SET TO SEND TO SHARDS:
Array ( [0] => SELECT * FROM ( SELECT SUM(expr_2942896428) AS `count(*)` FROM `aggregation_tmp_21498632` ) AS `sq` WHERE 1=1 )
SENDING GEARMAN SET to: 1 shards

SQL TO SEND TO COORDINATOR NODE:
SELECT * FROM `aggregation_tmp_88629847`

[count(*)] => 1199721041
1 rows returned
Exec time: 0.053546905517578
Shard-Query constructs parallel queries
MySQL can’t run a single query in multiple threads
but it can run multiple queries at once in multiple
threads (with multiple cores)
Shard-Query breaks one query into multiple
smaller queries (aka tasks)
Tasks can run in parallel on one or more servers
OLAP into OLTP
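A minimal sketch of the task model, using Python threads over fabricated in-memory "partitions" (in Shard-Query each task is a real SQL query sent to a shard or partition via Gearman):

```python
# One logical query split into per-partition tasks run concurrently.
from concurrent.futures import ThreadPoolExecutor

partitions = {
    "p0": [1, 2, 3],
    "p1": [4, 5],
    "p2": [6, 7, 8, 9],
}

def count_task(name):
    # Stands in for one small task: SELECT COUNT(*) over a single partition.
    return len(partitions[name])

# Run the tasks in parallel worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(count_task, partitions))

# Coordinator step: combine the partial results into the final answer.
total = sum(partial_counts)
print(total)  # 9
```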
Partitioning tables for parallelism
This is similar to Oracle Parallel Query
Partitioning splits queries on a single machine
Supports partitioning to divide up a table
RANGE, LIST and RANGE/LIST COLUMNS over a single
column
Each partition can be accessed in parallel as an
individual task
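As an illustrative sketch, each partition can become its own task simply by rewriting the query with MySQL's explicit partition selection syntax (available from MySQL 5.6; the table and partition names here are hypothetical):

```python
# Rewriting one aggregate query into one task per partition.
base = "SELECT COUNT(*) AS cnt FROM lineorder"
partitions = ["p0", "p1", "p2", "p3"]

# One task per partition; each can run in its own connection/thread.
tasks = [f"{base} PARTITION({p})" for p in partitions]
for t in tasks:
    print(t)

# The coordinator then sums the per-partition cnt values, as shown in
# the AVG/COUNT rewrite examples elsewhere in this deck.
```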
A different way to look at it:
You get to move all the pieces at the same time
[Diagram: the same tasks (T1…T64) on a timeline — single threaded versus parallel execution]
*Small portion of execution is still serial, so speedup won't be quite linear (but should be close)
Sharding
Sharded tables split data over many servers
Works similarly to partitioning.
You specify a "shard key". This is like a
partitioning key, but it applies to ALL tables in the
schema.
If a table contains the "shard key", then the table is
spread over the shards based on the values of that
column
Pick a "shard key" with an even data distribution
Currently only a single column is supported
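For illustration, a hash-based mapper might look like this (Shard-Query also supports a directory mapper; the md5 scheme and shard names here are assumptions of the sketch, not Shard-Query's actual code):

```python
# Stable hash-based shard key mapping.
import hashlib

SHARDS = ["shard1", "shard2", "shard3", "shard4"]

def shard_for(key):
    # A stable hash means every row with the same shard key value
    # (e.g. the same customer_id) always lands on the same shard,
    # and a good hash spreads values evenly across shards.
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for(42) == shard_for(42))  # True: mapping is deterministic
```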
Unsharded Tables
Tables that don't contain the "shard key" are
called "unsharded" tables
A copy of these tables is replicated on ALL nodes
It is a good idea to keep these tables relatively small
and update them infrequently
You can freely join between sharded and unsharded
tables
You can only join between sharded tables when the
join includes the shard key*
* A CONNECT or FEDERATED table to a Shard-Query proxy can be used to
support cross-shard joins. Consider MySQL Cluster for cross-shard joins.
Parallel Execution
Sharding and/or partitioned tables, plus the Gearman-based Shard-Query workers (behind the Proxy, REST, and PHP OO interfaces), give parallel execution:
Task1 => Shard1, Partition 1
Task2 => Shard1, Partition 2
Task3 => Shard2, Partition 1
Task4 => Shard2, Partition 2
[Diagram: SQL flows from the interfaces out to the shards; data flows back]
Sharding for big data
Or how I stopped worrying and learned to scale out the database
You can only scale up so far
MySQL still caps out at between 24 and 48 cores
though it continues to improve (5.7 will be the
best one ever?)
If you are collecting enough data you will
eventually need to use more than one machine to
get good performance on queries over a large
portion of the data set
Scale Out – And Up
You could choose to use 4 servers with 16 cores or
2 servers with 32 cores
Usually depends on how large your data set is
Keep as much data in memory as possible
Scale Out – And Up
In the cloud many small servers can leverage memory more efficiently than a few large ones
Run 8 smaller servers with (in aggregate)
16 cores (52 total ECU) [2/per]
136.8GB memory [17.1/per]
3360GB combined local HDD storage [420/per]
This is almost the same price as a single large SSD-based machine with:
16 cores
64GB of RAM (35 ECU)
2048GB local SSD storage
The large machine had SSD though
If the workload is IO bound (working set >128GB)
Go with the large machine with 16 cores
Very fast IO
Getting data into memory so that the CPUs can
work on it is more important
Downgrade to smaller machines if the working set
shrinks
Still partition for parallelism
Scale "in and out"
Splitting a shard in Shard-Query is a manual (but
easy) process
Only supported when the directory mapper is used
mysqldump the data from the shard with the –T option
(or use mydumper)
Truncate the tables on the old shard
Create the tables on the new shard
Update the mapping table to split the data
Use the Shard-Query loader to load data
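The mapping update step can be pictured as follows — a hedged sketch in which an in-memory dict stands in for the directory mapping table (moving the actual rows via dump/truncate/reload happens separately, as listed above):

```python
# Splitting one shard by reassigning part of its key range in the
# directory mapping. Keys and shard names are fabricated.
directory = {1: "shard1", 2: "shard1", 3: "shard1", 4: "shard2"}

def split_shard(directory, old_shard, new_shard, keys_to_move):
    # Reassign the chosen keys from the old shard to the new one;
    # keys on other shards are left untouched.
    for key in keys_to_move:
        if directory.get(key) == old_shard:
            directory[key] = new_shard
    return directory

split_shard(directory, "shard1", "shard3", [2, 3])
print(directory)  # {1: 'shard1', 2: 'shard3', 3: 'shard3', 4: 'shard2'}
```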
Combine with Map/Reduce
Use Map/Reduce jobs to extract data from HDFS
and write it into ICE
Execute high performance low latency MySQL
queries over the data
Combine with Map/Reduce (cont)
Make fast insights into smaller amounts of data
extracted from petabyte HDFS data stores
Extract a particular year of climate data
Or particular cultivars when comparing genomic plant
data
Open source ETL tools can automate this process
Performance Examples
Simple In-Memory COUNT(*) query performance on Wikipedia traffic stats
Working set: 128GB of data
[Chart: response time in seconds vs. days of data — "8 Pawns" (8 small servers) vs. "The King" (1 large server), each with a linear trendline; data table follows]
Days 8 Pawns The King
1 2.552858 40.84573
2 5.090356 81.4457
3 8.064888 129.0382
4 10.74412 171.9059
5 13.32697 213.2316
6 16.0227 256.3633
7 18.50571 296.0914
8 21.02053 336.3285
9 25.3414 405.4624
10 29.69324 475.0918
11 32.93455 526.9529
12 36.5517 584.8272
13 40.19016 643.0426
14 42.75 699.1011
15 44.69 750.4571
Shard-Query is scanning about 1B rows/sec
Star Schema Benchmark – Scale 10
6 cores
Partitioning for single node scaleup
6 worker threads
XTRADB
Schema Design for Big Data
Best schema – flat tables (no joins)
Scale to hundreds of machines with tens to
hundreds of terabytes each
Dozens or hundreds of columns per table
Can use map/reduce when you need to join
between sharded tables (Map/R or something
other than Shard-Query is used for this)
Joins to lookup tables can still be done but do so
with care
One table model (flat table, no joins)
Great for machine generated data - quintessential
big data.
Call data records (billing mediation and call analysis)
Sensor data (Internet of Things)
Web logs (Know thy own self before all others)
Hit/click information for advertising
Energy metering
Almost any large open data set
Ideal schema – flat tables (no joins)
Why one big table?
ICE/IEE
ICE and IEE engines are append-only (or append mostly)
ICE/IEE knowledge grid can filter out data more
effectively when all of the filters are placed on a single
table
No indexes means that only hash joins or sort/merge
joins can be performed when joining tables
Ideal schema – flat tables (no joins)
Insert-only tables are the easiest on which to
build summary tables
Querying is very easy as all attributes are always
available
But all attributes can be overwhelming.
Views can be created in this case
When named properly the views can be accessed in parallel too
Special view support
Shard-Query has special support for treating views
as partitioned tables* when the views have the
prefix v_ followed by the actual table name
select * from v_mysql_metrics where
host_id = 33 and collect_date = '2013-05-27';
Joins to these views are supported too
Make sure you only use the MERGE algorithm or
this will not work
* Shard-Query does not currently parse the underlying SQL for views, so this naming is necessary
to allow Shard-Query to find the partition metadata for the underlying table.
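A sketch of what that naming convention implies (illustrative only, not Shard-Query's actual parser):

```python
# Map a v_-prefixed view name back to its base table so the base
# table's partition metadata can be reused for parallel access.
def base_table_for(view_name):
    # v_<table> -> <table>; anything else is treated as a plain table.
    if view_name.startswith("v_"):
        return view_name[2:]
    return view_name

print(base_table_for("v_mysql_metrics"))  # mysql_metrics
print(base_table_for("lineorder"))        # lineorder
```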
Schema Design for Analytics/BI and
Data VisualizationSee better results through faster queries
Star Schema
Most common BI/analytics table is star schema or
a denormalized table (see prev slides)
"Fact" (measurement) table is sharded
Dimension (lookup) tables are unsharded
JOINs between the fact and dimension tables are freely
supported
Star Schema
In some cases a dimension might be sharded
sharding by date to spread data around evenly by date
for example
date_id is in the fact table and in the date dimension table
This is safe because you JOIN by the date_id column
sharding by customer (SaaS) is also common
customer_id in FACT and in dim_customer
Safe because join is by customer_id
Star Schema (cont)
Shard-Query has experimental STAR optimizer
support
Scan dimension tables
Push FACT table IN predicates to SQL WHERE clause
Eliminate JOIN to dimension tables without projected
columns
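A rough sketch of those steps (hypothetical table and column names; not the actual optimizer code):

```python
# STAR optimization: scan the dimension with the user's filter, then
# push the matching keys into the fact table's WHERE clause as an IN
# list, eliminating the join when no dimension columns are projected.
dim_date = {20130101: "2013-01-01", 20130102: "2013-01-02",
            20140101: "2014-01-01"}

# Step 1: scan the dimension table with the filter (year = 2013).
matching_ids = sorted(k for k, d in dim_date.items() if d.startswith("2013"))

# Step 2: rewrite the fact query with the pushed-down predicate.
in_list = ",".join(str(k) for k in matching_ids)
fact_sql = f"SELECT SUM(revenue) FROM fact WHERE date_id IN ({in_list})"
print(fact_sql)
```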
Other schema types can work too
Master/detail relationship
Unsharded small lookup tables
comment_type
mood_type
etc
The main tables are sharded by blog_id:
blog_info
blog_posts
blog_comments
These all must contain the "shard key" (blog_id)
because they are joined by blog_id, thus blog metadata, comments
and posts must be stored in the same shard for the same
blog.
Table relationships can not currently be defined.
Some tables (like comments) require minor de-normalization to include
the blog_id column.
Snowflake schema
Shard-Query STAR optimizer not yet extended to
snowflake
Consider using star schema or flat table instead
Links and other info
Shard-Query
http://code.google.com/p/shard-query
http://shardquery.com
http://code.google.com/p/PHP-SQL-Parser
http://code.google.com/p/Instrumentation-for-PHP
Percona
The high performance MySQL and LAMP experts
http://www.percona.com
Training - http://training.percona.com
Support - MySQL, MariaDB, and Percona Server too
Remote DBA - We wake up so you don't have to
Consulting – Is your site slow? We can help.
Development services – Somethings broke? We can fix
it. We can add or improve features to fit your use case.
Gearman
http://www.gearman.org
Job process and concurrent workload
management
Run one worker per physical CPU (or more if you
are IO bound)
Add extra loader workers and exec workers if
needed
Infobright
Infobright Community Edition
Append only
http://infobright.org
Infobright Enterprise Edition
http://infobright.com
They are both column stores but they are
architecturally different.
IEE offers intra-query parallelism natively, which
Shard-Query benefits from because Infobright does
not support partitioning.
TokuDB
Compressing row store for big data
Doesn't suffer IO penalty when updating
secondary indexes
Variable compression level by library
New, so prepare to test thoroughly
http://www.tokudb.com
Groonga/Mroonga
Column store and text search system
Supports text and geospatial search
Native(column store) or fulltext wrapper around
InnoDB/MyISAM
http://groonga.org/
Network Engines
NDB(MySQL Cluster)
http://dev.mysql.com/downloads/cluster/7.3.html
SPIDER storage engine
https://launchpad.net/spiderformysql
CONNECT engine for MariaDB 10.x alpha
http://www.skysql.com/enterprise/mariadb-connect-storage-engine