Goldilocks And The Three Queries – MySQL's EXPLAIN Explained
Dave Stokes, MySQL Community Manager, North America

Goldilocks and the Three MySQL Queries


Optimizing MySQL queries using EXPLAIN or optimizer tracing can greatly increase the speed of retrieving or storing data.


Page 1: Goldilocks and the Three MySQL Queries

Goldilocks And The Three Queries – MySQL's EXPLAIN Explained

Dave Stokes
MySQL Community Manager, North America

Page 2: Goldilocks and the Three MySQL Queries

Please Read

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 3: Goldilocks and the Three MySQL Queries

Simple Introduction

EXPLAIN & EXPLAIN EXTENDED are tools to help optimize queries. As tools, they are only as good as the craftspeople using them. There is more to this subject than can be covered in a single presentation, but hopefully this session will start you out on the right path for using EXPLAIN.

Page 4: Goldilocks and the Three MySQL Queries

Why worry about the optimizer?

Client sends statement to server

Server checks the query cache to see if it has already run the statement. If so, it retrieves the stored result and sends it back to the client.

Statement is parsed, preprocessed and optimized to make a Query Execution Plan.

The query execution engine sends the QEP to the storage engine API.

Results are sent to the client.

Page 5: Goldilocks and the Three MySQL Queries

Once upon a time ...

There was a PHP programmer named Goldilocks who wanted to get the phone number of her friend Little Red Riding Hood in Networking. She found an old, dusty piece of code in the enchanted programmers' library. Inside the code was a special chant to get all the names and phone numbers of the employees of Grimm-Fayre-Tails Corp. And so, Goldi tried that special chant!

SELECT name, phone

FROM employees;

Page 6: Goldilocks and the Three MySQL Queries

Oh-No!

But the special chant kept running, and running, and running.

Eventually Goldi control-C-ed after hearing that the company had 10^10 employees in the database. Grimm had hired many, many folks.

Page 7: Goldilocks and the Three MySQL Queries

A second chant

Goldi did some searching in the library and learned she could add to the chant to look only for her friend Red.

SELECT name, phone

FROM employees

WHERE name LIKE 'Red%';

Goldi crossed her fingers, held her breath, and let 'er rip.

Page 8: Goldilocks and the Three MySQL Queries

What she got

Name, phone
Redford 1234
Redmund 2323
Redlegs 1234
Red Sox 1914
Redding 9021

– But this was not what Goldilocks needed. So she asked a kindly old Java Owl for help.

Page 9: Goldilocks and the Three MySQL Queries

The Owl's chant

'Ah, you want the nickname field!' He re-crafted her chant.

SELECT first, nick, last, phone, `group`

FROM employees

WHERE nick LIKE '%red%';

Page 10: Goldilocks and the Three MySQL Queries

Still too much data … but better

Betty, Big Red, Lopez, 4321, Accounting
Ethel, Little Red, Riding-Hoode, 127.0.0.1, Networks
Agatha, Red Herring, Christie, 007, Public Relations
Johnny, Reds Catcher, Bench, 421, Gaming

Page 11: Goldilocks and the Three MySQL Queries

'We can tune the query better'

Cried the Owl.

SELECT first, nick, name, phone, `group`

FROM employees

WHERE nick LIKE 'Red%'

AND `group` = 'Networking';

But Goldi was too busy after she got the data she needed to listen.

Page 12: Goldilocks and the Three MySQL Queries

The preceding were obviously flawed queries

• But how do you check if queries are running efficiently?

• What does the query the MySQL server runs really look like (the dreaded Query Execution Plan)? What is cost-based optimization?

• How can you make queries faster?

Page 13: Goldilocks and the Three MySQL Queries

EXPLAIN & EXPLAIN EXTENDED

EXPLAIN [EXTENDED | PARTITIONS]

{

SELECT statement

| DELETE statement

| INSERT statement

| REPLACE statement

| UPDATE statement

}

Or EXPLAIN tbl_name (same as DESCRIBE tbl_name)
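
The two forms above can be sketched as follows. This is a minimal example, assuming the stock `world` sample database (the one used on later slides) is loaded; the query itself is only an illustration, not one taken from the slides.

```sql
USE world;

-- Ask the optimizer for its plan instead of running the query.
EXPLAIN SELECT Name, Population FROM City WHERE CountryCode = 'USA';

-- Given a table name instead of a statement, EXPLAIN acts like DESCRIBE
-- and lists the table's columns.
EXPLAIN City;
```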

Page 14: Goldilocks and the Three MySQL Queries

What is being EXPLAINed

Prepending EXPLAIN to a statement* asks the optimizer how it plans to execute that statement at the lowest cost, measured in disk page seeks** (and sometimes it guesses wrong).

What it can tell you:
-- Where to add INDEXes to speed row access
-- How to check JOIN order

And Optimizer Tracing (more later) has been recently introduced!

* SELECT, DELETE, INSERT, REPLACE & UPDATE as of 5.6; only SELECT in 5.5 & previous
** Does not know whether a page is in memory or on disk (storage engine's problem, not the optimizer's); see MySQL Manual 7.8.3

Page 15: Goldilocks and the Three MySQL Queries

The Columns

id Which SELECT

select_type The SELECT type

table Output row table

type JOIN type

possible_keys Potential indexes

key Actual index used

key_len Length of actual index

ref Columns used against index

rows Estimate of rows

extra Additional Info

Page 16: Goldilocks and the Three MySQL Queries

A first look at EXPLAIN...using World database

Will read all 4079 rows – all the rows in this table

Page 17: Goldilocks and the Three MySQL Queries

EXPLAIN EXTENDED -> query plan

Filtered: Estimated % of rows filtered by condition

The query as seen by server (kind of, sort of, close)
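
A minimal sketch of how to see that rewritten query, assuming MySQL 5.6 (the release these slides target) and the `world` sample database; the SELECT is illustrative. The EXTENDED keyword was dropped in later releases, where plain EXPLAIN reports the same information.

```sql
USE world;

-- EXTENDED adds the "filtered" column to the EXPLAIN output.
EXPLAIN EXTENDED SELECT Name FROM City WHERE CountryCode = 'USA';

-- The statement as rewritten by the optimizer is returned as a Note.
SHOW WARNINGS;
```

SHOW WARNINGS has to be run in the same session, right after the EXPLAIN EXTENDED statement.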

Page 18: Goldilocks and the Three MySQL Queries

Add in a WHERE clause

Page 19: Goldilocks and the Three MySQL Queries

Time for a quick review of indexes

Advantages
– Go right to desired row(s) instead of reading ALL ROWS
– Smaller than whole table (read from disk faster)
– Can 'carry' other data with compound indexes

Disadvantages
– Overhead*
  • CRUD
– Not used on full table scans

* May need to run ANALYZE TABLE to update statistics such as cardinality to help optimizer make better choices
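
As a rough illustration of the points above, the sketch below adds an index, refreshes statistics, and checks the plan. It assumes the `world` sample database; `idx_district` is a hypothetical index name and the District filter is only an example.

```sql
USE world;

-- idx_district is a made-up name; District has no index in the stock schema.
CREATE INDEX idx_district ON City (District);

-- Refresh index statistics so the optimizer sees current cardinality.
ANALYZE TABLE City;

-- The Cardinality column shows the estimate the optimizer works from.
SHOW INDEX FROM City;

-- The new index should now appear under possible_keys.
EXPLAIN SELECT Name FROM City WHERE District = 'Texas';
```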

Page 20: Goldilocks and the Three MySQL Queries

Quiz: Why read 4079 rows when only five are needed?

Page 21: Goldilocks and the Three MySQL Queries

Information in the type Column

ALL – full table scan (to be avoided when possible)
CONST – WHERE ID=1
EQ_REF – WHERE a.ID = b.ID (uses indexes, 1 row returned)
REF – WHERE state='CA' (multiple rows for key values)
REF_OR_NULL – WHERE ID IS NULL (extra lookup needed for NULL)
INDEX_MERGE – WHERE ID = 10 OR state = 'CA'
RANGE – WHERE x IN (10,20,30)
INDEX – full index scan (usually faster when index file < data file)
UNIQUE_SUBQUERY –
INDEX_SUBQUERY –
SYSTEM – Table with 1 row or in-memory table
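
A few of these access types can be reproduced against the `world` sample database. This is a sketch that assumes the stock schema (primary key on City.ID and Country.Code, a non-unique index on City.CountryCode, no index on Population). The types named in the comments are what one would typically expect, not guaranteed output.

```sql
USE world;

-- Equality on the primary key: typically const.
EXPLAIN SELECT * FROM City WHERE ID = 1;

-- Equality on a non-unique index: typically ref.
EXPLAIN SELECT * FROM City WHERE CountryCode = 'USA';

-- A list of primary key values: typically range.
EXPLAIN SELECT * FROM City WHERE ID IN (10, 20, 30);

-- Filter on an unindexed column: typically ALL.
EXPLAIN SELECT * FROM City WHERE Population > 1000000;

-- Join on an indexed column: the second table in the plan is usually
-- reached via ref or eq_ref, depending on the join order chosen.
EXPLAIN SELECT City.Name, Country.Name
FROM City JOIN Country ON City.CountryCode = Country.Code;
```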

Page 22: Goldilocks and the Three MySQL Queries

Full table scans VS Index

So let's create a copy of the World.City table that has no indexes. The optimizer estimates that 4,279 rows would have to be read to find the desired record – about 5% more than the actual number of rows.

And the table has only 4,079 rows.
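
One way to build such a copy is sketched below. `City_noidx` is a hypothetical table name, and the lookup on ID = 999 is borrowed from the optimizer tracing slide rather than from this one.

```sql
USE world;

-- CREATE TABLE ... SELECT copies the rows and column definitions
-- but none of the indexes.
CREATE TABLE City_noidx AS SELECT * FROM City;

-- No key is available, so every row has to be examined.
EXPLAIN SELECT Name FROM City_noidx WHERE ID = 999;

-- Same lookup on the original table, which can use the primary key.
EXPLAIN SELECT Name FROM City WHERE ID = 999;
```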

Page 23: Goldilocks and the Three MySQL Queries

How does NULL change things?

Taking NOT NULL away from the ID field (plus the previous index) increases the estimated rows read to 4296! Roughly 5.3% more rows than are actually in the file.

Running ANALYZE TABLE reduces the count to 3816 – still > 1

Page 24: Goldilocks and the Three MySQL Queries

Both of the following return 1 row

Page 25: Goldilocks and the Three MySQL Queries

EXPLAIN PARTITIONS – Add 12 hash partitions to City
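
A sketch of that setup, assuming MySQL 5.6 and the `world` sample database. It partitions a copy (`City_part`, a hypothetical name) rather than City itself, because some versions of the sample schema put a foreign key on City, and partitioned InnoDB tables cannot carry foreign keys.

```sql
USE world;

-- LIKE copies columns and indexes but not foreign key definitions.
CREATE TABLE City_part LIKE City;

-- ID is the primary key, so it can serve as the partitioning column.
ALTER TABLE City_part PARTITION BY HASH (ID) PARTITIONS 12;

INSERT INTO City_part SELECT * FROM City;

-- The "partitions" column lists which partitions would be read.
EXPLAIN PARTITIONS SELECT Name FROM City_part WHERE ID = 999;
```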

Page 26: Goldilocks and the Three MySQL Queries

Some parts of your query may be hidden!!

Page 27: Goldilocks and the Three MySQL Queries

Latin1 versus UTF8

Create a copy of the City table but with the UTF8 character set replacing Latin1. The key_len of the three-character key grows from 3 bytes to 9, because utf8 reserves up to three bytes per character. That is more data to read and more to compare, which is pronounced 'slower'.
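
The comparison can be sketched as below, assuming the latin1 version of the `world` sample database and that the three-character key in question is the CHAR(3) CountryCode index. `City_utf8` is a hypothetical table name.

```sql
USE world;

CREATE TABLE City_utf8 LIKE City;

-- Convert every character column in the copy to utf8.
ALTER TABLE City_utf8 CONVERT TO CHARACTER SET utf8;

INSERT INTO City_utf8 SELECT * FROM City;

-- Compare the key_len column of the two plans.
EXPLAIN SELECT Name FROM City      WHERE CountryCode = 'USA';
EXPLAIN SELECT Name FROM City_utf8 WHERE CountryCode = 'USA';
```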

Page 28: Goldilocks and the Three MySQL Queries

INDEX Length

If we create a new index on CountryCode with a length of 2 bytes, does it work as well as the original 3-byte index?

Page 29: Goldilocks and the Three MySQL Queries

Forcing use of new shorter index ...

Still generates a guesstimate that 39 rows must be read.

In some cases there is performance to be gained in using shorter indexes.
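
A minimal sketch of the experiment, assuming the `world` sample database; `cc2` is a hypothetical index name and 'USA' is only an example value, so the row estimate will not necessarily match the 39 quoted above.

```sql
USE world;

-- Prefix index covering only the first two characters of CountryCode.
CREATE INDEX cc2 ON City (CountryCode(2));

-- Force the optimizer to use the shorter index and inspect key_len and rows.
EXPLAIN SELECT Name FROM City FORCE INDEX (cc2) WHERE CountryCode = 'USA';
```

A prefix index is smaller, but the server still has to check the full column value for each candidate row.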

Page 30: Goldilocks and the Three MySQL Queries

Subqueries

Subqueries may be run as part of EXPLAIN execution and can cause significant overhead. So be careful when testing.

Note here that #1 is not using an index. And that is why we recommend rewriting subqueries as joins.
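
The slide's own query is not in this transcript, so the pair below is only an illustration of such a rewrite against the `world` sample database. The two statements return the same cities because Country.Code is the primary key, so the join cannot duplicate rows.

```sql
USE world;

-- Subquery form.
EXPLAIN SELECT Name FROM City
WHERE CountryCode IN (SELECT Code FROM Country WHERE Continent = 'Europe');

-- Equivalent join form.
EXPLAIN SELECT City.Name
FROM City
JOIN Country ON City.CountryCode = Country.Code
WHERE Country.Continent = 'Europe';
```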

Page 31: Goldilocks and the Three MySQL Queries

EXAMPLE of covering Indexing

In this case, adding an index reduces the reads from 239 to 42.

Can we do better for this query?

Page 32: Goldilocks and the Three MySQL Queries

Index on both Continent and Government Form

With both Continent and GovernmentForm indexed together, we go from 42 rows read to 19.

Using index means the data is retrieved from index not table (good)

Using index condition means evaluation of the condition is pushed down to the storage engine. This can reduce the storage engine's reads of the table and the server's reads from the storage engine (not bad)
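
A compound index like this could be set up as sketched here, assuming the `world` sample database. `idx_cont_gov` is a hypothetical index name, and the filter values are examples rather than the ones on the slide.

```sql
USE world;

CREATE INDEX idx_cont_gov ON Country (Continent, GovernmentForm);

-- Only indexed columns are selected, so the index alone can answer
-- the query (look for "Using index" in Extra).
EXPLAIN SELECT Continent, GovernmentForm FROM Country
WHERE Continent = 'Asia' AND GovernmentForm = 'Republic';

-- Name is not in the index, so the table must be read; the LIKE filter
-- may be pushed down (look for "Using index condition" in Extra).
EXPLAIN SELECT Name FROM Country
WHERE Continent = 'Asia' AND GovernmentForm LIKE 'Re%';
```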

Page 33: Goldilocks and the Three MySQL Queries

Extra ***

USING INDEX – Getting data from the index rather than the table
USING FILESORT – An extra sort pass was needed rather than reading rows in index order. May spill to disk (slow). ORDER BY can use indexes
USING TEMPORARY – A temp table was created – see tmp_table_size and max_heap_table_size
USING WHERE – filter outside storage engine
USING JOIN BUFFER – means no index used.
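
Some of these Extra values can be provoked on the `world` sample database. This is a sketch assuming the stock schema, where Population and District carry no index; the values named in the comments are typical, not guaranteed.

```sql
USE world;

-- Sorting on an unindexed column usually shows "Using filesort".
EXPLAIN SELECT Name FROM City ORDER BY Population;

-- Sorting on the primary key can follow index order instead.
EXPLAIN SELECT Name FROM City ORDER BY ID;

-- Grouping on an unindexed column usually shows "Using temporary".
EXPLAIN SELECT District, COUNT(*) FROM City GROUP BY District;
```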

Page 34: Goldilocks and the Three MySQL Queries

Things can get messy!

Page 35: Goldilocks and the Three MySQL Queries

straight_join forces order of tables
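
A minimal sketch, assuming the `world` sample database. The second statement asks the server to join the tables in the order they appear in the FROM clause, so its plan may list the tables in a different order than the first.

```sql
USE world;

-- The optimizer picks the join order.
EXPLAIN SELECT City.Name, Country.Name
FROM City JOIN Country ON City.CountryCode = Country.Code;

-- STRAIGHT_JOIN fixes the order: City first, then Country.
EXPLAIN SELECT STRAIGHT_JOIN City.Name, Country.Name
FROM City JOIN Country ON City.CountryCode = Country.Code;
```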

Page 36: Goldilocks and the Three MySQL Queries

Index Hints

index_hint:
    USE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])
  | IGNORE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)
  | FORCE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

Use only as a last resort – shifts in data can make this the 'long way around'.

http://dev.mysql.com/doc/refman/5.6/en/index-hints.html
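
Two of the hints in use, as a sketch. It assumes the `world` sample database, where the index on City.CountryCode is itself named CountryCode in the stock schema.

```sql
USE world;

-- Suggest the CountryCode index to the optimizer.
EXPLAIN SELECT Name FROM City USE INDEX (CountryCode)
WHERE CountryCode = 'USA';

-- Hide that index from the optimizer and compare the plan.
EXPLAIN SELECT Name FROM City IGNORE INDEX (CountryCode)
WHERE CountryCode = 'USA';
```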

Page 37: Goldilocks and the Three MySQL Queries

Controlling the Optimizer

mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off

You can turn on or off certain optimizer settings for GLOBAL or SESSION

See MySQL Manual 7.8.4.2 and know your mileage may vary.
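
Changing a single flag for the current session might look like this. The choice of index_merge is arbitrary; any flag from the list above works the same way.

```sql
-- Turn one optimization off for this session only.
SET SESSION optimizer_switch = 'index_merge=off';

-- Confirm the change (\G is the mysql command-line client's vertical output).
SELECT @@optimizer_switch\G

-- Put that flag back to its default.
SET SESSION optimizer_switch = 'index_merge=default';
```

Flags not named in the SET statement keep their current values.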

Page 38: Goldilocks and the Three MySQL Queries

Things to watch

mysqladmin -r -i 10 extended-status

Slow_queries – number in last period
Select_scan – full table scans
Select_full_join – joins that needed full table scans to complete
Created_tmp_disk_tables – temporary tables created on disk
Key_read_requests/Key_write_requests – read/write weighting of application, may need to modify application
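
The same counters can also be read from inside a client session. A small sketch; unlike mysqladmin with -r, these are cumulative values since server start, not per-interval differences.

```sql
-- Cumulative server-wide counters for the variables listed above.
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Slow_queries', 'Select_scan', 'Select_full_join',
                        'Created_tmp_disk_tables',
                        'Key_read_requests', 'Key_write_requests');
```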

Page 39: Goldilocks and the Three MySQL Queries

Optimizer Tracing (5.6.3 onward)

SET optimizer_trace="enabled=on";
SELECT Name FROM City WHERE ID=999;
SELECT trace INTO DUMPFILE '/tmp/foo' FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

Shows more logic than EXPLAIN

The output shows much deeper detail on how the optimizer chooses to process a query. This level of detail is well past the level for this presentation.

Page 40: Goldilocks and the Three MySQL Queries

Sample from the trace – but no clues on optimizing for Joe Average DBA

Page 41: Goldilocks and the Three MySQL Queries

Final Thoughts

1. READ chapter 7 of the MySQL Manual
2. Run ANALYZE TABLE periodically
3. Minimize disk I/O

Page 42: Goldilocks and the Three MySQL Queries

Q&A