IN Subqueries in MySQL 5.6 Are Optimized Away

Sheeri

13

In MySQL 5.6, it looks like IN() subqueries are optimized even better than they are in MariaDB 5.5. Here’s a typical IN() subquery, using the sakila sample database (query taken from slide 6 of the presentation about new MySQL 5.6 optimizer statistics):

SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)

Before there were any subquery optimizations, say if you are using MySQL 5.1, you would have to rewrite this query as a JOIN, to avoid the dreaded DEPENDENT SUBQUERY that shows up in the EXPLAIN:

mysql> explain SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: film
type: index
possible_keys: NULL
key: idx_title
key_len: 767
ref: NULL
rows: 1025
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: film_actor
type: index_subquery
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: func
rows: 1
Extra: Using index

MariaDB 5.5 boasts subquery optimization, and rightfully so. It looks like MariaDB materializes the subquery:

MariaDB [sakila]> explain SELECT title FROM film WHERE film_id IN
-> (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: film
type: index
possible_keys: PRIMARY
key: idx_title
key_len: 767
ref: NULL
rows: 1043
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table:
type: eq_ref
possible_keys: distinct_key
key: distinct_key
key_len: 2
ref: func
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: MATERIALIZED
table: film_actor
type: index
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: NULL
rows: 4889
Extra: Using index

So MariaDB recognizes the subquery and optimizes it. But it is still optimized as a subquery. There are 3 rows here, a new <subquery2> table is used to help optimize the results.

In MySQL 5.6, the subquery is actually optimized away. The optimizer actually changes it. You can see this in the EXPLAIN plan by looking at the select_type. In both the MySQL 5.1 and MariaDB 5.5 examples, the select_type is PRIMARY, which is used for the outer query in a subquery (or the first SELECT in a UNION, but that does not apply here). In MySQL 5.6, the select_type is SIMPLE for both rows. Note that MySQL 5.6 also does not have to add a third table as MariaDB does:

mysql> explain SELECT title FROM film WHERE film_id IN
-> (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: index
possible_keys: PRIMARY
key: idx_title
key_len: 767
ref: NULL
rows: 1000
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: film_actor
type: ref
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: sakila.film.film_id
rows: 1
Extra: Using index; FirstMatch(film)
2 rows in set (0.00 sec)

In the presentation, the Oracle team says that for DBT3 Query #18, “execution time reduces from days to seconds”. With optimizations like this, I believe it!

13 responses

  1. Kolbe Kegel wrote on :

    I don’t think this is quite the boon you’re making it out to be… the simple absence of the word “subquery” in EXPLAIN output doesn’t give the whole story about what is actually going on. See http://roylyseng.blogspot.com/2012/04/semi-join-in-mysql-56.html

    “The semi-join FirstMatch strategy executes a subquery very similar to how the IN-TO-EXISTS strategy familiar from earlier versions of MySQL works: for each matching row in the outer table, check for a match in the inner table. When a match is found, return the row from the outer table, otherwise continue scanning the inner table until reaching the end.”

    “As this is basically the same strategy that MySQL 5.5 would choose, there is usually no speedup to gain when FirstMatch is chosen.”

    You can see that FirstMatch over the dependent subquery for the specific query you were using has absolutely no impact on the number of rows that must be scanned in order to process this query. See http://pastie.org/5949337

    The whole output wouldn’t fit as a comment, so here are the crucial parts:

    mysql 5.6.9-rc (root) [sakila]> explain SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)\G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: film
    type: index
    possible_keys: PRIMARY
    key: idx_title
    key_len: 767
    ref: NULL
    rows: 1000
    Extra: Using index
    *************************** 2. row ***************************
    id: 1
    select_type: SIMPLE
    table: film_actor
    type: ref
    possible_keys: idx_fk_film_id
    key: idx_fk_film_id
    key_len: 2
    ref: sakila.film.film_id
    rows: 2
    Extra: Using index; FirstMatch(film)
    2 rows in set (0.00 sec)

    mysql 5.6.9-rc (root) [sakila]> show status like ‘Handler%';
    +—————————-+——-+
    | Variable_name | Value |
    +—————————-+——-+
    | Handler_commit | 1 |
    | Handler_delete | 0 |
    | Handler_discover | 0 |
    | Handler_external_lock | 4 |
    | Handler_mrr_init | 0 |
    | Handler_prepare | 0 |
    | Handler_read_first | 1 |
    | Handler_read_key | 1001 |
    | Handler_read_last | 0 |
    | Handler_read_next | 1000 |
    | Handler_read_prev | 0 |
    | Handler_read_rnd | 0 |
    | Handler_read_rnd_next | 0 |
    | Handler_rollback | 0 |
    | Handler_savepoint | 0 |
    | Handler_savepoint_rollback | 0 |
    | Handler_update | 0 |
    | Handler_write | 0 |
    +—————————-+——-+
    18 rows in set (0.01 sec)

    mysql 5.6.9-rc (root) [sakila]> explain SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)\G
    *************************** 1. row ***************************
    id: 1
    select_type: PRIMARY
    table: film
    type: index
    possible_keys: NULL
    key: idx_title
    key_len: 767
    ref: NULL
    rows: 1000
    Extra: Using where; Using index
    *************************** 2. row ***************************
    id: 2
    select_type: DEPENDENT SUBQUERY
    table: film_actor
    type: index_subquery
    possible_keys: idx_fk_film_id
    key: idx_fk_film_id
    key_len: 2
    ref: func
    rows: 2
    Extra: Using index
    2 rows in set (0.00 sec)

    mysql 5.6.9-rc (root) [sakila]> show status like ‘Handler%';
    +—————————-+——-+
    | Variable_name | Value |
    +—————————-+——-+
    | Handler_commit | 1 |
    | Handler_delete | 0 |
    | Handler_discover | 0 |
    | Handler_external_lock | 4 |
    | Handler_mrr_init | 0 |
    | Handler_prepare | 0 |
    | Handler_read_first | 1 |
    | Handler_read_key | 1001 |
    | Handler_read_last | 0 |
    | Handler_read_next | 1000 |
    | Handler_read_prev | 0 |
    | Handler_read_rnd | 0 |
    | Handler_read_rnd_next | 0 |
    | Handler_rollback | 0 |
    | Handler_savepoint | 0 |
    | Handler_savepoint_rollback | 0 |
    | Handler_update | 0 |
    | Handler_write | 0 |
    +—————————-+——-+
    18 rows in set (0.00 sec)

  2. Sheeri wrote on ::

    Kolbe – we’re talking on Twitter, but just so it’s recorded here:

    The example just shows the point – that MySQL optimizes AWAY the subquery, whereas MariaDB doesn’t.

    1. Kolbe Kegel wrote on :

      Yeah, sorry. My goal was to make the point that “optimizing away the subquery” isn’t, in itself, such a valuable achievement when the thing it uses instead of a “DEPENDENT SUBQUERY” is described by an optimizer developer at Oracle in this way:

      “The semi-join FirstMatch strategy executes a subquery very similar to how the IN-TO-EXISTS strategy familiar from earlier versions of MySQL works: for each matching row in the outer table, check for a match in the inner table. When a match is found, return the row from the outer table, otherwise continue scanning the inner table until reaching the end.”

      My point was that just because you don’t see the word “subquery” or just because you see select_type=SIMPLE in EXPLAIN output doesn’t mean, in practice, that MySQL is really taking a different path in retrieving rows. Despite the fact you see select_type=SIMPLE, MySQL 5.6, using FirstMatch, is doing almost literally the same exact work it was doing in MySQL 5.6 when you saw DEPENDENT SUBQUERY.

      There are certainly many impressive improvements in MySQL 5.6, I just don’t think this example highlights them, and I wanted to point out that when you see FirstMatch instead of DEPENDENT SUBQUERY in EXPLAIN output you can dust off your hands and head home for the day.

      Saying the subquery is “optimized away” here is misleading, in my opinion. You’ve got exactly the same performance situation and logical execution plan that you did before, even if MySQL has taken the liberty of calling it something else.

      1. Sheeri wrote on ::

        It’s not misleading. You are focusing on the FirstMatch strategy. And in fact it’s NOT the same as dependent subquery – Duplicate Weedout is the same a dependent subquery, in this case. See:

        http://dev.mysql.com/doc/refman/5.6/en/semi-joins.html

        There are other algorithms there. The example serves to prove the point that the subquery is optimized away into a join. Sometimes the performance is better, sometimes the same, sometimes worse. As with everything, you have to test on YOUR workload.

        It’s extremely promising, though, that the semijoin in MySQL 5.6 is optimized into a regular join for some queries. MariaDB optimizes the semijoin by materializing the subquery, which is good too.

        1. Kolbe Kegel wrote on :

          In fact, DEPENDENT SUBQUERY and Duplicate Weedout result in quite different work being done for this query:

          mysql 5.6.9-rc (root) [sakila]> explain SELECT SQL_NO_CACHE title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)\G
          *************************** 1. row ***************************
          id: 1
          select_type: SIMPLE
          table: film
          type: index
          possible_keys: PRIMARY
          key: idx_title
          key_len: 767
          ref: NULL
          rows: 1000
          Extra: Using index
          *************************** 2. row ***************************
          id: 1
          select_type: SIMPLE
          table: film_actor
          type: ref
          possible_keys: idx_fk_film_id
          key: idx_fk_film_id
          key_len: 2
          ref: sakila.film.film_id
          rows: 2
          Extra: Using index; Start temporary; End temporary
          2 rows in set (0.00 sec)

          mysql 5.6.9-rc (root) [sakila]> flush status;
          Query OK, 0 rows affected (0.00 sec)

          mysql 5.6.9-rc (root) [sakila]> pager > /dev/null
          PAGER set to ‘> /dev/null’
          mysql 5.6.9-rc (root) [sakila]> SELECT SQL_NO_CACHE title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)\G
          997 rows in set (0.01 sec)

          mysql 5.6.9-rc (root) [sakila]> nopager
          PAGER set to stdout
          mysql 5.6.9-rc (root) [sakila]> show status like ‘handler%';
          +—————————-+——-+
          | Variable_name | Value |
          +—————————-+——-+
          | Handler_commit | 1 |
          | Handler_delete | 0 |
          | Handler_discover | 0 |
          | Handler_external_lock | 4 |
          | Handler_mrr_init | 0 |
          | Handler_prepare | 0 |
          | Handler_read_first | 1 |
          | Handler_read_key | 1001 |
          | Handler_read_last | 0 |
          | Handler_read_next | 6462 |
          | Handler_read_prev | 0 |
          | Handler_read_rnd | 0 |
          | Handler_read_rnd_next | 0 |
          | Handler_rollback | 0 |
          | Handler_savepoint | 0 |
          | Handler_savepoint_rollback | 0 |
          | Handler_update | 0 |
          | Handler_write | 0 |
          +—————————-+——-+
          18 rows in set (0.01 sec)

          mysql 5.6.9-rc (root) [sakila]> set optimizer_switch=’semijoin=off,firstmatch=off,materialization=off,loosescan=off';
          Query OK, 0 rows affected (0.00 sec)

          mysql 5.6.9-rc (root) [sakila]> flush status;
          Query OK, 0 rows affected (0.00 sec)

          mysql 5.6.9-rc (root) [sakila]> explain SELECT SQL_NO_CACHE title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)\G
          *************************** 1. row ***************************
          id: 1
          select_type: PRIMARY
          table: film
          type: index
          possible_keys: NULL
          key: idx_title
          key_len: 767
          ref: NULL
          rows: 1000
          Extra: Using where; Using index
          *************************** 2. row ***************************
          id: 2
          select_type: DEPENDENT SUBQUERY
          table: film_actor
          type: index_subquery
          possible_keys: idx_fk_film_id
          key: idx_fk_film_id
          key_len: 2
          ref: func
          rows: 2
          Extra: Using index
          2 rows in set (0.00 sec)

          mysql 5.6.9-rc (root) [sakila]> pager > /dev/null
          PAGER set to ‘> /dev/null’
          mysql 5.6.9-rc (root) [sakila]> SELECT SQL_NO_CACHE title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)\G
          997 rows in set (0.01 sec)

          mysql 5.6.9-rc (root) [sakila]> nopager
          PAGER set to stdout
          mysql 5.6.9-rc (root) [sakila]> show status like ‘handler%';
          +—————————-+——-+
          | Variable_name | Value |
          +—————————-+——-+
          | Handler_commit | 2 |
          | Handler_delete | 0 |
          | Handler_discover | 0 |
          | Handler_external_lock | 8 |
          | Handler_mrr_init | 0 |
          | Handler_prepare | 0 |
          | Handler_read_first | 1 |
          | Handler_read_key | 1001 |
          | Handler_read_last | 0 |
          | Handler_read_next | 1000 |
          | Handler_read_prev | 0 |
          | Handler_read_rnd | 0 |
          | Handler_read_rnd_next | 0 |
          | Handler_rollback | 0 |
          | Handler_savepoint | 0 |
          | Handler_savepoint_rollback | 0 |
          | Handler_update | 0 |
          | Handler_write | 0 |
          +—————————-+——-+
          18 rows in set (0.00 sec)

  3. Sheeri wrote on ::

    You can see the rewrite in the output to EXPLAIN EXTENDED (or in 5.6 EXPLAIN FORMAT=JSON) and then running SHOW WARNINGS.

    On MySQL 5.6, the query is rewritten to a semi-join:
    mysql> show warnings\G
    *************************** 1. row ***************************
    Level: Note
    Code: 1003
    Message: /* select#1 */ select `sakila`.`customer`.`first_name` AS `first_name`,`sakila`.`customer`.`last_name` AS `last_name` from `sakila`.`customer` semi join (`sakila`.`rental`) where (`sakila`.`rental`.`customer_id` = `sakila`.`customer`.`customer_id`)
    1 row in set (0.00 sec)

    On MariaDB 5.5, the query is also rewritten as a semi-join:
    MariaDB [sakila]> show warnings\G
    *************************** 1. row ***************************
    Level: Note
    Code: 1003
    Message: select `sakila`.`customer`.`first_name` AS `first_name`,`sakila`.`customer`.`last_name` AS `last_name` from `sakila`.`customer` semi join (`sakila`.`rental`) where 1
    1 row in set (0.00 sec)

    For completeness’ sake, here’s what the rewrite is on MySQL 5.1:
    mysql> show warnings\G
    *************************** 1. row ***************************
    Level: Note
    Code: 1003
    Message: select `sakila`.`customer`.`first_name` AS `first_name`,`sakila`.`customer`.`last_name` AS `last_name` from `sakila`.`customer` where (`sakila`.`customer`.`customer_id`,(((`sakila`.`customer`.`customer_id`) in rental on idx_fk_customer_id where ((`sakila`.`rental`.`rental_date` between ‘2006-02-01 00:00:00′ and ‘2006-02-28 00:00:00′) and ((`sakila`.`customer`.`customer_id`) = `sakila`.`rental`.`customer_id`)))))
    1 row in set (0.00 sec)

  4. Sheeri wrote on ::

    Also, your comment is very focused on FirstMatch. There are other algorithms, and the reason I didn’t talk about what FirstMatch meant (it’s also new in MySQL 5.6) is that it’s irrelevant to the point. The point is in the title – subqueries in MySQL 5.6 are optimized away, as you can see by the EXPLAIN output where select_type is SIMPLE instead of PRIMARY.

  5. Patryk Pomykalski wrote on ::

    MariaDB has semijoin too. For this particular query the optimizer choose materialization, but you can probably force semijoin by tweaking optimizer_switch. Saying that mysql 5.6 did a better job here maybe misleading unless you test which one is actually faster.

  6. Pingback from Sergey Petrunia’s blog » No, Sheeri, MySQL 5.6 does not optimize subqueries away on ::

    [...] wrote a blog post that claims that “IN Subqueries in MySQL 5.6 Are Optimized Away” and uses that as a [...]

  7. Justin Swanhart wrote on :

    Optimizing away means removing work completely, not reducing it, which is just optimizing. MySQL is just optimizing the query via a rewrite which is not the same as “optimizing away”.

    Two simple examples of optimizing away:
    where const = const
    where const != const

    In the first case, the optimizer does not evaluate the predicate during query execution because the predicate always returns true, thus the computation is truly optimized away.

    In the second case, the optimizer will detect the impossible WHERE condition early during execution and will optimize away all the remaining work of the query, which is the bulk as it never even got to reading rows.

  8. Patryk Pomykalski wrote on ::

    BTW this blog is unreadable on mobile phone (at least on galaxy nexus). Tried Opera, Chrome and the default browser. Text width is squezed to a few characters so even single words don’t fit in.

    1. Sheeri wrote on ::

      I have filed https://bugzilla.mozilla.org/show_bug.cgi?id=837712

      1. Craig Cook wrote on ::

        There were some custom styles being injected, adding a right margin to the main content block. It’s probably something the old theme needed that isn’t relevant now, and removing the styles fixes the sidebar. Enjoy your mobile reading :)