研究一条distinct+orderby+limit的SQL执行过程,发现limit影响排...--688IT编程网

研究⼀条distinct+orderby+limit的SQL执⾏过程，发现limit影响排。。。

假设有如下表：

mysql> select * from tab;

+----+------+

| id | col1 |

+----+------+

| 1 | 2 |

| 2 | 2 |

| 3 | 5 |

| 4 | 3 |

| 5 | 3 |

| 6 | 4 |

+----+------+

6 rows in set (0.01 sec)

mysql> desc tab;

+-------+---------+------+-----+---------+-------+

+-------+---------+------+-----+---------+-------+

| id | int(11) | NO | PRI | NULL | |

| col1 | int(11) | YES | | NULL | |

+-------+---------+------+-----+---------+-------+

2 rows in set (0.00 sec)

那么，下⾯的SQL是怎样的执⾏过程呢？

select distinct id from tab order by col1 desc limit 1,2;

1.先执⾏select distinct id from tab order by col1 desc，在筛选limit 1,2;？等效于select * from (select distinct id from tab order by col1 desc) a limit 1,2；

2.先执⾏select * from tab order by col1 desc limit 1,2;，再选出distinct id？等效于select distinct id from (select * from tab order by col1 desc limit 1,2) a；

3.先执⾏select * from tab order by col1 desc，再从结果集中第⼀⾏数据进⾏去重，直到取到3个数值，然后拿后⾯2个？

我们先从实际数据来验证⼀下：

⾸先，先执⾏⼀下select distinct id from tab order by col1 desc limit 1,2;，看看实际的返回结果是什么：

mysql> select distinct id from tab order by col1 desc limit 1,2;

+----+

| id |

+----+

| 6 |

| 5 |

+----+

2 rows in set (0.00 sec)

对于第⼀种猜想：

mysql> select * from (select distinct id from tab order by col1 desc) a limit 1,2;

+----+

| id |

+----+

| 6 |

| 4 |

+----+

2 rows in set (0.00 sec)

对于第⼆种猜想：

mysql> select distinct id from (select * from tab order by col1 desc limit 1,2) a;

+----+

| id |

+----+

| 6 |

| 5 |

+----+

2 rows in set (0.00 sec)

对于第三种猜想：

mysql> select * from tab order by col1 desc;

+----+------+

| id | col1 |

+----+------+

| 3 | 5 |

| 6 | 4 |

| 4 | 3 |

| 5 | 3 |

| 1 | 2 |

| 2 | 2 |

+----+------+

6 rows in set (0.00 sec)

--然后根据以上结果，执⾏limit 1,2，从第⼀⾏数据开始，取三个唯⼀id，即3，6，4，取后⾯两个数即是6，4

从上⾯三个测试来看，只有第⼆种猜想得到的结果才是正确，那么，这⼀种猜想就是MySQL真实的执⾏⽅式了吗？答案是否定的。再看下⾯的例⼦。

同样是上⾯的测试表tab，数据不变，我们将上⾯SQL中的col1和id互换位置，即：

select distinct col1 from tab order by id desc limit 1,2;

该SQL结果是：

mysql> select distinct col1 from tab order by id desc limit 1,2;

+------+

| col1 |

+------+

| 3 |

| 5 |

+------+

2 rows in set (0.00 sec)

那么，按照上⾯的第⼆种猜想，该SQL等效于select distinct col1 from (select * from tab order by id desc limit 1,2) a；结果是：

mysql> select distinct col1 from (select * from tab order by id desc limit 1,2) a;

+------+

| col1 |

+------+

| 3 |

+------+

1 row in set (0.00 sec)

显然，结果并不⼀致，第⼆种猜想也沦陷了。

反⽽，按照第三种猜想，结果才是正确的：

mysql> select * from tab order by id desc;

+----+------+

| id | col1 |

+----+------+

| 6 | 4 |

| 5 | 3 |

| 4 | 3 |

| 3 | 5 |

| 2 | 2 |

| 1 | 2 |

+----+------+

6 rows in set (0.00 sec)

--然后取得三个唯⼀col1值：4，3，5，取后⾯两个即3，5，与原SQL结果⼀致。

那么到底哪⼀种才是呢，我认为是第三种，那为什么第⼀条实例SQL对于第三种猜想⾏不通呢？听我解释。

对于本⽂的第⼀条实例SQL select distinct id from tab order by col1 desc limit 1,2;，第三种猜想之所以得到的结果不⼀致，是因为limit 字句导致了select * from tab order by col1 desc排序结果有了变化，

看数据：

--对于没有limit字句的排序结果是下⾯的样⼦，注意id=4排在id=5前⾯:

mysql> select * from tab order by col1 desc;

+----+------+

| id | col1 |

+----+------+

| 3 | 5 |

| 6 | 4 |

| 4 | 3 |

| 5 | 3 |

| 1 | 2 |

| 2 | 2 |

+----+------+

6 rows in set (0.00 sec)

--加了limit字句之后，id=5却排在id=4前⾯：

mysql> select * from tab order by col1 desc limit 1,2;

+----+------+

| id | col1 |

+----+------+

| 6 | 4 |

| 5 | 3 |

+----+------+

2 rows in set (0.00 sec)

也就是说，加了limit 1,2这个⼦句之后，实际的排序结果应该是：

+----+------+

| id | col1 |

+----+------+

| 3 | 5 |

| 6 | 4 |

| 5 | 3 |

| 4 | 3 |

...

+----+------+

也就是说，select distinct id from tab order by col1 desc limit 1,2；的具体执⾏过程是：

按照以下加了limit 1,2⼦句后导致的排序结果：

+----+------+

| id | col1 |

+----+------+

| 3 | 5 |

| 6 | 4 |

| 5 | 3 |

| 4 | 3 |

...

+----+------+

从结果集第⼀⾏开始，将id放⼊临时表中，临时表的结构⼤概是⼀个只⼀列的表，且该列上有⼀个唯⼀索引，在从结果集取数据插⼊临时表的过程中，由唯⼀索引来过滤重复的数据，来达到去重的效果。那么整个过程就是：

1.取id=3放⼊临时表第⼀⾏；

2.取id=6，与id=3不重复，放⼊临时表第⼆⾏；

3.取id=5，与id=3,id=6不重复，放⼊临时第三⾏；

4.对于limit 1,2，需要放⼊临时表3条数据，并顺序取出后⾯两条，即取出id=6,id=5，与原SQL结果⼀致。

那么整个过程，即使limit 1,2影响了排序，但始终还是要遍历整个表，按照col1来排序，需要读取6⾏数据，再加上临时表的3⾏，整个过程应该是读了9⾏数据。我们通过slow log的Rows_examined可以验证这个数据：

# Time: 2019-04-02T05:30:23.462551Z

# User@Host: root[root] @ localhost [] Id: 9

# Query_time: 0.000503 Lock_time: 0.000178 Rows_sent: 2 Rows_examined: 9

select distinct fromSET timestamp=1554183023;

select distinct id from tab order by col1 desc limit 1,2;

以上证明了对于第⼀个⽰例SQL，第三种猜想成⽴的理由。

那么，为什么第⼆个⽰例SQL，第三种猜想就直接成⽴了呢？因为在第⼆个⽰例SQL中，order by id desc中的id是主键，即是limit⼦句会影响排序结果，也是在相同排序列值的结果受影响，⽽id是唯⼀的，也就是说排序结果也是唯⼀的，是不受limit影响的。

所以，对于select distinct … from tab order by … limit …；这类SQL，执⾏⽅式应该是如下：

按照受limit⼦句影响的排序结果，从结果集的第⼀⾏开始遍历，将不重复的值放⼊临时表中，直到数据的数量满⾜limit的取数要求。

SQL的执⾏逻辑解决了，那么limit⼦句为什么会影响排序结果呢？这个问题还没研究透，有做过⼀些测试，但是得到的答案还是不那么确定，估计是跟优化器在执⾏这种SQL时对主键索引的访问⽅式有关。

再说⼀个额外的话题，上⾯已经解释了select distinct id from tab order by col1 desc limit 1,2；读数据的⾏数是9，那么对于第⼆个⽰例SQL select distinct col1 from tab order by id desc limit 1,2；读数据的⾏数也是9吗？我们看⼀下slow log的信息：

# Time: 2019-04-02T05:42:59.280637Z

# User@Host: root[root] @ localhost [] Id: 9

# Query_time: 0.000513 Lock_time: 0.000132 Rows_sent: 2 Rows_examined: 7

SET timestamp=1554183779;

select distinct col1 from tab order by id desc limit 1,2;

可以看到，Rows_examined是7，也就是说，这个SQL只读了7⾏数据。为什么不⼀样呢？看我解释。

解释的过程也是需要结合SQL执⾏过程：

--第⼀步，看排序结果

--⾸先由于order by 主键，所以排序结果并不受影响，直接看⼀下排序结果：

mysql> select * from tab order by id desc;

+----+------+

| id | col1 |

+----+------+

| 6 | 4 |

| 5 | 3 |

| 4 | 3 |

| 3 | 5 |

| 2 | 2 |

| 1 | 2 |

+----+------+

6 rows in set (0.00 sec)

--第⼆步，取唯⼀值到临时表中，对于上述的排序结果及SQL的limit⼦句需要3个唯⼀值，所以应该取的数是：

+----+------+

| id | col1 |

+----+------+

| 6 | 4 |

| 5 | 3 |

| 4 | 3 |

| 3 | 5 |

+----+------+

所以临时表⾥的数据应该是：

+------+

| col1 |

+------+

| 4 |

| 3 |

| 5 |

+----+------+

那么从上⾯的过程来看，Rows_examined应该是6+3=9才对啊？为什么slow log显⽰是7呢？

其实，对于第⼀步的排序，由于是按照主键倒序排序，并不需要遍历整个表，只需要从主键索引最右端⼀直往左读，直到读取到⾜够数量的col1值即可。

即主键索引上读4⾏+临时表3⾏=7⾏。

688IT编程网

研究一条distinct+orderby+limit的SQL执行过程,发现limit影响排...

发表评论

推荐文章

java正则表达式选择题

一种基于正则表达式的DBC文件解析及报文分析方法[发明专利]

工龄小数点提取

非零金额正则表达式

提取文本中数字的函数

热门文章

利用正则表达式实现文本数据提取与处理

正则表达式零宽断言详解

文本匹配规则

excel中使用正则

1-31正则表达式

anki之高级筛选

BUAA_OO_2021_第一单元总结

insert语句递增写法

sublime text 3在行前插入递增数字序号的方法

字符串只允许数字和英文的正则

powerbuilder 正则表达式

Shell脚本编写的高级技巧利用正则表达式进行字符串匹配

JAVA正则表达式的三种模式:贪婪,勉强和占有的讨论

go regexp匹配规则

oracle regexp_substr 实现原理

基本的元字符回溯引用和前后查匹配模式

elasticsearch query dsl正则

oracle sql正则表达式

GA-设置目标

仅匹配全角片假名的正则表达式

最新文章

java正则表达式选择题

工龄小数点提取

非零金额正则表达式

提取文本中数字的函数

vue数字相加小数点变长-概述说明以及解释

vue validate 正则验证小数长度

标签列表

688IT编程网

研究一条distinct+orderby+limit的SQL执行过程,发现limit影响排...

发表评论

推荐文章

java正则表达式 选择题

一种基于正则表达式的DBC文件解析及报文分析方法[发明专利]

工龄小数点提取

非零金额 正则表达式

提取文本中数字的函数

热门文章

利用正则表达式实现文本数据提取与处理

正则表达式零宽断言详解

文本匹配规则

excel中使用正则

1-31正则表达式

anki之高级筛选

BUAA_OO_2021_第一单元总结

insert语句递增写法

sublime text 3在行前插入递增数字序号的方法

字符串只允许数字和英文的正则

powerbuilder 正则表达式

Shell脚本编写的高级技巧利用正则表达式进行字符串匹配

JAVA正则表达式的三种模式:贪婪,勉强和占有的讨论

go regexp匹配规则

oracle regexp_substr 实现原理

基本的元字符 回溯引用和前后查 匹配模式

elasticsearch query dsl正则

oracle sql正则表达式

GA-设置目标

仅匹配全角片假名的正则表达式

最新文章

java正则表达式 选择题

工龄小数点提取

非零金额 正则表达式

提取文本中数字的函数

vue数字相加小数点变长-概述说明以及解释

vue validate 正则验证小数长度

标签列表

java正则表达式选择题

非零金额正则表达式

基本的元字符回溯引用和前后查匹配模式

java正则表达式选择题

非零金额正则表达式