mysql千万级内模糊查询的实现方式--688IT编程网

mysql千万级内模糊查询的实现⽅式

昨晚辗转反侧，灵光闪现，突然想到了覆盖索引+主动回表的⽅式，管你⼏个字段，我只要⼀个普通索引。

所以千万级⼤表的like模糊查询能不能做？

能

废话不多说，那就搞⼀搞。

建表

create table emp

(

id int unsigned auto_increment

primary key,

empno mediumint unsigned default 0 not null,

ename varchar(20) default '' not null,

job varchar(9) default '' not null,

mgr mediumint unsigned default 0 not null,

hiredate date not null,

sal decimal(7, 2) not null,

comm decimal(7, 2) not null,

deptno mediumint unsigned default 0 not null

)

charset = utf8;

导⼊千万级数据

⽅法在

bigdata> select count(*) from emp

[2021-08-19 11:08:25] 1 row retrieved starting from 1 in 2 s 900 ms (execution: 2 s 874 ms, fetching: 26 ms)

未建索引下的模糊查询

bigdata> select ename, empno, job from emp where ename like '%S%'

[2021-08-19 11:14:25] 2,765,363 rows retrieved starting from 1 in 9 s 360 ms (execution: 8 ms, fetching: 9 s 352 ms)

仅右模糊的就不考虑了，都知道是⾛索引的。

上法宝，覆盖索引

不幸的是，直接卡在了创建索引这⼀步，因为表已经千万数据了，直接建索引机器就卡死了，顺便搜索了⼀下，总结的很好，但是我不⽤我直接truncate删除表索引和数据

检查索引/表是否删除⼲净

use information_schema;

# 查看指定库的指定表的⼤⼩

select concat(round(sum(DATA_LENGTH/1024/1024),2),'MB') as data from TABLES where table_schema='bigdata' and table_name='emp';

# 查看指定库的指定表的索引⼤⼩

SELECT CONCAT(ROUND(SUM(index_length)/(1024*1024), 2), 'MB') AS 'Total Index Size' FROM TABLES WHERE table_schema = 'bigdata' and table_name='emp';

创建索引

create index emp_ename_idx on emp (ename);

再次导⼊数据

查看mysql索引

Call insert_emp10000(0,10000000);

[2021-08-19 14:18:53] completed in 2 h 22 m 37 s 90 ms

时间有够长的。。

尝试⼀下有索引的like模糊

bigdata> select ename from emp where ename like '%S%'

[2021-08-19 14:37:40] 2,093,321 rows retrieved starting from 1 in 5 s 128 ms (execution: 34 ms, fetching: 5 s 94 ms)

覆盖索引，性能提升

可以⽤desc/explain确认⼀下⾛了索引，原理不说了吧，覆盖索引

对⽐上⾯可以发现，使⽤覆盖索引后性能提升了⼀倍

但你可能说：就这？就这？这有卵⽤，谁查询时也不可能只查⼀个字段呀，但是把要查询的字段都加上索引⼜不现实，毕竟索引也需要空间存储的，给要返回的字段都加上索引，可能光索引就⽐表数据⼤N倍了。

那咋整？

实不相瞒，这就是我昨晚思考到的，以⾄于兴奋得夜不能寐。

关键在于这样：

bigdata> select id, ename from emp where ename like '%S%'

[2021-08-19 14:48:11] 2,093,321 rows retrieved starting from 1 in 4 s 685 ms (execution: 9 ms, fetching: 4 s 676 ms)没错，就多了个id（或者直接返回id也是可以的，不不不，理论上应该仅返回id，可避免mysql回表）

id有什么⽤？id能精确查数据鸭！（有没有觉得很像外部的“主动回表”）

就像这样，⼆次查询

bigdata> select id, from emp where id in (497723, 670849, 1371884, 1934742, 1960444, 2165983) [2021-08-19 15:45:23] 6 rows retrieved starting from 1 in 78 ms (execution: 23 ms, fetching: 55 ms)

这速度不就有了，数据也有了。

基于此，还可以实现内存分页，且基本不⽤担⼼内存溢出问题

再搞个缓存，性能⼜能进⼀步提升，不过代价也很明显，复杂度进⼀步提升

688IT编程网

mysql千万级内模糊查询的实现方式

发表评论

推荐文章

应用程序的安全检测方法、装置、电子设备和存储介质

nginx map用法正则

VBA之正则表达式(1)--基础篇

Prometheus监控学习笔记之初识PromQL

关于PHP中的webshell

热门文章

m函数数字提取

jest断言方法大全

中兴ZXSEC US 管理员手册

keras系列(一):参数设置

Qt从QString中提取出数字

element input 金额千分位格式化

freemaker 参数解析正则

C#正则验证数字

form表单验证正则

scanf正则表达式用法

grafana value的正则表达式

Android平台浮点数运算应用

js-(JS正则表达式验证数字)

判断Python输入是否是整数,字符,或浮点数

c语言 sscanf 正则规则

从文本中提取数值技巧

js将整数转换成两位浮点数的方法

vue正则限制浮点数

8到20的结尾的正则

shell 正则表达式最后一行

最新文章

应用程序的安全检测方法、装置、电子设备和存储介质

VBA之正则表达式(1)--基础篇

代码编辑的辅助方法、装置及电子设备

SHELL查字符串中包含字符的命令

String方法中replace和replaceAll的区别详解(源码分析)

双字节符号正则

标签列表

688IT编程网

mysql千万级内模糊查询的实现方式

发表评论

推荐文章

应用程序的安全检测方法、装置、电子设备和存储介质

nginx map用法 正则

VBA之正则表达式(1)--基础篇

Prometheus监控学习笔记之初识PromQL

关于PHP中的webshell

热门文章

m函数数字提取

jest断言方法大全

中兴ZXSEC US 管理员手册

keras系列(一):参数设置

Qt从QString中提取出数字

element input 金额千分位格式化

freemaker 参数解析正则

C#正则验证数字

form表单验证正则

scanf正则表达式用法

grafana value的正则表达式

Android平台浮点数运算应用

js-(JS正则表达式验证数字)

判断Python输入是否是整数,字符,或浮点数

c语言 sscanf 正则规则

从文本中提取数值技巧

js将整数转换成两位浮点数的方法

vue正则限制浮点数

8到20的结尾的正则

shell 正则表达式 最后一行

最新文章

应用程序的安全检测方法、装置、电子设备和存储介质

VBA之正则表达式(1)--基础篇

代码编辑的辅助方法、装置及电子设备

SHELL查字符串中包含字符的命令

String方法中replace和replaceAll的区别详解(源码分析)

双字节符号正则

标签列表

nginx map用法正则

shell 正则表达式最后一行