PostgreSQL一复合查询SQL优化例子-(多个exists,范围检索,IN检索,模糊...--688IT编程网

PostgreSQL⼀复合查询SQL优化例⼦-（多个exists，范围检

索,IN检索，模糊。。。

标签

PostgreSQL , 多个exists , 范围检索 , IN检索 , 模糊检索 , 组合 , gin , recheck , filter , subplan

背景

当⼀个SQL包含复杂的多个exists , 范围检索 , IN检索 , 模糊检索 , 组合查询时，可能由于索引使⽤不当导致查询性能较慢。

主要的问题在于，索引使⽤不当，可能导致⼏个问题：

1、索引本⾝扫描的耗时过多

2、位图扫描引⼊的recheck过多

3、subplan 引⼊的 filter过多

⼀个现实的例⼦，可以看到耗时集中在recheck和filter上⾯，每个索引扫描返回的记录数都很多，但是组合起来是0条符合条件的记录。

问题就出在索引不正确上，导致了问题。

-> Subquery Scan on "*SELECT* 2" (cost=273453.65..432483146.70 rows=223 width=349) (actual time=25932.371..25932.371 rows=0 loops=1)

Output: ...................................

Buffers: shared hit=920071 read=269255

I/O Timings: read=1552.767

-> Bitmap Heap Scan on zjxftypt.tab1010201 t_1 (cost=273453.65..432483144.47 rows=223 width=349) (actual time=25932.370..25932.370 rows=0 loops=1 Output: t_1.storeid, t_1.xfjbh, t_1.wtsd, t_1.rs, t_1.digoal123x, t_1.dz, t_1.blfsjd, t_1.qx, t_1.gk, t_1.xfrq, t_1.djsj, t_1.djdw, t_1.xfjclzt, t_1.digoal123, t_1.xfx -- 位图扫描的条件重新过滤 , 过滤太多了

Recheck Cond: ((t_1.xfrq < (to_date('2018-06-11'::character varying, 'yyyy-mm-dd'::character varying)

+ 1)) AND (t_1.xfrq >= to_date('2014-02-12'::character Rows Removed by Index Recheck: 1214155

-- 过滤exists的JOIN条件值是否满⾜，过滤太多了

Filter: (((t_1.digoal123x)::text ~~ '%阿⾥巴巴%'::text) AND ((alternatives: SubPlan 4 or hashed SubPlan 5) OR(alternatives: SubPlan 6 or hashed SubPlan 7))) Rows Removed by Filter: 5215804

Buffers: shared hit=920071 read=269255

I/O Timings: read=1552.767

-- 条件1,2位图扫描

-> BitmapAnd (cost=273453.65..273453.65 rows=4909643 width=0) (actual time=2510.718..2510.718 rows=0 loops=1)

Buffers: shared hit=27036 read=16539

I/O Timings: read=101.425

-- ⾃⾝条件1 符合条件的记录太多了

-> Bitmap Index Scan on index_tab1010201_xfrq (cost=0.00..126565.99 rows=4943755 width=0) (actual time=1085.429..1085.429 rows=5268071 loops=1) Index Cond: ((t_1.xfrq < (to_date('2018-06-11'::character varying, 'yyyy-mm-dd'::character varying) + 1)) AND (t_1.xfrq >= to_date('2014-02-12'::c Buffers: shared hit=3288 read=16539

I/O Timings: read=101.425

-- ⾃⾝条件2 符合条件的记录太多了

-> Bitmap Index Scan on index_tab1010201_digoal123 (cost=0.00..146887.30 rows=6599316 width=0) (actual time=1355.825..1355.825 rows=6845646 loop Index Cond: (t_1.digoal123 = 1::numeric)

Buffers: shared hit=23748

..............sub plans

优化举例

1、复现问题，创建测试表

create table test(id int, c1 text, c2 date, c3 text);

SQL如下

select * from test

where

(

exists (select 1 from pg_class where oid::int = test.id)

exists (select 1 from pg_attribute where attrelid::int=test.id)

)

and c1 in ('1','2','3')

and c2 between current_date-1 and current_date

and c3 ~ 'abcdef';

2、写⼊测试护甲1000万条

insert into test select id, (random()*10)::int::text, current_date, md5(random()::text) from generate_series(1,10000000) t(id);

3、创建索引，使之可以在索引层⾯过滤掉所有数据

create extension pg_trgm;

create extension btree_gin;

create index idx_test_1 on test using gin (c1, c2, c3 gin_trgm_ops);

如果是复现问题，应该是这两个索引

create index idx1 on test (c1);

create index idx2 on test (c2);

4、查看执⾏计划

postgres=# explain (analyze,verbose,timing,costs,buffers)

select * from test

where

(

exists (select 1 from pg_class where oid::int = test.id)

exists (select 1 from pg_attribute where attrelid::int=test.id)

)

and c1 in ('1','2','3')

and c2 between current_date-1 and current_date

and c3 ~ 'abcdef';

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan st (cost=156.43..8593.79 rows=228 width=43) (actual time=837.151..837.151 rows=0 loops=1)

Output: test.id, test.c1, test.c2, test.c3

-- 位图扫描重新RECHECK过滤

Recheck Cond: ((test.c1 = ANY ('{1,2,3}'::text[])) AND (test.c2 >= (CURRENT_DATE - 1)) AND (test.c2 <= CURRENT_DATE) AND (test.c3 ~ 'abcdef'::text))

Rows Removed by Index Recheck: 1

-- exists⼦句的条件检查，过滤

Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))

Rows Removed by Filter: 7

Heap Blocks: exact=8

Buffers: shared hit=11658 read=23

-- 所有条件压到GIN复合索引⾥⾯

-- GIN多个条件时，会⾃动内部位图扫描

-> Bitmap Index Scan on idx_test_1 (cost=0.00..156.37 rows=304 width=0) (actual time=834.418..834.418 rows=8 loops=1)

Index Cond: ((test.c1 = ANY ('{1,2,3}'::text[])) AND (test.c2 >= (CURRENT_DATE - 1)) AND (test.c2 <= CURRENT_DATE) AND (test.c3 ~ 'abcdef'::text)) Buffers: shared hit=11582 read=23

SubPlan 1

> Seq Scan on pg_catalog.pg_class (cost=0.00..15.84 rows=1 width=0) (never executed)

Filter: ((pg_class.oid)::integer = test.id)

SubPlan 2

-> Seq Scan on pg_catalog.pg_class pg_class_1 (cost=0.00..14.87 rows=387 width=4) (actual time=0.014..0.155 rows=388 loops=1)

Output: (pg_class_1.oid)::integer

Buffers: shared hit=11

SubPlan 3

-> Index Only Scan using pg_attribute_relid_attnum_index on pg_catalog.pg_attribute (cost=0.28..84.39 rows=8 width=0) (never executed)

Filter: ((pg_attribute.attrelid)::integer = test.id)

Heap Fetches: 0

SubPlan 4

-> Index Only Scan using pg_attribute_relid_attnum_index on pg_catalog.pg_attribute pg_attribute_1 (cost=0.28..77.13 rows=2904 width=4) (actual time=0.0 Output: (pg_attribute_1.attrelid)::integer

Heap Fetches: 459

Buffers: shared hit=57

Planning time: 1.070 ms

Execution time: 839.834 ms

(29 rows)

看起来还不错，但是仔细深究实际上并没有优化太多，还可以有更好的优化。

5、深⼊优化，需要理解GIN复合索引内部的执⾏机制(位图扫描)。

因为满⾜C3条件的记录本⾝就很少，所以完全不需要使⽤GIN内部的位图扫描。

postgres=# select count(*) from test where c3 ~ 'abcdef';

count

-------

23sql优化的几种方式

(1 row)

修改为如下索引

postgres=# drop index idx_test_1 ;

DROP INDEX

postgres=# create index idx_test_1 on test using gin (c3 gin_trgm_ops) ;

CREATE INDEX

6、耗时编程24毫秒

postgres=# explain (analyze,verbose,timing,costs,buffers)

select * from test

where

(

exists (select 1 from pg_class where oid::int = test.id)

exists (select 1 from pg_attribute where attrelid::int=test.id)

)

and c1 in ('1','2','3')

and c2 between current_date-1 and current_date

and c3 ~ 'abcdef';

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan st (cost=53.76..27798.16 rows=228 width=43) (actual time=24.287..24.287 rows=0 loops=1)

Output: test.id, test.c1, test.c2, test.c3

Recheck Cond: (test.c3 ~ 'abcdef'::text)

Rows Removed by Index Recheck: 6

Filter: ((test.c1 = ANY ('{1,2,3}'::text[])) AND (test.c2 <= CURRENT_DATE) AND (test.c2 >= (CURRENT_DATE - 1)) AND ((alternatives: SubPlan 1 or hashed Su Rows Removed by Filter: 23

Heap Blocks: exact=29

Buffers: shared hit=226

-> Bitmap Index Scan on idx_test_1 (cost=0.00..53.70 rows=1000 width=0) (actual time=21.517..21.517 rows=29 loops=1)

Index Cond: (test.c3 ~ 'abcdef'::text)

Buffers: shared hit=128

SubPlan 1

-> Seq Scan on pg_catalog.pg_class (cost=0.00..15.84 rows=1 width=0) (never executed)

Filter: ((pg_class.oid)::integer = test.id)

SubPlan 2

-> Seq Scan on pg_catalog.pg_class pg_class_1 (cost=0.00..14.87 rows=387 width=4) (actual time=0.011..0.156 rows=387 loops=1)

Output: (pg_class_1.oid)::integer

Buffers: shared hit=11

SubPlan 3

> Index Only Scan using pg_attribute_relid_attnum_index on pg_catalog.pg_attribute (cost=0.28..84.39 rows=8 width=0) (never executed)

Filter: ((pg_attribute.attrelid)::integer = test.id)

Heap Fetches: 0

SubPlan 4

-> Index Only Scan using pg_attribute_relid_attnum_index on pg_catalog.pg_attribute pg_attribute_1 (cost=0.28..77.13 rows=2904 width=4) (actual time=0.0 Output: (pg_attribute_1.attrelid)::integer

Heap Fetches: 456

Buffers: shared hit=58

Planning time: 0.801 ms

Execution time: 24.403 ms

(29 rows)

Time: 26.052 ms

⼩结

本⽂的SQL⽐较复杂，优化的思路和其他SQL差不多，只是本例可以理解BITMAP SCAN以及GIN索引的内部BITMAP SCAN在对较⼤数据

进⾏合并时，可能引⼊的开销。

切⼊点依旧是explain，耗时段，背后的原因，解决。

1、什么时候使⽤GIN复合？

当任意⼀个条件，选择性不好时，使⽤复合。

什么时候使⽤GIN⾮复合？

2、当有有⼀个条件，选择性很好时，把它单独拿出来，作为⼀个独⽴索引。⽐如本例的c3模糊查询字段，过滤性好，应该单独拿出来。

其实就是说，选择性不好的列，不要放到索引⾥⾯，即使要放，也应该等PG出了分区索引后，将这种列作为分区索引的分区键。(多颗树)，或者使⽤partial index。

688IT编程网

PostgreSQL一复合查询SQL优化例子-(多个exists,范围检索,IN检索,模糊...

发表评论

推荐文章

java正则表达式选择题

一种基于正则表达式的DBC文件解析及报文分析方法[发明专利]

工龄小数点提取

非零金额正则表达式

提取文本中数字的函数

热门文章

利用正则表达式实现文本数据提取与处理

正则表达式零宽断言详解

文本匹配规则

excel中使用正则

1-31正则表达式

anki之高级筛选

BUAA_OO_2021_第一单元总结

insert语句递增写法

sublime text 3在行前插入递增数字序号的方法

字符串只允许数字和英文的正则

powerbuilder 正则表达式

Shell脚本编写的高级技巧利用正则表达式进行字符串匹配

JAVA正则表达式的三种模式:贪婪,勉强和占有的讨论

go regexp匹配规则

oracle regexp_substr 实现原理

基本的元字符回溯引用和前后查匹配模式

elasticsearch query dsl正则

oracle sql正则表达式

GA-设置目标

仅匹配全角片假名的正则表达式

最新文章

java正则表达式选择题

工龄小数点提取

非零金额正则表达式

提取文本中数字的函数

vue数字相加小数点变长-概述说明以及解释

vue validate 正则验证小数长度

标签列表

688IT编程网

PostgreSQL一复合查询SQL优化例子-(多个exists,范围检索,IN检索,模糊...

发表评论

推荐文章

java正则表达式 选择题

一种基于正则表达式的DBC文件解析及报文分析方法[发明专利]

工龄小数点提取

非零金额 正则表达式

提取文本中数字的函数

热门文章

利用正则表达式实现文本数据提取与处理

正则表达式零宽断言详解

文本匹配规则

excel中使用正则

1-31正则表达式

anki之高级筛选

BUAA_OO_2021_第一单元总结

insert语句递增写法

sublime text 3在行前插入递增数字序号的方法

字符串只允许数字和英文的正则

powerbuilder 正则表达式

Shell脚本编写的高级技巧利用正则表达式进行字符串匹配

JAVA正则表达式的三种模式:贪婪,勉强和占有的讨论

go regexp匹配规则

oracle regexp_substr 实现原理

基本的元字符 回溯引用和前后查 匹配模式

elasticsearch query dsl正则

oracle sql正则表达式

GA-设置目标

仅匹配全角片假名的正则表达式

最新文章

java正则表达式 选择题

工龄小数点提取

非零金额 正则表达式

提取文本中数字的函数

vue数字相加小数点变长-概述说明以及解释

vue validate 正则验证小数长度

标签列表

java正则表达式选择题

非零金额正则表达式

基本的元字符回溯引用和前后查匹配模式

java正则表达式选择题

非零金额正则表达式