postgresql insert into select无法使用并行查询的解决
本文信息基于PG13.1。
从PG9.6开始支持并行查询。PG11开始支持CREATE TABLE … AS、SELECT INTO以及CREATE MATERIALIZED VIEW的并行查询。
先说结论:
换用create table as 或者select into或者导入导出。
首先跟踪如下查询语句的执行计划:
select count(*) from test t1,test1 t2 where t1.id = t2.id ;
postgres=# explain analyze select count(*) from test t1,test1 t2 where t1.id = t2.id ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=683.246..715.324 rows=1 loops=1)
-> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=681.474..715.311 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=674.689..675.285 rows=1 loops=3)
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=447.799..645.689 rows=333333 loops=3)
Hash Cond: (t1.id = t2.id)
-> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.025..74.010 rows=333333 loops=3)
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=260.052..260.053 rows=333333 loops=3)
Buckets: 131072 Batches: 16 Memory Usage: 3520kB
-> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..104.804 rows=333333 loops=3)
Planning Time: 0.420 ms
Execution Time: 715.447 ms
(13 rows)
可以看到走了两个Workers。
下边看一下insert into select:
postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Insert on va (cost=73228.00..73228.02 rows=1 width=4) (actual time=3744.179..3744.187 rows=0 loops=1)
-> Subquery Scan on "*SELECT*" (cost=73228.00..73228.02 rows=1 width=4) (actual time=3743.343..3743.352 rows=1 loops=1)
-> Aggregate (cost=73228.00..73228.01 rows=1 width=8) (actual time=3743.247..3743.254 rows=1 loops=1)
-> Hash Join (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1092.295..3511.301 rows=1000000 loops=1)
Hash Cond: (t1.id = t2.id)
-> Seq Scan on test t1 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.030..421.537 rows=1000000 loops=1)
-> Hash (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1090.078..1090.081 rows=1000000 loops=1)
Buckets: 131072 Batches: 16 Memory Usage: 3227kB
-> Seq Scan on test1 t2 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.021..422.768 rows=1000000 loops=1)
Planning Time: 0.511 ms
Execution Time: 3745.633 ms
(11 rows)
可以看到并没有Workers的指示,没有启用并行查询。
即使开启强制并行,也无法走并行查询。
postgres=# set force_parallel_mode =on;
SET
postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Insert on va (cost=73228.00..73228.02 rows=1 width=4) (actual time=3825.042..3825.049 rows=0 loops=1)
-> Subquery Scan on "*SELECT*" (cost=73228.00..73228.02 rows=1 width=4) (actual time=3824.976..3824.984 rows=1 loops=1)
-> Aggregate (cost=73228.00..73228.01 rows=1 width=8) (actual time=3824.972..3824.978 rows=1 loops=1)
-> Hash Join (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1073.587..3599.402 rows=1000000 loops=1)
Hash Cond: (t1.id = t2.id)
-> Seq Scan on test t1 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.034..414.965 rows=1000000 loops=1)
-> Hash (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1072.441..1072.443 rows=1000000 loops=1)
Buckets: 131072 Batches: 16 Memory Usage: 3227kB
-> Seq Scan on test1 t2 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.022..400.624 rows=1000000 loops=1)
Planning Time: 0.577 ms
Execution Time: 3825.923 ms
(11 rows)
原因在官方文档有写:
The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. As an exception, the commands CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW which create a new table and populate it can use a parallel plan.
解决方案有如下三种:
1.select into
postgres=# explain analyze select count(*) into vaa from test t1,test1 t2 where t1.id = t2.id ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=742.736..774.923 rows=1 loops=1)
-> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=740.223..774.907 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=731.408..731.413 rows=1 loops=3)
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=489.880..700.830 rows=333333 loops=3)
Hash Cond: (t1.id = t2.id)
-> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.033..87.479 rows=333333 loops=3)
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=266.839..266.840 rows=333333 loops=3)
Buckets: 131072 Batches: 16 Memory Usage: 3520kB
-> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.058..106.874 rows=333333 loops=3)
Planning Time: 0.319 ms
Execution Time: 783.300 ms
(13 rows)
2.create table as
postgres=# explain analyze create table vb as select count(*) from test t1,test1 t2 where t1.id = t2.id ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=540.120..563.733 rows=1 loops=1)
-> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=537.982..563.720 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=526.602..527.136 rows=1 loops=3)
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=334.532..502.793 rows=333333 loops=3)
Hash Cond: (t1.id = t2.id)
-> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.018..57.819 rows=333333 loops=3)
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=189.502..189.503 rows=333333 loops=3)
Buckets: 131072 Batches: 16 Memory Usage: 3520kB
-> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..77.786 rows=333333 loops=3)
Planning Time: 0.189 ms
Execution Time: 565.448 ms
(13 rows)
3.或者通过导入导出的方式,例如:
psql -h localhost -d postgres -U postgres -c "select count(*) from test t1,test1 t2 where t1.id = t2.id " -o result.csv -A -t -F "," psql -h localhost -d postgres -U postgres -c "COPY va FROM 'result.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER FALSE, ENCODING 'windows-1252')"
一些场景下也会比非并行快。
上一篇:Docker环境下升级PostgreSQL的步骤方法详解
栏 目:其它数据库
下一篇:Redis3.2.11在centos9安装与卸载过程详解
本文标题:postgresql insert into select无法使用并行查询的解决
本文地址:https://zz.feitang.co/shujuku/32843.html
您可能感兴趣的文章
- 12-22使用mysql记录从url返回的http GET请求数据操作
- 12-22详解sql中exists和in的语法与区别
- 12-22hive从mysql导入数据量变多的解决方案
- 12-22如何为PostgreSQL的表自动添加分区
- 12-22postgresql 实现得到时间对应周的周一案例
- 12-22sqoop export导出 map100% reduce0% 卡住的多种原因及解决
- 12-22mysql查询条件not in 和 in的区别及原因说明
- 12-22解决mysql使用not in 包含null值的问题
- 12-22解决从集合运算到mysql的not like找不出NULL的问题
- 12-22postgresql 实现多表关联删除


阅读排行
推荐教程
- 12-11mysql代码执行结构实例分析【顺序、分支、循环结构】
- 12-08添加mysql的用户名和密码是什么语句?
- 12-20PhpMyAdmin出现错误数据无法导出怎么办?
- 12-19Redis中实现查找某个值的范围
- 12-15浅析mysql迁移到clickhouse的5种方法
- 12-15CentOS7 64位下MySQL5.7安装与配置教程
- 12-14Mysql大型SQL文件快速恢复方案分享
- 12-14mysql 5.7.27 安装配置方法图文教程
- 12-13MySQL给新建用户并赋予权限最简单的方法
- 12-13关于MySQL索引的深入解析





