Postgresql进程卡住无法退出原因和解决方法
2023-06-13 09:11:01 时间
前言
Postgresql进程卡住无法退出怎么办?例如以下场景:
- kill 进程:无效
- kill -2/-12 进程:无效
- pg_terminate_backend(pid):无效
1 模拟卡住
-- pg14下测试通过
cd `pg_config --libdir`/postgresql
cat << EOF > loop.c
#include "postgres.h"
#include "fmgr.h"
#include <unistd.h>
PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(loop);
Datum loop(PG_FUNCTION_ARGS)
{
/* an endless loop */
while(1)
{
// 注意这里没有加:CHECK_FOR_INTERRUPTS();
sleep(2);
}
}
EOF
gcc -I /data01/bin/pg9000/include/postgresql/server -fPIC -shared -o loop.so loop.c
-- psql
CREATE FUNCTION loop() RETURNS void LANGUAGE c AS 'loop';
-- 卡住
postgres=#
postgres=# select loop();
2 尝试解决失败的方法
psql执行ctrl+c失败
postgres=# select loop();
^CCancel request sent
^CCancel request sent
kill 失败/kill -2失败/kill -12失败
$ kill 11699
$ kill -2 11699
$ kill -12 11699
pg_terminate_backend失败
postgres=# select pg_cancel_backend(11699);
pg_cancel_backend
-------------------
t
(1 row)
postgres=# select pg_terminate_backend(11699);
pg_terminate_backend
----------------------
t
(1 row)
3 进程在干什么
stracp -p
$ strace -p 11699
Process 11699 attached
restart_syscall(<... resuming interrupted call ...>) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [URG], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], SA_RESTORER|SA_RESTART|SA_NOCLDSTOP, 0x7fbbe4fcc630}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [URG], NULL, 8) = 0
nanosleep({2, 0},
0x7fff60c0bb80) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [URG], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], SA_RESTORER|SA_RESTART|SA_NOCLDSTOP, 0x7fbbe4fcc630}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [URG], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff60c0bb80) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [URG], 8) = 0
gstack
$ gstack 11699
#0 0x00007fbbe45a68d0 in __nanosleep_nocancel () from /lib64/libc.so.6
#1 0x00007fbbe45a6784 in sleep () from /lib64/libc.so.6
#2 0x00007fbbe585e145 in loop () from /data01/bin/pg9000/lib/postgresql/loop.so
#3 0x0000000000728b0f in ExecInterpExpr (state=0x2bf0100, econtext=0x2befe00, isnull=0x7fff60c0c037) at execExprInterp.c:725
#4 0x000000000072a965 in ExecInterpExprStillValid (state=0x2bf0100, econtext=0x2befe00, isNull=0x7fff60c0c037) at execExprInterp.c:1824
#5 0x000000000077eec1 in ExecEvalExprSwitchContext (state=0x2bf0100, econtext=0x2befe00, isNull=0x7fff60c0c037) at ../../../src/include/executor/executor.h:339
#6 0x000000000077ef2a in ExecProject (projInfo=0x2bf00f8) at ../../../src/include/executor/executor.h:373
#7 0x000000000077f118 in ExecResult (pstate=0x2befce8) at nodeResult.c:136
#8 0x000000000073eefa in ExecProcNodeFirst (node=0x2befce8) at execProcnode.c:463
#9 0x00000000007338aa in ExecProcNode (node=0x2befce8) at ../../../src/include/executor/executor.h:257
#10 0x0000000000736130 in ExecutePlan (estate=0x2befab0, planstate=0x2befce8, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x2bee790, execute_once=true) at execMain.c:1551
#11 0x0000000000733f15 in standard_ExecutorRun (queryDesc=0x2b2c850, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:361
#12 0x00007fbbe589ac5e in pgss_ExecutorRun (queryDesc=0x2b2c850, direction=ForwardScanDirection, count=0, execute_once=true) at pg_stat_statements.c:1003
#13 0x0000000000733d27 in ExecutorRun (queryDesc=0x2b2c850, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:303
#14 0x000000000097cd6d in PortalRunSelect (portal=0x2ba3f90, forward=true, count=0, dest=0x2bee790) at pquery.c:921
#15 0x000000000097ca2c in PortalRun (portal=0x2ba3f90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2bee790, altdest=0x2bee790, qc=0x7fff60c0c4e0) at pquery.c:765
#16 0x00000000009766d3 in exec_simple_query (query_string=0x2b064f0 "select loop();") at postgres.c:1213
#17 0x000000000097abf1 in PostgresMain (argc=1, argv=0x7fff60c0c770, dbname=0x2b32c00 "postgres", username=0x2b32bd8 "mingjiegao") at postgres.c:4494
#18 0x00000000008b6de6 in BackendRun (port=0x2b2b720) at postmaster.c:4530
#19 0x00000000008b6765 in BackendStartup (port=0x2b2b720) at postmaster.c:4252
#20 0x00000000008b2bdd in ServerLoop () at postmaster.c:1745
#21 0x00000000008b24af in PostmasterMain (argc=1, argv=0x2b000e0) at postmaster.c:1417
#22 0x00000000007b4d2b in main (argc=1, argv=0x2b000e0) at main.c:209
4 原因&解决方案
4.1 原因
原因是PG当前堆栈没有进入信号相应处理函数,一般就是进入死等堆栈了:
- 一般能cacnel的堆栈:超时后检查中断,然后继续sleep
while
sleep(timeout)
CHECK_FOR_INTERRUPTS();
- 死等堆栈
while
sleep()
4.2 解决方案
注意:千万不要kill -9,SIGKILL没有信号处理函数,OS会直接停掉进程;PG父进程发现子进程异常退出,会停掉所有进程,释放共享内存,在重新申请共享内存,拉起所有进程。效果就等于异常重启,启动时肯定会需要时间redo,可能造成几分钟的停止服务。(除非后果可以接受,否则不要kill -9)
执行kill -2或kill -12后或者执行pg_terminate_backend后,主动调用信号处理函数,让PG正常退出。
$ gdb attach 11699
(gdb) p ProcessInterrupts()
[Inferior 1 (process 11699) exited with code 01]
The program being debugged exited while in a function called from GDB.
Evaluation of the expression containing the function
(ProcessInterrupts) will be abandoned.
(gdb) q
服务端效果
postgres=#
postgres=# select loop();
^CCancel request sent
^CCancel request sent
(GDB调用后)
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# \q
服务端日志:server没有其他报错、server没有重启
11699 [local] mingjiegao postgres 2022-08-19 09:41:43 UTC 42723STATEMENT: CREATE FUNCTION loop() RETURNS void LANGUAGE c AS 'loop';
11699 [local] mingjiegao postgres 2022-08-19 09:42:24 UTC 00000LOG: statement: select loop();
13083 [local] mingjiegao postgres 2022-08-19 09:45:39 UTC 00000LOG: statement: select pg_cancel_backend(11699);
13083 [local] mingjiegao postgres 2022-08-19 09:45:47 UTC 00000LOG: statement: select pg_terminate_backend(11699);
11699 [local] mingjiegao postgres 2022-08-19 09:52:44 UTC 57P01FATAL: terminating connection due to administrator command
11699 [local] mingjiegao postgres 2022-08-19 09:52:44 UTC 57P01STATEMENT: select loop();
相关文章
- zip文件加密的几种破解方法
- PostgreSQL安装、配置及简单使用方法
- PostgreSQL vs. MySQL: 数据库对比(postgresql和mysql)
- 热备PostgreSQL双机热备:保障数据安全的必备技术(postgresql双机)
- 对比对比深度:PostgreSQL与Oracle之间的差异(postgresql和oracle)
- String API常用方法介绍
- PostgreSQL:实现强大优势的数据库(postgresql优势)
- 调优提高Linux性能的IO与内存调优方法(Linuxio内存)
- 安装及使用PostgreSQL数据库安装与应用指南(postgresql数据库)
- Python如何连接PostgreSQL数据库?(python连接postgresql)
- 探索PostgreSQL客户端工具,优化数据库管理(postgresql客户端工具)
- PostgreSQL的弊端:不可忽视的缺陷(postgresql缺点)
- 解决oracle O1745错误的方法(oracle01745)
- 如何在PostgreSQL中创建新用户(postgresql创建用户)
- 深入探索PostgreSQL开发技术(postgresql开发)
- Postgresql数据库实现事务回滚技术(postgresql回滚)
- 深入了解 PostgreSQL 数据库结构(postgresql结构)
- PostgreSQL登录:实现无缝连接(postgresql登录)
- MySQL中VAL函数的使用方法与作用简介(mysql 中val函数)
- 使用命令行探索Redis一种新方法(用命令行查找redis)
- MySQL删除多行语句快速高效的数据清理方法(mysql中删除多行语句)
- 深入探讨MySQL三联查询的实现方法和使用技巧(mysql三联查询)
- MySQL 多个字段合并简单实用的方法(mysql 不同字段合并)
- C#递归读取XML菜单数据的方法
- JavaScriptMath.ceil方法(对数值向上取整)