zl程序教程

您现在的位置是:首页 >  其他

当前栏目

如何防止远程程序与RDS PG连接中断

程序连接 如何 远程 防止 中断 pg RDS
2023-09-14 09:04:38 时间

偶尔有用户会遇到远程程序连接RDS PG,在不做任何操作一段时间后可能中断。

其实可能是用户和RDS PG之间,某些网络设备设置了会话空闲超时,会主动中断会话。

那么有什么方法能解决这个问题呢?

运维的同学可能有这方面的经验,例如使用securecrt或者其他终端连接服务器时,可以设置这些管理工具的no-op,周期性的发一些空字符过去,保证会话上有流量。

但是数据库连接怎么搞呢?
PostgreSQL提供了tcp keep alive的参数可供用户设置。

为了避免会话中断的问题, 可以通过tcp层的keepalive机制来达到传输心跳数据的目的.

方法一,设置数据库参数

PostgreSQL支持会话级别的设置, 数据库级别的设置在$PGDATA/postgresql.conf,
建议设置如下三个参数的值

# - TCP Keepalives - 

# see "man 7 tcp" for details 

tcp_keepalives_idle = 60 # TCP_KEEPIDLE, in seconds; 

 # 0 selects the system default 

tcp_keepalives_interval = 10 # TCP_KEEPINTVL, in seconds; 

 # 0 selects the system default 

tcp_keepalives_count = 10 # TCP_KEEPCNT; 

 # 0 selects the system default 

解释详见本文末尾[参考1].

代码详见本文末尾[参考2].

参数解释
tcp_keepalives_idle : 定义这个tcp连接间隔多长后开始发送 第一个 tcp keepalive 包.
tcp_keepalives_interval : 定义在以上发送第一个tcp keepalive包后如果在这个时间间隔内没有收到对端的回包, 则开始发送第二个tcp keepalive包. 在这个时间内再没有回包的话则发送第三个keepalive包....直到达到tcp_keepalives_count次则broken 连接.
tcp_keepalives_count : 定义一共发送多少个tcp keepalive包, 达到这个数字后如果对端都没有回响应包, 则关闭这个连接.
另外需要注意的是, 这几个PostgreSQL参数对PostgreSQL数据库服务端的backend process生效.
所以如果发出第一个keepalive包后, 在tcp_keepalives_interval秒内有客户端回包, 则又回到tcp_keepalives_idle计数(注意此时计数是tcp_keepalives_idle 减去 tcp_keepalives_interval 秒).

例如 :
CLIENT (172.16.3.33) :

psql -h 172.16.3.150 -p 1919 -U postgres postgres 

postgres=# show tcp_keepalives_idle; 

 tcp_keepalives_idle 

--------------------- 

(1 row) 

postgres=# show tcp_keepalives_interval; 

 tcp_keepalives_interval 

------------------------- 

(1 row) 

postgres=# show tcp_keepalives_count; 

 tcp_keepalives_count 

---------------------- 

(1 row) 

查找数据库端对应的process id.

postgres=# select pg_backend_pid(); 

 pg_backend_pid 

---------------- 

 11016 

(1 row) 

SERVER (172.16.3.150) :
在数据库端查看keepalive timer

root@digoal-PowerEdge-R610:~# netstat -anpo|grep 11016 

tcp 0 0 172.16.3.150:1919 172.16.3.33:50326 ESTABLISHED 11016/postgres: pos keepalive (39.73/0/0) 

CLIENT (172.16.3.33) :
在客户端查看keepalive timer

postgres=# \! 

[pg92@db-172-16-3-33 ~]$ netstat -anpo|grep 1919 

(Not all processes could be identified, non-owned process info 

 will not be shown, you would have to be root to see it all.) 

tcp 0 0 172.16.3.33:50326 172.16.3.150:1919 ESTABLISHED 20408/psql keepalive (7143.19/0/0) 

继承了操作系统的keepalive设置

通过tcpdump可以观察间隔一定的时间, 会发出keepalive包.

方法二、设置操作系统级的参数:

/etc/sysctl.conf 

net.ipv4.tcp_keepalive_intvl = 75 

net.ipv4.tcp_keepalive_probes = 9 

net.ipv4.tcp_keepalive_time = 7200 

设置CLIENT服务器系统级的keepalive, 然后重新连接到数据库, 看看客户端的keepalive timer会不会发生变化

[root@db-172-16-3-33 ~]# sysctl -w net.ipv4.tcp_keepalive_time=70 

net.ipv4.tcp_keepalive_time = 70 

[root@db-172-16-3-33 ~]# su - pg92 

pg92@db-172-16-3-33- psql -h 172.16.3.150 -p 1919 -U postgres postgres 

psql (9.2.4) 

Type "help" for help. 

postgres=# \! 

[pg92@db-172-16-3-33 ~]$ netstat -anpo|grep 1919 

(Not all processes could be identified, non-owned process info 

 will not be shown, you would have to be root to see it all.) 

tcp 0 0 172.16.3.33:50327 172.16.3.150:1919 ESTABLISHED 20547/psql keepalive (55.44/0/0) 

系统层设置的keepalive已经生效了.

.1.
通过tcpdump观察keepalive包, 也可以将这些包抓下来通过wireshark查看.

pg92@db-172-16-3-33- psql -h 172.16.3.150 -p 1919 -U postgres postgres 

postgres=# set tcp_keepalives_idle=13; 

root@digoal-PowerEdge-R610:~# tcpdump -i eth0 -n tcp port 1919 

08:43:27.647408 IP 172.16.3.150.1919 172.16.3.33.15268: Flags [P.], seq 4937:4952, ack 58, win 115, length 15 

08:43:27.647487 IP 172.16.3.33.15268 172.16.3.150.1919: Flags [.], ack 4952, win 488, length 0 

08:43:40.667410 IP 172.16.3.150.1919 172.16.3.33.15268: Flags [.], ack 58, win 115, length 0 

08:43:40.667536 IP 172.16.3.33.15268 172.16.3.150.1919: Flags [.], ack 4952, win 488, length 0 

08:43:53.691417 IP 172.16.3.150.1919 172.16.3.33.15268: Flags [.], ack 58, win 115, length 0 

08:43:53.691544 IP 172.16.3.33.15268 172.16.3.150.1919: Flags [.], ack 4952, win 488, length 0 

08:44:06.715416 IP 172.16.3.150.1919 172.16.3.33.15268: Flags [.], ack 58, win 115, length 0 

08:44:06.715544 IP 172.16.3.33.15268 172.16.3.150.1919: Flags [.], ack 4952, win 488, length 0 

08:44:19.739422 IP 172.16.3.150.1919 172.16.3.33.15268: Flags [.], ack 58, win 115, length 0 

08:44:19.739544 IP 172.16.3.33.15268 172.16.3.150.1919: Flags [.], ack 4952, win 488, length 0 

08:44:32.763416 IP 172.16.3.150.1919 172.16.3.33.15268: Flags [.], ack 58, win 115, length 0 

08:44:32.763546 IP 172.16.3.33.15268 172.16.3.150.1919: Flags [.], ack 4952, win 488, length 0 

每个sock会话, 每隔13秒, 数据库服务端会发送心跳包.

.2.
由于每个tcp会话都需要1个计时器, 所以如果连接数很多, 开启keepalive也是比较耗费资源的.
可以使用setsockopt关闭该会话keepalive的功能. 下一篇BLOG介绍如何禁用keepalive.

.3.
如果tcp_keepalives_idle小于tcp_keepalives_interval, 那么间隔多长时间发1个心跳包呢?
例如tcp_keepalives_idle=2, tcp_keepalives_interval=10.
答案是10, 因为检查计时需要10秒.

postgres=# set tcp_keepalives_idle=2; 

postgres=# set tcp_keepalives_interval=10; 

root@digoal-PowerEdge-R610- tcpdump -i eth0 -n tcp port 1919 

09:32:27.035424 IP 172.16.3.150.1919 172.16.3.33.47277: Flags [.], ack 195, win 115, length 0 

09:32:27.035608 IP 172.16.3.33.47277 172.16.3.150.1919: Flags [.], ack 366, win 54, length 0 

09:32:37.051426 IP 172.16.3.150.1919 172.16.3.33.47277: Flags [.], ack 195, win 115, length 0 

09:32:37.051569 IP 172.16.3.33.47277 172.16.3.150.1919: Flags [.], ack 366, win 54, length 0 

09:32:47.067423 IP 172.16.3.150.1919 172.16.3.33.47277: Flags [.], ack 195, win 115, length 0 

09:32:47.067552 IP 172.16.3.33.47277 172.16.3.150.1919: Flags [.], ack 366, win 54, length 0 

09:32:57.083428 IP 172.16.3.150.1919 172.16.3.33.47277: Flags [.], ack 195, win 115, length 0 

09:32:57.083574 IP 172.16.3.33.47277 172.16.3.150.1919: Flags [.], ack 366, win 54, length 0 

.1. http://www.postgresql.org/docs/9.2/static/runtime-config-connection.html

tcp_keepalives_idle (integer) 

Specifies the number of seconds before sending a keepalive packet on an otherwise idle connection. A value of 0 uses the system default. This parameter is supported only on systems that support the TCP_KEEPIDLE or TCP_KEEPALIVE symbols, and on Windows; on other systems, it must be zero. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero. 

Note: On Windows, a value of 0 will set this parameter to 2 hours, since Windows does not provide a way to read the system default value. 

tcp_keepalives_interval (integer) 

Specifies the number of seconds between sending keepalives on an otherwise idle connection. A value of 0 uses the system default. This parameter is supported only on systems that support the TCP_KEEPINTVL symbol, and on Windows; on other systems, it must be zero. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero. 

Note: On Windows, a value of 0 will set this parameter to 1 second, since Windows does not provide a way to read the system default value. 

tcp_keepalives_count (integer) 

Specifies the number of keepalive packets to send on an otherwise idle connection. A value of 0 uses the system default. This parameter is supported only on systems that support the TCP_KEEPCNT symbol; on other systems, it must be zero. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero. 

Note: This parameter is not supported on Windows, and must be zero. 

.2. /usr/share/doc/kernel/Documentation/networking/ip-sysctl.txt

tcp_keepalive_time - INTEGER 

 How often TCP sends out keepalive messages when keepalive is enabled. 

 Default: 2hours. 

tcp_keepalive_probes - INTEGER 

 How many keepalive probes TCP sends out, until it decides that the 

 connection is broken. Default value: 9. 

tcp_keepalive_intvl - INTEGER 

 How frequently the probes are send out. Multiplied by 

 tcp_keepalive_probes it is time to kill not responding connection, 

 after probes started. Default value: 75sec i.e. connection 

 will be aborted after ~11 minutes of retries. 

.3. src/backend/libpq/pqcomm.c

截取一个设置interval的函数. 

pq_setkeepalivesinterval(int interval, Port *port) 

 if (port == NULL || IS_AF_UNIX(port- laddr.addr.ss_family)) 

 return STATUS_OK; 

#if defined(TCP_KEEPINTVL) || defined (SIO_KEEPALIVE_VALS) 

 if (interval == port- keepalives_interval) 

 return STATUS_OK; 

#ifndef WIN32 

 if (port- default_keepalives_interval = 0) 

 if (pq_getkeepalivesinterval(port) 0) 

 if (interval == 0) 

 return STATUS_OK; /* default is set but unknown */ 

 else 

 return STATUS_ERROR; 

 if (interval == 0) 

 interval = port- default_keepalives_interval; 

 if (setsockopt(port- sock, IPPROTO_TCP, TCP_KEEPINTVL, 

 (char *) interval, sizeof(interval)) 0) 

 elog(LOG, "setsockopt(TCP_KEEPINTVL) failed: %m"); 

 return STATUS_ERROR; 

 port- keepalives_interval = interval; 

#else /* WIN32 */ 

 return pq_setkeepaliveswin32(port, port- keepalives_idle, interval); 

#endif 

#else 

 if (interval != 0) 

 elog(LOG, "setsockopt(TCP_KEEPINTVL) not supported"); 

 return STATUS_ERROR; 

#endif 

 return STATUS_OK; 

} 

.4. man netstat

 -o, --timers 

 Include information related to networking timers. 

.5. man 7 tcp

 /proc interfaces 

 System-wide TCP parameter settings can be accessed by files in the directory /proc/sys/net/ipv4/. In addition, most IP 

 /proc interfaces also apply to TCP; see ip(7). Variables described as Boolean take an integer value, with a nonzero 

 value ("true") meaning that the corresponding option is enabled, and a zero value ("false") meaning that the option is 

 disabled. 

 tcp_keepalive_intvl (integer; default: 75; since Linux 2.4) 

 The number of seconds between TCP keep-alive probes. 

 tcp_keepalive_probes (integer; default: 9; since Linux 2.2) 

 The maximum number of TCP keep-alive probes to send before giving up and killing the connection if no response is 

 obtained from the other end. 

 tcp_keepalive_time (integer; default: 7200; since Linux 2.2) 

 The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives 

 are only sent when the SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours). An 

 idle connection is terminated after approximately an additional 11 minutes (9 probes an interval of 75 seconds 

 apart) when keep-alive is enabled. 

 Socket Options 

 To set or get a TCP socket option, call getsockopt(2) to read or setsockopt(2) to write the option with the option level 

 argument set to IPPROTO_TCP. In addition, most IPPROTO_IP socket options are valid on TCP sockets. For more information 

 see ip(7). 

 TCP_KEEPCNT (since Linux 2.4) 

 The maximum number of keepalive probes TCP should send before dropping the connection. This option should not be 

 used in code intended to be portable. 

 TCP_KEEPIDLE (since Linux 2.4) 

 The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes, if the 

 socket option SO_KEEPALIVE has been set on this socket. This option should not be used in code intended to be 

 portable. 

 TCP_KEEPINTVL (since Linux 2.4) 

 The time (in seconds) between individual keepalive probes. This option should not be used in code intended to be 

 portable. 

.6. netstat core :
以下内容转载自 :
http://vzkernel.blogspot.tw/2012/09/description-of-netstat-timers.html

Its not easy to find out the detail description of a network socket timer from internet, I did some dig today. 

The mannual page from netstat: 

 -o, --timers 

 Include information related to networking timers. 

Then we check some command output: 

netstat -nto | head 

Active Internet connections (w/o servers) 

Proto Recv-Q Send-Q Local Address Foreign Address State Timer 

tcp 0 0 127.0.0.1:5005 127.0.0.1:55309 SYN_RECV on (5.14/1/0) 

tcp 0 0 127.0.0.1:5005 127.0.0.1:55312 SYN_RECV on (1.34/0/0) 

tcp 0 0 127.0.0.1:5005 127.0.0.1:55310 SYN_RECV on (2.34/0/0) 

tcp 0 0 127.0.0.1:5005 127.0.0.1:55303 SYN_RECV on (4.14/1/0) 

tcp 0 0 192.168.1.16:57018 74.125.128.132:443 ESTABLISHED off (0.00/0/0) 

tcp 0 0 192.168.1.16:41245 203.208.46.2:443 ESTABLISHED off (0.00/0/0) 

tcp 0 0 192.168.1.16:42636 203.208.46.7:443 TIME_WAIT timewait (44.66/0/0) 

tcp 0 0 127.0.0.1:55302 

The Timer field with the format (5.14/1/0), what does it mean? 

Lets figure it out. 


The second step, check the source code from net-tools, grab the source code from source forge: git clone git://net-tools.git.sourceforge.net/gitroot/net-tools/net-tools from userspace netstat.c: tcp_do_one(): .... if (flag_opt) switch (timer_run) { case 0: snprintf(timers, sizeof(timers), _("off (0.00/%ld/%d)"), retr, timeout); break; case 1: snprintf(timers, sizeof(timers), _("on (%2.2f/%ld/%d)"), (double) time_len / HZ, retr, timeout); break; case 2: snprintf(timers, sizeof(timers), _("keepalive (%2.2f/%ld/%d)"), (double) time_len / HZ, retr, timeout); break; case 3: snprintf(timers, sizeof(timers), _("timewait (%2.2f/%ld/%d)"), (double) time_len / HZ, retr, timeout); break; default: snprintf(timers, sizeof(timers), _("unkn-%d (%2.2f/%ld/%d)"), timer_run, (double) time_len / HZ, retr, timeout); break; Both the fields are grabbed from proc/net/tcp, lets check the content of it: $ head /proc/net/tcp sl local_address rem_address st tx_queue rx_queue tr tm- when retrnsmt uid timeoutinode 0: 00000000:036B 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 13012 1 ffff88007baf5400 299 0 0 2 -1 1: 0100007F:138D 00000000:0000 0A 00000000:00000006 00:00000000 00000000 500 0 472674 1 ffff880021ab0380 299 0 0 2 -1 2: 00000000:006F 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 11242 1 ffff8800796006c0 299 0 0 2 -1 3: 00000000:BD50 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 13056 1 ffff880078da7440 299 0 0 2 -1 4: 017AA8C0:0035 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 14066 1 ffff880078dac100 299 0 0 2 -1 description from proc_net_tcp.txt timer_active: 0 no timer is pending 1 retransmit-timer is pending 2 another timer (e.g. delayed ack or keepalive) is pending 3 this is a socket in TIME_WAIT state. Not all fields will contain data (or even exist) 4 zero window probe timer is pending No too much glue? Lets dive into the kernel code to have a look how the proc_net_tcp defined:
" sl local_address rem_address st tx_queue " "rx_queue tr tm- when retrnsmt uid timeout " "inode"); goto out; st = seq- private; switch (st- state) { case TCP_SEQ_STATE_LISTENING: case TCP_SEQ_STATE_ESTABLISHED: get_tcp4_sock(v, seq, st- num, len); break; case TCP_SEQ_STATE_OPENREQ: get_openreq4(st- syn_wait_sk, v, seq, st- num, st- uid, len); break; case TCP_SEQ_STATE_TIME_WAIT: get_timewait4_sock(v, seq, st- num, len); break; seq_printf(seq, "%*s\n", TMPSZ - 1 - len, ""); out: return 0;
seq_printf(f, "%4d: %08X:%04X %08X:%04X %02X %08X:%08X %02X:%08lX " "%08X %5d %8d %lu %d %pK %lu %lu %u %u %d%n", i, src, srcp, dest, destp, sk- sk_state, tp- write_seq - tp- snd_una, rx_queue, timer_active, jiffies_to_clock_t(timer_expires - jiffies), icsk- icsk_retransmits, sock_i_uid(sk), icsk- icsk_probes_out, sock_i_ino(sk), atomic_read( sk- sk_refcnt), sk, jiffies_to_clock_t(icsk- icsk_rto), jiffies_to_clock_t(icsk- icsk_ack.ato), (icsk- icsk_ack.quick 1) | icsk- icsk_ack.pingpong, tp- snd_cwnd, tcp_in_initial_slowstart(tp) ? -1 : tp- snd_ssthresh, len); which defined in include/net/inet_connection_sock.h: /** inet_connection_sock - INET connection oriented sock * @icsk_accept_queue: FIFO of established children * @icsk_bind_hash: Bind node * @icsk_timeout: Timeout * @icsk_retransmit_timer: Resend (no ack) * @icsk_rto: Retransmit timeout * @icsk_pmtu_cookie Last pmtu seen by socket * @icsk_ca_ops Pluggable congestion control hook * @icsk_af_ops Operations which are AF_INET{4,6} specific * @icsk_ca_state: Congestion control state * @icsk_retransmits: Number of unrecovered [RTO] timeouts * @icsk_pending: Scheduled timer event * @icsk_backoff: Backoff * @icsk_syn_retries: Number of allowed SYN (or equivalent) retries * @icsk_probes_out: unanswered 0 window probes * @icsk_ext_hdr_len: Network protocol overhead (IP/IPv6 options) * @icsk_ack: Delayed ACK control data * @icsk_mtup; MTU probing control data For a not established socket static void get_openreq4(const struct sock *sk, const struct request_sock *req, struct seq_file *f, int i, int uid, int *len) const struct inet_request_sock *ireq = inet_rsk(req); int ttd = req- expires - jiffies; seq_printf(f, "%4d: %08X:%04X %08X:%04X" " %02X %08X:%08X %02X:%08lX %08X %5d %8d %u %d %pK%n", ireq- loc_addr, ntohs(inet_sk(sk)- inet_sport), ireq- rmt_addr, ntohs(ireq- rmt_port), TCP_SYN_RECV, 0, 0, /* could print option size, but that is af dependent. */ 1, /* timers active (only the expire timer) */ jiffies_to_clock_t(ttd), req- retrans, uid, 0, /* non standard timer */ 0, /* open_requests have no inode */ atomic_read( sk- sk_refcnt), req, len); static void get_timewait4_sock(const struct inet_timewait_sock *tw, struct seq_file *f, int i, int *len) __be32 dest, src; __u16 destp, srcp; int ttd = tw- tw_ttd - jiffies; if (ttd 0) ttd = 0; dest = tw- tw_daddr; src = tw- tw_rcv_saddr; destp = ntohs(tw- tw_dport); srcp = ntohs(tw- tw_sport); seq_printf(f, "%4d: %08X:%04X %08X:%04X" " %02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %pK%n", i, src, srcp, dest, destp, tw- tw_substate, 0, 0, 3, jiffies_to_clock_t(ttd), 0, 0, 0, 0, atomic_read( tw- tw_refcnt), tw, len); Lets back to our questions, The description of the Timer field from netstat -o which with the format (1st/2nd/3rd) 1. The 1st field indicates when the timer will expire 2. The 2nd field is the retransmits which already have done 3. The 3rd field - for a synreq socket(not yet established) and a timewait socket its always 0, for a established socket its unanswered 0 window probes TCP zero window probe means that the receiver has reduced his receive buffer (a.k.a. window) to zero, basically telling the sender to stop sending - usually for performance reasons. If the receiver does not recover and send an so called "Window Update" with a buffer size greater than zero (meaning, the sender is allowed to continue) the sender will become "impatient" at some point and "check" if the receiver is able to receive more data. That "check" is the Zero Window Probe. TCP Keep-Alive - Occurs when the sequence number is equal to the last byte of data in the previous packet. Used to elicit an ACK from the receiver. TCP Keep-Alive ACK - Self-explanatory. ACK packet sent in response to a "keep-alive" packet. TCP DupACK - Occurs when the same ACK number is seen AND it is lower than the last byte of data sent by the sender. If the receiver detects a gap in the sequence numbers, it will generate a duplicate ACK for each subsequent packet it receives on that connection, until the missing packet is successfully received (retransmitted). A clear indication of dropped/missing packets. TCP ZeroWindow - Occurs when a receiver advertises a receive window size of zero. This effectively tells the sender to stop sending because the receivers buffer is full. Indicates a resource issue on the receiver, as the application is not retrieving data from the TCP buffer in a timely manner. TCP ZerowindowProbe - The sender is testing to see if the receivers zero window condition still exists by sending the next byte of data to elicit an ACK from the receiver. If the window is still zero, the sender will double his persist timer before probing again.

Pentaho(原Kettle)连接RDS MySQL云数据库 在2006年,Pentaho公司收购了Kettle项目,原Kettle项目发起人Matt Casters加入了Pentaho团队,成为Pentaho套件数据集成架构师。从此,Kettle成为企业级、数据集成及商业智能套件Pentaho的主要组成部分,Kettle亦重命名为Pentaho Data Integration。 本文介绍如何使用Pentaho(原Kettle)连接阿里云RDS MySQL数据库。
PostgreSQL连接(JOIN) PostgreSQL JOIN子句用于把两个或多个表的行结合起来,基于这些表之间的共同变量。 在PostgreSQL中,JOIN有五种连接类型: CROSS JOIN:交叉连接 内连接:内连接 LEFT OUTER JOIN:左外连接 右外连接:右外连接 FULL OUTER JOIN:全外连接 接下来让我们创建两张表COMPANY和DEPARTMENT。
通过云服务器跳板连接云数据库RDS? 购买同一VPC下的云服务器ECS和云数据库RDS后,在RDS不绑定EIP,如何通过本地Navicat、DBever等工具连接?这里,我们采用Nginx做代理,轻松实现本地访问云数据库,节省RDS绑定EIP的费用。