Keepalived Non-Preemptive Mode Explained: Nginx + keepalived in Practice

2023-09-14 09:15:16

Background: two haproxy nodes made highly available with keepalived.


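Note: while haproxy is running, once the master fails and is later restored, there are two possible behaviours: preemptive and non-preemptive. In preemptive mode the backup takes over the service when the master goes down; when the master recovers, the VIP floats back to it and it takes over again, which adds a second, unnecessary VIP switch. Production setups usually do not want this. What they want is for the original master to come back as a backup and not take over the service. That is non-preemptive mode.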

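Preemptive: server1 is the master and server2 is the backup, and the master's priority is higher than the backup's. After keepalived starts, server1 becomes master and server2 becomes backup. When server1 goes down, server2 takes over the service. When server1 recovers, it takes the service back and becomes master again, and server2 returns to backup.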

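Non-preemptive: server1 and server2 are both configured as backup, and nopreempt is set. Pay attention to the order in which keepalived is started on the servers: the node started first is promoted to master, regardless of priority.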

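For example, server1 becomes master and server2 is backup. When server1 goes down, server2 takes over the service and is promoted to master. When server1 recovers it comes back as backup and does not contend for server2's master role, so server2 stays master.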

Key point: in non-preemptive mode the state of both nodes must be BACKUP, and nopreempt must be configured.

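Note: with this configuration the start order matters. The node whose keepalived starts first becomes master, whatever its priority. Priority only decides the election when both nodes contend for master at the same time.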

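Summary: in preemptive mode, a MASTER that recovers from a failure takes the VIP back from the BACKUP node. In non-preemptive mode, the recovered MASTER does not take back the VIP that the BACKUP acquired when it was promoted to MASTER.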

1. The state of both nodes must be set to BACKUP.

2. Both nodes must have nopreempt in their configuration.

3. One node's priority must be higher than the other's.
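
The three rules above are easy to check mechanically before starting keepalived. The following is a minimal sketch, not part of keepalived itself: `check_nopreempt_conf` is a hypothetical helper that greps a single keepalived.conf for `state BACKUP`, `nopreempt`, and a `priority` line. It assumes one `vrrp_instance` per file and cannot verify rule 3 on its own, since that needs both nodes' files.

```shell
#!/bin/bash
# check_nopreempt_conf FILE   (hypothetical helper, not a keepalived tool)
# Prints one line per problem found and returns 1; prints nothing and
# returns 0 when the file looks suitable for non-preemptive mode.
check_nopreempt_conf() {
    local conf="$1"
    local rc=0
    # Rule 1: state must be BACKUP (text after '#' is treated as a comment).
    if ! sed 's/#.*//' "$conf" | grep -Eq '^[[:space:]]*state[[:space:]]+BACKUP[[:space:]]*$'; then
        echo "state is not BACKUP"
        rc=1
    fi
    # Rule 2: nopreempt must be present.
    if ! sed 's/#.*//' "$conf" | grep -Eq '^[[:space:]]*nopreempt[[:space:]]*$'; then
        echo "nopreempt is missing"
        rc=1
    fi
    # Rule 3 needs both nodes; here we only confirm that a priority is set.
    if ! sed 's/#.*//' "$conf" | grep -Eq '^[[:space:]]*priority[[:space:]]+[0-9]+[[:space:]]*$'; then
        echo "priority is missing"
        rc=1
    fi
    return "$rc"
}

# Usage: check_nopreempt_conf /etc/keepalived/keepalived.conf && echo "looks fine"
```

Run it on each node and compare the two priority values by hand to cover rule 3.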

 

How keepalived works


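keepalived provides both VRRP and health-check functions. It can be used purely to provide a floating VIP between two machines (the VRRP virtual-router function), which gives a simple hot-standby high-availability pair. keepalived builds its high availability on VRRP (Virtual Router Redundancy Protocol), which can be thought of as a protocol for making routers highly available: N routers offering the same function form a router group with one master and several backups. The master holds a VIP that serves the outside (other machines on the router's LAN use this VIP as their default route) and sends multicast advertisements. When the backups stop receiving VRRP packets they conclude that the master is down, and one backup is elected master according to VRRP priority. This keeps the router highly available.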
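
The election step described above (the highest priority among the surviving nodes wins) can be illustrated with a few lines of shell. This is only a toy model: `elect_master` and the `name:priority` argument format are made up for illustration, and the tie-breaking that real VRRP performs when priorities are equal is not modelled.

```shell
#!/bin/bash
# elect_master "name:priority" ...   (toy model of a VRRP election)
# Prints the name of the candidate with the highest priority.
elect_master() {
    printf '%s\n' "$@" | sort -t: -k2,2nr | head -n 1 | cut -d: -f1
}

# Usage: with both nodes alive, the priority-100 node wins;
# if it disappears, the remaining node is elected.
#   elect_master server1:100 server2:50
#   elect_master server2:50
```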

 

Keepalived non-preemption mechanism (nopreempt)


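When the Master fails, a Backup is elected as the new Master. Once the old Master recovers, does it become Master again or stay Backup? By default, if non-preemption is not configured, the old Master preempts and becomes Master again as soon as it is back. The whole cycle therefore involves two switches: the master node's failure causes a Master -> Backup switch, and its recovery causes a Backup -> Master switch. The business cannot tolerate switching this often, so we want the old Master to come back as Backup, which requires non-preemption.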

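Keepalived provides the nopreempt option, but it can only be used on a machine whose state is BACKUP. What we actually want is for the Master not to preempt, so there is no way around it: the Master's state has to be set to BACKUP too. In other words, both load balancers set state to BACKUP. Which one becomes Master, then? When both nodes come up together that is decided by priority: the node with the higher priority becomes Master and the other stays Backup.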
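
The difference between the two modes comes down to one decision: what a node does when it starts up and hears advertisements from a live master. A minimal sketch of that decision follows. `takes_over` and its arguments are hypothetical names used only for illustration; this models the behaviour described above, not keepalived's actual code.

```shell
#!/bin/bash
# takes_over MODE MY_PRIO MASTER_PRIO
# MODE is "preempt" or "nopreempt". Returns 0 (true) when a node that comes
# back and sees a live master would take the VIP away from that master.
takes_over() {
    mode="$1"; my_prio="$2"; master_prio="$3"
    [ "$mode" = "preempt" ] && [ "$my_prio" -gt "$master_prio" ]
}

# Usage: the recovered server1 (priority 100) meets server2 (priority 50) as master.
#   takes_over preempt   100 50 && echo "second switch: VIP moves back"
#   takes_over nopreempt 100 50 || echo "server1 stays BACKUP"
```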

 

keepalived configuration on the master node (non-preemptive)


[root@real-server1 ~]# cat /etc/keepalived/keepalived.conf 
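global_defs {
    router_id real-server1   # identifier of the machine running keepalived
    script_user root
    enable_script_security
 }

vrrp_script chk_nginx {
    script "/data/shell/check_nginx_status.sh"   # service check script; remember to make it executable
    interval 2     # interval between script runs, in seconds (default 1)
}

vrrp_instance VI_1 {
     state BACKUP
     interface ens32         # network interface the VRRP instance binds to
     virtual_router_id 151   # virtual router ID; must match the backup node
     priority 100
     nopreempt               # do not preempt
     advert_int 5            # VRRP advertisement (heartbeat multicast) interval in seconds, default 1; must be identical on both nodes
     authentication {
         auth_type  PASS
         auth_pass  1111    # password shared by both nodes (change it in production); at most 8 characters
     }
     virtual_ipaddress {     # virtual IP address
       192.168.179.199
     }

     track_script {
       chk_nginx
     }
 }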


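#You can test this script first: start keepalived normally, pkill nginx, then run the script and watch what happens. Check /var/log/messages or systemctl status keepalived to confirm the script works, and only then reference it in your keepalived configuration file.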
[root@real-server1 ~]# cat /data/shell/check_nginx_status.sh
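#!/bin/bash
# Count running nginx processes, excluding the grep itself and this check script.
nginx_status=$(ps -ef | grep nginx | grep -v grep | grep -v check | wc -l)

# No nginx process left: stop keepalived so that the VIP moves to the other node.
if [ "$nginx_status" -eq 0 ]; then
    systemctl stop keepalived.service
fi
[root@real-server1 ~]# chmod +x /data/shell/check_nginx_status.sh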

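#The script implements the health check: when the nginx process count is 0, keepalived has to fail over so that the VIP floats to the other node. Note that while nginx is down, keepalived can never stay up on this node: the script runs systemctl stop keepalived.service, so if nginx has not been started, keepalived shuts itself down again right after it starts.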
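
One possible variation on the check script, sketched under stated assumptions: instead of stopping keepalived, the script only reports health through its exit status. keepalived treats a non-zero exit of a tracked script (with no `weight` configured) as a failure and puts the instance into FAULT state, releasing the VIP, so keepalived itself keeps running and does not have to be restarted by hand once nginx is back. The PID file path below is an assumption (the default for an nginx installed under /usr/local/nginx, as in this article); `nginx_alive` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical alternative to check_nginx_status.sh: signal health through
# the exit status instead of stopping keepalived.
# ASSUMPTION: nginx writes its PID file here; adjust to your installation.
NGINX_PID_FILE="${NGINX_PID_FILE:-/usr/local/nginx/logs/nginx.pid}"

# nginx_alive PIDFILE
# Returns 0 if PIDFILE is readable and the PID recorded in it is running.
# kill -0 sends no signal; it only tests that the process exists and that
# we may signal it (the script runs as root via script_user root).
nginx_alive() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

# When saved as the vrrp_script, finish the file with:
#   nginx_alive "$NGINX_PID_FILE"
#   exit $?
```

This is a design choice rather than a correction: the original script's "stop keepalived" approach is simpler to reason about, at the cost of a manual restart.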



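#advert_int 5 is the advertisement interval (default 1 second): the period, in seconds, at which VRRP heartbeat packets are multicast. The captured multicast packets carry the virtual router ID (virtual_router_id 151) and the priority (priority 100).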
[root@localhost shell]# tcpdump -i ens32 -nn net 224.0.0.18
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens32, link-type EN10MB (Ethernet), capture size 262144 bytes
10:26:42.910170 IP 192.168.179.102 > 224.0.0.18: VRRPv2, Advertisement, vrid 151, prio 100, authtype simple, intvl 5s, length 20
10:26:47.911831 IP 192.168.179.102 > 224.0.0.18: VRRPv2, Advertisement, vrid 151, prio 100, authtype simple, intvl 5s, length 20
10:26:52.915502 IP 192.168.179.102 > 224.0.0.18: VRRPv2, Advertisement, vrid 151, prio 100, authtype simple, intvl 5s, length 20
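
To compare what is on the wire with the configuration, the fields of interest can be pulled out of the capture. A minimal sketch, assuming the one-line tcpdump format shown above; `parse_vrrp_adverts` is a hypothetical helper name.

```shell
#!/bin/bash
# parse_vrrp_adverts
# Reads tcpdump lines on stdin and prints, for each VRRP advertisement,
# "<source-ip> vrid=<id> prio=<priority> intvl=<seconds>".
# Lines that do not look like VRRP advertisements are dropped.
parse_vrrp_adverts() {
    sed -n -E 's/^.* IP ([0-9.]+) > 224\.0\.0\.18: .*vrid ([0-9]+), prio ([0-9]+),.* intvl ([0-9]+)s,.*$/\1 vrid=\2 prio=\3 intvl=\4/p'
}

# Usage:
#   tcpdump -i ens32 -nn net 224.0.0.18 | parse_vrrp_adverts
```

The vrid, prio and intvl values should match virtual_router_id, priority and advert_int on the node that currently holds the VIP.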

 

Testing that the configuration is correct


[root@real-server1 ~]# ps -ef | grep keepalived | grep -v grep
root      70592      1  0 20:05 ?        00:00:00 /usr/sbin/keepalived -D
root      70593  70592  0 20:05 ?        00:00:00 /usr/sbin/keepalived -D
root      70594  70592  0 20:05 ?        00:00:00 /usr/sbin/keepalived -D
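#When keepalived starts normally it runs three processes: a parent process that monitors its children, a VRRP child process, and a checkers (healthcheck) child process.
#Both children are supervised by the system watchdog, and each handles its own job. The healthcheck child checks the health of the servers, for example HTTP or LVS. If it finds that the service on the master is unavailable, it notifies the local VRRP child, which stops sending advertisements, removes the virtual IP, and switches to BACKUP state.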


Jul 31 20:05:38 real-server1 Keepalived[70591]: Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Jul 31 20:05:38 real-server1 Keepalived[70591]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 31 20:05:38 real-server1 systemd: PID file /var/run/keepalived.pid not readable (yet?) after start.
Jul 31 20:05:38 real-server1 Keepalived[70592]: Starting Healthcheck child process, pid=70593
Jul 31 20:05:38 real-server1 Keepalived[70592]: Starting VRRP child process, pid=70594
Jul 31 20:05:38 real-server1 systemd: Started LVS and VRRP High Availability Monitor.
Jul 31 20:05:38 real-server1 Keepalived_healthcheckers[70593]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 31 20:05:38 real-server1 Keepalived_vrrp[70594]: Registering Kernel netlink reflector
Jul 31 20:05:38 real-server1 Keepalived_vrrp[70594]: Registering Kernel netlink command channel
Jul 31 20:05:38 real-server1 Keepalived_vrrp[70594]: Registering gratuitous ARP shared channel
Jul 31 20:05:38 real-server1 Keepalived_vrrp[70594]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 31 20:05:38 real-server1 Keepalived_vrrp[70594]: VRRP_Instance(VI_1) removing protocol VIPs.
Jul 31 20:05:38 real-server1 Keepalived_vrrp[70594]: Using LinkWatch kernel netlink reflector...
Jul 31 20:05:38 real-server1 Keepalived_vrrp[70594]: VRRP_Instance(VI_1) Entering BACKUP STATE
Jul 31 20:05:38 real-server1 Keepalived_vrrp[70594]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
Jul 31 20:05:38 real-server1 Keepalived_vrrp[70594]: VRRP_Script(chk_nginx) succeeded


#Test whether the health check script works; the log below shows that it does
[root@real-server1 ~]# pkill nginx

Jul 31 20:06:50 real-server1 Keepalived[70592]: Stopping
Jul 31 20:06:50 real-server1 systemd: Stopping LVS and VRRP High Availability Monitor...
Jul 31 20:06:50 real-server1 Keepalived_vrrp[70594]: VRRP_Instance(VI_1) sent 0 priority
Jul 31 20:06:50 real-server1 Keepalived_vrrp[70594]: VRRP_Instance(VI_1) removing protocol VIPs.
Jul 31 20:06:50 real-server1 Keepalived_healthcheckers[70593]: Stopped
Jul 31 20:06:51 real-server1 Keepalived_vrrp[70594]: Stopped
Jul 31 20:06:51 real-server1 systemd: Stopped LVS and VRRP High Availability Monitor.
Jul 31 20:06:51 real-server1 Keepalived[70592]: Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2

 

keepalived configuration on the backup node


[root@real-server2 ~]# cat /etc/keepalived/keepalived.conf 
global_defs {
    router_id real-server2
    script_user root
    enable_script_security
 }

vrrp_script chk_nginx {
    script "/data/shell/check_nginx_status.sh"
    interval 2
}

vrrp_instance VI_1 {
     state BACKUP
     interface ens32
     virtual_router_id 151
     priority 50
     advert_int 5
     authentication {
         auth_type  PASS
         auth_pass  1111
     }
     virtual_ipaddress {
       192.168.179.199
     }
    
      track_script {                                                                                  
       chk_nginx
    }
 }

 

Demonstration of non-preemption


#Test on the Master node: kill the nginx process; the health check script then makes the VIP float to the backup node
[root@real-server1 ~]# ip a
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:61:90:c1 brd ff:ff:ff:ff:ff:ff
    inet 192.168.179.103/24 brd 192.168.179.255 scope global ens32
       valid_lft forever preferred_lft forever
    inet 192.168.179.199/32 scope global ens32
       valid_lft forever preferred_lft forever
    inet 192.168.179.199/24 brd 192.168.179.255 scope global secondary ens32:1
       valid_lft forever preferred_lft forever
    inet6 fe80::f54d:5639:6237:2d0e/64 scope link 
       valid_lft forever preferred_lft forever
[root@real-server1 ~]# pkill nginx

#keepalived log on the master node
Jul 31 20:27:45 real-server1 Keepalived[72926]: Stopping
Jul 31 20:27:45 real-server1 systemd: Stopping LVS and VRRP High Availability Monitor...
Jul 31 20:27:45 real-server1 Keepalived_vrrp[72928]: VRRP_Instance(VI_1) sent 0 priority
Jul 31 20:27:45 real-server1 Keepalived_vrrp[72928]: VRRP_Instance(VI_1) removing protocol VIPs.
Jul 31 20:27:45 real-server1 Keepalived_healthcheckers[72927]: Stopped
Jul 31 20:27:46 real-server1 Keepalived_vrrp[72928]: Stopped
Jul 31 20:27:46 real-server1 Keepalived[72926]: Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Jul 31 20:27:46 real-server1 systemd: Stopped LVS and VRRP High Availability Monitor.


#Now check on the backup node
[root@real-server2 ~]# ip a
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:a7:ff:f7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.179.104/24 brd 192.168.179.255 scope global ens32
       valid_lft forever preferred_lft forever
    inet 192.168.179.199/32 scope global ens32
       valid_lft forever preferred_lft forever
    inet6 fe80::831c:6df1:a633:742a/64 scope link 
       valid_lft forever preferred_lft forever

#keepalived log on the backup node; the failover succeeded
Jul 31 08:27:45 real-server2 Keepalived_vrrp[8961]: VRRP_Instance(VI_1) Transition to MASTER STATE
Jul 31 08:27:50 real-server2 Keepalived_vrrp[8961]: VRRP_Instance(VI_1) Entering MASTER STATE
Jul 31 08:27:50 real-server2 Keepalived_vrrp[8961]: VRRP_Instance(VI_1) setting protocol VIPs.
Jul 31 08:27:50 real-server2 Keepalived_vrrp[8961]: Sending gratuitous ARP on ens32 for 192.168.179.199
Jul 31 08:27:50 real-server2 Keepalived_vrrp[8961]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens32 for 192.168.179.199
Jul 31 08:27:50 real-server2 Keepalived_vrrp[8961]: Sending gratuitous ARP on ens32 for 192.168.179.199
Jul 31 08:27:50 real-server2 Keepalived_vrrp[8961]: Sending gratuitous ARP on ens32 for 192.168.179.199
Jul 31 08:27:50 real-server2 Keepalived_vrrp[8961]: Sending gratuitous ARP on ens32 for 192.168.179.199
Jul 31 08:27:50 real-server2 Keepalived_vrrp[8961]: Sending gratuitous ARP on ens32 for 192.168.179.199
Jul 31 08:27:55 real-server2 Keepalived_vrrp[8961]: Sending gratuitous ARP on ens32 for 192.168.179.199
Jul 31 08:27:55 real-server2 Keepalived_vrrp[8961]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens32 for 192.168.179.199
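
Checking by eye which node holds the VIP gets tedious during repeated failover tests. A small sketch that does the same check on `ip` output; `has_vip` is a hypothetical helper name, and the VIP and interface in the usage line are the ones from this article.

```shell
#!/bin/bash
# has_vip VIP
# Reads `ip addr` style output on stdin and returns 0 if VIP is assigned.
# The trailing "/" in the pattern stops 192.168.179.19 from matching
# 192.168.179.199.
has_vip() {
    grep -Fq "inet $1/"
}

# Usage:
#   ip -4 addr show dev ens32 | has_vip 192.168.179.199 && echo "VIP is on this node"
```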



#On the original Master node, start nginx and then keepalived; the log shows that it enters BACKUP state and does not preempt
[root@real-server1 ~]# /usr/local/nginx/sbin/nginx
[root@real-server1 ~]# systemctl start keepalived

Jul 31 20:43:33 real-server1 Keepalived[75196]: Starting Healthcheck child process, pid=75197
Jul 31 20:43:33 real-server1 Keepalived[75196]: Starting VRRP child process, pid=75198
Jul 31 20:43:33 real-server1 systemd: Started LVS and VRRP High Availability Monitor.
Jul 31 20:43:33 real-server1 Keepalived_healthcheckers[75197]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 31 20:43:33 real-server1 Keepalived_vrrp[75198]: Registering Kernel netlink reflector
Jul 31 20:43:33 real-server1 Keepalived_vrrp[75198]: Registering Kernel netlink command channel
Jul 31 20:43:33 real-server1 Keepalived_vrrp[75198]: Registering gratuitous ARP shared channel
Jul 31 20:43:33 real-server1 Keepalived_vrrp[75198]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 31 20:43:33 real-server1 Keepalived_vrrp[75198]: VRRP_Instance(VI_1) removing protocol VIPs.
Jul 31 20:43:33 real-server1 Keepalived_vrrp[75198]: Using LinkWatch kernel netlink reflector...
Jul 31 20:43:33 real-server1 Keepalived_vrrp[75198]: VRRP_Instance(VI_1) Entering BACKUP STATE
Jul 31 20:43:33 real-server1 Keepalived_vrrp[75198]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
Jul 31 20:43:33 real-server1 Keepalived_vrrp[75198]: VRRP_Script(chk_nginx) succeeded
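
The state a node ended up in can also be read from the log without scanning it by hand. A minimal sketch, assuming the keepalived v1.3.5 log wording shown above ("Entering ... STATE"); `last_vrrp_state` is a hypothetical helper name.

```shell
#!/bin/bash
# last_vrrp_state INSTANCE
# Reads keepalived log lines on stdin and prints the last state that the
# given vrrp_instance entered (for example MASTER or BACKUP).
# Prints nothing if no "Entering ... STATE" line is found.
last_vrrp_state() {
    sed -n -E "s/^.*VRRP_Instance\($1\) Entering ([A-Z]+) STATE.*$/\1/p" | tail -n 1
}

# Usage:
#   grep Keepalived_vrrp /var/log/messages | last_vrrp_state VI_1
```

After the recovery shown above, this should report BACKUP on real-server1 and MASTER on real-server2, which is exactly the non-preemptive outcome.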

To summarize, the configuration above boils down to:

Director A is configured as:
vrrp_instance VI_feng {
    ....

    state BACKUP
    priority 100
    nopreempt
    ....

}

Director B is configured as:
vrrp_instance VI_feng {
    ....

    state BACKUP
    priority 70
    nopreempt
    ....

}

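Non-preemption is configured on the higher-priority machine, whose state must also be BACKUP. In other words, for a cluster to be non-preemptive every node's state is set to BACKUP, while the priorities still differ as usual. The node with the higher priority gets the nopreempt option, because it is the one that would otherwise preempt the VIP. Every preemption means another switchover and another VIP move, and switching back and forth disrupts access and interrupts the service.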