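Overview
Case requirements: Tomcat + Nginx + Keepalived
Normally, we can use Nginx as a reverse proxy to load-balance requests, for example:
To achieve high availability and avoid a single point of failure, we can also deploy several Nginx instances to form a kind of Nginx cluster:
The setup in the figure above is only pseudo-high availability: the client has to target a specific IP address or domain name, yet it has no way of knowing which machine is faulty and which is working normally. What we want is for the client to use a single IP address or domain name, so that the failure of one machine does not affect client access. This is what Keepalived's VRRP provides:
Clients send requests directly to the VIP (10.1.1.10/24), which hides the real IP addresses of the network interfaces.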
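In this architecture, scaling out (Scale Out) the Tomcat back ends increases request and concurrency capacity, while the redundant Nginx and Keepalived pair behind the VIP improves availability.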
Installation
The basic information is shown in the following table:
OS | COMMAND/GUI | Real IP address | Deployed software | VIP | Initial role |
---|---|---|---|---|---|
RL 8.7 | COMMAND | 192.168.100.3/24 | Tomcat V10.1.7 (upstream tarball) | | |
RL 8.7 | COMMAND | 192.168.100.4/24 | Tomcat V10.1.7 (upstream tarball) | | |
RL 8.7 | COMMAND | 10.1.1.3/24 | Nginx V1.24.0 (source code), keepalived V2.2.7 (source code) | 10.1.1.10/24 | Master |
RL 8.7 | COMMAND | 10.1.1.4/24 | Nginx V1.24.0 (source code), keepalived V2.2.7 (source code) | 10.1.1.10/24 | Backup |
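Note! You do not have to install the software the same way the author does; installing from the repositories works too. This section simply demonstrates the most laborious installation method, which the author regards as the best-performing one. If you have already installed the software from the repositories on the relevant machines, you can skip this section.
192.168.100.3/24: install Tomcat from the upstream tarball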
Shell > dnf -y install java-11-openjdk java-11-openjdk-devel tar gzip bzip2 xz zip wget
Shell > wget -c https://dlcdn.apache.org/tomcat/tomcat-10/v10.1.7/bin/apache-tomcat-10.1.7.tar.gz
Shell > mkdir /usr/local/tomcat/
Shell > tar -zvxf apache-tomcat-10.1.7.tar.gz -C /usr/local/src/
Shell > mv /usr/local/src/apache-tomcat-10.1.7/* /usr/local/tomcat/
Shell > cd /usr/local/tomcat/bin/
Shell > ./startup.sh
Tomcat listens on port 8080 by default.
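To confirm that Tomcat is accepting connections after startup, a quick port probe can help. The sketch below assumes bash (it relies on bash's /dev/tcp redirection) and the coreutils timeout command. port_open is a hypothetical helper name and is not part of Tomcat.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to host $1, port $2
# can be opened within 2 seconds.
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example: probe the local Tomcat connector on its default port.
if port_open 127.0.0.1 8080; then
    echo "Tomcat is listening on 8080"
else
    echo "Nothing is listening on 8080 yet"
fi
```

The same helper can be pointed at 192.168.100.3 or 192.168.100.4 from a proxy host to check that the back ends are reachable before Nginx is configured.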
- Configure users and roles
Default role | Description |
---|---|
manager-gui | Allows access to the HTML GUI and the status pages |
manager-script | Allows access to the HTTP API and the status pages |
manager-jmx | Allows access to the JMX proxy and the status pages |
manager-status | Allows access to the status pages only |
Shell > vim /usr/local/tomcat/conf/tomcat-users.xml
...
<role rolename="tomcat"/>
<role rolename="role1"/>
<role rolename="manager-gui"/>
<role rolename="manager-script"/>
<role rolename="manager-jmx"/>
<role rolename="manager-status"/>
<user username="tomcat" password="tomcat" roles="tomcat,role1,manager-gui,manager-script,manager-jmx"/>
</tomcat-users>
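- Configure access control
Shell > vim /usr/local/tomcat/webapps/manager/META-INF/context.xml
...
<!--
<Valve className="org.apache.catalina.valves.RemoteAddrValve"
allow="127\.\d+\.\d+\.\d+|::1|0:0:0:0:0:0:0:1" />
-->
...
Commenting out the RemoteAddrValve removes the default restriction that only lets the local machine reach the manager application.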
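- Performance tuning
Each running Tomcat instance corresponds to one process (you can of course start several Tomcat instances on one server, as long as their ports differ). Each process runs multiple threads, and the thread settings are controlled by /usr/local/tomcat/conf/server.xml. To tune Tomcat's performance, edit this file.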
- Test access
Shell > cd /usr/local/tomcat/bin/ && ./shutdown.sh && ./startup.sh
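You will then see a page like this:
192.168.100.4/24: install Tomcat from the upstream tarball
The steps are the same as above and are omitted here.
10.1.1.3/24: install keepalived and Nginx
The Nginx installation steps are as follows:
# Create a system user (0<uid<1000) and a system group (0<gid<1000)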
Shell > groupadd -r nginx && useradd -g nginx -r -s /sbin/nologin nginx
Shell > id nginx
uid=992(nginx) gid=988(nginx) groups=988(nginx)
# Dependencies
Shell > dnf -y install gcc gcc-c++ openssl openssl-devel make pcre pcre-devel wget tar gzip bzip2 zip
# Build from source
Shell > wget -c https://nginx.org/download/nginx-1.24.0.tar.gz && tar -zvxf nginx-1.24.0.tar.gz -C /usr/local/src/
Shell > cd /usr/local/src/nginx-1.24.0/ && ./configure --prefix=/usr/local/nginx/ \
--user=nginx \
--group=nginx \
--with-http_ssl_module \
--with-http_stub_status_module \
--with-http_gzip_static_module \
&& make && make install
# Test start; no problems
Shell > cd /usr/local/nginx/sbin/ && ./nginx
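After the build, it can be useful to confirm that the binary really contains the modules passed to ./configure. The following is a minimal sketch: has_build_option is a hypothetical helper, and the binary path is the one produced by the --prefix used above. It relies on the fact that nginx -V prints its configure arguments on stderr.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the `nginx -V` text in $1 lists the
# configure flag --with-$2.
has_build_option() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -qx -- "--with-$2"
}

NGINX_BIN=/usr/local/nginx/sbin/nginx
if [ -x "$NGINX_BIN" ]; then
    # nginx -V prints its version and configure arguments on stderr.
    build_info=$("$NGINX_BIN" -V 2>&1)
    for module in http_ssl_module http_stub_status_module http_gzip_static_module; do
        if has_build_option "$build_info" "$module"; then
            echo "built with $module"
        else
            echo "missing $module"
        fi
    done
fi
```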
The Keepalived installation steps are as follows:
# Dependencies
Shell > dnf -y install wget make gcc openssl openssl-devel libnl3-devel libnl3 tar bzip2 gzip zip xz
# Build from source
Shell > wget -c https://keepalived.org/software/keepalived-2.2.7.tar.gz && tar -zvxf keepalived-2.2.7.tar.gz -C /usr/local/src/
Shell > cd /usr/local/src/keepalived-2.2.7/ && ./configure --prefix=/usr/local/keepalived \
--with-init=systemd \
--with-systemdsystemunitdir=/usr/lib/systemd/system \
&& make && make install
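10.1.1.4/24: install keepalived and Nginx
The steps are the same as above and are omitted here.
Configuration
Configure the Nginx reverse proxy on 10.1.1.3/24
For convenience, no SSL certificate is configured for the Nginx proxy server itself.
The relevant configuration is as follows: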
Shell > cat /usr/local/nginx/conf/nginx.conf
user nginx;
worker_processes 1;
error_log logs/error.log;
pid logs/nginx.pid;
events {
worker_connections 1024;
}
http {
server_tokens off;
include mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
sendfile on;
tcp_nopush on;
keepalive_timeout 65;
gzip on;
gzip_buffers 16 8k;
gzip_comp_level 3;
gzip_types text/css text/xml image/gif image/jpeg text/plain;
gzip_min_length 1k;
upstream tomcatserver {
server 192.168.100.3:8080;
server 192.168.100.4:8080;
}
server {
listen 80;
server_name 10.1.1.3;
location / {
proxy_pass http://tomcatserver;
}
}
}
In this configuration, pay particular attention to the server { } and upstream { } blocks.
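Whenever the server { } or upstream { } block is edited, it is safer to validate the file before asking the running Nginx to pick it up. The sketch below wraps the two standard commands, nginx -t (syntax test) and nginx -s reload. safe_reload and the NGINX_BIN variable are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Path from the source build in this guide; override NGINX_BIN if yours differs.
NGINX_BIN=${NGINX_BIN:-/usr/local/nginx/sbin/nginx}

# Hypothetical helper: reload Nginx only if the configuration passes
# the syntax test.
safe_reload() {
    "$NGINX_BIN" -t && "$NGINX_BIN" -s reload
}

# Usage on a proxy host, after editing /usr/local/nginx/conf/nginx.conf:
#   safe_reload && echo "configuration reloaded"
```

If the syntax test fails, the reload is skipped and the running Nginx keeps serving with its previous configuration.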
Configure the Nginx reverse proxy on 10.1.1.4/24
Likewise, no SSL certificate is configured for the Nginx proxy server itself.
Shell > cat /usr/local/nginx/conf/nginx.conf
user nginx;
worker_processes 1;
error_log logs/error.log;
pid logs/nginx.pid;
events {
worker_connections 1024;
}
http {
server_tokens off;
include mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
sendfile on;
tcp_nopush on;
keepalive_timeout 65;
gzip on;
gzip_buffers 16 8k;
gzip_comp_level 3;
gzip_types text/css text/xml image/gif image/jpeg text/plain;
gzip_min_length 1k;
upstream tomcatserver {
server 192.168.100.3:8080;
server 192.168.100.4:8080;
}
server {
listen 80;
server_name 10.1.1.4;
location / {
proxy_pass http://tomcatserver;
}
}
}
Configure keepalived on 10.1.1.3/24
The steps are as follows:
# Keepalived configuration
Shell > cd /usr/local/keepalived/etc/keepalived && mv keepalived.conf.sample keepalived.conf
Shell > vim /usr/local/keepalived/etc/keepalived/keepalived.conf
global_defs {
router_id VRRP01
vrrp_skip_check_adv_addr
enable_script_security
script_user root
}
vrrp_script check_nginx {
script "/usr/local/keepalived/check_nginx.sh"
interval 2
weight -5
}
vrrp_instance VI_1 {
state MASTER
interface ens160
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
10.1.1.10/24 dev ens160 label ens160:1
}
track_script {
check_nginx
}
}
# bash script
Shell > vim /usr/local/keepalived/check_nginx.sh
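#!/bin/bash
# Count processes named exactly "nginx". Grepping the ps output for "nginx"
# would also match this script's own name (check_nginx.sh).
count=$(pgrep -x -c nginx)
if [ "${count}" == "0" ]
then
# Try to start Nginx once
/usr/local/nginx/sbin/nginx
sleep 2
count=$(pgrep -x -c nginx)
if [ "${count}" == "0" ]
then
# Nginx is still down: stop Keepalived so that the VIP fails over
systemctl stop keepalived.service
fi
fi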
Shell > chmod 755 /usr/local/keepalived/check_nginx.sh
Next is the systemd unit (that is, adding -f /usr/local/keepalived/etc/keepalived/keepalived.conf to ExecStart):
Shell > vim /usr/lib/systemd/system/keepalived.service
[Unit]
Description=LVS and VRRP High Availability Monitor
After=network-online.target syslog.target
Wants=network-online.target
Documentation=man:keepalived(8)
Documentation=man:keepalived.conf(5)
Documentation=man:genhash(1)
Documentation=https://keepalived.org
[Service]
Type=forking
PIDFile=/run/keepalived.pid
KillMode=process
EnvironmentFile=-/usr/local/keepalived/etc/sysconfig/keepalived
ExecStart=/usr/local/keepalived/sbin/keepalived -f /usr/local/keepalived/etc/keepalived/keepalived.conf $KEEPALIVED_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
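As a quick sanity check, the configuration path that keepalived will actually be started with can be read back from the unit file. This is only a sketch: exec_start_config is a hypothetical helper, and the unit path is the one used in this guide.

```shell
#!/bin/bash
# Hypothetical helper: print the path that follows -f on the ExecStart
# line of the unit text read from stdin.
exec_start_config() {
    sed -n 's/^ExecStart=.* -f \([^ ]*\).*/\1/p'
}

UNIT=/usr/lib/systemd/system/keepalived.service
if [ -r "$UNIT" ]; then
    echo "keepalived will read: $(exec_start_config < "$UNIT")"
fi

# After editing a unit file, systemd must re-read it before the change
# takes effect:
#   systemctl daemon-reload
```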
Configure keepalived on 10.1.1.4/24
Shell > cd /usr/local/keepalived/etc/keepalived && mv keepalived.conf.sample keepalived.conf
Shell > vim /usr/local/keepalived/etc/keepalived/keepalived.conf
global_defs {
router_id VRRP02
vrrp_skip_check_adv_addr
enable_script_security
script_user root
}
vrrp_script check_nginx {
script "/usr/local/keepalived/check_nginx.sh"
interval 2
weight -5
}
vrrp_instance VI_1 {
state BACKUP
interface ens160
virtual_router_id 51
priority 50
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
10.1.1.10/24 dev ens160 label ens160:1
}
track_script {
check_nginx
}
}
# bash script
Shell > vim /usr/local/keepalived/check_nginx.sh
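#!/bin/bash
# Count processes named exactly "nginx". Grepping the ps output for "nginx"
# would also match this script's own name (check_nginx.sh).
count=$(pgrep -x -c nginx)
if [ "${count}" == "0" ]
then
# Try to start Nginx once
/usr/local/nginx/sbin/nginx
sleep 2
count=$(pgrep -x -c nginx)
if [ "${count}" == "0" ]
then
# Nginx is still down: stop Keepalived so that the VIP fails over
systemctl stop keepalived.service
fi
fi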
Shell > chmod 755 /usr/local/keepalived/check_nginx.sh
The bash script checks for Nginx processes. If none exist, it tries to start Nginx and sleeps for 2 seconds. If there is still no Nginx process, it stops keepalived.service directly.
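The script above reacts to a failure by stopping Keepalived itself. A different design, shown here only as a sketch and not as the method used in this guide, is to let the check report health through its exit status and leave the reaction to Keepalived, which treats exit status 0 as success and any other status as failure. process_running is a hypothetical helper name.

```shell
#!/bin/bash
# Alternative sketch (not the script used in this guide): report health
# through the exit status only.
process_running() {
    # pgrep -x matches the process name exactly, so a script whose file
    # name merely contains "nginx" is not counted.
    pgrep -x "$1" > /dev/null
}

if process_running nginx; then
    echo "nginx is running"
else
    echo "nginx is not running"
fi

# Used as a vrrp_script, the body would end with just:
#   process_running nginx
```

With the priorities in this guide (100 and 50), a weight of -5 would not be enough for such a check to move the VIP. The weight would have to exceed the gap of 50 in magnitude, or be left at its default of 0, in which case a failing script puts the instance into the FAULT state.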
Next is the systemd unit (that is, adding -f /usr/local/keepalived/etc/keepalived/keepalived.conf to ExecStart):
Shell > vim /usr/lib/systemd/system/keepalived.service
[Unit]
Description=LVS and VRRP High Availability Monitor
After=network-online.target syslog.target
Wants=network-online.target
Documentation=man:keepalived(8)
Documentation=man:keepalived.conf(5)
Documentation=man:genhash(1)
Documentation=https://keepalived.org
[Service]
Type=forking
PIDFile=/run/keepalived.pid
KillMode=process
EnvironmentFile=-/usr/local/keepalived/etc/sysconfig/keepalived
ExecStart=/usr/local/keepalived/sbin/keepalived -f /usr/local/keepalived/etc/keepalived/keepalived.conf $KEEPALIVED_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
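Notes on keepalived
Keepalived was originally built for router failover, so to monitor a specific service you define a script with vrrp_script <STRING> { }:
- script <STRING>: the path of the script;
- interval <integer>: how often the script runs, in seconds;
- weight <integer>: the amount by which the script result adjusts the instance priority. A positive weight is added while the script succeeds, and a negative weight lowers the priority while the script fails. The value ranges from -253 to 253;
- timeout <integer>: the script timeout;
- fall <integer>: the number of consecutive failures after which the script is finally marked as failed; the recommended value is 2;
- user <USERNAME>: the user or group that runs the script;
- init_fail: assume that the script starts in the failed state.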
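The effect of weight on the priority can be worked through with a little arithmetic. The helper below is a hypothetical illustration of the rule, not keepalived code, and it does not model the weight 0 case (where a failing script puts the instance into the FAULT state).

```shell
#!/bin/bash
# Hypothetical helper illustrating how a vrrp_script weight changes the
# effective priority.
# $1 = configured priority, $2 = weight, $3 = script result ("ok" or "fail")
effective_priority() {
    local base=$1 weight=$2 result=$3
    if [ "$weight" -gt 0 ] && [ "$result" = "ok" ]; then
        echo $((base + weight))   # positive weight: added while the script succeeds
    elif [ "$weight" -lt 0 ] && [ "$result" = "fail" ]; then
        echo $((base + weight))   # negative weight: lowers priority while the script fails
    else
        echo "$base"              # otherwise the configured priority applies
    fi
}

effective_priority 100 -5 fail   # Master in this guide with a failing check
effective_priority 50 -5 ok      # Backup in this guide with a passing check
```

With the values used in this guide, a failing check would lower the Master from 100 to 95, which is still above the Backup's 50, so the weight alone would not move the VIP. This fits the design of check_nginx.sh, which stops Keepalived itself when Nginx cannot be restarted.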
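Points to note:
- router_id <STRING> must be different and unique on each node
- The instance name must be the same on the Master and the Backup; here both use VI_1
- The Master's priority should be 50 higher than the Backup's
- The VRID must be the same; here both use virtual_router_id 51
- Inconsistencies in certain items between the Master and Backup configuration files may cause the high-availability "split-brain" problem
- If the configuration file says nothing about preemption, preemption mode is the default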
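One way to keep the two nodes consistent is to generate the vrrp_instance block from a single template, so that only the intended fields vary. The sketch below assumes bash and diff. render_vrrp_instance is a hypothetical helper, and the values are the ones used in this guide (router_id lives in global_defs and is therefore not part of this block).

```shell
#!/bin/bash
# Hypothetical helper: print the vrrp_instance block for one node.
# $1 = initial state (MASTER or BACKUP), $2 = priority.
# Everything else is identical on both nodes.
render_vrrp_instance() {
    cat <<EOF
vrrp_instance VI_1 {
    state $1
    interface ens160
    virtual_router_id 51
    priority $2
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.1.1.10/24 dev ens160 label ens160:1
    }
    track_script {
        check_nginx
    }
}
EOF
}

# Show which lines differ between the two nodes
# (diff exits with status 1 when it finds differences).
diff <(render_vrrp_instance MASTER 100) <(render_vrrp_instance BACKUP 50) || true
```

The diff should list only the state and priority lines; any other difference between the two real files is worth a second look.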
Simulating failover
Start the services on each machine in the following order:
- 192.168.100.3/24: cd /usr/local/tomcat/bin && ./startup.sh
- 192.168.100.4/24: cd /usr/local/tomcat/bin && ./startup.sh
- 10.1.1.3/24: /usr/local/nginx/sbin/nginx
- 10.1.1.4/24: /usr/local/nginx/sbin/nginx
- 10.1.1.3/24: systemctl start keepalived.service
- 10.1.1.4/24: systemctl start keepalived.service
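Once both Keepalived services are running, you can check which proxy host currently owns the VIP by looking at its interface addresses. A minimal sketch follows; holds_vip is a hypothetical helper, and the interface name and VIP are the ones configured in this guide.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the `ip -4 addr` text on stdin contains
# the address given in $1 (matched as a fixed string, prefix length ignored).
holds_vip() {
    grep -qF "inet $1/"
}

# On either proxy host:
if command -v ip > /dev/null && ip -4 addr show dev ens160 2>/dev/null | holds_vip 10.1.1.10; then
    echo "this node currently holds the VIP"
else
    echo "this node does not hold the VIP"
fi
```

In the normal state, only the Master (10.1.1.3) should report that it holds the VIP.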
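Access under normal conditions:
Now suppose the Master fails. To simulate this, manually stop Keepalived on the Master (which is effectively what happens when Nginx goes down): systemctl stop keepalived.service
Then check the keepalived.service log on the Backup: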
Shell > journalctl -u keepalived.service
Apr 20 13:00:44 Backup Keepalived_vrrp[10757]: (VI_1) Backup received priority 0 advertisement
Apr 20 13:00:44 Backup Keepalived_vrrp[10757]: (VI_1) Receive advertisement timeout
Apr 20 13:00:44 Backup Keepalived_vrrp[10757]: (VI_1) Entering MASTER STATE
Apr 20 13:00:44 Backup Keepalived_vrrp[10757]: (VI_1) setting VIPs.
Apr 20 13:00:44 Backup Keepalived_vrrp[10757]: (VI_1) Sending/queueing gratuitous ARPs on ens160 for 10.1.1.10
Apr 20 13:00:44 Backup Keepalived_vrrp[10757]: Sending gratuitous ARP on ens160 for 10.1.1.10
Apr 20 13:00:44 Backup Keepalived_vrrp[10757]: Sending gratuitous ARP on ens160 for 10.1.1.10
Apr 20 13:00:44 Backup Keepalived_vrrp[10757]: Sending gratuitous ARP on ens160 for 10.1.1.10
Apr 20 13:00:44 Backup Keepalived_vrrp[10757]: Sending gratuitous ARP on ens160 for 10.1.1.10
Apr 20 13:00:44 Backup Keepalived_vrrp[10757]: Sending gratuitous ARP on ens160 for 10.1.1.10
Apr 20 13:00:49 Backup Keepalived_vrrp[10757]: (VI_1) Sending/queueing gratuitous ARPs on ens160 for 10.1.1.10
Apr 20 13:00:49 Backup Keepalived_vrrp[10757]: Sending gratuitous ARP on ens160 for 10.1.1.10
Apr 20 13:00:49 Backup Keepalived_vrrp[10757]: Sending gratuitous ARP on ens160 for 10.1.1.10
Apr 20 13:00:49 Backup Keepalived_vrrp[10757]: Sending gratuitous ARP on ens160 for 10.1.1.10
Apr 20 13:00:49 Backup Keepalived_vrrp[10757]: Sending gratuitous ARP on ens160 for 10.1.1.10
Apr 20 13:00:49 Backup Keepalived_vrrp[10757]: Sending gratuitous ARP on ens160 for 10.1.1.10
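As you can see, the Backup receives a priority 0 advertisement, its wait for further advertisements times out, and it takes over as the new Master. Clients can still reach the service normally through 10.1.1.10.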
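When the journal is long, it can help to reduce it to the state changes alone. The sketch below extracts the state name from lines of the form "(VI_1) Entering MASTER STATE" as seen in the log above. vrrp_transitions is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: from keepalived log lines on stdin, print each
# state the instance entered, in order.
vrrp_transitions() {
    sed -n 's/.*Entering \([A-Z]*\) STATE.*/\1/p'
}

# On a proxy host:
#   journalctl -u keepalived.service --no-pager | vrrp_transitions
```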
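When the original Master recovers from the failure, it takes the Master role back, because preemption mode is active (the default) and its priority value is higher.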