Sven Wiltink
2018-06-12 15:01:18 UTC
Hello,
There seems to be a bug in the loading of server state files after a configuration change. When the destination port of a server is changed, its health check never starts passing if the server was down before the reload. This looks like a regression introduced after 1.7.9, as we cannot reproduce it on machines running that version of haproxy. The following steps reproduce the issue:
Start with a fresh Debian 9 install.
Install socat.
Install haproxy 1.8.9 from backports.
Create a systemd drop-in file /etc/systemd/system/haproxy.service.d/60-haproxy-server_state.conf with the following contents:
[Service]
ExecStartPre=/bin/mkdir -p /var/run/haproxy/state
ExecReload=
ExecReload=/usr/sbin/haproxy -f ${CONFIG} -c -q $EXTRAOPTS
ExecReload=/bin/sh -c "echo show servers state | /usr/bin/socat /var/run/haproxy.sock - > /var/run/haproxy/state/test"
ExecReload=/bin/kill -USR2 $MAINPID
Create the following files:
/etc/haproxy/haproxy.cfg.disabled:
global
    maxconn 32000
    tune.maxrewrite 2048
    user haproxy
    group haproxy
    daemon
    chroot /var/lib/haproxy
    nbproc 1
    maxcompcpuusage 85
    spread-checks 0
    stats socket /var/run/haproxy.sock mode 600 level admin process 1 user haproxy group haproxy
    server-state-file test
    server-state-base /var/run/haproxy/state
    master-worker no-exit-on-failure

defaults
    load-server-state-from-file global
    log global
    timeout http-request 5s
    timeout connect 2s
    timeout client 300s
    timeout server 300s
    mode http
    option dontlog-normal
    option http-server-close
    option redispatch
    option log-health-checks

listen stats
    bind :1936
    bind-process 1
    mode http
    stats enable
    stats uri /
    stats admin if TRUE
/etc/haproxy/haproxy.cfg.different-port:
global
    maxconn 32000
    tune.maxrewrite 2048
    user haproxy
    group haproxy
    daemon
    chroot /var/lib/haproxy
    nbproc 1
    maxcompcpuusage 85
    spread-checks 0
    stats socket /var/run/haproxy.sock mode 600 level admin process 1 user haproxy group haproxy
    server-state-file test
    server-state-base /var/run/haproxy/state
    master-worker no-exit-on-failure

defaults
    load-server-state-from-file global
    log global
    timeout http-request 5s
    timeout connect 2s
    timeout client 300s
    timeout server 300s
    mode http
    option dontlog-normal
    option http-server-close
    option redispatch
    option log-health-checks

listen stats
    bind :1936
    bind-process 1
    mode http
    stats enable
    stats uri /
    stats admin if TRUE

listen banaan-443-ipv4
    bind :443
    mode tcp
    server banaan-vps 127.0.0.1:80 check inter 2000

listen banaan-80-ipv4
    bind :80
    mode tcp
    server banaan-vps 127.0.0.1:80 check inter 2000
/etc/haproxy/haproxy.cfg.same-port:
global
    maxconn 32000
    tune.maxrewrite 2048
    user haproxy
    group haproxy
    daemon
    chroot /var/lib/haproxy
    nbproc 1
    maxcompcpuusage 85
    spread-checks 0
    stats socket /var/run/haproxy.sock mode 600 level admin process 1 user haproxy group haproxy
    server-state-file test
    server-state-base /var/run/haproxy/state
    master-worker no-exit-on-failure

defaults
    load-server-state-from-file global
    log global
    timeout http-request 5s
    timeout connect 2s
    timeout client 300s
    timeout server 300s
    mode http
    option dontlog-normal
    option http-server-close
    option redispatch
    option log-health-checks

listen stats
    bind :1936
    bind-process 1
    mode http
    stats enable
    stats uri /
    stats admin if TRUE

listen banaan-443-ipv4
    bind :443
    mode tcp
    server banaan-vps 127.0.0.1:443 check inter 2000

listen banaan-80-ipv4
    bind :80
    mode tcp
    server banaan-vps 127.0.0.1:80 check inter 2000
Start a netcat process to act as a fake web server on port 80: nc -klp 80
Copy haproxy.cfg.disabled to haproxy.cfg and start haproxy.
Copy haproxy.cfg.same-port to haproxy.cfg and reload haproxy. The server in banaan-443-ipv4 is now marked as down, as expected (nothing is listening on port 443).
Now copy haproxy.cfg.different-port to haproxy.cfg and reload haproxy again. The server in banaan-443-ipv4 stays marked as down, even though it now uses exactly the same health check as banaan-80-ipv4: server banaan-vps 127.0.0.1:80 check inter 2000
If we then stop haproxy, delete the state file at /var/run/haproxy/state/test, and start haproxy again, the server is marked as up.
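To see which state the new process inherits at each reload, the dump written by the ExecReload line can be inspected directly. The helper below is a minimal sketch and not part of haproxy: `show_server_states` is a hypothetical name, and it assumes the 1.8 dump layout (a version line, then a `#`-prefixed header naming the columns such as `be_name`, `srv_name` and `srv_op_state`, then one line per server). Columns are looked up by header name rather than position, and `srv_port` is printed only if the dump has that column.

```shell
#!/bin/sh
# Hypothetical helper: summarise a "show servers state" dump, one server per line.
# Assumes the HAProxy 1.8 layout: a version line, a "#"-prefixed header that
# names each column, then one space-separated line per server.
# Reads the file given as $1, or stdin when no argument is given.
show_server_states() {
    awk '
        /^#/ {
            # Header line: drop the leading "#" and map column name -> field number.
            sub(/^#[ \t]*/, "")
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        NF > 1 && ("srv_op_state" in col) {
            # Numeric values follow enum srv_state in the HAProxy sources.
            st = $(col["srv_op_state"])
            if (st == 0) state = "STOPPED"
            else if (st == 1) state = "STARTING"
            else if (st == 2) state = "RUNNING"
            else if (st == 3) state = "STOPPING"
            else state = "UNKNOWN(" st ")"
            # Print srv_port only when the dump actually has that column.
            port = ("srv_port" in col) ? $(col["srv_port"]) : "-"
            printf "%s/%s port=%s %s\n", $(col["be_name"]), $(col["srv_name"]), port, state
        }
    ' "${1:--}"
}
```

After each reload, `show_server_states /var/run/haproxy/state/test` shows what was handed to the new process; piping the live output through it (`echo "show servers state" | socat /var/run/haproxy.sock - | show_server_states`) shows what the running process currently holds.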
Thanks in advance,
Sven