haproxy bug: healthcheck not passing after port change when statefile is enabled

Discussion:

Sven Wiltink

2018-06-12 15:01:18 UTC

Hello,

There seems to be a bug in the loading of state files after a configuration change. When the destination port of a server is changed, the healthchecks never start passing if the server's state before the reload was down. The bug appears to have been introduced after 1.7.9, as we cannot reproduce it on machines running that version of haproxy. You can use the following steps to reproduce the issue:

Start with a fresh debian 9 install
install socat
install haproxy 1.8.9 from backports

create a systemd file /etc/systemd/system/haproxy.service.d/60-haproxy-server_state.conf with the following contents:
[Service]
ExecStartPre=/bin/mkdir -p /var/run/haproxy/state
ExecReload=
ExecReload=/usr/sbin/haproxy -f ${CONFIG} -c -q $EXTRAOPTS
ExecReload=/bin/sh -c "echo show servers state | /usr/bin/socat /var/run/haproxy.sock - > /var/run/haproxy/state/test"
ExecReload=/bin/kill -USR2 $MAINPID

create the following files:
/etc/haproxy/haproxy.cfg.disabled:
global
maxconn 32000
tune.maxrewrite 2048
user haproxy
group haproxy
daemon
chroot /var/lib/haproxy
nbproc 1
maxcompcpuusage 85
spread-checks 0
stats socket /var/run/haproxy.sock mode 600 level admin process 1 user haproxy group haproxy
server-state-file test
server-state-base /var/run/haproxy/state
master-worker no-exit-on-failure

defaults
load-server-state-from-file global
log global
timeout http-request 5s
timeout connect 2s
timeout client 300s
timeout server 300s
mode http
option dontlog-normal
option http-server-close
option redispatch
option log-health-checks

listen stats
bind :1936
bind-process 1
mode http
stats enable
stats uri /
stats admin if TRUE

/etc/haproxy/haproxy.cfg.different-port:
global
maxconn 32000
tune.maxrewrite 2048
user haproxy
group haproxy
daemon
chroot /var/lib/haproxy
nbproc 1
maxcompcpuusage 85
spread-checks 0
stats socket /var/run/haproxy.sock mode 600 level admin process 1 user haproxy group haproxy
server-state-file test
server-state-base /var/run/haproxy/state
master-worker no-exit-on-failure

defaults
load-server-state-from-file global
log global
timeout http-request 5s
timeout connect 2s
timeout client 300s
timeout server 300s
mode http
option dontlog-normal
option http-server-close
option redispatch
option log-health-checks

listen stats
bind :1936
bind-process 1
mode http
stats enable
stats uri /
stats admin if TRUE

listen banaan-443-ipv4
bind :443
mode tcp
server banaan-vps 127.0.0.1:80 check inter 2000
listen banaan-80-ipv4
bind :80
mode tcp
server banaan-vps 127.0.0.1:80 check inter 2000

/etc/haproxy/haproxy.cfg.same-port:
global
maxconn 32000
tune.maxrewrite 2048
user haproxy
group haproxy
daemon
chroot /var/lib/haproxy
nbproc 1
maxcompcpuusage 85
spread-checks 0
stats socket /var/run/haproxy.sock mode 600 level admin process 1 user haproxy group haproxy
server-state-file test
server-state-base /var/run/haproxy/state
master-worker no-exit-on-failure

defaults
load-server-state-from-file global
log global
timeout http-request 5s
timeout connect 2s
timeout client 300s
timeout server 300s
mode http
option dontlog-normal
option http-server-close
option redispatch
option log-health-checks

listen stats
bind :1936
bind-process 1
mode http
stats enable
stats uri /
stats admin if TRUE

listen banaan-443-ipv4
bind :443
mode tcp
server banaan-vps 127.0.0.1:443 check inter 2000
listen banaan-80-ipv4
bind :80
mode tcp
server banaan-vps 127.0.0.1:80 check inter 2000

start a netcat process to fake a webserver: nc -klp 80
cp haproxy.cfg.disabled to haproxy.cfg and start haproxy.
cp haproxy.cfg.same-port to haproxy.cfg and reload haproxy. You will now see that the servers for banaan-443-ipv4 are marked as down, as expected (nothing is running on port 443).
Now cp haproxy.cfg.different-port to haproxy.cfg and reload haproxy again. banaan-443-ipv4 will still be marked as down, although it uses the same healthcheck as the port 80 configuration: server banaan-vps 127.0.0.1:80 check inter 2000

If we now stop haproxy, delete the state file located at /var/run/haproxy/state/test, and start haproxy again, the server is marked as up.

Thanks in advance,
Sven

Tim Düsterhus

2018-06-12 15:19:50 UTC

Sven,

Post by Sven Wiltink
[Service]
ExecStartPre=/bin/mkdir -p /var/run/haproxy/state
ExecReload=
ExecReload=/usr/sbin/haproxy -f ${CONFIG} -c -q $EXTRAOPTS
ExecReload=/bin/sh -c "echo show servers state | /usr/bin/socat /var/run/haproxy.sock - > /var/run/haproxy/state/test"
ExecReload=/bin/kill -USR2 $MAINPID

While this would not have an effect on your issue, I suggest specifying a RuntimeDirectory, for example:

# /lib/systemd/system/haproxy.service.d/state.conf
[Service]
RuntimeDirectory=haproxy
ExecReload=
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS
ExecReload=/bin/sh -c "echo show servers state |nc -U /var/run/haproxy/admin.sock > /run/haproxy/global-state"
ExecReload=/bin/kill -USR2 $MAINPID

The RuntimeDirectory would automatically be cleaned on `restart` /
`stop`, but not on a `reload`. If this is not wanted take a look at
`RuntimeDirectoryPreserve`
(https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RuntimeDirectoryPreserve=).

Best regards
Tim Düsterhus

Sven Wiltink

2018-06-25 10:55:02 UTC

Hello,

So we've dug a little deeper, and the issue seems to be caused by the port value in the state file. When the target port of a server has changed between reloads, the port specified in the state file takes precedence over the one in the configuration. Running tcpdump shows the healthchecks being performed against the old port. After stopping haproxy and removing the state file, the healthcheck is performed against the right port. When the state file is manually edited to contain a random port, the healthchecks are performed against that port instead of the one specified by the config.

The code responsible for this is line http://git.haproxy.org/?p=haproxy-1.8.git;a=blob;f=src/server.c;h=523289e3bda7ca6aa15575f1928f5298760cf582;hb=HEAD#l2931

from commit http://git.haproxy.org/?p=haproxy-1.8.git;a=commitdiff;h=3169471964fdc49963e63f68c1fd88686821a0c4.

A solution would be invalidating the state when the ports don't match.
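
Until such a check exists in HAProxy itself, the same idea can be approximated outside HAProxy, between the state dump and the reload. The following is only a minimal sketch under stated assumptions, not HAProxy's own logic: it assumes the dump contains a `#` header line naming its columns, and that those columns include `be_name`, `srv_name` and `srv_port`. The function name `drop_stale_port_entries`, the `configured_ports` mapping and the shortened sample header are hypothetical.

```python
def drop_stale_port_entries(state_text, configured_ports):
    """Return state_text without server lines whose saved port differs
    from the configured one.

    configured_ports maps (backend_name, server_name) -> port (int).
    Column names are read from the '#' header line; 'be_name',
    'srv_name' and 'srv_port' are assumed names, not verified here.
    """
    columns = None
    kept = []
    for line in state_text.splitlines():
        if line.startswith("#"):
            # Header line: remember the column names, keep the line as-is.
            columns = line.lstrip("# ").split()
            kept.append(line)
            continue
        fields = line.split()
        if columns is None or len(fields) < len(columns):
            # Version line, blank line, or anything we cannot interpret.
            kept.append(line)
            continue
        row = dict(zip(columns, fields))
        saved = row.get("srv_port")
        expected = configured_ports.get((row.get("be_name"), row.get("srv_name")))
        if (saved is not None and saved.isdigit()
                and expected is not None and int(saved) != expected):
            # Stale entry: drop it so the configured port is used again.
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Shortened, hypothetical sample; a real dump has more columns.
SAMPLE_STATE = (
    "1\n"
    "# be_id be_name srv_id srv_name srv_addr srv_op_state srv_port\n"
    "3 banaan-443-ipv4 1 banaan-vps 127.0.0.1 0 443\n"
    "4 banaan-80-ipv4 1 banaan-vps 127.0.0.1 2 80\n"
)
```

Dropping the whole line also discards the saved up/down state for that server, which matches the "invalidate the state" wording above; servers whose port did not change keep their state.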

-Sven

Baptiste

2018-07-03 09:38:14 UTC

Hi Sven,

Thanks a lot for your feedback!
I'll check how we could handle this use case with the state file.

Just to ensure I'm going to troubleshoot the right issue, could you please
summarize how you trigger this issue in a few simple steps?
For example:
- conf v1, server port is X
- generate server state (where port is X)
- update conf to v2, where port is Y
- reload HAProxy => X is applied, while you expect to get Y instead

Baptiste

Sven Wiltink

2018-07-03 13:41:41 UTC

Hey Baptiste,

Thank you for looking into it.

The bug is triggered by running haproxy with the following config:

global
maxconn 32000
tune.maxrewrite 2048
user haproxy
group haproxy
daemon
chroot /var/lib/haproxy
nbproc 1
maxcompcpuusage 85
spread-checks 0
stats socket /var/run/haproxy.sock mode 600 level admin process 1 user haproxy group haproxy
server-state-file test
server-state-base /var/run/haproxy/state
master-worker no-exit-on-failure

defaults
load-server-state-from-file global
log global
timeout http-request 5s
timeout connect 2s
timeout client 300s
timeout server 300s
mode http
option dontlog-normal
option http-server-close
option redispatch
option log-health-checks

listen stats
bind :1936
bind-process 1
mode http
stats enable
stats uri /
stats admin if TRUE

listen banaan-443-ipv4
bind :443
mode tcp
server banaan-vps 127.0.0.1:443 check inter 2000

- Then start haproxy (it will do healthchecks to port 443)
- change server banaan-vps 127.0.0.1:443 check inter 2000 to server banaan-vps 127.0.0.1:80 check inter 2000
- save the state using /bin/sh -c "echo show servers state | /usr/bin/socat /var/run/haproxy.sock - > /var/run/haproxy/state/test" (this is normally done using the systemd file on reload, see initial mail)
- reload haproxy (it still does healthchecks to port 443 while port 80 was expected)

If you delete the state file and reload haproxy, it will start healthchecks for port 80 as expected.
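
To spot the mismatch these steps produce, it helps to know which port the configuration asks for. The sketch below extracts the configured port of every `server` line in `listen` and `backend` sections, so that the result can be compared by hand or by script with the port column of the saved state. It is a deliberately simplified reader, not HAProxy's parser: the function name `parse_configured_ports` is hypothetical, `#` is treated naively as a comment start, and addresses without an explicit `:port` suffix are skipped.

```python
# Section keywords that end a listen/backend section in this simplified reader.
OTHER_SECTIONS = ("global", "defaults", "frontend", "userlist",
                  "peers", "resolvers", "mailers")


def parse_configured_ports(cfg_text):
    """Map (proxy_name, server_name) -> configured port for 'server' lines."""
    ports = {}
    proxy = None
    for raw in cfg_text.splitlines():
        # Naive comment stripping: everything after '#' is ignored.
        words = raw.split("#", 1)[0].split()
        if not words:
            continue
        if words[0] in ("listen", "backend") and len(words) > 1:
            proxy = words[1]
        elif words[0] in OTHER_SECTIONS:
            proxy = None
        elif words[0] == "server" and proxy is not None and len(words) > 2:
            # words[1] is the server name, words[2] is "address:port".
            _host, sep, port = words[2].rpartition(":")
            if sep and port.isdigit():
                ports[(proxy, words[1])] = int(port)
    return ports


# The relevant part of the config from this thread, after the port change.
SAMPLE_CFG = """
listen stats
    bind :1936
    stats enable

listen banaan-443-ipv4
    bind :443
    mode tcp
    server banaan-vps 127.0.0.1:80 check inter 2000
"""
```

For the sample, the function reports port 80 for `banaan-vps` in `banaan-443-ipv4`, while the state file saved before the reload still records 443 for the same server.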

-Sven

Baptiste

2018-07-12 12:52:24 UTC

Hi Sven,

Thanks for the clarification.
It's a bit more complicated than it should be.
I think we may want to apply the port only if it has been changed at
runtime (changed by DNS SRV records).

The status is the following: I have a pending patch which brings SRV record
information into the state file. (WIP, but last mile)
Once it has been merged, we'll be able to fix this issue (by applying the
port only when the server is being managed by an SRV record).
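
The difference between the behaviour reported in this thread and the rule proposed here can be written down as a toy model. This is only an illustration of the decision, not HAProxy code: both function names and the `managed_by_srv` flag are hypothetical, and the way HAProxy records SRV management in the state file is not shown.

```python
from typing import Optional


def observed_port(config_port: int, state_port: Optional[int]) -> int:
    """Behaviour reported in this thread: a port found in the state
    file always overrides the configured one."""
    return state_port if state_port is not None else config_port


def proposed_port(config_port: int, state_port: Optional[int],
                  managed_by_srv: bool) -> int:
    """Proposed rule: only trust the saved port when the server is
    managed by a DNS SRV record, since only then can the port have
    legitimately changed at runtime."""
    if managed_by_srv and state_port is not None:
        return state_port
    return config_port
```

With the configuration from the report (configured port 80, saved port 443, no SRV record), `observed_port(80, 443)` gives 443 and `proposed_port(80, 443, False)` gives 80.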

Baptiste

Sven Wiltink

2018-10-09 08:51:58 UTC

Hey Baptiste,

We noticed the SRV patch has been merged. That should mean that we can now fix this issue as well. Would you be able to fix this, or should we try to provide a patch?

Thanks again in advance,

Sven

________________________________
Van: Baptiste <***@gmail.com>
Verzonden: donderdag 12 juli 2018 14:52:24
Aan: Sven Wiltink
CC: ***@formilux.org
Onderwerp: Re: haproxy bug: healthcheck not passing after port change when statefile is enabled

Hi Sven,

Thanks for the clarification.
It's a bit more complicated than what it is supposed to be.
I think we may want to apply the port only if it has been changed at runtime (changed by DNS SRV records).

The status is the following: I have a pending patch which brings SRV record information into the state file. (WIP, but last mile)
Once it has been merged, we'll be able to fix this issue (by applying the port only when the server is being managed by an SRV record).

Baptiste

On Tue, Jul 3, 2018 at 3:41 PM, Sven Wiltink <***@transip.nl<mailto:***@transip.nl>> wrote:

Hey Baptiste,

Thank you for looking into it.

The bug is triggered by running haproxy with the following config:

global
maxconn 32000
tune.maxrewrite 2048
user haproxy
group haproxy
daemon
chroot /var/lib/haproxy
nbproc 1
maxcompcpuusage 85
spread-checks 0
stats socket /var/run/haproxy.sock mode 600 level admin process 1 user haproxy group haproxy
server-state-file test
server-state-base /var/run/haproxy/state
master-worker no-exit-on-failure

defaults
load-server-state-from-file global
log global
timeout http-request 5s
timeout connect 2s
timeout client 300s
timeout server 300s
mode http
option dontlog-normal
option http-server-close
option redispatch
option log-health-checks

listen stats
bind :1936
bind-process 1
mode http
stats enable
stats uri /
stats admin if TRUE

listen banaan-443-ipv4
bind :443
mode tcp
server banaan-vps 127.0.0.1:443 check inter 2000

- start haproxy (it will send healthchecks to port 443)
- change "server banaan-vps 127.0.0.1:443 check inter 2000" to "server banaan-vps 127.0.0.1:80 check inter 2000"
- save the state using /bin/sh -c "echo show servers state | /usr/bin/socat /var/run/haproxy.sock - > /var/run/haproxy/state/test" (this is normally done by the systemd file on reload, see initial mail)
- reload haproxy (it still sends healthchecks to port 443, while port 80 was expected)

If you delete the statefile and reload haproxy, it will start healthchecks for port 80 as expected.
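
For anyone reproducing this without socat, the state-saving step could be sketched in Python roughly as follows. It assumes the stats socket answers a single command and then closes the connection (the usual non-interactive behavior, as far as we know); the helper names are made up for this example.

```python
import socket


def query_stats_socket(sock, command="show servers state"):
    """Send one command over an already connected stats socket and
    return everything the peer sends before closing the connection."""
    sock.sendall(command.encode("ascii") + b"\n")
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # peer closed the connection: response is complete
            break
        chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def save_server_state(socket_path, state_path):
    """Connect to the UNIX stats socket and write the dump to state_path."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        state = query_stats_socket(sock)
    with open(state_path, "w") as state_file:
        state_file.write(state)
```

With the paths from this thread, the call would be save_server_state("/var/run/haproxy.sock", "/var/run/haproxy/state/test").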

-Sven

________________________________
From: Baptiste <***@gmail.com>
Sent: Tuesday, 3 July 2018 11:38:14
To: Sven Wiltink
CC: ***@formilux.org
Subject: Re: haproxy bug: healthcheck not passing after port change when statefile is enabled

Hi Sven,

Thanks a lot for your feedback!
I'll check how we could handle this use case with the state file.

Just to ensure I'm going to troubleshoot the right issue, could you please summarize how you trigger this issue in a few simple steps?
I.e.:
- conf v1, server port is X
- generate server state (where port is X)
- update conf to v2, where port is Y
- reload HAProxy => X is applied, while you expect to get Y instead

Baptiste

On Mon, Jun 25, 2018 at 12:55 PM, Sven Wiltink <***@transip.nl> wrote:

Hello,

So we've dug a little deeper and the issue seems to be caused by the port value in the statefile. When the target port of a server has changed between reloads, the port specified in the state file takes precedence over the one in the configuration. Running tcpdump shows the healthchecks being performed against the old port. After stopping haproxy and removing the statefile, the healthcheck is performed against the right port. When manually editing the statefile to a random port, the healthchecks are performed against that port instead of the one specified by the config.

The code responsible for this is line http://git.haproxy.org/?p=haproxy-1.8.git;a=blob;f=src/server.c;h=523289e3bda7ca6aa15575f1928f5298760cf582;hb=HEAD#l2931

from commit http://git.haproxy.org/?p=haproxy-1.8.git;a=commitdiff;h=3169471964fdc49963e63f68c1fd88686821a0c4.

A solution would be invalidating the state when the ports don't match.
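
As a workaround outside haproxy, the same idea could be applied to the saved state file before the reload. The Python sketch below drops every server line whose port differs from the configured one. It assumes the dump contains a "#"-prefixed header line naming the columns, including be_name, srv_name and srv_port; the function name and the configured_ports mapping are made up for this example, and the header in the usage lines is shortened for readability.

```python
def drop_stale_ports(state_text, configured_ports):
    """Return state_text without server lines whose srv_port differs
    from the configuration.

    configured_ports maps (backend_name, server_name) -> configured port.
    """
    columns = None
    kept = []
    for line in state_text.splitlines():
        if line.startswith("#"):
            # Header line: remember the column names it declares.
            columns = line.lstrip("# ").split()
            kept.append(line)
            continue
        fields = line.split()
        if columns is None or len(fields) < len(columns):
            kept.append(line)  # version line, blank line or unknown layout
            continue
        row = dict(zip(columns, fields))
        expected = configured_ports.get((row.get("be_name"), row.get("srv_name")))
        saved = row.get("srv_port")
        if expected is not None and saved is not None and str(expected) != saved:
            continue  # stale entry: let haproxy fall back to the config
        kept.append(line)
    return "\n".join(kept) + "\n"


sample = (
    "1\n"
    "# be_name srv_name srv_addr srv_port\n"
    "banaan-443-ipv4 banaan-vps 127.0.0.1 443\n"
    "banaan-80-ipv4 banaan-vps 127.0.0.1 80\n"
)
ports = {("banaan-443-ipv4", "banaan-vps"): 80,
         ("banaan-80-ipv4", "banaan-vps"): 80}
print(drop_stale_ports(sample, ports))
```

Dropping the whole line also discards the saved up/down status for that server, which matches the "invalidate the state" suggestion rather than trying to patch the port in place.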

-Sven


Baptiste

2018-11-04 18:11:53 UTC

Hi Sven,

I reviewed the whole thing and I think support for the port in the state file
was added for SRV records, but also for the runtime API, which allows
changing the port at runtime too.
I'll come back to you shortly with a fix for this behavior; I'm currently
discussing it with Willy/Fred.
(It's more complicated than moving the code
"""
if (port_str)
    srv->svc_port = port;
"""
a couple of lines above.)

Baptiste


Baptiste

2018-11-06 22:53:11 UTC

Hi,

After debriefing internally, the fix will take much longer and may even
require a new server-state file format.
I'll keep you updated.

Baptiste
