Discussion:
A few problems seen in haproxy (threads, connections)
Krishna Kumar (Engineering)
2018-10-02 15:48:19 UTC
Hi Willy, and community developers,

I am not sure if I am doing something wrong, but wanted to report
some issues that I am seeing. Please let me know if this is a problem.

1. HAProxy system:
Kernel: 4.17.13
CPU: 48 core E5-2670 v3
Memory: 128GB
NIC: Mellanox 40g with IRQ pinning

2. Client: 48 cores, similar to the server above. Test command line:
wrk -c 4800 -t 48 -d 30s http://<IP:80>/128

3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git checkout as of Oct 2nd).
# haproxy-git -vv
HA-Proxy version 1.9-dev3 2018/09/29
Copyright 2000-2018 Willy Tarreau <***@haproxy.org>

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
-fwrapv -fno-strict-overflow -Wno-unused-label -Wno-sign-compare
-Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers
-Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
OPTIONS = USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
Running on OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT
IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.38 2015-11-23
Running on PCRE version : 8.38 2015-11-23
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols markes as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE
<default> : mode=TCP|HTTP side=FE|BE

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

4. HAProxy results for #processes and #threads (RPS as reported by wrk):

 #    Threads-RPS    Procs-RPS
 1          20903        19280
 2          46400        51045
 4          96587       142801
 8         172224       254720
12         210451       437488
16         173034       437375
24          79069       519367
32          55607       586367
48          31739       596148

5. Lock stats for 1.9-dev3: some write locks took much longer on average to
acquire, e.g. "POOL" and "TASK_WQ". For 48 threads, I get:
Stats about Lock FD:
# write lock : 143933900
# write unlock: 143933895 (-5)
# wait time for write : 11370.245 msec
# wait time for write/lock: 78.996 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock TASK_RQ:
# write lock : 2062874
# write unlock: 2062875 (1)
# wait time for write : 7820.234 msec
# wait time for write/lock: 3790.941 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock TASK_WQ:
# write lock : 2601227
# write unlock: 2601227 (0)
# wait time for write : 5019.811 msec
# wait time for write/lock: 1929.786 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock POOL:
# write lock : 2823393
# write unlock: 2823393 (0)
# wait time for write : 11984.706 msec
# wait time for write/lock: 4244.788 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock LISTENER:
# write lock : 184
# write unlock: 184 (0)
# wait time for write : 0.011 msec
# wait time for write/lock: 60.554 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock PROXY:
# write lock : 291557
# write unlock: 291557 (0)
# wait time for write : 109.694 msec
# wait time for write/lock: 376.235 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock SERVER:
# write lock : 1188511
# write unlock: 1188511 (0)
# wait time for write : 854.171 msec
# wait time for write/lock: 718.690 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock LBPRM:
# write lock : 1184709
# write unlock: 1184709 (0)
# wait time for write : 778.947 msec
# wait time for write/lock: 657.501 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock BUF_WQ:
# write lock : 669247
# write unlock: 669247 (0)
# wait time for write : 252.265 msec
# wait time for write/lock: 376.939 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock STRMS:
# write lock : 9335
# write unlock: 9335 (0)
# wait time for write : 0.910 msec
# wait time for write/lock: 97.492 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock VARS:
# write lock : 901947
# write unlock: 901947 (0)
# wait time for write : 299.224 msec
# wait time for write/lock: 331.753 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
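
(Quick cross-check of the per-lock averages: each "wait time for write/lock"
is simply the total wait time divided by the number of acquisitions, e.g. for
POOL:

echo "11984.706 * 1000000 / 2823393" | bc -l   # => ~4244.8 nsec per write lock

which matches the 4244.788 nsec reported above.)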

6. CPU utilization after the test (processes and threads): haproxy-1.9-dev3
keeps running at 4800% (48 CPUs) for 30 seconds after the test is done. This
behavior was not seen with 1.8.14. I ran the following command for both:
"ss -tnp | awk '{print $1}' | sort | uniq -c | sort -n"
1.8.14 during test:
451 SYN-SENT
9166 ESTAB
1.8.14 after test:
2 ESTAB

1.9-dev3 during test:
109 SYN-SENT
9400 ESTAB
1.9-dev3 after test:
2185 CLOSE-WAIT
2187 ESTAB
All connections in CLOSE-WAIT were from the client, while all connections in
ESTAB state were to the servers. This lasted for 30 seconds.
On the client system, all sockets were in FIN-WAIT-2 state:
2186 FIN-WAIT-2
The matching counts (2185/2186) suggest that the client closed its connections
but haproxy did not close its side of the sockets for 30 seconds. This also
results in high CPU utilization on haproxy for some reason (100% for each
process for 30 seconds), which is unexpected since the remote side has already
closed the socket.
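
If it helps, the same ss output can also be split by peer address to separate
the client-facing and backend-facing sides. A rough one-liner (the 10.x match
for the backend subnet is only an assumption, adjust to your addressing):

ss -tn | awk 'NR>1 {split($5,a,":"); side=(a[1] ~ /^10\./) ? "backend" : "client"; print $1, side}' | sort | uniq -c

This gives the kind of per-side CLOSE-WAIT vs ESTAB split described above.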

7. Configuration file for process mode:
global
    daemon
    maxconn 26000
    nbproc 48
    stats socket /var/run/ha-1-admin.sock mode 600 level admin process 1
    # (and so on for 48 processes).

defaults
    option http-keep-alive
    balance leastconn
    retries 2
    option redispatch
    maxconn 25000
    option splice-response
    option tcp-smart-accept
    option tcp-smart-connect
    option splice-auto
    timeout connect 5000ms
    timeout client 30000ms
    timeout server 30000ms
    timeout client-fin 30000ms
    timeout http-request 10000ms
    timeout http-keep-alive 75000ms
    timeout queue 10000ms
    timeout tarpit 15000ms

frontend fk-fe-upgrade-80
    mode http
    default_backend fk-be-upgrade
    bind <VIP>:80 process 1
    # (and so on for 48 processes).

backend fk-be-upgrade
    mode http
    default-server maxconn 2000 slowstart
    # 58 server lines follow, e.g.: "server <name> <ip:80>"
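
For reference, each per-process stats socket declared above can be queried
individually, e.g. with socat. The ha-N-admin.sock naming for processes 2-48
is only an assumption based on the comment in the config:

echo "show info" | socat stdio /var/run/ha-1-admin.sock
# sum cumulative requests across all 48 processes (assumed socket naming)
for i in $(seq 1 48); do
    echo "show info" | socat stdio /var/run/ha-$i-admin.sock
done | awk -F: '/^CumReq/ {sum += $2} END {print sum}'

This is one way to cross-check the per-process RPS numbers reported by wrk.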

8. Configuration file for thread mode:
global
    daemon
    maxconn 26000
    stats socket /var/run/ha-1-admin.sock mode 600 level admin
    nbproc 1
    nbthread 48
    # cpu-map auto:1/1-48 0-39

defaults
    option http-keep-alive
    balance leastconn
    retries 2
    option redispatch
    maxconn 25000
    option splice-response
    option tcp-smart-accept
    option tcp-smart-connect
    option splice-auto
    timeout connect 5000ms
    timeout client 30000ms
    timeout server 30000ms
    timeout client-fin 30000ms
    timeout http-request 10000ms
    timeout http-keep-alive 75000ms
    timeout queue 10000ms
    timeout tarpit 15000ms

frontend fk-fe-upgrade-80
    mode http
    bind <VIP>:80 process 1/1-48
    default_backend fk-be-upgrade

backend fk-be-upgrade
    mode http
    default-server maxconn 2000 slowstart
    # 58 server lines follow, e.g.: "server <name> <ip:80>"
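
With the thread-mode config above (nbproc 1, nbthread 48), the post-test CPU
usage can be broken down per thread to see where the time goes; a minimal
sketch, assuming a single worker pid:

# live per-thread CPU view of the haproxy process
top -H -p "$(pidof haproxy)"
# or sampled once per second
pidstat -t -p "$(pidof haproxy)" 1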

I have also captured 'perf' output for the system for threads vs processes,
and can send it later if required.

Thanks,
- Krishna
Krishna Kumar (Engineering)
2018-10-04 03:12:53 UTC
Re-sending in case this mail was missed. To summarise the 3 issues seen:

1. Performance drops 18x at higher thread counts (nbthread) compared to the
same number of processes (nbproc).
2. CPU utilisation remains at 100% for 30 seconds after wrk finishes (seen on
1.9-dev3 with both nbproc and nbthread).
3. Sockets on the client remain in FIN-WAIT-2, while on HAProxy they remain
in either CLOSE-WAIT (towards the clients) or ESTAB (towards the backend
servers), until the server/client timeout expires.

The tests for threads and processes were done on the same systems, so there
is no difference in system parameters.

Thanks,
- Krishna


Илья Шипицин
2018-10-04 04:51:20 UTC
Load testing like this is quite useful.
Can you describe the overall setup? (I want to reproduce it and play with it.)

Krishna Kumar (Engineering)
2018-10-04 04:59:27 UTC
Sure.

1. Client: use one of the following two setups:
- a single baremetal (48 core, 40g) system, running:
"wrk -c 4800 -t 48 -d 30s http://<IP>:80/128", or
- 100 2-core VMs, each running:
"wrk -c 16 -t 2 -d 30s http://<IP>:80/128",
with the results summarized via a parallel-ssh setup (a rough sketch
follows after this list).

2. HAProxy running on a single baremetal host (same system config as the
client: 48 core, 40g, 4.17.13 kernel, IRQs pinned to different cores of the
same NUMA node, irqbalance killed), with the haproxy configuration file as
given in my first mail. Around 60 backend servers are configured in haproxy.

3. Backend servers are 2-core VMs running nginx and serving a file called
"/128", which is 128 bytes in size.
Let me know if you need more information.

Thanks,
- Krishna
Илья Шипицин
2018-10-04 06:38:57 UTC
Please share the haproxy config, the nginx config, and any non-default sysctls.

As a side note, can you have a look at the "dmesg" output? Do you have nf
conntrack enabled? What are its limits?

Krishna Kumar (Engineering)
2018-10-04 06:52:54 UTC
1. haproxy config: same as given above (both the process and thread configs
were given in the mail).
2. nginx: default, no changes.
3. sysctls: nothing set. All other changes are as described earlier (e.g.
irqbalance, IRQ pinning, etc.).
4. nf_conntrack: disabled.
5. dmesg: no messages.
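
For completeness, the conntrack status and limits asked about can be
double-checked with something like (output is environment-specific):

lsmod | grep -i conntrack
sysctl net.netfilter.nf_conntrack_max net.netfilter.nf_conntrack_count 2>/dev/null
dmesg | tail -n 20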

With the same system and settings, threads give 18x lower RPS than processes,
along with the other two issues described in my mail today.
Илья Шипицин
2018-10-04 07:28:05 UTC
What I'm going to try (when I have some spare time) is sampling with google
perftools:

https://github.com/gperftools/gperftools

They are great for CPU profiling.
You can try them yourself if you have the time/wish :)
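
A rough sketch of one way to do that with haproxy (the library and binary
paths are assumptions, and running with -d keeps haproxy in the foreground so
the profile is written on exit):

LD_PRELOAD=/usr/lib/libprofiler.so CPUPROFILE=/tmp/haproxy.prof \
    haproxy -d -f /etc/haproxy/haproxy.cfg
pprof --text /usr/sbin/haproxy /tmp/haproxy.prof | head -30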


чт, 4 Пкт. 2018 г. в 11:53, Krishna Kumar (Engineering) <
Post by Krishna Kumar (Engineering)
1. haproxy config: Same as given above (both processes and threads were
given in the mail)
2. nginx: default, no changes.
3. sysctl's: nothing set. All changes as described earlier (e.g.
irqbalance, irq pinning, etc).
4. nf_conntrack: disabled
5. dmesg: no messages.
With the same system and settings, threads gives 18x lesser RPS than
processes, along with
the other 2 issues given in my mail today.
Post by Илья Шипицин
haproxy config, nginx config
non default sysctl (if any)
as a side note, can you have a look at "dmesg" output ? do you have nf
conntrack enabled ? what are its limits ?
чт, 4 Пкт. 2018 г. в 9:59, Krishna Kumar (Engineering) <
Post by Krishna Kumar (Engineering)
Sure.
- a single baremetal (48 core, 40g) system
Run: "wrk -c 4800 -t 48 -d 30s http://<IP>:80/128", or,
- 100 2 core vm's.
Run "wrk -c 16 -t 2 -d 30s http://<IP>:80/128" from
each VM and summarize the results using some
parallel-ssh setup.
2. HAProxy running on a single baremetal (same system config
as client - 48 core, 40g, 4.17.13 kernel, irq tuned to use different
cores of the same NUMA node for each irq, kill irqbalance, with
haproxy configuration file as given in my first mail. Around 60
backend servers are configured in haproxy.
3. Backend servers are 2 core VM's running nginx and serving
a file called "/128", which is 128 bytes in size.
Let me know if you need more information.
Thanks,
- Krishna
Post by Илья Шипицин
load testing is somewhat good.
can you describe an overall setup ? (I want to reproduce and play with it)
чт, 4 Пкт. 2018 г. в 8:16, Krishna Kumar (Engineering) <
Post by Krishna Kumar (Engineering)
1. Performance drops 18x with higher number of nbthreads as compared
to nbprocs.
2. CPU utilisation remains at 100% after wrk finishes for 30 seconds
(for 1.9-dev3
for nbprocs and nbthreads).
3. Sockets on client remain in FIN-WAIT-2, while on HAProxy it remains
in either
CLOSE-WAIT (towards clients) and ESTAB (towards the backend
servers), till
the server/client timeout expires.
The tests for threads and processes were done on the same systems, so
there is
no difference in system parameters.
Thanks,
- Krishna
Krishna Kumar (Engineering)
2018-10-04 08:14:03 UTC
Permalink
Thanks, will take a look!
Post by Илья Шипицин
what I'm going to try (when I have some spare time) is sampling with
google perftools
https://github.com/gperftools/gperftools
they are great for cpu profiling.
you can try them yourself if you have time/wish :)
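For reference, a minimal sketch of sampling haproxy with the gperftools CPU
profiler (the library path and config path below are assumptions, and the
profile is only flushed when the process exits cleanly; the report tool is
installed as pprof or google-pprof depending on the distribution):
  # run haproxy in the foreground with the profiler preloaded
  LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so \
  CPUPROFILE=/tmp/haproxy.prof haproxy -db -f /etc/haproxy/haproxy.cfg
  # after it exits, print the hottest functions
  google-pprof --text $(command -v haproxy) /tmp/haproxy.prof | head -30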
Thu, 4 Oct 2018 at 11:53, Krishna Kumar (Engineering) <
Post by Krishna Kumar (Engineering)
1. haproxy config: Same as given above (both processes and threads were
given in the mail)
2. nginx: default, no changes.
3. sysctls: nothing set. All changes are as described earlier (e.g.
irqbalance, irq pinning, etc).
4. nf_conntrack: disabled
5. dmesg: no messages.
With the same system and settings, threads give 18x lower RPS than
processes, along with the other 2 issues given in my mail today.
Post by Илья Шипицин
haproxy config, nginx config
non-default sysctls (if any)
as a side note, can you have a look at "dmesg" output ? do you have nf
conntrack enabled ? what are its limits ?
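As an aside on the conntrack question, a quick way to check whether
nf_conntrack is active and how close it is to its limit (a sketch; the two
sysctls only exist while the module is loaded):
  lsmod | grep nf_conntrack
  sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max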
Krishna Kumar (Engineering)
2018-10-05 08:55:13 UTC
Permalink
Sorry for repeating once again, but this is my last unsolicited
mail on this topic. Any directions for what to look out for?
Thanks,
- Krishna
Willy Tarreau
2018-10-05 21:33:34 UTC
Permalink
Hi Krishna,
Post by Krishna Kumar (Engineering)
Sorry for repeating once again, but this is my last unsolicited
mail on this topic. Any directions for what to look out for?
Sorry, but I didn't even have the time to read your mail over the last
two days and still have a huge backlog pending. Maybe next week.
Regards,
Willy
Willy Tarreau
2018-10-10 13:09:28 UTC
Permalink
Hi Krishna,
On Tue, Oct 02, 2018 at 09:18:19PM +0530, Krishna Kumar (Engineering) wrote:
(...)
Post by Krishna Kumar (Engineering)
Kernel: 4.17.13,
CPU: 48 core E5-2670 v3
Memory: 128GB memory
NIC: Mellanox 40g with IRQ pinning
wrk -c 4800 -t 48 -d 30s http://<IP:80>/128
3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git checkout as
of
Oct 2nd).
# haproxy-git -vv
HA-Proxy version 1.9-dev3 2018/09/29
(...)
Post by Krishna Kumar (Engineering)
4. HAProxy results for #processes and #threads
# Threads-RPS Procs-RPS
1 20903 19280
2 46400 51045
4 96587 142801
8 172224 254720
12 210451 437488
16 173034 437375
24 79069 519367
32 55607 586367
48 31739 596148
Our largest thread test was on 12 cores and it happens that in your case
it's also the optimal one.
However I do have some comments about your config before going back to the numbers.
Post by Krishna Kumar (Engineering)
# cpu-map auto:1/1-48 0-39
=> you must absolutely pin your processes, and they must be pinned
to cores *not* shared with the network card. That's critical.
Moreover it's also important that threads are not split across
multiple physical CPUs because the remote L3 cache access time
over QPI/UPI is terrible. When you run on 12 threads with two
12-cores/24-threads CPUs, you could very well have haproxy using
12 threads from 6 cores, and the NIC using 12 threads from the
other 6 cores of the same physical CPU. The second socket is,
as usual, useless for anything requiring low latency. However
it's perfect to run SSL. So you could be interested in testing
if running the NIC on one socket (try to figure what node the
PCIe lanes are physically connected to), and haproxy on the other
one. It *could* be possible that you get more performance from 12
cores of each but I strongly doubt it based on a number of tests.
If you use SSL however it's different as you will benefit from
lots of cores much more than low latency.
Post by Krishna Kumar (Engineering)
bind <VIP>:80 process 1/1-48
=> it's also critical for scalability to have individual bind lines. Here
you have a single socket accessed from all 48 threads, so there's no
efficient thread load balancing. By having this:
bind <VIP>:80 process 1/1
bind <VIP>:80 process 1/2
...
bind <VIP>:80 process 1/47
bind <VIP>:80 process 1/48
You will let the kernel perform the load balancing and distribute a
fair load to all threads. This way none of them will risk picking a
larger share of the incoming connections than optimal. I know it's
annoying to configure at the moment; I've been thinking about having
a way to automatically iterate from a single config line (like the
"auto" feature of cpu-map), but for now it's not done.
Post by Krishna Kumar (Engineering)
5. Lock stats for 1.9-dev3: Some write locks on average took a lot more time
# write lock : 143933900
# write unlock: 143933895 (-5)
# wait time for write : 11370.245 msec
This one definitely is huge. We know some work is still needed on this lock
and that there are still a few low hanging fruits but not much savings to
expect short term. This output is very revealing however of the importance
of this lock.
Post by Krishna Kumar (Engineering)
# wait time for write/lock: 78.996 nsec
That's roughly the time it takes to access the other CPU's cache, so using
your two sockets for the same process definitely hurts a lot here.
Post by Krishna Kumar (Engineering)
# write lock : 2062874
# write unlock: 2062875 (1)
# wait time for write : 7820.234 msec
This one is still far too large for what we'd hope, even though it
has significantly shrunk since 1.8. It could be related to the poor
distribution of the incoming connections across threads.
Post by Krishna Kumar (Engineering)
# wait time for write/lock: 3790.941 nsec
Wow, 3.8 microseconds to acquire the write lock is a lot! I'm starting
to suspect some longer tree walks than expected. Again, lack of fairness
between threads can make this significantly worse than it should.
Post by Krishna Kumar (Engineering)
# write lock : 2601227
# write unlock: 2601227 (0)
# wait time for write : 5019.811 msec
# wait time for write/lock: 1929.786 nsec
For this one I had an idea that could significantly improve the situation.
It is unlikely to make it into 1.9 though.
Post by Krishna Kumar (Engineering)
# write lock : 2823393
# write unlock: 2823393 (0)
# wait time for write : 11984.706 msec
Interesting. Very interesting... I thought we merged the lock-free pool
code. Maybe I was mistaken. Or I misunderstood something about the nature
of the changes that went in early during the 1.9-dev cycle. This one is
not expected to be hard to address, and I even see how we could go further
with a small thread-local cache since I developed such a thing a few years
ago (though it was over-engineered) that could easily be retrofitted here.
Post by Krishna Kumar (Engineering)
# wait time for write/lock: 4244.788 nsec
Looks like a malloc() time. Maybe the measure was huge during the ramp-up
and amortized later.
Post by Krishna Kumar (Engineering)
# write lock : 291557
# write unlock: 291557 (0)
# wait time for write : 109.694 msec
# wait time for write/lock: 376.235 nsec
Good to know, could still be improved.
Post by Krishna Kumar (Engineering)
# write lock : 1188511
# write unlock: 1188511 (0)
# wait time for write : 854.171 msec
# wait time for write/lock: 718.690 nsec
Same here!
Post by Krishna Kumar (Engineering)
# write lock : 1184709
# write unlock: 1184709 (0)
# wait time for write : 778.947 msec
# wait time for write/lock: 657.501 nsec
Here it will be less easy because it depends on the LB
algorithm.
Post by Krishna Kumar (Engineering)
# write lock : 669247
# write unlock: 669247 (0)
# wait time for write : 252.265 msec
# wait time for write/lock: 376.939 nsec
Post-1.9 I think we could make this one thread-local for the vast majority
of buffer allocations. Short term, we would possibly try to improve the
situation to avoid taking the lock for all successes. It's tricky but I
think possible.
Post by Krishna Kumar (Engineering)
# write lock : 901947
# write unlock: 901947 (0)
# wait time for write : 299.224 msec
# wait time for write/lock: 331.753 nsec
This one is used by variables.
Post by Krishna Kumar (Engineering)
6. CPU utilization after test for processes/threads: haproxy-1.9-dev3 runs
at 4800% (48 cpus) for 30 seconds after the test is done. For 1.8.14,
this behavior was not seen.
We've had the exact opposite situation yesterday on another case. It really
depends on the scheduling, but it's possible you've hit a bug in 1.9-dev3
though.

(...)
Post by Krishna Kumar (Engineering)
2185 CLOSE-WAIT
2187 ESTAB
All connections that were in CLOSE-WAIT were from the client,
I suspect we still have a few cases where some client connections may
only be checked upon client timeout.
Many thanks, your test is very useful at this stage of the development!
There definitely are some performance points we won't address for 1.9,
but it confirms some observations and/or completes them.
Best regards,
Willy
Krishna Kumar (Engineering)
2018-10-11 02:48:21 UTC
Permalink
Hi Willy,
Thank you very much for the in-depth analysis and configuration setting
suggestions.
I believe I have got the 3 key items to continue based on your mail:
1. Thread pinning
2. Fix system irq pinning accordingly
3. Listen on all threads.
I will post the configuration changes and the results.
Regards,
- Krishna
Willy Tarreau
2018-10-11 03:07:40 UTC
Permalink
Post by Krishna Kumar (Engineering)
Hi Willy,
Thank you very much for the in-depth analysis and configuration setting
suggestions.
1. Thread pinning
2. Fix system irq pinning accordingly
3. Listen on all threads.
I will post the configuration changes and the results.
By the way, please pull the latest master fixes. I've addressed two issues
there with locking :
- one where the scheduler work was slightly too high, increasing the time
spent on RQ lock
- another one where I messed up on a fix, causing lock-free pools to be
disabled (as seen in your output, where the POOL lock appears a lot)
On some tests I've run here, I've found the stick-tables lock to be a
bottleneck when tracking is enabled. I don't have a short-term solution
to this, but looking at the code it's obvious that it can be significantly
improved (though it will take quite some time). I'll probably at least
try to replace it with an RW lock as I think it could improve the situation.
The FD lock is another one requiring some rework. I'm certain it's possible;
I just don't know whether it will degrade low-thread-count performance by
using too many atomic ops instead. We'll have to experiment.
Cheers,
Willy
Krishna Kumar (Engineering)
2018-10-11 03:23:02 UTC
Permalink
Thanks, will do that.
Krishna Kumar (Engineering)
2018-10-11 06:34:31 UTC
Permalink
I must say the improvements are pretty impressive!
Earlier number reported with 24 processes: 519K
Earlier number reported with 24 threads: 79K
New RPS with system irq tuning, today's git,
configuration changes, 24 threads: 353K
Old code with same tuning gave: 290K
My test machine is a 2-NUMA-node server, with CPUs 0, 2, ..., 22 on node 0 and
1, 3, ..., 23 on node 1. That adds up to 24 cpus; the remaining 24 logical cpus
are hyperthreads (node 0 has 24, 26, ..., 46 and node 1 has 25, 27, ..., 47).
This may explain why it scales well up to 24. The 16 irqs of the NIC are pinned
to cpus 0, 2, ..., 30.
Hoping performance further improves.
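For what it's worth, a quick way to double-check that topology and the NIC IRQ
placement from the shell (the 'mlx' match assumes the Mellanox driver names its
interrupt vectors that way):
  # cpus belonging to each NUMA node
  cat /sys/devices/system/node/node*/cpulist
  # allowed cpus for each NIC interrupt
  for irq in $(awk '/mlx/ {sub(":","",$1); print $1}' /proc/interrupts); do
      echo -n "irq $irq -> "; cat /proc/irq/$irq/smp_affinity_list
  done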
12 threads: 280K
16 threads: 318K
24 threads: 353K (occasional drop till 330K)
32 threads: 238K
I am attaching 2 text files of the lock metrics for 24 and 32 threads. A
vimdiff shows the differences nicely (fd, task_rq, task_wq, proxy, server,
lbprm and buf_wq increased significantly).
Thanks!
Willy Tarreau
2018-10-15 05:18:07 UTC
Permalink
Hi Krishna,
Post by Krishna Kumar (Engineering)
I must say the improvements are pretty impressive!
Earlier number reported with 24 processes: 519K
Earlier number reported with 24 threads: 79K
New RPS with system irq tuning, today's git,
configuration changes, 24 threads: 353K
Old code with same tuning gave: 290K
OK, that's much better, but I'm still horrified by the time taken in
the load balancing algorithm. I thought it could be fwlc_reposition(),
which contains an eb32_delete()+eb32_insert(), so I decided to replace
this with a new eb32_move() which moves the node within the tree, and
it didn't change anything here. Also, I cannot manage to reach anywhere
near that much time spent in this lock (300ms here, 58s for you). There
is one possible difference that might explain it: do you have a maxconn
setting on your servers? If so, is it possible that it's reached? You
can take a look at your stats page and see if the "Queue/Max" entry for
any backend is non-null.
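In case it helps, the same information is available without the HTML page via
the stats socket from the earlier configs; a sketch assuming socat is installed
(qcur/qmax are the queue columns of "show stat"):
  echo "show stat" | socat stdio /var/run/ha-1-admin.sock \
    | awk -F, 'NR==1 {for (i=1;i<=NF;i++) {if ($i=="qcur") qc=i; if ($i=="qmax") qm=i}}
               NR>1 && $qm+0 > 0 {print $1 "/" $2, "qcur=" $qc, "qmax=" $qm}'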
Indeed, I'm seeing that once a server is saturated, we skip it for the
next one. This part can be expensive. Ideally we should remove such servers
from the tree until they're unblocked, but there is one special case making
this difficult, which is the dynamic limitation (minconn+maxconn+fullconn).
However I think we could improve this so that only this use case would be
affected and not the other ones.
I'm also seeing that this lock could be replaced by an RW lock. But before
taking a deeper look, I'm interested in verifying that it's indeed the
situation you're facing.
Thanks,
Willy
Willy Tarreau
2018-10-15 17:44:49 UTC
Permalink
Hi again,
finally I got rid of the FD lock for single-threaded accesses (most of
them), and based on Olivier's suggestion, I implemented a per-thread
wait queue, and cache-aligned some list heads to avoid undesired cache
line sharing. For me all of this combined resulted in a performance
increase of 25% on a 12-thread workload. I'm interested in your test
results; all of this is in the latest master.
If you still see LBPRM a lot, I can send you the experimental patch
to move the element inside the tree without unlinking/relinking it
and we can see if that provides any benefit or not (I'm not convinced).
Cheers,
Willy
Krishna Kumar (Engineering)
2018-10-16 06:42:14 UTC
Permalink
Hi Willy,
My systems were out of rotation for some other tests, so I did not get
to this till now. I have pulled the latest bits just now and tested. Regarding
maxconn, I simply set maxconn in global/defaults to 1 million and have
this line in the backend section:
default-server maxconn 1000000
I have not seen the Queue/Max you mentioned earlier.
The FD time has gone down to zero, but the LB time has increased
about 50% from last time (7700 ns to 11600 ns, I am using 'balance
leastconn').
Test was run for 1 minute:
$ wrk -c 4800 -t 48 -d 60s http://www.flipkart.com/128
The results were for 32 threads, which is the same configuration I tested
with earlier. Both of these tests were done with threads pinned to NUMA-1
cores (cores 1, 3, 5, ..., 47), and irqs to NUMA-0 (0, 2, 4, ..., 46). However,
the cpu list recycles from 1-47 back to 1-15 for the thread pinning, so that
may explain the much higher lock numbers that I am seeing. When I changed
this to use all cpus (0-31), the LBPRM lock took 74339.117 ns per operation,
but performance dropped from 210K to 80K.
Overall, I am not at ease with threading, or will have to settle for 12
threads on the 12 non-hyperthreaded cores of a single socket.
I am inlining the lock output for the case where all threads are pinned to
NUMA-1 cores (and hence two threads share a core for some cores) at the
end of this mail.
Thanks,
- Krishna
Stats about Lock FD:
# write lock : 2
# write unlock: 2 (0)
# wait time for write : 0.001 msec
# wait time for write/lock: 302.000 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock TASK_RQ:
# write lock : 373317
# write unlock: 373317 (0)
# wait time for write : 341.875 msec
# wait time for write/lock: 915.775 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock TASK_WQ:
# write lock : 373432
# write unlock: 373432 (0)
# wait time for write : 491.524 msec
# wait time for write/lock: 1316.235 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock LISTENER:
# write lock : 1248
# write unlock: 1248 (0)
# wait time for write : 0.295 msec
# wait time for write/lock: 236.341 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock PROXY:
# write lock : 12524202
# write unlock: 12524202 (0)
# wait time for write : 20979.972 msec
# wait time for write/lock: 1675.154 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock SERVER:
# write lock : 50100330
# write unlock: 50100330 (0)
# wait time for write : 76908.311 msec
# wait time for write/lock: 1535.086 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock LBPRM:
# write lock : 50096808
# write unlock: 50096808 (0)
# wait time for write : 584505.012 msec
# wait time for write/lock: 11667.510 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock BUF_WQ:
# write lock : 35653802
# write unlock: 35653802 (0)
# wait time for write : 80406.420 msec
# wait time for write/lock: 2255.199 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock STRMS:
# write lock : 9602
# write unlock: 9602 (0)
# wait time for write : 5.613 msec
# wait time for write/lock: 584.594 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock VARS:
# write lock : 37596611
# write unlock: 37596611 (0)
# wait time for write : 2285.148 msec
# wait time for write/lock: 60.781 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec