Discussion:
haproxy 1.5-dev24: 100% CPU Load or Core Dumped
John-Paul Bader
2014-05-06 07:01:21 UTC
Hey,


I'm currently attempting to replace our commercial load balancer, which
does SSL termination, with haproxy. I'm running it on FreeBSD 9.2 Stable.

We have thousands of requests per second and for a while everything runs
extremely smooth. No queues are running full, machine load is at 0.5,
haproxy is at 2-3% CPU per process.

Then after an undetermined amount of time one process after another
(we're running with nbproc) is locked up with 100% CPU and does not
recover. Eventually all listen queues are filled to the max and I have
to kill and restart haproxy.

All I know so far is that it happens more frequently at peak traffic
times. I have also tried to use option httpclose, because I thought the
problem was related to keep-alive issues. When I tried this, instead of
locking up at 100%, haproxy suddenly exited with a core dump.

Below you will find our current haproxy configuration as well as some
sysctls we have tuned. haproxy 1.5-dev24 was installed from the FreeBSD
ports and is using kqueue.

Does anybody have a suggestion to track the issue down? The system
itself does not seem to have any problem until haproxy locks up. Like I
said before, after that, the listen queues are flooded because haproxy
does not accept any more requests.

Any feedback appreciated.

~ John

Machine:
2x Hexacore Xeon
32GB RAM
2x Dual Port Intel i350 Cards
FreeBSD 9.2 Stable
OpenSSL 1.0.1

haproxy:
global
log /var/run/log local0 notice
maxconn 20000
daemon
nbproc 12


defaults
mode http
retries 3
option redispatch
maxconn 20000
contimeout 5000
clitimeout 50000
srvtimeout 50000
log global
option dontlog-normal
option tcplog
option abortonclose
option forwardfor


frontend tracking_http
mode http
bind 0.0.0.0:80
option http-server-close
timeout http-keep-alive 5000
default_backend web-farm


frontend tracking_https
mode http
bind 0.0.0.0:443 ssl crt /usr/local/etc/ssl/combined.pem ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-RSA-RC4-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES128-SHA:AES256-SHA256:AES256-SHA:RC4-SHA:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!SRP:!DSS:!EDH
reqadd X-Forwarded-Proto:\ https
option http-server-close
timeout http-keep-alive 5000
default_backend web-farm


backend web-farm
mode http
balance roundrobin
server app_1 10.5.1.161:80 cookie app1inst1 check inter 2000 rise 2 fall 5
server app_2 10.5.1.169:80 cookie app1inst2 check inter 2000 rise 2 fall 5
server app_3 10.5.1.177:80 cookie app1inst1 check inter 2000 rise 2 fall 5
server app_4 10.5.1.185:80 cookie app1inst2 check inter 2000 rise 2 fall 5


listen stats :1936
mode http
stats enable
stats hide-version
stats realm Haproxy\ Statistics
stats uri /
stats auth xxx:xxx


FreeBSD Sysctls:
net.inet.tcp.delayed_ack=0
net.inet.tcp.fast_finwait2_recycle=1
net.inet.tcp.msl=5000
kern.ipc.nmbclusters=2039162
kern.ipc.nmbjumbop=1019581
net.inet.icmp.icmplim=1000
kern.ipc.somaxconn=4096
net.inet.tcp.experimental.initcwnd10=1
kern.ipc.soacceptqueue=1024
kern.maxfiles=100000
kern.maxfilesperproc=90000
hw.igb.rx_process_limit=4096
--
John-Paul Bader | Software Development

www.wooga.com
wooga GmbH | Saarbruecker Str. 38 | D-10405 Berlin
Sitz der Gesellschaft: Berlin; HRB 117846 B
Registergericht Berlin-Charlottenburg
Geschaeftsfuehrung: Jens Begemann, Philipp Moeser
Lukas Tribus
2014-05-06 10:02:59 UTC
Hi,
Post by John-Paul Bader
I'm currently attempting to replace our commercial Loadbalancer with SSL
termination with haproxy. I'm running it on FreeBSD 9.2 Stable.
We have thousands of requests per second and for a while everything runs
extremely smooth. No queues are running full, machine load is at 0.5,
haproxy is at 2-3% CPU per process.
Then after an undetermined amount of time one process after another
(we're running with nbproc) is locked up with 100% CPU and does not
recover. Eventually all listen queues are filled to the max and I have
to kill and restart haproxy.
Very bad. You don't have strace on FreeBSD x64, correct? Can you downgrade
to dev22 in the absence of better advice?
Post by John-Paul Bader
All I know so far is that it happens more frequently in peak traffic
times. Also I have tried to use the option httpclose because I thought
it was related to keep-alive issues. When I have tried this, instead of
locking up at 100%, haproxy suddenly exits with a core dump.
Can you provide backtraces or executable + coredump? I suggest you do
that privately to Willy, as it will contain your private data (like
certificates, etc).




Regards,

Lukas
Willy Tarreau
2014-05-06 21:35:02 UTC
Hi,
Post by Lukas Tribus
Hi,
Post by John-Paul Bader
I'm currently attempting to replace our commercial Loadbalancer with SSL
termination with haproxy. I'm running it on FreeBSD 9.2 Stable.
We have thousands of requests per second and for a while everything runs
extremely smooth. No queues are running full, machine load is at 0.5,
haproxy is at 2-3% CPU per process.
Then after an undetermined amount of time one process after another
(we're running with nbproc) is locked up with 100% CPU and does not
recover. Eventually all listen queues are filled to the max and I have
to kill and restart haproxy.
Very bad. You don't have strace on freebsd x64, correct? Can you downgrade
to dev22 in absence of a better advise?
I keep some memories of another similar bad result on FreeBSD a while ago
that we didn't manage to troubleshoot, in part due to the extremely low
number of users, and in part due to the absence of strace which left us
even more blind.
Post by Lukas Tribus
Post by John-Paul Bader
All I know so far is that it happens more frequently in peak traffic
times. Also I have tried to use the option httpclose because I thought
it was related to keep-alive issues. When I have tried this, instead of
locking up at 100%, haproxy suddenly exits with a core dump.
Can you provide backtraces or executable + coredump? I suggest you do
that privately to Willy, as it will contain your private data (like
certificates, etc).
Could you also try to disable kqueue (start with -dk or use "nokqueue" in
the global section). That's one difference between FreeBSD and other OSes,
and at least it will tell us if the bug is outside of it or not. Please
bear in mind that you'll be running with poll() which scales much less
well with large connection counts.
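For readers following along, Willy's two ways of disabling the poller look like this (a minimal sketch of the relevant global section, all other settings elided):

```haproxy
# either start haproxy with the -dk flag:
#   haproxy -dk -f /usr/local/etc/haproxy.conf
# or disable kqueue in the configuration:
global
    nokqueue
    # ... rest of the global section unchanged
```

With kqueue disabled, haproxy falls back to the next preferred poller (poll, per the `-vv` output later in this thread).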

Also, do you observe the problem only in multi-process mode or also with
a single process ?

Regards,
Willy
John-Paul Bader
2014-05-06 21:57:08 UTC
Hey,

I will do more elaborate test runs in the next couple of days. I will
create traces with ktrace which is not as nice as strace but at least
will provide more context. Is there anything in particular you'd be
interested in like only syscalls?

Meanwhile I have built haproxy with debug symbols, but in the tests I ran
today haproxy did not coredump; it only failed the 100% CPU way, where I
had to kill it manually. This happened both with httpclose and with
keep-alive, so I'd say the problem is not really related to that.

It's such a shame, because before the CPU load suddenly rises and
requests/connections stop being handled, haproxy performs so well and
effortlessly.

Also, if I can help by providing access to a FreeBSD machine, just let
me know. I have plenty :)

If you have any other idea apart from ktrace, coredumps to make
troubleshooting more effective I'd be more than happy to help.

Kind regards, John
Willy Tarreau
2014-05-07 05:49:32 UTC
Hi John-Paul,
Post by John-Paul Bader
Hey,
I will do more elaborate test runs in the next couple of days.
No problem.
Post by John-Paul Bader
I will
create traces with ktrace which is not as nice as strace but at least
will provide more context. Is there anything in particular you'd be
interested in like only syscalls?
I tend to think that syscalls should tell us what's happening. Indeed,
FreeBSD and Linux are both modern operating systems and quite close,
so in general, what works on one of them works on the other one without
any difficulty. The only differences here might be :
- kqueue vs epoll
- specific return value of a syscall that we don't handle properly
(eg: we had a few ENOTCONN vs EAGAIN issues in the past)
Post by John-Paul Bader
Meanwhile I have build haproxy with debug symbols but in the tests I ran
today, haproxy did not coredump but only went for the 100% CPU way of
failing where I had to kill it manually. This happened with httpclose
and with keep-alive so I'd say the problem is not really related to that.
I'm not surprised. If the OS makes a difference, it's in the lower layers,
so all that close vs keep-alive can do is make the problem happen more
often.

What I'm thinking about is that it's possible that we don't always properly
consider an error on a file descriptor, then we don't remove it properly
from the list of polled FDs, and that it might be returned by the poller
as active when we think it's closed. At this point, everything can happen :
- loop forever because we get an error when trying to access this fd
and we can't remove it from the polled fd list ;
- crash when we try to dereference the connection which is attached
to this fd.
Post by John-Paul Bader
Its so sad because before the CPU load suddenly risees, and
requests/connections aren't handled anymore haproxy performs so well and
effortless.
Also, if I can help by providing access to a FreeBSD machine, just let
me know. I have plenty :)
At some point it could be useful, especially if we manage to reproduce
the problem on a test platform.
Post by John-Paul Bader
If you have any other idea apart from ktrace, coredumps to make
troubleshooting more effective I'd be more than happy to help.
There's something you can try to see if it's related to what I suspect
above. If you apply this patch and it crashes earlier, it definitely
means that we're having a problem with an fd which is reported after
being closed :

diff --git a/src/connection.c b/src/connection.c
index 1483f18..27bb6c5 100644
--- a/src/connection.c
+++ b/src/connection.c
@@ -44,7 +44,7 @@ int conn_fd_handler(int fd)
unsigned int flags;

if (unlikely(!conn))
- return 0;
+ abort();

conn_refresh_polling_flags(conn);
flags = conn->flags & ~CO_FL_ERROR; /* ensure to call the wake handler upon error */

If this happens, retry without kqueue, it will use poll and the issue
should not appear, or we have a bigger bug.

Regards,
Willy
John-Paul Bader
2014-05-07 07:22:32 UTC
Hey Willy,


this morning I was running another test without kqueue but sadly with
the same result. Here is my test protocol:

It ran fine with nokqueue for about an hour at about 20% CPU per
process, then the CPU suddenly spiked up to 90% on all processes. I
started ktrace, but meanwhile the CPU went back to around 33% on each
process [2]. Then, after 10 more minutes, 3 of the 8 haproxy processes
died with a segfault.

kernel: pid 3963 (haproxy), uid 0: exited on signal 11 (core dumped)

Unfortunately the coredump [1] is not that expressive, even though it was
built with debug symbols.

The remaining 5 processes survived another 10 minutes before they ramped
up CPU again - this time up to 100%. I created a ktrace in this state,
before they reached 100% [3].

When they were at 100% and not accepting any requests anymore, I took
another ktrace sample, but _nothing_ was written to the output anymore!
Does that mean that in this state no syscalls were happening anymore? I also
took a full ktrace sample with IO and everything - it was empty as well.

So it seems unrelated to kqueue as well. Later I will try to run the
test with a fraction of the traffic without nbproc (all the traffic is
too much for one process)

You can find the coredump and the traces here:

[1] http://smyck.org/haproxy/haproxy_coredump_freebsd_poll.txt
[2] http://smyck.org/haproxy/haproxy_ktrace_poll_01_30_percent.txt
[3] http://smyck.org/haproxy/haproxy_krace_poll_remaining_processes_ramping_up_cpu.txt
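For anyone trying to reproduce this, the FreeBSD commands used to capture such a trace look roughly like the following (a sketch; `<PID>` stands for the haproxy process in question, and paths are illustrative):

```shell
# attach to a running process, tracing system calls only
ktrace -p <PID> -t c
# ... let it run while the problem occurs, then detach all tracing
ktrace -C
# decode the binary ktrace.out into readable text
kdump -f ktrace.out > haproxy_ktrace.txt
```

Unlike strace, ktrace writes a binary log that must be decoded with kdump afterwards, which is why a spinning process that makes no syscalls produces an empty dump.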

I hope that already helps to narrow it down a bit.

Kind regards,

John
Willy Tarreau
2014-05-07 07:57:05 UTC
Hi John-Paul,
Post by John-Paul Bader
Hey Willy,
this morning I was running another test without kqueue but sadly with
the same result.
OK so let's rule out any possible kqueue issue there for now.
Post by John-Paul Bader
Running fine with nokqueue for about an hour at about 20% CPU per
process, then sudden CPU spike on all processes up to 90%, I started
ktrace but meanwhile the CPU went back to around 33% on each process
[2]. Then after 10 more minutes 3 of the 8 haproxy processes died with a
segfault.
kernel: pid 3963 (haproxy), uid 0: exited on signal 11 (core dumped)
Unfortunately the coredump [1] is not that expressive even with compiled
debug symbols.
It's very interesting, it contains a call to ssl_update_cache(). I didn't
know you were using SSL, but in multi-process mode we have the shared context
model to share the SSL sessions between processes. On Linux, we almost only
use futexes. On other systems, we use mutexes. So that's a difference. It
might be possible that we have a bug in the mutex implementation causing
various effects.

You could try to rebuild with the private cache mode, but it will be a bit
more complicated, because if you have a high load, I guess you want to keep
your users' sessions. So you'll probably need to have one shared front process
running in TCP mode and distributing the load to the SSL processes according
to the SSL ID, in order to maintain stickiness between users and processes.

The fact that you have no symbols in your gdb output indicates that the
crash very likely happens inside libssl, maybe it retrieves some crap from
the session cache that it cannot reliably deal with.
Post by John-Paul Bader
The remaining 5 processes survived another 10 minutes before they ramped
up cpu again - this time up to 100%. I have created a ktrace in this
state before reaching the 100%. [3]
When they were at 100% and not accepting any requests anymore I took
another ktrace sample but _nothing_ was written to the output anymore!
That could indicate an attempt to acquire a lock in a loop, or simply
that the code is looping in userspace due to a side effect of some memory
corruption following the bug, for example.
Post by John-Paul Bader
That means in this state no syscalls where happening anymore? I also
took a full ktrace sample with IO and everything - it was empty as well.
Oh and BTW, I can confirm that ktrace is really poor compared to strace :-)
Post by John-Paul Bader
So it seems unrelated to kqueue as well. Later I will try to run the
test with a fraction of the traffic without nbproc (all the traffic is
too much for one process)
That would be great! You can try to build with "USE_PRIVATE_CACHE=1" in
order to disable session sharing.

If you have plenty of clients, you can first try to spread the load between
processes using a simple source hash :

listen front
bind-process 1
bind pub_ip:443
balance source
server proc2 127.0.0.2:443 send-proxy
server proc3 127.0.0.3:443 send-proxy
server proc4 127.0.0.4:443 send-proxy
server proc5 127.0.0.5:443 send-proxy
server proc6 127.0.0.6:443 send-proxy
server proc7 127.0.0.7:443 send-proxy
server proc8 127.0.0.8:443 send-proxy

frontend proc2
bind-process 2
bind 127.0.0.2:443 ssl crt ... accept-proxy
... usual stuff

frontend proc3
bind-process 3
bind 127.0.0.3:443 ssl crt ... accept-proxy
... usual stuff

etc.. till process 8.

It's much easier than dealing with SSL ID and might be done with
less adjustments to your existing configuration. And that way you
don't need to share any SSL context between your processes.

Please tell me if you need some help to try to set up something like this.

Willy
John-Paul Bader
2014-05-07 08:28:18 UTC
Post by Willy Tarreau
It's very interesting, it contains a call to ssl_update_cache(). I didn't
know you were using SSL, but in multi-process mode we have the shared context
model to share the SSL sessions between processes.
Yes, sorry. In the initial email on this thread I posted our
configuration which included the SSL setup.

We're using OpenSSL 1.0.1g 7 Apr 2014 to benefit from the AES-NI
acceleration.
Post by Willy Tarreau
Oh and BTW, I can confirm that ktrace is really poor compared to strace :-)
haproxy does not include DTrace probes by any chance right? :)
Post by Willy Tarreau
Post by John-Paul Bader
So it seems unrelated to kqueue as well. Later I will try to run the
test with a fraction of the traffic without nbproc (all the traffic is
too much for one process)
That would be great! You can try to build with "USE_PRIVATE_CACHE=1" in
order to disable session sharing.
Right now I'm running a test just with disabled nbproc. Next I will try
to recompile with "USE_PRIVATE_CACHE=1"

Do I have to pass that option like this:

make CFLAGS="-g -O0" USE_PRIVATE_CACHE=1 ?

These are our current build options - for completeness:

haproxy -vv
HA-Proxy version 1.5-dev24-8860dcd 2014/04/26
Copyright 2000-2014 Willy Tarreau <***@1wt.eu>

Build options :
TARGET = freebsd
CPU = generic
CC = cc
CFLAGS = -g -O0 -DFREEBSD_PORTS
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.1g 7 Apr 2014
Running on OpenSSL version : OpenSSL 1.0.1g 7 Apr 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.34 2013-12-15
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY

Available polling systems :
kqueue : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use kqueue.

Kind regards,

John
Willy Tarreau
2014-05-07 09:15:21 UTC
Post by John-Paul Bader
Post by Willy Tarreau
It's very interesting, it contains a call to ssl_update_cache(). I didn't
know you were using SSL, but in multi-process mode we have the shared context
model to share the SSL sessions between processes.
Yes, sorry. In the initial email on this thread I posted our
configuration which included the SSL setup.
Yes I remember having seen your config, but unfortunately I'm having
a hard time remembering all the configs I see during a single day, I'm
sorry.
Post by John-Paul Bader
We're using OpenSSL 1.0.1g 7 Apr 2014 to benefit from the AES-NI
acceleration.
OK.
Post by John-Paul Bader
Post by Willy Tarreau
Oh and BTW, I can confirm that ktrace is really poor compared to strace :-)
haproxy does not include DTrace probes by any chance right? :)
No, and I have no idea how this works either. But if you feel like it
can provide some value and be done without too much effort, feel free
to try :-)
Post by John-Paul Bader
Post by Willy Tarreau
Post by John-Paul Bader
So it seems unrelated to kqueue as well. Later I will try to run the
test with a fraction of the traffic without nbproc (all the traffic is
too much for one process)
That would be great! You can try to build with "USE_PRIVATE_CACHE=1" in
order to disable session sharing.
Right now I'm running a test just with disabled nbproc. Next I will try
to recompile with "USE_PRIVATE_CACHE=1"
Great.
Post by John-Paul Bader
make CFLAGS="-g -O0" USE_PRIVATE_CACHE=1 ?
Yes that's the principle. You can look at the makefile, all build options
are referenced at the top.
Post by John-Paul Bader
haproxy -vv
HA-Proxy version 1.5-dev24-8860dcd 2014/04/26
BTW, be careful, a few bugs introduced in dev23 on ACLs were fixed after dev24.
So with this version, "acl foo xxx -i yyy" will not work for example. Balance
url_param is broken as well. All of them are fixed in latest snapshot though.
Post by John-Paul Bader
TARGET = freebsd
CPU = generic
CC = cc
CFLAGS = -g -O0 -DFREEBSD_PORTS
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1
maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200
Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.1g 7 Apr 2014
Running on OpenSSL version : OpenSSL 1.0.1g 7 Apr 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.34 2013-12-15
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
kqueue : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use kqueue.
OK, nothing unusual here. Thanks for the detailed output, it always
helps!

Best regards,
Willy
Emeric Brun
2014-05-07 14:24:49 UTC
Hi All,

I suspect FreeBSD does not support process-shared mutexes (they are
supported on both Linux and Solaris).

I've just made a patch to add error checking on mutex init, and to fall
back on the private SSL session cache in the error case.

Could you try this patch and report back whether this warning appears:

"Unable to init lock for the shared SSL session cache. fallback to
private cache."

Regards,
Emeric
John-Paul Bader
2014-05-07 15:39:45 UTC
Hey Emeric,


I have just consulted the README of the haproxy source, and it says in
the OpenSSL section:

»The BSD and OSX makefiles do not support build options for OpenSSL nor
zlib. Also, at least on OpenBSD, pthread_mutexattr_setpshared() does not
exist so the SSL session cache cannot be shared between multiple
processes. If you want to enable these options, you need to use GNU make
with the default makefile as follows :«

I have just checked if pthread_mutexattr_setpshared is available in
FreeBSD and it does not seem to be the case. So maybe we're on the right
track here.

I will try to apply your patch and confirm this. Would that mean that we
have to use the solution proposed by Willy, i.e. using a source hash to
balance across multiple SSL-enabled frontends?

Kind regards,

John
John-Paul Bader
2014-05-07 16:14:13 UTC
Ok,


I have just built haproxy with your patches like this:

gmake TARGET=freebsd USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1

When trying to start haproxy it failed with:

[ALERT] 126/160108 (25333) : Unable to allocate SSL session cache.
[ALERT] 126/160108 (25333) : Fatal errors found in configuration.
/usr/local/etc/rc.d/haproxy: WARNING: failed precmd routine for haproxy

Without the patches, haproxy is starting.

The Buildinfo looks like this now:

./haproxy -vv
HA-Proxy version 1.5-dev24-07c8b24 2014/05/02
Copyright 2000-2014 Willy Tarreau <***@1wt.eu>

Build options :
TARGET = freebsd
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing
OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.1g 7 Apr 2014
Running on OpenSSL version : OpenSSL 1.0.1g 7 Apr 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.34 2013-12-15
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY

Available polling systems :
kqueue : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use kqueue.


Kind regards,

John
Willy Tarreau
2014-05-07 20:06:17 UTC
Permalink
Hi John,
Post by John-Paul Bader
Ok,
gmake TARGET=freebsd USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1
[ALERT] 126/160108 (25333) : Unable to allocate SSL session cache.
[ALERT] 126/160108 (25333) : Fatal errors found in configuration.
/usr/local/etc/rc.d/haproxy: WARNING: failed precmd routine for haproxy
Without the patches, haproxy is starting.
(...)

That's *very* good news!

Emeric is currently working on an alternate locking mechanism which
would work only using spinlocks and no pthreads. The principle is that
accesses to the cache are so rare (once or twice per SSL connection)
that there's almost never any collision (confirmed by the fact that it
requires seconds to minutes for a crash to happen) and it's worthless
to rely on heavy non-portable mechanisms when a simple hand-crafted
spinlock will do the job fine with no overhead.

So... stay tuned, I think Emeric will soon have something to propose you
to test :-)

Cheers,
Willy
John-Paul Bader
2014-05-07 20:09:28 UTC
Permalink
Woohoo - this sounds very good :)

Thanks in advance for your efforts - much appreciated!

Kind regards,

John
Post by Willy Tarreau
Hi John,
Post by John-Paul Bader
Ok,
gmake TARGET=freebsd USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1
[ALERT] 126/160108 (25333) : Unable to allocate SSL session cache.
[ALERT] 126/160108 (25333) : Fatal errors found in configuration.
/usr/local/etc/rc.d/haproxy: WARNING: failed precmd routine for haproxy
Without the patches, haproxy is starting.
(...)
That's a *very* good news!
Emeric is currently working on an alternate locking mechanism which
would work only using spinlocks and no pthreads. The principle is that
accesses to the cache are so rare (once or twice per SSL connection)
that there's almost never any collision (confirmed by the fact that it
requires seconds to minutes for a crash to happen) and it's worthless
to rely on heavy non-portable mechanisms when a simple hand-crafted
spinlock will do the job fine with no overhead.
So... stay tuned, I think Emeric will soon have something to propose you
to test :-)
Cheers,
Willy
--
John-Paul Bader | Software Development

www.wooga.com
wooga GmbH | Saarbruecker Str. 38 | D-10405 Berlin
Sitz der Gesellschaft: Berlin; HRB 117846 B
Registergericht Berlin-Charlottenburg
Geschaeftsfuehrung: Jens Begemann, Philipp Moeser
Emeric BRUN
2014-05-07 20:22:39 UTC
Permalink
My fix is broken: it should only show a warning and fall back on the
private cache. I've just pinpointed the issue.

I will try to send you a workaround patch soon.

Emeric

----------------original message-----------------
De: "John-Paul Bader" john-paul.bader-***@public.gmane.org
A: "Willy Tarreau" ***@1wt.eu
Copie à: "John-Paul Bader" john-paul.bader-***@public.gmane.org, "Emeric Brun"
ebrun-***@public.gmane.org, haproxy-***@public.gmane.org
Date: Wed, 07 May 2014 22:09:28 +0200
-------------------------------------------------
Post by John-Paul Bader
Woohoo - this sounds very good :)
Thanks in advance for your efforts - much appreciated!
Kind regards,
John
Post by Willy Tarreau
Hi John,
Post by John-Paul Bader
Ok,
gmake TARGET=freebsd USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1
[ALERT] 126/160108 (25333) : Unable to allocate SSL session cache.
[ALERT] 126/160108 (25333) : Fatal errors found in configuration.
/usr/local/etc/rc.d/haproxy: WARNING: failed precmd routine for haproxy
Without the patches, haproxy is starting.
(...)
That's a *very* good news!
Emeric is currently working on an alternate locking mechanism which
would work only using spinlocks and no pthreads. The principle is that
accesses to the cache are so rare (once or twice per SSL connection)
that there's almost never any collision (confirmed by the fact that it
requires seconds to minutes for a crash to happen) and it's worthless
to rely on heavy non-portable mechanisms when a simple hand-crafted
spinlock will do the job fine with no overhead.
So... stay tuned, I think Emeric will soon have something to propose you
to test :-)
Cheers,
Willy
John-Paul Bader
2014-05-07 20:54:33 UTC
Permalink
Hmm, yeah, I noticed that from what you wrote in the mail and by reading
through the patch - but it still confirms that the shared pthread
mechanism is not available on FreeBSD, right?

Would I also need to compile with USE_PRIVATE_CACHE=1, or would your
patch take care of that?

When it uses the private cache, I would also have to change the
configuration to allow SSL sessions across multiple HTTP requests, right?

Kind regards,

John
Post by Emeric BRUN
My fix is broken, it should only show a warning and fallback on private
cache, i've just pointed the issue.
I will try to send you a workaround patch soon.
Emeric
----------------original message-----------------
Date: Wed, 07 May 2014 22:09:28 +0200
-------------------------------------------------
Post by John-Paul Bader
Woohoo - this sounds very good :)
Thanks in advance for your efforts - much appreciated!
Kind regards,
John
Post by Willy Tarreau
Hi John,
Post by John-Paul Bader
Ok,
gmake TARGET=freebsd USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1
[ALERT] 126/160108 (25333) : Unable to allocate SSL session cache.
[ALERT] 126/160108 (25333) : Fatal errors found in configuration.
/usr/local/etc/rc.d/haproxy: WARNING: failed precmd routine for haproxy
Without the patches, haproxy is starting.
(...)
That's a *very* good news!
Emeric is currently working on an alternate locking mechanism which
would work only using spinlocks and no pthreads. The principle is that
accesses to the cache are so rare (once or twice per SSL connection)
that there's almost never any collision (confirmed by the fact that it
requires seconds to minutes for a crash to happen) and it's worthless
to rely on heavy non-portable mechanisms when a simple hand-crafted
spinlock will do the job fine with no overhead.
So... stay tuned, I think Emeric will soon have something to propose you
to test :-)
Cheers,
Willy
Willy Tarreau
2014-05-07 22:57:59 UTC
Permalink
Hi John,
Post by John-Paul Bader
Hmm yeah I noticed from what you wrote in the mail and by reading
through the patch - but still it confirmed that the shared pthread thing
was not available on FreeBSD right?
Yes, that's it. Old FreeBSD code did not return an error for this, and
haproxy did not check for one. Newer FreeBSD code now does return an
error, but haproxy still didn't check it. Emeric's patch introduced
the test for the feature. Note that older FreeBSD versions will still
pretend to work while being broken, hence the proposal to remove
pthread by default, since there's no reliable way of detecting its
support at runtime.
Post by John-Paul Bader
Would I also need to compile with USE_PRIVATE_CACHE=1 or would you patch
take care of that?
No you don't need it anymore.
Post by John-Paul Bader
When it uses the private cache, I would also have to change the
configuration to allow ssl sessions over multiple http requests right?
No, you don't need to change anything anymore; what Emeric's patch does
is reimplement the locking with a hand-crafted spinlock mechanism. I
just ran a few tests here, and at 4.5k conn/s spread over 4 processes I
see that the lock is already held about 1% of the time, which is very
low and does not justify using a syscall to sleep.

I'm appending the two patches for you to test. They're to be applied on
top of latest master, but I think it will be OK on yours (provided you
don't already have previous patches from Emeric).

You don't need to pass any specific options to the Makefile, it defaults
to using this implementation.

Once you confirm that these ones fix your issue, I'll merge them.

Thanks!
Willy
John-Paul Bader
2014-05-08 07:15:20 UTC
Permalink
Hey,

so I have downloaded haproxy-ss-Latest from the website and applied
your patches. I compiled it with:

make TARGET=freebsd USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1

It ran very well for 2 hours, but then 6 out of 12 processes
core-dumped, this time in haproxy's own code and apparently
session-related:

The GDB Backtrace:

#0 kill_mini_session (s=0x804269c00) at src/session.c:299
299 if (!conn->err_code && (s->task->state & TASK_WOKEN_TIMER)) {
[New Thread 802007400 (LWP 101872/<unknown>)]
[New LWP 101994]
(gdb) bt
#0 kill_mini_session (s=0x804269c00) at src/session.c:299
#1 0x0000000000463928 in conn_session_complete (conn=0x8039f2a80) at
src/session.c:355
#2 0x0000000000432769 in conn_fd_handler (fd=<value optimized out>) at
src/connection.c:88
#3 0x00000000004127db in fd_process_polled_events (fd=<value optimized
out>) at src/fd.c:271
#4 0x000000000046ed85 in _do_poll (p=<value optimized out>, exp=<value
optimized out>)
at src/ev_kqueue.c:141
#5 0x0000000000405c87 in run_poll_loop () at src/haproxy.c:1309
#6 0x0000000000408001 in main (argc=<value optimized out>,
argv=0x7fffffffdd00)
at src/haproxy.c:1633


I'm running with kqueue and nbproc=12,
option http-server-close
and
timeout http-keep-alive 5000

Kind regards,

John
Post by Willy Tarreau
Hi John,
Post by John-Paul Bader
Hmm yeah I noticed from what you wrote in the mail and by reading
through the patch - but still it confirmed that the shared pthread thing
was not available on FreeBSD right?
Yes that's it. Old freebsd code did not return an error for this, and
haproxy did not check this error. Newer freebsd code now does return
an error, but haproxy still didn't check. Emeric's patch introduced
the test for the feature. Note that older freebsd versions will still
pretend to work well but be broken, hence the proposal to remove
pthread by default since there's no reliable way of detecting its
support at runtime.
Post by John-Paul Bader
Would I also need to compile with USE_PRIVATE_CACHE=1 or would you patch
take care of that?
No you don't need it anymore.
Post by John-Paul Bader
When it uses the private cache, I would also have to change the
configuration to allow ssl sessions over multiple http requests right?
No you don't need to change anymore, what Emeric's patch does is to
reimplement a hand-crafted spinlock mechanism. I just ran a few tests
here and at 4.5k conn/s spread over 4 processes I see that the lock is
already held about 1% of the time, which is very low and does not justify
using a syscall to sleep.
I'm appending the two patches for you to test. They're to be applied on
top of latest master, but I think it will be OK on yours (provided you
don't already have previous patches from Emeric).
You don't need to pass any specific options to the Makefile, it defaults
to using this implementation.
Once you confirm that these ones fix your issue, I'll merge them.
Thanks!
Willy
Willy Tarreau
2014-05-08 08:59:45 UTC
Permalink
Hi John,
Post by John-Paul Bader
Hey,
so I have downloaded the haproxy-ss-Latest from the website and applied
make TARGET=freebsd USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1
It ran very good for 2 hours but then 6 out of 12 processes coredumped,
Great, so good news and bad news at the same time. The good news is
that the shared context was definitely causing the trouble; the bad
news is that the lightly-tested kqueue poller is still having some
trouble.
Post by John-Paul Bader
(gdb) bt full
#0 kill_mini_session (s=0x804269c00) at src/session.c:299
level = 6
conn = (struct connection *) 0x0
err_msg = <value optimized out>
#1 0x0000000000463928 in conn_session_complete (conn=0x8039f2a80) at
src/session.c:355
s = (struct session *) 0x804269c00
#2 0x0000000000432769 in conn_fd_handler (fd=<value optimized out>) at
src/connection.c:88
conn = <value optimized out>
flags = 41997063
#3 0x00000000004127db in fd_process_polled_events (fd=<value optimized
out>) at src/fd.c:271
new_updt = <value optimized out>
old_updt = 1
#4 0x000000000046ed85 in _do_poll (p=<value optimized out>, exp=<value
optimized out>)
at src/ev_kqueue.c:141
status = 1
count = 0
fd = <value optimized out>
delta_ms = <value optimized out>
timeout = {tv_sec = 0, tv_nsec = 27000000}
updt_idx = <value optimized out>
en = <value optimized out>
eo = <value optimized out>
changes = <value optimized out>
(...)

OK this trace tends to show that we were called for an event on
an FD in a strange state, half-open, half-closed :-/

If the fd were closed, in conn_fd_handler() it would have simply
returned because .owner == NULL. Since it managed to go to
conn_session_complete(), it means that fd.owner was correct but
the connection was not attached to this session, quite strange.
It seems like something was deinitialized, but I can't find a
code sequence which could produce this.

Would you please retry with poll? I'm not really convinced that
a bug in kqueue could cause this :-/

Thanks
Willy
John-Paul Bader
2014-05-08 09:29:04 UTC
Permalink
Hey,

Sure thing! I've just put a nokqueue in the config and it's running
again. Let's see :)

Kind regards,

John
Post by Willy Tarreau
Hi John,
Post by John-Paul Bader
Hey,
so I have downloaded the haproxy-ss-Latest from the website and applied
make TARGET=freebsd USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1
It ran very good for 2 hours but then 6 out of 12 processes coredumped,
Great, so good news and bad news at the same time. Good news being that
the shared context was definitely causing the trouble, the bad news being
that the slightly-tested kqueue is still having some trouble.
Post by John-Paul Bader
(gdb) bt full
#0 kill_mini_session (s=0x804269c00) at src/session.c:299
level = 6
conn = (struct connection *) 0x0
err_msg =<value optimized out>
#1 0x0000000000463928 in conn_session_complete (conn=0x8039f2a80) at
src/session.c:355
s = (struct session *) 0x804269c00
#2 0x0000000000432769 in conn_fd_handler (fd=<value optimized out>) at
src/connection.c:88
conn =<value optimized out>
flags = 41997063
#3 0x00000000004127db in fd_process_polled_events (fd=<value optimized
out>) at src/fd.c:271
new_updt =<value optimized out>
old_updt = 1
#4 0x000000000046ed85 in _do_poll (p=<value optimized out>, exp=<value
optimized out>)
at src/ev_kqueue.c:141
status = 1
count = 0
fd =<value optimized out>
delta_ms =<value optimized out>
timeout = {tv_sec = 0, tv_nsec = 27000000}
updt_idx =<value optimized out>
en =<value optimized out>
eo =<value optimized out>
changes =<value optimized out>
(...)
OK this trace tends to show that we were called for an event on
an FD in a strange state, half-open, half-closed :-/
If the fd were closed, in conn_fd_handler() it would have simply
returned because .owner == NULL. Since it managed to go to
conn_session_complete(), it means that fd.owner was correct but
the connection was not attached to this session, quite strange.
It seems like something was deinitialized, but I can't find a
code sequence which could produce this.
Would you please retry with poll ? I'm not really convinced that
a bug in kqueue could cause this :-/
Thanks
Willy
John-Paul Bader
2014-05-08 10:47:06 UTC
Permalink
OK, after roughly another 2 hours the next core dump happened - this
time with the nokqueue option set.

The core dump looks very similar; it now crashes in ev_poll instead of
ev_kqueue:

(gdb) bt full
#0 kill_mini_session (s=0x80337f800) at src/session.c:299
level = 6
conn = (struct connection *) 0x0
err_msg = <value optimized out>
#1 0x0000000000463928 in conn_session_complete (conn=0x802e29480) at
src/session.c:355
s = (struct session *) 0x80337f800
#2 0x0000000000432769 in conn_fd_handler (fd=<value optimized out>) at
src/connection.c:88
conn = <value optimized out>
flags = 41997063
#3 0x00000000004127db in fd_process_polled_events (fd=<value optimized
out>) at src/fd.c:271
new_updt = <value optimized out>
old_updt = 1
#4 0x000000000046e8cc in _do_poll (p=<value optimized out>,
exp=-613693376) at src/ev_poll.c:167
e = <value optimized out>
status = 0
fd = 53999616
nbfd = 279
wait_time = <value optimized out>
updt_idx = <value optimized out>
en = <value optimized out>
eo = <value optimized out>
count = 78
sr = <value optimized out>
sw = <value optimized out>
rn = <value optimized out>
wn = <value optimized out>
#5 0x0000000000405c87 in run_poll_loop () at src/haproxy.c:1309
next = -613693376
#6 0x0000000000408001 in main (argc=<value optimized out>,
argv=0x7fffffffdc58)
at src/haproxy.c:1633
px = (struct proxy *) 0x0
ret = <value optimized out>
proc = 0
err = <value optimized out>
retry = <value optimized out>
limit = {rlim_cur = 40017, rlim_max = 40017}
errmsg =
"\000\000\000\000\000\000\000\000?w\201\001\b\000\000\000\026\000\000\000\000\000\000\000?\001j\000\000\000\000\000\020+A\000\000\000\000\000\002\002\000\000\000\000\000\000?\001j\000\000\000\000\000
", '\0' <repeats 15 times>,
"h\205F\000\000\000\000\000p????????????\177\000\000????"
pidfd = 0

Kind regards, John
Willy Tarreau
2014-05-08 11:46:30 UTC
Permalink
Post by John-Paul Bader
Ok after about roughly another 2 hours the next core dump happened -
this time with the nokqueue option set.
The coredump looks very similar and now crashes with ev_poll instead of
Indeed it's exactly the same.
Post by John-Paul Bader
(gdb) bt full
#0 kill_mini_session (s=0x80337f800) at src/session.c:299
level = 6
conn = (struct connection *) 0x0
err_msg = <value optimized out>
#1 0x0000000000463928 in conn_session_complete (conn=0x802e29480) at
src/session.c:355
s = (struct session *) 0x80337f800
#2 0x0000000000432769 in conn_fd_handler (fd=<value optimized out>) at
src/connection.c:88
conn = <value optimized out>
flags = 41997063
That's 0x0280d307 :

- {SOCK,DATA,CURR}_RD_ENA=1 => it's a handshake, waiting for reading
- {SOCK,DATA,CURR}_WR_ENA=0 => no need for writing
- CTRL_READY=1 => FD is still allocated
- XPRT_READY=1 => transport layer is initialized
- ADDR_FROM_SET=1, ADDR_TO_SET=0 => clearly it's a frontend connection
- INIT_DATA=1, WAKE_DATA=1 => processing a handshake (ssl I guess)
- {DATA,SOCK}_{RD,WR}_SH=0 => no shutdown
- ERROR=0, CONNECTED=0 => handshake not completed yet
- WAIT_L4_CONN=0 => normal
- WAIT_L6_CONN=1 => waiting for an L6 handshake to complete
- SSL_WAIT_HS=1 => the pending handshake is an SSL handshake

So at least what we know is that it receives an SSL handshake, and that
it dies while processing it and before any error is reported on the
connection. I'll recheck the error path there in case I find anything.
Post by John-Paul Bader
#3 0x00000000004127db in fd_process_polled_events (fd=<value optimized
out>) at src/fd.c:271
new_updt = <value optimized out>
old_updt = 1
#4 0x000000000046e8cc in _do_poll (p=<value optimized out>,
exp=-613693376) at src/ev_poll.c:167
e = <value optimized out>
status = 0
fd = 53999616
This is 0x0337f800, which is the lower part of the session pointer. So
either gdb is confused by the stack frame, or it was corrupted.

All that is extremely informative. I'll audit the relevant code paths
and I might ask you to test with a patch.

Thanks John,
Willy
Willy Tarreau
2014-05-08 11:59:18 UTC
Permalink
Post by Willy Tarreau
Post by John-Paul Bader
Ok after about roughly another 2 hours the next core dump happened -
this time with the nokqueue option set.
The coredump looks very similar and now crashes with ev_poll instead of
Indeed it's exactly the same.
Post by John-Paul Bader
(gdb) bt full
#0 kill_mini_session (s=0x80337f800) at src/session.c:299
level = 6
conn = (struct connection *) 0x0
err_msg = <value optimized out>
#1 0x0000000000463928 in conn_session_complete (conn=0x802e29480) at
src/session.c:355
s = (struct session *) 0x80337f800
#2 0x0000000000432769 in conn_fd_handler (fd=<value optimized out>) at
src/connection.c:88
conn = <value optimized out>
flags = 41997063
- {SOCK,DATA,CURR}_RD_ENA=1 => it's a handshake, waiting for reading
- {SOCK,DATA,CURR}_WR_ENA=0 => no need for writing
- CTRL_READY=1 => FD is still allocated
- XPRT_READY=1 => transport layer is initialized
- ADDR_FROM_SET=1, ADDR_TO_SET=0 => clearly it's a frontend connection
- INIT_DATA=1, WAKE_DATA=1 => processing a handshake (ssl I guess)
- {DATA,SOCK}_{RD,WR}_SH=0 => no shutdown
- ERROR=0, CONNECTED=0 => handshake not completed yet
- WAIT_L4_CONN=0 => normal
- WAIT_L6_CONN=1 => waiting for an L6 handshake to complete
- SSL_WAIT_HS=1 => the pending handshake is an SSL handshake
So at least what we know is that it receives an SSL handshake, and that
it dies while processing it and before any error is reported on the
connection. I'll recheck the error path there in case I find anything.
In fact it's a bit more precise, flags is the copy of connection->flags when
entering the function. It indicates a handshake is in progress. And the only
way to reach line 88 is for the handshake to complete without error. So we
know for sure that ssl_sock_handshake() was called and completed the
handshake then removed the CO_FL_SSL_WAIT_HS flag from the connection.

And that becomes interesting, because ssl_sock_handshake() is cut in several
parts. The first one handles renegotiations. We could not enter it because
CONNECTED=0. Then SSL_do_handshake() is called. If it does not return exactly
1, it enters the second block which always returns without clearing the
handshake flag. So we also know we did not enter this place. Thus the
handshake succeeded and went to the reneg_ok label. The third block here
is dedicated to server-side, which is not the case here. And we leave...

So whatever causes the trouble happens within SSL_do_handshake(), which
means it must be in one of our callbacks. Thus it could still be related
to the session cache. Maybe a corner case remains. I'll check that.

Regards,
Willy
John-Paul Bader
2014-05-08 12:09:37 UTC
Permalink
Ok - I will happily test more patches to pin those issues down :)

There is one thing where I'm not sure if it's related. We have currently
set up the system with a rather small msl setting:

net.inet.tcp.msl=5000

Also we have fast_finwait2_recycle enabled so we don't have that many
TIME_WAIT and FIN_WAIT_2 connections hanging around.

Now, a few days ago I read that reducing the TIME_WAIT timeout could
actually be harmful:

http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html

And I wonder if it's somehow related, or at least provoking this kind of
bug/edge case. I have to admit I'm reaching the limits of my competence
here, so this is really just a wild guess, but maybe you can confirm or
deny that assumption.

Kind regards,

John
Post by Willy Tarreau
Post by Willy Tarreau
Post by John-Paul Bader
Ok after about roughly another 2 hours the next core dump happened -
this time with the nokqueue option set.
The coredump looks very similar and now crashes with ev_poll instead of
Indeed it's exactly the same.
Post by John-Paul Bader
(gdb) bt full
#0 kill_mini_session (s=0x80337f800) at src/session.c:299
level = 6
conn = (struct connection *) 0x0
err_msg =<value optimized out>
#1 0x0000000000463928 in conn_session_complete (conn=0x802e29480) at
src/session.c:355
s = (struct session *) 0x80337f800
#2 0x0000000000432769 in conn_fd_handler (fd=<value optimized out>) at
src/connection.c:88
conn =<value optimized out>
flags = 41997063
- {SOCK,DATA,CURR}_RD_ENA=1 => it's a handshake, waiting for reading
- {SOCK,DATA,CURR}_WR_ENA=0 => no need for writing
- CTRL_READY=1 => FD is still allocated
- XPRT_READY=1 => transport layer is initialized
- ADDR_FROM_SET=1, ADDR_TO_SET=0 => clearly it's a frontend connection
- INIT_DATA=1, WAKE_DATA=1 => processing a handshake (ssl I guess)
- {DATA,SOCK}_{RD,WR}_SH=0 => no shutdown
- ERROR=0, CONNECTED=0 => handshake not completed yet
- WAIT_L4_CONN=0 => normal
- WAIT_L6_CONN=1 => waiting for an L6 handshake to complete
- SSL_WAIT_HS=1 => the pending handshake is an SSL handshake
So at least what we know is that it receives an SSL handshake, and that
it dies while processing it and before any error is reported on the
connection. I'll recheck the error path there in case I find anything.
In fact it's a bit more precise, flags is the copy of connection->flags when
entering the function. It indicates a handshake is in progress. And the only
way to reach line 88 is for the handshake to complete without error. So we
know for sure that ssl_sock_handshake() was called and completed the
handshake then removed the CO_FL_SSL_WAIT_HS flag from the connection.
And that becomes interesting, because ssl_sock_handshake() is cut in several
parts. The first one handles renegotiations. We could not enter it because
CONNECTED=0. Then SSL_do_handshake() is called. If it does not return exactly
1, it enters the second block which always returns without clearing the
handshake flag. So we also know we did not enter this place. Thus the
handshake succeeded and went to the reneg_ok label. The third block here
is dedicated to server-side, which is not the case here. And we leave...
So what causes the trouble is done within SSL_do_handshake(), which must
be one of our callbacks. Thus that could still be related to the session
cache in fact. Maybe a corner case remains. I'll check that.
Regards,
Willy
Willy Tarreau
2014-05-08 19:40:31 UTC
Permalink
Good news!

I found it by reading the code :-)
And I could even reproduce it. The bug happens when running with a
handshake (typically SSL) and an out-of-memory condition or a socket
error occurs while calling setsockopt(TCP_NODELAY), which might be made
more common by your TCP settings, though that's not necessarily the
case.

Anyway, please apply the attached patch on top of the two other ones,
I'm really confident this time :-)

I'm now going to merge it as well as Emeric's patches into mainline.

Thanks a lot for all your tests John!
Willy
John-Paul Bader
2014-05-09 09:54:56 UTC
Permalink
Hey Willy,


I have just applied the patch and will run another test after lunch.
Since we're testing with live traffic I can't leave it unattended :)

Just out of curiosity, is this a bug that also affects Linux, or is it
FreeBSD-specific?

Kind regards, John
Post by Willy Tarreau
Good news!
I found it by reading the code :-)
And I could even reproduce it. The bug happens when running with a
handshake (typically SSL) and when an out of memory or a socket error
happens when calling setsockopt(TCP_NODELAY), which might possibly be
made more common by your TCP settings, though it's not necessarily the
case.
Anyway, please apply the attached patch on top of the two other ones,
I'm really confident this time :-)
I'm now going to merge it as well as Emeric's patches into mainline.
Thanks a lot for all your tests John!
Willy
Willy Tarreau
2014-05-09 10:09:46 UTC
Permalink
Hi John,
Post by John-Paul Bader
Hey Willy,
I have just applied the patch and will run another test after lunch.
Since we're testing with live traffic I can't leave it unattended :)
Just out of curiosity, is this a bug that affects also Linux or is it
FreeBSD specific?
It's generic. And after I found it, I looked for a way to reproduce it
and found it could happen under out-of-memory conditions. So I started
"haproxy -m 8" on Linux and injected traffic. Bingo: segfault and the
exact same trace as yours :-)

We were recently stumped by a crash of haproxy on an ALOHA that we
couldn't explain; it was related to high memory usage with SSL. Now I
know where it was, so we'll issue an update :-)

Thanks again for all your traces and tests!
Willy
Lukas Tribus
2014-05-09 10:40:11 UTC
Permalink
Hi!
Post by Willy Tarreau
Hi John,
Post by John-Paul Bader
Hey Willy,
I have just applied the patch and will run another test after lunch.
Since we're testing with live traffic I can't leave it unattended :)
Just out of curiosity, is this a bug that affects also Linux or is it
FreeBSD specific?
It's generic. And after I found it, I searched how I could reproduce it
and found it could happen on out of memory condition. So I started
"haproxy -m 8" on linux and injected traffic. Bingo, segfault and exact
same trace as yours :-)
We were recently stumped to discover a crash of haproxy on an ALOHA,
and we couldn't explain it, it was in relation with a high memory
usage with SSL. Now I know where it was, so we'll issue an update :-)
Thanks again for all your traces and tests!
Willy
Can the 100% CPU load initially reported, along with the segfaults, be
explained by the unsupported process-shared mutex on FreeBSD, btw?




Regards,

Lukas
Willy Tarreau
2014-05-09 11:31:59 UTC
Permalink
Hi Lukas,
Post by Lukas Tribus
Can the 100% cpu load initially reported along with the segfaults be
explained by the unsupported process shared mutex on FreeBSD, btw?
Absolutely. Some lists are updated within the lock. I'll let you imagine
what a doubly-linked list looks like when two writers access it at the
same time :-) Sometimes we would get a stale pointer resulting
in a segfault, sometimes a loop resulting in 100% user-space CPU.

Willy
John-Paul Bader
2014-05-09 14:52:04 UTC
Permalink
Hey again,

so for the first time haproxy has been running for more than 4 hours
without a core dump and without CPU spikes - so far so good. I'll
observe it until tomorrow morning and report again.

Kind regards,

John
Post by Willy Tarreau
Hi John,
Post by John-Paul Bader
Hey Willy,
I have just applied the patch and will run another test after lunch.
Since we're testing with live traffic I can't leave it unattended :)
Just out of curiosity, is this a bug that affects also Linux or is it
FreeBSD specific?
It's generic. And after I found it, I searched how I could reproduce it
and found it could happen on out of memory condition. So I started
"haproxy -m 8" on linux and injected traffic. Bingo, segfault and exact
same trace as yours :-)
We were recently stumped to discover a crash of haproxy on an ALOHA,
and we couldn't explain it, it was in relation with a high memory
usage with SSL. Now I know where it was, so we'll issue an update :-)
Thanks again for all your traces and tests!
Willy
Willy Tarreau
2014-05-09 15:03:00 UTC
Permalink
Post by John-Paul Bader
Hey again,
so for the first time haproxy is running for more than 4 hours without a
coredump and without cpu spikes - so far so good. I'll observe it until
tomorrow morning and report again.
Great, so we might have addressed all of them finally :-)

Thanks very much for the update!
Willy
John-Paul Bader
2014-05-15 05:41:25 UTC
Permalink
Hey,


good news! HA-Proxy version 1.5-dev25-a339395 2014/05/10 is running
since yesterday morning with a 1/4 of our traffic and without coredumps,
infinite loops or other problems. We have nbproc and kqueue enabled and
everything seems to behave fine.

In peak times we had 11000 sockets open and a connection rate of about
1300/s. The load of the machine with 12 cores was about 2-2.5 - so lots of
headroom.

It seems the fixes worked as anticipated :)

We'll slowly push more traffic over the next couple of days.

I just wanted to give you a quick update as promised.

Kind regards,

John
Post by Willy Tarreau
Hi John,
Post by John-Paul Bader
Hey Willy,
I have just applied the patch and will run another test after lunch.
Since we're testing with live traffic I can't leave it unattended :)
Just out of curiosity, is this a bug that affects also Linux or is it
FreeBSD specific?
It's generic. And after I found it, I searched how I could reproduce it
and found it could happen on out of memory condition. So I started
"haproxy -m 8" on linux and injected traffic. Bingo, segfault and exact
same trace as yours :-)
We were recently stumped to discover a crash of haproxy on an ALOHA,
and we couldn't explain it, it was in relation with a high memory
usage with SSL. Now I know where it was, so we'll issue an update :-)
Thanks again for all your traces and tests!
Willy
Willy Tarreau
2014-05-15 05:59:06 UTC
Permalink
Hi John,
Post by John-Paul Bader
Hey,
good news! HA-Proxy version 1.5-dev25-a339395 2014/05/10 is running
since yesterday morning with a 1/4 of our traffic and without coredumps,
infinite loops or other problems. We have nbproc and kqueue enabled and
everything seems to behave fine.
In peak times we had 11000 sockets open and a connection/s rate of about
1300. The Load of the machine with 12 cores was about 2-2.5 - so lots of
headroom.
Cool!
Post by John-Paul Bader
It seems the fixes worked as anticipated :)
We'll slowly push more traffic over the next couple of days.
I just wanted to give you a quick update as promised.
That's really appreciated, thank you very much! So now we can conclude
that anyone using SSL on FreeBSD or experiencing crashes must update at
least to dev25.

Best regards,
Willy

John-Paul Bader
2014-05-08 07:19:20 UTC
Permalink
Maybe the full backtrace is more helpful:

(gdb) bt full
#0 kill_mini_session (s=0x804269c00) at src/session.c:299
level = 6
conn = (struct connection *) 0x0
err_msg = <value optimized out>
#1 0x0000000000463928 in conn_session_complete (conn=0x8039f2a80) at
src/session.c:355
s = (struct session *) 0x804269c00
#2 0x0000000000432769 in conn_fd_handler (fd=<value optimized out>) at
src/connection.c:88
conn = <value optimized out>
flags = 41997063
#3 0x00000000004127db in fd_process_polled_events (fd=<value optimized
out>) at src/fd.c:271
new_updt = <value optimized out>
old_updt = 1
#4 0x000000000046ed85 in _do_poll (p=<value optimized out>, exp=<value
optimized out>)
at src/ev_kqueue.c:141
status = 1
count = 0
fd = <value optimized out>
delta_ms = <value optimized out>
timeout = {tv_sec = 0, tv_nsec = 27000000}
updt_idx = <value optimized out>
en = <value optimized out>
eo = <value optimized out>
changes = <value optimized out>
#5 0x0000000000405c87 in run_poll_loop () at src/haproxy.c:1309
next = -626888972
#6 0x0000000000408001 in main (argc=<value optimized out>,
argv=0x7fffffffdd00)
at src/haproxy.c:1633
px = (struct proxy *) 0x0
ret = <value optimized out>
proc = 3
err = <value optimized out>
retry = <value optimized out>
limit = {rlim_cur = 40017, rlim_max = 40017}
errmsg =
"\000\000\000\000\000\000\000\000?w\201\001\b\000\000\000\026\000\000\000\000\000\000\000?\001j\000\000\000\000\000\020+A\000\000\000\000\000\006\002\000\000\000\000\000\000?\001j\000\000\000\000\000
", '\0' <repeats 15 times>,
"h\205F\000\000\000\000\000p????????????\177\000\0008???"
pidfd = 0
Post by Willy Tarreau
Hi John,
Post by John-Paul Bader
Hmm yeah I noticed from what you wrote in the mail and by reading
through the patch - but still it confirmed that the shared pthread thing
was not available on FreeBSD right?
Yes that's it. Old freebsd code did not return an error for this, and
haproxy did not check this error. Newer freebsd code now does return
an error, but haproxy still didn't check. Emeric's patch introduced
the test for the feature. Note that older freebsd versions will still
pretend to work well but be broken, hence the proposal to remove
pthread by default since there's no reliable way of detecting its
support at runtime.
Post by John-Paul Bader
Would I also need to compile with USE_PRIVATE_CACHE=1 or would you patch
take care of that?
No you don't need it anymore.
Post by John-Paul Bader
When it uses the private cache, I would also have to change the
configuration to allow ssl sessions over multiple http requests right?
No you don't need to change anymore, what Emeric's patch does is to
reimplement a hand-crafted spinlock mechanism. I just ran a few tests
here and at 4.5k conn/s spread over 4 processes I see that the lock is
already held about 1% of the time, which is very low and does not justify
using a syscall to sleep.
I'm appending the two patches for you to test. They're to be applied on
top of latest master, but I think it will be OK on yours (provided you
don't already have previous patches from Emeric).
You don't need to pass any specific options to the Makefile, it defaults
to using this implementation.
Once you confirm that these ones fix your issue, I'll merge them.
Thanks!
Willy
Lukas Tribus
2014-05-08 08:20:28 UTC
Permalink
Hi Willy,
Post by Willy Tarreau
Post by John-Paul Bader
When it uses the private cache, I would also have to change the
configuration to allow ssl sessions over multiple http requests right?
No you don't need to change anymore, what Emeric's patch does is to
reimplement a hand-crafted spinlock mechanism.
Two slightly unrelated questions:

When we do those kind of workarounds (tcp mode, reconnecting to our own
HTTP(S) frontend), would using unix sockets be more performant (causing
less load) than TCP over the loopback or are there disadvantages?
Post by Willy Tarreau
I just ran a few tests here and at 4.5k conn/s spread over 4 processes I
see that the lock is already held about 1% of the time, which is very low
and does not justify using a syscall to sleep.
Would it be possible to share the SSL cache between two haproxy instances,
via a "peer" like protocol? I think stud had something similar available.

Suppose you have an active/passive setup with VIP failover, starting with
an empty SSL cache after a failover can be a problem.




Regards,

Lukas
Willy Tarreau
2014-05-08 23:07:28 UTC
Permalink
Hi Lukas,
Post by Lukas Tribus
Hi Willy,
Post by Willy Tarreau
Post by John-Paul Bader
When it uses the private cache, I would also have to change the
configuration to allow ssl sessions over multiple http requests right?
No you don't need to change anymore, what Emeric's patch does is to
reimplement a hand-crafted spinlock mechanism.
When we do those kind of workarounds (tcp mode, reconnecting to our own
HTTP(S) frontend), would using unix sockets be more performant (causing
less load) than TCP over the loopback or are there disadvantages?
For SSL they're much better than plain TCP sockets, the network stack
is lighter, the connection rate is much higher, there are no ports to
allocate on any side. The two real disadvantages I'm seeing to unix
sockets here are :
- no support for splice() which means that transferring data between
processes involves a copy. When using SSL that's not an issue since
we already can't splice with SSL.

- having to connect via the filesystem is not always fun to configure
with chroots. But Linux supports abstract sockets, which solve this
trouble.

I used to have a quick'n'dirty patch a year or two ago which made use
of socketpair() to chain a backend to a frontend within the same process.
It allowed me to compare the performance. All in all, the connection
rate was twice as fast as with TCP, and the data rate was twice as
slow even though splice was not used. It might be because of a configurable
limit on the buffer sizes that I had not played with.

I would really love to have unix/abstract sockets to the servers for 1.5
(or even backport them later). Normally there's little work to do, I think
that most of the work consists in copy-pasting tcp_connect_server() into
a new function pointed to by proto_uxst->connect.
Post by Lukas Tribus
Post by Willy Tarreau
I just ran a few tests here and at 4.5k conn/s spread over 4 processes I
see that the lock is already held about 1% of the time, which is very low
and does not justify using a syscall to sleep.
Would it be possible to share the SSL cache between two haproxy instances,
via a "peer" like protocol? I think stud had something similar available.
Yes, it was Emeric who did it for stud. He has a PoC patch lying somewhere for
haproxy as well, but he doesn't feel comfortable with having this because he
told me about a corner case where changing the config (or ciphers) on one node
would render its session incompatible with the other nodes, possibly forcing all
users to renegotiate all the time till the configs are in sync (or something
like this :-)).

One thing that was missing in order to implement the feature cleanly was to
be able to listen to a UDP socket and receive events on it. I remember at the
time he used a few hacks, but now it should be much easier and cleaner.
Post by Lukas Tribus
Suppose you have an active/passive setup with VIP failover, starting with
an empty SSL cache after a failover can be a problem.
Yes quite clearly. We also observe this during config reloads for the same
reasons (though this issue will not be solved since only new entries are
broadcast).

Willy
Dmitry Sivachenko
2014-05-10 20:19:45 UTC
Permalink
Post by Emeric Brun
Hi All,
I suspect FreeBSD to not support process shared mutex (supported in both linux and solaris).
I've just made a patch to add error checks on mutex init, and to fall back on the SSL private session cache in the error case.
Hello,

BTW, nginx does support a shared SSL session cache on FreeBSD (probably by other means).
Maybe it is worth borrowing their method rather than falling back to a private cache?
Willy Tarreau
2014-05-10 20:45:54 UTC
Permalink
Post by Dmitry Sivachenko
Post by Emeric Brun
Hi All,
I suspect FreeBSD to not support process shared mutex (supported in both linux and solaris).
I've just made a patch to add errors check on mutex init, and to fallback on SSL private session cache in error case.
Hello,
BTW, nginx does support shared SSL session cache on FreeBSD (probably by other means).
May be it is worth to borrow their method rather than falling back to private cache?
We finally ended up getting rid of pthreads there and simply using
spinlocks. The difference in performance is not even measurable since
the cache is not used that often.

Willy
Vincent Bernat
2014-05-07 16:35:06 UTC
Permalink
Post by Willy Tarreau
Post by John-Paul Bader
haproxy does not include DTrace probes by any chance right? :)
No, and I have no idea how this works either. But if you feel like it
can provide some value and be done without too much effort, feel free
to try :-)
Here is a proof of concept. To test, use `make TARGET=linux2628
USE_DTRACE=1`. On Linux, you need systemtap-sdt-dev or something like
that. Then, there is a quick example in example/haproxy.stp. You can try
it like this:

#+begin_src sh
sudo stap ./examples/haproxy.stp
#+end_src

It is possible to convert the probes.d to a tapset (which is a recipe
for systemtap) to be able to name arguments and convert them to the
appropriate type. I am using this AWK script:
https://github.com/vincentbernat/lldpd/blob/master/src/daemon/dtrace2systemtap.awk

Only works with simple probes.

For dtrace, this would be something like that but I cannot test right
now:

#+begin_src dtrace
haproxy$target:::frontend_accept
{
printf("Frontend %s accepted a connection", copyinstr(arg0));
}
#+end_src

The trick with those tracepoints is that they are just NOPs until you
enable them. So, even when someone compiles in dtrace support, there is
no performance impact until the tracepoints are actually used.

While the probe arguments can be anything, it is simpler to only keep
simple types like null-terminated strings or ints. Otherwise, they are
difficult to exploit. If you pass a struct, then without the debug
symbols the data is not exploitable.

Now, all the hard work is to put tracepoints everywhere. A good target
is where things are logged. But they can also be put in places where logs
would be too verbose. I currently don't have interest in doing that, but
if someone is willing to, it is only a matter of defining the probes in
probes.d and placing them in the C code. This is really nifty to debug
stuff in production. However, I think that people interested in that can
also use debug symbols to place probe at any place they want to. GCC is
now better at providing debug symbols which work on optimized
executables. Ubuntu is providing debug symbols for almost
everything. Tracepoints are still interesting as they can be listed and
they are hand-picked.
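For reference, the provider-definition side of such a patch might look like the following sketch. Only `frontend_accept` appears in this thread; the second probe is hypothetical, added purely to show a multi-argument definition.

```d
/* probes.d -- sketch of an SDT provider definition. Compiling it with
 * `dtrace -h -s probes.d` generates a header whose macros (e.g.
 * HAPROXY_FRONTEND_ACCEPT()) are then placed in the C code. */
provider haproxy {
    /* matches the frontend_accept probe shown in this thread */
    probe frontend_accept(char *frontend_name);
    /* hypothetical extra probe, for illustration only */
    probe backend_connect(char *backend_name, int srv_id);
};
```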
Willy Tarreau
2014-05-07 20:19:22 UTC
Permalink
Hi Vincent,
Post by Vincent Bernat
Post by Willy Tarreau
Post by John-Paul Bader
haproxy does not include DTrace probes by any chance right? :)
No, and I have no idea how this works either. But if you feel like it
can provide some value and be done without too much effort, feel free
to try :-)
Here is a proof of concept. To test, use `make TARGET=linux2628
USE_DTRACE=1`. On Linux, you need systemtap-sdt-dev or something like
that. Then, there is a quick example in example/haproxy.stp.
Interesting, but just for my understanding, what does it provide beyond
building with "TRACE=1" where the compiler dumps *all* function calls,
and not only those that were instrumented ? I'm asking because I never
used dtrace, so I'm totally ignorant here.
Post by Vincent Bernat
You can try
#+begin_src sh
sudo stap ./examples/haproxy.stp
#+end_src
It is possible to convert the probes.d to a tapset (which is a recipe
for systemtap) to be able to name arguments and convert them in the
https://github.com/vincentbernat/lldpd/blob/master/src/daemon/dtrace2systemtap.awk
Only works with simple probes.
For dtrace, this would be something like that but I cannot test right
#+begin_src dtrace
haproxy$target:::frontend_accept
{
printf("Frontend %s accepted a connection", copyinstr(arg0));
}
#+end_src
The trick with those tracepoints is that they are just NOOP until you
enable them. So, even when someone compiles dtrace support, they will
not have any performance impact until trying to use the tracepoints.
Well, they will at least have the performance impact of the "if" which
disables them and the inflated/reordered functions I guess! So at least
we have to be reasonable not to put them everywhere (eg: not in the
polling loops nor in the scheduler).
Post by Vincent Bernat
While the probe arguments can be anything, it is simpler to only keep
simple types like null-terminated strings or int. Otherwise, they are
difficult to exploit. If you put struct, without the debug symbols, the
data is not exploitable.
Now, all the hard work is to put trace points everywhere.
That's where gcc does the stuff free of charge in fact. I still tend to
be cautious about what the debugging code becomes over time, because we
had this twice, once with the DPRINTF() macro which was never up to date,
and once with the http_silent_debug() macro which became so unbalanced
over time that I recently totally removed it.
Post by Vincent Bernat
A good target is where stuff are logged.
Yeah that's a good idea.
Post by Vincent Bernat
But they can also be put in places where logs
would be too verbose. I currently don't have interest in doing that but
if someone is willing too, it is only a matter of defining the probes in
probes.d and placing them in the C code. This is really nifty to debug
stuff in production. However, I think that people interested in that can
also use debug symbols to place probe at any place they want to. GCC is
now better at providing debug symbols which work on optimized
executables. Ubuntu is providing debug symbols for almost
everything. Tracepoints are still interesting as they can be listed and
they are hand-picked.
That was the principle of the http_silent_debug() in fact. Just to know
where we passed, in which order at a low cost. But I think I failed at it
by trying to maintain this code stable, while in practice we probably only
need something properly instrumented to easily add new tracepoints when
needed. Maybe your patch can be a nice step forward in that direction, I
have no idea. It's not intrusive, that's possibly something we can merge
and see if it is quickly adopted or not.

Regards,
Willy
Vincent Bernat
2014-05-07 20:56:37 UTC
Permalink
Post by Willy Tarreau
Post by Vincent Bernat
Here is a proof of concept. To test, use `make TARGET=linux2628
USE_DTRACE=1`. On Linux, you need systemtap-sdt-dev or something like
that. Then, there is a quick example in example/haproxy.stp.
Interesting, but just for my understanding, what does it provide beyond
building with "TRACE=1" where the compiler dumps *all* function calls,
and not only those that were instrumented ? I'm asking because I never
used dtrace, so I'm totally ignorant here.
See below.
Post by Willy Tarreau
Post by Vincent Bernat
The trick with those tracepoints is that they are just NOOP until you
enable them. So, even when someone compiles dtrace support, they will
not have any performance impact until trying to use the tracepoints.
Well, they will at least have the performance impact of the "if" which
disables them and the inflated/reordered functions I guess! So at least
we have to be reasonable not to put them everywhere (eg: not in the
polling loops nor in the scheduler).
No, they are really just NOPs. They are registered in a section of the
ELF executable, and when the tracepoint is activated, the NOP is
replaced by a JMP.

When arguments are expensive to build, there is the possibility to test
if the probe is enabled, but in this case, even when the probe is not
enabled, there is a cost. So, better keep the arguments simple.

I cannot find a link which explains that clearly (I am pretty sure there
was an article on LWN for that). I can show you the result:

$ readelf -x .note.stapsdt ./haproxy

Hex dump of section '.note.stapsdt':
0x00000000 08000000 3d000000 03000000 73746170 ....=.......stap
0x00000010 73647400 af7c4200 00000000 6f634800 sdt..|B.....ocH.
0x00000020 00000000 b09e6900 00000000 68617072 ......i.....hapr
0x00000030 6f787900 66726f6e 74656e64 5f616363 oxy.frontend_acc
0x00000040 65707400 38403235 36382825 72617829 ept.8@2568(%rax)
0x00000050 00000000 ....

systemtap/dtrace are able to read this section:

$ stap -L 'process("./haproxy").mark("*")'
process("./haproxy").mark("frontend_accept") $arg1:long

(all arguments are seen as long/pointer because this is not something
encoded)

gdb is also able to use them:

(gdb) info probes
Provider Name Where Semaphore Object
haproxy frontend_accept 0x0000000000427caf 0x0000000000699eb0 /home/bernat/code/dailymotion/haproxy/haproxy
(gdb) disassemble frontend_accept
Dump of assembler code for function frontend_accept:
0x0000000000427c90 <+0>: push %r14
0x0000000000427c92 <+2>: push %r13
0x0000000000427c94 <+4>: push %r12
0x0000000000427c96 <+6>: push %rbp
0x0000000000427c97 <+7>: push %rbx
0x0000000000427c98 <+8>: mov %rdi,%rbx
0x0000000000427c9b <+11>: add $0xffffffffffffff80,%rsp
0x0000000000427c9f <+15>: mov 0x270(%rdi),%r12
0x0000000000427ca6 <+22>: mov 0x20(%rdi),%rax
0x0000000000427caa <+26>: mov 0x34(%r12),%ebp
0x0000000000427caf <+31>: nop
0x0000000000427cb0 <+32>: mov 0x30(%rdi),%rax
0x0000000000427cb4 <+36>: movq $0x0,0x2f0(%rdi)
0x0000000000427cbf <+47>: movq $0x0,0x2e8(%rdi)
[...]

See the nop at 427caf?

So the main interest of those probes is:

* low overhead: they can be left in production so they are there when you
really need them
* discoverability: someone not tech-savvy enough to read the source can
list them and decide which ones to enable, because someone more
tech-savvy chose them
Post by Willy Tarreau
Post by Vincent Bernat
While the probe arguments can be anything, it is simpler to only keep
simple types like null-terminated strings or int. Otherwise, they are
difficult to exploit. If you put struct, without the debug symbols, the
data is not exploitable.
Now, all the hard work is to put trace points everywhere.
That's where gcc does the stuff free of charge in fact. I still tend to
be cautious about what the debugging code becomes over time, because we
had this twice, once with the DPRINTF() macro which was never up to date,
and once with the http_silent_debug() macro which became so unbalanced
over time that I recently totally removed it.
Yes, this is a big problem. In the kernel where a similar mechanism
exists, some maintainers are reluctant to provide tracepoints because
they would become part of the user/kernel interface and have to be
maintained which is a lot of work.
Post by Willy Tarreau
Post by Vincent Bernat
But they can also be put in places where logs
would be too verbose. I currently don't have interest in doing that but
if someone is willing too, it is only a matter of defining the probes in
probes.d and placing them in the C code. This is really nifty to debug
stuff in production. However, I think that people interested in that can
also use debug symbols to place probe at any place they want to. GCC is
now better at providing debug symbols which work on optimized
executables. Ubuntu is providing debug symbols for almost
everything. Tracepoints are still interesting as they can be listed and
they are hand-picked.
That was the principle of the http_silent_debug() in fact. Just to know
where we passed, in which order at a low cost. But I think I failed at it
by trying to maintain this code stable, while in practice we probably only
need something properly instrumented to easily add new tracepoints when
needed. Maybe your patch can be a nice step forward in that direction, I
have no idea. It's not intrusive, that's possibly something we can merge
and see if it is quickly adopted or not.
In its current form, it is far too limited. We can wait for more people
asking for such support and for someone who will at least add the
minimal instrumentation at key points in the code. The patch is likely
to stay up to date for quite some time since it is small and relies on
a cross-OS "frozen" mechanism, so no hurry; it should continue to work
when we really need it.

I am a big fan of systemtap but I usually rely on debug symbols since so
few programs have tracepoints and I can usually understand the code.

For interpreted languages, tracepoints are far more interesting since
understanding a VM is more complex than understanding a regular
program. But tracepoints being discoverable, they are easier to use than
debug symbols.

Debug symbols are available for Ubuntu and Red Hat (I mean debug symbols
that match the installed packages). I wanted to do the same thing for
Debian, but had no time to move forward on this. This is a real
difficulty: you want to know something, you don't have the debug
symbols, you need to recompile, stop, start, and maybe the problem will
be gone before you had a chance to debug it. Tracepoints are quite
convenient since we can ask distributions to enable them as soon as they
exist.

While GCC has made some good progress to generate good debug symbols,
you can still get missing symbols (missing arguments, missing local
variables) or errors. Usually, they happen right when you need them. ;-)
--
Use uniform input formats.
- The Elements of Programming Style (Kernighan & Plauger)
Vincent Bernat
2014-05-07 20:59:43 UTC
Permalink
Post by Vincent Bernat
* low overhead, they can be left in production to be here when you
really need them
And you enable/disable them while the program is running.
--
panic ("No CPUs found. System halted.\n");
2.4.3 linux/arch/parisc/kernel/setup.c
Willy Tarreau
2014-05-07 21:10:50 UTC
Permalink
Post by Vincent Bernat
Post by Vincent Bernat
* low overhead, they can be left in production to be here when you
really need them
And you enable/disable them while the program is running.
:-)

Thanks very much for the detailed explanation Vincent. So from what I
understand, dtrace is more for production use while TRACE=1 is more
for the developer. Neither fits both purposes, but if we agree that
neither of them crosses the frontier toward the other, both
can be useful and very efficient at little cost for the purpose they aim
at serving (typically your "I was here but I won't dump my args").

So that makes a lot of sense indeed.

I renamed the thread to help people find it in mail archives when they
search for the feature. I think your explanation and patch will be a
nice starting point for whoever wants to devote some time to this.

Thanks!
Willy
Chris Burroughs
2014-05-14 14:56:15 UTC
Permalink
Post by Vincent Bernat
Post by Willy Tarreau
Post by John-Paul Bader
haproxy does not include DTrace probes by any chance right? :)
No, and I have no idea how this works either. But if you feel like it
can provide some value and be done without too much effort, feel free
to try :-)
Here is a proof of concept. To test, use `make TARGET=linux2628
USE_DTRACE=1`. On Linux, you need systemtap-sdt-dev or something like
that. Then, there is a quick example in example/haproxy.stp. You can try
#+begin_src sh
sudo stap ./examples/haproxy.stp
#+end_src
This is a really cool proof of concept. Thank you for sharing Vincent.