Discussion:
http-response set-header is unreliable
Tim Düsterhus
2018-04-26 15:33:09 UTC
Hi

I have got a frontend in mode http that sets various headers
http-response set-header Expect-CT "max-age=3600; report-uri=\"https://xxx.report-uri.com/r/d/ct/reportOnly\""
http-response set-header Expect-Staple "max-age=3600; report-uri=\"https://xxx.report-uri.com/r/d/staple/reportOnly\"; includeSubDomains"
http-response set-header Public-Key-Pins-Report-Only "pin-sha256=\"Vjs8r4z+xxx+eWys=\"; pin-sha256=\"xxx/ltjyo=\"; pin-sha256=\"xxx/uEtLMkBgFF2Fuihg=\"; report-uri=\"https://xxx.report-uri.io/r/default/hpkp/reportOnly\"; max-age=86400"
http-response set-header Referrer-Policy "same-origin"
http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"
http-response set-header X-Content-Type-Options nosniff
http-response set-header X-Frame-Options SAMEORIGIN
http-response set-header X-XSS-Protection "1; mode=block"
This frontend talks (among others) to a backend that also sets a header
http-response set-header Content-Security-Policy "xxx report-uri https://xxx.report-uri.com/r/d/csp/enforce"

Most of the time all of these headers appear in the responses, but some
responses are missing the X-Frame-Options and X-XSS-Protection headers,
as the following captures from repeated requests show:
X-UA-Compatible: IE=edge
X-Req-ID: EXAMPLE-5AE1EF41041A
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-UA-Compatible: IE=edge
X-Req-ID: EXAMPLE-5AE1EF46041D
X-Content-Type-Options: nosniff
X-UA-Compatible: IE=edge
X-Req-ID: EXAMPLE-5AE1EF49041F
X-Content-Type-Options: nosniff
X-UA-Compatible: IE=edge
X-Req-ID: EXAMPLE-5AE1EF4A0421
X-Content-Type-Options: nosniff
X-UA-Compatible: IE=edge
X-Req-ID: EXAMPLE-5AE1EF4F0477
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-UA-Compatible: IE=edge
X-Req-ID: EXAMPLE-5AE1EF530491
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-UA-Compatible: IE=edge
X-Req-ID: EXAMPLE-5AE1EF5404B3
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-UA-Compatible: IE=edge
X-Req-ID: EXAMPLE-5AE1EF580598
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-UA-Compatible: IE=edge
X-Req-ID: EXAMPLE-5AE1EF66067F
X-Content-Type-Options: nosniff
Apr 26 15:24:49 xxx haproxy[7565]: 2003:xxx:53728 [26/Apr/2018:15:24:49.681] fe_https~ bk_xxx/nginx 0/0/1/252/253 200 16912 - - ---- 11/8/0/1/0 0/0 {xxx|HTTPie/0.9.2} "GET / HTTP/1.1" EXAMPLE-5AE1EF41041A
Apr 26 15:24:55 xxx haproxy[7565]: 2003:xxx:53730 [26/Apr/2018:15:24:55.034] fe_https~ bk_xxx/nginx 0/0/0/203/203 200 16911 - - ---- 10/7/0/1/0 0/0 {xxx|HTTPie/0.9.2} "GET / HTTP/1.1" EXAMPLE-5AE1EF46041D
The hex value in the request IDs is: %Ts%rt (thus there have only been
two requests in between those two).

I'm running haproxy 1.8.8 on Debian Stretch, installed from Debian
HA-Proxy version 1.8.8-1~bpo9+1 2018/04/19
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -g -O2 -fdebug-prefix-map=/build/haproxy-1.8.8=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_NS=1
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with OpenSSL version : OpenSSL 1.1.0f 25 May 2017
Running on OpenSSL version : OpenSSL 1.1.0f 25 May 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.3
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : yes
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
[SPOE] spoe
[COMP] compression
[TRACE] trace
Any ideas?

Best regards
Tim Düsterhus
Willy Tarreau
2018-04-28 05:51:07 UTC
Hi Tim,
Post by Tim Düsterhus
Any ideas?
Not that many ideas. Could you retry by setting "tune.maxrewrite" to a
larger value ? It defaults to 1024, and maybe you're already adding 1kB
of response and there's no more room in the response buffer. It's just
a guess, I could be completely wrong.

Willy
Tim Düsterhus
2018-04-29 19:36:13 UTC
Willy,
Post by Willy Tarreau
Not that many ideas. Could you retry by setting "tune.maxrewrite" to a
larger value ? It defaults to 1024, and maybe you're already adding 1kB
of response and there's no more room in the response buffer. It's just
a guess, I could be completely wrong.
I configured it as 1536 and it seems to have fixed the issue. Indeed I
am setting a few very large headers: the Content-Security-Policy header
I'm setting is 410 bytes on its own. The Public-Key-Pins-Report-Only
header is an additional 258 bytes.
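For reference, this tunable goes in the global section; a sketch of the
change described above (1536 being the value I picked):

```
global
    # enlarge the reserve kept free for header rewrites (default: 1024)
    tune.maxrewrite 1536
```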

I'm wondering why I am unable to reproduce the issue consistently,
though. The only dynamic header I am setting with haproxy is:

http-response set-header X-Req-ID %[unique-id]

and that one does not differ in length. Either it should fit in the
buffer all the time or it should never fit, no?

And why does it affect two headers at once? If the length is right below
the limit, intuitively only the very last header should be affected.

One last thing: maybe it makes sense to emit a log message when this limit
is hit instead of silently dropping (security-critical!) response headers?

Best regards
Tim Düsterhus
Willy Tarreau
2018-04-30 13:48:50 UTC
Hi Tim,
Post by Tim Düsterhus
Willy,
Post by Willy Tarreau
Not that many ideas. Could you retry by setting "tune.maxrewrite" to a
larger value ? It defaults to 1024, and maybe you're already adding 1kB
of response and there's no more room in the response buffer. It's just
a guess, I could be completely wrong.
I configured it as 1536 and it seems to have fixed the issue. Indeed I
am setting a few very large headers: the Content-Security-Policy header
I'm setting is 410 bytes on its own. The Public-Key-Pins-Report-Only
header is an additional 258 bytes.
I'm wondering why I am unable to reproduce the issue consistently,
http-response set-header X-Req-ID %[unique-id]
and that one does not differ in length. Either it should fit in the
buffer all the time or it should never fit, no?
And why does it affect two headers at once? If the length is right below
the limit intuitively only the very last header should be affected.
I really don't know, maybe the rules are aborted during their processing.
Post by Tim Düsterhus
One last thing: maybe it makes sense to emit a log message when this limit
is hit instead of silently dropping (security-critical!) response headers?
Well, a long time ago we would get a 400 when trying to add or modify
a header increasing the request size past the max buffer size, or a 502
when doing this on the response. The single fact that it didn't occur
in your case sounds like a bug to me. I suspect that along the
implementation of all these http-request / http-response rulesets, we
may have lost some failure checks in some gray areas (i.e. when calling
a Lua service or similar, it's not always black or white). Thus
I guess we need to recheck all of this to ensure we're not missing anything
important, because I totally agree with you that a request not fulfilling
all the prerequisites for complete processing should not pass; same for
responses.

Thanks,
Willy
Tim Düsterhus
2018-04-30 19:06:16 UTC
Willy,
Post by Willy Tarreau
Post by Tim Düsterhus
And why does it affect two headers at once? If the length is right below
the limit intuitively only the very last header should be affected.
I really don't know, maybe the rules are aborted during their processing.
Anything I could do to help investigate this? I can apply patches with
additional logging or I can send you the unredacted configuration in
private if that would help.
Post by Willy Tarreau
Post by Tim Düsterhus
One last thing: maybe it makes sense to emit a log message when this limit
is hit instead of silently dropping (security-critical!) response headers?
Well, a long time ago we would get a 400 when trying to add or modify
a header increasing the request size past the max buffer size, or a 502
when doing this on the response. The single fact that it didn't occur
in your case sounds like a bug to me. I suspect that along the
implementation of all these http-request / http-response rulesets, we
may have lost some failure checks in some gray areas (i.e. when calling
a Lua service or similar, it's not always black or white). Thus
I guess we need to recheck all of this to ensure we're not missing anything
important, because I totally agree with you that a request not fulfilling
all the prerequisites for complete processing should not pass; same for
responses.
tune.bufsize is documented to send a 502:
https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#3.2-tune.bufsize

tune.maxrewrite is documented to "prevent addition of headers":
https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#3.2-tune.maxrewrite

So it works as documented. But that does not mean that the currently
documented behaviour is a good one.

Best regards
Tim Düsterhus
Willy Tarreau
2018-04-30 21:06:13 UTC
Post by Tim Düsterhus
Post by Willy Tarreau
Post by Tim Düsterhus
And why does it affect two headers at once? If the length is right below
the limit intuitively only the very last header should be affected.
I really don't know, maybe the rules are aborted during their processing.
Anything I could do to help investigate this? I can apply patches with
additional logging or I can send you the unredacted configuration in
private if that would help.
OK, it's just that for now I can't propose anything; I'm context-switching
far too much between many different things :-(
Post by Tim Düsterhus
https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#3.2-tune.bufsize
https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#3.2-tune.maxrewrite
No, not "prevent addition of headers"; in fact it's the exact opposite: it's
meant to ensure that a received message will not completely fill the buffer,
so that this number of bytes always remains available to permit the addition
of headers. Now I can understand how it's possible to read the doc like this,
but what is meant there is that if it's too small, as a side effect it will
indeed prevent the addition of headers, but this will result in the message
not being correctly processed and failing (thus 400 or 502 depending on the
direction) since the resulting message becomes larger than tune.bufsize.
Post by Tim Düsterhus
So it works like documented.
No, really not. The doc explains a consequence, not a goal (though it may
not be that obvious).
Post by Tim Düsterhus
But that does not mean that the current
documented behaviour is a good behaviour.
Absolutely. What I'm much more worried about now is that a number of
deployments may already accidentally be relying on this problem and may
discover 400s or 502s the hard way after we fix it :-( So we'll have
to be very careful about this and enumerate what works and what fails
to figure out whether there's a real risk that people rely on this. If so
we may need to add an extra option to relax addition in case of failure
to preserve the current behaviour, which I'd personally hate.

Regards,
Willy
Tim Düsterhus
2018-04-30 23:57:06 UTC
Willy,
Post by Willy Tarreau
Post by Tim Düsterhus
Anything I could do to help investigate this? I can apply patches with
additional logging or I can send you the unredacted configuration in
private if that would help.
OK, it's just that for now I can't propose anything; I'm context-switching
far too much between many different things :-(
I fiddled around a bit myself: I found that the http_header_add_tail2 call
in http_res_get_intercept_rule, case ACT_HTTP_SET_HDR (proto_http.c:2934),
returns -1 (indicating failure). This return value is ignored.

Adding the following debug output in http_header_add_tail2:

printf("Buffer Information: size(%d) i(%d)\n", msg->chn->buf->size,
msg->chn->buf->i);
buffer_dump(stderr, msg->chn->buf, 0, msg->chn->buf->i);
printf("Add tail: %d: %s\n", bytes, text);

Yields:

Buffer Information: size(16384) i(16367)
Dumping buffer 0x5654d8047960
data=0x5654d8047974 o=0 i=16367 p=0x5654d8047974
relative: p=0x0000
Dumping contents from byte 0 to byte 16367
*snip*
Add tail: 0: X-XSS-Protection: 1; mode=block

in failure cases. You can see that the buffer is filled to the top.

Decoding the dumped buffer reveals that the buffer contains
significantly more of the HTTP *body* in failure cases than in success
cases.
Post by Willy Tarreau
Post by Tim Düsterhus
But that does not mean that the current
documented behaviour is a good behaviour.
Absolutely. What I'm much more worried about now is that a number of
deployments may already accidentally be relying on this problem and may
discover 400s or 502s the hard way after we fix it :-( So we'll have
to be very careful about this and enumerate what works and what fails
to figure if there's a real risk that people rely on this or not. If so
we may need to add an extra option to relax addition in case of failure
to preserve the current behaviour, which I'd personally hate.
I've outlined above what exactly fails where in my case (i.e. missing
return value checking).

It might make sense to enlarge the rewrite buffer reservation by
default. I can absolutely imagine that people put in a ton of
information in the age of Content-Security-Policy, Expect-{CT,Staple}
and whatnot.
I don't know what issues a too-small buffer for non-rewrites would
cause. Clearly the body must be able to span multiple buffers already,
otherwise I would not be able to send bodies greater than 16kB.
Will it need to allocate more buffers to do the same work, because every
single one is smaller?

Best regards
Tim Düsterhus
Willy Tarreau
2018-05-01 04:28:47 UTC
Hi Tim,
Post by Tim Düsterhus
Willy,
Post by Willy Tarreau
Post by Tim Düsterhus
Anything I could do to help investigate this? I can apply patches with
additional logging or I can send you the unredacted configuration in
private if that would help.
OK, it's just that for now I can't propose anything, I'm context-switching
far too much between many many different stuff :-(
I fiddled a bit by myself: I found that the http_header_add_tail2 in
http_res_get_intercept_rule, case ACT_HTTP_SET_HDR (proto_http.c:2934)
returns -1 (indicating failure). This return value is being ignored.
OK, thanks for confirming this!
Post by Tim Düsterhus
printf("Buffer Information: size(%d) i(%d)\n", msg->chn->buf->size,
msg->chn->buf->i);
buffer_dump(stderr, msg->chn->buf, 0, msg->chn->buf->i);
printf("Add tail: %d: %s\n", bytes, text);
Buffer Information: size(16384) i(16367)
Dumping buffer 0x5654d8047960
data=0x5654d8047974 o=0 i=16367 p=0x5654d8047974
relative: p=0x0000
Dumping contents from byte 0 to byte 16367
*snip*
Add tail: 0: X-XSS-Protection: 1; mode=block
in failure cases. You can see that the buffer is filled to the top.
Decoding the dumped buffer reveals that the buffer contains
significantly more of the HTTP *body* in failure cases than in success
cases.
Yes, that's indeed expected.
Post by Tim Düsterhus
It might make sense to enlarge the rewrite buffer reservation by
default.
We used to have this a long time ago: the maxrewrite value used to
default to half the buffer size. But it caused too many requests to
be rejected, and became absurd when users configured large buffers.
Post by Tim Düsterhus
I can absolutely imagine that people put in a ton of
information in the times of Content-Security-Policy, Expect-{CT,Staple}
and whatnot.
Yes, but even once you add all of those, you hardly reach 1 kB. Most
requests stay way below 1 kB of headers, at least for performance reasons.
Post by Tim Düsterhus
I don't know what issues a too-small buffer for non-rewrites would
cause.
This limits the maximum size of accepted messages (request or response),
which results in *some* requests being rejected due to (for example) very
large cookies. Overall, I'd say that the number of problem reports about
requests or responses rejected due to their size has dropped to almost zero
since we limited the value to 1 kB. A long time ago it used to be a very
common report.
Post by Tim Düsterhus
Clearly the body must be able to span multiple buffers already,
otherwise I would not be able to send bodies greater than 16kB.
Will it need to allocate more buffers to do the same work, because every
single one is smaller?
No, the body is simply streamed, not stored. If however you need to analyse
it (e.g. for a WAF) then you need to configure a larger buffer so that more
of the body can fit.

To give you an idea about how things currently work, when we perform a
recvfrom() call, we decide whether we read up to bufsize or up to
(bufsize-maxrewrite) depending on whether we are forwarding data or
still analysing the message. Thus when we receive a POST, we're not yet
forwarding, so up to 15 kB of headers+body are placed into the buffer,
leaving 1 kB available to add headers.

After the new native internal HTTP representation is implemented, I think
that the need for the maxrewrite will disappear, at the expense of using
one buffer for the headers and another one for the body, but we'll see.

Anyway we need to address the lack of error checking, and I really predict
some breakage here :-/

Willy
Tim Düsterhus
2018-05-01 15:10:19 UTC
Willy,
Post by Willy Tarreau
Post by Tim Düsterhus
It might make sense to enlarge the rewrite buffer reservation by
default.
We used to have this a long time ago, the maxrewrite value used to
default to half the buffer size. But it caused too many requests to
be rejected and became absurd when users configure large buffers.
Post by Tim Düsterhus
I can absolutely imagine that people put in a ton of
information in the times of Content-Security-Policy, Expect-{CT,Staple}
and whatnot.
Yes but even once you put this, you hardly reach 1 kB. Most requests are
way below 1 kB headers, at least for performance reasons.
Just to make sure that we are on the same page: My failure case is not
the requests, but rather the responses.

As another data point: GitHub adds an 812 byte Content-Security-Policy
header to all their responses. In total they send more than 2 kB of
response headers:
HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Encoding: gzip
Content-Security-Policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; connect-src 'self' uploads.github.com status.github.com collector.githubapp.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com wss://live.github.com; font-src assets-cdn.github.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src 'self' data: assets-cdn.github.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; manifest-src 'self'; media-src 'none'; script-src assets-cdn.github.com; style-src 'unsafe-inline' assets-cdn.github.com
Content-Type: text/html; charset=utf-8
Date: Tue, 01 May 2018 15:01:16 GMT
Expect-CT: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
Server: GitHub.com
Set-Cookie: logged_in=no; domain=.github.com; path=/; expires=Sat, 01 May 2038 15:01:16 -0000; secure; HttpOnly
Set-Cookie: _gh_sess=MnNIV1pEWFFXdUp5WkZCWFJDdzBRSlVXblRmTTlsVDlxaXhYWWgxMUVBVHpQYzRtNlg0aEtndTNIS29SNXA2bGFEekYydHYxaFFtd3p3TExyeW1YMG02S1YzTWMrclVxMnMzYnVJdmtIdG5pS3Y1RUtnY0xnVnpna0pITFRsZ3RxTjdQRVVjWVRjV3g3SnhvMkRyM20yVzdtVFNIRVdkYThBWkt3RitZT1QwQ0hsc2hMdXhSUEZhU0RiSlV5TUJ6dWg0UGg3cm1jOERzQldHYlFSNEtxUT09LS1EMmY5R1lidkI4V2FHWWFDaForQXN3PT0%3D--55a310a481cc921162ef33f26c91143493524be7; path=/; secure; HttpOnly
Status: 200 OK
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
Transfer-Encoding: chunked
Vary: X-PJAX
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-GitHub-Request-Id: C776:2053:B3DF8:14BA5E:5AE8813C
X-Request-Id: 96a38711-e5c3-4c75-9df6-6282e43d429e
X-Runtime: 0.046230
X-Runtime-rack: 0.051859
X-XSS-Protection: 1; mode=block
Post by Willy Tarreau
Post by Tim Düsterhus
I don't know what issues a too-small buffer for non-rewrites would
cause.
This limits the maximum size of accepted messages (request or response),
which results in *some* requests to be rejected due to (for example) very
large cookies. Overall, I'd say that the number of problem reports due to
rejected requests or reponses due to their size has dropped to almost zero
since we've limited the value to 1 kB. A long time ago it used to be a very
common report.
Post by Tim Düsterhus
Clearly the body must be able to span multiple buffers already,
otherwise I would not be able to send bodies greater than 16kB.
Will it need to allocate more buffers to do the same work, because every
single one is smaller?
No, the body is simply streamed, not stored. If however you need to analyse
it (eg for a WAF) then you need to configure a larger buffer so that more
of the body can fit.
To give you an idea about how things currently work, when we perform a
recvfrom() call, we decide whether we read up to bufsize or up to
(bufsize-maxrewrite) depending on whether we are forwarding data or
still analysing the message. Thus when we receive a POST, we're not yet
forwarding, so up to 15 kB of headers+body are placed into the buffer,
leaving 1 kB available to add headers.
To make sure I understand that correctly (for the response case):

tune.bufsize:
Maximum size of response headers.

tune.bufsize - tune.maxrewrite:
Maximum supported size of backend-generated response headers.

tune.maxrewrite:
Maximum size of response headers you are guaranteed to be able to add to
a response. You might or might not be able to add more depending on the
speed of the backend sending the body.

Is that correct?
Post by Willy Tarreau
After the new native internal HTTP representation is implemented, I think
that the need for the maxrewrite will disappear, at the expense of using
one buffer for the headers and another one for the body, but we'll see.
Is this planned for 1.9? I'm aware that plans can change :-)
Post by Willy Tarreau
Anyway we need to address the lack of error checking, and I really predict
some breakage here :-/
I'd start off with *logging* when the call fails, for the short term.
Users that see it failing can look into their logs to find out which
knobs to turn.

Best regards
Tim Düsterhus
Willy Tarreau
2018-05-02 09:47:15 UTC
Post by Tim Düsterhus
Willy,
Post by Willy Tarreau
Post by Tim Düsterhus
It might make sense to enlarge the rewrite buffer reservation by
default.
We used to have this a long time ago, the maxrewrite value used to
default to half the buffer size. But it caused too many requests to
be rejected and became absurd when users configure large buffers.
Post by Tim Düsterhus
I can absolutely imagine that people put in a ton of
information in the times of Content-Security-Policy, Expect-{CT,Staple}
and whatnot.
Yes but even once you put this, you hardly reach 1 kB. Most requests are
way below 1 kB headers, at least for performance reasons.
Just to make sure that we are on the same page: My failure case is not
the requests, but rather the responses.
Yes I understood this well, sorry for using the wrong term. When I said
"request" I meant "messages" (in either direction).
Post by Tim Düsterhus
As another data point: GitHub adds a 812 Byte Content-Security-Policy
header to all their responses. In total they send more than 2kB of
Nice one, though I'd argue that sites which do this know that they
are manipulating large contents (it's visible in the config file and
sometimes they are the ones asking to relax the config parsing rules).
So they're also aware of the need to increase maxrewrite.
Post by Tim Düsterhus
Post by Willy Tarreau
Post by Tim Düsterhus
Clearly the body must be able to span multiple buffers already,
otherwise I would not be able to send bodies greater than 16kB.
Will it need to allocate more buffers to do the same work, because every
single one is smaller?
No, the body is simply streamed, not stored. If however you need to analyse
it (eg for a WAF) then you need to configure a larger buffer so that more
of the body can fit.
To give you an idea about how things currently work, when we perform an
recvfrom() call, we decide whether we read up to bufsize or up to
(bufsize-maxrewrite) depending on whether we are forwarding data or
still analysing the message. Thus when we receive a POST, we're not yet
forwarding, so up to 15 kB of headers+body are placed into the buffer,
leaving 1 kB available to add headers.
tune.bufsize: Maximum size of response headers.
tune.bufsize - tune.maxrewrite: Maximum supported size of backend-generated
response headers.
tune.maxrewrite: Maximum size of response headers you are guaranteed to be
able to add to a response. You might or might not be able to add more
depending on the speed of the backend sending the body.
Is that correct?
Absolutely (though it's true both for request and response).
Post by Tim Düsterhus
Post by Willy Tarreau
After the new native internal HTTP representation is implemented, I think
that the need for the maxrewrite will disappear, at the expense of using
one buffer for the headers and another one for the body, but we'll see.
Is this planned for 1.9? I'm aware that plans can change :-)
Yes, but it has ramifications everywhere in the lower layers. You just
made me realize that I wanted to announce the plans a few months ago and
that we were so busy dealing with complex bugs that I completely
forgot to communicate about this :-(
Post by Tim Düsterhus
Post by Willy Tarreau
Anyway we need to address the lack of error checking, and I really predict
some breakage here :-/
I'd start off with *logging* when the call fails, for the short term.
Adding extra logs not related to traffic actually makes debugging much
worse because these logs don't end up in the correct file/storage/whatever.
It's much easier to filter on 400/502 and find all the request elements to
understand what happens than seeing a random error out of any context.
Part of the difficulty here is to properly indicate what was attempted
and failed, in complex configs where it's hard to even enumerate rulesets.

Maybe one solution could be to tag transactions that experienced such a
failure and to put a warning on them, which would be reported in the logs
with a "W" flag in the termination codes. It would not indicate what
failed, obviously, but would very quickly raise awareness of the fact
that there is a problem. Usually thanks to the second flag indicating
the transaction state and to the rest of the log, it's easier to follow
the processing and to figure what went wrong. Later once we add the
ability to reject again such failures, we could add an "F" (for "failure"
to process).

What do you think ?

Willy
Tim Düsterhus
2018-05-02 22:08:37 UTC
Permalink
Willy,
Post by Willy Tarreau
Nice one, though I'd argue that sites which do this know that they
are manipulating large contents (it's visible in the config file and
sometimes they are the ones asking to relax the config parsing rules).
So they're also aware of the need to increase maxrewrite.
Clearly I was not aware that I needed to increase maxrewrite, despite
adding a ton of headers to my responses.
Yes, I could have found out if I read the complete configuration manual.
But the tunables are defined as "Performance Tuning" in the headline. I
don't have performance issues, I handle less than 15 r/s in peak
situations. It did not even occur to me that there could possibly be a
limit to header rewriting. In fact I found out about this whole issue by
accident.
Post by Willy Tarreau
Post by Tim Düsterhus
I'd start off with *logging* when the call fails for the short term.
Adding extra logs not related to traffic actually makes debugging much
worse because these logs don't end up in the correct file/storage/whatever.
It's much easier to filter on 400/502 and find all the request elements to
understand what happens than seeing a random error out of any context.
Part of the difficulty here is to properly indicate what was attempted
and failed, in complex configs where it's hard to even enumerate rulesets.
To me a message like: "Unable to add-header Content-Security-Policy to
response. Possibly the amount of headers exceeds tune.maxrewrite." would
have been more helpful than a random 502 without any further information.
Especially since the issue happens randomly: Sometimes the additional
headers fit by chance. Sometimes they don't. I would start by
investigating the connection to the backend services, not investigating
some random tunables (see my paragraph above). Actually I'm not sure
whether a 502 would even be the correct response. The issue is not with
the backend:
Post by Willy Tarreau
The 502 (Bad Gateway) status code indicates that the server, while
acting as a gateway or proxy, received an *invalid response from an
inbound server* it accessed while attempting to fulfill the request.
(highlighting mine)

After digging into it I might be able to deduce that the addition of the
new `http-response add-header` line caused the issues. But still I would
be none the wiser. I would have to stumble upon the tunable by accident.
Or ask on the list, like I did.
Post by Willy Tarreau
Maybe one solution could be to tag transactions that experienced such a
failure and to put a warning on them, that would be reported in the logs
with a "W" flag in the termination codes. It would not indicate what
failed, obviously, but would very quickly raise awareness of the fact
that there is a problem. Usually thanks to the second flag indicating
the transaction state and to the rest of the log, it's easier to follow
the processing and to figure what went wrong. Later once we add the
ability to reject again such failures, we could add an "F" (for "failure"
to process).
What do you think ?
It clearly would be better than silently failing, but it still would
have wasted a few hours I suspect (again: see above paragraphs).

I want to note at this point that I'm not running haproxy at scale or
with serious monitoring. The haproxy instance I'm experiencing this
issue with is my personal server, not some company or business one. It
runs my mail and some side / hobby projects. My needs or expectations
might be different.

Best regards
Tim Düsterhus
Willy Tarreau
2018-05-03 03:23:33 UTC
Permalink
Post by Tim Düsterhus
Willy,
Post by Willy Tarreau
Nice one, though I'd argue that sites which do this know that they
are manipulating large contents (it's visible in the config file and
sometimes they are the ones asking to relax the config parsing rules).
So they're also aware of the need to increase maxrewrite.
Clearly I was not aware that I needed to increase maxrewrite, despite
adding a ton of headers to my responses.
Yes, I could have found out if I read the complete configuration manual.
But the tunables are defined as "Performance Tuning" in the headline. I
don't have performance issues, I handle less than 15 r/s in peak
situations. It did not even occur to me that there could possibly be a
limit to header rewriting. In fact I found out about this whole issue by
accident.
Oh I can very well imagine. Long gone are the days where it was possible
to read the whole doc in 10 minutes!
Post by Tim Düsterhus
Post by Willy Tarreau
Post by Tim Düsterhus
I'd start off with *logging* when the call fails for the short term.
Adding extra logs not related to traffic actually makes debugging much
worse because these logs don't end up in the correct file/storage/whatever.
It's much easier to filter on 400/502 and find all the request elements to
understand what happens than seeing a random error out of any context.
Part of the difficulty here is to properly indicate what was attempted
and failed, in complex configs where it's hard to even enumerate rulesets.
To me a message like: "Unable to add-header Content-Security-Policy to
response. Possibly the amount of headers exceeds tune.maxrewrite." would
have been more helpful than a random 502 without any further information.
We could possibly emit something like this with a warning level, just like
when a server goes down. However we need to rate-limit it as you don't want
your admin log to grow at 1000/s when you're flooded with large bogus
requests that repeatedly cause this to happen.
Post by Tim Düsterhus
Especially since the issue happens randomly: Sometimes the additional
headers fit by chance. Sometimes they don't. I would start by
investigating the connection to the backend services, not investigating
some random tunables (See my paragraph above.).
Actually when you have an error, the termination flags are the most important
thing to look at as they indicate what was wrong and where/when.
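For what it's worth, a response blocked during header processing typically shows up with a termination state starting with "PH". A hypothetical log line (default `option httplog` layout; all addresses and timings are invented):

```
haproxy[1234]: 192.0.2.1:51000 [03/May/2018:13:30:26.123] fe_main bk_app/srv1 0/0/1/2/3 502 194 - - PH-- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
```

Here "P" indicates the proxy itself aborted the transaction and "H" that it happened while handling response headers; the "Session state at disconnection" section of the configuration manual lists all the flag combinations.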
Post by Tim Düsterhus
Actually I'm not sure
whether a 502 would even be the correct response. The issue is not with
the backend:
Post by Willy Tarreau
The 502 (Bad Gateway) status code indicates that the server, while
acting as a gateway or proxy, received an *invalid response from an
inbound server* it accessed while attempting to fulfill the request.
(highlighting mine)
It could as well, but arguably it could also be said that the frontend
never received a valid response from the backend since this one failed
to transform a valid response into another valid one.
Post by Tim Düsterhus
After digging into it I might be able to deduce that the addition of the
new `http-response add-header` line caused the issues. But still I would
be non the wiser. I would have to stumble upon the tunable by accident.
Or ask on the list, like I did.
Just out of curiosity, what do you check more often, in order of priority,
among:
- stats page
- show info on CLI
- traffic logs
- admin logs
- other

Because in fact that might help figure where such painful failures would
need to be shown (possibly multiple places).
Post by Tim Düsterhus
Post by Willy Tarreau
Maybe one solution could be to tag transactions that experienced such a
failure and to put a warning on them, that would be reported in the logs
with a "W" flag in the termination codes. It would not indicate what
failed, obviously, but would very quickly raise awareness of the fact
that there is a problem. Usually thanks to the second flag indicating
the transaction state and to the rest of the log, it's easier to follow
the processing and to figure what went wrong. Later once we add the
ability to reject again such failures, we could add an "F" (for "failure"
to process).
What do you think ?
It clearly would be better than silently failing, but it still would
have wasted a few hours I suspect (again: see above paragraphs).
I want to note at this point that I'm not running haproxy at scale or
with serious monitoring. The haproxy instance I'm experiencing this
issue with is my personal server, not some company or business one. It
runs my mail and some side / hobby projects. My needs or expectations
might be different.
That's an important point. It's the same for me on 1wt.eu or haproxy.org,
sometimes I figure late that there are errors. Most often it's the component
we forget about because it works and we don't spend time looking at the logs.
Then probably, like me, you're looking at the stats page once in a while,
and only at the logs when stats look suspicious?

We already have "Warnings" columns in the stats page which are unused for
the frontends, we could use it to report a count of such failures. Or we
could add an extra "rewrite" column under "warnings" to report such errors
where they were detected.

Willy
Tim Düsterhus
2018-05-03 13:34:01 UTC
Permalink
Willy,
Post by Willy Tarreau
Post by Tim Düsterhus
To me a message like: "Unable to add-header Content-Security-Policy to
response. Possibly the amount of headers exceeds tune.maxrewrite." would
have been more helpful than a random 502 without any further information.
We could possibly emit something like this with a warning level, just like
when a server goes down. However we need to rate-limit it as you don't want
your admin log to grow at 1000/s when you're flooded with large bogus
requests that repeatedly cause this to happen.
Yes, I agree.
Post by Willy Tarreau
Post by Tim Düsterhus
Especially since the issue happens randomly: Sometimes the additional
headers fit by chance. Sometimes they don't. I would start by
investigating the connection to the backend services, not investigating
some random tunables (See my paragraph above.).
Actually when you have an error, the termination flags are the most important
thing to look at as they indicate what was wrong and where/when.
But still the termination flags do not point me to the *real* issue.
They are relatively coarse grained.
Post by Willy Tarreau
Post by Tim Düsterhus
Actually I'm not sure
whether a 502 would even be the correct response. The issue is not with
the backend:
Post by Willy Tarreau
The 502 (Bad Gateway) status code indicates that the server, while
acting as a gateway or proxy, received an *invalid response from an
inbound server* it accessed while attempting to fulfill the request.
(highlighting mine)
It could as well, but arguably it could also be said that the frontend
never received a valid response from the backend since this one failed
to transform a valid response into another valid one.
Depending on the definition of valid. To me the 502 implies looking into
the backend service first, not into haproxy. But let's not bikeshed
about this.
Post by Willy Tarreau
Post by Tim Düsterhus
After digging into it I might be able to deduce that the addition of the
new `http-response add-header` line caused the issues. But still I would
be non the wiser. I would have to stumble upon the tunable by accident.
Or ask on the list, like I did.
Just out of curiosity, what do you check more often, in order of priority,
- stats page
- show info on CLI
- traffic logs
- admin logs
- other
Because in fact that might help figure where such painful failures would
need to be shown (possibly multiple places).
Primarily munin, because it shows all my services at a glance. Munin
uses the stats socket.

Next would be the syslog [1]. I use the default Debian packaged logging
setup. I think it places both traffic and admin logs into the
same file. I have `log global` in my default section and no specific
logs for frontends / backends.

Last would be the stats page. I use this primarily after reboots to
verify all my backends are properly UP. It's not much use to me for
"long term" information, because I unconditionally reload haproxy after
running certbot renew. Thus my haproxy instance is reloaded once a day.
Too much hassle to pipe in the new certificates via the admin socket.

I don't use any other tools to retrieve information.
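For what it's worth, the counters munin reads can also be inspected by hand. A small sketch follows; the socket path, the field positions and the sample dump are all made up for illustration (a real `show stat` dump has around 80 comma-separated fields).

```shell
# On a live system you would capture the CSV with something like:
#   echo "show stat" | socat stdio /run/haproxy/admin.sock
# Here we use a tiny made-up sample with only four of the fields.
stats='# pxname,svname,scur,eresp
fe_main,FRONTEND,3,0
bk_app,srv1,1,7'

# Report every proxy/server whose eresp counter (failed response
# processing) is non-zero.
printf '%s\n' "$stats" | awk -F, 'NR>1 && $4+0 > 0 {print $1 "/" $2 ": eresp=" $4}'
# -> bk_app/srv1: eresp=7
```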

[1] I'd love to have a proper integration with systemd-journald to have
all my logs in one place. It's pretty annoying, because some things
("Proxy bk_*** started"; [WARNING] 121/202559 (11635) : Reexecuting
Master process) go to systemd-journald (probably printed to stdout /
stderr) and everything else goes into syslog.
Post by Willy Tarreau
Post by Tim Düsterhus
I want to note at this point that I'm not running haproxy at scale or
with serious monitoring. The haproxy instance I'm experiencing this
issue with is my personal server, not some company or business one. It
runs my mail and some side / hobby projects. My needs or expectations
might be different.
That's an important point. It's the same for me on 1wt.eu or haproxy.org,
sometimes I figure late that there are errors. Most often it's the component
we forget about because it works and we don't spend time looking at the logs.
Then probably, like me, you're looking at the stats page once in a while,
and only at the logs when stats look suspicious?
We already have "Warnings" columns in the stats page which are unused for
the frontends, we could use it to report a count of such failures. Or we
could add an extra "rewrite" column under "warnings" to report such errors
where they were detected.
As noted above the stats page is useless to me. Most useful to me would
be something munin could detect, because it would send me a mail.

Actually the first thing I would notice is if haproxy died, because then
my mail does not work either. I put haproxy in front of my Dovecot.
But that's a bit drastic I think. :-)

Best regards
Tim Düsterhus
Willy Tarreau
2018-05-03 15:37:47 UTC
Permalink
Hi Tim,
Post by Tim Düsterhus
Post by Willy Tarreau
Post by Tim Düsterhus
Especially since the issue happens randomly: Sometimes the additional
headers fit by chance. Sometimes they don't. I would start by
investigating the connection to the backend services, not investigating
some random tunables (See my paragraph above.).
Actually when you have an error, the termination flags are the most important
thing to look at as they indicate what was wrong and where/when.
But still the termination flags do not point me to the *real* issue.
They are relatively coarse grained.
If they indicate that an overflow occurred in the request or the response,
and you have the information for each and every request, you may find that
it's quite useful, especially when you have to match this against side
effects affecting these requests. The fact that they are harder to spot
is a different issue.
Post by Tim Düsterhus
Post by Willy Tarreau
Just out of curiosity, what do you check more often, in order of priority,
- stats page
- show info on CLI
- traffic logs
- admin logs
- other
Because in fact that might help figure where such painful failures would
need to be shown (possibly multiple places).
Primarily munin, because it shows all my services at a glance. Munin
uses the stats socket.
OK good. This votes in favor, then, of a per-frontend, per-backend counter
that Munin can check and report on when it increases.
Post by Tim Düsterhus
Next would be the syslog [1]. I use the default Debian packaged logging
set up. I think it places both traffic as well as admin logs into the
same file. I have `log global` in my default section and no specific
logs for frontends / backends.
Last would be the stats page. I use this primarily after reboots to
verify all my backends are properly UP. It's not much use to me for
"long term" information, because I unconditionally reload haproxy after
running certbot renew. Thus my haproxy instance is reloaded once a day.
Too much hassle to pipe in the new certificates via the admin socket.
OK, pretty clear. So in short by having this per-proxy counter, we can
satisfy users like you (via Munin) and me (via the stats page).
Post by Tim Düsterhus
I don't use any other tools to retrieve information.
[1] I'd love to have a proper integration with systemd-journald to have
all my logs in one place. It's pretty annoying, because some things
("Proxy bk_*** started"; [WARNING] 121/202559 (11635) : Reexecuting
Master process) go to systemd-journald (probably printed to stdout /
stderr) and everything else goes into syslog.
Well, you should probably tell that to the guy who instead of learning
how to use a unix-like system decided it was easier to break everything
in it that used to be pretty simple, stable, reliable and clear for 40
years before he forced his crap into almost every Linux distro to the
point of making them even less observable and debuggable than Windows
nowadays :-(

What you have above looks like stderr. The rest are logs. They are for
very different usages, stderr is there to inform you that something went
wrong during a reload operation (that systemd happily hides so that you
believe it was OK but it was not), while the logs are there for future
traffic analysis and troubleshooting.

And the reason journalctl is this slow very likely lies in its original
purpose which is just to log daemons' output during startup (since it was
confiscated by the tools). It's totally unusable for anything like moderate
to high traffic.

Going back to the initial subject, are you interested in seeing if you
can add a warning counter to each frontend/backend, and possibly a rate
limited warning in the logs as well ? I'm willing to help if needed, it's
just that I really cannot take care of this myself, given that I spent
the last 6 months dealing with bugs and various other discussions, almost
not having been able to start to do anything for the next release :-/ So
any help here is welcome as you can guess.

Thanks!
Willy
Tim Düsterhus
2018-05-03 16:00:45 UTC
Permalink
Willy,
Post by Willy Tarreau
Post by Tim Düsterhus
[1] I'd love to have a proper integration with systemd-journald to have
all my logs in one place. It's pretty annoying, because some things
("Proxy bk_*** started"; [WARNING] 121/202559 (11635) : Reexecuting
Master process) go to systemd-journald (probably printed to stdout /
stderr) and everything else goes into syslog.
Well, you should probably tell that to the guy who instead of learning
how to use a unix-like system decided it was easier to break everything
in it that used to be pretty simple, stable, reliable and clear for 40
years before he forced his crap into almost every Linux distro to the
point of making them even less observable and debuggable than Windows
nowadays :-(
Personally I enjoy systemd and take a unit file over an init script all
day. But I'm very well aware of the criticism surrounding it.
Post by Willy Tarreau
What you have above looks like stderr. The rest are logs. They are for
very different usages, stderr is there to inform you that something went
wrong during a reload operation (that systemd happily hides so that you
believe it was OK but it was not), while the logs are there for future
traffic analysis and troubleshooting.
May 3 13:30:26 ### haproxy[21754]: Proxy bk_aaa stopped (FE: 0 conns, BE: 0 conns).
May 3 13:30:26 ### haproxy[21754]: Proxy bk_aaa stopped (FE: 0 conns, BE: 0 conns).
May 3 13:30:26 ### haproxy[21754]: Proxy zzz stopped (FE: 2 conns, BE: 0 conns).
May 3 13:30:26 ### haproxy[21754]: Proxy zzz stopped (FE: 2 conns, BE: 0 conns).
May 3 13:30:26 ### haproxy[14926]: Server bk_xxx/xxx is DOWN, changed from server-state after a reload. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 3 13:30:26 ### haproxy[14926]: Server bk_xxx/xxx is DOWN, changed from server-state after a reload. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 3 13:30:26 ### haproxy[14926]: backend bk_xxx has no server available!
May 3 13:30:26 ### haproxy[14926]: backend bk_xxx has no server available!
May 3 13:30:26 ### haproxy[14926]: Server bk_yyy/yyy is DOWN, changed from server-state after a reload. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 03 13:30:26 ### haproxy[11635]: Proxy bk_xxx started.
May 03 13:30:26 ### haproxy[11635]: Proxy bk_xxx started.
May 03 13:30:26 ### haproxy[11635]: Proxy bk_yyy started.
May 03 13:30:26 ### haproxy[11635]: Proxy bk_yyy started.
May 03 13:30:26 ### haproxy[11635]: Proxy bk_zzz started.
May 03 13:30:26 ### haproxy[11635]: Proxy bk_zzz started.
May 03 13:30:26 ### haproxy[11635]: Proxy aaa started.
May 03 13:30:26 ### haproxy[11635]: Proxy aaa started.
At least the Proxy ... started / stopped messages should go into the
same log.
Post by Willy Tarreau
And the reason journalctl is this slow very likely lies in its original
purpose which is just to log daemons' output during startup (since it was
confiscated by the tools). It's totally unusable for anything like moderate
to high traffic.
Cannot comment on performance for obvious reasons.
Post by Willy Tarreau
Going back to the initial subject, are you interested in seeing if you
can add a warning counter to each frontend/backend, and possibly a rate
limited warning in the logs as well ? I'm willing to help if needed, it's
just that I really cannot take care of this myself, given that I spent
the last 6 months dealing with bugs and various other discussions, almost
not having been able to start to do anything for the next release :-/ So
any help here is welcome as you can guess.
Personally I'd prefer the rate-limited warning over the counter. As
outlined before: a warning counter will probably be incremented for
multiple unrelated reasons in the longer term and thus loses its
usefulness. Having a warning_headers_too_big counter and a
warning_whatever_there_may_be is stupid, no?

I feel that the error counter could / should be re-used for this and
just the log message should be added. My munin already monitors the
error counts. The `eresp` counter seems to fit: "- failure applying
filters to the response.".

Best regards
Tim Düsterhus
Willy Tarreau
2018-05-03 16:18:11 UTC
Permalink
Post by Tim Düsterhus
Post by Willy Tarreau
What you have above looks like stderr. The rest are logs. They are for
very different usages, stderr is there to inform you that something went
wrong during a reload operation (that systemd happily hides so that you
believe it was OK but it was not), while the logs are there for future
traffic analysis and troubleshooting.
May 3 13:30:26 ### haproxy[21754]: Proxy bk_aaa stopped (FE: 0 conns, BE: 0 conns).
May 3 13:30:26 ### haproxy[21754]: Proxy bk_aaa stopped (FE: 0 conns, BE: 0 conns).
May 3 13:30:26 ### haproxy[21754]: Proxy zzz stopped (FE: 2 conns, BE: 0 conns).
May 3 13:30:26 ### haproxy[21754]: Proxy zzz stopped (FE: 2 conns, BE: 0 conns).
May 3 13:30:26 ### haproxy[14926]: Server bk_xxx/xxx is DOWN, changed from server-state after a reload. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 3 13:30:26 ### haproxy[14926]: Server bk_xxx/xxx is DOWN, changed from server-state after a reload. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 3 13:30:26 ### haproxy[14926]: backend bk_xxx has no server available!
May 3 13:30:26 ### haproxy[14926]: backend bk_xxx has no server available!
May 3 13:30:26 ### haproxy[14926]: Server bk_yyy/yyy is DOWN, changed from server-state after a reload. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 03 13:30:26 ### haproxy[11635]: Proxy bk_xxx started.
May 03 13:30:26 ### haproxy[11635]: Proxy bk_xxx started.
May 03 13:30:26 ### haproxy[11635]: Proxy bk_yyy started.
May 03 13:30:26 ### haproxy[11635]: Proxy bk_yyy started.
May 03 13:30:26 ### haproxy[11635]: Proxy bk_zzz started.
May 03 13:30:26 ### haproxy[11635]: Proxy bk_zzz started.
May 03 13:30:26 ### haproxy[11635]: Proxy aaa started.
May 03 13:30:26 ### haproxy[11635]: Proxy aaa started.
At least the Proxy ... started / stopped messages should go into the
same log.
On a regular system, these ones do not even exist because they are mostly
debug messages, which are dumped after the fork, thus which are never
shown unless you're running in foreground mode. On an init system which
requires the daemon to stay in the foreground... you get the debugging
messages on output, and since the daemon confiscates your output, it
sends it to its journal. I *think* (not tried though) that you can hide
them using "-q" on the command line.
Post by Tim Düsterhus
Post by Willy Tarreau
Going back to the initial subject, are you interested in seeing if you
can add a warning counter to each frontend/backend, and possibly a rate
limited warning in the logs as well ? I'm willing to help if needed, it's
just that I really cannot take care of this myself, given that I spent
the last 6 months dealing with bugs and various other discussions, almost
not having been able to start to do anything for the next release :-/ So
any help here is welcome as you can guess.
Personally I'd prefer the rate-limited warning over the counter. As
outlined before: a warning counter will probably be incremented for
multiple unrelated reasons in the longer term and thus loses its
usefulness. Having a warning_headers_too_big counter and a
warning_whatever_there_may_be is stupid, no?
For now we don't have such a warning, so the only reason for logging
it would be this header issue. It's never supposed to happen in theory
as it normally needs to be addressed immediately and ultimately we
should block by default on this. And if later we find another reason
to add a warning, we'll figure if it makes sense to use a different
counter or not.

Also you said yourself that you wouldn't look at the logs first but at
munin first. And munin monitors your stats socket, so logically munin
should report to you increases of this counter found on the stats socket.
Post by Tim Düsterhus
I feel that the error counter could / should be re-used for this and
just the log message should be added.
Except that it's not an error until we block. We detected an error and
decided to let it pass through, which is a warning. It would be an error
if we'd block on it though.
Post by Tim Düsterhus
My munin already monitors the
error counts. The `eresp` counter seems to fit: "- failure applying
filters to the response.".
If you see an error, you have the guarantee that the request or response
was blocked, so definitely here it doesn't fit for the case where you
don't block. And it's very important not to violate such guarantees as
some people really rely on them. For example during forensics after an
intrusion attempt on your systems, you really want to know if the attacker
managed to retrieve something or not.

Willy
