[Fwd: 302 to 502 error]

Discussion:

Willy Tarreau

2009-07-14 16:26:39 UTC

[forwarding to haproxy-***@public.gmane.org, original address was wrong]

Hi Robbie,

Hi,
i have a problem with the 502 errors. I saw they got better debugging
through the use of the `echo "show errors" | socat stdio socket command`
since 2 versions.
But i never see an error in there, the sess and info work though.
haproxy -v
HA-Proxy version 1.3.18 2009/05/10
GET /go/login/action=logout HTTP/1.1
...
HTTP/1.x 502 Bad Gateway
Cache-Control: no-cache
Connection: close
Content-Type: text/html
Jul 14 13:09:10 localhost haproxy[4634]: clientip:54173
[14/Jul/2009:13:09:10.142] domain domain/webxxx 5/0/1/-1/76 502 8396 - -
PH-- 161/161/144/4/0 0/0 "GET /go/login/action=logout HTTP/1.1"

The -1 above indicates that an error was received before the end of headers.
Let me guess, do you have nginx behind haproxy ? Another user is experiencing
something similar with errors 400 not correctly being forwarded because the
server immediately resets the connection, which translates into an error. It
would help if you could provide a tcpdump capture of the traffic between
haproxy and the backend server. I suspect we'll see an RST immediately after
the 302.

I'd like to be able to reproduce the issue to see if haproxy gets the data or
not. If it gets the data, we might be able to forward them ignoring the error,
but it is very likely that it does not even get the data. However, since you
say that in TCP mode it works, we have a chance to get it to work.

Also, if you can reproduce that on unitary tests, could you please run
"strace -o /tmp/haproxy.trace -p $(pidof haproxy)" and send me the file
/tmp/haproxy.trace ?

It will help me a lot !

Regards,
Willy

Robbie Aelter

2009-07-15 08:09:28 UTC

Permalink

Hi Willy,

Post by Willy Tarreau
Hi Robbie,

We do not have a nginx running, the setup is like this:
- client => haproxy => backendwebserver (apache)

Post by Willy Tarreau
I'd like to be able to reproduce the issue to see if haproxy gets the data or
not. If it gets the data, we might be able to forward them ignoring the error,
but it is very likely that it does not even get the data. However, since you
say that in TCP mode it works, we have a chance to get it to work.
Also, if you can reproduce that on unitary tests, could you please run
"strace -o /tmp/haproxy.trace -p $(pidof haproxy)" and send me the file
/tmp/haproxy.trace ?

I will try to reproduce this today and send you the file.

Post by Willy Tarreau
It will help me a lot !
Regards,
Willy

Kind regards,

Robbie
--
Robbie Aelter
System/Network Engineer

NETLOG NV
Emile Braunplein 18
B-9000 Gent
http://corporate.netlog.com/
Tel +32 2 400 43 21
Fax +32 2 400 43 20

robbie-OrzGVW1Z1xXQT0dZR+***@public.gmane.org
http://netlog.com/robbieaelter
GSM +32 474 933298

Willy Tarreau

2009-07-15 08:27:05 UTC

Permalink

Hi Robbie,

Post by Robbie Aelter

Post by Willy Tarreau
Also, if you can reproduce that on unitary tests, could you please run
"strace -o /tmp/haproxy.trace -p $(pidof haproxy)" and send me the file
/tmp/haproxy.trace ?

I will try to reproduce this today and send you the file.

well, please try the patches from the other mail first, as I
managed to reproduce the issue here and to fix it. If you feel
uneasy with applying patches, you can use the latest snapshot
here instead :

http://haproxy.1wt.eu/download/1.3/src/snapshot/haproxy-1.3.18-patches-20090715.tar.gz

Regards,
Willy

Robbie Aelter

2009-07-15 10:45:51 UTC

Permalink

Hi Willy,

Post by Willy Tarreau
Hi Robbie,

Post by Robbie Aelter

Post by Willy Tarreau
Also, if you can reproduce that on unitary tests, could you please run
"strace -o /tmp/haproxy.trace -p $(pidof haproxy)" and send me the file
/tmp/haproxy.trace ?

I will try to reproduce this today and send you the file.

well, please try the patches from the other mail first, as I
managed to reproduce the issue here and to fix it. If you feel
uneasy with applying patches, you can use the latest snapshot
http://haproxy.1wt.eu/download/1.3/src/snapshot/haproxy-1.3.18-patches-20090715.tar.gz
Regards,
Willy

I applied the 2 patches, 0013 - 0014, if I am correct, no success. But I
noticed some co-workers have no problem at all. I think it is due to the
cookie, that contains illegal stuff.
I'll try to put a webservice on the local box of haproxy to see the
errors, because now i can't see where in the header parsing it goes
wrong. I base myself on the PH-- and the -1 you noticed,
that means that the header parsing contained errors. Is there a way
around this? I'll diff the cookies to check any differences. If you
still want the strace i can provide you this. The tcpdump is a
little more difficult since everything has to happen live, alot of info
flows through. I cannot filter on client ip since haproxy hides it. But
i saw resets on the backend box, when i performend the logout.

# /var/log/apache2# tcpdump -i eth0.2 -l "tcp dst port 80 and
((tcp[tcpflags] & tcp-rst) = tcp-rst)"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0.2, link-type EN10MB (Ethernet), capture size 96 bytes
12:39:07.027292 IP proxy.52835 > web.www: R 3926970814:3926970814(0) ack
970538910 win 227 <nop,nop,timestamp 352042308 285313103>
12:39:07.027360 IP proxy.52835 > web.www: R 3926970814:3926970814(0) win 0

I'll mail you the strace in private.

kind regards,

Robbie
--
Robbie Aelter
System/Network Engineer

NETLOG NV
Emile Braunplein 18
B-9000 Gent
http://corporate.netlog.com/
Tel +32 2 400 43 21
Fax +32 2 400 43 20

robbie-OrzGVW1Z1xXQT0dZR+***@public.gmane.org
http://netlog.com/robbieaelter
GSM +32 474 933298

Willy Tarreau

2009-07-15 12:01:16 UTC

Permalink

Hi Robbie,

Post by Robbie Aelter
Hi Willy,

Post by Willy Tarreau
Hi Robbie,

Post by Robbie Aelter

Post by Willy Tarreau
Also, if you can reproduce that on unitary tests, could you please run
"strace -o /tmp/haproxy.trace -p $(pidof haproxy)" and send me the file
/tmp/haproxy.trace ?

I will try to reproduce this today and send you the file.

well, please try the patches from the other mail first, as I
managed to reproduce the issue here and to fix it. If you feel
uneasy with applying patches, you can use the latest snapshot
http://haproxy.1wt.eu/download/1.3/src/snapshot/haproxy-1.3.18-patches-20090715.tar.gz
Regards,
Willy

I applied the 2 patches, 0013 - 0014, if I am correct, no success. But I
noticed some co-workers have no problem at all. I think it is due to the
cookie, that contains illegal stuff.
I'll try to put a webservice on the local box of haproxy to see the
errors, because now i can't see where in the header parsing it goes
wrong. I base myself on the PH-- and the -1 you noticed,
that means that the header parsing contained errors. Is there a way
around this? I'll diff the cookies to check any differences. If you
still want the strace i can provide you this. The tcpdump is a
little more difficult since everything has to happen live, alot of info
flows through. I cannot filter on client ip since haproxy hides it. But
i saw resets on the backend box, when i performend the logout.
# /var/log/apache2# tcpdump -i eth0.2 -l "tcp dst port 80 and
((tcp[tcpflags] & tcp-rst) = tcp-rst)"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0.2, link-type EN10MB (Ethernet), capture size 96 bytes
12:39:07.027292 IP proxy.52835 > web.www: R 3926970814:3926970814(0) ack
970538910 win 227 <nop,nop,timestamp 352042308 285313103>
12:39:07.027360 IP proxy.52835 > web.www: R 3926970814:3926970814(0) win 0
I'll mail you the strace in private.

OK I've just checked, and you're precisely in the situation I talked
about with Jean-Baptiste, as in your trace, there are a lot of such
things :

epoll_wait(0, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=129, u64=129}}}, 200, 2) = 1
recvfrom(129, 0x2c6ad10, 16384, 16384, 0, 0) = -1 ECONNRESET (Connection reset by peer)
close(129) = 0

=> When the system receives the data, followed by the RST, the socket is
marked in error before haproxy even has a chance to get the data. So
the response is definitely lost between both TCP stacks and haproxy has
no possibility to read it.

In my opinion, this should only happen if you send large requests that
the server does not care to read. For instance, if you perform a redirection
on a large POST request without reading it entirely.

I'll check the rest of your trace to see if I can figure out a specific
pattern which produces the error.

Regards,
Willy

Robbie Aelter

2009-07-15 12:17:08 UTC

Permalink

Willy,

i have found something that could cause the problem, back in the old
days we had to change the apache LimitRequestFieldsize to 16380.
It is huge, but our developers needed it, maybe i could try to up the
bufsize to 32000 since i can have req en resp to half the size.
I got the idea from your previous reply:

/In case you get a bad reply, you can try three things :
- issue a "show errors" with socat on the unix socket
- set the option to ignore bad response headers
- rebuild with larger buffers, in case the response would
not fit.

/

kind regards,

Robbie

Post by Willy Tarreau
Hi Robbie,

Post by Robbie Aelter
Hi Willy,

Post by Willy Tarreau
Hi Robbie,

Post by Robbie Aelter

Post by Willy Tarreau
Also, if you can reproduce that on unitary tests, could you please run
"strace -o /tmp/haproxy.trace -p $(pidof haproxy)" and send me the file
/tmp/haproxy.trace ?

I will try to reproduce this today and send you the file.

well, please try the patches from the other mail first, as I
managed to reproduce the issue here and to fix it. If you feel
uneasy with applying patches, you can use the latest snapshot
http://haproxy.1wt.eu/download/1.3/src/snapshot/haproxy-1.3.18-patches-20090715.tar.gz
Regards,
Willy

I applied the 2 patches, 0013 - 0014, if I am correct, no success. But I
noticed some co-workers have no problem at all. I think it is due to the
cookie, that contains illegal stuff.
I'll try to put a webservice on the local box of haproxy to see the
errors, because now i can't see where in the header parsing it goes
wrong. I base myself on the PH-- and the -1 you noticed,
that means that the header parsing contained errors. Is there a way
around this? I'll diff the cookies to check any differences. If you
still want the strace i can provide you this. The tcpdump is a
little more difficult since everything has to happen live, alot of info
flows through. I cannot filter on client ip since haproxy hides it. But
i saw resets on the backend box, when i performend the logout.
# /var/log/apache2# tcpdump -i eth0.2 -l "tcp dst port 80 and
((tcp[tcpflags]& tcp-rst) = tcp-rst)"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0.2, link-type EN10MB (Ethernet), capture size 96 bytes
12:39:07.027292 IP proxy.52835> web.www: R 3926970814:3926970814(0) ack
970538910 win 227<nop,nop,timestamp 352042308 285313103>
12:39:07.027360 IP proxy.52835> web.www: R 3926970814:3926970814(0) win 0
I'll mail you the strace in private.

OK I've just checked, and you're precisely in the situation I talked
about with Jean-Baptiste, as in your trace, there are a lot of such
epoll_wait(0, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=129, u64=129}}}, 200, 2) = 1
recvfrom(129, 0x2c6ad10, 16384, 16384, 0, 0) = -1 ECONNRESET (Connection reset by peer)
close(129) = 0
=> When the system receives the data, followed by the RST, the socket is
marked in error before haproxy even has a chance to get the data. So
the response is definitely lost between both TCP stacks and haproxy has
no possibility to read it.
In my opinion, this should only happen if you send large requests that
the server does not care to read. For instance, if you perform a redirection
on a large POST request without reading it entirely.
I'll check the rest of your trace to see if I can figure out a specific
pattern which produces the error.
Regards,
Willy

--
Robbie Aelter
System/Network Engineer

NETLOG NV
Emile Braunplein 18
B-9000 Gent
http://corporate.netlog.com/
Tel +32 2 400 43 21
Fax +32 2 400 43 20

robbie-OrzGVW1Z1xXQT0dZR+***@public.gmane.org
http://netlog.com/robbieaelter

Robbie Aelter

2009-07-15 12:45:32 UTC

Permalink

Willy,

the problem seems to be solved, the buffer size was not big enough in
HAproxy, i changed the value in "include/common/defaults.h" as follows:

#ifndef BUFSIZE
#define BUFSIZE 33960
#endif

Compiled again, and it works fine. I'll keep an eye out for other
problems that my occur. But many thanks for the quick responses and
guiding me in right direction.

kind regards,

Robbie

Post by Robbie Aelter
Willy,
i have found something that could cause the problem, back in the old
days we had to change the apache LimitRequestFieldsize to 16380.
It is huge, but our developers needed it, maybe i could try to up the
bufsize to 32000 since i can have req en resp to half the size.
- issue a "show errors" with socat on the unix socket
- set the option to ignore bad response headers
- rebuild with larger buffers, in case the response would
not fit.
/
kind regards,
Robbie

Post by Willy Tarreau
Hi Robbie,

Post by Robbie Aelter
Hi Willy,

Post by Willy Tarreau
Hi Robbie,

Post by Robbie Aelter

Post by Willy Tarreau
Also, if you can reproduce that on unitary tests, could you please run
"strace -o /tmp/haproxy.trace -p $(pidof haproxy)" and send me the file
/tmp/haproxy.trace ?

I will try to reproduce this today and send you the file.

well, please try the patches from the other mail first, as I
managed to reproduce the issue here and to fix it. If you feel
uneasy with applying patches, you can use the latest snapshot
http://haproxy.1wt.eu/download/1.3/src/snapshot/haproxy-1.3.18-patches-20090715.tar.gz
Regards,
Willy

I applied the 2 patches, 0013 - 0014, if I am correct, no success. But I
noticed some co-workers have no problem at all. I think it is due to the
cookie, that contains illegal stuff.
I'll try to put a webservice on the local box of haproxy to see the
errors, because now i can't see where in the header parsing it goes
wrong. I base myself on the PH-- and the -1 you noticed,
that means that the header parsing contained errors. Is there a way
around this? I'll diff the cookies to check any differences. If you
still want the strace i can provide you this. The tcpdump is a
little more difficult since everything has to happen live, alot of info
flows through. I cannot filter on client ip since haproxy hides it. But
i saw resets on the backend box, when i performend the logout.
# /var/log/apache2# tcpdump -i eth0.2 -l "tcp dst port 80 and
((tcp[tcpflags]& tcp-rst) = tcp-rst)"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0.2, link-type EN10MB (Ethernet), capture size 96 bytes
12:39:07.027292 IP proxy.52835> web.www: R 3926970814:3926970814(0) ack
970538910 win 227<nop,nop,timestamp 352042308 285313103>
12:39:07.027360 IP proxy.52835> web.www: R 3926970814:3926970814(0) win 0
I'll mail you the strace in private.

OK I've just checked, and you're precisely in the situation I talked
about with Jean-Baptiste, as in your trace, there are a lot of such
epoll_wait(0, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=129, u64=129}}}, 200, 2) = 1
recvfrom(129, 0x2c6ad10, 16384, 16384, 0, 0) = -1 ECONNRESET
(Connection reset by peer)
close(129) = 0
=> When the system receives the data, followed by the RST, the socket is
marked in error before haproxy even has a chance to get the data. So
the response is definitely lost between both TCP stacks and haproxy has
no possibility to read it.
In my opinion, this should only happen if you send large requests that
the server does not care to read. For instance, if you perform a redirection
on a large POST request without reading it entirely.
I'll check the rest of your trace to see if I can figure out a specific
pattern which produces the error.
Regards,
Willy

Willy Tarreau

2009-07-15 12:54:01 UTC

Permalink

Post by Robbie Aelter
Willy,
the problem seems to be solved, the buffer size was not big enough in
#ifndef BUFSIZE
#define BUFSIZE 33960
#endif
Compiled again, and it works fine. I'll keep an eye out for other
problems that my occur. But many thanks for the quick responses and
guiding me in right direction.

Cool. However you can simply build it with that option on the command
line. You can also change the MAXREWRITE value which reserves a part
of the buffer for header rewriting. I generally set MAXREWRITE to
around 1 kB, which is quite enough. In your case, maybe you can build
with 20 kB in BUFSIZE and 1 kB in MAXREWRITE, that way :

$ make TARGET=xxx SMALL_OPTS="-DBUFSIZE=20480 -DMAXREWRITE=1024"

Most likely even just changing MAXREWRITE will be enough.

Thanks for the heads up anyway !

Regards,
Willy