regtest failure for /log/b00000.vtc, tcp health-check makes and closes a connection to s1 server without valid http-request

Discussion:

PiBa-NL

2018-12-08 21:54:32 UTC

Hi List, Willy,

The regtest /reg-tests/log/b00000.vtc is failing for me, as shown
below and attached:

*** s1    0.0 accepted fd 5 127.0.0.1 29538
**   s1    0.0 === rxreq
---- s1    0.0 HTTP rx failed (fd:5 read: Connection reset by peer)
*** c1    0.0 closing fd 8
**   c1    0.0 Ending
*    top   0.0 RESETTING after ./reg-tests/log/b00000.vtc

This happens because the health-check makes a TCP connection and then
disconnects without sending anything, while the s1 server expects an
HTTP request on the first connection it accepts.
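The failure mode described above can be reproduced outside of varnishtest. The following is a minimal sketch, not the actual test: `serve_one_request`, `tcp_check` and `run_demo` are hypothetical names, the "server" is a plain Python socket standing in for s1, and the abortive close via SO_LINGER is an assumption about how the probe ends the connection (it assumes a POSIX `struct linger` layout).

```python
import socket
import struct
import threading


def serve_one_request(listener, result):
    """Stand-in for s1: accept ONE connection and try to read a request."""
    try:
        conn, _ = listener.accept()
    except OSError:
        # Timeout or aborted accept: the probe never reached the application.
        result["outcome"] = "no connection seen"
        return
    with conn:
        conn.settimeout(5)
        try:
            data = conn.recv(4096)
        except ConnectionResetError:
            result["outcome"] = "reset by peer"
            return
        except OSError:
            result["outcome"] = "read failed"
            return
    result["outcome"] = "got request" if data else "closed without data"


def tcp_check(addr):
    """Bare TCP probe: connect, send nothing, close abortively."""
    sock = socket.create_connection(addr, timeout=5)
    # SO_LINGER enabled with a zero timeout makes close() send an RST
    # instead of a normal FIN (assumes struct linger is two ints).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def run_demo():
    """Run one probe against a one-shot server and report what it saw."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free local port
    listener.listen(1)
    listener.settimeout(5)
    result = {}
    server = threading.Thread(target=serve_one_request, args=(listener, result))
    server.start()
    tcp_check(listener.getsockname())
    server.join(timeout=10)
    listener.close()
    return result.get("outcome", "server still waiting")


if __name__ == "__main__":
    print(run_demo())
```

Whatever the exact outcome on a given system, the one-shot server has spent its only connection on the probe and never receives a request, which is the situation s1 ends up in.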

So to fix this, I propose to apply 1 out of the 3 possible fixes I could
imagine; each one fixes the test when executed.

a- use an s2 server specifically for the TCP health-check
b- use 'option httpchk', and repeat the s1 server twice
c- remove the health-check completely.

I think option C is probably the cleanest and most fail-safe way, and I am
'pretty sure' that the health-check isn't actually needed to reproduce
the original issue. Anyhow, health-checks could be a source of random
test failures: when the system is really slow, it might need 2 checks
during a test, and a normal varnishtest server only processes 1
connection unless specified differently, or when using 's0 -dispatch'.

Or on second (fourth? / last) thought, is there a bug somewhere, as the
TCP health-check 'should' abort the connection even before the
3-way TCP handshake is completed? And as such s1 should not see that
first connection?? (Is that also possible/valid for a FreeBSD system? Or
would that be a Linux trick?)

Regards,

PiBa-NL (Pieter)

Willy Tarreau

2018-12-08 22:49:40 UTC

Hi Pieter,

Post by PiBa-NL
Hi List, Willy,
The regtest /reg-tests/log/b00000.vtc, is failing for me as shown below,
*** s1    0.0 accepted fd 5 127.0.0.1 29538
**   s1    0.0 === rxreq
---- s1    0.0 HTTP rx failed (fd:5 read: Connection reset by peer)
*** c1    0.0 closing fd 8
**   c1    0.0 Ending
*    top   0.0 RESETTING after ./reg-tests/log/b00000.vtc
This happens because the health-check makes a tcp connection, then
disconnects, but s1 server expects a http-request.
So to fix this, i propose to apply 1 out of 3 possible fixes i could
imagine, each one does fix the test when executed.
a- use s2 server specifically for the tcp health-check
b- use a option httpchk, and repeat s1 server twice
c- remove the health-check completely.
I think option C is probably the cleanest and most fail-safe way. And am
'pretty sure' that the health-check isn't actually needed to reproduce the
original issue.

If the purpose of the test is to test logs, we indeed possibly don't need
the health check.

Post by PiBa-NL
Anyhow health-checks could be a source of random
test-failures when the system is really slow it might need 2 checks during a
test, and normal varnishtest server's only processes 1 connection unless
specified differently, or using a 's0 -dispatch'.

I agree, we also discussed this last week when trying to figure what
type of tests we could write for health checks. These are a bit tricky.

Post by PiBa-NL
Or on second (fourth? / last) thought, is there a bug somewhere as the
tcp-health-check 'should' abort the connection even before the
3way-tcp-handshake is completed? And as such s1 should not see that first
connection?? (Is that also possible/valid for a FreeBSD system? Or would
that be a linux trick?)

Hmmm you're absolutely right, I was wondering whether you were seeing
this on your FreeBSD machine :-) Yes indeed, on Linux we can disable TCP
quick-ack and cause an RST to be emitted before the connection completes,
so that the server doesn't see it. But it's also still quite timing
dependent: you only have 40ms to change your mind. This definitely is a
source of random failures when running parallel tests or running them in VMs.
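As an illustration of the quick-ack trick mentioned above, here is a rough sketch. It is not HAProxy's code: `probe_without_completing` is a hypothetical helper, it assumes a POSIX `struct linger` layout, and `socket.TCP_QUICKACK` only exists on Linux, so elsewhere the probe degrades to a plain connect followed by an RST. Whether the server really never sees the connection depends on the kernel and on timing.

```python
import socket
import struct


def probe_without_completing(addr, timeout=1.0):
    """Open and abort a TCP connection, asking Linux to delay the final ACK.

    Returns True if the connect succeeded, False otherwise.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Linux only: with quick-ack disabled, the ACK that would complete
        # the 3-way handshake is delayed instead of being sent immediately.
        quickack = getattr(socket, "TCP_QUICKACK", None)
        if quickack is not None:
            sock.setsockopt(socket.IPPROTO_TCP, quickack, 0)
        sock.connect(addr)
        # Zero-timeout linger: close() emits an RST. If it goes out before
        # the delayed ACK, the listener never gets an established connection.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
        return True
    except OSError:
        return False
    finally:
        sock.close()
```

The return value only says that the port answered; it does not tell whether the peer application noticed the probe, which is why a test relying on this behaviour stays timing dependent.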

Just let me know which patch you prefer me to apply, I'm fine with
your options.

Willy

PiBa-NL

2018-12-08 23:00:24 UTC

Hi Willy,

Post by Willy Tarreau
Hi Pieter,
Just let me know which patch you prefer me to apply, I'm fine with
your options.

The patch I prefer would be 'c', to remove the 'check' from the server
line, as it simply removes all possible check-related 'issues'.

Regards,

PiBa-NL (Pieter)

Willy Tarreau

2018-12-08 23:12:48 UTC

Post by PiBa-NL
The patch i prefer would be 'c', to remove the 'check' from the server line.
As it simply removes all possibly check related 'issues'.

OK now done, thank you!

willy