Discussion:
[PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
PiBa-NL
2018-09-15 00:03:45 UTC
Hi List, Willy,

I've created a regtest that checks that the connection counters are kept
properly while concurrent connections are being handled.

I think it could be committed as attached. It takes a few seconds to
run. It currently fails on 1.9-dev2 (it also fails on 1.8.13 with kqueue on
FreeBSD; adding 'nokqueue' on 1.8.13 makes it succeed though..).

I think it might be a good and reproducible test to run.

Or does it need more tweaking? Thoughts appreciated :).

Regards,

PiBa-NL (Pieter)
Willy Tarreau
2018-09-19 05:36:48 UTC
Hi Pieter,
Post by PiBa-NL
Hi List, Willy,
I've created a regtest that checks that the connection counters are kept
properly while concurrent connections are being handled.
I think it could be committed as attached. It takes a few seconds to run. It
currently fails on 1.9-dev2 (it also fails on 1.8.13 with kqueue on FreeBSD;
adding 'nokqueue' on 1.8.13 makes it succeed though..).
I think it might be a good and reproducible test to run.
Or does it need more tweaking? Thoughts appreciated :).
I took some time this morning to give it a test. For now it fails here,
after dumping 2200 lines of not really usable output that I didn't
investigate. From what I'm seeing it seems to moderately stress the
local machine, so it has many reasons for failing (lack of source
ports, improperly tuned conntrack, ulimit, etc.), and it takes far too
long to be usable as a default test, or this one alone will be
enough to discourage anyone from regularly running "make reg-tests".

I think we should create a distinct category for such tests, because
I see some value in it when it's made to reproduce a very specific
class of issues which is very unlikely to trigger unless someone is
working on it. In this case it is not a problem that it dumps a lot
of output, as it will be useful for the person knowing what to look
for there. Probably such tests should be run by hand and dump
their log into a related file. Imagine for example that we would
have this:

$ make reg-tests/heavy/conn-counter-3000-req.log

It would run the test on reg-tests/heavy/conn-counter-3000-req.vtc and
would produce the log into reg-tests/heavy/conn-counter-3000-req.log.
We could use a similar thing to test for select/poll/epoll/kqueue, to
test for timeouts, or race conditions (e.g. "show sess" with threads). This is
very likely something to brainstorm about. You might have other ideas
related to certain issues you faced in the past. Fred is unavailable
this week but I'd be very interested in his opinion on such things.
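In shell terms, such a make target would roughly boil down to this (just
a sketch of the idea; the exact varnishtest invocation is an assumption):

    # run one "heavy" test by hand and keep its full log next to the .vtc
    varnishtest -v reg-tests/heavy/conn-counter-3000-req.vtc \
        > reg-tests/heavy/conn-counter-3000-req.log 2>&1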

Thus for now I'm not applying your patch, but I'm interested in seeing
what can be done with it.

Thanks,
Willy
PiBa-NL
2018-09-19 22:48:35 UTC
Hi Willy,
Post by Willy Tarreau
Hi Pieter,
I took some time this morning to give it a test. For now it fails here,
after dumping 2200 lines of not really usable output that I didn't
investigate. From what I'm seeing it seems to moderately stress the
local machine, so it has many reasons for failing (lack of source
ports, improperly tuned conntrack, ulimit, etc.), and it takes far too
long to be usable as a default test, or this one alone will be
enough to discourage anyone from regularly running "make reg-tests".
The test takes about 5 seconds to run here, which is a bit long if
you get a hundred more similar tests and want to keep tweaking a
development while running tests in between. It wouldn't hurt to run
such a series of longer tests before creating a patch and submitting
it for inclusion in the official git repository, in my opinion, or before
a release. My attempt was to test a bit differently than just looking
for regressions of known fixed bugs: by putting a little load on
haproxy, threads and simultaneous actions 'might' get into
conflicts/locks/stuff which might, by chance, show up. That is why I
chose to go a little higher on the number of round-trips, with an ever
so slightly increasing payload..

For me the test produces about 345 lines of output, as attached, which
seems not too bad (if the test succeeds).. Besides the 2 instances of CLI
output for stats, it seems not that much different from other tests..
And with 1.8.13 on FreeBSD (without kqueue) it succeeds:
#    top TEST ./test/b00000-loadtest.vtc passed (4.800)

Taking into account conntrack and ulimit, would that mean we can never
'reg-test' whether haproxy can really handle like 10000 connections without
issue? Or should the environment be configured by the test? That seems
very tricky at least and would probably be undesirable.. (I just today
figured I could also run the reg-tests on my production box to compare
whether a new build shows issues there that my test-box might not.. I
wouldn't want system settings to be changed by a reg-test run..)
Post by Willy Tarreau
I think we should create a distinct category for such tests
Agreed, which is why I used the currently non-existing '/loadtest/'
folder. If '/heavy/' is better, that's of course alright with me too.
Post by Willy Tarreau
, because
I see some value in it when it's made to reproduce a very specific
class of issues which is very unlikely to trigger unless someone is
working on it. In this case it is not a problem that it dumps a lot
of output, as it will be useful for the person knowing what to look
for there. Probably that such tests should be run by hand and dump
their log into a related file. Imagine for example that we would
$ make reg-tests/heavy/conn-counter-3000-req.log
I'm not exactly sure.. ("make: don't know how to make reg-tests. Stop").
I would still like to have a way to run all 'applicable' tests with 1
command, even if it takes an hour or so to verify haproxy is working
'perfectly'. But abns@ tests, for example, can't work on FreeBSD; they
should not 'fail' there though, perhaps they should get skipped
automatically. Anyhow, that's a question for my other mail-topic (
Post by Willy Tarreau
It would run the test on reg-tests/heavy/conn-counter-3000-req.vtc and
would produce the log into reg-tests/heavy/conn-counter-3000-req.log.
We could use a similar thing to test for select/poll/epoll/kqueue, to
test for timeouts, race conditions (eg show sess in threads). This is
very likely something to brainstorm about. You might have other ideas
related to certain issues you faced in the past. Fred is unavailable
this week but I'd be very interested in his opinion on such things.
Thus for now I'm not applying your patch, but I'm interested in seeing
what can be done with it.
Okay, no problem :) , I'll keep running this particular test myself for
the moment; it 'should' be able to pass normally.. (on my environment
anyhow..)
Post by Willy Tarreau
Thanks,
Willy
Thanks for your comments, and thoughts.

I'm interested in Fred's and anyone else's opinion ;) , and, well, maybe
this particular test-case could be replaced by something simpler/faster
with more or less the same likelihood of catching yet unknown issues..?
Looking forward to reactions :) .

Thanks and regards,

PiBa-NL (Pieter)
Willy Tarreau
2018-09-20 11:56:58 UTC
Hi Pieter,
The test takes about 5 seconds to run here, which is a bit long if you
get a hundred more similar tests and want to keep tweaking a development
while running tests in between. It wouldn't hurt to run such a series of
longer tests before creating a patch and submitting it for inclusion in the
official git repository, in my opinion, or before a release.
Definitely, however those run before a release should be almost 100%
system-agnostic. Having to prepare the system and tune it properly for
the test not to fail is going to cause a headache every time there is
a failure, because it will mean fixing the root cause and re-running the
whole suite, which is precisely what will cause this whole suite not to
be used at all anymore. This is why I'm really picky on the reliability
and the speed of these tests. It should be stupid-proof, with me being
the stupid one (and believe me, when it comes to doing repetitive stuff,
I'm among the stupidest persons you have ever met).
My attempt was to test a bit differently than just looking for regressions
of known fixed bugs: by putting a little load on haproxy, threads and
simultaneous actions 'might' get into conflicts/locks/stuff which might,
by chance, show up. That is why I chose to go a little higher on the number
of round-trips, with an ever so slightly increasing payload..
I really understand the point and I think it is valid to a certain extent.
But that's really not a thing to run by default. And I want to encourage us
(including me) to run reg tests from time to time. If you know that some of
them will take too long, you'll quickly end up avoiding all the ones you can
easily avoid using a single command (e.g. playing with the LEVEL variable, or
not running them at all).
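For instance something like this (the exact level semantics are from
memory and may differ):

    $ LEVEL=1 make reg-tests    # restrict the run to the quickest tests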
For me the test produces about 345 lines of output, as attached, which
seems not too bad (if the test succeeds).
It's already far too much for a user. I should only have to know whether it
works or not, otherwise it hides the output of all the other ones (which is
what happened). We must not have to parse the output to know that we didn't
break anything, we just have to check that it looks normal. Here's what
make reg-tests gives me on 1.8:

***@wtap:haproxy$ time sh make-reg-tests-1.8
# top TEST reg-tests/lua/b00002.vtc passed (0.159)
# top TEST reg-tests/lua/b00001.vtc passed (0.122)
# top TEST reg-tests/lua/b00000.vtc passed (0.110)
# top TEST reg-tests/lua/b00003.vtc passed (0.137)
# top TEST reg-tests/connection/b00000.vtc passed (0.172)
# top TEST reg-tests/server/b00000.vtc passed (0.110)
# top TEST reg-tests/spoe/b00000.vtc passed (0.008)
# top TEST reg-tests/ssl/b00000.vtc passed (0.139)
# top TEST reg-tests/stick-table/b00001.vtc passed (0.110)
# top TEST reg-tests/stick-table/b00000.vtc passed (0.110)
# top TEST reg-tests/log/b00000.vtc passed (0.125)
# top TEST reg-tests/seamless-reload/b00000.vtc passed (0.123)

real 0m1.713s
user 0m0.316s
sys 0m0.068s

As you can see there's no output to parse, it *visually* looks correct.
Besides the 2 instances of CLI output
for stats, it seems not that much different from other tests..
And with 1.8.13 on FreeBSD (without kqueue) it succeeds:
#    top TEST ./test/b00000-loadtest.vtc passed (4.800)
OK, then you get a valid output there; it's here that it's ugly. But we
already spend enough time analysing bugs, and I really refuse to spend extra
time fixing bugs in tests that are supposed to help detect bugs, otherwise
it becomes recursive...
Taking into account conntrack and ulimit, would that mean we can never
'reg-test' whether haproxy can really handle like 10000 connections without
issue?
10k conns definitely is way beyond what you can expect from a non-root
user on a regular shell. I run most of my configs with "maxconn 400"
because that's less than 1024 FDs once you add the extra FDs for
listeners and checks (each proxied connection takes two FDs, one per
side, so 400 connections already account for about 800 of them).
Anything beyond that will depend on the user's setup and becomes
tricky. And in this case it's more a matter of stress-testing the
system, and we can have stress-test procedures or tools (just like we
all do on our respective setups with different tools). It's just that
one has to know in advance that some preparation is needed (typically
killing a possible browser, unloading some modules, checking that
there's still memory left, maybe adding some addresses to the loopback,
etc.). So it's a different approach and it definitely is out of the
scope of automating the detection of potential regressions during
development.
Or should the environment be configured by the test? That seems
very tricky at least and would probably be undesirable..
No, definitely, it would be even worse. For sure those used to running
"curl | sudo bash" will have no problem letting it configure their
system, but I'm not among such people and I appreciate a lot that
my machine works every morning when I want to work :-)
Post by Willy Tarreau
$ make reg-tests/heavy/conn-counter-3000-req.log
I'm not exactly sure.. ("make: don't know how to make reg-tests. Stop"). I
would still like to have a way to run all 'applicable' tests with 1 command,
even if it takes an hour or so to verify haproxy is working 'perfectly'.
One thing is important to keep in mind regarding automated testing: the
tests are *only* useful if they take less cumulated time to detect a bug
than the time it would have taken to detect it oneself. I mean (I'm
exaggerating a bit but you'll get it), if the tool takes 1 minute per build,
100 builds a day, thus roughly 25000 minutes per year, that's roughly 52
work days at 8h/day. I definitely claim that a human will not waste 52 full
days a year to detect a bug, not even to fix it. So there is a reasonable
tradeoff to set. That's also why I'm saying that I'm *not* interested in
tests for already fixed bugs, they only waste test time. Their only purpose
is for backports, because like any reproducer, it helps the stable team to
verify that 1) the bug is indeed present in the stable version and needs
a fix, and 2) that the backport was properly done. But once done, this
test becomes useless and for now I don't have a good solution to propose
to keep them without having to re-run them.

I suspect that we could use sequence numbers for such tests, or maybe
just dates, and have a file somewhere that we update from time to time,
which contains the earliest version that we're going to test in that
category (i.e. "run tests for all bugs fixed since 20170101"). It would
only require a single commit in each maintenance release to bump that
file and say "OK, no need to test these ones anymore, they are fixed".
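The principle could be as dumb as something like this (only a sketch; the
CUTOFF file and the date-prefixed file names are imaginary):

    # only run the reproducers for bugs fixed after the cutoff date
    cutoff=$(cat reg-tests/bugs/CUTOFF)       # e.g. "20170101"
    for t in reg-tests/bugs/*.vtc; do
        d=$(basename "$t" | cut -c1-8)        # assumes <yyyymmdd>-name.vtc
        [ "$d" -ge "$cutoff" ] && varnishtest "$t"
    done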

My interest is in large coverage, functional coverage. We can have
config files making use of 10-15% of the known features at once, which
will fail if any of these features gets broken, but will never ever
fail otherwise. This is useful testing. But it's not as easy to implement
as it seems, because once you factor in the maintenance aspect, you'll
realise that sometimes you have to update the test file to adjust something
related to a minor behaviour change and that it doesn't backport as easily.
But that's where most of the value lies in my opinion.
But abns@ tests, for example, can't work on FreeBSD; they should not 'fail'
there though, perhaps they should get skipped automatically.
Very likely. In fact given that we want *functional* coverage, this means
that either the test is specific to abns and should be skipped on FreeBSD,
or it's generic and makes use of abns because it was convenient there, and
it has to be modified to be portable.
Anyhow, that's a question for my other mail-topic (...)
Thanks for the link, I think I missed this one.
Post by Willy Tarreau
Thus for now I'm not applying your patch, but I'm interested in seeing
what can be done with it.
Okay, no problem :) , I'll keep running this particular test myself for the
moment; it 'should' be able to pass normally.. (on my environment anyhow..)
(...)
I'm interested in Fred's and anyone else's opinion ;) , and, well, maybe this
particular test-case could be replaced by something simpler/faster with
more or less the same likelihood of catching yet unknown issues..? Looking
forward to reactions :) .
Yep. Despite what some people might think, I'm really interested in tests.
I used to write test programs to detect lots of issues on PCs 30 years ago
when I was still a kid, I even managed to detect some fake chips and caches
back then. That might also be why I insist on efficient testing and not just
testing which makes people feel good. I'd really prefer to have only 20
quick tests covering more than 50% of the tricky parts we regularly break
and which people never have any excuse for not running, than 500 tests that
are a pain to run or debug or that constantly report "OK" because they're
related to bugs that were fixed 5 years ago and that are impossible to meet
again unless someone does it on purpose.

However I know that for this to work, we need to create momentum around
tests, process and methodology. If I start by asking that we work on such
efficient tests, we won't ever see anything because each attempt will suffer
from the same failures we already see and will be demotivating. By starting
the way we do right now, we can experiment, test, collect feedback and ideas,
encourage people to use the tool to help developers reproduce a bug in their
environment, etc. Once enough people have experience and a valuable
opinion on what can be done, it will be easier to go further and improve
the situation. At the moment I can say I'm really pleased to see that this
is progressing faster than I would have imagined ;-)

Cheers,
Willy
PiBa-NL
2018-09-20 20:25:28 UTC
Hi Willy,
Post by Willy Tarreau
For me the test produces about 345 lines of output, as attached, which
seems not too bad (if the test succeeds).
It's already far too much for a user.
Well, those 345 lines are when it succeeds while in 'verbose' mode; in
'normal' mode it only produces 1 line of output when successful. Pretty
much all tests produce 100+ lines of 'logging' if they fail for some
reason. From what I've seen, varnishtest either produces a bulk of logging
on a failure, or it only logs the failures. There isn't much in between.
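For example (this is standard varnishtest behaviour as far as I've seen):

    varnishtest some-test.vtc       # 1 line per test when it passes
    varnishtest -v some-test.vtc    # full log of every step, even on success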

As for all the rest of the email, thanks for your elaborate response :).

Regards,

PiBa-NL (Pieter)
PiBa-NL
2018-09-29 22:05:14 UTC
Hi Willy,

I thought let's give those reg-tests another try :) as they're easy to run
and dev3 just came out.
All tests pass on my FreeBSD system except this one; a new reg-test is
attached.

It is pretty much the same test as previously sent, but now with only 4 x 10
connections, which should be fine for conntrack and sysctls (I hope..).
It seems those stats numbers are 'off', or is my expected value not as
fixed as I thought it would be?

Tested with:
HA-Proxy version 1.9-dev3-27010f0 2018/09/29
FreeBSD freebsd11 11.1-RELEASE

Results:
**** h1    0.0 CLI recv|CumConns: 33
**** h1    0.0 CLI recv|CumReq: 65
---- h1    0.0 CLI expect failed ~ "CumConns: 41"

If my 'expect' is correct, would the patch be suitable for inclusion
with the other reg-tests this way?
If you want to rename loadtest to heavytest, or make any other tweaks,
please feel free to do so.

Regards,
PiBa-NL (Pieter)
Post by PiBa-NL
Hi Willy,
Post by Willy Tarreau
For me the test produces about 345 lines of output, as attached, which seems
not too bad (if the test succeeds).
It's already far too much for a user.
Well, those 345 lines are when it succeeds while in 'verbose' mode; in
'normal' mode it only produces 1 line of output when successful.
Pretty much all tests produce 100+ lines of 'logging' if they fail for
some reason. From what I've seen, varnishtest either produces a bulk of
logging on a failure, or it only logs the failures. There isn't much
in between.
As for all the rest of the email, thanks for your elaborate response :).
Regards,
PiBa-NL (Pieter)
Willy Tarreau
2018-09-30 05:46:24 UTC
Hi Pieter,
Post by PiBa-NL
Hi Willy,
I thought let's give those reg-tests another try :) as they're easy to run and
dev3 just came out.
All tests pass on my FreeBSD system except this one; a new reg-test is attached.
It is pretty much the same test as previously sent, but now with only 4 x 10
connections, which should be fine for conntrack and sysctls (I hope..). It
seems those stats numbers are 'off', or is my expected value not as fixed as
I thought it would be?
Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you
spotted a regression that we have to analyse. However, I'd like to merge
the fix before merging the regtest otherwise it will kill the reg-test
feature until we manage to get the issue fixed!

I'm also seeing that you rely on threads, and I think I noticed another test
involving threads. Probably we should have a specific directory for
these ones, which we can disable completely when threads are not enabled,
otherwise this will also destroy tests (and make them extremely slow due
to varnishtest waiting for the timeout if haproxy refuses to parse the
config).

I think that we should think a bit forward based on these tests. We must
not let varnishtest stop on the first error but rather just log it. Then
at the end we could produce a report of successes and failures that would
be easy to diff from the previous (or expected) one. That will be
particularly useful when running the tests on older releases. As an
example, I had to run your test manually on 1.8 because for I-don't-know-
what-reason, the one about the proxy protocol now fails while it used to
work fine last week for the 1.8.14 release. It's a shame that we can't
complete the tests just because one randomly fails.

Thanks,
Willy
Willy Tarreau
2018-09-30 05:56:58 UTC
Post by Willy Tarreau
Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you
spotted a regression that we have to analyse. However, I'd like to merge
the fix before merging the regtest otherwise it will kill the reg-test
feature until we manage to get the issue fixed!
By the way, could you please explain in simple words the issue you've
noticed? I tried to reverse-engineer the vtc file but I don't understand the
details nor what it tries to achieve. When I'm running a simple test
on a simple config, CumConns always matches CumReq, and when
running this test I'm seeing random values there in the output, but I
also see that they are retrieved before all connections are closed, so
I'm not even sure the test is correct :-/

Thanks,
Willy
PiBa-NL
2018-09-30 17:15:59 UTC
Hi Willy,
Post by Willy Tarreau
Post by Willy Tarreau
Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you
spotted a regression that we have to analyse. However, I'd like to merge
the fix before merging the regtest otherwise it will kill the reg-test
feature until we manage to get the issue fixed!
By the way, could you please explain in simple words the issue you've
noticed? I tried to reverse-engineer the vtc file but I don't understand the
details nor what it tries to achieve. When I'm running a simple test
on a simple config, CumConns always matches CumReq, and when
running this test I'm seeing random values there in the output, but I
also see that they are retrieved before all connections are closed
But CurrConns is 0, so connections are (supposed to be?) closed:

**** h1    0.0 CLI recv|CurrConns: 0
**** h1    0.0 CLI recv|CumConns: 27
**** h1    0.0 CLI recv|CumReq: 27
Post by Willy Tarreau
, so
I'm not even sure the test is correct :-/
Thanks,
Willy
What I'm trying to achieve is, well.. testing for regressions that are
not yet known to exist in the current stable version.

So, what this test does in short:
It makes 4 clients simultaneously send a request to a threaded haproxy,
which in turn connects 10x from backend to frontend and then sends the
request to the s1 server. This with the intended purpose of having
several connections started and torn down as fast as haproxy can process
them, while trying to have a high probability of adding/removing items
from lists/counters from different threads, thus possibly creating
problems if some lock/sync isn't done correctly. After firing a few
requests it also verifies the expected counts and results where possible..
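In varnishtest terms, the shape is roughly this (illustrative only, not
the literal vtc; the macro name is an assumption):

    client c1 -connect ${h1_fe1_sock} {
        txreq -url "/"
        rxresp
        expect resp.status == 200
    } -start
    # c2..c4 are declared the same way and run concurrently; the haproxy
    # config then bounces each request 10x from backend to frontend
    # before it finally reaches the s1 server
    client c1 -wait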

History:
I've been bitten a few times with older releases by corruption occurring
inside the POST data when uploading large (500MB+) files to a server
behind haproxy. After a few megabytes had passed correctly, the resulting
file would contain differences from the original when compared; the
upload 'seemed' to succeed though (this was then solved by installing a
newer haproxy build..).. Also, sometimes threads have locked up or
crashed things, or kqueue turned out to behave differently than the
other pollers.. I've been trying to test such things manually but found I
always forget to run some test. This is why I really like the concept of
having a set of defined tests that validate that haproxy is working
'properly' on the OS I run it on.. Also, when some issue I ran into gets
fixed I tend to run -dev builds on my production environment for a
while, and, well, it's nice to know that other functionality still works
as it used to..

When writing this test I initially started with the idea of automatically
testing a large file transfer through haproxy, but then wondered where /
how to keep such a file, so I thought that transferring a 'large' header
with increasing size 'might' trigger a similar condition.. Though in
hindsight that might not actually test the same code paths..

The test I created, with 1-byte growth in the header together with 4000
connections, didn't quite achieve that initial big-file simulation, but
I still thought it ended up being a nice test, so I submitted it a while
back ;) .. Anyhow, haproxy wasn't capable of doing much when dev2 was
tagged, so I wasn't too worried that the test failed at that time.. And you
announced dev2 as such as well, so that was okay. And perhaps the issue
found then would solve itself when further fixes on top of dev2 were
added ;).

Anyhow, with dev3 I hoped all regressions would be fixed, and found this
one still failed on 1.9-dev3. So I tuned the numbers in the previously
submitted regtest down a little to avoid conntrack/sysctl default
limits, while still failing the test 'reliably'.. I'm not sure what
exactly is going on, or how bad it is that these numbers don't match up
anymore.. Maybe it's only the counter that's not updated in a thread-safe
way; perhaps there is a bigger issue lurking with sync points and
whatnot..? Either way, the test should pass as I understand it: the 4
defined varnish clients got their answers back and CurrConns = 0, and
adding a 3-second delay between waiting for the clients and checking the
stats does not fix it... And as you've checked with 1.8, it does pass.
Though that too could perhaps be a coincidence; maybe things are
processed even faster now but in a different order, so the test fails for
the wrong reason?

Hope that makes some sense of my thought process :).

Regards,

PiBa-NL (Pieter)
Willy Tarreau
2018-09-30 18:14:35 UTC
Post by PiBa-NL
Post by Willy Tarreau
on a simple config, CumConns always matches CumReq, and when
running this test I'm seeing random values there in the output, but I
also see that they are retrieved before all connections are closed
**** h1    0.0 CLI recv|CurrConns: 0
**** h1    0.0 CLI recv|CumConns: 27
**** h1    0.0 CLI recv|CumReq: 27
You're totally right, I think I confused CumConns and CurrConns when
looking at the output. With that said, I have no idea what's going on;
I'll have another look.

Thanks,
Willy
PiBa-NL
2018-09-30 17:59:34 UTC
Hi Willy,
Post by Willy Tarreau
Hi Pieter,
Post by PiBa-NL
Hi Willy,
I thought let's give those reg-tests another try :) as they're easy to run and
dev3 just came out.
All tests pass on my FreeBSD system except this one; a new reg-test is attached.
It is pretty much the same test as previously sent, but now with only 4 x 10
connections, which should be fine for conntrack and sysctls (I hope..). It
seems those stats numbers are 'off', or is my expected value not as fixed as
I thought it would be?
Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you
spotted a regression that we have to analyse.
Indeed it works with 1.8, so in that regard I 'think' the test itself is
correct.. Also, when disabling threads, or running only 1 client, it
still works; then both CumConns and CumReq show 11 for the first stats result.
Post by Willy Tarreau
However, I'd like to merge
the fix before merging the regtest otherwise it will kill the reg-test
feature until we manage to get the issue fixed!
I'm not fully sure I agree with that.. While I understand that failing
reg-tests can be a pita while developing (if you run them regularly), the
fact is that currently existing tests can already start to fail
after some major redesign of the code. A few mails back (in a different
mail-thread) I tested like 10 commits in a row and they all suffered from
different failing tests; that would imho not be a reason to remove those
tests, and they didn't stop development.
Post by Willy Tarreau
I'm also seeing that you rely on threads, and I think I noticed another test
involving threads. Probably we should have a specific directory for
these ones, which we can disable completely when threads are not enabled,
otherwise this will also destroy tests (and make them extremely slow due
to varnishtest waiting for the timeout if haproxy refuses to parse the
config).
A specific directory will imho not work. What should it be called?
/threaded_lua_with_ssl_using_kqueue_scheduler_on_freebsd_without_abns_for_haproxy_1.9_and_higher/
?
Having varnishtest fail while waiting for a feature that was not
compiled in is indeed undesirable as well. So some 'smart' way of defining
'requirements' for a test will be needed so they can gracefully skip when
not applicable.. I'm not sure myself how that should look though..
On one hand I think the .vtc itself might be the place to define what
requirements it has; on the other hand, a separate list/script
including the logic of which tests to run could be nice.. But then who is
going to maintain that one..
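Something in the direction of varnishtest's 'feature' command might
already work for the in-.vtc approach, e.g. (a sketch; whether haproxy -vv
prints such a line is an assumption on my side):

    # first line of the .vtc: skip gracefully when threads are missing
    feature cmd "haproxy -vv 2>&1 | grep -q 'multi-threading'"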
Post by Willy Tarreau
I think that we should think a bit forward based on these tests. We must
not let varnishtest stop on the first error but rather just log it.
varnishtest can continue on error with -k.
I'm using this little mytest.sh script at the moment; it runs all tests
and only failed tests produce a lot of logging..:
  haproxy -v
  varnishtest -j 16 -k -t 20 ./work/haproxy-*/reg-tests/*/*.vtc > ./mytest-result.log 2>&1
  varnishtest -j 16 -k -t 20 ./haproxy_test_OK_20180831/*/*.vtc >> ./mytest-result.log 2>&1
  cat ./mytest-result.log
  echo "" >> ./mytest-result.log
  haproxy -vv >> ./mytest-result.log

There is also the -q parameter, but then it no longer logs which
tests passed; only the failed tests will produce 1 log line..
(I do like to log which tests were executed though..)
Post by Willy Tarreau
Then
at the end we could produce a report of successes and failures that would
be easy to diff from the previous (or expected) one. That will be
particularly useful when running the tests on older releases. As an
example, I had to run your test manually on 1.8 because for I-don't-know-
what-reason, the one about the proxy protocol now fails while it used to
work fine last week for the 1.8.14 release. It's a shame that we can't
complete the tests just because one randomly fails.
You can continue the tests (-k), but better write the output to a logfile
then, or perhaps combine it with -l, which leaves the /tmp/.vtc folder behind..
Post by Willy Tarreau
Thanks,
Willy
Regards,
PiBa-NL (Pieter)
Willy Tarreau
2018-09-30 18:22:23 UTC
Post by PiBa-NL
Indeed it works with 1.8, so in that regard I 'think' the test itself is
correct.. Also, when disabling threads, or running only 1 client, it still
works; then both CumConns and CumReq show 11 for the first stats result.
Hmmm for me it fails even without threads. That was the first thing I
tried when meeting the error in fact. But I need to dig deeper.
Post by PiBa-NL
Post by Willy Tarreau
However, I'd like to merge
the fix before merging the regtest otherwise it will kill the reg-test
feature until we manage to get the issue fixed!
I'm not fully sure I agree with that.. While I understand that failing
reg-tests can be a pita while developing (if you run them regularly), the fact
is that currently existing tests can already start to fail after
some major redesign of the code. A few mails back (in a different mail-thread)
I tested like 10 commits in a row and they all suffered from different failing
tests; that would imho not be a reason to remove those tests, and they didn't
stop development.
The reason is that for now we have no way to let the tests fail gracefully
and report what is OK and what is not. So any error that's in the way will
lead to an absolutely certain behaviour from everyone: nobody will run the
tests anymore since the result will be known.

Don't get me wrong, I'm willing to get as many tests as we can, but 1) we
have to be sure these tests only fail for regressions and not for other
reasons, and 2) we must be sure that these tests do not prevent other ones
from being run nor make it impossible to observe the progress on other
ones. We're still at the beginning with reg tests, and as you can see we
have not yet even sorted out the requirements for some of them, like threads
or Lua or whatever else.

I'm just asking that we don't create tests faster than we can sort them
out, that's all. This probably means that we really have to work on these
two main areas which are test prerequisites and synthetic reports of what
worked and what failed.

Ideas and proposals on this are welcome, but to be honest I can't spend
as much time as I'd want on this for now, given how late we are on all
that remains to be done, so I really welcome discussions and help on the
subject between the various actors.

Thanks,
Willy
Willy Tarreau
2018-09-30 18:38:11 UTC
Post by Willy Tarreau
Post by PiBa-NL
Indeed it works with 1.8, so in that regard I 'think' the test itself is
correct.. Also, when disabling threads, or running only 1 client, it still
works; then both CumConns and CumReq show 11 for the first stats result.
Hmmm for me it fails even without threads. That was the first thing I
tried when meeting the error in fact. But I need to dig deeper.
So I'm seeing that in fact the count is correct if the server connection
closes first, and wrong otherwise. In fact it fails similarly for
1.6, 1.7, 1.8 and 1.9, with and without threads. I'm seeing that the
connection count is exactly 10 times the incoming connections while the
request count is exactly 20 times this count. I suspect that what happens
is that the request count is increased on each connection when preparing
to receive a new request. This even slightly reminds me of something, but
I don't know where I noticed it; I think I saw something like this
when reviewing the changes needed to be made to HTTP for the native
internal representation.

So I think it's a minor bug, but not a regression.

Thanks,
Willy
PiBa-NL
2018-09-30 20:16:55 UTC
Hi Willy,
Post by Willy Tarreau
Post by Willy Tarreau
Post by PiBa-NL
Indeed it works with 1.8, so in that regard I 'think' the test itself is
correct.. Also, when disabling threads, or running only 1 client, it still
works; then both CumConns and CumReq show 11 for the first stats result.
Hmmm for me it fails even without threads. That was the first thing I
tried when meeting the error in fact. But I need to dig deeper.
So I'm seeing that in fact the count is correct if the server connection
closes first, and wrong otherwise. In fact it fails similarly for
1.6, 1.7, 1.8 and 1.9, with and without threads. I'm seeing that the
connection count is exactly 10 times the incoming connections while the
request count is exactly 20 times this count. I suspect that what happens
is that the request count is increased on each connection when preparing
to receive a new request. This even slightly reminds me of something, but
I don't know where I noticed it; I think I saw something like this
when reviewing the changes needed to be made to HTTP for the native
internal representation.
So I think it's a minor bug, but not a regression.
Thanks,
Willy
Not sure; the only difference between 100x FAILED and 100x OK is the version
here. Command executed and result below.

Perhaps that's just because of the OS / scheduler used though; I assume
you're using some Linux distro to test with, so perhaps that explains part of
the differences between your results and mine.. In the end it doesn't
matter much: whether it's a bug or a regression, it still needs a fix ;). And,
well, I don't know if it's just the counter that's wrong, or whether there
might be bigger consequences somewhere. If it's just the counter then I guess
it wouldn't hurt much to postpone a fix to a next (dev?) version.

Regards,

PiBa-NL (Pieter)

***@freebsd11:/usr/ports/net/haproxy-devel # varnishtest -q -n 100 -j
16 -k ./haproxy_test_OK_20180831/loadtest/b00000-loadtest.vtc
...
#    top  TEST ./haproxy_test_OK_20180831/loadtest/b00000-loadtest.vtc
FAILED (0.128) exit=2
#    top  TEST ./haproxy_test_OK_20180831/loadtest/b00000-loadtest.vtc
FAILED (0.135) exit=2
100 tests failed, 0 tests skipped, 0 tests passed
***@freebsd11:/usr/ports/net/haproxy-devel # haproxy -v
HA-Proxy version 1.9-dev3-27010f0 2018/09/29
Copyright 2000-2018 Willy Tarreau <***@haproxy.org>

***@freebsd11:/usr/ports/net/haproxy-devel # pkg add -f
haproxy-1.8.14-selfbuild-reg-tests-OK.txz
Installing haproxy-1.8...
package haproxy is already installed, forced install
Extracting haproxy-1.8: 100%
***@freebsd11:/usr/ports/net/haproxy-devel # varnishtest -q -n 100 -j
16 -k ./haproxy_test_OK_20180831/loadtest/b00000-loadtest.vtc
0 tests failed, 0 tests skipped, 100 tests passed
***@freebsd11:/usr/ports/net/haproxy-devel # haproxy -v
HA-Proxy version 1.8.14-52e4d43 2018/09/20
Copyright 2000-2018 Willy Tarreau <***@haproxy.org>
Willy Tarreau
2018-10-01 02:00:27 UTC
Post by PiBa-NL
Hi Willy,
Post by Willy Tarreau
Post by Willy Tarreau
Post by PiBa-NL
Indeed it works with 1.8, so in that regard I 'think' the test itself is
correct.. Also, when disabling threads, or running only 1 client, it still
works; then both CumConns and CumReq show 11 for the first stats result.
Hmmm for me it fails even without threads. That was the first thing I
tried when meeting the error in fact. But I need to dig deeper.
So I'm seeing that in fact the count is correct if the server connection
closes first, and wrong otherwise. In fact it fails similarly for
1.6, 1.7, 1.8 and 1.9, with and without threads. I'm seeing that the
connection count is exactly 10 times the incoming connections while the
request count is exactly 20 times this count. I suspect that what happens
is that the request count is increased on each connection when preparing
to receive a new request. This even slightly reminds me of something, but
I don't know where I noticed it; I think I saw something like this
when reviewing the changes needed to be made to HTTP for the native
internal representation.
So I think it's a minor bug, but not a regression.
Thanks,
Willy
Not sure; the only difference between 100x FAILED and 100x OK is the version
here. Command executed and result below.
Perhaps that's just because of the OS / scheduler used though; I assume you're
using some Linux distro to test with, so perhaps that explains part of the
differences between your results and mine..
I find this strange. Your config contains a comment about the length
converter missing from 1.8, so I had to change it to use the deny part
on 1.8. It happens that using deny here is precisely what fixed the
problem for me the first time. I simplified it this way to run a
manual test:

global
    stats socket /tmp/sock1 mode 666 level admin
    #nbthread 3
    log 127.0.0.1:5514 local0
    #nokqueue

defaults
    mode http
    option dontlog-normal
    log global
    option httplog
    timeout connect 3s
    timeout client 4s
    timeout server 15s

frontend fe1
    bind 127.0.0.1:8001
    acl donelooping hdr(TEST) -m len 10
    http-request set-header TEST "%[hdr(TEST)]x"
    use_backend b2 if donelooping
    default_backend b1

backend b1
    server srv1 127.0.0.1:8001

frontend fe2
    bind 127.0.0.1:8002
    default_backend b2

backend b2
    server srv2 127.0.0.1:8000

Then I tested it this way and got the same results on all versions:

$ echo -e "GET / HTTP/1.1\r\n\r\n"|nc 0 8001
$ echo show info | socat - /tmp/sock1 | grep Cum
CumConns: 11
CumReq: 21
CumSslConns: 0

$ echo -e "GET / HTTP/1.1\r\nconnection: close\r\n\r\n"|nc 0 8001
$ echo show info | socat - /tmp/sock1 | grep Cum
CumConns: 11
CumReq: 11
CumSslConns: 0

Regards,
Willy
PiBa-NL
2018-10-01 21:06:33 UTC
Hi Willy,
Post by Willy Tarreau
Post by PiBa-NL
Hi Willy,
Post by Willy Tarreau
Post by Willy Tarreau
Post by PiBa-NL
Indeed it works with 1.8, so in that regard i 'think' the test itself is
correct.. Also when disabling threads, or running only 1 client, it still
works.. Then both CumConns CumReq show 11 for the first stats result.
Hmmm for me it fails even without threads. That was the first thing I
tried when meeting the error in fact. But I need to dig deeper.
So I'm seeing that in fact the count is correct if the server connection
closes first, and wrong otherwise. In fact it fails similarly both for
1.6, 1.7, 1.8 and 1.9 with and without threads. I'm seeing that the
connection count is exactly 10 times the incoming connections while the
request count is exactly 20 times this count. I suspect that what happens
is that the request count is increased on each connection when preparing
to receive a new request. This even slightly reminds me something but
I don't know where I noticed something like this, I think I saw this
when reviewing the changes needed to be made to HTTP for the native
internal representation.
So I think it's a minor bug, but not a regression.
Thanks,
Willy
Not sure, only difference between 100x FAILED and 100x OK is the version
here. Command executed and result below.
Perhaps that's just because of the OS / Scheduler used though, i assume your
using some linux distro to test with, perhaps that explains part of the
differences between your and my results..
I find this strange. Your config contains a comment about the length
converter missing from 1.8, so I had to change it to use the deny part
on 1.8. It happens that using deny here is precisely what fixed the
problem for me the first time. I simplified it this way to run a
global
    stats socket /tmp/sock1 mode 666 level admin
    #nbthread 3
    log 127.0.0.1:5514 local0
    #nokqueue
defaults
    mode http
    option dontlog-normal
    log global
    option httplog
    timeout connect 3s
    timeout client 4s
    timeout server 15s
frontend fe1
    bind 127.0.0.1:8001
    acl donelooping hdr(TEST) -m len 10
    http-request set-header TEST "%[hdr(TEST)]x"
    use_backend b2 if donelooping
    default_backend b1
backend b1
    server srv1 127.0.0.1:8001
frontend fe2
    bind 127.0.0.1:8002
    default_backend b2
backend b2
    server srv2 127.0.0.1:8000
$ echo -e "GET / HTTP/1.1\r\n\r\n"|nc 0 8001
$ echo show info | socat - /tmp/sock1 | grep Cum
CumConns: 11
CumReq: 21
CumSslConns: 0
$ echo -e "GET / HTTP/1.1\r\nconnection: close\r\n\r\n"|nc 0 8001
$ echo show info | socat - /tmp/sock1 | grep Cum
CumConns: 11
CumReq: 11
CumSslConns: 0
Regards,
Willy
I have used the exact config above, and well, it 'works fine'?? Results
below.. Both types of nc requests sent to 1 running haproxy result in a
proper '21'..

But even in this case the results would be 'predictable', while with my
original test the outcome is 'randomized'..
So despite the fact that I don't know what's really wrong, the environment
it runs on does seem to affect things, and there is more going on than
simply a counter incremented twice.. Also, your results are 'higher'
than expected, while mine are 'lower' than expected.. Perhaps this single
testcase is already showing multiple issues beneath the carpet ;) ?

Regards,
PiBa-NL (Pieter)

# haproxy -v
HA-Proxy version 1.8.14-52e4d43 2018/09/20
Copyright 2000-2018 Willy Tarreau <***@haproxy.org>

# echo -e "GET / HTTP/1.1\r\n\r\n"|nc 0 8001
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Content-Type: text/html

<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
# echo -e "GET / HTTP/1.1\r\nconnection: close\r\n\r\n"|nc 0 8001
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Content-Type: text/html

<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
# echo show info | socat - /tmp/sock1 | grep Cum
CumConns: 21
CumReq: 21
CumSslConns: 0
#
#
# haproxy -v
HA-Proxy version 1.9-dev3-27010f0 2018/09/29
Copyright 2000-2018 Willy Tarreau <***@haproxy.org>

# echo -e "GET / HTTP/1.1\r\n\r\n"|nc 0 8001
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Content-Type: text/html

<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
# echo -e "GET / HTTP/1.1\r\nconnection: close\r\n\r\n"|nc 0 8001
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Content-Type: text/html

<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
# echo show info | socat - /tmp/sock1 | grep Cum
CumConns: 21
CumReq: 21
CumSslConns: 0
#
Willy Tarreau
2018-10-02 01:56:30 UTC
Hi Pieter,
Post by PiBa-NL
I have used the exact config above, and well, it 'works fine'?? Results
below.. Both types of nc requests sent to 1 running haproxy result in a
proper '21'..
OK.
Post by PiBa-NL
But even in this case the results would be 'predictable', while with my
original test the outcome is 'randomized'..
So despite the fact that I don't know what's really wrong, the environment
it runs on does seem to affect things, and there is more going on than
simply a counter incremented twice.. Also, your results are 'higher'
than expected, while mine are 'lower' than expected.. Perhaps this single
testcase is already showing multiple issues beneath the carpet ;) ?
It's very likely :-/ That's why it's important to cut this into pieces
to figure out what it's showing.

Willy
PiBa-NL
2018-10-04 19:43:36 UTC
Hi Willy,
Post by Willy Tarreau
it's important to cut this into pieces
to figure out what it's showing.
Okay; the good thing is, I suppose, that we already have a reproduction of sorts..

What would be the best way to try and get to the bottom of this? Add
debug output to haproxy code? Get some kind of trace? Anything in
particular that would be of interest?

Regards,

PiBa-NL (Pieter)
Willy Tarreau
2018-10-05 03:14:55 UTC
Hi Pieter,
Post by PiBa-NL
Hi Willy,
Post by Willy Tarreau
it's important to cut this into pieces
to figure out what it's showing.
Okay; the good thing is, I suppose, that we already have a reproduction of sorts..
What would be the best way to try and get to the bottom of this? Add debug
output to haproxy code? Get some kind of trace? Anything in particular that
would be of interest?
If you manage to figure out a way to reproduce it with the "connection: close"
header, that's a different bug from what I observed. However, to be honest,
given how much work we still have for the rework of the HTTP engine, for now
I'd rather continue to work on the stuff we still have to do than spend one
week figuring out why this counter is sometimes wrong, so if you or anyone else
wants to dig this one down to the root cause, I would really appreciate it.

Thanks!
Willy
PiBa-NL
2018-10-11 19:31:48 UTC
Hi Willy, William,
Post by Willy Tarreau
it's important to cut this into pieces
to figure out what it's showing.
A little update on this issue, split into 2 pieces.

- Connection and request counters too low when run as a regtest from
varnishtest (bug?)
It turns out that starting haproxy from varnishtest, using -W
master-worker mode, actually creates 2 processes that are handling
traffic. That explains why a large part of the connections isn't seen by
the other haproxy instance and why the stats show too low connection
counts. Bisecting, it seems to start failing with this commit: b3f2be3;
perhaps @William, can you take a look at it? I'm not really sure when this
occurs in a 'real' environment; it doesn't seem to happen when manually
running haproxy -W, but still it's strange that this occurs when
varnishtest is calling haproxy.

- Request counter too high (possibly an improvement request?)
http_end_txn_clean_session(..) is called, which increments the
counter on a finished HTTP response, and I was testing with 2 different
methods for 1.8 and 1.9 due to the length converter missing from 1.8,
which I used in my 1.9 test; that makes the comparison unfair. Sorry I
didn't realize this earlier; I thought they did 'more or less' the same,
but that seems to have been 'less'. Together with the fact that I found
the numbers odd/unexpected, I assumed a bigger problem than it actually
seems to be. Perhaps it's not even that bad: haproxy is 'preparing' for a
second request over the keep-alive connection, if I understand correctly,
which eventually doesn't happen but is counted. Maybe that is a point that
can be improved in a future version if time permits? Or would it even be
expected to behave like that?

Regards,
PiBa-NL (Pieter)
PiBa-NL
2018-10-12 12:29:51 UTC
Hi William,
The attached patch should fix the issue.
The patch works for me, thanks.

Regards,

PiBa-NL (Pieter)
Willy Tarreau
2018-10-12 17:31:07 UTC
Post by PiBa-NL
The attached patch should fix the issue.
The patch works for me, thanks.
Great, patch now merged.

Thanks!
Willy

PiBa-NL
2018-10-01 22:01:07 UTC
Hi Frederic,
Post by Willy Tarreau
Post by PiBa-NL
Hi List, Willy,
Hi Pieter,
Sorry for this late reply.
I am also sorry to tell you that the -D option should not be used anymore
because it has been recently broken. It adds an extra 2s delay on my PC.
There is a 'newer' version of the regtest in one of the other mails (
https://www.mail-archive.com/***@formilux.org/msg31318.html ). It
only runs 4x10 connections and doesn't use -D anymore, and, well, I'm
discussing with Willy some odd behavioral differences between his and my
test results..
Post by Willy Tarreau
Without this option, on my PC the test fails, but after having been
run for about 300ms, due to the regex which does not match:
CumConns (resp. CumReq) never reaches 3001 (resp. 3002).
Note that as this script is highly verbose, it may be required to
increase the varnishtest internal buffer size (see the -b varnishtest
option).
Thanks, I didn't realize that one could have helped; it's probably not
needed anymore with the changed regtest though.
Post by Willy Tarreau
Post by PiBa-NL
I've created a regtest that checks that the connection counters
are kept properly while concurrent connections are being handled.
I think it could be committed as attached. It takes a few seconds to
run. It currently fails on 1.9-dev2 (it also fails on 1.8.13 with kqueue
on FreeBSD; adding 'nokqueue' on 1.8.13 makes it succeed though..).
I think it might be a good and reproducible test to run.
Or does it need more tweaking? Thoughts appreciated :).
haproxy h1 -cli {
    send "show info"
    expect ~ "CumConns: 3001*\\nCumReq: 3002*"
} -wait
That's a nice one; I did try a '.*' regex-like wildcard check but found
that it didn't work. This will certainly help to produce less
repeated/unneeded output in case the test fails, and when it doesn't fail
it will be a bit faster I guess, and it allows more checks on the same set
of stats to take place with little effort.
Post by Willy Tarreau
Note that all the Slg_1 syslog specs are not completely run because
you do not wait for their termination with the "syslog Slg_1 -wait"
command; this is why it does not fail (but it should: the syslog server
does not receive 15 messages).
I didn't intend to check for 15 messages as a valid outcome, but was
looking more for 'some' syslog output when the testcase fails for some
reason.. With the now fairly limited 40 connections it might be feasible
to check the syslog output more properly though. With 3000 connections it
wasn't really nice when syslog was writing everything to the screen. I'll
take another look at that one.

Regards,
PiBa-NL (Pieter)