Discussion:
faster than load-server-state-from-file?
Pierre Cheynier
2018-09-21 15:50:19 UTC
I use server-templates extensively to avoid reloading too often, but as far as I know backend creation or deletion still requires a reload. In my specific context, that can happen every 5 to 10 seconds or so.
As a consequence, I have a lot of servers in the server-state file (>30K lines).

I am trying to use load-server-state-from-file to avoid sending traffic to KO servers and to restore the stats numbers, but I feel that it slows down the reload a lot (multiple seconds).

Any known hint or alternative?

Thanks,

Pierre Cheynier
Willy Tarreau
2018-10-02 02:18:17 UTC
Hi Pierre,
Post by Pierre Cheynier
I'm extensively using server-templates to avoid reloading too much but still,
backend creation or deletion has to be done by reloading as far as I know. In
my specific context, it can happen every 5/10s or so.
As a consequence, I have a lot of servers in the server-state file (>30K lines).
Trying to use load-server-state-from-file to prevent sending traffic to KO
servers and restoring stats numbers, I feel that it slows down the reload a
lot (multiple seconds).
Any known hint or alternative?
Not really. Maybe we should see how the state file parser works, because
multiple seconds to parse only 30K lines seems extremely long.

I'm just thinking about a few things. Among these 30K servers, most of
them are probably tracking other ones? In that case it could make sense
to have an option to dump only the servers which are not tracking others,
which would be quite reasonable for a reload. Is this the case for you?

Thanks,
Willy
Pierre Cheynier
2018-10-03 11:56:54 UTC
Hi Willy,
Post by Willy Tarreau
Not really. Maybe we should see how the state file parser works, because
multiple seconds to parse only 30K lines seems extremely long.
I would even say multiple minutes :)
Post by Willy Tarreau
I'm just thinking about a few things. Probably that among these 30K servers,
most of them are in fact tracking other ones ? In this case it could make
sense to have an option to only dump servers which are not tracking
others, as for a reload it can make quite some sense. Is this the case
for you ?
What do you mean by "tracking other ones"?

What I can tell is that, for historical reasons, we named all servers the same way in every backend (i.e. srvN) in the configuration template, and we use "server templates" to add MAINT servers to the pool so that they can be added at runtime later.

This naming can be changed now, but I don't know whether this issue is related or not.

What we basically do when we get a new event:
* if it requires deleting / updating / adding server(s) in one or multiple pools, we only use the runtime API and try to reuse free slots.
* if a backend/frontend has to be created / updated / deleted OR if the free slots for a given backend are all used, we reload using a configuration template.
* in Jinja2 this template looks like (simplified):

backend be_foo
<options>
{%- for server in servers %}
server srv{{loop.index0}} {{server.address}}:{{server.port}} weight {{server.weight}}{%- if server.tls %} ssl{%- endif %} check port 8500
{%- endfor %}
# Create the free slots; the range is inclusive, so servers are numbered from N to N+25 (26 slots)
server-template srv {{ servers|length }}-{{ servers|length + 25 }} 0.0.0.0:0 check disabled

Doing this I noticed that we have a lot of 'bad reconciliations' triggering warning logs, such as:

[WARNING] can't find server 'srv28' with id '29' in backend with id '9' or name 'be_test'
[WARNING] backend name mismatch: from server state file: 'be_foo', from running config 'be_bar'

I don't know if these inconsistencies (that clearly have to be fixed) can cause additional delays.
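
The reload-versus-runtime rule from the bullet list above can be sketched in Python as follows. This is only an illustration under assumptions: the `Event` class, its field names and the `plan_action` helper are hypothetical and are not taken from the actual tooling.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Event:
    """Hypothetical change event; the field names are illustrative only."""
    kind: str                 # "server", "backend" or "frontend"
    backend: str              # backend the event applies to
    servers_to_add: int = 0   # new servers that need a free slot


def plan_action(event: Event, free_slots: Dict[str, int]) -> str:
    """Return "runtime" when the runtime API is enough, else "reload"."""
    # Creating, updating or deleting a backend/frontend needs a reload.
    if event.kind != "server":
        return "reload"
    # Not enough free server-template slots left in this backend: reload.
    if event.servers_to_add > free_slots.get(event.backend, 0):
        return "reload"
    # Otherwise the change fits in existing slots.
    return "runtime"
```

For example, `plan_action(Event("server", "be_foo", 1), {"be_foo": 25})` picks the runtime API, while any event of kind "backend" triggers a reload.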

Thanks,

Pierre
Baptiste
2018-10-08 14:20:06 UTC
Hello gentlemen,

(I'm switching to French, going off the mailing list, and top-posting!!!)

Pierre, I'm already in contact with several other Pierres at Critéo
(is the first name a hiring criterion at your place???)
As the "dev" and "maintainer" of the server state, I'm not surprised
by the slow loading; the extent of the slowness, however, surprises me
a lot.
From memory, it is a list walk based on strcmp, so if you have many
backends which themselves have many servers, it is indeed not very
efficient.
We did it that way because we thought the server state would not be
used "at scale" the way you are using it.

Pierre: how many backends do you have, and how many servers (on average)
per backend, within one single configuration?
There is only one way to get rid of all these warnings, which is to force
the id of the backends and servers in your conf ('id' parameter).
(I plan to stop by Critéo next week, I'll let you know, and we can look
at your problem live.)
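
As an illustration of forcing ids, here is a minimal Python sketch that renders a backend with explicit `id` settings and keeps backend ids stable across reloads. The `assign_ids` and `render_backend` helpers and the persisted `registry` dictionary are assumptions made for this example, not part of HAProxy or of Pierre's tooling, and the sketch leaves out the server-template line.

```python
def assign_ids(names, registry):
    """Give each name a stable numeric id, reusing ids already in registry."""
    next_id = max(registry.values(), default=0) + 1
    for name in names:
        if name not in registry:
            registry[name] = next_id
            next_id += 1
    return registry


def render_backend(name, backend_ids, servers):
    """Render one backend section with a forced id on the backend and servers."""
    lines = [f"backend {name}", f"    id {backend_ids[name]}"]
    for index, (addr, port) in enumerate(servers):
        # Server ids start at 1 here, so srvN gets id N+1.
        lines.append(f"    server srv{index} {addr}:{port} id {index + 1} check")
    return "\n".join(lines)
```

The registry would have to be stored between reloads (for instance in a file) so that a backend keeps the same id when other backends are added or removed.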

Willy: I believe the backends are already stored in ebtrees.
Could we also store the servers in ebtrees to speed up the lookup?
Or better, build a tree whose entry point is "<backend>/<server>"?
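
To make the difference concrete, here is a small Python sketch contrasting a linear walk with a lookup keyed on "<backend>/<server>". It is not HAProxy code: a Python dict (a hash table) stands in for the ebtree, and the entry layout and function names are assumptions chosen for the example.

```python
def find_linear(servers, backend, server):
    """Walk the whole list, comparing names one entry at a time."""
    for entry in servers:
        if entry["backend"] == backend and entry["name"] == server:
            return entry
    return None


def build_index(servers):
    """Index every entry once under a "<backend>/<server>" key."""
    return {f'{e["backend"]}/{e["name"]}': e for e in servers}
```

With N servers in the configuration and one lookup per state-file line, the linear walk performs on the order of N comparisons per line, whereas the index is built once and each lookup is then a single keyed access.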

Baptiste
Willy Tarreau
2018-10-08 17:35:43 UTC
Hi Baptiste.
Post by Baptiste
Bonjour Messieurs,
(je passe en FR et hors ML et je top-poste!!!).
Just for my curiosity, why not answering in english?
He thought he responded privately and excluded the mailing list from
the CC but apparently he was facing an ENOCOFFEE type of error :-)

Cheers,
Willy
Baptiste
2018-10-09 06:33:02 UTC
Post by Willy Tarreau
Hi Baptiste.
Post by Baptiste
Bonjour Messieurs,
(je passe en FR et hors ML et je top-poste!!!).
Just for my curiosity, why not answering in english?
He thought he responded privately and excluded the mailing list from
the CC but apparently he was facing an ENOCOFFEE type of error :-)
Oh yes we all know this error code ;-)
Post by Willy Tarreau
Cheers,
Willy
Regards
Aleks
That's it. Furthermore, Willy, Pierre and I speak the same protocol (the
French language)...
I "switched" to private because I'm asking for some "internal" information
to be able to reproduce the behavior and troubleshoot it.

Sorry for the noise, beers, tea or coffee are on me!

Baptiste
