Discussion:
TCP traffic multiplexing as balance algorithm?
Maik Broemme
2009-05-11 23:57:47 UTC
Hi,

I have a small question. Does anyone know whether it is possible to do
simple traffic multiplexing with HAProxy? Maybe I am missing it somehow,
but I want to ask on the list before creating a patch for it.

To answer the real-world scenario question up front: TCP multiplexing can
be very useful for debugging backend servers or for simple logging and
passive traffic dumping.

There are two main ways of implementing it:

- 1:N (Active / Passive)
- 1:N (Active / Active)

Here active means that the request goes to the destination and the
response goes back to the client; passive means that only the request
goes to the destination. In the configuration it could look like this
('balance multiplex' being the proposed keyword):

listen smtp-filter 127.0.0.1:25
    mode tcp
    balance multiplex
    server smtp1 10.0.0.5:25
    server smtp2 10.0.0.6:25

Active/active would be very hard to implement: TCP stream
synchronisation would be a pain, and I think no one really needs it.
Active/passive, however, would be a very useful feature.
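
The active/passive idea can be tried outside haproxy. What follows is a
minimal asyncio sketch, not haproxy code, assuming Python 3.8 or later.
The function names, the listen port 2525 and the absence of error
handling are illustrative assumptions. Client bytes are copied to both
servers, only the active server's reply goes back to the client, and
whatever the passive server answers is read and thrown away.

```python
import asyncio

ACTIVE = ("10.0.0.5", 25)    # smtp1 from the example above
PASSIVE = ("10.0.0.6", 25)   # smtp2 from the example above


async def pump(reader, writers):
    """Copy bytes from reader to every writer, then half-close them."""
    while data := await reader.read(4096):
        for w in writers:
            w.write(data)
        await asyncio.gather(*(w.drain() for w in writers))
    for w in writers:
        if w.can_write_eof():
            w.write_eof()


async def discard(reader):
    """Read and drop everything the passive server sends."""
    while await reader.read(4096):
        pass


async def handle_client(client_r, client_w, active=ACTIVE, passive=PASSIVE):
    act_r, act_w = await asyncio.open_connection(*active)
    pas_r, pas_w = await asyncio.open_connection(*passive)
    try:
        await asyncio.gather(
            pump(client_r, [act_w, pas_w]),  # request goes to both servers
            pump(act_r, [client_w]),         # only the active reply returns
            discard(pas_r),                  # passive reply is dropped
        )
    finally:
        for w in (client_w, act_w, pas_w):
            w.close()


async def serve(host="127.0.0.1", port=2525):
    """Hypothetical entry point: listen and mirror every connection."""
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```

It could be started with asyncio.run(serve()). A passive server that is
unreachable or slow is not handled here and would affect the client, so
a real implementation would have to decouple the two sides.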

In my environment, developers often need access to real traffic data to
debug the software they are developing (SMTP software in the example
above). Is anyone else missing such functionality? :)

--Maik
Willy Tarreau
2009-05-13 04:33:16 UTC
Hi Maik,
Post by Maik Broemme
Hi,
I have a small question. Does anyone know whether it is possible to do
simple traffic multiplexing with HAProxy? Maybe I am missing it somehow,
but I want to ask on the list before creating a patch for it.
What do you call "traffic multiplexing"? From your description below, I
failed to understand what it consists of.
Post by Maik Broemme
Just to answer the real-world scenario question. TCP multiplexing can be
very useful for debugging backend servers or doing a simple logging and
passive traffic dumping.
- 1:N (Active / Passive)
- 1:N (Active / Active)
Well active means that request is going to destination and response back
to client and passive means that only request is going to the destination.
listen smtp-filter 127.0.0.1:25
mode tcp
balance multiplex
server smtp1 10.0.0.5:25
server smtp2 10.0.0.6:25
The active / active would be very hard to implement, tcp stream
synchronisation would be a pain and I think no one will really need
this, but active / passive is a very useful feature.
In my environment it is often so, that developers need access to real
traffic data to debug (in the example above) their developed smtp
software. Is anyone else missing such functionality? :)
Access to real data is solved with tcpdump or logs; I don't see what
your load-balancing method would bring here.
Post by Maik Broemme
--Maik
Regards,
Willy
Maik Broemme
2009-05-13 08:40:27 UTC
Hi,
Post by Willy Tarreau
Hi Maik,
Post by Maik Broemme
Hi,
I have a small question. Does anyone know whether it is possible to do
simple traffic multiplexing with HAProxy? Maybe I am missing it somehow,
but I want to ask on the list before creating a patch for it.
What do you call "traffic multiplexing"? From your description below, I
failed to understand what it consists of.
By multiplexing I mean traffic duplication: if a listen section contains
multiple server lines, the incoming traffic is sent to all of those
servers.
Post by Willy Tarreau
Post by Maik Broemme
Just to answer the real-world scenario question. TCP multiplexing can be
very useful for debugging backend servers or doing a simple logging and
passive traffic dumping.
- 1:N (Active / Passive)
- 1:N (Active / Active)
Well active means that request is going to destination and response back
to client and passive means that only request is going to the destination.
listen smtp-filter 127.0.0.1:25
mode tcp
balance multiplex
server smtp1 10.0.0.5:25
server smtp2 10.0.0.6:25
The active / active would be very hard to implement, tcp stream
synchronisation would be a pain and I think no one will really need
this, but active / passive is a very useful feature.
In my environment it is often so, that developers need access to real
traffic data to debug (in the example above) their developed smtp
software. Is anyone else missing such functionality? :)
Access to real data is solved with tcpdump or logs, I don't see what
your load-balancing method will bring here.
tcpdump is not perfect in that case, because it has to run the whole time
you want to duplicate the traffic and send it to server1 and server2.
Post by Willy Tarreau
Post by Maik Broemme
--Maik
Regards,
Willy
--Maik
Benoit
2009-05-13 10:38:49 UTC
Post by Maik Broemme
Hi,
Multiplex means traffic duplication. If you have multiple server
configuration options in one listen group, the incoming traffic is
sent to all servers.
Hmm, I'm sorry, but no: multiplexing is not duplication. In fact it's
more like the opposite: it's the process of combining multiple data
streams into one (long description here:
http://en.wikipedia.org/wiki/Multiplexing)
Post by Maik Broemme
tcpdump is not perfect in that case, because it has to run the whole time
you want to duplicate the traffic and send it to server1 and server2.
So let's say you patch haproxy so that it duplicates traffic to two
servers and is able to discard the data from one (the dev/test one) and
keep the data from the other (the prod one).

How will haproxy be able to react to the different responses and timing
from each server?
Let's say you duplicate MX traffic, with the test server being used to
validate a new configuration meant to keep spammers away (simple
example).
Let's say both servers are able to answer at exactly the same time, or
aren't too picky about having the answer before the question. When the
prod server accepts the mail message and starts listening for its main
part, the other one could have rejected it. It will nevertheless receive
the main part without expecting it, which, depending on the
implementation, could lead to trouble, for instance if the originating
mail server tries to send another mail over the same connection.


This is a very simple example and most MX implementations could react
correctly, but that's not the case for everything.
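
To make the divergence concrete, here is a toy simulation in Python. It
is only a sketch: ToySmtp is a hypothetical, drastically simplified
state machine and not a real MTA, and the reply texts are made up. The
same client lines are replayed blindly to a permissive "prod" server and
to a "test" server that rejects the recipient.

```python
class ToySmtp:
    """Tiny SMTP-like state machine (illustrative, not RFC-complete)."""

    def __init__(self, reject_rcpt=False):
        self.reject_rcpt = reject_rcpt
        self.have_rcpt = False
        self.in_data = False
        self.replies = []

    def feed(self, line):
        if self.in_data:
            # Inside the message body only the final "." gets a reply.
            if line == ".":
                self.in_data = False
                self.replies.append("250 queued")
            return
        verb = line.split(" ", 1)[0].upper()
        if verb in ("HELO", "MAIL"):
            self.replies.append("250 ok")
        elif verb == "RCPT":
            if self.reject_rcpt:
                self.replies.append("550 rejected")
            else:
                self.have_rcpt = True
                self.replies.append("250 ok")
        elif verb == "DATA":
            if self.have_rcpt:
                self.in_data = True
                self.replies.append("354 send message")
            else:
                self.replies.append("503 no valid recipients")
        else:
            self.replies.append("500 unknown command")


def replay(lines, *servers):
    """Send every client line to every server, ignoring their replies."""
    for line in lines:
        for server in servers:
            server.feed(line)
    return servers


SESSION = [
    "HELO client.example",
    "MAIL FROM:<a@example.org>",
    "RCPT TO:<b@example.org>",
    "DATA",
    "Subject: hi",
    "hello",
    ".",
]

if __name__ == "__main__":
    prod, test = replay(SESSION, ToySmtp(), ToySmtp(reject_rcpt=True))
    print("prod:", prod.replies)
    print("test:", test.replies)
```

In this toy model the prod server produces five replies and ends in a
clean state, while the test server refuses DATA and then treats each
body line as an unknown command, so the two sessions are out of step.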
Maik Broemme
2009-05-13 11:09:09 UTC
Hi,
Post by Benoit
Post by Maik Broemme
Hi,
Multiplex means traffic duplication. If you have multiple server
configuration options in one listen group, the incoming traffic is
sent to all servers.
Hmm, I'm sorry, but no: multiplexing is not duplication. In fact it's
more like the opposite: it's the process of combining multiple data
streams into one (long description here:
http://en.wikipedia.org/wiki/Multiplexing)
Sorry, multiplexing was the wrong word for it; I am really talking about
duplication.
Post by Benoit
Post by Maik Broemme
tcpdump is not perfect in that case, because it has to run the whole time
you want to duplicate the traffic and send it to server1 and server2.
So let's say you patch haproxy so that it duplicates traffic to two
servers and is able to discard the data from one (the dev/test one) and
keep the data from the other (the prod one).
How will haproxy be able to react to the different responses and timing
from each server?
Let's say you duplicate MX traffic, with the test server being used to
validate a new configuration meant to keep spammers away (simple
example).
Let's say both servers are able to answer at exactly the same time, or
aren't too picky about having the answer before the question. When the
prod server accepts the mail message and starts listening for its main
part, the other one could have rejected it. It will nevertheless receive
the main part without expecting it, which, depending on the
implementation, could lead to trouble, for instance if the originating
mail server tries to send another mail over the same connection.
I have been thinking about the timing issues. My idea is to add an
option to the duplicate balance algorithm, let's say 'async' or 'sync'.
In 'async' mode haproxy will send traffic to both dev and test and only
take care of the response from dev, regardless of whether test responds
or not; a later answer from test will be dropped by haproxy. In 'sync'
mode haproxy will wait until both dev and test have answered and then
send the answer from dev to the client.

In short:

- async will drop everything from test, regardless of answer
  time, and send everything to test regardless of whether it is
  expected or not.

- sync will drop everything from test, but wait until it has answered.
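
The difference between the two modes can be sketched as follows. This is
a hedged illustration using Python's asyncio, not haproxy code:
duplicate, primary and mirror are hypothetical names, where primary
stands for the server whose answer is kept (dev above) and mirror for
the one whose answer is dropped (test above). Both are modelled as
coroutine functions that take the request bytes and return a reply.

```python
import asyncio


async def duplicate(request, primary, mirror, mode="async"):
    """Send request to both backends and return only the primary's reply."""
    # Start the mirror first so both backends work on the request in parallel.
    mirror_task = asyncio.ensure_future(mirror(request))
    reply = await primary(request)
    if mode == "sync":
        # Wait until the mirror has answered (or failed), then drop the result.
        await asyncio.gather(mirror_task, return_exceptions=True)
    else:
        # Do not wait: consume the late answer or error whenever it arrives.
        mirror_task.add_done_callback(
            lambda t: t.cancelled() or t.exception()
        )
    return reply
```

In 'sync' mode the client therefore sees the latency of the slower of
the two servers, while in 'async' mode it only sees the latency of the
primary one.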

There will, for sure, not be many scenarios where you need such a
feature.
Post by Benoit
This is a very simple example and most MX implementations could react
correctly, but that's not the case for everything.
--Maik
Michael Miller
2009-05-13 12:15:06 UTC
Hi,

In your specific case of testing SMTP servers, there is a sendmail milter
that does what you want:
http://www.snertsoft.com/sendmail/roundhouse/

I do not believe that what you are trying to achieve is possible at the
TCP level. haproxy has no knowledge of the application protocol
(e.g. SMTP) running over the transport (TCP). You really need some form
of application-layer proxy to handle the duplication of your requests to
the two servers.

Regards,
Mike