Discussion:
SSL handshake failure
Shawn Heisey
2014-09-09 21:47:30 UTC
I do not think this is a problem with haproxy (running 1.5.4), but I'm
hoping haproxy can help me debug it.

When I get SSL handshake failure, can haproxy be configured to log debug
messages about WHY it failed? We don't have any visibility into the
client -- it's at a customer site in Japan, I'm in the US.

There is another question, but it's on an unrelated product. I've got
the latest version of Wireshark (1.12.0), configured with my
certificate's private key for SSL decrypting. The problem is that
Wireshark is telling me that there is something wrong with the TLSv1
frames ("Ignored Unknown Record"). I do not have decrypted responses,
only decrypted requests, and I assume that is because of those TLSv1
problems. The question: Is wireshark buggy, or are those TLSv1 frames
actually problematic? The program was compiled against
openssl-0.9.8e-27.el5_10.1 and it's running on a system with
openssl-0.9.8e-7.el5 installed -- the production systems don't have a
compiler or dev libraries installed, and when I attempted to install
them, yum wouldn't work.

If I force haproxy to use sslv3, then wireshark can decrypt the packets
properly (when checked with a browser), but then our testing tools can't
connect to it.

Thanks,
Shawn
Willy Tarreau
2014-09-10 05:45:22 UTC
Hi Shawn,
Post by Shawn Heisey
I do not think this is a problem with haproxy (running 1.5.4), but I'm
hoping haproxy can help me debug it.
When I get SSL handshake failure, can haproxy be configured to log debug
messages about WHY it failed?
Normally it will log one line indicating that a handshake has failed, with
a cause when it could determine it. The following SSL errors are diagnosed
and logged (from include/proto/connection.h):

case CO_ER_SSL_EMPTY: return "Connection closed during SSL handshake";
case CO_ER_SSL_ABORT: return "Connection error during SSL handshake";
case CO_ER_SSL_TIMEOUT: return "Timeout during SSL handshake";
case CO_ER_SSL_TOO_MANY: return "Too many SSL connections";
case CO_ER_SSL_NO_MEM: return "Out of memory when initializing an SSL connection";
case CO_ER_SSL_RENEG: return "Rejected a client-initiated SSL renegociation attempt";
case CO_ER_SSL_CA_FAIL: return "SSL client CA chain cannot be verified";
case CO_ER_SSL_CRT_FAIL: return "SSL client certificate not trusted";
case CO_ER_SSL_HANDSHAKE: return "SSL handshake failure";
case CO_ER_SSL_HANDSHAKE_HB: return "SSL handshake failure after heartbeat";
case CO_ER_SSL_KILLED_HB: return "Stopped a TLSv1 heartbeat attack (CVE-2014-0160)";
case CO_ER_SSL_NO_TARGET: return "Attempt to use SSL on an unknown target (internal error)";
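To see which of these causes dominates in a log file, something like the following Python sketch could tally them. The reason strings are copied from the list above; the function name `tally_ssl_errors` is a hypothetical helper, and the code only assumes that each log line contains one of the reason strings somewhere in its text, not any particular log format.

```python
from collections import Counter

# Reason strings copied verbatim from the list above
# (include/proto/connection.h), including the original spelling.
SSL_REASONS = [
    "Connection closed during SSL handshake",
    "Connection error during SSL handshake",
    "Timeout during SSL handshake",
    "Too many SSL connections",
    "Out of memory when initializing an SSL connection",
    "Rejected a client-initiated SSL renegociation attempt",
    "SSL client CA chain cannot be verified",
    "SSL client certificate not trusted",
    "SSL handshake failure",
    "SSL handshake failure after heartbeat",
    "Stopped a TLSv1 heartbeat attack (CVE-2014-0160)",
    "Attempt to use SSL on an unknown target (internal error)",
]

# Try longer strings first so that "SSL handshake failure after heartbeat"
# is not counted as the shorter "SSL handshake failure".
_BY_LENGTH = sorted(SSL_REASONS, key=len, reverse=True)


def tally_ssl_errors(lines):
    """Count how often each known SSL error reason appears in the lines."""
    counts = Counter()
    for line in lines:
        for reason in _BY_LENGTH:
            if reason in line:
                counts[reason] += 1
                break  # at most one reason per log line
    return counts
```

A result dominated by the generic "SSL handshake failure" would point at a negotiation problem, while "Timeout during SSL handshake" or "Connection closed during SSL handshake" would suggest the client gave up or went away.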
Post by Shawn Heisey
We don't have any visibility into the
client -- it's at a customer site in Japan, I'm in the US.
If you only get the "SSL handshake failure" message in the logs, it's very
likely that it was not possible to agree on a cipher. A sniffer in the middle
will help you diagnose it, I guess.
Post by Shawn Heisey
There is another question, but it's on an unrelated product. I've got
the latest version of Wireshark (1.12.0), configured with my
certificate's private key for SSL decrypting. The problem is that
Wireshark is telling me that there is something wrong with the TLSv1
frames ("Ignored Unknown Record"). I do not have decrypted responses,
only decrypted requests, and I assume that is because of those TLSv1
problems. The question: Is wireshark buggy, or are those TLSv1 frames
actually problematic? The program was compiled against
openssl-0.9.8e-27.el5_10.1 and it's running on a system with
openssl-0.9.8e-7.el5 installed -- the production systems don't have a
compiler or dev libraries installed, and when I attempted to install
them, yum wouldn't work.
It is possible that the more recent openssl lib above defined a few extra
fields that are not supported by the older one used at runtime, resulting
in undefined behaviour. If you cannot upgrade the production version, I
suggest that instead you rebuild haproxy with a static openssl lib, ideally
with the same version first, then with a more recent one (e.g. 1.0.1h) if it
continues to fail.
Post by Shawn Heisey
If I force haproxy to use sslv3, then wireshark can decrypt the packets
properly (when checked with a browser), but then our testing tools can't
connect to it.
TLS supports a large number of extensions, so it is possible that wireshark
doesn't know them all. But it's also possible that some garbage is being
sent due to the incompatibility between the build and runtime version,
which would explain why Wireshark cannot decrypt it.

Regards,
Willy
Shawn Heisey
2014-09-10 18:20:00 UTC
Post by Willy Tarreau
It is possible that the more recent openssl lib above defined a few extra
fields that are not supported by the older one used at runtime, resulting
in undefined behaviour. If you cannot upgrade the production version, I
suggest that instead you rebuild haproxy with a static openssl lib, ideally
with the same version first, then with a more recent one (eg: 1.0.1h) if it
continues to fail.
Post by Shawn Heisey
If I force haproxy to use sslv3, then wireshark can decrypt the packets
properly (when checked with a browser), but then our testing tools can't
connect to it.
TLS supports a wide number of extensions, it is possible that wireshark
doesn't know them all. But it's also possible that some garbage is being
sent due to the incompatibility between the build and runtime version,
which would explain why Wireshark cannot decrypt it.
I managed to get a dev environment on the server where I'm running it
and get it recompiled. Unfortunately that didn't change the info in the
capture, so I'm guessing that it's something that wireshark just can't
deal with yet. The request that shows up in the capture with "Ignored
Unknown Record" on the TLSv1 packets worked perfectly from a browser --
it was a request for a jpg image.

This means that if I can't force sslv3, I won't be able to fully decode
the packet capture. I *can* see the requests, which may be enough.

I'd really like to completely reload these with an OS newer than CentOS
5. I don't know what the linux2628 target has that's not in linux26,
but I bet it's pretty nice.

Thanks,
Shawn
Willy Tarreau
2014-09-10 21:00:23 UTC
Post by Shawn Heisey
Post by Willy Tarreau
It is possible that the more recent openssl lib above defined a few extra
fields that are not supported by the older one used at runtime, resulting
in undefined behaviour. If you cannot upgrade the production version, I
suggest that instead you rebuild haproxy with a static openssl lib, ideally
with the same version first, then with a more recent one (eg: 1.0.1h) if it
continues to fail.
Post by Shawn Heisey
If I force haproxy to use sslv3, then wireshark can decrypt the packets
properly (when checked with a browser), but then our testing tools can't
connect to it.
TLS supports a wide number of extensions, it is possible that wireshark
doesn't know them all. But it's also possible that some garbage is being
sent due to the incompatibility between the build and runtime version,
which would explain why Wireshark cannot decrypt it.
I managed to get a dev environment on the server where I'm running it
and get it recompiled. Unfortunately that didn't change the info in the
capture, so I'm guessing that it's something that wireshark just can't
deal with yet. The request that shows up in the capture with "Ignored
Unknown Record" on the TLSv1 packets worked perfectly from a browser --
it was a request for a jpg image.
This means that if I can't force sslv3, I won't be able to fully decode
the packet capture. I *can* see the requests, which may be enough.
Could you try with the same version for building and runtime? Also,
can you try with a recent openssl version in this dev environment?
I'm not dismissing 0.9.8 which must work of course, but since you're
having two different versions, we cannot rule out a problem there.
Post by Shawn Heisey
I'd really like to completely reload these with an OS newer than CentOS
5. I don't know what the linux2628 target has that's not in linux26,
but I bet it's pretty nice.
You'll see them in the makefile: splice(), accept4(), cpu affinity,
transparent proxy. That's already nice indeed :-) But that's not
compatible with RHEL5 which comes with a heavily patched 2.6.18.

Willy
Shawn Heisey
2014-09-11 01:09:13 UTC
Post by Willy Tarreau
having two different versions, we cannot rule out a problem there.
I did manage to do that. My captures (of my test requests) don't show an
improvement in wireshark's ability to decrypt.

I suspect that the actual handshake problem with the customer is on their
end. The certificate we were using in production was expired and had the
wrong host name in the subject, so we got a new one with the correct name.
They couldn't connect to that either. I now have placed that expired and
incorrect cert in haproxy's configuration, and I bet they'll be able to
connect to it now. I think their client is probably very stupid.

Because they're in Japan, it takes pretty much a full day for every little
tweak we make to get tested. I hope we can get a more interactive testing
session going.

Thanks,
Shawn
Willy Tarreau
2014-09-11 05:43:19 UTC
Post by Shawn Heisey
Post by Willy Tarreau
having two different versions, we cannot rule out a problem there.
I did manage to do that. My captures (of my test requests) don't show an
improvement in wireshark's ability to decrypt.
I suspect that the actual handshake problem with the customer is on their
end. The certificate we were using in production was expired and had the
wrong host name in the subject, so we got a new one with the correct name.
They couldn't connect to that either. I now have placed that expired and
incorrect cert in haproxy's configuration, and I bet they'll be able to
connect to it now. I think their client is probably very stupid.
It is also possible that they have stored locally a copy of your old cert
or maybe they have your CA's certs and you changed to a new CA to sign this
new cert.

If they're willing to do some tests, you could probably use openssl itself
to wait for an incoming connection. Check "openssl s_server -accept <port>"
for this. You'd get a lot more detailed info from the openssl tool itself.
Maybe you'll discover that their client is old and cannot negotiate a key of
the size you're using for example. Next step could be to have them connect
to your openssl using "openssl s_client -connect" and see on their side if
anything looks wrong.
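
If running the openssl tool on the client side is not practical, a rough Python analogue of the "s_client" check could look like the sketch below. It is only an illustration: `probe_handshake` is a hypothetical helper, certificate verification is deliberately disabled so that an expired or misnamed cert still yields a handshake result, and a modern Python will not offer the old protocol versions a 2003-era client might use, so it tests the server rather than reproducing that client.

```python
import socket
import ssl


def probe_handshake(host, port, timeout=5.0):
    """Try a TLS handshake and report what was negotiated or where it failed."""
    ctx = ssl.create_default_context()
    # Diagnostic use only: accept any certificate so the handshake outcome
    # is visible even when the cert is expired or has the wrong name.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # TCP-level failure: refused, unreachable, or timed out.
        return {"ok": False, "stage": "connect", "error": str(exc)}

    try:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            name, _protocol, bits = tls.cipher()
            return {"ok": True, "version": tls.version(),
                    "cipher": name, "bits": bits}
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return {"ok": False, "stage": "handshake", "error": str(exc)}
    finally:
        raw.close()
```

A result with stage "handshake" and an error mentioning a protocol or cipher mismatch would support the idea that the two ends cannot agree on parameters; a "connect" failure points at the network instead.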
Post by Shawn Heisey
Because they're in Japan, it takes pretty much a full day for every little
tweak we make to get tested. I hope we can get a more interactive testing
session going.
Yeah I know this situation as well. On the other hand you don't waste as much
time as you think because it forces you to prepare everything and review
everything before going to sleep, and generally you can spot a number of
issues on your side.

Willy
Shawn Heisey
2014-09-11 13:16:45 UTC
Post by Willy Tarreau
It is also possible that they have stored locally a copy of your old cert
or maybe they have your CA's certs and you changed to a new CA to sign this
new cert.
It's the same CA and intermediate cert. We suspect that they have
configured it to only validate a specific certificate -- the broken one.
Post by Willy Tarreau
If they're willing to do some tests, you could probably use openssl itself
to wait for an incoming connection. Check "openssl s_server -accept <port>"
for this. You'd get a lot more detailed info from the openssl tool itself.
Maybe you'll discover that their client is old and cannot negotiate a key of
the size you're using for example. Next step could be to have them connect
to your openssl using "openssl s_client -connect" and see on their side if
anything looks wrong.
I can make this suggestion, although one thing I do know is that they've
got an aspx application running on Windows 2003 ... they probably know
less about SSL than my immediate supervisor ... and that's saying something.

I'm getting backend servers going down due to Layer6 timeout after five
seconds. Is this during SSL handshaking, by chance? I can start a new
thread on this issue if that's advisable. This may be the entire
problem with these connections -- the Mule process on the back end may
have some SSL issues. We've discussed turning SSL off on the back end
... if switching to haproxy proves useful, we may do that.

Sep 11 05:21:57 localhost.localdomain haproxy[8434]: Backup Server
services-ai-search-backend/aladdin is DOWN, reason: Layer6 timeout,
check duration: 5001ms. 0 active and 0 backup servers left. 2 sessions
active, 0 requeued, 0 remaining in queue.
Sep 11 05:24:28 localhost.localdomain haproxy[8434]: Server
services-ai-request-backend/fiesta is DOWN, reason: Layer6 timeout,
check duration: 5004ms. 0 active and 0 backup servers left. 1 sessions
active, 0 requeued, 0 remaining in queue.
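
For context on the question above: in haproxy's health-check status codes, layer 6 refers to the SSL layer, so a "Layer6 timeout" should mean the check's SSL handshake with the backend did not finish in time, and the 5001ms and 5004ms durations line up with the five-second timeout. To track how often this happens per server, the DOWN events could be pulled out of the log with a Python sketch like this one. The regular expression is written against the two log lines quoted above only; `parse_down_event` is a hypothetical helper, and other haproxy versions or log formats may need a different pattern.

```python
import re

# Pattern derived from the two sample lines above; not an official format.
CHECK_DOWN = re.compile(
    r"(?P<backup>Backup )?Server (?P<backend>[^/\s]+)/(?P<server>\S+) is DOWN, "
    r"reason: (?P<reason>.+?), check duration: (?P<ms>\d+)ms"
)


def parse_down_event(line):
    """Extract backend, server, reason and check duration from a DOWN line."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(line.split())
    match = CHECK_DOWN.search(flat)
    if match is None:
        return None
    return {
        "backend": match.group("backend"),
        "server": match.group("server"),
        "backup": match.group("backup") is not None,
        "reason": match.group("reason"),
        "duration_ms": int(match.group("ms")),
    }
```

Grouping the parsed events by server and reason would show whether the Layer6 timeouts are confined to particular backends.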

Thanks,
Shawn
