TCP Socket TIME_WAIT error Windows WCF
Srdan Dukic
2010-05-24 23:00:36 UTC
Hi,

I've set up HAProxy (version 1.3.15.2 from the repos) on Debian to load balance
requests to a bunch of Windows web services running on IIS7. My problem is
that after load testing the system, Windows runs out of sockets as they all
end up in the TCP TIME_WAIT state. (Nice explanation of this TCP state found
here:
http://blog.zhuzhaoyuan.com/2009/03/a-word-on-time_wait-and-close_wait/)

When removing the load balancer and having the requests go directly to the
web server there are only a maximum of 10 sockets in the TIME_WAIT state at
any one time.

It seems like there's some error in communication between HAProxy and
Windows/IIS7 where the sockets aren't closed properly. I've tried adding the
"httpclose" option but the problem remained. Has anyone had this problem
before? My haproxy.cfg is as below:

global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    #log loghost local0 info
    maxconn 40960
    #chroot /usr/share/haproxy
    user haproxy
    group haproxy
    daemon
    #debug
    #quiet

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen webfarm 192.168.17.150:80
    mode http
    stats enable
    stats auth admin:admin
    balance roundrobin
    option httpchk
    option httpclose
    http-check disable-on-404
    server dn1 192.168.17.136 check
    server dn2 192.168.17.137 check
    server dn3 192.168.17.138 check

Thank you
--
Srđan Đukić
Willy Tarreau
2010-05-25 04:21:28 UTC
Hi,
Post by Srdan Dukic
Hi,
I've setup HAProxy (version 1.3.15.2 from repo's) on Debian to load balance
requests to a bunch of Windows web services running on IIS7. My problem is
that after load testing the system, Windows runs out of sockets as they all
end up in the TCP TIME_WAIT state.
Could you please recheck the state on the server? I don't believe for a minute
that it runs "out of sockets" if they are in TIME_WAIT state, because TIME_WAIT
is a final state without data and the socket no longer exists in the
process which previously held it. You can easily have millions of them, they
are harmless. OK this is Windows, but their TCP stack is not *that* bad.
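
One way to recheck this is to tally connections by TCP state on the server. The following is a minimal Python sketch and not part of the original exchange. It assumes the text comes from `netstat -an` on Windows or `netstat -ant` on Linux, where the state is the last column of each TCP row. The function name and the sample addresses are hypothetical.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP rows per state in `netstat -an` style output.

    Assumption: the state is the last whitespace-separated field of
    each TCP row (no trailing PID/program column).
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with "TCP" (Windows) or "tcp"/"tcp6" (Linux).
        # UDP rows and header lines are skipped.
        if len(fields) >= 4 and fields[0].lower().startswith("tcp"):
            counts[fields[-1]] += 1
    return counts


# Hypothetical sample in the Windows netstat layout.
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    192.168.17.136:80      192.168.17.150:40001   TIME_WAIT
  TCP    192.168.17.136:80      192.168.17.150:40002   TIME_WAIT
  TCP    192.168.17.136:80      192.168.17.150:40003   ESTABLISHED
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  UDP    0.0.0.0:123            *:*
"""

if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE).most_common():
        print(state, n)
```

Feeding it the real netstat output from both the web server and the load balancer would show which machine actually accumulates the TIME_WAIT entries.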

What may happen however is that if you stop and restart the service, then it
cannot bind because of "address already in use", but I hope that IIS developers
have handled this very common issue.
Post by Srdan Dukic
http://blog.zhuzhaoyuan.com/2009/03/a-word-on-time_wait-and-close_wait/)
When removing the load balancer and having the requests go directly to the
web server there are only a maximum of 10 sockets in the TIME_WAIT state at
any one time.
This is because when you connect directly, you maintain the connections for
as long as possible, while via haproxy they are closed after every exchange.
Post by Srdan Dukic
It seems like there's some error in communication between HAProxy and
Windows/IIS7 where the sockets aren't closed properly.
In fact it's the opposite. If you see them in the TIME_WAIT state, then they
are properly closed. How did you conclude that Windows ran out of sockets ?
Just because you can't connect anymore? Are you sure you don't have iptables
loaded on your load balancer, which would have its state table filled after
a few thousand tests and which would refuse to let new connections pass?

Regards,
Willy
Srdan Dukic
2010-05-25 23:18:03 UTC
Post by Willy Tarreau
Hi,
Could you please recheck the state on the server? I don't believe for a minute
that it runs "out of sockets" if they are in TIME_WAIT state, because TIME_WAIT
is a final state without data and the socket no longer exists in the
process which previously held it. You can easily have millions of them, they
are harmless. OK this is Windows, but their TCP stack is not *that* bad.
What may happen however is that if you stop and restart the service, then it
cannot bind because of "address already in use", but I hope that IIS developers
have handled this very common issue.
The actual error message is as follows:

Insufficient winsock resources available to complete socket connection
initiation.

System.InsufficientMemoryException: Insufficient winsock resources available
to complete socket connection initiation. ---> System.Net.WebException:
Unable to connect to the remote server --->
System.Net.Sockets.SocketException: An operation on a socket could not be
performed because the system lacked sufficient buffer space or because a
queue was.....
Post by Willy Tarreau
Post by Srdan Dukic
When removing the load balancer and having the requests go directly to the
web server there are only a maximum of 10 sockets in the TIME_WAIT state at
any one time.
This is because when you connect directly, you maintain the connections for
as long as possible, while via haproxy they are closed after every exchange.
The WCF windows client which connects directly to the server is configured
to disable HTTP keepalives and cookies. Also the WCF service running on the
web server is stateless. Does this make a difference or are you saying that
HAProxy closes TCP/IP connections every time as opposed to HTTP connections?
If so, is there a way to get HAProxy to not close the connection after every
exchange?

Another thing I should mention is that when we tried the setup with NLB
(http://en.wikipedia.org/wiki/Network_Load_Balancing_Services), configured
with "Multiple Host" and "Affinity: none", we didn't see this
problem. Would you happen to know if NLB closes connections on each request
or keeps them open and reuses them?
Post by Willy Tarreau
Post by Srdan Dukic
It seems like there's some error in communication between HAProxy and
Windows/IIS7 where the sockets aren't closed properly.
In fact it's the opposite. If you see them in the TIME_WAIT state, then they
are properly closed. How did you conclude that Windows ran out of sockets ?
Just because you can't connect anymore ?
See the error message above.
Post by Willy Tarreau
Are you sure you don't have iptables
loaded on your load balancer, which would have its state table filled after
a few thousand tests and which would refuse to let new connections pass ?
The setup is a default Debian Lenny install. The iptables firewall does not
have any rules in it, although the firewall itself is not completely
disabled.

Thank you
--
Srđan Đukić
Willy Tarreau
2010-05-30 05:07:11 UTC
Hi,
Post by Srdan Dukic
Insufficient winsock resources available to complete socket connection
initiation.
For connection *initiation* : so it means that it's not an accept() which
fails, but a connect(). Is your server trying to connect to any other
backend server ? Or maybe you're running the load tester on the same
machine as the server and the error you see is in fact for the load
tester ?
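
The distinction between accept() and connect() matters for port usage. The following is a small Python sketch using only the standard `socket` module on the loopback interface, with a hypothetical function name. It illustrates that an accepted connection keeps the listener's local port, while each outgoing connect() takes a fresh ephemeral source port.

```python
import socket


def compare_local_ports() -> dict:
    """Open one loopback connection and report the local port of each side."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = None
    accepted = None
    try:
        # Port 0 asks the OS for any free port, so the sketch does not
        # depend on a specific port being available.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listen_port = listener.getsockname()[1]

        # Outgoing side: connect() picks an ephemeral source port.
        client = socket.create_connection(("127.0.0.1", listen_port))
        # Incoming side: accept() returns a socket bound to the listen port.
        accepted, _peer = listener.accept()

        return {
            "listen_port": listen_port,
            "accepted_local_port": accepted.getsockname()[1],
            "client_local_port": client.getsockname()[1],
        }
    finally:
        for sock in (client, accepted, listener):
            if sock is not None:
                sock.close()


if __name__ == "__main__":
    print(compare_local_ports())
```

So a machine that only accepts connections does not consume source ports for them. Only the side that initiates connections does.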

In this case, this can clearly be attributed to the number of TIME_WAIT
sockets. I don't know if there is a tunable in windows to allow reuse of
them, otherwise you'd end up with a server which is limited by the max
number of possible source ports and the connection rate.
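
The limit described above can be estimated with simple arithmetic. The following is a rough Python sketch. The port range (49152-65535) and the 240-second TIME_WAIT duration are illustrative assumptions, not values measured on the system in this thread, and the function names are hypothetical. It treats every outgoing connection as going to the same destination address and port, which is the worst case.

```python
def ephemeral_port_count(port_low: int, port_high: int) -> int:
    """Number of source ports in an inclusive ephemeral range."""
    return port_high - port_low + 1


def max_sustained_connect_rate(ports: int, time_wait_seconds: float) -> float:
    """Upper bound on new outgoing connections per second.

    Each closed connection holds its source port for the TIME_WAIT
    duration, so at most `ports` connections can be opened per
    TIME_WAIT period.
    """
    return ports / time_wait_seconds


def seconds_until_exhaustion(ports: int, time_wait_seconds: float, rate: float):
    """Time until no source port is free, or None if the rate is sustainable."""
    if rate <= max_sustained_connect_rate(ports, time_wait_seconds):
        return None
    # Above the sustainable rate, all ports are used before the first
    # TIME_WAIT entry expires.
    return ports / rate


if __name__ == "__main__":
    ports = ephemeral_port_count(49152, 65535)   # assumed range
    time_wait = 240.0                            # assumed duration in seconds
    print(ports, max_sustained_connect_rate(ports, time_wait))
    print(seconds_until_exhaustion(ports, time_wait, rate=500.0))
```

Under these assumed values, a load test that opens a few hundred outgoing connections per second would run out of source ports well within a minute, which is consistent with the failure appearing only under load.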

(...)
Post by Srdan Dukic
The WCF windows client which connects directly to the server is configured
to disable HTTP keepalives and cookies.
Well, if it does not do HTTP keepalives either when connecting directly to
the server, then I really wonder what the difference can be !
Post by Srdan Dukic
Also the WCF service running on the
web server is stateless. Does this make a difference or are you saying that
HAProxy closes TCP/IP connections every time as opposed to HTTP connections?
Could you explain what distinction you make between "HTTP connections" and
"TCP connections"? Both are the same, since HTTP is transported over TCP.
Post by Srdan Dukic
If so, is there a way to get HAProxy to not close the connection after every
exchange?
You may remove "option httpclose" and it will let your client maintain
keep-alive with the server, but since you said that your client disables
keep-alive, this should not make any difference. Maybe it will indicate
that your client does keep-alive regardless of its settings ?
Post by Srdan Dukic
Another thing I should mention is that when we tried the setup with NLB
(http://en.wikipedia.org/wiki/Network_Load_Balancing_Services), configured
with "Multiple Host" and "Affinity: none", we didn't see this
problem. Would you happen to know if NLB closes connections on each request
or keeps them open and reuses them?
From what I've heard of NLB, it should not interfere with HTTP, so if your
client uses keep-alives, it will let them pass.
Post by Srdan Dukic
Post by Willy Tarreau
Are you sure you don't have iptables
loaded on your load balancer, which would have its state table filled after
a few thousand tests and which would refuse to let new connections pass ?
The setup is a default Debian Lenny install. The iptables firewall does not
have any rules in it, although the firewall itself is not completely
disabled.
OK, anyway, from your message, the problem is a lack of source ports on the
server to connect to somewhere else. Still you should be very careful with
iptables, if the nf_conntrack (or ip_conntrack) module is loaded, most often
it's loaded with default settings which are OK for a desktop PC but not for
a server, and the connection table can be filled after just a few seconds
of tests. In this case, you'd see "Conntrack table full" in "dmesg".
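
Besides looking in "dmesg", the table usage can be read directly while a test runs. The following is a minimal Python sketch. It assumes the nf_conntrack module exposes `nf_conntrack_count` and `nf_conntrack_max` under /proc/sys/net/netfilter (older ip_conntrack setups use a different directory, which can be passed in). The function names are hypothetical.

```python
from pathlib import Path

# Assumed location of the counters when nf_conntrack is loaded.
PROC_DIR = Path("/proc/sys/net/netfilter")


def read_int(path) -> int:
    """Read a single integer from a /proc-style text file."""
    return int(Path(path).read_text().strip())


def conntrack_usage(proc_dir=PROC_DIR):
    """Return (count, maximum, fraction_used), or None if the files are absent.

    A missing file is taken to mean the conntrack module is not loaded.
    """
    proc_dir = Path(proc_dir)
    try:
        count = read_int(proc_dir / "nf_conntrack_count")
        maximum = read_int(proc_dir / "nf_conntrack_max")
    except FileNotFoundError:
        return None
    return count, maximum, count / maximum


if __name__ == "__main__":
    usage = conntrack_usage()
    if usage is None:
        print("conntrack counters not found; module probably not loaded")
    else:
        count, maximum, fraction = usage
        print(f"{count}/{maximum} entries ({fraction:.0%} used)")
```

Running this in a loop during the load test would show whether the table approaches its maximum before connections start failing.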

But for now you need to figure out what the server is trying to connect to and
see if by any chance your client does keep-alives by default, which would
explain why the server would in turn establish fewer connections to the remote
point.

Regards,
Willy
