
Huge amount of TIME_WAIT connections

With MySQL, applications typically open and close connections very often and in rapid succession, so connections to the server are very short-lived. In extreme cases this can exhaust the available TCP ports.

The range of local TCP ports available for outgoing connections can be found with:

# cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000

In this example at most 61000 - 32768 + 1 = 28233 connections can be open concurrently, because both ends of the range are included.

When a TCP connection closes, its port cannot be reused immediately afterwards because the operating system keeps the socket in the TIME_WAIT state for a fixed interval (twice the maximum segment lifetime, MSL). This can be seen with the command:

# netstat -nat

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 0.0.0.0:10050               0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:10051               0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:10051             127.0.0.1:60756             TIME_WAIT
tcp        0      0 127.0.0.1:10050             127.0.0.1:50191             TIME_WAIT
tcp        0      0 127.0.0.1:10050             127.0.0.1:52186             ESTABLISHED
tcp        0      0 127.0.0.1:10051             127.0.0.1:34445             TIME_WAIT

The reason for waiting is that packets may arrive out of order or be retransmitted after the connection has been closed. CLOSE_WAIT indicates that the remote side has closed the connection and the local application has not yet closed its socket. TIME_WAIT indicates that the local side closed the connection first. The connection is kept around so that any delayed packets can be matched to it and handled appropriately.
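
To see how many sockets are in each state rather than reading the full listing, the netstat output can be summarised. The following is a minimal sketch; count_tcp_states is a hypothetical helper name chosen for illustration, and it assumes the state is the last column of each tcp line, as in the listing above.

```shell
# Count TCP sockets per state from `netstat -nat` style input on stdin.
# count_tcp_states is a hypothetical helper, not part of netstat.
count_tcp_states() {
    # Only lines starting with "tcp" are sockets; the state is the last field.
    awk '$1 ~ /^tcp/ { n[$NF]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage on a live system (requires netstat from net-tools):
#   netstat -nat | count_tcp_states
```

For the six sockets in the listing above this would report ESTABLISHED 1, LISTEN 2 and TIME_WAIT 3.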

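The FIN timeout can be found as follows:

# cat /proc/sys/net/ipv4/tcp_fin_timeout
60

Note that tcp_fin_timeout is not the MSL: it controls how long an orphaned connection stays in the FIN_WAIT_2 state. The TIME_WAIT length itself is hard-coded to 60 seconds in the Linux kernel.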

With a 60-second TIME_WAIT interval, this basically means your system cannot sustain more than about (61000 - 32768 + 1) / 60 = 470 (rounded down) new connections per second to a given destination.
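
The arithmetic can be sketched in shell. The function names ephemeral_ports and max_conn_rate are hypothetical helpers for illustration, and the 60-second interval is passed in as an assumption rather than read from the system. The sketch counts both ends of the port range, since the range is inclusive.

```shell
# ephemeral_ports LOW HIGH
# Number of ports in the inclusive range LOW..HIGH.
ephemeral_ports() {
    echo $(( $2 - $1 + 1 ))
}

# max_conn_rate LOW HIGH SECONDS
# New connections per second (rounded down) that can be sustained if every
# closed connection blocks its port for SECONDS.
max_conn_rate() {
    echo $(( ($2 - $1 + 1) / $3 ))
}

# Usage on a live Linux system:
#   read low high < /proc/sys/net/ipv4/ip_local_port_range
#   ephemeral_ports "$low" "$high"
#   max_conn_rate "$low" "$high" 60
```

With the example range 32768 61000 and 60 seconds, the two helpers yield 28233 and 470.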

Solutions

There are several ways to address this problem:

  • Open connections to your MySQL database less frequently. Put more payload into one connection. Often Connection Pooling is used to achieve this.
  • Increase the port range. Setting the range to 15000 61000 is pretty common these days (extreme tuning: 1024 - 65535).
  • Decrease the FIN timeout so that closed connections release their resources sooner. As pointed out in the comments, this setting does not shorten the TIME_WAIT interval itself.

Those values can be changed online with:

# echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
# echo 15000 65000 > /proc/sys/net/ipv4/ip_local_port_range

Or permanently by adding them to /etc/sysctl.conf.
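
As a sketch, the two runtime settings shown above would correspond to the following entries in /etc/sysctl.conf. The values are simply the ones from the example and are not a recommendation.

```
# /etc/sysctl.conf
# Same values as the echo commands above (example values only).
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 15000 65000
```

The file is read at boot time. Running sysctl -p as root loads it immediately.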

Another possibility to change this behaviour is to use tcp_tw_recycle and tcp_tw_reuse. By default they are disabled:

# cat /proc/sys/net/ipv4/tcp_tw_recycle
0
# cat /proc/sys/net/ipv4/tcp_tw_reuse
0

These parameters allow sockets in the TIME_WAIT state to be recycled quickly and reused. Before you make this change, make sure that it does not conflict with the protocols used by the application that needs these ports.

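tcp_tw_recycle can cause problems when load balancers are involved, and it was removed from the kernel in Linux 4.12. The kernel documentation describes the two parameters as follows: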

tcp_tw_reuse: Allow to reuse TIME_WAIT sockets for new connections when it is safe from protocol viewpoint. Default value is 0. It should not be changed without advice/request of technical experts.
tcp_tw_recycle: Enable fast recycling TIME_WAIT sockets. Default value is 0. It should not be changed without advice/request of technical experts.


Comments

Thank you for the article. Sorry to nitpick.
My colleague pointed out if the local port range is 32768 – 61000, that there are 28233 available ports, not 28232.

It may also be worth mentioning the difference between the FIN timeout (/proc/sys/net/ipv4/tcp_fin_timeout) and the TIME_WAIT length, which is hard-coded to 60s in the Linux kernel.

This question contains some good pointers and the answer contains a little program in C that you can compile and use to see how long the timeout is:
http://unix.stackexchange.com/questions/17218/how-long-is-a-tcp-local-so...

A similar article:
http://www.krenel.org/tcp-time_wait-and-ephemeral-ports-bad-friends/

thatsafunnynamecomment

Hello thatsafunnynamecomment,

Thanks for reading and correcting my findings! You are absolutely right. I took a bit too much of a shortcut in the maths!

Regarding your comment #2, it looks like I did not investigate carefully enough. Thanks for correcting me. I found several sources that pointed to tcp_fin_timeout and stated that TIME_WAIT is affected by it. So I should just be more careful next time, especially in a domain where I am an absolute noob.

Shinguz
