TCP connections building up in CLOSE_WAIT state #218
Comments
Looking at a
|
I ran into the same issue here. |
I had over 100,000 connections in the CLOSE_WAIT state when I had issues, probably due to clients connecting for very short periods of time. I believe in the end this improved the situation to the point that I now rarely have this issue:
sudo sysctl net.ipv4.tcp_keepalive_intvl=60
sudo sysctl net.ipv4.tcp_keepalive_probes=9
sudo sysctl net.ipv4.tcp_keepalive_time=60 |
These are the defaults on my docker container: The only option that would make a considerable difference is "tcp_keepalive_time"; however, I kept watching the connection and found that it stays in CLOSE_WAIT forever instead of 7200 seconds. I am still trying to reproduce the same issue via a script and will change Thank you. |
I don't think you understand how those parameters work: 7200 seconds is how long the kernel waits before sending the first keepalive probe after the connection becomes idle. My settings lower the total time after idle to 10 min. For the problem I had, my best guess as to what was happening is the following:
I also found that if I turn off |
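The "10 min" figure can be checked with a little arithmetic. The sketch below is only an illustration, not anything taken from logstash: the function names `keepalive_detection_seconds` and `enable_keepalive` are made up for this example, and the socket option names are the Linux ones. It also reflects the fact that the `net.ipv4.tcp_keepalive_*` sysctls only affect sockets that have `SO_KEEPALIVE` enabled; without that option no probes are sent, whatever the sysctl values are.

```python
import socket


def keepalive_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Approximate time from a connection going idle until the kernel drops a dead peer.

    idle     -> net.ipv4.tcp_keepalive_time   (wait before the first probe)
    interval -> net.ipv4.tcp_keepalive_intvl  (gap between probes)
    probes   -> net.ipv4.tcp_keepalive_probes (unanswered probes before giving up)
    """
    return idle + interval * probes


def enable_keepalive(sock: socket.socket, idle: int = 60, interval: int = 60, probes: int = 9) -> None:
    """Hypothetical helper: enable keepalive on one socket and override the kernel defaults.

    The TCP_KEEP* option names below are Linux-specific.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


print(keepalive_detection_seconds(7200, 75, 9))  # usual Linux defaults: 7875
print(keepalive_detection_seconds(60, 60, 9))    # the sysctl values above: 600
```

With the values from the sysctl commands earlier in the thread, a dead peer should be detected roughly 600 seconds after the connection goes idle, provided keepalive is actually enabled on the socket.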
@gregjhogan Thank you for your patience. I read the link again and I think we are saying roughly the same thing. Since you state that logstash will clean up the connections, I can set up a clear sequence of steps to show that this is not what happens.
I booted another virtual machine, from which I can send requests to logstash; the code is below (python3):
After the script launches, disconnect the virtual NIC; you can then check that logstash keeps the connection ESTABLISHED forever. To create more connections, kill the script, reconnect the NIC, and launch the script again (follow exactly the same steps so that no RST/FIN reaches logstash); repeating this makes the number of connections grow very large:
I waited more than 10 minutes, to make sure I waited longer than what the net.ipv4.tcp_keepalive_xx values specify. Are we on the same page? Thanks again. |
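For readers who want to try the steps above, a minimal sketch of a test client follows. It is not the script used in this comment; the function name `send_forever`, the host, the port, and the payload are all assumptions made for illustration. The idea is only to hold one TCP connection open and keep writing to it, so that pulling the NIC leaves the server side without any FIN or RST.

```python
import socket
import time


def send_forever(host, port, message=b"hello from test client\n", delay=1.0, max_messages=None):
    """Open one TCP connection and keep sending `message` on it.

    Runs until interrupted, or until `max_messages` have been sent when that is set.
    Returns the number of messages sent.
    """
    sent = 0
    with socket.create_connection((host, port)) as conn:
        while max_messages is None or sent < max_messages:
            conn.sendall(message)
            sent += 1
            if max_messages is None or sent < max_messages:
                time.sleep(delay)
    return sent


# Hypothetical usage (host and port are placeholders, not values from this thread):
# send_forever("logstash.example.internal", 5000)
```

Note that leaving the `with` block closes the socket normally and sends a FIN, which is why the reproduction relies on disconnecting the NIC before the script is killed.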
I mentioned above that I saw similar behavior (ESTABLISHED forever) if I turn off |
Yes, you are right. Let me add that option to see the result. Thank you, @gregjhogan |
With short keepalive values, the CLOSE_WAIT connections seem to go away. But I now see more than 100 syslog threads, like below:
According to Wireshark, the syslog traffic (UDP packets) is less than 20 pps. It also produces 429 responses for the http plugin (I don't understand that either :-( ), and it stayed that way for hours (I had no time to wait and see how long it would last; I restarted and it recovered). |
Logstash information:
docker container
docker.elastic.co/logstash/logstash:8.11.4
Description of the problem including expected versus actual behavior:
I have messages getting sent to the logstash tcp input plugin from python code running on 5,000+ workers.
The connection to the logstash input is through an azure load balancer.
It seems like logstash isn't recognizing when these connections get disconnected, because I can easily build up 100,000 connections in the CLOSE_WAIT state on the logstash server in ~30 min. Eventually this seems to essentially stall logstash (it stops making output connections to elasticsearch). Restarting the docker container instantly cleans up all the connections, and after a restart it sometimes takes a few hours to get back into this bad state, where most connections seem to get left in the CLOSE_WAIT state.
Steps to reproduce:
CLOSE_WAIT state on the logstash side)
This may only happen when many clients are sending messages to the same logstash TCP input at the same time.
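A socket sits in CLOSE_WAIT when the remote peer has sent its FIN but the local application has not yet closed its own end, so counting sockets per state is a quick way to watch this problem develop. The following is a minimal sketch, assuming a Linux host where `/proc/net/tcp` is readable; the function name `count_tcp_states` and the port 5000 in the usage comment are hypothetical and not taken from this issue.

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text, local_port=None):
    """Count sockets per TCP state from the text of /proc/net/tcp.

    If `local_port` is given, only sockets bound to that local port are counted.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] is "HEXADDR:HEXPORT" for the local end, fields[3] is the state code.
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(fields[3].upper(), fields[3])] += 1
    return counts


# Hypothetical usage on the logstash host or inside its container:
# with open("/proc/net/tcp") as f:
#     print(count_tcp_states(f.read(), local_port=5000))
```

IPv6 sockets are listed separately in `/proc/net/tcp6`, which uses the same column layout, and inside a container the file reflects that container's network namespace.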