[nsd-users] NSD4 goes unresponsive with lots of TCP connection!

Kabindra Shrestha kabindra at geeks.net.np
Fri Apr 8 06:08:15 UTC 2016


Hi Wouter,


> On Apr 6, 2016, at 2:49 PM, W.C.A. Wijngaards <wouter at nlnetlabs.nl> wrote:
> 
> Hi Kabindra,
> 
> I have not heard of this before, how is TCP affecting NSD?
After a couple of thousand TCP queries, NSD becomes unresponsive on both TCP and UDP:
[kabindra at 1 ~]$ dig @`hostname` -p 5350 ch txt hostname.bind

; <<>> DiG 9.8.1 <<>> @<replaced> -p 5350 ch txt hostname.bind
; (2 servers found)
;; global options: +cmd
;; connection timed out; no servers could be reached
[kabindra at 1 ~]$ dig @`hostname` -p 5350 ch txt hostname.bind +tcp

; <<>> DiG 9.8.1 <<>> @ <replaced> -p 5350 ch txt hostname.bind +tcp
; (2 servers found)
;; global options: +cmd
;; connection timed out; no servers could be reached
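
For repeated checks, the two dig probes above can be approximated in a small script. This is only a rough sketch, not a replacement for dig: the function names, the query ID and the timeout are assumptions made for illustration, and it handles IPv4 only. It sends the same CH TXT hostname.bind question over UDP and over TCP and reports whether any reply came back.

```python
import socket
import struct


def build_query(name="hostname.bind", qtype=16, qclass=3, qid=0x1234):
    """Build a DNS query packet (defaults: TXT record, CHAOS class)."""
    # Header: ID, flags (RD set, as dig does by default), 1 question, 0 other records
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split("."))
    return header + qname + b"\x00" + struct.pack("!HH", qtype, qclass)


def probe_udp(host, port, timeout=3.0):
    """Return True if a reply with a matching ID arrives over UDP."""
    query = build_query()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(query, (host, port))
            data, _ = s.recvfrom(4096)
        except OSError:  # includes timeouts and ICMP errors
            return False
    return data[:2] == query[:2]


def probe_tcp(host, port, timeout=3.0):
    """Return True if a reply with a matching ID arrives over TCP."""
    query = build_query()
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            # DNS over TCP prefixes every message with a 2-byte length
            s.sendall(struct.pack("!H", len(query)) + query)
            reply = s.recv(4096)
    except OSError:
        return False
    return len(reply) >= 4 and reply[2:4] == query[:2]


if __name__ == "__main__":
    host, port = socket.gethostname(), 5350  # same target as the dig commands
    print("udp:", probe_udp(host, port))
    print("tcp:", probe_tcp(host, port))
```

In the state described here, both probes would be expected to print False.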

One thing we noticed: we have set server-count to 4, so NSD should fork 4 child processes, right? When NSD goes unresponsive, we see a couple of <defunct> processes and more than 4 child processes.
These NSD processes are also using a lot of CPU. I have left this box out of service for almost 2 days since it went unresponsive, and as the attached image shows, the CPU usage is not coming down.
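
To track the process counts over time, something like the following could be used. It is a minimal sketch that assumes the column layout of `ps -eo pid,ppid,stat,comm` on Linux; the sample text is made up for illustration and is not output from our server.

```python
import subprocess


def summarize(ps_text, name="nsd"):
    """Count live and defunct processes called `name` in ps output.

    Expects the columns PID, PPID, STAT, COMMAND with one header line.
    """
    live = defunct = 0
    for line in ps_text.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        _pid, _ppid, stat, comm = fields
        if comm.split()[0] != name:
            continue
        if stat.startswith("Z"):  # Z marks a zombie (<defunct>) process
            defunct += 1
        else:
            live += 1
    return live, defunct


# Hypothetical sample, for illustration only
SAMPLE = """\
  PID  PPID STAT COMMAND
 1200     1 Ss   nsd
 1201  1200 R    nsd
 1202  1200 R    nsd
 1203  1200 Z    nsd <defunct>
 1300     1 Ss   sshd
"""

if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    print("live, defunct:", summarize(out))
```

Running it periodically (from cron, for example) would show whether the number of defunct children keeps growing once the server stops answering.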

>   NSD has a
> fixed number of tcp connections, configured in tcp-count: 100 from the
> nsd.conf file.  That should be what is services.  You should increase
> that count to increase responsiveness to TCP.
Yes, that's what we changed earlier to increase responsiveness to TCP.

> 
> UDP should be unaffected.
That is not what we are seeing: UDP stops responding as well.

> 
> The backlog is for tcp connections waiting to be accepted.  256 is
> reasonably portable, reasonably large.  I don't see how that value is
> your problem.
That has held so far and is probably still true for most users, but with the recent increase in TCP traffic I doubt it still holds for us. With RRL enabled, I believe the share of TCP traffic will be higher than it used to be, right?
So, if I increase tcp-count to 1024 while the backlog stays at 256, will I still be able to get 1024 connections at a time, or will I be limited to 256 concurrent connections?
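
As a way to test the question above in isolation, here is a minimal sketch of plain listen()/accept() behaviour on loopback. It is not NSD code, and the function name and numbers are assumptions made for illustration. It opens a listener with a deliberately small backlog, accepts each connection as soon as it arrives, and counts how many connections are open at the same time.

```python
import socket


def count_concurrent(backlog, clients):
    """Accept `clients` connections on a listener with a small backlog.

    Returns how many accepted connections are open simultaneously.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(backlog)         # queue for connections not yet accepted
    addr = listener.getsockname()

    accepted, outgoing = [], []
    try:
        for _ in range(clients):
            outgoing.append(socket.create_connection(addr, timeout=5))
            conn, _peer = listener.accept()  # accepting frees a backlog slot
            accepted.append(conn)
        return len(accepted)
    finally:
        for s in accepted + outgoing + [listener]:
            s.close()


if __name__ == "__main__":
    # Backlog of 2, yet 20 connections end up open concurrently
    print(count_concurrent(backlog=2, clients=20))
```

If this picture is right, the backlog only bounds connections that are waiting to be accepted, which matches the description quoted above, and the number of concurrent connections is governed by how fast the server accepts and by its own limits such as tcp-count. Whether NSD accepts fast enough under our load is a separate question that this sketch does not answer.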

>   Is your kernel and networking subsystem failing?

I don't think so. If that were the problem, I would see problems with other services on that server as well, right?


> 
> The OS can return EMFILE or ENFILE to accept(), nsd starts to stop
> accepting TCP connections to relieve buffer stress on the OS.  But
> again, UDP should not have been impacted?
Again, that is not what we are seeing: UDP is impacted too.
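
To help rule out the EMFILE case, the per-process descriptor limit can be compared with the planned tcp-count. The sketch below is only a rough check under stated assumptions: the `reserved` allowance of 64 descriptors for listening sockets, logs and zone files is a guess, and the script reads the limit of the Python process itself, so it has to run in the same environment as the daemon (on Linux the limit of a running process is also visible in /proc/<pid>/limits). ENFILE is the system-wide limit and is not covered here.

```python
import resource


def fd_headroom(soft_limit, tcp_count, reserved=64):
    """Descriptors left after `tcp_count` connections plus a reserve.

    `reserved` is an assumed allowance for everything that is not a
    client connection; a negative result means accept() could hit EMFILE.
    """
    return soft_limit - tcp_count - reserved


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        print("no per-process descriptor limit set")
    else:
        left = fd_headroom(soft, tcp_count=1024)
        print(f"soft limit {soft}, hard limit {hard}, headroom {left}")
        if left < 0:
            print("tcp-count does not fit under the soft limit")
```

With a soft limit of 1024, which is a common default, a tcp-count of 1024 would leave negative headroom.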

> 
> Are you using so-reuseport: yes?
Nope.


>   I have had reports that it disrupts
> connectivity (depending on OS, particular version of the OS, and more
> recent versions of NSD do not use reuseport on TCP anymore).

Sorry, I forgot to mention earlier: we are on CentOS 6 with NSD 4.1.8.

Thanks.

> 
> Best regards, Wouter
> 
> On 05/04/16 18:28, Kabindra Shrestha wrote:
> > Hi,
> >
> > We are seeing some large number of TCP connections to our DNS
> > servers (in thousands) and NSD goes unresponsive after certain time
> > and doesn't recover, it stops responding to UDP as well. We tried
> > increasing the number of tcp-counts but it doesn't help. I noticed
> > the TCP backlog is hardcoded to 256 in NSD config, so even with
> > customised TCP backlogs on the system its still being throttled at
> > around 256. Is there anyway we can change this value without
> > recompiling the NSD.
> >
> >
> > [kabindra at 05 nsd-4.1.8]$ grep BACKLOG *
> > config.h.in:#undef TCP_BACKLOG
> > configure:#define TCP_BACKLOG 256
> > configure.ac:AC_DEFINE_UNQUOTED([TCP_BACKLOG], [256], [Define to the backlog to be used with listen.])
> >
> >
> > We are using NSD4.1.8.
> >
> > ( From one of the servers that went unresponsive, we have seen that
> > TCP number closing to 10k. )
> >
> > #ss -s
> > Total: 5591 (kernel 5640)
> > TCP:   5067 (estab 4968, closed 4, orphaned 0, synrecv 0, timewait 3/0), ports 28
> >
> > Transport Total     IP        IPv6
> > *         5640      -         -
> > RAW       0         0         0
> > UDP       122       63        59
> > TCP       5063      5017      46
> > INET      5185      5080      105
> > FRAG      0         0         0
> >
> >
> > Thanks.
> >
> > Regards, Kabindra Shrestha
> >
> >
> >
> > _______________________________________________ nsd-users mailing
> > list nsd-users at NLnetLabs.nl
> > https://open.nlnetlabs.nl/mailman/listinfo/nsd-users
> >

Regards,
Kabindra Shrestha

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 114187 bytes
Desc: not available
URL: <http://lists.nlnetlabs.nl/pipermail/nsd-users/attachments/20160408/38cdc0dc/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 119257 bytes
Desc: not available
URL: <http://lists.nlnetlabs.nl/pipermail/nsd-users/attachments/20160408/38cdc0dc/attachment-0001.png>