regular statistics dumps getting out of sync

Arnt Gulbrandsen arnt at gulbrandsen.priv.no
Mon Aug 7 14:54:15 UTC 2006


I've done such things, and in my experience, the quality of the output 
is better if you resync.

If you resync, you get effectively the right interval until conditions 
are completely horrible, and then it falls back to 2*interval, 
3*interval, etc. The interval is _effectively_ right because when a 
signal is delivered a second late, whatever delayed it has generally 
also prevented you from doing anything that would be reflected in the 
statistics you report.

I've only seen the 2*interval thing during true disasters, such as 
another process eating all RAM+swap. (IIRC rrdtool can be configured 
to detect gaps of 2*interval and display them as outages.)
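
The rrdtool feature I mean, as far as I remember, is the data-source 
heartbeat: set it to twice the step and any gap longer than two periods 
is stored as unknown, which graphs show as an outage. A sketch with a 
made-up 60-second step and data source name:

     rrdtool create stats.rrd --step 60 \
         DS:queries:COUNTER:120:0:U \
         RRA:AVERAGE:0.5:1:1440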

By comparison, if you don't resync, each period stretches by a much 
smaller factor, but every delay shifts all the later dumps, so the 
data starts deteriorating much sooner. You don't need a fork bomb to 
affect data quality; a bit of overload or bad luck is enough.
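
To make that concrete, here is a toy calculation (mine, nothing to do 
with nsd's code): assume every wakeup lands two seconds late. With a 
fixed rearm the lateness compounds; with a resync it is absorbed each 
round.

     #include <stdio.h>

     int main(void)
     {
         const long period = 60, late = 2;
         long fixed = 0, resynced = 0;

         for (int i = 1; i <= 5; i++) {
             /* fixed rearm: next = previous + period, so every
                delay shifts all the later dumps */
             fixed += period + late;
             /* resynced rearm: next = the next multiple of period,
                so the delay never accumulates */
             resynced = ((resynced / period) + 1) * period + late;
             printf("round %d: fixed=%ld resynced=%ld\n",
                    i, fixed, resynced);
         }
         return 0;
     }

After five rounds the fixed schedule is ten seconds off the grid and 
still drifting; the resynced one is only two seconds off.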

The algorithms I've used are (translating from my select() to nsd's alarm()):

     /* align dumps to the process's boot time */
     alarm( nsd->st.period - ( ( time(NULL) - nsd->st.boot ) % nsd->st.period ) );

and

     /* align dumps to wall-clock multiples of st.period */
     alarm( nsd->st.period - ( time(NULL) % nsd->st.period ) );

The first gives better data for a single process, since its first 
st.period is optimally reported. The second gives better aggregate data 
across a process restart, or when data from several nsds are combined.
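
For completeness, here is how I would wire the second variant into a 
SIGALRM loop. This is my sketch, not nsd's actual code; the struct and 
the printf are stand-ins for nsd->st and the real statistics dump.

     #include <signal.h>
     #include <stdio.h>
     #include <time.h>
     #include <unistd.h>

     static struct {
         time_t boot;    /* process start time */
         time_t period;  /* reporting interval in seconds */
     } st = { 0, 60 };

     static volatile sig_atomic_t dump_due = 0;

     static void on_alarm(int sig)
     {
         (void)sig;
         dump_due = 1;
     }

     int main(void)
     {
         st.boot = time(NULL);
         signal(SIGALRM, on_alarm);

         /* arm the first dump at the next wall-clock multiple of
            st.period */
         alarm((unsigned)(st.period - time(NULL) % st.period));
         for (;;) {
             pause();
             if (!dump_due)
                 continue;
             dump_due = 0;
             printf("stats at %ld\n", (long)time(NULL));
             /* resync: rearm relative to the period grid, not to
                now + period, so lateness does not accumulate */
             alarm((unsigned)(st.period - time(NULL) % st.period));
         }
     }

Switching to the boot-aligned first variant only changes the two 
alarm() calls.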

Arnt


