ntpd should run niced

Bug #229632 reported by Jim Hill
This bug affects 2 people
Affects       Status      Importance  Assigned to  Milestone
ntp (Ubuntu)  Incomplete  Low         Unassigned

Bug Description

Binary package hint: ntp

In ntp-4.2.4p4+dfsg debian/ntp.init starts ntpd with no special priority (I believe this is not new).

If the system is busy at all when ntpd polls, the jitter goes way up, and ntpd gets nervous and shortens the poll interval (it's been a while since I saw anything but 1024; 64 or 128 weren't uncommon before).

I'm guessing the attached patch will measurably decrease the load on ntp.ubuntu.com, since I see that time synchronization installs ntpd.

Version commands output as requested:

jthill@shrdlu:~/src/ubuntu/ntp-4.2.4p4+dfsg$ lsb_release -rd
Description: Ubuntu 8.04
Release: 8.04
jthill@shrdlu:~/src/ubuntu/ntp-4.2.4p4+dfsg$ apt-cache policy ntp
ntp:
  Installed: 1:4.2.4p4+dfsg-3ubuntu2
  Candidate: 1:4.2.4p4+dfsg-3ubuntu2
  Version table:
 *** 1:4.2.4p4+dfsg-3ubuntu2 0
        500 http://archive.ubuntu-rocks.org hardy/main Packages
        100 /var/lib/dpkg/status

Tags: patch
Revision history for this message
Jim Hill (gjthill) wrote :
David A. Cobb (superbiskit) wrote :

I have had some success doing something similar by running

renice -n -8 -p $(pidof ntpd)

in rc.local
(I'm at my employer's windoze machine right now, so I can't attach the actual code)

Chuck Short (zulcss) wrote :

Thanks, it might be considered for the future.

Regards
chuck

Changed in ntp (Ubuntu):
importance: Undecided → Low
status: New → Triaged
nutznboltz (nutznboltz-deactivatedaccount) wrote :

You can edit your defaults file to run ntpd at maximum priority:

--- /etc/default/ntp~ 2009-12-04 13:07:15.000000000 -0500
+++ /etc/default/ntp 2010-02-09 17:56:07.000000000 -0500
@@ -1 +1 @@
-NTPD_OPTS='-g'
+NTPD_OPTS='-g -N'

nutznboltz (nutznboltz-deactivatedaccount) wrote :

Actually the -N argument does nothing.

nutznboltz (nutznboltz-deactivatedaccount) wrote :

Hacking the startup script seems to be the best temporary fix.

--- default-intrepid-ntp-init-script 2010-02-09 15:01:43.255277872 -0500
+++ /etc/init.d/ntp 2010-02-09 15:02:48.000000000 -0500
@@ -57,7 +57,7 @@
                        exit 1
                fi
                lock_ntpdate
- start-stop-daemon --start --quiet --oknodo --pidfile $PIDFILE --startas $DAEMON -- -p $PIDFILE -u $UGID $NTPD_OPTS
+ start-stop-daemon --start --quiet --oknodo --pidfile $PIDFILE --startas $DAEMON --nicelevel -8 -- -p $PIDFILE -u $UGID $NTPD_OPTS
                status=$?
                unlock_ntpdate
                log_end_msg $status

nutznboltz (nutznboltz-deactivatedaccount) wrote :
nutznboltz (nutznboltz-deactivatedaccount) wrote :

We are limited to a single NTP upstream source for political reasons outside our control, and our Ubuntu ntpd loses sync periodically. I tried a number of solutions, but they all failed. When I set the ntpd server's nice value to negative eight, it just worked. Clearly the stability of ntpd is improved by reducing the number of cases in which ntpd gets preempted by the scheduler.

While troubleshooting this I started keeping logs with:

while true ; do ntpq -p | grep name-of-upstream-time-source | logger -t ntpq -p daemon.info; done

A pair of log entries showing loss of sync looks like:

time                                  remote          refid        st t when poll reach delay offset jitter
===========================================================================================================
2010-02-08:2010-02-08T21:53:00-05:00 *xxxx-xxxx.xxx.x 128.59.39.48  2 u   12   64   377 0.512  0.030  0.098
2010-02-08:2010-02-08T21:54:04-05:00  xxxx-xxxx.xxx.x 128.59.39.48  2 u   11   64   377 0.512  0.030  0.342

The "*" in the first entry indicates that the local Ubuntu host was synchronized to the remote host. The second entry does not contain the "*" since synchronization was lost.

By changing the nicelevel of the ntpd server process to negative eight the logs no longer show any loss of synchronization.
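
The "*" check described above can be automated. The following is only a sketch: the function name has_sync_peer is made up for illustration, and it assumes raw "ntpq -p" output (no logger timestamp prefix), where the tally character is the first character of each peer line.

```shell
# Read `ntpq -p` style lines on stdin; succeed if any peer line carries the
# "*" tally code in its first column, i.e. ntpd has selected a sync source.
# (has_sync_peer is a hypothetical helper name, not part of ntp.)
has_sync_peer() {
    grep -q '^\*'
}

# Sample lines modeled on the log entries above (timestamp prefix removed).
synced='*xxxx-xxxx.xxx.x 128.59.39.48 2 u 12 64 377 0.512 0.030 0.098'
lost=' xxxx-xxxx.xxx.x 128.59.39.48 2 u 11 64 377 0.512 0.030 0.342'

printf '%s\n' "$synced" | has_sync_peer && echo "synchronized"
printf '%s\n' "$lost" | has_sync_peer || echo "not synchronized"
```

On a live system this would be used as "ntpq -p | has_sync_peer", for example to log only the moments when synchronization is lost.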

See also http://ubuntuforums.org/showthread.php?p=8801540

nutznboltz (nutznboltz-deactivatedaccount) wrote :

Edit:

while true ; do ntpq -p | grep name-of-upstream-time-source | logger -t ntpq -p daemon.info; sleep 64; done

The "sleep 64" was missing.

nutznboltz (nutznboltz-deactivatedaccount) wrote :

Scheduler preemption of ntpd causes the time on other systems to appear unstable since measurements are ruined if preemption occurs during them.

C de-Avillez (hggdh2)
tags: added: patch
Weisi (spamcop-5) wrote :

I fully support this request.
Comment #6 is a good solution. Please note that due to the line-breaks the diff won't work without mangling.

nutznboltz (nutznboltz-deactivatedaccount) wrote :

@Weisi comment #7 is comment #6 in a patch file without line-breaks.

nutznboltz (nutznboltz-deactivatedaccount) wrote :

I should probably submit this change to Debian not Canonical.

nutznboltz (nutznboltz-deactivatedaccount) wrote :

You are supposed to be able to use the -N (--nice) option to ntpd, but an upstream ntp bug report shows it doesn't work:

http://bugs.ntp.org/show_bug.cgi?id=1230

Testing on 11.04

$ lsb_release -ds
Ubuntu 11.04
$ uname -srv
Linux 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC 2011
$ ntpd --version
ntpd - NTP daemon program - Ver. 4.2.6p2
$ apt-cache policy ntp | egrep 'Installed|Candidate'
  Installed: 1:4.2.6.p2+dfsg-1ubuntu5.1
  Candidate: 1:4.2.6.p2+dfsg-1ubuntu5.1

Without -N

$ (ps alx | head -1) ; (ps alx | grep ntpd) | grep -v grep
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
5 117 1955 1 20 0 38864 2164 poll_s Ss ? 0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -c /var/lib/ntp/ntp.conf.dhcp -u 117:126

With -N

$ (ps alx | head -1) ; (ps alx | grep ntpd) | grep -v grep
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
5 117 2915 1 20 0 38736 2096 poll_s Ss ? 0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -N -c /var/lib/ntp/ntp.conf.dhcp -u 117:126

PRI = 20 and NI = 0 in both cases. If -N did anything, NI would not be 0.
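
To avoid reading the columns by eye, the two values can be pulled out with awk. This is a sketch only: pri_ni is a made-up helper name, and it assumes the "ps alx" column order shown in the header above (F UID PID PPID PRI NI ...), so PRI and NI are fields 5 and 6.

```shell
# Print the PRI and NI columns (fields 5 and 6) of `ps alx` lines read on stdin.
# (pri_ni is a hypothetical helper name; the field positions assume the
# F UID PID PPID PRI NI ... layout shown in this report.)
pri_ni() {
    awk '{ printf "PRI=%s NI=%s\n", $5, $6 }'
}

# One of the ps lines quoted above, fed in as sample input.
echo '5 117 2915 1 20 0 38736 2096 poll_s Ss ? 0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -N' | pri_ni
```

On a live system: ps alx | grep '[n]tpd' | pri_ni (the [n] bracket keeps grep from matching its own process, so the extra "grep -v grep" is not needed).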

nutznboltz (nutznboltz-deactivatedaccount) wrote :

I thought it might be because the setpriority() call was made after privileges were dropped, but running as UID 0 doesn't change the "niceness" seen in the ps output either:

$ (ps alx | head -1) ; (ps alx | grep ntpd) | grep -v grep
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
5 0 14085 1 20 0 32404 1976 poll_s Ss ? 0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -N -c /var/lib/ntp/ntp.conf.dhcp

UID = 0, PRI = 20, NI = 0

nutznboltz (nutznboltz-deactivatedaccount) wrote :

When run with -N and -u 117:126

Aug 19 08:14:22 linux ntpd[14158]: sched_setscheduler(): Operation not permitted
Aug 19 08:14:22 linux ntpd[14158]: setpriority() error: Permission denied
Aug 19 08:14:22 linux ntpd[14158]: set_process_priority: No way found to improve our priority

Freaking apparmor

Aug 19 08:10:04 linux kernel: [ 3150.656384] type=1400 audit(1313755804.646:29): apparmor="DENIED" operation="capable" parent=1 profile="/usr/sbin/ntpd" pid=14085 comm="ntpd" capability=23 capname="sys_nice"

Jamie Strandboge (jdstrand) wrote :

This can be added to the apparmor profile to allow -N option to work:
  capability sys_nice,

Then run:
$ sudo apparmor_parser -r /etc/apparmor.d/usr.sbin.ntpd

I'm preparing an upload for this policy change now.
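
A quick way to check whether a given profile already carries that rule is sketched below. The function name profile_allows_sys_nice is made up for illustration, and the check is deliberately naive: it only matches a line consisting of "capability sys_nice," and will miss the capability if it is granted through an included abstraction or listed together with other capabilities.

```shell
# Succeed if the given AppArmor profile file has a "capability sys_nice," rule.
# (profile_allows_sys_nice is a hypothetical helper; it does a plain text
# match and does not follow #include lines.)
profile_allows_sys_nice() {
    grep -Eq '^[[:space:]]*capability[[:space:]]+sys_nice,' "$1"
}

profile=/etc/apparmor.d/usr.sbin.ntpd
if [ -r "$profile" ] && profile_allows_sys_nice "$profile"; then
    echo "sys_nice is allowed"
else
    echo "sys_nice rule not found (or profile not readable)"
fi
```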

Jamie Strandboge (jdstrand) wrote :

ntp (1:4.2.6.p2+dfsg-1ubuntu11) oneiric; urgency=low

  * debian/apparmor-profile: allow sys_nice for -N option to work. More
    work is needed to make ntpd start niced, so not auto-closing the bug.
    - LP: 229632

nutznboltz (nutznboltz-deactivatedaccount) wrote :

I switched AppArmor for ntpd into "complain" mode

$ sudo aa-complain /usr/sbin/ntpd
Setting /usr/sbin/ntpd to complain mode.

after that the ntpd "-N" flag started to work.

$ (ps alx | head -1) ; (ps alx | grep ntpd) | grep -v grep
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
5 117 14798 1 -100 - 38736 2096 poll_s Ss ? 0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -N -c /var/lib/ntp/ntp.conf.dhcp -u 117:126

PRI = -100, NI = - (ntpd is now running under a real-time scheduling policy, so there is no nice value to show)

nutznboltz (nutznboltz-deactivatedaccount) wrote :

Oops, didn't read Jamie's posts until just now, thanks Jamie!

nutznboltz (nutznboltz-deactivatedaccount) wrote :

Can an SRU for Lucid be done for the AppArmor modification too?

nutznboltz (nutznboltz-deactivatedaccount) wrote :

@Jamie given that the default ntpd configuration is to poll, and the stability of the clock affects the polling interval[1], I would strongly suggest that the default ntpd configuration be adjusted to run with the "-N" option, since less frequent polling would cut down the load on NTP servers. At worldwide scale this does become an issue.

1. In a nutshell, when the clock is stable there is less need for ntpd to check the time on the upstream NTP server, so it ratchets back the polling frequency by sleeping longer between checks. See the maxpoll and minpoll options in "man ntp.conf" for the controls that limit the polling interval range. See also "What is the best polling Interval?"
http://www.ntp.org/ntpfaq/NTP-s-algo.htm#Q-ALGO-POLL-BEST
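
For reference, minpoll and maxpoll in ntp.conf are given as power-of-two exponents, not as seconds; the ntpd defaults are 6 and 10. A small sketch of the conversion (poll_seconds is a made-up helper name):

```shell
# Convert an ntp.conf minpoll/maxpoll exponent to a polling interval in seconds.
# (poll_seconds is a hypothetical helper name used only for this illustration.)
poll_seconds() {
    echo $((1 << $1))
}

poll_seconds 6    # default minpoll -> 64
poll_seconds 10   # default maxpoll -> 1024
```

These are the same 64 and 1024 second values that show up in the "poll" column of "ntpq -p".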

nutznboltz (nutznboltz-deactivatedaccount) wrote :

The default minimum sleep time between ntpd upstream requests is 64 seconds. With 64000 clients polling that often, an ntpd server must answer one thousand requests per second.

If the client ntpd clocks are very stable they will ratchet back to one request every 1024 seconds (about 17 minutes), so the same 64000 clients would only make the server process 62.5 requests per second.
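
The arithmetic can be reproduced for other client counts and poll intervals. This is a back-of-the-envelope sketch that assumes the clients' requests are spread evenly over the interval; requests_per_second is a made-up helper name.

```shell
# Average server load in requests per second for a number of clients that
# each poll once per interval (in seconds). Assumes evenly spread requests.
# (requests_per_second is a hypothetical helper name.)
requests_per_second() {
    awk -v clients="$1" -v poll="$2" 'BEGIN { printf "%.1f\n", clients / poll }'
}

requests_per_second 64000 64     # prints 1000.0
requests_per_second 64000 1024   # prints 62.5
```

Going from a 64 second to a 1024 second interval cuts the load by a factor of 16.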

nutznboltz (nutznboltz-deactivatedaccount) wrote :

Try with and without the ntpd "-N" option.

Use "ntpq -p" or "ntpdc -c peers" to view the polling interval. Both reports have a "poll" column which is the sleep time in seconds between requests being sent out over the network to the upstream NTP source.

The "poll" column will eventually change from 64 to 128 then 256 through 1024. This change can take hours or even days to occur as the clock stabilizes.

Without "-N" I haven't seen the polling interval get to 1024 on my systems as process preemption destabilizes ntpd too much.
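
To watch the interval over hours or days without reading the whole table, the "poll" column can be extracted per peer. This is a sketch under assumptions: peer_polls is a made-up helper name, the input is "ntpq -p" output with its two header lines, the first character of each peer line is the tally code, and "poll" is the sixth column. The sample table below is illustrative, modeled on the log entries earlier in this report.

```shell
# Print "<remote> <poll>" for every peer line of `ntpq -p` output on stdin.
# The two header lines are skipped; the first character of each peer line is
# the tally code (" ", "*", "+", ...) and is dropped before splitting fields.
# (peer_polls is a hypothetical helper name.)
peer_polls() {
    awk 'NR > 2 { n = split(substr($0, 2), f, " "); if (n >= 6) print f[1], f[6] }'
}

# Illustrative sample input.
peer_polls <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*xxxx-xxxx.xxx.x 128.59.39.48     2 u   12   64  377    0.512    0.030   0.098
EOF
```

On a live system: ntpq -p | peer_polls. Piping the result through logger, as in the loop posted earlier, would give a record of when the interval steps up from 64 towards 1024.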

Christian Ehrhardt (paelzer) wrote :

I know it's a long time, but I'm cleaning up old NTP bugs atm.

While I totally agree with all that was said before, times have changed since this went dormant and we have to look at it freshly.

For NTP, the config via -N works these days:
default:
(ps alx | head -1) ; (ps alx | grep ntpd) | grep -v grep
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
5 114 10771 1 20 0 110040 4988 - Ss ? 0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 114:120

with cat /etc/default/ntp
NTPD_OPTS='-g -N'

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
5 114 22635 1 -100 - 110032 4948 - Ssl ? 0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -N -u 114:120

So now the remaining request of the bug would be to make -N the default - and since it works just as is now the patch itself would be rather trivial.

But since the default time synchronization these days is via systemd anyway it would only affect those explicitly using ntp.
That said I'll set this bug to incomplete for now, so it can time out if nobody cares but live on if one does.

No offense intended: if anybody still cares and considers it important to change the default, I'd encourage you to file a Debian bug. I don't think this would be worth an Ubuntu delta; if Debian agrees, it would be picked up by the next merge.

Changed in ntp (Ubuntu):
status: Triaged → Incomplete