Activity log for bug #1744988

Date Who What changed Old value New value Message
2018-01-23 17:35:46 Juul Spies bug added bug
2018-01-24 16:18:36 Launchpad Janitor linux-hwe (Ubuntu): status New Confirmed
2018-01-31 11:17:35 Juul Spies attachment added tsc.patch https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1744988/+attachment/5046309/+files/tsc.patch
2018-01-31 12:26:36 Ubuntu Foundations Team Bug Bot tags patch
2018-01-31 12:26:36 Ubuntu Foundations Team Bug Bot bug added subscriber Joseph Salisbury
2018-01-31 17:09:11 Joseph Salisbury linux-hwe (Ubuntu): importance Undecided Medium
2018-01-31 17:09:15 Joseph Salisbury linux-hwe (Ubuntu): status Confirmed Incomplete
2018-01-31 17:11:11 Joseph Salisbury affects linux-hwe (Ubuntu) linux (Ubuntu)
2018-01-31 17:11:16 Joseph Salisbury linux (Ubuntu): status Incomplete Triaged
2018-01-31 17:11:25 Joseph Salisbury nominated for series Ubuntu Artful
2018-01-31 17:11:25 Joseph Salisbury bug task added linux (Ubuntu Artful)
2018-01-31 17:11:25 Joseph Salisbury nominated for series Ubuntu Bionic
2018-01-31 17:11:25 Joseph Salisbury bug task added linux (Ubuntu Bionic)
2018-01-31 17:11:31 Joseph Salisbury linux (Ubuntu Artful): status New Triaged
2018-01-31 17:11:34 Joseph Salisbury linux (Ubuntu Artful): importance Undecided Medium
2018-01-31 17:11:43 Joseph Salisbury tags patch kernel-da-key patch
2018-02-01 05:50:01 Pavel bug added subscriber pavel
2018-02-01 15:52:47 Joseph Salisbury linux (Ubuntu Artful): status Triaged In Progress
2018-02-01 15:52:50 Joseph Salisbury linux (Ubuntu Bionic): status Triaged In Progress
2018-02-01 15:52:53 Joseph Salisbury linux (Ubuntu Artful): assignee Joseph Salisbury (jsalisbury)
2018-02-01 15:52:57 Joseph Salisbury linux (Ubuntu Bionic): assignee Joseph Salisbury (jsalisbury)
2018-02-01 18:41:03 Joseph Salisbury linux (Ubuntu Bionic): status In Progress Fix Committed
2018-02-01 18:53:00 Joseph Salisbury description

Old value:

We observe NTP time drift on two servers running HWE kernels in Xenial. A few weeks ago we wanted to switch from 4.4 to 4.10. When rebooting the servers into the 4.10 kernel we saw a big time offset within minutes after booting. Despite running ntpd, the clock would not keep up; the offset stayed and kept growing over time. Rebooting back into 4.4, we immediately noticed the time stayed normal.

Over time I have tested about a dozen versions, which makes me think something was introduced in kernel 4.10 that makes the clock go out of sync.

So what do we observe?

After 1 min uptime:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp4.bit.nl     .PPS.            1 u    5   16    7    0.497  100.084  81.382
+ntp1.bit.nl     193.0.0.229      2 u    8   16    7    0.603   93.241  70.643
+ntp2.bit.nl     193.67.79.202    2 u    8   16    7    0.582   93.218  70.674
+ntp3.bit.nl     193.79.237.14    2 u    9   16    7    0.781   90.488  70.574

A couple of minutes later (and also hours/days later; the offset just keeps growing over time):

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp4.bit.nl     .PPS.            1 u   13   16  377    0.447  400.198 151.335
+ntp1.bit.nl     193.0.0.229      2 u   13   16  377    0.313  400.561 151.339
+ntp2.bit.nl     193.67.79.202    2 u   13   16  377    0.517  400.445 151.398
+ntp3.bit.nl     193.79.237.14    2 u   12   16  377    0.934  402.013 151.384

As mentioned, I tested about a dozen kernels and I believe I have pinpointed the release in which the drift was introduced: 4.10-rc1. Below are the test results of the kernels I have tested up till today:

Tested: 4.4.0-112-generic: not affected
Tested: 4.8.0-41-generic: not affected
Tested: 4.8.0-58-generic: not affected
Tested: 4.9.0 mainline: not affected
Tested: 4.9.66 mainline: not affected
Tested: 4.10-rc1 mainline: affected
Tested: 4.10 mainline: affected
Tested: 4.10.0-38-generic: affected
Tested: 4.10.0-40-generic: affected
Tested: 4.13.0-16-generic: affected
Tested: 4.13.0-31-generic: affected
Tested: 4.14.3 mainline: affected
Tested: 4.15-rc1 mainline: affected

When I was about to file this bug report about an hour ago, I noticed 4.15-rc9 was available and gave it a go to make sure I had really tested the latest version. It has been running for over an hour now and it is stable.

Most likely the following from the changelog is related to the issue we are having:

Len Brown (3):
      x86/tsc: Future-proof native_calibrate_tsc()
      x86/tsc: Fix erroneous TSC rate on Skylake Xeon
      x86/tsc: Print tsc_khz, when it differs from cpu_khz

Both servers that are having issues on our side are equipped with the following CPU:

CPU model (from /proc/cpuinfo)
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz

Standard information as requested:
1:
Description: Ubuntu 16.04.3 LTS
Release: 16.04
2:
root@bit-host6:~# apt-cache policy linux-image-generic-hwe-16.04
linux-image-generic-hwe-16.04:
  Installed: 4.13.0.31.51
  Candidate: 4.13.0.31.51
3: Stable time
4: A big time offset

New value:

== SRU Justification ==

We observe NTP time drift on two servers running HWE kernels in Xenial. A few weeks ago we wanted to switch from 4.4 to 4.10. When rebooting the servers into the 4.10 kernel we saw a big time offset within minutes after booting. Despite running ntpd, the clock would not keep up; the offset stayed and kept growing over time. Rebooting back into 4.4, we immediately noticed the time stayed normal.

Over time I have tested about a dozen versions, which makes me think something was introduced in kernel 4.10 that makes the clock go out of sync.

So what do we observe?

After 1 min uptime:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp4.bit.nl     .PPS.            1 u    5   16    7    0.497  100.084  81.382
+ntp1.bit.nl     193.0.0.229      2 u    8   16    7    0.603   93.241  70.643
+ntp2.bit.nl     193.67.79.202    2 u    8   16    7    0.582   93.218  70.674
+ntp3.bit.nl     193.79.237.14    2 u    9   16    7    0.781   90.488  70.574

A couple of minutes later (and also hours/days later; the offset just keeps growing over time):

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp4.bit.nl     .PPS.            1 u   13   16  377    0.447  400.198 151.335
+ntp1.bit.nl     193.0.0.229      2 u   13   16  377    0.313  400.561 151.339
+ntp2.bit.nl     193.67.79.202    2 u   13   16  377    0.517  400.445 151.398
+ntp3.bit.nl     193.79.237.14    2 u   12   16  377    0.934  402.013 151.384

As mentioned, I tested about a dozen kernels and I believe I have pinpointed the release in which the drift was introduced: 4.10-rc1. Below are the test results of the kernels I have tested up till today:

Tested: 4.4.0-112-generic: not affected
Tested: 4.8.0-41-generic: not affected
Tested: 4.8.0-58-generic: not affected
Tested: 4.9.0 mainline: not affected
Tested: 4.9.66 mainline: not affected
Tested: 4.10-rc1 mainline: affected
Tested: 4.10 mainline: affected
Tested: 4.10.0-38-generic: affected
Tested: 4.10.0-40-generic: affected
Tested: 4.13.0-16-generic: affected
Tested: 4.13.0-31-generic: affected
Tested: 4.14.3 mainline: affected
Tested: 4.15-rc1 mainline: affected

When I was about to file this bug report about an hour ago, I noticed 4.15-rc9 was available and gave it a go to make sure I had really tested the latest version. It has been running for over an hour now and it is stable.

Most likely the following from the changelog is related to the issue we are having:

Len Brown (3):
      x86/tsc: Future-proof native_calibrate_tsc()
      x86/tsc: Fix erroneous TSC rate on Skylake Xeon
      x86/tsc: Print tsc_khz, when it differs from cpu_khz

Both servers that are having issues on our side are equipped with the following CPU:

CPU model (from /proc/cpuinfo)
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz

Standard information as requested:
1:
Description: Ubuntu 16.04.3 LTS
Release: 16.04
2:
root@bit-host6:~# apt-cache policy linux-image-generic-hwe-16.04
linux-image-generic-hwe-16.04:
  Installed: 4.13.0.31.51
  Candidate: 4.13.0.31.51
3: Stable time
4: A big time offset

== Fixes ==
da4ae6c4a0b8 ("x86/tsc: Future-proof native_calibrate_tsc()")
b51120309348 ("x86/tsc: Fix erroneous TSC rate on Skylake Xeon")
4b5b2127238e ("x86/tsc: Print tsc_khz, when it differs from cpu_khz")

== Regression Potential ==
Low. These three commits fix an existing regression. They were also cc'd to stable, so they have had additional upstream review.

== Test Case ==
A test kernel was built with these patches and tested by the original bug reporter. The bug reporter states the test kernel resolved the bug.
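
Illustrative note (not part of the original log): the growth in offset between the two peer tables quoted in the description can be quantified with a small script. The sketch below is a rough aid, not a tool used in this bug. It assumes the standard ten-column peer layout with the offset in milliseconds in the ninth column; the names `parse_offsets`, `AFTER_1_MIN`, and `LATER` are hypothetical, and the sample rows are copied from the description above.

```python
from statistics import mean

# Peer rows copied from the bug description (offset column is in milliseconds).
AFTER_1_MIN = """\
remote refid st t when poll reach delay offset jitter
==============================================================================
*ntp4.bit.nl .PPS. 1 u 5 16 7 0.497 100.084 81.382
+ntp1.bit.nl 193.0.0.229 2 u 8 16 7 0.603 93.241 70.643
+ntp2.bit.nl 193.67.79.202 2 u 8 16 7 0.582 93.218 70.674
+ntp3.bit.nl 193.79.237.14 2 u 9 16 7 0.781 90.488 70.574
"""

LATER = """\
remote refid st t when poll reach delay offset jitter
==============================================================================
*ntp4.bit.nl .PPS. 1 u 13 16 377 0.447 400.198 151.335
+ntp1.bit.nl 193.0.0.229 2 u 13 16 377 0.313 400.561 151.339
+ntp2.bit.nl 193.67.79.202 2 u 13 16 377 0.517 400.445 151.398
+ntp3.bit.nl 193.79.237.14 2 u 12 16 377 0.934 402.013 151.384
"""


def parse_offsets(peer_table):
    """Return {peer name: offset in ms} for each peer row in the table."""
    offsets = {}
    for line in peer_table.splitlines():
        fields = line.split()
        # Peer rows have exactly ten columns; this skips the ==== rule,
        # and the explicit check skips the header row.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        # Drop a leading symbolic tally marker such as '*' or '+'.
        peer = fields[0].lstrip("*+-#")
        offsets[peer] = float(fields[8])
    return offsets


early = parse_offsets(AFTER_1_MIN)
late = parse_offsets(LATER)
growth_ms = mean(late.values()) - mean(early.values())
print(f"mean offset early: {mean(early.values()):.3f} ms")
print(f"mean offset later: {mean(late.values()):.3f} ms")
print(f"growth:            {growth_ms:.3f} ms")
```

Because the description only says "a couple of minutes later", the script reports the change in mean offset and does not attempt to derive a drift rate.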
2018-02-17 05:00:26 Khaled El Mously linux (Ubuntu Artful): status In Progress Fix Committed
2018-03-19 10:57:04 Stefan Bader tags kernel-da-key patch kernel-da-key patch verification-needed-artful
2018-03-21 07:50:03 Juul Spies tags kernel-da-key patch verification-needed-artful kernel-da-key patch verification-done-artful
2018-03-22 12:31:28 Janåke Rönnblom bug added subscriber Janåke Rönnblom
2018-04-03 14:10:10 Launchpad Janitor linux (Ubuntu Artful): status Fix Committed Fix Released
2018-04-03 14:10:10 Launchpad Janitor cve linked 2017-0861
2018-04-03 14:10:10 Launchpad Janitor cve linked 2017-1000407
2018-04-03 14:10:10 Launchpad Janitor cve linked 2017-15129
2018-04-03 14:10:10 Launchpad Janitor cve linked 2017-16994
2018-04-03 14:10:10 Launchpad Janitor cve linked 2017-17448
2018-04-03 14:10:10 Launchpad Janitor cve linked 2017-17450
2018-04-03 14:10:10 Launchpad Janitor cve linked 2017-17741
2018-04-03 14:10:10 Launchpad Janitor cve linked 2017-17805
2018-04-03 14:10:10 Launchpad Janitor cve linked 2017-17806
2018-04-03 14:10:10 Launchpad Janitor cve linked 2017-17807
2018-04-03 14:10:10 Launchpad Janitor cve linked 2018-1000026
2018-04-03 14:10:10 Launchpad Janitor cve linked 2018-5332
2018-04-03 14:10:10 Launchpad Janitor cve linked 2018-5333
2018-04-03 14:10:10 Launchpad Janitor cve linked 2018-5344
2019-10-03 08:26:13 Po-Hsu Lin linux (Ubuntu): status Fix Committed Fix Released
2019-10-03 08:26:15 Po-Hsu Lin linux (Ubuntu Bionic): status Fix Committed Fix Released