Comment 3 for bug 1393473

Dennis Dmitriev (ddmitriev) wrote : Re: [System tests] Ceph_ha_restart test failed because of time skew detected

Andrey Sledzinskiy, you are right: ntpd on the master was in a desynchronized state for a while (about an hour):

[root@nailgun ~]# ntptime
ntp_gettime() returns code 5 (ERROR)
  time d814ba4b.636b3b94 Mon, Nov 17 2014 18:17:47.388, (.388355681),
  maximum error 16000000 us, estimated error 16000000 us, TAI offset 0
ntp_adjtime() returns code 5 (ERROR)
  modes 0x0 (),
  offset 0.000 us, frequency -500.000 ppm, interval 1 s,
  maximum error 16000000 us, estimated error 16000000 us,
  status 0x2041 (PLL,UNSYNC,NANO),
  time constant 6, precision 0.001 us, tolerance 500 ppm,

[root@nailgun ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+ns.aksinet.net  195.2.64.5       2 u   18   64  377   22.068  330.121 130.682
*195.91.239.8    .PPS.            1 u   17   64  377   15.440  331.223 129.450
+cello.corbina.n 192.36.144.22    2 u   14   64  377   23.976  298.865 104.554
 LOCAL(0)        .LOCL.          10 l  642   64    0    0.000    0.000   0.000

[root@nailgun ~]# ntpq -c assoc

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 44432  945a   yes   yes  none candidate    sys_peer  5
  2 44433  961a   yes   yes  none  sys.peer    sys_peer  1
  3 44434  9414   yes   yes  none candidate   reachable  1
  4 44435  8023   yes    no  none    reject unreachable  2

It synchronized later:

[root@nailgun ~]# ntptime
ntp_gettime() returns code 0 (OK)
  time d814c205.8fa1f47c Mon, Nov 17 2014 18:50:45.561, (.561065380),
  maximum error 173659 us, estimated error 21622 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 23017.206 us, frequency 42.672 ppm, interval 1 s,
  maximum error 173659 us, estimated error 21622 us,
  status 0x2001 (PLL,NANO),
  time constant 6, precision 0.001 us, tolerance 500 ppm,
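
The two ntptime status words above (0x2041 and 0x2001) differ only in the UNSYNC bit. A minimal sketch that decodes them, assuming the STA_* bit values from Linux's <linux/timex.h>; only the three flags seen in this output are listed, and decode_status is a hypothetical helper name:

```python
# Subset of kernel clock status bits (values assumed from <linux/timex.h>).
STA_FLAGS = {
    0x0001: "PLL",     # phase-locked loop updates enabled
    0x0040: "UNSYNC",  # clock is not synchronized
    0x2000: "NANO",    # nanosecond resolution
}


def decode_status(status):
    """Return the names of the known flags set in an ntp_adjtime() status word."""
    return [name for bit, name in sorted(STA_FLAGS.items()) if status & bit]


print(decode_status(0x2041))  # ['PLL', 'UNSYNC', 'NANO']
print(decode_status(0x2001))  # ['PLL', 'NANO']
```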

We need to reconsider our time sync strategy.
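
One possible building block for such a strategy is a pre-test check that the master's offset to its selected peer is small enough. The sketch below is only an illustration, not something we have in the test framework: it parses `ntpq -p` output as shown above, the function names are hypothetical, and the 50 ms default threshold is an assumption that should be tuned to whatever clock drift the Ceph monitors are configured to tolerate.

```python
def selected_peer_offset_ms(ntpq_output):
    """Return the offset (ms) of the peer marked '*' in `ntpq -p` output, or None."""
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            # Columns: remote refid st t when poll reach delay offset jitter
            fields = line.split()
            return float(fields[8])
    return None


def is_time_synced(ntpq_output, max_offset_ms=50.0):
    """True if a system peer is selected and its offset is within the threshold."""
    offset = selected_peer_offset_ms(ntpq_output)
    return offset is not None and abs(offset) <= max_offset_ms


# Peer list copied from the de-synced state reported above.
SAMPLE = """\
     remote refid st t when poll reach delay offset jitter
==============================================================================
+ns.aksinet.net 195.2.64.5 2 u 18 64 377 22.068 330.121 130.682
*195.91.239.8 .PPS. 1 u 17 64 377 15.440 331.223 129.450
+cello.corbina.n 192.36.144.22 2 u 14 64 377 23.976 298.865 104.554
 LOCAL(0) .LOCL. 10 l 642 64 0 0.000 0.000 0.000
"""

print(selected_peer_offset_ms(SAMPLE))  # 331.223
print(is_time_synced(SAMPLE))           # False
```

Note that in the de-synced state ntpq already showed a selected peer ('*'), so checking only for the presence of a system peer would not have caught the problem; the offset has to be checked as well.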