devops.error.DevopsCalledProcessError: Command '/etc/init.d/ntp stop' returned exit code 1

Bug #1621920 reported by Alexey. Kalashnikov
Affects             Status     Importance  Assigned to   Milestone
Fuel for OpenStack  Won't Fix  High        Fuel QA Team
Mitaka              Confirmed  High        Fuel QA Team

Bug Description

Tags: area-devops
summary: - ntp stop return code 1
+ devops.error.DevopsCalledProcessError: Command '/etc/init.d/ntp stop'
+ returned exit code 1
Dmitry Klenov (dklenov)
tags: added: area-devops
Changed in fuel:
milestone: none → 9.1
assignee: nobody → Fuel QA Team (fuel-qa)
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Fuel Sustaining (fuel-sustaining-team)
status: New → Confirmed
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → MOS Linux (mos-linux)
Revision history for this message
Albert Syriy (asyriy) wrote :

Actually the command should not work:

'/etc/init.d/ntp stop' returned exit code 1 while expected [0]

NTPD was rebuilt to fix LP #1585751, and it should now be started and stopped via Upstart.
See the commit and its description for an explanation:
https://review.fuel-infra.org/#/c/25388/

Please consider updating the test to use Upstart for managing NTP.
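
For illustration only, a test could check the Upstart job state before deciding whether a stop is needed. The sketch below assumes the usual `status <job>` output shape (`ntp start/running, process 1234` or `ntp stop/waiting`); the function name is hypothetical and is not part of fuel-devops.

```python
def upstart_job_running(status_output: str) -> bool:
    """Return True if Upstart `status <job>` output reports start/running.

    Assumed output shape: "<job> <goal>/<state>[, process <pid>]".
    """
    parts = status_output.strip().split()
    if len(parts) < 2:
        # Empty or unexpected output: treat the job as not running.
        return False
    # The goal/state token carries a trailing comma when a PID follows.
    goal_state = parts[1].rstrip(",")
    return goal_state == "start/running"
```

A caller would then run the stop command only when this returns True, instead of treating exit code 1 from an already stopped job as a failure.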

Changed in fuel:
assignee: MOS Linux (mos-linux) → Fuel QA Team (fuel-qa)
Revision history for this message
Nastya Urlapova (aurlapova) wrote :

Albert, if you have introduced new functionality, you have to support it in the tests!
Why does the QA team have to do this alignment without any significant notification?

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → MOS Linux (mos-linux)
Albert Syriy (asyriy)
Changed in fuel:
assignee: MOS Linux (mos-linux) → Albert Syriy (asyriy)
importance: Undecided → Critical
status: Confirmed → In Progress
Revision history for this message
Albert Syriy (asyriy) wrote :

The bug is quite hard to reproduce.
It's a race condition.
I noticed from the logs that there is an attempt to stop the ntp service when it has actually already been stopped.
At that point we get the "ntp: stop: Unknown instance" message.

Revision history for this message
Albert Syriy (asyriy) wrote :

The root cause of the issue is a race condition.
Configuring a network interface (the if-up script) triggers an ntpd restart.

Here are the details:
While one interface is restarting ntpd (the service is in the stop/waiting state and has not started yet), a second interface also triggers a restart, so the second interface tries to stop (and start) the ntpd service at the same time.
As a result, the "ntp: stop: Unknown instance" message appears.

Nothing is wrong with that; the service will successfully start and run later.

Nevertheless, an attempt to sync time with NTP using the following commands:
--------
service ntp stop
ntpdate ...
service ntp start
-------
while interfaces are restarting leads to the issue:

1. Interface one restarts ntpd (the ntpd service has been stopped but not started yet).
2. The command `service ntp stop` attempts to stop ntpd but gets an error (the service is already stopped).
3. Interface one starts the ntpd service. The NTP daemon is now in the running state.
4. The ntpdate command attempts to sync time (it fails because ntpd is running now).
5. `service ntp start` fails because the service is already running.

I am curious: is this scenario possible in the field?
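
As a rough sketch of how the sync sequence above could tolerate this race (not the actual fuel-devops code), the stop and start exit codes can be ignored and the stop/ntpdate pair retried if ntpd came back in between. The function names, the `run` callback, and the `attempts` parameter are assumptions made for illustration.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], int]


def shell_runner(cmd: List[str]) -> int:
    """Run a command and return its exit code (no exception on non-zero)."""
    return subprocess.call(cmd)


def sync_time(server: str, run: Runner = shell_runner, attempts: int = 3) -> bool:
    """Stop ntpd, run ntpdate, start ntpd; retry if ntpd was restarted meanwhile."""
    synced = False
    for _ in range(attempts):
        # A non-zero exit here may only mean the job is already stopped
        # ("stop: Unknown instance"), so the exit code is ignored.
        run(["service", "ntp", "stop"])
        if run(["ntpdate", server]) == 0:
            synced = True
            break
    # Starting an already running job also returns non-zero; ignore that too.
    run(["service", "ntp", "start"])
    return synced
```

Passing the runner as a parameter keeps the sketch testable without touching a real service; in a real environment the default `shell_runner` would execute the commands.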

Revision history for this message
Albert Syriy (asyriy) wrote :

Since the issue has not been discovered in the field (customers have not reported it yet), the case where interfaces are restarted, ntp is restarted, and the race condition is hit appears to be very rare.
Changing the bug severity to High.
We agreed that @vkhlyunev will try to reproduce the bug (or the scenario) in the HW lab.

Changed in fuel:
importance: Critical → High
Revision history for this message
Albert Syriy (asyriy) wrote :

Moving the bug to Vladimir (@vkhlyunev) while waiting for test results.

Changed in fuel:
assignee: Albert Syriy (asyriy) → Vladimir Khlyunev (vkhlyunev)
Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :

Not reproduced on bare metal; it's OK to move to 9.2.

Changed in fuel:
assignee: Vladimir Khlyunev (vkhlyunev) → Albert Syriy (asyriy)
Revision history for this message
Albert Syriy (asyriy) wrote :

Moving the bug to 9.2 will enable us to update the scripts and avoid races in NTP management.
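
One possible way to avoid such a race, shown only as a sketch, is to serialize every ntpd restart behind an exclusive file lock so that a second caller waits until the first stop/start pair has finished. The lock path, the function names, and the `run` callback below are hypothetical; the real if-up scripts are not shown in this report.

```python
import fcntl
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def exclusive_lock(path: str) -> Iterator[None]:
    """Hold an exclusive flock on `path` for the duration of the block."""
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def restart_ntp(run: Callable[[List[str]], int],
                lock_path: str = "/run/lock/ntp-restart.lock") -> int:
    """Stop and start ntpd while holding the lock; return the start exit code."""
    with exclusive_lock(lock_path):
        # The stop exit code is ignored: the job may already be stopped.
        run(["service", "ntp", "stop"])
        return run(["service", "ntp", "start"])
```

This only helps if every code path that restarts ntpd takes the same lock, which is an assumption about how the scripts would be updated.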

Changed in fuel:
milestone: 9.1 → 9.2
status: In Progress → Confirmed
Albert Syriy (asyriy)
Changed in fuel:
assignee: Albert Syriy (asyriy) → Fuel QA Team (fuel-qa)
Revision history for this message
Alexey. Kalashnikov (akalashnikov) wrote :

Reproduced on today's swarm run (9.1 snapshot #323)
when trying to revert the snapshot for a failed swarm test:
http://paste.openstack.org/show/583477/

failed test:
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.command_line/77/consoleFull

Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :
Roman Vyalov (r0mikiam)
Changed in fuel:
status: Confirmed → Won't Fix