Fuel for OpenStack

devops.error.DevopsCalledProcessError: Command '/etc/init.d/ntp stop' returned exit code 1

Bug #1621920 reported by Alexey. Kalashnikov on 2016-09-09

This bug affects 1 person

Affects             Status     Importance  Assigned to   Milestone
Fuel for OpenStack  Won't Fix  High        Fuel QA Team  Fuel for OpenStack 9.2
Mitaka              Confirmed  High        Fuel QA Team  Fuel for OpenStack 9.x-updates

Bug Description

Swarm test failed with error:
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.repetitive_restart/53/testReport/(root)/ceph_partitions_repetitive_cold_restart/ceph_partitions_repetitive_cold_restart/

The failure reproduces when I try to revert the environment:
http://paste.openstack.org/show/570242/

Workaround:
Reverting the environment without time synchronization succeeds.

Tags:

Alexey. Kalashnikov (akalashnikov) on 2016-09-09

summary:

- ntp stop return code 1
+ devops.error.DevopsCalledProcessError: Command '/etc/init.d/ntp stop'
+ returned exit code 1

Dmitry Klenov (dklenov) on 2016-09-13

tags:	added: area-devops
Changed in fuel:
milestone:	none → 9.1
assignee:	nobody → Fuel QA Team (fuel-qa)

Nastya Urlapova (aurlapova) on 2016-09-14

Changed in fuel:
assignee:	Fuel QA Team (fuel-qa) → Fuel Sustaining (fuel-sustaining-team)
status:	New → Confirmed

Dmitry Pyzhov (dpyzhov) on 2016-09-14

Changed in fuel:
assignee:	Fuel Sustaining (fuel-sustaining-team) → MOS Linux (mos-linux)

Albert Syriy (asyriy) wrote on 2016-09-14:

Actually, the command is not expected to work:

'/etc/init.d/ntp stop' returned exit code 1 while expected [0]

The ntpd package was rebuilt to fix LP #1585751, and the service should now be started and stopped via Upstart.
See the commit and its description for an explanation:
https://review.fuel-infra.org/#/c/25388/

Please consider updating the test to use Upstart for managing NTP.
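
As an illustration of what an Upstart-aware check in the test might look like, the sketch below parses a status line in the form printed by `initctl status`, for example `ntp stop/waiting` or `ntp start/running, process 1234`. The function name `parse_upstart_status` is hypothetical and is not part of fuel-devops or fuel-qa, and the sketch assumes a plain job name without an instance suffix.

```python
import re

# Matches lines such as "ntp stop/waiting" or
# "ntp start/running, process 1234" (job without an instance name).
STATUS_RE = re.compile(
    r"^(?P<job>\S+) (?P<goal>\w+)/(?P<state>[\w-]+)"
    r"(?:, process (?P<pid>\d+))?"
)


def parse_upstart_status(line):
    """Split an Upstart status line into (job, goal, state, pid).

    Hypothetical helper for illustration only. pid is None when the
    line carries no process id.
    """
    match = STATUS_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized status line: %r" % line)
    pid = match.group("pid")
    return (
        match.group("job"),
        match.group("goal"),
        match.group("state"),
        int(pid) if pid else None,
    )
```

A test could then decide whether a stop is needed at all by checking the parsed goal and state instead of relying on the exit code of `/etc/init.d/ntp stop`.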

Changed in fuel:
assignee:	MOS Linux (mos-linux) → Fuel QA Team (fuel-qa)

Nastya Urlapova (aurlapova) wrote on 2016-09-14:

Albert, if you have introduced new functionality, you have to support it in the tests!
Why does the QA team have to do this alignment without any real notification?

Changed in fuel:
assignee:	Fuel QA Team (fuel-qa) → MOS Linux (mos-linux)

Albert Syriy (asyriy) on 2016-09-15

Changed in fuel:
assignee:	MOS Linux (mos-linux) → Albert Syriy (asyriy)
importance:	Undecided → Critical
status:	Confirmed → In Progress

Albert Syriy (asyriy) wrote on 2016-09-16:

It's quite hard to reproduce the bug.
It's a race condition.
I just noticed from the logs that there is an attempt to stop the ntp service when it has actually already been stopped.
At that point we get the "ntp: stop: Unknown instance" message.
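
One way a test helper could tolerate this message is to treat a failed stop that reports "Unknown instance" as "already stopped". The sketch below is only an illustration: the name `stop_already_done` is hypothetical and not part of fuel-devops, and it assumes the caller has already captured the exit code and output of the stop command.

```python
def stop_already_done(exit_code, output):
    """Return True if a stop attempt can be treated as successful.

    Hypothetical helper, not fuel-devops API. A zero exit code means
    the service was stopped by this call; a non-zero code together
    with the "Unknown instance" message means it was already stopped.
    """
    if exit_code == 0:
        return True
    return "Unknown instance" in output
```

Any other non-zero result is still reported as a failure, so genuine errors are not hidden.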

Albert Syriy (asyriy) wrote on 2016-09-19:

The root cause of the issue is a race condition.
Configuring network interfaces (the if-up script) triggers an ntpd restart.

Here are the details:
While one interface is restarting ntpd (the service is in the stop/waiting state and has not started yet), a second interface comes up and also restarts it, so the second interface tries to stop (and start) the ntpd service at the same time.
As a result, the "ntp: stop: Unknown instance" message appears.

Nothing is wrong with that; the service will successfully start and run later.

Nevertheless, an attempt to sync time with NTP (the following commands):
--------
service ntp stop
ntpdate ...
service ntp start
--------
while interfaces are restarting leads to the issue:

1. Interface one restarts ntpd (the ntpd service has been stopped but not started yet).
2. The command `service ntp stop` attempts to stop ntpd but gets an error (the service is already stopped).
3. Interface one starts the ntpd service. The NTP daemon is now in the running state.
4. The ntpdate command attempts to sync time (it fails because ntpd is running now).
5. `service ntp start` fails because the service is already running.
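
The interleaving in the five steps above can be replayed with a toy model. This is a minimal sketch, not a reproduction of the real services: `FakeNtpService` and `replay_race` are hypothetical names, and the model only tracks whether ntpd is running, using the failure rules described in this comment.

```python
class FakeNtpService:
    """Toy model of the ntp job: tracks only whether ntpd is running."""

    def __init__(self, running):
        self.running = running

    def stop(self):
        # Stopping an already stopped job fails ("Unknown instance").
        if not self.running:
            return 1
        self.running = False
        return 0

    def start(self):
        # Starting an already running job fails.
        if self.running:
            return 1
        self.running = True
        return 0

    def ntpdate(self):
        # Per the comment above, ntpdate fails while ntpd is running.
        return 1 if self.running else 0


def replay_race():
    """Replay steps 1-5 and return the exit codes seen by the time sync."""
    svc = FakeNtpService(running=True)
    seen = []
    svc.stop()                                     # step 1: if-up stops ntpd
    seen.append(("service ntp stop", svc.stop()))  # step 2: sync stops again
    svc.start()                                    # step 3: if-up starts ntpd
    seen.append(("ntpdate", svc.ntpdate()))        # step 4: sync runs ntpdate
    seen.append(("service ntp start", svc.start()))  # step 5: sync starts
    return seen
```

In this model all three commands issued by the time sync return exit code 1, while ntpd itself ends up running, which matches the observation that the service recovers on its own.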

I am curious: is this scenario possible in the field?

Albert Syriy (asyriy) wrote on 2016-09-19:

Since the issue has not been discovered in the field (customers have not reported it yet), the case where interfaces and ntp are restarted at the same time and hit the race condition appears to be very rare.
Changing the bug severity to High.
We agreed that @vkhlyunev will try to reproduce the bug (or scenario) in the HW lab.

Changed in fuel:
importance:	Critical → High

Albert Syriy (asyriy) wrote on 2016-09-20:

Moving the bug to Vladimir (@vkhlyunev) while waiting for test results.

Changed in fuel:
assignee:	Albert Syriy (asyriy) → Vladimir Khlyunev (vkhlyunev)

Vladimir Khlyunev (vkhlyunev) wrote on 2016-09-22:

Not reproduced on bare metal, so it is OK to move this to 9.2.

Changed in fuel:
assignee:	Vladimir Khlyunev (vkhlyunev) → Albert Syriy (asyriy)

Albert Syriy (asyriy) wrote on 2016-09-22:

Moving the bug to 9.2 will enable us to update the scripts and avoid the race in ntp management.

Changed in fuel:
milestone:	9.1 → 9.2
status:	In Progress → Confirmed

Albert Syriy (asyriy) on 2016-09-22

Changed in fuel:
assignee:	Albert Syriy (asyriy) → Fuel QA Team (fuel-qa)

Alexey. Kalashnikov (akalashnikov) wrote on 2016-09-29:

Reproduced on today's swarm run, 9.1 snapshot #323,
when I tried to revert the snapshot for a failed swarm test:
http://paste.openstack.org/show/583477/

Failed test:
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.command_line/77/consoleFull

Dmitry Belyaninov (dbelyaninov) wrote on 2016-10-23:

#10

https://product-ci.infra.mirantis.net/view/9.x_acceptance/job/9.x.acceptance.ubuntu.load/17/console

Roman Vyalov (r0mikiam) on 2017-02-03

Changed in fuel:
status:	Confirmed → Won't Fix
