tripleo

systemd wrappers (sidecars) locking doesn't really work

Bug #1874470 reported by Brent Eagles on 2020-04-23

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
tripleo	Incomplete	High	Unassigned

Bug Description

IIUC, the wrapper that populates the processes file and the sync executed by the related systemd service on the host are supposed to share a file lock to prevent races on the processes file. This is needed because the wrapper appends to the file and sync truncates the file after it runs. However, the lock used in the wrapper is under /var/lock in the container, which is not shared with the host, so the sync script never waits for the wrapper to be done. Moving the lock file to a path on a shared mount in the container seems to solve that particular race.
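
To illustrate why the lock location matters, here is a minimal Python sketch (not the actual wrapper or sync script, which are not shown in this report). It assumes flock-style advisory locking; the file names and the `try_lock` helper are hypothetical. Two lock files at different paths never exclude each other, while a single file visible to both sides does.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking flock on `path`.

    Returns the open file descriptor on success, or None if the lock is
    already held through another open of the same file.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins: the container's private /var/lock, the
        # host's /var/lock, and a lock on a mount both sides can see.
        container_lock = os.path.join(root, "container-var-lock")
        host_lock = os.path.join(root, "host-var-lock")
        shared_lock = os.path.join(root, "shared-mount-lock")

        # Broken case: wrapper and sync lock different files, so sync
        # acquires its lock immediately and never waits for the wrapper.
        wrapper_fd = try_lock(container_lock)
        sync_fd = try_lock(host_lock)
        sync_ran_unprotected = sync_fd is not None
        os.close(wrapper_fd)
        os.close(sync_fd)

        # Shared case: both lock the same file, so sync is held off
        # while the wrapper owns the lock.
        wrapper_fd = try_lock(shared_lock)
        sync_fd = try_lock(shared_lock)
        sync_was_blocked = sync_fd is None
        os.close(wrapper_fd)

        return sync_ran_unprotected, sync_was_blocked
```

Calling `demo()` returns a pair of booleans: whether sync got its lock in the unshared case, and whether it was held off in the shared case.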

... in addition ...

It appears that the triggering of the systemd process that runs the sync command is also racy. It appears that if the processes file has an entry added after the shared lock is released, but the sync process hasn't completed, sync doesn't happen again.
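
One generic way to tolerate a missed trigger is for the sync to keep draining the processes file until it is empty. The Python sketch below illustrates that pattern only; it is not the fix proposed in the reviews linked in this report. All names (`add_entry`, `take_entries`, `sync_until_empty`, `create_sidecar`) are hypothetical, and unlike the sync described above it truncates the file while holding the lock rather than after the run.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def shared_lock(lock_file):
    """Hold an exclusive flock on lock_file for the duration of the block."""
    with open(lock_file, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


def add_entry(processes_file, lock_file, entry):
    """Wrapper side: append one entry under the shared lock."""
    with shared_lock(lock_file), open(processes_file, "a") as f:
        f.write(entry + "\n")


def take_entries(processes_file, lock_file):
    """Sync side: read and truncate the file atomically under the lock."""
    with shared_lock(lock_file), open(processes_file, "a+") as f:
        f.seek(0)
        entries = [line for line in f.read().splitlines() if line.strip()]
        f.truncate(0)
    return entries


def sync_until_empty(processes_file, lock_file, create_sidecar):
    """Process entries, then re-check the file so late additions are not lost."""
    while True:
        entries = take_entries(processes_file, lock_file)
        if not entries:
            return
        for entry in entries:
            create_sidecar(entry)


def demo():
    with tempfile.TemporaryDirectory() as d:
        procs = os.path.join(d, "processes")
        lock = os.path.join(d, "processes.lock")
        add_entry(procs, lock, "subnet-1")
        add_entry(procs, lock, "subnet-2")
        created = []

        def create_sidecar(entry):
            created.append(entry)
            if entry == "subnet-1":
                # Simulate the wrapper adding an entry while sync is running.
                add_entry(procs, lock, "subnet-3")

        sync_until_empty(procs, lock, create_sidecar)
        return created
```

In `demo()`, the entry added in the middle of the run is picked up by the second pass of the loop instead of waiting for another trigger.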

This was reproduced by restarting the neutron dhcp agent container when 3 subnets were configured. The first problem resulted in only one sidecar being created; the second issue would occasionally result in one or more sidecar containers being missed in the sync. The processes file would have remaining entries, and restarting the dhcp_dnsmasq service on the host would cause the remaining sidecars to be created.

Tags:

Brent Eagles (beagles) on 2020-04-23

Changed in tripleo:
status:	New → Triaged
milestone:	none → ussuri-rc3
importance:	Undecided → Critical

Brent Eagles (beagles) on 2020-04-23

description:	updated

Bogdan Dobrelya (bogdando) on 2020-04-24

tags:	added: train-backport-potential

Bogdan Dobrelya (bogdando) wrote on 2020-04-24:

> It appears that if the processes file has an entry added after the shared lock is released, but the sync process hasn't completed, sync doesn't happen again.

That particular part of the issue really sounds like a race in systemd watchers?

Bogdan Dobrelya (bogdando) wrote on 2020-04-24:

https://github.com/systemd/systemd/pull/5839 ?

Bogdan Dobrelya (bogdando) wrote on 2020-04-24:

Sorry, this one is probably a better link https://github.com/systemd/systemd/issues/5770

OpenStack Infra (hudson-openstack) wrote on 2020-04-24: Fix proposed to tripleo-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/722816

Changed in tripleo:
assignee:	nobody → Brent Eagles (beagles)
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2020-04-27:

Fix proposed to branch: master
Review: https://review.opendev.org/723373

Changed in tripleo:
assignee:	Brent Eagles (beagles) → Bogdan Dobrelya (bogdando)

Bogdan Dobrelya (bogdando) wrote on 2020-04-27:

An alternative implementation w/o introducing a sync daemon to replace the oneshot sync service https://review.opendev.org/#/c/723373/

Bogdan Dobrelya (bogdando) wrote on 2020-04-27:

note, patches should depend on the fix for the "the lock used in the wrapper is under /var/lock in the container which is not shared with the host so the sync script never waits for the wrapper to be done" part, which is not ready yet...

OpenStack Infra (hudson-openstack) wrote on 2020-04-27: Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/723522

OpenStack Infra (hudson-openstack) wrote on 2020-04-29: Fix proposed to tripleo-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/724259

OpenStack Infra (hudson-openstack) wrote on 2020-05-05: Fix merged to tripleo-heat-templates (master)

#10

Reviewed: https://review.opendev.org/723522
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=1517df0fc30b7b10263aa96fe48978d7bf17a0fe
Submitter: Zuul
Branch: master

commit 1517df0fc30b7b10263aa96fe48978d7bf17a0fe
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Apr 27 15:11:21 2020 +0200

Add shared volume for side-car wrapper locks

    The lock used in the wrapper is under /var/lock in the container which
    is not shared with the host so the sync script never waits for the
    wrapper to be done. Moving the lock file to a path on a shared mount in
    the container seems to solve that particular race.

    Partial-bug: #1874470

    Change-Id: Iaa3a19bc47241e6eb686d65c1a198ec69505398e
    Signed-off-by: Bogdan Dobrelya <email address hidden>

OpenStack Infra (hudson-openstack) wrote on 2020-05-06: Fix merged to tripleo-ansible (master)

#11

Reviewed: https://review.opendev.org/724259
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=90a05a5f8a57928f3d429925468749c482eaf1b6
Submitter: Zuul
Branch: master

commit 90a05a5f8a57928f3d429925468749c482eaf1b6
Author: Bogdan Dobrelya <email address hidden>
Date: Wed Apr 29 11:05:12 2020 +0200

Use shared volume for side-car wrapper locks

    Change-Id: I660b7189a9e1c3197f2cdcc77af62584691dde16
    Partial-bug: #1874470
    Depends-On: https://review.opendev.org/723522
    Signed-off-by: Bogdan Dobrelya <email address hidden>

OpenStack Infra (hudson-openstack) wrote on 2020-05-07: Change abandoned on tripleo-ansible (master)

#12

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/723373

OpenStack Infra (hudson-openstack) wrote on 2020-05-07:

#13

Change abandoned by Brent Eagles (<email address hidden>) on branch: master
Review: https://review.opendev.org/722816

Brent Eagles (beagles) on 2020-05-14

Changed in tripleo:
importance:	Critical → High

yatin (yatinkarel) wrote on 2020-05-14:

#14

After a reboot, /var/lock/containers gets deleted, and the ovn metadata container didn't start until the directory was created manually.

/var/lock is a symlink to /var/run/lock, so it gets cleaned up on reboot.

lrwxrwxrwx. 1 root root 11 Jan 13 21:49 /var/lock -> ../run/lock
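
A minimal Python sketch of one defensive option: create the lock directory if it is missing before taking the lock, so that a directory wiped at reboot does not block startup. This is an illustration under assumptions, not what the linked reviews did; `lock_with_dir` and the paths used in `demo` are hypothetical.

```python
import fcntl
import os
import tempfile


def lock_with_dir(lock_path):
    """Create the lock's parent directory if needed, then take an exclusive flock.

    Returns the open file descriptor; closing it releases the lock.
    """
    os.makedirs(os.path.dirname(lock_path), exist_ok=True)
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    fcntl.flock(fd, fcntl.LOCK_EX)
    return fd


def demo():
    with tempfile.TemporaryDirectory() as run_lock:
        # Stand-in for the lock directory right after a reboot: the
        # "containers" subdirectory does not exist yet.
        lock_path = os.path.join(run_lock, "containers", "example.lock")
        missing_before = not os.path.isdir(os.path.dirname(lock_path))
        fd = lock_with_dir(lock_path)
        exists_after = os.path.isfile(lock_path)
        os.close(fd)
        return missing_before, exists_after
```

Whether the directory should be recreated by the container, by the host service, or by host configuration is a design choice this sketch does not settle.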

Bogdan Dobrelya (bogdando) on 2020-05-14

Changed in tripleo:
status:	In Progress → Triaged
assignee:	Bogdan Dobrelya (bogdando) → nobody

OpenStack Infra (hudson-openstack) wrote on 2020-05-15: Related fix proposed to tripleo-heat-templates (master)

#15

Related fix proposed to branch: master
Review: https://review.opendev.org/728360

OpenStack Infra (hudson-openstack) wrote on 2020-05-21: Change abandoned on tripleo-heat-templates (master)

#16

Change abandoned by yatin (<email address hidden>) on branch: master
Review: https://review.opendev.org/728360
Reason: In favor of revert https://review.opendev.org/#/c/728891/

wes hayutin (weshayutin) on 2020-05-26

Changed in tripleo:
milestone:	ussuri-rc3 → victoria-1

Emilien Macchi (emilienm) on 2020-07-28

Changed in tripleo:
milestone:	victoria-1 → victoria-3

Marios Andreou (marios-b) on 2020-11-03

Changed in tripleo:
milestone:	victoria-3 → wallaby-1

Marios Andreou (marios-b) on 2020-12-08

Changed in tripleo:
milestone:	wallaby-1 → wallaby-2

Marios Andreou (marios-b) on 2021-01-29

Changed in tripleo:
milestone:	wallaby-2 → wallaby-3

Marios Andreou (marios-b) on 2021-03-17

Changed in tripleo:
milestone:	wallaby-3 → wallaby-rc1

Marios Andreou (marios-b) on 2021-05-06

Changed in tripleo:
milestone:	wallaby-rc1 → xena-1

Marios Andreou (marios-b) wrote on 2021-05-07:

#17

This is an automated action. Bug status has been set to 'Incomplete' and target milestone has been removed due to inactivity. If you disagree please re-set these values and reach out to us on freenode #tripleo

Changed in tripleo:
milestone:	xena-1 → none
status:	Triaged → Incomplete
