MariaDB upgrade fails - containers get signal 9 during upgrade

Bug #2029613 reported by Michal Nasiadka
This bug affects 3 people
Affects         Status                     Importance   Assigned to       Milestone
kolla-ansible   Status tracked in Bobcat
  Antelope      Fix Committed              Critical     Michal Nasiadka
  Bobcat        Fix Released               Critical     Michal Nasiadka

Bug Description

A bug in kolla-ansible's systemd functionality causes MariaDB upgrades to fail.

kolla_docker_worker/systemd_worker does not take into account that the MariaDB role first starts a bootstrap container (named 'mariadb') to bootstrap the Galera cluster (with --wsrep-new-cluster) and then restarts it in non-bootstrap mode under the same name.

The first run of the 'mariadb' container happens outside systemd, because the kolla_docker task sets docker_restart_policy to "no". On restart, the code creates a systemd unit, runs daemon-reload and enables the new unit, but stopping the container through that unit has no effect, because the unit is already in the stopped state. The container is therefore still running when it is removed.
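
The sequence above can be modelled with a small, self-contained sketch. It is a toy simulation only: the Container and Unit classes and the buggy_restart function are hypothetical names made up for illustration and are not kolla-ansible, Docker or systemd APIs. It assumes, as the fix's commit message states, that removing a still-running container kills it with SIGKILL.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Toy stand-in for a container (hypothetical, not a Docker class)."""
    name: str
    running: bool = False
    killed_with_sigkill: bool = False

    def graceful_stop(self) -> None:
        self.running = False

    def force_remove(self) -> None:
        # Removing a container that is still running kills it (SIGKILL).
        if self.running:
            self.killed_with_sigkill = True
            self.running = False


@dataclass
class Unit:
    """Toy stand-in for a systemd unit wrapping a container (hypothetical)."""
    container: Container
    active: bool = False

    def stop(self) -> bool:
        # Stopping an inactive unit succeeds without touching the container.
        if self.active:
            self.container.graceful_stop()
            self.active = False
        return True


def buggy_restart(container: Container) -> Container:
    unit = Unit(container)      # the freshly created unit starts out inactive
    assert unit.stop()          # reports success, but the container keeps running
    container.force_remove()    # the still-running container gets killed
    return container


# The bootstrap container was started outside systemd (restart policy "no").
bootstrap = Container("mariadb", running=True)
result = buggy_restart(bootstrap)
print(result.killed_with_sigkill)
```

In this model the unit's stop always succeeds, so nothing ever stops the database gracefully before removal.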

Changed in kolla-ansible:
importance: Undecided → Critical
assignee: nobody → Michal Nasiadka (mnasiadka)
Changed in kolla-ansible:
status: New → In Progress
Mark Goddard (mgoddard) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/890198
Committed: https://opendev.org/openstack/kolla-ansible/commit/1497ab2ab30475083289cc1fea54464b695fca49
Submitter: "Zuul (22348)"
Branch: master

commit 1497ab2ab30475083289cc1fea54464b695fca49
Author: Michal Nasiadka <email address hidden>
Date: Thu Aug 3 19:54:31 2023 +0000

    systemd: handle running container without systemd unit

    MariaDB bootstrap has a phase where the first MariaDB container
    is running with Galera bootstrap - after a check that WSREP
    is synced is successful - we restart the container.

    The bootstrap container is named mariadb and running with
    docker_restart_policy: "no" - the restarted container should be running
    in systemd.

    Before this patch the code created a systemd unit but it was initially
    stopped - so stopping was always a success - and the container would be
    killed with SIGKILL on removal (which obviously breaks MariaDB).

    This patch also improves docker/systemd stops by waiting for real
    unit/container stop and adds failing CI for containers that are
    killed with signal 9.

    Closes-Bug: #2029613

    Change-Id: I0a03e509ce228a50e081fcab44d2b4831251190c
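
The two ideas in the commit message (stop a container that runs without a systemd unit directly and wait until it has really stopped, and flag containers killed with signal 9) can be sketched roughly as follows. This is not the code from the patch: stop_and_wait, killed_with_signal_9 and all their parameters are hypothetical names, the callables stand in for whatever systemd and container-engine calls the real worker makes, and the check assumes that a container killed with signal 9 is recorded with exit code 137 (128 + 9).

```python
import time
from typing import Callable, Iterable, List, Mapping

SIGKILL_EXIT_CODE = 128 + 9  # conventional exit status after being killed by signal 9


def stop_and_wait(
    unit_is_active: Callable[[], bool],
    container_is_running: Callable[[], bool],
    stop_unit: Callable[[], None],
    stop_container: Callable[[], None],
    timeout: float = 60.0,
    interval: float = 0.5,
) -> bool:
    """Stop gracefully; return True once both unit and container are stopped."""
    if unit_is_active():
        stop_unit()
    elif container_is_running():
        # Container runs without a systemd unit (the bootstrap case):
        # stop it directly instead of trusting the inactive unit.
        stop_container()

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not unit_is_active() and not container_is_running():
            return True
        time.sleep(interval)
    # One final check after the timeout has elapsed.
    return not unit_is_active() and not container_is_running()


def killed_with_signal_9(containers: Iterable[Mapping[str, object]]) -> List[str]:
    """Return the names of containers whose recorded exit code is 137."""
    return [
        str(c["name"])
        for c in containers
        if c.get("exit_code") == SIGKILL_EXIT_CODE
    ]


# Usage with in-memory fakes: a running container and an inactive unit.
state = {"running": True}
ok = stop_and_wait(
    unit_is_active=lambda: False,
    container_is_running=lambda: state["running"],
    stop_unit=lambda: None,
    stop_container=lambda: state.update(running=False),
    timeout=1.0,
    interval=0.01,
)
print(ok)
```

Passing the checks and stop actions in as callables keeps the sketch independent of any particular systemd or container-engine binding.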

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/891750

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/891750
Committed: https://opendev.org/openstack/kolla-ansible/commit/a5addd82581e097d953287d70d44cb6ce652048b
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit a5addd82581e097d953287d70d44cb6ce652048b
Author: Michal Nasiadka <email address hidden>
Date: Thu Aug 3 19:54:31 2023 +0000

    systemd: handle running container without systemd unit

    Closes-Bug: #2029613

    Change-Id: I0a03e509ce228a50e081fcab44d2b4831251190c
    (cherry picked from commit 1497ab2ab30475083289cc1fea54464b695fca49)
