2023-03-24 14:39:44 |
Scati Labs I+D |
bug |
|
|
added bug |
2023-03-24 14:39:44 |
Scati Labs I+D |
attachment added |
|
apport crash view https://bugs.launchpad.net/bugs/2012740/+attachment/5657282/+files/pacemaker_crash.txt |
|
2023-03-27 07:23:17 |
Athos Ribeiro |
bug |
|
|
added subscriber Ubuntu Server |
2023-03-27 07:23:26 |
Athos Ribeiro |
pacemaker (Ubuntu): status |
New |
Triaged |
|
2023-03-27 07:24:07 |
Athos Ribeiro |
tags |
|
bitesize server-todo |
|
2023-03-27 07:24:46 |
Athos Ribeiro |
nominated for series |
|
Ubuntu Jammy |
|
2023-03-27 07:24:46 |
Athos Ribeiro |
bug task added |
|
pacemaker (Ubuntu Jammy) |
|
2023-03-27 07:24:53 |
Athos Ribeiro |
pacemaker (Ubuntu Jammy): status |
New |
Triaged |
|
2023-03-27 07:24:58 |
Athos Ribeiro |
pacemaker (Ubuntu): status |
Triaged |
Fix Released |
|
2023-03-28 15:16:59 |
Christian Ehrhardt |
pacemaker (Ubuntu Jammy): assignee |
|
Michał Małoszewski (michal-maloszewski99) |
|
2023-04-01 13:35:55 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~michal-maloszewski99/ubuntu/+source/pacemaker/+git/pacemaker/+merge/440191 |
|
2023-04-06 08:30:46 |
Michał Małoszewski |
description |
After migrating a MySQL cluster from bionic to jammy (pacemaker 2.1.2-1ubuntu3), pacemaker started to malfunction because of pacemaker-controld crashes. The issue is easy to reproduce by putting the promoted node into standby.
The apport crash report is attached; this is the same bug reported in Red Hat's Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2039675
It was fixed upstream in this commit: https://github.com/ClusterLabs/pacemaker/commit/ed8b2c86ab77aaa3d7fd688c049ad5e1b922a9c6
Please provide an update for pacemaker, because it is unusable in this state. |
[Impact]
pacemaker-controld is Pacemaker's coordinator; it maintains a consistent view of cluster membership and orchestrates all the other components.
Users of MySQL clusters migrating from bionic to jammy reported a crash.
The crash is caused by lrmd_dispatch_internal(), which assigns the exit_reason string from an XML node directly to a new lrmd_event_data_t object without duplicating it, so the string ends up being freed twice. The fix is to copy event.exit_reason in lrmd_dispatch_internal() before invoking the callback.
[Test Plan]
lxc launch ubuntu:22.04 node1
lxc shell node1
apt update && apt dist-upgrade -y
apt install pcs mysql-server resource-agents -y
echo hacluster:hacluster | chpasswd
mysql -e "CREATE USER 'replicator'@'localhost'"
mysql -e "GRANT RELOAD, PROCESS, SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'replicator'@'localhost'"
systemctl disable mysql.service
systemctl stop mysql.service
exit
lxc copy node1 node2
lxc start node2
lxc shell node1
pcs host auth node1 node2 -u hacluster -p hacluster
pcs cluster setup --force mysqlclx node1 node2 transport udpu
pcs cluster enable --all
pcs cluster start --all
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs resource create p_mysql ocf:heartbeat:mysql \
replication_user=replicator \
test_user=root \
op demote interval=0s timeout=120 \
monitor interval=20 timeout=30 \
monitor interval=10 role=Master timeout=30 \
monitor interval=30 role=Slave timeout=30 \
notify interval=0s timeout=90 \
promote interval=0s timeout=120 \
start interval=0s timeout=120 \
stop interval=0s timeout=120 \
meta notify=true
pcs resource promotable p_mysql p_mysql-master notify=true
Example of failed output:
A crash file appears under /var/crash/ on at least one of the nodes.
Example of successful output:
No crash file appears under /var/crash/ on any node.
[Where problems could occur]
The patch itself modifies only the lrmd code, so regressions should be limited to the behavior of lrmd.
Since the code changes affect event dispatching and memory allocation, potential regressions would most likely show up in those areas.
---------------------------------original report--------------------------
After migrating a MySQL cluster from bionic to jammy (pacemaker 2.1.2-1ubuntu3), pacemaker started to malfunction because of pacemaker-controld crashes. The issue is easy to reproduce by putting the promoted node into standby.
The apport crash report is attached; this is the same bug reported in Red Hat's Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2039675
It was fixed upstream in this commit: https://github.com/ClusterLabs/pacemaker/commit/ed8b2c86ab77aaa3d7fd688c049ad5e1b922a9c6
Please provide an update for pacemaker, because it is unusable in this state. |
|
2023-04-06 08:33:27 |
Christian Ehrhardt |
description |
[Impact]
pacemaker-controld is Pacemaker's coordinator; it maintains a consistent view of cluster membership and orchestrates all the other components.
Users of MySQL clusters migrating from bionic to jammy reported a crash.
The crash is caused by lrmd_dispatch_internal(), which assigns the exit_reason string from an XML node directly to a new lrmd_event_data_t object without duplicating it, so the string ends up being freed twice. The fix is to copy event.exit_reason in lrmd_dispatch_internal() before invoking the callback.
[Test Plan]
lxc launch ubuntu:22.04 node1
lxc shell node1
apt update && apt dist-upgrade -y
apt install pcs mysql-server resource-agents -y
echo hacluster:hacluster | chpasswd
mysql -e "CREATE USER 'replicator'@'localhost'"
mysql -e "GRANT RELOAD, PROCESS, SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'replicator'@'localhost'"
systemctl disable mysql.service
systemctl stop mysql.service
exit
lxc copy node1 node2
lxc start node2
lxc shell node1
pcs host auth node1 node2 -u hacluster -p hacluster
pcs cluster setup --force mysqlclx node1 node2 transport udpu
pcs cluster enable --all
pcs cluster start --all
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs resource create p_mysql ocf:heartbeat:mysql \
replication_user=replicator \
test_user=root \
op demote interval=0s timeout=120 \
monitor interval=20 timeout=30 \
monitor interval=10 role=Master timeout=30 \
monitor interval=30 role=Slave timeout=30 \
notify interval=0s timeout=90 \
promote interval=0s timeout=120 \
start interval=0s timeout=120 \
stop interval=0s timeout=120 \
meta notify=true
pcs resource promotable p_mysql p_mysql-master notify=true
Example of failed output:
A crash file appears under /var/crash/ on at least one of the nodes.
Example of successful output:
No crash file appears under /var/crash/ on any node.
[Where problems could occur]
The patch itself modifies only the lrmd code, so regressions should be limited to the behavior of lrmd.
Since the code changes affect event dispatching and memory allocation, potential regressions would most likely show up in those areas.
---------------------------------original report--------------------------
After migrating a MySQL cluster from bionic to jammy (pacemaker 2.1.2-1ubuntu3), pacemaker started to malfunction because of pacemaker-controld crashes. The issue is easy to reproduce by putting the promoted node into standby.
The apport crash report is attached; this is the same bug reported in Red Hat's Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2039675
It was fixed upstream in this commit: https://github.com/ClusterLabs/pacemaker/commit/ed8b2c86ab77aaa3d7fd688c049ad5e1b922a9c6
Please provide an update for pacemaker, because it is unusable in this state. |
[Impact]
* pacemaker-controld is Pacemaker's coordinator; it maintains a consistent view of cluster membership and orchestrates all the other components.
* Users of MySQL clusters migrating from bionic to jammy reported a crash.
* The crash is caused by lrmd_dispatch_internal(), which assigns the exit_reason string from an XML node directly to a new lrmd_event_data_t object without duplicating it, so the string ends up being freed twice. The fix is to copy event.exit_reason in lrmd_dispatch_internal() before invoking the callback (see the sketch after this list).
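For illustration only, here is a minimal C sketch of the ownership bug and its fix. The struct layout and function names are simplified placeholders, not pacemaker's actual internals; the real change is the upstream commit linked in the original report below.
/* Sketch of the double-free pattern; placeholder names, not
 * pacemaker's real internal API. */
#include <stdlib.h>
#include <string.h>
typedef struct {
    char *exit_reason;
} lrmd_event_data_t;
/* Buggy shape: the event borrows a string still owned by the XML
 * document. When event cleanup frees exit_reason and the XML document
 * later frees the same buffer, the process aborts on a double free. */
static void dispatch_buggy(char *xml_owned_reason,
                           void (*cb)(lrmd_event_data_t *))
{
    lrmd_event_data_t event = { 0 };
    event.exit_reason = xml_owned_reason;   /* no duplication */
    cb(&event);
}
/* Fixed shape: duplicate the string before the callback so the event
 * owns an independent copy, then release only that copy afterwards. */
static void dispatch_fixed(const char *xml_owned_reason,
                           void (*cb)(lrmd_event_data_t *))
{
    lrmd_event_data_t event = { 0 };
    event.exit_reason = xml_owned_reason ? strdup(xml_owned_reason)
                                         : NULL;
    cb(&event);
    free(event.exit_reason);                /* XML buffer untouched */
}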
[Test Plan]
lxc launch ubuntu:22.04 node1
lxc shell node1
apt update && apt dist-upgrade -y
apt install pcs mysql-server resource-agents -y
echo hacluster:hacluster | chpasswd
mysql -e "CREATE USER 'replicator'@'localhost'"
mysql -e "GRANT RELOAD, PROCESS, SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'replicator'@'localhost'"
systemctl disable mysql.service
systemctl stop mysql.service
exit
lxc copy node1 node2
lxc start node2
lxc shell node1
pcs host auth node1 node2 -u hacluster -p hacluster
pcs cluster setup --force mysqlclx node1 node2 transport udpu
pcs cluster enable --all
pcs cluster start --all
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs resource create p_mysql ocf:heartbeat:mysql \
replication_user=replicator \
test_user=root \
op demote interval=0s timeout=120 \
monitor interval=20 timeout=30 \
monitor interval=10 role=Master timeout=30 \
monitor interval=30 role=Slave timeout=30 \
notify interval=0s timeout=90 \
promote interval=0s timeout=120 \
start interval=0s timeout=120 \
stop interval=0s timeout=120 \
meta notify=true
pcs resource promotable p_mysql p_mysql-master notify=true
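Note: the steps above only set up the cluster. Per the original report, the crash is triggered by putting the promoted node into standby; assuming node1 holds the promoted instance (check with crm_mon -1), a plausible trigger would be:
# Hypothetical trigger step, not spelled out in the original plan:
pcs node standby node1
pcs node unstandby node1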
Example of failed output:
A crash file appears under /var/crash/ on at least one of the nodes.
Example of successful output:
No crash file appears under /var/crash/ on any node.
[Where problems could occur]
* The patch itself modifies only the lrmd code, so regressions should be limited to the behavior of lrmd.
* Since the code changes affect event dispatching and memory allocation, potential regressions would most likely show up in those areas.
---------------------------------original report--------------------------
After migrating a mysql cluster from bionic to jammy (pacemaker 2.1.2-1ubuntu3), pacemaker started to malfunction because of pacemaker-controld crashes. It is easy to reproduce doing a standby of the promoted node.
Apport crash view has been attached and it is the same bug reported in redhat https://bugzilla.redhat.com/show_bug.cgi?id=2039675
And was fixed in this commit https://github.com/ClusterLabs/pacemaker/commit/ed8b2c86ab77aaa3d7fd688c049ad5e1b922a9c6
Please, provide an update for pacemaker because it is unusable this way. |
|
2023-04-06 08:33:56 |
Christian Ehrhardt |
pacemaker (Ubuntu Jammy): status |
Triaged |
Fix Committed |
|
2023-04-06 08:33:59 |
Christian Ehrhardt |
pacemaker (Ubuntu Jammy): status |
Fix Committed |
In Progress |
|
2023-04-14 22:19:08 |
Steve Langasek |
pacemaker (Ubuntu Jammy): status |
In Progress |
Fix Committed |
|
2023-04-14 22:19:10 |
Steve Langasek |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2023-04-14 22:19:12 |
Steve Langasek |
bug |
|
|
added subscriber SRU Verification |
2023-04-14 22:19:15 |
Steve Langasek |
tags |
bitesize server-todo |
bitesize server-todo verification-needed verification-needed-jammy |
|
2023-04-17 18:54:17 |
Michał Małoszewski |
tags |
bitesize server-todo verification-needed verification-needed-jammy |
bitesize server-todo verification-done-jammy verification-needed |
|
2023-04-18 08:05:43 |
Christian Ehrhardt |
tags |
bitesize server-todo verification-done-jammy verification-needed |
bitesize server-todo verification-done verification-done-jammy |
|
2023-04-27 14:53:16 |
Launchpad Janitor |
pacemaker (Ubuntu Jammy): status |
Fix Committed |
Fix Released |
|
2023-04-27 14:53:22 |
Andreas Hasenack |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|