From keystone/1:
(keystone.server.flask.request_processing.middleware.auth_context): 2023-04-01 08:36:25,698 ERROR (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
self.dialect.do_execute(
File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 732, in do_execute
cursor.execute(statement, parameters)
File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 148, in execute
result = self._query(query)
File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 310, in _query
conn.query(q)
File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 548, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 775, in _read_query_result
result.read()
File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1163, in read
self._read_result_packet(first_packet)
File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1236, in _read_result_packet
self._read_rowdata_packet()
File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1270, in _read_rowdata_packet
packet = self.connection._read_packet()
File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 725, in _read_packet
packet.raise_for_error()
File "/usr/lib/python3/dist-packages/pymysql/protocol.py", line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File "/usr/lib/python3/dist-packages/pymysql/err.py", line 143, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.OperationalError: (1053, 'Server shutdown in progress')
So "Server shutdown in progress" is a bit ominous.
From mysql-innodb-cluster/0 at the same time:
2023-04-01T08:36:23.178312Z 0 [Warning] [MY-011499] [Repl] Plugin group_replication reported: 'Members removed from the group: 192.168.33.61:3306'
2023-04-01T08:36:23.178353Z 0 [System] [MY-011500] [Repl] Plugin group_replication reported: 'Primary server with address 192.168.33.61:3306 left the group. Electing new Primary.'
2023-04-01T08:36:23.178515Z 0 [System] [MY-011507] [Repl] Plugin group_replication reported: 'A new primary with address 192.168.33.83:3306 was elected. The new primary will execute all previous group transactions before allowing writes.'
2023-04-01T08:36:23.178770Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to 192.168.33.83:3306, 192.168.33.84:3306 on view 16803355385089891:10.'
2023-04-01T08:36:23.179472Z 68 [System] [MY-011565] [Repl] Plugin group_replication reported: 'Setting super_read_only=ON.'
2023-04-01T08:36:23.179579Z 68 [System] [MY-011511] [Repl] Plugin group_replication reported: 'This server is working as secondary member with primary member address 192.168.33.83:3306.'
2023-04-01T08:37:31.674902Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to 192.168.33.61:3306, 192.168.33.83:3306, 192.168.33.84:3306 on view 16803355385089891:11.'
2023-04-01T08:37:36.190735Z 0 [System] [MY-011492] [Repl] Plugin group_replication reported: 'The member with address 192.168.33.61:3306 was declared online within the replication group.'
2023-04-01T08:38:15.493860Z 0 [Warning] [MY-011499] [Repl] Plugin group_replication reported: 'Members removed from the group: 192.168.33.83:3306'
2023-04-01T08:38:15.493903Z 0 [System] [MY-011500] [Repl] Plugin group_replication reported: 'Primary server with address 192.168.33.83:3306 left the group. Electing new Primary.'
i.e. 192.168.33.84 (which is mysql-innodb-cluster/0's local address) left the cluster and then rejoined during the attempted keystone-manager.py connection. With the cluster down to 2 units it would go read-only (RO), and the out-of-cluster unit (this one) would refuse connections if that is what the mysql-router was connected to.
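(For reference, a minimal pymysql sketch of how that state could be confirmed from a client; the host and credentials below are placeholders, not values taken from this deployment:)

    import pymysql

    # Placeholders - substitute the unit address and cluster credentials.
    conn = pymysql.connect(host="192.168.33.84", user="clusteruser",
                           password="secret", database="mysql")
    with conn.cursor() as cur:
        # super_read_only=ON is what the MY-011565 log line above reports
        # when the member is acting as a secondary / has just rejoined.
        cur.execute("SELECT @@global.super_read_only")
        print("super_read_only:", cur.fetchone()[0])
        # MEMBER_STATE shows whether this server is ONLINE in the group.
        cur.execute("SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE "
                    "FROM performance_schema.replication_group_members")
        for row in cur.fetchall():
            print(row)
    conn.close()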
Why did it leave the cluster and then rejoin?
It appeared to be handling a certificates-relation-changed hook from vault/0 at the same time:
2023-04-01 08:36:07 INFO unit.mysql-innodb-cluster/0.juju-log server.go:316 certificates:454: Reactive main running for hook certificates-relation-changed
and from the machine.lock:
2023-04-01 08:36:15 unit-mysql-innodb-cluster-0: mysql-innodb-cluster/0 uniter (run relation-changed (454; unit: vault/0) hook), waited 0s, held 9s
So the cluster dropped out because vault provided certificates and mysql-innodb-cluster/0 restarted. It then took roughly 2 minutes 30 seconds to re-form the cluster, by which time the keystone unit had run out of retries and given up connecting to the cluster.
Additionally, mysql-innodb-cluster/1 dropped out at 08:36:
2023-04-01T08:36:22.413986Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.32-0ubuntu0.20.04.2).
And mysql-innodb-cluster/2:
2023-04-01T08:37:20.236588Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.32-0ubuntu0.20.04.2) (Ubuntu).
...
2023-04-01T08:39:28.616528Z 0 [System] [MY-011492] [Repl] Plugin group_replication reported: 'The member with address 192.168.33.83:3306 was declared online within the replication group.'
So the cluster was still up/down until 08:39:28.
--
Summary & thoughts
------------------
3 minutes is actually a long time for the mysql servers to update certificates and re-form as a cluster. Essentially, the cluster was offline for 3 minutes, so any hook in the system that needed to write to a mysql database would fail and would have to be deferred.
It may be that the system was being deployed in a resource-constrained environment, and thus the timeouts on database access were exceeded.
keystone (as a classic charm) either has to wait longer in the hook, or Juju's hook retries have to be relied on to re-run the erroring hook until it succeeds; both are suboptimal.
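For illustration, a rough sketch of the "retry in the hook" option (not what the charm does today; the timings and DSN are made up), using the same SQLAlchemy/pymysql stack as the traceback above:

    import time

    import sqlalchemy
    from sqlalchemy import exc, text

    def execute_with_retry(engine, statement, attempts=30, delay=10):
        """Retry transient connection failures for up to attempts*delay seconds."""
        for attempt in range(1, attempts + 1):
            try:
                with engine.connect() as conn:
                    return conn.execute(text(statement)).fetchall()
            except exc.OperationalError:
                # Covers both errors seen above: 2013 (lost connection)
                # and 1053 (server shutdown in progress).
                if attempt == attempts:
                    raise
                time.sleep(delay)

    engine = sqlalchemy.create_engine(
        "mysql+pymysql://keystone:secret@127.0.0.1/keystone",
        pool_pre_ping=True)  # also discards dead pooled connections before reuse
    rows = execute_with_retry(engine, "SELECT 1")

With attempts=30 and delay=10 this would ride out a ~3 minute outage, at the cost of a hook that can block for up to 5 minutes.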
Alternatively, the hook could be deferred; in a reactive or ops-framework charm this is reasonably easy to achieve. Unfortunately, classic charms do not have the capability to do this, and there are no (current) plans to add it.
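As an illustration of what the deferral route looks like in an ops-framework charm (the names here are invented; keystone itself is classic, so it cannot do this):

    import pymysql
    from ops.charm import CharmBase
    from ops.main import main

    class ExampleCharm(CharmBase):
        def __init__(self, *args):
            super().__init__(*args)
            self.framework.observe(self.on.config_changed, self._on_config_changed)

        def _on_config_changed(self, event):
            try:
                self._sync_database()   # hypothetical DB work the hook needs
            except pymysql.err.OperationalError:
                # Cluster is mid-restart (e.g. picking up TLS certs from vault):
                # defer and let a later event retry instead of failing the hook.
                event.defer()

        def _sync_database(self):
            ...

    if __name__ == "__main__":
        main(ExampleCharm)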
The mysql-innodb-cluster charm could defer all router-db relations until it has achieved a TLS-enabled cluster if vault is related to it; this would prevent any other related charm from attempting to access the database until the cluster has formed AND upgraded to TLS (when vault is related). This might be achieved using goal-state to decide whether certificates are going to arrive.
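One possible shape for that check, using the goal-state hook tool (a sketch only; the relation name and surrounding logic are assumptions, not the charm's actual code):

    import json
    import subprocess

    def vault_certificates_expected():
        """True if goal-state reports a 'certificates' relation is expected."""
        raw = subprocess.check_output(["goal-state", "--format=json"])
        goal = json.loads(raw)
        return "certificates" in goal.get("relations", {})

    def ready_for_router_db(cluster_formed, tls_enabled):
        """Only publish router-db data once the cluster (and TLS, if coming) is up."""
        if not cluster_formed:
            return False
        if vault_certificates_expected() and not tls_enabled:
            return False  # hold off until the TLS restart has happened
        return True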
As a mitigation strategy, it may be preferable to deploy vault + mysql-innodb-cluster *first*, wait for them to settle into a cluster with TLS enabled, and then do the rest of the deployment. This would at least work around the race hazard of the mysql servers bootstrapping themselves into TLS mode.