Services not running that should be: cinder-scheduler

Bug #1980963 reported by Marian Gasparovic
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Cinder Charm
Expired
Undecided
Unassigned

Bug Description

During SQA test run one cinder unit stayed blocked because "Services not running that should be: cinder-scheduler"

We saw it only once so far. It is wallaby-focal environment.

Artifacts -
https://oil-jenkins.canonical.com/artifacts/4e444742-a7ba-4432-852b-e94f04531009/index.html

Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

I also see `Services not running that should be: cinder-volume` in testrun https://solutions.qa.canonical.com/v2/testruns/8ff4ec95-5bcd-4961-82c3-de13929566e6/

Revision history for this message
Moises Emilio Benzan Mora (moisesbenzan) wrote :

Seen once more on: https://solutions.qa.canonical.com/v2/testruns/ecdab91a-624c-4655-b74e-2ba4b8a69747/

Artifacts: https://oil-jenkins.canonical.com/artifacts/ecdab91a-624c-4655-b74e-2ba4b8a69747/index.html

SKU: fcb-undertest-yoga-jammy-fce_release

cinder/2 blocked idle 5/lxd/3 10.246.165.84 8776/tcp Services not running that should be: cinder-scheduler, cinder-volume

Revision history for this message
Abdullah (motjuste) wrote (last edit ):
Download full text (10.5 KiB)

This occurs again: https://solutions.qa.canonical.com/testruns/4f0927be-1641-489c-b251-8c3351a92fd0/

SKU: fcb-master-yoga-focal

Artifacts: https://oil-jenkins.canonical.com/artifacts/4f0927be-1641-489c-b251-8c3351a92fd0/index.html
Relevant crashdump: https://oil-jenkins.canonical.com/artifacts/4f0927be-1641-489c-b251-8c3351a92fd0/generated/generated/openstack/juju-crashdump-openstack-2023-10-17-16.36.05.tar.gz

The logs in cinder/1 say to look at journalctl logs:

```
2023-10-17 14:46:47 WARNING unit.cinder/1.shared-db-relation-changed logger.go:60 Job for cinder-scheduler.service failed.
2023-10-17 14:46:47 WARNING unit.cinder/1.shared-db-relation-changed logger.go:60 See "systemctl status cinder-scheduler.service" and "journalctl -xe" for details.
```

Last relevant entries in the journal file:

```
Oct 17 15:01:19 juju-280b01-4-lxd-2 systemd[1]: Reloaded The Apache HTTP Server.
Oct 17 15:01:19 juju-280b01-4-lxd-2 systemd[1]: Reloading.
Oct 17 15:01:21 juju-280b01-4-lxd-2 systemd[1]: Reloading.
Oct 17 15:01:22 juju-280b01-4-lxd-2 systemd[1]: Reloading.
Oct 17 15:01:24 juju-280b01-4-lxd-2 systemd[1]: Reloading.
Oct 17 15:01:25 juju-280b01-4-lxd-2 systemd[1]: Started OpenStack Cinder Scheduler.
Oct 17 15:01:25 juju-280b01-4-lxd-2 systemd[1]: Reloading.
Oct 17 15:01:29 juju-280b01-4-lxd-2 cinder-volume[14731]: /usr/lib/python3/dist-packages/cinder/db/sqlalchemy/models.py:152: SAWarning: implicitly coercing SELECT object to scalar subquery; please use the .scalar_subquery() method to produce a scalar subquery.
Oct 17 15:01:29 juju-280b01-4-lxd-2 cinder-volume[14731]: last_heartbeat = column_property(
Oct 17 15:01:29 juju-280b01-4-lxd-2 cinder-volume[14731]: /usr/lib/python3/dist-packages/cinder/db/sqlalchemy/models.py:160: SAWarning: implicitly coercing SELECT object to scalar subquery; please use the .scalar_subquery() method to produce a scalar subquery.
Oct 17 15:01:29 juju-280b01-4-lxd-2 cinder-volume[14731]: num_hosts = column_property(
Oct 17 15:01:29 juju-280b01-4-lxd-2 cinder-volume[14731]: /usr/lib/python3/dist-packages/cinder/db/sqlalchemy/models.py:169: SAWarning: implicitly coercing SELECT object to scalar subquery; please use the .scalar_subquery() method to produce a scalar subquery.
Oct 17 15:01:29 juju-280b01-4-lxd-2 cinder-volume[14731]: num_down_hosts = column_property(
Oct 17 15:01:31 juju-280b01-4-lxd-2 systemd[1]: cinder-volume.service: Main process exited, code=exited, status=1/FAILURE
Oct 17 15:01:31 juju-280b01-4-lxd-2 systemd[1]: cinder-volume.service: Failed with result 'exit-code'.
Oct 17 15:01:31 juju-280b01-4-lxd-2 systemd[1]: cinder-volume.service: Scheduled restart job, restart counter is at 1.
Oct 17 15:01:31 juju-280b01-4-lxd-2 systemd[1]: Stopped OpenStack Cinder Volume.
Oct 17 15:01:31 juju-280b01-4-lxd-2 systemd[1]: Started OpenStack Cinder Volume.
Oct 17 15:01:40 juju-280b01-4-lxd-2 systemd[1]: Condition check resulted in iSCSI initiator daemon (iscsid) being skipped.
Oct 17 15:01:41 juju-280b01-4-lxd-2 cinder-scheduler[15001]: /usr/lib/python3/dist-packages/cinder/db/sqlalchemy/models.py:152: SAWarning: implicitly coercing SELECT object to scalar subquery; please use the .scalar_subquery() method to...

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :
Download full text (5.5 KiB)

The key part is:

Oct 17 16:46:47 juju-280b01-4-lxd-2 systemd[1]: Stopped OpenStack Cinder Scheduler.
Oct 17 16:46:47 juju-280b01-4-lxd-2 systemd[1]: cinder-scheduler.service: Start request repeated too quickly.
Oct 17 16:46:47 juju-280b01-4-lxd-2 systemd[1]: cinder-scheduler.service: Failed with result 'start-limit-hit'.
Oct 17 16:46:47 juju-280b01-4-lxd-2 systemd[1]: Failed to start OpenStack Cinder Scheduler.

i.e. it ran out of restarts and so stopped trying; it would be good to know *why* it was restarting so often, but that sadly isn't in that log.

Looking at the cinder-scheduler log we see:

2023-10-17 14:46:45.904 127342 ERROR cinder Traceback (most recent call last):
2023-10-17 14:46:45.904 127342 ERROR cinder File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
2023-10-17 14:46:45.904 127342 ERROR cinder self.dialect.do_execute(
2023-10-17 14:46:45.904 127342 ERROR cinder File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 732, in do_execute
2023-10-17 14:46:45.904 127342 ERROR cinder cursor.execute(statement, parameters)
2023-10-17 14:46:45.904 127342 ERROR cinder File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 148, in execute
2023-10-17 14:46:45.904 127342 ERROR cinder result = self._query(query)
2023-10-17 14:46:45.904 127342 ERROR cinder File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 310, in _query
2023-10-17 14:46:45.904 127342 ERROR cinder conn.query(q)
2023-10-17 14:46:45.904 127342 ERROR cinder File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 548, in query
2023-10-17 14:46:45.904 127342 ERROR cinder self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2023-10-17 14:46:45.904 127342 ERROR cinder File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 775, in _read_query_result
2023-10-17 14:46:45.904 127342 ERROR cinder result.read()
2023-10-17 14:46:45.904 127342 ERROR cinder File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1156, in read
2023-10-17 14:46:45.904 127342 ERROR cinder first_packet = self.connection._read_packet()
2023-10-17 14:46:45.904 127342 ERROR cinder File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 725, in _read_packet
2023-10-17 14:46:45.904 127342 ERROR cinder packet.raise_for_error()
2023-10-17 14:46:45.904 127342 ERROR cinder File "/usr/lib/python3/dist-packages/pymysql/protocol.py", line 221, in raise_for_error
2023-10-17 14:46:45.904 127342 ERROR cinder err.raise_mysql_exception(self._data)
2023-10-17 14:46:45.904 127342 ERROR cinder File "/usr/lib/python3/dist-packages/pymysql/err.py", line 143, in raise_mysql_exception
2023-10-17 14:46:45.904 127342 ERROR cinder raise errorclass(errno, errval)
2023-10-17 14:46:45.904 127342 ERROR cinder pymysql.err.ProgrammingError: (1146, "Table 'cinder.clusters' doesn't exist")

2023-10-17 14:46:43 INFO unit.cinder/1.juju-log server.go:325 shared-db:146: [cinder] Cluster leader, performing db sync
2023-10-17 14:46:45 DEBUG unit.cinder/1.shared-db-relation-changed logger.go:60 2023-10-17 14:46:45.411 127341 INFO cinder.db.migratio...

Read more...

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

It would be good to 'catch' one of these whilst the system is still up, because we need to rule out whether the table does actually exist (and there's a caching issue *somewhere*) which causes the cinder -> router -> mysql to not see the table, or the table is missing and the db sync is failing.

Changed in charm-cinder:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Cinder Charm because there has been no activity for 60 days.]

Changed in charm-cinder:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.