We are running the sc010 kvm job in both the vexx cloud and the internal cloud.
The job that runs in the internal cloud fails with the error below:
~~~
2022-08-12 07:38:27,832 p=89450 u=root n=ansible | 2022-08-12 07:38:27.831192 | fa163e0d-40f2-7933-9109-000000000070 | FATAL | Run cephadm bootstrap
.
.
Non-zero exit code 125 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint ceph --init -e CONTAINER_IMAGE=quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph -e NODE_NAME=standalone.localdomain -e CEPH_USE_RANDOM_NONCE=1 quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph --version
ceph: stderr Error: container-init binary not found on the host: stat /usr/libexec/podman/catatonit: no such file or directory
Traceback (most recent call last):
File "/usr/sbin/cephadm", line 9106, in <module>
main()
File "/usr/sbin/cephadm", line 9094, in main
r = ctx.func(ctx)
File "/usr/sbin/cephadm", line 1969, in _default_image
return func(ctx)
File "/usr/sbin/cephadm", line 4707, in command_bootstrap
image_ver = CephContainer(ctx, ctx.image, 'ceph', ['--version']).run().strip()
File "/usr/sbin/cephadm", line 3739, in run
out, _, _ = call_throws(self.ctx, self.run_cmd(),
File "/usr/sbin/cephadm", line 1636, in call_throws
raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
RuntimeError: Failed command: /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint ceph --init -e CONTAINER_IMAGE=quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph -e NODE_NAME=standalone.localdomain -e CEPH_USE_RANDOM_NONCE=1 quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph --version: Error: container-init binary not found on the host: stat /usr/libexec/podman/catatonit: no such file or directory", "stderr_lines": ["Verifying podman|docker is present...", "Verifying lvm2 is present...", "Verifying time synchronization is in place...", "Unit chronyd.service is enabled and running", "Repeating the final host check...", "podman (/bin/podman) version 4.1.1 is present", "systemctl is present", "lvcreate is present", "Unit chronyd.service is enabled and running", "Host looks OK", "Cluster fsid: e1f5356e-8579-59d7-a01c-bd09ff028582", "Verifying IP 192.168.42.1 port 3300 ...", "Verifying IP 192.168.42.1 port 6789 ...", "Internal network (--cluster-network) has not been provided, OSD replication will default to the public_network", "Adjusting default settings to suit single-host cluster...", "Pulling container image quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph...", "Non-zero exit code 125 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint ceph --init -e CONTAINER_IMAGE=quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph -e NODE_NAME=standalone.localdomain -e CEPH_USE_RANDOM_NONCE=1 quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph --version", "ceph: stderr Error: container-init binary not found on the host: stat /usr/libexec/podman/catatonit: no such file or directory", "Traceback (most recent call last):", " File "/usr/sbin/cephadm", line 9106, in <module>", " main()", " File "/usr/sbin/cephadm", line 9094, in main", " r = ctx.func(ctx)", " File "/usr/sbin/cephadm", line 1969, in _default_image", " return func(ctx)", " File "/usr/sbin/cephadm", line 4707, in 
command_bootstrap", " image_ver = CephContainer(ctx, ctx.image, 'ceph', ['--version']).run().strip()", " File "/usr/sbin/cephadm", line 3739, in run", " out, _, _ = call_throws(self.ctx, self.run_cmd(),", " File "/usr/sbin/cephadm", line 1636, in call_throws", " raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')", "RuntimeError: Failed command: /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint ceph --init -e CONTAINER_IMAGE=quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph -e NODE_NAME=standalone.localdomain -e CEPH_USE_RANDOM_NONCE=1 quay.rdoproject.org/tripleomastercentos9/daemon:current-ceph --version: Error: container-init binary not found on the host: stat /usr/libexec/podman/catatonit: no such file or directory"], "stdout": "", "stdout_lines": []}
~~~
The same sc010 kvm job passes in the vexx cloud:
https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-scenario010-kvm-standalone-master&skip=0
As per the post in [1], this can happen when the catatonit package, which is a weak dependency of podman, is missing.
[1] https://unix.stackexchange.com/questions/619212/podman-run-with-init-gives-me-error-container-init-binary-not-found-on-the-h
From the logs, I can confirm that podman-catatonit.x86_64 is missing in the internal job but present in the job running in the vexx cloud.
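To check a node for this condition directly, one option is to test for the helper binary that podman looks for when `--init` is passed. The following is only a minimal sketch: the function name `init_binary_present` is hypothetical, and the default path is simply the one reported in the error message above.

```python
import os

# Path copied from the podman error message in this report.
DEFAULT_INIT_PATH = "/usr/libexec/podman/catatonit"


def init_binary_present(path: str = DEFAULT_INIT_PATH) -> bool:
    """Return True if the container-init binary exists and is executable."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    if init_binary_present():
        print(f"{DEFAULT_INIT_PATH} found")
    else:
        print(f"{DEFAULT_INIT_PATH} missing; 'podman run --init' is expected to fail")
```

Running this on the failing node before the cephadm bootstrap step should show whether the binary is absent there.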
Another difference is in the podman version and the source repo of the podman package.
In the vexx job:
https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario010-kvm-standalone-master/97aa4c5/logs/undercloud/var/log/extra/package-list-installed.txt.gz
~~~
podman.x86_64 2:4.1.1-3.el9 @appstream
podman-catatonit.x86_64 2:4.1.1-3.el9 @appstream
~~~
In the internal job:
~~~
podman.x86_64 2:4.1.1-6.el9 @quickstart-centos-appstreams
~~~
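The comparison above can also be done mechanically. The sketch below assumes each line of package-list-installed.txt has three whitespace-separated fields (name.arch, version, repo), as in the excerpts above; lines in any other shape are skipped. The function names `parse_package_list` and `compare_packages` are hypothetical, and the sample data is copied from the two excerpts.

```python
def parse_package_list(text):
    """Map 'name.arch' -> (version, repo) for each three-field line."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip headers, blank lines, and wrapped entries
        name, version, repo = fields
        packages[name] = (version, repo)
    return packages


def compare_packages(passing, failing, prefix="podman"):
    """Return (missing, changed) for packages whose name starts with prefix.

    missing: names installed in the passing job but not in the failing one.
    changed: names present in both whose version or repo differs.
    """
    missing = sorted(
        n for n in passing if n.startswith(prefix) and n not in failing
    )
    changed = {
        n: (passing[n], failing[n])
        for n in passing
        if n.startswith(prefix) and n in failing and passing[n] != failing[n]
    }
    return missing, changed


# Sample data copied from the two package lists quoted above.
VEXX = """\
podman.x86_64 2:4.1.1-3.el9 @appstream
podman-catatonit.x86_64 2:4.1.1-3.el9 @appstream
"""
INTERNAL = """\
podman.x86_64 2:4.1.1-6.el9 @quickstart-centos-appstreams
"""

if __name__ == "__main__":
    missing, changed = compare_packages(
        parse_package_list(VEXX), parse_package_list(INTERNAL)
    )
    print("missing in internal job:", missing)
    for name, (ok, bad) in changed.items():
        print(f"{name}: vexx={ok} internal={bad}")
```

With the sample data, the missing list contains only podman-catatonit.x86_64, and podman.x86_64 is reported as differing in both version and repo.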
This failure is now seen in most of the standalone jobs where ceph is deployed.
We can see the following error in these jobs during "tripleo.operator.tripleo_ceph_deploy : Run Ceph Deploy".
[1]. https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario001-standalone-master/a4483e9/job-output.txt
[2]. https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario010-ovn-provider-standalone-master/2a0a471/job-output.txt
[3]. https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario004-standalone-master/24b2443/job-output.txt
[4]. https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario010-kvm-standalone-master/6c09e31/job-output.txt