Dan has already proposed the patches but putting my analysis here just to keep a record of RCA and in case anyone is interested in knowing what happened :)
First log:
Jun 26 01:02:49.782746 np0034441113 cinder-volume[99943]: DEBUG cinder.volume.volume_utils [req-25e4ae38-b629-4698-97ea-d5b712cb4209 req-e4bf7950-b372-4010-9fd0-80f48fdd26c1 tempest-ServerActionsV293TestJSON-85724339 None] Attempting download of e7228a06-ab6d-4ddd-9fa5-d23219175760 ((None, [{'url': 'file:///opt/stack/data/glance/images/e7228a06-ab6d-4ddd-9fa5-d23219175760', 'metadata': {'store': 'cheap'}}, {'url': 'rbd://937d1bdf-6a12-4028-b011-5e2cfa4ba014/images/e7228a06-ab6d-4ddd-9fa5-d23219175760/snap', 'metadata': {'store': 'robust'}}])) to volume c0dd890d-0f87-4409-996e-7d071475acfd. {{(pid=99943) copy_image_to_volume /opt/stack/cinder/cinder/volume/volume_utils.py:1190}}
Last Log:
Jun 26 01:03:23.661182 np0034441113 cinder-volume[99943]: DEBUG cinder.volume.manager [req-25e4ae38-b629-4698-97ea-d5b712cb4209 req-e4bf7950-b372-4010-9fd0-80f48fdd26c1 tempest-ServerActionsV293TestJSON-85724339 None] Re-imaged e7228a06-ab6d-4ddd-9fa5-d23219175760 to volume c0dd890d-0f87-4409-996e-7d071475acfd successfully. {{(pid=99943) reimage /opt/stack/cinder/cinder/volume/manager.py:5391}}
Total time taken ~34 seconds
The timeout for the reimage operation is set by the config option "reimage_timeout_per_gb" which defaults to 20 seconds per GB which causes the failure.
Hi,
Dan has already proposed the patches but putting my analysis here just to keep a record of RCA and in case anyone is interested in knowing what happened :)
Using this job result to analyse logs: /b932a1446345e1 01b3ef- 4740624f0848c8c 3257f704064a451 6f.ssl. cf5.rackcdn. com/883557/ 2/check/ nova-ceph- multistore/ d7db64f/
https:/
Nova logs state that we failed with a timeout
Jun 26 01:03:11.721921 np0034441113 nova-compute[ 97944]: ERROR oslo_messaging. rpc.server result = hub.switch() 97944]: ERROR oslo_messaging. rpc.server File "/usr/local/ lib/python3. 10/dist- packages/ eventlet/ hubs/hub. py", line 313, in switch 97944]: ERROR oslo_messaging. rpc.server return self.greenlet. switch( ) 97944]: ERROR oslo_messaging. rpc.server eventlet. timeout. Timeout: 20 seconds
Jun 26 01:03:11.721921 np0034441113 nova-compute[
Jun 26 01:03:11.721921 np0034441113 nova-compute[
Jun 26 01:03:11.721921 np0034441113 nova-compute[
Looking at the cinder logs, we can analyse the time taken with the request ID for the reimage request
Request ID: req-25e4ae38- b629-4698- 97ea-d5b712cb42 09
First log: volume[ 99943]: DEBUG cinder. volume. volume_ utils [req-25e4ae38- b629-4698- 97ea-d5b712cb42 09 req-e4bf7950- b372-4010- 9fd0-80f48fdd26 c1 tempest- ServerActionsV2 93TestJSON- 85724339 None] Attempting download of e7228a06- ab6d-4ddd- 9fa5-d232191757 60 ((None, [{'url': 'file:/ //opt/stack/ data/glance/ images/ e7228a06- ab6d-4ddd- 9fa5-d232191757 60', 'metadata': {'store': 'cheap'}}, {'url': 'rbd:// 937d1bdf- 6a12-4028- b011-5e2cfa4ba0 14/images/ e7228a06- ab6d-4ddd- 9fa5-d232191757 60/snap' , 'metadata': {'store': 'robust'}}])) to volume c0dd890d- 0f87-4409- 996e-7d071475ac fd. {{(pid=99943) copy_image_ to_volume /opt/stack/ cinder/ cinder/ volume/ volume_ utils.py: 1190}}
Jun 26 01:02:49.782746 np0034441113 cinder-
Last Log: volume[ 99943]: DEBUG cinder. volume. manager [req-25e4ae38- b629-4698- 97ea-d5b712cb42 09 req-e4bf7950- b372-4010- 9fd0-80f48fdd26 c1 tempest- ServerActionsV2 93TestJSON- 85724339 None] Re-imaged e7228a06- ab6d-4ddd- 9fa5-d232191757 60 to volume c0dd890d- 0f87-4409- 996e-7d071475ac fd successfully. {{(pid=99943) reimage /opt/stack/ cinder/ cinder/ volume/ manager. py:5391} }
Jun 26 01:03:23.661182 np0034441113 cinder-
Total time taken ~34 seconds
The timeout for the reimage operation is set by the config option "reimage_ timeout_ per_gb" which defaults to 20 seconds per GB which causes the failure.