The agent does not expect the case where partx returns 0 even though it failed to read the partition table

Bug #1736386 reported by Nikolay Fedotov
This bug affects 1 person
Affects: ironic-python-agent
Status: In Progress
Importance: Medium
Assigned to: Nikolay Fedotov

Bug Description

https://github.com/openstack/ironic-python-agent/blob/db5272cfea9fa894675690d54b7d042cb3d01df3/ironic_python_agent/extensions/image.py#L44-L46

Here ^^^ partx returns 0 even when it "failed to read partition table". The agent believes everything is OK and proceeds. A moment later the agent tries to find the partition by UUID, but it is not visible yet. As a result a DeviceNotFound exception occurs and the bare metal instance gets stuck in the ERROR state.
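The failure mode above can be illustrated with a minimal sketch that treats the command's output, not only its exit code, as a failure signal. This is not the agent's actual code or the proposed fix; `PartitionTableReadError`, `output_indicates_failure` and `run_and_check` are hypothetical names, and the only string taken from this report is the quoted message.

```python
import subprocess

# Message quoted in this report; partx printed it while still exiting with 0.
FAILURE_MARKER = "failed to read partition table"


class PartitionTableReadError(RuntimeError):
    """Hypothetical error type for a failed partition table re-read."""


def output_indicates_failure(stdout, stderr):
    """Return True if either output stream contains the failure message."""
    return FAILURE_MARKER in f"{stdout}\n{stderr}".lower()


def run_and_check(cmd):
    """Run cmd and raise if it fails by exit code or by its output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    failed = result.returncode != 0 or output_indicates_failure(
        result.stdout, result.stderr
    )
    if failed:
        raise PartitionTableReadError(
            f"{cmd[0]} failed (exit code {result.returncode}): "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

A caller would pass the same partx command line the agent already uses and handle `PartitionTableReadError` instead of continuing to the UUID lookup. Matching on message text is fragile (it depends on the tool's wording and locale), so this is only a stopgap sketch.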

The partition exists.

Retrying "nova boot" helps: it succeeds on the second or a later attempt.

See logs attached

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-python-agent (master)

Fix proposed to branch: master
Review: https://review.openstack.org/525577

Changed in ironic-python-agent:
assignee: nobody → Nikolay Fedotov (nfedotov)
status: New → In Progress
Nikolay Fedotov (nfedotov) wrote :

The proposed fix did not help. Now partx is called 3 times, but it keeps returning the "failed to read partition table" message and the partition is not found. Yet a moment earlier the conductor had detected the partition and passed its UUID to ironic-agent.

Dmitry Tantsur (divius)
Changed in ironic-python-agent:
importance: Undecided → Medium
Julia Kreger (juliaashleykreger) wrote :

I looked at the logs and I can't help but wonder if what is occurring is that the block device is still locked via the iSCSI connection and has not been fully disengaged and released, which would be required to read the new partition table.

Nikolay Fedotov (nfedotov) wrote :

Yes. It looks like there is a race condition somewhere between "destroying metadata" <-> "exposing disk to conductor" -> "do partitioning on conductor side". I added a disk-wait after the "destroying metadata" step and it works for me now. I created a new issue for "No partition with UUID...", https://bugs.launchpad.net/ironic-python-agent/+bug/1739421, because this one is about partx. Thanks!
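For illustration only, waiting for a device to become visible can be sketched as a bounded polling loop. This is not the disk-wait change mentioned above nor the patch in the linked review; `wait_for` and `device_visible` are hypothetical helpers, and looking the device up through the udev `/dev/disk/by-uuid` symlinks is an assumption about which kind of UUID is involved.

```python
import os
import time


def wait_for(predicate, attempts=5, delay=1.0, sleep=time.sleep):
    """Call predicate up to `attempts` times; return True once it succeeds."""
    for attempt in range(attempts):
        if predicate():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False


def device_visible(uuid, by_uuid_dir="/dev/disk/by-uuid"):
    """Check for a udev symlink named after the UUID (assumed lookup)."""
    return os.path.exists(os.path.join(by_uuid_dir, uuid))
```

For example, `wait_for(lambda: device_visible(root_uuid), attempts=10, delay=2.0)` would poll for up to about 18 seconds, where `root_uuid` stands for the UUID passed by the conductor. A bounded wait only narrows the race window; it does not remove the underlying ordering problem described in the comment.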
