Validate that images root and its master cache are on the same device

Bug #1507894 reported by wangjianhe
This bug affects 3 people
Affects: Ironic
Status: In Progress
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

" execute /usr/lib/python2.7/site-packages/ironic/common/utils.py:84
2015-10-20 14:36:05.745 9804 DEBUG ironic.common.utils [-] Command stderr is: "" execute /usr/lib/python2.7/site-packages/ironic/common/utils.py:85
2015-10-20 14:36:05.745 9804 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): env LC_ALL=C LANG=C qemu-img info /var/lib/ironic/master_images/tmpN_lBn2/5be19a0a-9675-443c-9431-0db9b5574fb4.part execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:199
2015-10-20 14:36:05.759 9804 DEBUG oslo_concurrency.processutils [-] CMD "env LC_ALL=C LANG=C qemu-img info /var/lib/ironic/master_images/tmpN_lBn2/5be19a0a-9675-443c-9431-0db9b5574fb4.part" returned: 0 in 0.014s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:225
2015-10-20 14:36:05.760 9804 DEBUG ironic.common.utils [-] Execution completed, command line is "env LC_ALL=C LANG=C qemu-img info /var/lib/ironic/master_images/tmpN_lBn2/5be19a0a-9675-443c-9431-0db9b5574fb4.part" execute /usr/lib/python2.7/site-packages/ironic/common/utils.py:83
2015-10-20 14:36:05.760 9804 DEBUG ironic.common.utils [-] Command stdout is: "image: /var/lib/ironic/master_images/tmpN_lBn2/5be19a0a-9675-443c-9431-0db9b5574fb4.part
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 20G
" execute /usr/lib/python2.7/site-packages/ironic/common/utils.py:84
2015-10-20 14:36:05.760 9804 DEBUG ironic.common.utils [-] Command stderr is: "" execute /usr/lib/python2.7/site-packages/ironic/common/utils.py:85
2015-10-20 14:36:05.761 9804 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "download-image" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:404
2015-10-20 14:36:05.761 9804 DEBUG ironic.common.states [-] Exiting old state 'deploying' in response to event 'fail' on_exit /usr/lib/python2.7/site-packages/ironic/common/states.py:177
2015-10-20 14:36:05.761 9804 DEBUG ironic.common.states [-] Entering new state 'deploy failed' in response to event 'fail' on_enter /usr/lib/python2.7/site-packages/ironic/common/states.py:183
2015-10-20 14:36:10.529 9804 WARNING ironic.conductor.manager [-] Error in deploy of node 5a2ebb83-9303-4738-80cd-33bf10bed74b: [Errno 18] Invalid cross-device link

[Errno 18] Invalid cross-device link

I use Ironic Kilo.
ironic node-update 5a2ebb83-9303-4738-80cd-33bf10bed74b add instance_info/image_source=5be19a0a-9675-443c-9431-0db9b5574fb4

The image_source I added is a Windows qcow2 image.

M V P Nitesh (m-nitesh) wrote :

We should not update the node with the image ID; we have to use the image path instead.
E.g.: ironic node-update 5a2ebb83-9303-4738-80cd-33bf10bed74b add instance_info/image_source=file:///opt/overcloud-full.qcow2

Dmitry Tantsur (divius) wrote :

Hi! Is it possible that /var/lib and /tftpboot (or /httpboot) are on different partitions for you? That could cause such a problem.
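A quick way to test this hypothesis is to compare the device IDs (st_dev) that os.stat() reports for each directory. The helper below is a hypothetical diagnostic, not part of Ironic, and the paths are just the ones named in the question. Differing IDs mean os.link() between the directories will fail with EXDEV; matching IDs are necessary but not sufficient, because Linux also refuses hard links across different mount points of the same filesystem.

```python
import os


def device_ids(paths):
    """Map each existing path to the device ID (st_dev) of the filesystem holding it."""
    result = {}
    for path in paths:
        # Skip paths that do not exist on this host instead of raising.
        if os.path.exists(path):
            result[path] = os.stat(path).st_dev
    return result


if __name__ == "__main__":
    ids = device_ids(["/var/lib", "/tftpboot", "/httpboot"])
    for path, dev in ids.items():
        print(f"{path}: device {dev}")
    if len(set(ids.values())) > 1:
        print("Paths span more than one device; os.link() between them will fail with EXDEV.")
```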

Changed in ironic:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for Ironic because there has been no activity for 60 days.]

Changed in ironic:
status: Incomplete → Expired
Mark Goddard (mgoddard) wrote :

Seeing this when running Bifrost in a Docker container with /httpboot and /tftpboot mapped to named Docker volumes. /var/lib/docker/volumes is backed by an LVM volume, and all volumes appear as different mount points within the container.

Perhaps when iPXE is in use we ought to use /httpboot/master_images as the pxe.tftp_master_path location in order to avoid this?

Chris Hoge (hoge) wrote :

Same problem here.

Chris Hoge (hoge) wrote :

@mgoddard, following up on this: the workaround is to point all of the image-related options in the [pxe] configuration section at the same drive. For example (I moved the tftpboot and httpboot directories to a location that I could mount as a Docker volume):

tftp_root = /imagedata/tftpboot
tftp_master_path = /imagedata/tftpboot/master_images
instance_master_path = /imagedata/httpboot/master_images
images_path = /imagedata/tmp

The images_path option is easy to miss; it defaults to /var/lib/python...

As far as I can tell, the only place where Ironic creates hard links is image_cache.py.
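To illustrate why the cache cares about devices, here is a simplified, hypothetical sketch of the hard-link caching pattern; it is not Ironic's actual image_cache.py code, and fetch_via_cache and demo are invented names. The image is downloaded once into the master location and every consumer receives a hard link to it, so no data is copied. A hard link is only a second directory entry for the same inode, which is why both names must live on the same mounted filesystem.

```python
import os
import tempfile


def fetch_via_cache(master_path, dest_path, download):
    """Simplified, hypothetical hard-link cache (not Ironic's real code).

    download(path) must write the image to path. The master copy is
    downloaded only if it is missing; every call then hard-links it to
    dest_path, so the image data is stored on disk only once.
    """
    if not os.path.exists(master_path):
        download(master_path)
    # os.link() raises OSError with errno.EXDEV ("Invalid cross-device link")
    # when the two paths are not on the same mounted filesystem.
    os.link(master_path, dest_path)


def demo():
    """Fill the cache once, link it to two destinations, report the result."""
    with tempfile.TemporaryDirectory() as root:
        master = os.path.join(root, "master.img")
        downloads = []

        def download(path):
            downloads.append(path)
            with open(path, "wb") as f:
                f.write(b"fake image bytes")

        fetch_via_cache(master, os.path.join(root, "node1.img"), download)
        fetch_via_cache(master, os.path.join(root, "node2.img"), download)
        # One download, and three names (master + two links) for one inode.
        return len(downloads), os.stat(master).st_nlink
```

Here everything sits in one temporary directory, so demo() returns (1, 3). If master.img and the destinations were on different filesystems, the os.link() call is where the reported [Errno 18] would surface.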

MarginHu (margin2017) wrote :

Thanks Chris, I hit this issue and solved it with your workaround.

Ruby Loo (rloo)
Changed in ironic:
status: Expired → Confirmed
Ruby Loo (rloo) wrote :

Someone at Intel has also encountered this issue (with Kubernetes). Seems like we should fix it, given that more folks seem to be using ironic with containers.

Changed in ironic:
importance: Undecided → Medium
Anup (anup-d-navare) wrote :

How can this be reproduced?

Dmitry Tantsur (divius) wrote :

> Seems like we should fix it, given that more folks seem to be using ironic with containers.

The fix is probably to document that the master cache and the images root must be on the same device.

What I suggest is:
1. document that [pxe]tftp_root has to be on the same device as [pxe]tftp_master_path
2. document that [pxe]images_path has to be on the same device as [pxe]instance_master_path
3. add validation on conductor start-up that both assumptions hold
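Point 3 could look roughly like the sketch below. It is hypothetical: check_can_hardlink, validate_image_dirs and CrossDeviceConfigError are invented names, and it assumes the directories already exist and are writable by the conductor. Instead of comparing st_dev values, it creates a throwaway file in each master cache and tries to hard-link it into the matching root, because Linux link(2) also rejects links between different mount points of the same filesystem, a case a device-ID comparison cannot detect and one that may apply to the Docker volumes described above.

```python
import errno
import os
import tempfile


class CrossDeviceConfigError(Exception):
    """Hypothetical error raised when two directories cannot share hard links."""


def check_can_hardlink(master_dir, images_dir):
    """Probe whether files in master_dir can be hard-linked into images_dir.

    Creates a temporary file in master_dir, tries to os.link() it into
    images_dir, and removes both names afterwards.
    """
    fd, src = tempfile.mkstemp(dir=master_dir, prefix=".link-probe-")
    os.close(fd)
    dst = os.path.join(images_dir, os.path.basename(src) + "-dst")
    try:
        os.link(src, dst)
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            raise CrossDeviceConfigError(
                f"{master_dir} and {images_dir} are not on the same mounted "
                "filesystem; hard links between them are impossible"
            ) from exc
        raise
    else:
        os.unlink(dst)  # the link worked; remove the probe link
    finally:
        os.unlink(src)  # always remove the probe file itself


def validate_image_dirs(pairs):
    """Run the probe for each (master cache, images root) pair at start-up."""
    for master_dir, images_dir in pairs:
        check_can_hardlink(master_dir, images_dir)
```

With Chris Hoge's workaround layout, for example, the pairs would be ("/imagedata/tftpboot/master_images", "/imagedata/tftpboot") and ("/imagedata/httpboot/master_images", "/imagedata/tmp").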

summary: - 2015-10-20 14:36:10.529 9804 WARNING ironic.conductor.manager [-] Error
- in deploy of node 5a2ebb83-9303-4738-80cd-33bf10bed74b: [Errno 18]
- Invalid cross-device link
+ Validate that images root and its master cache are on the same device
Changed in ironic:
status: Confirmed → Triaged
tags: added: low-hanging-fruit pxe
Anup (anup-d-navare) wrote :

I was able to reproduce this error in my local setup. After debugging, I found that the os.link() call here https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/image_cache.py#L162 is causing it: os.link() cannot create a hard link between paths on different partitions. I tried using os.symlink() instead, but it does not work.

Dmitry Tantsur (divius)
Changed in ironic:
assignee: nobody → Dmitry Tantsur (divius)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/547036

Changed in ironic:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic (master)

Change abandoned by Dmitry Tantsur (<email address hidden>) on branch: master
Review: https://review.openstack.org/547036
Reason: I don't have time for this, feel free to take over.

Dmitry Tantsur (divius)
Changed in ironic:
assignee: Dmitry Tantsur (divius) → nobody