nova_compute mounts the NFS backend again every time a VM is launched

Bug #1783978 reported by sklgromek
This bug affects 11 people
Affects: kolla-ansible
Status: In Progress
Importance: Medium
Assigned to: Radosław Piliszek
Milestone: —

Bug Description

I use nova-compute with the cinder NFS backend, and each time I launch a new instance or restart the nova_compute container, the NFS share is mounted again even if it is already mounted.

root@compute-1:~# mount | grep nfs | wc -l
67
root@compute-1:~# docker restart nova_compute
nova_compute
root@compute-1:~# mount | grep nfs | wc -l
131
root@compute-1:~# docker restart nova_compute
nova_compute
root@compute-1:~# mount | grep nfs | wc -l
259
root@compute-1:~# docker restart nova_compute
nova_compute
root@compute-1:~# mount | grep nfs | wc -l
515

I tried to reproduce the problem on devstack, but everything looks fine there, so I think it is a problem with kolla.

I am using the stable/queens release with ubuntu-source images, but we had the same problem with centos-source images.
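
A minimal way to quantify the duplication per mount target on an affected compute host (a sketch using standard shell tools only, nothing kolla-specific assumed):

```
# count how many times each NFS mount target appears on the host
mount | grep nfs | awk '{print $3}' | sort | uniq -c | sort -rn
```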

Tags: cinder volume
Revision history for this message
Michal Nasiadka (mnasiadka) wrote :

Is it still a bug? Can you reproduce it with latest kolla-ansible/kolla code?

Changed in kolla:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for kolla because there has been no activity for 60 days.]

Changed in kolla:
status: Incomplete → Expired
Revision history for this message
Konstantinos Mouzakitis (mouza8) wrote :

Hello all! I'm facing the same issue, using the stable/stein release with centos-binary images. The cinder backends related to this are the ones that use a shared bind mount: http://paste.openstack.org/show/789538/. I've checked this with an NFS backend as well as a Quobyte one, and every time one of the nova containers is restarted the mounts double (plus one for me):

[root@node02 ~]# mount | grep nova -c
7
[root@node02 ~]# docker restart nova_compute
nova_compute
[root@node02 ~]# mount | grep nova -c
15
[root@node02 ~]# docker restart nova_compute
nova_compute
[root@node02 ~]# mount | grep nova -c
31
[root@node02 ~]#
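
For reference, one way to see which host paths are bind-mounted into the container and with what propagation mode is docker inspect (a sketch; the container name matches the one used in this report):

```
docker inspect nova_compute --format \
  '{{range .Mounts}}{{.Source}} -> {{.Destination}} ({{.Type}}, propagation: {{.Propagation}}){{"\n"}}{{end}}'
```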

Any ideas are very much appreciated!

Thanks a lot!

Changed in kolla:
status: Expired → New
Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

Could you actually show the mounts? (possibly replacing sensitive stuff with random unique values if any)

Changed in kolla:
status: New → Incomplete
Revision history for this message
Mariusz Karpiarz (mkarpiarz) wrote :

```
# mount | grep nfs
192.168.17.11:/kolla_nfs on /var/lib/nova/mnt/03084b2d0f988f513a2652c556e3ad4d type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.17.5,local_lock=none,addr=192.168.17.11)
192.168.17.11:/kolla_nfs on /var/lib/nova/mnt/03084b2d0f988f513a2652c556e3ad4d type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.17.5,local_lock=none,addr=192.168.17.11)
192.168.17.11:/kolla_nfs on /var/lib/docker/volumes/nova_compute/_data/mnt/03084b2d0f988f513a2652c556e3ad4d type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.17.5,local_lock=none,addr=192.168.17.11)
192.168.17.11:/kolla_nfs on /var/lib/docker/volumes/nova_compute/_data/mnt/03084b2d0f988f513a2652c556e3ad4d type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.17.5,local_lock=none,addr=192.168.17.11)
192.168.17.11:/kolla_nfs on /var/lib/docker/volumes/nova_compute/_data/mnt/03084b2d0f988f513a2652c556e3ad4d type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.17.5,local_lock=none,addr=192.168.17.11)
192.168.17.11:/kolla_nfs on /var/lib/nova/mnt/03084b2d0f988f513a2652c556e3ad4d type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.17.5,local_lock=none,addr=192.168.17.11)
192.168.17.11:/kolla_nfs on /var/lib/nova/mnt/03084b2d0f988f513a2652c556e3ad4d type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.17.5,local_lock=none,addr=192.168.17.11)
192.168.17.11:/kolla_nfs on /var/lib/docker/volumes/nova_compute/_data/mnt/03084b2d0f988f513a2652c556e3ad4d type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.17.5,local_lock=none,addr=192.168.17.11)
# mount | grep -c nfs
8
# docker restart nova_compute
nova_compute
# mount | grep -c nfs
16
# mount | grep nfs
192.168.17.11:/kolla_nfs on /var/lib/nova/mnt/03084b2d0f988f513a2652c556e3ad4d type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.17.5,local_lock=none,addr=192.168.17.11)
192.168.17.11:/kolla_nfs on /var/lib/nova/mnt/03084b2d0f988f513a2652c556e3ad4d type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.17.5,local_lock=none,addr=192.168.17.11)
192.168.17.11:/kolla_nfs on /var/lib/docker/volumes/nova_compute/_data/mnt/03084b2d0f988f513a2652c556e3ad4d type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.17.5,local_lock=none,addr=192.168.17.11)
192.168.17.11:/kolla_nfs on /var/lib/docker/volumes/nova_compute/_data/mnt/03084b2d...
```

Revision history for this message
Konstantinos Mouzakitis (mouza8) wrote :

Hello all. Just wanted to update you on some further investigation I did on this. So, the problem isn't the shared bind mount used by the backends mentioned above. It's the nested bind mount! /var/lib/nova is the destination of a bind mount in the nova containers, and with the above cinder backends /var/lib/nova/mnt is also added as a separate bind mount. This causes the mounts to be doubled every time the nova containers (compute, libvirt, ssh) are restarted.

A solution I've found is to change the directory where nova mounts the cinder volumes, by changing quobyte_mount_point_base or nfs_mount_point_base depending on the backend. That directory also needs to be configured as a new bind mount for the three nova containers in kolla-ansible/ansible/roles/nova/defaults/main.yml. After a reconfigure, everything works fine. Then you can just umount the old stale mounts on the host.
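
A rough sketch of that workaround for an NFS backend; the override location, the [libvirt] section and the new mount base path are assumptions to adapt to your own deployment:

```
# 1) point nova-compute at a different mount base via a nova.conf override
#    (assumption: nfs_mount_point_base lives in the [libvirt] section)
mkdir -p /etc/kolla/config/nova
cat >> /etc/kolla/config/nova/nova.conf <<'EOF'
[libvirt]
nfs_mount_point_base = /var/lib/nova-cinder-mnt
EOF

# 2) add a matching ":shared" bind mount for that directory to the volume lists
#    of nova-compute, nova-libvirt and nova-ssh in
#    kolla-ansible/ansible/roles/nova/defaults/main.yml (or override those variables)

# 3) reconfigure nova, then lazily unmount the old stale mounts on the host
kolla-ansible -i "$INVENTORY" reconfigure --tags nova
umount -l /var/lib/nova/mnt/* || true
```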

I'll be looking to patch this by adding the above logic to the nova defaults file when NFS or Quobyte is enabled.

Hope this helps anyone that is facing the mounts problem!

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for kolla because there has been no activity for 60 days.]

Changed in kolla:
status: Incomplete → Expired
Changed in kolla:
status: Expired → Confirmed
importance: Undecided → Medium
Changed in kolla:
assignee: nobody → Radosław Piliszek (yoctozepto)
Changed in kolla-ansible:
status: New → Confirmed
importance: Undecided → Medium
no longer affects: kolla
Changed in kolla-ansible:
assignee: nobody → Radosław Piliszek (yoctozepto)
Revision history for this message
Kristina Jasser (marvin01) wrote :

same problem here - will someone try to find a solution at some point?

ERDEM AĞBAHCA (erdemag)
information type: Public → Public Security
information type: Public Security → Public
Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

Erdem reached out to me about this issue, which is currently affecting him, and we are debugging it together so that I can propose the best fix.

Changed in kolla-ansible:
status: Confirmed → In Progress
Revision history for this message
Radosław Piliszek (yoctozepto) wrote (last edit ):

I have created a minimal reproducer: https://gist.github.com/yoctozepto/e6fdc2789297fdfdff4fd45fe64c9cb9

And found that the issue was already observed, albeit for a different reason: https://github.com/moby/moby/issues/35323

It seems Docker treats cases (2) and (4) differently: for some reason it preserves the "shared" submounts and then forcibly mounts them again on container start, which causes the exponential growth effect.
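
On an affected host the propagation flags of the duplicated submounts can be checked, e.g. with findmnt (a sketch; the columns are standard findmnt output):

```
# show target, filesystem type and propagation flag of the nova/NFS mounts
findmnt -o TARGET,FSTYPE,PROPAGATION | grep -E 'nova|nfs'
```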

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)
Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

Please test this patch https://review.opendev.org/c/openstack/kolla-ansible/+/825514 on an already deployed environment (apply and redeploy). Does it allow the VMs to continue to run? Does it allow creating new VMs? Can pre-existing VMs be manipulated (rebooted, migrated, shelved and unshelved)? I need this info for the release note.
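
For anyone testing, a sketch of the checks being asked for, using the standard openstack CLI (server and volume names are placeholders):

```
# existing VM keeps working after applying the patch and redeploying
openstack server reboot demo-vm
openstack server shelve demo-vm
openstack server unshelve demo-vm
openstack server migrate demo-vm

# new VMs with attached volumes can still be created
openstack server create --image cirros --flavor m1.tiny --network demo-net demo-vm-2
openstack volume create --size 1 demo-vol
openstack server add volume demo-vm-2 demo-vol
```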

Revision history for this message
ERDEM AĞBAHCA (erdemag) wrote :

Hello. Thank you for the quick solution. This patch solves the mount problem.

To answer your questions:
1 - It only allows instances that do not have volumes attached. You need to stop instances with attached volumes to prevent data loss due to the volume mount changes, or at least to get rid of the already-filled mountpoint cap on the physical compute node.
2 - After successful redeployment of the nova-libvirt container you can manipulate pre-existing VMs.

However:
Info: if nova-libvirt is not running, you can do the steps below via the Horizon web GUI.
If you have running instances on the compute node and the nova-libvirt container cannot start due to this bug, here is what you should do to prevent data loss (a short shell sketch follows the list).

1 - Make sure to gracefully shut down the instances that have volumes attached on the compute node, over SSH or by connecting via VNC on the physical_ip:5XXX port.
2 - Check whether the instances you shut down still have processes running on the compute node. FreeBSD instances need killing even after a graceful shutdown. Make sure to kill the qemu processes corresponding to those instances.
3 - umount /var/lib/nova/mnt on the compute node. It can still report as busy, but since you have stopped all volume-attaching instances you don't have to worry about data loss and you can lazily umount with "umount -l".
4 - Apply the patch (this won't affect other running instances on the same compute node).
5 - Start the instances you previously shut down and check whether there is any problem with the volumes (there shouldn't be any).
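
A hedged shell sketch of the steps above (the qemu process name can differ between distributions):

```
# 2) after gracefully shutting down the affected instances from inside the guests,
#    check for leftover qemu processes and kill the ones belonging to those instances
pgrep -af qemu
# kill <pid-of-a-leftover-qemu-process-for-a-stopped-instance>

# 3) lazily unmount the stale cinder mount point on the compute host
umount -l /var/lib/nova/mnt

# 4) apply the patch (redeploy), then start the instances again and verify the volumes
```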

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (master)

Change abandoned by "Radosław Piliszek <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/825514
Reason: not pursuing

Revision history for this message
Angelos Kolaitis (aggkolaitis) wrote (last edit ):

I was also affected by this issue after enabling Cinder with an NFS backend, and the proposed fix did solve the bug for me. Are there any plans to move forward with it? Is there any way to patch this bug without risking breaking existing deployments?

Revision history for this message
Antony Messerli (antonym) wrote (last edit ):

We are seeing this behavior as well, where the mounts slowly increase until the machine appears to become unstable. We are on 2023.1 using Cinder with an NFS backend.
