I have been regularly hitting this too, in a test suite (for the Calico networking driver) using Queens.
The typical pattern for me is:
- launch an instance on a compute node (which is an NFS client); this succeeds
- after 20-30s, launch another instance on the same compute node; this fails with the same call stack as above.
But I guess that's close enough for us to be talking about the same root cause here.
The biggest thing that I don't understand about this is: why aren't there loads of OpenStack users hitting this all the time? Do most OpenStack deployments using Queens or later not use NFS for instance ephemeral storage? If they don't, is that because they don't care about live migration, or because they get live migration some other way?
(I should add: I'm pretty sure this problem was introduced going from Pike to Queens. I run the same test suite regularly on Pike and Queens, and the Pike test never sees a problem like this; and the nova.privsep.path.utime code was introduced between Pike and Queens.)
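In case it helps anyone reproduce this outside a full test run, here is a minimal sketch of the failure mode as I understand it. The assumptions are mine, not confirmed by the traceback: that nova.privsep.path.utime ends up calling os.utime() with root privileges, and that the NFS export backing the instances directory has root_squash enabled, so the squashed root user is refused the timestamp update. The path in the snippet is hypothetical.

```python
# Minimal repro sketch (my own illustration, not nova code).
# Assumptions: the instances directory is an NFS mount exported with
# root_squash, and this script is run as root on the compute node (NFS
# client), mimicking what the privsep helper does when
# nova.privsep.path.utime is invoked.
import os

# Hypothetical path to a base image under the NFS-backed instances directory.
BASE_IMAGE = "/var/lib/nova/instances/_base/example-image"

try:
    # If the root cause is the permission issue I suspect, updating the
    # timestamps as root fails here because root_squash maps root on the
    # client to an unprivileged UID on the NFS server.
    os.utime(BASE_IMAGE, None)
except PermissionError as exc:
    print("utime failed: %s" % exc)
```

If that snippet fails as root against your instances share but works with root_squash disabled (or with the file owned by the squashed UID), that would point at root squashing; if not, my guess about the cause doesn't apply to your setup.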