Hey Neil,
"The biggest thing that I don't understand about this is: why aren't there loads of OpenStack users hitting this all the time? Do most OpenStack deployment using Queens or later not use NFS for instance ephemeral storage? If not, is that because they don't care about live migration, or because they're getting that some other way?"
I wondered the same thing. I'm guessing most folks use something like Ceph for shared storage (at least users using TripleO).
Looking at your patch, I think modifying the file time is done for a reason (I can't recall what it was, but I remember it was to notify some other mechanism in nova about the cache). So rather than just ignoring the failure, my patch resolves it, at least in my testing. Did you ever try using my patch to see if the problem still happens for you?