NFS works fine with BZR unless your BZR installation is broken. It's always "that other guy's problem". But real users in real network configurations, who don't happen to give bzr the file locks it demands, have been complaining about this for five years. Bzr doesn't fall back gracefully; it dies and leaves an ugly corpse behind. Patches have been posted for this but never applied. And yes, my NFS server is running lockd, but it doesn't actually work with my Ubuntu client's NFS implementation. Somehow my network filesystem has worked flawlessly for years for everything except bzr.
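To make "fall back gracefully" concrete, here is a minimal sketch of the behaviour I'd expect. It is not bzr's actual code: `try_lock` and `UNSUPPORTED` are names I made up for illustration, and the set of errno values is my assumption about what "this filesystem can't do locks" usually looks like, as opposed to "somebody else holds the lock".

```python
import errno
import fcntl
import warnings

# Assumption: these errno values mean "locking is unavailable here",
# not "another process holds the lock" (that is EACCES or EAGAIN).
UNSUPPORTED = {errno.ENOLCK, errno.EOPNOTSUPP, errno.ENOSYS}


def try_lock(fileobj, lock_fn=fcntl.lockf):
    """Try a non-blocking exclusive lock on an open file.

    Returns True if the lock was taken and False if the filesystem
    cannot lock at all. Real contention is re-raised unchanged.
    """
    try:
        lock_fn(fileobj.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError as e:
        if e.errno in UNSUPPORTED:
            warnings.warn("file locking unavailable; continuing without a lock")
            return False
        raise
```

The `lock_fn` parameter is only there so the failure path can be exercised without a broken NFS mount. Whether running unlocked is actually safe for a given repository format is a separate question that this sketch does not answer.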
This bug has fourteen duplicates. In many, you tried to patiently explain that it was *their* problem and that if they just tweaked their network configuration long enough, no bug would actually need to be fixed. Several of those bug reports had other users tack on comments saying, "Me too - I'm having the same trouble". As far as I could tell, few of them ever actually got their problem resolved -- they just went away unsatisfied.
In one of them, #137387, you went so far as to analyze a tcpdump of the underlying NFS traffic and detected a possible bug in file locking in NFSv4, then wrote "I'm not sure what to do with this bug. I'm close to saying it's a server bug or quirk and bzr is not doing anything wrong." From other comments, I think it related to subtle semantics: can you upgrade a read lock to a write lock by having the same program make a second lock on the same file, or does the second lock produce some kind of later error? It's much harder for a server to "do the obvious thing" in that case, from the wrong end of a network connection and without any idea of which user process an NFS request is coming from. Instead, a truncate that tries to drop a byte locked by the read lock might well return an error, protecting the file contents for the reader.
Even if you're right and they're wrong, you didn't solve that user's problem. The sysadmins involved had been fighting much more serious NFS bugs for years; they were hoping to find version control software that didn't tickle subtle questions of file locking semantics. They patched out the read lock and it fixed the problem, but you didn't accept the patch. You didn't even take their fix for leaving a lockfile lying around when a filesystem lock fails (requiring a bzr break-lock to recover). In another report, #114528, two users reported switching their project to Subversion because it worked on their AFP network and bzr didn't. Still no response.
Rather than telling me and every other bug reporter to reconfigure our filesystems and LANs and patch up our kernels, please consider this a wake-up call. File locking is causing your users more trouble than it prevents. The existing code doesn't work cleanly in a wide variety of actual installations, despite the specs saying that it should. Even when it works, it takes too much sysadmin effort, and when it fails, it invariably hurts somebody who never actually needed any file locking, somebody who'll never do bzr operations in parallel. Rewriting the whole file format seems to be taking a lot of time and isn't working yet. Maybe the old file format will work even if you don't lock it, or you can add a command that script writers can use to explicitly control parallelism for the few who care. Perhaps before the decade is out, you or another bzr maintainer could eliminate the misfeature, remove the poorly chosen dependency, accept some of the patches, handle an error condition, finish the rewrite, fix the bug, or do whatever else it takes. Thank you for your work on bzr.