Comment 13 for bug 680529

John A Meinel (jameinel) wrote : Re: [Bug 680529] Re: "Lock was renamed into place, but now is missing"

On 1/25/2011 7:06 AM, Daniel Bela wrote:
> Hi again,
>
>> Yes, that's probably what it is, and that's probably what would fix
>> it. If you would like to try a patch to the point it works reliably
>> on your network, I will help you finish and land it. Basically you
>> just need to insert a loop (perhaps up to say 5 times), and a
>> time.sleep() in lockdir.py.
>
> well, I probably have to give it a try. Could you point me to the function in which I would insert the loop?
> Is that lock_write(), or lock_read(), or unlock()?
>
> I'm in no way a programmer, so this is kind of hard for me. But I'll try
> for sure...
>
> Regards
> Bela
>

It would be in bzrlib/lockdir.py

As part of _attempt_lock (on line 251), you can see that it calls
"self.peek()" to check that we actually managed to lock correctly.

My guess is that the earlier "self.transport.rename()" call is
succeeding, but that the renamed file is not yet visible by the time we
call peek(), so peek() fails to find it.

You could potentially put the wait loop in either peek or in
_attempt_lock. Note that peek has quite a few callers, and I haven't
thought through whether it would be appropriate for all of them to block
for a few seconds if they can't find the file (for some of them it
probably is not).

To help in debugging this in the future, I would probably also add a
mutter('After successfully renaming the lock, we failed to find a lock
file, trying again in %.1f seconds.')

Or something along those lines.

As long as it doesn't slow down the common case of the lock file
appearing immediately, we could probably wait a second or two.

Note that if we can't obtain the lock in the first place (it is already
held), we default to waiting up to 30 seconds, polling every 1.0s to see
if the lock has been released. I think those numbers are now a bit too
high, but that does give a baseline we can think about.

I would probably peek every 100ms for at most 1.0s, but really it
depends on your filesystem consistency timeout. (If it takes 10s for a
renamed file to show up at the new location, then that's how long you
have to wait.)

I would probably define those numbers as module-level values that can be
adjusted from outside (like _DEFAULT_TIMEOUT_SECONDS), and we might
consider exposing them as a configuration setting.
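
As a rough sketch of the wait loop described above (not tested against
bzrlib itself): the helper name peek_with_retry, the two _DEFAULT_PEEK_*
constants, and the "sleep" and "log" parameters are all made up for
illustration. It assumes peek() returns None when the lock file cannot
be found, and it takes the peek callable as an argument so it can be
tried outside of lockdir.py.

```python
import time

# Hypothetical module-level knobs (names and values are assumptions):
# poll every 100ms for at most 1.0s.
_DEFAULT_PEEK_TIMEOUT_SECONDS = 1.0
_DEFAULT_PEEK_POLL_SECONDS = 0.1


def peek_with_retry(peek,
                    timeout=_DEFAULT_PEEK_TIMEOUT_SECONDS,
                    poll=_DEFAULT_PEEK_POLL_SECONDS,
                    sleep=time.sleep,
                    log=None):
    """Call peek() until it finds the lock or the time budget runs out.

    Returns whatever peek() returned, or None if the lock file never
    showed up. The first call happens immediately, so the common case
    (file visible right away) never sleeps.
    """
    max_retries = int(round(timeout / poll))
    for attempt in range(max_retries + 1):
        info = peek()
        if info is not None:
            return info
        if attempt == max_retries:
            break  # budget used up; do not sleep after the last try
        if log is not None:
            log('After successfully renaming the lock, we failed to '
                'find a lock file, trying again in %.1f seconds.' % poll)
        sleep(poll)
    return None
```

Inside _attempt_lock this would stand in for the bare "self.peek()"
call after the rename, with mutter passed as "log". Keeping the loop out
of peek() itself means the other callers of peek() do not start
blocking.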

John
=:->
