lock problem on transient disconnection during acquisition
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
txzookeeper | New | Medium | Unassigned |
Bug Description
<benbangert> hazmat: were you going to fix txzookeeper's lock?
<hazmat> benbangert, unsurprising, and that sounds great re zc.zk changes
<hazmat> benbangert, what was the problem?
<benbangert> the create node edge case
<benbangert> you create the node, server dies and you get connection loss, but the node was created
<benbangert> so then txzookeeper reconnects... and makes another node :)
<benbangert> and now there's two nodes it created, except it doesn't know the other one actually worked
<benbangert> that's why the recipe has the GUID bit in it now
* hazmat files a bug
<benbangert> if you look at my async lock, on connection loss during create candidate, it waits till reconnect and then calls get_children to see if the create actually did work
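The recovery benbangert describes can be sketched as follows. This is a hypothetical, synchronous illustration, not txzookeeper's API: real txzookeeper calls return Deferreds, and `FakeZooKeeper`, `ConnectionLoss`, `create_sequential` and `create_lock_node` are stand-ins invented here. The fake server applies the create but "loses" the reply. Because the node name embeds a per-attempt GUID, one `get_children` call after reconnecting is enough to recognise the node the earlier create already made, instead of creating a second one.

```python
import uuid


class ConnectionLoss(Exception):
    """Stand-in for the client seeing a dropped connection before the reply."""


class FakeZooKeeper:
    """Hypothetical in-memory client modelling only sequential creates.

    If fail_next_reply is set, the next create is applied on the "server"
    but the client gets ConnectionLoss, reproducing the edge case.
    """

    def __init__(self):
        self.children = {}  # parent path -> list of child names
        self.counter = 0
        self.fail_next_reply = False

    def create_sequential(self, parent, prefix):
        # ZooKeeper appends a 10-digit, zero-padded sequence number.
        name = "%s%010d" % (prefix, self.counter)
        self.counter += 1
        self.children.setdefault(parent, []).append(name)
        if self.fail_next_reply:
            self.fail_next_reply = False
            raise ConnectionLoss()
        return parent + "/" + name

    def get_children(self, parent):
        return list(self.children.get(parent, []))


def create_lock_node(client, lock_path, guid=None):
    """Create this client's lock candidate node at most once.

    The GUID prefix lets us find our own node after a lost reply.
    """
    guid = guid or uuid.uuid4().hex
    prefix = guid + "-lock-"
    while True:
        try:
            return client.create_sequential(lock_path, prefix)
        except ConnectionLoss:
            # A real client would wait for the reconnect here.
            for child in client.get_children(lock_path):
                if child.startswith(prefix):
                    return lock_path + "/" + child
            # No match: the create never reached the server, so retry.
```

With a lost reply, the lock directory still ends up holding exactly one node for this client, where a naive retry would have left two.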
<benbangert> same thing could happen with create node using non-ephemeral of course, which would cause a program to throw a NodeAlreadyExists error and might leave someone scratching their head if they weren't aware of that edge case
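For a plain (non-ephemeral, non-sequential) node, the retry after a lost reply fails with a "node exists" error even though the first create worked. The sketch below shows one possible mitigation. It is an assumption, not something from the discussion or from txzookeeper: it stores a caller-supplied unique `token` in the node data, so a `NodeExists` on a retry can be checked against our own payload. `FakeZooKeeper`, `ConnectionLoss`, `NodeExists` and `create_once` are all hypothetical stand-ins, and the check assumes nobody rewrites the node between the two calls.

```python
class ConnectionLoss(Exception):
    """Stand-in for a dropped connection before the reply arrives."""


class NodeExists(Exception):
    """Stand-in for ZooKeeper's "node already exists" error."""


class FakeZooKeeper:
    """Hypothetical in-memory client with create() and get() only."""

    def __init__(self):
        self.nodes = {}  # path -> data
        self.fail_next_reply = False

    def create(self, path, data):
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes[path] = data
        if self.fail_next_reply:
            # The server applied the create, but the reply was lost.
            self.fail_next_reply = False
            raise ConnectionLoss()
        return path

    def get(self, path):
        return self.nodes[path]


def create_once(client, path, data, token):
    """Create a persistent node, tolerating a lost reply.

    `token` must be unique per caller; it is stored in the data so that
    our own earlier create can be told apart from someone else's node.
    """
    payload = token + ":" + data
    retried = False
    while True:
        try:
            return client.create(path, payload)
        except ConnectionLoss:
            retried = True  # outcome unknown; retry after reconnecting
        except NodeExists:
            if retried and client.get(path) == payload:
                return path  # our earlier create did succeed
            raise  # a genuinely pre-existing node
```

A node that already existed before the first attempt still raises `NodeExists`, so the surprising error is only suppressed when it was caused by our own lost reply.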
<hazmat> yeah.. it would have to check the lock children and match on session id owner for an error to know deterministically for the ephemeral seq
<hazmat> or use explicit client ids/guids for the node names
<benbangert> yea, the lock recipe uses the guid node name, prolly cause it's cheaper/faster than calling get on every child
<hazmat> definitely
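hazmat's alternative, matching on the session id owner, can be sketched like this. ZooKeeper's node `Stat` carries an `ephemeralOwner` field holding the session id that created an ephemeral node. After a connection loss within a still-live session, the client can stat each lock child and look for its own session id. Everything here is hypothetical: `FakeZooKeeper`, `Stat.ephemeral_owner` (mirroring `ephemeralOwner`), `find_own_node` and `create_lock_node` are illustrative stand-ins, not txzookeeper calls. The `stat_calls` counter shows the cost difference mentioned above: one extra round trip per child, where the GUID approach needs only the single `get_children`. It also assumes the session holds at most one candidate node under this lock.

```python
from dataclasses import dataclass


class ConnectionLoss(Exception):
    """Stand-in for a dropped connection before the reply arrives."""


@dataclass
class Stat:
    ephemeral_owner: int  # mirrors ZooKeeper's Stat.ephemeralOwner


class FakeZooKeeper:
    """Hypothetical in-memory server shared by several sessions."""

    def __init__(self):
        self.children = {}  # parent -> {child name: Stat}
        self.counter = 0
        self.stat_calls = 0
        self.fail_next_reply = False

    def create_ephemeral_sequential(self, parent, prefix, session_id):
        name = "%s%010d" % (prefix, self.counter)
        self.counter += 1
        self.children.setdefault(parent, {})[name] = Stat(session_id)
        if self.fail_next_reply:
            self.fail_next_reply = False
            raise ConnectionLoss()
        return parent + "/" + name

    def get_children(self, parent):
        return sorted(self.children.get(parent, {}))

    def exists(self, path):
        self.stat_calls += 1  # each call is one round trip
        parent, name = path.rsplit("/", 1)
        return self.children[parent].get(name)


def find_own_node(client, lock_path, session_id):
    """Return our ephemeral child by checking each node's owner session."""
    for child in client.get_children(lock_path):
        stat = client.exists(lock_path + "/" + child)
        if stat is not None and stat.ephemeral_owner == session_id:
            return lock_path + "/" + child
    return None


def create_lock_node(client, lock_path, session_id):
    while True:
        try:
            return client.create_ephemeral_sequential(
                lock_path, "lock-", session_id)
        except ConnectionLoss:
            found = find_own_node(client, lock_path, session_id)
            if found is not None:
                return found
            # Not found: the create never happened, so retry.
```

This only works while the session survives the disconnection. If the session expires, its ephemeral nodes are deleted anyway and the lock attempt has to start over.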
Changed in txzookeeper:
importance: Undecided → Medium