Storm should try to support two-phase commit where possible

Bug #132485 reported by Christopher Armstrong
16
Affects Status Importance Assigned to Milestone
Landscape Server
Invalid
Wishlist
Unassigned
Launchpad itself
Invalid
Low
Unassigned
Storm
Fix Released
Wishlist
Free Ekanayaka
psycopg
Fix Released
Undecided
Unassigned
psycopg2 (Ubuntu)
Fix Released
Wishlist
Unassigned

Bug Description

Certain databases support two-phase commit. Here's the proposed API:

store.prepare()
...
store.commit_prepared()

The implementation of these methods should indirect through the backends, perhaps with something like self._connection.prepare() and self._connection.commit_prepared().

If a database backend doesn't support two-phase commit, these methods should raise some sort of well-defined "Unsupported" exception.

Related branches

Revision history for this message
James Henstridge (jamesh) wrote :

You probably also need an rollback_prepared() for the case where some other resource fails the first phase of the commit and you want to abort the transaction.

The other form this API could take up is for store.commit() and store.rollback() to perform the second phase of a commit if prepare() was called, and make prepare() a no-op for backends that do not support two-phase commit. This is essentially how Zope transaction data managers that don't support two-phase commit are implemented.

Revision history for this message
James Henstridge (jamesh) wrote :

Another thing to consider is what behaviour should a connection or store exhibit between the two phases of commit. The two options I can see are:
 1. disallow all queries besides the final commit or rollback
 2. consider prepare() to mark the start of the next transaction (this is what Postgres effectively allows, allowing you to accumulate multiple prepared transactions).

I'd probably lean towards (1) due to it being simpler and more likely to fit other databases. This probably requires more investigation though.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Posting some discussions that happened one month ago on the topic:

Aug 14 13:23:58 <niemeyer> radix: I think store.commit_prepared() might not be needed
Aug 14 13:24:07 <radix> you think it should just be "commit()"?
Aug 14 13:24:59 <niemeyer> radix: If the backend needs special handling for prepared statements, it can just "if self._prepared_id: ..."
Aug 14 13:26:44 <niemeyer> radix: This way the whole API change would consist of adding {store,connection}.prepare()
Aug 14 13:27:12 <radix> ok, fair enough
Aug 14 13:27:28 <niemeyer> radix: store.commit() doesn't even have to change, I think
Aug 14 13:28:16 <radix> oh
Aug 14 13:28:18 <radix> _connection.commit :)
Aug 14 13:28:24 <radix> cool
Aug 14 13:32:07 <niemeyer> radix: I wonder if we should make connection.prepare() raise an exception, or just pass.. there's a subtle difference
Aug 14 13:32:32 <niemeyer> radix: If store.prepare() raises an exception, people won't call it on unsupported databases
Aug 14 13:32:50 <niemeyer> radix: That's good, in the sense that we're warning them that the database won't support it
Aug 14 13:33:50 <niemeyer> radix: But if we make it just pass, prepare() will at least flush() changes, which even though most people don't know, it's a lot better than nothing.
Aug 14 13:26:44 <niemeyer> radix: This way the whole API change would consist of adding {store,connection}.prepare()
Aug 14 13:27:12 <radix> ok, fair enough
Aug 14 13:27:28 <niemeyer> radix: store.commit() doesn't even have to change, I think
Aug 14 13:28:16 <radix> oh
Aug 14 13:28:18 <radix> _connection.commit :)
Aug 14 13:28:24 <radix> cool
Aug 14 13:32:07 <niemeyer> radix: I wonder if we should make connection.prepare() raise an exception, or just pass.. there's a subtle difference
Aug 14 13:32:32 <niemeyer> radix: If store.prepare() raises an exception, people won't call it on unsupported databases
Aug 14 13:32:50 <niemeyer> radix: That's good, in the sense that we're warning them that the database won't support it
Aug 14 13:33:50 <niemeyer> radix: But if we make it just pass, prepare() will at least flush() changes, which even though most people don't know, it's a lot better than nothing.

Revision history for this message
Richard Boulton (richardboulton) wrote :

Just to add my thoughts: adding only store.prepare() to the API makes sense to me; this keeps the API consistent and unsurprising since commit() is always used to finish the transaction. Also, changing code to use 2-phase commit instead of 1-phase is as simple as possible.

If 2-phase commit isn't supported by a database, prepare() should definitely raise an exception. Here's my reasoning:

 - the point of prepare() is to check that there are no constraint violations which will stop a commit() proceeding, to allow a coordinated 2-phase commit.
 - if it is possible for prepare() to return successfully without performing this check, it isn't possible to write a reliable 2-phase commit algorithm.
 - therefore, if prepare doesn't raise an exception if 2-phase commit isn't supported by the databases, programmers have to check whether 2-phase commit is supported by the underlying database if they want to know whether they're safe from database integrity violation.
 - this exposes programmers who haven't read the documentation carefully, or who are forgetful, to the risk that they're using a database backend which doesn't support 2-phase commit, but think that they _are_ using 2-phase commit, because prepare() is returning successfully.
 - this can lead to dataloss and general badness.

It is easy for a programmer to write a special case check around prepare() to catch an "Unsupported" exception (or, we could provide an optional parameter to prepare() to say "don't raise an exception if unsupported") - but we should make sure the programmer is aware that they're ignoring an error.

In simpler terms, I think programmers would find it surprising that prepare() could return successfully without actually doing the prepare, so the "principle of least surprise" says that it shouldn't do that.

To take the appropriate line from the Zen of Python "Errors should never pass silently."

Changed in psycopg:
status: Unknown → Confirmed
Revision history for this message
James Henstridge (jamesh) wrote :

I've done an initial patch to add the necessary APIs to psycopg2. Federico has a few issues with the extra methods, so it won't be going in right away.

Changed in psycopg:
status: Confirmed → New
Revision history for this message
James Henstridge (jamesh) wrote :

Here's a status update on the infrastructure side:

 1. We fleshed out a standard DB-API extension for two-phase commit. It is included in the current version of PEP 249.
 2. I haven't yet implemented it for psycopg2. My initial patch would need a bit of work to support the new API, and I've got other things on my plate in the short term.
 3. The DB-API extension should be implementable for MySQL if someone wants to take on that job.

Once we've got some adapters supporting the extension we can work on the Storm API.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Wow, that's awesome!

Stuart Bishop (stub)
Changed in storm:
importance: Undecided → Wishlist
status: New → Confirmed
Changed in launchpad:
importance: Undecided → High
status: New → Triaged
Changed in launchpad-foundations:
milestone: none → 2.1.12
Changed in landscape:
importance: Undecided → Wishlist
Changed in launchpad-foundations:
milestone: 2.1.12 → 2.2.1
Changed in launchpad-foundations:
milestone: 2.2.1 → 2.2.2
Revision history for this message
Francis J. Lacoste (flacoste) wrote :

The risk for Launchpad is minimal and the recovery pretty trivial.

Changed in launchpad-foundations:
importance: High → Low
milestone: 2.2.2 → none
Changed in landscape:
milestone: none → later
Chuck Short (zulcss)
Changed in psycopg2 (Ubuntu):
importance: Undecided → Wishlist
status: New → Confirmed
Revision history for this message
Free Ekanayaka (free.ekanayaka) wrote :

Hi there! I'd like to revive this bug.

Since version 2.3, the pyscopg2 driver implements the two-phase commit extension of the DB API specification version 2.0. So I'm keen to pick it up and implement Storm's support for two-phase commit on top of the DB API 2.0 interface.

The idea would be to add a prepare() method to the Store class, as proposed by earlier comments in this bug. Invoking prepare() on a backend that doesn't support the DB API two-phase commit extension would raise NotSupportedError, as recommended in comment #4. Then commit() or rollback() would transparently perform the second phase of the commit, if prepare() was called, as suggested in comment #1.

We might also want to have a begin(transaction_id) method on the Store class, accepting a unique string identifier. This would call the backend's tcp_begin() DB API method under the hood, and can be used by transaction managers.

Integration with ZStorm would be done using the above API, and probably adding something like a set_tpc(flag) method to ZStorm to enable/disable two-phase commit (it would be off by default).

Another option would be to explicitly add two new methods, e.g. tcp_commit() and tcp_rollback(), to clearly mark the difference with commit() and rollback(), though I'm not sure how useful that would be and what the possible use cases are.

Thoughts?

Revision history for this message
Stuart Bishop (stub) wrote :

I'd be happy with just adding prepare() as you describe and modifying commit() and rollback().

The default should certainly be off, as having it on by default is slow and dangerous if you don't have procedures in place to clean up prepared but uncommitted transactions.

Extending this beyond that with begin() or explicit tpc_commit()/tpc_rollback() methods would warrant some discussion on the mailing list. In particular, you would also need to expose the connection.xid() method to allow people to generate the transaction ids . There is also the option of just adding the transaction id as an argument to prepare() if people need to set it explicitly.

Revision history for this message
Free Ekanayaka (free.ekanayaka) wrote :

Hi Stuart,

thanks for the comments!

Okay, let's go for modified commit() and rollback(). I will make tpc off by default in ZStorm and have a way to configure it per-store, because you might don't want to have it on for all stores, for example if they set the isolation level to autocommit (tpc is not supported), something like zstorm.set_default_tpc(name, flag), and maybe have a way to configure it per-transaction too.

Regarding begin(), I think we need it because the ZStorm machinery needs to call it when during register_store_with_transaction, afaiu you need to call the DB API level tpc_begin() before issuing any query in the transaction. So prepare() alone is not enough, unless we are fine with ZStorm accessing the raw connection directly via store._connection._raw_connection (but that doesn't seem great).

My idea would be to add a Storm-level Xid class, that you can instantiate passing the standard XA arguments (format, transaction_id, branch_qualifier). Then you can pass Xid instances to store.begin(), and they would be handled transparently.

If you think we should move this to the mailing list, I'll do.

Revision history for this message
Free Ekanayaka (free.ekanayaka) wrote :

I went ahead and implemented a first version of the branch, with 2 new store methods: begin(xid) and prepare(). So we can discuss actual code (possibly on the mailing list). If there are no objections I'll open a merge proposal for it, so people can comment and I can make any needed changes.

Changed in storm:
status: Confirmed → In Progress
assignee: nobody → Free Ekanayaka (free.ekanayaka)
milestone: none → 0.20
Stuart Bishop (stub)
Changed in launchpad:
status: Triaged → Invalid
Changed in psycopg2 (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Stuart Bishop (stub) wrote :

psycopg #172 not updating

Changed in psycopg:
importance: Unknown → Undecided
status: New → Fix Released
Changed in storm:
status: In Progress → Fix Committed
Changed in storm:
status: Fix Committed → Fix Released
affects: launchpad → ubuntu-beginners-launchpad-projects
Changed in ubuntu-beginners-launchpad-projects:
status: Invalid → New
William Grant (wgrant)
affects: ubuntu-beginners-launchpad-projects → launchpad
Changed in launchpad:
status: New → Invalid
Revision history for this message
🤖 Landscape Builder (landscape-builder) wrote :

This bug has not seen any activity in the last 6 months, so it is being automatically closed.

If you are still experiencing this issue, please feel free to re-open.

Landscape Team

Changed in landscape:
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.