Journal recovery failure on global timestamp mismatch

Bug #1125603 reported by Peter Beaman
This bug affects 1 person
Affects: Akiban Persistit
Status: Fix Released
Importance: Critical
Assigned to: Peter Beaman
Milestone: 3.2.6

Bug Description

A customer system fails to start due to this Exception:

Caused by: com.persistit.exception.CorruptVolumeException: Volume /var/lib/akiban/akiban_data.v01 has a global timestamp greater than system timestamp: 58897889320 > 58869008855
 at com.persistit.VolumeHeader.verifyVolumeHeader(VolumeHeader.java:337)
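
For readers unfamiliar with this check, the comparison reported in the message can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical and are not Persistit's actual code; only the two timestamp values come from the exception above.

```java
public class GlobalTimestampCheck {

    /**
     * Hypothetical stand-in for the header check; not Persistit's actual code.
     * Returns true when the volume claims a timestamp that the recovered
     * system timestamp has not reached.
     */
    public static boolean volumeIsAheadOfSystem(long volumeGlobalTimestamp,
                                                long systemTimestamp) {
        return volumeGlobalTimestamp > systemTimestamp;
    }

    public static void main(String[] args) {
        long volumeTimestamp = 58_897_889_320L; // from the exception message
        long systemTimestamp = 58_869_008_855L; // from the exception message

        System.out.println("volume ahead of system: "
                + volumeIsAheadOfSystem(volumeTimestamp, systemTimestamp));
        System.out.println("gap in timestamp units: "
                + (volumeTimestamp - systemTimestamp));
    }
}
```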

Upon further investigation we discovered a record in the journal with an invalid VolumeHandle:

 3,712,000,036,776 58,869,008,137 IT ( 59) handle 00735 volume 2147483647 treeName information_schema.index_statistics

The volume handle refers to the newly added lock volume. No tree or volume records related to that volume should appear in the journal. As a result, RecoveryManager wrongly concludes that the journal is corrupt, when it is in fact intact apart from this one erroneous record.
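
A minimal sketch of how such records could be spotted in a decoded journal listing is shown here. It assumes the lock volume is identified by the handle seen in the offending record, 2147483647 (which equals Integer.MAX_VALUE). The TreeRecord class and the findLockVolumeRecords method are hypothetical simplifications for illustration and are not part of the Persistit API.

```java
import java.util.ArrayList;
import java.util.List;

public class LockVolumeRecordScan {

    // Assumption: the lock volume is identified by the handle seen in the
    // offending record, 2147483647, which equals Integer.MAX_VALUE.
    public static final int LOCK_VOLUME_HANDLE = Integer.MAX_VALUE;

    /** Hypothetical, simplified view of one IT journal record. */
    public static final class TreeRecord {
        public final int treeHandle;
        public final int volumeHandle;
        public final String treeName;

        public TreeRecord(int treeHandle, int volumeHandle, String treeName) {
            this.treeHandle = treeHandle;
            this.volumeHandle = volumeHandle;
            this.treeName = treeName;
        }
    }

    /** Returns the records that refer to the lock volume. */
    public static List<TreeRecord> findLockVolumeRecords(List<TreeRecord> records) {
        List<TreeRecord> offending = new ArrayList<>();
        for (TreeRecord record : records) {
            if (record.volumeHandle == LOCK_VOLUME_HANDLE) {
                offending.add(record);
            }
        }
        return offending;
    }

    public static void main(String[] args) {
        List<TreeRecord> records = new ArrayList<>();
        // Made-up record on an ordinary volume, for contrast only.
        records.add(new TreeRecord(734, 1, "example_tree"));
        // Values taken from the journal record quoted above.
        records.add(new TreeRecord(735, 2147483647,
                "information_schema.index_statistics"));

        for (TreeRecord record : findLockVolumeRecords(records)) {
            System.out.println("unexpected IT record: handle "
                    + record.treeHandle + " tree " + record.treeName);
        }
    }
}
```

A scan like this only identifies the bad record; it does not address why the record was written to the journal in the first place.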

This bug was almost certainly introduced by the support added for the new Exchange#lock mechanism.

Peter Beaman (pbeaman)
Changed in akiban-persistit:
assignee: nobody → Peter Beaman (pbeaman)
Yuval Shavit (yshavit) wrote :

This happened to me earlier in the day, too. I saved the contents of /tmp/akiban_server before blowing it away and starting from scratch. Let me know if you need that.

Peter Beaman (pbeaman)
Changed in akiban-persistit:
status: Confirmed → Fix Committed
Changed in akiban-persistit:
milestone: none → 3.2.6
Peter Beaman (pbeaman)
Changed in akiban-persistit:
status: Fix Committed → Fix Released