pt-archiver --bulk-insert data loss
Bug #820079 reported by
Daniel Nichter
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona Toolkit moved to https://jira.percona.com/projects/PT |
Confirmed
|
Undecided
|
Unassigned |
Bug Description
Using --bulk-insert deletes more rows than it inserts, losing data.
Originally reported at http://
Changed in percona-toolkit: | |
assignee: | Daniel Nichter (daniel-nichter) → nobody |
Changed in percona-toolkit: | |
importance: | Critical → High |
Changed in percona-toolkit: | |
importance: | High → Undecided |
Changed in percona-toolkit: | |
assignee: | nobody → Brian Fraser (fraserbn) |
Changed in percona-toolkit: | |
assignee: | Brian Fraser (fraserbn) → nobody |
To post a comment you must log in.
mysql> show create Table Test; -+----- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- -----+ -+----- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- -----+ -+----- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- -----+
+------
| Table | Create Table |
+------
| Test | CREATE TABLE `Test` (
`PKey` varchar(30) NOT NULL DEFAULT '',
`Time` double DEFAULT NULL,
PRIMARY KEY (`PKey`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+------
1 row in set (0.00 sec)
Note: Test_Archive also has exactly the same structure.
mysql> select * from Test;
+------+------+
| PKey | Time |
+------+------+
| d | 7 |
| a | 10 |
| b | 9 |
| c | 8 |
| z | 1 |
| x | 2 |
| v | 3 |
| u | 4 |
+------+------+
8 rows in set (0.00 sec)
mysql> select * from Test_Archive;
Empty set (0.00 sec)
Now executed,
mk-archiver -u xxxx -p xxxx --source h=localhost, D=xxxxxx, t=Test --dest h=localhost, D=xxxxxx, t=Test_ Archive --where="Time < 8" --limit=2 --bulk-insert --bulk-delete
After that,
mysql> select * from Test;
+------+------+
| PKey | Time |
+------+------+
| a | 10 |
| b | 9 |
| c | 8 |
| x | 2 |
| v | 3 |
+------+------+
5 rows in set (0.00 sec)
mysql> select * from Test_Archive;
+------+------+
| PKey | Time |
+------+------+
| d | 7 |
| z | 1 |
+------+------+
2 rows in set (0.00 sec)
Data loss ...Attached the output of this as debug.out and the data used as data.sql
On my initial analysis,
The delete query is taking Pkey corresponding to last seen time column of first_sql. The same pKey is used in the condition of get_next sql query. However this doesn't need to be correct across all the queries.
Note that things will work fine if the data set is slightly modified i.e. replace the position of d and z.
The problem also occurs when the primary key is a alpha numeric string.
Note that things will work fine if i=Time is set in DSN. However in our real scenario this can't be set due to variety of reasons.
IMO, we could do order by of both PKey and Time i.e. primary key and the column on which condition is made or give a option to specify the column name that has to be used for select (get_next) and delete (del row sql) queries eventhough they are not indexes.