innobackupex incremental backups duplicate files unnecessarily
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraBackup moved to https://jira.percona.com/projects/PXB | Triaged | Wishlist | Unassigned |
2.0 | Won't Fix | Undecided | Unassigned |
2.1 | Triaged | Wishlist | Unassigned |
2.2 | Triaged | Wishlist | Unassigned |
2.3 | Triaged | Wishlist | Unassigned |
Bug Description
Our MySQL databases have tens of thousands of tables, so each of the directories in /var/lib/mysql/ contains a huge number of .frm files. Each database can contain a gigabyte or more of rarely changing .frm files alone. We use the --rsync option for innobackupex to speed up copying these files, but we have found that space is wasted: each incremental backup contains multiple gigabytes of .frm files even though they have not been modified since the last incremental backup.
As a simple solution, we have created the following patch which modifies the behavior of the --rsync option. When the --rsync option AND the --incremental-
This tells rsync that any files that are unchanged compared to the contents of the incremental-basedir should be hardlinked back to their original copies, not duplicated. Changed files are still handled properly: they are rsynced into the new location. Unchanged files (like our thousands of .frm files) are hardlinked, which saves disk space.
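To illustrate the idea outside of rsync, here is a minimal Python sketch of "hardlink if unchanged, copy otherwise". It is not the patch itself and not part of innobackupex; the function name `copy_with_hardlinks` and the directory layout are hypothetical. rsync offers comparable behaviour through its `--link-dest` option, and it is an assumption that the patch relies on that option. The sketch also simplifies the "unchanged" check to a byte-for-byte comparison, whereas rsync by default decides from file size and modification time.

```python
import filecmp
import os
import shutil
import tempfile


def copy_with_hardlinks(src_dir, base_dir, dest_dir):
    """Copy src_dir into dest_dir (hypothetical helper, for illustration only).

    A file whose content is identical to the file at the same relative path
    in base_dir is hardlinked to that base copy; any other file is copied.
    Returns two sorted lists of relative paths: (linked, copied).
    """
    linked, copied = [], []
    for root, _dirs, files in os.walk(src_dir):
        rel_root = os.path.relpath(root, src_dir)
        os.makedirs(os.path.normpath(os.path.join(dest_dir, rel_root)),
                    exist_ok=True)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            src = os.path.join(src_dir, rel)
            base = os.path.join(base_dir, rel)
            dest = os.path.join(dest_dir, rel)
            if os.path.isfile(base) and filecmp.cmp(src, base, shallow=False):
                # Unchanged: add a second name for the existing base file.
                os.link(base, dest)
                linked.append(rel)
            else:
                # New or changed: make a real copy, preserving metadata.
                shutil.copy2(src, dest)
                copied.append(rel)
    return sorted(linked), sorted(copied)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "base")    # stands in for the previous backup
        src = os.path.join(tmp, "datadir")  # stands in for the live data
        dest = os.path.join(tmp, "inc")     # stands in for the new incremental
        for d in (base, src):
            os.makedirs(os.path.join(d, "db1"))
            with open(os.path.join(d, "db1", "t1.frm"), "w") as f:
                f.write("unchanged definition")
        with open(os.path.join(src, "db1", "t2.frm"), "w") as f:
            f.write("new table")
        print(copy_with_hardlinks(src, base, dest))
```

Hardlinks only work when the base and destination directories are on the same filesystem, so a layout like this assumes all backups live on one volume.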
With this patch we have reduced both the time it takes to run an incremental backup and the disk space each one uses, from many gigabytes per incremental backup to just a few hundred megabytes.
Restore options are not affected by this change: the hardlinked files look and behave identically to "real" files; they simply save disk space.
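As a small illustration of why hardlinked files behave like ordinary files, the following Python sketch creates a file, hardlinks it, removes the original name, and reads the data back through the remaining name. The file names and the function `hardlink_survives_unlink` are made up for the example and do not correspond to anything innobackupex creates.

```python
import os
import tempfile


def hardlink_survives_unlink():
    """Return (links_before, links_after, data) for a hardlinked file.

    Hypothetical demonstration: the data stays readable through the second
    name even after the first name has been removed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "base_t1.frm")        # made-up name
        linked = os.path.join(tmp, "incremental_t1.frm")   # made-up name
        with open(original, "wb") as f:
            f.write(b"table definition")
        os.link(original, linked)
        links_before = os.stat(linked).st_nlink
        # Deleting one name does not delete the data while another name exists.
        os.remove(original)
        links_after = os.stat(linked).st_nlink
        with open(linked, "rb") as f:
            data = f.read()
        return links_before, links_after, data


if __name__ == "__main__":
    print(hardlink_survives_unlink())
```

Because both names refer to the same underlying file, removing an older backup directory does not invalidate a newer one that links to it. The flip side is that modifying a hardlinked file in place through one name changes what every other name sees, so backup directories sharing links are best treated as read-only.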
Jason,
Sounds like a nice feature. Thanks for the contribution! The exact release which we can merge it into depends on available bandwidth, but we can try to do this for 2.0.5.