Fallocate support in InnoDB
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Server moved to https://jira.percona.com/projects/PS | Status tracked in 5.7 | | |
5.1 | Won't Fix | Wishlist | Unassigned |
5.5 | Triaged | Wishlist | Unassigned |
5.6 | Triaged | Wishlist | Unassigned |
5.7 | Fix Released | Wishlist | Unassigned |
Bug Description
Currently innodb physically writes zeroes to file for --
innodb table space creation (ibdata), log file creation(
--- all of which make the process really slow. So I decided to add fallocate support to all of the above. Fast creation of the initial files is one benefit*, but the larger gain should show up in file extension, because extension happens while the server is running: it can directly affect queries and it adds overhead from mutexes etc. Fallocate is effectively an O(1) operation, since it reserves the blocks without writing the data. I have tested it on XFS and ext4 filesystems on my box for small sizes and the results are really good, but it needs to be benchmarked on better systems.
The code is here (commits from 3547 to 3550) -- https:/
* Earlier, I saw a case where an InnoDB ibdata file was set to 2-3 TB and physically writing the zeroes took hours even on RAID, which adds significant time during a downtime or when setting up fresh boxes.
PS: The only caveat so far is that on old ext4 (<= 2009) systems, direct I/O with fallocate falls back to buffered I/O. XFS doesn't have any such issue.
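To illustrate the difference described above, here is a minimal sketch in Python rather than the InnoDB C code, assuming a Linux box where `os.posix_fallocate` is available. The function names `create_by_writing_zeroes` and `create_with_fallocate` are hypothetical and do not come from the patch; they only contrast physically writing zeroes with asking the filesystem to reserve the space.

```python
import os
import tempfile

CHUNK = 1024 * 1024  # zero-fill in 1 MiB pieces


def create_by_writing_zeroes(path, size):
    """Create `path` and physically write `size` zero bytes to it."""
    zeroes = bytes(CHUNK)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(zeroes[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the zeroes actually reach the disk


def create_with_fallocate(path, size):
    """Create `path` and reserve `size` bytes via posix_fallocate.

    On filesystems with native support (e.g. XFS, ext4) no data blocks
    are written; the C library may emulate the call elsewhere.
    """
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)  # offset 0, length `size`
    finally:
        os.close(fd)


if __name__ == "__main__":
    size = 8 * 1024 * 1024  # small size, for illustration only
    with tempfile.TemporaryDirectory() as d:
        zeroed = os.path.join(d, "zeroed")
        reserved = os.path.join(d, "fallocated")
        create_by_writing_zeroes(zeroed, size)
        create_with_fallocate(reserved, size)
        print(os.path.getsize(zeroed), os.path.getsize(reserved))
```

Both files end up with the same size and both read back as zeroes. The difference is only in how much I/O it takes to get there, which is what matters at multi-terabyte sizes.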
Changed in percona-server:
importance: Undecided → Wishlist
status: New → Triaged
We may not go with fallocate() all the time, because (at least for XFS, which is all anybody cares about) you then get unwritten extents. As you fill up the datafile, the filesystem generates metadata log traffic to convert the unwritten extents to written extents, which could have a performance impact that one wouldn't expect.
(we'd use posix_fallocate() instead of fallocate() as the posix version is portable)
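One way to keep both options open is to make the preallocation method a switch, with zero-filling as the fallback. The following is only a sketch in Python under the assumption of a Linux system with `os.posix_fallocate`; `extend_file` and `use_fallocate` are hypothetical names, not anything from the server code.

```python
import os
import tempfile


def extend_file(fd, new_size, use_fallocate=True):
    """Grow the file behind `fd` to `new_size` bytes.

    Returns which method was used: "noop", "fallocate" or "zero-fill".
    """
    old_size = os.fstat(fd).st_size
    if new_size <= old_size:
        return "noop"

    if use_fallocate:
        try:
            # Reserve only the new tail: offset=old_size, len=growth.
            os.posix_fallocate(fd, old_size, new_size - old_size)
            return "fallocate"
        except OSError:
            pass  # not supported here; fall through to zero-fill

    # Fallback (or explicit choice): physically write zeroes.
    os.lseek(fd, old_size, os.SEEK_SET)
    remaining = new_size - old_size
    chunk = bytes(min(remaining, 1024 * 1024))
    while remaining > 0:
        written = os.write(fd, chunk[:remaining])
        remaining -= written
    os.fsync(fd)
    return "zero-fill"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fd = os.open(os.path.join(d, "datafile"), os.O_CREAT | os.O_RDWR, 0o644)
        try:
            print(extend_file(fd, 4 * 1024 * 1024, use_fallocate=False))
            print(extend_file(fd, 8 * 1024 * 1024))
        finally:
            os.close(fd)
```

With `use_fallocate=False` the extents are written up front, so there is no unwritten-to-written conversion later. With it enabled, extension is cheap but the conversion cost is deferred to the first real write.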