Percona Server moved to https://jira.percona.com/projects/PS

Bug #892831
Comment #2

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2011-11-21: Re: [Bug 892831] Re: Fallocate support in innodb

Regarding unwritten extents, I had a doubt about that as well*. However, after
discussing it with XFS developers, I understood that since unwritten extents
became the default years ago, the performance impact of converting unwritten
extents to written ones is negligible now. It is far outweighed by the benefits
of fallocate and is, of course, better than writing zeroes.

Regarding fallocate, I went with fallocate instead of the posix variant because
posix_fallocate silently falls back to the old legacy behavior on unsupported
systems, which may not be desirable.
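
To illustrate the difference, here is a minimal sketch (not code from the
patch) of calling the Linux fallocate() directly and reporting the
unsupported case to the caller instead of hiding it. The helper name
preallocate_path and its return codes are hypothetical; treating EOPNOTSUPP
and ENOSYS as "no native support" is an assumption about what a caller would
want to distinguish.

```c
#define _GNU_SOURCE            /* exposes the Linux-specific fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
 * Returns  0 if the range was reserved by the filesystem,
 *          1 if the filesystem/kernel has no native fallocate support
 *            (the caller decides whether to write zeroes instead),
 *         -1 on any other error (errno is preserved). */
int preallocate_path(const char *path, off_t size)
{
    assert(path != NULL && size > 0);

    int fd = open(path, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    int rc = 0;
    /* mode 0: allocate blocks for [0, size) and extend the file size. */
    if (fallocate(fd, 0, 0, size) != 0) {
        if (errno == EOPNOTSUPP || errno == ENOSYS) {
            /* Unlike posix_fallocate, nothing is emulated here:
             * the lack of support is visible to the caller. */
            fprintf(stderr, "%s: no native fallocate support\n", path);
            rc = 1;
        } else {
            rc = -1;
        }
    }

    int saved_errno = errno;   /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return rc;
}
```

With posix_fallocate the second case never surfaces: glibc emulates the
allocation by writing into the file, so the call succeeds but can take as
long as the zero-filling it was meant to replace.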

* The doubt was that I wanted to test XFS-specific ioctls like XFS_IOC_RESVSP64
and XFS_IOC_ZERO_RANGE, but their use was not encouraged since fallocate
provides a better interface and a more stable API (and fallocate internally
calls these).

I also asked about this passage in the xfsctl man page:
"
If the XFS filesystem is configured to flag unwritten file extents, performance
will be negatively affected when writing to preallocated space, since extra
filesystem transactions are required to convert extent flags on the range of
the file written."

It seems this statement no longer applies and will be removed.

* On Mon, Nov 21, 2011 at 09:21:31AM -0000, Stewart Smith <email address hidden> wrote:
>We may not go with fallocate() all the time - as (at least for XFS,
>which is all anybody cares about) you then get unwritten extents, which
>means that as you fill up the datafile you're getting filesystem
>metadata log traffic for converting the unwritten extents to written
>extents, potentially having a performance impact that one wouldn't
>expect.
>
>(we'd use posix_fallocate() instead of fallocate() as the posix version
>is portable)
>
>--
>You received this bug notification because you are subscribed to the bug
>report.
>https://bugs.launchpad.net/bugs/892831
>
>Title:
> Fallocate support in innodb
>
>Status in Percona Server with XtraDB:
> Triaged
>
>Bug description:
> Currently innodb physically writes zeroes to file for --
>
> innodb table space creation (ibdata), log file creation(ib_logfile*),
> innodb single tablespace creation (ibd), extension of table space
> files (both ibdata and ibd)
>
> --- all of which make the process really slow. So I decided to add
> fallocate support to all of the above. Even though benefit should come
> from fast creation of initial files*, most benefit will be visible in
> extension, since it can actively affect the queries and also adds
> overhead with mutexes etc. Fallocate is by far a O(1) operation. I
> have tested it on XFS/ext4 filesystem on my box for small sizes and
> results are really good. But needs to be benchmarked on better
> systems.
>
> The code is here (commits from 3547 to 3550) --
> https://code.launchpad.net/~raghavendra-prabhu/+junk/mysql-server-fallocate
> and is based on latest mysql server tip from here --
> bazaar.launchpad.net/%2Bbranch/mysql-server/ . It needs to be built
> with -DWITH_FALLOCATE=ON to cmake, system should also support it
> (added a feature test for that).
>
> * Earlier, I have seen a case of innodb ibdata file being set to 2-3
> TB and that physical writing of zeroes taking hours even on RAID, so
> on a downtime or fresh boxes adding time significantly.
>
> PS: The only caveat so far is that on old ext4 (<= 2009) systems,
> Direct I/O with fallocate falls back to buffered IO. XFS doesn't have
> any such issues.
>
>To manage notifications about this bug go to:
>https://bugs.launchpad.net/percona-server/+bug/892831/+subscriptions
>
Regards,
--------------------------
Raghavendra D Prabhu (TZ: GMT + 530)
Call: +91 96118 00062
mailto:<email address hidden>
Percona, Inc. - http://www.percona.com / Blog: http://www.mysqlperformanceblog.com/
Skype: percona.raghavendrap
GPG: 0xD72BE977

Percona Live MySQL Conference April 10-12 Santa Clara
http://www.percona.com/live/mysql-conference-2012/
