rpm database extremely slow after preupgrade to f16
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
RPM | New | Undecided | Unassigned |
Fedora | Fix Released | Medium | |
In Red Hat Bugzilla #752897, Rob (rob-redhat-bugs) wrote : | #2 |
Created attachment 532898
xz'ed Packages file
I discovered I can attach from a URL...
In Red Hat Bugzilla #752897, Kevin (kevin-redhat-bugs) wrote : | #3 |
Is this the same as https:/
In Red Hat Bugzilla #752897, Rob (rob-redhat-bugs) wrote : | #4 |
Hi Kevin,
I don't think it is the same. This situation appears to be entirely IO bound. Very little CPU time is used whilst it rebuilds the db.
You can observe this for yourself by downloading the Packages file I uploaded and rebuilding it on an F16 machine. (You probably want to rebuild twice...)
Cheers,
Rob
In Red Hat Bugzilla #752897, Panu (panu-redhat-bugs) wrote : | #5 |
Okay, the insane slowness is reproducible for me too (for once), excellent. Will have a look.
Just to get this straight: on F15 things were business as usual, and it only started acting up after the upgrade to F16, right? It's pretty peculiar as the rpm version is exactly the same on both.
Are others seeing this as well, or does this just happen to be an otherwise popular/interesting bug (judging by the number of CC's we have already)?
In Red Hat Bugzilla #752897, Rob (rob-redhat-bugs) wrote : | #6 |
Well, I only noticed the slowdown after the F16 upgrade. I believe it was not slow before that, so I guess whatever operations happened on the database during the upgrade led to the database becoming slow.
I still have another desktop that is running F15. I rebuilt this database there and found that it's also slow there. That machine is running rpm-4.9.
I think it might just be an interesting bug...
In Red Hat Bugzilla #752897, Panu (panu-redhat-bugs) wrote : | #7 |
FWIW the "insane slowness" I referred to occurs when you do 'rpm --rebuilddb' from just the Packages file, triggering index autocreation which in this situation ends up turning fsync on for the new database being built when it shouldn't. The end result is that 'rpmdb --rebuilddb' takes over 20 *minutes* on my systems, when in normal circumstances it completes in a few seconds on the same systems. That's just a dumb (but harmless) bug, will fix.
The 2-5min times for --rebuilddb are quite likely to be something else - when the index/fsync bug is triggered you'll see a "warning: Generating <N> missing indexes" message during --rebuilddb. It's a kinda-known issue that rpmdb performance deteriorates over time (or rather, over a huge number of upgrades etc.); rebuilding the db brings it back to normal, and what you're seeing might be just that: the F15 -> F16 upgrade happened to push it past whatever limit sends performance down the drain.
And the apparently different "a few minutes" case I'm not able to reproduce. Index generation and db rebuild, when done separately, run in the expected "a few seconds" timeframe:
[pmatilai@turre ~]$ mkdir /tmp/slowdb
[pmatilai@turre ~]$ xz -cd ~/Download/
[pmatilai@turre ~]$ time rpmdb --initdb --dbpath /tmp/slowdb/
warning: Generating 12 missing index(es), please wait...
real 0m5.577s
user 0m3.888s
sys 0m0.453s
[pmatilai@turre ~]$ time rpmdb --rebuilddb --dbpath /tmp/slowdb/
real 0m7.630s
user 0m4.907s
sys 0m0.698s
In Red Hat Bugzilla #752897, Panu (panu-redhat-bugs) wrote : | #8 |
Oh, one more thing, so obvious it's easy to forget: was the system actually otherwise idle when you noticed the slowness? After a distro install/upgrade, there can be any number of other io-intensive things like prelink, mlocate and man-db cronjobs running which will severely slow down other io-intensive operations such as --rebuilddb.
Robert, if you still have the full original "slow" db (all of it, not just the Packages file) backed up somewhere, can you try reproducing it with that, after ensuring the system is really idle? Also the timings for separate --initdb and --rebuilddb as shown in comment #6 might be interesting.
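(A minimal sketch of how those separate timings could be collected on a copy of the backed-up db, without touching the live /var/lib/rpm; the paths are placeholders:)
mkdir /tmp/slowdb-copy
cp -a /path/to/rpmdb-backup/* /tmp/slowdb-copy/    # the full db backup, not just Packages
rm -f /tmp/slowdb-copy/__db.*                      # drop the old BDB environment files
time rpmdb --initdb --dbpath /tmp/slowdb-copy/
time rpmdb --rebuilddb --dbpath /tmp/slowdb-copy/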
In Red Hat Bugzilla #752897, Yaroslav (yaroslav-redhat-bugs) wrote : | #9 |
(In reply to comment #6)
My system has this bug.
It's a rawhide system, rpm version 4.9.1.2, yum version 3.4.3.
At first I thought I'd just blame yum, which indeed deserves some blame of its own, but after profiling I found that the slowness is somewhere in RPM itself.
Judging from strace, it seems that the Packages database is being read over and over again via pread(), in chunks of 4 KiB. Given that Packages is 95 MiB in size, that takes a long time.
Increasing mp_mmapsize from 128 to 256 megabytes, and the cache from 64 to 128, hasn't helped a bit.
strace tells me the files are in fact mmap()'ed, and my (puny) C knowledge says that with mmap() you usually just access the file as an array of bytes in memory. However, the many pread() calls issued after the mapping, and the many pwrite() calls issued while the indexes are being written, don't fit my picture of how mmap()'ed files should be read or written. Correct me if I'm wrong.
As for timings, they are horrible. 30 minutes for --initdb step.
[root@think slowpoke]# time rpm --initdb --dbpath /root/slowpoke
warning: Generating 12 missing index(es), please wait...
real 30m17.679s
user 0m31.035s
sys 0m14.904s
I'm afraid of doing --rebuilddb; it really seems that either the nofsync flag is ignored or mmap() is ignored.
In Red Hat Bugzilla #752897, Yaroslav (yaroslav-redhat-bugs) wrote : | #10 |
Created attachment 533295
strace on rpm --initdb --dbpath /run/slowpoke
Notice how many 4K pread()s and pwrite()s there are.
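(A rough way to count those small reads and writes - a sketch assuming strace's syscall-summary mode; the dbpath is the example one from above:)
strace -c -f -e trace=pread64,pwrite64 rpm --initdb --dbpath /run/slowpoke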
In Red Hat Bugzilla #752897, Rob (rob-redhat-bugs) wrote : | #11 |
Hi Panu,
Yes, rebuilding the database from the Packages file takes considerably longer. On two of my machines it took approximately 50 minutes. It's the second rebuild that takes several minutes.
The numbers I quoted in the initial post for rebuilding and simplePkgList() are on a database that has already been rebuilt from just the Packages file.
The system I was using was idle. While the rpm operations were running, there was pretty much nothing else accessing the disk (I was watching it in iotop and gkrellm). There was pretty much nothing else using the CPU either (I was watching in top).
I'll generate some --initdb stats if you like, but since I've replicated the same behaviour on different machines with quite different hardware, I am somewhat sceptical that this'll be useful.
Cheers,
Rob
In Red Hat Bugzilla #752897, Shawn (shawn-redhat-bugs) wrote : | #12 |
In my rawhide box:
[root@segfault ~]# time rpm --rebuilddb -v
real 0m48.277s
On a Fedora 15 -> 16 preupgrade:
[root@coredump ~]# time rpm --rebuilddb -v
real 2m35.502s
In Red Hat Bugzilla #752897, Panu (panu-redhat-bugs) wrote : | #13 |
How much memory do the systems exhibiting this have, and what fs is /var on?
Like I said, on my own systems I can't reproduce anything but the index-creation/
In /usr/lib/rpm/macros change "mp_size=64Mb" to "mp_size=1Mb" (which incidentally is the value it had for years and years), remove the db environment: 'rm -f /var/lib/
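(A minimal sketch of that workaround, assuming you back up the macros file first; the __db.* removal is the environment cleanup referred to above:)
cp /usr/lib/rpm/macros /usr/lib/rpm/macros.bak
sed -i 's/mp_size=64Mb/mp_size=1Mb/' /usr/lib/rpm/macros
rm -f /var/lib/rpm/__db.*    # remove the BDB environment so the new cache size takes effect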
Now to figure out what the heck is actually going on in there...
In Red Hat Bugzilla #752897, Rob (rob-redhat-bugs) wrote : | #14 |
Hi Panu,
My rootfs is ext3, which is what /var is on.
Changing mp_size to 1Mb does indeed bring performance back. (And changing it back to 64Mb takes it away again.)
Cheers,
Rob
In Red Hat Bugzilla #752897, Panu (panu-redhat-bugs) wrote : | #15 |
So it is the same thing that I'm seeing then, with no extra factors like a completely different filesystem (ext3 vs ext4 is close enough), good.
The (dramatic) slowdown is a somewhat peculiar combination of rpmdb size, system RAM and cache size: with "average" rpmdb sizes things are OK with the current 64Mb cache (this value was the same throughout F15) even with less RAM, so for example your laptop's own db didn't exhibit the slowness, but the desktop system's larger db crawled on the laptop.
Rebuilding the db is a special case though and doesn't really benefit from caching, whereas a larger (than 1Mb) cache IS beneficial for other functionality. So it's a question of finding a sweet spot that doesn't kill lower-end systems while still providing a reasonable cache for "normal" operations...
How much memory do your systems have, circa 1GB or so?
In Red Hat Bugzilla #752897, Rob (rob-redhat-bugs) wrote : | #16 |
Ah, apologies. Forgot to write how much RAM these machines have:
* desktop: 6GB (32-bit install)
* laptop: 2GB
Cheers.
In Red Hat Bugzilla #752897, Samuel (samuel-redhat-bugs) wrote : | #17 |
I'm seeing this as well. I was wondering why yum was so slow, but then I figured out it was actually the rpm Packages database. It only started after upgrading to F16. The strange thing I noticed is that a warm start doesn't help at all, as if no caching is being done. Any rpm operation is slow, even if repeated immediately afterwards. My laptop has 5GB of RAM.
That workaround definitely fixes it for me, it's actually speedy now.
In Red Hat Bugzilla #752897, Panu (panu-redhat-bugs) wrote : | #18 |
So it's not limited to systems with "low" memory. I'm curious why I'm not seeing this (except with lower available memory): when upping the cache size I did much of the testing on a laptop with 2GB of memory and was seeing definite improvements with the larger cache.
At this point it's quite obviously best to just back the cache size down to the previous value, which is working so much better for others, at least until the issue is better understood.
In Red Hat Bugzilla #752897, Samuel (samuel-redhat-bugs) wrote : | #19 |
My laptop does have a lot of things loaded most of the time, but still, a 64MB cache seems pretty minimal. I also have two desktops with 2GB RAM that I upgraded and that have this same problem. Did you upgrade yours or was it a clean install? I will see if I can do some testing with that. Although the issue may be related to the number of packages installed as well, as all the computers involved have a lot of packages. The desktops have around 2300 and the laptop nearly 2800.
In the process of getting that information remotely while no one was logged in, I didn't see the problem. The first run of "rpm -qa" took quite a while with heavy disk access, but the second run was fast, basically entirely cached. I wonder if there is something about the disk reads with the larger cache setting that triggers something in the kernel to not cache under significant memory load, which doesn't happen with the smaller cache setting. I will see if I can find the time to do some more testing.
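(One rough way to separate cold-cache from warm-cache behaviour - a sketch assuming root and the standard drop_caches knob, which discards the page cache system-wide, so it's only an approximate test:)
sync
echo 3 > /proc/sys/vm/drop_caches    # drop page cache, dentries and inodes
time rpm -qa > /dev/null             # cold run
time rpm -qa > /dev/null             # warm run; should be much faster if caching works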
In Red Hat Bugzilla #752897, Hin-Tak (hin-tak-redhat-bugs) wrote : | #20 |
*** Bug 754761 has been marked as a duplicate of this bug. ***
In Red Hat Bugzilla #752897, Hin-Tak (hin-tak-redhat-bugs) wrote : | #21 |
As in 754761, I started noticing rpm being very slow in F16 (in F15 it used to be just yum, and I assumed it was the Python overhead), whenever I have a lot of idle browser windows open - i.e. the system is mostly idle but with a lot of idle processes which could be swapped out. I would just expect an interactive process, especially a root-owned one like rpm/yum, to push more of those idle processes into swap in order to proceed.
FWIW, my system is 2GB RAM + 2GB swap. During most of F15 I had probably 3000+ packages installed but I desperately needed texlive 201x so I got it on from one of the fedora packages-
If this bug is affected by a large package list, installing texlive 2011 from http://
(In reply to comment #12)
> In /usr/lib/rpm/macros change "mp_size=64Mb" to "mp_size=1Mb" (which
> incidentally is the value it had for years and years)...
Just out of interest, when did it get changed?
tags: added: fedora rpmdb
In Red Hat Bugzilla #752897, Panu (panu-redhat-bugs) wrote : | #22 |
Seeing this: http://
Just a theory for now, but at any rate that's the kind of thing that would explain why the very same DB configuration worked fine in (early) F15 but things got dirt slow in F16: F15 started with kernel 2.6.38 (but later updates brought in 3.0.x masquerading as 2.6.40 iirc) whereas F16 had 3.x from day one. If somebody wants to try and recompile their kernel to see if it makes any difference, this is what's supposed to fix it: http://
In Red Hat Bugzilla #752897, Stephen (stephen-redhat-bugs) wrote : | #23 |
I don't think it's a disk performance issue. It's the sheer volume of disk activity that occurs.
Without the mp_size change, I'm seeing vast amounts of disk IO from rpm even for simple queries like "yum list available ...". It sustains writes (yes, writes!) of between 3MB and 15MB per second for 90 seconds... that's an insane amount of write IO for a read-only query of the bdb, and I think it's a red herring to be looking for kernel reasons why that amount of IO is slow. rpm/bdb really shouldn't be generating that much write traffic in the first place.
btw, "iostat" is a good way to watch the traffic resulting here. I usually use "iostat 2" to get it updated every couple of seconds.
With mp_size=1MB as in comment #12, the problem disappears completely. With the default /usr/lib/rpm/macros, rpm is almost unusable on every system I've got which has been updated from older versions of Fedora.
In Red Hat Bugzilla #752897, Stephen (stephen-redhat-bugs) wrote : | #24 |
btw, is there any reason not to revert mp_size while we work out what the underlying problem is? This is a pretty major issue for those of us it affects.
In Red Hat Bugzilla #752897, Stephen (stephen-redhat-bugs) wrote : | #25 |
One more data point: in slow mode, the write IO all seems to go to the bdb internal journal files. (This was done while executing a simple yum query, "yum list available foobar".) The most recently modified files in /var/lib/rpm look like:
[rpm]# ls -lath
total 280M
-rw-r--r--. 1 root root 24K Feb 7 11:20 __db.001
-rw-r--r--. 1 root root 3.7M Feb 7 11:20 __db.002
-rw-r--r--. 1 root root 81M Feb 7 11:20 __db.003
-rw-r--r--. 1 root root 536K Feb 7 11:20 __db.004
-rw-r--r--. 1 rpm rpm 6.5M Feb 7 10:47 Dirnames
...
Reverting mp_size to 1MB by itself doesn't fix this; the results look the same. But after clearing the __db* files and rerunning, I still get write traffic, though much less, and the db files look like:
[rpm]# ls -lath
total 198M
-rw-r--r--. 1 root root 24K Feb 7 11:27 __db.001
-rw-r--r--. 1 root root 240K Feb 7 11:27 __db.002
-rw-r--r--. 1 root root 1.3M Feb 7 11:27 __db.003
-rw-r--r--. 1 root root 536K Feb 7 11:27 __db.004
-rw-r--r--. 1 rpm rpm 6.5M Feb 7 10:47 Dirnames
...
Curiously, the __db.003 file (which I think is the bdb mmap cache file) has shrunk by almost exactly the same factor of 64 by which we shrank mp_size. But the actual amount of disk write according to iostat has shrunk to just a few kB.
So the real question is why we're generating so much traffic to the mmap file; and if we can't avoid that, then we should probably just minimise the issue by reverting the size of that file back to its older value.
In Red Hat Bugzilla #752897, Met (met-redhat-bugs) wrote : | #26 |
I can confirm this bug on both of my F16 installations. One was upgraded with preupgrade, the other with 'yum --releasever=16 upgrade'.
In both cases I get huge amounts of IO; it takes yum up to 10 minutes to list the packages to be upgraded, and then another 10 minutes for yum to finish after `cleaning' the last package. On the system updated with yum (not preupgrade) the mp_size=1MB trick did the job (I haven't tried it on the other yet).
**I think it is a high-priority bug**
In Red Hat Bugzilla #752897, Panu (panu-redhat-bugs) wrote : | #27 |
The "mp" (for "memory pool") thing is BDB jargon for its internal (shared) cache, through which all db IO goes. That's why you get a lot of write-IO on read-only queries too, especially after reboot which erases the whole BDB environment including the cache.
On my four (F16) systems, mp_size=64Mb vs 1Mb makes no difference whatsoever to the --initdb times (a dumb benchmark for caching-related issues, as caching is of no benefit for this operation, but it serves as a simple reproducer for this issue). However, on a VM with just 512MB it makes a huge difference; quite obviously something horribly stupid happens with the cache when memory is on the low side.
And it also does have to do with the kernel. With mp_size=1Mb there's no measurable difference, but with 64Mb the difference between the original F15 kernel and the current F16 one is quite drastic. Using the Packages file from comment #1 as a test case, here are the timings on the virtual machine with 512MB of memory; the magnitude of the difference is easily reproducible here:
[root@localhost slowdb]# uname -r
2.6.38.
[root@localhost slowdb]# time rpmdb --initdb --dbpath /tmp/slowdb/
warning: Generating 12 missing index(es), please wait...
real 2m39.459s
user 0m6.173s
sys 0m2.478s
[root@localhost slowdb]# uname -r
3.2.3-2.fc16.x86_64
[root@localhost slowdb]# time rpmdb --initdb --dbpath /tmp/slowdb/
warning: Generating 12 missing index(es), please wait...
real 10m34.010s
user 0m5.728s
sys 0m2.117s
FWIW, mp_size=1Mb does the deed in roughly 8 seconds on the same VM on both kernels. This is the kind of time I get on my laptop running this VM, regardless of mp_size setting.
So, obviously the larger cache is severely hurting rpm performance in some situations, much more than it's winning in the best scenarios. I'll switch it back to the former default for now; finding an optimum is going to take additional testing and tuning on different types of systems. But there's quite a regression in kernel performance as well; whether it's the plugging issue I referred to in comment #21 or something else, I've no clue.
In Red Hat Bugzilla #752897, Panu (panu-redhat-bugs) wrote : | #28 |
Oh, and the timings were produced on the same F16 installation, the only difference being booting with the F15 vs. the F16 kernel.
In Red Hat Bugzilla #752897, Panu (panu-redhat-bugs) wrote : | #29 |
Dunno how much the Fedora kernel version can be trusted these days, but testing with a sampling of kernels grabbed from koji, this doesn't seem to start at 2.6.39 but at 2.6.40, and it gradually gets worse. This is the same test case as the numbers above; 3.2.3 was retested out of curiosity and produced a somewhat less horrible figure this time, but is still by far the slowest of them all:
[root@localhost slowdb]# uname -r
2.6.38.
[root@localhost slowdb]# time rpmdb --initdb --dbpath /tmp/slowdb/
warning: Generating 12 missing index(es), please wait...
real 2m39.459s
user 0m6.173s
sys 0m2.478s
[root@localhost slowdb]# uname -r
2.6.39-
[root@localhost slowdb]# time rpmdb --initdb --dbpath /tmp/slowdb/
warning: Generating 12 missing index(es), please wait...
real 2m24.814s
user 0m5.939s
sys 0m2.245s
[root@localhost slowdb]# uname -r
2.6.40-
[root@localhost slowdb]# time rpmdb --initdb --dbpath /tmp/slowdb/
warning: Generating 12 missing index(es), please wait...
real 6m26.406s
user 0m5.772s
sys 0m2.152s
[root@localhost slowdb]# uname -r
2.6.41.
[root@localhost slowdb]# time rpmdb --initdb --dbpath /tmp/slowdb/
warning: Generating 12 missing index(es), please wait...
real 5m42.944s
user 0m5.796s
sys 0m2.164s
[root@localhost slowdb]# uname -r
3.2.3-2.fc16.x86_64
[root@localhost slowdb]# time rpmdb --initdb --dbpath /tmp/slowdb/
warning: Generating 12 missing index(es), please wait...
real 7m53.031s
user 0m5.750s
sys 0m2.217s
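(For anyone wanting to repeat the comparison, a rough sketch of pulling an older kernel from koji; the build NVR below is just a hypothetical example, substitute whichever version you want to test:)
koji download-build --arch=x86_64 kernel-2.6.38.8-35.fc15
rpm -ivh kernel-2.6.38.8-35.fc15.x86_64.rpm    # installs alongside the current kernel
# reboot into the older kernel and re-run the rpmdb --initdb timing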
In Red Hat Bugzilla #752897, Fedora (fedora-redhat-bugs) wrote : | #30 |
rpm-4.9.1.2-5.fc16 has been submitted as an update for Fedora 16.
https:/
In Red Hat Bugzilla #752897, Fedora (fedora-redhat-bugs) wrote : | #31 |
rpm-4.9.1.2-4.fc15 has been submitted as an update for Fedora 15.
https:/
In Red Hat Bugzilla #752897, Fedora (fedora-redhat-bugs) wrote : | #32 |
Package rpm-4.9.1.2-5.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=
as soon as you are able to.
Please go to the following url:
https:/
then log in and leave karma (feedback).
In Red Hat Bugzilla #752897, Christoph (christoph-redhat-bugs) wrote : | #33 |
I have been suffering from this for ages and tried a lot, but this update finally fixes my problems. Transactions are now going fast again. Thanks!
In Red Hat Bugzilla #752897, Fedora (fedora-redhat-bugs) wrote : | #34 |
rpm-4.9.1.2-5.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
In Red Hat Bugzilla #752897, Fedora (fedora-redhat-bugs) wrote : | #35 |
rpm-4.9.1.2-4.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
In Red Hat Bugzilla #752897, Mike (mike-redhat-bugs) wrote : | #36 |
*** Bug 811544 has been marked as a duplicate of this bug. ***
Changed in fedora:
importance: Unknown → Medium
status: Unknown → Fix Released
Description of problem:
Performing normal operations in yum took a very long time after upgrading my desktop to f16. After some prodding directed by the fine fellows in #yum, it was concluded that this was a problem with my rpm database. It seems that interacting with the Packages database takes a lot of time.
The first thing that I tried was rebuilding the rpm database (following the "oh, something's wrong with RPM, better rebuild its database" strategy), after deleting the __db* files. Rebuilding the database takes a long time:
[rob@zarniwoop ~(master)]$ time sudo rpm --rebuilddb
real 5m0.445s
user 0m11.372s
sys 0m3.372s
Also:
[root@zarniwoop rpm]$ time python -c 'import yum; yum.YumBase().rpmdb.simplePkgList()'
Loaded plugins: auto-update-debuginfo, fastestmirror, presto, refresh-packagekit
real 3m22.227s
user 0m1.779s
sys 0m1.097s
My desktop has 3237 packages installed.
One suspicion was the disks in my desktop; however, they appear to be completely ship-shape. There's no funk in dmesg, I can write to a file in /tmp (same FS as /var/lib/rpm) at 80MB/sec using dd, and parted reports my partitions are aligned correctly. filefrag reports that the Packages file is in 55 chunks, which doesn't seem crazy.
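(The checks referred to above were roughly of this kind - a sketch; the test file path is just an example, and conv=fdatasync makes dd flush to disk so the reported rate is meaningful:)
dd if=/dev/zero of=/tmp/ddtest bs=1M count=512 conv=fdatasync    # sequential write speed
filefrag /var/lib/rpm/Packages                                   # fragment count of the Packages file
rm -f /tmp/ddtest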
skvidal suggested that I try copying my desktop's rpm database over to my laptop and rebuilding it there, to see if it was a hardware problem. My laptop also runs F16 and does not exhibit this slowness with its own rpm database (it takes 20s to rebuild and has 1907 packages installed). My laptop also displays slowness when rebuilding the rpm database from my desktop:
[rob@megadodo rpm]$ time rpm --root=/tmp/desktop-chroot/ --rebuilddb
real 2m2.058s
user 0m12.884s
sys 0m3.166s
For comparison, my laptop rebuilding its _own_ rpm database:
[rob@megadodo rpm]$ time sudo rpm --rebuilddb
real 0m19.878s
user 0m6.495s
sys 0m1.132s
So it appears that the problem is with the database itself, rather than the machine it is running on.
Other things of potential interest:
* `package-cleanup --dupes` reports no duplicates after churning with RPM for a long time...
* Upon observing that I had a lot of gpg-pubkey packages installed (http://pastebin.com/YXtTvNMv) on my desktop, skvidal suggested I remove them and try rebuilding. This didn't help. The Packages file below is from before I removed all of them.
You should find a copy of my Packages file here: https://xgoat.com/share/rpm-slowdb/Packages.xz (28MB)
Version numbers:
rpm: rpm-4.9.1.2-1.fc16.i686
uname: Linux zarniwoop.blob 3.1.0-7.fc16.i686.PAE #1 SMP Tue Nov 1 20:53:45 UTC 2011 i686 i686 i386 GNU/Linux
How reproducible: Very!