LockError on blob cache
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
KARL3 | Won't Fix | Medium | Tres Seaver |
Bug Description
Wed Aug 15 06:21:30 2012 ERROR karl Error locking file /srv/osfkarl/
Details
Error locking file /srv/osfkarl/
Traceback (most recent call last):
File "/srv/osfkarl/
_lock_file(fp)
File "/srv/osfkarl/
raise LockError("Couldn't lock %r" % file.name)
LockError: Couldn't lock '/srv/osfkarl/
Chris wrote:
Whatever it is seems to be done now. There are a lot more locking
errors in the log but they have to do with the lock used by
'check_size' to prune the blob_cache. I think what's going on,
though, is we're having multiple processes trying to use the same blob
cache and we probably shouldn't really be doing that. It worked
earlier when it was all on one box--then all processes used the blob
files used by the database itself and I guess it was anticipated that
multiple processes would be accessing that pile. With the server over
on another box we use a non-shared blob_cache, meaning the database
server has its pile and the client just has a smaller local cache. It
appears, though, that that cache wasn't intended to be used by more
than one process at a time, so you occasionally see lock errors,
although this is the first time I've seen one for something besides
the check_size thing, which is itself pretty ancillary.
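For anyone reading along, here is a minimal sketch of the kind of contention described above. It is not the code from the traceback: the `lock_file` helper, the `check_size.lock` file name, and the temporary directory are all made up for illustration, and it assumes a Unix box where `fcntl.flock` is available. It only shows that a second exclusive, non-blocking lock attempt on the same file fails while the first holder is still active.

```python
import fcntl
import os
import tempfile


class LockError(Exception):
    """Raised when the lock file is already held by someone else."""


def lock_file(path):
    """Open path and take an exclusive, non-blocking lock on it.

    Hypothetical helper for illustration only.
    """
    fp = open(path, "a+")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fp.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        fp.close()
        raise LockError("Couldn't lock %r" % path)
    return fp


def demo():
    # Stand-in for a cache directory shared by two users of the lock.
    cache_dir = tempfile.mkdtemp(prefix="blob_cache_demo_")
    lock_path = os.path.join(cache_dir, "check_size.lock")  # assumed name
    first = lock_file(lock_path)
    try:
        try:
            # A second, independent open of the same file cannot get the lock.
            lock_file(lock_path)
        except LockError:
            return "second locker failed"
        return "second locker succeeded"
    finally:
        first.close()  # closing the file releases the lock
        os.remove(lock_path)
        os.rmdir(cache_dir)
```

With two real processes pointed at one cache directory the effect would be the same: whichever one arrives second gets the error rather than waiting.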
So the long and the short of it is, to be entirely correct in our
usage we probably need each process (both webapp procs, mailin,
gsa_sync, etc.) to use its own blob cache, so we can avoid locking
errors. And of course, we'll need to make sure the blob_caches are
cleaned up when a process exits, else they'll stack up and eat our
disk.
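A rough sketch of what "one blob cache per process, cleaned up on exit" could look like. This is only an assumption about the approach, not anything that exists in KARL: `make_private_blob_cache`, the base directory, and the naming scheme are hypothetical, and how the resulting path gets passed to the storage configuration is deliberately left out.

```python
import atexit
import os
import shutil
import tempfile


def make_private_blob_cache(base_dir, process_name):
    """Create a cache directory used by this process only.

    Hypothetical helper: the directory name combines the process name
    and pid so leftovers from a crashed process are easy to identify.
    """
    os.makedirs(base_dir, exist_ok=True)
    prefix = "%s-%d-" % (process_name, os.getpid())
    cache_dir = tempfile.mkdtemp(prefix=prefix, dir=base_dir)
    # Remove the directory on normal interpreter exit so the caches
    # do not stack up on disk.
    atexit.register(shutil.rmtree, cache_dir, ignore_errors=True)
    return cache_dir
```

One caveat with this design: `atexit` handlers do not run if the process is killed outright (for example with SIGKILL), so some periodic sweep of stale directories would probably still be needed.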
Fortunately, this seems to be a pretty rare occurrence, so it's
probably not super time critical. We have, after all, been using
non-shared blob caches since our switch to gocept.
Ok. One thing I notice is that there isn't a stack trace, which means
it isn't being reported by our exception capturing stuff. It's just
something, somewhere, using log.error(...). This means that whatever
is going on it may not have an end user impact. Regardless, though,
it's obviously annoying if it's going to be tripping the alarm.
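To illustrate the distinction being drawn here, a small sketch using the standard `logging` module: a plain `log.error(...)` call records only the message, while `log.exception(...)` inside an `except` block also records the traceback. The logger name and messages are invented for the example; this is not the code that produced the log line in this report.

```python
import io
import logging


def demo_logging():
    """Return what log.error and log.exception each write for one error."""
    stream = io.StringIO()
    log = logging.getLogger("lock_demo")  # hypothetical logger name
    log.setLevel(logging.ERROR)
    log.propagate = False
    handler = logging.StreamHandler(stream)
    log.addHandler(handler)
    try:
        try:
            raise RuntimeError("Couldn't lock file")
        except RuntimeError:
            # Message only, no stack trace attached.
            log.error("Error locking file")
            plain = stream.getvalue()
            # Same message, plus the traceback of the active exception.
            log.exception("Error locking file")
            with_tb = stream.getvalue()[len(plain):]
    finally:
        log.removeHandler(handler)
    return plain, with_tb
```

So a message-only entry in the log suggests the error was caught and reported by the code that hit it, rather than propagating up to whatever captures unhandled exceptions.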
Changed in karl3: | |
milestone: | none → m117 |
Changed in karl3: | |
milestone: | m117 → m118 |
Changed in karl3: | |
milestone: | m118 → m119 |
Changed in karl3: | |
milestone: | m120 → m122 |
Changed in karl3: | |
milestone: | m122 → m123 |
Changed in karl3: | |
milestone: | m125 → m126 |
Changed in karl3: | |
status: | Confirmed → Won't Fix |
We need to slow everything down until we get clarity on the Q4 budget.