Better logging around lockutils
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
oslo.concurrency | Confirmed | High | Unassigned |
Bug Description
From email thread:
http://
Notes from Sean
-------
What occurred to me is that in debugging locking issues what we actually
care about is 2 things semantically:
#1 - tried to get a lock, but someone else has it. Then we know we've
got lock contention.
#2 - something is still holding a lock after some "long" amount of time.
#2 turned out to be a critical bit in understanding one of the worst
recent gate impacting issues.
You can write a tool today that analyzes the logs and shows you these
things. However, I wonder if we could actually do something creative in
the code itself to do this already. I'm curious if the creative use of
Timers might let us emit log messages under the conditions above
(someone with better understanding of python internals needs to speak up
here). Maybe it's too much overhead, but I think it's worth at least
asking the question.
The same issue exists when it comes to processutils, I think. Warning
that a command is still running after 10s might be really handy, because
it turns out that issue #2 was caused by this, and it took quite a bit
of decoding to figure that out.
-------
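A minimal sketch of the Timer idea from the notes above, assuming plain `threading` primitives rather than the actual lockutils or processutils code. The names `warn_if_slow`, `logged_lock`, and `run_logged` are hypothetical and do not exist in oslo.concurrency; the 10s default is taken from the notes. It logs when a lock is already held by someone else (#1), when a lock holder is still running after a threshold (#2), and when a command is still running after a threshold.

```python
import contextlib
import logging
import subprocess
import threading
import time

LOG = logging.getLogger(__name__)


@contextlib.contextmanager
def warn_if_slow(what, after=10.0):
    """Log a warning if the body is still running after `after` seconds."""
    # The timer thread calls LOG.warning(msg, what, after) when it fires.
    timer = threading.Timer(
        after, LOG.warning,
        args=("%s still running after %.1fs", what, after))
    timer.daemon = True
    timer.start()
    try:
        yield
    finally:
        # Body finished (or raised): cancel so a fast body logs nothing.
        timer.cancel()


@contextlib.contextmanager
def logged_lock(lock, name, hold_warn_after=10.0):
    """Acquire `lock`, logging contention (#1) and long holds (#2)."""
    if not lock.acquire(False):
        # Condition #1: a non-blocking attempt failed, so the lock is
        # contended. Log it, then fall back to a blocking acquire.
        LOG.warning("Lock %s is held by someone else, waiting", name)
        start = time.monotonic()
        lock.acquire()
        LOG.warning("Lock %s acquired after waiting %.3fs",
                    name, time.monotonic() - start)
    try:
        # Condition #2: warn if we are still holding the lock later.
        with warn_if_slow("holder of lock %s" % name, hold_warn_after):
            yield
    finally:
        lock.release()


def run_logged(cmd, warn_after=10.0):
    """Run `cmd` (a list of arguments), warning if it runs too long."""
    with warn_if_slow("command %r" % (cmd,), warn_after):
        return subprocess.run(cmd, capture_output=True, check=False)
```

The timer only costs one extra thread per held lock or running command, and it is cancelled on exit, so the overhead question raised above mostly comes down to how often locks are taken. Under eventlet or another green-thread runtime the timer would need that runtime's equivalent; this sketch assumes native threads.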
Changed in oslo.concurrency:
importance: Undecided → High
milestone: none → next-kilo
status: New → Confirmed
Changed in oslo.concurrency:
milestone: 0.1.0 → next-kilo
Changed in oslo.concurrency:
milestone: 0.2.0 → kilo-next
Changed in oslo.concurrency:
milestone: 0.3.0 → next-kilo
Changed in oslo.concurrency:
milestone: next-kilo → none
This is partially addressed by https://review.openstack.org/#/c/130872/