jujud leaking file handles
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
juju-core | Fix Released | High | Cheryl Jennings |
1.22 | Fix Released | Critical | Cheryl Jennings |
1.23 | Fix Released | High | Cheryl Jennings |
1.24 | Fix Released | High | Cheryl Jennings |
Bug Description
The root cause still needs further investigation. However, we are seeing 250,000 open file handles (according to lsof) for jujud on a production server. It is currently failing to connect to the API server with "too many open file handles".
From what we can tell, machine-1.log begins with "failure to connect" errors caused by too many open file handles. So the likely operational issue is that while the API servers are down, jujud slowly leaks file descriptors; after long enough it can no longer allocate new ones, so every connection attempt fails.
The lsof output shows ~99% of the open file handles stuck in CLOSE_WAIT, which is TCP saying "the remote side has closed the connection, but you have not closed your end yet."
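The symptoms above suggest a connect/login retry loop that drops its reference to the TCP connection without closing it when a post-dial step fails; once the server closes its side, each leaked descriptor sits in CLOSE_WAIT. A minimal Go sketch of that pattern and its fix, assuming hypothetical names (`connectWithRetry`, `handshake` stand in for jujud's real API login path):

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// handshake is a placeholder for the post-dial API login step;
// here it always fails, as it would while the API servers are down.
func handshake(c net.Conn) error {
	return fmt.Errorf("api server unavailable")
}

// connectWithRetry shows the fix: on any failure after a successful
// Dial, the connection must be closed. Omitting conn.Close() on the
// error path leaks one descriptor per attempt, and those sockets end
// up in CLOSE_WAIT once the remote side hangs up.
func connectWithRetry(addr string, attempts int) (net.Conn, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		conn, err := net.Dial("tcp", addr)
		if err != nil {
			lastErr = err
			time.Sleep(10 * time.Millisecond)
			continue
		}
		if err := handshake(conn); err != nil {
			conn.Close() // the missing Close is the leak
			lastErr = err
			continue
		}
		return conn, nil
	}
	return nil, lastErr
}

func main() {
	// Local listener that immediately drops every connection,
	// simulating an unhealthy API server.
	ln, _ := net.Listen("tcp", "127.0.0.1:0")
	defer ln.Close()
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			c.Close()
		}
	}()

	_, err := connectWithRetry(ln.Addr().String(), 3)
	fmt.Println("connect failed:", err) // prints the final retry error
}
```

Without the `conn.Close()` on the handshake-error path, a jujud that retries for hours against a down API server would accumulate descriptors exactly as the lsof output describes.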
Changed in juju-core: status: Triaged → In Progress
Changed in juju-core: status: In Progress → Fix Committed
Changed in juju-core: status: Fix Committed → Fix Released
Appears to be a duplicate of bug 1420057. Taking a look...