local provider machine agent uses 100% CPU after host reboot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
pyjuju | Fix Released | High | Benjamin Saller |
0.5 | Triaged | High | Unassigned |
juju (Ubuntu) | Fix Released | High | Unassigned |
Precise | Won't Fix | High | Unassigned |
Bug Description
The juju program uses 100% of a CPU if the host is rebooted while running an LXC environment. The CPU consumption does not start immediately after the reboot; it begins within a day or two of regular system use.
Destroying the environment kills the processes.
Here is the runaway process command:
/usr/bin/python -m juju.agents.machine --nodaemon --logfile /tmp/local-
Steps to reproduce:
1. Create an environment with the local (LXC) provider
2. Set up wordpress+mysql as per the juju tutorial
3. Reboot the host system
4. Use the system for a day
What happens: juju runs away, taking 100% of a CPU with it
I am using juju version 0.5+bzr537-
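Because the CPU consumption only appears after a day or two, it can help to sample the agent's CPU usage periodically rather than watch for it by hand. The following is a minimal sketch, not part of juju: it assumes a Linux host with `/proc`, and the function names (`parse_cpu_ticks`, `read_cpu_ticks`, `cpu_percent`) are hypothetical helpers introduced for illustration.

```python
import os


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split only the part after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime is field 14 and stime is field 15.
    return int(rest[11]) + int(rest[12])


def read_cpu_ticks(pid):
    """Read the accumulated CPU ticks of a live process (Linux only)."""
    with open("/proc/%d/stat" % pid) as stat_file:
        return parse_cpu_ticks(stat_file.read())


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s=None):
    """Convert a tick delta over interval_s seconds into percent of one CPU."""
    if ticks_per_s is None:
        ticks_per_s = os.sysconf("SC_CLK_TCK")
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)
```

Calling `read_cpu_ticks` on the machine agent's PID twice, a few seconds apart, and passing both values to `cpu_percent` gives a figure close to 100 when the process is spinning on one core.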
Related branches
- Kapil Thangavelu (community): Needs Fixing
Diff: 634 lines (+333/-79)
12 files modified
juju/agents/machine.py (+1/-1)
juju/control/destroy_environment.py (+1/-1)
juju/lib/lxc/__init__.py (+1/-1)
juju/lib/lxc/data/juju-create (+2/-8)
juju/lib/service.py (+148/-0)
juju/lib/tests/test_service.py (+100/-0)
juju/machine/unit.py (+1/-0)
juju/providers/local/__init__.py (+21/-8)
juju/providers/local/agent.py (+23/-6)
juju/providers/local/tests/test_agent.py (+15/-51)
juju/providers/local/tests/test_provider.py (+18/-1)
setup.py (+2/-2)
Changed in juju: | |
milestone: | none → galapagos |
assignee: | nobody → Benjamin Saller (bcsaller) |
Changed in juju: | |
milestone: | galapagos → honolulu |
summary: |
- Juju uses 100% CPU after host reboot
+ local provider machine agent uses 100% CPU after host reboot |
Changed in juju: | |
status: | Triaged → In Progress |
Changed in juju (Ubuntu): | |
importance: | Undecided → High |
Changed in juju (Ubuntu Precise): | |
importance: | Undecided → High |
Changed in juju (Ubuntu): | |
status: | New → Triaged |
Changed in juju (Ubuntu Precise): | |
status: | New → Triaged |
milestone: | none → ubuntu-12.04.1 |
Changed in juju (Ubuntu Precise): | |
milestone: | ubuntu-12.04.1 → none |
Changed in juju: | |
status: | Fix Committed → Fix Released |
Hi Maris. When I try this with the latest juju on precise, I don't get a runaway process. Mine just reconnects every few seconds...
[pid 1090] poll([{fd=4, events=POLLIN}, {fd=10, events=POLLIN|POLLOUT}], 2, 3333) = 1 ([{fd=10, revents=POLLIN|POLLOUT|POLLERR|POLLHUP}])
[pid 1090] getsockopt(10, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
[pid 1090] close(10) = 0
[pid 1090] poll([{fd=4, events=POLLIN}], 1, 3333) = 0 (Timeout)
[pid 1090] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 10
[pid 1090] setsockopt(10, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 1090] fcntl(10, F_GETFL) = 0x2 (flags O_RDWR)
[pid 1090] fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 1090] connect(10, {sa_family=AF_INET, sin_port=htons(55429), sin_addr=inet_addr("192.168.122.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 1090] poll([{fd=4, events=POLLIN}, {fd=10, events=POLLIN|POLLOUT}], 2, 3333) = 1 ([{fd=10, revents=POLLIN|POLLOUT|POLLERR|POLLHUP}])
[pid 1090] getsockopt(10, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
[pid 1090] close(10) = 0
[pid 1090] poll([{fd=4, events=POLLIN}], 1, 3333) = 0 (Timeout)
[pid 1090] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 10
[pid 1090] setsockopt(10, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 1090] fcntl(10, F_GETFL) = 0x2 (flags O_RDWR)
[pid 1090] fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 1090] connect(10, {sa_family=AF_INET, sin_port=htons(55429), sin_addr=inet_addr("192.168.122.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 1090] poll([{fd=4, events=POLLIN}, {fd=10, events=POLLIN|POLLOUT}], 2, 3333) = 1 ([{fd=10, revents=POLLIN|POLLOUT|POLLERR|POLLHUP}])
[pid 1090] getsockopt(10, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
[pid 1090] close(10) = 0
[pid 1090] poll([{fd=4, events=POLLIN}], 1, 3333
Can you strace the runaway process for a while with
strace -f -o /tmp/strace-machine-agent.log -p $PID_OF_MACHINE_AGENT
And then attach that as well?
Thanks!
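For readers unfamiliar with the syscalls in the trace above: each cycle is a non-blocking TCP connect whose result is collected with `getsockopt(SO_ERROR)`. The value 111 is `ECONNREFUSED` on Linux, meaning nothing is listening on the target port, and the agent then waits out a 3333 ms poll timeout before trying again. The sketch below reproduces one such cycle in Python so the pattern can be observed in isolation. It is an illustration only, not juju's own connection code, and `probe_once` is a hypothetical name.

```python
import errno
import select
import socket


def probe_once(host, port, timeout_ms=3333):
    """Try one non-blocking connect; return 0 on success or an errno value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        sock.setblocking(False)
        rc = sock.connect_ex((host, port))
        if rc not in (0, errno.EINPROGRESS):
            return rc  # the connect failed immediately
        # Wait until the socket reports an outcome, as poll() does in the trace.
        poller = select.poll()
        poller.register(sock.fileno(), select.POLLIN | select.POLLOUT)
        if not poller.poll(timeout_ms):
            return errno.ETIMEDOUT
        # SO_ERROR holds the deferred result of the connect attempt.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        sock.close()
```

A result of `errno.ECONNREFUSED` every few seconds matches the healthy retry behaviour in this comment. A runaway process would instead show syscalls repeating with no timeout between them, which is what the requested strace log should reveal.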