LXD machine-status stays in "allocating".

Bug #1558061 reported by John A Meinel
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
High
John A Meinel

Bug Description

The recent change I added to report progress while copying LXD images is working, but it is leaving the machine in status: allocated:
machines:
  "0":
    juju-status:
      current: started
      since: 16 Mar 2016 16:36:52+04:00
      version: 2.0-beta3.1
    dns-name: 54.78.131.112
    instance-id: i-34ad00be
    machine-status:
      current: running
      message: running
      since: 16 Mar 2016 16:36:52+04:00
    series: trusty
    containers:
      0/lxd/0:
        juju-status:
          current: started
          since: 16 Mar 2016 16:40:52+04:00
          version: 2.0-beta3.1
        dns-name: 10.0.3.184
        instance-id: juju-machine-0-lxd-0
        machine-status:
          current: allocating
          message: 'copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases:
            100%'
          since: 16 Mar 2016 16:39:59+04:00
        series: trusty

Notice that the "juju-status" is started, but the "machine-status" is still allocating.

Tags: lxd 2.0-count
Revision history for this message
John A Meinel (jameinel) wrote :

So I worked out *why* this is happening, but I'm not sure how to fix it yet.
Specifically, CopyImage starts a copy operation and uses another goroutine listening on a websocket for events from LXD indicating operation progress.

However, we are still getting those events even after CopyImage has returned. Which doesn't make a lot of sense. But I see this in the log files:

Revision history for this message
John A Meinel (jameinel) wrote :

This is a filtered version of machine-0.log with TRACE logging of API calls.

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (3.9 KiB)

for just some key points:

start of copying

2016-03-17 12:16:00 INFO juju.tools.lxdclient client_image.go:82 found image from https://cloud-images.ubuntu.com/releases for trusty = 75182b1241be475a64e68a518ce853e800e9b50397d2f152816c24f038c94d6e
2016-03-17 12:16:01 INFO juju.tools.lxdclient client_image.go:85 copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 1%
2016-03-17 12:16:01 TRACE juju.apiserver apiserver.go:275 <- [4] machine-0 {"RequestId":348,"Type":"Provisioner","Version":2,"Request":"SetInstanceStatus","Params":{"Entities":[{"Tag":"machine-0-lxd-5","Status":"allocating","Info":"copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 1%","Data":null}]}}

copying is still very early, but we think we're ready to start the instance:
2016-03-17 12:16:03 TRACE juju.apiserver apiserver.go:289 -> [4] machine-0 {"RequestId":355,"Response":{"Results":[{"Error":null}]}}
2016-03-17 12:16:03 INFO juju.tools.lxdclient client_image.go:85 copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 9%
2016-03-17 12:16:03 TRACE juju.apiserver apiserver.go:275 <- [4] machine-0 {"RequestId":356,"Type":"Provisioner","Version":2,"Request":"SetInstanceStatus","Params":{"Entities":[{"Tag":"machine-0-lxd-5","Status":"allocating","Info":"copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 9%","Data":null}]}}
2016-03-17 12:16:04 INFO juju.container.lxd lxd.go:131 starting instance "juju-machine-0-lxd-5" (image "ubuntu-trusty")...
2016-03-17 12:16:04 TRACE juju.apiserver apiserver.go:275 <- [4] machine-0 {"RequestId":357,"Type":"Provisioner","Version":2,"Request":"SetInstanceStatus","Params":{"Entities":[{"Tag":"machine-0-lxd-5","Status":"allocating","Info":"Starting container","Data":null}]}}

^^ we're at the point of "Starting container" which doesn't get called until after EnsureImageExists() returns.

when the copying is close to done:

2016-03-17 12:16:48 INFO juju.tools.lxdclient client_image.go:85 copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 79%
2016-03-17 12:16:48 TRACE juju.apiserver apiserver.go:275 <- [4] machine-0 {"RequestId":429,"Type":"Provisioner","Version":2,"Request":"SetInstanceStatus","Params":{"Entities":[{"Tag":"machine-0-lxd-5","Status":"allocating","Info":"copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 79%","Data":null}]}}
2016-03-17 12:16:48 INFO juju.tools.lxdclient client_image.go:85 copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 80%
2016-03-17 12:16:48 TRACE juju.apiserver apiserver.go:275 <- [4] machine-0 {"RequestId":430,"Type":"Provisioner","Version":2,"Request":"SetInstanceStatus","Params":{"Entities":[{"Tag":"machine-0-lxd-5","Status":"allocating","Info":"copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 80%","Data":null}]}}
2016-03-17 12:16:49 TRACE juju.apiserver apiserver.go:275 <- [4] machine-0 {"RequestId":431,"Type":"Provisioner","Version":2,"Request":"SetInstanceStatus","Params":{"Entities":[{"Tag":"machine-0-lxd-5","Status":"running","Info":"Container started","Data":null}]}}

^- We're at the poin...

Read more...

Revision history for this message
John A Meinel (jameinel) wrote :

If I SSH into that machine directly, and run
 $ lxc image delete ubuntu-trusty
 $ time lxc image copy ubuntu:trusty --alias ubuntu-trusty local:
it takes only 5s for the image to actually copy.

However, our logging seems to say
2016-03-17 12:16:02 INFO juju.tools.lxdclient client_image.go:85 copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 5%
2016-03-17 12:16:02 TRACE juju.apiserver apiserver.go:275 <- [4] machine-0 {"RequestId":352,"Type":"Provisioner","Version":2,"Request":"SetInstanceStatus","Params":{"Entities":[{"Tag":"machine-0-lxd-5","Status":"allocating","Info":"copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 5%","Data":null}]}}
2016-03-17 12:16:02 TRACE juju.apiserver apiserver.go:289 -> [4] machine-0 {"RequestId":352,"Response":{"Results":[{"Error":null}]}}
2016-03-17 12:16:02 INFO juju.tools.lxdclient client_image.go:85 copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 6%
2016-03-17 12:16:02 TRACE juju.apiserver apiserver.go:275 <- [4] machine-0 {"RequestId":353,"Type":"Provisioner","Version":2,"Request":"SetInstanceStatus","Params":{"Entities":[{"Tag":"machine-0-lxd-5","Status":"allocating","Info":"copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 6%","Data":null}]}}
2016-03-17 12:16:03 TRACE juju.apiserver apiserver.go:289 -> [4] machine-0 {"RequestId":353,"Response":{"Results":[{"Error":null}]}}
2016-03-17 12:16:03 INFO juju.tools.lxdclient client_image.go:85 copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 7%
2016-03-17 12:16:03 TRACE juju.apiserver apiserver.go:275 <- [4] machine-0 {"RequestId":354,"Type":"Provisioner","Version":2,"Request":"SetInstanceStatus","Params":{"Entities":[{"Tag":"machine-0-lxd-5","Status":"allocating","Info":"copying image for ubuntu-trusty from https://cloud-images.ubuntu.com/releases: 7%","Data":null}]}}

That it is taking 1s for every % of copy being finished. So it appears that maybe we're just being too slow to record the copy progress.

Revision history for this message
John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/4772

Addresses this by allowing us to discard progress messages that come in while we are still handling the previous one.

Revision history for this message
John A Meinel (jameinel) wrote :

Can't land it right now because Master is blocked on other bugs.

tags: added: 2.0-count
Changed in juju-core:
milestone: 2.0-beta3 → 2.0-beta14
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta14 → 2.0-beta15
Changed in juju-core:
assignee: nobody → John A Meinel (jameinel)
status: Triaged → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta15 → none
milestone: none → 2.0-beta15
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.