azure destroy-environment does not complete

Bug #1324910 reported by Curtis Hovey
This bug affects 1 person
Affects     Status         Importance   Assigned to      Milestone
juju-core   Fix Released   High         Andrew Wilkins

Bug Description

Juju CI sees a lot of evidence that destroy-environment does not do everything it needs to complete. CI sees many cases where juju claims the env is already bootstrapped some time after destroy-environment --force was called. The test job fails, but the next might pass. I suspect that Azure is still cleaning up resources.

We commonly find azure jenv files left behind, but even when there aren't any, we see the "already bootstrapped" message.

The azure dashboard shows that no machines or services are running, but the networks do exist. Storage still has a container with state files in it :(

Manually deleting these can help, but we still need to wait about 10 minutes before we run the job again, otherwise we see the "already bootstrapped" message. We cannot always delete. The dashboard often fails to delete networks and blobs.

CI may change azure testing to create a different env name to ensure resources from the last test are not reused, but we then run the risk of using up the limited resources azure provides.

Curtis Hovey (sinzui) wrote :

The 1.18.4 release was stalled by this bug.

$ juju --show-log destroy-environment --force curtis-azure
2014-06-03 14:48:08 INFO juju.cmd supercommand.go:302 running juju-1.18.4-trusty-amd64 [gc]
WARNING! this command will destroy the "curtis-azure" environment (type: azure)
This includes all machines, services, data and other resources.

Continue [y/N]? y
2014-06-03 14:48:23 ERROR juju.cmd supercommand.go:305 cannot delete the environment's virtual network: asynchronous operation failed: BadRequest - An error occurred when setting the network configuration: The virtual network juju-curtis-azure-vnet is currently in use.. (http code 400: Bad Request)

My general reaction is "just fucking die". The network cannot be deleted from the azure dashboard either.

tags: added: destroy-environment
Changed in juju-core:
milestone: none → next-stable
importance: Medium → High
Curtis Hovey (sinzui) wrote :

We have changed azure testing to use a pool of environment names to avoid quick reuse of a name that has resources still up or coming down after destroy-environment fails.

We observe that the cloud health checks are more reliable because the test is only run once an hour, long enough for azure to complete cleanup. The network is reused. I think that is fine because vnets are like storage; there is a limited number we can create, but they can be shared and reused.

The juju tests that bring environments down and up in rapid succession (rapid in azure terms is 20 minutes) are very likely to fail on the 2nd and 4th attempts because juju or azure believes the env is up even though we used --force with destroy-environment and there is no jenv.

Changed in juju-core:
milestone: next-stable → 1.19.4
Curtis Hovey (sinzui) wrote :

I am manually deleting the containers left behind after each test.
I am manually deleting disks and their associated vhds each day.
I often delete leftover services and VMs that are still running when no azure tests have been run in 30 minutes.

Andrew Wilkins (axwalk) wrote :

Azure seems to not want to let go of the network, as you've found in the console. It does *eventually* let it go, but that can take quite a while in my experience. It would be nice if there were a way to tell Azure to delete the network when it can, but there's no such API.

One option we have is to *try* to delete the network, and if that fails just leave the network and the affinity group we created lying around and reuse them next time we bootstrap the environment.
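The option described above could be sketched roughly as follows. This is not the committed fix: `networkAPI`, `destroyNetworking`, `fakeAPI` and the affinity group name are hypothetical stand-ins for illustration, not the real Azure provider or client library API.

```go
package main

import (
	"errors"
	"fmt"
)

// errNetworkInUse stands in for the BadRequest that Azure returns
// while it still considers the virtual network to be in use.
var errNetworkInUse = errors.New("virtual network is currently in use")

// networkAPI is a hypothetical interface, not the real client API.
type networkAPI interface {
	DeleteVirtualNetwork(name string) error
	DeleteAffinityGroup(name string) error
}

// destroyNetworking tries to delete the virtual network. If that
// fails, it leaves both the network and the affinity group in place
// so the next bootstrap can reuse them, and reports leftBehind=true
// instead of failing the whole destroy.
func destroyNetworking(api networkAPI, vnet, affinityGroup string) (leftBehind bool, err error) {
	if err := api.DeleteVirtualNetwork(vnet); err != nil {
		// Best effort only: do not touch the affinity group either.
		return true, nil
	}
	if err := api.DeleteAffinityGroup(affinityGroup); err != nil {
		return false, err
	}
	return false, nil
}

// fakeAPI is an in-memory stand-in used to exercise the sketch.
type fakeAPI struct {
	vnetErr error
	deleted []string
}

func (f *fakeAPI) DeleteVirtualNetwork(name string) error {
	if f.vnetErr != nil {
		return f.vnetErr
	}
	f.deleted = append(f.deleted, name)
	return nil
}

func (f *fakeAPI) DeleteAffinityGroup(name string) error {
	f.deleted = append(f.deleted, name)
	return nil
}

func main() {
	// The affinity group name below is an assumption for illustration.
	busy := &fakeAPI{vnetErr: errNetworkInUse}
	left, err := destroyNetworking(busy, "juju-curtis-azure-vnet", "juju-curtis-azure-ag")
	fmt.Println("left behind for reuse:", left, "error:", err)
}
```

The matching bootstrap side would need to check for an existing network and affinity group and reuse them rather than fail, which is not shown here.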

Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released