Juju fails to destroy-environment with wipe enabled: Node cannot be released in its current state ('Disk erasing').

Bug #1386327 reported by Raphaël Badin
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
MAAS | Won't Fix | Undecided | Unassigned |

Bug Description

Using MAAS trunk (revision 3302) and juju 1.18.4+dfsg-0ubuntu0.14.04.1 (in trusty-updates):

To recreate:
- Configure MAAS to "Erase nodes' disks prior to releasing."
- Bootstrap an environment and wait until the bootstrap node is 'Deploying' (and has been booted up);
- Abort the bootstrap operation.

 juju bootstrap
Launching instance
WARNING picked arbitrary tools &{"1.18.4-trusty-amd64" "https://streams.canonical.com/juju/tools/releases/juju-1.18.4-trusty-amd64.tgz" "992e4244874ffec4af083cdeb58040420320f63ac6a3f7526c81d963fa4e53d6" %!q(int64=7389403)}
 - /MAAS/api/1.0/nodes/node-c35a1bdc-4b04-11e4-b919-eca86bfdb3be/
Waiting for address
Attempting to connect to nuc2.maas:22
Attempting to connect to 192.168.10.75:22
^CInterrupt signalled: waiting for bootstrap to exit
ERROR bootstrap failed: interrupted
Stopping instance...
Bootstrap failed, destroying environment
ERROR Bootstrap failed, and the environment could not be destroyed: gomaasapi: got error back from server: 409 CONFLICT (Node cannot be released in its current state ('Disk erasing').)
ERROR interrupted

This is the Apache access log:

192.168.10.2 - - [27/Oct/2014:17:50:01 +0100] "POST /MAAS/api/1.0/nodes/node-b6ea27ca-4b04-11e4-b919-eca86bfdb3be/?op=release HTTP/1.1" 200 698 "-" "Go 1.1 package http"
192.168.10.2 - - [27/Oct/2014:17:50:02 +0100] "DELETE /MAAS/api/1.0/files/fa0413fb-f0ae-44d3-895d-a29fbbef5ac3-provider-state/ HTTP/1.1" 204 216 "-" "Go 1.1 package http"
192.168.10.2 - - [27/Oct/2014:17:50:02 +0100] "GET /MAAS/api/1.0/nodes/?agent_name=fa0413fb-f0ae-44d3-895d-a29fbbef5ac3&op=list HTTP/1.1" 200 706 "-" "Go 1.1 package http"
192.168.10.2 - - [27/Oct/2014:17:50:02 +0100] "POST /MAAS/api/1.0/nodes/node-b6ea27ca-4b04-11e4-b919-eca86bfdb3be/?op=release HTTP/1.1" 409 297 "-" "Go 1.1 package http"

As you can see, the node is released *twice*, and the second 'release' operation hits a node that is already in the 'Disk erasing' state, so MAAS rejects it with 409 CONFLICT. That's the part that breaks.
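One way to spot the double release in a longer log is to group the release requests by node. The snippet below is a minimal Python sketch, not part of MAAS or Juju: it assumes Apache's combined log format and the /MAAS/api/1.0 URL layout seen in the log above, and the helper names release_statuses and repeated_releases are hypothetical.

```python
import re
from collections import defaultdict

# Assumes the Apache combined log format and the /MAAS/api/1.0 URL layout shown above.
RELEASE_RE = re.compile(
    r'"POST /MAAS/api/1\.0/nodes/(?P<node>[^/]+)/\?op=release HTTP/[\d.]+" (?P<status>\d{3})'
)

# The four log lines quoted in the bug description.
LOG = [
    '192.168.10.2 - - [27/Oct/2014:17:50:01 +0100] "POST /MAAS/api/1.0/nodes/node-b6ea27ca-4b04-11e4-b919-eca86bfdb3be/?op=release HTTP/1.1" 200 698 "-" "Go 1.1 package http"',
    '192.168.10.2 - - [27/Oct/2014:17:50:02 +0100] "DELETE /MAAS/api/1.0/files/fa0413fb-f0ae-44d3-895d-a29fbbef5ac3-provider-state/ HTTP/1.1" 204 216 "-" "Go 1.1 package http"',
    '192.168.10.2 - - [27/Oct/2014:17:50:02 +0100] "GET /MAAS/api/1.0/nodes/?agent_name=fa0413fb-f0ae-44d3-895d-a29fbbef5ac3&op=list HTTP/1.1" 200 706 "-" "Go 1.1 package http"',
    '192.168.10.2 - - [27/Oct/2014:17:50:02 +0100] "POST /MAAS/api/1.0/nodes/node-b6ea27ca-4b04-11e4-b919-eca86bfdb3be/?op=release HTTP/1.1" 409 297 "-" "Go 1.1 package http"',
]


def release_statuses(log_lines):
    """Hypothetical helper: map each node ID to the HTTP statuses of its release requests."""
    statuses = defaultdict(list)
    for line in log_lines:
        match = RELEASE_RE.search(line)
        if match:
            statuses[match.group("node")].append(int(match.group("status")))
    return dict(statuses)


def repeated_releases(log_lines):
    """Hypothetical helper: keep only nodes that were asked to release more than once."""
    return {node: codes for node, codes in release_statuses(log_lines).items() if len(codes) > 1}


print(repeated_releases(LOG))
# → {'node-b6ea27ca-4b04-11e4-b919-eca86bfdb3be': [200, 409]}
```

The DELETE and list requests are ignored; only the two release calls for the same node are reported, the first answered with 200 and the second with 409.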

Christian Reis (kiko)
Changed in maas:
milestone: none → next
summary: - Juju fails to destroy an environment: Node cannot be released in its
- current state ('Disk erasing').
+ Juju fails to destroy-environment with wipe enabled: Node cannot be
+ released in its current state ('Disk erasing').
Julian Edwards (julian-edwards) wrote :

This is the same bug as bug 1384001 IMO. It's just that the disk erasing feature was enabled in MAAS so the state after DEPLOYED is different.

Raphaël Badin (rvb) wrote :

> This is the same bug as bug 1384001 IMO. It's just that the disk erasing
> feature was enabled in MAAS so the state after DEPLOYED is different.

I don't think so; as you can see from the apache log, the 'release' operation is issued twice and it fails the second time because the node is already releasing.

Raphaël Badin (rvb) wrote :

I've tried to reproduce this today (with and without the patches I've landed to fix bug 1384001), and Juju no longer issues the second 'release' request. Marking this as Incomplete until I can reproduce it.

Changed in maas:
status: New → Incomplete
Julian Edwards (julian-edwards) wrote : Re: [Bug 1386327] Re: Juju fails to destroy-environment with wipe enabled: Node cannot be released in its current state ('Disk erasing').

On Tuesday 28 Oct 2014 15:57:29 you wrote:
> > This is the same bug as bug 1384001 IMO. It's just that the disk erasing
> > feature was enabled in MAAS so the state after DEPLOYED is different.
>
> I don't think so; as you can see from the apache log, the 'release'
> operation is issued twice and it fails the second time because the node
> is already releasing.

That's exactly what was happening with the other bug (I spoke to a juju dev).

Christian Reis (kiko)
Changed in maas:
milestone: next → 1.7.1
John A Meinel (jameinel) wrote :

When you run "juju destroy-environment", we first ask the API server to clean up nicely, then wait a bit, and then the client explicitly lists the provider's instances and kills everything directly (as a backstop in case the API server is malfunctioning, etc.).

That explains how we could get two requests to kill a machine, and why there is a bit of a race.
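The race can be modelled in a few lines. This is a toy Python sketch under stated assumptions: FakeNode, Conflict and destroy_environment are hypothetical stand-ins, not Juju or MAAS code, and the state model covers only the states mentioned in this report. The backstop release reaches a node that the first release has already moved to 'Disk erasing' (or 'Releasing' when wiping is off), which reproduces the 409 seen in the log.

```python
class Conflict(Exception):
    """Hypothetical stand-in for MAAS's '409 CONFLICT' response."""


class FakeNode:
    """Toy node model covering only the states mentioned in this bug report."""

    RELEASABLE = {"Deploying", "Deployed"}

    def __init__(self, system_id, erase_disks):
        self.system_id = system_id
        self.erase_disks = erase_disks
        self.status = "Deploying"  # the state the bootstrap node was in when aborted

    def release(self):
        if self.status not in self.RELEASABLE:
            raise Conflict(
                f"Node cannot be released in its current state ('{self.status}')."
            )
        # With "Erase nodes' disks prior to releasing" enabled, the node goes to Disk erasing.
        self.status = "Disk erasing" if self.erase_disks else "Releasing"


def destroy_environment(nodes):
    """Two-phase teardown as described above; returns the errors from the backstop pass."""
    # Phase 1: the API server is asked to clean up nicely.
    for node in nodes:
        node.release()
    # Phase 2: the client lists the environment's nodes and releases them directly.
    # Nothing here checks whether phase 1 already did the job, hence the race.
    errors = []
    for node in nodes:
        try:
            node.release()
        except Conflict as exc:
            errors.append(str(exc))
    return errors


print(destroy_environment([FakeNode("node-1", erase_disks=True)]))
```

With erase_disks=True the single error returned carries the same message as the 409 in the log; with erase_disks=False the node is in 'Releasing' instead and the backstop still conflicts.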

I think it is fair for the Juju client code to notice that the machine is (for all intents and purposes) already dying as far as Juju can make it (Disk Erasing and Releasing are both as good as we can get with API calls), and either not make the request or treat the error as non-fatal.
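Both options can be sketched on the client side. Below is a minimal Python sketch, assuming two injected callables, get_status and post_release, that stand in for the MAAS API calls (the real API also needs OAuth signing, which is omitted here); release_nodes, ReleaseError and DYING_STATES are hypothetical names, not part of Juju or gomaasapi. It skips nodes that are already dying and treats a 409 CONFLICT as non-fatal.

```python
from http import HTTPStatus

# Hypothetical: statuses treated as "already dying as far as Juju can tell".
DYING_STATES = {"Disk erasing", "Releasing"}


class ReleaseError(Exception):
    """Raised for release failures that are not just the race described above."""


def release_nodes(nodes, get_status, post_release):
    """Release each node, tolerating nodes that are already on their way out.

    get_status(node_id) -> str and post_release(node_id) -> int (HTTP status)
    are hypothetical callables standing in for the MAAS API calls.
    """
    released, skipped = [], []
    for node_id in nodes:
        # Option 1: don't make the request if the node is already dying.
        if get_status(node_id) in DYING_STATES:
            skipped.append(node_id)
            continue
        code = post_release(node_id)
        if code == HTTPStatus.OK:
            released.append(node_id)
        elif code == HTTPStatus.CONFLICT:
            # Option 2: the node changed state between the check and the POST; not fatal.
            skipped.append(node_id)
        else:
            raise ReleaseError(f"release of {node_id} failed with HTTP {code}")
    return released, skipped


statuses = {"a": "Deployed", "b": "Disk erasing", "c": "Deployed"}
codes = {"a": 200, "c": 409}
print(release_nodes(["a", "b", "c"], statuses.get, codes.get))
```

Checking the status first avoids most redundant requests, but the node can still change state between the check and the POST, so the 409 branch is still needed.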

Changed in maas:
milestone: 1.7.1 → 1.7.2
Changed in maas:
milestone: 1.7.2 → 1.7.3
Gavin Panella (allenap)
Changed in maas:
status: Incomplete → Won't Fix