Subcloud add failure due to bootstrap replay failure
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
StarlingX | In Progress | Undecided | Rupanjan Chakraborty | |
Bug Description
Brief Description
Many subclouds in the scale lab could not be re-added due to a bootstrap replay failure.
Severity
Major
Steps to Reproduce
To check whether this is reproducible in DC1000-2, we had to apply a change and restart the dcmanager service. As a result, all subclouds that were in the middle of bootstrap failed.
Re-add the 250 subclouds.
Expected Behavior
Subclouds could be re-added without issues
Actual Behavior
Some subclouds failed to bootstrap with the following error:
Failed to provision the initial system config.
Traceback (most recent call last):
File "/tmp/.
File "/tmp/.
File "/tmp/.
File "/usr/lib/
return self._delete(
File "/usr/lib/
File "/usr/lib/
return self._http_
File "/usr/lib/
raise exceptions.
cgtsclien
stdout_lines:
An example of these failed subclouds is subcloud104. Its Ansible bootstrap was terminated in the middle of the following task, which should not affect the bootstrap replay.
TASK [bootstrap/
Wednesday 06 September 2023 21:31:46 +0000 (0:00:01.269) 0:36:06.602 ***
FAILED - RETRYING: Check controller-0 is in online state (15 retries left).
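When triaging many failed subclouds, it can help to find which task each bootstrap was in when it was interrupted. The following is a minimal sketch, not part of dcmanager or the playbooks: it assumes the Ansible log text is available as a string and that task headers and retry notices look like the excerpt above. The function name `last_task_state` and the sample task header are hypothetical.

```python
import re

# Ansible task headers, e.g. "TASK [role : task name] ****".
TASK_RE = re.compile(r"^TASK \[(?P<name>.*?)\]")
# Retry notices, e.g. "FAILED - RETRYING: <task> (15 retries left)."
RETRY_RE = re.compile(
    r"^FAILED - RETRYING: (?P<task>.+?) \((?P<left>\d+) retries left\)"
)


def last_task_state(log_text):
    """Return (last task header, last retry notice under that header).

    The retry notice is a (task description, retries left) tuple, or None
    if the last task header was not followed by a retry notice.
    """
    last_task = None
    last_retry = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        task_match = TASK_RE.match(line)
        if task_match:
            # A new task header resets any retry notice from the previous task.
            last_task = task_match.group("name")
            last_retry = None
            continue
        retry_match = RETRY_RE.match(line)
        if retry_match:
            last_retry = (retry_match.group("task"), int(retry_match.group("left")))
    return last_task, last_retry


# Hypothetical sample: the task header is a placeholder, the retry line
# is taken from the report.
sample_log = (
    "TASK [example-role : Example task] ****\n"
    "FAILED - RETRYING: Check controller-0 is in online state (15 retries left).\n"
)
task, retry = last_task_state(sample_log)
```

Running this over each failed subcloud's log would show whether the interrupted bootstraps all stopped at the same task or at different ones.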
Reproducibility
100% reproducible
System Configuration
Distributed Cloud
Alarms
N/A
Test Activity
Developer Testing
Workaround
AWS subcloud: Delete the VM, redeploy the VM, delete the subcloud and re-add
Hardware subcloud: Delete the subcloud and re-add with the reinstall option
Changed in starlingx:
assignee: nobody → Rupanjan Chakraborty (rchakrab)
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/898841