Trying to play whack-a-mole with all of the places that could raise unexpected exceptions from lower layers, like the hypervisor in this case, is going to be a losing game. We should probably just restrict the fault message to admin-only since we don't audit them and this is python so explicitly handling typed exceptions isn't a thing.
Deciding *where* to obfuscate the message is the question I think. We could say that for any non-NovaException types (where format_message() is implemented), we could just include the message if context.is_admin here:
And the policy rule would default to admin-only. A policy rule is better since deployments could tailor it so non-admin support personal could still see those details but not the tenant user that owns the server.
If we do that, I'd also fix the logic here so we rely on policy rather than this janky code logic:
Ugh, OK. So in this particular case, the instance goes to ERROR state from the reboot failure here:
https:/ /github. com/openstack/ nova/blob/ 2c0cb71fb0ac0d5 02dc9fed24211e1 ef15407b8f/ nova/compute/ manager. py#L3537
The exception is re-raised and the fault (libvirtError) gets recorded by the wrap_instance_fault decorator:
https:/ /github. com/openstack/ nova/blob/ 2c0cb71fb0ac0d5 02dc9fed24211e1 ef15407b8f/ nova/compute/ manager. py#L3448
This is what sets the message field on the InstanceFault:
https:/ /github. com/openstack/ nova/blob/ 2c0cb71fb0ac0d5 02dc9fed24211e1 ef15407b8f/ nova/compute/ utils.py# L109
This is likely where the libvirtError gets turned into the message:
https:/ /github. com/openstack/ nova/blob/ 2c0cb71fb0ac0d5 02dc9fed24211e1 ef15407b8f/ nova/compute/ utils.py# L78
Trying to play whack-a-mole with all of the places that could raise unexpected exceptions from lower layers, like the hypervisor in this case, is going to be a losing game. We should probably just restrict the fault message to admin-only since we don't audit them and this is python so explicitly handling typed exceptions isn't a thing.
Deciding *where* to obfuscate the message is the question I think. We could say that for any non-NovaException types (where format_message() is implemented), we could just include the message if context.is_admin here:
https:/ /github. com/openstack/ nova/blob/ 2c0cb71fb0ac0d5 02dc9fed24211e1 ef15407b8f/ nova/compute/ utils.py# L78
But that won't fix old compute nodes until you patch them. It also doesn't mean that NovaExceptions handled here:
https:/ /github. com/openstack/ nova/blob/ 2c0cb71fb0ac0d5 02dc9fed24211e1 ef15407b8f/ nova/compute/ utils.py# L73
Wouldn't also expose something from a lower level because we have several exceptions like this:
https:/ /github. com/openstack/ nova/blob/ 2c0cb71fb0ac0d5 02dc9fed24211e1 ef15407b8f/ nova/exception. py#L267
Where whatever lower level error happens gets turned into the "reason" value.
Given that, it probably makes most sense to just add a new policy rule that is checked in the API to determine if the fault message can be shown here:
https:/ /github. com/openstack/ nova/blob/ 2c0cb71fb0ac0d5 02dc9fed24211e1 ef15407b8f/ nova/api/ openstack/ compute/ views/servers. py#L553
And the policy rule would default to admin-only. A policy rule is better since deployments could tailor it so non-admin support personal could still see those details but not the tenant user that owns the server.
If we do that, I'd also fix the logic here so we rely on policy rather than this janky code logic:
https:/ /github. com/openstack/ nova/blob/ 2c0cb71fb0ac0d5 02dc9fed24211e1 ef15407b8f/ nova/api/ openstack/ compute/ views/servers. py#L562