instead of "seccomp". This basically means that instead of the kernel setting these flags for every process that uses seccomp(), it is up to userspace to opt in manually via prctl(). The linked upstream change goes into all the reasons why this is the right thing to do.
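To make the "opt in via prctl()" part concrete, here is a minimal sketch of what a process would do under spec_store_bypass_disable=prctl. It assumes a Linux/glibc system and reaches prctl(2) through ctypes, because Python's standard library does not wrap it. The constants are copied from <linux/prctl.h>. The helper names (describe_spec_ctrl, opt_in_to_ssbd) are hypothetical, not part of any library.

```python
import ctypes
import os

# Values copied from <linux/prctl.h>; Python's standard library does not export them.
PR_GET_SPECULATION_CTRL = 52
PR_SET_SPECULATION_CTRL = 53
PR_SPEC_STORE_BYPASS = 0

PR_SPEC_PRCTL = 1 << 0           # the task may change this setting via prctl()
PR_SPEC_ENABLE = 1 << 1          # speculation allowed (mitigation off)
PR_SPEC_DISABLE = 1 << 2         # speculation disabled (mitigation on)
PR_SPEC_FORCE_DISABLE = 1 << 3   # disabled, and cannot be re-enabled
PR_SPEC_DISABLE_NOEXEC = 1 << 4  # disabled until the next execve()


def describe_spec_ctrl(value: int) -> str:
    """Hypothetical helper: turn a PR_GET_SPECULATION_CTRL result into a label."""
    if value == 0:
        return "not affected"
    control = "per-task" if value & PR_SPEC_PRCTL else "fixed"
    if value & PR_SPEC_FORCE_DISABLE:
        state = "force-disabled"
    elif value & PR_SPEC_DISABLE_NOEXEC:
        state = "disabled until exec"
    elif value & PR_SPEC_DISABLE:
        state = "disabled"
    elif value & PR_SPEC_ENABLE:
        state = "enabled"
    else:
        state = "unknown"
    return f"{control}, speculation {state}"


def _prctl(option: int, arg2: int, arg3: int = 0) -> int:
    """Call glibc's prctl(2), raising OSError on failure (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    # prctl() takes unsigned long arguments after the option, so pass them as such.
    ret = libc.prctl(ctypes.c_int(option), ctypes.c_ulong(arg2),
                     ctypes.c_ulong(arg3), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def opt_in_to_ssbd() -> None:
    """Hypothetical helper: opt this task in to the store bypass mitigation.

    With spec_store_bypass_disable=seccomp the kernel does this by default when a
    task enables seccomp; with =prctl the task has to ask for it explicitly.
    """
    _prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE)


if __name__ == "__main__":
    try:
        before = _prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS)
        print("before:", describe_spec_ctrl(before))
        opt_in_to_ssbd()
        after = _prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS)
        print("after: ", describe_spec_ctrl(after))
    except (OSError, AttributeError) as exc:
        # e.g. not Linux, or the kernel does not allow per-task control (ENXIO)
        print("prctl speculation control unavailable:", exc)
```

Note the inverted naming: PR_SPEC_DISABLE disables the *speculation*, which is what turns the mitigation *on*.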
From the cpuid output on the failing cloud provider, we see
SSBD: speculative store bypass disable = true
suggesting that this has been explicitly disabled? It's unclear to me whether that's set by the cloud provider in QEMU, and I'm not sure I can tell from a guest without backend access.
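Without backend access, a guest can at least see what its own kernel decided. As far as I understand it, the cpuid line above reports that the SSBD control is *advertised* to the guest rather than that it is switched on. The sysfs vulnerability file shows which mode the guest kernel actually picked. The sketch below assumes an x86 guest. The mapping table reflects the x86 kernel's wording as I recall it (other architectures or kernel versions may differ), and the helper names are hypothetical.

```python
from pathlib import Path

# Strings the x86 kernel writes to the sysfs file below (assumed; other
# architectures or kernel versions may word them differently).
SSB_SYSFS_MODES = {
    "Not affected": "cpu not affected",
    "Vulnerable": "no mitigation",
    "Mitigation: Speculative Store Bypass disabled": "always on",
    "Mitigation: Speculative Store Bypass disabled via prctl": "prctl",
    "Mitigation: Speculative Store Bypass disabled via prctl and seccomp": "prctl + seccomp",
}

SYSFS_SSB = Path("/sys/devices/system/cpu/vulnerabilities/spec_store_bypass")
CPUINFO = Path("/proc/cpuinfo")


def classify_ssb(sysfs_text: str) -> str:
    """Map the sysfs vulnerability string to the SSB mode the kernel chose."""
    return SSB_SYSFS_MODES.get(sysfs_text.strip(), "unrecognised")


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags of the first CPU in /proc/cpuinfo (x86 layout)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


if __name__ == "__main__":
    if SYSFS_SSB.exists():
        print("kernel SSB mode:", classify_ssb(SYSFS_SSB.read_text()))
    if CPUINFO.exists():
        flags = cpu_flags(CPUINFO.read_text())
        # "ssbd" / "virt_ssbd" mean the control is available to this guest,
        # not that it is currently switched on.
        print("ssbd advertised:", bool({"ssbd", "virt_ssbd"} & flags))
```

On the affected cloud, I would expect the mode to read "prctl + seccomp" before the workaround and "prctl" after it.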
OpenDev is a canary for this sort of thing because our cloud usage is extremely heterogeneous: we have resources donated by about 7-8 different cloud providers, each with multiple regions (across x86_64 and arm64), which we use simultaneously for CI work (we use whatever people will donate). I've tested booting with spec_store_bypass_disable=prctl, and it stops the traces in the affected cloud, so we'll probably implement this.
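After rolling the boot parameter out across many clouds, it helps to confirm on each node that it actually took effect. Below is a minimal sketch that reads /proc/cmdline and reports which of the two options changed by the upstream default are not yet set to prctl. The workaround above only sets spec_store_bypass_disable; the upstream change also sets spectre_v2_user=prctl. The helper names are hypothetical. The "last occurrence wins" handling is an assumption noted in the code.

```python
from pathlib import Path

# The two parameters the upstream change defaults to "prctl".
MITIGATION_PARAMS = ("spec_store_bypass_disable", "spectre_v2_user")


def mitigation_params(cmdline: str) -> dict:
    """Extract the speculation mitigation options from a kernel command line."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in MITIGATION_PARAMS:
            found[key] = value  # assumption: a later occurrence overrides an earlier one
    return found


def missing_prctl_defaults(cmdline: str) -> list:
    """List the parameters that are not yet set to "prctl" on this command line."""
    params = mitigation_params(cmdline)
    return [name for name in MITIGATION_PARAMS if params.get(name) != "prctl"]


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        missing = missing_prctl_defaults(cmdline_path.read_text())
        print("still needed:", " ".join(f"{name}=prctl" for name in missing) or "nothing")
```

Checking the running command line, rather than the bootloader config, catches nodes that were updated but not yet rebooted.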
However, I think there's probably enough here to consider backporting this commit for maximum compatibility of the generic images. The system seems to work well enough (which is how it passed all our initial CI), but the spewing traces will quickly fill disks with bloated log files (which is how we found it after running in production).