Over the past 3-ish weeks we've had 3 separate HP ProLiant DL380 Gen9 servers lock up with a similar-looking CPU lockup bug. All 3 of these servers are nova-compute nodes in an OpenStack cluster, with a reasonable amount of load on them. The symptoms are that the load average shoots up into the hundreds and ps stops returning.
I've attached the output of lspci -vnvn and 3 sets of syslog message traces, one grabbed at each of the 3 crashes.
$ cat /proc/version_signature
Ubuntu 3.16.0-49.65~14.04.1-generic 3.16.7-ckt15
Please let us know if you need any further information.