BERT: Error records from previous boot: Firmware Error Record Type: SOC Firmware Error Record Type2, Revision: 2, Record Identifier: 8f87f311-c998-4d9e-a0c4-6065518c4f6d while training neural network on CPU
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
intel-microcode (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
I was training a neural network on the CPU using TensorFlow + Keras in Python. The crash happened twice on the same day, after installing an update to Ubuntu.
1) Ubuntu 20.04.4 LTS
2)
intel-microcode:
Installed: 3.20210608.
Candidate: 3.20210608.
Version table:
*** 3.20210608.
500 http://
500 http://
100 /var/lib/
3.
500 http://
3) Expected: no crash.
4) Actual: crash.
Excerpt from /var/log/kern.log:
BERT: Error records from previous boot:
[Hardware Error]: event severity: fatal
[Hardware Error]: Error 0, type: fatal
[Hardware Error]: section_type: Firmware Error Record Reference
[Hardware Error]: Firmware Error Record Type: SOC Firmware Error Record Type2
[Hardware Error]: Revision: 2
[Hardware Error]: Record Identifier: 8f87f311-
[Hardware Error]: 00000000: 0100c303 00000280 04726ce2 0000002f .........lr./...
[Hardware Error]: 00000010: 000a0000 00000010 80001fff 0001f668 ............h...
[Hardware Error]: 00000020: 0001f664 000231c6 deadbeef deafbeef d....1..........
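For anyone comparing their own logs against this report, here is a minimal sketch of how the excerpt above could be parsed into fields and raw dump bytes. The helper names `parse_bert_excerpt` and `words_to_bytes` are hypothetical, and the little-endian word order is an assumption, based on the ASCII column of the dump lining up with that byte order. It does not decode the record's meaning, which is vendor-specific.

```python
import re

HW_PREFIX = "[Hardware Error]:"
# A dump line looks like "00000000: 0100c303 00000280 ... <ascii column>".
HEX_LINE = re.compile(r"^([0-9a-f]{8}):((?: [0-9a-f]{8}){1,4})")

# Lines copied from the kern.log excerpt in this report.
SAMPLE = """\
BERT: Error records from previous boot:
[Hardware Error]: event severity: fatal
[Hardware Error]: Error 0, type: fatal
[Hardware Error]: section_type: Firmware Error Record Reference
[Hardware Error]: Firmware Error Record Type: SOC Firmware Error Record Type2
[Hardware Error]: Revision: 2
[Hardware Error]: 00000000: 0100c303 00000280 04726ce2 0000002f .........lr./...
[Hardware Error]: 00000010: 000a0000 00000010 80001fff 0001f668 ............h...
[Hardware Error]: 00000020: 0001f664 000231c6 deadbeef deafbeef d....1..........
"""


def parse_bert_excerpt(text):
    """Collect key/value fields and dump words from '[Hardware Error]:' lines."""
    fields = {}
    words = []
    for line in text.splitlines():
        idx = line.find(HW_PREFIX)
        if idx < 0:
            continue  # not a hardware error line (e.g. the BERT header)
        body = line[idx + len(HW_PREFIX):].strip()
        match = HEX_LINE.match(body)
        if match:
            # Hex dump line: keep the 32-bit words, ignore the ASCII column.
            words.extend(int(w, 16) for w in match.group(2).split())
        elif ": " in body:
            key, value = body.split(": ", 1)
            fields[key.strip()] = value.strip()
    return fields, words


def words_to_bytes(words):
    """Assumption: each dumped word is a 32-bit little-endian value."""
    return b"".join(w.to_bytes(4, "little") for w in words)


if __name__ == "__main__":
    fields, words = parse_bert_excerpt(SAMPLE)
    for key, value in fields.items():
        print(f"{key}: {value}")
    print(words_to_bytes(words).hex())
```

Because the parser searches for the `[Hardware Error]:` marker anywhere in a line, it should also tolerate the timestamp and host prefixes that syslog adds in a real /var/log/kern.log.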
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: intel-microcode 3.20210608.
ProcVersionSign
Uname: Linux 5.13.0-30-generic x86_64
NonfreeKernelMo
ApportVersion: 2.20.11-
Architecture: amd64
CasperMD5CheckR
CurrentDesktop: ubuntu:GNOME
Date: Wed Feb 23 09:32:56 2022
InstallationDate: Installed on 2021-11-23 (91 days ago)
InstallationMedia: Ubuntu 20.04.3 LTS "Focal Fossa" - Release amd64 (20210819)
SourcePackage: intel-microcode
UpgradeStatus: No upgrade log present (probably fresh install)
Status changed to 'Confirmed' because the bug affects multiple users.