The ras-mc-ctl script does not match the SQLite structure
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
rasdaemon (Debian) | Fix Released | Unknown | |
rasdaemon (Ubuntu) | Fix Released | Undecided | Unassigned |
Focal | In Progress | Medium | Matthew Ruffell |
Bug Description
[Impact]
When running ras-mc-ctl to check for any logged machine errors, the following
error is printed:
DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/
Can't call method "execute" on an undefined value at /usr/sbin/
This happens because ras-mc-ctl assumes that a table exists for every error type in the SQLite database that rasdaemon maintains. However, each table is only created if the corresponding error type feature is enabled at compile time.
debian/rules:
override_
Note, --enable-devlink is not enabled.
Dumping the SQLite database shows the table does not exist:
$ sqlite3 /var/lib/
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE mc_event (id INTEGER PRIMARY KEY, timestamp TEXT, err_count INTEGER, err_type TEXT, err_msg TEXT, label TEXT, mc INTEGER, top_layer INTEGER, middle_layer INTEGER, lower_layer INTEGER, address INTEGER, grain INTEGER, syndrome INTEGER, driver_detail TEXT);
CREATE TABLE aer_event (id INTEGER PRIMARY KEY, timestamp TEXT, dev_name TEXT, err_type TEXT, err_msg TEXT);
CREATE TABLE extlog_event (id INTEGER PRIMARY KEY, timestamp TEXT, etype INTEGER, error_count INTEGER, severity INTEGER, address INTEGER, fru_id BLOB, fru_text TEXT, cper_data BLOB);
CREATE TABLE mce_record (id INTEGER PRIMARY KEY, timestamp TEXT, mcgcap INTEGER, mcgstatus INTEGER, status INTEGER, addr INTEGER, misc INTEGER, ip INTEGER, tsc INTEGER, walltime INTEGER, cpu INTEGER, cpuid INTEGER, apicid INTEGER, socketid INTEGER, cs INTEGER, bank INTEGER, cpuvendor INTEGER, bank_name TEXT, error_msg TEXT, mcgstatus_msg TEXT, mcistatus_msg TEXT, mcastatus_msg TEXT, user_action TEXT, mc_location TEXT);
CREATE TABLE arm_event (id INTEGER PRIMARY KEY, timestamp TEXT, error_count INTEGER, affinity INTEGER, mpidr INTEGER, running_state INTEGER, psci_state INTEGER);
COMMIT;
ras-mc-ctl needs to be patched to query only the tables for features that were enabled at compile time, rather than all of them.
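As a rough illustration of the mismatch described above, the following Python sketch lists which of the expected event tables are absent from a database. It is not part of rasdaemon. The in-memory database and the shortened column lists are stand-ins assumed for the example, and the table names are taken only from the dump and the error message above.

```python
import sqlite3

# Table names from the dump above, plus the devlink_event table named in the error.
EXPECTED_TABLES = [
    "mc_event",
    "aer_event",
    "extlog_event",
    "mce_record",
    "arm_event",
    "devlink_event",
]


def missing_tables(conn, expected):
    """Return the expected tables that do not exist in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    present = {name for (name,) in rows}
    return [table for table in expected if table not in present]


# Assumption: an in-memory stand-in for the real database, with simplified
# schemas. Only the tables shown in the dump are created.
conn = sqlite3.connect(":memory:")
for table in ["mc_event", "aer_event", "extlog_event", "mce_record", "arm_event"]:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, timestamp TEXT)")

missing = missing_tables(conn, EXPECTED_TABLES)
print("Missing tables:", missing)

# Querying an absent table fails in the same way the Perl script does.
try:
    conn.execute("SELECT COUNT(*) FROM devlink_event")
    query_error = None
except sqlite3.OperationalError as exc:
    query_error = str(exc)
print("Query error:", query_error)
```

Running the same `missing_tables` check against a copy of the real database would show which feature tables the installed build never created.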
[Testcase]
Ideally, deploy a bare-metal server with access to ECC memory and MCE information.
(The bug is also reproducible in a LXD VM; comment #5.)
$ sudo apt install rasdaemon
$ ras-mc-ctl --summary
No Memory errors.
No PCIe AER errors.
No Extlog errors.
DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/
Can't call method "execute" on an undefined value at /usr/sbin/
If you install the test package from the PPA below:
https:/
$ ras-mc-ctl --summary
No Memory errors.
No PCIe AER errors.
No Extlog errors.
No MCE errors.
We can now see MCE errors, and the script returns with a clean exit code.
[Where problems could occur]
ras-mc-ctl is being fixed so it can run to completion, and doing so allows ras-mc-ctl to display MCE error information. If a server has suffered any MCEs, they previously would not have been displayed. They will be displayed now, which may alarm some system administrators. I think this is ultimately a good thing, since they can investigate the source of the MCEs and determine whether any hardware needs replacing.
If a regression occurs, system administrators may not be able to see compiled information on ECC errors, MCEs, PCIe AER errors, etc. These errors will still be present in dmesg and syslog until a new version of rasdaemon is released.
Since the changes simply wrap the blocks of code that query each database table in if-statement guards, the risk of regression is low.
[Other info]
Merge request:
https:/
Upstream issue:
https:/
This was fixed upstream in 0.6.7 with commit:
commit 546cf713f667437
Author: Subhendu Saha <email address hidden>
Date: Tue, 12 Jan 2021 03:29:55 -0500
Subject: Fix ras-mc-ctl script.
Link: https:/
The diff looks large, but the change itself is small. The commit simply adds if (has_feature) guards around blocks of code. The blocks themselves did not change; the only code added is the setting of the has_feature variables and the if (has_feature) statements themselves.
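The guard pattern described above can be sketched in Python as follows. This is only an illustration of the idea, not the upstream Perl code. The flag names, the `SECTIONS` table, and the non-empty output format are hypothetical, and the in-memory database stands in for the real one. A flag set to False means the feature was not enabled at build time, so its table is never queried.

```python
import sqlite3

# Hypothetical build-time feature flags; devlink is disabled, as in the
# Focal build described in this bug.
FEATURES = {
    "has_mc": True,
    "has_aer": True,
    "has_extlog": True,
    "has_devlink": False,
    "has_mce": True,
}

# (flag, table, label) for each summary section. Table names are fixed
# constants, so interpolating them into the SQL text below is safe.
SECTIONS = [
    ("has_mc", "mc_event", "Memory"),
    ("has_aer", "aer_event", "PCIe AER"),
    ("has_extlog", "extlog_event", "Extlog"),
    ("has_devlink", "devlink_event", "Devlink"),
    ("has_mce", "mce_record", "MCE"),
]


def summarize(conn, features):
    """Build summary lines, skipping sections whose feature is disabled."""
    lines = []
    for flag, table, label in SECTIONS:
        if not features.get(flag, False):
            continue  # the guard: never touch a table that was not built in
        count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        if count == 0:
            lines.append(f"No {label} errors.")
        else:
            # Illustrative format only; the real script prints more detail.
            lines.append(f"{label} errors: {count}")
    return lines


# Assumption: simplified stand-in database without a devlink_event table.
conn = sqlite3.connect(":memory:")
for table in ["mc_event", "aer_event", "extlog_event", "mce_record"]:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, timestamp TEXT)")

summary = summarize(conn, FEATURES)
print("\n".join(summary))
```

With the devlink flag disabled, the loop never reaches the missing table, so the summary runs through to the MCE section instead of aborting.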
Note: I dropped the has_arm hunks because ras-mc-ctl in 0.6.5 does not yet support that feature.
Only Focal needs the fix (included in Jammy and later).
description: | updated |
description: | updated |
description: | updated |
affects: | linux (Ubuntu) → ras (Ubuntu) |
affects: | ras (Ubuntu) → rasdaemon (Ubuntu) |
tags: | added: focal hirsute impish |
Changed in rasdaemon (Debian): | |
status: | Unknown → New |
Changed in rasdaemon (Debian): | |
status: | New → Fix Released |
tags: | added: patch |
Changed in rasdaemon (Ubuntu): | |
status: | Confirmed → Fix Released |
Changed in rasdaemon (Ubuntu Bionic): | |
status: | New → In Progress |
Changed in rasdaemon (Ubuntu Focal): | |
status: | New → In Progress |
Changed in rasdaemon (Ubuntu Bionic): | |
importance: | Undecided → Medium |
Changed in rasdaemon (Ubuntu Focal): | |
importance: | Undecided → Medium |
Changed in rasdaemon (Ubuntu Bionic): | |
assignee: | nobody → Matthew Ruffell (mruffell) |
Changed in rasdaemon (Ubuntu Focal): | |
assignee: | nobody → Matthew Ruffell (mruffell) |
no longer affects: | rasdaemon (Ubuntu Bionic) |
description: | updated |
tags: | added: se-sponsor-mfo |
description: | updated |
Status changed to 'Confirmed' because the bug affects multiple users.