The ras-mc-ctl script does not match the SQLite structure

Bug #1931847 reported by yannek
This bug affects 9 people
Affects             Status        Importance  Assigned to      Milestone
rasdaemon (Debian)  Fix Released  Unknown
rasdaemon (Ubuntu)  Fix Released  Undecided   Unassigned
  Focal             In Progress   Medium      Matthew Ruffell

Bug Description

[Impact]

When running ras-mc-ctl to check for any logged machine errors, the following
error is printed:

DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.

This happens because ras-mc-ctl assumes that a table exists for every error type in the SQLite database that rasdaemon maintains. However, each table is only created if the corresponding error-type feature was enabled at compile time.

debian/rules:

override_dh_auto_configure:
        dh_auto_configure -- \
        --enable-mce --enable-aer --enable-sqlite3 --enable-extlog \
        --enable-abrt-report --enable-arm

Note that --enable-devlink is not passed, so devlink support is not built.

Dumping the SQLite database shows the table does not exist:

$ sqlite3 /var/lib/rasdaemon/ras-mc_event.db .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE mc_event (id INTEGER PRIMARY KEY, timestamp TEXT, err_count INTEGER, err_type TEXT, err_msg TEXT, label TEXT, mc INTEGER, top_layer INTEGER, middle_layer INTEGER, lower_layer INTEGER, address INTEGER, grain INTEGER, syndrome INTEGER, driver_detail TEXT);
CREATE TABLE aer_event (id INTEGER PRIMARY KEY, timestamp TEXT, dev_name TEXT, err_type TEXT, err_msg TEXT);
CREATE TABLE extlog_event (id INTEGER PRIMARY KEY, timestamp TEXT, etype INTEGER, error_count INTEGER, severity INTEGER, address INTEGER, fru_id BLOB, fru_text TEXT, cper_data BLOB);
CREATE TABLE mce_record (id INTEGER PRIMARY KEY, timestamp TEXT, mcgcap INTEGER, mcgstatus INTEGER, status INTEGER, addr INTEGER, misc INTEGER, ip INTEGER, tsc INTEGER, walltime INTEGER, cpu INTEGER, cpuid INTEGER, apicid INTEGER, socketid INTEGER, cs INTEGER, bank INTEGER, cpuvendor INTEGER, bank_name TEXT, error_msg TEXT, mcgstatus_msg TEXT, mcistatus_msg TEXT, mcastatus_msg TEXT, user_action TEXT, mc_location TEXT);
CREATE TABLE arm_event (id INTEGER PRIMARY KEY, timestamp TEXT, error_count INTEGER, affinity INTEGER, mpidr INTEGER, running_state INTEGER, psci_state INTEGER);
COMMIT;

ras-mc-ctl needs to be patched to only query tables for features that have been enabled at compile time, and not just all of them.
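
To confirm which event tables are missing without reading the whole dump, the table list can be read from sqlite_master. The following is only a minimal sketch: the helper name missing_tables and the EXPECTED_TABLES list are illustrative (the names come from the dump and the error message above), and an in-memory database stands in for /var/lib/rasdaemon/ras-mc_event.db.

```python
import sqlite3

# Illustrative list: the five tables from the dump above plus devlink_event,
# the table named in the error message.
EXPECTED_TABLES = [
    "mc_event", "aer_event", "extlog_event",
    "mce_record", "arm_event", "devlink_event",
]

def missing_tables(conn, expected=EXPECTED_TABLES):
    """Return the expected table names that are absent from the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    present = {name for (name,) in rows}
    return [table for table in expected if table not in present]

if __name__ == "__main__":
    # In-memory stand-in for the real database, with simplified columns.
    conn = sqlite3.connect(":memory:")
    for table in ("mc_event", "aer_event", "extlog_event", "mce_record", "arm_event"):
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, timestamp TEXT)")
    print(missing_tables(conn))
```

On a real system, the connection would instead be opened on the database file shown in the sqlite3 command above.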

[Testcase]

Ideally, deploy a bare-metal server with ECC memory and access to MCE information.
(The bug is also reproducible in an LXD VM; see comment #5.)

$ sudo apt install rasdaemon
$ ras-mc-ctl --summary
No Memory errors.

No PCIe AER errors.

No Extlog errors.

DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.

If you install the test package from the PPA below, the error no longer appears:

https://launchpad.net/~mruffell/+archive/ubuntu/sf369475-test

$ ras-mc-ctl --summary
No Memory errors.

No PCIe AER errors.

No Extlog errors.

No MCE errors.

We can now see the MCE section, and the script exits with a clean exit code.
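
The check can also be scripted. The sketch below is an assumption-laden illustration rather than part of the test plan: summary_is_clean is a hypothetical helper, and the demo uses a stand-in command that emulates the failing case (the DBD::SQLite message on stderr and exit code 2, as seen in comment #5) so that it runs without rasdaemon installed.

```python
import subprocess
import sys

def summary_is_clean(cmd):
    """Run cmd; True if it exits 0 and stderr has no DBD::SQLite error."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0 and "DBD::SQLite" not in result.stderr

if __name__ == "__main__":
    # On a real system the command would be ["ras-mc-ctl", "--summary"].
    # This stand-in only imitates the broken behaviour.
    failing = [
        sys.executable, "-c",
        "import sys; sys.stderr.write('DBD::SQLite::db prepare failed\\n'); sys.exit(2)",
    ]
    print(summary_is_clean(failing))
```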

[Where problems could occur]

ras-mc-ctl is being fixed so it can run to completion, which also allows it to display MCE error information. If a server has suffered any MCEs, they were previously not displayed but will be now, which may alarm some system administrators. I think this is ultimately a good thing, since they can investigate the source of the MCEs and determine whether any hardware needs replacing.

If a regression occurs, system administrators may not be able to see compiled information on ECC errors, MCEs, PCIe AER errors, etc. These errors would still be available in dmesg and syslog in the meantime, until a new version of rasdaemon is released.

Since the changes simply add if-statement guards around the blocks of code that query each database table, the risk of regression is low.

[Other info]

Merge request:
https://github.com/mchehab/rasdaemon/pull/35
Upstream issue:
https://github.com/mchehab/rasdaemon/issues/30

This was fixed upstream in 0.6.7 with commit:

commit 546cf713f667437fb6e283cc3dc090679eb47d08
Author: Subhendu Saha <email address hidden>
Date: Tue, 12 Jan 2021 03:29:55 -0500
Subject: Fix ras-mc-ctl script.
Link: https://github.com/mchehab/rasdaemon/commit/546cf713f667437fb6e283cc3dc090679eb47d08

The diff looks large, but the actual change is small. The commit simply adds if (has_feature) guards around blocks of code. The blocks themselves did not change; the only code added is the assignment of the has_feature variables and the if (has_feature) statements themselves.
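
The guard pattern can be illustrated outside of Perl. This is a rough Python sketch, not the upstream code: FEATURES, TABLES and summarize are hypothetical names, with FEATURES standing in for the has_feature variables that the real script sets at build time.

```python
import sqlite3

# Hypothetical stand-ins for the has_feature variables (fixed at build time
# in the real script). devlink is off, as in the Focal build.
FEATURES = {"aer": True, "extlog": True, "mce": True, "devlink": False}
TABLES = {
    "aer": "aer_event",
    "extlog": "extlog_event",
    "mce": "mce_record",
    "devlink": "devlink_event",
}

def summarize(conn, features=FEATURES):
    """Count rows per event table, querying only tables of enabled features."""
    counts = {}
    for feature, table in TABLES.items():
        if not features.get(feature):
            continue  # feature not built in: its table was never created
        counts[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return counts

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for table in ("aer_event", "extlog_event", "mce_record"):
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
    print(summarize(conn))
```

Without the guard, the query against devlink_event raises sqlite3.OperationalError, which corresponds to the "no such table" failure in the bug.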

Note: I dropped the has_arm hunks because ras-mc-ctl in 0.6.5 does not yet support that feature.

Only Focal needs the fix (it is already included in Jammy and later).

description: updated
affects: linux (Ubuntu) → ras (Ubuntu)
affects: ras (Ubuntu) → rasdaemon (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in rasdaemon (Ubuntu):
status: New → Confirmed
tags: added: focal hirsute impish
Changed in rasdaemon (Debian):
status: Unknown → New
Changed in rasdaemon (Debian):
status: New → Fix Released
David (hollalay) wrote :

Guys, I'm affected by this issue. Are you still working on it?
Debian is already on 0.6.6; can you please fix it?

Thank you.

ZeOM (zeom) wrote :

Hi everyone - this problem was blocking my investigation on a 20.04.4 system.

The Debian version 0.6.6 failed to work.

Version 0.6.7 (which should work) required an incompatible libc version.

I finally found that the old Debian buster package, 0.6.0, works:
https://packages.debian.org/buster/rasdaemon

dpkg -i rasdaemon_0.6.0-1.2_amd64.deb

tags: added: patch
Changed in rasdaemon (Ubuntu):
status: Confirmed → Fix Released
Changed in rasdaemon (Ubuntu Bionic):
status: New → In Progress
Changed in rasdaemon (Ubuntu Focal):
status: New → In Progress
Changed in rasdaemon (Ubuntu Bionic):
importance: Undecided → Medium
Changed in rasdaemon (Ubuntu Focal):
importance: Undecided → Medium
Changed in rasdaemon (Ubuntu Bionic):
assignee: nobody → Matthew Ruffell (mruffell)
Changed in rasdaemon (Ubuntu Focal):
assignee: nobody → Matthew Ruffell (mruffell)
no longer affects: rasdaemon (Ubuntu Bionic)
Matthew Ruffell (mruffell) wrote :

Attached is a debdiff for Focal which fixes this issue.

description: updated
tags: added: se-sponsor-mfo
description: updated
Mauricio Faria de Oliveira (mfo) wrote :

The bug is also reproducible in an LXD VM (updated description):

 $ lxc launch --vm ubuntu:focal ras
 $ lxc exec ras -- su - ubuntu

 $ sudo apt update && sudo apt install -y rasdaemon

 $ dpkg -s rasdaemon | grep Version:
 Version: 0.6.5-1ubuntu1.1

 $ ras-mc-ctl --summary
 No Memory errors.

 No PCIe AER errors.

 No Extlog errors.

 DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
 Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.

 $ echo $?
 2

The fix (test package) works as expected:

 $ dpkg -s rasdaemon | grep Version:
 Version: 0.6.5-1ubuntu1.2

 $ ras-mc-ctl --summary
 No Memory errors.

 No PCIe AER errors.

 No Extlog errors.

 No MCE errors.

 $ echo $?
 0

Mauricio Faria de Oliveira (mfo) wrote :

Hi Matthew,

Thanks for the excellent bug report and debdiff, as always!

I verified mechanically that the only code changes are the
`if` blocks, and all the rest is indentation changes only.

I have also looked at the source code to understand the
issue, and your description is straight to the point.

There are no other upstream fixes to the introduced changes.

Only Focal is affected (updated description).

Nice touch on providing a description of the backport.
The only change I made was s/hunk/hunks/ (2 ARM hunks).

The package built correctly on all architectures with
-updates and -proposed, and I could verify them fine.

Other checks performed as well, all good!

Uploaded to Focal.
Thanks!

Mauricio Faria de Oliveira (mfo) wrote :

Details:
---

Commit:

 $ grep Homepage: debian/control
 Homepage: https://github.com/mchehab/rasdaemon

 $ git describe --contains 546cf713f667437fb6e283cc3dc090679eb47d08
 v0.6.7~17

 $ rmadison -a source rasdaemon
  rasdaemon | 0.5.6-2 | xenial/universe | source
  rasdaemon | 0.5.6-2ubuntu1.1 | xenial-updates/universe | source
  rasdaemon | 0.6.0-1 | bionic/universe | source
  rasdaemon | 0.6.0-1ubuntu0.2 | bionic-updates/universe | source
  rasdaemon | 0.6.5-1ubuntu1 | focal/universe | source
  rasdaemon | 0.6.5-1ubuntu1.1 | focal-updates/universe | source
  rasdaemon | 0.6.7-1 | jammy/universe | source
  rasdaemon | 0.6.8-1 | lunar/universe | source
  rasdaemon | 0.6.8-1.1 | mantic/universe | source

Indentation only:

 $ cat 0001-Fix-ras-mc-ctl-script.patch | sed '/^@@/,$ { s/^[ -]//; /^+/d; }' > patch.old

 $ cat 0001-Fix-ras-mc-ctl-script.patch | sed '/^@@/,$ { s/^+ /+/; s/^+\t /+\t/; s/^[ +]//; /^-/d; }' > patch.new

 $ diff -U0 patch.old patch.new | grep -v '^@@'
 --- patch.old 2023-10-07 18:16:19.279023843 -0300
 +++ patch.new 2023-10-07 18:19:37.895783167 -0300
 +my $has_aer = 0;
 +my $has_arm = 0;
 +my $has_devlink = 0;
 +my $has_disk_errors = 0;
 +my $has_extlog = 0;
 +my $has_mce = 0;
 +
 +@WITH_AER_TRUE@$has_aer = 1;
 +@WITH_ARM_TRUE@$has_arm = 1;
 +@WITH_DEVLINK_TRUE@$has_devlink = 1;
 +@WITH_DISKERROR_TRUE@$has_disk_errors = 1;
 +@WITH_EXTLOG_TRUE@$has_extlog = 1;
 +@WITH_MCE_TRUE@$has_mce = 1;
 +
 +if ($has_aer == 1) {
 + }
 +if ($has_arm == 1) {
 + }
 +if ($has_extlog == 1) {
 + }
 +if ($has_devlink == 1) {
 + }
 +if ($has_disk_errors == 1) {
 + }
 +if ($has_mce == 1) {
 + }
 +if ($has_aer == 1) {
 + }
 +if ($has_arm == 1) {
 + }
 +if ($has_extlog == 1) {
 + }
 +if ($has_devlink == 1) {
 + }
 +if ($has_disk_errors == 1) {
 + }
 +if ($has_mce == 1) {
 + }
 --
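
The same indentation-only check can be approximated without sed. The following Python sketch is an assumption-based alternative, not the method used above: non_indentation_changes is a hypothetical helper that assumes a single-file patch, ignores blank lines, and compares added and removed hunk lines after stripping surrounding whitespace. Trailing non-hunk lines, such as the signature that git format-patch appends, would show up as spurious entries, much like the final "--" above.

```python
from collections import Counter

def non_indentation_changes(patch_text):
    """Return (added, removed) hunk lines that differ beyond whitespace."""
    added, removed = Counter(), Counter()
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("@@"):
            in_hunk = True
            continue
        if not in_hunk:
            continue  # skip the ---/+++ file headers before the first hunk
        body = line[1:].strip()
        if not body:
            continue  # ignore blank lines
        if line.startswith("+"):
            added[body] += 1
        elif line.startswith("-"):
            removed[body] += 1
    # Multiset difference: a line that was only re-indented cancels out.
    return sorted((added - removed).elements()), sorted((removed - added).elements())

if __name__ == "__main__":
    # Tiny made-up patch: two statements wrapped in a new guard.
    demo = "\n".join([
        "--- a/script", "+++ b/script", "@@ -1,2 +1,4 @@",
        "-foo();", "-bar();",
        "+if ($has_x == 1) {", "+    foo();", "+    bar();", "+}",
    ])
    print(non_indentation_changes(demo))
```

For an indentation-only backport, the removed list should be empty and the added list should contain only the new variable assignments and guard lines.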

Removed:
 # ARM processor arm_event errors

Source:

 @ ras-record.c

 777 #ifdef HAVE_DEVLINK
 778 rc = ras_mc_create_table(priv, &devlink_event_tab);
 ...
 782 #endif

 550 static int ras_mc_create_table(struct sqlite3_priv *priv,
 551 const struct db_table_descriptor *db_tab)
 ...
 557 p += snprintf(p, end - p, "CREATE TABLE IF NOT EXISTS %s (",
 558 db_tab->name);

 399 #ifdef HAVE_DEVLINK
 ...
 410 static const struct db_table_descriptor devlink_event_tab = {
 411 .name = "devlink_event",

 @ configure.ac

  94 AC_ARG_ENABLE([devlink],
  95 AS_HELP_STRING([--enable-devlink], [enable devlink health events (currently experimental)]))
  96
  97 AS_IF([test "x$enable_devlink" = "xyes" || test "x$enable_all" == "xyes"], [
  98 AC_DEFINE(HAVE_DEVLINK,1,"have devlink health events collect")
  99 AC_SUBST([WITH_DEVLINK])
 100 ])
 101 AM_CONDITIONAL([WITH_DEVLINK], [test x$enable_devlink = xyes || test x$enable_all == xyes])
 102 AM_COND_IF([WITH_DEVLINK], [USE_DEVLINK="yes"], [USE_DEVLINK="no"])
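
The @WITH_..._TRUE@ prefixes on the has_* assignments tie back to this AM_CONDITIONAL. As far as I understand Automake, a conditional named WITH_DEVLINK makes configure substitute @WITH_DEVLINK_TRUE@ with an empty string when the condition holds and with "#" otherwise, so each assignment ends up as either live Perl or a comment. A rough Python simulation of that substitution follows; substitute_conditionals and the TEMPLATE excerpt are illustrative only and not part of rasdaemon.

```python
import re

# Excerpt modeled on the lines added to the script template in the diff above.
TEMPLATE = """my $has_devlink = 0;
my $has_mce = 0;

@WITH_DEVLINK_TRUE@$has_devlink = 1;
@WITH_MCE_TRUE@$has_mce = 1;
"""

def substitute_conditionals(template, enabled):
    """Replace @WITH_<NAME>_TRUE@ with '' if NAME is enabled, else with '#'."""
    def marker(match):
        return "" if match.group(1) in enabled else "#"
    return re.sub(r"@WITH_([A-Z]+)_TRUE@", marker, template)

if __name__ == "__main__":
    # Only MCE enabled, loosely mirroring a build without --enable-devlink.
    print(substitute_conditionals(TEMPLATE, {"MCE"}))
```

With only MCE enabled, the devlink assignment is turned into a Perl comment, so $has_devlink keeps its initial value of 0 and the guarded devlink query is skipped.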

description: updated