memory_compare script was changed to a 25% variance allowed

Bug #1355408 reported by Jeff Lane 
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Checkbox Provider - Base
Fix Released
Low
Daniel Manrique

Bug Description

At some point in recent history, the memory_compare script was changed to pass on anything with a 25% variance or less in RAM. This was 10% historically and satisfied both client and server systems.

With a 10% variance, clients (mostly laptops) with shared RAM that use more than 10% are flagged but everything else is passed (assuming they vary by 10% or less).

With the 25% we run the very real risk of missing cases where RAM variance between the kernel is significantly different from what's physically there without being flagged by a test failure to investigate further.

A recent example, we had a server that had 512MB difference between what was physically installed and what the kernel addressed. the 10% test flagged this and on investigation, the system was passed once we did some digging and discovered that the server used an AMD APU that also had a built in GPU that was using 512MB of RAM, thus the discrepancy.

However, had this test run using the new 25% limit, it very likely would not have been noticed, and if it was not a GPU shared ram issue, we would have overlooked a potentially 25% gap in RAM that had no explanation other than hardware or kernel bug.

So a question that came to mind is this (thinking of it from a client cert perspective):

If a laptop is certified with 8GB of ram, and is sharing 1GB, but the variance is the full 25%, there would be 1 full GB of RAM unaccounted for that may go unnoticed because of the wider limit on this test.

I haven't found exactly when this change occurred, but a recent server certificate, tested at the beginning of June with this version from the dev ppa:

plainbox-provider-checkbox 0.5~dev+bzr3032+pkg5~ubuntu14.04.1

still showed the 10% variance

Related branches

Revision history for this message
Jeff Lane  (bladernr) wrote :

I first noticed this today while testing this version from the "testing" ppa

plainbox-provider-checkbox 0.9~c1~ppa~ubuntu14.04.1

Revision history for this message
Jeff Lane  (bladernr) wrote :

Or I guess to put this into server scale, if I am cert testing a system with 1TiB of RAM, I could potentially not notice in my test results that up to 250Gib of RAM was missing, and no GPU in the world would account for that large of a discrepancy.

Revision history for this message
Daniel Manrique (roadmr) wrote :

This was done in revision 2256:

checkbox-old:scripts:memory_compare: set an adaptive diff. threshold

To avoid false positive with systems sharing system memory with graphic
cards,

The variable threshold is set as follow:
- if total memory amount is 2 GB or less, then the threshold is 25%
- for more than 2 GB it's 20%
- and for more than 6 GB it goes back down to 10%.

And the code reflects this. Line 65:

    threshold = get_threshold(installed_memory)

and the get_threshold method:

def get_threshold(installed_memory):
    GB = 1024**3
    if installed_memory <= 2 * GB:
        return 25
    elif installed_memory <= 6 * GB:
        return 20
    else:
        return 10

This was done because a GPU can conceivably take, say, 256MB of memory, and on a 2-GB system that would put us over the 10% threshold (and yet be completely normal and explainable).

Now, if this is not working as expected, please let me know, but otherwise, maybe the only change we'd need to make is removing the unused and misleading THRESHOLD constant at the beginning.

Changed in plainbox-provider-checkbox:
status: New → Incomplete
Daniel Manrique (roadmr)
Changed in plainbox-provider-checkbox:
milestone: none → 0.10
Revision history for this message
Jeff Lane  (bladernr) wrote :

yeah, that threw me. And your description makes sense, and yes, that does work, which does explain why I never noticed the change until now.

Revision history for this message
Daniel Manrique (roadmr) wrote :

OK, let's mark this as "in progress" then. Thanks for reporting this, and for confirming it works as expected (hurray!).

By the way, the threshold is easily adjustable as you saw, so if you want it to be stricter in systems with more RAM (your rationale makes perfect sense), we can do that no problem.

Changed in plainbox-provider-checkbox:
status: Incomplete → In Progress
importance: Undecided → Low
assignee: nobody → Daniel Manrique (roadmr)
Daniel Manrique (roadmr)
Changed in plainbox-provider-checkbox:
status: In Progress → Fix Committed
Daniel Manrique (roadmr)
Changed in plainbox-provider-checkbox:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.