I'm not convinced that the original problem was caused by stale data; as I noted, it occurred even when I deleted the ~/.cache directory tree. That said, the original problem also disappeared prior to 14.04 GA, so that's a moot point. Because everything was changing so rapidly back then, and because the systems that exhibited the problem are now in OIL, it would be next to impossible to reproduce the original problem today.
As to dealing with the problem of a changed job definition, my initial reaction is that it would be appropriate to invalidate the original run and re-run everything from scratch (deleting the partial results from the first run). My reasoning is that a job definition should only have changed if a software update had occurred, so continuing runs the risk of having an inconsistent set of software used in test results -- that is, Test A might have version 1.2.3 of Package X, whereas Test B might be run using version 1.2.4 of Package X (where both tests obviously rely on Package X in some important way). Such an inconsistency strikes me as being potentially very confusing, should consistency be important when comparing test results (either manually or, at least in theory, by some other test). Displaying a warning to the effect that old results will be deleted and tests started again seems reasonable, as does giving the option to abort the run and leave whatever temporary results are available intact. That would be helpful if those initial results held important data for debugging or whatever.
I'm not convinced that the original problem was caused by stale data; as I noted, it occurred even when I deleted the ~/.cache directory tree. That said, the original problem also disappeared prior to 14.04 GA, so that's a moot point. Because everything was changing so rapidly back then, and because the systems that exhibited the problem are now in OIL, it would be next to impossible to reproduce the original problem today.
As to dealing with the problem of a changed job definition, my initial reaction is that it would be appropriate to invalidate the original run and re-run everything from scratch (deleting the partial results from the first run). My reasoning is that a job definition should only have changed if a software update had occurred, so continuing runs the risk of having an inconsistent set of software used in test results -- that is, Test A might have version 1.2.3 of Package X, whereas Test B might be run using version 1.2.4 of Package X (where both tests obviously rely on Package X in some important way). Such an inconsistency strikes me as being potentially very confusing, should consistency be important when comparing test results (either manually or, at least in theory, by some other test). Displaying a warning to the effect that old results will be deleted and tests started again seems reasonable, as does giving the option to abort the run and leave whatever temporary results are available intact. That would be helpful if those initial results held important data for debugging or whatever.