ondra and I have been hammering away at this, but progress is painfully slow given that:
a) the problem is not seen on every boot.
b) we can only view the end of the kmsg log.
c) rebuild times are relatively slow.
From what ondra says he's seen today, it sounds as though we might be hitting a stack corruption issue: the debug output I've given him is not being displayed as expected. Even with the initial fix I created (based on code inspection alone), ondra is still seeing exactly the same assertion failure that we expected that fix to resolve.
I've tried various ways to recreate the issue (on both a device and a normal system, using code review, code analysis tools and runtime checkers), but have so far been unsuccessful.
Current work-arounds:
1) Keep using '--no-log' in the kernel command-line.
Pros: reliable.
Cons: means that no system jobs get their output logged.
2) Disable the /etc/init/flush-early-job-log.conf job.
Pros: seems to be reliable (but needs further testing).
Cons: means early job output is not logged (however, on the device in question the only such output seems to come from /etc/init/container-detect.conf, and that output is not even required).