Some thoughts on this issue.
1. Why are 60-80% of the cases bad? Why not 100%? Where is the
randomness coming from?
- Looks like it's from the timing. When alarms are triggered, they
may or may not fire right at the requested moment. It's perfectly
fine to be late by a few msec. When the alarm is right on time, it
turns out to be a corner case (bug612620 Comment#21) that is not
handled well by sync.c.
- people with faster machines (than my 1997 vintage desktop) will
probably see a higher rate (e.g. 95%) of bad cases, because
faster machines have a higher chance of triggering the alarm right on
time.
2. Why was it not an issue in F12?
- in F12, the alarm is also triggered right on time, which
_would_ be the corner case. However, in F12 sync.c always invokes
SyncChangeCounter() a few more times than necessary after the
alarm is triggered. The net result is that the counter value is
never right on the border.
00:02:11.110 #5 SyncChangeCounter newval=60000, oldval=10003
00:02:11.110 #4 SyncAlarmTriggerFired alarm id 0x00c0000d,counter=60000
00:02:11.111 #5 SyncChangeCounter newval=60001, oldval=60000
Note that "newval= 60001", not 60000 (the border, aka
test_value). In F12 the newval always ends up a few msec past
the test value.
- in F13, this extra invocation of SyncChangeCounter is
eliminated. So when the alarm is triggered, newval remains right
on the border.
17:34:58.532 #5 SyncChangeCounter newval=60000, oldval=20010
17:34:58.533 #4 SyncAlarmTriggerFired alarm id 0x00c00015,counter=60000
17:35:04.796 #5 SyncChangeCounter newval= 1, oldval=60000
Note that, after #4 SyncAlarmTriggerFired, newval remains 60000,
the boundary condition that exposes an existing old bug. Also
note that the second "#5 SyncChangeCounter" in F13 came 6 sec
later, unlike F12, where it came within 1 msec.
- so my guess is that sync.c in F13 has some good improvements
(removing extra calls to SyncChangeCounter), which exposes an
existing old boundary-condition bug.
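For anyone who wants to run the same comparison on their own traces, here is a rough sketch that measures how long after SyncAlarmTriggerFired the next SyncChangeCounter arrives, and what newval it carries. The helper names (parse, after_alarm) are my own and are not from sync.c, and the line format is assumed to match the trace excerpts quoted above.

```python
import re

# Trace lines copied from the F12 and F13 excerpts above.
F12_TRACE = """\
00:02:11.110 #5 SyncChangeCounter newval=60000, oldval=10003
00:02:11.110 #4 SyncAlarmTriggerFired alarm id 0x00c0000d,counter=60000
00:02:11.111 #5 SyncChangeCounter newval=60001, oldval=60000
"""

F13_TRACE = """\
17:34:58.532 #5 SyncChangeCounter newval=60000, oldval=20010
17:34:58.533 #4 SyncAlarmTriggerFired alarm id 0x00c00015,counter=60000
17:35:04.796 #5 SyncChangeCounter newval= 1, oldval=60000
"""

# Assumed format: "HH:MM:SS.mmm #<n> <EventName> <rest of line>"
LINE_RE = re.compile(r"(\d+):(\d+):(\d+)\.(\d+) #\d+ (\w+)(.*)")
NEWVAL_RE = re.compile(r"newval=\s*(\d+)")


def parse(trace):
    """Turn trace text into a list of (time_ms, event_name, newval_or_None)."""
    events = []
    for line in trace.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a trace line
        h, mi, s, ms = (int(g) for g in m.groups()[:4])
        t_ms = ((h * 60 + mi) * 60 + s) * 1000 + ms
        nv = NEWVAL_RE.search(m.group(6))
        events.append((t_ms, m.group(5), int(nv.group(1)) if nv else None))
    return events


def after_alarm(trace):
    """Return (gap_ms, newval) of the first SyncChangeCounter after the alarm fires."""
    fired_at = None
    for t_ms, name, newval in parse(trace):
        if name == "SyncAlarmTriggerFired":
            fired_at = t_ms
        elif name == "SyncChangeCounter" and fired_at is not None:
            return t_ms - fired_at, newval
    return None  # no alarm, or no counter change after it


for label, trace in (("F12", F12_TRACE), ("F13", F13_TRACE)):
    gap_ms, newval = after_alarm(trace)
    print(f"{label}: next SyncChangeCounter after {gap_ms} ms, newval={newval}")
```

On the two excerpts this reports a 1 ms gap with newval=60001 for F12 and a 6263 ms gap with newval=1 for F13, i.e. in F13 the counter sits on the border for about 6 sec after the alarm fires.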