I rebuilt glibc 2.23 from the 16.04 sources and changed the values written to the adapt_count parameter in the lock elision code. adapt_count is a short, and the original code may store the values 0, 1, 2, or 3 in it. We had been seeing either 1 (canary hit in the constructor) or 0 (canary hit in the destructor). I changed the code to use 0x3333, 0x2222, 0x1111, and 0 instead, and I just saw the constructor canary hit with the expected values 0x11, 0x11 in the changed bytes. That ties the stray write to the adapt_count store, so this is a race condition in the lock elision code: the mutex is located on the stack, and its memory is reused quickly by another hardware thread on the same processor core.
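To illustrate why the distinctive values help, here is a minimal sketch in C. It is not the glibc code: the canary fill byte, the buffer size, and the helper names (`stray_short_store`, `first_changed`, `matches_short`) are all hypothetical. It only simulates a stray 16-bit store landing in a canary region and shows that a value like 0x1111 leaves the same two bytes (0x11, 0x11) regardless of byte order, which makes it much easier to attribute than a plain 0 or 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical canary fill pattern; the real canary contents are not known here. */
#define CANARY_BYTE 0xAA

/* Simulate a stray 16-bit store (standing in for a late adapt_count update)
   landing at byte offset `off` inside a canary region of `len` bytes. */
static void stray_short_store(unsigned char *buf, size_t len, size_t off,
                              uint16_t value)
{
    assert(off + sizeof value <= len);
    memcpy(buf + off, &value, sizeof value);
}

/* Return the offset of the first byte that no longer holds the canary fill,
   or -1 if the region is intact. */
static long first_changed(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != CANARY_BYTE)
            return (long)i;
    return -1;
}

/* Check whether the two bytes at `off` equal the in-memory image of `value`. */
static int matches_short(const unsigned char *buf, size_t off, uint16_t value)
{
    return memcmp(buf + off, &value, sizeof value) == 0;
}

#ifdef DEMO_MAIN
#include <stdio.h>
int main(void)
{
    unsigned char canary[16];
    memset(canary, CANARY_BYTE, sizeof canary);

    /* Pretend the elision code wrote its marker into reused stack memory. */
    stray_short_store(canary, sizeof canary, 6, 0x1111);

    long off = first_changed(canary, sizeof canary);
    printf("first changed byte at offset %ld, matches 0x1111: %d\n",
           off, matches_short(canary, (size_t)off, 0x1111));
    return 0;
}
#endif
```

Compiling with `-DDEMO_MAIN` runs the small demo. With the original values, a corrupted canary showing 0 or 1 could have come from almost anywhere; a pair of 0x11 bytes is far harder to explain by any writer other than the patched store.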