There are following issues that make it easy for GCC to break this code by optimization:
- The zero-th operand is marked as output-only, but it is an input-output one: CMPXCHG8B reads the memory pointed to it, and also sets EDX:EAX if the operand and EDX:EAX are non-equal. It is also an output operand: the instruction sets it to ECX:EBX.
- The code probably tried to say that zero-th operand is also an input one by specifying it as the 4th operand ("m" (*a)), but this fails, because GCC does not guarantee putting the input and output operands referenced by the same C expression at the same location. Also this operand is not referenced in the assembly at all.
While we are at it, there are following issues that make the resulting code suboptimal:
- "volatile" is not needed as all the output operands are specified fully (i.e. no undescribed side-effects in the assembly);
- Saving EBX in the code is not required, just tell GCC that the code clobbers EBX and GCC will handle this;
- No need to force using ECX for input &set and output ret;
- Clobber "memory" is redundant as the code does not access memory in a way undescribed by constraints;
- Clobber "esp", I fail to understand its point at all.
but in fact the original bug there was that ebx operand had a wrong constraint (memory, failing because it is stack pointer-relative) when it should have been directly EBX register.
Before and after dissasambly (note the new version is shorter):
I believe this is an issue with 64-bit CAS implementation on IA32 systems. Here is the relevant code from include/ atomic/ x86_gcc. h:
asm volatile ("push %%ebx;"
"movl (%%ecx), %%ebx;"
"movl 4(%%ecx), %%ecx;"
LOCK_ prefix "; cmpxchg8b %0;"
"setz %2; pop %%ebx"
: "=m" (*a), "+A" (*cmp), "=c" (ret)
: "c" (&set), "m" (*a)
: "memory", "esp")
There are following issues that make it easy for GCC to break this code by optimization:
- The zero-th operand is marked as output-only, but it is an input-output one: CMPXCHG8B reads the memory pointed to it, and also sets EDX:EAX if the operand and EDX:EAX are non-equal. It is also an output operand: the instruction sets it to ECX:EBX.
- The code probably tried to say that zero-th operand is also an input one by specifying it as the 4th operand ("m" (*a)), but this fails, because GCC does not guarantee putting the input and output operands referenced by the same C expression at the same location. Also this operand is not referenced in the assembly at all.
While we are at it, there are following issues that make the resulting code suboptimal:
- "volatile" is not needed as all the output operands are specified fully (i.e. no undescribed side-effects in the assembly);
- Saving EBX in the code is not required, just tell GCC that the code clobbers EBX and GCC will handle this;
- No need to force using ECX for input &set and output ret;
- Clobber "memory" is redundant as the code does not access memory in a way undescribed by constraints;
- Clobber "esp", I fail to understand its point at all.
The rewritten code:
int32 val_for_ebx= (set & 0xFFFFFFFF);
int32 val_for_ecx= set >> 32;
asm (LOCK_prefix "; cmpxchg8b %0;"
"setz %1"
: "+m" (*a), "=g" (ret), "+A" (*cmp)
: "b" (val_for_ebx), "c" (val_for_ecx)
: "flags")
Note that this is similar to bugs.mysql. com/bug. php?id= 52419 and lists.mysql. com/commits/ 118512
http://
http://
but in fact the original bug there was that ebx operand had a wrong constraint (memory, failing because it is stack pointer-relative) when it should have been directly EBX register.
Before and after dissasambly (note the new version is shorter):
0x08231070 <+219>: mov %esi,-0x30(%ebp)
0x08231073 <+222>: mov %ebx,-0x2c(%ebp)
0x08231076 <+225>: mov %edi,%ebx
0x08231078 <+227>: mov (%edi),%eax
0x0823107a <+229>: mov 0x4(%edi),%edx
0x0823107d <+232>: lea -0x20(%ebp),%ecx
0x08231080 <+235>: mov %ecx,-0x3c(%ebp)
0x08231083 <+238>: mov -0x30(%ebp),%esi
0x08231086 <+241>: mov -0x2c(%ebp),%edi
0x08231089 <+244>: add %eax,%esi
0x0823108b <+246>: adc %edx,%edi
0x0823108d <+248>: mov %esi,-0x20(%ebp)
0x08231090 <+251>: mov %edi,-0x1c(%ebp)
0x08231093 <+254>: mov -0x3c(%ebp),%ecx
0x08231096 <+257>: push %ebx
0x08231097 <+258>: mov (%ecx),%ebx
0x08231099 <+260>: mov 0x4(%ecx),%ecx
=> 0x0823109c <+263>: lock cmpxchg8b (%ebx)
0x082310a0 <+267>: sete %cl
0x082310a3 <+270>: pop %ebx
After:
0x08231062 <+205>: mov %esi,-0x28(%ebp)
0x08231065 <+208>: mov %ebx,-0x24(%ebp) ; Save EBX
0x08231068 <+211>: mov %edi,%esi
0x0823106a <+213>: mov (%edi),%eax
0x0823106c <+215>: mov 0x4(%edi),%edx
0x0823106f <+218>: mov -0x28(%ebp),%ecx
0x08231072 <+221>: mov -0x24(%ebp),%ebx ; Restore EBX
0x08231075 <+224>: add %eax,%ecx
0x08231077 <+226>: adc %edx,%ebx
0x08231079 <+228>: mov %ecx,-0x20(%ebp)
0x0823107c <+231>: mov %ebx,-0x1c(%ebp)
0x0823107f <+234>: mov %ecx,%ebx
0x08231081 <+236>: mov -0x1c(%ebp),%ecx
0x08231084 <+239>: lock cmpxchg8b (%esi)
0x08231088 <+243>: sete %bl