Inefficient initialization of bit-packed fields
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| GNU Arm Embedded Toolchain | Confirmed | Undecided | Unassigned | |
Bug Description
Greetings,
We'd like to replace macro-based field packing with a structure, preferably a bit field. However, when we made an implementation, the code size grew by about half a KiB for a small program.
Here's the principle of the macro, though the complete implementation is probably more elaborate (the exact shifts follow from the disassembly below):
#define X(a,b,c,d) (((a)<<0) | ((b)<<8) | ((c)<<16) | ((d)<<24))
The C++ language implementations are in the attachment.
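As the attachment is not reproduced here, a minimal sketch of what the bit-field replacement might look like follows, assuming the same little-endian byte layout implied by the macro and the disassembly below; the attached code may differ in names and details.

// Hypothetical sketch, assuming the layout of the macro above; not the attached code.
#include <cstdint>

struct Fields {
    uint32_t a : 8;   // bits 0..7
    uint32_t b : 8;   // bits 8..15
    uint32_t c : 8;   // bits 16..23
    uint32_t d : 8;   // bits 24..31
};

// Initialization of this kind is what produces the bfi sequence shown below at -Os.
Fields make_fields() {
    return Fields{1, 2, 3, 4};
}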
With -Os, we get something pretty verbose for initialization:
2: 2301 movs r3, #1
4: f363 0007 bfi r0, r3, #0, #8
8: 2302 movs r3, #2
a: f363 200f bfi r0, r3, #8, #8
e: 2303 movs r3, #3
10: f363 4017 bfi r0, r3, #16, #8
14: 2304 movs r3, #4
With -O3, we get a constant word load for the bit-field version, which seems pretty optimal size-wise. The byte-wise version is a little less compact because it handles truncation after loading the word.
It seems the generated machine code could be improved by selecting the -O3 strategy at -Os as well. The byte-wise initialization may deserve some attention too.
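One possible source-level workaround, not taken from the original report, is to assemble the packed word as a constant and copy it into the struct, which typically compiles to a single word load and store even at -Os. The struct layout and field values below are the assumptions from the sketch above.

// Hypothetical workaround sketch; assumes sizeof(Fields) == 4 and the layout above.
#include <cstdint>
#include <cstring>

struct Fields { uint32_t a : 8, b : 8, c : 8, d : 8; };  // same layout as the earlier sketch

Fields make_fields_packed() {
    const uint32_t packed = 0x04030201u;  // a=1, b=2, c=3, d=4, packed little-endian
    Fields f;
    std::memcpy(&f, &packed, sizeof f);   // usually folds to a single word store
    return f;
}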
Cheers
Changed in gcc-arm-embedded: status: New → Confirmed
Checking the CPU selection, this seems to be related to the M4: the M0 target generates efficient code at -Os. It is also unclear why the function lookup_bifi built with -O3 for the M4 is longer than the M0 version.
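For anyone reproducing the M4 vs M0 comparison, the builds were presumably done along these lines (the source file name is illustrative, not from the report):

arm-none-eabi-g++ -mcpu=cortex-m4 -mthumb -Os -c bitfield.cpp -o m4-Os.o
arm-none-eabi-g++ -mcpu=cortex-m0 -mthumb -Os -c bitfield.cpp -o m0-Os.o
arm-none-eabi-objdump -d m4-Os.o   # inspect the generated initialization sequence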