mumax3 test suite fails against glibc 2.38

Bug #2032624 reported by Simon Chopin
This bug affects 1 person
Affects                    Status       Importance  Assigned to  Milestone
GLibC                      New          Medium
cbmc (Ubuntu)              New          Undecided   Unassigned
cxref (Ubuntu)             New          Undecided   Unassigned
gauche-c-wrapper (Ubuntu)  New          Undecided   Unassigned
glibc (Ubuntu)             In Progress  Medium      Unassigned
mumax3 (Ubuntu)            New          Critical    Unassigned
nvidia-nccl (Ubuntu)       New          Undecided   Unassigned
rocm-hipamd (Ubuntu)       New          Undecided   Unassigned
stdgpu-contrib (Ubuntu)    New          Undecided   Unassigned

Bug Description

The autopkgtests fail with the following error:

921s nvcc -std=c++03 -ccbin=/usr/bin/cuda-gcc --compiler-options -Werror --compiler-options -Wall -Xptxas -O3 -ptx -arch=compute_50 -code=sm_50 copypadmul2.cu -o copypadmul2_50.ptx
922s /usr/include/aarch64-linux-gnu/bits/math-vector.h(30): error: identifier "__Float32x4_t" is undefined
922s
922s /usr/include/aarch64-linux-gnu/bits/math-vector.h(31): error: identifier "__Float64x2_t" is undefined
922s
922s /usr/include/aarch64-linux-gnu/bits/math-vector.h(40): error: identifier "__SVFloat32_t" is undefined
922s
922s /usr/include/aarch64-linux-gnu/bits/math-vector.h(41): error: identifier "__SVFloat64_t" is undefined
922s
922s /usr/include/aarch64-linux-gnu/bits/math-vector.h(42): error: identifier "__SVBool_t" is undefined

Marking as critical as this blocks the glibc transition.

Simon Chopin (schopin)
Changed in glibc (Ubuntu):
importance: Undecided → Critical
tags: added: update-excuse
tags: added: foundations-todo
Mitchell Dzurick (mitchdz) wrote :

This commit https://sourceware.org/git/?p=glibc.git;a=commit;h=cd94326a1326c4e3f1ee7a8d0a161cc0bdcaf07e added the file `sysdeps/aarch64/fpu/bits/math-vector.h`.

On a mantic system, the header is installed at /usr/include/aarch64-linux-gnu/bits/math-vector.h. Before the commit, it did only one thing on aarch64:
#include <bits/libm-simd-decl-stubs.h>

After the commit, it also defines a few types, such as:

#if __GNUC_PREREQ(9, 0)
# define __ADVSIMD_VEC_MATH_SUPPORTED
typedef __Float32x4_t __f32x4_t;
typedef __Float64x2_t __f64x2_t;
...

Simply commenting out the new types is enough to fix this issue, but completely removing the newly added support for libmvec is not a great idea.

Perhaps nvidia-cuda-toolkit-gcc needs to be rebuilt with support for these types?

Graham Inggs (ginggs) wrote :

The nvidia-cuda-toolkit-gcc package only contains the /usr/bin/cuda-g++ and /usr/bin/cuda-gcc wrappers and has a dependency on the highest supported g++, currently g++-12.

See: https://packages.ubuntu.com/mantic/devel/nvidia-cuda-toolkit-gcc

Mitchell Dzurick (mitchdz) wrote :

Tried a no-change rebuild of nvidia-cuda-toolkit (https://launchpad.net/~mitchdz/+archive/ubuntu/nvidia-cuda-toolkit-mantic-merge) using the proposed archive and that did not solve the problem.

Mitchell Dzurick (mitchdz) wrote :

Ah, I posted my comment right after yours, ginggs. Thanks for the pointer! You're right, on my system cuda-gcc just points to gcc-12:

$ ll $(which /usr/bin/cuda-gcc)
lrwxrwxrwx 1 root root 6 Aug 23 14:17 /usr/bin/cuda-gcc -> gcc-12*

I tried gcc-13 instead, hoping that version would recognize these new types, but __Float32x4_t is still undefined, and some further types are now undefined as well:

nvcc -std=c++11 -ccbin=/usr/bin/g++-13 --allow-unsupported-compiler --compiler-options -Werror --compiler-options -Wall -Xptxas -O3 -ptx -arch=compute_50 -code=sm_50 copypadmul2.cu -o copypadmul2_50.ptx
...
/usr/include/stdlib.h(147): error: identifier "_Float64" is undefined
/usr/include/stdlib.h(153): error: identifier "_Float128" is undefined
/usr/include/stdlib.h(159): error: identifier "_Float32x" is undefined
/usr/include/stdlib.h(165): error: identifier "_Float64x" is undefined
...

One more note: these particular CUDA code snippets don't really need these types, so finding a way to not include them would work (maybe by patching libc6-dev to add another preprocessor directive). Ultimately I think that's a bad idea, though, because someone could want a .cu file that uses Arm SIMD extensions alongside the CUDA code.

Graham Inggs (ginggs) wrote :

Some similar reports I found (although from some years ago):

https://forums.developer.nvidia.com/t/nvcc-compilation-errors-on-24-2-l4t-platform-tx1/45937

https://github.com/InsightSoftwareConsortium/ITK/issues/1959

"The user space in R23.x is 32-bit. NEON is also from the 32-bit compatibility mode that makes ARMv8 able to execute armhf. The errors tend to imply that some 32-bit compatibility mode library for NEON is missing."

Seems to imply some mismatch between NEON (32-bit) and arm64?

Graham Inggs (ginggs) wrote :

We'll ignore this failure and allow glibc to migrate, and that does not preclude further investigation.

Note that mumax3/arm64 is not built in Debian, and did not build in jammy, so we may end up removing the arm64 binary.

Matthias Klose (doko) wrote :

Removing packages from mantic:
 mumax3 3.10-8 in mantic arm64
Comment: LP: #2032624, remove mumax3 binary on arm64
1 package successfully removed.

Daniel van Vugt (vanvugt) wrote :

This seems to be causing bug 2033747 too.

Graham Inggs (ginggs) wrote :

This also seems to cause nvidia-nccl to FTBFS on arm64 in the test rebuild

https://people.canonical.com/~ginggs/ftbfs-report/test-rebuild-20230830-mantic-mantic.html

tags: added: ftbfs
Graham Inggs (ginggs) wrote :

cxref also FTBFS on arm64 in the test rebuild

Graham Inggs (ginggs) wrote :

Also gauche-c-wrapper, rocm-hipamd and stdgpu-contrib

Heinrich Schuchardt (xypron) wrote :

cbmc fails to build from source on arm64 with LTO disabled as reported in LP 2036745:

Failed test: fmod1
CBMC version 5.89.0 (cbmc-5.89.0) 64-bit arm64 linux
Parsing main.c
file /usr/include/aarch64-linux-gnu/bits/math-vector.h line 30: syntax error before '__f32x4_t'
PARSING ERROR

https://launchpadlibrarian.net/688275364/buildlog_ubuntu-mantic-arm64.cbmc_5.89.0-2ubuntu1~ppa1_BUILDING.txt.gz

In , Simon Chopin (schopin) wrote :

The use of vector types such as __Float32x4_t in the aarch64 math-vector.h header breaks quite a few programs that essentially parse C code themselves but use GCC as their preprocessor. Because the preprocessor is GCC, the header takes the code paths that use GCC's own intrinsic types, which the consuming programs do not implement.

I'm not sure if this qualifies as a bug in glibc, as it seems reasonable to rely on those types, but we've seen this happen in quite a few instances in Ubuntu:

https://bugs.launchpad.net/ubuntu/+source/mumax3/+bug/2032624

Simon Chopin (schopin) wrote :
Changed in glibc:
importance: Unknown → Medium
status: Unknown → New
In , Simon Chopin (schopin) wrote :

I posted a tentative patch adding a way to work around those types at https://sourceware.org/pipermail/libc-alpha/2023-September/151770.html

I'll ship it in my next Ubuntu upload for Mantic as a way to unblock us due to our fairly tight schedule, but I'm hoping we can come up with a better long-term solution.

Simon Chopin (schopin) wrote :

I'll be shipping a temporary workaround patch that disables the vec types if __ARM_VEC_MATH_DISABLED is defined. We still need to patch each failure individually to add that flag to the preprocessor step (not at build time but at runtime!), but at least the patching should be easier and quicker than providing proper support for the various vector types.

We shouldn't bother upstreaming those fixes to Debian, as I'm pretty sure the final glibc part of the solution will look fairly different from my current patch, but at least we can get those packages working in the meantime.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package glibc - 2.38-1ubuntu5

---------------
glibc (2.38-1ubuntu5) mantic; urgency=medium

  * Update from upstream release branch:
    - CVE-2023-4527: Stack read overflow with large TCP responses in
      no-aaaa mode
    - CVE-2023-4806: use after free in getcanonname
    - LP: #2031909: Fix oversized __io_vtables
  * d/p/u/0001-Fix-leak-in-getaddrinfo-introduced-by-the-fix-for-CV:
    Cherry-picked to fix a regression in one of the previous CVE fixes
    (LP: #2037516, CVE-2023-5156)
  * d/p/lp2032624.patch: add an escape hatch in arm64 math-vector.h.
    This should help fixing multiple FTBFS (LP: #2032624)

 -- Simon Chopin <email address hidden> Wed, 27 Sep 2023 16:38:18 +0200

Changed in glibc (Ubuntu):
status: New → Fix Released
Simon Chopin (schopin) wrote :

Reopening in glibc, as I had some upstream feedback that basically means my workaround is not a good idea. I agree with them, so we should drop it, both in upcoming releases and in the upcoming Mantic SRU, to avoid users starting to depend on it, however unlikely that would be.

Changed in glibc (Ubuntu):
importance: Critical → Medium
status: Fix Released → In Progress
In , Connor-baker (connor-baker) wrote :

Adding some additional context:

We're running into this issue in Nixpkgs: https://github.com/NixOS/nixpkgs/pull/264599#pullrequestreview-1707381631.

The GLIBC 2.38 update introduces intrinsics for `aarch64-linux` in `math.h`.

NVCC (NVIDIA's CUDA Compiler) declares itself to be the same compiler as its host compiler. This causes inclusion of unsupported `aarch64-linux` intrinsics. NVCC is now unable to compile any CUDA file for `aarch64-linux` because it does not support these intrinsics: https://forums.developer.nvidia.com/t/nvcc-fails-to-build-with-arm-neon-instructions-cpp-vs-cu/248355/2.

I'll be submitting the same patch I've made for Nixpkgs.
