> [ 5.134271] kernel: [drm:detect_link_and_local_sink [amdgpu]] *ERROR* No EDID read.
> [ 5.322247] kernel: [drm:detect_link_and_local_sink [amdgpu]] *ERROR* No EDID read.
> [ 5.510230] kernel: [drm:detect_link_and_local_sink [amdgpu]] *ERROR* No EDID read.
Is this connected to a KVM? The failure to read the EDID is concerning.
> UBSAN warnings could be a red herring. They've added a compiler flag that complains about flexible arrays if they're declared incorrectly (false positive). Will take a look tomorrow.
Yeah, I agree they're probably a red herring. The actual issue is that the UVD IP block fails to init due to a timeout (-110 is -ETIMEDOUT).
[ 6.025262] kernel: amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring uvd test failed (-110)
[ 6.025511] kernel: [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <uvd_v6_0> failed -110
[ 6.025661] kernel: amdgpu 0000:01:00.0: amdgpu: amdgpu_device_ip_init failed
[ 6.025663] kernel: amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
[ 6.025737] kernel: amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
As a potential workaround (this isn't a solution), you might be able to skip the uvd_v6_0 IP block init.
To do this, you need to look up its IP block number, which you can read from your logs:
[ 4.836457] kernel: [drm] add ip block number 0 <vi_common>
[ 4.836458] kernel: [drm] add ip block number 1 <gmc_v8_0>
[ 4.836459] kernel: [drm] add ip block number 2 <tonga_ih>
[ 4.836459] kernel: [drm] add ip block number 3 <gfx_v8_0>
[ 4.836460] kernel: [drm] add ip block number 4 <sdma_v3_0>
[ 4.836461] kernel: [drm] add ip block number 5 <powerplay>
[ 4.836462] kernel: [drm] add ip block number 6 <dm>
[ 4.836462] kernel: [drm] add ip block number 7 <uvd_v6_0>
[ 4.836463] kernel: [drm] add ip block number 8 <vce_v3_0>
Then you can add "amdgpu.ip_block_mask=0xffffff7f" to your kernel command line to skip IP block 7 (uvd_v6_0).
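In case it helps anyone with a different block list, here is a rough sketch of how that mask value can be derived. It assumes a 32-bit mask in which bit N enables IP block number N, which is consistent with 0xffffff7f skipping block 7. The helper names `find_ip_block` and `mask_without` are made up for illustration and are not part of any amdgpu tooling.

```python
import re

# Matches lines like: "[ 4.836462] kernel: [drm] add ip block number 7 <uvd_v6_0>"
IP_BLOCK_RE = re.compile(r"add ip block number (\d+) <([^>]+)>")


def find_ip_block(log_text, name):
    """Return the index of the named IP block, or None if it is not in the log."""
    for match in IP_BLOCK_RE.finditer(log_text):
        if match.group(2) == name:
            return int(match.group(1))
    return None


def mask_without(block_index, width=32):
    """Return an all-ones mask of `width` bits with bit `block_index` cleared.

    Assumption: bit N of the mask enables IP block number N.
    """
    full = (1 << width) - 1
    return full & ~(1 << block_index)


# A few lines taken from the log in this report.
log = """\
[ 4.836462] kernel: [drm] add ip block number 6 <dm>
[ 4.836462] kernel: [drm] add ip block number 7 <uvd_v6_0>
[ 4.836463] kernel: [drm] add ip block number 8 <vce_v3_0>
"""

index = find_ip_block(log, "uvd_v6_0")
print(f"amdgpu.ip_block_mask={mask_without(index):#010x}")
# prints: amdgpu.ip_block_mask=0xffffff7f
```

The block numbering can differ between ASICs and kernel versions, so the index should come from the "add ip block number" lines of the affected machine rather than being copied from here.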
If that helps, it confirms that the out-of-bounds check is a red herring and the real issue is the UVD init failure. I'd still like to see data points for the other kernels I suggested, to narrow down when this problem started.