AMD Rembrandt / Phoenix PSR-SU related freezes

Bug #2024774 reported by Mario Limonciello
This bug affects 2 people
Affects                    Status                     Importance   Assigned to   Milestone
Linux Firmware             Fix Released               Unknown
linux-firmware (Ubuntu)    Status tracked in Mantic
  Jammy                    Fix Released               Undecided    Unassigned
  Lunar                    Fix Released               Undecided    Unassigned
  Mantic                   Fix Released               Undecided    Unassigned

Bug Description

[ Impact ]

 Starting with kernel 6.2, AMD has enabled PSR Selective Update (PSR-SU).
 After an unpredictable amount of time, the system may hang with a message like this in the logs:
 "[amdgpu 0000:67:00.0: [drm] *ERROR* [CONNECTOR:78:eDP-1] commit wait timed out]"

 Affects users of laptops that contain:
 * AMD Rembrandt (Yellow Carp) or AMD Phoenix (Pink Sardine) chips
 * eDP panels with Parade TCONs (8-03 and 8-01 both reported to fail)
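For triage on a suspect machine, here is a minimal Python sketch that scans kernel log lines for the hang signature quoted above. The regular expression is modelled only on that one example message. `find_commit_timeouts` is a hypothetical helper and is not part of any AMD or Ubuntu tooling. The PCI address and connector ID will differ from machine to machine.

```python
import re
from typing import Dict, Iterable, List

# Modelled on the single example message in this report; the PCI address
# and connector ID differ from machine to machine.
COMMIT_TIMEOUT = re.compile(
    r"amdgpu (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"\[drm\] \*ERROR\* \[CONNECTOR:(?P<conn_id>\d+):(?P<conn>[\w-]+)\] "
    r"commit wait timed out"
)


def find_commit_timeouts(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the PCI address and connector of every line carrying the signature."""
    hits = []
    for line in lines:
        match = COMMIT_TIMEOUT.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


# Example using the message quoted in the bug description.
sample = "[amdgpu 0000:67:00.0: [drm] *ERROR* [CONNECTOR:78:eDP-1] commit wait timed out]"
print(find_commit_timeouts([sample, "an unrelated line"]))
```

The input lines could come from, for example, `journalctl -k -b -1` (kernel messages from the previous boot). A hang usually forces a reboot before anyone can inspect the log.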

[ Test Plan ]

 * Test an affected laptop with the newer firmware and ensure that PSR-SU can be enabled and the system remains stable.
 * Ensure other functions such as hotplugging monitors and suspending continue to work.

[ Where problems could occur ]

 * The affected firmware is only loaded on Rembrandt and Phoenix laptops, so any problems would be localized to these machines.

[ Other Info ]
The minimum firmware versions needed to avoid these hangs:
* Rembrandt: 0x400003a or later
* Phoenix: 0x8001000 or later
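Here is a quick way to check a reported DMCUB version against these minimums, sketched in Python. The sketch assumes that the hex versions for one firmware file can be compared as plain integers. This matches how the thread treats them: 0x400003c is called newer than 0x400003a. The table and function names are illustrative only. Rembrandt and Phoenix versions use different prefixes, so never compare them with each other.

```python
# Minimum DMCUB firmware versions from this report, keyed by firmware file
# (file names taken from the commit notes in this report).
MIN_DMCUB = {
    "amdgpu/yellow_carp_dmcub.bin": 0x400003A,  # Rembrandt
    "amdgpu/dcn_3_1_4_dmcub.bin": 0x8001000,    # Phoenix
}


def meets_minimum(firmware_file: str, reported: str) -> bool:
    """Check a reported version string such as "0x0400003c" against the minimum.

    Assumption: versions of the same firmware file compare as plain integers,
    matching how this thread treats 0x400003c as newer than 0x400003a.
    """
    if firmware_file not in MIN_DMCUB:
        raise ValueError(f"no minimum recorded for {firmware_file!r}")
    return int(reported, 16) >= MIN_DMCUB[firmware_file]


# Versions shipped in Jammy's 0ubuntu3.14 package, as listed in this thread.
print(meets_minimum("amdgpu/yellow_carp_dmcub.bin", "0x4000022"))  # False
print(meets_minimum("amdgpu/dcn_3_1_4_dmcub.bin", "0x8000e00"))    # False
```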

The following commit upgrades the firmware for Rembrandt (amdgpu/yellow_carp_dmcub.bin) to 0x400003c:
9dbd8ec2 ("amdgpu: DMCUB updates for various AMDGPU asics")

The following commit upgrades the firmware for Phoenix (amdgpu/dcn_3_1_4_dmcub.bin) to 0x8001a00:
045b2136 ("amdgpu: update DMCUB to v0.0.172.0 for various AMDGPU ASICs")

The TCON in a given laptop can be identified from the DPCD with this script:
https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/psr.py
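The script's output can be checked against the affected-TCON description in this report. This Python sketch assumes the output layout of the sample posted in this thread: a `DRI device N DMCUB F/W version:` line followed by `○ key: value` lines. It also assumes that the "8-03" and "8-01" TCONs appear as `ID String: 08-03` and `ID String: 08-01`. `parse_psr_report` and `has_affected_tcon` are hypothetical helpers, and the sketch handles only one DRI device per report.

```python
import re
from typing import Dict

# Affected TCONs named in this report: Parade parts with ID strings 8-03 and 8-01
# (assumed to be printed with a leading zero, as in the sample output).
AFFECTED_PARADE_IDS = {"08-03", "08-01"}
VERSION_LINE = re.compile(r"DRI device (\d+) DMCUB F/W version: (0x[0-9a-fA-F]+)")


def parse_psr_report(text: str) -> Dict[str, str]:
    """Collect the "key: value" lines of psr.py-style output into a dict."""
    info: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip().lstrip("○").strip()  # drop the bullet glyph
        match = VERSION_LINE.match(line)
        if match:
            info["dri_device"], info["dmcub_version"] = match.groups()
            continue
        key, sep, value = line.partition(": ")
        if sep:
            info[key] = value
    return info


def has_affected_tcon(info: Dict[str, str]) -> bool:
    """True for the Parade sink / ID-string combinations reported to fail."""
    return info.get("Sink OUI") == "Parade" and info.get("ID String") in AFFECTED_PARADE_IDS
```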

Mario Limonciello (superm1) wrote :

I included tasks for Jammy because the 6.2 HWE kernel will be backported soon and will expose this issue in Jammy.

description: updated
Mario Limonciello (superm1) wrote :

For reference, the current version of linux-firmware (20220329.git681281e4-0ubuntu3.14) contains the following versions:

Rembrandt: 0x4000022
Phoenix: 0x8000e00

Changed in linux-firmware:
status: Unknown → New
Juerg Haefliger (juergh)
tags: added: kern-7206
Mario Limonciello (superm1) wrote :

Upstream linux-firmware has a tag (20230625) that includes these firmware updates, so perhaps the Mantic task can be closed by syncing to that tag.

Juerg Haefliger (juergh) wrote :
Mario Limonciello (superm1) wrote :

Thanks, I see Mantic has migrated as well; closing the Mantic task.

Changed in linux-firmware (Ubuntu Mantic):
status: New → Fix Released
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-firmware (Ubuntu Jammy):
status: New → Confirmed
Changed in linux-firmware (Ubuntu Lunar):
status: New → Confirmed
Roemer Claasen (rclaasen) wrote :

Hi Mario,

Not entirely sure if this is the exact logging you are looking for, but this is the result from my testing.

I've installed the latest firmware and am running mainline kernel 6.4.0 on Jammy LTS on an AMD 6850U T14s Gen 3.

Hope this helps to pinpoint the PSR problems. If you would like me to try patches or would like more logs, please let me know what I can do to support.

Thanks for all the work, kind regards,

Roemer

➜ ~ sudo python Downloads/psr.py

DRI device 0 DMCUB F/W version: 0x0400003c
○ PSR 2 with Y coordinates (eDP 1.4a) [3]
○ Sink OUI: Parade
○ resv_40f: 01
○ ID String: 08-03
○ PSR Status: 00-00-02

➜ ~ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.4.0-060400-generic root=UUID=41d8f993-282a-48a5-b355-6f537a3a17ab ro quiet splash amdgpu.dcdebugmask=0x0 vt.handoff=7

Linux firmware
➜ ~ git log
commit ee91452dac5abfc4c5b9827cf55e701d8c0ca678 (HEAD -> main, tag: 20230625, origin/main, origin/HEAD)
Author: Emil Velikov <email address hidden>
Date: Mon Jun 5 14:58:12 2023 +0100

20230703 kern.log: amdgpu crash, no freeze, system recovering

Jul 3 19:45:50 rct14s kernel: [ 1001.214627] ------------[ cut here ]------------
Jul 3 19:45:50 rct14s kernel: [ 1001.214638] WARNING: CPU: 3 PID: 237 at drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_psr.c:126 dmub_psr_get_state+0xcc/0xe0 [amdgpu]
Jul 3 19:45:50 rct14s kernel: [ 1001.215248] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc nvme_fabrics ccm michael_mic rfcomm vboxnetadp(OE) vboxnetflt(OE) snd_seq_dummy snd_hrtimer vboxdrv(OE) cmac algif_hash algif_skcipher af_alg bnep overlay qrtr_mhi amdgpu snd_soc_dmic snd_acp6x_pdm_dma snd_soc_acp6x_mach snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci intel_rapl_msr snd_sof_xtensa_dsp intel_rapl_common snd_sof snd_sof_utils edac_mce_amd qrtr snd_soc_core ath11k_pci snd_compress ath11k btusb ac97_bus snd_ctl_led snd_pcm_dmaengine kvm_amd btrtl iommu_v2 uvcvideo snd_pci_ps btbcm drm_buddy joydev snd_hda_codec_realtek btintel videobuf2_vmalloc qmi_helpers kvm snd_rpl_pci_acp6x btmtk gpu_sched snd_hda_codec_generic binfmt_misc snd_hda_codec_hdmi uvc mac80211 snd_acp_pci irqbypass snd_hda_intel drm_suballoc_helper snd_seq_midi crct10dif_pclmul
Jul 3 19:45:50 rct14s kernel: [ 1001.215324] bluetooth drm_ttm_helper snd_seq_midi_event snd_intel_dspcfg videobuf2_memops snd_pci_acp6x ttm polyval_clmulni videobuf2_v4l2 snd_intel_sdw_acpi ecdh_generic snd_rawmidi snd_hda_codec drm_display_helper polyval_generic nls_iso8859_1 ecc input_leds ghash_clmulni_intel snd_pci_acp5x videodev snd_hda_core sha512_ssse3 cec cfg80211 aesni_intel snd_hwdep rc_core videobuf2_common snd_seq thinkpad_acpi crypto_simd snd_rn_pci_acp3x snd_pcm nvram drm_kms_helper mc libarc4 cryptd think_lmi snd_acp_config ledtrig_audio serio_raw i2c_algo_bit snd_seq_device hid_multitouch snd_soc_acpi firmware_attributes_class wmi_bmof platform_profile sch_fq_codel k10temp ucsi_acpi syscopyarea rapl mhi sysfillrect snd_pci_acp3x sysimgblt snd_timer typec_...

Mario Limonciello (superm1) wrote :

> Not entirely sure if this is the exact logging you are looking for, but this is the result from my testing.

Thanks for having a try.

> ➜ ~ sudo python Downloads/psr.py
> DRI device 0 DMCUB F/W version: 0x0400003c
> ○ PSR 2 with Y coordinates (eDP 1.4a) [3]
> ○ Sink OUI: Parade
> ○ resv_40f: 01
> ○ ID String: 08-03
> ○ PSR Status: 00-00-02

You do have an affected panel, and if the system isn't freezing, this shows that the updated firmware worked. The version you tested (0x0400003c) is a little newer than the one I suggested (0x400003a), but that's fine; the fix carries forward.

> BOOT_IMAGE=/boot/vmlinuz-6.4.0-060400-generic root=UUID=41d8f993-282a-48a5-b355-6f537a3a17ab ro quiet splash amdgpu.dcdebugmask=0x0 vt.handoff=7

Good; you don't have PSR manually disabled.
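The check done by eye here can be sketched in Python. The bit value `DC_DISABLE_PSR = 0x10` is an assumption taken from the kernel's `DC_DEBUG_MASK` enum in `drivers/gpu/drm/amd/include/amd_shared.h`, so confirm it for the running kernel. The function names are hypothetical, and the sketch does not model any other PSR-related bits that newer kernels may define.

```python
from typing import Optional

# Assumed value of DC_DISABLE_PSR from the DC_DEBUG_MASK enum in
# drivers/gpu/drm/amd/include/amd_shared.h; confirm it for your kernel.
# Other PSR-related bits that newer kernels may define are not modelled.
DC_DISABLE_PSR = 0x10


def dcdebugmask(cmdline: str) -> Optional[int]:
    """Return amdgpu.dcdebugmask from a kernel command line, or None if unset."""
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if sep and key == "amdgpu.dcdebugmask":
            value = int(raw, 0)  # base 0 accepts both "0x10" and "16"
    return value  # if repeated, the last occurrence is kept


def psr_disabled_on_cmdline(cmdline: str) -> bool:
    """True if the debug mask on the command line sets the PSR-disable bit."""
    return bool((dcdebugmask(cmdline) or 0) & DC_DISABLE_PSR)
```

On a live system, the input would be the contents of `/proc/cmdline`.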

> Jul 3 19:45:50 rct14s kernel: [ 1001.214638] WARNING: CPU: 3 PID: 237 at drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_psr.c:126 dmub_psr_get_state+0xcc/0xe0 [amdgpu]

We're tracking this warning at https://gitlab.freedesktop.org/drm/amd/-/issues/2645

Timo Aaltonen (tjaalton) wrote :

The upstream commits update firmware for other ASICs too, meaning that we'd need to pull in earlier updates for all of them, not just Rembrandt/Phoenix.

Mario Limonciello (superm1) wrote :

Ah, in that case I would say it shouldn't be a cherry pick. Just pick the file.

I.e., we don't need the whole commit for either, just amdgpu/yellow_carp_dmcub.bin (for RMB) and amdgpu/dcn_3_1_4_dmcub.bin (for PHX).
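Here is the "just pick the file" approach, sketched in Python against a local clone of upstream linux-firmware. `git cat-file blob <commit>:<path>` prints one file's contents as of a given commit without touching the rest of the tree. The commit IDs and paths come from this report. The helper names and destination layout are hypothetical, and the abbreviated hashes must resolve uniquely in the clone. Ubuntu ultimately shipped the change as a SAUCE patch (see the package changelogs in this report), so this only illustrates the idea.

```python
import subprocess
from pathlib import Path
from typing import List

# The two files, and the upstream commits that update them, named in this report.
WANTED = {
    "amdgpu/yellow_carp_dmcub.bin": "9dbd8ec2",  # Rembrandt
    "amdgpu/dcn_3_1_4_dmcub.bin": "045b2136",    # Phoenix
}


def cat_file_args(repo: str, commit: str, path: str) -> List[str]:
    """Build a `git cat-file blob <commit>:<path>` command for one file."""
    return ["git", "-C", repo, "cat-file", "blob", f"{commit}:{path}"]


def extract_file(repo: str, commit: str, path: str, dest_root: str) -> Path:
    """Copy a single file as it exists at `commit`, ignoring the rest of the commit."""
    blob = subprocess.run(
        cat_file_args(repo, commit, path), check=True, capture_output=True
    ).stdout
    out = Path(dest_root) / path
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(blob)
    return out


def extract_all(repo: str, dest_root: str) -> List[Path]:
    """Extract every file in WANTED from its listed commit."""
    return [extract_file(repo, commit, path, dest_root) for path, commit in WANTED.items()]
```

A call such as `extract_all("/path/to/linux-firmware", "staging")` (both paths hypothetical) would leave only the two `.bin` files under `staging/amdgpu/`.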

Juerg Haefliger (juergh)
Changed in linux-firmware (Ubuntu Jammy):
status: Confirmed → Fix Committed
Changed in linux-firmware (Ubuntu Lunar):
status: Confirmed → Fix Committed
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Mario, or anyone else affected,

Accepted linux-firmware into lunar-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/linux-firmware/20230323.gitbcdcfbcf-0ubuntu1.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-lunar to verification-done-lunar. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-lunar. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Timo Aaltonen (tjaalton) wrote :

Hello Mario, or anyone else affected,

Accepted linux-firmware into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/linux-firmware/20220329.git681281e4-0ubuntu3.15 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Mario Limonciello (superm1) wrote :

I've verified that the PHX Jammy update (which jumped to 0x8001b00) doesn't cause any additional display or suspend regressions with a non-problematic panel, on both 6.4-rc1 and 6.1-OEM 1015.

Changed in linux-firmware:
status: New → Fix Released
Robie Basak (racb) wrote :

Hello Mario, or anyone else affected,

Accepted linux-firmware into lunar-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/linux-firmware/20230323.gitbcdcfbcf-0ubuntu1.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-lunar to verification-done-lunar. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-lunar. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Robie Basak (racb) wrote :

Hello Mario, or anyone else affected,

Accepted linux-firmware into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/linux-firmware/20220329.git681281e4-0ubuntu3.16 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: verification-done-jammy
Timo Aaltonen (tjaalton) wrote :

How about verifying Lunar too?

Mario Limonciello (superm1) wrote :

Yeah, I double-checked the Lunar kernel 6.2.0-26-generic with the newer GPU firmware on both Rembrandt and Phoenix platforms and didn't observe any new problems.

tags: added: verification-done-lunar
Timo Aaltonen (tjaalton) wrote : Update Released

The verification of the Stable Release Update for linux-firmware has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-firmware - 20220329.git681281e4-0ubuntu3.16

---------------
linux-firmware (20220329.git681281e4-0ubuntu3.16) jammy; urgency=medium

  * Follow-up: potential S3 issue for amdgpu Navi 31/Navi33 (LP: #2027959)
    - amdgpu: update GC 11.0.1 firmware for amd.5.5 release
    - amdgpu: update GC 11.0.4 firmware for amd.5.5 release
    - amdgpu: Update GC 11.0.1 and 11.0.4
  * Add firmware files for HP G10 series laptops (LP: #2023193)
    - cirrus: Add firmware and tuning files for HP G10 series laptops

linux-firmware (20220329.git681281e4-0ubuntu3.15) jammy; urgency=medium

  * upgrade iwlwifi firmware of FW API 72 for WiFi 6E support in Malaysia and Morocco (LP: #2020627)
    - iwlwifi: add new FWs from core72-129 release
    - iwlwifi: add new PNVM binaries from core74-44 release
    - iwlwifi: add new FWs from core74_pv-60 release
    - iwlwifi: add new FWs from core75-47 release
    - iwlwifi: add new FWs from core76-35 release
    - iwlwifi: update core69 and core72 firmwares for Ty device
    - iwlwifi: update core69 and core72 firmwares for So device
  * i915: Add DMC/GuC/HuC firmware for Meteor Lake (LP: #2026253)
    - i915: Add DMC v2.11 for MTL
    - i915: Update MTL DMC to v2.12
    - i915: Add GuC v70.6.6 for MTL
    - i915: Add HuC v8.5.0 for MTL
  * AMD Rembrandt / Phoenix PSR-SU related freezes (LP: #2024774)
    - SAUCE: DMCUB updates for DCN314 and Yellow Carp
  * potential S3 issue for amdgpu Navi 31/Navi33 (LP: #2024427)
    - amdgpu: update GC 11.0.0 firmware for amd.5.5 release
    - amdgpu: update GC 11.0.2 firmware for amd.5.5 release

 -- Juerg Haefliger <email address hidden> Wed, 19 Jul 2023 10:37:52 +0200

Changed in linux-firmware (Ubuntu Jammy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-firmware - 20230323.gitbcdcfbcf-0ubuntu1.4

---------------
linux-firmware (20230323.gitbcdcfbcf-0ubuntu1.4) lunar; urgency=medium

  * Follow-up: potential S3 issue for amdgpu Navi 31/Navi33 (LP: #2027959)
    - amdgpu: update GC 11.0.1 firmware for amd.5.5 release
    - amdgpu: update GC 11.0.4 firmware for amd.5.5 release
    - amdgpu: Update GC 11.0.1 and 11.0.4
  * Add firmware files for HP G10 series laptops (LP: #2023193)
    - cirrus: Add firmware and tuning files for HP G10 series laptops

linux-firmware (20230323.gitbcdcfbcf-0ubuntu1.3) lunar; urgency=medium

  * AMD Rembrandt / Phoenix PSR-SU related freezes (LP: #2024774)
    - SAUCE: DMCUB updates for DCN314 and Yellow Carp
  * potential S3 issue for amdgpu Navi 31/Navi33 (LP: #2024427)
    - amdgpu: update GC 11.0.0 firmware for amd.5.5 release
    - amdgpu: update GC 11.0.2 firmware for amd.5.5 release

 -- Juerg Haefliger <email address hidden> Wed, 19 Jul 2023 10:46:52 +0200

Changed in linux-firmware (Ubuntu Lunar):
status: Fix Committed → Fix Released
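Per the package changelogs, the PSR-SU fix first shipped in 20220329.git681281e4-0ubuntu3.15 (Jammy) and 20230323.gitbcdcfbcf-0ubuntu1.3 (Lunar). On an Ubuntu system, the authoritative check is `dpkg --compare-versions <installed> ge <fixed>`. The Python sketch below re-implements dpkg's version-ordering rules for illustration only: epoch first, then upstream version, then Debian revision, with `~` sorting before everything. `PSR_SU_FIXED_IN` and `has_psr_su_fix` are hypothetical names.

```python
from typing import Tuple

# First package versions carrying the PSR-SU fix, per the changelogs in this report.
PSR_SU_FIXED_IN = {
    "jammy": "20220329.git681281e4-0ubuntu3.15",
    "lunar": "20230323.gitbcdcfbcf-0ubuntu1.3",
}


def _order(c: str) -> int:
    """Sort weight of one character in a non-digit run, following dpkg."""
    if c == "" or c.isdigit():
        return 0
    if c.isalpha():
        return ord(c)
    if c == "~":
        return -1  # '~' sorts before everything, even the end of the string
    return ord(c) + 256  # other symbols sort after letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare alternating non-digit and digit runs, as dpkg does."""
    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        # Non-digit run: compare character weights one by one.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac = _order(a[i]) if i < len(a) else 0
            bc = _order(b[j]) if j < len(b) else 0
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: skip leading zeros, then compare numerically.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        while i < len(a) and j < len(b) and a[i].isdigit() and b[j].isdigit():
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i].isdigit():
            return 1  # a has the longer number
        if j < len(b) and b[j].isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> Tuple[int, str, str]:
    """Split "[epoch:]upstream[-revision]" into its three parts."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Negative, zero or positive as a sorts before, equal to or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


def has_psr_su_fix(series: str, installed: str) -> bool:
    """True if the installed linux-firmware version is at least the fixed one."""
    return compare_versions(installed, PSR_SU_FIXED_IN[series]) >= 0
```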
Juerg Haefliger (juergh) wrote :

* AMD Rembrandt / Phoenix PSR-SU related freezes (LP: #2024774)
    - SAUCE: DMCUB updates for DCN314 and Yellow Carp

This seems to cause another regression: bug 2030795.

Anson Tsao (ansontsao) wrote :
Mario Limonciello (superm1) wrote :

That's odd; I explicitly tested it with 6.2.0-26 in Ubuntu, as did a lot of people in the AMD GitLab with newer kernels. I suspect it's related to the specific TCON in their panel. But I'm glad to hear it's fixed by the newer one, and I have no qualms with that coming in.
