i915 DMC firmware kbl_dmc_ver1_{01,04}.bin hangs system during suspend

Bug #1857883 reported by Alex Tu
This bug affects 1 person
Affects                  Status   Importance  Assigned to     Milestone
HWE Next                 Invalid  Undecided   Unassigned
OEM Priority Project     Triaged  Undecided   Alex Tu
linux-firmware (Ubuntu)  Invalid  Undecided   You-Sheng Yang

Bug Description

Steps I took:
 - install stock Bionic
 - full update and reboot
 - suspend system => failed with dark screen
 - install linux-oem-osp1 and reboot
 - suspend system => failed with dark screen

Tests I tried:
 - pm_test failed at the "processors" level with a dark screen

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-oem-osp1 5.0.0.1030.34
ProcVersionSignature: Ubuntu 5.0.0-1030.34-oem-osp1 5.0.21
Uname: Linux 5.0.0-1030-oem-osp1 x86_64
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Mon Dec 30 16:59:36 2019
InstallationDate: Installed on 2019-12-30 (0 days ago)
InstallationMedia: Ubuntu 18.04.3 LTS "Bionic Beaver" - Release amd64 (20190805)
SourcePackage: linux-meta-oem-osp1
UpgradeStatus: No upgrade log present (probably fresh install)
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: u 1401 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 18.04
InstallationDate: Installed on 2019-12-30 (0 days ago)
InstallationMedia: Ubuntu 18.04.3 LTS "Bionic Beaver" - Release amd64 (20190805)
MachineType: Dell Inc. Latitude 5590
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.0.0-1030-oem-osp1 root=UUID=d065cd98-cc0b-4711-bcab-74f3e677de3a ro quiet splash vt.handoff=1
ProcVersionSignature: Ubuntu 5.0.0-1030.34-oem-osp1 5.0.21
RelatedPackageVersions:
 linux-restricted-modules-5.0.0-1030-oem-osp1 N/A
 linux-backports-modules-5.0.0-1030-oem-osp1 N/A
 linux-firmware 1.173.14
Tags: bionic
Uname: Linux 5.0.0-1030-oem-osp1 x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 09/19/2019
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.11.1
dmi.board.name: 0083K0
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.11.1:bd09/19/2019:svnDellInc.:pnLatitude5590:pvr:rvnDellInc.:rn0083K0:rvrA00:cvnDellInc.:ct10:cvr:
dmi.product.family: Latitude
dmi.product.name: Latitude 5590
dmi.product.sku: 0817
dmi.sys.vendor: Dell Inc.

Revision history for this message
Alex Tu (alextu) wrote :

This bug was created with "apport-bug linux-oem-osp1".

Changed in oem-priority:
assignee: nobody → Alex Tu (alextu)
tags: added: apport-collected
description: updated
Alex Tu (alextu) attached apport information:
 - AlsaInfo.txt
 - CRDA.txt
 - CurrentDmesg.txt
 - IwConfig.txt
 - Lspci.txt
 - Lsusb.txt
 - ProcCpuinfo.txt
 - ProcCpuinfoMinimal.txt
 - ProcEnviron.txt
 - ProcInterrupts.txt
 - ProcModules.txt
 - PulseList.txt
 - RfKill.txt
 - UdevDb.txt
 - WifiSyslog.txt

tags: added: oem-priority originate-from-1857526 somerville
Alex Tu (alextu) wrote : Re: system failed to suspend

The pre-loaded Xenial OEM image works well.

So I tried updating the kernel on Xenial OEM to approximate the Bionic environment.
The steps are:
 1. Install the latest OEM kernel 5.0.0-1030-oem-osp1, then systemctl suspend: passed.
 2. Install the latest linux-firmware 1.173.14, then systemctl suspend: FAILED.

Please refer to the following attachment collected by "apport-collect -p linux-firmware".

On the Bionic side, there is no error message from the dark-screen period (checked /var/log/kern.log on the next boot), so I tried debugging with pm_test on Bionic. The issue can be reproduced at the "processors" pm_test level.
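
For reference, the pm_test procedure mentioned above can be sketched roughly as follows. This is only an illustration, assuming a kernel that exposes /sys/power/pm_test and a root shell; the function names and the power_dir parameter are made up for this sketch and are not from the report.

```python
from pathlib import Path

# Test levels accepted by /sys/power/pm_test, from shallowest to deepest.
PM_TEST_LEVELS = ["freezer", "devices", "platform", "processors", "core"]


def run_pm_test(level, power_dir="/sys/power", state="mem"):
    """Select a pm_test level, then start one test suspend cycle.

    power_dir is a parameter only so the function can be tried against
    a scratch directory; on a real system it needs root.
    """
    if level not in PM_TEST_LEVELS:
        raise ValueError(f"unknown pm_test level: {level}")
    power = Path(power_dir)
    (power / "pm_test").write_text(level + "\n")
    # With a test level set, the kernel runs the suspend sequence only up
    # to that level, waits a few seconds, and resumes without actually
    # entering the sleep state.
    (power / "state").write_text(state + "\n")


def first_failing_level(results):
    """Given {level: passed}, return the shallowest level that failed."""
    for level in PM_TEST_LEVELS:
        if results.get(level) is False:
            return level
    return None
```

Walking the levels in order and noting the first one that hangs gives the kind of result reported here, where "processors" is the first failing level.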

Alex Tu (alextu) wrote :

Because there is an issue with apport, I am attaching a sosreport here instead.

Changed in oem-priority:
status: New → Triaged
You-Sheng Yang (vicamo)
no longer affects: linux-meta-oem-osp1 (Ubuntu)
You-Sheng Yang (vicamo) wrote :

I can reproduce this with a clean Xenial OEM install with linux-firmware upgraded from 1.157.12~somerville4 to 1.157.22. However, 1.157.12~somerville4 is no longer available anywhere except the factory recovery image, so I can only downgrade to 1.157, which is still affected. So I think some change in 1.157.12~somerville4 is critical to keeping the system, especially ath10k, stable through the suspend/resume process.

You-Sheng Yang (vicamo) wrote :

I can reproduce with the following steps:

1. do a factory recovery (Xenial OEM install)
2. upgrade linux-firmware alone to 1.157.22
3. completely power off and reboot after 1 minute
4. trigger system suspend

At first the Caps Lock indicator blinks, then the keyboard may flash once, and then the power LED stays on and the system is left in a hard lockup state. There is no response to the kernel magic SysRq keys or to short presses of the power button.

affects: linux-oem-osp1 (Ubuntu) → linux-firmware (Ubuntu)
Changed in linux-firmware (Ubuntu):
assignee: nobody → You-Sheng Yang (vicamo)
status: New → In Progress
You-Sheng Yang (vicamo) wrote :

https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux-firmware/commit/?h=xenial&id=4c85562e738eaf5c8b96ef832954aeb10523f970 is the first bad commit: with it, the system may hang in the first or second suspend/resume cycle when booting with the 4.4.0-100-generic kernel. linux-firmware 1.157.22 is also safe with commit 4c85562e738e reverted.

You-Sheng Yang (vicamo) wrote :

This commit was first cherry-picked to linux-firmware from upstream in bug 1637481 as https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux-firmware/commit/?h=xenial&id=f428ce858f2bcd5a5e5c0869db6ac09a1a110d79 to bring in firmware blobs from yakkety, and was then reverted as https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux-firmware/commit/?h=xenial&id=2657ec0cf07071aeaa5a72460ff70fd0587e92b8 with this commit message (no bug link):

  Revert "linux-firmware: First DMC image for Kabylake."

  This reverts commit f428ce858f2bcd5a5e5c0869db6ac09a1a110d79,
  which is causing sound to stop working on some Kabylake
  systems.

It was then added back as https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux-firmware/commit/?h=xenial&id=4c85562e738eaf5c8b96ef832954aeb10523f970 in bug 1711400, and has now become an issue yet again here.

Bug 1711400, the later of the two, pulled in firmware blobs for the 4.12 linux-hwe-edge kernels, much like the first one did. There is no further detail about the status of the audio regression after this re-commit.

You-Sheng Yang (vicamo) wrote :

This issue can be reproduced only on:

  00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 620
  [8086:5917] (rev 07) (prog-if 00 [VGA controller])
   Subsystem: Dell UHD Graphics 620 [1028:0817]
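
A rough way to check whether a given machine carries the same GPU and subsystem IDs, reading the standard sysfs ID attributes instead of parsing lspci output. The function names and the default device path (taken from the 00:02.0 address above, with PCI domain 0000 assumed) are illustrative only.

```python
from pathlib import Path

# IDs taken from the lspci excerpt above.
AFFECTED_GPU = {
    "vendor": 0x8086,
    "device": 0x5917,
    "subsystem_vendor": 0x1028,
    "subsystem_device": 0x0817,
}


def read_pci_ids(device_dir):
    """Read the four ID attributes of one PCI device from sysfs."""
    base = Path(device_dir)
    # Each attribute file holds one hex value such as "0x8086".
    return {name: int((base / name).read_text().strip(), 16)
            for name in AFFECTED_GPU}


def is_affected_gpu(device_dir="/sys/bus/pci/devices/0000:00:02.0"):
    """True if the device matches the IDs reported in this bug."""
    return read_pci_ids(device_dir) == AFFECTED_GPU
```

A match only says the hardware IDs are the same as on the reporting machine; it does not by itself prove the machine hangs on suspend.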

The latest v5.5-rc5 mainline kernel with linux-firmware HEAD from kernel.org doesn't fix this, and neither does drm-tip/2020-01-07.

You-Sheng Yang (vicamo) wrote :

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=1156e62c5ec45061955a29e1b9299ffda58479d3 updated the i915 DMC firmware for Kaby Lake (v1.04 vs. v1.01) and has been included in linux-firmware since bionic. However, it doesn't fix this issue, which is still reproducible on the following kernel versions (each shown with its DMC version):

  * 4.15.0-74-generic: v1.01
  * 4.18.0-25-generic: v1.04
  * 5.0.0-37-generic: v1.04
  * 5.0.0-1033-oem-osp1: v1.04
  * 5.3.0-24-generic: v1.04
  * v5.5-rc5: v1.04
  * drm-tip/2020-01-07: v1.04
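
The v1.01 and v1.04 labels correspond to the blob file names kbl_dmc_ver1_01.bin and kbl_dmc_ver1_04.bin. A small sketch for listing which Kaby Lake DMC blobs are present on a system, assuming they live under /lib/firmware/i915 as named later in this report; the helper names are made up for illustration.

```python
import re
from pathlib import Path

# Naming pattern of the blobs named in this report, e.g. kbl_dmc_ver1_04.bin.
DMC_NAME = re.compile(
    r"^(?P<platform>[a-z0-9]+)_dmc_ver(?P<major>\d+)_(?P<minor>\d+)\.bin$")


def parse_dmc_name(filename):
    """Return (platform, major, minor) or None if the name does not match."""
    m = DMC_NAME.match(filename)
    if m is None:
        return None
    return m["platform"], int(m["major"]), int(m["minor"])


def installed_dmc_versions(firmware_dir="/lib/firmware/i915", platform="kbl"):
    """List DMC versions found on disk for one platform, oldest first."""
    versions = []
    for path in Path(firmware_dir).glob(f"{platform}_dmc_ver*.bin"):
        parsed = parse_dmc_name(path.name)
        if parsed is not None:
            versions.append(parsed[1:])
    return [f"v{major}.{minor:02d}" for major, minor in sorted(versions)]
```

This only reports what is installed. Which of the installed blobs a given kernel requests is decided by the i915 driver in that kernel, as the list above shows.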

You-Sheng Yang (vicamo) wrote :

Filed public issue https://gitlab.freedesktop.org/drm/intel/issues/933. Since this affects all kernel and linux-firmware versions, removing kbl_dmc_ver1_*.bin doesn't seem to be a solution that we could accept globally. Further input is needed. Pending.

Changed in linux-firmware (Ubuntu):
status: In Progress → Incomplete
You-Sheng Yang (vicamo) wrote :

By the way, with kernels newer than 4.18, the first suspend/resume may succeed, but the second and every later attempt will hang.

You-Sheng Yang (vicamo) wrote :

This should affect all linux-firmware >= 1.157.13 (factory image has 1.157.12~somerville4, xenial-updates has 1.157.22).

The only work-around so far is to remove /lib/firmware/i915/kbl_dmc_ver1_01.bin (and /lib/firmware/i915/kbl_dmc_ver1_04.bin if available).
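
A cautious variant of this work-around is sketched below: it moves the blobs aside instead of deleting them, so they can be put back later. The backup directory is a hypothetical path chosen for this example, and the function name is made up; run as root on a real system.

```python
import shutil
from pathlib import Path


def disable_kbl_dmc(firmware_dir="/lib/firmware/i915",
                    backup_dir="/root/kbl_dmc_backup"):
    """Move kbl_dmc_ver1_*.bin out of the firmware directory.

    backup_dir is a hypothetical location, not something from the report.
    Returns the names of the files that were moved.
    """
    src = Path(firmware_dir)
    dst = Path(backup_dir)
    moved = []
    for blob in sorted(src.glob("kbl_dmc_ver1_*.bin")):
        # Create the backup directory only if there is something to move.
        dst.mkdir(parents=True, exist_ok=True)
        shutil.move(str(blob), str(dst / blob.name))
        moved.append(blob.name)
    return moved
```

Two caveats, both assumptions rather than findings from this report: a later linux-firmware package upgrade would likely reinstall the files, and if the initramfs carries its own copy of the blob it would need to be regenerated for the change to take effect.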

xiaoliang (liang-xiao1) wrote :

@Kent

Is there any comment on why this issue only happens on the customer's config, but not on our configs?

You-Sheng Yang (vicamo) wrote :

DMC here stands for Display Micro-Controller, and the blobs are firmware for DMC. So that's probably a question for VGA and/or BIOS vendors.

You-Sheng Yang (vicamo)
summary: - system failed to suspend
+ i915 DMC firmware kbl_dmc_ver1_{01,04}.bin hangs system during suspend
Mario Limonciello (superm1) wrote :

> Is there any comment on why this issue only happens on the customer's config, but not on our configs?

I would recommend comparing CPU steppings; perhaps this is a regression only present in later CPU steppings, after the pre-production samples.

You-Sheng Yang (vicamo) wrote :

Attaching a log captured with v5.6-rc7. However, without a hardware console, no information is dumped while the system is trying to suspend. A hardware console is quite critical for debugging suspend/resume problems.

You-Sheng Yang (vicamo) wrote :

cpu family : 6
model : 142
model name : Intel(R) Core(TM) i3-8130U CPU @ 2.20GHz
stepping : 10
microcode : 0xca
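
To compare this against another unit, as suggested earlier in the thread, the same fields can be pulled out of /proc/cpuinfo on each machine and diffed. A minimal sketch; the function names are made up for illustration.

```python
def parse_cpu_identity(cpuinfo_text):
    """Extract the identifying fields of the first CPU from cpuinfo text."""
    wanted = ("cpu family", "model", "model name", "stepping", "microcode")
    identity = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Keep only the first occurrence, i.e. the first logical CPU.
        if sep and key in wanted and key not in identity:
            identity[key] = value.strip()
    return identity


def differing_fields(a, b):
    """Names of the fields whose values differ between two machines."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))
```

For example, calling parse_cpu_identity(open("/proc/cpuinfo").read()) on an affected and an unaffected unit and passing both results to differing_fields would show whether the stepping or microcode revision differs.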

Timo Aaltonen (tjaalton) wrote :

assuming fixed since

Changed in linux-firmware (Ubuntu):
status: Incomplete → Invalid
Changed in hwe-next:
status: New → Invalid