On Thu, Aug 22, 2013 at 06:10:50PM -0000, Aditya wrote:
> May be the community is looking for the solution in the wrong direction.
Well, I will elaborate on this bug once and for all.
First of all, we can't blame the Linux kernel for this bug, because it is
firmware-specific.
"Firmware-specific" means that the piece of software (i.e., the backlight
driver) causing this bug is written by the manufacturers of the affected
systems (Dell, HP, etc.) and burned into the BIOS ROM chips of those systems.
This software is closed-source, so there is no way to fix it other than via
firmware updates provided by the manufacturers.
The freeze happens at the time of the interaction between the Linux kernel and
the backlight driver provided by the firmware. Basically, it happens as
follows:
- The user adjusts the brightness using the keyboard hot keys, or a
software application (e.g., gnome-control-center).
- In the first case, the X window system provides a prioritized
list of backlight driver interfaces (exported by the Linux
kernel under /sys/class/backlight/) that are used for
handling the keyboard key presses. The highest-priority
interface that is present on the system is used.
- In the second case, the software application just selects the
interface to use for adjusting the brightness from the
currently available interfaces under /sys/class/backlight/.
- Depending on the driver underlying the interface selected at the
previous step, the system may or may not enter SMM (System Management
Mode) to adjust the brightness. SMM is an operational mode that the
system enters when it wants to run critical firmware (e.g., the piece
of software responsible for shutting down the system when the
processor temperature hits a critical threshold, or in our case, the
piece of software responsible for adjusting the brightness level,
etc.). When the system is executing software in SMM, it is no longer
under the control of the Linux kernel, and is fully controlled by the
firmware being executed.
The acpi_video0 and dell_backlight (or whatever it's called on
systems other than Dell) interfaces under /sys/class/backlight/ are
interfaces, exported by the Linux kernel, for firmware drivers that
execute in SMM. So if they are selected in the previous step, the
system is going to enter SMM for adjusting the brightness. On the
other hand, the intel_backlight interface on systems with Intel
graphics is just an interface for the Linux kernel driver responsible
for adjusting the backlight level of the Intel graphics chip. It is
executed just like any other driver in the Linux kernel, without the
need to enter special modes like SMM. So, if this driver is buggy, we
can say that the Linux kernel is buggy, because it is considered part
of the kernel. So, we have two cases:
1. The driver underlying the interface selected (by the X
window system or the software application used for adjusting
the brightness) is firmware. In this case, when adjusting the
brightness, the Linux kernel just instructs the processor to
enter SMM in order to execute the instructions of this
driver, and when the driver has finished, the kernel takes back
control of the system.
2. The driver underlying the interface selected is the Linux
kernel driver for the graphics chip. In this case, when
adjusting the brightness, this driver, provided by the Linux
kernel, is responsible for doing the job, while the system
is fully controlled by the Linux kernel, and without the
need for entering any special modes like SMM for executing
opaque firmware.
In case 1, we have a problem, because the kernel has another driver,
exported via /proc/sys/kernel/nmi_watchdog, that uses a hardware
timer to periodically issue signals called NMIs (Non-Maskable
Interrupts) every second or two.
If an NMI is emitted while the system is operating in SMM, the buggy
firmware executing in SMM causes the system to freeze.
In case 2, we are fine, because there is no buggy firmware involved.
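To see which of the two cases applies on a given machine, a rough sketch
like the one below can list the interfaces the kernel currently exports.
It assumes a kernel that provides the "type" attribute in each backlight
directory (reported as firmware, platform, or raw); the function name
list_backlights is made up for this example. Note that mapping "firmware"
or "platform" to "executes in SMM" is the reasoning given above, not
something the kernel itself reports.

```shell
#!/bin/sh
# Print each backlight interface together with its sysfs "type".
# The directory argument is optional and only exists so the function can
# be pointed at a test directory instead of the real sysfs tree.
list_backlights() {
    base="${1:-/sys/class/backlight}"
    for dev in "$base"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -d "$dev" ] || continue
        name=$(basename "$dev")
        if [ -r "$dev/type" ]; then
            type=$(cat "$dev/type")
        else
            type=unknown   # older kernels may not have this attribute
        fi
        printf '%s %s\n' "$name" "$type"
    done
}

list_backlights
```

On an affected Dell system one would expect intel_backlight to show up as
raw, and the other interfaces as firmware or platform.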
The acpi_backlight=vendor solution is not reliable, because all it does is
instruct the Linux kernel not to export the acpi_video0 interface, which is the
interface for the BIOS ACPI backlight driver that is executed from SMM. But it
also instructs the kernel to export the interface for the vendor driver, which
is also firmware and is executed from SMM. So, depending on the priority of
those two drivers in the X window system configuration (having higher/lower
priority than a kernel-supplied driver like intel_backlight), or the ad-hoc way
in which a backlight-adjusting application selects the interface to use, the
interface for the vendor driver (executing in SMM) may be the one that is
selected after booting with the acpi_backlight=vendor kernel parameter. That's
why using this kernel parameter sometimes doesn't work; we have just replaced
one buggy firmware driver executing in SMM with another buggy firmware driver
executing in SMM.
CONCLUSION: The only reliable way of avoiding this bug on systems with buggy
firmware is to put this line in /etc/rc.local:
echo 0 >/proc/sys/kernel/nmi_watchdog
When executed, this command instructs the kernel to stop emitting NMIs
periodically, and therefore we avoid the conflict that results when the X
window system or a backlight-adjusting software application selects an
interface exported by the Linux kernel for a firmware driver that has to be
executed from SMM.
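If you prefer something a bit more defensive than the bare echo, a minimal
sketch along these lines checks that the knob is writable and reports the
resulting value. The function name disable_nmi_watchdog and its optional
path argument are invented for this example (the argument is only there so
the function can be tried on an ordinary file).

```shell
#!/bin/sh
# Turn off the periodic NMI watchdog and print the resulting setting.
# Must be run as root when used on the real /proc file.
disable_nmi_watchdog() {
    knob="${1:-/proc/sys/kernel/nmi_watchdog}"
    if [ ! -w "$knob" ]; then
        echo "cannot write $knob (not root, or no NMI watchdog support?)" >&2
        return 1
    fi
    # Only write when the watchdog is not already off.
    if [ "$(cat "$knob")" != "0" ]; then
        echo 0 > "$knob"
    fi
    echo "nmi_watchdog=$(cat "$knob")"
}

# Example call, e.g. from /etc/rc.local:
# disable_nmi_watchdog
```

The same setting should also be reachable as "sysctl kernel.nmi_watchdog=0",
since sysctl names map onto paths under /proc/sys.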
> Looking in /sys/class/backlight/ lists 3 folders on my Dell inspiron 7520.
> One of the folders is intel_backlight .
>
> Manually doing
> root@Sirius:~# echo 2000 > /sys/class/backlight/intel_backlight/brightness
> works.
That is the wrong way to test for this bug. You have to do quick
successive adjustments to reproduce it. Try using a script like the
fluctuate_backlight.sh shell script provided in the attachments of the bug
report at https://bugzilla.kernel.org/show_bug.cgi?id=57571 for reproducing the
bug via the sysfs interfaces. I wrote it and used it a lot for testing while
following up on this bug report. It always works right with intel_backlight,
though, so it won't make a difference in this case, because, as mentioned
above, intel_backlight is an interface for a driver that isn't executed from
SMM. However, I was able to use it to reproduce the bug with all other
interfaces for drivers executing from SMM (e.g., acpi_video0 and
dell_backlight on Dell systems).
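For illustration only, the idea of "quick successive adjustments" can be
sketched as below. This is NOT the fluctuate_backlight.sh attached to the
bug report, just a minimal stand-in written under the assumption that the
interface directory has the usual brightness and max_brightness files; the
function name fluctuate and its arguments are made up here. Be warned that
on an affected system, running it against a firmware-backed interface may
freeze the machine.

```shell
#!/bin/sh
# Rapidly alternate the brightness between half and full scale.
#   $1: interface directory, e.g. /sys/class/backlight/acpi_video0
#   $2: number of rounds (optional, defaults to 200)
fluctuate() {
    dev="$1"
    rounds="${2:-200}"
    if [ ! -r "$dev/max_brightness" ]; then
        echo "no max_brightness in $dev" >&2
        return 1
    fi
    max=$(cat "$dev/max_brightness")
    low=$((max / 2))
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Two back-to-back writes per round, with no delay in between.
        echo "$low" > "$dev/brightness"
        echo "$max" > "$dev/brightness"
        i=$((i + 1))
    done
}

# Example call (as root):
# fluctuate /sys/class/backlight/acpi_video0 500
```

The loop always ends on the maximum value, so the screen is left at full
brightness if the system survives the run.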
Oh, I forgot to mention that there is another, more reliable way of avoiding
this bug: buy a new laptop, and don't forget to try it out in the store before
finishing the deal :-)