SRU Request for upgrade to X-Server 1.20.14 in focal/20.04-LTS

Bug #1958673 reported by Mario Kleiner
This bug affects 2 people
Affects                   Status        Importance  Assigned to  Milestone
psychtoolbox-3 (Ubuntu)   Confirmed     Undecided   Unassigned
xorg-server (Ubuntu)      Fix Released  Undecided   Unassigned
  Jammy                   Fix Released  Undecided   Unassigned

Bug Description

I was asked by one of your X-Server maintainers (Timo) to file a bug report here to request an upgrade to the current X-Server 1.20.14 for focal / Ubuntu 20.04-LTS from X-Server 1.20.13. I think this is called an SRU?

The 1.20.14 version contains patches to fix some recent CVEs, which are already in your 1.20.13 server, and additionally two patches from myself; cf. the following merge request for the backport:

https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/778

Therefore an upgrade to 1.20.14 would add my two patches from that merge request. These are trivial backports from master and X-Server 21.1, which did not need any adaptation of the patches.

One patch is a bug fix for a bug present in server 1.20 since day one.

The other patch can be considered a regression fix for Ubuntu 20.04-LTS on hybrid graphics laptops with Intel iGPU + some other dGPU, although this is not a regression in the X-Server itself, but in how the video driver selection by current Ubuntu interacts with the server. See the explanation below for why I consider it a regression fix.

One patch fixes a bug: "Fix RandR leasing for more than 1 simultaneously active lease."

The bug is a trivial code bug, which has been there since RandR leasing was introduced into the 1.20 server by Keith Packard. Keith has reviewed my bug description and fix and agreed that it is indeed a bug and that my fix is correct. See the following merge request for the master branch with Keith's R-b in the discussion:

https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/767

The bug triggers whenever more than one RandR output gets leased out simultaneously during a session. It doesn't matter if multiple outputs get leased to a single client application, or if multiple separate client applications each lease out one output, or any mix of these.

Only the very last output leased in a session can be released back to the server by the corresponding client (voluntarily or due to client exit, crash etc.). All other outputs previously leased out by the same or other clients will be completely dead and unusable until the X-Session is terminated and the X-Server restarted. So this can be seen as a denial of service bug.
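
As a rough illustration of the symptom described above, here is a toy Python model. It is not the X-Server code: the class, method and output names are hypothetical, and the mechanism (the bookkeeping keeps only the most recent lease record instead of adding it to the existing list) is an assumption chosen to reproduce the described behavior.

```python
class LeaseRegistry:
    """Toy bookkeeping of active output leases (hypothetical, not server code)."""

    def __init__(self, buggy):
        self.buggy = buggy
        self.active = []             # lease records the registry can still find
        self.leased_outputs = set()  # outputs currently unavailable to the server

    def create_lease(self, lease_id, outputs):
        entry = (lease_id, frozenset(outputs))
        if self.buggy:
            # Assumed bug: the new record replaces the list instead of joining it.
            self.active = [entry]
        else:
            self.active.append(entry)
        self.leased_outputs |= entry[1]

    def terminate_lease(self, lease_id):
        for entry in self.active:
            if entry[0] == lease_id:
                self.active.remove(entry)
                self.leased_outputs -= entry[1]
                return True
        return False  # record lost: its outputs stay unusable


def stuck_outputs(buggy):
    """Lease two outputs, release both, and report what is still locked."""
    reg = LeaseRegistry(buggy)
    reg.create_lease(1, ["DP-1"])
    reg.create_lease(2, ["HDMI-1"])
    reg.terminate_lease(1)
    reg.terminate_lease(2)
    return sorted(reg.leased_outputs)
```

In this model, `stuck_outputs(True)` leaves the first output locked, while `stuck_outputs(False)` returns every output to the server, which matches the "only the last lease can be released" behavior.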

Typically affected applications would be Vulkan applications using the direct display extension, e.g., the octave-psychtoolbox-3 package in Ubuntu for scientific/medical research, or high end video games which may use this functionality, and probably most prominently VR applications built with SteamVR or the OpenXR Monado runtime and VR compositor.

It has been tested extensively by myself on single X-Screen and dual X-Screen setups with 1, 2 and 3 displays, leasing out 1, 2 or 3 outputs.

The second patch is "modesetting: Allow Present flips with mismatched stride on atomic drivers."

It applies to the xorg-video-modesetting ddx.

It does not fix a bug in the strict sense, but it does improve the quality and performance of unredirected fullscreen applications like video games or scientific software like octave-psychtoolbox-3 under PRIME render offload in a way that turns them from "technically correct behavior, but unusable in practice" to "good". The patch is based on a tip by Michel Daenzer from AMD. See the following original merge request for X-Server master for details:

https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/740

On modern kernels and atomic modesetting drivers, the patch allows the modesetting-ddx to use page flipping for fullscreen applications even if the Pixmap stride between the display gpu (iGPU, e.g., Intel, AMD) and the render offload gpu (e.g., AMD, NVidia) is mismatched, as atomic modesetting drivers can easily handle that.
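
The decision described above can be sketched roughly as follows. This is a simplified Python illustration, not the modesetting-ddx code: the function and parameter names are hypothetical, and it ignores every other condition a real flip check would involve.

```python
def can_page_flip_unpatched(front_stride, back_stride):
    """Behavior without the patch: any stride mismatch rejects the flip."""
    return front_stride == back_stride


def can_page_flip_patched(front_stride, back_stride, atomic_modesetting):
    """Behavior with the patch: a mismatch is tolerated on atomic drivers."""
    if front_stride == back_stride:
        return True
    # Mismatched stride: only acceptable if the kms driver is atomic.
    return atomic_modesetting
```

With a mismatched pair of strides, the unpatched variant always falls back to a copy swap, while the patched variant still flips when the driver supports atomic modesetting.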

Without the patch, the X-Server and modesetting-ddx would normally reject the pageflip on such mismatched stride configurations and fall back to a fullscreen copy swap, executed via glamor and therefore Mesa as a framebuffer copy from the application backbuffer to the system/X-Server frontbuffer. This increases required memory bandwidth and thereby reduces cpu and gpu performance, increases latency and the potential for skipped frames, e.g., for OpenGL and Vulkan applications, and makes presentation timestamps unreliable, which is especially bad for scientific applications like the octave-psychtoolbox-3 package.

More importantly for the general case, it causes massive visual tearing and display artifacts. The current implementation of the X-Server's DRI3/Present backend and of glamor + Mesa does not use hardware synchronization to vertical blank (vsync) for such "copy swaps", only software sync, which is prone to random X-Server scheduling and processing latency between start of vblank and submission of the copy command to Mesa. Additionally, Mesa submits the hardware copy commands to the regular 3D engine command stream, so further random delays are added by any other running GUI application, stalling the "copy command hw packet" behind other unrelated 2D or 3D rendering commands or video decoding commands in the gpu hardware command stream. All this taken together usually causes the gpu copy operation to start only when the display is already inside the next active refresh cycle, which causes massive visual tearing artifacts and other rendering artifacts in many games and 3D applications.

The old xorg-video-intel Intel ddx for Intel iGPU does not have this problem as it lacks checks for mismatched stride, so it "just worked" by accident there with any combination of Intel iGPU + AMD/NVidia dGPU on modern atomic modesetting capable Intel kms drivers.

The xorg-video-amdgpu amdgpu ddx has checks in place to ignore stride mismatch on atomic modesetting capable amdgpu kms drivers on modern AMD iGPUs, so it works whenever running on suitable hardware + kernel.

This problem only affects the xorg-video-modesetting ddx, which lacks handling for this. By default, Ubuntu 20.04 uses amdgpu-ddx for AMD iGPUs, so the combination of AMD iGPU + AMD dGPU is not affected by default. In the past Ubuntu used intel-ddx for Intel iGPUs, avoiding the problem as well.

But current Ubuntu 20.04 uses the modesetting-ddx for Intel iGPUs by default, and has to use it for hw acceleration on recent Gen-10 and later Intel gpus like Icelake, Tigerlake, ... so the problem is unavoidable for combinations of Intel iGPU + other dGPU. Therefore this patch could be considered a fix for a functional regression of Ubuntu on modern Intel graphics hardware.

Not all combinations of iGPU + dGPU PRIME render offload hardware setups are affected at all video resolution settings. Whether a mismatched stride occurs and causes unusably bad tearing depends on the specific iGPU + dGPU hardware combo and display resolution settings. But if it happens, it defeats PRIME render offload.
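
To see why the mismatch depends on both the hardware combo and the resolution, consider a minimal Python sketch in which each gpu rounds the row length up to its own alignment. The alignment values of 64 and 256 bytes are purely hypothetical and not taken from any specific driver; the function name is made up for this example.

```python
def aligned_stride(width_px, bytes_per_pixel=4, align_bytes=64):
    """Row length in bytes, rounded up to a multiple of align_bytes."""
    raw = width_px * bytes_per_pixel
    return -(-raw // align_bytes) * align_bytes  # ceiling division, then scale


def strides_match(width_px, display_align=64, render_align=256):
    """True if both (hypothetical) gpus arrive at the same stride."""
    return (aligned_stride(width_px, align_bytes=display_align)
            == aligned_stride(width_px, align_bytes=render_align))
```

Under these assumed alignments, a 1920 pixel wide mode gives 7680 bytes on both gpus, so the strides match, while a 1680 pixel wide mode gives 6720 versus 6912 bytes, so they do not.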

As "Optimus" hybrid graphics laptops are very common, as are models with Intel iGPU + AMD dGPU, this patch should make more of them work better and at better performance.

The problem was encountered, and the patch extensively tested and confirmed to fix it, on an Apple MacBookPro mid 2017 15 inch Retina model with Intel Kabylake GT2 (UHD graphics 630) as server display gpu and AMD Radeon Pro 560 (Polaris 11) as render offload gpu. I assume many of the new laptops with Intel + AMD will benefit from this fix as well.

Thanks for consideration
-mario

Tags: focal
tags: added: focal
Changed in xorg-server (Ubuntu Jammy):
status: New → Fix Released
no longer affects: psychtoolbox-3 (Ubuntu Jammy)
Rob Pieke (robpieke) wrote :

Good morning! I noticed that this report started with a request to move Focal 20.04 up to 1.20.14, but it looks like only Jammy was moved up? Is there a plan to move Focal up as well?

I selfishly ask as one of the MATE+NVIDIA495/510 sufferers (i.e., https://github.com/mate-desktop/mate-desktop/issues/505)

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in psychtoolbox-3 (Ubuntu):
status: New → Confirmed
Rob Pieke (robpieke) wrote :

Just checking in again :) The description of this report talks about Focal, but I only see Jammy in the affected list?
