Extra glClear() calls (stellarium 0.15, Raspberry Pi)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Stellarium | Confirmed | Undecided | Florian Schaukowitsch |
Bug Description
I was wondering why stellarium was so slow on vc4 (Raspberry Pi), and VC4_DEBUG=perf told me that we were getting some extra frame draws due to multiple glClear() calls within a frame. Here's an example of an extraneous glClear call I found in apitrace:
11207 glClearColor(red = 0, green = 0, blue = 0, alpha = 1)
11208 glClear(mask = GL_COLOR_
11209 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 7)
11210 glBufferSubData
11211 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 0)
11212 glClearColor(red = 0, green = 0, blue = 0, alpha = 0)
11213 glClear(mask = GL_COLOR_
For vc4 and other tiled renderers, it's really important that all the glClear() calls are stacked up at the start of the frame, before any other drawing has happened. I've got some tricks to recognize a series of glClear()s as a single clear, and they succeed on this one. However, non-tiled renderers won't go to the trouble of merging the two clears you've done here, so for them it's wasted work.
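To make the ordering argument concrete, here is a toy model in C of a tiled renderer that folds every clear issued before the first draw into one combined clear, and has to flush the frame when a clear arrives after drawing has started. All names (`TiledFrame`, `frame_clear`, the `CLEAR_*` bits) are hypothetical. This is only a sketch of the idea described above, not how vc4 or Mesa actually implement it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits standing in for GL_COLOR_BUFFER_BIT and friends.
   This is a toy model, not vc4/Mesa code. */
enum {
    CLEAR_COLOR   = 1u << 0,
    CLEAR_DEPTH   = 1u << 1,
    CLEAR_STENCIL = 1u << 2
};

typedef struct {
    unsigned pending_clear; /* buffers cleared when the current pass starts */
    bool has_draws;         /* has any draw been queued in the current pass? */
    int flushes;            /* passes written out to memory so far */
} TiledFrame;

static void frame_clear(TiledFrame *f, unsigned mask)
{
    assert(mask != 0);
    if (f->has_draws) {
        /* A clear after drawing cannot be folded into the start-of-pass
           clear: write out what was drawn so far and begin a new pass. */
        f->flushes++;
        f->has_draws = false;
        f->pending_clear = 0;
    }
    /* Clears issued before any draw simply accumulate into one clear. */
    f->pending_clear |= mask;
}

static void frame_draw(TiledFrame *f)
{
    f->has_draws = true;
}

static void frame_end(TiledFrame *f)
{
    f->flushes++; /* the normal end-of-frame flush */
    f->has_draws = false;
    f->pending_clear = 0;
}

/* Order similar to the trace: two color clears, drawing, then a stencil clear. */
static int flushes_for_late_stencil_clear(void)
{
    TiledFrame f = {0};
    frame_clear(&f, CLEAR_COLOR);
    frame_clear(&f, CLEAR_COLOR);   /* merged with the first clear */
    frame_draw(&f);
    frame_clear(&f, CLEAR_STENCIL); /* after drawing: forces an extra flush */
    frame_draw(&f);
    frame_end(&f);
    return f.flushes;
}

/* Suggested order: every clear stacked up before the first draw. */
static int flushes_for_upfront_clears(void)
{
    TiledFrame f = {0};
    frame_clear(&f, CLEAR_COLOR | CLEAR_DEPTH | CLEAR_STENCIL);
    frame_draw(&f);
    frame_draw(&f);
    frame_end(&f);
    return f.flushes;
}
```

In this model the late stencil clear costs one extra flush per frame (two instead of one). The duplicated color clear costs nothing here only because the model merges it. A non-tiled renderer would typically execute both clears.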
The actual vc4 performance penalty is here:
17488 glDrawElements(mode = GL_TRIANGLES, count = 36, type = GL_UNSIGNED_INT, indices = blob(144))
17489 glVertexAttrib3
17490 glVertexAttrib3
17491 glVertexAttrib3
17492 glStencilMask(mask = 255)
17493 glClearStencil(s = 0)
17494 glScissor(x = 0, y = 0, width = 1920, height = 1080)
17495 glClear(mask = GL_STENCIL_
17496 glDisable(cap = GL_STENCIL_TEST)
17497 glStencilFunc(func = GL_ALWAYS, ref = 0, mask = 255)
17498 glDisable(cap = GL_SCISSOR_TEST)
17499 glColorMask(red = GL_FALSE, green = GL_FALSE, blue = GL_FALSE, alpha = GL_FALSE)
17500 glUseProgram(
17501 glEnable(cap = GL_STENCIL_TEST)
17502 glStencilMask(mask = 128)
17503 glStencilOp(fail = GL_KEEP, zfail = GL_KEEP, zpass = GL_REPLACE)
17504 glStencilFunc(func = GL_ALWAYS, ref = 128, mask = 255)
Here, Stellarium has already done some drawing in the frame and then clears the stencil for the first time. I could potentially notice that the stencil has never been used before, but even then I have a problem: stencil and depth live in the same buffer, so I can't clear the stencil on its own (I would have to draw a full-screen quad with func=GL_ALWAYS instead). I'm going to try to fix vc4 to draw such a quad instead of flushing at this point, but we could do better if you cleared color and depth at the start of the frame. Unfortunately, doing so would hurt non-tiled renderers, so maybe the solution would be for me to expose visuals with depth but not stencil (so that you can communicate to the driver that it can always ignore the depth bits).
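The "stencil and depth are in the same buffer" problem can be illustrated with a small C model of a packed depth/stencil buffer. The packing assumed here is 24-bit depth in the upper bits and 8-bit stencil in the low byte of each 32-bit texel. That layout is an assumption for illustration only, since real hardware layouts differ. The helper names are hypothetical. Clearing both depth and stencil is a plain overwrite. Clearing stencil alone must read each texel and preserve its depth bits, which is the read-modify-write a driver would otherwise emulate with a full-screen quad.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed packing for this sketch: depth in bits 31..8, stencil in bits 7..0. */
#define DEPTH_SHIFT  8u
#define STENCIL_MASK 0xFFu

static uint32_t pack_ds(uint32_t depth24, uint8_t stencil)
{
    assert(depth24 <= 0xFFFFFFu); /* depth must fit in 24 bits */
    return (depth24 << DEPTH_SHIFT) | stencil;
}

static uint32_t depth_of(uint32_t texel) { return texel >> DEPTH_SHIFT; }
static uint8_t stencil_of(uint32_t texel) { return (uint8_t)(texel & STENCIL_MASK); }

/* Clearing depth and stencil together: every texel is overwritten,
   so the old contents never need to be read back. */
static void clear_depth_stencil(uint32_t *buf, size_t n, uint32_t depth24, uint8_t stencil)
{
    uint32_t v = pack_ds(depth24, stencil);
    for (size_t i = 0; i < n; i++)
        buf[i] = v;
}

/* Clearing stencil alone: the existing depth bits must survive, so each
   texel is read, masked and written back. On a tiler this means the old
   depth contents must be available, just as with a full-screen quad that
   writes stencil (func GL_ALWAYS, op GL_REPLACE) with depth writes off. */
static void clear_stencil_only(uint32_t *buf, size_t n, uint8_t stencil)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (buf[i] & ~(uint32_t)STENCIL_MASK) | stencil;
}
```

For example, after `clear_stencil_only`, a texel that held depth `0x123456` with stencil 7 keeps depth `0x123456` and has stencil 0. A combined depth/stencil clear would have discarded the old depth value without ever reading it.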
Attaching the apitrace dump of Stellarium.
Thank you very much for the report and for suggesting the apitrace utility!