Eric Anholt ([info]anholt) wrote,
@ 2008-09-12 16:15:00
Entry tags:moblin

taste of success
As part of the work we're doing for Moblin, I got our head trees in shape for GL compositing on GEM. We had a nice demo of the technology at XDS from krh, and today I got the same setup working: glxgears and totem on a cube all playing nicely together. It feels like things are finally coming together, and we'll be ready to support this stuff in releases soon, even if it won't be at the end of this release cycle.

All the code's in master, and available to test. The usual warnings about master apply, and just to show how serious we are about our in-development code, among the known issues are:

Can't do UXA (and therefore DRI2) with tiling.
It's kind of hard to teach X about tiled buffers. NVIDIA went with wrapping all pixel access from libfb to produce libwfb. Our plan is to return to GTT-managed access of buffers in the X Server so we don't have to teach it about this -- we keep the nice write-combined performance we're used to with framebuffer access, though we also keep the painful read performance we're used to whenever a fallback reads from the buffer (see also: Render gradients and convolutions).
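To illustrate why tiling is awkward for software rendering code, here is a minimal sketch comparing where a pixel lands in a linear buffer versus a tiled one. The tile geometry (512 bytes wide by 8 rows, one 4 KiB page per tile) is an assumption based on the commonly documented Intel X-major layout and is not taken from this post; address swizzling is ignored, and the function names are made up for the example.

```python
# Assumed X-major tile geometry; not stated in the post.
TILE_WIDTH_BYTES = 512
TILE_HEIGHT_ROWS = 8
TILE_SIZE = TILE_WIDTH_BYTES * TILE_HEIGHT_ROWS  # 4096 bytes


def linear_offset(x, y, pitch, cpp=4):
    """Byte offset of pixel (x, y) in an untiled buffer.

    pitch is the row stride in bytes, cpp is bytes per pixel.
    """
    return y * pitch + x * cpp


def tiled_offset(x, y, pitch, cpp=4):
    """Byte offset of pixel (x, y) in a tiled buffer (swizzling ignored)."""
    if pitch % TILE_WIDTH_BYTES:
        raise ValueError("pitch must be a multiple of the tile width")
    tiles_per_row = pitch // TILE_WIDTH_BYTES
    # Split the byte position into "which tile" and "where inside the tile".
    tile_col, x_in_tile = divmod(x * cpp, TILE_WIDTH_BYTES)
    tile_row, y_in_tile = divmod(y, TILE_HEIGHT_ROWS)
    tile_index = tile_row * tiles_per_row + tile_col
    return tile_index * TILE_SIZE + y_in_tile * TILE_WIDTH_BYTES + x_in_tile
```

Code like libfb assumes the linear formula everywhere, so with a layout like this every pixel access would need the second calculation instead. Mapping through the GTT sidesteps that, because the CPU can be presented with a linear view of the buffer.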

2D performance has tanked when you use UXA.
If you fall back on front buffer rendering with the X Server, GEM goes wild with cache flushing because the X Server isn't telling it just what memory it touched, yet we're telling GEM that the results need to land in the front buffer "soon". This means that compositing is fast, but non-composited rendering isn't if you run emacs, gitk, or anything else that causes fallbacks. There are a couple of potential fixes for this, but the current plan is to avoid it using GTT mapping, which resolves the coherency issue.

Long term, we're going to be adding support for telling GEM what pages we touched after the fact (or maybe use fault-based clflushing) for OpenGL, and at that point we may reconsider how X maps its buffers.

3D performance has tanked in some cases
Haven't debugged this one, and it's next on the list. Can't reproduce on the test systems here.

Applications hang on to a lot of buffer objects
In TTM we would allocate giant buffer objects and then suballocate out of them, because allocation performance was so slow. This meant that userland had to pay attention to a lot of fencing issues (and you had to expose the idea of a fence) so that you could know which pieces were still used.

We went with a simpler model for GEM, where the userland caches buffer objects of similar size, and reuses them when the kernel tells us they're no longer used by the GPU. By returning "freed" buffers to the cache, we get wonderful performance when an app is running flat out and allocating and freeing buffers like mad. However, as we think about cairo moving to a GL backend, and all your apps sitting waiting for input hanging onto these cached buffers, the memory usage is likely to become an issue. They're pageable, but that's slow and we're looking at systems with limited memory+swap anyway. Instead, we need to free buffers when they're not serving any use in the cache. One approach we're thinking about: on allocate, when a cached buffer is "really old" (seconds), actually free it.
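The caching scheme described above can be sketched roughly as follows. This is a simulation, not the actual userland buffer manager: the class names, the power-of-two bucketing, the 2-second age limit, and the `busy` flag standing in for the kernel's "still in use by the GPU" answer are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class Buffer:
    def __init__(self, size):
        self.size = size
        self.busy = False     # stand-in for "kernel says the GPU still uses it"
        self.freed_at = None  # time the buffer was returned to the cache


class BufferCache:
    def __init__(self, max_age=2.0, clock=time.monotonic):
        self.max_age = max_age             # "really old" threshold in seconds
        self.clock = clock
        self.buckets = defaultdict(deque)  # bucket size -> cached buffers, oldest first
        self.really_freed = 0              # buffers actually released

    @staticmethod
    def bucket_size(size):
        # Round up to a power of two so similar sizes share a bucket.
        bucket = 4096
        while bucket < size:
            bucket *= 2
        return bucket

    def alloc(self, size):
        self._expire(self.clock())
        bucket = self.buckets[self.bucket_size(size)]
        # Reuse the oldest cached buffer if the GPU is done with it.
        if bucket and not bucket[0].busy:
            buf = bucket.popleft()
            buf.freed_at = None
            return buf
        return Buffer(self.bucket_size(size))

    def free(self, buf):
        # "Freeing" only returns the buffer to the cache.
        buf.freed_at = self.clock()
        self.buckets[buf.size].append(buf)

    def _expire(self, now):
        # On allocate, actually free idle buffers that have sat too long.
        for bucket in self.buckets.values():
            while (bucket and not bucket[0].busy
                   and now - bucket[0].freed_at > self.max_age):
                bucket.popleft()
                self.really_freed += 1
```

Because expiry only runs inside `alloc`, an application that stops allocating keeps whatever is in its cache.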

That still leaves some excess memory lying around when an app does some rendering and then stops for input. The long-term solution would be for userland to tell the kernel when it isn't going to actively use a buffer and the contents can be thrown out. Then, in the memory pressure callback from the kernel, we can throw out cached buffers and nuke mappings, and userland has to allocate new ones when it needs them.
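As a sketch of that long-term idea, the toy model below lets userland mark a buffer as unneeded, lets a simulated memory-pressure callback discard the backing of unneeded buffers, and makes userland check whether a cached buffer survived before reusing it. Everything here (`FakeKernel`, `mark_unneeded`, `mark_needed`, `reuse_or_replace`) is hypothetical and only illustrates the control flow; it does not correspond to any existing kernel interface.

```python
class KernelBuffer:
    def __init__(self, size):
        self.size = size
        self.purgeable = False  # userland said the contents may be dropped
        self.backed = True      # backing pages still exist


class FakeKernel:
    """Hypothetical stand-in for the kernel side of the protocol."""

    def __init__(self):
        self.buffers = []

    def create(self, size):
        buf = KernelBuffer(size)
        self.buffers.append(buf)
        return buf

    def mark_unneeded(self, buf):
        buf.purgeable = True

    def mark_needed(self, buf):
        """Reclaim a buffer for use; returns False if it was thrown out."""
        buf.purgeable = False
        return buf.backed

    def memory_pressure(self):
        """Simulated pressure callback: drop all purgeable backing pages."""
        reclaimed = 0
        for buf in self.buffers:
            if buf.purgeable and buf.backed:
                buf.backed = False
                reclaimed += buf.size
        return reclaimed


def reuse_or_replace(kernel, cached):
    """Userland side: reuse a cached buffer, or allocate a new one if the
    kernel discarded it in the meantime."""
    if kernel.mark_needed(cached):
        return cached
    return kernel.create(cached.size)
```

The cost of this scheme falls on the application that went idle: it pays for a fresh allocation the next time it renders, instead of holding memory the whole time it waits for input.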




(7 comments) - (Post a new comment)


[info]hub_
2008-09-13 12:11 am UTC (link)
This leads to two questions:

Will this also impact the "desktop" chipsets as found in laptops (including older ones)?

Will this also help the other free drivers for other families (like RadeonHD), since it seems to be part of the infrastructure? (Correct me if I'm wrong.)



[info]anholt
2008-09-13 08:06 pm UTC (link)
I don't know what you mean by '"desktop" chipset as found in laptops'

We're not doing anything for other driver families. Dave Airlie at Red Hat is doing plenty of work on the real radeon driver to catch it up to where we're going with Intel (building memory management, kernel modesetting, and new hardware support), and I'm not sure what the nouveau guys are up to. However, the new incarnation of DRI2 should be a significant simplification of people's DRI drivers once they've got memory management and can go DRI2-only.



[info]hub_
2008-09-13 08:10 pm UTC (link)
I didn't mean Intel working directly on other drivers, but yeah, that answers my question.

As for "desktop" chipsets in laptops, I meant PC vs. Ultra-Mobile, but maybe they are just the same, making my question void.

Thanks


Kernel to use?
(Anonymous)
2008-09-13 01:21 am UTC (link)
Hi Eric, Philip Langdale here.

Which kernel should we be using for this? Some branch of your linux-2.6 git tree? drm-gem-merge or drm-gem-dri2?

I'm seeing fatal X errors at start up with both EXA and UXA and with or without tiling.

"Failed to submit batchbuffer: Unknonw error 4125864784"

and before that, the log indicates:

"intel(0): [drm] Failed to name buffer -22"

three times.


Re: Kernel to use?
[info]anholt
2008-09-13 08:59 pm UTC (link)
drm-gem-dri2 is now gone since it was outdated. drm-gem-merge is what you want.

If you're still having issues, please file a bug with Xorg.0.log and the sha1s of each tree you're using.


GL and Cairo
(Anonymous)
2008-09-13 09:28 am UTC (link)
"However, as we think about cairo moving to a GL backend"

I thought the general consensus was that this was a bad idea and drivers should use the 3d-engine to accelerate EXA instead. Has this changed?


Re: GL and Cairo
[info]anholt
2008-09-13 08:01 pm UTC (link)
The consensus was that EXA was the best short-term solution, particularly at a time when shaders weren't prevalent, and EXA has made the use of cairo that we see in things like GTK pretty decent. The problem with offloading all of the hard cairo work to EXA through Render is that we have to keep writing driver code that uses the 3D pipeline to accelerate the EXA operations. It's a lot of work, and it'd be nice if we could avoid it.

One thing we're looking at is a GL acceleration architecture. We almost had one with XGL, only XGL used glitz, which was a Render-like wrapper around GL. We'd rather use GL directly, so we can just write whatever GL acceleration we need for any operation that anybody uses.

The other option, which I think makes a lot more sense for cairo, is to have a direct GL backend. It means we can use whatever techniques we find for offloading processing, without having to come up with Render protocol to transfer the information to the server. So far the techniques we've seen haven't been as high quality as what cairo's current rendering offers, but we know we could do our current technique with GLSL 1.3, and with more work I'm sure earlier too.


