One of the key things we've done recently is remove the old options in the 2D driver. We had proliferation problems before where everybody was running a different configuration (XAA or EXA or UXA, DRI1 or DRI2, KMS or not) to get their particular usage working well. So we'd fix someone's configuration, and break someone else's. The moment I cleaned out all that, we suddenly fixed a bunch of bugs with the path you should be using (UXA, DRI2, KMS) that had been obscured by the mess all over the driver. But the 2D driver's actually been pretty quiet -- the cool stuff is in:
Merge: 44ada1a 07f4f3e
Author: Linus Torvalds <firstname.lastname@example.org>
Date: Fri May 29 08:48:13 2009 -0700
Merge branch 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kerne
We've been making continual incremental improvements to the kernel code. The driver quality hit a low point around January, when we were landing the last of the features we'd been developing over the previous year -- KMS and DRI2 in particular. A lot of it was deadline driven, and we had to land things sooner than we probably would have otherwise. But every week since then we've been turning around nice fixes, and I'd like to mention just what happened this week in the kernel here:
krh fixed the swap-related corruption
This turned out to be a bug with our GTT mapping where we got some cache domain management wrong. GTT mapping is a great performance feature -- when you're uploading data to the GPU, instead of writing it to memory, flushing the CPU cache, and having the GPU access the buffer, just write it through the GPU's aperture. The streaming write performance is basically the same (it's a 50% win to a 50% loss in microbenchmarks, in the noise at a macro level), but the important thing is that it avoids sucking up your CPU's cache space for data you no longer want in the CPU's cache. If it means that your app starts fitting in L2 or L1 when it didn't before, the results can be huge.
Only, we made a mistake. When an object wasn't in the GTT aperture, touching your GTT mapping would fault the object in. We would forget to set the cache domain of the object to the GTT. If the CPU hadn't touched the pages, it worked out, or if the object had been bound to the GTT before userland started down the faulting path it worked out since userland asked to do the domain transition. But in the case of swapping of GEM objects, the CPU has the object in its cache since it was just DMAed from disk, and we silently dropped the user's GTT domain transition request because the object wasn't in the GTT domain yet. So we faulted it in, wrote some data uncached into the object, and later on down the line the CPU cache lines got flushed out to it. You ended up with old glyph data on top of your new glyph data. The solution was to do domain setting at the right time -- when the object gets bound to the GTT in the fault handler.
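To make the ordering problem concrete, here is a toy model in C. It is not the kernel code: every name in it (toy_object, gtt_fault, and so on) is made up for illustration, and a single byte stands in for the object's pages. It only assumes what is described above: the userland domain request is dropped while the object is unbound, and the fix moves the domain transition into the fault handler's bind step.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names are hypothetical, not the i915 API. */
enum domain { DOMAIN_CPU, DOMAIN_GTT };

struct toy_object {
    bool bound;           /* is the object bound into the GTT? */
    enum domain domain;   /* which side currently has coherent access */
    bool cpu_cache_dirty; /* stale lines still sitting in the CPU cache */
    char memory;          /* one "byte" of backing store */
    char cpu_cache;       /* the CPU's cached copy of that byte */
};

/* Flush the CPU cache to memory, then hand the object to the GTT domain. */
static void set_to_gtt_domain(struct toy_object *obj)
{
    if (obj->cpu_cache_dirty) {
        obj->memory = obj->cpu_cache;
        obj->cpu_cache_dirty = false;
    }
    obj->domain = DOMAIN_GTT;
}

/* Userland's domain request: silently dropped while the object is unbound. */
static void user_set_domain_gtt(struct toy_object *obj)
{
    if (!obj->bound)
        return;
    set_to_gtt_domain(obj);
}

/* Fault handler: binds the object. With fixed == true, the domain
 * transition happens here, at bind time. */
static void gtt_fault(struct toy_object *obj, bool fixed)
{
    if (!obj->bound) {
        obj->bound = true;
        if (fixed)
            set_to_gtt_domain(obj);
    }
}

/* Uncached write through the aperture goes straight to memory. */
static void gtt_write(struct toy_object *obj, char value)
{
    assert(obj->bound);
    obj->memory = value;
}

/* Later on, the CPU evicts whatever dirty lines it still holds. */
static void cpu_cache_evict(struct toy_object *obj)
{
    if (obj->cpu_cache_dirty) {
        obj->memory = obj->cpu_cache;
        obj->cpu_cache_dirty = false;
    }
}

/* A swapped-in object: unbound, with old data ('O') in the CPU cache.
 * Returns what ends up in memory after userland writes new data ('N'). */
static char run_swap_in_scenario(bool fixed)
{
    struct toy_object obj = {
        .bound = false, .domain = DOMAIN_CPU,
        .cpu_cache_dirty = true, .memory = 0, .cpu_cache = 'O',
    };
    user_set_domain_gtt(&obj); /* dropped: object is not bound yet */
    gtt_fault(&obj, fixed);
    gtt_write(&obj, 'N');
    cpu_cache_evict(&obj);
    return obj.memory;
}
```

Calling run_swap_in_scenario(false) leaves the old 'O' on top of the new data, which is the stale-glyph symptom; run_swap_in_scenario(true) leaves 'N', because the cache was flushed before the GTT write instead of after it.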
I fixed 8xx 3D rendering
Since around January, 8xx 3D's been in bad shape, most of it related to tiling. Daniel Vetter came along in March and fixed it up pretty significantly -- the untested port of userland stuff to the kernel had been quite wrong. Only, it turned out that Daniel's stuff was also a mis-translation of the working userland code to the kernel, and since he didn't have the docs it wasn't as easy to debug as it should have been. I did have the docs, so after a day of sitting down and investigating this model bug report comment by Daniel, I had a fix for the kernel for stride issues, and a fix for a Mesa regression for resizing issues.
We're still working on getting docs for these old chipsets out, so that people don't need to get blocked on us. Sadly, it's slow going -- our group doesn't have the authority to release the docs, so we have to convince other groups (who have other business to be doing) to spend their time going through the process to get the docs released.
I applied a big workaround for 865 cache flushing
There's something weird about the 865. People who have been following our development probably know a bit about how we do weird cache management things. In particular, for getting an object in the CPU cache accessible by the GPU, we map each page of the object to the CPU, clflush each cache line of the page, then write to a magic register to flush the chipset's cache of flushed CPU cachelines out to memory, at which point the GPU can actually see them. This works beautifully (and much faster than using kernel mechanisms to manage PTE cacheability flags) on all the other hardware we have. But on the 865, if you started up X -retro, it would show that some of the little blits drawing the root weave wouldn't appear. The blits are 32 bytes each, or half a cache line, and generally they'd be missing in groups of two. It's as if cache lines didn't get flushed, and the GPU saw zeroes (the original contents of the page) instead of the new values containing blit commands.
krh did a bunch of experimenting, and nothing else we came up with helped except for wbinvd. So when we're CPU cache flushing the objects on 865, we now do wbinvd instead of mapping pages and clflushing them, and the desktop appears to be stable again.
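As a rough sketch of what that choice looks like, here is a small C model of the decision. It is not the driver code and it executes no real clflush or wbinvd (wbinvd is privileged anyway); it just works out what each path would do. The names, the 4096-byte page size, and the is_865 flag are assumptions for illustration. The 64-byte cache line follows from the 32-byte blits being half a line. The chipset flush register write is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative constants; the page size is an assumption. */
#define TOY_PAGE_BYTES       4096
#define TOY_CACHE_LINE_BYTES 64   /* a 32-byte blit is half a cache line */

enum flush_method {
    FLUSH_CLFLUSH, /* map each page, clflush each of its cache lines */
    FLUSH_WBINVD   /* write back and invalidate the whole CPU cache */
};

struct flush_plan {
    enum flush_method method;
    size_t clflush_count; /* per-line flushes issued; 0 for wbinvd */
};

/* Decide how to flush an object of object_pages pages out of the CPU cache.
 * is_865 is a hypothetical flag standing in for the chipset check. */
static struct flush_plan plan_cpu_flush(bool is_865, size_t object_pages)
{
    struct flush_plan plan;

    if (is_865) {
        /* Workaround path: one big hammer instead of per-line flushes. */
        plan.method = FLUSH_WBINVD;
        plan.clflush_count = 0;
    } else {
        /* Normal path: one clflush per cache line of every page. */
        plan.method = FLUSH_CLFLUSH;
        plan.clflush_count =
            object_pages * (TOY_PAGE_BYTES / TOY_CACHE_LINE_BYTES);
    }
    return plan;
}
```

Under those assumptions a two-page object costs 128 clflushes on the normal path, while the 865 path issues none and pays for flushing everything else in the cache instead.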
After some more investigation of docs, I've got a couple more experiments to try, but at least 2.6.30 should have working 865 support.
So, that's it, folks. One week of kernel fixes, and we've got rendering and stability improvements across the board. I'm hoping that next week, when I get to do work, I can sit down and figure out the memory leak with GL compositing -- we just got a simple testcase posted on the mailing list that's supposed to show an actual leak, and I want to take a look at it soon.