Eric Anholt (anholt) wrote,
@ 2009-05-29 09:18:00
Things are continuing to settle down in the Intel graphics driver. We're in the ~5th month of driver stabilization, and I think it's starting to really show.

One of the key things we've done recently is remove the old options in the 2D driver. We had proliferation problems before where everybody was running a different configuration (XAA or EXA or UXA, DRI1 or DRI2, KMS or not) to get their particular usage working well. So we'd fix someone's configuration, and break someone else's. The moment I cleaned out all that, we suddenly fixed a bunch of bugs with the path you should be using (UXA, DRI2, KMS) that had been obscured by the mess all over the driver. But the 2D driver's actually been pretty quiet -- the cool stuff is in:


commit 3da9e9d34ed7d2f5c33fd194d9dd09e15f4e51c0
Merge: 44ada1a 07f4f3e
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri May 29 08:48:13 2009 -0700

Merge branch 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel


We've been making continual incremental improvements to the kernel code. The driver quality hit a low point around January, when we were landing the last of the features we'd been developing over the previous year -- KMS and DRI2 in particular. A lot of it was deadline driven, and we had to land things sooner than we probably would have otherwise. But every week since then we've been turning around nice fixes, and I'd like to mention just what happened in the kernel this week:

krh fixed the swap-related corruption
This turned out to be a bug with our GTT mapping where we got some cache domain management wrong. GTT mapping is a great performance feature -- when you're uploading data to the GPU, instead of writing it to memory, flushing the CPU cache, and having the GPU access the buffer, you just write it through the GPU's aperture. The streaming write performance is basically the same (anywhere from a 50% win to a 50% loss in microbenchmarks, in the noise at a macro level), but the important thing is that it avoids sucking up your CPU's cache space for data you no longer want in the CPU's cache. If it means that your app starts fitting in L2 or L1 when it didn't before, the results can be huge.

Only, we made a mistake. When an object wasn't in the GTT aperture, touching your GTT mapping would fault the object in. We would forget to set the cache domain of the object to the GTT. If the CPU hadn't touched the pages, it worked out, or if the object had been bound to the GTT before userland started down the faulting path it worked out since userland asked to do the domain transition. But in the case of swapping of GEM objects, the CPU has the object in its cache since it was just DMAed from disk, and we silently dropped the user's GTT domain transition request because the object wasn't in the GTT domain yet. So we faulted it in, wrote some data uncached into the object, and later on down the line the CPU cache lines got flushed out to it. You ended up with old glyph data on top of your new glyph data. The solution was to do domain setting at the right time -- when the object gets bound to the GTT in the fault handler.
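
To make the sequence concrete, here is a toy Python model of it. This is only a sketch under loud assumptions: ToyObject, gtt_fault, request_gtt_domain and the other names are made up for illustration and are not the kernel's actual functions, and the "CPU cache" is just a dictionary of dirty lines.

```python
class ToyObject:
    """Toy stand-in for a GEM-like buffer. All names are hypothetical."""

    def __init__(self, size):
        self.memory = ["empty"] * size  # what the GPU (and the GTT mapping) sees
        self.cpu_cache = {}             # dirty CPU cache lines: index -> value
        self.bound_to_gtt = False

    def cpu_write(self, index, value):
        # Cached write: the data sits in the CPU cache, not yet in memory.
        self.cpu_cache[index] = value

    def flush_cpu_cache(self):
        # Write dirty lines back to memory and drop them from the cache.
        for index, value in self.cpu_cache.items():
            self.memory[index] = value
        self.cpu_cache.clear()

    def request_gtt_domain(self):
        # Userland's domain transition request. As described in the post,
        # it is silently dropped when the object is not bound to the GTT.
        if self.bound_to_gtt:
            self.flush_cpu_cache()

    def gtt_fault(self, fixed):
        # First touch of the GTT mapping binds the object into the aperture.
        self.bound_to_gtt = True
        if fixed:
            # The fix: do the domain setting here, at bind time.
            self.flush_cpu_cache()

    def gtt_write(self, index, value):
        # Uncached write through the aperture goes straight to memory.
        self.memory[index] = value


def run_scenario(fixed):
    obj = ToyObject(4)
    for i in range(4):
        obj.cpu_write(i, "old")   # swap-in leaves old data in the CPU cache
    obj.request_gtt_domain()      # dropped: object is not bound yet
    obj.gtt_fault(fixed)          # touching the mapping faults the object in
    for i in range(2):
        obj.gtt_write(i, "new")   # new glyphs written through the GTT
    obj.flush_cpu_cache()         # later on, lingering cache lines get flushed
    return obj.memory


print(run_scenario(fixed=False))  # old data lands on top of the new data
print(run_scenario(fixed=True))   # new data survives
```

In the unfixed run the late flush overwrites the freshly written entries with the stale ones, which is the "old glyph data on top of your new glyph data" symptom. In the fixed run the cache is already clean by the time the GTT writes happen, so the late flush has nothing to write.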

I fixed 8xx 3D rendering
Since around January, 8xx 3D's been in bad shape, most of it related to tiling. Daniel Vetter came along in March and fixed it up pretty significantly -- the untested port of userland stuff to the kernel had been quite wrong. Only, it turned out that Daniel's stuff was also a mis-translation of the working userland code to the kernel, and since he didn't have the docs it wasn't as easy to debug as it should be. I did have the docs, so after a day of sitting down and investigating this model bug report comment by Daniel, I had a fix for the kernel for stride issues, and a fix for a Mesa regression for resizing issues.

We're still working on getting docs for these old chipsets out, so that people don't need to get blocked on us. Sadly, it's slow going -- our group doesn't have the authority to release the docs, so we have to convince other groups (who have other business to be doing) to spend their time going through the process to get the docs released.

I applied a big workaround for 865 cache flushing.
There's something weird about the 865. People that have been following our development probably know a bit about how we do weird cache management things. In particular, for getting an object in the CPU cache accessible by the GPU, we map each page of the object to the CPU, clflush each cache line of the page, then write to a magic register to flush the chipset's cache of flushed CPU cachelines out to memory, at which point the GPU can actually see them. This works beautifully (and much faster than using kernel mechanisms to manage PTE cachability flags) on all the other hardware we have. But on the 865, if you started up X -retro, it would show that some of the little blits drawing the root weave wouldn't appear. The blits are 32 bytes each, or half a cache line, and generally they'd be missing in groups of two. It's as if cache lines didn't get flushed, and the GPU saw zeroes (the original contents of the page) instead of the new values containing blit commands.

krh did a bunch of experimenting, and nothing else we came up with helped except for wbinvd. So when we're CPU cache flushing the objects on 865, we now do wbinvd instead of mapping pages and clflushing them, and the desktop appears to be stable again.
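
As a rough illustration of the two flushing paths, here is a small Python sketch. It does not execute any real cache instructions; it only lists the steps each path would take. The function names and the chipset strings are hypothetical, the 64-byte cache line is inferred from "32 bytes each, or half a cache line", the 4096-byte page size is an assumption, and keeping the chipset flush step after wbinvd is also an assumption, since the post only says that wbinvd replaces the map-and-clflush loop.

```python
PAGE_SIZE = 4096   # assumed page size
CACHE_LINE = 64    # inferred: a 32-byte blit is "half a cache line"
BLIT_SIZE = 32


def flush_plan(chipset, num_pages):
    """List the steps needed to make a CPU-cached object visible to the GPU."""
    if chipset == "865":
        # Workaround path: one whole-cache writeback and invalidate.
        steps = [("wbinvd", None)]
    else:
        # Normal path: clflush every cache line of every page.
        steps = [("clflush", offset)
                 for offset in range(0, num_pages * PAGE_SIZE, CACHE_LINE)]
    # Assumed to happen in both paths: the "magic register" write that
    # pushes the chipset's buffer of flushed lines out to memory.
    steps.append(("chipset_flush", None))
    return steps


def blits_in_cache_line(line_index):
    """Indices of the 32-byte blits that share one cache line."""
    per_line = CACHE_LINE // BLIT_SIZE
    first = line_index * per_line
    return list(range(first, first + per_line))


print(len(flush_plan("945", 1)))   # 64 clflushes plus the chipset flush
print(flush_plan("865", 1))
print(blits_in_cache_line(3))      # one missed line takes out two blits
```

The second helper shows why missing blits would come in groups of two if whole cache lines were being lost: each line holds exactly two 32-byte blit commands.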

After some more investigation of docs, I've got a couple more experiments to try, but at least 2.6.30 should have working 865 support.

So, that's it, folks. One week of kernel fixes, and we've got rendering and stability improvements across the board. I'm hoping that next week, if I get to do work, I can sit down and figure out the memory leak with GL compositing -- we just got a simple testcase posted on the mailing list that's supposed to show an actual leak, and I want to take a look at it soon.



(12 comments)

swap-related corruption: not gone
(Anonymous)
2009-05-29 07:50 pm UTC (link)
Hi,
I am using the intel driver on ubuntu from xorg-edgers ppa, and while the font corruption is gone, I noticed that it has been replaced by the corruption of my background picture and icons on it. They become mangled lines. However, changing the background fixes the look of icons and the picture. This happens after resume from hibernate.

Re: swap-related corruption: not gone
anholt
2009-05-29 09:04 pm UTC (link)
If you have a regression, please git-bisect and report a bug for the broken commit. Otherwise, it probably won't get fixed.

2d speed regressions
(Anonymous)
2009-06-02 03:26 pm UTC (link)
I see many 2D slowdowns with UXA+KMS enabled, compared to EXA on older releases. Any hope of getting that fixed soon?

Re: 2d speed regressions
anholt
2009-06-04 06:10 am UTC (link)
It'll be fixed when you report a bug upstream instead of in a blog, with instructions on how to reproduce the problem.

Re: 2d speed regressions
(Anonymous)
2009-06-04 10:14 pm UTC (link)
did so for quite a few bugs, not a single one has been looked at.

Re: 2d speed regressions
anholt
2009-08-19 03:47 pm UTC (link)
Note that all the issues I mentioned were about correctness, not performance. We generally (I sometimes stray from this) prioritize bugs in terms of "does it prevent the system from working", "does it render incorrectly," then "does it reduce performance". We've spent a lot of time on the first two, and we've still probably got a hundred "my machine stops working" bugs to deal with. It's way more important than performance, generally.

But, Clemens, there's also the problem that the bugs you're reporting are about how performance has dropped for your Java 2D code. You chose to NIH cairo, and so you may not be seeing the benefits when we fix performance problems for cairo (what people like, you know, Firefox use. And every other app on my desktop). Your Java project is a lower priority for me to work on than "every app on my customers' desktops," so we probably won't get to them until we've got at least all the "easy" stuff for performance in OpenGL and cairo done. And I'm sure I've got "easy" stuff I'd like to do that would keep me busy for a year right now.

issues...
(Anonymous)
2009-06-07 07:49 am UTC (link)
Thank you for interesting status updates.

I currently have access to three different notebooks with i915-, i965- and 855-based chipsets. I have failed to get a stable configuration on any of this hardware.

The worst story was with the 8xx chipset, but I'll try another time after this status update...

The i915-based notebook works, but the X server still crashes (very randomly) there too. Performance has really degraded (compared to the xorg-server 1.3.x days), but I'm going to test the fixes you reported here.

On i965 I am still unable to use 3D acceleration. It locks up: http://bugs.freedesktop.org/show_bug.cgi?id=20570 . Previously it locked up quite randomly, but now it locks up immediately after the X server starts, so I can't even test the things I was able to reproduce before.

All the bugs I encountered are already reported in bugzilla, so the question here is: you write that things are starting to settle and that it's starting to 'really show'. Does this mean that you consider the driver to be in good shape? Is it a configuration problem that I'm still unable to get it working without crashes, lockups, etc.? (btw, I'm using Xorg without a config at /etc/X11/xorg.conf)

Re: issues...
anholt
2009-08-19 03:55 pm UTC (link)
#20570 is a nasty one. We unfortunately don't have a way to introspect our hardware (the other side of the wall has some serious infrastructure for capturing all rendering commands, then replaying them step-by-step on a simulator or real hardware to examine what happens. we don't have that). As you saw, I thought I came up with a fix (and it surely fixed some of your hangs), but you've still got more. We're unfortunately working blind in trying to solve these "random" problems, just fixing issues as we get reliable testcases and hoping they help the "random" ones.

Still slow.
(Anonymous)
2009-06-23 06:42 am UTC (link)
Heh, there is a blog post: http://kostyasha.blogspot.com/2009/06/blog-post.html

It's in Russian, but the most important part is the numbers:

xorg-server-1.5? mesa-7.4.2 , xf86-video-intel-2.7.1 tuxonice-sources-2.6.28-r10 = 1200 fps
xorg-server-1.6.1.901-r3, mesa-7.4.2, xf86-video-intel-2.7.1 tuxonice-sources-2.6.28-r10 = 200 fps
xorg-server-1.6.1.901-r3, mesa-7.4.2, xf86-video-intel-2.7.1 gentoo-sources-2.6.30-r1 = 555 fps

So even with the 2.6.30 kernel the new driver is still more than twice as slow... So games-fps/nexuiz is hard to play, while quake now works :) Thank you.

Re: Still slow.
anholt
2009-06-23 04:27 pm UTC (link)
Those numbers are all above the screen refresh rate, so they're irrelevant.

If you've got a slow app, benchmark the slow app, not a fast app. And then use sysprof and intel_gpu_top while doing so to do analysis.
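
A tiny Python sketch of the refresh-rate point, assuming a 60 Hz display (the refresh rate is not stated in the thread, and the function name is made up): a display cannot show more frames per second than its refresh rate, so the three quoted numbers all come out the same on screen.

```python
def frames_shown_per_second(rendered_fps, refresh_hz=60):
    # The display shows at most one new frame per refresh.
    return min(rendered_fps, refresh_hz)


for fps in (1200, 200, 555):
    print(fps, "->", frames_shown_per_second(fps))
```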

What do people need to have for sysprof output to be useful?
(Anonymous)
2009-08-19 09:53 am UTC (link)
Part of the issue is that it is hard to provide useful information without someone holding your hand and guiding you through the process.

One question I have is what needs to be installed for sysprof output to be useful to developers? If I have a closed source 3D app will sysprof output be useless (even if other apps are showing similar issues)?

Re: What do people need to have for sysprof output to be useful?
anholt
2009-08-19 03:37 pm UTC (link)
debug symbols in all binaries taking up much (say, 10% at the highest?) CPU time. If all the CPU time is spent in the closed source 3d app, then the problem is probably not our fault, and you shouldn't be telling us about it :)
