I've been promising great 2D performance from open source graphics for years. It was reaching the point where I was feeling awfully bad about being wrong so frequently. So this summer I started playing in my free time with making a GL backend for cairo. There was a previous sort of GL backend in the form of glitz, but it made a big mistake in trying to abstract GL through a Render-like API. The problem with accelerating 2D is that Render is a bad match for hardware!
A native GL backend turned out to be shockingly easy, now that we have support for EXT_framebuffer_objects all over, non-power-of-two textures, and GLSL. Here's a comparison of 3 backends, normalized to the image backend. Bigger bars means faster.
This shows an accelerated backend beating the CPU rasterization backend on 3 tests. Note that things for the image backend are a little unfair in its favor -- we can't scan out from cached system memory buffers, so if you want to actually see the results you have to do an upload at some point, which isn't reflected in the cairo-perf-trace results. Being able to beat that with GPU rendering to something that could be scanned out is pretty awesome. But that's only 3 tests -- for most of them image is winning. I've got some ideas for hacks on the 965 driver that may fix up a bunch of those bars (it's hard to estimate, since it's all about cache effects, and fixing those has a tendency to improve by more than the amount of time spent according to sysprof).
Since comparing to image isn't too fair, and we're not using image today, I did a comparison to xlib. This looks awesome:
By replacing Xlib usage with GL, we get a speedup on almost all the testcases, and a huge speed up on one that Xlib is pathologically slow on (I haven't figured out why for xlib yet). We've got a good pass rate on the cairo test suite, so I think this stuff is ready for people to start experimenting with in apps.
There's much more to do for performance still. I've got a plan to work on the 965 driver to improve glyphs-heavy tests like firefox-talos-gfx (and ETQW and WoW as well). For firefox-talos-svg, right now we're hitting aperture full because of all the spans data we're sending out before the GPU gets done with things. If we speed up the GPU rendering just a little, for example by tuning the inefficient shaders we're using right now, we can probably avoid hitting aperture full and cut CPU further. I think we're missing throttling for non-swapbuffers apps in DRI2, and we might actually do better and avoid aperture full if we do some appropriate throttling. And there's a lot of room for people who'd like to experiment with GL shader and state optimizations to jump in and tear this code apart.
I'd say that the Linux 2D acceleration story is starting to finally look good after all these years.