So, I've previously come out as saying that component alpha rendering was hard (and I wrote the math down wrong even there. Jeez, it's srcv.X = src.X * mask.X, not srcv.X = src.X * mask.A). I really believed it -- we hadn't accelerated it in the, what, 5 years that it's been around in X, and my ideas for doing it hadn't been working out. David Reveman supposedly had a way of doing component alpha text (where you've got a solid source), but it required relatively new hardware, and I couldn't see how it would actually work out and didn't find it in the code when I looked. I think he also had a 4-pass system for older hardware. I'd come up with a way to do it on Radeons in 2 passes with a temporary (ugh), and keithp had planned a heinous way to do it in about 2 passes on R200s. But it turns out it's actually easy.
Really, PictOpOver is the only operation we care about. If we support others, cool, but everyone's using Over. The trouble is that component-alpha rendering requires two different sources for blending: one for the source value to the blender, which is the per-channel multiplication of source and mask (srcv.X = src.X * mask.X), and one for the source alpha for multiplying with the destination channels, which is the multiplication of the source alpha by each mask channel (srca.X = src.A * mask.X). So the equation for Over is:
    dst.A = src.A * mask.A + (1 - (src.A * mask.A)) * dst.A
    dst.R = src.R * mask.R + (1 - (src.A * mask.R)) * dst.R
    dst.G = src.G * mask.G + (1 - (src.A * mask.G)) * dst.G
    dst.B = src.B * mask.B + (1 - (src.A * mask.B)) * dst.B
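To make the Over equation concrete, here is a minimal reference sketch in Python. It is not driver code: the `Pixel` type and the `component_alpha_over` name are made up for illustration, and channels are assumed to be floats in [0, 1] with premultiplied alpha.

```python
from typing import NamedTuple


class Pixel(NamedTuple):
    """Hypothetical ARGB pixel; each channel is a float in [0, 1]."""
    a: float
    r: float
    g: float
    b: float


def component_alpha_over(src: Pixel, mask: Pixel, dst: Pixel) -> Pixel:
    """Apply the component-alpha Over equation to one pixel.

    For every channel X (including A):
        dst.X = src.X * mask.X + (1 - src.A * mask.X) * dst.X
    """
    return Pixel(*(
        s * m + (1.0 - src.a * m) * d
        for s, m, d in zip(src, mask, dst)
    ))
```

For example, opaque black text (`Pixel(1, 0, 0, 0)`) over a white destination with a subpixel mask of `Pixel(1, 1, 0.5, 0)` leaves red fully covered, green half covered, and blue untouched, so the result is `Pixel(1, 0, 0.5, 1)`.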
But we can do some simpler operations, right? How about PictOpOutReverse, which has a source factor of 0 and a dest factor of (1 - source alpha)? We can get the source alpha value (srca.X = src.A * mask.X) out of the texture blenders pretty easily. So we can do a component-alpha OutReverse, which gets us:
    dst.A = 0 + (1 - (src.A * mask.A)) * dst.A
    dst.R = 0 + (1 - (src.A * mask.R)) * dst.R
    dst.G = 0 + (1 - (src.A * mask.G)) * dst.G
    dst.B = 0 + (1 - (src.A * mask.B)) * dst.B
OK. And if an op doesn't use the source alpha value for the destination factor, then we can do the channel multiplication in the texture blenders to get the source value, and ignore the source alpha that we wouldn't use. We've supported this in the Radeon driver for a long time. An example would be PictOpAdd, which does:
    dst.A = src.A * mask.A + dst.A
    dst.R = src.R * mask.R + dst.R
    dst.G = src.G * mask.G + dst.G
    dst.B = src.B * mask.B + dst.B
Hey, this looks good! If we do a PictOpOutReverse and then a PictOpAdd right after it, we get:
    dst.A = src.A * mask.A + ((1 - (src.A * mask.A)) * dst.A)
    dst.R = src.R * mask.R + ((1 - (src.A * mask.R)) * dst.R)
    dst.G = src.G * mask.G + ((1 - (src.A * mask.G)) * dst.G)
    dst.B = src.B * mask.B + ((1 - (src.A * mask.B)) * dst.B)
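As a sanity check on the algebra, the following Python sketch runs the two passes on random pixels and compares the result against single-pass Over. It is only an illustration of the math, not EXA or Radeon code: the function names are hypothetical, pixels are plain (A, R, G, B) tuples of floats in [0, 1], and the clamping a real blender applies in Add is left out.

```python
import random


def over_ca(src, mask, dst):
    """Single-pass component-alpha Over; pixels are (A, R, G, B) tuples."""
    src_a = src[0]
    return tuple(s * m + (1.0 - src_a * m) * d
                 for s, m, d in zip(src, mask, dst))


def out_reverse_ca(src, mask, dst):
    """Pass 1: source factor 0, dest factor (1 - src.A * mask.X)."""
    src_a = src[0]
    return tuple((1.0 - src_a * m) * d for m, d in zip(mask, dst))


def add_ca(src, mask, dst):
    """Pass 2: add src.X * mask.X to the destination (no clamping here)."""
    return tuple(s * m + d for s, m, d in zip(src, mask, dst))


def two_pass_over(src, mask, dst):
    """OutReverse followed by Add, with the same source and mask."""
    return add_ca(src, mask, out_reverse_ca(src, mask, dst))


def max_two_pass_error(trials=1000, seed=0):
    """Largest per-channel difference between the two approaches."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        src, mask, dst = (tuple(rng.random() for _ in range(4))
                          for _ in range(3))
        one = over_ca(src, mask, dst)
        two = two_pass_over(src, mask, dst)
        worst = max(worst, max(abs(x - y) for x, y in zip(one, two)))
    return worst
```

Calling `max_two_pass_error()` should return a value at or very near zero, since the second pass simply adds back the source term that OutReverse left out.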
The best part is that we can do this trick easily (I think it was about 30 lines of code), with no API changes to EXA. My subpixel text on the laptop is now 1/2 speed of non-subpixel text, but 5-6 times faster than before. Sweet.