But what gets me even more excited than getting our first major kernel merge done is that we're starting to create a culture of review surrounding our driver. Keith started by insisting on posting patches instead of git trees, which meant that I read his patches and found issues before they were queued for upstream. I started posting patches in retaliation, and he kept NAKing them for needing improvements (which I've usually gotten around to) or for being obsoleted by better work he did. Now we've got the rest of the team joining the party. While it has slowed things down a bit, it's also nice -- the internal TODO list of "someone committed some junk and I need to go fix it up when I get some time" is no longer growing out of control, and stability is definitely increasing.
The majority of the improvements in the last week have been in vblank handling. The vblank-rework changes had gone through a series of QA cycles before I merged them, but there were some issues that cropped up in integrating them with the GEM changes, and there were some plain old bugs we found when we started trying to use them on our desktops. We also fixed some VT switching bugs that would have hurt suspend/resume, and a couple more G4X issues.
My highest priority now is sorting out our failures with batchbuffers being too large. Until now, our testcases have kept their texture load below the size of the aperture. However, with more interesting apps like sauerbraten or Virtual Forbidden City, we're running into a problem where we accumulate a batchbuffer for execution that can't be loaded into the aperture all at once, and you get an error message and no rendering. Dave Airlie had fixed this in classic mode with his check_aperture changes, but we never did them for GEM, since we figured we'd have the whole aperture available instead of just 32MB. But when a single mipmapped texture is 24MB, we run out of aperture quickly anyway. There are a few things we're planning on doing to resolve the problem:
- Implement check_aperture on GEM so that we can flush the batch when it's likely going to be too big for the aperture. This will avoid having rendering dropped on the floor for being too big.
- Fix counting of aperture space consumed in check_aperture. Right now we just sum up the sizes of all buffers targeted by relocations, so if you keep referencing the same big buffer over and over, it gets counted every time and you'll flush too often.
- Implement PPGTT support for new chipsets. This gives us a 2GB virtual address space for your batchbuffers to play in instead of a 256MB or 512MB aperture. I'm trying to avoid saying "that'll be enough for anybody", but it'll certainly make a lot of apps happier. This'll be a bit of work, as we don't have a PCI aperture mapping to that address space, so we can't use some of our old tricks the same way. However, it should be a pretty significant win on any serious 3D workload.
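To make the first two items concrete, here is a minimal sketch in C of per-batch aperture accounting that counts each distinct buffer only once and flushes before adding a buffer that would overflow the aperture. The `struct bo`/`struct batch` layout and the helpers `batch_check_aperture()`, `batch_add_reloc()` and `batch_flush()` are hypothetical illustrations, not libdrm's actual names or Dave's real check_aperture code, and the actual execbuffer submission is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BATCH_BOS 64

/* Hypothetical buffer object: only the fields the accounting needs. */
struct bo {
    uint64_t size;
    bool in_batch;            /* already counted for the current batch? */
};

/* Hypothetical per-batch state. */
struct batch {
    uint64_t aperture_size;   /* usable aperture, e.g. 256MB */
    uint64_t aperture_used;   /* sum of sizes of *distinct* buffers referenced */
    struct bo *bos[MAX_BATCH_BOS];
    size_t nr_bos;
    unsigned flushes;         /* how many times we had to submit early */
};

/* Submit the batch (elided here) and reset the accounting. */
static void batch_flush(struct batch *b)
{
    for (size_t i = 0; i < b->nr_bos; i++)
        b->bos[i]->in_batch = false;
    b->nr_bos = 0;
    b->aperture_used = 0;
    b->flushes++;
}

/* True if referencing bo won't push the batch past the aperture.
 * A buffer already in this batch costs nothing extra. */
static bool batch_check_aperture(const struct batch *b, const struct bo *bo)
{
    if (bo->in_batch)
        return true;
    return b->nr_bos < MAX_BATCH_BOS &&
           b->aperture_used + bo->size <= b->aperture_size;
}

/* Record a relocation to bo, flushing first if it would not fit.
 * Note: a single buffer larger than the whole aperture still can't fit. */
static void batch_add_reloc(struct batch *b, struct bo *bo)
{
    if (!batch_check_aperture(b, bo))
        batch_flush(b);
    if (bo->in_batch)
        return;
    assert(b->nr_bos < MAX_BATCH_BOS);
    bo->in_batch = true;
    b->bos[b->nr_bos++] = bo;
    b->aperture_used += bo->size;
}
```

With a 256MB aperture, referencing the same 24MB texture 100 times keeps `aperture_used` at 24MB and never flushes, whereas naive summing of every relocation target would flush after the tenth reference. Referencing eleven distinct 24MB textures does force one flush, since only ten fit (240MB).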
We're also looking into getting an appropriate API into the kernel for our transient mappings of the aperture. One of the sticking points in getting GEM merged was a hack we were doing with the kernel mapping APIs on CONFIG_HIGHMEM x86. By actually sitting down and writing the API we need, we should get improved performance in environments with PAT but no MTRR (like the G[M]45, where we don't get an MTRR slot), and improved performance on 64-bit, where we couldn't do the CONFIG_HIGHMEM hack, for a 20% to 200% improvement depending on which of these failure cases you were hitting. Now that we're in a kernel tree, we can submit a single patch to the kernel community showing the API we want and how we plan to use it, and get much better feedback than we've been able to in the past. In this case, Ingo came up with fun ideas for how we can get the advantages of our atomic mapping path on CONFIG_HIGHMEM x86 without the scheduling restrictions of actually being atomic, which would be awfully convenient for one of our code paths.
For those looking to run the latest hotness, here's the list:
kernel: drm-intel-next in my tree (still has some bugfixes to be merged)
libdrm: 2.4.0 (nothing major has happened since then)
xf86-video-intel: 2.5.0 (nothing major has happened since then)
xserver: master (be sure to update your input drivers too!)
That's 2 things with version numbers on them compared to 2 weeks ago. The X server would have a version number on this list too, but we missed that the glyph cache, which is critical for 2D performance, didn't make it into the 1.5 series.