Notably, I learned some unstated rules about how loads and stores of the tilebuffer work, and following them has significantly improved stability on the Pi (as opposed to simulation, which only asserted that I was following half of these rules).
I got an intro on the debug process for GPU hangs, which ultimately just looks like "run it through simpenrose (the simulator) directly. If that doesn't catch the problem, you capture a .CLIF file of all the buffers involved and feed it into RTL simulation, at which point you can confirm for yourself that yes, it's hanging, and then you hand it to somebody who understands the RTL and they tell you what the deal is." There's also the opportunity to use JTAG to look at the GPU's perspective of memory, which might be useful for some classes of problems. I've started on .CLIF generation (currently simulation-environment-only), but I've got some bugs in my generated files because I'm using packets that the .CLIF generator wasn't prepared for.
I got an overview of the cache hierarchy, which pointed out that I wasn't flushing the ARM dcache to get my writes out into system L2 (more like an L3) so that the GPU could see them. This should also improve stability, since before we were only getting lucky that the GPU would actually see our command stream.
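To make the flushing problem concrete, here is a toy model in Python. It is only an illustration: `ToyWriteBackCache`, `gpu_read`, and the dict standing in for memory are all made-up names, and nothing here resembles the real ARM cache maintenance operations or the VC4 memory system. It just shows why a reader that bypasses the CPU's cache sees stale data until the dirty contents are written back.

```python
class ToyWriteBackCache:
    """Hypothetical CPU-side write-back cache in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory  # dict of address -> value; what the GPU reads
        self.dirty = {}       # CPU writes that have not been written back yet

    def cpu_write(self, addr, value):
        # The write lands in the cache only; shared memory is unchanged.
        self.dirty[addr] = value

    def cpu_read(self, addr):
        # The CPU always sees its own latest write.
        return self.dirty.get(addr, self.memory.get(addr))

    def flush(self):
        # Write every dirty entry back so other readers can see it.
        self.memory.update(self.dirty)
        self.dirty.clear()


def gpu_read(memory, addr):
    """The toy GPU reads shared memory directly, bypassing the CPU cache."""
    return memory.get(addr)


memory = {0x100: "old command stream"}
cache = ToyWriteBackCache(memory)
cache.cpu_write(0x100, "new command stream")
stale = gpu_read(memory, 0x100)   # still the old contents
cache.flush()
fresh = gpu_read(memory, 0x100)   # now the new contents
```

A real cache can also evict dirty lines on its own at any time, which this model leaves out. That is the "getting lucky" case: the write sometimes reaches memory without an explicit flush, so the bug only shows up intermittently.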
Most importantly, I ended up fixing a mistake in my attempt at reset using the mailbox commands, and now I've got working reset. Testing cycles for GPU hangs have dropped from about 5 minutes to 2-30 seconds. Between working reset and improved stability from loads/stores, we're at the point that X is almost stable. I can now run piglit on actual hardware! (It takes hours, though.)
On the X front, the modesetting driver is now merged into the X Server with glamor-based X rendering acceleration. It also happens to support DRI3 buffer passing, but not Present's pageflipping/vblank synchronization. I've submitted a patch series for DRI2 support with vblank synchronization (again, no pageflipping), which will get us more complete GLX extension support, including things like GLX_INTEL_swap_event that gnome-shell really wants.
In other news, I've been talking to a developer at Raspberry Pi who's building the KMS support. Combined with the discussions with keithp and ajax last week about compositing inside the X Server, I think we've got a pretty solid plan for what we want our display stack to look like, so that we can get GL swaps and video presentation into HVS planes, and avoid copies on our very bandwidth-limited hardware. Baby steps first, though -- he's still working on putting giant piles of clock management code into the kernel module so we can even turn on the GPU and displays on our own without using the firmware blob.
- 93.8% passrate on piglit on simulation
- 86.3% passrate on piglit gpu.py on Raspberry Pi
All those opcodes I mentioned in the previous post are now completed -- sadly, I didn't get people up to speed fast enough to contribute before those projects were the biggest things holding back the passrate. I've started a page at http://dri.freedesktop.org/wiki/VC4/ for documenting the setup process and status.
And now, next steps. Now that I've got GPU reset, a high priority is switching to interrupt-based render job tracking and putting an actual command queue in the kernel so we can have multiple GPU jobs queued up by userland at the same time (the VC4 sadly has no ringbuffer like other GPUs have). Then I need to clean up the user <-> kernel ABI so that I can start pushing my Linux code upstream, and probably work on building userspace BO caching.
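As a rough picture of what interrupt-based job tracking with a kernel-side queue could look like, here is a small Python model. It is a sketch under assumptions, not the driver's design: `ToyGpu`, `JobQueue`, the sequence-number scheme, and every method name are invented for illustration. The only property taken from the text above is that the hardware runs one job at a time while userland may queue several.

```python
from collections import deque


class ToyGpu:
    """Hypothetical stand-in for hardware that runs a single job at a time."""

    def __init__(self):
        self.running = None

    def start(self, job):
        assert self.running is None, "hardware is busy"
        self.running = job

    def finish(self):
        """Pretend the job completed; return it and mark the hardware idle."""
        job, self.running = self.running, None
        return job


class JobQueue:
    """Toy kernel-side FIFO: many submitted jobs, one handed to hardware."""

    def __init__(self, gpu):
        self.gpu = gpu
        self.pending = deque()
        self.next_seqno = 1       # sequence number for the next submission
        self.completed_seqno = 0  # highest sequence number known finished

    def submit(self, name):
        seqno = self.next_seqno
        self.next_seqno += 1
        self.pending.append((seqno, name))
        self._kick()
        return seqno

    def _kick(self):
        # Feed the hardware only when it is idle and work is waiting.
        if self.gpu.running is None and self.pending:
            self.gpu.start(self.pending.popleft())

    def on_interrupt(self):
        """Job-done interrupt: record completion, then start the next job."""
        if self.gpu.running is None:
            return  # spurious interrupt, nothing was running
        seqno, _name = self.gpu.finish()
        self.completed_seqno = seqno
        self._kick()

    def is_done(self, seqno):
        return seqno <= self.completed_seqno


gpu = ToyGpu()
queue = JobQueue(gpu)
first = queue.submit("frame 1")   # starts immediately, hardware was idle
second = queue.submit("frame 2")  # waits in the queue
queue.on_interrupt()              # frame 1 done, frame 2 starts
```

Because jobs complete in submission order in this model, a single `completed_seqno` counter is enough for a waiter to ask whether its job has finished, without polling the hardware.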