Back in early 2011, Christopher James Halse Rogers (RAOF) upstreamed a change to Mesa that allowed building the big pile of shared code as a shared library, which the various drivers could link against, so that we had only one copy on disk. Looking at a build I've got here, my i965_dri.so is 967k and libdricore is 4390k, so every driver that shares libdricore instead of carrying its own copy of the core saves about 4MB. It made a big difference to distros trying to fit everything on install CDs.
The problem with this is that it means all of Mesa's symbols have to be public, so that the drivers can get to them. This means an application could accidentally call one of our symbols (or potentially override one of our symbols with theirs). Now, we do like to prefix our symbols to make that unlikely, but looking through the symbols exported, there are some scary ones. _math_matrix_translate()? I could see that conflicting. hash_table_insert()? Oh, I bet nobody's named a function that before.
The other problem with making all our symbols visible is that the compiler doesn't get to be smart for us. All of those calls from i965_dri.so into Mesa core are actual function calls, not inlined, and each one needs a relocation resolved by the dynamic linker. We could contort our coding style to move inlineable code into headers at the expense of our sanity, but not having to manually inline is why we have optimizing compilers.
Enter megadrivers. What if we built all of the drivers together as a single .so? I've hacked up a build of i965_dri.so to build all of the driver code in with the core. If all the drivers can do this, then we get all the benefits of sharing the built code, while also allowing link-time optimization, and the application can never accidentally look under the covers.
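A minimal sketch of what keeping everything under the covers can look like at the source level, assuming a GCC- or Clang-style toolchain: build the single .so with -fvisibility=hidden so symbols default to hidden, and opt only the loader entry point back in with a default-visibility attribute. The PUBLIC macro, sketch_driver_entry(), and this hash_table_insert() are hypothetical illustrations, not Mesa's actual code. Hidden symbols stay out of the dynamic symbol table, so an application can't collide with them, and calls to them bind directly inside the library, which leaves the optimizer (especially with -flto) free to inline them.

```c
/* Hypothetical sketch. Build as a shared object, e.g.:
 *   cc -O2 -fPIC -shared -fvisibility=hidden sketch.c -o sketch.so
 * and `nm -D sketch.so` should list sketch_driver_entry but not
 * hash_table_insert. */
#include <assert.h>
#include <stddef.h>

/* Opt a single symbol back into the dynamic symbol table. */
#define PUBLIC __attribute__((visibility("default")))

/* Hidden under -fvisibility=hidden: other files in the same .so can still
 * call it, but an application defining its own hash_table_insert() can
 * neither call nor override this one. */
size_t hash_table_insert(unsigned *table, size_t size, unsigned key)
{
    assert(table != NULL && size > 0);
    size_t slot = key % size;
    table[slot] = key;
    return slot;
}

/* The only exported entry point, standing in for what a loader looks up. */
PUBLIC size_t sketch_driver_entry(unsigned key)
{
    unsigned table[16] = {0};
    size_t slot = hash_table_insert(table, 16, key);
    assert(table[slot] == key);
    return slot;
}
```

For example, sketch_driver_entry(21) returns slot 5 in the 16-entry table.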
The tricky part here was the loader interface. There are two loaders: libGL.so.1, and the X Server. Both dlopen your dri.so and look for a symbol named __driDriverExtensions (actually, libGL.so.1 also looks for __driConfigOptions, used to support the driconf application). Everything else in the driver is reached through the vtables in that structure. Each driver needs a different copy of the symbol, to point to its own functions. So to do the i965 megadriver, I made a tiny i965_dri.so which has just:
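A minimal sketch of the loader side, assuming the extension list is a NULL-terminated array of pointers to records that each begin with a name and a version, followed by extension-specific function pointers. The struct names, the "DRI_Core" string, and the create_screen entry are illustrative, and a real loader would obtain the array with dlopen() and dlsym(handle, "__driDriverExtensions"); here a static array stands in for that step so the example runs without a driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed base record: every extension starts with a name and a version. */
struct dri_extension {
    const char *name;
    int version;
};

/* Hypothetical "core" extension: base record first, then its vtable. */
struct core_extension {
    struct dri_extension base;
    int (*create_screen)(int fd);
};

static int fake_create_screen(int fd) { return fd + 100; }

static const struct core_extension fake_core = {
    { "DRI_Core", 1 }, fake_create_screen
};

/* Stand-in for the array a loader would get back from dlsym(). */
static const struct dri_extension *const fake_driver_extensions[] = {
    &fake_core.base,
    NULL,
};

/* Walk the NULL-terminated list; return the first extension with a matching
 * name and at least min_version, or NULL if the driver lacks it. */
const struct dri_extension *
find_extension(const struct dri_extension *const *exts,
               const char *name, int min_version)
{
    assert(exts != NULL && name != NULL);
    for (size_t i = 0; exts[i] != NULL; i++) {
        if (strcmp(exts[i]->name, name) == 0 && exts[i]->version >= min_version)
            return exts[i];
    }
    return NULL;
}

int driver_has_extension(const char *name, int min_version)
{
    return find_extension(fake_driver_extensions, name, min_version) != NULL;
}

/* Find the core extension and call into the driver through its vtable. */
int loader_create_screen(int fd)
{
    const struct dri_extension *ext =
        find_extension(fake_driver_extensions, "DRI_Core", 1);
    if (ext == NULL)
        return -1;
    /* Valid because base is the first member of struct core_extension. */
    const struct core_extension *core = (const struct core_extension *)ext;
    return core->create_screen(fd);
}
```

Because every call into the driver goes through these tables, the exported symbol is the only thing the loader has to agree on with the driver.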
0000000000200b20 D driDriverAPI
0000000000200e00 D __driDriverExtensions
for a total of 5.5k, and that links against the 4.6MB libmesa_dri_drivers9.3.0-devel.so, which exports:
00000000003fbcc0 R __dri2ConfigOptions
00000000002dc120 R __driConfigOptions
00000000002d522c T _fini
0000000000033a38 T _init
0000000000660a60 D _mesa_dri_core_extension
0000000000654fc0 D _mesa_dri_dri2_extension
00000000000ed300 T _mesa_dri_intel_allocate_buffer
00000000000eddd0 T _mesa_dri_intel_create_buffer
00000000000f6520 T _mesa_dri_intel_create_context
00000000000edc00 T _mesa_dri_intel_destroy_buffer
00000000000f5790 T _mesa_dri_intel_destroy_context
00000000000edc30 T _mesa_dri_intel_destroy_screen
00000000000ed3e0 T _mesa_dri_intel_init_screen
00000000000edc70 T _mesa_dri_intel_make_current
00000000000ed2e0 T _mesa_dri_intel_release_buffer
00000000000eddb0 T _mesa_dri_intel_unbind_context
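The split above can be sketched in a single file, assuming a simplified vtable: the shared core defines each driver's functions once (in the real build they would be exported from the big library, like the _mesa_dri_intel_* symbols listed), and each tiny per-driver stub defines only its own table pointing at them. The struct layout, the signatures, the core_* names, and the second radeon stub are all hypothetical; the real driDriverAPI has more entries and different signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-driver vtable. */
struct driver_api {
    int (*init_screen)(int fd);
    int (*create_context)(int screen);
};

/* ---- Shared core library: compiled once, exported to the stubs ---- */
int core_intel_init_screen(int fd)     { return 1000 + fd; }
int core_intel_create_context(int s)   { return s * 2; }
int core_radeon_init_screen(int fd)    { return 2000 + fd; }
int core_radeon_create_context(int s)  { return s * 3; }

/* ---- Per-driver stubs: each small *_dri.so carries only its table ---- */
/* Would live in i965_dri.so, standing in for its driDriverAPI. */
const struct driver_api intel_stub_api = {
    core_intel_init_screen, core_intel_create_context,
};

/* Would live in a second, hypothetical stub for another driver. */
const struct driver_api radeon_stub_api = {
    core_radeon_init_screen, core_radeon_create_context,
};

/* What a loader effectively does once it has found a table. */
int open_context(const struct driver_api *api, int fd)
{
    assert(api != NULL && api->init_screen && api->create_context);
    int screen = api->init_screen(fd);
    return api->create_context(screen);
}
```

Both stubs share one copy of the code behind them, while each still exports a table of its own, which is why the per-driver .so can stay a few kilobytes.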
With only one driver converted, this change is hardly an improvement over the previous state of affairs: alongside libdricore, you now have another copy of the core in libmesa_dri_drivers.so. I'll be converting the other classic drivers next, so that we can hopefully drop libdricore.
Initial performance results: enabling LTO on a dricore build, I saw a -0.798709% +/- 0.333703% (n=30) effect on INTEL_NO_HW=1 cairo-gl runtime. Comparing a megadrivers+LTO build against a non-megadrivers, non-LTO build, the difference was -6.35008% +/- 0.675067% (n=10).
I think this is definitely promising.
Now, there is at least one minor downside: your megadriver has to link against the shared library deps of all of the sub-drivers. That means you'll be runtime linking libdrm_radeon.so along with libdrm_intel.so, for example. The cost of that is small, and I'm willing to accept it in exchange for the reduced runtime overhead. But the Radeon guys are excited about LLVM, which has had problems breaking applications when the symbols seen by an LLVM-using app and an LLVM-using driver don't match, and I wouldn't want our driver to suffer if that's an ongoing issue. If there are problems like this, we may need to split into megadrivers-with-that-dep and megadrivers-without-that-dep, for hopefully just two copies of Mesa core instead of N.
I'm heading off to DebConf the day after tomorrow, where I'll hopefully be talking with distro folks about this plan, and about some ideas for getting graphics driver updates out faster.