Loading `libvulkan.so.1` on Linux with `std.ElfDynLib`

2025-02-10

This was originally a topic on Ziggit.dev.

At the moment, my Linux application framework seizer uses software rendering to achieve its goal of static linking. Long term, I’m not really satisfied with that as a solution. GPUs are amazingly powerful, and recreating everything they do on the CPU side is just not possible. The stumbling block for seizer is that:

  1. Graphics libraries require dynamic linking or dlopen.
  2. Linux has no system libc, so we must bring our own dlopen to access graphics libraries.
  3. Graphics libraries use functions from glibc or musl.
  4. glibc and musl don’t support this use case; they only support being used with their own dynamic linker.

(See also this post in my topic for shimizu)

Currently, I have two ideas for getting around this problem:

  1. Work with the various upstreams (mesa, glibc, musl, etc.) to support being loaded by third-party dlopen implementations.
  2. Write my own GPU library. (I’ve been meaning to watch “Raw dogging linux graphics (DRM)” on YouTube.)

For this topic, I am focused on idea 1.


Before filing any issues, I want to make sure that I have something concrete to go off of. I’ve created a repo here with a small program that attempts to load libvulkan.so.1.

My first attempt looked something like this:

pub fn main() !void {
    var vulkan_dynlib = try std.DynLib.open("libvulkan.so.1");
    defer vulkan_dynlib.close();

    const vkGetInstanceProcAddr = vulkan_dynlib.lookup(vk.PfnGetInstanceProcAddr, "vkGetInstanceProcAddr") orelse return error.vkGetInstanceProcAddrNotFound;

    const vkb: VulkanBaseDispatch = try VulkanBaseDispatch.load(vkGetInstanceProcAddr);
    std.log.debug("vkCreateInstance = {}", .{vkb});
}

const VulkanBaseDispatch = vk.BaseWrapper(vulkan_apis);

const vk = @import("vulkan");
const vulkan_apis: []const vk.ApiInfo = &.{
    vk.features.version_1_0,
};

const dl_lib = @import("dlopen-mesa-with-zig_lib");
const std = @import("std");

Running the program gives us this output, and we are given a stark reminder of the uphill battle we may be facing:

error: ElfHashTableNotFound
/home/geemili/code/zig/build-master/stage3/lib/zig/std/dynamic_library.zig:346:45: 0x103e340 in open (dlopen-mesa-with-zig)
            .hashtab = maybe_hashtab orelse return error.ElfHashTableNotFound,
                                            ^
/home/geemili/code/zig/build-master/stage3/lib/zig/std/dynamic_library.zig:32:28: 0x103b126 in open (dlopen-mesa-with-zig)
        return .{ .inner = try InnerType.open(path) };
                           ^
/home/geemili/code/dlopen-mesa-with-zig/src/main.zig:2:25: 0x103af78 in main (dlopen-mesa-with-zig)
    var vulkan_dynlib = try std.DynLib.open("libvulkan.so.1");
                        ^

My system libvulkan.so.1 contains no DT_HASH entry, only a DT_GNU_HASH entry. Glibc recently disabled DT_HASH by default. While the technical merits can be debated, what this unquestionably represents is a disregard for backwards compatibility. Issues like this are why seizer only supports a built-in software renderer at the moment. But I digress.
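For context, a loader finds the symbol hash table by scanning the library’s dynamic section for a DT_HASH or DT_GNU_HASH tag. Here is a minimal sketch of that scan, assuming a 64-bit ELF file. The `Dyn`, `HashTables`, and `findHashTables` names are hypothetical and only for illustration; this is not the standard library’s code.

```zig
const std = @import("std");

// Dynamic section tags: DT_NULL and DT_HASH come from the ELF gABI,
// DT_GNU_HASH is the GNU extension.
const DT_NULL: i64 = 0;
const DT_HASH: i64 = 4;
const DT_GNU_HASH: i64 = 0x6ffffef5;

/// Same layout as an Elf64_Dyn entry (hypothetical local definition).
const Dyn = extern struct {
    tag: i64,
    val: u64,
};

/// Addresses of the hash tables that were found, if any.
const HashTables = struct {
    sysv: ?u64 = null,
    gnu: ?u64 = null,
};

/// Walks the dynamic entries until DT_NULL and records both hash table tags.
fn findHashTables(dynamic: []const Dyn) HashTables {
    var found: HashTables = .{};
    for (dynamic) |entry| {
        switch (entry.tag) {
            DT_NULL => break, // DT_NULL terminates the dynamic array
            DT_HASH => found.sysv = entry.val,
            DT_GNU_HASH => found.gnu = entry.val,
            else => {},
        }
    }
    return found;
}

pub fn main() void {
    // Made-up entries shaped like a library linked with only a GNU hash table.
    const dynamic = [_]Dyn{
        .{ .tag = DT_GNU_HASH, .val = 0x2f0 },
        .{ .tag = DT_NULL, .val = 0 },
    };
    const found = findHashTables(&dynamic);
    std.debug.print("DT_HASH present: {}, DT_GNU_HASH present: {}\n", .{
        found.sysv != null,
        found.gnu != null,
    });
}
```

A loader that only looks at the `sysv` field would have nothing to work with for a library like this one, which is the situation the error above reports.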

To work around Zig’s standard library not supporting DT_GNU_HASH, I copied ElfDynLib from Zig’s standard library and added support for looking up symbols using DT_GNU_HASH. I used “ELF: better symbol lookup via DT_GNU_HASH” by flapenguin.me for reference. (UPDATE: made a pull request adding DT_GNU_HASH support to Zig’s standard library)
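The hash function that DT_GNU_HASH lookups are built on is small enough to show in full. This is a sketch of the well-known GNU symbol hash (start at 5381, multiply by 33, add each byte, wrapping at 32 bits); the `gnuHash` name is mine, and the code in my copy of ElfDynLib may be structured differently.

```zig
const std = @import("std");

/// GNU symbol hash: h = h * 33 + c over each byte, starting from 5381,
/// with 32-bit wrapping arithmetic.
fn gnuHash(name: []const u8) u32 {
    var h: u32 = 5381;
    for (name) |c| {
        h = (h *% 33) +% c;
    }
    return h;
}

pub fn main() void {
    std.debug.print("exit -> 0x{x}\n", .{gnuHash("exit")}); // prints "exit -> 0x7c967e3f"
    std.debug.print("vkGetInstanceProcAddr -> 0x{x}\n", .{gnuHash("vkGetInstanceProcAddr")});
}
```

The resulting value is used three ways during lookup: to test the Bloom filter, to pick a bucket, and to compare against the hashes stored in the chain.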

Having shaved that yak, I replaced std.DynLib with our copy of ElfDynLib, and ran it again. This time we get:

~/code/dlopen-mesa-with-zig> zig build run
debug: bits found in bloom filter, symbol vkGetInstanceProcAddr may exist
debug: found hash match for 0x53f1e7aa
debug: found symbol "vkGetInstanceProcAddr"
Segmentation fault at address 0x0
???:?:?: 0x0 in ??? (???)
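The first debug line comes from the Bloom filter check that a DT_GNU_HASH lookup performs before touching any bucket. Below is a minimal sketch of that check, assuming a 64-bit ELF file (so 64-bit Bloom words), a non-empty filter, and a shift smaller than 32. `bloomMayContain` is a hypothetical name, not the function in my ElfDynLib copy.

```zig
const std = @import("std");

/// Returns false when the symbol is definitely absent, and true when it may
/// exist. Assumes `bloom.len > 0` and `bloom_shift < 32`.
fn bloomMayContain(bloom: []const u64, bloom_shift: u32, hash: u32) bool {
    // Each hash selects one 64-bit word of the filter...
    const word = bloom[(hash / 64) % bloom.len];
    // ...and two bit positions within that word.
    const shift: u5 = @intCast(bloom_shift);
    const bit1: u6 = @truncate(hash);
    const bit2: u6 = @truncate(hash >> shift);
    const mask = (@as(u64, 1) << bit1) | (@as(u64, 1) << bit2);
    // Both bits must be set for the symbol to possibly be present.
    return (word & mask) == mask;
}

pub fn main() void {
    // A one-word filter with bits 5 and 2 set, matching hash = 5 with shift = 1.
    const bloom = [_]u64{(1 << 5) | (1 << 2)};
    std.debug.print("{}\n", .{bloomMayContain(&bloom, 1, 5)}); // prints "true"
}
```

A `true` result only means the symbol may exist, which is why the log goes on to walk the chain and compare the full hash and the name.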

Running it with lldb, we can see that the segmentation fault occurs inside vkGetInstanceProcAddr:

~/code/dlopen-mesa-with-zig> lldb ./zig-out/bin/dlopen-mesa-with-zig
(lldb) target create "./zig-out/bin/dlopen-mesa-with-zig"
Current executable set to '/home/geemili/code/dlopen-mesa-with-zig/zig-out/bin/dlopen-mesa-with-zig' (x86_64).
(lldb) run
Process 3009743 launched: '/home/geemili/code/dlopen-mesa-with-zig/zig-out/bin/dlopen-mesa-with-zig' (x86_64)
debug: bits found in bloom filter, symbol vkGetInstanceProcAddr may exist
debug: found hash match for 0x53f1e7aa
debug: found symbol "vkGetInstanceProcAddr"
debug: vkGetInstanceProcAddr = fn (vk.Instance, [*:0]const u8) callconv(.c) ?*const fn () callconv(.c) void@7ffff7f0e4e0
Process 3009743 stopped
* thread #1, name = 'dlopen-mesa-wit', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x0)
    frame #0: 0x0000000000000000
error: memory read failed for 0x0
(lldb) bt
* thread #1, name = 'dlopen-mesa-wit', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x00007ffff7f0e51b
    frame #2: 0x000000000103d7d3 dlopen-mesa-with-zig`vk.BaseWrapper(loader=0x00007ffff7f0e4e0).load__anon_4774 at vk.zig:27075:27
    frame #3: 0x000000000103be42 dlopen-mesa-with-zig`main.main at main.zig:8:64
    frame #4: 0x000000000103bc9b dlopen-mesa-with-zig`start.posixCallMainAndExit [inlined] start.callMain at start.zig:656:37
    frame #5: 0x000000000103bc81 dlopen-mesa-with-zig`start.posixCallMainAndExit [inlined] start.callMainWithArgs at start.zig:616:20
    frame #6: 0x000000000103bc08 dlopen-mesa-with-zig`start.posixCallMainAndExit(argc_argv_ptr=0x00007fffffffe3e0) at start.zig:571:36
    frame #7: 0x000000000103b84e dlopen-mesa-with-zig`start._start at start.zig:271:5
(lldb) fr sel 1
frame #1: 0x00007ffff7f0e51b
->  0x7ffff7f0e51b: testl  %eax, %eax
    0x7ffff7f0e51d: jne    0x7ffff7f0e550
    0x7ffff7f0e51f: movq   0x603e2(%rip), %rbx
    0x7ffff7f0e526: movq   -0x38(%rbp), %rax
(lldb) disassemble -s 0x7ffff7f0e4e0 -e 0x7ffff7f0e51b+16
    0x7ffff7f0e4e0: endbr64 
    0x7ffff7f0e4e4: pushq  %rbp
    0x7ffff7f0e4e5: movq   %rsp, %rbp
    0x7ffff7f0e4e8: pushq  %r15
    0x7ffff7f0e4ea: pushq  %r14
    0x7ffff7f0e4ec: pushq  %r13
    0x7ffff7f0e4ee: leaq   0x3352c(%rip), %r13
    0x7ffff7f0e4f5: pushq  %r12
    0x7ffff7f0e4f7: movq   %rdi, %r12
    0x7ffff7f0e4fa: pushq  %rbx
    0x7ffff7f0e4fb: subq   $0x18, %rsp
    0x7ffff7f0e4ff: movq   %fs:0x28, %rbx
    0x7ffff7f0e508: movq   %rbx, -0x38(%rbp)
    0x7ffff7f0e50c: movq   %rsi, %rbx
    0x7ffff7f0e50f: movq   %r13, %rsi
    0x7ffff7f0e512: movq   %rbx, %rdi
    0x7ffff7f0e515: callq  *0x6055d(%rip)
->  0x7ffff7f0e51b: testl  %eax, %eax
    0x7ffff7f0e51d: jne    0x7ffff7f0e550
    0x7ffff7f0e51f: movq   0x603e2(%rip), %rbx
    0x7ffff7f0e526: movq   -0x38(%rbp), %rax
    0x7ffff7f0e52a: fs     
(lldb) 

vkGetInstanceProcAddr is loaded at 0x7ffff7f0e4e0, and the return address in frame #1 of the backtrace is 0x7ffff7f0e51b.

Opening libvulkan.so.1 with rizin, we can seek to sym.vkGetInstanceProcAddr, which sits at 0x000284e0 in the file:

~/code/dlopen-mesa-with-zig> rizin /usr/lib/libvulkan.so.1
ERROR: Cannot determine entrypoint, using 0x00007040.
 -- The unix-like reverse engineering framework.
[0x00007040]> aaa
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls
[x] Analyze len bytes of instructions for references
[x] Check for classes
[x] Analyze local variables and arguments
[x] Type matching analysis for all functions
[x] Applied 0 FLIRT signatures via sigdb
[x] Propagate noreturn information
[x] Integrate dwarf function information.
[x] Resolve pointers to data sections
[x] Use -AA or aaaa to perform additional experimental analysis.
[0x00007040]> s sym.vkGetInstanceProcAddr
[0x000284e0]> 

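Doing a bit of math, we can work out which instruction we want to see. The return address is 0x7ffff7f0e51b - 0x7ffff7f0e4e0 = 0x3b bytes past the start of the function, which corresponds to 0x000284e0 + 0x3b = 0x0002851b in the file. The call we are after is the instruction just before that address: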

[0x000284e0]> pd 20
            ;-- vkGetInstanceProcAddr:
┌ sym.vkGetInstanceProcAddr(int64_t arg1, const char *s1);
│           ; arg int64_t arg1 @ rdi
│           ; arg const char *s1 @ rsi
│           ; var int var_48h @ stack - 0x48
│           ; var int64_t var_40h @ stack - 0x40
│           0x000284e0      endbr64                                    ; RELOC TARGET 64 vkGetInstanceProcAddr @ 0x000284e0
│           0x000284e4      push  rbp
│           0x000284e5      mov   rbp, rsp
│           0x000284e8      push  r15
│           0x000284ea      push  r14
│           0x000284ec      push  r13
│           0x000284ee      lea   r13, [str.vkGetInstanceProcAddr]     ; 0x5ba21 ; "vkGetInstanceProcAddr"
│           0x000284f5      push  r12
│           0x000284f7      mov   r12, rdi                             ; arg1
│           0x000284fa      push  rbx
│           0x000284fb      sub   rsp, 0x18
│           0x000284ff      mov   rbx, qword fs:[0x28]
│           0x00028508      mov   qword [var_40h], rbx
│           0x0002850c      mov   rbx, rsi                             ; arg2
│           0x0002850f      mov   rsi, r13                             ; const char *s2
│           0x00028512      mov   rdi, rbx                             ; const char *s1
│           0x00028515      call  qword [reloc.strcmp]                 ; [reloc.strcmp:8]=0x892b8 reloc.target.strcmp
│           0x0002851b      test  eax, eax
│       ┌─< 0x0002851d      jne   0x28550
│       │   0x0002851f      mov   rbx, qword [reloc.vkGetInstanceProcAddr.88908] ; [0x88908:8]=0x284e0 sym.vkGetInstanceProcAddr

rizin helpfully informs us that 0x00028515 is a call to reloc.target.strcmp.

0x00028515  call  qword [reloc.strcmp]   ; [reloc.strcmp:8]=0x892b8 reloc.target.strcmp

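This is no doubt a missing feature of std.DynLib. As far as I can tell, ElfDynLib doesn’t support relocations or recursively loading dynamic libraries. The slot that this call reads its target from is presumably never filled in, so the call jumps straight to address 0x0.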
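To make the missing step concrete, here is a rough sketch of what applying relocations involves on x86_64, using the relocation types and formulas from the System V psABI. Everything named here (`Rela`, `applyRelocations`, `fakeResolve`) is hypothetical, and a real loader also has to find the relocation tables, load dependencies such as libc, and handle many more relocation types.

```zig
const std = @import("std");

// x86_64 relocation types from the System V psABI.
const R_X86_64_64 = 1;
const R_X86_64_GLOB_DAT = 6;
const R_X86_64_JUMP_SLOT = 7;
const R_X86_64_RELATIVE = 8;

/// Same layout as an Elf64_Rela entry (hypothetical local definition).
const Rela = extern struct {
    offset: u64,
    info: u64,
    addend: i64,

    fn symIndex(self: Rela) u32 {
        return @truncate(self.info >> 32);
    }
    fn kind(self: Rela) u32 {
        return @truncate(self.info);
    }
};

/// Patches every relocated slot of an image mapped at `base`.
/// `resolve` maps a symbol index to the address of its definition.
fn applyRelocations(
    base: usize,
    relas: []const Rela,
    resolve: *const fn (sym_index: u32) ?usize,
) !void {
    for (relas) |rela| {
        const slot: *align(1) usize = @ptrFromInt(base + @as(usize, @intCast(rela.offset)));
        const addend: usize = @truncate(@as(u64, @bitCast(rela.addend)));
        switch (rela.kind()) {
            // B + A: an address inside the library itself.
            R_X86_64_RELATIVE => slot.* = base +% addend,
            // S: the address of an external symbol such as strcmp.
            R_X86_64_GLOB_DAT, R_X86_64_JUMP_SLOT => {
                slot.* = resolve(rela.symIndex()) orelse return error.UnresolvedSymbol;
            },
            // S + A
            R_X86_64_64 => {
                const sym = resolve(rela.symIndex()) orelse return error.UnresolvedSymbol;
                slot.* = sym +% addend;
            },
            else => return error.UnsupportedRelocation,
        }
    }
}

/// Stand-in resolver: pretends symbol 1 lives at 0x1234.
fn fakeResolve(sym_index: u32) ?usize {
    return if (sym_index == 1) 0x1234 else null;
}

pub fn main() !void {
    // A fake two-slot table standing in for part of a mapped library image.
    var image = [_]usize{ 0, 0 };
    const base = @intFromPtr(&image);
    const relas = [_]Rela{
        .{ .offset = 0, .info = R_X86_64_RELATIVE, .addend = 0x10 },
        .{ .offset = @sizeOf(usize), .info = (@as(u64, 1) << 32) | R_X86_64_GLOB_DAT, .addend = 0 },
    };
    try applyRelocations(base, &relas, &fakeResolve);
    std.debug.print("slot0 = base + 0x{x}, slot1 = 0x{x}\n", .{ image[0] - base, image[1] });
}
```

The hard part is the `resolve` callback: for libvulkan.so.1 it has to hand back addresses of functions like strcmp, which means loading a libc that is willing to be loaded this way.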

Anyway, that’s all for now. I’ll pick up from here when/if I get around to it.