Improve VM GPU passthrough support by enabling automatic OVMF MMIO window sizing when attaching device #15872

https://github.com/canonical/lxd/issues/15872

MitchellAugustin (Member) opened on Jun 25, 2025 · edited by MitchellAugustin

Please confirm:

I have searched existing issues to check if an issue already exists for my feature request.

Is your feature request related to a problem? Please describe.
Due to edk2-0005-disable-dynamic-mmio-winsize.patch, OVMF's new PlatformDynamicMmioWindow functionality is bypassed, which requires users to run additional configuration steps for GPU passthrough (something like -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536).

Users should not need to perform this extra step, as it is not required on other VM orchestration services based on recent OVMF.

Describe the solution you'd like
Enable dynamic window sizing by removing edk2-0005-disable-dynamic-mmio-winsize.patch in future LXD versions, and update the FAQ to include any other relevant information from the comment threads below.

Describe alternatives you've considered
No response

Additional context
Original report (no longer accurate after further investigation; see comments below): QEMU's default pci-hole64 size is too small for many modern GPUs. This can be worked around as described in the LXD FAQ, but I'd like to consider some nicer end-user solutions if possible so that LXD GPU passthrough "just works" without additional workarounds.

We could also consider applying, at passthrough device attachment time, a heuristic similar to what OVMF does here, where the pci-hole64-size would be a function of the physical memory address width.
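For illustration, such a heuristic could look roughly like the following sketch. This is my own approximation of the idea, not OVMF's or LXD's actual code; the function name and the 1/8-of-address-space ratio are assumptions inspired by PlatformDynamicMmioWindow, not a verified transcription of it:

```python
GIB = 1 << 30


def pci_hole64_size(phys_addr_bits: int) -> int:
    """Illustrative heuristic (assumption, not OVMF's exact logic):
    size the 64-bit PCI MMIO window as a fixed fraction (here 1/8)
    of the CPU's physical address space. Because the address space
    is a power of two, the result is too."""
    addr_space = 1 << phys_addr_bits
    return addr_space // 8


# e.g. a host CPU reporting 40 physical address bits -> 128 GiB window
print(pci_hole64_size(40) // GIB)  # -> 128
# a server CPU with 46 bits -> 8192 GiB (8 TiB) window
print(pci_hole64_size(46) // GIB)  # -> 8192
```

Under this kind of rule, hosts with wider physical addressing automatically get a window large enough for big GPU BARs, which is the "just works" behaviour the issue asks for.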

Any VMs booting with OVMF after that patch should no longer need -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536 for most GPUs, but we still need to change the pci-hole64 size via lxc config set raw.qemu='-global q35-pcihost.pci-hole64-size=2048G' (or similar).

Activity

MitchellAugustin added the Feature issue type on Jun 25, 2025

skozina added the Jira label (triggers the synchronisation of a GitHub issue in Jira) on Jun 26, 2025

MitchellAugustin (Member, Author) commented on Jul 12, 2025:

@tomponline I asked on the libvirt mailing list whether they do any explicit resizing of the pci64-hole-size, and they said that they don't, but that the behavioural difference could be due to some difference in the VM topology.

I started looking into this. One thing I noticed is that, while both my usual virt-install script template and the LXD VM I spawned with lxc launch ubuntu: --vm and lxc config device add passthroughtest gpu gpu pci=0000:1b:00.0 (adjusted to a 4 CPU, 128GB RAM limit) use QEMU CPU host passthrough, there are some differences in the lscpu output. For example, with lscpu on the libvirt VM, I see

```
Virtualisation features:
  Virtualisation:        VT-x
  Hypervisor vendor:     KVM
  Virtualisation type:   full
```

and with lscpu on the LXD VM, I only see:

```
Virtualisation features:
  Virtualisation:        VT-x
```

but I thought LXD's VMs were also KVM-based, so the omission of those two lines is confusing to me. (I see -accel kvm in the QEMU command generated by libvirt, but I figured that was just buried somewhere in the LXD config as well.)

There’s also a difference in the feature flags reported by lscpu. (see my prior link)

So, my first question to you/the LXD team is: do you know why there's a difference here? Is LXD using some variant of CPU host passthrough that is subtly different from the typical QEMU config I get with libvirt?

tomponline (Member) commented on Jul 14, 2025 · edited by tomponline:

Hi,

LXD does use KVM (it only supports that type of VM actually).

The machine type is q35:

lxd/lxd/instance/drivers/driver_qemu_templates.go, line 77 in a69a124:

```go
machineType = "q35"
```

lxd/lxd/instance/drivers/driver_qemu_templates.go, line 113 in a69a124:

```go
{key: "accel", value: "kvm"},
```

LXD uses a mixture of QMP configuration, config files and command line options, with command line options being the least preferred so as to avoid "noisy" process lists.

lxd/lxd/instance/drivers/driver_qemu_config_test.go, lines 34 to 53 in a69a124:

```go
qemuBaseOpts{architecture: osarch.ARCH_64BIT_INTEL_X86}, `# Machine
[machine]
graphics = "off"
type = "q35"
accel = "kvm"
usb = "off"

[global]
driver = "ICH9-LPC"
property = "disable_s3"
value = "1"

[global]
driver = "ICH9-LPC"
property = "disable_s4"
value = "1"

[boot-opts]
strict = "on"`,
```

LXD also uses a number of CPU extensions, but primarily it uses the "host" setting to pass through all of the host's CPU extensions, so the guest should appear to have the host's CPU:

https://github.com/canonical/lxd/blob/a69a124f4b9e8e93f29ee9bab640736

MitchellAugustin (Member, Author) commented on Jul 15, 2025:

Thanks. Is that last link meant to go somewhere else? I'm not sure how it is relevant here.

MitchellAugustin (Member, Author) commented on Aug 2, 2025 · edited by MitchellAugustin:

Hi @tomponline, I have new findings regarding this issue, and I think I figured out the root cause of why LXD behaves differently from libvirt here.

TL;DR: It seems that LXD's OVMF is not the same as what is installed via the Noble ovmf apt package. More precisely, /snap/lxd/33110/share/qemu/OVMF_VARS.4MB.fd and /snap/lxd/33110/share/qemu/OVMF_CODE.4MB.fd do not have the same dynamic MMIO window sizing functionality as /usr/share/OVMF/OVMF_CODE_4M.fd and /usr/share/OVMF/OVMF_VARS_4M.fd installed by the ovmf apt package.

I'm not sure whether these two being functionally different is by design, but the test below shows that when I bind-mount the ovmf apt package's versions onto the snap, I'm able to use the guest GPUs without overriding the QEMU settings:

Test run on DGX A100

Control test:

1. sudo snap install lxd and lxd init (defaults, but with 512GiB storage)
2. lxc launch ubuntu:noble passthroughtest --vm
3. Shut down the VM
4. Add GPUs:
   a. lxc config device add passthroughtest gpu gpu pci=0000:07:00.0
   b. lxc config device add passthroughtest gpu1 gpu pci=0000:0f:00.0
   c. lxc config device add passthroughtest gpu2 gpu pci=0000:47:00.0
   d. lxc config device add passthroughtest gpu3 gpu pci=0000:4e:00.0
5. Expand memory and CPU (not needed on all platforms; only needed for the DGXes sometimes, in my experience):
   a. lxc config set passthroughtest limits.memory 8192MB
   b. lxc config set passthroughtest limits.cpu 8
   c. lxc start passthroughtest
6. In the VM: sudo apt install ubuntu-drivers-common && sudo ubuntu-drivers install --gpgpu && sudo apt install nvidia-utils-XXX-server && sudo nvidia-smi
   a. Observe that BARs fail to map in dmesg, and that nvidia-smi fails:

```
root@passthroughtest:~# sudo nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
```

New functionality test (same VM):

1. Power down the VM
2. sudo apt install ovmf on the host
3. sudo mount -o bind /usr/share/OVMF/OVMF_CODE_4M.fd /snap/lxd/33110/share/qemu/OVMF_CODE.4MB.fd
4. sudo mount -o bind /usr/share/OVMF/OVMF_VARS_4M.fd /snap/lxd/33110/share/qemu/OVMF_VARS.4MB.fd
5. lxc start passthroughtest
6. In the VM, sudo nvidia-smi
   a. Observe that all GPUs are usable without changing QEMU settings

Additional context: I was compelled to try this when I realised yesterday that setting -global q35-pcihost.pci-hole64-size=2048G alone doesn't enable the GPU to work in my A100 guest; in fact, it was actually the change to

lxc config set passthroughtest raw.qemu=' -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=862144 ' that gets the guest GPUs to work, even without the q35-pcihost.pci-hole64-size override on that system. This didn't make sense to me, since any OVMF built after this patch has a large enough MMIO window that the opt/ovmf/X-PciMmio64Mb override is not needed for guest GPUs to work on this machine (Noble's edk2 includes this patch). My confusion here is that I thought the LXD snap v5.21.3-c5ae129 also bundled the same OVMF as in Noble, so I don't know whether that is incorrect, or whether OVMF_CODE.4MB.fd is generated differently from OVMF_CODE_4M.fd in some way that is needed for other LXD VM functionality.
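To make the override values above concrete: X-PciMmio64Mb is a window size in MiB, and it must cover the 64-bit BARs of every passed-through device. The following is a rough sketch of that relationship (my own illustration, not LXD or OVMF code; the 128 GiB BAR1 size per A100 is an assumption used only as an example):

```python
MIB = 1 << 20
GIB = 1 << 30


def mmio64_mb_override(bar_sizes_bytes):
    """Rough illustration (not LXD/OVMF code): the X-PciMmio64Mb
    value must cover the sum of all 64-bit BARs of the passed-through
    devices, with each BAR placed at its natural (power-of-two)
    alignment. Returns the minimum window size in MiB."""
    total = 0
    for size in bar_sizes_bytes:
        # align the running total up to the BAR's natural alignment
        total = (total + size - 1) & ~(size - 1)
        total += size
    return total // MIB


# e.g. four GPUs each exposing an assumed 128 GiB 64-bit BAR:
print(mmio64_mb_override([128 * GIB] * 4))  # -> 524288 (i.e. 512 GiB)
```

On this reading, the FAQ's string=65536 corresponds to a 64 GiB window, which is why multiple large GPUs can need a much bigger value like the 862144 used above.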

I think the answer to that question will determine how we’d proceed with fixing this in LXD.

MitchellAugustin (Member, Author) commented on Aug 2, 2025:

Relevant log

tomponline (Member) commented on Aug 6, 2025:

Hi @MitchellAugustin thanks for the awesome investigation.

> My confusion here is that I thought the LXD snap v5.21.3-c5ae129 also bundled the same OVMF as in Noble

In actual fact, we currently build edk2 from upstream sources, but we do have a desire to switch to Noble's edk2 sources.

https://github.com/canonical/lxd-pkg-snap/blob/5.21-candidate/snapcraft.yaml#L327-L449

Also I notice that we are currently including this patch which sounds relevant:

https://github.com/canonical/lxd-pkg-snap/blob/5.21-candidate/snapcraft.yaml#L369

Which was originally introduced to fix issues with CentOS 7 and Ubuntu 18.04 VM guests:

canonical/lxd-pkg-snap#135

So we could try removing that patch and seeing whether it's still an issue with the edk2 version we use.

tomponline changed the issue type from Feature to Bug on Aug 6, 2025

tomponline changed the title from "Automatically apply larger q35-pcihost.pci-hole64-size when attaching a passthrough device" to "Improve VM GPU passthrough support by automatically applying larger q35-pcihost.pci-hole64-size when attaching a passthrough device" on Aug 6, 2025

tomponline changed the title from "Improve VM GPU passthrough support by automatically applying larger q35-pcihost.pci-hole64-size when attaching a passthrough device" to "Improve VM GPU passthrough support by automatically applying larger q35-pcihost.pci-hole64-size when attaching device" on Aug 6, 2025

mihalicyn self-assigned this on Aug 6, 2025

tomponline (Member) commented on Aug 6, 2025:

@skozina @mihalicyn @mionaalex I think this would be a good idea to evaluate as part of our switch to the Noble edk2 sources for the 6 LTS release.

MitchellAugustin (Member, Author) commented on Aug 6, 2025:

Hi @tomponline, thanks for that info! That patch is definitely relevant, and I would expect that removing it would enable passed-through GPUs to "just work" when attached, in the same way as they do with Noble's edk2 bind-mounted into place. This would bring LXD in line with how libvirt behaves today, so my opinion is that we should remove the patch (and thus re-enable dynamic MMIO window sizing) in future LXD releases.

I do want to add some additional context though. While dynamic MMIO window sizing is the intended behaviour of OVMF, it may come with a performance penalty (of around 30 seconds extra boot time per ~128GB GPU, scaled by size and number of GPUs) for users who are currently passing through multiple large GPUs and using pci=realloc pci=nocrs, rather than the larger opt/ovmf/X-PciMmio64Mb override, to get them to work in the VMs.
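As a back-of-the-envelope illustration of how that penalty scales (my own sketch; the ~30 seconds per 128 GB figure is the rough observation quoted above, not a measured model):

```python
def extra_boot_seconds(gpu_mem_gib, secs_per_128_gib=30.0):
    """Rough linear estimate of added boot time from dynamic MMIO
    window sizing, scaled by total passed-through GPU memory in GiB.
    Illustration only; real timings vary by platform and firmware."""
    return secs_per_128_gib * sum(gpu_mem_gib) / 128.0


# e.g. four 80 GiB GPUs -> roughly 75 seconds of extra boot time
print(extra_boot_seconds([80] * 4))  # -> 75.0
```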

We saw this issue with Nvidia users of libvirt who transitioned from Jammy to Noble, since Noble introduced the dynamic MMIO window sizing in OVMF. Technically speaking, this isn't a regression: pci=realloc pci=nocrs just causes the mapping to occur at a point in the boot process where a previously existing slow path in the kernel was not being hit. That kernel bug is fully resolved, without user workarounds, in Plucky's kernel and all 6.15+ based kernels, but wasn't SRUable to older kernels.

With that said, if there are LXD users who are using pci=realloc pci=nocrs rather than a larger X-PciMmio64Mb override to get their passed-through GPUs to work, they might see boot times increase on those VMs with passthrough. If they can't upgrade to a newer kernel, they will be able to force a revert to the old OVMF behaviour by setting an X-PciMmio64Mb override smaller than what is needed for their GPUs (in conjunction with pci=realloc pci=nocrs), as long as LXD's OVMF contains my patch (usage described here; this is in Noble and beyond).

(Users who are using the larger X-PciMmio64Mb override as recommended by the LXD FAQ should not notice any negative performance impact from this switch, though.)

So, to summarise: in my opinion, the correct path forward here is still to re-enable dynamic window sizing by removing edk2-0005-disable-dynamic-mmio-winsize.patch in future LXD versions, but we should also add a note to the FAQ about how to work around this potential boot time slowdown on older kernels, in case some users rely on the faster-booting workaround that this change will shadow by default.

MitchellAugustin changed the title from "Improve VM GPU passthrough support by automatically applying larger q35-pcihost.pci-hole64-size when attaching device" to "Improve VM GPU passthrough support by enabling automatic OVMF MMIO window sizing when attaching device" on Aug 6, 2025

tomponline (Member) commented on Aug 6, 2025:

Thanks, this is excellent context.

We also need to decide, if removing that patch breaks older guests (which was the reason for disabling it in the first place), whether we should simply recommend those older guests use SeaBIOS mode (security.csm=true, for now) and avoid EDK2 altogether.
