Fixing PCI passthrough Windows 10 VM boot issues

I set up my PC to dual-boot Linux and Windows, while also making it possible to boot the Windows disk in a QEMU KVM-accelerated VM, using PCI passthrough, VirtIO devices, VFIO drivers and other tricks to achieve near-native performance in games. I’ve discussed this in previous blog posts.

Unfortunately, sometimes when I try to boot Windows in the VM, it gets stuck, for example while:

  • Displaying the Windows logo, or
  • Attempting to configure/repair devices during Windows boot.

This happens a bit more often than I’d like, and solutions to these problems have not been easy to find on the web, so I wanted to document my experience in the hope that others will find it useful.

Example failure mode

Apologies for the poor image quality; these are phone camera photos rather than screenshots.

Boot logo only, no indication of loading, constant CPU utilisation (1 core?)

Sometimes I also get a spinner that turns about half a revolution and then stops moving.

Windows 10 booting, showing only the Windows 10 logo and no spinners or any other loading indication
Windows 10 booting, showing only the boot logo.

Troubleshooting

Basics

Check your monitor is using the right input

If you’ve got a PCI-passthrough’d graphics card, make sure that’s plugged into the monitor and your monitor’s input setting is correct.

Reboot into native Windows

For reasons I don’t fully understand, rebooting into native Windows often fixes the issues. Subsequently the machine boots just fine in the virtual machine.

(Force-)reboot the VM

Heading says it all!

Shut down and power up the VM 3x to trigger automated diagnostics

For me, shutting down and powering up the VM three times usually results in the “Automatic Repair couldn’t repair your PC” screen. Pressing “Advanced Options” brings you to another screen, where I usually select “Continue to Windows 10” (without doing anything).

Picture showing the Automatic Repair options offered during Windows 10 boot. The title is "Automatic Repair", and the text reads "Press 'Advanced Options' to try other options to repair your PC or 'Shut down' to turn off your PC. Log File: C:\WINDOWS\System32\Logfiles\Srt\SrtTrail.txt". Below are two buttons: Shut down and Advanced Options.
Windows 10 boot automatic repair options.

“Continuing to Windows 10” has occasionally got me into Windows, although I’ve then sometimes been greeted with a very low-resolution login screen. At least I can use that to look at the Device Manager, where I’ve found that the Nvidia driver had not initialised correctly. As I’d started and stopped the VM several times, I suspect that the GPU struggles with being reset repeatedly without a proper power-down. I’ve then got back to a normally working system by shutting down my PC entirely, then starting Linux and the VM, and everything’s worked again.

Virtual hardware/driver related issues

Remove unnecessary devices

Passing through my joystick (a Thrustmaster T.16000M) while the machine boots seems to stall the boot process. Hot-plugging it while the VM is running seems to work just fine. Go figure.
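As a rough sketch of the hot-plugging approach, assuming the joystick is passed through as a libvirt USB host device: the Python below builds the <hostdev> XML and the virsh attach-device command line for attaching it to a running VM. The vendor/product IDs and the domain name are placeholders I made up for illustration; look up the real IDs with lsusb.

```python
import xml.etree.ElementTree as ET


def usb_hostdev_xml(vendor_id: int, product_id: int) -> str:
    """Build a libvirt <hostdev> element for a USB device, matched by ID."""
    hostdev = ET.Element("hostdev", mode="subsystem", type="usb", managed="yes")
    source = ET.SubElement(hostdev, "source")
    # libvirt expects the IDs as 0x-prefixed, 4-digit hexadecimal strings.
    ET.SubElement(source, "vendor", id=f"0x{vendor_id:04x}")
    ET.SubElement(source, "product", id=f"0x{product_id:04x}")
    return ET.tostring(hostdev, encoding="unicode")


def attach_command(domain: str, xml_path: str) -> list:
    """Command line that hot-plugs the device into a running domain."""
    return ["virsh", "attach-device", domain, xml_path, "--live"]


if __name__ == "__main__":
    # 0x1234/0x5678 and "win10" are hypothetical placeholders.
    print(usb_hostdev_xml(0x1234, 0x5678))
    print(" ".join(attach_command("win10", "joystick.xml")))
```

Save the generated XML to a file and run the printed command (with sudo if required) once Windows has finished booting. Leaving the device out of the persistent VM definition keeps it away from the boot process.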

Up until very recently, I had persistent issues with the VM not booting. Eventually, I discovered (using the IDE-booting method mentioned below) that the virtual LSI Logic SCSI controller showed up in the device manager without a driver. As I couldn’t easily find one, and I didn’t really need it anyway, I just removed the virtual SCSI controller.

To identify the culprit, you need to get the PCI bus, device and function numbers from the Windows Device Manager (shown as the device’s location in its properties). These can then be correlated with the libvirt XML representation of the VM, which you can get using [sudo] virsh dumpxml <VM domain name>.
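To make that correlation less tedious, here is a minimal Python sketch that lists the guest-side PCI address of every device in a dumped domain XML. The sample XML is invented for illustration, and I’m assuming that libvirt’s slot attribute corresponds to the “device” number Windows shows; bus numbers may not line up exactly on every machine type, so treat the output as a hint rather than proof.

```python
import xml.etree.ElementTree as ET

# Invented example; in practice use the output of `virsh dumpxml <name>`.
SAMPLE_XML = """
<domain type='kvm'>
  <name>win10</name>
  <devices>
    <controller type='scsi' index='0' model='lsilogic'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>
  </devices>
</domain>
"""


def guest_pci_devices(domain_xml: str) -> list:
    """Return sorted (bus, device, function, description) tuples."""
    root = ET.fromstring(domain_xml)
    found = []
    for dev in root.findall("./devices/*"):
        # Only an <address> that is a direct child of the device gives the
        # guest-side location. The <source><address> inside a <hostdev> is
        # the host-side address and is deliberately skipped.
        addr = dev.find("./address[@type='pci']")
        if addr is None:
            continue
        bus = int(addr.get("bus", "0"), 16)
        slot = int(addr.get("slot", "0"), 16)
        func = int(addr.get("function", "0"), 16)
        parts = [dev.tag]
        for attr in ("type", "model"):
            if dev.get(attr):
                parts.append(f"{attr}={dev.get(attr)}")
        found.append((bus, slot, func, " ".join(parts)))
    return sorted(found)


if __name__ == "__main__":
    for bus, slot, func, desc in guest_pci_devices(SAMPLE_XML):
        print(f"PCI bus {bus}, device {slot}, function {func}: {desc}")
```

To use it on a real VM, pass in the text saved from virsh dumpxml instead of SAMPLE_XML and compare the printed locations with what the Device Manager reports for the problem device.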

Boot disk driver issues

Normally, I use a VirtIO serial controller and a VirtIO HDD backed by a raw disk device as my Windows OS disk. I sometimes find that boot issues can be fixed by temporarily switching the HDD bus back to SATA/IDE, sorting out whatever issue exists, and then switching it back to VirtIO. If you need to install the VirtIO driver while you can only boot from SATA/IDE, I would try attaching a second VirtIO disk, setting the boot disk to SATA/IDE, and installing the driver for the second disk (I don’t know for sure if this will work).
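The bus switch itself is a small change to the disk’s <target> element in the libvirt XML. The Python sketch below does this on a dumped domain XML. The device names (vda, sda) are examples rather than my actual configuration, and I’m assuming that removing the disk’s old <address> element lets libvirt assign a fresh one that suits the new bus when the domain is redefined.

```python
import xml.etree.ElementTree as ET


def switch_disk_bus(domain_xml: str, old_dev: str, new_dev: str, new_bus: str) -> str:
    """Return domain XML with one disk moved to a different bus.

    old_dev/new_dev are target names such as 'vda' or 'sda';
    new_bus is e.g. 'sata', 'ide' or 'virtio'.
    """
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is None or target.get("dev") != old_dev:
            continue
        target.set("dev", new_dev)
        target.set("bus", new_bus)
        # The existing <address> describes the disk's place on the old bus
        # (a PCI slot for VirtIO, a drive address for SATA/IDE). Drop it so
        # that it does not conflict with the new bus type.
        for addr in disk.findall("address"):
            disk.remove(addr)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"no disk with target dev {old_dev!r} found")


if __name__ == "__main__":
    example = """
    <domain type='kvm'>
      <devices>
        <disk type='block' device='disk'>
          <driver name='qemu' type='raw'/>
          <source dev='/dev/sdX'/>
          <target dev='vda' bus='virtio'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
        </disk>
      </devices>
    </domain>
    """
    print(switch_disk_bus(example, "vda", "sda", "sata"))
```

The result can be saved to a file and loaded with virsh define, or the same change can be made by hand in virsh edit. Swapping the arguments switches the disk back to VirtIO afterwards.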

I understand that there is a distinction between boot-start drivers and normal ones. The Microsoft help page on installing a boot-start driver explains the process at Windows installation time, but to this day I don’t know how to do this after the system has been installed. Is it sufficient to install the driver normally?

The Windows startup repair features (accessed via the ‘Advanced Options’ button in the Automatic Repair screenshot further up) can be handy for checking logs using cmd and notepad, but they’re nowhere near as useful as a live Linux USB stick/CD is for repairing a Linux distribution that doesn’t boot. I only really use the command line. I’ve tried using pnputil (as recommended in this SuperUser question) for installing drivers, but this has never been the final thing that fixed an issue for me, so I can’t conclusively tell whether it ever made a difference.

Update all VirtIO drivers

Once you can boot the machine somehow, you can grab the VirtIO driver disk from https://docs.fedoraproject.org/quick-docs/en-US/creating-windows-virtual-machines-using-virtio-drivers.html (under Direct downloads -> Stable virtio-win iso) and attach that ISO to the VM via a SATA CD-ROM drive. You can then go through all the VirtIO devices in the Device Manager and try to update the driver for each. Remember to update the controllers, e.g. the Storage Controllers -> Red Hat VirtIO SCSI Controller.

ACPI/reboot related (unresolved)

One issue that I’ve always had with my Windows 10 VM is that it doesn’t shut down cleanly when powering it off from the Windows UI – instead, it reboots. This to me suggests that there is perhaps an issue in the power management/ACPI area. Normally, I work around this by manually powering the VM off from the Linux side at the right moment. Unfortunately, the right moment is not obvious while undertaking Windows updates with several reboots, so I tend to end up doing these from a natively-booted Windows.

It would be great if I was able to find a solution to this problem.

Update VM BIOS/OVMF UEFI firmware

Remember when you installed the VM, you had to tell QEMU via libvirt which firmware to use (e.g. SeaBIOS or more likely OVMF)? Depending on when and how you set up your VM, this firmware could be out of date, and updating it might be worth a try. For example, I originally used one from one of Kraxel’s repos, but have since switched to using the one from the ovmf Arch Linux package.

Do make backups of both the firmware code and variables before doing this; you’ll find the paths to both files in the libvirt XML (sudo virsh dumpxml <name>). Look for the <loader> and <nvram> tags.
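The backup step can be scripted along these lines. This is a sketch, assuming the <loader> and <nvram> elements sit directly under <os> and contain plain file paths, which is how they usually appear in a dump; the backup directory name is a hypothetical choice of mine.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def firmware_paths(domain_xml: str) -> list:
    """Return the <loader> and <nvram> file paths found under <os>."""
    root = ET.fromstring(domain_xml)
    paths = []
    for tag in ("loader", "nvram"):
        node = root.find(f"./os/{tag}")
        if node is not None and node.text and node.text.strip():
            paths.append(Path(node.text.strip()))
    return paths


def backup_firmware(domain_xml: str, backup_dir: str) -> list:
    """Copy the firmware code and variables files into backup_dir."""
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copies = []
    for src in firmware_paths(domain_xml):
        target = dest / (src.name + ".bak")
        shutil.copy2(src, target)  # copy2 also preserves timestamps
        copies.append(target)
    return copies


if __name__ == "__main__":
    import sys

    # Usage sketch: sudo virsh dumpxml <name> | sudo python3 backup_fw.py
    # "firmware-backup" is a hypothetical directory name.
    for copy in backup_firmware(sys.stdin.read(), "firmware-backup"):
        print("backed up to", copy)
```

The files typically live in directories that only root can read, so the script probably needs to run with sudo, and the VM should be shut down while the variables file is copied.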

This is one of those things that I tried, but since it didn’t fix the issue immediately afterwards and I didn’t roll back either, I can’t conclusively say whether or not this helped.

Summary

I’ve outlined various approaches I’ve tried (not always successfully) to recover my PCI-passthrough and otherwise accelerated Windows VM when it no longer boots. Unfortunately, there are still many open questions in this area; if you know any answers to any of my questions, have references worth reading or other things to add, or just want to say that one of the tips has worked for you, I look forward to hearing from you in the comments.

Updates

7/10/2018: Added example failure mode picture and added section on rebooting VM 3x.

27/12/2018: Added note about updating VirtIO Storage controllers; when switching boot disk type to something other than VirtIO, I’ve had more luck with SATA than IDE. Also added a note about updating the OVMF firmware.
