How we messed with boot-time drivers

One of our customers recently reported a problem with running our software in tandem with VMware vShield. Although the solution turned out to be quite simple, the investigation process is worth writing a blog post about, since it revealed some interesting facts related to driver behavior at boot time.

The main idea behind VMware vShield is to provide antivirus services to virtual machines without installing a full antivirus solution on guest operating systems. Input data from virtual machines is obtained from a small driver named vsepflt.sys and called vShield Endpoint Thin Agent. The driver registers itself as a file system minifilter and then communicates with the antivirus solution on the host system. Unfortunately, when our software was running inside a virtual machine, the vsepflt.sys driver mysteriously refused to start at boot time, although it worked fine when loaded manually.

We thought the manual load solutions wasn’t good enough, because even though it would probably work fine in practice, it might indicate something nasty inside our software. All we knew was that the vsepflt.sys driver initialization failed with the error code 0x80000011 (STATUS_DEVICE_BUSY).

Since there was no direct or indirect indication where the problem might be in our software and it wasn’t easily possible to examine the exact conditions on the customer’s machine during the system boot, I decided to look into the vsepflt.sys driver a bit. The driver (at least its initialization routines) turned out to be quite well-written – all error paths in its DriverEntry function (the routine responsible for driver initialization) seemed to produce a debug print saying what probably went wrong. Routine names were usually included within the debug messages which made the code more readable. Also, routines responsible for debug prints suggested that the messages aren’t just sent to the debugger, but also to another place outside the virtual machine where the customer can collect them and provide them to us (enabling debug logging for the VMware Tools vShield Endpoint thin agent driver). After a while this actually happened, and after instructing VMware to produce more detailed logs, the following debug message was discovered:

Listing 1: Debug messages from vsepflt.sys

DEBUG: VFileUmcReadParams : umcMsgTimeoutMs: 2000
 DEBUG: VFileUmcReadParams : umcMsgTimeout100Ns: -20000000
 INFO: DriverEntry : Created EPSec control device: 0xfffffa8003e1a6a0.
 DEBUG: VSockLayer_Init : Initializing VSockLayer module
 ERROR: VSockLayer_Init : VMCISock_WskCaptureProviderNPI: 0x80000011
 ERROR: VFileSocketMgr_Init : Failed to initialize vsock layer: 0x80000011
 ERROR: DriverEntry : VFileSocketMgr_Init() failed: 0x80000011
 AUDIT: DriverEntry : vfileFilter build-2530600 load failed: 0x80000011

The message indicated that a call to the WskCaptureProviderNPI routine failed with a STATUS_DEVICE_BUSY error code. Those of you familiar with the Winsock Kernel socket-like interface introduced on Windows Vista that allows kernel components to conveniently communicate over network know that the routine with the same name needs to be called when initializing the client side of the interface (registering a Winsock Kernel application). However, it didn’t prove to be that simple. The vsepflt.sys driver does not use Winsock Kernel (that would cause problems with pre-Vista compatibility) and neither of the WSK initialization routines (WskRegister, WskCaptureProviderNPI) contain a code path which obviously contained the error code we had seen. So, a deeper analysis was required.

It turned out that the vsepflt.sys driver uses the VMware vSockets interface to communicate with software on the host operating system. VMware vSockets is a socket-like interface made specifically for that purpose. It consists of a kernel driver and a DLL that registers as Winsock Service Provider (WSP), so an application may use standard socket functions to utilize the interface; it just needs to specify a special address family constant when creating a socket. The good news was that the vSockets interface is documented. Unfortunately, there was also bad news: the documentation clearly says that the interface is intended to be used only by applications and not by third-party kernel components.

That caused quite a problem since we had speculated about creating a simple file system minifilter driver that would act like the vsepflt.sys, at least during the initialization phase. Fortunately, we knew quite precisely where things went wrong inside vsepflt.sys (the debug messages helped here). Further reverse engineering revealed that the failure is caused by sending an IOCTL request to device object named \Device\vmci served by \Driver\vsock, the driver implementing the kernel part of VMware vSockets. The discovery allowed us to create a test driver that attempted to send the same IOCTL during its initialization. The code sending the IOCTL is shown on Listing 2. You can see that the IOCTL is not very hard to build since it just needs to zero the whole input buffer except the first DWORD, which needs to be set to 1. Things are a little bit more complicated inside vsepflt.sys; there is probably a static library that handles vSockets operations but that hardly matters at all if we are only interested in sending the IOCTL.

Listing 2: Sending the WskCaptureProviderNPI-like IOCTL (for 32-bit Windows versions)

NTSTATUS _ConnectVSocks(void)
{
  LARGE_INTEGER timeout;
  PIRP irp = NULL;
  PIO_STACK_LOCATION irpStack = NULL;
  KEVENT event;
  IO_STATUS_BLOCK iosb;
  PDEVICE_OBJECT deviceObject = NULL;
  PFILE_OBJECT fileObject = NULL;
  UNICODE_STRING uDeviceName;
  NTSTATUS status = STATUS_UNSUCCESSFUL;

  timeout.QuadPart = -10000000;
  RtlInitUnicodeString(&uDeviceName, L"\\DosDevices\\vmci");
  for (SIZE_T i = 0; i < 100; ++i) {
    status = IoGetDeviceObjectPointer(&uDeviceName, 0x1, &fileObject, &deviceObject);
    if (NT_SUCCESS(status)) {
      ULONG inputBuffer[0x5C / sizeof(ULONG)];

      memset(inputBuffer, 0, 0x5C);
      inputBuffer[0] = 1;
      KeInitializeEvent(&event, NotificationEvent, FALSE);

      irp = IoBuildDeviceIoControlRequest(0x810320C0, deviceObject, inputBuffer, sizeof(inputBuffer), inputBuffer, sizeof(inputBuffer), FALSE, &event, &iosb);
      if (irp != NULL) {
        irpStack = IoGetNextIrpStackLocation(irp);
        irpStack->FileObject = fileObject;
        status = IoCallDriver(deviceObject, irp);
        if (status == STATUS_PENDING) {
          (VOID)KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL);
          status = iosb.Status;
        }

        if (NT_SUCCESS(status))
          break;
      } else {
        status = STATUS_INSUFFICIENT_RESOURCES;
        ObDereferenceObject(fileObject);
      }
    }
      
    if (!NT_SUCCESS(status))
      KeDelayExecutionThread(KernelMode, FALSE, &timeout);
  }

  return status;
}

After the test driver was built, we were able to successfully reproduce the problem in our environment. When our software was running, the IOCTL always failed with STATUS_DEVICE_BUSY, no matter how many attempts we made (Listing 2 indicates that the test driver attempts to send the IOCTL 100 times before giving up). The best thing about the test driver, however, was that we were able to examine the operating system state. The most interesting thing we found was the list of boot-start drivers initialized (or being initialized) that is shown on Listing 3. To compare the list with a machine containing just VMware Tools, see Listing 4. I decided to list driver objects, rather than mapped driver files, to get a better idea of which of them were already initialized.

Listing 3: drivers initialized early at boot-time on a machine running our software

0: kd> !object \Driver
Object: 8c458b08  Type: (85547b50) Directory
    ObjectHeader: 8c458af0 (new version)
    HandleCount: 0  PointerCount: 23
    Directory Object: 8c408ed0  Name: Driver

    Hash Address  Type          Name
    ---- -------  ----          ----
     01  855d2a70 Driver        LSI_SAS
     04  8556bf38 Driver        vmbus
         8555c798 Driver        Compbatt
         85628f38 Driver        msisadrv
     05  855665c8 Driver        mountmgr
     08  8556b788 Driver        atapi
     09  8555d7d8 Driver        volmgrx
     11  8556da20 Driver        amdxata
     16  8556b670 Driver        vsock
     17  85566e40 Driver        vmci
     18  856481b8 Driver        00000142
         855f18a8 Driver        WMIxWDM
         855f92d8 Driver        ACPI_HAL
     25  8555c670 Driver        volmgr
     29  85628978 Driver        pci
     30  8555ce10 Driver        partmgr
     32  8554c908 Driver        ACPI
         8556d030 Driver        msahci
         8554d320 Driver        Wdf01000
     33  85619668 Driver        PnpManager
     34  856287d0 Driver        vdrvroot
     36  85566f38 Driver        intelide
0: kd> !drvobj \Driver\vmci
Driver object (85566e40) is for:
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for vmci.sys -
 \Driver\vmci
Driver Extension List: (id , addr)

Device Object list:

0: kd> !object \Global??\VMCIDev
Object Global??\VMCIDev not found

Listing 4: List of initialized boot-time drivers on a nearly clean machine

 0: kd> !object \Driver
Object: 8c067dd8  Type: (851ce040) Directory
    ObjectHeader: 8c067dc0 (new version)
    HandleCount: 0  PointerCount: 23
    Directory Object: 8c004e18  Name: Driver

    Hash Address  Type          Name
    ---- -------  ----          ----
     01  85825460 Driver        LSI_SAS
     04  8539f1e8 Driver        vmbus
         85b60730 Driver        Compbatt
         85ca3330 Driver        msisadrv
     05  8539b1e8 Driver        mountmgr
     08  8537f1e8 Driver        atapi
     09  85cf1998 Driver        volmgrx
     11  85213568 Driver        amdxata
     16  8539e1e8 Driver        vsock
     17  853291e8 Driver        vmci
     18  852471b8 Driver        00000142
         85230030 Driver        WMIxWDM
         8522f908 Driver        ACPI_HAL
     25  85b17030 Driver        volmgr
     29  858f3088 Driver        pci
     30  85a4a728 Driver        partmgr
     32  853971e8 Driver        msahci
         8514c8c0 Driver        ACPI
         85235db8 Driver        Wdf01000
     33  85222ce0 Driver        PnpManager
     34  85be4e88 Driver        vdrvroot
     36  853411e8 Driver        intelide
0: kd> !drvobj \Driver\vmci
Driver object (853291e8) is for:
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for vmci.sys -
 \Driver\vmci
Driver Extension List: (id , addr)

Device Object list:
8584c6e8  8584c890 
0: kd> !object \Global??\VMCIDev
Object: 8c1cb978  Type: (851cef78) SymbolicLink
    ObjectHeader: 8c1cb960 (new version)
    HandleCount: 0  PointerCount: 1
    Directory Object: 8c008d20  Name: VMCIDev
    Target String is '\Device\VMCIHostDev'
0: kd> !devobj \Device\VMCIHostDev
Device object (8584c890) is for:
 VMCIHostDev \Driver\vmci DriverObject 853291e8
Current Irp 00000000 RefCount 0 Type 00000004 Flags 00002040
Dacl 8c1fbca8 DevExt 8584c948 DevObjExt 8584c9c8
ExtensionFlags (0000000000) 
Characteristics (0x00000100)  FILE_DEVICE_SECURE_OPEN
AttachedTo (Lower) 8522f030 \Driver\PnpManager
Device queue is not busy.

The strange thing about Listing 3 was that none of our drivers was actually loaded, although they were registered as boot-time drivers. The vsock.sys driver was loaded and probably initialized enough to receive IOCTL requests. I decided to take a quick look into that driver too, just to see if there were any obvious dependencies with other drivers. And there was one.

The vsock.sys driver communicates with another VMware driver, vmci.sys (\Driver\vmci, do not confuse with \Device\vmci owned by \Driver\vsock) that is also started at boot time. The \DosDevices\VMCIDev symbolic link is used to establish a communication channel to the driver. As can be seen from Listing 3, no such symbolic link exists while it is present on Listing 4. So, I deduced that our software somewhat restricts the vmci.sys driver from creating its device objects early enough.

The vmci.sys driver serves a virtual PnP device of System setup class that probably represents a root bus device for all activity related to VMware Tools. Since one of our drivers registers as an upper filter for this setup class, initialization of the vmci.sys driver was postponed until all upper filters were ready (specifying Driver Load Order). That’s exactly why the driver wasn’t ready when needed by vsock.sys. And that caused the STATUS_DEVICE_BUSY problem. Remember that Listing 3 shows that none of our drivers are loaded that early during boot.

The whole story came to its happy end at last. We proposed two solutions to the problem: removing our upper filter for the System device setup class or moving our drivers into a load order group, such as System Boot Extender, that loads earlier at boot time. We learned again that one is never careful enough when it comes to the initialization of boot drivers. Messing with the boot-time load order may not only bring the system into unbootable state, but can also lead to problems with much less obvious causes.

Finally, we would like to thank our customer for being really helpful when we needed additional information related to the problem. This is not always the case.


Safetica Employee Martin

Written by Martin Drab, Developer @Safetica Technologies

Martin always digs deep to know how things work under the hood. Especially interested in low-level and kernel programming on Windows, he maintains core parts of the Device Control for you. A flawlessly working code, an interesting technical article and a good swimming time are things that please him most.

 

email

This article has 1 comments

  1. Farid Reply

    Your focus in resolving the case for the customer was commendable! Not to mention the patience! That is the kind of support dedication that we like with Safetica Team ensuring that customers ALWAYS enjoy using the product and have a support team to rely on when the need arises. Good Work Martin!

Leave a Comment

Your email address will not be published. Required fields are marked *