nt-load-order Part 2: More than you ever wanted to know

This is the second part of a two-part blog series on WinDbg fundamentals, the Windows driver load order, and my nt-load-order crate:

nt-load-order-gui screenshot

Welcome to the second part of my blog series on the Windows driver load order. Good to see that you made it up to here.

After the preparations in my last post, we are now finally equipped with all necessary tools to analyze the load order and develop a compatible sorting algorithm.

The hardcoded modules

Let’s begin with the easy part. As we can conclude from the call tree shown in the previous post, the first 3-4 modules in the load order are hardcoded. It all begins with the kernel binary ntoskrnl.exe and is followed by the HAL hal.dll (which is just a stub in latest Windows but still loaded). If you enabled kernel debugging during boot, the corresponding KD transport driver is loaded. In my case, this is kdcom.dll for the serial kernel debugging driver.

Up to here, this matches exactly what I already knew from the open-source ReactOS bootloader (“FreeLdr”). However, Windows is a different breed and loads a fourth hardcoded driver, namely the CPU microcode updater mcupdate.dll. This driver is different depending on the detected CPU, which is why it’s actually called mcupdate_AuthenticAMD.dll in the filesystem of my machine. Nevertheless, the BaseDllName set in the KLDR_DATA_TABLE_ENTRY structure is always mcupdate.dll.

These 3-4 modules are always the first ones you see when dumping the LoadOrderListHead. They don’t change their positions anymore after having been added.

Let’s now come to the actual drivers.

Loading boot drivers

Looking at the call tree, we have now reached the part where OslGetBootDrivers is called. As you can imagine from the subcalls, things are more dynamic here: The bootloader checks the registry for the drivers to load in OslHiveFindDrivers, adds dependencies of these drivers in CmpAddDependentDrivers, and finally sorts the list in CmpSortDriverList.

On the first glance, all of this looks similar to what has been researched before and what’s also used in the ReactOS bootloader. On the other hand, ReactOS has no support for API Sets in the bootloader, so things must be subtly different again on Windows.

A trip into the registry

Collecting the dynamic boot drivers begins by checking the List value in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ServiceGroupOrder registry key. This is a multi-line string value (REG_MULTI_SZ) containing the names of driver groups that are meant to be loaded.

Registry Editor showing the ServiceGroupOrder key and its List value

Each of these groups has a binary value (REG_BINARY) in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GroupOrderList registry key. This is basically a length-prefixed array. The first 4 bytes represent a little-endian integer denoting the count of items that follow. The following items are also 4-byte little-endian integers called tags. What matters here is both the value and the order of the tags. But we’ll come to this shortly. This format hasn’t changed for decades, so you can find several independent implementations these days (e.g. in the ReactOS bootloader).

Registry Editor showing the GroupOrderList key and its Base value

Finally, each boot driver has a subkey in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services. But this is not limited to boot drivers: Demand-loaded drivers, legacy drivers, and NT services also claim their spots in this registry key.

To write a bootloader, we need to enumerate all subkeys and check the Start value of each subkey. This REG_DWORD value denotes if a service shall be started and by whom. All possible configurations are documented by Microsoft, but our bootloader is only interested in zero values, which indicate boot drivers.

Registry Editor showing the 3ware Services key and its Start value

Recent Windows versions have added another layer of indirection on top: The registry subkey of a driver may contain a StartOverride subkey with an extra value for each hardware profile.

Hardware profiles had their heydays during the early days of Windows NT when laptops had lots of I/O devices but too few interrupt lines to use all of them at the same time. Some hardware devices also refused to work simultaneously for other reasons. Hardware profiles offered an easy way to only activate one of the affected devices at boot time.

Current Windows versions are still structured around hardware profiles under the hood, but I haven’t seen them being used for a long time. Anyway, the StartOverride subkey is a remnant of this time, and I still saw it used in Windows 10 and 11. An example is the “3ware” driver, which looks like a boot driver when looking at its Start value, but is turned into a demand-loaded driver via StartOverride.

Registry Editor showing the StartOverride subkey of the 3ware Services key and its value for hardware profile 0

StartOverride is one of the subtle differences I found in the bootloader of recent Windows versions. The open-source implementations known to me don’t consider this key.

Start and StartOverride are not the only interesting values for a bootloader. Each registry subkey of a driver may also contain Group and Tag values. This is used to associate a driver to a group, and reference a position in the load order (relative within the group).

Sorting the boot drivers

Now that we know about all groups and boot drivers, we can finally sort them and determine the notorious driver load order.

From a 10,000-foot view, boot drivers are sorted by their groups according to the List value in the ServiceGroupOrder key. Within these groups, the order of tags given in the GroupOrderList key is considered, and each driver claims the spot of its referenced tag.

Of course, Microsoft doesn’t make it as simple as that. If we really want to get the exact same load order as the Windows bootloader, we need to consider all the corner cases:

  • What happens if a driver has a Group but no Tag value?
    As it turns out, the Windows bootloader assigns an index of 0xffff_fffe to such drivers, putting them at the very end of the load order within the group.

  • What happens if a driver references a group but that group has no entry in GroupOrderList?
    Surprisingly, this happens in a default installation of Windows 10 and 11: Multiple drivers reference a non-existing “Core” group. In this case, the Windows bootloader simply treats the Tag value as an index. At this point, it also becomes important to know that the indices are 1-based.

  • What happens if two drivers have the same group and tag?
    There are a few such cases and the Windows bootloader manages to maintain a deterministic order for them. However, this deterministic order is a byproduct of the doubly linked list and its move operations inside the Windows bootloader. I will cover these details in the following. My first attempt was to use a sorting algorithm based on swapping elements in a vector instead of moving them in a list. But this yields a slightly different order compared to the original.

This complexity raises the question why Windows has chosen to go via the extra indirection of tags at all. Why not just maintain a string list of drivers within a group, just like the List value in the ServiceGroupOrder key?
I haven’t invented this system, so I can only speculate. But the answer probably lies in performance and future-proofing – as so often in designs of early Windows NT: With numeric tags, adding a driver to a group is as simple as editing an array of integers. Inserting a string in the middle is much more complex. That was probably considered fine for the group list, because this one rarely changes. But the Windows NT developers apparently expected drivers to be added to groups more regularly.

Additionally, you can even swap the positions of two drivers just by exchanging their tags. The array in GroupOrderList doesn’t need to be touched at all for this.

Even more corner cases

Apparently, all of this wasn’t enough for the new generation of Windows developers. In addition to this complex but manageable group and tag system, they hardcoded some groups into the bootloader. If a driver references this group, it takes precedence before all the other groups, irrespective of any entry in the ServiceGroupOrder key.

A prominent group is the “Early-Launch” group, containing Early-Launch Anti-Malware (ELAM) drivers. As the name suggests, these drivers shall be loaded at the earliest possible stage to start monitoring the system before any third-party driver is loaded. The Windows bootloader hardcodes two more groups, namely “Core Platform Extensions” and “Core Security Extensions”.

Does this conclude the chapter on hardcoded groups? Of course not!
Apart from group names, recent Windows bootloaders also hardcode individual drivers in lists. These drivers again take precedence before everything else.

I’ve identified two such lists called OslCoreDriverServices and OslTpmCoreDriverServices. Their names indeed suggest that they have good reasons to be loaded first. However, I’d really be interested to hear why these drivers justified another hardcoded list in the bootloader instead of being managed via groups and tags.

The importance of the doubly linked list

We already know from the previous post that the Windows bootloader uses a doubly linked list for passing the final driver list as LoadOrderListHead to the Windows kernel. While linked lists are not recommended for new applications in user-mode due to the inherent CPU cache misses when accessing them, they have been ubiquitous in operating system kernels and continue to be important there.

Therefore, it’s no surprise that the CmpDoSort function of the Windows bootloader uses a sorting algorithm that is fully built for doubly linked lists. What surprised me though is that any other algorithm would yield a subtly different order. You ask how that happens? Let’s find out by looking at how the Windows bootloader sorts drivers:

  1. CmpDoSort is called with a list containing the boot drivers loaded from the Services key of the registry. The registry stores them in alphabetic order and outputs them in alphabetic order. But the Windows bootloader iterates through each entry and pushes it to the front of the linked list. The result is a boot driver list that contains all items in reversed alphabetical order.

  2. The reversed alphabetical list is sorted by tag, from front to back, using Insertion sort.

  3. The resulting list is only then sorted by group, but from back to front: We begin with the last group in the ServiceGroupOrder and iterate through all drivers in the list from the last to the first one. All drivers that belong to this group are pushed to the front of the list. The iteration is continued in reverse for all drivers until we reach the first driver that we moved again. The entire process is then repeated for the second-to-last group, the third-to-last group, and so on. Don’t let the backwards iteration fool you here: By iterating backwards twice and pushing elements to the front of the list, the resulting list is actually sorted in ascending direction.

  4. The resulting boot driver list is filtered: Drivers from the hardcoded groups and service lists are removed and put into their dedicated lists. OslLoadDrivers is then called for the dedicated lists first before the remaining drivers in the boot driver list are added. This is how LoadOrderListHead gets its final order.

Starting from a reverse alphabetical list, then iterating in forward direction, later iterating again in backward direction, and all of that while moving elements around, leaves no driver untouched in the list. But they still have a deterministic order on every boot. Every driver has its list position for a reason.

I tried to reproduce the same order with a different sorting algorithm, but this turned out to be harder than expected. If you’re interested in the details, check out the What didn’t work? chapter at the end of this post. I ultimately gave up and rather spent the efforts to better understand the original algorithm and come up with a compatible reimplementation.

Handling dependencies

Only after sorting all boot drivers into a list, the bootloader even begins to look at the driver binaries. Loading them into memory is the first step, but usually not enough: We also need to inspect the PE headers of all boot drivers for imported dependencies. These need to be loaded and added to the LoadOrderListHead too.

To keep things interesting, Microsoft developers made the resulting order of imports peculiar again. You may assume it can be summarized to a simple formula like “dependencies come before their dependents”, but it’s actually not like that. Let’s take a look at how the Wdf01000 driver and its dependencies are arranged in the LoadOrderListHead:

FileReason to load
Wdf01000.sysIs a boot driver
WDFLDR.SYSImport of Wdf01000.sys
WppRecorder.sysImport of SleepStudyHelper.sys
SleepStudyHelper.sysImport of Wdf01000.sys

Looking at SleepStudyHelper.sys, its import WppRecorder.sys is loaded before. But why doesn’t the same apply to Wdf01000.sys? Why is it loaded before all of its imports in LoadOrderListHead?

An analysis of the LdrpLoadImage function inside the Windows bootloader reveals the reason behind this. The relevant part can roughly be written in pseudocode like:

function LoadImage(Image, IsImport):
    AddToEndOfLoadOrderListHead(Image)

    for Import in Image.Imports:
        LoadImage(Import, TRUE)

    if (IsImport):
        RemoveEntryFromLoadOrderListHead(Image)
        AddToEndOfLoadOrderListHead(Image)

Wdf01000.sys as a boot driver would be loaded by calling LoadImage("Wdf01000.sys", FALSE). The algorithm first inserts a given boot driver at the end of LoadOrderListHead and doesn’t move it anymore. It then handles all its imports by recursively calling itself again, indicating that these are imports. Imports are inserted at the end of the list again. Imports of imports are handled the same. However, each time all imports of an import are fully handled in LoadImage, the previously added import is removed from its position and inserted at the end of the list. As a result, you end up with a load order where an explicitly added boot driver comes before any of its imports, but its imports are added after their imports. One could say that this part at least follows a strict “dependencies come before their dependents” rule, but really only this part of it.

How important is all of this for the Windows bootloader? Wouldn’t a different algorithm work equally well? I’m afraid these questions will remain unanswered unless someone from Microsoft speaks up. Until then, we as reverse-engineers can only speculate and try to understand and replicate the original as good as possible for achieving compatibility.

Handling dependencies with API Sets

There is an important second part to consider when implementing a LoadImage-like function: Windows 10 introduced API Sets even for kernel-mode components. While your bootloader for previous Windows versions could just follow the filenames of imports in a driver’s PE header, a bootloader for modern Windows adds a few more steps:

  • Each filename is checked whether it’s an API Set (starts with either api- or ext-).
  • This API Set filename is then looked up in the API Set Map of the target operating system (found in its apisetschema.dll).
  • If an entry is found in the API Set Map, then this is the actual import file to load.
  • If no entry is found, then this import doesn’t exist on the target operating system. This situation is entirely legit: Remember that Windows also runs on totally different Microsoft hardware products these days, like Xbox and HoloLens. They don’t need all features of a general-purpose Windows operating system.

Fortunately, you don’t need to implement the details behind API Sets yourself. I already presented my Rust crate for that in a previous post: nt-apiset: A Rust parser for Windows API Set Map files

Putting it all together

Did you closely follow all rules and quirks until now? Don’t worry, you don’t need to turn this into code yourself.

I have implemented all these steps in a Rust library called nt-load-order. It can be used to analyze either the current operating system or any target system root directory. The latter step even works on non-Windows platforms (leveraging my platform-independent nt-hive crate).

The library comes with a GUI example application that demonstrates these features. You can freely turn on and off any step in the creation of the boot driver load order and find out why a certain driver got its position in the list.

All my previous crates only had basic console examples, partly due to the non-existence of a widely accepted cross-platform Rust GUI toolkit. But for this exceptionally Windows-specific crate, I found it acceptable to provide an example application exclusively targeting Windows. On the plus side, this example can be a Win32 GUI application – my first use of the native-windows-gui crate.

nt-load-order-gui example application

I would like to thank GitHub user CasualX for his pelite Rust crate, which I’m using to parse the PE headers in my library.

I also needed a data structure with at least linked list semantics to get the exact same sort order as the Windows bootloader. I found this Survey of Rust linked list libraries and out of the winners that ticked all boxes, I chose dlv-list by Scott Godwin (partly due to having a much handier name than generational_token_list, sorry for that!)

What didn’t work

This is the part that is usually left out, because people only like to talk about their wins. But I consider it equally important to also document the things that I tried and that didn’t work.

When I initially took on this task, my idea was to sort the boot drivers by using a few calls to Rust’s sort_by functions along with closures to define comparison rules. Rust’s sorting algorithm would then establish a total order for the boot drivers. sort_by would use “stable sorting”, meaning that equal elements would remain at their original positions.

Like most attempts to outsmart the original, this idea failed: While it did establish a load order, multiple boot drivers ended up on different positions compared to the original. It turned out that the exact load order of the Windows bootloader could only be replicated by also using the original algorithm described above.

This may or may not make a difference in practice. I don’t know the reasons Microsoft had for implementing it like this and whether the exact order matters. But as a reverse-engineer creating a compatible reimplementation, I need to pay utmost attention to correctness and follow the original as close as possible. This also applies to things where I have no chance to figure out the original reasons. Hence, my final code uses an algorithm that very much resembles the one found in the Windows bootloader.

Acknowledgments

This crate is dedicated to the Slovenian town of Bled, where I was that weird guy researching sorting algorithms at the beautiful lake in Summer.