Windows 7 and Server 2008 R2 Kernel Changes (TechEd Europe 2009) : Windows Research Kernel @ HPI
Michael and I got the chance to attend TechEd Europe 2009 in Berlin, where Mark Russinovich was giving a great talk about the changes within the Windows 7 and Server 2008 R2 kernels. And although this blog is about the WRK and its details, we thought it might still be valuable to stay up to date with current OSes.
For those who don't have the time to watch the whole 70-minute presentation, here is a summary of the changes:
The most notable change here is that Windows 7's version number is 6.1, whereas Vista's was 6.0. Although Windows 7 is a major release, according to Mark, Microsoft decided to use this version number to ensure backwards compatibility with applications that check for Vista's version number.
Also, the footprint of the kernel got smaller, but I have not verified that yet. Another change is that the Registry now resides completely in pageable memory. Prior to Windows 7, the Registry was in non-pageable memory. To keep the Registry from using up all of the non-paged pool, Windows implemented some strategies to selectively swap out parts of the Registry. Now, this swapping basically comes for free with paging 🙂
Microsoft also implemented a new performance analysis platform called PerfTrack that measures performance, mostly as perceived by the user, and sends the results back to Microsoft's databases. To that end, Microsoft identified up to 300 scenarios, each of which has a start event, such as clicking the Start button, and an end event, such as the Start Menu showing up. The results are constantly evaluated against Windows' performance goals.
- Windows uses an aging page replacement strategy. Prior to Windows 7, each page was assigned an age value consisting of 2 bits. Now, each page has an age value consisting of 3 bits, which allows a better approximation of which pages have been least recently used.
- So far, the kernel address space was split up into virtual sections, that is, there was a portion for the paged pool, one for the pageable system code, and one for the system cache. Now, each of these sections is represented by its own working set, which allows a better distribution of pages among them.
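To illustrate why an extra age bit helps, here is a minimal sketch of a saturating aging counter. It is a simplified model and not the actual Windows memory manager: the function names, the page names, and the access history are made up for illustration, and it assumes that a page's age is reset to 0 when it was referenced and incremented otherwise.

```python
def age_pages(ages, referenced, bits):
    """One aging pass over all pages.

    Referenced pages get age 0; all others get one step older, saturating
    at the largest value that fits into `bits` bits.
    """
    max_age = (1 << bits) - 1
    return {
        page: 0 if page in referenced else min(age + 1, max_age)
        for page, age in ages.items()
    }


def simulate(history, pages, bits):
    """Run one aging pass per entry of `history` (a list of referenced-page sets)."""
    ages = {page: 0 for page in pages}
    for referenced in history:
        ages = age_pages(ages, referenced, bits)
    return ages


# Hypothetical access history: "a" was last touched 6 passes ago,
# "b" 4 passes ago, and "c" in the most recent pass.
history = [{"a", "b", "c"}, {"b", "c"}, {"b", "c"}, {"c"}, {"c"}, {"c"}, {"c"}]

two_bit = simulate(history, ["a", "b", "c"], bits=2)
three_bit = simulate(history, ["a", "b", "c"], bits=3)
```

With 2 bits, both "a" and "b" saturate at the same maximum age and become indistinguishable, whereas with 3 bits "a" still shows up as older than "b" and is the better eviction candidate.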
The mantra for this aspect is: "Keep idle and stay idle!" What this means is that Windows 7 now tries to reduce background activity to a minimum and tries to maximize the load on a logical processor (LP, a core) before employing a new LP.
To accomplish that goal, Windows now implements Core Parking. Core Parking means that Windows tries to migrate load onto as few LPs as possible in order to bring as many idle LPs as possible into a deep sleep state, where they save the most energy. This is done with the hardware architecture in mind, i.e. Windows also tries to migrate load onto as few processor sockets as possible in order to be able to switch off entire CPU sockets and not just single cores. Windows performs a P-state calculation approximately every 50 ms and, after computing the state of the system, potentially triggers migrations. Core Parking, however, is only available on Server 2008 R2 and on clients (Windows 7) that have hyper-threading available.
What Mark did not mention here is what impact this will have on overall performance, for example because a thread is moved to another NUMA node and memory accesses therefore get more expensive, or because cache lines have to be filled again after a thread has been migrated…
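The consolidation idea behind Core Parking can be sketched as a small packing problem. The following is only an illustration under assumptions of my own, not the algorithm Windows uses: loads are given in percent of one LP, LPs are numbered socket by socket, and a simple first-fit-decreasing heuristic stands in for the real policy. The function name `park_cores` is hypothetical.

```python
def park_cores(thread_loads, sockets, lps_per_socket, capacity=100):
    """Pack thread loads (percent of one LP) onto as few LPs as possible.

    LPs are numbered socket by socket, so filling low-numbered LPs first
    also fills one socket before touching the next one.
    Returns (load per LP, indices of LPs that can be parked).
    """
    lp_load = [0] * (sockets * lps_per_socket)
    # First-fit decreasing: place the heaviest threads first.
    for load in sorted(thread_loads, reverse=True):
        for lp, used in enumerate(lp_load):
            if used + load <= capacity:
                lp_load[lp] += load
                break
        else:
            raise ValueError("not enough capacity for a load of %d" % load)
    parked = [lp for lp, used in enumerate(lp_load) if used == 0]
    return lp_load, parked


# Four threads on a hypothetical machine with 2 sockets of 2 LPs each.
lp_load, parked = park_cores([50, 30, 20, 40], sockets=2, lps_per_socket=2)
```

In this example all four threads fit onto the two LPs of the first socket, so both LPs of the second socket stay idle and the whole socket could go to sleep. The sketch deliberately ignores the migration costs (NUMA distance, cold caches) raised above.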
The goal is to keep the system idle as long as possible, i.e. to do work only if it is absolutely necessary. Windows services are sometimes a problem in this regard, as they start running at boot time and frequently wake up to poll for changes in the environment. This is why Windows 7 comes with the Unified Background Process Manager (UBPM), an infrastructure that allows services to start and stop based on registered events. Events are generated through the Event Tracing for Windows (ETW) facility; services may now subscribe to those events and are woken whenever one of their events fires. Such an event is, for example, the "arrival" of a new IP address. The UBPM infrastructure is visible through the Service Control command line tool (sc): "sc qtriggerinfo", followed by a service name, lists the trigger events that service is registered for.
Another thing that prevents Windows from staying idle and saving energy is custom timer events. They are termed custom events to differentiate them from the periodic timer interrupt that Windows uses to maintain the tick count. The periodic timer usually fires every 15 ms. This timer, at least at the moment, cannot be avoided, but the custom timer events in between may be. Therefore, Microsoft implemented Timer Coalescing in Windows, which is a mechanism for grouping custom timer events around the periodic timer. The idea here is that for most timers it is sufficient to wake up at a multiple of the periodic timer interval rather than at the exact time specified. To indicate that a timer can accept a certain degree of delay, Windows provides a new timer API that allows developers to specify a tolerable delay. This delay is used to move the arrival of the timer notification closer to the periodic timer.
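The grouping idea can be sketched in a few lines. This is a simplified model under assumptions of my own, not the Windows implementation: times are integer milliseconds, the periodic timer fires at exact multiples of 15 ms, and a timer is only ever moved to the next tick, never to an earlier one. The names `coalesce` and `wakeups` are hypothetical.

```python
def coalesce(due_ms, tolerable_delay_ms, period_ms=15):
    """Return the time at which a timer actually fires.

    If the next periodic tick at or after `due_ms` lies within the
    tolerable delay, the timer is moved onto that tick; otherwise it
    keeps its exact due time and causes an extra wake-up.
    """
    next_tick = -(-due_ms // period_ms) * period_ms  # round up to a tick
    if next_tick - due_ms <= tolerable_delay_ms:
        return next_tick
    return due_ms


def wakeups(timers, period_ms=15):
    """Distinct wake-up times for a list of (due_ms, tolerable_delay_ms) timers."""
    return sorted({coalesce(due, tol, period_ms) for due, tol in timers})


# Three hypothetical timers between two ticks (15 ms and 30 ms).
exact = wakeups([(22, 0), (26, 0), (29, 0)])
relaxed = wakeups([(22, 10), (26, 10), (29, 10)])
```

Without any tolerance, the three timers cause three separate wake-ups between two ticks. With a tolerable delay of 10 ms, all three collapse onto the tick at 30 ms, which the processor has to handle anyway.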
Speaking of the periodic timer, prior Windows versions distributed the timer interrupt among all processors. That means, whenever a timer interrupt occurred, which is always handled by one and the same processor, that processor distributed the event to all other processors, no matter what power state they were in. So it could happen that processors in a sleep state were woken up just to update their interrupt time. With Windows 7 and Server 2008 R2, the time is distributed only to those processors that are not in a sleep state. Microsoft named this approach Intelligent Timer Tick Distribution. Pretty intelligent, isn't it?
Fault Tolerant Heap
The Fault Tolerant Heap (FTH) is a mechanism that should help reduce application crashes that result from unintentionally corrupting the heap. To do so, Windows monitors application crashes and activates a heap monitor for those applications that crashed more than 4 times because of heap corruption. It also installs a shim for suspicious applications that maintains up to 4 MB of buffer to tolerate double-free errors, i.e. an application trying to free the same memory range more than once, which is among the top causes of heap corruption.
If the application keeps crashing anyway, e.g. because the corruption is caused by buffer overflows, the shim is removed and error reporting is improved to help analyze the problem.
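How a buffer of delayed frees can absorb double-free errors is sketched below. The talk did not go into the shim's internals, so this is only a toy model under my own assumptions: freed blocks are parked in a bounded queue instead of being released immediately, and freeing a block that is still parked is silently ignored. The class `DelayedFreeHeap` and its fake addresses are hypothetical.

```python
from collections import OrderedDict


class DelayedFreeHeap:
    """Toy allocator that delays frees to tolerate double-free errors."""

    def __init__(self, max_delayed_bytes=4 * 1024 * 1024):
        self.max_delayed_bytes = max_delayed_bytes
        self.live = {}                 # address -> size of blocks in use
        self.delayed = OrderedDict()   # address -> size, freed but not yet released
        self.delayed_bytes = 0
        self.next_address = 0x1000     # fake addresses, for illustration only
        self.ignored_double_frees = 0

    def alloc(self, size):
        address = self.next_address
        self.next_address += size
        self.live[address] = size
        return address

    def free(self, address):
        if address in self.delayed:
            # Second free of a block that is still parked: swallow it.
            self.ignored_double_frees += 1
            return
        size = self.live.pop(address)  # KeyError for addresses never allocated
        self.delayed[address] = size
        self.delayed_bytes += size
        # Really release the oldest parked blocks once the buffer is full.
        while self.delayed_bytes > self.max_delayed_bytes:
            _, released = self.delayed.popitem(last=False)
            self.delayed_bytes -= released


heap = DelayedFreeHeap()
block = heap.alloc(64)
heap.free(block)
heap.free(block)  # double free, absorbed by the delay buffer
```

The bounded buffer is the trade-off: a double free is only caught while the block is still parked, which is why such a scheme mitigates the error rather than fixing it.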
Process reflection is a mechanism that helps with dumping information about processes that are likely to hang or that are long-running. Prior to Windows 7, the process and all of its threads had to be suspended until the dump was completed. Process reflection now clones a process before dumping it. The cloning is based on fork(), which basically copies a process but marks all of its pages copy-on-write, i.e. whenever the original process updates some of its memory, the memory manager copies the affected page so that the original process sees the latest changes while the clone does not. This in turn allows the dump process to read the state of the cloned process without any disturbance.
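The copy-on-write behaviour can be modelled in a few lines. This is a user-level toy, not how the memory manager works internally: pages are `bytearray` objects, a clone initially shares every page with the original, and the first write to a shared page makes a private copy first. The class `Process` and its methods are hypothetical.

```python
class Process:
    """Toy process whose memory is a list of pages (bytearray objects)."""

    def __init__(self, pages, shared=None):
        self.pages = pages
        # Indices of pages that may still be shared with a clone or original.
        self.shared = set() if shared is None else shared

    def clone(self):
        """Create a snapshot that shares all pages instead of copying them."""
        self.shared = set(range(len(self.pages)))
        return Process(list(self.pages), set(self.shared))

    def write(self, page, offset, value):
        if page in self.shared:
            # Copy-on-write: take a private copy before the first write.
            self.pages[page] = bytearray(self.pages[page])
            self.shared.discard(page)
        self.pages[page][offset] = value

    def read(self, page):
        return bytes(self.pages[page])


original = Process([bytearray(b"AAAA"), bytearray(b"BBBB")])
snapshot = original.clone()             # cheap: no page is copied yet
original.write(0, 0, ord("Z"))          # only page 0 gets duplicated
```

After the write, the original sees the modified page 0 while the snapshot still sees the old contents, and the untouched page 1 is still physically shared between the two.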
- Virtual Accounts: Services may now be given virtual account names, like "NT SERVICE\service1", in order to better isolate services from one another. Passwords for those accounts are managed by the operating system.
- Native VHD Support: Windows 7 can natively mount VHD image files as partitions. Better yet, Windows can also boot from VHDs.
- Simultaneous Multithreading (SMT): In systems with hyper-threading, it may happen that two threads are scheduled on the same core, but on different hyper-threads. This does not increase throughput as much as using a second core would, because the hyper-threads of one core share its execution resources. Windows is therefore now hyper-threading aware and prefers scheduling threads on idle cores over idle logical processors of a busy core.
- Dynamic Fair Share Scheduling (DFSS): On terminal server instances, Windows now performs fair share scheduling based on the session a thread belongs to. Each session is given a time share, based on the number of sessions, over an interval of 150 ms. Whenever the share of a session is exhausted, none of the threads that belong to that session will be scheduled until all the other sessions have used up their shares.
- Increased number of processors: Windows now supports up to 256 processors (LPs).
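The Dynamic Fair Share Scheduling item in the list above can be illustrated with a small budget model. This is a sketch under my own assumptions, not the Windows scheduler: the 150 ms interval is split evenly among the sessions, a session becomes ineligible once its budget is used up, and a new interval starts as soon as every session has exhausted its share. The class `FairShareScheduler` is hypothetical.

```python
INTERVAL_MS = 150  # interval length mentioned in the talk


class FairShareScheduler:
    """Toy per-session CPU budget in the spirit of DFSS."""

    def __init__(self, sessions):
        self.share_ms = INTERVAL_MS / len(sessions)
        self.remaining = {session: self.share_ms for session in sessions}

    def eligible(self, session):
        """Threads of a session may only run while its budget is not used up."""
        return self.remaining[session] > 0

    def charge(self, session, ran_ms):
        """Account CPU time consumed by a thread of `session`."""
        self.remaining[session] = max(0, self.remaining[session] - ran_ms)
        if not any(left > 0 for left in self.remaining.values()):
            # Every session has used its share: start a new interval.
            self.remaining = {s: self.share_ms for s in self.remaining}


scheduler = FairShareScheduler(["session1", "session2", "session3"])
scheduler.charge("session1", 50)  # session1 has now used its 50 ms share
```

At this point threads of session1 are held back while session2 and session3 remain eligible. Once those two have also consumed their 50 ms, all budgets are refilled and session1 may run again.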