Copy Link
Add to Bookmark
Report
Phrack Inc. Volume 12 Issue 65 File 04
==Phrack Inc.==
Volume 0x0c, Issue 0x41, Phile #0x04 of 0x0f
|=-----------------------------------------------------------------------=|
|=---=[ Stealth hooking : Another way to subvert the Windows kernel ]=---=|
|=-----------------------------------------------------------------------=|
|=--------------------=[ by mxatone and ivanlef0u ]=---------------------=|
|=-----------------------------------------------------------------------=|
1 - Introduction on anti-rookits technologies and bypass
1.1 - Rookits and anti-rootkits techniques
1.2 - About kernel level protections
1.3 - Concept key: use kernel code against itself
2 - Introducing stealth hooking on IDT.
2.1 - How Windows manage hardware interrupts
2.1.1 - Hardware interrupts dispatching on Windows
2.1.2 - Hooking hardware IT like a ninja
2.1.3 - Application 1 : Kernel keylogger
2.1.4 - Application 2 : NDIS incoming packets sniffer
2.2 - Conclusion about stealth hooking on IDT
3 - Owning NonPaged pool using stealth hooking
3.1 - Kernel allocation layout review
3.1.1 - Difference between Paged and NonPaged pool
3.1.2 - NonPaged pool tables
3.1.3 - Allocation and free algorithms
3.2 - Getting code execution abusing allocation code
3.2.1 - Data corruption of MmNonPagedPoolFreeListHead
3.2.2 - Expend it for every size
3.3 - Exploit our position
3.3.1 - Generic stack redirection
3.3.2 - Userland process code injection
4 - Detection
5 - Conclusion
6 - References
---[ 1 - Introduction on anti-rookits technologies and bypass
Nowadays rootkits and anti-rootkits are becoming more and more important
into the IT security landscape. Loved by some, hated by others, rootkits
can be considered as the holy grail of backdoors : stealthy, little,
close to hardware, ingenious, vicious... Their control over a computer
locally or remotely make them the best choice for an attacker.
Anti-rootkits try to detect and eradicate those malicious programs.
Rk techniques and complexity are evolving fast and today developing a rk or
anti-rk is a very hard mission.
This paper deals about rootkits on Windows platform. More precisely about
new kind of hijacking techniques that can be applied to the Windows kernel.
Readers are assumed to be aware about rootkits techniques on Windows.
----[ 1.1 - Rootkits and anti-rootkits technics
A rootkit hijacks an operating system's behavior. In order to achieve this
task, it can simply modify the operating system's binaries but that's not
very stealthy. Most rk's use hooks on important functions and change theirs
results. A basic hook redirects execution flow by changing function start
or a function pointer but there is no single way to hook a routine. The
most common example is the SSDT (System Service Descriptor Table), this
table contains the syscall list which is a set of functions pointers. If
you can modify a pointer in this table, you are able to control the
behavior of one function. That's an example of how rootkits proceed,
obviously there is a lot of critical areas that can be controlled by an
attacker.
Anti-rootkits try to check those areas, but the task is very hard. Most of
the time, anti-rk software makes a comparison between the memory image of
the program and its binary on the disk or verify some function pointer
tables to see if something has changed.
That's how the war between rk-makers and anti-rk-junkies began, trying
to find the best way, the best area, for hooking critical operating
system features. On Windows those following areas are often used by
rootkits :
- SSDT (kernel syscalls table) and shadow SSDT (win32k syscall table) are
the simplest solution.
- MSR (Model Specific Registers) can be modified by a rootkit. On Windows
the MSR_SYSENTER_EIP is used by the assembly instructions 'sysenter' to
enter into ring0 mode. Hijacking this MSR allow an attacker to control
the system.
- MajorFunctions are functions used by drivers for I/O processing with
others devices, hooking those functions can be useful for a rootkit.
- IDT (Interrupt Descriptor Table) is table used by the system for
handling exceptions and interruptions.
Another kind of techniques has appeared. By accessing to the kernel
objects a rootkit can easily change information about processes, threads,
loaded modules and other stuff. Those techniques are called DKOM (Direct
Kernel Object Manipulation). For example, the Windows kernel maintains a
double linked list called PsActiveProcessList (EPROCESS structures)
containing information about running processes. Unlink one of them and
your process will disappear from process lists like task manager, whereas
the process is still running.
To block those kernel objects modifications, anti-rk checks other
sections. For processes, they used to read the PspCidTable which
has a table of PID (Process IDentifier) and TID (Thread IDentifier).
A comparison between this table and PsActiveProcessList shows hidden
processes. Against those attacks anti-rk tools have to find others
sections and tricks to detect altered objects.
One of the first paper about Windows stealth was written by Holy Father,
"Invisibility on NT boxes" [1]. With this paper came one of the first
public implementations of a rootkit with a ring0 driver, Hacker
Defender [2], coded by Holy Father and Ratter of the famous VXing mag 29A
[3]. This driver was able to elevate process rights using token
manipulation. The rest of the rootkit uses user-land hooks to perform files
and registry hiding, process infection with dll injection. A good example
of a full ring0 rootkit is NT Rootkit of Greg Hoglund [4], this driver uses
SSDT hooks to perform stealth operation. It registers a Filter Device
Object above the NTFS file system and above the keyboard device for
filtering IRP (I/O Request Packets). It also provides a NDIS protocol
driver to hide communication on the network. Even if this rk was written
for NT 4.0 and Win2K it's a perfect example for beginners.
After came more advanced ring0 rk like FU [5], written by Fuzen_op and its
improvement FUto published in the famous technical journal Uninformed [6].
Vista improvement on driver verification introduces new rootkits mostly
based on hardware features. Like BootRoot [7] and Pixie [8] by Eeye
loaded before any protection. Finally Joanna Rutkowska with her Blue
Pill [9] used virtualization technology to create layer between the
operating system and the hardware.
In the wild the rk are used most of the time for lame mail spamming or
botnets. They often use old techniques but some of them are interesting
like Rustock [10] series or StormWorm [11] and the MBR rootkit [12]. They
implement a lot of tricks as ADS (Alternate Data Stream), code obfuscation,
anti-debug, anti-VM or polymorphic code. The goal is not only subverting
the kernel but also slow down their analysis and make them harder to
defeat.
Even if the technology used by rootkits are more and more sophisticated,
the underground community is still developing POCs to improve current
techniques. Unreal [13] and AK992 [14] are both great examples. The first
uses an ADS and a NTFS MajorFunctions hooking to hide itself, the second
checks IRP completion when sended to disk's devices. You can find plenty
examples of rootkit techniques on rootkit.com.
Finally, this part would not be complete if we don't speak about anti-rk.
The most famous is Rk Unhooker by MP_ART & EP_X0FF and their team UG North.
Others anti-rk are DarkSpy [15] by CardMagic, IceSword [16] by pjf and
Gmer [17].
----[ 1.2 - About kernel level protections
When we talk about protection, we must notice where the protection takes
place into the system. A protection has an advantage on an attack only
if it operates from a higher level. Protections like PaX or Exec Shield
are efficient because they protecting userland from kernel.
Protections like PatchGuard and other HIPS also protect the system
integrity but as far as an attacker can find a way to attack those
protections at their own level they will be useless. A protection is
reliable only if it can't be corrupted by an attacker. Assuming an
attacker find a way to inject code into the protection and you can
consider that your b0x is dead.
That's why PatchGuard isn't so efficient [18]. But we know that disabling
or destroying a protection is very noisy. No, the best way is to fly under
the radar by working with special objects and events that cannot be
checked because of their volatility.
In June 2006, Greg Hoglund presented the concept of KOH (Kernel Object
Hooking) [19]. A new way of detouring code execution, you don't have to
modify static code section but rather you work on dynamic allocated
structures/codes like DPC (Deferred Procedure Calls). For protections,
it's hard to find and verify those areas due to their instabilities.
Others cool objects are IRP. They are the object used by the Windows
kernel I/O manager to communicate with devices. Each I/O operation on
hardware generates an IRP, sycalls send IRP to a driver through his
device. In general a driver owns several devices; one of them is used to
communique with the userland by using IOCTL and others devices are
managing IRP by filtering them or performing a requested task.
IRP are sent to a driver using its MajorFunctions table. This table
includes the different functionalities provided by the driver. You can
check the result returned by a MajorFunction by installing a completion
routine on an IRP. They are very volatile objects; controlling and
checking them is very hard.
In fact, if you want to check everything you would need to completely
redesign operating system architecture. So keep in mind that protection
cannot be everywhere at every time and we will demonstrate it in the
following parts.
----[ 1.3 - Concept key: use kernel code against itself
The idea behind this paper is exploiting kernel code. Exploitation is
possible because input defines code behavior. Submitting a crafted input
to a vulnerable software can leads into code execution. Dangerous input is
of course defined by your target. Kernel space contains more exploitation
scenarios because you can change its environment. A rootkit can not
change basic inputs as arguments. But it can change the environment around
a code. Heap exploitation techniques such as unlinking is a perfect
example. By changing a memory block structure, you are able to overwrite
4 bytes. Some techniques can even change next allocated block address [20].
It does work because a program trusts those information. In kernel, you
have a total control on the environment. Also completely checking the
kernel is bad for performance and totally impossible.
Changing code environment has been used successfully for the phide2
rootkit [21] technique. This rootkit can hide threads without hooking
Windows scheduler which is impressive. As it relies on code behavior, it
needs strong reverse knowledge. It extends this concept into unknown
operating system behaviors. Generic protections are based on generic
assumptions. Such as checking only driver images for code hooks. These
days operating systems design is against those protections and requires
advanced software rootkit techniques.
---[ 2 - Introducing stealth hooking on IDT
Let's introduce our concept about stealth hooking with an example based on
IDT. First we will see what is the IDT and its purpose. Then we will
discuss about hardware interrupts and how Windows deals with them.
IDT (Interrupt Descriptor Table) is a CPU specific linear table localized
in kernel-land. IDT can be read with ring3 privilege level but you must
have ring0 privilege if you want to write into it. IDT is composed of 256
entries of KIDTENTRY structures and you can use the Kernel Debugger (KD)
included into the Debugging Tools for Windows [22] to see the definition
of an IDT entry.
kd> dt nt!_KIDTENTRY
+0x000 Offset : Uint2B
+0x002 Selector : Uint2B
+0x004 Access : Uint2B
+0x006 ExtendedOffset : Uint2B
Here we don't want to (re)explain the architecture of the IDT so we advise
you to read Kad's paper published in Phrack 59 about IDT and about how it
works [23].
The first 32 entries of IDT are reserved by the CPU for exceptions. Others
are use to handle hardware interrupts and special system events.
Here is a dump of the first 64 entries of the Windows' IDT.
kd> !idt -a
Dumping IDT:
00: 804df350 nt!KiTrap00
01: 804df4cb nt!KiTrap01
02: Task Selector = 0x0058
03: 804df89d nt!KiTrap03
04: 804dfa20 nt!KiTrap04
05: 804dfb81 nt!KiTrap05
06: 804dfd02 nt!KiTrap06
07: 804e036a nt!KiTrap07
08: Task Selector = 0x0050
09: 804e078f nt!KiTrap09
0a: 804e08ac nt!KiTrap0A
0b: 804e09e9 nt!KiTrap0B
0c: 804e0c42 nt!KiTrap0C
0d: 804e0f38 nt!KiTrap0D
0e: 804e164f nt!KiTrap0E
0f: 804e197c nt!KiTrap0F
10: 804e1a99 nt!KiTrap10
11: 804e1bce nt!KiTrap11
12: 804e197c nt!KiTrap0F
13: 804e1d34 nt!KiTrap13
14: 804e197c nt!KiTrap0F
15: 804e197c nt!KiTrap0F
16: 804e197c nt!KiTrap0F
17: 804e197c nt!KiTrap0F
18: 804e197c nt!KiTrap0F
19: 804e197c nt!KiTrap0F
1a: 804e197c nt!KiTrap0F
1b: 804e197c nt!KiTrap0F
1c: 804e197c nt!KiTrap0F
1d: 804e197c nt!KiTrap0F
1e: 804e197c nt!KiTrap0F
1f: 804e197c nt!KiTrap0F
20: 00000000
21: 00000000
22: 00000000
23: 00000000
24: 00000000
25: 00000000
26: 00000000
27: 00000000
28: 00000000
29: 00000000
2a: 804deb92 nt!KiGetTickCount
2b: 804dec95 nt!KiCallbackReturn
2c: 804dee34 nt!KiSetLowWaitHighThread
2d: 804df77c nt!KiDebugService
2e: 804de631 nt!KiSystemService
2f: 804e197c nt!KiTrap0F
30: 806f3d48 hal!HalpClockInterrupt
31: 80dd816c i8042prt!I8042KeyboardInterruptService (KINTERRUPT 80dd8130)
32: 804ddd04 nt!KiUnexpectedInterrupt2
33: 80dd3224 serial!SerialCIsrSw (KINTERRUPT 80dd31e8)
34: 804ddd18 nt!KiUnexpectedInterrupt4
35: 804ddd22 nt!KiUnexpectedInterrupt5
36: 804ddd2c nt!KiUnexpectedInterrupt6
37: 804ddd36 nt!KiUnexpectedInterrupt7
38: 806edef0 hal!HalpProfileInterrupt
39: 80f0827c ACPI!ACPIInterruptServiceRoutine (KINTERRUPT 80f08240)
3a: 80dc67cc vmsrvc+0x1C16 (KINTERRUPT 80dc6790)
3b: 80df6414 NDIS!ndisMIsr (KINTERRUPT 80df63d8)
3c: 80de040c i8042prt!I8042MouseInterruptService (KINTERRUPT 80de03d0)
3d: 804ddd72 nt!KiUnexpectedInterrupt13
3e: 80ed78a4 atapi!IdePortInterrupt (KINTERRUPT 80ed7868)
3f: 80f01dd4 atapi!IdePortInterrupt (KINTERRUPT 80f01d98)
40: 804ddd90 nt!KiUnexpectedInterrupt16
[...]
This dump represents a typical Windows IDT, you can see the IDT entries
index followed by the address of the handler and this name. The first 32
entries are filled by KiTrap* functions that manage exceptions. The rest
of the table is left to the system, you can see specials system interrupts
like KiSystemService and KiCallbackReturn and handlers used by drivers
like I8042KeyboardInterruptService or I8042MouseInterruptService.
----[ 2.1 - How Windows manage hardware interrupts
When we talk about interrupts we must introduce the concept of IRQL
(Interrupt ReQuest Level). The kernel represents IRQLs internally as a
number from 0 through 31 on x86 with higher numbers representing higher
priority interrupts. Although the kernel defines the standard set of IRQLs
for software interrupts, the HAL (Hardware Abstraction Layer) maps
hardware interrupt numbers to the IRQLs.
+----------------+
31 | Highests | \
to | IRQLs | | Clock, system failure.
27 | | /
+----------------+
26 | | \
to | DEVICE_IRQL | | Hardware interrupts.
3 | | /
+----------------+
2 | DISPATCH_LEVEL | Scheduler, DPC.
+----------------+
1 | APC_LEVEL | Used when dispatching APC.
+----------------+
0 | PASSIVE_LEVEL | Threads run at this IRQL.
+----------------+
Each processor has its own IRQL. You can have a core running at an IRQL=
DISPATCH_LEVEL whereas another is running at PASSIVE_LEVEL. In fact IRQL
represents the "mask ability" of the current running code. Interrupts from
a source with an IRQL above the current level interrupt the processor,
whereas interrupts from sources with IRQLs equal to or below the current
level are masked until an executing thread decrease the IRQL.
Some system components are not accessible when code is running at
IRQL>=DISPATH_LEVEL. Accessing to paged memory (memory which can be
swapped on disk) is impossible and lots of kernel functions cannot be used.
Hardware interrupts are asynchronous and reached by external peripherals.
For example when you hit a key, your keyboard device sends an IRQ
(Interrupt ReQuest) routed by the Southbridge [24] on your interrupt
controller through the Northbridge [25]. The Southbridge is a chip that can
be described like a I/O controller hub. This chip receives all the I/O
externals interrupt and send them to the Northbridge. The Northbridge is
directly connected to your memory and high speed graphic bus also to your
CPU. This chip is also known as the memory controller hub.
On most x86 systems we find a chipset called i82489, Advanced Programmable
Interrupt Controller (APIC). The APIC is composed by 2 main components, a
I/O APIC, one per CPU, and a LAPIC (Local APIC) on each core. I/O APIC
uses a routing algorithm to dispatch an interrupt on the best adapted core.
According to the principle of locality, I/O APIC will deliver the device
interrupt on the core which handled it the previous time [26].
After this LAPIC translates the IRQ to an 8-bits value, an interrupt
vector. This interrupt vector represents IDT's entry index associated with
the handler. When the core is ready to handle the interrupt, its
instruction flow is redirected on the IDT entry.
IDT IDT IDT IDT
1 2 3 4
+---+ +---+ +---+ +---+
| | | | | | | |
|---| |---| |---| |---|
| | | | | | | |
|---| |---| |---| |---|
| | | | | | | |
+---+ +---+ +---+ +---+
| | | |
+--------+ +--------+ +--------+ +--------+
| | | | | | | |
| core 1 | | core 2 | | core 3 | | core 4 |
| | | | | | | |
+--------+ +--------+ +--------+ +--------+
| LAPIC | | LAPIC | | LAPIC | | LAPIC |
+---+----+ +---+----+ +---+----+ +---+----+
| | | |
| | | |
<---+--------------+------+-------+-------------+----->
Interrupt | Processor system bus
Messages |
|
|
External +------+------+
Interrupts | |
---------------> I/O APIC |
| |
+-------------+
-----[ 2.3.1 Hardware interrupts dispatching on Windows
On Windows, the interrupt handler isn't executed immediately, there is a
code template first. This template is implemented in the function
KiInterruptTemplate and does two things. First, it saves the current
core state in the stack and dispatches code flow to the right "interrupt
dispatcher".
When a interrupt is raised, after the core status core is saved, code flow
is transferred to the interrupt handler as defined in the IDT. In fact
each interrupt handler in the IDT points to a KiInterruptTemplate
routine [27]. KiInterruptTemplate will call KiInterruptDispatch which
performs the following operations :
- Acquire the service routine spinlock.
- Raise IRQL to DEVICE_IRQL, the IRQL of a given interrupt vector is
calculated by subtracting the interrupt vector from 27d.
- Call the interrupt handler, an ISR (Interrupt Service Routine).
- Lower IRQL.
- Release the service routine spinlock.
For example, the keyboard device ISR is I8042KeyboardInterruptService.
ISR are routines for handling interrupts like top-halves in the linux
kernel. According to the WDK (Windows Driver Kit), the ISR must do
whatever is appropriate to the device to dismiss the interrupt. Then, it
should do only what is necessary to save stage and queue a DPC. It means it
interruption management will take place on a lower IRQL than during ISR
execution. The I/O processing is done into the DPC.
DPC (Deferred Procedure Call) are equivalent of bottom-halves in linux.
DPC works at IRQL DISPATCH_LEVEL, lower than the ISR's IRQL. In fact the
ISR will queue a DPC to process the entire interrupt at a lower IRQL in
order to avoid the core preemption taking too much time. For the keyboard
the DPC is I8042KeyboardIsrDpc. Here a figure to sum up the interrupt
processing :
+-------------------------+
Hardware Interrupt /----> Here we are at |
| | | IRQL=DEVICE_LEVEL |
| | | The KiInterruptDispatch |
/---> IDT ---\ | | routine calls the ISR. |
| | | |
| | | ISR handles interrupt |
+-----------------------+ | | and queue a DPC for |
| KiInterruptTemplate ------/ | later processing |
+-----------------------+ | |
+-------------------------+
KiInterruptDispatch receives one main argument from KiInterruptTemplate,
a pointer to an interrupt object stored in the EDI register. Interrupt
objects are defined by a KINTERRUPT structure :
kd> dt nt!_KINTERRUPT
+0x000 Type : Int2B
+0x002 Size : Int2B
+0x004 InterruptListEntry : _LIST_ENTRY
+0x00c ServiceRoutine : Ptr32 unsigned char
+0x010 ServiceContext : Ptr32 Void
+0x014 SpinLock : Uint4B
+0x018 TickCount : Uint4B
+0x01c ActualLock : Ptr32 Uint4B
+0x020 DispatchAddress : Ptr32 void
+0x024 Vector : Uint4B
+0x028 Irql : UChar
+0x029 SynchronizeIrql : UChar
+0x02a FloatingSave : UChar
+0x02b Connected : UChar
+0x02c Number : Char
+0x02d ShareVector : UChar
+0x030 Mode : _KINTERRUPT_MODE
+0x034 ServiceCount : Uint4B
+0x038 DispatchCount : Uint4B
+0x03c DispatchCode : [106] Uint4B
We retrieve in this structure, the SpinLock and the ServiceRoutine. Notice
that SynchronizeIrql contains the IRQL when the ISR will be executed.
For each entry in the IDT which handles a hardware interrupt, the
KiInterruptTemplate is contained in the DispatchCode table of the
KINTERRUPT structure.
For the keyboard device we have this KINTERRUPT :
kd> dt nt!_KINTERRUPT 80dd8130
+0x000 Type : 22
+0x002 Size : 484
+0x004 InterruptListEntry : _LIST_ENTRY [ 0x80dd8134 - 0x80dd8134 ]
+0x00c ServiceRoutine : 0xfa815495 unsigned char
->i8042prt!I8042KeyboardInterruptService+0
+0x010 ServiceContext : 0x80e2ec88
+0x014 SpinLock : 0
+0x018 TickCount : 0xffffffff
+0x01c ActualLock : 0x80e2ed48 -> 0
+0x020 DispatchAddress : 0x804da8d8 void nt!KiInterruptDispatch+0
+0x024 Vector : 0x31
+0x028 Irql : 0x1a ''
+0x029 SynchronizeIrql : 0x1a ''
+0x02a FloatingSave : 0 ''
+0x02b Connected : 0x1 ''
+0x02c Number : 0 ''
+0x02d ShareVector : 0 ''
+0x030 Mode : 1 ( Latched )
+0x034 ServiceCount : 0
+0x038 DispatchCount : 0xffffffff
+0x03c DispatchCode : [106] 0x56535554
Let's have a look at the beginning of KiInterruptTemplate :
nt!KiInterruptTemplate:
804da972 54 push esp
804da973 55 push ebp
804da974 53 push ebx
804da975 56 push esi
804da976 57 push edi
804da977 83ec54 sub esp,54h
804da97a 8bec mov ebp,esp
804da97c 89442444 mov dword ptr [esp+44h],eax
804da980 894c2440 mov dword ptr [esp+40h],ecx
804da984 8954243c mov dword ptr [esp+3Ch],edx
804da988 f744247000000200 test dword ptr [esp+70h],20000h
804da990 0f852a010000 jne nt!V86_kit_a (804daac0)
804da996 66837c246c08 cmp word ptr [esp+6Ch],8
804da99c 7423 je nt!KiInterruptTemplate+0x4f (804da9c1)
804da99e 8c642450 mov word ptr [esp+50h],fs
804da9a2 8c5c2438 mov word ptr [esp+38h],ds
804da9a6 8c442434 mov word ptr [esp+34h],es
804da9aa 8c6c2430 mov word ptr [esp+30h],gs
804da9ae bb30000000 mov ebx,30h
804da9b3 b823000000 mov eax,23h
804da9b8 668ee3 mov fs,bx
804da9bb 668ed8 mov ds,ax
804da9be 668ec0 mov es,ax
804da9c1 648b1d00000000 mov ebx,dword ptr fs:[0]
804da9c8 64c70500000000ffffffff mov dword ptr fs:[0],0FFFFFFFFh
804da9d3 895c244c mov dword ptr [esp+4Ch],ebx
804da9d7 81fc00000100 cmp esp,10000h
804da9dd 0f82b5000000 jb nt!Abios_kit_a (804daa98)
804da9e3 c744246400000000 mov dword ptr [esp+64h],0
804da9eb fc cld
804da9ec 8b5d60 mov ebx,dword ptr [ebp+60h]
804da9ef 8b7d68 mov edi,dword ptr [ebp+68h]
804da9f2 89550c mov dword ptr [ebp+0Ch],edx
804da9f5 c74508000ddbba mov dword ptr [ebp+8],0BADB0D00h
804da9fc 895d00 mov dword ptr [ebp],ebx
804da9ff 897d04 mov dword ptr [ebp+4],edi
804daa02 f60550f0dfffff test byte ptr ds:[0FFDFF050h],0FFh
804daa09 750d jne nt!Dr_kit_a (804daa18)
nt!KiInterruptTemplate2ndDispatch:
804daa0b bf00000000 mov edi,0
nt!KiInterruptTemplateObject:
804daa10 e9c3fcffff jmp nt!KeSynchronizeExecution+0x2 (804da6d8)
[...]
Remember, this code is unique for each KINTERRUPT. We said before that
KiInterruptDispatch receives its arguments from the EDI register (a
pointer to the KINTERRUPT of the interrupt). In the KiInterruptTemplate
we can see this little code :
[...]
nt!KiInterruptTemplate2ndDispatch:
804daa0b bf00000000 mov edi,0
nt!KiInterruptTemplateObject:
804daa10 e9c3fcffff jmp nt!KeSynchronizeExecution+0x2 (804da6d8)
[...]
Here we have a mov "edi, 0" and a jmp, but if we look at the
KiInterruptTemplate code contained in the keyboard's KINTERRUPT we have :
ffb72525 bf5024b7ff mov edi,0FFB72450h ; Keyboard KINTERRUPT
ffb7252a e9a9839680 jmp nt!KiInterruptDispatch (804da8d8)
Wow, instructions are modified! The kernel will dynamically changes those
2 instructions in the KiInterruptTemplate code. In EDI we find the
KINTERRUPT object and the jmp branch on KiInterruptDispatch.
Why this implementation ? Because we can easily change the dispatch
handler. Even if we often have the KiInterruptDispatch we can find
KiFloatingDispatch or KiChainDispatch. KiChainedDispatch is for vectors
shared among multiple interrupt objects and KiFloatingDispatch is like
KiInterruptDispatch, but it saves the floating core state too.
Windows provides APIs for connecting interrupts on IDT. IoConnectInterrupt
and IoConnectInterruptEx, according to the WDK :
NTSTATUS
IoConnectInterrupt(
OUT PKINTERRUPT *InterruptObject,
IN PKSERVICE_ROUTINE ServiceRoutine,
IN PVOID ServiceContext,
IN PKSPIN_LOCK SpinLock OPTIONAL,
IN ULONG Vector,
IN KIRQL Irql,
IN KIRQL SynchronizeIrql,
IN KINTERRUPT_MODE InterruptMode,
IN BOOLEAN ShareVector,
IN KAFFINITY ProcessorEnableMask,
IN BOOLEAN FloatingSave
);
As you can see IoConnectInterrupt returns in the InterruptObject parameter
a KINTERRUPT structure, the same that we retrieve in the IDT. Previously
you have seen in the KiInterruptTemplate two labels,
KiInterruptTemplateObject and KiInterruptTemplate2ndDispatch. Those two
labels are used by kernel function to find the two instructions in the
KiInterruptTemplateRoutine. KeInitializeInterrupt uses the
KiInterruptTemplateObject label to update the "jmp Ki*Dispatch" and the
KiConnectVectorAndInterruptObject function uses
KiInterruptTemplate2ndDispatch to modify the "mov edi, <&Kinterrupt>".
-----[ 2.3.2 Hooking hardware IDT like a ninja
Now, think about this. We want to hook the IDT in a stealth way, we know
that replacing an entry directly is not the best solution. Anti-rooktits
don't check the dynamically allocated KiInterruptTemplate routine. So we
can modify this routine as we wish. There are three possible ways :
- Redirect the "jmp Ki*Dispatch" on our dispatch routine, we have to code
our dispatch routine, not so hard.
- Change the kinterrupt address passed in EDI by the instruction
"mov edi, <&Kinterrupt>". The new KINTERRUPT will be the same than the
previous one, only the ServiceRoutine will be modified by us.
- Create our own KiInterruptTemplate, hard ...
In this paper, we choosed the simplest way. We change the
"mov edi, <&kinterrupt>" by a "mov edi, <&OurKinterrupt>" and we implement
our ServiceRoutine. We know that this instruction is followed by a jmp, so
with a disassembly engine we can retrieve the instruction before the jmp
nt!KiInterruptDispatch and modify it. We must keep in mind, when the
ServiceRoutine is running, the interrupt is not handled yet and we are
running at DEVICE_IRQL IRQL. This is not a fair situation, because a
lot of Windows kernel functions are not accessible. We know, that most
ISR queued a DPC, so after the ISR has been executed, the last entry in
the current core DPC queue should contain the DPC routine of our interrupt.
If we want to access data generated by the interrupt we must proceed like
the ISR. Replacing the original ISR by our own ISR handler is very hard,
because it depends too much on the hardware device. But we know that the
real I/O is done by the DPC, so when KiInterruptTemplate will call our
ServiceRoutine, first we call the original ServiceRoutine and we modify
the last DPC entry by our.
DPC are represented by KDPC structures :
kd> dt nt!_KDPC
+0x000 Type : Int2B
+0x002 Number : UChar
+0x003 Importance : UChar
+0x004 DpcListEntry : _LIST_ENTRY
+0x00c DeferredRoutine : Ptr32 void
+0x010 DeferredContext : Ptr32 Void
+0x014 SystemArgument1 : Ptr32 Void
+0x018 SystemArgument2 : Ptr32 Void
+0x01c Lock : Ptr32 Uint4B
DPC list can be found in the KPRCB (Kernel Processor Control Region Block)
structure of the current processor. KPRCB is preceded by a KPCR (Kernel
Processor Control Block) structure which is located at FS:[0x1C] on the
current processor. KPRCB is a 0x120 bytes from the beginning of the KPCR
structure.
dt nt!_KPRCB
[...]
+0x860 DpcListHead : _LIST_ENTRY
+0x868 DpcStack : Ptr32 Void ; DPC arguments
+0x86c DpcCount : Uint4B ; DPC core counter
+0x870 DpcQueueDepth : Uint4B ; Numbers of DPC in the list
+0x874 DpcRoutineActive : Uint4B
+0x878 DpcInterruptRequested : Uint4B
+0x87c DpcLastCount : Uint4B
+0x880 DpcRequestRate : Uint4B
+0x884 MaximumDpcQueueDepth : Uint4B
+0x888 MinimumDpcRate : Uint4B
Now we know how to retrieve the DPC of our interrupt, we can easily
change it to our own and handle the data.
For the keyboard the DPC is queued by KeInsertQueueDpc in the
I8xQueueCurrentKeyboardInput routine called by the keyboard's ISR.
kd> dt nt!_KDPC 80e3461c
+0x000 Type : 19 ; 19=DpcObject
+0x002 Number : 0 ''
+0x003 Importance : 0x1 ''
+0x004 DpcListEntry : _LIST_ENTRY [ 0xffdff980 - 0x80559684 ]
+0x00c DeferredRoutine : 0xfa815650 void i8042prt!I8042KeyboardIsrDpc
+0x010 DeferredContext : 0x80e343b8
+0x014 SystemArgument1 : (null)
+0x018 SystemArgument2 : (null)
+0x01c Lock : 0xffdff9c0 -> 0
Here is the figure of the attack :
MyKinterrupt structure
+---------------------+
Hardware Interrupt /----> MyServiceRoutine |
| | | Calls the original |
| | | ISR ------\
\---> IDT ---\ | | And modify the DPC | |
| | | queue. | |
| | +---------------------+ |
+---------------------+ | |
| KiInterruptTemplate -----/ Original Kinterrupt |
+---------------------+ +---------------------+ |
Core | | |
+------------+ | ServiceRoutine <-----/
| | | Queues the ISR's DPC|
|DpcListHead |--\ +---------------------+
| | |
+------------+ |
| +-----+ +-----+ +-----+ +-----+
\-> DPC |---->| DPC |---->| DPC |---->| DPC |-->DpcListHead
DpcListHead<---| |<----| |<----| |<----| |
+-----+ +-----+ +-----+ +-----+
/\
||
Last DPC entry
Modified after the call
to the ServiceRoutine.
-----[ 2.3.3 - Application 1 : Kernel keylogger
It's time to design a POC. In this sample we will see how to sniff
keyboard keystrokes. As you see previously, we are now able to control the
DPC generated by an interrupt. For the keyboard we will hijack the
I8042KeyboardIsrDpc routine which is set into the DPC's keyboard
interruption. With our own DPC handler we will reproduce the behavior of
the original routine, unfortunately this kind of routine is hard to write
so we ripped some pieces of codes and used reversing techniques (notice the
lazy hacker style).
In our DPC handler we must call the KeyboardClassServiceCallback [28]
routine, this routine is provided by the Kbdclass driver. This callback
transfers input data buffer of a device to the class data queue. A function
keyboard driver must calls this class service callback in its DPC routine.
Here is the KeyboardClassServiceCallback's prototype :
VOID
KeyboardClassServiceCallback (
IN PDEVICE_OBJECT DeviceObject,
IN PKEYBOARD_INPUT_DATA InputDataStart,
IN PKEYBOARD_INPUT_DATA InputDataEnd,
IN OUT PULONG InputDataConsumed
);
Parameters :
DeviceObject : Pointer to the class device object.
InputDataStart : Pointer to the first keyboard input data packet in
the input data buffer of the port device.
InputDataEnd : Pointer to the keyboard input data packet that
immediately follows the last data packet in the input data buffer of
the port device.
InputDataConsumed : Pointer to the number of keyboard input data
packets that are transferred by the routine.
KEYBOARD_INPUT_DATA is defined by :
typedef struct _KEYBOARD_INPUT_DATA {
USHORT UnitId;
USHORT MakeCode;
USHORT Flags;
USHORT Reserved;
ULONG ExtraInformation;
} KEYBOARD_INPUT_DATA, *PKEYBOARD_INPUT_DATA;
So in our DPC handler we just have to check the MakeCode member of the set
of KEYBOARD_INPUT_DATA structures. The MakeCode (or scancode) represents
the data sent by the keyboard to the system when you hit or release a
key, each key has it's own scancode and the system usually translates the
scancode into a character depending on you code page. For example the
scancode 19d on classical US keyboard is translated into the keycode 'e'.
In order to know if CAPSLOCK is activated we send an IOCTL to the
functional keyboard device but we can only send IOCTL at a PASSIVE_LEVEL
IRQL. For that we use a system thread which will sent IOCTL with the
kernel API IoBuildDeviceIoControlRequest. In fact the scancodes are queued
in a list locked by a spinlock and thread synchronized with a semaphore.
The thread is listening to incoming keystrokes then converts scancodes into
keycodes. Like the kernel keylogger Klog does [29].
-----[ 2.3.4 - Application 2 : NDIS packet sniffer
In the same way, an interrupt is raised when your network card receives
a packet. When this kind of interrupt is raised NDIS ISR handler
(ndisMIsr) routine launches the miniport ISR interrupt handler. The
ndisMIsr routine is used as a wrapper for miniport ISR and DPC. You can
see in the IDT the following entry :
3b: 80df6414 NDIS!ndisMIsr (KINTERRUPT 80df63d8)
It means, your ISR handler is not called directly when an interrupt
occurs, it is the ndisMIsr routine. Miniport's ISR is called by ndisMIsr
and the miniport DPC is also queued in this routine. The DPC queued is
the ndisMDpc routine which wraps your own DPC miniport handler. Finally
NDIS wraps all the interrupt process with ndisMIsr and ndisMDpc routines on
Windows XP with NDIS 5.1. We don't know if this implementation is still
present on Windows Vista with NDIS 6.0.
We know we can hijack the ndisMDpc handler by our own handler. With NDIS
we will proceed in the same way but we will not hook the MiniportDpc
routine but directly hook the ndisMDpc routine. Why? Because we know that
ndisMDpc wraps the MiniportDpc routine and in fact MiniportDpc depends too
much on the hardware of the miniport device. Each miniport device is
represented by an NDIS_MINIPORT_BLOCK [30] structure, in this structure
we find a reference to a NDIS_MINIPORT_INTERRUP structure, which looks
like :
kd> dt ndis!_NDIS_MINIPORT_INTERRUPT
+0x000 InterruptObject : Ptr32 _KINTERRUPT
+0x004 DpcCountLock : Uint4B
+0x008 Reserved : Ptr32 Void
+0x00c MiniportIsr : Ptr32 Void
+0x010 MiniportDpc : Ptr32 Void
+0x014 InterruptDpc : _KDPC
+0x034 Miniport : Ptr32 _NDIS_MINIPORT_BLOCK
+0x038 DpcCount : UChar
+0x039 Filler1 : UChar
+0x03c DpcsCompletedEvent : _KEVENT
+0x04c SharedInterrupt : UChar
+0x04d IsrRequested : UChar
If we look at the ndisMDpc routine we notice that only the first parameter
is used and this parameter refers to a NDIS_MINIPORT_INTERRUPT structure.
The ndisMDpc function will call the MiniportDpc field of this structure.
We just have to hijack this pointer by our routine in order to control the
incoming packets on the system.
The NDIS documentation specifies that a miniport DPC routine must
notify the bound protocol driver that an that an array of received
packets is available by calling the NdisMIndicateReceivePacket function
[31].
VOID
NdisMIndicateReceivePacket(
IN NDIS_HANDLE MiniportAdapterHandle,
IN PPNDIS_PACKET ReceivePackets,
IN UINT NumberOfPackets
);
In the ndis.h header we have :
#define NdisMIndicateReceivePacket(_H, _P, _N) \
{ \
(*((PNDIS_MINIPORT_BLOCK)(_H))->PacketIndicateHandler)( \
_H, \
_P, \
_N); \
}
So in our MiniportDpc routine we will hihjack the PacketIndicateHandler,
which is often the ethFilterDprIndicateReceivePacket routine in the
NDIS_MINIPORT_BLOCK structure, in order to filter the incoming packets on
the miniport. After we have hijacked this pointer we call the original
MiniportDpc routine that will process everything. After that, we restore
the PacketIndicateHandler handler in the NDIS_MINIPORT_BLOCK for stealth
reasons. To sum up we must :
- Hijack the routine into the DPC queued by the ndisMIsr routine.
- Now that we have hijacked the ndisMDpc we modify the
PacketIndicateHandler into the NDIS_MINIPORT_BLOCK of the miniport.
- We call the ndisMDpc routine. It will call the original MiniportDpc
handler
- The MiniportDpc routine calls the NdisMIndicateReceivePacket macro. Our
filter function is called and we do our job.
- When the ndisMDpc returns we restore the original PacketIndicateHandler
into the NDIS_MINIPORT_BLOCK of the miniport.
With this filter, we can monitor or modify the incoming packets. For
example, our PacketIndicateHandler hook can search in the incoming packets
for a tag, when this tag his found the rootkit triggers a function.
---[ 2.2 - Conclusion about stealth hooking on IDT
In this part we have seen how Windows manages his hardware interrupts by
using a global template function dedicated to all interrupts. The fact
that this template routine his forged for each interrupts is the main
point of this attack, with that we can create a fake template routine that
cannot be detected directly. The stealth of our attack remains on two
points :
- We modify only dynamic allocated and forged code
- We hijack highly temporal dynamic allocated structures which when
running, are always preempting the core.
So, even if the scope of our attack is restricted, controlling the hardware
is the best way for a rk to reach critical components. Finally, we have
just cheated the system with its own features and that's the purpose of a
stealth rootkit.
--[ 3 - Owning NonPaged pool using stealth hooking
Rootkit sophistication depends on how it subverts the kernel. More
complex techniques come out as kernel and hardware understanding evolve.
Nowadays there is so many ways to subvert the kernel, in consequence
protections become harder to defeat. We're going to present a different
means to gain control. Next techniques apply this approach to the kernel
memory allocator.
Our goal is getting execution on every NonPaged allocation without using
any hook. It must bypass any hooking verification even those based on code
page comparison or hashing. It will be done by modifying data used by the
allocator. We just apply the concept of using code against itself. We do
believe that this concept can be used on others components and in
different ways successfully.
We won't try to convince you that this technique is perfect. It evades
current protections and detection systems. The more important is that they
would need more than a simple modification to prevent and block an attack
based on kernel code behavior.
---[ 3.1 - Kernel allocation layout review
As every operating system, Windows kernel puts forward some functions in
order to allocate or free memory. Virtual memory is organized as block of
memory called pages. In Intel x86 architecture, a page size is 4096 bytes
and most allocations requests are smaller. Thus, kernel functions like
ExAllocatePoolWithTag and ExFreePoolWithTag kept unused memory blocks for
next allocations. Internal functions directly interact with hardware each
time a page is needed. All those procedures are complex and delicate that's
why drivers trust kernel implementation.
-----[ 3.1.1 - Difference between Paged and NonPaged pool
Kernel system memory is divided in two different kind of pool. It has been
separated to distinguish most used memory blocks. The system must know
which pages should be resident and which can be temporarily discarded. The
page fault handler restores pageable memory only when IRQL is inferior of
DPC or DISPATCH level. Paged pool can be paged in or out of the system. A
memory block paged out will be saved on the file system and so unused part
of paged memory will not be resident in memory. NonPaged pool is present
in every IRQL level and then is put-upon for important tasks.
The file pagefile.sys contains paged out memory. It was attacked to inject
unsigned code into Vista kernel [32]. Some solutions was discussed as
disabling kernel memory paging. Joanna Rutkowska defended this solution as
more secure than others but with a small physical memory loss. Microsoft
just denied raw disk access, which may prove that Paged and NonPaged
layout is an important feature of Windows kernel [33].
This article focuses on NonPaged pool layout as PagedPool handling is
totally different. NonPaged pool can be more or less considered as
following a typical heap implementation. Global information about system
pool can be found in Microsoft Windows Internals [34].
-----[ 3.1.2 - NonPaged pool tables
The allocation algorithm must be fast allocating on the most used sizes.
That why three different tables exist and each one is devoted to a size
range. We found this organization in most memory management algorithms.
Retrieving memory blocks from hardware takes time. Windows balances between
response faster and avoid memory wasting. Response time becomes faster if
memory blocks are stored for next allocations. In the other hand, if you
keep too much memory, it can penalize memory demands.
Each table implements a different way to store memory blocks. We will
present each table and where you can find them.
The NonPaged lookaside is a per-processor table covering size inferior or
equal to 256 bytes. Each processor has a processor control register (PCR)
storing data concerning only a single processor like IRQL level, GDT, IDT.
Its extension called processor control region (PCRB) contains lookasides
tables. Next windbg dump presents NonPaged lookaside table and its
structure.
kd> !pcr
KPCR for Processor 0 at ffdff000:
Major 1 Minor 1
NtTib.ExceptionList: 805486b0
NtTib.StackBase: 80548ef0
NtTib.StackLimit: 80546100
NtTib.SubSystemTib: 00000000
NtTib.Version: 00000000
NtTib.UserPointer: 00000000
NtTib.SelfTib: 00000000
SelfPcr: ffdff000
Prcb: ffdff120
Irql: 00000000
IRR: 00000000
IDR: ffffffff
InterruptMode: 00000000
IDT: 8003f400
GDT: 8003f000
TSS: 80042000
CurrentThread: 80551920
NextThread: 00000000
IdleThread: 80551920
DpcQueue: 0x80551f80 0x804ff29c
kd> dt nt!_KPRCB ffdff120
[...]
+0x5a0 PPNPagedLookasideList : [32]
+0x000 P : 0x819c6000 _GENERAL_LOOKASIDE
+0x004 L : 0x8054dd00 _GENERAL_LOOKASIDE
[...]
kd> dt nt!_GENERAL_LOOKASIDE
+0x000 ListHead : _SLIST_HEADER
+0x008 Depth : Uint2B
+0x00a MaximumDepth : Uint2B
+0x00c TotalAllocates : Uint4B
+0x010 AllocateMisses : Uint4B
+0x010 AllocateHits : Uint4B
+0x014 TotalFrees : Uint4B
+0x018 FreeMisses : Uint4B
+0x018 FreeHits : Uint4B
+0x01c Type : _POOL_TYPE
+0x020 Tag : Uint4B
+0x024 Size : Uint4B
+0x028 Allocate : Ptr32 void*
+0x02c Free : Ptr32 void
+0x030 ListEntry : _LIST_ENTRY
+0x038 LastTotalAllocates : Uint4B
+0x03c LastAllocateMisses : Uint4B
+0x03c LastAllocateHits : Uint4B
+0x040 Future : [2] Uint4B
Lookaside tables permit faster block retrieving than typical double linked
list. For this optimization lock time is really important and a single
linked list is a faster mechanism than software locking.
ExInterlockedPopEntrySList function is used to pop an entry from a single
linked list using hardware locking instruction "lock".
PPNPagedLookasideList is the lookaside table we were talking about. It
contains two lookaside lists P and L. Depth field of the GENERAL_LOOKASIDE
structure defines how many entries can be in ListHead single list. The
system updates regularly the depth using different counters. The update
algorithm is based on processor number and is different for P and L. Depth
of the P list is updated more frequently than L list as it optimizes
performances on very small blocks.
The second table depends how many processors are used and how system
managed them. Allocation system walk it if size is inferior or equal to
4080 bytes or if lookaside research failed. Even if target table can
change, it always has the same POOL_DESCRIPTOR structure. On single
processor, a variable called PoolVector is used to retrieve
NonPagedPoolDescriptor pointer. On multi processor, the
ExpNonPagedPoolDescriptor table has 16 slots containing pool descriptors.
Each processor PRCB points on a KNODE structure. A node can be linked on
more than one processor and contains a color field used as an index in
ExpNonPagedPoolDescriptor. Next figures illustrate this algorithm.
PoolVector
+------------+
| NonPaged | --------------> NonPagedPoolDescriptor
|------------+
| Paged |
+------------+
[ Figure 1 - Single processor pool descriptor ]
Processor #1
+------------+
| | ExpNonPagedPoolDescriptor
| PRCB ------\ +-------------------+
| | | /---------> SLOT #01 |
+------------+ | | | SLOT #02 |
/---------/ | | SLOT #03 |
| KNODE | | SLOT #04 |
|---> +------------+ | | SLOT #05 |
| | Proc mask | | | SLOT #06 |
| | color (01) --/ | SLOT #07 |
| | ... | | SLOT #08 |
| +------------+ | SLOT #09 |
| | SLOT #10 |
\---------\ | SLOT #11 |
Processor #2 | | SLOT #12 |
+------------+ | | SLOT #13 |
| | | | SLOT #14 |
| PRCB ------/ | SLOT #15 |
| | | SLOT #16 |
+------------+ +-------------------+
[ Figure 2 - Multiple processor pool descriptor ]
A global variable ExpNumberOfNonPagedPools defines if multi processor case
is used. It should reflect processor number but it can change between
operating system versions.
The next dump shows POOL_DESCRIPTOR structure from windbg.
kd> dt nt!_POOL_DESCRIPTOR
+0x000 PoolType : _POOL_TYPE
+0x004 PoolIndex : Uint4B
+0x008 RunningAllocs : Uint4B
+0x00c RunningDeAllocs : Uint4B
+0x010 TotalPages : Uint4B
+0x014 TotalBigPages : Uint4B
+0x018 Threshold : Uint4B
+0x01c LockAddress : Ptr32 Void
+0x020 PendingFrees : Ptr32 Void
+0x024 PendingFreeDepth : Int4B
+0x028 ListHeads : [512] _LIST_ENTRY
Queued spinlock synchronization, part of HAL library, is used to restrict
concurrency on a pool descriptor. It assures that only one thread and one
processor will access and unlink an entry from a pool descriptor. HAL
library changes on different architectures and what is a simple IRQL
raising on single processor becomes a more complex queued system on
multi-processor. For default pool descriptor, general NonPaged queued
spinlock is locked (LockQueueNonPagedPoolLock). Else, a custom queued
spinlock is created.
The third and last table is shared by processors for size superior of 4080
bytes. MmNonPagedPoolFreeListHead is also used when others tables lack
memory. It composed by 4 LIST_ENTRY each one representing a page number,
except for the last one which holds all superiors pages kept by the system.
Access to this table is guarded by general non paged queued spinlock
also called LockQueueNonPagedPoolLock. During the free procedure of a
smaller block, ExFreePoolWithTag merges it with previous and next free
blocks. It can create a block superior or equal to 1 page. In this case,
the new block is added in the MmNonPagedPoolFreeListHead table.
-----[ 3.1.3 - Allocation and free algorithms
Kernel allocation does not change that much between OS versions but its
algorithm is as hard as the userland heap one. In this part, we want to
illustrate basic behavior between tables during allocation or free
procedures. A lot of details have been thrown away such as synchronization
mechanisms. Those algorithms will help you for the technique explanation
but also understanding the basic elements of kernel allocation. Despite
kernel exploitation is not part of this paper, pool overflow is an
interesting topic that needs understanding of some part of this algorithm.
NonPaged pool allocation algorithm (ExAllocatePoolWithTag):
IF [ Size > 4080 bytes ]
[
- Call the MiAllocatePoolPages function
- Walk MmNonPagedPoolFreeListHead LIST_ENTRY table.
- Retrieve memory from hardware if necessary.
- Return memory page aligned (without header).
]
IF [ Size <= 256 bytes ]
[
- Pop entry from PPNPagedLookasideList table.
- If something is found return memory block.
]
IF [ ExpNumberOfNonPagedPools > 1 ]
- PoolDescriptor from ExpNumberOfNonPagedPools and used index
comes from PRCB KNODE color.
ELSE
- PoolDescriptor is PoolVector first entry, designed by symbol
as NonPagedPoolDescriptor.
FOREACH [ >= Size entry of PoolDescriptor.ListHeads ]
[
IF [ Entry is not empty ]
[
- Unlink entry and split it if needed
- Return memory block
]
]
- Call the MiAllocatePoolPages function
- Walk MmNonPagedPoolFreeListHead LIST_ENTRY table..
- Split it correctly to the right size
- Return new memory block
NonPaged pool free algorithm (ExFreePoolWithTag) :
IF [ MemoryBlock is page aligned ]
[
- Call the MiFreePoolPages function
- Determine block type (Paged or NonPaged)
- Depending on how many blocks are kept in
MmNonPagedPoolFreeListHead, we release it to the hardware.
]
ELSE
[
- Merge previous and next block if possible
IF [ NewMemoryBlock size <= 256 bytes ]
[
- Look at PPNPagedLookasideList entry depth and see if we
should keep it.
- We return if memory block is pushed into lookaside list
]
IF [ NewMemoryBlock size <= 4080 bytes ]
[
- Use POOL_HEADER PoolIndex variable to determine which
PoolDescriptor must be used.
- Insert it in the proper LIST_ENTRY array entry
- If anything goes well, return
]
- Depending on how many blocks are kept in
MmNonPagedPoolFreeListHead, we release it to the hardware.
]
Paged pool algorithm is very different especially for page aligned blocks.
Smaller size management should be not that far from NonPaged but in
assembly code we definitely saw that NonPaged and Paged pool are totally
separated. Once you know a little more about how NonPaged allocation works,
we can now talk about exploitation part.
---[ 3.2 - Getting code execution abusing allocation code
Our main goal is getting code execution on every allocation attempts for
NonPaged pool only. This result must be done only by changing data used by
targeted code. Our purpose is proving that kernel code can serve our
interest only by changing typical data environment. Our work is based
on a new rootkit developed to gain control over NonPaged allocation.
We start with getting code execution for allocation superior or equal
to 1 page. As we saw on previous part, it concerns the third and last
table.
-----[ 3.2.1 - Data corruption of MmNonPagedPoolFreeListHead
MmNonPagedPoolFreeListHead conserves page aligned memory blocks to speed up
memory allocation. It links held memory block using a LIST_ENTRY structure.
This structure is common and use in Windows heap library for example.
kd> dt nt!_LIST_ENTRY
+0x000 Flink : Ptr32 _LIST_ENTRY
+0x004 Blink : Ptr32 _LIST_ENTRY
MmNonPagedPoolFreeListHead access is protected by general NonPaged queued
spinlock LockQueueNonPagedPoolLock. It assures that only one thread and
processor can look and modify this structure.
So we need a way to get control over allocation and unlinking procedure
seems perfect. We can poison this linked list with a fake entry, with the
highest size possible, which unlinking will modify current executed code.
At kernel level, you can modify code as data without any protection issues.
Unlinking was used when heap exploitation started [20] but modifying code
was not possible from userland. As spinlock assures us exclusivity, there
is no risk on some race condition. The created "hook" would be dynamic and
code restored directly. Page guard protection reverse [18] shows that code
is only checked every 5 minutes. Whether a modification is found, real code
is just replaced.
This method has plenty assets but also a lot of obstacles. Let start by
enumerating all those obstacles :
- On a basic implementation of unlinking, list become unwalkable.
It breaks most utilization of the table.
- Pass through page cleaning methods and always be the first block on the
list otherwise we could miss some call.
- We break code path and sooner or later we must return as if our
hijacking has never been there and everything goes fine.
- Processor prefetch make self code modification dangerous.
Unlinking gives us 4 bytes overwriting to build an opcode and create a
redirection. In our case, we influenced current context and a register
should point to the unlinked entry. We said should point without choosing a
single register because kernel changes between versions or service packs.
As soon as we discuss context, we will stay talking about general
situations. We choose to make a jmp [reg+XX] which is FF60XX in hex.
This technique effectiveness lies on keeping the MmNonPagedPoolFreeListHead
walkable. A double linked list, as LIST_ENTRY, is walkable if Flink is
correct. Therefore we can choose an address for Flink as 0xXXXX60FF and
Blink will point to the code address. The Intel x86 architecture using
little endian our address is quite easy to found, we must check opcode
offset and discard too close possibilities. Next figure illustrates a
poisoned entry.
MmNonPagedPoolFreeListHead[i]
/------> +--------------------+
| | Flink | ---\
| |--------------------| |
| <---- | Blink | |
| +--------------------+ |
| | ... | |
| +--------------------+ |
| /-------------------------------/
| |
| | Poisoned entry
| | +--------------------+
| | | PreviousSize : - |
| | +--------------------+
| | | PoolIndex : - |
| | +--------------------+
| | | PoolType: NonPaged |
| | +--------------------+
| | | BlockSize : i |
| | +--------------------+
| | | PoolTag : - |
| \---> +--------------------+
| | Flink : 0xYYXX60FF | <--\
| |--------------------| |
| X--- | Blink : 0x80YYYYYY | |
| +--------------------+ |
| |
| /-------------------------------/
| | Fake entry (0xYYXX60FF)
| | +--------------------+
| | | PreviousSize : - |
| | +--------------------+
| | | PoolIndex : - |
| | +--------------------+
| | | PoolType: NonPaged |
| | +--------------------+
| | | BlockSize : < i |
| | +--------------------+
| | | PoolTag : - |
| |---> +--------------------+
| | | Flink : 0x80..... | ---\
| | |--------------------| |
| \---- | Blink : Poisoned | |
| +--------------------+ |
\--------------- [...] ------------/
Unlinking instruction : mov [0x80YYYYYY], 0xYYXX60FF
New Opcode after unlinking : jmp [reg+XX] (FF 60 XX)
[ Figure 3 - Poisoned double linked list ]
This figure shows a MmNonPagedPoolFreeListHead entry layout that assures
predicted unlinking and then code execution. We must maintain this layout
or we will lose our position. NonPaged blocks come from two different
virtual memory ranges. The second memory region start is stored in
MmNonPagedPoolExpansionStart. A cleaning function is called sometime to
free blocks from the expansion NonPaged pool. To avoid this cleaning, we
can use a Paged pool block locked. You can lock a memory block with the
MmProbeAndLockPages function. This lock makes described memory region as
resident. Another more discreet way is to remap a NonPaged block with the
function MmMapLockedPagesSpecifyCache. It is more discrete because this
mapping would be just before expansion NonPaged pool memory range. Using
a locked Paged pool block creates an address totally differently. A quick
look at those addresses between NonPaged ones show a clear difference.
As virtual memory is very large, it does not take too much time to find
an address like 0xYYXX60FF. We will not unlock those pages until our
technique is running.
To defeat code path issues we differentiate two different states. The first
state is when our block is selected. The second state is when our block is
unlinked. If we were able to return to the first step with our next fake
entry selected, we could continue walkin
g code as normal. We achieve that
by using a generic approach. At IRQL equal to DISPATCH_LEVEL, we corrupt a
MmNonPagedPoolFreeListHead entry with some invalid pointers. With a hook on
the page fault handler we are capable to see first and second stages,
restore the right context each time and save context difference between
those states.
Assembly dump from MiAllocatePoolPages :
lea eax, [esi+8] ; Stage #1 esi is selected block and esi+8 its size
cmp [eax], ebx ; Check with needed size
mov ecx, esi
jnb loc_47014B
[...]
loc_47014B:
sub [esi+8], ebx
mov eax, [esi+8]
shl eax, 0Ch
add eax, esi
cmp _MmProtectFreedNonPagedPool, 0 ; Protected mode, don't care
mov [ebp+arg_4], eax
jnz short loc_47016E
mov eax, [esi] ; \ Stage #2
mov ecx, [esi+4] ; | Unlinking
mov [ecx], eax ; | procedure
mov [eax+4], ecx ; /
jmp short loc_470174
Now let's see how it works during our test technique with interrupt fault
handler (int 0xE) hooked :
lea eax, [esi+8]
; Stage #1 - Check with needed size
cmp [eax], ebx ; ----> PAGE FAULT esi = 0xAAAAAAAA | eax = esi + 8
; - We keep EIP and all registers
; - Scan all registers for 0xAAAAAAAA +/- 8
; and correct the current context. Continue.
mov ecx, esi
jnb loc_47014B
[...]
loc_47014B:
sub [esi+8], ebx
mov eax, [esi+8]
shl eax, 0Ch
add eax, esi
cmp _MmProtectFreedNonPagedPool, 0 ; Protected mode, don't care
mov [ebp+arg_4], eax
jnz short loc_47016E
mov eax, [esi] ; \ Stage #2 - Unlinking procedure
mov ecx, [esi+4] ; |
mov [ecx], eax ; | ------> PAGE FAULT ecx = 0xBBBBBBBB
; | eax = 0xCCCCCCCC
; | - Keep EIP and sub this context from
; | Stage #1 saved context
; | - Change fault registers and
; | structure pointers. Continue.
mov [eax+4], ecx ; /
jmp short loc_470174
Fault addresses 0xAAAAAAA, 0xBBBBBBBB and 0xCCCCCCCC must point on invalid
addresses to force a caught page fault. This test is made only once and
when we still have exclusivity on all processors. The int 0xE (page fault)
handler is restored just after.
This generic technique permits us to restore a valid context just before
selected block size is checked. Once we get code execution, we apply
context difference, change the current block register and then return at
first stage address. It works well because our two stages are very close,
once a selected block size is checked, unlinking is directly made.
Given examples were based on a single LIST_ENTRY of the
MmNonPagedPoolFreeListHead table but you must poison all entries. If a
given entry is empty (except for our fake blocks), the algorithm tries
next the entry. It means we will be called more than one time per
allocation. We created a mechanism to manage multiple call on a single
allocation. If the first entry is empty, the second entry is used and so
on. Then we will be called twice or more. By checking current table, we
can predict a future code execution on the same allocation and avoid
executing payload more than one time per allocation request.
Prefetch is a processor feature that retrieves more than a single
instruction from memory before it executes them. Some processor use a
complex branch prediction algorithm to fetch as much instruction as
possible. After some tests, we saw that processors invalidate code cache
when a modification occurs in cached memory addresses. Our driver supports
a case where code modification could be right after current instruction.
To achieve that we created a routine which calculates prefetch cache size
and consider it in next parts of our technique. We could also search
specifics instructions which clean prefetch cache like a far jump but it
can only be used as an option.
This technique gives us code execution for NonPaged allocation superior or
equal to 1 page. It achieves that with a stealth hook, created by kernel
code and cleaned by our routine directly after. It's far from being
perfect as those allocations are not used that much. Next part describes
how this technique can be extended to gain control over all NonPaged pool
allocations.
-----[ 3.2.2 - Expend it for every size
Others lists can not be hijacked the same way because synchronization
mechanisms are not exclusive. Changing some assembly code becomes tricky if
it can be executed by more than one thread at a time. Our method is
assuring our previous technique execution on any allocation. Once we have
control, we can find a way to retore ExAllocatePoolWithTag context with a
correct return value. We must do that without recoding a single line of
memory allocator. It is possible to create our own allocator but Windows
one is great and it will perfectly do the job for us.
During allocation, the lookaside list is checked first. It will pop an
entry and if this entry is not NULL, use it. This entry comes from
GENERAL_LOOKASIDE ListHeader field. This field structure is SLIST_HEADER.
kd> dt nt!_SLIST_HEADER .
+0x000 Alignment : Uint8B
+0x000 Next :
+0x000 Next : Ptr32 _SINGLE_LIST_ENTRY
+0x004 Depth : Uint2B
+0x006 Sequence : Uint2B
The ExInterlockedPopEntrySList function pops an entry from a SLIST_HEADER
structure. The Next field is a pointer to the next SLIST node (single
linked list). The Depth field represents how many entries are kept in the
list. ExFreePoolWithTag compare GENERAL_LOOKASIDE optimal depth with
current SLIST_HEADER depth. ExAllocatePoolWithTag does not check this
field and just looks if some entry can be popped out Next field. To stunt
allocation and free procedure on NonPaged lookaside table, we set Next
field to NULL and Depth field to 0xFFFF. This state will be preserved and
this table will not be used anymore.
Our technique expansion relies entirely on subverting how the
ExpNonPagedPoolDescriptor table is used. In the previous part, we explained
global variable ExpNumberOfNonPagedPools involvement in this process. It
is possible to expand number of NonPaged pools and then play with current
KNODE color. During allocation, the KNODE color defines which pool
descriptor is used. Then during free procedure, PoolIndex field of
POOL_HEADER keep pool descriptor color.
So we can use this nice feature to our advantage. Default KNODE color on
every processors would point on an empty pool descriptors. It will lead to
code execution using our base technique. If the function
MiAllocatePoolPages return address is not the one use for classical page
rounded allocation, we know that a smaller allocation occur. All we have
to do is switch PRCB KNODE pointer to a copy with custom color and recall
ExAllocatePoolWithTag. Everything related to allocation and block
management will be implemented as it needs to be even if it differs
between operating system versions. Returned blocks PoolIndex will point to
our own pool descriptor and free procedure, which will perfectly work. Lets
see how it will look on a single processor.
ExpNonPagedPoolDescriptor
+-------------------+
| PREVIOUS POOLDESC | <--- Kept for compatibility (0)
| EMPTY POOLDESC | <--- Default KNODE->color (1)
| -- |
| -- |
| -- |
| -- |
| -- |
| -- |
| -- |
| -- |
| -- |
| -- |
| -- |
| -- |
| -- |
| CUSTOM POOLDESC | <--- Used for our allocations (16)
+-------------------+
[ Figure 4 - Corrupted ExpNonPagedPoolDescriptor ]
[ on single processor ]
This setup is just an example and you can manage the arrangement as you
want. We could transfer previous blocks from older pool descriptors in our
own and then receive free blocks. It is also possible to use multiple pool
descriptors and so on. Beware of system pool descriptor recycling as it can
leads to strange behavior specially on multi-processor architecture.
Once we have our fresh allocated block, we must return at
ExAllocatePoolWithTag return address. MiAllocatePoolPages has been called
to retrieve a new page and fill the current pool descriptor with it. It's
obvious that we can't return normally and let page allocation occurs. On
Intel x86 architecture the stack is used to store local variables,
arguments and saved registers. The Windows compiler starts by reserving
local variable and then pushes each register before its modification.
The next figure shows our stack configuration once we have code execution.
top
+--------------------+
| Our stack elements | Restore assembly example
+--------------------+ <------ /---------------\
| | | pop ecx |
| Saved registers | | pop ebx |
| | | pop esi |
+--------------------+ | leave |
| | | retn 0Ch |
| | \---------------/
| | |
| | |
| Stack variables | |
| | |
| | |
| | |
+--------------------+ [new stack level]
| Saved EBP | |
+--------------------+ |
| Return Address | |
+--------------------+ |
| | |
| Function arguments | |
| | |
+--------------------+ <--------------/
bottom
[ Figure 5 - Stack context after code execution ]
[ ~ small blocks case ~ ]
The restore assembly part shows correct assembly in current function which
perfectly restores the context. It does not correspond of the first series
of pop instruction before return. There is an important risk that some
register has not been pushed yet. It is possible to deduce the pushed
register number by looking at function prologue when stack variables are
reserved. In the Windows compiler, it's quite simple and we can easily
calculate the pushed register number. A simple disassembly analysis on
needed pop register number does the job. It must be done for
MiAllocatePoolPages and ExAllocatePoolWithTag. We change the return
address stored in the stack and go to the deduced MiAllocatePoolPages
address. Last step is setting eax register for the return value. Both
functions return a value and preserve eax value. Our analyzer is dynamic
and registers each pop and its register. That why we can restore the
proper context even if it changes between versions.
The Windows compiler is really easy to predict and does not create too
strange assembly organization. This technique is theoretically possible
on every assembly code that follow stdcall specification. The approach
could differ on others compilers.
---[ 3.3 Exploit our position
This article present a way of subverting the Windows kernel by modifying
only data. No function pointers, no static hooking or others classical
technique. It could exempt us of any other explanation. But it would not
be complete without some concrete examples. I personally believe that the
only limitation here is imagination.
-----[ 3.3.1 Generic stack redirection
Allocation occurs in so many places that you must rely on known context
and functions. Once everything is setup and before releasing exclusivity,
some stack redirection database can be created.
The first way to do this is calling a handler if stack backtracing
reveals a specific function. Stack backtracing shows only return addresses
and not which function call it. Debuggers resolve those functions by deep
analysis or symbol checking. Implementing those features would take too
much time. So it's better to target a specific return address on
ExAllocatePoolWithTag stack frame. It will definitely improve check speed.
To do that, we indicates to our stack redirection API that we target a
specific function. Then launch a normal call or procedure that will lead to
our function. Every allocation during this time will show important
backtrace stacks.
Let say, we target an IRP and we know which function handles it by looking
at IRP dispatch table. We also know by reversing that it will allocate a
NonPaged block. Launching an I/O request, our API could register some
NonPaged call and recognize later.
In the wild, it will call the appropriate handler with sub context
information. Sometimes getting a context is not enough. The second way
stays on same principles but modifies the stack to assure our handler is
called once the function end. Efficiency depends on what is your target
and how you modifying it.
-----[ 3.3.2 Userland process code injection
This technique can be also used to inject code in userland to subvert
trusted applications. NonPaged allocation occurs a lot in kernel mode and
it happens in every process. Some kernel drivers like win32k.sys call
userland many times. This call is achieve by the function
KeUserModeCallback [35]. It modifies userland stack to switch temporarily
for a call in userland. Available functions are limited by a table.
Userland injection from kernel should not be resident and only concern
known trusted application as browsers. Injection can be done on
explorer.exe as well to launch an hidden instance of a trusted program.
KeUserModeCallback algorithm can be easily remade or copied then
relocated.Redirection table could be subverted to redirect the call. We
can also think about exploiting userland calls. It does not make any sense
to add checks on those available functions.
--[ 4 - Detection
This article does not try to convince you that subverting IDT or
allocation mechanism using advanced technique is the future. Most detection
tools only indicate if a rookit may or may not be in this computer. It has
pains identifying which module is responsible. It detects antivirus or
firewall as rootkits. A protection layout could detect itself as a rootkit
because it does everything a rootkit does and so does not ask it to block
or uninstall a rootkit. Rootkit papers demonstrate so many great ways to
easily bypass those protections. But we don't see much those techniques in
the wild, simply because rootkits don't need them for the moment.
Detect software behavior modification could be part of a Verifiable
Operating System [36]. It will involve basic checks on known memory
structures. Checks integrity of LIST_ENTRY structures and correct them if
needed. We can blame rootkit protections as much as we want but detecting
rootkits on a closed operating system is almost impossible. Gives more
information for kernel components will certainly leads to more
sofisticates attacks. In the other hand, it could reduce attack surface.
It is specially true on a defence oriented operating system. Next
protection improvements should come from the operating system itself.
Now that there are hardware improvements for virtualisation, such as
hypervisors, there will be extensions to hardware to detect and protect
against rootkits. It offers a real control on operating system behavior
without advanced research on kernel layout. Some protections techniques
that were impossible to implement in Windows environment like PAX, could
rely on those hardware features. Our techniques could be detected by
registering and monitoring some specific events on the processor. It is
possible today to do that but performance issues are important.
Our attacks could be blocked using targeted protection such as signatures.
An attack is defined as how many times it takes to create a generic
protection. In this area, Patchguard is an important improvement.
--[ 5 - Conclusion
This papers techniques were made to show that elegant software hijacking
can still evades most protections and avoid any performance issues or
unstable behaviors. Even though, these techniques are hardly reliable and
should be considered only as a technical proof of concept. New protections
are not efficient enough or present. They do not represent a threat for
a rootkit which targets millions of computers. Reversing is an important
tool in improving software rootkits techniques. Detecting that a rootkit
is present should not be enough. A protection which cannot uninstall a
rootkit or prevent infection is useless. Drivers signatures was a good
idea as it was designed to stop current infections entries. But infection
prevention includes local kernel exploitation. Generic detection of those
attacks would need an important improvement in anti-rootkits protections
and operating system design.
--[ 6 - References
[1] Holy Father, Invisibility on NT boxes, How to become unseen on Windows
NT (Version: 1.2)
http://vx.netlux.org/lib/vhf00.html
[2] Holy Father, Hacker Defender
https://www.rootkit.com/vault/hf/hxdef100r.zip
[3] 29A
http://vx.netlux.org/29a
[4] Greg Hoglund, NT Rootkit
https://www.rootkit.com/vault/hoglund/rk_044.zip
[5] fuzen_op, FU
http://www.rootkit.com/project.php?id=12
[6] Peter Silberman, C.H.A.O.S, FUto
http://uninformed.org/?v=3&a=7
[7] Eeye, Bootroot
http://research.eeye.com/html/tools/RT20060801-7.html
[8] Eeye, Pixie
http://research.eeye.com/html/papers/download/
eEyeDigitalSecurity_Pixie%20Presentation.pdf
[9] Joanna Rutkowska and Alexander Tereshkin, Blue Pill project
http://bluepillproject.org/
[10] Frank Boldewin, A Journey to the Center of the Rustock.B Rootkit
http://www.reconstructer.org/papers/
A%20Journey%20to%20the%20Center%20of%20the%20Rustock.B%20Rootkit.zip
[11] Frank Boldewin, Peacomm.C - Cracking the nutshell
http://www.reconstructer.org/papers/
Peacomm.C%20-%20Cracking%20the%20nutshell.zip
[12] Stealth MBR rootkit
http://www2.gmer.net/mbr/
[13] EP_X0FF and MP_ART, Unreal.A, bypassing modern Antirootkits
http://www.rootkit.com/newsread.php?newsid=647
[14] AK922 : Bypassing Disk Low Level Scanning to Hide File
http://rootkit.com/newsread.php?newsid=783
[15] CardMagic and wowocock, DarkSpy
http://www.fyyre.net/~cardmagic/index_en.html
[16] pjf, IceSword
http://pjf.blogone.net
[17] Gmer
http://www.gmer.net/index.php
[18] Pageguard papers (Uniformed) :
- Bypassing PatchGuard on Windows x64 by skape & Skywing
http://www.uninformed.org/?v=all&a=14&t=sumry
- Subverting PatchGuard Version 2 by Skywing
http://www.uninformed.org/?v=all&a=28&t=sumry
- PatchGuard Reloaded: A Brief Analysis of PatchGuard Version 3 by Skywing
http://www.uninformed.org/?v=all&a=38&t=sumry
[19] Greg Hoglund, Kernel Object Hooking Rootkits (KOH Rootkits)
http://www.rootkit.com/newsread.php?newsid=501
[20] Windows Heap Overflows - David Litchfield
http://www.blackhat.com/presentations/win-usa-04/bh-win-04-litchfield/
bh-win-04-litchfield.ppt
[21] Bypassing Klister 0.4 With No Hooks or Running a Controlled
Thread Scheduler by 90210 - 29A
http://vx.netlux.org/29a/magazines/29a-8.rar
[22] Microsoft, Debugging Tools for Windows
http://www.microsoft.com/whdc/devtools/debugging/default.mspx
[23] Kad, Phrack 59, Handling Interrupt Descriptor Table for fun and profit
http://phrack.org/issues.html?issue=59&id=4#article
[24] Wikipedia, Southbridge
http://en.wikipedia.org/wiki/Southbridge_(computing)
[25] Wikipedia, Northbridge
http://en.wikipedia.org/wiki/Northbridge_%28computing%29
[26] The NT Insider, Stop Interrupting Me -- Of PICs and APICs
http://www.osronline.com/article.cfm?article=211 (login required)
[27] Russinovich, Solomon, Microsoft Windows Internals, Fourth Edition
Chapter 3. System Mechanisms -> Trap Dispatching
[28] MSDN, KeyboardClassServiceCallback
http://msdn2.microsoft.com/en-us/library/ms793303.aspx
[29] Clandestiny, Klog
http://www.rootkit.com/vault/Clandestiny/Klog%201.0.zip
[30] Alexander Tereshkin, Rootkits: Attacking Personal Firewalls
www.blackhat.com/presentations/bh-usa-06/BH-US-06-Tereshkin.pdf
[31] MSDN, NdisMIndicateReceivePacket
http://msdn2.microsoft.com/en-us/library/aa448038.aspx
[32] Subverting VistaTM Kernel For Fun And Profit by Joanna Rutkowska
http://invisiblethings.org/papers/
joanna%20rutkowska%20-%20subverting%20vista%20kernel.ppt
[33] Vista RC2 vs. pagefile attack by Joanna Rutkowska
http://theinvisiblethings.blogspot.com/2006/10/
vista-rc2-vs-pagefile-attack-and-some.html
[34] Russinovich, Solomon, Microsoft Windows Internals, Fourth Edition
Chapter 7. Memory Management -> System Memory Pools
[35] KeUserModCallback ref - "Ring0 under WinNT/2k/XP" by Ratter - 29A
http://www.illmob.org/files/text/29a7/Articles/29A-7.003
[36] Joanna Rutkowska - Towards Verifiable Operating Systems
http://theinvisiblethings.blogspot.com/2007/01/
towards-verifiable-operating-systems.htm