
1.10 Introducing SHELF Loading (Part 1)

Published in tmp0ut · 2 years ago

The Nexus between Static and Position Independent Code
~ @ulexec and @Anonymous_

1. Introduction

Over the last several years, Linux offensive tooling has grown in sophistication and complexity. Linux malware has become increasingly popular, as reflected in the higher number of public reports documenting Linux threats. These include government-backed Linux implants such as APT28's VPNFilter, Drovorub, and Winnti's wide range of Linux malware.

However, this increase in popularity does not yet seem to have had much impact on the overall sophistication of the current Linux threat landscape. It is a fairly young ecosystem, in which cybercriminals have not been able to identify reliable targets for monetization apart from cryptocurrency mining, DDoS, and, more recently, ransomware operations.

In today's Linux threat landscape, even the smallest refinement or added complexity often results in AV evasion, so Linux malware authors tend not to invest unnecessary resources in making their implants more sophisticated. There are various reasons for this phenomenon, and they are open to interpretation. The Linux ecosystem, in contrast to other popular spheres such as Windows and macOS, is more dynamic and diverse: ELF files come in many flavors for different architectures, ELF binaries can be valid in many different forms, and visibility into Linux threats is often quite poor.

Due to these issues, AV vendors face a completely different set of challenges when detecting these threats. Oftentimes this disproportionate failure to detect simple, unsophisticated threats leaves the implicit impression that Linux malware is by nature not complex. This couldn't be further from the truth. Those familiar with the ELF file format know that it leaves quite a lot of room for innovation that other file formats cannot provide due to their lack of flexibility, even if we have not seen it abused as much over the years.

In this article we discuss a technique that achieves something uncommon among file formats: generically converting full executables to shellcode. It is yet another example of how ELF binaries can be manipulated to achieve offensive innovation that is hard or impossible to replicate in other file formats.

2. A Primer On ELF Reflective Loading

In order to understand the technique, we must first give a contextual background on previously known ELF techniques upon which this one is based, with a comparison of the benefits and tradeoffs.

Most ELF packers, or any application implementing any form of ELF binary loading, are primarily based on what's known as User-Land-Exec.

User-Land-Exec is a method first documented by @thegrugq, in which an ELF binary can be loaded without using any of the execve family of system calls, hence its name.

For the sake of simplicity, the steps to implement an ordinary User-Land-Exec with support for ET_EXEC and ET_DYN ELF binaries are illustrated in the following diagram, showcasing an implementation of the UPX packer for ELF binaries:

[Figure: User-Land-Exec steps as implemented by the UPX packer for ELF binaries]

As we can observe, this technique has the following requirements (by @thegrugq):

  1. Clean out the address space
  2. If the binary is dynamically linked, load the dynamic linker.
  3. Load the binary.
  4. Initialize the stack.
  5. Determine the entry point (i.e. the dynamic linker or the main executable).
  6. Transfer execution to the entry point.

On a more technical level, we come up with the following requirements:

  1. Set up the stack of the embedded executable with its corresponding Auxiliary Vector.
  2. Parse the PHDRs and identify whether there is a PT_INTERP segment, denoting that the file is a dynamically linked executable.
  3. LOAD the interpreter if PT_INTERP is present.
  4. LOAD the target embedded executable.
  5. Pivot to the mapped e_entry of the target executable or of the interpreter, depending on whether the target executable is a dynamically linked binary.
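
Steps 2 to 5 can be sketched offline, without mapping or executing anything. The following is a minimal Python sketch, assuming a little-endian ELF64 input; the helper names `read_phdrs` and `plan_load` are our own for illustration and are not taken from any existing loader. It only reports what a loader would have to map and where it would pivot.

```python
import struct

PT_LOAD, PT_INTERP = 1, 3
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, little-endian


def read_phdrs(blob):
    """Return (e_entry, list of program header tuples) for an ELF64 LE file."""
    hdr = EHDR.unpack_from(blob, 0)
    ident, e_entry, e_phoff = hdr[0], hdr[4], hdr[5]
    e_phentsize, e_phnum = hdr[9], hdr[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    phdrs = [PHDR.unpack_from(blob, e_phoff + i * e_phentsize)
             for i in range(e_phnum)]
    return e_entry, phdrs


def plan_load(blob):
    """Decide what a loader must map and where it should pivot (hypothetical helper)."""
    e_entry, phdrs = read_phdrs(blob)
    interp = None
    for p_type, _flags, p_offset, _va, _pa, p_filesz, _memsz, _align in phdrs:
        if p_type == PT_INTERP:
            # PT_INTERP holds a NUL-terminated path to the dynamic linker.
            interp = blob[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return {
        "interp": interp,  # interpreter to LOAD first, or None
        "load_segments": sum(1 for p in phdrs if p[0] == PT_LOAD),
        "pivot_to": "interpreter" if interp else "executable",
        "e_entry": e_entry,  # link-time entry point of the target itself
    }
```

Calling `plan_load(open(path, "rb").read())` on a dynamically linked binary should report its interpreter path and `pivot_to == "interpreter"`; a statically linked one has no PT_INTERP, so the pivot goes straight to the executable's own entry point.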

For a more in-depth explanation, we suggest reading @thegrugq's comprehensive paper on the matter [9].

One of the capabilities of conventional User-Land-Exec is the evasion of an execve footprint, as previously mentioned, in contrast with other techniques such as memfd_create/execveat, which are also widely used to load and execute a target ELF file. Since the loader maps and loads the target executable, the embedded executable has the flexibility of having a non-conventional structure. This has the side benefit of being useful for evasion and anti-forensics purposes.

On the other hand, since there are a lot of critical artifacts involved in the loading process, the technique can be easy for reverse engineers to recognize, and it is somewhat fragile because it depends heavily on these components. For this reason, writing User-Land-Exec based loaders has been somewhat tedious. As more features have been added to the ELF file format, the technique has had to mature over time, thereby increasing its complexity.

The new technique that we cover in this paper relies on implementing a generic User-Land-Exec loader with a reduced set of constraints, supporting hybrid PIE and statically linked ELF binaries that, to our knowledge, have yet to be reported.

We believe this technique represents a drastic improvement over previous versions of User-Land-Exec loaders: given the lack of technical implementation constraints and the nature of this new hybrid static/PIE ELF flavor, the capabilities it can provide are broader and more evasive than those of previous User-Land-Exec variants.

3. Internals Of Static PIE Executable Generation

3.1 Background

In July of 2017, H. J. Lu submitted a patch to a GCC Bugzilla entry named 'Support creating static PIE'. This patch mentioned the implementation of static PIE in his glibc branch hjl/pie/static, in which Lu documented that by supplying the -static and -pie flags to the linker, along with PIE versions of crt*.o as input, static PIE ELF executables could be generated. It is important to note that, at the time of this patch, generation of fully statically linked PIE binaries was not possible.[1]

In August, Lu submitted a second patch[2] to the GCC driver, adding the -static-pie flag to support the static PIE files he had demonstrated in his previous patch. The patch was accepted in trunk[3], and this feature was released in GCC v8.

Moreover, in December of 2017 a commit was made in glibc[4] adding the option --enable-static-pie. This patch made it possible to embed the needed parts of ld.so to produce standalone static PIE executables.

The major change in glibc to allow static PIE was the addition of the function _dl_relocate_static_pie, which gets called by __libc_start_main. This function locates the run-time load address, reads the dynamic segment, and performs dynamic relocations before initialization, after which control is transferred to the subject application.
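
To make the relocation step more concrete, here is a simplified Python model of the most common case, R_X86_64_RELATIVE entries in an Elf64_Rela table. This is a sketch under our own assumptions and not glibc's code: the function name `apply_relative_relocs` and its parameters are hypothetical, the image is a plain bytearray whose index 0 stands for link-time address 0, and every other relocation type that glibc handles is simply skipped.

```python
import struct

R_X86_64_RELATIVE = 8
RELA = struct.Struct("<QQq")  # Elf64_Rela: r_offset, r_info, r_addend


def apply_relative_relocs(image, rela_off, rela_size, load_base):
    """Patch each R_X86_64_RELATIVE slot in `image` with load_base + r_addend.

    `rela_off` and `rela_size` locate the relocation table inside the image
    (in a real binary they come from DT_RELA and DT_RELASZ in the dynamic
    segment). Returns the number of slots patched.
    """
    patched = 0
    for off in range(rela_off, rela_off + rela_size, RELA.size):
        r_offset, r_info, r_addend = RELA.unpack_from(image, off)
        if r_info & 0xFFFFFFFF != R_X86_64_RELATIVE:
            continue  # other relocation types are out of scope for this sketch
        value = (load_base + r_addend) & 0xFFFFFFFFFFFFFFFF
        struct.pack_into("<Q", image, r_offset, value)
        patched += 1
    return patched
```

Because every patched value is just the load base plus a constant, the same image can be fixed up for any base address, which is what allows a static PIE to run wherever it happens to be placed.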

In order to know which flags and compilation/linking stages were needed to generate static PIE executables, we passed the flags -static-pie -v to GCC. However, we soon realized that doing so produced a plethora of flags and calls to internal tools. As an example, the linking phase is handled by the tool /usr/lib/gcc/x86_64-linux-gnu/9/collect2, and the GCC driver itself wraps /usr/lib/gcc/x86_64-linux-gnu/9/cc1. Nevertheless, we managed to remove the irrelevant flags and ended up with the following steps:

[Figure: reduced compilation and linking steps for generating a static PIE executable]

These steps are in fact the same ones provided by Lu: supplying the linker with input files compiled with -fpie, along with the flags -static, -pie, -z text, and --no-dynamic-linker. In particular, the most relevant artifacts for static PIE creation are rcrt1.o, libc.a, and our own input file, test.o. The rcrt1.o object contains the _start code, which has the code required to correctly load the application before executing its entry point by calling the corresponding libc startup code in __libc_start_main:

[Figure: _start code in rcrt1.o calling __libc_start_main]

As previously mentioned, __libc_start_main will call the newly added function _dl_relocate_static_pie (defined in elf/dl-reloc-static-pie.c in the glibc source). The primary steps performed by this function are commented in the source:

[Figure: source of _dl_relocate_static_pie with its commented steps]

With the help of these features, GCC is capable of generating static executables which can be loaded at any arbitrary address.

We can observe that _dl_relocate_static_pie will handle the needed dynamic relocations. One noticeable difference between rcrt1.o and the conventional crt1.o is that all of its code is position independent. Inspecting what the generated binaries look like, we see the following:

[Figure: inspection of the generated static PIE binary]

At first glance they seem to be common dynamically linked PIE executables, based on the ET_DYN executable type retrieved from the ELF header. However, upon closer examination of the segments, we observe the absence of a PT_INTERP segment, which usually denotes the path to the interpreter in dynamically linked executables, and the presence of a PT_TLS segment, usually found only in statically linked executables.

[Figure: segments of the generated static PIE binary]
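
The distinguishing features just described (ET_DYN type, no PT_INTERP, a PT_TLS segment) can also be checked programmatically. The following is a small Python sketch assuming little-endian ELF64 input; `segment_types` and `looks_like_static_pie` are hypothetical names of our own. Treat the result as a heuristic only, since other ET_DYN objects, such as a shared library that uses thread-local storage, can show the same combination.

```python
import struct

ET_DYN = 3
PT_INTERP, PT_TLS = 3, 7
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, little-endian


def segment_types(blob):
    """Return (e_type, set of p_type values) for a little-endian ELF64 file."""
    hdr = EHDR.unpack_from(blob, 0)
    e_type, e_phoff, e_phentsize, e_phnum = hdr[1], hdr[5], hdr[9], hdr[10]
    types = {PHDR.unpack_from(blob, e_phoff + i * e_phentsize)[0]
             for i in range(e_phnum)}
    return e_type, types


def looks_like_static_pie(blob):
    """Heuristic from the text: ET_DYN, no PT_INTERP, and a PT_TLS segment."""
    e_type, types = segment_types(blob)
    return e_type == ET_DYN and PT_INTERP not in types and PT_TLS in types
```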

If we check what the dynamic linker identifies the subject executable as, we will see it identifies the file type correctly:

[Figure: the dynamic linker's identification of the subject executable]

In order to load this file, all we would need to do is map all the PT_LOAD segments to memory, set up the process stack with the corresponding Auxiliary Vector entries, and then pivot to the mapped executable's entry point. We do not need to be concerned about mapping the RTLD since we don't have any external dependencies or link time address restrictions.
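
The first two of these steps can be modelled in Python without touching real memory. In the sketch below, `flatten` copies every PT_LOAD segment into a single zero-filled bytearray that stands in for the mapped image, and `build_auxv` packs a handful of auxiliary vector entries for a chosen load base. Both function names are hypothetical, the selection of AT_* entries is our own assumption about a reasonable minimum, and nothing is mapped or executed.

```python
import struct

PT_LOAD = 1
AT_NULL, AT_PHDR, AT_PHENT, AT_PHNUM = 0, 3, 4, 5
AT_PAGESZ, AT_BASE, AT_ENTRY = 6, 7, 9
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, little-endian
PAGE = 0x1000


def flatten(blob):
    """Copy all PT_LOAD segments into one image; image[0] is link address `lo`."""
    hdr = EHDR.unpack_from(blob, 0)
    e_phoff, e_phentsize, e_phnum = hdr[5], hdr[9], hdr[10]
    phdrs = (PHDR.unpack_from(blob, e_phoff + i * e_phentsize)
             for i in range(e_phnum))
    loads = [p for p in phdrs if p[0] == PT_LOAD]
    lo = min(p[3] for p in loads) & ~(PAGE - 1)
    hi = (max(p[3] + p[6] for p in loads) + PAGE - 1) & ~(PAGE - 1)
    image = bytearray(hi - lo)  # zero fill also covers .bss (memsz > filesz)
    for _t, _f, p_offset, p_vaddr, _pa, p_filesz, _memsz, _align in loads:
        start = p_vaddr - lo
        image[start:start + p_filesz] = blob[p_offset:p_offset + p_filesz]
    return image, lo


def build_auxv(blob, base):
    """Pack auxv pairs for an image whose link-time address 0 lands at `base`."""
    hdr = EHDR.unpack_from(blob, 0)
    e_entry, e_phoff, e_phentsize, e_phnum = hdr[4], hdr[5], hdr[9], hdr[10]
    pairs = [
        (AT_PHDR, base + e_phoff),  # assumes file offset 0 is mapped at `base`
        (AT_PHENT, e_phentsize),
        (AT_PHNUM, e_phnum),
        (AT_PAGESZ, PAGE),
        (AT_BASE, 0),               # no interpreter is mapped
        (AT_ENTRY, base + e_entry),
        (AT_NULL, 0),               # terminator
    ]
    return b"".join(struct.pack("<QQ", key, val) for key, val in pairs)
```

A real loader would perform the same arithmetic with mmap and a hand-built stack; the point here is only that, with no interpreter involved, the whole job reduces to copying segments and computing a few base-relative values.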

As we can observe, we have four loadable segments, as is commonly seen in SCOP ELF binaries. However, for the sake of easier deployment, it would be very useful if we could merge all those segments into one, as is usually done with ELF disk injection into a foreign executable. We can do just this by using the -N linker flag to merge data and text within a single segment.


3.2 Non-compatibility of GCC's -N and -static-pie flags

If we pass the -static-pie and -N flags together to GCC, we see that it generates the following executable:

[Figure: executable generated with -static-pie and -N]

The first thing we noticed was the ELF type: the binary generated with -static-pie alone had a type of ET_DYN, whereas together with -N the result is an ET_EXEC.

In addition, if we take a closer look at the segments' virtual addresses, we see that the generated binary is not a Position Independent Executable, since the virtual addresses appear to be absolute rather than relative. To understand why our program was not being linked as expected, we inspected the linker script that was being used.

As we are using the ld linker from binutils, we looked at how ld selects the linker script; this is done in ld/ldmain.c at line 345:

[Figure: linker script selection code in ld/ldmain.c]

The ldfile_open_default_command_file function is in fact an indirect call to an architecture-specific function generated at compile time, which contains a set of internal linker scripts to be selected depending on the flags passed to ld. Because we are using the x86_64 architecture, the generated source will be ld/elf_x86_64.c, and the function called to select the script is gldelf_x86_64_get_script, which is simply a chain of if-else-if statements that selects the internal linker script. The -N option sets the config.text_read_only variable to false, which forces the selection function to use an internal script that does not produce PIC, as can be seen below:

[Figure: gldelf_x86_64_get_script selecting the internal linker script]

This way of selecting the default script makes the -static-pie and -N flags incompatible, because the test that selects the script based on -N is evaluated before the one for -static-pie.


3.3 Circumvention via custom Linker Script

The incompatibility between the -N, -static, and -pie flags led us to a dead end, and we were forced to think of different ways to overcome this barrier. What we attempted was to provide a custom script to drive the linker. As we essentially needed to merge the behavior of two separate linker scripts, our approach was to choose one of the scripts and adapt it to generate the desired outcome with features of the other.

We chose the default script of -static-pie over the one used with -N because, in our case, it was easier to modify than changing the -N default script to support PIE generation.

To accomplish this goal, we would need to change the definition of the segments, which are controlled by the PHDRS [5] command in the linker script. If the command is not used, the linker generates program headers by default. However, if we use it in the linker script, the linker will not create any additional program headers and will strictly follow the ones defined in the subject linker script.

Taking into account the details discussed above, we added a PHDRS command to the default linker script, starting with all the original segments which are created by default when using -static-pie:

[Figure: PHDRS command listing the segments created by default with -static-pie]

After this we need to know how each section maps to each segment, and for this we can use readelf as shown below:

[Figure: readelf output showing the section to segment mapping]

With knowledge of the mappings, we just needed to change the output section definitions in the linker script, adding the appropriate segment name at the end of each section definition, as shown in the following example:

[Figure: .tdata and .tbss output sections assigned to segments in the linker script]

Here, the .tdata and .tbss sections are being assigned to the segments that get mapped, in the same order that we saw in the output of the readelf -l command. Eventually, we ended up with a working script that reassigns all the sections previously mapped in the data segment to the text segment:

[Figure: working linker script with data sections reassigned to the text segment]

If we compile our subject test file with this linker script, we see the following generated executable:

[Figure: executable generated with the custom linker script]

We now have a static-pie with just one loadable segment. The same approach can be repeated to remove other irrelevant segments, keeping only critical segments necessary for the execution of the binary. As an example, the following is a static-pie executable instance with minimal program headers needed to run:

[Figure: static-pie executable with the minimal program headers needed to run]

The following is the final output of our desired ELF structure, having only one PT_LOAD segment generated by a linker script with the PHDRS command configured as in the screenshot below:

[Figure: final ELF structure with a single PT_LOAD segment and the PHDRS command that produces it]
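
One way to confirm that a linked output has the layout this section aims for is to count its PT_LOAD segments and look at their permissions. The Python sketch below assumes a little-endian ELF64 file; `load_layout` and `is_single_segment_static_pie` are hypothetical helper names, and the check only covers the properties discussed here (ET_DYN type, no PT_INTERP, exactly one PT_LOAD).

```python
import struct

ET_DYN, PT_LOAD, PT_INTERP = 3, 1, 3
PF_X, PF_W, PF_R = 1, 2, 4
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, little-endian


def flags_str(p_flags):
    """Render p_flags the way readelf does, e.g. 'RWE'."""
    return "".join(ch if p_flags & bit else " "
                   for ch, bit in (("R", PF_R), ("W", PF_W), ("E", PF_X)))


def load_layout(blob):
    """Return (e_type, has_interp, [(p_vaddr, flags string), ...] for PT_LOAD)."""
    hdr = EHDR.unpack_from(blob, 0)
    e_type, e_phoff, e_phentsize, e_phnum = hdr[1], hdr[5], hdr[9], hdr[10]
    phdrs = [PHDR.unpack_from(blob, e_phoff + i * e_phentsize)
             for i in range(e_phnum)]
    has_interp = any(p[0] == PT_INTERP for p in phdrs)
    loads = [(p[3], flags_str(p[1])) for p in phdrs if p[0] == PT_LOAD]
    return e_type, has_interp, loads


def is_single_segment_static_pie(blob):
    """True for ET_DYN files with no PT_INTERP and exactly one PT_LOAD."""
    e_type, has_interp, loads = load_layout(blob)
    return e_type == ET_DYN and not has_interp and len(loads) == 1
```

Running `load_layout` on the output of each linker script iteration gives a quick view of how many loadable segments remain and which permissions the merged segment ended up with.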