Introduction to the development of LKMs (Linux Kernel Modules)
Number 0x02: 15/02/2007
[ --- The Bug! Magazine
_____ _ ___ _
/__ \ |__ ___ / __\_ _ __ _ / \
/ /\/ '_ \ / _ \ /__\// | | |/ _` |/ /
/ / | | | | __/ / \/ \ |_| | (_| /\_/
\/ |_| |_|\___| \_____/\__,_|\__, \/
|___/
[ M . A . G . A . Z . I . N . E ]
[ Number 0x02 <---> Edition 0x01 <---> Article 0x02 ]
.> February 14, 2007,
.> The Bug! Magazine < staff [at] thebugmagazine [dot] org >
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Introducao ao Desenvolvimento de LKMs (Linux Kernel Modules)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.> 08 de Novembro de 2006
.> Strauss < strauss [at] rfdslabs [dot] com [dot] br >
Index
- Introduction
- Why this?
- What?
- Some good advice
- "Hello World!" for LKMs
- 5.1. Initialization and removal
- 5.2. Compiling
- 5.3. init.h
- 5.3.1. module_init() and module_exit()
- 5.3.2. __init __initdata and __exit
- 5.4. Makefile for compiling more than one module
- 5.5. Adding information to modules
- 5.6. Receiving arguments
- 5.7. Modules with more than one file
- Conclusions
1. Introduction
This text aims to introduce the reader to the development of LKMs. It presents nothing new, much less serves as a complete reference on the subject. The only goal is to organize, for beginners, material taken from sources publicly available on the Internet (and/or from my head, on demand) into a tutorial that is direct, concise and effective in its mission.
Some programming knowledge (C language in particular) will be required, but that should be expected.
Some of the information presented here is specific to the 2.6.x series of the Linux kernel, in particular the way we compile our sample modules.
Having said all that, "buckle your seatbelt, Dorothy, 'cause Kansas is going bye-bye". ;)
2. Why this?
The best way to start is to find out whether this reading is necessary, or at least useful, to you. If you ended up here you probably already have some interest in the topic, but some (good?) reasons to become a kernel hacker are:
- It's one of the best ways to get started in OS development (along with the respective theoretical knowledge);
- Device Drivers are just one kind of LKM, and device drivers are always in fashion;
- LKMs are a powerful tool to do everything you always wanted to do with your computer but couldn't within user-land;
- Inevitably, practicing developing LKMs will make you more and more fluent in the Linux kernel (and people will call you an expert).
If none of this interests you and you are not addicted to any kind of knowledge, it is better to find something else to do.
3. What?
An LKM is a piece of code that extends the functionality of the Linux kernel. LKMs share the kernel's address space, namespace and processor privilege level (ring0 (Starr) on Intel x86).
Some modules can be inserted into the kernel dynamically (i.e. at runtime), while the rest can only be attached by recompiling the kernel. The tools in the module-init-tools package (e.g. insmod, rmmod, lsmod, etc.) are useful for managing dynamically loadable LKMs (hereafter simply called DLLKMs).
It is important to mention that a DLLKM will be rejected by the kernel if the running kernel is a different version from the one the module was compiled for. There is, however, a scheme called modversions, enabled through the kernel configuration options (CONFIG_MODVERSIONS), that provides some compatibility in these situations. Essentially, a checksum is computed over the interface (the prototype and the types it depends on) of each exported symbol the module uses, and this checksum is compared with the checksum of the same symbol in the running kernel. If all these checksums match, it is assumed that nothing has changed in the kernel (at least with respect to the symbols this module uses) and that the module will work correctly in the running kernel, even if it is a different version.
4. Some good advice
First: if you are going to write LKMs, you will be working in the same namespace as the kernel, so don't pollute it! The kernel, because of its size, already exports a lot of symbols, and every new symbol increases the probability of a collision. If you do not need your symbols exported, use the keyword 'static' in your declarations. If, on the other hand, you really need to export symbols from your module (to be used by other modules, for example), give them unique names. This is usually done by adding a unique prefix, such as the module name, to the symbol names. E.g.: meulkm_var1
It is worth mentioning that the list of all symbols present in the running kernel can be found at: /proc/kallsyms (if your kernel was compiled with proc filesystem support).
It is also worth noting that not all of these symbols are available to DLLKMs: modules can use only the subset that is exported with the EXPORT_SYMBOL() macro in the source code. I know of no other way to list these symbols than by examining the kernel source code (grep is your friend).
Another piece of good advice concerns floating point: don't use it! Floating-point operations inside the kernel alter the state of the FPU and have a good chance of causing "unexplainable" bugs in the programs that use your module/driver. If you _really_ need fractional values, use fixed point or some other proper representation of rationals.
Don't forget to consider with special "care" the security of your LKM. Because they run with maximum privileges, the presence of bugs in a module can completely compromise the security and/or stability of the system. Reading The Bug! magazine may help you with that. :)
And the last tip: the kernel mailing list, lkml, and the documents that come with the kernel code are great reference sources.
5. "Hello World!" for LKMs
Time to write some code! Let's start with a simple "Hello World!" module and from there add some useful functionality to the code as new concepts are introduced.
5.1. Initialization and removal
Well, first of all, every LKM must include linux/kernel.h and linux/module.h.
These files contain a number of definitions and declarations essential to any code written for the Linux kernel (and for modules in particular). Among them, linux/module.h declares two functions that must be implemented by an LKM. The prototypes follow:
int init_module(void);
void cleanup_module(void);
As you might expect, these functions handle the initialization and removal of a module, respectively. Typically, you will want to use init_module() to do some sort of registration (listeners, hooks, kernel list-structure elements, etc...), while cleanup_module() will undo everything you did at initialization. In our example, however, we will use init to give us a "Hello" and cleanup for a "Goodbye":
---BING-BONG---BING-BOI---hello1.c---BING-BOI---BING-BONG---
#include <linux/kernel.h>
#include <linux/module.h>

int init_module(void)
{
	printk(KERN_EMERG "Hello, World 1!\n");
	/* a nonzero return value means an error
	 * and keeps the module from being inserted */
	return 0;
}

void cleanup_module(void)
{
	printk(KERN_EMERG "Goodbye, cruel World 1!\n");
}
---BING-BONG---BING-BOI---hello1.c---BING-BOI---BING-BONG---
Notice the use of the printk() function. It serves us here as a replacement for printf(), because we can't use the libraries we are used to using in user-space, even libc. Fortunately, the kernel has implementations of several functions that are useful to us, like printk(), dynamic memory allocation functions, and string manipulation functions.
Strictly speaking, printk() is a logging function: it takes a message and a priority level and writes the message to the kernel log buffer, from which the system logging daemons usually copy it to /var/log/messages. In addition, if the message is urgent enough, that is, if its level is numerically lower than the "constant" console_loglevel (it is actually a "define"), the message is also printed to the console. We are using the highest priority level, KERN_EMERG, and so we get the "printf" effect we wanted for our "Hello World".
5.2. Compiling
To compile our first module (and the following ones) we will use a scheme provided by the kernel called kbuild. We won't cover the details of kbuild here; basically, we create a Makefile in the same directory as our module source that sets a few variables and then hands control to a much more complex Makefile that comes with the kernel.
---BING-BONG---BING-BOI---Makefile---BING-BOI---BING-BONG---
obj-m += hello1.o

all:
	make -C /lib/modules/`uname -r`/build M=`pwd` modules

clean:
	make -C /lib/modules/`uname -r`/build M=`pwd` clean
---BING-BONG---BING-BOI---Makefile---BING-BOI---BING-BONG---
Let's test it:
[beer@pwnzmyw0rld linuxiztehg4y] make
*** messages telling us we did everything right (or not) are printed here ***
[beer@pwnzmyw0rld linuxiztehg4y] insmod hello1.ko
Hello, World 1!
[beer@pwnzmyw0rld linuxiztehg4y] rmmod hello1
Goodbye, cruel World 1!
I know, it's depressingly easy.
5.3. init.h
The header file linux/init.h has some interesting macros and definitions for developing startup and cleanup routines for LKMs:
5.3.1. module_init() and module_exit()
The macros module_init() and module_exit() each take a function as parameter and make it the init or the cleanup function, respectively. Besides letting you name the init and cleanup functions whatever you want (instead of init_module and cleanup_module), the macros adapt themselves to the way the code is being compiled: as a loadable module or built into the kernel.
5.3.2. __init __initdata and __exit
These defines only make a difference when the module is compiled as built-in. __init and __initdata place initialization functions and variables, respectively, in a section that is freed from memory once the kernel has finished booting, i.e., you are telling the kernel that those bits of code and data are not needed after the module has initialized.
__exit, on the other hand, marks functions that are only concerned with the cleanup of the module. Consequently, they are useless for built-in modules and the kernel never loads them into memory.
Let's see an example of the use of these defines and macros:
---BING-BONG---BING-BOI---hello2.c---BING-BOI---BING-BONG---
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

static int hello_data __initdata = 2;

static int __init hello_init(void)
{
	printk(KERN_EMERG "Hello, World %d!\n", hello_data);
	return 0;
}

static void __exit hello_exit(void)
{
	printk(KERN_EMERG "Goodbye, cruel World 2!\n");
}

module_init(hello_init);
module_exit(hello_exit);
---BING-BONG---BING-BOI---hello2.c---BING-BOI---BING-BONG---
5.4. Makefile for compiling more than one module
The Makefile for simultaneously compiling the two modules we have written so far looks like this:
---BING-BONG---BING-BOI---Makefile---BING-BOI---BING-BONG---
obj-m += hello1.o
obj-m += hello2.o

all:
	make -C /lib/modules/`uname -r`/build M=`pwd` modules

clean:
	make -C /lib/modules/`uname -r`/build M=`pwd` clean
---BING-BONG---BING-BOI---Makefile---BING-BOI---BING-BONG---
With this Makefile, 'make' will produce both hello1.ko and hello2.ko.
5.5. Adding information to modules
Modules have a special section, called .modinfo, which can store a lot of information about the module, like author, license type, and a short textual description. This information can be investigated with the modinfo program, as we see below:
[beer@pwnzmyw0rld linuxiztehg4y] modinfo hello1.ko
filename: hello1.ko
vermagic: 2.6.17.9 mod_unload K7 REGPARM gcc-4.1
depends:
[beer@pwnzmyw0rld linuxiztehg4y]
As we can see, the compiler does not fill in much information by default but we can improve this documentation with some special macros:
---BING-BONG---BING-BOI---hello3.c---BING-BOI---BING-BONG---
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

static int hello_data __initdata = 3;

static int __init hello_init(void)
{
	printk(KERN_EMERG "Hello, World %d!\n", hello_data);
	return 0;
}

static void __exit hello_exit(void)
{
	printk(KERN_EMERG "Goodbye, cruel World 3!\n");
}

module_init(hello_init);
module_exit(hello_exit);

MODULE_AUTHOR("Strauss <strauss@rfdslabs.com.br>");
MODULE_DESCRIPTION("A simple \"Hello World\" module");
MODULE_LICENSE("GPL");
---BING-BONG---BING-BOI---hello3.c---BING-BOI---BING-BONG---
Here we use MODULE_AUTHOR(), MODULE_DESCRIPTION() and MODULE_LICENSE(), but there are several others that may be of use. It is an exercise for the reader to research what the other macros are (they are in linux/module.h).
[beer@pwnzmyw0rld linuxiztehg4y] modinfo hello3.ko
filename: hello3.ko
author: Strauss <strauss@rfdslabs.com.br>
description: A simple "Hello World" module
license: GPL
vermagic: 2.6.17.9 mod_unload K7 REGPARM gcc-4.1
depends:
[beer@pwnzmyw0rld linuxiztehg4y]
Now it is more beautiful.
5.6. Receiving arguments
At first glance, you might think that modules cannot take arguments (note that init_module() takes void), but this is not true. The kernel provides a way of passing arguments through macros defined in linux/moduleparam.h.
First, we declare which global variables can be controlled externally (i.e. our arguments) with the macros module_param(), module_param_array() and module_param_string() whose signatures follow:
module_param(name, type, perm)
- name -> name of the variable/parameter
- type -> type of the variable/parameter
- perm -> permissions of /sys/module/<module-name>/parameters/<param-name>
module_param_array(name, type, nump, perm)
- nump -> pointer to an integer that will hold the number of elements in the array
module_param_string(name, string, len, perm)
- string -> the char array that stores the value (often the same identifier as 'name')
- len -> usually sizeof(string)
Note that 'perm' is a number in octal format equal to the chmod permissions. If 'perm' is equal to 0, the parameter entry in /sys is not created.
Parameters can also be documented for modinfo with the macro MODULE_PARM_DESC(). Here is an example, which could be loaded with a command line such as 'insmod hello4.ko hello_data=4':
---BING-BONG---BING-BOI---hello4.c---BING-BOI---BING-BONG---
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

static int hello_data __initdata = 0;

static int __init hello_init(void)
{
	printk(KERN_EMERG "Hello, World %d!\n", hello_data);
	return 0;
}

static void __exit hello_exit(void)
{
	printk(KERN_EMERG "Goodbye, cruel World 4!\n");
}

module_init(hello_init);
module_exit(hello_exit);

module_param(hello_data, int, 000);

MODULE_AUTHOR("Julio Auto <jam@cin.ufpe.br>");
MODULE_DESCRIPTION("A simple \"Hello World\" module");
MODULE_LICENSE("GPL");
MODULE_PARM_DESC(hello_data, "\"Hello World\" counter");
---BING-BONG---BING-BOI---hello4.c---BING-BOI---BING-BONG---
5.7. Modules with more than one file
Sometimes, depending on the organization and complexity of your LKM, it may make sense to split the module into more than one source file. In that case, kbuild must be instructed a little differently through the Makefile. Example:
---BING-BONG---BING-BOI---load.c---BING-BOI---BING-BONG---
#include <linux/kernel.h>
#include <linux/module.h>

int init_module(void)
{
	printk(KERN_EMERG "Loading now!\n");
	return 0;
}
---BING-BONG---BING-BOI---load.c---BING-BOI---BING-BONG---
---BING-BONG---BING-BOI---unload.c---BING-BOI---BING-BONG---
#include <linux/kernel.h>
#include <linux/module.h>

void cleanup_module(void)
{
	printk(KERN_EMERG "Unloading now!\n");
}
---BING-BONG---BING-BOI---unload.c---BING-BOI---BING-BONG---
---BING-BONG---BING-BOI---Makefile---BING-BOI---BING-BONG---
obj-m += hello1.o
obj-m += hello2.o
obj-m += hello3.o
obj-m += hello4.o
obj-m += loadunload.o
loadunload-objs := load.o unload.o

all:
	make -C /lib/modules/`uname -r`/build M=`pwd` modules

clean:
	make -C /lib/modules/`uname -r`/build M=`pwd` clean
---BING-BONG---BING-BOI---Makefile---BING-BOI---BING-BONG---
So, in this example we put the loading code of our module in one file and the unloading code in another. kbuild will combine them into a single module, loadunload.ko, but since there is no loadunload.c, we need to tell it how to build loadunload.o, which is what the line "loadunload-objs := load.o unload.o" does. The intermediate object loadunload.o is linked from two other intermediate objects, load.o and unload.o, which are compiled from load.c and unload.c.
6. Conclusions
Developing LKMs is a very powerful resource and a first step into the world of operating system development. As this text shows, things are much simpler than they seem.
Specific uses of LKMs, like device drivers and rootkits, rely on the same concepts and basics and differ only in the kernel structures and APIs they use (like a function call to register a device driver with a device in /dev). This could be material for a future text in The Bug!, written either by me or by you, who just learned how to hack the kernel.
:)
Questions, comments and suggestions can be sent to my e-mail: strauss@rfdslabs.com.br
Finally, I would like to send a hug to all the rfdslabs people and to The Bug! and leave some basic references for the beginning of LKM development.
Linux Kernel Modules Programming Guide http://www.tldp.org/LDP/lkmpg/2.6/html/index.html
Unreliable Guide To Hacking The Kernel http://kernelbook.sourceforge.net/kernel-hacking.pdf
Cheers,
Strauss