SLAM4.024: An idiot guide to writing polymorphic engines by Trigger/SLAM
An idiot guide to writing polymorphic engines
by Trigger [SLAM] '97
1. INTRODUCTION
Polymorphism is, in my opinion, very hard to explain *well* in just one little tutorial. You can find tute's about almost everything, including poly- morphism, but I personally have never seen a tute covering ALL aspects of (writing) polymorphic engines. Why? Because there's so much to it, and most important, there's not *one* specific way of doing it. Many tutorials will explain and give many examples of how generated decryptors will look like, and why a random layout for decryptors is important. I will try to explain you more than the average text does, but I will still cover just the basics of polymorphism, so you can get a good idea what it takes. Also you should get an idea whether you'd really want to try to make one. Note that this is an "idiot guide to writing polymorphic engines" so the basics of *writing* one, and the practical problems you will come across will be covered. I will give you examples of constructions, not because those are the most efficient ones to use, but to teach you how you can start trying to build your OWN constructions. For that purpose a full working source code is not necessary and therefore not included. Maybe this will disappoint some readers, but I don't believe a ready-to-rip source code contributes to the creativity of the learning reader.
Throughout this article I will refer to a polymorphic engine just as an "engine". Although I will cover some basics, I'm assuming you know what is meant with things like XOR-encryption, scan strings, anti-heuristics, etc.
2. WHY POLYMORPHISM?
Nowadays, considering the virus-awareness, unencrypted viruses don't stand a chance in the wild: a virus is naked without some form of encryption. Unless it contains some nice anti-heuristic tricks, it will easily be spotted by even the lamest AV product available. It's easy to implement encryption in a virus, e.g. with a simple XOR-loop with varying immediate (=key). The gain you get out of using even a simple form of encryption is definitely worth the effort.
However, this yields just little improvement: nowadays, virus scanners are 'taught' what the average decryption loop looks like, and because those instructions are very typical for viruses, the scanner will alarm the user (=heuristic detection). A solution for this problem is to create loops with instructions which don't immediately betray their purpose. In other words, create a decryption loop which doesn't *look* like one. This will fool most heuristic engines.
At this point the virus can't be spotted as an unknown virus (this type of virus scanning is called 'generic' detection, of which heuristic detection is an example). But this still is not enough. Once the virus reaches an AV person, he can easily make a scan string from your 133+ anti-heuristic decryption routine; all your effort down the drain. Your virus suddenly becomes detectable by every AV product known to mankind...
To prevent the AV community from using scan strings to detect your virus, you should avoid any constant values in the virus. We already encrypted the virus body, but the decryptor always remains constant, and can therefore be used for a scan string. Of course, we can't encrypt the decryption loop, but a possible solution is letting the virus produce different decryption loops for each (number of) infection(s). This way, one copy of the virus will never look the same as another one. In other words: an AV product can impossibly detect the virus with a scan string.
Nice, but is this really worth all the hassle (considering e.g. most of the time engines are larger than the viruses itself) ? Most of the time: yes. First of all, a polymorphic virus is very annoying for the AV people, and assuming the engine is complex enough, they have to design a detection method dedicated to your particular engine. This will take some time, which gives your virus some more time to spread. Second of all, designing a detection method which cathces 100% of all possible generations is very hard.
The more complex your engine is, the harder it will be. If your polymorph is not 100% detectable, it still stands a chance to survive, EVEN if AV say they can detect it! At this point I agree with Bontchev: any detection rate below 100% is practically useless, as one wrongly left infected file can re-infect the whole system/network.
Concluding, it's fair to say that (well implemented) polymorphism is a very powerful tool as both a pre- and post-discovery strategy.
3. HOW TO CALL AN ENGINE
The first thing you will have to understand is how an engine normally is implemented in a virus. At the time, I didn't really get that part, I still don't know if it's because it actually *is* hard to understand, or just because I'm a complete moron. :) Hell, I'll explain it anyway. Here's an example of how an engine could want to be called:
(if you already understand this, you can skip this chapter, but keep these parameters in mind, as I will be referring to them in the following example engine.)
Entry parameters:
DS:SI = code to be encrypted (virus body)
ES:DI = place to put decryptor and encrypted code
CX = code length
AX = offset in memory of virus entry
Returns:
CX = length of decryptor + encrypted code
DS:DX = offset of decryptor and encrypted code
What the engine will do, is encrypt a piece of code (the virus) with some nice scheme, and place a matching decryptor in front of it. As you can see, a user-friendly engine returns the parameters in i21/40 format: after the engine is called, the virus needs only to set AH=40 and call Int21, assuming the handle is in BX. Note that the input ES:DI and the output DS:DX are the same.
So, what to do is the following:
- Set the correct parameter to point to the virus body, here DS:SI;
- Set the correct parameter to a free space in memory, big enough to hold the decryptor and the encrypted virus body (don't forget to include the engine in the size; it's a part of the virus!) Here this parameter is ES:DI. Mostly the free space used is after the virus;
- Set the correct parameter to the size of the virus (again include the size of the engine, as it is a part of the virus)
- Set the memory offset parameter (here AX) to it's correct value. This is the offset (in memory, when the file is loaded) at which the decryptor will start. This requires an example: When infecting a 200 (decimal) bytes long COM, the offset at which the decryptor (the beginning of the virus) will start would be 100h + 200d = 1C8h. Note that the offset in the FILE would be 200d = C8h, but as you know a COM file is loaded in memory after PSP, ie. starting from offset 100h. With EXEs, this value would be the EXE's new IP you're storing in the file header, as DOS doesn't mess with this offset.
- We're almost ready to call the engine. The last thing to check is if the engine destroys registers that aren't used as parameters. It will save you some debugging. :) After that, call poly!
4. AN EXAMPLE ENGINE
A polymorphic engine produces (from a given amount of variations) a random decryptor which matches the (also random) encryption method on each infection. How does it know how to do that? The programmer of the engine thought of a couple of decryptor layouts, from which the engine chooses one, and then it randomizes any variable bytes in it. This requires explanation: let's just assume that a (primitive) engine has only one decryptor layout.
Let's say it looks like this:
[1] MOV <reg16> , offset_of_first_encrypted_word
:decrypt_next_byte:
[2] <crypt> [<reg16>] , random_immediate
[3] <add2> <reg16>
[4] CMP <reg16> , offset_of_last_encrypted_word
[5] JB decrypt_next_byte
Legenda: <reg16> = a random 16 bit register (AX, BX, SI, BP, etc.)
[<reg16>] = pointer to contents of offset stored in <reg16>
eg: BX=1234, [BX]=what's at offset 1234
<crypt> = a crypt instruction, eg: XOR, ADD, SUB
<add2> = add 2 to the <reg16> (add 1 for byte encryption)
The engine will fill in all variable values:
- [1] It selects (from a table given by the programmer) a random register to work with. In this example the engine will store the MOV instruction encoded with the correct register (more on this later), after which it will store the offset of the first encrypted byte (or word) of the virus.
- [2] After that, it selects (again from a table given by the programmer) a decrypt operation. Note that the engine has to remember which decrypt operation it chose, because later on, the virus has to be encrypted in such a way that the previously generated decryptor will correctly decrypt the virus. That's right, the encryption method is derived from the generated decryptor. The engine stores the crypt-instruction, and then it will generate a random value (=immediate) which will accompany the crypt operation. Again this value needs to be remembered, otherwise later on, the virus can't be encrypted so that the decryptor works. For example, if the random value is 1234 and the crypt-operation is XOR and the register is BX, the crypt-instruction is XOR [BX],1234 (duh). The engine stores the value, and repeats [3] for a (random) amount of times, as one single encryption operation yields a weak encryption.
- [3] Add 2 to <reg16>. Advanced engines make a choice whether to use byte or word encryption, but I'm assuming this engine uses word encryption. The register needs to be increased by two, so that in the next decryption loop the decryptor will decrypt the next word. (why am I explaining this.. :))
- [4] Compare the register to EOV (end of virus). Speaks for itself. (I hope :))
- [5] If we haven't reached EOV yet, jump and decrypt the next byte/word.
4.1 REGISTER ENCODING
In this example the engine wants to pick a register. To understand how this is done we need to take a look at instruction encoding of the 80x86. At the time I was working on my engine, I didn't have technical documents so I grabbed SoftIce and a Hex-Bin calculator and started investigating. Here's a piece of my investigation on MOV reg16,reg16:
INSTRUCTION HEX BINARY
MOV AX,AX 8BC0 10001011 11000000
MOV AX,BX 8BC3 10001011 11000011
MOV AX,CX 8BC1 10001011 11000001
MOV AX,DX 8BC2 10001011 11000010
MOV BX,AX 8BD8 10001011 11011000
MOV BX,BX 8BDB 10001011 11011011
MOV BX,CX 8BD9 10001011 11011001
MOV BX,DX 8BDA 10001011 11011010
MOV CX,AX 8BC8 10001011 11001000
MOV CX,BX 8BCB 10001011 11001011
MOV CX,CX 8BC9 10001011 11001001
MOV CX,DX 8BCA 10001011 11001010
(...)
Look at the binary values; do you notice anything? Obviously, the first byte in a MOV reg16,reg16 instruction always is 10001011b or 8Bh. The second byte only changes a little if we alter the registers. A closer look reveals that bit 5-7 always remain 110.
NOTE: For those of you that don't know anything, the bit most to the right is called bit 0, so the bit most to the left is the last bit: bit 7. A byte consisting of 7 bits instead of 8? No, bit 0 is the 1st bit, so bit 7 is the 8th bit.
Where was I... Oh yeah, the last bits remain 110. Now it looks like the rest of the byte changes all the time, which is indeed the case. I hear the attentive reader shouting: but what about the 5th and the 3rd bit?! Well, in this example I didn't use the 16 bit registers SI, DI, SP and BP. Those ones can also be encoded, and they require that third bit.
Conclusion: 10001011 11...... = MOV reg16,reg16
Note that the first three dots are filled by bits representing the TARGET register, ie. the one to the left, and the second three are representing the SOURCE register, the one to the right.
The attentive reader probably noticed that I just explained the MOV reg16,reg16 instruction, while the MOV in the example engine doesn't need the two registers after the MOV, but only ONE register and an immediate (the starting offset).
Let's take a look at that:
INSTRUCTION HEX BINARY
MOV AX,1234 B83412 10111000 00110100 00010010
MOV CX,1234 B93412 10111001 00110100 00010010
MOV DX,1234 BA3412 10111010 00110100 00010010
MOV BX,1234 BB3412 10111011 00110100 00010010
MOV SP,1234 BC3412 10111100 00110100 00010010
MOV BP,1234 BD3412 10111101 00110100 00010010
MOV SI,1234 BE3412 10111110 00110100 00010010
MOV DI,1234 BF3412 10111111 00110100 00010010
This time it may be more interesting to look at the Hex values than to look at the binary values. I have sorted the instructions a bit, so it shouldn't be hard to see a pattern. :) The difference with the previous example is that the 'register encoding' in the instruction is now done in the first (and only) byte of the instruction itself. The whole instruction takes three bytes: one for the instruction itself with a register encoded in it, and in the other two the immediate is stored, in Intel little-endian format.
When we further analyse this example, we see that (in the first byte) only the last three bits change every time. That's not so weird, considering the fact that 3 bits are necessary to encode 16 bit registers in instructions.
Conclusion: 10111... 00110100 00010010 = MOV <reg16>, 1234
Now it's time to list the registers and how they're encoded in instructions. Looking at both examples we can easily see that:
AX = 000b = 0h
CX = 001b = 1h
DX = 010b = 2h
BX = 011b = 3h
SP = 100b = 4h
BP = 101b = 5h
SI = 110b = 6h
DI = 111b = 7h
All usable 16bit registers are listed here (IP doesn't count), and we can see that they all just fit in the available 3 bits. What a coincidence :) Write this list down, you'll be needing it.
The question is now, how do we tell the engine to do this? Very simple: the boys at Intel invented the OR instruction. I think many beginner virus writers do not know the exact purpose of this instruction, while for simple COM/EXE techniques this instruction is seldomly used. I think you SHOULD know that:
0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1
But you're probably wondering: for what purpose would I ever want to use this instruction? Well, what it *actually* does, is making sure that in the target byte all bits are SET (=1), which are set either in value 1 OR in value 2. And this is VERY useful for register encoding: remember our 'wildcard' for MOV <reg16>,<reg16>? No? It was: 10001011 11...... = MOV <reg16>,<reg16>, with the first three dots for the target reg16, and the second three for source reg16. Say we fill the dots with zeros, and then OR them with the encoding value for registers. Say we wanted MOV BX,CX. First we'd want to move the value 10001011 11000000 (the basic value for MOV <reg16>,<reg16>) in a register, otherwise the CPU can't work with it. We'll put it in AX:
MOV AX, 8BC0h (use your bin2hex calculator)
Then we want to have the encoding value for the target register in bit 3-5, and the source register in bit 0-2, because the MOV instruction requires us to do so. BX=011b and CX=001b, so when we put this together we get: 011001b What happens when we OR this value with AX (which contains the value for a MOV instruction): OR AX, 19 (again use the bin2hex calculator)
AX = 8BC0 = 10001011 11000000
19 = 00000000 00011001 OR
---------------------
10001011 11011001 which is 8BD9, which is MOV BX,CX!
Mission accomplished, you now know how to encode registers in instructions.
4.2 PICKING A REGISTER
One of the first things an engine needs to do, is to pick a register to work with. Here we will see a big drawback of this particular decryptor layout example. In this example we are using registers to point to memory locations, that means e.g. XOR [BX],1234. Due to the 80x86 architecture, we are limited to but four registers to do that, namely BX, SI, DI and BP. So XOR [AX],1234 for example is an illegal instruction. Only those four registers are capable of this addressing mode.
There's one problem with BP though. Let me show you something:
81 37 34 12 XOR WORD PTR [BX],1234
81 77 01 34 12 XOR WORD PTR [BX+1],1234
In the first instruction, we're addressing with just the register, without a relative offset. In the second instruction, we DO have a relative offset, and it is encoded in an extra byte. Note that the second byte changes from 37 to 77 when we introduce the relative offset. On bitlevel, 37 and 77 only differ 1 bit, namely the sixth. This is of course to let the CPU know whether there will be a relative offset or not.
81 34 34 12 XOR WORD PTR [SI],1234
81 74 01 34 12 XOR WORD PTR [SI+1],1234
The same story goes for SI. When there is no relative offset accompanying the register (in other words: relative offset = 0), the instruction looks different. This same story also goes for DI, but not for BP!
81 76 00 34 12 XOR WORD PTR [BP+00],1234
81 76 01 34 12 XOR WORD PTR [BP+1],1234
As you can see, when there is no relative offset accompanying the register, it STILL is encoded, simply as a zero byte. Why? No idea.
What does this have to do with the engine? Well, in the decryptor in the example, we're addressing without a relative offset. For SI, DI and BX goes that there will not be a zero byte to let the CPU know, but just a bit set or unset in the instruction byte. This is a good thing, because otherwise we would have a stable byte (i.e. a byte that's the same in each decryptor generated), namely the zero. So if we wanted the engine to be able to pick registers from a list which included BP, we would have to write a routine that checks if he picked BP, and if so, adjust the instruction byte, and write the extra zero. Of course, for the purpose of creating decryptors as random as possible, this is a good idea. This, however, is an advancement to be worked on when you finished a working engine. I'm giving you a good advice here: don't try to create a complex engine right away, first start with a simple but working 'framework', which you can expand later on. If you start including all kinds of nice features, you'll probably just end up debugging your ass off.
Back to the example. I'm assuming we'll just use BX, SI and DI, so we don't have to deal with that relative offset shit. Err.. what should I explain further.. Just put the register encoding values for BX, SI and DI in a table, and let the engine choose from 'em.
Maybe you're wondering why I told you this whole story about BP and it's relative offset. It's not really that important and specifically a problem for this type of decryptor layout, and I also just could've told you: don't use BP in this example. I didn't, because I wanted to let you know you can't just 'assume' stuff, and you have to check everything.
4.3 MOVING
After reading 4.1 this shouldn't be really tough I guess. :)
In the example engine, we want to have MOV <reg16>, imm16, which looked like:
10111... 00110100 00010010 = MOV , 1234
We don't want the 1234, so forget the last two bytes. Important is that the instruction byte (the first byte) is 10111..., with a register encoding value filling the last three dots.
Example: how would we make the engine generate a MOV SI,0120 in the decryptor? We'd first have to fill the dots in the unencoded value for MOV <reg16>, imm16 with zeros, so we can OR the value to the correct opcode. From now on, when I'm talking about an unencoded value for an instruction, I'm assuming the dots are filled with zeros. Okay, so the unencoded value for MOV <reg16>, imm16 is B8h (converted in hex), let's put that value in AX. Then, we'd have to OR AX with the correct register encoding value. We wanted SI, whose value is 6h (110b) so we will perform: OR AX,6h.
4.4 DECRYPTING
Let me first of all mention once again the fact that the strategy behind most engines is that *first* the decryptor is generated, and after that the correct (matching) encryption is applied to the virus. Just to prevent some confusion.
Now, we want to make a decryptor, and we want each decryptor to be as random as possible. So we have to give the engine as much as possible crypt operations. XOR, ADD and SUB are good examples, because they have the same instruction layout. They work with 2 operands, for our purpose we will have the pointing register, and a random 16 bit immediate, so e.g. SUB [BX],1234. It's a Good Thing (tm) to include more crypt operations like NEG, NOT, INC, DEC, but the disadvantage of these other types is that they have other instruction layouts, and therefore require separate routines for including in the decryptor. If you decide to include more crypt operations in your engine, these four are a good choice, because they all require one operand, and for our example engine that would be a pointing register, e.g. NOT [SI]. Let's just say our example engine uses only XOR, ADD, and SUB.
Again, to build an instruction, we need to know the unencoded value for the instruction. Assuming the dots (where the register encoding should take place) are filled with zero-bits, here are the values:
XOR: 81 30 xx xx
ADD: 81 00 xx xx
SUB: 81 28 xx xx
Please note that these are the values for use with *pointing* registers! So with these values, you would be able to produce an instruction like XOR [SI],1234 but not XOR SI,1234 ! We need pointing registers for our example engine, so the above values will do.
Another important note: the first byte is 81, no matter what crypt instruction the engine picks! That's not good, because the whole idea of the engine is that we wouldn't have 'stable bytes' in the decryptor. So you can see you simply cannot do with just these three crypt operations, simply because it'll create a stable byte. As said, it is wise to ignore this for the time being, and first try to create a working engine. Then you can start working on problems like these. Well, you actually *need* to work on them, because if you leave stable bytes in your decryptor, you still haven't disabled the possibility for scan strings.
Okay, back to the program, say the engine picked ADD to include as crypt operation. We can simply store the 81, while it doesn't need any register encoding. Then we have to encode the correct register in the second instruction byte. Earlier, we encoded just the registers in the instruction, and now we need to encode pointing registers in the instruction. I haven't given you the encoding values for the pointing registers, so here they are:
000b = 0h = [BX+SI]
001b = 1h = [BX+DI]
010b = 2h = [BP+SI]
011b = 3h = [BP+DI]
100b = 4h = [SI]
101b = 5h = [DI]
110b = 6h = [immediate]
111b = 7h = [BX]
You should see a possible advancement: addressing using two registers. For example, set BX to zero, and let SI be the pointing register, but now use [BX+SI] instead of just [SI] to address the correct byte/word. More randomness!
Okay, I think the rest is pretty easy. OR 30h with the correct register encoding value, store the byte, and store a random 16 bit immediate. Don't forget to 'remember' which cryptop the engine stored, and which immediate it used. Most engines create an encryptor along the way, which can be executed when the decryptor is generated.
This is only one crypt operation. Uhm.. like, just repeat this step a couple of times, depending on how much crypt operations you want in your decryptor.
4.5 INCREASING
Now it's time in the decryptor to increase the register. Since we're using word encryption, the register needs to be increased by two. Obvious instructions are two INCs; ADD reg, 2; or maybe even SUB reg, -2. But also consider string instructions like SCAS, CMPS, STOS, LODS and MOVS. They perform an action, and then increase SI, DI or both. That could suit our purposes quite well. However, be careful with these, as for instance MOVS may have some unwanted results. I think you should be able to find the correct values for these instructions. To make this tutorial not too longwinded I will only shortly explain the use of the two INCs, as you now should be able to figure out the rest yourself.
The unencoded hex value for INC is 40hex (hi P/S) and you should encode it with the normal register encoding values (not the *pointing* registers, because that would be a crypt operation! (compare INC BX and INC [BX])) to get the necessary result. E.g. INC SI would be OR 40, 6. With 6 being the register encoding value for SI. This should be clear by now :)
4.6 COMPARING
Of course the decryptor has to know up to which offset it has to decrypt. In this example the only possibility of doing this is a CMP instruction, which is pretty weak because it gives the decryptor a stable byte. Another way to do the same thing would be SUB, but when SUBtracting the register you're using again in each loop, it will be destroyed. A way of doing this could be giving another register the end offset each loop, and then subtracting that register from the main register (used for the actual decryption). It's just an idea.
For this example we'll stick with the CMP. The unencoded value for CMP reg16, imm16 is 81F8h, 81 being the instruction byte which needs register encoding. So we can store the 81, and OR the F8 with the normal register encoding values (again *not* the pointing register values), and store the result.
After that, we'll have to store the imm16. This value is the last byte/word that needs decrypting. How do we calculate this value? Rather simple: starting offset + code length. Code length is passed to the engine using CX, and the starting offset can be calculated by memory offset + decryptor length. Memory offset was again a parameter for the engine, namely AX, and the decryptor length shouldn't be hard to calculate. If your decryptors are always the same length it's easy, but when you have variable length decryptors, you should keep track of how many bytes you're storing. So:
AX length CX
/---------------+---------------\ /----+------\ /-----------+-----------\|
⁄---------------------------------¬-------------¬-------------------------ø
≥ Host file ≥ Decryptor ≥ Virus body ≥
¿---------------------------------¡-------------¡-------------------------Ÿ
last byte/word to be encrypted ------/
AX + decryptor length + CX = CMP operand
Pretty easy huh. Add the values, and store it as operand.
4.7 JUMPING
After the CMP comes a conditional jump. The obvious instruction would be JB, but that again introduces a stable byte, because we don't have a substitute for it. Yeah, JNA would be nice, but the problem is they have the same hex values, which is not so weird considering that they do exactly the same thing. What about JNZ? This could be interesting, but only if we'd add one/two to the CMP operand! Otherwise the last byte/word wouldn't get decrypted! Again, I'm assuming we're only using JB. (Note that IF you choose to use the JNZ feature the following explanation also sort of applies to it)
If the condition is met (condition = we're below the last byte/word) the jump should jump back to the first decryption operation. How do we 'encode' this? First of all, JB has 72h as hex value, so we can store that. As you all should know from COM infections, such a jump has a displacement byte right behind it. This displacement byte is the amount of bytes the CPU should move IP, where you should see the starting point at the offset of the next instruction (so 2 instructions behind the 72 of the JB). Here's an example:
0BF9:03B6 81 37 64 23 XOR [BX],2364
0BF9:03BA (...) (...)
0BF9:03C6 81 FB A4 06 CMP BX,06A4
0BF9:03CA 72 EA JB 03B6
0BF9:03CC (...) (...)
In this example the first decrypt operation starts at offset 3B6. The "next instruction" of which I was talking would start at 3CC in this example. So the correct calculation for the JB displacement would be:
New offset - Offset of next instruction = Displacement
3B6 - 3CC = FFEA
As you can see, we get FFEA as displacement. We can only use one byte, so that would be EAh. Now most of the time you would say that EAh = 234d, but there's also something to say for EAh = -16d, and that's exactly the amount of bytes the CPU should move. Note that you don't have to worry about things like the memory offset, because it doesn't matter where in memory this is located: the displacement will be the same every time, as the amount of bytes to move doesn't change.
5. ENCRYPTING
At this point we're done with the decryption loop. Along the way I was telling you your engine should 'remember' what it did. Sounds okay, but you're probably wondering how that is done. First of all, it is wise when making a table with crypt operations, to include the inverse crypt operations. E.g. store SUB along with ADD, ADD along with SUB, and XOR along with XOR. In that way, you can store the decrypt operation in the decryptor, and the inverse crypt operation (for encrypting) in the encryptor. Note that the first instruction in the decryptor should have it's inverse operation as *last* operation in the encryptor.
6. RANDOMIZE
Throughout this tutorial I told you to let your engine pick something at random. Most high level languages have functions which return a 'random' value, but, as usual in assembly language, we get to do it ourselves.
As there is no way you can actually make a computer return a totally random value, because it just can't do that, we have to build a routine that gives us a value as random as possible. The most used way to get a random value is reading from port 40h. Because the value isn't really random, it's wise to perform some calculations, which would randomize the number some more. Another possibility is to let the decision depend on a value like system time or date. BTW: when you manage to do this correctly, you'll get a form of slow polymorphism.
Experiment with calculations and write a program which outputs the random values to screen *on bitlevel*. Investigate the values to check if they really are quite random. Note that this IS quit important, and if you rely on a bad randomizer, you'll end up with some options of your engine getting disabled, as they're never reached.
Once you have a random value, and you want to use it to decide e.g. which cryptoperation you will use, you have to AND it out to e.g. the offset of the last value of the table. (If you don't understand what AND is: AND makes sure that in the result byte only those bits will be set, which are both set in the first operand AND in the second operand. If you still don't get it, re-read the part about OR, and investigate AND in the same way, i.e. try to get to know what it REALLY does.) So, if you have 5 options, AND the random value out to 4, so you can add it to the starting offset of the table. Don't AND it out to 5, because the starting offset + 5 is just behind the table. This is because at offset 0 there is a value too. So, if you have x values, AND the random value out to x-1.
There is one problem though, which is overlooked by some writers. Get your hex2bin calculator, and start ANDing random values with 4h. Say you do it 100 times, you'll only get either 4 or 0 as a result. You'll NEVER get 1, 2 or 3. Hmm.. weird.. Let's try it with 5h. Perform AND xx,5 with 100 values, and you'll get 0, 1, 4 and 5. But NEVER 2 or 3. To understand this problem, we have to look at the values we're ANDing with at bitlevel:
0h = 00000000b
1h = 00000001b
2h = 00000010b
3h = 00000011b
4h = 00000100b
5h = 00000101b
6h = 00000110b
7h = 00000111b.
When ANDing with a value, bits in the result byte will only be set when they're also set in the value you're ANDing with. Say you're ANDing with 5h, and you'd want to get 3h as a result. 3h equals 10b so that means that in the value we're ANDing with, bit 1 should be set. Because the value we're ANDing with is 5h, and that equals 101b, bit 1 can NEVER be set. So you'll NEVER get 3h as a result. Again, this is quite important, because if you choose to neglect this problem, you'll end up with some features of your engine simply being disabled!
I hope you already 'feel' that values with all (necessary) bits set are suitable for ANDing with, as the result of the operation varies from 0 to the value you're ANDing with, which is what we want. So 1b (mostly unnecessary), 11b, 111b, 1111b and so on, are values suitable for ANDing with. If you have too few options in your table and can't think of another option to fill the table with (so you can AND with one of the mentioned values), you could simply put duplicate options in your table. This isn't as stupid as it may seem, as some options are more flexible than others: e.g. when deciding whether the engine should use normal INC or a string operation (LODS, STOS, etc.) for increasing, the option for "use a string operation" is attractive for duplicating in your table, as it has more sub-options (later on, the engine has to decide *which* string operation it should use) than the "use an INC" option.
Please note that this is just one way of working around this problem. You should be creative enough to think of another one, although I doubt it being more effective considering size.
7. SIDENOTE
One more important note. When working on your own engine, you're probably planning on adding some nice features. And when you're including new instructions, you need to know the unencoded values of these instructions. If you're not instantly running SoftIce in the background it's probably very tempting to use DOS DEBUG to check out the values. Never do that.... There is something very strange about the 80x86 of which I don't know the explanation: some instructions can take two different identities. E.g. the instruction "OR AX,AX" can take "09C0" as 'identity', but "0BC0" gives us the exact same instruction.
All 'official' assemblers like Borland TASM, Microsoft's MASM and all others, all pick the same of the two choices they can make, namely the 'highest' (so "0BC0" for "OR AX,AX"). In other words, if a certain instruction takes an identity which normally wouldn't be produced by regular assemblers, it is very likely this is the work of some polymorphic engine. Some (or most?) antivirus programs know this and alarm if they find such an instruction. I recommend using SoftIce anyway (although it's assembler also has some bugs), but if you for some strange reason don't or can't, use Borland's TurbodeBugger to check the values, I guess that should do the trick too.
A sidenote to this sidenote would be something PM pointed out to me: the still ever popular A86 shareware assembler has a nasty side-effect to it. At times, it seems to pick the 'wrong' one of the two possible instruction codes, resulting in the same alarms being produced. This can easily be avoided by using the Real Thing (TASM). There's no reason for not using TASM anyway.
8. GARBAGE
Garbage instructions are instructions included in the decryptor, which serve no real purpose for the execution of the decryptor, but are only included to hide the so obvious real purpose of the decryptor. Without any garbage, the decryptor actually *looks* like a decryptor, which makes it very easy to spot it heuristically. Furthermore, garbage instructions will let the 'real' instructions be at random places (in a constant order of course). In one generation the increase and compare operation are right next to each other, while in another generation some other (do-nothing) instruction is included between them.
The classical concept of garbage instructions consists of including one-byte do-nothing instructions like NOP, CLC, STC, INT 3 (the only one-byte interrupt (CCh)) between the actual decryptor instructions. Nowadays, the use of these one-byte instructions has become rather trivial, as almost all AV programs just ignore these instructions when they encounter a decryption loop.
These one-byte do-nothing instructions have also lead to something AV calls variable scan strings. In these scan strings, it doesn't matter if you have zero or thousands of garbage instructions between two constant values, the AV program just ignores those, and just checks if the two constant values occur in the right order (so, first the increase, then the compare instruction, here assuming these are stable bytes in the decryptor). In other words, these one-byte garbage instructions really serve no purpose anymore.
Furthermore, AV has noticed that huge amounts of these kinds of garbage instructions in a relatively small area are very likely to be the results of a polymorphic engine. In other words, using these one-byte do-nothing instructions can result in heuristic flags being triggered.
Luckily, there are other ways of producing 'garbage code'. The basic idea behind all of them is producing code which COULD be code required for the program to run correctly. A nice example is including some stupid Int21 call, like GetVersion or something. There are a lot of possibilities here. Each time take two things in account:
- Whatever garbage you use, make sure it doesn't influence the execution of the decryptor, but also make sure it LOOKS like useful code
- When using large garbage constructions (like the setting up of an Int21 call), make sure you have enough variations, otherwise AV could make a scan string (or some more if that would be enough) just from your garbage code!
Things to be considered are the possibility of including some nice armor tricks in the garbage construction (I'm calling it construction because sometimes the garbage doesn't consist of just one instruction anymore), but again, make sure to have enough variations, because armor code is of course uncommon code, which AV loves, because they can make scan strings of it!
For those wondering how to include garbage instructions/constructions at random: just place a call to your insert_garbage_at_current_offset before each major decryptor operation. An extremely simplified example:
start:
call maybe_insert_garbage
call set_up_registers
call maybe_insert_garbage
call perform_move_instruction
call maybe_insert_garbage
call do_first_crypt_operation
call maybe_insert_garbage
(...)
9. MULTIPLE DECRYPTORS
A virus writer which for some reason doesn't want to include a complete engine, but still wants *some* variation in his decryptors may consider the use of a couple of constant decryptors. They all do the same thing, but they look different. Here, the variation isn't on instruction level, but on routine level, which is far more simpler to implement. This technique can be called polymorphic in it's true meaning (poly=many, morph=shape), but as just a few routines are generated, it's often called oligomorphic (oligo=few). On it's own this method isn't considered strong, as a set of scan strings can be used to identify every possible generation.
The combining of these two methods however gives a very large variety of both decryptor layouts and decryptor instructions, which can be considered strong polymorphism. Most of the more powerful engines combine the two methods, and are therefore very hard to detect reliably.
Throughout this tutorial I have tried to explain how to avoid constant values using variation on instruction level. If you want to extend your engine with variation on *routine* level you must use well-considered code. The engine should decide which decryptor layout (routine) it will generate, and then it should generate that particular routine. As some things have to be done in more decryptor layouts (e.g. increasing the register) it's often wise to make routines (for e.g. increasing the register) which are independant on from which decryptor-layout setup they are called from. So the increase_pointer routine (which on his turn chooses from a variation of instructions) must be able to be called from different layout setups. (hope this was clear.. :))
10. CONCLUDING
I hope I have given you a basic idea of writing polymorphic engines. Once you have written one you have a pretty good idea about polymorphism and you should be able to enhance it with more Fancy Features.
Finally, I'd like to thank Case/IRG for explaining some stuff to me when I was starting out with polymorphism, and not to forget PM, who was willing to test the article.
If you like this tutorial (or have any questions for that matter), feel free to contact me at trg-@usa.net, on IRC #virus, or contact another SLAM member. Feedback will encourage me to write more tutorials. By the way, if you hate it, I also wanna know :)
Trigger [SLAM] '97