Xine - issue #5 - Phile 114

Published in

· 7 months ago

 

                                             Ú-----------------------------¿ 
                                             | Xine - issue #5 - Phile 114 | 
                                             À-----------------------------Ù 

 

 

 

 

 
			C to assembly, language point of view 
			------------------------------------- 

 

 

 
	 Intro 
	------- 

 
	Many of you think c is useless for viruswriter. Many poeple wich start  
vxing handle better c language than assembly one. But the true reason where C 
is locked for virus is the dependency of the compiler, wich block some special  
manipulation. But the use of a C compiler is not totally senseless. If the  
optimization is not perfect, it can build a code very proach and similar to 
the assembly one, if you follow some rules. Of course, you will loose a few  
size, but you will gain by stability, portability, and finally by coding much  
fastly your nice littles routines, that you may customize once in assembly. 

 

 

	 The first step, code parser, analyzer 
	--------------------------------------- 

 

		a) syntax analyzer 
		------------------ 

 
the major improvent in hll is the ability of analysing syntaxic expression, and 
evaluate them, things that the processor can't do. Syntax analyzer is quite  
common to all language, wich result in fact in a series of mathematical  
operation of variable and known value. The best way I did to make a syntax  
analyzer was using a binary tree and recursivity. 

so, lets assume the simple case: 	b+c			  + 
								 / \ 
							 	B   C 

for this case, you need to have:	b+c*d			  + 
								 / \ 
								B   * 
								   / \ 
								  C   D 

the second part of the plus operation has been changed, and it's now an  
expression himself, for such operation, you have to take care of operator 
priority, the * is prior to the +, then you have to evaluate c*d before 
making the +. 

now to see why recusivity and binary tree are usefull, imagine you have 
(b+c*d)+a , a part of the expression three will stay the same, but in the  
upper of it, you have to add a "+a", after you then try (b+c*d)-(a*b+d) 

note, it's not important here if it's B or a value, it's just symbolic. 

 

 
		b) language needings 
		-------------------- 

 
You need also to know what type of data you are manipulating, in fact, their  
"class" are different, and this have many repercussion later in compiling.  
Normally, assembler programmer handle just nice little integers, wich is quite  
suffisant. Boolean are also integers, but the majority of the test are to see  
if it's zero or not. and not using the complex processor flags, wich seems  
better but much problematic for boolean operation as exemple. 

Each language have a same squeletton, a language handle test, jump, function  
calls, and boucle. For C, it's easy, if true (non zero as seen) , execute next  
instruction. The next instruction can be a group of instruction, and then you  
introduce instruction block. 

Instruction block have also to be seen as a directory with a main directory. 
classical instruction are subdirectory of instruction. Then for that you have  
to handle the text file a certain way, line per line. So you are forced to  
reconstruct the main body. 

 

 
		c) memory, variable, the magic key ? 
		------------------------------------ 

 
The only extensible zone for the processor is the memory, so, the other  
extensible number variable make perfect the linking between memory and 
constants.  

Now we have two type of data, the global and local ones, global have to be 
stored with the general program, when local have to be allocated, in C, 
the compiler take care of allocation, deallocate when the block finish, wich 
define the need of a simple and fast memory allocator/deallocator. 

for that, the stack is quite perfect, as often seen in program, the mov ebp,esp  
save the actual stack state. The job is evaluating for each constant a dynamic 
place in the stack. 

C have also defined pregenerated new and delete wich are just typical  
malloc(sizeof), not to be implemented in the dynamic calculation 
	 

 

 
	 The second step, the meta-code 
	-------------------------------- 

 
Translating an expression to the processor is not as easy as you can think,  
because there's today two style of processor, the accumulators based one and  
the stack based one, ie CPU/FPU. Often the accumulator cpu can act with  
some manipulation as FPU, but it make larger the code, memory and data  
manipulation. 

First, for CPU, you work often with a working register, the accumulator one on 
intels as exemple.  

lets take our first exemple b + c*d, so, analysing the tree, we look that way 

load c 
mul d 
save #1		 
load b 
add #1 

 
then, now the accumulator equal the expression, watch that here the expression  
have just one operator, because we have no predefined processor, then we have  
assume we work with a very restricted one. 

there's three type of opperation, load save and operations. The problem is  
assuming the first save/add 

now let look on a stack operation how it will work 

push b 
push d 
push c 
mul 
plus 

The principe of stack processor is that the result is allways push on top of  
the stack, wich equal the accumulator of the 1st type.  

 

 

	Third step, translating the code 
	-------------------------------- 

Translating the code to assembly output is much easy, because meta code are a  
series of macro, these macro can be directly changed into small groups of  
instruction, lets see: 

load c		-> mov	eax,[esp+c] 
mul d		-> mov	ecx,[esp+d], xor edx,edx, mul ecx 
save #1		-> mov	[esp+#1],eax 		// # are dynamically alocated 
load b		-> mov  eax,[esp+b] 
add #1		-> add	eax,[esp+#1] 

and for the second one, lets see 

push b		-> fld	qdword ptr [esp+b] 
push d		-> fld	qdword ptr [esp+d] 
push c		-> fld	qdword ptr [esp+c] 
mul		-> fmul, fxchg, fstp * 
plus		-> fadd, fxchg, fstp * 

we fstp * mean we send ST(0) to trash, can be as exemple sub esp,8 , fstp [esp],  
add esp,8 

 

 
	Conclusion 
	---------- 

 I hope you found that interresting, I hope to see multiplatform viruses soon :)

Xine - issue #5 - Phile 114

Share this article

Let's discover also

Xine - issue #5 - Phile 114

Xine - issue #4 - Phile 000

Xine - issue #5 - Phile 000

Xine - issue #3 - Phile 000

Xine - issue #5 - Phile 301

Xine - issue #3 - Phile 001

Xine - issue #4 - Phile 001

Xine - issue #4 - Phile 003

Xine - issue #5 - Phile 116

Xine - issue #5 - Phile 107

Recent Articles

The First Earth's Circumnavigation by Antonio Pigafetta

Yak Facts Issue #10: It's Flavorific!

Yak Facts Issue #9: Now with Ginseng

Yak Facts Issue #8: As Seen On TV

Yak Facts Issue #7: Caution: Live Animals

Yak Facts Issue #6: Repeat as necessary

The Esoteric Origin of the Universal Weekly Sequence

Yak Facts Issue #5: Repeat as necessary

Yak Facts Issue #4: In Technicolor

SA CROCORIGA MANNOSA

Recent Comments