+-----------------------------+
| Xine - issue #5 - Phile 101 |
+-----------------------------+
Elements of x86 assembly (first step)
+------------------------+
This text was originally written to help my friend Billy on his (crazy)
path to understanding the Intel processor. Here we will try to analyze in
depth the binary encoding of the Intel instructions themselves.
So, how are things done ?
+-------------------------+
The Intel instruction set is a collection of instructions, each represented
by a binary value of 8, 16 or 24 bits (maybe 32, I couldn't find any, but
what can't we expect from Intel craziness?). There is no logical link between
an instruction and its binary code, so you have to do as everybody does and
look it up in a table. But we can group the instructions into classes,
depending on their arguments. Part of the binary code has a fixed value, and
the remaining bits can be set to change how the instruction is processed.
These bits are the instruction flags:
e.g. ADC: 0001001w oorrrmmm
There is a table you can get at the following address:
http://www.imada.ou.dk/~jews/PInfo/intel.html (*)
(*) See the errata on this table at the end of this text
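To make the idea of "fixed bits plus flag bits" concrete, here is a minimal
Python sketch that matches 16 bits against a pattern written like the ADC
one above. The function name match_pattern and the dictionary it returns are
made up for this illustration, they do not come from the table.

```python
def match_pattern(pattern, value):
    """Match a value against a pattern such as '0001001woorrrmmm'.

    '0' and '1' are fixed bits; any letter is a flag bit that gets collected.
    Returns a dict {letter: flag value} on a match, or None otherwise.
    """
    bits = format(value, "0%db" % len(pattern))
    flags = {}
    for p, b in zip(pattern, bits):
        if p in "01":
            if p != b:
                return None              # a fixed bit differs: other instruction
        else:
            flags[p] = flags.get(p, 0) * 2 + int(b)
    return flags

# 13 03 is "adc eax,[ebx]": 00010011 00000011
print(match_pattern("0001001woorrrmmm", 0x1303))
```

A disassembler built this way would simply try every pattern of the table
until one matches, then read the flags out of the result.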
The instruction itself
+----------------------+
The instruction itself can carry 6 different "flags"; often just 2 or 3 of
them are used at the same time. These flags are put directly into the binary
code of the instruction.
Each instruction stores them in its own way. The big majority do it the same
way (for example xor, sub and add share the same layout), but there are some
little bastards.
The w flag
+----------+
The w flag matters for the instruction operand: it defines whether the
register argument is a byte or a word (0 for 8 bit, 1 for 16/32 bit).
** A word is not always a 16 bit value; it depends on the code you are
working on. For example, in a 16 bit segment the word value is a 16 bit
value, but in a 32 bit segment it is a 32 bit value.
(** Also see the prefixes and their use)
The o flag
+----------+
The o flag sets the primary form of the argument. It is a two bit flag;
the possibilities are:
 * 00 = dword ptr [memory_register]
        (note that if memory_register = 101 (the ebp code), it becomes
        dword ptr [hard_coded_address]; with 16 bit addressing the
        special code is 110) (*)
 * 01 = dword ptr [memory_register+8_bit_value]
 * 10 = dword ptr [memory_register+word_value]
 * 11 = the register itself
** Is word_value a 16 bit value? Not always, it depends on the code you are
working on. For example, in a 16 bit segment the word value is a 16 bit
value, but in a 32 bit segment it is a 32 bit value.
(** Also see the prefixes and their use)
So, to get a mov eax,[ebx] for example, you have to set the oo flag
to 00.
The register flag
+-----------------+
The register flag represents the register operand. It is 3 bits wide
and can be:
 eax = 000   ecx = 001   edx = 010   ebx = 011
 esp = 100   ebp = 101   esi = 110   edi = 111
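As a small sketch of how the o, r and m fields sit in one byte, the Python
below splits such a byte with shifts and masks. The names REG32 and
split_modrm are invented for this example; the list uses the standard Intel
register numbering (ebp = 101, esi = 110, edi = 111), which is also what the
lea example later in this text relies on.

```python
# Standard 32 bit register numbering, indexed by the 3 bit code.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(byte):
    """Split a byte laid out as oo rrr mmm into its three fields."""
    o = (byte >> 6) & 0b11       # top two bits
    r = (byte >> 3) & 0b111      # middle three bits
    m = byte & 0b111             # low three bits
    return o, r, m

o, r, m = split_modrm(0x5D)      # 5D = 01 011 101
print(o, REG32[r], REG32[m])
```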
The memory flag
+---------------+
The memory flag is much the same as the register flag, except:
if the memory flag equals esp (100) and the o flag is not 11,
it means you use an advanced register (Intel calls the extra byte the SIB
byte). When you are using an advanced register, the instruction is directly
followed by one more byte; here is how to decode that byte:
 AA BBB CCC      dword ptr [ccc + bbb * (2^aa)]
 CCC is the addressing (base) register
 BBB is the evaluation (index) register
 AA is the multiplication exponent (the factor is 1, 2, 4 or 8)
If BBB = 100 (normally with AA = 00), then there is no evaluation register.
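A minimal Python sketch of decoding that extra byte follows. decode_sib and
REG32 are names chosen for this example only, and the sketch ignores the
special cases of the real encoding (for instance a base of 101 together
with o = 00).

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_sib(sib):
    """Turn an AA BBB CCC byte into a readable address expression."""
    factor = 2 ** (sib >> 6)         # AA: the multiplier is 2^AA
    index = (sib >> 3) & 7           # BBB: evaluation (index) register
    base = sib & 7                   # CCC: addressing (base) register
    text = REG32[base]
    if index != 4:                   # BBB = 100 means "no index register"
        text += "+" + REG32[index]
        if factor > 1:
            text += "*%d" % factor
    return "[" + text + "]"

print(decode_sib(0x24))   # 00 100 100
print(decode_sib(0x98))   # 10 011 000
```

The first call shows why 24h is the usual companion of esp based addressing:
the index field says "nothing", so only the base register remains.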
The S flag
+----------+
The S flag is not very common, but appears sometimes in instructions like
push fs or push gs. It is a 3 bit value that can be one of the following:
 ES: 000   CS: 001   SS: 010   DS: 011   FS: 100   GS: 101
An example: push Segment is equal to 00001111 10sss000
(this two byte form only exists for FS and GS; push es, push cs, push ss
and push ds have their own one byte opcodes 06h, 0Eh, 16h and 1Eh).
You replace sss by the given value, easy as bonjour :)
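Filling in the sss field is just an OR into the fixed bits, as this small
Python sketch shows. SEG and push_segment are hypothetical names for the
illustration. As far as I know the 0Fh form is the one processors actually
use for FS and GS only, so the example sticks to those two.

```python
SEG = {"es": 0b000, "cs": 0b001, "ss": 0b010,
       "ds": 0b011, "fs": 0b100, "gs": 0b101}

def push_segment(name):
    """Fill the sss field of the pattern 00001111 10sss000."""
    second = 0b10000000 | (SEG[name] * 8)    # * 8 moves sss into bits 3..5
    return bytes([0x0F, second])

print(push_segment("fs").hex(), push_segment("gs").hex())
```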
The Conditional flags
+---------------------+
The conditional flag is a 4 bit value. Here are the conditions handled by
the processor:
 0000 = O          0100 = E/Z      1000 = S       1100 = L/NGE
 0001 = NO         0101 = NE/NZ    1001 = NS      1101 = GE/NL
 0010 = C/B/NAE    0110 = BE/NA    1010 = P/PE    1110 = LE/NG
 0011 = NC/AE/NB   0111 = A/NBE    1011 = NP/PO   1111 = G/NLE
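One family where this 4 bit value is easy to see is the short conditional
jumps, whose opcodes run from 70h to 7Fh with the condition in the low four
bits. The Python sketch below uses that family as an illustration; COND,
short_jump_opcode and condition_of are names invented here, and only one
mnemonic per condition is listed.

```python
# One mnemonic per condition code, in the order of the table above.
COND = ["o", "no", "b", "ae", "e", "ne", "be", "a",
        "s", "ns", "p", "np", "l", "ge", "le", "g"]

def short_jump_opcode(cond):
    """Opcode byte of the short jump for a condition: 0111cccc."""
    return 0x70 | COND.index(cond)

def condition_of(opcode):
    """Read the condition back out of the low four bits."""
    return COND[opcode & 0x0F]

print(hex(short_jump_opcode("e")), condition_of(0x7F))
```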
The Prefixes themselves
+-----------------------+
Before some instructions, you can specify the segment to use and/or the
way the instruction has to be processed. Here's a small description:
 066h  operand size: you switch from word to dword and from dword to word
 067h  address size: you switch between 16 and 32 bit in memory addressing
       e.g. mov eax,[bx+memory16] in a 32 bit segment
 02Eh  the memory operand uses the CS segment
 03Eh  the memory operand uses the DS segment
 026h  the memory operand uses the ES segment
 036h  the memory operand uses the SS segment
 064h  the memory operand uses the FS segment
 065h  the memory operand uses the GS segment
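Checking for prefixes is a simple loop over the first bytes. Here is a
minimal Python sketch; PREFIXES and skip_prefixes are names made up for
this example, and only the prefixes listed above are known to it (lock and
rep, for instance, are left out).

```python
PREFIXES = {
    0x66: "operand size", 0x67: "address size",
    0x2E: "cs", 0x3E: "ds", 0x26: "es",
    0x36: "ss", 0x64: "fs", 0x65: "gs",
}

def skip_prefixes(code):
    """Return (list of prefix names, offset of the first opcode byte)."""
    found = []
    pos = 0
    while len(code) > pos and code[pos] in PREFIXES:
        found.append(PREFIXES[code[pos]])
        pos += 1
    return found, pos

print(skip_prefixes(bytes([0x66, 0x81, 0x03, 0x65, 0x65])))
```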
The following value
+-------------------+
Now, the value that follows may be:
 1) fixed by the instruction itself: it can be an immediate of 8, 16 or
    32 bits,
 2) not specified: you have to look at the operand to find the size
    and check the prefix
We want an example!
+-------------------+
All this theoretical shit is overloading your brain? Okay, we will work
through some small examples to show you that it's not that difficult.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Example, you take this:
 8B 4C 24 04
1) Prefix analysis: no prefix
2) Instruction opcode detection
   8B 4C = 1000101 1 01 001 100
                   ^ ^  ^   ^
                   w o  r   m
   but MOV Reg, Mem is equal to 1000101000000000b
   What you can notice is that bit 8 is the w flag; there are also
   the o, the r and the m flags.
   So if you zero them out of 8B4Ch, you finally obtain 8A00h, and that's
   the MOV Reg, Mem opcode.
   Now, if you isolate the w flag, you notice it's 1
    -> we deal with a 32 bit register
   now, if you isolate the o flag, you notice it's 01
    -> we deal with a memory operand like [reg+8_bit]
   now, if you isolate the r flag, you notice it's 001
    -> the instruction is MOV ECX, blabla
   now, if you isolate the m flag, you notice it's 100
    -> extended value!
   So you take the next byte, it's 24h,
   which is 00100100b (AA = 00, BBB = 100, CCC = 100).
   It should be [esp+esp*(2^0)], but this doesn't exist for Intel:
   BBB = 100 means no evaluation register, so it is read as [esp].
   Remember, you have to add the 8_bit value; it's equal to 04.
   Now you look whether a value follows: the operands are a register
   and a memory location, no immediate, so you skip this point.
So, the instruction was mov ecx, [esp+4]
----------------
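The walk-through above can be sketched in Python as follows. This is only
a toy for the single opcode 8Bh in a 32 bit segment without prefixes; the
name decode_mov_reg_mem is invented here, displacements are printed in
decimal, and the special cases around ebp as base are not handled.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_reg_mem(code):
    """Decode 8B oo rrr mmm [extra byte] [displacement] into text."""
    assert code[0] == 0x8B
    o, r, m = code[1] >> 6, (code[1] >> 3) & 7, code[1] & 7
    pos = 2
    if o == 3:                               # oo = 11: plain register
        return "mov %s,%s" % (REG32[r], REG32[m])
    if m == 4:                               # mmm = 100: extended byte follows
        sib = code[pos]
        pos += 1
        addr = REG32[sib & 7]
        if (sib >> 3) & 7 != 4:              # BBB = 100 means no index
            addr += "+%s*%d" % (REG32[(sib >> 3) & 7], 2 ** (sib >> 6))
    else:
        addr = REG32[m]
    size = {0: 0, 1: 1, 2: 4}[o]             # displacement size in bytes
    disp = int.from_bytes(code[pos:pos + size], "little", signed=True)
    if disp:
        addr += "%+d" % disp
    return "mov %s,[%s]" % (REG32[r], addr)

print(decode_mov_reg_mem(bytes([0x8B, 0x4C, 0x24, 0x04])))
```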
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Another example:
 8D 5D D4
8D is not a prefix, so
8D is equal to 10001101b and lea is 10001101oorrrmmm
Clean the o, r and m flags out of 8D5Dh and then compare: you will get 8D00h.
So, take a look at the oo rrr mmm:
 5D = 01 011 101
 o = 01, 8 bit displacement
 r = 011, it's equal to EBX
 m = 101, it's equal to EBP
The 8 bit displacement is equal to D4h. It is a signed value, so D4h
means -2Ch.
So, the instruction was: lea ebx,[ebp-2Ch]
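One detail worth a sketch: an 8 bit displacement is signed, so a byte of
80h or more stands for a negative offset, and D4h really means -2Ch. The
helper names signed8 and show_disp below are made up for the illustration.

```python
def signed8(byte):
    """Interpret a byte as a signed 8 bit value (two's complement)."""
    return byte - 0x100 if byte & 0x80 else byte

def show_disp(byte):
    """Format a displacement byte the way a disassembler would print it."""
    d = signed8(byte)
    return "%s%Xh" % ("+" if d >= 0 else "-", abs(d))

print("lea ebx,[ebp%s]" % show_disp(0xD4))
```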
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
As a last example, we will take this:
 66 81 03 65 65
1) Analyzing the prefix: the instruction is switched from 32 to 16 bit
   type; what previously was a dword is now a word!
2) 81 03 = 1000000 1 00 000 011; if you look up the ADD instruction in
                   ^ ^      ^
                   w o      m
   the table, and then clean the w, o and m flags out of 8103h, you will
   finally find this: 1000 0000 0000 0000, the ADD Mem, Imm
3) the w flag is equal to 1, so we deal with a word
4) the o flag is equal to 00, so we are dealing with a memory
   destination [register]
5) the m flag is equal to 011, so the memory destination is [ebx]
6) checking the size of the value that follows the opcode:
   1) it's a word (because the w flag is 1)
   2) the word is switched from a 32 bit into a 16 bit value by the prefix
   so the operand size we discussed is 16 bit, and the value equals 6565h
7) that's all folks
Finally we find that the instruction was: add word ptr [ebx],6565h
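Step 6 can be sketched in Python like this. imm_size and read_imm are
hypothetical helper names, the default of a 32 bit segment is an assumption
of the sketch, and it only covers immediates whose size follows the w flag,
as in this example.

```python
def imm_size(w, has_66, segment_is_32=True):
    """Size in bytes of an immediate that follows the w flag."""
    if w == 0:
        return 1                         # byte operand, prefix changes nothing
    wide = segment_is_32 != has_66       # 66h flips word <-> dword
    return 4 if wide else 2

def read_imm(code, pos, size):
    """Immediates are stored low byte first."""
    return int.from_bytes(code[pos:pos + size], "little")

code = bytes([0x66, 0x81, 0x03, 0x65, 0x65])
size = imm_size(w=1, has_66=True)
print(size, hex(read_imm(code, 3, size)))
```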
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
So, for the rest, I'll let you experiment by yourself. Remember:
scientia vincere tenebras!
Now you wanna know the instruction size?
+----------------------------------------+
Here's how you have to proceed:
1) Check whether a prefix exists
2) Recognize the instruction, get its size
3) Check the operand, if there is a destination
3 bis) Extended register? Yes: increment the instruction size
3 ter) Followed by a value? Yes: calculate its size
       (pay attention to the prefix)
4) With the operand checked, the prefix analyzed and the instruction
   recognized, you can go on to add the size of the value that follows.
So, look at how an instruction is stored:
[Prefix?] [Instruction] [Extended memory?] [Add to memory pointer?] [value?]
All the ? have to be checked: they may exist, or may not. It's your job
to do that, now, go ahead!
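The procedure above can be sketched as a small Python function. It is a
toy under several assumptions: a 32 bit segment, only the three opcodes
used in the examples of this text (the HAS_IMM mini-table is hypothetical),
and no support for the 16 bit addressing selected by 67h.

```python
PREFIXES = {0x66, 0x2E, 0x3E, 0x26, 0x36, 0x64, 0x65}
# Hypothetical mini-table: opcode -> does an immediate of operand size follow?
HAS_IMM = {0x8B: False, 0x8D: False, 0x81: True}

def instruction_size(code):
    """[Prefix?] [Instruction] [Extended memory?] [displacement?] [value?]"""
    pos = 0
    opsize = 4
    while code[pos] in PREFIXES:             # 1) prefixes
        if code[pos] == 0x66:
            opsize = 2
        pos += 1
    opcode = code[pos]                       # 2) opcode byte + oo rrr mmm byte
    o, m = code[pos + 1] >> 6, code[pos + 1] & 7
    pos += 2
    if o != 3 and m == 4:                    # 3 bis) extended byte
        base = code[pos] & 7
        pos += 1
        if o == 0 and base == 5:
            pos += 4                         # no base register, 32 bit address
    elif o == 0 and m == 5:
        pos += 4                             # [hard_coded_address]
    if o == 1:                               # 3 ter) displacement
        pos += 1
    elif o == 2:
        pos += 4
    if HAS_IMM[opcode]:                      # 4) value that follows
        pos += opsize
    return pos

for text in ("8B4C2404", "8D5DD4", "6681036565"):
    print(text, instruction_size(bytes.fromhex(text)))
```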
Errata over the table
+---------------------+
This table is really useful for a programmer, so I processed it into an
include file that an assembler can almost read (good for vx!). So if you
wanna use it, no problem: look at 8086.inc to see how the data is stored.
The table had a few bugs. A big one, for example, is that it treats
reg,reg operands the same as mem,reg. That is in fact true when
assembling, but totally false when disassembling. Anyway, enjoy!
Check the table and you can see there are 2 different ways to encode a
push edx: push register, and push memory (switched to register).