DnA 4-12: Assembler Tutorial Part I
x86 Assembly Language Tutorial [1/?] by Horsepowr
Written for DnA
This article is the first in a series of documents that aim to teach x86 assembly language to the novice or to the experienced high-level language programmer. As you probably know, assembly language, or more appropriately, mnemonic code, is a symbolic form of the processor's native machine language. Each mnemonic stands for a machine instruction (an opcode plus its operands), and machine code is, in fact, the ONLY thing the processor can execute. No matter what language is used or what operating system is being run, all code is ultimately processed by the cpu as machine code, which, as I've stated, is very nearly a direct binary translation of assembly language. An assembler is NOT a compiler. A compiler translates a high-level language into object code (we will get into this term in a later article), deciding for itself which instructions to generate, while an assembler simply checks that each instruction is valid for the given processor and converts it to object code, with no "translation" of the kind a compiler does. This results in executables that not only have FULL control over the entire system (potentially, as the operating system will often restrict certain processes), but also contain no code the programmer did not put there. The end result can be code that is as fast and as small as the programmer is able to make it (assuming he/she optimized the routines during the coding process). This is valuable in any instance where size is important (low memory/disk space situations), speed is needed (video, I/O, DMA), or direct control of system-level hardware is required (an operating system or device driver).
Now that you understand what assembly language is, you may ask: why would you want to learn such a terse and cryptic language? The answer lies in the advantages of assembly language. If a C program's video routines leave something to be desired, assembly language routines are often the key to a successful project. Anyone interested in virus creation will also like the control and the minimal size offered by machine-level code. You may also say: Yeah, but isn't assembly language really hard to learn, and doesn't it take forever to code in? To answer the first question: No, not if you have knowledge of the cpu you'll be coding for, or better yet, knowledge of programming on the platform of choice. It's simply learning the logic that is hard for many people. Printing to the screen is not just some simple one-line function. You must interface with the hardware either directly, or indirectly through the BIOS and/or operating system. The coder (programmer) must then place all the instructions in just the right order, with all the proper precautions taken for success. This is gained through experience with the language, as it is with any other language. To answer the second question: yes, it does take more time to code in assembler, but the key is that you invest more time initially in the code so that it will take LESS time to execute. It's not recommended that software be written in 100% assembly, as that is often foolish, but the proper combination of a high-level language for the main code with assembly language routines where speed and/or size are crucial is often the difference between a mediocre program and one that has people sending you all sorts of mail about how bitchen your stuff is and how can they get more, etc, etc.
As the issues progress I will get more and more in-depth regarding coding in assembler, and those of you who are extremely application oriented will not feel patronized or deprived in this area, but for the first few articles we'll be looking at the basics. Since you've read this far, I assume you are still interested in learning assembler. The first thing any person wanting to learn assembler needs to know is hexadecimal notation. If you are an experienced C programmer, or an advanced BASIC or Pascal programmer, you may want to skip this part, but if you feel at all unsure about your knowledge of hex, then by all means, read on and refresh yourself.
Hexadecimal notation is merely an easy and efficient way of representing binary numbers. For example, the hexadecimal number FF (yes, I realize they are letters, but they represent numbers, so they are treated as such) is equal to 11111111 in binary. Isn't it much easier to read or type FF than 11111111? And it gets worse as numbers grow, as FF is merely 255 in decimal. Imagine the complexity in binary of the decimal number 12,309,851! Hexadecimal is by far the base of choice for assembly language programming, and it is therefore crucial that you understand hex numbers and are comfortable working with them. First of all, in case you are not familiar with binary (base 2), I will explain that. Binary is a numeric system in which the only digits that may be used in each position are 0 and 1. This is great for logic and electronics, as each digit can be represented by a true or false value (on/off in an electronic circuit). But it is rather limiting in that as quantities grow large, so does the number of digits needed to write them. The way the system works is like this: Picture an egg carton with only the bottom row of 6 egg holes left. Let these holes represent the numbers 1, 2, 4, 8, 16, and 32, from right to left (see diagram).
 32    16     8     4     2     1
 ___   ___   ___   ___   ___   ___
/   \ /   \ /   \ /   \ /   \ /   \
| X | |   | | X | | X | | X | |   |
\   / \   / \   / \   / \   / \   /
 ---   ---   ---   ---   ---   ---
  1     0     1     1     1     0
You'll notice that each egg hole with an "X" in it has the numeral 1 right below it, and those which are empty have zeros. This is how binary works. The rightmost digit always represents the value 1, and moving left, each digit represents twice the value of the digit to its right. To get the total value of a binary number, you add the place values of all the 1 digits, so in the example, the sum would be 32+8+4+2, or 46. So the binary number 101110 is equal to the decimal number 46.
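The egg carton method can be sketched in a few lines of code. The example is in Python rather than assembler, purely as an illustration, and the function name binary_to_decimal is made up for this sketch. It walks the digits from right to left, doubling the place value at each step and adding it to the total whenever the digit is a 1.

```python
def binary_to_decimal(bits):
    """Add up the place values of every 1 digit in a binary string.

    Hypothetical helper for illustration only; not part of any library.
    """
    total = 0
    place_value = 1  # the rightmost digit is always worth 1
    for digit in reversed(bits):
        if digit == "1":
            total += place_value
        elif digit != "0":
            raise ValueError("not a binary digit: " + repr(digit))
        place_value *= 2  # each place is worth twice the one to its right
    return total


print(binary_to_decimal("101110"))  # 32 + 8 + 4 + 2 = 46
```

Python's built-in int("101110", 2) does the same conversion; the loop is spelled out only to mirror the carton.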
Hexadecimal is merely a way of representing binary numbers, in effect condensing them. Each hexadecimal (which will be henceforth referred to as `hex') digit represents 4 binary digits. Hex is base 16, so each place can hold a maximum value of 15, just as each place in decimal (base 10) can hold a maximum value of nine. For example, when you add 2+9 in decimal, the largest value the "ones" place can hold is 9 (0-9 makes 10 digits, hence base 10), so you must "carry" when the sum exceeds this maximum, yielding 11, which takes two digits. When you add in hex, you do not have to carry until the value exceeds 15, but you may ask, how can 15 be represented in a single digit? The answer is by letters. As in decimal, the digits 0-9 stand for the values 0-9, but rather than moving over a place, 9 increases to A, then B, etc, all the way to F (which is equal to decimal 15), at which point the digit has reached its limit and must carry to 10. Since the value doesn't carry into the second place until the first digit exceeds 15, the 1 in 10 hex is worth 16. So although you may read 10 as ten, 10 hex is actually 16 decimal, meaning 1*16+0.
Each place in a hex number is worth 16 raised to the power of its distance from the ones place (for the 1 in 10, the distance is 1, and 16 to the 1st power is 16), times the value of the digit. So for 20 hex, you would say: okay, the 2 is a distance of 1 from the ones place, so we multiply 2 by 16 to the first power, which is 32 decimal, plus anything in the places to the right, which in this case happens to be zero, so your total is 32. A 3 digit example is 2A7 hex. The 2 is a distance of 2 from the ones place, and 16 to the 2nd power is 256, which multiplied by 2 is 512. The A is 1 digit away from the ones place, so it is worth 16 to the first power, or 16, times A, which is 10 decimal, giving 160. The ones place contains 7, which is zero distance from the ones place, and 16 to the zero power is 1, so 7 times 1 equals 7. Then you add all these digit values together (512+160+7) to get 679 decimal.
Now that you understand how hex relates to decimal values (you do understand, right? I haven't lost you yet, have I?), it's much easier to see how hex relates to binary. As I stated before, 1 hex digit (maximum value of 15) represents 4 binary digits (1111 = 8+4+2+1 = 15, so once again, a maximum value of 15). It's just a matter of compressing 4 digits with a maximum value of 15 into one digit with a maximum value of 15. For example, 1001 binary is 8+0+0+1, or 9. Nine in hex is just that, 9. Okay, how about 10110110 binary? No problem, take it 4 bits (binary digits) at a time, one group for each hex digit. The first four are 1011 = 8+0+2+1 = 11 decimal = B hex, the second four are 0110 = 0+4+2+0 = 6 decimal = 6 hex. Put the two hex digits together to get B6, which is equal to 10110110 binary, which is equal to 182 decimal.
If this is really making no sense to you, and you have taken at least a pre-collegiate algebra class and have some knowledge of computer programming, then I suggest getting Peter Norton's book on PC assembly, which is not a very good source for learning assembly language, but is very helpful in learning base conversions. If you have less than that level of math and no programming experience, it is understandable that you are confused. You are advised at this point to seek out a language that shelters you from the hardware, such as BASIC or Pascal. If you did grasp this concept, great! You're well on your way to learning assembly language.
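If you'd like to check your own conversions, the two hex procedures described above (place values for hex to decimal, and 4-bit groups for binary to hex) can be sketched as follows. This is only an illustration in Python, not assembler, and the names HEX_DIGITS, hex_to_decimal, and binary_to_hex are invented for this sketch.

```python
HEX_DIGITS = "0123456789ABCDEF"  # a digit's position in this string is its value


def hex_to_decimal(hex_text):
    """Sum digit * 16**distance, where distance is counted from the ones place."""
    total = 0
    for distance, digit in enumerate(reversed(hex_text.upper())):
        value = HEX_DIGITS.index(digit)  # raises ValueError for a non-hex digit
        total += value * 16 ** distance
    return total


def binary_to_hex(bits):
    """Convert a binary string to hex by taking it 4 bits at a time."""
    if set(bits) - {"0", "1"}:
        raise ValueError("not a binary number: " + repr(bits))
    # Pad with leading zeros so the digits split evenly into groups of four.
    bits = "0" * (-len(bits) % 4) + bits
    hex_text = ""
    for start in range(0, len(bits), 4):
        group = bits[start:start + 4]
        # Add the place values 8, 4, 2, 1 for each 1 digit in the group.
        value = sum(w for w, d in zip((8, 4, 2, 1), group) if d == "1")
        hex_text += HEX_DIGITS[value]
    return hex_text


print(hex_to_decimal("2A7"))        # 512 + 160 + 7 = 679
print(binary_to_hex("10110110"))    # 1011 -> B, 0110 -> 6, so B6
print(hex_to_decimal("B6"))         # 11*16 + 6 = 182
```

Working the examples by hand first and then comparing against the printed results is a good way to build confidence before the next article.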
This concludes this article, as your eyes are almost as tired as my fingers, but look for the next article, where I will discuss the way the x86 processors handle data and instructions using memory and the registers. If you have any questions, feel free to send any feedback, requests, hate mail, or whatever through DnA, or direct to me at my system: The Finish Line (714) 572-8696 v.32bis.
Hasta -HP ---tURB@---