Assembly Language for Veggies (And C programmers) Part 1

eZine lover (@eZine)

Published in

Assembly Language for Veggies

· 4 months ago

So you wanna be an Assembly Language programmer? OK, no problem! This DOC is designed to introduce you to the basics of ASM and the concepts behind it. I will be providing examples and some demo routines along the way, along with cross references and examples from other languages to clarify certain points.

OK, so here goes...

When you program in assembly language, you have complete and utter control of the computer, and everything it does. YOU get to choose EXACTLY its behavior under your program. You can directly access any hardware and do anything - the only limit is your skill.

WHAT YOU GET

Basically, assembly programs talk directly to the 8088, 8086, 80188, 80186, 80286, 80386, or 80486 IC inside your machine. This is a chip designed by Intel and is called the CPU (Central Processing Unit). We begin by looking into these chips. Your machine, depending upon model, will use one of them. XT's have either 8088's, 8086's or their NEC clones, the V20's and V30's (the NEC chips are 100% compatible), whilst the AT's use the 80286 chip (the 8018x series turns up only rarely, but it is used!). The newer fast machines use 80386 or 80486 chips, hence their names. All the chips are "Upward Compatible" - that means that anything the 8088 could do, all the later chips can do too, only faster. The 186 and 286 added more instructions - the 386 & 486 can do those as well. So you see that the 486 is king of the mountain, but will do the exact same job as an 8088 (only about 30 times faster!) if required.

Because of this upward compatibility, we can write a program that works on an 8088 and expect it to execute correctly on any IBM design, regardless of CPU, UNLESS we use instructions specifically for one of the later chips (which is nearly never).

So, to program these chips, one requires an understanding of them.... Here goes. The chip has the ability to execute machine code instructions. This is the most important job of the chip. It reads an INSTRUCTION from computer memory, figures out what the instruction means, and executes it, then gets the next instruction. That is ALL that a CPU is capable of doing!!!! As long as a computer is operating, it is doing this...from the first second you switch it on, until you switch it off again....
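If you think better in C, here's a rough sketch of that read, figure out, execute loop. The three opcodes (OP_HALT, OP_INC, OP_DEC) and the function run_toy_cpu are made up for illustration and look nothing like real 8088 machine code; only the shape of the loop is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up one-byte opcodes for a toy machine (NOT real 8088 codes). */
enum { OP_HALT = 0, OP_INC = 1, OP_DEC = 2 };

/* Runs the toy program starting at memory[ip].
   Returns the final instruction pointer; *acc plays the part of a register. */
size_t run_toy_cpu(const unsigned char *memory, size_t ip, int *acc) {
    assert(memory != NULL && acc != NULL);
    for (;;) {
        unsigned char instruction = memory[ip];  /* read the instruction     */
        ip = ip + 1;                             /* point at the next one    */
        switch (instruction) {                   /* figure out what it means */
        case OP_INC: *acc += 1; break;           /* ...and execute it        */
        case OP_DEC: *acc -= 1; break;
        default:     return ip;                  /* OP_HALT or unknown: stop */
        }
    }
}
```

Feed it the bytes OP_INC, OP_INC, OP_DEC, OP_HALT with *acc starting at 0 and it finishes with *acc equal to 1. The toy stops when it hits OP_HALT only so that the C function can return; the real chip just keeps going round the loop.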

Even when a machine has "Crashed" it can still be doing something - and usually is - but what it is doing is useless and won't allow the operator a chance to send it instructions to tell it to stop its useless activity. The only way to stop a CPU from doing its job is to HOLD the RESET button on the computer down, or to switch power off.

Thus you see, you must have a logical set of instructions with a correct start point, and a correct end point. The CPU keeps track of what it is doing with a set of REGISTERS. The registers are of utmost importance to the programmer, for without them he would be lost.

Here are the registers of the 8088 series (common to all models):

AX, BX, CX, DX      SP, BP, SI, DI   CS, DS, ES, SS    IP, F.

The letters are the standard names, used by common agreement. All registers are 16 bits wide - that is, they can hold a number from 0 to FFFF hex. They are grouped according to use:

IP - Instruction Pointer, is used internally by the CPU to keep track of what instruction it should execute NEXT, i.e. a marker of where in memory it is up to.
F - Flags, also internal to the CPU, is a set of 1 bit markers that can each be either 0 or 1 to indicate a certain CPU status. The CPU has a set of instructions built in for reading the individual status bits.
CS - Code segment, the memory segment of the executing program. (more on segments to come in a tic..) - this will be set upon startup of your program and is usually NEVER touched.
DS - Data segment, the default segment to get data from - used by some instructions for transferring data about in memory.
ES - Extra segment - same as DS, but totally user definable.
SS - Stack segment - like DS, but only for stack operations. Not normally touched by user.. see section on stack.
SP - Stack pointer - a bit akin to IP, but for stack operations.
BP - Base pointer - general 16 bit register for user usage - also handy for reaching data on the stack, as it works with SS by default.
SI - Source index - used by some instructions for data transfer. For user usage.
DI - Destination index - same as SI, but for the receiving end of a transfer.
AX - Accumulator. 16 bit general register for user usage. A lot of the math is conducted inside this register (multiply and divide always use it).
BX - Base - general register for user usage - also used to hold an address in some memory operations.
CX - Count - general register for user usage - also used in some block movement operations as a loop counter.
DX - Data - general register for user usage - also used for I/O port numbers and 32 bit math operations (paired with AX).

To keep things flexible, AX, BX, CX and DX can each be divided into two 8 bit registers... Note: These are not extra, separate registers, simply a way of accessing the same register 8 bits at a time!! The 8 bit versions are called AH and AL, BH and BL etc... As you might guess, AH is the top (High) 8 bits of AX, whilst AL is the bottom (Low) 8 bits...

Thus a program that stores 67ac into AX could just as easily store 67 into AH and ac into AL - it would result in the same thing - AX would now equal 67ac.
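For the C programmers, the same idea can be sketched with shifts and masks. The function names (join_halves, high_half, low_half, check_register_halves) are my own inventions for this illustration; the CPU does all of this in hardware with no code at all.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 16 bit value from its top byte (like AH) and bottom byte (like AL). */
uint16_t join_halves(uint8_t high, uint8_t low) {
    return (uint16_t)(((uint16_t)high << 8) | low);
}

/* Top 8 bits of a 16 bit value: the AH view of AX. */
uint8_t high_half(uint16_t word) {
    return (uint8_t)(word >> 8);
}

/* Bottom 8 bits of a 16 bit value: the AL view of AX. */
uint8_t low_half(uint16_t word) {
    return (uint8_t)(word & 0xFF);
}

/* The example from the text: 67 in AH and ac in AL gives 67ac in AX. */
void check_register_halves(void) {
    uint16_t ax = join_halves(0x67, 0xAC);
    assert(ax == 0x67AC);
    assert(high_half(ax) == 0x67);
    assert(low_half(ax) == 0xAC);
}
```

Note that there is only ever one 16 bit value here; the two "halves" are just different ways of looking at it, exactly as with AX, AH and AL.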

One important concept to be grasped is that the registers are just like pigeon holes.... they just hold a number. That number can be an address, the ASCII code for a letter, the result of a math instruction or whatever. The CPU only knows it's got a number... thus, there's no such thing as:

Var 

  cx : word; 
  al : char;

or similar... It overcomes a big hassle in many languages... in PASCAL one can't take a number variable and drop it into the middle of a string, one must use the STR( function... not in ASM... one just umm.... uses it! thus there are no "conversion" functions built in, or needed.... makes things a LOT simpler at times!

As you have gathered, the 8088 series are 16 bit CPU's - called this because all the registers are 16 bit, and the data paths inside the chips are 16 bit also! (Funny 'bout that)... BUT they were designed to use up to 1 MB of memory. (Take my word for it) .... The problem is that 1 Meg requires 20 bits to count up all the combinations... how does one count to 20 bits with 16 bit registers? Impossible! - YES!! .... so the designers thought that instead of inventing a 20 bit CPU they'd design SEGMENTATION. This is one thing new programmers come to hate! It's easy if you follow it carefully, but more often than not people stuff it up. This is where the segment registers come into play.

Memory is accessed using a combination of 2 16 bit registers... the segment and the offset... Valid combinations include : CS:IP (for where to get the next instruction from) SS:SP (stack location) DS:SI, ES:DI and more... Note that a SEGMENT register must come first (CS, DS, ES, SS) - you can't do AX:DI - it just isn't allowed. This is a hardware restriction, but in practice it's not a hassle.

Here's the math for working out which address you're at...

The segment registers point to the start of a 64k "Chunk" of RAM, whilst the offset points to the byte within that chunk.

(All addresses in HEX notation)

You can have many combinations that relate to the same physical address...
Thus: 0000:0401 is the same as 0040:0001, and f000:a000 is the same as fa00:0000. Addition is performed inside the CPU to work things out thus:

  Segment register:              0000          0040
  ...moved 1 hex digit left:    00000         00400
  plus offset register:          0401          0001
  -------------------------------------------------
  equals:                       00401         00401
  -------------------------------------------------

Note how the result is 5 hex digits long - that's 20 bits in binary. The segment is moved one hex digit (4 bits) to the left before the offset is added, which is the same as multiplying it by 16. So a segment can start on any 16 byte boundary, and the 16 bit offset then reaches 64k from there.
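Here is the same sum as a small C sketch, for checking your own segment:offset pairs. The names physical_address and check_address_examples are made up for this example. The final mask keeps the answer to 20 bits, which is how the 8088 and 8086 behave when the sum runs past the top of the 1 MB.

```c
#include <assert.h>
#include <stdint.h>

/* Turn a segment:offset pair into a 20 bit physical address. */
uint32_t physical_address(uint16_t segment, uint16_t offset) {
    uint32_t shifted = (uint32_t)segment << 4;   /* one hex digit left = times 16 */
    return (shifted + offset) & 0xFFFFFUL;       /* keep only 20 bits             */
}

/* The pairs quoted in the text. */
void check_address_examples(void) {
    assert(physical_address(0x0000, 0x0401) == 0x00401UL);
    assert(physical_address(0x0040, 0x0001) == 0x00401UL);
    assert(physical_address(0xF000, 0xA000) == 0xFA000UL);
    assert(physical_address(0xFA00, 0x0000) == 0xFA000UL);
}
```

Try a few pairs of your own: any two pairs that give the same answer are pointing at the very same byte of memory.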

By the way, get used to hex, it's the generic way of referring to register contents.. It's always a 4 DIGIT number for a 16 bit register, a 2 DIGIT number for 8 bit, or a 5 digit number for 20 bit. The conversion goes like this:

Binary:     1 0 1 0  |  0 0 0 1  |  1 1 0 0  |  1 0 0 1
Hex:           a     |     1     |     c     |     9

Take the number in groups of 4 bits. A hex digit is base 16 - there are 16 possibilities per digit (Decimal offers 10 [0-9]), so hex has 0-9 and a-f [16 varieties].

You get 16 combinations in 4 bits - from 0 0 0 0 to 1 1 1 1 [0-f], so the number above is: a1c9. Remember that each bit has a "weight" thus:

8 4 2 1   -  weight 

0 1 1 0   -  bits (= 6 hex)

To convert quickly, take a group of 4 bits and mentally add the weights of all "1" bits - in this example 4+2, and the result is 6. The hex for this binary is 6. Note that in hex addition, 9+1=a, not 10!!!

SO:

1 1 1 1    =  8+4 (c) + 2 (e) +1 (f)  =  F hex.
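The "add up the weights" trick can be written out in C as a rough sketch. All three function names (nibble_value, hex_digit, word_to_hex) are invented for this illustration; C's printf with %x would do the job in one go, but this version follows the by-hand method step by step.

```c
#include <assert.h>
#include <stdint.h>

/* Add up the weights 8, 4, 2, 1 of whichever bits are set. */
unsigned nibble_value(int b8, int b4, int b2, int b1) {
    return (b8 ? 8u : 0u) + (b4 ? 4u : 0u) + (b2 ? 2u : 0u) + (b1 ? 1u : 0u);
}

/* One hex digit (0-9, a-f) for a value from 0 to 15. */
char hex_digit(unsigned value) {
    assert(value < 16);
    return "0123456789abcdef"[value];
}

/* Write a 16 bit number as 4 hex digits plus a terminating 0 byte.
   out must have room for 5 chars. */
void word_to_hex(uint16_t word, char out[5]) {
    int i;
    for (i = 0; i < 4; i++) {
        unsigned shift = 12u - 4u * (unsigned)i;        /* leftmost group first */
        out[i] = hex_digit((word >> shift) & 0xFu);     /* one group of 4 bits  */
    }
    out[4] = '\0';
}
```

Calling nibble_value(0, 1, 1, 0) gives 6, and word_to_hex on the 16 bit number from the binary example above fills the buffer with the characters a, 1, c, 9.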

That is all the CPU provides for you to use!!! (And all you need)... Here's how...

THE BASIC IBM PC

We begin our examples by looking at a basic IBM PC equipped with (say) a floppy, a hard drive, some RAM and a video card, running MS-DOS.

OK, when your program is started, it is given access to all available memory from wherever DOS has currently used up to the end of physical memory. This could be as much as 600k or maybe even more under DOS 5.0, or as little as 30 or 60k in a very small multitasking window. Your program has permission to do anything to this block of memory, and its contents at load time are garbage.

The program begins execution at its first instruction (CS:IP will initially point here) and wanders through, following the program to the end.

In the IBM PC the CS, IP, DS, ES and SS:SP are all preset for you to valid, correct settings when your program is loaded. Further, for the simple single-segment (.COM) programs we'll be writing, CS, DS, ES (and usually SS) will all be equal.

Because of the 64k segmentation limitation, everyone seems to do things in 64k chunks, and DOS is no exception. Your program always begins at CS:0100 (the first 256 bytes, 100 hex, are filled with information to be used by the program if needed) and the SS:SP is usually placed at the very end of the segment (ie SS=CS, SP=FFFE).

ABOUT THE STACK

The stack is vital to the operation of any program. It is for holding temporary values and addresses during program execution, and can be used by the user or the CPU at any time. Thus, a valid stack must always be maintained. Whenever an instruction executes the equivalent of a BASIC GOSUB, the address of where to go upon RETURN is saved on the stack. For a call within the same segment this is a 16 bit number (the IP; a call into another segment saves the CS as well), and 16 bits take up 2 bytes, thus the stack starts at FFFE and not FFFF. When the address is stored, the SP is decreased by 2, so it then points to FFFC. Don't ask why it grows downward, it just does.... the lower the SP, the bigger the stack. When the RETURN is executed, the SP has 2 added to it, and again becomes FFFE. More on the stack later.
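To watch the SP move, here is a toy model of the stack in C. The ToyStack type and the stack_init, stack_push and stack_pop functions are hypothetical names for this sketch; on the real chip the PUSH, POP, CALL and RET instructions do this work. The low byte is stored first because that is the byte order these CPUs use.

```c
#include <assert.h>
#include <stdint.h>

/* A toy model of one 64k stack segment plus the SP register. */
typedef struct {
    uint8_t  memory[0x10000];
    uint16_t sp;
} ToyStack;

/* Start with SP at the very end of the segment, as described in the text. */
void stack_init(ToyStack *s) {
    assert(s != 0);
    s->sp = 0xFFFE;
}

/* Store a 16 bit value: SP goes DOWN by 2, then the value is written there. */
void stack_push(ToyStack *s, uint16_t value) {
    s->sp = (uint16_t)(s->sp - 2);
    s->memory[s->sp] = (uint8_t)(value & 0xFF);                /* low byte  */
    s->memory[(uint16_t)(s->sp + 1)] = (uint8_t)(value >> 8);  /* high byte */
}

/* Fetch the value back: read it, then SP goes UP by 2. */
uint16_t stack_pop(ToyStack *s) {
    uint16_t low  = s->memory[s->sp];
    uint16_t high = s->memory[(uint16_t)(s->sp + 1)];
    s->sp = (uint16_t)(s->sp + 2);
    return (uint16_t)(low | (high << 8));
}
```

After stack_init the SP is FFFE; one stack_push takes it to FFFC, and the matching stack_pop hands the same value back and returns the SP to FFFE.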

Now it's time to see what is available to our program when it's run.

IBM thought they'd give us a set of interface routines for using the hardware they'd built in. Nice of them that, saves us from directly manipulating the hardware which is usually a tricky and weird task! These are called the BIOS routines and are built into a ROM chip inside the computer. They are responsible for starting the computer when powered up, and also for loading the operating system, MS-DOS.

DOS also supplies a set of routines for working with DOS - these are called the DOS routines (No shit!) and are available whenever DOS is in memory.

There's stuff like reading and writing to disk, screen, etc., getting memory sizes and all sorts. See a good ASM book for details - there are hundreds of these routines and they take about 200 pages of text to fully cover - I'm not typing that lot out again!!!

In fact, most of your programs will simply be loading up and calling these routines... Here's a simple example (Type this into A86, it'll work !!)

; Demo program1

begin:         jmp start

string         db 'Hi there!!$'

start:         mov dx,offset string
               mov ah,09
               int 021
               int 020

Now that will probably look very confusing to you, but it's a simple program in assembler (Can you guess what it does?) Let's look at it line by line.

; Demo program1 - any text after a ; is ignored. This is a comment: you're leaving a note in the source here, not writing 8088 code. It could be left out without any problem, and it does not affect the size of the final code.
begin: jmp start - Here's our first instruction. begin and start are LABELS used by the assembler to refer to an address... note how there are no hardware addresses written in... I could have said simply JMP CS:010F but it's much easier to use a label. That way if I added more between begin: and start: I would not have to recalculate the address. The assembler works out the address at assembly time and substitutes it instead.
string db '.....$' - string is another label. db means define byte. This is how we reserve memory. Everything between the quotes is stored into the program and appears in memory at load time, referenced by the string label.
start: mov dx,offset string - this loads the DX register with the ADDRESS of the string label. Note the word offset. This means you want the address of the label, not what is at that address.
mov ah,09 - loads the AH register with 09 hex. This is needed by the next instruction.
int 021 - call MS-DOS's built in routines (A86 reads a number with a leading 0 as hex, so this is interrupt 21 hex). They see that AH=09, decide that you want the write string to screen routine, and display the text starting at the location in DX (well, really DS:DX, but as I said, DS was set up for us before the program began) until they see a $ symbol. The routine is written to return to our program when the $ symbol is encountered. The $ is not written to the screen.
int 020 - call another DOS routine. This one ends our program and returns control to the calling program (in most cases COMMAND.COM).
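For C programmers, what the AH=09 routine does with our string can be modelled roughly like this. The function print_dollar_string is a made-up stand-in and not how DOS is actually written; the extra check for a 0 byte is a C safety net only, since the real routine looks for nothing but the $.

```c
#include <assert.h>
#include <stdio.h>

/* Write characters to `out` until a '$' is met; the '$' itself is not written.
   Returns how many characters were written. */
int print_dollar_string(const char *text, FILE *out) {
    int count = 0;
    assert(text != NULL && out != NULL);
    /* The '\0' test is only here to keep the C version inside its string. */
    while (text[count] != '$' && text[count] != '\0') {
        fputc(text[count], out);
        count++;
    }
    return count;
}
```

Calling print_dollar_string("Hi there!!$", stdout) writes the 10 characters before the $ and then comes back to the caller, which is what the demo program sees happen before it carries on to its last instruction.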

To fully understand all about what the hell is going on with all these INT's, I strongly suggest you invest in one of these books:

The Peter Norton Programmer's Guide to the IBM PC - Peter Norton. Try to get edition #2 but if you can only get a first ed. copy, or one's going cheap, grab it - they're pretty good (I still use an ed.1 copy!)
Advanced MS-DOS - Ray Duncan. Only buy 2nd ed. 1st ed. was fairly limited and not really worth the money - it lacks any coverage above DOS 3.0....

Nothing else is worth your money. I'll make the occasional page cross reference (esp. to the Norton book which I feel is the better of the two) from time to time..

ALSO

Scab from your favourite leeching BBS a copy of A86 V3.21 or later, and D86 to go with it... this is the assembler I'll be using in the future... I'll consider demonstrating MASM if you really want me to, but I don't know a hell of a lot about it and don't really want to learn... I only know enuf to know what a basic program might need.

This brings lesson 1 pretty much to a close... get yourself one of these books, delve into it, get A86, type in the demo, absorb as much as you can then write me back with your questions and problems!

I'll be starting lesson 2 soon!.... Cya there.

.\\erlin