80X86 Programming

Introduction
Opcode Byte
Addressing Byte
Using DEBUG
Moves
Jumps
Arithmetic and Logic
Shifts
Loops
Stack Instructions
Shortcut Instructions
Address Loading Instructions
Input/Output Instructions
Assembler Programs
Epilogue
References

Introduction

8086 programming is the basis for programming the 80X86 series of processors, and their descendants, the Pentium and its relatives. DEBUG, which simulates an 8086 environment, is available in Windows 98, and is an excellent tool for learning 8086 programming. This article contains an explanation of the 8086 instruction set, including experiments you can do with DEBUG. You need nothing else but your Windows computer, though a good reference manual is a comfort.

In the beginning, most small processors were used in pocket calculators, and had a 4-bit data bus, enough for digits. In 1972, Intel brought out the 8008 processor with an 8-bit data bus, wide enough to handle characters and go on to greater things as a general microcontroller. This processor was very hard to use, with multiple power voltages and complex clock generation. It was replaced in 1974 by the excellent 8080 processor with a 16-bit address bus that could address 65,536 bytes. This began the rise of the microprocessor. The 8080A and the 8085, which appeared in 1977, were improved devices, but competition, mainly in the form of the 6502, which was superior to the 8085 and adopted for the Apple II, showed more was necessary. The 16-bit 8086 of 1978 was the result, with its 8088 sister with an 8-bit data bus, introduced in 1979. The 8088 was adopted for the IBM PC, and gave rise to the processors now used in most PCs.

The programming presented here is programming in its most fundamental and powerful form: machine language. Not assembly language, with its training wheels and need for learning a complicated format and obscure conventions for its source files, and sometimes doubtful output. Not high-level language, where you only glue together programs written by someone else. Here, we work with the hex bytes themselves and construct instructions bit by bit. This is by no means difficult or obscure, and becomes easy and familiar with practice. It is particularly valuable for learning microprocessor programming, and checking assembled programs. Large programs are created by combining modules, not by doing everything from scratch. Processors are now so fast that program optimization is futile, and memory so abundant that conciseness is unnecessary. However, machine language is still the best way to write optimum and concise programs. After 10 or 15 years' experience with the 8086, you will find it easy.

8086 memory addressing is segmented, an excellent idea that shortens instructions and simplifies programming. An address is in two parts, the segment seg, a 16-bit unsigned number, and the offset off, another 16-bit quantity. To get the physical address, which may have up to 32 bits, the segment is left-shifted a certain number of places and added to the offset. In the 8086, the first segmented-address processor, the segment is shifted four places left, equivalent to one hex digit. For example, if the segment is A000 and the offset 3040, then the full 20-bit physical address is A0000 + 3040 = A3040. In programming the 8086, you never have to work with 20-bit quantities; everything is done with 16-bit words. 20 bits can address 1,048,576 bytes, a MB. Later processors shifted the segment further left, and the ultimate limit is a 32-bit address. None of this affects the programming, and is invisible to the programmer, aside from providing him or her with a large address space.
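
The address arithmetic can be sketched in a few lines of Python. This is only an illustration: the name physical_address is made up here, and the final mask models the 8086's 20 address lines, so that a sum past FFFFF wraps around to low memory.

```python
def physical_address(seg: int, off: int) -> int:
    """Return the 8086 physical address for seg:off (both 16-bit values)."""
    seg &= 0xFFFF
    off &= 0xFFFF
    # Shift the segment four bits (one hex digit) left and add the offset.
    # Mask to 20 bits because the 8086 has only 20 address lines.
    return ((seg << 4) + off) & 0xFFFFF


print(f"{physical_address(0xA000, 0x3040):05X}")  # A3040
```

Many different seg:off pairs give the same physical address; for instance A304:0000 also comes out as A3040.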

We shall use word as signifying a 16-bit quantity, two bytes. Words are stored in memory low byte first. For example, $35C4 ($ indicates a hexadecimal number, but will often be omitted) is stored as C4 35 in two successive locations. For processors that increment the program counter IP, this is the logical order, since low-order bytes must be processed before high-order bytes. Addresses are stored offset first, then segment. The example in the preceding paragraph would be stored as 40 30 00 A0. In stating an address, the usual form is seg:off, which is used by DEBUG.
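
The storage order can be checked with Python's standard struct module, where the format "<H" means one unsigned 16-bit word, low byte first. This is only a sketch to make the byte order visible; the variable names are illustrative.

```python
import struct

# "<H" packs one unsigned 16-bit word, low byte first (little-endian).
word = struct.pack("<H", 0x35C4)
print(word.hex(" "))  # c4 35

# A full address is stored offset first, then segment.
far_pointer = struct.pack("<HH", 0x3040, 0xA000)
print(far_pointer.hex(" "))  # 40 30 00 a0
```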

Processor instructions are fetched from memory relative to the code segment, specified by the register CS, while data is fetched relative to the data segment, specified by the register DS. These are the default segments. There is also the extra segment, specified by the register ES, which is an alternative data segment. The use of this segment must usually be directly programmed. Stack operations are relative to the stack segment in SS. The four segment registers CS, DS, ES and SS are set to the same value in DEBUG, and much can be done without changing them. In effect, we have a 64KB address space with this assumption, and programming deals only with 16-bit offsets.

All 8086 registers are 16 bits wide, but four are divided into high and low bytes. There is full support for 8-bit operations in the instruction set. There are four "general purpose" registers AX, BX, CX and DX. The AX register is divided into AL and AH. AL or AX is called the accumulator. BX is divided into BL and BH. It is often used to hold an offset (address). CX is divided into CL and CH. It is often used as a loop counter. DX is divided into DL and DH, and typically holds part of a 32-bit quantity, an I/O port address, or data.

The stack pointer SP and base pointer BP are used to access data relative to the stack segment SS. SI (source index) and DI (destination index) are index registers used in addressing. The program counter IP or PC, and the status word containing the processor flags, control processor operation. In addition to these are the four segment registers CS, DS, SS and ES. DEBUG shows the contents of all registers that will be effective when you begin to run a program, as well as the values when your program exits.

Opcode Byte

An 8086 instruction consists of from one to six bytes, plus perhaps an optional segment override byte. The first byte in every instruction is the opcode byte, which specifies the operation to be performed. If data is to be handled, the second byte is an addressing byte. If a displacement or other data is required, it is contained in the following bytes.

In the opcode of a data-transfer instruction, bit 0 (the one on the end at the right) is usually the "size" bit w. w = 0 makes a byte (8-bit) operation, and w = 1 makes a word (16-bit) operation. Bit 1 is the "direction" bit d. This specifies which reference in the addressing byte is the source, and which the destination, of data. Opcodes that do not manipulate data do not have d or w bits.

For example, the instruction that copies data from memory to register, register to memory, or register to register, has the opcode 1000 10dw, and the mnemonic MOV. Mnemonics are only a convenience in machine language programming. If we write MOV AX,BX we mean that BX is the source of the data, and AX is its destination. That is, BX → AX. An expression like this is called assembly language when it appears in a text source file for submission to an assembler program, and must be written in the proper format known to the program. DEBUG's disassembly (rendering of instruction bytes as assembly language) will show what format it expects when it assembles assembly language that you type in.

Addressing Byte

r/m	mem	w=0	w=1
000	BX+SI	AL	AX
001	BX+DI	CL	CX
010	BP+SI	DL	DX
011	BP+DI	BL	BX
100	SI	AH	SP
101	DI	CH	BP
110	ABS	DH	SI
111	BX	BH	DI

The second byte of most instructions specifies the addressing mode. This byte is of the form mmrrrsss. The bits mm specify the interpretation of the bits sss, and give the number of displacement bytes to follow. The bits rrr specify a processor register, according to the code in the right-hand two columns of the table above. The register indicated depends on the value of the bit w in the opcode, where w = 0 specifies a byte operation, and w = 1 a word operation.

If the direction bit in the opcode d = 0, the source is rrr ("reg") and the destination is sss ("r/m"). If d = 1, then the source is sss and the destination is rrr. If mm = 11, the bits sss also refer to a register, using the same code as for rrr. If mm = 00, the memory reference is as in the second column of the table, and there are no displacement bytes, except in the case sss = 110, when a 16-bit offset is specified by two following bytes. By default, this is relative to the segment register DS. When mm = 01, there is an 8-bit displacement, and when mm = 10, there is a 16-bit displacement, which is added to the registers shown in the table. For sss = 110 with mm = 01 or 10, the address is BP + DISP, not shown in the table.

For example, the instruction to copy a byte from AL to AH is 1000 1000 1100 0100, or 88 C4. Here d = 0, so the source is rrr = 000 = AL and the destination is sss = 100 = AH. An equivalent instruction is 1000 1010 1110 0000, or 8A E0, where d = 1 and the two register fields are exchanged. The assembly language for this instruction is MOV AH,AL. Which of the two instructions is assembled depends on the assembler.
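
To check an encoding of this kind by hand, it helps to split the two bytes into their fields. The following Python sketch does this for the register-to-register case only (mm = 11), applying the rule that d = 0 makes rrr the source and d = 1 makes it the destination. The function name and the output format are illustrative, not DEBUG's own.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_mov_reg_reg(opcode: int, addressing: int) -> str:
    """Decode a two-byte register-to-register MOV (opcode 1000 10dw, mm = 11)."""
    if opcode & 0xFC != 0x88:
        raise ValueError("not the general MOV opcode 100010dw")
    d = (opcode >> 1) & 1
    w = opcode & 1
    mm = addressing >> 6
    rrr = (addressing >> 3) & 7
    sss = addressing & 7
    if mm != 0b11:
        raise ValueError("this sketch handles only register operands (mm = 11)")
    names = REG16 if w else REG8
    # d = 0: rrr is the source and sss the destination; d = 1: the reverse.
    src, dst = (rrr, sss) if d == 0 else (sss, rrr)
    return f"MOV {names[dst]},{names[src]}"


for code in ((0x88, 0xC4), (0x8A, 0xE0), (0x88, 0xE0)):
    print(f"{code[0]:02X} {code[1]:02X}  {decode_mov_reg_reg(*code)}")
```

The first two byte pairs both decode to MOV AH,AL, while 88 E0 decodes to the opposite transfer, MOV AL,AH.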

Using DEBUG

Even Windows 98 has the DEBUG program. To start it, go to the MS-DOS prompt, enter CD\, and then DEBUG. The prompt is "-". Enter r for a register display. Here you will find all the 8086 registers, the status flags, and other information, including a disassembly of the first few bytes in memory. These bytes are, of course, rubbish. Note the address shown, in the form CS:IP, where CS is the code segment and IP is the instruction pointer, usually 0100 at this point. Actually, all the segment registers are set to the same value. Except for experiments, the segment registers are not changed in these exercises, and the processor is working with an effective 64KB address space.

Type in e100 88 c4 and press return. You have stored the bytes 88 C4 in memory beginning at DS:100. If you enter r again, these bytes will be disassembled as MOV AH,AL. Now change AX to 0055 by entering rax. DEBUG responds with 0000:, and you type 0055 and press Enter. Now another r shows that this value has been placed in AX. These values, of course, are what are put into the processor registers when you execute a program from DEBUG. To execute the single instruction that is shown disassembled, enter t. A register display after the instruction has been executed is now shown. You'll see that the 55 has indeed been copied into AH, so that AX is now 5555.

It is easy to see what effects instructions have by the same procedure. Just type in e followed by the starting address, then the bytes, press Enter, and then r to see the disassembly. Try e102 8a e0, the other encoding of MOV AH,AL, and see what it does. Any of the registers can be set as desired before the instruction is executed with a t. Execution will always start at CS:IP, which can also be set by the rip command. This is a powerful and entertaining way to exercise 8086 instructions.

Memory contents can be seen with the d command. For example, d100 14f will display memory from DS:100 to 14f. Memory is changed with the e command, as we have seen.

Moves

The instructions that copy data from one place to another, probably the most frequently used of all, make a good study of opcode formation and addressing modes. These instructions all have the mnemonic MOV, for "move", although the actual action is to copy a byte or a word from one place to another. The data is unchanged at the source, but overwrites whatever was previously at the destination. It is customary to call this "moving", but the true nature of the process should be remembered.

Let's begin with the instructions that move data between memory and the accumulator. By "accumulator" is meant AL or AX, depending on whether the data is a byte or a word. Memory is addressed by its offset relative to the data segment, a 16-bit quantity. To load the accumulator from memory, the instruction is 1010000w ADDL ADDH. w = 0 loads a byte, w = 1 loads a word. Try A0 20 01, which should move the byte at DS:120 into AL. To store the accumulator in memory, the instruction is 1010001w. Try A2 21 01, which should store the byte just moved to AL at DS:121. Note that the opcodes differ only in the value of d. The assembly language for these instructions is MOV AL,[0120] and MOV [0121],AL. Note that memory references are in square brackets, and that the operands are in the order destination,source. These considerations just make your source text comprehensible to the assembler and have no connection with the machine language. The reader can try out these instructions at word width.

Another instruction of restricted scope moves immediate data into a register. By immediate data we mean bytes in the instruction stream that are treated as numerical data, not as instructions or addresses. In DEBUG's assembly language, they are written as plain hexadecimal numbers, without square brackets. The instruction that loads immediate data into a register is 1011wrrr, followed by one or two data bytes. For example, 10110101 moves a byte into CH. Experiment with B5 2D, which is MOV CH,2D. This instruction gives the choice between 16 registers, but the segment registers are not included among them.
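
Building this instruction is a matter of packing w and rrr into the opcode and appending the data, low byte first. Here is a minimal Python sketch; encode_mov_imm is an illustrative name, and the register lists follow the rrr codes in the addressing table.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def encode_mov_imm(reg: str, value: int) -> bytes:
    """Encode MOV reg,immediate using the short form 1011wrrr."""
    reg = reg.upper()
    if reg in REG8:
        w, rrr = 0, REG8.index(reg)
        data = bytes([value & 0xFF])
    elif reg in REG16:
        w, rrr = 1, REG16.index(reg)
        data = (value & 0xFFFF).to_bytes(2, "little")  # low byte first
    else:
        raise ValueError(f"{reg} is not a general register")
    opcode = 0b10110000 | (w << 3) | rrr
    return bytes([opcode]) + data


print(encode_mov_imm("CH", 0x2D).hex(" "))    # b5 2d
print(encode_mov_imm("BX", 0x1234).hex(" "))  # bb 34 12
```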

There are special instructions for loading and storing segment registers. They are 100011d0, followed by an addressing byte. The addressing byte is mm0rrsss, where rr = 00 for ES, 01 for CS, 10 for SS and 11 for DS. If mm = 11, then sss is interpreted as one of the 16-bit registers, as for w = 1 (even though bit 0 is 0 here). d = 0 moves from the segment register rr to register or memory sss, while d = 1 moves from register or memory sss to segment register rr. There is no immediate addressing, so an immediate address must first be loaded into a processor register and then into a segment register. Try the instruction 8e c1, which should load the CX register into the ES register. Set the CX register to something reasonable before tracing this instruction.

We have mentioned an instruction that loads immediate data into a register. A more general instruction loads immediate data into memory or a register. The opcode 1100011w is followed by an addressing byte mm000rrr, and the one or two bytes of immediate data. mm rrr is always the destination of the immediate data. If mm = 11, rrr specifies a register, while in other cases it gives the mode of addressing memory. If additional bytes are necessary for this, they immediately follow the addressing byte, before the immediate data. For example, execute the instruction c6 06 20 01 ff, which is MOV BYTE PTR [0120],FF. Here, w = 0, and the addressing byte is 00 000 110. 110 specifies a direct address, which follows the addressing byte as 20 01. If this were a word instruction, it would be 6 bytes long. Note the BYTE PTR, which is required to show that the instruction deals with a byte. A word instruction would have WORD PTR instead. The size of the immediate data does not impress the assembler.

The most general MOV instruction moves data from register to memory, memory to register, or register to register. It has the form 100010dw mmrrrsss. It is clear that rrr must specify a register, but sss may specify a register or memory. Therefore, direct transfers from memory to memory without going through a register are not available with this instruction. By this time, the reader should be able to construct or analyze such a MOV instruction. For example, MOV AX,[120] assembles as 10001011 00000110 20 01, or 8B 06 20 01. WORD PTR is not required here, because AX specifies the width well enough. The reverse direction is 89 06 20 01, or MOV [120],AX. An address is enclosed in square brackets to show that the data stored at that address is meant.

Now, one of the more complex addressing schemes can be used. For example let's use BX + DI + DISP(8). Suppose BX = 0100, DI = 0020, DISP = 03. The byte at 0123 can be moved into AL with 10001010 01000001 03 = 8A 41 03. We have used 000 = AL and 001 = BX + DI + DISP for mm = 01. This disassembles to MOV AL,[BX+DI+03]. If you can master this instruction, you can deal with any of the 40 addressing modes available on the 8086.
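
The way mm and sss select an offset can be sketched in Python as follows. The function name and the dictionary of register values are illustrative. One detail that the experiment with DISP = 03 does not exercise is that an 8-bit displacement is treated as a signed number and sign-extended to 16 bits before it is added.

```python
# Register combination selected by sss when the operand is in memory.
BASE_INDEX = {
    0b000: ("BX", "SI"), 0b001: ("BX", "DI"),
    0b010: ("BP", "SI"), 0b011: ("BP", "DI"),
    0b100: ("SI",), 0b101: ("DI",),
    0b110: ("BP",),  # only for mm = 01 or 10; mm = 00 means a direct address
    0b111: ("BX",),
}


def effective_offset(mm: int, sss: int, regs: dict, disp: int = 0) -> int:
    """Return the 16-bit offset addressed by the mm and sss fields."""
    if mm == 0b11:
        raise ValueError("mm = 11 selects a register, not memory")
    if mm == 0b00 and sss == 0b110:
        return disp & 0xFFFF  # direct address: disp is the offset itself
    if mm == 0b00:
        disp = 0
    elif mm == 0b01:
        # An 8-bit displacement is sign-extended to 16 bits.
        disp = (disp & 0xFF) - 0x100 if disp & 0x80 else disp & 0xFF
    total = sum(regs[name] for name in BASE_INDEX[sss]) + disp
    return total & 0xFFFF  # offsets wrap around within the segment


regs = {"BX": 0x0100, "DI": 0x0020}
print(f"{effective_offset(0b01, 0b001, regs, 0x03):04X}")  # 0123
```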

Data can be moved from memory to memory using the string move instructions. The source byte or word is pointed to by DS:SI, while the destination byte or word is pointed to by ES:DI. DEBUG sets ES = DS, so you do not have to worry about it. The default segment for SI can be overridden, but DI must use ES. The DF flag determines whether the SI and DI registers are incremented or decremented when a transfer is made. The instruction 10100100 = A4 moves bytes, 10100101 = A5 moves words. The assembly language is MOVSB and MOVSW. If w = 0, DF = 0, both SI and DI are incremented by 1; if w = 1, DF = 0, SI and DI are incremented by 2. To see how this works, set SI = 0120 and DI = 0130. Load some MOVSW instructions with e100 a5 a5 a5 a5 a5 a5 a5 a5. Examine memory with d100 14f. Now execute a few instructions and note the effect on memory. Each time, two bytes will be transferred and SI, DI will be incremented by 2 each time.

Data pointed to by DS:SI can be loaded into AL or AX with LODSB (AC) or LODSW (AD), after which SI is automatically incremented or decremented. In the other direction, STOSB (AA) or STOSW (AB) stores AL or AX at ES:DI with automatic increment or decrement of DI. The byte pointed to by SI can be compared with the byte pointed to by DI using CMPSB (A6) or CMPSW (A7). AL or AX can be compared with the byte or word pointed to by DI with SCASB (AE) or SCASW (AF). The string instructions are generally used in a loop. They may be automatically repeated by preceding them with REP (F3), which counts down the CX register. For CMPS and SCAS, REPZ (also F3) will exit early if the Z flag becomes clear, and REPNZ (F2) exits early if the Z flag becomes set. In the first case, the two operands just compared are different, in the latter they are equal. The zero test is only made for CMPS and SCAS.

To change a flag in DEBUG, use the rf command. The abbreviations for set flags are OV, DN, EI, NG, ZR, AC, PE, CY, while if they are clear the abbreviations are NV, UP, DI, PL, NZ, NA, PO, NC. To decrement registers when using MOVS, enter DN and press Enter. Now change SI to 130 and DI to 140 and repeat the preceding exercise. If you are moving a string up in memory, you want to start at the high end and move downwards (DN) so you do not overwrite the string before moving it. If you are moving a string downwards, you want to start at the low end and move upwards (UP) for the same reason.

If a string instruction is preceded by the REP instruction, 11110011 = F3, the string instruction will be repeated until the CX register is decremented to zero. If you enter f3 a5, it will be disassembled as REPZ MOVSW. REPZ is the same as REP for MOVS. If it is traced, both instructions will be executed together. Set CX = 0008, SI = 0120, DI = 0130, then trace, or use g 102 (this means go from 0100 with breakpoint at 102). The bytes in 120-12f will have been copied into 130-13f.
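
What the processor does for a repeated MOVS can be modelled roughly in Python, which may help in predicting the final register values before you trace. This is a sketch under stated assumptions: the function name is made up, memory is a single 64KB bytearray, and DS = ES as DEBUG sets them.

```python
def rep_movs(mem: bytearray, si: int, di: int, cx: int, w: int = 1, df: int = 0):
    """Simulate REP MOVSB (w = 0) or REP MOVSW (w = 1) inside one 64KB segment.

    Returns the final (si, di, cx). Assumes DS = ES, as DEBUG sets them.
    """
    size = 2 if w else 1
    step = -size if df else size  # DF = 1 (DN) counts down, DF = 0 (UP) counts up
    while cx != 0:
        for i in range(size):
            mem[(di + i) & 0xFFFF] = mem[(si + i) & 0xFFFF]
        si = (si + step) & 0xFFFF
        di = (di + step) & 0xFFFF
        cx -= 1
    return si, di, cx


mem = bytearray(0x10000)
mem[0x120:0x130] = bytes(range(0x10, 0x20))  # sample data to copy
si, di, cx = rep_movs(mem, si=0x120, di=0x130, cx=8)
print(mem[0x130:0x140].hex(" "))
print(f"SI={si:04X} DI={di:04X} CX={cx:04X}")  # SI=0130 DI=0140 CX=0000
```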

Jumps

Instructions are fetched from CS:IP, where CS is the 16-bit code segment register, and IP is the 16-bit offset, or instruction pointer. The CS: cannot be overridden, so instructions must always be fetched relative to CS. The processor increments IP as each instruction is executed, so that it points to the next instruction. CS is not incremented, so the IP would roll over if it passed FFFF. A general, or FAR, jump replaces both CS and IP. Its form is EA IPL IPH CSL CSH, where IPL is the instruction pointer low byte, and so on. This is called an intersegment jump, and is disassembled as JMP CS:IP.

The second byte of a short jump is interpreted as an 8-bit signed number and is added to the IP plus 2, that is, to the address of the instruction following the jump. The target of the jump can be up to 127 ahead, or -128 behind the location following the instruction. This instruction is EB DISP. The second and third bytes of a long jump are interpreted as a signed 16-bit integer, which is added to the IP plus 3 in the same way. This instruction is E9 DISPL DISPH. The disassembler does not distinguish between them. An assembler assembles the short jump if possible, the long jump if necessary.
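
When assembling jumps by hand, the displacement arithmetic is the usual place to slip. The following Python sketch computes where a relative jump lands; the function name is illustrative, and ip is the offset of the jump instruction itself.

```python
def relative_jump_target(ip: int, instruction: bytes) -> int:
    """Return the new IP for a short (EB) or 16-bit relative (E9) jump at ip."""
    opcode = instruction[0]
    if opcode == 0xEB:
        length = 2
        disp = int.from_bytes(instruction[1:2], "little", signed=True)
    elif opcode == 0xE9:
        length = 3
        disp = int.from_bytes(instruction[1:3], "little", signed=True)
    else:
        raise ValueError("not a relative jump")
    # The displacement is counted from the instruction that follows the jump.
    return (ip + length + disp) & 0xFFFF


print(f"{relative_jump_target(0x0100, bytes([0xEB, 0x02])):04X}")  # 0104
print(f"{relative_jump_target(0x0100, bytes([0xEB, 0xFE])):04X}")  # 0100
```

The second example, EB FE, has a displacement of -2 and so jumps to itself.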

An indirect jump has the address of the jump target in memory. If this consists of four bytes, IPL, IPH, CSL, CSH, then the instruction FF mm101rrr where mm,rrr points to IPL will perform the jump. For example, FF 00101101 is disassembled as JMP FAR [DI]. The 101 specifies intersegment indirect. DI points to a memory location DS:DI that contains IPL, followed by the remaining three bytes. If your DEBUG started with 12F6:0100, load 02 01 f6 12 at 0120 (this is the jump vector), put 0120 in DI, and ff 2d at 0100. When you trace this, the IP should wind up at 0102. If you assembled ed instead of 2d (mm = 11 instead of 00), the instruction would disassemble as JMP FAR DI and would go off somewhere in space if executed (Windows protests strongly).

Note that there were two levels of indirection above, first with DI and then with the jump target in memory. With the jump address stored at 0120, you could also assemble ff 2e 20 01. Now the addressing byte is 00101110, specifying direct addressing. This instruction disassembles as JMP FAR [0120], and there is only a single level of indirection. If you trace this instruction, it will work as well. Be sure that the target address is loaded beginning at 0120.

An intrasegment jump changes only the IP. The addressing byte is mm100sss in this case, with the 100 indicating intrasegment indirect. If mm = 11 and sss = 011, then the instruction FF E3 disassembles as JMP BX. BX should contain the new IP. FF 27 disassembles as JMP [BX]. In this case, BX contains the address of the word specifying the new IP. Try both of these jumps, initializing the registers as necessary.

A very important class of jumps consists of the conditional jumps, that check certain flags and jump only under certain conditions. These all use only an 8-bit displacement byte considered as a signed integer, so the range of conditional jumps is limited. For example JZ or JE (74) jumps only if the zero flag Z = 1; that is, if the result of an immediately preceding computation was zero. If Z = 0, execution passes to the next instruction in order. The complement to this is JNE or JNZ (75), which jumps if Z = 0 and does not if Z = 1. In these instructions, JZ or JE, JNE or JNZ, are alternate mnemonics. The numbers in parentheses are the hex opcodes.

The Sign flag S represents the highest-order bit of an arithmetic result. If S = 1, then bit 7 of a byte result (or bit 15 of a word result) is 1, and the result is negative, considered as a signed integer. If S = 0, then the result is positive. JNS (79) jumps if S = 0, while JS (78) jumps if S = 1. The overflow flag Ov is set if a calculation gives a result too large for a signed integer of the operand size, so that the highest-order bit changes when it should not. With signed integers, this means an overflow. JNO (71) jumps if Ov = 0; JO (70) jumps if Ov = 1.

JAE, jump if above or equal, and JNB, jump if not below, which are the same thing, would better be called JNC, jump if no carry. The opcode is 73. The opposite is JB, jump if below, and JNAE, jump if not above or equal, opcode 72, but better JC. JA, jump if above, and JNBE, jump if not below or equal, opcode 77, jump only if C and Z are both zero. JBE, jump if below or equal, and JNA, jump if not above, opcode 76, jump if either C = 1 or Z = 1. JCXZ (E3) jumps only if the CX register is 0. This jump is convenient when CX is a loop counter, and is used for a loop exit. JCXZ does not check the Z flag, but CX directly. LOOP is the same as JCXNZ would be, jumping if CX is not zero, but includes a decrement of CX, which JCXZ does not have.

I am sorry for this, but there are six more pairs of conditional jumps depending on more obscure flag combinations. Simplest is JNP or JPO (7B), which jumps on odd parity, P = 0. JP and JPE (7A) jump on even parity, P = 1. The 8086 checks parity, although this is seldom used. Then there's JLE and JNG (7E), which jumps for Z = 1 or S ≠ Ov. JG and JNLE (7F) jump when Z = 0 and S = Ov. JGE and JNL (7D) jump for S = Ov, and JL and JNGE (7C) for S ≠ Ov. The mnemonics suggest the order relation between signed integers on which they jump. For unsigned integers, we can get along with far fewer conditional jumps.

All of these conditional jumps are very easily exercised by assembling them, setting the flags, and tracing them. Use a displacement byte of, say, 02, so that it can be definitely observed when a jump is taken. There is often more than one mnemonic for a certain jump. Remember to distinguish signed and unsigned integers. The mnemonics JA, JB, JAE and JBE refer to unsigned integers, while JG, JL, JGE and JLE refer to signed integers. In cases of uncertainty, look at the flags that are tested and how they will be set by comparisons you intend.
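
As a compact summary of the conditions, here is a Python table keyed by opcode, with each condition written as a test on the flags. The dictionary layout and the one-letter flag names (O for the overflow flag) are choices made for this sketch only.

```python
# Flags are passed as a dict with keys C, Z, S, O (overflow) and P, each 0 or 1.
CONDITIONS = {
    0x70: ("JO", lambda f: f["O"] == 1),
    0x71: ("JNO", lambda f: f["O"] == 0),
    0x72: ("JB/JNAE/JC", lambda f: f["C"] == 1),
    0x73: ("JAE/JNB/JNC", lambda f: f["C"] == 0),
    0x74: ("JZ/JE", lambda f: f["Z"] == 1),
    0x75: ("JNZ/JNE", lambda f: f["Z"] == 0),
    0x76: ("JBE/JNA", lambda f: f["C"] == 1 or f["Z"] == 1),
    0x77: ("JA/JNBE", lambda f: f["C"] == 0 and f["Z"] == 0),
    0x78: ("JS", lambda f: f["S"] == 1),
    0x79: ("JNS", lambda f: f["S"] == 0),
    0x7A: ("JP/JPE", lambda f: f["P"] == 1),
    0x7B: ("JNP/JPO", lambda f: f["P"] == 0),
    0x7C: ("JL/JNGE", lambda f: f["S"] != f["O"]),
    0x7D: ("JGE/JNL", lambda f: f["S"] == f["O"]),
    0x7E: ("JLE/JNG", lambda f: f["Z"] == 1 or f["S"] != f["O"]),
    0x7F: ("JG/JNLE", lambda f: f["Z"] == 0 and f["S"] == f["O"]),
}


def jump_taken(opcode: int, flags: dict) -> bool:
    """Report whether the conditional jump with this opcode would be taken."""
    return CONDITIONS[opcode][1](flags)


flags = {"C": 0, "Z": 1, "S": 0, "O": 0, "P": 1}  # as left by comparing equal values
for opcode in (0x74, 0x75, 0x7E, 0x7F):
    name = CONDITIONS[opcode][0]
    print(f"{opcode:02X} {name}: {'jump' if jump_taken(opcode, flags) else 'no jump'}")
```

Note that opcodes 70 through 7F come in pairs whose conditions are exact opposites, differing only in bit 0 of the opcode.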

Limited flag control by the program is a weakness of the 8086 instruction set. The interrupt enable flag can be set with STI (FB) and cleared with CLI (FA). CLI is the one that disables interrupts. The direction flag can be cleared with CLD (FC) and set with STD (FD). This flag is discussed above in connection with string move instructions. The only other flag that can be controlled by the programmer, and the only one that can be used in a program, is the carry flag. Carry can be cleared with CLC (F8), set with STC (F9) and complemented with CMC (F5). Instead of processor flags, software flags can be used as well.

The flags can be set for comparing two numbers by subtracting them. However, this destroys one of the numbers. An instruction that subtracts two numbers and sets the flags, but does not save the result of the subtraction and leaves both numbers untouched is called a compare. By using CMP, it can be determined if one number is greater than, less than, or equal to, another number, using the Z flag together with the C flag for unsigned numbers, or the S and Ov flags for signed numbers. As for ADD and SUB, three CMP instructions exist. The simplest compares the accumulator with immediate data, opcode 0011110w. 3C is CMP AL,DATA, while 3D is CMP AX,DATA. 3C is followed by one byte, 3D by two. A register or memory location can be compared with immediate data with 100000sw, followed by the addressing byte mm111rrr and the data. The s bit controls sign extension of the data. For w = 0, s is not considered. For w = 1, s = 0 means all 16 bits of the data are present, while s = 1 means only 8 bits are present, the high byte obtained by sign extension (i.e. FF becomes FFFF, 7F becomes 007F). The general compare instruction has the opcode 001110dw and a normal addressing byte. There are also string comparison instructions, CMPSB and CMPSW that compare DS:SI with ES:DI, similar to the string move instructions. CMPSB is A6, CMPSW is A7.
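
To predict the flags that a byte compare will leave, the subtraction can be carried out in Python. This is a sketch, not a description of the processor's internals: cmp8 is an illustrative name, and the expression used for the overflow flag is the standard identity for subtraction, supplied here rather than taken from the text above.

```python
def cmp8(a: int, b: int) -> dict:
    """Return the C, Z, S and O flags produced by an 8-bit CMP a,b."""
    a &= 0xFF
    b &= 0xFF
    result = (a - b) & 0xFF
    return {
        "C": int(a < b),              # borrow: a is below b as unsigned numbers
        "Z": int(result == 0),        # the operands are equal
        "S": result >> 7,             # bit 7 of the difference
        # Overflow: the operands have different signs and the sign of the
        # result differs from the sign of the first operand.
        "O": ((a ^ b) & (a ^ result)) >> 7,
    }


print(cmp8(0x05, 0x05))  # {'C': 0, 'Z': 1, 'S': 0, 'O': 0}
print(cmp8(0xFF, 0x01))  # {'C': 0, 'Z': 0, 'S': 1, 'O': 0}
```

In the second example FF is above 01 as an unsigned number (C = 0 and Z = 0), but less than 01 as a signed number (S differs from the overflow flag).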

TEST is similar to CMP, but a logical AND is done, instead of a subtraction. The carry flag is always cleared. The Z flag is usually the one tested afterwards. For TESTing AL, the opcode is A8, for AX, A9. One or two data bytes follow. For TESTing a register or memory, the opcode is F6 (byte) or F7 (word), followed by the addressing byte mm000rrr, and then one or two bytes of data. Finally a register or memory may be TESTed against a register. The opcodes are 84 and 85 for byte or word operations, followed by a normal addressing byte mmrrrsss. Here, rrr is the register against which memory or register sss is TESTed. TEST is often used to determine whether a particular bit is set. Suppose MASK is the byte 01000000. Then TEST AL,MASK (A8 40) will set Z = 0 if bit 6 of the byte in AL is set, Z = 1 if it is clear. Try this in DEBUG to see it in action.

Arithmetic and Logic

If a digital computer can add, it can also subtract, multiply and divide, but routines must be written to accomplish these tasks. However, it simplifies programming if special instructions are available to perform these operations internally. The 8086 has ADD instructions that add immediate data to the accumulator, add immediate data to a register or memory location, and the general case of adding register to register, memory to register and register to memory. There are three parallel ADC instructions that also add in the carry flag. To perform a multiple-precision add, one starts with ADD on the lowest-order words or bytes and then adds the higher-order words or bytes with ADC. The opcode for the general ADD is 000000dw, and for the general ADC 000100dw, each followed by the usual addressing byte.
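
A multiple-precision add can be modelled in Python to show how the carry links the steps. In this sketch the numbers are lists of 16-bit words with the low word first, matching the way the 8086 stores them; the function name is illustrative.

```python
def add_multiword(a_words, b_words):
    """Add two numbers held as lists of 16-bit words, low word first.

    Returns (sum_words, carry). The first step has no carry in (ADD);
    each later step adds the carry from the step before (ADC).
    """
    carry = 0
    out = []
    for a, b in zip(a_words, b_words):
        total = a + b + carry
        out.append(total & 0xFFFF)
        carry = total >> 16
    return out, carry


# 0001FFFF + 00000001 = 00020000
words, carry = add_multiword([0xFFFF, 0x0001], [0x0001, 0x0000])
print([f"{w:04X}" for w in words], carry)  # ['0000', '0002'] 0
```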

Similarly, there are three SUB instructions, and three SBB instructions, that include the borrow. The carry flag is a borrow flag in subtraction. C = 1 means that a borrow has occurred. The opcode for the general SUB is 001010dw, and for the general SBB is 000110dw, each followed by the usual addressing byte.

The 8086 will do an 8 x 8 bit multiply, with a 16-bit result, or a 16 x 16 bit multiply, with a 32-bit result. One factor must be in the AL register for a byte operation, or in AX for a word operation. The other factor may be in another register or in memory. The product is returned in AX for an 8-bit multiply, and in DX:AX for a 16-bit multiply. If AH is not zero after the multiplication for an 8-bit multiplication, or if DX is not zero afterwards, then the OV and CY flags are set. The opcode for an 8-bit multiplication is F6, and for a 16-bit multiplication, F7. The addressing byte is mm100sss. For example, if the second factor is in AH, the addressing byte would be 11100100, so the whole instruction would be F6 E4, disassembled as MUL AH. If the second factor were in BX, the addressing byte would be 11100011. The whole instruction would be F7 E3, disassembled as MUL BX.

To test the 16-bit multiplication, put 2C89 in AX and A030 in BX. Execute the instruction MUL BX. The result will be 1BDDF9B0, with the high word 1BDD in DX, and the low word F9B0 in AX. OV and CY will be set. Also try multiplying 2 by 2 by loading 0202 in AX, and executing MUL AH. Since AH will be zero afterwards, OV and CY will be cleared.

Division uses the same opcodes, F6 and F7, but the addressing byte mm110sss. For division of DX:AX by BX, the addressing byte is 11110011 or F3. Therefore, F7 F3 would be disassembled as DIV BX. In the multiplication suggested above, the BX register will not be changed, so the registers will be properly set for dividing 1BDDF9B0 by A030. Execute the instruction. AX will now hold the quotient 2C89, while DX will be 0000, since the remainder will be zero. The flags after a DIV are not predictable. DIV is used for unsigned division.
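
The multiplication and division experiments can be checked ahead of time with a short Python sketch. The function names are illustrative. The overflow check in div16 reflects the fact that the 8086 raises the divide interrupt when the quotient does not fit in AX.

```python
def mul16(ax: int, src: int):
    """Unsigned 16 x 16 multiply: returns (dx, ax, overflow_and_carry)."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    dx, ax = product >> 16, product & 0xFFFF
    return dx, ax, int(dx != 0)  # OV and CY are set when DX is not zero


def div16(dx: int, ax: int, src: int):
    """Unsigned divide of DX:AX by src: returns (quotient for AX, remainder for DX)."""
    if src == 0:
        raise ZeroDivisionError("the 8086 raises the divide interrupt")
    quotient, remainder = divmod((dx << 16) | ax, src)
    if quotient > 0xFFFF:
        raise OverflowError("quotient does not fit in AX; divide interrupt")
    return quotient, remainder


dx, ax, ov = mul16(0x2C89, 0xA030)
print(f"DX={dx:04X} AX={ax:04X} OV/CY={ov}")  # DX=1BDD AX=F9B0 OV/CY=1
q, r = div16(dx, ax, 0xA030)
print(f"AX={q:04X} DX={r:04X}")               # AX=2C89 DX=0000
```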

A signed multiplication is done by IMUL, addressing byte mm101sss. An addressing byte mm111sss gives IDIV, which does a signed division. If the quotients are larger than 7F for an 8-bit division, or 7FFF for a 16-bit division, a divide by zero interrupt is raised. The registers used are the same as for DIV; AX is divided by the divisor, with quotient in AL and remainder in AH, or DX:AX is divided by the divisor, with quotient in AX and remainder in DX. In most cases, DIV is used, not IDIV. A signed byte is in the range +127 to -128, a signed word from +32767 to -32768.

To see the difference between MUL and IMUL, first load IMUL AH (F6 EC) at 100, and put FFFF in AX. As a signed byte, FF = -1, so (-1) x (-1) = +1. Trace the instruction, and see that AX = 0001 afterwards. Now load MUL AH (F6 E4), and again set AX = FFFF. Now FF = 255, so 255 x 255 = 65,025, or FF x FF = FE01. Tracing this instruction gives AX = FE01, as expected. Only for byte factors less than 128 do MUL and IMUL give the same results.
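
The difference between the two multiplications lies entirely in how the bytes are interpreted before multiplying, as this Python sketch shows. The helper names are illustrative.

```python
def to_signed8(value: int) -> int:
    """Interpret the low 8 bits of value as a signed byte (-128..127)."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def mul8(al: int, src: int) -> int:
    """MUL: unsigned 8 x 8 multiply, 16-bit result for AX."""
    return ((al & 0xFF) * (src & 0xFF)) & 0xFFFF


def imul8(al: int, src: int) -> int:
    """IMUL: signed 8 x 8 multiply, 16-bit two's-complement result for AX."""
    return (to_signed8(al) * to_signed8(src)) & 0xFFFF


print(f"IMUL: AX={imul8(0xFF, 0xFF):04X}")  # IMUL: AX=0001
print(f"MUL:  AX={mul8(0xFF, 0xFF):04X}")   # MUL:  AX=FE01
```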

It is curious that MUL, IMUL, DIV and IDIV all share the same opcodes F6 and F7, but with different addressing bytes. The logic instructions NOT and NEG also share the opcodes. The addressing byte for NOT is mm010sss, and for NEG it is mm011sss. NOT finds the 1's complement of the register or memory contents, while NEG finds the 2's complement, which is just the 1's complement plus 1.

The binary logical operations are AND, OR and XOR. These are bitwise operators operating on the individual bits. As in the case of ADD and SUB, three versions of each are provided, one involving the accumulator and immediate data, one with any register or memory and immediate data, and one for register to register, register to memory and memory to register operations. AND is used to clear bits, OR to set them, and XOR to complement them. Logical operations clear the carry flag, and affect the zero and sign flags.

The simplest AND instruction ANDs AL with an immediate byte, opcode 24, or AX with an immediate word, opcode 25. The similar OR instruction has opcodes 0C and 0D, and the similar XOR instruction opcodes 34 and 35. These instructions are easy to experiment with to see exactly what they do. The instructions 0C 20 24 DF 34 20 set bit 5 in AL, clear the bit, and toggle the bit (back to 1).
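
To see what to expect before tracing in DEBUG, the three instructions can be followed with a short Python model. It understands only the opcodes 0C, 24 and 34, each followed by one immediate byte, and the starting value of AL (41) is chosen arbitrarily.

```python
# Two-byte accumulator instructions: opcode, then the immediate byte.
OPS = {
    0x0C: lambda al, imm: al | imm,   # OR  AL,imm
    0x24: lambda al, imm: al & imm,   # AND AL,imm
    0x34: lambda al, imm: al ^ imm,   # XOR AL,imm
}

def trace(al, code):
    """Return the value of AL after each instruction in code."""
    values = []
    for i in range(0, len(code), 2):
        al = OPS[code[i]](al, code[i + 1])
        values.append(al)
    return values

result = trace(0x41, bytes.fromhex("0C 20 24 DF 34 20"))
print(" ".join(f"{v:02X}" for v in result))  # 61 41 61
```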

The general AND instruction has opcode 001000dw, the general OR instruction 000010dw, and the general XOR 001100dw. Each is followed by an addressing byte that gives register-register, register-memory and memory-register modes. See if you can assemble the instruction AND BX,DX [one possibility is 23 DA].
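
The assembly of AND BX,DX can be worked out mechanically. The Python sketch below builds both possible encodings for a register-to-register word AND; the function name and the register table are supplied here for illustration, using the usual 16-bit register codes.

```python
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}

def and_reg_reg(dest, src, d):
    """Encode AND dest,src for 16-bit registers (w = 1).

    d = 1: the rrr field holds the destination, sss the source.
    d = 0: the rrr field holds the source, sss the destination.
    """
    opcode = 0b00100000 | (d << 1) | 1
    reg, rm = (dest, src) if d else (src, dest)
    addressing = 0b11000000 | (REG16[reg] << 3) | REG16[rm]
    return bytes([opcode, addressing])

print(and_reg_reg("BX", "DX", 1).hex(" ").upper())  # 23 DA
print(and_reg_reg("BX", "DX", 0).hex(" ").upper())  # 21 D3
```

Both byte pairs do the same thing, which is one reason two assemblers can produce different output from the same source.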

If you need to convert a byte in AL to a word, an unsigned byte needs only zeros in AH. For a signed byte, bit 7, the sign bit, must be extended into AH. This can be done with CBW, opcode 98. A 32 bit quantity is stored in DX:AX. A word in AX can be sign-extended by CWD (convert word to doubleword), opcode 99.
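
Sign extension is simple enough to model directly. In this Python sketch, cbw returns the new AX and cwd returns the pair (DX, AX); the function names simply follow the mnemonics.

```python
def cbw(al):
    """Extend bit 7 of AL through AH; returns the new AX."""
    return (0xFF00 | al) if al & 0x80 else al

def cwd(ax):
    """Extend bit 15 of AX through DX; returns (DX, AX)."""
    return (0xFFFF if ax & 0x8000 else 0x0000), ax

print(f"{cbw(0x80):04X}")          # FF80, which is -128 as a word
print(f"{cbw(0x7F):04X}")          # 007F
dx, ax = cwd(0x8000)
print(f"{dx:04X}:{ax:04X}")        # FFFF:8000
```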

Shifts

The bits in a byte or word may be shifted to the right or left. A vacated space may be filled with a 0, a 1, or the carry flag. A bit shifted out may go to the carry flag. A shift one place to the left is equivalent to multiplication by 2, while a shift one place to the right is equivalent to division by 2.

A logical shift right, SHR, brings a 0 into the high order bit, and the low order bit is shifted out. The opcode is 110100cw, followed by the addressing byte mm101rrr. If c = 1, the number of places to shift is given by the CL register. If c = 0, the shift is one place. A logical shift left, SHL has the same opcode, but the addressing byte mm100rrr. A 0 comes in at the low-order bit, and the high-order bit is shifted out. An arithmetic shift right, SAR, copies the high-order bit instead of introducing zeros (preserving the sign of a signed quantity). The opcode is the same, but the addressing byte is mm111rrr.
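
The three shifts differ only in what comes in at the vacated end. A minimal Python model of the one-place byte shifts follows; each function returns the shifted byte and the bit that went to the carry flag. The function names are made up for this sketch.

```python
def shr8(x):
    """Logical shift right: 0 enters at bit 7, bit 0 goes to carry."""
    return x >> 1, x & 1

def shl8(x):
    """Logical shift left: 0 enters at bit 0, bit 7 goes to carry."""
    return (x << 1) & 0xFF, (x >> 7) & 1

def sar8(x):
    """Arithmetic shift right: bit 7 is copied, bit 0 goes to carry."""
    return (x >> 1) | (x & 0x80), x & 1

for name, fn in (("SHR", shr8), ("SHL", shl8), ("SAR", sar8)):
    value, carry = fn(0x81)
    print(f"{name} 81 -> {value:02X} carry={carry}")
# SHR 81 -> 40 carry=1
# SHL 81 -> 02 carry=1
# SAR 81 -> C0 carry=1
```

As a signed byte, 81 is -127, and SAR gives C0, or -64, where SHR would give the positive number 40.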

Rotates shift in at one end the bit that is shifted out at the other. They have the same opcode as the shifts, but the addressing byte for ROL, rotate left, is mm000rrr, and for ROR, rotate right, is mm001rrr. If carry is added as an additional bit, then RCL puts the bit shifted out at the left into carry, and takes the bit that was in carry and shifts it in at the right. RCR does the same thing, but in the other direction. The addressing bytes are mm010rrr for RCL, and mm011rrr for RCR.
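
The rotates can be modelled in the same way as the shifts. In this Python sketch for one-place byte rotates, each function returns the new byte and the new carry. For RCL, the high-order bit goes to carry and the old carry enters at the low-order end; RCR is the mirror image. The names are hypothetical.

```python
def rol8(x):
    """Rotate left: bit 7 re-enters at bit 0 and is copied to carry."""
    out = (x >> 7) & 1
    return ((x << 1) & 0xFF) | out, out

def ror8(x):
    """Rotate right: bit 0 re-enters at bit 7 and is copied to carry."""
    out = x & 1
    return (x >> 1) | (out << 7), out

def rcl8(x, carry):
    """Rotate left through carry: a 9-bit rotate including the flag."""
    return ((x << 1) & 0xFF) | carry, (x >> 7) & 1

def rcr8(x, carry):
    """Rotate right through carry."""
    return (x >> 1) | (carry << 7), x & 1

print(rol8(0x81))     # (3, 1)
print(ror8(0x81))     # (192, 1)
print(rcl8(0x81, 0))  # (2, 1)
print(rcr8(0x81, 0))  # (64, 1)
```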

All the shift and rotate instructions have the same opcode for a shift by one place, which is D0 (11010000) for byte operations and D1 (11010001) for word operations, followed by an addressing byte that specifies the operation and its location in a register or memory. D2 and D3 are used for multiple shifts, counted by the value in CL. RCL and RCR can be used to isolate individual bits, in a place where they can be tested with conditional jumps, which makes them especially useful.
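
Putting the opcode and addressing byte together is a matter of filling in bit fields. The following Python sketch assembles a shift or rotate of a register only (mm = 11); the table and function name are assumptions of this example, not part of any assembler.

```python
# Middle three bits of the addressing byte for each operation.
SHIFT_FIELD = {"ROL": 0, "ROR": 1, "RCL": 2, "RCR": 3,
               "SHL": 4, "SHR": 5, "SAR": 7}

def shift_reg(op, reg_code, word, by_cl):
    """Encode a shift or rotate of a register.

    reg_code: 3-bit register number; word: 1 for 16-bit, 0 for 8-bit;
    by_cl: 1 to shift by the count in CL, 0 to shift one place.
    """
    opcode = 0b11010000 | (by_cl << 1) | word
    addressing = 0b11000000 | (SHIFT_FIELD[op] << 3) | reg_code
    return bytes([opcode, addressing])

print(shift_reg("SHR", 3, 1, 1).hex(" ").upper())  # D3 EB  (SHR BX,CL)
print(shift_reg("ROL", 0, 0, 0).hex(" ").upper())  # D0 C0  (ROL AL,1)
```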

Loops

A loop, or repetitive structure, is very commonly required. In a high level language, this may appear as a for loop in C: for (i=0; i<N; i++) { }.

In machine language, a loop is an elegant structure. At the top, we have MOV CX,N, then the statement block { }. At the end, DEC CX, JNZ disp, where the displacement disp takes execution back to the first {. After the block has been executed N times, execution passes to the next statement. This is so common that a special instruction, LOOP disp does what the DEC and JNZ do. The loop is then simply MOV CX,N { } LOOP disp. The instructions LOOPZ and LOOPNZ also test the zero flag: LOOPZ repeats only while CX is not zero and the zero flag is set, LOOPNZ only while CX is not zero and the zero flag is clear. The LOOP opcode is E2, LOOPZ E1, LOOPNZ E0. CX is generally dedicated as a loop counter.
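
The only delicate part of hand-assembling a loop is the displacement, which is a signed byte counted from the instruction following the LOOP. The Python sketch below assembles MOV CX,N, a body, and LOOP; the body used in the example is the single byte 40 (INC AX), and the function names are invented for the illustration.

```python
def rel8(target, next_ip):
    """Signed 8-bit displacement from next_ip to target, as a byte."""
    disp = target - next_ip
    if not -128 <= disp <= 127:
        raise ValueError("target out of range for a short displacement")
    return disp & 0xFF

def counted_loop(origin, n, body):
    """Assemble MOV CX,n / body / LOOP back to the start of body."""
    code = bytes([0xB9, n & 0xFF, (n >> 8) & 0xFF])  # MOV CX,n
    top = origin + len(code)                         # address of the body
    code += body
    loop_at = origin + len(code)                     # address of the E2 byte
    code += bytes([0xE2, rel8(top, loop_at + 2)])    # LOOP is two bytes long
    return code

print(counted_loop(0x100, 5, b"\x40").hex(" ").upper())
# B9 05 00 40 E2 FD
```

Entering these bytes at 100 in DEBUG and tracing should show AX increased by 5 when CX reaches zero.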

Registers and memory locations can be decremented and incremented with special instructions, so that ADD and SUB are not necessary. Decrementing is subtracting 1, while incrementing is adding 1. The simplest DEC is DEC reg, opcode 01001rrr, where rrr codes the 16-bit register to be decremented. There is a corresponding INC reg, opcode 01000rrr. These are fast instructions, requiring only 2 clock cycles. A more general DEC instruction is encoded 1111111w mm001sss, and can do 8-bit decrements as well as 16-bit ones. The INC has the same opcode, but an addressing byte mm000sss. When operating on registers, 3 clock cycles are required. Memory increments and decrements are much slower. For incrementing or decrementing registers by more than 1, ADD and SUB, which require 4 cycles, are at least as economical as repeated INC and DEC.

Stack Instructions

The stack is temporary storage originally used to preserve return addresses when a subroutine is called. It is so convenient for temporary storage that it is used as the location of temporary or automatic variables for the use of subroutines. The top of the stack is at SS:SP, which is really the bottom, since the stack builds downwards, towards smaller offsets. It is very important that everything one pushes on the stack is later removed, so that the stack does not grow out of bounds and trash memory. Data may be overwritten on the stack, but it is not erased.

PUSH reg, opcode 01010rrr, decrements SP, stores the high byte, decrements SP again, and stores the low byte. SP winds up pointing to the last byte stored on the stack, and a word is stored low byte low, as normal. POP reg, opcode 01011rrr, loads the low byte in the register, increments SP, loads the high byte, and increments SP again. A PUSH followed by a POP restores the stack to its original state. The SP can be moved to BP, and then data can be accessed at SS:BP, since the default segment for BP is SS. This is how a subroutine accesses its automatic variables.
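
The order of the byte operations is easier to follow in a model. This Python sketch keeps a 64K stack segment in a bytearray; the class name and the starting value of SP (FFEE, a value often seen in DEBUG) are assumptions of the example.

```python
class Stack:
    """A 64K stack segment with a stack pointer, modelling PUSH and POP."""

    def __init__(self, sp=0xFFEE):
        self.mem = bytearray(0x10000)
        self.sp = sp

    def push(self, word):
        self.sp = (self.sp - 1) & 0xFFFF
        self.mem[self.sp] = (word >> 8) & 0xFF   # high byte first
        self.sp = (self.sp - 1) & 0xFFFF
        self.mem[self.sp] = word & 0xFF          # low byte at the lower address

    def pop(self):
        low = self.mem[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        high = self.mem[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        return (high << 8) | low

s = Stack()
s.push(0x1234)
print(f"SP={s.sp:04X} bytes={s.mem[s.sp]:02X} {s.mem[s.sp + 1]:02X}")
# SP=FFEC bytes=34 12
print(f"{s.pop():04X} SP={s.sp:04X}")  # 1234 SP=FFEE
```

Note that after the POP the bytes 34 12 are still in memory below SP; they are simply no longer on the stack.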

There are special instructions for PUSHing and POPping the segment registers and the flags, which often have to be saved. Segment registers are pushed with 000ss110, where ss = 00 for ES, 01 for CS, 10 for SS, and 11 for DS. They are popped with 000ss111. In assembly language, the usual abbreviations for the segment registers can be used, such as PUSH DS. As for the flags, use PUSHF (9C) and POPF (9D).
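
The segment register opcodes follow directly from the bit pattern. A short Python sketch, with hypothetical helper names, makes the arithmetic explicit:

```python
SEG = {"ES": 0b00, "CS": 0b01, "SS": 0b10, "DS": 0b11}

def push_seg(name):
    """Opcode 000ss110 for PUSH of a segment register."""
    return 0b00000110 | (SEG[name] << 3)

def pop_seg(name):
    """Opcode 000ss111 for POP of a segment register."""
    return 0b00000111 | (SEG[name] << 3)

for name in ("ES", "CS", "SS", "DS"):
    print(f"PUSH {name} = {push_seg(name):02X}")
# PUSH ES = 06
# PUSH CS = 0E
# PUSH SS = 16
# PUSH DS = 1E
```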

Memory or a register can also be PUSHed. The opcode is FF, and the addressing byte is mm110rrr. To POP to a register or memory, the opcode is 8F and the addressing byte is mm000rrr. All PUSH and POP operations are word operations that change SP by 2.

A subroutine call may be intersegment, in which CS is changed as well as IP, or intrasegment, when only IP is changed. An intrasegment or NEAR CALL is E8, followed by two bytes, interpreted as a signed 16-bit displacement. When it is executed, the IP is pushed on the stack, and the displacement is added to the current IP to get the new IP. SP is decremented by 2. An intrasegment RET is C3. This instruction pops two bytes off the stack and loads them into the IP as the address of the next instruction, and increments SP by 2. It is worth while to observe this in action with DEBUG. Your subroutine may consist only of RET, C3. Find out exactly what is pushed on the stack.
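
Before trying this in DEBUG, it helps to work out the bytes by hand. The Python sketch below assembles a NEAR CALL located at a given offset and also reports the return address that will be pushed. The displacement is counted from the instruction after the CALL, which is 3 bytes long. The function name is made up.

```python
def near_call(at, target):
    """Assemble CALL target for a CALL located at offset `at`.

    Returns the three instruction bytes and the return address that
    the processor pushes on the stack.
    """
    return_ip = (at + 3) & 0xFFFF
    disp = (target - return_ip) & 0xFFFF   # signed displacement, 16 bits
    return bytes([0xE8, disp & 0xFF, disp >> 8]), return_ip

code, ret = near_call(0x100, 0x110)
print(code.hex(" ").upper(), f"return={ret:04X}")  # E8 0D 00 return=0103

code, ret = near_call(0x110, 0x100)
print(code.hex(" ").upper(), f"return={ret:04X}")  # E8 ED FF return=0113
```

With E8 0D 00 at 100 and C3 at 110, tracing the CALL should leave 03 01 at SS:SP.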

An intersegment or FAR CALL is 9A, followed by the offset and segment in that order, low bytes first. That is, 9A 00 01 F6 12 calls a routine beginning at 12F6:0100. The return address (current IP + 5) is pushed on the stack, segment and then offset, and SP is decremented by 4. The corresponding RET is CB, which pops the offset and segment, puts them in CS:IP, and increments SP by 4.
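
The byte order of a FAR CALL is a common source of mistakes, so a small Python sketch may help. It only arranges the five bytes; the function name is hypothetical.

```python
def far_call(segment, offset):
    """Assemble a direct FAR CALL: 9A, offset low/high, segment low/high."""
    return bytes([0x9A,
                  offset & 0xFF, (offset >> 8) & 0xFF,
                  segment & 0xFF, (segment >> 8) & 0xFF])

print(far_call(0x12F6, 0x0100).hex(" ").upper())  # 9A 00 01 F6 12
```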

Indirect CALLs find the target address in a register or memory. An intrasegment indirect call has the opcode FF, followed by the addressing byte mm010rrr, where mm,rrr specify the memory location or the register. There may be displacement or direct address bytes, if necessary. An intersegment indirect CALL has the addressing byte mm011rrr, but here mm should not be 11, and rrr should refer only to a memory location. The offset and segment are stored beginning at this address, in the usual order.

The type of RET must always match the type of CALL.

Shortcut Instructions

It is sometimes necessary to transfer the contents of some register into the accumulator to do arithmetic. If MOV AX,BX is used, the previous contents of AX are lost. If AX has to be preserved, it can be stored temporarily on the stack and then recalled later. We could also MOV CX,AX and then restore it from CX, provided we are not using CX. However, the 8086 offers the instruction XCHG acc,reg that exchanges the values in the accumulator and another register without losing one of them. The opcode is 10010rrr for exchanging register rrr with the accumulator. This is a 16-bit operation. 10010000 or 90 exchanges AX with itself. That is, it does nothing, and this is the NOP instruction. More generally, one can exchange the contents of a register with another register or memory. The opcodes are 86 for a byte operation, 87 for a word operation, followed by mmrrrsss, where rrr is the register to be exchanged with register sss or memory.

A translation table may exist in memory. This consists of up to 256 consecutive bytes, which are matched with the bytes from 0 to 255. For example, if the table begins 23 03 4C 67 28 ... , then the byte 03 translates to 67, the corresponding byte in the table. The instruction XLAT, D7 performs this lookup, provided the byte is placed in AL and the table begins at the offset stored in BX. These registers must be used. XLAT adds the byte to the offset in BX, and uses the result to load the byte at that address into AL. This can easily be tested in DEBUG. Store a table beginning at, say 110, put 0110 in BX and D7 at 100. Load various bytes in AL and see what they translate to.
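
XLAT amounts to an indexed load. The Python sketch below models it with the table from the example stored at offset 0110 of a 64K data segment; the function name is invented.

```python
def xlat(mem, bx, al):
    """Model XLAT: return the byte at offset BX + AL, the new AL."""
    return mem[(bx + al) & 0xFFFF]

mem = bytearray(0x10000)
mem[0x110:0x115] = bytes.fromhex("23 03 4C 67 28")

print(f"{xlat(mem, 0x0110, 0x03):02X}")  # 67
print(f"{xlat(mem, 0x0110, 0x00):02X}")  # 23
```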

Address Loading Instructions

We have seen that we can use an offset loaded in BX to access data. However, the offset of a location in memory is often expressed as the sum of several contributions, such as BX + SI + DISP. This offset is called the effective address, EA. The instruction LEA can calculate the offset as the sum of such contributions, and store it in a register. The opcode is 8D, followed by an addressing byte mmrrrsss. mm,sss gives the addressing mode, such as BX + SI + DISP, while rrr specifies the register in which the EA is to be stored. For example, LEA BX,[BX + SI + DISP] with an 8-bit DISP of 08 uses the addressing byte 01011000 or 58. The instruction is 8D 58 08. Try this in DEBUG, with, say, BX = 0100, SI = 0010. BX should be loaded with 0118. Note that memory is not involved in this, just address arithmetic. The next instruction could be MOV AX,[BX] or 8B 07. In this case the same thing could be done with a single instruction, MOV AX,[BX + SI + 08]. In other cases, however, LEA can be useful.
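
Both the addressing byte and the resulting effective address of this example can be checked with a little arithmetic. The Python sketch below does so; the helper names are made up, and the 8-bit displacement is treated as signed, as the processor does.

```python
def addressing_byte(mm, rrr, sss):
    """Pack the three fields of an addressing byte mmrrrsss."""
    return (mm << 6) | (rrr << 3) | sss

def ea_bx_si_disp8(bx, si, disp8):
    """Effective address for [BX + SI + disp8]; disp8 is sign-extended."""
    disp = disp8 - 0x100 if disp8 & 0x80 else disp8
    return (bx + si + disp) & 0xFFFF

# LEA BX,[BX + SI + 08]: mm = 01 (8-bit disp), rrr = 011 (BX), sss = 000 (BX+SI)
instruction = bytes([0x8D, addressing_byte(0b01, 0b011, 0b000), 0x08])
print(instruction.hex(" ").upper())                 # 8D 58 08
print(f"{ea_bx_si_disp8(0x0100, 0x0010, 0x08):04X}")  # 0118
```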

Sometimes the address of data is in memory as offset and segment. To access the data, we must load DS as well as the offset. The instruction LDS will do this. The opcode is C5, and the addressing byte is mmrrrsss. rrr is the destination register in the usual code, while the location of the address in memory is given by mm,sss. Suppose the address is stored at offset 0200, and we want to put its offset part in BX. Then, the addressing byte is 00011110, and we must follow with the direct address 0200, low byte first. The result is C5 1E 00 02, LDS BX,[0200]. At location 0200 in memory we must have first the offset, and then the segment, of the data.
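
What LDS fetches can be modelled as two little-endian word loads. In this Python sketch the four bytes stored at 0200 are an arbitrary example (the address 12F6:0100), and the function name is hypothetical.

```python
def lds(mem, addr):
    """Model LDS reg,[addr]: returns (offset for reg, segment for DS)."""
    offset = mem[addr] | (mem[addr + 1] << 8)
    segment = mem[addr + 2] | (mem[addr + 3] << 8)
    return offset, segment

mem = bytearray(0x10000)
mem[0x200:0x204] = bytes.fromhex("00 01 F6 12")  # offset 0100, segment 12F6

bx, ds = lds(mem, 0x0200)
print(f"DS:BX = {ds:04X}:{bx:04X}")  # DS:BX = 12F6:0100
```

LES works the same way, with ES receiving the segment.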

An exactly similar instruction loads ES and the offset. This is LES, with opcode C4. In string operations, we often have to load DS:SI and ES:DI for the source and destination of the string, and LDS, LES do this efficiently, loading the necessary segments as well. When using these instructions, remember that the segment registers are changed, and may have to be restored. As we have seen, they may be pushed on the stack and restored after the string movement has been accomplished.

Input/Output Instructions

The 8086 provides an additional 64K address space with distinct control signals. Only two instructions, IN and OUT activate these control signals. The 16-bit I/O address or port is loaded into the DX register. The instruction IN AL,DX (EC) reads a byte from the address in DX, while IN AX,DX (ED) reads two successive bytes from DX and DX + 1. OUT DX,AL (EE) and OUT DX,AX (EF) write a byte or a word, respectively, to DX.

If the I/O address is in the range 00-FF, the shorter forms IN AL,port (E4), IN AX,port (E5), OUT port,AL (E6) and OUT port,AX (E7) can be used. Each opcode is followed by a byte specifying the I/O address or port number. In many systems, this restricted address space is sufficient. The IBM PC only decoded the low 10 address lines for an I/O access, giving 1024 different ports.
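
The choice between the two forms can be made mechanically. This Python sketch emits the bytes for reading one byte from a port, using the short form when the port fits in a byte and otherwise loading DX first with MOV DX,port (opcode BA). The function name and the port numbers in the example are only illustrations.

```python
def in_al(port):
    """Assemble code that reads a byte from `port` into AL."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must be a 16-bit value")
    if port <= 0xFF:
        return bytes([0xE4, port])                      # IN AL,port
    return bytes([0xBA, port & 0xFF, port >> 8, 0xEC])  # MOV DX,port / IN AL,DX

print(in_al(0x60).hex(" ").upper())   # E4 60
print(in_al(0x3F8).hex(" ").upper())  # BA F8 03 EC
```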

Although the I/O space is probably a legacy from the microcontroller ancestry of the 8086, it has proved very useful. The I/O bus can be separated from the main address bus, and run with different characteristics, such as a slower speed.

The I/O ports cannot be used directly from Windows, unfortunately, even under the MS-DOS prompt.

Assembler Programs

The classic assembler for the 8086 is MASM, MACRO Assembler, which was written by Microsoft to support the MS-DOS operating system. It produces object code that can be processed by LINK to make executable .EXE files for MS-DOS. Another excellent assembler is Turbo Assembler by Borland, which overcomes certain misfeatures in MASM. MASM makes two complete passes through the source code. The first determines how many bytes there will be, so that addresses can be determined. The second pass performs the actual assembly. Turbo is a single-pass assembler that does the assembly on the first pass, except for a few addresses that must be fixed up at the end. An assembler may assemble a longer instruction than turns out to be necessary if it must make assumptions about what is to come.

If run in Windows 98, MASM will assemble properly, but will not be able to save the .OBJ file ("FCB Unavailable") if started with MASM TEST;. Using MASM TEST,TEST,TEST,NUL; generated an .OBJ file, however. CREF will hang the computer if it is run, for some reason. If Notepad is used as a text editor, be sure that the file saved has the correct name. If necessary, .TXT can be changed to .ASM when you get to the MASM directory.

The source file is a straight ASCII file with lines ending in CR-LF. A line consists of four fields, separated by at least one space: label, opcode, operand and comment. A comment is introduced by a semicolon ;, which may begin a line. The label is optional. The opcode is a mnemonic for a machine instruction or an assembly command (a pseudo-op). Lower case is converted to upper case. Binary constants are 1's and 0's followed by B. Decimal constants are not specially identified, or are followed by D. They may be expressed in floating-point exponential notation. Hexadecimal constants are followed by H, and must begin with a digit. Valid constants are: 10001100B, 255, 0A030H. The ASCII value is represented by characters in "" and ''. The current value of the instruction pointer in an assembly is represented by $.
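
The rules for integer constants are easily expressed in code. The following Python sketch converts the binary, decimal and hexadecimal forms described above; it is an illustration of the notation only, not of how MASM itself scans a line, and it ignores floating-point and character constants.

```python
def masm_int(text):
    """Convert a MASM-style integer constant to its value."""
    t = text.strip().upper()
    if t.endswith("H"):
        if not t[0].isdigit():
            raise ValueError("a hexadecimal constant must begin with a digit")
        return int(t[:-1], 16)
    if t.endswith("B"):
        return int(t[:-1], 2)
    if t.endswith("D"):
        return int(t[:-1], 10)
    return int(t, 10)

for constant in ("10001100B", "255", "0A030H"):
    print(constant, "=", masm_int(constant))
# 10001100B = 140
# 255 = 255
# 0A030H = 41008
```

The leading 0 in 0A030H is what distinguishes the constant from a symbol named A030H.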

As an example of a pseudo-op, DB assembles specified bytes at that location in the assembly. A line like AVAR DB 00H defines a variable AVAR (a symbol that may be used elsewhere to specify this address) with an initial value 00H. The name must not end in :, because the colon is used to identify NEAR program labels (i.e., those that can be accessed only by intrasegment calls). WHAT DB ? defines an uninitialized variable WHAT. NAME DB 'NAPOLEON' assembles the ASCII codes of the letters in the string. The first character can be accessed by NAME. DB 10 DUP(0FFH) assembles 10 bytes with the value 0FFH. LIST DB 0,1,2,3,4 stores the integers from 0 to 4 beginning at LIST. DW is used in the same way to define 16-bit quantities. DD defines a doubleword, or a four-byte quantity.
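
The bytes that these pseudo-ops place in the assembly can be pictured with a short Python sketch. The functions db, dup and dw are stand-ins named after the directives; they model only the output bytes, not the symbol definitions.

```python
import struct

def db(*items):
    """Bytes assembled by DB: strings give ASCII codes, numbers give bytes."""
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            out += item.encode("ascii")
        else:
            out.append(item & 0xFF)
    return bytes(out)

def dup(count, value):
    """Bytes assembled by DB count DUP(value)."""
    return db(*([value] * count))

def dw(*words):
    """Bytes assembled by DW: each word is stored low byte first."""
    return b"".join(struct.pack("<H", w & 0xFFFF) for w in words)

print(db("NAPOLEON").hex(" ").upper())  # 4E 41 50 4F 4C 45 4F 4E
print(dup(10, 0xFF).hex().upper())      # FFFFFFFFFFFFFFFFFFFF
print(dw(0xA030).hex(" ").upper())      # 30 A0
```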

ORG defines the offset in the CS at which the assembler should begin assembling bytes, as in ORG 120H. EQU assigns a value to a symbol, as in CR EQU 0DH. Then, the more recognizable CR can be used in place of the literal byte 0D. CR cannot be re-equated, but if you had used = instead of EQU to define it, it could be. A subroutine can be put between PROC and ENDP pseudo-ops. In this case, the PROC statement defines the type of CALL to be used. FOO PROC NEAR will cause a near (intrasegment) call; BAR PROC FAR a far (intersegment) call.

Segments should be defined by a segment definition such as DATA SEGMENT PUBLIC, ASSUME DS:DATA at the top, and DATA ENDS at the end. Data relative to DS will assemble in this section. CODE SEGMENT PUBLIC, ASSUME CS:CODE will define a code segment. All code will be assembled relative to CS, whatever might be declared. These segment declarations are necessary for making an .EXE file with LINK. The DS register must be explicitly loaded in a program to access the data. This has more to do with the MS-DOS system than with the 8086, however. If you don't use SEGMENT directives, everything will go into the code segment.

Macro expansion is a feature of MASM. A macro is defined with dummy arguments. When a macro is called with actual arguments, these are substituted for the dummy arguments, and the statements are assembled. For example, a macro definition might be SWAP MACRO A,B, MOV AX,[B], PUSH AX, MOV AX,[A], MOV [B],AX, POP AX, MOV [A],AX, ENDM. A macro call would be SWAP HERE,THERE, which would assemble MOV AX,[THERE], PUSH AX, MOV AX,[HERE], MOV [THERE],AX, POP AX, MOV [HERE],AX. There are some additional facilities and conveniences, but this presents the main idea.
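
Macro expansion is nothing more than textual substitution, as a few lines of Python can show. This sketch replaces whole-word dummy arguments in the body of the SWAP macro with the actual arguments; it illustrates the idea only and is not how MASM is implemented.

```python
import re

def expand(dummies, body, actuals):
    """Substitute actual arguments for dummy arguments in each line."""
    table = dict(zip(dummies, actuals))
    # Match dummies only as whole words, so the A in AX is left alone.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, dummies)) + r")\b")
    return [pattern.sub(lambda m: table[m.group(1)], line) for line in body]

SWAP_BODY = [
    "MOV AX,[B]",
    "PUSH AX",
    "MOV AX,[A]",
    "MOV [B],AX",
    "POP AX",
    "MOV [A],AX",
]

for line in expand(["A", "B"], SWAP_BODY, ["HERE", "THERE"]):
    print(line)
# MOV AX,[THERE]
# PUSH AX
# MOV AX,[HERE]
# MOV [THERE],AX
# POP AX
# MOV [HERE],AX
```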

MASM can be run from the DOS prompt. If not followed by filenames, it prompts the user for them. MASM foo; assembles foo.asm to make foo.obj, with no listing or cross-reference file. MASM foo,,,; also makes the listing file foo.lst and the cross-reference file foo.crf. MASM foo,,,NUL; omits the cross-reference file. Most assemblers, and LINK, work this way from the command line.

Epilogue

Nearly all the instructions except eight supporting decimal or BCD arithmetic, or for 8080 compatibility, have been considered above. There are four instructions for system purposes that are seldom used in programs. Two to stay away from are WAIT, 9B, and HLT, F4. Both of these stop execution, WAIT until the TEST input is brought low or there is an interrupt or reset, HLT until an interrupt or a reset. Under ordinary conditions, either will kill the processor irretrievably. LOCK, F0, pulls the /LOCK output low for the duration of the following instruction. Finally, ESC, any opcode from D8 to DF, causes data to be put on the data bus, but the 8086 does nothing with it. It enables coprocessors to use the data bus to acquire or store data, using the 8086's addressing capabilities.

This article has not dealt with the development of programs to perform certain tasks, only with the individual instructions. In general, programs should be resolved into subroutines that do as little as possible individually, and which are called by an outer program to accomplish its aims.

Machine language programs are simply bytes, and mean nothing to the human observer, but are meant for the eyes of the processor alone. Therefore, it is necessary to use aids such as text descriptions, flow charts, and so forth to make the intent of the program clear. Most programmers these days see only source code, and think of it as the program, when, in fact, it is not. Source code is not always translated correctly into machine language, and may not make available all the facilities of the processor, even with assembly language.

References

R. Rector and G. Alexy, The 8086 Book (Berkeley, CA: Osborne/McGraw-Hill, 1980). One of the best 8086 manuals.

__________, Reference, Disk Operating System Version 3.20, Pub. 68X2405 (Boca Raton, FL: IBM and Microsoft, 1986). Using DEBUG.

__________, iAPX 86,88 User's Manual (Santa Clara, CA: Intel Corp., 1981).

__________, Macro Assembler by Microsoft, Pub. 6025218 (Boca Raton, FL: IBM, 1981). This is version 1.0; there were later versions as well.

__________, Turbo Assembler 1.0 User's Guide (Scotts Valley, CA: Borland International, 1988). This is really quite a good text on assembly programming. The User's Guides become briefer with later releases.

Composed by J. B. Calvert
Created 18 September 2004
Last revised 19 September 2004