It is easy to make programs for DOS with your bare hands
An original PC or XT is an excellent tool for learning about computers and microprocessors. I have a PC, with 640K RAM, clock and mouse, which I use for making EPROM's and as a terminal for embedded systems. It works excellently for these purposes, and the sharp monochrome monitor is pleasant. The XT has a new, larger hard disk, all the memory it can hold, clock and mouse, and color video. It is used for developing plug-in cards for the ISA bus and other experiments. Both of these just remained when more modern systems were acquired, and have continued to do a good job in the laboratory. Actually, you can use any computer that will run MS-DOS about the same way, but perhaps with a little unnecessary complexity. If you are interested in computer control, acquiring one of these old systems if you can find a good one may be a good move.
In this page, I will explain how to use the DEBUG utility program, and how to control the parallel (printer) port of the computer. Both these things come with the computer, and one used to get complete information on how to use them. It is this information that is so valuable, and so lacking when you buy a current model. DOS is much easier to use than Windows, faster and simpler when running one program for one purpose. Any version of DOS after 2.0 will be suitable for a small system. I use 3.2 and 5.0, principally. DEBUG is so much fun to use that it will give you a feeling of overconfidence in your computer skills.
If you know 8086 assembly language programming, so much the better. However, I will show everything you need explicitly, and this will give you a feel for the subject that you can develop elsewhere. Later processors will still do everything that an 8086 did, and even the newest computers will have most things in the same places. Any time spent on understanding these things here will not be wasted when you proceed to later Intel processors.
Nearly all computer systems will have a parallel or printer port, usually at the LPT1 port address of 0378 (port addresses and the LPT notation are explained below). For control applications or experiments, you may want to install further parallel ports, and this is easy to do. Boards are available for either the ISA bus or the PCI bus that support two parallel ports. Installing the boards is dead easy. They merely need to be plugged in a slot, and they are ready for use. Make sure that the cable to the second port is plugged in the right way round; the red stripe should go towards pin 1 of the header. Select the port addresses for the two ports, perhaps 278 for port A and 268 for port B. If you will not use the interrupt feature, remove the jumpers so that the interrupt lines are disconnected. To use the port, reads and writes are made directly to its three I/O addresses.
All new parallel ports can be configured for input. In the original printer adapter, the connector data pins were driven by a latch with 3-state outputs (an LS374), and read by a buffer with 3-state outputs (an LS244) onto the I/O channel bus. For some reason, the output enable of the latch was hard-wired to 0, permanently enabling the outputs. This meant that reading the port only showed the state of the latches. No data sources could be connected to the pins, because of contention with the output latch. LSI adapters later replaced the SSI chips of the original adapters, but retained all the functionality. They added a latch for bit 5 of the control output word at address 037A, and the output of this latch was connected to the output enable of the output latch. A 1 written to this bit puts the data outputs into high impedance, so that now the LS244 (or equivalent) reads the levels put on the data pins, without contention with the output latch. This new arrangement is usually called a bidirectional (or PS/2-type) port, and is often loosely lumped together with EPP (enhanced parallel port), which is properly a later and more elaborate mode. The bidirectional port itself is really very simple. Because it suggests good problems, I will generally assume in what follows that the parallel port is the original type, and cannot input data bytes.
It will be much more interesting if you have an experimental board connected to the parallel port of the computer that you use. A photograph of such a board will appear here when the film has been developed and scanned. The board should buffer the outputs from pins 2-9 of the DB25-S connector, and display them on LED's. I used an LS540 buffer and 8 units of a 10-unit bar graph display for this. Also, it should be possible to apply high or low logic levels to pins 10-13 and 15, using DIP switches and pullup resistors (5.1k). Pins 1, 14, 16 and 17 should also be buffered so their levels can be read. I used an LS04 for this. Pin 25 provides the ground connection (all the pins 18-25 are ground, so there can be one ground wire for each data wire if needed). This board has its own power supply, since power should not be drawn from the parallel port, whose outputs can drive only one LSTTL input. LED's should not be connected directly to the connector pins.
Parallel ports use DB-25S (socket) connectors on the back of your computer. The DB-25P or DB-9P (plug) you may also see is the serial port. The connector is shown at the right with pin assignments. Bit 0 of the data is pin 2, and bit 7 is pin 9. The data is written to the base address of the port (LPT1 is usually 0378), and also read there if you have a bidirectional (EPP) port configured for input. The bits of status input, read from port (base + 1), are at pins 11, 10, 12, 13 and 15 from bit 7 to bit 3. The control outputs are at pins 17, 16, 14 and 1, for bits 3 to 0. These assignments are discussed further below. Certain of the logic levels on the status and control bits are inverted. The parallel port can be used as a general interface for control purposes. With a few solid-state relays, it is easy to control lights, fans and such.
DEBUG is a utility program furnished with DOS that allows you to investigate the computer, see the actual bytes in any disk file, read and write data to and from diskettes either by filename or by sector, load and execute programs, including single-stepping instruction by instruction, assemble and disassemble machine instructions, examine and alter the state of memory, and several other useful things of the same nature. As you see, it is a very powerful utility and can teach a lot about DOS, computers and microprocessors. It allows you to see what happens instruction by instruction. Each version of DOS may have its own version of DEBUG, made to be compatible with it, and different versions should not be mixed.
Addresses are, of course, a very important part of using DEBUG, and it is necessary to understand the segmented memory structure of the 8086. 16 bits is insufficient for the size of the memory map of the 8086, so 20 bits are used instead. The 20 bits are divided between two 16-bit quantities that can be held in the internal registers, which are only 16 bits wide. One part is called the segment, from 0000 to FFFF, and the other is called the offset, from 0000 to FFFF. The complete 20-bit physical address is found by shifting the segment 4 bits, one hex digit, left and adding the offset to it. For example, if the segment is 3C00 and the offset is 120A (written 3C00:120A), then the physical address is 3C000 + 120A = 3D20A. Generally, we will work with a fixed segment address, and have a 64KB segment available, addressed by the offset. The programmer is never concerned with physical addresses, just segments and offsets. Incidentally, all numbers in DEBUG are hexadecimal, and are written with no other adornment. If a number is expected, a leading 0-9 is not even necessary.
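The segment and offset arithmetic is easy to check away from the PC. The following is only a small Python sketch of the rule just described; the function name physical_address is my own invention, and the wrap to 20 bits assumes a real 8086 with 20 address lines.

```python
def physical_address(segment, offset):
    """Return the 20-bit physical address for an 8086 segment:offset pair."""
    # Shift the segment left 4 bits (one hex digit) and add the offset.
    # The mask keeps 20 bits, as on an 8086 with 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF


# The example from the text: 3C00:120A
print(f"{physical_address(0x3C00, 0x120A):05X}")  # prints 3D20A

# Many different segment:offset pairs name the same physical byte.
print(physical_address(0x3D20, 0x000A) == physical_address(0x3C00, 0x120A))  # prints True
```

The second line of output shows why the programmer can pick whatever segment is convenient: the same byte can be reached through many segments.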
To start DEBUG, simply type "debug" at the DOS prompt, so it looks like this: C>debug, and press Enter. You should get the DEBUG prompt, a hyphen: -. This means that DEBUG is ready to receive a command. The commands are all single letters, followed by the parameters. To exit DEBUG, type a "q" (Quit) after the prompt and press Enter. By "executing" a command we mean typing it with its parameters at the DEBUG prompt and then pressing Enter. You may make further input during the execution of the command, as prompted. There is no distinction between upper and lower case letters, and a simple space is a satisfactory separator. Separators are not usually needed, except between parameters.
First of all, type the command "r" (Register) and press Enter. Study every part of this important display. It shows what the status of the processor will be when you execute your own instructions. Right now, of course, the processor is busy running DEBUG, but DEBUG will allow you to take the wheel when you want. AX, BX, CX and DX are the general purpose registers, each 16 bits wide and divided into 8-bit upper and lower halves, called AH, AL, BH, BL, CH, CL, DH and DL. DEBUG does not know these names, only the names of the 16-bit versions, but the assembly language does use them. AX is the "accumulator" when that makes any difference. Except for special uses, all the general purpose registers are alike.
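If the relation between a 16-bit register and its two halves is not yet familiar, this little Python sketch may help. It is only a model of the arithmetic, not anything DEBUG does; the function names are mine, and 23A7 is just an arbitrary value.

```python
def split_register(value):
    """Split a 16-bit register value into its (high byte, low byte) halves."""
    return (value >> 8) & 0xFF, value & 0xFF


def join_register(high, low):
    """Combine a high byte and a low byte into one 16-bit register value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


# Suppose AX holds 23A7: AH is the upper half and AL the lower half.
ah, al = split_register(0x23A7)
print(f"AH={ah:02X} AL={al:02X}")  # prints AH=23 AL=A7

# Changing only AL leaves AH alone, but the value of AX changes.
print(f"AX={join_register(ah, 0x00):04X}")  # prints AX=2300
```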
DS, ES, SS and CS are the segment registers, holding this part of any memory address. They are 16 bits wide, and are not divided into 8-bit halves. When you start DEBUG, they will all hold the same value, which points to what we will call the program segment, which is 64KB long. It starts above the highest address used by DEBUG, and shows you where free memory exists. Actually, you have absolutely everything above this, all the rest of the space in memory to play with, but we will only need the 64KB program segment, and will not alter the segment registers. Different instructions use different segment registers by default. Generally, CS (code segment) is used with instruction addresses for execution and DS (data segment) for data in memory.
SP is the stack pointer. The location SS:SP is the "top" of the stack, which actually is the bottom, since the stack grows downward towards lower offsets. It shows where the last item pushed on the stack was stored. DEBUG sets it at FFEE, near the top of the program segment, when you start DEBUG. If you load a file with file extension .COM or .EXE, which should be a program, SP is set to FFFE, with two bytes of zeros at FFFE and FFFF. The stack builds downwards, and DEBUG allots 100 hex (256) bytes to the stack. It keeps the information pretty much to itself, however, and as long as we use only the lower parts of the segment, there will be no collision. SI (source index) and DI (destination index) are index registers that use segments DS (data segment) and ES (extra segment) to point to locations in memory. BP (base pointer), finally, is a register to store a popular value of SP to access memory at SS:BP. All these are unitary 16-bit registers.
IP is the instruction pointer, and a very important register. It points to the first byte of the next instruction that the processor intends to execute. This instruction is "disassembled" on the last line of the register display so you can see what it is and stop yourself from doing something foolish. The address, the instruction bytes, and the assembly language are shown. IP points to the program location CS:IP and is automatically incremented as execution proceeds.
Now that we know what all the registers are, we should know how to set them to whatever values we desire, and that is easy. If we wish to set AX to 23A7, we execute the command -r ax, and DEBUG shows us a :. Now we type in 23A7 and press Enter. Doing "r" will show that AX now holds 23A7. The same process works for any register.
Finally, the line of letter pairs NV UP DI PL NZ NA PO NC symbolize 0 values, or clear states, of the processor flags Overflow, Direction, Disable Interrupt, Sign, Zero, Auxiliary Carry, Parity and Carry. The opposite states, set or 1, are OV DN EI NG ZR AC PE CY. To set them to any desired state, simply execute the command -rf, which will show you the current states, then a -, after which you can write down any number of these abbreviations, and the flags will be so set. Remember that the flags will take these values only when execution of your program starts, like all the other information in the register display.
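The letter pairs are just a readable rendering of individual bits in the processor's 16-bit flags word. As an illustration, here is a Python sketch that produces the same kind of line from a flags value. The bit positions are those of the standard 8086 flags register (they are not given in the text above), and the names FLAG_TABLE and flag_display are my own.

```python
# (bit position in the 8086 flags word, name when clear, name when set),
# listed in the order DEBUG displays them.
FLAG_TABLE = [
    (11, "NV", "OV"),  # Overflow
    (10, "UP", "DN"),  # Direction
    (9, "DI", "EI"),   # Interrupt enable
    (7, "PL", "NG"),   # Sign
    (6, "NZ", "ZR"),   # Zero
    (4, "NA", "AC"),   # Auxiliary carry
    (2, "PO", "PE"),   # Parity
    (0, "NC", "CY"),   # Carry
]


def flag_display(flags):
    """Render a flags word as DEBUG-style two-letter abbreviations."""
    return " ".join(
        set_name if flags & (1 << bit) else clear_name
        for bit, clear_name, set_name in FLAG_TABLE
    )


print(flag_display(0x0000))  # prints NV UP DI PL NZ NA PO NC
print(flag_display(0x0041))  # zero and carry set: NV UP DI PL ZR NA PO CY
```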
The "d" (Dump) command is used to see what bytes are in memory. Type in the following command, and press Enter: -d0:400. You will see 128 bytes, those in addresses 0:400 to 0:47F, arranged in a neat table with addresses on the left in segment:offset form, 16 bytes on a line, in two groups of 8 separated by a hyphen. At the right, the bytes are interpreted as ASCII (character) data, which are usually garbage, as here. This is the equipment table that BIOS creates and maintains. The whole top line is devoted to the serial and parallel ports. The first 8 bytes are the base addresses for the serial ports COM1 to COM4. 00 00 means that the port is not installed. The next six bytes are the parallel ports LPT1 to LPT3. Older BIOS's may only list two LPT's, and newer ones four or more. Whether they are listed here or not, the ports will be available if they have been installed. You will probably see the figures 78 03, which mean a port address 0378, for LPT1. In the 8086 world, 16-bit words are stored with the low byte at the low address, high byte at the high address. The information we have just obtained will be used below. If you execute -d again, without any parameters, you will get the next 128 bytes. You can see any desired number of bytes by putting L (or l) and the number of bytes, as -d0:400 L10, which will show you only 16 bytes.
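To make the low-byte-first storage concrete, here is a Python sketch that decodes the first line of such a dump into port addresses. The sample bytes are made up for illustration (a machine with COM1, COM2 and LPT1) and are not copied from a real dump; the function name decode_port_table is also my own.

```python
import struct


def decode_port_table(first_line):
    """Decode the first 14 bytes at 0:400 into a dict of installed ports."""
    # Seven 16-bit words, each stored low byte first ("<" means little-endian).
    words = struct.unpack("<7H", first_line[:14])
    names = ["COM1", "COM2", "COM3", "COM4", "LPT1", "LPT2", "LPT3"]
    # A word of 0000 means that the port is not installed.
    return {name: addr for name, addr in zip(names, words) if addr != 0}


# Hypothetical top line of a "-d0:400 L10" dump.
sample = bytes.fromhex("F8 03 F8 02 00 00 00 00 78 03 00 00 00 00 00 00")

for name, addr in decode_port_table(sample).items():
    print(f"{name} {addr:04X}")
```

With this sample, the bytes 78 03 come out as LPT1 at 0378, as described above.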
The "e" (Enter) command is used to change the bytes in memory. Try this: -e200 "now is the winter of our discontent made glorious summer". Press Enter and then dds:200, Enter, to see what the effect was. If you forget the ds: you will get the segment from the last time you used "d," which was 0 if you were just looking at 0:400. Now you can see the utility of the ASCII display in the "d" command. Now execute just -e200 and press Enter. You will see the byte there, followed by a period. You can type in any byte to change it, or just press the Space bar to go to the next byte. If you press Enter, you will leave the command. A hyphen ("-") goes back one address. Now you know how to get any particular bytes in memory, even those corresponding to text as ASCII.
If your DOS has not found all the parallel ports that you have installed, you can remedy the omission yourself by loading the addresses in the equipment table. Also, by looking at the table you can see what port corresponds to LPT1, which to LPT2, and so on. These names are reserved DOS names for the parallel ports, and can be used wherever a filename would work. PRN is a synonym for LPT1.
Now let's consider putting a program in memory, and executing it. The first 256 bytes of the program segment are reserved for data concerning the program, and are called the program segment prefix, the PSP. That's the reason DEBUG puts the IP at 0100 when it starts. Now we have everything between 0100 and FEEE at our disposal. Incidentally, when you go about changing memory and things, stay in this area. If you change things in memory below the program segment, you will probably kill the operating system and the computer will freeze. So long as you are in the program segment (or above) all is well.
Look at the PSP with the Dump command ("-d0") and note that the first two bytes are CD 20. These happen to be a processor instruction. To find out which one, just execute -u0. The "u" (Unassemble) command is the disassembler: it takes the given bytes and does its best to interpret them as instructions. When you are dealing with a program, the result will be meaningful. For random data, you just get meaningless garbage. In this case, we see that CD 20 disassembles to INT 20. This is the software interrupt that DOS uses to end a program and return to the operating system. Here, it goes back to DEBUG, which reassures us with the statement "Program terminated normally."
This suggests our first program. Instructions are assembled with the "a" (Assemble) command, which is the inverse of the "u" command. To use it, you must know the assembly language that DEBUG expects, which is the usual 8086 assembly language with a few tricks. Type in -a100, and press Enter. The address will be displayed, and DEBUG will wait for your input. Type in JMP 0 (jump to offset zero in the code segment). You should have no error message, and can press Enter to terminate the command. If you do have an error, DEBUG will give you another chance. Investigate what you have done with "-u100" and JMP 0000 should be displayed. These three bytes are your whole program.
Execute an -r to be sure everything is ready. Check the IP and see that it is 0100. Now execute -g (Go), and "Program terminated normally" should be seen immediately. It worked! Every program should end with a jump to 0 so that execution passes smoothly back to DOS and DEBUG. Don't let your programs run off "into the grass" when there are no more instructions to execute. The processor will try to, unless you guide it in the path of righteousness. This is not the best way to end programs in general, but it will work here and is simple. If the IP were set incorrectly, we could use -g=100 instead. The "=" must be used in this case.
If you know how subroutines work, you will understand the trick of using a RET (return from subroutine) at the end of the program instead of JMP 0. DEBUG always puts at least two bytes of zeros on the stack, where the RET instruction looks for the return address. So, it goes to CS:0 as if you had executed JMP 0 instead. This is just an alternative way to use the INT 20 termination request.
Now we need a longer program with multiple instructions to see how single-stepping works. Let's just move numbers around in the registers. Start assembling at 0100, and type in the following instructions (or any others you may want): MOV AX,FFFF MOV BL,AH MOV CX,. MOV is the basic instruction for copying values. The first parameter is the destination, the second is the source. A number by itself is just a number. A number in square brackets is an address of a byte in memory. The source and destination must be the same width (8 or 16 bits). Add the JMP 0, and your second program is ready.
Instead of using "g", execute -t (Trace). (Use t=100 if the IP is not properly set.) Only the current instruction will be executed, and a new register display with the changes will be seen. Analyze the changes carefully, and check that they agree with your ideas of what the instructions should do. When you get to the INT 20 at CS:0000, where the final JMP 0 lands, use "g" instead of "t" to terminate the program. Otherwise, you will just trace into the DOS code, and that is not what you want.
The 8086 has a second address space, that of the I/O ports, which goes from 0 to FFFF. Only part of this space is used in the PC, up to port 3FF. Only two instructions are available, IN and OUT. For ports 00-FF, whose address can fit in a single byte, there are short instructions that include the port address. For larger port addresses, the address must first be loaded into the DX register, and the instructions IN AL,DX or OUT DX,AL used instead. There are both byte and word operations, but the byte operations are most used. The data must pass through the AL register, another restriction.
DEBUG has "i" (In) and "o" (Out) commands of its own that allow you to communicate with the I/O ports directly, without writing a program. Try -o378 ff and -o378 00 and watch the LED's connected to the parallel port. This assumes, of course, that the base address of the port is 378. If it is something else, use that. Try different numbers, until it is clear what is happening. For input, execute -i379, and DEBUG will fetch the levels of pins 10-13 and 15. Note that the level on pin 11 is inverted. This is just a peculiarity of the printer interface, and has no deep significance. The five input bits are bits 3-7 at port 379. Unfortunately, there is not a complete input byte on the original printer interface.
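The byte that -i379 returns takes a little untangling before it tells you what is on the pins. The following Python sketch does the bookkeeping, using the bit-to-pin assignments given earlier (bits 7 to 3 are pins 11, 10, 12, 13 and 15, with pin 11 inverted). It assumes that the low three bits carry no pin information, and the names STATUS_PINS and status_to_pins are my own.

```python
# (bit in the status byte, connector pin, True if the level is inverted)
STATUS_PINS = [
    (7, 11, True),
    (6, 10, False),
    (5, 12, False),
    (4, 13, False),
    (3, 15, False),
]


def status_to_pins(status):
    """Translate a byte read from (base + 1) into logic levels by pin number."""
    levels = {}
    for bit, pin, inverted in STATUS_PINS:
        bit_value = (status >> bit) & 1
        # Pin 11 reads back inverted; the other pins read back directly.
        levels[pin] = bit_value ^ 1 if inverted else bit_value
    return levels


# With all five pins pulled high, bit 7 reads 0 and bits 6-3 read 1.
print(status_to_pins(0x78))  # prints {11: 1, 10: 1, 12: 1, 13: 1, 15: 1}
```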
The easiest way to find out if a parallel port is present and working is to write (output) a byte to its base address, then read (input) from the base address to see if you get the same byte. For example, first do -o378 00 and -i378, getting 00 in response, then -o378 ff and i378 again, getting ff. This exercises all the bits and makes sure they are stored and change properly.
Port 37A (still assuming a base address of 378) is a control port, basically an output. Bit 0, inverted, is sent to pin 1. Bit 1, inverted, is sent to pin 14. Bit 2 is sent to pin 16, and bit 3, inverted, to pin 17. You can verify this by sending values to the port and checking the resulting output. Bit 4 is an interrupt enable. If this bit is 1, then an interrupt will be requested if pin 10 transitions from high to low (pin 10 is one of the inputs at port 379). A more recent addition is bit 5, which enables input at the main data port, 378, as well as output. This feature was added because the parallel port has proved a valuable I/O channel for more things than printers--scanners, for example, that need to send data to the computer. Most newer parallel ports will support this feature. When this bit is written as a 1, then reads can be made from 378 as well as writes.
Writing 21h to the control port will enable input by disabling the output data latch. The circuit shown at the right is recommended when connecting to the parallel port for input. Pin 1 is used to disable peripheral data until the data port is configured for input. When the system is first turned on, BIOS will write 00h to the control port, pulling pin 1 high, and preventing contention with input data. Writing 00h to the control port will re-enable output.
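The same sort of bookkeeping applies to the control port. Here is a Python sketch that works out what a given control byte does to pins 1, 14, 16 and 17, and whether it asks for data input or interrupts, following the bit assignments described above. The function names are mine, and the input-enable bit only has any effect on a port that supports it.

```python
def control_to_pins(control):
    """Return the levels a control byte at (base + 2) puts on pins 1, 14, 16, 17."""
    return {
        1: ((control >> 0) & 1) ^ 1,   # bit 0, inverted
        14: ((control >> 1) & 1) ^ 1,  # bit 1, inverted
        16: (control >> 2) & 1,        # bit 2, not inverted
        17: ((control >> 3) & 1) ^ 1,  # bit 3, inverted
    }


def interrupt_enabled(control):
    """Bit 4 enables the interrupt request from pin 10."""
    return bool(control & 0x10)


def data_input_enabled(control):
    """Bit 5 disables the output latch so that the data pins can be read."""
    return bool(control & 0x20)


for value in (0x00, 0x21):
    print(f"{value:02X}", control_to_pins(value), data_input_enabled(value))
```

For 21h the sketch gives pin 1 low with input enabled, and for 00h it gives pin 1 high with input disabled, which is the behaviour the circuit relies on.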
In summary, the parallel port has three channels. The eight-bit data channel is at the base address, and is basically output, but newer ports can be configured for input, as has just been mentioned. At (base + 1) is the status channel, five bits, and at (base + 2) is the 5- or 6-bit control channel, which is output. These bits can be managed to support a useful I/O interface.
DOS has the feature that if a program is in a disk file, the program can be loaded and executed simply by typing its name at the DOS prompt. Of course, the program must be in a prescribed form so that DOS can understand what to do with it. The commonest type of program is given the file extension .EXE, and is produced by a program called a linker. An earlier, and much simpler, format is the .COM file, and we can make .COM files with DEBUG only. .COM stands for COre iMage (not "COMmand" though they can be used for DOS commands) and the file contains the actual bytes of the program, which are not modified on loading. This somewhat restricts the generality of such programs, which must handle all funny business themselves instead of letting DOS do it, but for small, simple programs it is an advantage.
Once you have a working program in DEBUG, with the first instruction at IP=0100 and ending cleanly with a jump to PSP:0, all you need to do is to save it to disk with a .COM file extension. This uses two additional DEBUG commands, "n" (Name) and "w" (Write). Let's name our program "hello." Then, execute -n hello.com to establish the filename. The "n" command establishes a "filespec" table in memory used for the disk transfers. Set the registers BX (high word) and CX (low word) to the length of the program in bytes, and then execute -w to write the file. Obviously, you can create pretty large files if you don't take care that BX=0000. Also, don't create files less than 10 hex bytes, or DEBUG gets annoyed. There are lots of complications, but for simple programs this works very well and is very simple.
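The file length is a 32-bit number split across the two registers. As a quick check of what to put where, here is a Python sketch (the function name length_to_bx_cx is my own):

```python
def length_to_bx_cx(length):
    """Split a file length in bytes into the (BX, CX) values for the "w" command."""
    if not 0 <= length <= 0xFFFFFFFF:
        raise ValueError("length must fit in 32 bits")
    # BX holds the high word and CX the low word.
    return (length >> 16) & 0xFFFF, length & 0xFFFF


bx, cx = length_to_bx_cx(0x20)
print(f"BX={bx:04X} CX={cx:04X}")  # prints BX=0000 CX=0020
```

Any program that fits in a .COM file is shorter than 64KB, so BX should always come out as 0000.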
DEBUG can be started with a filename on the command line, as "debug hello.com," and it will automatically create the filespec table and load the program in condition to be run immediately with the "g" command. If the file is an .EXE file, the file will be modified on loading. To see the actual bytes in the file, rename the file with a different extension, such as .BIN. .COM and .EXE files always load beginning at CS:0100, but you can load other files beginning at any address with the "l" (Load) command, such as -l200, which will begin loading at CS:0200. Of course, you must already have established a filespec with the "n" command or when starting DEBUG.
It is a custom for the first programming exercise in learning a new programming language to be writing a program that displays "Hello, World!" on the video screen. Let's do that here, creating a .COM file that does the job. Writing on the video screen may seem a daunting task, but DOS makes it easy for us by doing most of the work. In fact, DOS calls the BIOS for the job. There is no need for us to write the code, which would be a lot of work. We can use a DOS system call, which DOS handles in the following way. The entry is a software interrupt, INT 21, which is like a subroutine call. Which DOS function we want is selected by the value in AH. The 09 function writes a string (sequence of characters) on the video screen. The address of the first character of the string is put in DX, and characters are written until a $ is encountered, which terminates the string and is not written itself.
Start DEBUG as usual. Using "-a100" begin assembly at 0100 as follows: MOV AH,9 MOV DX,110 (we will begin the string at DS:0110) INT 21 JMP 0, four instructions. Now, execute -e110 "Hello, world!$". The program is complete, and can be tested by -g. The actual bytes can be found with a dump, and disassembly can check the assembly. Now name the file with -n hello.com. The length of the file is a few bytes less than 32, so put 0020 in CX, and be sure BX = 0000. Finally, just type w and press Enter. The file will then be written. Exit DEBUG and check that hello.com is there by doing a DIR. Now type "hello" and observe that the program works. Well, that's it! You have made a program that can be executed by DOS.
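For comparison with your own dump, here is a Python sketch that builds the same memory image by hand. The instruction bytes are my own hand assembly of the four instructions (I have assumed the three-byte near form of JMP, as in the first program above), so treat them as something to check against "-d100" and "-u100" rather than as gospel. The six bytes between the code and the string are zeros here; in DEBUG they are whatever happened to be in memory.

```python
def build_hello_com():
    """Build the bytes of the hello program as they sit at offsets 0100-011D."""
    # Displacement of the near jump: target 0000 minus the address
    # of the instruction following the JMP (010A), kept to 16 bits.
    disp = (0x0000 - 0x010A) & 0xFFFF
    code = bytes([
        0xB4, 0x09,                      # 0100 MOV AH,09
        0xBA, 0x10, 0x01,                # 0102 MOV DX,0110
        0xCD, 0x21,                      # 0105 INT 21
        0xE9, disp & 0xFF, disp >> 8,    # 0107 JMP 0000
    ])
    text = b"Hello, world!$"
    # Pad from the end of the code up to offset 0110, where the string starts.
    padding = bytes(0x110 - 0x100 - len(code))
    return code + padding + text


image = build_hello_com()
print(image.hex(" ").upper())
print(f"length = {len(image):X} hex bytes")  # prints length = 1E hex bytes
```

The length of 1E hex (30) bytes agrees with "a few bytes less than 32." Writing 0020 in CX simply saves two extra bytes after the $, which do no harm.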
There are other DOS functions that handle keyboard input and video output, and the very important subject of disk files, that many programs require. It is all very much simpler than the equivalent in Windows, which is often of ludicrous complexity. The DOS function 4C terminates a program safely, without worrying about how the segment registers are set, and is better than INT 20. Here, as elsewhere, the chief problem is obtaining information.
DEBUG has some commands that help to manipulate memory. These commands all take a first parameter that is a range, consisting of the starting and ending addresses, or the starting address and the length, like 100 110 or 100 L10. The "f" (Fill) command puts a certain byte or bytes in a range. For example, -f100 11f 00 will fill the 32 bytes from 100 to 11f with zeros. The "s" (Search) command looks for a certain byte or bytes in the range, and reports where they are, if found. -s100 17f 00 will find all the zero bytes between addresses 100 and 17f. The "c" (Compare) command scans two ranges in memory and reports all the places where they differ. For example, -c100 L20 200 will compare the ranges 100 to 11f and 200 to 21f and return all the bytes that differ, in the format addr1 byte1 byte2 addr2, where addr1 and addr2 are corresponding locations in the two ranges. The "m" (Move) command moves the bytes in the first range to the second range (you don't have to worry about overlaps; DEBUG takes care of it, so you can move just a few places in either direction).
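The two ways of writing a range are easy to confuse, since the ending address is inclusive while a length is a count. This Python sketch converts either form into starting and ending offsets. It handles only the simple case of two items separated by a space, with no segment, and the function name parse_range is my own.

```python
def parse_range(text):
    """Turn '100 11f' or '100 L20' into (start, end) offsets, end inclusive."""
    start_text, second = text.split()
    start = int(start_text, 16)
    if second[0] in "lL":
        # Length form: the range covers that many bytes from the start.
        end = start + int(second[1:], 16) - 1
    else:
        # Address form: the second number is the last byte of the range.
        end = int(second, 16)
    return start, end


for spec in ("100 11f", "100 L20", "100 L10"):
    start, end = parse_range(spec)
    print(f"{spec}: {start:04X} to {end:04X}, {end - start + 1} bytes")
```

The first two come out the same, 0100 to 011F, which is 20 hex or 32 bytes.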
There are only two more DEBUG commands. The "h" (Hex Arithmetic) command takes two hex numbers as parameters, and gives you their sum and difference. Try -h 0 1 and see what you get. Negative numbers are returned in 2's complement form. The other command is "p" (Proceed), which is just like the trace command "t" except that it goes all the way through a subroutine, interrupt call and similar bits before returning. Use it instead of "t" if you don't want to see all the details of a DOS function, a subroutine call, or the copying of a string by a LOOP instruction.
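What the "h" command does can be modelled in a couple of lines of Python, if we assume that both results are simply kept to 16 bits (which is what gives the 2's complement form for negative differences). The function name hex_arith is my own.

```python
def hex_arith(a, b):
    """Return (sum, difference) of two 16-bit values, each kept to 16 bits."""
    return (a + b) & 0xFFFF, (a - b) & 0xFFFF


total, difference = hex_arith(0x0, 0x1)
print(f"{total:04X}  {difference:04X}")  # prints 0001  FFFF
```

So 0 minus 1 comes back as FFFF, the 16-bit 2's complement representation of -1.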
Although the load and write commands can affect specific disk sectors, it is best not to go anywhere near this possibility, and to use "l" and "w" without any parameters, to avoid any misunderstanding with DEBUG that will shatter your disk. Of course, it's "w" that can do all the damage, and "l" is fairly safe to use in the form -laddr if you want to load the file at some odd location. Sometimes, disks can be fixed with these commands, but this takes an expert mechanic and good knowledge of disk formats.
The following references may be almost impossible to obtain, but they are what I used in the preparation of this page, and are in my standard reference library. Good luck in finding alternatives. This shows how much has changed in twenty years, and how inaccessible and expensive computer information has become.
IBM Corp. and Microsoft, Inc., Disk Operating System Version 3.20 Reference (Boca Raton, FL: 1986) or equivalent. Contains DEBUG commands.
Microsoft, Microsoft MS-DOS Operating System Programmer's Reference Manual (Bellevue, WA: Microsoft, 1981, 1983).
IBM Corp., Technical Reference 2.02 (Boca Raton, FL: IBM, 1983). Includes complete circuit diagrams and BIOS listing.
R. Rector and G. Alexy, The 8086 Book (Berkeley, CA: Osborne/McGraw-Hill, 1980).
T. Dettmann, DOS Programmer's Reference, 2nd Edition (Carmel, IN: Que, 1989).
Composed by J. B. Calvert
Created 24 July 2002
Last revised 8 August 2002