Hello, world!


Kernighan and Ritchie made the Hello, world! program famous as a first program for beginners in any computer language, though they were talking about C. If you can write a program to display this message on the console screen, then you have gone a long way towards being able to write and run any program. Here, I shall show how to do this in machine language and in assembly language for the 8086, using Windows 98 as a platform. The reader should compare the ease of preparation of these programs and, most significantly, the small size of the executable files. Few programs can be prepared as easily as the machine language one, and none are as small: only 26 bytes. The assembly language program is a bit more trouble, and a little larger: 1056 bytes, most of this in the unused part of the .EXE file. For comparison, the .EXE file prepared with Borland C is 15,121 bytes. When prepared with LP77 Fortran, the .EXE file is 22,106 bytes.

To make the machine language program, a .COM file, start DEBUG at the MS-DOS prompt. Then, with -a100, assemble the following instructions: MOV DX,10B, MOV AH,09, INT 21, MOV AH,4C, INT 21. These instructions can also be assembled by hand, with the result BA 0B 01 B4 09 CD 21 B4 4C CD 21, which is 11 bytes occupying 0100 through 010A. Now, at the next location, 10B, enter 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 24 (with -e 10B), which is the ASCII for Hello, world! followed by a $, the terminator that DOS function 09 expects. You should test the program at this point, but we will proceed by saving it on disk. Execute -n hello.com to name the file, load BX with 0000 and CX with 001A (using -r bx and -r cx), and then execute -w. The code and message actually end at 0118, so 0019 would be enough; 001A simply writes one spare byte. DEBUG will report Writing 0001A bytes, and when you check in the default directory, you will find HELLO.COM, 26 bytes long. Type HELLO at the DOS prompt, and you will see the greeting displayed.
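The hand-assembled bytes can also be laid out as an annotated listing. The following is only a sketch in assembler source form, using DB so that each encoding is visible next to its offset; the names cseg and start are illustrative, and DEBUG itself needs no source file at all.

```asm
; HELLO.COM byte for byte. Offsets in the comments are those in
; force once the file has been loaded at 0100.
cseg    SEGMENT
        ASSUME  cs:cseg, ds:cseg
        ORG     100h
start:
        db      0BAh, 0Bh, 01h      ; 0100  MOV DX,010B  (opcode BA, then low byte, high byte)
        db      0B4h, 09h           ; 0103  MOV AH,09    (DOS function 09: display string)
        db      0CDh, 21h           ; 0105  INT 21
        db      0B4h, 4Ch           ; 0107  MOV AH,4C    (DOS function 4C: terminate)
        db      0CDh, 21h           ; 0109  INT 21
        db      'Hello, world!$'    ; 010B  48 65 6C ... 21 24, 14 bytes, ending at 0118
cseg    ENDS
        END     start
```

The operand of the first instruction, 010B, is simply the offset of the first byte after the 11 bytes of code.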

This procedure is exceedingly easy and elegant, made possible by the segmented architecture of the 8086. A .COM program has all its segment registers set to the same value, the segment address of the PSP (Program Segment Prefix), which DEBUG or DOS establishes at the first free area of memory. The stack pointer SP is set to FFFF, and two bytes of zeros are pushed, so SP winds up at FFFE. Then, execution begins at 0100, just above the 256-byte PSP. A .COM program basically uses only 16-bit offsets, so the segment registers can be loaded with anything (so long as they are equal) and the program will still work properly. Actually, once a .COM program has been loaded and is running, it can do absolutely anything it wants. It can change segment registers, or load more code, or anything else like this. (In Windows, it may be limited, but not when running under DOS.)

The start at offset 0100 is the reason DEBUG starts up with IP = 0100, and why its write and load commands begin at 0100. When we loaded BX:CX with 0000:001A, the write command -w knew to write 1A bytes to disk beginning at 0100. First, we had to name the disk file, with -n hello.com. DOS recognizes the file extension .COM as indicating a program, makes a PSP, loads the program at 0100, and passes execution to it. An .EXE file has bytes within it that identify it, but a .COM file does not. DEBUG, like DOS, also treats .COM files differently. You must change the file extension to keep DEBUG in the dark. The easiest way to write programs for DOS is to prepare a .COM file. If you use an assembler, it is necessary to use ORG 100h so that the program will assemble properly.
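As an illustration of the ORG requirement, here is a minimal sketch of the same program as assembler source for a .COM file, written with full MASM-style segment directives. The names cseg, start and greet are illustrative. With an assembler and linker that produce only .EXE files, the result would still have to be converted, for example with the DOS utility EXE2BIN.

```asm
; Everything lives in one segment, as a .COM file requires.
cseg    SEGMENT PARA PUBLIC 'CODE'
        ASSUME  cs:cseg, ds:cseg, ss:cseg, es:cseg
        ORG     100h                ; DOS loads the file at offset 0100h
start:  mov     dx, OFFSET greet    ; 010Bh, because offsets now count from 0100h
        mov     ah, 09h             ; DOS function 09h: display $-terminated string at DS:DX
        int     21h
        mov     ah, 4Ch             ; DOS function 4Ch: terminate, return to DOS
        int     21h
greet   DB      'Hello, world!$'
cseg    ENDS
        END     start               ; entry point must be the first byte, at 0100h
```

Without the ORG, the assembler would count offsets from 0000, and the operand of the first instruction would point into the PSP instead of at the message.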

A more general HELLO program can be prepared as an .EXE file, and in this case we want to use an assembler, such as MASM. An .EXE file allows a much more general use of segments, and the possibility of preparing a program as separate modules that are combined by LINK into a complete program. An .EXE file has a complex structure, and is specific to DOS. Even a Windows program can be contained in an .EXE file.

First, we need a source file. This can be prepared in Notepad, saved as a .TXT file, and then renamed to an .ASM file. There may be a way to save it directly as an .ASM file, so if you know how, do so. Save it to a directory that contains MASM and LINK. The file is:

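stack   SEGMENT PARA STACK 'STACK'
        DB      64 DUP('STACK   ')      ; 64 x 8 bytes = 512-byte stack
stack   ENDS
;
dseg    SEGMENT PARA PUBLIC 'DATA'
greet   DB      'Hello, world!$'        ; $ terminates the string for function 09H
dseg    ENDS
;
cseg    SEGMENT PARA PUBLIC 'CODE'
start   PROC    FAR
        ASSUME  CS:CSEG,DS:DSEG,SS:STACK,ES:NOTHING
        mov     ax,dseg                 ; DS cannot be loaded with an immediate value
        mov     ds,ax
        mov     dx,OFFSET greet
        mov     ah,09H                  ; display the string at DS:DX
        int     21H
        mov     ah,4CH                  ; terminate and return to DOS
        int     21H
start   ENDP
cseg    ENDS
        END     start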

This program shows the extra things that have to be done for an .EXE file. First, we must explicitly define a stack, and the stack segment declaration does this. PARA means that it starts on a 16-byte boundary, an address like XXX0. The first STACK is its name, the second is the combine type, meaning that it is intended for a stack, and 'STACK' is its class name. All segments with this name will be combined. The resulting stack will be the size of the largest stack segment defined. This stack is 512 bytes long (64 copies of an 8-byte pattern), and is filled with the ASCII code for STACK with 3 spaces between them. This makes the stack easy to see on a memory dump. You may well ask why such a small program needs such a large stack. Well, when Windows or DOS detects an emergency, it uses the stack of whatever program is running to save data. This stack is only there just in case.

Next, we have a segment named DSEG (any name would do) that is also PARA, but is PUBLIC, so it will be concatenated with segments of the same name from other modules and addressed as one. Its class is 'DATA', and segments of the same class are placed together in memory. A separate data segment is not necessary here, but it might as well hold the greeting, defined by a DB. Then comes the code segment, named CSEG, also PARA and PUBLIC, and in the class 'CODE'. In this program, we will have only one segment in each class, and so no problem will arise in addressing variables. MASM behaves oddly (and erroneously) when combining segments, it turns out.

In the code segment, we must use a PROC declaration to declare the main routine as FAR. The ASSUME statement inside the PROC tells the assembler what we expect to be in the segment registers when the program executes; it does not load them. The very first thing we must do is load DS with the segment address of DSEG, as shown, going through AX because a segment register cannot be loaded with an immediate value. This address will only be determined when the program is loaded. Now we can go ahead with the same instructions as in HELLO.COM. The symbol GREET will carry the offset of the message. This, too, will only be known at load time, or at least when all the modules of the program are linked. As with HELLO.COM, the best way to end the program and return to DOS is with MOV AH,4CH followed by INT 21H. This is a DOS operating system call like the one we use to display the greeting. The END directive shows where to start execution, at the FAR pointer START.

Assemble the program with MASM HELLO,HELLO,HELLO,NUL; (the four fields are the source, object, listing and cross-reference files). Simpler commands are possible in DOS, like MASM HELLO; but they will not work here in Windows. Check to make sure you got a HELLO.OBJ file, and print out the HELLO.LST file. Do not do this while in the MS-DOS prompt! It won't print, you will clobber Windows, and you will have to restart to unbusy the printer driver. Instead, load the listing file in Notepad, and print it out from there. Note the items that were not determined at assembly time. The listing file is excellent documentation for an assembly program.

Now start LINK and answer the prompts. At [.OBJ] enter HELLO, and accept the remaining three defaults. Almost instantly the program will come back, and you should have HELLO.EXE in the default directory. LINK puts the message at offset 0, then the stack above this, and finally the code. The HELLO.MAP file gives the details. Again, do not print this file while in MS-DOS by using >PRN! Use Notepad to avoid insulting the printer driver, which likes to do all the printing itself. To test HELLO.EXE, rename HELLO.COM temporarily to something like HELLO.COX. Now type HELLO. When you press Enter, Hello, world! will be displayed. HELLO.EXE can be investigated in DEBUG as well, and executed instruction by instruction. At an INT instruction, use -p rather than -t to avoid tracing each instruction in the interrupt handler.

The .EXE program is of considerable generality. The code and data segments can be filled with instructions and data, and other modules can be linked in with LINK.

Assembler can also be used to generate .COM files. The source file for a HELLO.COM program described here is written for Turbo Assembler (TASM). It uses the simplified segment directives instead of the MASM ones. .MODEL tiny puts code and data in a single segment, which suits a .COM file, where DOS loads the PSP segment in all the segment registers. The code segment is introduced by .CODE, which uses default names and characteristics like those we have given for MASM. org 100h sets the location counter for assembly at the entry point for .COM files, and this is followed by a short jump to the start of the program. Next, the data segment is opened with .DATA. This could be done first, before opening the code segment, but this shows that segments can be opened and closed at will. The code segment is re-entered, and the routine sout, which prints a C string, is written. A C string ends in a null (00) byte. Using DOS function 09h, the string was terminated with $, which is bad news if you have to output a string containing a $. This program uses DOS function 02h in a loop instead. CX limits the maximum number of characters printed to 256, but the routine usually exits when it encounters the terminating 00. When a character has been loaded, the flags are set with or dl,dl, which just reproduces the character. The main program just loads BX, calls sout, and then terminates.
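A source file matching this description might look like the following. It is a sketch reconstructed from the prose, not the original listing, and it has not been assembled here. Only sout comes from the text; the labels start, main, msg, next and done are assumptions, as is the choice to pass the string address in BX with DS already pointing at it.

```asm
        .MODEL  tiny
        .CODE
        org     100h                ; entry point of a .COM file
start:  jmp     short main          ; hop over the subroutine to the main program

        .DATA
msg     db      'Hello, world!', 0  ; C string: ends in a null byte, not a $

        .CODE
; sout: display the null-terminated string at DS:BX,
; giving up after 256 characters if no null is found.
sout    PROC    near
        mov     cx, 256             ; upper limit on characters printed
next:   mov     dl, [bx]            ; fetch the next character
        or      dl, dl              ; sets the flags, leaves DL unchanged
        jz      done                ; zero flag set: terminating null reached
        mov     ah, 02h             ; DOS function 02h: display the character in DL
        int     21h
        inc     bx                  ; advance to the next character
        loop    next                ; decrement CX, continue while nonzero
done:   ret
sout    ENDP

main:   mov     bx, OFFSET msg      ; address of the string for sout
        call    sout
        mov     ah, 4Ch             ; DOS function 4Ch: terminate
        int     21h
        END     start
```

Function 02h is reloaded into AH on every pass because DOS calls are free to change AX; BX and CX are left alone by the call, so they can serve as pointer and counter.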

This source file is assembled by TASM hello and then linked with TLINK /t hello. The /t switch tells TLINK to create a .COM file.


Composed by J. B. Calvert
Created 18 September 2004
Last revised