The DOS .COM file format was the format for executables smaller than 64KB, and let's face it, who would really need more? I'm headed down a path to discuss PE file format executables, and no good discussion is complete without a foundation. In the beginning, there was DOS; life was simple and there was nothing between you and the computer. This post describes the early .COM executable file format, showing code, data, everything that you need for small executables!
The .COM file format was simple. The whole program had to be less than 64KB in size. Technically less than 64KB minus 0x100 bytes, but stick with me. To run a program, DOS allocated memory for the size of the file, loaded the entire file contents into memory and ... branched to it via a "call" instruction. Simple stuff: absolutely no runtime fixups, no DLL records, no DLLs! Heck, no linked operating system APIs that you could call! Need something in your program? Make it part of your 64KB, because that's all there is!
Going from memory today because a DOS computer is not readily available, the offset (IP) at the start of execution was 100 hex bytes. Notice I didn't say into the starting segment; there was only ONE segment. Code, data, everything in one place with absolutely no distinction between anything. The verbatim bytes from the .COM file are placed into memory at CS:100, and DOS calls it. Did the program work or not? DOS didn't care; it loaded it into memory and ... "called" it. Everything else was bonus time. If the machine managed to not become a colored checkerboard mess during the life of the program, the program could eventually "return" and execution would go back to DOS, which would put you back at the COMMAND.COM prompt.
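The load model above can be sketched in a few lines of Python. This is only a simulation of the idea, not DOS code: the names `load_com`, `SEGMENT_SIZE` and `MAX_COM_SIZE` are made up for illustration, and the size check ignores the few bytes a real program would also need for its stack.

```python
# Sketch of the .COM load model described above (a simulation, not real DOS code).
SEGMENT_SIZE = 0x10000   # one 64KB segment is all there is
LOAD_OFFSET = 0x100      # the file's bytes land at CS:0100

# "64KB minus 0x100 bytes": 0xFF00 = 65280 bytes (stack space ignored here).
MAX_COM_SIZE = SEGMENT_SIZE - LOAD_OFFSET


def load_com(image: bytes):
    """Copy a .COM image verbatim into a simulated segment.

    Returns the segment and the offset where execution would begin.
    """
    if len(image) > MAX_COM_SIZE:
        raise ValueError("image does not fit in one segment")
    segment = bytearray(SEGMENT_SIZE)
    segment[LOAD_OFFSET:LOAD_OFFSET + len(image)] = image
    return segment, LOAD_OFFSET


if __name__ == "__main__":
    # A one-byte program: a lone "ret" (opcode C3).
    segment, entry = load_com(bytes([0xC3]))
    print(hex(entry), hex(segment[entry]))
```

Notice there is no parsing step at all: no header, no sections, no relocation table. The file is the memory image.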
Though I have no DOS computer readily available, I do have old programs. The smallest and easiest for show and tell is one I wrote with some friends in 1988 to efficiently reboot a computer, skipping the memory test during boot. The program is ... 16 bytes and performs the same activity as Ctrl-Alt-Delete, including setting a flag to tell the BIOS to skip the memory test on boot. Interestingly, we wrote this using debug.com and the "a" command, so there is no source and ... there never was any source. To show what it does, I need a disassembler that understands 16-bit Intel little-endian assembly language. DEBUG.COM from DOS would do it, but again, no DOS handy. I found a suitable disassembler online at shell-storm, but first I need the hex.
Using my own HexDump utility, I get these hex bytes:
* input file: boot.com
* OFFSET +0 +4 +8 +C
00000000 B840008E D8C70672 - 003412EA 5BE000F0
* 16 bytes converted
I fed that into the disassembler above and got the listing below. I added the comments.
0x0000: mov ax, 0x40 ; Establish vision to the BIOS data area
0x0003: mov ds, ax ; Data segment 0x40 equals physical address 0x400
0x0005: mov word ptr [0x72], 0x1234 ; set 40:72 to 0x1234 (BIOS Flag for fast reboot)
0x000b: ljmp 0xf000:0xe05b ; Branch to BIOS "reboot" code
When you run that, these 4 instructions are "it". DOS loads the code into memory at offset 100h, and branches (calls) to the first byte. That's it.
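If you have no disassembler handy either, the listing can be reproduced by hand. Here is a minimal sketch in Python that decodes only the four opcodes this program uses; it is nowhere near a general 8086 disassembler, and the function name `disassemble` is my own invention. The bytes are copied from the hex dump above.

```python
import struct

# The 16 bytes of boot.com, copied from the hex dump.
BOOT_COM = bytes.fromhex("B840008E D8C70672 003412EA 5BE000F0")


def disassemble(code: bytes):
    """Decode only the four instruction encodings that appear in boot.com."""
    lines = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0xB8:                          # mov ax, imm16
            (imm,) = struct.unpack_from("<H", code, pc + 1)
            text, size = f"mov ax, {imm:#x}", 3
        elif code[pc:pc + 2] == b"\x8e\xd8":    # mov ds, ax
            text, size = "mov ds, ax", 2
        elif code[pc:pc + 2] == b"\xc7\x06":    # mov word ptr [addr16], imm16
            addr, imm = struct.unpack_from("<HH", code, pc + 2)
            text, size = f"mov word ptr [{addr:#x}], {imm:#x}", 6
        elif op == 0xEA:                        # far jump; offset is stored before segment
            off, seg = struct.unpack_from("<HH", code, pc + 1)
            text, size = f"ljmp {seg:#x}:{off:#x}", 5
        else:
            raise ValueError(f"unhandled opcode {op:#04x} at {pc:#06x}")
        lines.append(f"{pc:#06x}: {text}")
        pc += size
    return lines


if __name__ == "__main__":
    for line in disassemble(BOOT_COM):
        print(line)
```

The little-endian part shows up in the `"<H"` format strings: the bytes `34 12` in the file are the word 0x1234, and the far jump stores its offset (`5B E0`) ahead of its segment (`00 F0`).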
These were simple times and it was downright impressive what could be accomplished in 64KB, or in this case, in 16 bytes. This program reboots the machine, so there is no need for the code above to include return logic, but it could be done with a "ret" instruction or you could issue the DOS system call to terminate with a return code.
Calling runtime services
These early systems didn't have library linkage or DLL fixups - there were no .LIBs or .DLLs and "that's the way we liked it". If you wanted to call the operating system to do something useful like display a string or read a file, you loaded up some registers and kicked off an "int 21h" to call DOS. YES - software issuing the equivalent of hardware interrupts to call the operating system! It's DOS, do what you want! The DOS programming references told you what to put in registers before the call; issue the DOS interrupt and things happen. Get the parameters wrong and the machine could be hosed - but hey! It was your machine, and there was very little that couldn't be cured with a Big-Red-Switch or a Ctrl-Alt-Delete or, as this program demonstrates, having a batch file call a program named "boot.com" to accomplish the same end.
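To make "load up some registers and int 21h" concrete, here is a sketch that hand-assembles a tiny string-printing .COM image in Python. Nothing here comes from an original program of mine: the function number (AH=09h, print the '$'-terminated string at DS:DX) and the byte encodings are the standard DOS and 8086 ones as I remember them, `build_hello_com` is a made-up helper name, and the result has not been checked on a real DOS machine for this post.

```python
# Hand-assembled "hello" .COM image. A sketch, not the output of a real assembler.
LOAD_OFFSET = 0x100  # .COM code starts at offset 100h in its segment


def build_hello_com(message: bytes = b"Hello from DOS!\r\n") -> bytes:
    """Return the bytes of a .COM file that prints `message` and returns."""
    if b"$" in message:
        raise ValueError("'$' ends the string for int 21h function 09h")
    code_len = 8                           # 2 + 3 + 2 + 1 instruction bytes below
    msg_addr = LOAD_OFFSET + code_len      # the message sits right after the code
    code = bytes([
        0xB4, 0x09,                            # mov ah, 09h   ; function: print string
        0xBA, msg_addr & 0xFF, msg_addr >> 8,  # mov dx, msg   ; DS:DX -> string
        0xCD, 0x21,                            # int 21h       ; call DOS
        0xC3,                                  # ret           ; back to DOS
    ])
    assert len(code) == code_len
    return code + message + b"$"


if __name__ == "__main__":
    image = build_hello_com()
    print(len(image), "bytes:", image.hex(" "))
```

Code and data share the one segment, so the string's address is just its position in the file plus 100h. No linker, no import table, only a register convention and an interrupt number.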
Compatibility with CP/M
With a joke at the start of this post that nobody would ever need more than 64KB, the real story of .COM in DOS is that it provided application backward compatibility with the CP/M operating system, where a 16-bit address space meant 64KB is all you get. DOS was written to the abilities of the 8088/8086 Intel CPUs, and these had a segment:offset based addressing limit of 20 bits, which equals 1MB. The segment was shifted left 4 bits and the offset added to get the physical address. The CP/M executable format was 64KB of flat memory with no segments. If you were designing DOS for an 8086/8088 and wanted backward compatibility with the existing apps at the time, this made it possible. Load an application into memory, pre-set the CS, DS, ES and SS segment registers to point to the loaded SEGMENT (paragraph boundary memory), "call" the loaded program, and it would run the whole time blissfully unaware that the machine supported segment:offset addressing.
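The shift-and-add arithmetic is easy to check for yourself. A small Python sketch, with `physical_address` being a name I made up, using the two addresses from the reboot program as examples:

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment shifted left 4 bits plus offset.

    The result is masked to 20 bits, matching the 1MB limit of the 8086/8088.
    """
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    # 40:72, the BIOS data area word that boot.com sets to 0x1234.
    print(hex(physical_address(0x0040, 0x0072)))
    # F000:E05B, the BIOS entry point that boot.com jumps to.
    print(hex(physical_address(0xF000, 0xE05B)))
```

A .COM program never has to do this math. With every segment register pointing at the same paragraph, a plain 16-bit offset is all it ever sees.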
Next, we move forward to the New Executables!