What is a Machine Word and its Implications?

The registers of a processor and the buses are the actual workhorses of a computer. Early microprocessors had 4 bit registers; register width then grew to 8 bit, 16 bit, 32 bit and now 64 bit.

What is a Machine Word?

A “machine word” is the natural unit of data that a processor handles: the amount of data that one of the CPU’s internal registers can hold and that can be moved between storage and the processor in one operation. The longer the machine word, the more the processor can do in one operation. Historically, “word” was also the term used for a unit of memory. For example, programmers used terms such as “writing words” or “processing words of instructions”.

The size of the “machine word” depends on the processor. It is fixed by the design of the processor’s hardware and instruction set. “Word size” is the number of bits the processor handles in a single operation.

Intel 4004 – 4 bit word size – world’s first commercial microprocessor launched in 1971
Intel 8008 – 8 bit word size – world’s first 8 bit microprocessor, launched in 1972
Intel 8085 – 8 bit word size
Intel 8086 – 16 bit word size (Data or Word Size: 16 bits, Address bus size: 20 bits)
Intel 8087 – floating point coprocessor for the 16 bit 8086/8088 rather than a standalone CPU – brought hardware floating point instructions to the x86 family for the first time
Intel Pentium I – 32 bit word size
Intel Pentium II – 32 bit word size
Intel Pentium III – 32 bit word size
Intel Pentium 4 – 32 bit word size in early versions; 64 bit support arrived with later Prescott-based models
Intel Core i3/i5/i7 – 64 bit word size
Nearly all current desktop and server processors from both Intel and AMD have a 64 bit word size

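The native word of a modern 64 bit processor is 8 bytes. Note, however, that in x86 assembly and Windows terminology the names are fixed for historical reasons going back to the 16 bit 8086: a WORD is 2 bytes, a DWORD (Double Word) is 4 bytes and a QWORD (Quad Word) is 8 bytes.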
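In x86 assembly and Windows headers, the names WORD, DWORD and QWORD conventionally denote 16, 32 and 64 bits, whatever the native word size of the processor. Here is a minimal Python sketch of that naming, using the standard-size format codes of the struct module. The table name X86_UNITS and the function unit_sizes are my own labels for illustration.

```python
import struct

# x86 / Windows naming convention: fixed widths inherited from the 16 bit 8086.
# The "<" prefix forces struct's standard sizes, independent of the host machine.
X86_UNITS = {
    "BYTE": "<B",   # 8 bits
    "WORD": "<H",   # 16 bits
    "DWORD": "<I",  # 32 bits
    "QWORD": "<Q",  # 64 bits
}


def unit_sizes():
    """Return a dict mapping each unit name to its size in bytes."""
    return {name: struct.calcsize(fmt) for name, fmt in X86_UNITS.items()}


if __name__ == "__main__":
    for name, size in unit_sizes().items():
        print(f"{name:5s} = {size} bytes ({size * 8} bits)")
```

On a 64 bit x86 machine, then, the native machine word is the same size as a QWORD.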

Pointer Size is the Word Size of the Machine

The word size of a processor (which has never really been a precise term) is best loosely defined as the largest natural size for arithmetic, which is generally the size of the registers in the machine. This is quite frequently the width of the data path (which is distinctly different from the data bus). The width of the data path is simply the width of the ALUs. The pointer size is usually, but not always, the word size of a machine.
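One rough way to see the pointer size from a high-level language is to ask the runtime. The sketch below uses Python's struct and sys modules. Note the assumption: it reports the pointer width of the running Python build, which can be narrower than the CPU's word size (for example, a 32 bit Python on a 64 bit processor). The function names are mine.

```python
import struct
import sys


def pointer_bits():
    """Size of a native pointer (C 'void *') in bits for this Python build."""
    return struct.calcsize("P") * 8


def looks_64_bit():
    """sys.maxsize exceeds 2**32 only on 64 bit builds."""
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print(f"pointer size: {pointer_bits()} bits, 64 bit build: {looks_64_bit()}")
```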

A Common Misunderstanding Related to Virtual Address Space Size

A common misunderstanding is that the maximum size of the virtual address space is determined by the word size, i.e. if the word size is n bits, the maximum virtual address space is 2^n bytes (addresses 0 to 2^n - 1). But that is not necessarily true!

The size of the virtual address space is simply determined by the number of bits in the virtual page number of the page table (and the TLB). On most current AMD64 based machines, which use 4-level page tables, only 48 bits of the virtual address are usable. The upper 16 bits must be a sign extension of bit 47. The AMD64 architecture allows physical addresses of up to 52 bits, although individual processors may implement fewer. These physical address bits are the ones that are sent on the bus. Many machines today use an address bus that is narrower than the number of address bits. The bits are simply split up and sent across the bus over multiple clock cycles. DDRx DRAM is an example: the row and column parts of an address are sent over the same pins in separate steps.
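The sign extension rule can be sketched in a few lines of Python. This assumes the 48 bit virtual address layout described above; VA_BITS and the function names are my own choices, not anything defined by the architecture manuals.

```python
VA_BITS = 48  # assumption: 48 usable virtual address bits


def address_space_bytes(bits):
    """Number of distinct byte addresses reachable with the given number of bits."""
    return 1 << bits


def is_canonical(addr, va_bits=VA_BITS):
    """True if bits 63 down to (va_bits - 1) of a 64 bit address are all equal."""
    addr &= (1 << 64) - 1
    upper = addr >> (va_bits - 1)               # bits 63..47 when va_bits is 48
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


def sign_extend(addr, va_bits=VA_BITS):
    """Build the canonical 64 bit form by copying bit (va_bits - 1) upward."""
    low = addr & ((1 << va_bits) - 1)
    if low >> (va_bits - 1):                    # top usable bit is set
        low |= ((1 << 64) - 1) ^ ((1 << va_bits) - 1)
    return low


if __name__ == "__main__":
    print(address_space_bytes(VA_BITS) // 2**40, "TiB of virtual address space")
    print(hex(sign_extend(0x0000_8000_0000_0000)))
```

With 48 bits the virtual address space is 2^48 bytes, i.e. 256 TiB, far less than the 2^64 bytes a 64 bit word size might suggest.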

Assembly Language Programming

I vividly remember struggling to code in assembly language (for the 8086) in my microprocessors course during my Computer Science Engineering days. Assembly language works by directly loading values into registers and performing operations between them. For example, on the 8086, BP (Base Pointer) is a pointer register. Similarly, SP is the Stack Pointer and IP is the Instruction Pointer. On other architectures such as ARM, “LDR” is an instruction that loads a value from a memory address into a register, and “STR” is an instruction that stores a value from a register at a memory address.

The “PC” (Program Counter), which the 8086 calls the IP, is a register in the CPU that keeps track of the memory address of the next instruction to be executed. The “PUSH” instruction places the value of its operand onto the stack; a “CALL” instruction similarly pushes the return address (the address of the instruction after the call) so that execution can resume there later. The “MOV” instruction copies data from a source to a destination. The “ADD” and “SUB” instructions are self-explanatory.
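To make the register operations concrete, here is a toy 16 bit register machine in Python. It is a hypothetical sketch of my own, not an 8086 emulator: it only models MOV, ADD, SUB, PUSH and POP on a few registers, with PUSH storing its operand at the top of a downward-growing stack as the 8086 instruction does. The class name, the starting SP value and the memory model are all assumptions.

```python
MASK16 = 0xFFFF  # registers are 16 bits wide, as on the 8086


class ToyCPU:
    """A tiny, hypothetical 16 bit register machine (not a real 8086 emulator)."""

    def __init__(self):
        # SP starts at an arbitrary address chosen for this sketch.
        self.reg = {"AX": 0, "BX": 0, "CX": 0, "DX": 0, "SP": 0x0100}
        self.mem = {}  # sparse stack memory: address -> 16 bit value

    def value(self, operand):
        """An operand is either a register name or an integer literal."""
        return self.reg[operand] if isinstance(operand, str) else operand & MASK16

    def mov(self, dst, src):
        self.reg[dst] = self.value(src)

    def add(self, dst, src):
        self.reg[dst] = (self.reg[dst] + self.value(src)) & MASK16

    def sub(self, dst, src):
        self.reg[dst] = (self.reg[dst] - self.value(src)) & MASK16

    def push(self, src):
        self.reg["SP"] = (self.reg["SP"] - 2) & MASK16  # stack grows downward
        self.mem[self.reg["SP"]] = self.value(src)

    def pop(self, dst):
        self.reg[dst] = self.mem[self.reg["SP"]]
        self.reg["SP"] = (self.reg["SP"] + 2) & MASK16


def demo():
    cpu = ToyCPU()
    cpu.mov("AX", 5)
    cpu.mov("BX", 7)
    cpu.add("AX", "BX")  # AX = 12
    cpu.push("AX")       # save AX on the stack
    cpu.sub("AX", 20)    # 12 - 20 wraps around to 0xFFF8 in 16 bits
    cpu.pop("CX")        # CX = 12, SP back where it started
    return cpu.reg


if __name__ == "__main__":
    print(demo())
```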

The size of these registers depends on the word size of the processor. A processor with a 16 bit word has registers that are 16 bits wide (2 bytes). This means a register can hold 0 to 65,535 for unsigned binary numbers and -32,768 to 32,767 for signed (two's complement) binary numbers.
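The same arithmetic works for any register width. The short Python sketch below computes the ranges and shows how one bit pattern reads differently as signed or unsigned. It assumes two's complement representation for signed values; the function names are mine.

```python
def unsigned_range(bits):
    """Smallest and largest unsigned values that fit in the given width."""
    return 0, (1 << bits) - 1


def signed_range(bits):
    """Smallest and largest two's complement values that fit in the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def to_signed(value, bits):
    """Reinterpret an unsigned bit pattern as a two's complement signed value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        print(width, unsigned_range(width), signed_range(width))
```

For example, the 16 bit pattern 0xFFFF is 65,535 when read as unsigned and -1 when read as signed.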

Implication of 32/64 bit Processor with 32/64 bit Operating System

A 32 bit processor can run a 32 bit operating system, but you cannot run a 64 bit operating system (OS) on a 32 bit processor. A 64 bit OS is built from 64 bit instructions (and a 32 bit OS from 32 bit instructions). If the processor is 32 bit and the OS is 64 bit, you are in effect feeding 64 bit instructions to a processor that does not implement them, so the OS will not even boot. The same kind of mismatch applies within a single process: a 32 bit process cannot load 64 bit DLLs.

Conversely, you can run a 32 bit OS on a 64 bit x86 processor, and likewise 32 bit applications. Such a processor can run both 32 bit and 64 bit applications. However, a 32 bit application running on a 64 bit processor does not use the processor’s full potential: it is limited to 32 bit registers and a 32 bit (4 GB) address space.
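One concrete cost of running 32 bit code is that arithmetic on 64 bit values has to be split into 32 bit pieces. The Python sketch below mimics how 32 bit x86 code adds two 64 bit numbers (an ADD on the low halves followed by an ADC, add with carry, on the high halves), and also computes the address space a given pointer width can reach. It is an illustration under those assumptions, with names of my own choosing.

```python
MASK32 = 0xFFFFFFFF


def add64_using_32bit_ops(a, b):
    """Add two 64 bit values using only 32 bit additions.

    Returns (result, number of 32 bit additions used).
    """
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                 # 1 if the low addition overflowed 32 bits
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32  # the overall result wraps at 64 bits

    return (hi << 32) | lo, 2


def max_address_space_bytes(pointer_bits):
    """Upper bound on a flat address space for the given pointer width."""
    return 1 << pointer_bits


if __name__ == "__main__":
    print(add64_using_32bit_ops(0xFFFFFFFF, 1))
    print(max_address_space_bytes(32) // 2**30, "GiB")
```

A 64 bit program does the same addition with a single instruction on one register, and its pointers are not capped at 4 GiB.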

Floating Point Instructions

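On x86, floating point instructions operate on a separate set of registers (the x87 register stack and, on later processors, the SSE XMM registers) rather than on the general-purpose integer registers, because floating point values use a different encoding and need their own arithmetic hardware.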
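A floating point value is still just a pattern of bits; what differs is how the hardware interprets it. The Python sketch below exposes the 64 bit IEEE 754 pattern behind a float and splits it into its sign, exponent and fraction fields. The function names are my own.

```python
import struct


def float64_bits(x):
    """Return the 64 bit IEEE 754 pattern of a Python float as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def split_fields(x):
    """Split a double into (sign, biased exponent, fraction) bit fields."""
    bits = float64_bits(x)
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction


if __name__ == "__main__":
    print(hex(float64_bits(1.0)))
    print(split_fields(-2.0))
```

An integer ADD applied to these bit patterns would not give a meaningful floating point sum, which is why a separate set of instructions exists for them.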

Assembly-language programs have to be written in terms of the specific processor’s instruction set and architecture, such as its CPU registers, memory locations, and input/output device registers. The low-level code is translated into machine code by a simple process of transliteration, usually carried out by a program known as an assembler.
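The transliteration idea can be shown with a toy assembler in Python. The opcode values below are real single-byte 8086 encodings, but everything else is a deliberately tiny sketch of my own: it handles only a handful of instructions, one per line, and is nowhere near a real assembler.

```python
# A few real single-byte 8086 instruction encodings.
OPCODES = {
    "NOP": 0x90,
    "HLT": 0xF4,
    "RET": 0xC3,
    "INC AX": 0x40,
    "DEC AX": 0x48,
    "PUSH AX": 0x50,
    "POP AX": 0x58,
}


def assemble(source):
    """Translate one instruction per line into bytes; ';' starts a comment."""
    out = bytearray()
    for line_no, line in enumerate(source.splitlines(), start=1):
        # Strip the comment, upper-case, and collapse runs of whitespace.
        text = " ".join(line.split(";", 1)[0].upper().split())
        if not text:
            continue
        if text.startswith("MOV AX,"):
            imm = int(text.split(",", 1)[1].strip(), 0) & 0xFFFF
            out += bytes([0xB8, imm & 0xFF, imm >> 8])  # MOV AX, imm16 (little-endian)
        elif text in OPCODES:
            out.append(OPCODES[text])
        else:
            raise ValueError(f"line {line_no}: unsupported instruction {text!r}")
    return bytes(out)


if __name__ == "__main__":
    program = "mov ax, 0x1234 ; load a constant\ninc ax\npush ax\nret"
    print(assemble(program).hex(" "))
```

Each mnemonic maps almost one-to-one onto bytes, which is what makes assembling so much simpler than compiling a high-level language.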

The output of the assembler (the ‘object code’) can then be ‘linked’ with any library routines or external subroutines which are called from the program, and ‘located’ by inserting into the file the absolute addresses of the memory locations where the program code and data will be loaded in the target system. Because the code is target machine dependent and the translated code is unstructured, it is really difficult to debug such software and to test the output of assemblers.
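The ‘locate’ step can be pictured as patching address fields once the load address is known. The Python sketch below assumes a made-up object format of my own: the code is assembled as if it started at address 0, and a list of offsets marks where 16 bit little-endian absolute addresses are stored. Real object file formats are considerably richer than this.

```python
def locate(code, relocations, base):
    """Patch 16 bit little-endian address fields for a chosen load address.

    code        -- bytes assembled as if the program started at address 0
    relocations -- offsets in code where a 16 bit absolute address is stored
    base        -- address at which the program will actually be loaded
    """
    image = bytearray(code)
    for offset in relocations:
        addr = image[offset] | (image[offset + 1] << 8)
        addr = (addr + base) & 0xFFFF
        image[offset] = addr & 0xFF
        image[offset + 1] = addr >> 8
    return bytes(image)


if __name__ == "__main__":
    # MOV AX, 0x0006 followed by NOP; the immediate is treated as an address here.
    obj = bytes([0xB8, 0x06, 0x00, 0x90])
    print(locate(obj, [1], 0x0100).hex(" "))
```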

Assembly-language programming, which requires a detailed understanding of instruction sets and processor architecture, is normally only necessary in applications where it is critical that the processing models and programming constructs used in the design are supported properly at machine level. Typical examples are compilers, the kernels of operating systems, interface software including interrupt handling, and certain aspects of real-time software. This is the province of the ‘systems programmer’ rather than the ‘applications programmer’. Knowledge of assembly language programming is not essential for general applications programming.

Hope this is useful, thank you.
