Assembly Part 1 - Let's Learn Assembly!

June 23, 2020

In the beginning, there were punch cards. Eventually, someone got the bright idea to have the computer be programmable. Just type in the hexadecimal code and let it run. The problem is that it’s very difficult to look at hexadecimal and decipher what it does.

Enter Assembly

Assembly is still really down to the metal, where every detail of how the computer does its task must be specified. The difference is that Assembly makes these instructions human-readable.

The next step above that would be to use a higher-level programming language, such as C, Java, or TypeScript. This is certainly easier than using Assembly, but to this day, there are still tasks where even systems programming languages fall short. Some examples include:

  • Aggressive optimization (C and Rust are already very fast, but their compilers aren’t perfect)
  • Timing-critical code, since Assembly makes it easier to calculate exactly how long a program will take to run
  • Programs that have to work directly with hardware, such as drivers
  • The booting of an operating system

Requirements

Assembly isn’t the same on all systems, unfortunately. Different computers need different code to work. Here’s what you need for this tutorial:

  • An x86 computer (This won’t work on a Raspberry Pi, for example)
  • A 32-bit or 64-bit operating system (preferably Linux)
  • An assembler (the examples use NASM syntax; MASM on Windows uses different directives, so the code here would need adapting)
  • Experience in low-level programming (C, C++, Rust, and Go are good languages to know)

Sections

Executable programs can be divided into three sections (you can use more, but this tutorial will stick to three). Here they are:

  • text - This section contains the actual instructions that your program will run.
  • bss - Uninitialized global and static variables are stored here. You reserve space for them, but don’t give them a starting value.
  • data - This section is used for globals that are given a starting value in the source file.

Sections are declared simply by typing section .name. For example, the data section would be declared using:

section .data
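To see how the pieces fit together, here is a minimal sketch of a file that declares all three sections. The names answer and scratch are made up for this example, and dd (NASM’s directive for defining a four-byte value) isn’t used anywhere else in this tutorial. The three instructions at the end use registers and a Linux system call to exit cleanly; they assume 32-bit Linux system calls through int 0x80, which we haven’t covered yet.

```nasm
; skeleton.asm - a sketch of a file with all three sections
; (assumes NASM syntax and 32-bit Linux system calls)

section .data
	answer	dd 42		; hypothetical name: four bytes holding the value 42

section .bss
	scratch	resb 4		; hypothetical name: four bytes reserved, no value given

section .text
	global _start

	_start:
		mov eax, 1	; Linux system call number 1 = exit
		mov ebx, 0	; exit status 0
		int 0x80	; ask the kernel to run the system call
```

It doesn’t do anything visible. It just exits with status 0, but it shows where each kind of content lives.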

Variables

Variables, as we’ve already talked about, are stored in the bss section. We can’t give them a value when we declare them, as we would in a higher-level language. Instead, we tell the assembler exactly how many bytes to reserve.

section .bss
	var	resb 4

This creates a variable called var and reserves four bytes for it. If we wanted to reserve two bytes, we would put a 2 at the end instead. To access the value of var, we surround its name with square brackets: [var].
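Here’s a small sketch of the difference between var and [var]. It assumes NASM on 32-bit Linux and borrows two things we haven’t covered yet: registers (eax, ebx, ecx) and the exit system call, which hands the value in ebx back to the shell as the exit status. The name small is hypothetical.

```nasm
; brackets.asm - a sketch of reserving bytes and reading them back
; (assumes NASM syntax and 32-bit Linux system calls)

section .bss
	small	resb 2		; hypothetical name: two bytes reserved
	var	resb 4		; four bytes reserved

section .text
	global _start

	_start:
		mov dword [var], 7	; write 7 into the four bytes at var
		mov ecx, var		; no brackets: ecx = the address of var
		mov ebx, [var]		; brackets: ebx = the value stored at var
		mov eax, 1		; Linux system call number 1 = exit
		int 0x80		; exit, using the value in ebx as the status
```

If you build and run this on Linux, typing echo $? right afterwards should print 7, the value that was read through the brackets.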

Statements

A statement in Assembly has the following format:

mnemonic [operands] [;comment]

Let’s break it down.

The mnemonic is the name of the actual instruction to run. Some instructions take one operand. Some take multiple. There are many instructions in Assembly, but we’ll focus on the following ones.

Mnemonic  Operand 1  Operand 2  Description
mov       location   value      Sets operand 1 to operand 2
inc       location              Adds one to the location
dec       location              Subtracts one from the location
add       location   value      Adds the value to the location
sub       location   value      Subtracts the value from the location
jmp       label                 Jumps to a part of the program
cmp       value1     value2     Compares two values
je        label                 Jumps to a part of the program if the two values are equal
int       interrupt             Creates a software interrupt

Comments in Assembly are anything that comes after a semicolon (;). You should already be familiar with what these do – they help explain your code to other people who are reading it.

We’ll go into more detail about these instructions later. For now, here are some examples:

mov dword [var], 5	; var = 5 ("dword" marks a 32-bit operation)
dec dword [var]		; var --
add dword [var], 3	; var += 3
; See if you can come up with your own!
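Putting those instructions into a complete program gives something like the sketch below, with the value of var traced in the comments. It assumes NASM on 32-bit Linux. The dword keyword tells the assembler that each operation works on four bytes, and the last three lines (not part of the examples above) copy var into the ebx register and exit, so the final value becomes the exit status.

```nasm
; arithmetic.asm - a sketch that traces var through a few instructions
; (assumes NASM syntax and 32-bit Linux system calls)

section .bss
	var	resb 4

section .text
	global _start

	_start:
		mov dword [var], 5	; var = 5
		dec dword [var]		; var = 4
		add dword [var], 3	; var = 7
		sub dword [var], 2	; var = 5
		inc dword [var]		; var = 6
		mov ebx, [var]		; ebx = 6, used as the exit status
		mov eax, 1		; Linux system call number 1 = exit
		int 0x80		; ask the kernel to run the system call
```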

Labels

Consider the following C code:

int main(void) {
	int var = 0;
	while (1) {
		var ++;
	}
}

This code uses a while loop to repeat forever. Assembly doesn’t have loops that are this simple, though. In Assembly, you have to do something more like the following:

int main(void) {
    int var = 0;
    loop:
        var ++;
        goto loop;
}

You’d be forgiven for not knowing this is valid C code. (It’s pretty bad practice.) But in Assembly, that’s all you have. Let’s try to translate this to Assembly.

Let’s set up our program. We need a text section to store the program instructions, and a bss section to store our variable.

section .text
section .bss

We haven’t talked about this yet, but we need to say where execution of our program should start. We’ll make a label called _start and start there. The linker looks for _start by default, and we make the label visible to it using global _start.

section .text
	global _start

	_start:

Now we need to create our variable. We’ll use a 32-bit integer, which requires four bytes.

section .bss
	var resb 4

Now we need to initialize the variable. This is exactly what the mov instruction is for.

_start:
	mov dword [var], 0 ; We have "dword" here because it's a 32 bit operation

Now we need a loop. We’ll make a label, call it loop, and jump unconditionally to it.

_start:
	mov dword [var], 0
loop:
	jmp loop

Finally, we need to increment our variable.

section .text
	global _start

	_start:
		mov dword [var], 0
	loop:
		inc dword [var]
		jmp loop
section .bss
	var resb 4

I should probably mention how you can run this. Assuming that the file is called incrementor.asm, and you’re using NASM on Linux:

nasm -f elf incrementor.asm
ld -m elf_i386 -s -o incrementor incrementor.o
./incrementor

The program loops forever and never prints anything, so press Ctrl+C when you want to stop it.

Registers

Did you know that your CPU has built-in memory? 😲 Registers are memory that is built into the CPU. Because of this, using registers is lightning-quick compared to storing values in RAM.

So why don’t we just use registers for everything? Here’s the problem: we don’t have very many registers. This tutorial will only use four: eax, ebx, ecx, and edx. That will become a limitation later, but as long as we need four variables or fewer, it should work for us. We picked these four because they’re very easy to remember: they all follow the format e_x. Each of these registers can store one 32-bit number.

We can rewrite our infinite loop from before to use a register

section .text
	global _start

	_start:
		mov eax, 0
	loop:
		inc eax
		jmp loop

Now we don’t need any RAM at all!… except to store the actual program in memory. We also don’t need to specify the size of the operation. The size of eax is always four bytes.
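The infinite loop never uses cmp, je, or int from the table earlier, so here’s a hedged sketch of a loop that actually stops. It counts eax up to 10 and then exits with the counter as the exit status. The labels count and done are names picked for this example, and the exit sequence (eax = 1, then int 0x80) assumes a Linux system with 32-bit system calls available.

```nasm
; counter.asm - a sketch of a loop that ends, using only registers
; (assumes NASM syntax and 32-bit Linux system calls)

section .text
	global _start

	_start:
		mov eax, 0	; eax is the counter, starting at 0
	count:
		inc eax		; counter = counter + 1
		cmp eax, 10	; compare the counter with 10
		je done		; if they are equal, leave the loop
		jmp count	; otherwise go around again
	done:
		mov ebx, eax	; exit status = final counter value
		mov eax, 1	; Linux system call number 1 = exit
		int 0x80	; ask the kernel to run the system call
```

Built with the same nasm and ld commands as before, running it and then typing echo $? should print 10. Notice that je only makes sense right after a cmp: the comparison sets the result, and the jump acts on it.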

Conclusion

This concludes the basics of Assembly. Keep an eye out for the next article to see how we can write an actual program using Assembly.

