>>55471
Writing our first program
————————-
A C program is structured around a main function and its return value. The processor, meanwhile, only ever executes instructions one after another, so how does a program ever quit? If we never ask the operating system to terminate the program, it will eventually reach its last instruction and dutifully keep executing whatever bytes happen to follow it in memory - this will almost certainly end in a crash, either when a random memory location is accessed or when the bytes don't decode to a valid instruction. For this reason the Linux kernel provides a system call, exit, that lets us request termination. So we know a minimal C runtime environment must call the main function, and must call exit once main returns.
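As a rough illustration of that last point, here is a sketch in C of what the startup code has to do. The names program_main, fake_start and run_fake_start are made up for this example, and the real startup code in the C library does a lot more than this. The fork/waitpid part is only there so the exit status can be observed from the outside.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a program's main function. */
static int program_main(void) {
    return 7;
}

/* Hypothetical stand-in for the runtime's startup code:
 * call main, then hand its return value to the kernel. */
static void fake_start(void) {
    int status = program_main();
    _exit(status); /* request termination; never returns */
}

/* Runs fake_start in a child process and returns the exit status
 * the parent observes, or -1 if something went wrong. */
int run_fake_start(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        fake_start(); /* the child terminates in here */
    }
    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0) {
        return -1;
    }
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Calling run_fake_start() should hand back 7, the value program_main returned, since that is what the child passed to _exit.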
We will begin by writing a simple assembly program, exit.s, that demonstrates how a system call is actually made and is itself a minimal viable program.
exit.s:
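.text
.global _start
_start:
    movq $60, %rax    # system call number for exit
    movq $0, %rdi     # first argument: the exit status
    syscall           # ask the kernel to perform the call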
Let's analyze this cryptic little program and go over how to build it. Lines in the listing that begin with a period are "assembler directives": they cause the assembler to change mode or execute some kind of command. The ".text" directive tells the assembler that what follows goes into the section of the file that holds code rather than data. Other sections include ".data" and ".bss", both of which hold data.
Symbols are human-readable names for addresses in memory, such as the location of a variable or a function. The "_start" symbol is special: the linker uses it by default as the entry point of the executable. We define the symbol with the "_start:" line; the colon marks it as a label. The ".global" directive above it causes the assembler to export the symbol in the object file's symbol table so that the linker can see it - that is, it's not just internal, it's an address other object files or tools may be interested in. In this case, we need the linker to see the symbol so that it can write the correct address into the entry point of the executable. We don't know what memory address this is when writing the code, but it will be determined for us later.
Following this, we see the only two assembly instructions we need to know to write programs using system calls: movq and syscall. Here movq moves a value into a "register"; a register is a place to store data that is on the processor itself, not in memory. Knowing which registers are available, and what each one is used for, will become important, and is harder than the assembly instructions we just wrote.
The first instruction moves the value 60 into register rax. The "$" denotes a literal: 60 isn't an address, it's the value we want. Note that this 60 is in base 10, like you are used to reading.
It follows then, that the second instruction moves 0 into rdi.
Finally, the third instruction, "syscall", requests that the operating system perform a service for us.
But why the values 0 and 60? And how does the operating system know which service to perform for us? As you can imagine, these questions are related. %rax, the rax register, holds the system call number - the identity of the service - and the first argument to the service is passed in %rdi. For exit, a quick look at man 2 exit shows that this argument is the "return code" (exit status) the process will have. So, our process returns 0.
But how did we know that 60 was the correct value to put in %rax to cause an exit? That is the number of the syscall as defined by the kernel for x86-64. This information is made available by standard library header files as symbolic constants (SYS_exit, from <sys/syscall.h>), and a listing can be found here: http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/ .
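To see the symbolic constant in action from C, here is a small sketch that uses SYS_exit together with the C library's generic syscall() wrapper instead of hard-coding 60. The function name exit_status_via_syscall is made up for this example, and the fork/waitpid scaffolding is only there so the status can be read back.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__) && defined(__LP64__)
/* On 64-bit x86 Linux the constant should match the 60 used in exit.s. */
_Static_assert(SYS_exit == 60, "expected exit to be syscall 60");
#endif

/* Forks a child that terminates via the raw exit system call, and
 * returns the status the parent observes (or -1 on failure). */
int exit_status_via_syscall(int status) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        syscall(SYS_exit, status); /* number first, then the argument */
        _exit(127);                /* only reached if the call failed */
    }
    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0) {
        return -1;
    }
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

The point of the symbolic name is that the number differs between architectures, so code using SYS_exit keeps working where a literal 60 would not.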
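Which registers to use for passing arguments is defined in the System V AMD64 ABI. For system calls, arguments go in %rdi, %rsi, %rdx, %r10, %r8, %r9. Ordinary function calls use %rcx in the fourth position instead, but the syscall instruction itself overwrites %rcx (and %r11), so the kernel interface uses %r10 there.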
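The same register setup can be written from C with GCC-style inline assembly, which may make the roles of the registers easier to see. This is only a sketch: the function names are made up, the inline assembly path applies only to x86-64 Linux, and on any other platform the code falls back to the C library's syscall() wrapper so that it still builds.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Terminates the calling process with the given status. */
static void exit_with_registers(long status) {
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)              /* the kernel's result comes back in rax */
        : "a"(60L),              /* rax: system call number (exit)        */
          "D"(status)            /* rdi: first argument                   */
        : "rcx", "r11", "memory" /* syscall overwrites rcx and r11        */
    );
    (void)ret;
#else
    syscall(SYS_exit, status);   /* fallback for other platforms */
#endif
    _exit(127);                  /* only reached if the call failed */
}

/* Forks a child that exits through exit_with_registers and returns
 * the status the parent observes (or -1 on failure). */
int exit_status_via_registers(long status) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        exit_with_registers(status);
    }
    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0) {
        return -1;
    }
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

The "a" and "D" constraints are how GCC spells rax and rdi, so the asm statement is doing the same job as the two movq lines in exit.s.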
We build the thing using the as and ld commands. as is used to generate object code:
as exit.s -o exit.o
ld is used to link that object code into a runnable executable:
ld -o exit exit.o
Thus we can exit.