/cyber/ - Escaping the Standard C library: Writing LS

File: 8f01d3a42cc8e20⋯.png (39.34 KB,470x470,1:1,5-step-infographic.png)

Escaping the Standard C library: Writing LS User 03/28/19 (Thu) 00:05:13 No.55470

This post contains C and x86_64 code for Linux that builds a functional program without the C library. This is generally not useful; it is done as a pedantic exercise to learn how the Linux runtime environment and the C library work. In this work, we will implement a small, incomplete, and probably bug-ridden ls, including various standard C library functions as part of that implementation.

What is the C Library?

———————–

libc, the standard C library, is the common set of functions used by C programmers, including the most commonly used functions like malloc() and printf(). If you have written C, there is a good chance you have used these functions, but have you ever wondered how they are implemented? How does printf() actually get text to your terminal? Unfortunately, most of those details are abstracted away by the kernel and the terminal software, and we will not uncover the whole mystery simply by writing printf(); thankfully, printf() knows absolutely nothing about your terminal. Libc makes the same functions (e.g. printf()) available across all platforms, so using libc, or libc-conformant functions, simplifies porting applications between platforms. Libc does this by taking the services that each operating system offers and building the standard functions out of them; that is to say, libc abstracts the operating system away from the code.

# C Program #

# Standard C library #

# Operating System Kernel #

Operating System Services / Application Programming Interfaces

————————————————————–

The Linux operating system provides services to programs through "system calls". System calls are similar to functions, but they are run by the kernel (Linux), which lets them manipulate hidden, trusted data structures, transfer data between processes, write files, create new processes, and provide a wide variety of other services. A peculiarity of Unix-like operating systems is that everything is treated as a file, and the system calls open(), read(), write(), and close() are used to interact with files. In the end, our printf() example only uses the write() system call on file descriptor 1 (standard output), which usually refers to your terminal. The manual pages for system calls are in section 2; for example, the page for write() can be read with: man 2 write
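To illustrate that last point, here is a minimal sketch that sends a string to standard output with one write system call on file descriptor 1, which is all printf() does once the text is formatted. It assumes Linux with glibc and still goes through glibc's generic syscall() wrapper, so it is not libc-free yet; print_via_write is a hypothetical name.

```c
#define _GNU_SOURCE            /* make sure syscall() is declared */
#include <stddef.h>            /* size_t */
#include <sys/syscall.h>       /* SYS_write */
#include <unistd.h>            /* syscall(), STDOUT_FILENO */

/* Send a NUL-terminated string to standard output using only the write
 * system call: no formatting, no buffering, nothing terminal-specific.
 * Returns the number of bytes written, or -1 on error. */
long print_via_write(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')     /* a hand-rolled strlen() */
        len++;
    return syscall(SYS_write, (long)STDOUT_FILENO, s, len);
}
```

Calling print_via_write("hello\n") should print hello and return 6, the number of bytes the kernel accepted.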

____________________________

Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.

User 03/28/19 (Thu) 00:06:41 No.55471

>>55470

The Linux Run-Time Environment

——————————

When a process is created, different regions of its address space are set aside for different purposes; these regions are called "segments". Each Linux process has a text, data, heap, and stack segment, each with a specific purpose.

The text segment contains the actual machine code run by the processor. When a program starts, the instruction at an address called the "entry point" is executed first, and the following instructions execute sequentially, although special instructions called branches can jump elsewhere; conditional branches are what C loops and if statements are built from.

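The data segment contains static program data: strings, initial values for global and static variables, and other constant structures. Despite holding constants, it is not entirely read-only: initialized writable variables live in the .data section and zero-initialized ones in .bss, while read-only data such as string literals is placed in .rodata and mapped without write permission.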
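As a rough illustration, the hypothetical globals below would typically land in the three data-related sections just described when compiled with GCC on x86_64 Linux; running objdump -t on the object file shows the section each symbol was placed in. The helper code_below_data is also hypothetical, and its check relies on the usual linker layout (code mapped below writable data), which is a convention rather than a language guarantee.

```c
#include <stdint.h>

int initialized_counter = 5;      /* initialized and writable: .data   */
int zeroed_counter;               /* zero-initialized, writable: .bss  */
const char greeting[] = "hello";  /* read-only constant: .rodata       */

/* The function's machine code itself lives in .text. With the default
 * linker layout on x86_64 Linux, .text is mapped below .data and .bss,
 * so this returns 1 there. */
int code_below_data(void)
{
    return (uintptr_t)&code_below_data < (uintptr_t)&initialized_counter;
}
```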

The heap is a segment used for dynamically allocated memory. When a process requests dynamically allocated memory, additional memory is allocated on the heap (on Linux, by moving the "program break" with the brk() system call, or by asking for a new mapping with mmap()). This might be slightly confusing; a clever reader might ask: if the heap is already allocated when the process is loaded into memory, how can a process request more? The answer is that segments are just regions of address space. Although the whole address space exists from the moment the process is created, not all of it is actually usable; if it were, running any program would consume massive amounts of memory. Instead, the kernel only allows access to addresses it knows are supposed to be used: technically, addresses that have a mapping in the page table, which translates the virtual addresses of the process's address space into the physical addresses of real memory (the page table is what lets many processes share, or multiplex, RAM). An access outside allocated memory, i.e. to an invalid address, leaves the kernel nothing reasonable to do other than kill the process, since no data can meaningfully be read or written there; it does this by sending the process SIGSEGV, a segmentation fault.
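To make "requesting more heap" concrete, here is a sketch of a tiny sbrk() built directly on the brk system call (number 12 on x86_64 Linux), using GCC-style inline assembly. The names raw_brk and tiny_sbrk are hypothetical, and the code assumes x86_64 Linux.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw brk(2) on x86_64 Linux (system call number 12). The kernel returns
 * the new program break on success and the unchanged break on failure. */
static void *raw_brk(void *addr)
{
    long ret;
    __asm__ __volatile__ ("syscall"
                          : "=a"(ret)
                          : "a"(12L), "D"(addr)
                          : "rcx", "r11", "memory");
    return (void *)ret;
}

/* Hypothetical minimal sbrk(): grow the heap by `increment` bytes and
 * return the start of the new region, or NULL if the kernel refused. */
void *tiny_sbrk(size_t increment)
{
    uintptr_t old_break = (uintptr_t)raw_brk(NULL);  /* brk(0) reports the break */
    uintptr_t wanted = old_break + increment;
    uintptr_t new_break = (uintptr_t)raw_brk((void *)wanted);
    if (new_break < wanted)
        return NULL;                                 /* request not granted */
    return (void *)old_break;
}
```

Reading the break with brk(0) works because the kernel rejects any address below the start of the heap and simply returns the current break. The sketch is not meant to be mixed with malloc() in a real program, because glibc's allocator keeps its own record of where the break is.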

The stack contains the information that allows C function calls to work: all temporary structures for a function call, including its arguments (those not passed in registers), its local variables, and where it was called from (the "return address"), are stored here. The return address is used to resume the calling function at the correct point once the current function finishes. When the stack grows until no room remains for new function calls, a "stack overflow" occurs; this typically only happens with deep recursion.
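A small sketch of frames stacking up: each call gets its own frame, and on x86_64 the stack grows toward lower addresses, so a callee's local variable should sit at a lower address than its caller's. Both function names are hypothetical; noinline (a GCC/Clang attribute) keeps the callee a real call, and comparing addresses of different objects like this is an implementation-specific observation, not portable C.

```c
#include <stdint.h>

/* Runs in its own, newer stack frame. */
__attribute__((noinline))
static int callee_frame_is_lower(const volatile char *caller_local)
{
    volatile char callee_local = 0;
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Returns 1 if the callee's frame lies below the caller's frame. */
int stack_grows_down(void)
{
    volatile char caller_local = 0;
    int result = callee_frame_is_lower(&caller_local);
    caller_local = 1;   /* keeps this frame live across the call */
    return result;
}
```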

In Linux, new processes are only ever created from existing processes (apart from the very first one, which the kernel starts itself), by forking. When a process is forked, the child inherits certain data structures from its parent, including the three standard file descriptors: standard input (0), standard output (1), and standard error (2), which identify the files that input is read from, output is sent to, and error messages are sent to, respectively.
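A sketch of descriptor inheritance, using glibc's fork(), write(), _exit() and waitpid() wrappers for brevity (so, unlike the rest of this thread, it relies on libc): the child writes through file descriptor 1 without opening anything, because it inherited the parent's standard output. fork_and_write is a hypothetical name.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>     /* fork(), write(), _exit(), STDOUT_FILENO */

/* Fork a child that writes `msg` to file descriptor 1, which it never
 * opened: it inherited the descriptor from its parent. Returns the child's
 * exit status (0 if the whole message was written) or -1 on failure. */
int fork_and_write(const char *msg, size_t len)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        ssize_t n = write(STDOUT_FILENO, msg, len);
        _exit(n == (ssize_t)len ? 0 : 1);   /* _exit skips stdio cleanup */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```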

User 03/28/19 (Thu) 00:42:57 No.55472

>>55471

Writing our first program

————————-

The structure of a C program defines a main function and a return value. Meanwhile, the processor only ever executes instructions sequentially, so how do we ever quit? If we don't explicitly ask the operating system to terminate the program, it will eventually reach its last instruction and dutifully continue executing whatever random data comes after it; this will almost certainly crash, either when a random memory location is accessed or when the bytes don't decode to a valid instruction. So the Linux kernel provides an exit() system call that lets us request termination. A minimal C runtime environment must therefore call the main function, and then call exit() once main returns.

We will begin by writing a simple assembly program, exit.s, that demonstrates how a system call is actually made; it is a minimal viable program.

exit.s:

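.text
.global _start

_start:
	movq $60, %rax	# system call number 60 is exit
	movq $0, %rdi	# first argument: the exit status, 0
	syscall		# ask the kernel to perform the call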

Let's analyze this cryptic little program and go over how to build it. Lines in the listing that begin with a period are "assembler directives", which change the assembler's mode or make it execute some kind of command. The ".text" directive tells the assembler that this section of the file contains code rather than data; other sections include ".data" and ".bss", both of which hold data.

Symbols are human-readable names that stand for an address in memory, for example a variable name or the location of a function. The "_start" symbol is a special symbol that marks the entry point of the executable; we define it with the "_start:" line, where the colon marks it as a label. The ".global" directive before it causes the assembler to export the symbol to the global symbol table so that the linker can see it; that is, it is not just internal, it is an address that other programs or tools may be interested in. In this case, the linker needs to see the symbol so that it can write the correct address into the entry point field of the executable. We don't know what memory address this is when writing the code; it will be determined later for us.

Following this, we see the only instructions needed to make a system call: movq and syscall. movq moves a value into a "register"; a register is a place to store data on the processor itself, not in memory. Knowing which registers are available, and what each one is used for, is important, and is more difficult than the assembly instructions we just wrote.

The first instruction moves the value 60 into the register rax. The "$" marks an immediate (literal) value; that is, 60 isn't an address, it's the value we want. Note that this 60 is in base 10, as you are used to reading.

It follows then, that the second instruction moves 0 into rdi.

Finally, the third instruction, "syscall", asks the operating system to perform a service for us.

But why the values 0 and 60? And how does the operating system know which service to perform for us? As you can imagine, these questions are related. %rax, the rax register, holds the system call number, which identifies the service, and the first argument to the service is passed in %rdi. For exit, a quick look at man 2 exit shows that this argument is the status the process reports to its parent. So our process returns 0.

But how did we know that 60 was the correct value to put in %rax to cause an exit? That is the system call's number as defined by the kernel. It is made available to C programs as a symbolic constant (SYS_exit in <sys/syscall.h>, __NR_exit in <asm/unistd.h>), and a listing can be found here: http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/ .

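Which registers carry the arguments is defined by calling conventions. For ordinary function calls, the System V AMD64 ABI passes the first six integer arguments in %rdi, %rsi, %rdx, %rcx, %r8, %r9. The Linux system call convention is almost the same, except that the fourth argument goes in %r10 instead of %rcx, because the syscall instruction itself overwrites %rcx (and %r11). The kernel puts the result back in %rax.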
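A sketch of a general six-argument system call helper that follows the kernel convention just described, in the same style as the inline-assembly wrappers used by C libraries such as musl (raw_syscall6 is a hypothetical name; x86_64 Linux and GCC or Clang are assumed). Local register variables pin the fourth to sixth arguments to r10, r8 and r9, since there are no single-letter asm constraints for those registers.

```c
/* Raw x86_64 Linux system call: number in rax, arguments in rdi, rsi,
 * rdx, r10, r8, r9. The syscall instruction overwrites rcx and r11, and
 * the kernel may read or write memory we point it at. The result comes
 * back in rax; values from -4095 to -1 are negated errno codes. */
long raw_syscall6(long nr, long a1, long a2, long a3,
                  long a4, long a5, long a6)
{
    register long r10 __asm__("r10") = a4;
    register long r8  __asm__("r8")  = a5;
    register long r9  __asm__("r9")  = a6;
    long ret;
    __asm__ __volatile__ ("syscall"
                          : "=a"(ret)
                          : "a"(nr), "D"(a1), "S"(a2), "d"(a3),
                            "r"(r10), "r"(r8), "r"(r9)
                          : "rcx", "r11", "memory");
    return ret;
}
```

For example, raw_syscall6(39, 0, 0, 0, 0, 0, 0) is getpid() (system call 39 on x86_64), and raw_syscall6(1, -1, (long)"x", 1, 0, 0, 0) returns -9, i.e. -EBADF, because -1 is not an open file descriptor.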

We build the program with the as and ld commands. as assembles the source into object code:

as exit.s -o exit.o

ld then links the object file into an executable:

ld -o exit exit.o

Running ./exit prints nothing, but echo $? immediately afterwards shows its exit status, 0. Thus we can exit.

User 03/28/19 (Thu) 01:35:46 No.55474

Why do people code for themselves anyway? Programmable ICs are cheap, and writing stuff in terms of logic statements is not that hard.

User 03/28/19 (Thu) 01:49:57 No.55476

>>55474

This post is clearly not intended for you if you think that ICs can replace coding. I will await your web browser IC.

User 03/28/19 (Thu) 02:02:02 No.55477

>>55476

Why would I write my own web browser when 99% of web pages are specifically designed to interface with either Internet Explorer 9 or Chrome, and are therefore botnets that would be overtly hostile to homebrewed browsers, probably displaying a simple page along the lines of a plaintext "Please use a supported browser." error?

User 03/28/19 (Thu) 02:09:42 No.55478

>>55477

That went over your head - trying to use an IC for a highly complex and constantly evolving task like rendering a web page is retarded. Programming is useful.

>Why do people code

doesn't make much sense as a question. They code because computers provide immense utility only unlocked through coding.

>Why do people code for themselves anyway?

To learn to code. Why do people study anything.

User 03/28/19 (Thu) 14:47:11 No.55484

File: 40e25c341123694⋯.jpg (84.52 KB,400x644,100:161,forth-on-the-atari-ISBN-38….jpg)

>>55477

You would, if you wrote your own OS that's not x86/Unix/C based, as most boring projects are these days.

User 03/28/19 (Thu) 14:53:54 No.55485

>>55484

See >>54606

User 03/28/19 (Thu) 15:48:54 No.55486

File: 98c48fa47e4088f⋯.mp4 (11.69 MB,640x360,16:9,output.mp4)

>>55485

Yeah, I read that Neal Stephenson article the first time it was linked on slashdot decades ago. Anyway it's just some words. A lot of people talk, but don't actually do anything interesting. Terry Davis actually did something different and interesting, but in the end he realized that x86 was the wrong platform because:

a) it translates CISC->RISC internally and this defeats all the hard work he did to optimize the CISC code generated by his compiler

b) in this particular case, CPU literally means cianigger processing unit

He had laid out some preliminary plans to move his OS to 64-bit PIC, but has been MIA or KIA for some time now.

ColorForth is another source of inspiration, and a *real* project that showed you don't have to be bound to Unix and C. It was for x86, but there's nothing that prevents someone from making a similar project on ARM or whatever.

https://colorforth.github.io/

I've been running Linux since 1995, then OpenBSD, and more recently NetBSD on ARM. They basically get the job done, but there's nothing terribly interesting here for me, nothing truly fulfilling. I actually had a lot more fun on older systems like Z80-based home computers and also Amiga. Ever since Windows became the de facto standard (mid 90's with Win95, and the death of Commodore, Atari, and various other alternatives), I've been running Unix-based stuff simply because it's not as bad. But they too have been getting needlessly complicated and subverted.

User 03/28/19 (Thu) 19:08:20 No.55489

>>55486

Terry Davis has nothing to do with Unix being a good or bad idea. Everything as a file is an awesome idea. I am actually far more interested in 9p and Inferno than TempleOS, which I used when it was called ThorOS.

x86 is a bad platform for a variety of reasons, but I am not sold on the CISC->RISC overhead being significant at the scale of desktop CPUs, or unpredictable for optimization. x86 is bad because it's a closed design that no one but the US Gov and Intel have seen.

>Z80-based home computers

>x86 has too much overhead with CISC->RISC, let's just opt for the pure CISC ultra slow clone.

I understand a bit, though: building your own means more control and more freedom, which is what I am trying to do with this post: show people how to build the standard C library, and what system calls and ABIs are.

Generally, people ranting about how unix is horrible don't have a decent alternative. nb4 lispfag, lisp isn't a kernel, there is no reason a unix/c system couldn't dump you at a debugger when a process received SIGSEGV.

User 03/30/19 (Sat) 22:04:06 No.55519

>>55472

No more time to write tutorials; here is the code for a hyper-simple, stdlib-free ls:

ls.c


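#include <sys/types.h>   /* ssize_t, off_t, ino_t */
#include <unistd.h>      /* prototypes for write(), close() and syscall() */
#include <stdarg.h>      /* va_list: provided by the compiler, not by libc */
#include <stddef.h>      /* size_t */
#include <sys/syscall.h> /* SYS_* system call numbers */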
#define EXIT_FAILURE 1
#define EXIT_SUCCESS 0
#define STDOUT 1
#ifndef BUF_SIZE
#define BUF_SIZE 1024
#endif
#ifndef PRINTF_BUF_MAX
#define PRINTF_BUF_MAX 64
#endif
#define O_RDONLY 00000000
#define FALSE 0

struct linux_dirent64 {
	ino_t d_ino; /*inode number*/
	off_t d_off; /*offset to next structure */
	unsigned short d_reclen; /*size of this dirent*/
	unsigned char d_type; /*file type*/
	char d_name[]; /* filename */
};

struct iovec {
	void *iov_base;
	size_t iov_len;

};

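/* Prototypes for the libc replacements defined below, so that main()
 * can call them before their definitions appear. */
int getdents64(unsigned int fd, struct linux_dirent64 *dirp, unsigned int count);
ssize_t writev(int fd, const struct iovec *iov, int iovcnt);
int open(const char *pathname, int flags);
void exit(int status) __attribute__((noreturn));
int printf(const char *fmt, ...);
int puts(const char *s);
int strcmp(const char *s1, const char *s2);
size_t strlen(const char *s);
char *strncpy(char *dest, const char *src, size_t n);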

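static const char *cwd = ".";

int
main(int argc, char *argv[])
{
	/* getdents64 fills buf with variable-length records; keep it 8-byte
	 * aligned so the 64-bit d_ino/d_off fields are read from aligned addresses. */
	char buf[BUF_SIZE] __attribute__((aligned(8)));
	const char *dir = (argc > 1) ? argv[1] : cwd;
	int fd = open(dir, O_RDONLY);
	if (fd == -1)
	{
		write(2, "Error!\n", 7);	/* errors go to standard error (fd 2) */
		exit(EXIT_FAILURE);
	}
	int bytes_read;
	/* A single call may not return every entry, so keep reading until
	 * getdents64 returns 0 (end of directory) or -1 (error). */
	while ((bytes_read = getdents64(fd, (struct linux_dirent64 *)buf, BUF_SIZE)) > 0)
	{
		char *pos = buf;
		while (pos < buf + bytes_read)
		{
			struct linux_dirent64 *dirp = (struct linux_dirent64 *)pos;
			if (strcmp(dirp->d_name, "..") != 0 && strcmp(dirp->d_name, ".") != 0)
				printf("%s  ", dirp->d_name);
			pos += dirp->d_reclen;	/* records are variable length */
		}
	}
	puts("");
	close(fd);
	exit(bytes_read == -1 ? EXIT_FAILURE : EXIT_SUCCESS);
}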

int
puts(const char *s)
{
	/* Gather the string and its trailing newline into one writev() call. */
	struct iovec out[2];
	out[0].iov_base = (void *)s;
	out[0].iov_len = strlen(s);
	out[1].iov_base = "\n";
	out[1].iov_len = 1;
	return writev(STDOUT, out, 2) < 0 ? -1 : 0;
}

User 03/30/19 (Sat) 22:05:28 No.55520

>>55519

ls.c, part 2


int
printf(const char *fmt, ...)
{
	/* Minimal printf: supports only %s and %%. Output is collected in a
	 * fixed buffer and truncated at PRINTF_BUF_MAX bytes. */
	char buffer[PRINTF_BUF_MAX];
	va_list ap;
	va_start(ap, fmt);
	int i = 0;
	for (const char *c = fmt; *c != '\0' && i < PRINTF_BUF_MAX; c++)
	{
		if (*c != '%')
		{
			buffer[i++] = *c;
			continue;
		}
		c++;	/* look at the conversion character after '%' */
		if (*c == 's')
		{
			const char *s1 = va_arg(ap, const char *);
			size_t len = strlen(s1);
			if (len > (size_t)(PRINTF_BUF_MAX - i))
				len = PRINTF_BUF_MAX - i;	/* truncate rather than overflow */
			strncpy(&buffer[i], s1, len);
			i += (int)len;
		}
		else if (*c == '%')
		{
			buffer[i++] = '%';
		}
		else if (*c == '\0')
		{
			break;	/* a lone '%' at the end of the format string */
		}
	}
	va_end(ap);
	write(STDOUT, buffer, i);
	return i;
}

char *
strncpy(char *dest, const char *src, size_t n)
{
	size_t i;
	for(i = 0; i < n && src[i] != 00; i++)
	{
		dest[i] = src[i];
	}
	for(; i < n; i++)
	{
		dest[i] = 00;
	}
	return dest;
}

int
strcmp(const char *s1, const char *s2)
{
	/* Standard C compares the first differing bytes as unsigned char. */
	const unsigned char *p1 = (const unsigned char *)s1;
	const unsigned char *p2 = (const unsigned char *)s2;
	/* Advance while the strings agree and s1 has not ended. */
	while (*p1 != '\0' && *p1 == *p2)
	{
		p1++;
		p2++;
	}
	return (*p1 > *p2) - (*p1 < *p2);	/* 1, 0 or -1 */
}

size_t
strlen(const char *s)
{
	size_t i = 0;
	while (s[i] != '\0')	/* count bytes up to the terminating NUL */
		i++;
	return i;
}

int 
getdents64(unsigned int fd, struct linux_dirent64 *dirp, unsigned int count)
{
	return syscall(SYS_getdents64, fd, dirp, count);
}

User 03/30/19 (Sat) 22:07:45 No.55521

>>55520

ls.c, part 3



ssize_t
write(int fd, const void *buf, size_t count)
{
	return syscall(SYS_write, fd, buf, count);
}


ssize_t
writev(int fd, const struct iovec *iov, int iovcnt)
{
	return syscall(SYS_writev, fd, iov, iovcnt);
}

int
open(const char *pathname, int flags)
{
	return syscall(SYS_open, pathname,flags);
}
int
close(int fd)
{
	return syscall(SYS_close, fd);
}

void
exit(int status)
{
	syscall(SYS_exit, status);
	for (;;)
		;	/* not reached: the process is gone after SYS_exit */
}

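long
syscall(long number, ...)
{
	/* Fetch the arguments one at a time: the order in which asm operand
	 * expressions are evaluated is unspecified, so calling va_arg inside
	 * the operand list could put them in the wrong registers. Reading more
	 * arguments than a caller passed is formally undefined; on x86_64 it
	 * just picks up leftover values that the kernel ignores for that call. */
	va_list ap;
	va_start(ap, number);
	long a1 = va_arg(ap, long);
	long a2 = va_arg(ap, long);
	long a3 = va_arg(ap, long);
	va_end(ap);
	long result;
	__asm__ __volatile__ ("syscall"
	    : "=a" (result)
	    : "a" (number), "D" (a1), "S" (a2), "d" (a3)
	    : "rcx", "r11", "memory");	/* the kernel may read or write our buffers */
	/* The kernel reports errors as -errno; this minimal libc returns -1. */
	return result < 0 ? -1 : result;
}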

Here is a loader (the program entry point, which calls main):

loader.s


.globl _start
.text
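_start:
	movq	(%rsp), %rdi	# argc is at the top of the initial stack
	leaq	8(%rsp), %rsi	# argv starts just above it
	andq	$-16, %rsp	# the ABI wants %rsp 16-byte aligned before a call
	call	main
	movl	%eax, %edi	# if main returns, use its value as the exit status
	movl	$60, %eax	# SYS_exit
	syscall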
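The loader depends on how the kernel lays out the initial stack for _start on x86_64 Linux: argc in the first 8-byte slot, then the argv pointers, a NULL, then the environment pointers and another NULL (the auxiliary vector follows). The sketch below decodes that layout from an array of pointer-sized slots, so it can be tried without writing a real entry point; struct start_args and decode_initial_stack are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct start_args {
    long argc;
    char **argv;
    char **envp;
};

/* `sp` points at the first pointer-sized slot of the initial stack:
 *   sp[0]              argc (stored as an integer)
 *   sp[1..argc]        argv[0] .. argv[argc - 1]
 *   sp[argc + 1]       NULL, ending argv
 *   sp[argc + 2] ...   envp entries, ending with NULL */
struct start_args decode_initial_stack(char **sp)
{
    struct start_args a;
    a.argc = (long)(intptr_t)sp[0];
    a.argv = sp + 1;                 /* argv begins one slot above argc */
    a.envp = sp + 1 + a.argc + 1;    /* skip argv and its NULL */
    return a;
}
```

A real entry point would then call main(a.argc, a.argv) and pass its return value to exit().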

And finally, a Makefile:

Makefile


CFLAGS = -g -march=native -nostdlib -static -ffreestanding -fno-stack-protector
ls : ls.c loader.s
	cc $(CFLAGS) ls.c loader.s -o ls
clean :
	rm -f ls

User 03/30/19 (Sat) 22:10:21 No.55522

>>55521

I won't return to this board for a few years due to a change in my life, I wanted to write a tutorial for you guys, but I ran out of time. Godspeed anons. Stay 水.

User 03/30/19 (Sat) 23:19:13 No.55524

>>55522

take care anon, we will await your ls.c 4.0

User SAGE! 09/28/20 (Mon) 12:33:27 No.58095

asdf
