/prog/ - Programming



41a193 No.3734

Hi. For years I've been writing most of my things in Python, where calling external commands is considered completely unacceptable, and working on low-level things in C and assembly, where I barely even need libc most of the time. So calling external commands from anything other than a shell script feels very weird to me.

Now, after opening my eyes to (the awesomeness that is) Lua, I'm wrapping a Lua script around a tool written in shell (for a number of reasons, it can't be written, or would be very cumbersome to write, in anything else). I'm also calling curl somewhere in the script. I do this because I've used libcurl in C before, and the library doesn't seem to be designed for simply fetching a file, whereas the command-line interface is.

I'm aware that calling external commands breaks portability, but that is a non-issue in this case.

Now I'm debating whether to call find -type f or use LuaFileSystem to recursively iterate over every file in a directory.

I'd like to know: which is the more acceptable option, calling curl and find, or using their respective libraries? And is one of the options objectively better?

____________________________
Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.

41a193 No.3740

Using the libraries and APIs is better unless you need the flexibility of running an arbitrary program. Don't call an external program unless you need to. On top of the hassle of forking and exec'ing, it's also much more expensive and far less efficient: every call creates a new process and loads a new program image.

Also, curl is just a thin wrapper around libcurl anyway.

It's hard to say exactly what the proper way would be without knowing your exact use case. The best option, though, is either libcurl and readdir (readdir_r is deprecated in current glibc) from C, or luacurl and LuaFileSystem from Lua, depending on which part of your program needs that functionality and how tightly integrated you want it to be.

You usually don't want to fork and exec unless you need to invoke external functionality dynamically, such as letting the user specify a command to run (as in a shell), or in any system whose user plugins should be language-agnostic, can communicate through program output, and don't need real IPC.

Don't call an external command unless you really need to and it fits your workflow. Calling out to something like curl or find is redundant, pointless, and far more bug-prone. Not to mention that managing the forked process as a child (reading its output through a pipe, making sure it exits properly, reaping it, and everything that goes with it) is much more work than just using libcurl and readdir properly.

Here's the basic way with fork and exec:


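#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BUFSIZE 4096

// Defined elsewhere; see the note below
ssize_t splitfilenames(char *buffer, ssize_t size);

int listfiles(void)
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
    {
        return 1;
    }

    pid_t pid = fork();

    if (pid == -1)
    {
        // Some error forking
        close(pipefd[0]);
        close(pipefd[1]);
        return 1;
    } else if (pid > 0)
    {
        // I am the parent
        // Close writing end, otherwise read() never sees EOF
        close(pipefd[1]);

        ssize_t newstart = 0;
        char buffer[BUFSIZE];

        while (true)
        {
            // A single name longer than the buffer can't be handled here
            if (newstart >= BUFSIZE - 2)
            {
                break;
            }

            // Read at most BUFSIZE - newstart - 2 bytes, leaving room for
            // the two characters injected at EOF
            ssize_t got = read(pipefd[0], buffer + newstart, BUFSIZE - newstart - 2);
            if (got == -1)
            {
                if (errno == EINTR)
                {
                    continue;
                }
                break; // Real read error
            }

            // The buffer was already offset, so size is the length of the
            // active buffer, not just the characters read from the pipe
            ssize_t size = newstart + got;

            if (got == 0)
            {
                // EOF: read() returns 0 once the child's end is closed.
                // Terminate a trailing name that has no newline and flush it.
                if (size > 0)
                {
                    buffer[size] = '\n';
                    buffer[size + 1] = '\0';
                    splitfilenames(buffer, size + 1);
                }
                break;
            }

            buffer[size] = '\0';
            // This calls dosomethingtofile on each complete filename, then
            // moves a trailing partial one to the beginning of the buffer
            // and returns its length
            newstart = splitfilenames(buffer, size);
        }
        // Closing the read end also makes a still-writing child get SIGPIPE
        close(pipefd[0]);

        // Reap the child (its exit status is not checked here)
        int status;
        while (waitpid(pid, &status, 0) == -1)
        {
            if (errno != EINTR)
            {
                return 1;
            }
        }
        return 0;
    } else
    {
        // I am the child
        // Close reading end and point stdout at the writing end
        close(pipefd[0]);
        if (dup2(pipefd[1], STDOUT_FILENO) == -1)
        {
            _exit(127);
        }
        close(pipefd[1]);
        // Exec the command
        execlp("find", "find", "/path/to/dir", "-type", "f", (char *)NULL);
        _exit(127); // Only reached if exec failed
    }
}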

Note that you still need to define splitfilenames, which calls dosomethingtofile on each filename it finds; if the buffer ends with a fragment of a name, it moves that fragment to the beginning of the buffer and returns its size (or zero if there is no fragment).
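A minimal sketch of splitfilenames under that contract. The signature (buffer plus number of valid bytes, returning the fragment length as ssize_t) is an assumption chosen to match how the loop calls it, and the dosomethingtofile here is a hypothetical stand-in that only counts names and remembers the last one so the behaviour is observable.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for dosomethingtofile: counts the names it is
   given and keeps a copy of the most recent one. */
static int filecount = 0;
static char lastname[256];

void dosomethingtofile(const char *name)
{
    filecount++;
    snprintf(lastname, sizeof lastname, "%s", name);
}

/* Calls dosomethingtofile on every newline-terminated name in
   buffer[0..size), then moves any trailing partial name to the start
   of the buffer and returns its length (0 if there is none). */
ssize_t splitfilenames(char *buffer, ssize_t size)
{
    ssize_t start = 0;
    for (ssize_t i = 0; i < size; i++)
    {
        if (buffer[i] == '\n')
        {
            buffer[i] = '\0';              // terminate this name in place
            if (i > start)                 // ignore empty lines
            {
                dosomethingtofile(buffer + start);
            }
            start = i + 1;
        }
    }

    // Whatever follows the last newline is an incomplete name
    ssize_t fragment = size - start;
    memmove(buffer, buffer + start, (size_t)fragment);
    return fragment;
}
```

For example, "a/b\nc/d\ne/f" yields two calls and leaves the 3-byte fragment "e/f" at the front, ready for the next read() to be appended after it.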

And the basic way with readdir:


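#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

// Defined elsewhere; name is relative to the directory being walked,
// not a full path
void dosomethingtofile(const char *name);

// Takes ownership of fd: closedir() closes it when done
void recursivelydodir(int fd)
{
    DIR *dir = fdopendir(fd);
    if (dir == NULL)
    {
        close(fd);
        return;
    }

    struct dirent *dirent = NULL;
    while ((dirent = readdir(dir)))
    {
        const char *name = dirent->d_name;
        // Skip . and ..
        if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        {
            continue;
        }

        struct stat st;
        // Don't follow symlinks (like find's default), so a link cycle
        // can't make this recurse forever
        if (fstatat(fd, name, &st, AT_SYMLINK_NOFOLLOW) == -1)
        {
            continue;
        }
        if (S_ISREG(st.st_mode))
        {
            dosomethingtofile(name);
        } else if (S_ISDIR(st.st_mode))
        {
            int newfd = openat(fd, name, O_RDONLY | O_DIRECTORY);
            if (newfd != -1)
            {
                recursivelydodir(newfd); // closes newfd
            }
        }
    }
    closedir(dir);
}

void foo(void)
{
    int fd = open("/path/to/dir", O_RDONLY | O_DIRECTORY);
    if (fd != -1)
    {
        recursivelydodir(fd); // closes fd
    }
}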

Not only is readdir simpler, easier to read, more reliable, and faster; it's also more portable, because it (or some compatible version of it) is implemented on most Unix-like OSes, and on Windows through Cygwin.

Note that I have not compiled or run either of these samples, which I just wrote up on the spot, so they should be treated as pseudo-code at best. There should also be a lot more error-checking and boilerplate, and the fork example should also probably check the exit status.
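Checking the exit status mostly means decoding what waitpid reports. This is a sketch, not code from the post: runandwait is a hypothetical helper, and returning the negated signal number (or -1000 when fork/waitpid fails) is an arbitrary convention picked for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks, execs `path` with `argv`, then reaps the child and decodes its
   status. Returns the exit code (0-255) on a normal exit, the negated
   signal number if a signal killed it, or -1000 if fork()/waitpid()
   failed. A failed exec shows up as exit code 127. */
int runandwait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
    {
        return -1000;
    }
    if (pid == 0)
    {
        execv(path, argv);
        _exit(127);                      // only reached if execv() failed
    }

    int status;
    pid_t r;
    do
    {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR); // retry if interrupted by a signal

    if (r == -1)
    {
        return -1000;
    }
    if (WIFEXITED(status))
    {
        return WEXITSTATUS(status);      // 0 conventionally means success
    }
    if (WIFSIGNALED(status))
    {
        return -WTERMSIG(status);        // killed by a signal
    }
    return -1000;                        // not reached without WUNTRACED
}
```

For instance, running /bin/sh with {"sh", "-c", "exit 3", NULL} returns 3, and a path that doesn't exist returns 127.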

41a193 No.3741

It just occurred to me that you probably meant calling the script from Lua with os.execute or io.popen or something.

You can do that, but still keep in mind it will be far slower and more prone to bugs. You also have issues if you want to be able to get stdout, stderr, and exit status at once.

It's also worth mentioning that POSIX has a simpler popen(3) interface that wraps forking and exec'ing (it runs the command through /bin/sh -c). I haven't done POSIX C programming in a while.
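For comparison, here is roughly what the popen route looks like in C, the closest equivalent of Lua's io.popen(...):lines(). It's a sketch: foreachline and the example callback addlength are hypothetical names, and it relies on getline(3) from POSIX.1-2008. Because popen hands the command to /bin/sh -c, any path built from untrusted input would need shell quoting.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>   // WIFEXITED/WEXITSTATUS for decoding *status

/* Runs `cmd` via popen(3), calls `fn` on each output line with the
   trailing newline stripped, stores the pclose() status in *status and
   returns the number of lines, or -1 if popen()/pclose() failed. */
long foreachline(const char *cmd, void (*fn)(const char *line, void *ctx),
                 void *ctx, int *status)
{
    FILE *p = popen(cmd, "r");
    if (p == NULL)
    {
        return -1;
    }

    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    long count = 0;
    while ((len = getline(&line, &cap, p)) != -1)
    {
        if (len > 0 && line[len - 1] == '\n')
        {
            line[len - 1] = '\0';
        }
        fn(line, ctx);
        count++;
    }
    free(line);

    *status = pclose(p);   // waits for the child, like waitpid()
    return *status == -1 ? -1 : count;
}

/* Hypothetical example callback: adds each line's length to a size_t. */
void addlength(const char *line, void *ctx)
{
    *(size_t *)ctx += strlen(line);
}
```

Passing "find /path/to/dir -type f" as the command would hand each path to the callback; the stored status can then be checked with WIFEXITED and WEXITSTATUS.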

Either way, calling a library in-process is faster than forking and exec'ing a new program. You fork and exec if you need the flexibility to decide certain things at runtime.

Now, practically, it will make little difference as long as find is there and the output is reliable. Objectively, forking is a worse option, but you won't notice a difference in runtime or anything unless you are doing some very heavy CPU-bound work.

If time isn't an issue, I'd say implement both ways and see which works best for you.

41a193 No.3744

>>3741

Yeah, I'm using system(3) and popen(3) from Lua (os.execute and io.popen), depending on whether I need the output.

I'm aware that calling commands from C without the popen(3) interface is way more tedious, but that isn't the case in Lua, where it's two lines at most.

Thanks for providing your insight. You're right, using the libraries is the preferable way. There are just too many incompatibilities between implementations of commands, and sometimes the commands aren't even available and you only find out at runtime.

However, since I'm working in an environment I'm writing shell scripts for anyway (those are far more sensitive to the environment than Lua, and I'm using both curl and find in them), and I don't intend to redistribute the Lua script, I don't think it really matters in terms of being error-prone. As for CPU usage, the same argument applies: the shell scripts call an external command for nearly every operation, which I think is far more CPU-intensive (and it doesn't really matter either; it's fast enough). So I can use either method in this case, though the library method is still preferable.

I just have to add one thing: whether readability improves really depends on what you're doing (at least in Lua). Writing "for file in io.popen("find"):lines()" is far less tedious than the example on the LuaFileSystem page for recursively iterating over a directory. The same goes for curl, really.

41a193 No.3748

>>3744

That's a good point. In the end, whatever gets the job done and doesn't break is preferable if you can get it up and running.

For me, the main argument for libraries over forking is flexibility. If you need to do something to files while iterating over them (such as stat'ing each one, or checking extended filesystem attributes), it's easier to do with the library approach.
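As an illustration of that point in C, POSIX nftw(3) walks a tree and passes each callback the struct stat it already collected, so per-file metadata is available without a separate stat call in your own code. This swaps in nftw instead of the readdir/fstatat loop from earlier in the thread; sumfilesizes and addfile are hypothetical names, and FTW_PHYS is used so symlinks are not followed, matching find's default.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* nftw() has no user-data pointer, so the callback accumulates into
   file-scope statics. */
static long long totalbytes;
static long regularfiles;

/* Called by nftw() once per entry; sb is the stat result nftw obtained. */
static int addfile(const char *path, const struct stat *sb, int type,
                   struct FTW *ftw)
{
    (void)path;
    (void)ftw;
    if (type == FTW_F && S_ISREG(sb->st_mode))
    {
        regularfiles++;
        totalbytes += (long long)sb->st_size;
    }
    return 0;   // 0 = keep walking
}

/* Returns the total size in bytes of regular files under root, or -1 on
   error. Symlinks are reported but not followed (FTW_PHYS). */
long long sumfilesizes(const char *root)
{
    totalbytes = 0;
    regularfiles = 0;
    if (nftw(root, addfile, 16, FTW_PHYS) == -1)   // up to 16 open fds
    {
        return -1;
    }
    return totalbytes;
}
```

Swapping the body of addfile for an xattr lookup or a permission check is the kind of per-file work that is awkward to bolt onto parsed find output.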

If you're willing to accept the small performance hit of forking (and to be honest, the cost of both filesystem access and network transmission is far higher) and you're sure the command syntax will be consistent across the systems you target, it's just down to personal preference.
