Spawning a basic curl command in C++

The purpose of this blog is to help web developers jump into systems programming. So feel free to ask any question; there are no dumb questions. I want this blog to be a discussion space for every programmer on this journey.

As a dev tool engineer, spawning sub-processes was part of my daily job. I got used to doing it in Node.js with the [child\_process](https://nodejs.org/api/child_process.html) module from the standard library.

I recently switched to a systems programming position (still in the dev tool world). I had to change my daily companion from Node.js to C++.

Then I had to learn how to spawn child processes in C++. Even if the terms and concepts are similar, the ergonomics of the C API we need to use for this purpose can intimidate people from a web background.

This article aims to clear up the gray areas and make them easier for you!

✅ What will you be able to do at the end of this article?

spawn basic subprocesses in C++ on UNIX-like OSes,
create a user-friendly API to consume

❌ What won't this article cover?

spawning subprocesses in C++ on Windows (maybe in the next post)
complex cases like manipulating stderr or stdout

Ready? Let's start our journey by defining terms.

When you launch a program (binary) on your computer, the OS will create a process object stored in memory. This object contains a state (new, ready, running, waiting, terminated).

Depending on the machine's resources and their availability, the OS scheduler runs the program and moves the process between these states.

If you want to dig into this topic a bit more, I advise you to watch this video:

Now it's time to see which API we'll use to spawn a process in C++.

Which C++ API for spawning a process?

There are probably different approaches to spawning a process in C++. My first web search led me to the posix_spawn/posix_spawnp functions.

According to the man page:

The posix_spawn() and posix_spawnp() functions are used to create a new child process that executes a specified file.

"Executing a file?" (a voice pops into my mind)

Yes, in this context, file means your executable.

"Okay, but what is the difference between posix_spawn() and posix_spawnp()?"

The difference is about the second argument. In the posix_spawn case, the argument should be a path to the executable (absolute or relative), but in the posix_spawnp case, the executable is specified as a simple name.

posix_spawn  -> "/usr/bin/curl"
posix_spawnp -> "curl"

In the latter case, the system will search for the executable in the list of directories stored in the PATH environment variable. Now let's take a look at the API itself:
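To make the difference concrete, here is a minimal sketch that runs the same command through both functions. The helper names (wait_for_exit, run_by_path, run_by_name) are mine and purely illustrative, and the usage below assumes a shell is installed at /bin/sh.

```cpp
#include <cassert>
#include <spawn.h>
#include <sys/wait.h>

// Declared by the C library; holds the environment of the current process.
extern char **environ;

// Hypothetical helper: waits for `pid` and returns its exit code,
// or -1 if waiting failed or the child did not exit normally.
static int wait_for_exit(pid_t pid) {
  int status = 0;
  if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
    return -1;
  }
  return WEXITSTATUS(status);
}

// posix_spawn: `path` is used as given (absolute or relative to the
// current directory); PATH is never consulted.
int run_by_path(const char *path, char *const argv[]) {
  assert(path != nullptr && argv != nullptr && argv[0] != nullptr);
  pid_t pid;
  if (posix_spawn(&pid, path, nullptr, nullptr, argv, environ) != 0) {
    return -1;
  }
  return wait_for_exit(pid);
}

// posix_spawnp: a `name` without a slash is searched for in the
// directories listed in PATH.
int run_by_name(const char *name, char *const argv[]) {
  assert(name != nullptr && argv != nullptr && argv[0] != nullptr);
  pid_t pid;
  if (posix_spawnp(&pid, name, nullptr, nullptr, argv, environ) != 0) {
    return -1;
  }
  return wait_for_exit(pid);
}
```

With an argv of sh, -c, "exit 0", both run_by_path("/bin/sh", argv) and run_by_name("sh", argv) should return 0, while run_by_path("sh", argv) is expected to fail unless a file named sh happens to sit in the current directory.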

posix_spawnp API

In the following listing, you'll find the posix_spawnp API description.

int posix_spawnp(pid_t *restrict pid,
                 const char *restrict file,
                 const posix_spawn_file_actions_t *restrict file_actions,
                 const posix_spawnattr_t *restrict attrp,
                 char *const argv[restrict],
                 char *const envp[restrict]);

We will skip the third and fourth parameters (file_actions and attrp) and pass nullptr for both; in my experience, I haven't had to deal with them.

There are a few caveats:

argv: by convention, the first item is the program name, so it is usually the same as the file argument
argv: the last item must be a null pointer (look at the execve documentation for more details)
envp: to pass the current environment to the child, declare extern char **environ; in your file. This variable is made available to the process by execve(2) when the program starts
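The caveats above are easier to see with a child that receives a hand-built environment instead of environ. This is only a sketch under a few assumptions: the helper name run_script_with_greeting and the GREETING variable are made up for the example, and a shell is expected at /bin/sh.

```cpp
#include <cassert>
#include <spawn.h>
#include <sys/wait.h>

// Hypothetical helper: runs `sh -c <script>` with an environment that
// contains only GREETING=hello. Returns the shell's exit code, or -1 if
// the process could not be spawned or did not exit normally.
int run_script_with_greeting(const char *script) {
  assert(script != nullptr);

  // argv[0] is the program name by convention; the array ends with nullptr.
  char arg0[] = "sh";
  char arg1[] = "-c";
  char *argv[] = {arg0, arg1, const_cast<char *>(script), nullptr};

  // envp has the same shape: "NAME=value" strings, then a nullptr.
  char env0[] = "GREETING=hello";
  char *envp[] = {env0, nullptr};

  pid_t pid;
  // posix_spawn with a full path, so the lookup does not depend on PATH.
  if (posix_spawn(&pid, "/bin/sh", nullptr, nullptr, argv, envp) != 0) {
    return -1;
  }

  int status = 0;
  if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
    return -1;
  }
  return WEXITSTATUS(status);
}
```

Calling it with a script such as test "$GREETING" = hello lets the child confirm that it really received the variable we put in envp.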

Let's try to implement a basic version!

Basic implementation of a spawn function in C++

Let's say that we want to spawn a basic curl command:

#include <cstdio>
#include <cstdlib>
#include <errno.h>
#include <spawn.h>
#include <sys/wait.h>

// NOTE: made available by execve(2) when a process begins
extern char **environ;

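int main() {
  pid_t pid; // #1
  // #2: argv needs pointers to mutable char, and C++ forbids assigning a
  // string literal to a char *, so we store the strings in arrays first
  char command[] = "curl";
  char url[] = "https://jsonplaceholder.typicode.com/posts/1";
  char *args[] = {command, url, nullptr};

  int err = posix_spawnp(&pid, // #3
                         args[0],
                         nullptr,
                         nullptr,
                         args,
                         environ);
  if (err != 0) {
    errno = err; // #4
    perror("posix_spawnp");
    exit(EXIT_FAILURE);
  }

  int status = 0;
  if (waitpid(pid, &status, 0) == -1) { // #5
    perror("waitpid");
    exit(EXIT_FAILURE);
  }

  // #6: success only if the child exited normally with code 0
  return WIFEXITED(status) && WEXITSTATUS(status) == 0 ? EXIT_SUCCESS
                                                       : EXIT_FAILURE;
}

What we have in the body of main:

#1: First, we declare pid, a pid_t variable that posix_spawnp fills in with the ID of the child process.
#2: Then we declare an array of C strings (named args) that contains the program we want to run, followed by its arguments and a terminating nullptr.
#3: We invoke the posix_spawnp function with pid and args and store the result in err. The function returns 0 on success and an error number on failure.

Note that we use args twice: once for the file argument (args[0]) and once for the argv argument.

#4: If the spawn failed, we want to display the error message. posix_spawnp hands the error number back as its return value, which is why we copy it into the errno variable (errno.h header) and then call perror with a label. It should print something like: posix_spawnp: <error-message>
#5: Next, we wait for the process to end with the waitpid function (sys/wait.h header), which returns -1 in case of error and sets errno itself.
#6: Finally, the status written by waitpid is an encoded value, not the exit code itself, so we decode it with the WIFEXITED and WEXITSTATUS macros.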
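The value that waitpid writes into its status argument is an encoded integer rather than a plain exit code, and the macros from sys/wait.h are the way to read it. Here is a minimal sketch of that decoding; run_shell and describe_wait_status are hypothetical helpers written for this example, and /bin/sh is assumed to exist.

```cpp
#include <cassert>
#include <spawn.h>
#include <string>
#include <sys/wait.h>

// Declared by the C library; holds the environment of the current process.
extern char **environ;

// Hypothetical helper: runs `sh -c <script>`, waits for it, and stores the
// raw wait status in `status`. Returns false if spawning or waiting failed.
bool run_shell(const char *script, int &status) {
  assert(script != nullptr);
  char arg0[] = "sh";
  char arg1[] = "-c";
  char *argv[] = {arg0, arg1, const_cast<char *>(script), nullptr};

  pid_t pid;
  if (posix_spawn(&pid, "/bin/sh", nullptr, nullptr, argv, environ) != 0) {
    return false;
  }
  return waitpid(pid, &status, 0) != -1;
}

// Hypothetical helper: turns a raw wait status into readable text.
std::string describe_wait_status(int status) {
  if (WIFEXITED(status)) {
    // The child called exit() or returned from main.
    return "exited with code " + std::to_string(WEXITSTATUS(status));
  }
  if (WIFSIGNALED(status)) {
    // The child was terminated by a signal it did not handle.
    return "killed by signal " + std::to_string(WTERMSIG(status));
  }
  return "neither exited nor killed";
}
```

For instance, running the script exit 3 and passing the resulting status to describe_wait_status should report an exit code of 3, which a bare comparison of status against 0 cannot tell apart from a crash.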

If you want to play with it, visit this link.

Now let's reshape the whole implementation. This version is not reusable at the moment.

First, we should move the body of main into a spawn function:

#include <cstdio>
#include <cstdlib>
#include <errno.h>
#include <spawn.h>
#include <sys/wait.h>

// NOTE: made available by execve(2) when a process begins
extern char **environ;

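int spawn(char *args[]) {
  pid_t pid;
  // posix_spawnp returns 0 on success and an error number on failure
  int err = posix_spawnp(&pid,
                         args[0],
                         nullptr,
                         nullptr,
                         args,
                         environ);
  if (err != 0) {
    errno = err;
    perror("posix_spawnp");
    exit(EXIT_FAILURE);
  }

  int status = 0;
  if (waitpid(pid, &status, 0) == -1) {
    perror("waitpid");
    exit(EXIT_FAILURE);
  }

  // success only if the child exited normally with code 0
  return WIFEXITED(status) && WEXITSTATUS(status) == 0 ? EXIT_SUCCESS
                                                       : EXIT_FAILURE;
}

int main() {
  // C++ forbids assigning a string literal to a char *,
  // so the strings live in arrays
  char command[] = "curl";
  char url[] = "https://jsonplaceholder.typicode.com/posts/1";
  char *args[] = {command, url, nullptr};
  return spawn(args);
}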

That's not enough! Ideally, we'd like to mimic the Node.js API and be able to pass the command and its arguments separately. Something like: spawn("curl", ["https://www.google.fr"]);

Also, instead of using old C strings and arrays, we could use std::string & std::vector.

#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <errno.h>
#include <spawn.h>
#include <string>
#include <sys/wait.h>
#include <vector>

// NOTE: made available by execve(2) when a process begins
extern char **environ;

// #2
std::vector<const char *> format_args(const std::string &command,
                                      const std::vector<std::string> &arguments) {
  // NOTE: we need two more slots:
  // - one for the command itself
  // - another for the terminating null pointer (the vector value-initializes
  //   its elements, so this last slot is already nullptr)
  std::vector<const char *> cstr_args(arguments.size() + 2);
  std::transform(std::cbegin(arguments), std::cend(arguments),
                 std::begin(cstr_args) + 1,
                 [](const auto &v) { return v.c_str(); });
  cstr_args[0] = command.c_str();

  return cstr_args;
}

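int spawn(const std::string &command, const std::vector<std::string> &args) {
  pid_t pid;
  const std::vector<const char *> cstr_args = format_args(command, args);
  // #3: the const_cast only adapts the type to the C signature;
  // posix_spawnp does not modify the strings
  char *const *raw_args = const_cast<char *const *>(cstr_args.data());

  // posix_spawnp returns 0 on success and an error number on failure
  int err = posix_spawnp(&pid,
                         raw_args[0],
                         nullptr,
                         nullptr,
                         raw_args,
                         environ);
  if (err != 0) {
    errno = err;
    perror("posix_spawnp");
    return EXIT_FAILURE;
  }

  int status = 0;
  if (waitpid(pid, &status, 0) == -1) {
    perror("waitpid");
    return EXIT_FAILURE;
  }

  // success only if the child exited normally with code 0
  return WIFEXITED(status) && WEXITSTATUS(status) == 0 ? EXIT_SUCCESS
                                                       : EXIT_FAILURE;
}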

int main() {
  const std::string command = "curl";
  const std::vector<std::string> args{
      "https://jsonplaceholder.typicode.com/posts/1"};
  int status = spawn(command, args); // #1

  return status;
}

What we have in this new version:

#1: the spawn function now takes two parameters, one for the command (std::string) and another for the arguments (std::vector<std::string>).
#2: then the spawn function has to format this input to fit into an old-fashioned array of strings. That's the purpose of the format_args function, which returns a vector of C strings. These pointers refer to the original strings, so command and args must stay alive while we use them.
#3: finally, we have to convert this vector into an old-fashioned array; it's straightforward because std::vector exposes a .data() method that returns a pointer to its underlying array. The const_cast is there because posix_spawnp expects a char *const * while the vector holds const char * items.
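If the const_cast or the lifetime of the borrowed pointers bothers you, one possible alternative is a small class that keeps its own copies of the strings. This is a sketch of a different design, not the approach used in this article, and ArgvBuffer is a name I made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: owns copies of the command and its arguments, so the
// pointers it hands out stay valid for as long as the object is alive.
class ArgvBuffer {
public:
  ArgvBuffer(const std::string &command,
             const std::vector<std::string> &arguments) {
    assert(!command.empty());
    storage_.reserve(arguments.size() + 1);
    storage_.push_back(command);
    storage_.insert(storage_.end(), arguments.begin(), arguments.end());

    pointers_.reserve(storage_.size() + 1);
    for (std::string &s : storage_) {
      // &s[0] points to mutable, null-terminated storage (C++11 and later),
      // so no const_cast is needed.
      pointers_.push_back(&s[0]);
    }
    pointers_.push_back(nullptr); // terminator expected by posix_spawnp
  }

  // Copying would leave the new object pointing at the old one's strings.
  ArgvBuffer(const ArgvBuffer &) = delete;
  ArgvBuffer &operator=(const ArgvBuffer &) = delete;

  // Same type as the argv parameter of posix_spawnp.
  char *const *data() const { return pointers_.data(); }
  // Number of items, not counting the terminating nullptr.
  std::size_t size() const { return storage_.size(); }

private:
  std::vector<std::string> storage_;
  std::vector<char *> pointers_;
};
```

An ArgvBuffer built from "curl" and a vector of arguments can then be passed as argv.data() to posix_spawnp, with argv.data()[0] serving as the file argument. The trade-off is one extra copy of each string.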

💡 If you are not familiar with C++, just consider the format_args function as a magic box that converts our arguments into an old-fashioned C array.

If you want to play with the full version:

Conclusion

Congratulations! You managed to finish this post and you're now ready to spawn processes in C++.

Let me know in the comment section if you want to see a Windows version of this post.

Takeaways

If you want to read more about spawning processes, I advise you to read:

If you want to understand how I convert a std::vector<std::string> to a char *args[], please take a look at: