Creating a Multi-Call Linux Binary
Note: This publication is now archived. For reference only.
Published 09 December 2002
Authors: Gregory Geiselhart
A multi-call binary is an executable, written in C, that performs the action of more than one utility. A prime example of a multi-call binary is the BusyBox package. BusyBox implements a large number of standard Linux utilities (such as the ls and ln commands) in a single executable. This enables specialized Linux distributions to have a reduced size. This tip describes how multi-call binaries are written.
The multi-call concept allows a single executable file to perform the functions of dozens of different utilities that are usually packaged as separate files. Multi-call binaries exploit a number of operating system features so that users of a system need not even know that the programs they are running are all, in fact, the same file.
There are two ways to invoke BusyBox functions:
- In the first method, you issue the command busybox followed by the name of the function you want to perform. For example, the command busybox ls performs the directory list function (equivalent to the usual ls command). This method requires no administration, but users must remember to prefix every command with busybox rather than simply issuing the name of the command.
- The second method is to create a set of symbolic links to the BusyBox executable, each with the name of a function implemented by BusyBox. When BusyBox is run, it checks the name by which it was invoked, and uses that name as the function to be executed. This method does require some administration, as the symbolic links must be maintained, but system users can follow the normal practice of performing a function by issuing the name of the command.
To illustrate, the following sequence shows the content of a BusyBox /bin directory and the effect of issuing the ls command.
# ls -l l*
lrwxrwxrwx 1 root root 12 Oct 2 00:11 ln -> /bin/busybox
lrwxrwxrwx 1 root root 14 Oct 2 00:11 login -> /bin/tinylogin
lrwxrwxrwx 1 root root 12 Oct 2 00:11 ls -> /bin/busybox
# ls -lG
ls: invalid option -- G
BusyBox v0.60.3 (2002.09.26-00:58+0000) multi-call binary
Usage: ls [-1AacCdeFilnpLRrSsTtuvwxXhk] [filenames...]
List directory contents
In the first output you can also see login, which is a symbolic link to /bin/tinylogin. TinyLogin is a partner program to BusyBox, and performs the functions of programs like login and sulogin. These functions could have been implemented in BusyBox, but for security reasons it is preferred to have a separate executable for login processing.
This example also shows another feature of the BusyBox utility. In the full GNU implementation of ls, the -G option is valid (it suppresses the display of the group name in the directory list). The BusyBox version of ls rejects it, because BusyBox does not provide every function of the utilities it replaces. This is appropriate for BusyBox, since the idea is to eliminate unused (or little used) functions in the interests of reducing the executable size.
So, how does a multi-call binary like BusyBox, when invoked using a symbolic link, know what function to perform? The answer is that the way a multi-call binary program is written differs from a normal program.
The C language is used for most systems programming on UNIX/POSIX systems. Programs written in C always have a main() function, which is the first part of the program to be executed. The main function is written in a particular way, to allow the operating system to pass parameters to it. A typical main() function declaration appears here:
int main(int argc, char *argv[])
The parameters passed to the main() function are argc, an integer containing the number of parameters passed by the system to the program, and argv, the list of the parameters passed. By convention (on UNIX/POSIX systems, at least), there will always be at least one parameter passed to the program: the name used to invoke the program. This is usually the command typed by the user at the shell prompt, and will just about always be the name of the file that contains the program. In C notation, this value (the first item in the array called argv) is argv[0].
Most single-call binaries ignore the contents of argv[0], as the program is designed to perform a single task and it is irrelevant what name the system used to invoke the program. Some programs, for security reasons, do make sure that the command issued is correct. This can prevent a malicious user from executing a program they should not have access to.
A multi-call binary pays attention to this parameter, however, and uses it to determine which function to execute. In the case of BusyBox, if argv[0] is the same as the executable file name, it will use the second item in the parameter list (argv[1]) as the name of the function to be executed. If argv[0] is not the same as the name of the BusyBox executable file, it will attempt to use the contents of argv[0] as the name of the requested function.
This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.