Building Software from Source Code in Linux

Preparation

Overview

This lesson plan covers the (very) basics of building small projects from C or Fortran source code using the GCC compiler, and automating this process using GNU Make. It is intended for scientists venturing into scientific programming, to help ease the frustrations that typically come up when starting to work in compiled programming languages.

The material is not unique, and borrows heavily from the references listed at the end of the lesson. Comments are always welcome!

Building a single-file program

Let's start with a simple example: building a "hello world" program with the GCC compiler.

Our C program (hello.c) looks like this:

#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}

To build a working executable from this file in the simplest way possible, run:

$ gcc hello.c

The Fortran program (hello.f90) is as follows:

program main
    print *, "Hello world"
end program main

To build the Fortran program, run:

$ gfortran hello.f90

The gcc/gfortran command creates an executable with the default name a.out. Running this executable prints the familiar message:

$ ./a.out
Hello World

More happened here than meets the eye. In fact, the command gcc wraps up 4 steps of the build process:

  1. Preprocess
  2. Compile
  3. Assemble
  4. Link

Step 1: Preprocess

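In this step, gcc calls the preprocessor program cpp to interpret preprocessor directives and modify the source code accordingly.

Some common directives are #include (insert the contents of another file, typically a header), #define (define a macro for text substitution), and the conditionals #if, #ifdef, #ifndef, #else and #endif (keep or skip blocks of code).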

We could perform just this step of the build process like so:

cpp hello.c hello.i

Examining the output file (vim hello.i) shows that the long and messy stdio.h header has been inserted ahead of our simple code. You may also like to explore adding #define statements, or conditional code blocks.

Step 2: Compile

In this step, the (modified) source code is translated from the C programming language into assembly code.

Assembly code is a low-level programming language with commands that correspond to machine instructions for a particular type of hardware. It is still just plain text --- you can read assembly and write it too if you so desire.

To perform just the compilation step of the build process, we would run:

gcc -S hello.i -o hello.s

Examining the output file (vim hello.s) shows the processor-specific instructions needed to run our program on this specific system. Interestingly, for such a simple program as ours, the assembly code is actually shorter than the preprocessed source code (though not the original source code).

Step 3: Assemble

Assembly code is then translated into object code. This is a binary representation of the actions your computer needs to take to run your program. It is no longer human-readable, but it can be understood by your processor.

To perform just this step of the build process, we would run:

gcc -c hello.s -o hello.o

You can try to view this object file like we did the other intermediate steps, but the result will not be terribly useful (vim hello.o). Your text editor is trying to interpret binary machine language commands as ASCII characters, and (mostly) failing. Perhaps the most interesting result of doing so is that there are intelligible bits --- these are the few variables, etc., that actually are ASCII characters.

Also note that object files are not executables; you can't run them until after the next step.

Step 4: Link

In the final step, gcc calls the linker program ld to combine the object file with any external functions it needs (e.g. library functions or functions from other source files). In our case, this would include printf from the C standard library.

To perform just this step of the build process, we would run:

gcc hello.o -o hello

Challenge:

Compile and run the following program (squares.c):

#include <stdio.h>

int main(void)
{
    int i;

    printf("\t Number \t\t Square of Number\n\n");

    for (i=0; i<=25; ++i)
        printf("\t %d \t\t\t %d \n", i, i*i);

    return 0;
}

If you have some extra time, try walking through the process step-by-step and inspecting the results.

Solution:

gcc squares.c -o squares
./squares

Building a multi-file program

For all but the smallest programming projects, it is convenient to break up the source code into multiple files. Typically, these include a main function in one file, and one or more other files containing functions / subroutines called by main(). In addition, a header file is usually used to share custom data types, function prototypes, preprocessor macros, etc.

We will use a simple example program in the multi_string folder, which consists of the source files main.c and WriteMyString.c and the header file header.h.

The easiest way to compile such a program is to include all the required source files at the gcc command line:

gcc main.c WriteMyString.c -o my_string
./my_string

It is also quite common to separate out the process into two steps:

  1. source code -> object code
  2. object code -> executable (or library)

The reason is that this allows you to reduce compiling time by only recompiling objects that need to be updated. This seems (and is) silly for small projects, but becomes important quickly. We will use this approach later when we discuss automating the build process.

gcc -c WriteMyString.c
gcc -c main.c
gcc WriteMyString.o main.o -o write
./write

Including header files

Note that it is not necessary to include the header file on the gcc command line. This makes sense since we know that the (bundled) preprocessing step will insert any required headers into the source code before it is compiled.

There is one caveat: the preprocessor must be able to find the header files in order to include them. Our example works because header.h is in the working directory when we run gcc. We can break it by moving the header to a new subdirectory, like so:

mkdir include
mv header.h include
gcc main.c WriteMyString.c -o my_string

The last command fails with the error:

main.c:4:20: fatal error: header.h: No such file or directory
 #include "header.h"
                    ^
compilation terminated.

We can fix this by specifically telling gcc where it can find the requisite headers, using the -I flag:

gcc -I ./include main.c WriteMyString.c -o my_string

This is most often needed when you wish to use external libraries installed in non-standard locations. We will explore this case below.

Challenge

In the folder multi_fav_num you will find another simple multi-file program. Build this source code to a program named fav_num using separate compile and link steps. Once you have done this successfully, change the number defined in other.c and rebuild. You should not have to recompile main.c to do this.

Solution:

gcc -c main.c
gcc -c other.c
gcc main.o other.o -o fav_num
./fav_num

vim other.c

gcc -c other.c
gcc main.o other.o -o fav_num
./fav_num

Linking external libraries

NOTE: content in this section is (lightly) modified from this site.

A library is a collection of pre-compiled object files that can be linked into your programs via the linker. In simpler terms, libraries are machine code files that contain functions, etc., that you can use in your programs.

A few example functions that come from libraries are printf() from the C standard library (libc.so) and sqrt() from the math library (libm.so).

We will return to these in a moment.

Shared libraries vs static libraries

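A static library (a .a file) is copied into the executable at link time, whereas a shared library (a .so file) is loaded by the dynamic linker when the program runs. Static libraries certainly seem simpler, but most programs use shared libraries and dynamic linking. There are several reasons why the added complexity is thought to be worth it: executables are smaller, many programs can share a single copy of a library, and a library can be updated without rebuilding the programs that use it.

Because of the advantages of dynamic linking, GCC will prefer a shared library to a static library if both are available (by default).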

Building with shared libraries in default (known) locations

Let's start with an example that uses the sqrt() function from the math library:

#include <stdio.h>
#include <math.h>

int main(void)
{
    int i;

    printf("\t Number \t\t Square Root of Number\n\n");

    for (i=0; i<=360; ++i)
        printf("\t %d \t\t\t %f \n", i, sqrt((double) i));

    return 0;
}

Notice the function sqrt, which we use, but do not define. The (machine) code for this function is stored in libm.so, and the function declaration (its prototype) is stored in the header file math.h.

To build successfully, we must:

  1. #include the header file for the external library
  2. Make sure that the preprocessor can find this header file
  3. Instruct the linker to link to the external library

Let's go ahead and build the program. To compile and link this in separate steps, we would run:

gcc -c roots.c
gcc roots.o -lm -o roots

The first command preprocesses roots.c, inserting the header files, and then translates it to object code. This step does need to find the header file, but it does not yet require the library.

The second command links all of the object code into the executable. It does not need to find the header file (it is already compiled into roots.o) but it does need to find the library file.

Library files are included using the -l flag. Their names are given excluding the lib prefix and excluding the .so suffix.

Just as we did above, we can combine the build steps into a single command:

gcc roots.c -lm -o roots

IMPORTANT: Because we are using shared libraries, the dynamic linker must be able to find the linked libraries at runtime, otherwise the program will fail. You can check the libraries required by a program, and whether they are being found correctly, using the ldd command. For our roots program, we get the following:

ldd roots
linux-vdso.so.1 =>  (0x00007fff8bb8a000)
libm.so.6 => /lib64/libm.so.6 (0x00007ffc69550000)
libc.so.6 => /lib64/libc.so.6 (0x00007ffc691bc000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffc69801000)

This shows that our executable requires a few basic system libraries as well as the math library we explicitly included, and that all of these dependencies are found by the dynamic linker.

Challenge

Before moving on, let's take a few minutes to break this build process. Try the following and read the error messages carefully. These are your hints to fixing a broken build process.

  1. Delete #include <math.h> from roots.c
  2. Omit -lm from the linking step

The preprocessor will search some default paths for included header files. Before we go down the rabbit hole, it is important to note that you do not have to do this for a typical build, but the commands may prove useful when you are trying to work out why something fails to build.

To look for the header, we can run the following commands to show the preprocessor search path and look for files therein:

cpp -Wp,-v

This produces the following output (cpp then waits for input on standard input, so press Ctrl-C to return to the prompt):

ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/4.4.7/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../x86_64-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-redhat-linux/4.4.7/include
 /usr/include
End of search list.
^C

The last few lines show the paths where GCC will search for header files by default. We can then search these include paths for the file we want, math.h, like so:

find /usr/include /usr/lib/gcc/x86_64-redhat-linux/4.4.7/include -name math.h

This produces the following output:

/usr/include/FL/math.h
/usr/include/c++/4.4.4/tr1/math.h
/usr/include/math.h

If we are really curious, we could open the header and see what it contains, but this is rarely necessary.

The linker will search some default paths for included library files. Again, it is important to note that you do not have to do this for a typical build, but the commands may prove useful when you are trying to work out why something fails to build.

To look for the library, we can run the following command to get a list of all library files the linker is aware of, then search that list for the math library we need:

ldconfig -p 
ldconfig -p | grep libm.so

The latter command gives the output:

libm.so.6 (libc6,x86-64, OS ABI: Linux 2.6.18) => /lib64/libm.so.6
libm.so.6 (libc6, hwcap: 0x0028000000000000, OS ABI: Linux 2.6.18) => /lib/i686/nosegneg/libm.so.6
libm.so.6 (libc6, OS ABI: Linux 2.6.18) => /lib/libm.so.6
libm.so (libc6,x86-64, OS ABI: Linux 2.6.18) => /usr/lib64/libm.so
libm.so (libc6, OS ABI: Linux 2.6.18) => /usr/lib/libm.so

We certainly have the math library available. In fact, there are a few versions of this library known to the linker. Thankfully, we can let the linker sort out which one to use.

We might also want to peek inside a library file (or any object code for that matter) to see what functions and variables are defined within. We can list all the names, then search for the one we care about, like so:

nm /lib/libm.so.6
nm /lib/libm.so.6 | grep sqrt

The output of this command contains the following line, which shows us that it does indeed include something called sqrt.

0000000000025990 W sqrt

Building with shared libraries in non-default (unknown) locations

note: the following command lines build the libctest.so shared library used in the example below:

gcc -Wall -fPIC -c ctest1.c ctest2.c
gcc -shared -Wl,-soname,libctest.so -o libctest.so ctest1.o ctest2.o

or

gcc -fPIC ctest1.c ctest2.c -shared -o libctest.so

end note

Let's switch to a new bit of example code, use_ctest.c, which makes use of a (very simple) custom library in the ctest_dir directory:

#include <stdio.h>
#include "ctest.h"
 
int main(){
    int x;
    int y;
    int z;
    ctest1(&x);
    ctest2(&y);
    z = (x / y);
    printf("%d / %d = %d\n", x, y, z);
    return 0;
}

Trying to compile this fails with an error:

gcc -c use_ctest.c

use_ctest.c:2:19: error: ctest.h: No such file or directory

As the error message indicates, the problem here is that an included header file is not found by the preprocessor. We can use the -I flag to fix this problem:

gcc -I ctest_dir/include -c use_ctest.c

When we try to link the program to create an executable, we know we need to explicitly add the library with the -l flag, but in this case we still get an error:

gcc use_ctest.o -lctest -o use_ctest
/usr/bin/ld: cannot find -lctest
collect2: ld returned 1 exit status

Just like for the header, we need to explicitly specify the path to the library file:

gcc -Lctest_dir/lib use_ctest.o -lctest -o use_ctest

Success, or so it would seem. What happens when we try to run our shiny new executable?

./use_ctest

./use_ctest: error while loading shared libraries: libctest.so: cannot open shared object file: No such file or directory

We can diagnose this problem by checking to see if the dynamic linker is able to gather up all the dependencies at runtime:

ldd use_ctest

linux-vdso.so.1 =>  (0x00007fffd75ff000)
libctest.so => not found
libc.so.6 => /lib64/libc.so.6 (0x00007f802d21b000)
/lib64/ld-linux-x86-64.so.2 (0x00007f802d5dd000)

The output clearly shows that it cannot. The problem here is that the dynamic linker will only search the default paths unless we:

  1. Permanently add our custom library to this search path. This option is not covered here - I am assuming that many of you will be working on clusters and other systems where you do not have root permissions.

  2. Specify the location of non-standard libraries using the LD_LIBRARY_PATH variable. LD_LIBRARY_PATH contains a colon (:) separated list of directories where the dynamic linker should look for shared libraries. The dynamic linker will search these directories before the default system paths. You can define the value of LD_LIBRARY_PATH for a particular command only by preceding the command with the definition, like so:

    LD_LIBRARY_PATH=ctest_dir/lib:$LD_LIBRARY_PATH ./use_ctest

    Or define it for your whole shell as an environment variable:

    export LD_LIBRARY_PATH=./ctest_dir/lib:$LD_LIBRARY_PATH
    ./use_ctest

  3. Hard-code the location of non-standard libraries into the executable. Setting (and forgetting to set) LD_LIBRARY_PATH all the time can be tiresome. An alternative approach is to burn the location of the shared libraries into the executable as an RPATH or RUNPATH. This is done by adding some additional flags for the linker, like so:

    gcc use_ctest.o -Lctest_dir/lib -lctest -Wl,-rpath=./ctest_dir/lib -o use_ctest

    We can confirm that this worked by running the program (resetting LD_LIBRARY_PATH first if needed), and more explicitly, by examining the executable directly:

    ./use_ctest
    readelf -d use_ctest

Challenge

Without using your history, try to recompile and run the use_ctest program. For an additional challenge, try to do so using RUNPATH to hardcode the location of the shared library.

Automating the build process with GNU Make

The manual build process we used above can become quite tedious for all but the smallest projects. There are many ways that we might automate this process. The simplest would be to write a shell script that runs the build commands each time we invoke it. Let's take the simple hello.c program as a test case:

#!/bin/bash
gcc -c hello.c
gcc hello.o -o hello

This works fine for small projects, but for large multi-file projects, we would have to compile all the sources every time we change any of the sources.

The Make utility provides a useful way around this problem. We (the programmers) write a special file, the Makefile, that defines the dependencies between the files in our project. After editing one or more source files, we invoke Make, which recompiles only those files that are affected by the changes.

How GNU Make works

Make is a mini-programming language unto itself. By default, the command make looks for a file named Makefile or makefile in the current directory. Other file names can be specified with the option -f:

make -f filename

For the hello program, a Makefile might look like this:

hello: hello.o
	gcc hello.o -o hello

hello.o: hello.c
	gcc -c hello.c

clean:
	rm hello hello.o

The syntax here is target: prerequisite_1 prerequisite_2 etc. The command block that follows is executed to generate the target if the target does not exist or if any of the prerequisites has been modified more recently than the target. Each command line must start with a tab character (spaces do not work). The first (top) target is built by default, or you can name a specific target to build after the make command. When we run make for the first time, the computer will take the following actions:

  1. Find the default target, which is our executable file hello.
  2. Check to see if hello is up-to-date. hello does not exist, so it is out-of-date and will have to be built.
  3. Check to see if the prerequisite hello.o is up-to-date. hello.o does not exist, so it is out-of-date and will have to be built.
  4. The prerequisite hello.c is not a target, so there is nothing left to check. The command gcc -c hello.c will be run to build hello.o
  5. Now hello.o is up to date, so make builds the next target, hello by running the command gcc hello.o -o hello
  6. Done.

A target is considered out-of-date if:

  1. it does not exist, or
  2. it is older than any of the prerequisites.

Note that the command under the clean target is not executed by a plain make, because clean is neither the first target nor a prerequisite of any other target. To run this target, we need to specify the target name:

make clean

This will remove the executable and the .o files, which is useful when you want to force a complete rebuild from scratch. Notice that if all targets are up-to-date, make does not recompile anything.

Let's look at an example for our first multi-file program:

write: main.o WriteMyString.o
	gcc main.o WriteMyString.o -o write

main.o: main.c header.h
	gcc -c main.c

WriteMyString.o: WriteMyString.c
	gcc -c WriteMyString.c

clean:
	rm write *.o

In the first build, make builds the targets in the following sequence: main.o, WriteMyString.o and write. This compiles all the source files and links the object files to build the executable. In later builds, make only rebuilds the targets whose prerequisites have been modified since the last build. This makes it efficient for building a program with many source files. For example, if WriteMyString.c is modified, only WriteMyString.c is recompiled, while main.c is not. If main.c or header.h is modified, only main.c is recompiled, while WriteMyString.c is not. In either case, the write target is rebuilt, since either main.o or WriteMyString.o has been updated.

By default, make prints each command on the screen before executing it. To suppress this for a particular command, prefix the command with @.

Challenge

Starting from the template below (or using our previous Makefile), see if you can write your own makefile for the multi_fav_num program:

fav: _____  _____
	gcc _____  ______ -o fav

main.o: _____  _____
	gcc ___  _____

other.o: _____  _____
	_________________

clean:
	rm _____  _____

Writing a good Makefile

A Makefile can become very complicated for a practical program with many source files. It is important to organize it logically and to keep it as simple and clear as possible. To this end, we introduce some more useful features of Make in this section.

You may have noticed that there are many duplications of the same file name or command name in our previous Makefiles. It is more convenient to use variables. Taking our first multi-file program as an example again:

CC=gcc
OBJ=main.o WriteMyString.o
EXE=write

$(EXE): $(OBJ)
	$(CC) $(OBJ) -o $(EXE)

main.o: main.c header.h
	$(CC) -c main.c

WriteMyString.o: WriteMyString.c
	$(CC) -c WriteMyString.c

clean:
	rm $(EXE) *.o

Here we have defined the variables CC for the compiler, OBJ for the object files and EXE for the executable file. If we want to change the compiler or the file names, we only need to modify the corresponding variable in one place, rather than every place it is used in the Makefile.

We can upgrade the Makefile to a higher level of automation using the so-called "automatic variables":

$(EXE): $(OBJ)
	$(CC) $^ -o $@

main.o: main.c header.h
	$(CC) -c $<

WriteMyString.o: WriteMyString.c
	$(CC) -c $<

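Here we have used the following automatic variables:

  $@  the name of the target
  $<  the name of the first prerequisite
  $^  the names of all the prerequisites, separated by spaces

These automatic variables take the names of the current target or prerequisites, whatever those names happen to be.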

Furthermore, notice that the main.o and WriteMyString.o targets are built by the same command. Is there a way to combine the two duplicated rules into one, so that all source files are compiled by a single rule? Yes, it can be done with an implicit rule, written as a pattern:

%.o: %.c
	$(CC) -c $<

main.o: header.h

Here % stands for the same thing in the prerequisites as it does in the target. In this example, any .o target has a corresponding .c file as an implied prerequisite. If a target (e.g. main.o) needs additional prerequisites (e.g. header.h), write a rule for it that lists those prerequisites but has no commands. Applying this implicit rule significantly simplifies the Makefile when there are a large number of (say hundreds of) source files.

If there are many variables to be defined, it is convenient to write the definitions of all variables in another file, and then include that file in the Makefile:

include ./variables

The content of the file variables is as follows:

CC=gcc
OBJ=main.o WriteMyString.o
EXE=write

In most cases, the target name is a file name. But there are exceptions, such as the clean target in this example. The rm command will not create any file named clean. What if there exists a file named clean in this directory? Let's do an experiment.

touch clean
make clean
make: `clean' is up to date.

The clean target does not work properly. Since a file named clean now exists and the target has no prerequisites, clean is always considered up-to-date, and thus nothing is done. To avoid this problem, we can declare the target to be phony by making it a prerequisite of the special target .PHONY as follows:

.PHONY: clean

A phony target is one that is not really the name of a file; rather it is just a name for a recipe to be executed.

Finally, we end up with a pretty elegant Makefile:

include ./variables
.PHONY: clean

$(EXE): $(OBJ)
	$(CC) $^ -o $@

%.o: %.c
	$(CC) -c $<

main.o: header.h

clean:
	rm $(EXE) *.o

Challenge

Rewrite the Makefile for the multi_fav_num program using regular variables, automatic variables and implicit rules.

References