Create a new working directory, then make a copy of the example files for the tutorial, like so:
cd ~
cp -r /project/scv/examples/BuildingSoftware/src tut
This lesson plan covers the (very) basics of building small projects from C or Fortran source code using the GCC compiler, and automating this process using GNU Make. It is intended for scientists venturing into scientific programming, to help ease the frustrations that typically come up when starting to work in compiled programming languages.
The material is not unique, and borrows heavily from the references listed at the end of the lesson. Comments are always welcome!
Let's start with a simple example: building a "hello world" program with the GCC compiler.
Our C program (hello.c) looks like this:
#include <stdio.h>
int main(void)
{
    (void) printf("Hello World\n");
    return (0);
}
To build a working executable from this file in the simplest way possible, run:
$ gcc hello.c
The Fortran program (hello.f90) is as follows:
program main
    print *, "Hello world"
end program main
It is built the same way:
$ gfortran hello.f90
The gcc or gfortran command creates an executable with the default name a.out. Running this executable prints the familiar message:
$ ./a.out
Hello World
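More happened here than meets the eye. In fact, the gcc command wraps up four steps of the build process: preprocessing, compilation, assembly, and linking.
In the preprocessing step, gcc calls the preprocessor program cpp to interpret preprocessor directives and modify the source code accordingly.
Some common directives are:
#include: inserts the contents of another file (typically a header) at that point, e.g. #include <stdio.h>
#define: defines a macro, a name that the preprocessor replaces with its definition wherever it appears
#ifdef ... #endif: conditional compilation; the code block is included only if a certain macro is defined, e.g.: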
#ifdef TEST_CASE
a=1; b=0; c=0;
#endif
We could perform just this step of the build process like so:
cpp hello.c hello.i
Examining the output file (vim hello.i) shows that the long and messy stdio.h header has been inserted ahead of our simple code. You may also like to explore adding #define statements or conditional code blocks.
In the compilation step, the (modified) source code is translated from the C programming language into assembly code.
Assembly code is a low-level programming language with commands that correspond to machine instructions for a particular type of hardware. It is still just plain text --- you can read assembly and write it too if you so desire.
To perform just the compilation step of the build process, we would run:
gcc -S hello.i -o hello.s
Examining the output file (vim hello.s) shows the processor-specific instructions needed to run our program on this specific system. Interestingly, for such a simple program as ours, the assembly code is actually shorter than the preprocessed source code (though not the original source code).
In the assembly step, the assembly code is translated into object code. This is a binary representation of the actions your computer needs to take to run your program. It is no longer human-readable, but it can be understood by your processor.
To perform just this step of the build process, we would run:
gcc -c hello.s -o hello.o
You can try to view this object file like we did the other intermediate steps (vim hello.o), but the result will not be terribly useful. Your text editor is trying to interpret binary machine language commands as ASCII characters, and (mostly) failing. Perhaps the most interesting result of doing so is that there are intelligible bits: these are the few variables, etc., that actually are ASCII characters.
Also note that object files are not executables; you can't run them until after the next step.
In the final step, linking, gcc calls the linker program ld to combine the object file with any external functions it needs (e.g. library functions or functions from other source files). In our case, this would include printf from the C standard library.
To perform just this step of the build process, we would run:
gcc hello.o -o hello
Compile and run the following program (squares.c):
#include <stdio.h>
int main(void)
{
    int i;
    printf("\t Number \t\t Square of Number\n\n");
    for (i=0; i<=25; ++i)
        printf("\t %d \t\t\t %d \n", i, i*i);
    return 0;
}
If you have some extra time, try walking through the process step-by-step and inspecting the results.
gcc squares.c -o squares
./squares
For all but the smallest programming projects, it is convenient to break up the source code into multiple files. Typically, these include a main function in one file, and one or more other files containing functions / subroutines called by main(). In addition, a header file is usually used to share custom data types, function prototypes, preprocessor macros, etc.
We will use a simple example program in the multi_string folder, which consists of:
main.c: the main driver function, which calls a subroutine and exits
WriteMyString.c: a module containing the subroutine called by main
header.h: one function prototype and one macro definition
The easiest way to compile such a program is to include all the required source files at the gcc command line:
gcc main.c WriteMyString.c -o my_string
./my_string
It is also quite common to separate out the process into two steps: first compile each source file to an object file, then link the object files into an executable.
The reason is that this allows you to reduce compile time by recompiling only the objects that need to be updated. This seems (and is) silly for small projects, but becomes important quickly. We will use this approach later when we discuss automating the build process.
gcc -c WriteMyString.c
gcc -c main.c
gcc WriteMyString.o main.o -o write
./write
Note that it is not necessary to include the header file on the gcc command line. This makes sense since we know that the (bundled) preprocessing step will insert any required headers into the source code before it is compiled.
There is one caveat: the preprocessor must be able to find the header files in order to include them. Our example works because header.h is in the working directory when we run gcc. We can break it by moving the header to a new subdirectory, like so:
mkdir include
mv header.h include
gcc main.c WriteMyString.c -o my_string
The above commands give the output error:
main.c:4:20: fatal error: header.h: No such file or directory
#include "header.h"
^
compilation terminated.
We can fix this by specifically telling gcc where it can find the requisite headers, using the -I flag:
gcc -I ./include main.c WriteMyString.c -o my_string
This is most often needed when you wish to use external libraries installed in non-standard locations. We will explore this case below.
In the folder multi_fav_num you will find another simple multi-file program. Build this source code to a program named fav_num using separate compile and link steps. Once you have done this successfully, change the number defined in other.c and rebuild. You should not have to recompile main.c to do this.
gcc -c main.c
gcc -c other.c
gcc main.o other.o -o fav_num
./fav_num
vim other.c
gcc -c other.c
gcc main.o other.o -o fav_num
./fav_num
NOTE: content in this section is (lightly) modified from this site.
A library is a collection of pre-compiled object files that can be linked into your programs via the linker. In simpler terms, they are machine code files that contain functions, etc, you can use in your programs.
A few example functions that come from libraries are:
printf() from the libc.so shared library
sqrt() from the libm.so shared library
We will return to these in a moment.
A static library has the file extension .a (archive file). When your program links a static library, the machine code of external functions used in your program is copied into the executable. At runtime, everything your program needs is wrapped up inside the executable.
A shared library has the file extension .so (shared object). When your program is linked against a shared library, only a small table is created in the executable. At runtime, the executable must be able to locate the functions listed in this table. This is done by the operating system, in a process known as dynamic linking.
Static libraries certainly seem simpler, but most programs use shared libraries and dynamic linking. There are several reasons why the added complexity is thought to be worth it:
Because of the advantages of dynamic linking, GCC will by default prefer a shared library to a static library if both are available.
Let's start with an example (roots.c) that uses the sqrt() function from the math library:
#include <stdio.h>
#include <math.h>
int main(void)
{
    int i;
    printf("\t Number \t\t Square Root of Number\n\n");
    for (i=0; i<=360; ++i)
        printf("\t %d \t\t\t %f \n", i, sqrt((double) i));
    return 0;
}
Notice the function sqrt, which we use but do not define. The (machine) code for this function is stored in libm.so, and the function declaration (prototype) is stored in the header file math.h.
To build successfully, we must:
#include the header file for the external library
link against the library file itself
Let's go ahead and build the program. To compile and link this in separate steps, we would run:
gcc -c roots.c
gcc roots.o -lm -o roots
The first command preprocesses roots.c, inserting the header files, and then translates it to object code. This step does need to find the header file, but it does not yet require the library.
The second command links all of the object code into the executable. It does not need to find the header file (its contents are already compiled into roots.o) but it does need to find the library file.
Library files are included using the -l flag. Their names are given excluding the lib prefix and excluding the .so suffix.
Just as we did above, we can combine the build steps into a single command:
gcc roots.c -lm -o roots
IMPORTANT: Because we are using shared libraries, the dynamic linker must be able to find the linked libraries at runtime, otherwise the program will fail. You can check the libraries required by a program, and whether they are being found correctly or not, using the ldd command. For our roots program, we get the following:
ldd roots
linux-vdso.so.1 => (0x00007fff8bb8a000)
libm.so.6 => /lib64/libm.so.6 (0x00007ffc69550000)
libc.so.6 => /lib64/libc.so.6 (0x00007ffc691bc000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffc69801000)
This shows that our executable requires a few basic system libraries as well as the math library we explicitly included, and that all of these dependencies are found by the dynamic linker.
Before moving on, let's take a few minutes to break this build process. Try the following and read the error messages carefully. These are your hints to fixing a broken build process.
Remove #include <math.h> from roots.c
Remove -lm from the linking step
The preprocessor will search some default paths for included header files. Before we go down the rabbit hole, it is important to note that you do not have to do this for a typical build, but the commands may prove useful when you are trying to work out why something fails to build.
To look for the header, we can run the following commands to show the preprocessor search path and look for files therein:
cpp -Wp,-v
Which has the following output:
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/4.4.7/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../x86_64-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/include
/usr/include
End of search list.
^C
The last few lines show the paths where GCC will search for header files by default. We can then search these include paths for the file we want, math.h, like so:
find /usr/include /usr/lib/gcc/x86_64-redhat-linux/4.4.7/include -name math.h
Which has the following output:
/usr/include/FL/math.h
/usr/include/c++/4.4.4/tr1/math.h
/usr/include/math.h
If we are really curious, we could open the header and see what it contains, but this is rarely necessary.
The linker will search some default paths for included library files. Again, it is important to note that you do not have to do this for a typical build, but the commands may prove useful when you are trying to work out why something fails to build.
To look for the library, we can run the following command to get a list of all library files the linker is aware of, then search that list for the math library we need:
ldconfig -p
ldconfig -p | grep libm.so
The latter command gives the output:
libm.so.6 (libc6,x86-64, OS ABI: Linux 2.6.18) => /lib64/libm.so.6
libm.so.6 (libc6, hwcap: 0x0028000000000000, OS ABI: Linux 2.6.18) => /lib/i686/nosegneg/libm.so.6
libm.so.6 (libc6, OS ABI: Linux 2.6.18) => /lib/libm.so.6
libm.so (libc6,x86-64, OS ABI: Linux 2.6.18) => /usr/lib64/libm.so
libm.so (libc6, OS ABI: Linux 2.6.18) => /usr/lib/libm.so
We certainly have the math library available. In fact, there are a few versions of this library known to the linker. Thankfully, we can let the linker sort out which one to use.
We might also want to peek inside a library file (or any object code for that matter) to see what functions and variables are defined within. We can list all the names, then search for the one we care about, like so:
nm /lib/libm.so.6
nm /lib/libm.so.6 | grep sqrt
The output of this command contains the following line, which shows us that it does indeed include something called sqrt.
0000000000025990 W sqrt
note: the following command lines build the libctest.so shared library used in the example below:
gcc -Wall -fPIC -c ctest1.c ctest2.c
gcc -shared -Wl,-soname,libctest.so -o libctest.so ctest1.o ctest2.o
or
gcc ctest1.c ctest2.c -shared -o libctest.so
end note
Let's switch to a new bit of example code, called use_ctest.c, that makes use of a (very simple) custom library in the ctest directory:
#include <stdio.h>
#include "ctest.h"
int main(){
int x;
int y;
int z;
ctest1(&x);
ctest2(&y);
z = (x / y);
printf("%d / %d = %d\n", x, y, z);
return 0;
}
Trying to compile this fails with an error:
gcc -c use_ctest.c
use_ctest.c:2:19: error: ctest.h: No such file or directory
As the error message indicates, the problem here is that an included header file is not found by the preprocessor. We can use the -I
flag to fix this problem:
gcc -I ctest_dir/include -c use_ctest.c
When we try to link the program to create an executable, we know we need to explicitly add the library with the -l
flag, but in this case we still get an error:
gcc use_ctest.o -lctest -o use_ctest
/usr/bin/ld: cannot find -lctest
collect2: ld returned 1 exit status
Just like for the header, we need to explicitly specify the path to the library file:
gcc -Lctest_dir/lib use_ctest.o -lctest -o use_ctest
Success, or so it would seem. What happens when we try to run our shiny new executable?
./use_ctest
./use_ctest: error while loading shared libraries: libctest.so: cannot open shared object file: No such file or directory
We can diagnose this problem by checking to see if the dynamic linker is able to gather up all the dependencies at runtime:
ldd use_ctest
linux-vdso.so.1 => (0x00007fffd75ff000)
libctest.so => not found
libc.so.6 => /lib64/libc.so.6 (0x00007f802d21b000)
/lib64/ld-linux-x86-64.so.2 (0x00007f802d5dd000)
The output clearly shows that it does not. The problem here is that the dynamic linker will only search the default paths unless we:
Permanently add our custom library to this search path. This option is not covered here - I am assuming that many of you will be working on clusters and other systems where you do not have root permissions.
Specify the location of non-standard libraries using the LD_LIBRARY_PATH variable. LD_LIBRARY_PATH contains a colon (:) separated list of directories where the dynamic linker should look for shared libraries. The linker will search these directories before the default system paths. You can define the value of LD_LIBRARY_PATH for a particular command only by preceding the command with the definition, like so:
LD_LIBRARY_PATH=ctest_dir/lib:$LD_LIBRARY_PATH ./use_ctest
Or define it for your whole shell as an environment variable:
export LD_LIBRARY_PATH=./ctest_dir/lib:$LD_LIBRARY_PATH
./use_ctest
Hard-code the location of non-standard libraries into the executable. Setting (and forgetting to set) LD_LIBRARY_PATH all the time can be tiresome. An alternative approach is to burn the location of the shared libraries into the executable as an RPATH or RUNPATH. This is done by adding some additional flags for the linker, like so:
gcc use_ctest.o -Lctest_dir/lib -lctest -Wl,-rpath=./ctest_dir/lib -o use_ctest
We can confirm that this worked by running the program (resetting LD_LIBRARY_PATH
first if needed), and more explicitly, by examining the executable directly:
./use_ctest
readelf -d use_ctest
Without using your history, try to recompile and run the use_ctest program. For an additional challenge, try to do so using RUNPATH to hardcode the location of the shared library.
The manual build process we used above can become quite tedious for all but the smallest projects. There are many ways that we might automate this process. The simplest would be to write a shell script that runs the build commands each time we invoke it. Let's take the simple hello.c
program as a test case:
#!/bin/bash
gcc -c hello.c
gcc hello.o -o hello
This works fine for small projects, but for large multi-file projects, we would have to compile all the sources every time we change any of the sources.
The Make utility provides a useful way around this problem. The solution is that we (the programmer) write a special script that defines all the dependencies between source files, edit one or more files in our project, then invoke Make to recompile only those files that are affected by any changes.
Make is a mini-programming language unto itself. The command make
looks for a file named Makefile
or makefile
in the same directory by default. Other file names can be specified by the option -f
:
make -f filename
For the hello
program, a Makefile might look like this:
hello: hello.o
	gcc hello.o -o hello
hello.o: hello.c
	gcc -c hello.c
clean:
	rm hello hello.o
The syntax here is target: prerequisite_1 prerequisite_2 etc. The command block that follows will be executed to generate the target if any of the prerequisites have been modified. Each command line must start with a tab character (spaces do not work). The first (top) target will be built by default, or you can specify a specific target to build following the make command. When we run make for the first time, the computer will take the following actions:
1. Find the first target, hello.
2. Check whether hello is up-to-date. hello does not exist, so it is out-of-date and will have to be built.
3. Check whether its prerequisite hello.o is up-to-date. hello.o does not exist, so it is out-of-date and will have to be built.
4. hello.c is not a target, so there is nothing left to check. The command gcc -c hello.c will be run to build hello.o.
5. hello.o is now up to date, so make builds the next target, hello, by running the command gcc hello.o -o hello.
A target is considered out-of-date if it does not exist, or if it is older than any of its prerequisites.
Note that the command under the clean target is not executed by a plain make, because clean is neither the first target nor a prerequisite of any other target. To run it, we need to specify the target name:
make clean
This will remove the executable and the .o files, which forces everything to be rebuilt the next time make runs. Notice that if all targets are up-to-date, make does not recompile anything.
Let's look at an example for our first multi-file program:
write: main.o WriteMyString.o
	gcc main.o WriteMyString.o -o write
main.o: main.c header.h
	gcc -c main.c
WriteMyString.o: WriteMyString.c
	gcc -c WriteMyString.c
clean:
	rm write *.o
In the first build, make builds the targets in the following sequence: main.o, WriteMyString.o and write. This compiles all the source files and links the object files to build the executable. In the next build, make will only rebuild the targets whose prerequisites have been modified since the last build. This makes it efficient for building a program with many source files. For example, if WriteMyString.c is modified, only WriteMyString.c is recompiled, while main.c is not. If main.c or header.h is modified, only main.c is recompiled, while WriteMyString.c is not. In either case, the write target will be rebuilt, since either main.o or WriteMyString.o has been updated.
By default, make prints each command to the screen before executing it. To suppress this echo, add @ before the command.
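A small sketch shows the difference. The targets loud and quiet and the directory demo_quiet are hypothetical; the example assumes GNU Make, and uses its --no-print-directory option to keep the output free of directory messages.

```shell
mkdir -p demo_quiet
printf 'loud:\n\techo building\n\nquiet:\n\t@echo building\n' > demo_quiet/Makefile
make --no-print-directory -C demo_quiet loud    # prints the command, then its output
make --no-print-directory -C demo_quiet quiet   # prints only the output
```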
Starting from the template below (or using our previous Makefile), see if you can write your own makefile for the multi_fav_num program:
fav: _____ _____
	gcc _____ ______ -o fav
main.o: _____ _____
	gcc ___ _____
other.o: _____ _____
	_________________
clean:
	rm _____ _____
A Makefile can become very complicated in a practical program with many source files. It is important to organize a Makefile logically and to keep its text as simple and clear as possible. To this end, we will introduce some more useful features of Make in this section.
You may have noticed that the same file names and command names are repeated many times in our previous Makefiles. It is more convenient to use variables. Take our first multi-file program as an example again:
CC=gcc
OBJ=main.o WriteMyString.o
EXE=write
$(EXE): $(OBJ)
	$(CC) $(OBJ) -o $(EXE)
main.o: main.c header.h
	$(CC) -c main.c
WriteMyString.o: WriteMyString.c
	$(CC) -c WriteMyString.c
clean:
	rm $(EXE) *.o
Here we have defined the variables CC for the compiler, OBJ for the object files and EXE for the executable file. If we want to change the compiler or the file names, we only need to modify the corresponding variables in one place, rather than every related place in the Makefile.
We can go further and replace the file names inside the rules with automatic variables:
$(EXE): $(OBJ)
	$(CC) $^ -o $@
main.o: main.c header.h
	$(CC) -c $<
WriteMyString.o: WriteMyString.c
	$(CC) -c $<
Here we have used the following automatic variables:
$@: the name of the current target
$^: the names of all the prerequisites
$<: the name of the first prerequisite
These automatic variables take the names of the current target or prerequisites, whatever those names happen to be.
Furthermore, we can notice that the main.o and WriteMyString.o targets are built by the same command. Is there a way to combine the two duplicated commands into one, so as to compile all source files with one command line? Yes, it can be done with an implicit rule:
%.o: %.c
	$(CC) -c $<
main.o: header.h
Here % stands for the same thing in the prerequisites as it does in the target. In this example, any .o target has a corresponding .c file as an implied prerequisite. If a target (e.g. main.o) needs additional prerequisites (e.g. header.h), write an actionless rule with those prerequisites. We can imagine that applying this implicit rule should significantly simplify the Makefile when there are a large number of (say hundreds of) source files.
If there are many variables to be defined, it is convenient to write the definitions of all variables in another file, and then include that file in the Makefile:
include ./variables
The content of the file variables is as follows:
CC=gcc
OBJ=main.o WriteMyString.o
EXE=write
In most cases, the target name is a file name. But there are exceptions, such as the clean target in this example. The rm command will not create any file named clean. What if there exists a file named clean in this directory? Let's do an experiment.
touch clean
make clean
make: `clean' is up to date.
The clean target does not work properly. Since a file named clean now exists and the target has no prerequisites, clean will always be considered up-to-date, and thus nothing will be done. To avoid this problem, we can declare the target to be phony by making it a prerequisite of the special target .PHONY as follows:
.PHONY: clean
A phony target is one that is not really the name of a file; rather it is just a name for a recipe to be executed.
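The experiment can be repeated in isolation to see both behaviours. This sketch uses a hypothetical scratch directory demo_phony and a dummy file junk.o standing in for real build products; it assumes GNU Make is installed.

```shell
mkdir -p demo_phony
printf 'clean:\n\trm -f junk.o\n' > demo_phony/Makefile
touch demo_phony/clean demo_phony/junk.o
make -C demo_phony clean   # a file named clean exists, so nothing is done
ls demo_phony              # junk.o is still there
printf '.PHONY: clean\nclean:\n\trm -f junk.o\n' > demo_phony/Makefile
make -C demo_phony clean   # the recipe now runs regardless of the file named clean
ls demo_phony              # junk.o has been removed
```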
Finally, we end up with a pretty elegant Makefile:
include ./variables
.PHONY: clean
$(EXE): $(OBJ)
	$(CC) $^ -o $@
%.o: %.c
	$(CC) -c $<
main.o: header.h
clean:
	rm $(EXE) *.o
Write a Makefile for the multi_fav_num program using regular variables, automatic variables and implicit rules.