13 Using the merger
There are many program analyses that are more effective when
done on the whole program.
The merger is a tool that combines all of the C source files in a project
into a single C file. There are two tasks that a merger must perform:
- Detect which source files make up the project and which compiler
arguments are used to compile each of them.
- Merge all of the source files into a single file.
For the first task the merger impersonates a compiler and a linker (both a
GCC and a Microsoft Visual C mode are supported) and it expects to be invoked
(from a build script or a Makefile) on all sources of the project. When
invoked to compile a source the merger just preprocesses the source and saves
the result using the name of the requested object file. By preprocessing at
this time the merger is able to take into account variations in the command
line arguments that affect preprocessing of different source files.
When the merger is invoked to link a number of object files it collects the
preprocessed sources that were stored with the names of the object files, and
invokes the merger proper. Note that arguments that affect the compilation or
linking must be the same for all source files.
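As a small illustration of why preprocessing is done at compile time, consider a source file whose text depends on a -D flag. This is only a sketch: the file name buf.c, the macro BUFSIZE, and the function buffer_size are hypothetical and are not part of the merger.

```c
/* buf.c (hypothetical): the preprocessed text depends on the command line. */
#ifndef BUFSIZE
#define BUFSIZE 64   /* default used when no -DBUFSIZE=... is given */
#endif

/* After preprocessing, BUFSIZE has already been replaced by a literal, so
   the saved "object file" records the value chosen for this source file. */
int buffer_size(void) {
  return BUFSIZE;
}
```

Compiled with -DBUFSIZE=256, the saved preprocessed source would contain "return 256;"; without the flag it contains "return 64;". Because the flag has already been consumed by the time the sources are merged, the merge step does not need to know about it.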
For the second task, the merger essentially concatenates the preprocessed
sources with care to rename conflicting file-local declarations (we call this
process alpha-conversion of a file). The merger also attempts to remove
duplicate global declarations and definitions. Specifically the following
actions are taken:
- File-scope names (static globals, names of types defined with
typedef, and structure/union/enumeration tags) are given new names if they
conflict with declarations from previously processed sources. The new name is
formed by appending the suffix ___n, where n is a unique integer
identifier. Then the new names are applied to their occurrences in the file.
- Non-static declarations and definitions of globals are never renamed,
but the merger tries to remove duplicates. Equality of globals is detected by
comparing the printed form of the global (ignoring the line number directives)
after the body has been alpha-converted. This process is intended to remove
those declarations (e.g. function prototypes) that originate from the same
include file. Similarly, we try to eliminate duplicate definitions of
inline functions, since these occasionally appear in include files.
- The types of all global declarations with the same name from all files
are compared for type isomorphism. During this process, the merger detects all
those isomorphisms between structures and type definitions that are required for the merged program to be legal. Such structure tags and
typenames are coalesced and given the same name.
- Besides the structure tags and type names that are required to be
isomorphic, the merger also tries to coalesce definitions of structures and
types with the same name from different files. However, in this case the merger
will not give an error if such definitions are not isomorphic; it will just
use different names for them.
- In rare situations, a file-local global is encountered first and is not
renamed, and the merger discovers only later, when processing another file,
that there is an external symbol with the same name.
In this case, a second pass is made over the merged file to rename the
file-local symbol.
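The renaming and duplicate-removal rules above can be sketched with a hand-written approximation of a merged file. Everything here is assumed for illustration: the source files a.c and b.c, the identifiers counter, next_a, next_b and report, and the suffix number 1. The merger chooses the actual integer, and its real output may be laid out differently.

```c
/* Hypothetical merged output for a.c and b.c (illustration only). */

/* from a.c */
int report(int value);         /* prototype from a shared header, kept once */
static int counter = 0;        /* first file-local 'counter': name is kept */
int next_a(void) { return ++counter; }

/* from b.c */
/* The identical prototype of report() from the same header is dropped. */
static int counter___1 = 100;  /* second 'counter' renamed; suffix is assumed */
int next_b(void) { return ++counter___1; }

/* Non-static definition: never renamed. */
int report(int value) { return value; }
```

After renaming, the two counters remain independent variables, just as they were when a.c and b.c were separate translation units.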
Here is an example of using the merger:
The contents of file1.c are:
struct foo; // Forward declaration
extern struct foo *global;
The contents of file2.c are:
struct bar {
   int x;
   struct bar *next;
};
extern struct bar *global;
struct foo {
   int y;
};
extern struct foo another;
int main(void) {
   return 0;
}
There are several ways in which one might create an executable from these
files:
- gcc file1.c file2.c -o a.out
- gcc -c file1.c -o file1.o
  gcc -c file2.c -o file2.o
  ld file1.o file2.o -o a.out
- gcc -c file1.c -o file1.o
  gcc -c file2.c -o file2.o
  ar r libfile2.a file2.o
  gcc file1.o libfile2.a -o a.out
- gcc -c file1.c -o file1.o
  gcc -c file2.c -o file2.o
  ar r libfile2.a file2.o
  gcc file1.o -lfile2 -o a.out
In each of the cases above you must replace all occurrences of gcc and
ld with cilly --merge, and all occurrences of ar with cilly
--merge --mode=AR. It is very important that the --merge flag be used
throughout the build process. If you want to see the merged source file you
must also pass the --keepmerged flag to the linking phase.
The result of merging file1.c and file2.c is:
// from file1.c
struct foo; // Forward declaration
extern struct foo *global;

// from file2.c
struct foo {
   int x;
   struct foo *next;
};
struct foo___1 {
   int y;
};
extern struct foo___1 another;