# Profiling the CompCert compiler

This documentation is about profiling the compiler itself, not about using programs'
profiling information to improve optimizations. For the latter, please refer to
[OPTIM_PROFILING](../OPTIM_PROFILING.md). CompCert can be compiled with the `Landmarks`
profiling library in order to measure the compiler's performance and to find out in which
functions it spends most of its time.

## Landmarks

First, install [Landmarks](https://github.com/LexiFi/landmarks) on your computer with:
```
opam install landmarks-ppx
```

The Landmarks library can either automatically instrument the code using a preprocessor extension
(ppx) or work with manually instrumented code (see the online documentation).

In our case, we prefer the automatic instrumentation, which instruments the top-level functions of
each module.

## Cleaning and building CompCert with the profiling lib

First, we need to clean up our installation in order to rebuild CompCert with a special Make
variable indicating that we want to use the profiling library and to automatically instrument the
OCaml code.

```
make distclean
./config_<your_platform>.sh
```

Then, edit the file `Makefile.config` and change the `OCAML_LM_PROF` variable to `true`. After that,
we can follow the classical compilation process:

```
make -jN
```
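
If you prefer not to edit `Makefile.config` by hand, the flag can also be flipped from the command
line. A minimal sketch, assuming the `OCAML_LM_PROF` line already exists in the generated
`Makefile.config`:

```
# Set OCAML_LM_PROF to true in Makefile.config, then rebuild with all available cores.
sed -i 's/^OCAML_LM_PROF *=.*/OCAML_LM_PROF=true/' Makefile.config
make -j"$(nproc)"
```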

## Profile CompCert on a specific input file

To manually extract profiling information about CompCert on a chosen source file, one only has to
set an environment variable, as in this example:

```
OCAML_LANDMARKS=on,format=json,output=temporary:<directory>,time ./ccomp -S <source_file>.c
```

The `OCAML_LANDMARKS` variable is read at runtime. Its arguments are of the form `arg[=option]`,
where the bracketed part is omitted for flags without sub-arguments:

- `on` turns profiling on; it can be replaced by `off` to disable it.
- `format=json` tells the profiling library to output the data as JSON; it may be replaced by
  `textual` to output the data in a tty-style format.
- `output=temporary:<directory>` stores the results in `<directory>`; it is also possible to use
  `stderr` or `stdout`. A temporary directory is handy because files get a generated (unique) name,
  and since the JSON contains a label with the name of the source program, no information is lost.
  The `temporary` option is therefore useful when writing automated scripts.
- `time` records the system time of execution (in seconds) in addition to the number of clock
  ticks.
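
For a quick interactive look, the same variable can be combined differently; for instance, the
following prints a textual report directly on the standard output instead of writing a JSON file:

```
OCAML_LANDMARKS=on,format=textual,output=stdout,time ./ccomp -S <source_file>.c
```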

## Extract the profiling ratios for some function(s)

Using the above command, you can get a profiling database for any program compiled with CompCert.
Now, to examine the time spent in one or more functions, compared to the total time, one can use
the `prof_function.sh` script.

```
./prof_function.sh jq <path_to_json_file> <fct1;fct2;...>
```

Call the script using the above template, where `jq` is a JSON processor available in every Linux
distribution (it needs to be installed). The second argument is the path of the JSON file, and the
third is a list of functions to monitor. For instance, in the `RTLpath` version of CompCert (on the
CPP_2022 branch), I use `function_equiv_checker;WF.*function_checker;Live.*function_checker`.

The above list contains three elements (regexps, separated by `;`):
- the first is the name of the SE (symbolic execution) checker;
- the second is the well-formedness liveness checker (for pre-output-regs);
- the third is the "normal" liveness checker.

When calling the script using this list of functions, and a json file obtained with the above
method, the script will output the following result:

```
./prof_function.sh jq ../Compcert_one/csv_tmp/profile_at_exitf1e257.tmp "function_equiv_checker;WF.*function_checker;Live.*function_checker"
total_cycles,total_time
55433885580.00000000000000000000,15.36316000000000000017
sum_fct_cycles,sum_fct_time
802227564.00000000000000000000,.22260400000000000001
test_name,ratio_cycles,ratio_time
"../../ccomp -S -stdlib ../../runtime -dparse -dclight -dasm -fstruct-return -c -o symbol.o symbol.c",1.44717902345534985300,1.44894670106931126100
```

Description of the above output (line by line):
the first line is the shell command used to call the script; lines 2 and 3 give the total number of
cycles and the total system time used by CompCert when compiling the file.
Lines 4 and 5 give the same numbers (cycles, system time), but summed over the functions taken as
arguments (so, in our example, the three checkers used in `RTLpath`).
Finally, lines 6 and 7 give the launched command (here, compiling `symbol.c`, a CI test) and the
ratios, expressed as percentages, obtained by comparing the previous sums with the totals, for the
number of cycles and for the system time.

## Profile using the Continuous Integration tests

As testing with only one source file is not representative, there is a script that can launch the
above process for every test in the CI: `prof_ci_tests.sh`. The latter takes two arguments: the
CompCert repo clone root directory (as an absolute path) and the list of functions, as above.

It will create a directory `json_tmp` containing the JSON files for each CI test, and a file
`prof.csv` containing an array of the outputs (only the last line, i.e. the ratios) of
`prof_function.sh` for each test.
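
For example, reusing the function list from the previous section (the repository path below is of
course specific to your setup):

```
./prof_ci_tests.sh /home/user/CompCert "function_equiv_checker;WF.*function_checker;Live.*function_checker"
```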

## Profile multiple versions of CompCert with different functions

To compute results using different CompCert versions, use the script `prof_multiple_ccomp.sh`, after
editing the variables it contains. This script calls the `prof_ci_tests.sh` script on each CompCert
version specified, with the list of functions given for that version.

## Merging results obtained with `prof_multiple_ccomp.sh`

Gather every `prof.csv` (one for each profiled version) in the same place (by default, they are
created in the root directory of each CompCert version), and launch the `stats_prof_info.py` script
(you need the `pandas` module, installable with pip). The script takes as arguments the path of each
csv file. It merges the csv files (keeping the same order as the arguments) into one final csv. The
merge is an inner join, so tests that are not available in every version are dropped. The script
also cleans lines full of zeros, which can occur when the CI compiles object files, for instance.
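
A possible invocation, assuming the three `prof.csv` files have been renamed after their respective
versions before being gathered (the file names are only illustrative):

```
pip install pandas    # dependency of the script
python3 stats_prof_info.py rtlpath_prof.csv btl_new_prof.csv btl_old_prof.csv
```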

Once you get the resulting file, you can simply open it with LibreOffice to average each column and
get a precise idea of the "weight" of the given functions in each CompCert version.
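
Alternatively, a rough command-line sketch of the same column averaging with `awk`, assuming a plain
comma-separated file whose first row is the header, whose first column is the test name, and whose
fields contain no embedded commas:

```
awk -F, '
  NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }   # remember the column headers
          { for (i = 2; i <= NF; i++) sum[i] += $i; n++ }    # accumulate each numeric column
  END     { for (i = 2; i <= NF; i++) printf "%s: %f\n", name[i], sum[i] / n }
' merged.csv
```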

## Example

I created those scripts to observe the performance difference between the verifier(s) of `RTLpath`,
of `BTL` with a separate liveness checker (first version), and of `BTL` with the liveness check
embedded in the symbolic execution (latest version).
The first `BTL` version is named "BTL OLD" and the latest one "BTL NEW".
The collected data is accessible [here](https://gricad-gitlab.univ-grenoble-alpes.fr/gourdinl/verimag_pres/-/tree/master/benches/results/ccomp_profiling/btl_old_new_vs_rtlpath) as a tar gzip archive.
In the folder, filenames are prefixed with a number:
- "one" corresponds to RTLpath
- "two" corresponds to BTL NEW
- "three" corresponds to BTL OLD

The file `merged.csv` in the folder is the result of the merging step. I have manually modified it
so that the first line gives information on the CompCert version (column headers), and the last line
gives the average of each column.

The fact that, for a given CompCert version, the ratios of the cycles and time columns are close to
each other is a good sign: if the ratios are similar, the measurement is precise.

We can conclude the following:
- For `RTLpath`: the three verifiers combined are (on average) responsible for about 2.11% of the
    total compilation time.
- For `BTL NEW`: the unique verifier takes about 1.14%, which is the lowest (thus best) relative
    execution time.
- For `BTL OLD`: the two verifiers combined are responsible for about 2.17%.

Hence, the old version of `BTL` was slightly slower than `RTLpath`, but the new one is about twice
as fast!