From 731e787ff3c7017a262f51f94d6d576aaa4e9018 Mon Sep 17 00:00:00 2001 From: John Wickerson Date: Fri, 13 Aug 2021 13:21:42 +0000 Subject: Update on Overleaf. --- evaluation.tex | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/evaluation.tex b/evaluation.tex index 7afe23f..26917a0 100644 --- a/evaluation.tex +++ b/evaluation.tex @@ -179,16 +179,33 @@ By looking at the median, when division/modulo operations are enabled, we see th \begin{figure} \centering - \begin{tabular}{cccc}\toprule - \textbf{Passing} & \textbf{Compile time errors} & \textbf{Runtime errors} & \textbf{Total}\\\midrule - 40379 (26.00\%) & 114849 (73.97\%) & 39 (0.03\%) & 155267\\\bottomrule - \end{tabular} - \caption{Results of fuzzing \vericert{} using random C files generated by Csmith.}\label{tab:fuzzing} + %\begin{tabular}{cccc}\toprule + % \textbf{Passing} & \textbf{Compile time errors} & \textbf{Runtime errors} & \textbf{Total}\\\midrule + % 40379 (26.00\%) & 114849 (73.97\%) & 39 (0.03\%) & 155267\\\bottomrule + %\end{tabular} + + \begin{tikzpicture}[xscale=0.11] + \draw[-latex] (13,0.5) to (13,0.25); + \draw[-latex] (55,0.5) to (55,0.25); + \draw[-latex] (99.85,0.5) to (99.85,0.25); + \draw[green, line width=5mm] (0,0) to (26.0,0); + \draw[yellow, line width=5mm] (26.0,0) to (99.7,0); + \draw[red, line width=5mm] (99.7,0) to (100,0); + \node[anchor=south] at (13,0.5) {40379 passing (26.00\%)}; + \node[anchor=south] at (55,0.5) {114849 compile-time errors (73.97\%)}; + \node[anchor=south] at (100,0.5) {39 run-time errors (0.03\%)}; + \end{tikzpicture} + \caption{Results of fuzzing \vericert{} using 155267 random C programs generated by Csmith.}\label{tab:fuzzing} \end{figure} -To check that the Verilog designs generated by \vericert{} are actually correct, and that the correctness theorem is indeed effective, \vericert{} was fuzzed using Csmith~\cite{yang11_findin_under_bugs_c_compil}, with a similar configuration used by \citet{du21_fuzzin_high_level_synth_tools} when other HLS tools were fuzzed, including \legup{}. The main features that were turned off in addition to the ones turned off by \citet{du21_fuzzin_high_level_synth_tools} were generation of global variables, as well as generation of operations other than 32-bits. The generated designs were tested by simulating them and comparing the output value to the results of compiling the test-cases with GCC 10.3.0. +\begin{quotation} +{\it ``Beware of bugs in the above code; I have only proved it correct, not tried it.''} \par\hfill -- D. E. Knuth (1977) +\end{quotation} + +To gain further confidence that the Verilog designs generated by \vericert{} are actually correct, and that the correctness theorem is indeed effective, we fuzzed \vericert{} using Csmith~\cite{yang11_findin_under_bugs_c_compil}. \citeauthor{yang11_findin_under_bugs_c_compil} previously used Csmith in an extensive fuzzing campaign on CompCert and found a handful of bugs in the unverified parts of that compiler, so it is natural to explore whether it can find bugs in \vericert{} too. \citet{du21_fuzzin_high_level_synth_tools} have recently used Csmith to fuzz other HLS tools including \legup{}, so we configured Csmith in a similar way. In addition to the features turned off by \citeauthor{du21_fuzzin_high_level_synth_tools}, we turned off the generation of global variables and non-32-bit operations. The generated designs were tested by simulating them and comparing the output value to the results of compiling the test-cases with GCC 10.3.0. -The results of the fuzzing run are shown in Table~\ref{tab:fuzzing}. Out of 155267 test-cases generated by Csmith, 26\% of them passed, meaning they compiled without error and resulted in the same final value as GCC. Then, because of some small limitations in the programs accepted by \vericert{}, most of the test-cases, 73.97\%, failed at compile time. The most common reasons for this were unsigned comparisons between integers, which needed to be signed for \vericert{} instead, and no support for 8-bit operations. The latter programs are present because of a slight limitation with how Csmith can be configured, as it does not allow 8-bit operations to be turned off completely, and will get stuck when it generates the C code. Because the C code could not be tailored exactly to the supported subset of inputs \vericert{} accepts, such a high compile time failure rate is expected. Finally, and more interestingly, there were a total of 39 runtime failures, which the correctness theorem should be proving could not be possible. However, these 39 failures are all because of a bug in the pretty printing of the final Verilog code, where a single bit not operation (\texttt{!}) was used instead of the proper 32-bit not operation modelled in the semantics (\verb|~|). Once this bug was fixed, all test-cases passed. +The results of the fuzzing run are shown in Figure~\ref{tab:fuzzing}. Out of 155267 test-cases generated by Csmith, 26\% of them passed, meaning they compiled without error and resulted in the same final value as GCC. Most of the test-cases, 73.97\%, failed at compile time. The most common reasons for this were unsigned comparisons between integers (\vericert{} requires them to be signed), and the presence of 8-bit operations (which \vericert{} does not support, and which we could not turn off due to a limitation in Csmith). % The latter programs are present because of a slight limitation with how Csmith can be configured, as it does not allow 8-bit operations to be turned off completely, and will get stuck when it generates the C code. +Because the test-cases generated by Csmith could not be tailored exactly to the C fragment that \vericert{} supports, such a high compile-time failure rate is not unexpected. Finally, and most interestingly, there were a total of 39 run-time failures, which the correctness theorem should be proving impossible. However, all 39 of these failures are due to a bug in the pretty-printing of the final Verilog code, where a logical negation (\texttt{!}) was accidentally used instead of a bitwise negation (\verb|~|). Once this bug was fixed, all test-cases passed. %%% Local Variables: %%% mode: latex -- cgit