author     John Wickerson <j.wickerson@imperial.ac.uk>  2021-08-13 13:21:42 +0000
committer  node <node@git-bridge-prod-0>                2021-08-13 13:22:38 +0000
commit     731e787ff3c7017a262f51f94d6d576aaa4e9018 (patch)
tree       e076c9f99ce26830ade832b5ed1f1337a5e64618 /evaluation.tex
parent     9d5ecc3f79ad3fb4a4186aef9c6b3430331128c3 (diff)
Update on Overleaf.
Diffstat (limited to 'evaluation.tex')
-rw-r--r--  evaluation.tex  31
1 file changed, 24 insertions, 7 deletions
diff --git a/evaluation.tex b/evaluation.tex
index 7afe23f..26917a0 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -179,16 +179,33 @@ By looking at the median, when division/modulo operations are enabled, we see th
\begin{figure}
\centering
- \begin{tabular}{cccc}\toprule
- \textbf{Passing} & \textbf{Compile time errors} & \textbf{Runtime errors} & \textbf{Total}\\\midrule
- 40379 (26.00\%) & 114849 (73.97\%) & 39 (0.03\%) & 155267\\\bottomrule
- \end{tabular}
- \caption{Results of fuzzing \vericert{} using random C files generated by Csmith.}\label{tab:fuzzing}
+ %\begin{tabular}{cccc}\toprule
+ % \textbf{Passing} & \textbf{Compile time errors} & \textbf{Runtime errors} & \textbf{Total}\\\midrule
+ % 40379 (26.00\%) & 114849 (73.97\%) & 39 (0.03\%) & 155267\\\bottomrule
+ %\end{tabular}
+
+ \begin{tikzpicture}[xscale=0.11]
+ \draw[-latex] (13,0.5) to (13,0.25);
+ \draw[-latex] (55,0.5) to (55,0.25);
+ \draw[-latex] (99.85,0.5) to (99.85,0.25);
+ \draw[green, line width=5mm] (0,0) to (26.0,0);
+ \draw[yellow, line width=5mm] (26.0,0) to (99.7,0);
+ \draw[red, line width=5mm] (99.7,0) to (100,0);
+ \node[anchor=south] at (13,0.5) {40379 passing (26.00\%)};
+ \node[anchor=south] at (55,0.5) {114849 compile-time errors (73.97\%)};
+ \node[anchor=south] at (100,0.5) {39 run-time errors (0.03\%)};
+ \end{tikzpicture}
+ \caption{Results of fuzzing \vericert{} using 155267 random C programs generated by Csmith.}\label{tab:fuzzing}
\end{figure}
-To check that the Verilog designs generated by \vericert{} are actually correct, and that the correctness theorem is indeed effective, \vericert{} was fuzzed using Csmith~\cite{yang11_findin_under_bugs_c_compil}, with a similar configuration used by \citet{du21_fuzzin_high_level_synth_tools} when other HLS tools were fuzzed, including \legup{}. The main features that were turned off in addition to the ones turned off by \citet{du21_fuzzin_high_level_synth_tools} were generation of global variables, as well as generation of operations other than 32-bits. The generated designs were tested by simulating them and comparing the output value to the results of compiling the test-cases with GCC 10.3.0.
+\begin{quotation}
+{\it ``Beware of bugs in the above code; I have only proved it correct, not tried it.''} \par\hfill -- D. E. Knuth (1977)
+\end{quotation}
+
+To gain further confidence that the Verilog designs generated by \vericert{} are actually correct, and that the correctness theorem is indeed effective, we fuzzed \vericert{} using Csmith~\cite{yang11_findin_under_bugs_c_compil}. \citeauthor{yang11_findin_under_bugs_c_compil} previously used Csmith in an extensive fuzzing campaign on CompCert and found a handful of bugs in the unverified parts of that compiler, so it is natural to explore whether it can find bugs in \vericert{} too. \citet{du21_fuzzin_high_level_synth_tools} have recently used Csmith to fuzz other HLS tools, including \legup{}, so we configured Csmith in a similar way. In addition to the features turned off by \citeauthor{du21_fuzzin_high_level_synth_tools}, we turned off the generation of global variables and of non-32-bit operations. The generated designs were tested by simulating them and comparing the output value to the result of compiling and running the test-cases with GCC 10.3.0.
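+Concretely, each test-case went through a differential-testing loop along the following lines. This is only a sketch: the file names and the \vericert{} invocation shown here are illustrative, not the harness actually used.
+\begin{verbatim}
+/* One iteration of the differential-testing loop (illustrative sketch;
+   the file names and the Vericert invocation are hypothetical). */
+#include <stdlib.h>
+
+int main(void) {
+  /* Reference: compile the Csmith test-case natively and record its output. */
+  if (system("gcc -O0 test.c -o test_gcc && ./test_gcc > gcc.out")) return 1;
+  /* Candidate: compile to Verilog, then simulate the resulting design. */
+  if (system("vericert test.c -o test.v")) return 1;
+  if (system("iverilog -o test.vvp test.v && vvp test.vvp > hls.out")) return 1;
+  /* The test-case passes iff both runs report the same final value. */
+  return system("diff -q gcc.out hls.out") != 0;
+}
+\end{verbatim}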
-The results of the fuzzing run are shown in Table~\ref{tab:fuzzing}. Out of 155267 test-cases generated by Csmith, 26\% of them passed, meaning they compiled without error and resulted in the same final value as GCC. Then, because of some small limitations in the programs accepted by \vericert{}, most of the test-cases, 73.97\%, failed at compile time. The most common reasons for this were unsigned comparisons between integers, which needed to be signed for \vericert{} instead, and no support for 8-bit operations. The latter programs are present because of a slight limitation with how Csmith can be configured, as it does not allow 8-bit operations to be turned off completely, and will get stuck when it generates the C code. Because the C code could not be tailored exactly to the supported subset of inputs \vericert{} accepts, such a high compile time failure rate is expected. Finally, and more interestingly, there were a total of 39 runtime failures, which the correctness theorem should be proving could not be possible. However, these 39 failures are all because of a bug in the pretty printing of the final Verilog code, where a single bit not operation (\texttt{!}) was used instead of the proper 32-bit not operation modelled in the semantics (\verb|~|). Once this bug was fixed, all test-cases passed.
+The results of the fuzzing run are shown in Figure~\ref{tab:fuzzing}. Out of 155267 test-cases generated by Csmith, 26\% of them passed, meaning they compiled without error and resulted in the same final value as GCC. Most of the test-cases, 73.97\%, failed at compile time. The most common reasons for this were unsigned comparisons between integers (\vericert{} requires them to be signed), and the presence of 8-bit operations (which \vericert{} does not support, and which we could not turn off due to a limitation in Csmith). % The latter programs are present because of a slight limitation with how Csmith can be configured, as it does not allow 8-bit operations to be turned off completely, and will get stuck when it generates the C code.
+Because the test-cases generated by Csmith could not be tailored exactly to the C fragment that \vericert{} supports, such a high compile-time failure rate is not unexpected. Finally, and most interestingly, there were a total of 39 run-time failures, which the correctness theorem ought to rule out. However, all 39 of these failures turned out to be due to a bug in the pretty-printing of the final Verilog code, where a single-bit logical negation (\texttt{!}) was accidentally printed instead of the 32-bit bitwise negation modelled in the semantics (\verb|~|). Once this bug was fixed, all test-cases passed.
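+The distinction is easy to demonstrate. The following minimal C program (our own illustration, not taken from a generated test-case) shows the two operators diverging; Verilog's \texttt{!} and \verb|~| behave analogously:
+\begin{verbatim}
+#include <stdio.h>
+
+int main(void) {
+  int x = 2;
+  printf("!x = %d\n", !x);  /* logical not: 0, because x is non-zero */
+  printf("~x = %d\n", ~x);  /* bitwise not: -3, every bit of x flipped */
+  return 0;
+}
+\end{verbatim}
+A design that prints \texttt{!} therefore computes a single-bit result where the semantics (and GCC) expect all 32 bits to be flipped, which is exactly the divergence that the failing test-cases exposed.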
%%% Local Variables:
%%% mode: latex