author     John Wickerson <j.wickerson@imperial.ac.uk>  2021-08-13 13:21:42 +0000
committer  node <node@git-bridge-prod-0>                2021-08-13 13:22:38 +0000
commit     731e787ff3c7017a262f51f94d6d576aaa4e9018 (patch)
tree       e076c9f99ce26830ade832b5ed1f1337a5e64618 /evaluation.tex
parent     9d5ecc3f79ad3fb4a4186aef9c6b3430331128c3 (diff)
Update on Overleaf.
Diffstat (limited to 'evaluation.tex')
-rw-r--r--  evaluation.tex  31
1 file changed, 24 insertions, 7 deletions
diff --git a/evaluation.tex b/evaluation.tex
index 7afe23f..26917a0 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -179,16 +179,33 @@ By looking at the median, when division/modulo operations are enabled, we see th
\begin{figure}
\centering
- \begin{tabular}{cccc}\toprule
- \textbf{Passing} & \textbf{Compile time errors} & \textbf{Runtime errors} & \textbf{Total}\\\midrule
- 40379 (26.00\%) & 114849 (73.97\%) & 39 (0.03\%) & 155267\\\bottomrule
- \end{tabular}
- \caption{Results of fuzzing \vericert{} using random C files generated by Csmith.}\label{tab:fuzzing}
+ %\begin{tabular}{cccc}\toprule
+ % \textbf{Passing} & \textbf{Compile time errors} & \textbf{Runtime errors} & \textbf{Total}\\\midrule
+ % 40379 (26.00\%) & 114849 (73.97\%) & 39 (0.03\%) & 155267\\\bottomrule
+ %\end{tabular}
+
+ \begin{tikzpicture}[xscale=0.11]
+ \draw[-latex] (13,0.5) to (13,0.25);
+ \draw[-latex] (55,0.5) to (55,0.25);
+ \draw[-latex] (99.85,0.5) to (99.85,0.25);
+ \draw[green, line width=5mm] (0,0) to (26.0,0);
+ \draw[yellow, line width=5mm] (26.0,0) to (99.7,0);
+ \draw[red, line width=5mm] (99.7,0) to (100,0);
+ \node[anchor=south] at (13,0.5) {40379 passing (26.00\%)};
+ \node[anchor=south] at (55,0.5) {114849 compile-time errors (73.97\%)};
+ \node[anchor=south] at (100,0.5) {39 run-time errors (0.03\%)};
+ \end{tikzpicture}
+ \caption{Results of fuzzing \vericert{} using 155267 random C programs generated by Csmith.}\label{tab:fuzzing}
\end{figure}
-To check that the Verilog designs generated by \vericert{} are actually correct, and that the correctness theorem is indeed effective, \vericert{} was fuzzed using Csmith~\cite{yang11_findin_under_bugs_c_compil}, with a similar configuration used by \citet{du21_fuzzin_high_level_synth_tools} when other HLS tools were fuzzed, including \legup{}. The main features that were turned off in addition to the ones turned off by \citet{du21_fuzzin_high_level_synth_tools} were generation of global variables, as well as generation of operations other than 32-bits. The generated designs were tested by simulating them and comparing the output value to the results of compiling the test-cases with GCC 10.3.0.
+\begin{quotation}
+{\it ``Beware of bugs in the above code; I have only proved it correct, not tried it.''} \par\hfill -- D. E. Knuth (1977)
+\end{quotation}
+
+To gain further confidence that the Verilog designs generated by \vericert{} are actually correct, and that the correctness theorem is indeed effective, we fuzzed \vericert{} using Csmith~\cite{yang11_findin_under_bugs_c_compil}. \citeauthor{yang11_findin_under_bugs_c_compil} previously used Csmith in an extensive fuzzing campaign on CompCert and found a handful of bugs in the unverified parts of that compiler, so it is natural to explore whether it can find bugs in \vericert{} too. \citet{du21_fuzzin_high_level_synth_tools} have recently used Csmith to fuzz other HLS tools, including \legup{}, so we configured Csmith in a similar way. In addition to the features turned off by \citeauthor{du21_fuzzin_high_level_synth_tools}, we turned off the generation of global variables and of non-32-bit operations. The generated designs were tested by simulating them and comparing the output value to the result of compiling and running the test-cases with GCC 10.3.0.
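+Concretely, each test-case went through a differential-testing loop along the following lines. This is only a sketch: the file names and the \vericert{} invocation shown here are illustrative, not the harness actually used.
+\begin{verbatim}
+/* One iteration of the differential-testing loop (illustrative sketch;
+   the file names and the Vericert invocation are hypothetical). */
+#include <stdlib.h>
+
+int main(void) {
+  /* Reference: compile the Csmith test-case natively and record its output. */
+  if (system("gcc -O0 test.c -o test_gcc && ./test_gcc > gcc.out")) return 1;
+  /* Candidate: compile to Verilog, then simulate the resulting design. */
+  if (system("vericert test.c -o test.v")) return 1;
+  if (system("iverilog -o test.vvp test.v && vvp test.vvp > hls.out")) return 1;
+  /* The test-case passes iff both runs report the same final value. */
+  return system("diff -q gcc.out hls.out") != 0;
+}
+\end{verbatim}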
-The results of the fuzzing run are shown in Table~\ref{tab:fuzzing}. Out of 155267 test-cases generated by Csmith, 26\% of them passed, meaning they compiled without error and resulted in the same final value as GCC. Then, because of some small limitations in the programs accepted by \vericert{}, most of the test-cases, 73.97\%, failed at compile time. The most common reasons for this were unsigned comparisons between integers, which needed to be signed for \vericert{} instead, and no support for 8-bit operations. The latter programs are present because of a slight limitation with how Csmith can be configured, as it does not allow 8-bit operations to be turned off completely, and will get stuck when it generates the C code. Because the C code could not be tailored exactly to the supported subset of inputs \vericert{} accepts, such a high compile time failure rate is expected. Finally, and more interestingly, there were a total of 39 runtime failures, which the correctness theorem should be proving could not be possible. However, these 39 failures are all because of a bug in the pretty printing of the final Verilog code, where a single bit not operation (\texttt{!}) was used instead of the proper 32-bit not operation modelled in the semantics (\verb|~|). Once this bug was fixed, all test-cases passed.
+The results of the fuzzing run are shown in Figure~\ref{tab:fuzzing}. Out of 155267 test-cases generated by Csmith, 26\% of them passed, meaning they compiled without error and resulted in the same final value as GCC. Most of the test-cases, 73.97\%, failed at compile time. The most common reasons for this were unsigned comparisons between integers (\vericert{} requires them to be signed), and the presence of 8-bit operations (which \vericert{} does not support, and which we could not turn off due to a limitation in Csmith). % The latter programs are present because of a slight limitation with how Csmith can be configured, as it does not allow 8-bit operations to be turned off completely, and will get stuck when it generates the C code.
+Because the test-cases generated by Csmith could not be tailored exactly to the C fragment that \vericert{} supports, such a high compile-time failure rate is not unexpected. Finally, and most interestingly, there were a total of 39 run-time failures, which the correctness theorem ought to rule out. However, all 39 of these failures turned out to be due to a bug in the pretty-printing of the final Verilog code, where a single-bit logical negation (\texttt{!}) was accidentally printed instead of the 32-bit bitwise negation modelled in the semantics (\verb|~|). Once this bug was fixed, all test-cases passed.
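+The distinction is easy to demonstrate. The following minimal C program (our own illustration, not taken from a generated test-case) shows the two operators diverging; Verilog's \texttt{!} and \verb|~| behave analogously:
+\begin{verbatim}
+#include <stdio.h>
+
+int main(void) {
+  int x = 2;
+  printf("!x = %d\n", !x);  /* logical not: 0, because x is non-zero */
+  printf("~x = %d\n", ~x);  /* bitwise not: -3, every bit of x flipped */
+  return 0;
+}
+\end{verbatim}
+A design that prints \texttt{!} therefore computes a single-bit result where the semantics (and GCC) expect all 32 bits to be flipped, which is exactly the divergence that the failing test-cases exposed.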
%%% Local Variables:
%%% mode: latex