summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2020-09-14 16:55:08 +0100
committerYann Herklotz <git@yannherklotz.com>2020-09-14 16:55:08 +0100
commit72e8dc4b707a13c68722ce2f28665a84df5a7937 (patch)
tree9ced0eafedd25d1a3e79d4b8980acf03e22e45a7
parentf09e782d0925bc735aadc29bf595d1e3cc187351 (diff)
downloadfccm21_esrhls-72e8dc4b707a13c68722ce2f28665a84df5a7937.tar.gz
fccm21_esrhls-72e8dc4b707a13c68722ce2f28665a84df5a7937.zip
Add small changes
-rw-r--r--intro.tex5
-rw-r--r--main.tex2
-rw-r--r--method-new.tex3
-rw-r--r--related.tex2
-rw-r--r--tool-figure.tex2
5 files changed, 6 insertions, 8 deletions
diff --git a/intro.tex b/intro.tex
index da09f55..27b2831 100644
--- a/intro.tex
+++ b/intro.tex
@@ -1,4 +1,3 @@
-
\section{Introduction}
High-level synthesis (HLS), which refers to the automatic translation of software into hardware, is becoming an increasingly important part of the computing landscape.
It promises to increase the productivity of hardware engineers by raising the abstraction level of their designs, and it promises software engineers the ability to produce application-specific hardware accelerators without having to understand hardware desciption languages (HDL) such as Verilog and VHDL.
@@ -66,9 +65,9 @@ This paper reports on our campaign to test HLS tools by fuzzing.
\item Our testing campaign revealed that all three tools could be made to crash while compiling or to generate wrong RTL. In total, we found \ref{XX} bugs across the three tools, all of which have been reported to the respective developers, and \ref{XX} of which have been confirmed at the time of writing.
- \item To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (2018.3, 2019.1, and 2019.2). \JW{Put a sentence here summarising our findings from this experiment, once we have them.}
+ \item To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (2018.3, 2019.1, and 2019.2). \JW{Put a sentence here summarising our findings from this experiment, once we have them.}\YH{Yes, will do that as soon as I have them.}
\end{itemize}
-% we test, and then augment each program with randomly chosen HLS-specific directives. We synthesise each C program to RTL, and use a Verilog simulator to calculate its return value. If synthesis crashes, or if this return value differs from the return value obtained by executing a binary compiled from the C program by gcc, then we have found a candidate bug. We then use trial-and-error to reduce the C program to a minimal version that still triggers a bug.
+% we test, and then augment each program with randomly chosen HLS-specific directives. We synthesise each C program to RTL, and use a Verilog simulator to calculate its return value. If synthesis crashes, or if this return value differs from the return value obtained by executing a binary compiled from the C program by GCC, then we have found a candidate bug. We then use trial-and-error to reduce the C program to a minimal version that still triggers a bug.
% We have tested three widely used HLS tools: LegUp~\cite{canis13_legup}, Xilinx Vivado HLS~\cite{xilinx20_vivad_high_synth}, and the Intel HLS Compiler~\cite{?}. For all three tools, we were able to find valid C programs that cause crashes while compiling and valid C programs that cause wrong RTL to be generated. We have submitted a total of \ref{?} bug reports to the developers, \ref{?} of which have been confirmed and \ref{?} of which have now been fixed at the time of writing.
% We hope that our work serves to stimulate efforts to improve the quality of HLS tools.
diff --git a/main.tex b/main.tex
index ace7062..1a6b605 100644
--- a/main.tex
+++ b/main.tex
@@ -115,7 +115,7 @@
High-level synthesis (HLS) is becoming an increasingly important part of the computing landscape, even in safety-critical domains where correctness is key.
As such, HLS tools are increasingly relied upon. In this paper, we investigate whether they are trustworthy.
-We have subjected three widely used HLS tools -- LegUp, Xilinx Vivado HLS, and the Intel HLS Compiler -- to a rigorous fuzzing campaign using thousands of random, valid C programs that we generated using a modified version of the Csmith tool. For each C program, we compiled it to a hardware design using the HLS tool under test and checked whether that hardware design generates the same output as an executable generated by the gcc \JW{We should decide on a consistent way to capitalise `gcc'. I vote for `GCC' because that's how Wikipedia writes it.} compiler. When discrepancies arose between gcc and the HLS tool under test, we reduced the C program to a minimal example in order to zero in on the potential bug. Our testing campaign has revealed that all three HLS tools can be made either to crash or to generate wrong code when given valid C programs, and thereby underlines the need for these increasingly trusted tools to be more rigorously engineered.
+We have subjected three widely used HLS tools -- LegUp, Xilinx Vivado HLS, and the Intel HLS Compiler -- to a rigorous fuzzing campaign using thousands of random, valid C programs that we generated using a modified version of the Csmith tool. For each C program, we compiled it to a hardware design using the HLS tool under test and checked whether that hardware design generates the same output as an executable generated by the GCC compiler. When discrepancies arose between GCC and the HLS tool under test, we reduced the C program to a minimal example in order to zero in on the potential bug. Our testing campaign has revealed that all three HLS tools can be made either to crash or to generate wrong code when given valid C programs, and thereby underlines the need for these increasingly trusted tools to be more rigorously engineered.
\end{abstract}
%%
diff --git a/method-new.tex b/method-new.tex
index e34d99a..695a173 100644
--- a/method-new.tex
+++ b/method-new.tex
@@ -60,7 +60,6 @@ This is vital for our work since we want to generate programs that are HLS-frien
\code{safe\_ops\_signed\_prob} & Disabled \\
\code{binary\_bit\_and/or\_prob} & Disabled \\
\code{-{}-no-packed-struct} & Enabled \\
- \code{-{}-no-safe-math} & Enabled \\
\code{-{}-no-embedded-assigns} & Enabled\\
\code{-{}-no-argc} & Enabled\\
\code{-{}-max-funcs} & 5 \\
@@ -70,7 +69,7 @@ This is vital for our work since we want to generate programs that are HLS-frien
\bottomrule
\end{tabular}
\caption{Summary of important changes to Csmith's feature probabilities and parameters to generate HLS-friendly programs for our testing campaign.
- \JW{Shouldn't `no-safe-math' be disabled, rather than enabled? I mean, we want safe-math enabled, right? So we should disable no-safe-math, no?}
+ \JW{Shouldn't `no-safe-math' be disabled, rather than enabled? I mean, we want safe-math enabled, right? So we should disable no-safe-math, no?}\YH{Yes, that shouldn't have been there.}
% \NR{Discussion point: Do we enable unions or not? If not, why do have a union bug in Fig. 4? Reviewers may question this.}
% \YH{I've now removed the union bug, it's too similar to another bug I think.}
}\label{tab:properties}
diff --git a/related.tex b/related.tex
index 466040b..6097355 100644
--- a/related.tex
+++ b/related.tex
@@ -4,7 +4,7 @@ The only other work of which we are aware on fuzzing HLS tools is that by Lidbur
Other stages of the FPGA toolchain have been subjected to fuzzing. Herklotz et al.~\cite{verismith} tested several FPGA synthesis tools using randomly generated Verilog programs. Where they concentrated on the RTL-to-netlist stage of hardware design, we focus our attention on the earlier C-to-RTL stage.
-Several authors have taken steps toward more rigorously engineered HLS tools that may be more resilient to testing campaigns such as ours. One such example might be a Handel-C compiler which has been mechanically proven correct, at least in part, using the HOL theorem prover~\cite{perna12_mechan_wire_wise_verif_handel_c_synth}. However, Handel-C relies upon the user to perform the parallelisation that is necessary for good hardware performance, and would require many more checks to avoid data races. Another example is an HLS tool called SPARK~\cite{gupta03_spark}, which contains passes that are mechanically validated during compilation~\cite{chouksey20_verif_sched_condit_behav_high_level_synth}, such as the scheduling algorithm. Unfortunately the tools are not readily available yet to test properly, or do not support C as input directly. Finally, the HLS tool Catapult C~\cite{mentor20_catap_high_level_synth} is designed only to produce an output netlist if it can mechanically prove it equivalent to the input program. It should therefore never produce wrong RTL. \JW{Obvious reader question at this point: why not test that claim by giving our Csmith test cases to Catapult C too? Can we address that here? No worries if not; but shall we try and do that after the deadline anyway?}
+Several authors have taken steps toward more rigorously engineered HLS tools that may be more resilient to testing campaigns such as ours. One such example might be a Handel-C compiler which has been mechanically proven correct, at least in part, using the HOL theorem prover~\cite{perna12_mechan_wire_wise_verif_handel_c_synth}. However, Handel-C relies upon the user to perform the parallelisation that is necessary for good hardware performance, and would require many more checks to avoid data races. Another example is an HLS tool called SPARK~\cite{gupta03_spark}, which contains passes that are mechanically validated during compilation~\cite{chouksey20_verif_sched_condit_behav_high_level_synth}, such as the scheduling algorithm. Unfortunately the tools are not readily available yet to test properly, or do not support C as input directly. Finally, the HLS tool Catapult C~\cite{mentor20_catap_high_level_synth} is designed only to produce an output netlist if it can mechanically prove it equivalent to the input program. It should therefore never produce wrong RTL. Future work will address testing Catapult-C as well, however, as Vivado HLS, LegUp HLS and Intel HLS are more prevalent these were prioritised. \JW{Obvious reader question at this point: why not test that claim by giving our Csmith test cases to Catapult C too? Can we address that here? No worries if not; but shall we try and do that after the deadline anyway?}\YH{Yes, definitely, it would be great to get an idea of how Catapult C performs, and I currently have it installed already. I have added a small sentence for that now, but let me know if I should mention this in the conclusion instead though. }
%%% Local Variables:
%%% mode: latex
diff --git a/tool-figure.tex b/tool-figure.tex
index f4226d6..8c140a7 100644
--- a/tool-figure.tex
+++ b/tool-figure.tex
@@ -7,7 +7,7 @@ block/.style={draw, minimum width=20mm, minimum height=8mm,align=center},
\begin{tikzpicture}
\node[block] (csmith) at (-2,1) {\strut Csmith};
\node[block] (pragma) at (0,-1) {\strut pragma \\ generation};
- \node[block] (gcc) at (4,1) {\strut gcc};
+ \node[block] (gcc) at (4,1) {\strut GCC};
\node[block] (hls) at (4,-1) {\strut HLS};
\node[block] (eq) at (8,1) {\strut co-simulation \\ checking };
\node[block] (red) at (8,-1) {\strut reduction};