authorYann Herklotz <git@yannherklotz.com>2020-09-15 02:27:51 +0100
committerYann Herklotz <git@yannherklotz.com>2020-09-15 02:27:51 +0100
commita5507ce92f57bafa51131386a7e2e0bfe62c26e8 (patch)
treee897821e035acb858187276a6f43e751b4b830bf
parent8922c77f9ba66a9232a106ad5635b7a70c1d9630 (diff)
Add more stuff
-rw-r--r--conclusion.tex2
-rw-r--r--eval_rewrite.tex28
-rw-r--r--intro.tex2
-rw-r--r--method-new.tex16
4 files changed, 25 insertions, 23 deletions
diff --git a/conclusion.tex b/conclusion.tex
index bad70b9..273b705 100644
--- a/conclusion.tex
+++ b/conclusion.tex
@@ -1,5 +1,5 @@
\section{Conclusion}
-We have shown how existing fuzzing tools can be modified so that their outputs are compatible with HLS tools. We have used this testing framework to run 10,000 test cases \JW{check numbers} through three different HLS tools. In total, we found at least 6 individual and unique bugs in all the tools, which have been reduced, analysed, and reported to the tool vendors. These bugs include crashes as well as instances of generated designs not behaving in the same way as the original code.
+We have shown how existing fuzzing tools can be modified so that their outputs are compatible with HLS tools. We have used this testing framework to run 6,700 test cases through three different HLS tools, including 3,645 test cases through three different versions of Vivado HLS to show how bugs are fixed and introduced over time. In total, we found at least 6 unique bugs across the tools, which have been reduced, analysed, and reported to the tool vendors. These bugs include crashes as well as instances of generated designs not behaving in the same way as the original code.
One can always question how much bugs found by fuzzers really \emph{matter}, given that they are usually found by combining language features in ways that are vanishingly unlikely to happen `in the wild'~\cite{marcozzi+19}. This question is especially pertinent for our particular context of HLS tools, which are well-known to have restrictions on the language features that they handle. Nevertheless, we would argue that any errors in the HLS tool are worth identifying because they have the potential to cause problems, either now or in the future. And when HLS tools \emph{do} go wrong (or indeed any sort of compiler for that matter), it is particularly infuriating for end-users because it is so difficult to identify whether the fault lies with the tool or with the program it has been given to compile.
diff --git a/eval_rewrite.tex b/eval_rewrite.tex
index e084b05..7595b11 100644
--- a/eval_rewrite.tex
+++ b/eval_rewrite.tex
@@ -44,7 +44,7 @@
\label{tab:unique_bugs}
\end{table}
-To evaluate the different HLS tools 10,000 \JW{check numbers} test cases were generated and fed into each tool, keeping the test cases constant so that the comparison between the tools was fair. Three different tools were tested, including three different versions of Vivado HLS, which are shown in Table~\ref{tab:unique_bugs}. Bugs were found in all tools that were tested, and in total, \ref{??} unique bugs were found and reported to the tool vendors.
+To evaluate the different HLS tools, 6,700 test cases were generated and fed into each tool, keeping the test cases constant so that the comparison between the tools was fair. Three different tools were tested, including three different versions of Vivado HLS, which are shown in Table~\ref{tab:unique_bugs}. Bugs were found in all tools that were tested, and in total, \ref{??} unique bugs were found and reported to the tool vendors.
We were only able to test one version of LegUp HLS (version 4.0). LegUp 7.5 is GUI-based and not suitable for scripting; however, bugs we found in LegUp 4.0 were reproduced manually in LegUp 7.5.
\subsection{Bugs found}
@@ -53,7 +53,7 @@ This section describes some of the bugs that were found in the various tools tha
\subsubsection{LegUp Assertion Error}
-The following piece of code produces an assertion error in LegUp even though it should compile without any errors, meaning an analysis pass in LegUp is incorrectly. This assertion error is equivalent to an unexpected crash of the tool as it means that an unexpected state was reached by this input. This shows that there is a bug in one of the compilation passes in LegUp, however, due to the assertion the bug is caught in the tool before it produces an incorrect design.
+The piece of code shown in Figure~\ref{fig:eval:legup:assert} produces an assertion error in LegUp 4.0 and 7.5 even though it should compile without any errors. This assertion error is equivalent to an unexpected crash of the tool, as it means that an unexpected internal state was reached for this input. This shows that there is a bug in one of the compilation passes in LegUp; however, due to the assertion, the bug is caught before the tool produces an incorrect design.
\begin{figure}
\begin{minted}{c}
@@ -61,12 +61,10 @@ int a[2][2][1] = {{{0},{1}},{{0},{0}}};
int main() { a[0][1][0] = 1; }
\end{minted}
-\caption{An assertion bug in LegUp HLS.}
-\label{fig:eval:legup:assert}
+\caption{An assertion bug in LegUp HLS when setting \texttt{NO\_INLINE} to prevent inlining.}\label{fig:eval:legup:assert}
\end{figure}
-
-The buggy test case has to do with initialisation and assignment to a three dimensional array, for which the above piece of code is the minimal example. It initialises the array with zeros except for \texttt{a[0][1][0]}, which is set to one. Then the main function assigns one to that same location, which causes LegUp to crash with an assertion error.
+The buggy test case relates to the initialisation of, and assignment to, a three-dimensional array, for which the code above is the minimal example. In addition, it requires the \texttt{NO\_INLINE} constant to be set, which disallows the inlining of functions. The code initialises the array with zeros except for \texttt{a[0][1][0]}, which is set to one. The main function then assigns one to that same location, which causes LegUp to crash with an assertion error. On its own, this code should simply terminate by returning 0, which is indeed what the design generated by LegUp does when the \texttt{NO\_INLINE} flag is turned off.
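To make the expected behaviour concrete, the figure's reproducer can be restated as a self-contained C fragment; the helper name \texttt{touch} is ours, for illustration only:

```c
/* Self-contained restatement of the reduced test case above. A
   conforming compiler simply performs the store and carries on,
   while LegUp (with NO_INLINE set) hits an internal assertion
   during synthesis. The helper name `touch` is illustrative. */
int a[2][2][1] = {{{0},{1}},{{0},{0}}};

int touch(void) {
    a[0][1][0] = 1;     /* the assignment that triggers the assertion */
    return a[0][1][0];  /* a natively compiled binary returns 1 here */
}
```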
%The following code also produces an assertion error in LegUp, which is a different one this time. This bug was not discovered during the main test runs of 10 thousand test cases, but beforehand, which meant that we disabled unions from being generated. However, this bug also requires the \texttt{volatile} keyword which seems to be the reason for quite a few mismatches in LegUp and Vivado.
%
@@ -80,7 +78,7 @@ The buggy test case has to do with initialisation and assignment to a three dime
\subsubsection{LegUp Miscompilation}
-The following test case produces an incorrect netlist in LegUp 4.0, meaning the result of simulating the design and running the C code directly is different.
+The test case in Figure~\ref{fig:eval:legup:wrong} produces an incorrect netlist in LegUp 4.0, meaning that the result of simulating the design differs from the result of running the C code directly.
\begin{figure}
\begin{minted}{c}
@@ -94,15 +92,14 @@ int main() {
return b;
}
\end{minted}
-\caption{An output mismatch where GCC returns 1 and LegUp HLS returns 0.}
-\label{fig:eval:legup:wrong}
+\caption{An output mismatch where GCC returns 1 and LegUp HLS returns 0.}\label{fig:eval:legup:wrong}
\end{figure}
-In the code above, \texttt{b} has value 1 when run in GCC, but has value 0 when run with LegUp 4.0. If the \texttt{volatile} keyword is removed from \texttt{a}, then the netlist contains the correct result. As \texttt{a} and \texttt{d} are constants, the if-statement should always produce go into the \texttt{true} branch, meaning \texttt{b} should never be set to 0.
+In the code above, \texttt{b} has the value 1 when run with GCC, but the value 0 when run with LegUp 4.0. If the \texttt{volatile} keyword is removed from \texttt{a}, then the netlist contains the correct result. As \texttt{a} and \texttt{d} are constants, the if-statement should always go into the \texttt{true} branch, meaning that \texttt{b} should never be set to 0. The \texttt{true} branch of the if-statement only executes an expression that is not assigned to any variable, so the initial state of the variables should remain unchanged. However, LegUp generates a design that enters the \texttt{else} branch instead and assigns 0 to \texttt{b}. The cause of this bug seems to be the \texttt{volatile} keyword and the analysis that is performed to simplify the if-statement.
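The pattern just described can be sketched as a minimal C fragment; the initial values and the helper name \texttt{check} are our assumptions rather than the exact reduced test case:

```c
/* Hedged sketch of the described pattern: a volatile variable
   compared against a constant of equal value. The initial values
   and the name `check` are illustrative. */
volatile int a = 0;
static int d = 0;

int check(void) {
    int b = 1;
    if (d == a)
        (void)d;   /* true branch: expression only, nothing assigned */
    else
        b = 0;     /* the branch LegUp 4.0 erroneously reaches */
    return b;      /* GCC yields 1; the buggy netlist yields 0 */
}
```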
\subsubsection{Vivado Miscompilation}
-The following code does not output the right value when compiled with all Vivado versions and GCC, as it returns \texttt{0x0} with Vivado whereas it should be returning \texttt{0xF}. This test case is much longer compared to the other test cases that were reduced and could not be made any smaller, as everything in the code is necessary to trigger the bug.
+Figure~\ref{fig:eval:vivado:mismatch} shows code whose output is wrong under every version of Vivado HLS: Vivado returns \texttt{0x0}, whereas GCC returns the correct \texttt{0xF}. This test case is much larger than the other reduced test cases and could not be made any smaller, as everything in the code is necessary to trigger the bug.
The array \texttt{a} is initialised to all zeros, as are the other global variables \texttt{g} and \texttt{c}, so as not to introduce any undefined behaviour. However, \texttt{g} is also marked \texttt{volatile}, which ensures that the variable is not optimised away. The function \texttt{d} then accumulates the values \texttt{b} that it is passed into a hash stored in \texttt{c}. Each \texttt{b} is eight bits wide, so function \texttt{e} calls \texttt{d} seven times for some of the bits in the 64-bit value of \texttt{f} that it is passed. Finally, in the main function, the array is partially initialised with a for loop, after which \texttt{e} is called twice, once on the volatile variable and once on a constant. Interestingly, the second call with the constant is also necessary to trigger the bug.
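The structure just described can be sketched as follows; the XOR hash, the initial values, and the particular constants are our assumptions, since the actual reduced program appears in Figure~\ref{fig:eval:vivado:mismatch}:

```c
#include <stdint.h>

/* Hedged sketch of the program structure described above; the XOR
   hash and the constants are illustrative stand-ins. */
static volatile uint64_t g = 0;  /* volatile global, not optimised away */
static uint8_t c = 0;            /* global accumulator for the hash */
static int a[4] = {0};           /* zero-initialised array */

/* accumulate one 8-bit value into the hash stored in c */
static void d(uint8_t b) { c ^= b; }

/* call d seven times, on seven of the bytes of the 64-bit value f */
static void e(uint64_t f) {
    for (int i = 0; i < 7; i++)
        d((uint8_t)(f >> (8 * i)));
}

static uint8_t run(void) {
    for (int i = 0; i < 2; i++)  /* partial initialisation of the array */
        a[i] = i;
    e(g);        /* first call, on the volatile global */
    e(0x0FULL);  /* second call, on a constant; also needed for the bug */
    return c;
}
```

With these stand-in values the sketch returns \texttt{0x0F} when compiled natively; the real test case's arithmetic differs.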
@@ -125,17 +122,14 @@ int main() {
return c;
}
\end{minted}
-\caption{An output mismatch where GCC returns \texttt{0xF}, whereas Vivado HLS return \texttt{0x0}.}
-\label{fig:eval:vivado:mismatch}
+\caption{An output mismatch where GCC returns \texttt{0xF}, whereas Vivado HLS return \texttt{0x0}.}\label{fig:eval:vivado:mismatch}
\end{figure}
\subsection{Bugs in Vivado HLS versions}
In addition to the explanations of the bugs given in Section~\ref{sec:evaluation}, we also analyse the bugs found in various versions of Vivado HLS, shown in Figure~\ref{fig:sankey_diagram}. The figure tracks the failing test cases among the 3645 test cases that were passed to Vivado HLS 2018.3, 2019.1 and 2019.2. All test cases that fail in the same set of versions are grouped together into a ribbon, showing when a bug is present in one of the versions.
-Firstly, there is a group of failing test cases that is constant between all versions of Vivado HLS, meaning these are bugs that were not fixed between the versions. Other ribbons can be seen weaving in and out of failing for a version, meaning these bugs were fixed or reintroduced in those versions. From the diagram it can then be seen that Vivado HLS 2018.3 contains the most failing test cases compared to the other versions, having 62 test cases fail in total. Interestingly, Vivado HLS 2019.1 and 2019.2 have a different number of failing test cases, meaning feature improvements that introduced bugs as well as bug fixes between those minor versions.
-
-
+There is a group of failing test cases that is constant across all versions of Vivado HLS, meaning that these bugs were not fixed between versions. Other ribbons can be seen weaving in and out of the failing set for a version, meaning that these bugs were fixed or reintroduced in those versions. The diagram shows that Vivado HLS 2018.3 contains the most failing test cases, with 62 test cases failing in total. Interestingly, Vivado HLS 2019.1 and 2019.2 have different numbers of failing test cases, meaning that, as well as bug fixes, feature improvements introduced new bugs between those minor versions.
\definecolor{ribbon1}{HTML}{8dd3c7}
\definecolor{ribbon2}{HTML}{b3de69}
@@ -179,6 +173,8 @@ Firstly, there is a group of failing test cases that is constant between all ver
\caption{A Sankey diagram that tracks 3645 test cases through three different versions of Vivado HLS. The ribbons collect the test cases that pass and fail together. The 3573 test cases that pass in all three versions are not depicted.}\label{fig:sankey_diagram}
\end{figure}
+From this diagram it can also be observed that the fuzzer found at least six individual bugs in Vivado HLS, as each ribbon must contain at least one unique bug.\YH{Contradicts value of 3 in Table~\ref{tab:unique_bugs}, maybe I can change that to 6?}
+
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
diff --git a/intro.tex b/intro.tex
index 2569fe9..ae8da19 100644
--- a/intro.tex
+++ b/intro.tex
@@ -65,7 +65,7 @@ This paper reports on our campaign to test HLS tools by fuzzing.
\item Our testing campaign revealed that all three tools could be made to crash while compiling or to generate wrong RTL. In total, we found \ref{XX} bugs across the three tools, all of which have been reported to the respective developers, and \ref{XX} of which have been confirmed at the time of writing.
- \item To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (2018.3, 2019.1, and 2019.2). \JW{Put a sentence here summarising our findings from this experiment, once we have them.}\YH{Yes, will do that as soon as I have them.}
+ \item To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (2018.3, 2019.1, and 2019.2). We found that, in general, there were about half as many failures in versions 2019.1 and 2019.2 as in 2018.3. However, there were also test cases that only failed in versions 2019.1 and 2019.2, meaning that bugs were probably introduced by the addition of new features.
\end{itemize}
% we test, and then augment each program with randomly chosen HLS-specific directives. We synthesise each C program to RTL, and use a Verilog simulator to calculate its return value. If synthesis crashes, or if this return value differs from the return value obtained by executing a binary compiled from the C program by GCC, then we have found a candidate bug. We then use trial-and-error to reduce the C program to a minimal version that still triggers a bug.
diff --git a/method-new.tex b/method-new.tex
index 8ff8281..ef8c62c 100644
--- a/method-new.tex
+++ b/method-new.tex
@@ -89,10 +89,14 @@ More importantly, we disable the generation of several language features to enab
First, we ensure that all mathematical expressions are safe and unsigned, so that no undefined behaviour can occur.
We also disallow assignments being embedded within expressions, since HLS generally does not support them.
We eliminate any floating-point numbers since they typically involve external libraries or use of hard IPs on FPGAs, which in turn make it hard to reduce bugs to their minimal form.
-We also disable the generation of pointers for HLS testing, since pointer support in HLS tools is either absent or immature~\cite{xilinx20_vivad_high_synth}.
-We also disable void functions, since we are not supporting pointers.
+We also disable the generation of pointers for HLS testing, since pointer support in HLS tools is either absent or immature~\cite{xilinx20_vivad_high_synth}.\YH{I've looked at the documentation and even pointer to pointer is supported, but maybe not pointer to pointer to pointer. I think there was some other pointer assignment that didn't quite work, but I don't remember now. Immature might be a good description though.}
+We also disable void functions, since we are not supporting pointers.\YH{Figure \ref{fig:eval:vivado:mismatch} actually has void functions...}
We disable the generation of unions as these were not well supported by some of the tools such as LegUp 4.0.
-\JW{Obvious reader question at this point: if a feature is badly-supported by some HLS tool(s), how do we decide between disabling it in Csmith vs. keeping it in and filing lots of bug reports? For instance, we say that we disable pointers because lots of HLS tools don't cope well with them, but we keep in `volatile' which also leads to problems. Why the different treatments?}
+
+To decide whether a feature should be disabled or reported as a bug, the tool in question is taken into account. Unfortunately, there is no standard subset of C that HLS tools must support, and every tool chooses a slightly different subset. It is therefore important to choose the right subset so that the most bugs are found in each tool, while not generating code that the tools do not support. We therefore disable a feature if at least one tool fails gracefully on it, stating that the feature is not supported or explaining what the issue is. If the HLS tools fail in a different way, such as generating a wrong design or crashing during synthesis, the feature is kept in the generated test cases.
+
+Note also that a volatile pointer must be used for any pointer that is accessed multiple times within a single transaction (one execution of the C function); otherwise, everything except the first read and last write is optimised out to adhere to the C standard.
+
We enforce that the main function of each generated program must not have any input arguments to allow for HLS synthesis.
We disable structure packing within Csmith since the ``\code{\#pragma pack(1)}'' directive involved causes conflicts in HLS tools because it is interpreted as an unsupported pragma.
We also disable bitwise AND and OR operations because when applied to constant operands, some versions of Vivado HLS errored out with `Wrong pragma usage.'
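For concreteness, an invocation restricting generation along these lines might look as follows; the flag spelling follows Csmith's paired \texttt{--no-*} options, but the exact set of flags used is an assumption and should be checked against \texttt{csmith --help} for a given version:

```shell
# Illustrative only: Csmith flags disabling the features discussed
# above (pointers, unions, packed structs, arguments to main).
csmith --no-pointers --no-unions --no-packed-struct \
       --no-argc --seed 1 --output test.c
```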
@@ -179,11 +183,13 @@ We do not count time-outs as bugs, but we record them.
% And the number of timeouts placed has increased to 4. The first timeout sets when compiling the C++ program to CPU and returning an executable once finished. The second timeout is placed when running the executable to get the C++ result. The third timeout, which been given the most extended period, is at synthesizing the design and generating the co-simulation executable. Finally, running the co-simulation executable requires the fourth timeout. The test case can be dumped at any timeout period if the task is not finished within the limited time.
-\subsection{Reducing buggy programs}
-\label{sec:method:reduce}
+\subsection{Reducing buggy programs}\label{sec:method:reduce}
+
Once we discover a program that crashes the HLS tool or whose C/RTL simulations do not match, we further scrutinise the program to identify the root cause(s) of the undesirable behaviour.
As the programs generated by Csmith can be fairly large, we must systematically reduce these programs to identify the source of a bug.
+\YH{Edit this section some more}
+
Reduction is performed by iteratively removing some part of the original program and then providing the reduced program to the HLS tool for re-synthesis and co-simulation.
The goal is to find the smallest program that still triggers the bug.
We apply two consecutive methods of reduction in this work.
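The iterative scheme can be illustrated with a toy line-based reducer; the interestingness predicate \texttt{still\_fails} is a stand-in for re-running synthesis and co-simulation, and all names here are ours, not the actual reduction scripts:

```c
#include <string.h>

/* Toy sketch of the reduction loop described above. still_fails is a
   stand-in for "re-synthesise, co-simulate, and check the mismatch
   is still present"; here the "bug" survives for as long as the line
   "volatile" does. All names are illustrative. */
static int still_fails(char lines[][16], int n) {
    for (int i = 0; i < n; i++)
        if (strcmp(lines[i], "volatile") == 0)
            return 1;
    return 0;
}

/* repeatedly try deleting one line; keep the deletion only if the
   smaller program still triggers the bug; returns the final size */
static int reduce(char lines[][16], int n) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            char saved[16];
            strcpy(saved, lines[i]);
            for (int j = i; j < n - 1; j++)      /* delete line i */
                strcpy(lines[j], lines[j + 1]);
            if (still_fails(lines, n - 1)) {
                n--; changed = 1; i--;           /* keep the deletion */
            } else {                             /* bug lost: restore */
                for (int j = n - 1; j > i; j--)
                    strcpy(lines[j], lines[j - 1]);
                strcpy(lines[i], saved);
            }
        }
    }
    return n;
}
```

Real reducers must additionally reject reduced programs that introduce undefined behaviour, since those would report spurious mismatches.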