author     Yann Herklotz <ymh15@ic.ac.uk>    2020-09-15 08:57:28 +0000
committer  overleaf <overleaf@localhost>     2020-09-15 09:01:53 +0000
commit     4daefd9bed7dea9500b3cc626266fd979fb2edcc (patch)
tree       138282b20b6de440a2cd0468a744364e00e4b26a
parent     ebaf38ee9fa9ed6a74230c93fa2d15df44521c9a (diff)
download   fccm21_esrhls-4daefd9bed7dea9500b3cc626266fd979fb2edcc.tar.gz
           fccm21_esrhls-4daefd9bed7dea9500b3cc626266fd979fb2edcc.zip
Update on Overleaf.
-rw-r--r--  conclusion.tex   4
-rw-r--r--  eval.tex        69
-rw-r--r--  intro.tex        4
-rw-r--r--  method.tex      12
-rw-r--r--  related.tex      4
5 files changed, 53 insertions, 40 deletions
diff --git a/conclusion.tex b/conclusion.tex
index 273b705..e027b27 100644
--- a/conclusion.tex
+++ b/conclusion.tex
@@ -1,11 +1,11 @@
\section{Conclusion}
-We have shown how existing fuzzing tools can be modified so that their outputs are compatible with HLS tools. We have used this testing framework to run 6,700 test cases through three different HLS tools, including 3,645 test cases through three different version of Vivado HLS to show how bugs are fixed and introduced. In total, we found at least 6 individual and unique bugs in all the tools, which have been reduced, analysed, and reported to the tool vendors. These bugs include crashes as well as instances of generated designs not behaving in the same way as the original code.
+We have shown how existing fuzzing tools can be modified so that their outputs are compatible with HLS tools. We have used this testing framework to run 6,700 test-cases through three different HLS tools, and 3,645 test-cases through three different versions of Vivado HLS to show how bugs are fixed and introduced. In total, we found at least 6 unique bugs across the three tools. These bugs include crashes as well as instances of generated designs not behaving in the same way as the original code.
One can always question how much the bugs found by fuzzers really \emph{matter}, given that they are usually found by combining language features in ways that are vanishingly unlikely to happen `in the wild'~\cite{marcozzi+19}. This question is especially pertinent for our particular context of HLS tools, which are well-known to have restrictions on the language features that they handle. Nevertheless, we would argue that any errors in an HLS tool are worth identifying because they have the potential to cause problems, either now or in the future. And when HLS tools \emph{do} go wrong (or indeed any sort of compiler for that matter), it is particularly infuriating for end-users because it is so difficult to identify whether the fault lies with the tool or with the program it has been given to compile.
Further work could be done on supporting more HLS tools, especially ones that claim to prove that their output is correct before terminating. This could give an indication of how effective these proofs are, and how often they are actually able to complete their equivalence proofs during compilation in a feasible time scale.
-As HLS is becoming increasingly relied upon, it is important to make sure that HLS tools are also reliable. We hope that this work further motivates the need for rigorous engineering of HLS tools, either by validating that each output the tool produces is correct or by proving the HLS tool itself correct once and for all.
+As HLS is becoming increasingly relied upon, it is important to make sure that HLS tools are also reliable. We hope that this work further motivates the need for rigorous engineering of HLS tools, either by validating that each output the tool produces is correct or by proving the HLS tool itself correct once and for all.
%%% Local Variables:
%%% mode: latex
diff --git a/eval.tex b/eval.tex
index 1708519..c3c66fc 100644
--- a/eval.tex
+++ b/eval.tex
@@ -13,9 +13,9 @@
\draw[white] (-4.4,4.4) ellipse (3.75 and 2.75); % making the
\draw[white] (-10.2,4.4) ellipse (3.75 and 2.75); % outlines
\draw[white] (-7.3,2) ellipse (3.75 and 2.75); % fully opaque
- \node[align=center] at (-10.2,6.3) {\Large\textsf{\textbf{Xilinx Vivado HLS}} \\ \textsf{\textbf{(all versions)}}};
- \node at (-4.4,6.3) {\Large\textsf{\textbf{Intel HLS Compiler}}};
- \node at (-7.3,0) {\Large\textsf{\textbf{LegUp}}};
+ \node[align=center] at (-10.2,6.3) {\Large\textsf{\textbf{Xilinx Vivado HLS}} \\ \Large\textsf{\textbf{2019.1}}};
+ \node at (-4.4,6.3) {\Large\textsf{\textbf{Intel i++}}};
+ \node at (-7.3,0) {\Large\textsf{\textbf{LegUp 4.0}}};
\node at (-5.5,3) {\Huge 1 (\textcolor{red}{1})};
\node at (-9.1,3) {\Huge 4 (\textcolor{red}{0})};
@@ -27,7 +27,7 @@
\node at (-13.6,-0.5) {\Huge 5856};
\end{tikzpicture}
}
-\caption{A Venn diagram showing the number of failures in each tool out of 6700 test cases that were run. Overlapping regions mean that the test cases failed in all those tools. The numbers in parentheses represent the number of test cases that timed out.}\label{fig:existing_tools}
+\caption{A Venn diagram showing the number of failures in each tool out of 6700 test-cases that were run. Overlapping regions mean that the test-cases failed in all those tools. The numbers in parentheses represent the number of test-cases that timed out.}\label{fig:existing_tools}
\end{figure}
\begin{table}
@@ -37,20 +37,20 @@
\midrule
Xilinx Vivado HLS (all versions) & $\ge 2$\\
LegUp HLS & $\ge 3$\\
- Intel HLS Compiler & $\ge 1$\\
+ Intel i++ & $\ge 1$\\
\bottomrule
\end{tabular}
\caption{Unique bugs found in each tool.}
\label{tab:unique_bugs}
\end{table}
-We generate 6700 test cases and provide them to three HLS tools: Vivado HLS, LegUp HLS and Intel HLS.
-We use the same test cases across all tools for fair comparison.
-We were able to test three different versions of Vivado HLS (v2018.3,v2019.1 and v2019.2).
+We generate 6700 test-cases and provide them to three HLS tools: Vivado HLS, LegUp HLS and Intel i++.
+We use the same test-cases across all tools for fair comparison.
+We were able to test three different versions of Vivado HLS (v2018.3, v2019.1 and v2019.2).
We were only able to test one version of LegUp: 4.0.
At the time of writing, LegUp 7.5 is still GUI-based and therefore we could not script our tests.
-However, we are able to reproduce bugs found in LegUp 4.0 in LegUp 7.5.
-Finally, we tested one version of Intel HLS (vXXXX.X).
+However, we were able to manually reproduce bugs found in LegUp 4.0 in LegUp 7.5.
+Finally, we tested one version of Intel i++ (v\ref{XXXX.X}).
% Three different tools were tested, including three different versions of Vivado HLS. We were only able to test one version of LegUp HLS (version 4.0), because although LegUp 7.5 is available, it is GUI-based and not amenable to scripting. However, bugs we found in LegUp 4.0 were reproduced manually in LegUp 7.5.
% LegUp and Vivado HLS were run under Linux, while the Intel HLS Compiler was run under Windows.
@@ -58,8 +58,8 @@ Finally, we tested one version of Intel HLS (vXXXX.X).
\subsection{Results across different HLS tools}
Figure~\ref{fig:existing_tools} shows a Venn diagram of our results.
-We see that 167 (2.5\%), 83 (1.2\%) and 26 (0.4\%) test-cases fail in LegUp, Vivado HLS and Intel HLS respectively.
-Despite Intel HLS having the lowet failure rate, it has the highest time-out rate with 540 programs, because of its long compilation time.
+We see that 167 (2.5\%), 83 (1.2\%) and 26 (0.4\%) test-cases fail in LegUp, Vivado HLS and Intel i++ respectively.
+Despite i++ having the lowest failure rate, it has the highest time-out rate (540 test-cases), because of its remarkably long compilation time.
% We remark that although the Intel HLS Compiler had the smallest number of confirmed test-case failures, it had the most time-outs (which could be masking additional failures)
Note that the absolute numbers here do not necessarily correspond to the number of bugs found.
Multiple programs could crash or fail due to the same bug.
@@ -68,9 +68,9 @@ We write `$\ge$' in the table to indicate that all the bug counts are lower boun
\subsection{Results across versions of an HLS tool}
-Besides comparing the reliability of different HLS tools, we also investigated the reliability of Vivado HLS over time. Figure~\ref{fig:sankey_diagram} shows the results of giving 3645 test cases to Vivado HLS 2018.3, 2019.1 and 2019.2.
-Test cases that pass and fail in the same tools are grouped together into a ribbon.
-For instance, the topmost ribbon represents the 31 test-cases that fail in all three versions of Vivado HLS. Other ribbons can be seen weaving in and out; these indicate that bugs were fixed or reintroduced in the various versions. The diagram demonstrates that Vivado HLS 2018.3 contains the most failing test cases compared to the other versions, having 62 test cases fail in total. %Interestingly, Vivado HLS 2019.1 and 2019.2 have a different number of failing test cases, meaning feature improvements that introduced bugs as well as bug fixes between those minor versions.
+Besides comparing the reliability of different HLS tools, we also investigated the reliability of Vivado HLS over time. Figure~\ref{fig:sankey_diagram} shows the results of giving 3645 test-cases to Vivado HLS 2018.3, 2019.1 and 2019.2.
+Test-cases that pass and fail in the same tools are grouped together into a ribbon.
+For instance, the topmost ribbon represents the 31 test-cases that fail in all three versions of Vivado HLS. Other ribbons can be seen weaving in and out; these indicate that bugs were fixed or reintroduced in the various versions. The diagram demonstrates that Vivado HLS 2018.3 contains the most failing test-cases compared to the other versions, having 62 test-cases fail in total. %Interestingly, Vivado HLS 2019.1 and 2019.2 have a different number of failing test cases, meaning feature improvements that introduced bugs as well as bug fixes between those minor versions.
Interestingly, as an indicator of the reliability of HLS tools, the blue ribbon shows that there are test-cases that fail in v2018.3, pass in v2019.1, but then fail again in v2019.2.
\definecolor{ribbon1}{HTML}{8dd3c7}
@@ -112,17 +112,18 @@ Interestingly, as an indicator of reliability of HLS tools, the blue ribbon show
\node[white] at (2,2.5) {36};
\node[white] at (4,2.25) {41};
\end{tikzpicture}
- \caption{A Sankey diagram that tracks 3645 test cases through three different versions of Vivado HLS. The ribbons collect the test cases that pass and fail together. The 3573 test cases that pass in all three versions are not depicted.
+ \caption{A Sankey diagram that tracks 3645 test-cases through three different versions of Vivado HLS. The ribbons collect the test-cases that pass and fail together. The 3573 test-cases that pass in all three versions are not depicted.
}\label{fig:sankey_diagram}
\end{figure}
% \NR{Why are there missing numbers in the ribbons?}
-As in our Venn diagram, the absolute numbers in Figure~\ref{fig:sankey_diagram} do not necessary correspond to the number of bugs. However, we can deduce from this diagram that there must be at least six unique bugs in Vivado HLS, given that a ribbon must contain at least one unique bug. \YH{Contradicts value of 3 in Table~\ref{tab:unique_bugs}, maybe I can change that to 6?} \JW{I'd leave it as-is personally; we have already put a `$\ge$' symbol in the table, so I think it's fine.}
+As in our Venn diagram, the absolute numbers in Figure~\ref{fig:sankey_diagram} do not necessarily correspond to the number of bugs. However, we can deduce from this diagram that there must be at least six unique bugs in Vivado HLS, given that a ribbon must contain at least one unique bug.
+%\YH{Contradicts value of 3 in Table~\ref{tab:unique_bugs}, maybe I can change that to 6?} \JW{I'd leave it as-is personally; we have already put a `$\ge$' symbol in the table, so I think it's fine.}
In addition, it can be seen that Vivado HLS v2018.3 must have at least 4 unique bugs, of which two were fixed and two others remained in Vivado HLS v2019.1. However, with the release of v2019.1, new bugs were introduced as well. % Finally, for version 2019.2 of Vivado HLS, there seems to be a bug that was reintroduced which was also present in Vivado 2018.3, in addition to a new bug. In general it seems like each release of Vivado HLS will have new bugs present, however, will also contain many previous bug fixes. However, it cannot be guaranteed that a bug that was previously fixed will remain fixed in future versions as well.
\subsection{Some specific bugs found}
-This section describes some of the bugs that were found in the various tools that were tested. We describe two bugs in LegUp and one in Vivado HLS; in each case, the bug was first reduced automatically using \creduce{}, and then reduced further manually to achieve the minimal test case. Although we did find test-case failures in the Intel HLS Compiler, the very long compilation times for that tool meant that we did not have time to reduce any of the failures down to an example that is minimal enough to present here.
+This section describes some of the bugs that were found in the various tools that were tested. We describe two bugs in LegUp and one in Vivado HLS; in each case, the bug was first reduced automatically using \creduce{}, and then reduced further manually to achieve the minimal test-case. Although we did find test-case failures in Intel i++, the very long compilation times for that tool meant that we did not have time to reduce any of the failures down to an example that is minimal enough to present here.
\subsubsection{LegUp assertion error}
@@ -134,12 +135,14 @@ This shows that there is a bug in one of the compilation passes in LegUp, howeve
\begin{minted}{c}
int a[2][2][1] = {{{0},{1}},{{0},{0}}};
-int main() { a[0][1][0] = 1; }
+int main() {
+ a[0][1][0] = 1;
+}
\end{minted}
\caption{This program causes an assertion failure in LegUp HLS when \texttt{NO\_INLINE} is set.}\label{fig:eval:legup:assert}
\end{figure}
-The buggy test case has to do with initialisation and assignment to a three-dimensional array, for which the above piece of code is the minimal example. However, in addition to that it requires the \texttt{NO\_INLINE} flag to be set, which disables function inlining. The code initialises the array with zeroes except for \texttt{a[0][1][0]}, which is set to one. Then the main function assigns one to that same location. This code on its own should not actually produce a result and should just terminate by returning 0, which is also what the design that LegUp generates does when the \texttt{NO\_INLINE} flag is turned off.
+The buggy test-case concerns initialisation of and assignment to a three-dimensional array, for which the code in Figure~\ref{fig:eval:legup:assert} is the minimal example. In addition, it requires the \texttt{NO\_INLINE} flag to be set, which disables function inlining. The code initialises the array with zeroes except for \texttt{a[0][1][0]}, which is set to one. Then the main function assigns one to that same location. This code on its own should have no observable effect and should simply terminate by returning 0, which is indeed what the design that LegUp generates does when the \texttt{NO\_INLINE} flag is turned off.
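
[Editorial note: for reference, the intended behaviour of the program in the figure can be spelled out explicitly. This is a sketch; the \texttt{return 0} is not in the original program but is implied by C99 semantics for \texttt{main}.]

\begin{minted}{c}
int a[2][2][1] = {{{0},{1}},{{0},{0}}};

int main() {
  // Overwrites a value that the initialiser already set to 1,
  // so the store has no observable effect.
  a[0][1][0] = 1;
  // Implicit in the original: C99 gives main a return value of 0
  // when control reaches the closing brace.
  return 0;
}
\end{minted}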
%The following code also produces an assertion error in LegUp, which is a different one this time. This bug was not discovered during the main test runs of 10 thousand test cases, but beforehand, which meant that we disabled unions from being generated. However, this bug also requires the \texttt{volatile} keyword which seems to be the reason for quite a few mismatches in LegUp and Vivado.
%
@@ -153,7 +156,7 @@ The buggy test case has to do with initialisation and assignment to a three-dime
\subsubsection{LegUp miscompilation}
-The test case in Figure~\ref{fig:eval:legup:wrong} produces an incorrect Verilog in LegUp 4.0, which means that the results of RTL simulation is different to the C execution.
+The test-case in Figure~\ref{fig:eval:legup:wrong} produces incorrect Verilog in LegUp 4.0, meaning that the result of RTL simulation differs from that of the C execution.
\begin{figure}
\begin{minted}{c}
@@ -162,19 +165,23 @@ int b = 1;
int main() {
int d = 1;
- if (d + a) b || 1;
- else b = 0;
+ if (d + a)
+ b || 1;
+ else
+ b = 0;
return b;
}
\end{minted}
\caption{An output mismatch: LegUp HLS returns 0 but the correct result is 1.}\label{fig:eval:legup:wrong}
\end{figure}
-In the code above, \texttt{b} has value 1 when run in GCC, but has value 0 when run with LegUp 4.0. If the \texttt{volatile} keyword is removed from \texttt{a}, then the Verilog produces the correct result. As \texttt{a} and \texttt{d} are constants, the \code{if} statement should always produce go into the \texttt{true} branch, meaning \texttt{b} should never be set to 0. The \texttt{true} branch of the \code{if} statement only executes an expression which is not assigned to any variable, meaning the initial state of all variables should not change. However, LegUp HLS generates a design which enters the \texttt{else} branch instead and assigns \texttt{b} to be 0. The cause of this bug seems to be the \texttt{volatile} keyword and the analysis that is performed to simplify the \code{if} statement.
+In the code above, \texttt{b} has value 1 when run in GCC, but has value 0 when run with LegUp 4.0. If the \texttt{volatile} keyword is removed from \texttt{a}, then the Verilog produces the correct result. As \texttt{a} and \texttt{d} are constants, the \code{if} statement should always go into the \texttt{true} branch, meaning \texttt{b} should never be set to 0. The \texttt{true} branch of the \code{if} statement only executes an expression which is not assigned to any variable, meaning the initial state of all variables should not change. However, LegUp HLS generates a design which enters the \texttt{else} branch instead and assigns \texttt{b} to be 0. The cause of this bug seems to be the use of the \texttt{volatile} keyword, which interferes with the analysis that attempts to simplify the \code{if} statement.
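
[Editorial note: the hunk above elides the declaration of \texttt{a}. Based on the surrounding prose, which implies that \texttt{a} is a \texttt{volatile} zero-valued global, a plausible reconstruction of the full reduced program is the following sketch; the declaration of \texttt{a} is an assumption.]

\begin{minted}{c}
volatile int a = 0; // assumed: the prose implies a is volatile and zero
int b = 1;

int main() {
  int d = 1;
  if (d + a)  // 1 + 0 is nonzero, so the true branch should always be taken
    b || 1;   // evaluated but never assigned: no side effect
  else
    b = 0;    // LegUp 4.0 wrongly enters this branch
  return b;   // GCC returns 1; the LegUp-generated design returns 0
}
\end{minted}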
\subsubsection{Vivado HLS miscompilation}
-Figure~\ref{fig:eval:vivado:mismatch} shows code that does not output the right value when compiled with all Vivado HLS versions, as it returns \texttt{0x0} with Vivado HLS whereas it should be returning \texttt{0xF}. This test case is much longer compared to the other test cases that were reduced and could not be made any smaller, as everything in the code seems to be necessary to trigger the bug.
+Figure~\ref{fig:eval:vivado:mismatch} shows code that does not output the right result when compiled with any of the Vivado HLS versions we tested.
+It returns \texttt{0x0} with Vivado HLS, instead of \texttt{0xF}. This test-case is much larger than the other test-cases that were reduced.
+We could not reduce this program any further, as everything in the code was necessary to trigger the bug.
The array \texttt{a} is initialised to all zeroes, as are the other global variables \texttt{g} and \texttt{c}, so as not to introduce any undefined behaviour. However, \texttt{g} is also given the \texttt{volatile} keyword, which ensures that the variable is not optimised away. The function \texttt{d} then accumulates the values \texttt{b} that it is passed into a hash stored in \texttt{c}. Each \texttt{b} is eight bits wide, so function \texttt{e} calls \texttt{d} seven times, once for each of seven byte offsets within the 64-bit value of \texttt{f} that it is passed. Finally, in the main function, the array is partially initialised with a \code{for} loop, after which the \texttt{e} function is called twice, once on the volatile variable \texttt{g} and once on a constant. Interestingly, the second function call with the constant is also necessary to trigger the bug.
@@ -184,7 +191,9 @@ volatile unsigned int g = 0;
int a[256] = {0};
int c = 0;
-void d(char b) { c = (c & 4095) ^ a[(c ^ b) & 15]; }
+void d(char b) {
+ c = (c & 4095) ^ a[(c ^ b) & 15];
+}
void e(long f) {
d(f); d(f >> 8); d(f >> 16); d(f >> 24);
@@ -192,12 +201,14 @@ void e(long f) {
}
int main() {
- for (int i = 0; i < 56; i++) a[i] = i;
- e(g); e(-2L);
+ for (int i = 0; i < 56; i++)
+ a[i] = i;
+ e(g);
+ e(-2L);
return c;
}
\end{minted}
-\caption{An output mismatch where GCC returns \texttt{0xF}, whereas Vivado HLS return \texttt{0x0}.}\label{fig:eval:vivado:mismatch}
+\caption{An output mismatch: Vivado HLS returns \texttt{0x0} but the correct result is \texttt{0xF}.}\label{fig:eval:vivado:mismatch}
\end{figure}
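
[Editorial note: assembling the hunks above gives the full reduced Vivado HLS test-case. The calls \texttt{d(f >> 32)} to \texttt{d(f >> 48)} do not appear verbatim in the diff and are an assumption, based on the prose's statement that \texttt{e} calls \texttt{d} seven times.]

\begin{minted}{c}
volatile unsigned int g = 0;
int a[256] = {0};
int c = 0;

void d(char b) {
  c = (c & 4095) ^ a[(c ^ b) & 15];
}

void e(long f) {
  d(f); d(f >> 8); d(f >> 16); d(f >> 24);
  // Assumed: three further calls, for seven in total as the prose states.
  d(f >> 32); d(f >> 40); d(f >> 48);
}

int main() {
  for (int i = 0; i < 56; i++)
    a[i] = i;
  e(g);     // g is volatile, so this call cannot be constant-folded
  e(-2L);   // the constant call is also needed to trigger the bug
  return c; // GCC: 0xF; Vivado HLS: 0x0
}
\end{minted}

Under this reconstruction the expected result can be traced by hand: \texttt{e(g)} leaves \texttt{c} at 0 (every lookup hits \texttt{a[0]}, which is 0), while \texttt{e(-2L)} first sets \texttt{c} to \texttt{a[14] = 14}, then XORs in \texttt{a[1] = 1} to give 15, after which the remaining calls leave it unchanged, so \texttt{main} returns \texttt{0xF}.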
diff --git a/intro.tex b/intro.tex
index 44a79e6..2a48d4b 100644
--- a/intro.tex
+++ b/intro.tex
@@ -62,11 +62,11 @@ This paper reports on our campaign to test HLS tools by fuzzing.
\begin{itemize}
\item We use Csmith~\cite{yang11_findin_under_bugs_c_compil} to generate ten thousand valid C programs from within the subset of the C language that is supported by all the HLS tools we test. We augment each program with a random selection of HLS-specific directives.
- \item We give these programs to three widely used HLS tools: Xilinx Vivado HLS~\cite{xilinx20_vivad_high_synth}, LegUp HLS~\cite{canis13_legup} and the Intel HLS Compiler~\cite{intel20_sdk_openc_applic}. When we find a program that causes an HLS tool to crash, or to generate hardware that produces a different result from GCC, we reduce it to a minimal example with the help of the \creduce{} tool~\cite{creduce}.
+ \item We give these programs to three widely used HLS tools: Xilinx Vivado HLS~\cite{xilinx20_vivad_high_synth}, LegUp HLS~\cite{canis13_legup} and the Intel HLS Compiler, which is also known as i++~\cite{intel20_sdk_openc_applic}. When we find a program that causes an HLS tool to crash, or to generate hardware that produces a different result from GCC, we reduce it to a minimal example with the help of the \creduce{} tool~\cite{creduce}.
\item Our testing campaign revealed that all three tools could be made to crash while compiling or to generate wrong RTL. In total, we found \ref{XX} bugs across the three tools, all of which have been reported to the respective developers, and \ref{XX} of which have been confirmed at the time of writing.
- \item To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (2018.3, 2019.1, and 2019.2). We found that in general there about half as many failures in versions 2019.1 and 2019.2 compared to 2018.3. However, there were also test cases that only failed in versions 2019.1 and 2019.2, meaning bugs were probably introduced due to the addition of new features.
+ \item To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (2018.3, 2019.1, and 2019.2). We found that, in general, there were about half as many failures in versions 2019.1 and 2019.2 compared to 2018.3. However, there were also test-cases that only failed in versions 2019.1 and 2019.2, meaning bugs were probably introduced due to the addition of new features.
\end{itemize}
% we test, and then augment each program with randomly chosen HLS-specific directives. We synthesise each C program to RTL, and use a Verilog simulator to calculate its return value. If synthesis crashes, or if this return value differs from the return value obtained by executing a binary compiled from the C program by GCC, then we have found a candidate bug. We then use trial-and-error to reduce the C program to a minimal version that still triggers a bug.
diff --git a/method.tex b/method.tex
index d27f22b..b9356ec 100644
--- a/method.tex
+++ b/method.tex
@@ -89,13 +89,15 @@ More importantly, we disable the generation of several language features to enab
First, we ensure that all mathematical expressions are safe and unsigned, so that no undefined behaviour can arise.
We also disallow assignments being embedded within expressions, since HLS tools generally do not support them.
We eliminate any floating-point numbers since they typically involve external libraries or use of hard IPs on FPGAs, which in turn make it hard to reduce bugs to their minimal form.
-We also disable the generation of pointers for HLS testing, since pointer support in HLS tools is either absent or immature~\cite{xilinx20_vivad_high_synth}.\YH{I've looked at the documentation and even pointer to pointer is supported, but maybe not pointer to pointer to pointer. I think there was some other pointer assignment that didn't quite work, but I don't remember now. Immature might be a good description though.}
-We also disable void functions, since we are not supporting pointers.\YH{Figure \ref{fig:eval:vivado:mismatch} actually has void functions...} \JW{Hm, true. Perhaps these were introduced during reduction.}
+We also disable the generation of pointers for HLS testing, since pointer support in HLS tools is either absent or immature~\cite{xilinx20_vivad_high_synth}.
+%\YH{I've looked at the documentation and even pointer to pointer is supported, but maybe not pointer to pointer to pointer. I think there was some other pointer assignment that didn't quite work, but I don't remember now. Immature might be a good description though.}
+We also disable void functions, since we are not supporting pointers.
+%\YH{Figure \ref{fig:eval:vivado:mismatch} actually has void functions...} \JW{Hm, true. Perhaps these were introduced during reduction.}
We disable the generation of unions as these were not supported by some of the tools such as LegUp 4.0.
Whether a problematic feature should be disabled or reported as a bug depends on the tool in question. Unfortunately there is no standard subset of C that is supported by HLS tools; every tool chooses a slightly different subset. It is therefore important to choose the right subset in order to maximise the number of real bugs found in each tool, while avoiding generating code that the tool does not support. We therefore disable a feature if it fails gracefully (i.e. with an error message stating the issue) in one of the tools. If, however, the HLS tool fails in a different way, such as generating a wrong design or crashing during synthesis, the feature is kept in our test suite.
-Use a volatile pointer for any pointer that is accessed multiple times within a single transaction (one execution of the C function). If you do not use a volatile pointer, everything except the first read and last write is optimized out to adhere to the C standard.
+%Use a volatile pointer for any pointer that is accessed multiple times within a single transaction (one execution of the C function). If you do not use a volatile pointer, everything except the first read and last write is optimized out to adhere to the C standard.
To allow for HLS synthesis, we enforce that the main function of each generated program has no input arguments.
We disable structure packing within Csmith, since the ``\code{\#pragma pack(1)}'' directive it relies on is interpreted as an unsupported pragma by the HLS tools.
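
[Editorial note: to illustrate what `safe' arithmetic means here, Csmith routes arithmetic through wrapper functions that rule out undefined behaviour. The following is a simplified sketch in the style of Csmith's \texttt{safe\_math.h}; the real wrappers are more general and the exact names may differ.]

\begin{minted}{c}
#include <stdint.h>

// Unsigned addition is always defined (overflow wraps),
// so the wrapper is a plain addition.
static uint32_t safe_add_func_uint32_t_u_u(uint32_t a, uint32_t b) {
  return a + b;
}

// Division guards against the one undefined case: a zero divisor.
static uint32_t safe_div_func_uint32_t_u_u(uint32_t a, uint32_t b) {
  return (b == 0) ? a : a / b;
}
\end{minted}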
@@ -198,7 +200,7 @@ We first perform a custom reduction in which we iteratively remove the HLS direc
%Although, our custom reduction gives us the freedom and control of how to reduce buggy programs, it is arduous and requires a lot of manual effort.
We then use the \creduce{} tool~\cite{creduce} to automatically reduce the remaining C program.
\creduce{} is effective because it reduces the input while preserving semantic validity and avoiding undefined behaviour.
-It has various reduction strategies, such as delta debugging passes and function inlining, that help it converge rapidly to a test case that is small enough to understand and step through.
+It has various reduction strategies, such as delta debugging passes and function inlining, that help it converge rapidly to a test-case that is small enough to understand and step through.
However, the downside of using \creduce{} with HLS tools is that we are not in control of which lines and features are prioritised for removal.
As a consequence, we can easily end up with \creduce{} producing programs that are not synthesisable, despite being valid C.
@@ -206,7 +208,7 @@ As a consequence, we can easily end up with \creduce{} producing programs that a
Even though \creduce{} does not normally introduce undefined behaviour, it can introduce behaviour that is unsupported in the HLS tools.
An example is the reduction of a function call, where the reducer realises that a mismatch is still observed when the function call's arguments are removed, and the function pointer is assigned a constant instead.
This, however, is often unsupported in HLS tools, since a function pointer has no concrete interpretation in hardware: in the absence of instructions stored in memory, functions are not associated with a particular memory location.
-Once unhandled behaviour is introduced at any point in the reduction, the test cases will often zero in on that unhandled behaviour, even though it does not actually represent an interesting bug.
+Once unhandled behaviour is introduced at any point in the reduction, the test-cases will often zero in on that unhandled behaviour, even though it does not actually represent an interesting bug.
To prevent this, we use a script to guide \creduce{} away from introducing these unhandled behaviours as much as possible.
This script involves adding \texttt{-fsanitize=undefined} to the GCC options in order to abort when undefined behaviour is detected at runtime, and erroring out whenever any warning is encountered while running the HLS tool (except common warnings that are known to be harmless).
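
[Editorial note: to illustrate the function-pointer hazard described above, the following hypothetical before/after sketch shows such a reduction step; all identifiers are invented for this illustration.]

\begin{minted}{c}
int f(int x) { return x + 1; }

// Before reduction: an ordinary call that HLS tools can synthesise.
int before(void) { return f(2); }

// After reduction: C-Reduce may strip the call and assign a constant
// to a function pointer instead. This is still compilable C, but a
// function pointer has no hardware interpretation, so the HLS tool
// now fails for a reason unrelated to the original bug.
int (*fp)(int);
int after(void) {
  fp = (int (*)(int))1;
  return 0;
}
\end{minted}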
diff --git a/related.tex b/related.tex
index a9d2f58..039737d 100644
--- a/related.tex
+++ b/related.tex
@@ -1,6 +1,6 @@
\section{Related Work}
-The only other work of which we are aware on fuzzing HLS tools is that by Lidbury et al. \cite{lidbury15_many_core_compil_fuzzin}, who tested several OpenCL compilers, including an HLS compiler from Altera (now Intel). They were only able to subject that compiler to superficial testing because so many of the test cases they generated led to it crashing. In comparison to our work: where Lidbury et al. generated target-independent OpenCL programs that could be used to test HLS tools and conventional compilers alike, we specifically generate programs that are tailored for HLS (e.g. with HLS-specific pragmas) with the aim of testing the HLS tools more deeply. Another difference is that where we test using sequential C programs, they test using highly concurrent OpenCL programs, and thus have to go to great lengths to ensure that any discrepancies observed between compilers cannot be attributed to the inherent nondeterminism of concurrency.
+The only other work of which we are aware on fuzzing HLS tools is that by Lidbury et al.~\cite{lidbury15_many_core_compil_fuzzin}, who tested several OpenCL compilers, including an HLS compiler from Altera (now Intel). They were only able to subject that compiler to superficial testing because so many of the test-cases they generated led to it crashing. Where Lidbury et al. generated target-independent OpenCL programs that could be used to test HLS tools and conventional compilers alike, we specifically generate programs that are tailored for HLS (e.g. with HLS-specific pragmas) with the aim of testing the HLS tools more deeply. Another difference is that where we test using sequential C programs, they test using highly concurrent OpenCL programs, and thus have to go to great lengths to ensure that any discrepancies observed between compilers cannot be attributed to the inherent nondeterminism of concurrency.
Other stages of the FPGA toolchain have been subjected to fuzzing. Herklotz et al.~\cite{verismith} tested several FPGA synthesis tools using randomly generated Verilog programs. Where they concentrated on the RTL-to-netlist stage of hardware design, we focus our attention on the earlier C-to-RTL stage.
@@ -15,7 +15,7 @@ However, the tool does not support C as input directly, so is not amenable to fu
\item In the SPARK HLS tool~\cite{gupta03_spark}, some compiler passes, such as scheduling, are mechanically validated during compilation~\cite{chouksey20_verif_sched_condit_behav_high_level_synth}. Unfortunately, this tool is not yet readily available to test properly.
-\item Finally, the Catapult C HLS tool~\cite{mentor20_catap_high_level_synth} is designed only to produce an output netlist if it can mechanically prove it equivalent to the input program. It should therefore never produce wrong RTL. In future work, we intend to test Catapult C alongside Vivado HLS, LegUp, and the Intel HLS Compiler.
+\item Finally, the Catapult C HLS tool~\cite{mentor20_catap_high_level_synth} is designed only to produce an output netlist if it can mechanically prove it equivalent to the input program. It should therefore never produce wrong RTL. In future work, we intend to test Catapult C alongside Vivado HLS, LegUp, and Intel i++.
\end{itemize}
%more prevalent these were prioritised.