author    Yann Herklotz <ymh15@ic.ac.uk>  2020-12-17 14:04:42 +0000
committer overleaf <overleaf@localhost>   2020-12-31 14:48:38 +0000
commit    a5249eb597549437802d2ed852919e5b9a923840 (patch)
tree      29a32aa1fba1dc0211be88497884d0c7a2db1690
parent    ea9289245fbc493530e9435faf498cc4a824c70f (diff)
download  fccm21_esrhls-a5249eb597549437802d2ed852919e5b9a923840.tar.gz
download  fccm21_esrhls-a5249eb597549437802d2ed852919e5b9a923840.zip
Update on Overleaf.
-rw-r--r--  conclusion.tex          8
-rw-r--r--  eval.tex              207
-rw-r--r--  intro.tex              64
-rw-r--r--  main.tex               40
-rw-r--r--  method.tex            165
-rw-r--r--  old/old_legup_bug.tex  24
-rw-r--r--  related.tex            17
-rw-r--r--  tool-figure.tex        24
8 files changed, 263 insertions, 286 deletions
diff --git a/conclusion.tex b/conclusion.tex
index a5a3557..fd28bcb 100644
--- a/conclusion.tex
+++ b/conclusion.tex
@@ -1,13 +1,11 @@
\section{Conclusion}
-We have shown how existing fuzzing tools can be modified so that their outputs are compatible with HLS tools. We have used this testing framework to run 6,700 test-cases through three different HLS tools, and 3,645 test-cases through three different version of Vivado HLS to show how bugs are fixed and introduced. In total, we found at least 6 unique bugs in all the tools. These bugs include crashes as well as instances of generated designs not behaving in the same way as the original code.
+We have shown how an existing fuzzing tool can be modified so that its output is suitable for HLS, and then used it in a campaign to test the reliability of three modern HLS tools. In total, we found at least \numuniquebugs{} unique bugs across all the tools, including crashes and miscompilation bugs.
-One can always question how much bugs found by fuzzers really \emph{matter}, given that they are usually found by combining language features in ways that are vanishingly unlikely to happen `in the wild'~\cite{marcozzi+19}. This question is especially pertinent for our particular context of HLS tools, which are well-known to have restrictions on the language features that they handle. \JW{I added the following two sentences, based on our rebuttal -- please check.} Nevertheless, we would argue that although the \emph{test-cases} we generated do not resemble the programs that humans write, the \emph{bugs} that we exposed using those test-cases are real, and could also be exposed by realistic programs. Moreover, it is worth noting that HLS tools not exclusively provided with human-written programs to compile: they are often fed programs that have been automatically generated by another compiler. Ultimately, we believe that any errors in an HLS tool are worth identifying because they have the potential to cause problems, either now or in the future. And when HLS tools \emph{do} go wrong (or indeed any sort of compiler for that matter), it is particularly infuriating for end-users because it is so difficult to identify whether the fault lies with the tool or with the program it has been given to compile.
-
-Further work could be done on supporting more HLS tools, especially ones that claim to prove that their output is correct before terminating. This could give an indication on how effective these proofs are, and how often they are actually able to complete their equivalence proofs during compilation in a feasible timescale.
+Further work could be done on supporting more HLS tools, especially ones that claim to prove that their output is correct before terminating, such as Catapult-C~\cite{mentor20_catap_high_level_synth}. % This could give an indication of how effective these proofs are, and how often they are actually able to complete their equivalence proofs during compilation in a feasible timescale.
Conventional compilers have become quite resilient to fuzzing over the last decade, so recent work on fuzzing compilers has had to employ increasingly imaginative techniques to keep finding new bugs~\cite{karine+20}. In comparison, we have found that HLS tools -- at least, as they currently stand -- can be made to exhibit bugs even using the relatively basic fuzzing techniques that we employed in this project.
-As HLS is becoming increasingly relied upon, it is important to make sure that HLS tools are also reliable. We hope that this work further motivates the need for rigorous engineering of HLS tools, either by validating that each output the tool produces is correct or by proving the HLS tool itself correct once and for all.
+As HLS is becoming increasingly relied upon, it is important to make sure that HLS tools are also reliable. We hope that this work further motivates the need for rigorous engineering of HLS tools, whether that is by validating that each output the tool produces is correct or by proving the HLS tool itself correct once and for all.
%%% Local Variables:
%%% mode: latex
diff --git a/eval.tex b/eval.tex
index 49e39c5..9046a15 100644
--- a/eval.tex
+++ b/eval.tex
@@ -1,76 +1,69 @@
\section{Evaluation}\label{sec:evaluation}
+We generate \totaltestcases{} test-cases and provide them to three HLS tools: Vivado HLS, LegUp HLS and Intel i++.
+We use the same test-cases across all tools for fair comparison (except the HLS directives, which have tool-specific syntax).
+We were able to test three different versions of Vivado HLS (v2018.3, v2019.1 and v2019.2).
+We tested one version of Intel i++ (version 18.1), and one version of LegUp (4.0).
+LegUp 7.5 is GUI-based and therefore we could not script our tests.
+However, we were able to manually reproduce all the bugs found in LegUp 4.0 in LegUp 7.5.
+
+% Three different tools were tested, including three different versions of Vivado HLS. We were only able to test one version of LegUp HLS (version 4.0), because although LegUp 7.5 is available, it is GUI-based and not amenable to scripting. However, bugs we found in LegUp 4.0 were reproduced manually in LegUp 7.5.
+% LegUp and Vivado HLS were run under Linux, while the Intel HLS Compiler was run under Windows.
+
+\subsection{Results across different HLS tools}
+
\definecolor{vivado}{HTML}{7fc97f}
\definecolor{intel}{HTML}{beaed4}
\definecolor{legup}{HTML}{fdc086}
\begin{figure}
\centering
\begin{tikzpicture}[scale=0.61]
- \draw (-14.5,7.65) rectangle (0,-1);
- \fill[vivado,fill opacity=0.5] (-4.4,4.4) ellipse (3.75 and 2.75);
- \fill[intel,fill opacity=0.5] (-10.2,4.4) ellipse (3.75 and 2.75);
- \fill[legup,fill opacity=0.5] (-7.3,2) ellipse (3.75 and 2.75);
- \draw[white, thick] (-4.4,4.4) ellipse (3.75 and 2.75);
- \draw[white, thick] (-10.2,4.4) ellipse (3.75 and 2.75);
- \draw[white, thick] (-7.3,2) ellipse (3.75 and 2.75);
- \node[align=center, anchor=north] at (-10.2,6.5) {\textsf{\textbf{Xilinx Vivado HLS}} \\ \textsf{\textbf{v2019.1}}};
- \node[anchor=north] at (-4.4,6.5) {\textsf{\textbf{Intel i++ 18.1}}};
- \node at (-7.3,0) {\textsf{\textbf{LegUp 4.0}}};
-
- \node at (-5.5,3) {1 (\textcolor{red}{1})};
- \node at (-9.1,3) {4 (\textcolor{red}{0})};
- \node at (-3,4.1) {26 (\textcolor{red}{540})};
- \node at (-11.6,4.1) {79 (\textcolor{red}{20})};
- \node at (-7.3,1) {162 (\textcolor{red}{6})};
- \node at (-7.3,5.1) {0 (\textcolor{red}{5})};
- \node at (-7.3,3.9) {0 (\textcolor{red}{0})};
- \node at (-13.6,-0.5) {5856};
+ \draw (-7.2,7.0) rectangle (7.2,0.7);
+ \fill[intel,fill opacity=0.5] (2.5,4.4) ellipse (3.75 and 1.5);
+ \fill[vivado,fill opacity=0.5] (-2.5,4.4) ellipse (3.75 and 1.5);
+ \fill[legup,fill opacity=0.5] (0,3) ellipse (3.75 and 1.5);
+ \draw[white, thick] (2.5,4.4) ellipse (3.75 and 1.5);
+ \draw[white, thick] (-2.5,4.4) ellipse (3.75 and 1.5);
+ \draw[white, thick] (0,3) ellipse (3.75 and 1.5);
+ \node[align=center, anchor=south west] at (-6.4,6) {\textcolor{vivado}{\bf Xilinx Vivado HLS v2019.1}};
+ \node[anchor=south east] at (6.4,6) {\textcolor{intel}{\bf Intel i++ 18.1}};
+ \node at (4,1.6) {\textcolor{legup}{\bf LegUp 4.0}};
+
+ \node at (1.8,3.5) {1 (\textcolor{red}{1})};
+ \node at (-1.8,3.5) {4 (\textcolor{red}{0})};
+ \node at (4.0,4.5) {26 (\textcolor{red}{540})};
+ \node at (-4.0,4.5) {79 (\textcolor{red}{20})};
+ \node at (0,2.1) {162 (\textcolor{red}{6})};
+ \node at (0,4.9) {0 (\textcolor{red}{5})};
+ \node at (0,3.9) {0 (\textcolor{red}{0})};
+ \node at (-6,1.4) {5856};
\end{tikzpicture}
-\caption{A Venn diagram showing the number of failures in each tool out of 6700 test-cases that were run. Overlapping regions mean that the test-cases failed in multiple tools. The numbers in parentheses represent the number of test-cases that timed out.}\label{fig:existing_tools}
+\caption{The number of failures per tool out of \totaltestcases{} test-cases. Overlapping regions mean that the same test-cases failed in multiple tools. The numbers in parentheses report how many test-cases timed out.}\label{fig:existing_tools}
\end{figure}
-\begin{table}
- \centering
- \begin{tabular}{lr}\toprule
+Figure~\ref{fig:existing_tools} shows a Venn diagram of our results.
+We see that 167 (2.5\%), 83 (1.2\%) and 26 (0.4\%) test-cases fail in LegUp, Vivado HLS and Intel i++ respectively.
+Despite i++ having the lowest failure rate, it has the highest time-out rate (540 test-cases), because of its remarkably long compilation time.
+% We remark that although the Intel HLS Compiler had the smallest number of confirmed test-case failures, it had the most time-outs (which could be masking additional failures)
+Note that the absolute numbers here do not necessarily correspond to the number of bugs in the tools, because a single bug in a language feature that appears frequently in our test suite could cause many programs to crash or fail.
+Hence, we reduce many of the failing test-cases in an effort to identify unique bugs; these are summarised in the table below.
+
+\begin{table}[h]
+\centering
+\begin{tabular}{lr}\toprule
\textbf{Tool} & \textbf{Unique Bugs}\\
\midrule
- Xilinx Vivado HLS (all versions) & $\ge 2$\\
+ Xilinx Vivado HLS v2019.1 & $\ge 2$\\
LegUp HLS & $\ge 3$\\
Intel i++ & $\ge 1$\\
\bottomrule
\end{tabular}
- \caption{Unique bugs found in each tool. The ``$\ge$'' sign signifies a lower bound on the number of unique bugs found after test-case reduction. %\JW{is `all versions' correct here? and should we add version numbers like in the Venn?}\YH{Yes it is actually correct here, I don't mind adding the specific version either though}\JW{Ok let's leave it as-is.}
- }
- \label{tab:unique_bugs}
\end{table}
-We generate 6700 test-cases and provide them to three HLS tools: Vivado HLS, LegUp HLS and Intel i++.
-We use the same test-cases across all tools for fair comparison.
-We were able to test three different versions of Vivado HLS (v2018.3, v2019.1 and v2019.2).
-We tested one version of Intel i++ (version 18.1), and one version of LegUp (4.0).
-LegUp 7.5 is GUI-based and therefore we could not script our tests.
-However, we were able to manually reproduce bugs found in LegUp 4.0 in LegUp 7.5.
-
-% Three different tools were tested, including three different versions of Vivado HLS. We were only able to test one version of LegUp HLS (version 4.0), because although LegUp 7.5 is available, it is GUI-based and not amenable to scripting. However, bugs we found in LegUp 4.0 were reproduced manually in LegUp 7.5.
-% LegUp and Vivado HLS were run under Linux, while the Intel HLS Compiler was run under Windows.
-
-\subsection{Results across different HLS tools}
-
-Figure~\ref{fig:existing_tools} shows a Venn diagram of our results.
-We see that 167 (2.5\%), 83 (1.2\%) and 26 (0.4\%) test-cases fail in LegUp, Vivado HLS and Intel i++ respectively.
-Despite i++ having the lowest failure rate, it has the highest time-out rate (540 test-cases), because of its remarkably long compilation time.
-% We remark that although the Intel HLS Compiler had the smallest number of confirmed test-case failures, it had the most time-outs (which could be masking additional failures)
-Note that the absolute numbers here do not necessary correspond to the number of bugs in the tools, because a single bug in a language feature that appears frequently in our test suite could cause many programs to crash or fail.
-Hence, we reduce many of the failing test-cases to identify unique bugs, as summarised in Table~\ref{tab:unique_bugs}.
-We write `$\ge$' in the table to indicate that all the bug counts are lower bounds -- we did not have time to go through the test-case reduction process for every failure.
+We write `$\ge$' above to emphasise that all the bug counts are lower bounds -- we did not have time to go through the rather arduous test-case reduction process for every failure.
\subsection{Results across versions of an HLS tool}
-Besides comparing the reliability of different HLS tools, we also investigated the reliability of Vivado HLS over time. Figure~\ref{fig:sankey_diagram} shows the results of giving 3645 test-cases to Vivado HLS v2018.3, v2019.1 and v2019.2.
-Test-cases that pass and fail in the same tools are grouped together into a ribbon.
-For instance, the topmost ribbon represents the 31 test-cases that fail in all three versions of Vivado HLS. Other ribbons can be seen weaving in and out; these indicate that bugs were fixed or reintroduced in the various versions. The diagram demonstrates that Vivado HLS v2018.3 contains the most failing test-cases compared to the other versions, having 62 test-cases fail in total. %Interestingly, Vivado HLS 2019.1 and 2019.2 have a different number of failing test cases, meaning feature improvements that introduced bugs as well as bug fixes between those minor versions.
-Interestingly, as an indicator of reliability of HLS tools, the blue ribbon shows that there are test-cases that fail in v2018.3, pass in v2019.1 but then fail again in v2019.2.
-
\definecolor{ribbon1}{HTML}{8dd3c7}
\definecolor{ribbon2}{HTML}{b3de69}
\definecolor{ribbon3}{HTML}{bebada}
@@ -110,37 +103,35 @@ Interestingly, as an indicator of reliability of HLS tools, the blue ribbon show
\node[white] at (2,2.5) {36};
\node[white] at (4,2.25) {41};
\end{tikzpicture}
- \caption{A Sankey diagram that tracks 3645 test-cases through three different versions of Vivado HLS. The ribbons collect the test-cases that pass and fail together. The black bars are labelled with the total number of test-case failures per version. The 3573 test-cases that pass in all three versions are not depicted.
+ \caption{A Sankey diagram that tracks \vivadotestcases{} test-cases through three different versions of Vivado HLS. The ribbons collect the test-cases that pass and fail together. The black bars are labelled with the total number of test-case failures per version. The 3573 test-cases that pass in all three versions are not depicted.
}\label{fig:sankey_diagram}
\end{figure}
-% \NR{Why are there missing numbers in the ribbons?}
-As in our Venn diagram, the absolute numbers in Figure~\ref{fig:sankey_diagram} do not necessary correspond to the number of bugs. However, we can deduce from this diagram that there must be at least six unique bugs in Vivado HLS, given that a ribbon must contain at least one unique bug.
-%\YH{Contradicts value of 3 in Table~\ref{tab:unique_bugs}, maybe I can change that to 6?} \JW{I'd leave it as-is personally; we have already put a `$\ge$' symbol in the table, so I think it's fine.}
-In addition to that, it can then be seen that Vivado HLS v2018.3 must have at least 4 individual bugs, of which two were fixed and two others stayed in Vivado HLS v2019.1. However, with the release of v2019.1, new bugs were introduced as well. % Finally, for version 2019.2 of Vivado HLS, there seems to be a bug that was reintroduced which was also present in Vivado 2018.3, in addition to a new bug. In general it seems like each release of Vivado HLS will have new bugs present, however, will also contain many previous bug fixes. However, it cannot be guaranteed that a bug that was previously fixed will remain fixed in future versions as well.
+Besides comparing the reliability of different HLS tools, we also investigated the reliability of Vivado HLS over time. Figure~\ref{fig:sankey_diagram} shows the results of giving \vivadotestcases{} test-cases to Vivado HLS v2018.3, v2019.1 and v2019.2.
+Test-cases that pass and fail in the same tools are grouped together into a ribbon.
+For instance, the topmost ribbon represents the 31 test-cases that fail in all three versions of Vivado HLS. Other ribbons can be seen weaving in and out; these indicate that bugs were fixed or reintroduced in the various versions. We see that Vivado HLS v2018.3 had the most test-case failures (62).
+Interestingly, as an indicator of reliability of HLS tools, the blue ribbon shows that there are test-cases that fail in v2018.3, pass in v2019.1 but then fail again in v2019.2.
+As in our Venn diagram, the absolute numbers here do not necessarily correspond to the number of actual bugs, but we can deduce that there must be at least six unique bugs in Vivado HLS, given that each ribbon corresponds to at least one unique bug.
-\subsection{Some specific bugs found}
-This section describes some of the bugs that were found in the various tools that were tested. We describe two bugs in LegUp and one in Vivado HLS; in each case, the bug was first reduced automatically using \creduce{}, and then reduced further manually to achieve the minimal test-case. Although we did find test-case failures in Intel i++, the long compilation times for that tool meant that we did not have time to reduce any of the failures down to an example that is minimal enough to present here.
-\subsubsection{LegUp assertion error}
-The code shown in Figure~\ref{fig:eval:legup:assert} leads to an assertion error in LegUp 4.0 and 7.5 even though it should compile without any errors.
-An assertion error counts as a crash of the tool, as it means that an unexpected state was reached by this input.
-This shows that there is a bug in one of the compilation passes in LegUp, however, due to the assertion the bug is caught in the tool before it produces an incorrect design.
+%\YH{Contradicts value of 3 in Table~\ref{tab:unique_bugs}, maybe I can change that to 6?} \JW{I'd leave it as-is personally; we have already put a `$\ge$' symbol in the table, so I think it's fine.}
+%In addition to that, it can then be seen that Vivado HLS v2018.3 must have at least 4 individual bugs, of which two were fixed and two others stayed in Vivado HLS v2019.1. However, with the release of v2019.1, new bugs were introduced as well. % Finally, for version 2019.2 of Vivado HLS, there seems to be a bug that was reintroduced which was also present in Vivado 2018.3, in addition to a new bug. In general it seems like each release of Vivado HLS will have new bugs present, however, will also contain many previous bug fixes. However, it cannot be guaranteed that a bug that was previously fixed will remain fixed in future versions as well.
-\begin{figure}
+\subsection{Some specific bugs found}
+
+We now describe two more of the bugs we found: a crash bug in LegUp and a miscompilation bug in Intel i++. As in Example~\ref{ex:vivado_miscomp}, each bug was first reduced automatically using \creduce{}, and then reduced further manually to achieve the minimal test-case. Intel i++'s long compilation times made this reduction process particularly slow, but we were nevertheless able to shrink one of its miscompilation bugs down to the example presented below.
+
+\begin{example}[A crash bug in LegUp]
+The program shown below leads to an internal compiler error (an unhandled assertion in this case) in LegUp 4.0 and 7.5.
\begin{minted}{c}
int a[2][2][1] = {{{0},{1}},{{0},{0}}};
-
-int main() {
- a[0][1][0] = 1;
-}
+int main() { a[0][1][0] = 1; }
\end{minted}
-\caption{This program causes an assertion failure in LegUp HLS when \texttt{NO\_INLINE} is set.}\label{fig:eval:legup:assert}
-\end{figure}
-
-The buggy test-case has to do with initialisation and assignment to a three-dimensional array, for which the above piece of code is the minimal example. However, in addition to that it requires the \texttt{NO\_INLINE} flag to be set, which disables function inlining. The code initialises the array with zeroes except for \texttt{a[0][1][0]}, which is set to one. Then the main function assigns one to that same location. This code on its own should not actually produce a result and should just terminate by returning 0, which is also what the design that LegUp generates does when the \texttt{NO\_INLINE} flag is turned off.
+%An assertion error counts as a crash of the tool, as it means that an unexpected state was reached by this input.
+%This shows that there is a bug in one of the compilation passes in LegUp, however, due to the assertion the bug is caught in the tool before it produces an incorrect design.
+It initialises a 3D array with zeroes (except for one element, which is set to one), and then assigns to that same element. The bug only appears when function inlining is disabled (\texttt{NO\_INLINE}). % The code initialises the array with zeroes except for \texttt{a[0][1][0]}, which is set to one. Then the main function assigns one to that same location. This code on its own should not actually produce a result and should just terminate by returning 0, which is also what the design that LegUp generates does when the \texttt{NO\_INLINE} flag is turned off.
%The following code also produces an assertion error in LegUp, which is a different one this time. This bug was not discovered during the main test runs of 10 thousand test cases, but beforehand, which meant that we disabled unions from being generated. However, this bug also requires the \texttt{volatile} keyword which seems to be the reason for quite a few mismatches in LegUp and Vivado.
%
@@ -152,68 +143,40 @@ The buggy test-case has to do with initialisation and assignment to a three-dime
%int main() { return un.a; }
%\end{minted}
-\subsubsection{LegUp miscompilation}
-
-The test-case in Figure~\ref{fig:eval:legup:wrong} produces an incorrect Verilog in LegUp 4.0 and 7.5, which means that the results of RTL simulation is different to the C execution.
+\end{example}
\begin{figure}
\begin{minted}{c}
-volatile int a = 0;
-int b = 1;
-
-int main() {
- int d = 1;
- if (d + a)
- b || 1;
- else
- b = 0;
- return b;
-}
-\end{minted}
-\caption{An output mismatch: LegUp HLS returns 0 but the correct result is 1.}\label{fig:eval:legup:wrong}
-\end{figure}
-
-In the code above, \texttt{b} has value 1 when run in GCC, but has value 0 when run with LegUp. If the \texttt{volatile} keyword is removed from \texttt{a}, then the Verilog produces the correct result. As \texttt{a} and \texttt{d} are constants, the \code{if} statement should always produce go into the \texttt{true} branch, meaning \texttt{b} should never be set to 0. The \texttt{true} branch of the \code{if} statement only executes an expression which is not assigned to any variable, meaning the initial state of all variables should not change. However, LegUp HLS generates a design which enters the \texttt{else} branch instead and assigns \texttt{b} to be 0. The cause of this bug seems to be the use of \texttt{volatile} keyword, which interferes with the analysis that attempts to simplify the \code{if} statement.
-
-\subsubsection{Vivado HLS miscompilation}
-
-Figure~\ref{fig:eval:vivado:mismatch} shows code that does not output the right result when compiled with all Vivado HLS versions.
-It returns \texttt{0x0} with Vivado HLS, instead of \texttt{0xF}. This test-case is much larger compared to the other test-cases that were reduced.
-We could not reduce this program any further, as everything in the code was necessary to trigger the bug.
-
-The array \texttt{a} is initialised to all zeroes, as well as the other global variables \texttt{g} and \texttt{c}, so as to not introduce any undefined behaviour. However, \texttt{g} is also given the \texttt{volatile} keyword, which ensures that the variable is not optimised away. The function \texttt{d} then accumulates the values \texttt{b} that it is passed into a hash stored in \texttt{c}. Each \texttt{b} is eight bits wide, so function \texttt{e} calls the function seven times for some of the bits in the 64-bit value of \texttt{f} that it is passed. Finally, in the main function, the array is initialised partially with a \code{for} loop, after which the \texttt{e} function is called twice, once on the volatile function and once on a constant. Interestingly, the second function call with the constant is also necessary to trigger the bug.
-
-\begin{figure}
-\begin{minted}{c}
-volatile unsigned int g = 0;
-int a[256] = {0};
-int c = 0;
-
-void d(char b) {
- c = (c & 4095) ^ a[(c ^ b) & 15];
-}
-
-void e(long f) {
- d(f); d(f >> 8); d(f >> 16); d(f >> 24);
- d(f >> 32); d(f >> 40); d(f >> 48);
-}
-
+static volatile int a[9][1][7];
int main() {
- for (int i = 0; i < 56; i++)
- a[i] = i;
- e(g);
- e(-2L);
- return c;
+ int tmp = 1;
+ for (int b = 0; b < 2; b++) {
+ a[0][0][0] = 3;
+ a[0][0][0] = a[0][0][0];
+ }
+ for (int i = 0; i < 9; i++)
+ for (int k = 0; k < 7; k++)
+ tmp ^= a[i][0][k];
+ return tmp;
}
\end{minted}
-\caption{An output mismatch: Vivado HLS returns \texttt{0x0} but the correct result is \texttt{0xF}.}\label{fig:eval:vivado:mismatch}
+\caption{Miscompilation bug in Intel i++. The program should return 2 (since \code{3 \^{} 1 = 2}); however, Intel i++ returns 0.}\label{fig:eval:intel:mismatch}
\end{figure}
-\subsubsection{Intel i++ miscompilation}
+%\begin{example}[A miscompilation bug in Vivado HLS]
+
+%Figure~\ref{fig:eval:vivado:mismatch} shows code that should output \texttt{0xF}, but outputs \texttt{0x0} when compiled with Vivado HLS (all three versions).
+%This test-case is much larger compared to the other test-cases that were reduced.
+%We could not reduce this program any further, as everything in the code was necessary to trigger the bug.
+%This test-case is interesting because the root cause appears to be the hashing boilerplate that we added to Csmith, rather than the randomly-generated code that Csmith produced. In the main function, an array is partially initialised, and then a function \texttt{hashl} for hashing a long integer is called twice, once on a volatile global \texttt{x}, and once on a constant. That hashing function invokes a second hashing function on some of the bytes in the long integer.
-\JW{Write about one of the Intel bugs here.}
+%The array \texttt{arr} is initialised to all zeroes, as well as the other global variables \texttt{x} and \texttt{y}, so as to not introduce any undefined behaviour. However, \texttt{x} is also given the \texttt{volatile} keyword, which ensures that the variable is not optimised away. The function \texttt{hashc} then accumulates the values that it is passed into a hash stored in \texttt{h}. Each \texttt{b} is eight bits wide, so function \texttt{hashl} calls the function seven times for some of the bits in the 64-bit value of \texttt{f} that it is passed. Finally, in the main function, the array is initialised partially with a \code{for} loop, after which the \texttt{hashl} function is called twice, once on the volatile function and once on a constant. Interestingly, the second function call with the constant is also necessary to trigger the bug.
+%\end{example}
+\begin{example}[A miscompilation bug in Intel i++]
+Figure~\ref{fig:eval:intel:mismatch} shows a miscompilation bug that we found in Intel i++. The tool appears either to miss the assignment of 3 to \code{a[0][0][0]} in the first loop, or to perform an optimisation that analyses the array incorrectly; either way, the generated design returns the wrong value.
+\end{example}
%%% Local Variables:
%%% mode: latex
diff --git a/intro.tex b/intro.tex
index 7f0d12f..710b520 100644
--- a/intro.tex
+++ b/intro.tex
@@ -1,16 +1,18 @@
\section{Introduction}
High-level synthesis (HLS), which refers to the automatic translation of software into hardware, is becoming an increasingly important part of the computing landscape.
-It promises to increase the productivity of hardware engineers by raising the abstraction level of their designs, and it promises software engineers the ability to produce application-specific hardware accelerators without having to understand hardware desciption languages (HDL) such as Verilog and VHDL.
-It is even being used in high-assurance settings, such as financial services~\cite{hls_fintech}, control systems~\cite{hls_controller}, and real-time object detection~\cite{hls_objdetect}. As such, HLS tools are increasingly relied upon. In this paper, we investigate whether they are trustworthy.
+It promises hardware engineers an increase in productivity by raising the abstraction level of their designs, and it promises software engineers the ability to produce application-specific hardware accelerators without having to understand hardware description languages (HDL) such as Verilog and VHDL.
+HLS is being used in an ever greater range of domains, including such high-assurance settings as financial services~\cite{hls_fintech}, control systems~\cite{hls_controller}, and real-time object detection~\cite{hls_objdetect}. As such, HLS tools are increasingly relied upon. In this paper, we investigate whether they are trustworthy.
-The approach we take in this paper is \emph{fuzzing}.
+The approach we take is \emph{fuzzing}.
%To test the trustworthiness of HLS tools, we need a robust way of generating programs that both have good coverage and also explores various corner cases.
%Therein lies the difficulty in testing HLS tools.
%Human testing may not achieve both these objectives, as HLS tools are often require complex inputs to trigger wrong behaviour.
%In this paper, we employ program fuzzing on HLS tools.
This is an automated testing method in which randomly generated programs are given to compilers to test their robustness~\cite{fuzzing+chen+13+taming,fuzz+sun+16+toward,fuzzing+liang+18+survey,fuzzing+zhang+19,yang11_findin_under_bugs_c_compil,lidbury15_many_core_compil_fuzzin}.
The generated programs are typically large and rather complex, and they often combine language features in ways that are legal but counter-intuitive; hence they can be effective at exercising corner cases missed by human-designed test suites.
-Fuzzing has been used extensively to test conventional compilers; for example, Yang \textit{et al.}~\cite{yang11_findin_under_bugs_c_compil} used it to reveal more than three hundred bugs in GCC and Clang \JW{Clang -> LLVM?}. In this paper, we bring fuzzing to the HLS context.
+Fuzzing has been used extensively to test conventional compilers; for example, Yang \textit{et al.}~\cite{yang11_findin_under_bugs_c_compil} used it to reveal more than three hundred bugs in GCC and LLVM.
+%\JW{Clang or LLVM?}\YH{Hard to say actually, they just mention Clang, but I believe they go hand in hand. If it was optimisations, it is likely LLVM, but Clang is still the front end compiler.} \JW{To my mind, Clang is part of LLVM, so a bug in Clang is necessarily a bug in LLVM. So I think we should say LLVM throughout.}
+In this paper, we bring fuzzing to the HLS context.
%We specifically target HLS by restricting a fuzzer to generate programs within the subset of C supported by HLS.
@@ -25,50 +27,42 @@ Fuzzing has been used extensively to test conventional compilers; for example, Y
% Program fuzzing is bla..
% Fuzzing enables us to overcome
-\subsection*{An example of a compiler bug found by fuzzing}
-Figure~\ref{fig:vivado_bug1} shows a program that produces the wrong result during RTL simulation in Xilinx Vivado HLS.\footnote{\JW{I added the following paragraph -- please check.} This program, like all the others in this paper, includes a \code{main} function, which means that it compiles straightforwardly with GCC. To compile it with an HLS tool, we rename \code{main} to \code{main\_}, synthesise that function, and then add a new \code{main} function as a testbench that calls \code{main\_}.} The bug was initially revealed by a randomly generated program of around 113 lines, which we were able to reduce to the minimal example shown in the figure.
-The program repeatedly shifts a large integer value \code{b} right by the values stored in array \code{a}.
+\begin{example}[A miscompilation bug in Vivado HLS]
+\label{ex:vivado_miscomp}
+Figure~\ref{fig:vivado_bug1} shows a program that produces the wrong result during RTL simulation in Xilinx Vivado HLS v2018.3, v2019.1 and v2019.2.\footnote{This program, like all the others in this paper, includes a \code{main} function, which means that it compiles straightforwardly with GCC. To compile it with an HLS tool, we rename \code{main} to \code{result}, synthesise that function, and then add a new \code{main} function as a testbench that calls \code{result}.} The bug was initially revealed by a randomly generated program of around 113 lines, which we were able to reduce to the minimal example shown in the figure. We reported this bug to Xilinx, who confirmed it.\footnote{\url{https://bit.ly/3mzfzgA}}
+The program repeatedly shifts a large integer value \code{x} right by the values stored in array \code{arr}.
Vivado HLS returns \code{0x006535FF}, but the result returned by GCC (and subsequently confirmed manually to be the correct one) is \code{0x046535FF}.
+\end{example}
\begin{figure}[t]
- \centering
\begin{minted}{c}
-unsigned int b = 0x1194D7FF;
-int a[6] = {1, 1, 1, 1, 1, 1};
+unsigned int x = 0x1194D7FF;
+int arr[6] = {1, 1, 1, 1, 1, 1};
int main() {
- for (int c = 0; c < 2; c++)
- b = b >> a[c];
- return b;
+ for (int i = 0; i < 2; i++)
+ x = x >> arr[i];
+ return x;
}
\end{minted}
- \caption{Miscompilation bug found in Xilinx Vivado HLS v2018.3, v2019.1 and v2019.2. The program returns \code{0x006535FF} but the correct result is \code{0x046535FF}.}
+ \caption{Miscompilation bug in Xilinx Vivado HLS. The generated RTL returns \code{0x006535FF} but the correct result is \code{0x046535FF}.}
\label{fig:vivado_bug1}
\end{figure}
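+For concreteness, here is a minimal, purely illustrative sketch of the wrapping described in the footnote, applied to the program in Figure~\ref{fig:vivado_bug1}:
+\begin{minted}{c}
+unsigned int x = 0x1194D7FF;
+int arr[6] = {1, 1, 1, 1, 1, 1};
+// The original main, renamed so that the HLS tool
+// synthesises it as the design under test.
+int result() {
+  for (int i = 0; i < 2; i++)
+    x = x >> arr[i];
+  return x;
+}
+// A fresh main acting as the testbench: it calls the
+// synthesised function and returns its result.
+int main() { return result(); }
+\end{minted}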
-The circumstances in which we found this bug illustrate some of the challenges in testing HLS tools.
-For instance, without the for-loop, the bug goes away.
-Moreover, the bug only appears if the shift values are accessed from an array.
-And -- particularly curiously -- even though the for-loop only has two iterations, the array \code{a} must have at least six elements; if it has fewer than six, the bug disappears.
-Even the seemingly random value of \code{b} could not be changed without masking the bug.
-It seems unlikely that a manually generated test program would bring together all of the components necessary for exposing this bug.
-In contrast, producing counter-intuitive, complex but valid C programs is the cornerstone of fuzzing tools.
-For this reason, we find it natural to adopt fuzzing for our HLS testing campaign.
-% \NR{Yann, please double check my claims about the bug. I hope I accurately described what we discussed. }\YH{Yes I agree with all that, I think that is a good description of it}
-
-\subsection*{Our contribution}
-This paper reports on our campaign to test HLS tools by fuzzing.
-\begin{itemize}
- \item We use Csmith~\cite{yang11_findin_under_bugs_c_compil} to generate thousands of valid C programs from within the subset of the C language that is supported by all the HLS tools we test. We also augment each program with a random selection of HLS-specific directives.
-
- \item We give these programs to three widely used HLS tools: Xilinx Vivado HLS~\cite{xilinx20_vivad_high_synth}, LegUp HLS~\cite{canis13_legup} and the Intel HLS Compiler, which is also known as i++~\cite{intel20_sdk_openc_applic}. When we find a program that causes an HLS tool to crash, or to generate hardware that produces a different result from GCC, we reduce it to a minimal example with the help of the \creduce{} tool~\cite{creduce}.
-
- \item Our testing campaign revealed that all three tools could be made to crash while compiling or to generate wrong RTL. In total, 6700 test cases were run through each tool out of which 272 test cases failed in at least one of the tools. Test case reduction was then performed on some of these failing test cases to obtain at least 6 unique failing test cases.
+The example above demonstrates the effectiveness of fuzzing. It seems unlikely that a human-written test-suite would discover this particular bug, given that it requires several components all to coincide -- a for-loop, shift values accessed from an array with at least six elements, and a rather random-looking value for \code{x} -- before the bug is revealed!
+
+Yet this example also raises the question: do bugs found by fuzzers really \emph{matter}, given that they are usually found by combining language features in ways that are vanishingly unlikely to happen `in the real world'~\cite{marcozzi+19}? This question is especially pertinent for our particular context of HLS tools, which are well-known to have restrictions on the language features that they handle. Nevertheless, we would argue that although the \emph{test-cases} we generated do not resemble the programs that humans write, the \emph{bugs} that we exposed using those test-cases are real, and \emph{could also be exposed by realistic programs}. Moreover, it is worth noting that HLS tools are not exclusively provided with human-written programs to compile: they are often fed programs that have been automatically generated by another compiler. Ultimately, we believe that any errors in an HLS tool are worth identifying because they have the potential to cause problems, either now or in the future. And problems caused by HLS tools going wrong (or indeed any sort of compiler for that matter) are particularly egregious, because it is so difficult for end-users to identify whether the fault lies with the tool or with the program it has been given to compile.
+
+\subsection{Our approach and results}
+
+Our approach to fuzzing HLS tools comprises three steps.
+First, we use Csmith~\cite{yang11_findin_under_bugs_c_compil} to generate thousands of valid C programs from within the subset of the C language that is supported by all the HLS tools we test. We also augment each program with a random selection of HLS-specific directives. Second, we give these programs to three widely used HLS tools: Xilinx Vivado HLS~\cite{xilinx20_vivad_high_synth}, LegUp HLS~\cite{canis13_legup} and the Intel HLS Compiler, which is also known as i++~\cite{intel20_sdk_openc_applic}. Third, if we find a program that causes an HLS tool to crash, or to generate hardware that produces a different result from GCC, we reduce it to a minimal example with the help of the \creduce{} tool~\cite{creduce}.
+
+Our testing campaign revealed that all three tools could be made to crash while compiling or to generate wrong RTL. In total, \totaltestcases{} test-cases were run through each tool, of which \totaltestcasefailures{} failed in at least one of the tools. Test-case reduction was then performed on some of these failing test-cases to obtain at least \numuniquebugs{} unique failing test-cases.
- \item To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (v2018.3, v2019.1, and v2019.2). We found that in general there about half as many failures in versions v2019.1 and v2019.2 compared to v2018.3. However, there were also test-cases that only failed in versions v2019.1 and v2019.2, meaning bugs were probably introduced due to the addition of new features.
-\end{itemize}
+To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (v2018.3, v2019.1, and v2019.2). We found far fewer failures in versions v2019.1 and v2019.2 compared to v2018.3, but we also identified a few test-cases that only failed in versions v2019.1 and v2019.2, which suggests that some new features may have introduced bugs.
-\JW{I added the following paragraph -- please check.} The overall aim of our paper is to raise awareness about the (un)reliability of current HLS tools, and to serve as a call-to-arms for investment in better-engineered tools. We hope that future work on developing more reliable HLS tools will find our empirical study a valuable source of motivation.
+In summary, the overall aim of our paper is to raise awareness about the reliability (or lack thereof) of current HLS tools, and to serve as a call-to-arms for investment in better-engineered tools. We hope that future work on developing more reliable HLS tools will find our empirical study a valuable source of motivation.
% we test, and then augment each program with randomly chosen HLS-specific directives. We synthesise each C program to RTL, and use a Verilog simulator to calculate its return value. If synthesis crashes, or if this return value differs from the return value obtained by executing a binary compiled from the C program by GCC, then we have found a candidate bug. We then use trial-and-error to reduce the C program to a minimal version that still triggers a bug.
diff --git a/main.tex b/main.tex
index 8dd121e..e6b62d2 100644
--- a/main.tex
+++ b/main.tex
@@ -1,6 +1,11 @@
-%\documentclass[hyphens,prologue,x11names,rgb,sigconf,anonymous,review]{acmart}
\documentclass[conference]{IEEEtran}
+\newif\ifCOMMENTS
+\COMMENTStrue
+
+\newif\ifBLIND
+\BLINDtrue
+
\usepackage[english]{babel}
\usepackage{graphicx}
\usepackage{siunitx}
@@ -16,8 +21,14 @@
\usepackage{hyperref}
%\usepackage{balance}
-\newif\ifCOMMENTS
-\COMMENTStrue
+\newcommand\totaltestcases{6700}
+\newcommand\totaltestcasefailures{272}
+\newcommand\numuniquebugs{6}
+\newcommand\vivadotestcases{3645}
+
+\theoremstyle{definition}
+\newtheorem{example}{Example}
+
\newcommand{\Comment}[3]{\ifCOMMENTS\textcolor{#1}{{\bf [[#2:} #3{\bf ]]}}\fi}
\newcommand\JW[1]{\Comment{red!75!black}{JW}{#1}}
\newcommand\YH[1]{\Comment{green!50!blue}{YH}{#1}}
@@ -31,36 +42,23 @@
\title{Fuzzing High-Level Synthesis Tools}
-\newif\ifBLIND
-\BLINDfalse
\ifBLIND
\author{Blind review}
\else
\author{%
-\IEEEauthorblockN{Zewei Du}
-\IEEEauthorblockA{Imperial College London, UK \\
-Email: zewei.du19@imperial.ac.uk}
-\and
-\IEEEauthorblockN{Yann Herklotz}
-\IEEEauthorblockA{Imperial College London, UK \\
-Email: yann.herklotz15@imperial.ac.uk}
-\and
-\IEEEauthorblockN{Nadesh Ramanathan}
-\IEEEauthorblockA{Imperial College London, UK \\
-Email: n.ramanathan14@imperial.ac.uk}
-\and
-\IEEEauthorblockN{John Wickerson}
+\IEEEauthorblockN{Zewei Du, Yann Herklotz, Nadesh Ramanathan, and John Wickerson}
\IEEEauthorblockA{Imperial College London, UK \\
-Email: j.wickerson@imperial.ac.uk}}
+Email: \{zewei.du19, yann.herklotz15, n.ramanathan14, j.wickerson\}@imperial.ac.uk}
+}
\fi
\maketitle
\begin{abstract}
High-level synthesis (HLS) is becoming an increasingly important part of the computing landscape, even in safety-critical domains where correctness is key.
-As such, HLS tools are increasingly relied upon. In this paper, we investigate whether they are trustworthy.
+As such, HLS tools are increasingly relied upon. But are they trustworthy?
We have subjected three widely used HLS tools -- LegUp, Xilinx Vivado HLS, and the Intel HLS Compiler -- to a rigorous fuzzing campaign using thousands of random, valid C programs that we generated using a modified version of the Csmith tool. We compiled each C program to a hardware design using the HLS tool under test and checked whether that hardware design generates the same output as an executable generated by the GCC compiler. When discrepancies arose between GCC and the HLS tool under test, we reduced the C program to a minimal example in order to zero in on the potential bug. Our testing campaign has revealed that all three HLS tools can be made either to crash or to generate wrong code when given valid C programs, and thereby underlines the need for these increasingly trusted tools to be more rigorously engineered.
-Out of 6700 test cases, we found 272 programs that failed in at least one tool, out of which we were able to identify at least 6 unique bugs.
+Out of \totaltestcases{} test cases, we found \totaltestcasefailures{} programs that failed in at least one tool, from which we were able to discern at least \numuniquebugs{} unique bugs.
\end{abstract}
diff --git a/method.tex b/method.tex
index 1105056..8841208 100644
--- a/method.tex
+++ b/method.tex
@@ -2,17 +2,18 @@
% \NR{I think this section is very detailed. I think we can start with a figure of our tool-flow. Then, we can break down each item in the figure and discuss the necessary details (Fig.~\ref{fig:method:toolflow}).}
-This section describes how we conducted our testing campaign, the overall flow of which is shown in Figure~\ref{fig:method:toolflow}.
+The overall flow of our testing approach is shown in Figure~\ref{fig:method:toolflow}.
\input{tool-figure}
-%\begin{itemize}
-%\item
-In~\S\ref{sec:method:csmith}, we describe how we configure Csmith to generate HLS-friendly random programs for our testing campaign.
+This section describes how test-cases are generated (\S\ref{sec:method:csmith}), executed (\S\ref{sec:method:testing}), and reduced (\S\ref{sec:method:reduce}).
+
+
+%In~\S\ref{sec:method:csmith}, we describe how we modify Csmith to generate random HLS-friendly programs for our testing campaign.
%\item
-In~\S\ref{sec:method:annotate}, we discuss how we augment those random programs with directives and the necessary configuration files for HLS compilation.
+%In~\S\ref{sec:method:annotate}, we discuss how we augment those random programs with directives and the necessary configuration files for HLS compilation.
%\item
-In~\S\ref{sec:method:testing}, we discuss how we set up compilation and co-simulation checking for the three HLS tools under test.
+%In~\S\ref{sec:method:testing}, we discuss how we set up compilation and co-simulation checking for the three HLS tools under test.
%\item
-Finally, in~\S\ref{sec:method:reduce}, we discuss how we reduce problematic programs in order to obtain minimal examples of bugs.
+%Finally, in~\S\ref{sec:method:reduce}, we discuss how we reduce problematic programs in order to obtain minimal examples of bugs.
%\end{itemize}
% How we configure Csmith so that it only generates HLS-friendly programs.
@@ -21,7 +22,7 @@ Finally, in~\S\ref{sec:method:reduce}, we discuss how we reduce problematic prog
% \item How we run each HLS tool, using timeouts as appropriate.
% \end{itemize}
-\subsection{Generating programs via Csmith}
+\subsection{Generating test-cases}
\label{sec:method:csmith}
% \NR{
@@ -32,22 +33,26 @@ Finally, in~\S\ref{sec:method:reduce}, we discuss how we reduce problematic prog
% }
% \YH{I believe that we are now maybe using our own hashing, even though one of the bugs in Vivado was found in the Csmith hashing algorithm.. So I don't know how best to mention that.}
-For our testing campaign, we require a random program generator that produces C programs that are both semantically valid and feature-diverse; Csmith~\cite{yang11_findin_under_bugs_c_compil} meets both these criteria.
+%For our testing campaign, we require a random program generator that produces C programs that are both semantically valid and feature-diverse; Csmith~\cite{yang11_findin_under_bugs_c_compil} meets both these criteria.
%Csmith is randomised code generator of C programs for compiler testing, that has found more than 400 bugs in software compilers.
%Csmith provides several properties to ensure generation of valid C programs.
-Csmith is designed to ensure that all the programs it generates are syntactically valid (i.e. there are no syntax errors), semantically valid (for instance: all variable are defined before use), and free from undefined behaviour (undefined behaviour indicates a programmer error, which means that the compiler is free to produce any output it likes, which renders the program useless as a test-case). Csmith programs are also deterministic, which means that their output is fixed at compile-time; this property is valuable for compiler testing because it means that if two different compilers produce programs that produce different results, we can deduce that one of the compilers must be wrong.
+%Csmith is designed to ensure that all the programs it generates are syntactically valid (i.e. there are no syntax errors), semantically valid (for instance: all variable are defined before use), and free from undefined behaviour (undefined behaviour indicates a programmer error, which means that the compiler is free to produce any output it likes, which renders the program useless as a test-case). Csmith programs are also deterministic, which means that their output is fixed at compile-time; this property is valuable for compiler testing because it means that if two different compilers produce programs that produce different results, we can deduce that one of the compilers must be wrong.
%Validity is critical for us since these random programs are treated as our ground truth in our testing setup, as shown in Figure~\ref{fig:method:toolflow}.
-Additionally, Csmith allows users control over how it generates programs.
-For instance, the probabilities of choosing various C constructs can be tuned.
+%We use a modified version of Csmith to generate our test-cases.
+%Csmith is designed so that each C program is syntactically and semantically valid, and free from undefined behaviour.
+
+%Additionally, Csmith allows users control over how it generates programs.
+%For instance, the probabilities of choosing various C constructs can be tuned.
%By default, Csmith assigns a probability value for each of these features.
%Csmith also allows users to customise these values to tune program generation for their own purposes.
-This is vital for our work since we want to generate programs that are HLS-friendly.
+%This is vital for our work since we want to generate programs that are HLS-friendly.
%Configuring Csmith to generate HLS-friendly programs is mostly an exercise of reducing and restricting feature probabilities.
\begin{table}
\centering
+ \caption{Summary of changes to Csmith's probabilities and parameters.}\label{tab:properties}
\begin{tabular}{ll}
\toprule
\textbf{Property/Parameter} & \textbf{Change} \\
@@ -71,80 +76,83 @@ This is vital for our work since we want to generate programs that are HLS-frien
\code{-{}-max-expr-complexity} & 2 \\
\bottomrule
\end{tabular}
- \caption{Summary of important changes to Csmith's feature probabilities and parameters to generate HLS-friendly programs for our testing campaign.
- % \JW{Shouldn't `no-safe-math' be disabled, rather than enabled? I mean, we want safe-math enabled, right? So we should disable no-safe-math, no?}\YH{Yes, that shouldn't have been there.}
- % \NR{Discussion point: Do we enable unions or not? If not, why do have a union bug in Fig. 4? Reviewers may question this.}
- % \YH{I've now removed the union bug, it's too similar to another bug I think.}
- }\label{tab:properties}
\end{table}
-
+Csmith exposes several parameters through which the user can adjust how often various C constructs appear in the randomly generated programs. Table~\ref{tab:properties} describes how we configured these parameters.
%\paragraph{Significant probability changes}
-Table~\ref{tab:properties} lists the main changes that we put in place to ensure that HLS tools are able to synthesise all of our generated programs.
+%Table~\ref{tab:properties} lists the main changes that we put in place to ensure that HLS tools are able to synthesise all of our generated programs.
Our overarching aim is to make the programs tricky for the tools to handle correctly (in order to maximise our chances of exposing bugs), while keeping the synthesis and simulation times low (in order to maximise the rate at which tests can be run).
-To this end, we increase the probability of generating \code{if} statements in order to increase the number of control paths, but we reduce the probability of generating \code{for} loops and array operations since they generally increase run times but not hardware complexity.
-Relatedly, we reduce the probability of generating \code{break}, \code{goto}, \code{continue} and \code{return} statements, because with fewer \code{for} loops being generated, these statements tend to lead to uninteresting programs that simply exit prematurely.
+For instance, we increase the probability of generating \code{if} statements so as to increase the number of control paths, but we reduce the probability of generating \code{for} loops and array operations since they generally increase run times but not hardware complexity.
+We disable various features that are not supported by HLS tools, such as assignments inside expressions, pointers, and union types.
+We avoid floating-point numbers since they typically involve external libraries or use of hard IPs on FPGAs, which makes it hard to reduce bugs to a minimal form.
+%Relatedly, we reduce the probability of generating \code{break}, \code{goto}, \code{continue} and \code{return} statements, because with fewer \code{for} loops being generated, these statements tend to lead to uninteresting programs that simply exit prematurely.
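+As a purely illustrative sketch (hand-written here, not actual Csmith output), the kind of program this configuration aims for is control-heavy, deterministic, and free of pointers, unions and floating point:
+\begin{minted}{c}
+static unsigned int g = 5;
+int main() {
+  unsigned int acc = 0;
+  for (int i = 0; i < 4; i++) {
+    // Branch-heavy control flow; all arithmetic is
+    // unsigned, so there is no undefined behaviour.
+    if (g & 1u)
+      acc = (acc + g) & 0xFFFFu;
+    else
+      acc ^= (unsigned int)i;
+    g >>= 1;
+  }
+  // The return value is fixed at compile time, so it can
+  // be checked against the result of a GCC-compiled run.
+  return acc;
+}
+\end{minted}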
%\paragraph{Important restrictions}
-More importantly, we disable the generation of several language features to enable HLS testing.
+%More importantly, we disable the generation of several language features to enable HLS testing.
% \NR{Few sentences on different features whose probabilities were changed significantly.}
% Table~\ref{tab:properties} lists the important restrictions that we put in place to ensure that HLS tools are able to synthesise programs generated by Csmith.
-First, we ensure that all mathematical expressions are safe and unsigned, to ensure no undefined behaviour.
-We also disallow assignments being embedded within expressions, since HLS generally does not support them.
-We eliminate any floating-point numbers since they typically involve external libraries or use of hard IPs on FPGAs, which in turn make it hard to reduce bugs to their minimal form.
-We also disable the generation of pointers for HLS testing, since pointer support in HLS tools is either absent or immature~\cite{xilinx20_vivad_high_synth}.
+%First, we ensure that all mathematical expressions are safe and unsigned, to ensure no undefined behaviour.
+%We also disallow assignments being embedded within expressions, since HLS generally does not support them.
+%We eliminate any floating-point numbers since they typically involve external libraries or use of hard IPs on FPGAs, which in turn make it hard to reduce bugs to their minimal form.
+%We also disable the generation of pointers for HLS testing, since pointer support in HLS tools is either absent or immature~\cite{xilinx20_vivad_high_synth}.
%\YH{I've looked at the documentation and even pointer to pointer is supported, but maybe not pointer to pointer to pointer. I think there was some other pointer assignment that didn't quite work, but I don't remember now. Immature might be a good description though.}
%\YH{Figure \ref{fig:eval:vivado:mismatch} actually has void functions...} \JW{Hm, true. Perhaps these were introduced during reduction.}
-We disable the generation of unions as these were not supported by some of the tools such as LegUp 4.0.
+%We disable the generation of unions as these were not supported by some of the tools such as LegUp 4.0.
-To decide whether a problematic feature should be disabled or reported as a bug, the tool in question is taken into account. Unfortunately there is no standard subset of C that is supported by HLS tools; every tool chooses a slightly different subset. It is therefore important to choose the right subset in order to maximise the number of real bugs found in each tool, while avoiding generating code that the tool does not support. Therefore, we disable a feature if it fails gracefully (i.e. with an error message stating the issue) in one of the tools. If the HLS tool fails in a different way though, such as generating a wrong design or crashing during synthesis, the feature is kept in our test suite.
+%To decide whether a problematic feature should be disabled or reported as a bug, the tool in question is taken into account. Unfortunately there is no standard subset of C that is supported by HLS tools; every tool chooses a slightly different subset. It is therefore important to choose the right subset in order to maximise the number of real bugs found in each tool, while avoiding generating code that the tool does not support. Therefore, we disable a feature if it fails gracefully (i.e. with an error message stating the issue) in one of the tools. If the HLS tool fails in a different way though, such as generating a wrong design or crashing during synthesis, the feature is kept in our test suite.
%Use a volatile pointer for any pointer that is accessed multiple times within a single transaction (one execution of the C function). If you do not use a volatile pointer, everything except the first read and last write is optimized out to adhere to the C standard.
-We enforce that the main function of each generated program must not have any input arguments to allow for HLS synthesis.
-We disable structure packing within Csmith since the ``\code{\#pragma pack(1)}'' directive involved causes conflicts in HLS tools because it is interpreted as an unsupported pragma.
-We also disable bitwise AND and OR operations because when applied to constant operands, some versions of Vivado HLS errored out with `Wrong pragma usage.'
+%We enforce that the main function of each generated program must not have any input arguments to allow for HLS synthesis.
+%We disable structure packing within Csmith since the ``\code{\#pragma pack(1)}'' directive involved causes conflicts in HLS tools because it is interpreted as an unsupported pragma.
+%We also disable bitwise AND and OR operations because when applied to constant operands, some versions of Vivado HLS errored out with `Wrong pragma usage.'
%\NR{Did we report this problem to the developers? If so, can we claim that here?}\YH{We haven't I think, we only reported one bug.}
%\paragraph{Parameter settings}
-Finally, we tweak several integer parameters that influence program generation.
-We limit the maximum number of functions (five) and array dimensions (three) in our random C programs, in order to reduce the design complexity and size.
+%Finally, we tweak several integer parameters that influence program generation.
+%We limit the maximum number of functions (five) and array dimensions (three) in our random C programs, in order to reduce the design complexity and size.
%\JW{Is this bigger or smaller than the default? Either way: why did we make this change?}
-We also limit the depth of statements and expressions, to reduce the synthesis and simulation times.
+%We also limit the depth of statements and expressions, to reduce the synthesis and simulation times.
% \JW{Why? Presumably this is to do with keeping synthesis/simulation times as low as possible?}
%\NR{I remove Zewei's writing but please feel free to re-instate them}
-\subsection{Augmenting programs for HLS testing}
-\label{sec:method:annotate}
+%\subsection{Augmenting programs for HLS testing}
+%\label{sec:method:annotate}
-We augment the programs generated by Csmith to prepare them for HLS testing.
-We do this in two ways: program instrumentation and directive injection.
-This involves either modifying the C program or accompanying the C program with a configuration file, typically a \code{tcl} file.
-Finally, we must also generate a tool-specific build script per program, which instructs the HLS tool to create a design project and perform the necessary steps to build and simulate the design.
+To prepare the programs generated by Csmith for HLS testing, we modify them in two ways. First, we inject random HLS directives, which instruct the HLS tool to perform certain optimisations, including loop pipelining, loop unrolling, loop flattening, loop merging, expression balancing, function pipelining, function-level loop merging, function inlining, array mapping, array partitioning, and array reshaping. Some directives can be applied via a separate configuration file (typically a \code{tcl} file), some require us to add labels to the C program (e.g. to identify loops), and some require placing pragmas at particular locations in the C program.
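+
+As an illustration, a loop in a generated program might end up annotated as follows. This is a hypothetical sketch: the label, pragma, and \code{tcl} directive shown are examples of the kinds we inject, not literal output of our tool.
+\begin{minted}{c}
+// A labelled loop with an injected pipelining pragma. The label
+// `loop_1' also lets directives target the loop from a separate
+// .tcl file, e.g. set_directive_unroll -factor 2 "main/loop_1".
+loop_1: for (i = 0; i < 8; i++) {
+#pragma HLS pipeline
+  g_sum ^= g_arr[i];
+}
+\end{minted}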
-\paragraph{Instrumenting the original C program}
+The second modification concerns the top-level function. Each program generated by Csmith ends its execution by printing a hash of all its variables' values, in the hope that any miscompilation will be exposed through this hash value. We found that Csmith's built-in hash function led to infeasibly long synthesis times, so we replaced it with our own simple XOR-based one.
+
+Finally, we generate a synthesisable testbench that executes the main function of the original C program, and a tool-specific script that instructs the HLS tool to create a design project and then build and simulate the design.
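+
+As a rough sketch (the function names and the choice of hashed variables are hypothetical), the replacement hash and testbench take the following shape:
+\begin{minted}{c}
+static unsigned int hash = 0;
+static void xor_hash(unsigned int v) { hash ^= v; }
+
+// Top-level function handed to the HLS tool: it runs the original
+// Csmith main and folds the final program state into one value,
+// so that a miscompilation is likely to change the output.
+int result(void) {
+  main_original();
+  xor_hash(g_1);
+  xor_hash(g_2[0]);
+  return hash;
+}
+\end{minted}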
+
+
+%We do this in two ways: program instrumentation and directive injection.
+%This involves either modifying the C program or accompanying the C program with a configuration file, typically a \code{tcl} file.
+%Finally, we must also generate a tool-specific build script per program, which instructs the HLS tool to create a design project and perform the necessary steps to build and simulate the design.
+
+%\paragraph{Instrumenting the original C program}
%To ensure that the C programs can be successfully synthesised and simulated, we must adhere to a few rules that are common to HLS tools.
-We generate a synthesisable testbench that executes the main function of the original C program.
+%We generate a synthesisable testbench that executes the main function of the original C program.
%Csmith does not generate meaningful testbenches for HLS synthesis.
%So we invoke the original C program's main function from another top-level function.
-This top-level testbench contains a custom XOR-based hash function that takes hashes of the program state at several points during execution, combines all these hashes together, and then returns this value.
-By making the program's output sensitive to the program state in this way, we maximise the likelihood of detecting bugs when they occur.
-Csmith-generated programs do already include their own hashing function, but we replaced this with a simple XOR-based hashing function because we found that the Csmith one led to infeasibly long synthesis times.
+%This top-level testbench contains a custom XOR-based hash function that takes hashes of the program state at several points during execution, combines all these hashes together, and then returns this value.
+%By making the program's output sensitive to the program state in this way, we maximise the likelihood of detecting bugs when they occur.
+%Csmith-generated programs do already include their own hashing function, but we replaced this with a simple XOR-based hashing function because we found that the Csmith one led to infeasibly long synthesis times.
%We inject hashing on program states at several stages of the C program.
%By doing so, we keep track of program state, increasing the likelihood of encountering a bug.
%The final hash value is returned as a part of the main function to assist in determining the presence of a bug.
-\paragraph{Injecting HLS directives}
-Directives are used to instruct HLS tools to optimise the resultant hardware to meet specific performance, power and area targets.
-Typically, a HLS tool identifies these directives and subjects the C code to customised optimisation passes.
-In order to test the robustness of these parts of the HLS tools, we randomly generated directives for each C program generated by Csmith.
-Some directives can be applied via a separate configuration file, others require us to add labels in the C program (e.g. to identify loops), and a few directives require placing pragmas at particular locations in a C program.
-We generate three classes of directives: those for loops, those for functions, and those for variables.
-For loops, we randomly generate directives including loop pipelining (with rewinding and flushing), loop unrolling, loop flattening, loop merging, loop tripcount, loop inlining, and expression balancing.
+%\paragraph{Injecting HLS directives}
+%Directives are used to instruct HLS tools to optimise the resultant hardware to meet specific performance, power and area targets.
+%Typically, a HLS tool identifies these directives and subjects the C code to customised optimisation passes.
+%In order to test the robustness of these parts of the HLS tools, we randomly generated directives for each C program generated by Csmith.
+%Some directives can be applied via a separate configuration file, others require us to add labels in the C program (e.g. to identify loops), and a few directives require placing pragmas at particular locations in a C program.
+%We generate three classes of directives: those for loops, those for functions, and those for variables.
+%For loops, we randomly generate directives including loop pipelining (with rewinding and flushing), loop unrolling, loop flattening, loop merging, loop tripcount, loop inlining, and expression balancing.
%\NR{Not so clear about nested loops. Discuss.}\YH{I believe they are just handled like simple loops?}
-For functions, we randomly generate directives including function pipelining, function-level loop merging, function inlining, and expression balancing.
-For variables, we randomly generate directives including array mapping, array partitioning and array reshaping.
+%For functions, we randomly generate directives including function pipelining, function-level loop merging, function inlining, and expression balancing.
+%For variables, we randomly generate directives including array mapping, array partitioning and array reshaping.
%\NR{Would a table of directives help? }\YH{I think there might be too many if we go over all the tools.}
%\paragraph{Generating build script}
@@ -154,7 +162,7 @@ For variables, we randomly generate directives including array mapping, array pa
%\NR{Have I missed any LegUp directives? What is these lines referring to. Supported Makefile directives include partial loop unrolling with a threshold, disable inline, and disable all optimizations. Available Config TCL directives include partial loop pipeline, all loop pipeline, disable loop pipeline, resource-sharing loop pipeline, and accelerating functions. }\YH{I think that covers most of it.}
-\subsection{Testing various HLS tools}
+\subsection{Compiling the test-cases using the HLS tools}
\label{sec:method:testing}
% \NR{Some notes on key points from this section:
@@ -167,14 +175,14 @@ For variables, we randomly generate directives including array mapping, array pa
% }
% \NR{So Vivado HLS does co-simulation automatically, whereas we had to invoke them for other tools?}
-Having generated HLS-friendly programs and automatically augmented them with directives and meaningful testbenches, we are now ready to provide them to HLS tools for testing.
+%Having generated HLS-friendly programs and automatically augmented them with directives and meaningful testbenches, we are now ready to provide them to HLS tools for testing.
%Figure~\ref{fig:method:toolflow} shows the three stages of testing, depicted as the testing environment in the dashed area.
For each HLS tool in turn, we compile the C program to RTL and then simulate the RTL.
-Independently, we also compile the C program using GCC and execute it.
-\JW{I added the following sentence -- please check.} Although each HLS tool has its own built-in C compiler that could be used to obtain the reference output, we prefer to obtain the reference output ourselves in order to minimise our reliance on the tool being tested. \JW{Do we want to add something like `Indeed, we found at least one program where the output from GCC did not match the output from the HLS tool's built-in C compiler.'?}
+We also compile the C program using GCC and execute it.
+Although each HLS tool has its own built-in C compiler that could be used to obtain the reference output, we prefer to obtain it ourselves using GCC, in order to minimise our reliance on the tool under test.
-To ensure that our testing is scalable for a large number of large, random programs, we also enforce several time-outs: we set a 5-minute time-out for C execution and a 2-hour time-out for C-to-RTL synthesis and RTL simulation.
-We do not count time-outs as bugs, but we record them.
+To ensure that our testing scales to a large number of large programs, we also enforce several time-outs: we set a 5-minute time-out for C execution and a 2-hour time-out for C-to-RTL synthesis and RTL simulation.
+We do not count time-outs as bugs, but we do record them.
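+
+Putting these pieces together, a single testing iteration behaves roughly like the following sketch; the helper scripts and file names are hypothetical, and in practice this logic lives in our per-tool build scripts.
+\begin{minted}{c}
+#include <stdlib.h>
+
+int main(void) {
+  // Reference output: compile with GCC and execute (5-minute limit).
+  if (system("timeout 300 ./run_gcc.sh test.c gcc.out"))
+    return 1;  // execution failure or time-out (recorded, not a bug)
+  // Tool under test: C-to-RTL synthesis and RTL simulation (2-hour limit).
+  if (system("timeout 7200 ./run_hls.sh test.c hls.out"))
+    return 2;  // crash (a bug) or time-out (recorded, not a bug)
+  // Differing outputs indicate a candidate miscompilation bug.
+  return system("diff -q gcc.out hls.out") ? 3 : 0;
+}
+\end{minted}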
%% JW: This paragraph is not really needed because we repeat the sentiment in the next subsection anyway.
%There two types of bugs that we can encounter in this testing setup: programs that cause the HLS tool to crash during compilation (e.g. an unhandled assertion violation or a segmentation fault), and programs where the software execution and the RTL simulation do not return the same value. Programs that cause either type of error are given to the reduction stage, which aims to minimise the programs and (hopefully) identify the root cause(s).
@@ -191,30 +199,31 @@ We do not count time-outs as bugs, but we record them.
\subsection{Reducing buggy programs}\label{sec:method:reduce}
-Once we discover a program that crashes the HLS tool or whose C/RTL simulations do not match, we further scrutinise the program to identify the root cause(s) of the undesirable behaviour.
-As the programs generated by Csmith can be fairly large, we must systematically reduce these programs to identify the source of a bug.
+Once we discover a program that crashes the HLS tool or whose C/RTL simulations do not match, we systematically reduce it to its minimal form using the \creduce{} tool~\cite{creduce}, in the hope of identifying the root cause. This is done by successively removing or simplifying parts of the program, checking at each step that the bug is still triggered.
%\YH{Edit this section some more}
-Reduction is performed by iteratively removing some part of the original program and then providing the reduced program to the HLS tool for re-synthesis and co-simulation.
-The goal is to find the smallest program that still triggers the bug.
-We apply two consecutive methods of reduction in this work.
-The first step is to reduce the labels and pragmas that were added afterwards to make sure that these do not affect the behaviour of the program. These are reduced until there are no more declarations left or the bug does not get triggered anymore.
+%Reduction is performed by iteratively removing some part of the original program and then providing the reduced program to the HLS tool for re-synthesis and co-simulation.
+%The goal is to find the smallest program that still triggers the bug.
+%We apply two consecutive methods of reduction in this work.
+%The first step is to reduce the labels and pragmas that were added afterwards to make sure that these do not affect the behaviour of the program. These are reduced until there are no more declarations left or the bug does not get triggered anymore.
% \NR{We can add one or two more sentences summarising how we reduce the programs. Zewei is probably the best person to add these sentences.}\YH{Added some more lines, we can ask Zewei if she is OK with that.}
%Although, our custom reduction gives us the freedom and control of how to reduce buggy programs, it is arduous and requires a lot of manual effort.
-We then use the \creduce{} tool~\cite{creduce} to automatically reduce the remaining C program.
-\creduce{} is an existing reducer for C and C++ and runs the reduction steps in parallel to converge as quickly as possible. It is effective because it reduces the input while preserving semantic validity and avoiding undefined behaviour.
-It has various reduction strategies, such as delta debugging passes and function inlining, that help it converge rapidly to a test-case that is small enough to understand and step through.
+%We then use the \creduce{} tool~\cite{creduce} to automatically reduce the remaining C program.
+%\creduce{} is an existing reducer for C and C++ and runs the reduction steps in parallel to converge as quickly as possible. It is effective because it reduces the input while preserving semantic validity and avoiding undefined behaviour.
+%It has various reduction strategies, such as delta debugging passes and function inlining, that help it converge rapidly to a test-case that is small enough to understand and step through.
+
+We also check at each stage of the reduction process that the reduced program remains within the subset of the language that is supported by the HLS tools; without this check, \creduce{} would zero in on programs that fall outside this subset and hence do not represent real bugs.
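+
+Concretely, the `interestingness' check we give to \creduce{} behaves along the lines of the following sketch (simplified, with hypothetical script and file names):
+\begin{minted}{c}
+#include <stdlib.h>
+
+// Exit 0 (`interesting') only if the reduced test.c still triggers
+// the bug without leaving the HLS-supported subset of C.
+int main(void) {
+  // Rule out undefined behaviour introduced by the reduction.
+  if (system("gcc -fsanitize=undefined -fno-sanitize-recover "
+             "test.c -o ref && ./ref > gcc.out"))
+    return 1;
+  // Re-run the HLS tool; a warning about an unsupported construct
+  // (beyond a known-harmless list) makes the reduction uninteresting.
+  if (system("./run_hls.sh test.c hls.out hls.log"))
+    return 1;
+  if (system("grep -vf harmless.txt hls.log | grep -qi warning") == 0)
+    return 1;
+  // Interesting only if the output mismatch is still observed.
+  return system("diff -q gcc.out hls.out") ? 0 : 1;
+}
+\end{minted}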
-However, the downside of using \creduce{} with HLS tools is that we are not in control of which lines and features are prioritised for removal.
-As a consequence, we can easily end up with \creduce{} producing programs that are not synthesisable, despite being valid C.
+%However, the downside of using \creduce{} with HLS tools is that we are not in control of which lines and features are prioritised for removal.
+%As a consequence, we can easily end up with \creduce{} producing programs that are not synthesisable, despite being valid C.
%This is because the semantics of C for HLS tools might be a bit different to the semantics for C of a standard C compiler that \creduce{} was built for.
-Even though \creduce{} does not normally introduce undefined behaviour, it can introduce behaviour that is unsupported in the HLS tools.
-An example is the reduction of a function call, where the reducer realises that a mismatch is still observed when the function call's arguments are removed, and the function pointer is assigned a constant instead.
-This is however often unsupported in HLS tools, since a function pointer does not have a concrete interpretation in hardware, because in the absence of instructions, functions are not associated with a particular memory location.
-Once unhandled behaviour is introduced at any point in the reduction, the test-cases will often zero in on that unhandled behaviour, even though it does not actually represent an interesting bug.
-To prevent this, we use a script to guide \creduce{} away from introducing these unhandled behaviours as much as possible.
-This script involves adding \texttt{-fsanitize=undefined} to the GCC options in order to abort when undefined behaviour is detected at runtime, and erroring out whenever any warning is encountered while running the HLS tool (except common warnings that are known to be harmless).
+%Even though \creduce{} does not normally introduce undefined behaviour, it can introduce behaviour that is unsupported in the HLS tools.
+%An example is the reduction of a function call, where the reducer realises that a mismatch is still observed when the function call's arguments are removed, and the function pointer is assigned a constant instead.
+%This is however often unsupported in HLS tools, since a function pointer does not have a concrete interpretation in hardware, because in the absence of instructions, functions are not associated with a particular memory location.
+%Once unhandled behaviour is introduced at any point in the reduction, the test-cases will often zero in on that unhandled behaviour, even though it does not actually represent an interesting bug.
+%To prevent this, we use a script to guide \creduce{} away from introducing these unhandled behaviours as much as possible.
+%This script involves adding \texttt{-fsanitize=undefined} to the GCC options in order to abort when undefined behaviour is detected at runtime, and erroring out whenever any warning is encountered while running the HLS tool (except common warnings that are known to be harmless).
%%% Local Variables:
%%% mode: latex
diff --git a/old/old_legup_bug.tex b/old/old_legup_bug.tex
new file mode 100644
index 0000000..00a61da
--- /dev/null
+++ b/old/old_legup_bug.tex
@@ -0,0 +1,24 @@
+\begin{example}[A miscompilation bug in LegUp]
+
+The test-case in Figure~\ref{fig:eval:legup:wrong} produces incorrect Verilog when compiled with LegUp 4.0 and 7.5, meaning that the result of RTL simulation differs from that of the C execution.
+
+\begin{figure}
+\begin{minted}{c}
+volatile int a = 0;
+int b = 1;
+
+int main() {
+ int d = 1;
+ if (d + a)
+ b || 1;
+ else
+ b = 0;
+ return b;
+}
+\end{minted}
+\caption{An output mismatch: LegUp HLS returns 0 but the correct result is 1.}\label{fig:eval:legup:wrong}
+\end{figure}
+
+In the code above, \texttt{b} has value 1 when run with GCC, but has value 0 when run with LegUp. If the \texttt{volatile} keyword is removed from \texttt{a}, then the Verilog produces the correct result. Since \texttt{a} is never written and \texttt{d} is 1, the condition \texttt{d + a} always evaluates to 1, so the \code{if} statement should always take the \texttt{true} branch, meaning \texttt{b} should never be set to 0. The \texttt{true} branch of the \code{if} statement only evaluates an expression whose result is not assigned to any variable, so the initial state of all variables should not change. However, LegUp HLS generates a design that enters the \texttt{else} branch instead and assigns \texttt{b} the value 0. The cause of this bug seems to be the use of the \texttt{volatile} keyword, which interferes with the analysis that attempts to simplify the \code{if} statement.
+
+\end{example} \ No newline at end of file
diff --git a/related.tex b/related.tex
index 039737d..8887e52 100644
--- a/related.tex
+++ b/related.tex
@@ -5,19 +5,10 @@ The only other work of which we are aware on fuzzing HLS tools is that by Lidbur
Other stages of the FPGA toolchain have been subjected to fuzzing. Herklotz et al.~\cite{verismith} tested several FPGA synthesis tools using randomly generated Verilog programs. Where they concentrated on the RTL-to-netlist stage of hardware design, we focus our attention on the earlier C-to-RTL stage.
Several authors have taken steps toward more rigorously engineered HLS tools that may be more resilient to testing campaigns such as ours.
-\begin{itemize}
-
-\item The Handel-C compiler by Perna and Woodcock~\cite{perna12_mechan_wire_wise_verif_handel_c_synth} has been mechanically proven correct, at least in part, using the HOL theorem prover.
-%However, the Handel-C language relies upon the user to perform the parallelisation that is necessary for good hardware performance, and would require many more checks to avoid data races.
-However, the tool does not support C as input directly, so is not amenable to fuzzing.
-
-\item Ramanathan et al.~\cite{ramanathan+17} proved their implementation of C atomic operations in LegUp correct up to a bound using model checking. However, our testing campaign is not applicable to their implementation because we do not generate concurrent C programs.
-
-\item In the SPARK HLS tool~\cite{gupta03_spark}, some compiler passes, such as scheduling, are mechanically validated during compilation~\cite{chouksey20_verif_sched_condit_behav_high_level_synth}. Unfortunately, this tool is not yet readily available to test properly.
-
-\item Finally, the Catapult C HLS tool~\cite{mentor20_catap_high_level_synth} is designed only to produce an output netlist if it can mechanically prove it equivalent to the input program. It should therefore never produce wrong RTL. In future work, we intend to test Catapult C alongside Vivado HLS, LegUp, and Intel i++.
-
-\end{itemize}
+The Handel-C compiler by Perna and Woodcock~\cite{perna12_mechan_wire_wise_verif_handel_c_synth} has been mechanically proven correct, at least in part, using the HOL theorem prover; however, the tool does not support C as input directly, so it is not amenable to fuzzing.
+Ramanathan et al.~\cite{ramanathan+17} proved their implementation of C atomic operations in LegUp correct up to a bound using model checking; however, our testing campaign is not applicable to their implementation because we do not generate concurrent C programs.
+In the SPARK HLS tool~\cite{gupta03_spark}, some compiler passes, such as scheduling, are mechanically validated during compilation~\cite{chouksey20_verif_sched_condit_behav_high_level_synth}; unfortunately, this tool is no longer available.
+Finally, the Catapult C HLS tool~\cite{mentor20_catap_high_level_synth} is designed only to produce an output netlist if it can mechanically prove it equivalent to the input program; it should therefore never produce wrong RTL. In future work, we intend to test Catapult C alongside Vivado HLS, LegUp, and Intel i++.
%more prevalent these were prioritised.
% JW: We're not really sure that LegUp is more prevalent than Catapult HLS. Indeed, it probably isn't!
%\JW{Obvious reader question at this point: why not test that claim by giving our Csmith test cases to Catapult C too? Can we address that here? No worries if not; but shall we try and do that after the deadline anyway?}\YH{Yes, definitely, it would be great to get an idea of how Catapult C performs, and I currently have it installed already. I have added a small sentence for that now, but let me know if I should mention this in the conclusion instead though. }
diff --git a/tool-figure.tex b/tool-figure.tex
index efdaf95..a7e0131 100644
--- a/tool-figure.tex
+++ b/tool-figure.tex
@@ -1,22 +1,22 @@
-\begin{figure*}[t]
+\begin{figure}[t]
\centering
\tikzset{
file/.style={draw=none, fill=black!25, shape=circle, minimum width=5mm},
-block/.style={draw, minimum width=20mm, minimum height=8mm,align=center},
+block/.style={draw, minimum height=8mm,align=center},
}
\begin{tikzpicture}
- \node[block] (csmith) at (-2,1) {\strut Csmith};
- \node[block] (pragma) at (0,-1) {\strut pragma \\ generation};
- \node[block] (gcc) at (4,1) {\strut GCC};
- \node[block] (hls) at (4,-1) {\strut HLS};
- \node[block] (eq) at (8,1) {\strut co-simulation \\ checking };
- \node[block] (red) at (8,-1) {\strut reduction};
+ \node[block] (csmith) at (-1.3,1) {\strut Csmith};
+ \node[block] (pragma) at (-0.2,-1) {\strut pragma \\ generation};
+ \node[block] (gcc) at (2,1) {\strut GCC};
+ \node[block] (hls) at (3.2,-1) {\strut HLS tool \\ under test};
+ \node[block] (eq) at (5.5,1) {\strut co-simulation \\ checking };
+ \node[block] (red) at (5.8,-1) {\strut reduction};
\draw[-latex] (csmith.east) to [out=0, in=90] (pragma.north) ;
\draw[-latex] (csmith.east) to [auto] node {C program} (gcc.west);
- \draw[-latex] (pragma) to [auto] node[align=center] {C program $+$ \\ directives} (hls) ;
+ \draw[-latex] (pragma) to [auto] node[align=center] {C program \\ $+$ \\ directives} (hls) ;
\draw[-latex] (gcc) to [auto] node {executable} (eq);
- \draw[-latex] (hls) to [auto] node[inner sep=0] {Verilog program} (eq) ;
- \draw[-latex,dashed,red] (eq) to [auto] node [red] {fail} (red) ;
+ \draw[-latex] (hls) to [auto, pos=0.3] node[inner sep=0, align=center] {Verilog \\ program} (eq) ;
+ \draw[-latex,dashed,red] (eq) to [auto, pos=0.7] node [red] {fail} (red) ;
\draw[-latex,dashed,red] (hls) to [auto] node [red] {crash} (red) ;
% marking the test harness
%\node[] (label) at (3.7,1.75) {\strut Test harness} ;
@@ -27,4 +27,4 @@ block/.style={draw, minimum width=20mm, minimum height=8mm,align=center},
% \includegraphics{}
\caption{The overall flow of our approach to fuzzing HLS tools.}
\label{fig:method:toolflow}
-\end{figure*}
+\end{figure}