author    Yann Herklotz <ymh15@ic.ac.uk>  2020-09-14 15:05:15 +0000
committer overleaf <overleaf@localhost>   2020-09-14 15:47:05 +0000
commit    f09e782d0925bc735aadc29bf595d1e3cc187351 (patch)
tree      9c323f14437787f81274f7aba10be2289f5acc94
parent    11f9b46c8c5b3152435b3e5de43008e558f980dc (diff)
Update on Overleaf.
-rw-r--r--  eval_rewrite.tex   26
-rw-r--r--  intro.tex          76
-rw-r--r--  main.tex            2
-rw-r--r--  method-new.tex    145
-rw-r--r--  related.tex         8
-rw-r--r--  tool-figure.tex     2
6 files changed, 130 insertions, 129 deletions
diff --git a/eval_rewrite.tex b/eval_rewrite.tex
index bc90c89..2ad1c24 100644
--- a/eval_rewrite.tex
+++ b/eval_rewrite.tex
@@ -46,11 +46,16 @@ This section describes some of the bugs that were found in the various tools tha
The following piece of code produces an assertion error in LegUp even though it should compile without any errors, meaning an analysis pass in LegUp behaves incorrectly. An assertion error is equivalent to an unexpected crash of the tool, as it means the input drove the tool into an unexpected state. This shows that there is a bug in one of LegUp's compilation passes; however, thanks to the assertion, the bug is caught inside the tool before it can produce an incorrect design.
+\begin{figure}
\begin{minted}{c}
int a[2][2][1] = {{{0},{1}},{{0},{0}}};
int main() { a[0][1][0] = 1; }
\end{minted}
+\caption{An assertion bug in LegUp HLS.}
+\label{fig:eval:legup:assert}
+\end{figure}
+
The buggy test case relates to the initialisation of, and assignment to, a three-dimensional array, for which the above piece of code is the minimal example. It initialises the array with zeros, except for \texttt{a[0][1][0]}, which is set to one. Then the main function assigns one to that same location, which causes LegUp to crash with an assertion error.
@@ -68,6 +73,7 @@ The buggy test case has to do with initialisation and assignment to a three dime
The following test case produces an incorrect netlist in LegUp 4.0, meaning the result of simulating the design and running the C code directly is different.
+\begin{figure}
\begin{minted}{c}
volatile int a = 0;
int b = 1;
@@ -79,6 +85,9 @@ int main() {
return b;
}
\end{minted}
+\caption{An output mismatch where GCC returns \ref{XX} and LegUp HLS returns \ref{XX}.}
+\label{fig:eval:legup:wrong}
+\end{figure}
In the code above, \texttt{b} has value 1 when run in GCC, but has value 0 when run with LegUp 4.0. If the \texttt{volatile} keyword is removed from \texttt{a}, then the netlist contains the correct result. As \texttt{a} and \texttt{d} are constants, the if-statement should always go into the \texttt{true} branch, meaning \texttt{b} should never be set to 0.
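+Since the listing above is elided, the following purely illustrative sketch (not the actual reduced test case) shows the shape of the pattern just described:
+\begin{minted}{c}
+/* Hypothetical sketch only: a is constant despite being volatile,
+   so the condition is always true and b should never be cleared;
+   the text above reports that LegUp 4.0 nonetheless yields b == 0. */
+volatile int a = 0;
+int b = 1;
+int d = 1;
+int main() {
+  if (d && !a) b = 1;  /* always taken when executed with GCC */
+  else         b = 0;  /* should be unreachable */
+  return b;
+}
+\end{minted}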
@@ -86,8 +95,8 @@ In the code above, \texttt{b} has value 1 when run in GCC, but has value 0 when
The following code produces different results when compiled with Vivado HLS 2019.2 and with GCC.
+\begin{figure}
\begin{minted}{c}
-#include <stdio.h>
#include <stdint.h>
volatile uint32_t g_2 = 0;
@@ -97,20 +106,19 @@ uint32_t c = 0;
void d(uint8_t b) { c = (c & 4095) ^ a[(c ^ b) & 15]; }
void e(uint64_t f) {
- d(f); d(f >> 8); d(f >> 16);
- d(f >> 24); d(f >> 32); d(f >> 40);
- d(f >> 48);
+ d(f); d(f >> 8); d(f >> 16); d(f >> 24);
+ d(f >> 32); d(f >> 40); d(f >> 48);
}
int result() {
- int i = 0;
- for (; i < 56; i++) { a[i] = i; }
- e(g_2);
- e(-2L);
- printf("checksum = %X\n", c);
+ for (int i=0; i < 56; i++) a[i] = i;
+ e(g_2); e(-2L);
return c;
}
\end{minted}
+\caption{An output mismatch where GCC returns \ref{XX} and Vivado HLS returns \ref{XX}.}
+\label{fig:eval:vivado:mismatch}
+\end{figure}
\subsection{Bugs in Vivado HLS versions}
diff --git a/intro.tex b/intro.tex
index 99bffa3..da09f55 100644
--- a/intro.tex
+++ b/intro.tex
@@ -1,4 +1,8 @@
+
\section{Introduction}
+High-level synthesis (HLS), which refers to the automatic translation of software into hardware, is becoming an increasingly important part of the computing landscape.
+It promises to increase the productivity of hardware engineers by raising the abstraction level of their designs, and it promises software engineers the ability to produce application-specific hardware accelerators without having to understand hardware description languages (HDLs) such as Verilog and VHDL.
+It is even being used in high-assurance settings, such as financial services~\cite{hls_fintech}, control systems~\cite{hls_controller}, and real-time object detection~\cite{hls_objdetect}. As such, HLS tools are increasingly relied upon. In this paper, we investigate whether they are trustworthy.
\begin{figure}[t]
\centering
@@ -7,29 +11,23 @@ unsigned int b = 0x1194D7FF;
int a[6] = {1, 1, 1, 1, 1, 1};
int main() {
- int c;
- for (c = 0; c < 2; c++)
- b = b >> a[c];
+ for (int c = 0; c < 2; c++) b = b >> a[c];
return b;
}
\end{minted}
- \caption{Miscompilation bug found in Vivado 2018.3 and 2019.2 which returns \code{0x006535FF} instead of \code{0x046535FF} which is the correct result.}\label{fig:vivado_bug1}
+ \caption{Miscompilation bug found in Xilinx Vivado HLS 2018.3 and 2019.2. The program returns \code{0x006535FF} but the correct result is \code{0x046535FF}. \JW{Collapse lines 5-7 into a single line?}\YH{Yes I think it's good like this}}
+ \label{fig:vivado_bug1}
\end{figure}
-High-level synthesis (HLS), which refers to the automatic translation of software into hardware, is becoming an increasingly important part of the computing landscape.
-It promises to increase the productivity of hardware engineers by raising the abstraction level of their designs, and it promises software engineers the ability to produce application-specific hardware accelerators without having to understand hardware desciption languages (HDL) such as Verilog and VHDL.
-It is even being used in high-assurance settings, such as financial services~\cite{hls_fintech}, control systems~\cite{hls_controller}, and real-time object detection~\cite{hls_objdetect}. As such, HLS tools are increasingly relied upon. In this paper, we investigate whether they are trustworthy.
-
-To test the trustworthiness of HLS tools, we need a robust way of generating programs that both have good coverage and also explores various corner cases.
-Therein lies the difficulty in testing HLS tools.
-Human testing may not achieve both these objectives, as HLS tools are often require complex inputs to trigger wrong behaviour.
-In this paper, we employ program fuzzing on HLS tools.
-
-Fuzzing is an automated testing method that provides unexpected, counter-intuitive and random programs to compilers to test their robustness~\cite{fuzzing+chen+13+taming,fuzz+sun+16+toward,fuzzing+liang+18+survey,fuzzing+zhang+19,yang11_findin_under_bugs_c_compil,lidbury15_many_core_compil_fuzzin}.
-Program fuzzing has been used extensively in testing software compilers.
-For example, Yang \textit{et al.}~\cite{yang11_findin_under_bugs_c_compil} found more than 300 bugs in GCC and clang.
-Despite of the influence of fuzzing on software compilers, to the best of our knowledge, it has not been explored significantly within the HLS context.
-We specifically target HLS by restricting a fuzzer to generate programs within the subset of C supported by HLS.
+The approach we take in this paper is \emph{fuzzing}.
+%To test the trustworthiness of HLS tools, we need a robust way of generating programs that both have good coverage and also explores various corner cases.
+%Therein lies the difficulty in testing HLS tools.
+%Human testing may not achieve both these objectives, as HLS tools are often require complex inputs to trigger wrong behaviour.
+%In this paper, we employ program fuzzing on HLS tools.
+This is an automated testing method in which randomly generated programs are given to compilers to test their robustness~\cite{fuzzing+chen+13+taming,fuzz+sun+16+toward,fuzzing+liang+18+survey,fuzzing+zhang+19,yang11_findin_under_bugs_c_compil,lidbury15_many_core_compil_fuzzin}.
+The generated programs are typically large and rather complex, and they often combine language features in ways that are legal but counter-intuitive; hence they can be effective at exercising corner cases missed by human-designed test suites.
+Fuzzing has been used extensively to test conventional compilers; for example, Yang \textit{et al.}~\cite{yang11_findin_under_bugs_c_compil} used it to reveal more than three hundred bugs in GCC and Clang. In this paper, we bring fuzzing to the HLS context.
+%We specifically target HLS by restricting a fuzzer to generate programs within the subset of C supported by HLS.
% Most fuzzing tools randomly generate random C programs that are then provided to the compiler under test.
@@ -44,31 +42,31 @@ We specifically target HLS by restricting a fuzzer to generate programs within t
% Fuzzing enables us to overcome
-\paragraph{An example of a fuzzed buggy program}
-Figure~\ref{fig:vivado_bug1} shows a minimal example that produces the wrong result during RTL simulation in VivadoHLS, compared to GCC execution.
-In this example, we right shift a large integer value \code{b} by values of array elements, in array \code{a}, within iterations of a \code{for}-loop.
-VivadoHLS returns \code{0x006535FF} instead of \code{0x046535FF} as in GCC.
-The circumstances in which we found this bug shows the challenge of testing HLS tools.
+\paragraph{An example of a compiler bug found by fuzzing}
+Figure~\ref{fig:vivado_bug1} shows a program that produces the wrong result during RTL simulation in Xilinx Vivado HLS. The bug was initially revealed by a large, randomly generated program, which we reduced to the minimal example shown in the figure.
+The program repeatedly shifts a large integer value \code{b} right by the values stored in array \code{a}.
+Vivado HLS returns \code{0x006535FF}, but the result returned by GCC (and subsequently manually confirmed to be the correct one) is \code{0x046535FF}.
-For instance, the for-loop is necessary to ensure that a bug was detected.
-Also, the shift value needs to be accessed from an array.
-Replacing the array accesses within the loop with constants result in the bug not surfacing.
-Additionally, the array \code{a} needed to be at least six elements in size although the for-loop only has two iterations.
-% Any array smaller than that did not surface this bug.
-Finally, the value of \code{b} is an oracle that could not be changed without masking the bug.
-Producing such circumstances within C code for HLS testing is both arduous and counter-intuitive to human testers.
-In contrast, producing non-intuitive, complex but valid C programs is the cornerstone of fuzzing tools.
-Thus, it was natural to adopt program fuzzing for our HLS testing campaign.
+The circumstances in which we found this bug illustrate some of the challenges in testing HLS tools.
+For instance, without the for-loop, the bug goes away.
+Moreover, the bug only appears if the shift values are accessed from an array.
+And -- particularly curiously -- even though the for-loop only has two iterations, the array \code{a} must have at least six elements; if it has fewer than six, the bug disappears.
+Even the seemingly random value of \code{b} could not be changed without masking the bug.
+It seems unlikely that a manually generated test program would bring together all of the components necessary for exposing this bug.
+In contrast, producing counter-intuitive, complex but valid C programs is the cornerstone of fuzzing tools.
+For this reason, we found it natural to adopt fuzzing for our HLS testing campaign.
% \NR{Yann, please double check my claims about the bug. I hope I accurately described what we discussed. }\YH{Yes I agree with all that, I think that is a good description of it}
-\paragraph{Our contributions}
-In this paper, we conduct a widespread testing campaign by fuzzing HLS compilers.
-We do so in the following manner:
+\paragraph{Our contribution}
+This paper reports on our campaign to test HLS tools by fuzzing.
\begin{itemize}
- \item We utilise Csmith~\cite{yang11_findin_under_bugs_c_compil} to generate well-formed C programs from the subset of the C language supported by HLS tools;
- \item Then, we test these programs together with a random selection of HLS directives by comparing the gcc and HLS outputs, and we also keep track of programs that crash HLS tools;
- \item As part of our testing campaign, we generate 10 thousand test cases that we test against the three well-known HLS tools: Vivado HLS~\cite{xilinx20_vivad_high_synth}, LegUp HLS~\cite{canis13_legup} and Intel HLS~\cite{intel20_sdk_openc_applic};
- \item During our testing campaign, we found \ref{XX} bugs that we discuss and also report to the respective developers, where \ref{XX} bugs have been confirmed.
+ \item We use Csmith~\cite{yang11_findin_under_bugs_c_compil} to generate ten thousand valid C programs from within the subset of the C language that is supported by all the HLS tools we test. We augment each program with a random selection of HLS-specific directives.
+
+ \item We give these programs to three widely used HLS tools: Vivado HLS~\cite{xilinx20_vivad_high_synth}, LegUp HLS~\cite{canis13_legup} and Intel HLS~\cite{intel20_sdk_openc_applic}. When we find a program that causes an HLS tool to crash, or to generate hardware that produces a different result from GCC, we reduce it to a minimal example with the help of the C-Reduce tool~\cite{creduce}.
+
+ \item Our testing campaign revealed that all three tools could be made to crash while compiling or to generate wrong RTL. In total, we found \ref{XX} bugs across the three tools, all of which have been reported to the respective developers, and \ref{XX} of which have been confirmed at the time of writing.
+
+ \item To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (2018.3, 2019.1, and 2019.2). \JW{Put a sentence here summarising our findings from this experiment, once we have them.}
\end{itemize}
% we test, and then augment each program with randomly chosen HLS-specific directives. We synthesise each C program to RTL, and use a Verilog simulator to calculate its return value. If synthesis crashes, or if this return value differs from the return value obtained by executing a binary compiled from the C program by gcc, then we have found a candidate bug. We then use trial-and-error to reduce the C program to a minimal version that still triggers a bug.
diff --git a/main.tex b/main.tex
index 540471f..ace7062 100644
--- a/main.tex
+++ b/main.tex
@@ -115,7 +115,7 @@
High-level synthesis (HLS) is becoming an increasingly important part of the computing landscape, even in safety-critical domains where correctness is key.
As such, HLS tools are increasingly relied upon. In this paper, we investigate whether they are trustworthy.
-We have subjected three widely used HLS tools -- LegUp, Xilinx Vivado HLS, and the Intel HLS Compiler -- to a rigorous fuzzing campaign using thousands of random, valid C programs that we generated using a modified version of the Csmith tool. For each C program, we compiled it to a hardware design using the HLS tool under test and checked whether that hardware design generates the same output as an executable generated by the gcc compiler. When discrepancies arose between gcc and the HLS tool under test, we reduced the C program to a minimal example in order to zero in on the potential bug. Our testing campaign has revealed that all three HLS tools can be made either to crash or to generate wrong code when given valid C programs, and thereby underlines the need for these increasingly trusted tools to be more rigorously engineered.
+We have subjected three widely used HLS tools -- LegUp, Xilinx Vivado HLS, and the Intel HLS Compiler -- to a rigorous fuzzing campaign using thousands of random, valid C programs that we generated using a modified version of the Csmith tool. For each C program, we compiled it to a hardware design using the HLS tool under test and checked whether that hardware design generates the same output as an executable generated by the gcc \JW{We should decide on a consistent way to capitalise `gcc'. I vote for `GCC' because that's how Wikipedia writes it.} compiler. When discrepancies arose between gcc and the HLS tool under test, we reduced the C program to a minimal example in order to zero in on the potential bug. Our testing campaign has revealed that all three HLS tools can be made either to crash or to generate wrong code when given valid C programs, and thereby underlines the need for these increasingly trusted tools to be more rigorously engineered.
\end{abstract}
%%
diff --git a/method-new.tex b/method-new.tex
index 860aea6..e34d99a 100644
--- a/method-new.tex
+++ b/method-new.tex
@@ -2,16 +2,13 @@
% \NR{I think this section is very detailed. I think we can start with a figure of our tool-flow. Then, we can break down each item in the figure and discuss the necessary details (Fig.~\ref{fig:method:toolflow}).}
+This section describes how we conducted our testing campaign, the overall flow of which is shown in Figure~\ref{fig:method:toolflow}.
\input{tool-figure}
-
-In this section, we present the manner in which we perform our testing campaign.
-Figure~\ref{fig:method:toolflow} shows our overall testing flow.
-We present the following material in this section:
\begin{itemize}
-\item In~\S\ref{sec:method:csmith}, we introduce what CSmith provides us and how we configure CSmith to generate HLS-friendly random programs for our testing campaign.
-\item In~\S\ref{sec:method:annotate}, we discuss how we annotate random programs generated by CSmith with directives and the necessary configuration files for HLS compilation.
-\item In~\S\ref{sec:method:testing}, we discuss how we setup the HLS compilation and co-simulation checking for various HLS tools.
-\item Finally, in~\S\ref{sec:method:reduce}, we discuss how we reduce the CSmith-generated programs that either crash the HLS tool under test or produce a different result compared to their gcc executables.
+\item In~\S\ref{sec:method:csmith}, we describe how we configure Csmith to generate HLS-friendly random programs for our testing campaign.
+\item In~\S\ref{sec:method:annotate}, we discuss how we augment those random programs with directives and the necessary configuration files for HLS compilation.
+\item In~\S\ref{sec:method:testing}, we discuss how we set up compilation and co-simulation checking for the three HLS tools under test.
+\item Finally, in~\S\ref{sec:method:reduce}, we discuss how we reduce problematic programs in order to obtain minimal examples of bugs.
\end{itemize}
% How we configure Csmith so that it only generates HLS-friendly programs.
@@ -31,18 +28,18 @@ Questions:
}
\YH{I believe that we are now maybe using our own hashing, even though one of the bugs in Vivado was found in the Csmith hashing algorithm.. So I don't know how best to mention that.}
-For our testing campaign, we require a random generator that produces program that are both valid and diverse in features.
-Csmith fits both these criteria.
-Csmith is randomised code generator of C programs for compiler testing, that has found more than 400 bugs in software compilers~\cite{yang11_findin_under_bugs_c_compil}.
-Csmith provides several properties to ensure generation of valid C programs.
-Validity is critical for us since these random programs are treated as our ground truth in our testing setup, as shown in Figure~\ref{fig:method:toolflow}.
+For our testing campaign, we require a random program generator that produces C programs that are both semantically valid and feature-diverse; Csmith~\cite{yang11_findin_under_bugs_c_compil} meets both these criteria.
+%Csmith is randomised code generator of C programs for compiler testing, that has found more than 400 bugs in software compilers.
+%Csmith provides several properties to ensure generation of valid C programs.
+Csmith is designed to ensure that all the programs it generates are syntactically valid (i.e. there are no syntax errors), semantically valid (for instance, all variables are defined before use), and free from undefined behaviour (undefined behaviour indicates a programmer error, which means that the compiler is free to produce any output it likes). Csmith programs are also deterministic, which means that their output is fixed at compile time; this property is valuable for compiler testing because it means that if two different compilers produce executables that give different results, we can deduce that one of the compilers must be wrong.
+%Validity is critical for us since these random programs are treated as our ground truth in our testing setup, as shown in Figure~\ref{fig:method:toolflow}.
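+For illustration, undefined behaviour such as signed overflow is avoided by routing arithmetic through guarded wrapper functions; a minimal sketch of the idea follows (illustrative only; the wrappers Csmith actually emits are provided by its runtime header):
+\begin{minted}{c}
+#include <stdint.h>
+/* Illustrative Csmith-style safe-math wrapper: signed overflow is
+   undefined behaviour in C, so the operands are checked first and
+   a defined fallback value is returned instead of overflowing. */
+static int32_t safe_add_s32(int32_t a, int32_t b) {
+  if ((b > 0 && a > INT32_MAX - b) ||
+      (b < 0 && a < INT32_MIN - b))
+    return a;  /* defined fallback */
+  return a + b;
+}
+\end{minted}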
-Additionally, Csmith allows users to diversify the features generated in these programs.
-These features include choosing probabilities for various C constructs in addition to choosing when constructs can be generated.
-By default, Csmith assigns a probability value for each of these features.
-Csmith also allows users to customise these values to tune program generation for their own purposes.
-The ability to customise program generation is vital for our work since we want to generate programs that are HLS-friendly.
-Configuring Csmith to generate HLS-friendly programs is mostly an exercise of reducing and restricting feature probabilities.
+Additionally, Csmith gives users control over how it generates programs.
+For instance, the probabilities of choosing various C constructs can be tuned.
+%By default, Csmith assigns a probability value for each of these features.
+%Csmith also allows users to customise these values to tune program generation for their own purposes.
+This is vital for our work since we want to generate programs that are HLS-friendly.
+%Configuring Csmith to generate HLS-friendly programs is mostly an exercise of reducing and restricting feature probabilities.
\begin{table}
@@ -69,84 +66,82 @@ Configuring Csmith to generate HLS-friendly programs is mostly an exercise of re
\code{-{}-max-funcs} & 5 \\
\code{-{}-max-block-depth} & 2 \\
\code{-{}-max-array-dim} & 3 \\
- \code{-{}–max-expr-complexity} & 2 \\
+ \code{-{}-max-expr-complexity} & 2 \\
\bottomrule
\end{tabular}
\caption{Summary of important changes to Csmith's feature probabilities and parameters to generate HLS-friendly programs for our testing campaign.
+ \JW{Shouldn't `no-safe-math' be disabled, rather than enabled? I mean, we want safe-math enabled, right? So we should disable no-safe-math, no?}
% \NR{Discussion point: Do we enable unions or not? If not, why do have a union bug in Fig. 4? Reviewers may question this.}
% \YH{I've now removed the union bug, it's too similar to another bug I think.}
}\label{tab:properties}
\end{table}
-\paragraph{Significant probability changes}
-Table~\ref{tab:properties} lists the important changes that we put in place to ensure that HLS tools are able to synthesise programs generated by Csmith.
-We only increase the probability of generating \code{if} statements for HLS testing, but reduce the probability of generating \code{for} loops and array operations.
-Additionally, we reduce the probability of complex control paths by limiting the generation of \code{break}, \code{goto}, \code{continue} and \code{return} statements.
+%\paragraph{Significant probability changes}
+Table~\ref{tab:properties} lists the main changes that we put in place to ensure that HLS tools are able to synthesise all of our generated programs.
+We increase the probability of generating \code{if} statements for HLS testing \JW{Why?}, but reduce the probability of generating \code{for} loops and array operations. \JW{Why?}
+Additionally, we reduce the probability of complex control paths by limiting the generation of \code{break}, \code{goto}, \code{continue} and \code{return} statements. \JW{Why?}
-\paragraph{Important restrictions}
+%\paragraph{Important restrictions}
More importantly, we disable a number of features outright to enable HLS testing.
% \NR{Few sentences on different features whose probabilities were changed significantly.}
% Table~\ref{tab:properties} lists the important restrictions that we put in place to ensure that HLS tools are able to synthesise programs generated by Csmith.
-Firstly, we ensure all math expressions are safe and unsigned.
-We also disable assignments to be embedded within expressions.
-We eliminate any floating-point numbers as part of HLS testing, since they typically involve external libraries or use of hard IPs on FPGA.
-We also disable the generation of pointers for HLS testing, since pointer support in HLS tools are either absent or not mature.
-We also disable any void functions, since we are not supporting pointers.
-We disable generation of unions and unions within C structures as these weren't well supported by some of the tools such as LegUp 4.0.
-
-Several tool-specific restrictions are also put in place.
-We enforce the the main function of these programs must not have any input arguments, to allow effective HLS synthesis and simulation.
-We disable structure packing within Csmith since it generates the ``\code{\#pragma pack(1)}'' before structure definition.
-This pragma cause conflicts in HLS tools since it is interpreted as an unsupported pragma.
-We also disable generation of bit-wise binary AND and OR operations.
-When we applied these bit-wise operators applied on constants caused specific versions of Vivado HLS to error out as ``Wrong pragma usage''.
-\NR{Did we report this problem to the developers? If so, can we claim that here?}\YH{We haven't I think, we only reported one bug.}
-
-\paragraph{Parameter settings}
-Finally, we also control several integer parameters that influence program generation.
-We limit the maximum number of functions (five) and array dimensions (three) in our random C programs.
-We also limit the depth of statements and expressions of programs generated by Csmith.
-
-\NR{I remove Zewei's writing but please feel free to re-instate them}
-
-\subsection{Annotating programs for HLS testing}
+Firstly, we ensure all math expressions are safe and unsigned. \JW{Why?}
+We also prevent assignments from being embedded within expressions. \JW{Why?}
+We eliminate any floating-point numbers as part of HLS testing, since they typically involve external libraries or use of hard IPs on FPGA \JW{which is a problem because...}.
+We also disable the generation of pointers for HLS testing, since pointer support in HLS tools is either absent \JW{citation?} or immature \JW{citation?}.
+We also disable void functions, since we are not supporting pointers.
+We disable generation of unions and unions within C structures \JW{can we just say `unions' then? Why do we also need `unions within C structures'?} as these were not well supported by some of the tools, such as LegUp 4.0.
+\JW{Obvious reader question at this point: if a feature is badly-supported by some HLS tool(s), how do we decide between disabling it in Csmith vs. keeping it in and filing lots of bug reports? For instance, we say that we disable pointers because lots of HLS tools don't cope well with them, but we keep in `volatile' which also leads to problems. Why the different treatments?}
+We enforce that the main function of each generated program must not have any input arguments, to allow effective HLS synthesis and simulation. \JW{`effective' is rather vague. Can we put in a better justification here?}
+We disable structure packing within Csmith since the ``\code{\#pragma pack(1)}'' directive it emits is interpreted as an unsupported pragma and causes conflicts in HLS tools.
+We also disable generation of bitwise AND and OR operations because when applied to constant operands, some versions of Vivado HLS errored out with `Wrong pragma usage.'
+%\NR{Did we report this problem to the developers? If so, can we claim that here?}\YH{We haven't I think, we only reported one bug.}
+
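+To make these restrictions concrete, the snippet below collects examples of constructs that our configuration prevents Csmith from generating (illustrative code, not actual Csmith output):
+\begin{minted}{c}
+int x = 1, y;
+int *p = &x;                 /* pointers: HLS support weak or absent */
+float f = 0.5f;              /* floats: need external libraries/IPs  */
+#pragma pack(1)              /* interpreted as an unsupported pragma */
+struct s { char c; int i; };
+union u { int a; char b; };  /* unions: poorly supported (LegUp 4.0) */
+int main(void) {
+  if ((y = x) != 0)          /* assignment embedded in expression    */
+    return y & 3;            /* bitwise AND (problematic on constants
+                                in some Vivado HLS versions)         */
+  return 0;
+}
+\end{minted}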
+%\paragraph{Parameter settings}
+Finally, we control several integer parameters that influence program generation.
+We limit the maximum number of functions (five) and array dimensions (three) in our random C programs. \JW{Is this bigger or smaller than the default? Either way: why did we make this change?}
+We also limit the depth of statements and expressions. \JW{Why? Presumably this is to do with keeping synthesis/simulation times as low as possible?}
+
+%\NR{I remove Zewei's writing but please feel free to re-instate them}
+
+\subsection{Augmenting programs for HLS testing}
\label{sec:method:annotate}
-We annotate the programs generated by Csmith to prepare them for HLS testing.
-We perform annotations in two ways: program instruction and directive injection.
-These annotations involve either modifying the C program or accompanying the C program with a configuration file, which typically a \code{tcl} file.
-Finally, we must allow generate tool-specific build script to accompanying the annotated C program and its configuration file.
-
-\paragraph{Instrumenting original C program}
-In order to the C program generated by Csmith that are synthesisable and simulatable in HLS, we must adhere to a few rules that are common to HLS tools.
-Firstly, we must generate a synthesisable testbench that execute the main function of original C program generated by Csmith.
-Csmith does not generate meaningful testbenches for HLS synthesis.
-So we invoke the original C program's main function from another top-level function.
-Secondly, this top-level testbench is coupled with custom hash functions for program instrumentation.
-We inject XOR hashing on program states at several stages of the C program.
-By doing so, we keep track of program state, increasing the likelihood of encountering a bug.
-The final hash value is returned as a part of the main function to assist in determining the presence of a bug.
+We augment the programs generated by Csmith to prepare them for HLS testing.
+We do this in two ways: program instrumentation and directive injection.
+This involves either modifying the C program or accompanying the C program with a configuration file, typically a \code{tcl} file.
+Finally, we must also generate a tool-specific build script per program, which instructs the HLS tool to create a design project and perform the necessary steps to build and simulate the design.
+
+\paragraph{Instrumenting the original C program}
+%To ensure that the C programs can be successfully synthesised and simulated, we must adhere to a few rules that are common to HLS tools.
+We generate a synthesisable testbench that executes the main function of the original C program.
+%Csmith does not generate meaningful testbenches for HLS synthesis.
+%So we invoke the original C program's main function from another top-level function.
+This top-level testbench contains a custom XOR-based hash function that takes hashes of the program state at several points during execution, combines all these hashes together, and then returns this value.
+By making the program's output sensitive to the program state in this way, we maximise the likelihood of detecting bugs when they occur.
+Csmith-generated programs do already include their own hashing function, but we replaced this with a simple XOR-based hashing function because we found that the Csmith one led to infeasibly long synthesis times.
+%We inject hashing on program states at several stages of the C program.
+%By doing so, we keep track of program state, increasing the likelihood of encountering a bug.
+%The final hash value is returned as a part of the main function to assist in determining the presence of a bug.
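+A minimal sketch of such an instrumented program is shown below (all names are illustrative; the real instrumentation is generated automatically):
+\begin{minted}{c}
+#include <stdint.h>
+/* Illustrative XOR-based instrumentation: the checksum absorbs
+   snapshots of program state, and the testbench returns it, so any
+   state divergence between the GCC executable and the synthesised
+   design shows up in the final return value. */
+static uint32_t checksum = 0;
+static void hash(uint32_t x) { checksum = (checksum << 1) ^ x; }
+
+static int original_main(void) {  /* Csmith-generated computation */
+  int b = 1;
+  for (int i = 0; i < 4; i++) { b = b * 3 + i; hash((uint32_t)b); }
+  return b;
+}
+
+int main(void) {                  /* generated top-level testbench */
+  hash((uint32_t)original_main());
+  return (int)checksum;
+}
+\end{minted}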
\paragraph{Injecting HLS directives}
-Directives enable HLS tools to optimise the resultant hardware to meet specific performance, power and area targets.
-Typically, a HLS tool identifies these directives and subjects the C code to customised optimisation passes.
-Hence, it is essential for our testing campaign to attempts to target these optimisation passes for bug hunting.
-We randomly generated directives each C program generated by Csmith.
-Most directives can be applied in a separate configuration file.
-However, some directives require us to introduce program labels in the C program, and a few directives requires place pragmas at particular locations in a C program.
-We target directive generation at three level of optimisations: loops, functions and variables.
+Directives are used to instruct HLS tools to optimise the resultant hardware to meet specific performance, power and area targets.
+Typically, an HLS tool identifies these directives and subjects the C code to customised optimisation passes.
+In order to test the robustness of these parts of the HLS tools, we randomly generated directives for each C program generated by Csmith.
+Some directives can be applied via a separate configuration file, others require us to add labels in the C program (e.g. to identify loops), and a few directives require placing pragmas at particular locations in a C program.
+We generate three classes of directives: those for loops, those for functions, and those for variables.
For loops, we randomly generate directives including loop pipelining (with rewinding and flushing), loop unrolling, loop flattening, loop merging, loop tripcount, loop inlining, and expression balancing.
-\NR{Not so clear about nested loops. Discuss.}\YH{I believe they are just handled like simple loops?}
+%\NR{Not so clear about nested loops. Discuss.}\YH{I believe they are just handled like simple loops?}
For functions, we randomly generate directives including function pipelining, function-level loop merging, function inlining, and expression balancing.
For variables, we randomly generate directives including array mapping, array partitioning and array reshaping.
-\NR{Would a table of directives help? }\YH{I think there might be too many if we go over all the tools.}
+%\NR{Would a table of directives help? }\YH{I think there might be too many if we go over all the tools.}
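+For example, a loop-unrolling directive may require both a label in the C source and a pragma at a particular location; the sketch below uses Vivado HLS's pragma syntax and is illustrative only, not one of our generated cases:
+\begin{minted}{c}
+void step(int a[8]) {
+  /* The label lets directives target this loop; the pragma requests
+     partial unrolling by a factor of two (Vivado HLS syntax). */
+loop1: for (int i = 0; i < 8; i++) {
+#pragma HLS unroll factor=2
+    a[i] = a[i] + 1;
+  }
+}
+\end{minted}
+The same effect can alternatively be requested from the \code{tcl} configuration file (in Vivado HLS, via \code{set\_directive\_unroll}).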
-\paragraph{Generating build script}
-In addition to the annotated C program and configuration file with directives, we also must generate a tool-specific build script.
-This script instructs the tool to create a design project and perform the necessary steps to build and simulate the design.
+%\paragraph{Generating build script}
+%In addition to the annotated C program and configuration file with directives, we also generate a tool-specific build script.
+%This script instructs the HLS tool to create a design project and perform the necessary steps to build and simulate the design.
-\NR{Have I missed any LegUp directives? What is these lines referring to. Supported Makefile directives include partial loop unrolling with a threshold, disable inline, and disable all optimizations. Available Config TCL directives include partial loop pipeline, all loop pipeline, disable loop pipeline, resource-sharing loop pipeline, and accelerating functions. }\YH{I think that covers most of it.}
+%\NR{Have I missed any LegUp directives? What is these lines referring to. Supported Makefile directives include partial loop unrolling with a threshold, disable inline, and disable all optimizations. Available Config TCL directives include partial loop pipeline, all loop pipeline, disable loop pipeline, resource-sharing loop pipeline, and accelerating functions. }\YH{I think that covers most of it.}
\subsection{Testing various HLS tools}
diff --git a/related.tex b/related.tex
index 2296f33..466040b 100644
--- a/related.tex
+++ b/related.tex
@@ -1,10 +1,10 @@
-\section{Background}
+\section{Related Work}
-Lidbury et al. \cite{lidbury15_many_core_compil_fuzzin} fuzz-tested several OpenCL compilers, including an HLS compiler from Altera (now Intel). They were only able to subject that compiler to superficial testing because so many of the test cases they generated led to it crashing. In comparison to our work: where Lidbury et al. generated target-independent OpenCL programs that could be used to test HLS tools and conventional compilers alike, we specifically generate programs that are tailored for HLS (e.g. with HLS-specific pragmas) with the aim of testing the HLS tools more deeply. Another difference is that where we test using sequential C programs, they test using highly concurrent OpenCL programs, and thus have to go to great lengths to ensure that any discrepancies observed between compilers cannot be attributed to the nondeterminism of concurrency.
+The only other work of which we are aware on fuzzing HLS tools is that by Lidbury et al. \cite{lidbury15_many_core_compil_fuzzin}, who tested several OpenCL compilers, including an HLS compiler from Altera (now Intel). They were only able to subject that compiler to superficial testing because so many of the test cases they generated led to it crashing. In comparison to our work: where Lidbury et al. generated target-independent OpenCL programs that could be used to test HLS tools and conventional compilers alike, we specifically generate programs that are tailored for HLS (e.g. with HLS-specific pragmas) with the aim of testing the HLS tools more deeply. Another difference is that where we test using sequential C programs, they test using highly concurrent OpenCL programs, and thus have to go to great lengths to ensure that any discrepancies observed between compilers cannot be attributed to the inherent nondeterminism of concurrency.
-Herklotz et al.~\cite{verismith} fuzz-tested several FPGA synthesis tools using randomly generated Verilog programs. Where they concentrated on the RTL-to-netlist stage of hardware design, we focus our attention on the earlier C-to-RTL stage.
+Other stages of the FPGA toolchain have been subjected to fuzzing. Herklotz et al.~\cite{verismith} tested several FPGA synthesis tools using randomly generated Verilog programs. Where they concentrated on the RTL-to-netlist stage of hardware design, we focus our attention on the earlier C-to-RTL stage.
-Several authors have taken steps toward more rigorously engineered HLS tools that may fare better under testing campaigns like ours. One such example might be a Handel-C compiler which has been partially formally proven in a theorem prover called HOL~\cite{perna12_mechan_wire_wise_verif_handel_c_synth}. However, Handel-C requires the manual parallelisation of hardware and would require many more checks to avoid data races. In addition to that, a HLS tool called SPARK~\cite{gupta03_spark} also contains passes that are validated during compilation time~\cite{chouksey20_verif_sched_condit_behav_high_level_synth}, such as the scheduling algorithm. Unfortunately the tools are not readily available yet to properly test, or do not support C as input directly. Finally Catapult C~\cite{mentor20_catap_high_level_synth} is also a high-level synthesis tool that proves that the netlist is equivalent to the input, and should therefore not produce any bugs and only fail if the output cannot be proven.
+Several authors have taken steps toward more rigorously engineered HLS tools that may be more resilient to testing campaigns such as ours. One such example might be a Handel-C compiler which has been mechanically proven correct, at least in part, using the HOL theorem prover~\cite{perna12_mechan_wire_wise_verif_handel_c_synth}. However, Handel-C relies upon the user to perform the parallelisation that is necessary for good hardware performance, and would require many more checks to avoid data races. Another example is an HLS tool called SPARK~\cite{gupta03_spark}, which contains passes that are mechanically validated during compilation~\cite{chouksey20_verif_sched_condit_behav_high_level_synth}, such as the scheduling algorithm. Unfortunately, these tools are either not yet readily available for proper testing, or do not directly accept C as input. Finally, the HLS tool Catapult C~\cite{mentor20_catap_high_level_synth} is designed only to produce an output netlist if it can mechanically prove it equivalent to the input program. It should therefore never produce wrong RTL. \JW{Obvious reader question at this point: why not test that claim by giving our Csmith test cases to Catapult C too? Can we address that here? No worries if not; but shall we try and do that after the deadline anyway?}
%%% Local Variables:
%%% mode: latex
diff --git a/tool-figure.tex b/tool-figure.tex
index fe38dd3..f4226d6 100644
--- a/tool-figure.tex
+++ b/tool-figure.tex
@@ -1,4 +1,4 @@
-\begin{figure*}[th]
+\begin{figure*}[t]
\centering
\tikzset{
file/.style={draw=none, fill=black!25, shape=circle, minimum width=5mm},