author     John Wickerson <j.wickerson@imperial.ac.uk>  2020-09-15 09:46:54 +0000
committer  overleaf <overleaf@localhost>  2020-09-15 09:48:42 +0000
commit     05e03565de7b4a5319c9d4c15f327317229f41b5
tree       d62154f53486814303f967866a32f9e78fe47402
parent     4daefd9bed7dea9500b3cc626266fd979fb2edcc
download   fccm21_esrhls-05e03565de7b4a5319c9d4c15f327317229f41b5.tar.gz
           fccm21_esrhls-05e03565de7b4a5319c9d4c15f327317229f41b5.zip
Update on Overleaf.
-rw-r--r--  eval.tex    4
-rw-r--r--  intro.tex   2
-rw-r--r--  main.tex    2
-rw-r--r--  method.tex  6
4 files changed, 7 insertions, 7 deletions
diff --git a/eval.tex b/eval.tex
index c3c66fc..bf8dc5d 100644
--- a/eval.tex
+++ b/eval.tex
@@ -14,7 +14,7 @@
\draw[white] (-10.2,4.4) ellipse (3.75 and 2.75); % outlines
\draw[white] (-7.3,2) ellipse (3.75 and 2.75); % fully opaque
\node[align=center] at (-10.2,6.3) {\Large\textsf{\textbf{Xilinx Vivado HLS}} \\ \Large\textsf{\textbf{2019.1}}};
- \node at (-4.4,6.3) {\Large\textsf{\textbf{Intel i++}}};
+ \node at (-4.4,6.3) {\Large\textsf{\textbf{Intel i++ 18.1}}};
\node at (-7.3,0) {\Large\textsf{\textbf{LegUp 4.0}}};
\node at (-5.5,3) {\Huge 1 (\textcolor{red}{1})};
@@ -50,7 +50,7 @@ We were able to test three different versions of Vivado HLS (v2018.3, v2019.1 an
We were only able to test one version of LegUp: 4.0.
At the time of writing, LegUp 7.5 is still GUI-based and therefore we could not script our tests.
However, we were able to manually reproduce bugs found in LegUp 4.0 in LegUp 7.5.
-Finally, we tested one version of Intel i++ (v\ref{XXXX.X}).
+Finally, we tested one version of Intel i++: 18.1.
% Three different tools were tested, including three different versions of Vivado HLS. We were only able to test one version of LegUp HLS (version 4.0), because although LegUp 7.5 is available, it is GUI-based and not amenable to scripting. However, bugs we found in LegUp 4.0 were reproduced manually in LegUp 7.5.
% LegUp and Vivado HLS were run under Linux, while the Intel HLS Compiler was run under Windows.
diff --git a/intro.tex b/intro.tex
index 2a48d4b..44d0b1c 100644
--- a/intro.tex
+++ b/intro.tex
@@ -64,7 +64,7 @@ This paper reports on our campaign to test HLS tools by fuzzing.
\item We give these programs to three widely used HLS tools: Xilinx Vivado HLS~\cite{xilinx20_vivad_high_synth}, LegUp HLS~\cite{canis13_legup} and the Intel HLS Compiler, which is also known as i++~\cite{intel20_sdk_openc_applic}. When we find a program that causes an HLS tool to crash, or to generate hardware that produces a different result from GCC, we reduce it to a minimal example with the help of the \creduce{} tool~\cite{creduce}.
- \item Our testing campaign revealed that all three tools could be made to crash while compiling or to generate wrong RTL. In total, we found \ref{XX} bugs across the three tools, all of which have been reported to the respective developers, and \ref{XX} of which have been confirmed at the time of writing.
+ \item Our testing campaign revealed that all three tools could be made to crash while compiling or to generate wrong RTL. In total, we found 6 bugs across the three tools.
    \item To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (2018.3, 2019.1, and 2019.2). We found that in general there were about half as many failures in versions 2019.1 and 2019.2 compared to 2018.3. However, there were also test-cases that only failed in versions 2019.1 and 2019.2, meaning bugs were probably introduced due to the addition of new features.
\end{itemize}
diff --git a/main.tex b/main.tex
index 4ec8a73..2366220 100644
--- a/main.tex
+++ b/main.tex
@@ -152,7 +152,7 @@ We have subjected three widely used HLS tools -- LegUp, Xilinx Vivado HLS, and t
%%
%% Keywords. The author(s) should pick words that accurately describe
%% the work being presented. Separate the keywords with commas.
-\keywords{fuzzing}
+\keywords{compilers, fuzzing, hardware design, reliability, testing}
%%
%% This command processes the author and affiliation and title
diff --git a/method.tex b/method.tex
index b9356ec..701a3e3 100644
--- a/method.tex
+++ b/method.tex
@@ -31,7 +31,7 @@ This section describes how we conducted our testing campaign, the overall flow o
For our testing campaign, we require a random program generator that produces C programs that are both semantically valid and feature-diverse; Csmith~\cite{yang11_findin_under_bugs_c_compil} meets both these criteria.
%Csmith is randomised code generator of C programs for compiler testing, that has found more than 400 bugs in software compilers.
%Csmith provides several properties to ensure generation of valid C programs.
-Csmith is designed to ensure that all the programs it generates are syntactically valid (i.e. there are no syntax errors), semantically valid (for instance: all variable are defined before use), and free from undefined behaviour (undefined behaviour indicates a programmer error, which means that the compiler is free to produce any output it likes). Csmith programs are also deterministic, which means that their output is fixed at compile-time; this property is valuable for compiler testing because it means that if two different compilers produce programs that produce different results, we can deduce that one of the compilers must be wrong.
+Csmith is designed to ensure that all the programs it generates are syntactically valid (i.e. there are no syntax errors), semantically valid (for instance: all variables are defined before use), and free from undefined behaviour (undefined behaviour indicates a programmer error and leaves the compiler free to produce any output it likes, which would render the program useless as a test-case). Csmith programs are also deterministic, which means that their output is fixed at compile-time; this property is valuable for compiler testing because if two different compilers produce programs that give different results, we can deduce that one of the compilers must be wrong.
%Validity is critical for us since these random programs are treated as our ground truth in our testing setup, as shown in Figure~\ref{fig:method:toolflow}.
Additionally, Csmith allows users control over how it generates programs.
@@ -195,11 +195,11 @@ As the programs generated by Csmith can be fairly large, we must systematically
Reduction is performed by iteratively removing some part of the original program and then providing the reduced program to the HLS tool for re-synthesis and co-simulation.
The goal is to find the smallest program that still triggers the bug.
We apply two consecutive methods of reduction in this work.
-We first perform a custom reduction in which we iteratively remove the HLS directives that we added before synthesis of the C program.
+The first step is to reduce the labels and pragmas that we added to the generated program, to make sure that they do not affect the behaviour of the program. These are removed iteratively until none are left or the bug is no longer triggered.
% \NR{We can add one or two more sentences summarising how we reduce the programs. Zewei is probably the best person to add these sentences.}\YH{Added some more lines, we can ask Zewei if she is OK with that.}
%Although, our custom reduction gives us the freedom and control of how to reduce buggy programs, it is arduous and requires a lot of manual effort.
We then use the \creduce{} tool~\cite{creduce} to automatically reduce the remaining C program.
-\creduce{} is effective because it reduces the input while preserving semantic validity and avoiding undefined behaviour.
+\creduce{} is an existing reducer for C and C++ programs that runs its reduction steps in parallel to converge as quickly as possible. It is effective because it reduces the input while preserving semantic validity and avoiding undefined behaviour.
It has various reduction strategies, such as delta debugging passes and function inlining, that help it converge rapidly to a test-case that is small enough to understand and step through.
However, the downside of using \creduce{} with HLS tools is that we are not in control of which lines and features are prioritised for removal.
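The first reduction step described in the method.tex hunk (removing the added labels and pragmas while the bug still reproduces) can be illustrated with a minimal greedy loop. The C sketch below is an illustration under stated assumptions, not the authors' implementation: the program is modelled as an array of source lines plus a keep-mask, only lines starting with `#pragma HLS` are treated as removable directives (labels are ignored), and `bug_oracle`, `reduce_directives` and `demo_oracle` are hypothetical names. In the real flow the oracle would re-run synthesis and co-simulation; here `demo_oracle` simply reports the bug as present while any kept line contains `unroll`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical oracle: returns true if the program formed by the kept
   lines still triggers the bug. In a real flow this would re-run HLS
   synthesis and co-simulation; here it is just a callback. */
typedef bool (*bug_oracle)(const char *const *lines, const bool *keep,
                           size_t n, void *ctx);

/* Assumption: a removable directive is any line starting with
   "#pragma HLS". Labels would need a similar test. */
static bool is_directive(const char *line) {
    return strncmp(line, "#pragma HLS", 11) == 0;
}

/* Greedily try dropping each directive in turn; keep a removal only if
   the bug still reproduces, otherwise restore the line. Repeat until a
   full pass removes nothing. Returns the number of directives removed. */
size_t reduce_directives(const char *const *lines, bool *keep, size_t n,
                         bug_oracle oracle, void *ctx) {
    assert(lines != NULL && keep != NULL && oracle != NULL);
    size_t removed = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (size_t i = 0; i < n; i++) {
            if (!keep[i] || !is_directive(lines[i]))
                continue;
            keep[i] = false;                 /* tentatively remove */
            if (oracle(lines, keep, n, ctx)) {
                removed++;                   /* bug persists: keep removal */
                progress = true;
            } else {
                keep[i] = true;              /* bug vanished: restore */
            }
        }
    }
    return removed;
}

/* Hypothetical demo oracle: the "bug" is present while any kept line
   contains "unroll". */
static bool demo_oracle(const char *const *lines, const bool *keep,
                        size_t n, void *ctx) {
    (void)ctx;
    for (size_t i = 0; i < n; i++)
        if (keep[i] && strstr(lines[i], "unroll") != NULL)
            return true;
    return false;
}
```

A removal is kept only when the oracle still reports the bug, and the outer loop repeats until a full pass removes nothing, so no single remaining directive can be dropped without losing the bug. The C-Reduce stage would then operate on whatever program this step leaves behind.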