summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorYann Herklotz <ymh15@ic.ac.uk>2021-04-04 21:03:01 +0000
committeroverleaf <overleaf@localhost>2021-04-05 12:44:36 +0000
commit6a7ebeb13cd708e5b9e2a09400255ccbd3f0242a (patch)
tree33565de568589a583c91e462c734da2e7e9d07bd
parent62a127dfb009b8ffe94ac348ecafb7f596406cbd (diff)
downloadfccm21_esrhls-6a7ebeb13cd708e5b9e2a09400255ccbd3f0242a.tar.gz
fccm21_esrhls-6a7ebeb13cd708e5b9e2a09400255ccbd3f0242a.zip
Update on Overleaf.
-rw-r--r--eval.tex4
-rw-r--r--main.tex2
-rw-r--r--method.tex2
3 files changed, 4 insertions, 4 deletions
diff --git a/eval.tex b/eval.tex
index 88ab6a1..a1b9b28 100644
--- a/eval.tex
+++ b/eval.tex
@@ -66,7 +66,7 @@ Figures~\ref{fig:eval:legup:crash}, \ref{fig:eval:intel:mismatch}, and~\ref{fig:
int a[2][2][1] = {{{0},{1}},{{0},{0}}};
int main() { a[0][1][0] = 1; }
\end{minted}
-\caption{This program leads to an internal compiler error (an unhandled assertion in this case) in LegUp 4.0. It initialises a 3D array with zeroes and then assigns to one element. The bug only appears when function inlining is disabled (\texttt{NO\_INLINE}).}
+\caption{This program leads to an internal compiler error (an unhandled assertion in this case) in LegUp 4.0. It initialises a 3D array with zeroes and then assigns one element. The bug only appears when function inlining is disabled (\texttt{NO\_INLINE}), thus confirming the effectiveness of generating random directives.}
\label{fig:eval:legup:crash}
\end{figure}
%An assertion error counts as a crash of the tool, as it means that an unexpected state was reached by this input.
@@ -224,7 +224,7 @@ For instance, the topmost ribbon represents the 31 test-cases that fail in all t
Interestingly, the blue ribbon shows that there are test-cases that fail in v2018.3, pass in v2019.1, and then fail again in v2019.2!
As in our Euler diagram, the numbers do not necessary correspond to the number of actual bugs, though we can observe that there must be at least six unique bugs in Vivado HLS, given that each ribbon corresponds to at least one unique bug.
-%\AD{This reminds me of the correcting commits metric from Junjie Chen et al.'s empirical study on compiler testing. Could be worth making the connection. }
+%\AD{This reminds me of the correcting commits metric from Junjie Chen et al.'s empirical study on compiler testing. https://xiongyingfei.github.io/papers/ICSE16.pdf. Could be worth making the connection. }
%\YH{Contradicts value of 3 in Table~\ref{tab:unique_bugs}, maybe I can change that to 6?} \JW{I'd leave it as-is personally; we have already put a `$\ge$' symbol in the table, so I think it's fine.}
%In addition to that, it can then be seen that Vivado HLS v2018.3 must have at least 4 individual bugs, of which two were fixed and two others stayed in Vivado HLS v2019.1. However, with the release of v2019.1, new bugs were introduced as well. % Finally, for version 2019.2 of Vivado HLS, there seems to be a bug that was reintroduced which was also present in Vivado 2018.3, in addition to a new bug. In general it seems like each release of Vivado HLS will have new bugs present, however, will also contain many previous bug fixes. However, it cannot be guaranteed that a bug that was previously fixed will remain fixed in future versions as well.
diff --git a/main.tex b/main.tex
index 5671dc9..f5bc5c5 100644
--- a/main.tex
+++ b/main.tex
@@ -59,7 +59,7 @@ Email: \{yann.herklotz15, zewei.du19, n.ramanathan14, j.wickerson\}@imperial.ac.
High-level synthesis (HLS) is becoming an increasingly important part of the computing landscape, even in safety-critical domains where correctness is key.
As such, HLS tools are increasingly relied upon. But are they trustworthy?
-We have subjected four widely used HLS tools -- LegUp, Xilinx Vivado HLS, the Intel HLS Compiler and Bambu -- to a rigorous fuzzing campaign using thousands of random, valid C programs that we generated using a modified version of the Csmith tool. For each C program, we compiled it to a hardware design using the HLS tool under test and checked whether that hardware design generates the same output as an executable generated by the GCC compiler. When discrepancies arose between GCC and the HLS tool under test, we reduced the C program to a minimal example in order to zero in on the potential bug. Our testing campaign has revealed that all four HLS tools can be made to generate wrong designs and one tool could be made to crash when given valid C programs, which thereby underlines the need for these increasingly trusted tools to be more rigorously engineered.
+We have subjected four widely used HLS tools -- LegUp, Xilinx Vivado HLS, the Intel HLS Compiler and Bambu -- to a rigorous fuzzing campaign using thousands of random, valid C programs that we generated using a modified version of the Csmith tool. For each C program, we compiled it to a hardware design using the HLS tool under test and checked whether that hardware design generates the same output as an executable generated by the GCC compiler. When discrepancies arose between GCC and the HLS tool under test, we reduced the C program to a minimal example in order to zero in on the potential bug. Our testing campaign has revealed that all four HLS tools can be made to generate wrong designs from valid C programs and one tool could be made to crash; this underlines the need for these increasingly trusted tools to be more rigorously engineered.
Out of \totaltestcases{} test-cases, we found \totaltestcasefailures{} programs that caused at least one tool to fail, out of which we were able to discern at least \numuniquebugs{} unique bugs.
\end{abstract}
diff --git a/method.tex b/method.tex
index c377c3e..d5fb319 100644
--- a/method.tex
+++ b/method.tex
@@ -122,7 +122,7 @@ We avoid floating-point numbers since these often involve external libraries or
To prepare the programs generated by Csmith for HLS testing, we modify them in two ways. First, we inject random HLS directives, which instruct the HLS tool to perform certain optimisations, including: loop pipelining, loop unrolling, loop flattening, loop merging, expression balancing, function pipelining, function-level loop merging, function inlining, array mapping, array partitioning, and array reshaping. Some directives can be applied via a separate configuration file (.tcl), some require us to add labels to the C program (e.g. to identify loops), and some require placing pragmas at particular locations in the C program.
-%\AD{Did any reduced test-case involve these HLS-specific features?} \JW{The LegUp bug in Figure 4 requires NO\_INLINE -- does that count? If so, perhaps we could append to the Figure 4 caption: `thus vindicating our strategy of adding random HLS directives to our test-cases'.}
+%rstrst\AD{Did any reduced test-case involve these HLS-specific features?} \JW{The LegUp bug in Figure 4 requires NO\_INLINE -- does that count? If so, perhaps we could append to the Figure 4 caption: `thus vindicating our strategy of adding random HLS directives to our test-cases'.}
The second modification has to do with the top-level function. Each program generated by Csmith ends its execution by printing a hash of all its variables' values, hoping that miscompilations will be exposed through this hash value. Csmith's built-in hash function leads to infeasibly long synthesis times, so we replace it with a simple XOR-based one.