author    Yann Herklotz <git@yannherklotz.com>  2021-03-30 18:48:23 +0100
committer Yann Herklotz <git@yannherklotz.com>  2021-03-30 18:48:23 +0100
commit    0f40e13fab830957ac055e076055280cdb82efff (patch)
tree      45853a4552e3535f5ae8993a12d16537f984d23e
parent    4eb24224015629936727e37f60738ac412578f50 (diff)
Fix use of test-case
-rw-r--r--  eval.tex     2
-rw-r--r--  intro.tex    4
-rw-r--r--  main.tex     2
-rw-r--r--  method.tex   2
-rw-r--r--  related.tex  4
5 files changed, 7 insertions, 7 deletions
diff --git a/eval.tex b/eval.tex
index 4ed8eac..f1543e4 100644
--- a/eval.tex
+++ b/eval.tex
@@ -74,7 +74,7 @@ int main() { a[0][1][0] = 1; }
%This shows that there is a bug in one of the compilation passes in LegUp; however, due to the assertion, the bug is caught in the tool before it produces an incorrect design.
% The code initialises the array with zeroes except for \texttt{a[0][1][0]}, which is set to one. Then the main function assigns one to that same location. This code on its own should not actually produce a result and should just terminate by returning 0, which is also what the design that LegUp generates does when the \texttt{NO\_INLINE} flag is turned off.
-%The following code also produces an assertion error in LegUp, although a different one this time. This bug was not discovered during the main test runs of 10,000 test cases, but beforehand, which led us to disable the generation of unions. However, this bug also requires the \texttt{volatile} keyword, which seems to be the reason for quite a few mismatches in LegUp and Vivado.
+%The following code also produces an assertion error in LegUp, although a different one this time. This bug was not discovered during the main test runs of 10,000 test-cases, but beforehand, which led us to disable the generation of unions. However, this bug also requires the \texttt{volatile} keyword, which seems to be the reason for quite a few mismatches in LegUp and Vivado.
%
%\begin{minted}{c}
%union U1 { int a; };
diff --git a/intro.tex b/intro.tex
index 7634475..7fa4905 100644
--- a/intro.tex
+++ b/intro.tex
@@ -61,9 +61,9 @@ Ultimately, we believe that any errors in an HLS tool are worth identifying beca
Our approach to fuzzing HLS tools comprises three steps.
First, we use Csmith~\cite{yang11_findin_under_bugs_c_compil} to generate thousands of valid C programs within the subset of the C language that is supported by all the HLS tools we test. We also augment each program with a random selection of HLS-specific directives. Second, we give these programs to four widely used HLS tools: Xilinx Vivado HLS~\cite{xilinx20_vivad_high_synth}, LegUp HLS~\cite{canis13_legup}, the Intel HLS Compiler, also known as i++~\cite{intel20_sdk_openc_applic}, and finally Bambu~\cite{pilato13_bambu}. Third, if we find a program that causes an HLS tool to crash, or to generate hardware that produces a different result from GCC, we reduce it to a minimal example with the help of the \creduce{} tool~\cite{creduce}.
-Our testing campaign revealed that all four tools could be made to generate an incorrect design. In total, \totaltestcases{} test cases were run through each tool, of which \totaltestcasefailures{} failed in at least one of the tools. Test case reduction was then performed on some of these failing test cases to obtain at least \numuniquebugs{} unique failing test cases.
+Our testing campaign revealed that all four tools could be made to generate an incorrect design. In total, \totaltestcases{} test-cases were run through each tool, of which \totaltestcasefailures{} failed in at least one of the tools. Test-case reduction was then performed on some of these failing test-cases to obtain at least \numuniquebugs{} unique failing test-cases.
-To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (v2018.3, v2019.1, and v2019.2). We found far fewer failures in versions v2019.1 and v2019.2 compared to v2018.3, but we also identified a few test cases that only failed in versions v2019.1 and v2019.2; this suggests that some new features may have introduced bugs.
+To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (v2018.3, v2019.1, and v2019.2). We found far fewer failures in versions v2019.1 and v2019.2 compared to v2018.3, but we also identified a few test-cases that only failed in versions v2019.1 and v2019.2; this suggests that some new features may have introduced bugs.
In summary, the overall aim of our paper is to raise awareness about the reliability (or lack thereof) of current HLS tools, and to serve as a call-to-arms for investment in better-engineered tools. We hope that future work on developing more reliable HLS tools will find our empirical study a valuable source of motivation.
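The three-step flow described in this hunk (generate constrained C programs with Csmith, push each program through the HLS tools, and flag any design whose output disagrees with a GCC-compiled executable) is a classic differential-testing loop. The sketch below, in Python, illustrates its shape only; the Csmith runtime path, the timeout values, and the run_hls_cosim.sh wrapper standing in for the tool-specific HLS and co-simulation steps are assumptions, not details taken from the paper. Csmith-generated programs print a checksum of their global state, which is what the two sides compare.

    # Minimal sketch of the differential-testing loop; paths, timeouts and
    # the HLS wrapper script are hypothetical stand-ins.
    import subprocess

    CSMITH_RUNTIME = "/opt/csmith/runtime"   # assumed install location

    def run(cmd, timeout):
        """Run a shell command; return (exit code, stdout), or None on timeout."""
        try:
            r = subprocess.run(cmd, shell=True, capture_output=True,
                               text=True, timeout=timeout)
            return r.returncode, r.stdout
        except subprocess.TimeoutExpired:
            return None

    def reference_output(src):
        # Oracle: compile with GCC and execute natively to get the checksum.
        cc = run(f"gcc -I{CSMITH_RUNTIME} {src} -o ref", timeout=60)
        if cc is None or cc[0] != 0:
            return None
        res = run("./ref", timeout=60)
        return res[1] if res else None

    def hls_output(src):
        # Hypothetical wrapper around the tool-specific HLS + co-simulation
        # flow (Vivado HLS, LegUp, i++ or Bambu).
        res = run(f"./run_hls_cosim.sh {src}", timeout=3600)
        return res[1] if res else None

    for i in range(1000):                    # the campaign ran thousands of cases
        # Csmith feature flags can restrict generation, e.g. disabling unions.
        run(f"csmith --no-unions -o case{i}.c", timeout=60)
        ref, hls = reference_output(f"case{i}.c"), hls_output(f"case{i}.c")
        if ref is not None and hls is not None and ref != hls:
            print(f"case{i}.c: output mismatch, candidate bug")

Any mismatching case found this way would then be handed to C-Reduce to shrink it to a minimal reproducer.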
diff --git a/main.tex b/main.tex
index 1d9a7e4..a3bb473 100644
--- a/main.tex
+++ b/main.tex
@@ -59,7 +59,7 @@ High-level synthesis (HLS) is becoming an increasingly important part of the com
As such, HLS tools are increasingly relied upon. But are they trustworthy?
We have subjected four widely used HLS tools -- LegUp, Xilinx Vivado HLS, the Intel HLS Compiler and Bambu -- to a rigorous fuzzing campaign using thousands of random, valid C programs that we generated using a modified version of the Csmith tool. For each C program, we compiled it to a hardware design using the HLS tool under test and checked whether that hardware design generates the same output as an executable generated by the GCC compiler. When discrepancies arose between GCC and the HLS tool under test, we reduced the C program to a minimal example in order to zero in on the potential bug. Our testing campaign has revealed that all four HLS tools can be made either to crash or to generate wrong code when given valid C programs, and thereby underlines the need for these increasingly trusted tools to be more rigorously engineered.
-Out of \totaltestcases{} test cases, we found \totaltestcasefailures{} programs that failed in at least one tool, out of which we were able to discern at least \numuniquebugs{} unique bugs.
+Out of \totaltestcases{} test-cases, we found \totaltestcasefailures{} programs that failed in at least one tool, out of which we were able to discern at least \numuniquebugs{} unique bugs.
\end{abstract}
diff --git a/method.tex b/method.tex
index a06b2a0..4cc52be 100644
--- a/method.tex
+++ b/method.tex
@@ -194,7 +194,7 @@ To ensure that our testing scales to a large number of large programs, we enforc
%In cases where simulation takes longer, we do not consider them as failures or crashes and we record them.
% When testing Intel HLS, we place three time-outs since its execution is generally slower than other HLS tools: a \ref{XX}-minute time-out for C compilation, a \ref{XX}-hour time out for HLS compilation and a \ref{XX}-hour time-out for co-simulation.
-% The number of timeouts has increased to four. The first timeout is set when compiling the C++ program to a CPU executable. The second timeout is placed when running the executable to get the C++ result. The third timeout, which has been given the longest period, covers synthesising the design and generating the co-simulation executable. Finally, running the co-simulation executable is covered by the fourth timeout. The test case can be discarded at any stage if the task does not finish within the allotted time.
+% The number of timeouts has increased to four. The first timeout is set when compiling the C++ program to a CPU executable. The second timeout is placed when running the executable to get the C++ result. The third timeout, which has been given the longest period, covers synthesising the design and generating the co-simulation executable. Finally, running the co-simulation executable is covered by the fourth timeout. The test-case can be discarded at any stage if the task does not finish within the allotted time.
\subsection{Reducing buggy programs}\label{sec:method:reduce}
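Because every stage of the pipeline above is guarded by a timeout, the same timeout-wrapped flow doubles as the "interestingness test" that C-Reduce needs: C-Reduce repeatedly shrinks the file and keeps a candidate reduction only if the test exits with status 0. Below is a sketch of such a test, again in Python and again with hypothetical timeout values and a hypothetical run_hls_cosim.sh wrapper (the paper's own script is not reproduced here); the wrapper lumps the third and fourth timeouts together.

    #!/usr/bin/env python3
    # Hypothetical C-Reduce interestingness test: exit 0 iff the GCC/HLS
    # mismatch still reproduces and every stage finished within its timeout.
    import subprocess, sys

    def run(cmd, timeout):
        try:
            r = subprocess.run(cmd, shell=True, capture_output=True,
                               text=True, timeout=timeout)
            return r.returncode, r.stdout
        except subprocess.TimeoutExpired:
            return None                  # a timeout is recorded, not counted as a bug

    SRC = "case.c"                       # C-Reduce runs the test next to the file
    cc = run(f"gcc -I/opt/csmith/runtime {SRC} -o ref", timeout=60)   # timeout 1
    ref = run("./ref", timeout=60) if cc and cc[0] == 0 else None     # timeout 2
    hls = run(f"./run_hls_cosim.sh {SRC}", timeout=3600)              # timeouts 3+4

    # "Interesting" means every stage finished and the outputs still differ.
    sys.exit(0 if (ref is not None and hls is not None and ref[1] != hls[1]) else 1)

This would be invoked as, for example, creduce ./interesting.py case.c. One caveat when reducing against an output mismatch is that the reducer can easily introduce undefined behaviour that GCC and the HLS tool may legitimately interpret differently, so a real test needs an additional guard (for instance, rejecting reductions that trigger new compiler warnings).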
diff --git a/related.tex b/related.tex
index b2edec2..208f1fc 100644
--- a/related.tex
+++ b/related.tex
@@ -1,6 +1,6 @@
\section{Related Work}
-The only other work of which we are aware on fuzzing HLS tools is that by Lidbury et al. \cite{lidbury15_many_core_compil_fuzzin}, who tested several OpenCL compilers, including an HLS compiler from Altera (now Intel). They were only able to subject that compiler to superficial testing because so many of the test-cases \JW{Need to audit `test case' vs `test-case' throughout. I don't mind which we use; I mildly prefer `test-case' because it's less ambiguous when parsing.} they generated led to it crashing. In comparison to our work: where Lidbury et al. generated target-independent OpenCL programs that could be used to test HLS tools and conventional compilers alike, we specifically generate programs that are tailored for HLS (e.g. with HLS-specific pragmas and only including supported constructs) with the aim of testing the HLS tools more deeply. Another difference is that where we test using sequential C programs, they test using highly concurrent OpenCL programs, and thus have to go to great lengths to ensure that any discrepancies observed between compilers cannot be attributed to the inherent nondeterminism of concurrency.
+The only other work of which we are aware on fuzzing HLS tools is that by Lidbury et al. \cite{lidbury15_many_core_compil_fuzzin}, who tested several OpenCL compilers, including an HLS compiler from Altera (now Intel). They were only able to subject that compiler to superficial testing because so many of the test-cases they generated led to it crashing. In comparison to our work: where Lidbury et al. generated target-independent OpenCL programs that could be used to test HLS tools and conventional compilers alike, we specifically generate programs that are tailored for HLS (e.g. with HLS-specific pragmas and only including supported constructs) with the aim of testing the HLS tools more deeply. Another difference is that where we test using sequential C programs, they test using highly concurrent OpenCL programs, and thus have to go to great lengths to ensure that any discrepancies observed between compilers cannot be attributed to the inherent nondeterminism of concurrency.
Other stages of the FPGA toolchain have been subjected to fuzzing. In previous work~\cite{verismith}, we tested several FPGA synthesis tools using randomly generated Verilog programs. Where that work concentrated on the RTL-to-netlist stage of hardware design, this work focuses on the C-to-RTL stage.
@@ -11,7 +11,7 @@ In the SPARK HLS tool~\cite{gupta03_spark}, some compiler passes, such as schedu
Finally, the Catapult C HLS tool~\cite{mentor20_catap_high_level_synth} is designed only to produce an output netlist if it can mechanically prove it equivalent to the input program; it should therefore never produce wrong RTL. In future work, we intend to test Catapult C alongside Vivado HLS, LegUp, Intel i++, and Bambu.
%more prevalent these were prioritised.
% JW: We're not really sure that LegUp is more prevalent than Catapult HLS. Indeed, it probably isn't!
-%\JW{Obvious reader question at this point: why not test that claim by giving our Csmith test cases to Catapult C too? Can we address that here? No worries if not; but shall we try and do that after the deadline anyway?}\YH{Yes, definitely, it would be great to get an idea of how Catapult C performs, and I currently have it installed already. I have added a small sentence for that now, but let me know if I should mention this in the conclusion instead though. }
+%\JW{Obvious reader question at this point: why not test that claim by giving our Csmith test-cases to Catapult C too? Can we address that here? No worries if not; but shall we try and do that after the deadline anyway?}\YH{Yes, definitely, it would be great to get an idea of how Catapult C performs, and I currently have it installed already. I have added a small sentence for that now, but let me know if I should mention this in the conclusion instead though. }
%%% Local Variables:
%%% mode: latex