summaryrefslogtreecommitdiffstats
path: root/intro.tex
diff options
context:
space:
mode:
authorJohn Wickerson <j.wickerson@imperial.ac.uk>2021-03-26 21:16:22 +0000
committeroverleaf <overleaf@localhost>2021-03-28 14:43:30 +0000
commit546590b262efafc33133059f1184268dd25d89fe (patch)
treef379f428c06d1fafe2e0932d03cc500d8a40c5de /intro.tex
parentb16f3242909e90bdb547ab1500d46f5e38527981 (diff)
downloadfccm21_esrhls-546590b262efafc33133059f1184268dd25d89fe.tar.gz
fccm21_esrhls-546590b262efafc33133059f1184268dd25d89fe.zip
Update on Overleaf.
Diffstat (limited to 'intro.tex')
-rw-r--r--intro.tex23
1 files changed, 11 insertions, 12 deletions
diff --git a/intro.tex b/intro.tex
index 29f09d1..7634475 100644
--- a/intro.tex
+++ b/intro.tex
@@ -1,8 +1,9 @@
\section{Introduction}
-High-level synthesis (HLS), which refers to the automatic translation of software into hardware, is becoming an increasingly important part of the computing landscape.
-It promises hardware engineers an increase in productivity by raising the abstraction level of their designs, and it promises software engineers the ability to produce application-specific hardware accelerators without having to understand hardware description languages (HDL) such as Verilog and VHDL.
-HLS is being used in an ever greater range of domains, including such high-assurance settings as financial services~\cite{hls_fintech}, control systems~\cite{hls_controller}, and real-time object detection~\cite{hls_objdetect}. As such, HLS tools are increasingly relied upon, even though ``high-level synthesis research and development is inherently prone to introducing bugs or regressions in the final circuit functionality''~\cite[Section 3.4.6]{canis15_legup}. In this paper, we investigate whether they are trustworthy and give an empirical evaluation of their reliability.
+High-level synthesis (HLS), which refers to the automatic translation of software into hardware, is becoming an increasingly important part of the computing landscape, even in such high-assurance settings as financial services~\cite{hls_fintech}, control systems~\cite{hls_controller}, and real-time object detection~\cite{hls_objdetect}.
+The appeal of HLS is twofold: it promises hardware engineers an increase in productivity by raising the abstraction level of their designs, and it promises software engineers the ability to produce application-specific hardware accelerators without having to understand Verilog and VHDL.
+
+As such, we are increasingly reliant on HLS tools. But are these tools reliable? Questions have been raised about the reliability of HLS before; for example, Andrew Canis, co-creator of the LegUp HLS tool, wrote that ``high-level synthesis research and development is inherently prone to introducing bugs or regressions in the final circuit functionality''~\cite[Section 3.4.6]{canis15_legup}. In this paper, we investigate whether there is substance to this concern by conducting an empirical evaluation of the reliability of several widely used HLS tools.
The approach we take is \emph{fuzzing}.
%To test the trustworthiness of HLS tools, we need a robust way of generating programs that both have good coverage and also explores various corner cases.
@@ -30,10 +31,8 @@ In this paper, we bring fuzzing to the HLS context.
\begin{example}[A miscompilation bug in Vivado HLS]
\label{ex:vivado_miscomp}
-Figure~\ref{fig:vivado_bug1} shows a program that produces the wrong result during RTL simulation in Xilinx Vivado HLS v2018.3, v2019.1 and v2019.2.\footnote{This program, like all the others in this paper, includes a \code{main} function, which means that it compiles straightforwardly with GCC. To compile it with an HLS tool, we rename \code{main} to \code{result}, synthesise that function, and then add a new \code{main} function as a testbench that calls \code{result}.} The bug was initially revealed by a randomly generated program of around 113 lines, which we were able to reduce to the minimal example shown in the figure. We reported this issue to Xilinx, who confirmed it to be a bug.\footnote{Link to Xilinx bug report redacted for review.}
-% \footnote{https://bit.ly/3mzfzgA}
-The program repeatedly shifts a large integer value \code{x} right by the values stored in array \code{arr}.
-Vivado HLS returns \code{0x006535FF}, but the result returned by GCC (and subsequently confirmed manually to be the correct one) is \code{0x046535FF}.
+Figure~\ref{fig:vivado_bug1} shows a program that produces the wrong result during RTL simulation in Xilinx Vivado HLS v2018.3, v2019.1 and v2019.2.\footnote{This program, like all the others in this paper, includes a \code{main} function, which means that it compiles straightforwardly with GCC. To compile it with an HLS tool, we rename \code{main} to \code{result}, synthesise that function, and then add a new \code{main} function as a testbench that calls \code{result}.} The program repeatedly shifts a large integer value \code{x} right by the values stored in array \code{arr}.
+Vivado HLS returns \code{0x006535FF}, but the result returned by GCC (and subsequently confirmed manually to be the correct one) is \code{0x046535FF}. The bug was initially revealed by a randomly generated program of around 113 lines, which we were able to reduce to the minimal example shown in the figure. We reported this issue to Xilinx, who confirmed it to be a bug.\footnote{\url{https://bit.ly/3mzfzgA}}
\end{example}
\begin{figure}[t]
@@ -53,18 +52,18 @@ int main() {
The example above demonstrates the effectiveness of fuzzing. It seems unlikely that a human-written test-suite would discover this particular bug, given that it requires several components all to coincide before the bug is revealed!
-Yet this example also begs the question: do bugs found by fuzzers really \emph{matter}, given that they are usually found by combining language features in ways that are vanishingly unlikely to happen `in the real world'~\cite{marcozzi+19}. This question is especially pertinent for our particular context of HLS tools, which are well-known to have restrictions on the language features that they handle. Nevertheless, although the \emph{test-cases} we generated do not resemble the programs that humans write, the \emph{bugs} that we exposed using those test-cases are real, and \emph{could also be exposed by realistic programs}.
+Yet this example also begs the question: do bugs found by fuzzers really \emph{matter}, given that they are usually found by combining language features in ways that are vanishingly unlikely to happen `in the real world'~\cite{marcozzi+19}. This question is especially pertinent for our particular context of HLS tools, which are well-known to have restrictions on the language features they support. Nevertheless, although the \emph{test-cases} we generated do not resemble the programs that humans write, the \emph{bugs} that we exposed using those test-cases are real, and \emph{could also be exposed by realistic programs}.
%Moreover, it is worth noting that HLS tools are not exclusively provided with human-written programs to compile: they are often fed programs that have been automatically generated by another compiler.
-Ultimately, we believe that any errors in an HLS tool are worth identifying because they have the potential to cause problems, either now or in the future. And problems caused by HLS tools going wrong (or indeed any sort of compiler for that matter) are particularly egregious, because it is so difficult for end-users to identify whether the fault lies with their design or the HLS tool.
+Ultimately, we believe that any errors in an HLS tool are worth identifying because they have the potential to cause problems -- either now or in the future. And problems caused by HLS tools going wrong (or indeed any sort of compiler for that matter) are particularly egregious, because it is so difficult for end-users to identify whether the fault lies with their design or the HLS tool.
\subsection{Our approach and results}
Our approach to fuzzing HLS tools comprises three steps.
-First, we use Csmith~\cite{yang11_findin_under_bugs_c_compil} to generate thousands of valid C programs from within the subset of the C language that is supported by all the HLS tools we test. We also augment each program with a random selection of HLS-specific directives. Second, we give these programs to four widely used HLS tools: Xilinx Vivado HLS~\cite{xilinx20_vivad_high_synth}, LegUp HLS~\cite{canis13_legup}, the Intel HLS Compiler, which is also known as i++~\cite{intel20_sdk_openc_applic} and finally Bambu~\cite{pilato13_bambu}. Third, if we find a program that causes an HLS tool to crash, or to generate hardware that produces a different result from GCC, we reduce it to a minimal example with the help of the \creduce{} tool~\cite{creduce}.
+First, we use Csmith~\cite{yang11_findin_under_bugs_c_compil} to generate thousands of valid C programs within the subset of the C language that is supported by all the HLS tools we test. We also augment each program with a random selection of HLS-specific directives. Second, we give these programs to four widely used HLS tools: Xilinx Vivado HLS~\cite{xilinx20_vivad_high_synth}, LegUp HLS~\cite{canis13_legup}, the Intel HLS Compiler, also known as i++~\cite{intel20_sdk_openc_applic}, and finally Bambu~\cite{pilato13_bambu}. Third, if we find a program that causes an HLS tool to crash, or to generate hardware that produces a different result from GCC, we reduce it to a minimal example with the help of the \creduce{} tool~\cite{creduce}.
-Our testing campaign revealed that all four tools could be made to generate an incorrect design. In total, \totaltestcases{} test cases were run through each tool out of which \totaltestcasefailures{} test cases failed in at least one of the tools. Test case reduction was then performed on some of these failing test cases to obtain at least \numuniquebugs{} unique failing test cases.
+Our testing campaign revealed that all four tools could be made to generate an incorrect design. In total, \totaltestcases{} test cases were run through each tool, of which \totaltestcasefailures{} failed in at least one of the tools. Test case reduction was then performed on some of these failing test cases to obtain at least \numuniquebugs{} unique failing test cases.
-To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (v2018.3, v2019.1, and v2019.2). We found far fewer failures in versions v2019.1 and v2019.2 compared to v2018.3, but we also identified a few test-cases that only failed in versions v2019.1 and v2019.2, which suggests that some new features may have introduced bugs.
+To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (v2018.3, v2019.1, and v2019.2). We found far fewer failures in versions v2019.1 and v2019.2 compared to v2018.3, but we also identified a few test cases that only failed in versions v2019.1 and v2019.2; this suggests that some new features may have introduced bugs.
In summary, the overall aim of our paper is to raise awareness about the reliability (or lack thereof) of current HLS tools, and to serve as a call-to-arms for investment in better-engineered tools. We hope that future work on developing more reliable HLS tools will find our empirical study a valuable source of motivation.