author     Yann Herklotz <ymh15@ic.ac.uk>    2020-09-14 15:05:15 +0000
committer  overleaf <overleaf@localhost>     2020-09-14 15:47:05 +0000
commit     f09e782d0925bc735aadc29bf595d1e3cc187351 (patch)
tree       9c323f14437787f81274f7aba10be2289f5acc94 /intro.tex
parent     11f9b46c8c5b3152435b3e5de43008e558f980dc (diff)
Update on Overleaf.
Diffstat (limited to 'intro.tex')
-rw-r--r--  intro.tex | 76
1 file changed, 37 insertions(+), 39 deletions(-)
@@ -1,4 +1,8 @@
+\section{Introduction}
+High-level synthesis (HLS), which refers to the automatic translation of software into hardware, is becoming an increasingly important part of the computing landscape.
+It promises to increase the productivity of hardware engineers by raising the abstraction level of their designs, and it promises software engineers the ability to produce application-specific hardware accelerators without having to understand hardware description languages (HDLs) such as Verilog and VHDL.
+It is even being used in high-assurance settings, such as financial services~\cite{hls_fintech}, control systems~\cite{hls_controller}, and real-time object detection~\cite{hls_objdetect}. As such, HLS tools are increasingly relied upon. In this paper, we investigate whether they are trustworthy.
 \begin{figure}[t]
   \centering
@@ -7,29 +11,23 @@
 unsigned int b = 0x1194D7FF;
 int a[6] = {1, 1, 1, 1, 1, 1};
 int main() {
-  int c;
-  for (c = 0; c < 2; c++)
-    b = b >> a[c];
+  for (int c = 0; c < 2; c++) b = b >> a[c];
   return b;
 }
 \end{minted}
- \caption{Miscompilation bug found in Vivado 2018.3 and 2019.2 which returns \code{0x006535FF} instead of \code{0x046535FF} which is the correct result.}\label{fig:vivado_bug1}
+ \caption{Miscompilation bug found in Xilinx Vivado HLS 2018.3 and 2019.2. The program returns \code{0x006535FF} but the correct result is \code{0x046535FF}. \JW{Collapse lines 5-7 into a single line?}\YH{Yes I think it's good like this}}
+ \label{fig:vivado_bug1}
 \end{figure}
-High-level synthesis (HLS), which refers to the automatic translation of software into hardware, is becoming an increasingly important part of the computing landscape.
-It promises to increase the productivity of hardware engineers by raising the abstraction level of their designs, and it promises software engineers the ability to produce application-specific hardware accelerators without having to understand hardware desciption languages (HDL) such as Verilog and VHDL.
-It is even being used in high-assurance settings, such as financial services~\cite{hls_fintech}, control systems~\cite{hls_controller}, and real-time object detection~\cite{hls_objdetect}. As such, HLS tools are increasingly relied upon. In this paper, we investigate whether they are trustworthy.
-
-To test the trustworthiness of HLS tools, we need a robust way of generating programs that both have good coverage and also explores various corner cases.
-Therein lies the difficulty in testing HLS tools.
-Human testing may not achieve both these objectives, as HLS tools are often require complex inputs to trigger wrong behaviour.
-In this paper, we employ program fuzzing on HLS tools.
-
-Fuzzing is an automated testing method that provides unexpected, counter-intuitive and random programs to compilers to test their robustness~\cite{fuzzing+chen+13+taming,fuzz+sun+16+toward,fuzzing+liang+18+survey,fuzzing+zhang+19,yang11_findin_under_bugs_c_compil,lidbury15_many_core_compil_fuzzin}.
-Program fuzzing has been used extensively in testing software compilers.
-For example, Yang \textit{et al.}~\cite{yang11_findin_under_bugs_c_compil} found more than 300 bugs in GCC and clang.
-Despite of the influence of fuzzing on software compilers, to the best of our knowledge, it has not been explored significantly within the HLS context.
-We specifically target HLS by restricting a fuzzer to generate programs within the subset of C supported by HLS.
+The approach we take in this paper is \emph{fuzzing}.
+%To test the trustworthiness of HLS tools, we need a robust way of generating programs that both have good coverage and also explore various corner cases.
+%Therein lies the difficulty in testing HLS tools.
+%Human testing may not achieve both these objectives, as HLS tools often require complex inputs to trigger wrong behaviour.
+%In this paper, we employ program fuzzing on HLS tools.
+This is an automated testing method in which randomly generated programs are given to compilers to test their robustness~\cite{fuzzing+chen+13+taming,fuzz+sun+16+toward,fuzzing+liang+18+survey,fuzzing+zhang+19,yang11_findin_under_bugs_c_compil,lidbury15_many_core_compil_fuzzin}.
+The generated programs are typically large and rather complex, and they often combine language features in ways that are legal but counter-intuitive; hence they can be effective at exercising corner cases missed by human-designed test suites.
+Fuzzing has been used extensively to test conventional compilers; for example, Yang \textit{et al.}~\cite{yang11_findin_under_bugs_c_compil} used it to reveal more than three hundred bugs in GCC and Clang. In this paper, we bring fuzzing to the HLS context.
+%We specifically target HLS by restricting a fuzzer to generate programs within the subset of C supported by HLS.
 % Most fuzzing tools randomly generate random C programs that are then provided to the compiler under test.
@@ -44,31 +42,31 @@ We specifically target HLS by restricting a fuzzer to generate programs within t
 % Fuzzing enables us to overcome
-\paragraph{An example of a fuzzed buggy program}
-Figure~\ref{fig:vivado_bug1} shows a minimal example that produces the wrong result during RTL simulation in VivadoHLS, compared to GCC execution.
-In this example, we right shift a large integer value \code{b} by values of array elements, in array \code{a}, within iterations of a \code{for}-loop.
-VivadoHLS returns \code{0x006535FF} instead of \code{0x046535FF} as in GCC.
-The circumstances in which we found this bug shows the challenge of testing HLS tools.
+\paragraph{An example of a compiler bug found by fuzzing}
+Figure~\ref{fig:vivado_bug1} shows a program that produces the wrong result during RTL simulation in Xilinx Vivado HLS. The bug was initially revealed by a large, randomly generated program, which we reduced to the minimal example shown in the figure.
+The program repeatedly shifts a large integer value \code{b} right by the values stored in array \code{a}.
+Vivado HLS returns \code{0x006535FF}, but the result returned by GCC (and subsequently manually confirmed to be the correct one) is \code{0x046535FF}.
-For instance, the for-loop is necessary to ensure that a bug was detected.
-Also, the shift value needs to be accessed from an array.
-Replacing the array accesses within the loop with constants result in the bug not surfacing.
-Additionally, the array \code{a} needed to be at least six elements in size although the for-loop only has two iterations.
 % Any array smaller than that did not surface this bug.
-Finally, the value of \code{b} is an oracle that could not be changed without masking the bug.
-Producing such circumstances within C code for HLS testing is both arduous and counter-intuitive to human testers.
-In contrast, producing non-intuitive, complex but valid C programs is the cornerstone of fuzzing tools.
-Thus, it was natural to adopt program fuzzing for our HLS testing campaign.
+The circumstances in which we found this bug illustrate some of the challenges in testing HLS tools.
+For instance, without the for-loop, the bug goes away.
+Moreover, the bug only appears if the shift values are accessed from an array.
+And -- particularly curiously -- even though the for-loop only has two iterations, the array \code{a} must have at least six elements; if it has fewer than six, the bug disappears.
+Even the seemingly random value of \code{b} could not be changed without masking the bug.
+It seems unlikely that a manually generated test program would bring together all of the components necessary for exposing this bug.
+In contrast, producing counter-intuitive, complex but valid C programs is the cornerstone of fuzzing tools.
+For this reason, we found it natural to adopt fuzzing for our HLS testing campaign.
 % \NR{Yann, please double check my claims about the bug. I hope I accurately described what we discussed. }\YH{Yes I agree with all that, I think that is a good description of it}
-\paragraph{Our contributions}
-In this paper, we conduct a widespread testing campaign by fuzzing HLS compilers.
-We do so in the following manner:
+\paragraph{Our contribution}
+This paper reports on our campaign to test HLS tools by fuzzing.
 \begin{itemize}
- \item We utilise Csmith~\cite{yang11_findin_under_bugs_c_compil} to generate well-formed C programs from the subset of the C language supported by HLS tools;
- \item Then, we test these programs together with a random selection of HLS directives by comparing the gcc and HLS outputs, and we also keep track of programs that crash HLS tools;
- \item As part of our testing campaign, we generate 10 thousand test cases that we test against the three well-known HLS tools: Vivado HLS~\cite{xilinx20_vivad_high_synth}, LegUp HLS~\cite{canis13_legup} and Intel HLS~\cite{intel20_sdk_openc_applic};
- \item During our testing campaign, we found \ref{XX} bugs that we discuss and also report to the respective developers, where \ref{XX} bugs have been confirmed.
+ \item We use Csmith~\cite{yang11_findin_under_bugs_c_compil} to generate ten thousand valid C programs from within the subset of the C language that is supported by all the HLS tools we test. We augment each program with a random selection of HLS-specific directives.
+
+ \item We give these programs to three widely used HLS tools: Vivado HLS~\cite{xilinx20_vivad_high_synth}, LegUp HLS~\cite{canis13_legup} and Intel HLS~\cite{intel20_sdk_openc_applic}. When we find a program that causes an HLS tool to crash, or to generate hardware that produces a different result from GCC, we reduce it to a minimal example with the help of the C-reduce tool~\cite{creduce}.
+
+ \item Our testing campaign revealed that all three tools could be made to crash while compiling or to generate wrong RTL. In total, we found \ref{XX} bugs across the three tools, all of which have been reported to the respective developers, and \ref{XX} of which have been confirmed at the time of writing.
+
+ \item To investigate whether HLS tools are getting more or less reliable over time, we also tested three different versions of Vivado HLS (2018.3, 2019.1, and 2019.2). \JW{Put a sentence here summarising our findings from this experiment, once we have them.}
 \end{itemize}
 % we test, and then augment each program with randomly chosen HLS-specific directives. We synthesise each C program to RTL, and use a Verilog simulator to calculate its return value. If synthesis crashes, or if this return value differs from the return value obtained by executing a binary compiled from the C program by gcc, then we have found a candidate bug. We then use trial-and-error to reduce the C program to a minimal version that still triggers a bug.