\documentclass[conference]{IEEEtran}

\newif\ifCOMMENTS
\COMMENTStrue
\newif\ifBLIND
\BLINDfalse

\usepackage[english]{babel}
\usepackage{graphicx}
\usepackage{siunitx}
\usepackage{minted}
\setminted{baselinestretch=1, numbersep=5pt, xleftmargin=9pt, linenos, fontsize=\footnotesize}
\usepackage{amsthm}
\usepackage{pgfplots}
\usepackage{tikz}
%\usepackage{subcaption}
\usepackage{booktabs}
\usepackage{multirow}
\usepackage{multicol}
\usepackage{hyperref}
%\usepackage{balance}

\newcommand\totaltestcases{6700}
\newcommand\totaltestcasefailures{1191}
\newcommand\numuniquebugs{8}
\newcommand\vivadotestcases{3645}

\theoremstyle{definition}
\newtheorem{example}[]{Example}

\newcommand{\Comment}[3]{\ifCOMMENTS\textcolor{#1}{{\bf [[#2:} #3{\bf ]]}}\fi}
\newcommand\JW[1]{\Comment{red!75!black}{JW}{#1}}
\newcommand\AD[1]{\Comment{yellow!50!black}{AD}{#1}}
\newcommand\YH[1]{\Comment{green!50!blue}{YH}{#1}}
\newcommand\NR[1]{\Comment{yellow!50!black}{NR}{#1}}
\newcommand\ZD[1]{\Comment{blue!50!black}{ZD}{#1}}
\newcommand{\code}[1]{\texttt{#1}}
\newcommand\creduce{C-Reduce}

\begin{document}

\title{An Empirical Study of the Reliability \\ of High-Level Synthesis Tools}

\ifBLIND
\author{Blind review}
\else
\author{%
\IEEEauthorblockN{Yann Herklotz, Zewei Du, Nadesh Ramanathan, and John Wickerson}
\IEEEauthorblockA{Imperial College London, UK \\
Email: \{yann.herklotz15, zewei.du19, n.ramanathan14, j.wickerson\}@imperial.ac.uk}
}
\fi

\maketitle

\begin{abstract}
High-level synthesis (HLS) is becoming an increasingly important part of the computing landscape, even in safety-critical domains where correctness is key. As such, HLS tools are increasingly relied upon. But are they trustworthy?

We have subjected four widely used HLS tools -- LegUp, Xilinx Vivado HLS, the Intel HLS Compiler and Bambu -- to a rigorous fuzzing campaign using thousands of random, valid C programs that we generated using a modified version of the Csmith tool.
We compiled each C program to a hardware design using the HLS tool under test and checked whether that hardware design generates the same output as an executable generated by the GCC compiler. When discrepancies arose between GCC and the HLS tool under test, we reduced the C program to a minimal example in order to zero in on the potential bug. Our testing campaign has revealed that all four HLS tools can be made to generate wrong designs from valid C programs, and that one tool can be made to crash; this underlines the need for these increasingly trusted tools to be more rigorously engineered. Out of \totaltestcases{} test cases, we found \totaltestcasefailures{} programs that caused at least one tool to fail, out of which we were able to discern at least \numuniquebugs{} unique bugs.
\end{abstract}

\input{intro}
\input{related}
\input{method}
% \input{testing-system-new}
\input{eval}
\input{conclusion}

%\begin{acks}
%For final version of paper.
%\end{acks}

\bigskip

\bibliographystyle{IEEEtran}
\bibliography{conference}

\end{document}
\endinput

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: