\documentclass[hyphens,prologue,x11names,rgb,sigconf]{acmart}

\usepackage[textsize=small,shadow]{todonotes}%
\usepackage{soul}
\usepackage{subcaption}
\usepackage{listings}
\lstset{
  basicstyle=\tt,
  escapeinside=@@,
}
\setlength{\fboxsep}{1.5pt} % to reduce \colorbox padding when highlighting

%\usepackage{minted}
%\setminted{
%fontsize=\small,
%escapeinside=@@,
%}
%\usemintedstyle{manni}

\definecolor{highlight1}{HTML}{b4d6d0} % 8dd3c7
\definecolor{highlight2}{HTML}{fcfcdc} % ffffb3
\definecolor{highlight3}{HTML}{c9c7d6} % bebada
\definecolor{highlight4}{HTML}{fcc1bb} % fb8072

% Leave review comments using
% \jwcomment{...} (for John) or \yhcomment{...} (for Yann)
% Using either directly leaves a margin note, using it as,
% e.g. \jwcomment[inline]{...} leaves an inline comment
\newcommand{\jwcomment}[2][]{\todo[author={John}, color=ACMLightBlue, #1]{#2}}
\newcommand{\yhcomment}[2][]{\todo[author={Yann}, color=ACMGreen, #1]{#2}}

\newif\ifCOMMENTS
\COMMENTStrue
\newcommand{\Comment}[3]{\ifCOMMENTS\textcolor{#1}{{\bf [\![#2:} #3{\bf ]\!]}}\fi}
\newcommand\JW[2][]{\st{#1}\Comment{red!75!black}{JW}{#2}}

\newcommand{\citeintext}[1] {\citeauthor{#1} in \citeyear{#1}~\cite{#1}}
\newcommand{\citeshort}[1] {\citeauthor{#1} \citeyear{#1}~\cite{#1}}

\begin{document}

\title{Resource Sharing for Verified High-Level Synthesis}

\author{Michail Pardalos}
\email{michail.pardalos17@imperial.ac.uk}
\affiliation{\institution{Imperial College London}}

\author{Yann Herklotz}
\email{yann.herklotz15@imperial.ac.uk}
\affiliation{\institution{Imperial College London}}

\author{John Wickerson}
\email{j.wickerson@imperial.ac.uk}
\affiliation{\institution{Imperial College London}}

\begin{abstract}
High-level synthesis (HLS) is playing an ever-increasing role in hardware design, but concerns have been raised about its reliability. Seeking to address these concerns, Herklotz et al.
have recently developed an HLS compiler, called Vericert, that has been mechanically proven (using the Coq proof assistant) to output Verilog designs that are behaviourally equivalent to the input C program. Unfortunately, Vericert's output cannot compete performance-wise with that of established HLS tools such as LegUp. A major reason for this is Vericert's complete lack of support for resource sharing. In this paper, we present Vericert-Fun: Vericert extended with function-level resource sharing. Where original Vericert creates one block of hardware per function \emph{call}, Vericert-Fun creates one block of hardware per function \emph{definition}. The enabling innovation is to extend Vericert's intermediate language HTL with the ability to represent \emph{multiple} state machines, and then to add support for function calls that transfer control between these state machines. We have extended Vericert's formal correctness proof to cover the translation from C source into this extended HTL language. In order to benchmark Vericert-Fun's performance, we have added an (unverified) Verilog-producing backend. Our results on the PolyBench/C benchmark suite show the generated hardware having a resource usage of 61\% of Vericert's on average and 46\% in the best case, for only a 3\% average decrease in max frequency and 1\% average increase in cycle count.
\end{abstract}

\maketitle

\section{Introduction}
\label{sec:introduction}

The drive for faster, more energy-efficient computation has led to a surge in demand for custom hardware accelerators. These devices are traditionally designed using hardware description languages such as Verilog or VHDL, but the complexities of designing in such a language, as well as the abundance of engineers trained in software rather than hardware development, have meant that \emph{high-level synthesis} (HLS) tools have become an enticing option. These tools, while incredibly useful, are also known to be unreliable.
Previous work by \citet{Herklotz2021_empiricalstudy} has found numerous \emph{miscompilation} bugs in commercial HLS tools including Xilinx Vivado HLS~\cite{xilinx_vitis}, Intel i++~\cite{intel_hls}, and LegUp~\cite{legup_CanisAndrew2011}. This instability can be a significant hindrance in the development process, especially given the longer iteration times of hardware design compared to software. It also undermines the usefulness of HLS in safety- or security-critical settings. It is therefore essential to ensure that high-level synthesis tools are as reliable as possible.

Vericert~\cite{Herklotz2020} is an HLS tool that aims to address this issue. Its correctness has been checked to the highest possible standard: machine-checked proof. It achieves that by providing a proof, checked using the Coq proof assistant, that every step of its translation from C to Verilog preserves the semantics of (i.e.\ behaves the same way as) its input program. This proof means that we can always trust any Verilog design produced by Vericert to behave the same way as the C program given to it as input.
%It is based on the CompCert~\cite{compcert_Leroy2009} verified C compiler.

Clearly, however, it is not enough for an HLS tool simply to be \emph{correct}. The generated hardware must also meet several other desiderata, including high throughput, low latency, and \emph{area efficiency}. A common optimisation used by HLS tools to improve area efficiency is \emph{resource sharing}; that is, re-using hardware for more than one purpose. Accordingly, our work adds resource sharing to Vericert. Keeping with the aims of the Vericert project, we also extend the correctness proof.

\section{Background}

\paragraph{The Coq proof assistant}
Vericert is implemented using the Coq proof assistant~\cite{coq}. This means that it consists of a collection of functions that define the compilation process, together with the proof of a theorem stating that those definitions constitute a correct HLS tool.
Coq mechanically checks this proof using a formal mathematical calculus, and then automatically translates the function definitions into OCaml code that can be compiled and executed.

Engineering a software system within a proof assistant like Coq is widely held to be the gold standard for correctness. Recent years have shown that it is feasible to design substantial pieces of software in this way, such as database management systems~\cite{malecha+10}, web servers~\cite{chlipala15}, and operating system kernels~\cite{gu+16}. Coq has also been successfully deployed in the hardware design process, both in academia~\cite{braibant+13, bourgeat+20} and in industry~\cite{silveroak}. It has even been applied specifically to the HLS process: Faissole et al.~\cite{faissole+19} have used it to verify that HLS optimisations respect dependencies present in the source code.

\paragraph{The CompCert verified C compiler}
Among the most celebrated applications of Coq is the CompCert project~\cite{compcert}. CompCert is a lightly optimising C compiler, with backend support for the Arm, x86, PowerPC, and Kalray VLIW architectures, that has been implemented and proven correct using Coq. CompCert handles most of the C99 language, and generally generates code of comparable performance to that generated by GCC at optimisation level \texttt{-O1}. CompCert transforms its input through a series of ten intermediate languages before generating the final output. This design ensures that each individual pass remains simple and well-scoped, and hence feasible to prove correct. The correctness proof of the entire compiler is formed by composing the correctness proofs of each of its internal passes.

\paragraph{The Vericert verified HLS tool}
Introduced by \citet{Herklotz2020}, Vericert is a verified C-to-Verilog HLS tool. It is an extension of CompCert, essentially augmenting the existing verified C compiler with a new hardware-oriented intermediate language (called HTL) and a Verilog backend.
In its current form, Vericert performs no significant optimisations, beyond those it inherits from CompCert's frontend. This results in designs that are generally about an order of magnitude slower than those generated by comparable, unverified HLS tools such as LegUp~\cite{legup_CanisAndrew2011}.

Vericert branches off from CompCert at the intermediate language called register-transfer language (RTL). Since `RTL' is better known in the hardware community as `register-transfer level', and the two concepts are completely distinct, we shall henceforth refer to the CompCert intermediate language as `3AC' (for `three-address code'). In the 3AC language, each function in the program is represented as a numbered list of instructions with gotos -- essentially, a control-flow graph (CFG). The essence of Vericert's compilation strategy is to treat this CFG as a finite-state machine (FSM), with each instruction in the CFG becoming an FSM state, and each edge in the CFG becoming an FSM transition. Moreover, program variables that do not have their address taken are mapped to hardware registers; other variables (including arrays and structs) are allocated in a block of RAM that represents the stack.

More specifically, Vericert builds a finite-state machine with datapath (FSMD)~\cite{HwangVahid1999}. An FSMD comprises two maps, both of which take the current FSM state as their input: a \emph{control map} for determining the next FSM state, and a \emph{datapath} for updating the RAM and registers. FSMDs are captured in Vericert's new intermediate language, HTL. When Vericert compiles from HTL to the final Verilog output, these maps are converted from proper `mathematical' functions into syntactic Verilog case-statements, and each is placed inside an always-block.

\JW{Worked example around here.}

\section{Rough notes}

Figure~\ref{fig:example_C} shows an example C file.
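
To make the FSMD view from the Background section concrete, the following C sketch simulates the \texttt{add} function of this example as a control map plus a datapath, using the state numbers that appear in Figure~\ref{fig:example_HTL}. It is a hand-written illustration and not Vericert output: the type \texttt{add\_regs}, the functions \texttt{control\_map}, \texttt{datapath}, \texttt{step} and \texttt{run\_add\_fsmd}, the choice of state~2 as the entry state, and the treatment of every assignment as a clocked update are all assumptions made for the sketch.

\begin{lstlisting}[basicstyle=\tt\footnotesize]
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file for the add FSMD.
   Field names loosely mirror the HTL figure. */
typedef struct {
    uint32_t state;   /* reg_4: current FSM state */
    uint32_t reg_3;   /* holds a + b */
    uint32_t finish;  /* raised when the result is ready */
    uint32_t ret;     /* "return" in the HTL */
} add_regs;

/* Control map: current state -> next state. */
static uint32_t control_map(uint32_t state) {
    switch (state) {
    case 2: return 1;
    case 1: return 3;
    case 3: return 3;  /* no transition: stay put */
    default: assert(0 && "unknown state"); return state;
    }
}

/* Datapath: current state -> register updates.
   Reads only cur, writes only next. */
static void datapath(const add_regs *cur, add_regs *next,
                     uint32_t a, uint32_t b) {
    switch (cur->state) {
    case 2: next->reg_3 = a + b; break;
    case 1: next->finish = 1; next->ret = cur->reg_3; break;
    case 3: next->finish = 0; break;
    default: assert(0 && "unknown state");
    }
}

/* One clock edge: both maps see the old registers. */
static void step(add_regs *r, uint32_t a, uint32_t b) {
    add_regs next = *r;
    datapath(r, &next, a, b);
    next.state = control_map(r->state);
    *r = next;
}

/* Start in state 2 (assumed entry state) and clock
   the machine until finish is raised. */
uint32_t run_add_fsmd(uint32_t a, uint32_t b) {
    add_regs r = { .state = 2, .reg_3 = 0,
                   .finish = 0, .ret = 0 };
    while (!r.finish)
        step(&r, a, b);
    return r.ret;
}
\end{lstlisting}

Under these assumptions, \texttt{run\_add\_fsmd(3, 4)} visits state~2 and then state~1, and returns 7. Both maps read only the registers from before the clock edge, which is loosely analogous to placing each map in its own always-block. The sketch covers \texttt{add} only; the figures that follow trace the whole example C file.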
\begin{figure}
\begin{lstlisting}
int add(int a, int b) {
  @return a + b;@
}
int main() {
  @\colorbox{highlight1}{int v = 0;}@
  @\colorbox{highlight2}{v = add(v, 1);}@
  @\colorbox{highlight3}{v = add(v, 2);}@
  @\colorbox{highlight4}{return v;}@
}
\end{lstlisting}
\caption{Example C file}
\label{fig:example_C}
\end{figure}

Figure~\ref{fig:example_3AC} shows the 3AC that the CompCert frontend compiles it to. \JW{In the HTL, the parameters to add are called a and b, but in the 3AC they're called x2 and x1. Presumably the HTL should use x2 and x1 too, right?}

\begin{figure}
\begin{lstlisting}
add (x2, x1) {
  @2: x3 = x2 + x1 + 0 (int)@
  @1: return x3@
}
main () {
  @\colorbox{highlight1}{9: x3 = 0}@
  @\colorbox{highlight2}{8: x6 = 1}@
  @\colorbox{highlight2}{7: x1 = "add"(x3, x6)}@
  @\colorbox{highlight2}{6: x3 = x1}@
  @\colorbox{highlight3}{5: x5 = 2}@
  @\colorbox{highlight3}{4: x2 = "add"(x3, x5)}@
  @\colorbox{highlight3}{3: x3 = x2}@
  @\colorbox{highlight4}{2: x4 = x3}@
  @\colorbox{highlight4}{1: return x4}@
}
\end{lstlisting}
\caption{Example 3AC code}
\label{fig:example_3AC}
\end{figure}

\begin{figure}
\begin{lstlisting}[basicstyle=\tt\footnotesize]
add (a, b) {
  externctrl {
    clk -> main.clk
  }
  controllogic {
    2: reg_4 <= 1;
    1: reg_4 <= 3;
    3: ;
  }
  datapath {
    2: reg_3 <= {{a + b} + 0};
    1: finish = 1; return = reg_3;
    3: finish <= 0;
  }
}
main () {
  externctrl {
    add_1_a -> add.param_0;
    add_1_b -> add.param_1;
    add_1_finish -> add.finish;
    add_1_rst -> add.rst;
    add_1_return -> add.return;
    add_0_a -> add.param_0;
    add_0_b -> add.param_1;
    add_0_finish -> add.finish;
    add_0_rst -> add.rst;
    add_0_return -> add.return;
    clk -> main.clk;
  }
  controllogic {
    9: reg_7 <= 8;
    8: reg_7 <= 7;
    7: reg_7 <= 12;
    12: if ({add_0_finish == 1}) reg_7 <= 6;
    6: reg_7 <= 5;
    5: reg_7 <= 4;
    4: reg_7 <= 10;
    10: if ({add_1_finish == 1}) reg_7 <= 3;
    3: reg_7 <= 2;
    2: reg_7 <= 1;
    1: reg_7 <= 11;
    11: ;
  }
  datapath {
    9: reg_3 <= 0;
    8: reg_6 <= 1;
    7: add_0_rst <= 1; add_0_a <= reg_3; add_0_b <= reg_6;
    12: add_0_rst <= 0; reg_1 <=
        add_0_return;
    6: reg_3 <= reg_1;
    5: reg_5 <= 2;
    4: add_1_rst <= 1; add_1_a <= reg_3; add_1_b <= reg_5;
    10: add_1_rst <= 0; reg_2 <= add_1_return;
    3: reg_3 <= reg_2;
    2: reg_4 <= reg_3;
    1: finish = 1; return = reg_4;
    11: finish <= 0;
  }
}
\end{lstlisting}
\caption{Example HTL code}
\label{fig:example_HTL}
\end{figure}

\begin{figure}
\begin{tikzpicture}[yscale=-1]
\node(fun1) at (1.5,2) {function};
\node(fun2) at (0,2) {function};
\node[anchor=west](C) at (-2,2) {\bf C:};
\node(CFG1) at (1.5,3) {CFG};
\node(CFG2) at (0,3) {CFG};
\node(CFG) at (0.75,4) {CFG};
\node[anchor=west](TAC) at (-2,3) {\bf 3AC:};
\node(FSMD) at (0.75,5) {FSMD};
\node[anchor=west](HTL) at (-2,5) {\bf HTL:};
\node(module) at (0.75,6) {module};
\node[anchor=west](Verilog) at (-2,6) {\bf Verilog:};
\draw[->] (fun1) to node [auto] {CompCert frontend} (CFG1);
\draw[->] (fun2) to (CFG2);
\draw[->] (CFG1) to node [auto, pos=0.2] {inlining} (CFG);
\draw[->] (CFG2) to (CFG);
\draw[->] (CFG) to node [auto] {FSMD generation} (FSMD);
\draw[->] (FSMD) to node [auto] {Verilog generation} (module);
\end{tikzpicture}
\caption{Key compilation passes and intermediate languages in Vericert~\cite{vericert}}
\label{fig:vericert_flow}

\begin{tikzpicture}[yscale=-1]
\node(fun1) at (1.5,2) {function};
\node(fun2) at (0,2) {function};
\node[anchor=west](C) at (-2,2) {\bf C:};
\node(CFG1) at (1.5,3) {CFG};
\node(CFG2) at (0,3) {CFG};
\node[anchor=west](TAC) at (-2,3) {\bf 3AC:};
\node(FSMD1) at (1.5,4) {FSMDE};
\node(FSMD2) at (0,4) {FSMDE};
\node(FSMD3) at (1.5,5) {FSMDE};
\node(FSMD4) at (0,5) {FSMDE};
\node[anchor=west](HTL) at (-2,4) {\bf HTL:};
\node(module) at (0.75,6) {module};
\node[anchor=west](Verilog) at (-2,6) {\bf Verilog:};
\draw[->] (fun1) to node [auto] {CompCert frontend} (CFG1);
\draw[->] (fun2) to (CFG2);
\draw[->] (CFG1) to node [auto] {FSMDE generation} (FSMD1);
\draw[->] (CFG2) to (FSMD2);
\draw[->, dashed] (FSMD1) to node [auto] {renaming} (FSMD3);
\draw[->, dashed] (FSMD2) to (FSMD4);
\draw[->,
dashed] (FSMD3) to node [auto, pos=0.2] {Verilog generation} (module);
\draw[->, dashed] (FSMD4) to (module);
\end{tikzpicture}
\caption{Key compilation passes and intermediate languages in Vericert-Fun, with dashed arrows indicating passes that have been implemented but not verified}
\label{fig:vericert-fun_flow}
\end{figure}

A few points that we might want to answer at some point:
\begin{itemize}
\item What is so good about having a machine-checked correctness proof for an HLS tool? Explain that it is the gold standard for high-integrity computer systems. Give example of Csmith finding 0 bugs in the verified parts of CompCert, compared to hundreds in GCC and LLVM. Explain that machine-checked correctness proofs are becoming more common these days thanks to advances in automated theorem proving technology. Explain that we can't guarantee that hardware produced via Vericert will never go wrong, because:
  \begin{itemize}
  \item the translation from the Coq source code of Vericert into executable OCaml is unverified,
  \item the compilation of that OCaml program is unverified,
  \item the machine that runs the compiled OCaml program could go wrong,
  \item the Coq proof assistant could contain a bug, as could the machine running it,
  \item the pretty-printing of the final Verilog design is unverified,
  \item the synthesis of that Verilog design into a netlist is unverified, as are the tools that perform place-and-route and FPGA bitstream generation,
  \item the execution on the FPGA could go wrong, e.g.\ as a result of cosmic rays.
  \end{itemize}
\item What does Vericert currently do?
\item What is needed for the correctness proof? (Semantics for C, semantics for Verilog, ...)
\end{itemize}

\bibliographystyle{ACM-Reference-Format}
\bibliography{references}

\end{document}