addressing michalis's comments

author: John Wickerson <j.wickerson@imperial.ac.uk> 2022-01-11 08:38:20 +0000
committer: John Wickerson <j.wickerson@imperial.ac.uk> 2022-01-11 08:38:20 +0000
commit: ad9eecf94a8eb84541ec7ab80b08b331b98e2de9 (patch)
tree: b0416d949e1265e28d3fa10b34e0674b96a648c0
parent: 8cf849a6bab80ebf64e4fc4c0b911b9aee6514f8 (diff)
download: fccm22_rsvhls-ad9eecf94a8eb84541ec7ab80b08b331b98e2de9.tar.gz
fccm22_rsvhls-ad9eecf94a8eb84541ec7ab80b08b331b98e2de9.zip
1 files changed, 40 insertions, 24 deletions
diff --git a/verified_resource_sharing.tex b/verified_resource_sharing.tex
index 01eeeae..fcc9dc7 100644
--- a/verified_resource_sharing.tex
+++ b/verified_resource_sharing.tex
@@ -70,9 +70,9 @@ escapeinside=||,
 \begin{abstract}
   High-level synthesis (HLS) is playing an ever-increasing role in hardware design, but concerns have been raised about its reliability. Seeking to address these concerns, Herklotz et al. have recently developed an HLS compiler, called Vericert, that has been mechanically proven (using the Coq proof assistant) to output Verilog designs that are behaviourally equivalent to the input C program. Unfortunately, Vericert's output cannot compete performance-wise with that of established HLS tools such as LegUp. A major reason for this is Vericert's complete lack of support for resource sharing. 
   
-  In this paper, we present Vericert-Fun: Vericert extended with function-level resource sharing. Where original Vericert creates one block of hardware per function \emph{call} \MP{Is this an over-simplification? We don't really generate a "block of hardware" for function calls}, Vericert-Fun creates one block of hardware per function \emph{definition}. The enabling innovation is to extend Vericert's intermediate language HTL with the ability to represent \emph{multiple} state machines, and then to add support for function calls that transfer control between these state machines. We have extended Vericert's formal correctness proof to cover the translation from C source into this extended HTL language. \MP{Is this done?}
+  In this paper, we present Vericert-Fun: Vericert extended with function-level resource sharing. Where original Vericert creates one block of hardware per function \emph{call} \MP{Is this an over-simplification? We don't really generate a "block of hardware" for function calls}, Vericert-Fun creates one block of hardware per function \emph{definition}. The enabling innovation is to extend Vericert's intermediate language HTL with the ability to represent \emph{multiple} state machines, and then to add support for function calls that transfer control between these state machines. We are working to extend Vericert's formal correctness proof to cover the translation from C source into this extended HTL language, and thence on to Verilog.
   
-  In order to benchmark Vericert-Fun's performance, we have added an (unverified) Verilog-producing backend. Our results on the PolyBench/C benchmark suite show the generated hardware having a resource usage of 61\% of Vericert's on average and 46\% in the best case, for only a 3\% average decrease in max frequency and 1\% average increase in cycle count.
+We have benchmarked Vericert-Fun's performance on the PolyBench/C suite, and our results show the generated hardware having a resource usage of 61\% of Vericert's on average and 46\% in the best case, for only a 3\% average decrease in max frequency and 1\% average increase in cycle count.
 \end{abstract}
 
 \maketitle
@@ -82,11 +82,11 @@ escapeinside=||,
 
 The drive for faster, more energy-efficient computation has led to a surge in demand for custom hardware accelerators. These devices are traditionally designed using hardware description languages such as Verilog or VHDL, but the complexities of designing in such a language, as well as the abundance of engineers trained in software rather than hardware development, has meant that \emph{high-level synthesis} (HLS) tools have become an enticing option. 
 
-These tools are useful, but doubts have been raised about their reliability. For instance, \citet{Herklotz2021_empiricalstudy} found numerous miscompilation bugs in commercial HLS tools including Xilinx Vivado HLS~\cite{xilinx_vitis}, Intel i++~\cite{intel_hls}, and LegUp~\cite{legup_CanisAndrew2011}. This unreliability can be a significant hindrance in the development process, especially given the longer iteration times of hardware design compared to software. It also undermines the usefulness of HLS in safety- or security-critical settings. It is therefore essential to ensure that HLS tools are as reliable as possible.
+These tools are useful, but doubts have been raised about their reliability. For instance, \citet{Herklotz2021_empiricalstudy} found numerous miscompilation bugs in commercial HLS tools including Xilinx Vivado HLS~\cite{xilinx_vitis}, Intel i++~\cite{intel_hls}, and LegUp~\cite{legup_CanisAndrew2011}. This unreliability can be a significant hindrance in the development process, especially given the long iteration times of hardware design compared to software. It also undermines the usefulness of HLS in safety- or security-critical settings. It is therefore essential to ensure that HLS tools are as reliable as possible.
 
-Vericert~\cite{Herklotz2020} is an HLS tool that aims to address this issue. Its correctness has been checked to the highest possible standard: machine-checked proof. It achieves that by providing a proof, checked using the Coq proof assistant~\cite{coq}, that every step of its translation from C to Verilog preserves the semantics of (i.e.\ behaves the same way as) its input program. This proof means that we can always trust any Verilog design produced by Vericert to behave the same way as the C program given to it as input. %It is based on the CompCert~\cite{compcert_Leroy2009} verified C compiler.
+Vericert~\cite{Herklotz2020} is a new HLS tool that aims to address this issue. Its correctness has been checked to the highest possible standard: machine-checked proof. It provides a proof, checked using the Coq proof assistant~\cite{coq}, that every step of its translation from C to Verilog preserves the semantics of (i.e.\ behaves the same way as) its input program. This proof means that we can always trust any Verilog design produced by Vericert to behave the same way as the C program given as input. %It is based on the CompCert~\cite{compcert_Leroy2009} verified C compiler.
 
-Clearly, however, it is not enough for an HLS tool simply to be \emph{correct}. The generated hardware must also meet several other desiderata, including high throughput, low latency, and good \emph{area efficiency}. A common optimisation used by HLS tools to improve area efficiency is \emph{resource sharing}; that is, re-using hardware for more than one purpose. Accordingly, our work adds resource sharing to Vericert. Keeping with the aims of the Vericert project, we also extend the correctness proof.
+Clearly, however, it is not enough for an HLS tool simply to be \emph{correct}. The generated hardware must also meet several other desiderata, including high throughput, low latency, and good \emph{area efficiency}, which is the topic of this paper. A common optimisation employed by HLS tools to improve area efficiency is \emph{resource sharing}; that is, re-using hardware for more than one purpose. Accordingly, our work adds resource sharing to Vericert. Keeping with the aims of the Vericert project, work is ongoing to extend the correctness proof.
 
 \section{Background}
 
@@ -94,20 +94,24 @@ Clearly, however, it is not enough for an HLS tool simply to be \emph{correct}.
 
 Engineering a software system within a proof assistant like Coq is widely held to be the gold standard for correctness. Recent years have shown that it is feasible to design substantial pieces of software in this way, such as database management systems~\cite{malecha+10}, web servers~\cite{chlipala15}, and operating system kernels~\cite{gu+16}. Coq has also been successfully deployed in the hardware design process, both in academia~\cite{braibant+13, bourgeat+20} and in industry~\cite{silveroak}. It has even been applied specifically to the HLS process: Faissole et al.~\cite{faissole+19} used it to verify that HLS optimisations respect dependencies present in the source code.
 
-\paragraph{The CompCert verified C compiler} Among the most celebrated applications of Coq is the CompCert project~\cite{compcert}. CompCert is a lightly optimising C compiler, with backend support for the Arm, x86, PowerPC, and Kalray VLIW architectures~\cite{six+20}, that has been implemented and proven correct using Coq. CompCert accepts most of the C99 language, and generally produces code of comparable performance to that produced by GCC at optimisation level \texttt{-O1}.
-
-CompCert transforms its input through a series of ten intermediate languages before generating the final output. This design ensures that each individual pass remains relatively simple and hence feasible to prove correct. The correctness proof of the entire compiler is formed by composing the correctness proofs of each of its passes.
+\paragraph{The CompCert verified C compiler} Among the most celebrated applications of Coq is the CompCert project~\cite{compcert}. CompCert is a lightly optimising C compiler, with backend support for the Arm, x86, PowerPC, and Kalray VLIW architectures~\cite{six+20}, that has been implemented and proven correct using Coq. CompCert accepts most of the C99 language, and generally produces code of comparable performance to that produced by GCC at optimisation level \texttt{-O1}. It transforms its input through a series of ten intermediate languages before generating the final output. This design ensures that each individual pass remains relatively simple and hence feasible to prove correct. The correctness proof of the entire compiler is formed by composing the correctness proofs of each of its passes.
 
 \paragraph{The Vericert verified HLS tool}
 
-Introduced by \citet{Herklotz2020}, Vericert is a verified C-to-Verilog HLS tool. It is an extension of CompCert, essentially augmenting the existing verified C compiler with a new hardware-oriented intermediate language (called HTL) and a Verilog backend. In its current form, Vericert performs no significant optimisations, beyond those it inherits from CompCert's frontend. This results in performance generally about one order of magnitude slower than that achieved by comparable, unverified HLS tools such as LegUp~\cite{legup_CanisAndrew2011}.
+Introduced by \citet{Herklotz2020}, Vericert is a verified C-to-Verilog HLS tool. It is an extension of CompCert, essentially augmenting the existing verified C compiler with a new hardware-oriented intermediate language (called HTL) and a Verilog backend. In its current form, Vericert performs no significant optimisations beyond those it inherits from CompCert's frontend. This results in performance generally about one order of magnitude slower than that achieved by comparable, unverified HLS tools such as LegUp~\cite{legup_CanisAndrew2011}.
 
 Vericert branches off from CompCert at the intermediate language called \emph{register-transfer language} (RTL). Since that abbreviation is usually used in the hardware community for `register-transfer level', we shall henceforth avoid possible confusion by referring to the CompCert intermediate language as `3AC' (for `three-address code').
 
-In the 3AC language, each function in the program is represented as a numbered list of instructions with gotos -- essentially, a control-flow graph (CFG). The essence of Vericert's compilation strategy is to treat this CFG as a state machine, with each instruction in the CFG becoming a state, and each edge in the CFG becoming a transition. Moreover, program variables that do not have their address taken are mapped to hardware registers; other variables (including arrays and structs) are allocated in a block of RAM that represents the stack. More precisely, Vericert builds a state machine with datapath~\cite{HwangVahid1999}. This comprises two maps, both of which take the current state as their input: the \emph{control logic} map for determining the next state, and a \emph{datapath} map for updating the RAM and registers. These state machines are captured in Vericert's new intermediate language, HTL. When Vericert compiles from HTL to the final Verilog output, these maps are converted from proper `mathematical' functions into syntactic Verilog case-statements, and each is placed inside an always-block.
+In the 3AC language, each function in the program is represented as a numbered list of instructions with gotos -- essentially, a control-flow graph (CFG). The essence of Vericert's compilation strategy is to treat this CFG as a state machine, with each instruction in the CFG becoming a state, and each edge in the CFG becoming a transition. Moreover, program variables that do not have their address taken are mapped to hardware registers; other variables (including arrays and structs) are allocated in a block of RAM that represents the stack. More precisely, Vericert builds a \emph{finite state machine with datapath} (FSMD). This comprises two maps, both of which take the current state as their input: the \emph{control logic} map for determining the next state, and a \emph{datapath} map for updating the RAM and registers. These state machines are captured in Vericert's new intermediate language, HTL. When Vericert compiles from HTL to the final Verilog output, these maps are converted from proper `mathematical' functions into syntactic Verilog case-statements, and each is placed inside an always-block.
 
 \begin{figure}
+
+\definecolor{colorflow}{HTML}{8EB85B}
+
 \begin{tikzpicture}[yscale=-1]
+
+\tikzset{flowlines/.style={draw, colorflow, ultra thick}}
+
 \node(fun1) at (1.5,2) {function};
 \node(fun2) at (0,2) {function};
 \node[anchor=west](C) at (-2,2) {\bf C:};
@@ -123,13 +127,14 @@ In the 3AC language, each function in the program is represented as a numbered l
 \draw[->] (fun2) to (CFG2);
 \draw[->] (CFG1) to node [auto, pos=0.2] {inlining} (CFG);
 \draw[->] (CFG2) to (CFG);
-\draw[->] (CFG) to node [auto] {state machine generation} (FSMD);
+\draw[->] (CFG) to node [auto] (smg) {state machine generation} (FSMD);
 \draw[->] (FSMD) to node [auto] {Verilog generation} (module);
-\end{tikzpicture}
-\caption{Key compilation passes and intermediate languages in Vericert~\cite{vericert}}
-\label{fig:vericert_flow}
 
-\begin{tikzpicture}[yscale=-1]
+\begin{pgfonlayer}{background}
+\node[flowlines, fit=(C)(module)(smg)] (background1) {};
+\end{pgfonlayer}
+
+\begin{scope}[yshift=55mm]
 \node(fun1) at (2,2) {function};
 \node(fun2) at (0,2) {function};
 \node[anchor=west](C) at (-2,2) {\bf C:};
@@ -145,15 +150,24 @@ In the 3AC language, each function in the program is represented as a numbered l
 \node[anchor=west](Verilog) at (-2,6) {\bf Verilog:};
 \draw[->] (fun1) to node [auto] {CompCert frontend} (CFG1);
 \draw[->] (fun2) to (CFG2);
-\draw[->] (CFG1) to node [auto] {state machine generation} (FSMD1);
+\draw[->] (CFG1) to node [auto] (smg) {state machine generation} (FSMD1);
 \draw[->] (CFG2) to (FSMD2);
-\draw[->, dashed] (FSMD1) to node [auto] {renaming} (FSMD3);
-\draw[->, dashed] (FSMD2) to (FSMD4);
-\draw[->, dashed] (FSMD3) to node [auto, pos=0.2] {Verilog generation} (module);
-\draw[->, dashed] (FSMD4) to (module);
+\draw[->] (FSMD1) to node [auto] {renaming} (FSMD3);
+\draw[->] (FSMD2) to (FSMD4);
+\draw[->] (FSMD3) to node [auto, pos=0.2] {Verilog generation} (module);
+\draw[->] (FSMD4) to (module);
+
+\end{scope}
+
+\begin{pgfonlayer}{background}
+\node[flowlines, fit=(C)(module)(smg)] (background2) {};
+\end{pgfonlayer}
+
+\draw[flowlines, -latex] (background1.south) to node [auto] {\bf this work} (background2.north -| background1.south);
+
 \end{tikzpicture}
-\caption{Key compilation passes and intermediate languages in Vericert-Fun, with dashed arrows indicating passes that have been implemented but not verified}
-\label{fig:vericert-fun_flow}
+\caption{Key compilation passes and intermediate languages in Vericert~\cite{Herklotz2020} (top) and Vericert-Fun (bottom)}
+\label{fig:flow}
 \end{figure}
 
 \section{Implementation of Vericert-Fun}
@@ -243,7 +257,9 @@ Figure~\ref{fig:example_3AC} shows the 3AC representation of that C program, as
 \label{fig:example_3AC}
 \end{figure}
 
-Figure~\ref{fig:example_HTL} depicts the result of converting those CFGs into two state machines. The conversion of 3AC instructions into Verilog instructions has been described already by \citeauthor{vericert}; what is new here is the handling of function calls, so the following description concentrates on that aspect. Before we proceed, there are three conventions worth noting: first, ``$*{\rightarrow}\langle\mathit{node}\rangle$'' should be interpreted as an abbreviation for edges from all nodes to $\langle\mathit{node}\rangle$; second, the dotted edges are intended to suggest how control is passed between the two state machines, but they carry no formal meaning \MP{There should be a better way to show the CFG. The lines from main to add show the initiation of a parallel execution. The ones from add to main show its ending (roughly). This is also not quite correct since functions are, conceptually, all executing at all times}; and third, we allow the \lstinline{main} machine to register $\langle\mathit{reg}\rangle$ in the \lstinline{add} machine using ``\lstinline{add.}$\langle\mathit{reg}\rangle$''.
+Figure~\ref{fig:example_HTL} depicts the result of converting those CFGs into two state machines. The conversion of 3AC instructions into Verilog instructions has been described already by \citet{Herklotz2020}; what is new here is the handling of function calls, so the following description concentrates on that aspect. Before we proceed, there are two conventions worth noting: first, ``$*{\rightarrow}\langle\mathit{node}\rangle$'' should be interpreted as an abbreviation for edges from all nodes to $\langle\mathit{node}\rangle$, and second, we allow the \lstinline{main} machine to refer to the \lstinline{add} machine's register $\langle\mathit{reg}\rangle$ by writing ``\lstinline{add.}$\langle\mathit{reg}\rangle$''.
+
+The solid edges within the two state machines indicate the transfer of control between states, while the dashed edges between the state machines are more `fictional'. The ground truth is that both state machines run continuously, but it is convenient to think of only one of the machines as doing useful work at any point in time. So, the dashed edges indicate the points at which the `active' machine changes.
 
 \definecolor{background}{HTML}{eeeeee}
 \begin{figure}
@@ -319,7 +335,7 @@ Figure~\ref{fig:example_HTL} depicts the result of converting those CFGs into tw
 \label{fig:example_HTL}
 \end{figure}
 
-Execution begins in state 9 of the \lstinline{main} machine, and proceeds through successive states until it reaches state 7, in which the \lstinline{add} machine's \lstinline{rst} signal is raised. This puts the \lstinline{add} machine into state 2. When the \lstinline{main} machine advances to state 12, that \lstinline{rst} signal is lowered; the \lstinline{add} machine then begins its computation while the \lstinline{main} machine spins in state 12. When the \lstinline{add} machine has finished its computation (state 1), it asserts its \lstinline{fin} signal. This allows the \lstinline{main} machine to progress from state 12. Finally, the \lstinline{add} machine lowers its \lstinline{fin} and waits in state 3 until the next call.
+In more detail: Execution begins in state 9 of the \lstinline{main} machine, and proceeds through successive states until it reaches state 7, in which the \lstinline{add} machine's \lstinline{rst} signal is raised. This puts the \lstinline{add} machine into state 2. When the \lstinline{main} machine advances to state 12, that \lstinline{rst} signal is lowered; the \lstinline{add} machine then begins its computation while the \lstinline{main} machine spins in state 12. When the \lstinline{add} machine has finished its computation (state 1), it asserts its \lstinline{fin} signal. This allows the \lstinline{main} machine to progress from state 12. Finally, the \lstinline{add} machine lowers its \lstinline{fin} and waits in state 3 until the next call.
 
 The same sequence of events can also be understood using a timing diagram, as given in Figure~\ref{fig:timingdiagram}. In that diagram, the red lines indicate unspecified values. We see that each call of \lstinline{add} begins with a pulse on \lstinline{add.rst} (fifth waveform) and ends with a pulse on \lstinline{add.fin} (sixth waveform).
author	John Wickerson <j.wickerson@imperial.ac.uk>	2022-01-11 08:38:20 +0000
committer	John Wickerson <j.wickerson@imperial.ac.uk>	2022-01-11 08:38:20 +0000
commit	ad9eecf94a8eb84541ec7ab80b08b331b98e2de9 (patch)
tree	b0416d949e1265e28d3fa10b34e0674b96a648c0
parent	8cf849a6bab80ebf64e4fc4c0b911b9aee6514f8 (diff)
download	fccm22_rsvhls-ad9eecf94a8eb84541ec7ab80b08b331b98e2de9.tar.gz fccm22_rsvhls-ad9eecf94a8eb84541ec7ab80b08b331b98e2de9.zip