Update on Overleaf.

author: John Wickerson <j.wickerson@imperial.ac.uk> 2022-01-14 15:55:48 +0000
committer: node <node@git-bridge-prod-0> 2022-01-14 16:05:06 +0000
commit: 2edb33bdea26d0b4406ad37da8de1421273f6903 (patch)
tree: 489cf13195cb549f4f41f13e304f9ad930575127
parent: 17784589e22e617401208b30782ca38c0a08e6d1 (diff)
download: fccm22_rsvhls-2edb33bdea26d0b4406ad37da8de1421273f6903.tar.gz
fccm22_rsvhls-2edb33bdea26d0b4406ad37da8de1421273f6903.zip
1 files changed, 30 insertions, 39 deletions
diff --git a/verified_resource_sharing.tex b/verified_resource_sharing.tex
index d3a95c5..d9c5a4d 100644
--- a/verified_resource_sharing.tex
+++ b/verified_resource_sharing.tex
@@ -61,9 +61,8 @@ escapeinside=||,
 \newcommand\YH[2][]{\st{#1}\Comment{blue!50!green}{YH}{#2}}
 \newcommand\MP[2][]{\st{#1}\Comment{blue!50!magenta}{MP}{#2}}
 
-
-\newcommand{\citeintext}[1] {\citeauthor{#1} in \citeyear{#1}~\cite{#1}}
-\newcommand{\citeshort}[1] {\citeauthor{#1} \citeyear{#1}~\cite{#1}}
+\newcommand\vericert{Vericert}
+\newcommand\vericertfun{Vericert-Fun}
 
 \title{Resource Sharing for Verified High-Level Synthesis \\ (Work in Progress)}
 
@@ -84,12 +83,12 @@ escapeinside=||,
 \maketitle
 
 \begin{abstract}
-  High-level synthesis (HLS) is playing an ever-increasing role in hardware design, but concerns have been raised about its reliability. Seeking to address these concerns, Herklotz et al. have recently developed an HLS compiler, called Vericert, that has been mechanically proven (using the Coq proof assistant) to output Verilog designs that are behaviourally equivalent to the input C program. Unfortunately, Vericert's output cannot compete performance-wise with that of established HLS tools such as LegUp. A major reason for this is Vericert's complete lack of support for resource sharing. 
+  High-level synthesis (HLS) is playing an ever-increasing role in hardware design, but concerns have been raised about its reliability. Seeking to address these concerns, Herklotz et al. have recently developed an HLS compiler, called \vericert{}, that has been mechanically proven (using the Coq proof assistant) to output Verilog designs that are behaviourally equivalent to the input C program. Unfortunately, \vericert{}'s output cannot compete performance-wise with that of established HLS tools such as LegUp. A major reason for this is \vericert{}'s complete lack of support for resource sharing. 
   
-  In this paper, we present Vericert-Fun: Vericert extended with function-level resource sharing. Where original Vericert creates one block of hardware per function \emph{call}, %\MP{Is this an over-simplification? We don't really generate a `block of hardware' for function calls}
-  Vericert-Fun creates one block of hardware per function \emph{definition}. The enabling innovation is to extend Vericert's intermediate language HTL with the ability to represent \emph{multiple} state machines, and then to add support for function calls that transfer control between these state machines. We are working to extend Vericert's formal correctness proof to cover the translation from C source into this extended HTL language, and thence on to Verilog.
+  In this paper, we present \vericertfun: \vericert{} extended with function-level resource sharing. Where original \vericert{} creates one block of hardware per function \emph{call}, %\MP{Is this an over-simplification? We don't really generate a `block of hardware' for function calls}
+  \vericertfun{} creates one block of hardware per function \emph{definition}. The enabling innovation is to extend \vericert{}'s intermediate language HTL with the ability to represent \emph{multiple} state machines, and then to add support for function calls that transfer control between these state machines. We are working to extend \vericert{}'s formal correctness proof to cover the translation from C source into this extended HTL language, and thence on to Verilog.
   
-We have benchmarked Vericert-Fun's performance on the PolyBench/C suite, and our results show the generated hardware having a resource usage of 61\% of Vericert's on average and 46\% in the best case, for only a 3\% average decrease in max frequency and 1\% average increase in cycle count.
+We have benchmarked \vericertfun's performance on the PolyBench/C suite, and our results show the generated hardware having a resource usage of 61\% of \vericert{}'s on average and 46\% in the best case, for only a 3\% average decrease in max frequency and 1\% average increase in cycle count.
 \end{abstract}
 
 \section{Introduction}
@@ -99,36 +98,36 @@ The drive for faster, more energy-efficient computation has led to a surge in de
 
 These tools are useful, but doubts have been raised about their reliability. For instance, \citet{Herklotz2021_empiricalstudy} found numerous miscompilation bugs in commercial HLS tools including Xilinx Vivado HLS~\cite{xilinx_vitis}, Intel i++~\cite{intel_hls}, and LegUp~\cite{legup_CanisAndrew2011}. This unreliability can be a significant hindrance in the development process, especially given the long iteration times of hardware design compared to software. It also undermines the usefulness of HLS in safety- or security-critical settings. It is therefore essential to ensure that HLS tools are as reliable as possible.
 
-Vericert~\cite{Herklotz2020} is a new HLS tool that aims to address this issue. Its correctness has been checked to the highest possible standard: machine-checked proof. It provides a proof, checked using the Coq proof assistant~\cite{coq}, that every step of its translation from C to Verilog preserves the semantics of (i.e.\ behaves the same way as) its input program. This proof means that we can always trust any Verilog design produced by Vericert to behave the same way as the C program given as input. %It is based on the CompCert~\cite{compcert_Leroy2009} verified C compiler.
+\vericert{}~\cite{Herklotz2020} is a new HLS tool that aims to address this issue. Its correctness has been checked to the highest possible standard: machine-checked proof. It provides a proof, checked using the Coq proof assistant~\cite{coq}, that every step of its translation from C to Verilog preserves the semantics of (i.e.\ behaves the same way as) its input program. This proof means that we can always trust any Verilog design produced by \vericert{} to behave the same way as the C program given as input. %It is based on the CompCert~\cite{compcert_Leroy2009} verified C compiler.
 
-Clearly, however, it is not enough for an HLS tool simply to be \emph{correct}. The generated hardware must also meet several other desiderata, including high throughput, low latency, and good \emph{area efficiency}, which is the topic of this paper. A common optimisation employed by HLS tools to improve area efficiency is \emph{resource sharing}; that is, re-using hardware for more than one purpose. Accordingly, our work adds resource sharing to Vericert. Keeping with the aims of the Vericert project, work is ongoing to extend the correctness proof.
+Clearly, however, it is not enough for an HLS tool simply to be \emph{correct}. The generated hardware must also meet several other desiderata, including high throughput, low latency, and good \emph{area efficiency}, which is the topic of this paper. A common optimisation employed by HLS tools to improve area efficiency is \emph{resource sharing}; that is, re-using hardware for more than one purpose. Accordingly, our work adds resource sharing to \vericert{}. Keeping with the aims of the \vericert{} project, work is ongoing to extend the correctness proof.
 
 \section{Background}
 
-\paragraph{The Coq proof assistant} Vericert is implemented using the Coq proof assistant~\cite{coq}. This means that it consists of a collection of functions that define the compilation process, together with the proof of a theorem stating that those definitions constitute a correct HLS tool. Coq mechanically checks this proof using a formal mathematical calculus, and then automatically translates the function definitions into OCaml code that can be compiled and executed. 
+\paragraph{The Coq proof assistant} \vericert{} is implemented using the Coq proof assistant~\cite{coq}. This means that it consists of a collection of functions that define the compilation process, together with the proof of a theorem stating that those definitions constitute a correct HLS tool. Coq mechanically checks this proof using a formal mathematical calculus, and then automatically translates the function definitions into OCaml code that can be compiled and executed. 
 
 Engineering a software system within a proof assistant like Coq is widely held to be the gold standard for correctness. Recent years have shown that it is feasible to design substantial pieces of software in this way, such as database management systems~\cite{malecha+10}, web servers~\cite{chlipala15}, and operating system kernels~\cite{gu+16}. Coq has also been successfully deployed in the hardware design process, both in academia~\cite{braibant+13, bourgeat+20} and in industry~\cite{silveroak}. It has even been applied specifically to the HLS process: Faissole et al.~\cite{faissole+19} used it to verify that HLS optimisations respect dependencies present in the source code.
 
 \paragraph{The CompCert verified C compiler} Among the most celebrated applications of Coq is the CompCert project~\cite{compcert}. CompCert is a lightly optimising C compiler, with backend support for the Arm, x86, PowerPC, and Kalray VLIW architectures~\cite{six+20}, that has been implemented and proven correct using Coq. CompCert accepts most of the C99 language, and generally produces code of comparable performance to that produced by GCC at optimisation level \texttt{-O1}. It transforms its input through a series of ten intermediate languages before generating the final output. This design ensures that each individual pass remains relatively simple and hence feasible to prove correct. The correctness proof of the entire compiler is formed by composing the correctness proofs of each of its passes. 
 
-That the Csmith compiler testing tool has found hundreds of bugs in GCC and LLVM but has never found a single bug in (the verified parts of) CompCert~\cite{csmith} is a testament to the reliability of this development approach. That said, we cannot guarantee that hardware produced via Vericert-Fun will never go wrong, because of fallibilities in components not covered by the correctness theorem, including the computer checking the proofs, the pretty-printer of the final Verilog design, the synthesis toolchain~\cite{verismith}, and the FPGA device itself.
+That the Csmith compiler testing tool has found hundreds of bugs in GCC and LLVM but has never found a single bug in (the verified parts of) CompCert~\cite{csmith} is a testament to the reliability of this development approach. That said, we cannot guarantee that hardware produced via \vericertfun{} will never go wrong, because of fallibilities in components not covered by the correctness theorem, including the computer checking the proofs, the pretty-printer of the final Verilog design, the synthesis toolchain~\cite{verismith}, and the FPGA device itself.
  
-\paragraph{The Vericert verified HLS tool}
+\paragraph{The \vericert{} verified HLS tool}
 
-Introduced by \citet{Herklotz2020}, Vericert is a verified C-to-Verilog HLS tool. It is an extension of CompCert, essentially augmenting the existing verified C compiler with a new hardware-oriented intermediate language (called HTL) and a Verilog backend. In its current form, Vericert performs no significant optimisations beyond those it inherits from CompCert's frontend. This results in performance generally about one order of magnitude slower than that achieved by comparable, unverified HLS tools such as LegUp~\cite{legup_CanisAndrew2011}.
+Introduced by \citet{Herklotz2020}, \vericert{} is a verified C-to-Verilog HLS tool. It is an extension of CompCert, essentially augmenting the existing verified C compiler with a new hardware-oriented intermediate language (called HTL) and a Verilog backend. In its current form, \vericert{} performs no significant optimisations beyond those it inherits from CompCert's frontend. This results in performance generally about one order of magnitude slower than that achieved by comparable, unverified HLS tools such as LegUp~\cite{legup_CanisAndrew2011}.
 
-Vericert branches off from CompCert at the intermediate language called \emph{register-transfer language} (RTL). Since that abbreviation is usually used in the hardware community for `register-transfer level', we shall henceforth avoid possible confusion by referring to the CompCert intermediate language as `3AC' (for `three-address code').
+\vericert{} branches off from CompCert at the intermediate language called \emph{register-transfer language} (RTL). Since that abbreviation is usually used in the hardware community for `register-transfer level', we shall henceforth avoid possible confusion by referring to the CompCert intermediate language as `3AC' (for `three-address code').
 
-In the 3AC language, each function in the program is represented as a numbered list of instructions with gotos -- essentially, a control-flow graph (CFG). The essence of Vericert's compilation strategy is to treat this CFG as a state machine, with each instruction in the CFG becoming a state, and each edge in the CFG becoming a transition. Moreover, program variables that do not have their address taken are mapped to hardware registers; other variables (including arrays and structs) are allocated in a block of RAM that represents the stack. More precisely, Vericert builds a \emph{finite state machine with datapath} (FSMD). This comprises two maps, both of which take the current state as their input: the \emph{control logic} map for determining the next state, and a \emph{datapath} map for updating the RAM and registers. These state machines are captured in Vericert's new intermediate language, HTL. When Vericert compiles from HTL to the final Verilog output, these maps are converted from proper `mathematical' functions into syntactic Verilog case-statements, and each is placed inside an always-block.
+In the 3AC language, each function in the program is represented as a numbered list of instructions with gotos -- essentially, a control-flow graph (CFG). The essence of \vericert{}'s compilation strategy is to treat this CFG as a state machine, with each instruction in the CFG becoming a state, and each edge in the CFG becoming a transition. Moreover, program variables that do not have their address taken are mapped to hardware registers; other variables (including arrays and structs) are allocated in a block of RAM that represents the stack. More precisely, \vericert{} builds a \emph{finite state machine with datapath} (FSMD). This comprises two maps, both of which take the current state as their input: the \emph{control logic} map for determining the next state, and a \emph{datapath} map for updating the RAM and registers. These state machines are captured in \vericert{}'s new intermediate language, HTL. When \vericert{} compiles from HTL to the final Verilog output, these maps are converted from proper `mathematical' functions into syntactic Verilog case-statements, and each is placed inside an always-block.
 
-The overall Vericert flow is shown at the top of Figure~\ref{fig:flow}. The key point to note here is the `inlining' step, which folds all function definitions into their call sites. This allows Vericert to make the simplifying assumption that there is only a single CFG, but has the unwanted effect of duplicating hardware. In this work, we remove some of this inlining and hence some of the duplication.
+The overall \vericert{} flow is shown at the top of Figure~\ref{fig:flow}. The key point to note here is the `inlining' step, which folds all function definitions into their call sites. This allows \vericert{} to make the simplifying assumption that there is only a single CFG, but has the unwanted effect of duplicating hardware. In this work, we remove some of this inlining and hence some of the duplication.
 
 \paragraph{Resource sharing in HLS}
 
-Resource sharing is a feature expected of most HLS compilers. In a typical architecture generated by HLS~\cite{coussy+09}, a number of `functional components' are selected from a library according to the needs of the specific design. In the scheduling process, a clock cycle is chosen for each operations such that the component is available. Given the need to mechanically verify the correctness of our implementation, Vericert-Fun follows a simpler approach: we share resources at the granularity of entire functions, rather than individual operations. Function-level resource sharing is implemented in commercial HLS compilers such as the Intel HLS compiler~\cite{intel_hls} or
+Resource sharing is a feature expected of most HLS compilers. In a typical architecture generated by HLS~\cite{coussy+09}, a number of `functional components' are selected from a library according to the needs of the specific design, and in the scheduling process, a clock cycle is chosen for each operation such that the components it requires are available. Given the need to mechanically verify the correctness of our implementation, \vericertfun{} follows a simpler approach: we share resources at the granularity of entire functions, rather than individual operations. Function-level resource sharing is implemented in commercial HLS compilers such as the Intel HLS compiler~\cite{intel_hls} or
 Xilinx Vitis~\cite{xilinx_vitis}, and is guided by the programmer through appropriate pragmas.
 
-Perna et al.~\cite{perna+12} developed a verified HLS tool for the Handel-C language, but, like Vericert, they did not implement function-level resource sharing, instead arranging that ``all procedure and function calls are expanded in the
+Perna et al.~\cite{perna+12} developed a verified HLS tool for the Handel-C language, but, like \vericert{}, they did not implement function-level resource sharing, instead arranging that ``all procedure and function calls are expanded in the
 front-end''.
 
 \begin{figure}
@@ -194,11 +193,11 @@ front-end''.
 \draw[flowlines, -latex] (background1.south) to node [auto] {\bf this work} (background2.north -| background1.south);
 
 \end{tikzpicture}
-\caption{Key compilation passes and intermediate languages in Vericert~\cite{Herklotz2020} (top) and Vericert-Fun (bottom)}
+\caption{Key compilation passes and intermediate languages in \vericert{}~\cite{Herklotz2020} (top) and \vericertfun{} (bottom)}
 \label{fig:flow}
 \end{figure}
 
-\section{Implementation of Vericert-Fun}
+\section{Implementation of \vericertfun}
 
 \tikzset{
 st/.style={draw=black, fill=white, rounded corners, align=left, font=\tt\footnotesize, minimum width=40mm},
@@ -209,7 +208,7 @@ tit/.style={anchor=north west, font=\tt\footnotesize},
 edlab/.style={auto, inner sep=2pt, align=left, font=\tt\footnotesize}
 }
 
-In this section, we shall explain the implementation of Vericert-Fun, using Figure~\ref{fig:example_C} as a worked example. The basic idea is shown at the bottom of Figure~\ref{fig:flow}: we avoid inlining the function calls at the 3AC level (except in certain circumstances described below), instead maintaining one state machine per function. All the state machines run simultaneously, and function calls are implemented by sending messages between the state machines. We combine all of these state machines into a single Verilog module, after renaming variables as necessary to avoid clashes.
+In this section, we shall explain the implementation of \vericertfun, using Figure~\ref{fig:example_C} as a worked example. The basic idea is shown at the bottom of Figure~\ref{fig:flow}: we avoid inlining the function calls at the 3AC level (except in certain circumstances described below), instead maintaining one state machine per function. All the state machines run simultaneously, and function calls are implemented by sending messages between the state machines. We combine all of these state machines into a single Verilog module, after renaming variables as necessary to avoid clashes.
 
 \begin{figure}
 \centering
@@ -394,32 +393,24 @@ In more detail: Execution begins in state 9 of the \lstinline{main} machine, and
 
 The same sequence of events can also be understood using a timing diagram, as given in Figure~\ref{fig:timingdiagram}. In that diagram, the red lines indicate unspecified values. We see that each call of \lstinline{add} begins with a pulse on \lstinline{add.rst} (fifth waveform) and ends with a pulse on \lstinline{add.fin} (sixth waveform).
 
-\JW{Still need to talk about:
-\begin{itemize}
-\item It might be nice to motivate the additional idle state at the end, like in mp's thesis. 
-\item Need to explain and motivate the assumption made about no pointers in called functions.
-\item Explain externctrl.
-\item Talk about alternative implementations that we considered, e.g. separate Verilog modules, and why we discounted them.
-\end{itemize}}
+One technical challenge we encountered in the implementation of \vericertfun{} has to do with the fact that the caller and callee state machines modify each other's variables. This is problematic because each function is translated into a state machine independently, and hence the variable names used in the other state machines are not necessarily fixed yet. We work around this problem by introducing an additional ... \JW{TODO: finish explaining externctrl.}
+
+\JW{Explain the assumption made about no pointers in called functions.}
 
-\section{Proving Vericert-Fun correct}
+\section{Proving \vericertfun{} correct}
 
-The CompCert correctness theorem~\cite{compcert} expresses that every behaviour that can be exhibited by the compiled program is also a behaviour of the source program. Vericert~\cite{Herklotz2020} adapted this theorem for HLS by replacing `compiled program' with `generated Verilog design'. In both cases, a formal semantics is required for the source and target languages. Vericert-Fun targets the same fragment of the Verilog language as \citeauthor{Herklotz2020} already mechanised in Coq, so no changes are required there. 
+The CompCert correctness theorem~\cite{compcert} expresses that every behaviour that can be exhibited by the compiled program is also a behaviour of the source program. \vericert{}~\cite{Herklotz2020} adapted this theorem for HLS by replacing `compiled program' with `generated Verilog design'. In both cases, a formal semantics is required for the source and target languages. \vericertfun{} targets the same fragment of the Verilog language as \citeauthor{Herklotz2020} already mechanised in Coq, so no changes are required there. 
 
 Where changes \emph{are} required is in the semantics of the intermediate language HTL, which sits between CompCert's 3AC and the final Verilog.
-When \citeauthor{Herklotz2020} designed HTL, they did not include a semantics for function calls because they assumed all function calls would already have been inlined. We have extended HTL so that its semantics is additionally parameterised by an environment that maps function names to state machines. We have added a semantics for function calls that looks up the named function in this environment, activates the corresponding state machine, and creates a new frame on the stack. We have also added a semantics for return statements that pops the current frame off the stack and reactivates the caller's state machine. 
+When \citeauthor{Herklotz2020} designed HTL, they did not include a semantics for function calls because they assumed all function calls would already have been inlined. We have extended HTL so that its semantics is additionally parameterised by an environment that maps function names to state machines. We have added a semantics for function calls that looks up the named function in this environment, activates the corresponding state machine, and creates a new frame on the stack. We have also added a semantics for return statements that pops the current frame off the stack and reactivates the caller's state machine. \MP{TODO: Consider saying something here about how the effect of a function call is to put the state machine into a particular `callstate'. My concern is that this requires introducing a different notion of `state', distinct from the states of the state machine, and that this will probably confuse readers.}
 
+At the time of writing, the correctness of \vericertfun{} from C to HTL has been mostly proven. \YH{TODO: Clarify what `mostly' means.} The pass that renames variables in HTL is yet to be proven, as is the pass that generates the final Verilog. 
 
-\JW{Todo:
-\begin{itemize}
-\item Summarise what we have proved so far and what is left to prove.
-\item State how large the implementation is, how large the proof is, and the rough person-months both took.
-\item Explain what the value of the proof has been so far, even though it has not been finished. For instance, has it led to any implementation bugs being ironed? Has the proof process led to changes in the design?
-\end{itemize}}
+To give a rough idea of the scale and complexity of the task: the implementation of \vericertfun{} involved the addition of about \MP{\ref{???}} lines of Coq code to \vericert{} and took the first author \MP{\ref{???}} months. The correctness proof, so far, has taken \MP{\ref{???}} lines of Coq code and \MP{\ref{???}} months.
 
 \section{Performance evaluation}
 
-We now compare the performance of the hardware generated by Vericert-Fun against that generated by Vericert. Following \citet{Herklotz2020}, we use the PolyBench/C benchmark suite~\cite{polybench}. We used the Icarus Verilog simulator to determine the cycle counts of the generated designs. We used Xilinx Vivado 2017.1, targeting a Xilinx 7-series FPGA (XC7K70T) at 50MHz, to determine area usage and fmax.
+We now compare the performance of the hardware generated by \vericertfun{} against that generated by \vericert{}. Following \citet{Herklotz2020}, we use the PolyBench/C benchmark suite~\cite{polybench}. We used the Icarus Verilog simulator to determine the cycle counts of the generated designs. We used Xilinx Vivado 2017.1, targeting a Xilinx 7-series FPGA (XC7K70T) at 50MHz, to determine area usage and fmax.
 
 Figure~\ref{fig:results} summarises our results. The x-axis shows the impact of resource sharing on the speed of the hardware (as calculated by the cycle count divided by fmax); we see that all the data points lie very close to 1, which suggests no significant impact. On average the cycle count increases by 0.7\%; this modest increase is in line with expectations because our translation introduces an extra state per function call. The impact on fmax is similarly minimal, ranging between a 1.5\% increase and a 3.1\% decrease (0.2\% decrease on average).
 
@@ -445,7 +436,7 @@ The y-axis shows the impact of resource sharing on the area usage of the hardwar
 
 \section{Future work}
 
-Our immediate priority is to complete Vericert-Fun's correctness proof. In the medium term, we would like to improve our implementation of resource sharing by dropping the requirement to inline functions that access pointers; we anticipate that this will lead to further reductions in area usage. In the longer term, we are considering how to implement resource sharing even more effectively in a verified HLS tool, perhaps by implementing it as part of a resource-constrained scheduling algorithm~\cite{sdc}.
+Our immediate priority is to complete \vericertfun's correctness proof. In the medium term, we would like to improve our implementation of resource sharing by dropping the requirement to inline functions that access pointers; we anticipate that this will lead to further reductions in area usage. We would also like to make \vericertfun{} generate designs with one Verilog module per C function, as this is more idiomatic than packing all the state machines into a single module; we have not done this yet because it would require extending the formal Verilog semantics to handle multiple modules. \YH{TODO: Please check that sentence.} In the longer term, we are considering how to implement resource sharing even more effectively in a verified HLS tool, perhaps by implementing it as part of a resource-constrained scheduling algorithm~\cite{sdc}.
 
 
 \bibliographystyle{ACM-Reference-Format}
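
Note on the FSMD description in the Background section of the patched file: the text describes a state machine as two maps keyed by the current state, a control-logic map that picks the next state and a datapath map that updates the registers. The following is a minimal Python sketch of that idea only. It is not Vericert's HTL or its Coq definitions; the state numbers, the register names `i` and `acc`, and the helpers `step` and `run` are all hypothetical, chosen purely for illustration.

```python
from typing import Callable, Dict, Tuple

Regs = Dict[str, int]

# Hypothetical CFG that computes n + (n-1) + ... + 1.
# State numbers and register names are illustrative assumptions.
ENTRY, LOOP, BODY, DONE = 3, 2, 1, 0

# Control-logic map: current state -> function choosing the next state.
control: Dict[int, Callable[[Regs], int]] = {
    ENTRY: lambda r: LOOP,
    LOOP: lambda r: DONE if r["i"] == 0 else BODY,
    BODY: lambda r: LOOP,
    DONE: lambda r: DONE,
}

# Datapath map: current state -> function computing register updates.
datapath: Dict[int, Callable[[Regs], Regs]] = {
    ENTRY: lambda r: {"acc": 0},
    LOOP: lambda r: {},
    BODY: lambda r: {"acc": r["acc"] + r["i"], "i": r["i"] - 1},
    DONE: lambda r: {},
}


def step(state: int, regs: Regs) -> Tuple[int, Regs]:
    """Advance one clock cycle.

    Both maps read the register values from before the step, loosely
    mimicking two always-blocks that sample the same clock edge.
    """
    next_state = control[state](regs)
    updates = datapath[state](regs)
    return next_state, {**regs, **updates}


def run(n: int, max_cycles: int = 10_000) -> Tuple[int, int]:
    """Run the machine to completion; return (result, cycle count)."""
    state, regs = ENTRY, {"i": n, "acc": 0}
    cycles = 0
    while state != DONE:
        if cycles >= max_cycles:
            raise RuntimeError("state machine did not terminate")
        state, regs = step(state, regs)
        cycles += 1
    return regs["acc"], cycles
```

For example, `run(4)` walks the states ENTRY, LOOP, BODY, LOOP, ... and returns the sum together with the number of cycles taken. Keeping the two maps separate mirrors the split the paper describes, where each map later becomes its own case-statement inside an always-block.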