1 files changed, 107 insertions, 94 deletions
diff --git a/verified_resource_sharing.tex b/verified_resource_sharing.tex
index 5d41c6c..d3a95c5 100644
--- a/verified_resource_sharing.tex
+++ b/verified_resource_sharing.tex
@@ -1,4 +1,18 @@
-\documentclass[hyphens,prologue,x11names,rgb,sigconf]{acmart}
+\documentclass[conference]{IEEEtran}
+\usepackage{amsmath}
+\usepackage{xcolor}
+\usepackage{caption}
+\usepackage{subcaption}
+\usepackage{graphicx}
+\usepackage{calc}  
+\usepackage{enumitem} 
+\usepackage{tabu}
+\usepackage{tikz}
+\usepackage{pgfplots}
+\usepackage{multirow}
+\usepackage{booktabs}
+\usepackage{hyperref}
+\usepackage[sort, numbers]{natbib}
 
 \usepackage[textsize=small,shadow]{todonotes}%
 \usepackage{soul}
@@ -9,8 +23,8 @@
 
 \usepackage{listings}
 \lstset{
-basicstyle=\tt,
-columns=fixed, basewidth=0.5em, %tip from https://tex.stackexchange.com/a/179072
+basicstyle=\tt\small,
+columns=fixed, %basewidth=0.5em, %tip from https://tex.stackexchange.com/a/179072
 escapeinside=||,
 }
 \setlength{\fboxsep}{1.5pt} % to reduce \colorbox padding when highlighting
@@ -51,32 +65,33 @@ escapeinside=||,
 \newcommand{\citeintext}[1] {\citeauthor{#1} in \citeyear{#1}~\cite{#1}}
 \newcommand{\citeshort}[1] {\citeauthor{#1} \citeyear{#1}~\cite{#1}}
 
+\title{Resource Sharing for Verified High-Level Synthesis \\ (Work in Progress)}
+
 \begin{document}
 
-\title{Resource Sharing for Verified High-Level Synthesis}
+%\author{Michail Pardalos}
+%\email{michail.pardalos17@imperial.ac.uk}
+%\affiliation{\institution{Imperial College London}}
 
-\author{Michail Pardalos}
-\email{michail.pardalos17@imperial.ac.uk}
-\affiliation{\institution{Imperial College London}}
+%\author{Yann Herklotz}
+%\email{yann.herklotz15@imperial.ac.uk}
+%\affiliation{\institution{Imperial College London}}
 
-\author{Yann Herklotz}
-\email{yann.herklotz15@imperial.ac.uk}
-\affiliation{\institution{Imperial College London}}
+%\author{John Wickerson}
+%\email{j.wickerson@imperial.ac.uk}
+%\affiliation{\institution{Imperial College London}}
 
-\author{John Wickerson}
-\email{j.wickerson@imperial.ac.uk}
-\affiliation{\institution{Imperial College London}}
+\maketitle
 
 \begin{abstract}
   High-level synthesis (HLS) is playing an ever-increasing role in hardware design, but concerns have been raised about its reliability. Seeking to address these concerns, Herklotz et al. have recently developed an HLS compiler, called Vericert, that has been mechanically proven (using the Coq proof assistant) to output Verilog designs that are behaviourally equivalent to the input C program. Unfortunately, Vericert's output cannot compete performance-wise with that of established HLS tools such as LegUp. A major reason for this is Vericert's complete lack of support for resource sharing. 
   
-  In this paper, we present Vericert-Fun: Vericert extended with function-level resource sharing. Where original Vericert creates one block of hardware per function \emph{call} \MP{Is this an over-simplification? We don't really generate a "block of hardware" for function calls}, Vericert-Fun creates one block of hardware per function \emph{definition}. The enabling innovation is to extend Vericert's intermediate language HTL with the ability to represent \emph{multiple} state machines, and then to add support for function calls that transfer control between these state machines. We are working to extend Vericert's formal correctness proof to cover the translation from C source into this extended HTL language, and thence on to Verilog.
+  In this paper, we present Vericert-Fun: Vericert extended with function-level resource sharing. Where original Vericert creates one block of hardware per function \emph{call}, %\MP{Is this an over-simplification? We don't really generate a `block of hardware' for function calls}
+  Vericert-Fun creates one block of hardware per function \emph{definition}. The enabling innovation is to extend Vericert's intermediate language HTL with the ability to represent \emph{multiple} state machines, and then to add support for function calls that transfer control between these state machines. We are working to extend Vericert's formal correctness proof to cover the translation from C source into this extended HTL language, and thence on to Verilog.
   
 We have benchmarked Vericert-Fun's performance on the PolyBench/C suite, and our results show the generated hardware having a resource usage of 61\% of Vericert's on average and 46\% in the best case, for only a 3\% average decrease in max frequency and 1\% average increase in cycle count.
 \end{abstract}
 
-\maketitle
-
 \section{Introduction}
 \label{sec:introduction}
 
@@ -94,8 +109,10 @@ Clearly, however, it is not enough for an HLS tool simply to be \emph{correct}.
 
 Engineering a software system within a proof assistant like Coq is widely held to be the gold standard for correctness. Recent years have shown that it is feasible to design substantial pieces of software in this way, such as database management systems~\cite{malecha+10}, web servers~\cite{chlipala15}, and operating system kernels~\cite{gu+16}. Coq has also been successfully deployed in the hardware design process, both in academia~\cite{braibant+13, bourgeat+20} and in industry~\cite{silveroak}. It has even been applied specifically to the HLS process: Faissole et al.~\cite{faissole+19} used it to verify that HLS optimisations respect dependencies present in the source code.
 
-\paragraph{The CompCert verified C compiler} Among the most celebrated applications of Coq is the CompCert project~\cite{compcert}. CompCert is a lightly optimising C compiler, with backend support for the Arm, x86, PowerPC, and Kalray VLIW architectures~\cite{six+20}, that has been implemented and proven correct using Coq. CompCert accepts most of the C99 language, and generally produces code of comparable performance to that produced by GCC at optimisation level \texttt{-O1}. It transforms its input through a series of ten intermediate languages before generating the final output. This design ensures that each individual pass remains relatively simple and hence feasible to prove correct. The correctness proof of the entire compiler is formed by composing the correctness proofs of each of its passes.
+\paragraph{The CompCert verified C compiler} Among the most celebrated applications of Coq is the CompCert project~\cite{compcert}. CompCert is a lightly optimising C compiler, with backend support for the Arm, x86, PowerPC, and Kalray VLIW architectures~\cite{six+20}, that has been implemented and proven correct using Coq. CompCert accepts most of the C99 language, and generally produces code of comparable performance to that produced by GCC at optimisation level \texttt{-O1}. It transforms its input through a series of ten intermediate languages before generating the final output. This design ensures that each individual pass remains relatively simple and hence feasible to prove correct. The correctness proof of the entire compiler is formed by composing the correctness proofs of each of its passes. 
 
+That the Csmith compiler testing tool has found hundreds of bugs in GCC and LLVM but has never found a single bug in (the verified parts of) CompCert~\cite{csmith} is a testament to the reliability of this development approach. That said, we cannot guarantee that hardware produced via Vericert-Fun will never go wrong, because of fallibilities in components not covered by the correctness theorem, including the computer checking the proofs, the pretty-printer of the final Verilog design, the synthesis toolchain~\cite{verismith}, and the FPGA device itself.
+ 
 \paragraph{The Vericert verified HLS tool}
 
 Introduced by \citet{Herklotz2020}, Vericert is a verified C-to-Verilog HLS tool. It is an extension of CompCert, essentially augmenting the existing verified C compiler with a new hardware-oriented intermediate language (called HTL) and a Verilog backend. In its current form, Vericert performs no significant optimisations beyond those it inherits from CompCert's frontend. This results in performance generally about one order of magnitude slower than that achieved by comparable, unverified HLS tools such as LegUp~\cite{legup_CanisAndrew2011}.
@@ -106,12 +123,19 @@ In the 3AC language, each function in the program is represented as a numbered l
 
 The overall Vericert flow is shown at the top of Figure~\ref{fig:flow}. The key point to note here is the `inlining' step, which folds all function definitions into their call sites. This allows Vericert to make the simplifying assumption that there is only a single CFG, but has the unwanted effect of duplicating hardware. In this work, we remove some of this inlining and hence some of the duplication.
 
-\begin{figure}
+\paragraph{Resource sharing in HLS}
+
+Resource sharing is a feature expected of most HLS compilers. In a typical architecture generated by HLS~\cite{coussy+09}, a number of `functional components' are selected from a library according to the needs of the specific design. In the scheduling process, a clock cycle is chosen for each operations such that the component is available. Given the need to mechanically verify the correctness of our implementation, Vericert-Fun follows a simpler approach: we share resources at the granularity of entire functions, rather than individual operations. Function-level resource sharing is implemented in commercial HLS compilers such as the Intel HLS compiler~\cite{intel_hls} or
+Xilinx Vitis~\cite{xilinx_vitis}, and is guided by the programmer through appropriate pragmas.
+
+Perna et al.~\cite{perna+12} developed a verified HLS tool for the Handel-C language, but, like Vericert, they did not implement function-level resource sharing, instead arranging that ``all procedure and function calls are expanded in the
+front-end''.
 
+\begin{figure}
 \definecolor{colorflow}{HTML}{8EB85B}
 
+\centering
 \begin{tikzpicture}[yscale=-1]
-
 \tikzset{flowlines/.style={draw, colorflow, ultra thick}}
 
 \node(fun1) at (1.5,2) {function};
@@ -131,9 +155,10 @@ The overall Vericert flow is shown at the top of Figure~\ref{fig:flow}. The key
 \draw[->] (CFG2) to (CFG);
 \draw[->] (CFG) to node [auto] (smg) {state machine generation} (FSMD);
 \draw[->] (FSMD) to node [auto] {Verilog generation} (module);
+\coordinate (r) at (6,3);
 
 \begin{pgfonlayer}{background}
-\node[flowlines, fit=(C)(module)(smg)] (background1) {};
+\node[flowlines, fit=(C)(module)(r)] (background1) {};
 \end{pgfonlayer}
 
 \begin{scope}[yshift=55mm]
@@ -158,11 +183,12 @@ The overall Vericert flow is shown at the top of Figure~\ref{fig:flow}. The key
 \draw[->] (FSMD2) to (FSMD4);
 \draw[->] (FSMD3) to node [auto, pos=0.2] {Verilog generation} (module);
 \draw[->] (FSMD4) to (module);
+\coordinate (r) at (6,3);
 
 \end{scope}
 
 \begin{pgfonlayer}{background}
-\node[flowlines, fit=(C)(module)(smg)] (background2) {};
+\node[flowlines, fit=(C)(module)(r)] (background2) {};
 \end{pgfonlayer}
 
 \draw[flowlines, -latex] (background1.south) to node [auto] {\bf this work} (background2.north -| background1.south);
@@ -175,37 +201,38 @@ The overall Vericert flow is shown at the top of Figure~\ref{fig:flow}. The key
 \section{Implementation of Vericert-Fun}
 
 \tikzset{
-st/.style={draw=black, fill=white, rounded corners, align=left, font=\tt, minimum width=40mm},
+st/.style={draw=black, fill=white, rounded corners, align=left, font=\tt\footnotesize, minimum width=40mm},
 ed/.style={draw, ->},
 exted/.style={draw, thick, dotted, ->, rounded corners},
 lab/.style={anchor=south west, inner sep=1pt, font=\tiny},
-tit/.style={anchor=north west, font=\tt},
-edlab/.style={auto, inner sep=2pt, align=left, font=\tt}
+tit/.style={anchor=north west, font=\tt\footnotesize},
+edlab/.style={auto, inner sep=2pt, align=left, font=\tt\footnotesize}
 }
 
 In this section, we shall explain the implementation of Vericert-Fun, using Figure~\ref{fig:example_C} as a worked example. The basic idea is shown at the bottom of Figure~\ref{fig:flow}: we avoid inlining the function calls at the 3AC level (except in certain circumstances described below), instead maintaining one state machine per function. All the state machines run simultaneously, and function calls are implemented by sending messages between the state machines. We combine all of these state machines into a single Verilog module, after renaming variables as necessary to avoid clashes.
 
 \begin{figure}
 \centering
+\begin{minipage}[b]{0.45\linewidth}
 \begin{lstlisting}
-int add(int a, int b) { 
+int add(int a, int b) 
+{ 
   |\HL5{return a + b;}| 
 }
 
-int main() {
+int main() 
+{
   |\HL1{int v = 0;}|
   |\HL2{v = add(v, 1);}|
   |\HL3{v = add(v, 2);}|
   |\HL4{return v;}|
 }
 \end{lstlisting}
-\caption{Running example: C program, with each instruction colour-coded so it can be tracked through the compilation stages}
+\caption{A simple C program with each instruction colour-coded so it can be tracked through the compilation stages}
 \label{fig:example_C}
-\end{figure}
-
-Figure~\ref{fig:example_3AC} shows the 3AC representation of that C program, as obtained by the CompCert frontend. It takes the form of two CFGs, one per function. The control flow in this example is straightforward, but in general, 3AC programs can contain unrestricted gotos. The nodes of the CFGs are numbered in reverse order, as are the parameters of the \lstinline{add} function; both are conventions inherited from CompCert.
-
-\begin{figure}
+\end{minipage}
+\hfill
+\begin{minipage}[b]{0.5\linewidth}
 % \begin{lstlisting}
 % add (x2, x1) {
 %   |\HL5{2: x3 = x2 + x1 + 0 (int)}|
@@ -224,7 +251,7 @@ Figure~\ref{fig:example_3AC} shows the 3AC representation of that C program, as
 % }
 % \end{lstlisting}
 \begin{tikzpicture}[node distance=2mm]
-\node[st, fill=highlight5]    (a2) at (0,-1) {x3 = x2 + x1 + 0 (int)};
+\node[st, fill=highlight5]    (a2) at (0,-1) {x3 = x2 + x1 + 0(int)};
 \node[st, fill=highlight5, below=of a2] (a1) {return x3};
 \draw[ed] (a2) to (a1);
 \node[tit] at ([xshift=-4mm, yshift=5mm]a2.north west) (labadd) {add(x2, x1) \{};
@@ -255,10 +282,14 @@ Figure~\ref{fig:example_3AC} shows the 3AC representation of that C program, as
 \foreach\i in {1,2}
   \node[lab, anchor=south east] at (a\i.west) {\i};
 \end{tikzpicture}
-\caption{Running example: 3AC representation comprising two CFGs}
+\caption{3AC representation comprising two CFGs}
 \label{fig:example_3AC}
+\end{minipage}
+
 \end{figure}
 
+Figure~\ref{fig:example_3AC} shows the 3AC representation of that C program, as obtained by the CompCert frontend. It takes the form of two CFGs, one per function. The control flow in this example is straightforward, but in general, 3AC programs can contain unrestricted gotos. The nodes of the CFGs are numbered in reverse order, as are the parameters of the \lstinline{add} function; both are conventions inherited from CompCert.
+
 Figure~\ref{fig:example_HTL} depicts the result of converting those CFGs into two state machines. The conversion of 3AC instructions into Verilog instructions has been described already by \citet{Herklotz2020}; what is new here is the handling of function calls, so the following description concentrates on that aspect. Before we proceed, there are two conventions worth noting: first, ``$*{\rightarrow}\langle\mathit{node}\rangle$'' should be interpreted as an abbreviation for edges from all nodes to $\langle\mathit{node}\rangle$, and second, we allow the \lstinline{main} machine to refer to the \lstinline{add} machine's register $\langle\mathit{reg}\rangle$ by writing ``\lstinline{add.}$\langle\mathit{reg}\rangle$''.
 
 The solid edges within the two state machines indicate the transfer of control between states, while the dashed edges between the state machines are more `fictional'. The ground truth is that both state machines run continuously, but it is convenient to think of only one of the machines as doing useful work at any point in time. So, the dashed edges indicate the points at which the `active' machine changes.
@@ -280,7 +311,7 @@ The solid edges within the two state machines indicate the transfer of control b
 \node[st, fill=highlight4, below=of m2] (m1)  {fin = 1; \\ ret = reg\_4;};
 \node[st, fill=highlight4, below=of m1] (m11) {fin <= 0;};
 
-\node[st, fill=highlight5, above=40mm of m9] (a2) {reg\_7 <= \{\{x2 + x1\} + 0\};};
+\node[st, fill=highlight5, above=38mm of m9] (a2) {reg\_7 <= \{\{x2 + x1\} + 0\};};
 \node[st, fill=highlight5, below=of a2] (a1) {fin = 1; \\ ret = reg\_7;};
 \node[st, fill=highlight5, below=of a1] (a3) {fin <= 0;};
 
@@ -333,46 +364,62 @@ The solid edges within the two state machines indicate the transfer of control b
 \draw[exted] ([yshift=1mm]a3.east) -- ++(21mm,0) |- (m3.east);
 
 \end{tikzpicture}
-\caption{Running example: HTL representation comprising two state machines}
+\caption{HTL representation comprising two state machines}
 \label{fig:example_HTL}
-\end{figure}
-
-In more detail: Execution begins in state 9 of the \lstinline{main} machine, and proceeds through successive states until it reaches state 7, in which the \lstinline{add} machine's \lstinline{rst} signal is raised. This puts the \lstinline{add} machine into state 2. When the \lstinline{main} machine advances to state 12, that \lstinline{rst} signal is lowered; the \lstinline{add} machine then begins its computation while the \lstinline{main} machine spins in state 12. When the \lstinline{add} machine has finished its computation (state 1), it asserts its \lstinline{fin} signal. This allows the \lstinline{main} machine to progress from state 12. Finally, the \lstinline{add} machine lowers its \lstinline{fin} and waits in state 3 until the next call.
-
-The same sequence of events can also be understood using a timing diagram, as given in Figure~\ref{fig:timingdiagram}. In that diagram, the red lines indicate unspecified values. We see that each call of \lstinline{add} begins with a pulse on \lstinline{add.rst} (fifth waveform) and ends with a pulse on \lstinline{add.fin} (sixth waveform).
-
-\begin{figure}
-\begin{tikztimingtable}[timing/d/background/.style={fill=white}, timing/xunit=1em]
-{\tt\small clk} & 0.5L 38{0.5C} \\
-{\tt\small rst} & 0.5H 19L \\
-{\tt\small fin} & 0.5X 17L H L \\
-{\tt\small state} & 0.5X D{9} D{8} D{7} 4D{12} D{6} D{5} D{4} 4D{10} D{3} D{2} D{1} 2D{11} \\
-{\tt\small rst} & 3.5X H 6L H 8L \\
-{\tt\small fin} & 4.5X 2L H 6L H 5L \\
-{\tt\small state} & 4.5X D{2} D{1} 5D{3} D{2} D{1} 6D{3} \\
+\medskip
+\begin{tikztimingtable}[timing/d/background/.style={fill=white}, timing/xunit=1em, timing/yunit=0.6em]
+{\tt\footnotesize clk}\!\!\! & 0.5L 38{0.5C} \\
+{\tt\footnotesize rst}\!\!\! & 0.5H 19L \\
+{\tt\footnotesize fin}\!\!\! & 0.5X 17L H L \\
+{\tt\footnotesize state}\!\!\! & 0.5X D{9} D{8} D{7} 4D{12} D{6} D{5} D{4} 4D{10} D{3} D{2} D{1} 2D{11} \\
+{\tt\footnotesize rst}\!\!\! & 3.5X H 6L H 8L \\
+{\tt\footnotesize fin}\!\!\! & 4.5X 2L H 6L H 5L \\
+{\tt\footnotesize state}\!\!\! & 4.5X D{2} D{1} 5D{3} D{2} D{1} 6D{3} \\
 \extracode
 \foreach\i in {1,2,...,19}
 \node[help lines, anchor=west, inner sep=0] at (\i-0.5,2.25) {\tiny \i};
 \begin{pgfonlayer}{background}
   \vertlines[help lines]{0.5,1.5,...,18.5}
 \end{pgfonlayer}
-\draw[decoration={brace,mirror,raise=5pt},decorate] (-3,-1) to node[left=6pt] {\tt\small main} (-3,-6.5);
-\draw[decoration={brace,mirror,raise=5pt},decorate] (-3,-7) to node[left=6pt] {\tt\small add} (-3,-12.5);
+\draw[decoration={brace,mirror,raise=5pt},decorate] (-2.6,-1) to node[left=6pt] {\tt\footnotesize main} (-2.6,-6.5);
+\draw[decoration={brace,mirror,raise=5pt},decorate] (-2.6,-7) to node[left=6pt] {\tt\footnotesize
+add} (-2.6,-12.5);
 \end{tikztimingtable}
 \caption{Timing diagram for the state machines in Figure~\ref{fig:example_HTL}}
 \label{fig:timingdiagram}
+
 \end{figure}
 
+In more detail: Execution begins in state 9 of the \lstinline{main} machine, and proceeds through successive states until it reaches state 7, in which the \lstinline{add} machine's \lstinline{rst} signal is raised. This puts the \lstinline{add} machine into state 2. When the \lstinline{main} machine advances to state 12, that \lstinline{rst} signal is lowered; the \lstinline{add} machine then begins its computation while the \lstinline{main} machine spins in state 12. When the \lstinline{add} machine has finished its computation (state 1), it asserts its \lstinline{fin} signal. This allows the \lstinline{main} machine to progress from state 12. Finally, the \lstinline{add} machine lowers its \lstinline{fin} and waits in state 3 until the next call.
+
+The same sequence of events can also be understood using a timing diagram, as given in Figure~\ref{fig:timingdiagram}. In that diagram, the red lines indicate unspecified values. We see that each call of \lstinline{add} begins with a pulse on \lstinline{add.rst} (fifth waveform) and ends with a pulse on \lstinline{add.fin} (sixth waveform).
+
 \JW{Still need to talk about:
 \begin{itemize}
-\item the additional idle state at the end
-\item externctrl
-\item alternative implementations that we considered, e.g. separate Verilog modules
+\item It might be nice to motivate the additional idle state at the end, like in mp's thesis. 
+\item Need to explain and motivate the assumption made about no pointers in called functions.
+\item Explain externctrl.
+\item Talk about alternative implementations that we considered, e.g. separate Verilog modules, and why we discounted them.
+\end{itemize}}
+
+\section{Proving Vericert-Fun correct}
+
+The CompCert correctness theorem~\cite{compcert} expresses that every behaviour that can be exhibited by the compiled program is also a behaviour of the source program. Vericert~\cite{Herklotz2020} adapted this theorem for HLS by replacing `compiled program' with `generated Verilog design'. In both cases, a formal semantics is required for the source and target languages. Vericert-Fun targets the same fragment of the Verilog language as \citeauthor{Herklotz2020} already mechanised in Coq, so no changes are required there. 
+
+Where changes \emph{are} required is in the semantics of the intermediate language HTL, which sits between CompCert's 3AC and the final Verilog.
+When \citeauthor{Herklotz2020} designed HTL, they did not include a semantics for function calls because they assumed all function calls would already have been inlined. We have extended HTL so that its semantics is additionally parameterised by an environment that maps function names to state machines. We have added a semantics for function calls that looks up the named function in this environment, activates the corresponding state machine, and creates a new frame on the stack. We have also added a semantics for return statements that pops the current frame off the stack and reactivates the caller's state machine. 
+
+
+\JW{Todo:
+\begin{itemize}
+\item Summarise what we have proved so far and what is left to prove.
+\item State how large the implementation is, how large the proof is, and the rough person-months both took.
+\item Explain what the value of the proof has been so far, even though it has not been finished. For instance, has it led to any implementation bugs being ironed? Has the proof process led to changes in the design?
 \end{itemize}}
 
 \section{Performance evaluation}
 
-We now compare the performance of the hardware generated by Vericert-Fun against that generated by Vericert. Following \citet{Herklotz2020}, we use the PolyBench/C benchmark suite~\cite{polybench}. \JW{Do we have to skip any of the benchmarks? Michalis's thesis talks about skipping adi and ludcmp -- is that still the case?} We used the Icarus Verilog simulator~\cite{icarus} to determine the cycle counts of the generated designs. We used Xilinx Vivado 2017.1, targeting a Xilinx 7-series FPGA (XC7K70T) at 50MHz, to determine area usage and fmax.
+We now compare the performance of the hardware generated by Vericert-Fun against that generated by Vericert. Following \citet{Herklotz2020}, we use the PolyBench/C benchmark suite~\cite{polybench}. We used the Icarus Verilog simulator to determine the cycle counts of the generated designs. We used Xilinx Vivado 2017.1, targeting a Xilinx 7-series FPGA (XC7K70T) at 50MHz, to determine area usage and fmax.
 
 Figure~\ref{fig:results} summarises our results. The x-axis shows the impact of resource sharing on the speed of the hardware (as calculated by the cycle count divided by fmax); we see that all the data points lie very close to 1, which suggests no significant impact. On average the cycle count increases by 0.7\%; this modest increase is in line with expectations because our translation introduces an extra state per function call. The impact on fmax is similarly minimal, ranging between a 1.5\% increase and a 3.1\% decrease (0.2\% decrease on average).
 
@@ -396,43 +443,9 @@ RESULTS GRAPH HERE.\par
 
 The y-axis shows the impact of resource sharing on the area usage of the hardware. On average we see a decrease in area usage of 12\%. The impact ranges from a 59\% decrease to a 6\% increase, with a 12\% decrease on average. It is hard to attribute precise reasons for this given the heuristic-driven nature of the synthesis process, but certainly the benchmarks with more function calls saw more benefit from resource sharing.
 
+\section{Future work}
 
-\section{Proving Vericert-Fun correct}
-
-The CompCert correctness theorem~\cite{compcert} expresses that every behaviour that can be exhibited by the compiled program is also a behaviour of the source program. Vericert~\cite{Herklotz2020} adapted this theorem for HLS by replacing `compiled program' with `generated Verilog design'. In both cases ...
-
-The behaviours of a Verilog design are determined by appealing to a formal semantics of the Verilog language, which \citeauthor{Herklotz2020} mechanised in Coq.
-
-In order to extend the Vericert correctness theorem to Vericert-Fun, we need to extend the Verilog semantics to handle ...
-
-\JW{Todo:
-\begin{itemize}
-\item What is needed for the correctness proof? (Semantics for C, semantics for Verilog, ...)
-\end{itemize}}
-
-
-\section{Rough notes}
-A few points that we might want to address at some point:
-
-\begin{itemize}
-
-\item What is so good about having a machine-checked correctness proof for an HLS tool? Explain that it is the gold-standard for high-integrity computer systems. Give example of Csmith finding 0 bugs in the verified parts of CompCert, compared to 100s in GCC and LLVM. Explain that machine-checked correctness proofs are becoming more common these days thanks to advances in automated theorem proving technology. Explain that we can't guarantee that hardware produced via Vericert will never go wrong, because:
-\begin{itemize}
-\item the translation from the Coq source code of Vericert into executable OCaml is unverified,
-\item the compilation of that OCaml program is unverified,
-\item the machine that runs the compiled OCaml program could go wrong, 
-\item the Coq proof assistant could contain a bug, as could the machine running it,
-\item the pretty-printing of the final Verilog design is unverified,
-\item the synthesis of that Verilog design into a netlist is unverified, as are the tools that perform place-and-route and FPGA bitstream generation,
-\item the execution on the FPGA could go wrong, e.g. as a result of cosmic rays.
-\end{itemize}
-
-\end{itemize}
-
-
-
-
-
+Our immediate priority is to complete Vericert-Fun's correctness proof. In the medium term, we would like to improve our implementation of resource sharing by dropping the requirement to inline functions that access pointers; we anticipate that this will lead to further reductions in area usage. In the longer term, we are considering how to implement resource sharing even more effectively in a verified HLS tool, perhaps by implementing it as part of a resource-constrained scheduling algorithm~\cite{sdc}.
 
 
 \bibliographystyle{ACM-Reference-Format}
@@ -440,7 +453,7 @@ A few points that we might want to address at some point:
 
 \appendix
 
-\section{Verilog output for running example}
+We provide the Verilog output that is generated from our running example (Figures~\ref{fig:example_C} to~\ref{fig:example_HTL}).
 
 \begin{lstlisting}
 module main (clk, rst, ret, fin)