 acmart.cls       |  2
 algorithm.tex    |  3
 evaluation.tex   |  1
 introduction.tex | 59
 proof.tex        |  4
 references.bib   | 66
 verilog.tex      |  7
 7 files changed, 108 insertions(+), 34 deletions(-)
diff --git a/acmart.cls b/acmart.cls
index e8b3b9b..239698b 100644
--- a/acmart.cls
+++ b/acmart.cls
@@ -489,7 +489,7 @@
\def\@makefnmark{\hbox{\@textsuperscript{\normalfont\@thefnmark}}}
\let\@footnotemark@nolink\@footnotemark
\let\@footnotetext@nolink\@footnotetext
-\RequirePackage[bookmarksnumbered,unicode,draft]{hyperref}
+\RequirePackage[bookmarksnumbered,unicode]{hyperref}
\pdfstringdefDisableCommands{%
\def\addtocounter#1#2{}%
\def\unskip{}%
diff --git a/algorithm.tex b/algorithm.tex
index ba56d2a..757857d 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -1,6 +1,5 @@
\section{Designing a verified HLS tool}
-
-%\JW{The first part of this section (up to 2.1) is good but needs tightening up. Ultimately, the point of this section is to explain that there's an existing verified compiler called CompCert which has a bunch of stages, and we need to make a decision about where to tap into that pipeline. Too early and we miss out on some helpful optimisations; too late and we've ended up too target-specific. What if you put a few more stages into Figure 1 -- there aren't actually that many missing anyway. Suppose you add in Cminor between C\#minor and 3AC. Then the high-level structure of your argument in this subsection could be: (1) why Cminor is too early, (2) why LTL is too late, and then maybe (3) why 3AC is just right. The Goldilocks structure, haha!}
+\label{sec:design}
This section covers the main architecture of the HLS tool, and the way in which the back end was added to \compcert{}. This section will also cover an example of converting a simple C program into hardware, expressed in the Verilog language.
%\JW{I've experimented with adding a few paragraph headings to the text below -- see what you think. The advantage of headings is that it can make the text easier to read quickly.}\YH{Yes I think it works well actually, makes the sections clearer.}
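The added \label and surrounding text promise an example of converting a simple C program into Verilog. For orientation, a toy version of that kind of translation might look as follows — this is an illustrative hand-written sketch, not Vericert's actual output, and all signal names are invented:

```verilog
// Illustrative sketch only -- not Vericert's actual output.
// Source C program:  int main() { int x = 2 + 3; return x; }
module main(input clk, input reset,
            output reg fin, output reg [31:0] ret);
  reg [31:0] state;  // FSM state register standing in for the program counter
  reg [31:0] x;
  always @(posedge clk)
    if (reset) begin
      state <= 32'd0;
      fin   <= 1'b0;
    end else
      case (state)
        32'd0: begin x <= 32'd2 + 32'd3; state <= 32'd1; end  // x = 2 + 3
        32'd1: begin ret <= x; fin <= 1'b1; end               // return x
      endcase
endmodule
```

The general shape — a state machine clocked on `clk`, with a finish flag and a return register — is the pattern the paper's later sections describe in detail.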
diff --git a/evaluation.tex b/evaluation.tex
index 13ab4b4..8f20b29 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -1,4 +1,5 @@
\section{Evaluation}
+\label{sec:evaluation}
\NR{
To-do list:
\begin{itemize}
diff --git a/introduction.tex b/introduction.tex
index 1ea640a..9fc5771 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -8,45 +8,48 @@
\subsection{Can you trust your high-level synthesis tool?}
-As latency, throughput and energy efficiency become increasingly important, custom hardware accelerators are being designed for numerous applications~\cite{??}.
+As latency, throughput and energy efficiency become increasingly important, custom hardware accelerators are being designed for numerous applications.
Alas, designing these accelerators can be a tedious and error-prone process using a hardware description language (HDL) such as Verilog.
An attractive alternative is high-level synthesis (HLS), in which hardware designs are automatically compiled from software written in a language like C.
-Modern HLS tools such as \legup{}~\cite{canis11_legup}, Vivado HLS~\cite{xilinx20_vivad_high_synth}, Intel i++~\cite{??}, and Bambu HLS~\cite{} can produce designs with comparable performance and energy-efficiency to those manually coded in HDL~\cite{bdti,autoesl}, while offering the convenient abstractions and rich ecosystem of software development.
+Modern HLS tools such as \legup{}~\cite{canis11_legup}, Vivado HLS~\cite{xilinx20_vivad_high_synth}, Intel i++~\cite{intel_hls}, and Bambu HLS~\cite{bambu_hls} promise designs with comparable performance and energy-efficiency to those manually coded in HDL~\cite{homsirikamol+14, silexicahlshdl, 7818341}, while offering the convenient abstractions and rich ecosystem of software development.
-But existing HLS tools cannot always guarantee that the hardware designs they produce are equivalent to the software they were given.
-This undermines any reasoning conducted at the software level.
-And there are reasons to doubt that HLS tools actually \emph{do} preserve equivalence. First, just like conventional compilers, HLS tools are large pieces of software that perform a series of complex analyses and transformations.
-Second, unlike conventional compilers, HLS tools target HDL, which has superficial syntactic similarities to software languages but a subtly different semantics, making it easy to introduce behavioural mismatches between the output of the HLS tool and its input.
+But existing HLS tools cannot always guarantee that the hardware designs they produce are equivalent to the software they were given, and this undermines any reasoning conducted at the software level.
+
+Indeed, there are reasons to doubt that HLS tools actually \emph{do} always preserve equivalence.
+%Some of these reasons are general: HLS tools are large pieces of software, they perform a series of complex analyses and transformations, and the HDL output they produce has superficial syntactic similarities to a software language but a subtly different semantics.
%Hence, the premise of this work is: Can we trust these compilers to translate high-level languages like C/C++ to HDL correctly?
-\JW{These doubts have been shown to be justified by... / These doubts have been borne out by / corroborated... }
-For instance, \citet{lidbury15_many_core_compil_fuzzin} had to abandon their attempt to fuzz-test Altera's (now Intel's) OpenCL compiler since it ``either crashed or emitted an internal compiler error'' on so many of their test inputs.
-More recently, Du et al.~\cite{du+20} fuzz-tested three commercial HLS tools using Csmith~\cite{yang11_findin_under_bugs_c_compil}, and despite restricting the generated programs to the C fragment explicitly supported by all the tools, they still found that on average 2.5\% of test cases generated a design that did not match the behaviour of the input.
-%
-Meanwhile, Xilinx's Vivado HLS has been shown to apply pipelining optimisations incorrectly\footnote{\url{https://bit.ly/vivado-hls-pipeline-bug}} or to silently generate wrong code should the programmer stray outside the fragment of C that it supports\footnote{\url{https://bit.ly/vivado-hls-pointer-bug}}.
+%Other reasons are more specific:
+For instance, Vivado HLS has been shown to apply pipelining optimisations incorrectly\footnote{\url{https://bit.ly/vivado-hls-pipeline-bug}} or to silently generate wrong code should the programmer stray outside the fragment of C that it supports.\footnote{\url{https://bit.ly/vivado-hls-pointer-bug}}
+Meanwhile, \citet{lidbury15_many_core_compil_fuzzin} had to abandon their attempt to fuzz-test Altera's (now Intel's) OpenCL compiler since it ``either crashed or emitted an internal compiler error'' on so many of their test inputs.
+And more recently, Du et al.~\cite{du+20} fuzz-tested three commercial HLS tools using Csmith~\cite{yang11_findin_under_bugs_c_compil}, and despite restricting the generated programs to the C fragment explicitly supported by all the tools, they still found that on average 2.5\% of test cases generated a design that did not match the behaviour of the input.
+
+\subsection{Existing workarounds}
+
+Aware of the reliability shortcomings of HLS tools, hardware designers routinely check the generated hardware for functional correctness. This is commonly done by simulating the design against a large test-bench; however, the guarantees are only as good as the test-bench, so unless all possible inputs are exercised exhaustively, there is a risk that bugs remain.
-\subsection{Existing verification workarounds}
+An alternative is to use \emph{translation validation} to formally prove the input and output equivalent~\cite{pnueli98_trans}. Translation validation has been successfully applied to several HLS optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}.
+Yet translation validation must be repeated every time the compiler is invoked, and it is an expensive task, especially for large designs.
+For example, the translation validation for Catapult C~\cite{mentor20_catap_high_level_synth, slec_whitepaper} may require several rounds of expert modifications to the input C program before validation succeeds. And even when it succeeds, translation validation does not provide watertight guarantees unless the validator itself has been mechanically proven correct, which is seldom the case.
-It is rather difficult to exhaustively test a HLS tool to prove for the absence of bugs since these codebases are very large and often include custom passes and directives.
-Hence, most existing work on verifying HLS tools focus on proving that HLS-generated hardware is equivalent to its software counterpart for all possible inputs of the program, or \emph{translation validation}~\cite{pnueli98_trans}.
-Translation validation has been successfully applied to many HLS optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}.
-However, translation validation often suffers from one key problem: the validator itself might not have been mechanically proven to be correct and even it is proven there can be an interpretation gap between the mechanised proof and its implementation.
-Furthermore, translation validation also has other practical problems: it needs to be invoked everytime a new program is compiled and it can also lead to exponential growth in state space.
-For example, Catapult C's translation validation~\cite{mentor20_catap_high_level_synth} warns that it might require large number of iterations to perform validation.
+Our position is that none of the above workarounds are necessary if the HLS tool can simply be trusted to work correctly.
-\subsection{Our approach}
-Proving the absence of bugs within HLS tools has always been a tricky and difficult process.
-A radical solution is to write an HLS tool in a theorem prover such as Coq~\cite{bertot04_inter_theor_provin_progr_devel}, so that the translation algorithm can be proven correct in the theorem prover itself. The algorithm can then be extracted to executable code, which exhibits the same properties as the Coq code that was proven correct in the theorem prover. In this paper we describe a fully verified HLS tool called \vericert{}, which adds a Verilog back end to \compcert{}, an existing C compiler that has been written and formally verified in the Coq theorem prover, and proves that the behaviour of the C code is preserved with respect to an existing Verilog semantics. The main contributions of the paper are the following:
+\subsection{Our solution}
+We have written a new HLS tool in the Coq theorem prover~\cite{bertot04_inter_theor_provin_progr_devel} and proven that any output it produces always has the same behaviour as its input. Our tool, called \vericert{}, is automatically extracted from Coq to an OCaml program, which ensures that the object of the proof is the same as the implementation of the tool. \vericert{} is built by extending the \compcert{} verified C compiler~\cite{leroy09_formal_verif_realis_compil} with a new hardware-specific intermediate language and a Verilog back end. It supports all C constructs except for function pointers, recursive function calls, non-integer types and global variables.
+
+\subsection{Contributions and Outline}
+
+The contributions of this paper are as follows:
\begin{itemize}
- \item We provide the first mechanised and formally verified HLS tool going from C to Verilog.
- \item Description of how the Verilog semantics integrate into \compcert{} and how it interacts with \compcert{}'s intermediate language.
- \item Benchmark comparisons between \vericert{} and LegUp~\cite{canis11_legup}, a well-known open source, unverified HLS tool on Polybench, a large C benchmark, and including some specific test cases from CHstone.
+ \item We present \vericert{}, the first mechanically verified HLS tool that compiles C to Verilog. The design of \vericert{} is described in Section~\ref{sec:design}.
+ \item We prove \vericert{} correct w.r.t. an existing semantics for Verilog due to \citet{loow19_formalise}. We describe in Section~\ref{sec:verilog} how we lightly extended this semantics to make it suitable as an HLS target. Section~\ref{sec:proof} describes the proof itself.
+ \item We have conducted a performance comparison between \vericert{} and a widely-used (unverified) HLS tool called \legup{}~\cite{canis11_legup} using the PolyBench benchmarks. As described in Section~\ref{sec:evaluation}, \vericert{} generates hardware that is about 9x slower and 21x less area-efficient than that generated by \legup{}. We expect that these numbers will improve as we develop \vericert{} with further optimisations like loop pipelining and scheduling.
\end{itemize}
-The first section will describe the Verilog semantics that were used and extended to fit into \compcert{}'s model. The second section will then describe the HLS algorithm, together with its proof.
-
-\vericert{} is open source and is hosted on GitHub\footnote{\url{https://github.com/ymherklotz/vericert}}.
-% }
+\vericert{} is fully open source and available online. \JW{We'll have to blind this (and maybe even the name of the tool itself) for submission.}
+\begin{center}
+\url{https://github.com/ymherklotz/vericert}
+\end{center}
%%% Local Variables:
diff --git a/proof.tex b/proof.tex
index 128377b..5d0a7c6 100644
--- a/proof.tex
+++ b/proof.tex
@@ -1,8 +1,10 @@
\section{Proof}
+\label{sec:proof}
+\NR{Now that we have adapted the Verilog semantics to the CompCert model, we are in a position to formally prove the correctness of our C to Verilog compilation.}
This section describes the main correctness theorem that was proven and the main ideas behind the proofs.
-The main correctness theorem is the exact same correctness theorem as stated in \compcert{}~\cite{leroy09_formal_verif_realis_compil} and states that for all Clight source programs $C$, if the translation from the source to the target Verilog code succeeds, assuming that $C$ has correct observable behaviour $B$ when executed, then the target Verilog code will simulate with the same behaviour $B$, where behaviour can either be convergent, divergent or ``going wrong'' and is associated with a trace $t$ of any external function calls. The following backwards simulation theorem describes this property, where $\Downarrow_{s}$ and $\Downarrow$ stand for simulation and execution respectively.
+The main correctness theorem is the same correctness theorem as stated in \compcert{}~\cite{leroy09_formal_verif_realis_compil}, which states that for all Clight source programs $C$, if the translation from the source to the target Verilog code succeeds, and assuming that $C$ has correct observable behaviour $B$ when executed, then the target Verilog code will simulate with the same behaviour $B$. Behaviour can either be convergent, divergent or ``going wrong'', and is associated with a trace $t$ of any external function calls. The following backwards simulation theorem describes this property, where $\Downarrow_{s}$ and $\Downarrow$ stand for simulation and execution respectively.
\begin{equation*}
\forall C, V, B \notin \texttt{Wrong},\, \yhfunction{HLS} (C) = \yhconstant{OK} (V) \land V \Downarrow_{s} B \implies C \Downarrow B.
diff --git a/references.bib b/references.bib
index bab444d..ec8e945 100644
--- a/references.bib
+++ b/references.bib
@@ -335,6 +335,23 @@
year = {2008},
}
+@inproceedings{loow19_formalise,
+ author = {Andreas L{\"{o}}{\"{o}}w and
+ Magnus O. Myreen},
+ title = {A proof-producing translator for Verilog development in {HOL}},
+ booktitle = {FormaliSE@ICSE},
+ pages = {99--108},
+ publisher = {{IEEE} / {ACM}},
+ year = {2019}
+}
+
+@misc{slec_whitepaper,
+ author = {Chauhan, Pankaj},
+ title = {Formally Ensuring Equivalence between C++ and RTL designs},
+ url = {https://bit.ly/2KbT0ki},
+ year = {2020},
+}
+
@inproceedings{loow19_verif_compil_verif_proces,
author = {L\"{o}\"{o}w, Andreas and Kumar, Ramana and Tan, Yong Kiam and
Myreen, Magnus O. and Norrish, Michael and Abrahamsson, Oskar
@@ -426,6 +443,14 @@
year = 2020,
}
+@misc{intel_hls,
+ author = {Intel},
+ title = {High-level Synthesis Compiler},
+ url = {https://intel.ly/2UDiWr5},
+ urldate = {2020-11-18},
+ year = {2020},
+}
+
@misc{intel20_sdk_openc_applic,
author = {Intel},
title = {{SDK} for {OpenCL} Applications},
@@ -434,6 +459,47 @@
year = 2020,
}
+@inproceedings{homsirikamol+14,
+ author = {Ekawat Homsirikamol and
+ Kris Gaj},
+ title = {Can high-level synthesis compete against a hand-written code in the
+ cryptographic domain? {A} case study},
+ booktitle = {ReConFig},
+ pages = {1--8},
+ publisher = {{IEEE}},
+ year = {2014}
+}
+
+@inproceedings{7818341,
+  author = {Maxime Pelcat and C\'edric Bourrasset and Luca Maggiani and Fran\c{c}ois Berry},
+  title = {Design productivity of a high level synthesis compiler versus {HDL}},
+  booktitle = {2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation ({SAMOS})},
+  pages = {140--147},
+  doi = {10.1109/SAMOS.2016.7818341},
+  year = {2016},
+}
+
+@misc{silexicahlshdl,
+  author = {Gauthier, Stephane and Wadood, Zubair},
+  title = {High-Level Synthesis: Can it outperform hand-coded {HDL}?},
+  url = {https://bit.ly/2IDhKBR},
+  year = {2020},
+}
+
+@inproceedings{bambu_hls,
+ author = {Christian Pilato and
+ Fabrizio Ferrandi},
+ title = {Bambu: {A} modular framework for the high level synthesis of memory-intensive
+ applications},
+ booktitle = {{FPL}},
+ pages = {1--4},
+ publisher = {{IEEE}},
+ year = {2013}
+}
+
@inproceedings{nigam20_predic_accel_desig_time_sensit_affin_types,
author = {Nigam, Rachit and Atapattu, Sachille and Thomas, Samuel and Li, Zhijing and
Bauer, Theodore and Ye, Yuwei and Koti, Apurva and Sampson, Adrian and Zhang,
diff --git a/verilog.tex b/verilog.tex
index bd36873..1f3a3dc 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -81,6 +81,8 @@ In addition to adding support for two-dimensional arrays, support for receiving
\label{fig:inferrence_module}
\end{figure*}
+\NR{Would signposting this section at the start: "\compcert{} defines a specific model of computation which all semantics have to follow in order to prove properties about them. In this section, we first describe three of \compcert{}'s original computational states. Then, we describe the five registers and semantic rules we add to Verilog semantics to integrate it to the \compcert{} model."}
+
\compcert{} defines a specific model of computation which all semantics have to follow in order to prove properties about them. \compcert{} has three main states that need to be defined:
\begin{description}
@@ -90,6 +92,7 @@ In addition to adding support for two-dimensional arrays, support for receiving
\end{description}
However, as Verilog behaves quite differently to software programming languages, this model does not match the Verilog semantics. Instead, the module definition in Verilog needs to be enriched to support this model of computation, by adding required signals to the inputs and outputs of modules.
+\NR{I understand this better. Maybe a concise way of saying this is: "The CompCert computation model requires our Verilog semantics to be enriched. In order to support the three CompCert states, our module definitions require five additional Verilog registers."}
\begin{description}
\item[program counter] First of all, the program counter needs to be modelled, which can be done using a register which keeps track of state, denoted as $\sigma$.
@@ -98,10 +101,10 @@ However, as Verilog behaves quite differently to software programming languages,
\item[stack] The function stack can be modelled as a RAM using a two-dimensional array in the module, denoted as \textit{stk}.
\end{description}
-Figure~\ref{fig:inferrence_module}\NR{This ref points to Figure 14 instead of Figure 5!} shows the inference rules that convert from one state to another. The first rule \textsc{Step} is the normal rule of execution. Assuming that the module is not being reset, and that the finish state has not been reached yet and that the current and next state are $v$ and $v'$, and finally that the module runs from state $\Gamma$ to $\Gamma'$ using the \textsc{Module} rule, we can then define one step in the \texttt{State}. The \textsc{Finish} rule is the rule that returns the final value of running the module and is applied when the \textit{fin} register is set. The return value is then taken from the \textit{ret} register.
+Figure~\ref{fig:inferrence_module}\NR{This ref points to Figure 14 instead of Figure 5!} shows the inference rules that convert from one state to another. The first rule \textsc{Step} is the normal rule of execution. Assuming that the module is not being reset, and that the finish state has not been reached yet and that the current and next state are $v$ and $v'$, and finally that the module runs from state $\Gamma$ to $\Gamma'$ using the \textsc{Module} rule, we can then define one step in the \texttt{State}.\NR{No module rule in Fig. 5?} The \textsc{Finish} rule is the rule that returns the final value of running the module and is applied when the \textit{fin} register is set. The return value is then taken from the \textit{ret} register.
-The first thing to note about the \textsc{Call} rule is that there is no step from \texttt{State} to \texttt{Callstate}, as function calls are not supported, and it is therefore impossible in our semantics to ever reach a \texttt{Callstate} except for the initial call to main. The same can be said about the \textsc{Return} state rule, which will only be matched for the final return value from the main function, as there is no rule that allocates another stack frame \textit{sf} except for the initial call to main. Therefore, in addition to the rules shown in Figure~\ref{fig:inferrence_module}, an initial state and final state need to be defined.
+The first thing to note about the \textsc{Call} rule is that there is no step from \texttt{State} to \texttt{Callstate}, as function calls are not supported, and it is therefore impossible in our semantics to ever reach a \texttt{Callstate} except for the initial call to main. The same can be said about the \textsc{Return} state rule, which will only be matched for the final return value from the main function, as there is no rule that allocates another stack frame \textit{sf} except for the initial call to main. Therefore, in addition to the rules shown in Figure~\ref{fig:inferrence_module}\NR{Wrong figure again}, an initial state and final state need to be defined.
\begin{align*}
\texttt{initial:}\quad &\forall P,\ \yhconstant{Callstate } []\ P.\texttt{main } []\\
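The module enrichment that verilog.tex describes — a program-counter register $\sigma$, a stack RAM \textit{stk}, and finish/return registers — can be pictured concretely. Below is a minimal hand-written sketch of such an enriched module; the signal names follow the prose, but the exact signals, widths, and encoding Vericert generates are assumptions here, not its real output:

```verilog
// Hand-written sketch of the enriched module interface described in
// verilog.tex; names (state, stk, fin, ret) follow the prose, but the
// exact signals Vericert generates are an assumption here.
module main(input clk, input reset,
            output reg fin, output reg [31:0] ret);
  reg [31:0] state;       // sigma: models the program counter
  reg [31:0] stk [0:15];  // function stack modelled as a RAM array
  always @(posedge clk)
    if (reset) begin      // reset corresponds to the initial call to main
      state <= 32'd0;
      fin   <= 1'b0;
    end else
      case (state)
        32'd0: begin stk[0] <= 32'd42; state <= 32'd1; end  // write to the stack
        32'd1: begin ret <= stk[0]; fin <= 1'b1; end        // final return value
      endcase
endmodule
```

Raising `fin` with the result in `ret` plays the role of the \textsc{Finish} rule; the reset path plays the role of the one permitted \texttt{Callstate}, the initial call to \texttt{main}.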