5 files changed, 74 insertions, 69 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 425e1b0..684975c 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -1,17 +1,40 @@
-\section{Adding an HLS back end to CompCert}
+\section{Adding an HLS back end to \compcert{}}
 
-This section covers the main architecture of the HLS tool, and the way in which the backend was added to CompCert.  This section will also cover an example of converting a simple C program into hardware, expressed in the Verilog language.
+This section covers the main architecture of the HLS tool, and the way in which the backend was added to \compcert{}.  This section will also cover an example of converting a simple C program into hardware, expressed in the Verilog language.
+
+\begin{figure}
+  \centering
+  \resizebox{0.47\textwidth}{!}{
+  \begin{tikzpicture}
+    [language/.style={fill=white,rounded corners=2pt}]
+    \fill[compcert,rounded corners=3pt] (-1,-1) rectangle (9,1.5);
+    \fill[formalhls,rounded corners=3pt] (-1,-1.5) rectangle (9,-2.5);
+    \node[language] at (0,0) (clight) {Clight};
+    \node[language] at (2,0) (cminor) {C\#minor};
+    \node[language] at (4,0) (rtl) {3AC};
+    \node[language] at (6,0) (ltl) {LTL};
+    \node[language] at (8,0) (ppc) {PPC};
+    \node[language] at (4,-2) (dfgstmd) {HTL};
+    \node[language] at (7,-2) (verilog) {Verilog};
+    \node at (0,1) {\compcert{}};
+    \node at (0,-2) {Vericert};
+    \draw[->] (clight) -- (cminor);
+    \draw[->,dashed] (cminor) -- (rtl);
+    \draw[->] (rtl) -- (ltl);
+    \draw[->,dashed] (ltl) -- (ppc);
+    \draw[->] (rtl) -- (dfgstmd);
+    \draw[->] (dfgstmd) -- (verilog);
+  \end{tikzpicture}}
+  \caption{Verilog backend to Compcert, branching off at the three address code (3AC), at which point the three address code is transformed into a state machine.  Finally, it is transformed to a hardware description of the state machine in Verilog.}%
+  \label{fig:rtlbranch}
+\end{figure}
+
+The main work flow of \vericert{} is shown in Figure~\ref{fig:rtlbranch}, which shows the parts of the translation that are performed in \compcert{}, and which have been added with \vericert{}.
 
 % - Explain main differences between translating C to software and to hardware.
 
 %   + This can be done by going through the simple example.
 
-\begin{figure*}
-  \centering
-  \includegraphics[scale=0.3,trim={10cm 10cm 7cm 7cm},clip=true]{data/accumulator_fsmd2.pdf}
-  \caption{Accumulator hardware diagram.}\label{fig:accumulator_diagram}
-\end{figure*}
-
 \begin{figure}
   \centering
   \begin{subfigure}[b]{0.5\linewidth}
@@ -22,7 +45,7 @@ This section covers the main architecture of the HLS tool, and the way in which
     \inputminted[fontsize=\footnotesize]{c}{data/accumulator.rtl}
     \caption{Accumulator RTL code.}\label{fig:accumulator_rtl}
   \end{subfigure}
-  \caption{Accumulator example using CompCert to translate from C to three address code (3AC).}\label{fig:accumulator_c_rtl}
+  \caption{Accumulator example using \compcert{} to translate from C to three address code (3AC).}\label{fig:accumulator_c_rtl}
 \end{figure}
 
 \begin{figure}
@@ -35,42 +58,21 @@ This section covers the main architecture of the HLS tool, and the way in which
     \inputminted[fontsize=\tiny]{systemverilog}{data/accumulator2.v}
     \caption{Accumulator Verilog code.}\label{fig:accumulator_v_2}
   \end{subfigure}
-  \caption{Accumulator example using CompCert to translate from HTL to Verilog.\YH{I feel like these examples take up too much space, but don't really know of a different way to show off a complete example without the long code.} \JW{Ok, what about drawing the FSMD in some sort of pictorial way? I think you tried drawing it as a schematic previously, but that was too big and clumsy. What about drawing the FSMD as a state machine, with sixteen states and labelled edges between them? Or might that be too abstract?}}\label{fig:accumulator_v}
+  \caption{Accumulator example using \compcert{} to translate from HTL to Verilog.\YH{I feel like these examples take up too much space, but don't really know of a different way to show off a complete example without the long code.} \JW{Ok, what about drawing the FSMD in some sort of pictorial way? I think you tried drawing it as a schematic previously, but that was too big and clumsy. What about drawing the FSMD as a state machine, with sixteen states and labelled edges between them? Or might that be too abstract?}}\label{fig:accumulator_v}
 \end{figure}
 
-Taking a simple accumulator program shown in Figure~\ref{fig:accumulator_c}, we can describe the main translation that is performed in Vericert to go from the behaviour description into a hardware design.  
+Taking the simple accumulator program shown in Figure~\ref{fig:accumulator_c}, we can describe the main translation that is performed in Vericert to go from the behaviour description into a hardware design.
+
+\begin{figure*}
+  \centering
+  \includegraphics[scale=0.3,trim={10cm 10cm 7cm 7cm},clip=true]{data/accumulator_fsmd2.pdf}
+  \caption{Accumulator hardware diagram.}\label{fig:accumulator_diagram}
+\end{figure*}
 
 %   + TODO Explain the main mapping in a short simple way
 
 % - Explain why we chose 3AC (RTL) as the branching off point.
 
-\begin{figure}
-  \centering
-  \resizebox{0.47\textwidth}{!}{
-  \begin{tikzpicture}
-    [language/.style={fill=white,rounded corners=2pt}]
-    \fill[compcert,rounded corners=3pt] (-1,-1) rectangle (9,1.5);
-    \fill[formalhls,rounded corners=3pt] (-1,-1.5) rectangle (9,-2.5);
-    \node[language] at (0,0) (clight) {Clight};
-    \node[language] at (2,0) (cminor) {C\#minor};
-    \node[language] at (4,0) (rtl) {3AC};
-    \node[language] at (6,0) (ltl) {LTL};
-    \node[language] at (8,0) (ppc) {PPC};
-    \node[language] at (4,-2) (dfgstmd) {HTL};
-    \node[language] at (7,-2) (verilog) {Verilog};
-    \node at (0,1) {CompCert};
-    \node at (0,-2) {Vericert};
-    \draw[->] (clight) -- (cminor);
-    \draw[->,dashed] (cminor) -- (rtl);
-    \draw[->] (rtl) -- (ltl);
-    \draw[->,dashed] (ltl) -- (ppc);
-    \draw[->] (rtl) -- (dfgstmd);
-    \draw[->] (dfgstmd) -- (verilog);
-  \end{tikzpicture}}
-  \caption{Verilog backend branching off at the 3AC stage in CompCert.}%
-  \label{fig:rtlbranch}
-\end{figure}
-
 %   + TODO Clarify connection between CFG and FSMD
 
 %   + TODO Explain how memory is mapped
diff --git a/evaluation.tex b/evaluation.tex
index 8ab8999..192f397 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -2,20 +2,20 @@
 
 Our evaluation is designed to answer the following three research questions.
 \begin{description}
-\item[RQ1] How fast is the hardware generated by Vericert, and how does this compare to existing HLS tools?
-\item[RQ2] How area-efficient is the hardware generated by Vericert, and how does this compare to existing HLS tools?
-\item[RQ3] How long does Vericert take to produce hardware?
+\item[RQ1] How fast is the hardware generated by \vericert{}, and how does this compare to existing HLS tools?
+\item[RQ2] How area-efficient is the hardware generated by \vericert{}, and how does this compare to existing HLS tools?
+\item[RQ3] How long does \vericert{} take to produce hardware?
 \end{description}
 
 \subsection{Experimental Setup}
 
-\paragraph{Choice of HLS tool for comparison.} We compare Vericert against LegUp 4.0 because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis+11}.
+\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against LegUp 4.0 because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis+11}.
 
-\paragraph{Choice of benchmarks.} We evaluate Vericert using the PolyBench/C benchmark suite\footnote{\url{http://web.cs.ucla.edu/~pouchet/software/polybench/}}. PolyBench/C is a modern benchmark suite that has been previously used to evaluate HLS tools~\cite{choi+18}. For completeness, we use the full set of 24 benchmarks. We set the benchmark parameters so that all datatypes are integers (since Vericert only supports integers) and all datasets are `small' (to fit into the small on-chip memories). A current limitation of Vericert, as discussed in Section~\ref{?}, is that it does not support addition and subtraction operations involving integer literals not divisible by 4. To work around this, we lightly modified each benchmark program so that literals other than multiples of 4 are stored into variables before being added or subtracted. \JW{Any other notable changes to the benchmarks?}
+\paragraph{Choice of benchmarks.} We evaluate \vericert{} using the PolyBench/C benchmark suite\footnote{\url{http://web.cs.ucla.edu/~pouchet/software/polybench/}}. PolyBench/C is a modern benchmark suite that has been previously used to evaluate HLS tools~\cite{choi+18}. For completeness, we use the full set of 24 benchmarks. We set the benchmark parameters so that all datatypes are integers (since \vericert{} only supports integers) and all datasets are `small' (to fit into the small on-chip memories). A current limitation of \vericert{}, as discussed in Section~\ref{?}, is that it does not support addition and subtraction operations involving integer literals not divisible by 4. To work around this, we lightly modified each benchmark program so that literals other than multiples of 4 are stored into variables before being added or subtracted. \JW{Any other notable changes to the benchmarks?}
 
 \paragraph{Experimental setup.} In order to generate a hardware implementation, the Verilog produced by the HLS tool-under-test must be synthesised to a netlist; the resultant netlist can then be placed-and-routed for a particular FPGA device. We use Intel Quartus~\cite{quartus} for both of these tasks, and we target an Arria 10 \ref{?} FPGA.
 
-\subsection{RQ1: How fast is Vericert-generated hardware?}
+\subsection{RQ1: How fast is \vericert{}-generated hardware?}
 
 \begin{figure}
 \begin{tikzpicture}
@@ -25,7 +25,7 @@ Our evaluation is designed to answer the following three research questions.
     height=80mm,
     width=80mm,
     xlabel={LegUp (ms)},
-    ylabel={Vericert (ms)},
+    ylabel={\vericert{} (ms)},
     xmin=0.01,
     xmax=10,
     ymax=500,
@@ -41,15 +41,15 @@ Our evaluation is designed to answer the following three research questions.
   
 \end{axis}
 \end{tikzpicture}
-\caption{A comparison of the execution time of hardware designs generated by Vericert and by LegUp. The diagonal is at $y=50x$.}
+\caption{A comparison of the execution time of hardware designs generated by \vericert{} and by LegUp. The diagonal is at $y=50x$.}
 \label{fig:comparison_time}
 \end{figure}
 
-Figure~\ref{fig:comparison_time} compares the execution time of hardware designs generated by Vericert and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. It uses logarithmic scales so that the data points are spread out more evenly. The execution times are quite well correlated between the two tools, but LegUp generates hardware designs that execute roughly fifty times faster, as illustrated by the diagonal line at $y=50x$. There are two reasons for this. First, because Vericert-generated hardware has all of its operations serialised, it requires more clock cycles to perform its computation -- about 4$\times$ as many as LegUp-generated hardware. Second, Vericert generates hardware with a much lower clock frequency than LegUp (roughly 22 MHz compared to roughly 250 MHz) so each of those clock cycles takes about 11$\times$ longer. The main reason for the low clock frequency is \JW{could somebody else finish this bit off?}.
+Figure~\ref{fig:comparison_time} compares the execution time of hardware designs generated by \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. It uses logarithmic scales so that the data points are spread out more evenly. The execution times are quite well correlated between the two tools, but LegUp generates hardware designs that execute roughly fifty times faster, as illustrated by the diagonal line at $y=50x$. There are two reasons for this. First, because \vericert{}-generated hardware has all of its operations serialised, it requires more clock cycles to perform its computation -- about 4$\times$ as many as LegUp-generated hardware. Second, \vericert{} generates hardware with a much lower clock frequency than LegUp (roughly 22 MHz compared to roughly 250 MHz) so each of those clock cycles takes about 11$\times$ longer. The main reason for the low clock frequency is \JW{could somebody else finish this bit off?}.
 
-The difference in cycle counts shows the degree of  parallelism that LegUp's scheduling and memory system can offer. However, their Verilog generation is not guaranteed to be correct. Although the runtime LegUp outputs are tested to be correct for these programs, this does not provide any confidence on the correctness of Verilog generation of any other programs. Our Coq proof mechanisation guarantees that generated Verilog is correct for any input program that uses the subset of CompCert instructions that we have proven to be correct.
+The difference in cycle counts shows the degree of  parallelism that LegUp's scheduling and memory system can offer. However, their Verilog generation is not guaranteed to be correct. Although the runtime LegUp outputs are tested to be correct for these programs, this does not provide any confidence on the correctness of Verilog generation of any other programs. Our Coq proof mechanisation guarantees that generated Verilog is correct for any input program that uses the subset of \compcert{} instructions that we have proven to be correct.
 
-\subsection{RQ2: How area-efficient is Vericert-generated hardware?}
+\subsection{RQ2: How area-efficient is \vericert{}-generated hardware?}
 
 \begin{figure}
 \begin{tikzpicture}
@@ -57,7 +57,7 @@ The difference in cycle counts shows the degree of  parallelism that LegUp's sch
     height=80mm,
     width=80mm,
     xlabel={LegUp (\%)},
-    ylabel={Vericert (\%)},
+    ylabel={\vericert{} (\%)},
     xmin=0, ymin=0,
   ]
   
@@ -67,13 +67,13 @@ The difference in cycle counts shows the degree of  parallelism that LegUp's sch
   
 \end{axis}
 \end{tikzpicture}
-\caption{A comparison of the resource utilisation of designs generated by Vericert and by LegUp}
+\caption{A comparison of the resource utilisation of designs generated by \vericert{} and by LegUp}
 \label{fig:comparison_area}
 \end{figure}
 
-Figure~\ref{fig:comparison_area} compares the size of the hardware designs generated by Vericert and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that Vericert designs use between 1\% and 30\% of the available logic on the FPGA, averaging at around 10\%, whereas LegUp designs all use less than 1\% of the FPGA, averaging at around 0.45\%. The main reason for this is \JW{blah blah}.
+Figure~\ref{fig:comparison_area} compares the size of the hardware designs generated by \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that \vericert{} designs use between 1\% and 30\% of the available logic on the FPGA, averaging at around 10\%, whereas LegUp designs all use less than 1\% of the FPGA, averaging at around 0.45\%. The main reason for this is \JW{blah blah}.
 
-\subsection{RQ3: How long does Vericert take to produce hardware?}
+\subsection{RQ3: How long does \vericert{} take to produce hardware?}
 
 \begin{figure}
 \begin{tikzpicture}
@@ -81,7 +81,7 @@ Figure~\ref{fig:comparison_area} compares the size of the hardware designs gener
     height=80mm,
     width=80mm,
     xlabel={LegUp (s)},
-    ylabel={Vericert (s)},
+    ylabel={\vericert{} (s)},
   ]
   
 \addplot[draw=none, mark=*, draw opacity=0, fill opacity=0.3]
@@ -90,11 +90,11 @@ Figure~\ref{fig:comparison_area} compares the size of the hardware designs gener
   
 \end{axis}
 \end{tikzpicture}
-\caption{A comparison of compilation time for Vericert and for LegUp \JW{Numbers for Vericert not ready yet.} \JW{This graph might end up being cut -- we might just summarise the numbers in the paragraph.}}
+\caption{A comparison of compilation time for \vericert{} and for LegUp \JW{Numbers for \vericert{} not ready yet.} \JW{This graph might end up being cut -- we might just summarise the numbers in the paragraph.}}
 \label{fig:comparison_comptime}
 \end{figure}
 
-Figure~\ref{fig:comparison_comptime} compares the compilation times of Vericert and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that Vericert compilation is about ten times faster than LegUp compilation. The simple reason for this is that Vericert omits many of the complicated and time-consuming optimisations that LegUp performs, such as scheduling multiple operations into clock cycles.
+Figure~\ref{fig:comparison_comptime} compares the compilation times of \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that \vericert{} compilation is about ten times faster than LegUp compilation. The simple reason for this is that \vericert{} omits many of the complicated and time-consuming optimisations that LegUp performs, such as scheduling multiple operations into clock cycles.
 
 \begin{table}
   \begin{tabular}{lcccccc}
@@ -121,7 +121,7 @@ Figure~\ref{fig:comparison_comptime} compares the compilation times of Vericert
     mips & 18482 & 78.43 & 10617 & 7690 & 0 & 0\\
     \bottomrule
   \end{tabular}
-  \caption{CHstone programs synthesised in Vericert}
+  \caption{CHstone programs synthesised in \vericert{}}
 \end{table}
 
 %%% Local Variables:
diff --git a/introduction.tex b/introduction.tex
index bcaf52f..8411ebe 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -25,28 +25,28 @@ However, the fact that the behaviour is preserved after HLS cannot be guaranteed
 %%\NR{Having said that, keep the text for related work section.}\YH{Added into related works.}
 To mitigate the problems about the unreliability of synthesis tool, it is often required to check the generated hardware for functional correctness.  This is commonly done by simulating the design with a large test-bench, however, the guarantees are only as good as the test-bench, meaning if all the inputs were not covered, there is a risk that bugs remain in the untested code.  To ensure that the hardware does indeed behave in the same way as the C code, it may therefore be necessary to prove that they are equivalent.  Translation validation~\cite{pnueli98_trans} is the main method which is used to prove that the HLS translation was successful, and has been successfully applied to many HLS optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}.  However, the main problem is that the validator itself has often not been mechanically proven correct, meaning that the implementation is quite separate from the proof.  In addition to that, with large designs it may not be feasible to perform translation validation, as the state space would grow exponentially. \JW{Does this link back to Mentor's Catapult-C, which you mentioned earlier? Does Catapult-C attempt to do translation validation as part of its HLS process? And if so, can you make the point that this effort is largely ineffective because once the design is a reasonable size, the translation validation usually fails anyway?}\YH{TODO: Currently I only have a whitepaper which goes over the translation validation in a very high level, but they do actually mention some flakiness and state that the user would have to manually change the code to fix that.  So I think I can actually make that point.  I just realised they have a pretty funny diagram of verification $\rightarrow$ differences $\rightarrow$ adjustments $\rightarrow$ ... until it is finally verified.} A mechanically verified HLS tool would remove the need to perform simulation after the synthesis process if one has proven desirable properties about the C code.  In addition to that, it would allow for the implementation of optimisation passes which are also proven correct mechanically by translation validation, thereby greatly improving the reliability of these passes.
 
-CompCert~\cite{leroy06_formal_certif_compil_back_end} is a C compiler that has been written and formally verified in the Coq theorem prover~\cite{bertot04_inter_theor_provin_progr_devel}.  First of all, most of the compiler passes in CompCert have been proven correct, meaning that once the compiler is built, the proofs can be erased as the algorithm has been shown to be correct independent of the input.  However, some optimisation passes such as software pipelining require translation validation~\cite{tristan08_formal_verif_trans_valid}, in which case the correctness of the compiler pass needs to be checked at runtime.  However, even in this case the verifier itself is proven to only verify code correct that does indeed behave in the same way.
+\compcert{}~\cite{leroy06_formal_certif_compil_back_end} is a C compiler that has been written and formally verified in the Coq theorem prover~\cite{bertot04_inter_theor_provin_progr_devel}.  First of all, most of the compiler passes in \compcert{} have been proven correct, meaning that once the compiler is built, the proofs can be erased as the algorithm has been shown to be correct independent of the input.  However, some optimisation passes such as software pipelining require translation validation~\cite{tristan08_formal_verif_trans_valid}, in which case the correctness of the compiler pass needs to be checked at runtime.  However, even in this case the verifier itself is proven to only verify code correct that does indeed behave in the same way.
 
 %% Contributions of paper
 
-In this paper we describe a fully verified HLS tool called Vericert, which adds a Verilog back end to CompCert and proves that the behaviour of the C code is preserved with respect to an existing Verilog semantics.  The main contributions of the paper are the following:
+In this paper we describe a fully verified HLS tool called \vericert{}, which adds a Verilog back end to \compcert{} and proves that the behaviour of the C code is preserved with respect to an existing Verilog semantics.  The main contributions of the paper are the following:
 
 \begin{itemize}
   \item First mechanised and formally verified HLS flow from C to Verilog.
-  \item Proof by simulation mechanised in Coq between CompCert's intermediate language and Verilog.
-  \item Description of the Verilog semantics integrated into CompCert and how this interacts with CompCert's intermediate language.
+  \item Proof by simulation mechanised in Coq between \compcert{}'s intermediate language and Verilog.
+  \item Description of the Verilog semantics integrated into \compcert{} and how this interacts with \compcert{}'s intermediate language.
   \item Correct by construction Verilog code for four programs that are part of the CHStone benchmark suite, which is a well-known HLS benchmark.
-%%  \item \NR{We implement our Verilog semantics in CompCert and we are able to generate correct-by-construction Verilog for all programs in the CHStone benchmark suite, which is a well-known HLS benchmark.}
+%%  \item \NR{We implement our Verilog semantics in \compcert{} and we are able to generate correct-by-construction Verilog for all programs in the CHStone benchmark suite, which is a well-known HLS benchmark.}
 \end{itemize}
 
-The first section will describe the Verilog semantics that were used and extended to fit into CompCert's model.  The second section will then describe the HLS algorithm, together with its proof.
+The first section will describe the Verilog semantics that were used and extended to fit into \compcert{}'s model.  The second section will then describe the HLS algorithm, together with its proof.
 
-Vericert is open source and is hosted on GitHub\footnote{\url{https://github.com/ymherklotz/vericert}}.
+\vericert{} is open source and is hosted on GitHub\footnote{\url{https://github.com/ymherklotz/vericert}}.
 
 \NR{Other comments:}
 \NR{Is both the translator and verifier written in Coq?}\YH{Currently there is no verifier, the algorithms themselves are proven directly in Coq.}
 %%\NR{A tool-flow diagram here will be useful. I thought you had one previously?}
-\NR{Do you think we have a very simple example of a program where wrong Verilog is generated in VHLS or LegUp, but not in Vericert?}\YH{The LegUp example actually with the shift might be a good one.}
+\NR{Do you think we have a very simple example of a program where wrong Verilog is generated in VHLS or LegUp, but not in \vericert{}?}\YH{The LegUp example actually with the shift might be a good one.}
 
 %%% Local Variables:
 %%% mode: latex
diff --git a/main.tex b/main.tex
index 9fa2217..4387a03 100644
--- a/main.tex
+++ b/main.tex
@@ -66,6 +66,9 @@
 \newcommand\yhfunction[1]{\texttt{\textcolor{functioncolour}{#1}}}
 \newcommand\yhconstant[1]{\texttt{\textcolor{constantcolour}{#1}}}
 
+\newcommand{\vericert}{Vericert}%
+\newcommand{\compcert}{CompCert}%
+
 \begin{document}
 
 %% Title information
@@ -130,7 +133,7 @@
   % \JW{‘Not generating hardware’ isn’t too bad, and indeed VeriCert is guilty of that too, when presented with C programs that contain features that it can’t yet handle. The real problem is the HLS tool crashing unceremoniously.}
   High-level synthesis (HLS) refers to automatically compiling software into hardware.  In a world increasingly reliant on application-specific hardware accelerators, HLS provides a higher-level of abstraction for hardware designers, while also allowing the designers to benefit from the rich software ecosystem to verify the functionality of their designs.  However, adoption of HLS in safety-critical applications is still limited, because it may be infeasible to verify that the hardware implements the same behaviour as the software it was generated from, eliminating the benefits that software verification may provide.  There is also evidence that HLS tools tend to be quite unreliable instead, either generating wrong hardware or sometimes crashing in unexpected ways for supported inputs.
 
-  We present the first mechanically verified HLS tool that preserves the behaviour of the software according to our semantics.  Our tool, called VeriCert, extends the CompCert verified C compiler and consists of a new hardware-specific intermediate language and a Verilog back end, which has been proven correct in Coq.  VeriCert supports all C constructs except for function pointers, recursive function calls, non-integer types and global variables.  It has been evaluated on all suitable PolyBench benchmarks, which indicates that it generates hardware with area and cycle count around a magnitude larger than an existing, unverified HLS tool. \JW{Hopefully in due course we can nuance this a bit by adding in some figures relating to the unverified Vericert extensions.}\YH{This might be the case, but maybe we want to keep that for a completely new paper as discussed.}
+  We present the first mechanically verified HLS tool that preserves the behaviour of the software according to our semantics.  Our tool, called \vericert{}, extends the \compcert{} verified C compiler and consists of a new hardware-specific intermediate language and a Verilog back end, which has been proven correct in Coq.  \vericert{} supports all C constructs except for function pointers, recursive function calls, non-integer types and global variables.  It has been evaluated on all suitable PolyBench benchmarks, which indicates that it generates hardware with area and cycle count around a magnitude larger than an existing, unverified HLS tool. \JW{Hopefully in due course we can nuance this a bit by adding in some figures relating to the unverified \vericert{} extensions.}\YH{This might be the case, but maybe we want to keep that for a completely new paper as discussed.}
 \end{abstract}
 
 %% 2012 ACM Computing Classification System (CSS) concepts
@@ -157,7 +160,7 @@
 
 %% Keywords
 %% comma separated list
-\keywords{CompCert, Coq, high-level synthesis, C, Verilog}
+\keywords{\compcert{}, Coq, high-level synthesis, C, Verilog}
 
 \maketitle
 
diff --git a/verilog.tex b/verilog.tex
index c45aa26..b672145 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -25,7 +25,7 @@ The main addition to the Verilog syntax is the explicit declaration of inputs an
 
 \subsection{Semantics}
 
-An existing operational semantics for Verilog~\cite{loow19_verif_compil_verif_proces} was adapted for the semantics of the language that Vericert eventually targets.  This semantics is a small-step operational semantics at the clock cycle level, as hardware typically does not terminate in any way, however, within each clock cycle the semantics are constructed in a big-step style semantics.  This style of semantics matches the small-step operational semantics of CompCert's register transfer language (RTL) quite well.
+An existing operational semantics for Verilog~\cite{loow19_verif_compil_verif_proces} was adapted for the semantics of the language that Vericert eventually targets.  This semantics is a small-step operational semantics at the clock cycle level, as hardware typically does not terminate in any way, however, within each clock cycle the semantics are constructed in a big-step style semantics.  This style of semantics matches the small-step operational semantics of \compcert{}'s register transfer language (RTL) quite well.
 
 %~\NR{So the semantics does not support combinational always blocks or assignments?}\YH{Currently not, as they are not needed.}
 %~\NR{Does the semantics also enforce that variables can only be modified within one always block (multiple drivers not allowed)? PL community might think of data races. }\YH{Will add that.}
@@ -84,7 +84,7 @@ One main difference between these semantics and the Verilog semantics by \citet{
 In these semantics, module instantiations are not supported, as they can be modelled by inlining the logic that modules would have produced.  This therefore means that function calls are implemented by inlining all the functions as well.  Consequently, recursive function calls are not supported, however, these are not supported by other high-level synthesis tools either, as allocating memory dynamically is not possible with fixed size RAM blocks.
 \NR{Were these last two paragraphs referring to your assumptions or limitations or simplications you made? Discuss with Yann.}
 
-To integrate our semantics with CompCert, we need to define the same states that CompCert uses to define their semantics, which are the \texttt{State}, \texttt{Callstate} and \texttt{Returnstate}.  The \texttt{Callstate} and \texttt{Returnstate} are needed because the initial state describes a call to main, and the final state results in the return value from the main function.  Even though the Verilog semantics do not support function calls, semantics for the initial function call and final return state need to be implemented to be properly shown equivalent to the Verilog semantics.  Firstly, the \texttt{State} is contains all the information necessary to identify the current execution state of the function or module.  In Verilog, we therefore keep track of the current module $m$ we are working on, the current value of the program counter $v$, which translates to the current value of the state register $s_{t}$, the current contents of the stack frame \textit{sf} and finally the current states of the association maps for variables $\Gamma_{r}$ and arrays $\Gamma_{a}$.  We can therefore define the state as \texttt{State} \textit{sf} $m$ $v$ $\Gamma_{r}$ $\Gamma_{a}$.  The \texttt{Callstate} only needs to be supported for the initial call to the function and the \texttt{Returnstate} only needs to be supported for the return of the main function.  As function calls are not supported in the semantics, there cannot be any other possible calling state or return state in the program.
+To integrate our semantics with \compcert{}, we need to define the same states that \compcert{} uses to define their semantics, which are the \texttt{State}, \texttt{Callstate} and \texttt{Returnstate}.  The \texttt{Callstate} and \texttt{Returnstate} are needed because the initial state describes a call to main, and the final state results in the return value from the main function.  Even though the Verilog semantics do not support function calls, semantics for the initial function call and final return state need to be implemented to be properly shown equivalent to the Verilog semantics.  Firstly, the \texttt{State} is contains all the information necessary to identify the current execution state of the function or module.  In Verilog, we therefore keep track of the current module $m$ we are working on, the current value of the program counter $v$, which translates to the current value of the state register $s_{t}$, the current contents of the stack frame \textit{sf} and finally the current states of the association maps for variables $\Gamma_{r}$ and arrays $\Gamma_{a}$.  We can therefore define the state as \texttt{State} \textit{sf} $m$ $v$ $\Gamma_{r}$ $\Gamma_{a}$.  The \texttt{Callstate} only needs to be supported for the initial call to the function and the \texttt{Returnstate} only needs to be supported for the return of the main function.  As function calls are not supported in the semantics, there cannot be any other possible calling state or return state in the program.
 \NR{We had a lot of discussion on functions earlier in this section, but then now we claim that the Verilog semantics does not support functions. If I recall correctly, our C program can't have functions either, or we inline all functions. Aren't they contradicting? Discuss with Yann.}
 
 We then define the semantics of running the module for one clock cycle in the following way: