5 files changed, 49 insertions, 48 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 712ad3c..8323cfb 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -206,11 +206,34 @@ This subsection lists some key challenges that were encountered when implementin
 
 One big difference between C and Verilog is how memory can be represented.  In hardware, efficient RAMs are not as available as in software, and need to be explicitly implemented by declaring a two dimensional array with specific properties.  One main limitation is that RAMs often only allow one read and one write per clock cycle, for example if implementing single port RAM, which is the most common type of RAM.  To make loads and stores as efficient as possible, the RAM needs to be implemented as being word addressable, so that a load and store of an integer can be done in one clock cycle.
 
-However, the memory model that CompCert uses for it's intermediate languages~\cite{blazy05_formal_verif_memor_model_c} is byte-addressable.  It therefore has to be proven that the byte-addressable memory behaves in the same way as the word-addressable memory in hardware.  Any modifications of the bytes in the CompCert memory model have to also be shown to modify the word-addressable memory in the same way.  As only integer loads and stores are currently supported for the HLS backend, it follows that the addresses given to the loads and stores should be divisible by four.  If that is the case, then the translation from byte-addressed memory to word-addressed memory could be done by dividing the address by four and subtracting by the base address of the memory.
+However, the memory model that \compcert{} uses for it's intermediate languages~\cite{blazy05_formal_verif_memor_model_c} is byte-addressable.  It therefore has to be proven that the byte-addressable memory behaves in the same way as the word-addressable memory in hardware.  Any modifications of the bytes in the \compcert{} memory model have to also be shown to modify the word-addressable memory in the same way.  As only integer loads and stores are currently supported for the HLS backend, it follows that the addresses given to the loads and stores should be divisible by four.  If that is the case, then the translation from byte-addressed memory to word-addressed memory could be done by dividing the address by four and subtracting by the base address of the memory.
 
 \subsubsection{Reset signals}
 
-\subsubsection{Implementation of the \texttt{Oshrximm} instruction}
+\YH{This section could maybe go into the proof section instead.} In hardware there is also no notion of a stack which can be popped or pushed.  It is therefore necessary to add reset signals to the main module that will be called so that the proper state can be set at the start of the function.
+
+Even though functions calls are not supported by \vericert{}, the initial function call that \compcert{} uses to enter the main function does need to be supported in the Verilog semantics, as explained further in Section~\ref{sec:verilog}.  The reset signal therefore has to be reasoned about correctly in the Semantics and in the initial function call to ensure that the initial state of the module is set correctly, as in 3AC, the entry point of the function is part of the function definition.
+
+\subsubsection{\texttt{Oshrximm} instruction}
+
+% Mention that this optimisation is not performed sometimes (clang -03).
+
+Vericert performs some optimisations at the level of the instructions that are generated, so that the hardware performs the instructions as quickly as possible and so that the maximum frequency at which the hardware could run is increased.  One of the main constructs that cripple performance of the generated hardware is the instantiation of divider circuits in the hardware.  In the case of Vericert, it requires the result of the divide operation to be ready in the same clock cycle, meaning the divide circuit needs to be implemented fully combinationally.  This is inefficient in terms of hardware size, but also in terms of latency, because it means that the maximum frequency of the hardware needs to be reduced dramatically so that the divide circuit has enough time to finish.
+
+These small optimisations were found to be the most error prone, and guaranteeing that the new representation is equivalent to representation used in the \compcert{} semantics is difficult without proving this for all possible inputs.
+
+Dividing by a constant can often be optimised to a more efficient operation, especially if the denominator is a factor of two.  In \compcert{}, the \texttt{Oshrximm} instruction does exactly this, and a normal signed divide operation can be replaced by the \texttt{Oshrximm} instruction, performing the following operation, which is transformed to our optimal representation on the right, where $\div$ stands for integer signed division:
+
+\begin{align*}
+  &\forall x, y \in \mathbb{Z},\ \ 0 \leq y < 31,\ \ -2^{31} \leq x < 2^{31},\\
+  &x \div 2^y =
+  \begin{cases}
+    \left\lfloor \frac{x}{2^y} \right\rfloor = x >> y,& \text{if } x \geq 0\\
+    \left\lceil \frac{x}{2^y} \right\rceil = - \left\lfloor \frac{-x}{2^y} \right\rfloor = - ( - x >> y ),& \text{otherwise}.
+  \end{cases}\\
+\end{align*}
+
+The \compcert{} semantics for the \texttt{Oshrximm} instruction express it's operation exactly as shown in the equation above, even though in hardware the computation that would be performed would be different.  In \vericert{}, if the same operation would be implemented using Verilog operators, it is not guaranteed to be optimised correctly by the synthesis tools that convert the Verilog into a circuit.  To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the divide representation used in the \compcert{} semantics.  This proof discovered quite a few bugs in our initial implementation of optimisations, which rounded to $-\infty$ instead of 0.
 
 %\JW{I wonder if Section 2 could benefit from a `Some Key Challenges' subsection, where you highlight several interesting bits of the translation process, each with their own paragraph heading. These could be something like:\begin{enumerate}\item Discrepancy between C and Verilog w.r.t. signedness \item Deciding between byte- and word-addressable memories \item Adding reset signals \item Implementing the Oshrximm instruction correctly \end{enumerate} For the causal reader, this would immediately signal two things: (1) you can skip this subsection on your initial pass, and (2) proving the HLS tool correct was a non-trivial undertaking.}
 
diff --git a/appendix.tex b/appendix.tex
index 2ab077e..1e3cdb3 100644
--- a/appendix.tex
+++ b/appendix.tex
@@ -5,6 +5,27 @@
 \begin{figure*}
   \centering
   \begin{minipage}{1.0\linewidth}
+    \begin{align*}
+      v\quad ::=&\; 32 \yhkeyword{'d} n\\
+      \textit{op}\quad ::=&\; \yhkeyword{+ } | \yhkeywordsp{- } | \yhkeywordsp{* } \cdots \\
+      e\quad ::=&\; v\;\; |\;\; x\;\; |\;\; e \yhkeyword{[} e \yhkeyword{]}\;\; |\;\; e\ \mathit{op}\ e\;\; |\;\; \yhkeyword{\textasciitilde} e\;\; |\;\; e \yhkeywordsp{? } e \yhkeywordsp{: } e\\
+      s\quad ::=&\; s\ s\ |\ \epsilon\\[-2pt]
+      |&\; \yhkeyword{if(} e \yhkeyword{) } s \yhkeywordsp{else } s\\[-2pt]
+      |&\; \yhkeyword{case(} e \yhkeyword{) } e : s\ \{\ e : s\ \}\ [\ s\ ] \yhkeywordsp{endcase}\\[-2pt]
+      |&\; e = e \yhkeyword{;}\\[-2pt]
+      |&\; e \Leftarrow e \yhkeyword{;}\\
+      d\quad ::=&\; \yhkeyword{[n-1:0] } r\ |\ \yhkeyword{[n-1:0] } r \yhkeywordsp{[m-1:0]}\\
+      m\quad ::=&\; \yhkeyword{reg } d \yhkeyword{;}\ |\ \yhkeyword{input wire } d \yhkeyword{;}\ |\ \yhkeyword{output reg } d \yhkeyword{;}\\
+      |&\; \yhkeywordsp{always @(posedge clk) } s \\
+      m \text{ list}\quad ::=&\; \{ m \}
+    \end{align*}
+  \end{minipage}
+  \caption{Verilog syntax for values $v$, expressions $e$, statements $s$ and module items $m$.}\label{fig:verilog_syntax}
+\end{figure*}
+
+\begin{figure*}
+  \centering
+  \begin{minipage}{1.0\linewidth}
     \begin{gather*}
       \label{eq:1}
       \inferrule[Skip]{ }{\textit{srun}\ \sigma\ \epsilon\ \sigma}\\
diff --git a/main.tex b/main.tex
index 0fc1eea..8697347 100644
--- a/main.tex
+++ b/main.tex
@@ -46,6 +46,7 @@
 \usepackage{minted}
 
 \setminted{fontsize=\small}
+\usemintedstyle{manni}
 
 \newif\ifCOMMENTS
 \COMMENTStrue
diff --git a/proof.tex b/proof.tex
index 355094c..a01b688 100644
--- a/proof.tex
+++ b/proof.tex
@@ -113,28 +113,7 @@ The Verilog semantics that are used are deterministic, as the order of operation
 
 \JW{Would be nice to include a few high-level metrics here. How many person-years of effort was the proof (very roughly)? How many lines of Coq? How many files, how many lemmas? How long does it take for the Coq proof to execute?}
 
-\subsection{Proving \texttt{Oshrximm} correct}
-
-% Mention that this optimisation is not performed sometimes (clang -03).
-
-Vericert performs some optimisations at the level of the instructions that are generated, so that the hardware performs the instructions as quickly as possible and so that the maximum frequency at which the hardware could run is increased.  One of the main constructs that cripple performance of the generated hardware is the instantiation of divider circuits in the hardware.  In the case of Vericert, it requires the result of the divide operation to be ready in the same clock cycle, meaning the divide circuit needs to be implemented fully combinationally.  This is inefficient in terms of hardware size, but also in terms of latency, because it means that the maximum frequency of the hardware needs to be reduced dramatically so that the divide circuit has enough time to finish.
-
-These small optimisations were found to be the most error prone, and guaranteeing that the new representation is equivalent to representation used in the \compcert{} semantics is difficult without proving this for all possible inputs.
-
-Dividing by a constant can often be optimised to a more efficient operation, especially if the denominator is a factor of two.  In \compcert{}, the \texttt{Oshrximm} instruction does exactly this, and a normal signed divide operation can be replaced by the \texttt{Oshrximm} instruction, performing the following operation, which is transformed to our optimal representation on the right, where $\div$ stands for integer signed division:
-
-\begin{gather*}
-\forall x, y \in \mathbb{Z},\ \ 0 \leq y < 31,\ \ -2^{31} \leq x < 2^{31},\\
-x \div 2^y =
-\begin{cases}
-  \left\lfloor \frac{x}{2^y} \right\rfloor = x >> y,& \text{if } x \geq 0\\
-  \left\lceil \frac{x}{2^y} \right\rceil = - \left\lfloor \frac{-x}{2^y} \right\rfloor = - ( - x >> y ),& \text{otherwise}
-\end{cases}\\
-\end{gather*}
-
-The \compcert{} semantics for the \texttt{Oshrximm} instruction express it's operation exactly as shown in the equation above, even though in hardware the computation that would be performed would be different.  In \vericert{}, if the same operation would be implemented using Verilog operators, it is not guaranteed to be optimised correctly by the synthesis tools that convert the Verilog into a circuit.  To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the divide representation used in the \compcert{} semantics.  This proof discovered quite a few bugs in our initial implementation of optimisations, which rounded to $-\infty$ instead of 0.
-
-%\subsection{Coq Mechanisation}
+\subsection{Coq Mechanisation}
 
 %%% Local Variables:
 %%% mode: latex
diff --git a/verilog.tex b/verilog.tex
index dfa378e..e2a716b 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -1,35 +1,12 @@
-\section{Verilog}
+\section{Verilog}\label{sec:verilog}
 
 \JW{I'm not sure the Verilog syntax figure adds enough value to be worthy of inclusion in the body -- perhaps it could be demoted to an appendix. Instead, I think it would be interesting to see a bit more about how the semantics works. Not the whole gamut of inference rules or anything, but a little bit about how the rules work and what the appropriate notion of state is.}
 
 This section describes the Verilog semantics that were chosen for the target language, including the changes that were made to the semantics to be a better fit as an HLS target.
 
 Verilog~\cite{06_ieee_stand_veril_hardw_descr_languag} is a hardware description language commonly used to design hardware.  A Verilog design can then be synthesised into logic gates which describes how different gates connect to each other, called a netlist.  This representation can then be mapped onto either a field-programmable gate array (FPGA) or turned into an application-specific integrated circuit (ASIC) to implement the design that was described in Verilog.  The Verilog standard is quite large though, and not all Verilog features are needed to be able to describe hardware.  Many Verilog features are only useful for simulation and do not affect the actual hardware itself, we can therefore restrict the Verilog semantics to the synthesisable subset of Verilog~\cite{05_ieee_stand_veril_regis_trans_level_synth}.  In addition to that, HLS dictates which Verilog constructs are generated, meaning the Verilog subset that has to be modelled by the semantics can be reduced even further to only support the constructs that are needed.  Supporting a smaller subset in the semantics also means that there is less chance that the standard is misunderstood, and that the semantics actually model how the Verilog is simulated and synthesised by existing tools and how it is dictated by the standard.
-%~\NR{Didn't get this sentence? Do you mean that the HLS algo further restricts the synthesisable subset?}\YH{Yes basically, because we get to choose what we generate.  For example, we don't have to support combinational always blocks.}
-%\NR{What is the distinction here between the semantics and simulation? Discuss.}\YH{Tried to clarify that I meant simulation and synthesis by tools and how it should be understood from the standard}
 
 The Verilog semantics are based on the semantics proposed by \citet{loow19_verif_compil_verif_proces}, which was used to create a formal translation from a logic representation encoded in the HOL4~\cite{slind08_brief_overv_hol4} theorem prover into an equivalent Verilog design.  These semantics are quite practical as they restrict themselves to a small subset of Verilog, which can nonetheless be used to model all hardware constructs one would want to design. An abstraction of the Verilog syntax that is generated is shown in Figure~\ref{fig:verilog_syntax}.  The main features that are supported by the syntax and semantics are module items, such as variable declarations and always blocks, statements and expressions.
-%\NR{What is HOL? Was it discussed earlier?}\YH{Another theorem prover, have clarified hopefully.}
-
-\begin{figure}
-  \centering
-  \begin{align*}
-    v\quad ::=&\; 32 \yhkeyword{'d} n\\
-    \textit{op}\quad ::=&\; \yhkeyword{+ } | \yhkeywordsp{- } | \yhkeywordsp{* } \cdots \\
-    e\quad ::=&\; v\;\; |\;\; x\;\; |\;\; e \yhkeyword{[} e \yhkeyword{]}\;\; |\;\; e\ \mathit{op}\ e\;\; |\;\; \yhkeyword{\textasciitilde} e\;\; |\;\; e \yhkeywordsp{? } e \yhkeywordsp{: } e\\
-    s\quad ::=&\; s\ s\ |\ \epsilon\\[-2pt]
-    |&\; \yhkeyword{if(} e \yhkeyword{) } s \yhkeywordsp{else } s\\[-2pt]
-    |&\; \yhkeyword{case(} e \yhkeyword{) } e : s\ \{\ e : s\ \}\ [\ s\ ] \yhkeywordsp{endcase}\\[-2pt]
-    |&\; e = e \yhkeyword{;}\\[-2pt]
-    |&\; e \Leftarrow e \yhkeyword{;}\\
-    d\quad ::=&\; \yhkeyword{[n-1:0] } r\ |\ \yhkeyword{[n-1:0] } r \yhkeywordsp{[m-1:0]}\\
-    m\quad ::=&\; \yhkeyword{reg } d \yhkeyword{;}\ |\ \yhkeyword{input wire } d \yhkeyword{;}\ |\ \yhkeyword{output reg } d \yhkeyword{;}\\
-    |&\; \yhkeywordsp{always @(posedge clk) } s \\
-    m \text{ list}\quad ::=&\; \{ m \}
-  \end{align*}
-
-  \caption{Verilog syntax for values $v$, expressions $e$, statements $s$ and module items $m$.}\label{fig:verilog_syntax}
-\end{figure}
 
 The semantics of Verilog differ from regular programming languages, as it is used to describe hardware directly, which is inherently parallel, instead of describing an algorithm, which is often done sequentially.  The main construct in Verilog is the always block, which is construct that contains statements.  A module can contain multiple always blocks, which all run in parallel.  Each always block also contains a list of events at which it should trigger, which could either contain signals that are assigned to other signals in that always block, or a different signal such as a clock which will trigger the always b lock periodically.  Two types of assignments are also supported in always blocks: nonblocking and blocking assignment.  Nonblocking assignment modifies the signal at the end of the timestep, and atomically, meaning a swap operation can be implement without a temporary variable.  Blocking assignment, on the other hand, assigns the variable directly in the always block for later signals to pick up.  Using these constructs it is therefore possible to describe how hardware functions, where always blocks that are triggered by a clock periodically get translated into flip-flops and always blocks triggered by changes in any internal signals are translated into combinational logic.