5 files changed, 27 insertions, 19 deletions
diff --git a/algorithm.tex b/algorithm.tex
index ba869e4..ba00bff 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -85,7 +85,7 @@ It has an unlimited number of pseudo-registers, and is represented as a control
 
 \subsection{An introduction to Verilog}
 
-This section will introduce Verilog for readers that may not be familiar with the language, concentrating on the features that are used in the output of \vericert{}.  Verilog is a hardware description language (HDL) and is used to design hardware ranging from complete CPUs that are eventually produced as an integrated circuit, to small application-specific accelerators that are placed on an FPGA.  Verilog is a popular language because it allows for fine-grained control over the hardware, and also provides high-level constructs to simplify the development.
+This section will introduce Verilog for readers who may not be familiar with the language, concentrating on the features that are used in the output of \vericert{}.  Verilog is a hardware description language (HDL) and is used to design hardware ranging from complete CPUs that are eventually produced as an integrated circuit, to small application-specific accelerators that are placed on an FPGA.  Verilog is a popular language because it allows for fine-grained control over the hardware, and also provides high-level constructs to simplify the development.
 
 Verilog behaves quite differently to standard software programming languages due to it having to express the parallel nature of hardware.  The basic construct to achieve this is the always-block, which is a collection of assignments that are executed every time some event occurs.  In the case of \vericert{}, this event is either a positive (rising) or a negative (falling) clock edge.  All always-blocks triggering on the same event are executed in parallel. Always-blocks can also express control-flow using if-statements and case-statements.
 
@@ -132,11 +132,11 @@ endmodule
       to [out=60,in=130] ($(s2.east) + (0.3,0.7)$) to [out=310,in=10] (s2);
     \end{tikzpicture}
   \end{subfigure}
-  \caption{A simple state machine implemented in Verilog, with it's state machine representation on the right.}%
+  \caption{A simple state machine implemented in Verilog, with its diagrammatic representation on the right.}%
   \label{fig:tutorial:state_machine}
 \end{figure}
 
-A simple state machine can therefore be implemented, it's Verilog implementation as well as a representation of the state machine can be seen in Figure~\ref{fig:tutorial:state_machine}.
+A simple state machine can be implemented as shown in Figure~\ref{fig:tutorial:state_machine}.
 At every positive edge of the clock (\texttt{clk}), both of the always-blocks will trigger simultaneously.  The first always-block controls the values in the register \texttt{x} and the output \texttt{z}, while the second always-block controls the next state the state machine should go to.  When the \texttt{state} is 0, \texttt{x} will be assigned to the input \texttt{y} using nonblocking assignment, denoted by \texttt{<=}.  Nonblocking assignment assigns registers in parallel at the end of the clock cycle, rather than sequentially throughout the always-block. In the second always-block, the input \texttt{y} will be checked, and if it's high it will move on to the next state, otherwise it will stay in the current state.  When \texttt{state} is 1, the first always-block will reset the value of \texttt{x} and then set \texttt{z} to the original value of \texttt{x}, since nonblocking assignment does not change its value until the end of the clock cycle.  Finally, the last always-block will set the state to be 0 again.
 
 \begin{figure}
@@ -399,8 +399,7 @@ One big difference between C and Verilog is how memory is represented.  Although
 However, the memory model that \compcert{} uses for its intermediate languages is byte-addre\-ssa\-ble~\cite{blazy05_formal_verif_memor_model_c}.  If a byte-addressable memory was used in the target hardware, which is closer to \compcert{}'s memory model, then a load and store would instead take four clock cycles, because a RAM can only perform one read and write per clock cycle.  It therefore has to be proven that the byte-addressable memory behaves in the same way as the word-addressable memory in hardware.  Any modifications of the bytes in the \compcert{} memory model also have to be shown to modify the word-addressable memory in the same way.  Since only integer loads and stores are currently supported in \vericert{}, it follows that the addresses given to the loads and stores will be multiples of four.  If that is the case, then the translation from byte-addressed memory to word-addressed memory can be done by dividing the address by four.
 
 \subsubsection{Implementation of RAM interface}
-The simplest way to implement loads and stores in \vericert{} would be to access the Verilog array directly from within the data-path (i.e., inside the always-block on lines 16--32 of Figure~\ref{fig:accumulator_v}). This would be correct, but when a Verilog array is accessed at several program points, the synthesis tool is unlikely to detect that it can be implemented as a RAM block, and will resort to using lots of registers instead, ruining the circuit's area and performance.  To avert this, we arrange that the data-path does not access memory directly, but simply sets the address it wishes to access and then toggles the \texttt{u\_en} flag. This activates the RAM interface (lines 9--15 of Figure~\ref{fig:accumulator_v}) on the next falling clock edge, which performs the requested load or store. By factoring all the memory accesses out into a separate interface like this, we ensure that the underlying array is only accessed from a single program point in the Verilog code, and thus ensure that the synthesis tool will correctly infer a RAM block.
-Interestingly, the Verilog syntax for the RAM interface is quite strict, as the synthesis tool will pattern-match on it and only work for a predefined set of interfaces. \NR{Bring forward this sentence to help with flow.}
+The simplest way to implement loads and stores in \vericert{} would be to access the Verilog array directly from within the data-path (i.e., inside the always-block on lines 16--32 of Figure~\ref{fig:accumulator_v}). This would be correct, but when a Verilog array is accessed at several program points, the synthesis tool is unlikely to detect that it can be implemented as a RAM block, and will resort to using lots of registers instead, ruining the circuit's area and performance.  To avert this, we arrange that the data-path does not access memory directly, but simply sets the address it wishes to access and then toggles the \texttt{u\_en} flag. This activates the RAM interface (lines 9--15 of Figure~\ref{fig:accumulator_v}) on the next falling clock edge, which performs the requested load or store. By factoring all the memory accesses out into a separate interface like this, we ensure that the underlying array is only accessed from a single program point in the Verilog code, and thus ensure that the synthesis tool will correctly infer a RAM block.\footnote{Interestingly, the Verilog code shown for the RAM interface must not be modified, because the synthesis tool will only generate a RAM when the code matches a small set of specific patterns. \JW{I tweaked this slightly in an attempt to clarify; please check.}} %\NR{Bring forward this sentence to help with flow.}
 
 %\JW{I think the following sentence could be cut as we've said this kind of thing a couple of times already.} Without the interface, the array would be implemented using registers, which would increase the size of the hardware considerably.
 
@@ -486,7 +485,8 @@ One might hope that the synthesis tool consuming our generated Verilog would con
 where $\gg$ stands for a logical right shift. %Once this equivalence about the shifts and division operator is proven correct, it can be used to implement the \texttt{Oshrximm} using the efficient shift version instead of how the \compcert{} semantics described it.
 When proving this equivalence, we actually found a bug in our original implementation that was due to the fact that a na\"{i}ve shift rounds towards $-\infty$. \NR{What do you mean byy naive shift here?}
 
-\JW{I don't really understand the following paragraph.}\YH{So my intention with this was to make this section more meaningful, as one of the reviewers mentioned that because compcert already did this, this section is not needed.  But I wanted to explain that because we do the reasoning of equivalence between shifts and division at the Integer level, our proof is more general than the language specific proofs that \compcert{} has in it's back ends, as we fix the specification instead of the implementation.  I'll try and reword this.}\NR{I am not following this paragraph below too.} \compcert{} eventually performs a translation from this representation into assembly code which uses shifts to implement the division, however, the specification of the instruction itself still uses division instead of shifts, meaning the proof of the translation cannot be reused.  In \vericert{}, the equivalence of the representation in terms of divisions and shifts is proven over the integers and the specification, thereby making it simpler to prove the correctness of the Verilog implementation in terms of shifts.
+%\JW{I don't really understand the following paragraph.}\YH{So my intention with this was to make this section more meaningful, as one of the reviewers mentioned that because compcert already did this, this section is not needed.  But I wanted to explain that because we do the reasoning of equivalence between shifts and division at the Integer level, our proof is more general than the language specific proofs that \compcert{} has in it's back ends, as we fix the specification instead of the implementation.  I'll try and reword this.}\NR{I am not following this paragraph below too.} 
+\compcert{} eventually performs a translation from this representation into assembly code which uses shifts to implement the division, however, the specification of the instruction itself still uses division instead of shifts, meaning the proof of the translation cannot be reused.  In \vericert{}, the equivalence of the representation in terms of divisions and shifts is proven over the integers and the specification, thereby making it simpler to prove the correctness of the Verilog implementation in terms of shifts. \JW{I wonder, can I file this under `things that you've improved in CompCert generally as part of your efforts on Vericert''?}
 
 %The \compcert{} semantics for the \texttt{Oshrximm} instruction expresses its operation exactly as shown in the equation above, even though in hardware the computation that would be performed would be different.  In \vericert{}, if the same operation would be implemented using Verilog operators, it is not guaranteed to be optimised correctly by the synthesis tools that convert the Verilog into a circuit.  To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the divide representation used in the \compcert{} semantics.  While conducting the proof, we discovered quite a few bugs in our initial implementation of optimisations, which rounded to $-\infty$ instead of 0.
 
diff --git a/conclusion.tex b/conclusion.tex
index cf83a6d..6d98ffc 100644
--- a/conclusion.tex
+++ b/conclusion.tex
@@ -19,7 +19,7 @@ Beyond this, we plan to implement scheduling and loop pipelining, since this all
 Other optimisations include resource sharing to reduce the circuit area, and using tailored hardware operators that use hard IP blocks on chip and can be pipelined. 
 % this could include multi-cycle operations and pipelining optimisations so that division and multiplication operators also become more efficient.
 
-Finally, it's worth considering how trustworthy \vericert{} is compared to other HLS tools. The guarantee of full functional equivalence between input and output that \vericert{} provides is a strong one, the semantics for the source and target languages are both well-established, and Coq is a mature and thoroughly tested system. However, \vericert{} cannot guarantee to provide an output for every valid input -- for instance, as remarked in Section~\ref{sec:proof:htl_verilog}, \vericert{} will error out if given a program with more than about four million instructions! -- but our evaluation indicates that it does not seem to error out too frequently. And of course, \vericert{} cannot guarantee that the final hardware produced will be correct, because the Verilog it generates must pass through a series of unverified tools along the way. This concern may be allayed in the future thanks to recent work by~\citet{10.1145/3437992.3439916} to produce a verified hardware synthesis tool.
+Finally, it's worth considering how trustworthy \vericert{} is compared to other HLS tools. The guarantee of full functional equivalence between input and output that \vericert{} provides is a strong one, the semantics for the source and target languages are both well-established, and Coq is a mature and thoroughly tested system. However, \vericert{} cannot guarantee to provide an output for every valid input -- for instance, as remarked in Section~\ref{sec:proof:htl_verilog}, \vericert{} will error out if given a program with more than about four million instructions! -- but our evaluation indicates that it does not seem to error out too frequently. And of course, \vericert{} cannot guarantee that the final hardware produced will be correct, because the Verilog it generates must pass through a series of unverified tools along the way. This concern may be allayed in the future thanks to recent work by~\citet{10.1145/3437992.3439916} to produce a verified logic synthesis tool.
 
 
 %%% Local Variables:
diff --git a/introduction.tex b/introduction.tex
index ee5be65..0269640 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -44,8 +44,7 @@ The contributions of this paper are as follows:
 
 \begin{itemize}
   \item We present \vericert{}, the first mechanically verified HLS tool that compiles C to Verilog. In Section~\ref{sec:design}, we describe the design of \vericert{}, including a few optimisations related to memory accesses and division.
-  \item We state the correctness theorem of \vericert{} with respect to an existing semantics for Verilog due to \citet{loow19_proof_trans_veril_devel_hol}. In Section~\ref{sec:verilog}, we describe how we extended this semantics to make it suitable as an HLS target.  We also describe how the Verilog semantics are integrated into CompCert's language semantics model and it's framework for performing simulation proofs.\JW{I’m not sure what that means}\YH{Is this better?}  Additionally, a mapping of CompCert's infinite memory model onto a finite Verilog array is also described.
-%\JW{I think ``Verilog's memory model'' is a little misleading, because the memory model isn't part of Verilog. Can we say ``CompCert's memory model is mapped onto a finite Verilog array''?}
+  \item We state the correctness theorem of \vericert{} with respect to an existing semantics for Verilog due to \citet{loow19_proof_trans_veril_devel_hol}. In Section~\ref{sec:verilog}, we describe how we extended this semantics to make it suitable as an HLS target.  We also describe how the Verilog semantics is integrated into CompCert's language semantics model \JW{or `execution model'?} and its framework for performing simulation proofs. A mapping of CompCert's infinite memory model onto a finite Verilog array is also described.
   \item In Section~\ref{sec:proof}, we describe how we proved the correctness theorem. The proof follows standard \compcert{} techniques -- forward simulations, intermediate specifications, and determinism results -- but we encountered several challenges peculiar to our hardware-oriented setting. %\NR{`specific' is better than `peculiar'?} %JW: I think this is a nice use of peculiar. Note that it means `distinctive' here, not `weird' -- the third meaning at https://www.dictionary.com/browse/peculiar 
   These include handling discrepancies between the byte-addressed memory assumed by the input software and the word-addressed memory that we implement in the output hardware,  different handling of unsigned comparisons between C and Verilog, and carefully implementing memory reads and writes so that these behave properly as a RAM in hardware.
   %\JW{Not sure `rearranging' is quite the right word. Sounds like you're rearranging independent reads/writes w.r.t. each other. Maybe change `correctly rearranging' to `carefully implementing'?}
diff --git a/proof.tex b/proof.tex
index 7fb1048..7a4188c 100644
--- a/proof.tex
+++ b/proof.tex
@@ -18,9 +18,7 @@ Together, these differences mean that translating 3AC directly to Verilog is inf
 
 \subsection{Formulating the correctness theorem}
 
-The main correctness theorem is analogous to that stated in \compcert{}~\cite{leroy09_formal_verif_realis_compil}: for all Clight source programs $C$, if the translation to the target Verilog code succeeds, and $C$ has safe observable behaviour $B$ when executed, then the target Verilog code will have the same behaviour $B$. Here, a `safe' execution is one that either converges or diverges, but does not ``go wrong''. If the program does admit some wrong behaviour (like undefined behaviour in C), the correctness theorem does not apply. A behaviour, then, is either a final state (in the case of convergence) or divergence. In \compcert{}, a behaviour is also associated with a trace of I/O events, but since external function calls are not supported in \vericert{}, this trace will always be empty.  Is this correctness theorem also right for HLS?
-
-It may be argued that hardware inherently runs forever and therefore does not produce a definitive final result.  This means that the \compcert{} correctness theorem would likely not help with proving that the hardware is actually working correctly, as the behaviour would always be divergent.  However, in practice HLS designs are not normally the top-level of the design which needs to connect to other components and would therefore need to run forever.  Instead, HLS designs are often used in larger hardware designs as smaller components which take an input, execute, and then terminate with an answer.  To start the execution of the hardware and to signal to the HLS component that the inputs are ready, the \textit{rst} signal is set and unset.  Then, once the result is ready, the \textit{fin} signal is set and the result value is placed in \textit{ret}.  These signals are also present in the semantics of execution shown in Figure~\ref{fig:inference_module}.  The theorem of correctness therefore also uses these signals, and the proof shows that once the \textit{fin} flag is set, the value in \textit{ret} is correct according to the semantics of Verilog and Clight.  Note that the compiler is allowed to fail and not produce any output; the correctness theorem only applies when the translation succeeds.
+The main correctness theorem is analogous to that stated in \compcert{}~\cite{leroy09_formal_verif_realis_compil}: for all Clight source programs $C$, if the translation to the target Verilog code succeeds, and $C$ has safe observable behaviour $B$ when executed, then the target Verilog code will have the same behaviour $B$. Here, a `safe' execution is one that either converges or diverges, but does not `go wrong'. If the program does admit some wrong behaviour (such as undefined behaviour in C), the correctness theorem does not apply. A behaviour, then, is either a final state (in the case of convergence) or divergence. In \compcert{}, a behaviour is also associated with a trace of I/O events, but since external function calls are not supported in \vericert{}, this trace will always be empty.  
 
 %This correctness theorem is also appropriate for HLS, as HLS is often used as a part of a larger hardware design that is connected together using a hardware description language like Verilog.  This means that HLS designs are normally triggered multiple times and results are returned each time when the computation terminates, which is the property that the correctness theorem states.
 
@@ -35,10 +33,11 @@ It may be argued that hardware inherently runs forever and therefore does not pr
   \end{equation*}
 \end{theorem}
 
-The theorem is a `backwards simulation' result (from target to source), following the terminology used in the \compcert{} literature~\cite{leroy09_formal_verif_realis_compil}. The theorem does not demand the `if' direction too, because compilers are permitted to resolve any non-determinism present in their source programs.
-In practice, Clight programs are all deterministic, as are the Verilog programs in the fragment we consider. This means that we can prove the correctness theorem above by first inverting it to become a forwards simulation result, following standard \compcert{} practice.
+Why is this correctness theorem also the right one for HLS? It could be argued that hardware inherently runs forever and therefore does not produce a definitive final result.  This would mean that the \compcert{} correctness theorem would likely not help with proving that the hardware is actually working correctly, as the behaviour would always be divergent.  However, in practice, HLS does not normally produce the top-level of the design that needs to connect to other components and would therefore need to run forever.  Rather, HLS often produces smaller components that take an input, execute, and then terminate with an answer.  To start the execution of the hardware and to signal to the HLS component that the inputs are ready, the \textit{rst} signal is set and unset.  Then, once the result is ready, the \textit{fin} signal is set and the result value is placed in \textit{ret}.  These signals are also present in the semantics of execution shown in Figure~\ref{fig:inference_module}.  The correctness theorem therefore also uses these signals, and the proof shows that once the \textit{fin} flag is set, the value in \textit{ret} is correct according to the semantics of Verilog and Clight.  Note that the compiler is allowed to fail and not produce any output; the correctness theorem only applies when the translation succeeds.
+
+How can we prove this theorem? First, note that the theorem is a `backwards simulation' result (every target behaviour must also be a source behaviour), following the terminology used in the \compcert{} literature~\cite{leroy09_formal_verif_realis_compil}. The reverse direction (every source behaviour must also be a target behaviour) is not demanded because compilers are permitted to resolve any non-determinism present in their source programs. However, since Clight programs are all deterministic, as are the Verilog programs in the fragment we consider, we can actually reformulate the correctness theorem above as a forwards simulation result (following standard \compcert{} practice), which makes it easier to prove.
 
-Furthermore, to prove the forward simulation, it suffices to prove forward simulations between each pair of consecutive intermediate languages, as these results can be composed to prove the correctness of the whole HLS tool. 
+To prove this forward simulation, it suffices to prove forward simulations between each pair of consecutive intermediate languages, as these results can be composed to prove the correctness of the whole HLS tool. 
 The forward simulation from 3AC to HTL is stated in Lemma~\ref{lemma:htl} (Section~\ref{sec:proof:3ac_htl}), the forward simulation for the RAM insertion is shown in Lemma~\ref{lemma:htl_ram} (Section~\ref{sec:proof:ram_insertion}), then the forward simulation between HTL and Verilog is shown in Lemma~\ref{lemma:verilog} (Section~\ref{sec:proof:htl_verilog}) and finally, the proof that Verilog is deterministic is given in Lemma~\ref{lemma:deterministic} (Section~\ref{sec:proof:deterministic}).
 
 \subsection{Forward simulation from 3AC to HTL}\label{sec:proof:3ac_htl}
diff --git a/verilog.tex b/verilog.tex
index 791b9c1..39530d0 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -102,7 +102,7 @@ To support this computational model, we extend the Verilog module we generate wi
   \item[return value] The return value can be modelled by setting a finished flag to 1 when the result is ready, and putting the result into a 32-bit output register. These are denoted as \textit{fin} and \textit{ret} respectively.
 %\JW{Is there a mismatch between `ret' in the figure and `rtrn' in the text?}
   \item[stack] The function stack can be modelled as a RAM block, which is implemented using an array in the module, and denoted as \textit{stk}.
-\JW{Is there a mismatch between `st' in the figure and `stk' in the text?}\YH{It was actually between $\Gamma_{a}$ and \textit{stk}.  The \textit{st} should have been $\sigma$.}
+%\JW{Is there a mismatch between `st' in the figure and `stk' in the text?}\YH{It was actually between $\Gamma_{a}$ and \textit{stk}.  The \textit{st} should have been $\sigma$.}
 \end{description}
 
 Figure~\ref{fig:inference_module} shows the inference rules for moving between the computational states.  The first, \textsc{Step}, is the normal rule of execution.  It defines one step in the \texttt{State} state, assuming that the module is not being reset, that the finish state has not been reached yet, that the current and next state are $v$ and $v'$, and that the module runs from state $\Gamma$ to $\Gamma'$ using the \textsc{Step} rule.  The \textsc{Finish} rule returns the final value of running the module and is applied when the \textit{fin} register is set; the return value is then taken from the \textit{ret} register.
@@ -119,7 +119,7 @@ Therefore, in addition to the rules shown in Figure~\ref{fig:inference_module},
 
 \subsection{Memory Model}\label{sec:verilog:memory}
 
-The Verilog semantics do not define a memory model for Verilog, as this is not needed for a hardware description language.  There is no preexisting architecture that Verilog will produce; it can describe any memory layout that is needed.  Instead of having specific semantics for memory, the semantics only needs to support the language features that can produce these different memory layouts, these being Verilog arrays.  We therefore define semantics for updating Verilog arrays using blocking and nonblocking assignment.  We then have to prove that the C memory model that \compcert{} uses matches with the interpretation of arrays used in Verilog.  The \compcert{} memory model is infinite, whereas our representation of arrays in Verilog is inherently finite.  There have already been various efforts to define a finite memory model for all compiler passes in \compcert{}, such as Comp\-Cert\-S~\cite{besson18_compc}, Comp\-Cert\-ELF~\cite{wang20_compc} and Comp\-Cert\-TSO~\cite{sevcik13_compc}, however, we define the translation from \compcert{}'s standard infinite memory model to finitely sized arrays that can be represented in Verilog, leaving the compiler passes intact. \JW{I'm not quite sure I understand. Let me check: Are you saying that previous work has shown how all the existing CompCert passes can be adapted from an infinite to a finite memory model, but what we're doing is leaving the default (infinite) memory model for the CompCert front end, and just converting from an infinite memory model to a finite memory model when we go from 3AC to HTL?}\YH{Yes exactly, most papers changed the whole memory model to thread through properties that were then needed in the back end, but we currently don't need to do that.  I need to double check though for CompCertELF, it doesn't actually seem to be the case.  Will edit this section later.}
+The Verilog semantics do not define a memory model for Verilog, as this is not needed for a hardware description language.  There is no preexisting architecture that Verilog will produce; it can describe any memory layout that is needed.  Instead of having specific semantics for memory, the semantics only needs to support the language features that can produce these different memory layouts, these being Verilog arrays.  We therefore define semantics for updating Verilog arrays using blocking and nonblocking assignment.  We then have to prove that the C memory model that \compcert{} uses matches with the interpretation of arrays used in Verilog.  The \compcert{} memory model is infinite, whereas our representation of arrays in Verilog is inherently finite.  There have already been various efforts to define a finite memory model for \JW{individual?} compiler passes in \compcert{}, such as Comp\-Cert\-S~\cite{besson18_compc}, Comp\-Cert\-ELF~\cite{wang20_compc} and Comp\-Cert\-TSO~\cite{sevcik13_compc}, however, we define the translation from \compcert{}'s standard infinite memory model to finite arrays that can be represented in Verilog, leaving the compiler passes intact. \JW{I'm not quite sure I understand. Let me check: Are you saying that previous work has shown how all the existing CompCert passes can be adapted from an infinite to a finite memory model, but what we're doing is leaving the default (infinite) memory model for the CompCert front end, and just converting from an infinite memory model to a finite memory model when we go from 3AC to HTL?}\YH{Yes exactly, most papers changed the whole memory model to thread through properties that were then needed in the back end, but we currently don't need to do that.  I need to double check though for CompCertELF, it doesn't actually seem to be the case.  Will edit this section later.}
 
 \begin{figure}
   \centering
@@ -190,12 +190,22 @@ The Verilog semantics do not define a memory model for Verilog, as this is not n
     \draw (7,-4.3) -- (12,-4.3);
     \node at (9.5,-4.7) {\small \texttt{stack[0] <= 0xDEADBEEF;}};
   \end{tikzpicture}
-  \caption{Change in the memory model during the translation of 3AC to HTL.  The state of the memories in each case is right after the execution of the store to memory.}\label{fig:memory_model_transl}
+  \caption{Change in the memory model during the translation of 3AC into HTL.  The state of the memories in each case is right after the execution of the store to memory.}\label{fig:memory_model_transl}
 \end{figure}
 
 %\JW{It's not completely clear what the relationship is between your work and those works. The use of `only' suggests that you've re-done a subset of work that has already been done -- is that the right impression?}\YH{Hopefully that's more clear.}
 
-This translation is represented in Figure~\ref{fig:memory_model_transl}.  \compcert{} defines a map from blocks to maps from memory addresses to memory contents.  Each block represents an area in memory; for example, a block can represent a global variable or a stack for a function. As there are no global variables, the main stack's block number can be assumed to always be 0. \JW{So the stack frame for a function called by main would be in a different block, is that the idea? Seems unusual not to have a single stack.}\YH{Yeah exactly, it makes it much easier to reason about though, because everything is nicely isolated.  This is exactly what CompCertELF and CompCertS try and solve though.} \JW{Would global variables normally be put in blocks 1, 2, etc.?}\YH{Yes, although it may also be possible that they could be numbered 0, 1, 2, 3, 4, pushing the block of the stack higher.}  Meanwhile, our Verilog semantics defines two finite arrays of optional values, one for the blocking assignments map $\Gamma_{\rm a}$ and one for the nonblocking assignments map $\Delta_{\rm a}$. \JW{It's a slight shame that `block' is used in two different senses in the preceding two sentences. I guess that can't be helped.}\YH{Ah that's true, I hadn't even noticed.  Yeah I think it would be good to keep the name ``block'' for CompCert's blocks.} The optional values are present to ensure correct merging of the two association maps at the end of the clock cycle.  During our translation we only convert block 0 to a Verilog memory, and ensure that it is the only block that is present.  This means that the block necessarily represents the stack of the main function.  The invariant that then has to hold in the proofs is that block 0 should be equivalent to the merged representation of the $\Gamma_{\rm a}$ and $\Delta_{\rm a}$ maps.
+This translation is represented in Figure~\ref{fig:memory_model_transl}.  \compcert{} defines a map from blocks to maps from memory addresses to memory contents.  Each block represents an area in memory; for example, a block can represent a global variable or a stack for a function. As there are no global variables, the main stack can be assumed to be block 0, \JW{and this is the only block we translate}.
+%\JW{So the stack frame for a function called by main would be in a different block, is that the idea? Seems unusual not to have a single stack.}
+%\YH{Yeah exactly, it makes it much easier to reason about though, because everything is nicely isolated.  This is exactly what CompCertELF and CompCertS try and solve though.} 
+%\JW{Would global variables normally be put in blocks 1, 2, etc.?}
+%\YH{Yes, although it may also be possible that they could be numbered 0, 1, 2, 3, 4, pushing the block of the stack higher.} 
+Meanwhile, our Verilog semantics defines two finite arrays of optional values, one for the blocking assignments map $\Gamma_{\rm a}$ and one for the nonblocking assignments map $\Delta_{\rm a}$. 
+%\JW{It's a slight shame that `block' is used in two different senses in the preceding two sentences. I guess that can't be helped.}
+%\YH{Ah that's true, I hadn't even noticed.  Yeah I think it would be good to keep the name ``block'' for CompCert's blocks.} 
+The optional values are present to ensure correct merging of the two association maps at the end of the clock cycle. %During our translation we only convert block 0 to a Verilog memory, and ensure that it is the only block that is present.  
+%This means that the block necessarily represents the stack of the main function.  
+The invariant used in the proofs is that block 0 should be equivalent to the merged representation of the $\Gamma_{\rm a}$ and $\Delta_{\rm a}$ maps.
 
 %However, in practice, assigning and reading from an array directly in the state machine will not produce a memory in the final hardware design, as the synthesis tool cannot identify the array as having the necessary properties that a RAM needs, even though this is the most natural formulation of memory.  Even though theoretically the memory will only be read from once per clock cycle, the synthesis tool cannot ensure that this is true, and will instead create a register for each memory location.  This increases the size of the circuit dramatically, as the RAM on the FPGA chip will not be reused.  Instead, the synthesis tool expects a specific interface that ensures these properties, and will then transform the interface into a proper RAM during synthesis.  Therefore, a translation has to be performed from the naive use of memory in the state machine, to a proper use of a memory interface.