-rw-r--r--  algorithm.tex      4
-rw-r--r--  introduction.tex   2
-rw-r--r--  main.tex           4
-rw-r--r--  verilog.tex       22
4 files changed, 23 insertions, 9 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 38fd5bf..927c54f 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -226,11 +226,11 @@ Another problem with the representation of the state as an actual register is th
\subsection{Optimisations}
-Although \vericert{} is not yet an `optimising' HLS compiler, we have implemented a few optimisations that aim to improve the quality of the hardware designs it produces.
+Although \vericert{} is not yet a proper `optimising' HLS compiler, we have implemented a few optimisations that aim to improve the quality of the hardware designs it produces.
\subsubsection{Byte- and word-addressable memories}
-One big difference between C and Verilog is how memory is represented. In hardware, efficient RAMs are not as available as in software, and need to be explicitly implemented by declaring two-dimensional arrays with specific properties. A major limitation is that \JW{common} RAMs often only allow one read and one write per clock cycle, \JWcouldcut{for example if implementing single port RAM, which is the most common type of RAM}. To make loads and stores as efficient as possible, the RAM needs to be \JWcouldcut{implemented as being} word-addressable, so that an entire integer can be loaded or stored in one clock cycle.
+One big difference between C and Verilog is how memory is represented. In hardware, efficient RAMs are not as available as in software, and need to be explicitly implemented by declaring two-dimensional arrays with specific properties. A major limitation is that RAMs often only allow one read and one write per clock cycle, \JWcouldcut{for example if implementing single port RAM, which is the most common type of RAM}. To make loads and stores as efficient as possible, the RAM needs to be \JWcouldcut{implemented as being} word-addressable, so that an entire integer can be loaded or stored in one clock cycle.
However, the memory model that \compcert{} uses for its intermediate languages~\cite{blazy05_formal_verif_memor_model_c} is byte-addressable. It therefore has to be proven that the byte-addressable memory behaves in the same way as the word-addressable memory in hardware. Any modifications of the bytes in the \compcert{} memory model also have to be shown to modify the word-addressable memory in the same way. As only integer loads and stores are currently supported in our HLS back end, it follows that the addresses given to the loads and stores should be divisible by four. If that is the case, then the translation from byte-addressed memory to word-addressed memory could be done by dividing the address by four and subtracting \JWcouldcut{by} the base address of the memory. \JW{Why does `subtracting the base address of the memory' have anything to do with whether the memory is byte or word addressed? Don't you have to do that either way? Or perhaps you're saying that your memory is not only word-addressed, but it also starts at address 0 rather than some random address like you'd get in software?}
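+
+As a rough, hand-written illustration (not actual \vericert{} output, and ignoring the base-address offset discussed above), such a word-addressable RAM could be described in Verilog as follows, where the word index is obtained by dropping the two least-significant bits of the byte address, i.e.\ dividing it by four:
+
+\begin{verbatim}
+module ram(input clk, input wr_en,
+           input [9:0] addr,                // byte address
+           input [31:0] wr_data,
+           output reg [31:0] rd_data);
+  reg [31:0] mem [0:255];                   // 256 words of 32 bits each
+  always @(posedge clk) begin
+    // addr[9:2] is the byte address divided by four, i.e. the word index
+    if (wr_en) mem[addr[9:2]] <= wr_data;   // at most one write
+    rd_data <= mem[addr[9:2]];              // and one read per clock cycle
+  end
+endmodule
+\end{verbatim}
+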
\subsubsection{Reset signals}
diff --git a/introduction.tex b/introduction.tex
index 3895fe4..a71f9a2 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -74,7 +74,7 @@ Meanwhile, Xilinx's Vivado HLS has been shown to apply pipelining optimisations
\subsection{Existing verification workarounds}
It is rather difficult to exhaustively test an HLS tool to prove the absence of bugs, since these codebases are very large and often include custom passes and directives.
-Hence, most existing work on verifying HLS tools focus on proving that HLS-generated hardware is equivalent to its software counterpart for all possible inputs of the program, as known as translation validation~\cite{pnueli98_trans}.
+Hence, most existing work on verifying HLS tools focuses on proving that HLS-generated hardware is equivalent to its software counterpart for all possible inputs of the program, or \emph{translation validation}~\cite{pnueli98_trans}.
Translation validation has been successfully applied to many HLS optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}.
However, translation validation often suffers from one key problem: the validator itself might not have been mechanically proven to be correct, and even if it is proven, there can be an interpretation gap between the mechanised proof and its implementation.
Furthermore, translation validation also has other practical problems: it needs to be invoked every time a new program is compiled, and it can also lead to exponential growth in state space.
diff --git a/main.tex b/main.tex
index a6da81a..84890f5 100644
--- a/main.tex
+++ b/main.tex
@@ -54,9 +54,11 @@
\newcommand\JW[1]{\Comment{red!75!black}{JW}{#1}}
\newcommand\YH[1]{\Comment{green!50!blue}{YH}{#1}}
\newcommand\JP[1]{\Comment{blue!50!black}{JP}{#1}}
-\newcommand\NR[1]{\Comment{yellow!50!black}{NR}{#1}}
+\newcommand\NR[1]{\Comment{blue}{NR}{#1}}
\newcommand\JWcouldcut[1]{{\st{#1}}}
+\newcommand\NRcouldcut[1]{{\st{#1}}}
+\newcommand\NRreplace[2]{{\st{#1} #2}}
\definecolor{compcert}{HTML}{66c2a5}
\definecolor{formalhls}{HTML}{fc8d62}
diff --git a/verilog.tex b/verilog.tex
index 0eef44e..d7e04bb 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -1,18 +1,30 @@
\section{Verilog}\label{sec:verilog}
-This section describes the Verilog semantics that were chosen for the target language, including the changes that were made to the semantics to be a better fit as an HLS target. The Verilog standard is quite large~\cite{06_ieee_stand_veril_hardw_descr_languag,05_ieee_stand_veril_regis_trans_level_synth}, however, the syntax and semantics can be reduced to a small subset that \vericert{} needs to target.
+\newcommand{\alwaysblock}{\texttt{always}-block}
-The Verilog semantics is based on the semantics proposed by \citet{loow19_verif_compil_verif_proces}, which was used to create a formal translation from a logic representation encoded in the HOL4~\cite{slind08_brief_overv_hol4} theorem prover into an equivalent Verilog design. This semantics is quite practical as it is restricted to a small subset of Verilog, which can nonetheless be used to model all hardware constructs one would want to design. The main features that are not supported by the syntax and semantics are continuous assignment and combinational always blocks.
+This section describes the Verilog semantics that were \JW{was} chosen for the target language, including the changes that were made to the semantics to make it \JWcouldcut{a better fit} a \JW{suitable} HLS target. The Verilog standard is quite large~\cite{06_ieee_stand_veril_hardw_descr_languag,05_ieee_stand_veril_regis_trans_level_synth}, but the syntax and semantics can be reduced to a small subset that \vericert{} needs to target.
+\NR{Have we discussed what our options were and why we chose the HOL4 semantics?} \JW{Good point -- we should cite the Maude semantics too. I think that's the only viable alternative.}
-The semantics of Verilog differ from regular programming languages, as it is used to describe hardware directly, which is inherently parallel, instead of describing an algorithm, which is often done sequentially. The main construct in Verilog is the always block, which is construct that contains statements. A module can contain multiple always blocks, which all run in parallel. These always blocks further contain statements such as if-statements or assignments to variables. Each always block also contains a list of events at which it should trigger, which could either contain signals that are assigned to other signals in that always block, or a different signal such as a clock which will trigger the always block periodically, only the latter are actually supported in our target semantics. As hardware designs normally describe events that will be executed periodically for an infinite amount of time, the top-level of the semantics can be described using small-step semantics, whereas the execution of one small step is then described using big-step semantics. An example of a rule in the semantics for executing an always block in the semantics shown below, where $\Sigma$ is the state of the registers in the module and $s$ is the statement inside the always block:
+The Verilog semantics \JW{we use is ported to Coq from} a semantics written in HOL4 by \citet{loow19_verif_compil_verif_proces}.% which was used to create a formal translation from a logic representation encoded in the HOL4~\cite{slind08_brief_overv_hol4} theorem prover into an equivalent Verilog design.
+This semantics is quite practical as it is restricted to a small subset of Verilog, which can nonetheless be used to model all hardware constructs one would want to design. The main features that are not supported by the syntax and semantics are continuous assignment and combinational always blocks.
+\NR{Shall we use special font for always-blocks: maybe \alwaysblock{}}
+
+The semantics of Verilog differs from regular programming languages, as it is used to describe hardware directly, which is inherently parallel, rather than an algorithm, which is usually sequential. The main construct in Verilog is the always block \JW{consider `always-block' as it's easier to parse}. \JWcouldcut{which is a construct that contains statements.}
+A module can contain multiple always blocks, all of which run in parallel. These always blocks further contain statements such as if-statements or assignments to variables. Each always block also contains a list of events at which it should trigger, which could either contain signals that are assigned to other signals in that always block, or a different signal such as a clock which will trigger the always block periodically; only the latter is actually supported in our target semantics. \JW{That sentence is a bit wordy. Can you just say something like: `We support only \emph{synchronous} logic, which means that the always block is triggered on (and only on) the rising edge of a clock signal.'?}
+\NR{We should mention that variables cannot be driven by multiple \alwaysblock{}s, since one might get confused with data races when relating to concurrent processes in software.} \JW{Mm, }
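+
+As a small hand-written illustration (not \vericert{} output), the following module contains two such always blocks; both are triggered on the rising edge of the clock and run in parallel, and each register is driven from exactly one block:
+
+\begin{verbatim}
+module counters(input clk, output reg [7:0] a, output reg [7:0] b);
+  // Two clocked always-blocks in one module, running in parallel.
+  always @(posedge clk)
+    a <= a + 8'd1;
+  always @(posedge clk)
+    if (a == 8'd255) b <= b + 8'd1;
+endmodule
+\end{verbatim}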
+
+
+
+As hardware designs normally describe events that will be executed periodically for an infinite amount of time, the top-level of the semantics is best described using small-step semantics, whereas the execution of one small step is described using big-step semantics. An example of a rule in the semantics for executing an always block is shown below, where $\Sigma$ is the state of the registers in the module and $s$ is the statement inside the always block:
\begin{equation*}
\inferrule[Always]{(\Sigma, s)\longrightarrow_{\text{stmnt}} \Sigma'}{(\Sigma, \yhkeyword{always @(posedge clk) } s) \longrightarrow_{\text{always}} \Sigma'}
\end{equation*}
-\noindent which shows that assuming the statement $s$ in the always block runs with state $\Sigma$ and produces the new state $\Sigma'$, the always block will result in the same final state. As only clocked always blocks are supported, and one step in the semantics correspond to one clock cycle, it means that this rule is run once per clock cycle which is what it is defined to do.
+\noindent which shows that assuming the statement $s$ in the always block runs with state $\Sigma$ and produces the new state $\Sigma'$, the always block will result in the same final state. As only clocked always blocks are supported, and one step in the semantics corresponds to one clock cycle, this rule is run once per clock cycle \NRcouldcut{which is what it is defined to do}.
+\NR{The mention about small steps being one cycle and 'only clocked/synchronous \alwaysblock{} are supported' can move earlier.}
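+
+As a small hand-written sketch (not \vericert{} output) of the two kinds of procedural assignment discussed next, consider the following always block, which mixes a blocking and a nonblocking assignment:
+
+\begin{verbatim}
+module assigns(input clk, output reg [7:0] x, output reg [7:0] y);
+  always @(posedge clk) begin
+    x = x + 8'd1;  // blocking: x is updated immediately, so the next
+    y <= x;        // statement already sees the new value of x; y itself
+  end              // is only updated at the end of the time step
+endmodule
+\end{verbatim}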
-Two types of assignments are supported in always blocks: nonblocking and blocking assignment. Nonblocking assignment modifies the signal at the end of the timestep and atomically. Blocking assignment, on the other hand, assigns the variable directly in the always block for later signals to pick up. To model both these assignments, the state $\Sigma$ has to be split into two parts: $\Gamma$, containing the current values of all variables, and $\Delta$, containing the values that will be assigned to the variables at the end of the clock cycle, we can therefore say that $\Sigma = (\Gamma, \Delta)$. The nonblocking assignment can therefore be expressed as the following:
+Two types of assignments are supported in always blocks: nonblocking and blocking assignment. Nonblocking assignment modifies the signal atomically at the end of the timestep. Blocking assignment, on the other hand, assigns the variable \NRreplace{directly}{instantaneously?} in the always block for later signals to pick up. To model both these assignments, the state $\Sigma$ has to be split into two \NRreplace{parts}{sets?}: $\Gamma$, containing the current values of all variables, and $\Delta$, containing the values that will be assigned to the variables at the end of the clock cycle; we can therefore say that $\Sigma = (\Gamma, \Delta)$.~\NR{Can we say that $\Gamma$ contains instantaneous (ephemeral) updates for the current cycle and $\Delta$ contains periodical updates for the next cycle? Will be good to distinguish those two updates with the same terms across the paper, which can tie in with the small-step and big-step distinction, maybe asynchronous vs synchronous updates. } The nonblocking assignment can therefore be expressed as the following:
\begin{equation*}
\inferrule[Nonblocking Reg]{\yhkeyword{name}\ d = \yhkeyword{OK}\ n \\ (\Gamma, e) \longrightarrow_{\text{expr}} v}{((\Gamma, \Delta), d\ \yhkeyword{ <= } e) \longrightarrow_{\text{stmnt}} (\Gamma, \Delta [n \mapsto v])}\\