-rw-r--r--  introduction.tex  16
-rw-r--r--  main.tex          11
-rw-r--r--  verilog.tex       68
3 files changed, 50 insertions, 45 deletions
diff --git a/introduction.tex b/introduction.tex
index cbf4780..1476d33 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -2,20 +2,18 @@
%% Motivation for why HLS might be needed
-The current approach to writing energy-efficient and high-throughput applications is to use application-specific hardware, instead of relying on a general purpose \JW{general-purpose} CPU.\@ However, custom hardware designs come at the cost of having to design and produce it \JW{them?}, which can be a tedious and error-prone process using hardware description languages (HDL) such as Verilog. Especially with the size of hardware designs grow \JW{growing} over the years, it can become difficult to verify that the hardware design behaves in the expected way, as simulation of hardware description languages \JW{HDLs} can be quite inefficient. Furthermore, the algorithms that are being accelerated in hardware often already have a software implementation, meaning they have to be reimplemented efficiently in a hardware description language which can be time consuming \JW{time-consuming}.
+The current approach to writing energy-efficient and high-throughput applications is to use application-specific hardware, instead of relying on a general-purpose CPU.\@ However, custom hardware designs come at the cost of having to design and produce them, which can be a tedious and error-prone process using hardware description languages (HDLs) such as Verilog. Especially as hardware designs grow in size over the years, it can become difficult to verify that a hardware design behaves in the expected way, as simulation of HDLs can be quite inefficient. Furthermore, the algorithms that are being accelerated in hardware often already have a software implementation, meaning they have to be reimplemented efficiently in an HDL, which can be time-consuming.
%% Definition and benefits of HLS
-One possible solution to \JW{the} tedious design process of custom hardware is to use high-level synthesis (HLS), which is the process of generating custom hardware, represented in a hardware description language \JW{HDL}, based on a behavioural description, often \JW{in} a subset of C. This elevates the level of abstraction, because the description of the algorithm in C is inherently untimed, meaning actions don't have to be scheduled into clock cycles. The higher level of abstraction makes it easier to reason about the algorithms and therefore also makes them easier to maintain. This already reduces the time taken to design the hardware, especially if a software description of the algorithm already exists, because it won't \JW{doesn't} have to be designed again at a lower level and directly in hardware. However, another benefit of using HLS to design the hardware, \JW{no comma here} is that functional verification of the design is much simpler and more efficient than if it was \JW{were} done at the HDL stage, as the whole software ecosystem can be used to do that. Instead of having to run simulations of the hardware, the C code can just be compiled and executed natively, as the hardware design after HLS should have the same behaviour.
-\NR{The abstraction of HLS helps in two ways: improving productivity of hardware designers and reducing the entry barrier of hardware design for software programmers. Both these audiences stand to benefit from the guarantees provided by verified C-to-Verilog generation.} \JW{Yes, Nadesh makes a good point here. Worth incorporating.}
+One possible solution to the tedious design process of custom hardware is to use high-level synthesis (HLS), which is the process of generating custom hardware, represented in an HDL, based on a behavioural description, often in a subset of C. This elevates the level of abstraction, because the description of the algorithm in C is inherently untimed, meaning actions don't have to be scheduled into clock cycles. The higher level of abstraction makes it easier to reason about the algorithms and therefore also makes them easier to maintain. This already reduces the time taken to design the hardware, especially if a software description of the algorithm already exists, because it doesn't have to be redesigned at a lower level, directly in hardware. Another benefit of using HLS to design the hardware is that functional verification of the design is much simpler and more efficient than if it were done at the HDL stage, as the whole software ecosystem can be used for that. Instead of having to run simulations of the hardware, the C code can just be compiled and executed natively, as the hardware design after HLS should have the same behaviour. Moreover, the abstraction that HLS provides helps to improve the productivity of hardware designers, as well as reducing the barrier of entry to hardware design for software programmers. For hardware designers, manually checking that the generated hardware does indeed behave correctly reduces productivity, whereas for software programmers, properly testing the hardware may be infeasible, as they may be unfamiliar with the standard hardware verification tools. Both these audiences therefore stand to benefit from the guarantees provided by verified C-to-Verilog generation.
+% \NR{The abstraction of HLS helps in two ways: improving productivity of hardware designers and reducing the entry barrier of hardware design for software programmers. Both these audiences stand to benefit from the guarantees provided by verified C-to-Verilog generation.} \JW{Yes, Nadesh makes a good point here. Worth incorporating.}\YH{Added.}
%% Unreliability of HLS
-However, the fact that the behaviour is preserved after HLS cannot be guaranteed most existing tools,\YH{Mentor's catapult C can in some cases} meaning behavioural simulation of the hardware design still has to be performed. HLS tools are also known to be quite unreliable, for example, Intel's (formerly Altera's) OpenCL SDK compiler contained too many bugs to even be considered for random testing, as more than 25\% of the testcases failed~\cite{lidbury15_many_core_compil_fuzzin}. In addition to that, designers often feel like HLS tools are quite unreliable and fragile with respect to which language features that are supported.\YH{Need citation}
-\JW{Here's some text that could be helpful in this paragraph... However, most HLS tools cannot guarantee that compilation is behaviour-preserving. In fact, on the contrary, there is some evidence that current HLS tools are actually quite \emph{unreliable} in this regard. For instance, an attempt by \citet{lidbury15_many_core_compil_fuzzin} to fuzz Altera's (now Intel's) OpenCL compiler had to be abandoned because the compiler ``either crashed or emitted an internal compiler error'' on so many of their test inputs.
-Meanwhile, Xilinx's Vivado HLS has been shown to apply pipelining optimisations incorrectly\footnote{\url{https://bit.ly/vivado-hls-pipeline-bug}} or to silently generate wrong code should the programmer stray outside the fragment of C that it supports\footnote{\url{https://bit.ly/vivado-hls-pointer-bug}}.}
+However, most HLS tools cannot guarantee that compilation is behaviour-preserving,\YH{Mentor's catapult C can in some cases} meaning behavioural simulation of the hardware design still has to be performed. On the contrary, there is some evidence that current HLS tools are actually quite \emph{unreliable} in this regard. For instance, an attempt by \citet{lidbury15_many_core_compil_fuzzin} to fuzz Altera's (now Intel's) OpenCL compiler had to be abandoned because the compiler ``either crashed or emitted an internal compiler error'' on so many of their test inputs: more than 25\% of the test cases failed. Meanwhile, Xilinx's Vivado HLS has been shown to apply pipelining optimisations incorrectly\footnote{\url{https://bit.ly/vivado-hls-pipeline-bug}} or to silently generate wrong code should the programmer stray outside the fragment of C that it supports\footnote{\url{https://bit.ly/vivado-hls-pointer-bug}}. In addition, designers often feel that HLS tools are quite fragile with respect to which language features are supported. As HLS tools are extremely complex and can therefore incorrectly change the behaviour of the design, it is not possible to guarantee that all the properties of the code that were proven in software will also hold for the generated hardware.
+
% JW: Another candidate, probably less interesting:
% https://bit.ly/intel-hls-memory-bug
-As HLS tools are extremely complex and can therefore incorrectly change the behaviour of the design, it is not possible to guarantee that all the properties of the code that were proven in software will also hold for the generated hardware.
%% Current work in formal verification of HLS
%%\NR{This is a good paragraph, but we need to relate it more to this work and why this work is different.}
@@ -28,7 +26,7 @@ CompCert~\cite{leroy06_formal_certif_compil_back_end} is a C compiler that has b
%% Contributions of paper
-In this paper we describe a fully verified high-level synthesis \JW{HLS} tool called CoqUp, which adds a Verilog backend to CompCert and proves that the behaviour of the C code is preserved with respect to an existing Verilog semantics. The main contributions of the paper are the following:
+In this paper we describe a fully verified HLS tool called CoqUp, which adds a Verilog back end to CompCert and proves that the behaviour of the C code is preserved with respect to an existing Verilog semantics. The main contributions of the paper are the following:
\begin{itemize}
\item First mechanised and formally verified HLS flow from C to Verilog.
@@ -45,7 +43,7 @@ CoqUp is open source and is hosted on Github\footnote{https://github.com/ymherkl
\NR{Other comments:}
\NR{Is both the translator and verifier written in Coq?}\YH{Currently there is no verifier, the algorithms themselves are proven directly in Coq.}
%%\NR{A tool-flow diagram here will be useful. I thought you had one previously?}
-\NR{Do you think we have a very simple example of a program where wrong Verilog is generated in VHLS or LegUp, but not in CoqUp?}\YH{The legup example actually with the shift might be a good one.}
+\NR{Do you think we have a very simple example of a program where wrong Verilog is generated in VHLS or LegUp, but not in CoqUp?}\YH{The LegUp example actually with the shift might be a good one.}
%%% Local Variables:
%%% mode: latex
diff --git a/main.tex b/main.tex
index 40151be..a0c8c73 100644
--- a/main.tex
+++ b/main.tex
@@ -59,6 +59,15 @@
\definecolor{compcert}{HTML}{66c2a5}
\definecolor{formalhls}{HTML}{fc8d62}
+\definecolor{keywordcolour}{HTML}{8f0075}
+\definecolor{functioncolour}{HTML}{721045}
+\definecolor{constantcolour}{HTML}{0000bb}
+
+\newcommand\yhkeywordsp[1]{\;\;\texttt{\textcolor{keywordcolour}{#1}}}
+\newcommand\yhkeyword[1]{\texttt{\textcolor{keywordcolour}{#1}}}
+\newcommand\yhfunctionsp[1]{\;\;\texttt{\textcolor{functioncolour}{#1}}}
+\newcommand\yhfunction[1]{\texttt{\textcolor{functioncolour}{#1}}}
+\newcommand\yhconstant[1]{\texttt{\textcolor{constantcolour}{#1}}}
\begin{document}
@@ -182,8 +191,8 @@
\maketitle
\input{introduction}
-\input{verilog}
\input{algorithm}
+\input{verilog}
\input{proof}
\input{evaluation}
\input{related}
diff --git a/verilog.tex b/verilog.tex
index 431bca1..e68352d 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -2,40 +2,38 @@
Verilog is a hardware description language commonly used to design hardware. A Verilog design can then be synthesised into more basic logic which describes how different gates connect to each other, called a netlist. This representation can then be put onto either a field-programmable gate array (FPGA) or turned into an application-specific integrated circuit (ASIC) to implement the design that was described in Verilog. The Verilog standard is quite large though, and not all Verilog features are needed to be able to describe hardware. Many Verilog features are only useful for simulation and do not affect the actual hardware itself, which means that these features do not have to be modelled in the semantics. In addition, because the HLS algorithm dictates which Verilog constructs are generated, the Verilog subset that has to be modelled by the semantics can be reduced even further, to only the constructs that are needed. Only supporting a smaller subset in the semantics also means that there is less chance that the standard is misunderstood, and that the semantics actually model how the Verilog is simulated.
-The Verilog semantics are based on the semantics proposed by \citet{loow19_verif_compil_verif_proces}, which were used to create a formal translation from HOL logic into a Verilog circuit. These semantics are quite practical as they restrict themselves to a small subset of Verilog, which can nonetheless be used to model all hardware constructs one would want to design. The main syntax for the Verilog subset is the following: \JW{Eventually the $e$ alternatives could be listed horizontally rather than with one per line, to save space.} \JW{This verilog syntax looks weird to me. I didn't think there was a `then' keyword, for instance. Perhaps you're aiming more at some sort of abstracted syntax of Verilog? What does the semicolon on its own mean? Some sort of skip statement? The case statement looks weird too -- how do you get multiple cases in a single switch statement, and where is the default case? }
+The Verilog semantics are based on the semantics proposed by \citet{loow19_verif_compil_verif_proces}, which were used to create a formal translation from HOL logic into a Verilog circuit. These semantics are quite practical as they restrict themselves to a small subset of Verilog, which can nonetheless be used to model all hardware constructs one would want to design. The main syntax for the Verilog subset is the following: \JW{This verilog syntax looks weird to me. I didn't think there was a `then' keyword, for instance. Perhaps you're aiming more at some sort of abstracted syntax of Verilog? What does the semicolon on its own mean? Some sort of skip statement? The case statement looks weird too -- how do you get multiple cases in a single switch statement, and where is the default case? }\YH{I think I have fixed most cases, yes the single semicolon is a skip statement, should I make that more obvious by naming it? }
\begin{align*}
- v ::=&\; \mathit{sz} * n\\
- e ::=&\; v\\[-2pt]
- |&\; x\\[-2pt]
- |&\; e [e]\\[-2pt]
- |&\; e\ \mathit{op}\ e\\[-2pt]
- |&\; \texttt{!} e\ |\ \texttt{\textasciitilde} e\\[-2pt]
- |&\; e \texttt{ ? } e \texttt{ : } e\\
- s ::=&\; s\ \texttt{;}\ s\ |\ \texttt{;}\\[-2pt]
- |&\; \texttt{if } e \texttt{ then } s \texttt{ else } s\\[-2pt]
- |&\; \texttt{case } e\ [e : s] \texttt{ endcase}\\[-2pt]
- |&\; e = e\\[-2pt]
- |&\; e \Leftarrow e\\
- d ::=&\; \texttt{[n-1:0] } r\ |\ \texttt{[n-1:0] } r \texttt{ [m-1:0]}\\
- m ::=&\ \texttt{reg } d \texttt{;}\ |\ \texttt{input wire } d \texttt{;}\ |\ \texttt{output reg } d \texttt{;}\\
-|&\; \text{\tt always @(posedge clk)}\ s
+ v\quad ::=&\; \mathit{sz} \yhkeyword{'d} n\\
+ \textit{op}\quad ::=&\; \yhkeyword{+ } | \yhkeywordsp{- } | \yhkeywordsp{* } \cdots \\
+ e\quad ::=&\; v\;\; |\;\; x\;\; |\;\; e \yhkeyword{[} e \yhkeyword{]}\;\; |\;\; e\ \mathit{op}\ e\;\; |\;\; \yhkeyword{!} e\;\; |\;\; \yhkeyword{\textasciitilde} e\;\; |\;\; e \yhkeywordsp{? } e \yhkeywordsp{: } e\\
+ s\quad ::=&\; \yhkeyword{begin } s \yhkeyword{; } s \yhkeywordsp{end } |\ \epsilon\\[-2pt]
+ |&\; \yhkeyword{if } e\ s \yhkeywordsp{else } s\\[-2pt]
+ |&\; \yhkeyword{case (} e \yhkeyword{) } e : s\ \{\ e : s\ \}\ [\ \yhkeyword{default} : \textit{def}\ ] \yhkeywordsp{endcase}\\[-2pt]
+ |&\; e = e \yhkeyword{;}\\[-2pt]
+ |&\; e \Leftarrow e \yhkeyword{;}\\
+ d\quad ::=&\; \yhkeyword{[n-1:0] } r\ |\ \yhkeyword{[n-1:0] } r \yhkeywordsp{[m-1:0]}\\
+ m\quad ::=&\; \yhkeyword{reg } d \yhkeyword{;}\ |\ \yhkeyword{input wire } d \yhkeyword{;}\ |\ \yhkeyword{output reg } d \yhkeyword{;}\\
+|&\; \yhkeywordsp{always @(posedge clk) begin } s \yhkeywordsp{end}\\
+ m \text{ list}\quad ::=&\; \{ m \}
\end{align*}
-The main addition to the Verilog syntax is the explicit declaration of inputs and outputs, as well as variables and arrays. This means that the declarations have to be handled in the semantics as well, adding to the safety that all the registers are declared properly with the right size, as this affects how the Verilog module is synthesised and simulated. In addition to that, literal values are not represented by a list of nested boolean \JW{Boolean} values, but instead they are represented by a size and its value \JW{But I wouldn't use `$*$' to separate the size and the value here, because it makes it look like you're multiplying them together. You could use the apostrophe symbol like real Verilog? \texttt{4'b5} and so on?}, meaning a boolean is represented as a value with size one. Finally, the last difference is that the syntax supports two dimensional arrays in Verilog explicitly which model memory so that we can reason about array loads and stores properly. \JW{In the $m$ category, should it be `reg d; m' rather than just `reg d;'?}
+The main addition to the Verilog syntax is the explicit declaration of inputs and outputs, as well as variables and arrays. This means that the declarations have to be handled in the semantics as well, which helps ensure that all the registers are declared with the correct size, as this affects how the Verilog module is synthesised and simulated. In addition to that, literal values are not represented by a list of nested Boolean values, but instead by a size and a value, meaning a Boolean is represented as a value with size one. Finally, the last difference is that the syntax explicitly supports two-dimensional arrays in Verilog, which model memories, so that we can reason about array loads and stores properly. \JW{In the $m$ category, should it be `reg d; m' rather than just `reg d;'?}\YH{Yes that's actually completely true, I added a $\vec{m}$ rule for now, but can also add the $m$ afterwards.}
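To make the subset concrete, a small module using only the constructs above might look as follows. This example is our own illustration, not taken from the development; it shows a declaration, a two-dimensional array modelling a memory, and an always block with a case statement that mixes blocking and nonblocking assignments.

```verilog
module ex(input wire clk);
  reg [31:0] state;            // a declaration of the form [n-1:0] r
  reg [31:0] mem [15:0];       // two-dimensional array modelling a memory

  always @(posedge clk) begin
    case (state)
      32'd0: begin
        mem[state] = 32'd5;    // blocking: visible within this cycle
        state <= 32'd1;        // nonblocking: visible at the next cycle
      end
      default: state <= 32'd0; // default case of the switch
    endcase
  end
endmodule
```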
\subsection{Semantics}
-Existing operational semantics~\cite{loow19_verif_compil_verif_proces} were adapted for the semantics of the language that CoqUp eventually targets. These semantics are \JW{This semantics is a} small-step operational semantics at the clock cycle level, as hardware typically does not terminate in any way, however, within each clock cycle the semantics are constructed in a big-step style semantics. This style of semantics matches the small-step operational semantics of CompCert's register transfer language (RTL) quite well.
+Existing operational semantics~\cite{loow19_verif_compil_verif_proces} were adapted for the semantics of the language that CoqUp eventually targets. This semantics is a small-step operational semantics at the clock-cycle level, since hardware typically does not terminate; within each clock cycle, however, the semantics is constructed in a big-step style. This style of semantics matches the small-step operational semantics of CompCert's register transfer language (RTL) quite well.
-At the top-level, always blocks describe logic which is run every time some event occurs. The only event that is supported by these semantics is detecting the positive
-\JW{rising} edge of the clock, so that we can implement synchronous logic. As soon as an event occurs, the hardware will be executed, meaning if there are multiple always blocks that get triggered by the event, these will run in parallel. However, as the semantics should be deterministic, we impose an order on the always blocks and execute them sequentially. However, to preserve the fact that the statements inside of the always block are executed in parallel, nonblocking assignments to variables need to be kept in a different association map compared to blocking assignments to variables. This preserves the behaviour that blocking assignments change the value of the variable inside of the clock cycle, whereas the nonblocking assignments only take place at the end of the clock cycle, and in parallel. We can denote these two association maps as $s = (\Gamma_{r}, \Gamma_{a}, \Delta_{r}, \Delta_{a})$, where $\Gamma_{r}$ is the current value of the registers, $\Gamma_{a}$ is the current value of the array, and $\Delta_{r}$ and $\Delta_{a}$ are the values of the variables and arrays when the clock cycle ends.
+At the top level, always blocks describe logic which is run every time some event occurs. The only event that is supported by these semantics is the rising edge of the clock, so that we can implement synchronous logic. As soon as an event occurs, the hardware will be executed, meaning if there are multiple always blocks that get triggered by the event, these will run in parallel. However, as the semantics should be deterministic, we impose an order on the always blocks and execute them sequentially. To preserve the fact that the statements inside of an always block are executed in parallel, nonblocking assignments to variables need to be kept in a different association map than blocking assignments. This preserves the behaviour that blocking assignments change the value of the variable within the clock cycle, whereas nonblocking assignments only take place at the end of the clock cycle, and in parallel. We can denote these two association maps as $\sigma = (\Gamma_{r}, \Gamma_{a}, \Delta_{r}, \Delta_{a})$, where $\Gamma_{r}$ is the current value of the registers, $\Gamma_{a}$ is the current value of the arrays, and $\Delta_{r}$ and $\Delta_{a}$ are the values of the variables and arrays when the clock cycle ends.
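The interplay between the two association maps can be sketched in Python. This is an illustrative model of the rule above, not the Coq development; the names `run_cycle`, `gamma` ($\Gamma_{r}$) and `delta` ($\Delta_{r}$) are ours.

```python
# Illustrative model: blocking writes update gamma (the current register
# map) immediately, while nonblocking writes are queued in delta and only
# merged into gamma at the end of the clock cycle, in parallel.
def run_cycle(gamma, stmts):
    gamma = dict(gamma)   # Gamma_r: current register values
    delta = {}            # Delta_r: pending nonblocking writes
    for kind, reg, expr in stmts:
        value = expr(gamma)           # right-hand sides read Gamma_r
        if kind == "blocking":        # r = e: visible within this cycle
            gamma[reg] = value
        else:                         # r <= e: visible next cycle only
            delta[reg] = value
    gamma.update(delta)               # merge at the clock edge
    return gamma

# Swapping two registers with nonblocking assignments works, because both
# right-hand sides still see the pre-cycle values:
regs = run_cycle({"a": 1, "b": 2},
                 [("nonblocking", "a", lambda g: g["b"]),
                  ("nonblocking", "b", lambda g: g["a"])])
# regs == {"a": 2, "b": 1}
```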
-We can then define how one step in the semantics looks like. We therefore first need to define the structure of the main module which will contain the logic for the program. In general, functions that are translated to hardware will require basic handshaking signals so that the translated function can be used in hardware. Firstly, they require an input for the clock, so that all the sequential circuits are run at the right time. They then require a start and reset input, so that the hardware generated from the function can be reused multiple times. Finally, they need a finish and return signal, where finish will go high when the result is ready to be read. In addition to that, the function could take an arbitrary number of inputs which act as arguments to the function, so that the function can be called with different arguments. However, in addition to inputs and outputs to the module, we also need to keep track of some internal signals and properties about the module. Firstly, we need to keep track of the internal variables that contain the current state of the module, and the current contents of the stack. Finally, the module will contain the entry point of the module and the list of module items that declare all of the internal registers and contain the encoding of the state machine that behaves in the same way as the function. We can therefore declare it in the following way: \JW{I'd be inclined to write `$r~\mathrm{list}$' rather than $\vec{r}$, as it's a little more readable. (Assuming it's more-or-less the same thing?)}
+We can now define what one step in the semantics looks like. We first need to define the structure of the main module, which will contain the logic for the program. In general, functions that are translated to hardware require basic handshaking signals so that the translated function can be used in hardware. Firstly, they require an input for the clock, so that all the sequential circuits are run at the right time. They then require start and reset inputs, so that the hardware generated from the function can be reused multiple times. Finally, they need finish and return signals, where finish goes high when the result is ready to be read. In addition to that, the function could take an arbitrary number of inputs which act as arguments, so that the function can be called with different arguments. Beyond inputs and outputs, we also need to keep track of some internal signals and properties of the module. Firstly, we need to keep track of the internal variables that contain the current state of the module and the current contents of the stack. Finally, the module will contain its entry point and the list of module items that declare all of the internal registers and contain the encoding of the state machine that behaves in the same way as the function. We can therefore declare it in the following way:
+
+%\JW{I'd be inclined to write `$r~\mathrm{list}$' rather than $\vec{r}$, as it's a little more readable. (Assuming it's more-or-less the same thing?)}
\begin{align*}
- \mathit{M} \quad ::= \quad \big\{\ &\mathtt{args} : \vec{r}\\
- &\mathtt{body} : \vec{m}\\
+ \mathit{M} \quad ::= \quad \big\{\ &\mathtt{args} : r \text{ list}\\
+ &\mathtt{body} : m \text{ list}\\
&\mathtt{entrypoint} : n\\
&\mathtt{st, stk, finish, return, start, reset, clk} : r\\
&\mathtt{stacksize} : n\ \big\}
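The record above corresponds to a Verilog module with roughly the following shape. This skeleton is a hypothetical illustration of ours (the state-machine body is elided); the signal names mirror the record fields.

```verilog
// Hypothetical skeleton: the handshaking interface described above.
module main(input wire clk,           // clock for sequential logic
            input wire start,
            input wire reset,
            output reg finish,        // goes high when the result is ready
            output reg [31:0] return_val);
  reg [31:0] st;                      // current state of the state machine
  reg [31:0] stk [31:0];              // stack memory: `stacksize` words

  always @(posedge clk) begin
    if (reset) st <= 32'd0;          // jump back to the entry point
    else begin
      // encoding of the state machine goes here; in the final state:
      finish <= 1'd1;
      return_val <= 32'd0;
    end
  end
endmodule
```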
@@ -45,32 +43,32 @@ The two main evaluation functions are then \textit{erun}, which takes in the cur
\begin{gather*}
\label{eq:1}
- \inferrule[Skip]{ }{\textit{srun}\ s\ \epsilon = s}\\
+ \inferrule[Skip]{ }{\textit{srun}\ \sigma\ \epsilon\ \sigma}\\
%
- \inferrule[Seq]{\textit{srun}\ s_{0}\ \textit{st}_{1}\ s_{1} \\ \textit{srun}\ s_{1}\ \textit{st}_{2}\ s_{2}}{\textit{srun}\ s_{0}\ (\textit{st}_{1}\ \texttt{;}\ \textit{st}_{2})\ s_{2}}\\
+ \inferrule[Seq]{\textit{srun}\ \sigma_{0}\ \textit{s}_{1}\ \sigma_{1} \\ \textit{srun}\ \sigma_{1}\ \textit{s}_{2}\ \sigma_{2}}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{begin } \textit{s}_{1} \yhkeyword{;}\ \textit{s}_{2} \yhkeywordsp{end})\ \sigma_{2}}\\
%
- \inferrule[CondTrue]{\textit{erun}\ \Gamma_{0}\ c\ v_{c} \\ \texttt{valToB}\ v_{c} = \texttt{true} \\ \textit{srun}\ s_{0}\ \textit{stt}\ s_{1}}{\textit{srun}\ s_{0}\ (\texttt{if } c \texttt{ then } \textit{stt} \texttt{ else } \textit{stf}\,)\ s_{1}}\\
+ \inferrule[CondTrue]{\textit{erun}\ \Gamma_{0}\ c\ v_{c} \\ \yhfunction{valToB}\ v_{c} = \yhconstant{true} \\ \textit{srun}\ \sigma_{0}\ \textit{st}\ \sigma_{1}}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{if } c\ \textit{st} \yhkeywordsp{else } \textit{sf})\ \sigma_{1}}\\
%
- \inferrule[CondFalse]{\textit{erun}\ \Gamma_{0}\ c\ v_{c} \\ \texttt{valToB}\ v_{c} = \texttt{false} \\ \textit{srun}\ s_{0}\ \textit{stf}\ s_{1}}{\textit{srun}\ s_{0}\ (\texttt{if } c \texttt{ then } \textit{stt} \texttt{ else } \textit{stf}\,)\ s_{1}}\\
+ \inferrule[CondFalse]{\textit{erun}\ \Gamma_{0}\ c\ v_{c} \\ \yhfunction{valToB}\ v_{c} = \yhconstant{false} \\ \textit{srun}\ \sigma_{0}\ \textit{sf}\ \sigma_{1}}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{if } c\ \textit{st} \yhkeywordsp{else } \textit{sf})\ \sigma_{1}}\\
%
- \inferrule[CaseNoMatch]{\textit{srun}\ s_{0}\ (\texttt{case}\ e\ cs\ \textit{def})\ s_{1} \\ \textit{erun}\ \Gamma_{0}\ me\ mve \\ \textit{erun}\ \Gamma_{0}\ e\ ve \\ mve \neq ve}{\textit{srun}\ s_{0}\ (\texttt{case}\ e\ ((me,\ sc) :: cs)\ \textit{def})\ s_{1}}\\
+ \inferrule[CaseNoMatch]{\textit{srun}\ \sigma_{0}\ (\yhkeyword{case (} e \yhkeyword{) } cs \yhkeywordsp{default} : \textit{def} \yhkeywordsp{endcase})\ \sigma_{1} \\ \textit{erun}\ \Gamma_{0}\ me\ mve \\ \textit{erun}\ \Gamma_{0}\ e\ ve \\ mve \neq ve}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{case (} e \yhkeyword{) } ((me : sc) :: cs) \yhkeywordsp{default} : \textit{def} \yhkeywordsp{endcase})\ \sigma_{1}}\\
%
- \inferrule[CaseMatch]{\textit{srun}\ s_{0}\ sc\ s_{1} \\ \textit{erun}\ \Gamma_{0}\ e\ ve \\ \textit{erun}\ \Gamma_{0}\ me\ mve \\ mve = ve}{\textit{srun}\ s_{0}\ (\texttt{case}\ e\ ((me,\ sc) :: cs)\ \textit{def})\ s_{1}}\\
+ \inferrule[CaseMatch]{\textit{srun}\ \sigma_{0}\ sc\ \sigma_{1} \\ \textit{erun}\ \Gamma_{0}\ e\ ve \\ \textit{erun}\ \Gamma_{0}\ me\ mve \\ mve = ve}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{case (} e \yhkeyword{) } ((me : sc) :: cs) \yhkeywordsp{default} : \textit{def} \yhkeywordsp{endcase})\ \sigma_{1}}\\
%
- \inferrule[CaseDefault]{\textit{srun}\ s_{0}\ st\ s_{1}}{\textit{srun}\ s_{0}\ (\texttt{case}\ e\ []\ (\texttt{Some}\ st))\ s_{1}}\\
+ \inferrule[CaseDefault]{\textit{srun}\ \sigma_{0}\ s\ \sigma_{1}}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{case (} e \yhkeyword{) } [] \yhkeywordsp{default} : (\yhkeyword{Some}\ s) \yhkeywordsp{endcase})\ \sigma_{1}}\\
%
- \inferrule[Blocking Reg]{\texttt{name}\ \textit{lhs} = \texttt{OK}\ n \\ \textit{erun}\ \Gamma_{r}\ \Gamma_{a}\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{r},\Gamma_{a},\Delta_{r},\Delta_{a})\ (\textit{lhs} = \textit{rhs})\ (\Gamma_{r} // \{n \rightarrow v_{\textit{rhs}}\}, \Gamma_{a}, \Delta_{r}, \Delta_{a})}\\
+ \inferrule[Blocking Reg]{\yhfunction{name}\ \textit{lhs} = \yhconstant{OK}\ n \\ \textit{erun}\ \Gamma_{r}\ \Gamma_{a}\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{r},\Gamma_{a},\Delta_{r},\Delta_{a})\ (\textit{lhs} = \textit{rhs})\ (\Gamma_{r} [n \mapsto v_{\textit{rhs}}], \Gamma_{a}, \Delta_{r}, \Delta_{a})}\\
%
- \inferrule[Nonblocking Reg]{\texttt{name}\ \textit{lhs} = \texttt{OK}\ n \\ \textit{erun}\ \Gamma\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r}, \Delta_{a})\ (\textit{lhs} \Leftarrow \textit{rhs})\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r} // \{n \rightarrow v_{\textit{rhs}}\}, \Delta_{a})}
+ \inferrule[Nonblocking Reg]{\yhfunction{name}\ \textit{lhs} = \yhconstant{OK}\ n \\ \textit{erun}\ \Gamma\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r}, \Delta_{a})\ (\textit{lhs} \Leftarrow \textit{rhs})\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r} [n \mapsto v_{\textit{rhs}}], \Delta_{a})}
%
-% \inferrule[Blocking Array]{\texttt{name}\ \textit{lhs} = \texttt{OK}\ n \\ \textit{erun}\ \Gamma_{r}\ \Gamma_{a}\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{r},\Gamma_{a},\Delta_{r},\Delta_{a})\ (\textit{lhs} = \textit{rhs})\ (\Gamma_{r} // \{n \rightarrow v_{\textit{rhs}}\}, \Gamma_{a}, \Delta_{r}, \Delta_{a})}\\
+% \inferrule[Blocking Array]{\yhkeyword{name}\ \textit{lhs} = \yhkeyword{OK}\ n \\ \textit{erun}\ \Gamma_{r}\ \Gamma_{a}\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{r},\Gamma_{a},\Delta_{r},\Delta_{a})\ (\textit{lhs} = \textit{rhs})\ (\Gamma_{r} // \{n \rightarrow v_{\textit{rhs}}\}, \Gamma_{a}, \Delta_{r}, \Delta_{a})}\\
%
-% \inferrule[Nonblocking Array]{\texttt{name}\ \textit{lhs} = \texttt{OK}\ n \\ \textit{erun}\ \Gamma\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r}, \Delta_{a})\ (\textit{lhs} \Leftarrow \textit{rhs})\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r} // \{n \rightarrow v_{\textit{rhs}}\}, \Delta_{a})}
+% \inferrule[Nonblocking Array]{\yhkeyword{name}\ \textit{lhs} = \yhkeyword{OK}\ n \\ \textit{erun}\ \Gamma\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r}, \Delta_{a})\ (\textit{lhs} \Leftarrow \textit{rhs})\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r} // \{n \rightarrow v_{\textit{rhs}}\}, \Delta_{a})}
\end{gather*}
\YH{TODO: Add rules for blocking and nonblocking assignment to arrays.} \JW{The `Skip' rule has an erroneous $=$.} \JW{You could use a bit of colour here, e.g. the keywords like `if' could be coloured for readability.} \JW{The difference between `s' and `st' is hard to remember, since both are prefixes of both `state' and `statement'! It's quite common to use `$\sigma$' for states, so you might consider `$s$' and `$\sigma$' for statements and states?} \JW{The function update syntax is not familiar to me, but perhaps it is what is used in Coq? More typical would be `$\Delta[n\mapsto v]$'.}\YH{These should all be fixed now: the rules use coloured keywords, $s$/$\sigma$ for statements/states, and the $[n \mapsto v]$ update syntax.}
-Taking the \textsc{CondTrue} rule as an example, this rule will only apply if the boolean result of running the expression results in a \texttt{true} value. It then also states that the statement in the true branch of the conditional statement \textit{stt} runs from state $s_{0}$ to state $s_{1}$. If both of these conditions hold, we then get that the conditional statement will also run from state $s_{0}$ to state $s_{1}$. The \textsc{Blocking} and \textsc{Nonblocking} rules are a bit more interesting, as these modify the blocking and nonblocking association maps respectively.
+Taking the \textsc{CondTrue} rule as an example, this rule will only apply if the Boolean result of running the expression is a \yhconstant{true} value. It then also states that the statement in the true branch of the conditional, \textit{st}, runs from state $\sigma_{0}$ to state $\sigma_{1}$. If both of these conditions hold, we conclude that the whole conditional statement also runs from state $\sigma_{0}$ to state $\sigma_{1}$. The \textsc{Blocking} and \textsc{Nonblocking} rules are a bit more interesting, as these modify the blocking and nonblocking association maps respectively.
One main difference between these semantics and the Verilog semantics by \citet{loow19_verif_compil_verif_proces} is that there is no function for external nondeterministic effects, such as memories, inputs and outputs. These are instead handled explicitly in the semantics, by using two-dimensional unpacked arrays to model memories and by assuming that inputs to modules cannot change. Another difference is that partial updates to arrays are fully supported, due to the fact that there are two separate queues for arrays and variables. Originally, if there was a blocking assignment to an array followed by a nonblocking assignment to a different region of the same array, the blocking assignment would disappear at the end of the clock cycle, because the complete array would be overwritten with the updated array from the nonblocking association map. In our semantics, however, only the values that were changed in the array are recorded in the nonblocking assignment queue, meaning that when the blocking and nonblocking array association maps are merged, only the indices that were actually changed by nonblocking assignments are updated in the blocking assignment map.
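The point about partial array updates can be sketched in Python. Again this is an illustrative model with names of our choosing (`run_array_cycle`, `gamma` for $\Gamma_{a}$, `delta` for $\Delta_{a}$), not the Coq development.

```python
# Illustrative model: the nonblocking queue for arrays records individual
# indices rather than whole arrays, so a blocking write to one index
# survives a nonblocking write to a different index in the same cycle.
def run_array_cycle(arr, stmts):
    gamma = list(arr)   # Gamma_a: current array contents
    delta = {}          # Delta_a: pending nonblocking writes, per index
    for kind, idx, val in stmts:
        if kind == "blocking":
            gamma[idx] = val          # visible immediately
        else:
            delta[idx] = val          # recorded per index only
    for idx, val in delta.items():    # merge: touch only changed indices
        gamma[idx] = val
    return gamma

# Blocking write to index 0 and nonblocking write to index 1 in the same
# cycle: both survive, because the merge only updates index 1.
result = run_array_cycle([0, 0], [("blocking", 0, 7), ("nonblocking", 1, 9)])
# result == [7, 9]
```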