-rw-r--r--  algorithm.tex         | 149
-rw-r--r--  data/accumulator.c    |   7
-rw-r--r--  data/accumulator.htl  |  41
-rw-r--r--  data/accumulator.rtl  |  18
-rw-r--r--  data/accumulator.v    |  53
-rw-r--r--  introduction.tex      |   2
-rw-r--r--  verilog.tex           |  29
7 files changed, 149 insertions, 150 deletions
diff --git a/algorithm.tex b/algorithm.tex
index e43e804..f35e0c7 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -27,9 +27,11 @@
\caption{Verilog backend branching off at the RTL stage.}\label{fig:rtlbranch}
\end{figure}
-This section covers the main architecture of the HLS tool, and how the backend was added to CompCert. \JW{New paragraph.} CompCert is made up of 11 intermediate languages in between the Clight input and the assembly output. These intermediate languages each serve a different purpose and contain various different optimisations. When designing a new backend for CompCert, it is therefore crucial to know where to branch off and start the hardware generation. Many of the optimisations that the CompCert backend performs are not necessary when generating custom hardware and not relying on a CPU anymore, such as register allocation or even scheduling. It is therefore important to find the right intermediate language so that the HLS tool still benefits from many of the generic optimisations that CompCert performs, but does not receive the code transformations that are specific to CPU architectures.
+This section covers the main architecture of the HLS tool, and how the backend was added to CompCert.
-Existing HLS compilers usually use LLVM IR as an intermediate representation when performing HLS specific optimisations, as each instruction can be mapped quite well to hardware which performs the same behaviour. CompCert's RTL is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware. \JP{Perhaps this needs some further qualification? RTL uses the operations from the target architecture, and indeed performs architecture specific optimisations prior to RTL gen, so (for sake of example) switching from x86 RTL to RISC-V RTL could have a significant impact on performance.} In addition to that, many optimisations that are also useful for HLS are performed in RTL, which means that if it is supported as the input language, the HLS algorithm benefits from the same optimisations. It is therefore a good candidate to be chosen as the input language to the HLS backend. The complete flow that CoqUp takes is show in figure~\ref{fig:rtlbranch}.
+CompCert is made up of 11 intermediate languages between the Clight input and the assembly output. These intermediate languages each serve a different purpose and support different optimisations. When designing a new backend for CompCert, it is therefore crucial to know where to branch off and start the hardware generation. Many of the optimisations that the CompCert backend performs, such as register allocation or scheduling, are not necessary when generating custom hardware, as the design no longer relies on a CPU. It is therefore important to find the right intermediate language, so that the HLS tool still benefits from the generic optimisations that CompCert performs, but does not receive the code transformations that are specific to CPU architectures.
+
+Existing HLS compilers usually use LLVM IR as the intermediate representation when performing HLS-specific optimisations, as each instruction can be mapped quite directly to hardware that performs the same behaviour. CompCert's RTL is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware. \JP{Perhaps this needs some further qualification? RTL uses the operations from the target architecture, and indeed performs architecture specific optimisations prior to RTL gen, so (for sake of example) switching from x86 RTL to RISC-V RTL could have a significant impact on performance.}\YH{Yes will definitely include those, just have to think about where.} In addition, many optimisations that are also useful for HLS are performed on RTL, which means that if RTL is chosen as the input language, the HLS algorithm benefits from the same optimisations. It is therefore a good candidate for the input language to the HLS backend. The complete flow that CoqUp takes is shown in figure~\ref{fig:rtlbranch}.
%%TODO: Maybe add why LTL and the other smaller languages are not that well suited
@@ -38,155 +40,32 @@ Existing HLS compilers usually use LLVM IR as an intermediate representation whe
\begin{figure}
\centering
\begin{subfigure}[b]{0.4\linewidth}
-\begin{minted}{c}
-int main() {
- int x[3] = {1, 2, 3};
- int sum = 0, incr = 1;
- for (int i = 0; i < 3; i=i+incr)
- sum += x[i];
- return sum;
-}
-\end{minted}
+ \inputminted{c}{data/accumulator.c}
\caption{Accumulator C code.}\label{fig:accumulator_c}
\end{subfigure}%
\begin{subfigure}[b]{0.6\linewidth}
-\begin{minted}[fontsize=\footnotesize]{c}
-main() {
- 16: x9 = 1
- 15: int32[stack(0)] = x9
- 14: x8 = 2
- 13: int32[stack(4)] = x8
- 12: x7 = 3
- 11: int32[stack(8)] = x7
- 10: x3 = 0
- 9: nop
- 8: x1 = 0
- 7: x6 = stack(0) (int)
- 6: x5 = int32[x6 + x1 * 4 + 0]
- 5: x3 = x3 + x5 + 0 (int)
- 4: x1 = x1 + 1 (int)
- 3: if (x1 <s 3) goto 7 else goto 2
- 2: x4 = x3
- 1: return x4
-}
-\end{minted}
+ \inputminted[fontsize=\footnotesize]{c}{data/accumulator.rtl}
 \caption{Accumulator RTL code.}\label{fig:accumulator_rtl}
\end{subfigure}
- \caption{Accumulator example using CompCert to translate from C to RTL.}\label{fig:accumulator}
+ \caption{Accumulator example using CompCert to translate from C to RTL.}\label{fig:accumulator_c_rtl}
\end{figure}
\begin{figure}
\centering
\begin{subfigure}[b]{0.5\linewidth}
-\begin{minted}[fontsize=\tiny]{systemverilog}
-main() {
- datapath {
- 16: reg_9 <= 32'd1;
- 15: reg_13[32'd0] <= reg_9;
- 14: reg_8 <= 32'd2;
- 13: reg_13[32'd1] <= reg_8;
- 12: reg_7 <= 32'd3;
- 11: reg_13[32'd2] <= reg_7;
- 10: reg_3 <= 32'd0;
- 9: ;
- 8: reg_1 <= 32'd0;
- 7: reg_6 <= 32'd0;
- 6: reg_5 <= reg_13[{{{reg_6 + 32'd0}
- + {reg_1 * 32'd4}} / 32'd4}];
- 5: reg_3 <= {reg_3 + {reg_5 + 32'd0}};
- 4: reg_1 <= {reg_1 + 32'd1};
- 3: ;
- 2: reg_4 <= reg_3;
- 1: reg_11 <= 1'd1; reg_12 <= reg_4;
- }
-
- controllogic {
- 16: reg_10 <= 32'd15;
- 15: reg_10 <= 32'd14;
- 14: reg_10 <= 32'd13;
- 13: reg_10 <= 32'd12;
- 12: reg_10 <= 32'd11;
- 11: reg_10 <= 32'd10;
- 10: reg_10 <= 32'd9;
- 9: reg_10 <= 32'd8;
- 8: reg_10 <= 32'd7;
- 7: reg_10 <= 32'd6;
- 6: reg_10 <= 32'd5;
- 5: reg_10 <= 32'd4;
- 4: reg_10 <= 32'd3;
- 3: reg_10 <= ({$signed(reg_1) < $signed(32'd3)}
- ? 32'd7 : 32'd2);
- 2: reg_10 <= 32'd1;
- 1: ;
- }
-}
-\end{minted}
- \caption{Accumulator C code.}\label{fig:accumulator_c}
+ \inputminted[fontsize=\tiny]{systemverilog}{data/accumulator.htl}
+ \caption{Accumulator HTL code.}\label{fig:accumulator_htl}
\end{subfigure}%
\begin{subfigure}[b]{0.5\linewidth}
-\begin{minted}[fontsize=\tiny]{systemverilog}
-module main(reg_14, reg_15, reg_16, reg_11, reg_12);
- always @(posedge reg_16)
- if ({reg_15 == 1'd1})
- reg_10 <= 32'd16;
- else
- case (reg_10)
- 32'd16: reg_10 <= 32'd15;
- 32'd8: reg_10 <= 32'd7;
- 32'd4: reg_10 <= 32'd3;
- 32'd12: reg_10 <= 32'd11;
- 32'd2: reg_10 <= 32'd1;
- 32'd10: reg_10 <= 32'd9;
- 32'd6: reg_10 <= 32'd5;
- 32'd14: reg_10 <= 32'd13;
- 32'd1: ;
- 32'd9: reg_10 <= 32'd8;
- 32'd5: reg_10 <= 32'd4;
- 32'd13: reg_10 <= 32'd12;
- 32'd3: reg_10 <= ({$signed(reg_1) < $signed(32'd3)}
- ? 32'd7 : 32'd2);
- 32'd11: reg_10 <= 32'd10;
- 32'd7: reg_10 <= 32'd6;
- 32'd15: reg_10 <= 32'd14;
- default:;
- endcase
- always @(posedge reg_16)
- case (reg_10)
- 32'd16: reg_9 <= 32'd1;
- 32'd8: reg_1 <= 32'd0;
- 32'd4: reg_1 <= {reg_1 + 32'd1};
- 32'd12: reg_7 <= 32'd3;
- 32'd2: reg_4 <= reg_3;
- 32'd10: reg_3 <= 32'd0;
- 32'd6: reg_5 <= reg_13[{{{reg_6 + 32'd0}
- + {reg_1 * 32'd4}} / 32'd4}];
- 32'd14: reg_8 <= 32'd2;
- 32'd1: begin reg_11 = 1'd1; reg_12 = reg_4; end
- 32'd9: ;
- 32'd5: reg_3 <= {reg_3 + {reg_5 + 32'd0}};
- 32'd13: reg_13[32'd1] <= reg_8;
- 32'd3: ;
- 32'd11: reg_13[32'd2] <= reg_7;
- 32'd7: reg_6 <= 32'd0;
- 32'd15: reg_13[32'd0] <= reg_9;
- default:;
- endcase
- reg [31:0] reg_13 [2:0];
- input [0:0] reg_16, reg_14, reg_15;
- reg [31:0] reg_8, reg_4, reg_10, reg_6,
- reg_1, reg_9, reg_5, reg_3, reg_7;
- output reg [31:0] reg_12;
- output reg [0:0] reg_11;
-endmodule
-\end{minted}
- \caption{Accumulator C code.}\label{fig:accumulator_rtl}
+ \inputminted[fontsize=\tiny]{systemverilog}{data/accumulator.v}
+ \caption{Accumulator Verilog code.}\label{fig:accumulator_v}
\end{subfigure}
- \caption{Accumulator example using CompCert to translate from C to RTL.}\label{fig:accumulator}
+ \caption{Accumulator example using CoqUp to translate from HTL to Verilog.\YH{I feel like these examples take up too much space, but don't really know of a different way to show off a complete example without the long code.}}\label{fig:accumulator_htl_v}
\end{figure}
-To describe the translation, we start with an example of how to translate a simple accumulator example, which is shown in figure~\ref{fig:accumulator}. Using this example, the different stages in the translation can be explained, together with how they were proven. The example includes constructs such as arrays, for loops and addition, however, CoqUp also supports all other operators like multiplication and division, as well as conditional statements. Function calls are also supported by inlining all of the functions, meaning recursive function calls are not supported.
+To describe the translation, we start with a simple accumulator example, which is shown in figure~\ref{fig:accumulator_c}. Using this example, the different stages in the translation can be explained, together with how each stage was proven correct.
-The first step is to use the CompCert front end passes to convert the C code into RTL, which is the intermediate language that CompCert uses for most of its optimisations. The translation is shown in figure~\ref{fig:accumulator}, where the RTL code is depicted in figure~\ref{fig:accumulator_rtl}. After that, the code is
+The first step is to use the CompCert front-end passes to convert the C code into RTL, the intermediate language that CompCert uses for most of its optimisations. The translation is shown in figure~\ref{fig:accumulator_c_rtl}, where the RTL code is depicted in figure~\ref{fig:accumulator_rtl}. After that, the code is translated to HTL, shown in figure~\ref{fig:accumulator_htl}, and finally to Verilog, shown in figure~\ref{fig:accumulator_v}.
\subsection{CompCert RTL}
diff --git a/data/accumulator.c b/data/accumulator.c
new file mode 100644
index 0000000..7d78a61
--- /dev/null
+++ b/data/accumulator.c
@@ -0,0 +1,7 @@
+int main() {
+ int x[3] = {1, 2, 3};
+ int sum = 0, incr = 1;
+ for (int i = 0; i < 3; i=i+incr)
+ sum += x[i];
+ return sum;
+}
diff --git a/data/accumulator.htl b/data/accumulator.htl
new file mode 100644
index 0000000..99e3ccb
--- /dev/null
+++ b/data/accumulator.htl
@@ -0,0 +1,41 @@
+main() {
+ datapath {
+ 16: reg_9 <= 32'd1;
+ 15: reg_13[32'd0] <= reg_9;
+ 14: reg_8 <= 32'd2;
+ 13: reg_13[32'd1] <= reg_8;
+ 12: reg_7 <= 32'd3;
+ 11: reg_13[32'd2] <= reg_7;
+ 10: reg_3 <= 32'd0;
+ 9: ;
+ 8: reg_1 <= 32'd0;
+ 7: reg_6 <= 32'd0;
+ 6: reg_5 <= reg_13[{{{reg_6 + 32'd0}
+ + {reg_1 * 32'd4}} / 32'd4}];
+ 5: reg_3 <= {reg_3 + {reg_5 + 32'd0}};
+ 4: reg_1 <= {reg_1 + 32'd1};
+ 3: ;
+ 2: reg_4 <= reg_3;
+ 1: reg_11 <= 1'd1; reg_12 <= reg_4;
+ }
+
+ controllogic {
+ 16: reg_10 <= 32'd15;
+ 15: reg_10 <= 32'd14;
+ 14: reg_10 <= 32'd13;
+ 13: reg_10 <= 32'd12;
+ 12: reg_10 <= 32'd11;
+ 11: reg_10 <= 32'd10;
+ 10: reg_10 <= 32'd9;
+ 9: reg_10 <= 32'd8;
+ 8: reg_10 <= 32'd7;
+ 7: reg_10 <= 32'd6;
+ 6: reg_10 <= 32'd5;
+ 5: reg_10 <= 32'd4;
+ 4: reg_10 <= 32'd3;
+ 3: reg_10 <= ({$signed(reg_1) < $signed(32'd3)}
+ ? 32'd7 : 32'd2);
+ 2: reg_10 <= 32'd1;
+ 1: ;
+ }
+}
diff --git a/data/accumulator.rtl b/data/accumulator.rtl
new file mode 100644
index 0000000..a6b528c
--- /dev/null
+++ b/data/accumulator.rtl
@@ -0,0 +1,18 @@
+main() {
+ 16: x9 = 1
+ 15: int32[stack(0)] = x9
+ 14: x8 = 2
+ 13: int32[stack(4)] = x8
+ 12: x7 = 3
+ 11: int32[stack(8)] = x7
+ 10: x3 = 0
+ 9: nop
+ 8: x1 = 0
+ 7: x6 = stack(0) (int)
+ 6: x5 = int32[x6 + x1 * 4 + 0]
+ 5: x3 = x3 + x5 + 0 (int)
+ 4: x1 = x1 + 1 (int)
+ 3: if (x1 <s 3) goto 7 else goto 2
+ 2: x4 = x3
+ 1: return x4
+}
diff --git a/data/accumulator.v b/data/accumulator.v
new file mode 100644
index 0000000..baabfe3
--- /dev/null
+++ b/data/accumulator.v
@@ -0,0 +1,53 @@
+module main(reg_14, reg_15, reg_16, reg_11, reg_12);
+ always @(posedge reg_16)
+ if ({reg_15 == 1'd1})
+ reg_10 <= 32'd16;
+ else
+ case (reg_10)
+ 32'd16: reg_10 <= 32'd15;
+ 32'd8: reg_10 <= 32'd7;
+ 32'd4: reg_10 <= 32'd3;
+ 32'd12: reg_10 <= 32'd11;
+ 32'd2: reg_10 <= 32'd1;
+ 32'd10: reg_10 <= 32'd9;
+ 32'd6: reg_10 <= 32'd5;
+ 32'd14: reg_10 <= 32'd13;
+ 32'd1: ;
+ 32'd9: reg_10 <= 32'd8;
+ 32'd5: reg_10 <= 32'd4;
+ 32'd13: reg_10 <= 32'd12;
+ 32'd3: reg_10 <= ({$signed(reg_1) < $signed(32'd3)}
+ ? 32'd7 : 32'd2);
+ 32'd11: reg_10 <= 32'd10;
+ 32'd7: reg_10 <= 32'd6;
+ 32'd15: reg_10 <= 32'd14;
+ default:;
+ endcase
+ always @(posedge reg_16)
+ case (reg_10)
+ 32'd16: reg_9 <= 32'd1;
+ 32'd8: reg_1 <= 32'd0;
+ 32'd4: reg_1 <= {reg_1 + 32'd1};
+ 32'd12: reg_7 <= 32'd3;
+ 32'd2: reg_4 <= reg_3;
+ 32'd10: reg_3 <= 32'd0;
+ 32'd6: reg_5 <= reg_13[{{{reg_6 + 32'd0}
+ + {reg_1 * 32'd4}} / 32'd4}];
+ 32'd14: reg_8 <= 32'd2;
+ 32'd1: begin reg_11 = 1'd1; reg_12 = reg_4; end
+ 32'd9: ;
+ 32'd5: reg_3 <= {reg_3 + {reg_5 + 32'd0}};
+ 32'd13: reg_13[32'd1] <= reg_8;
+ 32'd3: ;
+ 32'd11: reg_13[32'd2] <= reg_7;
+ 32'd7: reg_6 <= 32'd0;
+ 32'd15: reg_13[32'd0] <= reg_9;
+ default:;
+ endcase
+ reg [31:0] reg_13 [2:0];
+ input [0:0] reg_16, reg_14, reg_15;
+ reg [31:0] reg_8, reg_4, reg_10, reg_6,
+ reg_1, reg_9, reg_5, reg_3, reg_7;
+ output reg [31:0] reg_12;
+ output reg [0:0] reg_11;
+endmodule
diff --git a/introduction.tex b/introduction.tex
index 2ac9eba..b7a9bc9 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -20,7 +20,7 @@ However, the fact that the behaviour is preserved after HLS cannot be guaranteed
%%\NR{Focus on more high-level of "why this work is interesting"? Two key points we want to get across to the reader is that in existing works: validation is neccessary every time a new program is compiled and the verifying algorithm is not verified.}
%%\NR{Also define words like validation, verifying algorithm (can you use the word ``verifier'',mechanisation)}
%%\NR{Having said that, keep the text for related work section.}\YH{Added into related works.}
-To mitigate the problems about the unreliability of synthesis tool, it is often required to check the generated hardware for functional correctness. This can either be done by simulation with a large test bench, however, to be sure that the hardware does indeed behave in the same way as the C code, it may be necessary to prove that they are equivalent. \JW{I think that point could be strengthened by emphasising that simulation with a test-bench only provides guarantees that are as good as the test-bench! That is, if the test-bench does not cover all possible inputs then bugs may remain.} Translation validation~\cite{pnueli98_trans} is the main method which is used to prove that the HLS translation was successful, and has been successfully applied to many HLS optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}. However, the main problem is that the validator itself has often not been mechanically proven correct, meaning that the implementation is quite separate from the proof. In addition to that, with large designs it may not be feasible to perform translation validation, as the state space would grow exponentially. \JW{Does this link back to Mentor's Catapult-C, which you mentioned earlier? Does Catapult-C attempt to do translation validation as part of its HLS process? And if so, can you make the point that this effort is largely ineffective because once the design is a reasonable size, the translation validation usually fails anyway?} A mechanically verified HLS tool would remove the need to perform simulation after the synthesis process if one has proven desirable properties about the C code. In addition to that, it would allow for the implementation of translation validated optimisation passes which are also proven correct mechanically, thereby greatly improving the reliability of these passes.
+To mitigate the problems arising from the unreliability of synthesis tools, it is often necessary to check the generated hardware for functional correctness. This is commonly done by simulating the design with a large test-bench; however, the guarantees are only as good as the test-bench, meaning that if not all inputs are covered, bugs may remain in the untested code. To be sure that the hardware does indeed behave in the same way as the C code, it may therefore be necessary to prove that they are equivalent. \JW{I think that point could be strengthened by emphasising that simulation with a test-bench only provides guarantees that are as good as the test-bench! That is, if the test-bench does not cover all possible inputs then bugs may remain.}\YH{Added a sentence on that, it does motivate the equivalence proving more.} Translation validation~\cite{pnueli98_trans} is the main method used to prove that the HLS translation was successful, and has been successfully applied to many HLS optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}. However, the main problem is that the validator itself has often not been mechanically proven correct, meaning that the implementation is quite separate from the proof. In addition, with large designs it may not be feasible to perform translation validation, as the state space would grow exponentially. \JW{Does this link back to Mentor's Catapult-C, which you mentioned earlier? Does Catapult-C attempt to do translation validation as part of its HLS process? And if so, can you make the point that this effort is largely ineffective because once the design is a reasonable size, the translation validation usually fails anyway?}\YH{TODO: Currently I only have a whitepaper which goes over the translation validation in a very high level, but they do actually mention some flakiness and state that the user would have to manually change the code to fix that. So I think I can actually make that point. I just realised they have a pretty funny diagram of verification $\rightarrow$ differences $\rightarrow$ adjustments $\rightarrow$ ... until it is finally verified.} A mechanically verified HLS tool would remove the need to perform simulation after the synthesis process if one has proven desirable properties about the C code. In addition, it would allow for the implementation of translation-validated optimisation passes which are also proven correct mechanically, thereby greatly improving the reliability of these passes.
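As a minimal sketch of this limitation, assuming from the generated code in \texttt{data/accumulator.v} that \texttt{reg\_16} is the clock, \texttt{reg\_15} the reset, \texttt{reg\_14} the start signal, \texttt{reg\_11} the finish flag and \texttt{reg\_12} the return value, a test-bench for the accumulator could look as follows. It only exercises the single, input-independent path of this example, so a design with data-dependent behaviour would need far more stimulus to gain any confidence.
\begin{minted}{systemverilog}
// Sketch of a test-bench for data/accumulator.v; the port roles
// (clock/reset/start/finish/return) are assumed from the generated code.
module main_tb;
  reg clk = 0, rst = 0, start = 0;
  wire        finish;
  wire [31:0] ret;

  main dut(.reg_14(start), .reg_15(rst), .reg_16(clk),
           .reg_11(finish), .reg_12(ret));

  always #5 clk = ~clk;                   // free-running clock

  initial begin
    rst = 1; @(posedge clk); #1 rst = 0;  // put the FSM into its entry state
    wait (finish);                        // wait for the handshake to complete
    if (ret === 32'd6)                    // 1 + 2 + 3 from the C source
      $display("PASS");
    else
      $display("FAIL: got %0d", ret);
    $finish;
  end
endmodule
\end{minted}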
CompCert~\cite{leroy06_formal_certif_compil_back_end} is a C compiler that has been written and formally verified in the Coq theorem prover~\cite{bertot04_inter_theor_provin_progr_devel}. Most of the compiler passes in CompCert have been proven correct, meaning that once the compiler is built, the proofs can be erased, as the algorithm has been shown to be correct independently of the input. Some optimisation passes, such as software pipelining, instead require translation validation~\cite{tristan08_formal_verif_trans_valid}, in which case the correctness of the compiler pass needs to be checked at runtime. Even in this case, however, the verifier itself is proven to accept only code that does indeed behave in the same way.
diff --git a/verilog.tex b/verilog.tex
index f40baf5..8f6e7dd 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -2,15 +2,15 @@
Verilog is a hardware description language commonly used to design hardware. A Verilog design can then be synthesised into more basic logic, called a netlist, which describes how the individual gates connect to each other. This representation can then be placed onto a field-programmable gate array (FPGA) or turned into an application-specific integrated circuit (ASIC) to implement the design that was described in Verilog. The Verilog standard is quite large, however, and not all Verilog features are needed to describe hardware. Many Verilog features are only useful for simulation and do not affect the actual hardware itself, which means that these features do not have to be modelled in the semantics. In addition, as the HLS algorithm dictates which Verilog constructs are generated, the Verilog subset that has to be modelled by the semantics can be reduced even further to only the constructs that are needed. Only supporting a smaller subset in the semantics also means that there is less chance that the standard is misunderstood, and that the semantics actually model how the Verilog is simulated.
-The Verilog semantics are based on the semantics proposed by \citet{loow19_verif_compil_verif_proces}, which were used to create a formal translation from HOL logic into a Verilog circuit. These semantics are quite practical as they restrict themselves to a small subset of Verilog, which can nonetheless be used to model all hardware constructs one would want to design. The main syntax for the Verilog subset is the following: \JW{This verilog syntax looks weird to me. I didn't think there was a `then' keyword, for instance. Perhaps you're aiming more at some sort of abstracted syntax of Verilog? What does the semicolon on its own mean? Some sort of skip statement? The case statement looks weird too -- how do you get multiple cases in a single switch statement, and where is the default case? }\YH{I think I have fixed most cases, yes the single semicolon is a skip statement, should I make that more obvious by naming it? } \JW{It still looks a bit funny to me -- a bit of a halfway-house between `proper' Verilog syntax and `abstract' Verilog syntax. E.g. the way `begin...end' blocks contain exactly two statements, or the way that you get an erroneous double semicolon by combining the `begin...end' rule with the `e=e;' rule. People who are very familiar with C-like syntax will know that this isn't quite right... but then again, it doesn't really matter whether you handle the full syntax, because you only have to pretty-print a subset of it. So, why not stick here with a slightly abstracted Verilog syntax? It would make the operational semantics easier to read, for instance. Basically like you had it before, but explicitly labelled as a simplified syntax, so readers don't get confused!}\YH{Yeah, the syntax is a bit funny, mostly because this is actually how it is also encoded in Coq. The main weird rule is the Seq rule, basically because it actually doesn't have a semicolon in between and no begin and end block, but it looks a bit strange to just put two $s$ next to each other. We therefore don't really have begin and end blocks, and basically glue them to each statement instead, so in our semantics, the if statement actually looks like: if $e$ begin $s$ end else begin $s$ end, but it gets very verbose for case statements. I got rid of the sequence for now, so maybe this is better, but this is nearly exactly how we actually encode it.}
+The Verilog semantics are based on the semantics proposed by \citet{loow19_verif_compil_verif_proces}, which were used to create a formal translation from HOL logic into a Verilog circuit. These semantics are quite practical as they restrict themselves to a small subset of Verilog, which can nonetheless be used to model all hardware constructs one would want to design. The main syntax for the Verilog subset is the following: \JW{This verilog syntax looks weird to me. I didn't think there was a `then' keyword, for instance. Perhaps you're aiming more at some sort of abstracted syntax of Verilog? What does the semicolon on its own mean? Some sort of skip statement? The case statement looks weird too -- how do you get multiple cases in a single switch statement, and where is the default case? }\YH{I think I have fixed most cases, yes the single semicolon is a skip statement, should I make that more obvious by naming it? } \JW{It still looks a bit funny to me -- a bit of a halfway-house between `proper' Verilog syntax and `abstract' Verilog syntax. E.g. the way `begin...end' blocks contain exactly two statements, or the way that you get an erroneous double semicolon by combining the `begin...end' rule with the `e=e;' rule. People who are very familiar with C-like syntax will know that this isn't quite right... but then again, it doesn't really matter whether you handle the full syntax, because you only have to pretty-print a subset of it. So, why not stick here with a slightly abstracted Verilog syntax? It would make the operational semantics easier to read, for instance. Basically like you had it before, but explicitly labelled as a simplified syntax, so readers don't get confused!}\YH{Yeah, the syntax is a bit funny, mostly because this is actually how it is also encoded in Coq. The main weird rule is the Seq rule, basically because it actually doesn't have a semicolon in between and no begin and end block, but it looks a bit strange to just put two $s$ next to each other. We therefore don't really have begin and end blocks, and basically glue them to each statement instead, so in our semantics, the if statement actually looks like: if $e$ begin $s$ end else begin $s$ end, but it gets very verbose for case statements. I got rid of the sequence for now, but now it just looks like function application, so the semi colon kind of acted like a constructor. I can just add a \texttt{Seq} constructor though, which might be clearer.}
\begin{align*}
v\quad ::=&\; \mathit{sz} \yhkeyword{'d} n\\
\textit{op}\quad ::=&\; \yhkeyword{+ } | \yhkeywordsp{- } | \yhkeywordsp{* } \cdots \\
e\quad ::=&\; v\;\; |\;\; x\;\; |\;\; e \yhkeyword{[} e \yhkeyword{]}\;\; |\;\; e\ \mathit{op}\ e\;\; |\;\; \yhkeyword{!} e\;\; |\;\; \yhkeyword{\textasciitilde} e\;\; |\;\; e \yhkeywordsp{? } e \yhkeywordsp{: } e\\
- s\quad ::=&\; s s |\ \epsilon\\[-2pt]
- |&\; \yhkeyword{if } e\ s \yhkeywordsp{else } s\\[-2pt]
- |&\; \yhkeyword{case (} e \yhkeyword{) } e : s\ \{\ e : s\ \}\ [\ \yhkeyword{default} : \textit{def}\ ] \yhkeywordsp{endcase}\\[-2pt]
+ s\quad ::=&\; s\ s\ |\ \epsilon\\[-2pt]
+ |&\; \yhkeyword{if(} e \yhkeyword{) } s \yhkeywordsp{else } s\\[-2pt]
+ |&\; \yhkeyword{case(} e \yhkeyword{) } e : s\ \{\ e : s\ \}\ [\ s\ ] \yhkeywordsp{endcase}\\[-2pt]
|&\; e = e \yhkeyword{;}\\[-2pt]
|&\; e \Leftarrow e \yhkeyword{;}\\
d\quad ::=&\; \yhkeyword{[n-1:0] } r\ |\ \yhkeyword{[n-1:0] } r \yhkeywordsp{[m-1:0]}\\
@@ -25,7 +25,7 @@ The main addition to the Verilog syntax is the explicit declaration of inputs an
Existing operational semantics~\cite{loow19_verif_compil_verif_proces} were adapted for the language that CoqUp eventually targets. The result is a small-step operational semantics at the clock-cycle level, as hardware typically does not terminate; within each clock cycle, however, the semantics are constructed in a big-step style. This style of semantics matches the small-step operational semantics of CompCert's register transfer language (RTL) quite well.
-At the top-level, always blocks describe logic which is run every time some event occurs. The only event that is supported by these semantics is detecting the rising rising edge of the clock, so that we can implement synchronous logic. As soon as an event occurs, the hardware will be executed, meaning if there are multiple always blocks that get triggered by the event, these will run in parallel. However, as the semantics should be deterministic, we impose an order on the always blocks and execute them sequentially. However, to preserve the fact that the statements inside of the always block are executed in parallel, nonblocking assignments to variables need to be kept in a different association map compared to blocking assignments to variables. This preserves the behaviour that blocking assignments change the value of the variable inside of the clock cycle, whereas the nonblocking assignments only take place at the end of the clock cycle, and in parallel. We can denote these two association maps as $s = (\Gamma_{r}, \Gamma_{a}, \Delta_{r}, \Delta_{a})$, where $\Gamma_{r}$ is the current value of the registers, $\Gamma_{a}$ is the current value of the array, and $\Delta_{r}$ and $\Delta_{a}$ are the values of the variables and arrays when the clock cycle ends.
+At the top level, always blocks describe logic which is run every time some event occurs. The only event supported by these semantics is the rising edge of the clock, so that we can implement synchronous logic. As soon as the event occurs, the hardware is executed, meaning that if multiple always blocks are triggered by the event, they run in parallel. However, as the semantics should be deterministic, we impose an order on the always blocks and execute them sequentially. To preserve the fact that the statements inside an always block are executed in parallel, nonblocking assignments to variables need to be kept in a different association map than blocking assignments. This preserves the behaviour that blocking assignments change the value of the variable within the clock cycle, whereas nonblocking assignments only take effect at the end of the clock cycle, and in parallel. We can denote these association maps as $s = (\Gamma_{\rm r}, \Gamma_{\rm a}, \Delta_{\rm r}, \Delta_{\rm a})$, where $\Gamma_{\rm r}$ is the current value of the registers, $\Gamma_{\rm a}$ is the current value of the arrays, and $\Delta_{\rm r}$ and $\Delta_{\rm a}$ are the values of the registers and arrays when the clock cycle ends.
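To make this distinction concrete, the following small sketch (illustrative only, not generated by CoqUp) shows the behaviour that the two kinds of maps capture:
\begin{minted}{systemverilog}
// Blocking vs. nonblocking assignment within one clock cycle,
// mirroring the Gamma (current) and Delta (end-of-cycle) maps above.
module blocking_vs_nonblocking(input clk);
  reg [31:0] a = 0, b = 0, c = 0, d = 0;
  always @(posedge clk) begin
    a = a + 1;   // blocking: the map of current values is updated at once,
    b = a;       // so b reads the new value of a within the same cycle
    c <= c + 1;  // nonblocking: only recorded for the end of the cycle,
    d <= c;      // so d reads the value c had at the start of the cycle
  end
endmodule
\end{minted}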
We can then define what one step in the semantics looks like. We therefore first need to define the structure of the main module which will contain the logic for the program. In general, functions that are translated to hardware require basic handshaking signals so that the translated function can be used in hardware. Firstly, they require a clock input, so that all the sequential circuits are run at the right time. They then require a start and reset input, so that the hardware generated from the function can be reused multiple times. Finally, they need a finish and return signal, where finish goes high when the result is ready to be read. In addition, the function can take an arbitrary number of inputs which act as arguments, so that it can be called with different arguments. Beyond the inputs and outputs of the module, we also need to keep track of some internal signals and properties of the module. Firstly, we need to keep track of the internal variables that contain the current state of the module and the current contents of the stack. Finally, the module contains its entry point and the list of module items that declare all of the internal registers and encode the state machine that behaves in the same way as the function. We can therefore declare it in the following way:
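For illustration, assuming descriptive signal names in place of the numbered registers that CoqUp actually generates (for example, \texttt{reg\_16} acts as the clock in \texttt{data/accumulator.v}), the interface described above corresponds to a module shape along the following lines:
\begin{minted}{systemverilog}
// Sketch of the handshaking interface described above, with assumed names.
module f(input             clk,     // clock for all sequential logic
         input             rst,     // reset, returning the FSM to its entry state
         input             start,   // start a new evaluation of the function
         input      [31:0] arg0,    // zero or more function arguments
         output reg        finish,  // goes high when the result is ready
         output reg [31:0] ret);    // return value of the function
  reg [31:0] state;                 // current state of the state machine
  reg [31:0] stack [3:0];           // stack contents, sized per function
  // control logic and datapath always blocks follow, as in data/accumulator.v
endmodule
\end{minted}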
@@ -39,34 +39,35 @@ We can then define how one step in the semantics looks like. We therefore first
&\mathtt{stacksize} : n\ \big\}
\end{align*}
-The two main evaluation functions are then \textit{erun}, which takes in the current state together with an expression and returns a value, and \textit{srun}, which takes the current state and a statement as input, and returns the updated state. The inductive rules defining \textit{srun} are shown below:
+The two main evaluation functions are then \textit{erun}, which takes in the current state together with an expression and returns a value, and \textit{srun}, which takes the current state and a statement as input, and returns the updated state. The inductive rules defining \textit{srun} are shown below, where $\sigma_{n} = (\Gamma_{\rm r}^{n}, \Gamma_{\rm a}^{n}, \Delta_{\rm r}^{n}, \Delta_{\rm a}^{n})$:
\begin{gather*}
\label{eq:1}
\inferrule[Skip]{ }{\textit{srun}\ \sigma\ \epsilon\ \sigma}\\
%
- \inferrule[Seq]{\textit{srun}\ \sigma_{0}\ \textit{s}_{1}\ \sigma_{1} \\ \textit{srun}\ \sigma_{1}\ \textit{s}_{2}\ \sigma_{2}}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{begin } \textit{s}_{1} \yhkeyword{;}\ \textit{s}_{2} \yhkeywordsp{end})\ \sigma_{2}}\\
+ \inferrule[Seq]{\textit{srun}\ \sigma_{0}\ \textit{s}_{1}\ \sigma_{1} \\ \textit{srun}\ \sigma_{1}\ \textit{s}_{2}\ \sigma_{2}}{\textit{srun}\ \sigma_{0}\ (\textit{s}_{1}\ \textit{s}_{2})\ \sigma_{2}}\\
%
- \inferrule[CondTrue]{\textit{erun}\ \Gamma_{0}\ c\ v_{c} \\ \yhfunction{valToB}\ v_{c} = \yhconstant{true} \\ \textit{srun}\ \sigma_{0}\ \textit{st}\ \sigma_{1}}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{if } c\ \textit{st} \yhkeywordsp{else } \textit{sf})\ \sigma_{1}}\\
+ \inferrule[CondTrue]{\textit{erun}\ \Gamma_{\rm r}^{0}\ \Gamma_{\rm a}^{0}\ c\ v_{c} \\ \yhfunction{valToB}\ v_{c} = \yhconstant{true} \\ \textit{srun}\ \sigma_{0}\ \textit{st}\ \sigma_{1}}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{if(} c \yhkeyword{) } \textit{st} \yhkeywordsp{else } \textit{sf})\ \sigma_{1}}\\
%
- \inferrule[CondFalse]{\textit{erun}\ \Gamma_{0}\ c\ v_{\rm c} \\ \yhfunction{valToB}\ v_{\rm c} = \yhconstant{false} \\ \textit{srun}\ \sigma_{0}\ \textit{sf}\ \sigma_{1}}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{if } c\ \textit{st} \yhkeywordsp{else } \textit{sf})\ \sigma_{1}}\\
+ \inferrule[CondFalse]{\textit{erun}\ \Gamma_{\rm r}^{0}\ \Gamma_{\rm a}^{0}\ c\ v_{\rm c} \\ \yhfunction{valToB}\ v_{\rm c} = \yhconstant{false} \\ \textit{srun}\ \sigma_{0}\ \textit{sf}\ \sigma_{1}}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{if(} c \yhkeyword{) } \textit{st} \yhkeywordsp{else } \textit{sf})\ \sigma_{1}}\\
%
- \inferrule[CaseNoMatch]{\textit{srun}\ \sigma_{0}\ (\yhkeyword{case}\ e\ cs\ \textit{def})\ \sigma_{1} \\ \textit{erun}\ \Gamma_{0}\ me\ mve \\ \textit{erun}\ \Gamma_{0}\ e\ ve \\ mve \neq ve}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{case (} e \yhkeyword{) } ((me : sc) :: cs) \yhkeywordsp{default} : \textit{def} \yhkeywordsp{endcase})\ \sigma_{1}}\\
+ \inferrule[CaseNoMatch]{\textit{srun}\ \sigma_{0}\ (\yhkeyword{case(} e \yhkeyword{) } cs\ \textit{def} \yhkeywordsp{endcase})\ \sigma_{1} \\ \textit{erun}\ \Gamma_{\rm r}^{0}\ \Gamma_{\rm a}^{0}\ me\ mve \\ \textit{erun}\ \Gamma_{\rm r}^{0}\ \Gamma_{\rm a}^{0}\ e\ ve \\ mve \neq ve}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{case(} e \yhkeyword{) } ((me : sc) :: cs)\ \textit{def} \yhkeywordsp{endcase})\ \sigma_{1}}\\
%
- \inferrule[CaseMatch]{\textit{srun}\ \sigma_{0}\ sc\ \sigma_{1} \\ \textit{erun}\ \Gamma_{0}\ e\ ve \\ \textit{erun}\ \Gamma_{0}\ me\ mve \\ mve = ve}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{case (} e \yhkeyword{) } ((me : sc) :: cs) \yhkeywordsp{default} : \textit{def} \yhkeywordsp{endcase})\ \sigma_{1}}\\
+ \inferrule[CaseMatch]{\textit{srun}\ \sigma_{0}\ sc\ \sigma_{1} \\ \textit{erun}\ \Gamma_{\rm r}^{0}\ \Gamma_{\rm a}^{0}\ e\ ve \\ \textit{erun}\ \Gamma_{\rm r}^{0}\ \Gamma_{\rm a}^{0}\ me\ mve \\ mve = ve}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{case(} e \yhkeyword{) } ((me : sc) :: cs)\ \textit{def} \yhkeywordsp{endcase})\ \sigma_{1}}\\
%
- \inferrule[CaseDefault]{\textit{srun}\ \sigma_{0}\ s\ \sigma_{1}}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{case (} e \yhkeyword{) } [] \yhkeywordsp{default} : (\yhkeyword{Some}\ s) \yhkeywordsp{endcase})\ \sigma_{1}}\\
+ \inferrule[CaseDefault]{\textit{srun}\ \sigma_{0}\ s\ \sigma_{1}}{\textit{srun}\ \sigma_{0}\ (\yhkeyword{case(} e \yhkeyword{) } []\ (\yhconstant{Some}\ s) \yhkeywordsp{endcase})\ \sigma_{1}}\\
%
\inferrule[Blocking Reg]{\yhfunction{name}\ \textit{lhs} = \yhconstant{OK}\ n \\ \textit{erun}\ \Gamma_{\rm r}\ \Gamma_{\rm a}\ \textit{rhs}\ v_{\rm rhs}}{\textit{srun}\ (\Gamma_{\rm r},\Gamma_{\rm a},\Delta_{\rm r},\Delta_{\rm a})\ (\textit{lhs} = \textit{rhs})\ (\Gamma_{\rm r} [n \mapsto v_{\rm rhs}], \Gamma_{\rm a}, \Delta_{\rm r}, \Delta_{\rm a})}\\
%
- \inferrule[Nonblocking Reg]{\yhfunction{name}\ \textit{lhs} = \yhconstant{OK}\ n \\ \textit{erun}\ \Gamma\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{\rm r}, \Gamma_{\rm a}, \Delta_{\rm r}, \Delta_{a})\ (\textit{lhs} \Leftarrow \textit{rhs})\ (\Gamma_{\rm r}, \Gamma_{\rm a}, \Delta_{\rm r} [n \mapsto v_{\rm rhs}], \Delta_{\rm a})}
+ \inferrule[Nonblocking Reg]{\yhfunction{name}\ \textit{lhs} = \yhconstant{OK}\ n \\ \textit{erun}\ \Gamma_{\rm r}\ \Gamma_{\rm a}\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{\rm r}, \Gamma_{\rm a}, \Delta_{\rm r}, \Delta_{\rm a})\ (\textit{lhs} \Leftarrow \textit{rhs})\ (\Gamma_{\rm r}, \Gamma_{\rm a}, \Delta_{\rm r} [n \mapsto v_{\rm rhs}], \Delta_{\rm a})}
%
% \inferrule[Blocking Array]{\yhkeyword{name}\ \textit{lhs} = \yhkeyword{OK}\ n \\ \textit{erun}\ \Gamma_{r}\ \Gamma_{a}\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{r},\Gamma_{a},\Delta_{r},\Delta_{a})\ (\textit{lhs} = \textit{rhs})\ (\Gamma_{r} // \{n \rightarrow v_{\textit{rhs}}\}, \Gamma_{a}, \Delta_{r}, \Delta_{a})}\\
%
% \inferrule[Nonblocking Array]{\yhkeyword{name}\ \textit{lhs} = \yhkeyword{OK}\ n \\ \textit{erun}\ \Gamma\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r}, \Delta_{a})\ (\textit{lhs} \Leftarrow \textit{rhs})\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r} // \{n \rightarrow v_{\textit{rhs}}\}, \Delta_{a})}
\end{gather*}
-\YH{TODO: Add rules for blocking and nonblocking assignment to arrays.} \JW{CaseNoMatch and CaseMatch rules both have a mismatched parenthesis.} \JW{In CondTrue and CondFalse, the relationship between $\Gamma_0$ and $\sigma_0$ needs clarifying.} \JW{CaseNoMatch is missing the `default:' and `endcase' keywords.} \JW{In CaseNoMatch, it feels weird to me that you keep evaluating $e$ for each case of the switch, rather than just once at the start of the switch statement. I guess it's ok because a failed match doesn't change the state. Just a bit quirky, I guess.} \JW{In CaseDefault, I was slightly surprised to see `Some' -- is that necessary?} \JW{In BlockingReg, erun is taking more parameters than it does elsewhere.} \JW{When using subscripted variables like $\Gamma_r$, I prefer $\Gamma_{\rm r}$ because $r$ is a fixed name (short for `register'), not a variable.}
+\YH{TODO: Add rules for blocking and nonblocking assignment to arrays.} \JW{CaseNoMatch and CaseMatch rules both have a mismatched parenthesis.}\YH{Thanks, fixed.} \JW{In CondTrue and CondFalse, the relationship between $\Gamma_0$ and $\sigma_0$ needs clarifying.}\YH{Clarified now in the previous paragraph.} \JW{CaseNoMatch is missing the `default:' and `endcase' keywords.}
+\YH{I've removed the default keyword now, aiming for a more abstract syntax, but added `endcase'} \JW{In CaseNoMatch, it feels weird to me that you keep evaluating $e$ for each case of the switch, rather than just once at the start of the switch statement. I guess it's ok because a failed match doesn't change the state. Just a bit quirky, I guess.}\YH{Yes that is a bit annoying actually, however, I couldn't really figure out the best way to only evaluate it once, as there isn't really a start to the case statement, we just describe that you could start anywhere in the case statement and evaluate it. One solution would be to define a separate inductive rule that finds a matching case based on an evaluated value, which may be cleaner actually.} \JW{In CaseDefault, I was slightly surprised to see `Some' -- is that necessary?}\YH{That's true, currently it's a \texttt{Some} because the default case is optional, however, we actually always use it, so we could change it to be mandatory.} \JW{In BlockingReg, erun is taking more parameters than it does elsewhere.}\YH{Yes thank you, fixed that now.} \JW{When using subscripted variables like $\Gamma_r$, I prefer $\Gamma_{\rm r}$ because $r$ is a fixed name (short for `register'), not a variable.}\YH{That is much nicer actually!}
Taking the \textsc{CondTrue} rule as an example, this rule only applies if evaluating the condition expression results in a \texttt{true} value. It also requires that the statement in the true branch of the conditional, \textit{st}, runs from state $\sigma_{0}$ to state $\sigma_{1}$. If both of these premises hold, we get that the conditional statement as a whole also runs from state $\sigma_{0}$ to state $\sigma_{1}$. The \textsc{Blocking} and \textsc{Nonblocking} rules are a bit more interesting, as these modify the blocking and nonblocking association maps respectively.
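As a small worked example of these rules, consider running the sequence $x = y\ \ z \Leftarrow x$ from a state in which $\Gamma_{\rm r}(y) = 5$, writing each register for its name. \textsc{Seq} chains \textsc{Blocking Reg}, which updates $\Gamma_{\rm r}$ immediately so that the subsequent read of $x$ already sees the new value, with \textsc{Nonblocking Reg}, which only records the assignment in $\Delta_{\rm r}$, to be merged into $\Gamma_{\rm r}$ when the clock cycle ends:
\begin{gather*}
  \textit{srun}\ (\Gamma_{\rm r}, \Gamma_{\rm a}, \Delta_{\rm r}, \Delta_{\rm a})\ (x = y\ \ z \Leftarrow x)\ (\Gamma_{\rm r}[x \mapsto 5], \Gamma_{\rm a}, \Delta_{\rm r}[z \mapsto 5], \Delta_{\rm a})
\end{gather*}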