3 files changed, 45 insertions, 22 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 3ea6778..83b879a 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -331,7 +331,49 @@ However, the memory model that \compcert{} uses for its intermediate languages i
 
 \subsubsection{Implementation of RAM templates}
 
-Verilog arrays can be used in a variety of ways, however, these do not all produce optimal hardware designs.  Synthesis tools provide code snippets that they know how to transform into various constructs, including proper RAMs in the final hardware.  If, instead of using these templates, the array was accessed immediately in the data-path, then the synthesis tool would not be able to identify it as having the right properties for a RAM, and would instead implement the array using registers.  This is extremely expensive, and for large memories this can easily blow up the area usage of the FPGA, and because of the longer wires that are needed, it would also affect the performance of the circuit.  The initial translation from 3AC to HTL converts loads and stores to direct accesses to the memory, as this preserves the same behaviour without having to insert more registers and signalling.  Another pass from HTL to HTL then performs the translation from this na\"ive representation to using a proper RAM template, adding the necessary data, address and enable signals to make the synthesis tool infer that block as a proper RAM.  The RAM template that is used has some interesting additions to make the proof of this pass easier, which is further explained in Section~\ref{sec:verilog:memory}.
+Verilog arrays can be used in a variety of ways, however, these do not all produce optimal hardware designs.  Synthesis tools provide code snippets that they know how to transform into various constructs, including proper RAMs in the final hardware.  If, instead of using these templates, the array was accessed immediately in the data-path, then the synthesis tool would not be able to identify it as having the right properties for a RAM, and would instead implement the array using registers.  This is extremely expensive, and for large memories this can easily blow up the area usage of the FPGA, and because of the longer wires that are needed, it would also affect the performance of the circuit.  The initial translation from 3AC to HTL converts loads and stores to direct accesses to the memory, as this preserves the same behaviour without having to insert more registers and signalling.  Another pass from HTL to HTL then performs the translation from this na\"ive representation to using a proper RAM template, adding the necessary data, address and enable signals to make the synthesis tool infer that block as a proper RAM.
+
+There are two interesting parts to the memory template that is used for the stack of the main function.  Firstly, the memory updates are triggered on the negative edge of the clock, out of phase with the rest of the design, which is triggered on the positive edge of the clock.  The main advantage is that instead of loads and stores taking three and two clock cycles respectively, they only take two and one clock cycle instead, greatly improving their performance.  In addition to that, using the negative edge for the clock is supported by many synthesis tools, it therefore does not affect the maximum frequency of the final design.  Secondly, the logic in the enable signal of the RAM (\texttt{en != u\_en}) is also atypical.  To make the proof simpler, the goal is to create a RAM which disables itself after every use, so that firstly, the proof can assume that the RAM is disabled at the start and end of every clock cycle, and secondly so that only the state which contains the load and store need to be modified to ensure this property.  Using a simple enable signal, it would not be possible to disable it in the RAM itself, as well as enabling it in the datapath, as this would result in a register being driven twice from two different locations.  We can instead generate a second enable signal that is set by the user, and the original enable signal is then updated by the RAM to be equal to the value that the user set.  This means that the RAM should be enabled whenever the two signals are different, and disabled otherwise.
+
+\begin{figure}
+  \centering
+  \begin{subfigure}[b]{0.48\linewidth}
+    \begin{tikztimingtable}[timing/d/background/.style={fill=white}]
+      \small clk & 2L 3{6C} \\
+      \small u\_en & 2D{u\_en} 18D{$\overline{\text{u\_en}}$}\\
+      \small en & 8D{u\_en} 12D{$\overline{\text{u\_en}}$}\\
+      \small addr & 2U 18D{3} \\
+      \small d\_out & 8U 12D{0xDEADBEEF} \\
+      \small r & 14U 6D{0xDEADBEEF} \\
+      \extracode
+      \node[help lines] at (2,2.25) {\tiny 1};
+      \node[help lines] at (8,2.25) {\tiny 2};
+      \node[help lines] at (14,2.25) {\tiny 3};
+      \begin{pgfonlayer}{background}
+        \vertlines[help lines]{2,8,14}
+      \end{pgfonlayer}
+    \end{tikztimingtable}
+    \caption{Timing diagram for loads.  The \texttt{u\_en} signal is toggled which enables the RAM, then d\_out is set to be the value stored at the address in the RAM, which is finally assigned to the register \texttt{r}.}
+  \end{subfigure}\hfill%
+  \begin{subfigure}[b]{0.48\linewidth}
+    \begin{tikztimingtable}[timing/d/background/.style={fill=white}]
+      \small clk & 2L 2{7C} \\
+      \small u\_en & 2D{u\_en} 14D{$\overline{\text{u\_en}}$}\\
+      \small en & 9D{u\_en} 7D{$\overline{\text{u\_en}}$}\\
+      \small addr & 2U 14D{3} \\
+      \small d\_in & 2U 14D{0xDEADBEEF} \\
+      \small stack[addr] & 9U 7D{0xDEADBEEF} \\
+      \extracode
+      \node[help lines] at (2,2.25) {\tiny 1};
+      \node[help lines] at (9,2.25) {\tiny 2};
+      \begin{pgfonlayer}{background}
+        \vertlines[help lines]{2,9}
+      \end{pgfonlayer}
+    \end{tikztimingtable}
+    \caption{Timing diagram for stores.  The \texttt{u\_en} signal is toggled to enable the RAM, together with the address \texttt{addr} and the data to store \texttt{d\_in}.  On the negative edge the data is stored into the RAM.}
+  \end{subfigure}
+  \caption{Timing diagrams showing the execution of loads and stores over multiple clock cycles.}\label{fig:ram_load_store}
+\end{figure}
 
 \subsubsection{Implementing the \texttt{Oshrximm} instruction}
 
diff --git a/main.tex b/main.tex
index 07c754a..93c4f4d 100644
--- a/main.tex
+++ b/main.tex
@@ -46,6 +46,7 @@
 \usepackage{tikz}
 \usepackage{minted}
 \usepackage{microtype}
+\usepackage{tikz-timing}
 
 \usetikzlibrary{shapes,calc,arrows.meta}
 \usetikzlibrary{pgfplots.groupplots}
diff --git a/verilog.tex b/verilog.tex
index e77aa97..cb0d243 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -140,27 +140,7 @@ However, in practice, assigning and reading from an array directly in the state
   \caption{Specification for memory implementation in HTL, which is then implemented by equivalent Verilog code.}\label{fig:htl_ram_spec}
 \end{figure}
 
-\begin{figure}
-  \begin{subfigure}[b]{0.48\linewidth}
-\begin{minted}{verilog}
-  reg [31:0] mem [SIZE-1:0];
-  always @(negedge clk)
-    if (en != u_en) begin
-      if (wr_en) mem[addr] <= d_in;
-      else d_out <= mem[addr];
-      en <= u_en;
-    end
-\end{minted}
-    \caption{Implementation of memory specification in Verilog of its HTL specification.}\label{fig:verilog_ram_impl}
-  \end{subfigure}\hfill%
-  \begin{subfigure}[b]{0.48\linewidth}
-    \includegraphics[width=\linewidth]{diagrams/load_waveform.pdf}
-    \caption{Example wave form for a load being performed by the Verilog code. \JW{Can we say that the load is loading the value DEADBEEF from address 3 into the variable x, just to help the reader make sense of the signals in the timing diagram? }}
-  \end{subfigure}
-  \caption{RAM representation in Verilog and a trace of its execution.}
-\end{figure}
-
-This memory template can be represented using the following semantics shown in Figure~\ref{fig:htl_ram_spec}, which is then translated to the equivalent Verilog implementation shown in Figure~\ref{fig:verilog_ram_impl}.  There are two interesting parts to the memory template that is used for the stack of the main function.  Firstly, the memory updates are triggered on the negative edge of the clock, out of phase with the rest of the design, which is triggered on the positive edge of the clock.  The main advantage is that instead of loads and stores taking three and two clock cycles respectively, they only take two and one clock cycle instead, greatly improving their performance.  In addition to that, using the negative edge for the clock is supported by many synthesis tools, it therefore does not affect the maximum frequency of the final design.  Secondly, the logic in the enable signal of the RAM (\texttt{en != u\_en}) is also atypical.  To make the proof simpler, the goal is to create a RAM which disables itself after every use, so that firstly, the proof can assume that the RAM is disabled at the start and end of every clock cycle, and secondly so that only the state which contains the load and store need to be modified to ensure this property.  Using a simple enable signal, it would not be possible to disable it in the RAM itself, as well as enabling it in the datapath, as this would result in a register being driven twice from two different locations.  We can instead generate a second enable signal that is set by the user, and the original enable signal is then updated by the RAM to be equal to the value that the user set.  This means that the RAM should be enabled whenever the two signals are different, and disabled otherwise.
+This memory template can be represented using the following semantics shown in Figure~\ref{fig:htl_ram_spec}, which is then translated to the equivalent Verilog implementation shown in Figure~\ref{fig:verilog_ram_impl}.
 
 %\begin{figure}
 %  \centering