Changes to diagrams

author: Yann Herklotz <git@yannherklotz.com> 2021-04-12 18:20:36 +0100
committer: Yann Herklotz <git@yannherklotz.com> 2021-04-12 18:20:36 +0100
commit: 7fab5ba301c2200d14bb081480e1b378b266e062 (patch)
tree: 51c982f681875923ef6ab4a0775120a3287e22c4 /verilog.tex
parent: 6613a83ad971a00a35315018486b3b12884ebb83 (diff)
download: oopsla21_fvhls-7fab5ba301c2200d14bb081480e1b378b266e062.tar.gz
oopsla21_fvhls-7fab5ba301c2200d14bb081480e1b378b266e062.zip
1 files changed, 24 insertions, 15 deletions
diff --git a/verilog.tex b/verilog.tex
index c8b62ee..3cc435c 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -15,7 +15,7 @@ The semantics combines the big-step and small-step styles. The overall execution
 An example of a rule for executing an \alwaysblock{} is shown below, where $\Sigma$ is the state of the registers in the module and $s$ is the statement inside the \alwaysblock{}:
 
 \begin{equation*}
-  \inferrule[Always]{(\Sigma, s)\downarrow_{\text{stmnt}} \Sigma'}{(\Sigma, \yhkeyword{always @(posedge clk) } s) \downarrow_{\text{always}} \Sigma'}
+  \inferrule[Always]{(\Sigma, s)\downarrow_{\text{stmnt}} \Sigma'}{(\Sigma, \yhkeyword{always @(posedge clk) } s) \downarrow_{\text{always}^{+}} \Sigma'}
 \end{equation*}
 
 \noindent This rule says that assuming the statement $s$ in the \alwaysblock{} runs with state $\Sigma$ and produces the new state $\Sigma'$, the \alwaysblock{} will result in the same final state.  %Since only clocked \alwaysblock{} are supported, and one step in the semantics correspond to one clock cycle, it means that this rule is run once per clock cycle for each \alwaysblock{}.
@@ -31,7 +31,7 @@ Nonblocking assignment can therefore be expressed as follows:
 
 Finally, the following rule dictates how the whole module runs in one clock cycle:
 \begin{equation*}
-  \inferrule[Module]{(\Gamma, \epsilon, \vec{m})\ \downarrow_{\text{module}} (\Gamma', \Delta')}{(\Gamma, \yhkeyword{module } \yhconstant{main} \yhkeyword{(...);}\ \vec{m}\ \yhkeyword{endmodule}) \downarrow_{\text{program}} (\Gamma' // \Delta')}
+  \inferrule[Module]{(\Gamma, \epsilon, \vec{m})\ \downarrow_{\text{module}^{+}} (\Gamma', \Delta') \\ (\Gamma'\ //\ \Delta', \epsilon, \vec{m}) \downarrow_{\text{module}^{-}} (\Gamma'', \Delta'')}{(\Gamma, \yhkeyword{module } \yhconstant{main} \yhkeyword{(...);}\ \vec{m}\ \yhkeyword{endmodule}) \downarrow_{\text{program}} (\Gamma''\ //\ \Delta'')}
 \end{equation*}
 where $\Gamma$ is the initial state of all the variables, and $\vec{m}$ is a list of variable declarations and \alwaysblock{}s that $\downarrow_{\text{module}}$ evaluates sequentially to obtain $(\Gamma', \Delta')$. The final state is obtained by merging these maps using the $//$ operator, which gives priority to the right-hand operand in a conflict. This rule ensures that the nonblocking assignments overwrite at the end of the clock cycle any blocking assignments made during the cycle.
 
@@ -111,7 +111,15 @@ Therefore, in addition to the rules shown in Figure~\ref{fig:inferrence_module},
 
 \subsection{Memory Model}
 
-The Verilog semantics do not define a memory model for Verilog, as this is not needed for a hardware description language.  There is no preexisting architecture that Verilog will produce, it can describe any memory layout that is needed.  Instead of having specific semantics for memory, the semantics only need to support the language features that can produce these different memory layouts, these being Verilog arrays.
+The Verilog semantics do not define a memory model for Verilog, as this is not needed for a hardware description language.  There is no preexisting architecture that Verilog will produce, it can describe any memory layout that is needed.  Instead of having specific semantics for memory, the semantics only need to support the language features that can produce these different memory layouts, these being Verilog arrays.  We therefore define semantics for updating Verilog arrays using blocking and nonblocking assignment.  We then have to prove that the C memory model that \compcert{} uses matches with the interpretation of arrays that are used in Verilog.  The \compcert{} memory model is infinite, whereas our representation of arrays in Verilog is inherently finite.  There have already been various efforts to define a finite memory model for \compcert{}, such as CompCertS~\cite{besson18_compc}, CompCertELF~\cite{wang20_compc} and CompCertTSO~\cite{sevcik13_compc}, however, we only define the translation from \compcert{}'s standard infinite memory model to finitely sized arrays that can be represented in Verilog.
+
+This translation is represented in Figure~\ref{fig:memory_model_transl}, where \compcert{} defines a map from blocks to maps from memory address to memory contents.  Instead, our Verilog semantics define two finitely sized arrays of optional values, one for the blocking assignments map $\Gamma_{\rm a}$ and one for the nonblocking assignments map $\Delta_{\rm a}$.  The optional values are present to ensure correct merging of the two association maps at the end of the clock cycle.
+
+\begin{figure}
+  \centering
+  \includegraphics[width=0.5\linewidth]{diagrams/memory_model.pdf}
+  \caption{Change in the memory model during the translation of 3AC to HTL.  This is immediately after the assignment to the array}\label{fig:memory_model_transl}
+\end{figure}
 
 However, in practice, assigning and reading from an array directly in the state machine will not produce a memory in the final hardware design, as the synthesis tool cannot identify the array as having the necessary properties that a RAM needs, even though this is the most natural formulation of memory.  Even though theoretically the memory will only be read from once per clock cycle, the synthesis tool cannot ensure that this is true, and will instead create a register for each memory location.  This increases the size of the circuit dramatically, as the RAM on the FPGA chip will not be reused.  Instead, the synthesis tool expects a specific template that ensures these properties, and will then transform the template into a proper RAM during synthesis.  Therefore, a translation has to be performed from the naive use of memory in the state machine, to a proper use of a memory template.
 
@@ -119,19 +127,20 @@ However, in practice, assigning and reading from an array directly in the state
   \centering
   \begin{minipage}{1.0\linewidth}
     \begin{gather*}
-      \inferrule[None]{ }{\textit{exec\_ram}\ \Gamma\ \Delta\ \yhconstant{None}\ \Delta}\qquad
+      \inferrule[None]{ }{(\Gamma,\Delta, \yhconstant{None}) \downarrow_{\text{ram}} \Delta}\qquad
 %
-      \inferrule[Idle]{\Gamma_{\rm r}\ !\ \textit{r.en} = \Gamma_{\rm r}\ !\ \textit{r.u\_{en}}}{\textit{exec\_ram}\ (\Gamma_{\rm r}, \Gamma_{\rm a})\ \Delta\ (\yhconstant{Some}\ r)\ \Delta}\\
+      \inferrule[Idle]{\Gamma_{\rm r}\ !\ \textit{r.en} = \Gamma_{\rm r}\ !\ \textit{r.u\_{en}}}{((\Gamma_{\rm r}, \Gamma_{\rm a}), \Delta, \yhconstant{Some}\ r) \downarrow_{\text{ram}} \Delta}\\
 %
-      \inferrule[Read]{\Gamma_{\rm r}\ !\ \textit{r.en} \ne \Gamma_{\rm r}\ !\ \textit{r.u\_en} \\ \Gamma_{\rm r}\ !\ \textit{r.wr\_en} = 0}{ \textit{exec\_ram}\ (\Gamma_{\rm r}, \Gamma_{\rm a})\ (\Delta_{\rm r}, \Delta_{\rm a})\ (\yhconstant{Some}\ r)\ (\Delta_{\rm r}[\textit{r.en} \mapsto \textit{r.u\_en}, \textit{r.d\_out} \mapsto (\Gamma_{\rm a}\ !\ \textit{r.mem})\ !\ \textit{r.addr}], \Delta_{\rm a}) }\\
+      \inferrule[Read]{\Gamma_{\rm r}\ !\ \textit{r.en} \ne \Gamma_{\rm r}\ !\ \textit{r.u\_en} \\ \Gamma_{\rm r}\ !\ \textit{r.wr\_en} = 0}{((\Gamma_{\rm r}, \Gamma_{\rm a}), (\Delta_{\rm r}, \Delta_{\rm a}), \yhconstant{Some}\ r) \downarrow_{\text{ram}} (\Delta_{\rm r}[\textit{r.en} \mapsto \textit{r.u\_en}, \textit{r.d\_out} \mapsto (\Gamma_{\rm a}\ !\ \textit{r.mem})\ !\ \textit{r.addr}], \Delta_{\rm a}) }\\
 %
-      \inferrule[Write]{\Gamma_{\rm r}\ !\ \textit{r.en} \ne \Gamma_{\rm r}\ !\ \textit{r.u\_en} \\ \Gamma_{\rm r}\ !\ \textit{r.wr\_en} = 1}{ \textit{exec\_ram}\ (\Gamma_{\rm r}, \Gamma_{\rm a})\ (\Delta_{\rm r}, \Delta_{\rm a})\ (\yhconstant{Some}\ r)\ (\Delta_{\rm r}[\textit{r.en} \mapsto \textit{r.u\_en}], \Delta_{\rm a}[\textit{r.mem} \mapsto (\Gamma_{\rm a}\ !\ \textit{r.mem})[\textit{r.addr} \mapsto \textit{r.d\_in}]]) }
+      \inferrule[Write]{\Gamma_{\rm r}\ !\ \textit{r.en} \ne \Gamma_{\rm r}\ !\ \textit{r.u\_en} \\ \Gamma_{\rm r}\ !\ \textit{r.wr\_en} = 1}{((\Gamma_{\rm r}, \Gamma_{\rm a}), (\Delta_{\rm r}, \Delta_{\rm a}), \yhconstant{Some}\ r) \downarrow_{\text{ram}} (\Delta_{\rm r}[\textit{r.en} \mapsto \textit{r.u\_en}], \Delta_{\rm a}[\textit{r.mem} \mapsto (\Gamma_{\rm a}\ !\ \textit{r.mem})[\textit{r.addr} \mapsto \textit{r.d\_in}]]) }
     \end{gather*}
   \end{minipage}
   \caption{Specification for memory implementation in HTL, which is then implemented by equivalent Verilog code.}\label{fig:htl_ram_spec}
 \end{figure}
 
 \begin{figure}
+  \begin{subfigure}[b]{0.48\linewidth}
 \begin{minted}{verilog}
   reg [31:0] mem [SIZE-1:0];
   always @(negedge clk)
@@ -141,10 +150,16 @@ However, in practice, assigning and reading from an array directly in the state
       en <= u_en;
     end
 \end{minted}
-  \caption{Implementation of memory specification in Verilog of its HTL specification.}\label{fig:verilog_ram_impl}
+    \caption{Implementation of memory specification in Verilog of its HTL specification.}\label{fig:verilog_ram_impl}
+  \end{subfigure}\hfill%
+  \begin{subfigure}[b]{0.48\linewidth}
+    \includegraphics[width=\linewidth]{diagrams/load_waveform.pdf}
+    \caption{Example wave form for a load being performed by the Verilog code.}
+  \end{subfigure}
+  \caption{RAM representation in Verilog and a trace of it's execution.}
 \end{figure}
 
-This memory template can be represented using the following semantics shown in Figure~\ref{fig:htl_ram_spec}, which is then translated to the equivalent Verilog implementation shown in Figure~\ref{fig:verilog_ram_impl}.  There are two interesting parts to the memory template that is used for the stack of the main function.  Firstly, the memory updates are triggered on the negative edge of the clock, out of phase with the rest of the design, which is triggered on the positive edge of the clock.  The main advantage is that instead of loads and stores taking three and two clock cycles respectively, they only take two and one clock cycle instead, greatly improving their performance.  In addition to that, using the negative edge for the clock is supported by many synthesis tools, it therefore does not affect the maximum frequency of the final design.  Secondly, the logic in the enable signal of the RAM (\texttt{en != u\_en}) is also atypical.  To make the proof simpler, the goal is to create a RAM which disables itself after every use, so that firstly, the proof can assume that the RAM is disabled at the start and end of every clock cycle, and secondly so that only the state which contains the load and store need to be modified to ensure this property.
+This memory template can be represented using the following semantics shown in Figure~\ref{fig:htl_ram_spec}, which is then translated to the equivalent Verilog implementation shown in Figure~\ref{fig:verilog_ram_impl}.  There are two interesting parts to the memory template that is used for the stack of the main function.  Firstly, the memory updates are triggered on the negative edge of the clock, out of phase with the rest of the design, which is triggered on the positive edge of the clock.  The main advantage is that instead of loads and stores taking three and two clock cycles respectively, they only take two and one clock cycle instead, greatly improving their performance.  In addition to that, using the negative edge for the clock is supported by many synthesis tools, it therefore does not affect the maximum frequency of the final design.  Secondly, the logic in the enable signal of the RAM (\texttt{en != u\_en}) is also atypical.  To make the proof simpler, the goal is to create a RAM which disables itself after every use, so that firstly, the proof can assume that the RAM is disabled at the start and end of every clock cycle, and secondly so that only the state which contains the load and store need to be modified to ensure this property.  Using a simple enable signal, it would not be possible to disable it in the RAM itself, as well as enabling it in the datapath, as this would result in a register being driven twice from two different locations.  We can instead generate a second enable signal that is set by the user, and the original enable signal is then updated by the RAM to be equal to the value that the user set.  This means that the RAM should be enabled whenever the two signals are different, and disabled otherwise.
 
 %\begin{figure}
 %  \centering
@@ -158,12 +173,6 @@ This memory template can be represented using the following semantics shown in F
 %  \end{subfigure}
 %\end{figure}
 
-\begin{figure}
-  \centering
-  \includegraphics[width=0.5\linewidth]{diagrams/memory_model.pdf}
-  \caption{Change in the memory model during the translation of 3AC to HTL.  This is immediately after the assignment to the array}
-\end{figure}
-
 %%% Local Variables:
 %%% mode: latex
 %%% TeX-master: "main"
author	Yann Herklotz <git@yannherklotz.com>	2021-04-12 18:20:36 +0100
committer	Yann Herklotz <git@yannherklotz.com>	2021-04-12 18:20:36 +0100
commit	7fab5ba301c2200d14bb081480e1b378b266e062 (patch)
tree	51c982f681875923ef6ab4a0775120a3287e22c4 /verilog.tex
parent	6613a83ad971a00a35315018486b3b12884ebb83 (diff)
download	oopsla21_fvhls-7fab5ba301c2200d14bb081480e1b378b266e062.tar.gz oopsla21_fvhls-7fab5ba301c2200d14bb081480e1b378b266e062.zip