verilog.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235

\section{A Formal Semantics for Verilog}\label{sec:verilog}

\newcommand{\alwaysblock}{always-block}

This section describes the Verilog semantics that was chosen for the target language, including the changes that were made to the semantics to make it a suitable HLS target.  The Verilog standard is quite large~\cite{06_ieee_stand_veril_hardw_descr_languag,05_ieee_stand_veril_regis_trans_level_synth}, but the syntax and semantics can be reduced to a small subset that \vericert{} needs to target.  This section  also describes how \vericert{}'s representation of memory differs from \compcert{}'s memory model.

The Verilog semantics we use is ported to Coq from a semantics written in HOL4 by \citet{loow19_proof_trans_veril_devel_hol} and used to prove the translation from HOL4 to Verilog~\cite{loow19_verif_compil_verif_proces}. % which was used to create a formal translation from a logic representation encoded in the HOL4~\cite{slind08_brief_overv_hol4} theorem prover into an equivalent Verilog design. 
This semantics is quite practical as it is restricted to a small subset of Verilog, which can nonetheless be used to model the hardware constructs required for HLS.  The main features that are excluded are continuous assignment and combinational \alwaysblock{}s; these are modelled in other semantics such as that by~\citet{meredith10_veril}. %however, these are not necessarily needed, but require more complicated event queues and execution model.

The semantics of Verilog differs from regular programming languages, as it is used to describe hardware directly, which is inherently parallel, rather than an algorithm, which is usually sequential.  The main construct in Verilog is the \alwaysblock{}.
A module can contain multiple \alwaysblock{}s, all of which run in parallel.  These \alwaysblock{}s further contain statements such as if-statements or assignments to variables.  We support only \emph{synchronous} logic, which means that the \alwaysblock{} is triggered on (and only on) the positive or negative edge of a clock signal.
%\NR{We should mention that variables cannot be driven by multiple \alwaysblock{}s, since one might get confused with data races when relating to concurrent processes in software.} \JW{Given the recent discussion on Teams, it seems to me that we perhaps don't need to mention here what happens if a variable is driven multiple times per clock cycle, especially since \vericert{} isn't ever going to do that.}

The semantics combines the big-step and small-step styles. The overall execution of the hardware is described using a small-step semantics, with one small step per clock cycle; this is appropriate because hardware is routinely designed to run for an unlimited number of clock cycles and the big-step style is ill-suited to describing infinite executions. Then, within each clock cycle, a big-step semantics is used to execute all the statements.
An example of a rule for executing an \alwaysblock{} that is triggered at the positive edge of the clock is shown below, where $\Sigma$ is the state of the registers in the module and $s$ is the statement inside the \alwaysblock{}:

\begin{equation*}
  \inferrule[Always]{(\Sigma, s)\downarrow_{\text{stmnt}} \Sigma'}{(\Sigma, \yhkeyword{always @(posedge clk) } s) \downarrow_{\text{always}^{+}} \Sigma'}
\end{equation*}

\noindent This rule says that assuming the statement $s$ in the \alwaysblock{} runs with state $\Sigma$ and produces the new state $\Sigma'$, the \alwaysblock{} will result in the same final state.  %Since only clocked \alwaysblock{} are supported, and one step in the semantics correspond to one clock cycle, it means that this rule is run once per clock cycle for each \alwaysblock{}.

Two types of assignments are supported in \alwaysblock{}s: nonblocking and blocking assignment.  Nonblocking assignments all take effect simultaneously at the end of the clock cycle, %and atomically.
while blocking assignments happen instantly so that later assignments in the clock cycle can pick them up.  To model both of these assignments, the state $\Sigma$ has to be split into two maps: $\Gamma$, which contains the current values of all variables and arrays, and $\Delta$, which contains the values that will be assigned at the end of the clock cycle. $\Sigma$ can therefore be defined as follows: $\Sigma = (\Gamma, \Delta)$.
Nonblocking assignment can therefore be expressed as follows:
\begin{equation*}
  \inferrule[Nonblocking Reg]{\yhkeyword{name}\ d = \yhkeyword{OK}\ n \\ (\Gamma, e) \downarrow_{\text{expr}} v}{((\Gamma, \Delta), d\ \yhkeyword{ <= } e) \downarrow_{\text{stmnt}} (\Gamma, \Delta [n \mapsto v])}\\
\end{equation*}

\noindent where assuming that $\downarrow_{\text{expr}}$ evaluates an expression $e$ to a value $v$, the nonblocking assignment $d\ \yhkeyword{ <= } e$ updates the future state of the variable $d$ with value $v$.

Finally, the following rule dictates how the whole module runs in one clock cycle:
\begin{equation*}
  \inferrule[Module]{(\Gamma, \epsilon, \vec{m})\ \downarrow_{\text{module}} (\Gamma', \Delta')}{(\Gamma, \yhkeyword{module } \yhconstant{main} \yhkeyword{(...);}\ \vec{m}\ \yhkeyword{endmodule}) \downarrow_{\text{program}} (\Gamma'\ //\ \Delta')}
\end{equation*}
where $\Gamma$ is the initial state of all the variables, $\epsilon$ is the empty map because the $\Delta$ map is assumed to be empty at the start of the clock cycle, and $\vec{m}$ is a list of variable declarations and \alwaysblock{}s that $\downarrow_{\text{module}}$ evaluates sequentially to obtain $(\Gamma', \Delta')$. The final state is obtained by merging these maps using the $//$ operator, which gives priority to the right-hand operand in a conflict. This rule ensures that the nonblocking assignments overwrite at the end of the clock cycle any blocking assignments made during the cycle.

\subsection{Changes to the Semantics}

Five changes were made to the semantics proposed by \citet{loow19_proof_trans_veril_devel_hol} to make it suitable as an HLS target.

\paragraph{Adding array support}
The main change is the addition of support for arrays, which are needed to model RAM in Verilog.  RAM is needed to model the stack in C efficiently, without having to declare a variable for each possible stack location. % In the original semantics, RAMs (as well as inputs and outputs to the module) could be modelled using a function from variable names (strings) to values, which could be modified accordingly to model inputs to the module.  This is quite an abstract description of memory and can also be expressed as an array of bitvectors instead, which is the path we took. This requires the addition of array operators to the semantics and correct reasoning of loads and stores to the array in different \alwaysblock{}s simultaneously.  
Consider the following Verilog code:

\begin{center}
\begin{minted}[xleftmargin=40pt,linenos]{verilog}
reg [31:0] x[1:0];
always @(posedge clk) begin x[0] = 1; x[1] <= 1; end
\end{minted}
\end{center}

which modifies one array element using blocking assignment and then a second using nonblocking assignment. If the existing semantics were used to update the array, then during the merge, the entire array \texttt{x} from the nonblocking association map would replace the entire array from the blocking association map.  This would replace \texttt{x[0]} with its original value and therefore behave incorrectly. Accordingly, we modified the maps so they record updates on a per-el\-em\-ent basis. Our state $\Gamma$ is therefore further split up into $\Gamma_{r}$ for instantaneous updates to variables, and $\Gamma_{a}$ for instantaneous updates to arrays ($\Gamma = (\Gamma_{r}, \Gamma_{a})$); $\Delta$ is split similarly ($\Delta = (\Delta_{r}, \Delta_{a})$). The merge function then ensures that only the modified indices get updated when $\Gamma_{a}$ is merged with the nonblocking map equivalent $\Delta_{a}$.

\paragraph{Adding negative edge support}
To reason about circuits that execute on the negative edge of the clock (such as our RAM interface described in Section~\ref{sec:algorithm:optimisation:ram}),  support for negative-edge-triggered \alwaysblock{}s was added to the semantics. This is shown in the modifications of the \textsc{Module} rule shown below:

\begin{equation*}
  \inferrule[Module]{(\Gamma, \epsilon, \vec{m})\ \downarrow_{\text{module}^{+}} (\Gamma', \Delta') \\ (\Gamma'\ //\ \Delta', \epsilon, \vec{m}) \downarrow_{\text{module}^{-}} (\Gamma'', \Delta'')}{(\Gamma, \yhkeyword{module}\ \yhconstant{main} \yhkeyword{(...);}\ \vec{m}\ \yhkeyword{endmodule}) \downarrow_{\text{program}} (\Gamma''\ //\ \Delta'')}
\end{equation*}

The main execution of the module $\downarrow_{\text{module}}$ is split into $\downarrow_{\text{module}^{+}}$ and $\downarrow_{\text{module}^{-}}$, which are rules that only execute \alwaysblock{}s triggered at the positive and at the negative edge respectively. The positive-edge-triggered \alwaysblock{}s are processed in the same way as in the original \textsc{Module} rule. The output maps $\Gamma'$ and $\Delta'$ are then merged and passed as the blocking assignments map into the negative edge execution, so that all the blocking and nonblocking assignments are present.  Finally, all the negative-edge-triggered \alwaysblock{}s are processed and merged to give the final state.

\paragraph{Adding declarations} Explicit support for declaring inputs, outputs and internal variables was added to the semantics to make sure that the generated Verilog also contains the correct declarations.  This adds some guarantees to the generated Verilog and ensures that it synthesises and simulates correctly.

\paragraph{Removing support for external inputs to modules} Support for receiving external inputs was removed from the semantics for simplicity, as these are not needed for an HLS target. The main module in Verilog models the main function in C, and since the inputs to a C function should not change during its execution, there is no need for external inputs for Verilog modules.

\paragraph{Simplifying representation of bitvectors} Finally, we use 32-bit integers to represent bitvectors rather than arrays of booleans. This is because \vericert{} (currently) only supports types represented by 32 bits.

\subsection{Integrating the Verilog Semantics into \compcert{}'s Model}
\label{sec:verilog:integrating}

\begin{figure*}
  \centering
  \begin{minipage}{1.0\linewidth}
    \begin{gather*}
      \inferrule[Step]{\Gamma_r[\mathit{rst}] = 0 \\ \Gamma_r[\mathit{fin}] = 0 \\ (m, (\Gamma_r, \Gamma_a))\ \downarrow_{\text{program}} (\Gamma_r', \Gamma_a')}{\yhconstant{State}\ \mathit{sf}\ m\ \ \Gamma_r[\sigma]\ \ \Gamma_r\ \Gamma_a \longrightarrow \yhconstant{State}\ \mathit{sf}\ m\ \ \Gamma_r'[\sigma]\ \ \Gamma_r'\ \Gamma_a'}\\
      %
      \inferrule[Finish]{\Gamma_r[\mathit{fin}] = 1}{\yhconstant{State}\ \mathit{sf}\ m\ \sigma\ \Gamma_r\ \Gamma_a \longrightarrow \yhconstant{Returnstate}\ \mathit{sf}\ \Gamma_r[ \mathit{ret}]}\\
      %
      \inferrule[Call]{ }{\yhconstant{Callstate}\ \mathit{sf}\ m\ \vec{r} \longrightarrow \yhconstant{State}\ \mathit{sf}\ m\ n\ ((\yhfunction{init\_params}\ \vec{r}\ a)[\sigma \mapsto n, \mathit{fin} \mapsto 0, \mathit{rst} \mapsto 0])\ \epsilon}\\
      %
      \inferrule[Return]{ }{\yhconstant{Returnstate}\ (\yhconstant{Stackframe}\ r\ m\ \mathit{pc}\ \Gamma_r\ \Gamma_a :: \mathit{sf})\ v \longrightarrow \yhconstant{State}\ \mathit{sf}\ m\ \mathit{pc}\ (\Gamma_{r} [ \sigma \mapsto \mathit{pc}, r \mapsto v ])\ \Gamma_{a}}
    \end{gather*}
  \end{minipage}
  \caption{Top-level small-step semantics for Verilog modules in \compcert{}'s computational framework.}%
  \label{fig:inference_module}
\end{figure*}

The \compcert{} computation model defines a set of states through which execution passes. In this subsection, we explain how we extend our Verilog semantics with four special-purpose  registers in order to integrate it into \compcert{}.

\compcert{} executions pass through three main states:
\begin{description}
  \item[\texttt{State} $\mathit{sf}$ $m$ $v$ $\Gamma_{r}$ $\Gamma_{a}$] The main state when executing a function, with stack frame $\mathit{sf}$, current module $m$, current state $v$ and variable states $\Gamma_{r}$ and $\Gamma_{a}$.
  \item[\texttt{Callstate} $\mathit{sf}$ $m$ $\vec{r}$] The state that is reached when a function is called, with the current stack frame $\mathit{sf}$, current module $m$ and arguments $\vec{r}$.
  \item[\texttt{Returnstate} $\mathit{sf}$ $v$] The state that is reached when a function returns back to the caller, with stack frame $\mathit{sf}$ and return value $v$.
\end{description}

To support this computational model, we extend the Verilog module we generate with the following four registers and a RAM block:

\begin{description}
  \item[program counter] The program counter can be modelled using a register that keeps track of the state, denoted as $\sigma$.
  \item[function entry point] When a function is called, the entry point denotes the first instruction that will be executed. This can be modelled using a reset signal that sets the state accordingly, denoted as $\mathit{rst}$.
  \item[return value] The return value can be modelled by setting a finished flag to 1 when the result is ready, and putting the result into a 32-bit output register. These are denoted as $\mathit{fin}$ and $\mathit{ret}$ respectively.
%\JW{Is there a mismatch between `ret' in the figure and `rtrn' in the text?}
  \item[stack] The function stack can be modelled as a RAM block, which is implemented using an array in the module, and denoted as $\mathit{stk}$.
%\JW{Is there a mismatch between `st' in the figure and `stk' in the text?}\YH{It was actually between $\Gamma_{a}$ and \mathit{stk}.  The \mathit{st} should have been $\sigma$.}
\end{description}

Fig.~\ref{fig:inference_module} shows the inference rules for moving between the computational states.  The first, \textsc{Step}, is the normal rule of execution.  It defines one step in the \texttt{State} state, assuming that the module is not being reset, that the finish state has not been reached yet, that the current and next state are $v$ and $v'$, and that the module runs from state $\Gamma$ to $\Gamma'$ using the \textsc{Step} rule.  The \textsc{Finish} rule returns the final value of running the module and is applied when the $\mathit{fin}$ register is set; the return value is then taken from the $\mathit{ret}$ register.

Note that there is no step from \texttt{State} to \texttt{Callstate}; this is because function calls are not supported, and it is therefore impossible in our semantics ever to reach a \texttt{Callstate} except for the initial call to main. So the \textsc{Call} rule is only used at the very beginning of execution; likewise, the \textsc{Return} rule is only matched for the final return value from the main function.
Therefore, in addition to the rules shown in Fig.~\ref{fig:inference_module}, an initial state and final state need to be defined:

\begin{gather*}
  \inferrule[Initial]{\yhfunction{is\_internal}\ P.\texttt{main}}{\yhfunction{initial\_state}\ (\yhconstant{Callstate}\ []\ P.\texttt{main}\ [])}\qquad
  \inferrule[Final]{ }{\yhfunction{final\_state}\ (\yhconstant{Returnstate}\ []\ n)\ n}
\end{gather*}

\noindent where the initial state is the \texttt{Callstate} with an empty stack frame and no arguments for the \texttt{main} function of program $P$, where this \texttt{main} function needs to be in the current translation unit.  The final state results in the program output of value $n$ when reaching a \texttt{Returnstate} with an empty stack frame.

\subsection{Memory Model}\label{sec:verilog:memory}

The Verilog semantics do not define a memory model for Verilog, as this is not needed for a hardware description language.  There is no preexisting architecture that Verilog will produce; it can describe any memory layout that is needed.  Instead of having specific semantics for memory, the semantics only needs to support the language features that can produce these different memory layouts, these being Verilog arrays.  We therefore define semantics for updating Verilog arrays using blocking and nonblocking assignment.  We then have to prove that the C memory model that \compcert{} uses matches with the interpretation of arrays used in Verilog.  The \compcert{} memory model is infinite, whereas our representation of arrays in Verilog is inherently finite.  There have already been efforts to define a general finite memory model for all intermediate languages in \compcert{}, such as \compcert{}\-S~\cite{besson18_compc} or \compcert{}-TSO~\cite{sevcik13_compc}, or keeping the intermediate languages intact and translate to a more concrete finite memory model in the back end, such as in \compcert{}\-ELF~\cite{wang20_compc}.  We also define such a translation from \compcert{}'s standard infinite memory model to finite arrays that can be represented in Verilog.  There is therefore no more notion of an abstract memory model and all the interactions to memory are encoded in the hardware itself.

%\JW{I'm not quite sure I understand. Let me check: Are you saying that previous work has shown how all the existing CompCert passes can be adapted from an infinite to a finite memory model, but what we're doing is leaving the default (infinite) memory model for the CompCert front end, and just converting from an infinite memory model to a finite memory model when we go from 3AC to HTL?}\YH{Yes exactly, most papers changed the whole memory model to thread through properties that were then needed in the back end, but we currently don't need to do that.  I need to double check though for CompCertELF, it doesn't actually seem to be the case.  Will edit this section later.}

\definecolor{compcertmemmodel}{HTML}{e2ccea}
\definecolor{vericertmemmodel}{HTML}{cbe1db}
\begin{figure}
  \centering
  \begin{tikzpicture}
    \fill[compcertmemmodel,rounded corners=3pt] (0,0) rectangle (5,-5);
    \fill[vericertmemmodel,rounded corners=3pt] (7,0) rectangle (12,-5);
    \node[right] at (0,-0.3) {\small \textbf{\compcert{}'s Memory Model}};
    \node[right] at (7,-0.3) {\small \textbf{Verilog Memory Representation}};
    \node[right] (x0) at (0.2,-1.9) {\small 0};
    \node[right] (x1) at (0.2,-2.5) {\small 1};
    \node[rotate=90] (x2) at (0.43,-3.1) {$\cdots$};
    \foreach \x in {0,...,6}{%
      \node[right] (s\x) at (2.5,-1-\x*0.3) {\small \x};
      \node[right] (t\x) at (4,-1-\x*0.3) {};
      \draw[->] (s\x) -- (t\x);
    }

    \node[right] at (t0) {\small \texttt{DE}};
    \node[right] at (t1) {\small \texttt{AD}};
    \node[right] at (t2) {\small \texttt{BE}};
    \node[right] at (t3) {\small \texttt{EF}};
    \node[right] at (t4) {\small \texttt{12}};
    \node[right] at (t5) {\small \texttt{34}};
    \node[right] at (t6) {\small \texttt{56}};
    \node[right] at (3.1,-3.1) {$\cdots$};

    \node[right] at (3.1,-4) {$\cdots$};
    \node[scale=1.3] at (6,-2.5) {\Huge $\Rightarrow$};

    \draw[->] (x0) -- (s3);
    \draw[->] (x1) -- (2.5,-4);
    \draw (0,-4.3) -- (5,-4.3);
    \node at (2.5,-4.7) {\small \texttt{x[0] = 0xDEADBEEF;}};

    \draw (7.2,-1.2) rectangle (9.4,-3.9);
    \draw (9.6,-1.2) rectangle (11.8,-3.9);

    \foreach \x in {0,...,8}{%
      \draw (7.2,-1.2-\x*0.3) -- (9.4,-1.2-\x*0.3);
      \draw (9.6,-1.2-\x*0.3) -- (11.8,-1.2-\x*0.3);
      \node (b\x) at (8.3,-1.35-\x*0.3) {};
      \node (nb\x) at (10.7,-1.35-\x*0.3) {};
    }

    \node[scale=1.2] at (b0) {\tiny\texttt{0: Some 00000000}};
    \node[scale=1.2] at (b1) {\tiny\texttt{1: Some 12345600}};
    \node[scale=1.2] at (b2) {\tiny\texttt{2: Some 00000000}};
    \node[scale=1.2] at (b3) {\tiny\texttt{3: Some 00000000}};
    \node[scale=1.2] at (b4) {\tiny\texttt{4: Some 00000000}};
    \node[scale=1.2] at (b5) {\tiny\texttt{5: Some 00000000}};
    \node[scale=1.2] at (b6) {\tiny\texttt{6: Some 00000000}};
    \node[scale=1.2] at ($(b7) - (0,0.05)$) {$\cdots$};
    \node[scale=1.2] at (b8) {\tiny\texttt{N: Some 00000000}};

    \node[scale=1.2] at (nb0) {\tiny\texttt{0: Some DEADBEEF}};
    \node[left,scale=1.2] at (nb1) {\tiny\texttt{1: None}};
    \node[left,scale=1.2] at (nb2) {\tiny\texttt{2: None}};
    \node[left,scale=1.2] at (nb3) {\tiny\texttt{3: None}};
    \node[left,scale=1.2] at (nb4) {\tiny\texttt{4: None}};
    \node[left,scale=1.2] at (nb5) {\tiny\texttt{5: None}};
    \node[left,scale=1.2] at (nb6) {\tiny\texttt{6: None}};
    \node[scale=1.2] at ($(nb7) - (0,0.05)$) {$\cdots$};
    \node[left,scale=1.2] at (nb8) {\tiny\texttt{N: None}};

    \node at (8.3,-1) {$\Gamma_{a}$};
    \node at (10.7,-1) {$\Delta_{a}$};

    \draw (7,-4.3) -- (12,-4.3);
    \node at (9.5,-4.7) {\small \texttt{stack[0] <= 0xDEADBEEF;}};
  \end{tikzpicture}
  \Description{\compcert{}'s memory model is translated into a more concrete memory model based on Verilog arrays.  Two association maps are therefore needed to keep track of the blocking and nonblocking assignments.}
  \caption{Change in the memory model during the translation of 3AC into HTL.  The state of the memories in each case is right after the execution of the store to memory.}\label{fig:memory_model_transl}
\end{figure}

%\JW{It's not completely clear what the relationship is between your work and those works. The use of `only' suggests that you've re-done a subset of work that has already been done -- is that the right impression?}\YH{Hopefully that's more clear.}

This translation is represented in Fig.~\ref{fig:memory_model_transl}.  \compcert{} defines a map from blocks to maps from memory addresses to memory contents.  Each block represents an area in memory; for example, a block can represent a global variable or a stack for a function. As there are no global variables, the main stack can be assumed to be block 0, and this is the only block we translate.
%\JW{So the stack frame for a function called by main would be in a different block, is that the idea? Seems unusual not to have a single stack.}
%\YH{Yeah exactly, it makes it much easier to reason about though, because everything is nicely isolated.  This is exactly what CompCertELF and CompCertS try and solve though.} 
%\JW{Would global variables normally be put in blocks 1, 2, etc.?}
%\YH{Yes, although it may also be possible that they could be numbered 0, 1, 2, 3, 4, pushing the block of the stack higher.} 
Meanwhile, our Verilog semantics defines two finite arrays of optional values, one for the blocking assignments map $\Gamma_{\rm a}$ and one for the nonblocking assignments map $\Delta_{\rm a}$. 
%\JW{It's a slight shame that `block' is used in two different senses in the preceding two sentences. I guess that can't be helped.}
%\YH{Ah that's true, I hadn't even noticed.  Yeah I think it would be good to keep the name ``block'' for CompCert's blocks.} 
The optional values are present to ensure correct merging of the two association maps at the end of the clock cycle. %During our translation we only convert block 0 to a Verilog memory, and ensure that it is the only block that is present.  
%This means that the block necessarily represents the stack of the main function.  
The invariant used in the proofs is that block 0 should be equivalent to the merged representation of the $\Gamma_{\rm a}$ and $\Delta_{\rm a}$ maps.

%However, in practice, assigning and reading from an array directly in the state machine will not produce a memory in the final hardware design, as the synthesis tool cannot identify the array as having the necessary properties that a RAM needs, even though this is the most natural formulation of memory.  Even though theoretically the memory will only be read from once per clock cycle, the synthesis tool cannot ensure that this is true, and will instead create a register for each memory location.  This increases the size of the circuit dramatically, as the RAM on the FPGA chip will not be reused.  Instead, the synthesis tool expects a specific interface that ensures these properties, and will then transform the interface into a proper RAM during synthesis.  Therefore, a translation has to be performed from the naive use of memory in the state machine, to a proper use of a memory interface.

%\begin{figure}
%  \centering
%  \begin{subfigure}[t]{0.48\linewidth}
%    \includegraphics[width=\linewidth]{diagrams/store_waveform.pdf}
%    \caption{Store waveform.}
%  \end{subfigure}\hfill%
%  \begin{subfigure}[t]{0.48\linewidth}
%    \includegraphics[width=\linewidth]{diagrams/load_waveform.pdf}
%    \caption{Load waveform.}
%  \end{subfigure}
%\end{figure}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% TeX-command-extra-options: "-shell-escape"
%%% End: