author    John Wickerson <j.wickerson@imperial.ac.uk>    2021-08-05 05:49:55 +0000
committer node <node@git-bridge-prod-0>    2021-08-05 07:46:06 +0000
commit    67ae89e4668127f3de7f7adf217469c372a16f8b (patch)
tree      d8fe1f6aa09b47c79bc9a231c90d790c5306bb9e /verilog.tex
parent    f7e372cacdc85498828fb9f0fc3ea86099f9301e (diff)
Update on Overleaf.
Diffstat (limited to 'verilog.tex')
-rw-r--r--  verilog.tex  16
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/verilog.tex b/verilog.tex
index c169aaf..21fadfd 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -8,7 +8,7 @@ The Verilog semantics we use is ported to Coq from a semantics written in HOL4 b
This semantics is quite practical as it is restricted to a small subset of Verilog, which can nonetheless be used to model the hardware constructs required for HLS. The main features that are excluded are continuous assignment and combinational \alwaysblock{}s; these are modelled in other semantics such as that by~\citet{meredith10_veril}. %however, these are not necessarily needed, but require more complicated event queues and execution model.
The semantics of Verilog differs from that of regular programming languages, as Verilog describes hardware directly, which is inherently parallel, rather than an algorithm, which is usually sequential. The main construct in Verilog is the \alwaysblock{}.
-A module can contain multiple \alwaysblock{}s, all of which run in parallel. These \alwaysblock{}s further contain statements such as if-statements or assignments to variables. We support only \emph{synchronous} logic, which means that the \alwaysblock{} is triggered on (and only on) the rising or falling edge of a clock signal.
+A module can contain multiple \alwaysblock{}s, all of which run in parallel. These \alwaysblock{}s further contain statements such as if-statements or assignments to variables. We support only \emph{synchronous} logic, which means that the \alwaysblock{} is triggered on (and only on) the positive or negative edge of a clock signal.
%\NR{We should mention that variables cannot be driven by multiple \alwaysblock{}s, since one might get confused with data races when relating to concurrent processes in software.} \JW{Given the recent discussion on Teams, it seems to me that we perhaps don't need to mention here what happens if a variable is driven multiple times per clock cycle, especially since \vericert{} isn't ever going to do that.}
The semantics combines the big-step and small-step styles. The overall execution of the hardware is described using a small-step semantics, with one small step per clock cycle; this is appropriate because hardware is routinely designed to run for an unlimited number of clock cycles and the big-step style is ill-suited to describing infinite executions. Then, within each clock cycle, a big-step semantics is used to execute all the statements.
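For instance, consider a minimal sketch of two \alwaysblock{}s (the registers here are hypothetical and assumed to be declared in the enclosing module):

always @(posedge clk) y <= x + 1; // executed once per clock cycle,
always @(posedge clk) z <= y * 2; // in parallel with the block above

Each clock cycle is one small step, within which both blocks are run to completion by the big-step rules; since the assignments are nonblocking, \texttt{z} reads the value that \texttt{y} held at the start of the cycle.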
@@ -51,13 +51,13 @@ always @(posedge clk) begin x[0] = 1; x[1] <= 1; end
which modifies one array element using blocking assignment and then a second using nonblocking assignment. If the existing semantics were used to update the array, then during the merge, the entire array \texttt{x} from the nonblocking association map would replace the entire array from the blocking association map. This would replace \texttt{x[0]} with its original value and therefore behave incorrectly. Accordingly, we modified the maps so they record updates on a per-el\-em\-ent basis. Our state $\Gamma$ is therefore split up into $\Gamma_{r}$ for instantaneous updates to variables, and $\Gamma_{a}$ for instantaneous updates to arrays; $\Delta$ is split similarly. The merge function then ensures that only the modified indices get updated when $\Gamma_{a}$ is merged with the nonblocking map equivalent $\Delta_{a}$.
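Concretely, for the example above (a minimal sketch, assuming \texttt{x[0]} and \texttt{x[1]} both hold 0 at the start of the cycle):

always @(posedge clk) begin
  x[0] = 1;  // blocking: recorded per element in Gamma_a
  x[1] <= 1; // nonblocking: recorded per element in Delta_a
end
// merging Delta_a into Gamma_a at the end of the cycle now yields
// x[0] = 1 and x[1] = 1, whereas a whole-array merge would have
// overwritten x[0] with the stale 0 held in Delta_a's copy of x.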
\paragraph{Adding negative edge support}
-To support memory inference efficiently and create and reason about a circuit that executes at the negative edge of the clock, support for the negative edge triggered \alwaysblock{}s was added to the semantics. This is shown in the modifications of the \textsc{Module} rule shown below:
+To reason about circuits that execute on the negative edge of the clock (such as our RAM interface described in Section~\ref{sec:algorithm:optimisation:ram}), support for negative-edge-triggered \alwaysblock{}s was added to the semantics, as reflected in the modified \textsc{Module} rule below:
\begin{equation*}
\inferrule[Module]{(\Gamma, \epsilon, \vec{m})\ \downarrow_{\text{module}^{+}} (\Gamma', \Delta') \\ (\Gamma'\ //\ \Delta', \epsilon, \vec{m}) \downarrow_{\text{module}^{-}} (\Gamma'', \Delta'')}{(\Gamma, \yhkeyword{module } \yhconstant{main} \yhkeyword{(...);}\ \vec{m}\ \yhkeyword{endmodule}) \downarrow_{\text{program}} (\Gamma''\ //\ \Delta'')}
\end{equation*}
-The main execution of the module $\downarrow_{\text{module}}$ is split into $\downarrow_{\text{module}^{+}}$ and $\downarrow_{\text{module}^{-}}$, which are rules that only execute always blocks triggered at the positive and at the negative edge respectively. As in the initial \textsc{Module} rule, the positive edge triggered \alwaysblock{}s are processed in the same way. The output maps $\Gamma'$ and $\Delta'$ are then merged and passed as the blocking assignments map into the negative edge execution, so that all the blocking and nonblocking assignments are present. Finally, all the negative edge triggered \alwaysblock{}s are processed and merged to give the final state.
+The main execution of the module $\downarrow_{\text{module}}$ is split into $\downarrow_{\text{module}^{+}}$ and $\downarrow_{\text{module}^{-}}$, which are rules that only execute \alwaysblock{}s triggered at the positive and at the negative edge respectively. The positive-edge-triggered \alwaysblock{}s are processed in the same way as in the original \textsc{Module} rule. The output maps $\Gamma'$ and $\Delta'$ are then merged and passed as the blocking assignments map into the negative edge execution, so that all the blocking and nonblocking assignments are present. Finally, all the negative-edge-triggered \alwaysblock{}s are processed and merged to give the final state.
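As a small illustration (with hypothetical registers), the two blocks below are executed within the same clock cycle, the first by $\downarrow_{\text{module}^{+}}$ and the second, afterwards, by $\downarrow_{\text{module}^{-}}$, which therefore observes the first block's assignments:

always @(posedge clk) addr <= addr + 1; // positive edge: handled by module+
always @(negedge clk) q <= stk[addr];   // negative edge: handled afterwards by module-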
\paragraph{Adding declarations} Explicit support for declaring inputs, outputs and internal variables was added to the semantics to ensure that the generated Verilog contains the correct declarations and therefore synthesises and simulates correctly.
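For instance (a hypothetical fragment showing the three kinds of declaration):

input clk;            // input port
output reg [31:0] o;  // output register
reg [31:0] tmp;       // internal variable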
@@ -99,13 +99,13 @@ To support this computational model, we extend the Verilog module we generate wi
\begin{description}
\item[program counter] The program counter can be modelled using a register that keeps track of the state, denoted as $\sigma$.
\item[function entry point] When a function is called, the entry point denotes the first instruction that will be executed. This can be modelled using a reset signal that sets the state accordingly, denoted as \textit{rst}.
- \item[return value] The return value can be modelled by setting a finished flag to 1 when the result is ready, and putting the result into a 32-bit output register. These are denoted as \textit{fin} and \textit{rtrn} respectively.
- \item[stack] The function stack can be modelled as a RAM block, which is implemented using an array in the module, and denoted as \textit{stk}.
+ \item[return value] The return value can be modelled by setting a finished flag to 1 when the result is ready, and putting the result into a 32-bit output register. These are denoted as \textit{fin} and \textit{rtrn} respectively. \JW{Is there a mismatch between `ret' in the figure and `rtrn' in the text?}
+ \item[stack] The function stack can be modelled as a RAM block, which is implemented using an array in the module, and denoted as \textit{stk}. \JW{Is there a mismatch between `st' in the figure and `stk' in the text?}
\end{description}
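Putting these together, a module skeleton along the following lines could be generated (a sketch only; the widths, state numbering and stack size are illustrative rather than \vericert{}'s exact output):

module main(input clk, input rst, output reg fin, output reg [31:0] rtrn);
  reg [31:0] state;      // program counter, denoted sigma in the text
  reg [31:0] stk [0:31]; // function stack modelled as a RAM block
  always @(posedge clk)
    if (rst) begin
      state <= 32'd1;    // function entry point: reset forces the first state
      fin   <= 1'b0;
    end else case (state)
      // ... one case per control state ...
      32'd1: begin rtrn <= 32'd0; fin <= 1'b1; end // result ready: set finished flag
      default: ;
    endcase
endmodule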
Figure~\ref{fig:inference_module} shows the inference rules for moving between the computational states. The first, \textsc{Step}, is the normal rule of execution. It defines one step in the \texttt{State} state, assuming that the module is not being reset, that the finish state has not been reached yet, that the current and next state are $v$ and $v'$, and that the module runs from state $\Gamma$ to $\Gamma'$ using the \textsc{Module} rule. The \textsc{Finish} rule returns the final value of running the module and is applied when the \textit{fin} register is set; the return value is then taken from the \textit{ret} register.
-Note that there is no step from \texttt{State} to \texttt{Callstate}; this is because function calls are not supported, and it is therefore impossible in our semantics to ever reach a \texttt{Callstate} except for the initial call to main. So the \textsc{Call} rule is only used at the very beginning of execution; likewise, the \textsc{Return} rule is only matched for the final return value from the main function. %as there is no rule that allocates another stack frame \textit{sf} except for the initial call to main.
+Note that there is no step from \texttt{State} to \texttt{Callstate}; this is because function calls are not supported, and it is therefore impossible in our semantics ever to reach a \texttt{Callstate} except for the initial call to main. So the \textsc{Call} rule is only used at the very beginning of execution; likewise, the \textsc{Return} rule is only matched for the final return value from the main function. %as there is no rule that allocates another stack frame \textit{sf} except for the initial call to main.
Therefore, in addition to the rules shown in Figure~\ref{fig:inference_module}, an initial state and final state need to be defined:
\begin{gather*}
@@ -117,7 +117,7 @@ Therefore, in addition to the rules shown in Figure~\ref{fig:inference_module},
\subsection{Memory Model}\label{sec:verilog:memory}
-The Verilog semantics do not define a memory model for Verilog, as this is not needed for a hardware description language. There is no preexisting architecture that Verilog will produce, it can describe any memory layout that is needed. Instead of having specific semantics for memory, the semantics only needs to support the language features that can produce these different memory layouts, these being Verilog arrays. We therefore define semantics for updating Verilog arrays using blocking and nonblocking assignment. We then have to prove that the C memory model that \compcert{} uses matches with the interpretation of arrays that are used in Verilog. The \compcert{} memory model is infinite, whereas our representation of arrays in Verilog is inherently finite. There have already been various efforts to define a finite memory model for all compiler passes in \compcert{}, such as Comp\-Cert\-S~\cite{besson18_compc}, Comp\-Cert\-ELF~\cite{wang20_compc} and Comp\-Cert\-TSO~\cite{sevcik13_compc}, however, we define the translation from \compcert{}'s standard infinite memory model to finitely sized arrays that can be represented in Verilog, leaving the compiler passes intact.
+The Verilog semantics does not define a memory model, as this is not needed for a hardware description language. There is no preexisting architecture that Verilog will produce; it can describe any memory layout that is needed. Instead of having specific semantics for memory, the semantics only needs to support the language features that can produce these different memory layouts, namely Verilog arrays. We therefore define semantics for updating Verilog arrays using blocking and nonblocking assignment. We then have to prove that the C memory model that \compcert{} uses matches the interpretation of arrays used in Verilog. The \compcert{} memory model is infinite, whereas our representation of arrays in Verilog is inherently finite. There have already been various efforts to define a finite memory model for all compiler passes in \compcert{}, such as Comp\-Cert\-S~\cite{besson18_compc}, Comp\-Cert\-ELF~\cite{wang20_compc} and Comp\-Cert\-TSO~\cite{sevcik13_compc}; however, we define the translation from \compcert{}'s standard infinite memory model to finitely sized arrays that can be represented in Verilog, leaving the compiler passes intact. \JW{I'm not quite sure I understand. Let me check: Are you saying that previous work has shown how all the existing CompCert passes can be adapted from an infinite to a finite memory model, but what we're doing is leaving the default (infinite) memory model for the CompCert front end, and just converting from an infinite memory model to a finite memory model when we go from 3AC to HTL?}
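For instance, the stack of the main function might be rendered as a finite Verilog array (a hypothetical fragment; \texttt{STACK\_SIZE}, \texttt{waddr} and \texttt{wdata} are illustrative names):

parameter STACK_SIZE = 32;        // fixed when the function is compiled
reg [31:0] stk [0:STACK_SIZE-1];  // finite word array in place of an infinite memory
always @(posedge clk) stk[waddr] <= wdata; // stores become nonblocking array assignments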
\begin{figure}
\centering
@@ -193,7 +193,7 @@ The Verilog semantics do not define a memory model for Verilog, as this is not n
%\JW{It's not completely clear what the relationship is between your work and those works. The use of `only' suggests that you've re-done a subset of work that has already been done -- is that the right impression?}\YH{Hopefully that's more clear.}
-This translation is represented in Figure~\ref{fig:memory_model_transl}, where \compcert{} defines a map from blocks to maps from memory address to memory contents. Each block represents an area in memory, for example, a block can represent a global variable or a stack for a function. Instead, our Verilog semantics define two finitely sized arrays of optional values, one for the blocking assignments map $\Gamma_{\rm a}$ and one for the nonblocking assignments map $\Delta_{\rm a}$. The optional values are present to ensure correct merging of the two association maps at the end of the clock cycle. During our translation we only convert block 0 to a Verilog memory, and ensure that it is the only block that is present. This means that the block necessarily represents the stack of the main function. The invariant that then has to hold in the proofs, is that block 0 should be equivalent to the merged representation of the $\Gamma_{\rm a}$ and $\Delta_{\rm a}$ maps.
+This translation is represented in Figure~\ref{fig:memory_model_transl}. \compcert{} defines a map from blocks to maps from memory addresses to memory contents. Each block represents an area in memory; for example, a block can represent a global variable or a stack for a function. \JW{Maybe mention here that block 0 is always the stack for the main function.} \JW{So the stack frame for a function called by main would be in a different block, is that the idea? Seems unusual not to have a single stack.} \JW{Would global variables normally be put in blocks 1, 2, etc.?} Meanwhile, our Verilog semantics defines two finite arrays of optional values, one for the blocking assignments map $\Gamma_{\rm a}$ and one for the nonblocking assignments map $\Delta_{\rm a}$. \JW{It's a slight shame that `block' is used in two different senses in the preceding two sentences. I guess that can't be helped.} The optional values are present to ensure correct merging of the two association maps at the end of the clock cycle. During our translation we only convert block 0 to a Verilog memory, and ensure that it is the only block that is present. This means that the block necessarily represents the stack of the main function. The invariant that then has to hold in the proofs is that block 0 should be equivalent to the merged representation of the $\Gamma_{\rm a}$ and $\Delta_{\rm a}$ maps.
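Informally (a sketch of the idea, not the exact Coq statement), for a stack of $n$ words the invariant might be rendered as:
\begin{equation*}
\forall i.\ 0 \le i < n \implies M(0, 4i) = (\Gamma_{\rm a}\ //\ \Delta_{\rm a})[i]
\end{equation*}
where $M(0, 4i)$ is the word stored in \compcert{}'s block 0 at byte offset $4i$, and the right-hand side is the merged view of the blocking and nonblocking array maps at index $i$.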
%However, in practice, assigning and reading from an array directly in the state machine will not produce a memory in the final hardware design, as the synthesis tool cannot identify the array as having the necessary properties that a RAM needs, even though this is the most natural formulation of memory. Even though theoretically the memory will only be read from once per clock cycle, the synthesis tool cannot ensure that this is true, and will instead create a register for each memory location. This increases the size of the circuit dramatically, as the RAM on the FPGA chip will not be reused. Instead, the synthesis tool expects a specific interface that ensures these properties, and will then transform the interface into a proper RAM during synthesis. Therefore, a translation has to be performed from the naive use of memory in the state machine, to a proper use of a memory interface.