verilog.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116

\section{Verilog}\label{sec:verilog}

\newcommand{\alwaysblock}{\texttt{always}-block}

This section describes the Verilog semantics that were \JW{was} chosen for the target language, including the changes that were made to the semantics to make it \JWcouldcut{a better fit} a \JW{suitable} HLS target.  The Verilog standard is quite large~\cite{06_ieee_stand_veril_hardw_descr_languag,05_ieee_stand_veril_regis_trans_level_synth}, but the syntax and semantics can be reduced to a small subset that \vericert{} needs to target.
\NR{Have we discussed what our options were and why we chose the HOL4 semantics?} \JW{Good point -- we should cite the Maude semantics too. I think that's the only viable alternative.}\YH{Yes, I should make that more clear, I cited Meredith semantics in the algorithm section, will add maude as well, the main reason is practicality and we don't need continuous assignment, basically the same reasons that Loow used to justify creating his own semantics.}

The Verilog semantics \JW{we use is ported to Coq from} a semantics written in HOL4  by \citet{loow19_verif_compil_verif_proces}.% which was used to create a formal translation from a logic representation encoded in the HOL4~\cite{slind08_brief_overv_hol4} theorem prover into an equivalent Verilog design. 
This semantics is quite practical as it is restricted to a small subset of Verilog, which can nonetheless be used to model all hardware constructs one would want to design.  The main features that are not supported by the syntax and semantics are continuous assignment and combinational always blocks.
\NR{Shall we use special font fo always-blocks: maybe \alwaysblock{}}

The semantics of Verilog differs from regular programming languages, as it is used to describe hardware directly, which is inherently parallel, rather than an algorithm, which is usually sequential.  The main construct in Verilog is the always block \JW{consider `always-block' as it's easier to parse}. \JWcouldcut{which is a construct that contains statements.}
A module can contain multiple always blocks, all of which run in parallel.  These always blocks further contain statements such as if-statements or assignments to variables.  Each always block also contains a list of events at which it should trigger, which could either contain signals that are assigned to other signals in that always block, or a different signal such as a clock which will trigger the always block periodically, only the latter are actually supported in our target semantics. \JW{That sentence is a bit wordy. Can you just say something like: `We support only \emph{synchronous} logic, which means that the always block is triggered on (and only on) the rising edge of a clock signal.'?}  
\NR{We should mention that variables cannot be driven by multiple \alwaysblock{}s, since one might get confused with data races when relating to concurrent processes in software.} \JW{Given the recent discussion on Teams, it seems to me that we perhaps don't need to mention here what happens if a variable is driven multiple times per clock cycle, especially since Vericert isn't ever going to do that.}

As hardware designs normally describe events that will be executed periodically for an infinite amount of time, the top-level of the semantics is best described using small-step semantics, whereas the execution of one small step is described using big-step semantics. 
\JW{I reckon that can be made clearer by talking about clock cycles. Maybe something like: ``The semantics combines the big-step and small-step styles. The overall execution of the hardware is described using a small-step semantics, with one small step per clock cycle; this is appropriate because hardware is routinely designed to run for an unlimited number of clock cycles and the big-step style is ill-suited to describing infinite executions. Then, within each clock cycle, a big-step semantics is used to execute all the statements.'' }
An example of a rule \JWcouldcut{in the semantics} for executing an always block is shown below, where $\Sigma$ is the state of the registers in the module and $s$ is the statement inside the always block:

\begin{equation*}
  \inferrule[Always]{(\Sigma, s)\downarrow_{\text{stmnt}} \Sigma'}{(\Sigma, \yhkeyword{always @(posedge clk) } s) \downarrow_{\text{always}} \Sigma'}
\end{equation*}

\noindent which shows that assuming the statement $s$ in the always block runs with state $\Sigma$ and produces the new state $\Sigma'$, the always block will result in the same final state.  As only clocked always blocks are supported, and one step in the semantics correspond to one clock cycle, it means that this rule is run once per clock cycle \NRcouldcut{which is what it is defined to do}.
\NR{The mention about small steps being one cycle and 'only clocked/synchronous \alwaysblock{} are supported' can move earlier.} \JW{Actually couldn't this rule be run multiple times per clock cycle in the case where you have multiple always blocks in your module all triggered on the same rising edge?}

Two types of assignments are supported in always blocks: nonblocking and blocking assignment.  Nonblocking assignment modifies the signal at the end of the timestep \JW{Assuming `timestep' and `clock cycle' are the same, I suggest we stick with `clock cycle' everywhere.} and atomically.  Blocking assignment, on the other hand, assigns the variable \NRreplace{directly}{instantaneously?} in the always block for later signals to pick up.  To model both these assignments, the state $\Sigma$ has to be split into two \NRreplace{parts}{sets?}: $\Gamma$, containing the current values of all variables, and $\Delta$, containing the values that will be assigned to the variables at the end of the clock cycle, we can therefore say that $\Sigma = (\Gamma, \Delta)$.~\NR{Can we say that $\Gamma$ contains instantaneous (ephemeral) updates for the current cycle and $\Delta$ contains periodical updates for the next cycle? Will be good to distinguish those two updates with the same terms across the paper, which can tie in with small-step and big-step distinction maybe asynchronous vs synchronous updates. }\YH{Hmm, so I don't think nonblocking and blocking ties in with big-step and smallstep, but yes it would be better to use the same terms throughoug the paper}  The nonblocking assignment can therefore be expressed as the following:

\begin{equation*}
  \inferrule[Nonblocking Reg]{\yhkeyword{name}\ d = \yhkeyword{OK}\ n \\ (\Gamma, e) \downarrow_{\text{expr}} v}{((\Gamma, \Delta), d\ \yhkeyword{ <= } e) \downarrow_{\text{stmnt}} (\Gamma, \Delta [n \mapsto v])}\\
\end{equation*}

\noindent where assuming that $\downarrow_{\text{expr}}$ evaluates an expression $e$ to a value $v$, then the nonblocking assignment $d\ \yhkeyword{ <= } e$ will update the future state of the variable $d$ with value $v$.  Finally, at the end of the clock cycle, the registers need to be updated with their future values in $\Delta$, which can be shown by the rule in the semantics that runs a whole module, $\vec{m}$ is a list of variable declarations and always blocks, and the $\downarrow_{\vec{m}}$ evaluates these sequentially and obtains the final state $(\Gamma', \Delta')$. \JW{I'm a little torn. On the one hand, this is interesting stuff, and I almost want to suggest putting in the `Blocking Reg' rule alongside the one above, because seeing the rules side-by-side makes it really clear how they both work. But I'm conscious that talking about the distinction between non-blocking and blocking assignments is all a bit moot because Vericert never produces blocking assignments (right?). And this text is all just explaining Loow's work, not your own work. So, in short, I'm a bit torn. Probably best to keep it as-is for now, but bear this in mind as a candidate for chopping nearer the deadline. What do yout think?}


\begin{equation*}
  \inferrule[Module]{(\Gamma, \epsilon, \vec{m})\ \downarrow_{\text{module}} (\Gamma', \Delta')}{(\Gamma, \yhkeyword{module } \yhconstant{main} \yhkeyword{(...);}\ \vec{m}\ \yhkeyword{endmodule}) \downarrow_{\text{program}} (\Gamma' // \Delta')}
\end{equation*}

\noindent This rule dictates how the whole module runs in one clock cycle, from an initial state of all the variables $\Gamma$, to a final state $(\Gamma' // \Delta')$, where $//$ defines a merge between the two maps from variable names to values, where the right hand side values take priority if there is a conflict. \JW{Is $//$ a standard notation for this? If so, that's fine. It's just I haven't come across it before.}\YH{It is not, should I use different notation? Is there standard notation?} This rule correctly implements the behaviours for the blocking and nonblocking assignment by overwriting the assignments made through blocking assignment at the end of the clock cycle with the atomic assignments from the nonblocking assignments.

\JW{I think most of the following paragraph could be chopped. We've already mentioned that we're only targeting a small fragment of Verilog, so we could mention at that point that in the fragment we target there is no discrepancy between the simulation behaviour and the synthesis behaviour. I don't know how important it is to talk about the details of evaluation order here -- does the reader really need to be aware of this in order to understand your work?}
When targeting a hardware description language such as Verilog, it is important to be consistent between simulating the hardware and the behaviour of the synthesised result on actual hardware.  In the target Verilog semantics, only clocked always blocks are supported as these ensure that there are not mismatches between simulation and synthesis, as combinational always blocks that trigger on any change of an internal signal may behave differently in simulation or synthesis based on the order of operations.  This limitation in the semantics therefore helps keep the Verilog correct and consistent.  In addition to that, only nonblocking assignment is used in signals that are used in multiple always blocks, which stops any race conditions from occurring as all the signal updates happen deterministically. \NR{So the semantics allows multiple drivers then?}  Finally, a specific order of evaluation for the always blocks is chosen, and because of the limitations listed before, choosing an order is guaranteed to have the same behaviour as executing the always blocks in any order.\NR{Order is not guaranteed if we have multiple drivers.}

\subsection{Changes to the Semantics}

Some small changes were made to the semantics proposed by \citet{loow19_verif_compil_verif_proces} to make it suitable as a HLS target.

The main change is the addition of support for two-dimensional arrays, which are needed to model RAM in Verilog.  RAM is needed to model the stack in C efficiently, without having to declare a variable for each possible stack location.  In the original semantics, RAMs, as well as inputs and outputs to the module could be modelled using a function from variable names (strings) to values, which could be modified accordingly to model inputs to the module.  This is quite an abstract description of memory, which can also be expressed as an array of bitvectors, which is the path we took instead.  This requires the addition of array operators to the semantics and correct reasoning of loads and stores to the array in different always blocks simultaneously.  Consider the following Verilog code:

\begin{minted}{verilog}
always @(posedge clk) begin
  x[0] = 23;
  x[1] <= 1;
end
\end{minted}

The code above modifies one array value using nonblocking assignment and it then modifies a second entry in the array with nonblocking assignment.  If the existing semantics used for values was used to update the array, then during the merge of values for the array, the array \texttt{x} from the nonblocking association map would replace the array from the blocking association map.  This would replace \texttt{x[0]} with it's original value and therefore behave incorrectly.  Instead, updates to arrays need to be recorded for each index in the array, and merging of the blocking and nonblocking queue needs to take into account each index separately.

Explicit support for declaring inputs, outputs and internal variables was added to the semantics to make sure that the generated Verilog also contains the correct declarations.  This adds some guarantees to the generated Verilog and ensures that it synthesises and simulates correctly.

In addition to adding support for two-dimensional arrays, support for receiving external inputs was removed from the semantics for the case of simplicity, as these are not needed for an HLS target.  The main module in Verilog models the main function in C, and as the inputs to a C function shouldn't change during it's execution, there is no need to add support for external inputs for Verilog modules.  Finally, another simplification to the semantics that was made is to use 32 bit integers instead of arrays of booleans for the bitvector representation.  As the translation only currently has support for \texttt{int}, it is possible to simplify the semantics further and not have to handle bitvectors of an arbitrary size.

\subsection{Integrating the Semantics in \compcert{}'s Model}
\NR{The \textit{Verilog} semantics perhaps.}

\begin{figure*}
  \centering
  \begin{minipage}{1.0\linewidth}
    \begin{gather*}
      \inferrule[Step]{\Gamma\ !\ \textit{rst} = \yhconstant{Some } 0 \\ \Gamma\ !\ \textit{fin} = \yhconstant{Some } 0 \\ \Gamma\ !\ \sigma = \yhconstant{Some } v \\ (m, \Gamma)\ \downarrow_{\text{program}} \Gamma' \\ \Gamma'\ !\ \sigma = \yhconstant{Some } v'}{\yhconstant{State } \textit{sf }\ m\ v\ \Gamma \longrightarrow \yhconstant{State } \textit{sf }\ m\ v'\ \Gamma'}\\
      %
      \inferrule[Finish]{\Gamma\ !\ \textit{fin} = \yhconstant{Some } 1 \\ \Gamma\ !\ \textit{ret} = \yhconstant{Some } r}{\yhconstant{State } \textit{sf }\ m\ \sigma\ \Gamma \longrightarrow \yhconstant{Returnstate } \textit{sf }\ r}\\
      %
      \inferrule[Call]{ }{\yhconstant{Callstate } \textit{sf }\ m\ \vec{r} \longrightarrow \yhconstant{State } \textit{sf }\ m\ n\ ((\yhfunction{init\_params}\ \vec{r}\ a)[\sigma \mapsto n, \textit{fin} \mapsto 0, \textit{rst} \mapsto 0])}\\
      %
      \inferrule[Return]{ }{\yhconstant{Returnstate } (\yhconstant{Stackframe } r\ m\ \textit{pc }\ \Gamma :: \textit{sf}) \longrightarrow \yhconstant{State } \textit{sf }\ m\ \textit{pc }\ (\Gamma [ \textit{st} \mapsto \textit{pc}, r \mapsto i ])\ \epsilon}
    \end{gather*}
  \end{minipage}
  \caption{Inferrence rules for modules}%
  \label{fig:inferrence_module}
\end{figure*}

\compcert{} defines a specific model of computation which all semantics have to follow in order to prove properties about them.  \compcert{} has three main states that need to be defined:

\begin{description}
  \item[\texttt{State} \textit{sf} $m$ $v$ $\Gamma_{r}$ $\Gamma_{a}$] The main state when executing a function, with stack frame \textit{sf}, current module $m$, current state $v$ and variable states $\Gamma_{r}$ and $\Gamma_{a}$\NR{does r and a mean register and array?}.
  \item[\texttt{Callstate} \textit{sf} $m$ $\vec{r}$] The state that is reached when a function is called, with the current stack frame \textit{sf}, current module $m$ and arguments $\vec{r}$.
  \item[\texttt{Returnstate} \textit{sf} $r$] The state that is reached when a function returns back to the caller, with stack frame \textit{sf} and return value $r$.
\end{description}

However, as Verilog behaves quite differently to software programming languages, this model does not match the Verilog semantics.  Instead, the module definition in Verilog needs to be enriched to support this model of computation, by adding required signals to the inputs and outputs of modules.

\begin{description}
  \item[program counter] First of all, the program counter needs to be modelled, which can be done using a register which keeps track of state, denoted as $\sigma$.
  \item[function entry point] When a function is called, the entry point denotes the first instruction that will be executed, which can be modelled using a reset signal that sets the state accordingly, denoted as $rst$.
  \item[return value] The return value can be modelled by setting a finished flag to one when the result is ready, and putting the result into a 32-bit output register, denoted as \textit{fin} and \textit{rtrn} respectively.
  \item[stack] The function stack can be modelled as a RAM using a two-dimensional array in the module, denoted as \textit{stk}.
\end{description}

Figure~\ref{fig:inferrence_module}\NR{This ref points to Figure 14 instead of Figure 5!} shows the inference rules that convert from one state to another.  The first rule \textsc{Step} is the normal rule of execution.  Assuming that the module is not being reset, and that the finish state has not been reached yet and that the current and next state are $v$ and $v'$, and finally that the module runs from state $\Gamma$ to $\Gamma'$ using the \textsc{Module} rule, we can then define one step in the \texttt{State}.  The \textsc{Finish} rule is the rule that returns the final value of running the module and is applied when the \textit{fin} register is set.  The return value is then taken from the \textit{ret} register.


The first thing to note about the \textsc{Call} rule is that there is no step from \texttt{State} to \texttt{Callstate}, as function calls are not supported, and it is therefore impossible in our semantics to ever reach a \texttt{Callstate} except for the initial call to main.  The same can be said about the \textsc{Return} state rule, which will only be matched for the final return value from the main function, as there is no rule that allocates another stack frame \textit{sf} except for the initial call to main.  Therefore, in addition to the rules shown in Figure~\ref{fig:inferrence_module}, an initial state and final state need to be defined.

\begin{align*}
  \texttt{initial:}\quad &\forall P,\ \yhconstant{Callstate } []\ P.\texttt{main } []\\
  \texttt{final:}\quad &\forall n,\ \yhconstant{Returnstate } []\ n \implies n
\end{align*}

\noindent where the initial state is defined as the \texttt{Callstate} with an empty stack frame and no arguments for the \texttt{main} function of program $P$, and the final state results in the program output of value $n$ when reaching a \texttt{Returnstate} with an empty stack frame.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: