verilog.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136

\section{A Formal Semantics for Verilog}\label{sec:verilog}

\newcommand{\alwaysblock}{always-block}

This section describes the Verilog semantics that was chosen for the target language, including the changes that were made to the semantics to make it a suitable HLS target.  The Verilog standard is quite large~\cite{06_ieee_stand_veril_hardw_descr_languag,05_ieee_stand_veril_regis_trans_level_synth}, but the syntax and semantics can be reduced to a small subset that \vericert{} needs to target.

The Verilog semantics we use is ported to Coq from a semantics written in HOL4 by \citet{loow19_proof_trans_veril_devel_hol} and used to prove the translation from HOL4 to Verilog~\cite{loow19_verif_compil_verif_proces}. % which was used to create a formal translation from a logic representation encoded in the HOL4~\cite{slind08_brief_overv_hol4} theorem prover into an equivalent Verilog design. 
This semantics is quite practical as it is restricted to a small subset of Verilog, which can nonetheless be used to model the hardware constructs required for HLS.  The main features that are excluded are continuous assignment and combinational \alwaysblock{}s; these are modelled in other semantics such as that by~\citet{meredith10_veril}. %however, these are not necessarily needed, but require more complicated event queues and execution model.

The semantics of Verilog differs from regular programming languages, as it is used to describe hardware directly, which is inherently parallel, rather than an algorithm, which is usually sequential.  The main construct in Verilog is the \alwaysblock{}.
A module can contain multiple \alwaysblock{}s, all of which run in parallel.  These \alwaysblock{}s further contain statements such as if-statements or assignments to variables.  We support only \emph{synchronous} logic, which means that the \alwaysblock{} is triggered on (and only on) the rising edge of a clock signal.
%\NR{We should mention that variables cannot be driven by multiple \alwaysblock{}s, since one might get confused with data races when relating to concurrent processes in software.} \JW{Given the recent discussion on Teams, it seems to me that we perhaps don't need to mention here what happens if a variable is driven multiple times per clock cycle, especially since Vericert isn't ever going to do that.}

The semantics combines the big-step and small-step styles. The overall execution of the hardware is described using a small-step semantics, with one small step per clock cycle; this is appropriate because hardware is routinely designed to run for an unlimited number of clock cycles and the big-step style is ill-suited to describing infinite executions. Then, within each clock cycle, a big-step semantics is used to execute all the statements.
An example of a rule for executing an \alwaysblock{} is shown below, where $\Sigma$ is the state of the registers in the module and $s$ is the statement inside the \alwaysblock{}:

\begin{equation*}
  \inferrule[Always]{(\Sigma, s)\downarrow_{\text{stmnt}} \Sigma'}{(\Sigma, \yhkeyword{always @(posedge clk) } s) \downarrow_{\text{always}} \Sigma'}
\end{equation*}

\noindent This rule says that assuming the statement $s$ in the \alwaysblock{} runs with state $\Sigma$ and produces the new state $\Sigma'$, the \alwaysblock{} will result in the same final state.  %Since only clocked \alwaysblock{} are supported, and one step in the semantics correspond to one clock cycle, it means that this rule is run once per clock cycle for each \alwaysblock{}.

Two types of assignments are supported in \alwaysblock{}s: nonblocking and blocking assignment.  Nonblocking assignments all take effect simultaneously at the end of the clock cycle, %and atomically.
while blocking assignments happen instantly so that later assignments in the clock cycle can pick them up.  To model both of these assignments, the state $\Sigma$ has to be split into two maps: $\Gamma$, which contains the current values of all variables, and $\Delta$, which contains the values that will be assigned at the end of the clock cycle. %, we can therefore say that $\Sigma = (\Gamma, \Delta)$. 
Nonblocking assignment can therefore be expressed as follows:
\begin{equation*}
  \inferrule[Nonblocking Reg]{\yhkeyword{name}\ d = \yhkeyword{OK}\ n \\ (\Gamma, e) \downarrow_{\text{expr}} v}{((\Gamma, \Delta), d\ \yhkeyword{ <= } e) \downarrow_{\text{stmnt}} (\Gamma, \Delta [n \mapsto v])}\\
\end{equation*}

\noindent where assuming that $\downarrow_{\text{expr}}$ evaluates an expression $e$ to a value $v$, the nonblocking assignment $d\ \yhkeyword{ <= } e$ updates the future state of the variable $d$ with value $v$.

Finally, the following rule dictates how the whole module runs in one clock cycle:
\begin{equation*}
  \inferrule[Module]{(\Gamma, \epsilon, \vec{m})\ \downarrow_{\text{module}} (\Gamma', \Delta')}{(\Gamma, \yhkeyword{module } \yhconstant{main} \yhkeyword{(...);}\ \vec{m}\ \yhkeyword{endmodule}) \downarrow_{\text{program}} (\Gamma' // \Delta')}
\end{equation*}
where $\Gamma$ is the initial state of all the variables, and $\vec{m}$ is a list of variable declarations and \alwaysblock{}s that $\downarrow_{\text{module}}$ evaluates sequentially to obtain $(\Gamma', \Delta')$. The final state is obtained by merging these maps using the $//$ operator, which gives priority to the right-hand operand in a conflict. This rule ensures that the nonblocking assignments overwrite at the end of the clock cycle any blocking assignments made during the cycle.

\subsection{Changes to the Semantics}

Four changes were made to the semantics proposed by \citet{loow19_proof_trans_veril_devel_hol} to make it suitable as a HLS target.

\paragraph{Adding array support}
The main change is the addition of support for arrays, which are needed to model RAM in Verilog.  RAM is needed to model the stack in C efficiently, without having to declare a variable for each possible stack location. % In the original semantics, RAMs (as well as inputs and outputs to the module) could be modelled using a function from variable names (strings) to values, which could be modified accordingly to model inputs to the module.  This is quite an abstract description of memory and can also be expressed as an array of bitvectors instead, which is the path we took. This requires the addition of array operators to the semantics and correct reasoning of loads and stores to the array in different \alwaysblock{}s simultaneously.  
Consider the following Verilog code:
\begin{center}
\begin{minted}{verilog}
reg [31:0] x[1:0];
always @(posedge clk) begin
  x[0] = 1;
  x[1] <= 1;
end
\end{minted}
\end{center}
which modifies one array element using blocking assignment and then a second using nonblocking assignment. If the existing semantics were used to update the array, then during the merge, the entire array \texttt{x} from the nonblocking association map would replace the entire array from the blocking association map.  This would replace \texttt{x[0]} with its original value and therefore behave incorrectly. Accordingly, we modified the maps so they record updates on a per-el\-em\-ent basis. Our state $\Gamma$ is therefore split up into $\Gamma_{r}$ for instantaneous updates to variables, and $\Gamma_{a}$ for instantaneous updates to arrays; $\Delta$ is split similarly. The merge function then ensures that only the modified indices get updated when $\Gamma_{a}$ is merged with the nonblocking map equivalent $\Delta_{a}$.

\paragraph{Adding declarations} Explicit support for declaring inputs, outputs and internal variables was added to the semantics to make sure that the generated Verilog also contains the correct declarations.  This adds some guarantees to the generated Verilog and ensures that it synthesises and simulates correctly.

\paragraph{Removing support for external inputs to modules} Support for receiving external inputs was removed from the semantics for simplicity, as these are not needed for an HLS target. The main module in Verilog models the main function in C, and since the inputs to a C function shouldn't change during its execution, there is no need for external inputs for Verilog modules.  

\paragraph{Simplifying representation of bitvectors} Finally, we use 32-bit integers to represent bitvectors rather of arrays of Booleans. This is because \vericert{} (currently) only supports types represented by 32 bits.

\subsection{Integrating the Verilog Semantics into \compcert{}'s Model}
\label{sec:verilog:integrating}

\begin{figure*}
  \centering
  \begin{minipage}{1.0\linewidth}
    \begin{gather*}
      \inferrule[Step]{\Gamma_r\ !\ \textit{rst} = \yhconstant{Some } 0 \\ \Gamma_r\ !\ \textit{fin} = \yhconstant{Some } 0 \\ \Gamma_r\ !\ \sigma = \yhconstant{Some } v \\ (m, (\Gamma_r, \Gamma_a))\ \downarrow_{\text{program}} (\Gamma_r', \Gamma_a') \\ \Gamma_r'\ !\ \sigma = \yhconstant{Some } v'}{\yhconstant{State } \textit{sf }\ m\ v\ \Gamma_r\ \Gamma_a \longrightarrow \yhconstant{State } \textit{sf }\ m\ v'\ \Gamma_r'\ \Gamma_a'}\\
      %
      \inferrule[Finish]{\Gamma_r\ !\ \textit{fin} = \yhconstant{Some } 1 \\ \Gamma_r\ !\ \textit{ret} = \yhconstant{Some } r}{\yhconstant{State } \textit{sf }\ m\ \sigma\ \Gamma_r\ \Gamma_a \longrightarrow \yhconstant{Returnstate } \textit{sf }\ r}\\
      %
      \inferrule[Call]{ }{\yhconstant{Callstate } \textit{sf }\ m\ \vec{r} \longrightarrow \yhconstant{State } \textit{sf }\ m\ n\ ((\yhfunction{init\_params}\ \vec{r}\ a)[\sigma \mapsto n, \textit{fin} \mapsto 0, \textit{rst} \mapsto 0])\ \epsilon}\\
      %
      \inferrule[Return]{ }{\yhconstant{Returnstate } (\yhconstant{Stackframe } r\ m\ \textit{pc }\ \Gamma_r\ \Gamma_a :: \textit{sf})\ v \longrightarrow \yhconstant{State } \textit{sf }\ m\ \textit{pc }\ (\Gamma_{r} [ \textit{st} \mapsto \textit{pc}, r \mapsto v ])\ \Gamma_{a}}
    \end{gather*}
  \end{minipage}
  \caption{Top-level small-step semantics for Verilog modules in \compcert{}'s computational framework.}%
  \label{fig:inferrence_module}
\end{figure*}

The \compcert{} computation model defines a set of states through which execution passes. In this subsection, we explain how we extend our Verilog semantics with five special-purpose  registers in order to integrate it into \compcert{}.

\compcert{} executions pass through three main states:
\begin{description}
  \item[\texttt{State} \textit{sf} $m$ $v$ $\Gamma_{r}$ $\Gamma_{a}$] The main state when executing a function, with stack frame \textit{sf}, current module $m$, current state $v$ and variable states $\Gamma_{r}$ and $\Gamma_{a}$.
  \item[\texttt{Callstate} \textit{sf} $m$ $\vec{r}$] The state that is reached when a function is called, with the current stack frame \textit{sf}, current module $m$ and arguments $\vec{r}$.
  \item[\texttt{Returnstate} \textit{sf} $v$] The state that is reached when a function returns back to the caller, with stack frame \textit{sf} and return value $v$.
\end{description}

To support this computational model, we extend the Verilog module we generate with the following five registers and a RAM block:

\begin{description}
  \item[program counter] The program counter can be modelled using a register that keeps track of the state, denoted as $\sigma$.
  \item[function entry point] When a function is called, the entry point denotes the first instruction that will be executed. This can be modelled using a reset signal that sets the state accordingly, denoted as \textit{rst}.
  \item[return value] The return value can be modelled by setting a finished flag to 1 when the result is ready, and putting the result into a 32-bit output register. These are denoted as \textit{fin} and \textit{rtrn} respectively.
  \item[stack] The function stack can be modelled as a RAM block, which is implemented using an array in the module, and denoted as \textit{stk}.
\end{description}

Figure~\ref{fig:inferrence_module} shows the inference rules for moving between the computational states.  The first, \textsc{Step}, is the normal rule of execution.  It defines one step in the \texttt{State} state, assuming that the module is not being reset, that the finish state has not been reached yet, that the current and next state are $v$ and $v'$, and that the module runs from state $\Gamma$ to $\Gamma'$ using the \textsc{Step} rule.  The \textsc{Finish} rule returns the final value of running the module and is applied when the \textit{fin} register is set; the return value is then taken from the \textit{ret} register.

Note that there is no step from \texttt{State} to \texttt{Callstate}; this is because function calls are not supported, and it is therefore impossible in our semantics to ever reach a \texttt{Callstate} except for the initial call to main. So the \textsc{Call} rule is only used at the very beginning of execution; likewise, the \textsc{Return} rule is only matched for the final return value from the main function. %as there is no rule that allocates another stack frame \textit{sf} except for the initial call to main.  
Therefore, in addition to the rules shown in Figure~\ref{fig:inferrence_module}, an initial state and final state need to be defined:

\begin{gather*}
  \inferrule[Initial]{\yhfunction{is\_internal}\ (P.\texttt{main})}{\yhfunction{initial\_state}\ (\yhconstant{Callstate } []\ (P.\texttt{main})\ [])}\qquad
  \inferrule[Final]{ }{\yhfunction{final\_state}\ (\yhconstant{Returnstate } []\ n)\ n}
\end{gather*}

\noindent where the initial state is the \texttt{Callstate} with an empty stack frame and no arguments for the \texttt{main} function of program $P$, where this \texttt{main} function needs to be in the current translation unit.  The final state results in the program output of value $n$ when reaching a \texttt{Returnstate} with an empty stack frame.

\subsection{Memory Model}

\begin{figure}
  \centering
  \begin{subfigure}[t]{0.48\linewidth}
    \includegraphics[width=\linewidth]{diagrams/store_waveform.pdf}
    \caption{Store waveform.}
  \end{subfigure}\hfill%
  \begin{subfigure}[t]{0.48\linewidth}
    \includegraphics[width=\linewidth]{diagrams/load_waveform.pdf}
    \caption{Load waveform.}
  \end{subfigure}
\end{figure}

\begin{figure}
  \centering
  \includegraphics[width=0.5\linewidth]{diagrams/memory_model.pdf}
  \caption{Change in the memory model during the translation of 3AC to HTL.}
\end{figure}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% TeX-command-extra-options: "-shell-escape"
%%% End: