\section{Proof}\label{sec:proof}

Now that the Verilog semantics have been adapted to the CompCert model, we are in a position to formally prove the correctness of our C-to-Verilog compilation.  This section describes the main correctness theorem that was proven and the main ideas behind the proof. The full Coq proof is available in auxiliary material.

The main correctness theorem is analogous to that stated in \compcert{}~\cite{leroy09_formal_verif_realis_compil}: for all Clight source programs $C$, if the translation to the target Verilog code succeeds, and $C$ has safe observable behaviour $B$ when executed, then the target Verilog code will have the same behaviour $B$. Here, a `safe' execution is one that either converges or diverges, but does not ``go wrong''. If the program does admit some wrong behaviour (like undefined behaviour in C), the correctness theorem does not apply. A behaviour, then, is either a final state (in the case of convergence) or divergence. In \compcert{}, a behaviour is also associated with a trace of I/O events, but since external function calls are not supported in \vericert{}, this trace will always be empty for us. Note that the compiler is allowed to fail and not produce any output; the correctness theorem only applies when the translation succeeds.

%The following `backwards simulation' theorem describes the correctness theorem, where $\Downarrow$ stands for simulation and execution respectively.

\begin{theorem}
  For any safe behaviour $B$, whenever the translation from $C$ succeeds and produces Verilog $V$, then $V$ has behaviour $B$ only if $C$ has behaviour $B$.
  \begin{equation*}
    \forall C, V, B \in \texttt{Safe},\, \yhfunction{HLS} (C) = \yhconstant{OK} (V) \land V \Downarrow B \implies C \Downarrow B.
  \end{equation*}
\end{theorem}

The theorem is a `backwards simulation' result (from target to source). The theorem does not demand the `if' direction too, because compilers are permitted to resolve any non-determinism present in their source programs.

In practice, Clight programs are all deterministic, as are the Verilog programs in the fragment we consider. This means that we can prove the correctness theorem above by first inverting it to become a forwards simulation result, following standard \compcert{} practice.

To prove this forward simulation, it suffices to prove a forward simulation between each pair of consecutive intermediate languages, as these results can be composed to prove the correctness of the whole HLS tool.
The forward simulation from 3AC to HTL is stated in Lemma~\ref{lemma:htl} (Section~\ref{sec:proof:3ac_htl}), and the forward simulation from HTL to Verilog in Lemma~\ref{lemma:verilog} (Section~\ref{sec:proof:htl_verilog}). Finally, the proof that Verilog is deterministic is given in Lemma~\ref{lemma:deterministic} (Section~\ref{sec:proof:deterministic}).
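
\noindent As a sketch of how these two simulation lemmas chain together (a restatement of the lemmas referenced above, not an additional result), take a 3AC program $c$ with $\yhfunction{tr\_htl}(c) = \yhconstant{OK}(h)$ and $\yhfunction{tr\_verilog}(h) = V$. For any safe behaviour $B$,
\begin{align*}
  c \Downarrow B &\implies h \Downarrow B && \text{(Lemma~\ref{lemma:htl})}\\
                 &\implies V \Downarrow B && \text{(Lemma~\ref{lemma:verilog})}.
\end{align*}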

\subsection{Forward simulation from 3AC to HTL}\label{sec:proof:3ac_htl}

HTL is quite far removed from 3AC, so this first translation is the most involved and requires the largest proof: it is in this step that the translation from 3AC instructions to Verilog statements must be proven correct.  The semantics of HTL also differ substantially from the 3AC semantics: rather than defining small-step semantics for each construct in Verilog, they are defined over one clock cycle and mirror the semantics defined for Verilog.  Lemma~\ref{lemma:htl} shows the result that needs to be proven in this subsection.

\begin{lemma}[Forward simulation from 3AC to HTL]\label{lemma:htl}
  We write \texttt{tr\_htl} for the translation from 3AC to HTL.
  \begin{equation*}
    \forall c, h, B \in \texttt{Safe}, \yhfunction{tr\_htl} (c) = \yhconstant{OK} (h) \land c \Downarrow B \implies h \Downarrow B.
  \end{equation*}
\end{lemma}

We prove this lemma by first establishing a specification of the translation function $\yhfunction{tr\_htl}$ that captures its important properties, and then splitting the proof into two parts: one to show that the translation function does indeed meet its specification, and one to show that the specification implies the desired simulation result. This strategy is in keeping with standard \compcert{} practice. % The forward simulation is then proven by showing that the initial states and final states between the 3AC semantics and HTL semantics match, and then showing that the simulation diagram in Lemma~\ref{lemma:simulation_diagram} holds.

\subsubsection{From Implementation to Specification}\label{sec:proof:3ac_htl:specification}

%To simplify the proof, instead of using the translation algorithm as an assumption, as was done in Lemma~\ref{lemma:htl}, a specification of the translation can be constructed instead which contains all the properties that are needed to prove the correctness.  For example, for the translation from 3AC to HTL, 
The specification for the translation of 3AC instructions into HTL data-path and control logic can be defined by the following predicate:
\begin{equation*}
  \yhfunction{spec\_instr } \textit{fin rtrn }\ \sigma\ \textit{stk }\ i\ \textit{data }\ \textit{control}
\end{equation*}

\noindent Here, the \textit{control} and \textit{data} parameters are the statements that the current 3AC instruction $i$ should translate to. The other parameters are the special registers defined in Section~\ref{sec:verilog:integrating}. An example of a rule describing the translation of an arithmetic/logical operation from 3AC is the following:
\begin{equation*}
  \inferrule[Iop]{\yhfunction{tr\_op } \textit{op }\ \vec{a} = \yhconstant{OK } e}{\yhfunction{spec\_instr } \textit{fin rtrn }\ \sigma\ \textit{stk }\ (\yhconstant{Iop } \textit{op }\ \vec{a}\ d\ n)\ (d\ \yhkeyword{<=}\ e)\ (\sigma\ \yhkeyword{<=}\ n)}
\end{equation*}

\noindent Assuming that the translation of the operator \textit{op} with operands $\vec{a}$ is successful and results in expression $e$, the rule describes how the destination register $d$ is updated to $e$ via a non-blocking assignment in the data path, and how the program counter $\sigma$ is updated to point to the next CFG node $n$ via another non-blocking assignment in the control path.
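
\noindent As an illustrative instance of this rule, consider a 3AC addition of registers $r_1$ and $r_2$ into $r_3$ with successor node $n$. The operator name $\yhconstant{Oadd}$ and the assumption that $\yhfunction{tr\_op}$ maps it to the Verilog expression $r_1 + r_2$ are chosen here for illustration only:
\begin{equation*}
  \inferrule[Iop]{\yhfunction{tr\_op } \yhconstant{Oadd }\ (r_1, r_2) = \yhconstant{OK } (r_1 + r_2)}{\yhfunction{spec\_instr } \textit{fin rtrn }\ \sigma\ \textit{stk }\ (\yhconstant{Iop } \yhconstant{Oadd }\ (r_1, r_2)\ r_3\ n)\ (r_3\ \yhkeyword{<=}\ r_1 + r_2)\ (\sigma\ \yhkeyword{<=}\ n)}
\end{equation*}

\noindent In this instance, the data path assigns $r_1 + r_2$ to $r_3$ and the control path moves the state register $\sigma$ to $n$, both via non-blocking assignments.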

In the following lemma, $\yhfunction{spec\_htl}$ is the top-level specification predicate, which is built using $\yhfunction{spec\_instr}$ at the level of instructions. 

\begin{lemma}\label{lemma:specification}
  If a 3AC program $c$ is successfully translated to an HTL program $h$, then the specification of the translation holds.
  \begin{equation*}
    \forall\ c\ h,\ \yhfunction{tr\_htl} (c) = \yhconstant{OK}(h) \implies \yhfunction{spec\_htl}\ c\ h.
  \end{equation*}
\end{lemma}

%\begin{proof}
%  Follows from the definition of the specification and therefore should match the implementation of the algorithm.
%\end{proof}

\subsubsection{From Specification to Simulation}

To prove that the specification predicate implies the desired forward simulation, we must first define a relation that matches each 3AC state to an equivalent HTL state.  This relation also captures the assumptions made about the 3AC code that we receive from \compcert{}.% so that these assumptions can be used to prove the translations correct.  
These assumptions then have to be proven to always hold assuming the HTL code was created by the translation algorithm.  Some of the assumptions that need to be made about the 3AC and HTL code for a pair of states to match are:

\begin{itemize}
  \item The 3AC register file $R$ needs to be `less defined' than the HTL register map $\Gamma_{r}$ (written $R \le \Gamma_{r}$). This means that all entries should be equal to each other, unless a value in $R$ is undefined, in which case any value can match it.
  \item The RAM values represented by each Verilog array in $\Gamma_{a}$ need to match the 3AC function's stack contents, which are part of the memory $M$; that is: $M \le \Gamma_{a}$.
  \item The state is well formed, which means that the value of the state register matches the current value of the program counter; that is: $\textit{pc} = \Gamma_{r}!\sigma$.
\end{itemize}
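
\noindent As a small illustration of the first condition (the concrete registers and values are hypothetical), suppose $R$ maps $r_1$ to $5$ and leaves $r_2$ undefined. Then
\begin{equation*}
  R \le \Gamma_{r} \quad\text{holds for}\quad \Gamma_{r} = \{r_1 \mapsto 5,\ r_2 \mapsto 42\}, \qquad\text{but not for}\quad \Gamma_{r} = \{r_1 \mapsto 6,\ r_2 \mapsto 42\},
\end{equation*}
since the undefined entry $r_2$ may be matched by any value, whereas the defined entry $r_1$ must agree exactly.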

We also define the following set $\mathcal{I}$ of invariants that must hold for the current state to be valid:

\begin{itemize}
  \item that all pointers in the program use the stack as a base pointer,
  \item that any loads or stores to locations outside of the bounds of the stack result in undefined behaviour (and hence we do not need to handle them),
  \item that \textit{rst} and \textit{fin} are not modified and therefore stay at a constant 0 throughout execution, and
  \item that the stack frames match.
\end{itemize}

We can now define the simulation diagram for the translation. The 3AC state can be represented by the tuple $(R,M,\textit{pc})$, which captures the register file, memory, and program counter. The HTL state can be represented by the pair $(\Gamma_{r}, \Gamma_{a})$, which captures the states of all the registers and arrays in the module.  Finally, $\mathcal{I}$ stands for the other invariants that need to hold for the states to match.

\begin{lemma}\label{lemma:simulation_diagram}
  Given the 3AC state $(R,M,\textit{pc})$ and the matching HTL state $(\Gamma_{r}, \Gamma_{a})$, if one step in the 3AC semantics produces state $(R',M',\textit{pc}')$, then there exist one or more steps in the HTL semantics that result in a matching state $(\Gamma_{r}', \Gamma_{a}')$.  This is all under the assumption that the specification $\yhfunction{spec\_htl}$ holds for the translation.

  \begin{center}
    \begin{tikzpicture}
      \begin{scope}
        \node[circle] (s1) at (0,1.5) {$R, M, \textit{pc}$};
        \node[circle] (r1) at (7.2,1.5) {$\Gamma_{r}, \Gamma_{a}$};
        \node[circle] (s2) at (0,0) {$R', M', \textit{pc}'$};
        \node[circle] (r2) at (7.2,0) {$\Gamma_{r}', \Gamma_{a}'$};
        %\node at (6.8,0.75) {+};
        \draw (s1) -- node[above] {$\mathcal{I} \land (R \le \Gamma_{r}) \land (M \le \Gamma_{a}) \land (\textit{pc} = \Gamma_{r}!\sigma)$} ++ (r1);
        \draw[-{Latex}] ($(s1.south) + (0,0.4)$) -- ($(s2.north) - (0,0.4)$);
        \draw[-{Latex},dashed] ($(r1.south) + (0,0.2)$) to[auto, pos=0.7] node {+} ($(r2.north) - (0,0.2)$);
        \draw[dashed] (s2) -- node[above] {$\mathcal{I} \land (R' \le \Gamma_{r}') \land (M' \le \Gamma_{a}') \land (\textit{pc}' = \Gamma_{r}'!\sigma)$} ++ (r2);
      \end{scope}
    \end{tikzpicture}
  \end{center}
\end{lemma}

\begin{proof}[Proof sketch]
  This simulation diagram is proven by induction over the operational semantics of 3AC, which allows us to find one or more steps in the HTL semantics that will produce the same final matching state.
\end{proof}

\subsection{Forward simulation from HTL to Verilog}\label{sec:proof:htl_verilog}

The HTL-to-Verilog simulation is conceptually simple, as the only transformation is from the map representation of the code to the case-statement representation.  The proof is more involved, however, as the semantics of a map structure are quite different from the semantics of the case-statement it is converted to.

%\YH{Maybe want to split this up into two lemmas?  One which states the proof about the map property of uniqueness of keys, and another proving the final theorem?}
\begin{lemma}[Forward simulation from HTL to Verilog]\label{lemma:verilog}
  We write $\yhfunction{tr\_verilog}$ for the translation from HTL to Verilog. (Note that this translation cannot fail, so we do not need the \yhconstant{OK} constructor here.)
  \begin{align*}
    &\forall h, V, B \in \texttt{Safe}, \yhfunction{tr\_verilog} (h) = V \land h \Downarrow B \implies V \Downarrow B.
  \end{align*}
\end{lemma}

\begin{proof}[Proof sketch]
  The translation from maps to case-statements turns each node of the tree into a case of the case-statement, keeping the statement at each node unchanged.  The main difficulty for the proof is that a random-access structure is transformed into an inductive structure, in which a certain number of constructors have to be traversed to reach the correct case.
  %\JW{I would chop from here.}\YH{Looks good to me.}
  %The proof of the translation from maps to case-statements follows by induction over the list of elements in the map and the fact that each key will be unique.  In addition to that, the statement that is currently being evaluated is guaranteed by the correctness of the list of elements to be in that list.  The latter fact therefore eliminates the base case, as an empty list does not contain the element we know is in the list.  The other two cases follow by the fact that either the key is equal to the evaluated value of the case expression, or it isn't.  In the first case we can then evaluate the statement and get the state after the case expression, as the uniqueness of the key tells us that the key cannot show up in the list anymore.  In the other case we can just apply the inductive hypothesis and remove the current case from the case statement, as it did not match.
\end{proof}
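
\noindent Schematically, and assuming a hypothetical two-node map with keys $n_1$, $n_2$ and statements $s_1$, $s_2$, the translation has the following shape:
\begin{equation*}
  \{ n_1 \mapsto s_1,\ n_2 \mapsto s_2 \} \quad\longmapsto\quad \yhkeyword{case}\ (\sigma)\ \ n_1 : s_1;\ \ n_2 : s_2;\ \ \yhkeyword{endcase}
\end{equation*}

\noindent When $\sigma = n_2$, the map lookup returns $s_2$ directly, whereas evaluating the case-statement has to step past the non-matching case $n_1$ before reaching $s_2$. The proof has to show that both routes select the same statement.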

One problem with our representation of the state as an actual register is that we have to make sure that the state does not overflow.  Currently, the state register always has 32 bits, meaning the maximum number of states supported is $2^{32}$.
%This means that during the translation we have to check that each state can fit into an integer. 
\vericert{} reports an error if the 3AC contains more than this many nodes, in which case the correctness theorem holds vacuously.

\subsection{Deterministic Semantics}\label{sec:proof:deterministic}

%Finally, to obtain the backward simulation that we want, it has to be shown that if we generate hardware with a specific behaviour, that it is the only possible program with that behaviour.  This only has to be performed for the final intermediate language, which is Verilog, so that the backward simulation holds for the whole chain from Clight to Verilog.
The final lemma we need is that the Verilog we generate is deterministic. This result allows us to replace the forwards simulation we have proved with the backwards simulation we desire.

\begin{lemma}\label{lemma:deterministic}
  If a Verilog program $V$ admits both behaviours $B_1$ and $B_2$, then $B_1$ and $B_2$ must be the same.

  \begin{equation*}
    \forall V, B_{1}, B_{2}, V \Downarrow B_{1} \land V \Downarrow B_{2} \implies B_{1} = B_{2}.
  \end{equation*}
\end{lemma}

\begin{proof}[Proof sketch]
  The Verilog semantics is deterministic because the order of operation of all the constructs is defined, and there is therefore only one way to evaluate the module and hence only one possible behaviour. This was proven for the small-step semantics shown in Figure~\ref{fig:inferrence_module}.
\end{proof}


\subsection{Coq Mechanisation}

\begin{table*}
  \centering
  \begin{tabular}{lrrrrr}
    \toprule
    & \textbf{Coq code} & \textbf{OCaml code} & \textbf{Specifications} & \textbf{Theorems \& Proofs} & \textbf{Total}\\
    \midrule
    {Data structures and libraries}     & 274  & --- & ---  & 654  & 928  \\
    {Integers and values}               & 98   & --- & 15   & 744  & 857  \\
    {HTL semantics}                     & ---  & --- & 174  & ---  & 174  \\
    {HTL generation}                    & 655  & --- & 79   & 3349 & 4083 \\
    {Verilog semantics}                 & ---  & --- & 739  & 174  & 913  \\
    {Verilog generation}                & 68   & --- & ---  & 396  & 464  \\
    {Top-level driver, pretty printers} & 89   & 747 & ---  & 209  & 1045 \\
    \midrule
    \textbf{Total}                                    & 1184 & 747 & 1007 & 5526 & 8464 \\
    \bottomrule
  \end{tabular}
  \caption{Statistics about the numbers of lines of code in the proof and implementation of \vericert{}.}
  \label{tab:proof_statistics}
\end{table*}

The lines of code for the implementation and proof of \vericert{} can be found in Table~\ref{tab:proof_statistics}.  Overall, it took about one person-year to build \vericert{}: about two person-months on implementation and ten person-months on proofs. The largest proof is the correctness proof for the HTL generation, which required equivalence proofs between all integer operations supported by \compcert{} and those supported in hardware.  Of the 3349 lines of proof code in the HTL generation, 1189 are for the correctness proof of just the load and store instructions.  These were tedious to prove correct because of the substantial difference between the memory models used, and the need to prove properties such as stores outside of the allocated memory being undefined, so that a finite array could be used. In addition, since pointers in HTL and Verilog are represented as integers, whereas the \compcert{} semantics has a separate pointer value, reasoning about them was painful, and many new theorems about integers and pointers had to be proven in \vericert{}.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: