
\section{Proof}

This section describes the main correctness theorem that was proven and the main ideas behind the proofs.

The main correctness theorem states that for all Clight source programs $C$, if the translation of $C$ to the target Verilog code $V$ succeeds, and $C$ has a correct observable behaviour $B$ when executed, then $V$ will simulate with the same behaviour $B$.  This property is stated formally as follows.

\begin{equation*}
  \forall C, V, B \notin \texttt{Wrong},\, \yhfunction{HLS} (C) = \yhconstant{OK} (V) \land C \Downarrow B \implies V \Downarrow_{s} B.
\end{equation*}

However, this forward simulation on its own might still allow for wrong behaviour of the target code when the target language admits multiple possible behaviours.  In that case the target code may execute with the behaviour $B$, but other behaviours may be equally valid according to its semantics.  If the target language is deterministic, meaning that there is only one possible behaviour from every possible state, then the forward simulation implies that the backward simulation also holds.

\begin{equation*}
  \forall C, V, B \notin \texttt{Wrong},\, \yhfunction{HLS} (C) = \yhconstant{OK} (V) \land V \Downarrow_{s} B \implies C \Downarrow B.
\end{equation*}

To prove the backward simulation, we therefore have to prove that the Verilog semantics are deterministic and that the forward simulation from C to Verilog holds.
\JW{Hm, I'm a little confused. Which theorem is the actual `correctness' theorem? Looks like we could combine the two theorems into something like \[\forall C, V, B \notin \texttt{Wrong},\, \yhfunction{HLS} (C) = \yhconstant{OK} (V) \implies (V \Downarrow_{s} B ~\text{iff}~ C \Downarrow B)\] Is that the theorem you actually want?}

\subsection{Forward Simulation}

The forward simulation between C and Verilog can be separated into forward simulations of each compiler pass, which can then be composed to provide a whole proof from C to Verilog.  We therefore only have to prove a forward simulation for the 3AC to HTL translation, and for the HTL to Verilog translation.

\subsubsection{3AC to HTL forward simulation}

As HTL is quite different to 3AC, this first translation is the most involved and therefore requires a larger proof, because the translation from 3AC instructions to Verilog statements needs to be proven correct.  In addition, the semantics of HTL are quite different to the 3AC semantics: instead of defining small-step semantics for each construct in Verilog, the semantics are defined over one clock cycle.

\begin{figure}
  \centering
\begin{minted}{coq}
Inductive match_states :
  3AC.state -> HTL.state -> Prop :=
| match_state : forall asa asr sf f sp
    sp' rs mem m st res
    (MASSOC : match_assocmaps f rs asr)
    (TF : tr_module f m)
    (WF : state_st_wf m (HTL.State res m st asr asa))
    (MF : match_frames sf res)
    (MARR : match_arrs m f sp mem asa)
    (SP : sp = Values.Vptr sp' (Integers.Ptrofs.repr 0))
    (RSBP : reg_stack_based_pointers sp' rs)
    (ASBP : arr_stack_based_pointers sp' mem
              (f.(3AC.fn_stacksize)) sp)
    (BOUNDS : stack_bounds sp (f.(3AC.fn_stacksize)) mem)
    (CONST : match_constants m asr),
    match_states
      (3AC.State sf f sp st rs mem)
      (HTL.State res m st asr asa)
| match_returnstate : forall v v' stack mem res
    (MF : match_frames stack res),
    val_value_lessdef v v' ->
    match_states
      (3AC.Returnstate stack v mem)
      (HTL.Returnstate res v')
| match_initial_call :
  forall f m m0 (TF : tr_module f m),
    match_states
      (3AC.Callstate nil (AST.Internal f) nil m0)
      (HTL.Callstate nil m nil).
\end{minted}
  \caption{\texttt{match\_states} predicate used to match a 3AC state to the equivalent HTL state.}\label{fig:match_states}
\end{figure}

The first step in proving the forward simulation is to define a relation that matches a 3AC state to an HTL state and describes when the two states are equivalent.  This relation also records the assumptions that are made about the 3AC code that we receive, so that they can be used to prove the translation correct.  These assumptions then have to be proven to always hold, assuming that the HTL code was created by the translation algorithm.  The \texttt{match\_states} predicate that is used to match the states of the 3AC code to the HTL code is shown in Figure~\ref{fig:match_states}; it has three constructors.

\begin{enumerate}
  \item \texttt{match\_state} is the main constructor, which matches a \texttt{3AC.State} to an \texttt{HTL.State} and is used during the normal execution of instructions in the function.
  \item \texttt{match\_returnstate} is the constructor used to match return states in 3AC to return states in HTL\@.  Even though function calls are not supported, there still has to be a notion of stack frames during the proof, as a stack frame is allocated at the start of the program and then has to be deallocated (popped from the stack) when the main function returns its result.  The only condition on matching the return state is that the 3AC return value must be \emph{less defined} than the HTL return value (the two are either equal or, if the 3AC value is undefined, any HTL value is acceptable).
  \item \texttt{match\_initial\_call} matches the initial call to the main function, and cannot be used for any other function calls, as the stack frame is assumed to be \texttt{nil}.
\end{enumerate}

Using the \texttt{match\_states} predicate, we can then define the correctness theorem for the translation.

\begin{align*}
  &\forall\ (S_{1} : \texttt{3AC.state})\ t\ S_{2},\ S_{1} \xrightarrow{t} S_{2}\\
  &\implies \forall\ (R_{1} : \texttt{HTL.state}),\ \texttt{match\_states}\ S_{1}\ R_{1}\\
  &\implies \exists R_{2},\ R_{1} \xrightarrow{t}_{+} R_{2} \land \texttt{match\_states}\ S_{2}\ R_{2}.
\end{align*}

Here $S_{1}$ and $S_{2}$ are 3AC states, $R_{1}$ and $R_{2}$ are HTL states, $\xrightarrow{t}$ is one step in the semantics of 3AC, and $\xrightarrow{t}_{+}$ is one or more steps in the semantics of HTL\@.  The correctness theorem says that for every starting state $S_{1}$ and every state $S_{2}$ reached from it after one step in the semantics of 3AC, and for every HTL state $R_{1}$ that matches $S_{1}$, there exists a state $R_{2}$ such that $R_{2}$ is reached from $R_{1}$ by one or more steps of the HTL semantics and $S_{2}$ and $R_{2}$ match as well.  Using this property, the forward simulation can then be proven by also showing that the initial states and the final states of the two executions match.
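
The shape of this statement can also be sketched in Coq.  The following is only a minimal, self-contained illustration: the section variables \texttt{src\_state}, \texttt{tgt\_state}, \texttt{src\_step}, \texttt{tgt\_plus} and \texttt{match\_st} are hypothetical placeholders standing in for the 3AC and HTL definitions, and are not the names used in the actual development.

\begin{minted}{coq}
(* Illustrative sketch only: every name below is a placeholder. *)
Section SimulationDiagram.
  Variables src_state tgt_state trace : Type.
  (* One step of the source semantics, labelled with a trace. *)
  Variable src_step : src_state -> trace -> src_state -> Prop.
  (* One or more steps of the target semantics, labelled with
     the combined trace. *)
  Variable tgt_plus : tgt_state -> trace -> tgt_state -> Prop.
  (* Relation between matching source and target states. *)
  Variable match_st : src_state -> tgt_state -> Prop.

  (* Every source step from a matched state is answered by one
     or more target steps that end in a matched state. *)
  Definition plus_simulation : Prop :=
    forall S1 t S2, src_step S1 t S2 ->
    forall R1, match_st S1 R1 ->
    exists R2, tgt_plus R1 t R2 /\ match_st S2 R2.
End SimulationDiagram.
\end{minted}

Requiring at least one target step for every source step is what allows a single 3AC instruction to be matched by the execution of a whole clock cycle in HTL\@.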

\subsubsection{HTL to Verilog forward simulation}

The HTL to Verilog simulation is quite simple, as the only transformation is from the map representation of the code to the case statement representation.  As the two representations are nevertheless quite different, the following observations have to be made to prove that they are equivalent.

Firstly, as the input representation is a map, all the keys of the map will be unique, which will be translated to the matched expressions in the case statement.  In addition to that, the assumption is made that every possible evaluation of the state value will give a valid state, meaning the key will be present in the map.  Both of these observations mean that after the translation, for every possible value of the state, the case statement will be able to match an expression and enter the correct statement.  As each key is unique, if the state matches a case expression, then this also means that this is the only case expression that it could match, and it therefore has to be the correct case expression which contains the same statement as the map.

However, the proof of uniqueness of the keys only carries over to the output if the translation function is \emph{injective}, meaning that distinct inputs are always mapped to distinct outputs.  Otherwise, the uniqueness of the keys in the input would not translate to uniqueness of the possible case expression matches in the output.  In our case the translation function is actually not injective, even though it might at first seem like it, because the state is stored as an integer, whereas the map is addressable by any positive number.  This means that if the positive number is greater than the maximum value that can be stored in the integer, the overflow would result in the wrong states being accessed.  To make the function injective on the keys that can occur, we therefore have to prove that the input positive number is never greater than $2^{32}-1$, so that the uniqueness property also holds for the output.
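
To illustrate the problem with hypothetical key values, assume that the state is stored in a 32-bit integer and that overflow wraps around modulo $2^{32}$.  The distinct map keys $1$ and $2^{32} + 1$ would then be stored as the same integer:

\begin{equation*}
  (2^{32} + 1) \bmod 2^{32} = 1 = 1 \bmod 2^{32},
\end{equation*}

so both keys would select the same case expression.  Bounding the keys by $2^{32} - 1$ rules out any such pair.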

\subsection{Deterministic Semantics}

Finally, to prove the backward simulation given the forward simulation, it has to be shown that if the generated hardware has a specific behaviour, then this is the only behaviour that it can have.  This only has to be shown for the final language in the chain, which is Verilog, so that the backward simulation holds for the whole chain from Clight to Verilog.

The Verilog semantics that are used are deterministic, as the order of operation of all the constructs is defined.
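
As a rough illustration of what such a property looks like, determinism of a step relation could be phrased in Coq as follows.  This is a simplified sketch with hypothetical names (\texttt{state}, \texttt{trace}, \texttt{step}); the notion of determinacy used by \compcert{} is more refined, in particular with respect to external events, and this is not the statement from the actual development.

\begin{minted}{coq}
(* Illustrative sketch only: every name below is a placeholder. *)
Section Determinism.
  Variables state trace : Type.
  Variable step : state -> trace -> state -> Prop.

  (* Any two steps from the same state produce the same trace
     and reach the same successor state. *)
  Definition step_deterministic : Prop :=
    forall s t1 s1 t2 s2,
      step s t1 s1 -> step s t2 s2 ->
      t1 = t2 /\ s1 = s2.
End Determinism.
\end{minted}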

\subsection{Coq Mechanisation}

\JW{Would be nice to include a few high-level metrics here. How many person-years of effort was the proof (very roughly)? How many lines of Coq? How many files, how many lemmas? How long does it take for the Coq proof to execute?}

\subsection{Proving \texttt{Oshrximm} correct}

% Mention that this optimisation is not performed sometimes (clang -O3).

Vericert performs some optimisations at the level of the instructions that are generated, so that the hardware performs the instructions as quickly as possible and so that the maximum frequency at which the hardware can run is increased.  One of the main constructs that cripples the performance of the generated hardware is the instantiation of divider circuits.  Vericert requires the result of a divide operation to be ready in the same clock cycle, meaning that the divide circuit needs to be implemented fully combinationally.  This is inefficient in terms of hardware size, but also in terms of latency, because the maximum frequency of the hardware has to be reduced dramatically so that the divide circuit has enough time to finish.

These small optimisations were found to be the most error prone, and guaranteeing that the new representation is equivalent to the representation used in the \compcert{} semantics is difficult without proving this for all possible inputs.

Dividing by a constant can often be optimised to a more efficient operation, especially if the denominator is a power of two.  In \compcert{}, the \texttt{Oshrximm} instruction does exactly this: a normal signed divide operation by a power of two can be replaced by the \texttt{Oshrximm} instruction, which performs the following operation.  The shift-based form on the right of each case is our optimised representation, and $\div$ stands for signed integer division:

\begin{gather*}
\forall x, y \in \mathbb{Z},\ \ 0 \leq y < 31,\ \ -2^{31} \leq x < 2^{31},\\
x \div 2^y =
\begin{cases}
  \left\lfloor \frac{x}{2^y} \right\rfloor = x >> y,& \text{if } x \geq 0\\
  \left\lceil \frac{x}{2^y} \right\rceil = - \left\lfloor \frac{-x}{2^y} \right\rfloor = - ( - x >> y ),& \text{otherwise}
\end{cases}\\
\end{gather*}

The \compcert{} semantics for the \texttt{Oshrximm} instruction express its operation exactly as shown in the equation above, even though the computation that would be performed in hardware is different.  In \vericert{}, if the same operation were implemented using the Verilog divide operator, it would not be guaranteed to be optimised correctly by the synthesis tools that convert the Verilog into a circuit.  To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the divide representation used in the \compcert{} semantics.  This proof discovered quite a few bugs in our initial implementation of the optimisation, which rounded towards $-\infty$ instead of towards 0.
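
As a small worked instance of the equation above (the concrete values are chosen purely for illustration), take $x = -7$ and $y = 1$, and read $>>$ as an arithmetic right shift:

\begin{align*}
  -7 \div 2^{1} &= \left\lceil -3.5 \right\rceil = -3,\\
  -(-(-7) >> 1) &= -(7 >> 1) = -3,\\
  -7 >> 1 &= \left\lfloor -3.5 \right\rfloor = -4.
\end{align*}

The first two lines agree, whereas the plain shift on the last line rounds towards $-\infty$ and is therefore off by one for this input.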

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: