summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2021-04-15 19:42:53 +0100
committerYann Herklotz <git@yannherklotz.com>2021-04-15 19:42:53 +0100
commitac15b33e1bc6faad4b09fb508caf969e813b43db (patch)
tree7d1cc890b1d6bb06f5083d8f9708078381745c49
parent5a8c95178395dc095c236b98e7da046467a746a9 (diff)
downloadoopsla21_fvhls-ac15b33e1bc6faad4b09fb508caf969e813b43db.tar.gz
oopsla21_fvhls-ac15b33e1bc6faad4b09fb508caf969e813b43db.zip
Add more
-rw-r--r--proof.tex34
-rw-r--r--verilog.tex22
2 files changed, 31 insertions, 25 deletions
diff --git a/proof.tex b/proof.tex
index 9d250e0..e8aa4ba 100644
--- a/proof.tex
+++ b/proof.tex
@@ -69,7 +69,7 @@ These assumptions then have to be proven to always hold assuming the HTL code wa
\begin{itemize}
\item The 3AC register file $R$ needs to be `less defined' than the HTL register map $\Gamma_{r}$ (written $R \le \Gamma_{r}$). This means that all entries should be equal to each other, unless a value in $R$ is undefined, in which case any value can match it.
\item The RAM values represented by each Verilog array in $\Gamma_{a}$ need to match the 3AC function's stack contents, which are part of the memory $M$; that is: $M \le \Gamma_{a}$.
- \item The state is well formed, which means that the value of the state register matches the current value of the program counter; that is: $\textit{pc} = \Gamma_{r}!\sigma$.
+ \item The state is well formed, which means that the value of the state register matches the current value of the program counter; that is: $\textit{pc} = \Gamma_{r}[\sigma]$.
\end{itemize}
We also define the following set $\mathcal{I}$ of invariants that must hold for the current state to be valid:
@@ -94,10 +94,10 @@ We can now define the simulation diagram for the translation. The 3AC state can
\node[circle] (s2) at (0,0) {$R', M', \textit{pc}'$};
\node[circle] (r2) at (7.2,0) {$\Gamma_{r}', \Gamma_{a}'$};
%\node at (6.8,0.75) {+};
- \draw (s1) -- node[above] {$\mathcal{I} \land (R \le \Gamma_{r}) \land (M \le \Gamma_{a}) \land (\textit{pc} = \Gamma_{r}!\sigma)$} ++ (r1);
+ \draw (s1) -- node[above] {$\mathcal{I} \land (R \le \Gamma_{r}) \land (M \le \Gamma_{a}) \land (\textit{pc} = \Gamma_{r}[\sigma])$} ++ (r1);
\draw[-{Latex}] ($(s1.south) + (0,0.4)$) -- ($(s2.north) - (0,0.4)$);
\draw[-{Latex},dashed] ($(r1.south) + (0,0.2)$) to[auto, pos=0.7] node {+} ($(r2.north) - (0,0.2)$);
- \draw[dashed] (s2) -- node[above] {$\mathcal{I} \land (R' \le \Gamma_{r}') \land (M' \le \Gamma_{a}') \land (\textit{pc}' = \Gamma_{r}'!\sigma)$} ++ (r2);
+ \draw[dashed] (s2) -- node[above] {$\mathcal{I} \land (R' \le \Gamma_{r}') \land (M' \le \Gamma_{a}') \land (\textit{pc}' = \Gamma_{r}'[\sigma])$} ++ (r2);
\end{scope}
\end{tikzpicture}
\end{center}
@@ -107,6 +107,26 @@ We can now define the simulation diagram for the translation. The 3AC state can
This simulation diagram is proven by induction over the operational semantics of 3AC, which allows us to find one or more steps in the HTL semantics that will produce the same final matching state.
\end{proof}
+\subsection{Forward simulation of RAM insertion}\label{sec:proof:ram_insertion}
+
+HTL can only represent a single state machine, it is therefore necessary to model
+
+\begin{figure}
+ \centering
+ \begin{minipage}{1.0\linewidth}
+ \begin{gather*}
+ \inferrule[Idle]{\Gamma_{\rm r}[\textit{r.en}] = \Gamma_{\rm r}[\textit{r.u\_{en}}]}{((\Gamma_{\rm r}, \Gamma_{\rm a}), \Delta, r) \downarrow_{\text{ram}} \Delta}\\
+%
+ \inferrule[Load]{\Gamma_{\rm r}[\textit{r.en}] \ne \Gamma_{\rm r}[\textit{r.u\_en}] \\ \Gamma_{\rm r}[\textit{r.wr\_en}] = 0}{((\Gamma_{\rm r}, \Gamma_{\rm a}), (\Delta_{\rm r}, \Delta_{\rm a}), r) \downarrow_{\text{ram}} (\Delta_{\rm r}[\textit{r.en} \mapsto \textit{r.u\_en}, \textit{r.d\_out} \mapsto (\Gamma_{\rm a}[\textit{r.mem}])[ \textit{r.addr}]], \Delta_{\rm a}) }\\
+%
+ \inferrule[Store]{\Gamma_{\rm r}[\textit{r.en}] \ne \Gamma_{\rm r}[\textit{r.u\_en}] \\ \Gamma_{\rm r}[\textit{r.wr\_en}] = 1}{((\Gamma_{\rm r}, \Gamma_{\rm a}), (\Delta_{\rm r}, \Delta_{\rm a}), r) \downarrow_{\text{ram}} (\Delta_{\rm r}[\textit{r.en} \mapsto \textit{r.u\_en}], \Delta_{\rm a}[\textit{r.mem} \mapsto (\Gamma_{\rm a}[ \textit{r.mem}])[\textit{r.addr} \mapsto \textit{r.d\_in}]]) }
+ \end{gather*}
+ \end{minipage}
+ \caption{Specification for the memory implementation in HTL, where $r$ is the RAM, which is then implemented by equivalent Verilog code.}\label{fig:htl_ram_spec}
+\end{figure}
+
+This memory template can be represented using the following semantics shown in Figure~\ref{fig:htl_ram_spec}, which is then translated to the equivalent Verilog implementation shown in Figure~\ref{fig:verilog_ram_impl}.
+
\subsection{Forward simulation from HTL to Verilog}\label{sec:proof:htl_verilog}
The HTL-to-Verilog simulation is conceptually simple, as the only transformation is from the map representation of the code to the case-statement representation. The proof is more involved, as the semantics of a map structure are quite different to the semantics of the case-statement they are converted to.
@@ -162,7 +182,7 @@ The final lemma we need is that the Verilog we generate is deterministic. This r
{Integers and values} & 98 & --- & 15 & 968 & 1081 \\
{HTL semantics} & 50 & --- & 181 & 65 & 296 \\
{HTL generation} & 590 & --- & 66 & 3069 & 3725 \\
- {RAM generation} & 203 & --- & --- & 2843 & 3046 \\
+ {RAM generation} & 253 & --- & --- & 2793 & 3046 \\
{Verilog semantics} & 78 & --- & 431 & 235 & 744 \\
{Verilog generation} & 104 & --- & --- & 486 & 590 \\
{Top-level driver, pretty printers} & 318 & 775 & --- & 189 & 1282 \\
@@ -174,7 +194,11 @@ The final lemma we need is that the Verilog we generate is deterministic. This r
\label{tab:proof_statistics}
\end{table*}
-The lines of code for the implementation and proof of \vericert{} can be found in Table~\ref{tab:proof_statistics}. Overall, it took about 1 person-year \JW{This still true?} to build \vericert{} -- about two person-months on implementation and ten person-months on proofs. The largest proof is the correctness proof for the HTL generation, which required equivalence proofs between all integer operations supported by \compcert{} and those supported in hardware. From the 3349 lines of proof code in the HTL generation, 1189 are for the correctness proof of just the load and store instructions. These were tedious to prove correct because of the substantial difference between the memory models used, and the need to prove properties such as stores outside of the allocated memory being undefined, so that a finite array could be used. In addition to that, since pointers in HTL and Verilog are represented as integers, whereas there is a separate pointer value in the \compcert{} semantics, it was painful to reason about them and many new theorems had to be proven about integers and pointers in \vericert{}. \JW{Can we include a comment about the size of the trusted base, in case we get that reviewer again?}
+The lines of code for the implementation and proof of \vericert{} can be found in Table~\ref{tab:proof_statistics}. Overall, it took about 1.5 person-year to build \vericert{} -- about two person-months on implementation and ten person-months on proofs. The largest proof is the correctness proof for the HTL generation, which required equivalence proofs between all integer operations supported by \compcert{} and those supported in hardware. From the 3069 lines of proof code in the HTL generation, 1189 are for the correctness proof of just the load and store instructions. These were tedious to prove correct because of the substantial difference between the memory models used, and the need to prove properties such as stores outside of the allocated memory being undefined, so that a finite array could be used. In addition to that, since pointers in HTL and Verilog are represented as integers, whereas there is a separate pointer value in the \compcert{} semantics, it was painful to reason about them and many new theorems had to be proven about integers and pointers in \vericert{}. In addition to that, the second largest proof of the correct RAM generation includes many proofs about the extensional equality of array operations, such as merging arrays with different assignments. As the negative edge implies two merges take place every clock cycle, the proofs about the equality of the arrays becomes more tedious as well.
+
+Looking at the trusted base of \vericert{}, the Verilog semantics are 431. This, together with the Clight semantics from \compcert{}, are the only parts of the compiler that need to be trusted. Compared to the 1721 lines of the implementation that are written in Coq, which are the verified parts of the synthesis tool, this is larger than the 431 lines of Verilog semantics specification, even if the Clight semantics are added. In addition to that, reading semantics specifications is simpler than trying to understand algorithms, meaning the trusted base has been successfully reduced.
+
+%\JW{Can we include a comment about the size of the trusted base, in case we get that reviewer again?}
%%% Local Variables:
%%% mode: latex
diff --git a/verilog.tex b/verilog.tex
index 061444b..e0d1257 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -73,9 +73,9 @@ The main execution of the module $\downarrow_{\text{module}}$ is split into $\do
\centering
\begin{minipage}{1.0\linewidth}
\begin{gather*}
- \inferrule[Step]{\Gamma_r[\textit{rst}] = \yhconstant{Some } 0 \\ \Gamma_r[\textit{fin}] = \yhconstant{Some } 0 \\ \Gamma_r[\sigma] = \yhconstant{Some } v \\ (m, (\Gamma_r, \Gamma_a))\ \downarrow_{\text{program}} (\Gamma_r', \Gamma_a') \\ \Gamma_r'[\sigma] = \yhconstant{Some } v'}{\yhconstant{State } \textit{sf }\ m\ v\ \Gamma_r\ \Gamma_a \longrightarrow \yhconstant{State } \textit{sf }\ m\ v'\ \Gamma_r'\ \Gamma_a'}\\
+ \inferrule[Step]{\Gamma_r[\textit{rst}] = 0 \\ \Gamma_r[\textit{fin}] = 0 \\ (m, (\Gamma_r, \Gamma_a))\ \downarrow_{\text{program}} (\Gamma_r', \Gamma_a')}{\yhconstant{State } \textit{sf }\ m\ \ \Gamma_r[\sigma]\ \ \Gamma_r\ \Gamma_a \longrightarrow \yhconstant{State } \textit{sf }\ m\ \ \Gamma_r'[\sigma]\ \ \Gamma_r'\ \Gamma_a'}\\
%
- \inferrule[Finish]{\Gamma_r[\textit{fin}] = \yhconstant{Some } 1 \\ \Gamma_r[ \textit{ret}] = \yhconstant{Some } r}{\yhconstant{State } \textit{sf }\ m\ \sigma\ \Gamma_r\ \Gamma_a \longrightarrow \yhconstant{Returnstate } \textit{sf }\ r}\\
+ \inferrule[Finish]{\Gamma_r[\textit{fin}] = 1}{\yhconstant{State } \textit{sf }\ m\ \sigma\ \Gamma_r\ \Gamma_a \longrightarrow \yhconstant{Returnstate } \textit{sf }\ \Gamma_r[ \textit{ret}]}\\
%
\inferrule[Call]{ }{\yhconstant{Callstate } \textit{sf }\ m\ \vec{r} \longrightarrow \yhconstant{State } \textit{sf }\ m\ n\ ((\yhfunction{init\_params}\ \vec{r}\ a)[\sigma \mapsto n, \textit{fin} \mapsto 0, \textit{rst} \mapsto 0])\ \epsilon}\\
%
@@ -130,24 +130,6 @@ This translation is represented in Figure~\ref{fig:memory_model_transl}, where \
However, in practice, assigning and reading from an array directly in the state machine will not produce a memory in the final hardware design, as the synthesis tool cannot identify the array as having the necessary properties that a RAM needs, even though this is the most natural formulation of memory. Even though theoretically the memory will only be read from once per clock cycle, the synthesis tool cannot ensure that this is true, and will instead create a register for each memory location. This increases the size of the circuit dramatically, as the RAM on the FPGA chip will not be reused. Instead, the synthesis tool expects a specific template that ensures these properties, and will then transform the template into a proper RAM during synthesis. Therefore, a translation has to be performed from the naive use of memory in the state machine, to a proper use of a memory template.
-\begin{figure}
- \centering
- \begin{minipage}{1.0\linewidth}
- \begin{gather*}
- \inferrule[None]{ }{(\Gamma,\Delta, \yhconstant{None}) \downarrow_{\text{ram}} \Delta}\qquad
-%
- \inferrule[Idle]{\Gamma_{\rm r}[\textit{r.en}] = \Gamma_{\rm r}[\textit{r.u\_{en}}]}{((\Gamma_{\rm r}, \Gamma_{\rm a}), \Delta, \yhconstant{Some}\ r) \downarrow_{\text{ram}} \Delta}\\
-%
- \inferrule[Read]{\Gamma_{\rm r}[\textit{r.en}] \ne \Gamma_{\rm r}[\textit{r.u\_en}] \\ \Gamma_{\rm r}[\textit{r.wr\_en}] = 0}{((\Gamma_{\rm r}, \Gamma_{\rm a}), (\Delta_{\rm r}, \Delta_{\rm a}), \yhconstant{Some}\ r) \downarrow_{\text{ram}} (\Delta_{\rm r}[\textit{r.en} \mapsto \textit{r.u\_en}, \textit{r.d\_out} \mapsto (\Gamma_{\rm a}[\textit{r.mem}])[ \textit{r.addr}]], \Delta_{\rm a}) }\\
-%
- \inferrule[Write]{\Gamma_{\rm r}[\textit{r.en}] \ne \Gamma_{\rm r}[\textit{r.u\_en}] \\ \Gamma_{\rm r}[\textit{r.wr\_en}] = 1}{((\Gamma_{\rm r}, \Gamma_{\rm a}), (\Delta_{\rm r}, \Delta_{\rm a}), \yhconstant{Some}\ r) \downarrow_{\text{ram}} (\Delta_{\rm r}[\textit{r.en} \mapsto \textit{r.u\_en}], \Delta_{\rm a}[\textit{r.mem} \mapsto (\Gamma_{\rm a}[ \textit{r.mem}])[\textit{r.addr} \mapsto \textit{r.d\_in}]]) }
- \end{gather*}
- \end{minipage}
- \caption{Specification for memory implementation in HTL, which is then implemented by equivalent Verilog code.}\label{fig:htl_ram_spec}
-\end{figure}
-
-This memory template can be represented using the following semantics shown in Figure~\ref{fig:htl_ram_spec}, which is then translated to the equivalent Verilog implementation shown in Figure~\ref{fig:verilog_ram_impl}.
-
%\begin{figure}
% \centering
% \begin{subfigure}[t]{0.48\linewidth}