summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2021-07-27 01:06:50 +0200
committerYann Herklotz <git@yannherklotz.com>2021-07-27 01:06:50 +0200
commitbefb0f0edd9f0af617464610e08bc16f8f0ebbf2 (patch)
tree7a05e19b730b8335d0fdb5a299fd5c3121d8791e
parentfe56f9efb8f9d983a9024383fa17695b347ec67e (diff)
downloadoopsla21_fvhls-befb0f0edd9f0af617464610e08bc16f8f0ebbf2.tar.gz
oopsla21_fvhls-befb0f0edd9f0af617464610e08bc16f8f0ebbf2.zip
Add more descriptions of challenges in proofs
-rw-r--r--algorithm.tex2
-rw-r--r--introduction.tex2
-rw-r--r--proof.tex12
3 files changed, 13 insertions, 3 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 7570ed7..676df46 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -425,7 +425,7 @@ There are various limitations in \vericert{} compared to other HLS tools due to
The main limitation of \vericert{} is that it does not perform scheduling of instructions into the same state, meaning that instructions cannot be executed in parallel. However, the design of the intermediate languages in \vericert{} take this optimisation into account and are designed to support scheduling. HTL is already an intermediate language which represents an FSMD which can have any number of statements in each state, and the semantics of HTL and Verilog allow for parallel assignments to registers. To simplify the proof of the scheduling algorithm, and to minimise the changes necessary for the current translation from 3AC to HTL, a new language must be introduced, called 3ACPar, which would be equivalent to 3AC but instead of consisting of a map from program counter to instruction, it would consist of a map from program counter to list of instructions, which can all be executed in parallel. A new proof for the scheduling algorithm would have to be written for the translation from 3AC to 3ACPar, for which a verified translation validation approach might be appropriate, however, the translation form 3ACPar to HTL would not change conceptually, except for the fact that multiple instructions can now be translated into the same state. This small difference means that most of the proof can be reused without any changes, as the translation of instructions was the main body of the proof and did not change.
-In a similar vein, the introduction of pipelined operators, especially for division, can alleviate many of the timing issues shown in the PolyBench/C benchmarks with divisions included in Section~\ref{sec:evaluation}. These operators can execute different stages of the operation in parallel, and therefore run long operations in parallel while sharing the same hardware. In HTL, operations like this can be represented in a similar fashion to the load and store instructions to the RAM, by using wires to communicate with an abstract computation block modelled in HTL and later replaced by a hardware implementation. However, 3ACPar would have to be modified to also describe such instructions so that these can be placed optimally using the external scheduling algorithm.
+In a similar vein, the introduction of pipelined operators, especially for division, can alleviate many of the timing issues shown in the \polybench{} benchmarks with divisions included in Section~\ref{sec:evaluation}. These operators can execute different stages of the operation in parallel, and therefore run long operations in parallel while sharing the same hardware. In HTL, operations like this can be represented in a similar fashion to the load and store instructions to the RAM, by using wires to communicate with an abstract computation block modelled in HTL and later replaced by a hardware implementation. However, 3ACPar would have to be modified to also describe such instructions so that these can be placed optimally using the external scheduling algorithm.
\paragraph{Limitations with I/O}
diff --git a/introduction.tex b/introduction.tex
index 49d8708..a36d8b4 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -43,7 +43,7 @@ The contributions of this paper are as follows:
\begin{itemize}
\item We present \vericert{}, the first mechanically verified HLS tool that compiles C to Verilog. In Section~\ref{sec:design}, we describe the design of \vericert{}, including a few optimisations related to memory accesses and division.
- \item We state the correctness theorem of \vericert{} with respect to an existing semantics for Verilog due to \citet{loow19_proof_trans_veril_devel_hol}. In Section~\ref{sec:verilog}, we describe how we extended this semantics to make it suitable as an HLS target.
+ \item We state the correctness theorem of \vericert{} with respect to an existing semantics for Verilog due to \citet{loow19_proof_trans_veril_devel_hol}. In Section~\ref{sec:verilog}, we describe how we extended this semantics to make it suitable as an HLS target. We also describe how the Verilog semantics are integrated into CompCert's model of the semantics, and how CompCert's memory model is translated to Verilog's low-level and finite memory model.
\item In Section~\ref{sec:proof}, we describe how we proved the correctness theorem. The proof follows standard \compcert{} techniques -- forward simulations, intermediate specifications, and determinism results -- but we encountered several challenges peculiar to our hardware-oriented setting. These include handling discrepancies between byte- and word-addressable memories, different handling of unsigned comparisons between C and Verilog, correctly mapping \compcert{}'s memory model onto a finite Verilog array and finally correctly rearranging memory reads and writes so that these behave properly as a RAM in hardware.
\item In Section~\ref{sec:evaluation}, we evaluate \vericert{} on the \polybench{} benchmark suite~\cite{polybench}, and compare the performance of our generated hardware against an existing, unverified HLS tool called \legup{}~\cite{canis11_legup}. We show that \vericert{} generates hardware that is \slowdownOrig$\times$ slower (\slowdownDiv$\times$ slower in the absence of division) and \areaIncr$\times$ larger than that generated by \legup{}. We intend to bridge this performance gap in the future by introducing (and verifying) HLS optimisations of our own, such as scheduling and memory analysis.
\end{itemize}
diff --git a/proof.tex b/proof.tex
index 1218ecd..804c10d 100644
--- a/proof.tex
+++ b/proof.tex
@@ -2,7 +2,7 @@
Now that the Verilog semantics have been adapted to the CompCert model, we are in a position to formally prove the correctness of our C-to-Verilog compilation. This section describes the main correctness theorem that was proven and the main ideas behind the proof. The full Coq proof is available in auxiliary material.
-The main correctness theorem is analogous to that stated in \compcert{}~\cite{leroy09_formal_verif_realis_compil}: for all Clight source programs $C$, if the translation to the target Verilog code succeeds, and $C$ has safe observable behaviour $B$ when executed, then the target Verilog code will have the same behaviour $B$. Here, a `safe' execution is one that either converges or diverges, but does not ``go wrong''. If the program does admit some wrong behaviour (like undefined behaviour in C), the correctness theorem does not apply. A behaviour, then, is either a final state (in the case of convergence) or divergence. In \compcert{}, a behaviour is also associated with a trace of I/O events, but since external function calls are not supported in \vericert{}, this trace will always be empty for us. Note that the compiler is allowed to fail and not produce any output; the correctness theorem only applies when the translation succeeds.
+The main correctness theorem is analogous to that stated in \compcert{}~\cite{leroy09_formal_verif_realis_compil}: for all Clight source programs $C$, if the translation to the target Verilog code succeeds, and $C$ has safe observable behaviour $B$ when executed, then the target Verilog code will have the same behaviour $B$. Here, a `safe' execution is one that either converges or diverges, but does not ``go wrong''. If the program does admit some wrong behaviour (like undefined behaviour in C), the correctness theorem does not apply. A behaviour, then, is either a final state (in the case of convergence) or divergence. In \compcert{}, a behaviour is also associated with a trace of I/O events, but since external function calls are not supported in \vericert{}, this trace will always be empty. This correctness theorem is also appropriate for HLS, as HLS is often used as a part of a larger hardware design that is connected together using a hardware description language like Verilog. This means that HLS designs are normally triggered multiple times and results are returned each time when the computation terminates, which is the property that the correctness theorem states. Note that the compiler is allowed to fail and not produce any output; the correctness theorem only applies when the translation succeeds.
%The following `backwards simulation' theorem describes the correctness theorem, where $\Downarrow$ stands for simulation and execution respectively.
@@ -19,6 +19,16 @@ In practice, Clight programs are all deterministic, as are the Verilog programs
Furthermore, to prove the forward simulation, it suffices to prove forward simulations between each intermediate language, as these results can be composed to prove the correctness of the whole HLS tool.
The forward simulation from 3AC to HTL is stated in Lemma~\ref{lemma:htl} (Section~\ref{sec:proof:3ac_htl}), the forward simulation for the RAM insertion is shown in Lemma~\ref{lemma:htl_ram} (Section~\ref{sec:proof:ram_insertion}), then the forward simulation between HTL and Verilog is shown in Lemma~\ref{lemma:verilog} (Section~\ref{sec:proof:htl_verilog}) and finally, the proof that Verilog is deterministic is given in Lemma~\ref{lemma:deterministic} (Section~\ref{sec:proof:deterministic}).
+\subsection{Main challenges in the proof}
+
+The proof of correctness of the Verilog back end is quite different to the usual proofs performed in CompCert, mainly because the difference in Verilog semantics compared to the standard CompCert intermediate languages and because of the translation of the memory model.
+
+Because the memory model in our Verilog semantics is finite and concrete, whereas the CompCert memory model is more abstract and infinite with additional bounds, the equivalence of both these models needs to be proven. Moreover, our memory is word-addressed for efficiency reasons, whereas CompCert's memory is byte-addressed.
+
+The Verilog semantics operates quite differently to the usual intermediate languages in the backend. All the CompCert intermediate languages use a map from control-flow nodes to instructions. An instruction can therefore be selected using an abstract program pointer. On the other hand, in the Verilog semantics the whole design is executed at every clock cycle, because hardware is inherently parallel. The program pointer is part of the design as well, not just part of an abstract state. This makes the semantics of Verilog simpler, but comparing it to the semantics of 3AC becomes more challenging, as one has to map the abstract notion of the state to concrete values in registers.
+
+Both these differences mean that translating 3AC directly to Verilog is infeasible, as the differences in the semantics is too large. Instead, a new intermediate language needs to be introduced, called HTL, which bridges the gap in the semantics between the two languages. HTL still consists of maps, like many of the other CompCert languages, however, each state corresponds to a Verilog statement.
+
\subsection{Forward simulation from 3AC to HTL}\label{sec:proof:3ac_htl}
As HTL is quite far removed from 3AC, this first translation is the most involved and therefore requires a larger proof, because the translation from 3AC instructions to Verilog statements needs to be proven correct in this step. In addition to that, the semantics of HTL are also quite different to the 3AC semantics, as instead of defining small-step semantics for each construct in Verilog, the semantics are instead defined over one clock cycle and mirror the semantics defined for Verilog. Lemma~\ref{lemma:htl} shows the result that needs to be proven in this subsection.