authorYann Herklotz <git@yannherklotz.com>2021-04-16 23:32:37 +0100
committerYann Herklotz <git@yannherklotz.com>2021-04-16 23:32:37 +0100
commit71933526a7c203fb76d54d8f08fea3e132da535c (patch)
tree4c5ba14271ad40cf72faa66648315999c7e30d8c
parent00656c3a17263c8153cd96488cf06b571422a3d3 (diff)
Fix more
-rw-r--r--   conclusion.tex  |  2
-rw-r--r--   evaluation.tex  |  4
-rw-r--r--   proof.tex       |  9
-rw-r--r--   references.bib  | 18
-rw-r--r--   related.tex     |  4
5 files changed, 30 insertions, 7 deletions
diff --git a/conclusion.tex b/conclusion.tex
index 8ac9561..5029bae 100644
--- a/conclusion.tex
+++ b/conclusion.tex
@@ -17,7 +17,7 @@ Beyond this, we plan to implement scheduling and loop pipelining, since this all
Other optimisations include resource sharing to reduce the circuit area, and using tailored hardware operators that use hard IP blocks on chip and can be pipelined.
% this could include multi-cycle operations and pipelining optimisations so that division and multiplication operators also become more efficient.
-Finally, it's worth considering how trustworthy \vericert{} is compared to other HLS tools. The guarantee of full functional equivalence between input and output that $\vericert$ provides is a strong one, the semantics for the source and target languages are both well-established, and Coq is a mature and thoroughly tested system. However, \vericert{} cannot guarantee to provide an output for every valid input -- for instance, as remarked in Section~\ref{sec:proof:htl_verilog}, \vericert{} will error out if given a program with more than about four million instructions! -- but our evaluation indicates that it does not seem to error out too frequently. And of course, \vericert{} cannot guarantee that the final hardware produced will be correct, because the Verilog it generates must pass through a series of unverified tools along the way. This concern may be allayed in the future by ongoing work we are aware of to produce a verified hardware synthesis tool. \JW{Is this tool citable yet?}
+Finally, it's worth considering how trustworthy \vericert{} is compared to other HLS tools. The guarantee of full functional equivalence between input and output that $\vericert$ provides is a strong one, the semantics for the source and target languages are both well-established, and Coq is a mature and thoroughly tested system. However, \vericert{} cannot guarantee to provide an output for every valid input -- for instance, as remarked in Section~\ref{sec:proof:htl_verilog}, \vericert{} will error out if given a program with more than about four million instructions! -- but our evaluation indicates that it does not seem to error out too frequently. And of course, \vericert{} cannot guarantee that the final hardware produced will be correct, because the Verilog it generates must pass through a series of unverified tools along the way. This concern may be allayed in the future by ongoing work we are aware of to produce a verified hardware synthesis tool~\cite{10.1145/3437992.3439916}.
%%% Local Variables:
diff --git a/evaluation.tex b/evaluation.tex
index 2e4ccd5..5f59ff8 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -154,9 +154,9 @@ This gap does not represent the performance cost that comes with formally provin
Instead, it is simply a gap between an unoptimised \vericert{} versus an optimised \legup{}.
As we improve \vericert{} by incorporating further optimisations, this gap should reduce whilst preserving the correctness guarantees.
-Secondly, looking at the maximum clock frequency that each design can achieve, \vericert{} designs can only achieve 8.2$\times$ the maximum clock frequency of \legup{} \JW{That sounds wrong? Shouldn't it be less than legup's fmax?} when division/modulo operations are present. This is in great contrast to the maximum clock frequency that \vericert{} can achieve when no divide/modulus \JW{modulo?} operations are present, where \vericert{} generates designs that are actually 2$\times$ better than the frequency achieved by \legup{} designs. The dramatic discrepancy in performance for the former case can be largely attributed to \vericert{}'s na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 13MHz, while \legup{} managed about 111MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 220MHz. This improvement in frequency can maybe be explained by scheduling trying to pack too many instructions into a cycle, or by the fact that \legup{} uses a more involved RAM interface so that the hardware produces a dual-port RAM, which can perform two reads and writes per clock cycle.
+Secondly, looking at the maximum clock frequency that each design can achieve, \legup{} designs achieve 8.2$\times$ the maximum clock frequency of \vericert{} when division/modulo operations are present. This is in great contrast to the maximum clock frequency that \vericert{} can achieve when no divide/modulo operations are present, where \vericert{} generates designs that are actually 2$\times$ better than the frequency achieved by \legup{} designs. The dramatic discrepancy in performance for the former case can be largely attributed to \vericert{}'s na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 13MHz, while \legup{} managed about 111MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 220MHz. The fact that \vericert{} can then exceed \legup{}'s clock frequency can be explained by \legup{} using a memory controller to manage multiple RAMs through a single interface, which is not needed in \vericert{} as a single RAM is used for the memory.
-Looking at a few benchmarks in particular in Figure~\ref{fig:polybench-nodiv} for some interesting cases. For the trmm benchmark, \vericert{} produces hardware that executes with the same cycle count as \legup{}, and manages to create hardware that achieves twice the frequency compared to \legup{}, thereby actually producing a design that executes twice as fast as \legup{}. Another interesting benchmark is \JW{tt formatting for benchmark program names?} doitgen, where \vericert{} is comparable to \legup{} without LLVM optimisations, however, LLVM optimisations seem to have a large effect on the cycle count.
+Figure~\ref{fig:polybench-nodiv} highlights a few particularly interesting benchmarks. For the \texttt{trmm} benchmark, \vericert{} produces hardware that executes with the same cycle count as \legup{}, and manages to create hardware that achieves twice the frequency of \legup{}, thereby actually producing a design that executes twice as fast as \legup{}. Another interesting benchmark is \texttt{doitgen}, where \vericert{} is comparable to \legup{} without LLVM optimisations; however, LLVM optimisations seem to have a large effect on the cycle count.
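+
+For illustration, a C-based software replacement for the division operator might follow a standard shift-and-subtract scheme, as in the sketch below (the function name \texttt{divider} and the exact formulation are illustrative rather than the precise code used in the evaluation); an analogous routine returning the remainder stands in for the modulo operator.
+
+\begin{verbatim}
+#include <stdint.h>
+
+/* Sketch of a possible C replacement for the hardware division operator:
+   restoring (shift-and-subtract) division on unsigned 32-bit operands,
+   which avoids instantiating a large combinational divider. */
+uint32_t divider(uint32_t dividend, uint32_t divisor) {
+  uint32_t quotient = 0, remainder = 0;
+  for (int i = 31; i >= 0; i--) {
+    remainder = (remainder << 1) | ((dividend >> i) & 1u);
+    if (remainder >= divisor) {
+      remainder -= divisor;   /* subtract when the divisor fits */
+      quotient |= 1u << i;    /* set the corresponding quotient bit */
+    }
+  }
+  return quotient;            /* 'remainder' now holds dividend % divisor */
+}
+\end{verbatim}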
\subsection{RQ2: How area-efficient is \vericert{}-generated hardware?}
diff --git a/proof.tex b/proof.tex
index 5f7dc16..1a04fc7 100644
--- a/proof.tex
+++ b/proof.tex
@@ -18,7 +18,7 @@ The theorem is a `backwards simulation' result (from target to source), followin
In practice, Clight programs are all deterministic, as are the Verilog programs in the fragment we consider. This means that we can prove the correctness theorem above by first inverting it to become a forwards simulation result, following standard \compcert{} practice.
The second observation that needs to be made is that to prove this forward simulation, it suffices to prove forward simulations between each intermediate language, as these results can be composed to prove the correctness of the whole HLS tool.
-The forward simulation from 3AC to HTL is stated in Lemma~\ref{lemma:htl} (Section~\ref{sec:proof:3ac_htl}), then the forward simulation between HTL and Verilog is shown in Lemma~\ref{lemma:verilog} (Section~\ref{sec:proof:htl_verilog}) and finally, the proof that Verilog is deterministic is given in Lemma~\ref{lemma:deterministic} (Section~\ref{sec:proof:deterministic}).
+The forward simulation from 3AC to HTL is stated in Lemma~\ref{lemma:htl} (Section~\ref{sec:proof:3ac_htl}), the forward simulation for the RAM insertion in Lemma~\ref{lemma:htl_ram} (Section~\ref{sec:proof:ram_insertion}), and the forward simulation from HTL to Verilog in Lemma~\ref{lemma:verilog} (Section~\ref{sec:proof:htl_verilog}); finally, the proof that Verilog is deterministic is given in Lemma~\ref{lemma:deterministic} (Section~\ref{sec:proof:deterministic}).
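+
+As an illustrative sketch of how these per-pass results compose (writing $C$, $t$, $h$ and $v$ for the Clight, 3AC, HTL and Verilog programs produced by the successive passes; these names are used only here), behaviour preservation is transitive:
+\begin{align*}
+  &(\forall B \in \texttt{Safe},\ C \Downarrow B \implies t \Downarrow B) \land (\forall B \in \texttt{Safe},\ t \Downarrow B \implies h \Downarrow B)\\
+  &\quad{}\land (\forall B \in \texttt{Safe},\ h \Downarrow B \implies \yhfunction{spec\_ram} (h) \Downarrow B) \land (\forall B \in \texttt{Safe},\ \yhfunction{spec\_ram} (h) \Downarrow B \implies v \Downarrow B)\\
+  &\qquad\implies (\forall B \in \texttt{Safe},\ C \Downarrow B \implies v \Downarrow B).
+\end{align*}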
\subsection{Forward simulation from 3AC to HTL}\label{sec:proof:3ac_htl}
@@ -147,8 +147,13 @@ Another simulation proof is performed to prove that the insertion of the RAM is
The other invariants and assumptions for defining two matching states in HTL are quite similar to the simulation performed in Lemma~\ref{lemma:simulation_diagram}, such as ensuring that the states have the same value, and that the values in the registers are less-defined. In particular, the less-defined relation matches up all the registers, except for the new registers introduced by the RAM.
-%\begin{lemma}\label{lemma:simulation_diagram_ram}
+\begin{lemma}[Forward simulation of RAM insertion]\label{lemma:htl_ram}
+  Given an HTL program, the forward simulation relation holds between it and the HTL program produced by inserting the RAM.
+  \begin{align*}
+    &\forall h, h', B \in \texttt{Safe}, \yhfunction{spec\_ram} (h) = h' \land h \Downarrow B \implies h' \Downarrow B.
+ \end{align*}
+\end{lemma}
\subsection{Forward simulation from HTL to Verilog}\label{sec:proof:htl_verilog}
diff --git a/references.bib b/references.bib
index 7d53f71..44e8be8 100644
--- a/references.bib
+++ b/references.bib
@@ -934,3 +934,21 @@ year = {2020},
numpages = {28},
keywords = {Generation of Object Files, Assembler Verification, Verified Separate Compilation}
}
+
+@inproceedings{10.1145/3437992.3439916,
+author = {L\"{o}\"{o}w, Andreas},
+title = {Lutsig: A Verified Verilog Compiler for Verified Circuit Development},
+year = {2021},
+isbn = {9781450382991},
+publisher = {Association for Computing Machinery},
+address = {New York, NY, USA},
+url = {https://doi.org/10.1145/3437992.3439916},
+doi = {10.1145/3437992.3439916},
+abstract = {We report on a new verified Verilog compiler called Lutsig. Lutsig currently targets (a class of) FPGAs and is capable of producing technology mapped netlists for FPGAs. We have connected Lutsig to existing Verilog development tools, and in this paper we show how Lutsig, as a consequence of this connection, fits into a hardware development methodology for verified circuits in the HOL4 theorem prover. One important step in the methodology is transporting properties proved at the behavioral Verilog level down to technology mapped netlists, and Lutsig is the component in the methodology that enables such transportation.},
+booktitle = {Proceedings of the 10th ACM SIGPLAN International Conference on Certified Programs and Proofs},
+pages = {46--60},
+numpages = {15},
+keywords = {hardware verification, hardware synthesis, compiler verification},
+location = {Virtual, Denmark},
+series = {CPP 2021}
+}
diff --git a/related.tex b/related.tex
index 8bf6743..c2cd8ff 100644
--- a/related.tex
+++ b/related.tex
@@ -26,7 +26,7 @@
\node[align=center] at (0.15,1.5) {Translation validation approaches \\ \footnotesize\cite{mentor20_catap_high_level_synth,kundu08_valid_high_level_synth,clarke03_behav_c_veril}};
\node at (0,0.5) {\bf \vericert{}};
\node[align=left] at (-1.8,0.0) {\footnotesize K\^oika \cite{bourgeat20_essen_blues}};
- \node[align=left] at (-1.8,0.32) { \footnotesize\citet{loow19_verif_compil_verif_proces}};
+ \node[align=left] at (-1.8,0.32) { \footnotesize\citet{10.1145/3437992.3439916}};
\node at (2.2,0.2) {\footnotesize\citet{ellis08}};
\node at (-1.3,-0.32) { {\footnotesize\citet{perna12_mechan_wire_wise_verif_handel_c_synth}}};
@@ -48,7 +48,7 @@ Most practical HLS tools~\citep{canis11_legup,xilinx20_vivad_high_synth,intel20_
Ongoing work in translation validation~\citep{pnueli98_trans} seeks to prove equivalence between the hardware generated by an HLS tool and the original behavioural description in C. An example of a tool that implements this is Mentor's Catapult~\citep{mentor20_catap_high_level_synth}, which tries to match the states in the 3AC description to states in the original C code after an unverified translation. Using translation validation is quite effective for verifying complex optimisations such as scheduling~\citep{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth} or code motion~\citep{banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}, but the validation has to be run every time the HLS is performed. In addition to that, the proofs are often not mechanised or directly related to the actual implementation, meaning the verifying algorithm might be wrong and hence could give false positives or false negatives.
-Finally, there are a few relevant mechanically verified tools. First, K\^{o}ika is a formally verified translation from a core fragment of BlueSpec into a circuit representation which can then be printed as a Verilog design. This is a translation from a high-level hardware description language into an equivalent circuit representation, so is a different approach to HLS. \citet{loow19_proof_trans_veril_devel_hol} used a verified translation from HOL4 code describing state transitions into Verilog to design a verified processor~\citep{loow19_verif_compil_verif_proces}. \JW{That sentence is a bit confusing because it's not clear why Loow is cited twice.} Their approach translated a shallow embedding in HOL4 into a deep embedding of Verilog.
+Finally, there are a few relevant mechanically verified tools. First, K\^{o}ika is a formally verified translation from a core fragment of BlueSpec into a circuit representation which can then be printed as a Verilog design. This is a translation from a high-level hardware description language into an equivalent circuit representation, so is a different approach to HLS. Secondly, \citet{loow19_proof_trans_veril_devel_hol} used a verified translation from HOL4 code describing state transitions into Verilog to design a verified processor~\citep{loow19_verif_compil_verif_proces}; their approach translates a shallow embedding in HOL4 into a deep embedding of Verilog. In addition to that, there is also work on formally verifying Lutsig, a synthesis tool which can transform hardware descriptions into low-level netlists~\cite{10.1145/3437992.3439916}.
\citet{perna12_mechan_wire_wise_verif_handel_c_synth,perna11_correc_hardw_synth} designed a formally verified translation from a deep embedding of Handel-C~\citep{aubury1996handel}, which is translated to a deep embedding of a circuit.
Finally, \citet{ellis08} used Isabelle to implement and reason about intermediate languages for software/hardware compilation, where parts could be implemented in hardware and the correctness could still be shown.