authorYann Herklotz <git@yannherklotz.com>2021-02-25 17:35:19 +0000
committerYann Herklotz <git@yannherklotz.com>2021-02-25 17:35:19 +0000
commit65c3f49b87ff3afc9121ce32ff30fdd4f5937fa2 (patch)
tree8d0b9d42c8914ddf49cd115e5be015e9a7e73843
parent65c1d6d9a283b8e23fe509716d16d4d32bcc523d (diff)
Add automatic counters
-rw-r--r--  main.tex  19
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/main.tex b/main.tex
index 4c395b2..39211b2 100644
--- a/main.tex
+++ b/main.tex
@@ -94,6 +94,9 @@
\newtheorem{lemma}{Lemma}
\newtheorem*{remark}{Remark}
+\newcounter{objections}
+\newcommand{\objection}[1]{\refstepcounter{objections}\paragraph{Objection \theobjections: #1}}
+
\hypersetup{draft}
\begin{document}
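
A note on the macro added above: it uses \refstepcounter rather than \stepcounter, which makes each objection a valid target for \label and \ref. A minimal usage sketch (the label name here is hypothetical, not part of the patch):

\objection{Some objection title.}\label{obj:example}
% \refstepcounter updates the current reference to \theobjections,
% so later text can write: see Objection~\ref{obj:example}.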
@@ -196,35 +199,37 @@ In what follows, we will argue our position by presenting several possible \emph
\section{Arguments against formalised HLS}
-\paragraph{Objection 1: People should not be designing hardware in C to begin with}
+\objection{People should not be designing hardware in C to begin with.}
Formally verifying HLS of C is the wrong approach. C should not be used to design hardware, let alone hardware where reliability is crucial. Instead, there have been many efforts to formally verify other parts of the hardware design flow: translating high-level hardware description languages like Bluespec with K\^{o}i\-ka~\cite{bourgeat20_essen_blues}, synthesising Verilog into technology-mapped netlists with Lutsig~\cite{loow21_lutsig}, and formalising circuit design in Coq itself to ease design verification~\cite{choi17_kami,singh_silver_oak}.
\paragraph{Our response:} Verifying HLS is nevertheless important. Not only is HLS becoming more popular, as it requires much less design effort to produce new hardware~\cite{lahti19_are_we_there_yet}, but much of that convenience comes from the easy behavioural testing that HLS allows, which is only meaningful if the HLS tool itself is correct. Verifying a C-to-Verilog flow also engages with the kind of toolflow that hardware designers actually use, as many hardware designs start life as a C program.
-\paragraph{Existing approaches for testing or formally verifying hardware designs are sufficient for ensuring reliability.}
+\objection{Existing approaches for testing or formally verifying hardware designs are sufficient for ensuring reliability.}
Instead, the design is meticulously tested, often using the original software as a model. Since these tests are performed on the hardware design directly, they have to be run on a simulator, which takes much longer than testing the original C program. Furthermore, any formal properties obtained from the C code have to be checked again on the resulting design, as the synthesis tool may have translated the input incorrectly.
-\citet{du21_fuzzin_high_level_synth_tools} showed that on average 2.5\% of randomly generated C \JW{programs}, tailored to the specific HLS tool, end up with incorrect designs. These bugs were reported and confirmed to be new bugs in the tools, demonstrating that existing internal tests did not catch \JWreplace{these bugs}{them}. \JWreplace{In addition to that,}{And} existing verification techniques for checking the output of HLS tools may not be enough to catch these bugs reliably. Checking the final design against the original model using a test bench may miss many edge cases that produce bugs.
+\citet{du21_fuzzin_high_level_synth_tools} showed that, on average, 2.5\% of randomly generated C programs, tailored to the specific HLS tool, resulted in incorrect designs. These bugs were reported and confirmed to be new bugs in the tools, demonstrating that existing internal tests had not caught them. And existing verification techniques for checking the output of HLS tools may not be enough to catch these bugs reliably: checking the final design against the original software using a test bench may miss many edge cases that produce bugs.
-There has been research on performing equivalence checks \JWreplace{on}{between} the \JW{output} design and the behavioural input, \JW{There are a few different phrases in the paper: `behavioural input', `software input', `higher level design', `input code' etc. I think it would be good to consistentify this a bit. Also it's sometimes not completely clear whether `design' means `software design' or `hardware design'.} focusing on creating translation validators~\cite{pnueli98_trans} to prove the equivalence between the design and input code, while supporting various optimisations such as scheduling~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth} or code motion~\cite{banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}. However, these aren't perfect solutions either, as there is no guarantee that these proofs really compose with each other. This means that \JWcouldcut{an} equivalence checkers are \JWreplace{normally}{often} designed to check the translation from start to finish, which is computationally expensive, as well as possibly being \JWreplace{unreliable}{highly incomplete?} due to a combination of optimisations producing an output that cannot be matched to the input, even though it is correct.
+There has been research on performing equivalence checks between the output hardware design and the input software, focusing on creating translation validators~\cite{pnueli98_trans} to prove the equivalence between the design and the input code, while supporting various optimisations such as scheduling~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth} or code motion~\cite{banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}. However, these are not perfect solutions either, as there is no guarantee that the proofs for individual passes compose with each other. Equivalence checkers are therefore often designed to check the translation from start to finish, which is computationally expensive, and can also be highly incomplete, as a combination of optimisations may produce an output that cannot be matched to the input, even though it is correct.
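
To make the composition concern concrete, this is the shape of the guarantee a per-pass translation validator provides (a sketch in our own notation, not taken from the cited works):

% S is the input software, T the output design, \sqsupseteq a
% behavioural refinement relation, and validate the untrusted checker.
\begin{equation*}
  \forall S\, T,\quad \mathit{validate}(S, T) = \mathtt{true} \;\Longrightarrow\; S \sqsupseteq T
\end{equation*}

If each pass ships its own validator, an end-to-end guarantee only follows when every pass uses the same refinement relation and that relation is transitive, which is exactly the composition property that is hard to establish after the fact.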
The radical solution to this problem is to formally verify the whole tool. This has been shown to be successful in \compcert{}~\cite{leroy09_formal_verif_realis_compil}, for example, a formally verified C compiler written in Coq~\cite{coquand86}. The reliability of a formally verified compiler was demonstrated by CSmith~\cite{yang11_findin_under_bugs_c_compil}, a generator of random, valid C programs, which found more than 300 bugs in GCC and Clang, but only 5 bugs in the unverified parts of \compcert{}, which prompted the verification of those parts as well.
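
By contrast with per-pass validation, the theorem a fully verified compiler proves has roughly this shape (again a sketch in our notation; \compcert{}'s actual semantic preservation theorem is the reference point):

\begin{equation*}
  \forall p\; t,\quad \mathit{compile}(p) = \mathtt{OK}(t) \;\Longrightarrow\; p \sqsupseteq t
\end{equation*}

% Here the refinement is established once, for the composition of all
% passes, so the per-pass guarantees cannot fail to compose.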
-\paragraph{HLS applications don't require the levels of reliability that a formally verified compiler affords.}
+\objection{HLS applications don't require the levels of reliability that a formally verified compiler affords.}
One might argue that developing a formally verified tool in a theorem prover and proving correctness theorems about it is too tedious and takes too long, and that HLS tools specifically do not need that kind of reliability. From our experience developing a verified HLS tool called \vericert{}~\cite{herklotz21_formal_verif_high_level_synth} based on \compcert{}, it normally takes $5\times$ to $10\times$ longer to prove a translation correct than to write the algorithm itself.
However, this effort can be seen as beneficial: proving the correctness of the HLS tool establishes the absence of bugs with respect to the language semantics, meaning much less time has to be spent fixing them. Verification also forces the algorithm to deal with many edge cases that are otherwise hard to identify, and may even allow for more optimisations, as one can be certain about assumptions one would usually have to make conservatively.
-\paragraph{Any HLS tool that is simple enough for formal verification to be feasible won't produce sufficiently optimised designs to be useful.}
+\objection{Any HLS tool that is simple enough for formal verification to be feasible won't produce sufficiently optimised designs to be useful.}
Another concern might be that a verified HLS tool might not be performant enough to be usable in practice; if so, the verification effort could be seen as wasted. Taking \vericert{} as an example, which does not currently include many optimisations, we compared the fully verified parts of \vericert{} against \legup{}~\cite{canis11_legup} and found that speed and area were comparable ($1\times$--$1.5\times$) to \legup{} without LLVM optimisations and without operation chaining. With those optimisations fully turned on, \vericert{} is around $4.5\times$ slower than \legup{}, with half of the speed-up being due to LLVM.\@
There are many optimisations that need to be added to \vericert{} to turn it into a viable and competitive HLS tool. The most important addition is a good scheduling implementation, which supports operation chaining and proper pipelining of operators. Our main focus is implementing scheduling based on systems of difference constraints (SDC)~\cite{cong06_sdc}, which is the same algorithm \legup{} uses. With this optimisation turned on, \vericert{} is only around $2\times$ to $3\times$ slower than fully optimised \legup{}, with a slightly larger area. The scheduling step is implemented using verified translation validation, meaning the scheduling algorithm can be tweaked and optimised without ever having to touch the correctness proof.
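
To illustrate why SDC scheduling is attractive, here is a sketch of the kind of constraints it generates (notation ours, after the formulation in the cited paper): each operation $u$ is assigned an integer start cycle $s_u$, and every constraint is a difference of two such variables.

\begin{align*}
  s_v - s_u &\ge \mathit{lat}(u) && \text{data dependency: $v$ consumes the result of $u$}\\
  s_v - s_u &\ge 1 && \text{chaining bound: the combinational path $u \rightarrow v$ exceeds the clock period}
\end{align*}

% Every constraint has the form s_i - s_j <= c, so the LP relaxation has
% a totally unimodular constraint matrix and integral optimal solutions,
% which is what makes the approach both fast and flexible.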
-\paragraph{Even a formally verified HLS tool can't give absolute guarantees about the hardware designs it produces.}
+\objection{Even a formally verified HLS tool can't give absolute guarantees about the hardware designs it produces.}
It is true that a verified tool is still allowed to fail at compilation time, meaning none of the correctness proofs need to hold if no output is produced. However, making the tool reliable enough to always produce an output is mostly a matter of engineering work. This property is also quite simple to check: the tool can be tested on randomly generated inputs and checked for successful compilation, without ever having to execute or simulate the output.