authorYann Herklotz <git@yannherklotz.com>2021-04-06 09:19:04 +0100
committerYann Herklotz <git@yannherklotz.com>2021-04-06 09:19:04 +0100
commita1bc54afd0757a6313bfc966a247fcc8e42901fb (patch)
tree97b5941030dd762416342f90c757076e0a017aa6
parent16221f94fe797f73538f1c9228d59a4ce1c2d22b (diff)
Add
-rw-r--r--  main.tex        | 34
-rw-r--r--  references.bib  |  9
2 files changed, 24 insertions, 19 deletions
diff --git a/main.tex b/main.tex
index 2e03ff0..8588a79 100644
--- a/main.tex
+++ b/main.tex
@@ -75,7 +75,7 @@
\hypersetup{draft}
\begin{document}
-\title[HLS Tools Should be Proven Correct]{High-Level Synthesis Tools Should be Proven Correct}
+\title[HLS Tools should be Proven Correct]{High-Level Synthesis Tools should be Proven Correct}
\author{Yann Herklotz}
\orcid{0000-0002-2329-1029}
@@ -118,44 +118,48 @@ In what follows, we will argue our position by presenting several possible \emph
\section{Arguments against formalised HLS}
-\objection{People should not be designing hardware in C to begin with}{Formally verifying HLS of C is the wrong approach. C should not be used to design hardware, let alone hardware where reliability is crucial. First of all, HLS tools are not unreliable in terms of correctness, but more so in terms of the quality of the hardware, which can be quite unpredictable~\cite{nigam20_predic_accel_desig_time_sensit_affin_types}. For example, when writing C for an HLS tool, small changes in the input could result in large differences in area and performance \JWreplace{for}{of} the resulting hardware, \JWcouldcut{leading to a kind of flakiness in the output}. \JWreplace{Instead}{On the other hand}, there have been many efforts to formally verify the translation of high-level hardware description languages like Bluespec with K\^{o}i\-ka~\cite{bourgeat20_essen_blues}, \JWreplace{formalising}{to formalise} the synthesis of Verilog into technology-mapped net-lists with Lutsig~\cite{loow21_lutsig}, or \JWreplace{formalising}{to formalise} circuit design in Coq itself to ease design verification~\cite{choi17_kami,singh_silver_oak}.}
+\objection{People should not be designing hardware in C to begin with}{Formally verifying HLS of C is the wrong approach. C should not be used to design hardware, let alone hardware where reliability is crucial. First of all, the unreliability of HLS tools lies not so much in their correctness as in the quality of the generated hardware, which can be quite unpredictable~\cite{nigam20_predic_accel_desig_time_sensit_affin_types}. For example, when writing C for an HLS tool, small changes in the input can result in large differences in the area and performance of the resulting hardware. On the other hand, there have been many efforts to formally verify the translation of high-level hardware description languages like Bluespec with K\^{o}i\-ka~\cite{bourgeat20_essen_blues}, to formalise the synthesis of Verilog into technology-mapped net-lists with Lutsig~\cite{loow21_lutsig}, or to formalise circuit design in Coq itself to ease design verification~\cite{choi17_kami,singh_silver_oak}.}
-\response{} Verifying HLS is also important. First, C is often the starting point for hardware designs, as initial models are written in those languages to get a quick prototype~\cite{gajski10_what_hls}, so it is only natural to continue using C when designing the hardware. Not only is HLS from C becoming more popular, but much of that convenience comes from the easy behavioural testing that HLS allows to ensure correct functionality of the design~\cite{lahti19_are_we_there_yet}. This assumes that HLS tools are correct. Finally, even though unpredictability of the output of HLS tools might seem like the largest issue, working on a functionally correct HLS tool provides a good baseline to work on improving the predictability of the output as well. Reasoning about correctness could maybe be extended with proofs about the variability of the generated output hardware, thereby also improving the quality of the output. \JW{In this vein, }\compcert{}~\cite{leroy09_formal_verif_realis_compil}, an existing formally verified C compiler, has recently been extended with a proof of preservation of constant-time~\cite{barthe20_formal_c}, and a similar approach could be taken to \JWcouldcut{improve} to prove properties about the preservation of the hardware area or performance. % JW: Yes, this is great stuff.
+\response{} Verifying HLS is also important. First, C is often the starting point for hardware designs, as initial models are written in C to produce a quick prototype~\cite{gajski10_what_hls}, so it is only natural to continue using C when designing the hardware. Not only is HLS from C becoming more popular, but much of its convenience comes from the easy behavioural testing that HLS allows to ensure correct functionality of the design~\cite{lahti19_are_we_there_yet}; this, however, assumes that HLS tools are correct. Finally, even though unpredictability of the output of HLS tools might seem like the largest issue, working on a functionally correct HLS tool provides a good baseline for improving the predictability of the output as well. Reasoning about correctness could perhaps be extended with proofs about the variability of the generated hardware, thereby also improving the quality of the output. In this vein, \compcert{}~\cite{leroy09_formal_verif_realis_compil}, an existing formally verified C compiler, has recently been extended with a proof of preservation of constant-time~\cite{barthe20_formal_c}, and a similar approach could be taken to prove properties about the preservation of the hardware area or performance.
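+As a sketch of what such a statement might look like (this is not an existing theorem, and the notation is ours), one could aim for a preservation property of the form
+\[ \forall p,\; \mathit{cost}_{\mathrm{src}}(p) \le B \implies \mathit{cost}_{\mathrm{hw}}(\mathit{HLS}(p)) \le g(B), \]
+for some predictable function $g$, mirroring the way the constant-time result is stated as the preservation of a source-level property through compilation.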
\objection{HLS tools are already commonly used in industry, so they are clearly already reliable enough}{}
-\response{} They are widely used, but they are also widely acknowledged to be quite flaky. In prior work~\cite{du21_fuzzin_high_level_synth_tools}, we have \JWreplace{showed}{shown} that on average 2.5\% of randomly generated C programs, tailored to the specific HLS tool, end up with incorrect designs. These bugs were reported and confirmed to be new bugs in the tools, demonstrating that existing internal tests did not catch them. And existing verification techniques for checking the output of HLS tools may not be enough to catch these bugs reliably. Checking the final design against the original model using a test bench may miss edge cases that produce bugs.
+\response{} They are widely used, but they are also widely acknowledged to be quite flaky. In prior work~\cite{du21_fuzzin_high_level_synth_tools}, we have shown that on average 2.5\% of randomly generated C programs, tailored to the specific HLS tool, end up with incorrect designs. These bugs were reported and confirmed to be new bugs in the tools, demonstrating that existing internal tests did not catch them.
\objection{Existing approaches for testing or formally verifying hardware designs are sufficient for ensuring reliability}{
-Besides the use of test benches to test designs produced by HLS, there has been research on performing equivalence checks between the output design and the behavioural input, focusing on creating translation validators~\cite{pnueli98_trans} to prove \JWcouldcut{the} equivalence between the design and \JW{the} input code, while supporting various optimisations such as scheduling~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth} or code motion~\cite{banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}. However, these aren't perfect solutions either, as there is no guarantee that these proofs really compose with each other. In response, equivalence checkers are often designed to check the translation from start to finish, but this is computationally expensive, as well as possibly being highly incomplete.}
+Besides the use of test-benches to test designs produced by HLS, there has been research on checking equivalence between the output design and the behavioural input. In particular, translation validators~\cite{pnueli98_trans} have been used to prove the generated design equivalent to the input code, while supporting various optimisations such as scheduling~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth} or code motion~\cite{banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}.}
-\JW{Hm, the flow doesn't quite make sense around here. Objection 3 starts by saying that `existing approaches for testing or formally verifying hardware designs are sufficient' but Response 2 has already said that `existing verification techniques for checking the output of HLS tools may not be enough'. And the text at the end of Objection 3 is already saying that translation validators aren't very good -- it feels like that belongs in the `response' rather than the `objection' itself. One idea: you could end Response 2 after `catch them'. Then Objection 3 suggests a variety of techniques for checking the output of HLS tools (translation validation of each stage, equivalence checking of the whole thing, etc). Then Response 3 demolishes each in turn, ending up with formal verification as the only way forward.}
+%\JW{Hm, the flow doesn't quite make sense around here. Objection 3 starts by saying that `existing approaches for testing or formally verifying hardware designs are sufficient' but Response 2 has already said that `existing verification techniques for checking the output of HLS tools may not be enough'. And the text at the end of Objection 3 is already saying that translation validators aren't very good -- it feels like that belongs in the `response' rather than the `objection' itself. One idea: you could end Response 2 after `catch them'. Then Objection 3 suggests a variety of techniques for checking the output of HLS tools (translation validation of each stage, equivalence checking of the whole thing, etc). Then Response 3 demolishes each in turn, ending up with formal verification as the only way forward.}
%\JW{could end sentence here, as the rest is a little convoluted and perhaps unnecessary} due to a combination of optimisations producing \JWreplace{an}{a correct} output that cannot be matched to the input, \JWcouldcut{even though it is correct.}
-\response{} The radical solution to this problem is to formally verify the whole tool. This proved successful for \compcert{}~\cite{leroy09_formal_verif_realis_compil}, for example, which is a formally verified C compiler written in Coq~\cite{coquand86}. The reliability of this formally verified compiler was demonstrated by Csmith~\cite{yang11_findin_under_bugs_c_compil}, a random, valid C generator, finding more than 300 bugs in GCC and Clang, but no bugs in the verified parts of \compcert{}.
+\response{} Existing verification techniques for checking the output of HLS tools may not be enough to catch these bugs reliably. Checking the final design against the original model using a test-bench may miss edge cases that produce bugs, and the test-benches themselves need to be kept up to date as new features are introduced. Translation validation seems like a good solution, as the tool can be checked automatically as it generates hardware, and any potential issues are reported at the time the hardware is generated. However, this is not a perfect solution either, as there is no guarantee that the translation validation proofs for the individual passes really compose with each other. In response, equivalence checkers are often designed to check the translation from start to finish, but this is computationally expensive, as well as possibly being highly incomplete. Moreover, the translation validation algorithms themselves have not been mechanically verified in a theorem prover, so there is no guarantee that the validator is correct, which in turn calls for further testing.
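+To make the composition concern concrete, a validator for a single pass typically comes with a soundness statement of roughly the following shape (notation ours):
+\[ \forall S\, T,\; \mathit{validate}(S, T) = \mathit{true} \implies T \preceq S, \]
+where $\preceq$ is a behavioural refinement relation. An end-to-end guarantee only follows if every pass has such a statement, each statement is actually proven, and the refinement relations used by the different passes are shown to compose transitively.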
-\objection{HLS applications don't require the levels of reliability that a formally verified compiler affords}{One might argue that developing a formally verified tool in a theorem prover and proving correctness theorems about it might take too long, and that HLS tools specifically do not need that kind of reliability. Indeed, in our experience developing a verified HLS tool called \vericert{}~\cite{herklotz21_formal_verif_high_level_synth} based on \compcert{}, we found that it normally takes $5\times$ or $10\times$ longer to prove a translation \JW{I find `a translation' confusing because it sounds like `a translation of a particular source program' like in `translation validation'. Perhaps you could say `compiler pass' or something?} correct compared to writing the algorithm.}
+The radical solution to this problem is to formally verify the whole tool. This proved successful for \compcert{}~\cite{leroy09_formal_verif_realis_compil}, for example, which is a formally verified C compiler written in Coq~\cite{coquand86}. The reliability of this formally verified compiler was demonstrated by Csmith~\cite{yang11_findin_under_bugs_c_compil}, a generator of random, valid C programs, which found more than 300 bugs in GCC and Clang, but no bugs in the verified parts of \compcert{}.
-\response{} However, \JWcouldcut{this could be seen as being beneficial, as} proving the correctness of the HLS tool proves the absence of any bugs according to the language semantics, meaning much less time has to be spent on fixing bugs. In addition to that, verification also forces the algorithm to deal with many different edge cases that may be hard to identify normally, and may even allow for more optimisations to be performed, which may at first glance seem unreliable. \JW{Nice. I wonder if there's anything we could cite for that. Any examples where formal verification has led to improved performance?}
+\objection{HLS applications don't require the levels of reliability that a formally verified compiler affords}{One might argue that developing a formally verified tool in a theorem prover and proving correctness theorems about it might take too long, and that HLS tools specifically do not need that kind of reliability. Indeed, in our experience developing a verified HLS tool called \vericert{}~\cite{herklotz21_formal_verif_high_level_synth} based on \compcert{}, we found that it normally takes $5\times$ or $10\times$ longer to prove a compiler pass correct compared to writing the algorithm.}
+
+\response{} However, proving the correctness of the HLS tool establishes the absence of any bugs with respect to the language semantics, meaning much less time has to be spent on fixing bugs. In addition, verification forces the algorithm to deal with many edge cases that would otherwise be hard to identify.
+
+%and may even allow for more optimisations to be performed, which may at first glance seem unreliable. \JW{Nice. I wonder if there's anything we could cite for that. Any examples where formal verification has led to improved performance?}
%\JW{I can't quite understand what `being certain about assumptions one would usually have to make' means. Can it be said more plainly?}\YH{Yes, I was mainly trying to explain that proofs help you be certain that your optimisations are correct, and so you can apply them in more cases sometimes.}
\objection{Any HLS tool that is simple enough for formal verification to be feasible won't produce sufficiently optimised designs to be useful}
-{\JWcouldcut{Another concern is that a verified HLS tool might not be performant enough to be usable in practice.} If that is the case, then the verification effort could be seen as useless, as it could not be used.}
+{If that is the case, then the verification effort could be seen as wasted, as the resulting tool would not be used in practice.}
-\response{} We think that even a verified HLS tool can be comparable in performance to a state-of-the-art unverified HLS tool. Taking \vericert{} as an example, which does not currently include many optimisations, we found that performing comparisons between \JWcouldcut{the fully verified bits of} % I deleted that because it looked like you were referring to the fully verified bits of both LegUp and Vericert
-\vericert{} and \legup{}~\cite{canis11_legup}, we found that the speed and area \JWreplace{was}{were} comparable ($1\times$ - $1.5\times$) to that of \legup{} without LLVM optimisations and without operation chaining. With those optimisations fully turned on, \vericert{} is around $4.5\times$ slower than \legup{}, with half the speed up being due to LLVM.\@
+\response{} We think that even a verified HLS tool can be comparable in performance to a state-of-the-art unverified HLS tool. Taking \vericert{}, which does not currently include many optimisations, as an example, a comparison between \vericert{} and \legup{}~\cite{canis11_legup} showed that the speed and area of the generated hardware were comparable ($1\times$--$1.5\times$) to those of \legup{} without LLVM optimisations and without operation chaining. With those optimisations fully turned on, \vericert{} is around $4.5\times$ slower than \legup{}, with half of that speed-up being due to LLVM.\@
-There are many optimisations that need to be added to \vericert{} to turn it into a viable and competitive HLS tool. First of all, a good scheduling implementation that supports operation chaining and pipelined operators is critical. Our main focus is implementing scheduling based on systems of difference constraints~\cite{cong06_sdc}, which is the same algorithm \legup{} uses. With this optimisation turned on, \vericert{} is only $~2\times$ to $~3\times$ slower than fully optimised \legup{}, with a slightly larger area. The scheduling step is implemented using verified translation validation, \JWreplace{meaning}{which means that} the scheduling algorithm can be tweaked and optimised without ever having to touch the correctness proof.
+There are many optimisations that need to be added to \vericert{} to turn it into a viable and competitive HLS tool. First of all, a good scheduling implementation that supports operation chaining and pipelined operators is critical. Our main focus is implementing scheduling based on systems of difference constraints (SDC)~\cite{cong06_sdc}, which is the same algorithm \legup{} uses. With this optimisation turned on, \vericert{} is only around $2\times$--$3\times$ slower than fully optimised \legup{}, with a slightly larger area. The scheduling step is implemented using verified translation validation, which means that the scheduling algorithm can be tweaked and optimised without ever having to touch the correctness proof.
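+As a brief sketch of the SDC formulation (notation ours), each operation $i$ is assigned a start cycle $s_i$, and scheduling requirements are encoded as integer difference constraints, for example
+\[ s_j - s_i \ge \mathit{lat}_i \quad \text{for a data dependency from operation } i \text{ to operation } j, \]
+with timing and resource requirements expressed in the same $s_u - s_v \le c$ shape; the constraint matrix of such a system is totally unimodular, so an optimal integer schedule can be obtained by linear programming.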
\objection{Even a formally verified HLS tool can't give absolute guarantees about the hardware it produces}{}
-\response{} It is true that a verified tool is still allowed to fail at compile time, \JWreplace{meaning none of the correctness proofs need to hold if no output is produced}{which means that no output is produced despite the input being valid}. However, this is mostly a matter of putting more engineering work into the tool to make it more complete. Bugs are easier to identify as they will induce tool failures at compile time.
+\response{} It is true that a verified tool is still allowed to fail at compile time, meaning it may produce no output even though the input is valid. However, this is mostly a matter of putting more engineering work into the tool to make it more complete. Bugs are easier to identify as they will induce tool failures at compile time.
In addition to that, specifically for an HLS tool taking C as input, undefined behaviour in the input code will allow the HLS tool to behave any way it wishes. This becomes even more important when passing the C to a verified HLS tool, because if it is not free of undefined behaviour, then none of the proofs will hold. Extra steps therefore need to be performed to ensure that the input is free of any undefined behaviour, by using a tool like VST~\cite{appel11_verif_softw_toolc} for example.
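+As a purely hypothetical illustration (the fragment below is ours and not taken from any benchmark), code of the following kind silently voids any guarantee the verified tool provides, because the multiplication overflows a signed integer for large \texttt{x}, which is undefined behaviour in C:
+\begin{verbatim}
+int scale(int x) {
+  /* undefined behaviour: signed overflow when x > INT_MAX / 65536 */
+  int y = x * 65536;
+  return y >> 8;
+}
+\end{verbatim}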
-Finally, the input and output language semantics \JWcouldcut{also} need to be trusted, as the proofs only hold as long as the semantics are a faithful representation of the languages. In \vericert{} this comes down to trusting the C semantics developed by \compcert{}~\cite{blazy09_mechan_seman_cligh_subset_c_languag} and the Verilog semantics that we adapted from \citet{loow19_proof_trans_veril_devel_hol}.
+Finally, the input and output language semantics need to be trusted, as the proofs only hold as long as the semantics are a faithful representation of the languages. In \vericert{} this comes down to trusting the C semantics developed by \compcert{}~\cite{blazy09_mechan_seman_cligh_subset_c_languag} and the Verilog semantics that we adapted from \citet{loow19_proof_trans_veril_devel_hol}.
\section{Conclusion}
diff --git a/references.bib b/references.bib
index 3779f02..34cb044 100644
--- a/references.bib
+++ b/references.bib
@@ -185,12 +185,13 @@
note = {(under review)}
}
-@unpublished{du21_fuzzin_high_level_synth_tools,
+@inproceedings{du21_fuzzin_high_level_synth_tools,
author = {Herklotz, Yann and Du, Zewei and Ramanathan, Nadesh and Wickerson, John},
- note = {(under review)},
+ note = {(to appear)},
year = 2021,
url = {https://yannherklotz.com/docs/drafts/fuzzing_hls.pdf},
- title = {{Fuzzing High-Level Synthesis Tools}},
+ title = {An Empirical Study of the Reliability of High-Level Synthesis Tools},
+ booktitle = {29th IEEE International Symposium on Field-Programmable Custom Computing Machines},
}
@misc{polybench,
@@ -203,7 +204,7 @@
@inproceedings{cong06_sdc,
author = {J. {Cong} and {Zhiru Zhang}},
title = {An efficient and versatile scheduling algorithm based on SDC formulation},
- booktitle = {2006 43rd ACM/IEEE Design Automation Conference},
+ booktitle = {43rd ACM/IEEE Design Automation Conference},
year = 2006,
pages = {433-438},
doi = {10.1145/1146909.1147025},