author    Yann Herklotz <ymh15@ic.ac.uk>    2021-02-25 20:34:37 +0000
committer overleaf <overleaf@localhost>     2021-02-26 13:01:18 +0000
commit    222363cb6cc1814a32c60cdf078bf9608094c3a7 (patch)
tree      24f6a1ee25ee7a4ffa25b8ed5af6bc6c6a91d928
parent    42971f475053f87938f66b6cbb3d61c046b78ebe (diff)
Update on Overleaf.
-rw-r--r--  main.tex  36
1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/main.tex b/main.tex
index 1bd4fcc..294b878 100644
--- a/main.tex
+++ b/main.tex
@@ -148,7 +148,7 @@
%% Keywords
%% comma separated list
-\keywords{\compcert{}, Coq, high-level synthesis, C, Verilog}
+%\keywords{\compcert{}, Coq, high-level synthesis, C, Verilog}
\maketitle
@@ -161,9 +161,9 @@
%\JW{Richard Bornat used to have a nice saying about `proving the programs that people actually write'. You could say you're engaging with the kind of toolflow (i.e. C to Verilog) that hardware designers actually use (and for that, you could cite that magazine article we cited in the PLDI submission about `most hardware designs' start life as a C program').}
-Research in high-level synthesis (HLS) often concentrates on performance: trying to achieve the lowest area with the shortest run-time. What is often overlooked is ensuring that the HLS tool is indeed correct, which means that it outputs hardware designs that are equivalent to the behavioural input.
+Research in high-level synthesis (HLS) often concentrates on performance: trying to achieve the lowest area with the shortest run-time. What is often overlooked is ensuring that the HLS tool is correct, which means that it outputs hardware designs that are equivalent to the behavioural input.
-When working with HLS tools, it is often assumed that they transform the behavioural input into a semantically equivalent design~\cite{lahti19_are_we_there_yet} for example. However, this is not the case, and as with all complex pieces of software there are bugs in HLS tools as well. For example, Vivado HLS was found to incorrectly apply pipelining optimisations\footnote{\url{https://bit.ly/vivado-hls-pipeline-bug}} or generate incorrect designs silently when straying outside of the supported C\footnote{\url{https://bit.ly/vivado-hls-pointer-bug}}. These types of bugs are difficult to identify, and exist because firstly it is not quite clear what input these tools support, and secondly whether the output design actually behaves the same as the input.
+When working with HLS tools, it is often assumed that they transform the behavioural input into a semantically equivalent design~\cite{lahti19_are_we_there_yet}. However, this is not always the case: as with all complex pieces of software, HLS tools contain bugs. For example, Vivado HLS was found to apply pipelining optimisations incorrectly\footnote{\url{https://bit.ly/vivado-hls-pipeline-bug}} or to silently generate incorrect designs when straying outside of the supported subset of C.\footnote{\url{https://bit.ly/vivado-hls-pointer-bug}} These types of bugs are difficult to identify, and they exist because, firstly, it is not quite clear what input these tools support, and, secondly, there is no guarantee that the output design actually behaves the same as the input.
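+
+As a hypothetical sketch of the second class of bug (our own example, not the reported case), consider legal C that relies on pointer aliasing, a feature that HLS tools often support only partially and may accept without warning:
+\begin{verbatim}
+// Hypothetical sketch: well-defined C in software, but
+// reliant on pointer aliasing that an HLS tool with weak
+// alias analysis might silently translate incorrectly.
+int sum_pairs(int a[4]) {
+    int *b = a + 1;        // b aliases a at offset 1
+    int s = 0;
+    for (int i = 0; i < 3; i++)
+        s += a[i] * b[i];  // overlapping reads through
+                           // two different pointers
+    return s;
+}
+\end{verbatim}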
\paragraph{Our position} We believe that a formally verified high-level synthesis tool could be the solution to these problems. It not only guarantees that the output is correct, but also brings a formal specification of the input and output language semantics. These are the only parts of the compiler that need to be trusted, and if these are well-specified, then the behaviour of the resulting design can be fully trusted. In addition to that, if the semantics of the input language are taken from a tool that is widely trusted already, then there should not be any strange behaviour; the resultant design will either behave exactly like the input specification, or the translation will fail early at compile time. To this end, we have built a formally verified HLS tool called \vericert{}~\cite{herklotz21_formal_verif_high_level_synth}.
@@ -173,25 +173,25 @@ In what follows, we will argue our position by presenting several possible \emph
\objection{People should not be designing hardware in C to begin with}
-Formally verifying HLS of C is the wrong approach. C should not be used to design hardware, let alone hardware where reliability is crucial. Instead, there have been many efforts to formally verify the translation of high-level hardware description languages like Bluespec with K\^{o}i\-ka~\cite{bourgeat20_essen_blues}, formalising the synthesis of Verilog into technology-mapped net-lists with Lutsig~\cite{loow21_lutsig}, or work on formalising circuit design in Coq itself to ease design verification~\cite{choi17_kami,singh_silver_oak}.
+Formally verifying HLS of C is the wrong approach. C should not be used to design hardware, let alone hardware where reliability is crucial. Instead, there have been many efforts to formally verify the translation of high-level hardware description languages like Bluespec with K\^{o}i\-ka~\cite{bourgeat20_essen_blues}, formalising the synthesis of Verilog into technology-mapped net-lists with Lutsig~\cite{loow21_lutsig}, or formalising circuit design in Coq itself to ease design verification~\cite{choi17_kami,singh_silver_oak}.
-\textbf{\textit{Response:}} Verifying HLS is also important. Firstly, C is often the starting point for hardware designs, as initial models are written in those languages to get a quick prototype~\cite{gajski10_what_hls}, so it is only natural to continue using C when designing the hardware. Not only is HLS from C becoming more popular, but much of that convenience also comes from the easy behavioural testing that HLS allows to ensure correct functionality of the design~\cite{lahti19_are_we_there_yet}. This assumes that HLS tools are correct.
+\textbf{\textit{Response:}} Verifying HLS is also important. Firstly, C is often the starting point for hardware designs, as initial models are written in C to get a quick prototype~\cite{gajski10_what_hls}, so it is only natural to continue using C when designing the hardware. Not only is HLS from C becoming more popular, but much of its convenience comes from the easy behavioural testing that HLS allows to ensure correct functionality of the design~\cite{lahti19_are_we_there_yet}. This, however, assumes that HLS tools are correct.
\objection{HLS tools are already commonly used in industry, so they are clearly already reliable enough}
-\paragraph{Response:} They are widely used, but they are also widely acknowledged to be quite flaky. \citet{du21_fuzzin_high_level_synth_tools} showed that on average 2.5\% of randomly generated C programs, tailored to the specific HLS tool, end up with incorrect designs. These bugs were reported and confirmed to be new bugs in the tools, demonstrating that existing internal tests did not catch them. And existing verification techniques for checking the output of HLS tools may not be enough to catch these bugs reliably. Checking the final design against the original model using a test bench may miss many edge cases that produce bugs.
+\paragraph{Response:} They are widely used, but they are also widely acknowledged to be quite flaky. \citet{du21_fuzzin_high_level_synth_tools} showed that, on average, 2.5\% of randomly generated C programs, tailored to the specific HLS tool, end up as incorrect designs. These bugs were reported and confirmed to be new bugs in the tools, demonstrating that existing internal tests had not caught them. Moreover, existing verification techniques for checking the output of HLS tools may not be enough to catch these bugs reliably: checking the final design against the original model using a test bench may miss edge cases that trigger bugs.
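+
+To illustrate why test benches miss such cases, here is a minimal sketch (the function and its name are ours, purely for illustration):
+\begin{verbatim}
+// Minimal sketch: a boundary case that a test bench of
+// nominal inputs is unlikely to exercise.
+unsigned char sat_add(unsigned char a, unsigned char b) {
+    unsigned sum = (unsigned)a + b;
+    return sum > 255 ? 255 : (unsigned char)sum;
+}
+// A test bench covering a few mid-range values would pass
+// even if synthesis mishandled the carry out of the
+// 8-bit addition.
+\end{verbatim}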
\objection{Existing approaches for testing or formally verifying hardware designs are sufficient for ensuring reliability}
-Besides the use of test benches to test designs produced by HLS, which suffers from many missing edge cases, there has been research on performing equivalence checks between the output design and the behavioural input, focusing on creating translation validators~\cite{pnueli98_trans} to prove the equivalence between the design and input code, while supporting various optimisations such as scheduling~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth} or code motion~\cite{banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}. However, these aren't perfect solutions either, as there is no guarantee that these proofs really compose with each other. In response, equivalence checkers are often designed to check the translation from start to finish, but this is computationally expensive, as well as possibly being highly incomplete.
+Besides the use of test benches to test designs produced by HLS, there has been research on performing equivalence checks between the output design and the behavioural input, focusing on creating translation validators~\cite{pnueli98_trans} to prove the equivalence between the design and input code, while supporting various optimisations such as scheduling~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth} or code motion~\cite{banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}. However, these are not perfect solutions either, as there is no guarantee that the individual proofs compose with each other. In response, equivalence checkers are often designed to check the translation from start to finish, but this is computationally expensive and possibly highly incomplete.
%\JW{could end sentence here, as the rest is a little convoluted and perhaps unnecessary} due to a combination of optimisations producing \JWreplace{an}{a correct} output that cannot be matched to the input, \JWcouldcut{even though it is correct.}
-\textbf{\textit{Response:}} The radical solution to this problem is to formally verify the whole tool. This has been shown to be successful in \compcert{}~\cite{leroy09_formal_verif_realis_compil}, for example, which is a formally verified C compiler written in Coq~\cite{coquand86}. The reliability of this formally verified compiler was demonstrated by CSmith~\cite{yang11_findin_under_bugs_c_compil}, a random, valid C generator, which found more than 300 bugs in GCC and Clang, but no bugs in the verified parts of \compcert{}.
+\textbf{\textit{Response:}} The radical solution to this problem is to formally verify the whole tool. This proved successful for \compcert{}~\cite{leroy09_formal_verif_realis_compil}, for example, which is a formally verified C compiler written in Coq~\cite{coquand86}. The reliability of this formally verified compiler was demonstrated by Csmith~\cite{yang11_findin_under_bugs_c_compil}, a generator of random valid C programs, which found more than 300 bugs in GCC and Clang, but none in the verified parts of \compcert{}.
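+
+For a flavour of what such generated programs look like, the following is a hand-written sketch in the spirit of Csmith's output (not an actual Csmith program): valid, deterministic C whose result hinges on integer promotion and conversion rules, which is exactly where compiler bugs have historically hidden.
+\begin{verbatim}
+#include <stdint.h>
+
+static uint8_t g = 5;
+
+uint32_t fn(uint16_t a) {
+    // g and a are promoted to int before the arithmetic,
+    // then the sum is narrowed back to uint8_t; such
+    // conversion chains stress rules a compiler must get right
+    return (uint32_t)((uint8_t)(g + a) ^ (a >> 3));
+}
+\end{verbatim}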
\objection{HLS applications don't require the levels of reliability that a formally verified compiler affords}
-One might argue that developing a formally verified tool in a theorem prover and proving correctness theorems about it might be too tedious and take too long, and that HLS tools specifically do not need that kind of reliability. Indeed, with our experience developing a verified HLS tool called \vericert{}~\cite{herklotz21_formal_verif_high_level_synth} based on \compcert{}, we found that it normally takes $5\times$ or $10\times$ longer to prove a translation correct compared to writing the algorithm.
+One might argue that developing a formally verified tool in a theorem prover and proving correctness theorems about it might take too long, and that HLS tools specifically do not need that kind of reliability. Indeed, in our experience developing a verified HLS tool called \vericert{}~\cite{herklotz21_formal_verif_high_level_synth} based on \compcert{}, we found that it normally takes $5\times$ to $10\times$ longer to prove a translation correct than to write the algorithm itself.
\textbf{\textit{Response:}} However, this cost could be seen as beneficial: proving the correctness of the HLS tool establishes the absence of any bugs with respect to the language semantics, meaning much less time has to be spent on fixing bugs. In addition, verification forces the algorithm to deal with many edge cases that would otherwise be hard to identify, and may even enable optimisations that would at first glance seem too unreliable to attempt.
@@ -199,15 +199,19 @@ One might argue that developing a formally verified tool in a theorem prover and
\objection{Any HLS tool that is simple enough for formal verification to be feasible won't produce sufficiently optimised designs to be useful}
-Another concern might be that a verified HLS tool might not be performant enough to be usable in practice. If that is the case, then the verification effort could be seen as useless, as it could not be used.
+Another concern is that a verified HLS tool might not be performant enough to be usable in practice. If so, the verification effort could be seen as wasted, since the tool would not be adopted.
-\textbf{\textit{Response:}} We think that even a verified HLS tool can be comparable in performance to a state-of-the-art unverified HLS tool. Taking \vericert{} as an example, which does not currently include many optimisations, we found that performing comparisons between the fully verified bits of \vericert{} and \legup{}~\cite{canis11_legup}, we found that the speed and area was comparable ($1\times$ - $1.5\times$) to that of \legup{} without LLVM optimisations and without operation chaining. With those optimisations fully turned on, \vericert{} is around $4.5\times$ slower than \legup{}, with half of the speed-up being due to LLVM.\@
+\textbf{\textit{Response:}} We think that even a verified HLS tool can be comparable in performance to a state-of-the-art unverified HLS tool. Taking \vericert{}, which does not currently include many optimisations, as an example: comparing the fully verified parts of \vericert{} against \legup{}~\cite{canis11_legup}, we found that speed and area were comparable ($1\times$--$1.5\times$) to \legup{} without LLVM optimisations and without operation chaining. With those optimisations fully turned on, \vericert{} is around $4.5\times$ slower than \legup{}, with half of the speed-up being due to LLVM.\@
-There are many optimisations that need to be added to \vericert{} to turn it into a viable and competitive HLS tool. First of all, the most important addition is a good scheduling implementation that supports operation chaining and properly pipelining operators. Our main focus is implementing scheduling based on systems of difference constraints~\cite{cong06_sdc}, which is the same algorithm \legup{} uses. With this optimisation turned on, \vericert{} is only $~2\times$ to $~3\times$ slower than fully optimised \legup{}, with a slightly larger area. The scheduling step is implemented using verified translation validation, meaning the scheduling algorithm can be tweaked and optimised without ever having to touch the correctness proof.
+There are many optimisations that need to be added to \vericert{} to turn it into a viable and competitive HLS tool. First of all, a good scheduling implementation that supports operation chaining and pipelined operators is critical. Our main focus is implementing scheduling based on systems of difference constraints~\cite{cong06_sdc}, which is the same algorithm \legup{} uses. With this optimisation turned on, \vericert{} is only around $2\times$ to $3\times$ slower than fully optimised \legup{}, with a slightly larger area. The scheduling step is implemented using verified translation validation, meaning the scheduling algorithm can be tweaked and optimised without ever having to touch the correctness proof.
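+
+As a simplified sketch of the SDC formulation (our summary of the approach in \cite{cong06_sdc}): every operation $v$ is given an integer start-time variable $s_v$, and all scheduling requirements are written as difference constraints of the form $s_u - s_v \le c$. For instance, if $v$ consumes the result of $u$ and $u$ takes $d_u$ cycles, the dependency becomes
+\begin{equation*}
+  s_v - s_u \ge d_u \quad\Longleftrightarrow\quad s_u - s_v \le -d_u,
+\end{equation*}
+and latency or ordering bounds take the same shape. Such systems can be solved efficiently, and the objective is kept separate from the constraints, so the scheduler can be tuned freely while the translation validator checks each resulting schedule.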
-\objection{Even a formally verified HLS tool can't give absolute guarantees about the hardware designs it produces}
+\vspace{2mm}
-\paragraph{Response:} It is true that a verified tool is still allowed to fail at compile time, meaning none of the correctness proofs need to hold if no output is produced. However, this is mostly a matter of putting more engineering work into the tool to make it more complete. Bugs are easier to identify as they will induce tool failures at compile time.
+\objection{Even a formally verified HLS tool can't give absolute guarantees about the hardware it produces}~
+
+\vspace{-3mm}
+
+\textbf{\textit{Response:}} It is true that a verified tool is still allowed to fail at compile time, meaning none of the correctness proofs need to hold if no output is produced. However, this is mostly a matter of putting more engineering work into the tool to make it more complete. Bugs are easier to identify as they will induce tool failures at compile time.
In addition, specifically for an HLS tool taking C as input, undefined behaviour in the input code allows the HLS tool to behave in any way it wishes. This becomes even more important when passing C code to a verified HLS tool, because if the input is not free of undefined behaviour, then none of the proofs hold. Extra steps therefore need to be taken to ensure that the input is free of any undefined behaviour, for example by using a tool like VST~\cite{appel11_verif_softw_toolc}.
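+
+As a minimal sketch of the issue (our own example; the function is hypothetical):
+\begin{verbatim}
+int a[4] = {1, 2, 3, 4};
+
+int sum(void) {
+    int s = 0;
+    for (int i = 0; i <= 4; i++)  // off-by-one: a[4] is out
+        s += a[i];                // of bounds, i.e. undefined
+    return s;                     // behaviour
+}
+\end{verbatim}
+Because the read of \texttt{a[4]} is undefined behaviour, a verified HLS tool's correctness theorem says nothing about the hardware generated for this function; a tool such as VST can be used to rule such code out before synthesis.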
@@ -215,7 +219,9 @@ Finally, the input and output language semantics also need to be trusted, as the
\section{Conclusion}
-In conclusion, we have argued that HLS tools should be formally verified, and through our Vericert prototype, we have demonstrated that doing so is feasible. Even though the performance does not \JWreplace{quite}{yet} match state-of-the-art HLS tools, as \JWreplace{the}{more and more} optimisations are implemented and formally verified, similar performance \JWreplace{may be achieved}{should be achievable}. In addition to that, a formally verified HLS tool, written in a theorem prover, would provide a good base to mechanically verify other HLS-specific optimisations and help develop more reliable HLS tools in the future.
+In conclusion, we have argued that HLS tools should be formally verified, and through our \vericert{} prototype, we have demonstrated that doing so is feasible. Even though the performance does not yet match state-of-the-art HLS tools, as more and more optimisations are implemented and formally verified, similar performance should be achievable. We believe that \vericert{} has the potential to raise the standard of reliability across the HLS field. It also has the potential to bring HLS to a new domain: designers of security- or safety-critical hardware, who are currently forced to design at very low levels of abstraction in order to minimise their trusted computing base.
+
+%In addition to that, a formally verified HLS tool, written in a theorem prover, would provide a good base to mechanically verify other HLS-specific optimisations and help develop more reliable HLS tools in the future.
%% Acknowledgments
%\begin{acks}%% acks environment is optional