summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorYann Herklotz <ymh15@ic.ac.uk>2022-04-03 17:14:08 +0000
committernode <node@git-bridge-prod-0>2022-04-12 21:28:23 +0000
commit84a444062fe8a51da73592b8a48948ab34cb248f (patch)
treec06365b8d278a9f80f32cb6c57d79155d372bfd8
parentbda13f9724a006c4e30a92f9aca648af0d2c1665 (diff)
downloadfccm22_rsvhls-84a444062fe8a51da73592b8a48948ab34cb248f.tar.gz
fccm22_rsvhls-84a444062fe8a51da73592b8a48948ab34cb248f.zip
Update on Overleaf.HEADmaster
-rw-r--r--references.bib4
-rw-r--r--verified_resource_sharing.tex18
2 files changed, 11 insertions, 11 deletions
diff --git a/references.bib b/references.bib
index 2496bf4..6966d89 100644
--- a/references.bib
+++ b/references.bib
@@ -25,9 +25,9 @@
@misc{vericertfun-github,
author = {Pardalos, Michalis and Herklotz, Yann and Wickerson, John},
- title = {Public Github repository for Vericert-Fun},
+ title = {Public repository for Vericert-Fun},
year = 2022,
- url = {https://github.com/mpardalos/Vericert-Fun},
+ note = {\url{https://github.com/mpardalos/Vericert-Fun} or \url{https://doi.org/10.5281/zenodo.5866708}},
}
@misc{xilinx_vitis,
diff --git a/verified_resource_sharing.tex b/verified_resource_sharing.tex
index 3b6313b..9a78d53 100644
--- a/verified_resource_sharing.tex
+++ b/verified_resource_sharing.tex
@@ -65,8 +65,9 @@ escapeinside=||,
\newcommand\YH[2][]{\st{#1}\Comment{blue!50!green}{YH}{#2}}
\newcommand\MP[2][]{\st{#1}\Comment{blue!50!magenta}{MP}{#2}}
-\newcommand\JWchanged[1]{\textcolor{red!75!black}{#1}}
-\newcommand\YHchanged[1]{\textcolor{blue!50!green}{#1}}
+%\newcommand\JWchanged[1]{\textcolor{red!75!black}{#1}}
+%\newcommand\YHchanged[1]{\textcolor{blue!50!green}{#1}}
+%\newcommand\MPchanged[1]{\textcolor{red!50!yellow}{#1}}
\newcommand\vericert{Vericert}
\newcommand\vericertfun{Vericert-Fun}
@@ -110,7 +111,7 @@ Yet doubts have been raised about the reliability of the current crop of HLS too
Aiming to address this issue is \vericert{}~\cite{vericert}, a new HLS tool whose correctness has been verified to the highest possible standard: a computer-checked proof that any Verilog design it produces will behave the same way as the C program given as input. %It is based on the CompCert~\cite{compcert_Leroy2009} verified C compiler.
Yet it is not enough for an HLS tool simply to be \emph{correct}; the generated hardware must also enjoy high throughput, low latency, and good \emph{area efficiency} -- the last of which is the topic of this paper.
-A common optimisation employed by HLS tools to improve area efficiency is \emph{resource sharing}; that is, \JWchanged{mapping multiple operations to the same hardware unit}. Accordingly, our work adds function-level resource sharing to \vericert{}, yielding a new prototype HLS tool called `\vericertfun'. In line with the aims of the \vericert{} project, work is ongoing to extend the correctness proof.
+A common optimisation employed by HLS tools to improve area efficiency is \emph{resource sharing}; that is, mapping multiple operations to the same hardware unit. Accordingly, our work adds function-level resource sharing to \vericert{}, yielding a new prototype HLS tool called `\vericertfun'. In line with the aims of the \vericert{} project, work is ongoing to extend the correctness proof.
The entire \vericertfun{} development is fully open-source~\cite{vericertfun-github}, and more details about the implementation and proofs are available in a technical report~\cite{pardalos_thesis}.
\section{Background}
@@ -123,8 +124,7 @@ Developing software within a proof assistant like Coq is widely held to be the g
It transforms its input through a series of ten intermediate languages before generating the final output.
%This design ensures that each individual pass remains relatively simple and hence feasible to prove correct.
The correctness proof of the entire compiler is formed by composing the correctness proofs of each of those passes.
-\YHchanged{That the Csmith compiler testing tool}~\cite{csmith} has found hundreds of bugs in GCC and LLVM but none in (the verified parts of) CompCert is a testament to the reliability of this development approach.
-\YHchanged{In \vericertfun{} designs, however, we cannot guarantee that they will never go wrong, because of fallible components not covered by the theorem, such as the synthesis toolchain~\cite{verismith} and the FPGA itself.}
+That the Csmith compiler testing tool~\cite{csmith} has found hundreds of bugs in GCC and LLVM but none in (the verified parts of) CompCert is a testament to the reliability of this development approach.
\paragraph{The \vericert{} verified HLS tool}
@@ -300,7 +300,7 @@ int main()
\end{figure}
-Figure~\ref{fig:example_3AC} shows the 3AC representation of that C program, as obtained by the CompCert frontend. We see two CFGs, one per function. The \YHchanged{control-flow} in this example is straightforward, but in general, 3AC programs can contain unrestricted gotos. The nodes of the CFGs are numbered in reverse, as are the parameters of the \lstinline{add} function, following CompCert convention.
+Figure~\ref{fig:example_3AC} shows the 3AC representation of that C program, as obtained by the CompCert frontend. We see two CFGs, one per function. The control flow in this example is straightforward, but in general, 3AC programs can contain unrestricted gotos. The nodes of the CFGs are numbered in reverse, as are the parameters of the \lstinline{add} function, following CompCert convention.
Figure~\ref{fig:example_HTL} depicts the result of converting those CFGs into two state machines. The conversion of 3AC instructions into Verilog instructions has been described already by \citet{vericert}; what is new here is the handling of function calls, so the following concentrates on that aspect. Note that ``$*{\rightarrow}\langle\mathit{node}\rangle$'' stands for edges from all nodes to $\langle\mathit{node}\rangle$.
The solid edges within the two state machines indicate the transfer of control between states, while the dashed edges between the state machines are more `fictional'. The ground truth is that both state machines run continuously, but it is convenient to think that only one machine does useful work at a time. So, the dashed edges indicate when the `active' machine changes.
@@ -436,10 +436,10 @@ To give a rough idea of the scale and complexity of our work: the implementation
\section{Performance evaluation}
-\JWchanged{We now compare the quality of the hardware generated by \vericertfun{} against that of \vericert{}. We use the open-source (but unverified) \bambu{} tool~\cite{pilato13_bambu, ferrandi2021_bambu} as a baseline. We run \bambu{} (version 0.9.6) in the {\tt BAMBU\_AREA} configuration, which optimises for area ahead of latency, but do not provide any additional pragmas to control the HLS process.}
+We now compare the quality of the hardware generated by \vericertfun{} against that of \vericert{}. We use the open-source (but unverified) \bambu{} tool~\cite{pilato13_bambu, ferrandi2021_bambu} as a baseline. We run \bambu{} (version 0.9.6) in the {\tt BAMBU\_AREA} configuration, which optimises for area ahead of latency, but do not provide any additional pragmas to control the HLS process.
Following \citet{vericert}, we use the PolyBench/C benchmark suite~\cite{polybench} with division and modulo replaced with iterative software implementations because \vericert{} does not handle them efficiently. That is, \lstinline{a/b} and \lstinline{c%d}
are textually replaced with \lstinline{div(a,b)} and \lstinline{mod(c,d)}. These \lstinline{div} and \lstinline{mod} functions are the only function calls that are not inlined.
-We used the Icarus Verilog simulator to determine the cycle counts of the generated designs. We used Xilinx Vivado 2017.1, targeting a Xilinx 7-series FPGA (XC7K70T) to determine area usage \JWchanged{(measured in slices)} and fmax.
+We used the Icarus Verilog simulator to determine the cycle counts of the generated designs. We used Xilinx Vivado 2017.1, targeting a Xilinx 7-series FPGA (XC7K70T) to determine area usage (measured in slices) and fmax.
%Figure~\ref{fig:results} summarises our results. The x-axis shows the impact of resource sharing on
%the speed of the hardware (as calculated by the cycle count divided by fmax); we see that all the
@@ -514,7 +514,7 @@ The bottom graph compares the execution time. We observe that \vericertfun{} gen
\section{Future work}
-Our immediate priority is to complete \vericertfun's correctness proof. In the medium term, we intend to improve our implementation of resource sharing by dropping the requirement to inline functions that access pointers; we anticipate that this will lead to further area savings \JWchanged{and also allow \vericertfun{} to be evaluated on benchmarks with more interesting call graphs}. We would also like \vericertfun{} to generate designs with one Verilog module per C function, as this is more idiomatic than cramming all the state machines into a single module; we did not do this yet because it requires extending the Verilog semantics to handle multiple modules. \JWchanged{It would also be interesting to implement \emph{selective} inlining of functions~\cite{huang+15}, either guided by heuristics or by programmer-supplied pragmas. It is worth noting that having proven inlining correct in general, the amount of inlining performed can be adjusted without affecting the correctness proof.} In the longer term, we would like to combine resource sharing with operation scheduling, i.e. resource-constrained scheduling~\cite{sdc}.
+Our immediate priority is to complete \vericertfun's correctness proof. In the medium term, we intend to improve our implementation of resource sharing by dropping the requirement to inline functions that access pointers or perform function calls; we anticipate that this will lead to further area savings and also allow \vericertfun{} to be evaluated on benchmarks with more interesting call graphs. We would also like \vericertfun{} to generate designs with one Verilog module per C function, as this is more idiomatic than cramming all the state machines into a single module; we did not do this yet because it requires extending the Verilog semantics to handle multiple modules. It would also be interesting to implement \emph{selective} inlining of functions~\cite{huang+15}, either guided by heuristics or by programmer-supplied pragmas. It is worth noting that having proven inlining correct in general, the \emph{amount} of inlining can be adjusted without affecting the correctness proof. Longer term, we would like to combine resource sharing with operation scheduling, i.e. resource-constrained scheduling~\cite{sdc}.
\section*{Acknowledgments}