Add comments and text for results

author: Yann Herklotz <git@yannherklotz.com> 2022-01-14 23:23:27 +0000
committer: Yann Herklotz <git@yannherklotz.com> 2022-01-14 23:23:27 +0000
commit: adf4101b99a856adc1ecd4a04cd6119f4024cc26 (patch)
tree: 5412eadaa41875ad286ae3652f3f86ec4ef008af
parent: 8148ff937e8c5e76d3d4da33ea64416d33ed8d86 (diff)
download: fccm22_rsvhls-adf4101b99a856adc1ecd4a04cd6119f4024cc26.tar.gz
fccm22_rsvhls-adf4101b99a856adc1ecd4a04cd6119f4024cc26.zip
2 files changed, 47 insertions, 10 deletions
diff --git a/references.bib b/references.bib
index 0f35096..28aff54 100644
--- a/references.bib
+++ b/references.bib
@@ -415,4 +415,14 @@ year = {2016},
   timestamp = {Sun, 17 May 2020 11:44:25 +0200},
   biburl    = {https://dblp.org/rec/journals/dt/CoussyGMT09.bib},
   bibsource = {dblp computer science bibliography, https://dblp.org}
-}
-\ No newline at end of file
+}
+@inproceedings{pilato13_bambu,
+  author = {C. {Pilato} and F. {Ferrandi}},
+  booktitle = {2013 23rd International Conference on Field programmable Logic and Applications},
+  title = {Bambu: A modular framework for the high level synthesis of memory-intensive applications},
+  year = {2013},
+  volume = {},
+  number = {},
+  pages = {1-4},
+  doi = {10.1109/FPL.2013.6645550}
+}
diff --git a/verified_resource_sharing.tex b/verified_resource_sharing.tex
index 3cdc7ba..d08d373 100644
--- a/verified_resource_sharing.tex
+++ b/verified_resource_sharing.tex
@@ -67,6 +67,7 @@ escapeinside=||,
 
 \newcommand\vericert{Vericert}
 \newcommand\vericertfun{Vericert-Fun}
+\newcommand\bambu{Bambu}
 
 \title{Resource Sharing for Verified High-Level Synthesis \\ (Work in Progress)}
 
@@ -408,15 +409,35 @@ The CompCert correctness theorem~\cite{compcert} expresses that every behaviour
 Where changes \emph{are} required is in the semantics of the intermediate language HTL, which sits between CompCert's 3AC and the final Verilog.
 When \citeauthor{Herklotz2020} designed HTL, they did not include a semantics for function calls because they assumed all function calls would already have been inlined. We have extended HTL so that its semantics is additionally parameterised by an environment that maps function names to state machines. We have added a semantics for function calls that looks up the named function in this environment, activates the corresponding state machine, and creates a new frame on the stack. We have also added a semantics for return statements that pops the current frame off the stack and reactivates the caller's state machine. \MP{TODO: Consider saying something here about how the effect of a function call is to put the state machine into a particular `callstate'. My concern is that this requires introducing a different notion of `state', distinct from the states of the state machine, and that this will probably confuse readers.}
 
-At the point of writing, the correctness of \vericertfun{} from C to HTL has been mostly proven. \YH{TODO: Clarify what `mostly' means.} The pass that renames variables in HTL is yet to be proven, as is the pass that generates the final Verilog. 
+At the point of writing, the correctness of \vericertfun{} from C to HTL has been mostly proven,
+meaning basic instructions, as well as the function calls themselves and their interaction with the
+stack is proven correct.  Proofs of the load and store instructions are not quite finished, as
+additional invariants need to be proven to hold for those instructions.  The pass that renames
+variables in HTL is yet to be proven, as is the pass that generates the final Verilog.
 
 To give a rough idea of the scale and complexity of the task: the implementation of \vericertfun{} involved the addition of about \MP{\ref{???}} lines of Coq code to \vericert{} and took the first author \MP{\ref{???}} months. The correctness proof, so far, has taken \MP{\ref{???}} lines of Coq code and \MP{\ref{???}} months.
 
 \section{Performance evaluation}
 
-We now compare the performance of the hardware generated by \vericertfun{} against that generated by Vericert. Following \citet{Herklotz2020}, we use the PolyBench/C benchmark suite~\cite{polybench}. We used the Icarus Verilog simulator to determine the cycle counts of the generated designs. We used Xilinx Vivado 2017.1, targeting a Xilinx 7-series FPGA (XC7K70T) at 50MHz, to determine area usage and fmax.
-
-Figure~\ref{fig:results} summarises our results. The x-axis shows the impact of resource sharing on the speed of the hardware (as calculated by the cycle count divided by fmax); we see that all the data points lie very close to 1, which suggests no significant impact. On average the cycle count increases by 0.7\%; this modest increase is in line with expectations because our translation introduces an extra state per function call. The impact on fmax is similarly minimal, ranging between a 1.5\% increase and a 3.1\% decrease (0.2\% decrease on average).
+We now compare the performance of the hardware generated by \vericertfun{} against that generated by Vericert. Following \citet{Herklotz2020}, we use the PolyBench/C benchmark suite~\cite{polybench}. We used the Icarus Verilog simulator to determine the cycle counts of the generated designs. We used Xilinx Vivado 2017.1, targeting a Xilinx 7-series FPGA (XC7K70T) to determine area usage and fmax.
+
+%Figure~\ref{fig:results} summarises our results. The x-axis shows the impact of resource sharing on
+%the speed of the hardware (as calculated by the cycle count divided by fmax); we see that all the
+%data points lie very close to 1, which suggests no significant impact. On average the cycle count
+%increases by 0.7\%; this modest increase is in line with expectations because our translation
+%introduces an extra state per function call. The impact on fmax is similarly minimal, ranging
+%between a 1.5\% increase and a 3.1\% decrease (0.2\% decrease on average).
+
+Figure~\ref{fig:results} shows the area savings of \vericertfun{} obtained compared to \vericert{},
+and also shows their relative execution speed.  These are compared the \bambu{} high-level synthesis
+tool~\cite{pilato13_bambu}, which is open source but unverified.  First, one can can note that
+designs generated by \vericertfun{} and \vericert{} execute for a same amount of time, as any
+variations can mostly be explained by slight differences in the maximum frequency calculation.  This
+shows that the addition of the resource sharing does not negatively affect the execution time.  The
+area difference between \vericertfun{} and \vericert{} is quite large, which shows the effectiveness
+of the resource sharing that \vericertfun{} introduces.  On average, the area of \vericert{} designs
+is 1.4$\times$ larger than designs produced by \bambu{}, whereas designs produced by \vericertfun{}
+are on average 0.7$\times$ the size of \bambu{} designs.
 
 \pgfplotstableread[col sep=comma]{data/time-ratio.csv}{\divtimingtable}
 \pgfplotstableread[col sep=comma]{data/slice-ratio.csv}{\divslicetable}
@@ -440,7 +461,7 @@ Figure~\ref{fig:results} summarises our results. The x-axis shows the impact of
       ymode=log,
       ybar=0.4pt,
       width=0.6\textwidth,
-      height=0.4\textwidth,
+      height=0.32\textwidth,
       /pgf/bar width=2pt,
       legend pos=south east,
       log ticks with fixed point,
@@ -477,15 +498,21 @@ Figure~\ref{fig:results} summarises our results. The x-axis shows the impact of
       \legend{\vericert{},vericert-fun};
     \end{groupplot}
   \end{tikzpicture}
-  \caption{Performance of \vericert{} compared to \legup{}, with division and modulo operations enabled. The top graph compares the execution times and the bottom graph compares the area of the generated designs. In both cases, the performance of \vericert{}, \legup{} without LLVM optimisations and without operation chaining, and \legup{} without LLVM optimisations is compared against default \legup{}.}
+  \caption{Performance of \vericert{} compared to \vericertfun{}, using \legup{} as a baseline.}
 \label{fig:results}
 \end{figure}
 
-The y-axis shows the impact of resource sharing on the area usage of the hardware. On average we see a decrease in area usage of 12\%. The impact ranges from a 59\% decrease to a 6\% increase, with a 12\% decrease on average. It is hard to attribute precise reasons for this given the heuristic-driven nature of the synthesis process, but certainly the benchmarks with more function calls saw more benefit from resource sharing.
-
 \section{Future work}
 
-Our immediate priority is to complete \vericertfun's correctness proof. In the medium term, we would like to improve our implementation of resource sharing by dropping the requirement to inline functions that access pointers; we anticipate that this will lead to further reductions in area usage. We would also like to make \vericertfun generate designs with one Verilog module per C function, as this is more idiomatic than packing all the state machines into a single module; we did not do this yet because it would require extending the formal Verilog semantics to handle multiple modules. \YH{TODO: Please check that sentence.} In the longer term, we are considering how to implement resource sharing even more effectively in a verified HLS tool, perhaps by implementing it as part of a resource-constrained scheduling algorithm~\cite{sdc}.
+Our immediate priority is to complete \vericertfun's correctness proof. In the medium term, we would
+like to improve our implementation of resource sharing by dropping the requirement to inline
+functions that access pointers; we anticipate that this will lead to further reductions in area
+usage. We would also like to make \vericertfun generate designs with one Verilog module per C
+function, as this is more idiomatic than packing all the state machines into a single module; we did
+not do this yet because it would require extending the formal Verilog semantics to handle multiple
+modules. \YH{TODO: Please check that sentence.}\YH{Yes that looks correct to me.} In the longer
+term, we are considering how to implement resource sharing even more effectively in a verified HLS
+tool, perhaps by implementing it as part of a resource-constrained scheduling algorithm~\cite{sdc}.
 
 
 \bibliographystyle{ACM-Reference-Format}
author	Yann Herklotz <git@yannherklotz.com>	2022-01-14 23:23:27 +0000
committer	Yann Herklotz <git@yannherklotz.com>	2022-01-14 23:23:27 +0000
commit	adf4101b99a856adc1ecd4a04cd6119f4024cc26 (patch)
tree	5412eadaa41875ad286ae3652f3f86ec4ef008af
parent	8148ff937e8c5e76d3d4da33ea64416d33ed8d86 (diff)
download	fccm22_rsvhls-adf4101b99a856adc1ecd4a04cd6119f4024cc26.tar.gz fccm22_rsvhls-adf4101b99a856adc1ecd4a04cd6119f4024cc26.zip