Update on Overleaf.

author: Yann Herklotz <ymh15@ic.ac.uk> 2022-01-17 22:08:51 +0000
committer: node <node@git-bridge-prod-0> 2022-01-17 22:27:57 +0000
commit: 49eef634537b1b496e77a893e67479b6ca56a90a (patch)
tree: dbfe73a2dc641f72e9113e0807b03df0ff9fd8b5
parent: f8221fee0036ded642f611c4d5e48f70c15bc335 (diff)
download: fccm22_rsvhls-49eef634537b1b496e77a893e67479b6ca56a90a.tar.gz
fccm22_rsvhls-49eef634537b1b496e77a893e67479b6ca56a90a.zip
1 files changed, 7 insertions, 7 deletions
diff --git a/verified_resource_sharing.tex b/verified_resource_sharing.tex
index 1f39e0f..14b2c29 100644
--- a/verified_resource_sharing.tex
+++ b/verified_resource_sharing.tex
@@ -417,7 +417,7 @@ the whole program, making all register names globally unique. This is necessary
 unintended conflicts between register names in different modules, as register names can only be assumed to be unique within their own module. We then do a second pass, renaming
 registers in \lstinline{extern_ctrl} to the name of the actual register they target. %\JW{I'm still struggling to understand externctrl properly. I don't see why we can't resolve externctrl straight away, when each HTL state machine is first generated. Why do we have to wait until they are linked together?}\YH{I think that there are multiple reasons: The first is that it's very difficult to access the function information from other functions, because this requires resolving their ID in the global AST state and then inspecting it.  If one ends up using this information, it can be very tricky to prove the induction, because one isn't local to the current function anymore.  All the proof structure in CompCert assumes one only deals within the current function.  This also complicates the definition of the semantics.  Secondly, I think it just helps as a bridge, because functions have their completely separate register files in 3AC.  It's therefore quite difficult to map registers from one register file to the other unless one has something that describes which ones to map over.}
 
-A second technical challenge we encountered in the implementation of \vericertfun{} has to do with an assumption made in \vericert{}'s correctness proof: that all pointers used by the input program are stored as offsets from the main function's stack frame. This assumption was reasonable for \vericert{} because after full inlining, the only function is the main function. This assumption is baked into several of \vericert{}'s most complicated lemmas, including the correctness proof for \lstinline{load} and \lstinline{store} instructions, and so we have not sought to lift it in the current incarnation of \vericertfun{}. Instead, we have made the compromise of only \emph{partially} eliminating the inlining pass. That is: \vericertfun{} inlines all functions that contain \lstinline{load} or \lstinline{store} instructions. Thus, the benefits of resource sharing are only enjoyed by functions that do not include these instructions.
+A second technical challenge we encountered in the implementation of \vericertfun{} has to do with an assumption made in \vericert{}'s correctness proof: that all pointers used by the input program are stored as offsets from the main function's stack frame. This assumption was reasonable for \vericert{} because after full inlining, the only function is the main function. This assumption is baked into several of \vericert{}'s most complicated lemmas, including the correctness proof for \lstinline{load} and \lstinline{store} instructions, and so we have not sought to lift it in the current incarnation of \vericertfun{}. Instead, we have made the compromise of only \emph{partially} eliminating the inlining pass. That is: \vericertfun{} inlines all functions that contain \lstinline{load}, \lstinline{store} or \lstinline{call} instructions. Thus, the benefits of resource sharing are only enjoyed by functions that do not include these instructions.
 
 \section{Proving \vericertfun{} correct}
 
@@ -434,7 +434,7 @@ To give a rough idea of the scale and complexity of the task: the implementation
 \section{Performance evaluation}
 
 We now compare the quality of the hardware generated by \vericertfun{} against that generated by \vericert{}. Following \citet{vericert}, we use the PolyBench/C benchmark suite~\cite{polybench} with division and modulo replaced with iterative software implementations because \vericert{} does not handle them inefficiently. That is, \lstinline{a/b} and \lstinline{c%d}
-are textually replaced with \lstinline{div(a,b)} and \lstinline{mod(c,d)}. These \lstinline{div} and \lstinline{mod} functions are the only function calls in the benchmarks.
+are textually replaced with \lstinline{div(a,b)} and \lstinline{mod(c,d)}. These \lstinline{div} and \lstinline{mod} functions are the only \YH{Non-inlinable? There are always 4 functions defined per benchmark, but these are inlined by Vericert.} function calls in the benchmarks.
 
 We used the Icarus Verilog simulator to determine the cycle counts of the generated designs. We used Xilinx Vivado 2017.1, targeting a Xilinx 7-series FPGA (XC7K70T) to determine area usage and fmax.
 
@@ -445,14 +445,14 @@ We used the Icarus Verilog simulator to determine the cycle counts of the genera
 %introduces an extra state per function call. The impact on fmax is similarly minimal, ranging
 %between a 1.5\% increase and a 3.1\% decrease (0.2\% decrease on average).
 
-Figure~\ref{fig:results} compares the hardware generated by \vericertfun{} with that generated by \vericert{}, using the open-source (but unverified) \bambu{} HLS tool~\cite{pilato13_bambu} as a baseline. The top graph compares the area usage. We observe a consistent and substantial reduction in area usage across the benchmark programs, with \vericert{} consistently using more area than Bambu (1.5$\times$ on average) and \vericertfun{} being much closer to Bambu (1.2$\times$ on average). As expected, the benchmarks with the most function calls, such as mvt and \YH{name of benchmark?} enjoy the biggest area savings. \JW{Revisit this text with the final experimental numbers.}
+Figure~\ref{fig:results} compares the hardware generated by \vericertfun{} with that generated by \vericert{}, using the open-source (but unverified) \bambu{} HLS tool~\cite{pilato13_bambu} as a baseline. The top graph compares the area usage. We observe a consistent and substantial reduction in area usage across the benchmark programs, with \vericert{} consistently using more area than Bambu (1.5$\times$ on average) and \vericertfun{} being much closer to Bambu (1.2$\times$ on average). As expected, the benchmarks with the most function calls, such as mvt, 2mm, 3mm and ludcmp enjoy the biggest area savings.  Benchmarks where only one function call is made, such as heat-3d and nussinov naturally see a slight increase in area usage because of the extra circuitry required.
 
-The bottom graph compares the execution time. We observe that \vericertfun{} generates hardware that is almost exactly as fast as \vericert{}'s, with \vericertfun{} hardware being only 1\% slower on average. This can be attributed to the latency overhead of performing a function call. Hardware from \vericert{} and \vericertfun{} is significantly slower than \bambu's, which can be attributed to \vericert{} employing far fewer optimisations than the unverified \bambu{} tool.
+The bottom graph compares the execution time. We observe that \vericertfun{} generates hardware that is almost exactly as fast as \vericert{}'s, with \vericertfun{} hardware being only 4\% slower on average. This can be attributed to the latency overhead of performing a function call. Hardware from \vericert{} and \vericertfun{} is significantly slower than \bambu's, which can be attributed to \vericert{} employing far fewer optimisations than the unverified \bambu{} tool.
 
 \pgfplotstableread[col sep=comma]{data/time-ratio.csv}{\divtimingtable}
 \pgfplotstableread[col sep=comma]{data/slice-ratio.csv}{\divslicetable}
-\definecolor{vericertcol}{HTML}{66C2A5}
-\definecolor{legupnooptcol}{HTML}{FC8D62}
+\definecolor{vericertcol}{HTML}{1b9e77}
+\definecolor{legupnooptcol}{HTML}{d95f02}
 \definecolor{legupnooptnochaincol}{HTML}{8DA0CB}
 \newcommand\backgroundbar[2][5]{\draw[draw=none, fill=black!#1] (axis cs:#2*2+0.5,0.1) rectangle
   (axis cs:1+#2*2+0.5,300);}
@@ -470,7 +470,7 @@ The bottom graph compares the execution time. We observe that \vericertfun{} gen
       },
       ymode=log,
       ybar=0.4pt,
-      width=0.6\textwidth,
+      width=0.64\textwidth,
       height=0.25\textwidth,
       /pgf/bar width=2pt,
       legend pos=north east,
author	Yann Herklotz <ymh15@ic.ac.uk>	2022-01-17 22:08:51 +0000
committer	node <node@git-bridge-prod-0>	2022-01-17 22:27:57 +0000
commit	49eef634537b1b496e77a893e67479b6ca56a90a (patch)
tree	dbfe73a2dc641f72e9113e0807b03df0ff9fd8b5
parent	f8221fee0036ded642f611c4d5e48f70c15bc335 (diff)
download	fccm22_rsvhls-49eef634537b1b496e77a893e67479b6ca56a90a.tar.gz fccm22_rsvhls-49eef634537b1b496e77a893e67479b6ca56a90a.zip