author    John Wickerson <j.wickerson@imperial.ac.uk>  2022-01-17 15:02:40 +0000
committer node <node@git-bridge-prod-0>                2022-01-17 15:42:29 +0000
commit    15daff9a0bce19984a943974fdb148a8cabca8cc
tree      fe1b8e08ec98b4bb91350af423acfff6c112d2a8
parent    5baaa8d8de28dfd93b0356bc3de2ab3eedd5601d
Update on Overleaf.
-rw-r--r--  references.bib                 |  37
-rw-r--r--  verified_resource_sharing.tex  | 113
2 files changed, 100 insertions, 50 deletions
diff --git a/references.bib b/references.bib
index cec647e..4a2b791 100644
--- a/references.bib
+++ b/references.bib
@@ -20,6 +20,12 @@
url = {https://www.intel.com/content/www/us/en/software/programmable/quartus-prime/hls-compiler.html},
}
+@misc{vericertfun-github,
+ author = {{Withheld for blind review}},
+ title = {Public Github repository for Vericert-Fun},
+ year = 2022,
+}
+
@misc{xilinx_vitis,
publisher = {Xilinx Inc.},
title = {Vitis HLS},
@@ -405,3 +411,34 @@ year = {2016},
pages = {1-4},
doi = {10.1109/FPL.2013.6645550}
}
+
+@article{huang+15,
+ author = {Qijing Huang and
+ Ruolong Lian and
+ Andrew Canis and
+ Jongsok Choi and
+ Ryan Xi and
+ Nazanin Calagar and
+ Stephen Dean Brown and
+ Jason Helge Anderson},
+ title = {The Effect of Compiler Optimizations on High-Level Synthesis-Generated
+ Hardware},
+ journal = {{ACM} Trans. Reconfigurable Technol. Syst.},
+ volume = {8},
+ number = {3},
+ pages = {14:1--14:26},
+ year = {2015},
+ url = {https://doi.org/10.1145/2629547},
+ doi = {10.1145/2629547},
+ timestamp = {Thu, 18 Jun 2020 15:43:43 +0200},
+ biburl = {https://dblp.org/rec/journals/trets/HuangLCCXCBA15.bib},
+ bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+
+@techreport{pardalos_thesis,
+key = {zzzz},
+author = {{Withheld for blind review}},
+title = {Formally verified resource sharing for High
+Level Synthesis},
+ year = 2021,
+}
\ No newline at end of file
diff --git a/verified_resource_sharing.tex b/verified_resource_sharing.tex
index 0efccb9..77626ff 100644
--- a/verified_resource_sharing.tex
+++ b/verified_resource_sharing.tex
@@ -92,7 +92,7 @@ escapeinside=||,
In this paper, we present \vericertfun: \vericert{} extended with function-level resource sharing. Where original \vericert{} creates one block of hardware per function \emph{call}, \vericertfun{} creates one block of hardware per function \emph{definition}. The enabling innovation is to extend \vericert{}'s intermediate language HTL with the ability to represent \emph{multiple} state machines, and then to add support for function calls that transfer control between these state machines. We are working to extend \vericert{}'s formal correctness proof to cover the translation from C source into this extended HTL language, and thence on to Verilog.
-We have benchmarked \vericertfun's performance on the PolyBench/C suite, and our results show the generated hardware having a resource usage of 61\% of \vericert{}'s on average and 46\% in the best case, for only a 3\% average decrease in max frequency and 1\% average increase in cycle count.
+We have benchmarked \vericertfun's performance on the PolyBench/C suite, and our results show that the generated hardware has a resource usage of 49\% of \vericert{}'s on average, for only a 1\% average increase in execution time.
\end{abstract}
\section{Introduction}
@@ -293,7 +293,7 @@ int main()
Figure~\ref{fig:example_3AC} shows the 3AC representation of that C program, as obtained by the CompCert frontend. It takes the form of two CFGs, one per function. The control flow in this example is straightforward, but in general, 3AC programs can contain unrestricted gotos. The nodes of the CFGs are numbered in reverse order, as are the parameters of the \lstinline{add} function; both are conventions inherited from CompCert.
-Figure~\ref{fig:example_HTL} depicts the result of converting those CFGs into two state machines. The conversion of 3AC instructions into Verilog instructions has been described already by \citet{Herklotz2020}; what is new here is the handling of function calls, so the following description concentrates on that aspect. Before we proceed, there are two conventions worth noting: first, ``$*{\rightarrow}\langle\mathit{node}\rangle$'' should be interpreted as an abbreviation for edges from all nodes to $\langle\mathit{node}\rangle$, and second, we allow the \lstinline{main} machine to refer to the \lstinline{add} machine's register $\langle\mathit{reg}\rangle$ by writing ``\lstinline{add.}$\langle\mathit{reg}\rangle$''.
+Figure~\ref{fig:example_HTL} depicts the result of converting those CFGs into two state machines. The conversion of 3AC instructions into Verilog instructions has been described already by \citet{Herklotz2020}; what is new here is the handling of function calls, so the following description concentrates on that aspect. Note that ``$*{\rightarrow}\langle\mathit{node}\rangle$'' is an abbreviation for edges from all nodes to $\langle\mathit{node}\rangle$.
The solid edges within the two state machines indicate the transfer of control between states, while the dashed edges between the state machines are more `fictional'. The ground truth is that both state machines run continuously, but it is convenient to think of only one of the machines as doing useful work at any point in time. So, the dashed edges indicate the points at which the `active' machine changes.
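+At the Verilog level, all of the state machines are packed into a single module; conceptually, each machine can be pictured as its own always-block, with the machines communicating only through module-level registers. The following sketch (whose names and structure are illustrative rather than the exact \vericert{} output) shows this overall shape:
+\begin{lstlisting}[language=Verilog]
+module main(input clk, input rst, output reg fin, output reg [31:0] ret);
+  reg [31:0] m_state;               // current state of the main() machine
+  reg [31:0] a_state;               // current state of the add() machine
+  reg        add_0_rst, add_0_fin;  // call/return handshake with add()
+  reg [31:0] add_0_a, add_0_b, add_0_ret;
+  // the data registers (reg_1 .. reg_7 in the figure) would be declared
+  // here too; they are omitted to keep the sketch short
+
+  // state machine for main(): evaluated on every clock edge
+  always @(posedge clk)
+    if (rst) m_state <= 9;
+    else case (m_state)
+      // one case arm per state of main(), driving fin, ret and the
+      // add_0_* registers as shown in the figure
+      default: m_state <= m_state;
+    endcase
+
+  // state machine for add(): also evaluated on every clock edge,
+  // even while main() is the machine doing the useful work
+  always @(posedge clk)
+    if (add_0_rst) a_state <= 2;
+    else case (a_state)
+      // one case arm per state of add()
+      default: a_state <= a_state;
+    endcase
+endmodule
+\end{lstlisting}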
@@ -303,34 +303,34 @@ The solid edges within the two state machines indicate the transfer of control b
\begin{tikzpicture}[node distance=5mm]
\node[st, fill=highlight1] (m9) at (0,0) {reg\_3 <= 0;};
\node[st, fill=highlight2, below=of m9] (m8) {reg\_6 <= 1;};
-\node[st, fill=highlight2, below=of m8] (m7) {add.rst <= 1; \\ add.param\_0 <= reg\_3; \\ add.param\_1 <= reg\_6;};
-\node[st, fill=highlight2, below=of m7] (m12) {add.rst <= 0; \\ reg\_1 <= add.ret;};
+\node[st, fill=highlight2, below=of m8] (m7) {add\_0\_rst <= 1; \\ add\_0\_a <= reg\_3; \\ add\_0\_b <= reg\_6;};
+\node[st, fill=highlight2, below=of m7] (m12) {add\_0\_rst <= 0; \\ reg\_1 <= add\_0\_ret;};
\node[st, fill=highlight2, below=of m12] (m6) {reg\_3 <= reg\_1;};
\node[st, fill=highlight3, below=of m6] (m5) {reg\_5 <= 2;};
-\node[st, fill=highlight3, below=of m5] (m4) {add.rst <= 1; \\ add.param\_0 <= reg\_3; \\ add.param\_1 <= reg\_5;};
-\node[st, fill=highlight3, below=of m4] (m10) {add.rst <= 0; \\ reg\_2 <= add.ret;};
+\node[st, fill=highlight3, below=of m5] (m4) {add\_1\_rst <= 1; \\ add\_1\_a <= reg\_3; \\ add\_1\_b <= reg\_5;};
+\node[st, fill=highlight3, below=of m4] (m10) {add\_1\_rst <= 0; \\ reg\_2 <= add\_1\_ret;};
\node[st, fill=highlight3, below=of m10] (m3) {reg\_3 <= reg\_2;};
\node[st, fill=highlight4, below=of m3] (m2) {reg\_4 <= reg\_3;};
\node[st, fill=highlight4, below=of m2] (m1) {fin = 1; \\ ret = reg\_4;};
\node[st, fill=highlight4, below=of m1] (m11) {fin <= 0;};
-\node[st, fill=highlight5, above=38mm of m9] (a2) {reg\_7 <= \{\{x2 + x1\} + 0\};};
+\node[st, fill=highlight5, above=38mm of m9] (a2) {reg\_7 <= x2 + x1 + 0;};
\node[st, fill=highlight5, below=of a2] (a1) {fin = 1; \\ ret = reg\_7;};
\node[st, fill=highlight5, below=of a1] (a3) {fin <= 0;};
\draw[ed] (m9) to node[edlab] {!rst} (m8);
\draw[ed] (m8) to node[edlab] {!rst} (m7);
\draw[ed] (m7) to node[edlab] {!rst} (m12);
-\draw[ed] (m12) to node[edlab] {!rst \&\& add.fin} (m6);
+\draw[ed] (m12) to node[edlab] {!rst \&\& add\_0\_fin} (m6);
\draw[ed] (m6) to node[edlab] {!rst} (m5);
\draw[ed] (m5) to node[edlab] {!rst} (m4);
\draw[ed] (m4) to node[edlab] {!rst} (m10);
-\draw[ed] (m10) to node[edlab] {!rst \&\& add.fin} (m3);
+\draw[ed] (m10) to node[edlab] {!rst \&\& add\_1\_fin} (m3);
\draw[ed] (m3) to node[edlab] {!rst} (m2);
\draw[ed] (m2) to node[edlab] {!rst} (m1);
\draw[ed] (m1) to node[edlab] {!rst} (m11);
-\draw[ed] (m12) to[loop right, looseness=2, out=-4, in=4] node[edlab, swap] {!rst \&\& \\ !add.fin} (m12);
-\draw[ed] (m10) to[loop right, looseness=2, out=-4, in=4] node[edlab, swap] {!rst \&\& \\ !add.fin} (m10);
+\draw[ed] (m12) to[loop right, looseness=2, out=-4, in=4] node[edlab, swap] {!rst \&\& \\ !add\_0\_fin} (m12);
+\draw[ed] (m10) to[loop right, looseness=2, out=-4, in=4] node[edlab, swap] {!rst \&\& \\ !add\_1\_fin} (m10);
\draw[ed] (m11) to[loop below, looseness=10, pos=0.15] node[edlab] {!rst} (m11);
\node[inner sep=1pt] (ast1) at ([xshift=5mm, yshift=-5mm]m9.south east) {*};
@@ -343,13 +343,13 @@ The solid edges within the two state machines indicate the transfer of control b
\node[inner sep=1pt] (ast2) at ([xshift=5mm, yshift=-5mm]a2.south east) {*};
\draw[ed] (ast2.north) to[bend right] node[edlab, swap] {rst} (a2.east);
-\coordinate (nw1) at ([xshift=-3mm, yshift=6mm]m9.north west);
-\coordinate (se1) at ([xshift=16mm, yshift=-4mm]m11.south east);
+\coordinate (nw1) at ([xshift=-1mm, yshift=6mm]m9.north west);
+\coordinate (se1) at ([xshift=44mm, yshift=-4mm]m11.south east);
\begin{pgfonlayer}{background}
\node[draw, fill=background, rounded corners, fit=(nw1)(se1)] (main) {};
\end{pgfonlayer}
\node[tit, above=2mm of m9] (labmain) {\bfseries main()};
-\coordinate (nw2) at ([xshift=-3mm, yshift=6mm]a2.north west);
+\coordinate (nw2) at ([xshift=-1mm, yshift=6mm]a2.north west);
\coordinate (se2) at ([xshift=16mm, yshift=-4mm]a3.south east);
\begin{pgfonlayer}{background}
\node[draw, fill=background, rounded corners, fit=(nw2)(se2)] (add) {};
@@ -361,10 +361,24 @@ The solid edges within the two state machines indicate the transfer of control b
\foreach\i in {1,2,3}
\node[lab] at (a\i.north west) {\i};
-\draw[exted] (m7.west) -- ++(-7mm,0) |- ([yshift=-1mm]a2.west);
-\draw[exted] (m4.west) -- ++(-9mm,0) |- ([yshift=1mm]a2.west);
-\draw[exted] ([yshift=-1mm]a3.east) -- ++(19mm,0) |- (m6.east);
-\draw[exted] ([yshift=1mm]a3.east) -- ++(21mm,0) |- (m3.east);
+\draw[exted] (m7.west) -- ++(-5mm,0) |- ([yshift=-1mm]a2.west);
+\draw[exted] (m4.west) -- ++(-7mm,0) |- ([yshift=1mm]a2.west);
+\draw[exted] ([yshift=-1mm]a3.east) -- ++(21mm,0) |- (m6.east);
+\draw[exted] ([yshift=1mm]a3.east) -- ++(23mm,0) |- (m3.east);
+
+\node [align=left, right=3mm of m1, font=\tt\footnotesize] {
+{\bfseries extern\_ctrl:} \\
+add\_0\_rst $\mapsto$ (add, \textrm{reset}) \\
+add\_0\_ret $\mapsto$ (add, \textrm{return}) \\
+add\_0\_fin $\mapsto$ (add, \textrm{finish}) \\
+add\_0\_a $\mapsto$ (add, \textrm{param 0}) \\
+add\_0\_b $\mapsto$ (add, \textrm{param 1}) \\
+add\_1\_rst $\mapsto$ (add, \textrm{reset}) \\
+add\_1\_ret $\mapsto$ (add, \textrm{return}) \\
+add\_1\_fin $\mapsto$ (add, \textrm{finish}) \\
+add\_1\_a $\mapsto$ (add, \textrm{param 0}) \\
+add\_1\_b $\mapsto$ (add, \textrm{param 1})
+};
\end{tikzpicture}
\caption{HTL representation comprising two state machines}
@@ -393,35 +407,35 @@ add} (-2.6,-12.5);
\end{figure}
-In more detail: Execution begins in state 9 of the \lstinline{main} machine, and proceeds through successive states until it reaches state 7, in which the \lstinline{add} machine's \lstinline{rst} signal is raised. This puts the \lstinline{add} machine into state 2. When the \lstinline{main} machine advances to state 12, that \lstinline{rst} signal is lowered; the \lstinline{add} machine then begins its computation while the \lstinline{main} machine spins in state 12. When the \lstinline{add} machine has finished its computation (state 1), it asserts its \lstinline{fin} signal. This allows the \lstinline{main} machine to progress from state 12. Finally, the \lstinline{add} machine lowers its \lstinline{fin} and waits in state 3 until the next call.
+In more detail: Execution begins in state 9 of the \lstinline{main} machine, and proceeds through successive states until it reaches state 7, in which the \lstinline{add} machine's \lstinline{rst} signal is raised. This causes the \lstinline{add} machine to advance to state 2. When the \lstinline{main} machine advances to state 12, that \lstinline{rst} signal is lowered; the \lstinline{add} machine then begins its computation while the \lstinline{main} machine spins in state 12. When the \lstinline{add} machine has finished its computation (state 1), it asserts its \lstinline{fin} signal. This allows the \lstinline{main} machine to leave state 12. Finally, the \lstinline{add} machine lowers its \lstinline{fin} and waits in state 3 until the next call.
-The same sequence of events can also be understood using a timing diagram, as given in Figure~\ref{fig:timingdiagram}. In that diagram, the red lines indicate unspecified values. We see that each call of \lstinline{add} begins with a pulse on \lstinline{add.rst} (fifth waveform) and ends with a pulse on \lstinline{add.fin} (sixth waveform).
+The same sequence of events can also be understood using a timing diagram, as given in Figure~\ref{fig:timingdiagram}. In that diagram, the red lines indicate unspecified values. %We see that each call of \lstinline{add} begins with a pulse on \lstinline{add.rst} (fifth waveform) and ends with a pulse on \lstinline{add.fin} (sixth waveform).
+We see that calls are initiated by triggering the \lstinline{rst} signal of the called module. Similarly, a function returns by setting its own \lstinline{fin} register. %There are no `call' or `return' instructions in HTL.
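+Concretely, the caller's half of this protocol amounts to two case arms in \lstinline{main}'s state machine, corresponding to states 7 and 12 of Figure~\ref{fig:example_HTL} (the names follow the sketch given earlier and remain illustrative):
+\begin{lstlisting}[language=Verilog]
+7:  begin  // start the call: raise add's reset and pass the arguments
+      add_0_rst <= 1; add_0_a <= reg_3; add_0_b <= reg_6;
+      m_state <= 12;
+    end
+12: begin  // lower the reset, then spin here until add reports completion
+      add_0_rst <= 0;
+      if (add_0_fin) begin reg_1 <= add_0_ret; m_state <= 6; end
+    end
+\end{lstlisting}
+The entire call/return handshake is thus expressed through ordinary register assignments and guarded state transitions; there is no dedicated call or return instruction at this level.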
-One technical challenge we encountered in the implementation of \vericertfun{} has to do with the fact that the caller and callee state machines modify each other's variables. This is problematic because each function is translated into a state machine independently, and hence the register names used in the other state machines are not necessarily available. We work around this problem by introducing an additional component to our state machines: a mapping, called \lstinline{externctrl}, that maps local register names to pairs of module identifiers and register names in those modules. Once all the state machines have been generated, we erase \lstinline{externctrl}. We do this in two steps. First, we perform a renaming pass through
+One technical challenge we encountered in the implementation of \vericertfun{} has to do with the fact that the caller and callee state machines need to modify each other's variables. This is problematic because each function is translated into a state machine independently, and hence the register names used in the other state machines are not necessarily available. We work around this problem by introducing an additional component to our state machines: a mapping, called \lstinline{extern_ctrl}, that maps local register names to pairs of module identifiers and roles in those modules. For instance, the first entry in \lstinline{extern_ctrl} in Figure~\ref{fig:example_HTL} tells us that the \lstinline{add_0_rst} register used by the \lstinline{main} state machine should resolve to whichever register plays the role of `reset' in the \lstinline{add} state machine. Once all the state machines have been generated, we erase \lstinline{extern_ctrl}. We do this in two steps. First, we perform a renaming pass through
the whole program, making all register names globally unique. This is necessary to avoid
unintended conflicts between register names in different modules, as register names can only be assumed to be unique within their own module. We then do a second pass, renaming
-registers present in \lstinline{externctrl} to the name of the register they target. \JW{Maybe add an example of what the externctrl map would be for the worked example?}
+registers in \lstinline{extern_ctrl} to the name of the actual register they target. %\JW{I'm still struggling to understand externctrl properly. I don't see why we can't resolve externctrl straight away, when each HTL state machine is first generated. Why do we have to wait until they are linked together?}\YH{I think that there are multiple reasons: The first is that it's very difficult to access the function information from other functions, because this requires resolving their ID in the global AST state and then inspecting it. If one ends up using this information, it can be very tricky to prove the induction, because one isn't local to the current function anymore. All the proof structure in CompCert assumes one only deals within the current function. This also complicates the definition of the semantics. Secondly, I think it just helps as a bridge, because functions have their completely separate register files in 3AC. It's therefore quite difficult to map registers from one register file to the other unless one has something that describes which ones to map over.}
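+To make the erasure concrete: before it, \lstinline{main} drives the proxy registers declared for each call site; after the renaming, both call sites drive \lstinline{add}'s actual control registers, so the two calls share the same hardware. Schematically (the post-renaming register names are invented for illustration):
+\begin{lstlisting}[language=Verilog]
+// before erasure: each call site drives its own proxy registers
+add_0_rst <= 1;  add_0_a <= reg_3;  add_0_b <= reg_6;  // first call site
+add_1_rst <= 1;  add_1_a <= reg_3;  add_1_b <= reg_5;  // second call site
+
+// after renaming via extern_ctrl: both call sites resolve to the same
+// (globally unique) registers of the add machine
+add_rst <= 1;  add_param_0 <= reg_3;  add_param_1 <= reg_6;  // first call site
+add_rst <= 1;  add_param_0 <= reg_3;  add_param_1 <= reg_5;  // second call site
+\end{lstlisting}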
-\JW{Explain the assumption made about no pointers in called functions.}
+A second technical challenge we encountered in the implementation of \vericertfun{} has to do with an assumption made in \vericert{}'s correctness proof: that all pointers used by the input program are stored as offsets from the main function's stack frame. This assumption was reasonable for \vericert{} because after full inlining, the only function is the main function. This assumption is baked into several of \vericert{}'s most complicated lemmas, including the correctness proof for \lstinline{load} and \lstinline{store} instructions, and so we have not sought to lift it in the current incarnation of \vericertfun{}. Instead, we have made the compromise of only \emph{partially} eliminating the inlining pass. That is: \vericertfun{} inlines all functions that contain \lstinline{load} or \lstinline{store} instructions. Thus, the benefits of resource sharing are only enjoyed by functions that do not include these instructions.
\section{Proving \vericertfun{} correct}
The CompCert correctness theorem~\cite{compcert} expresses that every behaviour that can be exhibited by the compiled program is also a behaviour of the source program. \vericert{}~\cite{Herklotz2020} adapted this theorem for HLS by replacing `compiled program' with `generated Verilog design'. In both cases, a formal semantics is required for the source and target languages. \vericertfun{} targets the same fragment of the Verilog language as \citeauthor{Herklotz2020} already mechanised in Coq, so no changes are required there.
Where changes \emph{are} required is in the semantics of the intermediate language HTL, which sits between CompCert's 3AC and the final Verilog.
-When \citeauthor{Herklotz2020} designed HTL, they did not include a semantics for function calls because they assumed all function calls would already have been inlined. We have extended HTL so that its semantics is additionally parameterised by an environment that maps function names to state machines. We have added a semantics for function calls that looks up the named function in this environment, activates the corresponding state machine, and creates a new frame on the stack. We have also added a semantics for return statements that pops the current frame off the stack and reactivates the caller's state machine. Calls are initiated by triggering the reset signal of the called module (using the externctrl mechanism described earlier). Similarly, a function enters its returning state by setting its own ``finished'' register. \emph{There are no call or return instructions in HTL}. \MP{One aspect of this I didn't mention: This introduces non-determinism. When you set the reset of a module, the next state can be either a call or a normal executing state, ignoring the reset. This non-determinism does not affect the RTL-to-HTL proof, but could affect the HTL-to-Verilog one.}
+When \citeauthor{Herklotz2020} designed HTL, they did not include a semantics for function calls because they assumed all function calls would already have been inlined. We have extended HTL so that its semantics is additionally parameterised by an environment that maps function names to state machines. We have added a semantics for function calls that looks up the named function in this environment, activates the corresponding state machine, and creates a new frame on the stack. We have also added a semantics for return statements that pops the current frame off the stack and reactivates the caller's state machine. % \MP{One aspect of this I didn't mention: This introduces non-determinism. When you set the reset of a module, the next state can be either a call or a normal executing state, ignoring the reset. This non-determinism does not affect the RTL-to-HTL proof, but could affect the HTL-to-Verilog one.}
-At the point of writing, the correctness of \vericertfun{} from C to HTL has been mostly proven,
-meaning basic instructions, as well as the function calls themselves and their interaction with the
-stack is proven correct. Proofs of the load and store instructions are not quite finished, as
-additional invariants need to be proven to hold for those instructions. The pass that renames
-variables in HTL is yet to be proven, as is the pass that generates the final Verilog.
+At the time of writing, the correctness of \vericertfun{} from C to HTL has been mostly proven: basic instructions and function calls are proven correct, but the proofs for load and store instructions are not yet finished, as some additional invariants remain to be established. The pass that renames
+variables in HTL is yet to be proven, as is the pass that generates the final Verilog.
To give a rough idea of the scale and complexity of the task: the implementation of \vericertfun{} involved adding or changing about 700 lines of Coq code in \vericert{} and took the first author 4 months. The correctness proof, so far, has required about 2300 lines of additions or changes to the Coq code and 8 months.
+The entire \vericertfun{} development is fully open-source~\cite{vericertfun-github}. Further details about the implementation and proofs are available in a technical report~\cite{pardalos_thesis}.
+
\section{Performance evaluation}
-We now compare the performance of the hardware generated by \vericertfun{} against that generated by Vericert. Following \citet{Herklotz2020}, we use the PolyBench/C benchmark suite~\cite{polybench}. We used the Icarus Verilog simulator to determine the cycle counts of the generated designs. We used Xilinx Vivado 2017.1, targeting a Xilinx 7-series FPGA (XC7K70T) to determine area usage and fmax.
+We now compare the performance of the hardware generated by \vericertfun{} against that generated by \vericert{}. Following \citet{Herklotz2020}, we use the PolyBench/C benchmark suite~\cite{polybench}, with divisions replaced by an iterative algorithm expressed in C. We used the Icarus Verilog simulator to determine the cycle counts of the generated designs, and Xilinx Vivado 2017.1, targeting a Xilinx 7-series FPGA (XC7K70T), to determine area usage and fmax.
%Figure~\ref{fig:results} summarises our results. The x-axis shows the impact of resource sharing on
%the speed of the hardware (as calculated by the cycle count divided by fmax); we see that all the
@@ -430,16 +444,13 @@ We now compare the performance of the hardware generated by \vericertfun{} again
%introduces an extra state per function call. The impact on fmax is similarly minimal, ranging
%between a 1.5\% increase and a 3.1\% decrease (0.2\% decrease on average).
-Figure~\ref{fig:results} shows the area savings of \vericertfun{} obtained compared to \vericert{},
-and also shows their relative execution speed. These are compared the \bambu{} high-level synthesis
-tool~\cite{pilato13_bambu}, which is open source but unverified. First, one can can note that
-designs generated by \vericertfun{} and \vericert{} execute for a same amount of time, as any
-variations can mostly be explained by slight differences in the maximum frequency calculation. This
-shows that the addition of the resource sharing does not negatively affect the execution time. The
+Figure~\ref{fig:results} compares the quality of the hardware generated by \vericertfun{} with that generated by \vericert{}, using the open-source (but unverified) \bambu{} HLS tool~\cite{pilato13_bambu} as a baseline. The top graph compares area usage: we observe a consistent and substantial reduction across the benchmark programs, with \vericert{} consistently using more area than \bambu{} (1.4$\times$ on average) and \vericertfun{} almost always using less (0.7$\times$ on average).
+
+
+First, one can note that
+designs generated by \vericertfun{} and \vericert{} execute for a similar amount of time, with \vericertfun{} being only 1\% slower on average. This slowdown is explained by the extra latency introduced at each function call; it shows that, in general, sharing does not significantly affect execution time. The
area difference between \vericertfun{} and \vericert{} is quite large, which shows the effectiveness
-of the resource sharing that \vericertfun{} introduces. On average, the area of \vericert{} designs
-is 1.4$\times$ larger than designs produced by \bambu{}, whereas designs produced by \vericertfun{}
-are on average 0.7$\times$ the size of \bambu{} designs.
+of the resource sharing that \vericertfun{} introduces.
\pgfplotstableread[col sep=comma]{data/time-ratio.csv}{\divtimingtable}
\pgfplotstableread[col sep=comma]{data/slice-ratio.csv}{\divslicetable}
@@ -476,28 +487,30 @@ are on average 0.7$\times$ the size of \bambu{} designs.
ylabel style={font=\footnotesize},
xtick style={draw=none},
]
-
- \nextgroupplot[ymin=1,ymax=7,ylabel={Execution time relative to \legup{}}]
+
+ \nextgroupplot[ymin=0.3,ymax=5,ylabel={Area relative to \legup{}}]
\pgfplotsinvokeforeach{0,...,12}{%
\backgroundbar{#1}}
\backgroundbar[10]{13}
- \addplot+[vericertcol] table [x expr=\coordindex,y=vericert,col sep=comma] from \divtimingtable;
- \addplot+[legupnooptcol] table [x expr=\coordindex,y=vericert-fun,col sep=comma] from \divtimingtable;
+ \addplot+[vericertcol] table [x expr=\coordindex,y=vericert,col sep=comma] from \divslicetable;
+ \addplot+[legupnooptcol] table [x expr=\coordindex,y=vericert-fun,col sep=comma] from \divslicetable;
\draw (axis cs:-1,1) -- (axis cs:28,1);
% JW: redraw axis border which has been partially covered by the grey bars
- \draw (axis cs:-0.5,0.8) rectangle (axis cs:27.5,300);
+ \draw (axis cs:-0.5,0.3) rectangle (axis cs:27.5,10);
- \nextgroupplot[ymin=0.3,ymax=5,ylabel={Area relative to \legup{}}]
+ \nextgroupplot[ymin=1,ymax=7,ylabel={Execution time relative to \legup{}}]
\pgfplotsinvokeforeach{0,...,12}{%
\backgroundbar{#1}}
\backgroundbar[10]{13}
- \addplot+[vericertcol] table [x expr=\coordindex,y=vericert,col sep=comma] from \divslicetable;
- \addplot+[legupnooptcol] table [x expr=\coordindex,y=vericert-fun,col sep=comma] from \divslicetable;
+ \addplot+[vericertcol] table [x expr=\coordindex,y=vericert,col sep=comma] from \divtimingtable;
+ \addplot+[legupnooptcol] table [x expr=\coordindex,y=vericert-fun,col sep=comma] from \divtimingtable;
\draw (axis cs:-1,1) -- (axis cs:28,1);
% JW: redraw axis border which has been partially covered by the grey bars
- \draw (axis cs:-0.5,0.3) rectangle (axis cs:27.5,10);
+ \draw (axis cs:-0.5,0.8) rectangle (axis cs:27.5,300);
+
+
- \legend{\vericert{},vericert-fun};
+ \legend{\vericert{},\vericertfun{}};
\end{groupplot}
\end{tikzpicture}
\caption{Performance of \vericert{} compared to \vericertfun{}, using \legup{} as a baseline.}
@@ -512,7 +525,7 @@ functions that access pointers; we anticipate that this will lead to further red
usage. We would also like to make \vericertfun generate designs with one Verilog module per C
function, as this is more idiomatic than packing all the state machines into a single module; we did
not do this yet because it would require extending the formal Verilog semantics to handle multiple
-modules. \YH{TODO: Please check that sentence.}\YH{Yes that looks correct to me.} In the longer
+modules. It would also be interesting to design heuristics for selectively inlining functions~\cite{huang+15}. In the longer
term, we are considering how to implement resource sharing even more effectively in a verified HLS
tool, perhaps by implementing it as part of a resource-constrained scheduling algorithm~\cite{sdc}.