path: root/evaluation.tex
author    Yann Herklotz <git@yannherklotz.com>  2020-11-13 13:13:43 +0000
committer Yann Herklotz <git@yannherklotz.com>  2020-11-13 13:13:43 +0000
commit    cdbe6f978bf6de9a1b389cebc02dc74666b7a4ee (patch)
tree      c2013228f14222a3fd7f373f3a815edac38806b7 /evaluation.tex
parent    fae93d199b799b3b7ecf9900cec81e548de99d83 (diff)
Update some sections
Diffstat (limited to 'evaluation.tex')
 evaluation.tex | 97
 1 file changed, 42 insertions(+), 55 deletions(-)
diff --git a/evaluation.tex b/evaluation.tex
index b81d08d..cb185ac 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -4,16 +4,31 @@ Our evaluation is designed to answer the following three research questions. \JW
\begin{description}
\item[RQ1] How fast is the hardware generated by \vericert{}, and how does this compare to existing HLS tools?
\item[RQ2] How area-efficient is the hardware generated by \vericert{}, and how does this compare to existing HLS tools?
-\item[RQ3] How long does \vericert{} take to produce hardware?
+%\item[RQ3] How long does \vericert{} take to produce hardware?
\end{description}
\subsection{Experimental Setup}
-\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against LegUp 4.0 because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis+11}.
+\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against LegUp 4.0 because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}.
-\paragraph{Choice of benchmarks.} We evaluate \vericert{} using the PolyBench/C benchmark suite\footnote{\url{http://web.cs.ucla.edu/~pouchet/software/polybench/}}. PolyBench/C is a modern benchmark suite that has been previously used to evaluate HLS tools~\cite{choi+18}. For completeness, we use the full set of 24 benchmarks. We set the benchmark parameters so that all datatypes are integers (since \vericert{} only supports integers) and all datasets are `small' (to fit into the small on-chip memories). A current limitation of \vericert{}, as discussed in Section~\ref{?}, is that it does not support addition and subtraction operations involving integer literals not divisible by 4. To work around this, we lightly modified each benchmark program so that literals other than multiples of 4 are stored into variables before being added or subtracted. \JW{Any other notable changes to the benchmarks?}
+\paragraph{Choice of benchmarks.} We evaluate \vericert{} using the PolyBench/C benchmark suite\footnote{\url{http://web.cs.ucla.edu/~pouchet/software/polybench/}} and some CHStone benchmarks\footnote{\url{http://www.ertl.jp/chstone/}}. PolyBench/C is a modern benchmark suite that has been previously used to evaluate HLS tools~\cite{choi+18}. For completeness, we use the full set of 24 benchmarks. We set the benchmark parameters so that all datatypes are integers (since \vericert{} only supports integers) and all datasets are `small' (to fit into the small on-chip memories). A current limitation of \vericert{}, as discussed in Section~\ref{?}, is that it does not support addition and subtraction operations involving integer literals not divisible by 4. To work around this, we lightly modified each benchmark program so that literals other than multiples of 4 are stored into variables before being added or subtracted. \JW{Any other notable changes to the benchmarks?}
-\paragraph{Experimental setup.} In order to generate a hardware implementation, the Verilog produced by the HLS tool-under-test must be synthesised to a netlist; the resultant netlist can then be placed-and-routed for a particular FPGA device. We use Intel Quartus~\cite{quartus} for both of these tasks, and we target an Arria 10 \ref{?} FPGA.
+\paragraph{Experimental setup.} In order to generate a hardware implementation, the Verilog produced by the HLS tool-under-test must be synthesised to a netlist; the resultant netlist can then be placed-and-routed for a particular FPGA device. We use Intel Quartus~\cite{quartus} for both of these tasks, and we target an Arria 10 FPGA.
+
+\begin{table*}
+ \begin{tabular}{lcccccccccccc}
+ \toprule
+ Bench & \multicolumn{2}{c}{Cycles} & \multicolumn{2}{c}{Frequency / MHz} & \multicolumn{2}{c}{LUTs} & \multicolumn{2}{c}{Reg} & \multicolumn{2}{c}{BRAMs} & \multicolumn{2}{c}{DSPs}\\
+ & L & V & L & V & L & V & L & V & L & V & L & V\\
+ \midrule
+ adpcm & 30241 & 121386 & 90.05 & 66.3 & 7719 & 51626 & 12034 & 42688 & 7 & 0 & 0 & 48\\
+ aes & 8489 & 41958 & 87.83 & 19.6 & 24413 & 104017 & 23796 & 94239 & 19 & 0 & 0 & 6\\
+ gsm & 7190 & 21994 & 119.25 & 66.1 & 6638 & 45764 & 9201 & 33675 & 3 & 0 & 0 & 8 \\
+ mips & 7754 & 18482 & 98.95 & 78.43 & 5049 & 10617 & 4185 & 7690 & 0 & 0 & 0 & 0\\
+ \bottomrule
+ \end{tabular}
+ \caption{CHStone programs synthesised by \legup{} 5.1 (columns L) and by \vericert{} (columns V)}\label{tab:chstone}
+\end{table*}
\subsection{RQ1: How fast is \vericert{}-generated hardware?}
@@ -45,9 +60,9 @@ Our evaluation is designed to answer the following three research questions. \JW
\label{fig:comparison_time}
\end{figure}
-Figure~\ref{fig:comparison_time} compares the execution time of hardware designs generated by \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. It uses logarithmic scales so that the data points are spread out more evenly. The execution times are quite well correlated between the two tools, but LegUp generates hardware designs that execute roughly fifty times faster, as illustrated by the diagonal line at $y=50x$. There are two reasons for this. First, because \vericert{}-generated hardware has all of its operations serialised, it requires more clock cycles to perform its computation -- about 4$\times$ as many as LegUp-generated hardware. Second, \vericert{} generates hardware with a much lower clock frequency than LegUp (roughly 22 MHz compared to roughly 250 MHz) so each of those clock cycles takes about 11$\times$ longer. The main reason for the low clock frequency is \JW{could somebody else finish this bit off?}.
+Figure~\ref{fig:comparison_time} compares the execution time of hardware designs generated by \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. It uses logarithmic scales so that the data points are spread out more evenly. The execution times are quite well correlated between the two tools, but LegUp generates hardware designs that execute roughly fifty times faster, as illustrated by the diagonal line at $y=50x$. There are two reasons for this. First, because \vericert{}-generated hardware has all of its operations serialised, it requires more clock cycles to perform its computation -- about 4$\times$ as many as LegUp-generated hardware. Second, \vericert{} generates hardware with a much lower clock frequency than LegUp (roughly 22 MHz compared to roughly 250 MHz) so each of those clock cycles takes about 11$\times$ longer. The main reason for the low clock frequency is the divide circuits that \vericert{} introduces for division operations in the C source. These currently complete within a single clock cycle, so the large combinational delay of the divider dominates the critical path and caps the achievable clock frequency. One possible solution is to spread each divide operation over multiple clock cycles.
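A multi-cycle divider of the kind suggested above could look roughly like the following (a hypothetical sketch, not \vericert{}'s actual circuit; all signal names are illustrative):

```verilog
// Unsigned 32-bit restoring divider, one shift-subtract step per clock
// cycle, so only a single subtraction sits on the critical path.
module div_multicycle (
  input  wire        clk,
  input  wire        start,
  input  wire [31:0] a,      // dividend
  input  wire [31:0] b,      // divisor (assumed non-zero)
  output reg         done,
  output reg  [31:0] q       // quotient
);
  reg  [31:0] r;             // partial remainder (invariant: r < b)
  reg  [31:0] sh;            // dividend bits still to be consumed
  reg  [5:0]  i;             // steps remaining
  wire [32:0] next = {r, sh[31]};   // shift next dividend bit into r

  always @(posedge clk) begin
    if (start) begin
      r <= 0; sh <= a; q <= 0; i <= 32; done <= 0;
    end else if (i != 0) begin
      if (next >= {1'b0, b}) begin
        r <= next - {1'b0, b};       // subtract succeeds: quotient bit 1
        q <= {q[30:0], 1'b1};
      end else begin
        r <= next[31:0];             // restore: quotient bit 0
        q <= {q[30:0], 1'b0};
      end
      sh <= {sh[30:0], 1'b0};
      i  <= i - 1;
    end else begin
      done <= 1;                     // q now holds a / b
    end
  end
endmodule
```

This trades 32 cycles of latency for a much shorter combinational path than a single-cycle divider.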
-The difference in cycle counts shows the degree of parallelism that LegUp's scheduling and memory system can offer. However, their Verilog generation is not guaranteed to be correct. Although the runtime LegUp outputs are tested to be correct for these programs, this does not provide any confidence on the correctness of Verilog generation of any other programs. Our Coq proof mechanisation guarantees that generated Verilog is correct for any input program that uses the subset of \compcert{} instructions that we have proven to be correct.
+The difference in cycle counts shows the degree of parallelism that \legup{}'s scheduling and memory system can offer. However, its Verilog generation is not guaranteed to be correct. Although the hardware that \legup{} outputs is tested to behave correctly for these programs, this provides no confidence in the correctness of the Verilog it generates for any other program. Our Coq proof mechanisation guarantees that the generated Verilog is correct for any input program that uses the subset of \compcert{} instructions that we have proven correct.
\subsection{RQ2: How area-efficient is \vericert{}-generated hardware?}
@@ -71,58 +86,30 @@ The difference in cycle counts shows the degree of parallelism that LegUp's sch
\label{fig:comparison_area}
\end{figure}
-Figure~\ref{fig:comparison_area} compares the size of the hardware designs generated by \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that \vericert{} designs use between 1\% and 30\% of the available logic on the FPGA, averaging at around 10\%, whereas LegUp designs all use less than 1\% of the FPGA, averaging at around 0.45\%. The main reason for this is \JW{blah blah}.
-
-\subsection{RQ3: How long does \vericert{} take to produce hardware?}
-
-\begin{figure}
-\begin{tikzpicture}
-\begin{axis}[
- height=80mm,
- width=80mm,
- xlabel={LegUp (s)},
- ylabel={\vericert{} (s)},
- ]
+Figure~\ref{fig:comparison_area} compares the size of the hardware designs generated by \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that \vericert{} designs use between 1\% and 30\% of the available logic on the FPGA, averaging at around 10\%, whereas LegUp designs all use less than 1\% of the FPGA, averaging at around 0.45\%. This is mainly because Quartus does not automatically infer RAM from the Verilog that \vericert{} generates, so memories are implemented in logic elements rather than block RAM. Other synthesis tools can infer the RAM correctly from \vericert{}'s output, so this issue could be solved either by using a different synthesis tool and targeting a different FPGA, or by generating a Verilog template that allows Quartus to identify the RAM automatically.
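Such a template could follow the commonly recognised synchronous-memory idiom that synthesis tools map to block RAM (a generic sketch under that assumption, not \vericert{}'s actual output):

```verilog
// A memory coded in the style that synthesis tools typically map to
// on-chip block RAM: a register array with synchronous write and
// synchronous read in a single clocked process.
module ram #(parameter AW = 10) (
  input  wire          clk,
  input  wire          we,
  input  wire [AW-1:0] addr,
  input  wire [31:0]   din,
  output reg  [31:0]   dout
);
  reg [31:0] mem [0:(1 << AW) - 1];

  always @(posedge clk) begin
    if (we) mem[addr] <= din;   // synchronous write
    dout <= mem[addr];          // registered read enables BRAM inference
  end
endmodule
```

The key property is the registered read: a combinational (asynchronous) read would typically force the memory into logic elements instead.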
+
+%\subsection{RQ3: How long does \vericert{} take to produce hardware?}
+%
+%\begin{figure}
+%\begin{tikzpicture}
+%\begin{axis}[
+% height=80mm,
+% width=80mm,
+% xlabel={LegUp (s)},
+% ylabel={\vericert{} (s)},
+% ]
-\addplot[draw=none, mark=*, draw opacity=0, fill opacity=0.3]
- table [x=legupcomptime, y=vericertcomptime, col sep=comma]
- {results/comparison.csv};
+%\addplot[draw=none, mark=*, draw opacity=0, fill opacity=0.3]
+% table [x=legupcomptime, y=vericertcomptime, col sep=comma]
+% {results/comparison.csv};
-\end{axis}
-\end{tikzpicture}
-\caption{A comparison of compilation time for \vericert{} and for LegUp \JW{Numbers for \vericert{} not ready yet.} \JW{This graph might end up being cut -- we might just summarise the numbers in the paragraph.}}
-\label{fig:comparison_comptime}
-\end{figure}
-
-Figure~\ref{fig:comparison_comptime} compares the compilation times of \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that \vericert{} compilation is about ten times faster than LegUp compilation. The simple reason for this is that \vericert{} omits many of the complicated and time-consuming optimisations that LegUp performs, such as scheduling multiple operations into clock cycles.
-
-\begin{table}
- \begin{tabular}{lcccccc}
- \toprule
- Bench & Cycles & MHz & LUTs & Reg & BRAMs\\
- \midrule
- adpcm & 30241 &90.05 & 7719 & 12034 & 7\\
- aes & 8489 & 87.83 & 24413 & 23796 & 19 \\
- gsm & 7190 & 119.25 & 6638 & 9201 & 3 \\
- mips & 7754 & 98.95 & 5049 & 4185 & 0 \\
- \bottomrule
- \end{tabular}
- \caption{CHstone programs synthesised in LegUp 5.1}
-\end{table}
+%\end{axis}
+%\end{tikzpicture}
+%\caption{A comparison of compilation time for \vericert{} and for LegUp \JW{Numbers for \vericert{} not ready yet.} \JW{This graph might end up being cut -- we might just summarise the numbers in the paragraph.}}
+%\label{fig:comparison_comptime}
+%\end{figure}
-\begin{table}
- \begin{tabular}{lccccccc}
- \toprule
- Bench & Cycles & MHz & LUTs & Reg & BRAMs & DSPs\\
- \midrule
- adpcm & 121386 & 66.3 & 51626 & 42688 & 0 & 48 &\\
- aes & 41958 & 19.6 & 104017 & 94239 & 0 & 6 \\
- gsm & 21994 & 66.1 & 45764 & 33675 & 0 & 8\\
- mips & 18482 & 78.43 & 10617 & 7690 & 0 & 0\\
- \bottomrule
- \end{tabular}
- \caption{CHstone programs synthesised in \vericert{}}
-\end{table}
+%Figure~\ref{fig:comparison_comptime} compares the compilation times of \vericert{} and LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that \vericert{} compilation is about ten times faster than LegUp compilation. The simple reason for this is that \vericert{} omits many of the complicated and time-consuming optimisations that LegUp performs, such as scheduling multiple operations into clock cycles.
%%% Local Variables:
%%% mode: latex