summaryrefslogtreecommitdiffstats
path: root/evaluation.tex
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2021-04-17 00:37:19 +0100
committerYann Herklotz <git@yannherklotz.com>2021-04-17 00:37:19 +0100
commit811e65af1394197ff32e99dbe89295f9258baaee (patch)
treebeaa74b1a69f587796ebbcd3595440eac3b555a2 /evaluation.tex
parent0f1416ee039d6e0b7ca3eb0563f62a22d00007c4 (diff)
downloadoopsla21_fvhls-811e65af1394197ff32e99dbe89295f9258baaee.tar.gz
oopsla21_fvhls-811e65af1394197ff32e99dbe89295f9258baaee.zip
Fix last few small things
Diffstat (limited to 'evaluation.tex')
-rw-r--r--evaluation.tex4
1 files changed, 2 insertions, 2 deletions
diff --git a/evaluation.tex b/evaluation.tex
index 08ac1fb..5dcd49c 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -10,7 +10,7 @@ Our evaluation is designed to answer the following two research questions.
\subsection{Experimental Setup}
\label{sec:evaluation:setup}
-\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against \legup{} 4.0, because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}. We also compare against \legup{} with different optimisation levels \JW{in an effort to understand which optimisations have the biggest impact on the performance discrepancies between \legup{} and \vericert{}}. First, we only turn off the LLVM optimisations in \legup{}, to eliminate all the optimisations that are common to standard software compilers, referred to as \legup{} w/o opt. Secondly, we also compare against \legup{} with LLVM optimisations and operation chaining turned off, referred to as \legup{} w/o opt+chain. Operation chaining is an HLS-specific optimisation that combines data-dependent operations into one clock cycle, and therefore dramatically reduces the number of cycles, without necessarily increasing the clock speed. \JW{don't you mean decreasing??}
+\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against \legup{} 4.0, because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}. We also compare against \legup{} with different optimisation levels in an effort to understand which optimisations have the biggest impact on the performance discrepancies between \legup{} and \vericert{}. First, we only turn off the LLVM optimisations in \legup{}, to eliminate all the optimisations that are common to standard software compilers, referred to as \legup{} w/o opt. Secondly, we also compare against \legup{} with LLVM optimisations and operation chaining turned off, referred to as \legup{} w/o opt+chain. Operation chaining is an HLS-specific optimisation that combines data-dependent operations into one clock cycle, and therefore dramatically reduces the number of cycles, without necessarily decreasing the clock speed.
\paragraph{Choice and preparation of benchmarks.} We evaluate \vericert{} using the \polybench{} benchmark suite (version 4.2.1)~\cite{polybench}, which is a collection of 30 numerical kernels. \polybench{} is popular in the HLS context~\cite{choi+18,poly_hls_pouchet2013polyhedral,poly_hls_zhao2017,poly_hls_zuo2013}, since it has affine loop bounds, making it attractive for streaming computation on FPGA architectures.
We were able to use 27 of the 30 programs; three had to be discarded (\texttt{correlation},~\texttt{gramschmidt} and~\texttt{deriche}) because they involve square roots, requiring floats, which we do not support.
@@ -161,7 +161,7 @@ Looking at a few benchmarks in particular in Figure~\ref{fig:polybench-nodiv} fo
\subsection{RQ2: How area-efficient is \vericert{}-generated hardware?}
The bottom graphs in both Figure~\ref{fig:polybench-div} and Figure~\ref{fig:polybench-nodiv} compare the resource utilisation of the \polybench{} programs generated by \vericert{} and \legup{} at various optimisation levels.
-By looking at the median, when division/modulo operations are enabled, we see that \vericert{} produces hardware that is about the same size as optimised \legup{}, whereas the unoptimised versions of \legup{} actually produce slightly smaller hardware. This is because optimisations can often increase the size of the hardware to make it faster. Especially in Figure~\ref{fig:polybench-div}, there are a few benchmarks where the size of the \legup{} design is much smaller than that produced by \vericert{}. This can mostly be explained because of resource sharing in LegUp. Division/modulo operations need large circuits, and it is therefore usual to only have one circuit per design. As \vericert{} uses the na\"ive implementation of division/modulo, there will be multiple circuits present in the design, which blows up the size. Looking at Figure~\ref{fig:polybench-nodiv}, one can see that without division, the size of \vericert{} designs are almost always around the same size as \legup{} designs, never being more than 2$\times$ larger, and sometimes even being smaller. The similarity in area also shows that area \JW{?} is correctly being inferred by the synthesis tool as a RAM, and is therefore not implemented as registers.
+By looking at the median, when division/modulo operations are enabled, we see that \vericert{} produces hardware that is about the same size as optimised \legup{}, whereas the unoptimised versions of \legup{} actually produce slightly smaller hardware. This is because optimisations can often increase the size of the hardware to make it faster. Especially in Figure~\ref{fig:polybench-div}, there are a few benchmarks where the size of the \legup{} design is much smaller than that produced by \vericert{}. This can mostly be explained because of resource sharing in LegUp. Division/modulo operations need large circuits, and it is therefore usual to only have one circuit per design. As \vericert{} uses the na\"ive implementation of division/modulo, there will be multiple circuits present in the design, which blows up the size. Looking at Figure~\ref{fig:polybench-nodiv}, one can see that without division, the size of \vericert{} designs are almost always around the same size as \legup{} designs, never being more than 2$\times$ larger, and sometimes even being smaller. The similarity in area also shows that RAM is correctly being inferred by the synthesis tool, and is therefore not implemented as registers.
%%% Local Variables:
%%% mode: latex