summaryrefslogtreecommitdiffstats
path: root/evaluation.tex
diff options
context:
space:
mode:
authorJohn Wickerson <j.wickerson@imperial.ac.uk>2021-04-16 22:52:24 +0000
committeroverleaf <overleaf@localhost>2021-04-16 23:34:56 +0000
commitbbf0c06c409180665f04346d186c4c5d991ecc15 (patch)
tree4a4d314ddef575ff015566957da873904c7c3b8a /evaluation.tex
parent71933526a7c203fb76d54d8f08fea3e132da535c (diff)
downloadoopsla21_fvhls-bbf0c06c409180665f04346d186c4c5d991ecc15.tar.gz
oopsla21_fvhls-bbf0c06c409180665f04346d186c4c5d991ecc15.zip
Update on Overleaf.
Diffstat (limited to 'evaluation.tex')
-rw-r--r--evaluation.tex4
1 files changed, 2 insertions, 2 deletions
diff --git a/evaluation.tex b/evaluation.tex
index 5f59ff8..08ac1fb 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -10,7 +10,7 @@ Our evaluation is designed to answer the following two research questions.
\subsection{Experimental Setup}
\label{sec:evaluation:setup}
-\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against \legup{} 4.0, because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}. We also compare against \legup{} with different optimisation levels. First, we only turn off the LLVM optimisations in \legup{}, to eliminate all the optimisations that are common to standard software compilers, referred to as \legup{} w/o opt. Secondly, we also compare against \legup{} with LLVM optimisations and operation chaining turned off, referred to as \legup{} w/o opt+chain, which is an HLS specific optimisation that combines data-dependent operations into one clock cycle, and therefore dramatically reduces the number of cycles, without necessarily increasing the clock speed.
+\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against \legup{} 4.0, because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}. We also compare against \legup{} with different optimisation levels \JW{in an effort to understand which optimisations have the biggest impact on the performance discrepancies between \legup{} and \vericert{}}. First, we only turn off the LLVM optimisations in \legup{}, to eliminate all the optimisations that are common to standard software compilers, referred to as \legup{} w/o opt. Secondly, we also compare against \legup{} with LLVM optimisations and operation chaining turned off, referred to as \legup{} w/o opt+chain. Operation chaining is an HLS-specific optimisation that combines data-dependent operations into one clock cycle, and therefore dramatically reduces the number of cycles, without necessarily increasing the clock speed. \JW{don't you mean decreasing??}
\paragraph{Choice and preparation of benchmarks.} We evaluate \vericert{} using the \polybench{} benchmark suite (version 4.2.1)~\cite{polybench}, which is a collection of 30 numerical kernels. \polybench{} is popular in the HLS context~\cite{choi+18,poly_hls_pouchet2013polyhedral,poly_hls_zhao2017,poly_hls_zuo2013}, since it has affine loop bounds, making it attractive for streaming computation on FPGA architectures.
We were able to use 27 of the 30 programs; three had to be discarded (\texttt{correlation},~\texttt{gramschmidt} and~\texttt{deriche}) because they involve square roots, requiring floats, which we do not support.
@@ -161,7 +161,7 @@ Looking at a few benchmarks in particular in Figure~\ref{fig:polybench-nodiv} fo
\subsection{RQ2: How area-efficient is \vericert{}-generated hardware?}
The bottom graphs in both Figure~\ref{fig:polybench-div} and Figure~\ref{fig:polybench-nodiv} compare the resource utilisation of the \polybench{} programs generated by \vericert{} and \legup{} at various optimisation levels.
-By looking at the median, when division/modulo operations are enabled, we see that \vericert{} produces hardware that is about the same size as optimised \legup{}, whereas the unoptimised versions of \legup{} actually produce slightly smaller hardware. This is because optimisations can often increase the size of the hardware to make it faster. Especially in Figure~\ref{fig:polybench-div}, there are a few benchmarks where the size of the \legup{} design is much smaller than that produced by \vericert{}. This can mostly be explained because of resource sharing in LegUp. Division/modulo operations need large circuits, and it is therefore usual to only have one circuit per design. As \vericert{} uses the na\"ive implementation of division/modulo, there will be multiple circuits present in the design, which blows up the size. Looking at Figure~\ref{fig:polybench-nodiv}, one can see that without division, the size of \vericert{} designs are almost always around the same size as \legup{} designs, never being more than 2$\times$ larger, and sometimes even being smaller. The similarity in area also shows that area is correctly being inferred by the synthesis tool as a RAM, and is therefore not implemented as registers.
+By looking at the median, when division/modulo operations are enabled, we see that \vericert{} produces hardware that is about the same size as optimised \legup{}, whereas the unoptimised versions of \legup{} actually produce slightly smaller hardware. This is because optimisations can often increase the size of the hardware to make it faster. Especially in Figure~\ref{fig:polybench-div}, there are a few benchmarks where the size of the \legup{} design is much smaller than that produced by \vericert{}. This can mostly be explained because of resource sharing in LegUp. Division/modulo operations need large circuits, and it is therefore usual to only have one circuit per design. As \vericert{} uses the na\"ive implementation of division/modulo, there will be multiple circuits present in the design, which blows up the size. Looking at Figure~\ref{fig:polybench-nodiv}, one can see that without division, the size of \vericert{} designs are almost always around the same size as \legup{} designs, never being more than 2$\times$ larger, and sometimes even being smaller. The similarity in area also shows that area \JW{?} is correctly being inferred by the synthesis tool as a RAM, and is therefore not implemented as registers.
%%% Local Variables:
%%% mode: latex