Fix last few small things

author: Yann Herklotz <git@yannherklotz.com> 2021-04-17 00:37:19 +0100
committer: Yann Herklotz <git@yannherklotz.com> 2021-04-17 00:37:19 +0100
commit: 811e65af1394197ff32e99dbe89295f9258baaee (patch)
tree: beaa74b1a69f587796ebbcd3595440eac3b555a2 /evaluation.tex
parent: 0f1416ee039d6e0b7ca3eb0563f62a22d00007c4 (diff)
1 files changed, 2 insertions, 2 deletions
diff --git a/evaluation.tex b/evaluation.tex
index 08ac1fb..5dcd49c 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -10,7 +10,7 @@ Our evaluation is designed to answer the following two research questions.
 \subsection{Experimental Setup}
 \label{sec:evaluation:setup}
 
-\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against \legup{} 4.0, because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}.  We also compare against \legup{} with different optimisation levels \JW{in an effort to understand which optimisations have the biggest impact on the performance discrepancies between \legup{} and \vericert{}}.  First, we only turn off the LLVM optimisations in \legup{}, to eliminate all the optimisations that are common to standard software compilers, referred to as \legup{} w/o opt.  Secondly, we also compare against \legup{} with LLVM optimisations and operation chaining turned off, referred to as \legup{} w/o opt+chain. Operation chaining is an HLS-specific optimisation that combines data-dependent operations into one clock cycle, and therefore dramatically reduces the number of cycles, without necessarily increasing the clock speed. \JW{don't you mean decreasing??}
+\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against \legup{} 4.0, because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}.  We also compare against \legup{} with different optimisation levels in an effort to understand which optimisations have the biggest impact on the performance discrepancies between \legup{} and \vericert{}.  First, we only turn off the LLVM optimisations in \legup{}, to eliminate all the optimisations that are common to standard software compilers, referred to as \legup{} w/o opt.  Secondly, we also compare against \legup{} with LLVM optimisations and operation chaining turned off, referred to as \legup{} w/o opt+chain. Operation chaining is an HLS-specific optimisation that combines data-dependent operations into one clock cycle, and therefore dramatically reduces the number of cycles, without necessarily decreasing the clock speed.
 
 \paragraph{Choice and preparation of benchmarks.} We evaluate \vericert{} using the \polybench{} benchmark suite (version 4.2.1)~\cite{polybench}, which is a collection of 30 numerical kernels. \polybench{} is popular in the HLS context~\cite{choi+18,poly_hls_pouchet2013polyhedral,poly_hls_zhao2017,poly_hls_zuo2013}, since it has affine loop bounds, making it attractive for streaming computation on FPGA architectures.
 We were able to use 27 of the 30 programs; three had to be discarded (\texttt{correlation},~\texttt{gramschmidt} and~\texttt{deriche}) because they involve square roots, requiring floats, which we do not support. 
@@ -161,7 +161,7 @@ Looking at a few benchmarks in particular in Figure~\ref{fig:polybench-nodiv} fo
 \subsection{RQ2: How area-efficient is \vericert{}-generated hardware?}
 
 The bottom graphs in both Figure~\ref{fig:polybench-div} and Figure~\ref{fig:polybench-nodiv} compare the resource utilisation of the \polybench{} programs generated by \vericert{} and \legup{} at various optimisation levels.
-By looking at the median, when division/modulo operations are enabled, we see that \vericert{} produces hardware that is about the same size as optimised \legup{}, whereas the unoptimised versions of \legup{} actually produce slightly smaller hardware.  This is because optimisations can often increase the size of the hardware to make it faster.  Especially in Figure~\ref{fig:polybench-div}, there are a few benchmarks where the size of the \legup{} design is much smaller than that produced by \vericert{}.  This can mostly be explained because of resource sharing in LegUp.  Division/modulo operations need large circuits, and it is therefore usual to only have one circuit per design.  As \vericert{} uses the na\"ive implementation of division/modulo, there will be multiple circuits present in the design, which blows up the size.  Looking at Figure~\ref{fig:polybench-nodiv}, one can see that without division, the size of \vericert{} designs are almost always around the same size as \legup{} designs, never being more than 2$\times$ larger, and sometimes even being smaller.  The similarity in area also shows that area \JW{?} is correctly being inferred by the synthesis tool as a RAM, and is therefore not implemented as registers.
+By looking at the median, when division/modulo operations are enabled, we see that \vericert{} produces hardware that is about the same size as optimised \legup{}, whereas the unoptimised versions of \legup{} actually produce slightly smaller hardware.  This is because optimisations can often increase the size of the hardware to make it faster.  Especially in Figure~\ref{fig:polybench-div}, there are a few benchmarks where the size of the \legup{} design is much smaller than that produced by \vericert{}.  This can mostly be explained because of resource sharing in LegUp.  Division/modulo operations need large circuits, and it is therefore usual to only have one circuit per design.  As \vericert{} uses the na\"ive implementation of division/modulo, there will be multiple circuits present in the design, which blows up the size.  Looking at Figure~\ref{fig:polybench-nodiv}, one can see that without division, the size of \vericert{} designs are almost always around the same size as \legup{} designs, never being more than 2$\times$ larger, and sometimes even being smaller.  The similarity in area also shows that RAM is correctly being inferred by the synthesis tool, and is therefore not implemented as registers.
 
 %%% Local Variables:
 %%% mode: latex