Update on Overleaf.

author: John Wickerson <j.wickerson@imperial.ac.uk> 2021-04-16 22:52:24 +0000
committer: overleaf <overleaf@localhost> 2021-04-16 23:34:56 +0000
commit: bbf0c06c409180665f04346d186c4c5d991ecc15 (patch)
tree: 4a4d314ddef575ff015566957da873904c7c3b8a /evaluation.tex
parent: 71933526a7c203fb76d54d8f08fea3e132da535c (diff)
download: oopsla21_fvhls-bbf0c06c409180665f04346d186c4c5d991ecc15.tar.gz oopsla21_fvhls-bbf0c06c409180665f04346d186c4c5d991ecc15.zip
1 files changed, 2 insertions, 2 deletions
diff --git a/evaluation.tex b/evaluation.tex
index 5f59ff8..08ac1fb 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -10,7 +10,7 @@ Our evaluation is designed to answer the following two research questions.
 \subsection{Experimental Setup}
 \label{sec:evaluation:setup}
 
-\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against \legup{} 4.0, because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}.  We also compare against \legup{} with different optimisation levels.  First, we only turn off the LLVM optimisations in \legup{}, to eliminate all the optimisations that are common to standard software compilers, referred to as \legup{} w/o opt.  Secondly, we also compare against \legup{} with LLVM optimisations and operation chaining turned off, referred to as \legup{} w/o opt+chain, which is an HLS specific optimisation that combines data-dependent operations into one clock cycle, and therefore dramatically reduces the number of cycles, without necessarily increasing the clock speed.
+\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against \legup{} 4.0, because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}.  We also compare against \legup{} with different optimisation levels \JW{in an effort to understand which optimisations have the biggest impact on the performance discrepancies between \legup{} and \vericert{}}.  First, we only turn off the LLVM optimisations in \legup{}, to eliminate all the optimisations that are common to standard software compilers, referred to as \legup{} w/o opt.  Secondly, we also compare against \legup{} with LLVM optimisations and operation chaining turned off, referred to as \legup{} w/o opt+chain. Operation chaining is an HLS-specific optimisation that combines data-dependent operations into one clock cycle, and therefore dramatically reduces the number of cycles, without necessarily increasing the clock speed. \JW{don't you mean decreasing??}
 
 \paragraph{Choice and preparation of benchmarks.} We evaluate \vericert{} using the \polybench{} benchmark suite (version 4.2.1)~\cite{polybench}, which is a collection of 30 numerical kernels. \polybench{} is popular in the HLS context~\cite{choi+18,poly_hls_pouchet2013polyhedral,poly_hls_zhao2017,poly_hls_zuo2013}, since it has affine loop bounds, making it attractive for streaming computation on FPGA architectures.
 We were able to use 27 of the 30 programs; three had to be discarded (\texttt{correlation},~\texttt{gramschmidt} and~\texttt{deriche}) because they involve square roots, requiring floats, which we do not support. 
@@ -161,7 +161,7 @@ Looking at a few benchmarks in particular in Figure~\ref{fig:polybench-nodiv} fo
 \subsection{RQ2: How area-efficient is \vericert{}-generated hardware?}
 
 The bottom graphs in both Figure~\ref{fig:polybench-div} and Figure~\ref{fig:polybench-nodiv} compare the resource utilisation of the \polybench{} programs generated by \vericert{} and \legup{} at various optimisation levels.
-By looking at the median, when division/modulo operations are enabled, we see that \vericert{} produces hardware that is about the same size as optimised \legup{}, whereas the unoptimised versions of \legup{} actually produce slightly smaller hardware.  This is because optimisations can often increase the size of the hardware to make it faster.  Especially in Figure~\ref{fig:polybench-div}, there are a few benchmarks where the size of the \legup{} design is much smaller than that produced by \vericert{}.  This can mostly be explained because of resource sharing in LegUp.  Division/modulo operations need large circuits, and it is therefore usual to only have one circuit per design.  As \vericert{} uses the na\"ive implementation of division/modulo, there will be multiple circuits present in the design, which blows up the size.  Looking at Figure~\ref{fig:polybench-nodiv}, one can see that without division, the size of \vericert{} designs are almost always around the same size as \legup{} designs, never being more than 2$\times$ larger, and sometimes even being smaller.  The similarity in area also shows that area is correctly being inferred by the synthesis tool as a RAM, and is therefore not implemented as registers.
+By looking at the median, when division/modulo operations are enabled, we see that \vericert{} produces hardware that is about the same size as optimised \legup{}, whereas the unoptimised versions of \legup{} actually produce slightly smaller hardware.  This is because optimisations can often increase the size of the hardware to make it faster.  Especially in Figure~\ref{fig:polybench-div}, there are a few benchmarks where the size of the \legup{} design is much smaller than that produced by \vericert{}.  This can mostly be explained because of resource sharing in LegUp.  Division/modulo operations need large circuits, and it is therefore usual to only have one circuit per design.  As \vericert{} uses the na\"ive implementation of division/modulo, there will be multiple circuits present in the design, which blows up the size.  Looking at Figure~\ref{fig:polybench-nodiv}, one can see that without division, the size of \vericert{} designs are almost always around the same size as \legup{} designs, never being more than 2$\times$ larger, and sometimes even being smaller.  The similarity in area also shows that area \JW{?} is correctly being inferred by the synthesis tool as a RAM, and is therefore not implemented as registers.
 
 %%% Local Variables:
 %%% mode: latex