summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2021-04-16 21:49:40 +0100
committerYann Herklotz <git@yannherklotz.com>2021-04-16 21:49:40 +0100
commitcbae62c443013a2888a6e93cf409adff1f395f63 (patch)
treeaa794c2eed196a023daeb4c43e10173f7ea94c2f
parent1479bb42c2877c29376549d768a97676e1b96841 (diff)
downloadoopsla21_fvhls-cbae62c443013a2888a6e93cf409adff1f395f63.tar.gz
oopsla21_fvhls-cbae62c443013a2888a6e93cf409adff1f395f63.zip
Fix
-rw-r--r--evaluation.tex4
-rw-r--r--proof.tex4
2 files changed, 2 insertions, 6 deletions
diff --git a/evaluation.tex b/evaluation.tex
index f82c83b..2e4ccd5 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -10,13 +10,13 @@ Our evaluation is designed to answer the following two research questions.
\subsection{Experimental Setup}
\label{sec:evaluation:setup}
-\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against \legup{} 4.0, because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}. We also compare against \legup{} with different optimisation levels. First, we only turn off the LLVM optimisations in \legup{}, to eliminate all the optimisations that are common to standard software compilers. Secondly, we also compare against \legup{} with LLVM optimisations and operation chaining turned off, which is an HLS specific optimisation that combines data-dependent operations into one clock cycle, and therefore dramatically reduces the number of cycles, without necessarily increasing the clock speed.
+\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against \legup{} 4.0, because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}. We also compare against \legup{} with different optimisation levels. First, we only turn off the LLVM optimisations in \legup{}, to eliminate all the optimisations that are common to standard software compilers, referred to as \legup{} w/o opt. Secondly, we also compare against \legup{} with LLVM optimisations and operation chaining turned off, referred to as \legup{} w/o opt+chain, which is an HLS specific optimisation that combines data-dependent operations into one clock cycle, and therefore dramatically reduces the number of cycles, without necessarily increasing the clock speed.
\paragraph{Choice and preparation of benchmarks.} We evaluate \vericert{} using the \polybench{} benchmark suite (version 4.2.1)~\cite{polybench}, which is a collection of 30 numerical kernels. \polybench{} is popular in the HLS context~\cite{choi+18,poly_hls_pouchet2013polyhedral,poly_hls_zhao2017,poly_hls_zuo2013}, since it has affine loop bounds, making it attractive for streaming computation on FPGA architectures.
We were able to use 27 of the 30 programs; three had to be discarded (\texttt{correlation},~\texttt{gramschmidt} and~\texttt{deriche}) because they involve square roots, requiring floats, which we do not support.
% Interestingly, we were also unable to evaluate \texttt{cholesky} on \legup{}, since it produce an error during its HLS compilation.
%In summary, we evaluate 27 programs from the latest Polybench suite.
-We configured \polybench{}'s parameters so that only integer types are used, since we do not support floats. We use \polybench{}'s smallest datasets for each program to ensure that data can reside within on-chip memories of the FPGA, avoiding any need for off-chip memory accesses.
+We configured \polybench{}'s parameters so that only integer types are used. We use \polybench{}'s smallest datasets for each program to ensure that data can reside within on-chip memories of the FPGA, avoiding any need for off-chip memory accesses.
\vericert{} implements divisions and modulo operations in C using the corresponding built-in Verilog operators. These built-in operators are designed to complete within a single clock cycle, and this causes substantial penalties in clock frequency. Other HLS tools, including LegUp, supply their own multi-cycle division/modulo implementations, and we plan to do the same in future versions of \vericert{}. Implementing pipelined operators such as the divide and modulus operator can be solved by scheduling the instructions so that these can execute in parallel, which is the main optimisation that needs to be added to \vericert{}. In the meantime, we have prepared an alternative version of the benchmarks in which each division/modulo operation is replaced with our own implementation that uses repeated division and multiplications by 2. Figure~\ref{fig:polybench-div} shows the results of comparing Vericert with optimised LegUp 4.0 on the \polybench{} benchmarks, where divisions have been left intact. Figure~\ref{fig:polybench-nodiv} performs the comparison where the division/modulo operations have been replaced by the iterative algorithm.
diff --git a/proof.tex b/proof.tex
index bc1d95f..6749482 100644
--- a/proof.tex
+++ b/proof.tex
@@ -168,10 +168,6 @@ The HTL-to-Verilog simulation is conceptually simple, as the only transformation
%The proof of the translation from maps to case-statements follows by induction over the list of elements in the map and the fact that each key will be unique. In addition to that, the statement that is currently being evaluated is guaranteed by the correctness of the list of elements to be in that list. The latter fact therefore eliminates the base case, as an empty list does not contain the element we know is in the list. The other two cases follow by the fact that either the key is equal to the evaluated value of the case expression, or it isn't. In the first case we can then evaluate the statement and get the state after the case expression, as the uniqueness of the key tells us that the key cannot show up in the list anymore. In the other case we can just apply the inductive hypothesis and remove the current case from the case statement, as it did not match.
\end{proof}
-One problem with our representation of the state as an actual register is that we have to make sure that the state does not overflow. Currently, the state register always has 32 bits, meaning the maximum number of states supported is $2^{32}$.
-%This means that during the translation we have to check that each state can fit into an integer.
-\vericert{} will error out if there are more than this many nodes in the 3AC, thus satisfying the correctness theorem vacuously.
-
\subsection{Deterministic Semantics}\label{sec:proof:deterministic}
%Finally, to obtain the backward simulation that we want, it has to be shown that if we generate hardware with a specific behaviour, that it is the only possible program with that behaviour. This only has to be performed for the final intermediate language, which is Verilog, so that the backward simulation holds for the whole chain from Clight to Verilog.