Diffstat (limited to 'evaluation.tex')
-rw-r--r-- evaluation.tex 23
1 files changed, 12 insertions, 11 deletions
diff --git a/evaluation.tex b/evaluation.tex
index cd0a278..edb5795 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -3,8 +3,8 @@
Our evaluation is designed to answer the following three research questions.
\begin{description}
-\item[RQ1] How fast is the hardware generated by \vericert{}, and how does this compare to existing HLS tools?
-\item[RQ2] How area-efficient is the hardware generated by \vericert{}, and how does this compare to existing HLS tools?
+\item[RQ1] How fast is the hardware generated by \vericert{}?
+\item[RQ2] How area-efficient is the hardware generated by \vericert{}?
\item[RQ3] How long does \vericert{} take to produce hardware?
\end{description}
@@ -12,21 +12,22 @@ Our evaluation is designed to answer the following three research questions.
\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against \legup{} 5.1 because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}.
-\paragraph{Choice of benchmarks.} We evaluate \vericert{} using the PolyBench/C benchmark suite~\cite{polybench}, consisting of a collection of well-known numerical kernels. PolyBench/C is widely-used in the HLS context~\cite{choi+18,poly_hls_pouchet2013polyhedral,poly_hls_zhao2017,poly_hls_zuo2013}, since it consists of affine loop bounds, making it attractive for regular and streaming computation on FPGA architectures.
-We chose Polybench 4.2.1 for our experiments, which consists of 30 programs.
-Out of these 30 programs, three programs utilise square root functions: \texttt{corelation},~\texttt{gramschmidt} and~\texttt{deriche}.
-Hence, we were unable evaluate these programs, since they mandatorily require \texttt{float}s.
+\paragraph{Choice and preparation of benchmarks.} We evaluate \vericert{} using the PolyBench/C benchmark suite (version 4.2.1)~\cite{polybench}, which consists of a collection of 30 numerical kernels. PolyBench/C is popular in the HLS context~\cite{choi+18,poly_hls_pouchet2013polyhedral,poly_hls_zhao2017,poly_hls_zuo2013}, since it has affine loop bounds, making it attractive for streaming computation on FPGA architectures.
+We were able to use 27 of the 30 programs; three had to be discarded (\texttt{correlation},~\texttt{gramschmidt} and~\texttt{deriche}) because they involve square roots, which require floats, which we do not support.
% Interestingly, we were also unable to evaluate \texttt{cholesky} on \legup{}, since it produce an error during its HLS compilation.
-In summary, we evaluate 27 programs from the latest Polybench suite.
+%In summary, we evaluate 27 programs from the latest Polybench suite.
-\paragraph{Configuring Polybench for experimentation}
-We configure Polybench's metadata and slightly modified the source code to suit our purposes.
+We configured Polybench's parameters so that only integer types are used, since we do not support floats or doubles currently. We use Polybench's smallest datasets for each program to ensure that data can reside within on-chip memories of the FPGA, avoiding any need for off-chip memory accesses.
+
+
+
+metadata and slightly modified the source code to suit our purposes.
First, we restrict Polybench to only generate integer data types, since we do not support floats or doubles currently.
-Secondly, we utilise Polybench's smallest data set size for each program to ensure that data can reside within on-chip memories of the FPGA, avoiding any need for off-chip memory accesses.
+Second, we use Polybench's smallest datasets for each program to ensure that data can reside within on-chip memories of the FPGA, avoiding any need for off-chip memory accesses.
Furthermore, using the C divide or modulo operators results in directly translate to built-in Verilog divide and modulo operators.
Unfortunately, the built-in operators are designed as single-cycle operation, causing large penalties in clock frequency.
To work around this issue, we use a C implementation of the divide and modulo operations, which is indirectly compiles them as multi-cycle operations on the FPGA.
-In addition, we initial the input arrays and check the output arrays of all programs entirely on-chip.
+%In addition, we initial the input arrays and check the output arrays of all programs entirely on-chip.
% For completeness, we use the full set of 24 benchmarks. We set the benchmark parameters so that all datatypes are integers (since \vericert{} only supports integers) and all datasets are `small' (to fit into the small on-chip memories). A current limitation of \vericert{}, as discussed in Section~\ref{?}, is that it does not support addition and subtraction operations involving integer literals not divisible by 4. To work around this, we lightly modified each benchmark program so that literals other than multiples of 4 are stored into variables before being added or subtracted. \JW{Any other notable changes to the benchmarks?}
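
The patch above mentions replacing the C divide and modulo operators with a C implementation so that they compile to multi-cycle hardware, but it does not show that implementation. The following is only a minimal sketch of one common approach, restoring shift-and-subtract division, which performs one compare-and-subtract step per loop iteration. The names `udivmod32`, `sdiv32` and `smod32` are hypothetical, and returning zero on division by zero is an assumption made for illustration, not behaviour taken from the benchmarks.

```c
#include <stdint.h>

/* Hypothetical helper type: quotient and remainder of an unsigned division. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udivmod_t;

/* Restoring (shift-and-subtract) division without the '/' or '%' operators.
 * Each loop iteration handles one bit of the dividend, so an HLS tool can
 * schedule the work over several clock cycles. */
static udivmod_t udivmod32(uint32_t n, uint32_t d)
{
    udivmod_t r = {0u, 0u};
    int i;

    if (d == 0u) {
        return r; /* assumption: division by zero yields quotient 0, remainder 0 */
    }
    for (i = 31; i >= 0; i--) {
        /* Bring down the next dividend bit. The shift cannot overflow:
         * before it, rem is at most n >> (i + 1), which fits in 31 bits. */
        r.rem = (r.rem << 1) | ((n >> i) & 1u);
        if (r.rem >= d) {
            r.rem -= d;
            r.quot |= (uint32_t)1 << i;
        }
    }
    return r;
}

/* Magnitude of a signed value, computed in unsigned arithmetic so that
 * INT32_MIN does not overflow. */
static uint32_t magnitude32(int32_t x)
{
    return x < 0 ? 0u - (uint32_t)x : (uint32_t)x;
}

/* Signed division truncating toward zero, matching C's '/' operator. */
static int32_t sdiv32(int32_t a, int32_t b)
{
    uint32_t q = udivmod32(magnitude32(a), magnitude32(b)).quot;
    int negative = (a < 0) != (b < 0);
    return (int32_t)(negative ? 0u - q : q);
}

/* Signed remainder whose sign follows the dividend, matching C's '%'. */
static int32_t smod32(int32_t a, int32_t b)
{
    uint32_t m = udivmod32(magnitude32(a), magnitude32(b)).rem;
    return (int32_t)(a < 0 ? 0u - m : m);
}
```

Under these assumptions, a benchmark expression such as `x / y` would be rewritten as `sdiv32(x, y)` and `x % y` as `smod32(x, y)`. The trade-off is up to 32 loop iterations per division in exchange for avoiding a long single-cycle combinational path.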