author     John Wickerson <j.wickerson@imperial.ac.uk>   2020-11-19 20:51:45 +0000
committer  overleaf <overleaf@localhost>                 2020-11-19 20:52:11 +0000
commit     70df35bc74805473cd4a1e48293cb29d09b3767c (patch)
tree       6526879d2701ecbdee18f256779f08789a093c52 /evaluation.tex
parent     1d66503454f22db76b8a314ea1f30babca8f7c93 (diff)
Update on Overleaf.
Diffstat (limited to 'evaluation.tex')
-rw-r--r--  evaluation.tex | 14
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/evaluation.tex b/evaluation.tex
index 8981f11..5970de1 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -16,16 +16,16 @@ Our evaluation is designed to answer the following three research questions.
We chose Polybench 4.2.1 for our experiments; it consists of 30 programs.
Out of these 30 programs, three utilise square root functions: \texttt{co-relation}, \texttt{gramschmidt} and \texttt{deriche}.
Hence, we were unable to evaluate these programs, since they inherently require \texttt{float}s.
-Interestingly, we were also unable to evaluate \texttt{cholesky} on \legup{}, since it produce an error during its HLS compilation.
-In summary, we evaluate 26 programs from the latest Polybench suite.
+% Interestingly, we were also unable to evaluate \texttt{cholesky} on \legup{}, since it produce an error during its HLS compilation.
+In summary, we evaluate 27 programs from the latest Polybench suite.
\paragraph{Configuring Polybench for experimentation}
We configure Polybench's metadata and slightly modify the source code to suit our purposes.
First, we restrict Polybench to generate only integer data types, since we do not currently support floats or doubles.
Second, we utilise Polybench's smallest data set size for each program to ensure that data can reside within the on-chip memories of the FPGA, avoiding any need for off-chip memory accesses.
Furthermore, the C divide and modulo operators translate directly to the built-in Verilog divide and modulo operators.
-Unfortunately, the built-in operators are designed as single-cycle operation, causing large penalties in latency and area.
-To work around this issue, we use a C implementation of the divide and modulo operations, which is indirectly compiles them as multi-cycle operations on the FPGA, reducing their latency penalties drastically.
+Unfortunately, these built-in operators are designed as single-cycle operations, causing large penalties in clock frequency.
+To work around this issue, we use a C implementation of the divide and modulo operations, which indirectly compiles them into multi-cycle operations on the FPGA (a sketch of such a routine is given below).
In addition, we initialise the input arrays and check the output arrays of all programs entirely on-chip.
% For completeness, we use the full set of 24 benchmarks. We set the benchmark parameters so that all datatypes are integers (since \vericert{} only supports integers) and all datasets are `small' (to fit into the small on-chip memories). A current limitation of \vericert{}, as discussed in Section~\ref{?}, is that it does not support addition and subtraction operations involving integer literals not divisible by 4. To work around this, we lightly modified each benchmark program so that literals other than multiples of 4 are stored into variables before being added or subtracted. \JW{Any other notable changes to the benchmarks?}
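The replacement divide/modulo routine referred to above is not shown in the diff; the sketch below illustrates, under stated assumptions, what such a C implementation could look like. The PolyBench macros DATA_TYPE_IS_INT and MINI_DATASET are the usual compile-time switches for integer data and the smallest data set, but the exact switches used for the paper are not given here; likewise, the routine names divmod_u32, div_u32 and mod_u32 and the restoring shift-and-subtract algorithm are illustrative choices, not the benchmarks' actual code.

    /* Assumed PolyBench 4.2 compile-time configuration (normally passed as
       -D flags): integer data type only, and the smallest data set so that
       all arrays fit in the FPGA's on-chip memories. */
    #define DATA_TYPE_IS_INT
    #define MINI_DATASET

    #include <stdint.h>

    /* Hypothetical software division: a restoring shift-and-subtract loop.
       It contains no C '/' or '%' operator, so the HLS tool is free to
       schedule it over many clock cycles instead of instantiating a
       single-cycle Verilog divider.  Assumes divisor != 0. */
    static uint32_t divmod_u32(uint32_t dividend, uint32_t divisor, uint32_t *rem)
    {
        uint32_t quotient = 0, remainder = 0;
        for (int i = 31; i >= 0; i--) {
            /* Bring down bit i of the dividend. */
            remainder = (remainder << 1) | ((dividend >> i) & 1u);
            if (remainder >= divisor) {   /* subtract whenever the divisor fits */
                remainder -= divisor;
                quotient |= 1u << i;
            }
        }
        *rem = remainder;
        return quotient;
    }

    /* Drop-in replacements for 'a / b' and 'a % b' in the benchmark code. */
    static uint32_t div_u32(uint32_t a, uint32_t b) { uint32_t r; return divmod_u32(a, b, &r); }
    static uint32_t mod_u32(uint32_t a, uint32_t b) { uint32_t r; divmod_u32(a, b, &r); return r; }

Replacing each division or modulo with a call to such a routine preserves the benchmark's results while letting the generated hardware iterate over 32 cycles, which is the latency/clock-frequency trade-off discussed above.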
@@ -95,13 +95,13 @@ In addition, we initial the input arrays and check the output arrays of all prog
Firstly, before comparing any performance metrics, it is worth highlighting that any Verilog produced by \vericert{} is guaranteed to be \emph{correct}, whilst no such guarantee can be provided by \legup{}.
This guarantee in itself provides a significant leap in the reliability of HLS, compared to any other available HLS tool.
-Figure~\ref{fig:comparison_cycles} compares the cycle counts of our 26 programs executed by \vericert{} and \legup{} respectively.
+Figure~\ref{fig:comparison_cycles} compares the cycle counts of our 27 programs as compiled by \vericert{} and \legup{} respectively.
In most cases, we see that the data points lie above the diagonal, demonstrating that the \legup{}-generated hardware is faster than the \vericert{}-generated hardware.
This performance gap is mostly due to \legup{} optimisations such as scheduling and memory analysis, which are designed to exploit the parallelism in input programs.
-On average, \legup{} designs are $4\times$ faster than \vericert{} designs.
+On average, \legup{} designs are $4\times$ faster than \vericert{} designs on Polybench programs.
This gap does not represent the performance cost that comes with formally proving an HLS tool.
Instead, it is simply the gap between an unoptimised \vericert{} and an optimised \legup{}.
-In fact, without any optimisation, a few data points are close to diagonal and even below diagonal, which means \vericert{} is competitive to \legup{}.
+In fact, even without any optimisations, a few data points lie close to the diagonal, and some even below it, which means \vericert{} is competitive with \legup{}.
We are very encouraged by these data points.
As we optimise \vericert{} to incorporate other HLS optimisations in a formally verified manner, this gap should narrow whilst preserving our correctness guarantees.