author | Yann Herklotz <git@yannherklotz.com> | 2021-04-16 13:56:32 +0100 |
---|---|---|
committer | Yann Herklotz <git@yannherklotz.com> | 2021-04-16 13:57:11 +0100 |
commit | d36bd7f187ddf0db1745c82400da996c97ed9a03 (patch) | |
tree | 0419b7a7725231788af65b475bf19e7ee2e7a7aa /archive | |
parent | 20ddd80e5eb18e261d6f228d8e9103a9090b7a39 (diff) | |
Remove compilation speed and fix results
Diffstat (limited to 'archive')
-rw-r--r-- | archive/evaluation.tex | 28 |
1 files changed, 10 insertions, 18 deletions
diff --git a/archive/evaluation.tex b/archive/evaluation.tex
index 6c21cbf..aa952cf 100644
--- a/archive/evaluation.tex
+++ b/archive/evaluation.tex
@@ -150,24 +150,6 @@
 \end{subfigure}
 \end{figure}
 
-Firstly, before comparing any performance metrics, it is worth highlighting that any Verilog produced by \vericert{} is guaranteed to be \emph{correct}, whilst no such guarantee can be provided by \legup{}.
-This guarantee in itself provides a significant leap in terms of HLS reliability, compared to any other HLS tools available.
-
-Figure~\ref{fig:comparison_cycles} compares the cycle counts of our 27 programs executed by \vericert{} and \legup{} respectively.
-In most cases, we see that the data points are above the diagonal, which demonstrates that the \legup{}-generated hardware is faster than \vericert{}-generated Verilog.
-
-On average, \legup{} designs are $4.5\times$ faster than \vericert{} designs.
-This performance gap is mostly due to \legup{} optimisations such as scheduling and memory analysis, which are designed to extract parallelism from input programs.
-This gap does not represent the performance cost that comes with formally proving a HLS tool.
-Instead, it is simply a gap between an unoptimised \vericert{} versus an optimised \legup{}.
-It is notable that even without \vericert{} performing many optimisations, a few data points are close to the diagonal and even below it.
-We are very encouraged by these data points.
-As we improve \vericert{} by incorporating further optimisations, this gap should reduce whilst preserving our correctness guarantees.
-
-Cycle count is one factor in calculating execution times; the other is the clock frequency, which determines the duration of each of these cycles. Figure~\ref{fig:comparison_time} compares the execution times of \vericert{} and \legup{}. Across the original \polybench{} benchmarks, we see that \vericert{} designs are about \slowdownOrig$\times$ slower than \legup{} designs. This dramatic discrepancy in performance can be largely attributed to \vericert's na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 21MHz, while \legup{} managed about 247MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 112MHz. This is better, but still substantially below \legup{}, which uses various additional optimisations and Intel-specific IP blocks. Across the modified \polybench{} benchmarks, we see that \vericert{} designs are about \slowdownDiv$\times$ slower than \legup{} designs.
-
-\subsection{RQ2: How area-efficient is \vericert{}-generated hardware?}
-
 \begin{figure}
 \begin{subfigure}[t]{0.48\textwidth}
 \definecolor{resourceutilcol}{HTML}{e7298a}
@@ -221,3 +203,13 @@ \subsection{RQ2: How area-efficient is \vericert{}-generated hardware?}
 \label{fig:comparison_comptime}
 \end{subfigure}
 \end{figure}
+
+%These designs therefore fill less than 1\% of the FPGA.
+%The reason for the similar size in hardware is that
+%Synthesis tools such as Quartus generally require array accesses to be in a specific form in order for RAM inference to activate.
+%\legup{}'s Verilog generation is tailored to enable RAM inference by Quartus, while \vericert{} generates more generic array accesses. This may make \vericert{} more portable across different FPGA synthesis tools and vendors.
+%%For a fair comparison, we chose Quartus for these experiments because LegUp supports Quartus efficiently.
+%% Consequently, on average, \legup{} designs use $XX$ RAMs whereas \vericert{} use none.
+%Enabling RAM inference is part of our future plans.
+
+% We see that \vericert{} designs use between 1\% and 30\% of the available logic on the FPGA, averaging at around 10\%, whereas LegUp designs all use less than 1\% of the FPGA, averaging at around 0.45\%. The main reason for this is mainly because RAM is not inferred automatically for the Verilog that is generated by \vericert{}. Other synthesis tools can infer the RAM correctly for \vericert{} output, so this issue could be solved by either using a different synthesis tool and targeting a different FPGA, or by generating the correct template which allows Quartus to identify the RAM automatically.