path: root/evaluation.tex
author: Yann Herklotz <git@yannherklotz.com> 2021-04-16 23:32:37 +0100
committer: Yann Herklotz <git@yannherklotz.com> 2021-04-16 23:32:37 +0100
commit: 71933526a7c203fb76d54d8f08fea3e132da535c
tree: 4c5ba14271ad40cf72faa66648315999c7e30d8c /evaluation.tex
parent: 00656c3a17263c8153cd96488cf06b571422a3d3
Fix more
Diffstat (limited to 'evaluation.tex')
-rw-r--r-- evaluation.tex 4
1 files changed, 2 insertions, 2 deletions
diff --git a/evaluation.tex b/evaluation.tex
index 2e4ccd5..5f59ff8 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -154,9 +154,9 @@ This gap does not represent the performance cost that comes with formally provin
Instead, it is simply a gap between an unoptimised \vericert{} versus an optimised \legup{}.
As we improve \vericert{} by incorporating further optimisations, this gap should reduce whilst preserving the correctness guarantees.
-Secondly, looking at the maximum clock frequency that each design can achieve, \vericert{} designs can only achieve 8.2$\times$ the maximum clock frequency of \legup{} \JW{That sounds wrong? Shouldn't it be less than legup's fmax?} when division/modulo operations are present. This is in great contrast to the maximum clock frequency that \vericert{} can achieve when no divide/modulus \JW{modulo?} operations are present, where \vericert{} generates designs that are actually 2$\times$ better than the frequency achieved by \legup{} designs. The dramatic discrepancy in performance for the former case can be largely attributed to \vericert{}'s na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 13MHz, while \legup{} managed about 111MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 220MHz. This improvement in frequency can maybe be explained by scheduling trying to pack too many instructions into a cycle, or by the fact that \legup{} uses a more involved RAM interface so that the hardware produces a dual-port RAM, which can perform two reads and writes per clock cycle.
+Secondly, looking at the maximum clock frequency that each design can achieve, \legup{} designs achieve 8.2$\times$ the maximum clock frequency of \vericert{} when division/modulo operations are present. This is in great contrast to the maximum clock frequency that \vericert{} can achieve when no divide/modulo operations are present, where \vericert{} generates designs that are actually 2$\times$ better than the frequency achieved by \legup{} designs. The dramatic discrepancy in performance for the former case can be largely attributed to \vericert{}'s na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 13MHz, while \legup{} managed about 111MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 220MHz. This improvement in frequency can be explained by the fact that \legup{} uses a memory controller to manage multiple RAMs using one interface, which is not needed in \vericert{} as a single RAM is used for the memory.
-Looking at a few benchmarks in particular in Figure~\ref{fig:polybench-nodiv} for some interesting cases. For the trmm benchmark, \vericert{} produces hardware that executes with the same cycle count as \legup{}, and manages to create hardware that achieves twice the frequency compared to \legup{}, thereby actually producing a design that executes twice as fast as \legup{}. Another interesting benchmark is \JW{tt formatting for benchmark program names?} doitgen, where \vericert{} is comparable to \legup{} without LLVM optimisations, however, LLVM optimisations seem to have a large effect on the cycle count.
+Looking at a few benchmarks in particular in Figure~\ref{fig:polybench-nodiv} for some interesting cases. For the \texttt{trmm} benchmark, \vericert{} produces hardware that executes with the same cycle count as \legup{}, and manages to create hardware that achieves twice the frequency compared to \legup{}, thereby actually producing a design that executes twice as fast as \legup{}. Another interesting benchmark is \texttt{doitgen}, where \vericert{} is comparable to \legup{} without LLVM optimisations, however, LLVM optimisations seem to have a large effect on the cycle count.
\subsection{RQ2: How area-efficient is \vericert{}-generated hardware?}