summaryrefslogtreecommitdiffstats
path: root/evaluation.tex
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2021-04-16 20:53:57 +0100
committerYann Herklotz <git@yannherklotz.com>2021-04-16 20:53:57 +0100
commit1479bb42c2877c29376549d768a97676e1b96841 (patch)
tree42dedad20c175e8c9340316a6abde823c213b44b /evaluation.tex
parent7d8150af139d30058a6be3b962f252505fd45d9b (diff)
downloadoopsla21_fvhls-1479bb42c2877c29376549d768a97676e1b96841.tar.gz
oopsla21_fvhls-1479bb42c2877c29376549d768a97676e1b96841.zip
AddFix more things
Diffstat (limited to 'evaluation.tex')
-rw-r--r--evaluation.tex2
1 files changed, 1 insertions, 1 deletions
diff --git a/evaluation.tex b/evaluation.tex
index c682336..f82c83b 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -154,7 +154,7 @@ This gap does not represent the performance cost that comes with formally provin
Instead, it is simply a gap between an unoptimised \vericert{} versus an optimised \legup{}.
As we improve \vericert{} by incorporating further optimisations, this gap should reduce whilst preserving the correctness guarantees.
-Secondly, looking at the maximum clock frequency that each design can achieve, \vericert{} designs can only achieve 8.2$\times$ the maximum clock frequency of \legup{} \JW{That sounds wrong? Shouldn't it be less than legup's fmax?} when division/modulo operations are present. This is in great contrast to the maximum clock frequency that \vericert{} can achieve when no divide/modulus \JW{modulo?} operations are present, where \vericert{} generates designs that are actually 2$\times$ better than the frequency achieved by \legup{} designs. The dramatic discrepancy in performance for the former case can be largely attributed to \vericert{}'s na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 13MHz, while \legup{} managed about 111MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 220MHz. This improvement in frequency can maybe be explained by scheduling trying to pack too many instructions into a cycle, or by the fact that \legup{} uses a more involved RAM template so that the hardware produces a dual-port RAM, which can perform two reads and writes per clock cycle.
+Secondly, looking at the maximum clock frequency that each design can achieve, \vericert{} designs can only achieve 8.2$\times$ the maximum clock frequency of \legup{} \JW{That sounds wrong? Shouldn't it be less than legup's fmax?} when division/modulo operations are present. This is in great contrast to the maximum clock frequency that \vericert{} can achieve when no divide/modulus \JW{modulo?} operations are present, where \vericert{} generates designs that are actually 2$\times$ better than the frequency achieved by \legup{} designs. The dramatic discrepancy in performance for the former case can be largely attributed to \vericert{}'s na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 13MHz, while \legup{} managed about 111MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 220MHz. This improvement in frequency can maybe be explained by scheduling trying to pack too many instructions into a cycle, or by the fact that \legup{} uses a more involved RAM interface so that the hardware produces a dual-port RAM, which can perform two reads and writes per clock cycle.
Looking at a few benchmarks in particular in Figure~\ref{fig:polybench-nodiv} for some interesting cases. For the trmm benchmark, \vericert{} produces hardware that executes with the same cycle count as \legup{}, and manages to create hardware that achieves twice the frequency compared to \legup{}, thereby actually producing a design that executes twice as fast as \legup{}. Another interesting benchmark is \JW{tt formatting for benchmark program names?} doitgen, where \vericert{} is comparable to \legup{} without LLVM optimisations, however, LLVM optimisations seem to have a large effect on the cycle count.