AddFix more things

author: Yann Herklotz <git@yannherklotz.com> 2021-04-16 20:53:57 +0100
committer: Yann Herklotz <git@yannherklotz.com> 2021-04-16 20:53:57 +0100
commit: 1479bb42c2877c29376549d768a97676e1b96841 (patch)
tree: 42dedad20c175e8c9340316a6abde823c213b44b /evaluation.tex
parent: 7d8150af139d30058a6be3b962f252505fd45d9b (diff)
download: oopsla21_fvhls-1479bb42c2877c29376549d768a97676e1b96841.tar.gz
oopsla21_fvhls-1479bb42c2877c29376549d768a97676e1b96841.zip
1 files changed, 1 insertions, 1 deletions
diff --git a/evaluation.tex b/evaluation.tex
index c682336..f82c83b 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -154,7 +154,7 @@ This gap does not represent the performance cost that comes with formally provin
 Instead, it is simply a gap between an unoptimised \vericert{} versus an optimised \legup{}.
 As we improve \vericert{} by incorporating further optimisations, this gap should reduce whilst preserving the correctness guarantees.
 
-Secondly, looking at the maximum clock frequency that each design can achieve, \vericert{} designs can only achieve 8.2$\times$ the maximum clock frequency of \legup{} \JW{That sounds wrong? Shouldn't it be less than legup's fmax?} when division/modulo operations are present.  This is in great contrast to the maximum clock frequency that \vericert{} can achieve when no divide/modulus \JW{modulo?} operations are present, where \vericert{} generates designs that are actually 2$\times$ better than the frequency achieved by \legup{} designs.  The dramatic discrepancy in performance for the former case can be largely attributed to \vericert{}'s na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 13MHz, while \legup{} managed about 111MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 220MHz.  This improvement in frequency can maybe be explained by scheduling trying to pack too many instructions into a cycle, or by the fact that \legup{} uses a more involved RAM template so that the hardware produces a dual-port RAM, which can perform two reads and writes per clock cycle.
+Secondly, looking at the maximum clock frequency that each design can achieve, \vericert{} designs can only achieve 8.2$\times$ the maximum clock frequency of \legup{} \JW{That sounds wrong? Shouldn't it be less than legup's fmax?} when division/modulo operations are present.  This is in great contrast to the maximum clock frequency that \vericert{} can achieve when no divide/modulus \JW{modulo?} operations are present, where \vericert{} generates designs that are actually 2$\times$ better than the frequency achieved by \legup{} designs.  The dramatic discrepancy in performance for the former case can be largely attributed to \vericert{}'s na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 13MHz, while \legup{} managed about 111MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 220MHz.  This improvement in frequency can maybe be explained by scheduling trying to pack too many instructions into a cycle, or by the fact that \legup{} uses a more involved RAM interface so that the hardware produces a dual-port RAM, which can perform two reads and writes per clock cycle.
 
 Looking at a few benchmarks in particular in Figure~\ref{fig:polybench-nodiv} for some interesting cases.  For the trmm benchmark, \vericert{} produces hardware that executes with the same cycle count as \legup{}, and manages to create hardware that achieves twice the frequency compared to \legup{}, thereby actually producing a design that executes twice as fast as \legup{}.  Another interesting benchmark is \JW{tt formatting for benchmark program names?} doitgen, where \vericert{} is comparable to \legup{} without LLVM optimisations, however, LLVM optimisations seem to have a large effect on the cycle count.