author | Yann Herklotz <git@yannherklotz.com> | 2021-04-16 20:53:57 +0100 |
---|---|---|
committer | Yann Herklotz <git@yannherklotz.com> | 2021-04-16 20:53:57 +0100 |
commit | 1479bb42c2877c29376549d768a97676e1b96841 (patch) | |
tree | 42dedad20c175e8c9340316a6abde823c213b44b /evaluation.tex | |
parent | 7d8150af139d30058a6be3b962f252505fd45d9b (diff) | |
AddFix more things
Diffstat (limited to 'evaluation.tex')
-rw-r--r-- | evaluation.tex | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/evaluation.tex b/evaluation.tex
index c682336..f82c83b 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -154,7 +154,7 @@ This gap does not represent the performance cost that comes with formally provin
 Instead, it is simply a gap between an unoptimised \vericert{} versus an optimised \legup{}.
 As we improve \vericert{} by incorporating further optimisations, this gap should reduce whilst preserving the correctness guarantees.
 
-Secondly, looking at the maximum clock frequency that each design can achieve, \vericert{} designs can only achieve 8.2$\times$ the maximum clock frequency of \legup{} \JW{That sounds wrong? Shouldn't it be less than legup's fmax?} when division/modulo operations are present. This is in great contrast to the maximum clock frequency that \vericert{} can achieve when no divide/modulus \JW{modulo?} operations are present, where \vericert{} generates designs that are actually 2$\times$ better than the frequency achieved by \legup{} designs. The dramatic discrepancy in performance for the former case can be largely attributed to \vericert{}'s na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 13MHz, while \legup{} managed about 111MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 220MHz. This improvement in frequency can maybe be explained by scheduling trying to pack too many instructions into a cycle, or by the fact that \legup{} uses a more involved RAM template so that the hardware produces a dual-port RAM, which can perform two reads and writes per clock cycle.
+Secondly, looking at the maximum clock frequency that each design can achieve, \vericert{} designs can only achieve 8.2$\times$ the maximum clock frequency of \legup{} \JW{That sounds wrong? Shouldn't it be less than legup's fmax?} when division/modulo operations are present. This is in great contrast to the maximum clock frequency that \vericert{} can achieve when no divide/modulus \JW{modulo?} operations are present, where \vericert{} generates designs that are actually 2$\times$ better than the frequency achieved by \legup{} designs. The dramatic discrepancy in performance for the former case can be largely attributed to \vericert{}'s na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 13MHz, while \legup{} managed about 111MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 220MHz. This improvement in frequency can maybe be explained by scheduling trying to pack too many instructions into a cycle, or by the fact that \legup{} uses a more involved RAM interface so that the hardware produces a dual-port RAM, which can perform two reads and writes per clock cycle.
 
 Looking at a few benchmarks in particular in Figure~\ref{fig:polybench-nodiv} for some interesting cases. For the trmm benchmark, \vericert{} produces hardware that executes with the same cycle count as \legup{}, and manages to create hardware that achieves twice the frequency compared to \legup{}, thereby actually producing a design that executes twice as fast as \legup{}. Another interesting benchmark is \JW{tt formatting for benchmark program names?} doitgen, where \vericert{} is comparable to \legup{} without LLVM optimisations, however, LLVM optimisations seem to have a large effect on the cycle count.