Fix more

author: Yann Herklotz <git@yannherklotz.com> 2021-04-16 23:32:37 +0100
committer: Yann Herklotz <git@yannherklotz.com> 2021-04-16 23:32:37 +0100
commit: 71933526a7c203fb76d54d8f08fea3e132da535c (patch)
tree: 4c5ba14271ad40cf72faa66648315999c7e30d8c /evaluation.tex
parent: 00656c3a17263c8153cd96488cf06b571422a3d3 (diff)
download: oopsla21_fvhls-71933526a7c203fb76d54d8f08fea3e132da535c.tar.gz
oopsla21_fvhls-71933526a7c203fb76d54d8f08fea3e132da535c.zip
1 files changed, 2 insertions, 2 deletions
diff --git a/evaluation.tex b/evaluation.tex
index 2e4ccd5..5f59ff8 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -154,9 +154,9 @@ This gap does not represent the performance cost that comes with formally provin
 Instead, it is simply a gap between an unoptimised \vericert{} versus an optimised \legup{}.
 As we improve \vericert{} by incorporating further optimisations, this gap should reduce whilst preserving the correctness guarantees.
 
-Secondly, looking at the maximum clock frequency that each design can achieve, \vericert{} designs can only achieve 8.2$\times$ the maximum clock frequency of \legup{} \JW{That sounds wrong? Shouldn't it be less than legup's fmax?} when division/modulo operations are present.  This is in great contrast to the maximum clock frequency that \vericert{} can achieve when no divide/modulus \JW{modulo?} operations are present, where \vericert{} generates designs that are actually 2$\times$ better than the frequency achieved by \legup{} designs.  The dramatic discrepancy in performance for the former case can be largely attributed to \vericert{}'s na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 13MHz, while \legup{} managed about 111MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 220MHz.  This improvement in frequency can maybe be explained by scheduling trying to pack too many instructions into a cycle, or by the fact that \legup{} uses a more involved RAM interface so that the hardware produces a dual-port RAM, which can perform two reads and writes per clock cycle.
+Secondly, looking at the maximum clock frequency that each design can achieve, \legup{} designs achieve 8.2$\times$ the maximum clock frequency of \vericert{} when division/modulo operations are present.  This is in great contrast to the maximum clock frequency that \vericert{} can achieve when no divide/modulo operations are present, where \vericert{} generates designs that are actually 2$\times$ better than the frequency achieved by \legup{} designs.  The dramatic discrepancy in performance for the former case can be largely attributed to \vericert{}'s na\"ive implementations of division and modulo operations, as explained in Section~\ref{sec:evaluation:setup}. Indeed, \vericert{} achieved an average clock frequency of just 13MHz, while \legup{} managed about 111MHz. After replacing the division/modulo operations with our own C-based implementations, \vericert{}'s average clock frequency becomes about 220MHz.  This improvement in frequency can be explained by the fact that \legup{} uses a memory controller to manage multiple RAMs using one interface, which is not needed in \vericert{} as a single RAM is used for the memory.
 
-Looking at a few benchmarks in particular in Figure~\ref{fig:polybench-nodiv} for some interesting cases.  For the trmm benchmark, \vericert{} produces hardware that executes with the same cycle count as \legup{}, and manages to create hardware that achieves twice the frequency compared to \legup{}, thereby actually producing a design that executes twice as fast as \legup{}.  Another interesting benchmark is \JW{tt formatting for benchmark program names?} doitgen, where \vericert{} is comparable to \legup{} without LLVM optimisations, however, LLVM optimisations seem to have a large effect on the cycle count.
+Looking at a few benchmarks in particular in Figure~\ref{fig:polybench-nodiv} for some interesting cases.  For the \texttt{trmm} benchmark, \vericert{} produces hardware that executes with the same cycle count as \legup{}, and manages to create hardware that achieves twice the frequency compared to \legup{}, thereby actually producing a design that executes twice as fast as \legup{}.  Another interesting benchmark is \texttt{doitgen}, where \vericert{} is comparable to \legup{} without LLVM optimisations, however, LLVM optimisations seem to have a large effect on the cycle count.
 
 \subsection{RQ2: How area-efficient is \vericert{}-generated hardware?}