a HLS -> an HLS

author: Yann Herklotz <git@yannherklotz.com> 2021-09-13 00:20:52 +0100
committer: Yann Herklotz <git@yannherklotz.com> 2021-09-13 00:20:52 +0100
commit: b4f5506f21bf98dc36e78a809a797413fe79e8b9 (patch)
tree: 4ff8374c4e63d6ea3f76e6d05bd000302233c2db
parent: 311999d1c74ea03b57e7113335e0d2c88fa8f8cf (diff)
download: oopsla21_fvhls-b4f5506f21bf98dc36e78a809a797413fe79e8b9.tar.gz
oopsla21_fvhls-b4f5506f21bf98dc36e78a809a797413fe79e8b9.zip
3 files changed, 3 insertions, 3 deletions
diff --git a/evaluation.tex b/evaluation.tex
index bd10ae3..bcaba88 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -158,7 +158,7 @@ This guarantee in itself provides a significant leap in terms of HLS reliability
 The top graphs of Fig.~\ref{fig:polybench-div} and Fig.~\ref{fig:polybench-nodiv} compare the execution time of the 27 programs executed by \vericert{} and the different optimisation levels of \legup{}.  Each graph uses optimised \legup{} as the baseline.  When division/modulo operations are present \legup{} designs execute around 27$\times$ faster than \vericert{} designs.  However, when division/modulo operations are replaced by the iterative algorithm, \legup{} designs are only 2$\times$ faster than \vericert{} designs.  The benchmarks with division/modulo replaced show that \vericert{} actually achieves the same execution speed as \legup{} without LLVM optimisations and without operation chaining, which is encouraging, and shows that the hardware generation is following the right steps.  The execution time is calculated by multiplying the maximum frequency that the FPGA can run at with this design, by the number of clock cycles that are needed to complete the execution.  We can therefore analyse each separately.
 
 First, looking at the difference in clock cycles, \vericert{} produces designs that have around 4.5$\times$ as many clock cycles as \legup{} designs in both cases, when division/modulo operations are enabled as well as when they are replaced.  This performance gap can be explained in part by LLVM optimisations, which seem to account for a 2$\times$ decrease in clock cycles, as well as operation chaining, which decreases the clock cycles by another 2$\times$.  The rest of the speed-up is mostly due to \legup{} optimisations such as scheduling and memory analysis, which are designed to extract parallelism from input programs.
-This gap does not represent the performance cost that comes with formally proving a HLS tool.
+This gap does not represent the performance cost that comes with formally proving an HLS tool.
 Instead, it is simply a gap between an unoptimised \vericert{} versus an optimised \legup{}.
 As we improve \vericert{} by incorporating further optimisations, this gap should reduce whilst preserving the correctness guarantees.
 
diff --git a/introduction.tex b/introduction.tex
index 45bb1a6..65b0bfc 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -21,7 +21,7 @@ Indeed, there are reasons to doubt that HLS tools actually \emph{do} always pres
 %Other reasons are more specific: 
 For instance, Vivado HLS has been shown to apply pipelining optimisations incorrectly\footnote{\url{https://bit.ly/vivado-hls-pipeline-bug}} or to silently generate wrong code should the programmer stray outside the fragment of C that it supports.\footnote{\url{https://bit.ly/vivado-hls-pointer-bug}} 
 Meanwhile, \citet{lidbury15_many_core_compil_fuzzin} had to abandon their attempt to fuzz-test Altera's (now Intel's) OpenCL compiler since it ``either crashed or emitted an internal compiler error'' on so many of their test inputs.  
-More recently, \citet{herklotz21_empir_study_reliab_high_level_synth_tools} fuzz-tested three commercial HLS tools using Csmith~\cite{yang11_findin_under_bugs_c_compil}, and despite restricting the generated programs to the C fragment explicitly supported by all the tools, they still found that on average 2.5\% of test cases were compiled to designs that behaved incorrectly. %\NR{The word `input' here made me think of I/Os. `input software' or just `software' is better. I think it is worth being consistent across the article on the word used to describe the software description provided to the HLS tool. Actually, we can even signpost it like: `From here on we used the word bla to refer to the input software that is provided to a HLS tool.'} %JW: thanks, done.
+More recently, \citet{herklotz21_empir_study_reliab_high_level_synth_tools} fuzz-tested three commercial HLS tools using Csmith~\cite{yang11_findin_under_bugs_c_compil}, and despite restricting the generated programs to the C fragment explicitly supported by all the tools, they still found that on average 2.5\% of test cases were compiled to designs that behaved incorrectly. %\NR{The word `input' here made me think of I/Os. `input software' or just `software' is better. I think it is worth being consistent across the article on the word used to describe the software description provided to the HLS tool. Actually, we can even signpost it like: `From here on we used the word bla to refer to the input software that is provided to an HLS tool.'} %JW: thanks, done.
 
 \paragraph{Existing workarounds}
 
diff --git a/verilog.tex b/verilog.tex
index 11087b0..d7980f0 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -37,7 +37,7 @@ where $\Gamma$ is the initial state of all the variables, $\epsilon$ is the empt
 
 \subsection{Changes to the Semantics}
 
-Five changes were made to the semantics proposed by \citet{loow19_proof_trans_veril_devel_hol} to make it suitable as a HLS target.
+Five changes were made to the semantics proposed by \citet{loow19_proof_trans_veril_devel_hol} to make it suitable as an HLS target.
 
 \paragraph{Adding array support}
 The main change is the addition of support for arrays, which are needed to model RAM in Verilog.  RAM is needed to model the stack in C efficiently, without having to declare a variable for each possible stack location. % In the original semantics, RAMs (as well as inputs and outputs to the module) could be modelled using a function from variable names (strings) to values, which could be modified accordingly to model inputs to the module.  This is quite an abstract description of memory and can also be expressed as an array of bitvectors instead, which is the path we took. This requires the addition of array operators to the semantics and correct reasoning of loads and stores to the array in different \alwaysblock{}s simultaneously.
author	Yann Herklotz <git@yannherklotz.com>	2021-09-13 00:20:52 +0100
committer	Yann Herklotz <git@yannherklotz.com>	2021-09-13 00:20:52 +0100
commit	b4f5506f21bf98dc36e78a809a797413fe79e8b9 (patch)
tree	4ff8374c4e63d6ea3f76e6d05bd000302233c2db
parent	311999d1c74ea03b57e7113335e0d2c88fa8f8cf (diff)
download	oopsla21_fvhls-b4f5506f21bf98dc36e78a809a797413fe79e8b9.tar.gz oopsla21_fvhls-b4f5506f21bf98dc36e78a809a797413fe79e8b9.zip