-rw-r--r--  algorithm.tex    |  4
-rw-r--r--  introduction.tex | 35
2 files changed, 17 insertions(+), 22 deletions(-)
diff --git a/algorithm.tex b/algorithm.tex
index 848a94f..0d27bac 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -112,12 +112,12 @@ The first step of the translation is to use \compcert{} to transform the input C
% + TODO Explain how memory is mapped
-The first translation performed in Vericert is from 3AC to a new hardware translation language (HTL), which is one step towards being completely translated to hardware described in Verilog. The main translation that is performed is going from a CFG representation of the computation to a finite state machine with datapath (FSMD)~\cite{hwang99_fsmd}\JW{I feel like this could use some sort of citation, but I'm not sure what. I guess this is all from "Hardware Design 101", right?}\YH{I think I found a good one actually, which goes over the basics.} representation in HTL.\@ The core idea of the FSMD representation is that it separates the control flow from the operations on the memory and registers, so that the state transitions can be translated into a simple finite state machine (FSM) and each state then contains data operations that update the memory and registers. Figure~\ref{fig:accumulator_diagram} shows the resulting architecture of the FSMD. \JW{I think it would be worth having a sentence to explain how the C model of memory is translated to a hardware-centric model of memory. For instance, in C we have global variables/arrays, stack-allocated variables/arrays, and heap-allocated variables/arrays (anything else?). In Verilog we have registers and RAM blocks. So what's the correspondence between the two worlds? Globals and heap-allocated are not handled, stack-allocated variables become registers, and stack-allocated arrays become RAM blocks? Am I close?}\YH{Stack allocated variables become RAM as well, so that we can deal with addresses easily and take addresses of any variable.} Hardware does not have the same memory model as C, the memory model therefore needs to be translated in the following way. Global variables are not translated in Vericert at the moment, however, the stack of the main function will become the RAM seen in Figure~\ref{fig:accumulator_diagram}. Variables that have their address is taken will therefore be stored in the RAM, as well as any arrays or structs defined in the function. Variables that did not have their address taken will be kept in registers.
+The first translation performed in Vericert is from 3AC to a new hardware translation language (HTL), which is one step towards a complete translation to hardware described in Verilog. The main translation performed is from a CFG representation of the computation to a finite state machine with datapath (FSMD)~\cite{hwang99_fsmd}\JW{I feel like this could use some sort of citation, but I'm not sure what. I guess this is all from "Hardware Design 101", right?}\YH{I think I found a good one actually, which goes over the basics.} representation in HTL.\@ The core idea of the FSMD representation is that it separates the control flow from the operations on the memory and registers, so that the state transitions can be translated into a simple finite state machine (FSM), and each state then contains data operations that update the memory and registers. Figure~\ref{fig:accumulator_diagram} shows the resulting architecture of the FSMD. \JW{I think it would be worth having a sentence to explain how the C model of memory is translated to a hardware-centric model of memory. For instance, in C we have global variables/arrays, stack-allocated variables/arrays, and heap-allocated variables/arrays (anything else?). In Verilog we have registers and RAM blocks. So what's the correspondence between the two worlds? Globals and heap-allocated are not handled, stack-allocated variables become registers, and stack-allocated arrays become RAM blocks? Am I close?}\YH{Stack-allocated variables become RAM as well, so that we can deal with addresses easily and take addresses of any variable.} \JW{I see, thanks. So, in short, the only registers in your hardware designs are those that store things like the current state, etc. You generate a fixed number of registers every time you synthesise -- you don't generate extra registers to store any of the program variables. Right?} Hardware does not have the same memory model as C, so the memory model needs to be translated as follows. Global variables are not currently translated in Vericert; however, the stack of the main function becomes the RAM seen in Figure~\ref{fig:accumulator_diagram}. Variables that have their address taken are therefore stored in the RAM, as are any arrays or structs defined in the function. Variables whose address is never taken are kept in registers.
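+To make this mapping concrete, the following small C function (purely illustrative, and not the running example of Figure~\ref{fig:accumulator_diagram}) is annotated with the storage that each of its variables would be assigned under this translation:
+% Hypothetical illustration added for exposition; the annotations simply restate the mapping described above.
+\begin{verbatim}
+int f(void) {
+  int xs[4] = {1, 2, 3, 4};    /* stack-allocated array: stored in the RAM    */
+  int acc = 0;                 /* address never taken: kept in a register     */
+  for (int i = 0; i < 4; i++)  /* i never has its address taken either, so it */
+    acc += xs[i];              /*   also stays in a register; reads of xs[i]  */
+  return acc;                  /*   become loads from the RAM                 */
+}                              /* writing &acc anywhere would instead place   */
+                               /*   acc in the RAM as well                    */
+\end{verbatim}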
\begin{figure*}
\centering
\includegraphics[scale=0.3,trim={10cm 8cm 5cm 5cm},clip=true]{data/accumulator_fsmd2.pdf}
- \caption{The FSMD for our running example. \JW{Maybe replace `State' with `Current State'? And maybe `Calculate State' could be clearer as `Calculate Next State'?} \JW{Is it worth distinguishing between the different types of box? We have a RAM box which is a single hardware block, then the Registers and State boxes both indicate a collection of registers, and finally the Update and Calculate boxes indicate combinational logic. Perhaps the combinational logic bits could be visualised as clouds? (Or if clouds are a bit of a faff to draw, then rounded rectangles, or something?)} \JW{Can state 15 (or should it be state 16??) have a dangling incoming arrow to indicate that it is the start state? And perhaps state 1 could have a double outline to indicate that it is an `accepting' state? Since there's space above the `Calculate State' box, I'd be mildly in favour of expanding that box a bit so that it included all 15 states explicitly (snaking back and forth).}\YH{If this is better I can mock up a tikz version of it maybe and fix the last bits then too.}}\label{fig:accumulator_diagram}
+ \caption{The FSMD for our running example. \JW{Maybe replace `State' with `Current State'?} \JW{Can state 15 (or should it be state 16??) have a dangling incoming arrow to indicate that it is the start state? And perhaps state 1 could have a double outline to indicate that it is an `accepting' state?} \JW{Since there's space above the `Calculate State' box, I'd be mildly in favour of expanding that box a bit so that it included all 15 states explicitly (snaking back and forth).}\YH{If this is better I can mock up a tikz version of it maybe and fix the last bits then too.}}\label{fig:accumulator_diagram}
\end{figure*}
The translation from 3AC to HTL is straightforward, as each 3AC instruction either matches up quite well to a hardware construct or, as in the case of function calls, does not have to be handled by the translation at all.
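+As a purely illustrative sketch (the register and state names in the comments below are invented for exposition and are not Vericert's actual output), the correspondence for a single arithmetic instruction, and the absence of one for a call, can be pictured at the C level as follows:
+\begin{verbatim}
+void g(void);
+
+int example(int a, int b) {
+  int x = a + b;  /* one 3AC addition: in HTL this becomes one FSMD state
+                     whose data operation performs reg_x <= reg_a + reg_b
+                     (illustrative names) and whose control logic moves to
+                     the successor state                                   */
+  g();            /* function calls are among the instructions that the
+                     3AC-to-HTL translation does not have to handle        */
+  return x;
+}
+\end{verbatim}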
diff --git a/introduction.tex b/introduction.tex
index 2ef9218..4e69cd8 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -4,28 +4,23 @@
%\JW{A few high-level comments: \begin{enumerate} \item Create more tension from the start by making the reader doubt whether existing HLS tools are trustworthy. \item The intro currently draws quite a bit of motivation from Lidbury et al. 2015, but we should also now lean on our FPGA submission too. \item I wonder whether the paragraph `To mitigate the problems...' should be demoted to a `related work' discussion (perhaps as a subsection towards the end of the introduction). It outlines (and nicely dismisses) some existing attempts to tackle the problem, which is certainly useful motivation for your work, especially for readers already familiar with HLS, but I feel that it's not really on the critical path for understanding the paper.\end{enumerate}}
-\NR{I couldn't have subsections in comments so I have appended my writing to the bottom of this file.}\YH{The original intro is in the archive, we can maybe merge them in the future a bit.}
+%\NR{I couldn't have subsections in comments so I have appended my writing to the bottom of this file.}\YH{The original intro is in the archive, we can maybe merge them in the future a bit.}
+
+\subsection{Can you trust your high-level synthesis tool?}
As latency, throughput and energy efficiency become increasingly important, custom hardware accelerators are being designed for numerous applications~\cite{??}.
-Alas, designing these accelerators come at a cost of additional engineering effort and risk, since it typically involves designing in hardware description languages (HDL), such as Verilog.
-Designing at HDL level is not only arduous and time-consuming but also it limits the expressiveness and abstraction of computation.
-As such, high-level synthesis (HLS) is becoming an attractive alternative, since it offers abstraction by compiling high-level languages like C/C++ to HDL, whilst achieving comparable quality of results relative HDL designs~\cite{bdti,autoesl}.
-Modern HLS tools such as LegUp~\cite{}, Vivado HLS~\cite{}, and Intel i++~\cite{} can produce designs with comparable performance and energy-efficiency to those manually coded in HDL [citation needed], while offering the convenient abstractions and rich ecosystem of software development.
-
-\subsection{Can we trust our high-level synthesis tools?}
-The common starting point for most HLS tool development is to leverage an existing software compiler framework like LLVM, such as LegUp HLS, Vivado HLS and Bambu HLS~\cite{}.
-Re-using software concepts, optimisation and codebases make HLS compiliation as susceptible to bugs as any software compilers.
-These native codebases are large and they perform complex and non-trivial analyses and transformations to translate software into efficient assembly or HDL, in the case of HLS.
-Consequently, HLS tools cannot always guarantee that the generated hardware is equivalent to the input program, undermining any reasoning conducted at the software level.
-Furthermore, HDL design in itself is complex, error-prone and requires careful reasoning and formal verification to ensure behaviour, data-flow and structural correctness~\cite{}.
-The added complexity of HDL design thus increases the likelihood of HLS compilation mismatches.
-% between the software program and the generated hardware.
-% There are valid reasons to doubt that HLS tools actually \emph{do} preserve equivalence, not least because they are large pieces of software that perform complex transformations.
-
-Hence, the premise of this work is: Can we trust these compilers to translate high-level languages like C/C++ to HDL correctly?
-For instance, \citet{lidbury15_many_core_compil_fuzzin} abandoned fuzzing Altera's (now Intel's) OpenCL compiler since it ``either crashed or emitted an internal compiler error'' on a large number of their test inputs.
-Also, Du \emph{et al.}~\cite{?} fuzz tested three commercial HLS tools using Csmith~\cite{yang11_findin_under_bugs_c_compil}, while restricting the C programs to the constructs explicitly supported by the synthesis tools.
-They found that on average 2.5\% of test cases generated a design that did not match the behaviour of the input.
+Alas, designing these accelerators by hand in a hardware description language (HDL) such as Verilog can be a tedious and error-prone process.
+An attractive alternative is high-level synthesis (HLS), in which hardware designs are automatically compiled from software written in a language like C.
+Modern HLS tools such as \legup{}~\cite{canis11_legup}, Vivado HLS~\cite{xilinx20_vivad_high_synth}, Intel i++~\cite{??}, and Bambu HLS~\cite{} can produce designs with performance and energy efficiency comparable to those of designs coded manually in HDL~\cite{bdti,autoesl}, while offering the convenient abstractions and rich ecosystem of software development.
+
+But existing HLS tools cannot always guarantee that the hardware designs they produce are equivalent to the software they were given.
+This undermines any reasoning conducted at the software level.
+And there are reasons to doubt that HLS tools actually \emph{do} preserve equivalence. First, just like conventional compilers, HLS tools are large pieces of software that perform a series of complex analyses and transformations.
+Second, unlike conventional compilers, HLS tools target HDL, which has superficial syntactic similarities to software languages but a subtly different semantics, making it easy to introduce behavioural mismatches between the output of the HLS tool and its input.
+%Hence, the premise of this work is: Can we trust these compilers to translate high-level languages like C/C++ to HDL correctly?
+For instance, \citet{lidbury15_many_core_compil_fuzzin} had to abandon their attempt to fuzz-test Altera's (now Intel's) OpenCL compiler since it ``either crashed or emitted an internal compiler error'' on so many of their test inputs.
+More recently, \citet{du+20} fuzz-tested three commercial HLS tools using Csmith~\cite{yang11_findin_under_bugs_c_compil}, and despite restricting the generated programs to the C fragment explicitly supported by all the tools, they still found that on average 2.5\% of test cases generated a design that did not match the behaviour of the input.
+%
Meanwhile, Xilinx's Vivado HLS has been shown to apply pipelining optimisations incorrectly\footnote{\url{https://bit.ly/vivado-hls-pipeline-bug}} or to silently generate wrong code should the programmer stray outside the fragment of C that it supports\footnote{\url{https://bit.ly/vivado-hls-pointer-bug}}.
\subsection{Existing verification workarounds}