3 files changed, 24 insertions, 13 deletions
diff --git a/algorithm.tex b/algorithm.tex
index bf98b6c..429f627 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -2,16 +2,19 @@
 \label{sec:design}
 
 This section covers the main architecture of the HLS tool, and the way in which the back end was added to \compcert{}.  This section will also cover an example of converting a simple C program into hardware, expressed in the Verilog language.
-%\JW{I've experimented with adding a few paragraph headings to the text below -- see what you think. The advantage of headings is that it can make the text easier to read quickly.}\YH{Yes I think it works well actually, makes the sections clearer.}
 
 \paragraph{Choice of source language}
-First of all, the choice of C for the input language of \vericert{} is because it is the most widely supported language for HLS, and most major HLS tools also use it as an input.  As a lot of existing code is also written in C for HLS, supporting C as an input language compared to a custom domain-specific language means that \vericert{} is more practical.  Another alternative was to support LLVM IR as an input language, however, to get a full work flow from a higher level language to hardware, a front end for that language to LLVM IR would also have to be verified.  Finally, a language similar to Occam was also considered, as it has inherent parallel constructs, however, this would not qualify as being high-level synthesis due to the manual parallelism that would have to be performed. %\JW{Can we mention one or two alternatives that we considered?}
+First of all, the choice of C for the input language of \vericert{} is simply because it is what most major HLS tools use~\cite{canis11_legup, xilinx20_vivad_high_synth, intel_hls, bambu_hls}. This, in turn, may be because C is ``[t]he starting point for the vast majority of algorithms to be implemented in hardware''~\cite{5522874}.
+%Since a lot of existing code for HLS is written in C, supporting C as an input language, rather than a custom domain-specific language, means that \vericert{} is more practical. 
+%An alternative was to support LLVM IR as an input language, however, to get a full work flow from a higher level language to hardware, a front end for that language to LLVM IR would also have to be verified. \JW{Maybe save LLVM for the `Choice of implementation language'?}
+We considered following~\citet{page91_compil_occam} by using an Occam-like language, as it has built-in parallel constructs that map well to parallel hardware. however, this would not qualify as being high-level synthesis due to the manual parallelism that would have to be performed. \JW{I don't think the presence of parallelism stops it being proper HLS.} \JP{I think I agree with Yann here, but it could be worded better. At any rate not many people have experience writing what is essentially syntactic sugar over a process calculus.}
+
 
 \paragraph{Choice of target language}
-Next, Verilog~\cite{06_ieee_stand_veril_hardw_descr_languag} is a hardware description language, which can be synthesised into logic gates which can be either placed onto a field-programmable gate array (FPGA) or turned into an application-specific integrated circuit (ASIC).  Verilog was chosen as the output language for \vericert{} because it is one of the most popular hardware description languages and there already exist a few formal semantics for it that could be used as a target~\cite{loow19_verif_compil_verif_proces,meredith10_veril}. Other possible targets could have been Bluespec, a higher level hardware description language, for which there exists a formally verified translation to circuits using K\^{o}ika~\cite{bourgeat20_essen_blues}, however, targeting this language would not be trivial as it is not meant to be targeted by an automatic tool.  Finally, a custom circuit language could also have been targeted, which can then be translated to Verilog in an unverified way, however, some guarantees would be lost and it would not be possible to completely trust the output. %\JW{Can we mention one or two alternatives that we considered? Bluespec or Chisel or one of Adam Chlipala's languages, perhaps?}
+Verilog~\cite{06_ieee_stand_veril_hardw_descr_languag} is an HDL that can be synthesised into logic gates \JP{Should we say ``logic cells'' instead? FPGAs target LUTs, and ASIC processes target standard cells (which may or may not be logic gates).} \JW{Fine by me.} which can be either placed onto a field-programmable gate array (FPGA) or turned into an application-specific integrated circuit (ASIC).  Verilog was chosen as the output language for \vericert{} because it is one of the most popular HDLs and there already exist a few formal semantics for it that could be used as a target~\cite{loow19_verif_compil_verif_proces, meredith10_veril}. Other possible targets could have been Bluespec, a higher level hardware description language, for which there exists a formally verified translation to circuits using K\^{o}ika~\cite{bourgeat20_essen_blues}, however, targeting this language would not be trivial as it is not meant to be targeted by an automatic tool.  Finally, a custom circuit language could also have been targeted, which can then be translated to Verilog in an unverified way, however, some guarantees would be lost and it would not be possible to completely trust the output. \JP{Is this meant to be a dig at Koika :p It feels a bit odd mentioning Koika and then saying this without any comment or evaluation.} \JP{What about that IR from ETH (I think) at PLDI 2020? Obviously didn't exist at the start, but might be worth comment.} %\JW{Can we mention one or two alternatives that we considered? Bluespec or Chisel or one of Adam Chlipala's languages, perhaps?}
 
 \paragraph{Choice of implementation language}
-The framework that was chosen for the frontend was \compcert{}, as it is a mature framework for simulation proofs about intermediate languages, in addition to already providing a validated parser~\cite{jourdan12_valid_lr_parser} from C into the internal representation of Clight.  Other frameworks were also considered, such as Vellvm~\cite{zhao12_formal_llvm_inter_repres_verif_progr_trans}, as LLVM IR in particular is often used by HLS tools anyways, however, these would require more work to support a higher level language such as C as input, or even providing a parser for LLVM IR.\@
+The framework that was chosen for the frontend was \compcert{}, as it is a mature framework for simulation proofs about intermediate languages, in addition to already providing a validated parser~\cite{jourdan12_valid_lr_parser} from C into the internal representation of Clight.  Other frameworks were also considered, such as Vellvm~\cite{zhao12_formal_llvm_inter_repres_verif_progr_trans}, as LLVM IR in particular is often used by HLS tools anyways, however, these would require more work to support a higher level language such as C as input, or even providing a parser for LLVM IR.\@ \JP{No mention of Coq here.}
 
 \begin{figure}
   \centering
@@ -48,11 +51,11 @@ The framework that was chosen for the frontend was \compcert{}, as it is a matur
 \paragraph{Architecture of \vericert{}}
 The main work flow of \vericert{} is shown in Figure~\ref{fig:rtlbranch}, which shows the parts of the translation that are performed in \compcert{}, and which have been added with \vericert{}.
 
-\compcert{} is made up of 11 intermediate languages in between the Clight input and the assembly output.  These intermediate languages each serve a different purpose and contain various optimisations.  When designing a new back end for \compcert{}, it is crucial to know where to branch off, so as to benefit from all the useful optimisations that \compcert{} performs, but not performing optimisations that are not useful, which include optimisations that are specific to the target CPU architecture.  These optimisations include register allocation, as there is not a fixed number of registers that need to be targeted.
+\compcert{} is made up of 11 intermediate languages in between the Clight input and the assembly output.  These intermediate languages each serve a different purpose and contain various optimisations. \JP{Nit: the languages don't contain optimisations, the translation processes do. Maybe we should rephrase this slightly.}  When designing a new back end for \compcert{}, it is crucial to know where to branch off, so as to benefit from all the useful optimisations that \compcert{} performs, but not performing optimisations that are not useful, which include optimisations that are specific to the target CPU architecture.  These optimisations include register allocation, as there is not a fixed number of registers that need to be targeted.
 
 To choose where to branch off at, each intermediate language in \compcert{} can be evaluated to see if it is suitable to be transformed into hardware.  Existing HLS compilers often use LLVM IR as an intermediate representation when performing HLS-specific optimisations, as each instruction can be mapped quite well to hardware that performs the same behaviour.  Looking at the intermediate languages in \compcert{} shown in Figure~\ref{fig:rtlbranch}, there are many languages to choose from.  Clight and CminorSel are an abstract syntax tree (AST) representation of the C code, which does not correspond to a more assembly like language similar to LLVM IR.\@  In addition to that, looking at the languages from LTL to PPC, even though these languages do contain basic blocks, which are desirable when doing HLS, starting from LTL the number of registers is limited.  Register allocation limits the number of registers when translating 3AC into LTL, and stores variables on the stack if that is required.  This is not needed when performing HLS, as there are many more registers available, and it is preferable to use these instead of RAM whenever possible.
 
-\compcert{}'s three-address code (3AC)\footnote{Three-address code (3AC) is also known as register transfer language (RTL) in the \compcert{} literature, however, 3AC is used in this paper instead so as not to confuse it with register-transfer level (RTL), which is another name for the final hardware target of the HLS tool.} is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware.  3AC is represented as a control flow graph (CFG) in CompCert.  Each instruction is a node in the graph and links to the instructions that follow it.  This CFG then describes how the computation should proceed, and is a good representation for performing optimisations on as well as local transformations.  However, one difference between LLVM IR and 3AC is that 3AC uses operations of the target architecture and performs architecture specific optimisations as well, which is not the case in LLVM IR where all the instructions are quite abstract.  This can be mitigated by making \compcert{} target a specific architecture such as x86\_32, where most operations translate quite well into hardware.  In addition to that, many optimisations that are also useful for HLS are performed in 3AC, which means that if it is supported as the input language, the HLS algorithm benefits from the same optimisations.  It is therefore a good candidate to be chosen as the input language to the HLS back end. The complete flow that \vericert{} takes is show in figure~\ref{fig:rtlbranch}.
+\compcert{}'s three-address code (3AC)\footnote{Three-address code (3AC) is also known as register transfer language (RTL) in the \compcert{} literature, however, 3AC is used in this paper instead so as not to confuse it with register-transfer level (RTL), which is another name for the final hardware target of the HLS tool.} is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware \JP{This needs qualification since the instructions are the target instructions of the architecture (i.e.\ they aren't a fixed set).}.  3AC is represented as a control flow graph (CFG) in CompCert.  Each instruction is a node in the graph and links to the instructions that follow it.  This CFG then describes how the computation should proceed, and is a good representation for performing optimisations on as well as local transformations.  However, one difference between LLVM IR and 3AC is that 3AC uses operations of the target architecture and performs architecture specific optimisations as well, which is not the case in LLVM IR where all the instructions are quite abstract.  This can be mitigated by making \compcert{} target a specific architecture such as x86\_32, where most operations translate quite well into hardware.  In addition to that, many optimisations that are also useful for HLS are performed in 3AC, which means that if it is supported as the input language, the HLS algorithm benefits from the same optimisations.  It is therefore a good candidate to be chosen as the input language to the HLS back end. The complete flow that \vericert{} takes is show in figure~\ref{fig:rtlbranch}.
 
 \begin{figure}
   \centering
@@ -217,11 +220,11 @@ However, the memory model that \compcert{} uses for its intermediate languages~\
 
 % Mention that this optimisation is not performed sometimes (clang -03).
 
-\vericert{} performs some optimisations at the level of the instructions that are generated, so that the hardware performs the instructions as quickly as possible and so that the maximum frequency at which the hardware can run is increased.  One of the main constructs that cripple performance of the generated hardware is the instantiation of divider circuits in the hardware.  In the case of \vericert{}, it requires the result of the divide operation to be ready in the same clock cycle, meaning the divide circuit needs to be implemented fully combinationally.  This is inefficient in terms of hardware size, but also in terms of latency, because it means that the maximum frequency of the hardware needs to be reduced dramatically so that the divide circuit has enough time to finish.
+\vericert{} performs some optimisations at the level of the instructions that are generated, so that the hardware performs the instructions as quickly as possible and so that the maximum frequency at which the hardware can run is increased.  One of the main constructs that cripple performance of the generated hardware is the instantiation of divider circuits in the hardware.  In the case of \vericert{}, it requires the result of the divide operation to be ready in the same clock cycle, meaning the divide circuit needs to be implemented fully combinationally.  This is inefficient in terms of hardware size, but also in terms of latency, because it means that the maximum frequency of the hardware needs to be reduced dramatically so that the divide circuit has enough time to finish. \JP{Multi-cycle paths might be something worth exploring in future work: fairly error-prone/dangerous for hand-written code, but might be interesting in generated code.}
 
 These small optimisations were found to be the most error prone, and guaranteeing that the new representation is equivalent to representation used in the \compcert{} semantics is difficult without proving this for all possible inputs.
 
-Dividing by a constant can often be optimised to a more efficient operation, especially if the denominator is a factor of two.  In \compcert{}, the \texttt{Oshrximm} instruction does exactly this, and a normal signed divide operation can be replaced by the \texttt{Oshrximm} instruction, performing the following operation, which is transformed to our optimal representation on the right, where $\div$ stands for integer signed division:
+Dividing by a constant can often be optimised to a more efficient operation, especially if the denominator is a factor of two. \JP{People will laugh at us if we don't use the term ``strength-reduction'' somewhere here}  In \compcert{}, the \texttt{Oshrximm} instruction does exactly this, and a normal signed divide operation can be replaced by the \texttt{Oshrximm} instruction, performing the following operation, which is transformed to our optimal representation on the right, where $\div$ stands for integer signed division:
 
 \begin{align*}
   &\forall x, y \in \mathbb{Z},\ \ 0 \leq y < 31,\ \ -2^{31} \leq x < 2^{31},\\
diff --git a/introduction.tex b/introduction.tex
index 39da7ff..59e0fce 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -6,7 +6,7 @@
 
 %\NR{I couldn't have subsections in comments so I have appended my writing to the bottom of this file.}\YH{The original intro is in the archive, we can maybe merge them in the future a bit.}
 
-\subsection{Can you trust your high-level synthesis tool?}
+\paragraph{Can you trust your high-level synthesis tool?}
 
 As latency, throughput and energy efficiency become increasingly important, custom hardware accelerators are being designed for numerous applications. 
 Alas, designing these accelerators can be a tedious and error-prone process using a hardware description language (HDL) such as Verilog.
@@ -23,7 +23,7 @@ For instance, Vivado HLS has been shown to apply pipelining optimisations incorr
 Meanwhile, \citet{lidbury15_many_core_compil_fuzzin} had to abandon their attempt to fuzz-test Altera's (now Intel's) OpenCL compiler since it ``either crashed or emitted an internal compiler error'' on so many of their test inputs.  
 And more recently, Du et al.~\cite{du+20} fuzz-tested three commercial HLS tools using Csmith~\cite{yang11_findin_under_bugs_c_compil}, and despite restricting the generated programs to the C fragment explicitly supported by all the tools, they still found that on average 2.5\% of test cases generated a design that did not match the behaviour of the input.  
 
-\subsection{Existing workarounds}
+\paragraph{Existing workarounds}
 
 Aware of the reliability shortcomings of HLS tools, hardware designers routinely check the generated hardware for functional correctness. This is commonly done by simulating the design against a large test-bench; however, the guarantees are only as good as the test-bench, meaning that if all the inputs were not covered exhaustively, there is a risk that bugs remain. 
 
@@ -33,11 +33,10 @@ For example, the translation validation for Catapult C~\cite{mentor20_catap_high
 
 Our position is that none of the above workarounds are necessary if the HLS tool can simply be trusted to work correctly.
 
-\subsection{Our solution}
+\paragraph{Our solution}
 We have written a new HLS tool in the Coq theorem prover~\cite{bertot04_inter_theor_provin_progr_devel} and proven that it any output it produces always has the same behaviour as its input. Our tool, which is called \vericert{},\ifANONYMOUS\footnote{Tool name has been changed for blind review.}\fi{} is automatically extracted to an OCaml program from Coq, which ensures that the object of the proof is the same as the implementation of the tool. \vericert{} is built by extending the \compcert{} verified C compiler~\cite{leroy09_formal_verif_realis_compil} with a new hardware-specific intermediate language and a Verilog back end. It supports all C constructs except for function pointers, recursive function calls, non-integer types and global variables.
 
-\subsection{Contributions and Outline}
-
+\paragraph{Contributions and Outline}
 The contributions of this paper are as follows:
 
 \begin{itemize}
diff --git a/references.bib b/references.bib
index 00bcc64..2212317 100644
--- a/references.bib
+++ b/references.bib
@@ -39,6 +39,15 @@ url = {http://web.cse.ohio-state.edu/~pouchet.2/software/polybench/},
 year = {2020},
 }
 
+@INPROCEEDINGS{5522874,
+  author={Dan Gajski and Todd Austin and Steve Svoboda},
+  booktitle={Design Automation Conference}, 
+  title={What input-language is the best choice for high level synthesis (HLS)?}, 
+  year={2010},
+  volume={},
+  number={},
+  pages={857-858},
+  doi={10.1145/1837274.1837489}}
 
 @article{takach16_high_level_synth,
   author          = {A. {Takach}},