-rw-r--r--algorithm.tex11
-rw-r--r--introduction.tex7
2 files changed, 12 insertions, 6 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 0745638..d52a647 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -228,19 +228,22 @@ This subsection lists some key challenges that were encountered when implementin
\subsubsection{Byte- and word-addressable memories}
One big difference between C and Verilog is how memory is represented. In hardware, efficient RAMs are not as available as in software, and need to be explicitly implemented by declaring two-dimensional arrays with specific properties. A major limitation is that \JW{common} RAMs often only allow one read and one write per clock cycle, \JWcouldcut{for example if implementing single port RAM, which is the most common type of RAM}. To make loads and stores as efficient as possible, the RAM needs to be \JWcouldcut{implemented as being} word-addressable, so that an entire integer can be loaded or stored in one clock cycle.
-However, the memory model that \compcert{} uses for its intermediate languages~\cite{blazy05_formal_verif_memor_model_c} is byte-addressable. It therefore has to be proven that the byte-addressable memory behaves in the same way as the word-addressable memory in hardware. Any modifications of the bytes in the \compcert{} memory model have to also be shown to modify the word-addressable memory in the same way. As only integer loads and stores are currently supported for the HLS backend, it follows that the addresses given to the loads and stores should be divisible by four. If that is the case, then the translation from byte-addressed memory to word-addressed memory could be done by dividing the address by four and subtracting by the base address of the memory.
+However, the memory model that \compcert{} uses for its intermediate languages~\cite{blazy05_formal_verif_memor_model_c} is byte-addressable. It therefore has to be proven that the byte-addressable memory behaves in the same way as the word-addressable memory in hardware. Any modifications of the bytes in the \compcert{} memory model also have to be shown to modify the word-addressable memory in the same way. As only integer loads and stores are currently supported in our HLS back end, it follows that the addresses given to the loads and stores should be divisible by four. If that is the case, then the translation from byte-addressed memory to word-addressed memory could be done by dividing the address by four and subtracting \JWcouldcut{by} the base address of the memory. \JW{Why does `subtracting the base address of the memory' have anything to do with whether the memory is byte or word addressed? Don't you have to do that either way? Or perhaps you're saying that your memory is not only word-addressed, but it also starts at address 0 rather than some random address like you'd get in software?}
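The address translation described above can be sketched in C. This is an illustrative helper only (the function name and the exact order of operations are our assumptions, not Vericert's actual code); it assumes the stack memory starts at some byte address `base` and that, as stated above, every load/store address is 4-byte aligned:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the byte- to word-address translation:
 * subtract the base byte address of the memory, then divide by four
 * to obtain the index into the word-addressed RAM array. Only
 * 4-byte-aligned integer accesses are assumed to occur. */
static uint32_t byte_to_word_addr(uint32_t byte_addr, uint32_t base)
{
    assert(byte_addr >= base);       /* address lies within the RAM   */
    assert(byte_addr % 4 == 0);      /* only aligned integer accesses */
    return (byte_addr - base) / 4;   /* word index into the RAM array */
}
```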
\subsubsection{Reset signals}
-\YH{This section could maybe go into the proof section instead.} In hardware there is also no notion of a stack which can be popped or pushed. It is therefore necessary to add reset signals to the main module that will be called so that the proper state can be set at the start of the function.
+\YH{This section could maybe go into the proof section instead.} In hardware, there is no notion of a stack that can be popped or pushed. It is therefore necessary to add reset signals to the main module \JWcouldcut{that will be called} so that the proper state can be set at the start of the function.
+\JW{The bit about running the module for one clock cycle with reset high could go here perhaps? Unless that's more a detail of the \emph{proof} rather than the \emph{implementation}?}
+
+\JW{Yeah I agree that the rest of this paragraph describes a detail of the proof and should be in the Proof section.}
Even though function calls are not supported by \vericert{}, the initial function call that \compcert{} uses to enter the main function does need to be supported in the Verilog semantics, as explained further in Section~\ref{sec:verilog}. The reset signal therefore has to be reasoned about correctly in the semantics and in the initial function call to ensure that the initial state of the module is set correctly, since in 3AC the entry point of the function is part of the function definition.
-\subsubsection{\texttt{Oshrximm} instruction}
+\subsubsection{Implementing the \texttt{Oshrximm} instruction}
% Mention that this optimisation is not performed sometimes (clang -03).
-Vericert performs some optimisations at the level of the instructions that are generated, so that the hardware performs the instructions as quickly as possible and so that the maximum frequency at which the hardware could run is increased. One of the main constructs that cripple performance of the generated hardware is the instantiation of divider circuits in the hardware. In the case of Vericert, it requires the result of the divide operation to be ready in the same clock cycle, meaning the divide circuit needs to be implemented fully combinationally. This is inefficient in terms of hardware size, but also in terms of latency, because it means that the maximum frequency of the hardware needs to be reduced dramatically so that the divide circuit has enough time to finish.
+\vericert{} performs some optimisations at the level of the instructions that are generated, so that the hardware executes the instructions as quickly as possible and the maximum frequency at which it can run is increased. One of the main constructs that cripples the performance of the generated hardware is the instantiation of divider circuits. \vericert{} requires the result of a divide operation to be ready in the same clock cycle, meaning the divide circuit has to be implemented fully combinationally. This is inefficient not only in terms of hardware size, but also in terms of latency, because the maximum frequency of the hardware has to be reduced dramatically so that the divide circuit has enough time to finish.
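\compcert{}'s \texttt{Oshrximm} operation realises signed division by a power of two, rounding towards zero, using shifts instead of a divider. A plain arithmetic right shift rounds towards negative infinity, so negative operands must first be biased by $2^n - 1$. The sketch below illustrates these semantics in C (the function name is ours; it assumes $0 \le n < 31$ and an arithmetic right shift for signed integers, as provided by all mainstream compilers):

```c
#include <stdint.h>

/* Illustrative sketch of Oshrximm semantics: x / 2^n with truncation
 * towards zero. Negative operands are biased by 2^n - 1 before the
 * arithmetic shift, so that the shift truncates towards zero rather
 * than towards negative infinity. Assumes 0 <= n < 31 and an
 * arithmetic right shift on signed integers. */
static int32_t shrx(int32_t x, unsigned n)
{
    int32_t bias = (x < 0) ? (int32_t)((1u << n) - 1) : 0;
    return (x + bias) >> n;   /* equals x / (1 << n), truncated */
}
```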
These small optimisations were found to be the most error-prone, and guaranteeing that the new representation is equivalent to the representation used in the \compcert{} semantics is difficult without proving this for all possible inputs.
diff --git a/introduction.tex b/introduction.tex
index aac8580..5383dba 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -74,8 +74,11 @@ Meanwhile, Xilinx's Vivado HLS has been shown to apply pipelining optimisations
\subsection{Existing approach: HLS-generated hardware verification}
It is rather difficult to prove the absence of bugs in these large HLS code bases, which include custom compiler directives, by exhaustive testing.
-Hence, most existing work on verifying HLS tools proving that the translation of a program instance to hardware is equaivalent, as known as translation validation.
-Translation validation proves t
+Hence, most existing work on verifying HLS tools involves proving that the HLS-generated hardware is equivalent to its software counterpart for all possible inputs of the program, a technique known as translation validation~\cite{pnueli98_trans}.
+Translation validation has been successfully applied to many HLS optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}.
+However, translation validation often suffers from one key problem: the validator itself might not have been mechanically proven to be correct.
+The proof of the validator is therefore quite separate from its implementation, weakening the guarantee that validation provides.
+
To mitigate the problems caused by the unreliability of synthesis tools, the generated hardware is often checked for functional correctness. This is commonly done by simulating the design against a large test bench; however, the guarantees are only as good as the test bench, so if not all inputs are covered, bugs may remain in the untested code. To ensure that the hardware does indeed behave in the same way as the C code, it may therefore be necessary to prove that the two are equivalent. Translation validation~\cite{pnueli98_trans} is the main method used to prove that an HLS translation was successful, and has been successfully applied to many HLS optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}. However, the main problem is that the validator itself has often not been mechanically proven correct, meaning that the implementation is quite separate from the proof. In addition, translation validation may not be feasible for large designs, as the state space grows exponentially. One example is Catapult C~\cite{mentor20_catap_high_level_synth}, which supports translation validation between the C and RTL code; however, as the authors themselves mention~\cite{whitepaper}, this process may require considerable iteration on the C code before the equivalence checker can prove equivalence. This can be a tedious process, as it is unclear which sections of the C code need to change to help the equivalence checker, and the larger the design, the less feasible such equivalence checking becomes.