1 files changed, 11 insertions, 6 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 447a215..1f011eb 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -115,7 +115,7 @@ The first step of the translation is to use \compcert{} to transform the input C
 
 %   + TODO Explain how memory is mapped
 
-The first translation performed in Vericert is from 3AC to a \JW{new} hardware translation language (HTL), which is one step towards being completely translated to hardware described in Verilog.  The main translation that is performed is going from a CFG representation of the computation to a finite state machine with datapath (FSMD) \JW{I feel like this could use some sort of citation, but I'm not sure what. I guess this is all from "Hardware Design 101", right?} representation in HTL.\@  The core idea of the FSMD representation is that it separates the control flow from the operations on the memory and registers, so that the state transitions can be translated into a simple finite state machine (FSM) and each state then contains data operations that update the memory and registers.  Figure~\ref{fig:accumulator_diagram} shows the resulting architecture of the FSMD. \JW{I think it would be worth having a sentence to explain how the C model of memory is translated to a hardware-centric model of memory. For instance, in}
+The first translation performed in Vericert is from 3AC to a \JW{new} hardware translation language (HTL), which is one step towards being completely translated to hardware described in Verilog.  The main translation that is performed is going from a CFG representation of the computation to a finite state machine with datapath (FSMD) \JW{I feel like this could use some sort of citation, but I'm not sure what. I guess this is all from "Hardware Design 101", right?} representation in HTL.\@  The core idea of the FSMD representation is that it separates the control flow from the operations on the memory and registers, so that the state transitions can be translated into a simple finite state machine (FSM) and each state then contains data operations that update the memory and registers.  Figure~\ref{fig:accumulator_diagram} shows the resulting architecture of the FSMD. \JW{I think it would be worth having a sentence to explain how the C model of memory is translated to a hardware-centric model of memory. For instance, in C we have global variables/arrays, stack-allocated variables/arrays, and heap-allocated variables/arrays (anything else?). In Verilog we have registers and RAM blocks. So what's the correspondence between the two worlds? Globals and heap-allocated are not handled, stack-allocated variables become registers, and stack-allocated arrays become RAM blocks? Am I close?}
 
 \begin{figure*}
   \centering
@@ -133,9 +133,11 @@ In the \JWcouldcut{accumulator} C code, the for loop \JW{for-loop is easier to p
 
 \subsection{Translating HTL to Verilog}
 
-Finally, we have to translate the HTL code into proper Verilog and prove that it behaves the same as the 3AC according to the Verilog semantics.  The Verilog output is modelled as a complete abstract syntax tree (AST) instead of being an abstract map over the instructions that are executed.  However, as all the instructions are already expressed in Verilog, only the maps need to be translated to valid Verilog, and correct declarations for all the variables in the program need to be added as well.
+Finally, we have to translate the HTL code into proper Verilog and prove that it behaves the same as the 3AC according to the Verilog semantics.  The Verilog output is modelled as a complete abstract syntax tree (AST) instead of being an abstract map over the instructions that are executed. \JW{I find `an abstract map over the instructions that are executed' hard to unpick. How about ``Whereas HTL is a language that is specifically designed to represent the FSMDs we are interested in, Verilog is a general-purpose HDL. So the challenge here is to translate our FSMD representation into a Verilog AST.''}  However, as all the instructions are already expressed in Verilog, only the maps need to be translated to valid Verilog, and correct declarations for all the variables in the program need to be added as well.
 
-This translation seems quite straightforward, however, proving that it is correct is not that simple, as all the implicit assumptions that were made in HTL need to be translated explicitly to Verilog and needs to have the same behaviour according to the semantics.  Figure~\ref{fig:accumulator_v} shows the final Verilog output that is generated.  In general, the structure is similar to the structure of the HTL code, however, the control and datapath maps have been translated to case statement which serve the same purpose.  The other main addition to the code is the initialisation of all the variables in the code to the correct bitwidth and the declaration of the inputs and outputs to the module, so that the module can be used inside a larger hardware design.  The main subtle change that was added to the code is the reset signal which sets the state to the starting state correctly.  In HTL, this was described directly in the semantics, where the entrypoint is stored in the module interface of HTL.  However, in Verilog we also want to verify that the hardware will be placed into the right state when we power it up or reset the design, and it therefore has to be encoded directly in the Verilog code.
+This translation seems quite straightforward; however, proving that it is correct is not that simple, as all the implicit assumptions that were made in HTL need to be translated explicitly to Verilog and needs to have the same behaviour according to the semantics.
+\JW{What does `needs' refer to? Missing `it' perhaps?}
+Figure~\ref{fig:accumulator_v} shows the final Verilog output that is generated.  In general, the structure is similar to \JWcouldcut{the structure}\JW{that} of the HTL code; however, the control and datapath maps have been translated to case statements \JWcouldcut{that serve the same purpose}.  The other main addition to the code is the initialisation of all the variables in the code to the correct bitwidths and the declaration of the inputs and outputs to the module, so that the module can be used inside a larger hardware design.
 
 \begin{figure}
   \centering
@@ -210,19 +212,22 @@ endmodule
   \caption{Accumulator example using \vericert{} to translate the 3AC to a state machine expressed in Verilog. \JW{If space permits, it would probably be preferable to have this code in a single column, as splitting a single module across two subfigures is a bit jarring.}}\label{fig:accumulator_v}
 \end{figure}
 
+\JW{What do you think about moving the following paragraph to the Key Challenges section?}
+The main subtle change that was added to the code is the reset signal which sets the state to the starting state correctly.  In HTL, this was described directly in the semantics, where the entry point is stored in the module interface of HTL.  However, in Verilog we also want to verify that the hardware will be placed into the right state when we power it up or reset the design, and it therefore has to be encoded directly in the Verilog code.
 To check that the initial state of the Verilog is the same as the HTL code, we therefore have to run the module once, assuming the state is undefined and the reset is set to high.  We then have to compare the abstract starting state stored in the HTL module to the bitvector value we obtain from running the module for one clock cycle and prove that they are the same.  As the value for the state is undefined, the case statements will evaluate to the default state and therefore not perform any computations.
 
+\JW{What do you think about moving the following two paragraphs to the Proof section?}
 The translation from maps to case statements is done by turning each node of the tree into a case expression with the statements in each being the same.  The main difficulty for the proof is that a structure that can be directly accessed is transformed into an inductive structure where a certain number of constructors need to be called to get to the correct case.  The proof of the translation from maps to case statements follows by induction over the list of elements in the map and the fact that each key in this list will be unique.  In addition to that, the statement that is currently being evaluated is guaranteed by the correctness of the list of elements to be in that list.  The latter fact therefore eliminates the base case, as an empty list does not contain the element we know is in the list.  The other two cases follow by the fact that either the key is equal to the evaluated value of the case expression, or it isn't.  In the first case we can then evaluate the statement and get the state after the case expression, as the uniqueness of the key tells us that the key cannot show up in the list anymore.  In the other case we can just apply the inductive hypothesis and remove the current case from the case statement, as it did not match.
 
-Another problem with the representation of the state as an actual register is that we have to make sure that the state does not overflow.  Currently, the state register will always be 32 bits, meaning the maximum number of states can only be $2^{32} - 1$.  We therefore have to prove that the state value will never go over that value.  This means that during the translation we have to check for each state that it can fit into an integer.
+Another problem with the representation of the state as an actual register is that we have to make sure that the state does not overflow.  Currently, the state register will always be 32 bits, meaning the maximum number of states can only be $2^{32} - 1$.  We therefore have to prove that the state value will never go over that value.  This means that during the translation we have to check for each state that it can fit into an integer. \JW{So I guess this means that Vericert will refuse to compile a program with 5 million instructions? If so, might be worth making that explicit, and maybe even making `Size of state register' one of your Key Challenges?}
 
 \subsection{Key Challenges}
 
-This subsection lists some key challenges that were encountered when implementing the translation from 3AC to HTL and subsequently Verilog.
+This subsection lists some key challenges that were encountered when implementing the translations from 3AC to HTL to Verilog.
 
 \subsubsection{Byte- and word-addressable memories}
 
-One big difference between C and Verilog is how memory can be represented.  In hardware, efficient RAMs are not as available as in software, and need to be explicitly implemented by declaring a two dimensional array with specific properties.  One main limitation is that RAMs often only allow one read and one write per clock cycle, for example if implementing single port RAM, which is the most common type of RAM.  To make loads and stores as efficient as possible, the RAM needs to be implemented as being word addressable, so that a load and store of an integer can be done in one clock cycle.
+One big difference between C and Verilog is how memory is represented.  In hardware, efficient RAMs are not as available as in software, and need to be explicitly implemented by declaring two-dimensional arrays with specific properties.  A major limitation is that \JW{common} RAMs often only allow one read and one write per clock cycle, \JWcouldcut{for example if implementing single port RAM, which is the most common type of RAM}.  To make loads and stores as efficient as possible, the RAM needs to be implemented as being word addressable, so that a load and store of an integer can be done in one clock cycle.
 
 However, the memory model that \compcert{} uses for its intermediate languages~\cite{blazy05_formal_verif_memor_model_c} is byte-addressable.  It therefore has to be proven that the byte-addressable memory behaves in the same way as the word-addressable memory in hardware.  Any modifications of the bytes in the \compcert{} memory model have to also be shown to modify the word-addressable memory in the same way.  As only integer loads and stores are currently supported for the HLS backend, it follows that the addresses given to the loads and stores should be divisible by four.  If that is the case, then the translation from byte-addressed memory to word-addressed memory could be done by dividing the address by four and subtracting the base address of the memory.