Add more content

author: Yann Herklotz <git@yannherklotz.com> 2020-11-02 14:57:00 +0000
committer: Yann Herklotz <git@yannherklotz.com> 2020-11-02 14:57:00 +0000
commit: 30b6a0f2a4709520d5aecec9959a2ea4f28b6529 (patch)
tree: 17cf6304c616e1fe907794416700110e4fab79ff
parent: e40fa69ac8bf0a3b38edc63bdf33e4e879c42b9b (diff)
download: oopsla21_fvhls-30b6a0f2a4709520d5aecec9959a2ea4f28b6529.tar.gz
oopsla21_fvhls-30b6a0f2a4709520d5aecec9959a2ea4f28b6529.zip
4 files changed, 29 insertions, 21 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 684975c..5144bb9 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -31,6 +31,10 @@ This section covers the main architecture of the HLS tool, and the way in which
 
 The main work flow of \vericert{} is shown in Figure~\ref{fig:rtlbranch}, which shows the parts of the translation that are performed in \compcert{}, and which have been added with \vericert{}.
 
+\compcert{} is made up of 11 intermediate languages in between the Clight input and the assembly output.  These intermediate languages each serve a different purpose and contain various different optimisations.  When designing a new backend for \compcert{}, it is crucial to know where to branch off and start the hardware generation.  Many optimisations that the CompCert backend performs are not necessary when generating custom hardware, meaning no CPU architecture is being targeted.  These optimisations include register allocation, as there is no more fixed number of registers that need to be targeted.  It is therefore important to find the right intermediate language so that the HLS tool still benefits from many of the generic optimisations that CompCert performs, but does not receive the code transformations that are specific to CPU architectures.
+
+Existing HLS compilers usually use LLVM IR as an intermediate representation when performing HLS specific optimisations, as each instruction can be mapped quite well to hardware which performs the same behaviour.  CompCert's three address code (3AC)\footnote{Three address code (3AC) is also know as register transfer language (RTL) in the CompCert literature, however, 3AC is used in this paper so as not to confuse it with register transfer level (RTL), which is another name for the final hardware target of the HLS tool.} is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware.  However, one difference between the two is that 3AC uses operations of the target architecture and performs architecture specific optimisations as well, which is not the case in LLVM IR where all the instructions are quite abstract.  This can be mitigated by making CompCert target a specific architecture such as x86\_32, where most operations translate quite well into hardware.  In addition to that, many optimisations that are also useful for HLS are performed in 3AC, which means that if it is supported as the input language, the HLS algorithm benefits from the same optimisations.  It is therefore a good candidate to be chosen as the input language to the HLS backend. The complete flow that Vericert takes is show in figure~\ref{fig:rtlbranch}.
+
 % - Explain main differences between translating C to software and to hardware.
 
 %   + This can be done by going through the simple example.
@@ -43,7 +47,7 @@ The main work flow of \vericert{} is shown in Figure~\ref{fig:rtlbranch}, which
   \end{subfigure}%
   \begin{subfigure}[b]{0.5\linewidth}
     \inputminted[fontsize=\footnotesize]{c}{data/accumulator.rtl}
-    \caption{Accumulator RTL code.}\label{fig:accumulator_rtl}
+    \caption{Accumulator 3AC code.}\label{fig:accumulator_rtl}
   \end{subfigure}
   \caption{Accumulator example using \compcert{} to translate from C to three address code (3AC).}\label{fig:accumulator_c_rtl}
 \end{figure}
@@ -58,10 +62,14 @@ The main work flow of \vericert{} is shown in Figure~\ref{fig:rtlbranch}, which
     \inputminted[fontsize=\tiny]{systemverilog}{data/accumulator2.v}
     \caption{Accumulator Verilog code.}\label{fig:accumulator_v_2}
   \end{subfigure}
-  \caption{Accumulator example using \compcert{} to translate from HTL to Verilog.\YH{I feel like these examples take up too much space, but don't really know of a different way to show off a complete example without the long code.} \JW{Ok, what about drawing the FSMD in some sort of pictorial way? I think you tried drawing it as a schematic previously, but that was too big and clumsy. What about drawing the FSMD as a state machine, with sixteen states and labelled edges between them? Or might that be too abstract?}}\label{fig:accumulator_v}
+  \caption{Accumulator example using \vericert{} to translate the 3AC to a state machine expressed in Verilog.}\label{fig:accumulator_v}
 \end{figure}
 
-Taking the simple accumulator program shown in Figure~\ref{fig:accumulator_c}, we can describe the main translation that is performed in Vericert to go from the behaviour description into a hardware design.
+\subsection{Example C to Verilog translation}
+
+Taking the simple accumulator program shown in Figure~\ref{fig:accumulator_c}, we can describe the main translation that is performed in Vericert to go from the behavioural description into a hardware design.
+
+The first step of the translation is to use \compcert{} to transform the C code into the 3AC shown in Figure~\ref{fig:accumulator_rtl}, including many optimisations that are performed in the 3AC language.  This includes optimisations such as constant propagation and dead-code elimination.  Function inlining is also performed, and it is used as a way to support function calls instead of having to support the \texttt{Icall} 3AC instruction.  The duplication of the function bodies caused by inlining does affect the total area of the hardware,
 
 \begin{figure*}
   \centering
@@ -71,7 +79,7 @@ Taking the simple accumulator program shown in Figure~\ref{fig:accumulator_c}, w
 
 %   + TODO Explain the main mapping in a short simple way
 
-% - Explain why we chose 3AC (RTL) as the branching off point.
+% - Explain why we chose 3AC (3AC) as the branching off point.
 
 %   + TODO Clarify connection between CFG and FSMD
 
diff --git a/proof.tex b/proof.tex
index 58a5417..c5698c2 100644
--- a/proof.tex
+++ b/proof.tex
@@ -18,17 +18,17 @@ To prove this statement, we therefore have to prove that the Verilog semantics a
 
 \subsection{Forward Simulation}
 
-The forward simulation between C and Verilog can be separated into forward simulations of each compiler pass, which can then be composed to provide a whole proof from C to Verilog.  We therefore only have to prove a forward simulation for the RTL to HTL translation, and for the HTL to Verilog translation.
+The forward simulation between C and Verilog can be separated into forward simulations of each compiler pass, which can then be composed to provide a whole proof from C to Verilog.  We therefore only have to prove a forward simulation for the 3AC to HTL translation, and for the HTL to Verilog translation.
 
-\subsubsection{RTL to HTL forward simulation}
+\subsubsection{3AC to HTL forward simulation}
 
-As HTL is quite different to RTL, this first translation is the most involved translation and therefore requires a larger proof, as the translation from RTL instructions to Verilog statements needs to be proven correct.  In addition to that, the semantics of HTL are also quite different to the RTL semantics, as instead of defining small-step semantics for each construct in Verilog, the semantics are instead defined over one clock cycle.
+As HTL is quite different to 3AC, this first translation is the most involved translation and therefore requires a larger proof, as the translation from 3AC instructions to Verilog statements needs to be proven correct.  In addition to that, the semantics of HTL are also quite different to the 3AC semantics, as instead of defining small-step semantics for each construct in Verilog, the semantics are instead defined over one clock cycle.
 
 \begin{figure}
   \centering
 \begin{minted}{coq}
 Inductive match_states :
-  RTL.state -> HTL.state -> Prop :=
+  3AC.state -> HTL.state -> Prop :=
 | match_state : forall asa asr sf f sp
     sp' rs mem m st res
     (MASSOC : match_assocmaps f rs asr)
@@ -39,44 +39,44 @@ Inductive match_states :
     (SP : sp = Values.Vptr sp' (Integers.Ptrofs.repr 0))
     (RSBP : reg_stack_based_pointers sp' rs)
     (ASBP : arr_stack_based_pointers sp' mem
-              (f.(RTL.fn_stacksize)) sp)
-    (BOUNDS : stack_bounds sp (f.(RTL.fn_stacksize)) mem)
+              (f.(3AC.fn_stacksize)) sp)
+    (BOUNDS : stack_bounds sp (f.(3AC.fn_stacksize)) mem)
     (CONST : match_constants m asr),
     match_states
-      (RTL.State sf f sp st rs mem)
+      (3AC.State sf f sp st rs mem)
       (HTL.State res m st asr asa)
 | match_returnstate : forall v v' stack mem res
     (MF : match_frames stack res),
     val_value_lessdef v v' ->
     match_states
-      (RTL.Returnstate stack v mem)
+      (3AC.Returnstate stack v mem)
       (HTL.Returnstate res v')
 | match_initial_call :
   forall f m m0 (TF : tr_module f m),
     match_states
-      (RTL.Callstate nil (AST.Internal f) nil m0)
+      (3AC.Callstate nil (AST.Internal f) nil m0)
       (HTL.Callstate nil m nil).
 \end{minted}
-  \caption{\texttt{match\_states} predicate used to match an RTL state to the equivalent HTL state.}\label{fig:match_states}
+  \caption{\texttt{match\_states} predicate used to match an 3AC state to the equivalent HTL state.}\label{fig:match_states}
 \end{figure}
 
-The first step in proving the forward simulation is to define a relation that matches an RTL state to an HTL state, which shows when the states are equivalent.  This relation also defines assumptions that are made about the RTL code that we receive, so that these assumptions can be used to prove the translations correct.  These assumptions then have to be proven to always hold assuming the HTL code was created by the translation algorithm.  The \texttt{match\_states} predicate that is used to match the states of the RTL code to the HTL code is shown in Figure~\ref{fig:match_states}.  The type \texttt{match\_states} declared in Figure~\ref{fig:match_states} has three constructors.
+The first step in proving the forward simulation is to define a relation that matches an 3AC state to an HTL state, which shows when the states are equivalent.  This relation also defines assumptions that are made about the 3AC code that we receive, so that these assumptions can be used to prove the translations correct.  These assumptions then have to be proven to always hold assuming the HTL code was created by the translation algorithm.  The \texttt{match\_states} predicate that is used to match the states of the 3AC code to the HTL code is shown in Figure~\ref{fig:match_states}.  The type \texttt{match\_states} declared in Figure~\ref{fig:match_states} has three constructors.
 
 \begin{enumerate}
-  \item \texttt{match\_state} is the main constructor which matches an \texttt{RTL.State} to a \texttt{HTL.State}, which during the normal execution of instructions in the function.
-  \item \texttt{match\_returnstate} is the constructor used to match return statements in RTL to HTL.\@  Even though function calls are not supported, there still has to be a notion of stack frames during the proof, as at the start of the program a stack frame is allocated, which then has to be deallocated (popped from the stack) when the main function returns its result.  The only condition on matching the return state is that the return values must be \emph{less defined} (either equal to, or if the C value is undefined any acceptable value is possible).
+  \item \texttt{match\_state} is the main constructor which matches an \texttt{3AC.State} to a \texttt{HTL.State}, which during the normal execution of instructions in the function.
+  \item \texttt{match\_returnstate} is the constructor used to match return statements in 3AC to HTL.\@  Even though function calls are not supported, there still has to be a notion of stack frames during the proof, as at the start of the program a stack frame is allocated, which then has to be deallocated (popped from the stack) when the main function returns its result.  The only condition on matching the return state is that the return values must be \emph{less defined} (either equal to, or if the C value is undefined any acceptable value is possible).
   \item \texttt{match\_initial\_state} matches the initial call to the main function, and cannot be used for any other function calls, as the stack frame is assumed to be \texttt{nil}.
 \end{enumerate}
 
 Using the \texttt{match\_states}, we can then define the correctness theorem for the translation.
 
 \begin{align*}
-  &\forall\ (S_{1} : \texttt{RTL.state})\ t\ S_{2},\ S_{1} \xrightarrow{t} S_{2}\\
+  &\forall\ (S_{1} : \texttt{3AC.state})\ t\ S_{2},\ S_{1} \xrightarrow{t} S_{2}\\
   &\implies \forall\ (R_{1} : \texttt{HTL.state}),\ \texttt{match\_states}\ S_{1}\ R_{1}\\
   &\implies \exists R_{2},\ R_{1} \xrightarrow{t}_{+} R_{2} \land \texttt{match\_states}\ S_{2}\ R_{2}.
 \end{align*}
 
-$S_{1}$ and $S_{2}$ are RTL states and $R_{1}$ and $R_{2}$, HTL states and $\xrightarrow{t}$ is one step in the semantics of RTL and $\xrightarrow{t}_{+}$ is one or more steps in the semantics of HTL.\@  The correctness theorem then says that for all possible starting states $S_{1}$ and for the resulting state $S_{2}$ after one step in the semantics of RTL, for all HTL states $R_{1}$ that matches the state $S_{1}$, there should exist a state $R_{2}$ such that $R_{2}$ is the result of the simulation of the HTL semantics and that the states $S_{2}$ and $R_{2}$ should match as well.  Using this property, the forward simulation can then be proven correct by also showing that the initial and final states of the simulation match as well.
+$S_{1}$ and $S_{2}$ are 3AC states and $R_{1}$ and $R_{2}$, HTL states and $\xrightarrow{t}$ is one step in the semantics of 3AC and $\xrightarrow{t}_{+}$ is one or more steps in the semantics of HTL.\@  The correctness theorem then says that for all possible starting states $S_{1}$ and for the resulting state $S_{2}$ after one step in the semantics of 3AC, for all HTL states $R_{1}$ that matches the state $S_{1}$, there should exist a state $R_{2}$ such that $R_{2}$ is the result of the simulation of the HTL semantics and that the states $S_{2}$ and $R_{2}$ should match as well.  Using this property, the forward simulation can then be proven correct by also showing that the initial and final states of the simulation match as well.
 
 \subsubsection{HTL to Verilog forward simulation}
 
diff --git a/related.tex b/related.tex
index f38a7c9..a2da530 100644
--- a/related.tex
+++ b/related.tex
@@ -1,6 +1,6 @@
 \section{Related Work}
 
-Work is being done to prove the equivalence between the generated hardware and the original behavioural description in C.  An example of a tool that implements this is Mentor's Catapult~\cite{mentor20_catap_high_level_synth}, which tries to match the states in the register transfer level (RTL) description to states in the original C code after an unverified translation.  This technique is called translation validation~\cite{pnueli98_trans}, whereby the translation that the HLS tool performed is proven to have been correct for that input, by showing that they behave in the same way for all possible inputs.  Using translation validation is quite effective for proving complex optimisations such as scheduling~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth} or code motion~\cite{banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}, however, the validation has to be run every time the high-level synthesis is performed.  In addition to that, the proofs are often not mechanised or directly related to the actual implementation, meaning the verifying algorithm might be wrong and could give false positives or false negatives.
+Work is being done to prove the equivalence between the generated hardware and the original behavioural description in C.  An example of a tool that implements this is Mentor's Catapult~\cite{mentor20_catap_high_level_synth}, which tries to match the states in the three adress code (3AC) description to states in the original C code after an unverified translation.  This technique is called translation validation~\cite{pnueli98_trans}, whereby the translation that the HLS tool performed is proven to have been correct for that input, by showing that they behave in the same way for all possible inputs.  Using translation validation is quite effective for proving complex optimisations such as scheduling~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth} or code motion~\cite{banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}, however, the validation has to be run every time the high-level synthesis is performed.  In addition to that, the proofs are often not mechanised or directly related to the actual implementation, meaning the verifying algorithm might be wrong and could give false positives or false negatives.
 
 \JW{Some papers to cite somewhere:
 \begin{itemize}
diff --git a/verilog.tex b/verilog.tex
index b672145..cdcbe10 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -25,7 +25,7 @@ The main addition to the Verilog syntax is the explicit declaration of inputs an
 
 \subsection{Semantics}
 
-An existing operational semantics for Verilog~\cite{loow19_verif_compil_verif_proces} was adapted for the semantics of the language that Vericert eventually targets.  This semantics is a small-step operational semantics at the clock cycle level, as hardware typically does not terminate in any way, however, within each clock cycle the semantics are constructed in a big-step style semantics.  This style of semantics matches the small-step operational semantics of \compcert{}'s register transfer language (RTL) quite well.
+An existing operational semantics for Verilog~\cite{loow19_verif_compil_verif_proces} was adapted for the semantics of the language that Vericert eventually targets.  This semantics is a small-step operational semantics at the clock cycle level, as hardware typically does not terminate in any way, however, within each clock cycle the semantics are constructed in a big-step style semantics.  This style of semantics matches the small-step operational semantics of \compcert{}'s three address code (3AC) quite well.
 
 %~\NR{So the semantics does not support combinational always blocks or assignments?}\YH{Currently not, as they are not needed.}
 %~\NR{Does the semantics also enforce that variables can only be modified within one always block (multiple drivers not allowed)? PL community might think of data races. }\YH{Will add that.}
author	Yann Herklotz <git@yannherklotz.com>	2020-11-02 14:57:00 +0000
committer	Yann Herklotz <git@yannherklotz.com>	2020-11-02 14:57:00 +0000
commit	30b6a0f2a4709520d5aecec9959a2ea4f28b6529 (patch)
tree	17cf6304c616e1fe907794416700110e4fab79ff
parent	e40fa69ac8bf0a3b38edc63bdf33e4e879c42b9b (diff)
download	oopsla21_fvhls-30b6a0f2a4709520d5aecec9959a2ea4f28b6529.tar.gz oopsla21_fvhls-30b6a0f2a4709520d5aecec9959a2ea4f28b6529.zip