summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2020-07-06 01:22:14 +0100
committerYann Herklotz <git@yannherklotz.com>2020-07-06 01:22:14 +0100
commit64f10aab13b694adca6566b10d5dab564a981a09 (patch)
tree2f853458053a9df53483b9660a88a37d80e89787
parentd9e7ab6c4e8632510e69abd82fbd11ff250ab733 (diff)
downloadoopsla21_fvhls-64f10aab13b694adca6566b10d5dab564a981a09.tar.gz
oopsla21_fvhls-64f10aab13b694adca6566b10d5dab564a981a09.zip
Add accumulator_fsmd.pdf and finish Verilog section
-rw-r--r--.gitignore1
-rw-r--r--algorithm.tex16
-rw-r--r--data/accumulator_fsmd.pdfbin0 -> 69106 bytes
-rw-r--r--proof.tex10
4 files changed, 25 insertions, 2 deletions
diff --git a/.gitignore b/.gitignore
index e40dd46..65aa54d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -25,7 +25,6 @@
*.brf
*.rip
*.fls
-*.pdf
*.fdb_latexmk
.auctex*/
diff --git a/algorithm.tex b/algorithm.tex
index 9043eaf..b76136e 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -61,6 +61,12 @@ Existing HLS compilers usually use LLVM IR as an intermediate representation whe
\caption{Accumulator example using CompCert to translate from HTL to Verilog.\YH{I feel like these examples take up too much space, but don't really know of a different way to show off a complete example without the long code.}}\label{fig:accumulator_htl_v}
\end{figure}
+\begin{figure}
+ \centering
+ \includegraphics[width=0.9]\textwidth]{data/accumulator_fsmd}
+ \caption{Accumulator example translated to a finite state-machine with datapath (FSMD).}\label{fig:accumulator_fsmd}
+\end{figure}
+
To describe the translation, we start with an example of how to translate a simple accumulator example, which is shown in figure~\ref{fig:accumulator_c}. Using this example, the different stages in the translation can be explained, together with how they were proven.
The first step is to use the CompCert front end passes to convert the C code into RTL, which is the intermediate language that CompCert uses for most of its optimisations. The translation is shown in figure~\ref{fig:accumulator_c_rtl}, where the RTL code is depicted in figure~\ref{fig:accumulator_rtl}. At the RTL stage, many of CompCert's normal optimisations passes, such as deadcode elimination or common subexpression elimination (CSE) can be applied to reduce the RTL as much as possible. The optimised RTL is then translated to HTL, consisting of a finite state-machine representation of the RTL code. This representation maps directly to hardware and it therefore transformed to Verilog afterwards. The translation from RTL to HTL is the main step in converting the software representation of the code into a hardware representation of the code. The following sections will go over the details of the transformation of each pass.
@@ -123,7 +129,15 @@ Finally, this example also contains an array, which is stored on the stack for t
\subsection{Verilog}
-Finally, we have to prove the translation from Verilog to hardware correct
+Finally, we have to translate the HTL code into proper Verilog and prove that it behaves the same as the RTL according to the Verilog semantics. The Verilog output is modelled as a complete abstract syntax tree (AST) instead of being an abstract map over the instructions that are executed. However, as all the instructions are already expressed in Verilog, only the maps need to be translated to valid Verilog, and correct declarations for all the variables in the program need to be added as well.
+
+This translation seems quite straightforward, however, proving that it is correct is not that simple, as all the implicit assumptions that were made in HTL need to be translated explicitly to Verilog and needs to have the same behaviour according to the semantics. Figure~\ref{fig:accumulator_v} shows the final Verilog output that is generated. In general, the structure is similar to the structure of the HTL code, however, the control and datapath maps have been translated to case statement which serve the same purpose. The other main addition to the code is the initialisation of all the variables in the code to the correct bitwidth and the declaration of the inputs and outputs to the module, so that the module can be used inside a larger hardware design. The main subtle change that was added to the code is the reset signal which sets the state to the starting state correctly. In HTL, this was described directly in the semantics, where the entrypoint is stored in the module interface of HTL. However, in Verilog we also want to verify that the hardware will be placed into the right state when we power it up or reset the design, and it therefore has to be encoded directly in the Verilog code.
+
+To check that the initial state of the Verilog is the same as the HTL code, we therefore have to run the module once, assuming the state is undefined and the reset is set to high. We then have to compare the abstract starting state stored in the HTL module to the bitvector value we obtain from running the module for one clock cycle and prove that they are the same. As the value for the state is undefined, the case statements will evaluate to the default state and therefore not perform any computations.
+
+The translation from maps to case statements is done by turning each node of the tree into a case expression with the statments in each being the same. The main difficulty for the proof is that a structure that can be directly accessed is transformed into an inductive structure where a certain number of constructors need to be called to get to the correct case. The proof of the translation from maps to case statements follows by induction over the list of elements in the map and the fact that each key in this list will be unique. In addition to that, the statement that is currently being evaluated is guaranteed by the correctness of the list of elements to be in that list. The latter fact therefore eliminates the base case, as an empty list does not contain the element we know is in the list. The other two cases follow by the fact that either the key is equal to the evaluated value of the case expression, or it isn't. In the first case we can then evaluate the statement and get the state after the case expression, as the uniqueness of the key tells us that the key cannot show up in the list anymore. In the other case we can just apply the inductive hypothesis and remove the current case from the case statement, as it did not match.
+
+Another problem with the representation of the state as an actual register is that we have to make sure that the state does not overflow. Currently, the state register will always be 32 bits, meaning the maximum number of states can only be $2^{32} - 1$. We therefore have to prove that the state value will never go over that value. This means that during the translation we have to check for each state that it can fit into an integer.
%%% Local Variables:
%%% mode: latex
diff --git a/data/accumulator_fsmd.pdf b/data/accumulator_fsmd.pdf
new file mode 100644
index 0000000..e7e029b
--- /dev/null
+++ b/data/accumulator_fsmd.pdf
Binary files differ
diff --git a/proof.tex b/proof.tex
index fdd140f..8bee352 100644
--- a/proof.tex
+++ b/proof.tex
@@ -1,5 +1,15 @@
\section{Proof}
+This section describes the main correctness theorem that was proven and the main ideas behind the proofs.
+
+The main correctness theorem states that for all Clight source programs $C$, if the translation from the source to the target Verilog code succeeds, assuming that $C$ has correct observable behaviour $B$ when executed, then the target Verilog code will simulate with the same behaviour $B$. The following theorem describes this property.
+
+\begin{equation*}
+ \forall C, V, B \notin \texttt{Wrong},\, \yhfunction{HLS} (C) = \yhconstant{OK} (V) \land C \Downarrow B \implies V \Downarrow_{s} B.
+\end{equation*}
+
+However, observable behaviours of C are actually not that important when translating directly to hardware, as there are no external function calls that produce events. Therefore, the only result that matters is the return value of the function, making sure that it is the same result as the original C code when executed with the same inputs.
+
\subsection{Coq Mechanisation}
%%% Local Variables: