summaryrefslogtreecommitdiffstats
path: root/algorithm.tex
blob: 5144bb92259d5b51a830ecc2bced3d0391dff2b7 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
\section{Adding an HLS back end to \compcert{}}

This section covers the main architecture of the HLS tool, and the way in which the backend was added to \compcert{}.  This section will also cover an example of converting a simple C program into hardware, expressed in the Verilog language.

\begin{figure}
  \centering
  \resizebox{0.47\textwidth}{!}{
  \begin{tikzpicture}
    [language/.style={fill=white,rounded corners=2pt}]
    \fill[compcert,rounded corners=3pt] (-1,-1) rectangle (9,1.5);
    \fill[formalhls,rounded corners=3pt] (-1,-1.5) rectangle (9,-2.5);
    \node[language] at (0,0) (clight) {Clight};
    \node[language] at (2,0) (cminor) {C\#minor};
    \node[language] at (4,0) (rtl) {3AC};
    \node[language] at (6,0) (ltl) {LTL};
    \node[language] at (8,0) (ppc) {PPC};
    \node[language] at (4,-2) (dfgstmd) {HTL};
    \node[language] at (7,-2) (verilog) {Verilog};
    \node at (0,1) {\compcert{}};
    \node at (0,-2) {Vericert};
    \draw[->] (clight) -- (cminor);
    \draw[->,dashed] (cminor) -- (rtl);
    \draw[->] (rtl) -- (ltl);
    \draw[->,dashed] (ltl) -- (ppc);
    \draw[->] (rtl) -- (dfgstmd);
    \draw[->] (dfgstmd) -- (verilog);
  \end{tikzpicture}}
  \caption{Verilog backend to Compcert, branching off at the three address code (3AC), at which point the three address code is transformed into a state machine.  Finally, it is transformed to a hardware description of the state machine in Verilog.}%
  \label{fig:rtlbranch}
\end{figure}

The main work flow of \vericert{} is shown in Figure~\ref{fig:rtlbranch}, which shows the parts of the translation that are performed in \compcert{}, and which have been added with \vericert{}.

\compcert{} is made up of 11 intermediate languages in between the Clight input and the assembly output.  These intermediate languages each serve a different purpose and contain various different optimisations.  When designing a new backend for \compcert{}, it is crucial to know where to branch off and start the hardware generation.  Many optimisations that the CompCert backend performs are not necessary when generating custom hardware, meaning no CPU architecture is being targeted.  These optimisations include register allocation, as there is no more fixed number of registers that need to be targeted.  It is therefore important to find the right intermediate language so that the HLS tool still benefits from many of the generic optimisations that CompCert performs, but does not receive the code transformations that are specific to CPU architectures.

Existing HLS compilers usually use LLVM IR as an intermediate representation when performing HLS specific optimisations, as each instruction can be mapped quite well to hardware which performs the same behaviour.  CompCert's three address code (3AC)\footnote{Three address code (3AC) is also know as register transfer language (RTL) in the CompCert literature, however, 3AC is used in this paper so as not to confuse it with register transfer level (RTL), which is another name for the final hardware target of the HLS tool.} is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware.  However, one difference between the two is that 3AC uses operations of the target architecture and performs architecture specific optimisations as well, which is not the case in LLVM IR where all the instructions are quite abstract.  This can be mitigated by making CompCert target a specific architecture such as x86\_32, where most operations translate quite well into hardware.  In addition to that, many optimisations that are also useful for HLS are performed in 3AC, which means that if it is supported as the input language, the HLS algorithm benefits from the same optimisations.  It is therefore a good candidate to be chosen as the input language to the HLS backend. The complete flow that Vericert takes is show in figure~\ref{fig:rtlbranch}.

% - Explain main differences between translating C to software and to hardware.

%   + This can be done by going through the simple example.

\begin{figure}
  \centering
  \begin{subfigure}[b]{0.5\linewidth}
    \inputminted{c}{data/accumulator.c}
    \caption{Accumulator C code.}\label{fig:accumulator_c}
  \end{subfigure}%
  \begin{subfigure}[b]{0.5\linewidth}
    \inputminted[fontsize=\footnotesize]{c}{data/accumulator.rtl}
    \caption{Accumulator 3AC code.}\label{fig:accumulator_rtl}
  \end{subfigure}
  \caption{Accumulator example using \compcert{} to translate from C to three address code (3AC).}\label{fig:accumulator_c_rtl}
\end{figure}

\begin{figure}
  \centering
  \begin{subfigure}[b]{0.5\linewidth}
    \inputminted[fontsize=\tiny]{systemverilog}{data/accumulator1.v}
    \caption{Accumulator HTL code.}\label{fig:accumulator_v_1}
  \end{subfigure}%
  \begin{subfigure}[b]{0.5\linewidth}
    \inputminted[fontsize=\tiny]{systemverilog}{data/accumulator2.v}
    \caption{Accumulator Verilog code.}\label{fig:accumulator_v_2}
  \end{subfigure}
  \caption{Accumulator example using \vericert{} to translate the 3AC to a state machine expressed in Verilog.}\label{fig:accumulator_v}
\end{figure}

\subsection{Example C to Verilog translation}

Taking the simple accumulator program shown in Figure~\ref{fig:accumulator_c}, we can describe the main translation that is performed in Vericert to go from the behavioural description into a hardware design.

The first step of the translation is to use \compcert{} to transform the C code into the 3AC shown in Figure~\ref{fig:accumulator_rtl}, including many optimisations that are performed in the 3AC language.  This includes optimisations such as constant propagation and dead-code elimination.  Function inlining is also performed, and it is used as a way to support function calls instead of having to support the \texttt{Icall} 3AC instruction.  The duplication of the function bodies caused by inlining does affect the total area of the hardware,

\begin{figure*}
  \centering
  \includegraphics[scale=0.3,trim={10cm 10cm 7cm 7cm},clip=true]{data/accumulator_fsmd2.pdf}
  \caption{Accumulator hardware diagram.}\label{fig:accumulator_diagram}
\end{figure*}

%   + TODO Explain the main mapping in a short simple way

% - Explain why we chose 3AC (3AC) as the branching off point.

%   + TODO Clarify connection between CFG and FSMD

%   + TODO Explain how memory is mapped

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% TeX-command-extra-options: "-shell-escape"
%%% End: