summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
-rw-r--r--algorithm.tex140
-rw-r--r--archive/algorithm.tex4
2 files changed, 126 insertions, 18 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 8bc0a47..96743d0 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -1,6 +1,6 @@
\section{Adding an HLS back end to \compcert{}}
-\JW{The first part of this section (up to 2.1) is good but needs tightening up. Ultimately, the point of this section is to explain that there's an existing verified compiler called CompCert which has a bunch of stages, and we need to make a decision about where to tap into that pipeline. Too early and we miss out on some helpful optimisations; too late and we've ended up too target-specific. What if you put a few more stages into Figure 1 -- there aren't actually that many missing anyway. Suppose you add in Cminor between C\#minor and 3AC. Then the high-level structure of your argument in this subsection could be: (1) why Cminor is too early, (2) why LTL is too late, and then maybe (3) why 3AC is just right. The Goldilocks structure, haha!}
+%\JW{The first part of this section (up to 2.1) is good but needs tightening up. Ultimately, the point of this section is to explain that there's an existing verified compiler called CompCert which has a bunch of stages, and we need to make a decision about where to tap into that pipeline. Too early and we miss out on some helpful optimisations; too late and we've ended up too target-specific. What if you put a few more stages into Figure 1 -- there aren't actually that many missing anyway. Suppose you add in Cminor between C\#minor and 3AC. Then the high-level structure of your argument in this subsection could be: (1) why Cminor is too early, (2) why LTL is too late, and then maybe (3) why 3AC is just right. The Goldilocks structure, haha!}
This section covers the main architecture of the HLS tool, and the way in which the back end was added to \compcert{}. This section will also cover an example of converting a simple C program into hardware, expressed in the Verilog language.
@@ -38,9 +38,23 @@ This section covers the main architecture of the HLS tool, and the way in which
The main work flow of \vericert{} is shown in Figure~\ref{fig:rtlbranch}, which shows the parts of the translation that are performed in \compcert{}, and which have been added with \vericert{}.
-\compcert{} is made up of 11 intermediate languages in between the Clight input and the assembly output. These intermediate languages each serve a different purpose and contain various optimisations. When designing a new back end for \compcert{}, it is crucial to know where to branch off and start the hardware generation. \JW{Mismatch between `designing a new back end' (not HW-specific) and `hardware generation' (HW-specific).} Many optimisations that the CompCert back end \JW{`the CompCert back end' -> CompCert?} performs are not necessary when generating custom hardware, meaning no CPU architecture is being targeted. \JW{`meaning ...' -- this explanation comes too late to be useful} These optimisations include register allocation, as there is no more fixed \JW{there is not a fixed} number of registers that need to be targeted. It is therefore important to find the right intermediate language \JW{You already said `crucial to know where to branch off' so this feels repetitive} so that the HLS tool still benefits from many of the generic optimisations that CompCert performs, but does not receive the code transformations that are specific to CPU architectures.
+\compcert{} is made up of 11 intermediate languages in between the Clight input and the assembly output. These intermediate languages each serve a different purpose and contain various optimisations. When designing a new back end for \compcert{}, it is crucial to know where to branch off, so as to benefit from all the useful optimisations that \compcert{} performs, but not performing optimisations that are not useful, which include optimisations that are specific to the target CPU architecture or. These optimisations include register allocation, as there is not a fixed number of registers that need to be targeted.
-Existing HLS compilers often use LLVM IR as an intermediate representation when performing HLS-specific optimisations, as each instruction can be mapped quite well to hardware that performs the same behaviour. CompCert's three-address code (3AC)\footnote{Three-address code (3AC) is also known as register transfer language (RTL) in the \compcert{} literature, however, 3AC is used in this paper instead so as not to confuse it with register-transfer level (RTL), which is another name for the final hardware target of the HLS tool.} is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware. However, one difference between the two is that 3AC uses operations of the target architecture and performs architecture specific optimisations as well, which is not the case in LLVM IR where all the instructions are quite abstract. This can be mitigated by making CompCert target a specific architecture such as x86\_32, where most operations translate quite well into hardware. In addition to that, many optimisations that are also useful for HLS are performed in 3AC, which means that if it is supported as the input language, the HLS algorithm benefits from the same optimisations. It is therefore a good candidate to be chosen as the input language to the HLS back end. The complete flow that Vericert takes is show in figure~\ref{fig:rtlbranch}.
+To choose where to branch off at, each intermediate language in \compcert{} can be evaluated to see if
+
+Existing HLS compilers often use LLVM IR as an intermediate representation when performing HLS-specific optimisations, as each instruction can be mapped quite well to hardware that performs the same behaviour. \compcert{}'s three-address code (3AC)\footnote{Three-address code (3AC) is also known as register transfer language (RTL) in the \compcert{} literature, however, 3AC is used in this paper instead so as not to confuse it with register-transfer level (RTL), which is another name for the final hardware target of the HLS tool.} is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware. However, one difference between the two is that 3AC uses operations of the target architecture and performs architecture specific optimisations as well, which is not the case in LLVM IR where all the instructions are quite abstract. This can be mitigated by making \compcert{} target a specific architecture such as x86\_32, where most operations translate quite well into hardware. In addition to that, many optimisations that are also useful for HLS are performed in 3AC, which means that if it is supported as the input language, the HLS algorithm benefits from the same optimisations. It is therefore a good candidate to be chosen as the input language to the HLS back end. The complete flow that \vericert{} takes is show in figure~\ref{fig:rtlbranch}.
+
+\subsection{Some Key Challenges}
+
+This subsection lists some key challenges that were encountered when implementing the translation from 3AC to HTL and subsequently Verilog.
+
+\subsubsection{Discrepency between C and Verilog w.r.t. signedness}
+
+\subsubsection{Byte- and word-addressable memories}
+
+\subsubsection{Reset signals}
+
+\subsubsection{Implementation of the \texttt{Oshrximm} instruction}
\JW{I wonder if Section 2 could benefit from a `Some Key Challenges' subsection, where you highlight several interesting bits of the translation process, each with their own paragraph heading. These could be something like:\begin{enumerate}\item Discrepancy between C and Verilog w.r.t. signedness \item Deciding between byte- and word-addressable memories \item Adding reset signals \item Implementing the Oshrximm instruction correctly \end{enumerate} For the causal reader, this would immediately signal two things: (1) you can skip this subsection on your initial pass, and (2) proving the HLS tool correct was a non-trivial undertaking.}
@@ -50,15 +64,45 @@ Existing HLS compilers often use LLVM IR as an intermediate representation when
\begin{figure}
\centering
- \begin{subfigure}[b]{0.5\linewidth}
- \inputminted{c}{data/accumulator.c}
- \caption{Accumulator C code.}\label{fig:accumulator_c}
- \end{subfigure}%
- \begin{subfigure}[b]{0.5\linewidth}
- \inputminted[fontsize=\footnotesize]{c}{data/accumulator.rtl}
- \caption{Accumulator 3AC code.}\label{fig:accumulator_rtl}
+ \begin{subfigure}[b]{0.49\linewidth}
+\begin{minted}{c}
+int main() {
+ int x[3] = {1, 2, 3};
+ int sum = 0;
+ for (int i = 0;
+ i < 3;
+ i++)
+ sum += x[i];
+ return sum;
+}
+\end{minted}
+ \caption{Example showing the input C code.}\label{fig:accumulator_c}
+ \end{subfigure}\hfill%
+ \begin{subfigure}[b]{0.49\linewidth}
+\begin{minted}[fontsize=\footnotesize]{c}
+main() {
+ 16: x9 = 1
+ 15: int32[stack(0)] = x9
+ 14: x8 = 2
+ 13: int32[stack(4)] = x8
+ 12: x7 = 3
+ 11: int32[stack(8)] = x7
+ 10: x3 = 0
+ 9: nop
+ 8: x1 = 0
+ 7: x6 = stack(0) (int)
+ 6: x5 = int32[x6 + x1 * 4 + 0]
+ 5: x3 = x3 + x5 + 0 (int)
+ 4: x1 = x1 + 1 (int)
+ 3: if (x1 <s 3) goto 7
+ else goto 2
+ 2: x4 = x3
+ 1: return x4
+}
+\end{minted}
+ \caption{The 3AC code produced by \compcert{}.}\label{fig:accumulator_rtl}
\end{subfigure}
- \caption{Accumulator example using \compcert{} to translate from C to three address code (3AC). \JW{I would consider dropping `accumulator' here, as I found it slightly confusing.} \JW{Some of the line numbers on the right have been cropped.}}\label{fig:accumulator_c_rtl}
+ \caption{Short example using \compcert{} to translate from C to three address code (3AC).}\label{fig:accumulator_c_rtl}
\end{figure}
\subsection{Example C to Verilog translation}
@@ -113,13 +157,73 @@ This translation seems quite straightforward, however, proving that it is correc
\begin{figure}
\centering
- \begin{subfigure}[b]{0.5\linewidth}
- \inputminted[fontsize=\tiny]{systemverilog}{data/accumulator1.v}
- \caption{Accumulator HTL code.}\label{fig:accumulator_v_1}
- \end{subfigure}%
- \begin{subfigure}[b]{0.5\linewidth}
- \inputminted[fontsize=\tiny]{systemverilog}{data/accumulator2.v}
- \caption{Accumulator Verilog code.}\label{fig:accumulator_v_2}
+ \begin{subfigure}[b]{0.49\linewidth}
+\begin{minted}[fontsize=\tiny]{verilog}
+module main(reset, clk, finish, return_val);
+ reg [31:0] stack [2:0];
+ input [0:0] clk, reset;
+ output reg [31:0] return_val;
+ output reg [0:0] finish;
+ reg [31:0] reg_8, reg_4, state,
+ reg_6, reg_1, reg_9,
+ reg_5, reg_3, reg_7;
+ always @(posedge clk)
+ case (state)
+ 32'd16: reg_9 <= 32'd1;
+ 32'd15: stack[32'd0] <= reg_9;
+ 32'd14: reg_8 <= 32'd2;
+ 32'd13: stack[32'd1] <= reg_8;
+ 32'd12: reg_7 <= 32'd3;
+ 32'd11: stack[32'd2] <= reg_7;
+ 32'd10: reg_3 <= 32'd0;
+ 32'd9: ;
+ 32'd8: reg_1 <= 32'd0;
+ 32'd7: reg_6 <= 32'd0;
+ 32'd6: reg_5 <= stack[{{{reg_6 + 32'd0}
+ + {reg_1 * 32'd4}} / 32'd4}];
+ 32'd5: reg_3 <= {reg_3 + {reg_5 + 32'd0}};
+ 32'd4: reg_1 <= {reg_1 + 32'd1};
+ 32'd3: ;
+ 32'd2: reg_4 <= reg_3;
+ 32'd1: begin
+ finish = 1'd1;
+ return_val = reg_4;
+ end
+ default:;
+ endcase
+\end{minted}
+ \caption{Verilog always block describing the datapath of the module.}\label{fig:accumulator_v_1}
+ \end{subfigure}\hfill%
+ \begin{subfigure}[b]{0.49\linewidth}
+\begin{minted}[fontsize=\tiny]{verilog}
+ always @(posedge clk)
+ if ({reset == 1'd1})
+ state <= 32'd16;
+ else
+ case (state)
+ 32'd16: state <= 32'd15;
+ 32'd15: state <= 32'd14;
+ 32'd14: state <= 32'd13;
+ 32'd13: state <= 32'd12;
+ 32'd12: state <= 32'd11;
+ 32'd11: state <= 32'd10;
+ 32'd10: state <= 32'd9;
+ 32'd9: state <= 32'd8;
+ 32'd8: state <= 32'd7;
+ 32'd7: state <= 32'd6;
+ 32'd6: state <= 32'd5;
+ 32'd5: state <= 32'd4;
+ 32'd4: state <= 32'd3;
+ 32'd3: state <=
+ ({$signed(reg_1) < $signed(32'd3)}
+ ? 32'd7 : 32'd2);
+ 32'd2: state <= 32'd1;
+ 32'd1: ;
+ default:;
+ endcase
+endmodule
+\end{minted}
+ \caption{Verilog always block describing the control logic of the module.}\label{fig:accumulator_v_2}
\end{subfigure}
\caption{Accumulator example using \vericert{} to translate the 3AC to a state machine expressed in Verilog. \JW{Subfigure captions aren't correct.}}\label{fig:accumulator_v}
\end{figure}
diff --git a/archive/algorithm.tex b/archive/algorithm.tex
index 9db0262..fe4cb10 100644
--- a/archive/algorithm.tex
+++ b/archive/algorithm.tex
@@ -1,3 +1,7 @@
+It is therefore important to find the right intermediate language \JW{You already said `crucial to know where to branch off' so this feels repetitive} so that the HLS tool still benefits from many of the generic optimisations that \compcert{} performs, but does not receive the code transformations that are specific to CPU architectures.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
\section{Turning CompCert into an HLS tool}
%% Should maybe go in the introduction instead.