summaryrefslogtreecommitdiffstats
path: root/algorithm.tex
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2021-08-11 12:02:46 +0200
committerYann Herklotz <git@yannherklotz.com>2021-08-11 12:02:46 +0200
commite2e7ba544d650440ab2371c9a103ff2c8c8f9d1e (patch)
tree1b828849dc5d33c81057e89b79fb3b19a669202a /algorithm.tex
parenta6f4597748b8af3e2dc571154afbc86534a67812 (diff)
downloadoopsla21_fvhls-e2e7ba544d650440ab2371c9a103ff2c8c8f9d1e.tar.gz
oopsla21_fvhls-e2e7ba544d650440ab2371c9a103ff2c8c8f9d1e.zip
Add diagram to Section 2.1
Diffstat (limited to 'algorithm.tex')
-rw-r--r--algorithm.tex59
1 files changed, 44 insertions, 15 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 8793e41..ba869e4 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -67,8 +67,7 @@ The .NET framework has been used as a basis for other HLS tools, such as Kiwi~\c
\draw[->,thick] (htl) -- (verilog);
\draw[->,thick] (htl.west) to [out=180,in=150] (4,-2.2) to [out=330,in=270] (htl.south);
\end{tikzpicture}%}
- \caption{\vericert{} as a Verilog back end to \compcert{}. For scale, the approximate lines of code (kloc) are shown for \vericert{}, as well as for the front end and back end of \compcert{}, including any comments and whitespace. \JW{Nice. I like the inclusion of the LOCs -- it makes the diagram do a bit more work.}
-}%
+ \caption{\vericert{} as a Verilog back end to \compcert{}. For scale, the approximate lines of code (kloc) are shown for \vericert{}, as well as for the front end and back end of \compcert{}, including any comments and whitespace.}%
\label{fig:rtlbranch}
\end{figure}
@@ -86,29 +85,59 @@ It has an unlimited number of pseudo-registers, and is represented as a control
\subsection{An introduction to Verilog}
-This section will introduce Verilog for readers that may not be familiar with the language, concentrating on the features that are used in the output of \vericert{}. Verilog is a hardware description language (HDL) and is used to design hardware ranging from complete CPUs that are eventually produced as an integrated circuit, to small application-specific accelerators that are placed on an FPGA. Verilog is a popular language because it allows for fine-grained control over the hardware, and also provides high-level constructs to simplify the development. For example, a net list with transfer delay information could be implemented in Verilog, as well as a more abstract state machine which would first have to translated to the net list level. \JW{I don't like that last sentence because I don't think it's pitched to the right audience. If you don't know Verilog, you won't understand `net list' or `transfer delay'. I'd cut it.}
+This section will introduce Verilog for readers that may not be familiar with the language, concentrating on the features that are used in the output of \vericert{}. Verilog is a hardware description language (HDL) and is used to design hardware ranging from complete CPUs that are eventually produced as an integrated circuit, to small application-specific accelerators that are placed on an FPGA. Verilog is a popular language because it allows for fine-grained control over the hardware, and also provides high-level constructs to simplify the development.
-Verilog behaves quite differently to standard software programming languages due to it having to express the parallel nature of hardware. The basic construct to achieve this is the always-block, which is a block \JW{Can we avoid repeating `block'? Like, can we say `collection' here? It just feels a bit inelegant to say `an always-block is a block...'.} of assignments that are executed every time some event occurs. In the case of \vericert{}, this event is either a positive \JW{(rising)} or a negative \JW{(falling)} clock edge. All always-blocks triggering on the same event are executed in parallel. Always-blocks can also express control-flow using if-statements and case-statements.
-
-A simple state machine can therefore be implemented as follows: \JW{This should be floated out into a figure. And might it be nice to put a little picture of the corresponding state machine in the gap on the right-hand side?}
+Verilog behaves quite differently to standard software programming languages due to it having to express the parallel nature of hardware. The basic construct to achieve this is the always-block, which is a collection of assignments that are executed every time some event occurs. In the case of \vericert{}, this event is either a positive (rising) or a negative (falling) clock edge. All always-blocks triggering on the same event are executed in parallel. Always-blocks can also express control-flow using if-statements and case-statements.
+\begin{figure}
+ \centering
+ \begin{subfigure}{0.55\linewidth}
\begin{minted}[linenos,xleftmargin=20pt]{verilog}
-module main(input state, input y, input clk, output z);
- reg x;
+module main(input rst, input y, input clk,
+ output reg z);
+ reg x, state;
always @(posedge clk)
case (state)
- 1'd0: begin x <= y; end
- 1'd1: begin x <= 1'd0; z <= x; end
+ 1'b0: x <= y;
+ 1'b1: begin x <= 1'b0; z <= x; end
endcase
always @(posedge clk)
- case (state)
- 1'd0: begin if (y) state <= 1'd1; else state <= 1'd0; end
- 1'd1: begin state <= 1'd0 end
+ if (rst) state <= 1'b0;
+ else case (state)
+ 1'b0: if (y) state <= 1'b1;
+ else state <= 1'b0;
+ 1'b1: state <= 1'b0;
endcase
endmodule
\end{minted}
+ \end{subfigure}\hfill%
+ \begin{subfigure}{0.45\linewidth}
+ \centering
+ \begin{tikzpicture}
+ \node[draw,circle,inner sep=8pt] (s0) at (0,0) {$S_{0}$};
+ \node[draw,circle,inner sep=8pt] (s1) at (1.5,-3) {$S_{1}$};
+ \node[draw,circle,inner sep=8pt] (s2) at (3,0) {$S_{2}$};
+ \node (s2s) at ($(s2.west) + (-0.3,1)$) {\texttt{\_0/1}};
+ \node (s2ss) at ($(s2.east) + (0.3,1)$) {\texttt{1\_/1}};
+ \draw[-{Latex[length=2mm,width=1.4mm]}] ($(s0.west) + (-0.3,1)$) to [out=0,in=120] (s0);
+ \draw[-{Latex[length=2mm,width=1.4mm]}] (s0)
+ to [out=-90,in=150] node[midway,left] {\texttt{01/\_}} (s1);
+ \draw[-{Latex[length=2mm,width=1.4mm]}] (s1)
+ to [out=80,in=220] node[midway,left] {\texttt{\_\_/1}} (s2);
+ \draw[-{Latex[length=2mm,width=1.4mm]}] (s2)
+ to [out=260,in=50] node[midway,right] {\texttt{01/1}} (s1);
+ \draw[-{Latex[length=2mm,width=1.4mm]}] (s2)
+ to [out=120,in=40] ($(s2.west) + (-0.3,0.7)$) to [out=220,in=170] (s2);
+ \draw[-{Latex[length=2mm,width=1.4mm]}] (s2)
+ to [out=60,in=130] ($(s2.east) + (0.3,0.7)$) to [out=310,in=10] (s2);
+ \end{tikzpicture}
+ \end{subfigure}
+ \caption{A simple state machine implemented in Verilog, with it's state machine representation on the right.}%
+ \label{fig:tutorial:state_machine}
+\end{figure}
-At every positive edge of the clock (\texttt{clk}), both of the always-blocks will trigger simultaneously. The first always-block controls the values of the internal state of the register \JW{Is it ok to simplify `the values of the internal state of the register' to `the values in the register'?} \texttt{x} and the output \texttt{z}, while the second always-block controls the next state the state machine should go to. When the input \texttt{state} is 0, \texttt{x} will be assigned to the input \texttt{y} using nonblocking assignment, denoted by \texttt{<=}. Nonblocking assignment assigns registers in parallel at the end of the clock cycle, rather than sequentially throughout the always-block. In the second always-block, the input \texttt{y} will be checked, and if it's high it will move on to the next state, otherwise it will stay in the current state. When \texttt{state} is 1, the first always-block will reset the value of \texttt{x} and then set \texttt{z} to the original value of \texttt{x}, since nonblocking assignment does not change its value until the end of the clock cycle. Finally, the last always-block will set the state to be 0 again.
+A simple state machine can therefore be implemented, it's Verilog implementation as well as a representation of the state machine can be seen in Figure~\ref{fig:tutorial:state_machine}.
+At every positive edge of the clock (\texttt{clk}), both of the always-blocks will trigger simultaneously. The first always-block controls the values in the register \texttt{x} and the output \texttt{z}, while the second always-block controls the next state the state machine should go to. When the \texttt{state} is 0, \texttt{x} will be assigned to the input \texttt{y} using nonblocking assignment, denoted by \texttt{<=}. Nonblocking assignment assigns registers in parallel at the end of the clock cycle, rather than sequentially throughout the always-block. In the second always-block, the input \texttt{y} will be checked, and if it's high it will move on to the next state, otherwise it will stay in the current state. When \texttt{state} is 1, the first always-block will reset the value of \texttt{x} and then set \texttt{z} to the original value of \texttt{x}, since nonblocking assignment does not change its value until the end of the clock cycle. Finally, the last always-block will set the state to be 0 again.
\begin{figure}
\centering
@@ -342,7 +371,7 @@ A high-level overview of the architecture and of the RAM interface can be seen i
Most 3AC instructions correspond to hardware constructs.
%Each 3AC instruction either corresponds to a hardware construct or does not have to be handled by the translation, such as function calls (because of inlining). \JW{Are function calls the only 3AC instruction that we ignore? (And I guess return statements too for the same reason.)}\YH{Actually, return instructions are translated (because you can return from main whenever), so call instructions (Icall, Ibuiltin and Itailcall) are the only functions that are not handled.}
% JW: Thanks; please check proposed new text.
-For example, line 2 in Figure~\ref{fig:accumulator_rtl} shows a 32-bit register \texttt{x5} being initialised to 3, after which the control flow moves execution to line 3. This initialisation is also encoded in the Verilog generated from HTL at state 8 in both the control logic and data-path always-blocks, shown at lines 33 and 16 respectively in Figure~\ref{fig:accumulator_v}. Simple operator instructions are translated in a similar way. For example, the add instruction is just translated to the built-in add operator, similarly for the multiply operator. All 32-bit instructions can be translated in this way, but some special instructions require extra care. One such is the \texttt{Oshrximm} instruction, which is discussed further in Section~\ref{sec:algorithm:optimisation:oshrximm}. Another is the \texttt{Oshldimm} instruction, which is a left rotate instruction that has no Verilog equivalent and therefore has to be implemented in terms of other operations and proven to be equivalent. \JW{I reverted the following back to my version because it seems clearer to me.}
+For example, line 2 in Figure~\ref{fig:accumulator_rtl} shows a 32-bit register \texttt{x5} being initialised to 3, after which the control flow moves execution to line 3. This initialisation is also encoded in the Verilog generated from HTL at state 8 in both the control logic and data-path always-blocks, shown at lines 33 and 16 respectively in Figure~\ref{fig:accumulator_v}. Simple operator instructions are translated in a similar way. For example, the add instruction is just translated to the built-in add operator, similarly for the multiply operator. All 32-bit instructions can be translated in this way, but some special instructions require extra care. One such is the \texttt{Oshrximm} instruction, which is discussed further in Section~\ref{sec:algorithm:optimisation:oshrximm}. Another is the \texttt{Oshldimm} instruction, which is a left rotate instruction that has no Verilog equivalent and therefore has to be implemented in terms of other operations and proven to be equivalent.
% In addition to any non-32-bit operations, the remaining
The only 32-bit instructions that we do not translate are those related to function calls (\texttt{Icall}, \texttt{Ibuiltin}, and \texttt{Itailcall}), because we enable inlining by default.