author: John Wickerson <j.wickerson@imperial.ac.uk> 2021-08-11 14:33:11 +0000
committer: node <node@git-bridge-prod-0> 2021-08-11 14:56:13 +0000
commit: 221aa79714add6689aaa64522b6d6d8b0d2bea46 (patch)
tree: 7ffb1d3b18c3581221368129368c74cd66a5dd8f /algorithm.tex
parent: a8d7c175c72b9b6d07a2ce94fcbe16754cdf6857 (diff)
Update on Overleaf.
Diffstat (limited to 'algorithm.tex')
-rw-r--r-- algorithm.tex | 12
1 files changed, 6 insertions, 6 deletions
diff --git a/algorithm.tex b/algorithm.tex
index ba869e4..ba00bff 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -85,7 +85,7 @@ It has an unlimited number of pseudo-registers, and is represented as a control
\subsection{An introduction to Verilog}
-This section will introduce Verilog for readers that may not be familiar with the language, concentrating on the features that are used in the output of \vericert{}. Verilog is a hardware description language (HDL) and is used to design hardware ranging from complete CPUs that are eventually produced as an integrated circuit, to small application-specific accelerators that are placed on an FPGA. Verilog is a popular language because it allows for fine-grained control over the hardware, and also provides high-level constructs to simplify the development.
+This section will introduce Verilog for readers who may not be familiar with the language, concentrating on the features that are used in the output of \vericert{}. Verilog is a hardware description language (HDL) and is used to design hardware ranging from complete CPUs that are eventually produced as an integrated circuit, to small application-specific accelerators that are placed on an FPGA. Verilog is a popular language because it allows for fine-grained control over the hardware, and also provides high-level constructs to simplify the development.
Verilog behaves quite differently to standard software programming languages due to it having to express the parallel nature of hardware. The basic construct to achieve this is the always-block, which is a collection of assignments that are executed every time some event occurs. In the case of \vericert{}, this event is either a positive (rising) or a negative (falling) clock edge. All always-blocks triggering on the same event are executed in parallel. Always-blocks can also express control-flow using if-statements and case-statements.
@@ -132,11 +132,11 @@ endmodule
to [out=60,in=130] ($(s2.east) + (0.3,0.7)$) to [out=310,in=10] (s2);
\end{tikzpicture}
\end{subfigure}
- \caption{A simple state machine implemented in Verilog, with it's state machine representation on the right.}%
+ \caption{A simple state machine implemented in Verilog, with its diagrammatic representation on the right.}%
\label{fig:tutorial:state_machine}
\end{figure}
-A simple state machine can therefore be implemented, it's Verilog implementation as well as a representation of the state machine can be seen in Figure~\ref{fig:tutorial:state_machine}.
+A simple state machine can be implemented as shown in Figure~\ref{fig:tutorial:state_machine}.
At every positive edge of the clock (\texttt{clk}), both of the always-blocks will trigger simultaneously. The first always-block controls the values in the register \texttt{x} and the output \texttt{z}, while the second always-block controls the next state the state machine should go to. When the \texttt{state} is 0, \texttt{x} will be assigned to the input \texttt{y} using nonblocking assignment, denoted by \texttt{<=}. Nonblocking assignment assigns registers in parallel at the end of the clock cycle, rather than sequentially throughout the always-block. In the second always-block, the input \texttt{y} will be checked, and if it's high it will move on to the next state, otherwise it will stay in the current state. When \texttt{state} is 1, the first always-block will reset the value of \texttt{x} and then set \texttt{z} to the original value of \texttt{x}, since nonblocking assignment does not change its value until the end of the clock cycle. Finally, the last always-block will set the state to be 0 again.
\begin{figure}
@@ -399,8 +399,7 @@ One big difference between C and Verilog is how memory is represented. Although
However, the memory model that \compcert{} uses for its intermediate languages is byte-addre\-ssa\-ble~\cite{blazy05_formal_verif_memor_model_c}. If a byte-addressable memory was used in the target hardware, which is closer to \compcert{}'s memory model, then a load and store would instead take four clock cycles, because a RAM can only perform one read and write per clock cycle. It therefore has to be proven that the byte-addressable memory behaves in the same way as the word-addressable memory in hardware. Any modifications of the bytes in the \compcert{} memory model also have to be shown to modify the word-addressable memory in the same way. Since only integer loads and stores are currently supported in \vericert{}, it follows that the addresses given to the loads and stores will be multiples of four. If that is the case, then the translation from byte-addressed memory to word-addressed memory can be done by dividing the address by four.
\subsubsection{Implementation of RAM interface}
-The simplest way to implement loads and stores in \vericert{} would be to access the Verilog array directly from within the data-path (i.e., inside the always-block on lines 16--32 of Figure~\ref{fig:accumulator_v}). This would be correct, but when a Verilog array is accessed at several program points, the synthesis tool is unlikely to detect that it can be implemented as a RAM block, and will resort to using lots of registers instead, ruining the circuit's area and performance. To avert this, we arrange that the data-path does not access memory directly, but simply sets the address it wishes to access and then toggles the \texttt{u\_en} flag. This activates the RAM interface (lines 9--15 of Figure~\ref{fig:accumulator_v}) on the next falling clock edge, which performs the requested load or store. By factoring all the memory accesses out into a separate interface like this, we ensure that the underlying array is only accessed from a single program point in the Verilog code, and thus ensure that the synthesis tool will correctly infer a RAM block.
-Interestingly, the Verilog syntax for the RAM interface is quite strict, as the synthesis tool will pattern-match on it and only work for a predefined set of interfaces. \NR{Bring forward this sentence to help with flow.}
+The simplest way to implement loads and stores in \vericert{} would be to access the Verilog array directly from within the data-path (i.e., inside the always-block on lines 16--32 of Figure~\ref{fig:accumulator_v}). This would be correct, but when a Verilog array is accessed at several program points, the synthesis tool is unlikely to detect that it can be implemented as a RAM block, and will resort to using lots of registers instead, ruining the circuit's area and performance. To avert this, we arrange that the data-path does not access memory directly, but simply sets the address it wishes to access and then toggles the \texttt{u\_en} flag. This activates the RAM interface (lines 9--15 of Figure~\ref{fig:accumulator_v}) on the next falling clock edge, which performs the requested load or store. By factoring all the memory accesses out into a separate interface like this, we ensure that the underlying array is only accessed from a single program point in the Verilog code, and thus ensure that the synthesis tool will correctly infer a RAM block.\footnote{Interestingly, the Verilog code shown for the RAM interface must not be modified, because the synthesis tool will only generate a RAM when the code matches a small set of specific patterns. \JW{I tweaked this slightly in an attempt to clarify; please check.}} %\NR{Bring forward this sentence to help with flow.}
%\JW{I think the following sentence could be cut as we've said this kind of thing a couple of times already.} Without the interface, the array would be implemented using registers, which would increase the size of the hardware considerably.
@@ -486,7 +485,8 @@ One might hope that the synthesis tool consuming our generated Verilog would con
where $\gg$ stands for a logical right shift. %Once this equivalence about the shifts and division operator is proven correct, it can be used to implement the \texttt{Oshrximm} using the efficient shift version instead of how the \compcert{} semantics described it.
When proving this equivalence, we actually found a bug in our original implementation that was due to the fact that a na\"{i}ve shift rounds towards $-\infty$. \NR{What do you mean byy naive shift here?}
-\JW{I don't really understand the following paragraph.}\YH{So my intention with this was to make this section more meaningful, as one of the reviewers mentioned that because compcert already did this, this section is not needed. But I wanted to explain that because we do the reasoning of equivalence between shifts and division at the Integer level, our proof is more general than the language specific proofs that \compcert{} has in it's back ends, as we fix the specification instead of the implementation. I'll try and reword this.}\NR{I am not following this paragraph below too.} \compcert{} eventually performs a translation from this representation into assembly code which uses shifts to implement the division, however, the specification of the instruction itself still uses division instead of shifts, meaning the proof of the translation cannot be reused. In \vericert{}, the equivalence of the representation in terms of divisions and shifts is proven over the integers and the specification, thereby making it simpler to prove the correctness of the Verilog implementation in terms of shifts.
+%\JW{I don't really understand the following paragraph.}\YH{So my intention with this was to make this section more meaningful, as one of the reviewers mentioned that because compcert already did this, this section is not needed. But I wanted to explain that because we do the reasoning of equivalence between shifts and division at the Integer level, our proof is more general than the language specific proofs that \compcert{} has in it's back ends, as we fix the specification instead of the implementation. I'll try and reword this.}\NR{I am not following this paragraph below too.}
+\compcert{} eventually performs a translation from this representation into assembly code which uses shifts to implement the division, however, the specification of the instruction itself still uses division instead of shifts, meaning the proof of the translation cannot be reused. In \vericert{}, the equivalence of the representation in terms of divisions and shifts is proven over the integers and the specification, thereby making it simpler to prove the correctness of the Verilog implementation in terms of shifts. \JW{I wonder, can I file this under `things that you've improved in CompCert generally as part of your efforts on Vericert''?}
%The \compcert{} semantics for the \texttt{Oshrximm} instruction expresses its operation exactly as shown in the equation above, even though in hardware the computation that would be performed would be different. In \vericert{}, if the same operation would be implemented using Verilog operators, it is not guaranteed to be optimised correctly by the synthesis tools that convert the Verilog into a circuit. To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the divide representation used in the \compcert{} semantics. While conducting the proof, we discovered quite a few bugs in our initial implementation of optimisations, which rounded to $-\infty$ instead of 0.