summaryrefslogtreecommitdiffstats
path: root/algorithm.tex
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2021-04-14 20:52:36 +0100
committerYann Herklotz <git@yannherklotz.com>2021-04-14 20:52:49 +0100
commita855aa4ff2211421db9d11b3270b69d2aa8b18f4 (patch)
tree832e7b79b6198c9975c75d7179595412ac90643d /algorithm.tex
parent826f861788376d5867a3fe4bec0ab8d8d54545db (diff)
downloadoopsla21_fvhls-a855aa4ff2211421db9d11b3270b69d2aa8b18f4.tar.gz
oopsla21_fvhls-a855aa4ff2211421db9d11b3270b69d2aa8b18f4.zip
Add timing diagrams
Diffstat (limited to 'algorithm.tex')
-rw-r--r--algorithm.tex44
1 files changed, 43 insertions, 1 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 3ea6778..83b879a 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -331,7 +331,49 @@ However, the memory model that \compcert{} uses for its intermediate languages i
\subsubsection{Implementation of RAM templates}
-Verilog arrays can be used in a variety of ways, however, these do not all produce optimal hardware designs. Synthesis tools provide code snippets that they know how to transform into various constructs, including proper RAMs in the final hardware. If, instead of using these templates, the array was accessed immediately in the data-path, then the synthesis tool would not be able to identify it as having the right properties for a RAM, and would instead implement the array using registers. This is extremely expensive, and for large memories this can easily blow up the area usage of the FPGA, and because of the longer wires that are needed, it would also affect the performance of the circuit. The initial translation from 3AC to HTL converts loads and stores to direct accesses to the memory, as this preserves the same behaviour without having to insert more registers and signalling. Another pass from HTL to HTL then performs the translation from this na\"ive representation to using a proper RAM template, adding the necessary data, address and enable signals to make the synthesis tool infer that block as a proper RAM. The RAM template that is used has some interesting additions to make the proof of this pass easier, which is further explained in Section~\ref{sec:verilog:memory}.
+Verilog arrays can be used in a variety of ways, however, these do not all produce optimal hardware designs. Synthesis tools provide code snippets that they know how to transform into various constructs, including proper RAMs in the final hardware. If, instead of using these templates, the array was accessed immediately in the data-path, then the synthesis tool would not be able to identify it as having the right properties for a RAM, and would instead implement the array using registers. This is extremely expensive, and for large memories this can easily blow up the area usage of the FPGA, and because of the longer wires that are needed, it would also affect the performance of the circuit. The initial translation from 3AC to HTL converts loads and stores to direct accesses to the memory, as this preserves the same behaviour without having to insert more registers and signalling. Another pass from HTL to HTL then performs the translation from this na\"ive representation to using a proper RAM template, adding the necessary data, address and enable signals to make the synthesis tool infer that block as a proper RAM.
+
+There are two interesting parts to the memory template that is used for the stack of the main function. Firstly, the memory updates are triggered on the negative edge of the clock, out of phase with the rest of the design, which is triggered on the positive edge of the clock. The main advantage is that instead of loads and stores taking three and two clock cycles respectively, they only take two and one clock cycle instead, greatly improving their performance. In addition to that, using the negative edge for the clock is supported by many synthesis tools, it therefore does not affect the maximum frequency of the final design. Secondly, the logic in the enable signal of the RAM (\texttt{en != u\_en}) is also atypical. To make the proof simpler, the goal is to create a RAM which disables itself after every use, so that firstly, the proof can assume that the RAM is disabled at the start and end of every clock cycle, and secondly so that only the state which contains the load and store need to be modified to ensure this property. Using a simple enable signal, it would not be possible to disable it in the RAM itself, as well as enabling it in the datapath, as this would result in a register being driven twice from two different locations. We can instead generate a second enable signal that is set by the user, and the original enable signal is then updated by the RAM to be equal to the value that the user set. This means that the RAM should be enabled whenever the two signals are different, and disabled otherwise.
+
+\begin{figure}
+ \centering
+ \begin{subfigure}[b]{0.48\linewidth}
+ \begin{tikztimingtable}[timing/d/background/.style={fill=white}]
+ \small clk & 2L 3{6C} \\
+ \small u\_en & 2D{u\_en} 18D{$\overline{\text{u\_en}}$}\\
+ \small en & 8D{u\_en} 12D{$\overline{\text{u\_en}}$}\\
+ \small addr & 2U 18D{3} \\
+ \small d\_out & 8U 12D{0xDEADBEEF} \\
+ \small r & 14U 6D{0xDEADBEEF} \\
+ \extracode
+ \node[help lines] at (2,2.25) {\tiny 1};
+ \node[help lines] at (8,2.25) {\tiny 2};
+ \node[help lines] at (14,2.25) {\tiny 3};
+ \begin{pgfonlayer}{background}
+ \vertlines[help lines]{2,8,14}
+ \end{pgfonlayer}
+ \end{tikztimingtable}
+ \caption{Timing diagram for loads. The \texttt{u\_en} signal is toggled which enables the RAM, then d\_out is set to be the value stored at the address in the RAM, which is finally assigned to the register \texttt{r}.}
+ \end{subfigure}\hfill%
+ \begin{subfigure}[b]{0.48\linewidth}
+ \begin{tikztimingtable}[timing/d/background/.style={fill=white}]
+ \small clk & 2L 2{7C} \\
+ \small u\_en & 2D{u\_en} 14D{$\overline{\text{u\_en}}$}\\
+ \small en & 9D{u\_en} 7D{$\overline{\text{u\_en}}$}\\
+ \small addr & 2U 14D{3} \\
+ \small d\_in & 2U 14D{0xDEADBEEF} \\
+ \small stack[addr] & 9U 7D{0xDEADBEEF} \\
+ \extracode
+ \node[help lines] at (2,2.25) {\tiny 1};
+ \node[help lines] at (9,2.25) {\tiny 2};
+ \begin{pgfonlayer}{background}
+ \vertlines[help lines]{2,9}
+ \end{pgfonlayer}
+ \end{tikztimingtable}
+ \caption{Timing diagram for stores. The \texttt{u\_en} signal is toggled to enable the RAM, together with the address \texttt{addr} and the data to store \texttt{d\_in}. On the negative edge the data is stored into the RAM.}
+ \end{subfigure}
+ \caption{Timing diagrams showing the execution of loads and stores over multiple clock cycles.}\label{fig:ram_load_store}
+\end{figure}
\subsubsection{Implementing the \texttt{Oshrximm} instruction}