summaryrefslogtreecommitdiffstats
path: root/algorithm.tex
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2021-08-05 11:12:51 +0200
committerYann Herklotz <git@yannherklotz.com>2021-08-05 11:12:51 +0200
commitcae2909811563106d5469c2825f7e001be2b1325 (patch)
treee7e5a4e682661f6b6137ac13719c051debe555bb /algorithm.tex
parent915879bab3498b817f26a5389898330c12b98417 (diff)
downloadoopsla21_fvhls-cae2909811563106d5469c2825f7e001be2b1325.tar.gz
oopsla21_fvhls-cae2909811563106d5469c2825f7e001be2b1325.zip
Address most notes
Diffstat (limited to 'algorithm.tex')
-rw-r--r--algorithm.tex7
1 files changed, 3 insertions, 4 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 737031b..34358cf 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -1,5 +1,4 @@
-\section{Designing a verified HLS tool}
-\label{sec:design}
+\section{Designing a verified HLS tool}\label{sec:design}
This section describes the main architecture of the HLS tool, and the way in which the Verilog back end was added to \compcert{}. This section also covers an example of converting a simple C program into hardware, expressed in the Verilog language.
@@ -397,7 +396,7 @@ Secondly, the logic in the enable signal of the RAM (\texttt{en != u\_en}) is al
\caption{Timing diagrams showing the execution of loads and stores over multiple clock cycles.}\label{fig:ram_load_store}
\end{figure}
-\JW{The following paragraph could probably be cut, as the same explanation is already in the Figure 4 caption, and replaced with something like ``Figure~\ref{fig:ram_load_store} gives an example of how the RAM interface behaves when values are loaded and stored.''} Figure~\ref{fig:ram_load} shows an example of how the waveforms in the RAM in Figure~\ref{fig:accumulator_v} behave when a value is loaded. To initiate a load, the data-path enable signal \texttt{u\_en} flag is toggled, the address \texttt{addr} is set and the write enable \texttt{wr\_en} is set to low. This all happens at the positive edge of the clock, at time slice 1. Then, on the next negative edge of the clock, at time slice 2, the \texttt{u\_en} is now different from the RAM enable \texttt{en}, so the RAM is enabled. A load is then performed by assigning the \texttt{d\_out} register to the value stored at the address in the RAM and the \texttt{en} is set to the same value as \texttt{u\_en} to disable the RAM again. Finally, on the next positive edge of the clock, the value in \texttt{d\_out} is assigned to the destination register \texttt{r}. An example of a store is shown in Figure~\ref{fig:ram_store}. The \texttt{d\_in} register is assigned the value to be stored. The store is then performed on the negative edge of the clock and is therefore complete by the next positive edge.
+\JW{The following paragraph could probably be cut, as the same explanation is already in the Figure 4 caption, and replaced with something like ``Figure~\ref{fig:ram_load_store} gives an example of how the RAM interface behaves when values are loaded and stored.''}\YH{Ah ok, yes sure, I just had it there to explain the figure in case some readers are unfamiliar with timing diagrams, but it's true that' it's already in the caption.} Figure~\ref{fig:ram_load} shows an example of how the waveforms in the RAM in Figure~\ref{fig:accumulator_v} behave when a value is loaded. To initiate a load, the data-path enable signal \texttt{u\_en} flag is toggled, the address \texttt{addr} is set and the write enable \texttt{wr\_en} is set to low. This all happens at the positive edge of the clock, at time slice 1. Then, on the next negative edge of the clock, at time slice 2, the \texttt{u\_en} is now different from the RAM enable \texttt{en}, so the RAM is enabled. A load is then performed by assigning the \texttt{d\_out} register to the value stored at the address in the RAM and the \texttt{en} is set to the same value as \texttt{u\_en} to disable the RAM again. Finally, on the next positive edge of the clock, the value in \texttt{d\_out} is assigned to the destination register \texttt{r}. An example of a store is shown in Figure~\ref{fig:ram_store}. The \texttt{d\_in} register is assigned the value to be stored. The store is then performed on the negative edge of the clock and is therefore complete by the next positive edge.
\subsubsection{Implementing the \texttt{Oshrximm} instruction}\label{sec:algorithm:optimisation:oshrximm}
@@ -424,7 +423,7 @@ One might hope that the synthesis tool consuming our generated Verilog would con
where $\gg$ stands for a logical right shift. %Once this equivalence about the shifts and division operator is proven correct, it can be used to implement the \texttt{Oshrximm} using the efficient shift version instead of how the \compcert{} semantics described it.
When proving this equivalence, we actually found a bug in our original implementation that was due to the fact that a na\"{i}ve shift rounds towards $-\infty$.
-\JW{I don't really understand the following paragraph.} \compcert{} eventually performs a translation from this representation into assembly code which uses shifts to implement the division, however, the specification of the instruction itself still uses division instead of shifts, meaning the proof of the translation cannot be reused. In \vericert{}, the equivalence of the representation in terms of divisions and shifts is proven over the integers and the specification, thereby making it simpler to prove the correctness of the Verilog implementation in terms of shifts.
+\JW{I don't really understand the following paragraph.}\YH{So my intention with this was to make this section more meaningful, as one of the reviewers mentioned that because compcert already did this, this section is not needed. But I wanted to explain that because we do the reasoning of equivalence between shifts and division at the Integer level, our proof is more general than the language specific proofs that \compcert{} has in it's back ends, as we fix the specification instead of the implementation. I'll try and reword this.} \compcert{} eventually performs a translation from this representation into assembly code which uses shifts to implement the division, however, the specification of the instruction itself still uses division instead of shifts, meaning the proof of the translation cannot be reused. In \vericert{}, the equivalence of the representation in terms of divisions and shifts is proven over the integers and the specification, thereby making it simpler to prove the correctness of the Verilog implementation in terms of shifts.
%The \compcert{} semantics for the \texttt{Oshrximm} instruction expresses its operation exactly as shown in the equation above, even though in hardware the computation that would be performed would be different. In \vericert{}, if the same operation would be implemented using Verilog operators, it is not guaranteed to be optimised correctly by the synthesis tools that convert the Verilog into a circuit. To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the divide representation used in the \compcert{} semantics. While conducting the proof, we discovered quite a few bugs in our initial implementation of optimisations, which rounded to $-\infty$ instead of 0.