Shorten the Oshrximm section

author: Yann Herklotz <git@yannherklotz.com> 2021-08-13 13:10:41 +0200
committer: Yann Herklotz <git@yannherklotz.com> 2021-08-13 13:10:41 +0200
commit: 9d5ecc3f79ad3fb4a4186aef9c6b3430331128c3 (patch)
tree: 3a8037fb6974be0123d847fa8c29b26cfd059581 /algorithm.tex
parent: 211cd1d427c64f08e9ca7b74a3acca2c4ef0abbf (diff)
download: oopsla21_fvhls-9d5ecc3f79ad3fb4a4186aef9c6b3430331128c3.tar.gz
oopsla21_fvhls-9d5ecc3f79ad3fb4a4186aef9c6b3430331128c3.zip
1 files changed, 23 insertions, 22 deletions
diff --git a/algorithm.tex b/algorithm.tex
index bca7a61..130d403 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -92,7 +92,7 @@ Verilog behaves quite differently to standard software programming languages due
 \begin{figure}
   \centering
   \begin{subfigure}{0.55\linewidth}
-\begin{minted}[linenos,xleftmargin=20pt]{verilog}
+\begin{minted}[linenos,xleftmargin=20pt,fontsize=\footnotesize]{verilog}
 module main(input rst, input y, input clk,
             output reg z);
   reg x, state;
@@ -117,13 +117,13 @@ endmodule
       \node[draw,circle,inner sep=8pt] (s0) at (0,0) {$S_{0}$};
       \node[draw,circle,inner sep=8pt] (s1) at (1.5,-3) {$S_{1}$};
       \node[draw,circle,inner sep=8pt] (s2) at (3,0) {$S_{2}$};
-      \node (s2s) at ($(s2.west) + (-0.3,1)$) {\texttt{\_0/1}};
-      \node (s2ss) at ($(s2.east) + (0.3,1)$) {\texttt{1\_/1}};
+      \node (s2s) at ($(s2.west) + (-0.3,1)$) {\texttt{x0/1}};
+      \node (s2ss) at ($(s2.east) + (0.3,1)$) {\texttt{1x/1}};
       \draw[-{Latex[length=2mm,width=1.4mm]}] ($(s0.west) + (-0.3,1)$) to [out=0,in=120] (s0);
       \draw[-{Latex[length=2mm,width=1.4mm]}] (s0)
-           to [out=-90,in=150] node[midway,left] {\texttt{01/\_}} (s1);
+           to [out=-90,in=150] node[midway,left] {\texttt{01/x}} (s1);
       \draw[-{Latex[length=2mm,width=1.4mm]}] (s1)
-      to [out=80,in=220] node[midway,left] {\texttt{\_\_/1}} (s2);
+      to [out=80,in=220] node[midway,left] {\texttt{xx/1}} (s2);
       \draw[-{Latex[length=2mm,width=1.4mm]}] (s2)
       to [out=260,in=50] node[midway,right] {\texttt{01/1}} (s1);
       \draw[-{Latex[length=2mm,width=1.4mm]}] (s2)
@@ -132,7 +132,7 @@ endmodule
       to [out=60,in=130] ($(s2.east) + (0.3,0.7)$) to [out=310,in=10] (s2);
     \end{tikzpicture}
   \end{subfigure}
-  \caption{A simple state machine implemented in Verilog, with its diagrammatic representation on the right.}%
+  \caption{A simple state machine implemented in Verilog, with its diagrammatic representation on the right, where the x's stand for don't cares and each transition is labelled with the values for the \texttt{rst}, \texttt{y} and output \texttt{z}.}%
   \label{fig:tutorial:state_machine}
 \end{figure}
 
@@ -469,27 +469,28 @@ Many of the \compcert{} instructions map well to hardware, but \texttt{Oshrximm}
 \begin{equation*}
   \texttt{Oshrximm } x\ y = \text{round\_towards\_zero}\left(\frac{x}{2^{y}}\right)
 \end{equation*}
-(where $x, y \in \mathbb{Z}$, $0 \leq y < 31$, and $-2^{31} \leq x < 2^{31}$) and instantiating divider circuits in hardware is well known to cripple performance. Moreover, since \vericert{} requires the result of a divide operation to be ready within a single clock cycle, the divide circuit needs to be entirely combinational. This is inefficient in terms of area, but also in terms of latency, because it means that the maximum frequency of the hardware must be reduced dramatically so that the divide circuit has enough time to finish.
+(where $x, y \in \mathbb{Z}$, $0 \leq y < 31$, and $-2^{31} \leq x < 2^{31}$) and instantiating divider circuits in hardware is well known to cripple performance. Moreover, since \vericert{} requires the result of a divide operation to be ready within a single clock cycle, the divide circuit needs to be entirely combinational. This is inefficient in terms of area, but also in terms of latency, because it means that the maximum frequency of the hardware must be reduced dramatically so that the divide circuit has enough time to finish.  It should therefore be implemented using a sequence of shifts.
+
+\compcert{} eventually performs a translation from this representation into assembly code which uses shifts to implement the division, however, the specification of the instruction in 3AC itself still uses division instead of shifts, meaning this proof of the translation cannot be reused.  In \vericert{}, the equivalence of the representation in terms of divisions and shifts is proven over the integers and the specification, thereby making it simpler to prove the correctness of the Verilog implementation in terms of shifts.
+
+%\JW{I don't really understand the following paragraph.}\YH{So my intention with this was to make this section more meaningful, as one of the reviewers mentioned that because compcert already did this, this section is not needed.  But I wanted to explain that because we do the reasoning of equivalence between shifts and division at the Integer level, our proof is more general than the language specific proofs that \compcert{} has in it's back ends, as we fix the specification instead of the implementation.  I'll try and reword this.}\NR{I am not following this paragraph below too.}
+%\JW{I wonder, can I file this under `things that you've improved in CompCert generally as part of your efforts on Vericert''?}\YH{I think so, although I have not tried to adapt existing passes to use this.  But I do think that this would make the proofs much easier.}
 
 %\JP{Multi-cycle paths might be something worth exploring in future work: fairly error-prone/dangerous for hand-written code, but might be interesting in generated code.}\YH{Definitely is on the list for next things to look into, will make divide so much more efficient.}
 
 %These small optimisations were found to be the most error prone, and guaranteeing that the new representation is equivalent to representation used in the \compcert{} semantics is difficult without proving this for all possible inputs.
 
-One might hope that the synthesis tool consuming our generated Verilog would convert the division to an efficient shift operation, but this is unlikely to happen with signed division which requires more than a single shift. However, the observation can be made that signed division can be implemented using shifts:
-\begin{equation*}
-  \text{round\_towards\_zero}\left(\frac{x}{2^{y}}\right) =
-  \begin{cases}
-    x \gg y & \text{if } x \geq 0 \\
-    - ( - x \gg y ) & \text{otherwise}.
-  \end{cases}\\
-\end{equation*}
-where $\gg$ stands for a logical right shift. %Once this equivalence about the shifts and division operator is proven correct, it can be used to implement the \texttt{Oshrximm} using the efficient shift version instead of how the \compcert{} semantics described it.
-When proving this equivalence, we actually found a bug in our original implementation that was due to the fact that a na\"{i}ve shift implementation which rounds towards $-\infty$.
-%\NR{What do you mean byy naive shift here?}\YH{Just a normal shift, instead of being adapted for division.}
-
-%\JW{I don't really understand the following paragraph.}\YH{So my intention with this was to make this section more meaningful, as one of the reviewers mentioned that because compcert already did this, this section is not needed.  But I wanted to explain that because we do the reasoning of equivalence between shifts and division at the Integer level, our proof is more general than the language specific proofs that \compcert{} has in it's back ends, as we fix the specification instead of the implementation.  I'll try and reword this.}\NR{I am not following this paragraph below too.} 
-\compcert{} eventually performs a translation from this representation into assembly code which uses shifts to implement the division, however, the specification of the instruction itself still uses division instead of shifts, meaning the proof of the translation cannot be reused.  In \vericert{}, the equivalence of the representation in terms of divisions and shifts is proven over the integers and the specification, thereby making it simpler to prove the correctness of the Verilog implementation in terms of shifts.
-%\JW{I wonder, can I file this under `things that you've improved in CompCert generally as part of your efforts on Vericert''?}\YH{I think so, although I have not tried to adapt existing passes to use this.  But I do think that this would make the proofs much easier.}
+%One might hope that the synthesis tool consuming our generated Verilog would convert the division to an efficient shift operation, but this is unlikely to happen with signed division which requires more than a single shift. However, the observation can be made that signed division can be implemented using shifts:
+%\begin{equation*}
+%  \text{round\_towards\_zero}\left(\frac{x}{2^{y}}\right) =
+%  \begin{cases}
+%    x \gg y & \text{if } x \geq 0 \\
+%    - ( - x \gg y ) & \text{otherwise}.
+%  \end{cases}\\
+%\end{equation*}
+%where $\gg$ stands for a logical right shift. %Once this equivalence about the shifts and division operator is proven correct, it can be used to implement the \texttt{Oshrximm} using the efficient shift version instead of how the \compcert{} semantics described it.
+%When proving this equivalence, we actually found a bug in our original implementation that was due to the fact that a na\"{i}ve shift implementation which rounds towards $-\infty$.
+%%\NR{What do you mean byy naive shift here?}\YH{Just a normal shift, instead of being adapted for division.}
 
 %The \compcert{} semantics for the \texttt{Oshrximm} instruction expresses its operation exactly as shown in the equation above, even though in hardware the computation that would be performed would be different.  In \vericert{}, if the same operation would be implemented using Verilog operators, it is not guaranteed to be optimised correctly by the synthesis tools that convert the Verilog into a circuit.  To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the divide representation used in the \compcert{} semantics.  While conducting the proof, we discovered quite a few bugs in our initial implementation of optimisations, which rounded to $-\infty$ instead of 0.
author	Yann Herklotz <git@yannherklotz.com>	2021-08-13 13:10:41 +0200
committer	Yann Herklotz <git@yannherklotz.com>	2021-08-13 13:10:41 +0200
commit	9d5ecc3f79ad3fb4a4186aef9c6b3430331128c3 (patch)
tree	3a8037fb6974be0123d847fa8c29b26cfd059581 /algorithm.tex
parent	211cd1d427c64f08e9ca7b74a3acca2c4ef0abbf (diff)
download	oopsla21_fvhls-9d5ecc3f79ad3fb4a4186aef9c6b3430331128c3.tar.gz oopsla21_fvhls-9d5ecc3f79ad3fb4a4186aef9c6b3430331128c3.zip