Finish most of algorithm section

author: Yann Herklotz <git@yannherklotz.com> 2020-11-15 19:15:27 +0000
committer: Yann Herklotz <git@yannherklotz.com> 2020-11-15 19:15:27 +0000
commit: 2ec78625f252074260127d365581ac886548cff4 (patch)
tree: bd2cd7c08af9ed1207950c494a68f2eff6a76417 /proof.tex
parent: c258a14a48bf5cc7ffe22f26edf03f55a92573d2 (diff)
download: oopsla21_fvhls-2ec78625f252074260127d365581ac886548cff4.tar.gz
oopsla21_fvhls-2ec78625f252074260127d365581ac886548cff4.zip
1 files changed, 1 insertions, 22 deletions
diff --git a/proof.tex b/proof.tex
index 355094c..a01b688 100644
--- a/proof.tex
+++ b/proof.tex
@@ -113,28 +113,7 @@ The Verilog semantics that are used are deterministic, as the order of operation
 
 \JW{Would be nice to include a few high-level metrics here. How many person-years of effort was the proof (very roughly)? How many lines of Coq? How many files, how many lemmas? How long does it take for the Coq proof to execute?}
 
-\subsection{Proving \texttt{Oshrximm} correct}
-
-% Mention that this optimisation is not performed sometimes (clang -03).
-
-Vericert performs some optimisations at the level of the instructions that are generated, so that the hardware performs the instructions as quickly as possible and so that the maximum frequency at which the hardware could run is increased.  One of the main constructs that cripple performance of the generated hardware is the instantiation of divider circuits in the hardware.  In the case of Vericert, it requires the result of the divide operation to be ready in the same clock cycle, meaning the divide circuit needs to be implemented fully combinationally.  This is inefficient in terms of hardware size, but also in terms of latency, because it means that the maximum frequency of the hardware needs to be reduced dramatically so that the divide circuit has enough time to finish.
-
-These small optimisations were found to be the most error prone, and guaranteeing that the new representation is equivalent to representation used in the \compcert{} semantics is difficult without proving this for all possible inputs.
-
-Dividing by a constant can often be optimised to a more efficient operation, especially if the denominator is a factor of two.  In \compcert{}, the \texttt{Oshrximm} instruction does exactly this, and a normal signed divide operation can be replaced by the \texttt{Oshrximm} instruction, performing the following operation, which is transformed to our optimal representation on the right, where $\div$ stands for integer signed division:
-
-\begin{gather*}
-\forall x, y \in \mathbb{Z},\ \ 0 \leq y < 31,\ \ -2^{31} \leq x < 2^{31},\\
-x \div 2^y =
-\begin{cases}
-  \left\lfloor \frac{x}{2^y} \right\rfloor = x >> y,& \text{if } x \geq 0\\
-  \left\lceil \frac{x}{2^y} \right\rceil = - \left\lfloor \frac{-x}{2^y} \right\rfloor = - ( - x >> y ),& \text{otherwise}
-\end{cases}\\
-\end{gather*}
-
-The \compcert{} semantics for the \texttt{Oshrximm} instruction express it's operation exactly as shown in the equation above, even though in hardware the computation that would be performed would be different.  In \vericert{}, if the same operation would be implemented using Verilog operators, it is not guaranteed to be optimised correctly by the synthesis tools that convert the Verilog into a circuit.  To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the divide representation used in the \compcert{} semantics.  This proof discovered quite a few bugs in our initial implementation of optimisations, which rounded to $-\infty$ instead of 0.
-
-%\subsection{Coq Mechanisation}
+\subsection{Coq Mechanisation}
 
 %%% Local Variables:
 %%% mode: latex
author	Yann Herklotz <git@yannherklotz.com>	2020-11-15 19:15:27 +0000
committer	Yann Herklotz <git@yannherklotz.com>	2020-11-15 19:15:27 +0000
commit	2ec78625f252074260127d365581ac886548cff4 (patch)
tree	bd2cd7c08af9ed1207950c494a68f2eff6a76417 /proof.tex
parent	c258a14a48bf5cc7ffe22f26edf03f55a92573d2 (diff)
download	oopsla21_fvhls-2ec78625f252074260127d365581ac886548cff4.tar.gz oopsla21_fvhls-2ec78625f252074260127d365581ac886548cff4.zip