summaryrefslogtreecommitdiffstats
path: root/algorithm.tex
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2021-07-25 23:41:13 +0200
committerYann Herklotz <git@yannherklotz.com>2021-07-25 23:41:13 +0200
commit3826f25870abb2725c5c0552ff0ebddd62803ebc (patch)
tree5324fe69492d2b77d3ddb679a3dc83dd46d72b53 /algorithm.tex
parent5ea6be2ade46c7150d33e9fb0c32046be74abb43 (diff)
downloadoopsla21_fvhls-3826f25870abb2725c5c0552ff0ebddd62803ebc.tar.gz
oopsla21_fvhls-3826f25870abb2725c5c0552ff0ebddd62803ebc.zip
Add explanation of scheduling implementation
Diffstat (limited to 'algorithm.tex')
-rw-r--r--algorithm.tex12
1 files changed, 12 insertions, 0 deletions
diff --git a/algorithm.tex b/algorithm.tex
index c4c91a4..7570ed7 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -417,7 +417,19 @@ When proving this equivalence, we actually found a bug in our original implement
%The \compcert{} semantics for the \texttt{Oshrximm} instruction expresses its operation exactly as shown in the equation above, even though in hardware the computation that would be performed would be different. In \vericert{}, if the same operation would be implemented using Verilog operators, it is not guaranteed to be optimised correctly by the synthesis tools that convert the Verilog into a circuit. To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the divide representation used in the \compcert{} semantics. While conducting the proof, we discovered quite a few bugs in our initial implementation of optimisations, which rounded to $-\infty$ instead of 0.
+\subsection{Limitations}
+There are various limitations in \vericert{} compared to other HLS tools due to the fact that the main focus was to formally verify the translation from 3AC to Verilog.
+
+\paragraph{Parallel execution of instructions}
+
+The main limitation of \vericert{} is that it does not perform scheduling of instructions into the same state, meaning that instructions cannot be executed in parallel. However, the design of the intermediate languages in \vericert{} take this optimisation into account and are designed to support scheduling. HTL is already an intermediate language which represents an FSMD which can have any number of statements in each state, and the semantics of HTL and Verilog allow for parallel assignments to registers. To simplify the proof of the scheduling algorithm, and to minimise the changes necessary for the current translation from 3AC to HTL, a new language must be introduced, called 3ACPar, which would be equivalent to 3AC but instead of consisting of a map from program counter to instruction, it would consist of a map from program counter to list of instructions, which can all be executed in parallel. A new proof for the scheduling algorithm would have to be written for the translation from 3AC to 3ACPar, for which a verified translation validation approach might be appropriate, however, the translation form 3ACPar to HTL would not change conceptually, except for the fact that multiple instructions can now be translated into the same state. This small difference means that most of the proof can be reused without any changes, as the translation of instructions was the main body of the proof and did not change.
+
+In a similar vein, the introduction of pipelined operators, especially for division, can alleviate many of the timing issues shown in the PolyBench/C benchmarks with divisions included in Section~\ref{sec:evaluation}. These operators can execute different stages of the operation in parallel, and therefore run long operations in parallel while sharing the same hardware. In HTL, operations like this can be represented in a similar fashion to the load and store instructions to the RAM, by using wires to communicate with an abstract computation block modelled in HTL and later replaced by a hardware implementation. However, 3ACPar would have to be modified to also describe such instructions so that these can be placed optimally using the external scheduling algorithm.
+
+\paragraph{Limitations with I/O}
+
+\vericert{} is currently limited in terms of I/O as well, because the main function cannot accept any arguments for the Clight program to be well-formed. In addition to that, external function calls that produce traces have also not been implemented yet, however, these could enable the C program to read and write values to a bus that is shared by various other components in the hardware design.
%%% Local Variables:
%%% mode: latex