summaryrefslogtreecommitdiffstats
path: root/algorithm.tex
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2021-08-03 11:03:03 +0200
committerYann Herklotz <git@yannherklotz.com>2021-08-03 11:03:03 +0200
commit7b016243f6822258ede8ec5a1970b4753324f7b2 (patch)
tree74a2b04027483eacee1c024c1012305cc9b57b65 /algorithm.tex
parentd4d8ac89c2b2efa4c291d00010fd585922ce28e2 (diff)
downloadoopsla21_fvhls-7b016243f6822258ede8ec5a1970b4753324f7b2.tar.gz
oopsla21_fvhls-7b016243f6822258ede8ec5a1970b4753324f7b2.zip
Add limitations into it's own section
Diffstat (limited to 'algorithm.tex')
-rw-r--r--algorithm.tex25
1 files changed, 0 insertions, 25 deletions
diff --git a/algorithm.tex b/algorithm.tex
index b974b97..0df5186 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -416,31 +416,6 @@ When proving this equivalence, we actually found a bug in our original implement
%The \compcert{} semantics for the \texttt{Oshrximm} instruction expresses its operation exactly as shown in the equation above, even though in hardware the computation that would be performed would be different. In \vericert{}, if the same operation would be implemented using Verilog operators, it is not guaranteed to be optimised correctly by the synthesis tools that convert the Verilog into a circuit. To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the divide representation used in the \compcert{} semantics. While conducting the proof, we discovered quite a few bugs in our initial implementation of optimisations, which rounded to $-\infty$ instead of 0.
-\subsection{Limitations}
-
-There are various limitations in \vericert{} compared to other HLS tools due to the fact that the main focus was to formally verify the translation from 3AC to Verilog.
-
-\paragraph{Lack of instruction-level parallelism}
-
-The main limitation of \vericert{} is that it does not perform scheduling of instructions into the same state, meaning that instructions cannot be executed in parallel. However, the design of the intermediate languages in \vericert{} take this optimisation into account and are designed to support scheduling. HTL is already an intermediate language which represents an FSMD which can have any number of statements in each state, and the semantics of HTL and Verilog allow for parallel assignments to registers. To simplify the proof of the scheduling algorithm, and to minimise the changes necessary for the current translation from 3AC to HTL, a new language must be introduced, called 3ACPar, \JW{I think this is getting too far into the details of how to add scheduling, and risks stealing the thunder of our next submission. I think the basic point to make is: Vericert is currently all sequential, that's pretty bad, but it's not a fundamental limitation. So it's good to point out that HTL is a very general language -- anything that can be put into an always-block in Verilog can be put into an HTL state -- but I wouldn't go into details about new languages like 3ACPar here.} which would be equivalent to 3AC but instead of consisting of a map from program counter to instruction, it would consist of a map from program counter to list of instructions, which can all be executed in parallel. A new proof for the scheduling algorithm would have to be written for the translation from 3AC to 3ACPar, for which a verified translation validation approach might be appropriate, however, the translation form 3ACPar to HTL would not change conceptually, except for the fact that multiple instructions can now be translated into the same state. This small difference means that most of the proof can be reused without any changes, as the translation of instructions was the main body of the proof and did not change.
-
-\paragraph{Lack of pipelined division}
-In a similar vein, the introduction of pipelined operators, especially for division, can alleviate many of the timing issues shown in the \polybench{} benchmarks with divisions included in Section~\ref{sec:evaluation}. These operators can execute different stages of the operation in parallel, and therefore run long operations in parallel while sharing the same hardware. In HTL, operations like this can be represented in a similar fashion to the load and store instructions to the RAM, by using wires to communicate with an abstract computation block modelled in HTL and later replaced by a hardware implementation. However, 3ACPar would have to be modified to also describe such instructions so that these can be placed optimally using the external scheduling algorithm.
-
-\paragraph{Limitations with I/O}
-
-\vericert{} is currently limited in terms of I/O as well, because the main function cannot accept any arguments for the Clight program to be well-formed. \JW{Is it worth mentioning that it will compile programs with main(a,b), just as CompCert does, but just not prove them correct? This might be a nice `little-known fact' about CompCert?} In addition to that, external function calls that produce traces have also not been implemented yet, however, these could enable the C program to read and write values to a bus that is shared by various other components in the hardware design.
-
-\paragraph{Lack of support for global variables}
-In \compcert{}, each global variable is stored in its own memory. A generalisation of the stack translation into a RAM block could therefore be performed to translate global variables in the same manner. This would require a slight generalisation of pointers so that they store provenance information to ensure that each pointer accesses the right RAM. It would also be necessary to generalise the RAM interface so that it decodes the provenance information and indexes the right array.
-
-\paragraph{Other language restrictions}
-C and Verilog handle signedness quite differently. By default, all operators and registers in Verilog (and HTL) are unsigned, so to force an operation to handle the bits as signed, both operators have to be forced to be signed. Moreover, Verilog implicitly resizes expressions to the largest needed size by default, which can affect the result of the computation. This feature is not supported by the Verilog semantics we adopted, so to match the semantics to the behaviour of the simulator and synthesis tool, braces are placed around all expressions to inhibit implicit resizing. Instead, explicit resizing is used in the semantics, and operations can only be performed on two registers that have the same size.
-
-In addition to that, equality checks between \emph{unsigned} variables are actually not supported, because this requires supporting the comparison of pointers, which should only be performed between pointers with the same provenance. In \vericert{} there is currently no way to determine the provenance of a pointer, and it therefore cannot model the semantics of unsigned comparison in \compcert{}. This is not a severe restriction in practice, however, because, as dynamic allocation is not supported either, equality comparison of pointers is rarely needed, and equality comparison of integers can still be performed by casting them both to signed integers.
-
-Finally, the \texttt{mulhs} and \texttt{mulhu} instructions, which fetch the upper bits of a 32-bit multiplication, are not translated by \vericert{} either, because 64-bit numbers are not supported. These instructions are only generated to optimise divisions by a constant that is not a power of two, so turning off constant propagation will allow these programs to pass without error.
-
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"