Fix some more notes

author: Yann Herklotz <git@yannherklotz.com> 2021-08-08 23:34:17 +0200
committer: Yann Herklotz <git@yannherklotz.com> 2021-08-08 23:34:17 +0200
commit: dc23a3e2a53cb56e0bde73a3d0ae7206ab13bafd (patch)
tree: 3568c78d08f43811327135d7c03e3b64bf3b25fd /limitations.tex
parent: 3aa90b4b91627bc851f79b47164280541f0c363c (diff)
download: oopsla21_fvhls-dc23a3e2a53cb56e0bde73a3d0ae7206ab13bafd.tar.gz
oopsla21_fvhls-dc23a3e2a53cb56e0bde73a3d0ae7206ab13bafd.zip
1 files changed, 15 insertions, 4 deletions
diff --git a/limitations.tex b/limitations.tex
index b002bdb..3bca4ac 100644
--- a/limitations.tex
+++ b/limitations.tex
@@ -6,13 +6,17 @@ There are various limitations in \vericert{} compared to other HLS tools due to
 
 \paragraph{Lack of instruction-level parallelism}
 
-The main limitation of \vericert{} is that it does not perform instruction scheduling, meaning that instructions cannot be gathered into the same state and executed in parallel.  However, the design of the intermediate languages in \vericert{} take this optimisation into account and are designed to support scheduling in the future. For instance, our HTL language allows arbitrary Verilog to appear in each state of the FSMD, including parallel assignments to registers. Our plan for adding scheduling support involves adding a new intermediate language after 3AC, tentatively called 3ACPar. This would be similar to 3AC but rather than mapping program counters to instructions, it would map program counters to \emph{lists} of instructions that can all be executed in parallel. The translation from 3AC to 3ACPar would be performed by a scheduling tool. Following \citet{tristan08_formal_verif_trans_valid} and \citet{six+20}, we expect to use translation validation to verify that each generated schedule is correct (rather than verifying the scheduling tool itself). The translation from 3ACPar to HTL would not change conceptually, except for the fact that multiple instructions can now be translated into the same state.
-\NR{It is best to explain why we didn't focus on scheduling with a positive/future work spin. For example, ``It was more intuitive to handle one instruction per cycle at the initial stage of our project as we want to focus our efforts on correctness, which has been the main weakness of HLS, rather than performance. However, we were very much aware during the design stage that in order for our compiler to be able to perform better, supporting of scheduling was inevitable. Hence, we intentionally left room for support of scheduling. In essence, instead of supporting one instruction per cycle, we must be able to support a list of instructions per cycle. To do so, we envision extensions to our work in several ways.'' We might not even need to specify the details of how. You can keep the tricks in your sleeves for the next publication. :)}.
+The main limitation of \vericert{} is that it does not perform instruction scheduling, meaning that instructions cannot be gathered into the same state and executed in parallel.  This limitation is present by design so that the focus could be made on the initial proof of correctness of the translation of instructions with a sequential schedule.  However, as scheduling is an important optimisation for HLS tools, each language in \vericert{} was designed with scheduling in mind, so that these would not have to change when it is implemented in the future.  For instance, our HTL language allows arbitrary Verilog to appear in each state of the FSMD, including parallel assignments to registers. Our plan for adding scheduling support involves adding a new intermediate language after 3AC, tentatively called 3ACPar. This would be similar to 3AC but rather than mapping program counters to instructions, it would map program counters to \emph{lists} of instructions that can all be executed in parallel. The translation from 3AC to 3ACPar would be performed by a scheduling tool. Following \citet{tristan08_formal_verif_trans_valid} and \citet{six+20}, we expect to use translation validation to verify that each generated schedule is correct (rather than verifying the scheduling tool itself). The translation from 3ACPar to HTL would not change conceptually, except for the fact that multiple instructions can now be translated into the same state.  This means that the proof of translation from 3AC to HTL can also be adapted with minimal changes to the top-level of the proofs.  The bulk of the proofs proving the translation of single instructions would stay intact.
+
+%However, the design of the intermediate languages in \vericert{} take this optimisation into account and are designed to support scheduling in the future.
+
+%\NR{It is best to explain why we didn't focus on scheduling with a positive/future work spin. For example, ``It was more intuitive to handle one instruction per cycle at the initial stage of our project as we want to focus our efforts on correctness, which has been the main weakness of HLS, rather than performance. However, we were very much aware during the design stage that in order for our compiler to be able to perform better, supporting of scheduling was inevitable. Hence, we intentionally left room for support of scheduling. In essence, instead of supporting one instruction per cycle, we must be able to support a list of instructions per cycle. To do so, we envision extensions to our work in several ways.'' We might not even need to specify the details of how. You can keep the tricks in your sleeves for the next publication. :)}\YH{Well one of the only things the reviewers wanted us to add was a realistic implementation of scheduling, so I think we need to at least keep that.}.
 
 %To simplify the proof of the scheduling algorithm, and to minimise the changes necessary for the current translation from 3AC to HTL, a new language must be introduced, called 3ACPar, which would be equivalent to 3AC but instead of consisting of a map from program counter to instruction, it would consist of a map from program counter to list of instructions, which can all be executed in parallel.  A new proof for the scheduling algorithm would have to be written for the translation from 3AC to 3ACPar, for which a verified translation validation approach might be appropriate, however, the translation form 3ACPar to HTL would not change conceptually, except for the fact that multiple instructions can now be translated into the same state.  This small difference means that most of the proof can be reused without any changes, as the translation of instructions was the main body of the proof and did not change.
 
 \paragraph{Lack of pipelined division}
-In a similar vein, the introduction of pipelined operators, especially for division, would alleviate the slow clock speed experienced in the \polybench{} benchmarks with divisions included in Section~\ref{sec:evaluation}.  Pipelined operators can execute different stages of the operation in parallel, and therefore run long operations in parallel while sharing the same hardware.  In HTL, operations like this can be represented in a similar fashion to the load and store instructions by using wires to communicate with an abstract computation block modelled in HTL and later replaced by a hardware implementation. \NR{Are you describing using IP blocks? If so, you can generalise it to any IP block rather than just division.}
+In a similar vein, the introduction of pipelined operators, especially for division, would alleviate the slow clock speed experienced in the \polybench{} benchmarks with divisions included in Section~\ref{sec:evaluation}.  Pipelined operators can execute different stages of the operation in parallel, and therefore run long operations in parallel while sharing the same hardware.  In HTL, operations like this can be represented in a similar fashion to the load and store instructions by using wires to communicate with an abstract computation block modelled in HTL and later replaced by a hardware implementation.
+%\NR{Are you describing using IP blocks? If so, you can generalise it to any IP block rather than just division.}\YH{I think I prefer just focusing on pipelined division for now, as that's specifically the issue, so that people know we have a plan to resolve specifically that.}
 %JW I've chopped the following sentence because it felt like it was going into too much detail. 
 %However, 3ACPar would have to be modified to also describe such instructions so that these can be placed optimally using the external scheduling algorithm.
 
@@ -21,7 +25,8 @@ In a similar vein, the introduction of pipelined operators, especially for divis
 \vericert{} is currently limited in terms of I/O as well, because the main function cannot accept any arguments for the Clight program to be well-formed. However, just like \compcert{}, \vericert{} can compile main functions with arbitrary arguments and will handle the inputs appropriately.  However, the main correctness theorem in \compcert{} assumes that the main function does not have any arguments, so it may be possible that unexpected behaviour is introduced.  In addition to that, external function calls that produce traces have also not been implemented yet, however, these could enable the C program to read and write values to a bus that is shared by various other components in the hardware design.
 
 \paragraph{Lack of support for global variables}
-In \compcert{}, each global variable is stored in its own memory.  A generalisation of the stack translation into a RAM block could therefore be performed to translate global variables in the same manner.  This would require a slight generalisation of pointers so that they store provenance information to ensure that each pointer accesses the right RAM. It would also be necessary to generalise the RAM interface so that it decodes the provenance information and indexes the right array. \NR{Curiously, is memory analysis in your to-do list?}
+In \compcert{}, each global variable is stored in its own memory.  A generalisation of the stack translation into a RAM block could therefore be performed to translate global variables in the same manner.  This would require a slight generalisation of pointers so that they store provenance information to ensure that each pointer accesses the right RAM. It would also be necessary to generalise the RAM interface so that it decodes the provenance information and indexes the right array.
+%\NR{Curiously, is memory analysis in your to-do list?}\YH{Well kind of, would be nice, but probably not sophisticated analysis.}
 
 \paragraph{Other language restrictions}
 C and Verilog handle signedness quite differently. By default, all operators and registers in Verilog (and HTL) are unsigned, so to force an operation to handle the bits as signed, both operators have to be forced to be signed. Moreover, Verilog implicitly resizes expressions to the largest needed size by default, which can affect the result of the computation.  This feature is not supported by the Verilog semantics we adopted, so to match the semantics to the behaviour of the simulator and synthesis tool, braces are placed around all expressions to inhibit implicit resizing.  Instead, explicit resizing is used in the semantics, and operations can only be performed on two registers that have the same size.
@@ -29,3 +34,9 @@ C and Verilog handle signedness quite differently. By default, all operators and
 In addition to that, equality checks between \emph{unsigned} variables are actually not supported, because this requires supporting the comparison of pointers, which should only be performed between pointers with the same provenance.  In \vericert{} there is currently no way to determine the provenance of a pointer, and it therefore cannot model the semantics of unsigned comparison in \compcert{}. This is not a severe restriction in practice, however, because, as dynamic allocation is not supported either, equality comparison of pointers is rarely needed, and equality comparison of integers can still be performed by casting them both to signed integers.
 
 Finally, the \texttt{mulhs} and \texttt{mulhu} instructions, which fetch the upper bits of a 32-bit multiplication, are not translated by \vericert{} either, because 64-bit numbers are not supported. These instructions are only generated to optimise divisions by a constant that is not a power of two, so turning off constant propagation will allow these programs to pass without error.
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: "main"
+%%% TeX-command-extra-options: "-shell-escape"
+%%% End:
author	Yann Herklotz <git@yannherklotz.com>	2021-08-08 23:34:17 +0200
committer	Yann Herklotz <git@yannherklotz.com>	2021-08-08 23:34:17 +0200
commit	dc23a3e2a53cb56e0bde73a3d0ae7206ab13bafd (patch)
tree	3568c78d08f43811327135d7c03e3b64bf3b25fd /limitations.tex
parent	3aa90b4b91627bc851f79b47164280541f0c363c (diff)
download	oopsla21_fvhls-dc23a3e2a53cb56e0bde73a3d0ae7206ab13bafd.tar.gz oopsla21_fvhls-dc23a3e2a53cb56e0bde73a3d0ae7206ab13bafd.zip