3 files changed, 14 insertions, 3 deletions
diff --git a/algorithm.tex b/algorithm.tex
index c4c91a4..7570ed7 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -417,7 +417,19 @@ When proving this equivalence, we actually found a bug in our original implement
 
 %The \compcert{} semantics for the \texttt{Oshrximm} instruction expresses its operation exactly as shown in the equation above, even though in hardware the computation that would be performed would be different.  In \vericert{}, if the same operation would be implemented using Verilog operators, it is not guaranteed to be optimised correctly by the synthesis tools that convert the Verilog into a circuit.  To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the divide representation used in the \compcert{} semantics.  While conducting the proof, we discovered quite a few bugs in our initial implementation of optimisations, which rounded to $-\infty$ instead of 0.
 
+\subsection{Limitations}
 
+There are various limitations in \vericert{} compared to other HLS tools due to the fact that the main focus was to formally verify the translation from 3AC to Verilog.
+
+\paragraph{Parallel execution of instructions}
+
+The main limitation of \vericert{} is that it does not perform scheduling of instructions into the same state, meaning that instructions cannot be executed in parallel.  However, the design of the intermediate languages in \vericert{} take this optimisation into account and are designed to support scheduling.  HTL is already an intermediate language which represents an FSMD which can have any number of statements in each state, and the semantics of HTL and Verilog allow for parallel assignments to registers.  To simplify the proof of the scheduling algorithm, and to minimise the changes necessary for the current translation from 3AC to HTL, a new language must be introduced, called 3ACPar, which would be equivalent to 3AC but instead of consisting of a map from program counter to instruction, it would consist of a map from program counter to list of instructions, which can all be executed in parallel.  A new proof for the scheduling algorithm would have to be written for the translation from 3AC to 3ACPar, for which a verified translation validation approach might be appropriate, however, the translation form 3ACPar to HTL would not change conceptually, except for the fact that multiple instructions can now be translated into the same state.  This small difference means that most of the proof can be reused without any changes, as the translation of instructions was the main body of the proof and did not change.
+
+In a similar vein, the introduction of pipelined operators, especially for division, can alleviate many of the timing issues shown in the PolyBench/C benchmarks with divisions included in Section~\ref{sec:evaluation}.  These operators can execute different stages of the operation in parallel, and therefore run long operations in parallel while sharing the same hardware.  In HTL, operations like this can be represented in a similar fashion to the load and store instructions to the RAM, by using wires to communicate with an abstract computation block modelled in HTL and later replaced by a hardware implementation.  However, 3ACPar would have to be modified to also describe such instructions so that these can be placed optimally using the external scheduling algorithm.
+
+\paragraph{Limitations with I/O}
+
+\vericert{} is currently limited in terms of I/O as well, because the main function cannot accept any arguments for the Clight program to be well-formed.  In addition to that, external function calls that produce traces have also not been implemented yet, however, these could enable the C program to read and write values to a bus that is shared by various other components in the hardware design.
 
 %%% Local Variables:
 %%% mode: latex
diff --git a/evaluation.tex b/evaluation.tex
index 4ac85b4..ef43a9a 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -1,5 +1,4 @@
-\section{Evaluation}
-\label{sec:evaluation}
+\section{Evaluation}\label{sec:evaluation}
 
 Our evaluation is designed to answer the following three research questions.
 \begin{description}
diff --git a/introduction.tex b/introduction.tex
index d2ec9cc..cfb99b8 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -13,7 +13,7 @@ As latency, throughput and energy efficiency become increasingly important, cust
 Alas, designing these accelerators can be a tedious and error-prone process using a hardware description language (HDL) such as Verilog.
 An attractive alternative is \emph{high-level synthesis} (HLS), in which hardware designs are automatically compiled from software written in a high-level language like C. 
 Modern HLS tools such as \legup{}~\cite{canis11_legup}, Vivado HLS~\cite{xilinx20_vivad_high_synth}, Intel i++~\cite{intel_hls}, and Bambu HLS~\cite{bambu_hls} promise designs with comparable performance and energy-efficiency to those hand-written in an HDL~\cite{homsirikamol+14, silexicahlshdl, 7818341}, while offering the convenient abstractions and rich ecosystems of software development.
-But existing HLS tools cannot always guarantee that the hardware designs they produce are equivalent to the software they were given, and this undermines any reasoning conducted at the software level. 
+But existing HLS tools cannot always guarantee that the hardware designs they produce are equivalent to the software they were given, and this undermines any reasoning conducted at the software level.
 
 Indeed, there are reasons to doubt that HLS tools actually \emph{do} always preserve equivalence.
 %Some of these reasons are general: HLS tools are large pieces of software, they perform a series of complex analyses and transformations, and the HDL output they produce has superficial syntactic similarities to a software language but a subtly different semantics.