summaryrefslogtreecommitdiffstats
path: root/chapters/pipelining.tex
diff options
context:
space:
mode:
Diffstat (limited to 'chapters/pipelining.tex')
-rw-r--r--chapters/pipelining.tex59
1 files changed, 54 insertions, 5 deletions
diff --git a/chapters/pipelining.tex b/chapters/pipelining.tex
index 6c351c8..5aa03dc 100644
--- a/chapters/pipelining.tex
+++ b/chapters/pipelining.tex
@@ -13,7 +13,12 @@
Standard instruction scheduling only addresses parallelisation inside hyperblocks, which are linear
sections of code. However, loops are often the most critical sections in code, and scheduling only
-addresses parallelisation within one iteration.
+addresses parallelisation within one iteration. Traditionally, loop pipelining is performed as part
+of the normal scheduling step in \HLS, as the scheduling algorithm can be generalised to support
+these new constraints. However, it becomes expensive to check the correctness of a scheduling
+algorithm that transforms the code in this many ways. It is better to separate loop pipelining into
+it's own translation pass which does a source-to-source translation of the code. The final result
+should be similar to performing the loop pipelining together with the scheduling.
\section{Loop pipelining example}
@@ -53,10 +58,54 @@ addresses parallelisation within one iteration.
\stopfloatcombination
\stopplacemarginfigure
-\in{Figure}[fig:pipelined-loop] shows an example of pipelining a loop which accumulates values and
-modifies an array. In \in{Figure}{a}[fig:pipelined-loop], the body of the loop cannot be scheduled
-in less than three cycles, assuming that a load takes two clock cycles. However, after transforming
-the code into the pipelined version on the right
+\in{Figure}[fig:pipelined-loop] shows an example of \emph{software pipelining} of a loop which
+accumulates values and modifies an array. In \in{Figure}{a}[fig:pipelined-loop], the body of the
+loop cannot be scheduled in less than three cycles, assuming that a load takes two clock cycles.
+However, after transforming the code into the pipelined version in
+\in{Figure}{b}[fig:pipelined-loop], the number of inter-iteration dependencies have been reduced, by
+moving the store into the next iteration of the loop. This means that the body of the loop could
+now be scheduled in two clock cycles.
+
+The process of pipelining the loop by being resource constrained can be performed using iterative
+modulo scheduling~\cite[rau94_iterat_sched], followed by modulo variable
+expansion~\cite[lam88_softw_pipel]. The steps performed in the optimisation are the following.
+
+\startitemize[n]
+\item Calculate a minimum \II, which is the lowest possible \II\ based on the resources of the code
+ inside of the loop and the distance and delay of inter-iteration dependencies.
+\item Perform modulo scheduling with an increasing \II\ until one is found that works. The
+ scheduling is performed by adding instructions to a \MRT~\cite[lam88_softw_pipel], assigning
+ operations to each resource for one iteration.
+\item Once a valid modulo schedule is found, the loop code can be generated from it. To keep the
+ \II\ that was found in the previous step, modulo variable expansion might be necessary and require
+ unrolling the loop.
+\stopitemize
+
+\section{Verification of pipelining}
+
+Verification of pipelining has already been performed in CompCert by
+\cite[authoryears][tristan10_simpl_verif_valid_softw_pipel]. Assuming that one has a loop body
+$\mathcal{B}$, the pipelined code can be described using the prologue $\mathcal{P}$, the steady
+state $\mathcal{S}$, the epilogue $\mathcal{E}$, and finally some additional variables representing
+the minimum number of iterations that need to be performed to be able to use the pipelined loop
+$\mu$ and representing the unroll factor of the steady state $\delta$.
+
+The goal is to verify the equivalence of the original loop and the pipelined loop using a validation
+algorithm, and then prove that the validation algorithm is \emph{sound}, i.e. if it finds the two
+loops to be equivalent, this implies that they will behave the same according to the language
+semantics. Essentially, as the loop pipelining algorithm only modifies loops, it is enough to show
+that the input loop $X$ is equivalent to the pipelined version of the loop $Y$. Assuming that the
+validation algorithm is called $\alpha$, we therefore want to show that the following statement
+holds, where $X_N$ means that we are considering $N$ iterations of the loop.
+\placeformula\startformula
+ \forall N,\quad \alpha(X_N) = \startmathcases
+ \NC \alpha(Y_{N/\delta}; X_{N\%\delta}) \NC N \ge \mu \NR
+ \NC \alpha(X_N) \NC \text{otherwise} \NR
+\stopmathcases\stopformula
+
+This states that for any number of iteration of the loop X, the formula should hold. As the number
+of loop iterations $N$ are not always known at compile time, this may become an infinite property
+that needs to be checked.
\startmode[section]
\section{Bibliography}