summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2022-05-06 16:24:06 +0100
committerYann Herklotz <git@yannherklotz.com>2022-05-06 16:24:06 +0100
commit66f0fcdcd83bbffb45f08483bbd859503e3c1fa5 (patch)
tree4345fd43135d64f18dd64cfbaef1da6364f528ba
parent2f2f98e6c2b8d0e74d6f8930633d20b2e5e79678 (diff)
downloadlsr22_fvhls-66f0fcdcd83bbffb45f08483bbd859503e3c1fa5.tar.gz
lsr22_fvhls-66f0fcdcd83bbffb45f08483bbd859503e3c1fa5.zip
Fix the description of if-conversion
-rw-r--r--chapters/scheduling.tex83
1 files changed, 40 insertions, 43 deletions
diff --git a/chapters/scheduling.tex b/chapters/scheduling.tex
index 58528cd..dc7a88c 100644
--- a/chapters/scheduling.tex
+++ b/chapters/scheduling.tex
@@ -207,7 +207,7 @@ transition in the left hand side.
\hbox{\starttikzpicture[shorten >=1pt,>=stealth]
\node (s2) at (0,0) {$S_{2}$};
\node (s1) at (0,2) {$S_{1}$};
- \node (s1p) at (3,2) {$S_{1}'$};
+ \node (s1p) at (3,2) {$S_{1}$};
\draw (s1) -- node[above] {$\sim$} (s1p);
\draw[->] (s1) -- (s2);
\draw[dashed] (s2) -- node[below,xshift=1mm] {$\sim$} (s1p);
@@ -249,31 +249,28 @@ never be activate at the same time.
\section[implementation-of-if-conversion]{Implementation of If-Conversion}
-If-conversion is the conversion that introduces predicated instructions which can make use of the
-hyperblocks. It converts conditional statements into predicated instructions. This transformation
-has a few limitations on the kind of conditional statements that it can translate. First, only
-conditional statements without loops can be translated, therefore, one must identify cycles in the
-control-flow before performing the if-conversion. Secondly, if-conversion will not always result in
-more efficient code, so it should not be applied to any conditional statements. Instead, it is best
-applied to conditional statements where each branch will take a similar amount of time to execute.
+If-conversion is the transformation that introduces predicated instructions which form the
+hyperblocks. It has a few limitations on the kind of conditional statements that it can translate.
+First, only conditional statements without loops can be translated, therefore, one must identify
+cycles in the control-flow before performing the if-conversion. Secondly, if-conversion will not
+always result in more efficient code, so it should not be applied to any conditional
+statements. Instead, it is best applied to conditional statements where each branch will take a
+similar amount of time to execute.
+
+From an implementation point of view, if-conversion can be implemented naïvely, however, it could be
+improved by adding better heuristics for which conditional statements to transform into predicated
+instructions. For hyperblocks, conditionals where each branch approximately execute in the same
+number of cycles are an ideal candidate for the transformation, as it means that no time is lost due
+to having to execute extra predicated instructions that are false. However, in addition to that if
+one has an estimate on the probability of picking a branch, then one can introduce more interesting
+heuristics such as also transforming conditionals who have disproportionate branches, but where one
+is more likely to execute the longest branch every time.
For back ends that do not support predicates, it is also important to eliminate the predicates again
using a process called reverse if-conversion, which creates branches out of the predicates and
groups these together again. This allows for hyperblock scheduling to also be used with back ends
-that do not support predicates, which are most the default back ends included in CompCert.
-
-From an implementation point of view, the two steps are relatively simple to implement naïvely,
-however, especially the if-conversion transformation could be improved by adding better heuristics
-for which conditional statements to transform into predicated instructions. For hyperblocks,
-conditionals where each branch approximately execute in the same number of cycles are an ideal
-candidate for the transformation, as it means that no time is lost due to having to execute extra
-predicated instructions that are false. However, in addition to that if one has an estimate on the
-probability of picking a branch, then one can introduce more interesting heuristics such as also
-transforming conditionals who have disproportionate branches, but where one is more likely to
-execute the longest branch every time.
-
-However, from a proof perspective they are not the same. In this case, the basic block generation
-is especially tricky due to the
+that do not support predicates, which are most the default back ends included in CompCert. However,
+this is currently out of scope for the implementation.
\section[sec:scheduling]{Scheduling Implementation}
@@ -291,29 +288,29 @@ next sections.
\startfloatcombination[nx=2]
\startplacefigure[reference={eq:standard},title={Syntax for instructions within a hyperblock.}]
\startframedtext[frame=off,offset=none,width={0.45\textwidth}]
- \startformula\startmathalignment
-\NC i\ \ \eqdef \NC \ \ \text{\tt RBnop} \NR
-\NC \NC |\ \ \text{\tt RBop}\ p?\ \mathit{op}\ \vec{r}\ d \NR
-\NC \NC |\ \ \text{\tt RBload}\ p?\ \mathit{chunk}\ \mathit{addr}\ \vec{r}\ d \NR
-\NC \NC |\ \ \text{\tt RBstore}\ p?\ \mathit{chunk}\ \mathit{addr}\ \vec{r}\ s \NR
-\NC \NC |\ \ \text{\tt RBsetpred}\ p?\ c\ \vec{r}\ d \NR
-\stopmathalignment\stopformula
-\stopframedtext
+ \startformula\startmathalignment
+ \NC i\ \ \eqdef \NC \ \ \text{\tt RBnop} \NR
+ \NC \NC |\ \ \text{\tt RBop}\ p?\ \mathit{op}\ \vec{r}\ d \NR
+ \NC \NC |\ \ \text{\tt RBload}\ p?\ \mathit{chunk}\ \mathit{addr}\ \vec{r}\ d \NR
+ \NC \NC |\ \ \text{\tt RBstore}\ p?\ \mathit{chunk}\ \mathit{addr}\ \vec{r}\ s \NR
+ \NC \NC |\ \ \text{\tt RBsetpred}\ p?\ c\ \vec{r}\ d \NR
+ \stopmathalignment\stopformula
+ \stopframedtext
\stopplacefigure
\startplacefigure[reference={eq:cf-instr},title={Control-flow instructions ending a hyperblock.}]
-\startframedtext[frame=off,offset=none,width={0.45\textwidth}]
- \startformula \startmathalignment
-\NC i_{\mathit{cf}}\ \ \eqdef \NC \ \ \text{\tt RBcall}\ \mathit{sig}\ f\ \vec{r}\ d\ n \NR
-\NC \NC |\ \ \text{\tt RBtailcall}\ \mathit{sig}\ f\ \vec{r} \NR
-\NC \NC |\ \ \text{\tt RBbuiltin}\ f_{\mathit{ext}}\ \vec{r}\ r\ n \NR
-\NC \NC |\ \ \text{\tt RBcond}\ c\ \vec{r}\ n_{1}\ n_{2} \NR
-\NC \NC |\ \ \text{\tt RBjumptable}\ r\ \vec{n} \NR
-\NC \NC |\ \ \text{\tt RBreturn}\ r? \NR
-\NC \NC |\ \ \text{\tt RBgoto}\ n \NR
-\NC \NC |\ \ \text{\tt RBpred\_{cf}}\ P\ i_{\mathit{cf}_{1}}\ i_{\mathit{cf}_{2}} \NR
-\stopmathalignment \stopformula
-\stopframedtext
+ \startframedtext[frame=off,offset=none,width={0.45\textwidth}]
+ \startformula \startmathalignment
+ \NC i_{\mathit{cf}}\ \ \eqdef \NC \ \ \text{\tt RBcall}\ \mathit{sig}\ f\ \vec{r}\ d\ n \NR
+ \NC \NC |\ \ \text{\tt RBtailcall}\ \mathit{sig}\ f\ \vec{r} \NR
+ \NC \NC |\ \ \text{\tt RBbuiltin}\ f_{\mathit{ext}}\ \vec{r}\ r\ n \NR
+ \NC \NC |\ \ \text{\tt RBcond}\ c\ \vec{r}\ n_{1}\ n_{2} \NR
+ \NC \NC |\ \ \text{\tt RBjumptable}\ r\ \vec{n} \NR
+ \NC \NC |\ \ \text{\tt RBreturn}\ r? \NR
+ \NC \NC |\ \ \text{\tt RBgoto}\ n \NR
+ \NC \NC |\ \ \text{\tt RBpred\_{cf}}\ P\ i_{\mathit{cf}_{1}}\ i_{\mathit{cf}_{2}} \NR
+ \stopmathalignment \stopformula
+ \stopframedtext
\stopplacefigure
\stopfloatcombination
\stopplacefigure
@@ -348,7 +345,7 @@ of the basic blocks in \rtlblock\ as well as \rtlpar, and where the parallel sem
come into play. \rtlblock\ is made of a list of instructions and a control-flow instruction that
ends the hyperblock. Each instruction in \rtlblock\ is executed sequentially. \rtlpar\ is made of
three separate lists. The outer list behaves like the list in \rtlblock, and executes each list
-inside of it sequentially. \rtlpar\ then has a list which executes it's contents in parallel,
+inside of it sequentially. \rtlpar\ then has a list which executes its contents in parallel,
meaning each instruction uses the same state of the registers and memory while executing. Finally,
instead of directly containing instructions, there is an additional layer of sequentially executing
lists of instructions, to be able to execute small groups of instructions in sequence, but still in