Diffstat (limited to 'chapters/scheduling.tex')
-rw-r--r--  chapters/scheduling.tex  76
1 file changed, 39 insertions(+), 37 deletions(-)
diff --git a/chapters/scheduling.tex b/chapters/scheduling.tex
index 15850ce..8457a1e 100644
--- a/chapters/scheduling.tex
+++ b/chapters/scheduling.tex
@@ -7,18 +7,18 @@
\section[introduction]{Introduction}
-The use of multicore processors has been increasing drastically, thereby making parallelising
-compilers ever more important. In addition to that, the need for custom hardware accelerators is
-also increasing, as traditional processors are not always the best choice for all
-applications. Compilers that support optimisations which can automatically parallelise it's input
-programs are therefore also becoming more important, as these back ends all benefit from it. For
-processors, one wants to optimise the pipeline usage as much as possible, whereas in hardware one
-can create custom pipelines, as well as making more use out of the spacial properties that hardware
-offers, making instruction parallelism important as well as speculative execution. However, with
-more powerful optimisations there is a greater chance to make mistakes in the compiler, making it
-possible to generate programs that behave differently to the input program. An important step
-towards making compilers more reliable is to formally verify these optimisations so that it can be
-guaranteed that bugs are not introduced.
+\oindex{static scheduling}The use of multicore processors has been increasing drastically, making
+parallelising compilers ever more important. In addition, the need for custom hardware accelerators
+is also growing, as traditional processors are not always the best choice for every application.
+Compilers with optimisations that can automatically parallelise their input programs are therefore
+also becoming more important, as all of these back ends benefit from them. For processors, one
+wants to optimise pipeline usage as much as possible, whereas in hardware one can create custom
+pipelines and exploit the spatial properties that hardware offers, making instruction-level
+parallelism as well as speculative execution important. However, more powerful optimisations leave
+more room for mistakes in the compiler, making it possible to generate programs that behave
+differently from the input program. An important step towards making compilers more reliable is to
+formally verify these optimisations, so that it can be guaranteed that no bugs are introduced.
A popular optimisation for processors that increases the instruction parallelism of the resulting
program is scheduling. This is an optimisation that is especially important for processors
@@ -31,11 +31,12 @@ can never move instructions past a branching instruction, even when this might d
the performance. Trace scheduling~ is an alternative that addresses this issue by creating paths
that may cross branching instructions and scheduling the instructions along each path. However, the
main problem with trace scheduling is that it is often infeasible on large programs, and because of
-the large design space, it's sometimes hard to find an optimal schedule. Superblock~ and hyperblock~
-scheduling are two subsets of trace scheduling, which compromise on the flexibility of trace
-scheduling to instead build more tractable algorithms. Superblock scheduling produces more
-efficient code for processors with a low issue rate, whereas hyperblock scheduling produces more
-performant code for processors with a high issue rate~.
+the large design space, it is sometimes hard to find an optimal
+schedule. \index{superblock}Superblock~ and \index{hyperblock}hyperblock~ scheduling are two
+restricted forms of trace scheduling, which trade some of its flexibility for more tractable
+algorithms. Superblock scheduling produces more efficient code for processors with a low issue
+rate, whereas hyperblock scheduling produces more performant code for processors with a high issue
+rate~.
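To make the notion of a hyperblock concrete, the following OCaml sketch shows how if-conversion
merges a two-sided branch into a single block of predicated instructions. The types and the
if-conversion function here are illustrative assumptions, not the definitions used in the
implementation described below.

\starttyping
(* A minimal model of a hyperblock as a list of predicated instructions.
   These types are illustrative assumptions, not the actual definitions. *)
type pred = Pvar of int | Pnot of pred

type instr =
  | Nop
  | Op of string * int list * int   (* opcode, argument registers, destination *)

(* An instruction guarded by an optional predicate; [None] means it
   always executes. *)
type pred_instr = pred option * instr

(* If-conversion of a two-sided branch: both arms are merged into one
   block, guarded by the branch condition [p] and by its negation. *)
let if_convert (p : pred) (then_arm : instr list) (else_arm : instr list)
    : pred_instr list =
  List.map (fun i -> (Some p, i)) then_arm
  @ List.map (fun i -> (Some (Pnot p), i)) else_arm
\stoptyping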
CompCert~ is a compiler that is formally verified in Coq. In addition to that, we have added a
formally verified hardware back end to CompCert, allowing it to also generate provably correct
@@ -59,7 +60,7 @@ the key points of this paper are the following:
\section[sec:scheduling]{Implementation of Hyperblocks in CompCert}
-This section describes the structure of hyperblocks in
+\index{hyperblock}This section describes the structure of hyperblocks in
Section~\goto{2.1}[scheduling:hyperblocks]. Then, the structure of the two extra intermediate
languages that were added to CompCert to implement hyperblocks is shown in
Section~\goto{2.2}[sec:rtlblockdef].
@@ -81,13 +82,13 @@ predicates can be removed, as these instructions will never be active at the s
\subsection[sec:rtlblockdef]{RTLBlock and RTLPar Intermediate Language Definition}
-Figure~\goto{{[}fig:compcert_interm{]}}[fig:compcert_interm] shows the intermediate languages
-introduced to implement hyperblocks in CompCert. This consists of new RTLBlock and RTLPar
-intermediate languages, which implement the sequential and parallel semantics of basic blocks
-respectively. The semantics of RTLBlock and RTLPar are based on the CompCert RTL intermediate
-language. However, instead of mapping from states to instructions, RTLBlock maps from states to
-hyperblocks, and RTLPar maps from states to parallel hyperblocks, which will be described in the
-next sections.
+\lindex{RTLBlock}\lindex{RTLPar}Figure~\goto{{[}fig:compcert_interm{]}}[fig:compcert_interm] shows
+the intermediate languages introduced to implement hyperblocks in CompCert. These consist of the
+new RTLBlock and RTLPar intermediate languages, which implement the sequential and parallel
+semantics of basic blocks, respectively. The semantics of RTLBlock and RTLPar are based on the
+CompCert RTL intermediate language. However, instead of mapping from states to instructions,
+RTLBlock maps from states to hyperblocks, and RTLPar maps from states to parallel hyperblocks,
+which will be described in the following sections.
\placeformula \startformula \startmathalignment
\NC i\ \ \colon\colon= \NC \ \ \mono{RBnop} \NR
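The remainder of this grammar is elided in this excerpt. As a rough transcription into OCaml, the
instruction and block types might look as follows; only \mono{RBnop} is taken from the grammar
above, and the other constructors and all field types are simplifying assumptions for illustration.

\starttyping
(* A sketch of the RTLBlock instruction grammar transcribed into OCaml.
   Only RBnop appears in the grammar excerpt above; the other
   constructors and all field types are simplifying assumptions. *)
type reg = int
type pred = int

type instr =
  | RBnop
  | RBop of pred option * string * reg list * reg     (* predicated operation *)
  | RBload of pred option * string * reg list * reg   (* predicated memory load *)
  | RBstore of pred option * string * reg list * reg  (* predicated memory store *)

(* RTLBlock associates each state with a whole hyperblock, whereas RTLPar
   associates it with a parallel hyperblock: groups of instructions that
   may execute in the same clock cycle. *)
type hyperblock = instr list
type par_hyperblock = instr list list
\stoptyping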
@@ -180,16 +181,17 @@ schedule that is shown in Figure~\goto{{[}fig:op_chain_d{]}}[fig:op_chain_d].
\section[abstr_interp]{Abstract Interpretation of Hyperblocks}
-The correctness of the scheduling algorithm is checked by comparing the symbolic expressions of each
-register before and after scheduling. This section describes how these symbolic expressions are
-constructed for each hyperblock.
+\index{abstract interpretation}The correctness of the scheduling algorithm is checked by comparing
+the symbolic expressions of each register before and after scheduling. This section describes how
+these symbolic expressions are constructed for each hyperblock.
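As a high-level illustration of this check, the following OCaml sketch symbolically executes a
straight-line block of operations and compares the resulting register-to-expression maps. All
types and names are placeholders, and predicates are deliberately ignored here, as their treatment
is the subject of the next subsection.

\starttyping
(* A placeholder model of the validation step: symbolically execute two
   blocks of simple operations and compare the resulting symbolic states.
   Predicates are ignored; all names here are assumptions. *)
type reg = int

type sym_expr =
  | SInit of reg                    (* initial value of a register *)
  | SOp of string * sym_expr list   (* operation over symbolic arguments *)

(* A symbolic state maps each written register to the expression it holds
   after the block; an association list keeps the sketch small. *)
type sym_state = (reg * sym_expr) list

let lookup (st : sym_state) (r : reg) : sym_expr =
  try List.assoc r st with Not_found -> SInit r

(* One symbolic step: the destination register now holds the operation
   applied to the current symbolic values of its arguments. *)
let step (st : sym_state) ((op, args, dst) : string * reg list * reg) =
  (dst, SOp (op, List.map (lookup st) args)) :: List.remove_assoc dst st

(* Two blocks are equivalent when every register written by either block
   ends up with the same symbolic expression. *)
let equivalent (a : sym_state) (b : sym_state) : bool =
  List.for_all (fun r -> lookup a r = lookup b r)
    (List.map fst a @ List.map fst b)

let validate (input : (string * reg list * reg) list) output : bool =
  let run = List.fold_left step [] in
  equivalent (run input) (run output)
\stoptyping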
\subsection[abstr_language]{Construction of Abstract Language}
-The main difficulty when constructing symbolic expressions for hyperblocks is the presence of
-predicates, which can be used by the scheduling algorithm to reorder instructions further, even if
-these contain data dependencies. The scheduling algorithm will manipulate predicates in the
-following ways, which the comparison function needs to take into account.
+The main difficulty when constructing \index{symbolic expression}symbolic expressions for
+hyperblocks is the presence of predicates, which the scheduling algorithm can use to reorder
+instructions further, even when those instructions contain data dependencies. The scheduling
+algorithm will manipulate predicates in the following ways, which the comparison function needs to
+take into account.
\startitemize
\item
@@ -271,11 +273,11 @@ meaning if the predicate evaluates to true, all other predicates must evaluate t
\subsection[linear-flat-predicated-expressions]{Linear (Flat) Predicated Expressions}
-The update function for predicated expressions is quite a bit more complex compared to standard
-update functions for symbolic execution, such as the one used by . The main reason for this is the
-non-recursive nature of the predicated expressions. As the predicates are only available at the
-top-level of the expressions, combining multiple expressions means that the conditions and multiple
-possibilities also need to trickle up to the top.
+\index{symbolic expression}The update function for predicated expressions is considerably more
+complex than standard update functions for symbolic execution, such as the one used by . The main
+reason for this is the non-recursive nature of predicated expressions. As predicates are only
+available at the top level of the expressions, combining multiple expressions means that the
+conditions and the multiple possibilities they guard also need to trickle up to the top.
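The following OCaml sketch illustrates this trickling up: applying an operator to two flat
predicated expressions forms the product of their guarded alternatives, conjoining the guards at
the top level. The types are simplified assumptions rather than the development's actual
definitions.

\starttyping
(* Flat predicated expressions: guards appear only at the top level, so
   combining two expressions takes the product of their alternatives.
   The types here are simplified assumptions. *)
type pred = Ptrue | Pvar of int | Pand of pred * pred

type expr = Op of string * expr list

(* Each alternative pairs a guard with a plain, predicate-free expression. *)
type pexpr = (pred * expr) list

(* Applying a binary operator to two predicated expressions: every pair of
   alternatives yields one new alternative whose guard is the conjunction
   of the two input guards, so the predicates trickle up to the top. *)
let apply2 (op : string) (a : pexpr) (b : pexpr) : pexpr =
  List.concat_map
    (fun (pa, ea) ->
      List.map (fun (pb, eb) -> (Pand (pa, pb), Op (op, [ea; eb]))) b)
    a
\stoptyping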
Instead, if the predicated instructions were recursive, one would not have to change any predicates
when evaluating the argument lists of instructions, for example, as these can just stay