diff --git a/chapters/scheduling.tex b/chapters/scheduling.tex
new file mode 100644
index 0000000..5edad2e
--- /dev/null
+++ b/chapters/scheduling.tex
@@ -0,0 +1,587 @@
+\startcomponent scheduling
+\project main
+
+\chapter{Static Scheduling}
+
+\section[introduction]{Introduction}
+
+The use of multicore processors has been increasing drastically,
+making parallelising compilers ever more important. In addition, the
+need for custom hardware accelerators is growing, as traditional
+processors are not always the best choice for every application.
+Compilers with optimisations that can automatically parallelise their
+input programs are therefore becoming more important, as back ends for
+both kinds of target benefit from them. For processors, one wants to
+make the best possible use of the pipeline, whereas for hardware one
+can create custom pipelines and exploit the spatial properties that
+hardware offers, making instruction-level parallelism as well as
+speculative execution important. However, more powerful optimisations
+leave more room for mistakes in the compiler, which may then generate
+programs that behave differently from their input. An important step
+towards making compilers more reliable is to formally verify these
+optimisations, so that it can be guaranteed that they do not introduce
+bugs.
+
+A popular processor optimisation that increases the instruction-level
+parallelism of the resulting program is scheduling. This optimisation
+is especially important for processors that have deep instruction
+pipelines or multiple functional units that can execute instructions
+in parallel. However, it is also critical for high-level synthesis
+tools, which generate application-specific hardware designs from a
+software input. Simple scheduling algorithms often work on basic
+blocks, which are sequential lists of instructions without any
+branches. This limits the reordering that the scheduling algorithm can
+perform: it can never move instructions past a branching instruction,
+even when doing so might drastically improve performance. Trace
+scheduling~ is an alternative that addresses this issue by creating
+paths that may cross branching instructions and scheduling the
+instructions along each path. The main problem with trace scheduling,
+however, is that it is often infeasible on large programs, and because
+of the large design space it is sometimes hard to find an optimal
+schedule. Superblock~ and hyperblock~ scheduling are two restrictions
+of trace scheduling that trade some of its flexibility for more
+tractable algorithms. Superblock scheduling produces more efficient
+code for processors with a low issue rate, whereas hyperblock
+scheduling produces more performant code for processors with a high
+issue rate~.
+
+CompCert~ is a compiler that is formally verified in Coq. In addition,
+we have added a formally verified hardware back end to CompCert,
+allowing it to also generate provably correct hardware designs. A
+hardware back end is an ideal target for hyperblock scheduling, as the
+issue rate is practically infinite. This paper describes the
+implementation of a hyperblock instruction scheduler in CompCert, so
+that it can benefit existing CompCert back ends as well as more
+general targets such as custom hardware.
+
+\subsection[key-points]{Key Points}
+
+The goal of this paper is to introduce hyperblock scheduling into
+CompCert to allow for advanced and flexible scheduling operations that
+could be beneficial to various different back ends. To this end, the key
+points of this paper are the following:
+
+\startitemize
+\item
+ Description of how hyperblocks are introduced for improved scheduling
+ of instructions, and how these can be built and translated back to
+ instructions without predicates.
+\item
+  Description of the implementation and correctness proof of a
+  scheduling algorithm based on systems of difference constraints
+  (SDC), which can be adapted to various different scheduling
+  strategies. Hyperblocks and SDC-based scheduling are orthogonal: SDC
+  scheduling only requires blocks that are free of control-flow, so it
+  applies to basic blocks and hyperblocks alike, and conversely any
+  basic block scheduler can act on hyperblocks, in the worst case by
+  simply ignoring the predicates. Moreover, the verified part of the
+  compiler is unaware of SDC altogether, so the unverified scheduler
+  could be swapped for any other algorithm that acts on hyperblocks
+  without changing the correctness proof.
+\stopitemize
+
+\section[sec:scheduling]{Implementation of Hyperblocks in CompCert}
+
+This section describes the structure of hyperblocks in
+Section~\goto{2.1}[scheduling:hyperblocks]. The structure of the two
+extra intermediate languages that were added to CompCert to implement
+hyperblocks is then shown in Section~\goto{2.2}[sec:rtlblockdef].
+
+\subsection[scheduling:hyperblocks]{Hyperblocks as a Generalisation of Basic Blocks}
+
+Basic blocks are popular in intermediate languages because they
+describe a list of sequential instructions that are not separated by
+control-flow. Various analysis passes can exploit this information to
+perform optimisations, as such a sequence of instructions can be
+reordered in any way, provided its data dependencies are respected.
+One such optimisation is scheduling, which reorders instructions to
+eliminate delay slots caused by the CPU pipeline.
+
+However, as control-flow is not allowed in basic blocks, the number of
+instructions that can be scheduled at any one time is limited. The
+boundaries to other basic blocks act as a hard barrier to the
+scheduling algorithm. Hyperblocks~ allow instructions to be
+predicated, thereby introducing some control-flow information into the
+blocks, which the scheduler can also take into account. For example,
+any data dependency between instructions that have mutually exclusive
+predicates can be removed, as these instructions will never be active
+at the same time.
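+
+As a minimal sketch of how such a dependence check might look, the
+following OCaml fragment only reports a write-after-write dependency
+when the two guards can be true at once. The \type{pred_op} type and
+the \type{sat} oracle are illustrative stand-ins, not the definitions
+used in the actual development.
+
+\starttyping
+(* Sketch only: [pred_op] and [sat] are hypothetical stand-ins for the
+   real predicate type and satisfiability check used by the scheduler. *)
+type pred_op =
+  | Ptrue | Pfalse
+  | Pvar of int
+  | Pnot of pred_op
+  | Pand of pred_op * pred_op
+  | Por  of pred_op * pred_op
+
+(* Two guarded instructions can only conflict if their guards can be
+   true at the same time. *)
+let may_both_execute (sat : pred_op -> bool) (p1 : pred_op) (p2 : pred_op) =
+  sat (Pand (p1, p2))
+
+(* A write-after-write pair on the same destination register is only a
+   real dependency if both instructions may execute together. *)
+let waw_dependency sat (p1, dst1) (p2, dst2) =
+  dst1 = dst2 && may_both_execute sat p1 p2
+\stoptyping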
+
+\subsection[sec:rtlblockdef]{RTLBlock and RTLPar Intermediate Language Definition}
+
+Figure~\goto{{[}fig:compcert_interm{]}}[fig:compcert_interm] shows the
+intermediate languages introduced to implement hyperblocks in CompCert.
+This consists of the new RTLBlock and RTLPar intermediate languages,
+which implement the sequential and parallel semantics of basic blocks
+respectively. The semantics of RTLBlock and RTLPar are based on the
+CompCert RTL intermediate language. However, instead of mapping program
+points to single instructions, RTLBlock maps program points to
+hyperblocks, and RTLPar maps program points to parallel hyperblocks,
+which are described in the next sections.
+
+\startformula \begin{aligned}
+  i\ ::=&\ \ \mono{RBnop} \\
+  &|\ \mono{RBop}\ p?\ \mathit{op}\ \vec{r}\ d \\
+  &|\ \mono{RBload}\ p?\ \mathit{chunk}\ \mathit{addr}\ \vec{r}\ d \\
+  &|\ \mono{RBstore}\ p?\ \mathit{chunk}\ \mathit{addr}\ \vec{r}\ s \\
+  &|\ \mono{RBsetpred}\ p?\ c\ \vec{r}\ d \\
+\end{aligned} \stopformula
+
+\startformula \begin{aligned}
+  i_{\mathit{cf}}\ ::=&\ \ \mono{RBcall}\ \mathit{sig}\ f\ \vec{r}\ d\ n \\
+  &|\ \mono{RBtailcall}\ \mathit{sig}\ f\ \vec{r} \\
+  &|\ \mono{RBbuiltin}\ f_{\mathit{ext}}\ \vec{r}\ r\ n \\
+  &|\ \mono{RBcond}\ c\ \vec{r}\ n_{1}\ n_{2} \\
+  &|\ \mono{RBjumptable}\ r\ \vec{n} \\
+  &|\ \mono{RBreturn}\ r? \\
+  &|\ \mono{RBgoto}\ n \\
+  &|\ \mono{RBpred\_{cf}}\ P\ i_{\mathit{cf}_{1}}\ i_{\mathit{cf}_{2}} \\
+\end{aligned} \stopformula
+
+RTLBlock instructions are split into two types: standard instructions
+and control-flow instructions. Standard instructions are the
+instructions that can be placed into hyperblocks, whereas control-flow
+instructions are the instructions that can end hyperblocks. The
+standard instructions are shown in
+Figure~\goto{{[}fig:standard_instruction{]}}[fig:standard_instruction],
+whereas the control-flow instructions are shown in
+Figure~\goto{{[}fig:control_flow_instruction{]}}[fig:control_flow_instruction].
+Most instructions are quite similar to their RTL counterparts; however,
+some instructions have been added. \type{RBsetpred} is a standard
+instruction which sets a predicate to the value of an evaluated
+condition. This instruction is used during if-conversion to set the
+predicate to the value of the condition in the conditional statement.
+To support predicated execution, each instruction also carries an
+optional predicate which has been set beforehand; this is what turns a
+block into a hyperblock. In addition, there is an extra control-flow
+instruction called \type{RBpred_cf}, which branches on a predicate and
+nests further control-flow instructions inside of it. This is
+necessary for if-conversion when converting an already converted
+conditional statement, as depending on the predicate, a different
+control-flow instruction may be necessary.
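+
+As a rough illustration of this split, the two instruction forms could
+be rendered as the following OCaml datatypes, reusing the
+\type{pred_op} sketch from
+Section~\goto{2.1}[scheduling:hyperblocks]. The payload types are
+simplified placeholders rather than the definitions used in the
+development.
+
+\starttyping
+(* Sketch: payload types collapsed to strings/ints for illustration. *)
+type reg = int and node = int and pred = int
+type op = string and chunk = string and addressing = string
+type condition = string
+
+type instr =                              (* standard instructions *)
+  | RBnop
+  | RBop      of pred_op option * op * reg list * reg
+  | RBload    of pred_op option * chunk * addressing * reg list * reg
+  | RBstore   of pred_op option * chunk * addressing * reg list * reg
+  | RBsetpred of pred_op option * condition * reg list * pred
+
+type cf_instr =                           (* control-flow instructions *)
+  | RBcall      of string * string * reg list * reg * node
+  | RBtailcall  of string * string * reg list
+  | RBbuiltin   of string * reg list * reg * node
+  | RBcond      of condition * reg list * node * node
+  | RBjumptable of reg * node list
+  | RBreturn    of reg option
+  | RBgoto      of node
+  | RBpred_cf   of pred_op * cf_instr * cf_instr
+\stoptyping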
+
+These instructions are used in RTLBlock as well as in RTLPar. The main
+difference between the two languages is how the instructions are
+arranged within the hyperblock, and the execution semantics of the
+hyperblock in each language. Let $\mathit{hb}_{b}$ be the definition
+of a hyperblock for RTLBlock, and let $\mathit{hb}_{p}$ be the
+definition of a hyperblock for RTLPar; then $\mathit{hb}_{b}$ and
+$\mathit{hb}_{p}$ are defined as in
+Figure~\goto{{[}fig:hb_def{]}}[fig:hb_def]. The figure shows the
+different nesting levels of the blocks in RTLBlock as well as RTLPar,
+and where the parallel semantics of RTLPar come into play. An RTLBlock
+hyperblock is a list of instructions together with a control-flow
+instruction that ends the hyperblock; each instruction is executed
+sequentially. An RTLPar hyperblock is made of three nested lists. The
+outer list behaves like the list in RTLBlock and executes each of its
+elements sequentially. Each such element is a list whose contents
+execute in parallel, meaning every member uses the same state of the
+registers and memory while executing. Finally, instead of directly
+containing instructions, there is an additional layer of sequentially
+executed lists of instructions, so that small groups of instructions
+can be executed in sequence while still running in parallel with the
+other instructions in the current parallel group.
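+
+The difference in nesting between the two languages can be summarised
+by the following type sketch, reusing \type{instr} and \type{cf_instr}
+from above; the record field names are illustrative.
+
+\starttyping
+(* Sketch of the two hyperblock shapes. For RTLPar:
+   outer list  = sequential steps (clock cycles),
+   middle list = groups running in parallel within a step,
+   inner list  = a chain executed sequentially in one parallel slot. *)
+type hb_b = { b_body : instr list;           b_cf : cf_instr }
+type hb_p = { p_body : instr list list list; p_cf : cf_instr }
+\stoptyping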
+
+RTLPar is structured in this way to give the scheduler more
+flexibility: it can not only parallelise operations in the block, but
+also define sequential operations within the parallel groups. This is
+especially useful for the hardware back end, as sequential operations
+that do not quite fill one clock cycle can be chained together, while
+still executing in parallel with other independent operations. This
+optimisation is called operation chaining, and it is critical for
+getting the most performance out of the hardware.
+
+\subsection[translating-rtlblock-to-rtlpar]{Translating RTLBlock to RTLPar}
+
+The translation from RTLBlock to RTLPar is performed using a scheduling
+algorithm, which takes hyperblocks and rearranges their instructions
+to maximise parallelism based on their data dependencies. It then
+arranges the instructions in RTLPar by putting independent
+instructions into the same parallel block. Scheduling is an
+optimisation that is well suited to translation validation, as the
+algorithm itself can be quite complex, whereas the output is normally
+relatively easy to check for equivalence with the input.
+Figure~\goto{{[}fig:compcert_interm{]}}[fig:compcert_interm] shows that
+the scheduling step comes right after the if-conversion step, which
+originally creates the hyperblocks.
+
+Figure~\goto{{[}fig:op_chain{]}}[fig:op_chain] shows how the scheduling
+step transforms RTLBlock into RTLPar. The RTLBlock hyperblock being
+scheduled is shown in
+Figure~\goto{{[}fig:op_chain_a{]}}[fig:op_chain_a], which contains five
+predicated operations, comprising two additions and three
+multiplications. The data dependencies of the instructions in the
+hyperblock are shown in
+Figure~\goto{{[}fig:op_chain_b{]}}[fig:op_chain_b]. Curiously, two of
+the operations would normally depend on each other due to a
+write-after-write conflict, but because their predicates are mutually
+exclusive, the conflict can effectively be removed by the scheduler.
+
+The scheduler transforms the RTLBlock hyperblock into the RTLPar
+hyperblock shown in Figure~\goto{{[}fig:op_chain_c{]}}[fig:op_chain_c].
+Even though one addition depends on the other, both can still be
+placed in the same cycle, as additions do not take as long to complete
+as multiplications. This optimisation is called operation chaining.
+The independent multiplication can also be placed in the same clock
+cycle, as it has no data dependencies. In the next clock cycle, only
+one of the two remaining multiplications will execute, as their
+predicates are mutually exclusive, so these two multiplications can
+also be placed in the same clock cycle. This gives the final schedule
+that is shown in Figure~\goto{{[}fig:op_chain_d{]}}[fig:op_chain_d].
+
+\section[abstr_interp]{Abstract Interpretation of Hyperblocks}
+
+The correctness of the scheduling algorithm is checked by comparing the
+symbolic expressions of each register before and after scheduling. This
+section describes how these symbolic expressions are constructed for
+each hyperblock.
+
+\subsection[abstr_language]{Construction of Abstract Language}
+
+The main difficulty when constructing symbolic expressions for
+hyperblocks is the presence of predicates, which can be used by the
+scheduling algorithm to reorder instructions further, even if these
+contain data dependencies. The scheduling algorithm will manipulate
+predicates in the following ways, which the comparison function needs to
+take into account.
+
+\startitemize
+\item
+ Instructions with predicates that are equivalent to false can be
+ removed.
+\item
+ Instructions with predicates that are mutually exclusive can be
+ executed in parallel.
+\item
+ Any predicate can be replaced with an equivalent predicate.
+\stopitemize
+
+There are three main resources present in symbolic expressions:
+registers $r$, which represent the values of the registers in the
+program; predicates $p$, which represent the values of the predicates
+throughout the program; and finally, memory \type{Mem}, representing
+the state of the memory. For each standard instruction in the
+hyperblock, there is an equivalent symbolic expression which modifies
+the state of the registers, predicates or memory. Finally, as each
+instruction in the hyperblock can be predicated, each symbolic
+expression is also paired with a predicate.
+
+If the same resource is assigned multiple times by instructions with
+different predicates, the symbolic expression must express this
+conditional result in terms of the predicates. This is achieved by
+assigning a list of predicated symbolic expressions to each resource,
+where each predicate must be mutually exclusive with the other
+predicates in the list.
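+
+Such a predicated value can be sketched as an association list from
+guard predicates to values, reusing the \type{pred_op} type from the
+earlier sketch; this rendering is for illustration only.
+
+\starttyping
+(* Sketch: a predicated type pairs each value with its guard. The
+   guards within one list are expected to be mutually exclusive. *)
+type 'a pred_expr = (pred_op * 'a) list
+
+(* e.g. a register written under complementary guards p1 and !p1:
+   [ (Pvar 1, e1); (Pnot (Pvar 1), e2) ] *)
+\stoptyping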
+
+The expressions are then constructed using an update function, which
+updates the symbolic expression assigned to each resource. This is
+done using multiple primitives that act on predicated types, which are
+lists of pairs of a predicate and an element of the underlying type.
+The first important primitive is the multiplication of two predicated
+types, which performs a cartesian product of the two lists, conjoining
+the two predicates and joining the two elements with a given function
+$f$:
+
+\startformula \label{eq:1}
+  P_{1} \otimes_{f} P_{2} \equiv \mono{map } (\lambda ((p_{1}, e_{1}), (p_{2}, e_{2})) . (p_{1} \land p_{2}, f\ e_{1}\ e_{2}))\ (P_{1} \times P_{2}) \stopformula
+
+In addition, another primitive that is needed is the following append
+operation, which, given a predicate $p$, conjoins $\neg p$ with every
+predicate in the first predicated type and $p$ with every predicate in
+the second, and then appends the two lists:
+
+\startformula \label{eq:2}
+  \mu(p, P) \equiv \mono{map } (\lambda (p', e') . (p \land p', e'))\ P \stopformula
+
+\startformula \label{eq:3}
+  P_{1} \oplus_{p} P_{2} \equiv \mu(\neg p, P_{1}) \mathbin{++} \mu(p, P_{2}) \stopformula
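+
+A direct OCaml transcription of these three primitives might look as
+follows; this is a sketch under the same assumed \type{pred_op} and
+\type{pred_expr} types as before, not the Coq definitions themselves.
+
+\starttyping
+(* P1 (x)_f P2: cartesian product, conjoining the guards. *)
+let product (f : 'a -> 'b -> 'c) (xs : (pred_op * 'a) list)
+    (ys : (pred_op * 'b) list) : (pred_op * 'c) list =
+  List.concat_map
+    (fun (p1, e1) -> List.map (fun (p2, e2) -> (Pand (p1, p2), f e1 e2)) ys)
+    xs
+
+(* mu(p, P): conjoin p onto every guard in P. *)
+let mu (p : pred_op) (xs : (pred_op * 'a) list) : (pred_op * 'a) list =
+  List.map (fun (p', e') -> (Pand (p, p'), e')) xs
+
+(* P1 (+)_p P2 = mu(!p, P1) ++ mu(p, P2). *)
+let append (p : pred_op) (xs : (pred_op * 'a) list) (ys : (pred_op * 'a) list) =
+  mu (Pnot p) xs @ mu p ys
+\stoptyping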
+
+\subsection[example-of-translation]{Example of Translation}
+
+\startformula \begin{aligned}
+  \label{eq:4}
+  &\mono{r1} =
+  \begin{cases}
+    \mono{r1}^{0} + \mono{r4}^{0} + \mono{r4}^{0}, \quad &\mono{p1} \\
+    \mono{r1}^{0}, \quad &\mono{!p1} \\
+  \end{cases}\\
+  &\mono{r2} = \mono{r1}^{0} + \mono{r4}^{0}\\
+  &\mono{r3} =
+  \begin{cases}
+    \left( \mono{r1}^{0} \times \mono{r1}^{0} \right) \times \left( \mono{r1}^{0} \times \mono{r1}^{0} \right),\quad &\mono{!p2 \&\& !p1}\\
+    \mono{r1}^{0} \times \mono{r4}^{0},\quad &\mono{p2 \&\& !p1}\\
+    \left( \mono{r1}^{0} + \mono{r4}^{0} + \mono{r4}^{0} \right) \times \mono{r4}^{0},\quad &\mono{!p2 \&\& p1}\\
+    \mono{r3}^{0} \times \mono{r3}^{0},\quad &\mono{p2 \&\& p1}
+  \end{cases}
+\end{aligned} \stopformula
+
+Using the example shown in
+Figure~\goto{{[}fig:op_chain{]}}[fig:op_chain], the RTLBlock hyperblock
+shown in Figure~\goto{{[}fig:op_chain_a{]}}[fig:op_chain_a] is scheduled
+into the RTLPar hyperblock shown in
+Figure~\goto{{[}fig:op_chain_c{]}}[fig:op_chain_c]. The first step is to
+translate the input and output of the scheduler into the Abstr
+intermediate language by calculating the symbolic values that will be
+stored in each of the registers. Symbolically evaluating the RTLBlock
+hyperblock results in the expressions shown in
+Figure~\goto{{[}fig:symbolic_expressions{]}}[fig:symbolic_expressions]
+for registers \type{r1}, \type{r2} and \type{r3}, where the state of the
+register \type{r1} at the start of the hyperblock is denoted as
+$\mono{r1}^{0}$.
+
+These symbolic expressions are generated by sequentially going through
+the list of predicated instructions and applying the update function
+defined in Section~\goto{3.1}[abstr_language]. The resulting
+predicated expressions are guaranteed to have mutually exclusive
+predicates, meaning that if one predicate evaluates to true, all other
+predicates must evaluate to false.
+
+\subsection[linear-flat-predicated-expressions]{Linear (Flat) Predicated Expressions}
+
+The update function for predicated expressions is quite a bit more
+complex than standard update functions for symbolic execution, such as
+the one used by~. The main reason for this is the non-recursive nature
+of the predicated expressions. As predicates are only available at the
+top level of an expression, combining multiple expressions means that
+the conditions and the multiple resulting possibilities also need to
+propagate up to the top level.
+
+If the predicated expressions were recursive instead, the predicates
+would not have to be changed when evaluating the argument lists of
+instructions, for example, as the arguments could simply remain
+predicated. The evaluation semantics would also be much simpler,
+because a rule could be added for evaluating predicates in place.
+
+\section[sec:correctness]{Correctness of the Abstract Interpretation}
+
+The correctness of the scheduling step depends on the correctness of
+the equivalence check between the two abstract programs. As these
+contain predicates, the check cannot be a purely syntactic comparison,
+because the predicates might be syntactically different but logically
+equivalent. Instead, a mixed equivalence check is performed, using a
+verified SAT solver for the predicates and a syntactic equivalence
+check for the actual symbolic expressions.
+
+\subsection[comparing-symbolic-abstract-expressions]{Comparing Symbolic Abstract Expressions}
+
+Abstract expressions comprise a predicate and a symbolic expression.
+The translation from RTLBlock and RTLPar to Abstr ensures that each
+register contains a predicated symbolic expression describing the
+value that will be assigned to it after evaluating the block. This
+means that if an expression always produces the same result as another
+expression, then the corresponding register will always contain the
+same value. If the RTLBlock program and the RTLPar program translate
+to two Abstr programs in which the symbolic expressions of all
+registers match, then both programs will always produce the same
+values.
+
+The main comparison between two symbolic expressions is a simple
+syntactic comparison, as this ensures that the two expressions always
+evaluate to the same value, and it is general enough for the
+expressions that need to be compared. However, this simple syntactic
+comparison cannot be used for the predicates. Predicated instructions
+can be executed in many different orders, and the predicates are used
+to calculate dependence information and therefore also determine the
+order of operations, so syntactically different but logically
+equivalent predicates must be recognised as equal.
+
+Instead, predicates are compared by checking their logical equivalence
+using a formally verified SAT solver. This ensures that the predicates
+will always evaluate to the same value, thereby making the comparison
+sound.
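+
+Concretely, checking logical equivalence of two predicates reduces to
+two unsatisfiability queries, as in this sketch, where \type{sat}
+again stands in for the verified solver:
+
+\starttyping
+(* p <-> q  iff  neither p /\ !q nor !p /\ q is satisfiable. *)
+let pred_equiv (sat : pred_op -> bool) (p : pred_op) (q : pred_op) =
+  not (sat (Pand (p, Pnot q))) && not (sat (Pand (Pnot p, q)))
+\stoptyping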
+
+\subsection[verified_sat]{Verified Decidable DPLL SAT Solver}
+
+A SAT solver is needed to correctly and reliably compare predicated
+expressions. This makes it possible to compare many different possible
+schedules, even ones in which unnecessary expressions have been
+removed entirely. As the SAT solver is used inside the verifier, which
+itself must be proven correct in Coq to prove the scheduling
+translation correct, the SAT solver has to be verified as well;
+otherwise it could not be used in the proof of correctness.
+
+The SAT solver implements the DPLL algorithm and therefore takes its
+input as a formula in conjunctive normal form (CNF). The conversion
+from a recursive predicate type into a CNF formula is done by
+recursively expanding the predicate into equivalent CNF formulas.
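+
+The following sketch shows one such expansion for predicates in
+negation normal form, where negation has been pushed into the
+literals; the types and names are illustrative. The naive distribution
+of disjunctions over conjunctions is worst-case exponential, which is
+tolerable here because the predicates in a hyperblock stay small.
+
+\starttyping
+type nnf_pred =
+  | Plit of bool * int                (* polarity, predicate variable *)
+  | Pconj of nnf_pred * nnf_pred
+  | Pdisj of nnf_pred * nnf_pred
+
+type cnf = (bool * int) list list     (* conjunction of clauses *)
+
+let rec cnf_of_pred : nnf_pred -> cnf = function
+  | Plit (b, v) -> [ [ (b, v) ] ]
+  | Pconj (a, b) -> cnf_of_pred a @ cnf_of_pred b
+  | Pdisj (a, b) ->
+    (* (/\ ca) \/ (/\ cb) = /\ { c1 \/ c2 | c1 in ca, c2 in cb } *)
+    let ca = cnf_of_pred a and cb = cnf_of_pred b in
+    List.concat_map (fun c1 -> List.map (fun c2 -> c1 @ c2) cb) ca
+\stoptyping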
+
+The SAT solver is implemented in Coq and proven to be sound and
+complete. The soundness and completeness of the satisfiability check
+directly imply that the check is decidable, which simplifies many Coq
+proofs: predicate equivalence can be registered as a \type{Setoid}
+instance, so that equivalent predicates can be rewritten using many of
+the existing Coq tactics.
+
+\subsection[encoding-predicated-expressions-as-sat-formulas]{Encoding Predicated Expressions as SAT Formulas}
+
+Given the following two predicated expression lists, which should be
+tested for equivalence, one can construct a SAT query which ensures
+that both lists always result in the same value.
+
+\startformula \begin{aligned}
+  \bgroup \left.\begin{aligned}
+  e_{1},\qquad &p_{1}\\
+  e_{2},\qquad &p_{2}\\
+  \vdots\quad& \\
+  e_{n},\qquad &p_{n}\\
+  \end{aligned}\right\rbrace\egroup
+  \Longleftrightarrow
+  \begin{cases}
+  e_{1}',\quad &p_{1}'\\
+  e_{2}',\quad &p_{2}'\\
+  \;\;\;\quad\vdots \\
+  e_{m}',\quad &p_{m}'\\
+  \end{cases}\end{aligned} \stopformula
+
+The SAT query that should determine if two predicated expressions are
+equivalent is the following, where $\mathit{he}_{i}$ is the hashed
+expression of $e_{i}$:
+
+\startformula \begin{gathered}
+  (p_{1} \implies \mathit{he}_{1}) \land (p_{2} \implies \mathit{he}_{2}) \land \cdots \land (p_{n} \implies \mathit{he}_{n}) \\
+  \Longleftrightarrow\\
+  (p_{1}' \implies \mathit{he}_{1}') \land (p_{2}' \implies \mathit{he}_{2}') \land \cdots \land (p_{m}' \implies \mathit{he}_{m}') \\\end{gathered} \stopformula
+
+However, reasoning about the equivalence of individual expressions
+within the larger list of predicated expressions is not quite so
+straightforward. The evaluation semantics of a predicated expression
+list evaluates each predicate sequentially and picks the expression
+whose predicate evaluates to true. To prove that both predicated
+expression lists return the same value under the assumption that the
+SAT query above holds, one needs to show, for each predicate that is
+true, that there is a predicate in the other list that is also true
+and whose expression returns the same value. However, there is no
+guarantee that a unique predicated expression matches the current
+predicate, as the list may contain duplicate expressions with
+different predicates.
+
+This makes it quite difficult to prove that the SAT query implies the
+semantic preservation of the scheduling transformation. The SAT query
+is therefore broken down into a slightly more direct comparison that
+is more useful for the proof itself. The steps of this more
+fine-grained comparison are the following:
+
+\startitemize[n]
+\item
+  Hash each of the expressions into a unique literal.
+\item
+  Create a map from expression literals to their corresponding
+  predicates, thereby obtaining a unique predicate for each
+  expression. If the list that forms the map contains duplicate
+  expressions, their predicates are combined using the or operation.
+\item
+  Iterate through the expressions in both maps and compare the
+  predicates using the SAT solver. If an expression is not present in
+  one of the maps, then its predicate should be equivalent to $\perp$.
+\stopitemize
+
+This comparison is logically equivalent to the large SAT query;
+however, it maps better to the proof because it compares the
+predicates of identical expressions directly.
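+
+An OCaml sketch of this three-step comparison is shown below;
+\type{hash} and \type{pred_equiv} stand for the expression hashing
+function and the SAT-based predicate equivalence from earlier, and the
+\type{pred_op} constructors are as in the previous sketches.
+
+\starttyping
+module Imap = Map.Make (Int)
+
+(* Build a map from hashed expressions to their (or-ed) predicates. *)
+let map_of hash l =
+  List.fold_left
+    (fun m (p, e) ->
+      let h = hash e in
+      match Imap.find_opt h m with
+      | None -> Imap.add h p m
+      | Some q -> Imap.add h (Por (p, q)) m)  (* duplicate: or the guards *)
+    Imap.empty l
+
+(* Two predicated expression lists are equivalent if, for every
+   expression, the guards in both maps are logically equivalent; a
+   missing expression must have a guard equivalent to Pfalse. *)
+let check pred_equiv hash l1 l2 =
+  let m1 = map_of hash l1 and m2 = map_of hash l2 in
+  let covered m m' =
+    Imap.for_all
+      (fun h p ->
+        match Imap.find_opt h m' with
+        | Some q -> pred_equiv p q
+        | None -> pred_equiv p Pfalse)
+      m
+  in
+  covered m1 m2 && covered m2 m1
+\stoptyping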
+
+\subsection[proof-of-equivalence-between-passes]{Proof of equivalence between passes}
+
+Figure~\goto{{[}fig:compcert_interm{]}}[fig:compcert_interm] shows the
+four different passes that were added to CompCert to implement
+hyperblocks. This section covers the proofs for each of the
+translations, focusing mainly on the proof of the scheduling pass.
+
+\subsubsection[proof-of-correctness-of-the-if-conversion-pass]{Proof of Correctness of the
+ If-Conversion Pass}
+
+The if-conversion pass is quite simple to verify, as it is an
+optimisation that transforms the code within the same language.
+
+\subsubsection[proof-of-correctness-of-scheduling]{Proof of Correctness of Scheduling}
+
+The scheduling pass is formally verified using translation validation,
+with the Abstr language as the point of comparison between RTLBlock
+and RTLPar. Using the algorithm defined in
+Section~\goto{3}[abstr_interp], the formally verified SAT solver
+described in Section~\goto{4.2}[verified_sat] can be used to prove
+that if the two predicated expression lists are equivalent, then they
+will always evaluate to the same value.
+
+\section[implementation-details-of-static-sdc-scheduling]{Implementation Details of Static SDC
+ Scheduling}
+
+This section describes the implementation details of the static SDC
+scheduling algorithm that is used, as well as some details of the
+basic block generation and the if-conversion pass that generates the
+predicated instructions.
+
+\subsection[implementation-of-if-conversion]{Implementation of If-Conversion}
+
+If-conversion is the transformation that introduces the predicated
+instructions which make up hyperblocks. It converts conditional
+statements into predicated instructions. This transformation has a few
+limitations on the kind of conditional statements that it can
+translate. First, only conditional statements without loops can be
+translated, so cycles in the control-flow must be identified before
+performing the if-conversion. Secondly, if-conversion will not always
+result in more efficient code, so it should not be applied to every
+conditional statement. Instead, it is best applied to conditional
+statements whose branches take a similar amount of time to execute. A
+minimal sketch of the core conversion step is shown after this
+paragraph.
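+
+The following OCaml fragment sketches the conversion of a two-sided
+conditional with straight-line branches into a single block of
+predicated instructions, reusing the instruction sketch from
+Section~\goto{2.2}[sec:rtlblockdef]; \type{fresh_pred} and
+\type{predicate_with} are hypothetical helpers that allocate a new
+predicate register and attach a guard to an unpredicated instruction.
+
+\starttyping
+(* Sketch: [fresh_pred] and [predicate_with] are hypothetical helpers. *)
+let if_convert (c : condition) (args : reg list)
+    (then_b : instr list) (else_b : instr list) : instr list =
+  let p = fresh_pred () in
+  RBsetpred (None, c, args, p)                         (* p := c(args) *)
+  :: List.map (predicate_with (Pvar p)) then_b         (* runs if p    *)
+  @  List.map (predicate_with (Pnot (Pvar p))) else_b  (* runs if !p   *)
+\stoptyping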
+
+For back ends that do not support predicates, it is also important to
+eliminate the predicates again, using a process called reverse
+if-conversion, which creates branches out of the predicates and groups
+the instructions back together. This allows hyperblock scheduling to
+be used even with back ends that do not support predicates, which
+includes most of the default back ends in CompCert.
+
+\subsection[static-sdc-scheduling]{Static SDC Scheduling}
+
+Systems of difference constraints (SDC) scheduling~ is a flexible
+approach to scheduling instructions, especially when the target is
+hardware. It allows flexible definitions of the optimisation objective
+in terms of the latencies and dependencies of the operations in the
+block.
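+
+For instance, if $s_{i}$ denotes the start cycle of operation $i$, a
+data dependency where operation $j$ uses the result of operation $i$,
+which has latency $l_{i}$, is encoded as the difference constraint
+
+\startformula s_{j} - s_{i} \geq l_{i} \stopformula
+
+Because every constraint relates only the difference of two start
+times, the constraint matrix is totally unimodular, so solving the
+resulting linear program always yields an integral schedule; the
+objective function can then be chosen to optimise, for example, for
+latency or for throughput.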
+
+\section[performance-comparison]{Performance Comparison}
+
+\section[sec:related-work]{Related Work}
+
+\section[conclusion]{Conclusion}
+
+This material is based upon work supported by the under Grant No.~ and
+Grant No.~. Any opinions, findings, and conclusions or recommendations
+expressed in this material are those of the author and do not
+necessarily reflect the views of the National Science Foundation.
+
+\stopcomponent