\startcomponent scheduling
\project main

\chapter{Static Scheduling}

\section[introduction]{Introduction}

The use of multicore processors has been increasing drastically, thereby
making parallelising compilers ever more important. In addition, the
need for custom hardware accelerators is also increasing, as traditional
processors are not always the best choice for all applications.
Compilers that support optimisations which can automatically parallelise
their input programs are therefore also becoming more important, as back
ends for both targets benefit from them. For processors, one wants to
optimise pipeline usage as much as possible, whereas in hardware one can
create custom pipelines and make more use of the spatial properties that
hardware offers, making instruction parallelism as well as speculative
execution important. However, with more powerful optimisations there is
a greater chance of making mistakes in the compiler, making it possible
to generate programs that behave differently from the input program. An
important step towards making compilers more reliable is to formally
verify these optimisations so that it can be guaranteed that bugs are
not introduced.

A popular optimisation for processors that increases the instruction
parallelism of the resulting program is scheduling. This optimisation is
especially important for processors that have deep instruction pipelines
or multiple functional units that can be used to execute instructions in
parallel. It is also critical for high-level synthesis tools, which
generate application-specific hardware designs from a software input.
Simple scheduling algorithms often work on basic blocks, which are
sequential lists of instructions without any branches. However, this
limits the kind of reordering the scheduling algorithm can perform: it
can never move instructions past a branching instruction, even when this
might drastically improve the performance. Trace scheduling~ is an
alternative that addresses this issue by creating paths that may cross
branching instructions, and scheduling the instructions in each path.
However, the main problem with trace scheduling is that it is often
infeasible on large programs, and because of the large design space, it
is sometimes hard to find an optimal schedule. Superblock~ and
hyperblock~ scheduling are two restricted forms of trace scheduling,
which compromise on the flexibility of trace scheduling to instead build
more tractable algorithms. Superblock scheduling produces more efficient
code for processors with a low issue rate, whereas hyperblock scheduling
produces more performant code for processors with a high issue rate~.

CompCert~ is a compiler that is formally verified in Coq. In addition,
we have also added a formally verified hardware back end to CompCert,
allowing it to also generate provably correct hardware designs. A
hardware back end is an ideal target for hyperblock scheduling, as the
issue rate is practically infinite. This paper describes the
implementation of a hyperblock instruction scheduler in CompCert, so
that it can benefit existing CompCert back ends as well as support more
general targets such as scheduling instructions for custom hardware.
\subsection[key-points]{Key Points}

The goal of this paper is to introduce hyperblock scheduling into
CompCert to allow for advanced and flexible scheduling operations that
could be beneficial to various different back ends. To this end, the
key points of this paper are the following:

\startitemize
\item
  Description of how hyperblocks are introduced for improved scheduling
  of instructions, and how these can be built and translated back to
  instructions without predicates.
\item
  Description of the implementation and correctness proof of a system
  of difference constraints (SDC) scheduling algorithm, which can be
  adapted to various different scheduling strategies. {\bf {[}{[}JW:}
  What's the link between hyperblocks and SDC-based scheduling? Are
  these orthogonal?{\bf {]}{]}} {\bf {[}{[}YH:} Well, the original SDC
  papers only talk about basic blocks, not SDC hyperblock scheduling.
  But it's easy enough to adapt; SDC just requires some kind of block
  without control-flow.{\bf {]}{]}} {\bf {[}{[}JW:} Ok cool, so you can
  have SDC scheduling without hyperblocks. Can you have hyperblocks
  without SDC scheduling? (I guess so, but just wanted to be completely
  sure.){\bf {]}{]}} {\bf {[}{[}YH:} Yes, exactly: you can perform any
  type of basic block scheduling on hyperblocks as well, as long as you
  mark the dependencies correctly. In the worst case you could even
  treat hyperblocks as basic blocks and ignore the predicates; it would
  still be correct, just less efficient.{\bf {]}{]}}

  {\bf {[}{[}JW:} It would be interesting to make the case for
  SDC-based scheduling. Is it superior to other scheduling mechanisms?
  Is it so much superior that it's worth the additional complexity,
  compared to, say, list scheduling? {\bf {]}{]}} {\bf {[}{[}YH:} SDC
  scheduling is more flexible in the type of constraint to optimise for
  (latency / throughput). I don't think it's too meaningful for
  compilers, where there isn't as much flexibility.{\bf {]}{]}}
  {\bf {[}{[}JW:} Thanks. Does Vericert have any idea that the
  scheduling is being performed using SDC? Or is the SDC stuff entirely
  internal to your unverified OCaml code? That is, could you swap your
  scheduler with a different one that isn't based on SDC, and not have
  to change anything in Vericert?{\bf {]}{]}} {\bf {[}{[}YH:} The
  verified part doesn't know about SDC at all, so it could be changed
  to any other scheduler that acts on hyperblocks.{\bf {]}{]}}
\stopitemize

\section[sec:scheduling]{Implementation of Hyperblocks in CompCert}

This section describes the structure of hyperblocks in
Section~\goto{2.1}[scheduling:hyperblocks]. Then, the structure of the
two extra intermediate languages that were added to CompCert to
implement hyperblocks is shown in Section~\goto{2.2}[sec:rtlblockdef].

\subsection[scheduling:hyperblocks]{Hyperblocks as a Generalisation of Basic Blocks}

Basic blocks are popular in intermediate languages because they
describe a list of sequential instructions that are not separated by
control-flow. This information can be used by various analysis passes
to perform optimisations, as these sequences of instructions can be
reordered in any way as long as their data dependencies are met. One
such optimisation is scheduling, which reorders instructions to
eliminate any delay slots caused by the CPU pipeline.
However, since control-flow is not allowed in basic blocks, the number
of instructions that can be scheduled at any one time is limited. The
boundaries to other basic blocks act as a hard boundary for the
scheduling algorithm. Hyperblocks~ allow instructions to be predicated,
thereby introducing some control-flow information into the blocks,
which can also be taken into account by the scheduler. For example, any
data dependency between instructions that have mutually exclusive
predicates can be removed, as these instructions will never be active
at the same time.

\subsection[sec:rtlblockdef]{RTLBlock and RTLPar Intermediate Language Definition}

Figure~\goto{{[}fig:compcert_interm{]}}[fig:compcert_interm] shows the
intermediate languages introduced to implement hyperblocks in CompCert.
These are the new RTLBlock and RTLPar intermediate languages, which
implement the sequential and parallel semantics of basic blocks
respectively. The semantics of RTLBlock and RTLPar are based on the
CompCert RTL intermediate language. However, instead of mapping from
states to instructions, RTLBlock maps from states to hyperblocks, and
RTLPar maps from states to parallel hyperblocks, which will be
described in the next sections.

%\startformula \begin{aligned}
% i\ ::=&\ \ \mono{RBnop} \\
% &|\ \mono{RBop}\ p?\ \mathit{op}\ \vec{r}\ d \\
% &|\ \mono{RBload}\ p?\ \mathit{chunk}\ \mathit{addr}\ \vec{r}\ d \\
% &|\ \mono{RBstore}\ p?\ \mathit{chunk}\ \mathit{addr}\ \vec{r}\ s \\
% &|\ \mono{RBsetpred}\ p?\ c\ \vec{r}\ d \\
% \end{aligned} \stopformula
%
%\startformula \begin{aligned}
% i_{\mathit{cf}}\ ::=&\ \ \mono{RBcall}\ \mathit{sig}\ f\ \vec{r}\ d\ n \\
% &|\ \mono{RBtailcall}\ \mathit{sig}\ f\ \vec{r} \\
% &|\ \mono{RBbuiltin}\ f_{\mathit{ext}}\ \vec{r}\ r\ n \\
% &|\ \mono{RBcond}\ c\ \vec{r}\ n_{1}\ n_{2} \\
% &|\ \mono{RBjumptable}\ r\ \vec{n} \\
% &|\ \mono{RBreturn}\ r? \\
% &|\ \mono{RBgoto}\ n \\
% &|\ \mono{RBpred\_{cf}}\ P\ i_{\mathit{cf}_{1}}\ i_{\mathit{cf}_{2}} \\
% \end{aligned} \stopformula

RTLBlock instructions are split into two types: standard instructions
and control-flow instructions. The standard instructions are the
instructions that can be placed into hyperblocks, whereas control-flow
instructions are the instructions that can end hyperblocks. The
standard instructions are shown in
Figure~\goto{{[}fig:standard_instruction{]}}[fig:standard_instruction],
whereas the control-flow instructions are shown in
Figure~\goto{{[}fig:control_flow_instruction{]}}[fig:control_flow_instruction].
Most instructions are quite similar to their RTL counterparts; however,
some instructions have been added. \type{RBsetpred} is a standard
instruction which sets a predicate equal to an evaluated condition.
This instruction is used during if-conversion to set the predicate to
the value of the condition in the conditional statement instruction. To
support proper predicated instructions, each standard instruction also
contains an optional predicate which has been set beforehand, which is
what turns a block into a hyperblock. In addition to that, there is
also an extra control-flow instruction called \type{RBpred_cf}, which
can branch on a predicate and nests two further control-flow
instructions inside of it. This is also necessary for if-conversion,
when converting an already converted conditional statement, as
depending on the predicate, a different control-flow instruction may be
necessary.
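For illustration, the grammar in the commented formulas above
corresponds roughly to the following OCaml-style datatypes. This is
only a sketch: the placeholder types stand in for the corresponding
CompCert definitions, and the names are not the verbatim Coq ones from
the development.

\starttyping
(* Illustrative rendering of the RTLBlock instruction grammar. *)
type reg = int                  (* pseudo-register *)
type node = int                 (* CFG node label *)
type operation = string         (* placeholders for CompCert types *)
type condition = string
type chunk = string
type addressing = string
type signature = string
type func_ref = string
type external_function = string

type pred =
  | Plit of bool * int          (* possibly negated predicate register *)
  | Pand of pred * pred
  | Por of pred * pred

(* Standard instructions; each carries an optional predicate. *)
type instr =
  | RBnop
  | RBop of pred option * operation * reg list * reg
  | RBload of pred option * chunk * addressing * reg list * reg
  | RBstore of pred option * chunk * addressing * reg list * reg
  | RBsetpred of pred option * condition * reg list * int

(* Control-flow instructions, which end a hyperblock. *)
type cf_instr =
  | RBcall of signature * func_ref * reg list * reg * node
  | RBtailcall of signature * func_ref * reg list
  | RBbuiltin of external_function * reg list * reg * node
  | RBcond of condition * reg list * node * node
  | RBjumptable of reg * node list
  | RBreturn of reg option
  | RBgoto of node
  | RBpred_cf of pred * cf_instr * cf_instr
\stoptyping

The structural point this sketch highlights is that every standard
instruction carries an optional predicate, whereas \type{RBpred_cf}
carries a full predicate and two nested control-flow instructions.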
{\bf {[}{[}JW:} I think there'd be a lot of value in having a picture
showing op-chaining in action. See tex source for the kind of thing I
mean. Probably best done as a running example throughout the paper.
{\bf {]}{]}}

These instructions are used in RTLBlock as well as in RTLPar. The main
difference between the two languages is how these instructions are
arranged within the hyperblock, and the execution semantics of the
hyperblock in each language. Let $\mathit{hb}_{b}$ be the definition of
a hyperblock for RTLBlock, and let $\mathit{hb}_{p}$ be the definition
of a hyperblock for RTLPar; then $\mathit{hb}_{b}$ and
$\mathit{hb}_{p}$ can be defined as in
Figure~\goto{{[}fig:hb_def{]}}[fig:hb_def]. The figure shows the
different nesting levels of the basic blocks in RTLBlock as well as
RTLPar, and where the parallel semantics of RTLPar come into play. An
RTLBlock hyperblock is made of a list of instructions and a
control-flow instruction that ends the hyperblock. Each instruction in
RTLBlock is executed sequentially. An RTLPar hyperblock is instead made
of three nested lists. The outer list behaves like the list in
RTLBlock, and executes each list inside of it sequentially. RTLPar then
has a list which executes its contents in parallel, meaning each
instruction uses the same state of the registers and memory while
executing. Finally, instead of directly containing instructions, there
is an additional layer of sequentially executed lists of instructions,
to be able to execute small groups of instructions in sequence, but
still in parallel to the other instructions in the current parallel
group.

RTLPar is structured in this way to give the scheduler more
flexibility: it can not only parallelise operations in the basic
blocks, but also define sequential operations within these parallel
operations. This is especially useful for the hardware back end, as it
means that sequential operations that do not quite fill one clock cycle
can be executed sequentially, but still in parallel with other
independent operations. This optimisation is called operation chaining,
and is critical to get the most performance out of the hardware.

\subsection[translating-rtlblock-to-rtlpar]{Translating RTLBlock to RTLPar}

The translation from RTLBlock to RTLPar is performed using a scheduling
algorithm, which takes hyperblocks and rearranges their instructions to
maximise parallelism based on the data dependencies. It then arranges
the instructions in RTLPar by putting independent instructions into the
same parallel block. Scheduling is an optimisation that is well suited
to translation validation, as the algorithm itself can be quite
complex, whereas the output is normally relatively easy to check for
equivalence with the input.
Figure~\goto{{[}fig:compcert_interm{]}}[fig:compcert_interm] shows that
the scheduling step comes right after the if-conversion step which
originally creates the hyperblocks.

Figure~\goto{{[}fig:op_chain{]}}[fig:op_chain] shows how the scheduling
step transforms RTLBlock into RTLPar. The RTLBlock hyperblock being
scheduled is shown in
Figure~\goto{{[}fig:op_chain_a{]}}[fig:op_chain_a], which contains five
predicated operations, comprising two additions and three
multiplications. The data dependencies of the instructions in the
hyperblock are shown in
Figure~\goto{{[}fig:op_chain_b{]}}[fig:op_chain_b].
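One way to picture how predicates enter this dependency graph is as a
satisfiability question. The following is a minimal OCaml sketch of how
a write-after-write check might take predicates into account; here
\type{sat} and \type{conj} are hypothetical stand-ins for a
satisfiability check and predicate conjunction, and none of the names
are taken from the actual development.

\starttyping
(* Two predicates are mutually exclusive when their conjunction is
   unsatisfiable; [sat] is a hypothetical satisfiability check and
   [conj] builds the conjunction of two predicates. *)
let mutually_exclusive sat conj p1 p2 = not (sat (conj p1 p2))

(* A write-after-write edge between two predicated writes is only
   needed when both target the same destination and can occur in the
   same execution. *)
let needs_waw_edge sat conj (p1, dst1) (p2, dst2) =
  dst1 = dst2 && not (mutually_exclusive sat conj p1 p2)
\stoptyping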
Curiously, even though two of the operations would normally be
dependent on each other due to a write-after-write conflict, because
their predicates are mutually exclusive the write-after-write conflict
can effectively be removed by the scheduler.

The scheduler transforms the RTLBlock hyperblock into the RTLPar
hyperblock shown in Figure~\goto{{[}fig:op_chain_c{]}}[fig:op_chain_c].
Even though one addition is dependent on the other, they can still be
put into the same cycle, as the additions do not take as long to
complete as the multiplications. This optimisation is called operation
chaining. One of the multiplications can also be placed in the same
clock cycle, as it does not have any data dependencies. In the next
clock cycle, only one of the two remaining multiplications will
actually take place, depending on the predicates, meaning these two
multiplications can also be placed in the same clock cycle. This gives
the final schedule that is shown in
Figure~\goto{{[}fig:op_chain_d{]}}[fig:op_chain_d].

\section[abstr_interp]{Abstract Interpretation of Hyperblocks}

The correctness of the scheduling algorithm is checked by comparing the
symbolic expressions of each register before and after scheduling. This
section describes how these symbolic expressions are constructed for
each hyperblock.

\subsection[abstr_language]{Construction of the Abstract Language}

The main difficulty when constructing symbolic expressions for
hyperblocks is the presence of predicates, which can be used by the
scheduling algorithm to reorder instructions further, even if these
contain data dependencies. The scheduling algorithm will manipulate
predicates in the following ways, which the comparison function needs
to take into account.

\startitemize
\item
  Instructions with predicates that are equivalent to false can be
  removed.
\item
  Instructions with predicates that are mutually exclusive can be
  executed in parallel.
\item
  Any predicate can be replaced with an equivalent predicate.
\stopitemize

The symbolic expressions have the following form:

There are three main resources that are present in symbolic
expressions: registers $r$, which represent the value of the registers
in the program; predicates $p$, which represent the value of the
predicates throughout the program; and finally, memory \type{Mem},
representing the state of the memory. For each basic instruction in the
hyperblock, there is an equivalent symbolic expression which either
modifies the state of the registers, predicates or memory. Finally, as
each instruction in the hyperblock can be predicated, each symbolic
expression is also paired with a predicate.

Assigning the same resource multiple times with instructions that
contain different predicates means that the symbolic expression must
also express this conditional result based on the predicates. This is
expressed by assigning lists of predicated symbolic expressions to each
resource, where each predicate needs to be mutually exclusive from the
other predicates in the list.

The expressions are then constructed using a function which updates the
symbolic expressions assigned to each resource. This is done using
multiple primitives which act on predicated types, which are made up of
a list of pairs of predicates and elements of that type.
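As a rough illustration (and not the verbatim Coq definitions from the
development), such a predicated type could be modelled as a list of
guarded alternatives, together with a small syntax of predicates:

\starttyping
(* Illustrative predicate syntax; [Plit] is a possibly negated
   predicate register. *)
type pred =
  | Plit of bool * int
  | Pand of pred * pred
  | Por of pred * pred
  | Pnot of pred

(* A predicated type: each alternative is guarded by the predicate
   under which it is the value of the resource. *)
type 'a predicated = (pred * 'a) list
\stoptyping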
The first important primitive is the multiplication of two predicated
types, which is implemented by taking the Cartesian product of the two
lists, conjoining the two predicates of each pair and joining the two
elements of each pair using a function:

%\startformula \label{eq:1}
% P_{1} \otimes_{f} P_{2} \equiv \mono{map } (\lambda ((p_{1}, e_{1}), (p_{2}, e_{2})) . (p_{1} \land p_{2}, f\ e_{1}\ e_{2})) P_{1} \times P_{2} \stopformula

In addition to that, another primitive that is needed is the following
append operation, which, given a predicate $p$, conjoins the negation
of $p$ onto each predicate in the first predicated type, conjoins $p$
onto each predicate in the second predicated type, and appends the two
resulting lists:

%\startformula \label{eq:2}
% \mu(p, P) \equiv \mono{map } (\lambda (p', e') . (p \land p', e'))\ P \stopformula

%\startformula \label{eq:3}
% P_{1} \oplus_{p} P_{2} \equiv \mu(\neg p, P_{1}) \mathbin{++} \mu(p, P_{2}) \stopformula

\subsection[example-of-translation]{Example of Translation}

%\startformula \begin{aligned}
% \label{eq:4}
% &\mono{r1} =
% \begin{cases}
% \mono{r1}^{0} + \mono{r4}^{0} + \mono{r4}^{0}, \quad &\mono{p1} \\
% \mono{r1}^{0}, \quad &\mono{!p1} \\
% \end{cases}\\
% &\mono{r2} = \mono{r1}^{0} + \mono{r4}^{0}\\
% &\mono{r3} =
% \begin{cases}
% \left( \mono{r1}^{0} \times \mono{r1}^{0} \right) \times \left( \mono{r1}^{0} \times \mono{r1}^{0} \right),\quad &\mono{!p2 \&\& !p1}\\
% \mono{r1}^{0} \times \mono{r4}^{0},\quad &\mono{p2 \&\& !p1}\\
% \left( \mono{r1}^{0} + \mono{r4}^{0} + \mono{r4}^{0} \right) \times \mono{r4}^{0},\quad &\mono{!p2 \&\& p1}\\
% \mono{r3}^{0} \times \mono{r3}^{0},\quad &\mono{p2 \&\& p1}
% \end{cases}
% \end{aligned} \stopformula

Using the example shown in
Figure~\goto{{[}fig:op_chain{]}}[fig:op_chain], the RTLBlock hyperblock
shown in Figure~\goto{{[}fig:op_chain_a{]}}[fig:op_chain_a] is
scheduled into the RTLPar hyperblock shown in
Figure~\goto{{[}fig:op_chain_c{]}}[fig:op_chain_c]. The first step is
to translate the input and output of the scheduler into the Abstr
intermediate language by calculating the symbolic values that will be
stored in each of the registers. Symbolically evaluating the RTLBlock
hyperblock results in the expressions shown in
Figure~\goto{{[}fig:symbolic_expressions{]}}[fig:symbolic_expressions]
for registers \type{r1}, \type{r2} and \type{r3}, where the state of
the register \type{r1} at the start of the hyperblock is denoted as
$\mono{r1}^{0}$.

These symbolic expressions can be generated by sequentially going
through the list of predicated expressions and applying the update
function defined in Section~\goto{3.1}[abstr_language]. The resulting
predicated expressions are guaranteed to have mutually exclusive
predicates, meaning that if one predicate evaluates to true, all other
predicates must evaluate to false.

\subsection[linear-flat-predicated-expressions]{Linear (Flat) Predicated Expressions}

The update function for predicated expressions is quite a bit more
complex than standard update functions for symbolic execution, such as
the one used by~. The main reason for this is the non-recursive nature
of the predicated expressions. As the predicates are only available at
the top level of the expressions, combining multiple expressions means
that the conditions and the multiple possibilities also need to trickle
up to the top.
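To make this concrete, the two primitives from the formulas above could
be sketched in OCaml as follows, reusing the illustrative \type{pred}
and \type{'a predicated} definitions from
Section~\goto{3.1}[abstr_language]; this is a sketch of the idea, not
the actual implementation.

\starttyping
(* Cartesian product of two lists. *)
let cartesian xs ys =
  List.concat_map (fun x -> List.map (fun y -> (x, y)) ys) xs

(* Multiplication: combine every pair of alternatives, conjoining
   their predicates and joining their elements with [f]. *)
let otimes f p1 p2 =
  List.map (fun ((q1, e1), (q2, e2)) -> (Pand (q1, q2), f e1 e2))
    (cartesian p1 p2)

(* mu: conjoin a predicate onto every alternative. *)
let mu p pl = List.map (fun (q, e) -> (Pand (p, q), e)) pl

(* Append: choose from [p2] where [p] holds and from [p1] otherwise. *)
let oplus p p1 p2 = mu (Pnot p) p1 @ mu p p2
\stoptyping

Because both primitives only ever touch the top-level predicates of the
lists, any conditional structure arising from nested expressions must
indeed be flattened into these top-level predicates.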
If the predicated expressions were recursive instead, one would not
have to change any predicates when evaluating the argument lists of
instructions, for example, as these could just stay predicated. The
evaluation semantics would also be much simpler, because a rule could
simply be added for evaluating predicates.

\section[sec:correctness]{Correctness of the Abstract Interpretation}

The correctness of the scheduling step depends on the correctness of
the equivalence check between the two abstract languages. As the
abstract languages contain predicates, this comparison cannot just be a
syntactic comparison between the two languages, because the predicates
might be different but logically equivalent. Instead, a mixed
equivalence check is performed, using a verified SAT solver together
with a syntactic equivalence check of the actual symbolic expressions.

\subsection[comparing-symbolic-abstract-expressions]{Comparing Symbolic Abstract Expressions}

Abstract expressions comprise a predicate and a symbolic expression.
The translation from RTLBlock and RTLPar to Abstr ensures that each
register will contain a predicated symbolic expression that describes
the value that will be assigned to it after evaluating the basic block.
This means that if an expression always produces the same result as
another expression, the register will always contain the same value. If
the RTLBlock program and the RTLPar program translate to two abstract
languages in which all the registers' symbolic expressions match, then
both programs will always produce the same value.

The main comparison between two symbolic expressions can be done using
a simple syntactic comparison, because this ensures that each of the
expressions always evaluates to the same value, and it is also general
enough for the expressions that need to be compared. However, as all
expressions might be predicated, this simple syntactic comparison
cannot be performed for the predicates. This is because predicated
instructions can be executed in many different orders, and the
predicates are used to calculate dependence information and therefore
also determine the order of operations.

Instead, predicates can be compared by checking their logical
equivalence using a formally verified SAT solver. This ensures that the
predicates will always evaluate to the same value, thereby making the
comparison meaningful.

\subsection[verified_sat]{Verified Decidable DPLL SAT Solver}

A SAT solver is needed to correctly and reliably compare predicated
expressions. This makes it possible to compare many different possible
schedules, even ones that completely remove unnecessary expressions.
The SAT solver is used in the verifier, which needs to be proven
correct in Coq to prove the scheduling translation correct. The SAT
solver therefore has to be verified, because it could otherwise not be
used in the proof of correctness.

The DPLL algorithm is used to implement the SAT solver. This means that
the SAT solver takes in a conjunctive normal form (CNF) formula. The
conversion from a recursive predicate type into a CNF formula is done
by recursively expanding the predicate into equivalent CNF formulas.

The SAT solver is implemented in Coq and proven to be sound and
complete.
The soundness and completeness of an algorithm that checks whether a
predicate is satisfiable or not also directly implies that the check is
decidable, which simplifies many proofs in Coq by providing a
\type{Setoid} instance for equivalence between predicates. This makes
rewriting with equivalent predicates in proofs much easier, as many of
the existing Coq tactics can be reused.

\subsection[encoding-predicated-expressions-as-sat-formulas]{Encoding Predicated Expressions as SAT Formulas}

Given the following two predicated expressions, which should be tested
for equivalence, one can construct a SAT expression which will ensure
that both expressions will always result in the same value.

%\startformula \begin{aligned}
% \bgroup \left.\begin{aligned}
% e_{1},\qquad &p_{1}\\
% e_{2},\qquad &p_{2}\\
% \vdots\quad& \\
% e_{n},\qquad &p_{n}\\
% \end{aligned}\right\rbrace\egroup
% \Longleftrightarrow
% \begin{cases}
% e_{1}',\quad &p_{1}'\\
% e_{2}',\quad &p_{2}'\\
% \;\;\;\quad\vdots \notag \\
% e_{m}',\quad &p_{m}'\\
% \end{cases}\end{aligned} \stopformula

The SAT query that should determine whether the two predicated
expressions are equivalent is the following, where $\mathit{he}_{i}$ is
the hashed expression of $e_{i}$:

%\startformula \begin{gathered}
% (p_{1} \implies \mathit{he}_{1}) \land (p_{2} \implies \mathit{he}_{2}) \land \cdots \land (p_{n} \implies \mathit{he}_{n}) \\
% \Longleftrightarrow\\
% (p_{1}' \implies \mathit{he}_{1}') \land (p_{2}' \implies \mathit{he}_{2}') \land \cdots \land (p_{m}' \implies \mathit{he}_{m}') \\\end{gathered} \stopformula

However, reasoning about the equivalence of individual expressions
within the greater list of predicated expressions is not quite
straightforward. The evaluation semantics of a predicated expression
list evaluates each predicated expression sequentially and picks the
expression whose predicate evaluates to true. However, proving that
both predicated expression lists will return the same value, assuming
that the SAT query above is true, means that for each predicate that is
true, one needs to be able to show that there is a predicate in the
other list that will also be true and that will return the same value.
However, there is no guarantee that there is a unique predicated
expression that will match the current predicate, as there may be
duplicate expressions in the list with different predicates.

This means that it is quite difficult to prove that the SAT query
implies the semantic preservation of the scheduling transformation. The
SAT query can therefore be broken down into a slightly more direct
comparison that is more useful for the proof itself. The steps to
perform this finer-grained comparison are the following:

\startitemize[n]
\item
  Hash all of the expressions into unique literals.
\item
  Create a map from expression literals to their corresponding
  predicates, thereby obtaining a unique predicate for each
  expression. If there are duplicate expressions in the list that forms
  the map, their predicates are combined using the or operation.
\item
  Iterate through the expressions in both maps and compare the
  predicates using the SAT solver. If an expression is not present in
  one of the maps, then its predicate should be equivalent to $\perp$.
\stopitemize

This comparison is logically equivalent to performing the large SAT
query; however, it maps better to the proof because it compares the
predicates of identical expressions with each other.
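The three steps above could be sketched in OCaml as follows. This is a
minimal illustration under the same assumptions as the earlier
sketches: \type{equiv} stands for the SAT-based predicate equivalence
check, expressions are compared by their hashed literals, and none of
the names are the verbatim ones from the development.

\starttyping
(* Predicate syntax as in the earlier sketches, extended with the
   constant false. *)
type pred =
  | Plit of bool * int
  | Pand of pred * pred
  | Por of pred * pred
  | Pnot of pred
  | Pfalse

(* Steps 1 and 2: collapse a predicated expression list into a map
   from (hashed) expressions to a single predicate, or-ing together
   the predicates of duplicate expressions. *)
let collapse pl =
  List.fold_left
    (fun m (p, e) ->
      match List.assoc_opt e m with
      | Some q -> (e, Por (q, p)) :: List.remove_assoc e m
      | None -> (e, p) :: m)
    [] pl

(* An expression missing from a map behaves as if guarded by false. *)
let pred_of m e =
  match List.assoc_opt e m with Some p -> p | None -> Pfalse

(* Step 3: the two lists agree if the predicates guarding every
   expression occurring in either list are equivalent. *)
let lists_equivalent equiv pl1 pl2 =
  let m1 = collapse pl1 and m2 = collapse pl2 in
  let exprs = List.map fst m1 @ List.map fst m2 in
  List.for_all (fun e -> equiv (pred_of m1 e) (pred_of m2 e)) exprs
\stoptyping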
\subsection[proof-of-equivalence-between-passes]{Proof of Equivalence Between Passes}

Figure~\goto{{[}fig:compcert_interm{]}}[fig:compcert_interm] shows the
four different passes that were added to CompCert to form hyperblocks.
This section covers the proofs for each of the translations, focusing
mainly on the proof of the scheduling pass.

\subsubsection[proof-of-correctness-of-the-if-conversion-pass]{Proof of Correctness of the
  If-Conversion Pass}

The if-conversion pass is quite simple to verify, as it is an
optimisation that transforms the code within the same language.

\subsubsection[proof-of-correctness-of-scheduling]{Proof of Correctness of Scheduling}

The scheduling pass is formally verified using translation validation,
using the Abstr language as a point of comparison between RTLBlock and
RTLPar. Using the algorithm defined in Section~\goto{3}[abstr_interp],
the formally verified SAT solver described in
Section~\goto{4.2}[verified_sat] can be used to prove that if the two
predicated expression lists are equivalent, then they will always
evaluate to the same value.

\section[implementation-details-of-static-sdc-scheduling]{Implementation Details of Static SDC
  Scheduling}

This section describes the implementation details of the static SDC
scheduling algorithm used, as well as some of the details of the basic
block generation and the if-conversion pass which generates the
predicated instructions.

\subsection[implementation-of-if-conversion]{Implementation of If-Conversion}

If-conversion is the transformation that introduces the predicated
instructions which make up hyperblocks. It converts conditional
statements into predicated instructions. This transformation has a few
limitations on the kind of conditional statements that it can
translate. First, only conditional statements without loops can be
translated; therefore, one must identify cycles in the control-flow
before performing the if-conversion. Secondly, if-conversion will not
always result in more efficient code, so it should not be applied
indiscriminately to every conditional statement. Instead, it is best
applied to conditional statements where each branch will take a similar
amount of time to execute.

For back ends that do not support predicates, it is also important to
eliminate the predicates again using a process called reverse
if-conversion, which creates branches out of the predicates and groups
these together again. This allows hyperblock scheduling to also be used
with back ends that do not support predicates, which includes most of
the default back ends in CompCert.

\subsection[static-sdc-scheduling]{Static SDC Scheduling}

System of difference constraints (SDC) scheduling~ is a flexible
algorithm for scheduling instructions, especially if the target is
hardware directly. It allows for flexible definitions of optimisation
functions in terms of the latencies and dependencies of the operations
in the basic block.

\section[performance-comparison]{Performance Comparison}

\section[sec:related-work]{Related Work}

\section[conclusion]{Conclusion}

This material is based upon work supported by the under Grant No.~ and
Grant No.~. Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the author and do not
necessarily reflect the views of the National Science Foundation.

\stopcomponent