Diffstat (limited to 'chapters/scheduling.tex')
-rw-r--r--   chapters/scheduling.tex   327
1 files changed, 169 insertions, 158 deletions
diff --git a/chapters/scheduling.tex b/chapters/scheduling.tex
index 4b897ef..d3a25f4 100644
--- a/chapters/scheduling.tex
+++ b/chapters/scheduling.tex
@@ -63,21 +63,18 @@ end, the key points of this paper are the following:
\startplacemarginfigure[reference={fig:compcert_interm},title={New intermediate languages introduced into
CompCert. Progress bars are shown for each transformation pass, where green represents the
- estimated part of the work that is completed, and red stands for the remaining work to be done.}]
- \hbox{\starttikzpicture[shorten >=1pt,>=stealth]
- \node at (-3, 0) (dotsin) {$\cdots$};
- \node at (10, 0) (dotsout) {$\cdots$};
- \node[draw] at (-1, 0) (rtl) {\rtl};
- \node[draw] at (2, 0) (rtlblock) {\rtlblock};
- \node[draw] at (2, -2) (abstrblock) {\abstr};
- \node[draw] at (6, 0) (rtlpar) {\rtlpar};
- \node[draw] at (6, -2) (abstrpar) {\abstr};
- \node[draw] at (8.5, 0) (htl) {\htl};
+ estimated part of the work that is completed, and red stands for the remaining work to be done.
+Orange stands for a pass that is currently not used in the translation but could be added back in
+the future.}]
+ \hbox{\starttikzpicture[shorten >=1pt,>=stealth] \node at (-3, 0) (dotsin) {$\cdots$}; \node at
+ (10, 0) (dotsout) {$\cdots$}; \node[draw] at (-1, 0) (rtl) {\rtl}; \node[draw] at (2, 0)
+ (rtlblock) {\rtlblock}; \node[draw] at (2, -2) (abstrblock) {\abstr}; \node[draw] at (6, 0)
+ (rtlpar) {\rtlpar}; \node[draw] at (6, -2) (abstrpar) {\abstr}; \node[draw] at (8.5, 0) (htl)
+ {\htl};
\draw[->] (rtl) --
node[above,midway,align=center,font=\small] {basic block \\[-0.3em] creation}
- node[below,yshift=-3mm,xshift=-0.75cm,anchor=west,fill=VenetianRed,minimum width=1.5cm] {}
- node[below,yshift=-3mm,xshift=-0.75cm,anchor=west,fill=ForestGreen,minimum width=0.6cm] {}
+ node[below,yshift=-3mm,xshift=-0.75cm,anchor=west,fill=ForestGreen,minimum width=1.5cm] {}
(rtlblock);
\draw[->] (rtlblock) --
node[above,midway,font=\small] {scheduling}
@@ -103,12 +100,11 @@ end, the key points of this paper are the following:
\draw[->] (rtlblock) to [out=130,in=50,loop,looseness=10]
node[above,midway,font=\small] {if-conversion}
node[below,yshift=6mm,xshift=-0.75cm,anchor=west,fill=VenetianRed,minimum width=1.5cm] {}
- node[below,yshift=6mm,xshift=-0.75cm,anchor=west,fill=ForestGreen,minimum width=0.2cm] {}
+ node[below,yshift=6mm,xshift=-0.75cm,anchor=west,fill=ForestGreen,minimum width=1cm] {}
(rtlblock);
\draw[->] (rtlpar) to [out=130,in=50,loop,looseness=10]
node[above,midway,align=center,font=\small] { operation \\[-0.3em] pipelining}
- node[below,yshift=1cm,xshift=-0.75cm,anchor=west,fill=VenetianRed,minimum width=1.5cm] {}
- node[below,yshift=1cm,xshift=-0.75cm,anchor=west,fill=ForestGreen,minimum width=0.2cm] {}
+ node[below,yshift=1cm,xshift=-0.75cm,anchor=west,fill=TigersEye,minimum width=1.5cm] {}
(rtlpar);
\draw[->] (dotsin) -- (rtl);
\draw[->] (htl) -- (dotsout);
@@ -124,20 +120,18 @@ support of pipelined operations such as dividers.
In general, the implementation of all the passes is now complete, and so current work is being done
to complete the correctness proofs of all the passes. Starting with the basic block creation pass,
which is further described in \in{Section}[sec:basic-block-generation], the proof is quite complex
-for generating basic blocks. However, the main idea of how to complete the proof has now been
-understood, so it should just be a matter of time until it is implemented. Secondly, the
-if-conversion pass has only been implemented and has not been verified yet. However, it should be
-straightforward to verify it, as the transformation is quite direct. Next, for the scheduling pass
-the verification translates the input and the output of the scheduling algorithm into the \abstr\
-language and then compares the two languages has mainly been completed. However, due to changes in
-the \rtlblock\ and \rtlpar\ language semantics, some of these proofs will have to be redone, which
-should not take too long though. Finally, for the generation of \htl\ from \rtlpar\ the proof has
-not been started yet. However, the difficult parts of the proof are the translation of loads and
-stores into proper reads and writes from RAM, which has already been completely proven for the \rtl\
-to \htl\ translation.
-
-The operation pipelining pass has only been implemented until now, however, it will likely be
-removed from the current workflow if it proves to take too long to complete.
+for generating basic blocks, but has now been completed. Secondly, the if-conversion pass has been
+implemented but not yet verified. Even though the transformation is quite direct, it is still
+difficult to formulate the correct matching condition for the proof; however, this is nearly
+complete. Next, for the scheduling pass, the verification translates both the input and the output
+of the scheduling algorithm into the \abstr\ language and then compares the two; this has mainly
+been completed, although due to changes in the \rtlblock\ and \rtlpar\ language semantics, some of
+these proofs will have to be redone. Finally, for the generation of \htl\ from \rtlpar\ the proof
+has not been started yet; however, the difficult part of the proof is the translation of loads and
+stores into proper reads and writes from RAM, which has already been completely proven for the
+\rtl\ to \htl\ translation and can be directly copied over. The operation pipelining pass has so
+far only been implemented; it will likely be removed from the current workflow if it proves to take
+too long to complete.
\section[sec:basic-block-generation]{Basic Block Generation}
@@ -197,8 +191,8 @@ transition in the left hand side.
\node (s1) at (0,2) {$S_{1}$};
\node (s2p) at (3,0) {$S_{2}'$};
\node (s1p) at (3,2) {$S_{1}'$};
- \draw (s1) -- node[above] {$\sim$} (s1p);
- \draw[dashed] (s2) -- node[above] {$\sim$} (s2p);
+ \draw (s1) -- node[above] {$\sim_{i::\varnothing}$} (s1p);
+ \draw[dashed] (s2) -- node[above] {$\exists b, \sim_{b}$} (s2p);
\draw[->] (s1) -- (s2);
\draw[->,dashed] (s1p) -- (s2p);
\stoptikzpicture}
@@ -208,9 +202,9 @@ transition in the left hand side.
\node (s2) at (0,0) {$S_{2}$};
\node (s1) at (0,2) {$S_{1}$};
\node (s1p) at (3,2) {$S_{1}'$};
- \draw (s1) -- node[above] {$\sim$} (s1p);
+ \draw (s1) -- node[above] {$\sim_{i::b}$} (s1p);
\draw[->] (s1) -- (s2);
- \draw[dashed] (s2) -- node[below,xshift=1mm] {$\sim$} (s1p);
+ \draw[dashed] (s2) -- node[below,xshift=3mm] {$\sim_{b}$} (s1p);
\stoptikzpicture}
\stopplacefigure
\stopfloatcombination
@@ -218,17 +212,41 @@ transition in the left hand side.
The diagram on the left represents a control-flow transition, which is performed in lock-step. This
means one has encountered a control-flow instruction, and therefore also implies that the basic
-block has finished executing and that the next block needs to be retrieved. The diagram on the
-right is the interesting one which allows for the multiple transitions in the input language. It
-requires writing a matching predicate $\sim$ which can match the start of the basic block in
-\rtlblock\ with any state in the original \rtl\ language as long as one has not finished executing
-the basic block and has not encountered a control-flow instruction. By also showing that some
-metric stored in the $\sim$ relation is decreasing, one can then show that one will not get stuck
-iterating forever, and effectively get the behaviour of executing multiple instructions in the input
-code to execute one big step in the output language. The other trick is that at each one of these
-steps, one also incrementally builds a proof of the execution from the start of the basic block to
-the current state, so that when a control-flow instruction is encountered, this proof can be used
-show that one can jump over all the input instructions with that block.
+block has finished executing and that the next block needs to be retrieved. In addition, the
+matching relation is parameterised by the basic block that is currently executing, so in the
+lock-step case there will always be a control-flow instruction at the end of the basic block, which
+repopulates the matching relation with the new basic block that is jumped to. The diagram on the
+right is the interesting one, as it allows for multiple transitions in the input language. It
+requires a matching predicate $\sim_{i::b}$ which matches the start of the basic block in
+\rtlblock\ with any state in the original \rtl\ language, as long as one has not finished executing
+the basic block and has not encountered a control-flow instruction. Then one has to show that the
+block the matching relation is parameterised by reduces at every step, to establish termination,
+and one also has to keep track of the proof of simulation from the beginning of the basic block to
+the current point, to then show that the final control-flow step also matches.
+
+This is done with the following two properties, which are part of the simulation relation
+$S_{2} \sim_{b} S_{1}'$. The property in \in{Equation}[eq:sem_extrap] states that if the basic
+block $B$ is found in the code $c$ of the translated \rtlblock\ function at the program counter of
+$S_{1}'$, and executing the list of instructions $b$ that the simulation relation is parameterised
+by from $S_{2}$ results in some final state $S$, then executing $B$ from $S_{1}'$ results in the
+same final state $S$.
+
+\placeformula[eq:sem_extrap]\startformula
+ \forall S,\ \ c[S_{1}'.\mathit{pc}] = \Some{B} \land (b, S_2) \Downarrow S \implies (B, S_{1}') \Downarrow S
+\stopformula
+
+The property in \in{Equation}[eq:star] then states that there exist zero or more small steps in
+\rtl\ going from $S_{1}'$ to the current \rtl\ state $S_{2}$.
+
+\placeformula[eq:star]\startformula
+ S_{1}' \rightarrow_* S_{2}
+\stopformula
+
+By also demonstrating that some metric stored in the $\sim$ relation is decreasing, one can show
+that the simulation will not get stuck iterating forever, effectively letting multiple instructions
+in the input code correspond to one big step in the output language. \in{Equation}[eq:star] also
+incrementally builds a proof of the execution from the start of the basic block to the current
+state, so that when a control-flow instruction is encountered, this proof can be used to show that
+one can jump over all the input instructions within that block.
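The splitting step of basic block creation is simple to state on its own. The following is a minimal Python sketch; the instruction encoding and opcode names are hypothetical, not Vericert's Coq definitions, and the simulation-relation bookkeeping described above is omitted:

```python
# Sketch of basic-block creation: split an instruction stream into
# blocks that each end at the first control-flow instruction,
# mirroring the RTLBlock shape of (straight-line body, terminator).
# The instruction encoding here is invented for illustration.

CONTROL_FLOW = {"call", "cond", "jumptable", "return", "goto"}

def build_basic_blocks(instrs):
    """Group instructions into (body, terminator) pairs."""
    blocks, body = [], []
    for op, *args in instrs:
        if op in CONTROL_FLOW:
            blocks.append((body, (op, *args)))
            body = []
        else:
            body.append((op, *args))
    return blocks

example = [
    ("op", "r2", "r1", "r4"),
    ("op", "r1", "r2", "r4"),
    ("goto", 5),
    ("op", "r3", "r1", "r1"),
    ("return", "r3"),
]
print(build_basic_blocks(example))
```

The correctness argument above then relates each `(body, terminator)` pair back to the corresponding run of single `\rtl` steps.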
\subsection[scheduling:hyperblocks]{Hyperblocks as a Generalisation of Basic Blocks}
@@ -236,7 +254,7 @@ Basic blocks are popular in intermediate languages because they describe a list
instructions that are not separated by control-flow. This information can be used by various
analysis passes to perform optimisations, as these sequences of instructions can be reordered in any
way as long as their data dependencies are met. One such optimisation is scheduling, which reorders
-instructions to eliminate of any delay slots caused by the CPU pipeline.
+instructions to eliminate any delay slots caused by the CPU pipeline.
However, due to control-flow not being allowed in basic blocks, the number of instructions that can
be scheduled at any one time is limited. The boundaries to other basic blocks act as hard boundary
@@ -245,7 +263,7 @@ algorithm. Hyperblocks~\cite[mahlke92_effec_compil_suppor_predic_execut_using_hy
instructions to be predicated, thereby introducing some control-flow information into the blocks,
which can also be taken into account by the scheduler. For example, any data dependency between
instructions that have mutually exclusive predicates can be removed, as these instructions will
-never be activate at the same time.
+never be active at the same time.
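For illustration only, the mutual-exclusivity check that justifies dropping such a dependency can be sketched with a brute-force truth table, standing in for the verified SAT solver used in the actual development; predicates are modelled here as hypothetical boolean functions over an assignment tuple:

```python
from itertools import product

def mutually_exclusive(p1, p2, nvars):
    """True iff p1 and p2 are never simultaneously true, i.e. a
    stand-in for the SAT query "p1 /\\ p2 is unsatisfiable"."""
    return all(not (p1(a) and p2(a))
               for a in product([False, True], repeat=nvars))

# p1 = x0 and p2 = not x0 guard instructions that can never both
# execute, so a write-after-write dependency between them is gone.
assert mutually_exclusive(lambda a: a[0], lambda a: not a[0], 1)
assert not mutually_exclusive(lambda a: a[0], lambda a: a[0] or a[1], 2)
```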
\section[implementation-of-if-conversion]{Implementation of If-Conversion}
@@ -257,18 +275,19 @@ always result in more efficient code, so it should not be applied to any conditi
statements.
From an implementation point of view, if-conversion can be implemented naïvely; however, it could be
-improved by adding better heuristics for which conditional statements to transform into predicated
-instructions. For hyperblocks, conditionals where each branch approximately execute in the same
-number of cycles are an ideal candidate for the transformation, as it means that no time is lost due
-to having to execute extra predicated instructions that are false. However, as hyperblocks are an
-extension of superblocks, single paths through the control-flow graph can also be represented. In
-our implementation, we use simple static heuristics to pick these paths~\cite[??].
+improved with better heuristics to determine which conditional statements to transform into
+predicated instructions. For hyperblocks, conditionals whose branches execute in approximately the
+same number of cycles are ideal candidates for the transformation, as this means no time is lost
+executing extra predicated instructions whose predicates evaluate to false. However, as hyperblocks
+are an extension of superblocks, single paths through the control-flow graph can also be
+represented. In our implementation, we use simple static heuristics to pick these
+paths~\cite[ball93_branc_predic_free].
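A naïve version of the transformation itself can be sketched as follows; the IR encoding is hypothetical, and nested conditionals and the selection heuristics are ignored:

```python
# Naive if-conversion sketch (invented IR): replace a two-armed
# conditional by a predicate-setting instruction followed by both
# arms, each arm guarded by the predicate or its negation.

def if_convert(cond, then_instrs, else_instrs, pred="p1"):
    out = [("setpred", pred, cond)]
    out += [("pred", pred, i) for i in then_instrs]
    out += [("pred", ("not", pred), i) for i in else_instrs]
    return out

hb = if_convert(("lt", "r1", "r2"),
                [("op", "r3", "r1", "r2")],
                [("op", "r3", "r2", "r1")])
print(hb)
```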
\subsection[proof-if-conversion]{Correctness of If-Conversion}
The correctness of the if-conversion pass does not rely on which paths are chosen to be combined.
The proof therefore does not have to cover the static heuristics that are generated, because these
-only affect the performance. Even though the translation is quite simple, the reasoning about it's
+only affect the performance. Even though the translation is quite simple, the reasoning about its
correctness does have its difficulties. We have a similar problem as before, where we sometimes
need to execute multiple steps in the input program to reach one step in the output program. The
solution is to perform a similar style proof as for the basic block generation.
@@ -280,6 +299,21 @@ if-converted section.
\section[sec:scheduling]{Scheduling Implementation}
+This section discusses the implementation of hyperblock scheduling in Vericert, which is still a
+work in progress. The scheduling step consists of several smaller transformations: the initial
+basic block generation, hyperblock construction using if-conversion, the scheduling itself, and
+finally the hardware generation.
+
+Vericert uses \SDC\ scheduling~\cite[cong06_sdc], a flexible algorithm for scheduling instructions,
+especially when the target is hardware directly. It allows for flexible definitions of optimisation
+functions in terms of the latencies and dependencies of the operations in the basic block. \HLS\
+tools often perform \SDC\ scheduling on hyperblocks to find an optimal schedule. The main benefit
+of \SDC\ scheduling is that it generates a set of constraints together with an objective function
+to be minimised, which can be passed to a linear program solver; by changing the objective,
+different attributes of the hardware can be optimised. In addition, there are extensions to the
+\SDC\ scheduling algorithm that add further constraints so that loops can also be scheduled using
+modulo scheduling~\cite[zhang13_sdc].
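As a rough illustration of why difference constraints are convenient (this is not Vericert's implementation): each data dependency contributes a constraint of the form $s_v - s_u \geq \mathit{lat}(u)$ over start cycles, and in the absence of resource constraints the minimal (ASAP) schedule is just a longest-path computation, which a Bellman-Ford-style relaxation finds. A full SDC scheduler hands the same constraints, plus an objective, to an LP solver. All latencies below are invented:

```python
# SDC-style constraints: deps is a list of (u, v, lat) meaning
# s[v] >= s[u] + lat.  Relax to a fixed point (longest path in a
# DAG) to obtain the minimal feasible start cycle for each op.

def asap_schedule(n, deps):
    s = [0] * n
    for _ in range(n):  # n rounds suffice for a DAG with n nodes
        for u, v, lat in deps:
            s[v] = max(s[v], s[u] + lat)
    return s

# 0 -> 1 (latency 1), 0 -> 2 (latency 1), 2 -> 3 (latency 3)
print(asap_schedule(4, [(0, 1, 1), (0, 2, 1), (2, 3, 3)]))
# -> [0, 1, 1, 4]
```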
+
\subsection[sec:rtlblockdef]{\rtlblock\ and \rtlpar\ Intermediate Language Definition}
\lindex{\rtlblock}\lindex{\rtlpar}\in{Figure}[fig:compcert_interm] shows the intermediate languages
@@ -292,14 +326,17 @@ next sections.
\startplacefigure[location={here,none}]
\startfloatcombination[nx=2]
- \startplacefigure[reference={eq:standard},title={Syntax for instructions within a hyperblock.}]
+  \startplacefigure[reference={eq:standard},title={Syntax for instructions within a hyperblock.
+    There is a predicated exit instruction which serves as an early exit from the hyperblock,
+    thereby making the hyperblock single-entry and multiple-exit.}]
\startframedtext[frame=off,offset=none,width={0.45\textwidth}]
\startformula\startmathalignment
\NC i\ \ \eqdef \NC \ \ \text{\tt RBnop} \NR
\NC \NC |\ \ \text{\tt RBop}\ p?\ \mathit{op}\ \vec{r}\ d \NR
\NC \NC |\ \ \text{\tt RBload}\ p?\ \mathit{chunk}\ \mathit{addr}\ \vec{r}\ d \NR
\NC \NC |\ \ \text{\tt RBstore}\ p?\ \mathit{chunk}\ \mathit{addr}\ \vec{r}\ s \NR
- \NC \NC |\ \ \text{\tt RBsetpred}\ p?\ c\ \vec{r}\ d \NR
+ \NC \NC |\ \ \text{\tt RBsetpred}\ p?\ c\ \vec{r}\ d \NR
+ \NC \NC |\ \ \text{\tt RBexit}\ p?\ i_{\mathit{cf}} \NR
\stopmathalignment\stopformula
\stopframedtext
@@ -314,7 +351,6 @@ next sections.
\NC \NC |\ \ \text{\tt RBjumptable}\ r\ \vec{n} \NR
\NC \NC |\ \ \text{\tt RBreturn}\ r? \NR
\NC \NC |\ \ \text{\tt RBgoto}\ n \NR
- \NC \NC |\ \ \text{\tt RBpred\_{cf}}\ P\ i_{\mathit{cf}_{1}}\ i_{\mathit{cf}_{2}} \NR
\stopmathalignment \stopformula
\stopframedtext
\stopplacefigure
@@ -331,10 +367,8 @@ standard instruction which sets a predicate equal to an evaluated condition. Thi
used during if-conversion to set the predicate to the value of the condition in the conditional
statement instruction. To support proper predicated instructions, each instruction therefore also
contains an optional predicate which has been set beforehand, which produces the hyperblock. In
-additon to that, there is also an extra control-flow instruction called \type{RBpred_cf}, which can
-branch on a predicate an nests more control-flow instructions inside of it. This is also necessary
-for if-conversion, when converting an already converted conditional statement, as depending on the
-predicate, a different control-flow instruction may be necessary.
+addition to that, there is also a predicated exit instruction called \type{RBexit}, which serves as
+an early exit from the hyperblock.
\placeformula[fig:hb_def]\startformula\startmathalignment[align={1:left}]
\NC \blockbb \qquad \eqdef \qquad (\text{\tt slist}\ i) \times i_{\mathit{cf}} \NR \NC \parbb \qquad
@@ -351,18 +385,15 @@ of the basic blocks in \rtlblock\ as well as \rtlpar, and where the parallel sem
come into play. \rtlblock\ is made of a list of instructions and a control-flow instruction that
ends the hyperblock. Each instruction in \rtlblock\ is executed sequentially. \rtlpar\ is made of
three separate lists. The outer list behaves like the list in \rtlblock, and executes each list
-inside of it sequentially. \rtlpar\ then has a list which executes its contents in parallel,
-meaning each instruction uses the same state of the registers and memory while executing. Finally,
-instead of directly containing instructions, there is an additional layer of sequentially executing
-lists of instructions, to be able to execute small groups of instructions in sequence, but still in
-parallel to the other instructions in the current parallel group.
-
-\rtlpar\ is structured in this way so that the scheduler has more flexibility, so that it can not
-only parallelise operations in the basic blocks, but that it can also define sequential operations
-in these parallel operations. This is especially useful for the hardware back end, as it means that
-sequential operations that do not quite fill one clock cycle can be executed sequentially, but can
-still be executed in parallel with other independent operations. This optimisation is called
-operation chaining, and is critical to get the most performance out of the hardware.
+inside of it sequentially. \rtlpar\ then has a list which executes its contents in parallel, meaning
+each instruction uses the same state of the registers and memory while executing. Finally, instead
+of directly containing instructions, there is an additional layer of sequentially executing lists of
+instructions, to be able to execute small groups of instructions in sequence, but still in parallel
+to the other instructions in the current parallel group. The latter construct enables operation
+chaining to be expressed without having to create new operations for each kind of chaining that
+could occur. Operation chaining is critical for getting the most performance out of the hardware,
+as it fills the slack that remains within a clock cycle when instructions with larger latencies are
+also being scheduled.
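A minimal sketch of the chaining decision, under the assumption of invented combinational delays and an invented clock period:

```python
# Operation chaining sketch: ops in one dependency chain may share
# a clock cycle as long as their combined combinational delay fits
# within the clock period.  Delays and the period are invented.

def chain(ops, period):
    """ops: list of (name, delay) in dependency order; returns a
    list of cycles, each a list of chained op names."""
    cycles, current, used = [], [], 0.0
    for name, delay in ops:
        if used + delay > period and current:
            cycles.append(current)
            current, used = [], 0.0
        current.append(name)
        used += delay
    if current:
        cycles.append(current)
    return cycles

print(chain([("add1", 2.0), ("add2", 2.0), ("mul", 4.5)], 5.0))
# -> [['add1', 'add2'], ['mul']]
```

Here the two additions chain into one cycle because their combined delay fits the period, while the multiplication spills into the next cycle, matching the behaviour described above.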
\subsection[translating-rtlblock-to-rtlpar]{Translating \rtlblock\ to \rtlpar}
@@ -384,7 +415,7 @@ operation chaining, and is critical to get the most performance out of the hardw
clock cycles.}]
\startfloatcombination[ny=2,nx=2]
\startplacesubfigure[title={RTLBlock hyperblock to be scheduled.}]
- \startframedtext[frame=off,offset=none,width={0.43\textwidth},style=\rmx]
+ \startframedtext[frame=off,offset=none,width={0.43\textwidth},style=\rmxx]
\starthlC
/BTEX \sI/ETEX r2 = r1 + r4;
/BTEX \sII/ETEX if(p1) r1 = r2 + r4;
@@ -395,7 +426,7 @@ operation chaining, and is critical to get the most performance out of the hardw
\stopframedtext
\stopplacesubfigure
\startplacesubfigure[title={Scheduled RTLPar hyperblock.}]
- \startframedtext[frame=off,offset=none,width={0.53\textwidth},style=\rmx]
+ \startframedtext[frame=off,offset=none,width={0.53\textwidth},style=\rmxx]
\starthlC
1: [r2 = r1 + r4; if(p1) r1 = r2 + r4];
[if(!p2 && !p1) r3 = r1 * r1]
@@ -439,9 +470,8 @@ scheduling step comes right after the if-conversion step which originally create
\rtlblock\ hyperblock being scheduled is shown in \in{Figure}{a}[fig:op_chain], which contains five
predicated operations, comprising two additions and three multiplications. The data dependencies of
the instructions in the hyperblock are shown in \in{Figure}{c}[fig:op_chain]. Curiously, even though
-operations and are normally dependent on each other due to a write-after-write conflict, but because
-the predicates are independent the write-after-write conflict can effectively be removed by the
-scheduler.
+operations would normally depend on each other due to a data conflict, the dependency can be
+removed when their predicates are mutually exclusive.
The scheduler transforms the \rtlblock\ hyperblock into the \rtlpar\ hyperblock shown in
\in{Figure}{b}[fig:op_chain]. Even though the addition in \sII\ is dependent on \sI, they can still
@@ -452,14 +482,14 @@ next clock cycle, either the multiplication in \sIV\ can take place, or the mult
will take place, meaning these two multiplications can also be placed in the same clock cycle. This
gives the final schedule that is shown in \in{Figure}{d}[fig:op_chain].
-\section[abstr_interp]{Abstract Interpretation of Hyperblocks}
+\subsection[abstr_interp]{Abstract Interpretation of Hyperblocks}
\index{symbolic execution}\seeindex{abstract interpretation}{symbolic execution}The correctness of
the scheduling algorithm is checked by comparing the symbolic expressions of each register before
and after scheduling. This section describes how these symbolic expressions are constructed for each
hyperblock.
-\subsection[abstr_language]{Construction of Abstract Language}
+\subsubsection[abstr_language]{Construction of the Abstract Language}
The main difficulty when constructing \index{symbolic expression}symbolic expressions for
hyperblocks is the presence of predicates, which can be used by the scheduling algorithm to reorder
@@ -526,20 +556,6 @@ primitive is multiplication of two predicated types, which is implemented as per
multiplication between each pair of predicated types from the two lists, conjoining the two
predicates and joining the two values using a function.
-\placeformula[eq:1]\startformula
- P_{1} \otimes_{f} P_{2} \equiv \text{\tt map } (\lambda ((p_{1}, e_{1}), (p_{2}, e_{2})) . (p_{1}
- \land p_{2}, f\ e_{1}\ e_{2})) P_{1} \times P_{2} \stopformula
-
-In addition to that, another primitive that is needed is the following append operation, which will
-negate the combination of all predicates in the second predicated type, and it to the first
-predicated type and append the first predicated type to the second:
-
-\placeformula[eq:2]\startformula
- \mu(p, P) \equiv \text{\tt map } (\lambda (p', e') . (p \land p', e'))\ P \stopformula
-
-\placeformula[eq:3]\startformula
- P_{1} \oplus_{p} P_{2} \equiv \mu(\neg p, P_{1}) + \mu(p, P_{2}) \stopformula
-
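The multiplication primitive on predicated types can be sketched in Python, keeping predicates symbolic as nested tuples; this is a hypothetical rendering for illustration, not the Coq definition:

```python
from itertools import product

# Pairwise multiplication of two predicated types: combine every
# pair drawn from the two lists, conjoining the predicates and
# joining the values with f.

def pmul(P1, P2, f):
    return [(("and", p1, p2), f(e1, e2))
            for (p1, e1), (p2, e2) in product(P1, P2)]

P1 = [("p1", "x"), (("not", "p1"), "y")]
P2 = [("p2", "z")]
print(pmul(P1, P2, lambda a, b: ("add", a, b)))
```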
\subsection[example-of-translation]{Example of translation}
\placeformula[eq:4]\startformula \startmathalignment[n=1,align={1:left}]
@@ -573,20 +589,21 @@ expressions and applying the update function defined in \in{Section}[abstr_langu
predicated expressions is guaranteed to have predicates which are mutually exclusive, meaning if the
predicate evaluates to true, all other predicates must evaluate to false.
-\subsection[linear-flat-predicated-expressions]{Linear (Flat) Predicated Expressions}
+\subsubsection[linear-flat-predicated-expressions]{Linear (Flat) Predicated Expressions}
-\index{symbolic expression}The update function for predicated expressions is quite a bit more
-complex compared to standard update functions for symbolic execution, such as the one used by . The
-main reason for this is the non-recursive nature of the predicated expressions. As the predicates
-are only available at the top-level of the expressions, combining multiple expressions means that
-the conditions and multiple possibilities also need to trickle up to the top.
+\index{symbolic expression}The update function for predicated expressions is substantially more
+complex compared to the standard update functions for symbolic execution, such as the one used by
+\cite[entry][tristan08_formal_verif_trans_valid]. The main reason for this is the non-recursive
+nature of the predicated expressions. As the predicates are only available at the top-level of the
+expressions, combining multiple expressions means that the conditions and multiple possibilities
+also need to trickle up to the top.
Instead, if the predicated instructions were recursive, one would not have to change any predicates
when evaluating the argument lists of instructions, for example, as these can just stay
predicated. The evaluation semantics would also be much simpler, because a rule can be added for
evaluating predicates.
-\section[sec:correctness]{Correctness of the Abstract Interpretation}
+\subsection[sec:correctness]{Correctness of the Abstract Interpretation}
The correctness of the scheduling step is dependent on the correctness of the equivalence check
between the two abstract languages. As the abstract languages contain predicates, this comparison
@@ -594,7 +611,7 @@ check cannot just be a syntactic comparison between the two languages, because t
be different but logically equivalent. Instead, a mixed equivalence check is performed using a
verified SAT solver and a syntactic equivalence check of the actual symbolic expressions.
-\subsection[comparing-symbolic-abstract-expressions]{Comparing Symbolic Abstract Expressions}
+\subsubsection[comparing-symbolic-abstract-expressions]{Comparing Symbolic Abstract Expressions}
Abstract expressions comprise a predicate and a symbolic expression. The translation from \rtlblock\
and \rtlpar\ to \abstr\ ensures that each register will contain a predicated symbolic expression
@@ -615,11 +632,11 @@ Instead, comparing predicates can be done by checking logical equivalence of the
formally verified \SAT\ solver. This ensures that the predicates will always evaluate to the same
value, thereby making it a fair comparison.
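As an illustration of what the equivalence query computes, the following is a truth-table stand-in for the verified SAT solver: two predicates compare equal iff they agree under every assignment, i.e. $p_1 \leftrightarrow p_2$ is valid. Real predicates are formulas over predicate registers; here they are hypothetical boolean functions:

```python
from itertools import product

def equivalent(p1, p2, nvars):
    """Truth-table stand-in for a SAT-based equivalence check:
    p1 and p2 agree under every assignment of nvars variables."""
    return all(p1(a) == p2(a)
               for a in product([False, True], repeat=nvars))

# p /\ q vs q /\ p: syntactically different, logically equivalent.
assert equivalent(lambda a: a[0] and a[1], lambda a: a[1] and a[0], 2)
assert not equivalent(lambda a: a[0], lambda a: a[1], 2)
```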
-\subsection[verified_sat]{Verified Decidable DPLL SAT Solver}
+\subsubsection[verified_sat]{Verified Decidable DPLL SAT Solver}
A SAT solver is needed to correctly and reliably compare predicated expressions. This makes it
possible to then compare many different possible schedules, even if they remove expressions
-completely if these are unnecessary. The SAT solver is needed in the verifier, which needs to be
+completely if these are unnecessary. The SAT solver is required by the verifier, which needs to be
proven correct in Coq to prove the scheduling translation correct. The SAT solver therefore has to
be verified, because it could otherwise not be used in the proof of correctness.
@@ -633,45 +650,47 @@ implies that the check is decidable, simplifying many proofs in Coq by using a \
instance for equivalence between predicates. This makes rewriting predicates if they are equivalent
much easier in proofs by reusing many of the existing Coq tactics.
-\subsection[encoding-predicated-expressions-as-sat-formulas]{Encoding Predicated Expressions as SAT
+\subsubsection[encoding-predicated-expressions-as-sat-formulas]{Encoding Predicated Expressions as SAT
Formulas}
Given the following two predicated expressions which should be tested for equivalence, one can
construct a SAT expression which will ensure that both expressions will always result in the same
value.
-%\startformula \begin{aligned}
-% \bgroup \left.\begin{aligned}
-% e_{1},\qquad &p_{1}\\
-% e_{2},\qquad &p_{2}\\
-% \vdots\quad& \\
-% e_{n},\qquad &p_{n}\\
-% \end{aligned}\right\rbrace\egroup
-% \Longleftrightarrow
-% \begin{cases}
-% e_{1}',\quad &p_{1}'\\
-% e_{2}',\quad &p_{2}'\\
-% \;\;\;\quad\vdots \notag \\
-% e_{m}',\quad &p_{m}'\\
-% \end{cases}\end{aligned} \stopformula
+\startformula \bgroup \left.\startmatrix
+ \NC e_{1},\quad \NC p_{1} \NR
+ \NC e_{2},\quad \NC p_{2}\NR
+ \NC \quad \vdots \NC \NR
+ \NC e_{n},\quad \NC p_{n}\NR
+ \stopmatrix\right\rbrace\egroup
+ \Longleftrightarrow
+ \startmathcases
+ \NC e_{1}',\NC p_{1}' \NR
+ \NC e_{2}',\NC p_{2}' \NR
+ \NC \quad \vdots \NC \NR
+ \NC e_{m}',\NC p_{m}'\NR
+ \stopmathcases \stopformula
The SAT query that should determine if two predicated expressions are equivalent is the following,
where $\mathit{he}_{i}$ is the hashed expression of $e_{i}$:
-%\startformula \begin{gathered}
-% (p_{1} \implies \mathit{he}_{1}) \land (p_{2} \implies \mathit{he}_{2}) \land \cdots \land (p_{n} \implies \mathit{he}_{n}) \\
-% \Longleftrightarrow\\
-% (p_{1}' \implies \mathit{he}_{1}') \land (p_{2}' \implies \mathit{he}_{2}') \land \cdots \land (p_{m}' \implies \mathit{he}_{m}') \\\end{gathered} \stopformula
+\startformula \startmathalignment[align={1:middle}]
+ \NC (p_{1} \implies \mathit{he}_{1}) \land (p_{2} \implies \mathit{he}_{2}) \land \cdots \land (p_{n}
+ \implies \mathit{he}_{n}) \NR
+ \NC \Longleftrightarrow \NR
+ \NC (p_{1}' \implies \mathit{he}_{1}') \land (p_{2}' \implies \mathit{he}_{2}') \land \cdots \land
+ (p_{m}' \implies \mathit{he}_{m}') \NR
+\stopmathalignment \stopformula
However, reasoning about the equivalence of individual expressions within the greater list of
-predicated expressions is not quite straightforward. The evaluation semantics of predicated
-expression list will evaluate each predicate expression sequentially, and pick the expression whose
-predicate evaluates to true. However, proving that both predicated expression lists will return the
-same value assuming that the SAT query above is true, means that for each predicate that is true,
-One needs to be able to show that there is a predicate in the other list that will also be true,
-which will return the same value. However, there is no guarantee that there is a unique predicated
-expression that will match the current predicate, as there may be duplicate expressions in the list
-with different predicates.
+predicated expressions is not quite straightforward. The evaluation semantics of a predicated
+expression list is to evaluate each predicated expression sequentially, picking the expression
+whose predicate evaluates to true. However, proving that two predicated expression lists will return
+the same value assuming that the SAT query above is true, means that for each predicate that is
+true, one needs to be able to show that there is a predicate in the other list that will also be
+true and will return the same value. However, there is no guarantee that there is a unique
+predicated expression that will match the current predicate, as there may be duplicate expressions
+in the list with different predicates.
This means that it is quite difficult to prove that the SAT query implies the semantic preservation
of the scheduling transformation. The SAT query can therefore be broken down into a slightly more
@@ -694,37 +713,29 @@ grained comparison are the following:
This comparison is logically equivalent to performing the large SAT query; however, it maps better
to the proof because it is comparing the predicates of the same expressions with each other.
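One plausible rendering of this per-expression comparison, again using a truth-table stand-in for the SAT solver and hypothetical predicate functions: for each distinct expression, the disjunction of all predicates guarding it in one list must be equivalent to the disjunction guarding it in the other, which handles duplicate expressions with different predicates directly:

```python
from itertools import product

def guard(P, e):
    """Disjunction of all predicates guarding expression e in the
    predicated-expression list P."""
    preds = [p for p, e2 in P if e2 == e]
    return lambda a: any(p(a) for p in preds)

def lists_equivalent(P1, P2, nvars):
    exprs = {e for _, e in P1} | {e for _, e in P2}
    return all(guard(P1, e)(a) == guard(P2, e)(a)
               for e in exprs
               for a in product([False, True], repeat=nvars))

p = lambda a: a[0]
notp = lambda a: not a[0]
true = lambda a: True
# [x if p, y if ~p] vs its reordering: equivalent.
assert lists_equivalent([(p, "x"), (notp, "y")],
                        [(notp, "y"), (p, "x")], 1)
# [x always] vs [x if p]: not equivalent.
assert not lists_equivalent([(true, "x")], [(p, "x")], 1)
```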
-\subsection[proof-of-equivalence-between-passes]{Proof of equivalence between passes}
+\subsection[proof-of-correctness-of-scheduling]{Proof of Correctness of Scheduling}
-\in{Figure}[fig:compcert_interm] shows the four different passes that were added to CompCert to form
-hyperblocks. This section will cover the proofs for each of the translations, focusing mainly on
-the proof of translation of the scheduling pass.
-
-\subsubsection[proof-of-correctness-of-the-if-conversion-pass]{Proof of Correctness of the
- If-Conversion Pass}
-
-The if-conversion pass is quite simple to verify, as it is an optimisation that transforms the code
-in the same language.
-
-\subsubsection[proof-of-correctness-of-scheduling]{Proof of Correctness of Scheduling}
-
-The scheduling pass is formally verified using translation validation, using the \abstr\ language as
+The scheduling pass is formally verified using translation validation using the \abstr\ language as
a point of comparison between \rtlblock\ and \rtlpar. Using the algorithm defined in
\in{Section}[abstr_interp], the formally verified SAT solver described in \in{Section}[verified_sat]
-can be used prove that if the two predicated expression lists are equivalent, that they will always
-evaluate to the same value.
-
-\section[implementation-details-of-static-sdc-scheduling]{\SDC\ Scheduling Implementation Details}
-
-This section describes the implementation details of the static \SDC\ scheduling algorithm used, and
-also describes some of the details of the basic block generation as well as the if-conversion pass
-which generates the predicated instructions.
-
-\subsection[static-sdc-scheduling]{Static \SDC\ Scheduling}
-
-\SDC\ scheduling~\cite[cong06_sdc] is a flexible algorithm to perform scheduling of instructions,
-especially if the target is hardware directly. It allows for flexible definitions of optimisation
-functions in terms of latencies and dependencies of the operations in the basic block.
+can be used to prove that if the two predicated expression lists are equivalent, then they will
+always evaluate to the same value. This follows directly from the proofs of correctness for the
+translation to \abstr. Then there is a proof that, if the SAT-based comparison function between two
+\abstr\ programs succeeds, the two abstract programs behave identically; this proof can be composed
+with the proofs of correctness of the generation of \abstr.
+
+\section[scheduling:conclusion]{Conclusion}
+
+In general, this proof of correctness for scheduling was the most challenging, because as the
+scheduling algorithm evolved, the language semantics of \rtlblock\ and \rtlpar\ also evolved, which
+in turn required large changes in the proofs of correctness for each step. In addition, the whole
+scheduling proof, from \rtlblock\ generation to hardware generation, required substantially
+different proof techniques at each step. Even simple translations like the basic block generation
+and if-conversion passes required quite a few insights into how the algorithms actually work and
+how the semantics behave, in order to come up with matching relations that are powerful enough to
+prove the final correctness result while also being provable in the first place. Simply formulating
+these relations requires a lot of experimentation.
\startmode[chapter]
\section{Bibliography}