From b93723f5d4391333b6ed12cea8c3f49871976927 Mon Sep 17 00:00:00 2001
From: Yann Herklotz <git@yannherklotz.com>
Date: Sun, 12 Jun 2022 11:28:15 -0700
Subject: Changes

---
 chapters/background.tex |   2 +-
 chapters/hls.tex        |   4 ++--
 chapters/pipelining.tex |   2 +-
 chapters/scheduling.tex |  34 ++++++++++++++++++++--------------
 figures/timing-1.pdf    | Bin 26407 -> 24532 bytes
 figures/timing-2.pdf    | Bin 26760 -> 24881 bytes
 figures/timing-3.pdf    | Bin 23394 -> 22448 bytes
 lsr_env.tex             |   2 +-
 8 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/chapters/background.tex b/chapters/background.tex
index 7f8539c..d4ec91f 100644
--- a/chapters/background.tex
+++ b/chapters/background.tex
@@ -303,7 +303,7 @@ assuming that $\sigma$ is dominated by $d$ ($p \le_{d} d$), then the following e
 \stopformula
 
 This is an important lemma as it essentially allows one to know the value of a register as long as
-one knows that it's assignment dominates the current node and one knows what expressions it was
+one knows that its assignment dominates the current node and one knows what expressions it was
 assigned.
 
 \subsection{HLS Formalised in Isabelle}
diff --git a/chapters/hls.tex b/chapters/hls.tex
index 7e1f79e..0be3f99 100644
--- a/chapters/hls.tex
+++ b/chapters/hls.tex
@@ -187,7 +187,7 @@ always-blocks triggering on the same event are executed in parallel. Always-bloc
 control-flow using if-statements and case-statements.
 
 \startplacemarginfigure[reference={fig:tutorial:state_machine},title={Example of a state machine
-    implementation in Verilog and it's corresponding state diagram.}]
+    implementation in Verilog and its corresponding state diagram.}]
   \startfloatcombination[nx=2, ny=1]
     \startplacesubfigure
       \startframedtext[width={0.6\textwidth},frame=off,offset=none,bodyfont=11pt]
@@ -242,7 +242,7 @@ simultaneously.  The first always-block controls the values in the register \typ
 When the \type{state} is 0, \type{tmp} will be assigned to the input \type{y} using nonblocking
 assignment, denoted by \type{<=}.  Nonblocking assignment assigns registers in parallel at the end
 of the clock cycle, rather than sequentially throughout the always-block. In the second
-always-block, the input \type{y} will be checked, and if it's high it will move on to the next
+always-block, the input \type{y} will be checked, and if it is high it will move on to the next
 state, otherwise it will stay in the current state.  When \type{state} is 1, the first always-block
 will reset the value of \type{tmp} and then set \type{z} to the original value of \type{tmp}, since
 nonblocking assignment does not change its value until the end of the clock cycle.  Finally, the
diff --git a/chapters/pipelining.tex b/chapters/pipelining.tex
index b0bdb20..7d81da6 100644
--- a/chapters/pipelining.tex
+++ b/chapters/pipelining.tex
@@ -17,7 +17,7 @@ addresses parallelisation within one iteration.  Traditionally, loop pipelining
 of the normal scheduling step in \HLS, as the scheduling algorithm can be generalised to support
 these new constraints.  However, it becomes expensive to check the correctness of a scheduling
 algorithm that transforms the code in this many ways.  It is better to separate loop pipelining into
-it's own translation pass which does a source-to-source translation of the code.  The final result
+its own translation pass which does a source-to-source translation of the code.  The final result
 should be similar to performing the loop pipelining together with the scheduling.
 
 \section{Loop pipelining example}
diff --git a/chapters/scheduling.tex b/chapters/scheduling.tex
index 9412ea4..4b897ef 100644
--- a/chapters/scheduling.tex
+++ b/chapters/scheduling.tex
@@ -10,7 +10,7 @@
 \oindex{static scheduling}The use of multicore processors has been increasing drastically, thereby
 making parallelising compilers ever more important. In addition to that, the need for custom
 hardware accelerators is also increasing, as traditional processors are not always the best choice
-for all applications. Compilers that support optimisations which can automatically parallelise it's
+for all applications. Compilers that support optimisations which can automatically parallelise its
 input programs are therefore also becoming more important, as these back ends all benefit from
 it. For processors, one wants to optimise the pipeline usage as much as possible, whereas in
 hardware one can create custom pipelines, as well as making more use out of the spacial properties
@@ -31,7 +31,7 @@ can never move instructions past a branching instruction, even when this might d
 the performance. Trace scheduling~\cite[fisher81_trace_sched] is an alternative that addresses this
 issue, by creating paths that might cross branching instructions, and scheduling the instructions in
 that path. However, the main problem with trace scheduling is that it is often infeasible on large
-programs, and because of the large design space, it's sometimes hard to find an optimal
+programs, and because of the large design space, it is sometimes hard to find an optimal
 schedule. \index{superblock}Superblock~\cite[hwu93_super] and
 \index{hyperblock}hyperblock~\cite[mahlke92_effec_compil_suppor_predic_execut_using_hyper]
 scheduling are two subsets of trace scheduling, which compromise on the flexibility of trace
@@ -254,23 +254,29 @@ hyperblocks. It has a few limitations on the kind of conditional statements that
 First, only conditional statements without loops can be translated, therefore, one must identify
 cycles in the control-flow before performing the if-conversion. Secondly, if-conversion will not
 always result in more efficient code, so it should not be applied to any conditional
-statements. Instead, it is best applied to conditional statements where each branch will take a
-similar amount of time to execute.
+statements.
 
 From an implementation point of view, if-conversion can be implemented naïvely, however, it could be
 improved by adding better heuristics for which conditional statements to transform into predicated
 instructions.  For hyperblocks, conditionals where each branch approximately execute in the same
 number of cycles are an ideal candidate for the transformation, as it means that no time is lost due
-to having to execute extra predicated instructions that are false.  However, in addition to that if
-one has an estimate on the probability of picking a branch, then one can introduce more interesting
-heuristics such as also transforming conditionals who have disproportionate branches, but where one
-is more likely to execute the longest branch every time.
+to having to execute extra predicated instructions that are false.  However, as hyperblocks are an
+extension of superblocks, single paths through the control-flow graph can also be represented.  In
+our implementation, we use simple static heuristics to pick these paths~\cite[??].
 
-For back ends that do not support predicates, it is also important to eliminate the predicates again
-using a process called reverse if-conversion, which creates branches out of the predicates and
-groups these together again. This allows for hyperblock scheduling to also be used with back ends
-that do not support predicates, which are most the default back ends included in CompCert.  However,
-this is currently out of scope for the implementation.
+\subsection[proof-if-conversion]{Correctness of If-Conversion}
+
+The correctness of the if-conversion pass does not rely on which paths are chosen to be combined.
+The proof therefore does not have to cover the static heuristics that are generated, because these
+only affect the performance.  Even though the translation is quite simple, the reasoning about it's
+correctness does have its difficulties.  We have a similar problem as before, where we sometimes
+need to execute multiple steps in the input program to reach one step in the output program.  The
+solution is to perform a similar style proof as for the basic block generation.
+
+The main difference is the matching condition, as the hyperblock is not reduced instruction by
+instruction anymore, but section by section, depending on if a section was if-converted or not.
+This means that at each point, one is either at the head of a section or in the middle of an
+if-converted section.
 
 \section[sec:scheduling]{Scheduling Implementation}
 
@@ -681,7 +687,7 @@ grained comparison are the following:
   predicates are combined using the or operation.
 \item
   Iterate through the expressions in both maps and compare the predicates using the SAT solver. If
-  an expression is not present in one of the maps, then it's predicate should be equivalent to
+  an expression is not present in one of the maps, then its predicate should be equivalent to
   $\perp$.
 \stopitemize
 
diff --git a/figures/timing-1.pdf b/figures/timing-1.pdf
index 24f8edd..b540cb0 100644
Binary files a/figures/timing-1.pdf and b/figures/timing-1.pdf differ
diff --git a/figures/timing-2.pdf b/figures/timing-2.pdf
index 36075c6..34dab0d 100644
Binary files a/figures/timing-2.pdf and b/figures/timing-2.pdf differ
diff --git a/figures/timing-3.pdf b/figures/timing-3.pdf
index 68f4cf8..04262be 100644
Binary files a/figures/timing-3.pdf and b/figures/timing-3.pdf differ
diff --git a/lsr_env.tex b/lsr_env.tex
index b144119..a97e380 100644
--- a/lsr_env.tex
+++ b/lsr_env.tex
@@ -33,7 +33,7 @@
 
 \definefontfeature[default][default][protrusion=quality,expansion=quality]
 \setupalign[hz,hanging,lesshyphenation,verytolerant]
-\startnotmode[nolmtx]\setupbodyfont[lsrfont,11pt]\stopnotmode
+\startnotmode[nolmtx]\setupbodyfont[pagella,11pt]\stopnotmode
 \startmode[nolmtx]\setupbodyfont[pagella,11pt]\stopmode
 %\setupalign[hz,hanging,nothyphenated]
 \setupinterlinespace[big]
-- 
cgit