summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2022-06-12 11:28:15 -0700
committerYann Herklotz <git@yannherklotz.com>2022-06-12 11:28:15 -0700
commitb93723f5d4391333b6ed12cea8c3f49871976927 (patch)
treeec540a80d891c8afb10f2be62baefb1823f77981
parentd91b97dfed94a38147894621404b9df075460084 (diff)
downloadlsr22_fvhls-b93723f5d4391333b6ed12cea8c3f49871976927.tar.gz
lsr22_fvhls-b93723f5d4391333b6ed12cea8c3f49871976927.zip
Changes
-rw-r--r--chapters/background.tex2
-rw-r--r--chapters/hls.tex4
-rw-r--r--chapters/pipelining.tex2
-rw-r--r--chapters/scheduling.tex34
-rw-r--r--figures/timing-1.pdfbin26407 -> 24532 bytes
-rw-r--r--figures/timing-2.pdfbin26760 -> 24881 bytes
-rw-r--r--figures/timing-3.pdfbin23394 -> 22448 bytes
-rw-r--r--lsr_env.tex2
8 files changed, 25 insertions, 19 deletions
diff --git a/chapters/background.tex b/chapters/background.tex
index 7f8539c..d4ec91f 100644
--- a/chapters/background.tex
+++ b/chapters/background.tex
@@ -303,7 +303,7 @@ assuming that $\sigma$ is dominated by $d$ ($p \le_{d} d$), then the following e
\stopformula
This is an important lemma as it essentially allows one to know the value of a register as long as
-one knows that it's assignment dominates the current node and one knows what expressions it was
+one knows that its assignment dominates the current node and one knows what expressions it was
assigned.
\subsection{HLS Formalised in Isabelle}
diff --git a/chapters/hls.tex b/chapters/hls.tex
index 7e1f79e..0be3f99 100644
--- a/chapters/hls.tex
+++ b/chapters/hls.tex
@@ -187,7 +187,7 @@ always-blocks triggering on the same event are executed in parallel. Always-bloc
control-flow using if-statements and case-statements.
\startplacemarginfigure[reference={fig:tutorial:state_machine},title={Example of a state machine
- implementation in Verilog and it's corresponding state diagram.}]
+ implementation in Verilog and its corresponding state diagram.}]
\startfloatcombination[nx=2, ny=1]
\startplacesubfigure
\startframedtext[width={0.6\textwidth},frame=off,offset=none,bodyfont=11pt]
@@ -242,7 +242,7 @@ simultaneously. The first always-block controls the values in the register \typ
When the \type{state} is 0, \type{tmp} will be assigned to the input \type{y} using nonblocking
assignment, denoted by \type{<=}. Nonblocking assignment assigns registers in parallel at the end
of the clock cycle, rather than sequentially throughout the always-block. In the second
-always-block, the input \type{y} will be checked, and if it's high it will move on to the next
+always-block, the input \type{y} will be checked, and if it is high it will move on to the next
state, otherwise it will stay in the current state. When \type{state} is 1, the first always-block
will reset the value of \type{tmp} and then set \type{z} to the original value of \type{tmp}, since
nonblocking assignment does not change its value until the end of the clock cycle. Finally, the
diff --git a/chapters/pipelining.tex b/chapters/pipelining.tex
index b0bdb20..7d81da6 100644
--- a/chapters/pipelining.tex
+++ b/chapters/pipelining.tex
@@ -17,7 +17,7 @@ addresses parallelisation within one iteration. Traditionally, loop pipelining
of the normal scheduling step in \HLS, as the scheduling algorithm can be generalised to support
these new constraints. However, it becomes expensive to check the correctness of a scheduling
algorithm that transforms the code in this many ways. It is better to separate loop pipelining into
-it's own translation pass which does a source-to-source translation of the code. The final result
+its own translation pass which does a source-to-source translation of the code. The final result
should be similar to performing the loop pipelining together with the scheduling.
\section{Loop pipelining example}
diff --git a/chapters/scheduling.tex b/chapters/scheduling.tex
index 9412ea4..4b897ef 100644
--- a/chapters/scheduling.tex
+++ b/chapters/scheduling.tex
@@ -10,7 +10,7 @@
\oindex{static scheduling}The use of multicore processors has been increasing drastically, thereby
making parallelising compilers ever more important. In addition to that, the need for custom
hardware accelerators is also increasing, as traditional processors are not always the best choice
-for all applications. Compilers that support optimisations which can automatically parallelise it's
+for all applications. Compilers that support optimisations which can automatically parallelise its
input programs are therefore also becoming more important, as these back ends all benefit from
it. For processors, one wants to optimise the pipeline usage as much as possible, whereas in
hardware one can create custom pipelines, as well as making more use out of the spacial properties
@@ -31,7 +31,7 @@ can never move instructions past a branching instruction, even when this might d
the performance. Trace scheduling~\cite[fisher81_trace_sched] is an alternative that addresses this
issue, by creating paths that might cross branching instructions, and scheduling the instructions in
that path. However, the main problem with trace scheduling is that it is often infeasible on large
-programs, and because of the large design space, it's sometimes hard to find an optimal
+programs, and because of the large design space, it is sometimes hard to find an optimal
schedule. \index{superblock}Superblock~\cite[hwu93_super] and
\index{hyperblock}hyperblock~\cite[mahlke92_effec_compil_suppor_predic_execut_using_hyper]
scheduling are two subsets of trace scheduling, which compromise on the flexibility of trace
@@ -254,23 +254,29 @@ hyperblocks. It has a few limitations on the kind of conditional statements that
First, only conditional statements without loops can be translated, therefore, one must identify
cycles in the control-flow before performing the if-conversion. Secondly, if-conversion will not
always result in more efficient code, so it should not be applied to any conditional
-statements. Instead, it is best applied to conditional statements where each branch will take a
-similar amount of time to execute.
+statements.
From an implementation point of view, if-conversion can be implemented naïvely, however, it could be
improved by adding better heuristics for which conditional statements to transform into predicated
instructions. For hyperblocks, conditionals where each branch approximately execute in the same
number of cycles are an ideal candidate for the transformation, as it means that no time is lost due
-to having to execute extra predicated instructions that are false. However, in addition to that if
-one has an estimate on the probability of picking a branch, then one can introduce more interesting
-heuristics such as also transforming conditionals who have disproportionate branches, but where one
-is more likely to execute the longest branch every time.
+to having to execute extra predicated instructions that are false. However, as hyperblocks are an
+extension of superblocks, single paths through the control-flow graph can also be represented. In
+our implementation, we use simple static heuristics to pick these paths~\cite[??].
-For back ends that do not support predicates, it is also important to eliminate the predicates again
-using a process called reverse if-conversion, which creates branches out of the predicates and
-groups these together again. This allows for hyperblock scheduling to also be used with back ends
-that do not support predicates, which are most the default back ends included in CompCert. However,
-this is currently out of scope for the implementation.
+\subsection[proof-if-conversion]{Correctness of If-Conversion}
+
+The correctness of the if-conversion pass does not rely on which paths are chosen to be combined.
+The proof therefore does not have to cover the static heuristics that are generated, because these
+only affect the performance. Even though the translation is quite simple, the reasoning about it's
+correctness does have its difficulties. We have a similar problem as before, where we sometimes
+need to execute multiple steps in the input program to reach one step in the output program. The
+solution is to perform a similar style proof as for the basic block generation.
+
+The main difference is the matching condition, as the hyperblock is not reduced instruction by
+instruction anymore, but section by section, depending on if a section was if-converted or not.
+This means that at each point, one is either at the head of a section or in the middle of an
+if-converted section.
\section[sec:scheduling]{Scheduling Implementation}
@@ -681,7 +687,7 @@ grained comparison are the following:
predicates are combined using the or operation.
\item
Iterate through the expressions in both maps and compare the predicates using the SAT solver. If
- an expression is not present in one of the maps, then it's predicate should be equivalent to
+ an expression is not present in one of the maps, then its predicate should be equivalent to
$\perp$.
\stopitemize
diff --git a/figures/timing-1.pdf b/figures/timing-1.pdf
index 24f8edd..b540cb0 100644
--- a/figures/timing-1.pdf
+++ b/figures/timing-1.pdf
Binary files differ
diff --git a/figures/timing-2.pdf b/figures/timing-2.pdf
index 36075c6..34dab0d 100644
--- a/figures/timing-2.pdf
+++ b/figures/timing-2.pdf
Binary files differ
diff --git a/figures/timing-3.pdf b/figures/timing-3.pdf
index 68f4cf8..04262be 100644
--- a/figures/timing-3.pdf
+++ b/figures/timing-3.pdf
Binary files differ
diff --git a/lsr_env.tex b/lsr_env.tex
index b144119..a97e380 100644
--- a/lsr_env.tex
+++ b/lsr_env.tex
@@ -33,7 +33,7 @@
\definefontfeature[default][default][protrusion=quality,expansion=quality]
\setupalign[hz,hanging,lesshyphenation,verytolerant]
-\startnotmode[nolmtx]\setupbodyfont[lsrfont,11pt]\stopnotmode
+\startnotmode[nolmtx]\setupbodyfont[pagella,11pt]\stopnotmode
\startmode[nolmtx]\setupbodyfont[pagella,11pt]\stopmode
%\setupalign[hz,hanging,nothyphenated]
\setupinterlinespace[big]