From b93723f5d4391333b6ed12cea8c3f49871976927 Mon Sep 17 00:00:00 2001 From: Yann Herklotz Date: Sun, 12 Jun 2022 11:28:15 -0700 Subject: Changes --- chapters/background.tex | 2 +- chapters/hls.tex | 4 ++-- chapters/pipelining.tex | 2 +- chapters/scheduling.tex | 34 ++++++++++++++++++++-------------- figures/timing-1.pdf | Bin 26407 -> 24532 bytes figures/timing-2.pdf | Bin 26760 -> 24881 bytes figures/timing-3.pdf | Bin 23394 -> 22448 bytes lsr_env.tex | 2 +- 8 files changed, 25 insertions(+), 19 deletions(-) diff --git a/chapters/background.tex b/chapters/background.tex index 7f8539c..d4ec91f 100644 --- a/chapters/background.tex +++ b/chapters/background.tex @@ -303,7 +303,7 @@ assuming that $\sigma$ is dominated by $d$ ($p \le_{d} d$), then the following e \stopformula This is an important lemma as it essentially allows one to know the value of a register as long as -one knows that it's assignment dominates the current node and one knows what expressions it was +one knows that its assignment dominates the current node and one knows what expressions it was assigned. \subsection{HLS Formalised in Isabelle} diff --git a/chapters/hls.tex b/chapters/hls.tex index 7e1f79e..0be3f99 100644 --- a/chapters/hls.tex +++ b/chapters/hls.tex @@ -187,7 +187,7 @@ always-blocks triggering on the same event are executed in parallel. Always-bloc control-flow using if-statements and case-statements. \startplacemarginfigure[reference={fig:tutorial:state_machine},title={Example of a state machine - implementation in Verilog and it's corresponding state diagram.}] + implementation in Verilog and its corresponding state diagram.}] \startfloatcombination[nx=2, ny=1] \startplacesubfigure \startframedtext[width={0.6\textwidth},frame=off,offset=none,bodyfont=11pt] @@ -242,7 +242,7 @@ simultaneously. The first always-block controls the values in the register \typ When the \type{state} is 0, \type{tmp} will be assigned to the input \type{y} using nonblocking assignment, denoted by \type{<=}. Nonblocking assignment assigns registers in parallel at the end of the clock cycle, rather than sequentially throughout the always-block. In the second -always-block, the input \type{y} will be checked, and if it's high it will move on to the next +always-block, the input \type{y} will be checked, and if it is high it will move on to the next state, otherwise it will stay in the current state. When \type{state} is 1, the first always-block will reset the value of \type{tmp} and then set \type{z} to the original value of \type{tmp}, since nonblocking assignment does not change its value until the end of the clock cycle. Finally, the diff --git a/chapters/pipelining.tex b/chapters/pipelining.tex index b0bdb20..7d81da6 100644 --- a/chapters/pipelining.tex +++ b/chapters/pipelining.tex @@ -17,7 +17,7 @@ addresses parallelisation within one iteration. Traditionally, loop pipelining of the normal scheduling step in \HLS, as the scheduling algorithm can be generalised to support these new constraints. However, it becomes expensive to check the correctness of a scheduling algorithm that transforms the code in this many ways. It is better to separate loop pipelining into -it's own translation pass which does a source-to-source translation of the code. The final result +its own translation pass which does a source-to-source translation of the code. The final result should be similar to performing the loop pipelining together with the scheduling. \section{Loop pipelining example} diff --git a/chapters/scheduling.tex b/chapters/scheduling.tex index 9412ea4..4b897ef 100644 --- a/chapters/scheduling.tex +++ b/chapters/scheduling.tex @@ -10,7 +10,7 @@ \oindex{static scheduling}The use of multicore processors has been increasing drastically, thereby making parallelising compilers ever more important. In addition to that, the need for custom hardware accelerators is also increasing, as traditional processors are not always the best choice -for all applications. Compilers that support optimisations which can automatically parallelise it's +for all applications. Compilers that support optimisations which can automatically parallelise its input programs are therefore also becoming more important, as these back ends all benefit from it. For processors, one wants to optimise the pipeline usage as much as possible, whereas in hardware one can create custom pipelines, as well as making more use out of the spacial properties @@ -31,7 +31,7 @@ can never move instructions past a branching instruction, even when this might d the performance. Trace scheduling~\cite[fisher81_trace_sched] is an alternative that addresses this issue, by creating paths that might cross branching instructions, and scheduling the instructions in that path. However, the main problem with trace scheduling is that it is often infeasible on large -programs, and because of the large design space, it's sometimes hard to find an optimal +programs, and because of the large design space, it is sometimes hard to find an optimal schedule. \index{superblock}Superblock~\cite[hwu93_super] and \index{hyperblock}hyperblock~\cite[mahlke92_effec_compil_suppor_predic_execut_using_hyper] scheduling are two subsets of trace scheduling, which compromise on the flexibility of trace @@ -254,23 +254,29 @@ hyperblocks. It has a few limitations on the kind of conditional statements that First, only conditional statements without loops can be translated, therefore, one must identify cycles in the control-flow before performing the if-conversion. Secondly, if-conversion will not always result in more efficient code, so it should not be applied to any conditional -statements. Instead, it is best applied to conditional statements where each branch will take a -similar amount of time to execute. +statements. From an implementation point of view, if-conversion can be implemented naïvely, however, it could be improved by adding better heuristics for which conditional statements to transform into predicated instructions. For hyperblocks, conditionals where each branch approximately execute in the same number of cycles are an ideal candidate for the transformation, as it means that no time is lost due -to having to execute extra predicated instructions that are false. However, in addition to that if -one has an estimate on the probability of picking a branch, then one can introduce more interesting -heuristics such as also transforming conditionals who have disproportionate branches, but where one -is more likely to execute the longest branch every time. +to having to execute extra predicated instructions that are false. However, as hyperblocks are an +extension of superblocks, single paths through the control-flow graph can also be represented. In +our implementation, we use simple static heuristics to pick these paths~\cite[??]. -For back ends that do not support predicates, it is also important to eliminate the predicates again -using a process called reverse if-conversion, which creates branches out of the predicates and -groups these together again. This allows for hyperblock scheduling to also be used with back ends -that do not support predicates, which are most the default back ends included in CompCert. However, -this is currently out of scope for the implementation. +\subsection[proof-if-conversion]{Correctness of If-Conversion} + +The correctness of the if-conversion pass does not rely on which paths are chosen to be combined. +The proof therefore does not have to cover the static heuristics that are generated, because these +only affect the performance. Even though the translation is quite simple, the reasoning about it's +correctness does have its difficulties. We have a similar problem as before, where we sometimes +need to execute multiple steps in the input program to reach one step in the output program. The +solution is to perform a similar style proof as for the basic block generation. + +The main difference is the matching condition, as the hyperblock is not reduced instruction by +instruction anymore, but section by section, depending on if a section was if-converted or not. +This means that at each point, one is either at the head of a section or in the middle of an +if-converted section. \section[sec:scheduling]{Scheduling Implementation} @@ -681,7 +687,7 @@ grained comparison are the following: predicates are combined using the or operation. \item Iterate through the expressions in both maps and compare the predicates using the SAT solver. If - an expression is not present in one of the maps, then it's predicate should be equivalent to + an expression is not present in one of the maps, then its predicate should be equivalent to $\perp$. \stopitemize diff --git a/figures/timing-1.pdf b/figures/timing-1.pdf index 24f8edd..b540cb0 100644 Binary files a/figures/timing-1.pdf and b/figures/timing-1.pdf differ diff --git a/figures/timing-2.pdf b/figures/timing-2.pdf index 36075c6..34dab0d 100644 Binary files a/figures/timing-2.pdf and b/figures/timing-2.pdf differ diff --git a/figures/timing-3.pdf b/figures/timing-3.pdf index 68f4cf8..04262be 100644 Binary files a/figures/timing-3.pdf and b/figures/timing-3.pdf differ diff --git a/lsr_env.tex b/lsr_env.tex index b144119..a97e380 100644 --- a/lsr_env.tex +++ b/lsr_env.tex @@ -33,7 +33,7 @@ \definefontfeature[default][default][protrusion=quality,expansion=quality] \setupalign[hz,hanging,lesshyphenation,verytolerant] -\startnotmode[nolmtx]\setupbodyfont[lsrfont,11pt]\stopnotmode +\startnotmode[nolmtx]\setupbodyfont[pagella,11pt]\stopnotmode \startmode[nolmtx]\setupbodyfont[pagella,11pt]\stopmode %\setupalign[hz,hanging,nothyphenated] \setupinterlinespace[big] -- cgit