summaryrefslogtreecommitdiffstats
path: root/algorithm.tex
diff options
context:
space:
mode:
authorYann Herklotz <ymh15@ic.ac.uk>2020-11-21 11:08:06 +0000
committeroverleaf <overleaf@localhost>2020-11-21 11:08:52 +0000
commitef7ac73793f3a1f119a6da176c51c1f90b558d41 (patch)
tree4f6f2fad7d47ef1e3e39857eb27187076db6ed45 /algorithm.tex
parent94dbc2a5e70ffa2d6b4a464cb2880dd4affdccfe (diff)
downloadoopsla21_fvhls-ef7ac73793f3a1f119a6da176c51c1f90b558d41.tar.gz
oopsla21_fvhls-ef7ac73793f3a1f119a6da176c51c1f90b558d41.zip
Update on Overleaf.
Diffstat (limited to 'algorithm.tex')
-rw-r--r--algorithm.tex8
1 files changed, 4 insertions, 4 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 3a8882c..1191c93 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -65,7 +65,7 @@ The main work flow of \vericert{} is given in Figure~\ref{fig:rtlbranch}, which
\compcert{} translates Clight\footnote{A deterministic subset of C with pure expressions.} input into assembly output via a sequence of intermediate languages; we must decide which of these \numcompcertlanguages{} languages is the most suitable starting point for the HLS-specific translation stages.
-We select CompCert's three-address code (3AC)\footnote{This is known as register transfer language (RTL) in the \compcert{} literature. `3AC' is used in this paper instead to avoid confusion with register-transfer level (RTL), which is another name for the final hardware target of the HLS tool.} as the starting point. Branching off before this point (at CminorSel or earlier) denies \compcert{} the opportunity to perform optimisations such as constant propagation and dead code elimination, which have been found useful in HLS tools as well as software compilers~\cite{cong+11}. And if we branch off after this point (at LTL or later) then \compcert{} has already performed register allocation to reduce the number of registers and spill some variables to the stack; this transformation is not required in HLS because there are many more registers available, and these should be used instead of RAM whenever possible. %\JP{``\compcert{} performs register allocation during the translation to LTL, with some registers spilled onto the stack: this is unnecessary in HLS since as many registers as are required may be described in the output RTL.''} \JP{Maybe something about FPGAs being register-dense (so rarely a need to worry about the number of flops)?}
+We select CompCert's three-address code (3AC)\footnote{This is known as register transfer language (RTL) in the \compcert{} literature. `3AC' is used in this paper instead to avoid confusion with register-transfer level (RTL), which is another name for the final hardware target of the HLS tool.} as the starting point. Branching off before this point (at CminorSel or earlier) denies \compcert{} the opportunity to perform optimisations such as constant propagation and dead code elimination, which have been found to be useful in HLS tools as well as software compilers~\cite{cong+11}. Instead, if we branch off after this point (at LTL or later) then \compcert{} has already performed register allocation to reduce the number of registers and spill some variables to the stack; this transformation is not required in HLS because there are many more registers available, and these should be used instead of RAM whenever possible. %\JP{``\compcert{} performs register allocation during the translation to LTL, with some registers spilled onto the stack: this is unnecessary in HLS since as many registers as are required may be described in the output RTL.''} \JP{Maybe something about FPGAs being register-dense (so rarely a need to worry about the number of flops)?}
3AC is also attractive because it is the closest intermediate language to LLVM IR, which is used by several existing HLS compilers. %\JP{We already ruled out LLVM as a starting point, so this seems like it needs further qualification.}\YH{Well not because it's not a good starting point, but the ecosystem in Coq isn't as good. I think it's still OK here to say that being similar to LLVM IR is an advantage?}
It has an unlimited number of pseudo-registers, and is represented as a control flow graph (CFG) where each instruction is a node with links to the instructions that can follow it. One difference between LLVM IR and 3AC is that 3AC includes operations that are specific to the chosen target architecture; we chose to target the x86\_32 backend, because it generally produces relatively dense 3AC thanks to the availability of complex addressing modes.% reducing cycle counts in the absence of an effective scheduling approach.
@@ -204,7 +204,7 @@ As part of this translation, function inlining is also performed on all function
The next translation is from 3AC to a new hardware translation language (HTL). %, which is one step towards being completely translated to hardware described in Verilog.
This involves going from a CFG representation of the computation to a finite state machine with data-path (FSMD) representation~\cite{hwang99_fsmd}. The core idea of the FSMD representation is that it separates the control flow from the operations on the memory and registers. %\JP{I've become less comfortable with this term, but it's personal preference so feel free to ignore. I think `generalised finite state machine' (i.e.\ thinking of the entire `data-path' as contributing to the overall state) is more accurate.}\YH{Hmm, yes, I mainly chose FSMD because there is quite a lot of literature around it. I think for now I'll keep it but for the final draft we could maybe change it.}
%This means that the state transitions can be translated into a simple finite state machine (FSM) where each state contains data operations that update the memory and registers.
-Hence, an HTL program consists of two maps consisting of Verilog statements: control- and data-path maps that express state transitions and computations respectively.
+Hence, an HTL program consists of two maps from states to Verilog statements: control- and data-path maps that express state transitions and computations respectively.
Figure~\ref{fig:accumulator_diagram} shows the resulting FSMD architecture. The right-hand block is the control logic that computes the next state, while the left-hand block updates all the registers and RAM based on the current program state.
\begin{figure*}
@@ -293,7 +293,7 @@ Figure~\ref{fig:accumulator_diagram} shows the resulting FSMD architecture. The
\node[scale=0.4] at (3.5,3.6) {$\cdots$};
\node[scale=0.4] at (3.5,3.4) {\texttt{reg\_8}};
\end{tikzpicture}}
- \caption{The FSMD for the example shown in Figure~\ref{fig:accumulator_c_rtl}, split into a data path and control logic for the next state calculation. The Update block takes the current state, current values of all registers and at most one value stored in the array, and calculates a new value that can either be stored back in the array or in a register.}\label{fig:accumulator_diagram}
+ \caption{The FSMD for the example shown in Figure~\ref{fig:accumulator_c_rtl}, split into a data-path and control logic for the next state calculation. The Update block takes the current state, current values of all registers and at most one value stored in the array, and calculates a new value that can either be stored back in the array or in a register.}\label{fig:accumulator_diagram}
\end{figure*}
%\JP{Does it? Verilog has neither physical registers nor RAMs, just language constructs which the synthesiser might implement with registers and RAMs. We should be clear whether we're talking about the HDL representation, or the synthesised result: in our case these can be very different since we don't target any specific architectural features of an FPGA fabric of ASIC process.}
@@ -312,7 +312,7 @@ For example, state 15 in Figure~\ref{fig:accumulator_rtl} shows a 32-bit registe
Note that the comparison in state 3 is signed. C and Verilog handle signedness quite differently; by default, all operators and registers in Verilog (and HTL) are unsigned, so to force an operation to handle the bits as signed, both operators have to be forced to be signed. In addition to that, Verilog implicitly resizes expressions to the largest needed size by default, which can affect the result of the computation. This feature is not supported by the Verilog semantics we adopted, so to match the semantics to the behaviour of the simulator and synthesis tool, braces are placed around all expressions as this inhibits implicit resizing. Instead, explicit resizing is used in the semantics and operations can only be performed on two registers that have the same size.
-In addition to that, equality between \emph{unsigned} variables are actually not supported, because this requires supporting the comparison of pointers, which should only be performed between pointers with the same provenance. In \vericert{} there is currently no way to determine the provenance of a pointer, and it therefore cannot model the semantics of unsigned comparison in \compcert{}. As dynamic allocation is not supported either, comparison of pointers is rarely needed, and for the comparison of integers, these can be cast to signed integers for the translation to succeed.
+In addition to that, equality between \emph{unsigned} variables are actually not supported, because this requires supporting the comparison of pointers, which should only be performed between pointers with the same provenance. In \vericert{} there is currently no way to determine the provenance of a pointer, and it therefore cannot model the semantics of unsigned comparison in \compcert{}. As dynamic allocation is not supported either, comparison of pointers is rarely needed, and for the comparison of integers, these can be cast to signed integers during the comparison for the translation to succeed.
\subsubsection{Translating HTL to Verilog}