summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorJohn Wickerson <j.wickerson@imperial.ac.uk>2020-11-17 15:17:01 +0000
committeroverleaf <overleaf@localhost>2020-11-17 15:17:05 +0000
commit12a1efe6be5023394bbb3a5adf320700156add08 (patch)
treea3ea1188091b6a637ccc399db415a2d856190d1c
parentbf2274eda4164b5d085a4359c0d6158e3e59f9dc (diff)
downloadoopsla21_fvhls-12a1efe6be5023394bbb3a5adf320700156add08.tar.gz
oopsla21_fvhls-12a1efe6be5023394bbb3a5adf320700156add08.zip
Update on Overleaf.
-rw-r--r--acmart.cls2
-rw-r--r--algorithm.tex46
-rw-r--r--introduction.tex18
-rw-r--r--main.tex2
4 files changed, 43 insertions, 25 deletions
diff --git a/acmart.cls b/acmart.cls
index 239698b..e8b3b9b 100644
--- a/acmart.cls
+++ b/acmart.cls
@@ -489,7 +489,7 @@
\def\@makefnmark{\hbox{\@textsuperscript{\normalfont\@thefnmark}}}
\let\@footnotemark@nolink\@footnotemark
\let\@footnotetext@nolink\@footnotetext
-\RequirePackage[bookmarksnumbered,unicode]{hyperref}
+\RequirePackage[bookmarksnumbered,unicode,draft]{hyperref}
\pdfstringdefDisableCommands{%
\def\addtocounter#1#2{}%
\def\unskip{}%
diff --git a/algorithm.tex b/algorithm.tex
index 2c4f577..18154bd 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -61,17 +61,17 @@ To choose where to branch off at, each intermediate language in \compcert{} can
\begin{subfigure}[b]{0.49\linewidth}
\begin{minted}{c}
int main() {
- int x[3] = {1, 2, 3};
- int sum = 0;
- for (int i = 0;
- i < 3;
- i++)
- sum += x[i];
- return sum;
+ int x[3] = {1, 2, 3};
+ int sum = 0;
+ for (int i = 0;
+ i < 3;
+ i++)
+ sum += x[i];
+ return sum;
}
\end{minted}
- \caption{Example showing the input C code.}\label{fig:accumulator_c}
- \end{subfigure}\hfill%
+ \caption{Input C code.}\label{fig:accumulator_c}
+ \end{subfigure}\hspace*{-4mm}
\begin{subfigure}[b]{0.49\linewidth}
\begin{minted}[fontsize=\footnotesize]{c}
main() {
@@ -94,18 +94,20 @@ main() {
1: return x4
}
\end{minted}
- \caption{The 3AC code produced by \compcert{}.}\label{fig:accumulator_rtl}
+ \caption{3AC produced by \compcert{}.}\label{fig:accumulator_rtl}
\end{subfigure}
- \caption{Short example using \compcert{} to translate from C to three address code (3AC).}\label{fig:accumulator_c_rtl}
+ \caption{Using \compcert{} to translate a simple program from C to three address code (3AC).}\label{fig:accumulator_c_rtl}
\end{figure}
-\subsection{Example C to Verilog translation}
+\subsection{Translating C to Verilog, by example}
+
+Using the simple accumulator program shown in Figure~\ref{fig:accumulator_c} as a worked example, this section describes the main translation that is performed in Vericert to go from the behavioural description in C into a hardware design in Verilog.
-Taking the simple accumulator program shown in Figure~\ref{fig:accumulator_c}, we can describe the main translation that is performed in Vericert to go from the behavioural description into a hardware design.
+\subsubsection{Translating C to 3AC}
-The first step of the translation is to use \compcert{} to transform the C code into the 3AC shown in Figure~\ref{fig:accumulator_rtl}, including many optimisations that are performed in the 3AC language. This includes optimisations such as constant propagation and dead-code elimination. Function inlining is also performed, and it is used as a way to support function calls instead of having to support the \texttt{Icall} 3AC instruction. The duplication of the function bodies caused by inlining does affect the total area of the hardware, however it improves the latency of the hardware. In addition to that, inlining removes the possibility of supporting recursive function calls, however, this is a feature that isn't supported in most other HLS tools either.
+The first step of the translation is to use \compcert{} to transform the input C code into the 3AC shown in Figure~\ref{fig:accumulator_rtl}. As part of this, \compcert{} performs optimisations such as constant propagation and dead-code elimination. Function inlining is also performed, which allows us to support function calls without having to support the \texttt{Icall} 3AC instruction. The duplication of the function bodies caused by inlining does affect the total area of the hardware \JW{Is that definitely true? Was discussing this with Nadesh and George recently, and I ended up not being so sure. Inlining could actually lead to \emph{reduced} resource usage because once everything has been inlined, the (big) scheduling problem could then be solved quite optimally. Certainly inlining is known to increase register pressure, but that's not really an issue here. If we're not sure, we could just say that inlining everything leads to bloated Verilog files and the inability to support recursion, and leave it at that.}, however it improves the latency of the hardware. In addition to that, inlining removes the possibility of supporting recursive function calls, however, this is a feature that isn't supported in most other HLS tools either.
-\subsubsection{Transformation to HTL}
+\subsubsection{Translating 3AC to HTL}
% + TODO Explain the main mapping in a short simple way
@@ -113,19 +115,23 @@ The first step of the translation is to use \compcert{} to transform the C code
% + TODO Explain how memory is mapped
-The first translation performed in Vericert is from 3AC to a hardware translation language (HTL), which is one step towards being completely translated to hardware described in Verilog. The main translation that is performed is going from a CFG representation of the computation to a finite state machine with datapath (FSMD) representation in HTL.\@ The core idea of the FSMD representation is that it separates the control flow from the operations on the memory and registers, so that the state transitions can be translated into a simple finite state machine (FSM) and each state then contains data operations which update the memory and registers. Figure~\ref{fig:accumulator_diagram} shows the resulting architecture of the FSMD.\@
+The first translation performed in Vericert is from 3AC to a \JW{new} hardware translation language (HTL), which is one step towards being completely translated to hardware described in Verilog. The main translation that is performed is going from a CFG representation of the computation to a finite state machine with datapath (FSMD) \JW{I feel like this could use some sort of citation, but I'm not sure what. I guess this is all from "Hardware Design 101", right?} representation in HTL.\@ The core idea of the FSMD representation is that it separates the control flow from the operations on the memory and registers, so that the state transitions can be translated into a simple finite state machine (FSM) and each state then contains data operations that update the memory and registers. Figure~\ref{fig:accumulator_diagram} shows the resulting architecture of the FSMD. \JW{I think it would be worth having a sentence to explain how the C model of memory is translated to a hardware-centric model of m}
\begin{figure*}
\centering
\includegraphics[scale=0.3,trim={10cm 10cm 7cm 7cm},clip=true]{data/accumulator_fsmd2.pdf}
- \caption{Accumulator hardware diagram.}\label{fig:accumulator_diagram}
+ \caption{The FSMD for our running example. \JW{Maybe replace `State' with `Current State'? And maybe `Calculate State' could be clearer as `Calculate Next State'?} \JW{Is it worth distinguishing between the different types of box? We have a RAM box which is a single hardware block, then the Registers and State boxes both indicate a collection of registers, and finally the Update and Calculate boxes indicate combinational logic. Perhaps the combinational logic bits could be visualised as clouds? (Or if clouds are a bit of a faff to draw, then rounded rectangles, or something?)} \JW{Can we label `data path' and `control path' on the diagram? Looks like the diagram is nicely split into these two parts already.} \JW{Can state 15 (or should it be state 16??) have a dangling incoming arrow to indicate that it is the start state? And perhaps state 1 could have a double outline to indicate that it is an `accepting' state? Since there's space above the `Calculate State' box, I'd be mildly in favour of expanding that box a bit so that it included all 15 states explicitly (snaking back and forth).}}\label{fig:accumulator_diagram}
\end{figure*}
-The translation from 3AC to HTL is quite straightforward, as each 3AC instruction either matches up quite well to a hardware construct, or does not have to be handled by the translation, such as function calls. At each instruction, the control flow is separated from the data computation and is then added to the control logic and data-flow map respectively. For example, in state 16 in the 3AC code in figure~\ref{fig:accumulator_rtl}, the register \texttt{x9} is initialised to one, after which the control flow moves to state 15. This is encoded in HTL by initialising a 32-bit register \texttt{reg\_9} to one as well in the data-flow graph section, and also adding a state transition to the state 15 in the control logic transition. Simple operator instructions are translated in a similar way. For example, in state 5, the value in the array is added to the current value of the accumulated sum, which is simply translated to an addition of the equivalent registers in the HTL code.
+The translation from 3AC to HTL is quite straightforward, as each 3AC instruction either matches up quite well to a hardware construct, or does not have to be handled by the translation, such as function calls.
+\JW{Repeated `quite'.}
+At each instruction, the control flow is separated from the data computation and is then added to the control logic and data-flow map respectively.
+\JW{I suspect that you could safely chop that sentence.}
+For example, in state 16 \JWcouldcut{in the 3AC code} in figure~\ref{fig:accumulator_rtl}, the register \texttt{x9} is initialised to 1, after which the control flow moves to state 15. This is encoded in HTL by initialising a 32-bit register \texttt{reg\_9} to 1 \JWcouldcut{as well} in the data-flow \JWcouldcut{graph} section, and also adding a \JWcouldcut{state} transition to the state 15 in the control logic \JWcouldcut{transition}\JW{section}. Simple operator instructions are translated in a similar way. For example, in state 5, the value in the array is added to the current value of the accumulated sum, which is simply translated to an addition of the equivalent registers in the HTL code.
-In the accumulator C code, the for loop is translated to a branch statement in state 3, which can also be translated to equivalent HTL code by placing the comparison into the control logic part of the main function, and not performing any data operations in the data-flow section. On the next clock cycle, the state will therefore transition to state 7 or state 2 depending on if \texttt{reg\_3} is less than 3 or not. One thing to note is that in this case, the comparison is signed. By default, all operators and registers in Verilog and HTL are unsigned, so to force an operation to handle the bits as signed, both operators have to be forced to signed. In addition to that, Verilog resizes expressions to the largest needed size by default, which can affect the result of the computation. This feature is also not supported by the Verilog semantics we used, and there would therefore be a mismatch between the Verilog semantics and the actual behaviour of Verilog according to the standard. To bypass this issue braces are used to stop the Verilog simulator or synthesis tool from resizing anything inside the braces. Instead, explicit resizing is used in the semantics and operations can only be performed on two registers that have the same size.
+In the \JWcouldcut{accumulator} C code, the for loop \JW{for-loop is easier to parse} is translated to a branch statement in state 3, which can also be translated to equivalent HTL code by placing the comparison into the control logic part of the main function, and not performing any data operations in the data-flow section. On the next clock cycle, the state will therefore transition to state 7 or state 2 depending on if \texttt{reg\_3} is less than 3 or not. \JW{The idea of a for-loop translating to a branch statement in the CFG is standard; no need to explain that at all I'd say. So I'd start this paragraph by focusing on the signedness issue. In fact, better still: I would move the signedness discussion to a Key Challenge. It's a technical detail that makes the translation more difficult than you might expect, so it's worth talking about, but I don't think it belongs right here because at the moment the focus is on explaining the essence of how the translation works.} One thing to note is that in this case, the comparison is signed. By default, all operators and registers in Verilog and HTL are unsigned, so to force an operation to handle the bits as signed, both operators have to be forced to signed. In addition to that, Verilog resizes expressions to the largest needed size by default, which can affect the result of the computation. This feature is also not supported by the Verilog semantics we used, and there would therefore be a mismatch between the Verilog semantics and the actual behaviour of Verilog according to the standard. To bypass this issue braces are used to stop the Verilog simulator or synthesis tool from resizing anything inside the braces. Instead, explicit resizing is used in the semantics and operations can only be performed on two registers that have the same size.
-\subsection{Translation from HTL to Verilog}
+\subsection{Translating HTL to Verilog}
Finally, we have to translate the HTL code into proper Verilog and prove that it behaves the same as the 3AC according to the Verilog semantics. The Verilog output is modelled as a complete abstract syntax tree (AST) instead of being an abstract map over the instructions that are executed. However, as all the instructions are already expressed in Verilog, only the maps need to be translated to valid Verilog, and correct declarations for all the variables in the program need to be added as well.
diff --git a/introduction.tex b/introduction.tex
index 7abed31..e2e2520 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -56,17 +56,29 @@ As such, high-level synthesis (HLS) is becoming an attractive alternative, since
Modern HLS tools such as LegUp~\cite{}, Vivado HLS~\cite{}, and Intel i++~\cite{} can produce designs with comparable performance and energy-efficiency to those manually coded in HDL [citation needed], while offering the convenient abstractions and rich ecosystem of software development.
\subsection{Can we trust our high-level synthesis tools?}
-The usual starting point for most HLS tool development is to leverage an existing software compiler framework like LLVM.
-Re-using software concepts, optimisation and codebases make HLS compiliation as susceptible to bugs as any software compilers for the same reasons.
+The common starting point for most HLS tool development is to leverage an existing software compiler framework like LLVM, such as LegUp HLS, Vivado HLS and Bambu HLS~\cite{}.
+Re-using software concepts, optimisation and codebases make HLS compiliation as susceptible to bugs as any software compilers.
These native codebases are large and they perform complex and non-trivial analyses and transformations to translate software into efficient assembly or HDL, in the case of HLS.
Consequently, HLS tools cannot always guarantee that the generated hardware is equivalent to the input program, undermining any reasoning conducted at the software level.
Furthermore, HDL design in itself is complex, error-prone and requires careful reasoning and formal verification to ensure behaviour, data-flow and structural correctness~\cite{}.
The added complexity of HDL design thus increases the likelihood of HLS compilation mismatches.
% between the software program and the generated hardware.
% There are valid reasons to doubt that HLS tools actually \emph{do} preserve equivalence, not least because they are large pieces of software that perform complex transformations.
+
Hence, the premise of this work is: Can we trust these compilers to translate high-level languages like C/C++ to HDL correctly?
+For instance, \citet{lidbury15_many_core_compil_fuzzin} abandoned fuzzing Altera's (now Intel's) OpenCL compiler since it ``either crashed or emitted an internal compiler error'' on a large number of their test inputs.
+Also, Du \emph{et al.}~\cite{?} fuzz tested three commercial HLS tools using Csmith~\cite{yang11_findin_under_bugs_c_compil}, while restricting the C programs to the constructs explicitly supported by the synthesis tools.
+They found that on average 2.5\% of test cases generated a design that did not match the behaviour of the input.
+Meanwhile, Xilinx's Vivado HLS has been shown to apply pipelining optimisations incorrectly\footnote{\url{https://bit.ly/vivado-hls-pipeline-bug}} or to silently generate wrong code should the programmer stray outside the fragment of C that it supports\footnote{\url{https://bit.ly/vivado-hls-pointer-bug}}.
+
+\subsection{Existing approach: HLS-generated hardware verification}
+
+It is rather difficult to exhaustively test these large HLS codebases, which include custom compiler directives.
+Hence, most existing work on verifying HLS tools
-\subsection{Evidence of unreliability and previous works}
+To mitigate the problems about the unreliability of synthesis tool, it is often required to check the generated hardware for functional correctness. This is commonly done by simulating the design with a large test-bench, however, the guarantees are only as good as the test-bench, meaning if all the inputs were not covered, there is a risk that bugs remain in the untested code. To ensure that the hardware does indeed behave in the same way as the C code, it may therefore be necessary to prove that they are equivalent. Translation validation~\cite{pnueli98_trans} is the main method which is used to prove that the HLS translation was successful, and has been successfully applied to many HLS optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}. However, the main problem is that the validator itself has often not been mechanically proven correct, meaning that the implementation is quite separate from the proof. In addition to that, with large designs it may not be feasible to perform translation validation, as the state space would grow exponentially. One example is Catapult C~\cite{mentor20_catap_high_level_synth}, which allows the use of translation validation between the C and RTL code, however, as they mention themselves~\cite{whitepaper}, this process might require quite a bit of iteration on the C code so that the equivalence checker can prove this correctly. This can be a tedious process, as one cannot be sure about what sections in the C need to change to help the equivalence checker prove the equality. In addition to that, the larger the design, the more infeasible using an equivalence checking tool like this would be.
+
+A mechanically verified HLS tool would remove the need to perform simulation after the synthesis process if one has proven desirable properties about the C code. In addition to that, it would allow for the implementation of optimisation passes which are also proven correct mechanically by translation validation, thereby greatly improving the reliability of these passes.
\subsection{Our approach}
diff --git a/main.tex b/main.tex
index 8697347..a6da81a 100644
--- a/main.tex
+++ b/main.tex
@@ -56,7 +56,7 @@
\newcommand\JP[1]{\Comment{blue!50!black}{JP}{#1}}
\newcommand\NR[1]{\Comment{yellow!50!black}{NR}{#1}}
-\newcommand\JWcouldcut[1]{{\setstcolor{red!75!black}\st{#1}}}
+\newcommand\JWcouldcut[1]{{\st{#1}}}
\definecolor{compcert}{HTML}{66c2a5}
\definecolor{formalhls}{HTML}{fc8d62}