summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorJohn Wickerson <j.wickerson@imperial.ac.uk>2020-11-06 11:37:39 +0000
committeroverleaf <overleaf@localhost>2020-11-06 14:45:38 +0000
commit0967b811b7f6f18288406e91987791eb55b38b62 (patch)
tree5d02d336631db9320b0630be50cc4aa320546a01
parent15b9b710af58d3573c8d5bf23d63e88e9b11994c (diff)
downloadoopsla21_fvhls-0967b811b7f6f18288406e91987791eb55b38b62.tar.gz
oopsla21_fvhls-0967b811b7f6f18288406e91987791eb55b38b62.zip
Update on Overleaf.
-rw-r--r--algorithm.tex20
-rw-r--r--introduction.tex2
-rw-r--r--main.tex9
3 files changed, 19 insertions, 12 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 2ffcda6..3d55d93 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -1,5 +1,7 @@
\section{Adding an HLS back end to \compcert{}}
+\JW{The first part of this section (up to 2.1) is good but needs tightening up. Ultimately, the point of this section is to explain that there's an existing verified compiler called CompCert which has a bunch of stages, and we need to make a decision about where to tap into that pipeline. Too early and we miss out on some helpful optimisations; too late and we've ended up too target-specific. What if you put a few more stages into Figure 1 -- there aren't actually that many missing anyway. Suppose you add in Cminor between C\#minor and 3AC. Then the high-level structure of your argument in this subsection could be: (1) why Cminor is too early, (2) why LTL is too late, and then maybe (3) why 3AC is just right. The Goldilocks structure, haha!}
+
This section covers the main architecture of the HLS tool, and the way in which the backend was added to \compcert{}. This section will also cover an example of converting a simple C program into hardware, expressed in the Verilog language.
\begin{figure}
@@ -25,15 +27,15 @@ This section covers the main architecture of the HLS tool, and the way in which
\draw[->] (rtl) -- (dfgstmd);
\draw[->] (dfgstmd) -- (verilog);
\end{tikzpicture}}
- \caption{Verilog backend to Compcert, branching off at the three address code (3AC), at which point the three address code is transformed into a state machine. Finally, it is transformed to a hardware description of the state machine in Verilog.}%
+ \caption{Verilog backend to Compcert, branching off at the three address code (3AC), at which point the three address code is transformed into a state machine. Finally, it is transformed to a hardware description of the state machine in Verilog. \JW{The dashed arrows look like `optional' arrows to my mind. To suggest a sequence of arrows (which is what you intend) I would draw something more like `-->...-->'.}}%
\label{fig:rtlbranch}
\end{figure}
The main work flow of \vericert{} is shown in Figure~\ref{fig:rtlbranch}, which shows the parts of the translation that are performed in \compcert{}, and which have been added with \vericert{}.
-\compcert{} is made up of 11 intermediate languages in between the Clight input and the assembly output. These intermediate languages each serve a different purpose and contain various different optimisations. When designing a new backend for \compcert{}, it is crucial to know where to branch off and start the hardware generation. Many optimisations that the CompCert backend performs are not necessary when generating custom hardware, meaning no CPU architecture is being targeted. These optimisations include register allocation, as there is no more fixed number of registers that need to be targeted. It is therefore important to find the right intermediate language so that the HLS tool still benefits from many of the generic optimisations that CompCert performs, but does not receive the code transformations that are specific to CPU architectures.
+\compcert{} is made up of 11 intermediate languages in between the Clight input and the assembly output. These intermediate languages each serve a different purpose and contain various \JWcouldcut{different} optimisations. When designing a new backend \JW{need to stick with either `back end' or `back-end' or `backend' throughout} for \compcert{}, it is crucial to know where to branch off and start the hardware generation. \JW{Mismatch between `designing a new backend' (not HW-specific) and `hardware generation' (HW-specific).} Many optimisations that the CompCert backend \JW{`the CompCert backend' -> CompCert?} performs are not necessary when generating custom hardware, meaning no CPU architecture is being targeted. \JW{`meaning ...' -- this explanation comes too late to be useful} These optimisations include register allocation, as there is no more fixed \JW{there is not a fixed} number of registers that need to be targeted. It is therefore important to find the right intermediate language \JW{You already said `crucial to know where to branch off' so this feels repetitive} so that the HLS tool still benefits from many of the generic optimisations that CompCert performs, but does not receive the code transformations that are specific to CPU architectures.
-Existing HLS compilers usually use LLVM IR as an intermediate representation when performing HLS specific optimisations, as each instruction can be mapped quite well to hardware which performs the same behaviour. CompCert's three address code (3AC)\footnote{Three address code (3AC) is also know as register transfer language (RTL) in the CompCert literature, however, 3AC is used in this paper so as not to confuse it with register transfer level (RTL), which is another name for the final hardware target of the HLS tool.} is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware. However, one difference between the two is that 3AC uses operations of the target architecture and performs architecture specific optimisations as well, which is not the case in LLVM IR where all the instructions are quite abstract. This can be mitigated by making CompCert target a specific architecture such as x86\_32, where most operations translate quite well into hardware. In addition to that, many optimisations that are also useful for HLS are performed in 3AC, which means that if it is supported as the input language, the HLS algorithm benefits from the same optimisations. It is therefore a good candidate to be chosen as the input language to the HLS backend. The complete flow that Vericert takes is show in figure~\ref{fig:rtlbranch}.
+Existing HLS compilers usually \JW{maybe `often' is a less contentious claim} use LLVM IR as an intermediate representation when performing HLS specific \JW{HLS-specific} optimisations, as each instruction can be mapped quite well to hardware which \JW{which -> that} performs the same behaviour. CompCert's three address \JW{three-address} code (3AC)\footnote{Three address \JW{Three-address} code (3AC) is also know \JW{known} as register transfer language (RTL) in the CompCert literature, however, 3AC is used in this paper so as not to confuse it with register transfer \JW{register-transfer} level (RTL), which is another name for the final hardware target of the HLS tool.} is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware. However, one difference between the two is that 3AC uses operations of the target architecture and performs architecture specific optimisations as well, which is not the case in LLVM IR where all the instructions are quite abstract. This can be mitigated by making CompCert target a specific architecture such as x86\_32, where most operations translate quite well into hardware. In addition to that, many optimisations that are also useful for HLS are performed in 3AC, which means that if it is supported as the input language, the HLS algorithm benefits from the same optimisations. It is therefore a good candidate to be chosen as the input language to the HLS backend. The complete flow that Vericert takes is show in figure~\ref{fig:rtlbranch}.
% - Explain main differences between translating C to software and to hardware.
@@ -49,18 +51,18 @@ Existing HLS compilers usually use LLVM IR as an intermediate representation whe
\inputminted[fontsize=\footnotesize]{c}{data/accumulator.rtl}
\caption{Accumulator 3AC code.}\label{fig:accumulator_rtl}
\end{subfigure}
- \caption{Accumulator example using \compcert{} to translate from C to three address code (3AC).}\label{fig:accumulator_c_rtl}
+ \caption{Accumulator example using \compcert{} to translate from C to three address code (3AC). \JW{I would consider dropping `accumulator' here, as I found it slightly confusing.} \JW{Some of the line numbers on the right have been cropped.}}\label{fig:accumulator_c_rtl}
\end{figure}
\subsection{Example C to Verilog translation}
Taking the simple accumulator program shown in Figure~\ref{fig:accumulator_c}, we can describe the main translation that is performed in Vericert to go from the behavioural description into a hardware design.
-The first step of the translation is to use \compcert{} to transform the C code into the 3AC shown in Figure~\ref{fig:accumulator_rtl}, including many optimisations that are performed in the 3AC language. This includes optimisations such as constant propagation and dead-code elimination. Function inlining is also performed, and it is used as a way to support function calls instead of having to support the \texttt{Icall} 3AC instruction. The duplication of the function bodies caused by inlining does affect the total area of the hardware, however it improves performance of the hardware.\YH{TODO: Not sure how to justify inlining, apart from that it is an easy way to support function calls}
+The first step of the translation is to use \compcert{} to transform the C code into the 3AC shown in Figure~\ref{fig:accumulator_rtl}, including many optimisations that are performed in the 3AC language. This includes optimisations such as constant propagation and dead-code elimination. Function inlining is also performed, and it is used as a way to support function calls instead of having to support the \texttt{Icall} 3AC instruction. The duplication of the function bodies caused by inlining does affect the total area of the hardware, however it improves performance of the hardware.\YH{TODO: Not sure how to justify inlining, apart from that it is an easy way to support function calls} \JW{Perhaps you could say that this means you don't support recursion, but that this is in line with most existing HLS tools.}
% - Explain why we chose 3AC (3AC) as the branching off point.
-3AC is represented as a control flow graph (CFG) in CompCert. Each instruction is a node in the graph and links to the instructions that follow it. This CFG then describes how the computation should proceed, and is a good representation for performing optimisations on as well as local transformations.
+3AC is represented as a control flow graph (CFG) in CompCert. Each instruction is a node in the graph and links to the instructions that follow it. This CFG then describes how the computation should proceed, and is a good representation for performing optimisations on as well as local transformations. \JW{This belongs in the previous subsection where you were justifying 3AC as the branching-off point.}
\subsubsection{Transformation to HTL}
@@ -78,9 +80,9 @@ The first translation performed in Vericert is from 3AC to a hardware translatio
\caption{Accumulator hardware diagram.}\label{fig:accumulator_diagram}
\end{figure*}
-The translation from RTL to HTL is quite straightforward, as each RTL instruction either matches up quite well to a hardware construct, or does not have to be handled by the translation, such as function calls. At each instruction, the control flow is separated from the data computation and is then added to the control logic and data-flow map respectively. For example, in state 16 in the RTL code in figure~\ref{fig:accumulator_rtl}, the register \texttt{x9} is initialised to one and then moves on to state 15. This is encoded in HTL by initialising a 32 bit register \texttt{reg\_9} to one as well in the data-flow graph section, and also adding a state transition to the state 15 in the controllogic transition. Simple operator instructions are translated in a similar way. For example, in state 5, the value in the array is added to the current value of the accumulated sum, which is simply translated to an addition of the equivalent registers in the HTL code.
+The translation from RTL \JW{3AC, here and elsewhere} to HTL is quite straightforward, as each RTL instruction either matches up quite well to a hardware construct, or does not have to be handled by the translation, such as function calls. At each instruction, the control flow is separated from the data computation and is then added to the control logic and data-flow map respectively. For example, in state 16 in the RTL code in figure~\ref{fig:accumulator_rtl}, the register \texttt{x9} is initialised to one and then moves on \JW{this reads as `the register moves on' which is not what you intend} to state 15. This is encoded in HTL by initialising a 32 bit \JW{32-bit} register \texttt{reg\_9} to one as well in the data-flow graph section, and also adding a state transition to the state 15 in the control logic transition. Simple operator instructions are translated in a similar way. For example, in state 5, the value in the array is added to the current value of the accumulated sum, which is simply translated to an addition of the equivalent registers in the HTL code.
-The translation can be described by a relational specification, which can be described by the following relation, where \textit{fin}, \textit{rtrn}, $\sigma$ and \textit{stk} are the registers for the finished signal, return value, current state and stack respectively, $i$ is the RTL instruction being translated, and \textit{data} and \textit{control} are the data-flow and control logic map respectively.
+\JW{I think some of the following material belongs in a later section. In the current section, I think we should aim at the casual reader who is interested in understanding things like the overall structure of the tool, which IR language you branch off from, and what the key ideas/difficulties involved are. I think intimidating inference rules with loads of new symbols should be saved for a later section -- the kind of section that really is only going to be read by people who are actually interested in reproducing or building upon your work. So things like `One thing to note is that the comparison is signed ... by default all operators in Verilog are unsigned ... so we force the operators to be signed' very much belong in the current section; you just don't need a scary inference rule first.} The translation can be described by a relational specification, which can be described by the following relation, where \textit{fin}, \textit{rtrn}, $\sigma$ and \textit{stk} are the registers for the finished signal, return value, current state and stack respectively, $i$ is the RTL instruction being translated, and \textit{data} and \textit{control} are the data-flow and control logic map respectively.
\begin{equation*}
\yhfunction{tr\_instr } \textit{fin rtrn }\ \sigma\ \textit{stk }\ i\ \textit{data }\ \textit{control}
@@ -112,7 +114,7 @@ This translation seems quite straightforward, however, proving that it is correc
\inputminted[fontsize=\tiny]{systemverilog}{data/accumulator2.v}
\caption{Accumulator Verilog code.}\label{fig:accumulator_v_2}
\end{subfigure}
- \caption{Accumulator example using \vericert{} to translate the 3AC to a state machine expressed in Verilog.}\label{fig:accumulator_v}
+ \caption{Accumulator example using \vericert{} to translate the 3AC to a state machine expressed in Verilog. \JW{Subfigure captions aren't correct.}}\label{fig:accumulator_v}
\end{figure}
To check that the initial state of the Verilog is the same as the HTL code, we therefore have to run the module once, assuming the state is undefined and the reset is set to high. We then have to compare the abstract starting state stored in the HTL module to the bitvector value we obtain from running the module for one clock cycle and prove that they are the same. As the value for the state is undefined, the case statements will evaluate to the default state and therefore not perform any computations.
diff --git a/introduction.tex b/introduction.tex
index 8411ebe..c597e01 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -4,6 +4,8 @@
% \JW{Maybe `the' is too strong -- could say `One current approach'.}
+\JW{A few high-level comments: \begin{enumerate} \item Create more tension from the start by making the reader doubt whether existing HLS tools are trustworthy. \item The intro currently draws quite a bit of motivation from Lidbury et al. 2015, but we should also now lean on our FPGA submission too. \item I wonder whether the paragraph `To mitigate the problems...' should be demoted to a `related work' discussion (perhaps as a subsection towards the end of the introduction). It outlines (and nicely dismisses) some existing attempts to tackle the problem, which is certainly useful motivation for your work, especially for readers already familiar with HLS, but I feel that it's not really on the critical path for understanding the paper.\end{enumerate}}
+
One current approach to writing energy-efficient and high-throughput applications is to use application-specific hardware, instead of relying on a general-purpose CPU.\@ However, custom hardware designs come at the cost of having to design and produce them, which can be a tedious and error-prone process using hardware description languages (HDL) such as Verilog. Especially with the size of hardware designs growing over the years, it can become difficult to verify that the hardware design behaves in the expected way, as simulation of HDLs can be quite inefficient. Furthermore, the algorithms that are being accelerated in hardware often already have a software implementation, meaning they have to be reimplemented efficiently in a hardware description language which can be time-consuming.
%% Definition and benefits of HLS
diff --git a/main.tex b/main.tex
index dbca92f..9ed50ec 100644
--- a/main.tex
+++ b/main.tex
@@ -40,6 +40,7 @@
\usepackage{booktabs}
\usepackage{mathpartir}
\usepackage{pgfplots}
+\usepackage{soul}
\usepackage{subcaption}
\usepackage{tikz}
\usepackage{minted}
@@ -54,6 +55,8 @@
\newcommand\JP[1]{\Comment{blue!50!black}{JP}{#1}}
\newcommand\NR[1]{\Comment{yellow!50!black}{NR}{#1}}
+\newcommand\JWcouldcut[1]{{\setstcolor{red!75!black}\st{#1}}}
+
\definecolor{compcert}{HTML}{66c2a5}
\definecolor{formalhls}{HTML}{fc8d62}
\definecolor{keywordcolour}{HTML}{8f0075}
@@ -66,8 +69,8 @@
\newcommand\yhfunction[1]{\texttt{\textcolor{functioncolour}{#1}}}
\newcommand\yhconstant[1]{\texttt{\textcolor{constantcolour}{#1}}}
-\newcommand{\vericert}{Vericert}%
-\newcommand{\compcert}{CompCert}%
+\newcommand{\vericert}{Veri\-cert}%
+\newcommand{\compcert}{Comp\-Cert}%
\begin{document}
@@ -133,7 +136,7 @@
% \JW{‘Not generating hardware’ isn’t too bad, and indeed VeriCert is guilty of that too, when presented with C programs that contain features that it can’t yet handle. The real problem is the HLS tool crashing unceremoniously.}
High-level synthesis (HLS) refers to automatically compiling software into hardware. In a world increasingly reliant on application-specific hardware accelerators, HLS provides a higher-level of abstraction for hardware designers, while also allowing the designers to benefit from the rich software ecosystem to verify the functionality of their designs. However, adoption of HLS in safety-critical applications is still limited, because it may be infeasible to verify that the hardware implements the same behaviour as the software it was generated from, eliminating the benefits that software verification may provide. There is also evidence that HLS tools tend to be quite unreliable instead, either generating wrong hardware or sometimes crashing in unexpected ways for supported inputs.
- We present the first mechanically verified HLS tool that preserves the behaviour of the software according to our semantics. Our tool, called \vericert{}, extends the \compcert{} verified C compiler and consists of a new hardware-specific intermediate language and a Verilog back end, which has been proven correct in Coq. \vericert{} supports all C constructs except for function pointers, recursive function calls, non-integer types and global variables. It has been evaluated on all suitable PolyBench benchmarks, which indicates that it generates hardware with area and cycle count around a magnitude larger than an existing, unverified HLS tool. \JW{Hopefully in due course we can nuance this a bit by adding in some figures relating to the unverified \vericert{} extensions.}\YH{This might be the case, but maybe we want to keep that for a completely new paper as discussed.}
+ We present the first mechanically verified HLS tool that preserves the behaviour of the software according to our semantics. Our tool, called \vericert{}, extends the \compcert{} verified C compiler and consists of a new hardware-specific intermediate language and a Verilog back end, which has been proven correct in Coq. \vericert{} supports all C constructs except for function pointers, recursive function calls, non-integer types and global variables. It has been evaluated on all suitable PolyBench benchmarks, which indicates that it generates hardware with area and cycle count around a magnitude larger than an existing, unverified HLS tool.
\end{abstract}
%% 2012 ACM Computing Classification System (CSS) concepts