\section{Designing a verified HLS tool} %\JW{The first part of this section (up to 2.1) is good but needs tightening up. Ultimately, the point of this section is to explain that there's an existing verified compiler called CompCert which has a bunch of stages, and we need to make a decision about where to tap into that pipeline. Too early and we miss out on some helpful optimisations; too late and we've ended up too target-specific. What if you put a few more stages into Figure 1 -- there aren't actually that many missing anyway. Suppose you add in Cminor between C\#minor and 3AC. Then the high-level structure of your argument in this subsection could be: (1) why Cminor is too early, (2) why LTL is too late, and then maybe (3) why 3AC is just right. The Goldilocks structure, haha!} This section covers the main architecture of the HLS tool, and the way in which the back end was added to \compcert{}. This section will also cover an example of converting a simple C program into hardware, expressed in the Verilog language. %\JW{I've experimented with adding a few paragraph headings to the text below -- see what you think. The advantage of headings is that it can make the text easier to read quickly.}\YH{Yes I think it works well actually, makes the sections clearer.} \paragraph{Choice of source language}\YH{Could combine this with ``Choice of implementation language'' maybe, as there is a bit of overlap.} \JW{Although the answers to these questions happen to overlap, I think the questions are distinct and it is logical to ask them separately.} First of all, the choice of C for the input language of \vericert{} is because it is the most widely supported language for HLS, and most major HLS tools also use it as an input. As a lot of existing code is also written in C for HLS, supporting C as an input language compared to a custom domain-specific language means that \vericert{} is more practical. Another alternative was to support LLVM IR as an input language, however, to get a full work flow from a higher level language to hardware, a front end for that language to LLVM IR would also have to be verified. Finally, a language similar to Occam was also considered, as it has inherent parallel constructs, however, this would not qualify as being high-level synthesis due to the manual parallelism that would have to be performed. %\JW{Can we mention one or two alternatives that we considered?} \paragraph{Choice of target language} Next, Verilog~\cite{06_ieee_stand_veril_hardw_descr_languag} is a hardware description language, which can be synthesised into logic gates which can be either placed onto a field-programmable gate array (FPGA) or turned into an application-specific integrated circuit (ASIC). Verilog was chosen as the output language for \vericert{} because it is one of the most popular hardware description languages and there already exist a few formal semantics for it that could be used as a target~\cite{loow19_verif_compil_verif_proces,meredith10_veril}. Other possible targets could have been Bluespec, a higher level hardware description language, for which there exists a formally verified translation to circuits using K\^{o}ika~\cite{bourgeat20_essen_blues}, however, targeting this language would not be trivial as it is not meant to be targeted by an automatic tool. Finally, a custom circuit language could also have been targeted, which can then be translated to Verilog in an unverified way, however, some guarantees would be lost and it would not be possible to completely trust the output. %\JW{Can we mention one or two alternatives that we considered? Bluespec or Chisel or one of Adam Chlipala's languages, perhaps?} \paragraph{Choice of implementation language} The framework that was chosen for the frontend was \compcert{}, as it is a mature framework for simulation proofs about intermediate languages, in addition to already providing a validated parser~\cite{jourdan12_valid_lr_parser} from C into the internal representation of Clight. Other frameworks were also considered, such as Vellvm~\cite{zhao12_formal_llvm_inter_repres_verif_progr_trans}, as LLVM IR in particular is often used by HLS tools anyways, however, these would require more work to support a higher level language such as C as input, or even providing a parser for LLVM IR.\@ \begin{figure} \centering \resizebox{0.47\textwidth}{!}{ \begin{tikzpicture} [language/.style={fill=white,rounded corners=3pt,minimum height=7mm}, continuation/.style={}] \fill[compcert,rounded corners=3pt] (-1,-1) rectangle (9,1.5); \fill[formalhls,rounded corners=3pt] (-1,-1.5) rectangle (9,-2.5); \node[language] at (-0.3,0) (clight) {Clight}; \node[continuation] at (1,0) (conta) {$\cdots$}; \node[language] at (2.7,0) (cminor) {CminorSel}; \node[language] at (4.7,0) (rtl) {3AC}; \node[language] at (6.2,0) (ltl) {LTL}; \node[language] at (8.4,0) (ppc) {PPC}; \node[continuation] at (7.3,0) (contb) {$\cdots$}; \node[language] at (4.7,-2) (dfgstmd) {HTL}; \node[language] at (6.7,-2) (verilog) {Verilog}; \node at (0,1) {\compcert{}}; \node at (0,-2) {Vericert}; \draw[->] (clight) -- (conta); \draw[->] (conta) -- (cminor); \draw[->] (cminor) -- (rtl); \draw[->] (rtl) -- (ltl); \draw[->] (ltl) -- (contb); \draw[->] (contb) -- (ppc); \draw[->] (rtl) -- (dfgstmd); \draw[->] (dfgstmd) -- (verilog); \end{tikzpicture}} \caption{Verilog back end to Compcert, branching off at the three address code (3AC), at which point the three address code is transformed into a state machine. Finally, it is transformed to a hardware description of the state machine in Verilog.}% \label{fig:rtlbranch} \end{figure} \paragraph{Architecture of \vericert{}} The main work flow of \vericert{} is shown in Figure~\ref{fig:rtlbranch}, which shows the parts of the translation that are performed in \compcert{}, and which have been added with \vericert{}. \compcert{} is made up of 11 intermediate languages in between the Clight input and the assembly output. These intermediate languages each serve a different purpose and contain various optimisations. When designing a new back end for \compcert{}, it is crucial to know where to branch off, so as to benefit from all the useful optimisations that \compcert{} performs, but not performing optimisations that are not useful, which include optimisations that are specific to the target CPU architecture. These optimisations include register allocation, as there is not a fixed number of registers that need to be targeted. To choose where to branch off at, each intermediate language in \compcert{} can be evaluated to see if it is suitable to be transformed into hardware. Existing HLS compilers often use LLVM IR as an intermediate representation when performing HLS-specific optimisations, as each instruction can be mapped quite well to hardware that performs the same behaviour. Looking at the intermediate languages in \compcert{} shown in Figure~\ref{fig:rtlbranch}, there are many languages to choose from. Clight and CminorSel are an abstract syntax tree (AST) representation of the C code, which does not correspond to a more assembly like language similar to LLVM IR.\@ In addition to that, looking at the languages from LTL to PPC, even though these languages do contain basic blocks, which are desirable when doing HLS, starting from LTL the number of registers is limited. Register allocation limits the number of registers when translating 3AC into LTL, and stores variables on the stack if that is required. This is not needed when performing HLS, as there are many more registers available, and it is preferable to use these instead of RAM whenever possible. \compcert{}'s three-address code (3AC)\footnote{Three-address code (3AC) is also known as register transfer language (RTL) in the \compcert{} literature, however, 3AC is used in this paper instead so as not to confuse it with register-transfer level (RTL), which is another name for the final hardware target of the HLS tool.} is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware. 3AC is represented as a control flow graph (CFG) in CompCert. Each instruction is a node in the graph and links to the instructions that follow it. This CFG then describes how the computation should proceed, and is a good representation for performing optimisations on as well as local transformations. However, one difference between LLVM IR and 3AC is that 3AC uses operations of the target architecture and performs architecture specific optimisations as well, which is not the case in LLVM IR where all the instructions are quite abstract. This can be mitigated by making \compcert{} target a specific architecture such as x86\_32, where most operations translate quite well into hardware. In addition to that, many optimisations that are also useful for HLS are performed in 3AC, which means that if it is supported as the input language, the HLS algorithm benefits from the same optimisations. It is therefore a good candidate to be chosen as the input language to the HLS back end. The complete flow that \vericert{} takes is show in figure~\ref{fig:rtlbranch}. \begin{figure} \centering \begin{subfigure}[b]{0.49\linewidth} \begin{minted}{c} int main() { int x[3] = {1, 2, 3}; int sum = 0; for (int i = 0; i < 3; i++) sum += x[i]; return sum; } \end{minted} \caption{Input C code.}\label{fig:accumulator_c} \end{subfigure}\hspace*{-4mm} \begin{subfigure}[b]{0.49\linewidth} \begin{minted}[fontsize=\footnotesize]{c} main() { 15: x8 = 1 14: int32[stack(0)] = x8 13: x7 = 2 12: int32[stack(4)] = x7 11: x6 = 3 10: int32[stack(8)] = x6 9: x2 = 0 8: x1 = 0 7: x5 = stack(0) (int) 6: x4 = int32[x5 + x1 * 4 + 0] 5: x2 = x2 + x4 + 0 (int) 4: x1 = x1 + 1 (int) 3: if (x1 > y,& \text{if } x \geq 0\\ \left\lceil \frac{x}{2^y} \right\rceil = - \left\lfloor \frac{-x}{2^y} \right\rfloor = - ( - x >> y ),& \text{otherwise}. \end{cases}\\ \end{align*} The \compcert{} semantics for the \texttt{Oshrximm} instruction express it's operation exactly as shown in the equation above, even though in hardware the computation that would be performed would be different. In \vericert{}, if the same operation would be implemented using Verilog operators, it is not guaranteed to be optimised correctly by the synthesis tools that convert the Verilog into a circuit. To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the divide representation used in the \compcert{} semantics. This proof discovered quite a few bugs in our initial implementation of optimisations, which rounded to $-\infty$ instead of 0. %\JW{I wonder if Section 2 could benefit from a `Some Key Challenges' subsection, where you highlight several interesting bits of the translation process, each with their own paragraph heading. These could be something like:\begin{enumerate}\item Discrepancy between C and Verilog w.r.t. signedness \item Deciding between byte- and word-addressable memories \item Adding reset signals \item Implementing the Oshrximm instruction correctly \end{enumerate} For the causal reader, this would immediately signal two things: (1) you can skip this subsection on your initial pass, and (2) proving the HLS tool correct was a non-trivial undertaking.} % - Explain main differences between translating C to software and to hardware. % + This can be done by going through the simple example. %%% Local Variables: %%% mode: latex %%% TeX-master: "main" %%% TeX-command-extra-options: "-shell-escape" %%% End: