\section{Designing a verified HLS tool}
\label{sec:design}

This section describes the main architecture of the HLS tool and the way in which the Verilog back end was added to \compcert{}. It also walks through an example of converting a simple C program into hardware, expressed in the Verilog language.

\paragraph{Choice of source language}
C was chosen as the source language because it remains the most common source language amongst production-quality HLS tools~\cite{canis11_legup, xilinx20_vivad_high_synth, intel_hls, bambu_hls}. This, in turn, may be because it is ``[t]he starting point for the vast majority of algorithms to be implemented in hardware''~\cite{5522874}, lending a degree of practicality.
%The availability of \compcert{}~\cite{??} also provides a solid basis for formally verified C compilation.
%Since a lot of existing code for HLS is written in C, supporting C as an input language, rather than a custom domain-specific language, means that \vericert{} is more practical.
%An alternative was to support LLVM IR as an input language, however, to get a full work flow from a higher level language to hardware, a front end for that language to LLVM IR would also have to be verified. \JW{Maybe save LLVM for the `Choice of implementation language'?}
We considered Bluespec~\cite{nikhil04_blues_system_veril}, but decided that although it ``can be classed as a high-level language''~\cite{greaves_note}, it is too hardware-oriented to be suitable for traditional HLS. We also considered using a language with built-in parallel constructs that map well to parallel hardware, such as occam~\cite{page91_compil_occam}, Spatial~\cite{spatial} or Scala~\cite{chisel}, but found these languages too niche.
% However, this would not qualify as being HLS due to the manual parallelism that would have to be performed. \JW{I don't think the presence of parallelism stops it being proper HLS.}
%\JP{I think I agree with Yann here, but it could be worded better. At any rate not many people have experience writing what is essentially syntactic sugar over a process calculus.}
%\JW{I mean: there are plenty of software languages that involve parallel constructs. Anyway, perhaps we can just dismiss occam for being too obscure.}

\paragraph{Choice of target language}
Verilog~\cite{06_ieee_stand_veril_hardw_descr_languag} is an HDL that can be synthesised into logic cells, which can either be placed onto a field-programmable gate array (FPGA) or turned into an application-specific integrated circuit (ASIC). Verilog was chosen as the output language for \vericert{} because it is one of the most popular HDLs and there already exist a few formal semantics for it that could be used as a target~\cite{loow19_verif_compil_verif_proces, meredith10_veril}. Bluespec, previously ruled out as a source language, is another possible target, and there exists a formally verified translation to circuits using K\^{o}ika~\cite{bourgeat20_essen_blues}.
%\JP{This needs an extra comment maybe?}\YH{Maybe about bluespec not being an ideal target language because it's quite high-level?}
% but targeting this language would not be trivial as it is not meant to be targeted by an automatic tool, instead strives to a formally verified high-level hardware description language instead.
%\JW{Can we mention one or two alternatives that we considered? Bluespec or Chisel or one of Adam Chlipala's languages, perhaps?}

\paragraph{Choice of implementation language}
We chose Coq as the implementation language because of its mature support for code extraction; that is, its ability to generate OCaml programs directly from the definitions used in the theorems. We note that other authors have had some success reasoning about the HLS process using other theorem provers such as Isabelle~\cite{ellis08}.
\compcert{}~\cite{leroy09_formal_verif_realis_compil} was chosen as the front end framework, as it is a mature framework for simulation proofs about intermediate languages, and it already provides a validated C parser~\cite{jourdan12_valid_lr_parser}. The Vellvm framework~\cite{zhao12_formal_llvm_inter_repres_verif_progr_trans} was also considered because several existing HLS tools are already LLVM-based, but additional work would be required to support a high-level language like C as input. The .NET framework has been used as a basis for other HLS tools, such as Kiwi~\cite{kiwi}, and LLHD~\cite{schuiki20_llhd} has recently been proposed as an intermediate language for hardware design, but neither is suitable for us because they lack formal semantics.

\begin{figure}
  \centering
  \resizebox{0.47\textwidth}{!}{
  \begin{tikzpicture}
    [language/.style={fill=white,rounded corners=3pt,minimum height=7mm},
     continuation/.style={}]
    \fill[compcert,rounded corners=3pt] (-1,-1) rectangle (9,1.5);
    \fill[formalhls,rounded corners=3pt] (-1,-1.5) rectangle (9,-2.5);
    \node[language] at (-0.3,0) (clight) {Clight};
    \node[continuation] at (1,0) (conta) {$\cdots$};
    \node[language] at (2.7,0) (cminor) {CminorSel};
    \node[language] at (4.7,0) (rtl) {3AC};
    \node[language] at (6.2,0) (ltl) {LTL};
    \node[language] at (8.4,0) (ppc) {PPC};
    \node[continuation] at (7.3,0) (contb) {$\cdots$};
    \node[language] at (4.7,-2) (dfgstmd) {HTL};
    \node[language] at (6.7,-2) (verilog) {Verilog};
    \node at (0,1) {\compcert{}};
    \node at (0,-2) {\vericert{}};
    \draw[->] (clight) -- (conta);
    \draw[->] (conta) -- (cminor);
    \draw[->] (cminor) -- (rtl);
    \draw[->] (rtl) -- (ltl);
    \draw[->] (ltl) -- (contb);
    \draw[->] (contb) -- (ppc);
    \draw[->] (rtl) -- (dfgstmd);
    \draw[->] (dfgstmd) -- (verilog);
  \end{tikzpicture}}
  \caption{Verilog back end to \compcert{}, branching off at the three-address code (3AC), which is first transformed into a state machine and finally into a hardware description of that state machine in Verilog.}%
  \label{fig:rtlbranch}
\end{figure}

\paragraph{Architecture of \vericert{}}
The main work flow of \vericert{} is given in Figure~\ref{fig:rtlbranch}, which shows those parts of the translation that are performed in \compcert{}, and those that have been added.
\def\numcompcertlanguages{ten}
\compcert{} translates Clight\footnote{A deterministic subset of C with pure expressions.} input into assembly output via a sequence of intermediate languages; we must decide which of these \numcompcertlanguages{} languages is the most suitable starting point for the HLS-specific translation stages.

We select \compcert{}'s three-address code (3AC)\footnote{This is known as register transfer language (RTL) in the \compcert{} literature. `3AC' is used in this paper instead to avoid confusion with register-transfer level (RTL), which is another name for the final hardware target of the HLS tool.} as the starting point. Branching off before this point (at CminorSel or earlier) denies \compcert{} the opportunity to perform optimisations such as constant propagation and dead code elimination, which have been found useful in HLS tools as well as software compilers~\cite{cong+11}. Branching off after this point (at LTL or later) means that \compcert{} has already performed register allocation, which reduces the number of registers and spills some variables to the stack; this transformation is not required in HLS because there are many more registers available, and these should be used instead of RAM whenever possible.
%\JP{``\compcert{} performs register allocation during the translation to LTL, with some registers spilled onto the stack: this is unnecessary in HLS since as many registers as are required may be described in the output RTL.''} \JP{Maybe something about FPGAs being register-dense (so rarely a need to worry about the number of flops)?}

3AC is also attractive because it is the closest intermediate language to LLVM IR, which is used by several existing HLS compilers.
%\JP{We already ruled out LLVM as a starting point, so this seems like it needs further qualification.}\YH{Well not because it's not a good starting point, but the ecosystem in Coq isn't as good. I think it's still OK here to say that being similar to LLVM IR is an advantage?}
3AC has an unlimited number of pseudo-registers and is represented as a control flow graph (CFG) in which each instruction is a node with links to the instructions that can follow it. One difference between LLVM IR and 3AC is that 3AC includes operations that are specific to the chosen target architecture; we chose to target the x86\_32 back end because it generally produces relatively dense 3AC thanks to the availability of complex addressing modes.% reducing cycle counts in the absence of an effective scheduling approach.

\begin{figure}
\centering
\begin{subfigure}[b]{0.49\linewidth}
\begin{minted}{c}
int main() {
    int x[3] = {1, 2, 3};
    int sum = 0;
    for (int i = 0; i < 3; i++)
        sum += x[i];
    return sum;
}
\end{minted}
\caption{Input C code.}\label{fig:accumulator_c}
\end{subfigure}\hspace*{-4mm}
\begin{subfigure}[b]{0.49\linewidth}
\begin{minted}[fontsize=\footnotesize]{c}
main() {
  15: x8 = 1
  14: int32[stack(0)] = x8
  13: x7 = 2
  12: int32[stack(4)] = x7
  11: x6 = 3
  10: int32[stack(8)] = x6
  9: x2 = 0
  8: x1 = 0
  7: x5 = stack(0) (int)
  6: x4 = int32[x5 + x1 * 4 + 0]
  5: x2 = x2 + x4 + 0 (int)
  4: x1 = x1 + 1 (int)
  3: if (x1