\section{Evaluation}

Our evaluation is designed to answer the following three research questions.
\begin{description}
\item[RQ1] How fast is the hardware generated by CoqUp, and how does this compare to existing HLS tools?
\item[RQ2] How area-efficient is the hardware generated by CoqUp, and how does this compare to existing HLS tools?
\item[RQ3] How long does CoqUp take to produce hardware?
\end{description}

\subsection{Experimental Setup}

\paragraph{Choice of HLS tool for comparison.} We compare CoqUp against LegUp 4.0 because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis+11}.

\paragraph{Choice of benchmarks.} We evaluate CoqUp using the PolyBench/C benchmark suite\footnote{\url{http://web.cs.ucla.edu/~pouchet/software/polybench/}}. PolyBench/C is a modern benchmark suite that has previously been used to evaluate HLS tools~\cite{choi+18}. For completeness, we use the full set of 24 benchmarks. We set the benchmark parameters so that all datatypes are integers (since CoqUp supports only integers) and all datasets use the `small' size (so that they fit into the limited on-chip memory). A current limitation of CoqUp, as discussed in Section~\ref{?}, is that it does not support addition or subtraction operations involving integer literals that are not divisible by 4. To work around this, we lightly modified each benchmark program so that any literal that is not a multiple of 4 is first stored in a variable before being added or subtracted. \JW{Any other notable changes to the benchmarks?}
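
As a hypothetical illustration of this rewrite (the function names below are ours and do not come from PolyBench/C), an expression that adds the literal 3 directly to a variable would be changed so that the literal is first held in a local variable:

\begin{verbatim}
/* Hypothetical example, not taken from PolyBench/C.
   Original form: the literal 3, which is not a multiple of 4,
   is added directly to a variable. */
int add_literal_original(int x) {
    return x + 3;
}

/* Rewritten form: the literal is first stored in a local
   variable, so the source-level addition is between two
   variables rather than a variable and a literal. */
int add_literal_rewritten(int x) {
    int three = 3;
    return x + three;
}
\end{verbatim}

Both functions compute the same value; only the way the literal appears in the source differs.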

\paragraph{Experimental setup.} To generate a hardware implementation, the Verilog produced by the HLS tool under test must be synthesised to a netlist using a tool such as Yosys~\cite{yosys} or Intel Quartus~\cite{quartus}. The resulting netlist can then be placed and routed for a particular FPGA device. In the ideal experimental setup, we would use the same netlist synthesis tool for both CoqUp and LegUp.
However, we found that neither Yosys nor Quartus worked well with \emph{both} HLS tools. Quartus synthesised efficient hardware from LegUp-generated Verilog, in part because it detects opportunities to replace large numbers of registers with small RAM blocks; on CoqUp-generated Verilog, however, this RAM inference failed, leading to designs too large to fit onto the FPGA. Yosys had the same problem, with the roles of the two HLS tools reversed. So, to avoid disadvantaging either HLS tool, we synthesise LegUp's output with Quartus and CoqUp's output with Yosys. In both cases, we then use Quartus to place and route the synthesised netlists for a \ref{?} FPGA.

\subsection{RQ1: How fast is CoqUp-generated hardware?}

\begin{itemize}
    \item Draw a scatter graph and talk about it. Note: the advantage of a scatter graph is that it summarises a large number of benchmarks succinctly. However, bar charts are more traditional and would allow the data for individual benchmarks to be identified more easily.
\end{itemize}

\subsection{RQ2: How area-efficient is CoqUp-generated hardware?}

\begin{itemize}
    \item Draw a scatter graph and talk about it.
\end{itemize}

\subsection{RQ3: How long does CoqUp take to produce hardware?}

\begin{itemize}
    \item Draw a scatter graph and talk about it.
\end{itemize}

\begin{table}
  \begin{tabular}{lccccc}
    \toprule
    Benchmark & Cycles & Frequency (MHz) & LUTs & Registers & BRAMs\\
    \midrule
    adpcm & 30241 &90.05 & 7719 & 12034 & 7\\
    aes & 8489 & 87.83 & 24413 & 23796 & 19 \\
    gsm & 7190 & 119.25 & 6638 & 9201 & 3 \\
    mips & 7754 & 98.95 & 5049 & 4185 & 0 \\
    \bottomrule
  \end{tabular}
  \caption{CHStone programs synthesised in LegUp 5.1}
\end{table}

\begin{table}
  \begin{tabular}{lcccccc}
    \toprule
    Benchmark & Cycles & Frequency (MHz) & LUTs & Registers & BRAMs & DSPs\\
    \midrule
    adpcm & XXX  & 66.3 & 51626 & 42688 & 0 & 48\\
    aes & 41958 & 19.6 & 104017 & 94239 & 0 & 6 \\
    gsm & 21994 & 66.1 & 45764 & 33675 & 0 & 8\\
    mips & 18482 & 78.43 & 10617 & 7690 & 0 & 0\\
    \bottomrule
  \end{tabular}
  \caption{CHStone programs synthesised in CoqUp}
\end{table}

The difference in cycle counts shows the degree of parallelism that LegUp's scheduling and memory system can exploit. However, LegUp's Verilog generation is not guaranteed to be correct. Although the outputs of the LegUp-generated designs are checked for these programs, this provides no assurance that LegUp generates correct Verilog for any other program. By contrast, our mechanised Coq proof guarantees that the generated Verilog is correct for any input program that uses only the subset of CompCert instructions for which we have proven the translation correct.
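
To compare overall speed, the cycle counts must be combined with the achieved clock frequencies. Assuming that the Frequency column reports the maximum clock frequency $f_{\max}$ in MHz, the execution time of a design that runs for $N$ cycles is
\[
  t = \frac{N}{f_{\max}}.
\]
For example, the mips design takes $7754 / 98.95\,\mathrm{MHz} \approx 78.4\,\mu\mathrm{s}$ when generated by LegUp and $18482 / 78.43\,\mathrm{MHz} \approx 235.7\,\mu\mathrm{s}$ when generated by CoqUp. This slowdown of roughly $3.0\times$ combines a $2.4\times$ increase in cycle count with a clock frequency that is $1.26\times$ lower.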

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: