\chapter{Formal Verification of High-Level Synthesis}

\newcommand{\slowdownOrig}{27}
\newcommand{\slowdownDiv}{2}
\newcommand{\areaIncr}{1.1}

\paragraph{Can you trust your high-level synthesis tool?}

As latency, throughput, and energy efficiency become increasingly important, custom hardware
accelerators are being designed for numerous applications.  Alas, designing these accelerators can
be a tedious and error-prone process using a hardware description language (HDL) such as Verilog.
An attractive alternative is \emph{high-level synthesis} (HLS), in which hardware designs are
automatically compiled from software written in a high-level language like C.  Modern HLS tools such
as LegUp~\cite{canis11_legup}, Vivado HLS~\cite{xilinx20_vivad_high_synth}, Intel
i++~\cite{intel_hls}, and Bambu HLS~\cite{bambu_hls} promise designs with comparable performance and
energy efficiency to those hand-written in an HDL~\cite{homsirikamol+14, silexicahlshdl, 7818341},
while offering the convenient abstractions and rich ecosystems of software development.  But
existing HLS tools cannot always guarantee that the hardware designs they produce are equivalent to
the software they were given, and this undermines any reasoning conducted at the software level.

Indeed, there are reasons to doubt that HLS tools actually \emph{do} always preserve equivalence.
For instance, Vivado HLS has been shown to apply pipelining optimisations
incorrectly\footnote{\url{https://bit.ly/vivado-hls-pipeline-bug}} or to silently generate wrong
code should the programmer stray outside the fragment of C that it
supports.\footnote{\url{https://bit.ly/vivado-hls-pointer-bug}} Meanwhile,
\citet{lidbury15_many_core_compil_fuzzin} had to abandon their attempt to fuzz-test Altera's (now
Intel's) OpenCL compiler since it ``either crashed or emitted an internal compiler error'' on so
many of their test inputs.  More recently,
\citet{herklotz21_empir_study_reliab_high_level_synth_tools} fuzz-tested three commercial HLS tools
using Csmith~\cite{yang11_findin_under_bugs_c_compil}, and despite restricting the generated
programs to the C fragment explicitly supported by all the tools, they still found that on average
2.5\% of test-cases were compiled to designs that behaved incorrectly.

\paragraph{Existing workarounds}

Aware of the reliability shortcomings of HLS tools, hardware designers routinely check the generated
hardware for functional correctness. This is commonly done by simulating the generated design
against a large test-bench. But unless the test-bench covers all inputs exhaustively -- which is
often infeasible -- there is a risk that bugs remain.

One alternative is to use \emph{translation validation}~\cite{pnueli98_trans} to prove equivalence
between the input program and the output design. Translation validation has been successfully
applied to several HLS
optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}.
Nevertheless, it is an expensive task, especially for large designs, and it must be repeated every
time the compiler is invoked.  For example, the translation validation for Catapult
C~\cite{mentor20_catap_high_level_synth} may require several rounds of expert
`adjustments'~\cite[p.~3]{slec_whitepaper} to the input C program before validation succeeds. And
even when it succeeds, translation validation does not provide watertight guarantees unless the
validator itself has been mechanically proven
correct~\cite[e.g.][]{tristan08_formal_verif_trans_valid}, which has not been the case in HLS tools
to date.

Our position is that none of the above workarounds are necessary if the HLS tool can simply be
trusted to work correctly.

\paragraph{Our solution}

We have designed a new HLS tool in the Coq theorem prover and proved that any output design it
produces always has the same behaviour as its input program. Our tool, called Vericert, is
automatically extracted from Coq to an OCaml program, which ensures that the object of the proof is
the same as the implementation of the tool. Vericert is built by extending the CompCert verified C
compiler~\cite{leroy09_formal_verif_realis_compil} with a new hardware-specific intermediate
language and a Verilog back end. It supports most C constructs, including integer operations,
function calls (which are all inlined), local arrays, structs, unions, and general control-flow
statements, but currently excludes support for case statements, function pointers, recursive
function calls, non-32-bit integers, floats, and global variables.
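For concreteness, the following hypothetical kernel stays within this fragment: it uses only
32-bit integers, a local array, a struct, a helper function (which Vericert would inline), and
loop/branch control flow, while avoiding the unsupported features listed above. The function
itself is illustrative and does not appear in the Vericert sources.

```c
#include <stdint.h>

/* Hypothetical kernel inside Vericert's supported C fragment:
 * 32-bit integers only, a local array, a struct, an inlinable
 * helper function, and loop/if control flow.  It avoids switch
 * statements, function pointers, recursion, floats, and globals. */

struct acc { int32_t sum; int32_t max; };

int32_t square(int32_t x) { return x * x; }  /* inlined during HLS */

int32_t kernel(void) {
  int32_t buf[8];                  /* local array, not a global */
  struct acc a = {0, 0};
  for (int32_t i = 0; i < 8; i++)
    buf[i] = square(i);
  for (int32_t i = 0; i < 8; i++) {
    a.sum += buf[i];
    if (buf[i] > a.max) a.max = buf[i];
  }
  return a.sum + a.max;            /* 140 + 49 = 189 */
}
```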

\paragraph{Contributions and Outline}
The contributions of this paper are as follows:

\begin{itemize}
  \item We present Vericert, the first mechanically verified HLS tool that compiles C to Verilog. In
        Section~\ref{sec:design}, we describe the design of Vericert, including certain
        optimisations related to memory accesses and division.
  \item We state the correctness theorem of Vericert with respect to an existing semantics for
        Verilog due to \citet{loow19_proof_trans_veril_devel_hol}. In Section~\ref{sec:verilog}, we
        describe how we extended this semantics to make it suitable as an HLS target.  We also
        describe how the Verilog semantics is integrated into CompCert's language execution model
        and its framework for performing simulation proofs. A mapping of CompCert's infinite memory
        model onto a finite Verilog array is also described.
  \item In Section~\ref{sec:proof}, we describe how we proved the correctness theorem. The proof
        follows standard CompCert techniques -- forward simulations, intermediate specifications,
        and determinism results -- but we encountered several challenges peculiar to our
        hardware-oriented
        setting.
        These include handling discrepancies between the byte-addressed memory assumed by the input
        software and the word-addressed memory that we implement in the output hardware, different
        handling of unsigned comparisons between C and Verilog, and carefully implementing memory
        reads and writes so that these behave properly as a RAM in hardware.
  \item In Section~\ref{sec:evaluation}, we evaluate Vericert on the PolyBench/C benchmark
        suite~\cite{polybench}, and compare the performance of our generated hardware against an
        existing, unverified HLS tool called LegUp~\cite{canis11_legup}. We show that Vericert
        generates hardware that is \slowdownOrig$\times$ slower (\slowdownDiv$\times$ slower in the
        absence of division) and \areaIncr$\times$ larger than that generated by LegUp. This
        performance gap can be largely attributed to Vericert's current lack of support for
        instruction-level parallelism and the absence of an efficient, pipelined division
        operator. We intend to close this gap in the future by introducing (and verifying) HLS
        optimisations of our own, such as scheduling and memory analysis.  This section also reports
        on our campaign to fuzz-test Vericert using over a hundred thousand random C programs
        generated by Csmith~\cite{yang11_findin_under_bugs_c_compil} in order to confirm that its
        correctness theorem is watertight.
\end{itemize}
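Two of the proof challenges listed above can be illustrated concretely. The sketch below is a
minimal C illustration, not Vericert's actual Coq implementation: it shows how an aligned byte
address from CompCert's byte-addressed memory model maps onto an index into a word-addressed
32-bit RAM, and how the same 32-bit bit pattern compares differently under signed and unsigned
interpretations (in C the static type decides; in Verilog, vectors are unsigned unless declared
signed). The function names are ours, chosen for exposition.

```c
#include <stdint.h>

#define WORD_BYTES 4  /* 32-bit words in the hardware RAM */

/* Word index used to address the word-addressed hardware RAM. */
uint32_t word_index(uint32_t byte_addr) {
  return byte_addr / WORD_BYTES;   /* equivalently byte_addr >> 2 */
}

/* An access maps onto exactly one RAM word only if 4-byte aligned. */
int is_word_aligned(uint32_t byte_addr) {
  return byte_addr % WORD_BYTES == 0;
}

/* The same 32-bit pattern orders differently depending on whether
 * it is viewed as unsigned (as Verilog does by default) or signed. */
int unsigned_less(uint32_t a, uint32_t b) { return a < b; }
int signed_less(uint32_t a, uint32_t b) {
  return (int32_t)a < (int32_t)b;
}
```

For example, byte address 12 maps to word 3, and the pattern \texttt{0xFFFFFFFF} is less than 1
under a signed view ($-1 < 1$) but not under an unsigned one.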

\paragraph{Companion material}
Vericert is fully open source and available on GitHub at
\url{https://github.com/ymherklotz/vericert}. A snapshot of the Vericert development is also
available in a Zenodo repository~\cite{yann_herklotz_2021_5093839}.