hls.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111

\startcomponent hls
\product main

\def\slowdownOrig{27}
\def\slowdownDiv{2}
\def\areaIncr{1.1}

\chapter{High-Level Synthesis}

\paragraph{Can you trust your high-level synthesis tool?} As latency, throughput, and energy
efficiency become increasingly important, custom hardware accelerators are being designed for
numerous applications.  Alas, designing these accelerators can be a tedious and error-prone process
using a hardware description language (HDL) such as Verilog.  An attractive alternative is
\emph{high-level synthesis} (HLS), in which hardware designs are automatically compiled from
software written in a high-level language like C.  Modern HLS tools such as
LegUp~\cite{canis11_legup}, Vivado HLS~\cite{xilinx20_vivad_high_synth}, Intel i++~\cite{intel_hls},
and Bambu HLS~\cite{bambu_hls} promise designs with comparable performance and energy-efficiency to
those hand-written in an HDL~\cite{homsirikamol+14, silexicahlshdl, 7818341}, while offering the
convenient abstractions and rich ecosystems of software development.  But existing HLS tools cannot
always guarantee that the hardware designs they produce are equivalent to the software they were
given, and this undermines any reasoning conducted at the software level.

Indeed, there are reasons to doubt that HLS tools actually \emph{do} always preserve equivalence.
For instance, Vivado HLS has been shown to apply pipelining optimisations
incorrectly\footnote{\goto{https://bit.ly/vivado-hls-pipeline-bug}[url(https://bit.ly/vivado-hls-pipeline-bug)]} or to silently generate wrong
code should the programmer stray outside the fragment of C that it
supports.\footnote{\goto{https://bit.ly/vivado-hls-pointer-bug}[url(https://bit.ly/vivado-hls-pointer-bug)]} Meanwhile,
\cite{lidbury15_many_core_compil_fuzzin} had to abandon their attempt to fuzz-test Altera's (now
Intel's) OpenCL compiler since it ``either crashed or emitted an internal compiler error'' on so
many of their test inputs.  More recently,
\cite{herklotz21_empir_study_reliab_high_level_synth_tools} fuzz-tested three commercial HLS tools
using Csmith~\cite{yang11_findin_under_bugs_c_compil}, and despite restricting the generated
programs to the C fragment explicitly supported by all the tools, they still found that on average
2.5\% of test-cases were compiled to designs that behaved incorrectly.

\paragraph{Existing workarounds.} Aware of the reliability shortcomings of HLS tools, hardware
designers routinely check the generated hardware for functional correctness. This is commonly done
by simulating the generated design against a large test-bench. But unless the test-bench covers all
inputs exhaustively -- which is often infeasible -- there is a risk that bugs remain.

One alternative is to use \emph{translation validation}~\cite{pnueli98_trans} to prove equivalence
between the input program and the output design. Translation validation has been successfully
applied to several HLS
optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}.
Nevertheless, it is an expensive task, especially for large designs, and it must be repeated every
time the compiler is invoked.  For example, the translation validation for Catapult
C~\cite{mentor20_catap_high_level_synth} may require several rounds of expert
`adjustments'~\cite[alternative=authoryear,right={, p.~3)}][slec_whitepaper] to the input C program before validation succeeds. And
even when it succeeds, translation validation does not provide watertight guarantees unless the
validator itself has been mechanically proven
correct~\cite[tristan08_formal_verif_trans_valid], which has not been the case in HLS tools
to date.

Our position is that none of the above workarounds are necessary if the HLS tool can simply be
trusted to work correctly.

\paragraph{Our solution.} We have designed a new HLS tool in the Coq theorem prover and proved that
any output design it produces always has the same behaviour as its input program. Our tool, called
Vericert, is automatically extracted to an OCaml program from Coq, which ensures that the object of
the proof is the same as the implementation of the tool. Vericert is built by extending the CompCert
verified C compiler~\cite[leroy09_formal_verif_realis_compil] with a new hardware-specific
intermediate language and a Verilog back end. It supports most C constructs, including integer
operations, function calls (which are all inlined), local arrays, structs, unions, and general
control-flow statements, but currently excludes support for case statements, function pointers,
recursive function calls, non-32-bit integers, floats, and global variables.

\paragraph{Contributions and Outline.} The contributions of this paper are as follows:

\startitemize[]
  \item We present Vericert, the first mechanically verified HLS tool that compiles C to Verilog. In
        Section~\ref{sec:design}, we describe the design of Vericert, including certain
        optimisations related to memory accesses and division.
  \item We state the correctness theorem of Vericert with respect to an existing semantics for
    Verilog due to \cite[authoryear][loow19_proof_trans_veril_devel_hol]. In
    Section~\ref{sec:verilog}, we describe how we extended this semantics to make it suitable as an
    HLS target.  We also describe how the Verilog semantics is integrated into CompCert's language
    execution model and its framework for performing simulation proofs. A mapping of CompCert's
    infinite memory model onto a finite Verilog array is also described.
  \item In Section~\ref{sec:proof}, we describe how we proved the correctness theorem. The proof
        follows standard CompCert techniques -- forward simulations, intermediate specifications,
        and determinism results -- but we encountered several challenges peculiar to our
        hardware-oriented
        setting.
        These include handling discrepancies between the byte-addressed memory assumed by the input
        software and the word-addressed memory that we implement in the output hardware, different
        handling of unsigned comparisons between C and Verilog, and carefully implementing memory
        reads and writes so that these behave properly as a RAM in hardware.
  \item In Section~\ref{sec:evaluation}, we evaluate Vericert on the PolyBench/C benchmark
        suite~\cite{polybench}, and compare the performance of our generated hardware against an
        existing, unverified HLS tool called LegUp~\cite{canis11_legup}. We show that Vericert
        generates hardware that is \slowdownOrig$\times$ slower (\slowdownDiv$\times$ slower in the
        absence of division) and \areaIncr$\times$ larger than that generated by LegUp. This
        performance gap can be largely attributed to Vericert's current lack of support for
        instruction-level parallelism and the absence of an efficient, pipelined division
        operator. We intend to close this gap in the future by introducing (and verifying) HLS
        optimisations of our own, such as scheduling and memory analysis.  This section also reports
        on our campaign to fuzz-test Vericert using over a hundred thousand random C programs
        generated by Csmith~\cite{yang11_findin_under_bugs_c_compil} in order to confirm that its
        correctness theorem is watertight.
\stopitemize

\paragraph{Companion material.} Vericert is fully open source and available on GitHub at

\startalignment[center]
\goto{\hyphenatedurl{https://github.com/ymherklotz/vericert}}[url(https://github.com/ymherklotz/vericert)].
\stopalignment

A snapshot of the Vericert development is also available in a Zenodo
repository~\cite{yann_herklotz_2021_5093839}.

\stopcomponent