Diffstat (limited to 'chapters/hls.tex')
-rw-r--r-- | chapters/hls.tex | 121
1 files changed, 41 insertions, 80 deletions
diff --git a/chapters/hls.tex b/chapters/hls.tex
index 8ff7143..fbaf41a 100644
--- a/chapters/hls.tex
+++ b/chapters/hls.tex
@@ -7,61 +7,15 @@ \chapter[sec:hls]{High-Level Synthesis}
-\paragraph{Can you trust your high-level synthesis tool?} As latency, throughput, and energy
-efficiency become increasingly important, custom hardware accelerators are being designed for
-numerous applications. Alas, designing these accelerators can be a tedious and error-prone process
-using a hardware description language (HDL) such as Verilog. An attractive alternative is
-\emph{high-level synthesis} (HLS), in which hardware designs are automatically compiled from
-software written in a high-level language like C. Modern HLS tools such as
-LegUp~\cite{canis11_legup}, Vivado HLS~\cite{xilinx20_vivad_high_synth}, Intel i++~\cite{intel_hls},
-and Bambu HLS~\cite{bambu_hls} promise designs with comparable performance and energy-efficiency to
-those hand-written in an HDL~\cite{homsirikamol+14, silexicahlshdl, 7818341}, while offering the
-convenient abstractions and rich ecosystems of software development. But existing HLS tools cannot
-always guarantee that the hardware designs they produce are equivalent to the software they were
-given, and this undermines any reasoning conducted at the software level.
-
-Indeed, there are reasons to doubt that HLS tools actually \emph{do} always preserve equivalence.
-For instance, Vivado HLS has been shown to apply pipelining optimisations
-incorrectly\footnote{\goto{https://bit.ly/vivado-hls-pipeline-bug}[url(https://bit.ly/vivado-hls-pipeline-bug)]}
-or to silently generate wrong code should the programmer stray outside the fragment of C that it
-supports.\footnote{\goto{https://bit.ly/vivado-hls-pointer-bug}[url(https://bit.ly/vivado-hls-pointer-bug)]}
-Meanwhile, \cite{lidbury15_many_core_compil_fuzzin} had to abandon their attempt to fuzz-test
-Altera's (now Intel's) OpenCL compiler since it \quotation{either crashed or emitted an internal
- compiler error} on so many of their test inputs. More recently,
-\cite{herklotz21_empir_study_reliab_high_level_synth_tools} fuzz-tested three commercial HLS tools
-using Csmith~\cite{yang11_findin_under_bugs_c_compil}, and despite restricting the generated
-programs to the C fragment explicitly supported by all the tools, they still found that on average
-2.5\% of test-cases were compiled to designs that behaved incorrectly.
-
-\paragraph{Existing workarounds.} Aware of the reliability shortcomings of HLS tools, hardware
-designers routinely check the generated hardware for functional correctness. This is commonly done
-by simulating the generated design against a large test-bench. But unless the test-bench covers all
-inputs exhaustively -- which is often infeasible -- there is a risk that bugs remain.
-
-One alternative is to use \emph{translation validation}~\cite{pnueli98_trans} to prove equivalence
-between the input program and the output design. Translation validation has been successfully
-applied to several HLS
-optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}.
-Nevertheless, it is an expensive task, especially for large designs, and it must be repeated every
-time the compiler is invoked. For example, the translation validation for Catapult
-C~\cite{mentor20_catap_high_level_synth} may require several rounds of expert
-`adjustments'~\cite[alternative=authoryear,right={, p.~3)}][slec_whitepaper] to the input C program
-before validation succeeds. And even when it succeeds, translation validation does not provide
-watertight guarantees unless the validator itself has been mechanically proven
-correct~\cite[tristan08_formal_verif_trans_valid], which has not been the case in HLS tools to date.
-
-Our position is that none of the above workarounds are necessary if the HLS tool can simply be
-trusted to work correctly.
-
-\paragraph{Our solution.} We have designed a new HLS tool in the Coq theorem prover and proved that
-any output design it produces always has the same behaviour as its input program. Our tool, called
-Vericert, is automatically extracted to an OCaml program from Coq, which ensures that the object of
-the proof is the same as the implementation of the tool. Vericert is built by extending the CompCert
-verified C compiler~\cite[leroy09_formal_verif_realis_compil] with a new hardware-specific
-intermediate language and a Verilog back end. It supports most C constructs, including integer
-operations, function calls (which are all inlined), local arrays, structs, unions, and general
-control-flow statements, but currently excludes support for case statements, function pointers,
-recursive function calls, non-32-bit integers, floats, and global variables.
+We have designed a new HLS tool in the Coq theorem prover and proved that any output design it
+produces always has the same behaviour as its input program. Our tool, called Vericert, is
+automatically extracted to an OCaml program from Coq, which ensures that the object of the proof is
+the same as the implementation of the tool. Vericert is built by extending the CompCert verified C
+compiler~\cite[leroy09_formal_verif_realis_compil] with a new hardware-specific intermediate
+language and a Verilog back end. It supports most C constructs, including integer operations,
+function calls (which are all inlined), local arrays, structs, unions, and general control-flow
+statements, but currently excludes support for case statements, function pointers, recursive
+function calls, non-32-bit integers, floats, and global variables.
 
 \paragraph{Contributions and Outline.} The contributions of this paper are as follows:
@@ -1255,33 +1209,40 @@ Fig.~\goto{{[}fig:inference_module{]}}[fig:inference_module].~◻
 
 \subsection[coq-mechanisation]{Coq Mechanisation}
 
-lrrrrr & {\bf Coq code} & & {\bf Spec} & & {\bf Total}\crlf
-Data structures and libraries & 280 & --- & --- & 500 & 780\crlf
-Integers and values & 98 & --- & 15 & 968 & 1081\crlf
-HTL semantics & 50 & --- & 181 & 65 & 296\crlf
-HTL generation & 590 & --- & 66 & 3069 & 3725\crlf
-RAM generation & 253 & --- & --- & 2793 & 3046\crlf
-Verilog semantics & 78 & --- & 431 & 235 & 744\crlf
-Verilog generation & 104 & --- & --- & 486 & 590\crlf
-Top-level driver, pretty printers & 318 & 775 & --- & 189 & 1282\crlf
-{\bf Total} & 1721 & 775 & 693 & 8355 & 11544\crlf
+\startplacetable[reference=tab:proof-statistics]
+  \starttabulate[|l|r|r|r|r|r|]
+  \HL
+  \NC \NC {\bf Coq code} \NC \NC {\bf Spec} \NC \NC {\bf Total} \NC \NR
+  \HL
+  \NC Data structures and libraries \NC 280 \NC --- \NC --- \NC 500 \NC 780 \NC \NR
+  \NC Integers and values \NC 98 \NC --- \NC 15 \NC 968 \NC 1081 \NC \NR
+  \NC HTL semantics \NC 50 \NC --- \NC 181 \NC 65 \NC 296 \NC \NR
+  \NC HTL generation \NC 590 \NC --- \NC 66 \NC 3069 \NC 3725 \NC \NR
+  \NC RAM generation \NC 253 \NC --- \NC --- \NC 2793 \NC 3046 \NC \NR
+  \NC Verilog semantics \NC 78 \NC --- \NC 431 \NC 235 \NC 744 \NC \NR
+  \NC Verilog generation \NC 104 \NC --- \NC --- \NC 486 \NC 590 \NC \NR
+  \NC Top-level driver, pretty printers \NC 318 \NC 775 \NC --- \NC 189 \NC 1282 \NC \NR
+  \HL
+  \NC {\bf Total} \NC 1721 \NC 775 \NC 693 \NC 8355 \NC 11544 \NC \NR
+  \HL
+  \stoptabulate
+\stopplacetable
 
 The lines of code for the implementation and proof of Vericert can be found in
-Table~\goto{{[}tab:proof_statistics{]}}[tab:proof_statistics]. Overall, it took about 1.5
-person-years to build -- about three person-months on implementation and 15 person-months on
-proofs. The largest proof is the correctness proof for the HTL generation, which required
-equivalence proofs between all integer operations supported by CompCert and those supported in
-hardware. From the 3069 lines of proof code in the HTL generation, 1189 are for the correctness
-proof of just the load and store instructions. These were tedious to prove correct because of the
-substantial difference between the memory models used, and the need to prove properties such as
-stores outside of the allocated memory being undefined, so that a finite array could be used. In
-addition to that, since pointers in HTL and Verilog are represented as integers, instead of as a
-separate \quote{pointer} type like in the CompCert semantics, it was painful to reason about them.
-Many new theorems had to be proven about them in CompCert to prove the conversion from pointer to
-integer. Moreover, the second-largest proof of the correct RAM generation includes many proofs
-about the extensional equality of array operations, such as merging arrays with different
-assignments. As the negative edge implies that two merges take place every clock cycle, the proofs
-about the equality of the contents in memory and in the merged arrays become more tedious too.
+\in{Table}[tab:proof-statistics]. Overall, it took about 1.5 person-years to build -- about three
+person-months on implementation and 15 person-months on proofs. The largest proof is the correctness
+proof for the HTL generation, which required equivalence proofs between all integer operations
+supported by CompCert and those supported in hardware. From the 3069 lines of proof code in the HTL
+generation, 1189 are for the correctness proof of just the load and store instructions. These were
+tedious to prove correct because of the substantial difference between the memory models used, and
+the need to prove properties such as stores outside of the allocated memory being undefined, so that
+a finite array could be used. In addition to that, since pointers in HTL and Verilog are represented
+as integers, instead of as a separate \quote{pointer} type like in the CompCert semantics, it was
+painful to reason about them. Many new theorems had to be proven about them in CompCert to prove the
+conversion from pointer to integer. Moreover, the second-largest proof of the correct RAM generation
+includes many proofs about the extensional equality of array operations, such as merging arrays with
+different assignments. As the negative edge implies that two merges take place every clock cycle,
+the proofs about the equality of the contents in memory and in the merged arrays become more tedious
+too.
 
 Looking at the trusted computing base of Vericert, the Verilog semantics is 431 lines of code. This
 and the Clight semantics from CompCert are the only parts of the compiler that need to be trusted.
 The Verilog