Diffstat (limited to 'chapters/hls.tex')
-rw-r--r-- | chapters/hls.tex | 121
1 files changed, 41 insertions, 80 deletions
diff --git a/chapters/hls.tex b/chapters/hls.tex
index 8ff7143..fbaf41a 100644
--- a/chapters/hls.tex
+++ b/chapters/hls.tex
@@ -7,61 +7,15 @@ \chapter[sec:hls]{High-Level Synthesis}
-\paragraph{Can you trust your high-level synthesis tool?} As latency, throughput, and energy
-efficiency become increasingly important, custom hardware accelerators are being designed for
-numerous applications. Alas, designing these accelerators can be a tedious and error-prone process
-using a hardware description language (HDL) such as Verilog. An attractive alternative is
-\emph{high-level synthesis} (HLS), in which hardware designs are automatically compiled from
-software written in a high-level language like C. Modern HLS tools such as
-LegUp~\cite{canis11_legup}, Vivado HLS~\cite{xilinx20_vivad_high_synth}, Intel i++~\cite{intel_hls},
-and Bambu HLS~\cite{bambu_hls} promise designs with comparable performance and energy-efficiency to
-those hand-written in an HDL~\cite{homsirikamol+14, silexicahlshdl, 7818341}, while offering the
-convenient abstractions and rich ecosystems of software development. But existing HLS tools cannot
-always guarantee that the hardware designs they produce are equivalent to the software they were
-given, and this undermines any reasoning conducted at the software level.
-
-Indeed, there are reasons to doubt that HLS tools actually \emph{do} always preserve equivalence.
-For instance, Vivado HLS has been shown to apply pipelining optimisations
-incorrectly\footnote{\goto{https://bit.ly/vivado-hls-pipeline-bug}[url(https://bit.ly/vivado-hls-pipeline-bug)]}
-or to silently generate wrong code should the programmer stray outside the fragment of C that it
-supports.\footnote{\goto{https://bit.ly/vivado-hls-pointer-bug}[url(https://bit.ly/vivado-hls-pointer-bug)]}
-Meanwhile, \cite{lidbury15_many_core_compil_fuzzin} had to abandon their attempt to fuzz-test
-Altera's (now Intel's) OpenCL compiler since it \quotation{either crashed or emitted an internal
- compiler error} on so many of their test inputs. More recently,
-\cite{herklotz21_empir_study_reliab_high_level_synth_tools} fuzz-tested three commercial HLS tools
-using Csmith~\cite{yang11_findin_under_bugs_c_compil}, and despite restricting the generated
-programs to the C fragment explicitly supported by all the tools, they still found that on average
-2.5\% of test-cases were compiled to designs that behaved incorrectly.
-
-\paragraph{Existing workarounds.} Aware of the reliability shortcomings of HLS tools, hardware
-designers routinely check the generated hardware for functional correctness. This is commonly done
-by simulating the generated design against a large test-bench. But unless the test-bench covers all
-inputs exhaustively -- which is often infeasible -- there is a risk that bugs remain.
-
-One alternative is to use \emph{translation validation}~\cite{pnueli98_trans} to prove equivalence
-between the input program and the output design. Translation validation has been successfully
-applied to several HLS
-optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}.
-Nevertheless, it is an expensive task, especially for large designs, and it must be repeated every
-time the compiler is invoked. For example, the translation validation for Catapult
-C~\cite{mentor20_catap_high_level_synth} may require several rounds of expert
-`adjustments'~\cite[alternative=authoryear,right={, p.~3)}][slec_whitepaper] to the input C program
-before validation succeeds. And even when it succeeds, translation validation does not provide
-watertight guarantees unless the validator itself has been mechanically proven
-correct~\cite[tristan08_formal_verif_trans_valid], which has not been the case in HLS tools to date.
-
-Our position is that none of the above workarounds are necessary if the HLS tool can simply be
-trusted to work correctly.
-
-\paragraph{Our solution.} We have designed a new HLS tool in the Coq theorem prover and proved that
-any output design it produces always has the same behaviour as its input program. Our tool, called
-Vericert, is automatically extracted to an OCaml program from Coq, which ensures that the object of
-the proof is the same as the implementation of the tool. Vericert is built by extending the CompCert
-verified C compiler~\cite[leroy09_formal_verif_realis_compil] with a new hardware-specific
-intermediate language and a Verilog back end. It supports most C constructs, including integer
-operations, function calls (which are all inlined), local arrays, structs, unions, and general
-control-flow statements, but currently excludes support for case statements, function pointers,
-recursive function calls, non-32-bit integers, floats, and global variables.
+We have designed a new HLS tool in the Coq theorem prover and proved that any output design it
+produces always has the same behaviour as its input program. Our tool, called Vericert, is
+automatically extracted to an OCaml program from Coq, which ensures that the object of the proof is
+the same as the implementation of the tool. Vericert is built by extending the CompCert verified C
+compiler~\cite[leroy09_formal_verif_realis_compil] with a new hardware-specific intermediate
+language and a Verilog back end. It supports most C constructs, including integer operations,
+function calls (which are all inlined), local arrays, structs, unions, and general control-flow
+statements, but currently excludes support for case statements, function pointers, recursive
+function calls, non-32-bit integers, floats, and global variables.
 
 \paragraph{Contributions and Outline.} The contributions of this paper are as follows:
@@ -1255,33 +1209,40 @@ Fig.~\goto{{[}fig:inference_module{]}}[fig:inference_module].~◻
 
 \subsection[coq-mechanisation]{Coq Mechanisation}
 
-lrrrrr & {\bf Coq code} & & {\bf Spec} & & {\bf Total}\crlf
-Data structures and libraries & 280 & --- & --- & 500 & 780\crlf
-Integers and values & 98 & --- & 15 & 968 & 1081\crlf
-HTL semantics & 50 & --- & 181 & 65 & 296\crlf
-HTL generation & 590 & --- & 66 & 3069 & 3725\crlf
-RAM generation & 253 & --- & --- & 2793 & 3046\crlf
-Verilog semantics & 78 & --- & 431 & 235 & 744\crlf
-Verilog generation & 104 & --- & --- & 486 & 590\crlf
-Top-level driver, pretty printers & 318 & 775 & --- & 189 & 1282\crlf
-{\bf Total} & 1721 & 775 & 693 & 8355 & 11544\crlf
+\startplacetable[reference=tab:proof-statistics]
+  \starttabulate[|l|r|r|r|r|r|]
+  \HL
+  \NC \NC {\bf Coq code} \NC \NC {\bf Spec} \NC \NC {\bf Total} \NC \NR
+  \HL
+  \NC Data structures and libraries \NC 280 \NC --- \NC --- \NC 500 \NC 780 \NC \NR
+  \NC Integers and values \NC 98 \NC --- \NC 15 \NC 968 \NC 1081 \NC \NR
+  \NC HTL semantics \NC 50 \NC --- \NC 181 \NC 65 \NC 296 \NC \NR
+  \NC HTL generation \NC 590 \NC --- \NC 66 \NC 3069 \NC 3725 \NC \NR
+  \NC RAM generation \NC 253 \NC --- \NC --- \NC 2793 \NC 3046 \NC \NR
+  \NC Verilog semantics \NC 78 \NC --- \NC 431 \NC 235 \NC 744 \NC \NR
+  \NC Verilog generation \NC 104 \NC --- \NC --- \NC 486 \NC 590 \NC \NR
+  \NC Top-level driver, pretty printers \NC 318 \NC 775 \NC --- \NC 189 \NC 1282 \NC \NR
+  \HL
+  \NC {\bf Total} \NC 1721 \NC 775 \NC 693 \NC 8355 \NC 11544 \NC \NR
+  \HL
+  \stoptabulate
+\stopplacetable
 
 The lines of code for the implementation and proof of Vericert can be found in
-Table~\goto{{[}tab:proof_statistics{]}}[tab:proof_statistics]. Overall, it took about 1.5
-person-years to build -- about three person-months on implementation and 15 person-months on
-proofs. The largest proof is the correctness proof for the HTL generation, which required
-equivalence proofs between all integer operations supported by CompCert and those supported in
-hardware. From the 3069 lines of proof code in the HTL generation, 1189 are for the correctness
-proof of just the load and store instructions. These were tedious to prove correct because of the
-substantial difference between the memory models used, and the need to prove properties such as
-stores outside of the allocated memory being undefined, so that a finite array could be used. In
-addition to that, since pointers in HTL and Verilog are represented as integers, instead of as a
-separate \quote{pointer} type like in the CompCert semantics, it was painful to reason about them.
-Many new theorems had to be proven about them in CompCert to prove the conversion from pointer to
-integer. Moreover, the second-largest proof of the correct RAM generation includes many proofs
-about the extensional equality of array operations, such as merging arrays with different
-assignments. As the negative edge implies that two merges take place every clock cycle, the proofs
-about the equality of the contents in memory and in the merged arrays become more tedious too.
+\in{Table}[tab:proof-statistics]. Overall, it took about 1.5 person-years to build -- about three
+person-months on implementation and 15 person-months on proofs. The largest proof is the correctness
+proof for the HTL generation, which required equivalence proofs between all integer operations
+supported by CompCert and those supported in hardware. From the 3069 lines of proof code in the HTL
+generation, 1189 are for the correctness proof of just the load and store instructions. These were
+tedious to prove correct because of the substantial difference between the memory models used, and
+the need to prove properties such as stores outside of the allocated memory being undefined, so that
+a finite array could be used. In addition to that, since pointers in HTL and Verilog are represented
+as integers, instead of as a separate \quote{pointer} type like in the CompCert semantics, it was
+painful to reason about them. Many new theorems had to be proven about them in CompCert to prove the
+conversion from pointer to integer. Moreover, the second-largest proof of the correct RAM generation
+includes many proofs about the extensional equality of array operations, such as merging arrays with
+different assignments. As the negative edge implies that two merges take place every clock cycle,
+the proofs about the equality of the contents in memory and in the merged arrays become more tedious
+too.
 
 Looking at the trusted computing base of Vericert, the Verilog semantics is 431 lines of code. This
 and the Clight semantics from CompCert are the only parts of the compiler that need to be trusted.
 The Verilog