author    Yann Herklotz <git@yannherklotz.com>  2020-11-13 13:13:43 +0000
committer Yann Herklotz <git@yannherklotz.com>  2020-11-13 13:13:43 +0000
commit    cdbe6f978bf6de9a1b389cebc02dc74666b7a4ee (patch)
tree      c2013228f14222a3fd7f373f3a815edac38806b7
parent    fae93d199b799b3b7ecf9900cec81e548de99d83 (diff)
download  oopsla21_fvhls-cdbe6f978bf6de9a1b389cebc02dc74666b7a4ee.tar.gz
          oopsla21_fvhls-cdbe6f978bf6de9a1b389cebc02dc74666b7a4ee.zip
Update some sections
-rw-r--r-- conclusion.tex   |  2
-rw-r--r-- evaluation.tex   | 97
-rw-r--r-- introduction.tex | 20
-rw-r--r-- main.tex         |  1
-rw-r--r-- proof.tex        |  4
-rw-r--r-- references.bib   | 76
-rw-r--r-- related.tex      | 76
7 files changed, 159 insertions, 117 deletions
diff --git a/conclusion.tex b/conclusion.tex
index 876382d..7c06245 100644
--- a/conclusion.tex
+++ b/conclusion.tex
@@ -1,5 +1,7 @@
\section{Conclusion}
+\vericert{} is the first fully mechanised proof of correctness for HLS, translating C into Verilog. However, to make the tool more usable, many further optimisations could be implemented to bring its performance closer to \legup{}. The main optimisation that should be performed is scheduling operations into the same clock cycle, so that they can run in parallel. Another issue \vericert{} faces is that \compcert{}'s 3AC uses many different registers, even if these are never used again later on. It would therefore be useful to have a register allocation pass that ensures registers are reused appropriately. Finally, resource sharing could be supported to reduce the area of the final design.
+
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
diff --git a/evaluation.tex b/evaluation.tex
index b81d08d..cb185ac 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -4,16 +4,31 @@ Our evaluation is designed to answer the following three research questions. \JW
\begin{description}
\item[RQ1] How fast is the hardware generated by \vericert{}, and how does this compare to existing HLS tools?
\item[RQ2] How area-efficient is the hardware generated by \vericert{}, and how does this compare to existing HLS tools?
-\item[RQ3] How long does \vericert{} take to produce hardware?
+%\item[RQ3] How long does \vericert{} take to produce hardware?
\end{description}
\subsection{Experimental Setup}
-\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against LegUp 4.0 because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis+11}.
+\paragraph{Choice of HLS tool for comparison.} We compare \vericert{} against LegUp 4.0 because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis11_legup}.
-\paragraph{Choice of benchmarks.} We evaluate \vericert{} using the PolyBench/C benchmark suite\footnote{\url{http://web.cs.ucla.edu/~pouchet/software/polybench/}}. PolyBench/C is a modern benchmark suite that has been previously used to evaluate HLS tools~\cite{choi+18}. For completeness, we use the full set of 24 benchmarks. We set the benchmark parameters so that all datatypes are integers (since \vericert{} only supports integers) and all datasets are `small' (to fit into the small on-chip memories). A current limitation of \vericert{}, as discussed in Section~\ref{?}, is that it does not support addition and subtraction operations involving integer literals not divisible by 4. To work around this, we lightly modified each benchmark program so that literals other than multiples of 4 are stored into variables before being added or subtracted. \JW{Any other notable changes to the benchmarks?}
+\paragraph{Choice of benchmarks.} We evaluate \vericert{} using the PolyBench/C benchmark suite\footnote{\url{http://web.cs.ucla.edu/~pouchet/software/polybench/}} and some CHStone benchmarks\footnote{\url{http://www.ertl.jp/chstone/}}. PolyBench/C is a modern benchmark suite that has been previously used to evaluate HLS tools~\cite{choi+18}. For completeness, we use the full set of 24 benchmarks. We set the benchmark parameters so that all datatypes are integers (since \vericert{} only supports integers) and all datasets are `small' (to fit into the small on-chip memories). A current limitation of \vericert{}, as discussed in Section~\ref{?}, is that it does not support addition and subtraction operations involving integer literals not divisible by 4. To work around this, we lightly modified each benchmark program so that literals other than multiples of 4 are stored into variables before being added or subtracted. \JW{Any other notable changes to the benchmarks?}
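+For example, the following pair of C snippets illustrates the flavour of this workaround (a hypothetical example rather than an excerpt from any particular benchmark):
+\begin{verbatim}
+/* original: the literal 3 is not divisible by 4 */
+x = x + 3;
+
+/* modified: the literal is first stored in a variable */
+int lit = 3;
+x = x + lit;
+\end{verbatim}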
-\paragraph{Experimental setup.} In order to generate a hardware implementation, the Verilog produced by the HLS tool-under-test must be synthesised to a netlist; the resultant netlist can then be placed-and-routed for a particular FPGA device. We use Intel Quartus~\cite{quartus} for both of these tasks, and we target an Arria 10 \ref{?} FPGA.
+\paragraph{Experimental setup.} In order to generate a hardware implementation, the Verilog produced by the HLS tool-under-test must be synthesised to a netlist; the resultant netlist can then be placed-and-routed for a particular FPGA device. We use Intel Quartus~\cite{quartus} for both of these tasks, and we target an Arria 10 FPGA.
+
+\begin{table*}
+ \begin{tabular}{lcccccccccccc}
+ \toprule
+ Bench & \multicolumn{2}{c}{Cycles} & \multicolumn{2}{c}{Frequency / MHz} & \multicolumn{2}{c}{LUTs} & \multicolumn{2}{c}{Reg} & \multicolumn{2}{c}{BRAMs} & \multicolumn{2}{c}{DSPs}\\
+ & L & V & L & V & L & V & L & V & L & V & L & V\\
+ \midrule
+ adpcm & 30241 & 121386 & 90.05 & 66.3 & 7719 & 51626 & 12034 & 42688 & 7 & 0 & 0 & 48\\
+ aes & 8489 & 41958 & 87.83 & 19.6 & 24413 & 104017 & 23796 & 94239 & 19 & 0 & 0 & 6\\
+ gsm & 7190 & 21994 & 119.25 & 66.1 & 6638 & 45764 & 9201 & 33675 & 3 & 0 & 0 & 8 \\
+ mips & 7754 & 18482 & 98.95 & 78.43 & 5049 & 10617 & 4185 & 7690 & 0 & 0 & 0 & 0\\
+ \bottomrule
+ \end{tabular}
+  \caption{CHStone programs synthesised with \legup{} 5.1 (L) and with \vericert{} (V)}\label{tab:chstone}
+\end{table*}
\subsection{RQ1: How fast is \vericert{}-generated hardware?}
@@ -45,9 +60,9 @@ Our evaluation is designed to answer the following three research questions. \JW
\label{fig:comparison_time}
\end{figure}
-Figure~\ref{fig:comparison_time} compares the execution time of hardware designs generated by \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. It uses logarithmic scales so that the data points are spread out more evenly. The execution times are quite well correlated between the two tools, but LegUp generates hardware designs that execute roughly fifty times faster, as illustrated by the diagonal line at $y=50x$. There are two reasons for this. First, because \vericert{}-generated hardware has all of its operations serialised, it requires more clock cycles to perform its computation -- about 4$\times$ as many as LegUp-generated hardware. Second, \vericert{} generates hardware with a much lower clock frequency than LegUp (roughly 22 MHz compared to roughly 250 MHz) so each of those clock cycles takes about 11$\times$ longer. The main reason for the low clock frequency is \JW{could somebody else finish this bit off?}.
+Figure~\ref{fig:comparison_time} compares the execution time of hardware designs generated by \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. It uses logarithmic scales so that the data points are spread out more evenly. The execution times are quite well correlated between the two tools, but LegUp generates hardware designs that execute roughly fifty times faster, as illustrated by the diagonal line at $y=50x$. There are two reasons for this. First, because \vericert{}-generated hardware has all of its operations serialised, it requires more clock cycles to perform its computation -- about 4$\times$ as many as LegUp-generated hardware. Second, \vericert{} generates hardware with a much lower clock frequency than LegUp (roughly 22 MHz compared to roughly 250 MHz), so each of those clock cycles takes about 11$\times$ longer. The main reason for the low clock frequency is the divide circuits that \vericert{} introduces for division operations in the C code. Currently, a divide has to complete within a single clock cycle, and its long combinational delay therefore limits the maximum clock frequency of the entire design. One possible solution would be to spread the divide operation over multiple clock cycles.
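+To sketch this solution, the following C function implements standard radix-2 restoring division; in hardware, each loop iteration would correspond to one clock cycle, so a 32-bit divide would take 32 short cycles rather than one long combinational path. This is illustrative only and is not \vericert{}'s implementation.
+\begin{verbatim}
+/* Iterative restoring division; assumes d != 0.
+ * Each iteration would map to one clock cycle. */
+unsigned divide(unsigned n, unsigned d) {
+  unsigned q = 0, r = 0;
+  for (int i = 31; i >= 0; i--) {
+    r = (r << 1) | ((n >> i) & 1);  /* shift in next dividend bit */
+    if (r >= d) {                   /* trial subtraction succeeds */
+      r -= d;
+      q |= 1u << i;
+    }
+  }
+  return q;                         /* remainder is left in r */
+}
+\end{verbatim}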
-The difference in cycle counts shows the degree of parallelism that LegUp's scheduling and memory system can offer. However, their Verilog generation is not guaranteed to be correct. Although the runtime LegUp outputs are tested to be correct for these programs, this does not provide any confidence on the correctness of Verilog generation of any other programs. Our Coq proof mechanisation guarantees that generated Verilog is correct for any input program that uses the subset of \compcert{} instructions that we have proven to be correct.
+The difference in cycle counts shows the degree of parallelism that \legup{}'s scheduling and memory system can offer. However, its Verilog generation is not guaranteed to be correct. Although the hardware that \legup{} outputs is tested to behave correctly for these programs, this does not provide any confidence in the correctness of the Verilog generated for any other program. Our Coq proof mechanisation guarantees that the generated Verilog is correct for any input program that uses the subset of \compcert{} instructions that we have proven correct.
\subsection{RQ2: How area-efficient is \vericert{}-generated hardware?}
@@ -71,58 +86,30 @@ The difference in cycle counts shows the degree of parallelism that LegUp's sch
\label{fig:comparison_area}
\end{figure}
-Figure~\ref{fig:comparison_area} compares the size of the hardware designs generated by \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that \vericert{} designs use between 1\% and 30\% of the available logic on the FPGA, averaging at around 10\%, whereas LegUp designs all use less than 1\% of the FPGA, averaging at around 0.45\%. The main reason for this is \JW{blah blah}.
-
-\subsection{RQ3: How long does \vericert{} take to produce hardware?}
-
-\begin{figure}
-\begin{tikzpicture}
-\begin{axis}[
- height=80mm,
- width=80mm,
- xlabel={LegUp (s)},
- ylabel={\vericert{} (s)},
- ]
+Figure~\ref{fig:comparison_area} compares the size of the hardware designs generated by \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that \vericert{} designs use between 1\% and 30\% of the available logic on the FPGA, averaging at around 10\%, whereas LegUp designs all use less than 1\% of the FPGA, averaging at around 0.45\%. The main reason for this is that Quartus does not automatically infer RAM blocks from the Verilog that \vericert{} generates, so memories end up being implemented in registers and logic instead. Other synthesis tools can infer the RAM correctly from \vericert{}'s output, so this issue could be solved either by using a different synthesis tool and targeting a different FPGA, or by generating a memory template that Quartus can recognise and map to RAM automatically.
+
+%\subsection{RQ3: How long does \vericert{} take to produce hardware?}
+%
+%\begin{figure}
+%\begin{tikzpicture}
+%\begin{axis}[
+% height=80mm,
+% width=80mm,
+% xlabel={LegUp (s)},
+% ylabel={\vericert{} (s)},
+% ]
-\addplot[draw=none, mark=*, draw opacity=0, fill opacity=0.3]
- table [x=legupcomptime, y=vericertcomptime, col sep=comma]
- {results/comparison.csv};
+%\addplot[draw=none, mark=*, draw opacity=0, fill opacity=0.3]
+% table [x=legupcomptime, y=vericertcomptime, col sep=comma]
+% {results/comparison.csv};
-\end{axis}
-\end{tikzpicture}
-\caption{A comparison of compilation time for \vericert{} and for LegUp \JW{Numbers for \vericert{} not ready yet.} \JW{This graph might end up being cut -- we might just summarise the numbers in the paragraph.}}
-\label{fig:comparison_comptime}
-\end{figure}
-
-Figure~\ref{fig:comparison_comptime} compares the compilation times of \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that \vericert{} compilation is about ten times faster than LegUp compilation. The simple reason for this is that \vericert{} omits many of the complicated and time-consuming optimisations that LegUp performs, such as scheduling multiple operations into clock cycles.
-
-\begin{table}
- \begin{tabular}{lcccccc}
- \toprule
- Bench & Cycles & MHz & LUTs & Reg & BRAMs\\
- \midrule
- adpcm & 30241 &90.05 & 7719 & 12034 & 7\\
- aes & 8489 & 87.83 & 24413 & 23796 & 19 \\
- gsm & 7190 & 119.25 & 6638 & 9201 & 3 \\
- mips & 7754 & 98.95 & 5049 & 4185 & 0 \\
- \bottomrule
- \end{tabular}
- \caption{CHstone programs synthesised in LegUp 5.1}
-\end{table}
+%\end{axis}
+%\end{tikzpicture}
+%\caption{A comparison of compilation time for \vericert{} and for LegUp \JW{Numbers for \vericert{} not ready yet.} \JW{This graph might end up being cut -- we might just summarise the numbers in the paragraph.}}
+%\label{fig:comparison_comptime}
+%\end{figure}
-\begin{table}
- \begin{tabular}{lccccccc}
- \toprule
- Bench & Cycles & MHz & LUTs & Reg & BRAMs & DSPs\\
- \midrule
- adpcm & 121386 & 66.3 & 51626 & 42688 & 0 & 48 &\\
- aes & 41958 & 19.6 & 104017 & 94239 & 0 & 6 \\
- gsm & 21994 & 66.1 & 45764 & 33675 & 0 & 8\\
- mips & 18482 & 78.43 & 10617 & 7690 & 0 & 0\\
- \bottomrule
- \end{tabular}
- \caption{CHstone programs synthesised in \vericert{}}
-\end{table}
+%Figure~\ref{fig:comparison_comptime} compares the compilation times of \vericert{} and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that \vericert{} compilation is about ten times faster than LegUp compilation. The simple reason for this is that \vericert{} omits many of the complicated and time-consuming optimisations that LegUp performs, such as scheduling multiple operations into clock cycles.
%%% Local Variables:
%%% mode: latex
diff --git a/introduction.tex b/introduction.tex
index c597e01..35d5ce7 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -6,7 +6,9 @@
\JW{A few high-level comments: \begin{enumerate} \item Create more tension from the start by making the reader doubt whether existing HLS tools are trustworthy. \item The intro currently draws quite a bit of motivation from Lidbury et al. 2015, but we should also now lean on our FPGA submission too. \item I wonder whether the paragraph `To mitigate the problems...' should be demoted to a `related work' discussion (perhaps as a subsection towards the end of the introduction). It outlines (and nicely dismisses) some existing attempts to tackle the problem, which is certainly useful motivation for your work, especially for readers already familiar with HLS, but I feel that it's not really on the critical path for understanding the paper.\end{enumerate}}
-One current approach to writing energy-efficient and high-throughput applications is to use application-specific hardware, instead of relying on a general-purpose CPU.\@ However, custom hardware designs come at the cost of having to design and produce them, which can be a tedious and error-prone process using hardware description languages (HDL) such as Verilog. Especially with the size of hardware designs growing over the years, it can become difficult to verify that the hardware design behaves in the expected way, as simulation of HDLs can be quite inefficient. Furthermore, the algorithms that are being accelerated in hardware often already have a software implementation, meaning they have to be reimplemented efficiently in a hardware description language which can be time-consuming.
+One current approach to writing energy-efficient and high-throughput applications is to use application-specific hardware, instead of relying on a general-purpose CPU.\@ However, custom hardware designs come at the cost of having to design and produce them, which can be a tedious and error-prone process when using hardware description languages (HDLs) such as Verilog. High-level synthesis (HLS) is becoming a popular solution to these problems.
+
+Especially as hardware designs have grown in size over the years, it has become difficult to verify that a design behaves in the expected way, as simulation of HDLs can be quite inefficient. Furthermore, the algorithms that are being accelerated in hardware often already have a software implementation, meaning they have to be reimplemented efficiently in a hardware description language, which can be time-consuming.
%% Definition and benefits of HLS
One possible alternative to the tedious design process of custom hardware is to use high-level synthesis (HLS), which is the process of generating custom hardware, represented in an HDL, based on a behavioural description, often in a subset of C. This elevates the level of abstraction, because the description of the algorithm in C is inherently untimed, meaning actions don't have to be scheduled into clock cycles manually. The higher level of abstraction makes it easier to reason about the algorithms and therefore also facilitates maintaining them. As a result the time to design the hardware is reduced significantly, especially if a software description of the algorithm already exists, because there is no need to redesign at a lower level and directly in hardware. In addition, using HLS to design the hardware has the benefit of making the functional verification of the design much more efficient and simple than at the HDL stage, since the entire software ecosystem can be mobilised for this
@@ -20,13 +22,6 @@ However, the fact that the behaviour is preserved after HLS cannot be guaranteed
% JW: Another candidate, probably less interesting:
% https://bit.ly/intel-hls-memory-bug
-%% Current work in formal verification of HLS
-%%\NR{This is a good paragraph, but we need to relate it more to this work and why this work is different.}
-%%\NR{Focus on more high-level of "why this work is interesting"? Two key points we want to get across to the reader is that in existing works: validation is neccessary every time a new program is compiled and the verifying algorithm is not verified.}
-%%\NR{Also define words like validation, verifying algorithm (can you use the word ``verifier'',mechanisation)}
-%%\NR{Having said that, keep the text for related work section.}\YH{Added into related works.}
-To mitigate the problems about the unreliability of synthesis tool, it is often required to check the generated hardware for functional correctness. This is commonly done by simulating the design with a large test-bench, however, the guarantees are only as good as the test-bench, meaning if all the inputs were not covered, there is a risk that bugs remain in the untested code. To ensure that the hardware does indeed behave in the same way as the C code, it may therefore be necessary to prove that they are equivalent. Translation validation~\cite{pnueli98_trans} is the main method which is used to prove that the HLS translation was successful, and has been successfully applied to many HLS optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}. However, the main problem is that the validator itself has often not been mechanically proven correct, meaning that the implementation is quite separate from the proof. In addition to that, with large designs it may not be feasible to perform translation validation, as the state space would grow exponentially. \JW{Does this link back to Mentor's Catapult-C, which you mentioned earlier? Does Catapult-C attempt to do translation validation as part of its HLS process? And if so, can you make the point that this effort is largely ineffective because once the design is a reasonable size, the translation validation usually fails anyway?}\YH{TODO: Currently I only have a whitepaper which goes over the translation validation in a very high level, but they do actually mention some flakiness and state that the user would have to manually change the code to fix that. So I think I can actually make that point. I just realised they have a pretty funny diagram of verification $\rightarrow$ differences $\rightarrow$ adjustments $\rightarrow$ ... until it is finally verified.} A mechanically verified HLS tool would remove the need to perform simulation after the synthesis process if one has proven desirable properties about the C code. In addition to that, it would allow for the implementation of optimisation passes which are also proven correct mechanically by translation validation, thereby greatly improving the reliability of these passes.
-
\compcert{}~\cite{leroy06_formal_certif_compil_back_end} is a C compiler that has been written and formally verified in the Coq theorem prover~\cite{bertot04_inter_theor_provin_progr_devel}. Most of the compiler passes in \compcert{} have been proven correct, meaning that once the compiler is built, the proofs can be erased, as the algorithm has been shown to be correct independently of the input. However, some optimisation passes, such as software pipelining, require translation validation~\cite{tristan08_formal_verif_trans_valid}, in which case the correctness of the compiler pass needs to be checked at runtime. Even in this case, though, the verifier itself is proven to accept only code that does indeed behave in the same way.
%% Contributions of paper
@@ -50,6 +45,15 @@ The first section will describe the Verilog semantics that were used and extende
%%\NR{A tool-flow diagram here will be useful. I thought you had one previously?}
\NR{Do you think we have a very simple example of a program where wrong Verilog is generated in VHLS or LegUp, but not in \vericert{}?}\YH{The LegUp example actually with the shift might be a good one.}
+\subsection{Existing works}
+
+%% Current work in formal verification of HLS
+%%\NR{This is a good paragraph, but we need to relate it more to this work and why this work is different.}
+%%\NR{Focus on more high-level of "why this work is interesting"? Two key points we want to get across to the reader is that in existing works: validation is neccessary every time a new program is compiled and the verifying algorithm is not verified.}
+%%\NR{Also define words like validation, verifying algorithm (can you use the word ``verifier'',mechanisation)}
+%%\NR{Having said that, keep the text for related work section.}\YH{Added into related works.}
+To mitigate the problems caused by the unreliability of synthesis tools, it is often necessary to check the generated hardware for functional correctness. This is commonly done by simulating the design with a large test bench; however, the guarantees are only as good as the test bench, meaning that if not all inputs are covered, there is a risk that bugs remain in the untested code. To ensure that the hardware does indeed behave in the same way as the C code, it may therefore be necessary to prove that they are equivalent. Translation validation~\cite{pnueli98_trans} is the main method used to prove that the HLS translation was successful, and has been successfully applied to many HLS optimisations~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth,banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}. However, the main problem is that the validator itself has often not been mechanically proven correct, meaning that the implementation is quite separate from the proof. In addition, with large designs it may not be feasible to perform translation validation, as the state space would grow exponentially. \JW{Does this link back to Mentor's Catapult-C, which you mentioned earlier? Does Catapult-C attempt to do translation validation as part of its HLS process? And if so, can you make the point that this effort is largely ineffective because once the design is a reasonable size, the translation validation usually fails anyway?}\YH{TODO: Currently I only have a whitepaper which goes over the translation validation in a very high level, but they do actually mention some flakiness and state that the user would have to manually change the code to fix that. So I think I can actually make that point. I just realised they have a pretty funny diagram of verification $\rightarrow$ differences $\rightarrow$ adjustments $\rightarrow$ ... until it is finally verified.} A mechanically verified HLS tool would remove the need to perform simulation after the synthesis process, provided that desirable properties have been proven about the C code. In addition, it would allow optimisation passes to be implemented whose correctness is also mechanically proven via translation validation, thereby greatly improving the reliability of these passes.
+
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
diff --git a/main.tex b/main.tex
index fff43ff..a1a8355 100644
--- a/main.tex
+++ b/main.tex
@@ -71,6 +71,7 @@
\newcommand{\vericert}{Veri\-cert}%
\newcommand{\compcert}{Comp\-Cert}%
+\newcommand{\legup}{LegUp}%
\begin{document}
diff --git a/proof.tex b/proof.tex
index 061ea8d..483887b 100644
--- a/proof.tex
+++ b/proof.tex
@@ -93,7 +93,7 @@ Finally, to prove the backward simulation given the forward simulation, it has t
The Verilog semantics that are used are deterministic, as the order of operation of all the constructs is defined.
-\subsection{Coq Mechanisation}
+%\subsection{Coq Mechanisation}
\JW{Would be nice to include a few high-level metrics here. How many person-years of effort was the proof (very roughly)? How many lines of Coq? How many files, how many lemmas? How long does it take for the Coq proof to execute?}
@@ -118,7 +118,7 @@ x \div 2^y =
The \compcert{} semantics for the \texttt{Oshrximm} instruction express its operation exactly as shown in the equation above, even though the computation performed in hardware is different. In \vericert{}, if the same operation were implemented using the Verilog division operator, there would be no guarantee that the synthesis tools converting the Verilog into a circuit would optimise it correctly. To guarantee an output that does not include divides, we therefore have to express it in Verilog using shifts, and then prove that this representation is equivalent to the division used in the \compcert{} semantics. This proof uncovered several bugs in our initial implementation, which rounded towards $-\infty$ instead of towards 0.
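+To illustrate, the shift-based representation can be sketched in C as follows (assuming arithmetic right shifts on signed integers); a plain arithmetic shift on its own rounds towards $-\infty$, which is precisely the bug mentioned above, so a bias must first be added to negative operands:
+\begin{verbatim}
+/* x / 2^y rounded towards zero, for 0 <= y < 31,
+ * using only shifts, a mask and an addition. */
+int shrx(int x, int y) {
+  /* bias is 2^y - 1 if x < 0, and 0 otherwise */
+  int bias = (x >> 31) & ((1 << y) - 1);
+  return (x + bias) >> y;
+}
+\end{verbatim}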
-\subsection{Coq Mechanisation}
+%\subsection{Coq Mechanisation}
%%% Local Variables:
%%% mode: latex
diff --git a/references.bib b/references.bib
index 78a317a..27cc88f 100644
--- a/references.bib
+++ b/references.bib
@@ -357,20 +357,14 @@
series = {PLDI 2019},
}
-@inproceedings{canis+11,
- author = {Andrew Canis and
- Jongsok Choi and
- Mark Aldham and
- Victor Zhang and
- Ahmed Kammoona and
- Jason Helge Anderson and
- Stephen Dean Brown and
- Tomasz S. Czajkowski},
- title = {{LegUp}: high-level synthesis for {FPGA}-based processor/accelerator systems},
- booktitle = {{FPGA}},
- pages = {33--36},
- publisher = {{ACM}},
- year = {2011}
+@inproceedings{canis11_legup,
+ author = {Andrew Canis and Jongsok Choi and Mark Aldham and Victor Zhang and Ahmed
+ Kammoona and Jason Helge Anderson and Stephen Dean Brown and Tomasz S. Czajkowski},
+ title = {{LegUp}: high-level synthesis for {FPGA}-based processor/accelerator systems},
+ booktitle = {{FPGA}},
+ year = 2011,
+ pages = {33--36},
+ publisher = {{ACM}},
}
@inproceedings{choi+18,
@@ -436,3 +430,57 @@
month = {April},
type = {Standard},
}
+
+@misc{xilinx20_vivad_high_synth,
+ author = {Xilinx},
+ title = {Vivado High-level Synthesis},
+ url = {https://bit.ly/39ereMx},
+ urldate = {2020-07-20},
+ year = 2020,
+}
+
+@misc{intel20_sdk_openc_applic,
+ author = {Intel},
+ title = {{SDK} for {OpenCL} Applications},
+ url = {https://intel.ly/30sYHz0},
+ urldate = {2020-07-20},
+ year = 2020,
+}
+
+@inproceedings{nigam20_predic_accel_desig_time_sensit_affin_types,
+ author = {Nigam, Rachit and Atapattu, Sachille and Thomas, Samuel and Li, Zhijing and
+ Bauer, Theodore and Ye, Yuwei and Koti, Apurva and Sampson, Adrian and Zhang,
+ Zhiru},
+ title = {Predictable Accelerator Design with Time-Sensitive Affine Types},
+ booktitle = {Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design
+ and Implementation},
+ year = 2020,
+  pages = {393--407},
+ doi = {10.1145/3385412.3385974},
+ url = {https://doi.org/10.1145/3385412.3385974},
+ address = {New York, NY, USA},
+ isbn = 9781450376136,
+ keywords = {Affine Type Systems, High-Level Synthesis},
+ location = {London, UK},
+ numpages = 15,
+ publisher = {Association for Computing Machinery},
+ series = {PLDI 2020},
+}
+
+@inproceedings{bourgeat20_essen_blues,
+ author = {Bourgeat, Thomas and Pit-Claudel, Cl\'{e}ment and Chlipala, Adam and Arvind},
+ title = {The Essence of Bluespec: A Core Language for Rule-Based Hardware Design},
+ booktitle = {Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design
+ and Implementation},
+ year = 2020,
+  pages = {243--257},
+ doi = {10.1145/3385412.3385965},
+ url = {https://doi.org/10.1145/3385412.3385965},
+ address = {New York, NY, USA},
+ isbn = 9781450376136,
+ keywords = {Hardware Description Language, Compiler Correctness, Semantics},
+ location = {London, UK},
+ numpages = 15,
+ publisher = {Association for Computing Machinery},
+ series = {PLDI 2020},
+}
diff --git a/related.tex b/related.tex
index 6457913..5cdee3a 100644
--- a/related.tex
+++ b/related.tex
@@ -1,59 +1,59 @@
\section{Related Work}
\begin{figure}
-\centering
-\begin{tikzpicture}
-\def\opacity{0.2}
-\definecolor{colorusabletool}{HTML}{1b9e77}
-\definecolor{colorhighlevel}{HTML}{d95f02}
-\definecolor{colorproof}{HTML}{7570b3}
-\definecolor{colormechanised}{HTML}{e7298a}
+ \centering
+ \begin{tikzpicture}
+ \def\opacity{0.2}
+ \definecolor{colorusabletool}{HTML}{1b9e77}
+ \definecolor{colorhighlevel}{HTML}{d95f02}
+ \definecolor{colorproof}{HTML}{7570b3}
+ \definecolor{colormechanised}{HTML}{e7298a}
-\tikzset{myellipse/.style={draw=none, fill opacity=0.2}}
-\tikzset{myoutline/.style={draw=white, thick}}
+ \tikzset{myellipse/.style={draw=none, fill opacity=0.2}}
+ \tikzset{myoutline/.style={draw=white, thick}}
-\draw[myellipse, fill=colorusabletool] (-1.1,1.8) ellipse (2.9 and 2.1);
-\draw[myellipse, fill=colorhighlevel] (1.1,1.8) ellipse (2.9 and 2.1);
-\draw[myellipse, fill=colorproof] (0,0.3) ellipse (3.7 and 1.9);
-\draw[myellipse, fill=colormechanised] (0,0) ellipse (2.7 and 1.0);
+ \draw[myellipse, fill=colorusabletool] (-1.1,1.8) ellipse (2.9 and 2.1);
+ \draw[myellipse, fill=colorhighlevel] (1.1,1.8) ellipse (2.9 and 2.1);
+ \draw[myellipse, fill=colorproof] (0,0.3) ellipse (3.7 and 1.9);
+ \draw[myellipse, fill=colormechanised] (0,0) ellipse (2.7 and 1.0);
-\draw[myoutline] (-1.1,1.8) ellipse (2.9 and 2.1);
-\draw[myoutline] (1.1,1.8) ellipse (2.9 and 2.1);
-\draw[myoutline] (0,0.3) ellipse (3.7 and 1.9);
-\draw[myoutline] (0,0) ellipse (2.7 and 1.0);
+ \draw[myoutline] (-1.1,1.8) ellipse (2.9 and 2.1);
+ \draw[myoutline] (1.1,1.8) ellipse (2.9 and 2.1);
+ \draw[myoutline] (0,0.3) ellipse (3.7 and 1.9);
+ \draw[myoutline] (0,0) ellipse (2.7 and 1.0);
-\node[align=center] at (0,2.8) {Standard HLS \\ tools~\cite{canis+11, vivadohls, intelopencl, dahlia}};
-\node[align=center] at (0,1.5) {Translation validation \\ approaches~\cite{mentor20_catap_high_level_synth, kundu08_valid_high_level_synth, clarke_kroening_yorav_03}};
-\node at (0,0.5) {\bf \vericert{}};
-\node[align=left] at (-1.5,0.4) {Koika~\cite{koika_pldi2020}};
-\node[align=left] at (-1.5,0.0) {L\"o\"ow et al.~\cite{loow}};
+ \node[align=center] at (0,2.8) {Standard HLS \\ tools~\cite{canis11_legup,xilinx20_vivad_high_synth,intel20_sdk_openc_applic,nigam20_predic_accel_desig_time_sensit_affin_types}};
+ \node[align=center] at (0,1.5) {Translation validation \\ approaches~\cite{mentor20_catap_high_level_synth,kundu08_valid_high_level_synth, clarke_kroening_yorav_03}};
+ \node at (0,0.5) {\bf \vericert{}};
+ \node[align=left] at (-1.5,0.4) {Koika~\cite{bourgeat20_essen_blues}};
+ \node[align=left] at (-1.5,0.0) {L\"o\"ow et al.~\cite{loow19_verif_compil_verif_proces}};
-\node at (1.8,0.2) {Ellis~\cite{ellis08}};
-\node at (0,-0.6) {Perna et al.~\cite{perna12_mechan_wire_wise_verif_handel_c_synth}};
-\node at (0,-1.3) {BEDROC~\cite{chapman92_verif_bedroc}};
+ \node at (1.8,0.2) {Ellis~\cite{ellis08}};
+ \node at (0,-0.6) {Perna et al.~\cite{perna12_mechan_wire_wise_verif_handel_c_synth}};
+ \node at (0,-1.3) {BEDROC~\cite{chapman92_verif_bedroc}};
-\node[align=left] at (-2.9,-1.7) {\color{colorproof}Correctness \\ \color{colorproof}proof};
-\draw[myoutline] (1.2,-0.9) to (2.6,-1.7);
-\node[align=right] at (2.6,-1.7) {\color{colormechanised}Mechanised \\ \color{colormechanised}correctness proof};
-\node at (-2.8,4.1) {\color{colorusabletool}\strut Usable tool};
-\node at (2.1,4.1) {\color{colorhighlevel}\strut High-level software input};
+ \node[align=left] at (-2.9,-1.7) {\color{colorproof}Correctness \\ \color{colorproof}proof};
+ \draw[myoutline] (1.2,-0.9) to (2.6,-1.7);
+ \node[align=right] at (2.6,-1.7) {\color{colormechanised}Mechanised \\ \color{colormechanised}correctness proof};
+ \node at (-2.8,4.1) {\color{colorusabletool}\strut Usable tool};
+ \node at (2.1,4.1) {\color{colorhighlevel}\strut High-level software input};
-
-\end{tikzpicture}
-\caption{Summary of related work}
-\label{fig:related_venn}
+ \end{tikzpicture}
+ \caption{Summary of related work}\label{fig:related_venn}
\end{figure}
+A summary of the related work is presented in the Venn diagram in Figure~\ref{fig:related_venn}. The diagram classifies each work along four axes: whether the tool is usable and available, whether it takes a high-level software language as input, whether it has a correctness proof, and whether that proof is mechanised. The goal of \vericert{} is to cover all four categories.
+
Work is being done to prove the equivalence between the generated hardware and the original behavioural description in C. An example of a tool that implements this is Mentor's Catapult~\cite{mentor20_catap_high_level_synth}, which tries to match the states in the three-address code (3AC) description to states in the original C code after an unverified translation. This technique is called translation validation~\cite{pnueli98_trans}, whereby each translation that the HLS tool performs is proven correct after the fact, by showing that the input and output of that translation behave in the same way for all possible inputs. Translation validation is quite effective for proving complex optimisations such as scheduling~\cite{kim04_autom_fsmd,karfa06_formal_verif_method_sched_high_synth,chouksey20_verif_sched_condit_behav_high_level_synth} or code motion~\cite{banerjee14_verif_code_motion_techn_using_value_propag,chouksey19_trans_valid_code_motion_trans_invol_loops}; however, the validation has to be re-run every time the high-level synthesis is performed. In addition, the proofs are often not mechanised or directly related to the actual implementation, meaning the validator itself might be wrong and could give false positives or false negatives.
\JW{Some papers to cite somewhere:
-\begin{itemize}
+ \begin{itemize}
\item \citet{kundu08_valid_high_level_synth} have done translation validation on an HLS tool called SPARK. They don't have \emph{very} strong evidence that HLS tools are buggy; they just say ``HLS tools are large and complex software systems, often with hundreds of
-thousands of lines of code, and as with any software of this scale, they are prone
-to logical and implementation errors''. They did, however, find two bugs in SPARK as a result of their efforts, so that provides a small bit of evidence that HLS tools are buggy. \@
+ thousands of lines of code, and as with any software of this scale, they are prone
+ to logical and implementation errors''. They did, however, find two bugs in SPARK as a result of their efforts, so that provides a small bit of evidence that HLS tools are buggy. \@
\item \citet{chapman92_verif_bedroc} have proved (manually, as far as JW can tell) that the front-end and scheduler of the BEDROC synthesis system is correct. (NB:\@ Chapman also wrote a PhD thesis about this (1994) but it doesn't appear to be online.)
\item \citet{ellis08} wrote a PhD thesis that was supervised by Cliff Jones whom you met at the VeTSS workshop thing last year. At a high level, it's about verifying a high-level synthesis tool using Isabelle, so very relevant to this project, though scanning through the pdf, it's quite hard to ascertain exactly what the contributions are. Strange that they never published a paper about the work -- it makes it very hard to find!
-\end{itemize}
+ \end{itemize}
}
\YH{Amazing, thank you, yes it seems like Kundu \textit{et al.} have quite a few papers on formal verification of HLS algorithms using translation validation, as well as~\citet{karfa06_formal_verif_method_sched_high_synth}, all using the SPARK~\cite{gupta03_spark} synthesis tool. And yes thank you, will definitely cite that paper. There seem to be similar early proofs of high-level synthesis like \citet{hwang91_formal_approac_to_sched_probl} or \citet{grass94_high}. There are also the Occam papers that I need to mention too, like \citet{page91_compil_occam}, \citet{jifeng93_towar}, \citet{perna11_correc_hardw_synth} and \citet{perna12_mechan_wire_wise_verif_handel_c_synth}.}