3 files changed, 5 insertions, 3 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 130d403..2aff9bb 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -88,6 +88,7 @@ It has an unlimited number of pseudo-registers, and is represented as a control
 This section will introduce Verilog for readers who may not be familiar with the language, concentrating on the features that are used in the output of \vericert{}.  Verilog is a hardware description language (HDL) and is used to design hardware ranging from complete CPUs that are eventually produced as an integrated circuit, to small application-specific accelerators that are placed on an FPGA.  Verilog is a popular language because it allows for fine-grained control over the hardware, and also provides high-level constructs to simplify the development.
 
 Verilog behaves quite differently to standard software programming languages due to it having to express the parallel nature of hardware.  The basic construct to achieve this is the always-block, which is a collection of assignments that are executed every time some event occurs.  In the case of \vericert{}, this event is either a positive (rising) or a negative (falling) clock edge.  All always-blocks triggering on the same event are executed in parallel. Always-blocks can also express control-flow using if-statements and case-statements.
+\NR{It might be useful to mention that registers must be updated only within an always-block.} \JW{That's important for Verilog programming in general, but is it necessary for understanding this paper?}
 
 \begin{figure}
   \centering
@@ -136,6 +137,7 @@ endmodule
   \label{fig:tutorial:state_machine}
 \end{figure}
 
+
 A simple state machine can be implemented as shown in Figure~\ref{fig:tutorial:state_machine}.
 At every positive edge of the clock (\texttt{clk}), both of the always-blocks will trigger simultaneously.  The first always-block controls the values in the register \texttt{x} and the output \texttt{z}, while the second always-block controls the next state the state machine should go to.  When the \texttt{state} is 0, \texttt{x} will be assigned to the input \texttt{y} using nonblocking assignment, denoted by \texttt{<=}.  Nonblocking assignment assigns registers in parallel at the end of the clock cycle, rather than sequentially throughout the always-block. In the second always-block, the input \texttt{y} will be checked, and if it's high it will move on to the next state, otherwise it will stay in the current state.  When \texttt{state} is 1, the first always-block will reset the value of \texttt{x} and then set \texttt{z} to the original value of \texttt{x}, since nonblocking assignment does not change its value until the end of the clock cycle.  Finally, the last always-block will set the state to be 0 again.
 
diff --git a/evaluation.tex b/evaluation.tex
index 7b85eff..0077246 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -177,8 +177,8 @@ By looking at the median, when division/modulo operations are enabled, we see th
 
 \subsection{RQ4: How effective is the correctness theorem in \vericert{}?}
 
-\definecolor{fuzzred}{HTML}{fb9a99}
-\definecolor{fuzzyellow}{HTML}{fed9a6}
+\definecolor{fuzzred}{HTML}{f8514f}
+\definecolor{fuzzyellow}{HTML}{fee4bf}
 \definecolor{fuzzgreen}{HTML}{b2df8a}
 \begin{figure}
   \centering
diff --git a/introduction.tex b/introduction.tex
index 4573f63..a8a8185 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -48,7 +48,7 @@ The contributions of this paper are as follows:
   \item In Section~\ref{sec:proof}, we describe how we proved the correctness theorem. The proof follows standard \compcert{} techniques -- forward simulations, intermediate specifications, and determinism results -- but we encountered several challenges peculiar to our hardware-oriented setting. %\NR{`specific' is better than `peculiar'?} %JW: I think this is a nice use of peculiar. Note that it means `distinctive' here, not `weird' -- the third meaning at https://www.dictionary.com/browse/peculiar 
   These include handling discrepancies between the byte-addressed memory assumed by the input software and the word-addressed memory that we implement in the output hardware,  different handling of unsigned comparisons between C and Verilog, and carefully implementing memory reads and writes so that these behave properly as a RAM in hardware.
   %\JW{Not sure `rearranging' is quite the right word. Sounds like you're rearranging independent reads/writes w.r.t. each other. Maybe change `correctly rearranging' to `carefully implementing'?}
-  \item In Section~\ref{sec:evaluation}, we evaluate \vericert{} on the \polybench{} benchmark suite~\cite{polybench}, and compare the performance of our generated hardware against an existing, unverified HLS tool called \legup{}~\cite{canis11_legup}. We show that \vericert{} generates hardware that is \slowdownOrig$\times$ slower (\slowdownDiv$\times$ slower in the absence of division) and \areaIncr$\times$ larger than that generated by \legup{}. This performance gap can be largely attributed to \vericert{}'s current lack of support for instruction-level parallelism and the absence of an efficient, pipelined division operator. We intend to close this gap in the future by introducing (and verifying) HLS optimisations of our own, such as scheduling and memory analysis. %\NR{Question rather than comment: Will there be verification issues to add support for hard IPs like division blocks?}\YH{Not really any issues, just many different levels of reliability.  You don't have to prove IP correct, but theoretically could.}
+  \item In Section~\ref{sec:evaluation}, we evaluate \vericert{} on the \polybench{} benchmark suite~\cite{polybench}, and compare the performance of our generated hardware against an existing, unverified HLS tool called \legup{}~\cite{canis11_legup}. We show that \vericert{} generates hardware that is \slowdownOrig$\times$ slower (\slowdownDiv$\times$ slower in the absence of division) and \areaIncr$\times$ larger than that generated by \legup{}. This performance gap can be largely attributed to \vericert{}'s current lack of support for instruction-level parallelism and the absence of an efficient, pipelined division operator. We intend to close this gap in the future by introducing (and verifying) HLS optimisations of our own, such as scheduling and memory analysis. \JW{This section also reports on our campaign to fuzz-test \vericert{} using over a hundred thousand random C programs generated by Csmith~\cite{yang11_findin_under_bugs_c_compil} in order to confirm that its correctness theorem is watertight.} %\NR{Question rather than comment: Will there be verification issues to add support for hard IPs like division blocks?}\YH{Not really any issues, just many different levels of reliability.  You don't have to prove IP correct, but theoretically could.}
 \end{itemize}
 %\JW{This sentence seems pretty good to me; is it up-to-date with the latest `challenges' you've faced?}
 \vericert{} is fully open source and available online.