summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
-rw-r--r--algorithm.tex4
-rw-r--r--evaluation.tex42
-rw-r--r--introduction.tex6
-rw-r--r--results/comparison.csv4
-rw-r--r--verilog.tex12
5 files changed, 34 insertions, 34 deletions
diff --git a/algorithm.tex b/algorithm.tex
index b9012ce..9db0262 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -17,7 +17,7 @@
\node[language] at (4,-2) (dfgstmd) {HTL};
\node[language] at (7,-2) (verilog) {Verilog};
\node at (0,1) {CompCert};
- \node at (0,-2) {CoqUp};
+ \node at (0,-2) {Vericert};
\draw[->] (clight) -- (cminor);
\draw[->,dashed] (cminor) -- (rtl);
\draw[->] (rtl) -- (ltl);
@@ -33,7 +33,7 @@ This section covers the main architecture of the HLS tool, and the way in which
CompCert is made up of 11 intermediate languages in between the Clight input and the assembly output. These intermediate languages each serve a different purpose and contain various different optimisations. When designing a new backend for CompCert, it is therefore crucial to know where to branch off and start the hardware generation. Many of the optimisations that the CompCert backend performs are not necessary when generating custom hardware and not relying on a CPU anymore, such as register allocation or even scheduling. It is therefore important to find the right intermediate language so that the HLS tool still benefits from many of the generic optimisations that CompCert performs, but does not receive the code transformations that are specific to CPU architectures.
-Existing HLS compilers usually use LLVM IR as an intermediate representation when performing HLS specific optimisations, as each instruction can be mapped quite well to hardware which performs the same behaviour. CompCert's register transfer language (RTL) is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware. \JP{Perhaps this needs some further qualification? RTL uses the operations from the target architecture, and indeed performs architecture specific optimisations prior to RTL gen, so (for sake of example) switching from x86 RTL to RISC-V RTL could have a significant impact on performance.}\YH{Yes will definitely include those, just have to think about where. I think this could be something that could be mentioned in the RTL section instead.} In addition to that, many optimisations that are also useful for HLS are performed in RTL, which means that if it is supported as the input language, the HLS algorithm benefits from the same optimisations. It is therefore a good candidate to be chosen as the input language to the HLS backend. The complete flow that CoqUp takes is show in figure~\ref{fig:rtlbranch}.
+Existing HLS compilers usually use LLVM IR as an intermediate representation when performing HLS specific optimisations, as each instruction can be mapped quite well to hardware which performs the same behaviour. CompCert's register transfer language (RTL) is the intermediate language that resembles LLVM IR the most, as it also has an infinite number of pseudo-registers and each instruction maps well to hardware. \JP{Perhaps this needs some further qualification? RTL uses the operations from the target architecture, and indeed performs architecture specific optimisations prior to RTL gen, so (for sake of example) switching from x86 RTL to RISC-V RTL could have a significant impact on performance.}\YH{Yes will definitely include those, just have to think about where. I think this could be something that could be mentioned in the RTL section instead.} In addition to that, many optimisations that are also useful for HLS are performed in RTL, which means that if it is supported as the input language, the HLS algorithm benefits from the same optimisations. It is therefore a good candidate to be chosen as the input language to the HLS backend. The complete flow that Vericert takes is show in figure~\ref{fig:rtlbranch}.
%%TODO: Maybe add why LTL and the other smaller languages are not that well suited
diff --git a/evaluation.tex b/evaluation.tex
index b2b4f6e..8ab8999 100644
--- a/evaluation.tex
+++ b/evaluation.tex
@@ -2,20 +2,20 @@
Our evaluation is designed to answer the following three research questions.
\begin{description}
-\item[RQ1] How fast is the hardware generated by CoqUp, and how does this compare to existing HLS tools?
-\item[RQ2] How area-efficient is the hardware generated by CoqUp, and how does this compare to existing HLS tools?
-\item[RQ3] How long does CoqUp take to produce hardware?
+\item[RQ1] How fast is the hardware generated by Vericert, and how does this compare to existing HLS tools?
+\item[RQ2] How area-efficient is the hardware generated by Vericert, and how does this compare to existing HLS tools?
+\item[RQ3] How long does Vericert take to produce hardware?
\end{description}
\subsection{Experimental Setup}
-\paragraph{Choice of HLS tool for comparison.} We compare CoqUp against LegUp 4.0 because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis+11}.
+\paragraph{Choice of HLS tool for comparison.} We compare Vericert against LegUp 4.0 because it is open-source and hence easily accessible, but still produces hardware ``of comparable quality to a commercial high-level synthesis tool''~\cite{canis+11}.
-\paragraph{Choice of benchmarks.} We evaluate CoqUp using the PolyBench/C benchmark suite\footnote{\url{http://web.cs.ucla.edu/~pouchet/software/polybench/}}. PolyBench/C is a modern benchmark suite that has been previously used to evaluate HLS tools~\cite{choi+18}. For completeness, we use the full set of 24 benchmarks. We set the benchmark parameters so that all datatypes are integers (since CoqUp only supports integers) and all datasets are `small' (to fit into the small on-chip memories). A current limitation of CoqUp, as discussed in Section~\ref{?}, is that it does not support addition and subtraction operations involving integer literals not divisible by 4. To work around this, we lightly modified each benchmark program so that literals other than multiples of 4 are stored into variables before being added or subtracted. \JW{Any other notable changes to the benchmarks?}
+\paragraph{Choice of benchmarks.} We evaluate Vericert using the PolyBench/C benchmark suite\footnote{\url{http://web.cs.ucla.edu/~pouchet/software/polybench/}}. PolyBench/C is a modern benchmark suite that has been previously used to evaluate HLS tools~\cite{choi+18}. For completeness, we use the full set of 24 benchmarks. We set the benchmark parameters so that all datatypes are integers (since Vericert only supports integers) and all datasets are `small' (to fit into the small on-chip memories). A current limitation of Vericert, as discussed in Section~\ref{?}, is that it does not support addition and subtraction operations involving integer literals not divisible by 4. To work around this, we lightly modified each benchmark program so that literals other than multiples of 4 are stored into variables before being added or subtracted. \JW{Any other notable changes to the benchmarks?}
\paragraph{Experimental setup.} In order to generate a hardware implementation, the Verilog produced by the HLS tool-under-test must be synthesised to a netlist; the resultant netlist can then be placed-and-routed for a particular FPGA device. We use Intel Quartus~\cite{quartus} for both of these tasks, and we target an Arria 10 \ref{?} FPGA.
-\subsection{RQ1: How fast is CoqUp-generated hardware?}
+\subsection{RQ1: How fast is Vericert-generated hardware?}
\begin{figure}
\begin{tikzpicture}
@@ -25,7 +25,7 @@ Our evaluation is designed to answer the following three research questions.
height=80mm,
width=80mm,
xlabel={LegUp (ms)},
- ylabel={CoqUp (ms)},
+ ylabel={Vericert (ms)},
xmin=0.01,
xmax=10,
ymax=500,
@@ -34,22 +34,22 @@ Our evaluation is designed to answer the following three research questions.
]
\addplot[draw=none, mark=*, draw opacity=0, fill opacity=0.3]
- table [x=legupwallclockMS, y=coqupwallclockMS, col sep=comma]
+ table [x=legupwallclockMS, y=vericertwallclockMS, col sep=comma]
{results/comparison.csv};
\addplot[dashed, domain=0.01:10]{50*x};
\end{axis}
\end{tikzpicture}
-\caption{A comparison of the execution time of hardware designs generated by CoqUp and by LegUp. The diagonal is at $y=50x$.}
+\caption{A comparison of the execution time of hardware designs generated by Vericert and by LegUp. The diagonal is at $y=50x$.}
\label{fig:comparison_time}
\end{figure}
-Figure~\ref{fig:comparison_time} compares the execution time of hardware designs generated by CoqUp and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. It uses logarithmic scales so that the data points are spread out more evenly. The execution times are quite well correlated between the two tools, but LegUp generates hardware designs that execute roughly fifty times faster, as illustrated by the diagonal line at $y=50x$. There are two reasons for this. First, because CoqUp-generated hardware has all of its operations serialised, it requires more clock cycles to perform its computation -- about 4$\times$ as many as LegUp-generated hardware. Second, CoqUp generates hardware with a much lower clock frequency than LegUp (roughly 22 MHz compared to roughly 250 MHz) so each of those clock cycles takes about 11$\times$ longer. The main reason for the low clock frequency is \JW{could somebody else finish this bit off?}.
+Figure~\ref{fig:comparison_time} compares the execution time of hardware designs generated by Vericert and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. It uses logarithmic scales so that the data points are spread out more evenly. The execution times are quite well correlated between the two tools, but LegUp generates hardware designs that execute roughly fifty times faster, as illustrated by the diagonal line at $y=50x$. There are two reasons for this. First, because Vericert-generated hardware has all of its operations serialised, it requires more clock cycles to perform its computation -- about 4$\times$ as many as LegUp-generated hardware. Second, Vericert generates hardware with a much lower clock frequency than LegUp (roughly 22 MHz compared to roughly 250 MHz) so each of those clock cycles takes about 11$\times$ longer. The main reason for the low clock frequency is \JW{could somebody else finish this bit off?}.
The difference in cycle counts shows the degree of parallelism that LegUp's scheduling and memory system can offer. However, their Verilog generation is not guaranteed to be correct. Although the runtime LegUp outputs are tested to be correct for these programs, this does not provide any confidence on the correctness of Verilog generation of any other programs. Our Coq proof mechanisation guarantees that generated Verilog is correct for any input program that uses the subset of CompCert instructions that we have proven to be correct.
-\subsection{RQ2: How area-efficient is CoqUp-generated hardware?}
+\subsection{RQ2: How area-efficient is Vericert-generated hardware?}
\begin{figure}
\begin{tikzpicture}
@@ -57,23 +57,23 @@ The difference in cycle counts shows the degree of parallelism that LegUp's sch
height=80mm,
width=80mm,
xlabel={LegUp (\%)},
- ylabel={CoqUp (\%)},
+ ylabel={Vericert (\%)},
xmin=0, ymin=0,
]
\addplot[draw=none, mark=*, draw opacity=0, fill opacity=0.3]
- table [x=leguputilisation, y=coquputilisation, col sep=comma]
+ table [x=leguputilisation, y=vericertutilisation, col sep=comma]
{results/comparison.csv};
\end{axis}
\end{tikzpicture}
-\caption{A comparison of the resource utilisation of designs generated by CoqUp and by LegUp}
+\caption{A comparison of the resource utilisation of designs generated by Vericert and by LegUp}
\label{fig:comparison_area}
\end{figure}
-Figure~\ref{fig:comparison_area} compares the size of the hardware designs generated by CoqUp and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that CoqUp designs use between 1\% and 30\% of the available logic on the FPGA, averaging at around 10\%, whereas LegUp designs all use less than 1\% of the FPGA, averaging at around 0.45\%. The main reason for this is \JW{blah blah}.
+Figure~\ref{fig:comparison_area} compares the size of the hardware designs generated by Vericert and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that Vericert designs use between 1\% and 30\% of the available logic on the FPGA, averaging at around 10\%, whereas LegUp designs all use less than 1\% of the FPGA, averaging at around 0.45\%. The main reason for this is \JW{blah blah}.
-\subsection{RQ3: How long does CoqUp take to produce hardware?}
+\subsection{RQ3: How long does Vericert take to produce hardware?}
\begin{figure}
\begin{tikzpicture}
@@ -81,20 +81,20 @@ Figure~\ref{fig:comparison_area} compares the size of the hardware designs gener
height=80mm,
width=80mm,
xlabel={LegUp (s)},
- ylabel={CoqUp (s)},
+ ylabel={Vericert (s)},
]
\addplot[draw=none, mark=*, draw opacity=0, fill opacity=0.3]
- table [x=legupcomptime, y=coqupcomptime, col sep=comma]
+ table [x=legupcomptime, y=vericertcomptime, col sep=comma]
{results/comparison.csv};
\end{axis}
\end{tikzpicture}
-\caption{A comparison of compilation time for CoqUp and for LegUp \JW{Numbers for CoqUp not ready yet.} \JW{This graph might end up being cut -- we might just summarise the numbers in the paragraph.}}
+\caption{A comparison of compilation time for Vericert and for LegUp \JW{Numbers for Vericert not ready yet.} \JW{This graph might end up being cut -- we might just summarise the numbers in the paragraph.}}
\label{fig:comparison_comptime}
\end{figure}
-Figure~\ref{fig:comparison_comptime} compares the compilation times of CoqUp and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that CoqUp compilation is about ten times faster than LegUp compilation. The simple reason for this is that CoqUp omits many of the complicated and time-consuming optimisations that LegUp performs, such as scheduling multiple operations into clock cycles.
+Figure~\ref{fig:comparison_comptime} compares the compilation times of Vericert and by LegUp, with each data point corresponding to one of the PolyBench/C benchmarks. We see that Vericert compilation is about ten times faster than LegUp compilation. The simple reason for this is that Vericert omits many of the complicated and time-consuming optimisations that LegUp performs, such as scheduling multiple operations into clock cycles.
\begin{table}
\begin{tabular}{lcccccc}
@@ -121,7 +121,7 @@ Figure~\ref{fig:comparison_comptime} compares the compilation times of CoqUp and
mips & 18482 & 78.43 & 10617 & 7690 & 0 & 0\\
\bottomrule
\end{tabular}
- \caption{CHstone programs synthesised in CoqUp}
+ \caption{CHstone programs synthesised in Vericert}
\end{table}
%%% Local Variables:
diff --git a/introduction.tex b/introduction.tex
index 290d6a4..bcaf52f 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -29,7 +29,7 @@ CompCert~\cite{leroy06_formal_certif_compil_back_end} is a C compiler that has b
%% Contributions of paper
-In this paper we describe a fully verified HLS tool called CoqUp, which adds a Verilog back end to CompCert and proves that the behaviour of the C code is preserved with respect to an existing Verilog semantics. The main contributions of the paper are the following:
+In this paper we describe a fully verified HLS tool called Vericert, which adds a Verilog back end to CompCert and proves that the behaviour of the C code is preserved with respect to an existing Verilog semantics. The main contributions of the paper are the following:
\begin{itemize}
\item First mechanised and formally verified HLS flow from C to Verilog.
@@ -41,12 +41,12 @@ In this paper we describe a fully verified HLS tool called CoqUp, which adds a V
The first section will describe the Verilog semantics that were used and extended to fit into CompCert's model. The second section will then describe the HLS algorithm, together with its proof.
-CoqUp is open source and is hosted on GitHub\footnote{\url{https://github.com/ymherklotz/coqup}}.
+Vericert is open source and is hosted on GitHub\footnote{\url{https://github.com/ymherklotz/vericert}}.
\NR{Other comments:}
\NR{Is both the translator and verifier written in Coq?}\YH{Currently there is no verifier, the algorithms themselves are proven directly in Coq.}
%%\NR{A tool-flow diagram here will be useful. I thought you had one previously?}
-\NR{Do you think we have a very simple example of a program where wrong Verilog is generated in VHLS or LegUp, but not in CoqUp?}\YH{The LegUp example actually with the shift might be a good one.}
+\NR{Do you think we have a very simple example of a program where wrong Verilog is generated in VHLS or LegUp, but not in Vericert?}\YH{The LegUp example actually with the shift might be a good one.}
%%% Local Variables:
%%% mode: latex
diff --git a/results/comparison.csv b/results/comparison.csv
index e858299..bb990a3 100644
--- a/results/comparison.csv
+++ b/results/comparison.csv
@@ -1,4 +1,4 @@
-benchmark,coqupcycles,coqupfreqMHz,coquplogicutilisation,coqupregs,coquprams,coqupdsps,coqupwallclockMS,coquputilisation,coqupcomptime,legupcycles,legupfreqMHz,leguplogicutilisation,legupregs,leguprams,legupdsps,legupwallclockMS,leguputilisation,legupcomptime
+benchmark,vericertcycles,vericertfreqMHz,vericertlogicutilisation,vericertregs,vericertrams,vericertdsps,vericertwallclockMS,vericertutilisation,vericertcomptime,legupcycles,legupfreqMHz,leguplogicutilisation,legupregs,leguprams,legupdsps,legupwallclockMS,leguputilisation,legupcomptime
durbin,21581,18.23,2941,4750,0,8,1.18,0.69,0.20,15106,188.61,2509,4008,0,8,0.08,0.59,4.77
lu,2685403,19.21,54188,104005,0,6,139.79,12.68,0.20,482766,244.62,3116,4593,0,10,1.97,0.73,4.72
ludcmp,2460807,18.78,58022,108157,0,10,131.03,13.58,0.20,470843,249.69,3605,5397,0,15,1.89,0.84,4.87
@@ -23,4 +23,4 @@ jacobi-1d,17481,21.07,2478,2989,0,0,0.83,0.58,0.20,6914,386.25,1355,1885,0,0,0.0
jacobi-2d,314377,20.75,31101,59410,0,4,15.15,7.28,0.20,84609,240.79,2347,3185,0,2,0.35,0.55,4.81
seidel-2d,609259,18.8,27638,52840,0,2,32.41,6.47,0.20,345294,232.4,2128,3337,0,2,1.49,0.50,4.68
floyd-warshall,4556081,17.2,60713,116401,0,2,264.89,14.21,0.20,1238764,235.52,1869,2367,0,2,5.26,0.44,4.71
-nussinov,881801,87.67,60649,119210,0,0,10.06,14.20,0.20,216467,273.07,1078,1431,0,2,0.79,0.25,4.79 \ No newline at end of file
+nussinov,881801,87.67,60649,119210,0,0,10.06,14.20,0.20,216467,273.07,1078,1431,0,2,0.79,0.25,4.79
diff --git a/verilog.tex b/verilog.tex
index b4b2d08..ffec632 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -1,10 +1,10 @@
\section{Verilog}
-Verilog is a hardware description language commonly used to design hardware. A Verilog design can then be synthesised into logic gates which describes how different gates connect to each other, called a netlist. This representation can then be mapped onto either a field-programmable gate array (FPGA) or turned into an application-specific integrated circuit (ASIC) to implement the design that was described in Verilog. The Verilog standard is quite large though, and not all Verilog features are needed to be able to describe hardware. Many Verilog features are only useful for simulation and do not affect the actual hardware itself, we can therefore restrict the Verilog semantics to the synthesisable subset of Verilog~\cite{05_veril_regis_trans_level_synth}. In addition to that, as the HLS algorithm dictates which Verilog constructs are generated, meaning the Verilog subset that has to be modelled by the semantics can be reduced even further to only support the constructs that are needed.~\NR{Didn't get this sentence? Do you mean that the HLS algo further restricts the synthesisable subset?}\YH{Yes basically, because we get to choose what we generate. For example, we don't have to support combinational always blocks.} Only supporting a smaller subset in the semantics also means that there is less chance that the standard is misunderstood, and that the semantics actually model how the Verilog is simulated.
-\NR{What is the distinction here between the semantics and simulation? Discuss.}
+Verilog is a hardware description language commonly used to design hardware. A Verilog design can then be synthesised into logic gates which describes how different gates connect to each other, called a netlist. This representation can then be mapped onto either a field-programmable gate array (FPGA) or turned into an application-specific integrated circuit (ASIC) to implement the design that was described in Verilog. The Verilog standard is quite large though, and not all Verilog features are needed to be able to describe hardware. Many Verilog features are only useful for simulation and do not affect the actual hardware itself, we can therefore restrict the Verilog semantics to the synthesisable subset of Verilog~\cite{05_veril_regis_trans_level_synth}. In addition to that, HLS dictates which Verilog constructs are generated, meaning the Verilog subset that has to be modelled by the semantics can be reduced even further to only support the constructs that are needed.~\NR{Didn't get this sentence? Do you mean that the HLS algo further restricts the synthesisable subset?}\YH{Yes basically, because we get to choose what we generate. For example, we don't have to support combinational always blocks.} Supporting a smaller subset in the semantics also means that there is less chance that the standard is misunderstood, and that the semantics actually model how the Verilog is simulated and synthesised by existing tools and how it is dictated by the standard.
+\NR{What is the distinction here between the semantics and simulation? Discuss.}\YH{Tried to clarify that I meant simulation and synthesis by tools and how it should be understood from the standard}
-The Verilog semantics are based on the semantics proposed by \citet{loow19_verif_compil_verif_proces}, which were used to create a formal translation from HOL logic into a Verilog circuit. These semantics are quite practical as they restrict themselves to a small subset of Verilog, which can nonetheless be used to model all hardware constructs one would want to design. An abstraction of the Verilog syntax that is generated is shown below:
-\NR{What is HOL? Was it discussed earlier?}
+The Verilog semantics are based on the semantics proposed by \citet{loow19_verif_compil_verif_proces}, which was used to create a formal translation from a logic representation encoded in the HOL4~\cite{slind08_brief_overv_hol4} theorem prover into an equivalent Verilog circuit. These semantics are quite practical as they restrict themselves to a small subset of Verilog, which can nonetheless be used to model all hardware constructs one would want to design. An abstraction of the Verilog syntax that is generated is shown below:
+\NR{What is HOL? Was it discussed earlier?}\YH{Another theorem prover, have clarified hopefully.}
\begin{align*}
v\quad ::=&\; \mathit{sz} \yhkeyword{'d} n\\
@@ -21,11 +21,11 @@ The Verilog semantics are based on the semantics proposed by \citet{loow19_verif
m \text{ list}\quad ::=&\; \{ m \}
\end{align*}
-The main addition to the Verilog syntax is the explicit declaration of inputs and outputs, as well as variables and arrays.~\NR{Isn't the 'main' addition that the PL community may not have seen the always blocks? It can be counter-intuitive to them. Also have you intentionally avoided assign statements? Finally, what is $\epsilon$ in the syntax?} This means that the declarations have to be handled in the semantics as well, adding to the safety that all the registers are declared properly with the right size, as this affects how the Verilog module is synthesised and simulated. In addition to that, literal values are not represented by a list of nested Boolean values, but instead they are represented by a size and its value, meaning a Boolean is represented as a value with size one. Finally, the last difference is that the syntax supports two dimensional arrays in Verilog explicitly which model memory so that we can reason about array loads and stores properly. \JW{In the $m$ category, should it be `reg d; m' rather than just `reg d;'?}\YH{Yes that's actually completely true, I added a $\vec{m}$ rule for now, but can also add the $m$ afterwards.}
+The main addition to the Verilog syntax is the explicit declaration of inputs and outputs, as well as variables and arrays.~\NR{Isn't the 'main' addition that the PL community may not have seen the always blocks? It can be counter-intuitive to them. Also have you intentionally avoided assign statements? Finally, what is $\epsilon$ in the syntax?}\YH{Yes, I will probably have to explain always blocks some more, I'll try and add a paragraph about them. Yes, assign statements are quite complex to model and many semantics do not model them. They are also not needed for translation. The $\epsilon$ is an empty skip statement, which is just a semicolon.} This means that the declarations have to be handled in the semantics as well, adding to the safety that all the registers are declared properly with the right size, as this affects how the Verilog module is synthesised and simulated. In addition to that, literal values are not represented by a list of nested Boolean values, but instead they are represented by a size and its value, meaning a Boolean is represented as a value with size one. Finally, the last difference is that the syntax supports two dimensional arrays in Verilog explicitly which model memory so that we can reason about array loads and stores properly. \JW{In the $m$ category, should it be `reg d; m' rather than just `reg d;'?}\YH{Yes that's actually completely true, I added a $\vec{m}$ rule for now, but can also add the $m$ afterwards.}
\subsection{Semantics}
-Existing operational semantics~\cite{loow19_verif_compil_verif_proces}~\NR{of Verilog?} were adapted for the semantics of the language that CoqUp eventually targets. This semantics is a small-step operational semantics at the clock cycle level, as hardware typically does not terminate in any way, however, within each clock cycle the semantics are constructed in a big-step style semantics.~\NR{Confusing sentence. How can you take a big step within a small step?} This style of semantics matches the small-step operational semantics of CompCert's register transfer language (RTL) quite well.
+Existing operational semantics~\cite{loow19_verif_compil_verif_proces}~\NR{of Verilog?} were adapted for the semantics of the language that Vericert eventually targets. This semantics is a small-step operational semantics at the clock cycle level, as hardware typically does not terminate in any way, however, within each clock cycle the semantics are constructed in a big-step style semantics.~\NR{Confusing sentence. How can you take a big step within a small step?} This style of semantics matches the small-step operational semantics of CompCert's register transfer language (RTL) quite well.
At the top-level, always blocks describe logic which is run every time some event occurs. The only event that is supported by these semantics is detecting the rising rising edge of the clock, so that we can implement synchronous logic.~\NR{So the semantics does not support combinational always blocks or assignments?} As soon as an event occurs, the hardware will be executed, meaning if there are multiple always blocks that get triggered by the event, these will run in parallel.~\NR{Does the semantics also enforce that variables that only be modified within one always block (multiple drivers not allowed)? PL community might think of data races. } However, as the semantics should be deterministic, we impose an order on the always blocks and execute them sequentially.~\NR{Oh, do we sequentialise always execution?} However, to preserve the fact that the statements inside of the always block are executed in parallel, nonblocking assignments to variables need to be kept in a different association map compared to blocking assignments to variables.~\NR{Have you discussed what an association map is?} This preserves the behaviour that blocking assignments change the value of the variable inside of the clock cycle, whereas the nonblocking assignments only take place at the end of the clock cycle, and in parallel. We can denote these two association maps as $s = (\Gamma_{\rm r}, \Gamma_{\rm a}, \Delta_{\rm r}, \Delta_{\rm a})$, where $\Gamma_{\rm r}$ is the current value of the registers, $\Gamma_{\rm a}$ is the current value of the array, and $\Delta_{\rm r}$ and $\Delta_{\rm a}$ are the values of the variables and arrays when the clock cycle ends.~\NR{Do blocking and non-blocking assignments each have individual maps? Also, association map is used before definition in this paragraph.}