summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2021-04-15 10:38:53 +0100
committerYann Herklotz <git@yannherklotz.com>2021-04-15 10:38:53 +0100
commit2bf1ff846a792ef3007fa1e97928697509f318f7 (patch)
treede842907b30c337f3c20a35f36fc17ede8147fd7
parent6363b3998d65dc7a4e45cec0db9b41f69f45fb31 (diff)
downloadoopsla21_fvhls-2bf1ff846a792ef3007fa1e97928697509f318f7.tar.gz
oopsla21_fvhls-2bf1ff846a792ef3007fa1e97928697509f318f7.zip
Update the table
-rw-r--r--algorithm.tex2
-rw-r--r--proof.tex17
-rw-r--r--results/results.org34
-rw-r--r--verilog.tex20
4 files changed, 57 insertions, 16 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 1d8201d..2241ee6 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -292,7 +292,7 @@ Typically, HLS-generated hardware consists of a sea of registers and RAM memorie
This memory view is very different to the C memory model, so we perform the following translation.
Variables that do not have their address taken are kept in registers, which correspond to the registers in 3AC.
All address-taken variables, arrays, and structs are kept in RAM.
-The stack of the main function becomes an unpacked array of 32-bit integers, which is translated to a RAM when the hardware description is passed through a synthesis tool. \vericert{} generates a well-formed RAM template in Verilog to ensure that a RAM is present in the final hardware, performing reads and writes using the signals that the RAM template exposes, instead of directly modifying the memory.
+The stack of the main function becomes an unpacked array of 32-bit integers, which is translated to a RAM when the hardware description is passed through a synthesis tool. Initially, \vericert{} translates loads and stores to direct accesses to the memory. An extra compiler pass is added to generate a well-formed RAM template to ensure that a RAM is correctly inferred by the synthesis tool. Loads and stores are performed using the signals that the RAM template exposes, instead of directly modifying the array.
Finally, global variables are not translated in \vericert{} at the moment.
A high-level overview of the architecture can be seen in Figure~\ref{fig:accumulator_diagram}.
diff --git a/proof.tex b/proof.tex
index b6c9775..1b1f234 100644
--- a/proof.tex
+++ b/proof.tex
@@ -158,15 +158,16 @@ The final lemma we need is that the Verilog we generate is deterministic. This r
\toprule
& \textbf{Coq code} & \multicolumn{1}{p{1cm}}{\raggedleft\textbf{OCaml code}} & \textbf{Spec} & \multicolumn{1}{p{2cm}}{\raggedleft\textbf{Theorems \& Proofs}} & \textbf{Total}\\
\midrule
- {Data structures and libraries} & 274 & --- & --- & 654 & 928 \\
- {Integers and values} & 98 & --- & 15 & 744 & 857 \\
- {HTL semantics} & --- & --- & 174 & --- & 174 \\
- {HTL generation} & 655 & --- & 79 & 3349 & 4083 \\
- {Verilog semantics} & --- & --- & 739 & 174 & 913 \\
- {Verilog generation} & 68 & --- & --- & 396 & 464 \\
- {Top-level driver, pretty printers} & 89 & 747 & --- & 209 & 1045 \\
+ {Data structures and libraries} & 280 & --- & --- & 500 & 780 \\
+ {Integers and values} & 98 & --- & 15 & 968 & 1081 \\
+ {HTL semantics} & 50 & --- & 181 & 65 & 296 \\
+ {HTL generation} & 590 & --- & 66 & 3069 & 3725 \\
+ {RAM generation} & 203 & --- & --- & 2843 & 3046 \\
+ {Verilog semantics} & 78 & --- & 431 & 235 & 744 \\
+ {Verilog generation} & 104 & --- & --- & 486 & 590 \\
+ {Top-level driver, pretty printers} & 318 & 775 & --- & 189 & 1282 \\
\midrule
- \textbf{Total} & 1184 & 747 & 1007 & 5526 & 8464 \\
+ \textbf{Total} & 1721 & 775 & 693 & 8355 & 11544 \\
\bottomrule
\end{tabular}
\caption{Statistics about the numbers of lines of code in the proof and implementation of \vericert{}.}
diff --git a/results/results.org b/results/results.org
index 741a1ee..264fd0a 100644
--- a/results/results.org
+++ b/results/results.org
@@ -607,3 +607,37 @@
|----------------+------------+-------------+---------------+---------------------|
| median | 1.0575184 | 0.91083941 | 1.0246082 | 0.99245744 |
#+TBLFM: @2$1..@-2$1='(identity remote(vericert-div,@@#$1))::@2$2..@-2$2=(remote(vericert-div,@@#$3))/(remote(legup-div,@@#$3))::@2$3..@-2$3=(remote(legup-noopt-div,@@#$3))/(remote(legup-div,@@#$3))::@2$4..@-2$4=(remote(legup-nochain-div,@@#$3))/(remote(legup-div,@@#$3))::@2$5..@-2$5=(remote(legup-noopt-nochain-div,@@#$3))/(remote(legup-div,@@#$3))::@27$2..@27$5=vmedian(@2$$#..@-2$$#)
+
+* Code count
+
+| File | Theorem | Code | Spec | Total |
+|-------------------------+---------+------+------+-------|
+| Compiler.v | 180 | 132 | 0 | 312 |
+| HLSOpts.v | 0 | 2 | 0 | 2 |
+| VericertClFlags.ml | 0 | 9 | 0 | 9 |
+| common/IntegerExtra.v | 587 | 98 | 15 | 700 |
+| common/Maps.v | 0 | 32 | 0 | 32 |
+| common/Monad.v | 0 | 35 | 0 | 35 |
+| common/Show.v | 0 | 42 | 0 | 42 |
+| common/Statemonad.v | 0 | 46 | 0 | 46 |
+| common/Vericertlib.v | 124 | 51 | 0 | 175 |
+| common/ZExtra.v | 258 | 0 | 0 | 258 |
+| hls/Array.v | 229 | 44 | 0 | 273 |
+| hls/AssocMap.v | 147 | 30 | 0 | 177 |
+| hls/FunctionalUnits.v | 9 | 0 | 0 | 9 |
+| hls/HTL.v | 65 | 50 | 181 | 296 |
+| hls/HTLgen.v | 0 | 590 | 0 | 590 |
+| hls/HTLgenproof.v | 2567 | 0 | 0 | 2567 |
+| hls/HTLgenspec.v | 502 | 0 | 66 | 568 |
+| hls/Memorygen.v | 2843 | 203 | 0 | 3046 |
+| hls/PrintHTL.ml | 0 | 70 | 0 | 70 |
+| hls/PrintVerilog.ml | 0 | 236 | 0 | 236 |
+| hls/ValueInt.v | 123 | 0 | 0 | 123 |
+| hls/Verilog.v | 235 | 78 | 431 | 744 |
+| hls/Veriloggen.v | 0 | 104 | 0 | 104 |
+| hls/Veriloggenproof.v | 486 | 0 | 0 | 486 |
+| extraction/Extraction.v | 0 | 184 | 0 | 184 |
+| driver/VericertDriver.v | 0 | 460 | 0 | 460 |
+|-------------------------+---------+------+------+-------|
+| Total | 8355 | 2496 | 693 | 11544 |
+#+TBLFM: @>$2..@>$4=vsum(@2$$#..@-1$$#)::$5=vsum(@@#$-3..@@#$-1)
diff --git a/verilog.tex b/verilog.tex
index cb0d243..e95889c 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -32,28 +32,34 @@ Nonblocking assignment can therefore be expressed as follows:
Finally, the following rule dictates how the whole module runs in one clock cycle:
\begin{equation*}
- \inferrule[Module]{(\Gamma, \epsilon, \vec{m})\ \downarrow_{\text{module}^{+}} (\Gamma', \Delta') \\ (\Gamma'\ //\ \Delta', \epsilon, \vec{m}) \downarrow_{\text{module}^{-}} (\Gamma'', \Delta'')}{(\Gamma, \yhkeyword{module } \yhconstant{main} \yhkeyword{(...);}\ \vec{m}\ \yhkeyword{endmodule}) \downarrow_{\text{program}} (\Gamma''\ //\ \Delta'')}
+ \inferrule[Module]{(\Gamma, \epsilon, \vec{m})\ \downarrow_{\text{module}} (\Gamma', \Delta')}{(\Gamma, \yhkeyword{module } \yhconstant{main} \yhkeyword{(...);}\ \vec{m}\ \yhkeyword{endmodule}) \downarrow_{\text{program}} (\Gamma'\ //\ \Delta')}
\end{equation*}
where $\Gamma$ is the initial state of all the variables, and $\vec{m}$ is a list of variable declarations and \alwaysblock{}s that $\downarrow_{\text{module}}$ evaluates sequentially to obtain $(\Gamma', \Delta')$. The final state is obtained by merging these maps using the $//$ operator, which gives priority to the right-hand operand in a conflict. This rule ensures that the nonblocking assignments overwrite at the end of the clock cycle any blocking assignments made during the cycle.
\subsection{Changes to the Semantics}
-Four changes were made to the semantics proposed by \citet{loow19_proof_trans_veril_devel_hol} to make it suitable as a HLS target.
+Five changes were made to the semantics proposed by \citet{loow19_proof_trans_veril_devel_hol} to make it suitable as a HLS target.
\paragraph{Adding array support}
The main change is the addition of support for arrays, which are needed to model RAM in Verilog. RAM is needed to model the stack in C efficiently, without having to declare a variable for each possible stack location. % In the original semantics, RAMs (as well as inputs and outputs to the module) could be modelled using a function from variable names (strings) to values, which could be modified accordingly to model inputs to the module. This is quite an abstract description of memory and can also be expressed as an array of bitvectors instead, which is the path we took. This requires the addition of array operators to the semantics and correct reasoning of loads and stores to the array in different \alwaysblock{}s simultaneously.
Consider the following Verilog code:
\begin{center}
-\begin{minted}{verilog}
+\begin{minted}[xleftmargin=40pt,linenos]{verilog}
reg [31:0] x[1:0];
-always @(posedge clk) begin
- x[0] = 1;
- x[1] <= 1;
-end
+always @(posedge clk) begin x[0] = 1; x[1] <= 1; end
\end{minted}
\end{center}
which modifies one array element using blocking assignment and then a second using nonblocking assignment. If the existing semantics were used to update the array, then during the merge, the entire array \texttt{x} from the nonblocking association map would replace the entire array from the blocking association map. This would replace \texttt{x[0]} with its original value and therefore behave incorrectly. Accordingly, we modified the maps so they record updates on a per-el\-em\-ent basis. Our state $\Gamma$ is therefore split up into $\Gamma_{r}$ for instantaneous updates to variables, and $\Gamma_{a}$ for instantaneous updates to arrays; $\Delta$ is split similarly. The merge function then ensures that only the modified indices get updated when $\Gamma_{a}$ is merged with the nonblocking map equivalent $\Delta_{a}$.
+\paragraph{Adding negative edge support}
+To support memory inferrence efficiently and create and reason about a circuit that executes at the negative edge of the clock, support for the negative edge triggered \alwaysblock{}s was added to the semantics. This is shown in the modifications of the \textsc{Module} rule shown below:
+
+\begin{equation*}
+ \inferrule[Module]{(\Gamma, \epsilon, \vec{m})\ \downarrow_{\text{module}^{+}} (\Gamma', \Delta') \\ (\Gamma'\ //\ \Delta', \epsilon, \vec{m}) \downarrow_{\text{module}^{-}} (\Gamma'', \Delta'')}{(\Gamma, \yhkeyword{module } \yhconstant{main} \yhkeyword{(...);}\ \vec{m}\ \yhkeyword{endmodule}) \downarrow_{\text{program}} (\Gamma''\ //\ \Delta'')}
+\end{equation*}
+
+The main execution of the module $\downarrow_{\text{module}}$ is split into $\downarrow_{\text{module}^{+}}$ and $\downarrow_{\text{module}^{-}}$, which are rules that only execute always blocks triggered at the positive and at the negative edge respectively. As in the initial \textsc{Module} rule, the positive edge triggered \alwaysblock{}s are processed in the same way. The output maps $\Gamma'$ and $\Delta'$ are then merged and passed as the blocking assignments map into the negative edge execution, so that all the blocking and nonblocking assignments are present. Finally, all the negative edge triggered \alwaysblock{}s are processed and merged to give the final state.
+
\paragraph{Adding declarations} Explicit support for declaring inputs, outputs and internal variables was added to the semantics to make sure that the generated Verilog also contains the correct declarations. This adds some guarantees to the generated Verilog and ensures that it synthesises and simulates correctly.
\paragraph{Removing support for external inputs to modules} Support for receiving external inputs was removed from the semantics for simplicity, as these are not needed for an HLS target. The main module in Verilog models the main function in C, and since the inputs to a C function shouldn't change during its execution, there is no need for external inputs for Verilog modules.