summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorJohn Wickerson <>2020-11-20 11:19:47 +0000
committerJohn Wickerson <>2020-11-20 11:19:47 +0000
commitdae277c624efa583d70630c52569da8c0fb62e41 (patch)
treea894fb404a76a20c8264b0c4c9d2a904344c425a
parent5d705ed31620a09d40e80535101f15095f64b4c1 (diff)
parente2766297022dccecf237bbc95289fb01de4ed643 (diff)
downloadoopsla21_fvhls-dae277c624efa583d70630c52569da8c0fb62e41.tar.gz
oopsla21_fvhls-dae277c624efa583d70630c52569da8c0fb62e41.zip
Merge branch 'master' of https://git.overleaf.com/5ed78033b633200001e693d0
# Conflicts: # verilog.tex
-rw-r--r--algorithm.tex85
-rw-r--r--references.bib19
-rw-r--r--verilog.tex9
3 files changed, 67 insertions, 46 deletions
diff --git a/algorithm.tex b/algorithm.tex
index b731dfc..6c02326 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -8,7 +8,7 @@ First of all, the choice of C as the input language of \vericert{} is simply bec
%Since a lot of existing code for HLS is written in C, supporting C as an input language, rather than a custom domain-specific language, means that \vericert{} is more practical.
%An alternative was to support LLVM IR as an input language, however, to get a full work flow from a higher level language to hardware, a front end for that language to LLVM IR would also have to be verified. \JW{Maybe save LLVM for the `Choice of implementation language'?}
We considered Bluespec~\cite{nikhil04_blues_system_veril}, but decided that although it ``can be classed as a high-level language''~\cite{greaves_note}, it is too hardware-oriented to be used for traditional HLS.
-We also considered using a language with built-in parallel constructs that map well to parallel hardware, such as occam~\cite{page91_compil_occam} or Spatial~\cite{spatial}, but found these languages too niche.
+We also considered using a language with built-in parallel constructs that map well to parallel hardware, such as occam~\cite{page91_compil_occam}, Spatial~\cite{spatial} or Scala~\cite{chisel}, but we found these languages too niche.
% However, this would not qualify as being HLS due to the manual parallelism that would have to be performed. \JW{I don't think the presence of parallelism stops it being proper HLS.}
%\JP{I think I agree with Yann here, but it could be worded better. At any rate not many people have experience writing what is essentially syntactic sugar over a process calculus.}
%\JW{I mean: there are plenty of software languages that involve parallel constructs. Anyway, perhaps we can just dismiss occam for being too obscure.}
@@ -63,7 +63,7 @@ The main work flow of \vericert{} is given in Figure~\ref{fig:rtlbranch}, which
\compcert{} is made up of 11 intermediate languages in between the Clight input and the assembly output, so the first thing we must decide is where best to branch off to our Verilog back end.
-We select CompCert's three-address code (3AC)\footnote{This is known as register transfer language (RTL) in the \compcert{} literature. We use `3AC' is used in this paper instead to avoid confusion with register-transfer level (RTL), which is another name for the final hardware target of the HLS tool.} as the branching off point. If we branch off before this (at CminorSel or earlier), then CompCert has not had the opportunity to perform such optimisations as constant propagation and dead code elimination, which have been shown to be valuable not just in conventional compilation but also in HLS~\cite{cong+11}. And if we branch off after this (at LTL or later) then CompCert has already performed register allocation to reduce the number of registers and spill some variables to the stack; this transformation is not required in HLS because there are many more registers available, and these should be used instead of RAM whenever possible.
+We select CompCert's three-address code (3AC)\footnote{This is known as register transfer language (RTL) in the \compcert{} literature. We use `3AC' is used in this paper instead to avoid confusion with register-transfer level (RTL), which is another name for the final hardware target of the HLS tool.} as the branching off point. If we branch off before this point (at CminorSel or earlier), then CompCert has not had the opportunity to perform such optimisations as constant propagation and dead code elimination, which have been shown to be valuable not just in conventional compilation but also in HLS~\cite{cong+11}. And if we branch off after this point (at LTL or later) then CompCert has already performed register allocation to reduce the number of registers and spill some variables to the stack; this transformation is not required in HLS because there are many more registers available, and these should be used instead of RAM whenever possible.
3AC is also attractive because it is the closest intermediate language to LLVM IR, which is used by several existing HLS compilers. It has an unlimited number of pseudo-registers, and is represented as a control flow graph (CFG) where each instruction is a node with links to the instructions that can follow it. One difference between LLVM IR and 3AC is that 3AC includes operations that are specific to the chosen target architecture; we chose x86\_32 because each instruction maps well to hardware.
@@ -137,10 +137,10 @@ An HTL program thus consists of two maps: a control map that describes how to ca
\definecolor{control}{HTML}{b3e2cd}
\definecolor{data}{HTML}{fdcdac}
\begin{tikzpicture}
- \fill[control,fill opacity=1] (6.5,0) rectangle (12,4);
- \fill[data,fill opacity=1] (0,0) rectangle (5.5,4);
- \node at (1,3.7) {\footnotesize Data Path};
- \node at (7.5,3.7) {\footnotesize Control Logic};
+ \fill[control,fill opacity=1] (6.5,0) rectangle (12,5);
+ \fill[data,fill opacity=1] (0,0) rectangle (5.5,5);
+ \node at (1,4.7) {\footnotesize Data Path};
+ \node at (7.5,4.7) {\footnotesize Control Logic};
\fill[white,rounded corners=10pt] (7,0.5) rectangle (11.5,2.2);
\node at (8,2) {\tiny Next State FSM};
@@ -167,11 +167,12 @@ An HTL program thus consists of two maps: a control map that describes how to ca
\draw[-{Latex[length=1mm,width=0.7mm]}] (7.45,-0.5) -- (7.45,0.5);
\fill[white,rounded corners=10pt] (2,0.5) rectangle (5,3);
- \filldraw[fill=white] (0.25,0.5) rectangle (1.5,3);
+ \filldraw[fill=white] (0.25,0.5) rectangle (1.5,2.75);
\node at (2.6,2.8) {\tiny Update};
- \node at (0.875,2.8) {\tiny \texttt{RAM}};
+ \node[align=center] at (0.875,2.4) {\tiny \texttt{RAM}\\[-1.5ex]\tiny\texttt{(Array)}};
\node[scale=0.4] at (4.7,1.5) {\texttt{state}};
\draw[-{Latex[length=1mm,width=0.7mm]}] (6,1.5) -- (5,1.5);
+ \draw[-{Latex[length=1mm,width=0.7mm]}] (6,1.5) -- (7,1.5);
\node[scale=0.4,rotate=60] at (4.1,0.9) {\texttt{finished}};
\node[scale=0.4,rotate=60] at (3.9,0.95) {\texttt{return\_val}};
\node[scale=0.4,rotate=60] at (2.5,0.75) {\texttt{clk}};
@@ -182,37 +183,41 @@ An HTL program thus consists of two maps: a control map that describes how to ca
\draw[-{Latex[length=1mm,width=0.7mm]}] (2.45,-0.5) -- (2.45,0.5);
\draw[-{Latex[length=1mm,width=0.7mm]}] (2.65,-0.5) -- (2.65,0.5);
- \node[scale=0.4] at (1.2,2.2) {\texttt{wr\_en}};
- \node[scale=0.4] at (1.2,2) {\texttt{wr\_addr}};
- \node[scale=0.4] at (1.2,1.8) {\texttt{wr\_data}};
- \node[scale=0.4] at (1.2,1.4) {\texttt{r\_addr}};
- \node[scale=0.4] at (1.2,1.2) {\texttt{r\_data}};
-
- \node[scale=0.4] at (2.3,2.2) {\texttt{wr\_en}};
- \node[scale=0.4] at (2.3,2) {\texttt{wr\_addr}};
- \node[scale=0.4] at (2.3,1.8) {\texttt{wr\_data}};
- \node[scale=0.4] at (2.3,1.4) {\texttt{r\_addr}};
- \node[scale=0.4] at (2.3,1.2) {\texttt{r\_data}};
-
- \draw[-{Latex[length=1mm,width=0.7mm]}] (2,2.2) -- (1.5,2.2);
- \draw[-{Latex[length=1mm,width=0.7mm]}] (2,2) -- (1.5,2);
- \draw[-{Latex[length=1mm,width=0.7mm]}] (2,1.8) -- (1.5,1.8);
- \draw[-{Latex[length=1mm,width=0.7mm]}] (2,1.4) -- (1.5,1.4);
- \draw[-{Latex[length=1mm,width=0.7mm]}] (1.5,1.2) -- (2,1.2);
-
- %\filldraw[fill=white] (2.8,3.25) rectangle (4.2,4.75);
- %\node at (3.5,4.55) {\tiny \texttt{Registers}};
- %\draw[-{Latex[length=1mm,width=0.7mm]}] (2,2.4) -| (1.75,4) -- (2.8,4);
- %\draw[-{Latex[length=1mm,width=0.7mm]}] (4.2,4) -- (5.25,4) |- (5,2.4);
-
- %\node[scale=0.4] at (3.5,4.2) {\texttt{reg\_1}};
- %\node[scale=0.4] at (3.5,4) {\texttt{reg\_2}};
- %\node[scale=0.4] at (3.5,3.8) {\texttt{reg\_3}};
- %\node[scale=0.4] at (3.5,3.6) {$\cdots$};
- %\node[scale=0.4] at (3.5,3.4) {\texttt{reg\_8}};
+ \foreach \x in {0,...,2}
+ {\draw (0.25,1.25-0.25*\x) -- (1.5,1.25-0.25*\x); \node at (0.875,1.13-0.25*\x) {\tiny \x};}
+ \draw[-{Latex[length=1mm,width=0.7mm]}] (1.5,1.5) -- (2,1.5);
+ \draw[-{Latex[length=1mm,width=0.7mm]}] (2,1.75) -- (1.5,1.75);
+
+ %\node[scale=0.4] at (1.2,2.2) {\texttt{wr\_en}};
+ %\node[scale=0.4] at (1.2,2) {\texttt{wr\_addr}};
+ %\node[scale=0.4] at (1.2,1.8) {\texttt{wr\_data}};
+ %\node[scale=0.4] at (1.2,1.4) {\texttt{r\_addr}};
+ %\node[scale=0.4] at (1.2,1.2) {\texttt{r\_data}};
+ %
+ %\node[scale=0.4] at (2.3,2.2) {\texttt{wr\_en}};
+ %\node[scale=0.4] at (2.3,2) {\texttt{wr\_addr}};
+ %\node[scale=0.4] at (2.3,1.8) {\texttt{wr\_data}};
+ %\node[scale=0.4] at (2.3,1.4) {\texttt{r\_addr}};
+ %\node[scale=0.4] at (2.3,1.2) {\texttt{r\_data}};
+ %
+ %\draw[-{Latex[length=1mm,width=0.7mm]}] (2,2.2) -- (1.5,2.2);
+ %\draw[-{Latex[length=1mm,width=0.7mm]}] (2,2) -- (1.5,2);
+ %\draw[-{Latex[length=1mm,width=0.7mm]}] (2,1.8) -- (1.5,1.8);
+ %\draw[-{Latex[length=1mm,width=0.7mm]}] (2,1.4) -- (1.5,1.4);
+ %\draw[-{Latex[length=1mm,width=0.7mm]}] (1.5,1.2) -- (2,1.2);
+
+ \filldraw[fill=white] (2.8,3.25) rectangle (4.2,4.75);
+ \node at (3.5,4.55) {\tiny \texttt{Registers}};
+ \draw[-{Latex[length=1mm,width=0.7mm]}] (2,2.4) -| (1.75,4) -- (2.8,4);
+ \draw[-{Latex[length=1mm,width=0.7mm]}] (4.2,4) -- (5.25,4) |- (5,2.4);
+
+ \node[scale=0.4] at (3.5,4.2) {\texttt{reg\_1}};
+ \node[scale=0.4] at (3.5,4) {\texttt{reg\_2}};
+ \node[scale=0.4] at (3.5,3.8) {\texttt{reg\_3}};
+ \node[scale=0.4] at (3.5,3.6) {$\cdots$};
+ \node[scale=0.4] at (3.5,3.4) {\texttt{reg\_8}};
\end{tikzpicture}
- \caption{The FSMD for our running example.\YH{Removed the register block because these are technically just internal to the update function. Also not sure if we should keep the RAM block because it is a bit misleading as we don't use those signals to communicate with the RAM, instead we just use a RAM.}}\label{fig:accumulator_diagram}
-%\JW{Maybe replace `State' with `Current State'? And maybe `Calculate State' could be clearer as `Calculate Next State'?} \JW{Can state 15 (or should it be state 16??) have a dangling incoming arrow to indicate that it is the start state? And perhaps state 1 could have a double outline to indicate that it is an `accepting' state? Since there's space above the `Calculate State' box, I'd be mildly in favour of expanding that box a bit so that it included all 15 states explicitly (snaking back and forth).}\YH{If this is better I can mock up a tikz version of it maybe and fix the last bits then too.}
+ \caption{The FSMD for the example shown in Figure~\ref{fig:accumulator_c_rtl}, split into a data path and control logic for the next state calculation. The Update block takes the current state, current values of all registers and at most one value stored in the array, and calculates a new value that can either be stored back in the array or in a register.}\label{fig:accumulator_diagram}
\end{figure*}
\paragraph{Translating memory}
@@ -220,7 +225,7 @@ Hardware does not have the same memory model as C, so the memory model needs to
\paragraph{Translating instructions}
Each 3AC instruction either corresponds to a hardware construct, or does not have to be handled by the translation, such as function calls.
-For example, in state 16 in figure~\ref{fig:accumulator_rtl}, the register \texttt{x9} is initialised to 1, after which the control flow moves to state 15. This is encoded in HTL by initialising a 32-bit register \texttt{reg\_9} to 1 in the data-flow section, and also adding a transition to the state 15 in the control logic section. Simple operator instructions are translated in a similar way. For example, in state 5, the value in the array is added to the current value of the accumulated sum, which is simply translated to an addition of the equivalent registers in the HTL code.
+For example, in state 15 in figure~\ref{fig:accumulator_rtl}, the register \texttt{x8} is initialised to 1, after which the control flow moves to state 14. This is encoded in HTL by initialising a 32-bit register \texttt{reg\_8} to 1 in the data-flow section, and also adding a transition to the state 14 in the control logic section. Simple operator instructions are translated in a similar way. For example, in state 5, the value in the array is added to the current value of the accumulated sum, which is simply translated to an addition of the equivalent registers in the HTL code.
\paragraph{Key challenge: signedness} Note that the comparison in state 3 is signed. This is because C and Verilog handle signedness quite differently. By default, all operators and registers in Verilog (and HTL) are unsigned, so to force an operation to handle the bits as signed, both operators have to be forced to be signed. In addition to that, Verilog resizes expressions to the largest needed size by default, which can affect the result of the computation. This feature is also not supported by the Verilog semantics we adopted, and there would therefore be a mismatch between the Verilog semantics and the actual behaviour of Verilog according to the standard. To bypass this issue, braces are used to stop the Verilog simulator or synthesis tool from resizing anything inside the braces. Instead, explicit resizing is used in the semantics and operations can only be performed on two registers that have the same size.
@@ -309,8 +314,6 @@ The translation from 3AC to HTL is straightforward, as each 3AC instruction eith
%\JW{I suspect that you could safely chop that sentence.}
For example, in state 15 in figure~\ref{fig:accumulator_rtl}, the register \texttt{x8} is initialised to 1, after which the control flow moves to state 14. This is encoded in HTL by initialising a 32-bit register \texttt{reg\_8} to 1 in the data-flow section, and also adding a transition to the state 15 in the control logic section. Simple operator instructions are translated in a similar way. For example, in state 5, the value in the array is added to the current value of the accumulated sum, which is simply translated to an addition of the equivalent registers in the HTL code.
-\paragraph{Key challenge: signedness} \JW{This paragraph appeared earlier too! Does it belong in the 3AC-to-HTL subsection or the HTL-to-Verilog subsection?} Note that the comparison in state 3 is signed. By default, all operators and registers in Verilog and HTL are unsigned, so to force an operation to handle the bits as signed, both operators have to be forced to signed. In addition to that, Verilog resizes expressions to the largest needed size by default, which can affect the result of the computation. This feature is also not supported by the Verilog semantics we used, and there would therefore be a mismatch between the Verilog semantics and the actual behaviour of Verilog according to the standard. To bypass this issue braces are used to stop the Verilog simulator or synthesis tool from resizing anything inside the braces. Instead, explicit resizing is used in the semantics and operations can only be performed on two registers that have the same size.
-
\subsection{Translating HTL to Verilog}
Finally, we have to translate the HTL code into proper Verilog and prove that it behaves the same as the 3AC according to the Verilog semantics. Whereas HTL is a language that is specifically designed to represent the FSMDs we are interested in, Verilog is a general-purpose HDL.\@ So the challenge here is to translate our FSMD representation into a Verilog AST. However, as all the instructions are already expressed in Verilog, only the maps need to be translated to valid Verilog, and correct declarations for all the variables in the program need to be added as well.
diff --git a/references.bib b/references.bib
index 982434f..d4cd81f 100644
--- a/references.bib
+++ b/references.bib
@@ -820,3 +820,22 @@ year = {2020},
note = {unpublished},
title = {Fuzzing High-Level Synthesis Tools},
}
+
+
+@inproceedings{kiwi_hls,
+ title={Kiwi: Synthesis of FPGA circuits from parallel programs},
+ author={Singh, Satnam and Greaves, David J},
+ booktitle={2008 16th International Symposium on Field-Programmable Custom Computing Machines},
+ pages={3--12},
+ year={2008},
+ organization={IEEE}
+}
+
+@inproceedings{chisel,
+ title={Chisel: constructing hardware in a scala embedded language},
+ author={Bachrach, Jonathan and Vo, Huy and Richards, Brian and Lee, Yunsup and Waterman, Andrew and Avi{\v{z}}ienis, Rimas and Wawrzynek, John and Asanovi{\'c}, Krste},
+ booktitle={DAC Design Automation Conference 2012},
+ pages={1212--1221},
+ year={2012},
+ organization={IEEE}
+} \ No newline at end of file
diff --git a/verilog.tex b/verilog.tex
index a1f05bb..e3b09fa 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -18,12 +18,11 @@ An example of a rule for executing an \alwaysblock{} is shown below, where $\Sig
\inferrule[Always]{(\Sigma, s)\downarrow_{\text{stmnt}} \Sigma'}{(\Sigma, \yhkeyword{always @(posedge clk) } s) \downarrow_{\text{always}} \Sigma'}
\end{equation*}
-\noindent which shows that assuming the statement $s$ in the \alwaysblock{} runs with state $\Sigma$ and produces the new state $\Sigma'$, the \alwaysblock{} will result in the same final state. As only clocked \alwaysblock{} are supported, and one step in the semantics correspond to one clock cycle, it means that this rule is run once per clock cycle for each \alwaysblock{}.
-
-Two types of assignments are supported in \alwaysblock{}: nonblocking and blocking assignment. Nonblocking assignment modifies the signal at the end of the clock cycle and atomically. Blocking assignment, on the other hand, assigns the variable instantaneously in the \alwaysblock{} for later signals to pick up. To model both these assignments, the state $\Sigma$ has to be split into two maps: $\Gamma$, containing the current values of all variables, and $\Delta$, containing the values that will be assigned to the variables at the end of the clock cycle, we can therefore say that $\Sigma = (\Gamma, \Delta)$. The nonblocking assignment can therefore be expressed as the following:
-
-%~\NR{Can we say that $\Gamma$ contains instantaneous (ephemeral) updates for the current cycle and $\Delta$ contains periodical updates for the next cycle? Will be good to distinguish those two updates with the same terms across the paper, which can tie in with small-step and big-step distinction maybe asynchronous vs synchronous updates. }\YH{Hmm, so I don't think nonblocking and blocking ties in with big-step and smallstep, but yes it would be better to use the same terms throughoug the paper}
+\noindent This rule says that assuming the statement $s$ in the \alwaysblock{} runs with state $\Sigma$ and produces the new state $\Sigma'$, the \alwaysblock{} will result in the same final state. %Since only clocked \alwaysblock{} are supported, and one step in the semantics correspond to one clock cycle, it means that this rule is run once per clock cycle for each \alwaysblock{}.
+Two types of assignments are supported in \alwaysblock{}s: nonblocking and blocking assignment. Nonblocking assignments all take effect simultaneously at the end of the clock cycle, %and atomically.
+while blocking assignments happen instantly so that later assignments in the clock cycle can pick them up. To model both of these assignments, the state $\Sigma$ has to be split into two maps: $\Gamma$, which contains the current values of all variables, and $\Delta$, which contains the values that will be assigned at the end of the clock cycle. %, we can therefore say that $\Sigma = (\Gamma, \Delta)$.
+Nonblocking assignment can therefore be expressed as follows:
\begin{equation*}
\inferrule[Nonblocking Reg]{\yhkeyword{name}\ d = \yhkeyword{OK}\ n \\ (\Gamma, e) \downarrow_{\text{expr}} v}{((\Gamma, \Delta), d\ \yhkeyword{ <= } e) \downarrow_{\text{stmnt}} (\Gamma, \Delta [n \mapsto v])}\\
\end{equation*}