author    Yann Herklotz <git@yannherklotz.com>  2020-10-14 11:45:08 +0100
committer Yann Herklotz <git@yannherklotz.com>  2020-10-14 11:45:08 +0100
commit    dab109f87a4f227ad51aadfeb8be5f9a8ec96aba (patch)
tree      2d443cfc968a4edca11f217b4f4baa5eb8a7c149
parent    8002c83d3c2dfd93075362923d6f63396ff53351 (diff)
parent    2f47d02ba68c60aa127b5553acd14c7b7aa36367 (diff)
Merge branch 'master' of https://git.overleaf.com/5ed78033b633200001e693d0 into master
-rw-r--r--  algorithm.tex     |  4
-rw-r--r--  introduction.tex  |  2
-rw-r--r--  main.tex          |  4
-rw-r--r--  verilog.tex       | 23
4 files changed, 21 insertions, 12 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 1068076..4d8b1da 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -25,7 +25,7 @@
\draw[->] (rtl) -- (dfgstmd);
\draw[->] (dfgstmd) -- (verilog);
\end{tikzpicture}}
- \caption{Verilog backend branching off at the RTL stage.}%
+ \caption{Verilog backend branching off at the RTL stage. \JW{It's a real nightmare that `RTL' is so ambiguous, meaning both software and hardware. I think we have to do something about it. One possibility: according to the CompCert webpages, software RTL is `also known as 3-address code', so we could replace (software) `RTL' with `3AC' (and give a footnote to explain why). Is it also possible to replace (hardware) `RTL' with just `Verilog'? I know Verilog could mean RTL or netlist, but since this paper doesn't talk much about RTL-to-netlist synthesis, perhaps that doesn't matter? Or perhaps RTL could be replaced with FSMD? Or even just `hardware design'?}}%
\label{fig:rtlbranch}
\end{figure}
@@ -60,7 +60,7 @@ Existing HLS compilers usually use LLVM IR as an intermediate representation whe
\inputminted[fontsize=\tiny]{systemverilog}{data/accumulator2.v}
\caption{Accumulator Verilog code.}\label{fig:accumulator_v}
\end{subfigure}
- \caption{Accumulator example using CompCert to translate from HTL to Verilog.\YH{I feel like these examples take up too much space, but don't really know of a different way to show off a complete example without the long code.}}\label{fig:accumulator_htl_v}
+ \caption{Accumulator example using CompCert to translate from HTL to Verilog.\YH{I feel like these examples take up too much space, but don't really know of a different way to show off a complete example without the long code.} \JW{Ok, what about drawing the FSMD in some sort of pictorial way? I think you tried drawing it as a schematic previously, but that was too big and clumsy. What about drawing the FSMD as a state machine, with sixteen states and labelled edges between them? Or might that be too abstract?}}\label{fig:accumulator_htl_v}
\end{figure}
%\begin{figure}
diff --git a/introduction.tex b/introduction.tex
index 798e7f9..cb1ed1e 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -2,7 +2,7 @@
%% Motivation for why HLS might be needed
-The current approach to writing energy-efficient and high-throughput applications is to use application-specific hardware, instead of relying on a general-purpose CPU.\@ However, custom hardware designs come at the cost of having to design and produce them, which can be a tedious and error-prone process using hardware description languages (HDL) such as Verilog. Especially with the size of hardware designs growing over the years, it can become difficult to verify that the hardware design behaves in the expected way, as simulation of HDLs can be quite inefficient. Furthermore, the algorithms that are being accelerated in hardware often already have a software implementation, meaning they have to be reimplemented efficiently in a hardware description language which can be time-consuming.
+The current approach \JW{Maybe `the' is too strong -- could say `One current approach'.} to writing energy-efficient and high-throughput applications is to use application-specific hardware, instead of relying on a general-purpose CPU.\@ However, custom hardware designs come at the cost of having to design and produce them, which can be a tedious and error-prone process using hardware description languages (HDLs) such as Verilog. Especially with the size of hardware designs growing over the years, it can become difficult to verify that the hardware design behaves in the expected way, as simulation of HDLs can be quite inefficient. Furthermore, the algorithms that are being accelerated in hardware often already have a software implementation, meaning they have to be reimplemented efficiently in a hardware description language, which can be time-consuming.
%% Definition and benefits of HLS
One possible alternative to the tedious design process of custom hardware is to use high-level synthesis (HLS), which is the process of generating custom hardware, represented in an HDL, based on a behavioural description, often in a subset of C. This elevates the level of abstraction, because the description of the algorithm in C is inherently untimed, meaning actions don't have to be scheduled into clock cycles manually. The higher level of abstraction makes it easier to reason about the algorithms and therefore also facilitates maintaining them. As a result the time to design the hardware is reduced significantly, especially if a software description of the algorithm already exists, because there is no need to redesign at a lower level and directly in hardware. In addition, using HLS to design the hardware has the benefit of making the functional verification of the design much more efficient and simple than at the HDL stage, since the entire software ecosystem can be mobilised for this
diff --git a/main.tex b/main.tex
index 5dd538f..d5ac1e5 100644
--- a/main.tex
+++ b/main.tex
@@ -125,9 +125,9 @@
\email{j.wickerson@imperial.ac.uk}
\begin{abstract}
- High-level synthesis (HLS) refers to automatically compiling software into hardware. In a world increasingly reliant on application-specific hardware accelerators, HLS provides a higher-level of abstraction for hardware designers, while also benefitting from the rich software ecosystem to verify the functionality of their designs. However, adoption of HLS in safety-critical applications is still limited, because current tools cannot verify that the hardware implements the same behaviour as the software it was generated from, eliminating the benefits that software verification may provide. There is even evidence that HLS tools tend to be quite unreliable instead, either generating wrong hardware or sometimes not generating any hardware at all.
+ High-level synthesis (HLS) refers to automatically compiling software into hardware. In a world increasingly reliant on application-specific hardware accelerators, HLS provides a higher level of abstraction for hardware designers, while also benefitting from \JW{This sentence is slightly confused because it's not really `HLS' that benefits, it's the HLS user who benefits.} the rich software ecosystem to verify the functionality of their designs. However, adoption of HLS in safety-critical applications is still limited, because current tools cannot verify \JW{`cannot verify' is a dubious claim -- I guess something like SLEC can technically verify that the hardware is correct, but it's a big faff to use it} that the hardware implements the same behaviour as the software it was generated from, eliminating the benefits that software verification may provide. There is even evidence that HLS tools tend to be quite unreliable instead, either generating wrong hardware or sometimes not generating any hardware at all. \JW{`Not generating hardware' isn't too bad, and indeed VeriCert is guilty of that too, when presented with C programs that contain features that it can't yet handle. The real problem is the HLS tool crashing unceremoniously.}
- We present the first mechanically verified HLS tool that preserves the behaviour of the software according to our semantics. Our tool, called VeriCert, extends the CompCert verified C compiler and consists of a new hardware specific intermediate language and a Verilog back end, which has been proven correct in Coq. VeriCert supports all C constructs except for function pointers, recursive function calls, non-integer types and global variables. It has been evaluated on all suitable PolyBench benchmarks, which indicates that it generates hardware with area and cycle count around a magnitude larger than an existing, unverified HLS tool.
+ We present the first mechanically verified HLS tool that preserves the behaviour of the software according to our semantics. Our tool, called VeriCert, extends the CompCert verified C compiler and consists of a new hardware specific \JW{hardware-specific} intermediate language and a Verilog back end, which has been proven correct in Coq. VeriCert supports all C constructs except for function pointers, recursive function calls, non-integer types and global variables. It has been evaluated on all suitable PolyBench benchmarks, which indicates that it generates hardware with area and cycle count around an order of magnitude larger than an existing, unverified HLS tool. \JW{Hopefully in due course we can nuance this a bit by adding in some figures relating to the unverified Vericert extensions.}
\end{abstract}
%% 2012 ACM Computing Classification System (CSS) concepts
diff --git a/verilog.tex b/verilog.tex
index 6274dcb..d9bcae2 100644
--- a/verilog.tex
+++ b/verilog.tex
@@ -1,8 +1,10 @@
\section{Verilog}
-Verilog is a hardware description language commonly used to design hardware. A Verilog design can then be synthesised into more basic logic which describes how different gates connect to each other, called a netlist. This representation can then be put onto either a field-programmable gate array (FPGA) or turned into an application-specific integrated circuit (ASIC) to implement the design that was described in Verilog. The Verilog standard is quite large though, and not all Verilog features are needed to be able to describe hardware. Many Verilog features are only useful for simulation and do not affect the actual hardware itself, which means that these features do not have to be modelled in the semantics. In addition to that, as the HLS algorithm dictates which Verilog constructs are generated, meaning the Verilog subset that has to be modelled by the semantics can be reduced even further to only support the constructs that are needed. Only supporting a smaller subset in the semantics also means that there is less chance that the standard is misunderstood, and that the semantics actually model how the Verilog is simulated.
+Verilog is a hardware description language commonly used to design hardware. A Verilog design can then be synthesised into more basic logic~\NR{Did you mean logic gate?} which describes how different gates connect to each other, called a netlist. This representation can then be put~\NR{mapped?} onto either a field-programmable gate array (FPGA) or turned into an application-specific integrated circuit (ASIC) to implement the design that was described in Verilog. The Verilog standard is quite large, though, and not all of its features are needed to describe hardware.~\NR{Will be good to use the word synthesisable subset somewhere in this discussion.} Many Verilog features are only useful for simulation and do not affect the actual hardware itself, which means that these features do not have to be modelled in the semantics. In addition, the HLS algorithm dictates which Verilog constructs are generated, meaning the subset that has to be modelled by the semantics can be reduced even further, to only the constructs that are actually emitted.~\NR{Didn't get this sentence? Do you mean that the HLS algo further restricts the synthesisable subset?} Only supporting a smaller subset in the semantics also means that there is less chance of misunderstanding the standard, and more confidence that the semantics actually model how the Verilog is simulated.
+\NR{What is the distinction here between the semantics and simulation? Discuss.}
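To make the distinction between synthesisable and simulation-only constructs concrete, the following fragment (our own illustration, not generated by the compiler) contrasts a construct the semantics must model with one that only matters in simulation:
\begin{minted}{systemverilog}
module counter(input clk, output reg [31:0] count);
  // Synthesisable: a register updated on every rising clock edge,
  // which is exactly the kind of construct the semantics has to model.
  always @(posedge clk) count <= count + 32'd1;

  // Simulation-only: delays and $display have no hardware counterpart,
  // so features like these never appear in the generated Verilog.
  initial begin
    #10 $display("count = %0d", count);
  end
endmodule
\end{minted}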
The Verilog semantics are based on the semantics proposed by \citet{loow19_verif_compil_verif_proces}, which were used to create a formal translation from HOL into a Verilog circuit. These semantics are quite practical, as they restrict themselves to a small subset of Verilog which can nonetheless be used to model all the hardware constructs one would want to design. An abstraction of the Verilog syntax that is generated is shown below:
+\NR{What is HOL? Was it discussed earlier?}
\begin{align*}
v\quad ::=&\; \mathit{sz} \yhkeyword{'d} n\\
@@ -19,16 +21,17 @@ The Verilog semantics are based on the semantics proposed by \citet{loow19_verif
m \text{ list}\quad ::=&\; \{ m \}
\end{align*}
-The main addition to the Verilog syntax is the explicit declaration of inputs and outputs, as well as variables and arrays. This means that the declarations have to be handled in the semantics as well, adding to the safety that all the registers are declared properly with the right size, as this affects how the Verilog module is synthesised and simulated. In addition to that, literal values are not represented by a list of nested Boolean values, but instead they are represented by a size and its value, meaning a Boolean is represented as a value with size one. Finally, the last difference is that the syntax supports two dimensional arrays in Verilog explicitly which model memory so that we can reason about array loads and stores properly. \JW{In the $m$ category, should it be `reg d; m' rather than just `reg d;'?}\YH{Yes that's actually completely true, I added a $\vec{m}$ rule for now, but can also add the $m$ afterwards.}
+The main addition to the Verilog syntax is the explicit declaration of inputs and outputs, as well as variables and arrays.~\NR{Isn't the 'main' addition that the PL community may not have seen the always blocks? It can be counter-intuitive to them. Also have you intentionally avoided assign statements? Finally, what is $\epsilon$ in the syntax?} This means that the declarations have to be handled in the semantics as well, which adds a layer of safety by ensuring that all registers are declared with the correct size, as this affects how the Verilog module is synthesised and simulated. In addition, literal values are not represented by a list of nested Boolean values, but instead by a size together with a value, meaning a Boolean is represented as a value of size one. The final difference is that the syntax explicitly supports two-dimensional arrays in Verilog, which model memories, so that we can reason properly about array loads and stores. \JW{In the $m$ category, should it be `reg d; m' rather than just `reg d;'?}\YH{Yes that's actually completely true, I added a $\vec{m}$ rule for now, but can also add the $m$ afterwards.}
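As a hypothetical fragment in the style described above (the names and sizes here are ours, purely for illustration), the explicit declarations, sized literals and two-dimensional arrays look as follows:
\begin{minted}{systemverilog}
module example(input clk);
  reg [31:0] x;            // explicitly declared 32-bit register
  reg [31:0] stack [0:3];  // two-dimensional array modelling a small memory

  always @(posedge clk) begin
    x <= 32'd5;            // sized literal: 32 bits wide, value 5
    stack[0] <= x;         // array store, reasoned about separately from x
  end
endmodule
\end{minted}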
\subsection{Semantics}
-Existing operational semantics~\cite{loow19_verif_compil_verif_proces} were adapted for the semantics of the language that CoqUp eventually targets. This semantics is a small-step operational semantics at the clock cycle level, as hardware typically does not terminate in any way, however, within each clock cycle the semantics are constructed in a big-step style semantics. This style of semantics matches the small-step operational semantics of CompCert's register transfer language (RTL) quite well.
+Existing operational semantics~\cite{loow19_verif_compil_verif_proces}~\NR{of Verilog?} were adapted for the language that CoqUp eventually targets. The semantics are small-step at the level of clock cycles, since hardware typically does not terminate; within each clock cycle, however, the execution of the statements is described in a big-step style.~\NR{Confusing sentence. How can you take a big step within a small step?} This style of semantics matches the small-step operational semantics of CompCert's register transfer language (RTL) quite well.
-At the top-level, always blocks describe logic which is run every time some event occurs. The only event that is supported by these semantics is detecting the rising rising edge of the clock, so that we can implement synchronous logic. As soon as an event occurs, the hardware will be executed, meaning if there are multiple always blocks that get triggered by the event, these will run in parallel. However, as the semantics should be deterministic, we impose an order on the always blocks and execute them sequentially. However, to preserve the fact that the statements inside of the always block are executed in parallel, nonblocking assignments to variables need to be kept in a different association map compared to blocking assignments to variables. This preserves the behaviour that blocking assignments change the value of the variable inside of the clock cycle, whereas the nonblocking assignments only take place at the end of the clock cycle, and in parallel. We can denote these two association maps as $s = (\Gamma_{\rm r}, \Gamma_{\rm a}, \Delta_{\rm r}, \Delta_{\rm a})$, where $\Gamma_{\rm r}$ is the current value of the registers, $\Gamma_{\rm a}$ is the current value of the array, and $\Delta_{\rm r}$ and $\Delta_{\rm a}$ are the values of the variables and arrays when the clock cycle ends.
-
-We can then define how one step in the semantics looks like. We therefore first need to define the structure of the main module which will contain the logic for the program. In general, functions that are translated to hardware will require basic handshaking signals so that the translated function can be used in hardware. Firstly, they require an input for the clock, so that all the sequential circuits are run at the right time. They then require a start and reset input, so that the hardware generated from the function can be reused multiple times. Finally, they need a finish and return signal, where finish will go high when the result is ready to be read. In addition to that, the function could take an arbitrary number of inputs which act as arguments to the function, so that the function can be called with different arguments. However, in addition to inputs and outputs to the module, we also need to keep track of some internal signals and properties about the module. Firstly, we need to keep track of the internal variables that contain the current state of the module, and the current contents of the stack. Finally, the module will contain the entry point of the module and the list of module items that declare all of the internal registers and contain the encoding of the state machine that behaves in the same way as the function. We can therefore declare it in the following way:
+At the top level, always blocks describe logic which is run every time some event occurs. The only event supported by these semantics is the rising edge of the clock, so that we can implement synchronous logic.~\NR{So the semantics does not support combinational always blocks or assignments?} As soon as the event occurs, the hardware is executed, meaning that if there are multiple always blocks triggered by the event, these run in parallel.~\NR{Does the semantics also enforce that variables can only be modified within one always block (multiple drivers not allowed)? PL community might think of data races. } However, as the semantics should be deterministic, we impose an order on the always blocks and execute them sequentially.~\NR{Oh, do we sequentialise always execution?} To preserve the fact that the statements inside an always block behave as if executed in parallel, nonblocking assignments to variables need to be kept in a different association map (a finite map from names to values) than blocking assignments.~\NR{Have you discussed what an association map is?} This preserves the behaviour that blocking assignments change the value of the variable within the clock cycle, whereas nonblocking assignments only take effect at the end of the clock cycle, and in parallel. We can denote these association maps as $s = (\Gamma_{\rm r}, \Gamma_{\rm a}, \Delta_{\rm r}, \Delta_{\rm a})$, where $\Gamma_{\rm r}$ holds the current values of the registers, $\Gamma_{\rm a}$ the current values of the arrays, and $\Delta_{\rm r}$ and $\Delta_{\rm a}$ the values that the variables and arrays will take when the clock cycle ends.~\NR{Do blocking and non-blocking assignments each have individual maps? Also, association map is used before definition in this paragraph.}
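The difference between the two kinds of assignment, and hence between the two association maps, can be seen in the following small example (ours, for illustration only):
\begin{minted}{systemverilog}
module assignments(input clk);
  reg [31:0] a, b, c;
  always @(posedge clk) begin
    a = 32'd1;   // blocking: a is updated immediately (in the blocking map),
                 // so the new value is visible to the next statement
    b = a + 1;   // reads the updated a, so b becomes 2 within this cycle
    c <= a + 1;  // nonblocking: the right-hand side is evaluated now, but the
                 // update to c is queued and only applied at the end of the
                 // clock cycle, in parallel with other <= updates
  end
endmodule
\end{minted}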
+We can then define what one step in the semantics looks like. We first need to define the structure of the main module, which will contain the logic for the program. In general, functions that are translated to hardware require basic handshaking signals so that the translated function can be used in hardware. Firstly, they require a clock input, so that all the sequential circuits are run at the right time. They then require a start and a reset input, so that the hardware generated from the function can be reused multiple times. Finally, they need a finish and a return signal, where finish goes high when the result is ready to be read. In addition, the function can take an arbitrary number of inputs which act as arguments, so that it can be called with different arguments. Besides the inputs and outputs of the module, we also need to keep track of some internal signals and properties of the module. Firstly, we need to keep track of the internal variables that contain the current state of the module, and the current contents of the stack.~\NR{Are you referring to the C stack?} Finally, the module contains its entry point, and the list of module items that declare all of the internal registers and contain the encoding of the state machine that behaves in the same way as the function. We can therefore declare it in the following way:
+~\NR{We should distinguish handshaking signals of a Verilog module to arguments of a sw function. In the sw world, an argument is either value or pointer. Values are unmodifiable, and pointers can be only be modified by the function body. In contrast, handshaking signals are constantly active, interacting with external modules. These signals can also be trigger events, such as the clock.}
+~\NR{Also, I don't think a Verilog module is equivalent to a Verilog function. I think they both exist in Verilog. Please double-check this before we use module and function interchangeably in the text above.}
%\JW{I'd be inclined to write `$r~\mathrm{list}$' rather than $\vec{r}$, as it's a little more readable. (Assuming it's more-or-less the same thing?)}
\begin{align*}
@@ -40,6 +43,7 @@ We can then define how one step in the semantics looks like. We therefore first
\end{align*}
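For intuition, the interface of a module with these handshaking signals might look as follows in Verilog (a sketch with illustrative names and widths, not the exact output of the compiler):
\begin{minted}{systemverilog}
module f(input clk, input reset, input start,
         input [31:0] a0,         // one input per argument of the function
         output reg finish,       // goes high when the result can be read
         output reg [31:0] ret);  // return value of the function
  reg [31:0] state;               // internal state register (s_t)
  reg [31:0] stack [0:31];        // internal RAM holding the stack frame
  // ... module items encoding the state machine for the function body ...
endmodule
\end{minted}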
The two main evaluation functions are then \textit{erun}, which takes in the current state together with an expression and returns a value, and \textit{srun}, which takes the current state and a statement as input, and returns the updated state. The inductive rules defining \textit{srun} are shown below, where $\sigma_{n} = (\Gamma_{\rm r}^{n}, \Gamma_{\rm a}^{n}, \Delta_{\rm r}^{n}, \Delta_{\rm a}^{n})$:
+\NR{The evaluation function here relates to the semantics, right? Can it be confused with sw function and Verilog module?}
\begin{gather*}
\label{eq:1}
@@ -66,13 +70,16 @@ The two main evaluation functions are then \textit{erun}, which takes in the cur
% \inferrule[Nonblocking Array]{\yhkeyword{name}\ \textit{lhs} = \yhkeyword{OK}\ n \\ \textit{erun}\ \Gamma\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r}, \Delta_{a})\ (\textit{lhs} \Leftarrow \textit{rhs})\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r} // \{n \rightarrow v_{\textit{rhs}}\}, \Delta_{a})}
\end{gather*}
-Taking the \textsc{CondTrue} rule as an example, this rule will only apply if the Boolean result of running the expression results in a \texttt{true} value. It then also states that the statement in the true branch of the conditional statement \textit{stt} runs from state $\sigma_{0}$ to state $\sigma_{1}$. If both of these conditions hold, we then get that the conditional statement will also run from state $\sigma_{0}$ to state $\sigma_{1}$. The \textsc{Blocking} and \textsc{Nonblocking} rules are a bit more interesting, as these modify the blocking and nonblocking association maps respectively.
+Taking the \textsc{CondTrue} rule as an example, this rule only applies if evaluating the expression results in a \texttt{true} value. It also requires that the statement in the true branch of the conditional, \textit{stt}~\NR{\textit{st} or \textit{stt}?}, runs from state $\sigma_{0}$ to state $\sigma_{1}$. If both of these premises hold~\NR{How are these conditions checked? Can \textit{erun} or \textit{srun} evaluate to false?}, we can conclude that the conditional statement itself also runs from state $\sigma_{0}$ to state $\sigma_{1}$. The \textsc{Blocking} and \textsc{Nonblocking} rules are a bit more interesting, as these modify the blocking and nonblocking association maps respectively.
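As an illustration of that difference, a blocking assignment to a register updates the blocking map $\Gamma_{\rm r}$ directly, whereas the nonblocking version only adds the binding to $\Delta_{\rm r}$. A sketch of the blocking rule, modelled on the shape of the rules above rather than quoted from the Coq development, is:
\begin{gather*}
  \inferrule[Blocking Reg]{\yhkeyword{name}\ \textit{lhs} = \yhkeyword{OK}\ n \\ \textit{erun}\ \Gamma\ \textit{rhs}\ v_{\textit{rhs}}}{\textit{srun}\ (\Gamma_{r}, \Gamma_{a}, \Delta_{r}, \Delta_{a})\ (\textit{lhs} = \textit{rhs})\ (\Gamma_{r} // \{n \rightarrow v_{\textit{rhs}}\}, \Gamma_{a}, \Delta_{r}, \Delta_{a})}
\end{gather*}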
One main difference between these semantics and the Verilog semantics by \citet{loow19_verif_compil_verif_proces} is that there is no function for external nondeterministic effects, such as memories, inputs and outputs. These are instead handled explicitly in the semantics, by using two-dimensional unpacked arrays to model memories and by assuming that inputs to modules cannot change. Another difference is that partial updates to arrays are fully supported, because there are separate queues for arrays and for variables. Originally, if there was a blocking assignment to an array followed by a nonblocking assignment to a different region of the same array, the blocking assignment would disappear at the end of the clock cycle, because the complete array would be overwritten with the updated array stored in the nonblocking association map. In our semantics, however, only the values that changed in the array are recorded in the nonblocking assignment queue, so that when the blocking and nonblocking array association maps are merged, only the indices actually changed by nonblocking assignments are updated in the blocking assignment map.
+\NR{I did not understand this paragraph. Discuss with Yann.}
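A concrete (hypothetical) case of the partial-update problem described above:
\begin{minted}{systemverilog}
module partial_update(input clk);
  reg [31:0] mem [0:3];
  always @(posedge clk) begin
    mem[0] = 32'd1;   // blocking write, visible immediately
    mem[1] <= 32'd2;  // nonblocking write to a *different* index
    // If the nonblocking queue stored the whole array, merging it at the
    // end of the cycle would overwrite mem[0]; recording only the changed
    // index preserves both updates.
  end
endmodule
\end{minted}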
In these semantics, module instantiations are not supported, as they can be modelled by inlining the logic that the instantiated modules would have produced. This means that function calls are also implemented by inlining all the functions. Consequently, recursive function calls are not supported; however, these are not supported by other high-level synthesis tools either, as allocating memory dynamically is not possible with fixed-size RAM blocks.
+\NR{Were these last two paragraphs referring to your assumptions or limitations or simplications you made? Discuss with Yann.}
To integrate our semantics with CompCert, we need to define the same kinds of states that CompCert uses in its semantics, namely \texttt{State}, \texttt{Callstate} and \texttt{Returnstate}. The \texttt{Callstate} and \texttt{Returnstate} are needed because the initial state describes a call to main, and the final state carries the return value of the main function. Even though the Verilog semantics do not support function calls, the initial function call and the final return state still need to be modelled so that the other intermediate languages can be properly shown equivalent to the Verilog semantics. Firstly, the \texttt{State} contains all the information necessary to identify the current execution state of the function or module. In Verilog, we therefore keep track of the current module $m$ we are working on, the current value of the program counter $v$, which translates to the current value of the state register $s_{t}$, the current contents of the stack frame \textit{sf}, and finally the current states of the association maps for variables $\Gamma_{r}$ and arrays $\Gamma_{a}$. We can therefore define the state as \texttt{State} \textit{sf} $m$ $v$ $\Gamma_{r}$ $\Gamma_{a}$. The \texttt{Callstate} only needs to be supported for the initial call to the function, and the \texttt{Returnstate} only for the return of the main function. As function calls are not supported in the semantics, there cannot be any other calling state or return state in the program.
+\NR{We had a lot of discussion on functions earlier in this section, but then now we claim that the Verilog semantics does not support functions. If I recall correctly, our C program can't have functions either, or we inline all functions. Aren't they contradicting? Discuss with Yann.}
We then define the semantics of running the module for one clock cycle in the following way:
@@ -95,6 +102,8 @@ We then define the semantics of running the module for one clock cycle in the fo
The \textsc{Module} rule is the main rule for the execution of one clock cycle of the module. Given that the value of the $s_{t}$ register is the value of the program counter at the current instruction, and that the value of the $s_{t}$ register in the resulting association map is equal to the next program counter value, we can say that if all the module items in the body step from one state to another, then the whole module steps from that state to the other.
+\NR{Bigger picture question: Why don't always blocks show up in the rules?}
+
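Schematically, and using placeholder names for the judgments (\textit{mrun} for stepping a whole module and \textit{mirun} for stepping its list of module items; the Coq development may name these differently), the rule described above has the following shape, with $\sigma_{n} = (\Gamma_{\rm r}^{n}, \Gamma_{\rm a}^{n}, \Delta_{\rm r}^{n}, \Delta_{\rm a}^{n})$ as before:
\begin{gather*}
  \inferrule[Module]{\Gamma_{r}^{0}(s_{t}) = v \\ \Gamma_{r}^{1}(s_{t}) = v' \\ \textit{mirun}\ \sigma_{0}\ (\textit{body}\ m)\ \sigma_{1}}{\textit{mrun}\ \sigma_{0}\ m\ v\ \sigma_{1}\ v'}
\end{gather*}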
%%\input{verilog_notes}
%%% Local Variables: