summaryrefslogtreecommitdiffstats
path: root/algorithm.tex
diff options
context:
space:
mode:
authorYann Herklotz <ymh15@ic.ac.uk>2021-04-17 10:27:17 +0000
committeroverleaf <overleaf@localhost>2021-04-17 10:31:48 +0000
commit204b9b668aa418fbab1ed8b591f2cf7fabb6bc2c (patch)
treefc292d92bdea9cc10c33d7c5855f2f328bd523cb /algorithm.tex
parent811e65af1394197ff32e99dbe89295f9258baaee (diff)
downloadoopsla21_fvhls-204b9b668aa418fbab1ed8b591f2cf7fabb6bc2c.tar.gz
oopsla21_fvhls-204b9b668aa418fbab1ed8b591f2cf7fabb6bc2c.zip
Update on Overleaf.
Diffstat (limited to 'algorithm.tex')
-rw-r--r--algorithm.tex9
1 files changed, 5 insertions, 4 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 6303173..c4c91a4 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -1,7 +1,7 @@
\section{Designing a verified HLS tool}
\label{sec:design}
-This section describes the main architecture of the HLS tool, and the way in which the Verilog back end was added to \compcert{}. This section will also cover an example of converting a simple C program into hardware, expressed in the Verilog language.
+This section describes the main architecture of the HLS tool, and the way in which the Verilog back end was added to \compcert{}. This section also covers an example of converting a simple C program into hardware, expressed in the Verilog language.
\paragraph{Choice of source language}
C was chosen as the source language as it remains the most common source language amongst production-quality HLS tools~\cite{canis11_legup, xilinx20_vivad_high_synth, intel_hls, bambu_hls}. This, in turn, may be because it is ``[t]he starting point for the vast majority of algorithms to be implemented in hardware''~\cite{5522874}, lending a degree of practicality.
@@ -156,7 +156,7 @@ endmodule
\end{figure}
\subsection{Translating C to Verilog, by example}
-Figure~\ref{fig:accumulator_c_rtl} illustrates the translation of a simple program that sums the elements of an array.
+Figure~\ref{fig:accumulator_c_rtl} illustrates the translation of a simple program that stores and retrieves values from an array.
In this section, we describe the stages of the \vericert{} translation, referring to this program as an example.
\subsubsection{Translating C to 3AC}
@@ -293,7 +293,7 @@ Typically, HLS-generated hardware consists of a sea of registers and RAM memorie
This memory view is very different to the C memory model, so we perform the following translation.
Variables that do not have their address taken are kept in registers, which correspond to the registers in 3AC.
All address-taken variables, arrays, and structs are kept in RAM.
-The stack of the main function becomes an unpacked array of 32-bit integers, which is translated to a RAM when the hardware description is passed through a synthesis tool. In a first pass, RAM is represented by direct reads and writes to an array, however, in a RAM insertion pass a proper RAM interface is added, and reads and writes are performed through that interface.
+The stack of the main function becomes an unpacked array of 32-bit integers, which is translated to a RAM when the hardware description is passed through a synthesis tool. In a first pass, \vericert{} represents the RAM by direct reads and writes to an array. This is followed by a RAM insertion pass which inserts a proper RAM interface, and reads and writes are performed through that interface instead. This interface is shown on lines 9-15 in the Verilog code in Figure~\ref{fig:accumulator_v}.
Finally, global variables are not translated in \vericert{} at the moment.
A high-level overview of the architecture can be seen in Figure~\ref{fig:accumulator_diagram}.
@@ -332,7 +332,8 @@ One big difference between C and Verilog is how memory is represented. Although
However, the memory model that \compcert{} uses for its intermediate languages is byte-addre\-ssa\-ble~\cite{blazy05_formal_verif_memor_model_c}. If a byte-addressable memory was used in the target hardware, which is closer to \compcert{}'s memory model, then a load and store would instead take four clock cycles, because a RAM can only perform one read and write per clock cycle. It therefore has to be proven that the byte-addressable memory behaves in the same way as the word-addressable memory in hardware. Any modifications of the bytes in the \compcert{} memory model also have to be shown to modify the word-addressable memory in the same way. Since only integer loads and stores are currently supported in \vericert{}, it follows that the addresses given to the loads and stores should be multiples of four. If that is the case, then the translation from byte-addressed memory to word-addressed memory can be done by dividing the address by four.
\subsubsection{Implementation of RAM interface}
-The simplest way to implement loads and stores in \vericert{} would be to access the Verilog array directly from within the data-path (i.e., inside the always-block on lines 16--32 of Figure~\ref{fig:accumulator_v}). This would be correct, but when a Verilog array is accessed at several program points, the synthesis tool is unlikely to detect that it can be implemented as a RAM block, and will resort to using lots of registers instead, ruining the circuit's area and performance. To avert this, we arrange that the data-path does not access memory directly, but simply sets the address it wishes to access and then toggles the \texttt{u\_en} flag. This activates the RAM interface (lines 9--15 of Figure~\ref{fig:accumulator_v}) on the next falling clock edge, which performs the requested load or store. By factoring all the memory accesses out into a separate interface like this, we ensure that the underlying array is only accessed from a single program point in the Verilog code, and thus ensure that the synthesis tool will correctly infer a RAM block. Without the interface, the array would be implemented using registers, which would increase the size of the hardware considerably.
+The simplest way to implement loads and stores in \vericert{} would be to access the Verilog array directly from within the data-path (i.e., inside the always-block on lines 16--32 of Figure~\ref{fig:accumulator_v}). This would be correct, but when a Verilog array is accessed at several program points, the synthesis tool is unlikely to detect that it can be implemented as a RAM block, and will resort to using lots of registers instead, ruining the circuit's area and performance. To avert this, we arrange that the data-path does not access memory directly, but simply sets the address it wishes to access and then toggles the \texttt{u\_en} flag. This activates the RAM interface (lines 9--15 of Figure~\ref{fig:accumulator_v}) on the next falling clock edge, which performs the requested load or store. By factoring all the memory accesses out into a separate interface like this, we ensure that the underlying array is only accessed from a single program point in the Verilog code, and thus ensure that the synthesis tool will correctly infer a RAM block.
+Interestingly, the syntax for the RAM interface is quite strict, as the synthesis tool will pattern match on it and only work for a predefined set of interfaces. Without the interface, the array would be implemented using registers, which would increase the size of the hardware considerably.
Therefore, an extra compiler pass is added from HTL to HTL to extract all the direct accesses to the Verilog array and replace them by signals which access the RAM interface in a separate always-block. The translation is performed by going through all the instructions and replacing each load and store expression in turn. Stores can simply be replaced by the necessary wires directly, however, because loads using the RAM interface take two clock cycles where a direct load from an array takes only one, this pass inserts an extra state after each load.