author     Yann Herklotz <git@yannherklotz.com>  2021-04-15 15:54:55 +0100
committer  Yann Herklotz <git@yannherklotz.com>  2021-04-15 15:54:55 +0100
commit     5a8c95178395dc095c236b98e7da046467a746a9
tree       c58e662e31d85542311a5ca2543ca8657bb7f00b /algorithm.tex
parent     58896b5c859b43dc72e0864bb27539277c1d5730
download   oopsla21_fvhls-5a8c95178395dc095c236b98e7da046467a746a9.tar.gz
           oopsla21_fvhls-5a8c95178395dc095c236b98e7da046467a746a9.zip
Finish algorithm section
Diffstat (limited to 'algorithm.tex')
-rw-r--r-- algorithm.tex 10
1 files changed, 7 insertions, 3 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 21b25eb..8a619e4 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -46,7 +46,7 @@ The .NET framework has been used as a basis for other HLS tools, such as Kiwi~\c
\node[language] at (6.7,-1.5) (verilog) {Verilog};
\node at (0,1) {\bf\compcert{}};
\node at (0,-1.5) {\bf\vericert{}};
- \node[align=center] at (3.3,-2.4) {\footnotesize RAM\\[-0.5em]\footnotesize generation};
+ \node[align=center] at (3.3,-2.4) {\footnotesize RAM\\[-0.5em]\footnotesize insertion};
\draw[->,thick] (clight) -- (conta);
\draw[->,thick] (conta) -- (cminor);
\draw[->,thick] (cminor) -- (rtl);
@@ -292,7 +292,7 @@ Typically, HLS-generated hardware consists of a sea of registers and RAM memorie
This memory view is very different to the C memory model, so we perform the following translation.
Variables that do not have their address taken are kept in registers, which correspond to the registers in 3AC.
All address-taken variables, arrays, and structs are kept in RAM.
-The stack of the main function becomes an unpacked array of 32-bit integers, which is translated to a RAM when the hardware description is passed through a synthesis tool. Initially, \vericert{} translates loads and stores to direct accesses to the memory. An extra compiler pass is added to generate a well-formed RAM template to ensure that a RAM is correctly inferred by the synthesis tool. Loads and stores are performed using the signals that the RAM template exposes, instead of directly modifying the array.
+The stack of the main function becomes an unpacked array of 32-bit integers, which is translated to a RAM when the hardware description is passed through a synthesis tool. In this pass, loads and stores are translated to direct accesses to the Verilog array representing memory.
Finally, global variables are not translated in \vericert{} at the moment.
A high-level overview of the architecture can be seen in Figure~\ref{fig:accumulator_diagram}.
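The memory mapping described above can be illustrated with a minimal Python sketch. It is not Vericert's implementation: the names `place_variables` and `StackMemory` are hypothetical, and the sketch assumes that all accesses are aligned 32-bit words and that a byte offset into the stack selects word `offset / 4` of the array.

```python
from typing import Dict, List, Set

MASK32 = 0xFFFFFFFF


def place_variables(variables: List[str], address_taken: Set[str],
                    aggregates: Set[str]) -> Dict[str, str]:
    """Scalars whose address is never taken stay in registers;
    address-taken variables and aggregates (arrays, structs) go to RAM."""
    return {
        v: ("ram" if v in address_taken or v in aggregates else "register")
        for v in variables
    }


class StackMemory:
    """The stack frame as an array of 32-bit words (hypothetical model).

    Assumption: only aligned 32-bit accesses, so byte offset o maps to word o // 4.
    """

    def __init__(self, size_in_bytes: int) -> None:
        assert size_in_bytes % 4 == 0
        self.words: List[int] = [0] * (size_in_bytes // 4)

    def load(self, byte_offset: int) -> int:
        assert byte_offset % 4 == 0, "only aligned accesses in this sketch"
        return self.words[byte_offset // 4]

    def store(self, byte_offset: int, value: int) -> None:
        assert byte_offset % 4 == 0, "only aligned accesses in this sketch"
        # Keep only the low 32 bits, as a 32-bit hardware array would.
        self.words[byte_offset // 4] = value & MASK32
```

For example, a loop counter `i` stays in a register, while a local array `arr` and a scalar `x` whose address is passed to another function both end up in the stack array.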
@@ -307,10 +307,14 @@ In addition to that, equality between \emph{unsigned} variables are actually not
Finally, the \texttt{mulhs} and \texttt{mulhu} instructions are not translated by \vericert{} either. These instructions return the upper 32 bits of the 64-bit product of a 32-bit multiplication. However, 64-bit number representations are not yet supported in the generated hardware, so this operation cannot be performed. These instructions are only generated to optimise divisions by a constant that is not a power of two, so turning off constant propagation allows these programs to pass without error.
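The semantics of the two instructions, and the reason they appear when dividing by a constant, can be sketched in Python. The sketch assumes 32-bit two's-complement operands; the helper names are hypothetical, and the constant used for dividing by 3 is the textbook multiply-by-reciprocal choice, not necessarily the one CompCert emits. Note that the intermediate product `a * b` needs 64 bits, which is exactly what the generated hardware cannot represent.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mulhu(a: int, b: int) -> int:
    """Upper 32 bits of the 64-bit unsigned product of two 32-bit values."""
    return ((a & MASK32) * (b & MASK32)) >> 32


def mulhs(a: int, b: int) -> int:
    """Upper 32 bits of the 64-bit signed product, as a 32-bit bit pattern."""
    return ((to_signed32(a) * to_signed32(b)) >> 32) & MASK32


def udiv3(x: int) -> int:
    """Unsigned x // 3 without a divider (hypothetical example of the optimisation).

    0xAAAAAAAB = ceil(2**33 / 3); taking the high word of the product and
    shifting right by one more bit computes floor(x * 0xAAAAAAAB / 2**33),
    which equals x // 3 for every 32-bit unsigned x.
    """
    return mulhu(x, 0xAAAAAAAB) >> 1
```

Disabling constant propagation avoids this rewrite, so the division is kept as an ordinary divide.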
+\subsubsection{RAM insertion}
+
+This pass goes from HTL back to HTL. It extracts all the direct accesses to the Verilog array implementing memory and replaces them with signals that access the memory in a separate always-block. This ensures that the synthesis tool correctly identifies the array as a RAM, instead of implementing it directly in logic. The translation goes through all the instructions and replaces each load and store expression in turn. Stores can be replaced directly by the necessary wires. However, a load through the RAM block takes two clock cycles, whereas a direct load from an array takes only one. This pass therefore inserts an extra state after each load.
+
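A minimal sketch of the RAM insertion pass on a toy state machine follows. The representation (a `State` holding a list of statement tuples and a successor) and the RAM signal names `ram_addr`, `ram_wr_en`, `ram_d_in` and `ram_d_out` are hypothetical and much simpler than HTL; the sketch also assumes at most one load per state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical statements: ("load", dst, addr), ("store", addr, src), ("assign", dst, expr).
Stmt = Tuple[str, ...]


@dataclass
class State:
    stmts: List[Stmt]
    next: int  # successor state; -1 means the FSM halts (assumption)


def insert_ram(fsm: Dict[int, State]) -> Dict[int, State]:
    """Replace direct array accesses with RAM-port signals (sketch).

    A store drives the RAM write signals in the same state. A load only
    drives the address, and a fresh state is inserted that copies the RAM
    output into the destination register one cycle later.
    """
    out: Dict[int, State] = {}
    fresh = max(fsm, default=0) + 1  # new state ids never clash with old ones
    for sid, st in sorted(fsm.items()):
        stmts: List[Stmt] = []
        pending_dst: Optional[str] = None
        for s in st.stmts:
            if s[0] == "store":
                _, addr, src = s
                stmts += [("assign", "ram_wr_en", "1"),
                          ("assign", "ram_addr", addr),
                          ("assign", "ram_d_in", src)]
            elif s[0] == "load":
                _, dst, addr = s
                stmts += [("assign", "ram_wr_en", "0"),
                          ("assign", "ram_addr", addr)]
                pending_dst = dst
            else:
                stmts.append(s)
        if pending_dst is None:
            out[sid] = State(stmts, st.next)
        else:
            # Redirect to the extra state, which then continues to the old successor.
            out[sid] = State(stmts, fresh)
            out[fresh] = State([("assign", pending_dst, "ram_d_out")], st.next)
            fresh += 1
    return out
```

On a two-state machine that loads `x` in state 1 and stores it in state 2, the pass redirects state 1 to a new state 3, which copies `ram_d_out` into `x` and then continues to state 2.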
\subsubsection{Translating HTL to Verilog}
Finally, we have to translate the HTL code into proper Verilog. % and prove that it behaves the same as the 3AC according to the Verilog semantics.
-The challenge here is to translate our FSMD representation into a Verilog AST. However, as all the instructions in HTL are already expressed as Verilog statements, only the top level data-path and control logic maps need to be translated to valid Verilog. We also require declarations for all the variables in the program, as well as declarations of the inputs and outputs to the module, so that the module can be used inside a larger hardware design.
+The challenge here is to translate our FSMD representation into a Verilog AST. However, as all the instructions in HTL are already expressed as Verilog statements, only the top level data-path and control logic maps need to be translated to valid Verilog. We also require declarations for all the variables in the program, as well as declarations of the inputs and outputs to the module, so that the module can be used inside a larger hardware design. In addition to translating the maps of Verilog statements, we must also create an always-block that behaves like the RAM, because the RAM is only modelled abstractly at the HTL level.
Figure~\ref{fig:accumulator_v} shows the final Verilog output that is generated for our example.
Although this translation seems quite straight\-forward, proving that this translation is correct is complex.
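The behaviour of the generated RAM always-block can be approximated by the cycle-level Python model below, which also shows why each load needs the extra state. The model assumes a single port with a registered read output; the class, function and signal names are hypothetical.

```python
from typing import List, Tuple

MASK32 = 0xFFFFFFFF


class SyncRam:
    """Cycle-level model of a RAM always-block with a registered read port.

    On each rising clock edge, a write stores d_in at addr; otherwise the
    word at addr is latched into d_out, which the state machine can only
    observe in the following cycle.
    """

    def __init__(self, words: int) -> None:
        self.mem: List[int] = [0] * words
        self.addr = 0
        self.wr_en = False
        self.d_in = 0
        self.d_out = 0

    def clock_edge(self) -> None:
        if self.wr_en:
            self.mem[self.addr] = self.d_in & MASK32
        else:
            self.d_out = self.mem[self.addr]


def store_then_load(value: int, address: int) -> Tuple[int, int]:
    """Store a word, then load it back.

    Returns (d_out during the load state, d_out in the inserted state).
    """
    ram = SyncRam(words=4)
    # Cycle 1: the store state drives the write signals.
    ram.addr, ram.wr_en, ram.d_in = address, True, value
    ram.clock_edge()
    # Cycle 2: the load state only drives the address; d_out is still stale.
    ram.addr, ram.wr_en = address, False
    stale = ram.d_out
    ram.clock_edge()
    # Cycle 3: the inserted state can now copy d_out into the destination.
    return stale, ram.d_out
```

Calling `store_then_load(42, 2)` returns `(0, 42)`: the stored value only becomes visible on `d_out` one cycle after the load state has set the address.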