summaryrefslogtreecommitdiffstats
path: root/algorithm.tex
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2021-04-16 17:13:13 +0100
committerYann Herklotz <git@yannherklotz.com>2021-04-16 17:13:13 +0100
commit0152bb5c0e2e4dcbd95e53f6d668e02d1fca6beb (patch)
tree81ae9a05ae55737a480362053fdeb54e3dd94848 /algorithm.tex
parentefa1e45a13c073ab6bec67eec9f195a4dc8c7039 (diff)
downloadoopsla21_fvhls-0152bb5c0e2e4dcbd95e53f6d668e02d1fca6beb.tar.gz
oopsla21_fvhls-0152bb5c0e2e4dcbd95e53f6d668e02d1fca6beb.zip
Add back text
Diffstat (limited to 'algorithm.tex')
-rw-r--r--algorithm.tex3
1 files changed, 2 insertions, 1 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 2c7cea2..4d07424 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -331,7 +331,8 @@ Although we would not claim that \vericert{} is a proper `optimising' HLS compil
One big difference between C and Verilog is how memory is represented. Although Verilog arrays use similar syntax to C arrays, they must be treated quite differently. To make loads and stores as efficient as possible, the RAM needs to be word-addressable, which means that an entire integer can be loaded or stored in one clock cycle.
However, the memory model that \compcert{} uses for its intermediate languages is byte-addre\-ssa\-ble~\cite{blazy05_formal_verif_memor_model_c}. If a byte-addressable memory was used in the target hardware, which is closer to \compcert{}'s memory model, then a load and store would instead take four clock cycles, because a RAM can only perform one read and write per clock cycle. It therefore has to be proven that the byte-addressable memory behaves in the same way as the word-addressable memory in hardware. Any modifications of the bytes in the \compcert{} memory model also have to be shown to modify the word-addressable memory in the same way. Since only integer loads and stores are currently supported in \vericert{}, it follows that the addresses given to the loads and stores should be multiples of four. If that is the case, then the translation from byte-addressed memory to word-addressed memory can be done by dividing the address by four.
-The simplest way to implement loads and stores in \vericert{} would be to access the Verilog array directly in the data-path (i.e., within the always-block on lines 16--32 of Figure~\ref{fig:accumulator_v}). This would be correct, but when a Verilog array is accessed at several program points, the synthesis tool is unlikely to detect that it can be implemented as a RAM block, and will resort to using lots of registers instead, ruining the circuit's area and performance. To avert this, we arrange that the data-path doesn't access memory directly, but simply sets the address it wishes to access and then toggles the \texttt{u\_en} flag. This activates the RAM driver (lines 9--15 of Figure~\ref{fig:accumulator_v}) on the next falling clock edge, which performs the requested load or store. By factoring all the memory accesses out into a separate driver like this, we ensure that the underlying array is only accessed from a single program point in the Verilog code, and thus ensure that the synthesis tool will correctly infer a RAM block.
+\subsubsection{Implementation of RAM templates}
+The simplest way to implement loads and stores in \vericert{} would be to access the Verilog array directly from within the data-path (i.e., inside the always-block on lines 16--32 of Figure~\ref{fig:accumulator_v}). This would be correct, but when a Verilog array is accessed at several program points, the synthesis tool is unlikely to detect that it can be implemented as a RAM block, and will resort to using lots of registers instead, ruining the circuit's area and performance. To avert this, we arrange that the data-path doesn't access memory directly, but simply sets the address it wishes to access and then toggles the \texttt{u\_en} flag. This activates the RAM driver (lines 9--15 of Figure~\ref{fig:accumulator_v}) on the next falling clock edge, which performs the requested load or store. By factoring all the memory accesses out into a separate driver like this, we ensure that the underlying array is only accessed from a single program point in the Verilog code, and thus ensure that the synthesis tool will correctly infer a RAM block.
Therefore, an extra compiler pass is added from HTL to HTL to extracts all the direct accesses to the Verilog array implementing memory and replaces them by signals which access the memory in a separate always-block. This ensures that the synthesis tool correctly identifies the array as being a RAM, so that it is not implemented by logic directly. The translation is performed by going through all the instructions and replacing each load and store expression one after another. Stores can simply be replaced by the necessary wires directly, however, loads using the RAM block take two clock cycles instead of a direct load from an array which only takes one clock cycles. This pass therefore creates a extra state which is inserted after each load.