summaryrefslogtreecommitdiffstats
path: root/algorithm.tex
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2020-07-04 16:03:31 +0100
committerYann Herklotz <git@yannherklotz.com>2020-07-04 16:03:31 +0100
commit3bdb4e216ae8d65ce0feeba65d47d7613bd07365 (patch)
tree0e9beab02c2585629b5ed0989516283154a5bf5d /algorithm.tex
parent276279c5a676c4a46361c97456da76498e972fea (diff)
downloadoopsla21_fvhls-3bdb4e216ae8d65ce0feeba65d47d7613bd07365.tar.gz
oopsla21_fvhls-3bdb4e216ae8d65ce0feeba65d47d7613bd07365.zip
Verilog section
Diffstat (limited to 'algorithm.tex')
-rw-r--r--algorithm.tex4
1 files changed, 3 insertions, 1 deletions
diff --git a/algorithm.tex b/algorithm.tex
index 727e067..9043eaf 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -119,10 +119,12 @@ The translation can be described by a relational specification, which can be des
In the accumulator C code, the for loop is translated to a branch statement in state 3, which can also be translated to equivalent HTL code by placing the comparison into the control logic part of the main function, and not performing any data operations in the data-flow section. On the next clock cycle, the state will therefore transition to state 7 or state 2 depending on if \texttt{reg\_3} is less than 3 or not. One thing to note is that in this case, the comparison is signed. By default, all operators and registers in Verilog and HTL are unsigned, so to force an operation to handle the bits as signed, both operators have to be forced to signed. In addition to that, Verilog resizes expressions to the largest needed size by default, which can affect the result of the computation. This feature is also not supported by the Verilog semantics we used, and there would therefore be a mismatch between the Verilog semantics and the actual behaviour of Verilog according to the standard. To bypass this issue braces are used to stop the Verilog simulator or synthesis tool from resizing anything inside the braces. Instead, explicit resizing is used in the semantics and operations can only be performed on two registers that have the same size.
-Finally, this example also contains an array, which is stored on the stack for the main function. Translating arrays means that load and store instructions need to be supported. In hardware RAM is not as available as in software, and needs to be explicitly implemented by declaring a two dimensional array with specific properties, such as only being able to write and read from it once per clock cycle. Therefore, to make memory as efficient as possible and support reads and writes in one clock cycle, a memory only addressable by word needs to be implemented, instead of the standard RAM which is addressable by byte. This is because the clock cycles in devices such as FPGAs are normally an order of magnitude slower than CPUs, meaning taking four cycles to load an integer is not feasible. However, this leads to various problems, the main one being that addresses need to be word-aligned, which is not the case in RTL or in C. Therefore, addresses need to be divisible by four to also be valid addresses in HTL, which is shown in the load from the stack in state 6 of the RTL. The equivalent HTL code is shown in the datapath branch where the same addition is done, however, the result is divided by 4. To prove that these two loads have the same behaviour, we have to prove that
+Finally, this example also contains an array, which is stored on the stack for the main function. Translating arrays means that load and store instructions need to be supported. In hardware RAM is not as available as in software, and needs to be explicitly implemented by declaring a two dimensional array with specific properties, such as only being able to write and read from it once per clock cycle. Therefore, to make memory as efficient as possible and support reads and writes in one clock cycle, a memory only addressable by word needs to be implemented, instead of the standard RAM which is addressable by byte. This is because the clock cycles in devices such as FPGAs are normally an order of magnitude slower than CPUs, meaning taking four cycles to load an integer is not feasible. However, this leads to various problems, the main one being that addresses need to be word-aligned, which is not the case in RTL or in C. Therefore, addresses need to be divisible by four to also be valid addresses in HTL, which is shown in the load from the stack in state 6 of the RTL. The equivalent HTL code is shown in the datapath branch where the same addition is done, however, the result is divided by 4. To prove that these two loads have the same behaviour, we have to prove that the load was word-aligned and therefore could be divided by 4.\YH{Elaborate on our solution.}
\subsection{Verilog}
+Finally, we have to prove the translation from Verilog to hardware correct
+
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"