summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorYann Herklotz <git@yannherklotz.com>2022-05-06 15:31:30 +0100
committerYann Herklotz <git@yannherklotz.com>2022-05-06 15:31:30 +0100
commitf6add5b74e9ec6e251d8c744d3084f5b869f8d48 (patch)
treef6ede425e9d77d14b41c2c50ce1ac5e67fa45e50
parent29abf1b418989b544295e004a215c0e9cfb18280 (diff)
downloadlsr22_fvhls-f6add5b74e9ec6e251d8c744d3084f5b869f8d48.tar.gz
lsr22_fvhls-f6add5b74e9ec6e251d8c744d3084f5b869f8d48.zip
Fix more maths
-rw-r--r--chapters/hls.tex70
-rw-r--r--chapters/schedule.tex7
2 files changed, 39 insertions, 38 deletions
diff --git a/chapters/hls.tex b/chapters/hls.tex
index 6a86107..ad2b32c 100644
--- a/chapters/hls.tex
+++ b/chapters/hls.tex
@@ -44,17 +44,17 @@ pointers, recursive function calls, non-32-bit integers, floats, and global vari
input software and the word-addressed memory that we implement in the output hardware, different
handling of unsigned comparisons between C and Verilog, and carefully implementing memory reads
and writes so that these behave properly as a RAM in hardware.
-\item In \in{Section}[sec:hls:evaluation], we evaluate Vericert on the PolyBench/C benchmark
- suite~\cite[pouchet13_polyh], and compare the performance of our generated hardware against an
- existing, unverified HLS tool called LegUp~\cite[canis11_legup]. We show that Vericert generates
- hardware that is \slowdownOrig$\times$ slower (\slowdownDiv$\times$ slower in the absence of
- division) and \areaIncr$\times$ larger than that generated by LegUp. This performance gap can be
- largely attributed to Vericert's current lack of support for instruction-level parallelism and the
- absence of an efficient, pipelined division operator. We intend to close this gap in the future by
- introducing (and verifying) HLS optimisations of our own, such as scheduling and memory analysis.
- This section also reports on our campaign to fuzz-test Vericert using over a hundred thousand
- random C programs generated by Csmith~\cite{yang11_findin_under_bugs_c_compil} in order to confirm
- that its correctness theorem is watertight.
+%\item In \in{Section}[sec:hls:evaluation], we evaluate Vericert on the PolyBench/C benchmark
+% suite~\cite[pouchet13_polyh], and compare the performance of our generated hardware against an
+% existing, unverified HLS tool called LegUp~\cite[canis11_legup]. We show that Vericert generates
+% hardware that is \slowdownOrig$\times$ slower (\slowdownDiv$\times$ slower in the absence of
+% division) and \areaIncr$\times$ larger than that generated by LegUp. This performance gap can be
+% largely attributed to Vericert's current lack of support for instruction-level parallelism and the
+% absence of an efficient, pipelined division operator. We intend to close this gap in the future by
+% introducing (and verifying) HLS optimisations of our own, such as scheduling and memory analysis.
+% This section also reports on our campaign to fuzz-test Vericert using over a hundred thousand
+% random C programs generated by Csmith~\cite{yang11_findin_under_bugs_c_compil} in order to confirm
+% that its correctness theorem is watertight.
\stopitemize
\section[sec:hls:design]{Designing a Verified HLS Tool}
@@ -190,7 +190,7 @@ control-flow using if-statements and case-statements.
implementation in Verilog and it's corresponding state diagram.}]
\startfloatcombination[nx=2, ny=1]
\startplacesubfigure
- \startframedtext[width={0.9\textwidth},frame=off,offset=none,loffset=3cm,bodyfont=11pt]
+ \startframedtext[width={0.6\textwidth},frame=off,offset=none,bodyfont=11pt]
\starthlverilog
module main(input rst, input y,
input clk, output reg z);
@@ -649,7 +649,7 @@ Vericert, the equivalence of the representation in terms of divisions and shifts
integers and the specification, thereby making it simpler to prove the correctness of the Verilog
implementation in terms of shifts.
-\section[title={A Formal Semantics for Verilog},reference={sec:verilog}]
+\section[title={A Formal Semantics for Verilog},reference={sec:hls:verilog}]
This section describes the Verilog semantics that was chosen for the target language, including the
changes that were made to the semantics to make it a suitable HLS target. The Verilog standard is
@@ -707,8 +707,8 @@ $v$.
Finally, the following rule dictates how the whole module runs in one clock cycle:
\placeformula\startformula \text{\sc Module }\ \frac{(\Gamma, \epsilon, \vec{m})\
- \downarrow_{\text{module}} (\Gamma', \Delta')}{(\Gamma, \text{\tt module } \text{\tt main}
- \text{\tt (...);}\ \vec{m}\ \text{\tt endmodule}) \downarrow_{\text{program}} (\Gamma'\ //\
+ \downarrow_{\text{module}} (\Gamma', \Delta')}{\left(\Gamma, \startmatrix\NC\text{\tt module}\ \text{\tt main}
+ \text{\tt (...);}\NR\NC\vec{m}\NR\NC \text{\tt endmodule}\NR\stopmatrix\right) \downarrow_{\text{program}} (\Gamma'\ //\
\Delta')} \stopformula
where $\Gamma$ is the initial state of all the variables, $\epsilon$ is the empty map because the
@@ -755,10 +755,12 @@ described in \in{Section}[sec:algorithm:optimisation:ram]), support for negative
always-blocks was added to the semantics. This is shown in the modifications of the {\sc Module}
rule shown below:
-\placeformula\startformula \text{\sc Module }\ \frac{(\Gamma, \epsilon, \vec{m})\
- \downarrow_{\text{module}^{+}} (\Gamma', \Delta') \\ (\Gamma'\ //\ \Delta', \epsilon, \vec{m})
- \downarrow_{\text{module}^{-}} (\Gamma'', \Delta'')}{(\Gamma, \text{\tt module}\ \text{\tt main}
- \text{\tt (...);}\ \vec{m}\ \text{\tt endmodule}) \downarrow_{\text{program}} (\Gamma''\ //\
+\placeformula\startformula \text{\sc Module }\ \frac{
+\startmatrix
+(\NC \Gamma, \epsilon, \vec{m})\
+ \downarrow_{\text{module}^{+}} (\Gamma', \Delta') \NR\NC (\Gamma'\ //\ \Delta', \epsilon, \vec{m})
+ \downarrow_{\text{module}^{-}} (\Gamma'', \Delta'')\NR\stopmatrix}{\left(\Gamma, \startmatrix\NC\text{\tt module}\ \text{\tt main}
+ \text{\tt (...);}\NR\NC\vec{m}\NR\NC \text{\tt endmodule}\NR\stopmatrix\right) \downarrow_{\text{program}} (\Gamma''\ //\
\Delta'')} \stopformula
The main execution of the module $\downarrow_{\text{module}}$ is split into
@@ -789,7 +791,7 @@ inputs for Verilog modules.
Finally, we use 32-bit integers to represent bitvectors rather than arrays of booleans. This is
because Vericert (currently) only supports types represented by 32 bits.
-\subsection[sec:verilog:integrating]{Integrating the Verilog Semantics into CompCert's Model}
+\subsection[sec:hls:verilog:integrating]{Integrating the Verilog Semantics into CompCert's Model}
\startplacefigure[reference={fig:inference_module},title={Top-level semantics used in CompCert to
initiate the first call to \type{main} and well as return the final result of the \type{main}
@@ -880,7 +882,7 @@ for the \type{main} function of program $P$, where this \type{main} function nee
current translation unit. The final state results in the program output of value $n$ when reaching a
\type{Returnstate} with an empty stack frame.
-\subsection[sec:verilog:memory]{Memory Model}
+\subsection[sec:hls:verilog:memory]{Memory Model}
The Verilog semantics do not define a memory model for Verilog, as this is not needed for a hardware
description language. There is no preexisting architecture that Verilog will produce; it can
@@ -983,7 +985,7 @@ $\Delta_{\rm a}$. The optional values are present to ensure correct merging of t
maps at the end of the clock cycle. The invariant used in the proofs is that block 0 should be
equivalent to the merged representation of the $\Gamma_{\rm a}$ and $\Delta_{\rm a}$ maps.
-\section[sec:proof]{Correctness Proof}
+\section[sec:hls:proof]{Correctness Proof}
Now that the Verilog semantics have been adapted to the CompCert model, we are in a position to
formally prove the correctness of our C-to-Verilog compilation. This section describes the main
@@ -998,7 +1000,7 @@ Verilog and CompCert's existing intermediate languages.
\startitemize
\item
- As already mentioned in \in{Section}[sec:verilog:memory], because the memory model in our Verilog
+ As already mentioned in \in{Section}[sec:hls:verilog:memory], because the memory model in our Verilog
semantics is finite and concrete, but the CompCert memory model is more abstract and infinite with
additional bounds, the equivalence of these models needs to be proven. Moreover, our memory is
word-addressed for efficiency reasons, whereas CompCert's memory is byte-addressed.
@@ -1067,13 +1069,13 @@ simulation result (following standard practice), which makes it easier to prove.
forward simulation, it suffices to prove forward simulations between each pair of consecutive
intermediate languages, as these results can be composed to prove the correctness of the whole HLS
tool. The forward simulation from 3AC to HTL is stated in \in{Lemma}[lemma:htl]
-(\in{Section}[sec:proof:3ac_htl]), the forward simulation for the RAM insertion is shown in
-\in{Lemma}[lemma:htl_ram] (\in{Section}[sec:proof:ram_insertion]), then the forward simulation
-between HTL and Verilog is shown in \in{Lemma}[lemma:verilog] (\in{Section}[sec:proof:htl_verilog]),
+(\in{Section}[sec:hls:proof:3ac_htl]), the forward simulation for the RAM insertion is shown in
+\in{Lemma}[lemma:htl_ram] (\in{Section}[sec:hls:proof:ram_insertion]), then the forward simulation
+between HTL and Verilog is shown in \in{Lemma}[lemma:verilog] (\in{Section}[sec:hls:proof:htl_verilog]),
and finally, the proof that Verilog is deterministic is given in \in{Lemma}[lemma:deterministic]
-(\in{Section}[sec:proof:deterministic]).
+(\in{Section}[sec:hls:proof:deterministic]).
-\subsection[sec:proof:3ac_htl]{Forward Simulation from 3AC to HTL}
+\subsection[sec:hls:proof:3ac_htl]{Forward Simulation from 3AC to HTL}
As HTL is quite far removed from 3AC, this first translation is the most involved and therefore
requires a larger proof, because the translation from 3AC instructions to Verilog statements needs
@@ -1098,7 +1100,7 @@ show that the specification implies the desired simulation result. This strategy
standard practice.
\stopproof
-\subsubsection[sec:proof:3ac_htl:specification]{From Implementation to Specification}
+\subsubsection[sec:hls:proof:3ac_htl:specification]{From Implementation to Specification}
The specification for the translation of 3AC instructions into HTL data-path and control logic can
be defined by the following predicate:
@@ -1108,8 +1110,8 @@ be defined by the following predicate:
Here, the $\mathit{control}$ and $\mathit{data}$ parameters are the statements that the current 3AC
instruction $i$ should translate to. The other parameters are the special registers defined in
-Section~\goto{{[}sec:verilog:integrating{]}}[sec:verilog:integrating]. An example of a rule
-describing the translation of an arithmetic/logical operation from 3AC is the following:
+\in{Section}[sec:hls:verilog:integrating]. An example of a rule describing the translation of an
+arithmetic/logical operation from 3AC is the following:
\startformula
\text{\sc Iop}\ \frac{\text{\tt tr\_op}\ \mathit{op}\ \vec{a} = \text{\tt OK}\ e}{\text{\tt spec\_instr}\
@@ -1190,7 +1192,7 @@ This simulation diagram is proven by induction over the operational semantics of
matching state.
\stopproof
-\subsection[sec:proof:ram_insertion]{Forward Simulation of RAM Insertion}
+\subsection[sec:hls:proof:ram_insertion]{Forward Simulation of RAM Insertion}
\startplacefigure[reference={fig:htl_ram_spec},title={Specification for the memory implementation in
HTL, where $r$ is the RAM, which is then implemented by equivalent Verilog code.}]
@@ -1275,7 +1277,7 @@ introduced by the RAM.
\stopformula
\stoplemma
-\subsection[sec:proof:htl_verilog]{Forward Simulation from HTL to Verilog}
+\subsection[sec:hls:proof:htl_verilog]{Forward Simulation from HTL to Verilog}
The HTL-to-Verilog simulation is conceptually simple, as the only transformation is from the map
representation of the code to the case-statement representation. The proof is more involved, as the
@@ -1301,7 +1303,7 @@ converted.
constructors need to be called to get to the correct case.
\stopproof
-\subsection[sec:proof:deterministic]{Deterministic Verilog Semantics}
+\subsection[sec:hls:proof:deterministic]{Deterministic Verilog Semantics}
The final lemma we need is that the \pindex{determinism}Verilog semantics is deterministic. This
result allows us to replace the forwards simulation we have proved with the backwards simulation we
diff --git a/chapters/schedule.tex b/chapters/schedule.tex
index c1bb4c3..73cb4b2 100644
--- a/chapters/schedule.tex
+++ b/chapters/schedule.tex
@@ -12,10 +12,9 @@
The structure of the thesis has been described in the previous chapters. Out of these chapters, the
work for \in{Chapter}[sec:hls] has been completed, the work for \in{Chapter}[sec:scheduling] is
-still ongoing and finally the work in \in{Chapter}[sec:pipelining] and \in{Chapter}[sec:dynamic] has
-not been started yet. This chapter will describe how the rest of the work will be completed. The
-current plan is to finish all the work including the thesis by June 2023, which is when the funding
-will run out.
+still ongoing and finally the work in \in{Chapter}[sec:pipelining] has not been started yet. This
+chapter will describe how the rest of the work will be completed. The current plan is to finish all
+the work including the thesis by June 2023, which is when the funding will run out.
\section{Timeline}