summaryrefslogtreecommitdiffstats
path: root/chapters/hls.tex
diff options
context:
space:
mode:
Diffstat (limited to 'chapters/hls.tex')
-rw-r--r--chapters/hls.tex98
1 files changed, 51 insertions, 47 deletions
diff --git a/chapters/hls.tex b/chapters/hls.tex
index ef75e78..5e7237a 100644
--- a/chapters/hls.tex
+++ b/chapters/hls.tex
@@ -18,12 +18,12 @@
We have designed a new HLS tool in the Coq theorem prover and proved that any output design it
produces always has the same behaviour as its input program. Our tool, called Vericert, is
automatically extracted to an OCaml program from Coq, which ensures that the object of the proof is
-the same as the implementation of the tool. Vericert is built by extending the CompCert verified C
-compiler~\cite[leroy09_formal_verif_realis_compil] with a new hardware-specific intermediate
-language and a Verilog back end. It supports most C constructs, including integer operations,
-function calls (which are all inlined), local arrays, structs, unions, and general control-flow
-statements, but currently excludes support for case statements, function pointers, recursive
-function calls, non-32-bit integers, floats, and global variables.
+the same as the implementation of the tool. Vericert is built by extending the
+\index{CompCert}CompCert verified C compiler~\cite[leroy09_formal_verif_realis_compil] with a new
+hardware-specific intermediate language and a Verilog back end. It supports most C constructs,
+including integer operations, function calls (which are all inlined), local arrays, structs, unions,
+and general control-flow statements, but currently excludes support for case statements, function
+pointers, recursive function calls, non-32-bit integers, floats, and global variables.
\paragraph{Contributions and Outline.} The contributions of this paper are as follows:
@@ -142,8 +142,8 @@ formal semantics.
\paragraph{Architecture of Vericert.} The main work flow of Vericert is given in
Fig.~\ref{fig:rtlbranch}, which shows those parts of the translation that are performed in CompCert,
and those that have been added. This includes translations to two new intermediate languages added
-in Vericert, HTL and Verilog, as well as an additional optimisation pass labelled as \quotation{RAM
-insertion}.
+in Vericert, \lindex{HTL}HTL and \lindex{Verilog}Verilog, as well as an additional optimisation pass
+labelled as \quotation{RAM insertion}.
\def\numcompcertlanguages{ten}
@@ -152,9 +152,9 @@ assembly output via a sequence of intermediate languages; we must decide which o
\numcompcertlanguages{} languages is the most suitable starting point for the HLS-specific
translation stages.
-We select CompCert's three-address code (3AC)\footnote{This is known as register transfer language
- (RTL) in the CompCert literature. `3AC' is used in this paper instead to avoid confusion with
- register-transfer level (RTL), which is another name for the final hardware target of the HLS
+We select CompCert's \lindex{RTL}three-address code (3AC)\footnote{This is known as register transfer
+ language (RTL) in the CompCert literature. `3AC' is used in this paper instead to avoid confusion
+ with register-transfer level (RTL), which is another name for the final hardware target of the HLS
tool.} as the starting point. Branching off \emph{before} this point (at CminorSel or earlier)
denies CompCert the opportunity to perform optimisations such as constant propagation and dead-code
elimination, which, despite being designed for software compilers, have been found useful in HLS
@@ -357,12 +357,12 @@ but this feature is not supported in most HLS tools anyway~\cite{davidthomas_asa
\subsubsection{Translating 3AC to HTL}
-The next translation is from 3AC to a new hardware translation language (HTL). This involves going
-from a CFG representation of the computation to a finite state machine with data-path (FSMD)
-representation~\cite{hwang99_fsmd}. The core idea of the FSMD representation is that it separates
-the control flow from the operations on the memory and registers. Hence, an HTL program consists of
-two maps from states to Verilog statements: the \emph{control logic} map, which expresses state
-transitions, and the \emph{data-path} map, which expresses computations.
+The next translation is from 3AC to a new \lindex{HTL}hardware translation language (HTL). This
+involves going from a CFG representation of the computation to a \index{FSMD}finite state machine
+with data-path (FSMD) representation~\cite{hwang99_fsmd}. The core idea of the FSMD representation
+is that it separates the control flow from the operations on the memory and registers. Hence, an
+HTL program consists of two maps from states to Verilog statements: the \emph{control logic} map,
+which expresses state transitions, and the \emph{data-path} map, which expresses computations.
Fig.~\ref{fig:accumulator_diagram} shows the resulting FSMD architecture. The right-hand block is
the control logic that computes the next state, while the left-hand block updates all the registers
and RAM based on the current program state.
@@ -513,7 +513,7 @@ Another is the \mono{Oshldimm} instruction, which is a left rotate instruction t
equivalent and therefore has to be implemented in terms of other operations and proven to be
equivalent. The only 32-bit instructions that we do not translate are case-statements
(\mono{Ijumptable}) and those instructions related to function calls (\mono{Icall}, \mono{Ibuiltin},
-and \mono{Itailcall}), because we enable inlining by default.
+and \mono{Itailcall}), because we enable \oindex{inlining}inlining by default.
\subsubsection{Translating HTL to Verilog}
@@ -547,14 +547,15 @@ similar syntax to C arrays, they must be treated quite differently. To make load
efficient as possible, the RAM needs to be word-addressable, which means that an entire integer can
be loaded or stored in one clock cycle. However, the memory model that CompCert uses for its
intermediate languages is byte-addre\-ssa\-ble~\cite{blazy05_formal_verif_memor_model_c}. If a
-byte-addressable memory was used in the target hardware, which is closer to CompCert's memory model,
-then a load and store would instead take four clock cycles, because a RAM can only perform one read
-and write per clock cycle. It therefore has to be proven that the byte-addressable memory behaves
-in the same way as the word-addressable memory in hardware. Any modifications of the bytes in the
-CompCert memory model also have to be shown to modify the word-addressable memory in the same way.
-Since only integer loads and stores are currently supported in Vericert, it follows that the
-addresses given to the loads and stores will be multiples of four. Translating from byte-addressed
-memory to word-addressed memory can then be done by dividing the address by four.
+byte-addressable \index{memory model}memory was used in the target hardware, which is closer to
+CompCert's memory model, then a load and store would instead take four clock cycles, because a RAM
+can only perform one read and write per clock cycle. It therefore has to be proven that the
+byte-addressable memory behaves in the same way as the word-addressable memory in hardware. Any
+modifications of the bytes in the CompCert memory model also have to be shown to modify the
+word-addressable memory in the same way. Since only integer loads and stores are currently
+supported in Vericert, it follows that the addresses given to the loads and stores will be multiples
+of four. Translating from byte-addressed memory to word-addressed memory can then be done by
+dividing the address by four.
\subsubsection[sec:hls:algorithm:optimisation:ram]{Implementation of RAM Interface}
@@ -1124,24 +1125,25 @@ translation proves that the specification holds.
\subsubsection[from-specification-to-simulation-1]{From Specification to Simulation}
Another simulation proof is performed to prove that the insertion of the RAM is semantics
-preserving. As in \in{Lemma}[lemma:simulation_diagram], we require some invariants that always hold
-at the start and end of the simulation. The invariants needed for the simulation of the RAM
-insertion are quite different to the previous ones, so we can define these invariants
-$\mathcal{I}_{r}$ to be the following:
+preserving. As in \index{simulation diagram}\in{Lemma}[lemma:simulation_diagram], we require some
+invariants that always hold at the start and end of the simulation. The invariants needed for the
+simulation of the RAM insertion are quite different to the previous ones, so we can define these
+invariants $\mathcal{I}_{r}$ to be the following:
\startitemize
\item
- The association map for arrays $\Gamma_{a}$ always needs to have the same arrays present, and
- these arrays should never change in size.
+ The \index{association map}association map for arrays $\Gamma_{a}$ always needs to have the same
+ arrays present, and these arrays should never change in size.
\item
The RAM should always be disabled at the start and the end of each simulation step. (This is why
self-disabling RAM is needed.)
\stopitemize
-The other invariants and assumptions for defining two matching states in HTL are quite similar to
-the simulation performed in \in{Lemma}[lemma:simulation_diagram], such as ensuring that the states
-have the same value, and that the values in the registers are less defined. In particular, the less
-defined relation matches up all the registers, except for the new registers introduced by the RAM.
+The other \pindex{invariants}invariants and assumptions for defining two matching states in HTL are
+quite similar to the simulation performed in \in{Lemma}[lemma:simulation_diagram], such as ensuring
+that the states have the same value, and that the values in the registers are less defined. In
+particular, the less defined relation matches up all the registers, except for the new registers
+introduced by the RAM.
\startlemma[lemma:htl_ram]
Given an HTL program, the forward-simulation relation should hold after inserting the RAM and
@@ -1161,8 +1163,9 @@ semantics of a map structure is quite different to that of the case-statement to
converted.
\startlemma[lemma:verilog]
- In the following, we write $\text{\tt tr\_verilog}$ for the translation from HTL to Verilog.
- (Note that this translation cannot fail, so we do not need the constructor here.)
+ In the following, we write $\text{\tt tr\_verilog}$ for the \pindex{translation+HTL to
+ Verilog}translation from HTL to Verilog. (Note that this translation cannot fail, so we do not
+ need the constructor here.)
\startformula
\forall h, V, B \in \text{\tt Safe},\quad \text{\tt tr\_verilog} (h) = V \land h \Downarrow B
@@ -1180,8 +1183,9 @@ converted.
\subsection[sec:proof:deterministic]{Deterministic Verilog Semantics}
-The final lemma we need is that the Verilog semantics is deterministic. This result allows us to
-replace the forwards simulation we have proved with the backwards simulation we desire.
+The final lemma we need is that the \pindex{determinism}Verilog semantics is deterministic. This
+result allows us to replace the forwards simulation we have proved with the backwards simulation we
+desire.
\startlemma[lemma:deterministic]
If a Verilog program $V$ admits behaviours $B_1$ and $B_2$, then
@@ -1236,12 +1240,12 @@ proofs about the extensional equality of array operations, such as merging array
assignments. As the negative edge implies that two merges take place every clock cycle, the proofs
about the equality of the contents in memory and in the merged arrays become more tedious too.
-Looking at the trusted computing base of , the Verilog semantics is 431 lines of code. This and the
-Clight semantics from are the only parts of the compiler that need to be trusted. The Verilog
-semantics specification is therefore much smaller compared to the 1721 lines of the implementation
-that are written in Coq, which are the verified parts of the HLS tool, even when the Clight
-semantics are added. In addition to that, understanding the semantics specification is simpler than
-trying to understand the translation algorithm. We conclude that the trusted base has been
-successfully reduced.
+Looking at the trusted computing base of Vericert, the Verilog semantics is 431 lines of code. This
+and the Clight semantics from are the only parts of the compiler that need to be trusted. The
+Verilog semantics specification is therefore much smaller compared to the 1721 lines of the
+implementation that are written in Coq, which are the verified parts of the HLS tool, even when the
+Clight semantics are added. In addition to that, understanding the semantics specification is
+simpler than trying to understand the translation algorithm. We conclude that the trusted base has
+been successfully reduced.
\stopcomponent