author    n.ramanathan14 <n.ramanathan14@imperial.ac.uk>  2020-09-15 08:40:52 +0000
committer overleaf <overleaf@localhost>                   2020-09-15 08:40:55 +0000
commit    ebaf38ee9fa9ed6a74230c93fa2d15df44521c9a (patch)
tree      9b661c26ec60e0710668e108e879abbfd1ecb3bd /eval.tex
parent    66b5b9ca3de555980d93864fb7bac5e8ea0fcb1c (diff)
Update on Overleaf.
Diffstat (limited to 'eval.tex')
-rw-r--r--  eval.tex | 203
1 file changed, 149 insertions(+), 54 deletions(-)
diff --git a/eval.tex b/eval.tex
index 6330647..1708519 100644
--- a/eval.tex
+++ b/eval.tex
@@ -1,110 +1,205 @@
-\section{Evaluation}
+\section{Evaluation}\label{sec:evaluation}
\definecolor{vivado}{HTML}{7fc97f}
\definecolor{intel}{HTML}{beaed4}
\definecolor{legup}{HTML}{fdc086}
\begin{figure}
- \resizebox{0.5\textwidth}{!}{%
+ \resizebox{0.47\textwidth}{!}{%
\begin{tikzpicture}
- \begin{scope}
- \draw[fill=vivado,fill opacity=0.5] (-4.4,4.4) ellipse (3.75 and 2.75);
- \draw[fill=intel,fill opacity=0.5] (-10.2,4.4) ellipse (3.75 and 2.75);
- \draw[fill=legup,fill opacity=0.5] (-7.3,2) ellipse (3.75 and 2.75);
- \node at (-10.2,6.3) {\Large\textsf{\textbf{Vivado}}};
- \node at (-4.4,6.3) {\Large\textsf{\textbf{Intel HLS}}};
- \node at (-7.3,0) {\Large\textsf{\textbf{Legup}}};
- \end{scope}
-
+ \draw (-14.5,7.65) rectangle (0,-1);
+ \fill[vivado,fill opacity=0.5] (-4.4,4.4) ellipse (3.75 and 2.75);
+ \fill[intel,fill opacity=0.5] (-10.2,4.4) ellipse (3.75 and 2.75);
+ \fill[legup,fill opacity=0.5] (-7.3,2) ellipse (3.75 and 2.75);
+ \draw[white] (-4.4,4.4) ellipse (3.75 and 2.75); % making the
+ \draw[white] (-10.2,4.4) ellipse (3.75 and 2.75); % outlines
+ \draw[white] (-7.3,2) ellipse (3.75 and 2.75); % fully opaque
+ \node[align=center] at (-10.2,6.3) {\Large\textsf{\textbf{Xilinx Vivado HLS}} \\ \textsf{\textbf{(all versions)}}};
+ \node at (-4.4,6.3) {\Large\textsf{\textbf{Intel HLS Compiler}}};
+ \node at (-7.3,0) {\Large\textsf{\textbf{LegUp}}};
+
+ \node at (-5.5,3) {\Huge 1 (\textcolor{red}{1})};
+ \node at (-9.1,3) {\Huge 4 (\textcolor{red}{0})};
+ \node at (-3,5) {\Huge 26 (\textcolor{red}{540})};
+ \node at (-11.6,5) {\Huge 79 (\textcolor{red}{20})};
+ \node at (-7.3,1) {\Huge 162 (\textcolor{red}{6})};
+ \node at (-7.3,5.2) {\Huge 0 (\textcolor{red}{5})};
+ \node at (-7.3,3.8) {\Huge 0 (\textcolor{red}{0})};
+ \node at (-13.6,-0.5) {\Huge 5856};
\end{tikzpicture}
}
-\caption{Venn diagram showing the existing tools and their current features. \textbf{Implementation} refers to usable HLS tools, whereas \textbf{Proof} refers to papers that demonstrate proofs of an algorithm without necessarily linking that proof with the algorithm that implements it.}\label{fig:existing_tools}
+\caption{A Venn diagram showing the number of failing test cases in each tool, out of the 6700 test cases that were run. A test case in an overlapping region failed in all of the tools whose regions overlap there. The numbers in parentheses give the number of test cases that timed out.}\label{fig:existing_tools}
\end{figure}
\begin{table}
\centering
\begin{tabular}{lr}\toprule
- \textbf{Tool} & \textbf{Bugs found}\\
+ \textbf{Tool} & \textbf{Unique Bugs}\\
\midrule
- Vivado 2018.3 & 2\\
- Vivado 2019.1 & 2\\
- Vivado 2019.2 & 2\\
- Legup 4.0 & 4\\
- Intel HLS & 1\\
+ Xilinx Vivado HLS (all versions) & $\ge 2$\\
+ LegUp HLS & $\ge 3$\\
+ Intel HLS Compiler & $\ge 1$\\
\bottomrule
\end{tabular}
\caption{Unique bugs found in each tool.}
\label{tab:unique_bugs}
\end{table}
-During the implementation stage, the testing system successfully detected bugs among all three tools. The test cases used were not the same for each tool, because, as being mentioned, each tool has its own supported syntax. Thus, to ensure the validness of test cases during the implementation stage, test cases were all generated based on the supported syntax of the targeted tools.
-
-After the implementation of testing system is proven to be working, a set of 10,000 test cases are generated and fed into HLS tools for constructing the bigger picture regarding the quality of HLS tools. The 10,000 test cases were kept constant to ensure the comparison is fair between tools. Unfortunately, 10,000 test runs were still in progress by the time of writing this thesis, as runtime for each test case can range from 5 minutes up to several hours. Data showing in \ref{result table for 10k test cases} is collected when writing this report.
+We generate 6700 test cases and provide them to three HLS tools: Vivado HLS, LegUp HLS and Intel HLS.
+We use the same test cases across all tools for fair comparison.
+We were able to test three different versions of Vivado HLS (v2018.3, v2019.1 and v2019.2).
+We were only able to test one version of LegUp: 4.0.
+At the time of writing, LegUp 7.5 is still GUI-based, and therefore we could not script our tests for it.
+However, we were able to reproduce manually in LegUp 7.5 the bugs found in LegUp 4.0.
+Finally, we tested one version of Intel HLS (vXXXX.X).
+
+% Three different tools were tested, including three different versions of Vivado HLS. We were only able to test one version of LegUp HLS (version 4.0), because although LegUp 7.5 is available, it is GUI-based and not amenable to scripting. However, bugs we found in LegUp 4.0 were reproduced manually in LegUp 7.5.
+% LegUp and Vivado HLS were run under Linux, while the Intel HLS Compiler was run under Windows.
+
+\subsection{Results across different HLS tools}
+
+Figure~\ref{fig:existing_tools} shows a Venn diagram of our results.
+We see that 167 (2.5\%), 83 (1.2\%) and 26 (0.4\%) test-cases fail in LegUp, Vivado HLS and Intel HLS respectively.
+Despite Intel HLS having the lowest failure rate, it has the highest time-out rate, with 540 test cases timing out, because of its long compilation time.
+% We remark that although the Intel HLS Compiler had the smallest number of confirmed test-case failures, it had the most time-outs (which could be masking additional failures)
+Note that the absolute numbers here do not necessarily correspond to the number of bugs found.
+Multiple programs could crash or fail due to the same bug.
+Hence, we reduce many of the failing test-cases to identify unique bugs, as summarised in Table~\ref{tab:unique_bugs}.
+We write `$\ge$' in the table to indicate that all the bug counts are lower bounds -- we did not have time to go through the test-case reduction process for every failure.
+
+\subsection{Results across versions of an HLS tool}
+
+Besides comparing the reliability of different HLS tools, we also investigated the reliability of Vivado HLS over time. Figure~\ref{fig:sankey_diagram} shows the results of giving 3645 test cases to Vivado HLS 2018.3, 2019.1 and 2019.2.
+Test cases that pass and fail in the same versions are grouped together into a ribbon.
+For instance, the topmost ribbon represents the 31 test cases that fail in all three versions of Vivado HLS. Other ribbons can be seen weaving in and out; these indicate that bugs were fixed or reintroduced in the various versions. The diagram shows that Vivado HLS 2018.3 has the most failing test cases of the three versions, with 62 failures in total. %Interestingly, Vivado HLS 2019.1 and 2019.2 have a different number of failing test cases, meaning feature improvements that introduced bugs as well as bug fixes between those minor versions.
+Interestingly, the blue ribbon shows that there are test cases that fail in v2018.3, pass in v2019.1, but then fail again in v2019.2: a bug that was fixed in one version reappeared in a later one.
+
+\definecolor{ribbon1}{HTML}{8dd3c7}
+\definecolor{ribbon2}{HTML}{b3de69}
+\definecolor{ribbon3}{HTML}{bebada}
+\definecolor{ribbon4}{HTML}{fb8072}
+\definecolor{ribbon5}{HTML}{80b1d3}
+\definecolor{ribbon6}{HTML}{fdb462}
+\begin{figure}
+ \centering
+ \begin{tikzpicture}
+ \draw[white, fill=ribbon1] (-1.0,4.1) -- (0.0,4.1000000000000005) to [out=0,in=180] (2.0,4.1000000000000005) to [out=0,in=180] (4.0,4.1000000000000005) -- (6.0,4.1000000000000005) -- %(7.55,3.325) --
+ (6.0,2.5500000000000003) -- (4.0,2.5500000000000003) to [out=180,in=0] (2.0,2.5500000000000003) to [out=180,in=0] (0.0,2.5500000000000003) -- (-1.0,2.55) -- cycle;
+ \draw[white, fill=ribbon2] (-1.0,2.55) -- (0.0,2.5500000000000003) to [out=0,in=180] (2.0,1.8) to [out=0,in=180] (4.0,1.55) -- (6.0,1.55) -- %(7.3,0.9) --
+ (6.0,0.25) -- (4.0,0.25) to [out=180,in=0] (2.0,0.5) to [out=180,in=0] (0.0,1.25) -- (-1.0,1.25) -- cycle;
+ \draw[white, fill=ribbon3] (-1.0,1.25) -- (0.0,1.25) to [out=0,in=180] (2.0,2.5500000000000003) to [out=0,in=180] (4.0,0.25) -- (6.0,0.25) -- %(6.05,0.225) --
+ (6.0,0.2) -- (4.0,0.2) to [out=180,in=0] (2.0,2.5) to [out=180,in=0] (0.0,1.2000000000000002) -- (-1.0,1.2) -- cycle;
+ \draw[white, fill=ribbon4] (-1.0,0.5) -- (0.0,0.5) to [out=0,in=180] (2.0,2.5) to [out=0,in=180] (4.0,0.2) -- (6.0,0.2) -- %(6.2,0.1) --
+ (6.0,0.0) -- (4.0,0.0) to [out=180,in=0] (2.0,2.3000000000000003) to [out=180,in=0] (0.0,0.30000000000000004) -- (-1.0,0.3) -- cycle;
+ \draw[white, fill=ribbon5] (-1.0,1.2) -- (0.0,1.2000000000000002) to [out=0,in=180] (2.0,0.5) to [out=0,in=180] (4.0,2.5500000000000003) -- (6.0,2.5500000000000003) -- %(6.2,2.45) --
+ (6.0,2.35) -- (4.0,2.35) to [out=180,in=0] (2.0,0.30000000000000004) to [out=180,in=0] (0.0,1.0) -- (-1.0,1.0) -- cycle;
+ \draw[white, fill=ribbon6] (-1.0,0.3) -- (0.0,0.30000000000000004) to [out=0,in=180] (2.0,0.30000000000000004) to [out=0,in=180] (4.0,2.35) -- (6.0,2.35) -- %(6.3,2.2) --
+ (6.0,2.0500000000000003) -- (4.0,2.0500000000000003) to [out=180,in=0] (2.0,0.0) to [out=180,in=0] (0.0,0.0) -- (-1.0,0.0) -- cycle;
+
+ \draw[white, fill=black] (-0.4,4.1) rectangle (0.0,1.0);
+ \draw[white, fill=black] (1.8,4.1) rectangle (2.2,2.3);
+ \draw[white, fill=black] (3.8,4.1) rectangle (4.2,2.05);
+
+ \node at (-0.2,4.5) {2018.3};
+ \node at (2,4.5) {2019.1};
+ \node at (4,4.5) {2019.2};
+ %\node at (2,5) {Vivado HLS};
+
+ \node at (5.5,3.325) {31};
+ \node at (5.5,0.9) {26};
+ \node at (5.5,2.2) {6};
+
+ \node[white] at (-0.2,1.2) {62};
+ \node[white] at (2,2.5) {36};
+ \node[white] at (4,2.25) {41};
+ \end{tikzpicture}
+ \caption{A Sankey diagram that tracks 3645 test cases through three different versions of Vivado HLS. The ribbons collect the test cases that pass and fail together. The 3573 test cases that pass in all three versions are not depicted.
+ }\label{fig:sankey_diagram}
+\end{figure}
+% \NR{Why are there missing numbers in the ribbons?}
-Three versions of Vivado HLS, including version 2019.2, 2019.1, and 2018.3, were tested. As being mentioned before, Vivado HLS version 2019.2 and 2019.1 won’t process logical AND operator with constant, which will warn about changing to bitwise AND operator (\verb|&|) and then error out. Thus, as shown in \ref{result table}, the column named as “Invalid test cases” indicates test cases that have logical AND operator (\verb|&&|) with constants. It is also being said that version 2018.3 managed to cope with the logical AND operator, so the “invalid test cases” section for version 2018.3 only count the test cases that has GCC and Vivado HLS’s C simulation result unmatched.
+As with our Venn diagram, the absolute numbers in Figure~\ref{fig:sankey_diagram} do not necessarily correspond to the number of bugs. However, we can deduce from this diagram that there must be at least six unique bugs in Vivado HLS, given that each ribbon must contain at least one unique bug. \YH{Contradicts value of 3 in Table~\ref{tab:unique_bugs}, maybe I can change that to 6?} \JW{I'd leave it as-is personally; we have already put a `$\ge$' symbol in the table, so I think it's fine.}
+It can also be seen that Vivado HLS v2018.3 must have at least four unique bugs, of which two were fixed in v2019.1 and two others remained. However, the release of v2019.1 also introduced new bugs. % Finally, for version 2019.2 of Vivado HLS, there seems to be a bug that was reintroduced which was also present in Vivado 2018.3, in addition to a new bug. In general it seems like each release of Vivado HLS will have new bugs present, however, will also contain many previous bug fixes. However, it cannot be guaranteed that a bug that was previously fixed will remain fixed in future versions as well.
-Moving on to the LegUp HLS version 4.0, the C/RTL unmatched result section is composed of assertion errors and C and RTL results unmatched. The assertion error happens when translating from C to RTL, which results in no Verilog file being produced. Therefore, this condition is also considered as inconsistent results since the LegUp HLS failed to translate an accurate RTL description for the inputted valid C code. Although, proportionally, the total number of C and RTL mismatch detected in LegUp HLS is much higher compared to which of Vivado HLS, two points must be emphasized. Firstly, LegUp HLS version 4.0 was implemented long time ago in 2015 and published as open sources for academic and research uses. By the time of writing this thesis, version 8.0 was just released for commercial uses. As mentioned before, version 7.5 was installed and run as GUI. Although it was not being fuzzing-tested under 10,000 test cases due to difficulties encountered when running it through the command line, it was used as a reference for comparing differences between versions released. By trying out several failing tests on version 7.5, some of the failing tests passed successfully without causing the same problem. Then we can confirm that some of the embedded issues have been solved already. Thus, lesser discrepancy in results when running through the newer versions should be expected. Secondly, Vivado HLS version 2019.2 errors out plenty of test cases due to the pragma error. Only a subset of test cases was synthesized to RTL and being simulated. Reducing in overall valid test cases can result in a lower amount of unmatched test cases. Thus, the results for LegUp HLS should not be treated “equally” with which for Vivado HLS. Those two points needed to be taken into consideration when analyzing the results obtained from LegUp HLS.
+\subsection{Some specific bugs found}
-Finally, as for Intel HLS, three points need to be taken into account when analyzing the result for Intel HLS. Firstly, the hashing function used for Intel HLS is much simpler comparing to which is used for both Vivado HLS and LegUp HLS. One possible result of using simple hashing function is that a considerable amount of bug can go undetected, which can lower the total number of cases that have C and RTL results unmatched. Secondly, as Intel HLS is running under the Windows environment, it runs relatively slower comparing to other tools that run under the Linux system. Based on the result, a considerable amount of test runs was timed out, which ultimately decreases the total number of valid test runs. Therefore, similar to what has been mentioned about the reduced overall valid test cases for Vivado HLS, when analyzing the Intel HLS, this should also be taken into consideration. Lastly, as Intel HLS uses the I++ compiler instead of GCC, differences can exist. Although theoretically, the C++ compiler does have stricter requirements, it should be compatible with compiling C programs. And as the test programs were generated through Csmith in C language, words like “bool” or “class”, which are only supported in C++ but not in C, do not exist. Also, Csimth was forbidden to generate operations like malloc, which might trigger incompatibility between C and C++. Thus, although compatibility can pose a potential problem, it should not have a huge impact. Besides those three points, similar to Vivado HLS, Intel HLS will alert about using logical AND operation with constants, but it does not error out immediately. So the “invalid test cases” section is set to not applicable.
+This section describes some of the bugs that were found in the various tools that were tested. We describe two bugs in LegUp and one in Vivado HLS; in each case, the bug was first reduced automatically using \creduce{}, and then reduced further manually to achieve the minimal test case. Although we did find test-case failures in the Intel HLS Compiler, the very long compilation times for that tool meant that we did not have time to reduce any of the failures down to an example that is minimal enough to present here.
-We generate $10$ thousand test programs to test on three different HLS tools, including three versions of Vivado HLS, one version of LegUp HLS and one version of Intel HLS.
-We provide the same set of programs to all HLS tools and show that different tools give rise to unique bugs.
+\subsubsection{LegUp assertion error}
+The code shown in Figure~\ref{fig:eval:legup:assert} leads to an assertion error in LegUp 4.0 and 7.5 even though it should compile without any errors.
+An assertion error counts as a crash of the tool, as it means that an unexpected state was reached by this input.
+This shows that there is a bug in one of the compilation passes in LegUp; however, because of the assertion, the bug is caught inside the tool before it can produce an incorrect design.
-\begin{figure}\centering
+\begin{figure}
\begin{minted}{c}
int a[2][2][1] = {{{0},{1}},{{0},{0}}};
int main() { a[0][1][0] = 1; }
\end{minted}
-\caption{This test cases crashes LegUp 4.0 with an assertion error.}\label{fig:legup_crash1}
+\caption{This program causes an assertion failure in LegUp HLS when \texttt{NO\_INLINE} is set.}\label{fig:eval:legup:assert}
\end{figure}
-\begin{figure}\centering
-\begin{minted}{c}
-union U1 { int a; };
+The buggy test case concerns the initialisation of, and assignment to, a three-dimensional array, for which the code in Figure~\ref{fig:eval:legup:assert} is the minimal example. In addition, the \texttt{NO\_INLINE} flag, which disables function inlining, must be set for the bug to appear. The code initialises the array with zeroes except for \texttt{a[0][1][0]}, which is set to one. The main function then assigns one to that same location. On its own, this code should simply terminate by returning 0, which is also what the design that LegUp generates does when the \texttt{NO\_INLINE} flag is turned off.
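+
+To make the expected behaviour concrete, the following sketch (not itself a reduced test case) spells out the element values implied by the initialiser and checks them with an assertion; under standard C semantics it simply terminates and returns 0:
+
+\begin{minted}{c}
+#include <assert.h>
+
+/* Same initialiser as the test case: in row-major order the elements
+   are 0, 1, 0, 0, so only a[0][1][0] holds the value 1.             */
+int a[2][2][1] = {{{0},{1}},{{0},{0}}};
+
+int main() {
+  a[0][1][0] = 1;   /* stores the value that is already there        */
+  assert(a[0][0][0] == 0 && a[0][1][0] == 1
+         && a[1][0][0] == 0 && a[1][1][0] == 0);
+  return 0;         /* implicit in the original test case            */
+}
+\end{minted}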
-volatile union U1 un = {0};
+%The following code also produces an assertion error in LegUp, which is a different one this time. This bug was not discovered during the main test runs of 10 thousand test cases, but beforehand, which meant that we disabled unions from being generated. However, this bug also requires the \texttt{volatile} keyword which seems to be the reason for quite a few mismatches in LegUp and Vivado.
+%
+%\begin{minted}{c}
+%union U1 { int a; };
+%
+%volatile union U1 un = {0};
+%
+%int main() { return un.a; }
+%\end{minted}
-int main() { return un.a; }
-\end{minted}
-\caption{This crashes in Legup 4.0.}\label{fig:legup_crash2}
-\end{figure}
+\subsubsection{LegUp miscompilation}
-\begin{figure}\centering
+The test case in Figure~\ref{fig:eval:legup:wrong} produces incorrect Verilog in LegUp 4.0, meaning that the result of RTL simulation differs from that of the C execution.
+
+\begin{figure}
\begin{minted}{c}
volatile int a = 0;
int b = 1;
-int main() {
+int main() {
int d = 1;
if (d + a) b || 1;
else b = 0;
return b;
}
\end{minted}
-\caption{This crashes in Legup 4.0.}\label{fig:legup_crash2}
+\caption{An output mismatch: LegUp HLS returns 0 but the correct result is 1.}\label{fig:eval:legup:wrong}
\end{figure}
-\NR{
-What we need for this section:
-\begin{itemize}
- \item Venn diagrams to show the overall results. Onlyy missing information from Venn diagram is the unique bugs per tool, which we can provide with a smaller table. This table can actually lead nicely to the bug portfolio.
- \item Sankey diagram for different versions of Vivado HLS.
- \item A portfolio of bugs on various tools, together with intuitive explanation for why we see these bugs. We can also mention filing of bug reports, where relevant.
- \item Is the area and latency of buggy hardware of interest to us?
-\end{itemize}
-}
+In Figure~\ref{fig:eval:legup:wrong}, \texttt{b} has value 1 when run in GCC, but has value 0 when run with LegUp 4.0. If the \texttt{volatile} keyword is removed from \texttt{a}, then the Verilog produces the correct result. Since \texttt{a} is 0 and \texttt{d} is 1, the \code{if} statement should always go into the \texttt{true} branch, meaning \texttt{b} should never be set to 0. The \texttt{true} branch of the \code{if} statement only evaluates an expression whose result is discarded, so no variable should change value. However, LegUp HLS generates a design that enters the \texttt{else} branch instead and assigns 0 to \texttt{b}. The cause of this bug seems to be the \texttt{volatile} keyword and the analysis that is performed to simplify the \code{if} statement.
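+
+For illustration, a correct simplification of the \code{if} statement should leave a design equivalent to the following sketch, which makes it clear that the returned value must be 1:
+
+\begin{minted}{c}
+volatile int a = 0;  /* still read once, as in the original test case */
+int b = 1;
+
+int main() {
+  int d = 1;
+  (void)(d + a);     /* the condition d + a is 1, i.e. always true,   */
+                     /* and b || 1 does not modify any variable,      */
+  return b;          /* so the correct result is 1                    */
+}
+\end{minted}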
+
+\subsubsection{Vivado HLS miscompilation}
+
+Figure~\ref{fig:eval:vivado:mismatch} shows code that is miscompiled by all the Vivado HLS versions we tested: the generated design returns \texttt{0x0}, whereas the correct result is \texttt{0xF}. This test case is much longer than the other reduced test cases and could not be made any smaller, as everything in the code seems to be necessary to trigger the bug.
-\subsection{Venn diagram of results}
+The array \texttt{a} and the other global variables \texttt{g} and \texttt{c} are initialised to zero so as not to introduce any undefined behaviour. In addition, \texttt{g} is marked \texttt{volatile}, which ensures that the variable is not optimised away. The function \texttt{d} accumulates each value \texttt{b} that it is passed into a hash stored in \texttt{c}. Each \texttt{b} is eight bits wide, so function \texttt{e} calls \texttt{d} seven times, once for each of the seven least-significant bytes of the 64-bit value \texttt{f} that it is passed. Finally, the main function partially initialises the array with a \code{for} loop, after which \texttt{e} is called twice, once on the volatile variable \texttt{g} and once on a constant. Interestingly, the second call with the constant is also necessary to trigger the bug.
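+
+To see why \texttt{0xF} is the expected result, the following stand-alone sketch (not one of the reduced test cases, and assuming \texttt{char} is signed) unrolls the calls that change \texttt{c}. The earlier call \texttt{e(g)} leaves \texttt{c} at zero, because \texttt{g} is zero and \texttt{a[0]} is zero, and within \texttt{e(-2L)} only the first two calls to \texttt{d} change \texttt{c}:
+
+\begin{minted}{c}
+#include <stdio.h>
+
+int main() {
+  int a[16], c = 0;
+  for (int i = 0; i < 16; i++) a[i] = i; /* only indices below 16 are used   */
+
+  /* d(-2):  (0 ^ -2) & 15 == 14, so c becomes 0 ^ a[14] == 14               */
+  c = (c & 4095) ^ a[(c ^ (char)-2) & 15];
+  /* d(-2 >> 8) is d(-1): (14 ^ -1) & 15 == 1, so c becomes 14 ^ a[1] == 15  */
+  c = (c & 4095) ^ a[(c ^ (char)-1) & 15];
+  /* the remaining d(-1) calls read a[0] == 0 and leave c unchanged          */
+  c = (c & 4095) ^ a[(c ^ (char)-1) & 15];
+
+  printf("0x%x\n", c); /* prints 0xf, matching the GCC result                */
+  return 0;
+}
+\end{minted}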
-\subsection{Bug portfolio}
-\NR{In fact, the structure of this section relates to the Venn diagram.}
+\begin{figure}
+\begin{minted}{c}
+volatile unsigned int g = 0;
+int a[256] = {0};
+int c = 0;
-\subsubsection{Vivado HLS}
+void d(char b) { c = (c & 4095) ^ a[(c ^ b) & 15]; }
-\subsubsection{LegUp HLS}
+void e(long f) {
+ d(f); d(f >> 8); d(f >> 16); d(f >> 24);
+ d(f >> 32); d(f >> 40); d(f >> 48);
+}
+
+int main() {
+ for (int i = 0; i < 56; i++) a[i] = i;
+ e(g); e(-2L);
+ return c;
+}
+\end{minted}
+\caption{An output mismatch: GCC returns \texttt{0xF}, whereas Vivado HLS returns \texttt{0x0}.}\label{fig:eval:vivado:mismatch}
+\end{figure}
-\subsubsection{Intel HLS}
%%% Local Variables: