% method.tex
\section{Method}

% \NR{I think this section is very detailed. I think we can start with a figure of our tool-flow. Then, we can break down each item in the figure and discuss the necessary details (Fig.~\ref{fig:method:toolflow}).}

The overall flow of our testing approach is shown in Figure~\ref{fig:method:toolflow}.
\input{tool-figure}
This section describes how test-cases are generated (\S\ref{sec:method:csmith}), executed (\S\ref{sec:method:testing}), and reduced (\S\ref{sec:method:reduce}).


%In~\S\ref{sec:method:csmith}, we describe how we modify Csmith to generate random HLS-friendly programs for our testing campaign.
%\item 
%In~\S\ref{sec:method:annotate}, we discuss how we augment those random programs with directives and the necessary configuration files for HLS compilation. 
%\item 
%In~\S\ref{sec:method:testing}, we discuss how we set up compilation and co-simulation checking for the three HLS tools under test. 
%\item 
%Finally, in~\S\ref{sec:method:reduce}, we discuss how we reduce problematic programs in order to obtain minimal examples of bugs.
%\end{itemize}

% How we configure Csmith so that it only generates HLS-friendly programs.
% \item How we process the programs generated by Csmith to add in labels on loops etc.
% \item How we generate a TCL script for each program.
% \item How we run each HLS tool, using timeouts as appropriate.
% \end{itemize}

\subsection{Generating test-cases}
\label{sec:method:csmith}

% \NR{
% Questions:
% \begin{itemize}
%     \item Do we use Csmith's hashing or our own hashing now? Is that what \texttt{transparent\_crc} is for?
% \end{itemize}
% }
% \YH{I believe that we are now maybe using our own hashing, even though one of the bugs in Vivado was found in the Csmith hashing algorithm..  So I don't know how best to mention that.}

%For our testing campaign, we require a random program generator that produces C programs that are both semantically valid and feature-diverse; Csmith~\cite{yang11_findin_under_bugs_c_compil} meets both these criteria.
%Csmith is randomised code generator of C programs for compiler testing, that has found more than 400 bugs in software compilers. 
%Csmith provides several properties to ensure generation of valid C programs.
%Csmith is designed to ensure that all the programs it generates are syntactically valid (i.e. there are no syntax errors), semantically valid (for instance: all variable are defined before use), and free from undefined behaviour (undefined behaviour indicates a programmer error, which means that the compiler is free to produce any output it likes, which renders the program useless as a test-case). Csmith programs are also deterministic, which means that their output is fixed at compile-time; this property is valuable for compiler testing because it means that if two different compilers produce programs that produce different results, we can deduce that one of the compilers must be wrong.
%Validity is critical for us since these random programs are treated as our ground truth in our testing setup, as shown in Figure~\ref{fig:method:toolflow}.

%We use a modified version of Csmith to generate our test-cases. 
%Csmith is designed so that each C program is syntactically and semantically valid, and free from undefined behaviour. 

%Additionally, Csmith allows users control over how it generates programs.
%For instance, the probabilities of choosing various C constructs can be tuned.
%By default, Csmith assigns a probability value for each of these features.
%Csmith also allows users to customise these values to tune program generation for their own purposes.
%This is vital for our work since we want to generate programs that are HLS-friendly. 
%Configuring Csmith to generate HLS-friendly programs is mostly an exercise of reducing and restricting feature probabilities.


\begin{table}
    \centering
        \caption{Summary of changes to Csmith's probabilities and parameters.}\label{tab:properties}
    \begin{tabular}{ll}
      \toprule
      \textbf{Property/Parameter} & \textbf{Change}  \\
      \midrule
      \code{statement\_ifelse\_prob} & Increased \\
      \code{statement\_for\_prob} & Reduced \\
      \code{statement\_arrayop\_prob} & Reduced \\
      \code{statement\_break/goto/continue\_prob} & Reduced \\
      \code{float\_as\_ltype\_prob} &  Disabled \\
      \code{pointer\_as\_ltype\_prob} & Disabled \\
      \code{union\_as\_ltype\_prob} &  Disabled \\
      \code{more\_struct\_union\_type\_prob} & Disabled \\
      \code{safe\_ops\_signed\_prob} & Disabled \\
      \code{binary\_bit\_and/or\_prob} & Disabled \\
      \code{-{}-no-packed-struct} & Enabled \\
      \code{-{}-no-embedded-assigns} & Enabled\\
      \code{-{}-no-argc} & Enabled\\
      \code{-{}-max-funcs} & 5 \\
      \code{-{}-max-block-depth} & 2  \\
      \code{-{}-max-array-dim} & 3 \\
      \code{-{}-max-expr-complexity} & 2 \\
      \bottomrule
    \end{tabular}
\end{table}

Csmith exposes several parameters through which the user can adjust how often various C constructs appear in the randomly generated programs. Table~\ref{tab:properties} describes how we configure these parameters.
%\paragraph{Significant probability changes}
%Table~\ref{tab:properties} lists the main changes that we put in place to ensure that HLS tools are able to synthesise all of our generated programs. 
Our overarching aim is to make the programs tricky for the tools to handle correctly (to maximise our chance of exposing bugs), while keeping the synthesis and simulation times low (to maximise the rate at which tests can be run).
For instance, we increase the probability of generating \code{if} statements to increase the number of control paths, but we reduce the probability of generating \code{for} loops and array operations since they generally increase run times but not hardware complexity.
We disable various features that are not supported by HLS tools, such as assignments inside expressions, pointers, and union types.
We also avoid floating-point numbers, since these often involve external libraries or IP blocks on FPGAs, which makes it hard to reduce bugs to a minimal form.
%Relatedly, we reduce the probability of generating \code{break}, \code{goto}, \code{continue} and \code{return} statements, because with fewer \code{for} loops being generated, these statements tend to lead to uninteresting programs that simply exit prematurely.

%\paragraph{Important restrictions}
%More importantly, we disable the generation of several language features to enable HLS testing. 
% \NR{Few sentences on different features whose probabilities were changed significantly.}
% Table~\ref{tab:properties} lists the important restrictions that we put in place to ensure that HLS tools are able to synthesise programs generated by Csmith. 
%First, we ensure that all mathematical expressions are safe and unsigned, to ensure no undefined behaviour. 
%We also disallow assignments being embedded within expressions, since HLS generally does not support them.
%We eliminate any floating-point numbers since they typically involve external libraries or use of hard IPs on FPGAs, which in turn make it hard to reduce bugs to their minimal form.
%We also disable the generation of pointers for HLS testing, since pointer support in HLS tools is either absent or immature~\cite{xilinx20_vivad_high_synth}.
%\YH{I've looked at the documentation and even pointer to pointer is supported, but maybe not pointer to pointer to pointer.  I think there was some other pointer assignment that didn't quite work, but I don't remember now.  Immature might be a good description though.}
%\YH{Figure \ref{fig:eval:vivado:mismatch} actually has void functions...} \JW{Hm, true. Perhaps these were introduced during reduction.}
%We disable the generation of unions as these were not supported by some of the tools such as LegUp 4.0.

%To decide whether a problematic feature should be disabled or reported as a bug, the tool in question is taken into account.  Unfortunately there is no standard subset of C that is supported by HLS tools; every tool chooses a slightly different subset.  It is therefore important to choose the right subset in order to maximise the number of real bugs found in each tool, while avoiding generating code that the tool does not support.  Therefore, we disable a feature if it fails gracefully (i.e. with an error message stating the issue) in one of the tools.  If the HLS tool fails in a different way though, such as generating a wrong design or crashing during synthesis, the feature is kept in our test suite.

%Use a volatile pointer for any pointer that is accessed multiple times within a single transaction (one execution of the C function). If you do not use a volatile pointer, everything except the first read and last write is optimized out to adhere to the C standard.

%We enforce that the main function of each generated program must not have any input arguments to allow for HLS synthesis. 
%We disable structure packing within Csmith since the ``\code{\#pragma pack(1)}'' directive involved causes conflicts in HLS tools because it is interpreted as an unsupported pragma. 
%We also disable bitwise AND and OR operations because when applied to constant operands, some versions of Vivado HLS errored out with `Wrong pragma usage.' 
%\NR{Did we report this problem to the developers? If so, can we claim that here?}\YH{We haven't I think, we only reported one bug.}

%\paragraph{Parameter settings}
%Finally, we tweak several integer parameters that influence program generation. 
%We limit the maximum number of functions (five) and array dimensions (three) in our random C programs, in order to reduce the design complexity and size. 
%\JW{Is this bigger or smaller than the default? Either way: why did we make this change?}
%We also limit the depth of statements and expressions, to reduce the synthesis and simulation times. 
% \JW{Why? Presumably this is to do with keeping synthesis/simulation times as low as possible?}

%\NR{I remove Zewei's writing but please feel free to re-instate them}

%\subsection{Augmenting programs for HLS testing}
%\label{sec:method:annotate}

To prepare the programs generated by Csmith for HLS testing, we modify them in two ways. First, we inject random HLS directives, which instruct the HLS tool to perform certain optimisations, including loop pipelining, loop unrolling, loop flattening, loop merging, expression balancing, function pipelining, function-level loop merging, function inlining, array mapping, array partitioning, and array reshaping. Some directives can be applied via a separate Tcl configuration file, some require us to add labels to the C program (e.g. to identify loops), and some require placing pragmas at particular locations in the C program.

%rstrst\AD{Did any reduced test-case involve these HLS-specific features?} \JW{The LegUp bug in Figure 4 requires NO\_INLINE -- does that count? If so, perhaps we could append to the Figure 4 caption: `thus vindicating our strategy of adding random HLS directives to our test-cases'.}

The second modification concerns the top-level function. Each program generated by Csmith ends its execution by printing a hash of all its variables' values, in the hope that any miscompilation will be exposed through a changed hash value. However, Csmith's built-in hash function leads to infeasibly long simulation times, so we replace it with a simple XOR-based one.

Finally, we generate a synthesisable testbench that executes the main function of the original C program, and a tool-specific script that instructs the HLS tool to create a design project and then build and simulate the design.


%We do this in two ways: program instrumentation and directive injection.
%This involves either modifying the C program or accompanying the C program with a configuration file, typically a \code{tcl} file.
%Finally, we must also generate a tool-specific build script per program, which instructs the HLS tool to create a design project and perform the necessary steps to build and simulate the design.

%\paragraph{Instrumenting the original C program}
%To ensure that the C programs can be successfully synthesised and simulated, we must adhere to a few rules that are common to HLS tools. 
%We generate a synthesisable testbench that executes the main function of the original C program. 
%Csmith does not generate meaningful testbenches for HLS synthesis. 
%So we invoke the original C program's main function from another top-level function. 
%This top-level testbench contains a custom XOR-based hash function that takes hashes of the program state at several points during execution, combines all these hashes together, and then returns this value. 
%By making the program's output sensitive to the program state in this way, we maximise the likelihood of detecting bugs when they occur.
%Csmith-generated programs do already include their own hashing function, but we replaced this with a simple XOR-based hashing function because we found that the Csmith one led to infeasibly long synthesis times.
%We inject  hashing on program states at several stages of the C program.
%By doing so, we keep track of program state, increasing the likelihood of encountering a bug. 
%The final hash value is returned as a part of the main function to assist in determining the presence of a bug.

%\paragraph{Injecting HLS directives}
%Directives are used to instruct HLS tools to optimise the resultant hardware to meet specific performance, power and area targets. 
%Typically, a HLS tool identifies these directives and subjects the C code to customised optimisation passes.
%In order to test the robustness of these parts of the HLS tools, we randomly generated directives for each C program generated by Csmith.
%Some directives can be applied via a separate configuration file, others require us to add labels in the C program (e.g. to identify loops), and a few directives require placing pragmas at particular locations in a C program. 
%We generate three classes of directives: those for loops, those for functions, and those for variables. 
%For loops, we randomly generate directives including loop pipelining (with rewinding and flushing), loop unrolling, loop flattening, loop merging, loop tripcount, loop inlining, and expression balancing.
%\NR{Not so clear about nested loops. Discuss.}\YH{I believe they are just handled like simple loops?}
%For functions, we randomly generate directives including function pipelining, function-level loop merging, function inlining, and expression balancing.
%For variables, we randomly generate directives including array mapping, array partitioning and array reshaping.
%\NR{Would a table of directives help? }\YH{I think there might be too many if we go over all the tools.}

%\paragraph{Generating build script}
%In addition to the annotated C program and configuration file with directives, we also generate a tool-specific build script.
%This script instructs the HLS tool to create a design project and perform the necessary steps to build and simulate the design.

%\NR{Have I missed any LegUp directives? What is these lines referring to. Supported Makefile directives include partial loop unrolling with a threshold, disable inline, and disable all optimizations. Available Config TCL directives include partial loop pipeline, all loop pipeline, disable loop pipeline, resource-sharing loop pipeline, and accelerating functions. }\YH{I think that covers most of it.}


\subsection{Compiling the test-cases using the HLS tools}
\label{sec:method:testing}

% \NR{Some notes on key points from this section: 
% \begin{itemize}
%     \item We needed time outs that several stages of the HLS flow to ensure scalable testing. 
%     \item LegUp 7.0 was not user-friendly enough to perform scripting. 
%     \item Intel HLS needs more interventions and time outs since its slow execution times are part of our critical path during testing and reduction.
%     \item We need to think of a good way to highlight the tooling difference and how it affects our changes. This is good information for anyone who is attempting to port any work across various tools.
% \end{itemize}
% }
% \NR{So Vivado HLS does co-simulation automatically, whereas we had to invoke them for other tools?}

%Having generated HLS-friendly programs and automatically augmented them with directives and meaningful testbenches, we are now ready to provide them to HLS tools for testing.
%Figure~\ref{fig:method:toolflow} shows the three stages of testing, depicted as the testing environment in the dashed area.
For each HLS tool in turn, we compile the C program to RTL and then simulate the RTL. 
We also compile the C program using GCC and execute it natively, to obtain a reference output. 
%Although each HLS tool has its own built-in C compiler that could be used to obtain the reference output, we prefer to obtain the reference output ourselves in order to rely less on the tool being tested.
To ensure that our testing scales to large programs, we enforce several time-outs: we set a 5-minute time-out for C execution and a 2-hour time-out for C-to-RTL synthesis and RTL simulation.

%% JW: This paragraph is not really needed because we repeat the sentiment in the next subsection anyway.
%There two types of bugs that we can encounter in this testing setup: programs that cause the HLS tool to crash during compilation (e.g. an unhandled assertion violation or a segmentation fault), and programs where the software execution and the RTL simulation do not return the same value. Programs that cause either type of error are given to the reduction stage, which aims to minimise the programs and (hopefully) identify the root cause(s).


%\paragraph{Tool-specific trivia}
%Vivado HLS implements C and RTL simulation by default.

%For all tools, we set a 5-minute and 2-hour for C simulation and RTL simulation respectively. 
%In cases where simulation takes longer, we do not consider them as failures or crashes and we record them. 
% When testing Intel HLS, we place three time-outs since its execution is generally slower than other HLS tools: a \ref{XX}-minute time-out for C compilation, a \ref{XX}-hour time out for HLS compilation and a \ref{XX}-hour time-out for co-simulation. 

%  And the number of timeouts placed has increased to 4. The first timeout sets when compiling the C++ program to CPU and returning an executable once finished. The second timeout is placed when running the executable to get the C++ result. The third timeout, which been given the most extended period, is at synthesizing the design and generating the co-simulation executable. Finally, running the co-simulation executable requires the fourth timeout. The test-case can be dumped at any timeout period if the task is not finished within the limited time.

\subsection{Reducing buggy programs}\label{sec:method:reduce}

Once we discover a program that crashes the HLS tool or whose C/RTL simulations do not match, we systematically reduce it to its minimal form using the \creduce{} tool~\cite{creduce}, in the hope of identifying the root cause. This is done by successively removing or simplifying parts of the program, checking that the bug remains at each step.  We also check at each stage of the reduction process that the reduced program remains within the subset of the language that is supported by the HLS tools; without this check, we found that \creduce{} kept zeroing in on programs that were outside this subset and hence did not represent real bugs.

%\YH{Edit this section some more}

%Reduction is performed by iteratively removing some part of the original program and then providing the reduced program to the HLS tool for re-synthesis and co-simulation. 
%The goal is to find the smallest program that still triggers the bug. 
%We apply two consecutive methods of reduction in this work. 
%The first step is to reduce the labels and pragmas that were added afterwards to make sure that these do not affect the behaviour of the program.  These are reduced until there are no more declarations left or the bug does not get triggered anymore.
% \NR{We can add one or two more sentences summarising how we reduce the programs. Zewei is probably the best person to add these sentences.}\YH{Added some more lines, we can ask Zewei if she is OK with that.}
%Although, our custom reduction gives us the freedom and control of how to reduce buggy programs, it is arduous and requires a lot of manual effort.
%We then use the \creduce{} tool~\cite{creduce} to automatically reduce the remaining C program.
%\creduce{} is an existing reducer for C and C++ and runs the reduction steps in parallel to converge as quickly as possible.  It is effective because it reduces the input while preserving semantic validity and avoiding undefined behaviour.
%It has various reduction strategies, such as delta debugging passes and function inlining, that help it converge rapidly to a test-case that is small enough to understand and step through.

%However, the downside of using \creduce{} with HLS tools is that we are not in control of which lines and features are prioritised for removal.
%As a consequence, we can easily end up with \creduce{} producing programs that are not synthesisable, despite being valid C.
%This is because the semantics of C for HLS tools might be a bit different to the semantics for C of a standard C compiler that \creduce{} was built for.
%Even though \creduce{} does not normally introduce undefined behaviour, it can introduce behaviour that is unsupported in the HLS tools.
%An example is the reduction of a function call, where the reducer realises that a mismatch is still observed when the function call's arguments are removed, and the function pointer is assigned a constant instead.
%This is however often unsupported in HLS tools, since a function pointer does not have a concrete interpretation in hardware, because in the absence of instructions, functions are not associated with a particular memory location.
%Once unhandled behaviour is introduced at any point in the reduction, the test-cases will often zero in on that unhandled behaviour, even though it does not actually represent an interesting bug.
%To prevent this, we use a script to guide \creduce{} away from introducing these unhandled behaviours as much as possible. 
%This script involves adding \texttt{-fsanitize=undefined} to the GCC options in order to abort when undefined behaviour is detected at runtime, and erroring out whenever any warning is encountered while running the HLS tool (except common warnings that are known to be harmless).

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "main"
%%% End: