% main.tex -- Yann Herklotz <git@yannherklotz.com>, 2020-08-17
We have tested three widely used HLS tools: LegUp~\cite{canis13_legup}, Xilinx Vivado HLS~\cite{xilinx20_vivad_high_synth}, and the Intel HLS Compiler~\cite{?}. For all three tools, we found valid C programs that crash the compiler and valid C programs that cause wrong RTL to be generated. We have submitted a total of \ref{?} bug reports to the developers, \ref{?} of which have been confirmed and \ref{?} of which had been fixed at the time of writing.
\section{Overview of the testing system}
Three major commercial HLS tools, Vivado HLS, LegUp HLS, and Intel HLS, are tested heavily across several versions. The testing flow we introduce, shown in Fig.~\ref{?}, can be divided into six steps: random program generation, pre-processing, directive/label insertion, HLS tool execution, result extraction and comparison, and reduction. The helper programs are implemented in C. The arrows in the figure represent the connections between the steps; they are implemented as bash or batch scripts, depending on the HLS tool. Besides connecting the steps, the scripts also generate TCL scripts and Makefiles where needed. Vivado HLS versions 2019.2, 2019.1, and 2018.3, as well as LegUp HLS version 4.0, run under Linux, so bash scripts direct their testing flows. Since we were unable to run LegUp HLS version 7.5 from the command line, that version is installed on Windows and launched through its GUI; only test cases that triggered result discrepancies under version 4.0 were re-run on version 7.5, so the full testing flow does not apply to it. Intel HLS runs under Windows and is driven by a batch script.

The flow starts with random program generation: valid random C/C++ programs are essential to ensure the quality of the test cases fed into the HLS tools. We use Csmith, developed at the University of Utah, to generate them. Csmith computes a hash over the program's global variables and returns a single value that reflects every change to them, which is extremely useful in the later result-comparison stage. Csmith also provides command-line options and a probability file for tuning the properties and structure of the generated programs, guaranteeing a wide variety of test cases. Note that Csmith can create programs that fail to terminate or that produce no result, so it is useful to check that a program produces a valid result before feeding it to the HLS tools. Each generated program then undergoes a pre-processing step.

Since each HLS tool supports a different synthesizable subset of C/C++, the pre-processing step differs between tools. For instance, Intel HLS does not work correctly with Csmith's hashing functions, so the generated programs are processed to replace the original hashing with a simple XOR hash. This greatly eases synthesis and simulation for Intel HLS, at the cost of potentially letting some bugs go undetected. Vivado HLS and LegUp HLS cope with Csmith's hashing functions, so no replacement is needed there. The pre-processing step not only adjusts the syntax while preserving the original functionality as far as possible, but also extracts information about the generated program, such as the number and names of functions, the number of for-loops, and the names of array variables, which is used to apply suitable directives/pragmas automatically in the directive-selection step.
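As a minimal sketch of this replacement hash (the function and variable names here are ours, not Intel HLS's or Csmith's), the idea is to XOR-fold every global variable into a single word:

```c
#include <stdint.h>

/* Sketch of the XOR replacement hash (illustrative names; g_a and g_b
   stand in for Csmith's global variables).  Every global is XOR-folded
   into one word, so any bitwise change to a global flips the checksum. */
static uint32_t hash_state;

void xor_hash(uint32_t v) {
    hash_state ^= v;   /* cheap to synthesize: pure bitwise logic */
}

uint32_t final_checksum(void) {
    return hash_state;
}
```

Because XOR is bitwise, a single flipped bit in any global changes the checksum; the downside, as noted above, is that changes which cancel each other out go unnoticed.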

The types and number of directives/pragmas applied are selected randomly, but checked for validity against the extracted information. For Vivado HLS and LegUp HLS, the selection is done by scripts, whereas for Intel HLS it is done by a C program. The distinction arises because Intel HLS requires the pragmas to be inserted directly into the program, while for the other two tools the selected pragmas are written to TCL scripts or a Makefile, which are applied when the HLS tool runs.

After the pragmas are selected, the C/C++ program is processed again to add labels in the correct places. For example, if a program contains five for-loops and loop pipelining is to be applied to the second, a label must be added where the second loop starts, leaving the other loops unchanged.
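To illustrate the label-insertion step (a hypothetical two-loop function; the label name is illustrative), only the selected loop is labelled:

```c
/* Hypothetical pre-processed function: the second for-loop has been
   given the label loop_2 so that a directive such as
   "set_directive_pipeline result/loop_2" can target it; the first
   loop is left unlabelled and unchanged. */
int result(void) {
    int acc = 0;
    for (int i = 0; i < 8; i++)   /* first loop: not selected */
        acc += i;
    loop_2:                       /* label added by the pre-processor */
    for (int i = 0; i < 8; i++)
        acc ^= i;
    return acc;
}
```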

Once the labels have been added, the program is compiled and executed to obtain the golden C result, against which the RTL result is later compared. GCC version 9.3.0 compiles and runs the programs for Vivado HLS, whereas version 4.8.2 is used for LegUp HLS; Intel HLS uses i++ for its C++ programs. Once the program has produced a result, it is finally fed into the HLS tool for synthesis, translation, and simulation. Three outcomes are possible: matching C/RTL results, mismatching C/RTL results, and crashes. In principle, an HLS tool should be able to translate any C/C++ program whose syntax it supports, and the RTL result should be equivalent to the C/C++ result. We found that this is not always the case, which is the motivation for this testing method.

The extraction and comparison stage extracts the RTL result from the command line, log file, or transcript, compares it with the golden C result, and saves both the numerical return value and the comparison outcome to a result file. The comparison outcome determines whether the reduction process should start.

The reduction process is only triggered when a test case fails, excluding crashes. It iteratively comments out one functional line at a time and resubmits the modified program to the HLS tool. In this way, the program's functionality is preserved as far as possible while it is reduced towards the minimal program that still triggers the bug. Although this process shrinks the program considerably, manual work is still required, since fully automated reduction would take substantially more effort.
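One reduction step can be sketched as follows, under our assumptions (operating on an in-memory array of lines rather than on files, with hypothetical names): comment out candidate line k and emit the variant for re-testing.

```c
#include <stdio.h>
#include <string.h>

/* One reduction candidate: copy the program but neutralize line k by
   commenting it out.  The real flow writes the variant to a file and
   re-runs the HLS tool to see whether the bug still triggers. */
void emit_variant(const char *lines[], int n, int k, char out[][128]) {
    for (int i = 0; i < n; i++) {
        if (i == k)
            snprintf(out[i], sizeof out[i], "// %s", lines[i]);
        else
            snprintf(out[i], sizeof out[i], "%s", lines[i]);
    }
}
```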

A checker runs after all test cases have finished. It automatically analyzes and reports summary statistics: the total number of tests performed, how many produced no result, and how many produced a wrong result. Depending on the HLS tool, further information is reported; for example, the number of assertion errors triggered is summarized for LegUp HLS.
\section{Method}
\begin{itemize}
\item How Csmith works.
\item How we run each HLS tool, using timeouts as appropriate.
\end{itemize}
\subsection{How Csmith works}
Valid random C/C++ programs are essential: the software programs must be correct to confirm that any discrepancies in results are indeed introduced by the HLS tools. Csmith is the best candidate for generating them; its bug-finding ability is well established, having uncovered more than 400 previously unknown compiler bugs~\cite{csmith}. Programs generated by Csmith have uniform syntax, systematically formatted function and variable names, hashing functions, and a main function that ties the sub-functions together. Csmith also provides safe math wrapper functions that avoid undefined behavior in C, such as division or modulo by zero. As shown in Fig.~\ref{?}, a generated program consists, in order, of struct/union declarations, global variables, a top-level function (func\_1), sub-functions (e.g.\ func\_6), and main. The main function calls only func\_1, the top-most function, which in turn calls the other sub-functions. The crc32\_gentab and transparent\_crc functions create and update a hash for every global variable, primarily through XOR and shift operations, which ensure uniqueness. The final hash value is computed by looping over every index of each global variable and hashing it; the value is accumulated in crc32\_context and then displayed.
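The gist of the crc32\_gentab/transparent\_crc pair can be sketched as a standard table-driven CRC-32 accumulated into a single context (a simplification: Csmith's real functions work per variable and also hash the variable's name):

```c
#include <stdint.h>

/* Simplified sketch of the hashing machinery: crc32_gentab builds the
   standard CRC-32 table, and each byte of every global variable is
   folded into the running crc32_context. */
static uint32_t crc32_tab[256];
static uint32_t crc32_context = 0xFFFFFFFFu;

void crc32_gentab(void) {
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int j = 0; j < 8; j++)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc32_tab[i] = c;
    }
}

void crc32_byte(uint8_t b) {
    crc32_context = crc32_tab[(crc32_context ^ b) & 0xFFu]
                    ^ (crc32_context >> 8);
}
```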

Because the programs are uniformly formatted, the later pre-processing step is simplified: the overall program structure is predictable. This also facilitates the reduction process, since functions and variables with uniform names are easy to spot. Furthermore, comparing the single final hash value, which reflects all changes made to the global variables, makes bug detection more efficient.

Csmith not only generates uniform, valid programs free of undefined behavior, but also provides command-line options and a probability file for tuning the properties and structure of the generated programs. This makes it easy to restrict the kinds of programs it creates, which matters because some syntax is not supported by the HLS tools. Csmith thus offers flexibility while providing a wide range of test cases that challenge the HLS tools effectively.

\subsection{How we configure Csmith so that it only generates HLS-friendly programs}
As mentioned above, each HLS tool supports only a specific synthesizable subset of C/C++; not every valid C construct can be synthesized. Fortunately, Csmith's probability file and command-line options let us tune or restrict the programs it generates, so HLS-friendly random C/C++ programs can be obtained. Although each tool supports a different subset, the same 10,000 reported test cases are used for every tool, whether fully supported or not. These test cases were pre-generated, with the probabilities and options altered every 1,000 cases. Keeping the test cases constant makes the comparison between the tools more meaningful, even though some cases may be invalid for a particular tool because they contain unsupported syntax. Before the final 10,000 test cases were generated, Csmith was configured to generate only test cases matching the supported syntax of each tool.

Starting with Vivado HLS: according to the user guides, versions 2018.3, 2019.1, and 2019.2 cannot synthesize system calls, dynamic memory allocation (e.g.\ malloc()), function pointers, pointer casting, or recursive functions~\cite{user_manual_vivado}. Pointer-to-pointer references are also restricted and are invalid in the top-level function. Although all three versions document the same unsupported syntax, versions 2019.1 and 2019.2 impose stricter restrictions on the programs Csmith generates. For example, Csmith produces both the logical AND operator (\&\&) and the bitwise AND operator (\&). Both are grammatically valid, but versions 2019.1 and 2019.2 reject a logical AND with a constant operand during C synthesis: they warn that `\&' should be used for bitwise operations with constants, and the test case then fails with a ``wrong pragma usage'' error. Version 2018.3 has no trouble with this. This is the cause of the large number of invalid tests in Table~\ref{result table}; the two 2019 versions are by no means worse than the older one. To satisfy both 2019 versions, logical AND operations on constants must be avoided, which is done simply by setting the probability of the logical AND operator to zero in Csmith's probability file.

Beyond the AND-operator problem, other Csmith features must be turned off for Vivado HLS, and equally for LegUp HLS and Intel HLS. By default, Csmith generates a main function that reads command-line parameters through the standard argc and argv. The only parameter determines whether the hash value should be printed, decided by comparing it against a constant integer. However, argv is declared as an array of character strings, which is unsynthesizable because of the pointer-to-pointer limitation, and strcmp, used for the comparison, is also unsupported. Since the final hash value is stored in crc32\_context, as described in \ref{overview section}, whether it is printed does not affect the testing process as long as crc32\_context is accessible. Both argv and strcmp can therefore be removed safely, which is done by passing the --no-argc option to Csmith.

Additionally, to avoid interfering with the HLS tools' directives/pragmas, the --no-packed-struct option must be specified explicitly. Struct packing itself is not the issue, but the syntax Csmith generates confuses the HLS tools: without this option, Csmith declares ``\#pragma pack(1)'' before struct definitions to pack them. This has the same form as the HLS tools' own pragma declarations, causing conflicts, and therefore has to be removed.

Beyond the features above, a subset of C constructs needs extra attention to keep all three HLS tools working; all of them can be controlled through the probability file. First, the probability of generating bit-fields is set to zero, since making them HLS-friendly would require extra syntax changes whose correctness cannot be guaranteed. Second, since unions are not commonly used in HLS, the frequency of unions is set relatively low; we confirmed that LegUp does not support unions well. Finally, floating point is a tricky datatype because of bit truncation and saturation, which can lead to precision issues. Although floats are supported by the HLS tools, which also provide their own libraries such as ap\_fixed in Vivado HLS and hls\_float in Intel HLS, the probability of generating float variables is still kept low: floats would require extra high-precision pre-processing, without which the validity and correctness of the programs fed to the HLS tools cannot be assured.

Furthermore, several Csmith options shape the programs without being compulsory. For example, --max-funcs sets the maximum number of functions Csmith may generate, which lets us explore whether program length affects synthesis and simulation time. Similarly, --max-block-depth, --max-array-dim, and --max-expr-complexity bound the nesting depth of blocks, the array dimensionality, and the expression complexity. By varying these parameters, the HLS tools' performance can be compared under different conditions.

\subsection{How we process the programs generated by Csmith to add in labels on loops etc.}
Once a valid, HLS-friendly C/C++ program has been generated, the testing process moves on to the pre-processing stage. Pre-processing has several parts: information extraction, directive/label insertion, reformatting of the main function, testbench implementation, and XOR-hash implementation.

The first part gathers information about the random program automatically; the extracted information later indicates where labels can validly be added. The information needed includes the function names, the number of functions, the number of for-loops in each function and in the whole program, the variable names, the array variable names, and the number of variables. The uniform syntax of Csmith-generated programs guarantees that this extraction is automatic and correct. As shown in \ref{sample random program fig}, a random program follows a fixed order, so each item can be found by scanning the program for keywords. For instance, functions are always declared in the forward-declaration section, one declaration per line, so reading each line and checking for the ``func\_'' keyword yields the function names, which are saved for later use.
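The forward-declaration scan can be sketched as follows (operating on lines already split into an array; the keyword matches Csmith's func\_ naming scheme):

```c
#include <string.h>

/* Count Csmith function declarations by scanning each line for the
   "func_" prefix, as the information-extraction step does.  One
   declaration per line is assumed, matching Csmith's output format. */
int count_func_decls(const char *lines[], int n) {
    int count = 0;
    for (int i = 0; i < n; i++)
        if (strstr(lines[i], "func_") != NULL)
            count++;
    return count;
}
```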

After pre-processing, the types and number of optimizations to apply are selected by scripts or by a C program, and the program is processed once more to add labels or pragmas in the proper places, based on the extracted data and the selected optimizations. Vivado HLS accepts both in-file pragma declarations and TCL-script directive specifications; we chose to declare all directives in the TCL script. This keeps the directives out of the program, which simplifies debugging and improves readability. The automated generation of the TCL script is described in \ref{tcl_generating section}. Vivado HLS directives fall roughly into three categories: function-level, loop-level, and variable-level. Loop-level optimizations require unique labels to be added to the for-loops in the program, whereas function-level and variable-level optimizations require no modification of the program. The number of loop labels added is determined when the TCL script is generated, which is also discussed in \ref{tcl_generating section}. LegUp HLS follows a similar method, except that it uses a Makefile in addition to a TCL script to specify the optimizations. Intel HLS requires both labels and pragmas to be inserted directly into the program at specific places; for instance, a loop-pipeline pragma must be declared directly below the loop label and above the loop being optimized.

Once the labels are added, several final modifications are required before synthesis, and these differ per tool. For Vivado HLS, the standard RTL verification flow needs a self-checking testbench in place as the main function. Csmith's main calls the top-level function and so could trigger all the other functions, but it lacks self-checking ability and returns only 0, which is useless for result checking. To meet Vivado HLS's needs, Csmith's main is renamed to ``result'' to avoid conflicts and is explicitly set as the top-level function in the design-flow TCL script, and its return value is changed from 0 to crc32\_context, the hash value that reflects every change to the global variables. A new main function, the self-checking testbench, is then written; it automatically compares and reports the results of the C and RTL simulations. The testbench does nothing except call the top-level function and compare the returned results, so it should not itself introduce discrepancies; it returns a non-zero value if the C and RTL results do not match.
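The shape of the injected testbench can be sketched as follows (the stand-in result function and golden value are hypothetical; in the real flow, result is the renamed Csmith main and the golden value comes from the golden C run):

```c
#include <stdio.h>

/* Stand-in for the renamed Csmith main, which now returns the hash. */
unsigned int result(void) { return 0xDEADBEEFu; }

/* Self-checking testbench: call the top-level function and compare
   its hash against the golden value; non-zero flags a mismatch. */
int check_against_golden(unsigned int golden) {
    unsigned int got = result();  /* in co-simulation this exercises the RTL */
    if (got != golden) {
        printf("MISMATCH: got %08x, expected %08x\n", got, golden);
        return 1;
    }
    return 0;
}
```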

A similar modification of main applies to LegUp HLS, except that no new main is needed, as LegUp does not require a self-checking testbench. Intel HLS requires more processing than the other two tools, mainly because it cannot handle Csmith's hashing functions embedded in main. A simple but effective hash is therefore needed for efficient bug detection. Our solution is to XOR all the variables together: XOR is sensitive to bitwise changes and can be implemented with basic gates without requiring excessive resources. The hash function is split out of main to act as the new top-level function, while main behaves as the testbench, calling the hash function and returning its result.

\subsection{How we generate a TCL script for each program}
Automated TCL-script generation is required for both Vivado HLS and LegUp HLS. Vivado HLS uses two TCL scripts: one declares the directives, and the other drives the design flow. LegUp HLS instead requires one TCL file and one Makefile, which are used only to specify directives; its design flow is driven by the shell script. For both tools, the directive TCL script is generated randomly, but validly, from the information extracted during the pre-processing stage.

Starting with Vivado HLS, an example generated directive file is shown in \ref{sample directive tcl figure}. A category of optimization (function-level, loop-level, or variable-level) is first selected at random, and the specific directives to apply are then picked randomly as well.

\begin{enumerate}
  \item For loop-level optimizations, the basic constraint is the number of for-loops available for optimization, both in the whole program and within each function. A program may contain anywhere from zero for-loops to many deeply nested ones, so a wide range of counts is possible. If the program contains for-loops, the number selected for optimization is chosen between 1 and the total number in the program. We include nine loop-level optimizations: loop pipelining with rewind, loop pipelining with flush, loop pipelining, loop unrolling, loop flattening, loop merging, loop tripcount, loop inlining, and expression balancing. One of these is selected at random for each chosen for-loop, and the chosen directive is automatically printed to the directive TCL script together with its target loop label and the name of the enclosing function.

  \item For function-level optimizations, the constraint is the total number of functions; the selection steps are the same as at loop level. The function-level optimizations we allow are function pipelining, function pipelining with flush, function-level loop merging, function inlining, and expression balancing. Each directive is written to the TCL script with the target function name.

  \item Variable-level optimizations apply only to array variables, so the constraint is the total number of global array variables used throughout the program. The available optimizations are array map in vertical mode, array map in horizontal mode, array partitioning, and array reshaping. Each variable-level directive is added to the TCL script with the target array variable's name. The top-level function name is also specified, because in a Csmith-generated program the top-level function reads or writes all the variables.

\end{enumerate}
Once the directive TCL script has been generated, the design-flow TCL script is produced. It instructs Vivado HLS to perform RTL synthesis and verification in a specific order. The standard flow is: create a new project, set the top-level function, add the necessary files, declare the solution number, declare the target device, set the clock period, source the directive TCL script, synthesize the C program, simulate the C program, co-simulate C and RTL, and finally export the design. The files added are the pre-processed C program, the self-checking testbench, and the golden GCC result. The only per-test-case modification is the project name; all other commands stay the same. For simplicity, projects are named following the add\_i pattern, where i is the number of the current test case. All of these commands appear in the design-flow TCL script in order.
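The directive-printing step can be sketched as a small formatting helper (set\_directive\_pipeline is a real Vivado HLS TCL command; the function and label names are illustrative):

```c
#include <stdio.h>
#include <string.h>

/* Format one loop-pipeline directive line for the directive TCL
   script; the generator picks func/label from the extracted data. */
int format_pipeline_directive(char *buf, int n,
                              const char *func, const char *label) {
    return snprintf(buf, (size_t)n,
                    "set_directive_pipeline \"%s/%s\"", func, label);
}
```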

Compared with Vivado HLS, LegUp HLS needs no extra TCL script for design-flow control, but it uses two files, a Makefile and a configuration TCL script, to specify optimizations; different optimizations are specified in different files. The supported Makefile directives are partial loop unrolling with a threshold, disabling inlining, and disabling all optimizations. The available configuration-TCL directives are partial loop pipelining, pipelining of all loops, disabling loop pipelining, resource-sharing loop pipelining, and function acceleration. If optimizations from the configuration TCL are chosen, the configuration TCL must be sourced explicitly in the Makefile as the local configuration; otherwise, LegUp's standard global configuration TCL is used. The selection constraints and process follow the same method as for Vivado HLS: loop directives are limited by the total number of loops and must be specified with the loop label (note that loop-level optimizations can only be applied to innermost for-loops), and function-level directives are limited by the total number of functions and are specified with function names.

\subsection{How we run each HLS tool, using timeouts as appropriate}
The running stage of the testing process follows the order: C synthesis, C simulation, C-to-RTL translation, and finally RTL simulation. Although the overall process is the same for all tools, each HLS tool has its own twist at this stage.
Vivado HLS follows the standard running process with one extra step, C/RTL co-simulation. Its flow is driven by the automatically generated TCL file described in section \ref{tcl_generation section}. Before Vivado HLS is entered, a golden C result is produced with GCC, saved to the out.gold text file, and eventually added to the Vivado project at the add-files step. Since Csmith can generate programs that fail to terminate, a 5-minute timeout is applied from the moment the compiled executable starts running, to avoid getting stuck. On completion, the exit condition is determined by echoing and comparing the exit status; if the exit code indicates forced termination, the test case is discarded as invalid. A successfully executed Csmith program may also return no result, so in addition to the exit code, the printed output is checked: if nothing has been written to out.gold, the test case is likewise stopped. The testing process only proceeds to Vivado HLS if both checks pass. A second timeout limits Vivado HLS's own runtime: it starts once the Vivado HLS project is created, after the GCC-related checks, and is set to 2 hours. Within this period, Vivado HLS first performs its own C synthesis and simulation to produce a C result that is compared against the golden GCC result; the C/RTL co-simulation step, which uses the self-checking testbench, then simulates both C and RTL and compares their results. Two comparisons are therefore made within the 2-hour window, and two kinds of mismatch can occur. The detailed result extraction and comparison method is described in \ref{result extraction section}. A project typically finishes within 2 hours, but not always; C-to-RTL translation and RTL simulation usually take most of the time. Note that a project that does not complete within 2 hours does not count as faulty: the log files show that most incomplete projects are still running when the terminal forcibly terminates them.
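The timeout mechanism can be sketched in C under POSIX assumptions (the real flow uses the shell's timeout facilities; the command strings are illustrative):

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a shell command but kill it after `secs` seconds: a forced
   termination returns -1 and the test case is treated as invalid;
   otherwise the command's exit status is returned. */
int run_with_timeout(const char *cmd, unsigned secs) {
    pid_t pid = fork();
    if (pid == 0) {
        alarm(secs);                  /* SIGALRM terminates the child */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                   /* exec failed */
    }
    int status;
    waitpid(pid, &status, 0);
    if (WIFSIGNALED(status))
        return -1;                    /* timed out (or crashed) */
    return WEXITSTATUS(status);
}
```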

For LegUp HLS, versions 4.0 and 7.5 differ. Version 7.5 follows the same pattern as Vivado HLS, but runs under Windows through the GUI; since it was not tested exhaustively, the following description of the LegUp HLS running process applies only to version 4.0. The version 4.0 run is driven by a bash script and uses two timeouts. The first, 5-minute timeout is applied for the same reason as for Vivado HLS. The golden C result is produced with GCC using the -m32 flag for 32-bit data widths, to match LegUp's behavior. LegUp HLS then performs C-to-RTL translation and RTL simulation through the ``make default v'' command, which is where the second, 2-hour timeout is placed. The RTL result is compared directly against the golden C result, so only one comparison is made.

Lastly, Intel HLS is driven by a batch script and uses the i++ compiler. The essential difference from the other two tools is that Intel HLS processes C++ rather than C, and uses i++ instead of GCC; the timeout restrictions still apply, but their number increases to four. The first timeout covers compiling the C++ program for the CPU to produce an executable; the second covers running that executable to obtain the C++ result; the third, which is given the longest period, covers synthesizing the design and generating the co-simulation executable; and the fourth covers running the co-simulation executable. A test case is discarded at any of these points if the task does not finish within its time limit.

\subsection{How we extract and compare the results}
Results are extracted automatically, either from log files/transcripts or from the terminal's output, and are then compared, either by the HLS tool itself or by our own checking code. Although each result is compared immediately to decide whether the reduction process should start, every result produced is also saved for the later analysis stage.

As mentioned in \ref{how to timeout section}, Vivado HLS performs and reports two comparisons automatically via the self-checking testbench, and the returned exit code indicates whether to start the reduction process. The first comparison is between the golden GCC result and Vivado HLS's C simulation result; the second is between Vivado HLS's C simulation and RTL simulation results. The golden result is extracted from the terminal's output, whereas Vivado HLS's C and RTL simulation results are read from the log file. Saving the terminal result is done simply by echoing the exit code and redirecting it to a text file; finding results in the log file requires more effort. The log file generated by Vivado HLS has a fixed pattern: it records, top-down, information about C synthesis, C simulation, C testbench checking, and RTL testbench checking, and each section has a fixed header, such as ``C TB testing'', indicating the current step. The extraction method therefore loops through the file, finds the relevant header name, extracts the hexadecimal results, and saves them to the result file. Ideally, both comparison results appear in the log file; however, the log may stop at the first comparison if the C result does not match the GCC result, in which case only the C result is extracted and saved.
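The log-scanning idea can be sketched as follows (the header string and ``result ='' line format are illustrative, not Vivado HLS's exact log layout):

```c
#include <stdio.h>
#include <string.h>

/* Scan log lines for a section header, then parse the first
   hexadecimal result that follows it.  Returns 1 on success, 0 if
   the header or the value is missing (e.g. the log stopped early). */
int extract_result(const char *lines[], int n,
                   const char *header, unsigned int *out) {
    int in_section = 0;
    for (int i = 0; i < n; i++) {
        if (strstr(lines[i], header) != NULL)
            in_section = 1;
        else if (in_section && sscanf(lines[i], " result = %x", out) == 1)
            return 1;
    }
    return 0;
}
```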
+
+For LegUp HLS, we focus on version 4.0 here; it uses the same extraction method but reads from transcripts, which LegUp also generates in a fixed pattern. Since LegUp only produces the RTL result and does not compare it with the C result automatically, a separate comparison step is required. Once the RTL result has been extracted from the transcript, it is saved both to the result file and to an empty temporary text file, and a comparison program then checks it against the golden C result. Both the golden C result file and the temporary RTL result file are refreshed for each test case, so the comparison program only needs to read the first line of each file to obtain the most recent result. The outcome is appended to the complete result file, indicating whether a discrepancy has been detected, and the script can then immediately grep the newly added line to decide whether to enter the reduction stage.
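Since both files are refreshed for every test case, the comparison reduces to checking whether their first lines match. A minimal sketch of that check, written over in-memory buffers rather than the actual files:

```c
#include <string.h>

/* Sketch: do the first lines of two result buffers match exactly?
   In the real flow these would be the golden C result file and the
   temporary RTL result file. */
static int first_lines_match(const char *a, const char *b)
{
    size_t la = strcspn(a, "\r\n");
    size_t lb = strcspn(b, "\r\n");
    return la == lb && memcmp(a, b, la) == 0;
}
```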
+
+The extraction and comparison steps for Intel HLS are compact, since both the C and RTL results are produced by the executables and can be read directly from the command line, simplifying both extraction and comparison. Unlike the other two tools, which require an immediate comparison, results for Intel HLS are compared once after all test cases have finished: because the reduction method is not applied to Intel HLS (reducing a program iteratively would take a massive amount of time), there is no need to produce comparison results immediately.
+
+\subsection{How we reduce}
+The reduction process is only triggered when the C and RTL results do not match; test cases that crash or that are forcibly terminated by a timeout do not proceed to the reduction stage. As reduction is performed iteratively, long runtimes are expected, so only Vivado HLS and LegUp HLS are equipped with the reduction method; it is omitted for Intel HLS due to excessive runtime. The reduction flow is as follows.
+
+\begin{enumerate}
+ \item Reduction starts with the top-level function, which is named func\underline{ }1 by default. As noted earlier, the random programs generated by Csmith follow a strict structure in which main always calls the top-level function, so we can trace down to the root cause by iteratively reducing the top-level function and checking whether any sub-function is called.
+
+ The program is first processed to count the number of removable lines inside func\underline{ }1. Lines containing variable declarations, variable initializations, for loops, if/else statements, while loops, goto statements and labels, continue, break, and return do not count as removable. Variable declarations and initializations are kept because removing them would cause undeclared-variable errors; control-flow constructs such as if/else statements and goto are kept to preserve the original logic flow, since changing it could leave bugs undetected (a path that was originally taken might no longer be taken). Keeping these lines preserves the original functionality and logic flow of the program as much as possible.
+
+ \item Given the total number of removable lines, each line is commented out in turn. Commenting out one line at a time makes it easier to find the exact line that causes the discrepancy in results. A new golden result must be produced at each step, since each line can affect the final hash variable, and the timeout command is used to guard against non-terminating programs.
+
+ \item Once the new golden result is generated, the program is fed into the HLS tool and goes through the same testing procedure as a standard test case, producing a new RTL result under the same timeout constraint.
+
+ \item The new golden result is then compared against the new RTL result. Two cases can occur:
+ \begin{itemize}
+ \item If the RTL result matches the golden result, we can confirm that the currently commented-out line caused the earlier discrepancy in results, and several modifications are then made to this problematic line.
+
+ Firstly, the lines after the bug-triggering line but within the current function are removed, since the problem has already been located, and the bug-triggering line becomes the new return statement. Closing curly brackets are then added where needed to keep the program syntactically valid C: for example, if the problematic line was originally inside a nested for loop, removing the following lines also removes their closing brackets, which would leave an invalid program, so closing brackets must be added back to match the number of opening brackets used.
+
+ Secondly, the problematic line is checked to see whether it calls another function. If it does, the reduction focus moves to the called function: for example, if func\underline{ }1 calls the sub-function func\underline{ }2 and this calling statement is detected as problematic, func\underline{ }2 must be reduced as well, since the root cause may be embedded inside it. If no other function is called at this line, the reduction process concludes and the current line is confirmed as the root cause.
+
+ \item If the RTL result again does not match the golden result, the currently commented-out line is not the bug trigger, and the reduction process moves on to the next removable line and repeats the procedure.
+ \end{itemize}
+
+ \item The reduction process terminates either when the problematic line is detected, as described in step 4.1, or when there are no more removable lines inside func\underline{ }1. The second case means that the discrepancy in results is not caused by a single line: the bug may be triggered by a combination of several statements, or by the hashing function embedded inside main. Bugs triggered by a combination of statements require a thorough understanding of the program's logic flow, which is hard to reduce automatically; as for the hashing function, transparent\underline{ }crc is complicated and involves many bug-prone operations such as shifts and sign extensions, so an imprecise automatic reduction could introduce new bugs. Both cases therefore require manual reduction.
+\end{enumerate}
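The step-1 classification of removable lines can be sketched as a simple prefix check. This is a rough illustration only: the keyword list mirrors the prose above, and real Csmith output would need more careful handling:

```c
#include <string.h>

/* Sketch: a line is kept (non-removable) if it looks like a
   declaration/initialization, a control-flow construct, or a goto
   label, since removing those would break compilation or change
   which paths are executed. */
static int is_removable(const char *line)
{
    static const char *keep[] = {
        "int", "uint", "static", "const", "for", "if", "else",
        "while", "goto", "continue", "break", "return", "{", "}", NULL
    };
    while (*line == ' ' || *line == '\t')
        line++;
    for (int i = 0; keep[i]; i++)
        if (strncmp(line, keep[i], strlen(keep[i])) == 0)
            return 0;
    size_t n = strcspn(line, "\r\n");
    while (n && line[n - 1] == ' ')
        n--;
    if (n && line[n - 1] == ':')   /* goto label */
        return 0;
    return n > 0;                  /* non-empty statement lines */
}
```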
+A short note is generated automatically on exit from the reduction process, describing one of three exit conditions: 1) the program has been reduced and the exact problematic line detected; 2) the exact problematic line could not be confirmed and manual work is required; or 3) the reduction process did not function properly, which may be caused by the timeout commands. This note is useful for later checking the state of the reduced program.
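The bracket fix-up in step 4.1 amounts to counting the unmatched opening braces in the truncated function and appending that many closers. A sketch of the counting step (which, for simplicity, ignores braces inside string literals and comments):

```c
/* Sketch: how many '}' must be appended to rebalance `src` after
   the function has been truncated at the bug-triggering line? */
static int missing_closers(const char *src)
{
    int depth = 0;
    for (; *src; src++) {
        if (*src == '{')
            depth++;
        else if (*src == '}')
            depth--;
    }
    return depth > 0 ? depth : 0;
}
```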
+
+At the time of writing, the reduction method is able to shrink failing programs, but it sometimes cannot reach a minimal working example without the manual work described above.
+
+\subsection{How we summarize}
+A simple automated analysis is run after all test cases have finished, and a summary list is saved to a result\underline{ }check file for display, using the complete result file described in \ref{extraction_compare_section}. Several statistics are collected about the overall results: the number and labels of test cases with mismatched C/RTL results, with C programs that failed to terminate, that were forcibly timed out during RTL synthesis or simulation, that triggered assertion errors, and that produced pragma errors. These statistics are extracted by looping through the result file and are displayed in the terminal.
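The summary pass is essentially tag counting over the result file. A sketch over an in-memory buffer, with illustrative tag names rather than the actual labels used in the result file:

```c
#include <string.h>

/* Sketch: count how many times `tag` (e.g. "MISMATCH" or "TIMEOUT",
   names illustrative) appears in the collected results. */
static int count_tag(const char *results, const char *tag)
{
    int n = 0;
    const char *p = results;
    while ((p = strstr(p, tag)) != NULL) {
        n++;
        p += strlen(tag);
    }
    return n;
}
```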
+
\section{Related Work}
Existing work has already been done in finding bugs in Intel's OpenCL SDK using metamorphic testing~\cite{lidbury15_many_core_compil_fuzzin}.