\section{Method} \input{tool-figure} \NR{I think this section is very detailed. I think we can start with a figure of our tool-flow. Then, we can break down each item in the figure and discuss the necessary details (Fig.~\ref{fig:method:toolflow}).}
\begin{itemize}
\item How Csmith works.
\item How we configure Csmith so that it only generates HLS-friendly programs.
\item How we process the programs generated by Csmith to add labels on loops etc.
\item How we generate a TCL script for each program.
\item How we run each HLS tool, using timeouts as appropriate.
\end{itemize}

\subsection{How Csmith works}

Valid random C/C++ programs are essential: the input programs must be correct so that any discrepancies in the results can be attributed to the HLS tools rather than to the programs themselves. Csmith is the best candidate for generating such random C/C++ programs. Its bug-finding ability is well established: it has found more than 400 previously unknown compiler bugs \cite{csmith}. Programs generated by Csmith have uniform syntax, consistently formatted function and variable names, complex hashing functions, and a \texttt{main} that assembles all the sub-functions. Csmith also provides safe math wrapper functions that avoid undefined behavior in C, such as division or modulo by zero.

As shown in \ref{?} (Fig of random program)\YH{It may take up too much space, as we are reusing Csmith it is probably also not necessary, better to have more space to focus on the experiments.}, Csmith generates a random program in the following order: struct/union declarations, global variables, the top-level function (\texttt{func\_1}), sub-functions (e.g.\ \texttt{func\_6}), and \texttt{main}. Inside \texttt{main}, only \texttt{func\_1} is called, since it is the top-most function; it in turn calls the other sub-functions. The \texttt{crc32\_gentab} and \texttt{transparent\_crc} functions create and update a unique hash over every global variable. XOR and shift are the primary operations of the hashing process, which ensures uniqueness. The final hash value is computed by looping over every element of every global variable and hashing it; the result is accumulated in \texttt{crc32\_context} and displayed at the end.

Because the programs are uniformly formatted, the later pre-processing step is straightforward, since the overall program structure is predictable. The uniform naming also facilitates the reduction process, as functions and variables are easy to locate. Furthermore, bug-detection is efficient because a single comparison of the final hash value reflects all changes made to the global variables.

Csmith not only ensures the generation of uniform, valid programs free of undefined behavior, but also provides command-line options and a probability configuration file for tuning the properties and structure of the generated programs. This makes it easy to restrict the types of programs it creates, which is necessary because some C syntax is not supported by HLS tools. Csmith thus offers flexibility while providing a wide range of test cases for challenging HLS tools.
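For concreteness, the sketch below shows the typical shape of such a program. It is purely illustrative: real Csmith output is far larger, and its \texttt{transparent\_crc} implementation is more involved than the stub shown here.
\begin{verbatim}
#include <stdint.h>
#include <stdio.h>

static uint32_t crc32_context = 0xFFFFFFFFUL;

/* Stub of Csmith's XOR/shift-based hashing of one global. */
static void transparent_crc(uint64_t val, const char *name, int print)
{
    crc32_context ^= (uint32_t)(val ^ (val >> 32));
    (void)name; (void)print;
}

static uint32_t g_2 = 5U;               /* global variables     */
static int32_t  g_3[4] = {1, 2, 3, 4};

static int32_t  func_6(int32_t p);      /* forward declarations */
static uint32_t func_1(void);

static uint32_t func_1(void)            /* top-level function   */
{
    g_2 ^= (uint32_t)func_6(g_3[1]);
    return g_2;
}

static int32_t func_6(int32_t p)        /* sub-function         */
{
    return p + 1;
}

int main(void)
{
    int i;
    func_1();                  /* only func_1 is called here    */
    transparent_crc(g_2, "g_2", 0);
    for (i = 0; i < 4; i++)    /* hash every global element     */
        transparent_crc((uint64_t)g_3[i], "g_3[i]", 0);
    printf("checksum = %X\n", crc32_context);
    return 0;
}
\end{verbatim}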
\subsection{How we configure Csmith so that it only generates HLS-friendly programs}

As mentioned above, HLS tools have specific supported syntax for synthesizable C/C++ programs; not every valid C construct is synthesizable. Fortunately, Csmith provides a probability configuration file and command-line options for tuning or restricting the programs it generates. By modifying the probability file or passing the appropriate options, we obtain HLS-friendly random C/C++ programs from Csmith. Although each HLS tool supports a different subset of C, the reported 10,000 test cases are kept identical for every HLS tool, whether or not every construct is supported. The 10,000 test cases are pre-generated, with the probabilities and options altered every 1,000 test cases\YH{Does this mean you generate 1000 test cases at a time and then run them through the HLS tools?}. Keeping the test cases constant makes the comparison between the HLS tools more meaningful, even though some test cases may be invalid for a specific tool because they contain unsupported syntax. Before the final 10,000 test cases were generated, however, Csmith was configured to generate only test cases that match the syntax supported by each tool\YH{So the Csmith test cases are generated and then tweaking is done for each tool afterwards? I think it might be clearer if you say that your tools configure the output after Csmith has generated it.}.

Starting with Vivado HLS: according to the user guides, versions 2018.3, 2019.1, and 2019.2 cannot synthesize system calls, dynamic memory allocation such as \texttt{malloc()} and \texttt{alloc()}, function pointers, pointer casting, or recursive functions \cite{user_manual_vivado}. Pointer-to-pointer references are also restricted and are invalid in the top-level function. Although all three versions document the same unsupported syntax, versions 2019.1 and 2019.2 place stricter restrictions on the programs generated by Csmith. For example, Csmith can produce both the logical AND operator (\texttt{\&\&}) and the bitwise AND operator (\texttt{\&}). Although both operations are grammatically valid, versions 2019.1 and 2019.2 reject a logical AND with a constant operand during C synthesis: they warn that the `\texttt{\&}' operator should be used for bitwise operations with constants, and the test case then errors out with ``Wrong pragma usage''. Version 2018.3 has no trouble with this construct; this is why a large number of invalid tests appear in TABLE~\ref{result table}, and the two 2019 versions were by no means worse than the older version\YH{Don't know if this is needed.}. To satisfy both 2019 versions, only the bitwise AND operator (\texttt{\&}) is allowed, which avoids triggering the pragma error; this is done simply by setting the probability of the logical AND operator to 0 in Csmith's probability file.
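For illustration, a function of the following shape (hypothetical, but representative of the expressions Csmith emits) is accepted by GCC and by Vivado HLS 2018.3, yet rejected by the 2019 versions during C synthesis:
\begin{verbatim}
#include <stdint.h>

uint32_t f(uint32_t a)
{
    /* Logical AND with a constant operand: valid C, but
     * Vivado HLS 2019.1/2019.2 warn that bitwise '&' should
     * be used with constants, and the test case errors out. */
    return a && 2U;
}
\end{verbatim}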
Besides the AND-operator problem, other Csmith features must be turned off for Vivado HLS to work, and the same applies to both LegUp HLS and Intel HLS. By default, Csmith generates programs with a \texttt{main} that reads command-line parameters, using the standard \texttt{argc} and \texttt{argv} to read and parse them. The only parameter it takes determines whether the user wants the hash value to be printed; the parameter is compared with a constant integer to make that decision. However, \texttt{argv} is declared as an array of character strings, which is unsynthesizable due to the pointer-to-pointer limitation, and \texttt{strcmp}, used for the comparison, is also unsupported. Since the final unique hash value is saved in \texttt{crc32\_context}, as described in \ref{overview section}, printing it or not does not affect the testing process as long as \texttt{crc32\_context} is accessible. Both \texttt{argv} and \texttt{strcmp} can therefore be removed safely by passing the \texttt{--no-argc} option to Csmith. Additionally, to avoid interfering with the directives/pragmas of the HLS tools, the \texttt{--no-packed-struct} option must be specified explicitly. Struct packing itself is not the issue; rather, the syntax Csmith generates confuses the HLS tools. Without this option, Csmith declares ``\texttt{\#pragma pack(1)}'' before struct definitions to enable packed structs. This has the same format as the pragma declarations of the HLS tools, causing a conflict, and therefore must be removed.

Beyond the features above, a subset of C constructs requires extra attention to ensure the proper functioning of all three HLS tools; all of these can be tuned in the probability file. Firstly, the probability of generating bitfields is set to zero, as bitfields require extra syntax adjustments to become HLS-friendly and correctness cannot then be guaranteed\YH{What is the twist that is needed, and how come it cannot be guaranteed to be correct?}. Secondly, since unions are not a feature commonly used in HLS, the frequency of unions is set to a relatively low value; we confirmed that LegUp in particular does not support unions well. In addition, \texttt{float} is a tricky datatype due to bit truncation and saturation, which can lead to precision issues. Although floats are supported by the HLS tools and are provided with the tools' own libraries, such as \texttt{ap\_fixed}\YH{Is \texttt{ap\_fixed} really used for \texttt{float}? I thought that was for fixed point numbers and integers, but not sure what the floating point library is.}~\NR{\texttt{ap\_fixed} is the arbitrary-precision fixed-point library.} in Vivado HLS and \texttt{hls\_float} in Intel HLS, the probability of generating \texttt{float} variables is still set low: floats would require extra high-precision pre-processing steps, without which the validity and correctness of the programs fed to the HLS tools cannot be assured.

Furthermore, several Csmith options are used to shape the programs, though not compulsorily. For example, \texttt{--max-funcs} sets the maximum number of functions Csmith may generate; it is interesting to explore whether the length of the program affects synthesis and simulation time. Similarly, \texttt{--max-block-depth}, \texttt{--max-array-dim}, and \texttt{--max-expr-complexity} restrict the maximum nested-block depth, array dimension, and expression complexity, respectively. By varying these parameters, the performance of the HLS tools can be compared under different conditions. \NR{We could summarise these different features to improve HLS friendliness of CSmith in a Table. That will capture the entire section. Do we have this list?}
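Concretely, one test case is generated by an invocation of the following shape (a sketch: the option names are those discussed above, while the values and file names are placeholders):
\begin{verbatim}
# Sketch of a Csmith invocation with the options discussed
# above; values and file names are placeholders.  The edited
# probability file additionally zeroes the logical-AND and
# bitfield probabilities and lowers those of unions and floats.
csmith --no-argc --no-packed-struct \
       --max-funcs 10 --max-block-depth 3 \
       --max-array-dim 2 --max-expr-complexity 4 \
       --output test_0001.c
\end{verbatim}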
\subsection{How we process the programs generated by Csmith to add labels on loops etc.}
\NR{Am I right in saying two things happen here:
\begin{itemize}
\item Firstly, we identify all loops, functions and variables in the program to inject directives. In the case of loops, we may have to annotate the C file with directives to ensure the pragmas take effect.
\item Then, we generate testbenches for specific tools to allow for meaningful testing and verification of the annotated programs.
\end{itemize} }
Once a valid, HLS-friendly C/C++ program has been generated, the testing process moves on to the next stage, pre-processing. Pre-processing comprises several parts: information extraction, adding directives/labels, reformatting the main function, implementing the testbench, and implementing the XOR hash. \NR{Is XOR hashing part of the annotation step or the CSmith generation step?}

We begin by automatically gathering information about the random program; the extracted information is later used to place labels correctly. The information needed includes the function names, the number of functions, the number of for-loops inside each function and in the whole program, the variable names, the array variable names, and the number of variables. The uniform syntax of Csmith-generated programs guarantees that this extraction can be automated correctly. As shown in \ref{sample random program fig}, the random program follows a specific layout, so each item can be found by scanning the program for keywords. For instance, functions are always declared in the forward-declaration section, one declaration per line; by reading each line and checking for the ``\texttt{func\_}'' keyword, the function names can be found and saved for later use.

After this pre-processing, the types and number of optimizations to apply are selected, using either scripts or a C program. A further processing pass then adds labels or pragmas in the proper places, based on the extracted data and the selected optimizations. Vivado HLS accepts both in-file pragma declarations and TCL-script directive specifications; we chose to declare all directives in the TCL script. This simplifies debugging, since the directives are not mixed into the program, which keeps the program easier to read\YH{Are there any directives that can only be specified in source code?}. The automated generation of the TCL script is described in \ref{tcl_generating section}. Directives for Vivado HLS can be roughly categorized into three groups: function-level, loop-level, and variable-wise.\NR{What happens with variable scopes? Two functions can declare a variable with the same name, or have we avoided this case?} To realize loop-level optimizations, unique labels must be added to the for-loops in the program, whereas function-level and variable-wise optimizations require no modification of the program. The number of for-loop labels to add is determined when the TCL script is generated; this is also discussed in \ref{tcl_generating section}. LegUp HLS follows a similar method, except that it uses an additional Makefile alongside the TCL script to specify the optimizations. Intel HLS requires both labels and pragmas to be added directly to the program at specific places; for instance, a loop-level pipeline pragma must be declared directly under the loop label and above the actual loop being optimized, as sketched below.
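The snippet below sketches such an annotation (the label, loop, and unroll factor are hypothetical). For Vivado HLS the label is referenced only from the directive TCL script, while for Intel HLS an in-source pragma such as \texttt{\#pragma unroll} sits between the label and the loop:
\begin{verbatim}
#include <stdint.h>

void func_6(int32_t g_3[8], uint32_t g_2)
{
    /* Unique label added by our scripts; for Intel HLS the
     * pragma is placed under the label, directly above the
     * loop it optimizes. */
loop_1:
#pragma unroll 2
    for (int i = 0; i < 8; i++)
        g_3[i] ^= (int32_t)g_2;
}
\end{verbatim}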
Once the labels are added, several final modifications are required before the program can be synthesized by the HLS tools, and these differ for each tool. For Vivado HLS, a standard RTL verification flow needs a self-checking testbench in place, set as the main function. Although Csmith's main function calls the top-level function, and could thus act as a testbench triggering all other functions, it lacks the self-checking ability. Moreover, Csmith's generated main only returns 0, which is meaningless for result checking. To fit the needs of Vivado HLS, the Csmith-generated main is therefore renamed ``result'' to avoid conflicts and explicitly set as the top-level function via the design-flow TCL script, and its return value is changed from 0 to \texttt{crc32\_context}, the unique hash value that reflects every change to the global variables. A new main function, the self-checking testbench, is then written to automatically compare and report the results of C simulation and RTL simulation. The testbench performs no operations beyond calling the top-level function and comparing the returned results, so it should not introduce discrepancies itself; it returns a non-zero value if the C and RTL results do not match. A similar main modification is applied for LegUp HLS, except that no new main function is needed, as LegUp does not require a self-checking testbench. A sketch of the rewritten program is shown below.
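The following is a minimal sketch of the rewritten program for Vivado HLS, under the assumptions above (the function bodies and the golden value are placeholders; the real flow obtains the expected value from the golden GCC run):
\begin{verbatim}
#include <stdint.h>
#include <stdio.h>

/* Csmith's original main, renamed: now the top-level function
 * for synthesis, returning the hash instead of 0. */
uint32_t result(void)
{
    uint32_t crc32_context = 0xFFFFFFFFUL;
    /* ... original Csmith body: call func_1, hash globals ... */
    return crc32_context;
}

/* New self-checking testbench: it only calls the top-level
 * function and compares the returned hash with the expected
 * (golden) value, so it cannot itself introduce mismatches. */
int main(void)
{
    uint32_t expected = 0x12345678UL;  /* placeholder golden */
    uint32_t actual = result();
    if (actual != expected) {
        printf("FAIL: %X != %X\n", actual, expected);
        return 1;                      /* non-zero on mismatch */
    }
    printf("PASS\n");
    return 0;
}
\end{verbatim}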
Intel HLS requires more processing than the other two tools, mainly because it is unable to process Csmith's hashing function embedded inside the main. A new, simple but powerful hash method is therefore required for efficient bug-detection. Our solution is to XOR all the variables together, since XOR is sensitive to bitwise changes and can be implemented with basic universal gates without requiring excessive resources. The hash function is separated from the main and acts as the new top-level function, whereas the main, behaving as the testbench, calls the hash function and returns the result.

\subsection{How we generate a TCL script for each program}
\YH{I feel like this Section could be combined with the previous section, as they touch similar aspects. The TCL generation could then be explained in the middle of the previous section where the TCL generation is first mentioned.}
\YH{This section might also be a bit too detailed, if we want to keep as much space as possible for the results, it might be better to focus on pitfalls of the TCL generation or transformations of C in general, and the problems that were encountered.}
Automated TCL script generation is required for both Vivado HLS and LegUp HLS. Vivado HLS uses two TCL scripts throughout the process: one declares the directives, and the other drives the design flow. LegUp HLS requires one TCL file and one Makefile, which are used only for specifying directives; its design flow is driven by a shell script. For both tools, the directive TCL script is generated randomly, but always validly, based on the information extracted during the pre-processing stage.

Starting with Vivado HLS, an example of a generated directive file is shown in \ref{sample directive tcl figure}. A category of optimization (function-level, loop-level, or variable-wise) is first selected at random, and the specific directives to apply are then picked randomly as well. \YH{These descriptions could maybe be a bit more concise, for example it might be nice to summarise them in a table?}
\begin{enumerate}
\item For for-loop-level optimization, the base constraint is the number of for-loops available for optimization, both in the program as a whole and inside each function. A program may contain anything from zero for-loops to many deeply nested ones, so a wide range of counts is possible. If the program contains for-loops, the number selected for optimization is chosen between 1 and the total number in the program. We include 9 types of loop-level optimization: loop pipelining with rewind, loop pipelining with flush, loop pipelining, loop unrolling, loop flattening, loop merging, loop tripcount, loop inlining, and expression balancing. One of these is selected at random for each for-loop. Once a directive is chosen, it is automatically written to the directive TCL script together with the target loop label and the name of the enclosing function.
\item For function-level optimization, the constraint is the total number of functions; the selection steps are the same as at the loop level. The function-level optimizations we allow are function pipelining, function pipelining with flush, function-level loop merging, function inlining, and expression balancing. Each directive is written to the TCL script with the target function name.
\item Variable-wise optimization applies only to array variables, so the constraint is the total number of global array variables used throughout the program. The available optimizations are vertical-mode array map, horizontal-mode array map, array partitioning, and array reshaping. Each variable-wise directive is added to the TCL script with the target array variable name. Note that the top-level function name is also specified when adding variable-wise directives because, in a Csmith-generated program, the top-level function reads or writes all the variables.
\end{enumerate}

Once the directive TCL script has been generated, we proceed to the design-flow TCL script, which instructs Vivado HLS to perform RTL synthesis and verification in a specific order. The standard flow is: create a new project, set the top-level function, add the necessary files, declare the solution number, declare the target device, set the clock period, source the directive TCL script, synthesize the C program, simulate the C program, co-simulate C and RTL, and finally export the design. The files added are the pre-processed C program, our self-checking testbench, and the golden GCC result. The only modification needed for each test case is the project name; all other commands stay the same. For simplicity, projects are named following the pattern \texttt{add\_i}, where \texttt{i} is the number of the current test case. All of these commands are specified in the design-flow TCL in order; a sketch of both scripts is given below.
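Both scripts are plain TCL. A minimal sketch of the pair follows; the directive commands are standard Vivado HLS TCL, while the labels, file names, and target device are placeholders:
\begin{verbatim}
# directives.tcl -- randomly generated per test case (sketch)
set_directive_pipeline        "func_1/loop_1"   ;# loop-level
set_directive_unroll          "func_6/loop_2"   ;# loop-level
set_directive_inline          "func_6"          ;# function-level
set_directive_array_partition "result" g_3     ;# variable-wise

# flow.tcl -- design-flow script, identical for every test
# case except the project name (sketch)
open_project add_1
set_top result
add_files test_0001.c
add_files -tb testbench.c
add_files -tb out.gold
open_solution "solution1"
set_part {xc7z020clg484-1}        ;# placeholder device
create_clock -period 10
source directives.tcl
csynth_design
csim_design
cosim_design
export_design
\end{verbatim}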
Compared with Vivado HLS, LegUp HLS does not require an extra TCL script to control the design flow, but it uses two files, a Makefile and a Config TCL script, for specifying optimizations; different optimizations are specified in different files. The supported Makefile directives are partial loop unrolling with a threshold, disabling inlining, and disabling all optimizations. The available Config TCL directives are partial loop pipelining, pipelining of all loops, disabling loop pipelining, resource-sharing loop pipelining, and function acceleration. If optimizations in the Config TCL are chosen, the Config TCL must be explicitly sourced inside the Makefile as the local configuration; otherwise, the standard global LegUp Config TCL is used. The selection constraints and the selection process follow the same method as for Vivado HLS. The loop-type directives chosen are restricted by the total number of loops and must be specified with the loop label; note that loop-level optimization can only be applied to the innermost for-loop. The function-level directives selected are limited by the total number of functions and are specified with the function names.

\subsection{How we run each HLS tool, using timeouts as appropriate}
\NR{In previous steps, we have generated HLS-friendly programs that have been automatically annotated with directives and meaningful testbenches. These programs are now ready to be provided to the HLS tool. A typical HLS tool executes these stages: C synthesis, C simulation, C-to-RTL generation and RTL simulation. What about co-simulation? Is that just linking C and RTL simulation?}
The running stage of the testing process follows the order of C synthesis, C simulation, C-to-RTL translation, and eventually RTL simulation. \NR{Is Figure~\ref{fig:method:toolflow} matching this description of order then?} Although the overall running process is identical, each HLS tool has its own twist at the running stage.
\NR{Some notes on key points from this section:
\begin{itemize}
\item We needed time outs at several stages of the HLS flow to ensure scalable testing.
\item LegUp 7.0 was not user-friendly enough to perform scripting.
\item Intel HLS needs more interventions and time outs since its slow execution times are part of our critical path during testing and reduction.
\item We need to think of a good way to highlight the tooling difference and how it affects our changes. This is good information for anyone who is attempting to port any work across various tools.
\end{itemize} }
\NR{So Vivado HLS does co-simulation automatically, whereas we had to invoke them for other tools?}
Vivado HLS follows the standard running process with one extra step, C/RTL co-simulation. Its testing flow is driven by the automatically generated TCL file described in section \ref{tcl_generation section}. Before entering Vivado HLS, a golden C result is first produced using GCC; the result is saved to the \texttt{out.gold} text file and eventually added to the Vivado project at the add-files step. Since Csmith can generate programs that fail to terminate, a timeout of 5 minutes is set, starting when the executable is ready to run, to avoid getting stuck. Once the run finishes, the exit condition is determined by echoing and comparing the exit status: if the exit code indicates forced termination, the current test case is dropped and treated as invalid. It is also possible for a successfully executed Csmith program to return no result, so not only the exit code but also the printed output is checked, by testing whether any content has been written to the \texttt{out.gold} file; if the file is empty, the current test case is likewise stopped. The testing process does not proceed to Vivado HLS unless both checks pass. A second timeout limits Vivado HLS's runtime; it starts once the Vivado HLS project is created, after the GCC-related checks, and is set to 2 hours. During this 2-hour period, Vivado HLS first performs its own C synthesis and simulation to produce a C result for comparison against the golden GCC result; then the C/RTL co-simulation step, which employs the self-checking testbench, simulates both C and RTL and compares the results.
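A sketch of this per-test-case driver logic is shown below. It assumes GNU \texttt{timeout}, which returns exit status 124 when the limit is hit; the file and script names are placeholders:
\begin{verbatim}
#!/bin/bash
# Sketch of the per-test-case driver (placeholder names).
gcc -o golden test_0001.c
timeout 5m ./golden > out.gold       # 1st timeout: golden run
if [ $? -eq 124 ]; then echo "invalid: non-terminating"; exit 1; fi
if [ ! -s out.gold ]; then echo "invalid: empty result"; exit 1; fi

timeout 2h vivado_hls -f flow.tcl    # 2nd timeout: HLS flow
if [ $? -eq 124 ]; then echo "timeout: HLS flow"; exit 1; fi
\end{verbatim}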
Two comparisons are thus made during the 2-hour period, and two types of mismatching results can occur. The detailed result extraction and comparison method is described in \ref{result extraction section}. A project typically finishes within 2 hours, but not always; C-to-RTL translation and RTL simulation usually take up most of the time. Note that a project that does not complete within 2 hours does not count as faulty: based on the log files, most incomplete projects are still running when forcibly terminated.

LegUp versions 4.0 and 7.5 differ greatly in how they are executed. Version 7.5 follows the same pattern as Vivado HLS, but under the Windows environment and launched as a GUI. Since version 7.5 was not tested exhaustively, the following description of the LegUp running process focuses on, and applies only to, version 4.0. The version 4.0 running process is driven by a bash shell script, and two timeouts are used. The first, 5-minute timeout is applied for the same reason as for Vivado HLS. The C result is produced by GCC with the \texttt{-m32} flag for 32-bit data width, to match LegUp's behavior. LegUp then performs C-to-RTL translation and RTL simulation via the ``\texttt{make default v}'' command, which is where the second, 2-hour timeout is placed. The RTL result is compared directly against the golden C result, so only one comparison is made.

Lastly, Intel HLS is driven by a batch script and employs the \texttt{i++} compiler. The essential difference when running Intel HLS is that it processes C++, whereas the other two tools process C. While Intel HLS uses \texttt{i++} as its compiler instead of the GCC used by the other two tools, the timeout restrictions still apply, and the number of timeouts increases to 4. The first timeout covers compiling the C++ program for the CPU, which returns an executable when finished. The second covers running that executable to obtain the C++ result. The third, which is given the longest period, covers synthesizing the design and generating the co-simulation executable. Finally, running the co-simulation executable requires the fourth timeout. A test case is dropped at any of these points if the task does not finish within the time limit.

\subsection{How we extract and compare the results}
\NR{We can merge this into the previous sections}
Results are extracted automatically, either from log files/transcripts or from the terminal's output, and then sent for comparison, performed either by the HLS tools or by our own checking method. Although results are compared immediately, to determine whether the reduction process should be entered, every result produced is still saved for the later analysis stage. As mentioned in \ref{how to timeout section}, two comparisons are performed and returned automatically by Vivado HLS using the self-checking testbench; the returned exit code indicates whether to start the reduction process.
The first comparison is between the golden GCC result and Vivado HLS's C simulation result, and the second is between Vivado HLS's C simulation and RTL simulation results. The golden result is extracted from the terminal's output, whereas Vivado HLS's C simulation and RTL simulation results are read from the log file. Saving results from the terminal's output is done simply by echoing the exit code and redirecting it to a text file; finding results in the log files requires more effort. The log file is generated automatically by Vivado HLS and has a specific pattern: it records, top-down, information about C synthesis, C simulation, C testbench checking/comparison, and RTL testbench checking/comparison. Each section has a fixed header that indicates the current step, such as ``C TB testing''. The result extraction therefore loops through the file, finds the specific header, extracts the hexadecimal results, and saves them to the result file. Ideally, both comparison results can be found in the log file; however, the log may stop at the first comparison if the C result does not match the GCC result, in which case only the C result is extracted and saved.

For LegUp HLS (again focusing on version 4.0), the same extraction method is used, but reading from the transcript, which is also generated automatically by LegUp and has a fixed pattern. As LegUp only produces the RTL result, without comparing it against the C result automatically, a comparison step is required. Once the RTL result is produced and extracted from the transcript, it is saved both to the result file and to an empty temporary text file, and a comparison program then compares it against the golden C result. Both the golden C result file and the temporary RTL result file are refreshed for each test case, so the comparison program only needs to read the first line of each file to get the current result. The comparison outcome is then appended to the complete result file, indicating whether a discrepancy was detected; the driver script can immediately \texttt{grep} the newly added line from the result file to decide whether to enter the reduction stage.

The extraction and comparison steps for Intel HLS are compact, since both the C and RTL results are produced by the executables and can be read or saved directly from the command line, which simplifies both extraction and comparison. Result comparison is done once after all test cases have finished, unlike for the other two tools, which require immediate comparison. The reason is that the reduction method is not applied to Intel HLS, as iteratively reducing a program would take a massive amount of time, so there is no need to produce comparison results immediately.
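As an illustration, the log scraping and comparison amount to shell fragments of the following shape (the section header and file names are hypothetical):
\begin{verbatim}
# Sketch: extract the hex result that follows a fixed section
# header from a tool log, then compare against the golden run.
extract_result() {   # $1 = section header, $2 = log file
    grep -A1 "$1" "$2" | grep -oE '0x[0-9a-fA-F]+' | head -n1
}
rtl=$(extract_result "RTL TB testing" transcript.log)
golden=$(head -n1 out.gold)
if [ "$rtl" != "$golden" ]; then
    echo "mismatch: $rtl != $golden" >> result_file
fi
\end{verbatim}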
\subsection{How we reduce}
\NR{Once we discover any programs that crash the HLS tool or whose C and RTL simulation do not match, we must further scrutinise these programs to identify the root cause(s) for these undesirable behaviours. As the programs generated by CSmith can be fairly large, we must systematically \emph{reduce} these programs to identify the source of a bug.}
The reduction process is triggered only if the C and RTL results do not match; test cases that either crash or are forcibly terminated by a timeout do not proceed to the reduction stage. As reduction is performed iteratively, long runtimes are expected. Only two tools, Vivado HLS and LegUp HLS, are equipped with the reduction method; it is omitted for Intel HLS due to its excessive runtime. The reduction flow is as follows; a simplified sketch of the loop is given after the list.
\begin{enumerate}
\item Reduction starts with the top-level function, named \texttt{func\_1} by default. As noted earlier, a Csmith-generated program follows a strict order in which main always calls the top-level function, so we can trace down to the root problem by iteratively reducing the top-level function and checking whether any sub-function is called. The program is first processed to count the removable lines inside \texttt{func\_1}. Lines containing variable declarations, variable initializations, for-loops, if/else statements, while loops, goto statements and labels, continue, break, or return do not count as removable. Variable declarations and initializations are kept because removing them would cause undeclared-variable errors; the other non-removable constructs, such as if/else statements and gotos, are kept to preserve the original control flow. Changing the control flow could cause bugs to go undetected, since a path that was originally taken might no longer be taken. By keeping those lines, the original functionality and control flow of the program are preserved as much as possible.
\item Given the total number of removable lines, each line is commented out in turn. Commenting out one line at a time makes it easy to find the exact line that causes the discrepancy. A new golden result must be produced each time, since every line can affect the final hash value; the timeout command is again used to guard against non-terminating programs.
\item Once the new golden result is generated, the program is fed to the HLS tool and goes through the same testing procedure as a standard test case, producing a new RTL result under the timeout constraint.
\item The new golden result is then compared against the new RTL result. Two cases can occur:
\begin{itemize}
\item If the RTL result matches the golden result, we can confirm that the currently commented-out line caused the earlier discrepancy, and several modifications are made around this problematic line. Firstly, the lines after the bug-triggering line but within the current function are removed, since the problem has been located, and the bug-triggering line becomes the new return statement. Closing curly brackets are then added where needed to keep the program syntactically valid: for example, if the problematic line was originally inside a nested for-loop, removing the following lines also removes closing brackets, which would leave an invalid program, so closing brackets must be added back to match the opening ones. Secondly, the problematic line is checked to see whether it calls other functions. If it does, the reduction focus moves to the called functions: for example, if \texttt{func\_1} calls the sub-function \texttt{func\_2} and this call statement is detected as problematic, \texttt{func\_2} must be reduced as well, since the root problem may lie inside \texttt{func\_2}. If no other function is called on this line, we conclude the reduction process and confirm that the current line is the root problem.
\item If the RTL result again fails to match the golden result, the currently commented-out line is not the bug trigger; the reduction process moves on to the next removable line and repeats the procedure.
\end{itemize}
\item The reduction process terminates when the problematic line is detected, as described in step 4, or when no removable lines remain inside \texttt{func\_1}. The second case means that the discrepancy is not caused by a single line: the bug may be triggered by a combination of several statements, or by the hashing function embedded inside main. Bugs triggered by a combination of statements require a thorough understanding of the program's control flow, which is hard to reduce automatically; and the \texttt{transparent\_crc} hashing is complicated and involves many bug-prone operations such as shifts and sign extensions, so an imprecise automatic reduction could introduce new bugs. Both cases therefore require manual reduction work.
\end{enumerate}
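A simplified sketch of this loop is shown below (the helper script \texttt{run\_hls.sh} and the file names are hypothetical, and the return-statement rewriting and bracket rebalancing of step 4 are omitted):
\begin{verbatim}
#!/bin/bash
# Sketch of the line-by-line reduction loop (names hypothetical).
# removable_lines.txt holds the line numbers that may be removed.
for n in $(cat removable_lines.txt); do
    sed "${n}s|^|//|" test.c > reduced.c   # comment out line n
    gcc -o golden reduced.c || continue
    timeout 5m ./golden > out.gold || continue
    timeout 2h ./run_hls.sh reduced.c > out.rtl || continue
    if cmp -s out.gold out.rtl; then
        echo "line $n triggers the mismatch"; break
    fi
done
\end{verbatim}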
A short note is generated automatically on exit from the reduction process. The note records one of three exit conditions: 1) the program was reduced and the exact problematic line was detected; 2) the exact problematic line could not be confirmed, and manual work is required; or 3) the reduction process did not function properly, which may be caused by the timeout commands. The note is useful for later checking the condition of the reduced program. At the time of writing, the reduction method is able to reduce the programs, but it sometimes cannot reach a minimal working example, in which case manual work is required.

\subsection{How we summarize}
A simple automated analysis is performed after all test cases have finished, and a summary list is saved to a \texttt{result\_check} file for display. The complete result file, extracted and saved as described in \ref{extraction_compare_section}, is used for this analysis. Several statistics are collected: the number and labels of test cases that have mismatched C/RTL results, whose C program failed to terminate, that were forcibly timed out during RTL synthesis and simulation, that triggered assertion errors, and that had pragma errors. These statistics are extracted by looping through the result file and are displayed in the terminal.