path: root/main.tex
author    Yann Herklotz <git@yannherklotz.com>    2020-08-17 14:17:10 +0100
committer Yann Herklotz <git@yannherklotz.com>    2020-08-17 14:17:10 +0100
commit    42292ef1e5552cc599c20a3d43953f56a56107cf (patch)
tree      413225050158f31ba4499a93cfb7d640b8a2777d /main.tex
parent    5c8114850ce22267a594a0bb2ca825e389677170 (diff)
Push preliminary comments
Diffstat (limited to 'main.tex')
-rw-r--r--   main.tex   59
1 file changed, 33 insertions(+), 26 deletions(-)
diff --git a/main.tex b/main.tex
index 8d3a5e8..5022e19 100644
--- a/main.tex
+++ b/main.tex
@@ -165,17 +165,17 @@ Three major commercial HLS tools, Vivado HLS, LegUp HLS, and Intel HLS, are bein
Starting with random program generation, valid random C/C++ programs are essential for ensuring the quality of the test cases that will be fed into the HLS tools. Csmith, developed at the University of Utah, was chosen to generate these random C/C++ programs. Csmith uses complex hashing functions to produce a single result that reflects every change to the variables’ values, which is extremely useful in the later result-comparison stage. Besides that, Csmith provides both built-in commands and a probability file for tuning the properties and structure of the generated C/C++ programs, so a wide variety of test cases can be guaranteed. Note that Csmith can create programs that fail to terminate or don’t produce results, so it is useful to check whether a C program produces a valid result before it is fed into the HLS tools. Once a C/C++ program is generated, it then undergoes a pre-processing step.
-As each HLS tools has different supported synthesizable syntax, pre-processing steps for each HLS tools differ from one another. For instance, Intel HLS doesn’t work correctly with Csmith’s hashing functions. The program generated by Csmith will be processed, aiming to replace its original hashing with a simple XOR hashing. This replacement eases the synthesis and simulation flow for Intel HLS to a great extent, but the downside is that some bugs can go undetected. Vivado HLS and LegUp HLS managed to cope with Csmith’s hashing functions; thus, there is no need for replacement. The pre-processing step not only twists the syntax while maximumly preserve the original functionality, but also extracts some necessary information regarding the generated programs. For instance, the total number and name of functions, number of for loops, and array variable’s names are useful information for applying suitable directives/pragmas automatically in the directive-selecting step.
+As each HLS tool supports a different synthesizable syntax, the pre-processing steps for each tool differ from one another. For instance, Intel HLS doesn’t work correctly with Csmith’s hashing functions, so the program generated by Csmith is processed to replace its original hashing with a simple XOR hashing. This replacement greatly eases the synthesis and simulation flow for Intel HLS, but the downside is that some bugs can go undetected. Vivado HLS and LegUp HLS cope with Csmith’s hashing functions, so no replacement is needed for them. The pre-processing step not only adjusts the syntax while maximally preserving the original functionality, but also extracts necessary information about the generated programs. For instance, the total number and names of functions, the number of for-loops, and the names of array variables are useful information for automatically applying suitable directives/pragmas in the directive-selecting step.
-Types and quantities of directives/pragmas applied are selected randomly but ensured to be valid by checking with extracted information. For Vivado HLS and LegUp HLS, the selection process is done by scripts, whereas, for Intel HLS, the process is done using the C program. The reason of the distinction is that Intel HLS requires the pragmas to be directly added to the program. But for the other two tools, selected pragmas is written to either TCL scripts or Makefile, which will be used and applied when the HLS tool runs.
+For Vivado HLS and LegUp HLS, the selection of directives/pragmas is done by scripts, whereas for Intel HLS it is done using a C program; in both cases the randomly chosen directives are checked for validity against the extracted information. The reason for the distinction is that Intel HLS requires the pragmas to be added directly to the program, while for the other two tools the selected pragmas are written to either a TCL script or a Makefile, which is applied when the HLS tool runs.
-After pragmas are selected, the C/C++ program will be processed again for adding labels at specific correct places. For example, if a program contains 5 for-loops and the loop-pipeline optimization is being chosen to apply on the second loop, a name/label needs to be added to the place where the second loop starts and leaving other loops unchanged.
+After pragmas are selected, the C/C++ program is processed again to add labels at specific positions. For example, if a program contains five for-loops and the loop-pipelining optimization is chosen for the second loop, a label needs to be added where the second loop starts, leaving the other loops unchanged.
-After the labels are being added, the program can be synthesized and compiled to get the golden C result for comparing with the RTL result later. GCC version 9.3.0 is used to compile and execute the C result for Vivado HLS, whereas version 4.8.2 is used for LegUp HLS. Intel HLS uses i++ for C++ programs. Once the program successfully produced a result, it can finally be feed into HLS tools for synthesizing, translating, and simulating. During this process, three types of results can exist, including matched C/RTL result, unmatched C/RTL result, and crashes. Theoretically, HLS tools can translate every C/C++ program into RTL if the syntax is supported. Also, the RTL result should be equivalent to the C/C++ result. However, we found this is not always the case and is the reason for implementing this testing method.
+After the labels are added, the program can be compiled and executed to obtain the golden C result, which is later compared with the RTL result. GCC version 9.3.0 is used to compile and execute the C program for Vivado HLS, whereas version 4.8.2 is used for LegUp HLS\YH{Maybe the same version of GCC can be used for all tools, for consistency reasons}. Intel HLS uses i++ for C++ programs. Once the program has successfully produced a result, it can finally be fed into the HLS tools for synthesis, translation, and simulation. During this process, three types of outcomes can occur: matching C/RTL results, mismatching C/RTL results, and crashes. Theoretically, HLS tools can translate every C/C++ program into RTL as long as the syntax is supported, and the RTL result should be equivalent to the C/C++ result. However, we found that this is not always the case, which is the reason for implementing this testing method.
The extraction and comparison stages involve extracting the RTL result from the command line, log file, or transcript, comparing it with the golden C result, and saving both the numerical result and the comparison outcome to a complete result file. The comparison outcome is used to determine whether the reduction process should start.
-The reduction process will only proceed when a test case fails, which does not include crashes. The process will iteratively comment out one functional line, then send the modified version to HLS tools. In this way, the functionality of the program is maximumly preserved while reducing it to the minimum program that still triggers the bug. Although this reduction process can reduce the program down to some extent, manual work is yet required since full-automated reduction requires more effort.
+The reduction process will only proceed when a test case fails, which does not include crashes\YH{How come crashes are not included to be reduced?}. The process iteratively comments out one functional line at a time and then sends the modified version to the HLS tools. In this way, the functionality of the program is maximally preserved while it is reduced to a minimal program that still triggers the bug. Although this reduction process can shrink the program to some extent, manual work is still required, since fully automated reduction requires more effort.\YH{The reviews will probably mention C-Reduce, we should therefore probably try to at least get it working in the tool flow.}
A checker is executed at the end, when all the test cases are finished. It automatically analyzes and displays a summary of the results: the total number of tests performed, the number that did not produce a result, and the number that produced a wrong result. Depending on the HLS tool, other information is also displayed; for example, the number of assertion errors triggered is summarized for LegUp HLS.
@@ -189,49 +189,56 @@ A checker will be executed at the end when all the test cases are finished. It a
\end{itemize}
\subsection{How Csmith works}
-It is essential to have valid random C/C++ programs, as the software programs need to be correct to confirm that discrepancies in results are indeed introduced by HLS tools. Csmith became the best candidate for generating the random C/C++ programs. The bug-detecting ability of Csmith has already been proved. It has found more than 400 previously unknown compiler bugs \cite{csmith}. Programs generated by Csmith contains uniformed syntax, formatted function names and variable names, complex hashing functions, and a main that assembles all the sub-functions. It also provides a safe math wrapper function that avoids undefined behaviors in C, such as divided by 0 or mod by 0. As shown in \ref{?} (Fig of random program), Csmith creates a random program with the order of struct/union declarations, global variables, top-level function (func\underline{ }1), sub-functions (func\underline{ }6), and main. Inside the main function, only function 1 is being called as it is the top-most function, which will be calling other sub-functions. The crc32\underline{ }gentab and transparent\underline{ }crc function are responsible for creating and processing a unique hash for every global variable. XOR and shift are the primary operations done by the hashing process, which ensures uniqueness. The final hash value is calculated by looping through every index of the global variable and perform hashing, which is saved in crc32\underline{ }context and displayed.
+It is essential to have valid random C/C++ programs, as the software programs need to be correct to confirm that discrepancies in results are indeed introduced by the HLS tools. Csmith is the best candidate for generating these random C/C++ programs: its bug-finding ability has already been proven, having found more than 400 previously unknown compiler bugs \cite{csmith}. Programs generated by Csmith contain uniform syntax, consistently formatted function and variable names, complex hashing functions, and a main function that assembles all the sub-functions. Csmith also provides safe math wrapper functions that avoid undefined behaviour in C, such as division by 0 or modulo by 0. As shown in \ref{?} (Fig of random program)\YH{It may take up too much space, as we are reusing Csmith it is probably also not necessary, better to have more space to focus on the experiments.}, Csmith creates a random program in the following order: struct/union declarations, global variables, the top-level function (\texttt{func\_1}), sub-functions (\texttt{func\_6}), and main. Inside the main function, only \texttt{func\_1} is called, as it is the top-most function, which in turn calls the other sub-functions. The \texttt{crc32\_gentab} and \texttt{transparent\_crc} functions are responsible for creating and updating a unique hash for every global variable. XOR and shift are the primary operations performed by the hashing process, which ensures uniqueness. The final hash value is calculated by looping through every index of each global variable and hashing it; the result is saved in \texttt{crc32\_context} and displayed.
As programs are uniformly formatted, the later pre-processing step is eased, since the overall program structure is predictable. It also facilitates the reduction process, as it is easier to spot functions or variables with uniform names. Furthermore, the bug-detecting efficiency is boosted by comparing the unique final hash value, which reflects all the changes made to the global variables.
- Csmith is not only confirmed to generated uniform valid programs that do not contain any undefined behaviors, but also provides built-in commands as well as probability file for tuning the properties and structures of generated programs. This feature eases the process of restricting types of programs it creates since some syntax is not supported by HLS tools. Csmith gives flexibility while providing a wide range of possible test cases that boost the efficiency of challenging HLS tools.
+ Csmith not only ensures the generation of uniform and valid programs that do not contain any undefined behaviour, but also provides built-in commands as well as a probability file for tuning the properties and structure of the generated programs. This feature eases the process of restricting the types of programs it creates, since some syntax is not supported by the HLS tools. Csmith gives flexibility while providing a wide range of possible test cases, which boosts the efficiency of challenging HLS tools.
\subsection{How we configure Csmith so that it only generates HLS-friendly programs.}
-As being mentioned, HLS tools have specific supported syntax for synthesizable C/C++ programs. In other words, not every valid C grammar is allowed for synthesis. Fortunately, Csmith provides a probability file and built-in commands for tuning or restricting the programs it generates. By directly modifying the probability file or adding the commands, we can get HLS-friendly random C/C++ programs from Csmith. Although it is known that each HLS tools have different supported syntax, the reported 10,000 test cases are kept constant for every HLS tools regardless of supported or not. The 10,000 test cases are pre-generated with altering probability and commands every 1000 test cases. It is more useful to have constant test cases for comparing the performance between each HLS tool, although some test cases for specific tools might not be valid since it might contain unsupported syntax. But before running the final 10,000 test cases, Csmith has been configured only to generate test cases that match the supported syntax of each tool.
+As mentioned, HLS tools support only a specific subset of C/C++ syntax for synthesis; in other words, not every valid C construct is allowed. Fortunately, Csmith provides a probability file and built-in commands for tuning or restricting the programs it generates. By directly modifying the probability file or adding commands, we can obtain HLS-friendly random C/C++ programs from Csmith. Although each HLS tool supports different syntax, the reported 10,000 test cases are kept constant for every HLS tool, whether supported or not. The 10,000 test cases are pre-generated, with the probabilities and commands altered every 1,000 test cases\YH{Does this mean you generate 1000 test cases at a time and then run them through the HLS tools?}. It is more useful to have constant test cases for comparing the performance of the HLS tools, even though some test cases might not be valid for a particular tool because they contain unsupported syntax. Before running the final 10,000 test cases, however, Csmith was configured to generate only test cases that match the supported syntax of each tool\YH{So the Csmith test cases are generated and then tweaking is done for each tool afterwards? I think it might be clearer if you say that your tools configure the output after Csmith has generated it.}.
-Starting with Vivado HLS, based on the user guide, 2018.3, 2019.1, and 2019.2 version has limitations on synthesizing system calls, dynamic memory allocation such as malloc() and alloc(), function pointers, pointer casting, and recursive functions \cite{user_manual_vivado}. Pointer to pointer reference is also limited and not valid if it is on the top-level function. Although all three versions declared to have the same unsupported syntax, version 2019.2 and 2019.1 have stricter restrictions on the syntax of the programs generated by Csmith. For example, Csmith can produce both binary AND operator (\&\&) and binary bitwise AND operator (\&). Although both operations are correct and valid as grammar-wise, both versions 2019.2 and 2019.1 do not accept the binary logical AND operation with constant operand when performing C synthesis. It warns that ‘\&’ operator should be used for the bitwise operation with constants. Then the test case will error out as “Wrong pragma usage”. However, version 2018.3 didn’t have any trouble with this matter. Thus, in the TABLE \ref{result table}, a large number of invalid tests were caused by this reason. Two 2019 versions were by no means worse than the older version. To fit the need of both 2019 versions, only binary bitwise AND operator (\&\&) should be allowed to avoid triggering the pragma error. And that can be simply done by switch the probability of binary AND operator to 0 inside the Csmith’s probability file.
+Starting with Vivado HLS: according to the user guides, versions 2018.3, 2019.1, and 2019.2 cannot synthesize system calls, dynamic memory allocation such as malloc() and alloc(), function pointers, pointer casting, or recursive functions \cite{user_manual_vivado}. Pointer-to-pointer references are also limited and are not valid in the top-level function. Although all three versions declare the same unsupported syntax, versions 2019.1 and 2019.2 place stricter restrictions on the programs generated by Csmith. For example, Csmith can produce both the logical AND operator (\&\&) and the bitwise AND operator (\&). Although both operations are grammatically correct and valid, versions 2019.1 and 2019.2 do not accept a logical AND operation with a constant operand when performing C synthesis: the tool warns that the ‘\&’ operator should be used for bitwise operations with constants, and the test case then errors out with “Wrong pragma usage”. Version 2018.3, however, had no trouble with this. Thus, in TABLE \ref{result table}, a large number of invalid tests were caused by this; the two 2019 versions were by no means worse than the older version\YH{Don't know if this is needed.}. To fit the needs of both 2019 versions, only the bitwise AND operator (\&) should be allowed, avoiding the pragma error. This can simply be done by switching the probability of the logical AND operator to 0 inside Csmith’s probability file.
-Besides the binary-AND-operator problem, there are other features of Csmith that need to be turned off for Vivado HLS to work, and this also applies to both LegUp HLS and Intel HLS. By default, Csmith will generate programs with a main that reads in command line parameters. And standard argc and argv are used to read and parse the parameters. The only parameter it takes is used to determine whether if the user wanted the hash value to be printed. The parameter is compared with a constant integer to make that decision. However, argv is declared as an array of character strings, which is unsynthesisable due to the pointer to pointer limitation. Besides, strcmp used for comparison is also not supported. Since the final unique hash value is saved to crc32\underline{ }context as described\ref{overview section}; thus, printing or not does not affect the whole testing process as long as the crc32\underline{ }context variable is accessible. So argv and strcmp can be both removed safely, and this can be done by putting --no-argc command when running Csmith.
+Besides the logical-AND problem, there are other features of Csmith that need to be turned off for Vivado HLS to work, and this also applies to both LegUp HLS and Intel HLS. By default, Csmith generates programs whose main function reads in command-line parameters, using the standard argc and argv to parse them. The only parameter it takes determines whether the user wants the hash value to be printed, and it is compared with a constant integer to make that decision. However, argv is declared as an array of character strings, which is unsynthesizable due to the pointer-to-pointer limitation. Moreover, the strcmp used for the comparison is also not supported. Since the final unique hash value is saved in \texttt{crc32\_context}, as described in \ref{overview section}, printing it or not does not affect the testing process as long as the \texttt{crc32\_context} variable is accessible. Therefore, argv and strcmp can both be removed safely, which can be done by passing the --no-argc option when running Csmith.
-Additionally, in order to not interfering with directives/pragmas of the HLS tools, --no-packed-struct command needs to be explicitly specified. Struct packing is not the issue, but the syntax Csmith generated confuses the HLS tools. If this command is not specified, Csmith will automatically declare “\#pragma pack(1)” before struct definition to enables packed struct. Unfortunately, this has the same format with pragma declaration of HLS tools, thus, causes conflict and need to be removed.
+Additionally, in order not to interfere with the directives/pragmas of the HLS tools, the --no-packed-struct option needs to be explicitly specified. Struct packing itself is not the issue, but the syntax Csmith generates confuses the HLS tools: if this option is not specified, Csmith will automatically declare “\#pragma pack(1)” before a struct definition to enable packed structs. Unfortunately, this has the same format as the pragma declarations of the HLS tools, causing a conflict, so it needs to be removed.
-Other than the features mentioned above, there is a subset of C grammars that requires extra attention to ensure the proper functioning of all three HLS tools. All of which can be modified inside the probability file. Firstly, the probability of generating bitfield is turned off as it required extra twist in syntax to become HLS-friendly, and the correctness cannot be guaranteed. Secondly, since the union is not a feature commonly used in HLS, frequency of union’s occurrence in the program is set to a relatively low number. Note that it’s been confirmed that LegUp doesn’t have a good support for the union. In addition, the float is considered as another tricky datatype due to bit truncation and saturation, which can lead to precision issues. Although float is supported by HLS tools and been provided with HLS tools’ own library such as ap\underline{ }fixed in Vivado HLS and hls\underline{ }float in Intel HLS, the probability of generating float type variable has still been set low. Float type requires extra pre-processing steps with high precision; otherwise, the validity and correctness of the program fed to HLS tools cannot be assured.
+Other than the features mentioned above, there is a subset of C grammar that requires extra attention to ensure the proper functioning of all three HLS tools, all of which can be modified inside the probability file. Firstly, the probability of generating bitfields is turned off, as they require extra syntax twists to become HLS-friendly and their correctness cannot be guaranteed\YH{What is the twist that is needed, and how come it cannot be guaranteed to be correct?}. Secondly, since unions are not a feature commonly used in HLS, the frequency of unions occurring in the program is set to a relatively low number; note that LegUp has been confirmed not to support unions well. In addition, float is considered another tricky datatype due to bit truncation and saturation, which can lead to precision issues. Although float is supported by the HLS tools, which provide their own libraries such as \texttt{ap\_fixed}\YH{Is \texttt{ap\_fixed} really used for \texttt{float}? I thought that was for fixed point numbers and integers, but not sure what the floating point library is.} in Vivado HLS and \texttt{hls\_float} in Intel HLS, the probability of generating float variables is still set low. The float type requires extra high-precision pre-processing steps; otherwise, the validity and correctness of the program fed to the HLS tools cannot be assured.
-Furthermore, several Csimth commands are used to shape the program along the way but not compulsively. For example, in the number control section, --max-funcs command sets the maximum number of functions Csmith can generate. It is interesting to explore whether if the length of the program will affect the synthesis and simulation time. Similarly, --max-block-depth, --max-array-dim, and –max-expr-complexity restrict the maximum depth of the nested block, maximum array dimension, and maximum expression complexities. By alternating those parameters, HLS tools performance can be compared under different conditions.
+Furthermore, several Csmith options from its number-control section are used to shape the programs, though they are optional. For example, the --max-funcs option sets the maximum number of functions Csmith can generate; it is interesting to explore whether the length of the program affects the synthesis and simulation time. Similarly, --max-block-depth, --max-array-dim, and --max-expr-complexity restrict the maximum nested-block depth, the maximum array dimension, and the maximum expression complexity. By altering these parameters, the performance of the HLS tools can be compared under different conditions.
\subsection{How we process the programs generated by Csmith to add in labels on loops etc.}
Once a valid and HLS-friendly C/C++ program is generated, the testing process moves on to the next stage, pre-processing. Pre-processing has several parts, including information extraction, directive/label insertion, main-function reformatting, testbench implementation, and XOR-hashing implementation.
-Starting with automatically gathering information of the random programs, the extracted information will be used as indicators for adding proper labels. Information needed includes function names, number of functions, number of for loops inside each function and inside the whole program, variable names, array variable names, and number of variables. The automation and correctness of extraction is warranted by the uniformed syntax of the Csmith generated program. As shown in \ref{sample random program fig}, the random program follows a specific order. Each parameter can be found by looping through the program and detecting the keyword. For instance, functions are always declared in the forward declaration section and one function declaration per line. Then by reading in each line and checking for “func\underline{ }” keyword, function names can be found and saved as variables for later use.
+Starting with automatically gathering information about the random programs, the extracted information is used to guide the insertion of labels. The information needed includes function names, the number of functions, the number of for-loops inside each function and in the whole program, variable names, array variable names, and the number of variables. The automation and correctness of the extraction are guaranteed by the uniform syntax of the Csmith-generated program. As shown in \ref{sample random program fig}, the random program follows a specific order, so each parameter can be found by looping through the program and detecting keywords. For instance, functions are always declared in the forward-declaration section, with one function declaration per line. By reading each line and checking for the \texttt{func\_} keyword, function names can be found and saved for later use.
-After pre-processing, types and number of optimizations applied are selected through using either the scripts or a C program. Then another processing is required for adding labels or pragmas at the proper place, based on extracted data and selected optimization. Vivado HLS accepts both in-file pragma declaration and TCL script directive specification. TCL scripting was chosen to be the only place where the directives were declared. This simplifies the debug process since directives are not mixed with the program, which is easier to read. The detailed automated generation of TCL script will be described in \ref{tcl_generating section}. Directives for Vivado HLS can be roughly categorized into three-part including function-level, loop-level, and variable-wise. To realizing loop-level optimizations, unique labels need to be added to for-loops inside the program, whereas function-level and variable-wise optimizations require no modification in the program. The number of for-loop labels added is determined at the same time when the TCL script is generated. This will also be discussed in \ref{tcl_generating section}. LegUp HLS follows a similar method excepting it uses an additional Makefile with TCL script to instruct the optimizations. Intel HLS requires both labels and pragmas to be added directly to the program at the specific place. For instance, loop-level pipeline pragma needs to be declared directly under the loop name and above the actual loop that is being optimized.
+After pre-processing, the types and number of optimizations to apply are selected using either scripts or a C program. Another processing pass is then required to add labels or pragmas in the proper places, based on the extracted data and the selected optimizations. Vivado HLS accepts both in-file pragma declarations and TCL-script directive specifications. TCL scripting was chosen as the only place where directives are declared; this simplifies the debugging process, since directives are not mixed into the program, which makes it easier to read\YH{Are there any directives that can only be specified in source code?}. The automated generation of the TCL script is described in detail in \ref{tcl_generating section}. Directives for Vivado HLS can be roughly categorized into three groups: function-level, loop-level, and variable-related. To realize loop-level optimizations, unique labels need to be added to the for-loops inside the program, whereas function-level and variable-related optimizations require no modification of the program. The number of for-loop labels added is determined at the same time as the TCL script is generated; this is also discussed in \ref{tcl_generating section}. LegUp HLS follows a similar method, except that it uses an additional Makefile alongside the TCL script to specify the optimizations. Intel HLS requires both labels and pragmas to be added directly to the program at specific places. For instance, a loop-level pipeline pragma needs to be declared directly under the loop label and above the actual loop being optimized.
-Once labels are added, there are several last modifications required before it can be synthesized by HLS tools. Changes differ for each HLS tools. For Vivado HLS, a standard RTL verification flow needs a self-checking testbench to be in place and sets as the main function. Although Csmith’s main function calls the top-level function, which can act as a testbench triggering all other functions, it lacks the self-checking ability. Besides, the Csmith’s generated main function only returns 0, which is meaningless for result checking. Thus, to fit the need of Vivado HLS, the name of Csmith’s generated main is changed to “result” to avoid conflicts and been explicitly set as the top-level function using the design-flow TCL script. The returned value is modified from 0 to crc32\underline{ }context, the unique hash value that reflects every change in the global variables. Then, a new main function, the self-checking testbench, is written for automatically comparing and reporting the result of C simulation and RTL simulation. The testbench has no additional operations except calling the top-level function and comparing the returned results, so it should not introduce discrepancies in results. Testbench will return a non-zero value if C/RTL results unmatches.
Once labels are added, several last modifications are required before a program can be synthesized by the HLS tools, and these differ for each tool. For Vivado HLS, a standard RTL verification flow needs a self-checking testbench to be in place and set as the main function. Although Csmith's main function calls the top-level function, and so can act as a testbench triggering all other functions, it lacks the self-checking ability. Moreover, the Csmith-generated main only returns 0, which is meaningless for result checking. Thus, to fit the needs of Vivado HLS, Csmith's generated main is renamed to ``result'' to avoid conflicts and is explicitly set as the top-level function using the design-flow TCL script. The returned value is changed from 0 to crc32\_context, the unique hash value that reflects every change in the global variables. Then a new main function, the self-checking testbench, is written to automatically compare and report the results of the C simulation and the RTL simulation. The testbench performs no operations other than calling the top-level function and comparing the returned results, so it should not introduce discrepancies. The testbench returns a non-zero value if the C and RTL results do not match.
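A minimal sketch of this arrangement is shown below. The fixed constant standing in for crc32_context, the elided Csmith body, and the helper check_against_golden are all illustrative; in the real flow the checking logic is the new main, and the golden value comes from the out.gold file rather than a literal.

```c
#include <stdio.h>
#include <stdint.h>

/* Stand-in for the hash Csmith accumulates over its globals; in the
 * real flow every global-variable change feeds into this value. */
static uint32_t crc32_context = 0xdeadbeefu;

/* Csmith's original main, renamed `result' and made the top-level
 * function; it now returns the hash instead of 0. */
uint32_t result(void) {
    /* ...the original Csmith computation would run here... */
    return crc32_context;
}

/* Body of the self-checking testbench (the real version is the new
 * main): call the top-level function, compare against the golden
 * value, and return non-zero on a mismatch so co-simulation can
 * flag the discrepancy. */
int check_against_golden(uint32_t golden) {
    uint32_t out = result();
    if (out != golden) {
        printf("MISMATCH: got %08x, expected %08x\n", out, golden);
        return 1;
    }
    return 0;
}
```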
A similar modification to main is applied for LegUp HLS, except that no new main function is needed, as LegUp does not require a self-checking testbench. Intel HLS requires more processing than the other two tools, mainly because it is unable to process Csmith's hashing function embedded inside main. Thus, a new simple but effective hash method is required for efficient bug detection. Our solution is to XOR all the variables, since XOR is sensitive to bitwise changes and can be implemented using basic universal gates without requiring excessive resources. The hash function is separated from main and acts as the new top-level function, whereas main, behaving as the testbench, calls the hash and returns the result.
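The replacement hash amounts to folding every tracked variable into a single word with XOR; here is a small sketch, with the variables passed as an array for illustration (in the generated program they are the Csmith globals):

```c
#include <stdint.h>

/* XOR-based hash substituted for Csmith's crc32 hashing in the
 * Intel HLS flow: any single-bit change in any input flips the
 * corresponding bit of the output, and the reduction maps to a
 * tree of XOR gates in hardware. */
uint32_t xor_hash(const uint32_t *vars, int n) {
    uint32_t h = 0;
    for (int i = 0; i < n; i++)
        h ^= vars[i];               /* fold each variable into h */
    return h;
}
```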
\subsection{How we generate a TCL script for each program.}

\YH{I feel like this Section could be combined with the previous section, as they touch similar aspects. The TCL generation could then be explained in the middle of the previous section where the TCL generation is first mentioned.}

\YH{This section might also be a bit too detailed; if we want to keep as much space as possible for the results, it might be better to focus on pitfalls of the TCL generation or transformations of C in general, and the problems that were encountered.}

Automated TCL script generation is required for both Vivado HLS and LegUp HLS. Vivado HLS uses two TCL scripts throughout the process: one declares the directives, and the other instructs the design flow. LegUp HLS instead requires one TCL file and one Makefile, which are only used for specifying directives; its design flow is instructed through a shell script. For both tools, the directive TCL script is generated randomly, but always validly, based on the information extracted during the pre-processing stage.
Starting with Vivado HLS, an example of a generated directive file is shown in \ref{sample directive tcl figure}. A category of optimization, function-level, loop-level, or variable-wise, is first selected randomly; the specific directives to apply are then also picked randomly.

\YH{These descriptions could maybe be a bit more concise, for example it might be nice to summarise them in a table?}
\begin{enumerate}
  \item For loop-level \YH{Maybe for-loop-level} optimization, the base limitation is the number of for-loops available for optimization, both inside the program and inside each function. A program may contain no for-loops or many nested for-loops, so there is a wide range of possible numbers. If the program contains for-loops, the number selected for optimization is chosen between 1 and the total number in the program. We include 9 types of loop-level optimization: loop pipelining with rewind, loop pipelining with flush, loop pipelining, loop unrolling, loop flattening, loop merging, loop tripcount, loop inlining, and expression balancing. One of these is randomly selected for each for-loop. Once a directive is chosen, it is automatically printed to the directive TCL script together with its targeted loop name and the name of the function it is in.

  \item For function-level optimization, the restriction is the total number of functions; the selection steps are the same as for loop-level optimization. The function-level optimizations we allow are function pipelining, function pipelining with flush, function-level loop merging, function inlining, and expression balancing. Each directive is specified and saved to the TCL script with the targeted function name.

  \item As for variable-wise optimization, which is applied only to array variables, the limitation is the total number of global array variables used throughout the program. The available optimizations are vertical-mode array map, horizontal-mode array map, array partition, and array reshape. Each variable-wise directive is added to the TCL script with the targeted array variable name. Note that the top-level function name is also specified when adding variable-wise directives because, for a Csmith-generated program, the top-level function reads or writes all the variables.
\end{enumerate}
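Since the directives are ultimately plain text lines in a TCL script, the emission step for a loop-level directive can be sketched as below. This is an illustration, not our actual generator: the option spellings follow Vivado HLS's set_directive_* command family with the "function/loop_label" location argument last, and the index k is shown as a parameter where the flow would draw it with rand().

```c
#include <stdio.h>

/* A few loop-level directives in Vivado HLS TCL form (subset shown;
 * exact option spellings are illustrative). */
static const char *loop_directives[] = {
    "set_directive_pipeline -rewind",
    "set_directive_pipeline -enable_flush",
    "set_directive_pipeline",
    "set_directive_unroll",
    "set_directive_loop_flatten",
};

/* Format one directive line for the directive TCL script; in the
 * flow, k would be drawn as rand() % 5 for each labelled for-loop,
 * and func/label come from the pre-processing stage. */
int format_loop_directive(char *buf, size_t n, int k,
                          const char *func, const char *label) {
    return snprintf(buf, n, "%s \"%s/%s\"",
                    loop_directives[k % 5], func, label);
}
```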

Once the directive TCL script is successfully generated, we proceed to the design-flow TCL script. The design-flow TCL instructs Vivado HLS to perform RTL synthesis and verification in a specific order. The standard flow is: create a new project, set the top-level function, add the necessary files, declare the solution number, declare the target device, set the clock period, source the directive TCL, synthesize the C program, simulate the C program, co-simulate the C and RTL, and finally export the design. The files added are the pre-processed C program, the self-checking testbench, and the golden GCC result. The only modification needed for each test case is the project name; the other commands stay the same. For simplicity, projects are named following the add\_i pattern, where i is the number of the current test case. All of these commands are specified, in order, inside the design-flow TCL.
Compared with Vivado HLS, LegUp HLS does not require an extra TCL script for controlling the design flow, but it does use two files, a Makefile and a Config TCL script, for specifying optimizations; different optimizations are specified in different files. The supported Makefile directives are partial loop unrolling with a threshold, disabling inlining, and disabling all optimizations. The available Config TCL directives are partial loop pipelining, pipelining of all loops, disabling loop pipelining, resource-sharing loop pipelining, and function acceleration. If optimizations in the Config TCL are chosen, the Config TCL needs to be explicitly sourced inside the Makefile as the local configuration; otherwise, the standard global LegUp Config TCL is used. The selection limitations and selection process follow the same method as for Vivado HLS: loop-type directives are restricted by the total number of loops and need to be specified with the loop name (note that loop-level optimization can only be applied to the innermost for-loop), and function-level directives are limited by the total number of functions and are specified with function names.
\subsection{How we run each HLS tool, using timeouts as appropriate.}
The running stage of the testing process follows the order: C synthesis, C simulation, C-to-RTL translation, and finally RTL simulation. Although the overall running process is identical, each HLS tool requires its own small adjustments at the running stage.
Vivado HLS follows the standard running process with one extra step, C/RTL co-simulation. Its testing flow is instructed by the automatically generated TCL file described in \ref{tcl_generation section}. Before entering Vivado HLS, a golden C result is first produced using GCC. The result is saved to the out.gold text file and eventually added to the Vivado project at the add-files step. Since Csmith can generate programs that fail to terminate, to avoid being trapped, a 5-minute timeout is set, starting when the executable is generated and ready to run. Once finished, the exit condition is determined by echoing and comparing the exit status. If the exit code shows forced termination, the current test case is discarded and treated as invalid. It is also possible for a successfully executed Csmith program to return no result. Thus, not only is the exit code checked, but the printed result is also considered, by checking whether any content has been written to the out.gold text file. If the text file is empty, the current test case is likewise forced to stop. The testing process does not proceed to Vivado HLS unless both checks pass. The second timeout limits Vivado HLS's runtime; it starts once the Vivado HLS project is created, after the GCC-related checks have passed, and is set to 2 hours. During this period, Vivado HLS first performs its own C synthesis and simulation to produce a C result to compare against the golden GCC result. Then the C/RTL co-simulation step, which employs the self-checking testbench, simulates both C and RTL and compares the results. Thus, two comparisons are made during the 2-hour period, and two types of mismatching results can exist. The detailed result extraction and comparison method is described in \ref{result extraction section}. A project typically finishes within 2 hours, but not always.
C-to-RTL translation, as well as RTL simulation, usually takes up most of the time. Note that a project that cannot complete within 2 hours is not counted as faulty, since, based on the log files, most incomplete projects are still running when forcibly terminated by the terminal.
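The two pre-checks gating entry into Vivado HLS, a clean exit status and a non-empty out.gold, can be sketched as follows. Only the emptiness check is shown as code; the exit-status check happens earlier in the shell, and the file name is taken from the flow while the helper name is illustrative.

```c
#include <stdio.h>

/* Second pre-check before entering Vivado HLS: after the Csmith
 * executable has run under the 5-minute timeout and its exit status
 * has been checked, the golden file itself must be non-empty, since
 * a program can terminate successfully yet print nothing.
 * Returns 1 if the test case may proceed, 0 if it is discarded. */
int golden_result_ok(const char *gold_path) {
    FILE *f = fopen(gold_path, "rb");
    if (!f)
        return 0;                 /* no golden file at all: discard */
    fseek(f, 0, SEEK_END);
    long size = ftell(f);         /* bytes the program printed */
    fclose(f);
    return size > 0;
}
```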
LegUp HLS versions 4.0 and 7.5 differ greatly in how they are executed. Version 7.5 follows the same pattern as Vivado HLS, but under the Windows environment and launched as a GUI. Since version 7.5 was not tested exhaustively, the following description of the LegUp HLS running process applies only to version 4.0. The version 4.0 running process is instructed by a bash shell script, and two timeouts are used. The first, 5-minute timeout is applied for the same reason as for Vivado HLS. The C result is produced through GCC with the -m32 flag for a 32-bit data width, to match LegUp's behavior. LegUp HLS then performs C-to-RTL translation and RTL simulation via the ``make default v'' command, and that is where the second, 2-hour timeout is placed. The RTL result is compared directly against the golden C result, so only one comparison is made.
Lastly, Intel HLS is instructed by a batch script and employs the i++ compiler. The essential difference when running Intel HLS compared with the other two tools is that Intel HLS processes C++, whereas the other two tools process C. While Intel HLS uses i++ as its compiler, in place of the GCC used for the other two tools, the timeout restrictions still apply, and the number of timeouts increases to 4. The first timeout is set when compiling the C++ program for the CPU, which returns an executable once finished. The second timeout is placed when running the executable to obtain the C++ result. The third timeout, which is given the longest period, covers synthesizing the design and generating the co-simulation executable. Finally, running the co-simulation executable requires the fourth timeout. The test case can be discarded at any timeout if its task does not finish within the allotted time.
\subsection{How we extract and compare the result}
Results are extracted automatically, either from log files/transcripts or from the terminal's output, and are then compared, either by the HLS tools themselves or by our own checking method. Although results are compared immediately, and the outcome returned to determine whether the reduction process should be entered, every result produced is still saved for the later analysis stage.
As mentioned in \ref{how to timeout section}, two comparisons are performed and returned automatically by Vivado HLS using the self-checking testbench; the returned exit code indicates whether to start the reduction process. The first comparison is made between the golden GCC C result and Vivado HLS's C simulation result, whereas the second is made between Vivado HLS's C simulation and RTL simulation results. The golden result is extracted from the terminal's output, whereas Vivado HLS's C simulation and RTL simulation results are read from the log file. Saving results from the terminal's output is done simply by echoing the exit code and redirecting it to a text file. Finding results in the log files, however, requires more effort. The log file is automatically generated by Vivado HLS and has a specific pattern: it follows a top-down flow, recording information about the C synthesis, C simulation, C testbench checking/comparing, and RTL testbench checking/comparing. Each section has a fixed header that indicates the step it is in, such as C TB testing. Thus, the result extraction method loops through the file, finds the specific header name, extracts the hexadecimal results, and saves them to the result file. Ideally, both comparison results can be found inside the log file; however, the log file may stop at the first comparison if the C result does not match the GCC result, in which case only the C result is extracted and saved.
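The header-then-hex scanning step can be sketched as follows. The log layout here is an assumption for illustration (the header string "C TB testing" is taken from the text above, but Vivado HLS's real log format may differ in detail).

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the log-scanning step: walk the log line by line until
 * the fixed section header is found, then parse the first
 * hexadecimal result that follows it. Returns 1 on success and
 * stores the value in *result. */
int extract_result(FILE *log, const char *header, unsigned *result) {
    char line[512];
    int in_section = 0;
    while (fgets(line, sizeof line, log)) {
        if (strstr(line, header)) {
            in_section = 1;       /* entered the section of interest */
            continue;
        }
        if (in_section && sscanf(line, "%x", result) == 1)
            return 1;             /* first hex value after the header */
    }
    return 0;                     /* header or result never appeared */
}
```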