From a5f03d96eee482cd84861fc8cefff9eb451c0cad Mon Sep 17 00:00:00 2001 From: xleroy Date: Sun, 29 Mar 2009 09:47:11 +0000 Subject: Cleaned up configure script. Distribution of CIL as an expanded source tree with changes applied (instead of original .tar.gz + patches to be applied at config time). git-svn-id: https://yquem.inria.fr/compcert/svn/compcert/trunk@1020 fca1b0fc-160b-0410-b1d3-a4f43f01ea2e --- cil/doc/ext.html | 506 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 506 insertions(+) create mode 100644 cil/doc/ext.html (limited to 'cil/doc/ext.html') diff --git a/cil/doc/ext.html b/cil/doc/ext.html new file mode 100644 index 00000000..532e2258 --- /dev/null +++ b/cil/doc/ext.html @@ -0,0 +1,506 @@ + + + + + + + + + + + + + +Library of CIL Modules + + + +Previous +Up +Next +
+ +

8  Library of CIL Modules


+
+We are developing a suite of modules that use CIL for program analyses and +transformations that we have found useful. You can use these modules directly +on your code, or generally as inspiration for writing similar modules. A +particularly big and complex application written on top of CIL is CCured +(../ccured/index.html).
+
+ +

8.1  Control-Flow Graphs

+The Cil.stmt datatype includes fields for intraprocedural +control-flow information: the predecessor and successor statements of +the current statement. This information is not computed by default. +If you want to use the control-flow graph, or any of the extensions in +this section that require it, you have to explicitly ask CIL to +compute the CFG.
+
+ +

8.1.1  The CFG module (new in CIL 1.3.5)

+The best way to compute the CFG is with the CFG module. Just invoke +Cfg.computeFileCFG on your file. The Cfg API +describes the rest of actions you can take with this module, including +computing the CFG for one function at a time, or printing the CFG in +dot form.
+
+ +

8.1.2  Simplified control flow

+CIL can reduce high-level C control-flow constructs like switch and +continue to lower-level gotos. This completely eliminates some +possible classes of statements from the program and may make the result +easier to analyze (e.g., it simplifies data-flow analysis).
+
+You can invoke this transformation on the command line with +--domakeCFG or programatically with Cil.prepareCFG. +After calling Cil.prepareCFG, you can use Cil.computeCFGInfo +to compute the CFG information and find the successor and predecessor +of each statement.
+
+For a concrete example, you can see how cilly --domakeCFG +transforms the following code (note the fall-through in case 1): +

+  int foo (int predicate) {
+    int x = 0;
+    switch (predicate) {
+      case 0: return 111;
+      case 1: x = x + 1;
+      case 2: return (x+3);
+      case 3: break;
+      default: return 222;
+    }
+    return 333;
+  }
+
+See the CIL output for this +code fragment
+
+ +

8.2  Data flow analysis framework

+The Dataflow module (click for the ocamldoc) contains a +parameterized framework for forward and backward data flow +analyses. You provide the transfer functions and this module does the +analysis. You must compute control-flow information (Section 8.1) +before invoking the Dataflow module.
+
+ +

8.3  Dominators

+The module Dominators contains the computation of immediate + dominators. It uses the Dataflow module.
+
+ +

8.4  Points-to Analysis

+The module ptranal.ml contains two interprocedural points-to +analyses for CIL: Olf and Golf. Olf is the default. +(Switching from olf.ml to golf.ml requires a change in +Ptranal and a recompiling cilly.)
+
+The analyses have the following characteristics: + +The analysis itself is factored into two components: Ptranal, +which walks over the CIL file and generates constraints, and Olf +or Golf, which solve the constraints. The analysis is invoked +with the function Ptranal.analyze_file: Cil.file -> + unit. This function builds the points-to graph for the CIL file +and stores it internally. There is currently no facility for clearing +internal state, so Ptranal.analyze_file should only be called +once.
+
+The constructed points-to graph supports several kinds of queries, +including alias queries (may two expressions be aliased?) and +points-to queries (to what set of locations may an expression point?).
+
+The main interface with the alias analysis is as follows: + +The precision of the analysis can be customized by changing the values +of several flags: + +In practice, the best precision/efficiency tradeoff is achieved by +setting Ptranal.no_sub to false, Ptranal.analyze_mono to +true, and Ptranal.smart_aliases to false. These are the +default values of the flags.
+
+There are also a few flags that can be used to inspect or serialize +the results of the analysis. + + +

8.5  StackGuard

+The module heapify.ml contains a transformation similar to the one +described in “StackGuard: Automatic Adaptive Detection and Prevention of +Buffer-Overflow Attacks”, Proceedings of the 7th USENIX Security +Conference. In essence it modifies the program to maintain a separate +stack for return addresses. Even if a buffer overrun attack occurs the +actual correct return address will be taken from the special stack.
+
+Although it does work, this CIL module is provided mainly as an example of +how to perform a simple source-to-source program analysis and +transformation. As an optimization only functions that contain a dangerous +local array make use of the special return address stack.
+
+For a concrete example, you can see how cilly --dostackGuard +transforms the following dangerous code: +

+  int dangerous() {
+    char array[10];
+    scanf("%s",array); // possible buffer overrun!
+  }
+
+  int main () {
+    return dangerous();
+  }
+
+See the CIL output for this +code fragment
+
+ +

8.6  Heapify

+The module heapify.ml also contains a transformation that moves all +dangerous local arrays to the heap. This also prevents a number of buffer +overruns.
+
+For a concrete example, you can see how cilly --doheapify +transforms the following dangerous code: +

+  int dangerous() {
+    char array[10];
+    scanf("%s",array); // possible buffer overrun!
+  }
+
+  int main () {
+    return dangerous();
+  }
+
+See the CIL output for this +code fragment
+
+ +

8.7  One Return

+The module oneret.ml contains a transformation the ensures that all +function bodies have at most one return statement. This simplifies a number +of analyses by providing a canonical exit-point.
+
+For a concrete example, you can see how cilly --dooneRet +transforms the following code: +

+  int foo (int predicate) {
+    if (predicate <= 0) {
+      return 1;
+    } else {
+      if (predicate > 5)
+        return 2;
+      return 3;
+    }
+  }
+
+See the CIL output for this +code fragment
+
+ +

8.8  Partial Evaluation and Constant Folding

+The partial.ml module provides a simple interprocedural partial +evaluation and constant folding data-flow analysis and transformation. This +transformation requires the --domakeCFG option.
+
+For a concrete example, you can see how cilly --domakeCFG --dopartial +transforms the following code (note the eliminated if branch and the +partial optimization of foo): +

+  int foo(int x, int y) {
+    int unknown;
+    if (unknown)
+      return y+2;     
+    return x+3;      
+  }
+
+  int main () {
+    int a,b,c;
+    a = foo(5,7) + foo(6,7);
+    b = 4;
+    c = b * b;      
+    if (b > c)     
+      return b-c;
+    else
+      return b+c; 
+  }
+
+See the CIL output for this +code fragment
+
+ +

8.9  Reaching Definitions

+The reachingdefs.ml module uses the dataflow framework and CFG +information to calculate the definitions that reach each +statement. After computing the CFG (Section 8.1) and calling +computeRDs on a +function declaration, ReachingDef.stmtStartData will contain a +mapping from statement IDs to data about which definitions reach each +statement. In particular, it is a mapping from statement IDs to a +triple the first two members of which are used internally. The third +member is a mapping from variable IDs to Sets of integer options. If +the set contains Some(i), then the definition of that variable +with ID i reaches that statement. If the set contains None, +then there is a path to that statement on which there is no definition +of that variable. Also, if the variable ID is unmapped at a +statement, then no definition of that variable reaches that statement.
+
+To summarize, reachingdefs.ml has the following interface: + + +

8.10  Available Expressions

+The availexps.ml module uses the dataflow framework and CFG +information to calculate something similar to a traditional available +expressions analysis. After computeAEs is called following a CFG +calculation (Section 8.1), AvailableExps.stmtStartData will +contain a mapping +from statement IDs to data about what expressions are available at +that statement. The data for each statement is a mapping for each +variable ID to the whole expression available at that point(in the +traditional sense) which the variable was last defined to be. So, +this differs from a traditional available expressions analysis in that +only whole expressions from a variable definition are considered rather +than all expressions.
+
+The interface is as follows: + + +

8.11  Liveness Analysis

+The liveness.ml module uses the dataflow framework and +CFG information to calculate which variables are live at +each program point. After computeLiveness is called +following a CFG calculation (Section 8.1), LiveFlow.stmtStartData will +contain a mapping for each statement ID to a set of varinfos +for varialbes live at that program point.
+
+The interface is as follows: + +Also included in this module is a command line interface that +will cause liveness data to be printed to standard out for +a particular function or label. + + +

8.12  Dead Code Elimination

+The module deadcodeelim.ml uses the reaching definitions +analysis to eliminate assignment instructions whose results +are not used. The interface is as follows: + + +

8.13  Simple Memory Operations

+The simplemem.ml module allows CIL lvalues that contain memory +accesses to be even futher simplified via the introduction of +well-typed temporaries. After this transformation all lvalues involve +at most one memory reference.
+
+For a concrete example, you can see how cilly --dosimpleMem +transforms the following code: +

+  int main () {
+    int ***three;
+    int **two;
+    ***three = **two; 
+  } 
+
+See the CIL output for this +code fragment
+
+ +

8.14  Simple Three-Address Code

+The simplify.ml module further reduces the complexity of program +expressions and gives you a form of three-address code. After this +transformation all expressions will adhere to the following grammar: +
+ basic::=
+    Const _ 
+    Addrof(Var v, NoOffset)
+    StartOf(Var v, NoOffset)
+    Lval(Var v, off), where v is a variable whose address is not taken
+                      and off contains only "basic"
+
+ exp::=
+    basic
+    Lval(Mem basic, NoOffset)
+    BinOp(bop, basic, basic)
+    UnOp(uop, basic)
+    CastE(t, basic)
+   
+ lval ::= 
+    Mem basic, NoOffset
+    Var v, off, where v is a variable whose address is not taken and off
+                contains only "basic"
+
In addition, all sizeof and alignof forms are turned into +constants. Accesses to arrays and variables whose address is taken are +turned into "Mem" accesses. All field and index computations are turned +into address arithmetic.
+
+For a concrete example, you can see how cilly --dosimplify +transforms the following code: +

+  int main() {
+    struct mystruct {
+      int a;
+      int b;
+    } m;
+    int local;
+    int arr[3];
+    int *ptr;
+
+    ptr = &local;
+    m.a = local + sizeof(m) + arr[2];
+    return m.a; 
+  } 
+
+See the CIL output for this +code fragment
+
+ +

8.15  Converting C to C++

+The module canonicalize.ml performs several transformations to correct +differences between C and C++, so that the output is (hopefully) valid +C++ code. This may be incomplete — certain fixes which are necessary +for some programs are not yet implemented.
+
+Using the --doCanonicalize option with CIL will perform the +following changes to your program: +
  1. +Any variables that use C++ keywords as identifiers are renamed. +
  2. C allows global variables to have multiple declarations and + multiple (equivalent) definitions. This transformation removes + all but one declaration and all but one definition. +
  3. __inline is #defined to inline, and __restrict + is #defined to nothing. +
  4. C allows function pointers with no specified arguments to be used on + any argument list. To make C++ accept this code, we insert a cast + from the function pointer to a type that matches the arguments. Of + course, this does nothing to guarantee that the pointer actually has + that type. +
  5. Makes casts from int to enum types explicit. (CIL changes enum + constants to int constants, but doesn't use a cast.) +
+
+Previous +Up +Next + + -- cgit