diff options
Diffstat (limited to 'cil/doc/cil004.html')
-rw-r--r-- | cil/doc/cil004.html | 350 |
1 files changed, 350 insertions, 0 deletions
diff --git a/cil/doc/cil004.html b/cil/doc/cil004.html new file mode 100644 index 00000000..16fde392 --- /dev/null +++ b/cil/doc/cil004.html @@ -0,0 +1,350 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" + "http://www.w3.org/TR/REC-html40/loose.dtd"> +<HTML> +<HEAD> + + + +<META http-equiv="Content-Type" content="text/html; charset=ANSI_X3.4-1968"> +<META name="GENERATOR" content="hevea 1.08"> + +<base target="main"> +<script language="JavaScript"> +<!-- Begin +function loadTop(url) { + parent.location.href= url; +} +// --> +</script> +<LINK rel="stylesheet" type="text/css" href="cil.css"> +<TITLE> +Compiling C to CIL +</TITLE> +</HEAD> +<BODY > +<A HREF="cil003.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A> +<A HREF="ciltoc.html"><IMG SRC ="contents_motif.gif" ALT="Up"></A> +<A HREF="cilly.html"><IMG SRC ="next_motif.gif" ALT="Next"></A> +<HR> + +<H2 CLASS="section"><A NAME="htoc4">4</A> Compiling C to CIL</H2><A NAME="sec-cabs2cil"></A> +In this section we try to describe a few of the many transformations that are +applied to a C program to convert it to CIL. The module that implements this +conversion is about 5000 lines of OCaml code. In contrast a simple program +transformation that instruments all functions to keep a shadow stack of the +true return address (thus preventing stack smashing) is only 70 lines of code. +This example shows that the analysis is so much simpler because it has to +handle only a few simple C constructs and also because it can leverage on CIL +infrastructure such as visitors and pretty-printers.<BR> +<BR> +In no particular order these are a few of the most significant ways in which +C programs are compiled into CIL: +<OL CLASS="enumerate" type=1><LI CLASS="li-enumerate"> +CIL will eliminate all declarations for unused entities. This means that +just because your hello world program includes <TT>stdio.h</TT> it does not mean +that your analysis has to handle all the ugly stuff from <TT>stdio.h</TT>.<BR> +<BR> +<LI CLASS="li-enumerate">Type specifiers are interpreted and normalized: +<PRE CLASS="verbatim"><FONT COLOR=blue> +int long signed x; +signed long extern x; +long static int long y; + +// Some code that uses these declaration, so that CIL does not remove them +int main() { return x + y; } +</FONT></PRE> +See the <A HREF="examples/ex1.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">Anonymous structure and union declarations are given a name. +<PRE CLASS="verbatim"><FONT COLOR=blue> + struct { int x; } s; +</FONT></PRE> +See the <A HREF="examples/ex2.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">Nested structure tag definitions are pulled apart. This means that all +structure tag definitions can be found by a simple scan of the globals. +<PRE CLASS="verbatim"><FONT COLOR=blue> +struct foo { + struct bar { + union baz { + int x1; + double x2; + } u1; + int y; + } s1; + int z; +} f; +</FONT></PRE> +See the <A HREF="examples/ex3.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">All structure, union, enumeration definitions and the type definitions +from inners scopes are moved to global scope (with appropriate renaming). This +facilitates moving around of the references to these entities. +<PRE CLASS="verbatim"><FONT COLOR=blue> +int main() { + struct foo { + int x; } foo; + { + struct foo { + double d; + }; + return foo.x; + } +} +</FONT></PRE> +See the <A HREF="examples/ex4.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">Prototypes are added for those functions that are called before being +defined. Furthermore, if a prototype exists but does not specify the type of +parameters that is fixed. But CIL will not be able to add prototypes for those +functions that are neither declared nor defined (but are used!). +<PRE CLASS="verbatim"><FONT COLOR=blue> + int f(); // Prototype without arguments + int f(double x) { + return g(x); + } + int g(double x) { + return x; + } +</FONT></PRE> +See the <A HREF="examples/ex5.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">Array lengths are computed based on the initializers or by constant +folding. +<PRE CLASS="verbatim"><FONT COLOR=blue> + int a1[] = {1,2,3}; + int a2[sizeof(int) >= 4 ? 8 : 16]; +</FONT></PRE> +See the <A HREF="examples/ex6.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">Enumeration tags are computed using constant folding: +<PRE CLASS="verbatim"><FONT COLOR=blue> +int main() { + enum { + FIVE = 5, + SIX, SEVEN, + FOUR = FIVE - 1, + EIGHT = sizeof(double) + } x = FIVE; + return x; +} + +</FONT></PRE> +See the <A HREF="examples/ex7.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">Initializers are normalized to include specific initialization for the +missing elements: +<PRE CLASS="verbatim"><FONT COLOR=blue> + int a1[5] = {1,2,3}; + struct foo { int x, y; } s1 = { 4 }; +</FONT></PRE> +See the <A HREF="examples/ex8.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">Initializer designators are interpreted and eliminated. Subobjects are +properly marked with braces. CIL implements +the whole ISO C99 specification for initializer (neither GCC nor MSVC do) and +a few GCC extensions. +<PRE CLASS="verbatim"><FONT COLOR=blue> + struct foo { + int x, y; + int a[5]; + struct inner { + int z; + } inner; + } s = { 0, .inner.z = 3, .a[1 ... 2] = 5, 4, y : 8 }; +</FONT></PRE> +See the <A HREF="examples/ex9.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">String initializers for arrays of characters are processed +<PRE CLASS="verbatim"><FONT COLOR=blue> +char foo[] = "foo plus bar"; +</FONT></PRE> +See the <A HREF="examples/ex10.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">String constants are concatenated +<PRE CLASS="verbatim"><FONT COLOR=blue> +char *foo = "foo " " plus " " bar "; +</FONT></PRE> +See the <A HREF="examples/ex11.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">Initializers for local variables are turned into assignments. This is in +order to separate completely the declarative part of a function body from the +statements. This has the unfortunate effect that we have to drop the <TT>const</TT> +qualifier from local variables ! +<PRE CLASS="verbatim"><FONT COLOR=blue> + int x = 5; + struct foo { int f1, f2; } a [] = {1, 2, 3, 4, 5 }; +</FONT></PRE> +See the <A HREF="examples/ex12.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">Local variables in inner scopes are pulled to function scope (with +appropriate renaming). Local scopes thus disappear. This makes it easy to find +and operate on all local variables in a function. +<PRE CLASS="verbatim"><FONT COLOR=blue> + int x = 5; + int main() { + int x = 6; + { + int x = 7; + return x; + } + return x; + } +</FONT></PRE> +See the <A HREF="examples/ex13.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">Global declarations in local scopes are moved to global scope: +<PRE CLASS="verbatim"><FONT COLOR=blue> + int x = 5; + int main() { + int x = 6; + { + static int x = 7; + return x; + } + return x; + } +</FONT></PRE> +See the <A HREF="examples/ex14.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">Return statements are added for functions that are missing them. If the +return type is not a base type then a <TT>return</TT> without a value is added. +The guaranteed presence of return statements makes it easy to implement a +transformation that inserts some code to be executed immediately before +returning from a function. +<PRE CLASS="verbatim"><FONT COLOR=blue> + int foo() { + int x = 5; + } +</FONT></PRE> +See the <A HREF="examples/ex15.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">One of the most significant transformations is that expressions that +contain side-effects are separated into statements. +<PRE CLASS="verbatim"><FONT COLOR=blue> + int x, f(int); + return (x ++ + f(x)); +</FONT></PRE> +See the <A HREF="examples/ex16.txt">CIL output</A> for this +code fragment<BR> +<BR> +Internally, the <TT>x ++</TT> statement is turned into an assignment which the +pretty-printer prints like the original. CIL has only three forms of basic +statements: assignments, function calls and inline assembly.<BR> +<BR> +<LI CLASS="li-enumerate">Shortcut evaluation of boolean expressions and the <TT>?:</TT> operator are +compiled into explicit conditionals: +<PRE CLASS="verbatim"><FONT COLOR=blue> + int x; + int y = x ? 2 : 4; + int z = x || y; + // Here we duplicate the return statement + if(x && y) { return 0; } else { return 1; } + // To avoid excessive duplication, CIL uses goto's for + // statement that have more than 5 instructions + if(x && y || z) { x ++; y ++; z ++; x ++; y ++; return z; } +</FONT></PRE> +See the <A HREF="examples/ex17.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">GCC's conditional expression with missing operands are also compiled +into conditionals: +<PRE CLASS="verbatim"><FONT COLOR=blue> + int f();; + return f() ? : 4; +</FONT></PRE> +See the <A HREF="examples/ex18.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">All forms of loops (<TT>while</TT>, <TT>for</TT> and <TT>do</TT>) are compiled +internally as a single <TT>while(1)</TT> looping construct with explicit <TT>break</TT> +statement for termination. For simple <TT>while</TT> loops the pretty printer is +able to print back the original: +<PRE CLASS="verbatim"><FONT COLOR=blue> + int x, y; + for(int i = 0; i<5; i++) { + if(i == 5) continue; + if(i == 4) break; + i += 2; + } + while(x < 5) { + if(x == 3) continue; + x ++; + } +</FONT></PRE> +See the <A HREF="examples/ex19.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">GCC's block expressions are compiled away. (That's right there is an +infinite loop in this code.) +<PRE CLASS="verbatim"><FONT COLOR=blue> + int x = 5, y = x; + int z = ({ x++; L: y -= x; y;}); + return ({ goto L; 0; }); +</FONT></PRE> +See the <A HREF="examples/ex20.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">CIL contains support for both MSVC and GCC inline assembly (both in one +internal construct)<BR> +<BR> +<LI CLASS="li-enumerate">CIL compiles away the GCC extension that allows many kinds of constructs +to be used as lvalues: +<PRE CLASS="verbatim"><FONT COLOR=blue> + int x, y, z; + return &(x ? y : z) - & (x ++, x); +</FONT></PRE> +See the <A HREF="examples/ex21.txt">CIL output</A> for this +code fragment<BR> +<BR> +<LI CLASS="li-enumerate">All types are computed and explicit casts are inserted for all +promotions and conversions that a compiler must insert:<BR> +<BR> +<LI CLASS="li-enumerate">CIL will turn old-style function definition (without prototype) into +new-style definitions. This will make the compiler less forgiving when +checking function calls, and will catch for example cases when a function is +called with too few arguments. This happens in old-style code for the purpose +of implementing variable argument functions.<BR> +<BR> +<LI CLASS="li-enumerate">Since CIL sees the source after preprocessing the code after CIL does +not contain the comments and the preprocessing directives.<BR> +<BR> +<LI CLASS="li-enumerate">CIL will remove from the source file those type declarations, local +variables and inline functions that are not used in the file. This means that +your analysis does not have to see all the ugly stuff that comes from the +header files: +<PRE CLASS="verbatim"><FONT COLOR=blue> +#include <stdio.h> + +typedef int unused_type; + +static char unused_static (void) { return 0; } + +int main() { + int unused_local; + printf("Hello world\n"); // Only printf will be kept from stdio.h +} +</FONT></PRE> +See the <A HREF="examples/ex22.txt">CIL output</A> for this +code fragment</OL> +<HR> +<A HREF="cil003.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A> +<A HREF="ciltoc.html"><IMG SRC ="contents_motif.gif" ALT="Up"></A> +<A HREF="cilly.html"><IMG SRC ="next_motif.gif" ALT="Next"></A> +</BODY> +</HTML> |