diff options
Diffstat (limited to 'papers/cfrontend_new/paper.tex')
-rwxr-xr-x | papers/cfrontend_new/paper.tex | 382 |
1 files changed, 0 insertions, 382 deletions
diff --git a/papers/cfrontend_new/paper.tex b/papers/cfrontend_new/paper.tex deleted file mode 100755 index 4293554f..00000000 --- a/papers/cfrontend_new/paper.tex +++ /dev/null @@ -1,382 +0,0 @@ -\documentclass[namedreferences]{kluwer} - -\usepackage{latexsym} -%\usepackage{amsmath} pb latex -\usepackage{amssymb} -\usepackage{amsfonts} -\usepackage{mymacros} -\usepackage{infrules} -\usepackage{url} -\usepackage{mymacros} -\usepackage[usenames]{color} - -\input{macros} - -\begin{document} -\begin{article} - -\begin{opening} -\title{Formal verification of C front-end} -\subtitle{Yet another formal verification in Coq of a C front-end} -\runningtitle{C front-end} - -\author{Sandrine \surname{Blazy} \email{{S}andrine.Blazy@ensiie.fr}} -\institute{ENSIIE and INRIA} - -\author{Xavier \surname{Leroy} \email{{X}avier.Leroy@inria.fr}} -\runningauthor{S.Blazy, X.Leroy} -\institute{INRIA} - -\maketitle - -\begin{abstract} -This paper presents -\end{abstract} - -\keywords{formal semantics, formal proof, C language, C compiler} -\end{opening} - -\section{Introduction} - -\section{A formal semantics for \Clight{}} -\subsection{CIL} - -\subsection{Abstract syntax} - -The abstract syntax of \Clight{} is given in figures~\ref{fig:syntax} -and~\ref{fig:syntax2}. -In the Coq formalization, this abstract syntax is presented as -inductive data types, therefore achieving a deep embedding of \Clight{} -into Coq. - -At the level of types, \Clight{} features all the integral types of C, -along with array, pointer (including function pointers), function -types, {\tt struct} and {\tt union} types. Bit-fields in {\tt struct} -or {\tt union} are ignored. -The integral types fully specify the bit size of integers and floats, -unlike the half-specified C types {\tt int}, {\tt long}, etc. - -\sbcm{TO DO: representation of struct and union} - -Within expressions, all C operators are supported. -Side-effects stem from function calls. -All expressions and their sub-expressions are -annotated by their static types. We write $a^\tau$ for the expression -$a$ carrying type $\tau$. These types are necessary to determine the -semantics of type-dependent operators such as the overloaded -arithmetic operators. The following expressions may be in left-value -position: $\id, *a, a_1[a_2]$ and $a \dot \id$. - -\input syntax.tex - -At the level of statements, all structured control statements of C -(conditional, loops, simple {\tt switch}, {\tt break}, {\tt continue} and -{\tt return}) are supported, but not unstructured statements -(unstructured {\tt switch} and {\tt goto}). -Contrary to C, the default case is mandatory in a \Clight{} {\tt switch} statement -and it is the last branch of the {\tt switch}. -An assignment is treated as a statement (instead of an expression). -This transformation from expression into statements is performed by CIL. -Two kinds of variables are allowed: global variables and local {\tt auto} -variables declared at the beginning of a function. Block-scoped local -variables and {\tt static} variables are omitted because they are emulated by -pulling their declarations to function scope or global scope, -respectively. Consequently, there is no block statement in \Clight{}. - -A function $f$ defined in \Clight{} consists of a header (\textit{i.e.} -{\tt fn\_params(f)} the parameters of $f$ and {\tt fn\_return(f)} -the return type of $f$), a list of local variables ({\tt fn\_vars(f)}) -and a body ({\tt fn\_body(f)}) that is a statement. The functions composing -the program are either internal functions, defined within \Clight{}, or -external functions (a.k.a. system calls). An external function consists -of the internal identifier and the signature of the function. - -A program $p$ consists of a list of global variable declarations -({\tt prog\_vars(p)}), a list of function definitions ({\tt prog\_funct(p)}), -and an identifier naming the entry point of the program ({\tt prog\_main(p)}). -In a global variable declaration, the list of initial values -represents all values that need to be stored (\textit{e.g.} the -values of the cells of an array). - -\input syntax2.tex - -\subsection{Static semantics} - The type system is very coarse: we check only the typing properties - that matter for the translation to be correct. Essentially, - we need to check that the types attached to variable references - match the declaration types for those variables. - -\subsection{Dynamic semantics} - -The dynamic semantics of \Clight{} is specified using natural semantics, -also known as big-step operational semantics. - These natural semantics capture the final result of program execution, as -well as traces of calls to external functions that emit an event when applied. -The big-step semantics described in~\cite{blazy06:fm} did not observe traces. -This addition of traces leads to a significantly -stronger observational equivalence between source and machine code. - -Execution traces represent the interactions between the program and -the rest of the world (\textit{i.e.} the input-output activities of -the program); they are defined in figure~\ref{fig:trace}. -A trace is a finite list of input-output events. -Each call to an external function produces an event. -An event consists of an identifier and the actual values of the parameters of -the called function. Usually, the identifier represents the called function, -but more generally it could also represent any observable identifier. -Starting from an initial empty event (called $\E0$), events are propagated and -concatenated during program execution. - -\input trace.tex - -While the semantics of -C is not deterministic (the evaluation order for expressions is not -completely specified and compilers are free to choose between several -orders), the semantics of \Clight{} is completely deterministic and -imposes a left-to-right evaluation order, consistent with the order -implemented by our compiler. This choice simplifies greatly the -semantics compared with, for instance, Norrish's semantics for C -\cite{Norrish:phd}, which captures (at great expense) the -non-determinism allowed by the ISO C specification. Our semantics can -therefore be viewed as a refinement of (a subset of) the ISO C -semantics, or that of Norrish. - -The semantics is defined by 7 judgements (relations): -$$\begin{array}{rclr} -G, E & |- & a, M \evallvalue \loc, \tr, M' & \mbox{(expressions in l-value position)} \\ -G, E & |- & a , M \evalexpr v, \tr, M' & \mbox{(expressions in r-value position)} \\ -G, E & |- & \seq a, M \evalexpr \seq v, \tr, M' & \mbox{(list of expressions)}\\ -G, E & |- & s, M \evalstmt \out, \tr, M' & \mbox{(statements)} \\ -G, E & |- & \ls, M \evalstmt \out, \tr, M' & \mbox{(labeled statements of a switch)} \\ -G & |- & f(\seq v), M \evalfunc v, \tr, M' & \mbox{(function invocations)} \\ - & \vdash & p \evalprog v, \tr & \mbox{(programs)} -\end{array}$$ -Each judgement relates a syntactic element (expression, statement, -etc.) and an initial memory state to the result of executing this -syntactic element, as well as the final memory state at the end of -execution and also the trace of input-output events (system calls). - The various kinds of results, as well as the evaluation -environments, are defined in figure~\ref{fig:values}. - -\input values.tex - -For instance, executing an expression $a$ in l-value position results in a -memory location $\loc$ (a memory block reference and an offset within that -block), while executing an expression $a$ in r-value position results in -the value $v$ of the expression. Values range over 32-bit integers, -64-bit floats, memory locations (pointers), and an undefined value -that represents for instance the value of uninitialized variables. -The result associated with the execution of a statement $s$ is an -``outcome'' indicating how the execution terminated: either normally -by running to completion or prematurely via a {\tt break}, {\tt continue} or -{\tt return} statement. -The invocation of a function $f$ yields a value $v$, and so does the execution -of a program. - -Two evaluation environments, defined in figure~\ref{fig:values}, -appear as parameters to the judgements. The local environment $E$ maps -local variables to references of memory blocks containing the values -of these variables. These blocks are allocated at function entry and -freed at function return. The global environment $G$ maps global -variables and function names to memory references. It also maps -some references (those corresponding to function pointers) to function -definitions. - -The memory model of the semantics is described in~\cite{icfem:sb}. -A memory $M$ maps block references to block contents. -Each block represents the result of a C \texttt{malloc}, -or a stack frame, a global static variable, or a function -code-pointer. -A block content consists of the dimensions of the block (low and high bounds) -plus a mapping from byte offsets to byte-sized memory cells (\textit{i.e.} -values). - -\sbcm{TO DO: a detailler (faire une section a part?): il faut expliquer -loadval et storeval qui sont utilisees dans la semantique.} - -In the Coq specification, the 7 judgements of the dynamic semantics -are encoded as mutually-inductive predicates. Each defining case of -each predicate corresponds exactly to an inference rule in the -conventional, on-paper presentation of natural semantics. We have one -inference rule for each kind of expression and statement described in -figure~\ref{fig:syntax}. - We show most of the inference rules in figures~\ref{fig:dynsem1} -to~\ref{fig:dynsem4}. - -\input dynsem1.tex - -The first five rules of figure~\ref{fig:dynsem1} illustrate the -evaluation of an expression in l-value position. -A variable $\id$ evaluates to the location $(E(\id), 0)$. -If an expression $a$ evaluates to a pointer value $\loc$, then -the location of the dereferencing expression $(\hbox{{\tt *}}a)^\tau$ is $\loc$. - -Rule~\ref{rule:3} shows the evaluation of an array cell. -The location of an array cell $\unannot{a_1}{\tau_1} [\unannot{a_2}{\tau_2}]$ -is computed from the location of the array $\unannot{a_1}{\tau_1}$ -({\it i.e.} the location of the first cell) and the value of the index -$\unannot{a_2}{\tau_2}$ thanks to the function {\tt add}. -{\tt add} is a partial function: -it can be undefined if the types and the shapes of argument values are -incompatible (e.g. an integer addition of two pointer values). -In the Coq specification, it is a total -function returning optional values: either {\tt None} in case of failure, or -${\tt Some}(v)$, abbreviated as $\some v$, in case of success. -%More precisely, the computation succeeds only if $v_1$ and -%$v_2$ are either 2 integers, or 2 floats or a pointer value and an integer. -%The corresponding types $\tau_1$ and $\tau_2$ must also be -%compatible with these values. For instance, if $\tau_1$ is a {\tt Tarray}, -%then $\tau_2$ must be a {\tt Tint}. -The trace resulting from the evaluation of -$\unannot{a_1}{\tau_1} [\unannot{a_2}{\tau_2}]$ is the concatenation of the -traces resulting from the evaluation of $\unannot{a_1}{\tau_1}$ and -$\unannot{a_2}{\tau_2}$. - - -In a similar way, the location of an access to a {\tt struct} field -$\unannot{a}{\tau} \dot f$ is computed from the location of the {\tt struct} -$\unannot{a}{\tau}$ and an offset with that {\tt struct} (see -rule~\ref{rule:4}). -\sbcm{un peu rapide, cf. repr ?} -The location of an access to a {\tt union} field is the location of -the {\tt union} (see rule~\ref{rule:5}). - -The last four rule of figure~\ref{fig:dynsem1} illustrate the -evaluation of an expression in r-value position. -Rule~\ref{rule:6} rule shows the evaluation of a l-value expression in an -r-value context. The expression is evaluated to its location $l$, -with final memory state $M'$. The value at location $l$ in $M'$ is -fetched using the {\tt loadval} function (see -\sbcm{(faire une section a part?)} -%section~\ref{sec:mem} -) and returned. - -If the location of an expression $a$ is $\loc$, then -the value of the expression $(\hbox{{\tt \&}}a)^\tau$ is also $\loc$ -(rule~\ref{rule:7}). -Rule~\ref{rule:8} evaluates an application of a binary operator $\op$ to -expressions $\unannot{a_1}{\tau_1}$ and $\unannot{a_2}{\tau_2}$. -Both sub-expressions are evaluated in -sequence, and their values are combined with the -{\tt eval{\char95}binary{\char95}operation} function, which takes as additional -arguments the types $\tau_1$ and $\tau_2$ of the arguments, in order to resolve -overloaded and type-dependent operators. - -Rule~\ref{rule:9} evaluates a cast expression $(\tau)\unannot{a}{\tau_a}$. -The expression $\unannot{a}{\tau_a}$ is evaluated, and its -value is cast from its natural type $\tau_a$ to the expected type -$\tau$ using the partial function {\tt cast}. This function -performs appropriate conversions, truncations and sign-extensions over -integers and floats, and may fail for undefined casts. The result $v$ -of the cast is then the final result. - -The rules in figure~\ref{fig:dynsem2} are -examples of statements execution. -The execution of a {\tt skip} statement yields the {\tt Out{\char95}normal} -outcome and the empty trace. -Similarly, the execution of a {\tt break} (resp. {\tt continue}) statement -yields the {\tt Out{\char95}break} (resp. {\tt Out{\char95}continue}) -outcome and the empty trace. - -Rule~\ref{rule:12} evaluates an assignment statement. -An assignment statement $a_1^\tau \hbox{{\tt \ =\ }} a_2^{\tau'}$ evaluates -the l-value $a_1$ to a location $\loc$, then the r-value $a_2$ to a value $v$. -As implicit casts are inserted by CIL, there is no need to cast this $v$. -$v$ is stored in memory at location $\loc$, resulting in the final memory - state $M_3$. - -The execution of a sequence of two statements starts with the -execution of the first statement, thus yielding an outcome that -determines if the second statement must be executed or not -(rules~\ref{rule:15}~and~\ref{rule:16}). -Rules~\ref{rule:17}--\ref{rule:19} describe the execution of a -{\tt while} loop. Once the condition of a while loop is evaluated to a -value $v$, the execution of the loop terminates normally if $v$ is -false. If $v$ is true, then the loop body is executed, thus yielding -an outcome $\out$. If $\out$ is {\tt Out{\char95}break}, the loop terminates -normally. If $\out$ is {\tt Out{\char95}normal} or {\tt Out{\char95}continue}, -the whole loop is reexecuted in the memory state modified by the execution of the -body. -Finally, rules~\ref{rule:20}--\ref{rule:21} describe the execution of a -{\tt return} statement. - -\input dynsem2.tex - -The rules of figure~\ref{fig:dynsem3} define the execution of a call statement. -When the expression ${a_{fun}}^{\tau}$ is evaluated to a pointer value that is the -location (in the global environment $G$) of a function $f$, and when $f$ has the -same signature as ${a_{fun}}^{\tau}$, the control flow is passed to the called -function (rule~\ref{rule:25}). - -The execution of a \Clight{} function allocates the memory required -for storing the formal parameters and the local variables of the function -(rule~\ref{rule:26}). The allocation initializes local variables to the -${\tt Vundef}$ undefined value. Formal parameters are initialized to their -corresponding actual parameters. The body of $f$ is then executed, thus yielding -an outcome. The return value of the function is computed from this outcome and -from the return type of $f$. The memory allocated for executing $f$ is freed -before returning to the caller. - -A call $<\id \, \sigma>$ to an external function triggers a new event -that is the trace resulting from that call. - -\input dynsem3.tex - -The rules of figure~\ref{fig:dynsem4} define the execution of a -${\tt switch}(a) \ls$ statement. If the expression of the {\tt switch} evaluates -to $n$, then some statements of the {\tt switch} are selected and executed -(rule~\ref{rule:28}), thus yielding an outcome $\out$. If $\out$ is -{\tt Out{\char95}break}, then the {\tt switch} terminates normally -(\textit{i.e.} the {\tt outcome{\char95}switch} function updates $\out$ into -{\tt Out{\char95}normal} and leave other outcomes unchanged). - -The first selected statements that are executed are those of either the branch -labeled by $n$ or the default branch (rules~\ref{rule:30} and \ref{rule:29}), -thus yielding an outcome that determines if the following statements must be -executed or not (rules~\ref{rule:30} and \ref{rule:31}). If needed, the execution falls through to the next statements $\ls$. - -\input dynsem4.tex - -\section{Translation from \Clight{} to Csharpminor} -\subsection{Overview of Csharpminor} -The Csharpminor language is the first intermediate language of the compiler. -It is a low-level imperative language, structured like our subset of C into -expressions, statements and functions. - -As in \Clight{}, Csharpminor local variables reside in memory, and their address -can be taken. Csharpminor is entirely processor-neutral. In particular, -Csharpminor uses a standard set of operations. - -\subsection{Overview of the translation} - -\section{Proof of correctness} - -The theorem guarantees that if the source program terminates and does -not go wrong, the compiled code terminates and does not go wrong, -performs exactly the same system calls, and returns the same exit code -as the source program. -Currently, it says nothing about source programs that do not terminate (work -in progress). - -\section{Translation from \Clight{} to Csharpminor} -\subsection{Overview of Csharpminor} -The Cminor language is the second intermediate language of the compiler -and the target language of our front-end compiler. - We now give a short overview of Cminor; see \cite{xl:popl} for a more detailed -description, and \cite{xl:compcert-backend} for a complete formal -specification. - -Unlike in Csharpminor, Cminor local variables do not reside in memory, -and their address can not be taken. - -\subsection{Overview of the translation} - -\section{Quantitative results} - -\section{Conclusion} - -\bibliographystyle{unsrt} -\bibliography{blazy} - -\end{article} -\end{document}
\ No newline at end of file |