aboutsummaryrefslogtreecommitdiffstats
path: root/papers/cfrontend_new/paper.tex
diff options
context:
space:
mode:
authorxleroy <xleroy@fca1b0fc-160b-0410-b1d3-a4f43f01ea2e>2007-08-04 07:33:32 +0000
committerxleroy <xleroy@fca1b0fc-160b-0410-b1d3-a4f43f01ea2e>2007-08-04 07:33:32 +0000
commitadc9e990a0c338cef57ff1bd9717adcc781f283c (patch)
treecdbb1265be6c524ade10565b1a2b500b510f3491 /papers/cfrontend_new/paper.tex
parent355b4abcee015c3fae9ac5653c25259e104a886c (diff)
downloadcompcert-adc9e990a0c338cef57ff1bd9717adcc781f283c.tar.gz
compcert-adc9e990a0c338cef57ff1bd9717adcc781f283c.zip
Deplacement du repertoire "papers" dans la hierarchie SVN
git-svn-id: https://yquem.inria.fr/compcert/svn/compcert/trunk@385 fca1b0fc-160b-0410-b1d3-a4f43f01ea2e
Diffstat (limited to 'papers/cfrontend_new/paper.tex')
-rwxr-xr-xpapers/cfrontend_new/paper.tex382
1 files changed, 0 insertions, 382 deletions
diff --git a/papers/cfrontend_new/paper.tex b/papers/cfrontend_new/paper.tex
deleted file mode 100755
index 4293554f..00000000
--- a/papers/cfrontend_new/paper.tex
+++ /dev/null
@@ -1,382 +0,0 @@
-\documentclass[namedreferences]{kluwer}
-
-\usepackage{latexsym}
-%\usepackage{amsmath} pb latex
-\usepackage{amssymb}
-\usepackage{amsfonts}
-\usepackage{mymacros}
-\usepackage{infrules}
-\usepackage{url}
-\usepackage{mymacros}
-\usepackage[usenames]{color}
-
-\input{macros}
-
-\begin{document}
-\begin{article}
-
-\begin{opening}
-\title{Formal verification of C front-end}
-\subtitle{Yet another formal verification in Coq of a C front-end}
-\runningtitle{C front-end}
-
-\author{Sandrine \surname{Blazy} \email{{S}andrine.Blazy@ensiie.fr}}
-\institute{ENSIIE and INRIA}
-
-\author{Xavier \surname{Leroy} \email{{X}avier.Leroy@inria.fr}}
-\runningauthor{S.Blazy, X.Leroy}
-\institute{INRIA}
-
-\maketitle
-
-\begin{abstract}
-This paper presents
-\end{abstract}
-
-\keywords{formal semantics, formal proof, C language, C compiler}
-\end{opening}
-
-\section{Introduction}
-
-\section{A formal semantics for \Clight{}}
-\subsection{CIL}
-
-\subsection{Abstract syntax}
-
-The abstract syntax of \Clight{} is given in figures~\ref{fig:syntax}
-and~\ref{fig:syntax2}.
-In the Coq formalization, this abstract syntax is presented as
-inductive data types, therefore achieving a deep embedding of \Clight{}
-into Coq.
-
-At the level of types, \Clight{} features all the integral types of C,
-along with array, pointer (including function pointers), function
-types, {\tt struct} and {\tt union} types. Bit-fields in {\tt struct}
-or {\tt union} are ignored.
-The integral types fully specify the bit size of integers and floats,
-unlike the half-specified C types {\tt int}, {\tt long}, etc.
-
-\sbcm{TO DO: representation of struct and union}
-
-Within expressions, all C operators are supported.
-Side-effects stem from function calls.
-All expressions and their sub-expressions are
-annotated by their static types. We write $a^\tau$ for the expression
-$a$ carrying type $\tau$. These types are necessary to determine the
-semantics of type-dependent operators such as the overloaded
-arithmetic operators. The following expressions may be in left-value
-position: $\id, *a, a_1[a_2]$ and $a \dot \id$.
-
-\input syntax.tex
-
-At the level of statements, all structured control statements of C
-(conditional, loops, simple {\tt switch}, {\tt break}, {\tt continue} and
-{\tt return}) are supported, but not unstructured statements
-(unstructured {\tt switch} and {\tt goto}).
-Contrary to C, the default case is mandatory in a \Clight{} {\tt switch} statement
-and it is the last branch of the {\tt switch}.
-An assignment is treated as a statement (instead of an expression).
-This transformation from expression into statements is performed by CIL.
-Two kinds of variables are allowed: global variables and local {\tt auto}
-variables declared at the beginning of a function. Block-scoped local
-variables and {\tt static} variables are omitted because they are emulated by
-pulling their declarations to function scope or global scope,
-respectively. Consequently, there is no block statement in \Clight{}.
-
-A function $f$ defined in \Clight{} consists of a header (\textit{i.e.}
-{\tt fn\_params(f)} the parameters of $f$ and {\tt fn\_return(f)}
-the return type of $f$), a list of local variables ({\tt fn\_vars(f)})
-and a body ({\tt fn\_body(f)}) that is a statement. The functions composing
-the program are either internal functions, defined within \Clight{}, or
-external functions (a.k.a. system calls). An external function consists
-of the internal identifier and the signature of the function.
-
-A program $p$ consists of a list of global variable declarations
-({\tt prog\_vars(p)}), a list of function definitions ({\tt prog\_funct(p)}),
-and an identifier naming the entry point of the program ({\tt prog\_main(p)}).
-In a global variable declaration, the list of initial values
-represents all values that need to be stored (\textit{e.g.} the
-values of the cells of an array).
-
-\input syntax2.tex
-
-\subsection{Static semantics}
- The type system is very coarse: we check only the typing properties
- that matter for the translation to be correct. Essentially,
- we need to check that the types attached to variable references
- match the declaration types for those variables.
-
-\subsection{Dynamic semantics}
-
-The dynamic semantics of \Clight{} is specified using natural semantics,
-also known as big-step operational semantics.
- These natural semantics capture the final result of program execution, as
-well as traces of calls to external functions that emit an event when applied.
-The big-step semantics described in~\cite{blazy06:fm} did not observe traces.
-This addition of traces leads to a significantly
-stronger observational equivalence between source and machine code.
-
-Execution traces represent the interactions between the program and
-the rest of the world (\textit{i.e.} the input-output activities of
-the program); they are defined in figure~\ref{fig:trace}.
-A trace is a finite list of input-output events.
-Each call to an external function produces an event.
-An event consists of an identifier and the actual values of the parameters of
-the called function. Usually, the identifier represents the called function,
-but more generally it could also represent any observable identifier.
-Starting from an initial empty event (called $\E0$), events are propagated and
-concatenated during program execution.
-
-\input trace.tex
-
-While the semantics of
-C is not deterministic (the evaluation order for expressions is not
-completely specified and compilers are free to choose between several
-orders), the semantics of \Clight{} is completely deterministic and
-imposes a left-to-right evaluation order, consistent with the order
-implemented by our compiler. This choice simplifies greatly the
-semantics compared with, for instance, Norrish's semantics for C
-\cite{Norrish:phd}, which captures (at great expense) the
-non-determinism allowed by the ISO C specification. Our semantics can
-therefore be viewed as a refinement of (a subset of) the ISO C
-semantics, or that of Norrish.
-
-The semantics is defined by 7 judgements (relations):
-$$\begin{array}{rclr}
-G, E & |- & a, M \evallvalue \loc, \tr, M' & \mbox{(expressions in l-value position)} \\
-G, E & |- & a , M \evalexpr v, \tr, M' & \mbox{(expressions in r-value position)} \\
-G, E & |- & \seq a, M \evalexpr \seq v, \tr, M' & \mbox{(list of expressions)}\\
-G, E & |- & s, M \evalstmt \out, \tr, M' & \mbox{(statements)} \\
-G, E & |- & \ls, M \evalstmt \out, \tr, M' & \mbox{(labeled statements of a switch)} \\
-G & |- & f(\seq v), M \evalfunc v, \tr, M' & \mbox{(function invocations)} \\
- & \vdash & p \evalprog v, \tr & \mbox{(programs)}
-\end{array}$$
-Each judgement relates a syntactic element (expression, statement,
-etc.) and an initial memory state to the result of executing this
-syntactic element, as well as the final memory state at the end of
-execution and also the trace of input-output events (system calls).
- The various kinds of results, as well as the evaluation
-environments, are defined in figure~\ref{fig:values}.
-
-\input values.tex
-
-For instance, executing an expression $a$ in l-value position results in a
-memory location $\loc$ (a memory block reference and an offset within that
-block), while executing an expression $a$ in r-value position results in
-the value $v$ of the expression. Values range over 32-bit integers,
-64-bit floats, memory locations (pointers), and an undefined value
-that represents for instance the value of uninitialized variables.
-The result associated with the execution of a statement $s$ is an
-``outcome'' indicating how the execution terminated: either normally
-by running to completion or prematurely via a {\tt break}, {\tt continue} or
-{\tt return} statement.
-The invocation of a function $f$ yields a value $v$, and so does the execution
-of a program.
-
-Two evaluation environments, defined in figure~\ref{fig:values},
-appear as parameters to the judgements. The local environment $E$ maps
-local variables to references of memory blocks containing the values
-of these variables. These blocks are allocated at function entry and
-freed at function return. The global environment $G$ maps global
-variables and function names to memory references. It also maps
-some references (those corresponding to function pointers) to function
-definitions.
-
-The memory model of the semantics is described in~\cite{icfem:sb}.
-A memory $M$ maps block references to block contents.
-Each block represents the result of a C \texttt{malloc},
-or a stack frame, a global static variable, or a function
-code-pointer.
-A block content consists of the dimensions of the block (low and high bounds)
-plus a mapping from byte offsets to byte-sized memory cells (\textit{i.e.}
-values).
-
-\sbcm{TO DO: a detailler (faire une section a part?): il faut expliquer
-loadval et storeval qui sont utilisees dans la semantique.}
-
-In the Coq specification, the 7 judgements of the dynamic semantics
-are encoded as mutually-inductive predicates. Each defining case of
-each predicate corresponds exactly to an inference rule in the
-conventional, on-paper presentation of natural semantics. We have one
-inference rule for each kind of expression and statement described in
-figure~\ref{fig:syntax}.
- We show most of the inference rules in figures~\ref{fig:dynsem1}
-to~\ref{fig:dynsem4}.
-
-\input dynsem1.tex
-
-The first five rules of figure~\ref{fig:dynsem1} illustrate the
-evaluation of an expression in l-value position.
-A variable $\id$ evaluates to the location $(E(\id), 0)$.
-If an expression $a$ evaluates to a pointer value $\loc$, then
-the location of the dereferencing expression $(\hbox{{\tt *}}a)^\tau$ is $\loc$.
-
-Rule~\ref{rule:3} shows the evaluation of an array cell.
-The location of an array cell $\unannot{a_1}{\tau_1} [\unannot{a_2}{\tau_2}]$
-is computed from the location of the array $\unannot{a_1}{\tau_1}$
-({\it i.e.} the location of the first cell) and the value of the index
-$\unannot{a_2}{\tau_2}$ thanks to the function {\tt add}.
-{\tt add} is a partial function:
-it can be undefined if the types and the shapes of argument values are
-incompatible (e.g. an integer addition of two pointer values).
-In the Coq specification, it is a total
-function returning optional values: either {\tt None} in case of failure, or
-${\tt Some}(v)$, abbreviated as $\some v$, in case of success.
-%More precisely, the computation succeeds only if $v_1$ and
-%$v_2$ are either 2 integers, or 2 floats or a pointer value and an integer.
-%The corresponding types $\tau_1$ and $\tau_2$ must also be
-%compatible with these values. For instance, if $\tau_1$ is a {\tt Tarray},
-%then $\tau_2$ must be a {\tt Tint}.
-The trace resulting from the evaluation of
-$\unannot{a_1}{\tau_1} [\unannot{a_2}{\tau_2}]$ is the concatenation of the
-traces resulting from the evaluation of $\unannot{a_1}{\tau_1}$ and
-$\unannot{a_2}{\tau_2}$.
-
-
-In a similar way, the location of an access to a {\tt struct} field
-$\unannot{a}{\tau} \dot f$ is computed from the location of the {\tt struct}
-$\unannot{a}{\tau}$ and an offset with that {\tt struct} (see
-rule~\ref{rule:4}).
-\sbcm{un peu rapide, cf. repr ?}
-The location of an access to a {\tt union} field is the location of
-the {\tt union} (see rule~\ref{rule:5}).
-
-The last four rule of figure~\ref{fig:dynsem1} illustrate the
-evaluation of an expression in r-value position.
-Rule~\ref{rule:6} rule shows the evaluation of a l-value expression in an
-r-value context. The expression is evaluated to its location $l$,
-with final memory state $M'$. The value at location $l$ in $M'$ is
-fetched using the {\tt loadval} function (see
-\sbcm{(faire une section a part?)}
-%section~\ref{sec:mem}
-) and returned.
-
-If the location of an expression $a$ is $\loc$, then
-the value of the expression $(\hbox{{\tt \&}}a)^\tau$ is also $\loc$
-(rule~\ref{rule:7}).
-Rule~\ref{rule:8} evaluates an application of a binary operator $\op$ to
-expressions $\unannot{a_1}{\tau_1}$ and $\unannot{a_2}{\tau_2}$.
-Both sub-expressions are evaluated in
-sequence, and their values are combined with the
-{\tt eval{\char95}binary{\char95}operation} function, which takes as additional
-arguments the types $\tau_1$ and $\tau_2$ of the arguments, in order to resolve
-overloaded and type-dependent operators.
-
-Rule~\ref{rule:9} evaluates a cast expression $(\tau)\unannot{a}{\tau_a}$.
-The expression $\unannot{a}{\tau_a}$ is evaluated, and its
-value is cast from its natural type $\tau_a$ to the expected type
-$\tau$ using the partial function {\tt cast}. This function
-performs appropriate conversions, truncations and sign-extensions over
-integers and floats, and may fail for undefined casts. The result $v$
-of the cast is then the final result.
-
-The rules in figure~\ref{fig:dynsem2} are
-examples of statements execution.
-The execution of a {\tt skip} statement yields the {\tt Out{\char95}normal}
-outcome and the empty trace.
-Similarly, the execution of a {\tt break} (resp. {\tt continue}) statement
-yields the {\tt Out{\char95}break} (resp. {\tt Out{\char95}continue})
-outcome and the empty trace.
-
-Rule~\ref{rule:12} evaluates an assignment statement.
-An assignment statement $a_1^\tau \hbox{{\tt \ =\ }} a_2^{\tau'}$ evaluates
-the l-value $a_1$ to a location $\loc$, then the r-value $a_2$ to a value $v$.
-As implicit casts are inserted by CIL, there is no need to cast this $v$.
-$v$ is stored in memory at location $\loc$, resulting in the final memory
- state $M_3$.
-
-The execution of a sequence of two statements starts with the
-execution of the first statement, thus yielding an outcome that
-determines if the second statement must be executed or not
-(rules~\ref{rule:15}~and~\ref{rule:16}).
-Rules~\ref{rule:17}--\ref{rule:19} describe the execution of a
-{\tt while} loop. Once the condition of a while loop is evaluated to a
-value $v$, the execution of the loop terminates normally if $v$ is
-false. If $v$ is true, then the loop body is executed, thus yielding
-an outcome $\out$. If $\out$ is {\tt Out{\char95}break}, the loop terminates
-normally. If $\out$ is {\tt Out{\char95}normal} or {\tt Out{\char95}continue},
-the whole loop is reexecuted in the memory state modified by the execution of the
-body.
-Finally, rules~\ref{rule:20}--\ref{rule:21} describe the execution of a
-{\tt return} statement.
-
-\input dynsem2.tex
-
-The rules of figure~\ref{fig:dynsem3} define the execution of a call statement.
-When the expression ${a_{fun}}^{\tau}$ is evaluated to a pointer value that is the
-location (in the global environment $G$) of a function $f$, and when $f$ has the
-same signature as ${a_{fun}}^{\tau}$, the control flow is passed to the called
-function (rule~\ref{rule:25}).
-
-The execution of a \Clight{} function allocates the memory required
-for storing the formal parameters and the local variables of the function
-(rule~\ref{rule:26}). The allocation initializes local variables to the
-${\tt Vundef}$ undefined value. Formal parameters are initialized to their
-corresponding actual parameters. The body of $f$ is then executed, thus yielding
-an outcome. The return value of the function is computed from this outcome and
-from the return type of $f$. The memory allocated for executing $f$ is freed
-before returning to the caller.
-
-A call $<\id \, \sigma>$ to an external function triggers a new event
-that is the trace resulting from that call.
-
-\input dynsem3.tex
-
-The rules of figure~\ref{fig:dynsem4} define the execution of a
-${\tt switch}(a) \ls$ statement. If the expression of the {\tt switch} evaluates
-to $n$, then some statements of the {\tt switch} are selected and executed
-(rule~\ref{rule:28}), thus yielding an outcome $\out$. If $\out$ is
-{\tt Out{\char95}break}, then the {\tt switch} terminates normally
-(\textit{i.e.} the {\tt outcome{\char95}switch} function updates $\out$ into
-{\tt Out{\char95}normal} and leave other outcomes unchanged).
-
-The first selected statements that are executed are those of either the branch
-labeled by $n$ or the default branch (rules~\ref{rule:30} and \ref{rule:29}),
-thus yielding an outcome that determines if the following statements must be
-executed or not (rules~\ref{rule:30} and \ref{rule:31}). If needed, the execution falls through to the next statements $\ls$.
-
-\input dynsem4.tex
-
-\section{Translation from \Clight{} to Csharpminor}
-\subsection{Overview of Csharpminor}
-The Csharpminor language is the first intermediate language of the compiler.
-It is a low-level imperative language, structured like our subset of C into
-expressions, statements and functions.
-
-As in \Clight{}, Csharpminor local variables reside in memory, and their address
-can be taken. Csharpminor is entirely processor-neutral. In particular,
-Csharpminor uses a standard set of operations.
-
-\subsection{Overview of the translation}
-
-\section{Proof of correctness}
-
-The theorem guarantees that if the source program terminates and does
-not go wrong, the compiled code terminates and does not go wrong,
-performs exactly the same system calls, and returns the same exit code
-as the source program.
-Currently, it says nothing about source programs that do not terminate (work
-in progress).
-
-\section{Translation from \Clight{} to Csharpminor}
-\subsection{Overview of Csharpminor}
-The Cminor language is the second intermediate language of the compiler
-and the target language of our front-end compiler.
- We now give a short overview of Cminor; see \cite{xl:popl} for a more detailed
-description, and \cite{xl:compcert-backend} for a complete formal
-specification.
-
-Unlike in Csharpminor, Cminor local variables do not reside in memory,
-and their address can not be taken.
-
-\subsection{Overview of the translation}
-
-\section{Quantitative results}
-
-\section{Conclusion}
-
-\bibliographystyle{unsrt}
-\bibliography{blazy}
-
-\end{article}
-\end{document} \ No newline at end of file