From 6224148fdd809170d138216d72b8e6180d626aec Mon Sep 17 00:00:00 2001 From: xleroy Date: Wed, 17 Feb 2010 13:44:32 +0000 Subject: Reorganization test directory git-svn-id: https://yquem.inria.fr/compcert/svn/compcert/trunk@1253 fca1b0fc-160b-0410-b1d3-a4f43f01ea2e --- test/littlesemantics/intro.tex | 629 ----------------------------------------- 1 file changed, 629 deletions(-) delete mode 100644 test/littlesemantics/intro.tex (limited to 'test/littlesemantics/intro.tex') diff --git a/test/littlesemantics/intro.tex b/test/littlesemantics/intro.tex deleted file mode 100644 index ac5bfda6..00000000 --- a/test/littlesemantics/intro.tex +++ /dev/null @@ -1,629 +0,0 @@ -\documentclass{book} -\usepackage{alltt} -\title{A practical approach to Programming Language Semantics} -\begin{document} - -The purpose of this book is to study programming language semantics -in a way that highlights the practical aspects of this field. -The main objective of programming language semantics is to understand -what programs really mean. The practical outcome of this understanding -is that programming errors can be avoided. In this book, we will actually -insist on the following aspects. - -\begin{itemize} - \item - We will show that there is a continuous progression between - conventional programs and semantic descriptions. We will start - by describing tools concerning programming languages written - as simple programs, and we will show how the same tools are - related to more theoretical descriptions of programming languages. - \item Most of the questions we will study have a practical motivation. In - general, this practical motivation will be to ensure that our tools - do what we mean. - \item Some of our work consists in mathematical proofs about programming - language descriptions. All these proofs be will implemented as formal - proofs that can be verified using a mathematical proof tool. - \item We will only describe toy languages, so that the whole content - of the book can be used as an introduction to programming language - semantics. -\end{itemize} - -We hope that this practical approach, starting with programs in -conventional programming languages to describe a toy programming -language will provide a gentle introduction to the more mathematically -designed formalisms that are customarily used to describe programming -languages and to the practice of proof-verification tools. -Our attachment to the formal verification of semantics -proofs may seem debatable: proof-checkers are still difficult to -use and the constraints imposed by proof-checking add to the -constraints imposed by the formal description of languages in ways -that may discourage students who are already intimidated by the -theoretical content of the course. We believe that this extra cost is -compensated in the following points: -\begin{itemize} - \item A formal description of the concepts being used actually makes it - possible to reduce their ``theoretical load'': the fact that some - notations are just traditions hiding simple predicates is made explicit - in a formal description. - \item Formal descriptions are unambiguous. If some explanation is not - well understood, the formal description can be consulted to disambiguate - the English-language explanations. Moreover, the dependency between - theoretical concepts and practical considerations is more explicit - in formal documents. - \item Mechanical proofs and formal descriptions provide a playground for - students to build up their own experience in extending a language, - defining a new tool, or proving extra properties of an existing - language. - \item Modern proof-checkers also provide ways to extract executable programs - from formal descriptions and proofs. This characteristic will make it - possible to re-inforce the idea that the formal semantics of programming - languages are directly related to implementing tools to - program and avoid programming errors. -\end{itemize} - -The book contain the description of three toy languages: a simple -imperative programming language, a simple theoretical language inspired -from the study of logic, and simple functional programming language; and three -formal description styles: natural semantics, denotational semantics, and -axiomatic semantics. The practical tools that we will obtain -from the formal descriptions will fall in the following categories: -\begin{itemize} - \item Interpreters, that is, tools that execute programs, - \item Compilers, that is, tools that translate programs in other - languages, - \item Static analyzers, that is tools that help make sure that programs - do not contains certain categories of errors. -\end{itemize} -The purpose of mechanical verification of programming language semantics -will be to ensure that these tools respect the intending meaning of programs. - -\chapter{A simple imperative programming language} -\section{An informal description} -We want to study a small programming language whose only instructions -are sequences of assignments to memory locations, possibly repeated inside -loops that are terminated when a test is satisfied. -The values stored in each location will be integers, and for now, we will -assume that the only operations available are additions and the only -tests are strict comparison. Here are a few examples of such instructions. - -\begin{itemize} - \item A program to compute a multiplication, -\begin{verbatim} -z:=0; while x > z do t := y + t; z:=z+1 done -\end{verbatim} -If {\tt x} contains a positive value, a machine executing this program -should terminate in a state where {\tt t} contains the product of the values -in {\tt y} and {\tt x}. -\item A program to compute a square root, -\begin{verbatim} -y:=0; s := 1; - while x > s do - s:=0; y:=y+1; z:=0; - while y > z do s := y+s; z:= z+1 done -done -\end{verbatim} -If {\tt x} contains a positive value, a machine executing this program -should terminate in a state where {\tt y} contains the greatest integer -whose square is smaller than the value in {\tt x}. -\end{itemize} - -These instructions are incomplete in the sense that they do not -indicate what are all the initial values of the memory locations they -refer to. To make complete programs we will choose to associate -an instruction with a declaration of variables. Programs will thus -look as follows: -\begin{verbatim} -Variables - x=10 y=0 s=1 -in -while s < x do - s := 0; y:=y+1; z:=0; - while z < y do s := y+s; z:= z+1 done -done -\end{verbatim} - -Our programming language does not contain a conditional statement. -This statement can be simulated with the help -of while loops and describing the same language with an extra -conditional statement is not more difficult, just longer. - -\section{An interpreter, written in C} -\subsection{Representing programs} -In our interpreter, we want to have an abstract view of programs: -we don't need to know whether programs were written all in one line or -whether a lot of space was used to separate the location name from the -assignment sign in an assignment instruction. Actually, we don't even -need to know whether assignments are written with {\tt :=} (as in Pascal) -or {\tt =} (as in C). The tradition is simply to describe programs -as term structures, where each term has a label (to indicate what kind -of term it is) and a collection of subterms. For our simple programming -language the various kinds of terms available are as follows: -\begin{itemize} - \item {\tt VARIABLE\_LIST}, this term is used to hold all the variable - declarations at the beginning of a program. Terms of this kind contain - a first subterm that describes the declaration of a single variable and - a second subterm that describes the rest of the declarations. A variable - list is thus expected to be a nested collection of terms of kind - {\tt VARIABLE\_LIST}, - \item {\tt VARIABLE\_VALUE}, this term is used to hold the pair of a variable - and a value in a single variable declaration. Terms of this kind have - two subterms where the first one should be a variable and the second - should be a numeral. - \item {\tt NUMERAL}, this term is used to hold a number when it appears - in an assignment, a comparison or an addition. Terms of this kind - contain an integer value, - \item {\tt VARIABLE}, this term is used to hold a memory location name. - Terms of this kind contain a string, - \item {\tt ADDITION}, this term is used to hold an addition. Terms of - this kind have two subterms, which can themselves be other additions, - variables, or numerals, - \item {\tt GREATER}, this term is used to hold a comparison. Terms - of this kind have two subterms, which can be additions, variables, or - numerals, - \item {\tt ASSIGNMENT}, this term is used to hold the instruction - assigning a value to a variable. Terms of this kind contain two - subterms, where the first one is the variable being assigned to, and - the second is the expression that needs to be evaluated in order to - produce the value that is given to the variable, - \item {\tt SEQUENCE}, this term is used to hold two instructions that - should be executed one after the other. Terms of this kind contain - two subterms, which can themselves be sequences, while terms, - assignments, or skip statements, - \item {\tt WHILE}, this term is used to hold a while loop instruction. - Terms of this kind contain two subterms, where the first one is - a comparison (a term of the kind {\tt GREATER}) and the second one - is itself a while instruction, a sequence, an assignment, or a skip - statement, - \item {\tt SKIP}, this term is used to hold an instruction that does - nothing. Terms of this kind have no subterms. -\end{itemize} - -To keep track of these various kinds we define a collection of constant -macros in our C program: - -\begin{verbatim} -#define VARIABLE_LIST 1 -... -#define WHILE 9 -#define SKIP 10 -\end{verbatim} - -We then define a data-structure that corresponds to these terms. In -this definition, we take into account the fact that a term contains -either two subterms, a numeral, or a string of characters. - -\begin{verbatim} - typedef struct term { - short select; - union { - struct term **children; - int int_val; - char *string_val; - } term_body; -} term; -\end{verbatim} - -We then define a collection of functions that construct terms: -the function {\tt numeral} takes an integer as argument and -it constructs a term whose {\tt select} field is {\tt NUMERAL} and whose -value is the given integer. Similarly, a function {\tt sequence} -takes two term arguments and construct a new term that represents the sequence -of these two arguments. In fact, all constructors rely on the following -{\tt make\_term} function. -\begin{verbatim} -term *make_term(int tag) { - term *ptr; - if(!(ptr = (term *)malloc(sizeof(term)))) { - panic("allocation failed in make_term"); - } - ptr->select = tag; - return ptr; -} -\end{verbatim} -The {\tt panic} procedure used in this function simply reports the error -message in terminates the interpreter. The binary constructors rely on -the following {\tt make\_binary\_term} function: -\begin{verbatim} -term *make_binary_term(int tag, term *i1, term *i2) { - term *ptr = make_term(tag); - if(!(ptr->term_body.children - = (term **)malloc(sizeof(term*)*2))) { - panic("could not allocate memory in make_binary_term"); - } - ptr->term_body.children[0] = i1; - ptr->term_body.children[1] = i2; - return ptr; -} -\end{verbatim} -For instance, the {\tt variable} and {\tt addition} functions -are defined as follows: -\begin{verbatim} -term *variable(char *s) { - term *ptr = make_term(VARIABLE); - ptr->term_body.string_val = s; - return ptr; -} - -term *addition(term *i1, term *i2) { - make_binary_term(ADDITION, i1, i2); -} -\end{verbatim} - -Once we have defined all the constructors, we can combine them to build -large terms. For instance, the multiplication program is built by -the following C expression: - -\begin{verbatim} -sequence(assignment(variable("z"),numeral(0)), - term_while(greater(variable("x"),variable("z")), - sequence( - assignment(variable("t"), - addition(variable("y"),variable("t"))), - assignment(variable("z"), - addition(variable("z"),numeral(1)))))) -\end{verbatim} -\subsection{Navigating in programs} -When we have a term representing a program fragment, we access its -sub-expressions using a function {\tt child} that has the following -definition, such that executing -\begin{verbatim} - child(assignment(variable("z"),numeral(1)),2) -\end{verbatim} -should return the same term as executing -\begin{verbatim} - numeral(1) -\end{verbatim} -The function {\tt child} is defined as follows: -\begin{verbatim} -term *child(term *t, int rank) { - return (t->term_body.children[rank-1]); -} -\end{verbatim} - -We also define two functions {\tt variable\_name} and -{\tt numeral\_value} to map terms of kind {\tt VARIABLE} and -{\tt NUMERAL} to the values of type {\tt string} and {\tt int}, -respectively, that they contain: -\begin{verbatim} -char *variable_name(term *t) { - return(t->term_body.string_val); -} - -int numeral_value(term *t) { - return(t->term_body.int_val); -} -\end{verbatim} -\subsection{Execution environments} -To simulate the execution of programs, we simply want to view their -effect as modifying a collection of memory locations. We don't need -to know how these locations are laid out in memory, we simply need -to record the fact that some variable is associated to some integer -value. To keep this record, we construct a simply linked list using terms -of kind {\tt variable\_list} and {\tt variable\_value}. We -use the same kind of binary terms as for the program structure to -represent these lists. To represent the end of such a list, we use the -conventional NULL pointer. - -With these conventions, we can build the representation of a state where -the location {\tt x} contains 0 and the location {\tt y} contains 2 as -follows: -\begin{verbatim} -variable_list(variable_value(variable("x"),numeral(0)), - variable_list(variable_value(variable("y"),numeral(2)),NULL)) -\end{verbatim} - -Given an arbitrary variable, we often need to compute the value -associated to this variable in a given environment. The following -C function performs this task by navigating the list structure and -stopping as soon as it finds a pair of a variable and a value that -corresponds to the variable we are looking for and returns the -corresponding integer. -\begin{verbatim} -int lookup(term *env, char *var) { - while(NULL != env) { - term *pair = child(env, 1); - if(strcmp(variable_name(child(pair, 1)), var)==0) { - return numeral_value(child(pair, 2)); - } else { - env = child(env, 2); - } - } - fprintf(stderr, "%s : ", var); - panic("no such variable in environment for lookup"); -} -\end{verbatim} -In this function, we use the function {\tt strcmp} from the -standard {\tt string} library of the C language. This function -is used to compare two strings, it returns 0 if, and only if, the two -strings are the same. - -Similarly, we will also need to construct a new environment that -only differs from a previous one in the fact the value of a single -variable has changed. We want this function to map the environment -\begin{verbatim} -variable_list(variable_value(variable("x"),numeral(0)), - variable_list(variable_value(variable("y"),numeral(2)),NULL)), -\end{verbatim} -the variable {\tt y} and the value {\tt 4} to the environment -\begin{verbatim} -variable_list(variable_value(variable("x"),numeral(0)), - variable_list(variable_value(variable("y"),numeral(4)),NULL)), -\end{verbatim} - - -Such a mapping is described with the following function: -\begin{verbatim} -term *update(term *env, char *var_name, term *val) { - if(NULL != env) { - term *var = child(child(env, 1), 1); - if(strcmp(variable_name(var), var_name) == 0) { - return(variable_list(variable_value(var, val), - child(env, 2))); - } else { - return(variable_list(child(env, 1), - update(child(env, 2), var_name, val))); - } - } else { - fprintf(stderr, "%s : ", var_name); - panic("no such variable in environment for update"); - } -} -\end{verbatim} -\subsection{Evaluating expressions} -Consider the following program: -\begin{verbatim} -x := 1; -y := 2; -x := (x + y) + 1; -y := (x + y) + 1 -\end{verbatim} -The first time the expression {\tt x + y} is executed, on the third line, -the result should be be 4. The second time this expression is executed, -on the fourth line, the result should be 7. The evaluation of expressions -needs to take into account the value of variables. This is the reason -why we have an environment data structure, to keep track of the -value of all the variables used in the program. - -In our language there are three kinds of expressions. For variables, -we need to look in the environment to know the value that is currently -recorded. We can use the {\tt lookup} function that was defined in the -previous section. - -When the expression we want to evaluate is numeral, the value is -simply the value of this numeral. On the other hand, when the -expression is an addition, we need first to evaluate the two -components of the expression. This returns integer values that we can -add to obtain the value of the whole addition expression. Thus, the -function that evaluates the expressions is a recursive function. -This is easily described with the following function: -\begin{verbatim} -int evaluate(term *env, term *exp) { - int n1, n2; - switch(exp->select) { - case VARIABLE: return lookup(env, variable_name(exp)); - case NUMERAL: return numeral_value(exp); - case ADDITION: - return (evaluate(env, child(exp, 1)) + - evaluate(env, child(exp, 2))); - } -} -\end{verbatim} -We also need to evaluate comparison expressions and return a boolean value. -The function that implements this simply needs to evaluate the two -expressions that need to be compared in a given environment and then -compare the integer values that are received. Here is this function's -code: -\begin{verbatim} -int evaluate_bool(term *env, term *exp) { - int n1 = evaluate(env, child(exp, 1)); - int n2 = evaluate(env, child(exp, 2)); - return (n1 > n2); -} -\end{verbatim} - -\subsection{Executing instructions} -We now have enough functions to describe the behavior of the three instructions -in our language. We want to define a function {\tt execute} that takes -two arguments, an environment and an instruction. For instance, when -applied to the environment -\begin{verbatim} -variable_list(variable_value(variable("x"),numeral(0)), - variable_list(variable_value(variable("y"),numeral(2)),NULL)), -and the instruction -\begin{verbatim} - assignment(variable("x",addition(variable("y"),numeral(2))) -\end{verbatim} -we want this function {\tt execute} to return the following environment -\begin{verbatim} -variable_list(variable_value(variable("x"),numeral(0)), - variable_list(variable_value(variable("y"),numeral(4)),NULL)). -\end{verbatim} -\begin{enumerate} -\item To execute an assignment instruction in a given environment, -we simply need to evaluate the expression that occurs as this assignment's -right-hand side (using the {\tt evaluate} function) and then to -construct a new environment where only the assigned variable is changed -(using the {\tt update} function), -\item To execute a sequence of two instructions, we need to execute the -first instruction and collect the resulting environment. The second -instruction must then be executed, but in that intermediate environment. -\item To execute a while loop, we need to evaluate the condition as -a boolean expression. If the value of this expression is false, then -we can return the input environment: the loop terminates without modifying -its execution environment. If the value of the expression is true, then -the body of the while loop needs to be executed, the resulting environment -must be collected, and the while loop must be executed again, starting -from that environment. -\end{enumerate} -These behaviors can be expressed very neatly by a simple composition of -the functions, including the recursive calls of the execution function -that is being defined: -\begin{verbatim} -term *execute(term *env, term *inst) { - switch(inst->select) { - case ASSIGNMENT: - return update(env, - variable_name(child(inst, 1)), - numeral(evaluate(env,child(inst, 2)))); - case SEQUENCE: - return execute(execute(env,child(inst, 1)),child(inst, 2)); - case WHILE: - if(evaluate_bool(env,child(inst, 1))) { - return execute(execute(env,child(inst, 2)), inst); - } else { - return env; - } - } -} -\end{verbatim} -\subsection{Practical aspects: adding a parser} -It is not practical to test our interpreter if we have to build C -expressions by combining the term constructors in a C program, recompile -this C program and observe the result with the navigating functions. -A practical alternative is to define a parser to read in programs written -in a readable syntax and combine this parser with a the {\tt execute} -function. We use the Bison tool to add a simple parser to this interpreter. -For instance, the parsing rules for instructions have the following -shape -\begin{verbatim} -nst: identifier T_ASSIGN exp {$$ = assignment($1,$3); } -| inst T_SCOLUMN inst {$$ = sequence($1,$3); } -| T_WHILE b_exp T_DO inst T_DONE {$$ = term_while($2,$4); } -| T_SKIP { $$ = skip(); } -| T_OPEN_B exp T_CLOSE_B { $$ = $2; } -\end{verbatim} -For most parsing rules, the associated semantic action is simply a -term constructor, except for rules that are concerned with parentheses or -brackets, like the last parsing rule above. - -The semantic rule for the program construct triggers -the execution of the program with the initial state given by the -list of variables that was parsed (this list is reversed, because -our parser stores the variable declarations in the wrong order). -\begin{verbatim} -program : T_VARIABLES environment T_IN inst T_END -{ print_env(execute(revert_env($2), $4)); exit(0); } -; -\end{verbatim} -\section{The interpreter in OCaml} -In the OCaml language, defining the terms and the constructors is -directly done with a type definition: -\begin{verbatim} -type exp = - Addition of exp*exp | Numeral of int | Variable of string;; -type b_exp = Greater of exp*exp;; -type inst = - Assignment of string * exp | Sequence of inst * inst -| While of b_exp * inst | Skip;; -\end{verbatim} -We simply use the pre-defined type of lists to represent environments, -in fact we use lists of pairs, where the first component is a string -and the second argument is a integer. -\begin{verbatim} -type environment = (string * int) list;; -\end{verbatim} -With this data-structure, the functions {\tt lookup} and {\tt update} -are defined as follows: -\begin{verbatim} -let rec lookup (env:environment) (var:string) = - match env with - [] -> failwith(var ^ - " : no such variable in environment for lookup") - | (a,v)::tl -> if a = var then v else lookup tl var;; - -let rec update (env:environment) (var:string) (n:int) = - match env with - [] -> failwith(var ^ - " : no such variable in environment for update") - | ((a,v) as pair)::tl -> - if a = var then - (a,n)::tl - else - pair::(update tl var n);; -\end{verbatim} -The programming style of OCaml relies on pattern matching construct, -with specific syntax to describe lists. Thus the expression -\begin{alltt} - match env with [] -> \(e\sub{1}\) | a::tl -> \(e\sub{2}\) -\end{alltt} -means ``if the input data is an empty list, then return the -value of \(e_1\), otherwise, let {\tt a} be the first element of -the list and {\tt tl} be the second element, compute the value -of \(e_2\), which may depend on the value of {\tt a} and {\tt tl}, -and return this value. In the functions, we know that the elements -of the lists being processed are pairs and the pattern-matching -construct also makes it possible to access the components of -the first pair element concisely. - -The pattern-matching construct makes it possible to write the -functions more concisely than in the C programming language. This -conciseness also helps in the description of the {\tt evaluate}, -{\tt b\_evaluate}, and {\tt execute} functions. -\begin{verbatim} -let rec evaluate (env:environment)(e:exp) = - match e with - Numeral n -> n - | Variable s -> lookup env s - | Addition(e1,e2) -> evaluate env e1 + evaluate env e2;; - -let evaluate_bool (env:environment)(e:b_exp) = - match e with - Greater(e1,e2) -> evaluate env e1 > evaluate env e2;; - -let rec execute (env:environment)(i:inst) = - match i with - Skip -> env - | Assignment(s,e) -> update env s (evaluate env e) - | Sequence(i1,i2) -> execute (execute env i1) i2 - | While(b,i) as it -> - if (evaluate_bool env b) then - execute (execute env i) it - else - env;; -\end{verbatim} -The OCaml tools suite also provide means to define parsers. We -used the {\tt ocamlyacc} tool which is close to {\tt bison}. In all, -we are able to define the same interpreter more concisely with the -OCaml language. - -\section{An interpreter written in Prolog} -The Prolog programming style relies on the possibility to define -predicates with multiple arguments. Very often, these predicates -can be used to describe functions, when the last argument is actually -viewed as the result of processing the previous argument. - -Predicates are usually described in a style that is closed to OCaml's -pattern-matching. Thus, we will consider a predicate {\tt lookup} -with three arguments, so that {\tt lookup(E,X,V)} can be read as -``the value of {\tt X} in the environment {\tt E} is {\tt V}. -This predicated can actually be defined with the following two clauses. -\begin{verbatim} -lookup([(X,V)|_], X, V) :- ! . -lookup([(Y,_)|Tl], X, V) :- lookup(Tl, X, V). -\end{verbatim} -Prolog's treatement of definitions is that when a question is asked, the -defining clauses are tried to solve the question. The bang ponctuation -means that no other clause should be tested if this one succeeded. In -effect, this ensures that the second clause is applied only when {\tt X} -and {\tt Y} have different values. The {\em anonymous} variable ``{\tt \_}'' -is used to indicate that the value found in the question where this -variable lies is not important. The two clauses defining {\tt lookup} -can be read as follows: ``in any environment that is a list with -a first pair ({\tt X}, {\tt V}), the value of the variable {\tt X} is {\tt V}. -When the previous rule does not apply (when {\tt X} is different from {\tt Y}, -the value of {\tt Y} in an environment that starts with a pair -of {\tt Y} and any value and whose tail is an environment {\tt Tl}, -the value of the variable {\tt X} is the same as the value it would -have in the environment {\tt Tl}. - -Thus, in Prolog, the notation {\tt :-} can be read as an {\em if} -construct. - -We can define the operation of updating environments similarly: -\begin{verbatim} -update([(X,_)|Tl], X, V1, [(X,V1)|Tl]) :- !. -update([(Y,V)|Tl], X, V1, [(Y,V)|Tl1]) :- - update(Tl, X, V1, Tl1). -\end{verbatim} - -\end{document} -- cgit