aboutsummaryrefslogtreecommitdiffstats
path: root/test/littlesemantics/intro.tex
diff options
context:
space:
mode:
Diffstat (limited to 'test/littlesemantics/intro.tex')
-rw-r--r--test/littlesemantics/intro.tex629
1 files changed, 0 insertions, 629 deletions
diff --git a/test/littlesemantics/intro.tex b/test/littlesemantics/intro.tex
deleted file mode 100644
index ac5bfda6..00000000
--- a/test/littlesemantics/intro.tex
+++ /dev/null
@@ -1,629 +0,0 @@
-\documentclass{book}
-\usepackage{alltt}
-\title{A practical approach to Programming Language Semantics}
-\begin{document}
-
-The purpose of this book is to study programming language semantics
-in a way that highlights the practical aspects of this field.
-The main objective of programming language semantics is to understand
-what programs really mean. The practical outcome of this understanding
-is that programming errors can be avoided. In this book, we will actually
-insist on the following aspects.
-
-\begin{itemize}
- \item
- We will show that there is a continuous progression between
- conventional programs and semantic descriptions. We will start
- by describing tools concerning programming languages written
- as simple programs, and we will show how the same tools are
- related to more theoretical descriptions of programming languages.
- \item Most of the questions we will study have a practical motivation. In
- general, this practical motivation will be to ensure that our tools
- do what we mean.
- \item Some of our work consists in mathematical proofs about programming
- language descriptions. All these proofs be will implemented as formal
- proofs that can be verified using a mathematical proof tool.
- \item We will only describe toy languages, so that the whole content
- of the book can be used as an introduction to programming language
- semantics.
-\end{itemize}
-
-We hope that this practical approach, starting with programs in
-conventional programming languages to describe a toy programming
-language will provide a gentle introduction to the more mathematically
-designed formalisms that are customarily used to describe programming
-languages and to the practice of proof-verification tools.
-Our attachment to the formal verification of semantics
-proofs may seem debatable: proof-checkers are still difficult to
-use and the constraints imposed by proof-checking add to the
-constraints imposed by the formal description of languages in ways
-that may discourage students who are already intimidated by the
-theoretical content of the course. We believe that this extra cost is
-compensated in the following points:
-\begin{itemize}
- \item A formal description of the concepts being used actually makes it
- possible to reduce their ``theoretical load'': the fact that some
- notations are just traditions hiding simple predicates is made explicit
- in a formal description.
- \item Formal descriptions are unambiguous. If some explanation is not
- well understood, the formal description can be consulted to disambiguate
- the English-language explanations. Moreover, the dependency between
- theoretical concepts and practical considerations is more explicit
- in formal documents.
- \item Mechanical proofs and formal descriptions provide a playground for
- students to build up their own experience in extending a language,
- defining a new tool, or proving extra properties of an existing
- language.
- \item Modern proof-checkers also provide ways to extract executable programs
- from formal descriptions and proofs. This characteristic will make it
- possible to re-inforce the idea that the formal semantics of programming
- languages are directly related to implementing tools to
- program and avoid programming errors.
-\end{itemize}
-
-The book contain the description of three toy languages: a simple
-imperative programming language, a simple theoretical language inspired
-from the study of logic, and simple functional programming language; and three
-formal description styles: natural semantics, denotational semantics, and
-axiomatic semantics. The practical tools that we will obtain
-from the formal descriptions will fall in the following categories:
-\begin{itemize}
- \item Interpreters, that is, tools that execute programs,
- \item Compilers, that is, tools that translate programs in other
- languages,
- \item Static analyzers, that is tools that help make sure that programs
- do not contains certain categories of errors.
-\end{itemize}
-The purpose of mechanical verification of programming language semantics
-will be to ensure that these tools respect the intending meaning of programs.
-
-\chapter{A simple imperative programming language}
-\section{An informal description}
-We want to study a small programming language whose only instructions
-are sequences of assignments to memory locations, possibly repeated inside
-loops that are terminated when a test is satisfied.
-The values stored in each location will be integers, and for now, we will
-assume that the only operations available are additions and the only
-tests are strict comparison. Here are a few examples of such instructions.
-
-\begin{itemize}
- \item A program to compute a multiplication,
-\begin{verbatim}
-z:=0; while x > z do t := y + t; z:=z+1 done
-\end{verbatim}
-If {\tt x} contains a positive value, a machine executing this program
-should terminate in a state where {\tt t} contains the product of the values
-in {\tt y} and {\tt x}.
-\item A program to compute a square root,
-\begin{verbatim}
-y:=0; s := 1;
- while x > s do
- s:=0; y:=y+1; z:=0;
- while y > z do s := y+s; z:= z+1 done
-done
-\end{verbatim}
-If {\tt x} contains a positive value, a machine executing this program
-should terminate in a state where {\tt y} contains the greatest integer
-whose square is smaller than the value in {\tt x}.
-\end{itemize}
-
-These instructions are incomplete in the sense that they do not
-indicate what are all the initial values of the memory locations they
-refer to. To make complete programs we will choose to associate
-an instruction with a declaration of variables. Programs will thus
-look as follows:
-\begin{verbatim}
-Variables
- x=10 y=0 s=1
-in
-while s < x do
- s := 0; y:=y+1; z:=0;
- while z < y do s := y+s; z:= z+1 done
-done
-\end{verbatim}
-
-Our programming language does not contain a conditional statement.
-This statement can be simulated with the help
-of while loops and describing the same language with an extra
-conditional statement is not more difficult, just longer.
-
-\section{An interpreter, written in C}
-\subsection{Representing programs}
-In our interpreter, we want to have an abstract view of programs:
-we don't need to know whether programs were written all in one line or
-whether a lot of space was used to separate the location name from the
-assignment sign in an assignment instruction. Actually, we don't even
-need to know whether assignments are written with {\tt :=} (as in Pascal)
-or {\tt =} (as in C). The tradition is simply to describe programs
-as term structures, where each term has a label (to indicate what kind
-of term it is) and a collection of subterms. For our simple programming
-language the various kinds of terms available are as follows:
-\begin{itemize}
- \item {\tt VARIABLE\_LIST}, this term is used to hold all the variable
- declarations at the beginning of a program. Terms of this kind contain
- a first subterm that describes the declaration of a single variable and
- a second subterm that describes the rest of the declarations. A variable
- list is thus expected to be a nested collection of terms of kind
- {\tt VARIABLE\_LIST},
- \item {\tt VARIABLE\_VALUE}, this term is used to hold the pair of a variable
- and a value in a single variable declaration. Terms of this kind have
- two subterms where the first one should be a variable and the second
- should be a numeral.
- \item {\tt NUMERAL}, this term is used to hold a number when it appears
- in an assignment, a comparison or an addition. Terms of this kind
- contain an integer value,
- \item {\tt VARIABLE}, this term is used to hold a memory location name.
- Terms of this kind contain a string,
- \item {\tt ADDITION}, this term is used to hold an addition. Terms of
- this kind have two subterms, which can themselves be other additions,
- variables, or numerals,
- \item {\tt GREATER}, this term is used to hold a comparison. Terms
- of this kind have two subterms, which can be additions, variables, or
- numerals,
- \item {\tt ASSIGNMENT}, this term is used to hold the instruction
- assigning a value to a variable. Terms of this kind contain two
- subterms, where the first one is the variable being assigned to, and
- the second is the expression that needs to be evaluated in order to
- produce the value that is given to the variable,
- \item {\tt SEQUENCE}, this term is used to hold two instructions that
- should be executed one after the other. Terms of this kind contain
- two subterms, which can themselves be sequences, while terms,
- assignments, or skip statements,
- \item {\tt WHILE}, this term is used to hold a while loop instruction.
- Terms of this kind contain two subterms, where the first one is
- a comparison (a term of the kind {\tt GREATER}) and the second one
- is itself a while instruction, a sequence, an assignment, or a skip
- statement,
- \item {\tt SKIP}, this term is used to hold an instruction that does
- nothing. Terms of this kind have no subterms.
-\end{itemize}
-
-To keep track of these various kinds we define a collection of constant
-macros in our C program:
-
-\begin{verbatim}
-#define VARIABLE_LIST 1
-...
-#define WHILE 9
-#define SKIP 10
-\end{verbatim}
-
-We then define a data-structure that corresponds to these terms. In
-this definition, we take into account the fact that a term contains
-either two subterms, a numeral, or a string of characters.
-
-\begin{verbatim}
- typedef struct term {
- short select;
- union {
- struct term **children;
- int int_val;
- char *string_val;
- } term_body;
-} term;
-\end{verbatim}
-
-We then define a collection of functions that construct terms:
-the function {\tt numeral} takes an integer as argument and
-it constructs a term whose {\tt select} field is {\tt NUMERAL} and whose
-value is the given integer. Similarly, a function {\tt sequence}
-takes two term arguments and construct a new term that represents the sequence
-of these two arguments. In fact, all constructors rely on the following
-{\tt make\_term} function.
-\begin{verbatim}
-term *make_term(int tag) {
- term *ptr;
- if(!(ptr = (term *)malloc(sizeof(term)))) {
- panic("allocation failed in make_term");
- }
- ptr->select = tag;
- return ptr;
-}
-\end{verbatim}
-The {\tt panic} procedure used in this function simply reports the error
-message in terminates the interpreter. The binary constructors rely on
-the following {\tt make\_binary\_term} function:
-\begin{verbatim}
-term *make_binary_term(int tag, term *i1, term *i2) {
- term *ptr = make_term(tag);
- if(!(ptr->term_body.children
- = (term **)malloc(sizeof(term*)*2))) {
- panic("could not allocate memory in make_binary_term");
- }
- ptr->term_body.children[0] = i1;
- ptr->term_body.children[1] = i2;
- return ptr;
-}
-\end{verbatim}
-For instance, the {\tt variable} and {\tt addition} functions
-are defined as follows:
-\begin{verbatim}
-term *variable(char *s) {
- term *ptr = make_term(VARIABLE);
- ptr->term_body.string_val = s;
- return ptr;
-}
-
-term *addition(term *i1, term *i2) {
- make_binary_term(ADDITION, i1, i2);
-}
-\end{verbatim}
-
-Once we have defined all the constructors, we can combine them to build
-large terms. For instance, the multiplication program is built by
-the following C expression:
-
-\begin{verbatim}
-sequence(assignment(variable("z"),numeral(0)),
- term_while(greater(variable("x"),variable("z")),
- sequence(
- assignment(variable("t"),
- addition(variable("y"),variable("t"))),
- assignment(variable("z"),
- addition(variable("z"),numeral(1))))))
-\end{verbatim}
-\subsection{Navigating in programs}
-When we have a term representing a program fragment, we access its
-sub-expressions using a function {\tt child} that has the following
-definition, such that executing
-\begin{verbatim}
- child(assignment(variable("z"),numeral(1)),2)
-\end{verbatim}
-should return the same term as executing
-\begin{verbatim}
- numeral(1)
-\end{verbatim}
-The function {\tt child} is defined as follows:
-\begin{verbatim}
-term *child(term *t, int rank) {
- return (t->term_body.children[rank-1]);
-}
-\end{verbatim}
-
-We also define two functions {\tt variable\_name} and
-{\tt numeral\_value} to map terms of kind {\tt VARIABLE} and
-{\tt NUMERAL} to the values of type {\tt string} and {\tt int},
-respectively, that they contain:
-\begin{verbatim}
-char *variable_name(term *t) {
- return(t->term_body.string_val);
-}
-
-int numeral_value(term *t) {
- return(t->term_body.int_val);
-}
-\end{verbatim}
-\subsection{Execution environments}
-To simulate the execution of programs, we simply want to view their
-effect as modifying a collection of memory locations. We don't need
-to know how these locations are laid out in memory, we simply need
-to record the fact that some variable is associated to some integer
-value. To keep this record, we construct a simply linked list using terms
-of kind {\tt variable\_list} and {\tt variable\_value}. We
-use the same kind of binary terms as for the program structure to
-represent these lists. To represent the end of such a list, we use the
-conventional NULL pointer.
-
-With these conventions, we can build the representation of a state where
-the location {\tt x} contains 0 and the location {\tt y} contains 2 as
-follows:
-\begin{verbatim}
-variable_list(variable_value(variable("x"),numeral(0)),
- variable_list(variable_value(variable("y"),numeral(2)),NULL))
-\end{verbatim}
-
-Given an arbitrary variable, we often need to compute the value
-associated to this variable in a given environment. The following
-C function performs this task by navigating the list structure and
-stopping as soon as it finds a pair of a variable and a value that
-corresponds to the variable we are looking for and returns the
-corresponding integer.
-\begin{verbatim}
-int lookup(term *env, char *var) {
- while(NULL != env) {
- term *pair = child(env, 1);
- if(strcmp(variable_name(child(pair, 1)), var)==0) {
- return numeral_value(child(pair, 2));
- } else {
- env = child(env, 2);
- }
- }
- fprintf(stderr, "%s : ", var);
- panic("no such variable in environment for lookup");
-}
-\end{verbatim}
-In this function, we use the function {\tt strcmp} from the
-standard {\tt string} library of the C language. This function
-is used to compare two strings, it returns 0 if, and only if, the two
-strings are the same.
-
-Similarly, we will also need to construct a new environment that
-only differs from a previous one in the fact the value of a single
-variable has changed. We want this function to map the environment
-\begin{verbatim}
-variable_list(variable_value(variable("x"),numeral(0)),
- variable_list(variable_value(variable("y"),numeral(2)),NULL)),
-\end{verbatim}
-the variable {\tt y} and the value {\tt 4} to the environment
-\begin{verbatim}
-variable_list(variable_value(variable("x"),numeral(0)),
- variable_list(variable_value(variable("y"),numeral(4)),NULL)),
-\end{verbatim}
-
-
-Such a mapping is described with the following function:
-\begin{verbatim}
-term *update(term *env, char *var_name, term *val) {
- if(NULL != env) {
- term *var = child(child(env, 1), 1);
- if(strcmp(variable_name(var), var_name) == 0) {
- return(variable_list(variable_value(var, val),
- child(env, 2)));
- } else {
- return(variable_list(child(env, 1),
- update(child(env, 2), var_name, val)));
- }
- } else {
- fprintf(stderr, "%s : ", var_name);
- panic("no such variable in environment for update");
- }
-}
-\end{verbatim}
-\subsection{Evaluating expressions}
-Consider the following program:
-\begin{verbatim}
-x := 1;
-y := 2;
-x := (x + y) + 1;
-y := (x + y) + 1
-\end{verbatim}
-The first time the expression {\tt x + y} is executed, on the third line,
-the result should be be 4. The second time this expression is executed,
-on the fourth line, the result should be 7. The evaluation of expressions
-needs to take into account the value of variables. This is the reason
-why we have an environment data structure, to keep track of the
-value of all the variables used in the program.
-
-In our language there are three kinds of expressions. For variables,
-we need to look in the environment to know the value that is currently
-recorded. We can use the {\tt lookup} function that was defined in the
-previous section.
-
-When the expression we want to evaluate is numeral, the value is
-simply the value of this numeral. On the other hand, when the
-expression is an addition, we need first to evaluate the two
-components of the expression. This returns integer values that we can
-add to obtain the value of the whole addition expression. Thus, the
-function that evaluates the expressions is a recursive function.
-This is easily described with the following function:
-\begin{verbatim}
-int evaluate(term *env, term *exp) {
- int n1, n2;
- switch(exp->select) {
- case VARIABLE: return lookup(env, variable_name(exp));
- case NUMERAL: return numeral_value(exp);
- case ADDITION:
- return (evaluate(env, child(exp, 1)) +
- evaluate(env, child(exp, 2)));
- }
-}
-\end{verbatim}
-We also need to evaluate comparison expressions and return a boolean value.
-The function that implements this simply needs to evaluate the two
-expressions that need to be compared in a given environment and then
-compare the integer values that are received. Here is this function's
-code:
-\begin{verbatim}
-int evaluate_bool(term *env, term *exp) {
- int n1 = evaluate(env, child(exp, 1));
- int n2 = evaluate(env, child(exp, 2));
- return (n1 > n2);
-}
-\end{verbatim}
-
-\subsection{Executing instructions}
-We now have enough functions to describe the behavior of the three instructions
-in our language. We want to define a function {\tt execute} that takes
-two arguments, an environment and an instruction. For instance, when
-applied to the environment
-\begin{verbatim}
-variable_list(variable_value(variable("x"),numeral(0)),
- variable_list(variable_value(variable("y"),numeral(2)),NULL)),
-and the instruction
-\begin{verbatim}
- assignment(variable("x",addition(variable("y"),numeral(2)))
-\end{verbatim}
-we want this function {\tt execute} to return the following environment
-\begin{verbatim}
-variable_list(variable_value(variable("x"),numeral(0)),
- variable_list(variable_value(variable("y"),numeral(4)),NULL)).
-\end{verbatim}
-\begin{enumerate}
-\item To execute an assignment instruction in a given environment,
-we simply need to evaluate the expression that occurs as this assignment's
-right-hand side (using the {\tt evaluate} function) and then to
-construct a new environment where only the assigned variable is changed
-(using the {\tt update} function),
-\item To execute a sequence of two instructions, we need to execute the
-first instruction and collect the resulting environment. The second
-instruction must then be executed, but in that intermediate environment.
-\item To execute a while loop, we need to evaluate the condition as
-a boolean expression. If the value of this expression is false, then
-we can return the input environment: the loop terminates without modifying
-its execution environment. If the value of the expression is true, then
-the body of the while loop needs to be executed, the resulting environment
-must be collected, and the while loop must be executed again, starting
-from that environment.
-\end{enumerate}
-These behaviors can be expressed very neatly by a simple composition of
-the functions, including the recursive calls of the execution function
-that is being defined:
-\begin{verbatim}
-term *execute(term *env, term *inst) {
- switch(inst->select) {
- case ASSIGNMENT:
- return update(env,
- variable_name(child(inst, 1)),
- numeral(evaluate(env,child(inst, 2))));
- case SEQUENCE:
- return execute(execute(env,child(inst, 1)),child(inst, 2));
- case WHILE:
- if(evaluate_bool(env,child(inst, 1))) {
- return execute(execute(env,child(inst, 2)), inst);
- } else {
- return env;
- }
- }
-}
-\end{verbatim}
-\subsection{Practical aspects: adding a parser}
-It is not practical to test our interpreter if we have to build C
-expressions by combining the term constructors in a C program, recompile
-this C program and observe the result with the navigating functions.
-A practical alternative is to define a parser to read in programs written
-in a readable syntax and combine this parser with a the {\tt execute}
-function. We use the Bison tool to add a simple parser to this interpreter.
-For instance, the parsing rules for instructions have the following
-shape
-\begin{verbatim}
-nst: identifier T_ASSIGN exp {$$ = assignment($1,$3); }
-| inst T_SCOLUMN inst {$$ = sequence($1,$3); }
-| T_WHILE b_exp T_DO inst T_DONE {$$ = term_while($2,$4); }
-| T_SKIP { $$ = skip(); }
-| T_OPEN_B exp T_CLOSE_B { $$ = $2; }
-\end{verbatim}
-For most parsing rules, the associated semantic action is simply a
-term constructor, except for rules that are concerned with parentheses or
-brackets, like the last parsing rule above.
-
-The semantic rule for the program construct triggers
-the execution of the program with the initial state given by the
-list of variables that was parsed (this list is reversed, because
-our parser stores the variable declarations in the wrong order).
-\begin{verbatim}
-program : T_VARIABLES environment T_IN inst T_END
-{ print_env(execute(revert_env($2), $4)); exit(0); }
-;
-\end{verbatim}
-\section{The interpreter in OCaml}
-In the OCaml language, defining the terms and the constructors is
-directly done with a type definition:
-\begin{verbatim}
-type exp =
- Addition of exp*exp | Numeral of int | Variable of string;;
-type b_exp = Greater of exp*exp;;
-type inst =
- Assignment of string * exp | Sequence of inst * inst
-| While of b_exp * inst | Skip;;
-\end{verbatim}
-We simply use the pre-defined type of lists to represent environments,
-in fact we use lists of pairs, where the first component is a string
-and the second argument is a integer.
-\begin{verbatim}
-type environment = (string * int) list;;
-\end{verbatim}
-With this data-structure, the functions {\tt lookup} and {\tt update}
-are defined as follows:
-\begin{verbatim}
-let rec lookup (env:environment) (var:string) =
- match env with
- [] -> failwith(var ^
- " : no such variable in environment for lookup")
- | (a,v)::tl -> if a = var then v else lookup tl var;;
-
-let rec update (env:environment) (var:string) (n:int) =
- match env with
- [] -> failwith(var ^
- " : no such variable in environment for update")
- | ((a,v) as pair)::tl ->
- if a = var then
- (a,n)::tl
- else
- pair::(update tl var n);;
-\end{verbatim}
-The programming style of OCaml relies on pattern matching construct,
-with specific syntax to describe lists. Thus the expression
-\begin{alltt}
- match env with [] -> \(e\sub{1}\) | a::tl -> \(e\sub{2}\)
-\end{alltt}
-means ``if the input data is an empty list, then return the
-value of \(e_1\), otherwise, let {\tt a} be the first element of
-the list and {\tt tl} be the second element, compute the value
-of \(e_2\), which may depend on the value of {\tt a} and {\tt tl},
-and return this value. In the functions, we know that the elements
-of the lists being processed are pairs and the pattern-matching
-construct also makes it possible to access the components of
-the first pair element concisely.
-
-The pattern-matching construct makes it possible to write the
-functions more concisely than in the C programming language. This
-conciseness also helps in the description of the {\tt evaluate},
-{\tt b\_evaluate}, and {\tt execute} functions.
-\begin{verbatim}
-let rec evaluate (env:environment)(e:exp) =
- match e with
- Numeral n -> n
- | Variable s -> lookup env s
- | Addition(e1,e2) -> evaluate env e1 + evaluate env e2;;
-
-let evaluate_bool (env:environment)(e:b_exp) =
- match e with
- Greater(e1,e2) -> evaluate env e1 > evaluate env e2;;
-
-let rec execute (env:environment)(i:inst) =
- match i with
- Skip -> env
- | Assignment(s,e) -> update env s (evaluate env e)
- | Sequence(i1,i2) -> execute (execute env i1) i2
- | While(b,i) as it ->
- if (evaluate_bool env b) then
- execute (execute env i) it
- else
- env;;
-\end{verbatim}
-The OCaml tools suite also provide means to define parsers. We
-used the {\tt ocamlyacc} tool which is close to {\tt bison}. In all,
-we are able to define the same interpreter more concisely with the
-OCaml language.
-
-\section{An interpreter written in Prolog}
-The Prolog programming style relies on the possibility to define
-predicates with multiple arguments. Very often, these predicates
-can be used to describe functions, when the last argument is actually
-viewed as the result of processing the previous argument.
-
-Predicates are usually described in a style that is closed to OCaml's
-pattern-matching. Thus, we will consider a predicate {\tt lookup}
-with three arguments, so that {\tt lookup(E,X,V)} can be read as
-``the value of {\tt X} in the environment {\tt E} is {\tt V}.
-This predicated can actually be defined with the following two clauses.
-\begin{verbatim}
-lookup([(X,V)|_], X, V) :- ! .
-lookup([(Y,_)|Tl], X, V) :- lookup(Tl, X, V).
-\end{verbatim}
-Prolog's treatement of definitions is that when a question is asked, the
-defining clauses are tried to solve the question. The bang ponctuation
-means that no other clause should be tested if this one succeeded. In
-effect, this ensures that the second clause is applied only when {\tt X}
-and {\tt Y} have different values. The {\em anonymous} variable ``{\tt \_}''
-is used to indicate that the value found in the question where this
-variable lies is not important. The two clauses defining {\tt lookup}
-can be read as follows: ``in any environment that is a list with
-a first pair ({\tt X}, {\tt V}), the value of the variable {\tt X} is {\tt V}.
-When the previous rule does not apply (when {\tt X} is different from {\tt Y},
-the value of {\tt Y} in an environment that starts with a pair
-of {\tt Y} and any value and whose tail is an environment {\tt Tl},
-the value of the variable {\tt X} is the same as the value it would
-have in the environment {\tt Tl}.
-
-Thus, in Prolog, the notation {\tt :-} can be read as an {\em if}
-construct.
-
-We can define the operation of updating environments similarly:
-\begin{verbatim}
-update([(X,_)|Tl], X, V1, [(X,V1)|Tl]) :- !.
-update([(Y,V)|Tl], X, V1, [(Y,V)|Tl1]) :-
- update(Tl, X, V1, Tl1).
-\end{verbatim}
-
-\end{document}