aboutsummaryrefslogtreecommitdiffstats

Project Contribution

The preprocessor and lexer are part of FMark, which a markdown parser in F#. This sub project contains the lexer and the preprocessor for the markdown parser. The preprocessor is a completely separate parser which preprocesses the markdown before passing it to the lexer and finally the parser.

Preprocessor

This project contains the Preprocessor for FMark. The preprocessor adds templating capabilities to FMark, which was inspired by Liquid.

Specification

Supported Constructs

These are the supported constructs in the preprocessor.

Supported Syntax Description Tested
Simple Macro {% macro name value %} Sets the Macro name equal to the string value Unit Test
Function Macro {% macro name(arg1; arg2) value %} Sets the Macro name equal to the string value with two parameters. Unit Test
Simple Evaluation {{ macro_name }} Evaluates the macro macro_name and replaces the evaluation with the evaluated body of the macro. Unit Test
Function Evaluation {{ macro_name(arg 1; arg 2) }} Evaluates the macro macro_name with the arguments arg 1 and arg 2 and replaces the evaluation with the evaluated body of the macro. Unit Test
File Include {{ include relative/path/to/file }} Includes and preprocesses a file using a relative or absolute path. The macros declared in that file will then be available in the current file Unit Test
Complex Macro Evaluation {{ x( {{ y( {{z}} ; Hello ) }} ; {{z}} ) }} Nested macro evaulations are supported. This way, default arguments can be created for other macros. Unit Test

Supported Features

These are the features that are currently supported by the preprocessor.

Feature Example Description Tested
Simple whitespace control {% macro x y %} evaluates to y and not y . Removes whitespace and newlines in macros where one wouldn't expect them to be added to the macro body. Unit Test
Shadowing of macros through {% macro x x %} {% macro y(x) {{ x }} %} with {{ y(z) }} will evaluate to z but {{ x }} outside of the macro will always evaluate to x. Macros can be shadowed by arguments of other macros. Unit Test
Nested macros {% macro x {% macro y %} %} Macro y is only defined inside macro x and cannot be seen outside of the scope of x. Unit Test
Shadowing of macros through {% macro x x %} {% macro y {% macro x z %} {{x}} %} y: {{ y }}, x: {{ x }} will evaluate to y: z, x: x Macros can be shadowed by other macros which will be used instead for evaluation. Unit Test
Evaluation of large strings {{ x(This is the first argument; This is the second argument) }} One can pass large strings as arguments to the macros. Unit Test
Escaping of characters inside argument {{ x(arg 1 with a \); arg 2 with a \;) }} One can esape all the special characters inside macros and substitutions Unit Test
Escaping macros \{% macro x y %} This will escape the whole macro and not evaluate it Unit Test
Escaping Subsitutions \{{ x }} will not evaluate the substitution but instead output it literally Unit Test
Outputting unmatched subsituttion {{ x }} -> {{ x }} if not in scope If the subsitution is not matched, it will output it as it got it Unit Test
Nested Evaluations {{ x( {{y}} ) }} Arguments can now be evaluated inside them. Unit Test

Usage

To use the preprocessor and the lexer, a string or a list of strings can be used, depending on if there are multiple lines or not. For a single string, the following can be used.

For string, the preprocess and lex functions.

[<EntryPoint>]
let main =
    let inputString = (* Read the string *)

    inputString
    |> preprocess
    |> lex
    ...

For a list of strings, one can use the preprocessList and lexList functions.

[<EntryPoint>]
let main =
    let inputStringList = (* Read the string list *)

    inputStringList
    |> preprocessList
    |> lexList
    ...

Example

In markdown using the preprocessor, one can then write the following:

Text before macro
{% macro Hello(arg1; arg2)
This is text inside the macro, with semicolons;
{% macro local(arg1; arg2)
This is the second macro
%}
Now back in the first macro.
{{ local(arg1; arg2) }}
%}
Outside both macros
Should be printed as not in scope: {{ local(arg1; arg2) }}

{{ Hello(arg1; arg2) }}

which then evaluates to

Text before macro
Outside both macros
Should be printed as not in scope: {{ local(arg1; arg2) }}


This is text inside the macro, with semicolons;
Now back in the first macro.

This is the second macro 

More complicated macros can also be created by writing html in the macros. Due to the html passthrough in the lexer, the html will be copied over literally to the output html.

Future improvements

There are many features that will be introduced into the preprocessor in the future. Some of the future constructs can be seen below.

Construct Description
for loop A for loop that will repeat whatever is put into the body
ifdef Check if a macro is defined
Expressions Introduce arithmetic expressions
if Check if a condition is true, which will need the introduction of Expressions

There are also some features that could be added.

Feature Description
{%- -%} New delimiter that will completely remove the whitespace of the macro at that point

Lexer

Interface to the Parser

The interface to the parser was done using the following Token type, which the parser takes in and can parse.

type Token =
    | CODEBLOCK of string * Language
    | LITERAL of string
    | WHITESPACE of size: int
    | NUMBER of string
    | HASH | PIPE | EQUAL | MINUS | PLUS | ASTERISK | DOT
    | DASTERISK | TASTERISK | UNDERSCORE | DUNDERSCORE | TUNDERSCORE | TILDE | DTILDE
    | TTILDE | LSBRA | RSBRA | LBRA | RBRA | BSLASH | SLASH | LABRA | RABRA | LCBRA
    | RCBRA | BACKTICK | TBACKTICK | EXCLAMATION | ENDLINE | COLON | CARET | PERCENT

Features

Supports escaping of all the special characters defined in Types. This is done by adding a \ in front of the character that should be escaped.

Tokens that match multiple characters can also be escaped by just putting a \ before it. For example, *** can be escaped by writing \***.

Extensibility

It can easily be extended by adding the type of the token to Token above. Then the string has to be linked to the token by adding it as a tuple of type string * Token to a list called charList in the Lexer.

Test Plan

The lexer and the preprocessor were built using a test-driven manner, by writing tests first and then making them pass with the code. This means that the goal of the code is well defined beforehand and can more easily be written. It is then much easier to test the whole code by just running all the unit tests, instead of manually testing it everytime, as that could mean that pevious functionality might not work anymore.

Unit tests were used to make small tests that were going to have to pass. After the code was written, property based tests made sure that the main functions were working as they were supposed to.

More tests were added once all the functionality was there to thoroughly test the preprocessor and lexer. These tests were chosen for relatively large functions that were used directly in the workflow and not for small functions that are used by these larger functions, as they would make them fail if they didn't work. This could then be detected by running tests on the larger functions that used these tests.

As many edge cases as possible were identified for the preprocessor and tokenizer and tested using unit tests as well, which identified a few bugs, such as issues with whitespace in macros.

Finally, a property based test was added for the preprocessor that tests that the preprocessed output, when preprocessed again is the same. This is the only property test that seemed to work. Trying to create a lex property based test that would compare the input to the output, a lot of differences were found that I had previously not thought about. This type of test did not work, because the Token type does not restrict the type enough. FsCheck would generate a LITERAL "" which would never be generated by the lexer, however is still a valid type. The same goes for NUMBER "" and WHITESPACE 0. The ENDLINE token will also always be at the end of the list for a single string for lex, however, FsCheck would put it anywhere.

Summary

  1. Test while writing the different functions and when implementing new features.

  2. Add unit tests to test edge cases.

  3. Add property based tests.

Property based tests

Preprocessor

This property based test runs a random string that is generated by FsCheck and runs it through the preprocessor. It then checks if the output string is the same as the string if it passed through the preprocessor a second time.

Unit tests

Preprocessor

Next Token

Name Status
Openeval Pass
Closeeval Pass
Opendef Pass
Semicolon Pass
Long random text Pass

Tokenize

Name Status
All Tokens Pass
Macro Pass
Subsitution Pass
Normal markdown Pass
Escaped character in sentence Pass

Parse

Name Status
Macro with multiple arguments and inline body Pass
Substitution Pass
Substitution with argument Pass
Substitution with multiple arguments Pass
Substitution with argument and spaces Pass

Preprocess

Name Status
Simple text does not change Pass
Simple text does not change with special chars Pass
Simple macro with no arguments Pass
Simple macro with empty brackets Pass
Simple macro evaluation Pass
Print out the input when substitution not in scope Pass
Escaping macro bracket should make the original input appear Pass
Shadowed macros and arguments Pass
Shadowed macros Pass
Macro with different arguments Pass
Macro with long name Pass

Preprocess List

Name Status
Multiline macro evaluation with newline Pass
Multiline macro without newline Pass
Multiline macro with arguments Pass

Lexer

Lex

Name Status
All Tokens Pass
Literal Pass
Number Pass
WhiteSpace Pass
Very simple markdown Pass
With special characters Pass
Escaping characters Pass

lexList

Name Status
Very simple multiline markdown Pass
With special characters Pass
Escaping characters Pass