diff --git a/.drone.yml b/.drone.yml
index 2504c34..9021162 100644
--- a/.drone.yml
+++ b/.drone.yml
@@ -3,8 +3,16 @@
 type: docker
 name: default
 steps:
-- name: Getting bob
-  image: alpine/git
-  commands:
-  - git clone https://git.simonpetit.top/simonpetit/bob
-
+- name: Deploy
+  image: appleboy/drone-scp
+  settings:
+    host:
+      - simonpetit.top
+    username: debian
+    password:
+      from_secret: ssh_password
+    port: 22
+    target: /var/www/html/blog/
+    source:
+      - index.html
+      - posts/*.html
diff --git a/drafts/awk_for_static_site_generation.md b/drafts/published/awk_for_static_site_generation.md
similarity index 100%
rename from drafts/awk_for_static_site_generation.md
rename to drafts/published/awk_for_static_site_generation.md
diff --git a/index.html b/index.html
index ec46757..e98061e 100644
--- a/index.html
+++ b/index.html
@@ -12,6 +12,7 @@

simpet

diff --git a/posts/awk_for_static_site_generation.html b/posts/awk_for_static_site_generation.html
new file mode 100644
index 0000000..736f37e
--- /dev/null
+++ b/posts/awk_for_static_site_generation.html
@@ -0,0 +1,159 @@
+ + + + + + simpet + + + + + + +

simpet

+
+

A static site generator

+

When I decided to start blogging, it was mostly a way for me to learn, and to remember all the tech things I have picked up over time.

+

I also want to explore a wide variety of technologies rather than focus on a particular one.

+

Hence to start blogging, I obviously needed a static site generator.

+

Many of them already exist, Hugo for example. However, rewriting one from scratch is typically the kind of exercise I want to throw myself into.

+

The advantage of a static site is clearly its loading speed: a simple HTML file, combined with a small linked CSS file, and a whole new blog is born.

+

Anyway, writing this static site generator from scratch is also the perfect excuse to explore a not so widely known technology for manipulating text files.

+

Introduction to AWK

+

AWK, named after the initials of its creators (Aho, Weinberger and Kernighan), is an old and powerful text manipulation tool. Syntactically close to C, it is a scripting language for manipulating text input.

+

Its [Wikipedia page](https://en.wikipedia.org/wiki/AWK) sums up its history nicely.

+

I thought it would be clever to use it for a site generator, to parse markdown files and generate HTML ones.

+

However, according to this [listing](https://jamstack.org/generators/) of static site generators, someone else has already had the same idea.

+

Hence what follows, as well as my code, is heavily inspired by [Zodiac](https://github.com/nuex/zodiac) (even though the repo has not been touched for 8 years).

+

Parsing markdown

+

Following the official [syntax](https://daringfireball.net/projects/markdown/syntax) is a good start for a parser.

+

AWK works as follows: it takes an optional regex (the pattern) and executes the code between braces (the action), a bit like a function, for each line of the input that matches.

+

For example:

+
/^#/ {
+    print "<h1>" $0 "</h1>"
+}
+
+
+

Although `$n` refers to the n-th field of the line (according to a delimiter, like in a CSV), the special `$0` refers to the whole line.
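
To make the difference between `$n` and `$0` concrete, here is a minimal sketch. The CSV line and the `-F','` delimiter are assumptions chosen for illustration, they are not part of the generator itself.

```shell
# Hypothetical CSV line, used only for this illustration.
line='alice,30,paris'

# -F',' sets the field delimiter: $1 and $3 are single fields, $0 is the whole line.
result=$(printf '%s\n' "$line" | awk -F',' '{ print $1 " | " $3 " | " $0 }')
echo "$result"
```

Here `$1` and `$3` pick the first and third fields, while `$0` gives back the untouched line.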

+

In this case, for each line starting with `#`, awk will print (to the standard output) `<h1>[content of the line]</h1>`.

+

This is a first step towards parsing headers in markdown.

+

However, when trying this, we immediately see that `#` is part of the whole line, hence it also appears in the HTML whereas it should not.

+

AWK has a way to prevent this, as it is a complete scripting language, with built-in functions, that enable further manipulations.

+
/^#/ {
+    print "<h1>" substr($0, 3) "</h1>"
+}
+
+
+

In the example above, as per the [documentation](https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html#index-substr_0028_0029-function), `substr` returns the substring of `$0` starting at position 3 (1 being `#` and 2 the whitespace following it) up to the end of the line.

+

This is better, and we are now able to generalize it to all headers. Another function, `match`, sets the variable `RLENGTH` to the number of characters matched by a regex, which allows the script to determine dynamically the depth of the header it is parsing:

+
/^#+ / {
+    match($0, /#+ /);
+    n = RLENGTH;
+    print "<h" (n - 1) ">" substr($0, n + 1) "</h" (n - 1) ">"
+}
+
+
+
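
As a small sanity check of what `match` actually provides, the sketch below prints `RSTART`, `RLENGTH` and the remaining text for one sample header. The input line is made up for the demonstration.

```shell
# Sample header line (illustrative input, not taken from the generator).
result=$(printf '%s\n' '### Parsing markdown' | awk '
/^#+ / {
    # match() sets RSTART (start index) and RLENGTH (length of the match)
    match($0, /#+ /)
    print RSTART, RLENGTH, substr($0, RLENGTH + 1)
}')
echo "$result"
```

The match covers the three `#` plus the trailing space, so `RLENGTH` is 4 and the header depth is `RLENGTH - 1`.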

Reproducing this technique to parse the rest proves to be difficult. Lists, for example, are not contained in a single line, so how do we know when to close them with `</ul>` or `</ol>`?

+

Introducing a LIFO stack

+

Since, according to the markdown syntax, it is possible to have nested blocks such as headers and lists within blockquotes, or lists within lists, I came up with the simple idea of tracking the current environment in a stack in AWK.

+

It turned out to be easy: I only needed a pointer to track the size of the LIFO, a function to push an element, and another one to pop one out:

+
BEGIN {
+    env = "none"
+    stack_pointer = 0
+    push(env)
+}
+
+
+
# Function to push a value onto the stack
+function push(value) {
+    stack_pointer++
+    stack[stack_pointer] = value
+}
+
+
+
# Function to pop a value from the stack (LIFO)
+# "value" is listed as an extra parameter so that it stays local to pop()
+function pop(    value) {
+    if (stack_pointer > 0) {
+        value = stack[stack_pointer]
+        delete stack[stack_pointer]
+        stack_pointer--
+        return value
+    } else {
+        return "empty"
+    }
+}
+
+
+

The stack does not have to be declared explicitly, since AWK creates arrays on first use. The values inside the LIFO correspond to the current markdown environment.

+

This is a handy trick, because when I need to close an HTML tag, I put the popped element between `</` and `>` instead of maintaining a lookup table.

+

I also used a simple `last()` function to return the last pushed value in the stack without popping it out :

+
# Function to get last value in LIFO
+function last() {
+    return stack[stack_pointer]
+}
+
+
+
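
To see the three functions working together, here is a minimal sketch that simulates entering a blockquote and then a list, and closes them in reverse order. The sequence of pushes is an assumption made up for the demo, and the functions are condensed versions of the ones above.

```shell
result=$(awk '
function push(value) { stack[++stack_pointer] = value }
function pop(    value) {
    if (stack_pointer > 0) {
        value = stack[stack_pointer]
        delete stack[stack_pointer]
        stack_pointer--
        return value
    }
    return "empty"
}
function last() { return stack[stack_pointer] }
BEGIN {
    push("none")
    push("blockquote")    # entering a blockquote
    push("ul")            # entering a list nested in the blockquote
    print last()          # innermost environment
    print "</" pop() ">"  # closing tag built from the popped name
    print "</" pop() ">"
    print last()          # back to the initial state
}')
echo "$result"
```

The innermost environment is closed first, which is exactly the order HTML nesting requires.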

This way, parsing lists became trivial:

+
# Matching unordered lists
+/^[-+*] / {
+    env = last()
+    if (env == "ul" ) {
+        # In an unordered list block, print a new item
+        print "<li>" substr($0, 3) "</li>"
+    } else {
+        # Otherwise, init the unordered list block
+        push("ul")
+        print "<ul>"
+        print "<li>" substr($0, 3) "</li>"
+    }
+}
+

    I believe the code is pretty self-explanatory: when the last environment is not `ul`, we enter this environment.

    +

    This translates to pushing it onto the stack.

    +

    Otherwise, it means we are already reading a list, and we only need to add a new element to it.

    +

    Parsing the simple paragraph and ending the parser

    +

    I showed examples of lists and headers, but it works the same way for code blocks, blockquotes, etc. Only the simple paragraph is different: it does not start with a specific character. That is, to match it, we match every line that does not start with a special character.

    +

    I have no idea if this is the best solution, but so far it has worked:

    +
    # Matching a simple paragraph
    +!/^(#|\*|-|\+|>|`|$|	|    )/ {
    +    env = last() 
    +    if (env == "none") {
    +        # If no block, print a paragraph
    +        print "<p>" replaceEmAndStrong($0) "</p>"
    +    } else if (env == "blockquote") {
    +        print $0
    +    }
    +}
    +
    +
    +

    Like `BEGIN`, which runs before any input is read, AWK provides a way to execute code once the whole file has been read, with the `END` keyword.

    +

    Naturally we need to empty the stack and close all html tags that might have been opened during the parsing.

    +

    It is just a while loop that runs until the last environment is "none", the value the stack was initialized with:

    +
    END {
    +    env = last()
    +    while (env != "none") {
    +        env = pop()
    +        print "</" env ">"
    +        env = last()
    +    }
    +}
    +
    +
    +
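
Putting the pieces together, the sketch below runs a condensed list rule plus the `END` block on a two-item list. It is a simplified stand-in for the real script, assuming only unordered lists appear in the input, and the sample markdown is made up.

```shell
# Illustrative two-item list; the real script handles more block types.
markdown='- first
- second'

result=$(printf '%s\n' "$markdown" | awk '
function push(value) { stack[++stack_pointer] = value }
function pop(    value) {
    value = stack[stack_pointer]
    delete stack[stack_pointer]
    stack_pointer--
    return value
}
function last() { return stack[stack_pointer] }
BEGIN { push("none") }
/^[-+*] / {
    # Open the list only when we are not already inside one
    if (last() != "ul") {
        push("ul")
        print "<ul>"
    }
    print "<li>" substr($0, 3) "</li>"
}
END {
    # Close every environment still open at the end of the input
    while (last() != "none") {
        print "</" pop() ">"
    }
}')
echo "$result"
```

Without the `END` block the closing `</ul>` would never be printed, since no later line signals the end of the list.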

    This way we are able to simply parse markdown and turn it into an HTML file.

    +

    Of course, I am aware that it lacks emphasis, strong and code within a line of text.

    +

    I did implement them, however, and they may be explained in a later edit of this post.

    +

    Nonetheless the code can still be consulted on [github](https://github.com/SiwonP/bob).

    +

    A test suite for markdown parsers

    +

    Having a markdown parser is cool, having a well-tested one is better.

    +

    I embarked on writing a test suite for markdown parsers. I wanted it to be generic, meaning you only have to provide a parsing program that takes markdown on the standard input and returns HTML on the standard output.

    +

    All tests would be provided by the test suite.
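
The contract described above (markdown on stdin, HTML on stdout) can be sketched as a tiny harness. This is only an illustration of the idea, not the actual test suite: `run_case` and `demo_parser` are hypothetical names, and the stand-in parser only handles level-1 headers.

```shell
# run_case MARKDOWN EXPECTED_HTML PARSER [ARGS...]
# Feeds MARKDOWN to the parser on stdin and compares its stdout to EXPECTED_HTML.
run_case() {
    markdown=$1
    expected=$2
    shift 2
    actual=$(printf '%s\n' "$markdown" | "$@")
    if [ "$actual" = "$expected" ]; then
        echo "PASS"
    else
        echo "FAIL: expected [$expected], got [$actual]"
    fi
}

# Stand-in parser used only for this demo (hypothetical).
demo_parser() {
    awk '/^# / { print "<h1>" substr($0, 3) "</h1>" }'
}

result=$(run_case '# Title' '<h1>Title</h1>' demo_parser)
echo "$result"
```

Because the parser is passed as a plain command, any program respecting the stdin/stdout contract could be plugged in, for instance `awk -f markdown.awk` with a hypothetical script name.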

    +
    + + +