From cd22d9c8d2bca62059e00951194116a3829178b2 Mon Sep 17 00:00:00 2001
From: Simon Petit <nomisp96@hotmail.fr>
Date: Fri, 8 Nov 2024 21:54:42 +0100
Subject: [PATCH] new pipeline

---
 .drone.yml                                    |  18 +-
 .../awk_for_static_site_generation.md         |   0
 index.html                                    |   1 +
 posts/awk_for_static_site_generation.html     | 159 ++++++++++++++++++
 4 files changed, 173 insertions(+), 5 deletions(-)
 rename drafts/{ => published}/awk_for_static_site_generation.md (100%)
 create mode 100644 posts/awk_for_static_site_generation.html
diff --git a/.drone.yml b/.drone.yml
index 2504c34..9021162 100644
--- a/.drone.yml
+++ b/.drone.yml
@@ -3,8 +3,16 @@ type: docker
 name: default
 
 steps:
-- name: Getting bob
-  image: alpine/git 
-  commands:
-  - git clone https://git.simonpetit.top/simonpetit/bob 
-
+- name: Deploy
+  image: appleboy/drone-scp
+  settings:
+    host:
+      - simonpetit.top
+    username: debian
+    password:
+      from_secret: ssh_password
+    port: 22
+    target: /var/www/html/blog/
+    source:
+      - index.html
+      - posts/*.html
diff --git a/drafts/awk_for_static_site_generation.md b/drafts/published/awk_for_static_site_generation.md
similarity index 100%
rename from drafts/awk_for_static_site_generation.md
rename to drafts/published/awk_for_static_site_generation.md
diff --git a/index.html b/index.html
index ec46757..e98061e 100644
--- a/index.html
+++ b/index.html
@@ -12,6 +12,7 @@
 <body>
     <h1 class='title'>simpet</h1>
     <ul>
+<li><a href="./posts/awk_for_static_site_generation.html">awk for static site generation</a></li>
 </ul>
 </body>
 
diff --git a/posts/awk_for_static_site_generation.html b/posts/awk_for_static_site_generation.html
new file mode 100644
index 0000000..736f37e
--- /dev/null
+++ b/posts/awk_for_static_site_generation.html
@@ -0,0 +1,159 @@
+<!DOCTYPE html>
+<html lang="en" dir="ltr">
+
+<head>
+    <meta charset="utf-8">
+    <title>simpet</title>
+    <meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover">
+    <link href="https://fonts.googleapis.com/css?family=Cutive+Mono|IBM+Plex+Mono&display=swap" rel="stylesheet">
+    <link rel="stylesheet" type="text/css" href="../css/poststyle.css">
+</head>
+
+<body>
+    <h1 class='title'><a href="../index.html">simpet</a></h1>
+    <article>
+        <h1>A static site generator</h1>
+<p>When I decided to start blogging, it was mostly to learn, and to remember all the tech things I pick up over time.</p>
+<p>I also want to explore a wide range of technologies rather than focus on a particular one.</p>
+<p>To start blogging, I obviously needed a static site generator.</p>
+<p>Many already exist, Hugo for example; however, writing one from scratch is exactly the kind of exercise I like to throw myself into.</p>
+<p>The main advantage of a static site is its loading speed: a simple HTML file, combined with a small, slick CSS stylesheet, and a whole new blog is born.</p>
+<p>Writing this static site generator from scratch is also the perfect excuse to explore a lesser-known tool for manipulating text files.</p>
+<h2>Introduction to AWK</h2>
+<p>AWK, named after the initials of its creators (Aho, Weinberger and Kernighan), is an old and powerful text-processing tool. Syntactically close to C, it is a scripting language for manipulating text input.</p>
+<p>Its <a href="https://en.wikipedia.org/wiki/AWK">Wikipedia page</a> sums up its history nicely.</p>
+<p>I thought it would be clever to use it for a site generator, parsing Markdown files to generate HTML ones.</p>
+<p>However, according to this <a href="https://jamstack.org/generators/">listing</a> of static site generators, someone else has already had the same idea.</p>
+<p>Hence, what follows, as well as my code, is heavily inspired by <a href="https://github.com/nuex/zodiac">Zodiac</a> (even though its repository has not been touched in 8 years).</p>
+<h2>Parsing markdown</h2>
+<p>Following the official Markdown <a href="https://daringfireball.net/projects/markdown/syntax">syntax</a> is a good starting point for a parser.</p>
+<p>AWK works as follows: for each line of the text input, it tests an optional pattern (typically a regex) and, when the line matches, executes the code written between curly braces.</p>
+<p>For example :</p>
+<pre><code>/^#/ {
+    print "&lt;h1&gt;" $0 "&lt;/h1&gt;"
+}
+</code>
+</pre>
+<p>While <code>$n</code> refers to the n-th field of the line (split on a delimiter, whitespace by default, or a comma as in a CSV), the special <code>$0</code> refers to the whole line.</p>
+<p>In this case, for each line starting with <code>#</code>, AWK prints <code>&lt;h1&gt;[content of the line]&lt;/h1&gt;</code> to the standard output.</p>
+<p>This is the first step towards parsing Markdown headers.</p>
+<p>However, trying this, we immediately see that <code>#</code> is part of the whole line, so it also appears in the HTML, whereas it should not.</p>
+<p>AWK can prevent this: it is a complete scripting language, with built-in string functions that enable further manipulation.</p>
+<pre><code>/^#/ {
+    print "&lt;h1&gt;" substr($0, 3) "&lt;/h1&gt;"
+}
+</code>
+</pre>
+<p>In the example above, as per the <a href="https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html#index-substr_0028_0029-function">documentation</a>, <code>substr($0, 3)</code> returns the substring of <code>$0</code> starting at position 3 and running to the end of the line.</p>
+<p>Positions 1 and 2 hold the <code>#</code> and the whitespace following it, which are therefore dropped.</p>
+<p>This is better, and we can now generalize it to all header levels. Another built-in function, <code>match</code>, stores the number of characters matched by a regex in the <code>RLENGTH</code> variable.</p>
+<p>This allows the script to determine dynamically which header level it is parsing:</p>
+<pre><code>/^#+ / {
+    match($0, /^#+ /)  # RLENGTH is the number of '#' plus the trailing space
+    n = RLENGTH
+    print "&lt;h" (n-1) "&gt;" substr($0, n + 1) "&lt;/h" (n-1) "&gt;"
+}
+</code>
+</pre>
+<p>Reproducing this technique for the rest of the syntax proves difficult, because some blocks, such as lists, span several lines.</p>
+<p>How, then, do we know when to close them with <code>&lt;/ul&gt;</code> or <code>&lt;/ol&gt;</code>?</p>
+<h2>Introducing a LIFO stack</h2>
+<p>Since the Markdown syntax allows nested blocks, such as headers and lists within blockquotes, or lists within lists, I came up with the simple idea of tracking the current environment in a stack.</p>
+<p>It turned out to be easy: I only needed a pointer to track the size of the LIFO, a function to push an element, and another one to pop one out:</p>
+<pre><code>BEGIN {
+    env = "none"
+    stack_pointer = 0
+    push(env)
+}
+</code>
+</pre>
+<pre><code># Function to push a value onto the stack
+function push(value) {
+    stack_pointer++
+    stack[stack_pointer] = value
+}
+</code>
+</pre>
+<pre><code># Function to pop a value from the stack (LIFO); the extra parameter makes value a local variable
+function pop(    value) {
+    if (stack_pointer > 0) {
+        value = stack[stack_pointer]
+        delete stack[stack_pointer]
+        stack_pointer--
+        return value
+    } else {
+        return "empty"
+    }
+}
+</code>
+</pre>
+<p>The stack does not need to be declared beforehand, since AWK creates arrays on first use. Each value in the LIFO corresponds to a Markdown environment, the last one being the current one.</p>
+<p>This is a handy trick: since the values are the HTML tag names themselves, closing a tag only requires printing the popped element between <code>&lt;/</code> and <code>&gt;</code>, instead of maintaining a lookup table.</p>
+<p>I also wrote a simple <code>last()</code> function that returns the last pushed value without popping it:</p>
+<pre><code># Function to get last value in LIFO
+function last() {
+    return stack[stack_pointer]
+}
+</code>
+</pre>
+<p>This way, parsing lists becomes trivial:</p>
+<pre><code># Matching unordered lists
+/^[-+*] / {
+    env = last()
+    if (env == "ul" ) {
+        # In an unordered list block, print a new item
+        print "&lt;li&gt;" substr($0, 3) "&lt;/li&gt;"
+    } else {
+        # Otherwise, open a new unordered list block
+        push("ul")
+        print "&lt;ul&gt;"
+        print "&lt;li&gt;" substr($0, 3) "&lt;/li&gt;"
+    }
+}
+</code>
+</pre>
+<p>The code is fairly self-explanatory: when the last environment is not <code>ul</code>, we enter that environment.</p>
+<p>This translates into pushing it onto the stack.</p>
+<p>Otherwise, we are already reading a list, and we only need to add a new item to it.</p>
+<h2>Parsing the simple paragraph and ending the parser</h2>
+<p>I showed examples for lists and headers, but code blocks, blockquotes, etc. work the same way. Only the simple paragraph is different.</p>
+<p>It does not start with a specific character, so to match it, we match every line that does not start with a special character (nor with a tab or four spaces, which mark code blocks, nor an empty line).</p>
+<p>I do not know if this is the best solution, but so far it has worked:</p>
+<pre><code># Matching a simple paragraph
+!/^(#|\*|-|\+|>|`|$|\t|    )/ {
+    env = last() 
+    if (env == "none") {
+        # If no block, print a paragraph
+        print "&lt;p&gt;" replaceEmAndStrong($0) "&lt;/p&gt;"  # replaceEmAndStrong() handles inline emphasis (not shown here)
+    } else if (env == "blockquote") {
+        print $0
+    }
+}
+</code>
+</pre>
+<p>Just as the <code>BEGIN</code> block runs before the first line of input, the <code>END</code> block runs after the last one.</p>
+<p>Naturally, we need to empty the stack there and close all the HTML tags that might have been left open during parsing.</p>
+<p>It is only a while loop that runs until the last environment is "none", the value the stack was initialized with:</p>
+<pre><code>END {
+    env = last()
+    while (env != "none") {
+        env = pop()
+        print "&lt;/" env "&gt;"
+        env = last()
+    }
+}
+</code>
+</pre>
+<p>This way, we can parse simple Markdown and turn it into an HTML file.</p>
+<p>Of course, I am aware that this leaves out emphasis, strong emphasis and inline code.</p>
+<p>I did implement them, though, and they may be explained in a future edit of this post.</p>
+<p>In the meantime, the code can be consulted on <a href="https://github.com/SiwonP/bob">GitHub</a>.</p>
+<h1>A testing suite for Markdown parsers</h1>
+<p>Having a Markdown parser is cool; having a well-tested one is better.</p>
+<p>So I embarked on writing a testing suite for Markdown parsers. I wanted it to be generic: you would only have to provide a parser program.</p>
+<p>That program reads Markdown from the standard input and writes HTML to the standard output.</p>
+<p>All the tests would be provided by the test suite.</p>
+    </article>
+</body>
+
+</html>