# A static site generator

When I decided to start blogging, it was mostly for myself: to learn, and to remember all the tech things I have learnt over time. I also want to explore a wide range of technologies rather than focus on a particular one.

To start blogging, I needed a static site generator. Many already exist, Hugo for example, but rewriting one from scratch is typically the kind of exercise I like to throw myself into. The advantage of a static site is clearly its loading speed: a simple HTML file, combined with a small, polished CSS file, and a whole new blog is born.

Anyway, writing this static site generator from scratch is also the perfect excuse to explore a not-so-widely-known technology for manipulating text files.

## Introduction to AWK

AWK, named after the initials of its creators, is an old and powerful text-manipulation tool. Syntactically close to C, it is a scripting language for manipulating text input. Its [wikipedia page](https://en.wikipedia.org/wiki/AWK) sums up its history nicely.

I thought it was clever to use it for a site generator, to parse markdown files and generate HTML ones. However, according to this [listing](https://jamstack.org/generators/) of static site generators, someone else had the same idea. Hence the following, as well as my code, is heavily inspired by [Zodiac](https://github.com/nuex/zodiac) (even though the repo has not been touched for 8 years).

## Parsing markdown

Following the official [syntax](https://daringfireball.net/projects/markdown/syntax) is a good start for a parser. AWK works as follows: it takes an optional regex and executes some code between braces, like a function, for each line of the text input that matches. For example:

```awk
/^#/ {
    print "<h1>" $0 "</h1>"
}
```

Although `$n` refers to the n-th field of the line (split according to a delimiter, as in a CSV), the special `$0` refers to the whole line. In this case, for each line starting with `#`, awk prints `<h1>[content of the line]</h1>` to the standard output. This is a first step towards parsing markdown headers.

However, when trying this, we immediately see that `#` is part of the whole line, so it also appears in the HTML, where it should not. AWK has a way to prevent this: as a complete scripting language, it comes with built-in functions that enable further manipulation. `substr` does what its name indicates: it returns a substring of its argument.

```awk
/^#/ {
    print "<h1>" substr($0, 3) "</h1>"
}
```

In the example above, as per the [documentation](https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html#index-substr_0028_0029-function), it returns the substring of `$0` starting at character 3 (1 being `#` and 2 the whitespace following it) and running to the end of the line.

This is better, and we can now generalize it to all header levels. Another function, `match`, sets `RLENGTH` to the number of characters matched by a regex, which lets the script dynamically determine the depth of the header it is parsing:

```awk
/^#+ / {
    match($0, /#+ /)
    # n counts the leading '#' characters plus the space after them
    n = RLENGTH
    print "<h" (n - 1) ">" substr($0, n + 1) "</h" (n - 1) ">"
}
```

Reproducing this technique for the rest of the syntax proves difficult. Lists, for example, are not contained in a single line, so how do we know when to close them with `</ul>` or `</ol>`?

## Introducing a LIFO stack

According to the markdown syntax, blocks can be nested: headers and lists within blockquotes, or lists within lists. So I came up with the simple idea of tracking the current environment in a stack in AWK. It turned out to be easy: I only needed a pointer to track the size of the LIFO, a function to push an element, and another one to pop one out:

```awk
BEGIN {
    env = "none"
    stack_pointer = 0
    push(env)
}

# Function to push a value onto the stack
function push(value) {
    stack_pointer++
    stack[stack_pointer] = value
}

# Function to pop a value from the stack (LIFO)
function pop() {
    if (stack_pointer > 0) {
        value = stack[stack_pointer]
        delete stack[stack_pointer]
        stack_pointer--
        return value
    } else {
        return "empty"
    }
}
```

The stack does not have to be declared explicitly: AWK creates arrays on first use. The values stored in the LIFO correspond to the current markdown environment. This is a neat trick: when I need to close an HTML tag, I build it from the popped element, placed between `</` and `>`, instead of maintaining a lookup table.
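To make the closing trick concrete, here is a minimal sketch rather than the actual generator code. The names `close_env` and `close_all_demo` are hypothetical, the stack is filled by hand instead of from parsed markdown, and the awk program is wrapped in a shell function only so that it can be run without an input file:

```shell
# Hypothetical demo wrapper: a BEGIN-only awk program reads no input.
close_all_demo() {
  awk '
    function push(value) { stack[++stack_pointer] = value }

    function pop(    value) {   # the extra parameter keeps value local
      if (stack_pointer > 0) {
        value = stack[stack_pointer]
        delete stack[stack_pointer]
        stack_pointer--
        return value
      }
      return "empty"
    }

    # Hypothetical helper: the popped name is the tag to close.
    function close_env() { print "</" pop() ">" }

    BEGIN {
      push("none")                               # base environment
      push("blockquote"); print "<blockquote>"
      push("ul");         print "<ul>"
      print "<li>item</li>"
      # Unwind everything above the base "none" entry.
      while (stack_pointer > 1) close_env()
    }
  '
}

close_all_demo
```

Running it prints the two opening tags and the item, then `</ul>` followed by `</blockquote>`: the environments are closed in the reverse order of their opening, which is exactly what nested blocks need.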
I also used a simple `last()` function to return the last pushed value in the stack without popping it:

```awk
# Function to get last value in LIFO
function last() {
    return stack[stack_pointer]
}
```

This way, parsing lists became trivial:

```awk
# Matching unordered lists
/^[-+*] / {
    env = last()
    if (env == "ul") {
        # In an unordered list block, print a new item
        print "<li>" substr($0, 3) "</li>"
    } else {
        # Otherwise, init the unordered list block
        push("ul")
        print "