From f6a71fe353cd2b229738559b90afbe1c67deaf90 Mon Sep 17 00:00:00 2001
From: Simon Petit

AWK works as follows: it takes an optional regex and executes the code between braces, like a function, on each line of the text input.
For example:
/^#/ {
- print "" $0 ""
+ print "<h1>" $0 "</h1>"
}
While `$n` refers to the n-th field of the line (split on a delimiter, as in a CSV), the special `$0` refers to the whole line.
-In this case, for each line starting with `#`, awk will print (to the standard output), ` [content of the line] `.
+In this case, for each line starting with `#`, awk will print (to the standard output), `<h1> [content of the line] </h1>`.
This is a first step toward parsing headers in Markdown.
However, trying this out, we immediately see that `#` is part of the whole line, so it also appears in the HTML, where it should not.
AWK has a way to prevent this: it is a complete scripting language, with built-in functions that enable further manipulation.
`substr` does what its name indicates: it returns a substring of its argument.
/^#/ {
- print "" substr($0, 3) ""
+ print "<h1>" substr($0, 3) "</h1>"
}
In the example above, as per the [documentation](https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html#index-substr_0028_0029-function)
@@ -46,11 +46,11 @@ and allows the script to dynamically determine which depth of header it parses :
@@ -51,12 +51,12 @@
/^#+ / {
match($0, /#+ /);
n = RLENGTH;
- print "" substr($0, n + 1) " "
+ print "<h" n-1 ">" substr($0, n + 1) "</h" n-1 ">"
}
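As a quick check of the rule above, here is a minimal sketch that runs it on a few sample lines. The shell wrapper, the `render_headers` name, the `level` variable and the sample input are assumptions added for illustration; only the `match`/`RLENGTH`/`substr` logic comes from the post.

```sh
#!/bin/sh
# Hypothetical wrapper around the header rule, so it can be tried on sample input.
render_headers() {
    awk '
    /^#+ / {
        match($0, /#+ /)     # sets RLENGTH to the length of the leading "#+ " run
        n = RLENGTH          # number of "#" characters plus the trailing space
        level = n - 1        # header depth is the number of "#" characters
        print "<h" level ">" substr($0, n + 1) "</h" level ">"
    }'
}

# Lines that do not start with "#" followed by a space are ignored by this rule.
headers=$(printf '%s\n' '# Title' '## Section' '### Sub section' 'plain text' | render_headers)
printf '%s\n' "$headers"
```

Storing `n - 1` in a variable gives the same result as writing `n-1` inline, and avoids having to think about how AWK groups subtraction and string concatenation.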
Reproducing this technique to parse the rest proves difficult, as lists, for example, are not contained in a single line, hence
-how to know when to close it with `` or ``
+how to know when to close it with `</ul>` or `</ol>`
Since, according to the Markdown syntax, it is possible to have nested blocks such as headers and lists within blockquotes, or lists within lists, I came up with the simple idea of tracking the current environment in a stack in AWK.
It turned out to be easy: I only needed a pointer to track the size of the LIFO, a function to push an element, and another one to pop one out:
@@ -88,7 +88,7 @@ function pop() {
The stack does not have to be strictly declared. The values inside the LIFO correspond to the current Markdown environment.
-This is a clever trick, because when I need to close an html tag, I use the poped element between a `` and a `>` instead of having a matching table.
+This is a clever trick, because when I need to close an HTML tag, I put the popped element between a `</` and a `>` instead of maintaining a lookup table.
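The bodies of `push()` and `pop()` are not visible in this patch, so the following is only a sketch of how such a stack could look. The `stack` and `size` names, the `"none"` return value on an empty stack, and the shell wrapper `stack_demo` are assumptions.

```sh
#!/bin/sh
# Hypothetical stand-alone demo of a LIFO built on an AWK array.
stack_demo() {
    awk '
    # Push a value on top of the stack; the array needs no declaration.
    function push(value) { stack[++size] = value }
    # Remove and return the top value, or "none" when the stack is empty (assumed sentinel).
    function pop() {
        if (size == 0) return "none"
        return stack[size--]
    }
    BEGIN {
        push("blockquote")
        push("ul")
        # The popped name goes between "</" and ">", innermost block first.
        while ((env = pop()) != "none") print "</" env ">"
    }'
}

closing_tags=$(stack_demo)
printf '%s\n' "$closing_tags"
```

Because the stack stores the tag names themselves, the closing tags come out in the reverse order of opening with no extra mapping.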
I also used a simple `last()` function to return the last pushed value in the stack without popping it out :
# Function to get last value in LIFO
function last() {
@@ -102,12 +102,12 @@ function last() {
env = last()
if (env == "ul" ) {
# In a unordered list block, print a new item
- print "" substr($0, 3) " "
+ print "<li>" substr($0, 3) "</li>"
} else {
# Otherwise, init the unordered list block
push("ul")
- print "-
- " substr($0, 3) ""
+ print "<ul>
+<li>" substr($0, 3) "</li>"
}
}
@@ -124,7 +124,7 @@ function last() {
env = last()
if (env == "none") {
# If no block, print a paragraph
- print "" replaceEmAndStrong($0) ""
+ print "<p>" replaceEmAndStrong($0) "</p>"
} else if (env == "blockquote") {
print $0
}
@@ -138,7 +138,7 @@ function last() {
env = last()
while (env != "none") {
env = pop()
- print "" env ">"
+ print "</" env ">"
env = last()
}
}
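To see how the pieces fit together, here is a simplified sketch that combines the stack with the list, paragraph and closing rules. It is not the script from the post: the `render_list` wrapper, the `close_all` helper, the blank-line rule and the stack internals are assumptions, and paragraphs print `$0` directly instead of going through `replaceEmAndStrong`.

```sh
#!/bin/sh
# Hypothetical reduced version: unordered lists and paragraphs only.
render_list() {
    awk '
    function push(value) { stack[++size] = value }
    function pop()  { return (size > 0) ? stack[size--] : "none" }
    function last() { return (size > 0) ? stack[size] : "none" }
    # Close every open block, innermost first (env is a local variable).
    function close_all(   env) {
        while ((env = pop()) != "none") print "</" env ">"
    }
    /^- / {
        # Open the list on its first item, then print each item.
        if (last() != "ul") { push("ul"); print "<ul>" }
        print "<li>" substr($0, 3) "</li>"
        next
    }
    # Assumption: a blank line ends whatever block is open.
    /^$/ { close_all(); next }
    {
        if (last() == "none") {
            print "<p>" $0 "</p>"
        } else {
            print $0
        }
    }
    END { close_all() }'
}

html=$(printf '%s\n' '- one' '- two' '' 'hello' | render_list)
printf '%s\n' "$html"
```

The list is only closed once a later line shows that it has ended, which is the problem the stack was introduced to solve.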