first article

2024-11-16 12:27:47 +01:00 · 2024-11-16 12:27:47 +01:00 · 8ef470c84d
commit 8ef470c84d
parent c8a549cc7c
2 changed files with 102 additions and 2 deletions
--- a/drafts/published/awk_for_static_site_generation.md
+++ b/drafts/published/awk_for_static_site_generation.md
@@ -122,7 +122,7 @@ I have no idea if this is the best solution, but so far it proved to work:
        env = last() 
        if (env == "none") {
            # If no block, print a paragraph
-            print "&lt;p&gt;" replaceEmAndStrong($0) "&lt;/p&gt;"
+            print "&lt;p&gt;" $0 "&lt;/p&gt;"
        } else if (env == "blockquote") {
            print $0
        }
@@ -151,3 +151,56 @@ Nonetheless the code can still be consulted on [github](https://github.com/Siwon
 For now we have seen a way to parse blocks, but markdown also handles strong, emphasis and links. However, these tags can appear anywhere in a line.
 Hence we need to be able to parse these lines apart from the block itself: indeed, a header can contain a strong and a link.

+A very useful function in awk is `match`: it searches a string for a regular expression.
+Whenever the pattern is found, two global variables are set:
+- RSTART: the index of the first character of the matched text
+- RLENGTH: the length of the matched text
+
+In what follows, `line` is the line being processed; the `while` loops below are all part of a single function.
+
+For example, `match(line, /\*([^*]+)\*/)` matches text enclosed between two `*`, which is markdown emphasis.
+The `*` are escaped because they are special characters in a regex; the parentheses only group the inner text, as `RSTART` and `RLENGTH` always describe the whole match.
+To match several instances of emphasis within a line, a simple `while` loop does the trick.
+We then only have to insert the html tags `<em>` and `</em>` at the right places around the matched text, and we are good to go.
+We save the global variables `RSTART` and `RLENGTH` in our own variables for later use, in case another call to `match` changes them. Using them, we can also extract the
+matched substrings and reconstruct the actual html string:
+
+
+    while (match(line, /\*([^*]+)\*/)) {
+        start = RSTART
+        end = RSTART + RLENGTH - 1
+        # Build the result: before match, <em>, content, </em>, after match
+        line = substr(line, 1, start-1) "<em>" substr(line, start+1, RLENGTH-2) "</em>" substr(line, end+1)
+    }
+  
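+We can now repeat the same pattern for the other inline features, e.g. strong and code. Strong (`**`) has to be handled before emphasis, otherwise the emphasis pattern matches inside the `**` pair.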
+
+Links are a bit more involved, as we need to match two groups: the link text and the url itself.
+This is not a real issue: the naive way is to match the whole link, then look for the text and the url within the matched whole.
+
+Here `match(line, /\[([^\]]+)\]\([^\)]+\)/)` matches text between `[]` followed by text between `()`: the markdown representation of a link.
+As above, we store `start` and `end`, and also the whole match:
+ 
+    start = RSTART
+    end = RSTART + RLENGTH - 1
+    matched = substr(line, RSTART, RLENGTH)  # take it from line, the string that was matched
+
+We can then apply `match` again to this `matched` string to extract first the text in `[]`, then the text in `()`:
+
+
+    if (match(matched, /\[([^\]]+)\]/)) {
+        matched_link = substr(matched, RSTART+1, RLENGTH-2) 
+    }
+    if (match(matched, /\([^\)]+\)/)) {
+        matched_url = substr(matched, RSTART+1, RLENGTH-2)
+    }
+
+With the link text and the url stored, the variables `start` and `end` make it easy to reconstruct the html line:
+
+    line = substr(line, 1, start-1) "<a href=\"" matched_url "\">" matched_link "</a>" substr(line, end+1)
+
+The inline parsing function is now complete: all we have to do is apply it systematically to the text within html tags, and this finishes the markdown parser.
+
+This, of course, is the first brick of a static site generator, and maybe the most complex one.
+We shall see next how to orchestrate this parser to make it an actual site generator.
+
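The inline steps described in the hunk above can be put together as one function. What follows is only a minimal sketch, not the code from the repository: the names `inline_md`, `inline`, `text` and `url` are hypothetical, and the bracket expressions are written `[^]]` and `[^)]` as an assumption about portability across awk implementations. It also anchors the two inner patterns with `^` and `$`, a small deviation from the article, so that parentheses inside the link text are not mistaken for the url.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the article): reads markdown lines on stdin
# and prints them with emphasis and links converted to html.
inline_md() {
    awk '
    # Extra parameters after "line" are the usual awk idiom for local variables.
    function inline(line,    start, end, matched, text, url) {
        # Emphasis: *text* becomes <em>text</em>
        while (match(line, /\*[^*]+\*/)) {
            start = RSTART
            end = RSTART + RLENGTH - 1
            line = substr(line, 1, start-1) "<em>" substr(line, start+1, RLENGTH-2) "</em>" substr(line, end+1)
        }
        # Links: [text](url) becomes <a href="url">text</a>
        while (match(line, /\[[^]]+\]\([^)]+\)/)) {
            # Save the position first: the inner match calls overwrite RSTART and RLENGTH
            start = RSTART
            end = RSTART + RLENGTH - 1
            matched = substr(line, start, RLENGTH)
            # Link text: the bracketed part at the beginning of the match
            if (match(matched, /^\[[^]]+\]/)) {
                text = substr(matched, RSTART+1, RLENGTH-2)
            }
            # Url: the parenthesized part at the end of the match
            if (match(matched, /\([^)]+\)$/)) {
                url = substr(matched, RSTART+1, RLENGTH-2)
            }
            line = substr(line, 1, start-1) "<a href=\"" url "\">" text "</a>" substr(line, end+1)
        }
        return line
    }
    { print inline($0) }
    '
}
```

After sourcing the function, it might be used as `printf '%s\n' 'some *text*' | inline_md`. Both loops terminate because each replacement removes the delimiters that the pattern needs in order to match again.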
--- a/posts/awk_for_static_site_generation.html
+++ b/posts/awk_for_static_site_generation.html
@@ -124,7 +124,7 @@ function last() {
    env = last() 
    if (env == "none") {
        # If no block, print a paragraph
-        print "&lt;p&gt;" replaceEmAndStrong($0) "&lt;/p&gt;"
+        print "&lt;p&gt;" $0 "&lt;/p&gt;"
    } else if (env == "blockquote") {
        print $0
    }
@@ -151,6 +151,53 @@ function last() {
 <h2>Parsing in-line features</h2>
 <p>For now we have seen a way to parse blocks, but markdown also handles strong, emphasis and links. However, these tags can appear anywhere in a line.</p>
 <p>Hence we need to be able to parse these lines apart from the block itself: indeed, a header can contain a strong and a link.</p>
+<p>A very useful function in awk is `match`: it searches a string for a regular expression.</p>
+<p>Whenever the pattern is found, two global variables are set:</p>
+<ul>
+<li>RSTART: the index of the first character of the matched text</li>
+<li>RLENGTH: the length of the matched text</li>
+</ul>
+<p>In what follows, `line` is the line being processed; the `while` loops below are all part of a single function.</p>
+<p>For example, `match(line, /\*([^*]+)\*/)` matches text enclosed between two `*`, which is markdown emphasis.</p>
+<p>The `*` are escaped because they are special characters in a regex; the parentheses only group the inner text, as `RSTART` and `RLENGTH` always describe the whole match.</p>
+<p>To match several instances of emphasis within a line, a simple `while` loop does the trick.</p>
+<p>We then only have to insert the html tags `&lt;em&gt;` and `&lt;/em&gt;` at the right places around the matched text, and we are good to go.</p>
+<p>We save the global variables `RSTART` and `RLENGTH` in our own variables for later use, in case another call to `match` changes them. Using them, we can also extract the </p>
+<p>matched substrings and reconstruct the actual html string:</p>
+<pre><code>while (match(line, /\*([^*]+)\*/)) {
+    start = RSTART
+    end = RSTART + RLENGTH - 1
+    # Build the result: before match, &lt;em&gt;, content, &lt;/em&gt;, after match
+    line = substr(line, 1, start-1) "&lt;em&gt;" substr(line, start+1, RLENGTH-2) "&lt;/em&gt;" substr(line, end+1)
+}
+</code>
+</pre>
+<p>Links are a bit more involved, as we need to match two groups: the link text and the url itself.</p>
+<p>This is not a real issue: the naive way is to match the whole link, then look for the text and the url within the matched whole.</p>
+<p>Here `match(line, /\[([^\]]+)\]\([^\)]+\)/)` matches text between `[]` followed by text between `()`: the markdown representation of a link.</p>
+<p>As above, we store `start` and `end`, and also the whole match:</p>
+<p> </p>
+<pre><code>start = RSTART
+end = RSTART + RLENGTH - 1
+matched = substr(line, RSTART, RLENGTH)  # take it from line, the string that was matched
+</code>
+</pre>
+<p>We can then apply `match` again to this `matched` string to extract first the text in `[]`, then the text in `()`:</p>
+<pre><code>if (match(matched, /\[([^\]]+)\]/)) {
+    matched_link = substr(matched, RSTART+1, RLENGTH-2) 
+}
+if (match(matched, /\([^\)]+\)/)) {
+    matched_url = substr(matched, RSTART+1, RLENGTH-2)
+}
+</code>
+</pre>
+<p>With the link text and the url stored, the variables `start` and `end` make it easy to reconstruct the html line:</p>
+<pre><code>line = substr(line, 1, start-1) "&lt;a href=\"" matched_url "\"&gt;" matched_link "&lt;/a&gt;" substr(line, end+1)
+</code>
+</pre>
+<p>The inline parsing function is now complete: all we have to do is apply it systematically to the text within html tags, and this finishes the markdown parser.</p>
+<p>This, of course, is the first brick of a static site generator, and maybe the most complex one.</p>
+<p>We shall see next how to orchestrate this parser to make it an actual site generator.</p>
    </article>
 </body>