first article
Simon Petit 2024-11-16 12:27:47 +01:00
parent c8a549cc7c
commit 8ef470c84d
2 changed files with 102 additions and 2 deletions

@@ -122,7 +122,7 @@ I have no idea if this is the best solution, but so far it proved to work:
env = last()
if (env == "none") {
# If no block, print a paragraph
-print "<p>" replaceEmAndStrong($0) "</p>"
+print "<p>" $0 "</p>"
} else if (env == "blockquote") {
print $0
}
@@ -151,3 +151,56 @@ Nonetheless the code can still be consulted on [github](https://github.com/Siwon
So far we have seen a way to parse blocks, but markdown also handles strong text, emphasis and links. Unlike blocks, these elements can appear anywhere in a line.
Hence we need to parse them independently of the enclosing block: a header, for instance, can contain both strong text and a link.
A very useful function in awk is `match`: it searches a string for a regular expression and returns the position where the match starts, or 0 if there is none.
When the pattern is found, two global variables are set (on failure, they are set to 0 and -1 respectively):
- RSTART: the index, starting at 1, of the first character of the whole match (not of a parenthesized group)
- RLENGTH: the length of the whole match
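To see these two variables in action, here is a tiny stand-alone program (the sample strings are made up for the example; run it with `awk -f`). It drops the parentheses from the pattern, since POSIX awk's `match` only reports the whole match anyway (gawk adds an optional third argument to capture groups):

```awk
BEGIN {
    # "*emphasis*" occupies characters 6 to 15 of this sample string
    if (match("some *emphasis* here", /\*[^*]+\*/))
        print RSTART, RLENGTH    # prints: 6 10

    # Without a match, match() returns 0, RSTART is 0 and RLENGTH is -1
    found = match("plain text", /\*[^*]+\*/)
    print found, RSTART, RLENGTH # prints: 0 0 -1
}
```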
In what follows, `line` is the line being processed: the `while` loops below are all part of a single function.
Thus `match(line, /\*([^*]+)\*/)` matches a non-empty run of characters other than `*` surrounded by two `*`, i.e. emphasized text.
The outer `*` are escaped because `*` is a special character (a repetition operator) in regular expressions; inside the bracket expression `[^*]` it is literal and needs no escaping. The parentheses delimit the emphasized content.
To match several instances of emphasized text within a line, a simple `while` loop does the trick.
We then only have to insert the `<em>` and `</em>` tags at the right place around the matched text, and we are good to go.
As a precaution, we save `RSTART` (and the end position computed from `RLENGTH`) right away, since any later call to `match` would overwrite these globals. With them we can extract the matched substring and rebuild the actual HTML string:
```awk
while (match(line, /\*([^*]+)\*/)) {
    start = RSTART
    end = RSTART + RLENGTH - 1
    # Build the result: before match, <em>, content, </em>, after match
    line = substr(line, 1, start-1) "<em>" substr(line, start+1, RLENGTH-2) "</em>" substr(line, end+1)
}
```
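To check that the `while` loop really handles several emphasized spans on a single line, we can wrap it in a function and run it on a sample. This is a minimal sketch: the function name `emphasis` is hypothetical, and listing `start` and `end` as extra parameters is the usual awk idiom for local variables. Each pass replaces one pair of `*`, so the loop stops once no pair is left.

```awk
# Hypothetical wrapper around the emphasis loop.
# The extra parameters (start, end) act as local variables in awk.
function emphasis(line,    start, end) {
    while (match(line, /\*([^*]+)\*/)) {
        start = RSTART
        end = RSTART + RLENGTH - 1
        line = substr(line, 1, start-1) "<em>" substr(line, start+1, RLENGTH-2) "</em>" substr(line, end+1)
    }
    return line
}

BEGIN {
    print emphasis("an *easy* and *quick* test")
    # prints: an <em>easy</em> and <em>quick</em> test
}
```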
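We can now repeat the same pattern for the other inline features, e.g. strong text and inline code. Strong text (`**bold**`) has to be handled before emphasis: otherwise the emphasis pattern would match the inner `*bold*` and leave stray asterisks around it.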
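Rather than copying the loop for every feature, one possible sketch passes the pattern to a generic helper as a string, which awk treats as a dynamic regular expression. The name `wrap` and its `delim` parameter (the number of markup characters on each side) are hypothetical; writing `*` as the bracket expression `[*]` avoids the double backslashes a string would otherwise need. Note that this sketch still converts asterisks inside a code span; a complete parser would have to protect them.

```awk
# Hypothetical generic helper: replace every match of re in line by
# <tag>content</tag>, where content is the match without `delim`
# markup characters on each side.
function wrap(line, re, delim, tag,    start, end) {
    while (match(line, re)) {
        start = RSTART
        end = RSTART + RLENGTH - 1
        line = substr(line, 1, start-1) "<" tag ">" substr(line, start+delim, RLENGTH-2*delim) "</" tag ">" substr(line, end+1)
    }
    return line
}

BEGIN {
    line = "**bold**, *soft* and `code`"
    line = wrap(line, "`[^`]+`", 1, "code")
    # Strong before emphasis, otherwise [*][^*]+[*] would match the inner *bold*
    line = wrap(line, "[*][*][^*]+[*][*]", 2, "strong")
    line = wrap(line, "[*][^*]+[*]", 1, "em")
    print line
    # prints: <strong>bold</strong>, <em>soft</em> and <code>code</code>
}
```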
Links are slightly more involved, as we need two pieces of information: the link text and the URL itself.
This is no real obstacle: the naïve approach is to match the whole link first, then look for both the link text and the URL within that match.
Thus `match(line, /\[([^\]]+)\]\([^\)]+\)/)` matches text between `[]` immediately followed by text between `()`: the markdown syntax for links.
As above, we store `start` and `end`, along with the whole matched string:
```awk
start = RSTART
end = RSTART + RLENGTH - 1
# Take the match from line rather than $0: earlier loops may have modified line
matched = substr(line, RSTART, RLENGTH)
```
We can then call `match` again on this `matched` string to extract first the text inside `[]`, then the text inside `()`. These calls overwrite `RSTART` and `RLENGTH`, which is why `start` and `end` were saved beforehand:
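```awk
# Link text: what lies between the brackets
if (match(matched, /\[([^\]]+)\]/)) {
    matched_link = substr(matched, RSTART+1, RLENGTH-2)
}
# URL: the parenthesized part ending the match ($ anchor), so that
# parentheses inside the link text are not mistaken for the URL
if (match(matched, /\([^\)]+\)$/)) {
    matched_url = substr(matched, RSTART+1, RLENGTH-2)
}
```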
With the link text and the URL stored, the variables `start` and `end` make it easy to rebuild the HTML line:
```awk
line = substr(line, 1, start-1) "<a href=\"" matched_url "\">" matched_link "</a>" substr(line, end+1)
```
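Putting the link pieces together, the whole replacement can live in a single `while` loop. This is a minimal sketch: the function name `links` and the sample URL are hypothetical, and the URL pattern is anchored with `$` so that parentheses inside the link text, as in `[a (b)](u)`, are not mistaken for the URL.

```awk
# Hypothetical helper: replace every [text](url) in line by an <a> tag.
# Extra parameters act as local variables.
function links(line,    start, end, matched, matched_link, matched_url) {
    while (match(line, /\[([^\]]+)\]\([^\)]+\)/)) {
        # Save the position of the whole link before match() is called again
        start = RSTART
        end = RSTART + RLENGTH - 1
        matched = substr(line, RSTART, RLENGTH)
        # Both calls succeed: matched always has the [text](url) shape
        match(matched, /\[([^\]]+)\]/)
        matched_link = substr(matched, RSTART+1, RLENGTH-2)
        # Anchoring with $ picks the parenthesized URL at the end of matched
        match(matched, /\([^\)]+\)$/)
        matched_url = substr(matched, RSTART+1, RLENGTH-2)
        line = substr(line, 1, start-1) "<a href=\"" matched_url "\">" matched_link "</a>" substr(line, end+1)
    }
    return line
}

BEGIN {
    print links("see [the code](https://example.org/repo) for details")
    # prints: see <a href="https://example.org/repo">the code</a> for details
}
```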
The inline parsing function is now complete: all that is left is to apply it systematically to the text inside the HTML tags, and the markdown parser is finished.
This is, of course, only the first brick of a static site generator, and perhaps the most complex one.
Next, we will see how to orchestrate this parser to turn it into an actual site generator.
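Applying the inline parsing to the text inside the block tags could look like the end-to-end sketch below. It does not reproduce the block parser of this series: only headers and paragraphs are handled, there is no check for more than six `#`, and the helper names `wrap`, `links` and `parse_inline` are hypothetical.

```awk
# End-to-end sketch with hypothetical helper names.

# Replace every match of re in line by <tag>content</tag>, where content
# is the match without `delim` markup characters on each side.
function wrap(line, re, delim, tag,    start, end) {
    while (match(line, re)) {
        start = RSTART
        end = RSTART + RLENGTH - 1
        line = substr(line, 1, start-1) "<" tag ">" substr(line, start+delim, RLENGTH-2*delim) "</" tag ">" substr(line, end+1)
    }
    return line
}

# Replace every [text](url) in line by an <a> tag
function links(line,    start, end, matched, text, url) {
    while (match(line, /\[([^\]]+)\]\([^\)]+\)/)) {
        start = RSTART
        end = RSTART + RLENGTH - 1
        matched = substr(line, RSTART, RLENGTH)
        match(matched, /\[([^\]]+)\]/)
        text = substr(matched, RSTART+1, RLENGTH-2)
        match(matched, /\([^\)]+\)$/)
        url = substr(matched, RSTART+1, RLENGTH-2)
        line = substr(line, 1, start-1) "<a href=\"" url "\">" text "</a>" substr(line, end+1)
    }
    return line
}

# All inline features; strong must come before emphasis
function parse_inline(line) {
    line = wrap(line, "`[^`]+`", 1, "code")
    line = wrap(line, "[*][*][^*]+[*][*]", 2, "strong")
    line = wrap(line, "[*][^*]+[*]", 1, "em")
    return links(line)
}

# Header: the number of leading # gives the level (no check for more than six)
/^#+ / {
    match($0, /^#+/)
    level = RLENGTH
    print "<h" level ">" parse_inline(substr($0, level + 2)) "</h" level ">"
    next
}

# Any other non-blank line becomes a paragraph
NF { print "<p>" parse_inline($0) "</p>" }
```

Saved as, say, `inline.awk` (a hypothetical file name), `printf '# Hello *world*\n' | awk -f inline.awk` prints `<h1>Hello <em>world</em></h1>`.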

@@ -124,7 +124,7 @@ function last() {
env = last()
if (env == "none") {
# If no block, print a paragraph
-print "&lt;p&gt;" replaceEmAndStrong($0) "&lt;/p&gt;"
+print "&lt;p&gt;" $0 "&lt;/p&gt;"
} else if (env == "blockquote") {
print $0
}
@@ -151,6 +151,53 @@ function last() {
<h2>Parsing inline functionalities</h2>
<p>So far we have seen a way to parse blocks, but markdown also handles strong text, emphasis and links. Unlike blocks, these elements can appear anywhere in a line.</p>
<p>Hence we need to parse them independently of the enclosing block: a header, for instance, can contain both strong text and a link.</p>
<p>A very useful function in awk is `match`: it searches a string for a regular expression and returns the position where the match starts, or 0 if there is none.</p>
<p>When the pattern is found, two global variables are set (on failure, they are set to 0 and -1 respectively):</p>
<ul>
<li>RSTART: the index, starting at 1, of the first character of the whole match (not of a parenthesized group)</li>
<li>RLENGTH: the length of the whole match</li>
</ul>
<p>In what follows, `line` is the line being processed: the `while` loops below are all part of a single function.</p>
<p>Thus `match(line, /\*([^*]+)\*/)` matches a non-empty run of characters other than `*` surrounded by two `*`, i.e. emphasized text.</p>
<p>The outer `*` are escaped because `*` is a special character (a repetition operator) in regular expressions; inside the bracket expression `[^*]` it is literal and needs no escaping. The parentheses delimit the emphasized content.</p>
<p>To match several instances of emphasized text within a line, a simple `while` loop does the trick.</p>
<p>We then only have to insert the `&lt;em&gt;` and `&lt;/em&gt;` tags at the right place around the matched text, and we are good to go.</p>
<p>As a precaution, we save `RSTART` (and the end position computed from `RLENGTH`) right away, since any later call to `match` would overwrite these globals. With them we can extract the matched substring and rebuild the actual HTML string:</p>
<pre><code>while (match(line, /\*([^*]+)\*/)) {
    start = RSTART
    end = RSTART + RLENGTH - 1
    # Build the result: before match, &lt;em&gt;, content, &lt;/em&gt;, after match
    line = substr(line, 1, start-1) "&lt;em&gt;" substr(line, start+1, RLENGTH-2) "&lt;/em&gt;" substr(line, end+1)
}
</code>
</pre>
<p>Links are slightly more involved, as we need two pieces of information: the link text and the URL itself.</p>
<p>This is no real obstacle: the naïve approach is to match the whole link first, then look for both the link text and the URL within that match.</p>
<p>Thus `match(line, /\[([^\]]+)\]\([^\)]+\)/)` matches text between `[]` immediately followed by text between `()`: the markdown syntax for links.</p>
<p>As above, we store `start` and `end`, along with the whole matched string:</p>
<pre><code>start = RSTART
end = RSTART + RLENGTH - 1
# Take the match from line rather than $0: earlier loops may have modified line
matched = substr(line, RSTART, RLENGTH)
</code>
</pre>
<p>We can then call `match` again on this `matched` string to extract first the text inside `[]`, then the text inside `()`. These calls overwrite `RSTART` and `RLENGTH`, which is why `start` and `end` were saved beforehand:</p>
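<pre><code># Link text: what lies between the brackets
if (match(matched, /\[([^\]]+)\]/)) {
    matched_link = substr(matched, RSTART+1, RLENGTH-2)
}
# URL: the parenthesized part ending the match ($ anchor), so that
# parentheses inside the link text are not mistaken for the URL
if (match(matched, /\([^\)]+\)$/)) {
    matched_url = substr(matched, RSTART+1, RLENGTH-2)
}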
</code>
</pre>
<p>With the link text and the URL stored, the variables `start` and `end` make it easy to rebuild the HTML line:</p>
<pre><code>line = substr(line, 1, start-1) "&lt;a href=\"" matched_url "\"&gt;" matched_link "&lt;/a&gt;" substr(line, end+1)
</code>
</pre>
<p>The inline parsing function is now complete: all that is left is to apply it systematically to the text inside the HTML tags, and the markdown parser is finished.</p>
<p>This is, of course, only the first brick of a static site generator, and perhaps the most complex one.</p>
<p>Next, we will see how to orchestrate this parser to turn it into an actual site generator.</p>
</article>
</body>