This commit is contained in:
parent
c8a549cc7c
commit
8ef470c84d
@ -122,7 +122,7 @@ I have no idea if this is the best solution, but so far it proved to work:
|
||||
env = last()
|
||||
if (env == "none") {
|
||||
# If no block, print a paragraph
|
||||
print "<p>" replaceEmAndStrong($0) "</p>"
|
||||
print "<p>" $0 "</p>"
|
||||
} else if (env == "blockquote") {
|
||||
print $0
|
||||
}
|
||||
@ -151,3 +151,56 @@ Nonetheless the code can still be consulted on [github](https://github.com/Siwon
|
||||
So far we have seen a way to parse blocks, but Markdown also handles strong text, emphasis and links, and these inline elements can appear anywhere in a line.
|
||||
Hence we need to parse the content of a line independently of the block it belongs to: a header, for instance, can contain both strong text and a link.
|
||||
|
||||
A very useful function in awk is `match`: it searches a string for a regular expression.
|
||||
Whenever the pattern is found, two global variables are set:
|
||||
- RSTART: the index of the first character of the matched substring
|
||||
- RLENGTH: the length of the matched substring
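As a quick check of what these two variables hold, here is a minimal sketch meant to be run from a shell. The `show_match` wrapper is a hypothetical name used only for this illustration; the awk body relies on nothing beyond the standard `match` and `substr`. Note that the values describe the whole match, delimiters included.

```sh
# Hypothetical helper: report where the first *emphasis* span sits in a string.
show_match() {
  printf '%s\n' "$1" | awk '{
    if (match($0, /\*[^*]+\*/)) {
      # RSTART and RLENGTH cover the whole match, asterisks included
      print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    } else {
      # When nothing matches, awk sets RSTART to 0 and RLENGTH to -1
      print RSTART, RLENGTH
    }
  }'
}

show_match 'some *text* here'   # prints: 6 6 *text*
```

This is why the replacement code has to skip one character on each side of the match to reach the text between the asterisks.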
|
||||
|
||||
In what follows, `line` is the line being processed, since the `while` loops below are all part of a single function.
|
||||
|
||||
This way, `match(line, /\*([^*]+)\*/)` matches a string enclosed between two `*`, which is how Markdown marks emphasized text.
|
||||
The `*` are escaped because they are special characters, and the *group* is inside the parentheses.
|
||||
To match several instances of emphasized text within a line, a simple `while` loop will do the trick.
|
||||
We now only have to insert the HTML `<em>` tags at the right place around the matched text, and we are good to go.
|
||||
We can save the global variables `RSTART` and `RLENGTH` for further use, in case they were to be changed. Using them, we can also extract the
|
||||
matched substrings and reconstruct the actual HTML string:
|
||||
|
||||
|
||||
while (match(line, /\*([^*]+)\*/)) {
|
||||
start = RSTART
|
||||
end = RSTART + RLENGTH - 1
|
||||
# Build the result: before match, <em>, content, </em>, after match
|
||||
line = substr(line, 1, start-1) "<em>" substr(line, start+1, RLENGTH-2) "</em>" substr(line, end+1)
|
||||
}
|
||||
|
||||
We can now repeat the pattern for all inline features, e.g. strong and code.
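A minimal sketch of that repetition, assuming strong text is written `**text**` and inline code is written between backticks. The names `inline_demo` and `wrap` are hypothetical and only serve this illustration. Strong is handled before emphasis so that `**` is not read as two separate `*`.

```sh
inline_demo() {
  printf '%s\n' "$1" | awk '
  # Hypothetical helper: wrap every span matched by the regex re in <tag>.
  # n is the delimiter length (2 for **, 1 for * and for the backtick).
  function wrap(line, re, n, tag,    start, len) {
    while (match(line, re)) {
      start = RSTART; len = RLENGTH
      line = substr(line, 1, start - 1) "<" tag ">" \
             substr(line, start + n, len - 2 * n) "</" tag ">" \
             substr(line, start + len)
    }
    return line
  }
  {
    line = $0
    line = wrap(line, "\\*\\*[^*]+\\*\\*", 2, "strong")  # strong first
    line = wrap(line, "\\*[^*]+\\*", 1, "em")
    line = wrap(line, "`[^`]+`", 1, "code")
    print line
  }'
}

inline_demo 'a **b** and *c*'   # prints: a <strong>b</strong> and <em>c</em>
```

One limitation of this sketch: an asterisk inside a code span would still be picked up by the emphasis pass, so a real parser would need to protect code spans first.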
|
||||
|
||||
The case of links is a bit more involved, as we need to match two groups: the link text and the URL itself.
|
||||
No real issue here: the naive way is to match the whole construct, then look for both the link text and the URL within that match.
|
||||
|
||||
This way, `match(line, /\[([^\]]+)\]\([^\)]+\)/)` matches text between `[]` followed by text between `()`: the Markdown representation of a link.
|
||||
As above, we store `start` and `end`, and also the whole match:
|
||||
|
||||
start = RSTART
|
||||
end = RSTART + RLENGTH - 1
|
||||
matched = substr(line, RSTART, RLENGTH)
|
||||
|
||||
It is possible to apply the `match` function to this `matched` string and extract first the text in `[]`, then the text in `()`:
|
||||
|
||||
|
||||
if (match(matched, /\[([^\]]+)\]/)) {
|
||||
matched_link = substr(matched, RSTART+1, RLENGTH-2)
|
||||
}
|
||||
if (match(matched, /\([^\)]+\)/)) {
|
||||
matched_url = substr(matched, RSTART+1, RLENGTH-2)
|
||||
}
|
||||
|
||||
As the link text and the URL are stored, it is easy to reconstruct the HTML line using the variables `start` and `end`:
|
||||
|
||||
line = substr(line, 1, start-1) "<a href=\"" matched_url "\">" matched_link "</a>" substr(line, end+1)
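Put together, the three steps above could look like the following sketch. The names `links_demo` and `replace_links` are hypothetical, the URL is a placeholder, and the loop assumes that neither the link text nor the URL contains brackets or parentheses.

```sh
links_demo() {
  printf '%s\n' "$1" | awk '
  # Hypothetical function name; it chains the three steps described above.
  function replace_links(line,    start, end, matched, text, url) {
    while (match(line, /\[[^\]]+\]\([^)]+\)/)) {
      start = RSTART
      end = RSTART + RLENGTH - 1
      matched = substr(line, start, RLENGTH)
      # Re-run match on the extracted substring only
      match(matched, /\[[^\]]+\]/)
      text = substr(matched, RSTART + 1, RLENGTH - 2)
      match(matched, /\([^)]+\)/)
      url = substr(matched, RSTART + 1, RLENGTH - 2)
      line = substr(line, 1, start - 1) "<a href=\"" url "\">" text "</a>" substr(line, end + 1)
    }
    return line
  }
  { print replace_links($0) }'
}

links_demo 'see [github](https://example.org/x) now'
# prints: see <a href="https://example.org/x">github</a> now
```

Saving `start` and `end` before the inner calls matters here, because each call to `match` overwrites `RSTART` and `RLENGTH`.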
|
||||
|
||||
The inline parsing function is now complete. All we have to do is apply it systematically to the text within the HTML tags, and this finishes the Markdown parser.
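As an illustration of that last step, here is a minimal sketch in which the text of a header or a paragraph goes through the inline function before being wrapped in its block tag. The names `render_demo` and `inline` are hypothetical, `inline` only handles emphasis here, and the block detection is reduced to level-one headers and paragraphs.

```sh
render_demo() {
  printf '%s\n' "$1" | awk '
  # Hypothetical stand-in for the complete inline function: emphasis only.
  function inline(line,    start, len) {
    while (match(line, /\*[^*]+\*/)) {
      start = RSTART; len = RLENGTH
      line = substr(line, 1, start - 1) "<em>" substr(line, start + 1, len - 2) "</em>" substr(line, start + len)
    }
    return line
  }
  # A level-one header: drop the leading "# " and parse the rest inline
  /^# / { print "<h1>" inline(substr($0, 3)) "</h1>"; next }
  # Anything else is treated as a paragraph in this sketch
  { print "<p>" inline($0) "</p>" }'
}

render_demo '# A *big* title'   # prints: <h1>A <em>big</em> title</h1>
```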
|
||||
|
||||
This, of course, is the first brick of a static site generator, and maybe the most complex one.
|
||||
We shall see next how to orchestrate this parser to make it an actual site generator.
|
||||
|
||||
|
@ -124,7 +124,7 @@ function last() {
|
||||
env = last()
|
||||
if (env == "none") {
|
||||
# If no block, print a paragraph
|
||||
print "<p>" replaceEmAndStrong($0) "</p>"
|
||||
print "<p>" $0 "</p>"
|
||||
} else if (env == "blockquote") {
|
||||
print $0
|
||||
}
|
||||
@ -151,6 +151,53 @@ function last() {
|
||||
<h2>Parsing inline features</h2>
|
||||
<p>So far we have seen a way to parse blocks, but Markdown also handles strong text, emphasis and links, and these inline elements can appear anywhere in a line.</p>
|
||||
<p>Hence we need to parse the content of a line independently of the block it belongs to: a header, for instance, can contain both strong text and a link.</p>
|
||||
<p>A very useful function in awk is `match`: it searches a string for a regular expression.</p>
|
||||
<p>Whenever the pattern is found, two global variables are set:</p>
|
||||
<ul>
|
||||
<li>RSTART: the index of the first character of the matched substring</li>
|
||||
<li>RLENGTH: the length of the matched substring</li>
|
||||
</ul>
|
||||
<p>In what follows, `line` is the line being processed, since the `while` loops below are all part of a single function.</p>
|
||||
<p>This way, `match(line, /\*([^*]+)\*/)` matches a string enclosed between two `*`, which is how Markdown marks emphasized text.</p>
|
||||
<p>The `*` are escaped because they are special characters, and the *group* is inside the parentheses.</p>
|
||||
<p>To match several instances of emphasized text within a line, a simple `while` loop will do the trick.</p>
|
||||
<p>We now only have to insert the HTML `&lt;em&gt;` tags at the right place around the matched text, and we are good to go.</p>
|
||||
<p>We can save the global variables `RSTART` and `RLENGTH` for further use, in case they were to be changed. Using them, we can also extract the </p>
|
||||
<p>matched substrings and reconstruct the actual HTML string:</p>
|
||||
<pre><code>while (match(line, /\*([^*]+)\*/)) {
|
||||
start = RSTART
|
||||
end = RSTART + RLENGTH - 1
|
||||
# Build the result: before match, &lt;em&gt;, content, &lt;/em&gt;, after match
|
||||
line = substr(line, 1, start-1) "&lt;em&gt;" substr(line, start+1, RLENGTH-2) "&lt;/em&gt;" substr(line, end+1)
|
||||
}
|
||||
</code>
|
||||
</pre>
|
||||
<p>The case of links is a bit more involved, as we need to match two groups: the link text and the URL itself.</p>
|
||||
<p>No real issue here: the naive way is to match the whole construct, then look for both the link text and the URL within that match.</p>
|
||||
<p>This way, `match(line, /\[([^\]]+)\]\([^\)]+\)/)` matches text between `[]` followed by text between `()`: the Markdown representation of a link.</p>
|
||||
<p>As above, we store `start` and `end`, and also the whole match:</p>
|
||||
|
||||
<pre><code>start = RSTART
|
||||
end = RSTART + RLENGTH - 1
|
||||
matched = substr(line, RSTART, RLENGTH)
|
||||
</code>
|
||||
</pre>
|
||||
<p>It is possible to apply the `match` function to this `matched` string and extract first the text in `[]`, then the text in `()`:</p>
|
||||
<pre><code>if (match(matched, /\[([^\]]+)\]/)) {
|
||||
matched_link = substr(matched, RSTART+1, RLENGTH-2)
|
||||
}
|
||||
if (match(matched, /\([^\)]+\)/)) {
|
||||
matched_url = substr(matched, RSTART+1, RLENGTH-2)
|
||||
}
|
||||
</code>
|
||||
</pre>
|
||||
<p>As the link text and the URL are stored, it is easy to reconstruct the HTML line using the variables `start` and `end`:</p>
|
||||
<pre><code>line = substr(line, 1, start-1) "&lt;a href=\"" matched_url "\"&gt;" matched_link "&lt;/a&gt;" substr(line, end+1)
|
||||
</code>
|
||||
</pre>
|
||||
<p>The inline parsing function is now complete. All we have to do is apply it systematically to the text within the HTML tags, and this finishes the Markdown parser.</p>
|
||||
<p>This, of course, is the first brick of a static site generator, and maybe the most complex one.</p>
|
||||
<p>We shall see next how to orchestrate this parser to make it an actual site generator.</p>
|
||||
</article>
|
||||
</body>
|
||||
|
||||
|