parent 9a62a00553
commit c8a549cc7c
0
drafts/markdown_testing_suite.md
Normal file
@@ -14,7 +14,7 @@ AWK, from the intials of its creator, is an old an powerful text file maniulatio
Its [wikipedia page](https://en.wikipedia.org/wiki/AWK) sums up nicely its story.
I thought it was clever to use is for a site generator, to parse markdown files and generate html ones.
However, according to this [listing](https://jamstack.org/generators/) of static site generator programs, another one has had the same idea.
Hence, the following, as well as my code is heavily inspired by [Zodiac](https://github.com/nuex/zodiac) (even though the repo has not been touched for 8years).
Hence, the following, as well as my code is heavily inspired by [Zodiac](https://github.com/nuex/zodiac) (even though the repo has not been touched for 8 years).

## Parsing markdown

@@ -41,7 +41,7 @@ In the example above, as per the [documentation](https://www.gnu.org/software/ga
it returns the subtring of `$0` starting at 3 (1 being `#` and 2 the whitespace following it) to the end of the line.

Now this is better, but we now are able to generalized it to all headers. Another function, `match` can return the number of char matched by a regex,
and allows the script to dynamically determine which depth of header it parses :
and allows the script to dynamically determine which depth of header it parses. This length is stored is the global variable `RLENGTH`:

/^#+ / {
match($0, /#+ /);
@@ -146,11 +146,8 @@ Of course I am aware that is lacks emphasis, strong and code within a line of te
However I did implement it, but maybe it will be explained in another edit of this post.
Nonetheless the code can still be consulted on [github](https://github.com/SiwonP/bob).

# A testing suite for markdown parser

Having a markdown parser is cool, having one well tested id better.
I embarked in writing a testing suite for markdown parsers. I wanted it to be generic, meaning you only had to provide a parsing program,
that takes markdown in the standard input, and returns html in the standard output.
All tests would be provided by the test suite.
## Parsing in-line fonctionnalities

For now we have seen a way to parse blocks, but markdown also handles strong, emphasis and links. However, these tags can appear anywhere in a line.
Hence we need to be able to parse these lines apart from the block itself : indeed a header can container a strong and a link.

@@ -21,12 +21,12 @@
<p>Anyway, writing this static site generator from scratch is also the perfect excuse to explore a not so widely know technology to manipulate text files. </p>
<h2>Introduction to AWK</h2>
<p>AWK, from the intials of its creator, is an old an powerful text file maniulation. Syntactically close to C, it is a scripting language to manipulation text entries.</p>
<p>Its [wikipedia page](https://en.wikipedia.org/wiki/AWK) sums up nicely its story.</p>
<p>Its <a href="https://en.wikipedia.org/wiki/AWK">wikipedia page</a> sums up nicely its story.</p>
<p>I thought it was clever to use is for a site generator, to parse markdown files and generate html ones.</p>
<p>However, according to this [listing](https://jamstack.org/generators/) of static site generator programs, another one has had the same idea.</p>
<p>Hence, the following, as well as my code is heavily inspired by [Zodiac](https://github.com/nuex/zodiac) (even though the repo has not been touched for 8years).</p>
<p>However, according to this <a href="https://jamstack.org/generators/">listing</a> of static site generator programs, another one has had the same idea.</p>
<p>Hence, the following, as well as my code is heavily inspired by <a href="https://github.com/nuex/zodiac">Zodiac</a> (even though the repo has not been touched for 8 years).</p>
<h2>Parsing markdown</h2>
<p>Following the official [syntax](https://daringfireball.net/projects/markdown/syntax), is a good start for a parser.</p>
<p>Following the official <a href="https://daringfireball.net/projects/markdown/syntax">syntax</a>, is a good start for a parser.</p>
<p>AWK works as follow : it takes an optional regex and execute some code between bracket, as a function, at each line of the text input.</p>
<p>For example :</p>
<pre><code>/^#/ {
@@ -44,10 +44,10 @@
}
</code>
</pre>
<p>In the example above, as per the [documentation](https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html#index-substr_0028_0029-function) </p>
<p>In the example above, as per the <a href="https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html#index-substr_0028_0029-function">documentation</a> </p>
<p>it returns the subtring of `$0` starting at 3 (1 being `#` and 2 the whitespace following it) to the end of the line.</p>
<p>Now this is better, but we now are able to generalized it to all headers. Another function, `match` can return the number of char matched by a regex,</p>
<p>and allows the script to dynamically determine which depth of header it parses :</p>
<p>and allows the script to dynamically determine which depth of header it parses. This length is stored is the global variable `RLENGTH`:</p>
<pre><code>/^#+ / {
match($0, /#+ /);
n = RLENGTH;
@@ -147,12 +147,10 @@ function last() {
<p>This way we are able to simply parse markdown and turn it into an HTML file.</p>
<p>Of course I am aware that is lacks emphasis, strong and code within a line of text. </p>
<p>However I did implement it, but maybe it will be explained in another edit of this post.</p>
<p>Nonetheless the code can still be consulted on [github](https://github.com/SiwonP/bob).</p>
<h1>A testing suite for markdown parser</h1>
<p>Having a markdown parser is cool, having one well tested id better.</p>
<p>I embarked in writing a testing suite for markdown parsers. I wanted it to be generic, meaning you only had to provide a parsing program,</p>
<p>that takes markdown in the standard input, and returns html in the standard output.</p>
<p>All tests would be provided by the test suite.</p>
<p>Nonetheless the code can still be consulted on <a href="https://github.com/SiwonP/bob">github</a>.</p>
<h2>Parsing in-line fonctionnalities</h2>
<p>For now we have seen a way to parse blocks, but markdown also handles strong, emphasis and links. However, these tags can appear anywhere in a line.</p>
<p>Hence we need to be able to parse these lines apart from the block itself : indeed a header can container a strong and a link.</p>
</article>
</body>
