Proposal
This is a proposal to solve this problem. It consists of the following 5 statements.
(But #5 is optional.)
1. We prohibit a paragraph that consists of only one code span whose
   - backtick string is more than 2 backticks long
   AND
   - end backtick string is on its own line, preceded by only 0~3 space(s) AND followed by any number of spaces.
2. A code span does not convert line endings to spaces. It just removes line endings.
3. The length of a code fence must be longer than any sequence of backticks the fenced code block contains.
4. The length of a backtick string must be longer than any sequence of backticks the code span contains.
5. A fenced code block requires a blank line before & after it.
Below, I will explain these. Sorry for the long post.
About #1 & #2
If the backtick string consists of fewer than 3 backticks
This is a code span. It does not matter whether the backtick string is on its own line or not.
A code fence must be at least 3 backticks long, so shorter backtick strings can only be recognized as a code span.
If the text preceding the end backtick string contains a non-space character
This is a code span. It does not matter whether the start backtick string is on its own line or not, or how long the backtick string is, because a closing code fence must be on its own line.
If the text following the end backtick string contains a non-space character
This is a code span. It does not matter whether the start backtick string is on its own line or not, or how long the backtick string is, because a closing code fence must be on its own line.
If the end backtick string is on its own line, preceded by only 0~3 space(s) AND followed by any number of spaces, and the backtick strings are longer than 2 backticks
This is where the ambiguity arises.
``` nice
days
```
Is this meant to be
this one
<p><code> nice days</code></p>
or this one?
<pre><code class="language-nice">days </code></pre>
To determine whether this is a fenced code block or a code span, I introduce one restriction on paragraphs (#1).
(Why on paragraphs? Because I think this fenced code block vs. code span ambiguity only happens in a paragraph.)
We prohibit a paragraph that consists of only one code span whose
- backtick string is more than 2 backticks long
AND
- end backtick string is on its own line, preceded by only 0~3 space(s) AND followed by any number of spaces.
Then, if we want a paragraph that consists of only one code span whose backtick string is more than 2 backticks long, we split the content with a line ending so that part of the content precedes the end backtick string on the last line.
```content
of code spa
n```
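The closing-line condition in #1 can be checked mechanically. Below is a minimal Python sketch, assuming the check is applied to the last line of the candidate paragraph. The name `closes_like_fence` is hypothetical and not taken from any CommonMark implementation.

```python
import re

# 0 to 3 spaces, then 3 or more backticks, then any number of spaces.
CLOSING_LINE = re.compile(r" {0,3}`{3,} *")


def closes_like_fence(last_line: str) -> bool:
    """Return True if the line holds only an end backtick string as described in #1."""
    return CLOSING_LINE.fullmatch(last_line) is not None


ticks = "`" * 3
print(closes_like_fence(ticks))        # end backtick string alone on its line
print(closes_like_fence("n" + ticks))  # content precedes the end backtick string
```

Under #1, a paragraph whose last line passes this check would not be allowed to be a single code span, while the hard-wrapped example above (last line `n` followed by the backticks) would.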
But as of CommonMark 0.29, this doesn’t work well if the content of the code span is a single long string,
because the content of a code span is normalized in the following way:
- First, line endings are converted to spaces.
- If the resulting string both begins and ends with a space
character, but does not consist entirely of space
characters, a single space character is removed from the
front and back. This allows you to include code that begins
or ends with backtick characters, which must be separated by
whitespace from the opening or closing backtick strings.
So, if a code span contains only one long string (e.g. a sha256 hash) and we want to hard-wrap it, this normalization introduces a problem:
This code span
`sha256:e3b0c44298fc1c149afbf4c8996fb
92427ae41e4649b934ca495991b7852b855`
will result in
sha256:e3b0c44298fc1c149afbf4c8996fb 92427ae41e4649b934ca495991b7852b855
But if you copy & paste this result, you get a space between 8996fb and 92427a.
This is not what I expect.
So, #2 comes in:
- A code span does not convert line endings to spaces. It just removes line endings.
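To make the difference concrete, here is a small Python sketch comparing the normalization quoted above with the one proposed in #2. It assumes that #2 changes only the line-ending step and leaves the space-stripping step as it is. The function names are hypothetical.

```python
def _strip_one_space(text: str) -> str:
    # Remove one space from each end, but only if both ends have one
    # and the text does not consist entirely of spaces.
    if text.startswith(" ") and text.endswith(" ") and text.strip(" "):
        return text[1:-1]
    return text


def _unify_line_endings(content: str) -> str:
    return content.replace("\r\n", "\n").replace("\r", "\n")


def normalize_0_29(content: str) -> str:
    """Normalization as quoted from CommonMark 0.29: line endings become spaces."""
    return _strip_one_space(_unify_line_endings(content).replace("\n", " "))


def normalize_proposed(content: str) -> str:
    """Statement #2: line endings are removed instead of converted."""
    return _strip_one_space(_unify_line_endings(content).replace("\n", ""))


wrapped = "sha256:e3b0c44298fc1c149afbf4c8996fb\n92427ae41e4649b934ca495991b7852b855"
print(normalize_0_29(wrapped))      # hash with a space at the wrap point
print(normalize_proposed(wrapped))  # hash joined back into one string
```

One consequence worth noting: under #2, wrapped prose inside a code span would also be joined without a space, so the author of the span would need to keep a trailing space before the line ending where a word break is wanted.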
About #3 & #4
#1 & #2 are not sufficient.
The following examples are still ambiguous.
#one fenced code block OR one code span followed by ```?
```cannot determine whether fenced code block containing ``` ```
#one code span OR two code spans?
``` two``` ```code spans?```
With #3 & #4, we resolve the ambiguity in both examples.
For the first example,
if you intend this to be one fenced code block, then following #3, the code fence must be longer than 3 backticks.
`````cannot determine whether fenced code block containing ``` `````
if you intend this to be one code span followed by ```, no change is needed.
(If the parser obeys #3, it concludes that this is not a fenced code block.)
```cannot determine whether fenced code block containing ``` ```
For the second example,
if you intend this to be one code span, then following #4, the backtick strings must be longer than 3 backticks.
````` two``` ```code spans?`````
if you intend this to be two code spans, no change is needed.
(If the parser obeys #4, it concludes that this is not one code span.)
``` two``` ```code spans?```
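The length rule shared by #3 and #4 can be sketched in Python as follows. This is only an illustration under the assumption that the candidate content has already been isolated; `longest_backtick_run` and `delimiter_long_enough` are hypothetical names.

```python
import re


def longest_backtick_run(text: str) -> int:
    """Length of the longest consecutive run of backticks in text (0 if none)."""
    return max((len(run) for run in re.findall(r"`+", text)), default=0)


def delimiter_long_enough(delimiter_length: int, content: str) -> bool:
    """#3 / #4: the fence or backtick string must be longer than any run inside."""
    return delimiter_length > longest_backtick_run(content)


ticks = "`" * 3
content = " two" + ticks + " " + ticks + "code spans?"
print(delimiter_long_enough(3, content))  # three backticks around this content
print(delimiter_long_enough(5, content))  # five backticks, as in the example above
```

With this rule, a parser that sees three backticks around content that itself contains three backticks rejects the "one code span" reading, which is what leaves only the "two code spans" reading for the second example.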
About #5
I think one problem remains if I follow only #1, #2, #3 and #4 (but not #5).
If I want one code span to be embedded in a paragraph like below,
abc def
```ghi
jkl mno
```
pqr stu
this will be understood by the parser as ‘one paragraph’ + ‘one fenced code block’ + ‘one paragraph’,
because, as of CommonMark 0.29,
A fenced code block may interrupt a paragraph, and does not require a blank line either before or after.
To deal with this situation, we need #5:
- A fenced code block requires a blank line before & after it.
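As a rough illustration of #5, the Python sketch below checks whether a candidate fence is surrounded by blank lines. It assumes the caller has already located the opening and closing fence lines; `fence_is_allowed` and its parameters are hypothetical names, not part of any existing parser.

```python
from typing import List


def is_blank(line: str) -> bool:
    return line.strip() == ""


def fence_is_allowed(lines: List[str], open_index: int, close_index: int) -> bool:
    """#5: a fenced code block needs a blank line (or document edge) on both sides."""
    before_ok = open_index == 0 or is_blank(lines[open_index - 1])
    after_ok = close_index == len(lines) - 1 or is_blank(lines[close_index + 1])
    return before_ok and after_ok


ticks = "`" * 3
embedded = ["abc def", ticks + "ghi", "jkl mno", ticks, "pqr stu"]
separated = ["abc def", "", ticks + "ghi", "jkl mno", ticks, "", "pqr stu"]
print(fence_is_allowed(embedded, 1, 3))   # no blank lines around the candidate fence
print(fence_is_allowed(separated, 2, 4))  # blank line before and after
```

When the check fails, the lines stay part of the surrounding paragraph, so the example above would be read as one paragraph containing a code span.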
Thank you for reading.