A problem with backtick code fences

Proposal

This proposal aims to solve this problem and consists of the following 5 statements.
(#5 is optional.)

  1. We prohibit a paragraph that consists of only one code span whose

    • backtick string is longer than 2 backticks

    AND

    • closing backtick string is on its own line, preceded by only 0 to 3 spaces and followed by any number of spaces

  2. A code span does not convert line endings to spaces; it simply removes them.

  3. The code fence must be longer than any sequence of backticks the fenced code block contains.

  4. The backtick string must be longer than any sequence of backticks the code span contains.

  5. A fenced code block requires a blank line before and after it.

Below I explain each of these. Sorry for the long post.


About #1 & #2

If the backtick string consists of fewer than 3 backticks

This is a code span, regardless of whether the backtick string is on its own line.
A code fence must be at least 3 backticks long, so shorter backtick strings are always recognized as code spans.
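A minimal Python sketch of this case (the helper name `is_always_code_span` is hypothetical, not taken from any real parser): a run of 1 or 2 backticks can never open a fence, so it can only delimit a code span.

```python
def is_always_code_span(backtick_run: str) -> bool:
    """Return True if a backtick run is too short to open a code fence.

    A code fence needs at least 3 backticks, so a run of 1 or 2 backticks
    can only delimit a code span, whether or not it is on its own line.
    """
    if not backtick_run or set(backtick_run) != {"`"}:
        raise ValueError("expected a non-empty run of backticks")
    return len(backtick_run) < 3


print(is_always_code_span("``"))   # True
print(is_always_code_span("```"))  # False
```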

If the text preceding the closing backtick string contains a non-space character

This is a code span, regardless of whether the opening backtick string is on its own line and of how long the backtick string is, because a closing code fence must be on its own line.

If the text following the closing backtick string contains a non-space character

This is a code span, regardless of whether the opening backtick string is on its own line and of how long the backtick string is, because a closing code fence must be on its own line.
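Both cases depend only on the line holding the closing backtick string. A rough sketch, assuming a hypothetical helper `closing_line_is_bare` that tests whether a line has the shape of a closing fence; when it returns False, the backticks on that line can only close a code span.

```python
def closing_line_is_bare(line: str) -> bool:
    """Return True if the line has the shape of a closing code fence.

    That shape is: 0-3 spaces of indentation, a run of 3 or more backticks,
    then only spaces. If the line does not have this shape, the backticks
    on it can only close a code span.
    """
    body = line.lstrip(" ")
    indent = len(line) - len(body)
    body = body.rstrip(" ")
    return indent <= 3 and len(body) >= 3 and set(body) == {"`"}


print(closing_line_is_bare("   ```  "))  # True: could close a fence
print(closing_line_is_bare("n```"))      # False: text before the run
print(closing_line_is_bare("``` tail"))  # False: text after the run
```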

If the closing backtick string is on its own line, preceded by only 0 to 3 spaces and followed by any number of spaces, and the backtick strings are longer than 2 backticks

This is where the ambiguity arises.

``` nice
days
```

Is this meant to be this one

<p><code>nice days</code></p>

or this one?

<pre><code class="language-nice">days
</code></pre>

To decide whether this is a fenced code block or a code span, I introduce one restriction on paragraphs (#1).
(Why paragraphs? Because, I think, this fenced-code-block versus code-span ambiguity only arises inside paragraphs.)

  1. We prohibit a paragraph that consists of only one code span whose

    • backtick string is longer than 2 backticks

    AND

    • closing backtick string is on its own line, preceded by only 0 to 3 spaces and followed by any number of spaces
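Restriction #1 can be sketched as a check on the raw text of a paragraph. This is a hypothetical helper, not part of any real parser, and it assumes the paragraph is passed in with `\n` line endings: if it returns True, #1 forbids reading the text as a code span, so it must be a fenced code block.

```python
import re

# Opening line: 0-3 spaces, a run of 3+ backticks, then the rest of the line.
OPENING = re.compile(r"^ {0,3}(`{3,})(.*)$")
# Closing line: 0-3 spaces, a run of 3+ backticks, then only spaces.
CLOSING = re.compile(r"^ {0,3}(`{3,}) *$")


def violates_rule_1(paragraph: str) -> bool:
    """Return True if the paragraph is exactly one code span of 3+ backticks
    whose closing backtick string sits alone on the last line."""
    lines = paragraph.split("\n")
    if len(lines) < 2:
        return False
    first = OPENING.match(lines[0])
    last = CLOSING.match(lines[-1])
    if not first or not last:
        return False
    run = first.group(1)
    # A code span closes at the first run of exactly the same length.
    if last.group(1) != run:
        return False
    # The span covers the whole paragraph only if no equal-length run
    # appears before the last line.
    middle = "\n".join([first.group(2)] + lines[1:-1])
    return run not in re.findall(r"`+", middle)


print(violates_rule_1("``` nice\ndays\n```"))            # True: read as a fence
print(violates_rule_1("```content\nof code spa\nn```"))  # False: a code span
```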

Then, if we want a paragraph consisting of only one code span whose backtick string is longer than 2 backticks, we place the closing backtick string directly after the content, by splitting the content with a line ending:

```content
of code spa
n```

But as of CommonMark 0.29, this does not work well if the content of the code span is one long string with no spaces.

This is because the content of a code span is normalized as follows:

  • First, line endings are converted to spaces.
  • If the resulting string both begins and ends with a space
    character, but does not consist entirely of space
    characters, a single space character is removed from the
    front and back. This allows you to include code that begins
    or ends with backtick characters, which must be separated by
    whitespace from the opening or closing backtick strings.

So if a code span contains only one long string (e.g. a sha256 hash) and we want to hard-wrap it, this normalization introduces a problem:

This one code span

`sha256:e3b0c44298fc1c149afbf4c8996fb
92427ae41e4649b934ca495991b7852b855`

will result in

sha256:e3b0c44298fc1c149afbf4c8996fb 92427ae41e4649b934ca495991b7852b855

But if you copy and paste this result, you get a space between 8996fb and 92427a.
This is not what I expect.
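The 0.29 normalization quoted above can be sketched in Python to reproduce the stray space (the function name is hypothetical; this follows the two quoted steps, not any particular implementation):

```python
def normalize_code_span_0_29(content: str) -> str:
    """Normalize code span content following the CommonMark 0.29 rules.

    1. Line endings are converted to spaces.
    2. If the result begins and ends with a space but is not all spaces,
       one space is removed from the front and one from the back.
    """
    text = content.replace("\r\n", "\n").replace("\r", "\n").replace("\n", " ")
    if text.startswith(" ") and text.endswith(" ") and text.strip(" "):
        text = text[1:-1]
    return text


wrapped = ("sha256:e3b0c44298fc1c149afbf4c8996fb\n"
           "92427ae41e4649b934ca495991b7852b855")
print(normalize_code_span_0_29(wrapped))
# sha256:e3b0c44298fc1c149afbf4c8996fb 92427ae41e4649b934ca495991b7852b855
```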

Hence #2:

  2. A code span does not convert line endings to spaces; it simply removes them.
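A sketch of #2 in Python. The function name is hypothetical, and keeping the single-space stripping step from 0.29 is my assumption, since #2 only talks about line endings. With line endings removed, the hard-wrapped hash comes back intact.

```python
import hashlib


def normalize_code_span_proposed(content: str) -> str:
    """Normalize code span content under proposal #2 (a sketch).

    Line endings are removed instead of being turned into spaces.
    Assumption: the 0.29 single-space stripping step is kept as-is.
    """
    text = content.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "")
    if text.startswith(" ") and text.endswith(" ") and text.strip(" "):
        text = text[1:-1]
    return text


wrapped = ("sha256:e3b0c44298fc1c149afbf4c8996fb\n"
           "92427ae41e4649b934ca495991b7852b855")
result = normalize_code_span_proposed(wrapped)
# This particular value is the SHA-256 digest of empty input,
# so the joined result can be checked directly.
print(result == "sha256:" + hashlib.sha256(b"").hexdigest())  # True
```

Note that under this rule a line break between two words also disappears, so `foo` followed by a line ending and `bar` becomes `foobar`.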

About #3 & #4

#1 and #2 are not sufficient.

The following examples are still ambiguous.

First example: one fenced code block, OR one code span followed by ```?

```cannot determine
whether fenced code block containing ```
```

Second example: one code span, OR two code spans?

```
two``` ```code spans?```

With #3 and #4, we resolve the ambiguity in both examples.

For the first example:

If you intend this to be one fenced code block, then by #3 the code fence must be longer than 3 backticks:

`````cannot determine
whether fenced code block containing ```
`````

If you intend this to be one code span followed by ```, no change is needed.
(A parser that obeys #3 will not treat this as a fenced code block.)

```cannot determine
whether fenced code block containing ```
```
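Rule #3 can be sketched in Python as a length comparison (all names here are hypothetical). It also suggests how a document generator could pick a fence automatically: 4 backticks are the minimum for this example, and the 5 used above work as well.

```python
import re


def longest_backtick_run(text: str) -> int:
    """Length of the longest run of consecutive backticks (0 if none)."""
    return max((len(run) for run in re.findall(r"`+", text)), default=0)


def fence_is_valid_rule_3(fence: str, content: str) -> bool:
    """Proposal #3: the fence must be at least 3 backticks long and
    longer than every backtick run inside the block."""
    return len(fence) >= 3 and len(fence) > longest_backtick_run(content)


def choose_fence(content: str) -> str:
    """Pick the shortest fence that satisfies rule #3 for this content."""
    return "`" * max(3, longest_backtick_run(content) + 1)


print(choose_fence("whether fenced code block containing ```"))  # ````
```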

For the second example:

If you intend this to be one code span, then by #4 the backtick strings must be longer than 3 backticks:

`````
two``` ```code spans?`````

If you intend this to be two code spans, no change is needed.
(A parser that obeys #4 will not treat this as one code span.)

```
two``` ```code spans?```
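The same idea applied to code spans (#4), as a hypothetical Python sketch. `wrap_code_span` shows how a writer could comply: it uses the shortest allowed backtick string and pads with spaces when the content begins or ends with a backtick, relying on CommonMark's single-space stripping.

```python
import re


def longest_backtick_run(text: str) -> int:
    """Length of the longest run of consecutive backticks (0 if none)."""
    return max((len(run) for run in re.findall(r"`+", text)), default=0)


def code_span_is_valid_rule_4(delimiter: str, content: str) -> bool:
    """Proposal #4: the backtick string must be longer than any
    backtick run in the content."""
    return len(delimiter) > longest_backtick_run(content)


def wrap_code_span(content: str) -> str:
    """Wrap content in the shortest backtick string allowed by rule #4.

    A space is added on both sides when the content starts or ends
    with a backtick.
    """
    delimiter = "`" * (longest_backtick_run(content) + 1)
    if content.startswith("`") or content.endswith("`"):
        content = f" {content} "
    return f"{delimiter}{content}{delimiter}"


print(wrap_code_span("a`b"))  # ``a`b``
```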

About #5

I think one problem remains if we follow only #1 through #4 (but not #5).

If I want a code span to be embedded in a paragraph, as below,

abc def
```ghi
jkl mno
```
pqr stu

this will be understood by the parser as ‘one paragraph’ + ‘one fenced code block’ + ‘one paragraph’.

This is because, as of CommonMark 0.29,

A fenced code block may interrupt a paragraph, and does not require a blank line either before or after.

To deal with this situation, we need #5:

  5. A fenced code block requires a blank line before and after it.
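Rule #5 as a hypothetical Python check over a document's lines, treating the start and end of the document as blank. Applied to the example above, the fence directly between two paragraph lines is rejected, so those lines would stay inside the paragraph.

```python
def fence_allowed_rule_5(lines: list[str], open_idx: int, close_idx: int) -> bool:
    """Proposal #5: a fenced code block spanning lines[open_idx] through
    lines[close_idx] needs a blank (whitespace-only) line, or the start or
    end of the document, directly before and after it."""
    def is_blank(i: int) -> bool:
        return i < 0 or i >= len(lines) or lines[i].strip() == ""

    return is_blank(open_idx - 1) and is_blank(close_idx + 1)


doc = ["abc def", "```ghi", "jkl mno", "```", "pqr stu"]
print(fence_allowed_rule_5(doc, 1, 3))  # False
```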

Thank you for reading.

@vanou I think this is too complex; in addition, requiring a blank line before and after a fenced code block would break too many existing documents.

@codinghorror - I think “do nothing” is probably okay for now. A better solution, I think, is the pandoc one: constrain the info string so that it is either (a) a single word or (b) a pandoc-style attribute block in curly braces. This would require agreement on the attribute syntax and would rule out free-form info strings.
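A rough Python sketch of this constraint. The exact definitions of a “word” and an attribute block below are my own assumptions for illustration, not pandoc’s actual grammar. Under such a rule, ``` nice still opens a fence, while ```cannot determine could not, so it would have to be a code span.

```python
import re

# Assumption: a "word" is a run of characters other than whitespace,
# braces and backticks.
SINGLE_WORD = re.compile(r"[^\s{}`]+")
# Assumption: an attribute block is anything in one pair of curly braces.
ATTRIBUTE_BLOCK = re.compile(r"\{[^{}`]*\}")


def info_string_allowed(info: str) -> bool:
    """Return True if the info string is empty, (a) a single word,
    or (b) an attribute block in curly braces."""
    info = info.strip()
    return info == "" or bool(
        SINGLE_WORD.fullmatch(info) or ATTRIBUTE_BLOCK.fullmatch(info)
    )


print(info_string_allowed("nice"))              # True
print(info_string_allowed("cannot determine"))  # False
```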


@jgm

(In response to your feedback, I’m relaxing the proposal.)
Following only #1 is enough; #2, #3, and #4 can be set aside.

As I said, #5 is optional. It’s fine to reject #5 because it breaks backward compatibility, and backward compatibility is important.


About pandoc style

Constraining the style of the info string also means constraining the style of code spans.

Under your idea, if a code span’s opening backtick string starts a line, does that mean (a) a single word or (b) a pandoc-style attribute block in curly braces cannot follow the opening backticks on the same line?

Does this constraint break backward compatibility?

Neither (a) nor (b) seems backwards compatible. In addition to an optional single “word” at the start, CM would need to allow any number of “words” preceded by any of #, ., @, ?, !, _, -, +, words on both sides of = and :, as well as quoted (', ") and parenthetical ((), [], <>, {}) strings that may include spaces. Other ASCII punctuation may currently be in use as a prefix as well.

Sorry, this is the same as @cben’s idea.

I agree with @cben’s idea. It’s simple and easy to understand.

But of course there are cases where you need that
space: when the content being quoted ends with a
backtick.

Admittedly, though, this is a rare kind of case.
Which is why, in practice, this issue can probably
be ignored for now (option 0).

@jgm

Ah. Yes, it could happen. I agree.
I agree that @cben’s idea is not enough on its own.

How about this approach?

The line containing the closing backtick string must have at least one non-space character, other than the closing backticks, either before OR after the closing backtick string.
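A minimal Python sketch of this approach (the helper name is hypothetical): a line that holds nothing but a backtick run and spaces cannot close a code span, so the ambiguous ``` nice example above would be read as a fenced code block.

```python
import re

# A line holding nothing but a backtick run, with optional spaces around it.
BARE_BACKTICKS = re.compile(r" *`+ *")


def closing_line_ok(line: str) -> bool:
    """Return True if the line containing the closing backtick string also
    contains at least one non-space character other than that string."""
    return BARE_BACKTICKS.fullmatch(line) is None


print(closing_line_ok("```"))   # False: cannot close a code span
print(closing_line_ok("n```"))  # True
```

Content that ends with a backtick can still be written on one line, for example `` foo` ``, because the closing line then contains other text.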