A problem with backtick code fences

If you look at the GitHub code fences, they specify language without a space:

```ruby
require 'redcarpet'
markdown = Redcarpet.new("Hello World!")
puts markdown.to_html
```

I think a multi-word restriction here, or even a space restriction is probably fine.

Option #1, removing backtick code fences, is just not an option.

I agree that requiring the info string to follow immediately, without intervening space, is probably the best solution: if a code span is indeed intended, one can always insert a SPACE after the backtick string, and this SPACE will then be trimmed off the code span’s content anyway according to the current syntax rules.

And even if the backtick string of such a code span happens to end up at a begin-of-line, say through re-flowing lines, any text formatter would preserve or restore the following SPACE, so even in this case all is well.

And an opening code fence without an info string can’t occur “by accident” in a paragraph either, I’d say.

So apart from resolving an ambiguity, this rule would also make code spans “robust” in the face of re-flowing the lines of a paragraph, as far as I can tell. Which is nice!

I’ve always used a space after the backticks, and every Markdown converter I’ve ever interacted with has accepted it. So, if we didn’t allow a space before the info string, we’d risk breaking a lot of existing content. That’s a big strike against your proposal. And I don’t see how it really addresses the underlying problem. After all, a space isn’t required in inline code, so the ambiguity still needs to be addressed.


I’ve always used a space after the backticks, and every Markdown converter I’ve ever interacted with has accepted it.

I have to admit that this went through my mind, but then I conveniently kept silent about the issue, which is there all the same; you’re right.

And I don’t see how it really addresses the underlying problem.

I may be wrong, but assume the info string (if any) in an opening code fence were required to follow immediately after the backtick string. Now consider a case where you write a code span, but it then happens to look like a code fence, say:

Here is a *code span*, which starts with a *backtick string* and crosses lines:
```sample 1 2 3 4 test
5 2 7 8
```
and here we're back in the paragraph.

Because here “sample” is meant to be part of the code span, and is not meant to be an info string (by assumption!), simply inserting a SPACE would disambiguate (what a word!) the situation:

Here is a *code span*, which starts with a *backtick string* and crosses lines:
```␣sample 1 2 3 4 test
5 2 7 8
```
and here we're back in the paragraph.

Because of the SPACE (and the assumed syntax rule change), it can’t mean a code block any more, and because of the “trim one SPACE”-rule for code spans, the “meaning” or content of the—now unambiguous—code span hasn’t changed.

Does that make sense now?

If there are multiple words after the opening code fence – with or without a space – then I would argue the author probably intended those words to be the first line of code, and not an info string at all.

```multiple words here
more content here
```

```␣multiple words here
more content here
```

A few options in that case:

  1. Be flexible and treat those multiple words as the first line of a code block.
  2. Be rigid and break the code block, turning the entire first line into literal text with three visible backticks. (GitHub does this.)
  3. Be selective and use only the first word on that line as the info string, discarding the rest of the line.
  4. Be graceful and preserve all the text between the two code fences, but have it degrade to inline code.

I would choose option 4, because of the principle of least surprise. (“Why did the entire first line of my code block disappear?” vs. “Looks like my code is displaying inline. I’d better double-check my formatting.”)

It also solves the problem that began this thread:

…then you can put that code between triple-backticks, with the first 2+ words of code on the same line as the opening triple backticks.


TL;DR If the opening three-backtick code fence is followed by

  • Zero words, it starts a code block.
  • One word, it starts a code block and includes a one-word info string.
  • Two or more words, it starts an inline code span.

Then, since the info string is limited to one word, I think there should be no problem allowing a space before it.
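For what it’s worth, the TL;DR rule above could be sketched roughly like this in Python. This is only an illustration under stated assumptions: the name `classify_opening_line` is made up for this post, it only looks at backtick fences at the very start of a line, and it ignores indentation and tilde fences.

```python
import re

# Three or more backticks at the start of the line, then the rest of the line.
FENCE_RE = re.compile(r"^(`{3,})(.*)$")


def classify_opening_line(line: str) -> str:
    """Classify a line by how many words follow its opening backtick fence.

    Hypothetical helper: zero words -> code block, one word -> code block
    with an info string, two or more words -> inline code span.
    """
    match = FENCE_RE.match(line)
    if match is None:
        return "not a fence"
    # split() discards leading/trailing whitespace, so a space before the
    # single word makes no difference under this rule.
    words = match.group(2).split()
    if len(words) == 0:
        return "code block"
    if len(words) == 1:
        return "code block with info string"
    return "inline code span"


if __name__ == "__main__":
    ticks = "`" * 3
    for rest in ("", "ruby", " ruby", "multiple words here"):
        print(repr(ticks + rest), "->", classify_opening_line(ticks + rest))
```

Note that with this rule the decision is made from the first line alone, so a parser would not need to look ahead for the closing fence.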

The info string isn’t currently limited to one word. This was the subject of some debate when we first started talking about this, so let me recapitulate it.

Pandoc’s fenced code blocks have always allowed specification of quite a bit of structured information:

``` {.class .class #id key=value key="value"}
code here
```

This is especially useful when the code blocks are postprocessed (e.g. by a pandoc “filter”). You might, for example, have a filter that takes specially marked code blocks and converts them to charts. And in that case you might want to have attributes like width, height, background-color…

Even for source code, you might want to specify whether to number lines, what highlighting style to use, and so on.

So limiting the info to a single word would really be too limiting.
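To make the kind of structured info string shown above a bit more concrete, here is a rough Python sketch of pulling an id, classes, and key/value pairs out of a brace-delimited attribute list. It is not pandoc’s actual parser: `parse_attributes` and the regex are assumptions made for illustration, and the sketch silently skips anything it doesn’t recognize (including bare boolean keys).

```python
import re

# One token per alternative: #id, .class, key=value or key="quoted value".
TOKEN_RE = re.compile(r'''
    \#(?P<id>[^\s}]+)
  | \.(?P<cls>[^\s}]+)
  | (?P<key>[^\s=}]+)=(?:"(?P<qval>[^"]*)"|(?P<val>[^\s}]+))
''', re.VERBOSE)


def parse_attributes(info: str) -> dict:
    """Parse a '{#id .class key=value}' style info string (simplified)."""
    info = info.strip()
    if not (info.startswith("{") and info.endswith("}")):
        raise ValueError("expected a brace-delimited attribute list")
    result = {"id": None, "classes": [], "keyvals": {}}
    for m in TOKEN_RE.finditer(info[1:-1]):
        if m.group("id") is not None:
            result["id"] = m.group("id")
        elif m.group("cls") is not None:
            result["classes"].append(m.group("cls"))
        else:
            quoted = m.group("qval")
            value = quoted if quoted is not None else m.group("val")
            result["keyvals"][m.group("key")] = value
    return result


if __name__ == "__main__":
    print(parse_attributes('{.class .class #id key=value key="value"}'))
```

A filter that turns marked code blocks into charts could then read something like a width or height out of `keyvals`, which is the use case a one-word info string can’t cover.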

Originally I proposed something like the pandoc format, but others didn’t want to get behind that, so we came up with a compromise that the spec wouldn’t specify the format of the info string or what was to be done with it. Perhaps this should be revisited.

Revisiting this sentence. I agree “no way to write this” is a showstopper. So perhaps allow people to also use tildes in this particular scenario? In other words support both tilde and backtick code fences.

Does that solve it to your satisfaction?

@codinghorror - You can already have tilde code fences for code blocks. This doesn’t help with the problem, which is how to express a certain kind of code span (not a code block). And even if we allowed tildes to be used for code spans (which would conflict with common extensions that use them for strikeout), we’d still have the problem.

I see, I think I misunderstood the example:

``` hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
```

~~~ hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
~~~

Forbid the space before the info string if it contains more words/spaces:

```hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

```.hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

```␣hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because it only contains a single word after the backticks!
```

```␣hello␣
this IS inline code with one backtick ` and two backticks ` (?)
```

```␣hello␣world
this IS inline code with one backtick ` and two backticks `
```

```hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

```.hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

```␣.hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no word but punctuation after the space!
```

Maybe we’re overthinking this. Simpler rules are better, right?

How about:

  1. Triple backticks or triple tildes: Code fence.

  2. Opening fence begins a line, closing fence begins a new line: Code block.

  3. Opening fence begins a line, but closing fence does not begin a new line: Inline code.

  4. Text on the same line as the opening fence of a code block: Info string.

Any info string is thus allowed, even multiple words, even with a preceding space.
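Here is a rough Python sketch of how those four rules might be applied to a run of lines. The function name `classify_region` is made up, only exactly three backticks or tildes at the start of the first line are considered, and the sketch assumes that the first fence found after the opening one is the closing fence, which the rules themselves do not spell out.

```python
from typing import List, Optional, Tuple

FENCES = ("`" * 3, "~" * 3)


def classify_region(lines: List[str]) -> Tuple[str, Optional[str]]:
    """Return (kind, info_string) for lines that start with an opening fence.

    Assumption of this sketch: the first later occurrence of the fence
    closes the region, wherever it appears on its line.
    """
    if not lines or lines[0][:3] not in FENCES:
        return ("not fenced", None)
    fence = lines[0][:3]
    first_rest = lines[0][3:]
    # Closing fence on the same line as the opening fence: inline code.
    if fence in first_rest:
        return ("inline code", None)
    for line in lines[1:]:
        if line.startswith(fence):
            # Closing fence begins a new line: code block, and the text
            # after the opening fence is the info string.
            return ("code block", first_rest.strip() or None)
        if fence in line:
            # Closing fence appears mid-line: inline code.
            return ("inline code", None)
    return ("unclosed", None)


if __name__ == "__main__":
    ticks = "`" * 3
    print(classify_region([ticks + " ruby", "puts 1", ticks]))
    print(classify_region([ticks + " some code", "more code " + ticks + " and text"]))
```

The catch is visible in the loop: the kind of region is only known once the closing fence has been found, so the first line alone no longer tells a parser what it is looking at.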

As for the issue that began this thread…

``` To include ` and `` backticks in inline code,
the closing fence should not be at the start of a new line,
but rather after code, like this. ``` And here's some non-code inline text.

This starts a paragraph with inline code, including single and double backticks.

It works in the current commonmark, markdown.pl, and most other flavors. (Babelmark 2 test, Babelmark 3 test.)

And, as pointed out by cben and Ajedi32, it follows the convention of *other* types of **delimiters** being written _inline_.

@jkdev - your proposal does nothing to remove the ambiguities.

1 ``` code
2 foo
3 bar ```
4 ```

Is this a code block that ends on line 4? Or a code span that ends on line 3? You could resolve this in favor of the latter by saying that we close as soon as we can, but this breaks backwards compatibility for code blocks containing strings of backticks, and creates difficulties expressing code blocks containing backticks.

Not being able to identify a code block by the first line would also break a lot of very nice properties of our present parsers, which identify block structure first, inline structure later.

It may be that this issue is enough of a corner case that we shouldn’t obsess about it. The only real “blind spot” is inline code that contains strings of two backticks and occurs at the beginning of a paragraph (otherwise you can reorganize it so it doesn’t start at the beginning of the line).

I suppose another solution would be to allow only one-word info strings with backtick code blocks, while allowing free-form info strings with tilde code blocks. I hesitate to do that, though, as it complicates the mental model. (Why can I do this with tildes but not backticks?)


While it wouldn’t resolve the original problem, I’m wondering if changing code block rules to close on triple (or however many) backticks anywhere in the line is feasible. In your example, it’d be a code block that ends at line 3.

Motivation for closing early:

  1. Current behavior is not essential: I never realized that I can safely include ``` in code blocks as long as they don’t start the line. But I can always use more backticks (````) to start/end the code block, which is a simple rule covering all cases.

  2. “Compatibility” with original markdown: backtick-fenced code block syntax degrades gracefully to inline code in tools that don’t understand fenced blocks [^1]. That’s a good property and IMO should be maximized. However, tools that think ``` starts inline code will stop on the first ``` anywhere.

Babelmark confirms that about half of the implementations support fenced blocks and only stop on the final start-of-line backticks, while the other half don’t understand fenced blocks and stop the inline code on line 3.

  • marked (0.2.6) is the only one that supports fenced blocks AND stops early on line 3. It only does that for ``` at the end of a line; if text follows, it doesn’t close the block there.

  • AFAICT no tool follows @jkdev’s proposal of switching from block to inline when closing fence is mid-line.

[^1] I lied: it only degrades gracefully without empty lines — empty lines abort inline code but not fenced blocks.
That’s why deciding block structure first is important. Consider this paradox:

```
Is this inline code or code block?

Closing fence is not at start of line: ``` And here's some non-code inline text.

If the block/inline decision depends on lookahead to where closing backticks are, it can’t be either — inline shouldn’t cross the empty line and block shouldn’t have mid-line termination. I.e. you don’t know how far to look for termination before you found the termination…

Interoperability is especially critical for agreeing where code starts and ends

Code is fundamental like escaping — it suppresses markdown-significant constructs, so if you don’t agree about whether it’s code, you get cascading confusion…
Fenced blocks make it worse — disagreeing about just one top-level fence can catastrophically flip the meanings of everything till the end of the document!

  • That’s why any limits on info strings worry me. The simplest rule “3+ backticks/tildes followed by anything [without backticks] starts a block” is probably our best chance for agreement. (Ignoring, or treating as code, info strings you don’t understand is fine, as long as you still consider it a code block.)

  • If we agree it’s code but don’t agree block vs inline, that’s still rather good!
    IIUC the only exceptions are (1) empty lines (2) mid-line closing backticks.
    Fenced blocks without empty lines are probably a non-starter, but perhaps (2) could be harmonized?
    As noted above, all but one of the implementations with fenced blocks disregard mid-line backticks. So there is no single best-compatibility answer here :frowning:

    • “Parse block structure before inline” principle is I think new to CommonMark? It’s more or less implicit in the language, but I think many existing parsers have more ad-hoc structure. Ad-hoc parsing favors a single, simple, rule for code termination, whether it’s block or inline. [I’m thinking here not only of full parsers but about approximations like editor syntax highlighting…]

Revisiting this thread, I see only two possible solutions.

One is @Crissov’s, which is to require the info string to start immediately after the backticks (with no intervening space) if it contains more than one word. (If it is just one word, then a space is okay; we want this for backwards compatibility since many implementations allow a space.) That is:

```␣hello␣world
this IS inline code with one backtick ` and two backticks `
```

```hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```
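As a rough Python sketch of this first option: a line opens a backtick code block if the info string has at most one word, or if it has several words and starts immediately after the backticks. The name `is_opening_fence` is made up for illustration, indentation and tildes are ignored, and trailing whitespace after a single word is treated as harmless, which is one reading of the rule rather than something settled in this thread.

```python
import re

# Backtick run, optional spaces, then the candidate info string.
OPENING_RE = re.compile(r"^(`{3,})( *)(.*)$")


def is_opening_fence(line: str) -> bool:
    """Decide whether a line opens a backtick code block (hypothetical rule)."""
    m = OPENING_RE.match(line)
    if m is None:
        return False
    _ticks, gap, info = m.groups()
    if "`" in info:
        # Info strings of backtick fences may not contain backticks.
        return False
    words = info.split()
    if len(words) <= 1:
        # Zero or one word: a space before it is allowed for compatibility.
        return True
    # Several words: only a fence if no space follows the backticks.
    return gap == ""


if __name__ == "__main__":
    ticks = "`" * 3
    for rest in (" hello world", "hello world", " hello", "hello", ""):
        print(repr(ticks + rest), "->", is_opening_fence(ticks + rest))
```

The last branch is where a single space flips the result from code block to inline code.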

The second is to constrain the info string; instead of allowing it to be anything, we could limit to, say, a bracketed list of key/value pairs:

Example:

``` haskell {class="numberLines" id="mycodesample" startline="15"}
let x = x + 1 in x
```

One option would be to allow any pandoc-style attributes, e.g. {#id .class1 .class2 key="value" booleankey}.

I think I prefer the option of giving some structure to the info string to the option of forbidding the space when there’s more than a single word in the info string, since the latter makes the presence of a single space have a big effect (and only in some cases), which might be surprising.

But nobody on this thread has actually commented on the idea of giving more structure to the info string.


I have no personal preferences. Currently I use this syntax for a fenced quotes extension:

```quote http://link.to/origin
multiline
markdown
content
```

but it’s not a big problem to switch to ~~~ delimiters, if constraints are for backticks only.

  • Solution 0: do nothing.

    It’s ugly, but it’s needed only for the very rare combination of (1) inline code (2) which is annoyingly long (3) containing `` (4) at start of paragraph. Are we still discussing that one use case, or is there a bigger purpose?

    Is hard-wrapping so critical that a long line == “no way to write this”? Yes, most of the syntax is wrappable (even setext headers now!), but here we’re dealing with a conflict between inline code and line blocks, of which the latter is inherently not wrappable. If you wanted to express exactly the same long line of code in a code block, there would be no question that writing it as one line is the only option, and that’s OK.

  • IMHO allowing ``` oneword but requiring no space in ```two words is too surprising.

  • Do any existing implementations constrain the info string?
    Let’s see: Babelmark shows a few implementations varying on a space before the word, and a few varying on a multi-word info string. But both are rare.
    What happens when info string doesn’t adhere to the constraint? It’s still code, but inline code, right?

I’m worried about different implementations disagreeing on what’s code and what’s markdown. Especially on top level where off-by-one-fence can flip the meaning of a whole document.
Disagreement between block and inline is less severe, but with empty line it can grow into a what’s-code disagreement:

However this situation already exists between implementations that understand fenced blocks and those that don’t.
Not sure we’d be increasing the risk by constraining info string syntax.

The above Babelmark link shows some catastrophic text-became-code disagreements — even without empty lines!
(marked, PHP Markdown Extra, Maruku) and some milder code-became-text cases.
It seems some implementations do direct code block → text fallback, which is bad for interoperability :frowning:

Intuitively, I expect any constraints on info strings should reduce interoperability; the current “anything goes (except backticks)” rule is simplest so should have better chance to be widely adopted. However, in practice it might not matter…


Oh. The disagreements in the Babelmark results above are exactly like what you explained to me above in 2015 as not a bug:

``` something that's illegal info string
The line above can NOT start a fenced block,
(this will be treated as text rather than code)
but the line below can:
```

(This text will be treated as code.)

``` another info string
The line above can NOT terminate the fenced block,
(which is good, this is treated as code as intended)
only the line below terminates the block:
```

I’m not sure that’s what happens in marked, PHP Markdown Extra, Maruku, but it’s what CommonMark would do if you constrain the info string.
A human reading sequentially thinks “opening fence, or at least start of inline code”; but the spec looks for block structure before inline, so as soon as you reject an intended opening fence, you’ll lock on the closing fence instead.

Here is another way this fails:

````` outer fence

Backticks on next line are just code
``` all inside outer fence, right?
foo

`````

But if “outer fence” is some illegal info string, it stops being a fence, and the 5 backticks no longer shield the 3 backticks inside!
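To see the effect mechanically, here is a small Python sketch of block-level fence matching with a pluggable constraint on the info string. Everything here is an assumption made for illustration: `find_code_blocks` and `accept_info` are made-up names, only backtick fences at the start of a line are handled, and the “one word at most” constraint stands in for whatever restriction might be adopted.

```python
import re
from typing import Callable, List, Optional, Tuple

FENCE_RE = re.compile(r"^(`{3,})(.*)$")


def find_code_blocks(
    lines: List[str],
    accept_info: Callable[[str], bool] = lambda info: True,
) -> List[Tuple[int, Optional[int]]]:
    """Return (start, end) line indexes of fenced blocks; end is None if unclosed."""
    blocks = []
    i = 0
    while i < len(lines):
        m = FENCE_RE.match(lines[i])
        if m and "`" not in m.group(2) and accept_info(m.group(2).strip()):
            ticks = len(m.group(1))
            end = None
            for j in range(i + 1, len(lines)):
                c = FENCE_RE.match(lines[j])
                # A closing fence is at least as long as the opening one
                # and is followed by nothing but whitespace.
                if c and len(c.group(1)) >= ticks and c.group(2).strip() == "":
                    end = j
                    break
            blocks.append((i, end))
            if end is None:
                break  # an unclosed block swallows the rest of the input
            i = end + 1
        else:
            i += 1
    return blocks


if __name__ == "__main__":
    doc = [
        "`" * 5 + " outer fence",
        "",
        "Backticks on next line are just code",
        "`" * 3 + " all inside outer fence, right?",
        "foo",
        "",
        "`" * 5,
    ]
    print(find_code_blocks(doc))
    print(find_code_blocks(doc, accept_info=lambda info: len(info.split()) <= 1))
```

With no constraint this reports one block spanning lines 0 to 6. With the one-word constraint, line 0 is rejected as a fence, and the intended closing fence on line 6 is read as an opening fence that never closes.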

The only solution I see is never constraining info strings (well we must outlaw backticks for inline one-liners).
IIUC anything else inevitably creates interoperability problems.

Nice point that “Option 0” may actually be better than the other options, given the costs of each.

There are implementations that constrain the info string. Pandoc allows either a GitHub-style single word or pandoc-style attributes like {#id .class .class2 key="value"}.


There is a similar problem with ~~~text~~~


Can we remove this from the list of blockers for 1.0? “Do nothing” is a very appealing choice @jgm.
