I agree that requiring the info string to follow immediately, without intervening space, is probably the best solution: if a code span is indeed intended, one can always insert a SPACE after the backtick string, and this SPACE will then be trimmed off the code span’s content anyway according to the current syntax rules.
And even if the backtick string of such a code span happens to end up at a begin-of-line, say through re-flowing lines, any text formatter would preserve or restore the following SPACE, so even in this case all is well.
And an opening code fence without an info string can’t occur “by accident” in a paragraph either, I’d say.
So apart from resolving an ambiguity, this rule would also make code spans “robust” in the face of re-flowing the lines of a paragraph, as far as I can tell. Which is nice!
I’ve always used a space after the backticks, and every Markdown converter I’ve ever interacted with has accepted it. So, if we didn’t allow a space before the info string, we’d risk breaking a lot of existing content. That’s a big strike against your proposal. And I don’t see how it really addresses the underlying problem. After all, a space isn’t required in inline code, so the ambiguity still needs to be addressed.
I’ve always used a space after the backticks, and every Markdown converter I’ve ever interacted with has accepted it.
I have to admit that this went through my mind, but then I conveniently kept silent about the issue, which is there nonetheless; you’re right.
And I don’t see how it really addresses the underlying problem.
I may be wrong, but assume the info string (if any) in an opening code fence were required to follow immediately after the backtick string. Now consider a case where you write a code span, but it then happens to look like a code fence, say:
Here is a *code span*, which starts with a *backtick string* and crosses lines:
```sample 1 2 3 4 test
5 2 7 8
```
and here we're back in the paragraph.
Because here “sample” is meant to be part of the code span, and is not meant to be an info string (by assumption!), simply inserting a SPACE would disambiguate (what a word!) the situation:
Here is a *code span*, which starts with a *backtick string* and crosses lines:
```␣sample 1 2 3 4 test
5 2 7 8
```
and here we're back in the paragraph.
Because of the SPACE (and the assumed syntax rule change), it can’t mean a code block any more, and because of the “trim one SPACE” rule for code spans, the “meaning” or content of the (now unambiguous) code span hasn’t changed.
If there are multiple words after the opening code fence (with or without a space), then I would argue the author probably intended those words to be the first line of code, and not an info string at all.
```multiple words here
more content here
```
```␣multiple words here
more content here
```
A few options in that case:
1. Be flexible and treat those multiple words as the first line of a code block.
2. Be rigid and break the code block, turning the entire first line into literal text with three visible backticks. (GitHub does this.)
3. Be selective and use only the first word on that line as the info string, discarding the rest of the line.
4. Be graceful and preserve all the text between the two code fences, but have it degrade to inline code.
I would choose option 4, because of the principle of least surprise. (“Why did the entire first line of my code block disappear?” vs. “Looks like my code is displaying inline. I’d better double-check my formatting.”)
It also solves the problem that began this thread:
…then you can put that code between triple-backticks, with the first 2+ words of code on the same line as the opening triple backticks.
TL;DR If the opening three-backtick code fence is followed by
Zero words, it starts a code block.
One word, it starts a code block and includes a one-word info string.
Two or more words, it starts an inline code span.
Then, since the info string is limited to one word, I think there should be no problem allowing a space before it.
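For what it’s worth, the three TL;DR cases can be sketched as a tiny classifier. This is only an illustration in Python: `classify_opening` is a made-up name, and the sketch deliberately looks at lines that start with exactly three backticks and nothing else (no tildes, no indentation, no longer fences).

```python
import re

# Hypothetical helper, not taken from any spec or implementation.
# Matches a line that starts with exactly three backticks and captures the rest.
FENCE = re.compile(r"^`{3}(?!`)(.*)$")


def classify_opening(line: str) -> str:
    """Classify an opening three-backtick line by counting the words after it."""
    match = FENCE.match(line)
    if match is None:
        return "not a fence"
    words = match.group(1).split()  # a leading space makes no difference here
    if len(words) == 0:
        return "code block"
    if len(words) == 1:
        return "code block with info string"
    return "inline code span"


for sample in ["```", "``` python", "```multiple words here"]:
    print(repr(sample), "->", classify_opening(sample))
```

Because the decision depends only on the word count, a space before the single word changes nothing, which is the point of the last sentence above.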
The info string isn’t currently limited to one word. This was the subject of some debate when we first started talking about this, so let me recapitulate it.
Pandoc’s fenced code blocks have always allowed specification of quite a bit of structured information:
``` {.class .class #id key=value key="value"}
code here
```
This is especially useful when the code blocks are postprocessed (e.g. by a pandoc “filter”). You might, for example, have a filter that takes specially marked code blocks and converts them to charts. And in that case you might want to have attributes like width, height, background-color…
Even for source code, you might want to specify whether to number lines, what highlighting style to use, and so on.
So limiting the info to a single word would really be too limiting.
Originally I proposed something like the pandoc format, but others didn’t want to get behind that, so we came up with a compromise that the spec wouldn’t specify the format of the info string or what was to be done with it. Perhaps this should be revisited.
Revisiting this sentence. I agree “no way to write this” is a showstopper. So perhaps allow people to also use tildes in this particular scenario? In other words support both tilde and backtick code fences.
@codinghorror - You can already have tilde code fences for code blocks. This doesn’t help with the problem, which is how to express a certain kind of code span (not a code block). And even if we allowed tildes to be used for code spans (which would conflict with common extensions that use them for strikeout), we’d still have the problem.
``` hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
```
~~~ hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
~~~
Forbid the space before the info string if it contains more words/spaces:
```hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```
```.hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```
```␣hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because it only contains a single word after the backticks!
```
```␣hello␣
this IS inline code with one backtick ` and two backticks ` (?)
```
```␣hello␣world
this IS inline code with one backtick ` and two backticks `
```
```hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```
```.hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```
```␣.hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no word but punctuation after the space!
```
Maybe we’re overthinking this. Simpler rules are better, right?
How about:
Triple backticks or triple tildes: Code fence.
Opening fence begins a line, closing fence begins a new line: Code block.
Opening fence begins a line, but closing fence does not begin a new line: Inline code.
Text on the same line as the opening fence of a code block: Info string.
Any info string is thus allowed, even multiple words, even with a preceding space.
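To make those rules concrete, here is a rough Python sketch. `classify_fenced` is a made-up name, it only knows about triple backticks, and it assumes that the first fence found after the opening one is the closing fence (the rules above don’t actually say which fence wins if there are several).

```python
def classify_fenced(lines: list[str]) -> str:
    """Classify text whose first line opens with ``` under the rules above.

    Assumption: the first ``` found after the opening fence closes the construct.
    """
    fence = "```"
    if not lines or not lines[0].startswith(fence):
        return "no opening fence"
    # A fence later on the opening line is by definition not at a line start.
    if fence in lines[0][len(fence):]:
        return "inline code"
    for line in lines[1:]:
        if line.startswith(fence):
            return "code block"   # closing fence begins a new line
        if fence in line:
            return "inline code"  # closing fence appears mid-line
    return "unclosed"


block = ["``` any info string", "some code", "```"]
inline = ["``` some code", "more code ``` and ordinary text"]
print(classify_fenced(block))   # code block
print(classify_fenced(inline))  # inline code
```

Note that the answer is only known once the closing fence has been found, so the kind of construct cannot be read off the opening line alone.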
As for the issue that began this thread…
``` To include ` and `` backticks in inline code,
the closing fence should not be at the start of a new line,
but rather after code, like this. ``` And here's some non-code inline text.
This starts a paragraph with inline code, including single and double backticks.
@jkdev - your proposal does nothing to remove the ambiguities.
1 ``` code
2 foo
3 bar ```
4 ```
Is this a code block that ends on line 4? Or a code span that ends on line 3? You could resolve this in favor of the latter by saying that we close as soon as we can, but this breaks backwards compatibility for code blocks containing strings of backticks, and creates difficulties expressing code blocks containing backticks.
Not being able to identify a code block by the first line would also break a lot of very nice properties of our present parsers, which identify block structure first, inline structure later.
It may be that this issue is enough of a corner case that we shouldn’t obsess about it. The only real “blind spot” is inline code that contains strings of two backticks and occurs at the beginning of a paragraph (otherwise you can reorganize it so it doesn’t start at the beginning of the line).
I suppose another solution would be to allow only one-word info strings with backtick code blocks, while allowing free-form info strings with tilde code blocks. I hesitate to do that, though, as it complicates the mental model. (Why can I do this with tildes but not backticks?)
While it wouldn’t resolve the original problem, I’m wondering if changing code block rules to close on triple (or however many) backticks anywhere in the line is feasible. In your example, it’d be a code block that ends at line 3.
Motivation for closing early:
Current behavior is not essential: I never realized that I can safely include ``` in code blocks as long as they don’t start the line. But I can always use more backticks (````) to start/end the code block, which is a simple rule covering all cases.
“Compatibility” with original markdown: backtick-fenced code block syntax degrades gracefully to inline code in tools that don’t understand fenced blocks [^1]. That’s a good property and IMO should be maximized. However, tools that think ``` starts inline code will stop on the first ``` anywhere.
Babelmark confirms that about half of the implementations support fenced blocks and only stop on the final start-of-line backticks, while the other half don’t understand fenced blocks and stop the inline code on line 3.
AFAICT no tool follows @jkdev’s proposal of switching from block to inline when the closing fence is mid-line.
[^1] I lied: it only degrades gracefully without empty lines; empty lines abort inline code but not fenced blocks.
That’s why deciding block structure first is important. Consider this paradox:
```
Is this inline code or code block?
Closing fence is not at start of line: ``` And here's some non-code inline text.
If the block/inline decision depends on lookahead to where closing backticks are, it can’t be either: inline shouldn’t cross the empty line and block shouldn’t have mid-line termination. I.e. you don’t know how far to look for termination before you have found the termination…
Interoperability is especially critical for agreeing where code starts and ends
Code is fundamental like escaping: it suppresses markdown-significant constructs, so if you don’t agree about whether it’s code, you get cascading confusion…
Fenced blocks make it worse: disagreeing about just one top-level fence can catastrophically flip the meanings of everything till the end of the document!
That’s why any limits on info strings worry me. The simplest rule, “3+ backticks/tildes followed by anything [without backticks] starts a block”, is probably our best chance for agreement. (Ignoring, or treating as code, info strings you don’t understand is fine, as long as you still consider it a code block.)
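Part of why this rule has a chance is that it fits in a single regular expression. A rough Python sketch follows; `opening_fence` is a made-up name, the no-backticks restriction is applied to tilde fences too (following the wording of the rule as quoted), and the allowance for up to three leading spaces is an extra assumption.

```python
import re

# "3+ backticks/tildes followed by anything [without backticks] starts a block".
# Assumption: up to three spaces of indentation are tolerated before the fence.
OPENING_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})([^`]*)$")


def opening_fence(line: str):
    """Return (fence, info_string) if the line opens a block, else None."""
    match = OPENING_FENCE.match(line)
    if match is None:
        return None
    # The info string is kept verbatim (minus outer whitespace), never validated.
    return match.group(1), match.group(2).strip()


print(opening_fence("``` hello world"))  # ('```', 'hello world')
print(opening_fence("``` a ` b"))        # None
```

An implementation that doesn’t understand the info string can simply ignore the second element and still agree with everyone else that a block starts here.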
If we agree it’s code but don’t agree block vs inline, that’s still rather good!
IIUC the only exceptions are (1) empty lines (2) mid-line closing backticks.
Fenced blocks without empty lines are probably a non-starter, but perhaps (2) could be harmonized?
As noted above, all but one of the implementations with fenced blocks disregard mid-line backticks. So there is no single best-compatibility answer here.
The “parse block structure before inline” principle is, I think, new to CommonMark? It’s more or less implicit in the language, but I think many existing parsers have a more ad-hoc structure. Ad-hoc parsing favors a single, simple rule for code termination, whether it’s block or inline. [I’m thinking here not only of full parsers but also of approximations like editor syntax highlighting…]
Revisiting this thread, I see only two possible solutions.
One is @Crissov’s, which is to require the info string to start immediately after the backticks (with no intervening space) if it contains more than one word. (If it is just one word, then a space is okay; we want this for backwards compatibility since many implementations allow a space.) That is:
```␣hello␣world
this IS inline code with one backtick ` and two backticks `
```
```hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```
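In code, the first solution amounts to roughly the following check. This is a Python sketch only: `is_opening_fence` is a made-up name, ␣ in the examples above stands for an ordinary space, and treating trailing whitespace after a single word as harmless is an assumption of the sketch.

```python
def is_opening_fence(line: str) -> bool:
    """Decide whether a line starting with backticks opens a code block."""
    if not line.startswith("```"):
        return False
    rest = line.lstrip("`")  # everything after the whole backtick string
    if "`" in rest:
        return False         # info strings may not contain backticks
    if not rest.startswith(" "):
        return True          # info string (if any) follows immediately
    # There is a space after the backticks: only a single word is tolerated.
    return len(rest.split()) <= 1


print(is_opening_fence("``` hello world"))  # False: would be inline code
print(is_opening_fence("```hello world"))   # True
print(is_opening_fence("``` hello"))        # True
```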
The second is to constrain the info string; instead of allowing it to be anything, we could limit to, say, a bracketed list of key/value pairs:
Example:
``` haskell {class="numberLines" id="mycodesample" startline="15"}
let x = x + 1 in x
```
One option would be to allow any pandoc-style attributes, e.g. {#id .class1 .class2 key="value" booleankey}.
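To give a feel for how much structure that would be, here is a loose Python sketch of reading such an attribute block. It is not pandoc’s actual parser; `parse_attributes` and the shape of the returned dictionary are invented for illustration, and anything the pattern doesn’t recognize is silently skipped.

```python
import re

# One token is "#id", ".class", key="quoted value", key=bare or a bare boolean key.
TOKEN = re.compile(
    r'#(?P<id>[\w-]+)'
    r'|\.(?P<cls>[\w-]+)'
    r'|(?P<key>[\w-]+)(?:=(?:"(?P<quoted>[^"]*)"|(?P<bare>[^\s"}]+)))?'
)


def parse_attributes(text: str):
    """Parse a braced attribute block; return None if there are no braces."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        return None
    attrs = {"id": None, "classes": [], "keyvals": {}}
    for match in TOKEN.finditer(text[1:-1]):
        if match.group("id"):
            attrs["id"] = match.group("id")
        elif match.group("cls"):
            attrs["classes"].append(match.group("cls"))
        else:
            quoted = match.group("quoted")
            value = quoted if quoted is not None else match.group("bare")
            attrs["keyvals"][match.group("key")] = value  # None for a boolean key
    return attrs


print(parse_attributes('{#id .class1 .class2 key="value" booleankey}'))
```

A line whose info string fails such a check would then have to be treated as something other than an opening fence.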
I think I prefer the option of giving some structure to the info string to the option of forbidding the space when there’s more than a single word in the info string, since the latter makes the presence of a single space have a big effect (and only in some cases), which might be surprising.
But nobody on this thread has actually commented on the idea of giving more structure to the info string.
It’s ugly, but it’s needed only for the very rare combination of (1) inline code (2) which is annoyingly long (3) containing `` (4) at start of paragraph. Are we still discussing that one use case, or is there a bigger purpose?
Is hard-wrapping so critical that a long line == “no way to write this”? Yes, most of the syntax is wrappable (even setext headers now!), but here we’re dealing with a conflict between inline code and line blocks, of which the latter is inherently not wrappable. If you wanted to express exactly the same long line of code in a code block, there would be no question that writing it as one line is the only option, and that’s OK.
IMHO allowing ``` oneword but requiring no space in ```two words is too surprising.
Do any existing implementations constrain the info string?
Let’s see, Babelmark shows few varying on space before word, and few varying on multi-word info string. But both are rare.
What happens when the info string doesn’t adhere to the constraint? It’s still code, but inline code, right?
I’m worried about different implementations disagreeing on what’s code and what’s markdown. Especially on top level where off-by-one-fence can flip the meaning of a whole document.
Disagreement between block and inline is less severe, but with an empty line it can grow into a what’s-code disagreement:
However this situation already exists between implementations that understand fenced blocks and those that donât.
Not sure we’d be increasing the risk by constraining info string syntax.
The above Babelmark link shows some catastrophic text-became-code disagreements, even without empty lines!
(marked, PHP Markdown Extra, Maruku) and some milder code-became-text cases.
It seems some implementations do a direct code block → text fallback, which is bad for interoperability.
Intuitively, I expect any constraints on info strings to reduce interoperability; the current “anything goes (except backticks)” rule is simplest, so it should have a better chance of being widely adopted. However, in practice it might not matter…
``` something that's illegal info string
The line above can NOT start a fenced block,
(this will be treated as text rather than code)
but the line below can:
```
(This text will be treated as code.)
``` another info string
The line above can NOT terminate the fenced block,
(which is good, this is treated as code as intended)
only the line below terminates the block:
```
I’m not sure that’s what happens in marked, PHP Markdown Extra, Maruku, but it’s what CommonMark would do if you constrain the info string.
A human reading sequentially thinks “opening fence, or at least start of inline code”; but the spec looks for block structure before inline, so as soon as you reject an intended opening fence, you’ll lock on the closing fence instead.
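The flip can be reproduced with a toy block scanner. The Python below is a deliberately simplified sketch, not CommonMark’s algorithm: it only knows three-backtick fences at the start of a line, `code_lines` and the two `info_ok` predicates are invented names, and the one-word limit merely stands in for “some constraint on the info string”.

```python
def code_lines(lines, info_ok):
    """Return the indices of the lines a block-first scanner treats as code."""
    inside = False
    code = []
    for index, line in enumerate(lines):
        if line.startswith("```"):
            info = line[3:].strip()
            if inside and info == "":
                inside = False      # a bare fence closes the open block
                continue
            if not inside and info_ok(info):
                inside = True       # accepted opening fence
                continue
            # Otherwise: a fence with an info string inside a block is code,
            # and a rejected opening fence is just text.
        if inside:
            code.append(index)
    return code


EXAMPLE = [
    "``` something that's illegal info string",
    "intended as code",
    "```",
    "intended as text",
    "``` another info string",
    "intended as code again",
    "```",
]

anything_goes = lambda info: True
one_word_only = lambda info: len(info.split()) <= 1  # hypothetical constraint

print(code_lines(EXAMPLE, anything_goes))  # [1, 5]
print(code_lines(EXAMPLE, one_word_only))  # [3, 4, 5]
```

With the constraint in place, the text between the two blocks becomes code and the first block becomes text, exactly the off-by-one-fence flip described above.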
Here is another way this fails:
````` outer fence
Backticks on next line are just code
``` all inside outer fence, right?
foo
`````
But if “outer fence” is some illegal info string, it stops being a fence, and the 5 backticks no longer shield the 3 backticks inside!
The only solution I see is never constraining info strings (well we must outlaw backticks for inline one-liners).
IIUC anything else inevitably creates interoperability problems.
Nice point that “Option 0” may actually be better than the other options, given the costs of each.
There are implementations that constrain the info string. Pandoc allows either a GitHub-style single word or pandoc-style attributes like {#id .class .class2 key="value"}.