Tables in pure Markdown

http://johnmacfarlane.net/babelmark2/?text=+field1+|+field2 +-------|-------- +entry0+|+entry1 +entry2+|+entry3 +entry4+|+entry5

You mean:

 field1 | field2
 -------|--------
 entry0 | entry1
 entry2 | entry3
 entry4 | entry5

I just read through all of these posts. It seems the preferences surrounding the table syntax for CommonMark are as variegated as the Markdown “spec” itself. Some folks want it to be more like HTML, some value the parse over readability in the pure Markdown form, and still others don’t think it should be in the CommonMark spec at all!

@vitaly has basically taken the GitHub Flavored Markdown version and incorporated it into markdown-it. (This is what I use for my Node.js-powered blog, based on Markdown source.) Most people even remotely interested in Markdown are familiar with GitHub, so this seems like a good choice on @vitaly’s part.

Going one step further, Byword’s implementation of Fletcher Penney’s MultiMarkdown 3 table spec seems the most straightforward amalgamation of the core ideas for tables in Markdown. Penney’s ideas are a close cousin to the GitHub Flavored Markdown table spec, except for one crucial addition: table captions.

Captions for tables do not seem to appear in any of the other Markdown specs that include table parsing. (And, likewise, no Markdown spec except for Penney’s has any support for <figcaption>. I’m not sure why.)

Example [Multi]Markdown source for a table:

| First Header  | Second Header | Third Header         |
| :------------ | :-----------: | -------------------: |
| First row     | Data          | Very long data entry |
| Second row    | **Cell**      | *Cell*               |
| Third row     | Cell that spans across two columns  ||
[Table caption, works as a reference][section-mmd-tables-table1]

The final line, [Table caption, works as a reference][section-mmd-tables-table1] is what forms the <caption> for the <table>.
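
For illustration, here is a minimal Python sketch of how a parser might read the delimiter row and the caption line of the example above. The names `parse_alignments` and `parse_caption` are hypothetical and not taken from MultiMarkdown or any other implementation; the colon convention is assumed to mean left, center, and right alignment, as is usual for pipe tables.

```python
import re


def parse_alignments(delimiter_row: str) -> list[str]:
    """Map each cell of a delimiter row to 'left', 'center', 'right' or 'default'."""
    cells = [c.strip() for c in delimiter_row.strip().strip("|").split("|")]
    alignments = []
    for cell in cells:
        # A delimiter cell is a run of dashes with an optional colon on either side.
        if not re.fullmatch(r":?-+:?", cell):
            raise ValueError(f"not a delimiter cell: {cell!r}")
        left, right = cell.startswith(":"), cell.endswith(":")
        if left and right:
            alignments.append("center")
        elif right:
            alignments.append("right")
        elif left:
            alignments.append("left")
        else:
            alignments.append("default")
    return alignments


def parse_caption(line: str):
    """Return (caption_text, reference_label) for a '[text][label]' line, else None."""
    match = re.fullmatch(r"\[([^\]]+)\](?:\[([^\]]*)\])?", line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Applied to the example, the delimiter row would yield left, center, and right alignment for the three columns, and the caption line would split into the caption text and the label `section-mmd-tables-table1`.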

Is this not the simplest way to do tables in CommonMark?

I made that choice to minimize breaking changes once the spec is complete. I’m not sure it’s the best possible option for the spec.

@vitaly Understood. Still, it was a good choice in and of itself :slight_smile:

More than a year has passed. Is the core settled enough? Can you please elaborate on how we can create these extensions? Should it be a pull request on the spec itself, in a new ‘extensions’ section? Or should it be its own git repo?

If we created a fork, the entire idea of having one spec would be lost. That doesn’t seem useful.

Having a spec for the core elements is still useful even if there are implementations that do extensions to the core in different ways. That is a much better situation than having implementations that render core elements differently. Anyway, I’m sorry this is trying your patience. This thread is a good place to record and discuss ideas for table syntax, which should make things faster when we get to that.

If/when standardization of pipe tables occurs, it would be good to address escaping of pipes. I just had the following scenario in some documentation:

|`E[foo|="en"]`|...some text...|

Note the internal | character within the first cell in the row. The Markdown implementation I am using considers it a column separator.

The only solution with common Markdown processors appears to be to escape to HTML, for example:

|<code>E[foo&#124;="en"]</code>	|...some text...|

But I don’t find that very satisfying.

In my example, given that I am within a backtick code span, we could make a case that the internal pipe should not be considered a column separator at all. But in the case where there are no backticks, there should be a non-HTML way to escape pipes.

According to CommonMark 0.23 Section 6.1. Backslash escapes,

Any ASCII punctuation character may be backslash-escaped […]

I believe pipe (|) is an ASCII punctuation character, and \ is a decidedly non-HTML way to escape it.

As for code spans, Section 6.3. Code spans states:

Code span backticks have higher precedence than any other inline constructs except HTML tags and autolinks.

I don’t see how pipe tables, if/when standardized, would change these simple rules.

The implementation you are using seems to either use different precedence rules or (more likely) disregard precedence altogether.

@ebruchez Any implementation which uses “|” to delimit table cells (or whatever syntax construct), but

  1. does not provide the escape sequence “\|” to “hide” that character, and
  2. does not treat it as data inside a code span

seems pretty broken, in particular since (2.) already holds for any “markup-relevant” character in the very first Markdown description by Gruber.
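
To make the two conditions concrete, here is a rough Python sketch of a row splitter that honours both: a backslash-escaped pipe and a pipe inside a backtick code span are kept as cell content. The function name `split_row` is hypothetical, and this is not how any particular implementation works. It also assumes every code span on the line is properly closed; an unclosed backtick run would swallow the remaining separators.

```python
def split_row(row: str) -> list[str]:
    """Split a pipe-table row, keeping escaped pipes and pipes inside code spans."""
    cells, current = [], []
    i, n = 0, len(row)
    fence = 0  # length of the backtick run that opened the current code span (0 = none)
    while i < n:
        ch = row[i]
        if ch == "`":
            # Measure the whole backtick run; a code span closes on a run of equal length.
            j = i
            while j < n and row[j] == "`":
                j += 1
            run = j - i
            if fence == 0:
                fence = run
            elif fence == run:
                fence = 0
            current.append(row[i:j])
            i = j
            continue
        if ch == "\\" and fence == 0 and i + 1 < n:
            # Outside a code span, a backslash protects the next character.
            current.append(row[i:i + 2])
            i += 2
            continue
        if ch == "|" and fence == 0:
            cells.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    cells.append("".join(current).strip())
    # Leading and trailing pipes produce empty edge cells; drop them.
    if cells and cells[0] == "":
        cells = cells[1:]
    if cells and cells[-1] == "":
        cells = cells[:-1]
    return cells
```

With this sketch, the row from the GitBook example above splits into two cells rather than three, because the pipe inside the code span is treated as data. The escape sequence itself is left in the cell text for the inline parser to resolve.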


That said, based on examples I tried in BabelMark, it seems that botching escape sequence recognition is not an uncommon problem in Markdown implementations, and that using a character reference like you did with &#124; is in fact the most robust work-around (and sometimes the only one). For a simple example:

*foo \* bar*

will not render as “foo * bar” (wrapped in <EM>) in all implementations, and even less so

*foo * bar*

(though I’m pretty sure that both forms should, by very basic Markdown rules), but

*foo &#42; bar*

will in every implementation employed there (even in some really dumb ones!).
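
If the character-reference workaround has to be applied to many cells, it can be automated. The following is only a sketch: `protect_pipes` is a hypothetical helper name, and it assumes the cell text contains no pipes that are meant to be real column separators.

```python
import html


def protect_pipes(cell_text: str) -> str:
    """Replace every literal pipe with the numeric character reference for '|'."""
    return cell_text.replace("|", "&#124;")


# html.unescape from the standard library decodes the reference back to the character,
# which is what an HTML renderer would eventually display.
protected = protect_pipes('E[foo|="en"]')
decoded = html.unescape(protected)
```

The result still has to be wrapped in `<code>` by hand if it is meant to be a code span, since character references are not decoded inside backtick code spans.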


@Dmitry Hmm, now that you quote it from the specification, the use of the term “precedence” in this context doesn’t feel quite right—am I the only one having this hunch?

Indeed, select implementations of CommonMark use the term priority, but I don’t see much harm in using precedence in this context.

@tin-pot I am not sure which implementation this is (it’s the one used by GitBook). I agree it’s quite broken. Hopefully CommonMark can make sure these kinds of scenarios are fully covered, and if the core of CommonMark already covers escapes and code spans properly, then even better!

JFTR, GitBook uses kramed, a fork of marked which is supposedly compatible with kramdown. Kramdown is well known to be the table-greediest of them all.

I have started work on making libcmark extensible; see https://github.com/jgm/cmark/issues/100 for the (pretty long) discussion. My test case and use case for this is tables, which seems to be something a lot of other people want too. I was made aware of the escaping problem through “Parsing strategy for tables?”.

My humble opinion is that, since it seems accepted that block-level rules should take precedence over inline rules, the correct approach is to match lines against the table row rule while ignoring all backslash-escaped pipes. If a line matches and a table row block is created, the backslashes should be removed before inlines are parsed. Thus:

| A cell `\|` with a pipe | another cell |

should be interpreted as a table row with two columns, the content of the first cell being translated to:

A cell `|` with a pipe

before inlines in it are parsed.

That’s the behaviour my test extension now implements at https://github.com/MathieuDuponchelle/cmark/commits/extensions_draft_3 .
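
As a rough illustration of the block-first rule described above (not the code of the cmark extension itself), a Python sketch could look like this. The name `split_row_block_first` is hypothetical. The sketch treats any pipe preceded by a backslash as escaped, so it does not handle an escaped backslash directly followed by a pipe.

```python
import re

# A pipe that is not preceded by a backslash acts as a cell separator,
# regardless of whether it sits inside a code span.
_SEPARATOR = re.compile(r"(?<!\\)\|")


def split_row_block_first(line: str) -> list[str]:
    """Split on unescaped pipes, then remove the backslash from each escaped pipe."""
    stripped = line.strip()
    parts = _SEPARATOR.split(stripped)
    # Drop the empty strings produced by an opening and a closing pipe.
    if stripped.startswith("|"):
        parts = parts[1:]
    if parts and stripped.endswith("|") and not stripped.endswith("\\|"):
        parts = parts[:-1]
    # The cell text handed to the inline parser no longer contains the backslash.
    return [part.strip().replace("\\|", "|") for part in parts]
```

Run on the example row, this gives two cells, the first being the text with a bare pipe inside the backticks, which the inline parser then sees as an ordinary code span.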

Table headers should not be required (HTML doesn’t require them… :relaxed:). Instead, a single pipe character should be accepted as a ‘no-header’ table opener. (Bitbucket’s parser does this already, FWIW.)

The source would look like:

|
|------|------|
| cell | cell |
| cell | cell |

Thoughts? It’s unambiguously a table but most engines parse it as text unless it looks like this:

| | |
|------|------|
| cell | cell |
| cell | cell |

Of course, requiring users to count pipes when they aren’t using a header row seems, well, like a recipe for parsing errors.

Why not just

|------|------|
| cell | cell |
| cell | cell |
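
A parser could support this last form by checking whether the table opens directly with the delimiter row. The following Python sketch is only an assumption about how that might be done; `is_delimiter_row` and `parse_table` are hypothetical names, and the cell splitting is deliberately naive (no escapes, no code spans).

```python
import re

_DELIMITER_CELL = re.compile(r":?-+:?")


def _cells(line: str) -> list[str]:
    """Naive cell split: drop outer pipes, split on the rest, trim whitespace."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def is_delimiter_row(line: str) -> bool:
    """True if the line contains a pipe and every cell is dashes with optional colons."""
    return "|" in line and all(_DELIMITER_CELL.fullmatch(c) for c in _cells(line))


def parse_table(lines: list[str]):
    """Return (header, rows); header is None when the table starts with the delimiter row."""
    if is_delimiter_row(lines[0]):
        header, body = None, lines[1:]
    elif len(lines) > 1 and is_delimiter_row(lines[1]):
        header, body = _cells(lines[0]), lines[2:]
    else:
        raise ValueError("no delimiter row found")
    return header, [_cells(line) for line in body]
```

Under this rule nobody has to count pipes for an empty header: the column count comes from the delimiter row, and a missing header simply yields `None`.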

Support for table cells that wrap over multiple lines in the Markdown source would be useful for platforms where files have fixed or limited record (line) length.

For example, on IBM z/OS (mainframes), record length might be limited to 80 characters.

I’m using markdown-it (thank you, Alex and Vitaly!), with GFM table support, to render Markdown on the mainframe. In a file with 80-byte records, this works well for tables with a small number of columns and short cell values - that is, where each row occupies less than 80 bytes - but beyond that, tables are unusable in this context.

I have colleagues who maintain tables as plain-text in files with 80-byte records, where table cells wrap over multiple lines. I’d like to be able to suggest some flavor of Markdown (preferably with an existing markdown-it plugin) to format those tables.

(Vitaly, I’ve read, understood, and agree with your comments about implementing GFM table formatting in markdown-it.)

Pandoc supports grid tables with multiline cells, which are well suited to fixed-width content.

I have recently implemented the same feature in Markdig.

How about supporting wikitext-style tables alongside the other syntaxes, e.g.
pipe tables?

It’s sort of a thin layer on the traditional <table>, <th>, <td>, etc.
markup tags.

It supports declaring “flat” tables: tables where you don’t have to manually
format and reformat stuff like pipes. It also allows you to create big tables
without going over line-length limits.

Here’s an example:

{| class="wikitable"
|+ Table caption; The quick brown fox jumps over the lazy dog.
|-
! Header 1
! Header 2
! Header 3
|-
| row 1, cell 1
| row 1, cell 2
| row 1, cell 3
|-
| row 2, cell 1
| row 2, cell 2
| row 2, cell 3
|- style="text-align: center;"
| row 3, cell 1 || row 3, cell 2 || row 3, cell 3
|-
| The quick brown fox jumps over the lazy dog.
|
The five boxing wizards jump **quickly**.

> Whoo yeah this is a blockquote.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
| 
{|
|-
| Tis a nested table
| with two cells per row.
|-
| One two three
| Four five.
|}
|}

Advantages

  • Easy to write and arguably more maintainable than pipe/grid tables (which
    are basically ASCII art).
    • Writing long paragraphs is much simpler. No need to fiddle with pipe
      characters.
  • Diff-friendly. Adding or removing a cell (assuming one cell per line) means 1
    line edited in diffs.
  • Large amount of formatting options; again this is due to it being a thin
    layer for HTML.
    • Declaring HTML attributes is dead easy. This makes it possible to use
      rowspan, colspan and friends.
  • Used in one of the largest and most edited (?) sites in the world.

Disadvantages

  • Nesting tables — though less inconvenient compared to other formats — is
    unreadable when you have large tables.
  • Verbose.
  • Looks less like a table and more like markup.

Some modifications for Markdown.

Here are some thoughts to make it better for Markdown.

Starting and closing tags, plus shorthand attributes:

[| {.class1.class2 #royal-pain style="text-align: center"}
// table code here //
|]

Readable nested elements:

[|
|-
| And in the end, the love you take is equal to the love you make.
|
:: Here's a table inside a table.
:: 
:: {|
:: |-
:: | Tis a nested table
:: | with two cells per row.
:: |-
:: |
:: :: Here's another nested table
:: :: [|
:: :: |-
:: :: | One plus || Two plus || Three plus
:: :: |]
:: |}
| Closing thoughts.
|]

Nah, MediaWiki tables are not really something CommonMark should adopt, due to the disadvantages you listed. The ASCII art look of pipe and grid tables is exactly what makes them fit well with the Markdown spirit. Anyhow, if you like the {| intro, I guess it would mix with code fence info strings like this:

||| .class1 .class2 #royal-pain style="text-align: center"
table code here
|||

Nesting is done by indentation in markdown, so your proposed :: would be a deviation from that.
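
To sketch how such an info string could be read, here is a small Python example. The name `parse_info_string` and the returned tuple shape are assumptions made for illustration; nothing like this is defined by CommonMark. It relies on `shlex.split` from the standard library to keep the quoted attribute value together.

```python
import shlex


def parse_info_string(info: str):
    """Split an info string into (classes, id, attributes)."""
    classes, ident, attrs = [], None, {}
    for token in shlex.split(info):
        if token.startswith("."):
            # Accepts both '.class1 .class2' and the run-together '.class1.class2'.
            classes.extend(token[1:].split("."))
        elif token.startswith("#"):
            ident = token[1:]
        elif "=" in token:
            key, value = token.split("=", 1)
            attrs[key] = value
    return classes, ident, attrs
```

For the info string shown above, this returns the two classes, the id `royal-pain`, and a `style` attribute whose value is `text-align: center`.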
