Consistent attribute syntax

Just for the record, markdown-it and a number of other implementations convert the < to &lt;.

Babelmark

Howdy folks :wave:

I have been using a custom syntax for attributes for code-blocks for quite some time now in a reasonably large app and I’m interested in making it “more standard”. Is this the right place to chip in with the discussion? Are there any regular meetings where things are planned?

Also, I don’t just want to reply to this thread with a “this is the way that I would like it to look”. I’m more interested in helping to shepherd a specification through than in making sure it matches my personal preference for the syntax :joy:

Yes, I believe it is. You could describe a syntax and invite other people here to share their thoughts.

Keep in mind, there’s already some variation (or is it fragmentation?) in attribute syntax among different Markdown flavors and other writing formats like Markua. For example…

Pandoc / PHP Markdown Extra / Earlier version of CommonMark spec:

{#myId .myClass key=val key2="val 2"}

Maruku / Kramdown:

{:ref-name: #myid .my-class}

Markua:

{key_one: value1, key_two: value_two, key_three: "value three!", key_four: true, key_five: 0, key_six: 3.14}

Since Markdown supports, and indeed encourages, more than one way of marking up elements, I think it would be fine to support some or all of these variations. Both key: value and key=value are logically equivalent, and the #id and .class syntax acts as a useful shortcut. So long as they’re within curly braces, I don’t think there should be any issue with parsers supporting a mixture of attribute syntaxes.
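To make the Pandoc-style variant concrete, here is a minimal Python sketch of a parser for it. The function name `parse_attributes` and the exact token rules (word characters and hyphens in names, double quotes only) are assumptions for illustration, not taken from any of the implementations mentioned above.

```python
import re

# One token is "#id", ".class", or key=value (value bare or double-quoted).
TOKEN = re.compile(
    r'\s*(?:#(?P<id>[\w-]+)|\.(?P<cls>[\w-]+)|'
    r'(?P<key>[\w-]+)=(?:"(?P<quoted>[^"]*)"|(?P<bare>[^\s"}]+)))'
)

def parse_attributes(text):
    """Parse '{#id .class key=val}' into (id, classes, key/value dict)."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("attribute block must be wrapped in curly braces")
    body = text[1:-1]
    ident, classes, pairs = None, [], {}
    pos = 0
    while body[pos:].strip():  # stop once only whitespace is left
        match = TOKEN.match(body, pos)
        if match is None:
            raise ValueError(f"unexpected input: {body[pos:]!r}")
        if match.group("id") is not None:
            ident = match.group("id")
        elif match.group("cls") is not None:
            classes.append(match.group("cls"))
        else:
            value = match.group("quoted")
            if value is None:
                value = match.group("bare")
            pairs[match.group("key")] = value
        pos = match.end()
    return ident, classes, pairs

print(parse_attributes('{#myId .myClass key=val key2="val 2"}'))
# ('myId', ['myClass'], {'key': 'val', 'key2': 'val 2'})
```

The Kramdown and Markua forms would need their own token rules; this sketch only covers the first example.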

Ok cool :+1:

The syntax that we’re utilising is similar to the one you list under “Earlier version of CommonMark spec”:

```handlebars {data-filename="app/templates/components/rental-listing.hbs" data-diff="-15,+16"}
```

This comes from the Ember documentation, where it allows us to render filenames and diffs.

As for the syntax, I wonder if we can’t potentially aim for having something a lot simpler in the spec? :thinking: I understand the benefits of shortcuts for ids and classes, but I wonder what is wrong with something simpler to define:

```hr {id="awesome-rule" class="blue" data-unrelated="all the data"}
```

I am not familiar yet with the CommonMark spec process but I wonder if a simpler definition of a feature is much more likely to make it through?

And yes, I agree with @chrisalley that, since Markdown already allows several ways of marking up the same thing, we could support both key: value and key="value" (and maybe even key=value, but I am less bothered about that).

We’ll probably need to wrap multi-word values within quotes, e.g. key="multi word value" or key: "multi word value", but single word values could leave off the quotes, e.g. key=value or key: value.

I think we should use double quotes around the value, so that the writer can use values that contain apostrophes, e.g. thought-experiment: "Schrödinger's cat"

Since there may be cases where values contain double quotes, we could alternatively support wrapping the value in single quotes, e.g. key: 'my "useful" value'.

I thought I had said this earlier, but since I cannot find it, I will mention it possibly again: The implementations of equals vs. colon syntax for key-value pairs differ in that the latter requires a comma (or perhaps semicolon) between pairs, thus it does not need quote marks. Also, the equals sign is usually directly attached to the key and the (possibly quoted) value, without intervening whitespace, whereas the colon is almost always followed (and possibly preceded) by whitespace.

I like the idea of leaving off quote marks. But it does raise the question of what happens when there is a colon or comma inside the value string. What does this produce?

title: Horizon: Zero Dawn

It seems to me that a value containing a colon or comma would be more common than a value containing double quotes.

Markua requires that multi-word value strings are wrapped in double quotes. I wouldn’t mind staying close to the Markua rules. This would also keep the = and : quote rules consistent. There would be less cognitive overhead switching between the two.

If we wanted to take consistency to the next level we could make the commas between the key/value pairs optional. That way, any of the following four would be valid:

{key_one: value1 key_two: value_two key_three: "value three!" key_four: true key_five: 0 key_six: 3.14}
{key_one: value1, key_two: value_two, key_three: "value three!", key_four: true, key_five: 0, key_six: 3.14}
{key_one=value1 key_two=value_two key_three="value three!" key_four=true key_five=0 key_six=3.14}
{key_one=value1, key_two=value_two, key_three="value three!", key_four=true, key_five=0, key_six=3.14}

The parser could simply ignore the commas, leaving it up to the writer to optionally include them for aesthetic purposes.
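As a rough illustration of that idea, the Python sketch below accepts either `:` or `=` as the separator, treats commas as optional, and allows double-quoted, single-quoted, or bare single-word values. The name `parse_pairs` and the exact character rules are assumptions. All values are kept as strings; Markua-style typed values such as `true` or `3.14` are not interpreted here.

```python
import re

# key, then ':' or '=', then a double-quoted, single-quoted or bare value;
# a trailing comma is optional and simply skipped.
PAIR = re.compile(
    r"""\s*(?P<key>[\w-]+)\s*[:=]\s*
        (?:"(?P<dq>[^"]*)"|'(?P<sq>[^']*)'|(?P<bare>[^\s,"'{}]+))
        \s*,?""",
    re.VERBOSE,
)

def parse_pairs(block):
    """Return the key/value pairs of an attribute block as a dict of strings."""
    block = block.strip()
    if not (block.startswith("{") and block.endswith("}")):
        raise ValueError("expected curly braces")
    body = block[1:-1]
    pairs = {}
    pos = 0
    while body[pos:].strip():
        match = PAIR.match(body, pos)
        if match is None:
            raise ValueError(f"cannot parse {body[pos:]!r}")
        for name in ("dq", "sq", "bare"):
            if match.group(name) is not None:
                pairs[match.group("key")] = match.group(name)
                break
        pos = match.end()
    return pairs

variants = [
    '{key_one: value1 key_two: value_two key_three: "value three!"}',
    '{key_one: value1, key_two: value_two, key_three: "value three!"}',
    '{key_one=value1 key_two=value_two key_three="value three!"}',
    '{key_one=value1, key_two=value_two, key_three="value three!"}',
]
results = [parse_pairs(v) for v in variants]
print(all(r == results[0] for r in results))  # True
```

With these rules an unquoted `{title: Horizon: Zero Dawn}` raises a `ValueError`, while `{title: "Horizon: Zero Dawn"}` parses as expected, which is one way of resolving the question raised above.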

@mb21: Has your draft proposal reached acceptance yet?

I agree with most of the proposal except for the following points:

In the draft I am reading:

[…] For paragraphs, block quotes and tight lists, the attribute block must start on a line that immediately follows the corresponding block […]

I wonder what is the advantage of putting the attribute block after the block? I would personally follow “Beyond Markdown” recommendation and put them before the block and for inline I would leave them after. The reason is that it does not seem natural to “identify” something after it’s been declared. Maybe it’s just me. I also think that class and attribute information may be useful not only for HTML but also simply to identify blocks of text in a document, and it would be a lot easier to spot if it came before the blocks. If the attributes come after, the reader needs to find the end of the block, which is always trickier than finding the start and less intuitive anyway.

I completely agree. Escaping left curly brackets could be quite problematic.

Sébastien Hamel via CommonMark Discussion
noreply@talk.commonmark.org writes:

I wonder what is the advantage of putting the
attribute block after the block? I would personally
follow “Beyond Markdown” recommendation and put them
before the block and for inline I would leave them
after

The main reason for this recommendation is to avoid
ambiguities. If block attributes can come after a
block, then there’s always an ambiguity about whether
the attribute goes with the block or the final inline
in it.

I completely agree. Escaping left curly brackets could be quite problematic.

Where do curly brackets appear in ordinary text? Of
course they appear in computer code, but that should
be in code backticks. They also appear in math, but
that too should be in a special environment (since a
lot of mathematical expressions would otherwise
need escaping).

Chicago Manual of Style only mentions these two uses:

I agree that they do not appear often… After reconsidering, we can drop this. The syntax {...} is simpler, and for the rare occasion when someone needs curly brackets inside text, it may not be worth penalizing the more common use cases with the more complicated syntax {: ... }.

As an implementer, though, I still need to settle the position of attribute blocks (before or after blocks) in order to implement this functionality. What is the approval process for proposals?

When I wrote this proposal, I was (and still am) on the fence on whether the attributes should come before or after paragraphs.

We have mixed precedents: in fenced code blocks it’s before:

``` {.python}
x=1
```

while in headings it’s after:

# my title {.myclass}

But you people have certainly made good arguments in favour of having them come before. Especially:

Regarding:

I thought this could be resolved by requiring the attributes to be on their own line. But it may well be that it’s easier to parse if the attributes come at the beginning of the block.


Currently, there isn’t any. CommonMark hasn’t even reached 1.0 due to some edge cases that need to be resolved. That being said, this is certainly a valuable forum for different implementers to discuss the pros and cons of future extension syntax.

Btw., if you’re interested in seeing what happens if you bolt attributes and some other pandoc extensions onto the token-based parser of markdown-it.js (on a least-effort basis), feel free to play around with this bundle of markdown-it plugins: https://github.com/mb21/markdown-it-pandoc.

As for the difference between before and after… For me it is a matter of how we see the attributes…

If we consider them merely like HTML attributes (ignoring all other considerations I make below) then to put them after would make sense since they are just seen as side parameters only there to serve the purpose of being used in the HTML generation process.

I pointed out in my previous comment that putting them before would make them easier to use to identify blocks. In the vision I have for those attributes, yes, they are used to feed the HTML rendering process (or any other kind of generation process…) but I see them mainly as semantic identifiers. They can tell us something about the text we are looking at or looking for.

For example, I could have a text like this:


# title level one 

{.content}
Some text. 

…that I want to comment…


# title level one 

{.content}
Some text. 

{.comment}
Author, could you make this sentence longer? 

If I take the same text and I put the parameters blocks after, it gives something like this:


# title level one 

Some text. 
{.content}

Author, could you make this sentence longer? 
{.comment}

For me, the first one is easier to read, in the second, it’s more difficult to know what goes with what.

Another point to consider in favor of putting attribute blocks before is consistency. In a fenced code block the semantic information (the parameters) is put before the block, so I think attribute blocks should follow suit.

Another point regarding the spec. I see that there is no way to specify multiple classes in the attributes blocks:

Markdown authors shouldn’t write multiple key-value pairs with the same key in an attribute block. However, to ease the burden of implementation, the behaviour in such cases is left undefined—although most implementations will probably parse the attributes sequentially and insert them into a map, which would result in a last-one-wins semantic.

On this, I think a syntax like this should be allowed:

{.author .john}

Again, if we see classes as semantic tagging, or meta-information about the block, a bit like fenced code block parameters, then support for multiple classes and the syntax that goes with it should be clearly defined, in my opinion. The parser would simply need to keep an array (or a set, to avoid duplicates…) per map entry and fill it in as parsing proceeds.
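A minimal Python sketch of that bookkeeping might look as follows. It assumes the attribute block has already been split into `(key, value)` tokens, with `.author` represented as `("class", "author")`; the function name `collect` and the token shape are hypothetical. Classes accumulate without duplicates, while every other key keeps the last-one-wins behaviour described in the quoted draft.

```python
def collect(tokens):
    """Fold (key, value) tokens into a class list and an attribute map."""
    classes = []      # ordered, no duplicates
    attributes = {}   # ordinary keys: the last value wins
    for key, value in tokens:
        if key == "class":
            if value not in classes:
                classes.append(value)
        else:
            attributes[key] = value
    return classes, attributes

# Hypothetical token stream for {.author .john lang=en .author lang=fr}
tokens = [
    ("class", "author"),
    ("class", "john"),
    ("lang", "en"),
    ("class", "author"),
    ("lang", "fr"),
]
print(collect(tokens))  # (['author', 'john'], {'lang': 'fr'})
```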

Thank you very much for the link to the source code: exactly what I needed to get me started! I will look into it for sure, really appreciated!

There are just sooo many possible locations for block attributes.

  1. {.foo} 
    # Bar #
    
  2. {.foo} # Bar #
    
  3. # {.foo} Bar #
    
  4. # Bar {.foo} #
    
  5. # Bar # {.foo} 
    
  6. # Bar #
    {.foo} 
    
  7. #{.foo}# Bar ## 
    
  8. # Bar #{.foo}# 
    
  9. {.foo} 
    Bar
    ===
    
  10. {.foo} Bar
    ===
    
  11. Bar {.foo} 
    ===
    
  12. Bar
    {.foo} 
    ===
    
  13. Bar
    {.foo} ===
    
  14. Bar
    === {.foo} 
    
  15. Bar
    ===
    {.foo} 
    
  16. Bar
    ==={.foo}===
    

Well at least some of those might be filtered out if we assume that (1) inlines are allowed to have the attributes too and (2) the syntax is the same for inline attributes and block attributes.

I would also ban strange things like option 16, because it would mean special rules for Setext headers. I would argue the rules should be so general that they apply to all blocks as far as possible.

But even then, all those examples and the “before versus after” discussion actually show there is so much room for ambiguity that I am wondering whether keeping the syntax the same for blocks and inlines is really a good idea. If we keep the syntax the same, we would need quite complicated rules to resolve the ambiguities and, worse, users would have to learn them too. It would not be intuitive to use, and even this discussion shows that some people want it before and others after the block.

I would also argue that requiring no delimiting space or newline for inline elements is imho not a good option, because inlines can already be quite long and hard to wrap reasonably across multiple lines (e.g. an inline link with a longer URL), and adding an attribute would make it even worse.

So what’s your opinion about making the syntax for blocks and inlines different, so that it explicitly says what it is associated with, yet keeping the two very similar, so that parsers can share most of the code and human brains can see them as related and learn the syntax just once? For example, {.class #id attr=value} would be for inlines and {{.class #id attr=value}} for blocks (or maybe vice versa; I would keep the current syntax for whichever is more common).
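To show how little extra parsing the doubled-brace idea would need, here is a small Python sketch that tells the two forms apart and hands the shared inner text to a common attribute parser. The function name `classify` is hypothetical, and the sketch assumes braces never occur inside the attribute text.

```python
import re

# Same inner syntax, different delimiters: {{...}} for blocks, {...} for inlines.
BLOCK_ATTR = re.compile(r"^\{\{([^{}\n]*)\}\}$")
INLINE_ATTR = re.compile(r"^\{([^{}\n]*)\}$")

def classify(text):
    """Return ('block' | 'inline', inner text), or None if neither form matches."""
    text = text.strip()
    match = BLOCK_ATTR.match(text)
    if match:
        return "block", match.group(1).strip()
    match = INLINE_ATTR.match(text)
    if match:
        return "inline", match.group(1).strip()
    return None

print(classify("{{.class #id attr=value}}"))  # ('block', '.class #id attr=value')
print(classify("{.class #id attr=value}"))    # ('inline', '.class #id attr=value')
```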

It would allow the block attribute to be placed just before the block, after it, or even inside it (except, most likely, for code blocks), giving users quite a lot of liberty in how to use it.

Surely, it wouldn’t solve all the problems and corner cases (e.g. cases where there is no blank line between blocks), but it would solve many of them.

PS: Inline markup could accept attributes in a lot of places, too:

  1. {.foo}![bar](<baz> "quuz")
  2. !{.foo}[bar](<baz> "quuz")
  3. ![{.foo}bar](<baz> "quuz")
  4. ![bar{.foo}](<baz> "quuz")
  5. ![bar]{.foo}(<baz> "quuz")
  6. ![bar]({.foo}<baz> "quuz")
  7. ![bar](<{.foo}baz> "quuz")
  8. ![bar](<baz{.foo}> "quuz")
  9. ![bar](<baz> {.foo} "quuz")
  10. ![bar](<baz> "{.foo}quuz")
  11. ![bar](<baz> "quuz{.foo}")
  12. ![bar](<baz> "quuz"{.foo})
  13. ![bar](<baz> "quuz"){.foo}

Okay, some of these positions make no sense at all, but would need to be ruled out anyway. Serious options are 1, 2, 5 and 13, perhaps also 6, 9 and 12.
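For comparison, option 13 is the easiest of these to recognise mechanically, because the attribute block simply trails the complete image. The Python sketch below is a deliberately simplified illustration: it only handles angle-bracket destinations with an optional double-quoted title, as in the list above, and the name `parse_image` is hypothetical.

```python
import re

IMAGE_WITH_ATTRS = re.compile(
    r'!\[(?P<alt>[^\]]*)\]'          # ![bar]
    r'\(<(?P<dest>[^>]*)>'           # (<baz>
    r'(?:\s+"(?P<title>[^"]*)")?\)'  #  "quuz")
    r'(?:\{(?P<attrs>[^{}\n]*)\})?'  # {.foo}, only directly after the ")"
)

def parse_image(text):
    """Return the image parts as a dict, or None if text is not an image."""
    match = IMAGE_WITH_ATTRS.match(text)
    if match is None:
        return None
    return match.groupdict()

print(parse_image('![bar](<baz> "quuz"){.foo}'))
# {'alt': 'bar', 'dest': 'baz', 'title': 'quuz', 'attrs': '.foo'}
```

In this sketch a space before the braces, as in `![bar](<baz> "quuz") {.foo}`, leaves `attrs` as `None`, so the braces would fall through to whatever rule handles ordinary text.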

After developing attribute blocks for Stylo, I came up with an implementation that partly follows the draft proposal from @mb21 and the specifications stated in my previous message (which this message replaces).

All examples below, which are loosely inspired by the @mb21 draft proposal, use the following CSS, and I show the final rendering in Stylo:

.blue {
    color: blue;
}
.red {
    color: red;
}
.green {
    color: green;
}
.pink {
    color: pink;
}

This proposal follows the draft proposal on many points but simplifies it on others and adds three new capabilities:

  1. Possibility to add attribute blocks before a block.

  2. Aggregation of all attribute blocks pertaining to a block.

  3. Inline attribute blocks are applied to the inline element defined before them; if there is no such element, they apply to the block in which they are defined: the enclosing block.

There is also a difference from my last proposal: instead of allowing attribute blocks after only for terminating blocks, I followed the simpler rule of allowing an attribute block on the line below any block, as the draft proposal suggests.

On the simplifications side,

  • no requirement for an attribute block to match a block’s indentation in order to apply to it, unless normal parsing implies such a necessity, e.g. a list of paragraphs

  • also removed is the necessity to have spacing (or no spacing) between the attribute blocks and the elements. The three rules below cover all cases without the need for such rules.

  • no line feeds are allowed inside attribute blocks. This last rule could have become a problem as more attributes are added to an element, but since aggregation is supported they can just be added separately.
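As a rough illustration of how the single-line restriction and aggregation fit together, the Python sketch below takes several one-line attribute blocks that target the same element and merges their tokens. The name `aggregate` is hypothetical, and splitting on whitespace is a simplifying assumption that would break quoted values containing spaces. How conflicting keys across blocks are resolved is not stated above, so the sketch only concatenates tokens and leaves that open.

```python
import re

# A valid attribute block sits on one line: no braces or line feeds inside.
ATTR_LINE = re.compile(r"^\s*\{([^{}\n]*)\}\s*$")

def aggregate(lines):
    """Merge consecutive one-line attribute blocks into a single token list."""
    tokens = []
    for line in lines:
        match = ATTR_LINE.match(line)
        if match is None:
            raise ValueError(f"not a single-line attribute block: {line!r}")
        tokens.extend(match.group(1).split())
    return tokens

print(aggregate(["{.blue}", "{#intro}", "{data-x=1}"]))
# ['.blue', '#intro', 'data-x=1']
```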

Rules

So, here are the modified/new rules:

  1. if an attribute block is one line below a non-attribute block, it is always assigned to that block (the one before). “One line below” here means there are no blank lines between the end of the preceding non-attribute block and the attribute block; otherwise we get unintuitive results with list continuation, where a list is terminated by an attribute block a couple of lines below that still counts as being on the line below because of lazy list continuation.

  2. if an attribute block is placed before (see definition of before and after below*), it is assigned to the first non-attribute block element after it

  3. otherwise, it is assigned to the first element to its left on the same line, unless this element is an attribute block, in which case it applies to the first block it is contained in.
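The first two rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document is modelled as a flat list of `("block", name)`, `("attrs", text)` and `("blank", None)` entries at a single nesting level, the function name `assign` is hypothetical, rule 3 (inline placement) is left out, and the rules are applied literally, so an attribute block stacked directly under another attribute block is treated as a “before” block.

```python
def assign(entries):
    """Map each block-level attribute block to the block it applies to.

    Returns a list of (attribute text, target block name or None).
    """
    result = []
    for i, (kind, value) in enumerate(entries):
        if kind != "attrs":
            continue
        previous = entries[i - 1] if i > 0 else None
        if previous is not None and previous[0] == "block":
            # Rule 1: directly below a non-attribute block, no blank line.
            result.append((value, previous[1]))
            continue
        # Rule 2: otherwise attach to the first non-attribute block after it.
        target = None
        for later_kind, later_value in entries[i + 1:]:
            if later_kind == "block":
                target = later_value
                break
        result.append((value, target))
    return result

entries = [
    ("attrs", "{.content}"),
    ("block", "Some text."),
    ("blank", None),
    ("block", "Author, could you make this sentence longer?"),
    ("attrs", "{.comment}"),
]
print(assign(entries))
```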

*An attribute block can be defined inline, e.g. inside a paragraph, or as a block, at the root or inside another block. _Before_ or _after_ refers to the position of an attribute block relative to another block-level element under the same block or the root. The notion is quite intuitive in fact:

Attribute block after:

Example 1:

Or:

Example 2:

Attribute block before:

Example 3:

Or:

Example 4:

To be considered a block, the attribute block must not be followed by anything other than whitespace:

Example 5:

Some examples:

ATX Headers:

Before:

Example 6:

inline:

Example 7:

or after:

Example 8:

are allowed.

Or inline:

Example 9:

One of the goals of this way of specifying attributes is to keep the same parsing as we would have without attribute block handling. In this case, putting the attribute block after the closing header sequence would result in the attributes being applied to the whole content, including the closing header sequence, which becomes part of the content because it no longer closes the header. So this would probably be unexpected:
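The effect can be reproduced with a simplified ATX heading parser. The Python sketch below assumes the CommonMark rule that a closing sequence of #s may be followed by spaces or tabs only; it ignores backslash escapes and other details, and the name `atx_content` is hypothetical.

```python
import re

# Opening sequence: up to 3 spaces, 1-6 '#', then whitespace or end of line.
ATX = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+|$)(.*)$")
# Closing sequence: '#'s preceded by whitespace, followed only by whitespace.
CLOSING = re.compile(r"(?:^|[ \t]+)#+[ \t]*$")

def atx_content(line):
    """Return (level, content) for an ATX heading line, or None."""
    match = ATX.match(line)
    if match is None:
        return None
    content = CLOSING.sub("", match.group(2), count=1).strip()
    return len(match.group(1)), content

print(atx_content("# Bar #"))         # (1, 'Bar')
print(atx_content("# Bar {.foo} #"))  # (1, 'Bar {.foo}')
print(atx_content("# Bar # {.foo}"))  # (1, 'Bar # {.foo}')
```

In the last call the `#` is no longer a closing sequence, so it stays in the heading content together with the attribute block.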

Example 10:

horizontal rule

Example 11:

Following these rules, we don’t care whether there are spaces between the attribute block and the horizontal rule:

Example 12:

In my implementation, a line feed inside an attribute block invalidates it, as do blank lines inside it:

Example 13:

Setext headers

Example 14:

fenced code block

As in the draft proposal, an attribute block can be put in place of the usual params or “info string” and become “syntactic sugar for classes”.
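One reading of this is that a plain info string and an attribute block holding a single class end up as the same class list. The Python sketch below illustrates that reading only; the name `fence_classes` is hypothetical, and it ignores ids and key/value pairs inside the braces.

```python
def fence_classes(info_string):
    """Return the class list implied by a fenced code block's info string."""
    info = info_string.strip()
    if info.startswith("{") and info.endswith("}"):
        # Attribute block in place of the info string: keep the .class tokens.
        classes = []
        for token in info[1:-1].split():
            if token.startswith(".") and len(token) > 1:
                classes.append(token[1:])
        return classes
    # Plain info string: treat the first word as a single class.
    return info.split()[:1]

print(fence_classes("python"))               # ['python']
print(fence_classes("{.python}"))            # ['python']
print(fence_classes("{.python .numbered}"))  # ['python', 'numbered']
```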

But attributes can be put before or after as with any other block element:

Before:

Example 15:

After:

Example 16:

Replacing the params:

Example 17:

Reference Links

In the reference links case, there is no need for spaces before the attribute block:

Example 18:

In the previous example, only the attributes of the first definition are propagated, since it is the active reference (because it is the first one).

For now, in Stylo, attributes are propagated to the referencing links and images. I am still not sure about this feature though. For me it aggravates the problem of non-locality of the information and adds the necessity to look at the reference to know the attributes applied to the link or image. It is acceptable but not an ideal situation.
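The first-definition-wins behaviour can be sketched in Python as below. It mirrors the CommonMark rule that, among several matching reference definitions, the first one in the document is used, with labels compared case-insensitively after collapsing whitespace. The function name `collect_references` and the tuple layout are hypothetical, and this is not Stylo’s actual code.

```python
def collect_references(definitions):
    """Build a reference map from (label, destination, attrs) tuples.

    Definitions are expected in document order; later duplicates are ignored.
    """
    references = {}
    for label, destination, attrs in definitions:
        # Normalise the label: case-fold and collapse runs of whitespace.
        key = " ".join(label.casefold().split())
        # setdefault keeps the existing entry, so the first definition wins.
        references.setdefault(key, (destination, attrs))
    return references

definitions = [
    ("foo", "/url1", {"class": "blue"}),
    ("FOO", "/url2", {"class": "red"}),
]
print(collect_references(definitions))
# {'foo': ('/url1', {'class': 'blue'})}
```

A link or image that refers to `[foo]` would then pick up both the destination and the attributes from this single entry, which is the propagation described above.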

Paragraphs

As usual, attributes are supported before, inline and after:

Before:

Example 19:

Inline:

Example 20:

After:

Example 21:

Inline blocks apply to the previous inline element unless this previous inline element is itself an attribute block:

Example 22:

Example 23:

If the previous element is an attribute block, then the attributes apply to the enclosing block:

Example 24:

There is no need for the attribute block to be indented exactly as much as the first line of the paragraph; if it is on the line below, it applies to the paragraph:

Example 25:

Block quotes

There is no need for the attribute block to be indented like the block quote to apply to it; being on the line below at the same level is sufficient:

Example 26:

Example 27:

The same rules apply inside a block quote:

Example 28:

Block indentation does not change anything:

Example 29:

But as we can see, depending on where the attributes belong, they apply to different blocks (remember that red has higher priority than blue in the CSS style):

Example 30:

Example 31:

Example 32:

Example 33:

Lists

Example 34:

Attribute blocks that are not directly under a block are applied to the following one:

Example 35:

The same goes for lists inside another bloc, here a blockquote:

Example 36:

To get the red color applied to the list, we need to put the attribute block directly under it:

Example 37:

We can apply different attributes to the lists:

Example 38:

Example 39:

Example 40:

Example 41:

Example 42:

Example 43:

Example 44:

Pink class attribute is put on top:

Example 45:

The attributes can be put before the list too:

Example 46:

inline code

The spaces don’t matter:

Example 47:

or

Example 48:

Example 49:

emphasis

Example 50:

Example 51:

links

Example 52:

Example 53:

Example 54:

images

Example 55:

Example 56:

Example 57:

reference

Example 58:

Example 59:

Example 60:

Interesting, glad my proposal helped you!

Personally, I don’t feel too great about having so many ways to write the same thing. But please let us know about your experience in using this implementation… i.e. how it works out in practice for you (and users of your software).
