Blank lines before lists, revisited

jkdev noreply@talk.commonmark.org writes:

It seems sensible to require at least two numbered list items.

How many situations are there in which someone actually wants to 
interrupt a paragraph with a single-item numbered list? Probably
1. Or maybe a couple more. Other than that, a numbered list would
be created by happenstance, without intending or expecting it.

It’s actually pretty common in some contexts. For example, in a
paper one might discuss a number of numbered examples, with regular
paragraph text in between. I’ve frequently used one-item numbered
lists (with different start numbers) for this kind of thing.

To weigh in on the proposals:

(1) (single-digit start numbers) seems okay to me. It might fix some
practical problems (and might cause some others). On the other hand,
even with this fix we’d have a mismatch between parser behavior and the
spec of the sort described in commonmark/cmark#204.

(3) is bad for the reason I just gave in my previous post.
One-item lists are, in general, useful, so we don’t want to
rule them out altogether. I suppose we could require at least
two items when the list interrupts a paragraph, but that creates
difficulties for parsing: you can’t know you’ve got a list until
you’ve parsed the whole list item and seen what comes after it.

(4) has a similar issue: we can’t tell if the list is going
to be loose unless we’ve parsed the whole thing. And I’m not
sure what is gained by allowing only tight lists to interrupt
a paragraph.

(5) is going to be problematic for people who wrap their text
to a reasonable width (say, 72 characters), and also for people
who don’t hard-wrap at all. And I echo @mity’s worry about surprising
behavior.

(2) seems the most promising to me, but there is the worry
about languages with different punctuation conventions.

I guess it might be a good idea at this point if someone
summarized clearly and concisely why we need to change things.
My own intervention above was motivated by
https://github.com/commonmark/cmark/issues/204, but I believe
that issue could be handled at the implementation level without
a change in the spec. At any rate, I’ve written a Haskell
implementation, roughly following the same strategy as cmark,
which gives the right results in this case.

An enumerated example is not a list item. I don’t think this counts as a strong argument against (3). I did assume a minimum of two list items would only be expected in lists that are not preceded by a blank line. If this makes it too complicated, I’m fine with dropping this idea.

The reasoning behind (4) is that a tight list could well be a child of a paragraph (in output formats which support this nesting), whereas a loose list, which can contain paragraphs (i.e. blank lines) itself, seems strange inside a paragraph and thus could only end it.

<p><list.tight/></p>

<p/><list.loose/><p/>

Anyhow, I prefer (2), too. The colon at the end of the line preceding a list works in two ways:

  1. Existing content in many languages will have a colon introduce a list without an intervening blank line. It works as a heuristic rule.
  2. New content in any language can be authored with the colon as a new active markup character.

The problem is that for much of variant 1 the colon should be retained in the output, whereas it should be dropped for many cases of variant 2.
This can be done with an additional rule, but I’m not sure whether that would still be acceptable.

Christoph Päper noreply@talk.commonmark.org writes:

An enumerated example is not a list item.

Well, a list item is the closest thing in commonmark to represent it
with. If you make it a regular paragraph, the indentation will be
wrong and it won’t stand out.

The reasoning behind (4) is that a tight list could well be a child of
a paragraph (in output formats which support this nesting), whereas a
loose list, which can contain paragraphs (i.e. blank lines) itself,
seems strange inside a paragraph and thus could only end it.

I see. But the way the spec is designed, a paragraph can never resume
after a tight list either. So, without much larger changes, (4) doesn’t
seem motivated.

Anyhow, I prefer (2), too. The colon at the end of the line preceding a list works in two ways:

  1. Existing content in many languages will have a colon introduce a list without an intervening blank line. It works as a heuristic rule.
  2. New content in any language can be authored with the colon as a new active markup character.

The problem is that for much of variant 1 the colon should be retained in the output, whereas it should be dropped for many cases of variant 2.
This can be done with an additional rule, but I’m not sure whether that would still be acceptable.

For something a bit like this, see
http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#literal-blocks

‘As a convenience, the “::” is recognized at the end of any
paragraph. If immediately preceded by whitespace, both colons will be
removed from the output (this is the “partially minimized” form). When
text immediately precedes the “::”, one colon will be removed from the
output, leaving only one colon visible (i.e., “::” will be replaced by
“:”; this is the “fully minimized” form).’

So, we could say that if the final colon is preceded by whitespace,
it gets removed.
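
To make the quoted reStructuredText rule concrete, here is a minimal Python sketch of the trailing "::" handling it describes. The function name `minimize_trailing_colons` is made up for illustration; it is not taken from docutils or from any CommonMark implementation, and it only looks at the end of a single paragraph string.

```python
def minimize_trailing_colons(paragraph: str) -> str:
    """Apply the quoted rST convention to a paragraph ending in '::'.

    Hypothetical helper for illustration only.
    """
    if not paragraph.endswith("::"):
        return paragraph  # nothing to do
    body = paragraph[:-2]
    if body == "" or body[-1].isspace():
        # '::' preceded by whitespace: both colons are dropped
        return body.rstrip()
    # text immediately before '::': one colon stays visible
    return body + ":"
```

Under these assumptions, `Example ::` loses both colons, while `Example::` keeps a single visible colon.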

<space><colon> would not work well with French practice, I guess. I thought about leaving a single colon (:) as is, but removing both if there are two (::).

I honestly believed that list items needed to have some whitespace and not start at the left margin.

What if list items needed either a blank line separating them from the paragraph or some initial whitespace before the first bullet/number? Then you wouldn’t get accidental list items from linewrapped paragraphs.
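
As a rough illustration of this proposal (not of the current spec), the check could look like the Python sketch below. The names `LIST_MARKER` and `starts_list` are hypothetical, the regular expression is a simplification of CommonMark's list markers, and the limit of three leading spaces is an assumption borrowed from the spec's rule that four or more spaces begin an indented code block.

```python
import re

# Simplified list marker: 0-3 leading spaces, then a bullet or a number
# followed by '.' or ')', then at least one space and some content.
LIST_MARKER = re.compile(r"^( {0,3})(?:[-+*]|\d{1,9}[.)]) +\S")

def starts_list(line: str, previous_line_blank: bool) -> bool:
    """Proposed rule: a list needs a blank line before it OR leading whitespace."""
    match = LIST_MARKER.match(line)
    if match is None:
        return False
    indent = len(match.group(1))
    return previous_line_blank or indent > 0
```

With this rule, a hard-wrapped line such as `1986. A great year` stays paragraph text when it directly follows another paragraph line, because it starts at the left margin.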

I like that kind of flexibility. You could even mix-and-match blank lines and initial whitespace according to your preferences.

For example: a blank line for a new list, and initial whitespace / additional indentation for sublists.

Paragraph text here.

1. List after blank line.
    1. Sublist after indentation.
2. Another list item.
    1. Another new sublist after indentation.

This is very readable and intuitive, and backwards-compatible as well.

I’ve thought about this a lot. @Crissov’s #3 makes the most sense. Every example of why a list should be able to interrupt a paragraph, on this thread and in others on this forum, has at least two items in the example. e.g.:

If you think about it, I’m pretty sure that’s how humans parse it. Compare

In Markdown 0.8 and earlier and version
1. This line turns into a list item.

and

In Markdown 0.8 and earlier and version
1. This line turns into a list item. Also in version
2. This line turns into another list item.

I think humans parse the latter as a list, irrespective of the content, at least until they read it. In such cases the plain text author is likely to see it that way too, and will do something to fix it, e.g. move the numbers to the end of the preceding line.

It’s the pattern that makes us see it as a list. Given that this is a plain text format designed for how humans read, it makes sense to have rules that jibe with that.

As to parsing, @jgm, looking at the source code for commonmark.js, it doesn’t seem that hard to peek ahead to the same text column on the next line. Am I wrong?

As to parsing, @jgm, looking at the source code for commonmark.js, it doesn’t seem that hard to peek ahead to the same text column on the next line. Am I wrong?

Imho, you are wrong, because you may need to peek much further than that, and also in a non-trivial way. Consider:

  • Loose list (there may be a blank line): You may need to peek after a blank line.
  • Multi-line list item: You may need to peek to the 1st line after the potential list item ends, but it may be long.
  • Nested list in the 1st item: You may need to peek after the nested list ends.
  • All of it combined together.

So peeking is more like full block parsing that looks speculatively ahead (without any hard limit) until we know there is a 2nd item, and then possibly reverts if there is not one.

And when I say block parsing looking speculatively ahead (without any hard limit), I worry that a malicious input could exploit such a feature, leading to O(n^2) parsing times.

EDIT: And also note the nested list in the 1st list item may need the same treatment, so you may need to perform a speculation in speculation:

Lorem ipsum.
1. Is this 1st item of a list?
   1. Is this 1st item of a sub-list? Note we still do not even know yet whether there is a parent list...
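
A rough Python sketch of the lookahead described above, under heavy simplifying assumptions: only ordered lists, no lazy continuation lines, and no speculation for nested lists. The names `MARKER` and `peek_for_second_item` are hypothetical and do not correspond to anything in cmark or commonmark.js. The point is only that the scan has no fixed bound: it must skip blank lines and everything belonging to the first item before it can decide.

```python
import re

# Simplified ordered-list marker: indent, number, '.' or ')', spaces, content.
MARKER = re.compile(r"^( *)(\d{1,9})([.)]) +\S")

def peek_for_second_item(lines, start):
    """Return (found, peeked): whether a sibling item follows lines[start],
    and how many lines had to be inspected to find out."""
    first = MARKER.match(lines[start])
    if first is None:
        raise ValueError("lines[start] is not an ordered list item")
    indent = len(first.group(1))
    content_indent = first.end() - 1  # column where the item's content begins
    delimiter = first.group(3)
    peeked = 0
    for line in lines[start + 1:]:
        peeked += 1
        if not line.strip():
            continue  # blank line: the list may be loose, keep looking
        leading = len(line) - len(line.lstrip(" "))
        if leading >= content_indent:
            continue  # continuation or nested content of the first item
        sibling = MARKER.match(line)
        if (sibling and len(sibling.group(1)) == indent
                and sibling.group(3) == delimiter):
            return True, peeked
        return False, peeked
    return False, peeked
```

Even in this reduced form, the number of lines inspected grows with the length of the first item, and a real parser would have to repeat the same speculation for each nested candidate list.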

@mity CommonMark.js, at least, isn’t streaming, takes two passes, has random access to every line. Still, you may be right.

The intent of my post was more philosophical. I’m working on an idea for how to extend Markdown in a way that stays true to its plain text human reader philosophy. I may even be able to apply that work to help make progress on the open issues keeping us from a 1.0 release.

There are a lot of posts on this forum that seem to not know that philosophy, or seem to not care about it. A lot of posters see Markdown as source code for HTML. I think it would be good if everyone got on the same plain text page.

[PS. I think the current rule, that a list can interrupt a paragraph if it starts with 1, is good. Again, I didn’t mean to reopen an issue that I think is settled. As long as we don’t go back to requiring a blank line always.]

That wouldn’t help. O(n^2), where n is the number of lines (and not bytes), is still bad.

What about requiring an inline space before any number interrupting a paragraph? The spec allows space before starting a numbered list, right (to allow lining  1. up with 10.)?
