Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adopting a text-based diagram syntax in Jupyter Markdown #101

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

bollwyvl
Copy link

@bollwyvl bollwyvl commented Mar 14, 2023

Hello folks! This JEP proposes adopting this syntax:

```mermaid
flowchart LR
  chicken --> egg --> chicken
```

Which renders all over as:

flowchart LR
    chicken --> egg --> chicken
Loading

It was written entirely within a checkout of this PR, which you can try out directly on Binder.

Thanks!

Resolve #100

Voting from @jupyter/software-steering-council

@bollwyvl
Copy link
Author

Hi folks! It's been a week, and we've collected a few emoji ❤️ .

it was my hope to do the work to get this into the JupyterLab 4/Notebook 7 release window, which is rapidly closing.

Any feedback welcome!

@manics
Copy link
Contributor

manics commented Mar 22, 2023

What are the expectations on Jupyter clients that comply with this addition? For example, are they expected to:

  • use mermaidjs 10 (or version X) for rendering all mermaid diagrams
  • display all mermaid diagrams "in the same way" as mermaid 10 (or version X) but using their own code if they want?
  • interpret mermaid diagrams in any way they wish?
  • display only the non-experimental diagram types, e.g. C4 Diagrams has the warning:

    C4 Diagram: This is an experimental diagram for now. The syntax and properties can change in future releases. Proper documentation will be provided when the syntax is stable.

@bollwyvl
Copy link
Author

bollwyvl commented Mar 22, 2023

interpret mermaid diagrams in any way they wish?

As a "living" dependency with maintenance costs, I think this has to be the assumption: as noted, changes can happen in minor versions, and 9.x is still getting releases, which speak well the the upstream maintainer attention to detail.

That being said, the flowchart diagrams originally authored from very early in mermaid's release history still represent fairly well in 10.x. And everyone changes grammars now and again, or even the meaning of existing keywords, if a better algorithm comes along (e.g. in graphviz prior to 2.28 overlap=false meant voronoi, but now means prism, but would need voronoi to draw the same way).

Perhaps if rendering fails, the right play is to fall back to the the raw text, which could convey some meaning... ideally with grammar-aware syntax highlighting, but then that can "fail".

only the non-experimental diagram types

Perhaps. But I think throwing the kitchen sink, including the new, sharp knives, isn't the worst play. The one that kills me is the ElkJS backend for flowchart: it's super heavy on the wire and behind a feature flag, but super powerful. And mindmap is the sole user of cytoscape.js (I think).

@bollwyvl
Copy link
Author

@manics thinking more about it: i've also updated the JupyterLab PR with a treatment of how an "unparseable" might be treated... moving it down to the "warning" class of feedback seems reasonable, as again, you can still kinda tell what is going on in the diagram by reading the text.

as-rendered click on the diagram text/arrow
a screenshot of a warning from the parser a screenshot revealing the parser message

and the full `d3` metapackage. Some advanced features use the `cytoscape` and `elkjs`
_rendering engines_, but are only loaded when needed.

## Unparseable Diagrams
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manics here's me expounding a bit more

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for reference, here's how GitHub handles unparseables:

flowchart LR
a --> b -->
Loading

Note that it:

  • puts the (undismissable) error up front with (unhighlighted) source below
  • does not provide the (semi-) helpful grammar message
  • allows for copying (but not viewing) the raw source


...and they should _mostly_ render the same way everywhere.

_MermaidJS 10_ also includes a number of new diagrams, not available on earlier major
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manics i've split out the diagrams and features not available in <10

@psychemedia
Copy link

I note the proposed syntax diverges from the syntax used by MyST, which is the flavour of md preferred for writing Jupyter Book documents.

For example, via https://myst-tools.org/docs/mystjs/diagrams :

It is possible to add mermaid diagrams using the {mermaid} directive, for example:

```{mermaid}
flowchart LR
 A[Jupyter Notebook] --> C
 B[MyST Markdown] --> C
 C(mystjs) --> D{AST}
 D <--> E[LaTeX]
 E --> F[PDF]
 D --> G[Word]
 D --> H[React]
 D --> I[HTML]
 D <--> J[JATS]
```

The jupyterlab-myst extension supports stylised rendering of several MyST admonition types, including mermaid:

jupyterlab-myst-mermaid

There are also various flavours of IPython magic that support text2diagram rendering (example).

@westurner
Copy link

GitHub has just mermaid (instead of {mermaid}) what is the jupyterbook syntax and does JupyterBook also support what just works on GitHub:
https://github.blog/2022-02-14-include-diagrams-markdown-files-mermaid/ :

```mermaid
  graph TD;
      A-->B;
      A-->C;
      B-->D;
      C-->D;
```

@psychemedia
Copy link

psychemedia commented Apr 26, 2023

@westurner The MyST parser renders the ```{mermaid} initiated block correctly, but not ```mermaid. I note also that Quarto uses the ```{mermaid} syntax [docs].

In general, I get the feeling {} route has certain advantages, not least because it allows passage of additional contextual information in the top line of the block (for example, ```{admonition} Title, ```{figure} IMAGE-URL etc.

From a "content creator" end user perspective, writing content, the {} approach appears to have more traction. From a developer perspective, the Github flavoured route may be more attractive.

@psychemedia
Copy link

psychemedia commented Apr 26, 2023

In MyST, the convention is ```CODE-LANGUAGE for executable code blocks (docs) whereas in Quarto/qmd the curly-bracketed syntax is used for executable content too (```{python}, ```{r}) [docs].

In the .ipynb context, Jupyter markdown is distinguished separately from code within the "markdown" vs. "code" keyed JSON object. But as Jupytext shows, it is quite possibly to use extended markdown or extended Python formats to represent notebook content that excludes cell outputs.

@krassowski
Copy link
Member

I believe that the discussion on the use of braces or not is orthogonal to this JEP. Jupyter (Notebok, Lab, etc) currently de facto uses GtiHub-flavoured markdown and this JEP follows that; there is a separate JEP about allowing other Markdown flavours: #99 and if both will be accepted everyone will be happy. But should only this one be accepted, I think we should not introduce discrepancies by changing to braced format for only one singular case.

@rgbkrk
Copy link
Member

rgbkrk commented May 3, 2023

👍 to supporting whatever GitHub flavored markdown supports. Too bad they didn't include mermaid in their spec.

@westurner
Copy link

@westurner The MyST parser renders the ```{mermaid} initiated block correctly, but not ```mermaid. I note also that Quarto uses the ```{mermaid} syntax [docs].

In general, I get the feeling {} route has certain advantages, not least because it allows passage of additional contextual information in the top line of the block (for example, ```{admonition} Title, ```{figure} IMAGE-URL etc.

From a "content creator" end user perspective, writing content, the {} approach appears to have more traction. From a developer perspective, the Github flavoured route may be more attractive.

FWIW YAML (and YAML-LD with an implicit @context) probably has a better-defined one-line grammar than the existing MyST Markdown parser and JupyterBook.

@westurner
Copy link

  • Value of YAML-LD in MyST in: code-fence attr syntax:
    • We could add schema.org JSON-LD metadata to inlined code blocks, datasets etc

@ivanov
Copy link
Member

ivanov commented Jun 26, 2023

I want to start off by conceding that having a text-to-graphic diagramming standard within the notebook would be useful, and acknowledge the significant amount of effort and thoroughness that's been put into preparing this JEP.

One of the challenges I see with this proposal is that an external tool is being used without reference to the version of the tool that is part of the document containing the diagrams.

To make this discussion more concrete: Mermaid JS adds a new chart types in minor versions, 10.2.0 introduced quadrant chart, for example. A new user could start using these quadrant charts, or other Mermaid JS features which are not in the version of Mermaid JS that shipped with whatever Jupyter tool they are using, and be frustrated when it doesn't work. Similarly, two users sharing a notebook document but using different versions of the same Jupyter tool which happen to bundle different versions of Mermaid could end up with a proper render of a diagram for one of them, and an un-rendered/error diagram in another. This doesn't have to be different users, the same situation can arise from one user using a combination of Jupyter tools (a JupyterLab version that has adopted the latest Mermaid JS version, and an nbconvert version that is using an older version).

I hope you can understand how an unspecified and possibly changing dependency can also lead to a complement problem, where diagrams that used to render stop rendering as features in MermaidJS are deprecated and removed, and the Mermaid JS version changes.

One of the differences I see between this and an analog in code cells caused by syntax changes in programming code languages is that at least the previous rendered results are available when receiving a notebook document, even if the recipient lacks a kernel matching programming language version.

Finally, it seems like this introduces a javascript runtime environment requirement for converting a notebook to a PDF document. Notebooks using mermaid syntax will have to either be rendered via webpdf or would have to pass though a Mermaid filter for pandoc (I found a couple, they use the Mermaid CLI which requires nodejs: https://github.com/raghur/mermaid-filter and https://github.com/timofurrer/pandoc-mermaid-filter). I do not believe we've had such a requirement in the past.

For these reasons, I am voting against this proposal. I would prefer an alternate path where we come up with a way to try out extensions to markdown like this without committing the project to a particular one. That way, even if JupyterLab and Notebook 7+ ship supporting mermaid syntax, others are also possible, including the ones mentioned in the JEP, or still others, like Wavedrom, or ones that haven't been made yet. One way to think about this JEP is that it adds a secondary "kernel" to documents, and further blurs the operational distinction between code and markdown cells, as markdown cells now contain "code" (in the form of diagram syntax) and that new markdown "code" relies on a particular "kernel" (version of mermaidJS). But I can also see how this could be a staring point for further refinement later, the way we used to have words that were specialized for Python (what started as "pyin" and "pyout" instead of the current "execution_count" and "execution_result").

Should this proposal pass as is, I would like to encourage further work to address cross-compatibility, such as capturing the Mermaid JS version used during authoring into the ipynb document, so other tools have a chance of resolving the ambiguity. To me, the alternative of capturing the MermaidJS version within the notebook document format seems like it would lead to too many minor version updates just for this one component.

Another proposal to address some of these concerns would be to add a rendered version of the diagrams (SVGs) as attachments to the markdown cell, with some convention for how to unambiguously use a pre-rendered version in cases where rendering is unavailable (no javascript) or fails (MermaidJS version mismatch).

@echarles
Copy link
Member

echarles commented Jun 27, 2023

Based on @ivanov comment on #101 (comment) and on question shared yesterday at the SSC meeting by @ibdafna I am converting my Yes to an Abstain.

I am also formalising here some feedbacks/questions I have asked yesterday during yesterday meeting:

  1. This JEP is very detailed and logical which is great, my doubt is rather on the foundation on which it applies. We don't have a well defined Markdown definition across all Jupyter projects as far as I understand, so building on sand will not drive us far.
  2. MyST is an option to have that rock-solid common definition. It sounds that embracing MyST is a much broader scope than this JEP, however I would prefer having that discussion to take informed decision on this JEP. Based on the recent experience on the JEP, I don't except a potential discussion around MyST to happen later that sooner, so please carry-on without that parameter.

@echarles
Copy link
Member

How this PR relates/articulates/coordinates with #99 and #103?

@bollwyvl
Copy link
Author

#103

Presumably it is orthogonal, as this proposed change only adopts existing, widely-adopted syntax with relatively sane failure modes. Put differently, if something in 103 breaks due to this PR, it will probably break other de facto GFM+ features employed on many other platforms.

#99

Welp, it would have to pile on a significant amount more metadata to be precise.

To dovetail with one of @ivanov's key points: let's not be overly-harsh on little old mermaid: we have a bunch of non-portable things done in markdown (and other Jupyter-adopted %languages) that result in pixel-level, or worse, differences: MathJax2, MathJax3, MathJax4, KaTeX, different math fonts, localized typefaces, and they are all extendable at pretty much all levels. If we think mermaid is heavy, wait until every tool would have to ship the matrix of all of the above!

the following is getting totally off topic, now, please ignore for the purposes of reviewing this proposal...

Frankly, I'm of a mind that the current model of many best effort tools (e.g. markdown, widgets, etc) really don't provide the level of reproducibility we yearn for. The above-mentioned "typesetting kernel," (also discussed in a number of other places) is indeed likely the high road.

I think the path there would pretty much abolish cell_type, making source a mimebundle-enabled "rich input" so that "A The Markdown" cell would just be "a self-contained cell my fancy whizzywhig client knows how to render into text/html" such that a downstream viewer might only have to view it as a completely standards-compliant, isolated document fragment. Capture the whole, standards-based description of the used (not installed) renderers. But then extend this thinking to the backend: kernels could tell us what syntax deviations they support, and almost certainly could turn text/x-ipython into text/python. And provide a description of their environments, with tools such as SPDX.

I'd go on to abolish cell_id, hoisting that content to the keys of a cells map, which manages ordering in some other way (linked list, "weight", whatever). Then all cells (and by extension all cell components, pretty much down to the line-of-code level) would be concretely addressable with a standards-compliant URL patterns. This would then allow transcluding them, ideally via some built in, accessibility-first technique, such as a picture of a duck. And, as a side-effect: real annotation and commenting.

But these are topics for a whole other day, the discussion around which would be... greatly facilitated by having a portable diagram syntax! ;P

rock-solid

Mermaid has been running like a champ for just a hair shy of a decade, and has a (number of) formal, closed-form grammars, but indeed, just a single implementation. I'd just like to be able to use it in Jupyter tools. This proposal tries to make as few uninformed claims about other syntaxes/tools/features... just the diagram ones that have proven stable enough for me to spend my time (trying) to implement against, and my findings in using them.

@fcollonval
Copy link
Contributor

Following the discussion at the SSC meeting yesterday, the mermaid JS website lists the available integrations. In addition to Git webservice (such as GitHub), online editing platforms are also integrating it, like Notion, Observable, HackMD.

@bollwyvl
Copy link
Author

online editing platforms are also integrating it

Yep, this is the called out in the summary as one of the main motivators of why this proposal concentrates on just one diagram format with just one markdown syntax. Even if a platform doesn't offer this syntax natively, open source platforms Jupyter already uses (such as Discourse) can enable it via first-party plugin.

That Discourse plugin offers one "metadata" extension to the syntax (~~~mermaid height=500) which is actually super useful for page layout to avoid heavy visual page reflows, but probably isn't widely-enough implemented to commit to supporting everywhere.

@rpwagner
Copy link

open source platforms Jupyter already uses (such as Discourse) can enable it via first-party plugin.

I think the distinction between hosted services, such as GitHub or Discourse, and standalone applications that need to support the exchange of documents in an agreed upon format (e.g., Jupyter notebooks) is important. Looking at the list of integrations I see that code editors exclusively use plugins to enable Mermaid.js support, and that nearly all the documentation generations tools do, as well. This suggests that the current practice for standalone applications is to recommend or maintain a well-supported plugin.

@Zsailer
Copy link
Member

Zsailer commented Jul 31, 2023

Hey folks, I think we should pause voting here temporarily—

I think the voting pattern on this JEP presents an interesting problem we haven't faced before; we should use this opportunity to learn/define how to handle this type of situation going forward.

While this JEP technically has more "Yes" votes (4) than "No" votes (1) at this point, there is an equal number of abstentions (4). It would make me extremely uncomfortable to pass this JEP without more buy-in from the SSC.

Further, when I read this thread, the temperature in the room that I'm perceiving is that these abstentions as "soft no"s—folks feel hesitant towards the proposal, but don't want to be the blocker to acceptance. If we are seeing a majority of these types of abstentions, (I believe) this raises a flag.

I believe SSC should work together to ensure that every member has enough time/info to formulate a firm opinion on each JEP (also, the SSC should ensure this happens in a timely manner—we'll get better at this over time). Otherwise, abstentions can become a crutch and allow JEPs to pass without more thorough investigation.

I think this JEP needs more discussion before voting, so I propose we pull it out of voting phase temporarily. I'll mention this in our next SSC meeting.

I want to thank @bollwyvl for proactively engaging every question/concern mentioned here and giving thorough responses. Also, thank you to everyone who has given careful comment/feedback on the JEP so far.

I think this exposed a gap in the SSC that we need to address. I've opened an issue on the SSC team-compass to discuss further, jupyter/software-steering-council-team-compass#4. I'll update this thread on our discussion after next week's meeting.

@echarles
Copy link
Member

This JEP is very detailed and logical which is great, my doubt is rather on the foundation on which it applies. We don't have a well defined Markdown definition across all Jupyter projects as far as I understand, so building on sand will not drive us far.

In this particular JEP, I have abstained not because of the JEP content itself (and I really want to thank @bollwyvl for the quality work he is doing here, and at a lot of other places), but rather because of the lack of definitions of Markdown outside of this JEP. This is not the first time I have similar feeling where a JEP as such sounds good, and it is easier to say yes to it, instead of looking wider, which in general, sounds synonym to "blocking", or "moving much slower". I don't expect anyone to agree that it is preferable to look wider, instead of having tiny steps even on unstable and unclear things, but I felt maybe useful to share the reasons for my abstention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pre-proposal: Adopting a text-based diagram syntax in Jupyter Markdown