paired md:myst format also works as MyST Markdown #1315

Merged (7 commits) on Mar 22, 2025
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -6,8 +6,9 @@ Jupytext ChangeLog

**Changed**
- Jupytext's default contents manager is now derived from the asynchronous AsyncLargeFileManager. Thanks to [Darshan Poudel](https://github.com/Darshan808) for making this finally happen ([#1328](https://github.com/mwouts/jupytext/pull/1328))!
- The [percent format](https://jupytext.readthedocs.io/en/latest/formats-scripts.html#the-percent-format) is now the default at the command line. If you run `jupytext --to py notebook.ipynb` you now get a `py:percent` script (use `--to py:light` for the light format) [#1201](https://github.com/mwouts/jupytext/pull/1201)
- The MyST frontmatter found in MyST Markdown notebooks is now mapped to a YAML header at the top of the Jupyter notebook. This way MyST notebooks in either the `md:myst` or `ipynb` format can be used with MyST. Thanks to [Ian Carroll](https://github.com/itcarroll) for proposing and implementing this change ([#1314](https://github.com/mwouts/jupytext/issues/1314))
- We have added and fixed round trip tests on MyST Markdown notebooks ([#759](https://github.com/mwouts/jupytext/issues/759), [#789](https://github.com/mwouts/jupytext/issues/789), [#1267](https://github.com/mwouts/jupytext/issues/1267), [#1317](https://github.com/mwouts/jupytext/issues/1317))
- The [percent format](https://jupytext.readthedocs.io/en/latest/formats-scripts.html#the-percent-format) is now the default format for scripts. If you run `jupytext --to py notebook.ipynb` you now get a `py:percent` script (use `--to py:light` for the light format) [#1201](https://github.com/mwouts/jupytext/pull/1201)
- The `rst2md` conversion now works with `sphinx-gallery>=0.8`. Thanks to [Thomas J. Fan](https://github.com/thomasjpfan) for fixing this! ([#1334](https://github.com/mwouts/jupytext/pull/1334))
- We have updated the JupyterLab extension dependencies ([#1300](https://github.com/mwouts/jupytext/pull/1300), [#1355](https://github.com/mwouts/jupytext/pull/1355), [#1360](https://github.com/mwouts/jupytext/pull/1360)). Thanks to [Mahendra Paipuri](https://github.com/mahendrapaipuri) for these PRs!

10 changes: 10 additions & 0 deletions src/jupytext/config.py
@@ -105,6 +105,12 @@ class JupytextConfiguration(Configurable):
         config=True,
     )

+    root_level_metadata_filter = Unicode(
Owner

I'd prefer not to add a new configuration option, especially because this one is only implemented for the MyST format.

Do you think it would be feasible to use instead the list of frontmatter fields for moving the frontmatter fields back and forth from the MyST header to the raw cell?

When a field appears in both the frontmatter fields and the notebook_metadata_filter (e.g. the kernelspec), it should remain in the notebook metadata.

And when a field of the frontmatter is not in the raw cell it should not go to the main header but remain in the raw cell (but then we need to extend the existing raw cell with the metadata coming from the frontmatter fields... note to myself: add a test for that case)

Owner

> Do you think it would be feasible to use instead the list of frontmatter fields for moving the frontmatter fields back and forth from the MyST header to the raw cell?

I take that suggestion back: the list of frontmatter fields is actually very long and will most likely change over time! Please keep what you have for now.

Contributor Author

I like the configuration options because I suspect most users will eventually realize that only "kernelspec" should be at the root level (in addition to the raw cell frontmatter). Having the remaining notebook metadata under "jupyter" is much safer, since MyST has many fields and explicitly says "jupyter" is ignored by MyST at the file level.

I sorta began to implement it for other formats by defining a default ("-all") in metadata_filter.py. But well, no, that would be time poorly spent.

Contributor Author

Unless you ask for it to be implemented everywhere; then it would not be time poorly spent!

+        help="Notebook metadata that should be promoted to the root level in the text representations. "
+        "Examples: 'all', '-all', 'kernelspec,jupytext'",
+        config=True,
+    )
+
     cell_metadata_filter = Unicode(
         help="Cell metadata that should be saved in the text representations. "
         "Examples: 'all', 'hide_input,hide_output'",
@@ -203,6 +209,10 @@ def set_default_format_options(self, format_options, read=False):
             format_options.setdefault(
                 "cell_metadata_filter", self.default_cell_metadata_filter
             )
+        if self.root_level_metadata_filter:
+            format_options.setdefault(
+                "root_level_metadata_filter", self.root_level_metadata_filter
+            )
         if self.cell_metadata_filter:
             format_options.setdefault("cell_metadata_filter", self.cell_metadata_filter)
         if self.hide_notebook_metadata is not None:
7 changes: 4 additions & 3 deletions src/jupytext/formats.py
@@ -277,7 +277,7 @@ def read_metadata(text, ext):
     if ext in [".r", ".R"] and not metadata:
         metadata, _, _, _ = header_to_metadata_and_cell(lines, "#'", "", ext)

-    # MyST has the metadata at the root level
+    # metadata in MyST format may be at root level (i.e. not caught above)
     if not metadata and ext in myst_extensions() and text.startswith("---"):
         for header in yaml.safe_load_all(text):
             if not isinstance(header, dict):
@@ -286,7 +286,7 @@ def read_metadata(text, ext):
                 header.get("jupytext", {})
                 .get("text_representation", {})
                 .get("format_name")
-                == "myst"
+                == MYST_FORMAT_NAME
             ):
                 return header
             return metadata
@@ -535,7 +535,7 @@ def rearrange_jupytext_metadata(metadata):
         if key in metadata:
             metadata[key.replace("nbrmd", "jupytext")] = metadata.pop(key)

-    jupytext_metadata = metadata.pop("jupytext", {})
+    jupytext_metadata = metadata.get("jupytext", {})

     if "jupytext_formats" in metadata:
         jupytext_metadata["formats"] = metadata.pop("jupytext_formats")
@@ -734,6 +734,7 @@ def short_form_multiple_formats(jupytext_formats):
 ]
 _VALID_FORMAT_OPTIONS = _BINARY_FORMAT_OPTIONS + [
     "notebook_metadata_filter",
+    "root_level_metadata_filter",
     "cell_metadata_filter",
     "cell_markers",
     "custom_cell_magics",
85 changes: 81 additions & 4 deletions src/jupytext/header.py
@@ -1,6 +1,7 @@
 """Parse header of text notebooks
 """

+import logging
 import re

 import nbformat
@@ -13,7 +14,13 @@
     comment_lines,
     default_language_from_metadata_and_ext,
 )
-from .metadata_filter import _DEFAULT_NOTEBOOK_METADATA, filter_metadata
+from .metadata_filter import (
+    _DEFAULT_NOTEBOOK_METADATA,
+    _DEFAULT_ROOT_LEVEL_METADATA,
+    _JUPYTER_METADATA_NAMESPACE,
+    filter_metadata,
+)
+from .myst import MYST_FORMAT_NAME
 from .pep8 import pep8_lines_between_cells
 from .version import __version__

@@ -156,7 +163,7 @@
     )


-def recursive_update(target, update):
+def recursive_update(target, update, overwrite=True):
     """Update recursively a (nested) dictionary with the content of another.
     Inspired from https://stackoverflow.com/questions/3232943/update-value-of-a-nested-dictionary-of-varying-depth
     """
@@ -165,9 +172,15 @@
         if value is None:
             del target[key]
         elif isinstance(value, dict):
-            target[key] = recursive_update(target.get(key, {}), value)
-        else:
+            target[key] = recursive_update(
+                target.get(key, {}),
+                value,
+                overwrite=overwrite,
+            )
+        elif overwrite:
             target[key] = value
+        else:
+            target.setdefault(key, value)
     return target
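
To illustrate what the new `overwrite` flag changes, here is a minimal, self-contained sketch. It copies the function body shown in the hunk above; the loop header (`for key in update` and `value = update[key]`) is not visible in the diff and is an assumption, as are the example dictionaries.

```python
def recursive_update(target, update, overwrite=True):
    """Recursively merge `update` into `target` (mirrors the hunk above)."""
    # Assumption: the loop header is elided in the diff; this is one plausible reading.
    for key in update:
        value = update[key]
        if value is None:
            # A None value removes the key (raises KeyError if the key is absent)
            del target[key]
        elif isinstance(value, dict):
            target[key] = recursive_update(
                target.get(key, {}), value, overwrite=overwrite
            )
        elif overwrite:
            target[key] = value
        else:
            # Keep the value that is already present in `target`
            target.setdefault(key, value)
    return target


def example_frontmatter():
    # Hypothetical example data, not taken from the PR
    return {"title": "My notebook", "kernelspec": {"name": "python3"}}


notebook_metadata = {
    "kernelspec": {"name": "ir", "display_name": "R"},
    "jupytext": {"formats": "md:myst"},
}

kept = recursive_update(example_frontmatter(), notebook_metadata, overwrite=False)
replaced = recursive_update(example_frontmatter(), notebook_metadata)

print(kept["kernelspec"])      # the existing "name" is preserved, "display_name" is added
print(replaced["kernelspec"])  # "name" is overwritten by the update
```

With `overwrite=False` the first argument wins on conflicts, which appears to be how `metadata_and_cell_to_metadata` lets the raw-cell frontmatter take precedence over the notebook metadata.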


@@ -285,3 +298,67 @@
         return metadata, jupyter, cell, i + 1

     return metadata, False, None, start
+
+
+def default_root_level_metadata_filter(fmt):
+    """Return defaults for settings that promote or demote root level metadata."""
+    if fmt and fmt.get("format_name") == MYST_FORMAT_NAME:
+        from .myst import _DEFAULT_ROOT_LEVEL_METADATA as default_filter
+    else:
+        default_filter = _DEFAULT_ROOT_LEVEL_METADATA
Codecov / codecov/patch warning: added line #L308 in src/jupytext/header.py was not covered by tests.
+    return default_filter
+
+
+def metadata_to_metadata_and_cell(nb, metadata, fmt, unsupported_keys=None):
+    # stash notebook metadata, including keys promoted to the root level
+    metadata.update(
+        filter_metadata(
+            nb.metadata,
+            fmt.get("root_level_metadata_filter", ""),
+            default_root_level_metadata_filter(fmt),
+            unsupported_keys=unsupported_keys,
+            remove=True,
+        )
+    )
+    # move remaining metadata (i.e. frontmatter) to the first notebook cell
+    if nb.metadata and fmt.get("root_level_metadata_as_raw_cell", True):
+        frontmatter = yaml.safe_dump(nb.metadata, sort_keys=False)
+        nb.cells.insert(0, new_raw_cell("---\n" + frontmatter + "---"))
+    # attach the stashed metadata to notebook
+    nb.metadata = metadata
+    return nb
+
+
+def metadata_and_cell_to_metadata(nb, fmt, unsupported_keys=None):
+    # new metadata from filtered nb.metadata
+    metadata = filter_metadata(
+        nb.metadata,
+        fmt.get("root_level_metadata_filter", ""),
+        default_root_level_metadata_filter(fmt),
+        unsupported_keys=unsupported_keys,
+        remove=True,
+    )
+    # remaining nb.metadata moved under namespace key for jupyter metadata
+    if nb.metadata:
+        metadata[_JUPYTER_METADATA_NAMESPACE] = nb.metadata
+    nb.metadata = metadata
+    # move first cell frontmatter to the root level of nb.metadata (overwrites)
+    if nb.cells and fmt.get("root_level_metadata_as_raw_cell", True):
+        cell = nb.cells[0]
+        if cell.cell_type == "raw":
+            lines = cell.source.strip("\n\t ").splitlines()
+            if (
+                len(lines) >= 2
+                and _HEADER_RE.match(lines[0])
+                and _HEADER_RE.match(lines[-1])
+            ):
+                try:
+                    frontmatter = next(yaml.safe_load_all(cell.source))
+                except (yaml.parser.ParserError, yaml.scanner.ScannerError):
+                    logging.warning("[jupytext] failed to parse YAML in raw cell")
+                else:
+                    nb.cells = nb.cells[1:]
+                    nb.metadata = recursive_update(
+                        frontmatter, nb.metadata, overwrite=False
+                    )
+    return nb
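
The two functions above are inverses of each other: one builds the root-level header when writing, the other recovers notebook metadata and frontmatter when reading. The following is a simplified sketch of that round trip on plain dictionaries, not the Jupytext implementation. The helper names `merge_for_writing` and `split_after_reading` are hypothetical, the set of promoted keys is an assumption (the MyST default filter is not shown in this diff), and the merge is shallow where the PR uses `recursive_update`.

```python
import copy

JUPYTER_NAMESPACE = "jupyter"  # same value as _JUPYTER_METADATA_NAMESPACE in the diff


def merge_for_writing(notebook_metadata, promoted_keys, frontmatter=None):
    """Hypothetical helper: build the root-level MyST header on write."""
    remaining = copy.deepcopy(notebook_metadata)
    # Promote the selected keys to the root level
    header = {
        key: remaining.pop(key) for key in list(remaining) if key in promoted_keys
    }
    # Everything else is nested under the "jupyter" namespace
    if remaining:
        header[JUPYTER_NAMESPACE] = remaining
    # Frontmatter from the leading raw cell takes precedence (shallow merge here)
    merged = copy.deepcopy(frontmatter or {})
    for key, value in header.items():
        merged.setdefault(key, value)
    return merged


def split_after_reading(header, promoted_keys):
    """Hypothetical helper: recover notebook metadata and frontmatter on read."""
    frontmatter = copy.deepcopy(header)
    metadata = frontmatter.pop(JUPYTER_NAMESPACE, {})
    for key in list(frontmatter):
        if key in promoted_keys:
            metadata[key] = frontmatter.pop(key)
    return metadata, frontmatter


# Hypothetical example data
notebook_metadata = {
    "kernelspec": {"name": "python3", "display_name": "Python 3"},
    "language_info": {"name": "python"},
}
raw_cell_frontmatter = {"title": "Analysis"}
promoted = {"kernelspec"}

header = merge_for_writing(notebook_metadata, promoted, raw_cell_frontmatter)
metadata, frontmatter = split_after_reading(header, promoted)
print(header)
```

In this sketch only `kernelspec` sits at the root next to the frontmatter, and the rest of the notebook metadata travels under `jupyter`, which matches the layout the contributor argues for in the review thread.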
27 changes: 25 additions & 2 deletions src/jupytext/jupytext.py
@@ -23,11 +23,14 @@
     update_jupytext_formats_metadata,
 )
 from .header import (
+    _JUPYTER_METADATA_NAMESPACE,
     encoding_and_executable,
     header_to_metadata_and_cell,
     insert_jupytext_info_and_filter_metadata,
     insert_or_test_version_number,
     metadata_and_cell_to_header,
+    metadata_and_cell_to_metadata,
+    metadata_to_metadata_and_cell,
 )
 from .languages import (
     _SCRIPT_EXTENSIONS,
@@ -101,7 +104,8 @@ def reads(self, s, **_):
             return qmd_to_notebook(s)

         if self.fmt.get("format_name") == MYST_FORMAT_NAME:
-            return myst_to_notebook(s)
+            nb = myst_to_notebook(s)
+            return self.split_frontmatter(nb)

         lines = s.splitlines()

@@ -239,8 +243,10 @@ def writes(self, nb, metadata=None, **kwargs):
             default_lexer_from_jupytext_metadata = metadata.get("jupytext", {}).pop(
                 "default_lexer", None
             )
+            nb = self.filter_notebook(nb, metadata)
+            nb = self.merge_frontmatter(nb)
             return notebook_to_myst(
-                self.filter_notebook(nb, metadata),
+                nb,
                 default_lexer=default_lexer_from_language_info
                 or default_lexer_from_jupytext_metadata,
             )
@@ -353,6 +359,23 @@ def writes(self, nb, metadata=None, **kwargs):

         return "\n".join(header + lines)

+    def split_frontmatter(self, nb):
+        """Use during self.reads to separate notebook metadata from other frontmatter."""
+        unsupported_keys = set()
+        metadata = nb.metadata.pop(_JUPYTER_METADATA_NAMESPACE, {})
+        metadata.setdefault("jupytext", nb.metadata.get("jupytext", {}))
+        self.update_fmt_with_notebook_options(deepcopy(metadata), read=True)
+        nb = metadata_to_metadata_and_cell(nb, metadata, self.fmt, unsupported_keys)
+        _warn_on_unsupported_keys(unsupported_keys)
+        return nb
+
+    def merge_frontmatter(self, nb):
+        """Use during self.writes to rewrite notebook metadata as frontmatter content."""
+        unsupported_keys = set()
+        nb = metadata_and_cell_to_metadata(nb, self.fmt, unsupported_keys)
+        _warn_on_unsupported_keys(unsupported_keys)
+        return nb
+

 def reads(text, fmt=None, as_version=nbformat.NO_CONVERT, config=None, **kwargs):
     """
64 changes: 47 additions & 17 deletions src/jupytext/metadata_filter.py
@@ -17,6 +17,8 @@
         "tocdepth",
     ]
 )
+_JUPYTER_METADATA_NAMESPACE = "jupyter"
+_DEFAULT_ROOT_LEVEL_METADATA = "-all"


 def metadata_filter_as_dict(metadata_config):
@@ -128,7 +130,9 @@
         )


-def filter_metadata(metadata, user_filter, default_filter="", unsupported_keys=None):
+def filter_metadata(
+    metadata, user_filter, default_filter="", unsupported_keys=None, **kwargs
+):
     """Filter the cell or notebook metadata, according to the user preference"""
     default_filter = metadata_filter_as_dict(default_filter) or {}
     user_filter = metadata_filter_as_dict(user_filter) or {}
@@ -147,27 +151,40 @@
     if default_exclude == "all":
         if user_include == "all":
             return subset_metadata(
-                metadata, exclude=user_exclude, unsupported_keys=unsupported_keys
+                metadata,
+                exclude=user_exclude,
+                unsupported_keys=unsupported_keys,
+                **kwargs,
             )
         if user_exclude == "all":
             return subset_metadata(
-                metadata, keep_only=user_include, unsupported_keys=unsupported_keys
+                metadata,
+                keep_only=user_include,
+                unsupported_keys=unsupported_keys,
+                **kwargs,
             )
         return subset_metadata(
             metadata,
             keep_only=set(user_include).union(default_include),
             exclude=user_exclude,
             unsupported_keys=unsupported_keys,
+            **kwargs,
         )

     # cell default filter = all metadata but removed ones
     if user_include == "all":
         return subset_metadata(
-            metadata, exclude=user_exclude, unsupported_keys=unsupported_keys
+            metadata,
+            exclude=user_exclude,
+            unsupported_keys=unsupported_keys,
+            **kwargs,
         )
     if user_exclude == "all":
         return subset_metadata(
-            metadata, keep_only=user_include, unsupported_keys=unsupported_keys
+            metadata,
+            keep_only=user_include,
+            unsupported_keys=unsupported_keys,
+            **kwargs,
         )
     # Do not serialize empty tags
     if "tags" in metadata and not metadata["tags"]:
@@ -177,6 +194,7 @@
         metadata,
         exclude=set(user_exclude).union(set(default_exclude).difference(user_include)),
         unsupported_keys=unsupported_keys,
+        **kwargs,
     )


@@ -197,24 +215,36 @@
     for key in metadata:
         if not is_valid_metadata_key(key):
             unsupported_keys.add(key)
-    return {key: value for key, value in metadata.items() if is_valid_metadata_key(key)}
+    return [key for key in metadata if is_valid_metadata_key(key)]


-def subset_metadata(metadata, keep_only=None, exclude=None, unsupported_keys=None):
+def subset_metadata(
+    metadata, keep_only=None, exclude=None, unsupported_keys=None, remove=False
+):
     """Filter the metadata"""
-    metadata = suppress_unsupported_keys(metadata, unsupported_keys=unsupported_keys)
+    supported_keys = suppress_unsupported_keys(
+        metadata, unsupported_keys=unsupported_keys
+    )
     if keep_only is not None:
-        filtered_metadata = {key: metadata[key] for key in metadata if key in keep_only}
+        keys = [key for key in supported_keys if key in keep_only]
+        if remove:
+            filtered_metadata = {key: metadata.pop(key) for key in keys}
+        else:
+            filtered_metadata = {key: metadata[key] for key in keys}
         sub_keep_only = second_level(keep_only)
-        for key in sub_keep_only:
-            if key in metadata:
-                filtered_metadata[key] = subset_metadata(
-                    metadata[key],
-                    keep_only=sub_keep_only[key],
-                    unsupported_keys=unsupported_keys,
-                )
+        keys = [key for key in supported_keys if key in sub_keep_only]
+        for key in keys:
+            filtered_metadata[key] = subset_metadata(
+                metadata[key],
+                keep_only=sub_keep_only[key],
+                unsupported_keys=unsupported_keys,
+                remove=remove,
+            )
     else:
-        filtered_metadata = copy(metadata)
+        if remove:
+            filtered_metadata = {key: metadata.pop(key) for key in supported_keys}
Codecov / codecov/patch warning: added line #L245 in src/jupytext/metadata_filter.py was not covered by tests.
+        else:
+            filtered_metadata = {key: metadata[key] for key in supported_keys}

     if exclude is not None:
         for key in exclude:
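
The `remove=True` path added to `subset_metadata` turns filtering into a move: selected keys are popped from the input dictionary instead of being copied. Below is a minimal sketch of just that behavior. `take_keys` is a hypothetical, flattened stand-in for the `keep_only` branch, and the example data is made up.

```python
def take_keys(metadata, keep_only, remove=False):
    """Hypothetical stand-in for the `keep_only` branch of subset_metadata."""
    # Materialize the keys first so that popping does not disturb the iteration
    keys = [key for key in metadata if key in keep_only]
    if remove:
        # Move: the selected keys leave the input dictionary
        return {key: metadata.pop(key) for key in keys}
    # Copy: the input dictionary is left untouched
    return {key: metadata[key] for key in keys}


source = {"kernelspec": {"name": "python3"}, "title": "Analysis"}

copied = take_keys(source, {"kernelspec"})
still_there = "kernelspec" in source  # the default leaves the input intact

moved = take_keys(source, {"kernelspec"}, remove=True)
print(source)  # only the keys that were not selected remain
```

This is presumably why `suppress_unsupported_keys` now returns a list of keys rather than a filtered copy: the caller needs to pop from the original dictionary, so whatever is left behind can be treated as frontmatter.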