Skip to content

Python Markdown extension to include local or remote files

License

Notifications You must be signed in to change notification settings

neurobin/mdx_include

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

70 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Build Status

Include extension for Python Markdown. It lets you include local or remote (downloadable) files into your markdown at arbitrary positions.

This project is motivated by markdown-include and provides the same functionalities with some extras.

Inclusion for local file is by default recursive and for remote file non-recursive. You can change this behavior through configuration.

You can include part of the file by slicing according to line/column number.

File/Downloaded contents are cached, i.e if you include same file multiple times in multiple places, they won't be read/downloaded more than once. This behavior can also be changed with configuration.

Circular inclusion by default raises an exception. You can change this behavior to include the affected files in non-recursive mode through configuration.

You should not use markdown-include along with this extension, choose either one, not both.

Syntax

  1. Simple: {! file_path_or_url !}
  2. With explicit encoding: {! file_path_or_url | encoding !}
  3. With recurs_state on: {!+ file_path_or_url !} or {!+ file_path_or_url | encoding !}. This makes the included file to be able to include other files. This is meaningful only when recursion is set to None. If it is set to False, this explicit recurs_state defintion can not force recursion. This is a depth 1 recursion, so you can choose which one to recurs and which one to not.
  4. With recurs_state off: {!- file_path_or_url !} or {!- file_path_or_url | encoding !}. This will force not to recurs even when recursion is set to True.
  5. Applying indentation {!> file_path_or_url!}. This will apply the indentation found in the include line before the include for all the lines in the included file.
  6. Escaped syntax: You can escape it to get the literal. For example, \{! file_path_or_url !} will give you literal {! file_path_or_url !} and \\\{! file_path_or_url !} will give you \{! file_path_or_url !}
  7. File slice: You can slice a file by line and column number. The syntax is {! file_path [ln:l.c-l.c,l.c-l.c,...] !}. No spaces allowed inside file slice syntax [ln:l.c-l.c,l.c-l.c,]. See more detals in File slicing section.

General syntax: {!recurs_state apply_indent file_path_or_url [ln:slice_syntax] | encoding !}

The spaces are not necessary. They are just to make it look nice :) . No spaces allowed between {! and recurs_state (+-). If apply indentation is specified then it must follow recurse_state immediately or the {! if recurse_state is not specified.

You can change the syntax!!!

If you don't like the syntax you can change it through configuration.

There might be some complications with the syntax {!file!}, for example, conflict with markdown.extensions.attr_list which uses {:?something}. As the : is optional, a typical problem that occurs is this one:

A paragraph
{!our syntax!}

would produce:

<p syntax_="syntax!" _our="!our">A paragraph</p>

If you really want to avoid this type of collision, find some character sequence that is not being used by any extension that you are using and use those character sequences to make up the syntax.

See the configuration section for details

Install

Install from Pypi:

pip install mdx_include

Usage

text = r"""
some text {! test1.md !} some more text {! test2.md | utf-8 !}

Escaping will give you the exact literal \{! some_file !}

If you escape, then the backslash will be removed.

If you want the backslash too, then provide two more: \\\{! some_file !}
"""
md = markdown.Markdown(extensions=['mdx_include'])
html = md.convert(text)
print(html)

Example output:

(when test1.md contains a single line **This is test1.md** and test2.md contains **This is test2.md**)

<p>some text <strong>This is test1.md</strong> some more text <strong>This is test2.md</strong></p>
<p>Escaping will give you the exact literal {! some_file !}</p>
<p>If you escape, then the backslash will be removed.</p>
<p>If you want the backslash too, then provide two more: \{! some_file !}</p>

Configuration

Config param Default Details
base_path . The base path from which relative paths are normalized.
encoding utf-8 The file encoding.
allow_local True Whether to allow including local files.
allow_remote True Whether to allow including remote files.
truncate_on_failure True Whether to truncate the matched include syntax on failure. False value for both allow_local and allow_remote is treated as a failure.
recurs_local True Whether the inclusions are recursive on local files. Options are: True, False and None. None is a neutral value with negative default and overridable with recurs_state (e.g {!+file!}). False will permanently prevent recursion i.e you won't be able to override it with the recurs_state. True value is overridable with recurs_state (e.g {!-file!}).
recurs_remote False Whether the inclusions are recursive on remote files. Options are: True, False and None. None is a neutral value with negative default and overridable with recurs_state (e.g {!+file!}). False will permanently prevent recursion i.e you won't be able to override it with the recurs_state. True value is overridable with recurs_state (e.g {!-file!}).
syntax_left \{! The left boundary of the syntax. (Used in regex, thus escaped {).
syntax_right !\} The right boundary of the syntax. (Used in regex, thus escaped }).
syntax_delim \| The delimiter that separates encoding from path_or_url. (Used in regex, thus escaped |).
syntax_recurs_on + The character to specify recurs_state on. (Used in regex).
syntax_recurs_off - The character to specify recurs_state off. (Used in regex).
syntax_apply_indent \> The character which stands for applying indentation found before the include for the lines included from the files.
content_cache_local True Whether to cache content for local files.
content_cache_remote True Whether to cache content for remote files.
content_cache_clean_local False Whether to clean content cache for local files after processing all the includes
content_cache_clean_remote False Whether to clean content cache for remote files after processing all the includes
allow_circular_inclusion False Whether to allow circular inclusion. If allowed, the affected files will be included in non-recursive mode, otherwise it will raise an exception.
line_slice_separator ['',''] A list of lines that will be used to separate parts specified by line slice syntax: 1-2,3-4,5 etc.
recursive_relative_path False Whether include paths inside recursive files should be relative to the parent file path

Example with configuration

configs = {
            'mdx_include': {
                'base_path': 'mdx_include/test/',
                'encoding': 'utf-8',
                'allow_local': True,
                'allow_remote': True,
                'truncate_on_failure': False,
                'recurs_local': None,
                'recurs_remote': False,
                'syntax_left': r'\{!',
                'syntax_right': r'!\}',
                'syntax_delim': r'\|',
                'syntax_recurs_on': '+',
                'syntax_recurs_off': '-',
                'syntax_apply_indent': r'\>',
            },
        }

text = r"""
some text {! some_file !} some more text {! some_more_file | utf-8 !}

Escaping will give you the exact literal \{! some_file !}

If you escape, then the backslash will be removed.

If you want the backslash too, then provide two more: \\\{! some_file !}
"""
md = markdown.Markdown(extensions=['mdx_include'], extension_configs=configs)
html = md.convert(text)
print(html)

File slicing

You can include part of the file from certain line/column number to certain line/column number.

The general file slice syntax is: [ln:l.c-l.c,l.c-l.c,...], where l is the line number and c is the column number. All indexes are inclusive.

Examples:

Slice Details
[ln:1-4] line 1 to 4 (both inclusive)
[ln:1.2-3.4] character 2 in line 1 to character 4 in line 3
[ln:2-] line 2 to all of the rest
[ln:-3] Last line to 3rd line (reversion)
[ln:6-2] 6th line to 2nd line (reversion)
[ln:2.9-2.2] From 9th character of line 2 to 2nd character of line 2 (string reverse)
[ln:.3-.10] Slice along the column from every row, from 3rd character to 10th character
[ln:2] line 2 only
[ln:e] last line

Multiple slicing can be done by adding more slice expressions with commas (s,). In this case, a separator (default is two newlines) is inserted between each slice. For example, with slice expression 1-2,4-9, two newlines will be inserted between the lines 1-2 and 4-9.

More details on the rcslice doc

Manual cache control

The configuration gives you enough cache control, but that's not where it ends :). You can do manual cache cleaning instead of letting the extension handle it for itself. First turn the auto cache cleaning off by setting content_cache_clean_local and/or content_cache_clean_remote to False (this is default), then call the cache cleaning function manually on the markdown object whenever you want:

md.mdx_include_content_cache_clean_local()
md.mdx_include_content_cache_clean_remote()

You can also get the internal cache dictionary and make inplace modification (e.g cleaning a specific cache for a specific file/URL, or even modify the cached content):

local_cache_dict = md.mdx_include_get_content_cache_local()
remote_cache_dict = md.mdx_include_get_content_cache_remote()

How circular inclusion works

Let's say, there are three files, A, B and C. A includes B, B includes C and C inclues A and we are doing recursive include.

If circular inclusion is not allowed in the config i.e if allow_circular_inclusion is False (which is the default) then it will raise an exception.

If allow_circular_inclusion is set to True, then it will work like this:

  1. A and B will be normally included
  2. B includes C normally too
  3. C includes A which is a circular inclusion (C>A>B>C>A>B>C...). Thus A will be included in non-recursive mode as allow_circular_inclusion is set to True i.e C will include A literally without parsing A anymore.

An example of including a gist

The following markdown:

Including a gist:
    
```python
{! https://gist.github.com/drgarcia1986/3cce1d134c3c3eeb01bd/raw/73951574d6b62a18b4c342235006ff89d299f879/django_hello.py !}
```

will produce (with fenced code block enabled):

<p>Including a gist:</p>
<pre><code class="python"># -*- coding: utf-8 -*-

# Settings
from django.conf import settings


settings.configure(
    DEBUG=True,
    SECRET_KEY='secretfoobar',
    ROOT_URLCONF=__name__,
    MIDDLEWARE_CLASSES=(
        'django.middleware.common.CommonMiddleware',
        'django.middleware.csrf.CsrfViewMiddleware',
        'django.middleware.clickjacking.XFrameOptionsMiddleware',
    )
)


# Views
from django.http import HttpResponse
from django.conf.urls import url


def index(request):
    return HttpResponse('&lt;h1&gt;Hello Word&lt;/h1&gt;')

# Routes
urlpatterns = (
    url(r'^$', index),
)


# RunServer
if __name__ == '__main__':
    from django.core.management import execute_from_command_line
    import sys

    execute_from_command_line(sys.argv)

</code></pre>