@heaven00 (Collaborator) commented Nov 15, 2025

Description

Adds the capability to generate Mermaid diagrams from a schema. The Mermaid representation includes the following:

  • column names
  • data type
  • primary key or foreign key
  • relationships (first_entity, cardinality, second_entity, label)
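
As a rough illustration of the relationship tuple listed above, here is a minimal sketch of how one relationship could be rendered as a line of a Mermaid `erDiagram`. The `Reference` container and `reference_to_mermaid()` are hypothetical names used only for this example and are not taken from the PR's code; the cardinality is assumed to already be a Mermaid token such as `||--o{`.

```python
from typing import NamedTuple


class Reference(NamedTuple):
    # Hypothetical container; field names mirror the list in the description.
    first_entity: str
    cardinality: str  # assumed to be a Mermaid ER token, e.g. "||--o{"
    second_entity: str
    label: str


def reference_to_mermaid(ref: Reference) -> str:
    """Render one relationship line of a Mermaid erDiagram."""
    return f"{ref.first_entity} {ref.cardinality} {ref.second_entity} : {ref.label}"


ref = Reference("customers", "||--o{", "orders", "places")
line = reference_to_mermaid(ref)
print(line)
```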

Related Issues

Additional Context

Task List

  • Create a module dlt/helpers/mermaid.py
  • Build the diagram by manipulating strings without additional Python dependencies
  • Write modular functions and tests for generating columns, tables, etc.
  • Add dlt.Schema.to_mermaid() method
  • Add to the CLI dlt schema --format mermaid FILE and dlt pipeline NAME schema --format mermaid
  • Update CLI docs
  • Add example export to the docs

@heaven00 (Collaborator, Author) commented

@zilto I seem to have steered away from the regular parameters a bit in the implementation.

  • Do I need to work with TStoredSchema instead of the Schema object?
  • Is it okay to include table_names as a parameter, or should I stick to showing everything and only provide an option to remove the dlt private tables?
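
To make the second question concrete, here is a minimal sketch of what the two options could look like side by side. Everything in it is an assumption for discussion: `select_tables()` and `include_dlt_tables` are hypothetical names, the tables are plain dicts rather than a Schema or TStoredSchema, and dlt private tables are assumed to be recognizable by a `_dlt` name prefix (as with `_dlt_version`).

```python
from typing import Any, Dict, Iterable, Optional


def select_tables(
    tables: Dict[str, Any],
    table_names: Optional[Iterable[str]] = None,
    include_dlt_tables: bool = True,
) -> Dict[str, Any]:
    """Keep only the requested tables and optionally drop dlt private tables."""
    wanted = set(table_names) if table_names is not None else None
    return {
        name: table
        for name, table in tables.items()
        if (wanted is None or name in wanted)
        # Assumption: private tables are identified by the "_dlt" prefix.
        and (include_dlt_tables or not name.startswith("_dlt"))
    }


tables = {"customers": {}, "orders": {}, "_dlt_version": {}}
public_only = select_tables(tables, include_dlt_tables=False)
just_orders = select_tables(tables, table_names=["orders"])
```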

@zilto (Collaborator) left a comment

This PR is almost there! There are a few things to align with the existing to_dot() and to_dbml() for consistency, and some unit tests to add.

return "".join(items)


def _columns_to_text(columns: TTableSchemaColumns) -> str:

I believe it's worth having a _to_mermaid_column() function that generates a single column. Then we have:

  • _to_mermaid_column()
  • _to_mermaid_table()
  • _to_mermaid_reference()
  • schema_to_mermaid()

Then, add unit tests to this function to check if it produces the expected results for:

  • no hint
  • each individual hint
  • 2+ hints
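
A minimal sketch of what a single-column function and its three test cases could look like. This is not the PR's implementation: the column is a plain dict, the `primary_key` and `foreign_key` keys and the `KEY_MARKERS` mapping are assumptions made for illustration, and the output follows Mermaid's `<type> <name> <keys>` attribute form, where multiple keys are comma-separated.

```python
from typing import Any, Dict

# Hypothetical hint-to-marker mapping, used only for this example.
KEY_MARKERS = {"primary_key": "PK", "foreign_key": "FK"}


def to_mermaid_column(name: str, column: Dict[str, Any]) -> str:
    """Render a single attribute line: `<type> <name> [<keys>]`."""
    keys = [marker for hint, marker in KEY_MARKERS.items() if column.get(hint)]
    line = f"{column['data_type']} {name}"
    return f"{line} {', '.join(keys)}" if keys else line


def test_no_hint() -> None:
    assert to_mermaid_column("amount", {"data_type": "double"}) == "double amount"


def test_single_hint() -> None:
    pk = {"data_type": "bigint", "primary_key": True}
    fk = {"data_type": "bigint", "foreign_key": True}
    assert to_mermaid_column("id", pk) == "bigint id PK"
    assert to_mermaid_column("customer_id", fk) == "bigint customer_id FK"


def test_multiple_hints() -> None:
    column = {"data_type": "bigint", "primary_key": True, "foreign_key": True}
    assert to_mermaid_column("id", column) == "bigint id PK, FK"
```

Keeping the column renderer free of table-level concerns is what makes these three cases easy to assert on as exact strings.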

@zilto zilto added the enhancement New feature or request label Nov 18, 2025
@zilto zilto linked an issue Nov 18, 2025 that may be closed by this pull request
@zilto (Collaborator) commented Nov 18, 2025

Additional arg

I think it would be useful to have a top-level arg hide_columns: bool, available on both schema_to_mermaid() and dlt.Schema.to_mermaid(). If hide_columns is True, then we just have table names and references. This gives a much more compact visual when you have a lot of tables. (This could also be added to .to_dot() and .to_dbml().)
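
A rough sketch of how the flag could behave at the table level, assuming a hypothetical `table_to_mermaid()` helper that takes a simple name-to-type mapping (not the actual dlt column schema). With `hide_columns=True` only the bare entity name is emitted, which Mermaid accepts as an entity without attributes.

```python
from typing import Dict


def table_to_mermaid(name: str, columns: Dict[str, str], hide_columns: bool = False) -> str:
    """Render one entity, with or without its attribute block."""
    if hide_columns or not columns:
        # A bare entity name is enough to declare the table.
        return name
    body = "\n".join(f"    {data_type} {column}" for column, data_type in columns.items())
    return f"{name} {{\n{body}\n}}"


columns = {"id": "bigint", "status": "text"}
full = table_to_mermaid("orders", columns)
compact = table_to_mermaid("orders", columns, hide_columns=True)
```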

@heaven00 (Collaborator, Author) commented Nov 21, 2025

Hey @zilto, I'm almost done with the feedback you provided (thanks for that).

One question: at the moment, when we include all the dlt tables, _dlt_version renders separately.

[Screenshot: diagram rendered in the Mermaid Live Editor, 2025-11-21]

Though the version is populated by every table, should it be any different?

Things pending:

  • hide_columns feature
  • cli integration and testing with the new parameters for pipeline and schema
  • documentation update

@zilto zilto mentioned this pull request Nov 21, 2025
@zilto (Collaborator) commented Nov 24, 2025

closing in favor of #3364

@zilto zilto closed this Nov 24, 2025
Development

Successfully merging this pull request may close these issues.

feat: dlt.Schema.to_mermaid()

2 participants