
Conversation


@plaharanne plaharanne commented Jan 8, 2026

What's Inside

The import flow has been revised, so the documentation page is no longer accurate as it stands.
The new version of the page has updated screenshots, and the text now matches what users see on screen as well as the available features and settings.

Preview

See https://crate-cloud--114.org.readthedocs.build/en/114/

Highlights

Checklist

Summary by CodeRabbit

  • Documentation
    • Rewrote import documentation into a streamlined, file-centric flow covering local files, URLs, S3, Azure, and MongoDB.
    • Emphasized a single import flow (select format → source → destination table) and added a sample-data note.
    • Updated visuals and terminology to file-based imports; clarified S3/Azure permissions and multi-file wildcard usage.
    • Renamed schema settings to “Allow schema evolution,” added type-mismatch examples, and a File Format Limitations section.

✏️ Tip: You can customize this high-level summary in your review settings.


coderabbitai bot commented Jan 8, 2026

Walkthrough

Rewrote import documentation to center on file-based imports (local files, URLs, AWS S3, Azure, MongoDB) with a unified "File Import" flow, updated images and form terminology, clarified S3/Azure and multi-file (wildcard) guidance, renamed schema-evolution controls to "Allow schema evolution", and added File Format Limitations and sample-data guidance.

Changes

Cohort / File(s) | Change Summary
  • Primary doc (docs/cluster/import.md): Full rewrite from URL/history-centric to a file-centric import guide: new "File Import" flow (select format, source, destination), unified messaging, updated images, removed per-source URL examples, and added sample-data note.
  • S3 / Azure / Multi-file (docs/cluster/import.md, S3/Azure sections): Restructured S3/Azure subsections to focus on file imports, clarified permission/secret notes, added wildcard/globbing usage for multi-file imports, and removed verbose legacy globbing exposition.
  • Schema & Formats (docs/cluster/import.md, schema & formats): Renamed "Schema Evolution" to "Allow schema evolution", clarified behavior and limits (including type-mismatch example), and added a File Format Limitations subsection covering CSV, JSON (Documents/Arrays/JSON‑Lines), and Parquet specifics.
  • Integrations / Cross-references (docs/.../cluster-integrations, docs/cluster/import.md): Replaced detailed per-source integration examples with cross-references to cluster-integrations and adjusted wording to point readers to integration docs.
  • Images & Visuals (docs/cluster/import.md, docs/.../images/*): Updated/renamed images and image references to reflect file-based import UI and flow.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • Import: Copy editing #111: Overlapping edits to docs/cluster/import.md touching section renames, S3/Azure wording, and globbing guidance.
  • Import: Use less "Import" #113: Related restructuring and renaming within docs/cluster/import.md, consolidating import flow and form terminology.

Poem

🐇 I nibbled through docs and hopped to the core,
From URL trails to files and more,
S3 hops, Azure bounds, a schema's new dance,
Wildcards flutter — give imports a chance,
A tiny rabbit cheers this docs romance ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
  • Title check: ✅ Passed. The title 'Refresh 'Import' documentation' is clear, concise, and directly reflects the main change of updating the Import documentation to align with the revised import flow.
  • Description check: ✅ Passed. The PR description follows the template with 'What's Inside', 'Preview', and 'Checklist' sections completed; 'Highlights' is empty but non-critical. The description clearly explains the rationale and provides a preview link and linked issue.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch p/import

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/cluster/import.md (1)

111-123: Fix double space in Schema Evolution section.

Line 115 contains a formatting inconsistency with double space before "evolution" in "Allow schema evolution".

✏️ Proposed fix
- import process. It can be toggled via the 'Allow schema  evolution' checkbox
+ import process. It can be toggled via the 'Allow schema evolution' checkbox
🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Around line 11-15: Update the typo in the import formats list by replacing the
incorrect word "Paquet" with "Parquet" in the bullet list (the line that
currently reads "Paquet"); ensure the list remains: CSV, JSON (JSON-Lines, JSON
Arrays and JSON Documents), Parquet, MongoDB collection.
- Around line 52-58: Update the sentence in the S3 import documentation to
correct the typo "file form bucket" to "file from bucket" (in the paragraph
describing CrateDB Cloud imports in docs/cluster/import.md) so the sentence
reads "To import a file from a bucket, provide the name of your bucket, and path
to the file."; ensure only the typo is changed and punctuation remains
consistent.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54a7928 and 94a841d.

⛔ Files ignored due to path filters (9)
  • docs/_assets/img/cluster-import-file-form.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-globbing.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-tab-azure.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-tab-choice.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-tab-file.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-tab-s3.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-tab-url.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import-tab.png is excluded by !**/*.png
  • docs/_assets/img/cluster-import.png is excluded by !**/*.png
📒 Files selected for processing (1)
  • docs/cluster/import.md
🧰 Additional context used
🪛 LanguageTool
docs/cluster/import.md

[grammar] ~13-~13: Ensure spelling is correct
Context: ...N-Lines, JSON Arrays and JSON Documents) - Paquet - MongoDB collection :::{note} If you don...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~53-~53: Ensure spelling is correct
Context: ...S3-compatible storage. To import a file form bucket, provide the name of your bucket...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (4)
docs/cluster/import.md (4)

30-47: File Import section looks good.

Clear and concise explanation of the import process with appropriate image reference. The schema evolution toggle description is accurate and helpful.


86-100: Azure subsection is well-structured.

Clear documentation of Azure Blob Storage imports with appropriate guidance on secrets and file path configuration. Wildcard notation and file size limits are clearly documented.


102-105: Integration section reference is appropriate.

Brief section appropriately directs users to more comprehensive integration documentation. Cross-reference syntax is correct.


126-165: File Format Limitations section is well-documented.

Comprehensive coverage of format-specific limitations with clear code examples for JSON variants. Documentation effectively demonstrates expected input formats.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Line 115: Fix the double space in the checkbox label by replacing "Allow
schema  evolution" with "Allow schema evolution" and make capitalization
consistent across the doc (e.g., use "Schema Evolution" or "schema evolution"
everywhere); update the string used in the text to the chosen capitalization so
the checkbox reference matches other occurrences.
🧹 Nitpick comments (2)
docs/cluster/import.md (2)

11-15: Consolidate duplicate data format listings.

The supported data formats are listed twice: once in the introduction (lines 12-14) and again in the File Import section (lines 41-43). Consider removing the duplicate in the File Import section or consolidating into a single reference, as this may confuse users about whether there are different format constraints for different import methods.

Also applies to: 39-45


126-165: File Format Limitations section is well-documented for JSON and CSV.

The examples are clear and helpful. However, Parquet format is listed as supported but has no details or examples, unlike CSV and JSON. Consider adding Parquet documentation for completeness.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 94a841d and a256a0b.

📒 Files selected for processing (1)
  • docs/cluster/import.md
🔇 Additional comments (2)
docs/cluster/import.md (2)

30-47: File Import section is clear and well-structured.

The simplified flow with file format, source, and table selection is straightforward. The schema evolution explanation is helpful context at this level.


100-100: Add missing period.

Line 100 ends without a period.

Fix missing punctuation
-File size limitation for imports is 10 GiB per file.
+File size limitation for imports is 10 GiB per file.

Likely an incorrect or invalid review comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/cluster/import.md (1)

126-165: File Format Limitations section is incomplete; missing Parquet and MongoDB documentation.

The section documents CSV and JSON formats well with helpful examples, but is missing documentation for two formats that are prominently promoted earlier:

  1. Parquet is mentioned in the introduction (Line 14) and File Import section (Line 43) but has no entry in File Format Limitations.
  2. MongoDB collection is mentioned in the introduction (Line 15) but has no entry in File Format Limitations.

Additionally, the JSON code examples use :::{code} console language tags (Lines 138, 147, 162), which may be semantically incorrect; these should likely use :::{code} json for proper syntax highlighting.

Please add documentation for Parquet and MongoDB collection formats, explaining:

  • Parquet: any schema/type handling specifics, nested structure behavior
  • MongoDB collection: connection requirements, field mapping behavior

Also review the code block language tags for semantic accuracy.
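For orientation, the three JSON variants referenced above differ mainly in shape. A minimal JSON-Lines input (one object per line) might look like the following; the field names are illustrative and not taken from the page under review:

    {"id": 1, "text": "first record"}
    {"id": 2, "text": "second record"}

A JSON Array typically wraps such objects in a single [ ... ] array, and a JSON Document is typically a single top-level object.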

🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Around line 4-9: Add three new subsections under the "File Import" area
mirroring the scope of the existing "S3" and "Azure" sections: "Local file",
"URL", and "MongoDB". For "Local file" document the upload process (steps to
upload via UI/CLI/API), accepted file formats, size limits, and any
preprocessing or format requirements; for "URL" document supported URL schemes
(http/https), authentication options (basic, bearer, signed URLs), timeout/retry
behavior, and how to reference the URL in import commands; for "MongoDB"
document connection string format, required drivers/versions, auth mechanisms,
how to select a database/collection, field mapping/transform examples, and any
batching/throughput limits. Follow the same tone/structure and examples used in
the "S3 Import" and "Azure Import" sections so the new subsections are
consistent and include sample commands, configuration keys, and known
limitations.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a256a0b and 70e7909.

📒 Files selected for processing (1)
  • docs/cluster/import.md
🔇 Additional comments (6)
docs/cluster/import.md (6)

4-21: Verify import history UI reference and image asset.

Lines 23-24 reference an "Import history" tab, but this feature description appears after the introduction and before the explanation of how to perform imports. For new users, the logical flow would benefit from explaining the basic import process first before mentioning historical references. Additionally, verify that the referenced image at Line 28 (cluster-import.png) exists and correctly depicts the current UI.


30-48: Verify image asset and source documentation completeness.

The File Import section provides a clear overview of the unified workflow. However, verify that:

  1. The image cluster-import-file-form.png (Line 47) exists and correctly depicts the current file import form.
  2. All five import sources mentioned in the introduction (local file, URL, AWS S3, Azure, MongoDB) are documented in dedicated sections. Currently, only S3 (Lines 49-85) and Azure (Lines 86-101) have subsections; local file, URL, and MongoDB guidance is missing.

49-85: S3 guidance is comprehensive.

The AWS S3 section provides clear instructions including bucket/path requirements, authentication, wildcard support for multi-file imports, and relevant IAM policy examples. The 10 GiB file size limit and egress cost warning are appropriately documented.

Please verify that the IAM policy example (Lines 68-83) reflects current AWS S3 best practices and that no additional S3-specific permissions (e.g., s3:ListBucket for prefix matching) are required for wildcard imports to function correctly.
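A minimal policy of the kind this comment alludes to might grant object reads plus bucket listing for the wildcard prefix; the bucket name and ARNs below are placeholders, not values taken from the documentation under review:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AllowObjectReads",
          "Effect": "Allow",
          "Action": ["s3:GetObject"],
          "Resource": "arn:aws:s3:::example-bucket/*"
        },
        {
          "Sid": "AllowPrefixListing",
          "Effect": "Allow",
          "Action": ["s3:ListBucket"],
          "Resource": "arn:aws:s3:::example-bucket"
        }
      ]
    }

Whether s3:ListBucket is actually required for the wildcard matching to work is exactly the open question raised above.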


86-101: Azure guidance is clear and consistent with S3 structure.

The Azure section appropriately documents secret-based authentication, path format, wildcard support, and file size limits. The mention of admin-level secret management is important operational guidance.


102-106: Clarify the Integration section purpose.

The Integration section defers entirely to another documentation page via cross-reference. This is acceptable if comprehensive data integration guidance exists elsewhere, but the section feels incomplete for users reading the import documentation. Consider adding 1-2 sentences explaining what integrations are (e.g., "Integrations allow connecting external data sources for continuous sync") before the cross-reference to provide better context.

Also, verify that the reference {ref}cluster-integrations is correct and that the target page exists and is appropriately maintained.


108-124: Schema evolution section is well-documented with good examples.

The explanation of schema evolution behavior is clear, limitations are explicit, and the type-mismatch example effectively illustrates edge cases. The toggle naming is consistent with the File Import section (Line 36).

Confirm that the described schema evolution behavior (automatic column addition only, type mismatch failures) matches the current product implementation. Also verify whether there are additional limitations (e.g., constraints on column types, handling of nested JSON structures) that should be documented.

Comment on lines 4 to 9
You can import data into your CrateDB directly from sources like:
- local file
- URL
- AWS S3 bucket
- Azure storage
- MongoDB


⚠️ Potential issue | 🟠 Major

Missing documentation for three import sources: local file, URL, and MongoDB.

The introduction lists five import sources (local file, URL, AWS S3, Azure, MongoDB), and the File Import section implies all are equally accessible. However, only S3 (Lines 49-85) and Azure (Lines 86-101) have dedicated documentation sections. Local file, URL, and MongoDB sources lack any guidance.

For users trying to use these sources, there is no information about:

  • Local file: upload process, file size limits, format requirements
  • URL: authentication (if needed), supported URL schemes, timeout behavior
  • MongoDB: connection string format, collection selection, authentication, field mapping

Please add subsections (similar in scope to the S3 and Azure sections) for each missing source, documenting their specific requirements, limitations, and any relevant configuration details.

Also applies to: 30-48

🤖 Prompt for AI Agents
In @docs/cluster/import.md around lines 4 - 9, Add three new subsections under
the "File Import" area mirroring the scope of the existing "S3" and "Azure"
sections: "Local file", "URL", and "MongoDB". For "Local file" document the
upload process (steps to upload via UI/CLI/API), accepted file formats, size
limits, and any preprocessing or format requirements; for "URL" document
supported URL schemes (http/https), authentication options (basic, bearer,
signed URLs), timeout/retry behavior, and how to reference the URL in import
commands; for "MongoDB" document connection string format, required
drivers/versions, auth mechanisms, how to select a database/collection, field
mapping/transform examples, and any batching/throughput limits. Follow the same
tone/structure and examples used in the "S3 Import" and "Azure Import" sections
so the new subsections are consistent and include sample commands, configuration
keys, and known limitations.

@plaharanne plaharanne requested a review from joerg84 January 9, 2026 14:32
like date and status.
By navigating to "Show details", you can display details of a particular
import job.
You can import data into your CrateDB directly from various sources, including:


General style question: This page has two very different styles of addressing the user/capabilities:
"You can import" vs. "CrateDB Cloud allows" further down.

Do we have general recommended style guidelines?
cc @bmunkholm


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/cluster/import.md (2)

39-45: Remove duplicate format listing.

The supported data formats are already documented in the introduction (lines 11-15). This duplication should be removed to avoid maintenance issues if formats change in the future.

Proposed refactor to remove duplication
-The following data formats are supported:
-
--   CSV (all variants)
--   JSON (JSON-Lines, JSON Arrays and JSON Documents)
--   Parquet
-
-Gzip compressed files are also supported.

Instead, reference the intro section or consolidate format details into a single location. If compression support is important to highlight here, consider: "All formats listed above support Gzip compression."


132-165: Fix JSON code block directives and document Parquet format limitations.

Two issues:

  1. Code block directive: JSON examples use :::{code} console (lines 138, 147, 162), which may not provide proper syntax highlighting. Use json instead for better readability:

    :::{code} json
    
  2. Incomplete format coverage: The intro mentions both Parquet (line 14) and MongoDB collection (line 15) as supported formats, but the File Format Limitations section omits both. Add subsections documenting Parquet behavior and MongoDB-specific limitations or behaviors.

Proposed fix for JSON code block directive
-    :::{code} console
+    :::{code} json
     {
       "id":1,
       "text": "example"
     }
-    :::
+    :::

Repeat for lines 147 and 162 as well.
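Applied to the first example, the corrected directive would then read in full (content unchanged from the diff above):

    :::{code} json
    {
      "id":1,
      "text": "example"
    }
    :::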

🤖 Fix all issues with AI agents
In @docs/cluster/import.md:
- Around line 102-105: The Integration section titled
"(cluster-import-integration) ## Integration" is too minimal and causes
confusion about how integrations differ from File Import and whether MongoDB is
supported as a one-time source; expand this section to briefly define
"integrations" vs "one-time file imports", explicitly state that MongoDB CDC
covers real-time sync while noting whether one-time MongoDB imports are
supported (and link to the MongoDB CDC page), and either move this expanded
section to follow the File Import subsections or add a clear
cross-reference/note under "File Import" clarifying that "MongoDB collection" in
formats refers to format support (not source import) unless one-time MongoDB
import is implemented—if one-time MongoDB import exists, add a short how-to
summary or link to its docs.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70e7909 and 4a0954f.

📒 Files selected for processing (1)
  • docs/cluster/import.md
🔇 Additional comments (5)
docs/cluster/import.md (5)

4-21: LGTM! The introduction effectively summarizes available sources and formats, and the sample data note is a nice UX touch for users.


49-84: LGTM! The S3 section is comprehensive, with clear guidance on permissions and multi-file imports via wildcards. The JSON policy example is helpful.


86-100: LGTM! The Azure section covers the key requirements and includes multi-file wildcard support. The 10 GiB per-file limit is consistently stated.


115-116: Verify schema evolution checkbox label matches the UI.

The documentation states the feature "can be toggled via the 'Allow schema evolution' checkbox" (lines 115-116). Confirm this exact label matches the current UI control name, as checkbox labels are frequently targets of UX refinement and may differ from documentation.


28-28: All image asset paths are correct. Both cluster-import.png (line 28) and cluster-import-file-form.png (line 47) exist in docs/_assets/img/ and are properly referenced in the documentation.

Comment on lines +102 to +105
(cluster-import-integration)=
## Integration

As with other imports, the supported file types are CSV, JSON, and
Parquet.

(cluster-import-file)=
## File

Uploading directly from your computer offers more control over your
data. From the security point of view, you don't have to share the data
on the internet just to be able to import it to your cluster. You also
have more control over who has access to your data. Your files are
temporarily uploaded to a secure location managed by Crate (an S3 bucket
in AWS) which is not publicly accessible. The files are automatically
deleted after 3 days. You may re-import the same file into multiple
tables without having to re-upload it within those 3 days. Up to 5 files
may be uploaded at the same time, with the oldest ones being
automatically deleted if you upload more.

![Cloud Console cluster upload from file](../_assets/img/cluster-import-tab-file.png)

As with other import, the supported file formats are:

- CSV (all variants)
- JSON (JSON-Lines, JSON Arrays and JSON Documents)
- Parquet
{ref}`More info about data integration. <cluster-integrations>`


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and get its size
wc -l docs/cluster/import.md

Repository: crate/cloud-docs

Length of output: 86


🏁 Script executed:

# Read the entire file to understand structure and content
cat -n docs/cluster/import.md

Repository: crate/cloud-docs

Length of output: 5804


🏁 Script executed:

# Search for MongoDB references in the file
rg -n "MongoDB|mongodb" docs/cluster/import.md

Repository: crate/cloud-docs

Length of output: 104


🏁 Script executed:

# Search for S3 and Azure sections to compare with MongoDB
rg -n "S3|Azure|azure" docs/cluster/import.md

Repository: crate/cloud-docs

Length of output: 769


🏁 Script executed:

# Search for MongoDB in related documentation
fd -e md -e rst --search-path docs/cluster | head -20

Repository: crate/cloud-docs

Length of output: 277


🏁 Script executed:

# Search for cluster-integrations reference target
rg -n "cluster-integrations" docs/cluster/import.md docs/cluster/integrations/

Repository: crate/cloud-docs

Length of output: 211


🏁 Script executed:

# Check the integrations index file
cat -n docs/cluster/integrations/index.md

Repository: crate/cloud-docs

Length of output: 2214


🏁 Script executed:

# Check the mongo-cdc file to see if it documents MongoDB import
head -50 docs/cluster/integrations/mongo-cdc.md

Repository: crate/cloud-docs

Length of output: 1864


Integration section is minimalist and should provide more context.

The Integration section (lines 102-105) contains only a single cross-reference. While the reference points to the appropriate integrations documentation, the section itself reads as a placeholder and lacks any explanatory context about what integrations are or how they differ from File Import.

Additionally, there is a terminology confusion: the intro mentions "MongoDB database" and line 15 lists "MongoDB collection" as a supported format, but no guidance exists in the File Import section for importing from MongoDB as a data source (unlike S3 and Azure, which have dedicated subsections). The "MongoDB collection" format reference relates to data format support in other imports, not MongoDB-as-source capability. MongoDB import/sync guidance exists only in the separate Integrations section (MongoDB CDC), which describes continuous real-time synchronization rather than one-time imports.

Consider either:

  1. Expanding the Integration section with a brief explanation of what integrations are and how they differ from one-time file imports, or
  2. Relocating this section to appear after the File Import subsections with clearer separation of concerns
  3. Clarifying whether one-time MongoDB imports are supported in File Import (beyond CDC) and documenting them accordingly
🤖 Prompt for AI Agents
In @docs/cluster/import.md around lines 102 - 105, The Integration section
titled "(cluster-import-integration) ## Integration" is too minimal and causes
confusion about how integrations differ from File Import and whether MongoDB is
supported as a one-time source; expand this section to briefly define
"integrations" vs "one-time file imports", explicitly state that MongoDB CDC
covers real-time sync while noting whether one-time MongoDB imports are
supported (and link to the MongoDB CDC page), and either move this expanded
section to follow the File Import subsections or add a clear
cross-reference/note under "File Import" clarifying that "MongoDB collection" in
formats refers to format support (not source import) unless one-time MongoDB
import is implemented—if one-time MongoDB import exists, add a short how-to
summary or link to its docs.
