Complete documentation on testing markitdown-rs converters and the overall system.
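# Run all tests
cargo test
# Run tests for a specific format
cargo test csv
cargo test docx
# Run a single test by name
cargo test test_csv_basic
# Show output printed by tests
cargo test -- --nocapture
# Print a backtrace on panic
RUST_BACKTRACE=1 cargo test
# Run tests sequentially on a single thread
cargo test -- --test-threads=1
The library includes integration tests for AI capabilities (image description, etc.) using OpenRouter (or any OpenAI-compatible provider).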
To run the LLM tests, you need to set the following environment variables, either in your shell or in a .env file in the project root:
OPENROUTER_API_KEY="your_api_key"
OPENROUTER_ENDPOINT="https://openrouter.ai/api/v1"
OPENROUTER_MODEL="google/gemini-2.0-flash-exp:free"
Run the LLM tests:
cargo test --test llm
If the environment variables are missing, the tests are skipped automatically.
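The skip behaviour could be implemented with a small helper along the following lines. This is a minimal sketch, assuming only the three variable names listed above; `LlmTestConfig`, `llm_config_from`, and `llm_config_from_env` are hypothetical names and are not taken from the markitdown-rs source. The lookup function is injected so the logic can be exercised without touching the real process environment.

```rust
/// Settings needed by the LLM integration tests (hypothetical struct).
#[derive(Debug, PartialEq)]
pub struct LlmTestConfig {
    pub api_key: String,
    pub endpoint: String,
    pub model: String,
}

/// Builds the config from a lookup function, returning `None` when any
/// variable is missing or blank so the caller can skip the test.
pub fn llm_config_from<F>(lookup: F) -> Option<LlmTestConfig>
where
    F: Fn(&str) -> Option<String>,
{
    // Treat blank values the same as unset ones.
    let get = |name: &str| lookup(name).filter(|v| !v.trim().is_empty());
    Some(LlmTestConfig {
        api_key: get("OPENROUTER_API_KEY")?,
        endpoint: get("OPENROUTER_ENDPOINT")?,
        model: get("OPENROUTER_MODEL")?,
    })
}

/// Convenience wrapper that reads the real process environment.
pub fn llm_config_from_env() -> Option<LlmTestConfig> {
    llm_config_from(|name| std::env::var(name).ok())
}
```

A test could then start with `let Some(cfg) = llm_config_from_env() else { return; };` so that it returns early, and is reported as passing, when the variables are not set.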
Tests are organized by format in the tests/ directory:
tests/
├── archive.rs # ZIP, TAR, GZIP, etc.
├── bibtex_log.rs # BibTeX and Log files
├── csv.rs # Comma-separated values
├── docbook.rs # DocBook XML
├── docx.rs # Microsoft Word
├── email.rs # Email (EML, MSG)
├── epub.rs # EPUB ebooks
├── excel.rs # Excel spreadsheets
├── fictionbook.rs # FB2 ebooks
├── html.rs # HTML web pages
├── image.rs # Raster images
├── json.rs # JSON data
├── jupyter.rs # Jupyter notebooks
├── latex.rs # LaTeX documents
├── legacy_office.rs # Old Office formats
├── markdown.rs # Markdown passthrough
├── opendocument.rs # ODF documents
├── opml.rs # OPML outlines
├── orgmode.rs # Org-mode
├── pdf.rs # PDF documents
├── pptx.rs # PowerPoint
├── rst.rs # reStructuredText
├── rtf.rs # Rich Text Format
├── sqlite.rs # SQLite databases
├── table_merge.rs # Table merging utility
├── text.rs # Plain text
├── typst.rs # Typst documents
├── vcard.rs # vCard contacts
├── xml.rs # RSS/Atom feeds
└── yaml.rs # YAML data
tests/test_documents/ # Test fixtures organized by format
├── archive/
├── csv/
├── docbook/
├── docx/
├── email/
├── ... (one per format)
| Test Suite | Count | Status |
|---|---|---|
| Library unit tests | 4 | ✅ Pass |
| Archive | 7 | ✅ Pass |
| BibTeX/Log | 5 | ✅ Pass |
| CSV | 3 | ✅ Pass |
| DocBook | 6 | ✅ Pass |
| DOCX | 12 | ✅ Pass |
| Email | 5 | ✅ Pass |
| EPUB | 6 | ✅ Pass |
| Excel | 6 | ✅ Pass |
| FictionBook | 10 | ✅ Pass |
| HTML | 11 | ✅ Pass |
| Image | 15 | ✅ Pass |
| JSON | 6 | ✅ Pass |
| Jupyter | 5 | ✅ Pass |
| LaTeX | 9 | ✅ Pass |
| Legacy Office | 12 | ✅ Pass |
| Markdown | 3 | ✅ Pass |
| OpenDocument | 10 | ✅ Pass |
| OPML | 6 | ✅ Pass |
| Org-mode | 6 | ✅ Pass |
| PDF | 10 | ✅ Pass (1 ignored) |
| PowerPoint | 7 | ✅ Pass |
| RST | 2 | ✅ Pass |
| RTF | 2 | ✅ Pass |
| SQLite | 4 | ✅ Pass |
| Table Merge | 4 | ✅ Pass |
| Text | 4 | ✅ Pass |
| Typst | 7 | ✅ Pass |
| vCard | 4 | ✅ Pass |
| XML/RSS | 2 | ✅ Pass |
| YAML | 4 | ✅ Pass |
| TOTAL | 198 | ✅ Pass |
| Ignored | 3 | |
Create tests/myformat.rs:
//! MyFormat conversion tests
use bytes::Bytes;
use markitdown::{ConversionOptions, MarkItDown};
use std::fs;
// Conversion options shared by all tests in this file
fn default_options(ext: &str) -> ConversionOptions {
    ConversionOptions {
        file_extension: Some(ext.to_string()),
        url: None,
        llm_client: None,
        extract_images: true,
        force_llm_ocr: false,
        merge_multipage_tables: false,
    }
}
const TEST_DIR: &str = "tests/test_documents/myformat";
// Builds the path to a fixture inside TEST_DIR
fn test_file(name: &str) -> String {
    format!("{}/{}", TEST_DIR, name)
}
// Basic conversion test
#[tokio::test]
async fn test_myformat_basic() {
    let md = MarkItDown::new();
    let result = md
        .convert(&test_file("basic.myfmt"), Some(default_options(".myfmt")))
        .await;
    assert!(
        result.is_ok(),
        "MyFormat conversion failed: {:?}",
        result.err()
    );
    let doc = result.unwrap();
    let content = doc.to_markdown();
    assert!(!content.is_empty(), "Content should not be empty");
}
// Bytes conversion test
#[tokio::test]
async fn test_myformat_bytes_conversion() {
    let md = MarkItDown::new();
    let bytes = fs::read(test_file("basic.myfmt")).expect("Failed to read file");
    let result = md
        .convert_bytes(Bytes::from(bytes), Some(default_options(".myfmt")))
        .await;
    assert!(
        result.is_ok(),
        "MyFormat bytes conversion failed: {:?}",
        result.err()
    );
}
// Feature-specific test
#[tokio::test]
async fn test_myformat_with_tables() {
    let md = MarkItDown::new();
    let result = md
        .convert(&test_file("tables.myfmt"), Some(default_options(".myfmt")))
        .await;
    assert!(result.is_ok());
    let doc = result.unwrap();
    let content = doc.to_markdown();
    // Verify table structure is preserved
    assert!(content.contains("|"), "Should contain table markup");
}
Create test files in tests/test_documents/myformat/:
tests/test_documents/myformat/
├── basic.myfmt # Simple document
├── tables.myfmt # Document with tables
├── with-images.myfmt # Document with embedded images
├── complex.myfmt # Complex/real-world example
└── README.md # Notes on fixtures
- Test Multiple Scenarios:
  - Basic/minimal documents
  - Documents with special features (tables, images, etc.)
  - Bytes vs. file-based input
  - Error conditions
- Use Descriptive Names:
  #[tokio::test]
  async fn test_myformat_preserves_heading_hierarchy() {
      // Good: describes what's being tested
  }
  #[tokio::test]
  async fn test_myformat_1() {
      // Bad: not descriptive
  }
- Assert Specific Content:
  // Good: verifies specific feature
  assert!(content.contains("# Heading"), "Should preserve markdown headings");
  // Bad: too vague
  assert!(!content.is_empty());
- Test Error Cases:
  #[tokio::test]
  async fn test_myformat_empty_file() {
      // Should handle gracefully
      let md = MarkItDown::new();
      let result = md
          .convert_bytes(Bytes::from(vec![]), Some(default_options(".myfmt")))
          .await;
      // Either Ok with an empty doc or an appropriate error is acceptable.
      // This assertion is always true; the test only checks that the call does not panic.
      assert!(result.is_ok() || result.is_err());
  }
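To make the "assert specific content" advice concrete, the sketch below extracts heading levels from Markdown output so that a test such as `test_myformat_preserves_heading_hierarchy` can assert on structure instead of on non-emptiness. Both `heading_levels` and `hierarchy_is_contiguous` are hypothetical helpers, not part of markitdown-rs, and the parsing is deliberately simplified: it recognises only `#`-style headings and does not skip fenced code blocks.

```rust
/// Returns the level (1-6) of every `#`-style heading in `markdown`,
/// in document order. Hypothetical test helper; fenced code blocks are
/// not skipped, so a `# comment` inside one would be counted.
pub fn heading_levels(markdown: &str) -> Vec<usize> {
    markdown
        .lines()
        .filter_map(|line| {
            let trimmed = line.trim_start();
            let hashes = trimmed.chars().take_while(|c| *c == '#').count();
            let rest = &trimmed[hashes..];
            // A heading needs 1-6 hashes followed by a space or end of line.
            let is_heading = (1..=6).contains(&hashes)
                && (rest.is_empty() || rest.starts_with(' '));
            is_heading.then_some(hashes)
        })
        .collect()
}

/// True when no heading is more than one level deeper than the one before it.
pub fn hierarchy_is_contiguous(levels: &[usize]) -> bool {
    levels.windows(2).all(|pair| pair[1] <= pair[0] + 1)
}
```

In a conversion test, the string returned by `doc.to_markdown()` could be passed to `heading_levels` and compared against the levels expected for the fixture.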
Internal unit tests for library components live in src/:
#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_table_merge_basic() {
        let input = vec![/* table data */];
        let result = merge_tables(input);
        assert_eq!(result.len(), 1);
    }
}
Run unit tests:
cargo test --lib
Integration tests verify end-to-end format conversion:
# All integration tests
cargo test --test '*'
# Specific format
cargo test --test docx
For complex formats, consider property-based testing with proptest:
#[cfg(test)]
mod prop_tests {
    use proptest::prelude::*;
    proptest! {
        #[test]
        fn test_csv_always_produces_markdown(csv in ".*") {
            let result = CsvConverter::parse(csv.as_bytes());
            // Property: conversion should never panic
            let _ = result;
        }
    }
}
Use Criterion for benchmarking:
cargo bench
See benches/conversion.rs for benchmark definitions.
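For a quick sanity check while iterating on a converter, a rough timing can be taken with the standard library alone. The sketch below is not Criterion and is not taken from benches/conversion.rs: `mean_duration` is a hypothetical helper with no warm-up, outlier handling, or statistics, so its numbers are only useful for spotting large regressions.

```rust
use std::time::{Duration, Instant};

/// Runs `work` `iterations` times and returns the mean wall-clock time per call.
/// Not a replacement for Criterion: no warm-up, outlier handling, or statistics.
pub fn mean_duration<F: FnMut()>(iterations: u32, mut work: F) -> Duration {
    assert!(iterations > 0, "iterations must be positive");
    let start = Instant::now();
    for _ in 0..iterations {
        work();
    }
    // `Duration` supports division by `u32`.
    start.elapsed() / iterations
}
```

The closure passed as `work` would wrap whatever conversion call is being measured; results worth reporting should still come from `cargo bench`.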
Tests run automatically on:
- Every commit
- Every pull request
- Before release
# Print a backtrace for a failing test
RUST_BACKTRACE=1 cargo test test_name
# Show output printed by a single test
cargo test test_name -- --nocapture
#[tokio::test]
async fn test_myformat_debug() {
    env_logger::builder()
        .is_test(true)
        .try_init()
        .ok();
    log::debug!("Starting test");
    // ... test code
}
Run with logging:
RUST_LOG=debug cargo test -- --nocapture
# Check that the fixtures exist and have the expected type
ls tests/test_documents/myformat/
file tests/test_documents/myformat/basic.myfmt
- Check for hardcoded paths
- Verify test isolation (independent from other tests)
- Look for file locking issues
- PDF and image tests take longer
- Run specific test suites during development:
  cargo test csv # Fast
  cargo test pdf # Slower
- Ensure files exist in tests/test_documents/<format>/
- Use the ignore attribute for optional tests:
  #[tokio::test]
  #[ignore = "requires large fixture file"]
  async fn test_large_file() { }
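As an alternative to marking a test as ignored, a test can check for its fixture at run time and return early when the file is absent. The following is a minimal sketch under that assumption; `fixture_path` and `existing_fixture` are hypothetical helpers, and the directory layout is the tests/test_documents/<format>/ convention described in this guide.

```rust
use std::path::{Path, PathBuf};

/// Builds `<root>/<format>/<name>` (hypothetical helper).
pub fn fixture_path(root: &Path, format: &str, name: &str) -> PathBuf {
    root.join(format).join(name)
}

/// Returns the path only if the fixture is a regular file on disk,
/// logging a note to stderr otherwise so the skip is visible with --nocapture.
pub fn existing_fixture(root: &Path, format: &str, name: &str) -> Option<PathBuf> {
    let path = fixture_path(root, format, name);
    if path.is_file() {
        Some(path)
    } else {
        eprintln!("skipping: fixture not found at {}", path.display());
        None
    }
}
```

Note that a test which returns early this way is reported as passed, not ignored, so the ignore attribute remains the clearer choice when a fixture is known to be unavailable.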
- Collect sample documents
- Store in tests/test_documents/<format>/
- Name clearly: basic.ext, with-images.ext, complex.ext
- Document source and purpose in README.md
Create minimal test files to verify basic functionality:
# Create minimal CSV
echo "Name,Age
Alice,30
Bob,25" > tests/test_documents/csv/minimal.csv
# Create minimal JSON
echo '{"key": "value"}' > tests/test_documents/json/minimal.json
Skip problematic tests temporarily:
#[tokio::test]
#[ignore = "PDF library issue #123"]
async fn test_pdf_complex_layout() {
    // Test code
}
Run ignored tests:
cargo test -- --ignored
Generate coverage reports:
# Using tarpaulin
cargo tarpaulin --out Html
# View in browser (open is the macOS command; use xdg-open on Linux)
open tarpaulin-report.html
Current coverage target: 80%+ for converters
When submitting a new format:
- ✅ Add converter implementation
- ✅ Create test file tests/<format>.rs
- ✅ Add test fixtures to tests/test_documents/<format>/
- ✅ Ensure all tests pass: cargo test
- ✅ Add format to supported list in README