Import & Merge Guide

Guide for importing content from various sources and merging memory slots with intelligent duplicate detection.

Table of Contents

  1. Content Import System
  2. Memory Slot Merging
  3. Supported File Formats
  4. Import Strategies
  5. Merge Strategies
  6. Best Practices
  7. Troubleshooting

Content Import System

Overview

The memcord_import tool enables importing content from various sources into memory slots, expanding beyond manual text entry to support:

  • Text Files: Markdown, plain text, documentation
  • PDF Documents: Research papers, reports, manuals
  • Web Content: Articles, blog posts, documentation pages
  • Structured Data: CSV datasets, JSON configurations

Basic Import Syntax

memcord_import source="<source_path_or_url>" [options]

Required Parameters:

  • source: File path, URL, or data source

Optional Parameters:

  • slot_name: Target memory slot (uses current slot if not specified)
  • description: Descriptive text for the imported content
  • tags: Array of tags for categorization
  • group_path: Hierarchical organization path
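The parameter list above can be sketched as a small helper that assembles the arguments for one import call. This is only an illustration: `build_import_args` is a hypothetical name, and how the arguments actually reach memcord_import depends on your client. Only the parameter names and the "Source cannot be empty" message come from this guide.

```python
def build_import_args(source, slot_name=None, description=None,
                      tags=None, group_path=None):
    """Assemble arguments for one memcord_import call (hypothetical helper)."""
    if not source or not source.strip():
        # Mirrors the error message listed under Troubleshooting.
        raise ValueError("Source cannot be empty")

    args = {"source": source}
    optional = {
        "slot_name": slot_name,      # omitted -> current slot is used
        "description": description,
        "tags": tags,
        "group_path": group_path,
    }
    # Leave out anything the caller did not set so defaults apply.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args


example_args = build_import_args(
    "./project-docs/README.md",
    slot_name="project_readme",
    tags=["docs", "readme"],
    group_path="projects/alpha",
)
print(example_args)
```

Leaving unset options out of the payload, rather than sending empty values, keeps the documented defaults in effect.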

Import Examples

Text File Import

# Import markdown documentation
memcord_import source="./project-docs/README.md" slot_name="project_readme" tags=["docs","readme"] group_path="projects/alpha"

# Import meeting notes
memcord_import source="/notes/meeting_2025_01_15.txt" slot_name="meeting_notes" description="Weekly standup notes" tags=["meeting","standup"]

PDF Document Import

# Import research paper
memcord_import source="/research/paper.pdf" slot_name="research_lit" tags=["research","pdf","literature"] description="Key research paper on ML"

# Import technical manual
memcord_import source="./manuals/api_guide.pdf" slot_name="api_docs" tags=["manual","api","reference"] group_path="documentation/api"

Web Content Import

# Import blog article
memcord_import source="https://example.com/best-practices-guide" slot_name="best_practices" tags=["web","guide"] description="Industry best practices"

# Import documentation page
memcord_import source="https://docs.framework.com/getting-started" slot_name="framework_docs" tags=["docs","web","tutorial"] group_path="learning/frameworks"

Structured Data Import

# Import CSV dataset
memcord_import source="/data/sales_q1_2025.csv" slot_name="sales_data" tags=["data","csv","sales"] description="Q1 2025 sales metrics"

# Import JSON configuration
memcord_import source="./config/app_settings.json" slot_name="app_config" tags=["config","json"] group_path="configurations/app"

Import Metadata

Every import automatically includes rich metadata:

=== IMPORTED CONTENT ===
Source: /path/to/file.pdf
Type: pdf
Imported: 2025-01-15T10:30:00
Description: Research paper on machine learning
========================

[Original content follows...]
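As a rough illustration of how a header like this could be produced, the sketch below formats the same fields. The function name `build_import_header` is hypothetical and the layout is simply copied from the example above; it is not taken from the memcord source.

```python
from datetime import datetime


def build_import_header(source, source_type, imported_at, description=""):
    """Format a metadata header in the layout shown above (hypothetical helper)."""
    lines = [
        "=== IMPORTED CONTENT ===",
        f"Source: {source}",
        f"Type: {source_type}",
        f"Imported: {imported_at.isoformat(timespec='seconds')}",
    ]
    if description:
        lines.append(f"Description: {description}")
    lines.append("=" * 24)  # closing rule, same width as the title line
    return "\n".join(lines)


header = build_import_header(
    "/path/to/file.pdf",
    "pdf",
    datetime(2025, 1, 15, 10, 30),
    "Research paper on machine learning",
)
print(header)
```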

Memory Slot Merging

Overview

The memcord_merge tool consolidates multiple memory slots into a single, organized slot with:

  • Duplicate Detection: Configurable similarity thresholds
  • Chronological Ordering: Timeline-based content organization
  • Metadata Consolidation: Combined tags and groups
  • Preview Mode: See results before execution
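Chronological ordering and metadata consolidation can be pictured with a minimal sketch like the one below. The `Slot` class and `consolidate` function are hypothetical stand-ins, and the choice to de-duplicate tags while keeping first-seen order is an assumption; the tool's real data model may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Slot:
    """Hypothetical stand-in for a memory slot."""
    name: str
    created: datetime
    tags: list = field(default_factory=list)
    groups: list = field(default_factory=list)


def consolidate(slots):
    """Order slots by timestamp and union their tags and groups."""
    ordered = sorted(slots, key=lambda slot: slot.created)
    # dict.fromkeys drops repeats while keeping first-seen order.
    tags = list(dict.fromkeys(tag for slot in ordered for tag in slot.tags))
    groups = list(dict.fromkeys(group for slot in ordered for group in slot.groups))
    return [slot.name for slot in ordered], tags, groups


order, merged_tags, merged_groups = consolidate([
    Slot("meeting2", datetime(2025, 1, 15, 9), ["meeting", "weekly"], ["meetings/weekly"]),
    Slot("meeting1", datetime(2025, 1, 8, 9), ["meeting", "project"], ["meetings/weekly"]),
])
print(order, merged_tags, merged_groups)
```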

Basic Merge Syntax

memcord_merge source_slots=["slot1","slot2"] target_slot="merged_slot" [options]

Required Parameters:

  • source_slots: Array of memory slots to merge (minimum 2)
  • target_slot: Name for the merged result

Optional Parameters:

  • action: preview (default) or merge
  • similarity_threshold: 0.0-1.0 (default 0.8)
  • delete_sources: true/false (default false)
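The constraints in these parameter lists can be checked before a call is made. The following is a sketch under stated assumptions: `validate_merge_request` is a hypothetical helper, and only the parameter names, defaults, and the "At least 2 source slots" message come from this guide.

```python
def validate_merge_request(source_slots, target_slot, action="preview",
                           similarity_threshold=0.8, delete_sources=False):
    """Check merge arguments against the documented constraints (hypothetical helper)."""
    if len(source_slots) < 2:
        raise ValueError("At least 2 source slots are required for merging")
    if not target_slot:
        raise ValueError("target_slot is required")
    if action not in ("preview", "merge"):
        raise ValueError("action must be 'preview' or 'merge'")
    if not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError("similarity_threshold must be between 0.0 and 1.0")
    return {
        "source_slots": list(source_slots),
        "target_slot": target_slot,
        "action": action,
        "similarity_threshold": similarity_threshold,
        "delete_sources": delete_sources,
    }


request = validate_merge_request(["slot1", "slot2"], "merged_slot")
print(request)
```

Because `action` defaults to `preview`, a call that omits it never modifies any slot.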

Merge Workflow

1. Preview Phase

# Preview merge to see statistics
memcord_merge source_slots=["meeting1","meeting2","meeting3"] target_slot="project_meetings" action="preview"

Preview Output:

=== MERGE PREVIEW: project_meetings ===
Source slots: meeting1, meeting2, meeting3
Total content length: 15,420 characters
Duplicate content to remove: 7 sections
Similarity threshold: 80.0%

Merged tags (8): meeting, project, alpha, weekly, standup, urgent, decisions, action-items
Merged groups (1): meetings/weekly

Chronological order:
  - meeting1: 2025-01-08 09:00:00
  - meeting2: 2025-01-15 09:00:00  
  - meeting3: 2025-01-22 09:00:00

⚠️  WARNING: Target slot 'project_meetings' already exists and will be overwritten!

Content preview:
==========================================
=== MERGED MEMORY SLOT ===
Created: 2025-01-22 14:30:00
Source Slots: meeting1, meeting2, meeting3
Total Sources: 3
=========================

--- From meeting1 (2025-01-08 09:00:00) ---
Team Standup - Jan 8, 2025
[Content follows...]
==========================================

To execute the merge, call memcord_merge again with action='merge'

2. Execution Phase

# Execute the merge
memcord_merge source_slots=["meeting1","meeting2","meeting3"] target_slot="project_meetings" action="merge"

Execution Output:

✅ Successfully merged 3 slots into 'project_meetings'
Final content: 14,150 characters
Duplicates removed: 7 sections
Merged at: 2025-01-22 14:30:15

Source slots: meeting1, meeting2, meeting3
Tags merged: meeting, project, alpha, weekly, standup, urgent, decisions, action-items
Groups merged: meetings/weekly

Advanced Merge Options

Custom Similarity Threshold

# More aggressive duplicate detection (70% similarity)
memcord_merge source_slots=["draft1","draft2"] target_slot="final_doc" action="merge" similarity_threshold=0.7

# More conservative duplicate detection (90% similarity)
memcord_merge source_slots=["notes1","notes2"] target_slot="combined_notes" action="merge" similarity_threshold=0.9
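To build intuition for what the threshold does, here is a minimal sketch of threshold-based duplicate removal. It uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; this guide does not say which measure memcord_merge uses, so treat the function `drop_near_duplicates` and its scoring as assumptions.

```python
from difflib import SequenceMatcher


def drop_near_duplicates(sections, threshold=0.8):
    """Keep a section only if it is not too similar to one already kept."""
    kept = []
    for section in sections:
        is_duplicate = any(
            SequenceMatcher(None, section, earlier).ratio() >= threshold
            for earlier in kept
        )
        if not is_duplicate:
            kept.append(section)
    return kept


sections = [
    "Review the API design on Monday",
    "Review the API design on Tuesday",   # near-duplicate of the first
    "Budget approved for Q2 hiring",
]
aggressive = drop_near_duplicates(sections, threshold=0.8)
conservative = drop_near_duplicates(sections, threshold=0.95)
print(len(aggressive), len(conservative))
```

With the lower threshold the two "Review" lines are treated as duplicates and only one survives; with the higher threshold both are kept. That is the trade-off between aggressive and conservative settings.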

Source Cleanup

# Merge and delete source slots
memcord_merge source_slots=["temp1","temp2","temp3"] target_slot="consolidated" action="merge" delete_sources=true

Supported File Formats

Text Files

  • Extensions: .txt, .md, .markdown, .rst, .log
  • Encoding: UTF-8 (automatic detection)
  • Size Limit: 50MB per file
  • Features: Preserves formatting, handles large files

PDF Documents

  • Processing: Page-by-page text extraction
  • Library: pdfplumber for robust extraction
  • Features: Page number headers, maintains structure
  • Limitations: Text-based PDFs only (no OCR)

Web Content

  • Protocols: HTTP/HTTPS
  • Processing: Clean article extraction with trafilatura
  • Features: Removes ads/navigation, preserves main content
  • Metadata: Page title, content type, extraction method

Structured Data

  • JSON: Configuration files, API responses, data exports
  • CSV/TSV: Datasets, reports, tabular data
  • Processing: pandas for robust data handling
  • Features: Schema detection, row/column statistics
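The format list above suggests a dispatch step that picks a handler from the source string. The sketch below is one way that could look; `detect_source_type`, the category labels other than `pdf`, and the rule that URLs are checked before extensions are all assumptions. The extensions and error messages are the ones listed in this guide.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

TEXT_EXTENSIONS = {".txt", ".md", ".markdown", ".rst", ".log"}
STRUCTURED_EXTENSIONS = {".json", ".csv", ".tsv"}


def detect_source_type(source):
    """Map a source string to a handler category (hypothetical helper)."""
    if not source.strip():
        raise ValueError("Source cannot be empty")
    # Assumption: anything with an http(s) scheme goes to the web handler.
    if urlparse(source).scheme in ("http", "https"):
        return "web"
    suffix = PurePosixPath(source).suffix.lower()
    if suffix in TEXT_EXTENSIONS:
        return "text"
    if suffix == ".pdf":
        return "pdf"
    if suffix in STRUCTURED_EXTENSIONS:
        return "structured"
    raise ValueError("No suitable import handler found for source")


for example in ("./project-docs/README.md", "/research/paper.pdf",
                "https://example.com/best-practices-guide",
                "/data/sales_q1_2025.csv"):
    print(example, "->", detect_source_type(example))
```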

Import Strategies

1. Hierarchical Organization

# Organize by project and type
memcord_import source="./docs/api.md" slot_name="api_docs" group_path="projects/alpha/documentation"
memcord_import source="./specs/requirements.pdf" slot_name="requirements" group_path="projects/alpha/specifications"

2. Thematic Tagging

# Tag by content themes
memcord_import source="article1.pdf" slot_name="research1" tags=["ai","neural-networks","deep-learning"]
memcord_import source="article2.pdf" slot_name="research2" tags=["ai","computer-vision","cnn"]

3. Batch Import Workflows

# Import multiple related files
for file in docs/*.md; do
    memcord_import source="$file" slot_name="doc_$(basename "$file" .md)" tags=["docs","batch"] group_path="documentation/guides"
done
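The same batch idea can be planned in Python before any import runs. This sketch only builds the argument sets. `plan_batch_import` is a hypothetical helper, and how each set is then passed to memcord_import depends on your client.

```python
from pathlib import Path


def plan_batch_import(paths, tags, group_path, prefix="doc_"):
    """Build one argument set per file, naming each slot after the file stem."""
    return [
        {
            "source": str(path),
            "slot_name": f"{prefix}{Path(path).stem}",
            "tags": list(tags),
            "group_path": group_path,
        }
        for path in paths
    ]


plan = plan_batch_import(
    ["docs/setup.md", "docs/usage.md"],
    tags=["docs", "batch"],
    group_path="documentation/guides",
)
for entry in plan:
    print(entry["slot_name"], "<-", entry["source"])
```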

4. Source Type Specialization

# Web content with source attribution
memcord_import source="https://tech-blog.com/article" slot_name="tech_trends" tags=["web","trends","external"] description="External tech trends analysis"

# Internal documentation
memcord_import source="./internal/process.md" slot_name="internal_process" tags=["internal","process","confidential"] description="Internal process documentation"

Merge Strategies

1. Chronological Consolidation

# Merge time-series content (meetings, logs, reports)
memcord_merge source_slots=["jan_meetings","feb_meetings","mar_meetings"] target_slot="q1_meetings" action="merge"

2. Thematic Consolidation

# Merge by topic or theme
memcord_merge source_slots=["api_docs1","api_docs2","api_reference"] target_slot="complete_api_docs" action="merge"

3. Progressive Consolidation

# Multi-stage merging for large datasets
# Stage 1: Merge weekly reports
memcord_merge source_slots=["week1","week2","week3","week4"] target_slot="month1" action="merge"
memcord_merge source_slots=["week5","week6","week7","week8"] target_slot="month2" action="merge"

# Stage 2: Merge monthly summaries
memcord_merge source_slots=["month1","month2","month3"] target_slot="q1_summary" action="merge"
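For larger sets, the stages can be computed rather than written by hand. The sketch below produces a list of merge steps; the function `plan_progressive_merge` and the `stageN_M` target names are hypothetical, and each step would still need to be run as a memcord_merge call with `action="merge"`.

```python
def plan_progressive_merge(slots, batch_size, prefix="stage"):
    """Return (source_slots, target_slot) steps that merge slots in stages."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    steps = []
    current = list(slots)
    level = 1
    while len(current) > 1:
        next_level = []
        for start in range(0, len(current), batch_size):
            batch = current[start:start + batch_size]
            if len(batch) == 1:
                # A leftover slot is carried to the next stage unchanged.
                next_level.append(batch[0])
                continue
            target = f"{prefix}{level}_{len(next_level) + 1}"
            steps.append((batch, target))
            next_level.append(target)
        current = next_level
        level += 1
    return steps


weeks = [f"week{n}" for n in range(1, 9)]
merge_steps = plan_progressive_merge(weeks, batch_size=4)
for sources, target in merge_steps:
    print(sources, "->", target)
```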

4. Cleanup and Archival

# Merge temporary slots and cleanup
memcord_merge source_slots=["temp_notes1","temp_notes2","temp_drafts"] target_slot="archived_content" action="merge" delete_sources=true

Best Practices

Import Best Practices

  1. Use Descriptive Slot Names

    # Good
    memcord_import source="report.pdf" slot_name="q1_sales_report_2025"
    
    # Avoid
    memcord_import source="report.pdf" slot_name="report1"
  2. Apply Consistent Tagging

    # Consistent taxonomy
    memcord_import source="doc.pdf" tags=["finance","quarterly","report","2025"]
  3. Organize with Group Paths

    # Hierarchical organization
    memcord_import source="spec.md" group_path="projects/alpha/specifications"
  4. Add Context with Descriptions

    # Descriptive context
    memcord_import source="data.csv" description="Customer survey responses Q1 2025 - 1,500 respondents"

Merge Best Practices

  1. Always Preview First

    # Preview before executing
    memcord_merge source_slots=["a","b"] target_slot="merged" action="preview"
    # Review output, then:
    memcord_merge source_slots=["a","b"] target_slot="merged" action="merge"
  2. Adjust Similarity Thresholds

    # For technical docs (conservative)
    memcord_merge ... similarity_threshold=0.9
    
    # For meeting notes (aggressive)
    memcord_merge ... similarity_threshold=0.7
  3. Use Cleanup Strategically

    # Only delete sources when confident
    memcord_merge ... delete_sources=true action="merge"
  4. Meaningful Target Names

    # Descriptive merge targets
    memcord_merge ... target_slot="project_alpha_complete_documentation"

Organization Best Practices

  1. Consistent Naming Conventions

    • Use descriptive, date-stamped names
    • Follow project/team naming standards
    • Include version numbers for iterations
  2. Strategic Group Hierarchies

    projects/
    ├── alpha/
    │   ├── documentation/
    │   ├── meetings/
    │   └── specifications/
    └── beta/
        ├── research/
        └── development/
    
  3. Tag Taxonomies

    # Category tags: [type, priority, status, domain]
    tags=["meeting","high","active","frontend"]

Troubleshooting

Import Issues

File Not Found

Error: Source cannot be empty
Error: File not found: /path/to/file.pdf

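Solution: Verify the file path and read permissions. If the message is "Source cannot be empty", check that the source parameter was actually set.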

Unsupported Format

Error: No suitable import handler found for source

Solution: Check the Supported File Formats section and convert the file to a supported format if necessary

Web Content Extraction Failed

Import failed: No content could be extracted from URL

Solutions:

  • Check URL accessibility
  • Verify content is text-based
  • Try a different URL if the page is behind a paywall or login

Large File Handling

Import failed: File too large

Solutions:

  • Split large files into smaller sections
  • Use compression if applicable
  • Consider cloud storage with direct links
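Splitting a large text file can be done at paragraph boundaries so each part stays readable. The sketch below is one hedged way to do that; `split_text` is a hypothetical helper, and the byte limit you pass in should match whatever limit applies to your import (the Text Files section lists 50MB per file).

```python
def split_text(text, max_bytes):
    """Split text at blank lines into chunks of roughly max_bytes or less."""
    chunks, current, size = [], [], 0
    for paragraph in text.split("\n\n"):
        # Count the paragraph plus the blank-line separator.
        needed = len(paragraph.encode("utf-8")) + 2
        if current and size + needed > max_bytes:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(paragraph)
        size += needed
    if current:
        chunks.append("\n\n".join(current))
    return chunks


parts = split_text("aaaa\n\nbbbb\n\ncccc", max_bytes=12)
print(parts)
```

A single paragraph larger than `max_bytes` is kept whole in its own chunk, so very long unbroken text would need a different splitting rule.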

Merge Issues

Insufficient Source Slots

Error: At least 2 source slots are required for merging

Solution: Provide at least two valid slot names

Missing Source Slots

Error: Memory slots not found: slot1, slot3

Solution: Verify slot names with memcord_list

Target Slot Conflicts

⚠️ WARNING: Target slot 'merged' already exists and will be overwritten!

Solution:

  • Use different target name, or
  • Proceed if overwrite is intentional

Memory/Performance Issues

Merge operation failed: Memory allocation error

Solutions:

  • Reduce content size
  • Use a higher similarity threshold
  • Merge in smaller batches

Performance Optimization

Large Content Handling

# Use higher similarity thresholds for faster processing
memcord_merge ... similarity_threshold=0.9

# Process in smaller batches
memcord_merge source_slots=["batch1","batch2"] target_slot="intermediate1" action="merge"
memcord_merge source_slots=["batch3","batch4"] target_slot="intermediate2" action="merge"
memcord_merge source_slots=["intermediate1","intermediate2"] target_slot="final" action="merge"

Web Import Optimization

# Batch web imports to avoid rate limiting
for url in $urls; do
    memcord_import source="$url" ...
    sleep 2  # Rate limiting
done
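A Python version of the same pacing idea is sketched below. The `import_one` callable is a placeholder for whatever actually triggers memcord_import in your setup, and `import_with_delay` is a hypothetical helper; the two-second default matches the `sleep 2` in the loop above.

```python
import time


def import_with_delay(urls, import_one, delay_seconds=2.0, sleep=time.sleep):
    """Call import_one for each URL, pausing between calls.

    Failures are recorded per URL instead of stopping the whole batch.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause between requests, not before the first
        try:
            results[url] = import_one(url)
        except Exception as exc:  # keep going; report the error for this URL
            results[url] = exc
    return results


# Demonstration with a stand-in importer and no real waiting.
recorded_pauses = []
demo_results = import_with_delay(
    ["https://example.com/a", "https://example.com/b"],
    import_one=lambda url: f"imported {url}",
    sleep=recorded_pauses.append,
)
print(demo_results)
```

Passing `sleep` as a parameter makes the pacing easy to test without waiting.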

Resource Management

# Cleanup after major operations
memcord_merge ... delete_sources=true  # Remove temporary slots

This guide covers all aspects of using the import and merge features effectively. For additional help, refer to the Tools Reference for detailed parameter specifications and the Examples for practical workflows.