Backfill Workflow

akurait edited this page Dec 4, 2025 · 5 revisions

This page describes how to perform document backfill using the Workflow CLI on Kubernetes. The workflow-based approach provides declarative configuration, automatic retry handling, and progress monitoring through Argo Workflows.

Overview

The backfill workflow migrates documents from your source cluster to your target cluster using Reindex-From-Snapshot (RFS), a snapshot-based reindexing method. Argo Workflows orchestrates the entire process.

What the workflow does

The backfill workflow performs these steps automatically:

  1. Create snapshot: Creates a snapshot of specified indexes on the source cluster
  2. Register snapshot: Makes the snapshot available to the target cluster
  3. Migrate metadata (optional): Transfers index templates and settings
  4. Load documents: Reindexes documents from the snapshot to the target cluster
  5. Cleanup: Removes temporary state and coordination data

Prerequisites

Before running a backfill workflow:

  • Have access to the migration console on your Kubernetes cluster
  • Have Argo Workflows installed and running
  • Ensure source and target clusters are accessible
  • Have an S3 bucket or persistent volume configured for snapshots
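
The checks above can be sketched as a small preflight script. This is an illustration only: the function names `check` and `preflight` are hypothetical, the `argo` and `ma` namespaces are taken from the troubleshooting commands later on this page, and the curl probes assume the cluster endpoints answer without authentication. Note that `kubectl get` succeeds even when it finds no resources, so a pass here only shows that the API call worked.

```shell
# Run one check, report pass/fail, and keep going.
check() {
  label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "ok:   $label"
  else
    echo "FAIL: $label"
    failed=1
  fi
}

# Hypothetical preflight: $1 = source endpoint, $2 = target endpoint.
# Returns non-zero if any check failed.
preflight() {
  failed=0
  check "Argo Workflows pods (namespace argo)" kubectl get pods -n argo
  check "workflow templates (namespace ma)" kubectl get workflowtemplates -n ma
  check "source cluster reachable" curl -sf "$1"
  check "target cluster reachable" curl -sf "$2"
  return "$failed"
}
```

Call it as `preflight http://source:9200 http://target:9200`, substituting your own endpoints.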

Configuring a backfill workflow

Basic configuration

A minimal backfill configuration includes:

sourceClusters:
  my-source:
    endpoint: https://source-cluster:9200
    version: "7.10.2"
    authConfig:
      basic:
        username: admin
        password: password

targetClusters:
  my-target:
    endpoint: https://target-cluster:9200
    version: "2.11.0"
    authConfig:
      basic:
        username: admin
        password: password

migrations:
  - sourceCluster: my-source
    targetCluster: my-target
    snapshotMigrations:
      - indices: ["*"]
        metadataMigration:
          enabled: true
        documentBackfill:
          enabled: true

Configuration options

Index selection

Specify which indexes to migrate using patterns:

snapshotMigrations:
  # The entries below are alternatives; use the one that fits your migration.

  # All indexes
  - indices: ["*"]

  # Specific patterns
  - indices: ["logs-*", "metrics-*"]

  # Multiple specific indexes
  - indices: ["users", "orders", "products"]
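
To preview which indexes a pattern list would select before you submit, a short shell helper can apply wildcard matching to a list of index names. This sketch assumes the patterns behave like shell-style globs; the workflow's actual matching rules may differ, and the function name `match_indices` is hypothetical.

```shell
# Hypothetical helper: read index names from stdin (one per line) and print
# each name that matches at least one of the glob patterns given as arguments.
match_indices() {
  while IFS= read -r name; do
    for pattern in "$@"; do
      case "$name" in
        # Unquoted $pattern is interpreted as a glob by `case`.
        $pattern) printf '%s\n' "$name"; break ;;
      esac
    done
  done
}
```

For example, `printf '%s\n' logs-2023-01 metrics-cpu users | match_indices 'logs-*' 'metrics-*'` prints `logs-2023-01` and `metrics-cpu`. To test against real index names, pipe in the output of `curl -s "http://source:9200/_cat/indices?h=index"`.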

Metadata migration

Control whether to migrate index templates and settings:

metadataMigration:
  enabled: true   # true: migrate templates, mappings, settings; false: skip metadata (documents only)

Document backfill

Enable or disable document migration:

documentBackfill:
  enabled: true   # true: migrate documents; false: skip documents (metadata only)

S3 snapshot configuration

If you use S3 for snapshots, add a snapshotRepo block to the source cluster:

sourceClusters:
  my-source:
    endpoint: https://source-cluster:9200
    version: "7.10.2"
    snapshotRepo:
      s3RepoPathUri: s3://my-bucket/snapshots
      aws_region: us-east-1

Running a backfill workflow

Step 1: Create configuration

Access the migration console and create your configuration:

# Load sample and customize
workflow configure sample --load
workflow configure edit

Step 2: Submit workflow

Submit the workflow to Argo:

workflow submit

Note the workflow name from the output for monitoring.

Step 3: Monitor progress

Check workflow status:

workflow status

Example output:

[*] Workflow: migration-abc123
  Phase: Running
  Started: 2024-01-15T10:30:00Z

📋 Workflow Steps
├── ✓ Initialize (Succeeded)
├── ✓ Create Snapshot (Succeeded)
├── ✓ Register Snapshot (Succeeded)
├── ▶ Migrate Metadata (Running)
├── ○ Restore Documents (Pending)
└── ○ Cleanup (Pending)

Step 4: Wait for completion

The workflow runs asynchronously. Monitor until all steps succeed:

watch -n 10 workflow status

Or submit with the --wait flag to block until completion:

workflow submit --wait --timeout 3600
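
If you prefer a script over `watch`, the polling can be sketched as a loop that reads the phase from `workflow status`. This is a sketch under assumptions: it expects a `Phase: <value>` line like the one in the example output above, and it treats Argo's phase names `Succeeded`, `Failed`, and `Error` as terminal. The function names `extract_phase` and `wait_for_workflow` are hypothetical.

```shell
# Print the value of the first "Phase:" line read from stdin.
extract_phase() {
  awk '$1 == "Phase:" { print $2; exit }'
}

# Poll `workflow status` every $1 seconds (default 10) until the workflow
# reaches a terminal phase. Returns 0 on Succeeded, 1 on Failed or Error.
wait_for_workflow() {
  interval="${1:-10}"
  while :; do
    phase="$(workflow status | extract_phase)"
    echo "phase: ${phase:-unknown}"
    case "$phase" in
      Succeeded) return 0 ;;
      Failed|Error) return 1 ;;
    esac
    sleep "$interval"
  done
}
```

Run `wait_for_workflow 10` after `workflow submit` and use its exit status to decide whether to continue with verification.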

Advanced scenarios

Multiple index groups

Migrate different sets of indexes with different configurations:

migrations:
  - sourceCluster: my-source
    targetCluster: my-target
    snapshotMigrations:
      # Critical data: full migration
      - indices: ["orders-*", "customers-*"]
        metadataMigration:
          enabled: true
        documentBackfill:
          enabled: true
      
      # Historical logs: documents only
      - indices: ["logs-2023-*"]
        metadataMigration:
          enabled: false
        documentBackfill:
          enabled: true
      
      # Templates only: no documents
      - indices: ["templates-*"]
        metadataMigration:
          enabled: true
        documentBackfill:
          enabled: false

Parallel migrations

Migrate from multiple sources simultaneously:

sourceClusters:
  source-1:
    endpoint: https://source1:9200
  source-2:
    endpoint: https://source2:9200

targetClusters:
  target:
    endpoint: https://target:9200

migrations:
  - sourceCluster: source-1
    targetCluster: target
    snapshotMigrations:
      - indices: ["source1-*"]
  
  - sourceCluster: source-2
    targetCluster: target
    snapshotMigrations:
      - indices: ["source2-*"]

The workflow engine executes these migrations in parallel where possible.

Using existing snapshots

If you have existing snapshots, configure the snapshot repository:

sourceClusters:
  my-source:
    endpoint: https://source:9200
    snapshotRepo:
      s3RepoPathUri: s3://existing-bucket/existing-snapshots
      aws_region: us-west-2

Monitoring and troubleshooting

Check workflow status

View all running workflows:

workflow status

View specific workflow details:

workflow status

View workflow logs

Check logs for a specific workflow:

workflow output

Common issues

Workflow stuck in pending

Problem: Workflow shows all steps as Pending.

Solutions:

  • Check if workflow templates are deployed: kubectl get workflowtemplates -n ma
  • Verify Argo Workflows is running: kubectl get pods -n argo
  • Check resource availability: kubectl describe pod -n ma

Snapshot creation failed

Problem: Snapshot creation step fails.

Solutions:

  • Verify S3 bucket permissions
  • Check that the source cluster has a snapshot repository configured
  • Ensure the indexes exist and are accessible
  • Review the snapshot step logs for specific errors

Document loading slow or stalled

Problem: Document restore is taking too long or appears stuck.

Solutions:

  • Check target cluster health and capacity
  • Monitor shard allocation: curl "http://target:9200/_cat/shards?v"
  • Verify network connectivity between clusters
  • Check that the target cluster has sufficient disk space

Metadata migration errors

Problem: Metadata migration fails with compatibility errors.

Solutions:

  • Review breaking changes between source and target versions
  • Check for unsupported field types
  • See Migration Paths for version compatibility details

Stopping a workflow

If you need to stop a running workflow:

workflow stop

This gracefully terminates the workflow and cleans up resources.

Performance considerations

Parallelism

The workflow engine runs independent tasks in parallel. Size parallelism to your cluster capacity:

  • Default: up to 100 concurrent pods
  • Kubernetes cluster resources limit how many pods can actually run
  • Balance parallelism against source and target cluster capacity

Resource limits

Workflow pods use default resource limits. For large migrations, you may need to adjust:

  • CPU and memory requests/limits
  • Storage for temporary data
  • Network bandwidth

Migration duration

Factors affecting migration time:

| Factor | Impact |
| --- | --- |
| Total data volume | Primary factor in duration |
| Number of indexes and shards | Affects parallelism |
| Source cluster load | May throttle snapshot speed |
| Target cluster capacity | Affects indexing speed |
| Network bandwidth | Affects data transfer |
| S3 snapshot transfer speeds | Affects snapshot operations |

Verification

After the workflow completes:

1. Check workflow status

workflow status

2. Verify document counts

# Source cluster (compare the docs.count column)
curl "http://source:9200/_cat/indices?v"

# Target cluster
curl "http://target:9200/_cat/indices?v"
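
Comparing the two listings by eye gets tedious with many indexes, so the comparison can be scripted. The sketch below assumes both clusters are reachable without authentication, as in the curl commands above; the function names `fetch_counts` and `compare_counts` are hypothetical. It uses the `h=index,docs.count` parameter of `_cat/indices` to print one `index count` pair per line.

```shell
# Fetch "index docs.count" pairs from the cluster at base URL $1.
fetch_counts() {
  curl -s "$1/_cat/indices?h=index,docs.count"
}

# Compare two files of "index count" lines ($1 = source, $2 = target).
# Prints indexes that are missing on the target or whose counts differ,
# and returns non-zero if any mismatch is found.
compare_counts() {
  awk '
    FILENAME == ARGV[1] { src[$1] = $2; next }   # first file: source counts
    { tgt[$1] = $2 }                             # second file: target counts
    END {
      bad = 0
      for (i in src) {
        if (!(i in tgt))           { print i ": missing on target"; bad = 1 }
        else if (src[i] != tgt[i]) { print i ": source=" src[i] " target=" tgt[i]; bad = 1 }
      }
      exit bad
    }
  ' "$1" "$2"
}
```

Save each cluster's output with `fetch_counts http://source:9200 > source_counts.txt` and `fetch_counts http://target:9200 > target_counts.txt`, then run `compare_counts source_counts.txt target_counts.txt`. Source indexes that you excluded from the migration are reported as missing on the target, so filter the source list first if you migrated only a subset.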

3. Compare index settings

# Check target cluster index settings (replace <index> with an index name)
curl "http://target:9200/<index>/_settings"

4. Test queries

Run test queries on the target cluster to ensure data is accessible and correctly indexed.

Next steps

After completing the backfill workflow:

  1. Verify all data migrated successfully
  2. Test application connectivity to the target cluster
  3. Review Troubleshooting if you encounter issues
