
Enabling multiprocessing in sv_inference.py #17

Open
heesuallykim wants to merge 3 commits into SUwonglab:master from heesuallykim:multiprocessing

Conversation

@heesuallykim

I've added multiprocessing steps to the sv_inference.py code, as well as an n_processes option in the CLI. I'm pasting Claude's visual guide to the multiprocessing steps below.

Visual Guide: How Multiprocessing Works in sv_inference

Overview Diagram

┌─────────────────────────────────────────────────────────────┐
│              SV INFERENCE PIPELINE                          │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  1. Load Data & Build Graph                                 │
│     (Sequential - not parallelized)                         │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  2. Decompose Graph into Subgraphs                          │
│     (Sequential - very fast)                                │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  3. Test for Insertions (PARALLELIZED! 🚀)                 │
│                                                             │
│     ORIGINAL (Sequential):                                  │
│     Block 1 ──> Block 2 ──> Block 3 ──> ... ──> Block N    │
│     Time: T1 + T2 + T3 + ... + TN                          │
│                                                             │
│     OPTIMIZED (Parallel):                                   │
│     Block 1 ─┐                                              │
│     Block 2 ─┤                                              │
│     Block 3 ─┼──> Process Pool ──> Results                 │
│     Block 4 ─┤                                              │
│     Block N ─┘                                              │
│     Time: (T1 + T2 + ... + TN) / num_cores                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  4. Expand & Merge Subgraphs                                │
│     (Sequential - very fast)                                │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  5. Process Subgraphs (PARALLELIZED! 🚀🚀🚀)               │
│     (This is the BIGGEST optimization)                      │
│                                                             │
│     ORIGINAL (Sequential):                                  │
│     Subgraph 1 ──> Subgraph 2 ──> ... ──> Subgraph M       │
│     Time: S1 + S2 + ... + SM                               │
│                                                             │
│     OPTIMIZED (Parallel):                                   │
│     Subgraph 1 ─┐                                           │
│     Subgraph 2 ─┤                                           │
│     Subgraph 3 ─┼──> Process Pool ──> SV Calls            │
│     Subgraph 4 ─┤                                           │
│     Subgraph M ─┘                                           │
│     Time: (S1 + S2 + ... + SM) / num_cores                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  6. Write Output Files                                      │
│     (Sequential - must maintain order)                      │
└─────────────────────────────────────────────────────────────┘

Data Flow Diagram

┌──────────────────────────────────────────────────────────┐
│                    Main Process                          │
│  ┌────────────────────────────────────────────────────┐  │
│  │  Input Data: blocks, graph, options, ...          │  │
│  └────────────────────────────────────────────────────┘  │
│                          │                               │
│                          │ Pickle & Send                 │
│                          ▼                               │
│  ┌─────────────────────────────────────────────────┐    │
│  │         Process Pool (multiprocessing)          │    │
│  │                                                  │    │
│  │  ┌──────────────┐  ┌──────────────┐            │    │
│  │  │  Worker 1    │  │  Worker 2    │            │    │
│  │  │              │  │              │   ...      │    │
│  │  │ [Subgraph 1] │  │ [Subgraph 2] │            │    │
│  │  │              │  │              │            │    │
│  │  │  Compute     │  │  Compute     │            │    │
│  │  │  Likelihood  │  │  Likelihood  │            │    │
│  │  │  Call SVs    │  │  Call SVs    │            │    │
│  │  │              │  │              │            │    │
│  │  └──────┬───────┘  └──────┬───────┘            │    │
│  │         │                 │                     │    │
│  │         └─────────┬───────┘                     │    │
│  │                   │ Pickle & Return             │    │
│  └───────────────────┼─────────────────────────────┘    │
│                      ▼                                   │
│  ┌────────────────────────────────────────────────────┐ │
│  │  Collect Results: SV calls, VCF lines, stats      │ │
│  └────────────────────────────────────────────────────┘ │
│                          │                              │
│                          ▼                              │
│  ┌────────────────────────────────────────────────────┐ │
│  │  Write Output Files (sequential)                   │ │
│  └────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘

Multiprocessing Optimization for sv_inference.py

Overview

This document explains the multiprocessing optimizations added to sv_inference.py to improve runtime performance. The optimized version (sv_inference_mp.py) uses Python's multiprocessing module to parallelize computationally expensive operations.

Key Changes

1. Parallel Insertion Testing

The original code tested for insertions sequentially across all blocks. This is now parallelized:

Original (Sequential):

for b in range(0, len(blocks) - 1):
    # Test insertion for each block one at a time
    # This can be slow when there are many blocks

Optimized (Parallel):

# Create a pool of worker processes
with Pool(processes=n_processes) as pool:
    results = pool.map(test_single_insertion, block_args)

Why this helps: Each block's insertion test is independent, so they can run simultaneously on different CPU cores.

2. Parallel Subgraph Processing

The most significant optimization is parallelizing subgraph processing:

Original (Sequential):

for sub in subgraphs:
    # Process each subgraph one at a time
    # This is the most time-consuming part
    process_subgraph(...)

Optimized (Parallel):

with Pool(processes=n_processes) as pool:
    results = pool.map(process_single_subgraph, subgraph_args)

Why this helps: Each subgraph represents an independent region of the genome. Processing them in parallel can dramatically reduce runtime when you have many subgraphs.
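Since per-subgraph runtimes can vary widely, `imap_unordered` (which yields results as workers finish) can balance load better than `map`. A sketch with a hypothetical worker:

```python
from multiprocessing import Pool

def process_single_subgraph(args):
    """Hypothetical stand-in: call SVs on one independent subgraph."""
    sub_id, nodes = args
    return sub_id, len(nodes)  # placeholder for the real SV calls

if __name__ == "__main__":
    subgraph_args = [(0, ["a", "b"]), (1, ["c"]), (2, ["d", "e", "f"])]
    with Pool(processes=2) as pool:
        # imap_unordered yields results as they complete, so one huge
        # subgraph does not stall the collection of finished ones
        calls = dict(pool.imap_unordered(process_single_subgraph, subgraph_args))
    print(calls)
```

Keying each result by its subgraph id lets the main process reassemble output deterministically even though completion order is arbitrary.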

3. Automatic CPU Detection

The code automatically determines the optimal number of processes to use:

n_processes = opts.get('n_processes', max(1, cpu_count() - 1))

This uses all available CPU cores minus one (to keep the system responsive).
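The same default can be wrapped in a small helper (the function name here is illustrative):

```python
from multiprocessing import cpu_count

def resolve_n_processes(opts):
    """Use the caller's setting if present, else all cores minus one."""
    return opts.get('n_processes', max(1, cpu_count() - 1))

print(resolve_n_processes({'n_processes': 4}))  # honors an explicit setting
print(resolve_n_processes({}))                  # machine-dependent default
```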

Expected Performance Improvements

Performance gains depend on your specific workload:

  1. Insertion Testing:

    • Speedup: ~2-8x (depends on number of blocks)
    • Most beneficial when: Testing many blocks for insertions
  2. Subgraph Processing:

    • Speedup: ~2-16x (depends on number of subgraphs and CPU cores)
    • Most beneficial when: You have many independent subgraphs
  3. Overall Runtime:

    • Typical speedup: 2-6x for most genomic regions
    • Best case: 10-15x for highly parallelizable regions

Important Notes

Memory Considerations

Multiprocessing creates separate processes, each with its own memory:

  • Each process gets a copy of the data it needs
  • Peak memory usage can approach single_process_memory × n_processes
  • Monitor your system's RAM usage
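One way to bound worker memory growth is Pool's maxtasksperchild parameter, which recycles each worker process after a fixed number of tasks; a generic sketch (the task is a stand-in):

```python
from multiprocessing import Pool

def heavy_task(n):
    """Stand-in for a memory-hungry per-item computation."""
    return sum(range(n))

if __name__ == "__main__":
    # maxtasksperchild=10 replaces each worker after 10 tasks, releasing
    # any memory a long-lived worker process would otherwise accumulate
    with Pool(processes=2, maxtasksperchild=10) as pool:
        totals = pool.map(heavy_task, [10_000] * 50)
    print(totals[0])
```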

Limitations

Some parts remain sequential because they:

  1. Require file I/O: Writing to output files must be sequential to maintain order
  2. Share state: Graph decomposition needs to see the full graph
  3. Are already fast: Some operations are quick enough that parallelization overhead wouldn't help

Technical Details

Worker Function Design

Each parallelized operation has a dedicated worker function:

  1. test_single_insertion(args): Tests one block for insertions
  2. process_single_subgraph(args): Processes one subgraph

These functions:

  • Take a tuple of arguments (required by multiprocessing.Pool.map)
  • Are independent (don't share state)
  • Return serializable results (no file handles or complex objects)
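If packing arguments into a single tuple feels awkward, Pool.starmap unpacks each tuple into positional parameters instead; a sketch with a hypothetical worker:

```python
from multiprocessing import Pool

def score_block(block_id, threshold):
    """Hypothetical worker taking ordinary positional arguments."""
    return block_id, block_id >= threshold

if __name__ == "__main__":
    args = [(0, 2), (1, 2), (2, 2)]
    with Pool(processes=2) as pool:
        # starmap unpacks each tuple: score_block(0, 2), score_block(1, 2), ...
        out = pool.starmap(score_block, args)
    print(out)  # [(0, False), (1, False), (2, True)]
```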

Data Serialization

Python's multiprocessing uses pickle to send data between processes:

  • Arguments are serialized (pickled) and sent to worker processes
  • Results are serialized (pickled) and sent back to main process
  • This is why we can't pass file handles or certain complex objects
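This constraint is easy to check directly with pickle: plain data round-trips, while an open file handle raises TypeError:

```python
import pickle
import tempfile

# Plain containers of numbers and strings round-trip cleanly
payload = {"sv_calls": [("chr1", 100, "DEL")], "stats": {"n": 1}}
assert pickle.loads(pickle.dumps(payload)) == payload

# Open file handles cannot be pickled, so workers must not return them
with tempfile.TemporaryFile() as fh:
    try:
        pickle.dumps(fh)
    except TypeError as exc:
        print("unpicklable:", exc)
```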

Debugging Tips

If you encounter issues:

  1. Start with 1 process to verify the code works:

    opts['n_processes'] = 1
  2. Check memory usage:

    # While running, monitor memory in another terminal
    watch -n 1 'free -h'
  3. Enable verbose output:

    opts['verbosity'] = 2  # See detailed progress
  4. Test with small regions first:

    • Process a small genomic region to verify functionality
    • Then scale up to larger regions

Benchmarking

To measure speedup:

import time

# Original version
start = time.perf_counter()  # perf_counter is monotonic, better for timing
do_inference(opts, ...)  # Original function
original_time = time.perf_counter() - start

# Optimized version
start = time.perf_counter()
do_inference(opts, ...)  # Optimized function with n_processes set
optimized_time = time.perf_counter() - start

speedup = original_time / optimized_time
print(f"Speedup: {speedup:.2f}x")

Further Optimization Possibilities

Future enhancements could include:

  1. Path likelihood computation: Parallelize likelihood calculations for different paths within a subgraph
  2. Hybrid parallelization: Use multiprocessing + multithreading for I/O-bound operations
  3. GPU acceleration: For likelihood computations (requires significant refactoring)
  4. Distributed computing: Process different chromosomes on different machines

Conclusion

The multiprocessing optimization provides significant speedups for most genomic SV calling workloads, especially when:

  • Processing large regions with many blocks
  • Analyzing regions with multiple subgraphs
  • Running on multi-core systems

The changes are backward-compatible: setting n_processes to 1 makes the code behave like the original sequential version.

@jgarthur jgarthur self-requested a review November 21, 2025 20:04
@jgarthur
Collaborator

jgarthur commented Dec 7, 2025

@heesuallykim were you able to test this and confirm the speedup?

@heesuallykim
Author

Hi, not yet - give me another 2 weeks for me to get back to you. There are a couple bugs I'm working on. I apologize for the delay!

@jgarthur
Collaborator

> Hi, not yet - give me another 2 weeks for me to get back to you. There are a couple bugs I'm working on. I apologize for the delay!

thanks! no hurry on my end

