Allow using mean read length in FASTQ-trained read simulation by faithokamoto · Pull Request #4855 · vgteam/vg

faithokamoto · 2026-03-18T19:02:42Z

Changelog Entry

To be copied to the draft changelog by merger:

Add vg sim --use-average-length option

Description

When vg sim -F is given a FASTQ file to match, it auto-detects a read length and then simulates reads of that length. The current auto-detection logic uses the mode, i.e. the most common read length. That works fine for short reads, where the most common read length is probably the target, but it is a poor fit for long reads, whose lengths vary widely enough that the most common length may not be representative. This PR adds a --use-average-length/-L option, which tells the sampler to calculate the mean read length and use that instead.
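To make the difference concrete, here is a minimal Python sketch comparing the two statistics on read lengths taken from FASTQ text. It is an illustration only, not the vg implementation: the function names (read_lengths, modal_length, mean_length) are hypothetical, the parser assumes plain 4-line FASTQ records, and rounding the mean to the nearest integer is an assumption about how a fractional mean would be turned into a single read length.

```python
from collections import Counter
from statistics import mean


def read_lengths(fastq_lines):
    """Yield the sequence length of each record, assuming 4-line FASTQ records."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record holds the sequence
            yield len(line.strip())


def modal_length(lengths):
    """Most common read length (what the description calls the mode)."""
    return Counter(lengths).most_common(1)[0][0]


def mean_length(lengths):
    """Mean read length, rounded to an integer (rounding is an assumption)."""
    return round(mean(lengths))


if __name__ == "__main__":
    # Short-read-like lengths: mode and mean agree.
    short = [150, 150, 150, 149]
    print(modal_length(short), mean_length(short))

    # Long-read-like lengths: two short reads happen to share a length,
    # so the mode lands far from the bulk of the reads.
    long_reads = [800, 800, 14000, 16000, 18000, 21000]
    print(modal_length(long_reads), mean_length(long_reads))
```

In the long-read example almost every read has a distinct length, so the mode is decided by whichever lengths happen to coincide, while the mean stays near the bulk of the data.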

The ideal state would probably be simulating from a read length distribution. However, we're only set up to simulate a single read length at a time, and since I'm currently doing HiFi reads (not nanopore) this is good enough for now.

faithokamoto added 2 commits March 18, 2026 10:30

add --use-average-length option

fefe56b

updated logging apparatus

e6d4c1c

faithokamoto wants to merge 2 commits into master from sim-avg-len
