
Parallelize chunked Parakeet batch transcription #507

Open
hamzaq2000 wants to merge 2 commits into FluidInference:main from hamzaq2000:main

Conversation


@hamzaq2000 hamzaq2000 commented Apr 9, 2026

Why is this change needed?

This PR speeds up Parakeet batch transcription for long audio by ~2.2-2.8x by parallelizing the existing stateless chunked path. It doesn't change the streaming/live transcription path.

It adds a configurable parallelChunkConcurrency setting to ASRConfig, lets AsrManager create worker clones from already-loaded AsrModels, and updates ChunkProcessor to send independent chunks across that worker pool before merging the results with the existing merge logic.

The important part is that the decoding behavior for each chunk stays the same. The patch is really about scheduling chunk work in parallel so the runtime can keep more hardware busy and improve throughput on longer files.
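The scheduling pattern described above (a bounded pool of workers pulling independent chunks, with results merged back in their original order) can be sketched with Swift structured concurrency. This is a simplified illustration, not the PR's actual code: `transcribeChunks`, its parameters, and the `transcribe` closure are hypothetical stand-ins for the real `ChunkProcessor`/`AsrManager` plumbing.

```swift
// Hypothetical sketch of bounded-concurrency chunk fan-out.
// At most `concurrency` chunks are in flight at once; each completion
// frees a slot for the next chunk. Results are stored by chunk index,
// so the merged output order matches the serial path.
func transcribeChunks(
    _ chunks: [[Float]],
    concurrency: Int,
    transcribe: @escaping @Sendable ([Float]) async -> String
) async -> [String] {
    var results = [String?](repeating: nil, count: chunks.count)
    await withTaskGroup(of: (Int, String).self) { group in
        var next = 0
        // Seed the pool with up to `concurrency` tasks.
        while next < min(concurrency, chunks.count) {
            let i = next
            group.addTask { (i, await transcribe(chunks[i])) }
            next += 1
        }
        // As each chunk finishes, record it and start the next one.
        for await (index, text) in group {
            results[index] = text
            if next < chunks.count {
                let i = next
                group.addTask { (i, await transcribe(chunks[i])) }
                next += 1
            }
        }
    }
    return results.compactMap { $0 }
}
```

Because results are keyed by chunk index rather than completion order, the downstream merge logic sees the same sequence it would under serial processing, which is consistent with the identical-transcript comparison reported below.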

Validation

Benchmarked on Apple M3, using 16 KHz 16-bit mono wav file downloaded from this video (~1 hour duration), with 5 runs each for current upstream vs. PR branch.

| Model | Upstream Avg Time | PR Branch Avg Time | Speedup | Upstream Avg Peak Mem | PR Branch Avg Peak Mem | Delta |
| --- | --- | --- | --- | --- | --- | --- |
| Parakeet v2 | 31.84 s | 11.25 s | 2.83x | 515.9 MiB | 537.4 MiB | +21.4 MiB |
| Parakeet v3 | 31.37 s | 12.75 s | 2.46x | 496.0 MiB | 527.0 MiB | +31.0 MiB |
| Parakeet tdt-ctc-110m | 19.89 s | 9.08 s | 2.19x | 489.6 MiB | 509.2 MiB | +19.7 MiB |

I compared the resulting transcripts and word timings before and after this change for v2, v3, and tdt-ctc-110m, and found no differences. So based on this one test file at least, the optimization appears safe.

Peak memory footprint was measured with macOS /usr/bin/time -lp. While it does increase, the measured increase is modest relative to the speedup, so I think it's reasonable to keep parallelChunkConcurrency set to 4 by default rather than make it opt-in.

Choosing the parallelChunkConcurrency default

A default value of 4 for the chunk parallelism was chosen because higher values yielded little to no extra speedup, while lower values still left speed on the table, at least on the two devices I tested: an iPhone SE 3 and an M3 MacBook Air.

AI Disclosure

OpenAI Codex was used to write the code for this patch.


Contributor

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.


@Alex-Wengg
Member

@hamzaq2000 could you compare the test-clean performance between your changes and main on your Mac? See benchmarks.md for more info.

@hamzaq2000
Author

Yep, ran it, results (Apple M3):

| Model | main Overall RTFx | PR Overall RTFx | main Total Time | PR Total Time | Speedup |
| --- | --- | --- | --- | --- | --- |
| Parakeet v3 | 89.81x | 95.20x | 216.59 s | 204.34 s | 1.06x |
| Parakeet v2 | 84.46x | 87.64x | 230.32 s | 221.97 s | 1.04x |
| Parakeet tdt-ctc-110m | 135.33x | 141.73x | 143.74 s | 137.25 s | 1.05x |

Not much of a speedup on shorter files, which make up the test-clean dataset:

Average duration: 7.42s
Median duration: 5.79s
Min duration: 1.29s
Max duration: 34.96s

But on longer files, quite a speedup.
