Support DTS-HD in ts extractor by nift4 · Pull Request #3147 · androidx/media

nift4 · 2026-03-28T16:16:08Z

The TS extractor already supports DTS Express which uses the same
headers as DTS-HD, but unlike DTS Express, DTS-HD uses interleaved
core and extension samples.

A few issues arise from this fact:

We cannot output the final Format after seeing the first core frame,
we must wait and see if extension frame will follow.
We must emit the core and extension frames as one sample as opposed
to two, as they belong together.
After seeking, we must ensure to sync to a core frame (which comes
first) and not starting to decode at extension frame.

Fix those with a wrapper approach to avoid changing logic for existing
DTS formats.

Issue: #2487

FongMi · 2026-03-28T17:54:30Z

Not enough. Please follow the ffmpeg implementation.

nift4 · 2026-03-28T18:27:22Z

@FongMi please can you elaborate a bit?

FongMi · 2026-03-28T18:46:16Z

https://github.com/FongMi/media/blob/release-1.10.0-fongmi/libraries/extractor/src/main/java/androidx/media3/extractor/ts/DtsReader.java
https://github.com/FongMi/media/blob/release-1.10.0-fongmi/libraries/extractor/src/main/java/androidx/media3/extractor/DtsUtil.java

This is what I wrote using copilot + sonnet 4.6

nift4 · 2026-03-28T19:07:24Z

@FongMi Well, I see that you did a lot of changes and most of them seem to address the same issue that we shouldn't emit core format but extension format instead (I solved it with the PatientTrackOutput wrapper and you added a new state and coreFormatPendingEmit instead). My fix is a bit subtle because it intercepts the format() call and waits until extension header is parsed before sending, allowing to overwrite the format which is cached. But it works and it's readable.

I also see another issue which I didn't solve but you solved it, that we should wait for core frame after seeking instead of sending extension substream frame first - it makes sense and I'll add it to my patch.

Is there any other problem I need to address?

FongMi · 2026-03-28T19:17:22Z

That should be all. I'll test the ISO file again after you finish.

nift4 · 2026-03-28T19:29:33Z

@FongMi It should be good now, let me know if it works. Thanks :)

FongMi · 2026-03-29T02:44:43Z

Users have reported that the playback quality of DTS-HD is not as good as my version, and that stuttering occurs. This may be because your Core+EXTSS is sent in segments, while mine is sent all at once.

FongMi · 2026-03-29T04:43:14Z

FongMi@6755c3e
Please also implement DTS-MA and DTS-X recognition.
I've switched DtsUtil to your version to reduce merge conflicts.

nift4 · 2026-03-29T08:39:30Z

@FongMi

Users have reported that the playback quality of DTS-HD is not as good as my version, and that stuttering occurs. This may be because your Core+EXTSS is sent in segments, while mine is sent all at once.

Should be fixed, please check again. Thanks for testing!

Please also implement DTS-MA and DTS-X recognition.

It's on my TODO list, but I will do it in a seperate PR when this is merged. This one is already quite big.

FongMi · 2026-03-30T02:17:16Z

@nift4 Users say it's still very laggy.

Core Architectural Difference

nick — `PatientTrackOutput` double-buffer

TS input → PatientTrackOutput internal buffer → flush() → TrackOutput

Every frame goes through two copies:

// Copy 1: into internal ParsableByteArray
public void sampleData(ParsableByteArray data, int length) {
    this.data.ensureCapacity(this.data.limit() + length);
    ByteBuffer tmp = ByteBuffer.wrap(this.data.getData()); // new object every call
    tmp.put(data.getData(), data.getPosition(), length);
}

// Copy 2: out to the real TrackOutput
public void flush() {
    output.sampleData(data, data.limit());
}

fongmi — zero-copy passthrough

TS input → TrackOutput (direct)

Instead of buffering, it adds a STATE_CHECKING_FOR_EXTSS_AFTER_CORE state: after reading a core frame, it peeks 4 bytes to check whether an EXSS follows, then decides to combine or emit immediately — no intermediate buffer needed.

Performance Comparison

Criterion	fongmi	nick
Data copies per frame	1 (direct write)	2 (buffered)
Per-frame heap allocation	None	`ByteBuffer.wrap()` each call
`ensureCapacity` reallocation risk	None	Yes (large frames)
DTS:X (XLL-X) auto-detection	Yes	No
Post-seek resync	`skipExtssUntilCore` (precise)	`waitingForResyncAfterSeek` (basic)
GC pressure	Low	Higher

Verdict

fongmi is faster, for three reasons:

Zero-copy: data is written directly to TrackOutput, eliminating the PatientTrackOutput intermediate buffer and second memcpy. For large DTS-HD MA frames (potentially hundreds of KB each), this is significant.
No per-frame heap allocation: nick allocates a new ByteBuffer wrapper on every sampleData() call, increasing GC pressure.
Richer functionality: fongmi adds XLL-X scanning to auto-detect DTS:X object audio and properly upgrades the track format to AUDIO_DTS_X, which nick lacks entirely.

nick's PatientTrackOutput is a cleaner abstraction conceptually, but the double-copy cost makes it a poor trade-off for a hot path that processes audio frames continuously.

nift4 · 2026-03-30T07:09:05Z

Hi @FongMi, I ran my sample file through both versions of extractors and only difference I saw in result was average bitrate was set on mine. That's set from core header, I imagine it might cause buffer allocation to be too small, so I removed it. Can you re-test?

No per-frame heap allocation: nick allocates a new ByteBuffer wrapper on every sampleData() call, increasing GC pressure.

Also, I don't think that's the problem causing audible lags, but I changed it to only do that on the first frame (after seek).

rohitjoins · 2026-04-02T14:57:43Z

Hi @nift4,

Thank you for all the work and iterations on this!

The PR in its current form, touches / fixes lots of different issues in extractor. Could we break this down further into smaller issues, focusing one thing at a time?

Suggestion:

Fix for mime type in TsExtractor
Format emission issue in TsExtractor
Fix for MatroskaExtractor
Fix for Mp4Extractor.

Once done, I'm happy to look at them individually and merge them.

nift4 · 2026-04-02T15:11:35Z

Hi @rohitjoins,

The PR in its current form, touches / fixes lots of different issues in extractor. Could we break this down further into smaller issues, focusing one thing at a time?

Because the other parts depend on mime type detection change in TsExtractor I had it as one branch, otherwise there would be conflicts. But we can instead merge this change first (in this PR, I removed everything except "Fix for mime type in TsExtractor") and I will add the other changes to new PRs once this one is merged.

Also please note that the format = null; change is required (and not left over by mistake) in order to have the DTS-HD MA sample ever work with DumpFileAsserts, because without this line, a seek back to 0 would yield different result and test would not work.

Looking forward to review, thanks!

rohitjoins · 2026-04-02T16:57:40Z

Also please note that the format = null; change is required (and not left over by mistake) in order to have the DTS-HD MA sample ever work with DumpFileAsserts, because without this line, a seek back to 0 would yield different result and test would not work.

Thanks for clarifying! While setting format = null; makes the test pass, we unfortunately can't use this in production.

Clearing the format on seek() forces the extractor to re-emit the format. This causes the player to reinitialize the audio decoder on every seek, which leads directly to audio dropouts and the stuttering reported IMO.

The correct way to make the DumpFileAsserts test pass is to avoid the two-step (Core -> HD) format emission entirely. If we wait to check for the extension substream before emitting the format (e.g., using a peek state), the extractor will only ever emit the final DTS-HD format. This ensures the initial playback and the seek() behavior match naturally, fixing the test without causing decoder resets.

nift4 · 2026-04-02T17:16:43Z

Hi @rohitjoins,

The correct way to make the DumpFileAsserts test pass is to avoid the two-step (Core -> HD) format emission entirely. If we wait to check for the extension substream before emitting the format (e.g., using a peek state), the extractor will only ever emit the final DTS-HD format. This ensures the initial playback and the seek() behavior match naturally, fixing the test without causing decoder resets.

Yes, I did do this, as you can see on my final branch here it's no longer needed to reset the format:
https://github.com/nift4/media/blob/dtshdints-backup/libraries/extractor/src/main/java/androidx/media3/extractor/ts/DtsReader.java#L124

It's only in this PR because the DTS-HD MA sample (which is added to demonstrate the mime type change) is separated from the fix to emit format only once.

So, I guess it was the wrong order to merge changes. Instead, I will remove mime type change from the branch, and add format-only-once fix here. Then we can merge that first, then the mime type fix, and then the MKV/MP4 related changes; and format = null; won't be needed at any point in time.

I pushed this, so now the format-only-fix is in this PR. I opted for this wrapper in order to be minimally invasive, and most importantly, don't duplicate DTS related business logic. FongMi has an approach where there are a bunch of additional states added, which is a different set of tradeoffs for code flow with the same end result. Let me know if mine is OK or if you'd rather have the different state approach.

rohitjoins · 2026-04-08T17:24:46Z

Hi @nift4,

Thanks for explaining your rationale. While I have not compared the performance differences for either of the approaches, I think it would make sense to avoid temporary copies in the memory before being flushed out to the actual output?

don't duplicate DTS related business logic

Can you please expand more on this? Can we avoid this by refactoring into reusable helper methods?

nift4 · 2026-04-08T18:09:06Z

Hi @rohitjoins,

I think it would make sense to avoid temporary copies in the memory before being flushed out to the actual output?

Please note there is only one copy, for the very first core frame ,before the format is emitted. After format is emitted to ExtractorOutput, there is no copy anymore and buffer is released (the flush() calls are only used to trigger emitting sampleMetadata, and not sampleData, at this point), as I removed it since FongMi pointed it out. (edit: force pushed to fix a small bug causing useless memory copies)

Given the constraints that:

consume() in DtsReader cannot peek into future bytes, or go back
there is no guarantee we get all bytes of a frame or even just header on first invocation of consume()
we need to read extension substream's header before calling output.format() - and hence before calling output.sampleData()

...I think there is no way to get rid of this one temporary memory copy in any way.

Can you please expand more on this? Can we avoid this by refactoring into reusable helper methods?

We need to read extension substream's header before calling format(). That involves:

first finding core syncword if any (there may be none for DTS Express or DTS-HD MA without backward compatible)
a. parsing core header
b. storing core sample data in temporary buffer
then finding extension syncword if any (if there is another core one, emit the core sample data and format BEFORE parsing new one)
a. then parsing its header
b. call output.format() based on extension header data
c. only now we can emit core sampleData()

This is the business logic I was talking about. Almost entirety of consume() method code is required for this job. But consume() method is optimized to directly pass data to ExtractorOutput, even though we cannot pass any data to it until we called format().

The difficulty in extracting helper method here is that consume() method may be called multiple times until we finish with any one specific aspect of this parsing. So there is a need to keep state in the class, as is currently done already.

But due to different needs of:

first frame (cannot call ExtractorOutput methods in any case and must cache data in the class with temporary buffer)
second frame (based on first frame type, we have to maybe parse header before emitting format, or maybe after, then emit old sample data if any, then proceed as normal)
subsequent frames (should instantly output data to ExtractorOutput without special logic)

I am not sure how code can be shared in elegant way. I'd be interested if you see any way to do it nicely.

Let me know if that's clear or if you have further questions, Thanks for the feedback!

The TS extractor already supports DTS Express which uses the same headers as DTS-HD, but unlike DTS Express, DTS-HD uses interleaved core and extension samples. A few issues arise from this fact: * We cannot output the final Format after seeing the first core frame, we must wait and see if extension frame will follow. * We must emit the core and extension frames as one sample as opposed to two, as they belong together. * After seeking, we must ensure to sync to a core frame (which comes first) and not starting to decode at extension frame. Fix those with a wrapper approach to avoid changing logic for existing DTS formats. Issue: androidx#2487

rohitjoins · 2026-04-09T12:57:44Z

Hi @nift4,

Thank you for the detailed breakdown! Your logic makes complete sense based on the constraint that we need to read the extension substream's header before calling output.format(), and hence before calling output.sampleData().

However, in Media3's TrackOutput architecture, that constraint doesn't actually exist for sampleData(). The sampleData() method simply pipes raw bytes into a format-agnostic queue. The format is only strictly required and checked at the very end when sampleMetadata() is called to commit the frame.

Because of this, we can stream raw bytes via sampleData() before the format is fully resolved. For example, if you look at H264Reader.java or H265Reader.java, they stream the NAL units via output.sampleData(data, data.bytesLeft()); as they arrive, parse the SPS headers to build the format later, call output.format(), and finally call sampleMetadata().

Using this pattern, we can achieve the zero-copy state machine for DTS-HD without needing any temporary buffer copies, even for the first frame. Inspired by @FongMi's approach, I went ahead and implemented this here: a0c4698.

Could you take a look and let me know if this approach seems fine to you? I can either send my commit directly for internal review, or if you prefer, you can update this PR to include these changes and I can import it from there. Either way, getting this submitted will enable you to add your dependent PRs.

nift4 · 2026-04-09T13:15:42Z

Hi @rohitjoins,

that constraint doesn't actually exist for sampleData()

Duh... I don't know why I thought it did...

Could you take a look and let me know if this approach seems fine to you? I can either send my commit directly for internal review

Sure, sending it to internal review is fine, looks like it should work just fine and solves the problem.

Thanks for looking into it!

FongMi · 2026-04-09T13:29:16Z

It’s time to use AI to address the shortcomings that Media3 previously had. 🤣

nift4 force-pushed the dtshdints branch from 181cf37 to 1eb50e8 Compare March 28, 2026 19:21

nift4 changed the title ~~Support DTS-HD in ts extractor~~ Support differentiating between DTS-HD and DTS Express in TS and Matroska extractors Mar 28, 2026

nift4 force-pushed the dtshdints branch from 1eb50e8 to ff9e3bf Compare March 28, 2026 19:28

nift4 changed the title ~~Support differentiating between DTS-HD and DTS Express in TS and Matroska extractors~~ Support differentiating between DTS-HD and DTS Express in TS, MP4 and Matroska extractors Mar 29, 2026

oceanjules assigned rohitjoins Apr 2, 2026

rohitjoins added the pending comments label Apr 2, 2026

nift4 force-pushed the dtshdints branch 2 times, most recently from ea7a9c6 to 90611d7 Compare April 2, 2026 15:08

nift4 changed the title ~~Support differentiating between DTS-HD and DTS Express in TS, MP4 and Matroska extractors~~ Support DTS-HD mime type detection in ts extractor Apr 2, 2026

nift4 changed the title ~~Support DTS-HD mime type detection in ts extractor~~ Support DTS-HD in ts extractor Apr 2, 2026

nift4 force-pushed the dtshdints branch from 90611d7 to ce68449 Compare April 2, 2026 17:16

nift4 force-pushed the dtshdints branch from ce68449 to d247497 Compare April 7, 2026 14:43

nift4 force-pushed the dtshdints branch from d247497 to 1de2697 Compare April 8, 2026 18:13

Conversation

nift4 commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FongMi commented Mar 28, 2026

Uh oh!

nift4 commented Mar 28, 2026

Uh oh!

FongMi commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nift4 commented Mar 28, 2026

Uh oh!

FongMi commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nift4 commented Mar 28, 2026

Uh oh!

FongMi commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FongMi commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nift4 commented Mar 29, 2026

Uh oh!

FongMi commented Mar 30, 2026

Core Architectural Difference

nick — PatientTrackOutput double-buffer

fongmi — zero-copy passthrough

Performance Comparison

Verdict

Uh oh!

nift4 commented Mar 30, 2026

Uh oh!

rohitjoins commented Apr 2, 2026

Uh oh!

nift4 commented Apr 2, 2026

Uh oh!

rohitjoins commented Apr 2, 2026

Uh oh!

nift4 commented Apr 2, 2026

Uh oh!

rohitjoins commented Apr 8, 2026

Uh oh!

nift4 commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rohitjoins commented Apr 9, 2026

Uh oh!

nift4 commented Apr 9, 2026

Uh oh!

FongMi commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

nift4 commented Mar 28, 2026 •

edited

Loading

FongMi commented Mar 28, 2026 •

edited

Loading

FongMi commented Mar 28, 2026 •

edited

Loading

FongMi commented Mar 29, 2026 •

edited

Loading

FongMi commented Mar 29, 2026 •

edited

Loading

nick — `PatientTrackOutput` double-buffer

nift4 commented Apr 8, 2026 •

edited

Loading