Skip to content

Support DTS-HD in ts extractor#3147

Open
nift4 wants to merge 1 commit intoandroidx:mainfrom
nift4:dtshdints
Open

Support DTS-HD in ts extractor#3147
nift4 wants to merge 1 commit intoandroidx:mainfrom
nift4:dtshdints

Conversation

@nift4
Copy link
Copy Markdown
Contributor

@nift4 nift4 commented Mar 28, 2026

The TS extractor already supports DTS Express which uses the same
headers as DTS-HD, but unlike DTS Express, DTS-HD uses interleaved
core and extension samples.

A few issues arise from this fact:

  • We cannot output the final Format after seeing the first core frame,
    we must wait and see if extension frame will follow.
  • We must emit the core and extension frames as one sample as opposed
    to two, as they belong together.
  • After seeking, we must ensure to sync to a core frame (which comes
    first) and not starting to decode at extension frame.

Fix those with a wrapper approach to avoid changing logic for existing
DTS formats.

Issue: #2487

@FongMi
Copy link
Copy Markdown

FongMi commented Mar 28, 2026

Not enough. Please follow the ffmpeg implementation.

@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Mar 28, 2026

@FongMi please can you elaborate a bit?

@FongMi
Copy link
Copy Markdown

FongMi commented Mar 28, 2026

@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Mar 28, 2026

@FongMi Well, I see that you did a lot of changes and most of them seem to address the same issue that we shouldn't emit core format but extension format instead (I solved it with the PatientTrackOutput wrapper and you added a new state and coreFormatPendingEmit instead). My fix is a bit subtle because it intercepts the format() call and waits until extension header is parsed before sending, allowing to overwrite the format which is cached. But it works and it's readable.

I also see another issue which I didn't solve but you solved it, that we should wait for core frame after seeking instead of sending extension substream frame first - it makes sense and I'll add it to my patch.

Is there any other problem I need to address?

@FongMi
Copy link
Copy Markdown

FongMi commented Mar 28, 2026

That should be all. I'll test the ISO file again after you finish.

@nift4 nift4 changed the title Support DTS-HD in ts extractor Support differentiating between DTS-HD and DTS Express in TS and Matroska extractors Mar 28, 2026
@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Mar 28, 2026

@FongMi It should be good now, let me know if it works. Thanks :)

@FongMi
Copy link
Copy Markdown

FongMi commented Mar 29, 2026

Users have reported that the playback quality of DTS-HD is not as good as my version, and that stuttering occurs. This may be because your Core+EXTSS is sent in segments, while mine is sent all at once.

@FongMi
Copy link
Copy Markdown

FongMi commented Mar 29, 2026

FongMi@6755c3e
Please also implement DTS-MA and DTS-X recognition.
I've switched DtsUtil to your version to reduce merge conflicts.

@nift4 nift4 changed the title Support differentiating between DTS-HD and DTS Express in TS and Matroska extractors Support differentiating between DTS-HD and DTS Express in TS, MP4 and Matroska extractors Mar 29, 2026
@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Mar 29, 2026

@FongMi

Users have reported that the playback quality of DTS-HD is not as good as my version, and that stuttering occurs. This may be because your Core+EXTSS is sent in segments, while mine is sent all at once.

Should be fixed, please check again. Thanks for testing!

Please also implement DTS-MA and DTS-X recognition.

It's on my TODO list, but I will do it in a seperate PR when this is merged. This one is already quite big.

@FongMi
Copy link
Copy Markdown

FongMi commented Mar 30, 2026

@nift4 Users say it's still very laggy.

Core Architectural Difference

nick — PatientTrackOutput double-buffer

TS input → PatientTrackOutput internal buffer → flush() → TrackOutput

Every frame goes through two copies:

// Copy 1: into internal ParsableByteArray
public void sampleData(ParsableByteArray data, int length) {
    this.data.ensureCapacity(this.data.limit() + length);
    ByteBuffer tmp = ByteBuffer.wrap(this.data.getData()); // new object every call
    tmp.put(data.getData(), data.getPosition(), length);
}

// Copy 2: out to the real TrackOutput
public void flush() {
    output.sampleData(data, data.limit());
}

fongmi — zero-copy passthrough

TS input → TrackOutput (direct)

Instead of buffering, it adds a STATE_CHECKING_FOR_EXTSS_AFTER_CORE state: after reading a core frame, it peeks 4 bytes to check whether an EXSS follows, then decides to combine or emit immediately — no intermediate buffer needed.


Performance Comparison

Criterion fongmi nick
Data copies per frame 1 (direct write) 2 (buffered)
Per-frame heap allocation None ByteBuffer.wrap() each call
ensureCapacity reallocation risk None Yes (large frames)
DTS:X (XLL-X) auto-detection Yes No
Post-seek resync skipExtssUntilCore (precise) waitingForResyncAfterSeek (basic)
GC pressure Low Higher

Verdict

fongmi is faster, for three reasons:

  1. Zero-copy: data is written directly to TrackOutput, eliminating the PatientTrackOutput intermediate buffer and second memcpy. For large DTS-HD MA frames (potentially hundreds of KB each), this is significant.

  2. No per-frame heap allocation: nick allocates a new ByteBuffer wrapper on every sampleData() call, increasing GC pressure.

  3. Richer functionality: fongmi adds XLL-X scanning to auto-detect DTS:X object audio and properly upgrades the track format to AUDIO_DTS_X, which nick lacks entirely.

nick's PatientTrackOutput is a cleaner abstraction conceptually, but the double-copy cost makes it a poor trade-off for a hot path that processes audio frames continuously.

@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Mar 30, 2026

Hi @FongMi, I ran my sample file through both versions of extractors and only difference I saw in result was average bitrate was set on mine. That's set from core header, I imagine it might cause buffer allocation to be too small, so I removed it. Can you re-test?

No per-frame heap allocation: nick allocates a new ByteBuffer wrapper on every sampleData() call, increasing GC pressure.

Also, I don't think that's the problem causing audible lags, but I changed it to only do that on the first frame (after seek).

@rohitjoins
Copy link
Copy Markdown
Contributor

Hi @nift4,

Thank you for all the work and iterations on this!

The PR in its current form, touches / fixes lots of different issues in extractor. Could we break this down further into smaller issues, focusing one thing at a time?

Suggestion:

  • Fix for mime type in TsExtractor
  • Format emission issue in TsExtractor
  • Fix for MatroskaExtractor
  • Fix for Mp4Extractor.

Once done, I'm happy to look at them individually and merge them.

@nift4 nift4 force-pushed the dtshdints branch 2 times, most recently from ea7a9c6 to 90611d7 Compare April 2, 2026 15:08
@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Apr 2, 2026

Hi @rohitjoins,

The PR in its current form, touches / fixes lots of different issues in extractor. Could we break this down further into smaller issues, focusing one thing at a time?

Because the other parts depend on mime type detection change in TsExtractor I had it as one branch, otherwise there would be conflicts. But we can instead merge this change first (in this PR, I removed everything except "Fix for mime type in TsExtractor") and I will add the other changes to new PRs once this one is merged.

Also please note that the format = null; change is required (and not left over by mistake) in order to have the DTS-HD MA sample ever work with DumpFileAsserts, because without this line, a seek back to 0 would yield different result and test would not work.

Looking forward to review, thanks!

@nift4 nift4 changed the title Support differentiating between DTS-HD and DTS Express in TS, MP4 and Matroska extractors Support DTS-HD mime type detection in ts extractor Apr 2, 2026
@rohitjoins
Copy link
Copy Markdown
Contributor

Also please note that the format = null; change is required (and not left over by mistake) in order to have the DTS-HD MA sample ever work with DumpFileAsserts, because without this line, a seek back to 0 would yield different result and test would not work.

Thanks for clarifying! While setting format = null; makes the test pass, we unfortunately can't use this in production.

Clearing the format on seek() forces the extractor to re-emit the format. This causes the player to reinitialize the audio decoder on every seek, which leads directly to audio dropouts and the stuttering reported IMO.

The correct way to make the DumpFileAsserts test pass is to avoid the two-step (Core -> HD) format emission entirely. If we wait to check for the extension substream before emitting the format (e.g., using a peek state), the extractor will only ever emit the final DTS-HD format. This ensures the initial playback and the seek() behavior match naturally, fixing the test without causing decoder resets.

@nift4 nift4 changed the title Support DTS-HD mime type detection in ts extractor Support DTS-HD in ts extractor Apr 2, 2026
@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Apr 2, 2026

Hi @rohitjoins,

The correct way to make the DumpFileAsserts test pass is to avoid the two-step (Core -> HD) format emission entirely. If we wait to check for the extension substream before emitting the format (e.g., using a peek state), the extractor will only ever emit the final DTS-HD format. This ensures the initial playback and the seek() behavior match naturally, fixing the test without causing decoder resets.

Yes, I did do this, as you can see on my final branch here it's no longer needed to reset the format:
https://github.com/nift4/media/blob/dtshdints-backup/libraries/extractor/src/main/java/androidx/media3/extractor/ts/DtsReader.java#L124

It's only in this PR because the DTS-HD MA sample (which is added to demonstrate the mime type change) is separated from the fix to emit format only once.

So, I guess it was the wrong order to merge changes. Instead, I will remove mime type change from the branch, and add format-only-once fix here. Then we can merge that first, then the mime type fix, and then the MKV/MP4 related changes; and format = null; won't be needed at any point in time.

I pushed this, so now the format-only-fix is in this PR. I opted for this wrapper in order to be minimally invasive, and most importantly, don't duplicate DTS related business logic. FongMi has an approach where there are a bunch of additional states added, which is a different set of tradeoffs for code flow with the same end result. Let me know if mine is OK or if you'd rather have the different state approach.

@rohitjoins
Copy link
Copy Markdown
Contributor

Hi @nift4,

Thanks for explaining your rationale. While I have not compared the performance differences for either of the approaches, I think it would make sense to avoid temporary copies in the memory before being flushed out to the actual output?

don't duplicate DTS related business logic

Can you please expand more on this? Can we avoid this by refactoring into reusable helper methods?

@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Apr 8, 2026

Hi @rohitjoins,

I think it would make sense to avoid temporary copies in the memory before being flushed out to the actual output?

Please note there is only one copy, for the very first core frame ,before the format is emitted. After format is emitted to ExtractorOutput, there is no copy anymore and buffer is released (the flush() calls are only used to trigger emitting sampleMetadata, and not sampleData, at this point), as I removed it since FongMi pointed it out. (edit: force pushed to fix a small bug causing useless memory copies)

Given the constraints that:

  • consume() in DtsReader cannot peek into future bytes, or go back
  • there is no guarantee we get all bytes of a frame or even just header on first invocation of consume()
  • we need to read extension substream's header before calling output.format() - and hence before calling output.sampleData()

...I think there is no way to get rid of this one temporary memory copy in any way.

Can you please expand more on this? Can we avoid this by refactoring into reusable helper methods?

We need to read extension substream's header before calling format(). That involves:

  1. first finding core syncword if any (there may be none for DTS Express or DTS-HD MA without backward compatible)
    a. parsing core header
    b. storing core sample data in temporary buffer
  2. then finding extension syncword if any (if there is another core one, emit the core sample data and format BEFORE parsing new one)
    a. then parsing its header
    b. call output.format() based on extension header data
    c. only now we can emit core sampleData()

This is the business logic I was talking about. Almost entirety of consume() method code is required for this job. But consume() method is optimized to directly pass data to ExtractorOutput, even though we cannot pass any data to it until we called format().

The difficulty in extracting helper method here is that consume() method may be called multiple times until we finish with any one specific aspect of this parsing. So there is a need to keep state in the class, as is currently done already.

But due to different needs of:

  • first frame (cannot call ExtractorOutput methods in any case and must cache data in the class with temporary buffer)
  • second frame (based on first frame type, we have to maybe parse header before emitting format, or maybe after, then emit old sample data if any, then proceed as normal)
  • subsequent frames (should instantly output data to ExtractorOutput without special logic)

I am not sure how code can be shared in elegant way. I'd be interested if you see any way to do it nicely.

Let me know if that's clear or if you have further questions, Thanks for the feedback!

The TS extractor already supports DTS Express which uses the same
headers as DTS-HD, but unlike DTS Express, DTS-HD uses interleaved
core and extension samples.

A few issues arise from this fact:
* We cannot output the final Format after seeing the first core frame,
  we must wait and see if extension frame will follow.
* We must emit the core and extension frames as one sample as opposed
  to two, as they belong together.
* After seeking, we must ensure to sync to a core frame (which comes
  first) and not starting to decode at extension frame.

Fix those with a wrapper approach to avoid changing logic for existing
DTS formats.

Issue: androidx#2487
@rohitjoins
Copy link
Copy Markdown
Contributor

Hi @nift4,

Thank you for the detailed breakdown! Your logic makes complete sense based on the constraint that we need to read the extension substream's header before calling output.format(), and hence before calling output.sampleData().

However, in Media3's TrackOutput architecture, that constraint doesn't actually exist for sampleData(). The sampleData() method simply pipes raw bytes into a format-agnostic queue. The format is only strictly required and checked at the very end when sampleMetadata() is called to commit the frame.

Because of this, we can stream raw bytes via sampleData() before the format is fully resolved. For example, if you look at H264Reader.java or H265Reader.java, they stream the NAL units via output.sampleData(data, data.bytesLeft()); as they arrive, parse the SPS headers to build the format later, call output.format(), and finally call sampleMetadata().

Using this pattern, we can achieve the zero-copy state machine for DTS-HD without needing any temporary buffer copies, even for the first frame. Inspired by @FongMi's approach, I went ahead and implemented this here: a0c4698.

Could you take a look and let me know if this approach seems fine to you? I can either send my commit directly for internal review, or if you prefer, you can update this PR to include these changes and I can import it from there. Either way, getting this submitted will enable you to add your dependent PRs.

@nift4
Copy link
Copy Markdown
Contributor Author

nift4 commented Apr 9, 2026

Hi @rohitjoins,

that constraint doesn't actually exist for sampleData()

Duh... I don't know why I thought it did...

Could you take a look and let me know if this approach seems fine to you? I can either send my commit directly for internal review

Sure, sending it to internal review is fine, looks like it should work just fine and solves the problem.

Thanks for looking into it!

@FongMi
Copy link
Copy Markdown

FongMi commented Apr 9, 2026

It’s time to use AI to address the shortcomings that Media3 previously had. 🤣

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants