Support encoding into a bytes tensor #635

NicolasHug · 2025-04-10T15:29:43Z

This PR adds support for encoding audio samples into a bytes (uint8) tensor. The output tensor is initially allocated at 10MB. Whenever it fills up, we double the allocated size, up to a maximum of 320MB, after which we error out.
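
To illustrate the allocation strategy described above, here is a minimal, self-contained sketch. It is not the PR's code: `GrowableByteBuffer` is a hypothetical stand-in that uses a `std::vector<uint8_t>` instead of a tensor, and the initial and maximum sizes are constructor parameters rather than the 10MB and 320MB values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the output tensor: a byte buffer that doubles its
// allocation on demand, up to a hard limit. Not the actual torchcodec class.
class GrowableByteBuffer {
 public:
  GrowableByteBuffer(std::size_t initialSize, std::size_t maxSize)
      : storage_(initialSize), maxSize_(maxSize) {
    if (initialSize == 0 || initialSize > maxSize) {
      throw std::invalid_argument("initialSize must be in (0, maxSize].");
    }
  }

  // Appends `size` bytes, doubling the allocation as many times as needed.
  // Throws if the doubled allocation would exceed the maximum size.
  void write(const uint8_t* data, std::size_t size) {
    if (size == 0) {
      return;
    }
    while (used_ + size > storage_.size()) {
      if (storage_.size() * 2 > maxSize_) {
        throw std::runtime_error("Encoded output exceeds the maximum size.");
      }
      storage_.resize(storage_.size() * 2);
    }
    std::memcpy(storage_.data() + used_, data, size);
    used_ += size;
  }

  std::size_t used() const { return used_; }
  std::size_t capacity() const { return storage_.size(); }

  // Returns only the bytes written so far, analogous to narrowing the
  // over-allocated tensor to its used length.
  std::vector<uint8_t> contents() const {
    return std::vector<uint8_t>(storage_.begin(), storage_.begin() + used_);
  }

 private:
  std::vector<uint8_t> storage_;
  std::size_t maxSize_;
  std::size_t used_ = 0;
};
```

With an initial size of 4 and a maximum of 16, writing 6 bytes grows the buffer to 8, writing 6 more grows it to 16, and a third write throws.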

We rely on a new AVIOToTensorContext class, which inherits from AVIOContextHolder.

The C++ Encoder class now takes two optional and mutually exclusive parameters: fileName and format. fileName is used when encoding to a file, while format is used when encoding to a tensor.

The core API has changed: we don't expose an Encoder object anymore. We only expose two stateless functions:

encode_audio_to_file(filename : str, ...)
encoded_tensor = encode_audio_to_tensor(format : str, ...)

I suspect that the public Python API will look similar to this, but final design is still WIP.

NicolasHug · 2025-04-15T09:22:01Z

src/torchcodec/_core/AVIOContextHolder.h

@@ -46,11 +46,13 @@ class AVIOContextHolder {

  // These signatures are defined by FFmpeg.
  using AVIOReadFunction = int (*)(void*, uint8_t*, int);
+  using AVIOWriteFunction = int (*)(void*, const uint8_t*, int);


Note: we define the buffer parameter as const uint8_t*, which is how it's defined starting from FFmpeg 7. Before that, it was defined as uint8_t*. That's why we now have to wrap the call to avio_alloc_context() in our own AVIOAllocContext() in FFMPEGCommon.cpp.

NicolasHug · 2025-04-15T09:28:03Z

src/torchcodec/_core/CMakeLists.txt

@@ -59,8 +59,9 @@ function(make_torchcodec_libraries
    set(decoder_library_name "libtorchcodec_decoder${ffmpeg_major_version}")
    set(decoder_sources
        AVIOContextHolder.cpp
+        AVIOBytesContext.cpp


Encoder.[cpp, h] rely on AVIOToTensorContext, and specifically on the AVIOToTensorContext::getOutputTensor() method. Unlike the decoder, they don't rely only on the AVIOContextHolder base class. For this reason we have to add AVIOBytesContext.cpp to the source dependencies here.

Alternatively, I think we could make getOutputTensor a virtual method of the base class? But this method wouldn't make much sense for the existing child classes (like AVIOBytesContext), so this doesn't sound like a great OOP design.

Agreed on not making getOutputTensor() virtual on the base class, since it's only applicable to one of the derived classes. We could potentially pull AVIOToTensorContext out of AVIOBytesContext.[h|cpp] to limit what gets put into libtorchcodec_decoderN.so.

I think the issue here is that the Encoder class needs to call getOutputTensor(), which means that it must actually hold a reference to an actual AVIOToTensorContext rather than just the base AVIOContextHolder. (Which is what SingleStreamDecoder does.) A potential way around this is for AVIOToTensorContext to accept a tensor rather than creating its own. The caller, however, would still need to keep track of how many bytes it asked it to encode in order to do the final narrow on the tensor, so that's not necessarily any cleaner. Your call on what you think makes the most sense.

NicolasHug · 2025-04-15T09:33:17Z

src/torchcodec/_core/Encoder.cpp

+    std::optional<std::string_view> fileName,
+    std::optional<std::string_view> formatName,


We enforce that fileName and formatName parameters are mutually exclusive. Happy to consider alternative designs. Creating separate constructors seemed more complex and doesn't provide much value IMHO.

Note that this only exists at the C++ constructor level. The custom ops expose two distinct entry-points: encode_audio_to_file() and encode_audio_to_tensor(). I suspect that the public Python API will look similar to this, but that's still up for discussion.

I think we should follow the same pattern set out in SingleStreamDecoder: one constructor that takes the file name, and another constructor that takes the AVIOToTensorContext as an argument. That means it's the caller's responsibility to create the AVIOToTensorContext object; that's how it currently works in SingleStreamDecoder. That would also require pulling all common initialization into some initializeEncoder() function that both constructors would call.

The value, to me, is that it's more clear when creating the object that you're doing it correctly, and then when reading the initialization code, it's more clear what's required for, and correct for, each path.
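
For reference, the two-constructor pattern suggested above could look roughly like the sketch below. All names are hypothetical (`EncoderSketch`, `TensorSinkSketch`), the tensor-path signature is an assumption, and the shared `initializeEncoder()` only sets a flag here where the real one would set up FFmpeg state.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for AVIOToTensorContext. The real class wraps FFmpeg
// I/O callbacks and owns the output tensor.
struct TensorSinkSketch {};

// Hypothetical encoder with one constructor per output path, so that the
// requirements of each path are visible in its signature.
class EncoderSketch {
 public:
  // File path: only a file name is needed.
  explicit EncoderSketch(std::string_view fileName) : fileName_(fileName) {
    initializeEncoder();
  }

  // Tensor path: the caller creates the sink and hands over ownership.
  // Taking a format name here is an assumption of this sketch.
  EncoderSketch(
      std::string_view formatName,
      std::unique_ptr<TensorSinkSketch> sink)
      : formatName_(formatName), sink_(std::move(sink)) {
    initializeEncoder();
  }

  bool writesToTensor() const { return sink_ != nullptr; }
  bool initialized() const { return initialized_; }

 private:
  // Common setup shared by both constructors.
  void initializeEncoder() { initialized_ = true; }

  std::string fileName_;
  std::string formatName_;
  std::unique_ptr<TensorSinkSketch> sink_;
  bool initialized_ = false;
};
```

With this shape, it is impossible to pass both a file name and a tensor sink, so no runtime mutual-exclusivity check is needed.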

NicolasHug · 2025-04-15T09:45:50Z

src/torchcodec/_core/AVIOContextHolder.cpp

+  TORCH_CHECK(
+      (seek != nullptr) && ((write != nullptr) ^ (read != nullptr)),
+      "seek method must be defined, and either write or read must be defined. "
+      "But not both!")


We may relax the mutual-exclusivity check above eventually, if we implement both write and read within the same class. For now, mutual-exclusivity is assumed and enforced, because we use the existence of write to set the write_flag below.
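
The check and the derived write flag can be illustrated in isolation. This is a sketch under assumptions, not the PR's code: it throws a standard exception instead of using TORCH_CHECK, the `AVIOSeekFunction` alias and the helper name `validateCallbacksAndGetWriteFlag` are introduced here for illustration, and the dummy callbacks do nothing.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Callback signatures as shown in the diff; the seek alias is assumed here.
using AVIOReadFunction = int (*)(void*, uint8_t*, int);
using AVIOWriteFunction = int (*)(void*, const uint8_t*, int);
using AVIOSeekFunction = int64_t (*)(void*, int64_t, int);

// Requires seek plus exactly one of read or write, then returns the write
// flag: 1 when the context is write-only, 0 when it is read-only.
int validateCallbacksAndGetWriteFlag(
    AVIOReadFunction read,
    AVIOWriteFunction write,
    AVIOSeekFunction seek) {
  const bool hasRead = (read != nullptr);
  const bool hasWrite = (write != nullptr);
  // hasRead == hasWrite means both or neither were given.
  if (seek == nullptr || hasRead == hasWrite) {
    throw std::invalid_argument(
        "seek must be defined, and exactly one of read or write must be "
        "defined.");
  }
  return hasWrite ? 1 : 0;
}

// Dummy callbacks used only to exercise the check.
int dummyRead(void*, uint8_t*, int) { return 0; }
int dummyWrite(void*, const uint8_t*, int) { return 0; }
int64_t dummySeek(void*, int64_t, int) { return 0; }
```

Passing only `dummyRead` and `dummySeek` yields a flag of 0, passing only `dummyWrite` and `dummySeek` yields 1, and any other combination throws.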

scotts · 2025-04-21T20:19:06Z

src/torchcodec/_core/FFMPEGCommon.h

+    void* opaque,
+    int (*read_packet)(void* opaque, uint8_t* buf, int buf_size),
+    int (*write_packet)(void* opaque, const uint8_t* buf, int buf_size),
+    int64_t (*seek)(void* opaque, int64_t offset, int whence));


Let's move the convenience function type aliases of AVIO*Function to here, and then use those in all of our interfaces. That should be easier to grep for, and also easier to tell at a glance that the signatures are the same in different places.

Also, since AVIOAllocContext is a function and not a type, I think we should name it avioAllocContext(). Otherwise I think readers will assume we're instantiating an object of type AVIOAllocContext until they look at the declaration.

Done - I assume we can't use the type aliases in the child classes' member declarations / definitions? I.e. in the AVIOBytesContext class header, we still have to use the verbose

static int read(void* opaque, uint8_t* buf, int buf_size);

declaration, right?

Yes, correct. We can only use the type aliases when we declare some variable that has that type.
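
A small sketch of the point above: the static member keeps its full signature, while the pointer alias is used wherever a variable, parameter, or return value of that type is declared. `BytesContextSketch` and `getReadCallback()` are hypothetical names, and the callback body is a placeholder.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Function-pointer alias, as in the diff.
using AVIOReadFunction = int (*)(void*, uint8_t*, int);

struct BytesContextSketch {
  // The pointer alias cannot declare the function itself, so the member is
  // spelled out with its full signature.
  static int read(void* opaque, uint8_t* buf, int buf_size);
};

// Placeholder body: reports that buf_size bytes were "read".
int BytesContextSketch::read(void* /*opaque*/, uint8_t* /*buf*/, int buf_size) {
  return buf_size;
}

// The address of the static member has exactly the aliased pointer type.
static_assert(
    std::is_same<decltype(&BytesContextSketch::read), AVIOReadFunction>::value,
    "read must match AVIOReadFunction");

// The alias is usable as a return type, parameter type, or variable type.
AVIOReadFunction getReadCallback() {
  return &BytesContextSketch::read;
}
```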

scotts · 2025-04-28T01:51:21Z

Approving to unblock, but I do think we should follow the same two-constructor pattern we have in SingleStreamDecoder.

NicolasHug added 17 commits April 7, 2025 16:15

Disable FFmpeg logs for encoder

7921558

Merge branch 'main' of github.com:pytorch/torchcodec into loglevelencoder

2a19014

Use c++ strings

73bdc85

Merge branch 'main' of github.com:pytorch/torchcodec into loglevelencoder

54f5543

Account for frame_size being 0

24842b6

Merge branch 'main' of github.com:pytorch/torchcodec into encoding_wav

c3ac80a

WIP

5b39c8f

Move createSwrContext in ffmpeg file

1f9f904

WIP

f525848

Move convertAudioAVFrameSampleFormatAndSampleRate in ffmpeg file

9150137

Automatically find output sample format

872b569

Convert sample format, update tests

a0dcafd

Skip wav on FFmpeg4

f49d507

Add assertion

ee3a199

Move comment

485ee2e

Better default heuristic

27fdbac

Merge branch 'main' of github.com:pytorch/torchcodec into encoding_wav

8467b92

facebook-github-bot added the CLA Signed label Apr 10, 2025

NicolasHug commented Apr 10, 2025

NicolasHug force-pushed the avio branch from 83c75b5 to cfea9ab Compare April 11, 2025 13:27

NicolasHug added 2 commits April 11, 2025 15:39

Support encoding into tensor

ee7a217

Merge branch 'main' of github.com:pytorch/torchcodec into avio

7b3847f

NicolasHug force-pushed the avio branch from cfea9ab to 7b3847f Compare April 14, 2025 12:51

NicolasHug added 6 commits April 14, 2025 14:18

nits

d85baa2

Allow output tensor re-allocation

42c6373

Fix compilation on FFmpeg7?

254529f

Fix?

3f0417c

Use int64_t consistently

290c96e

cmake

5f42d15

NicolasHug commented Apr 15, 2025

scotts reviewed Apr 21, 2025

NicolasHug added 3 commits April 22, 2025 10:03

Merge branch 'main' of github.com:pytorch/torchcodec into avio

0f415c1

Move type aliases, fix avioAllocContext name

2954c9b

Merge branch 'main' of github.com:pytorch/torchcodec into avio

29866fc

scotts approved these changes Apr 28, 2025

NicolasHug mentioned this pull request Apr 28, 2025

Use "LINKER:-undefined,dynamic_lookup" to avoid wrong deduplication #657

Merged
