-
Notifications
You must be signed in to change notification settings - Fork 31
Support encoding into a bytes tensor #635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -46,11 +46,13 @@ class AVIOContextHolder { | |||
|
|||
// These signatures are defined by FFmpeg. | |||
using AVIOReadFunction = int (*)(void*, uint8_t*, int); | |||
using AVIOWriteFunction = int (*)(void*, const uint8_t*, int); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: we define the buffer parameter as const uint8_t*
, which is how it's defined starting from FFmpeg 7. Before that, it was defined as uint8_t*
. That's why we now have to wrap the call to avio_alloc_context()
into our own AVIOAllocContext()
in FFMPEGCommon.cpp
@@ -59,8 +59,9 @@ function(make_torchcodec_libraries | |||
set(decoder_library_name "libtorchcodec_decoder${ffmpeg_major_version}") | |||
set(decoder_sources | |||
AVIOContextHolder.cpp | |||
AVIOBytesContext.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Encoder.[cpp, h]
rely on AVIOToTensorContext
, and specifically on the AVIOToTensorContext::getOutputTensor()
method. They don't rely on the AVIOContextHolder
base class, like the decoder. For this reason we have to add AVIOBytesContext.cpp
to the source dependency here.
Alternatively, I think we could make getOutputTensor
a virtual method of the base class? But this method wouldn't make much sense for the existing child classes (like AVIOBytesContext
), so this doesn't sounds like a great OOP design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on not making getOutputTensor()
virtual on the base class, since it's only applicable to one of the derived classes. We could potentially pull AVIOToTensorContext
out of AVIOBytesContext.[h|cpp]
to limit what gets put into libtorchcodec_decoderN.so
.
I think the issue here is that the Encoder
class needs to call getOutputTensor()
, which means that it must actually hold a reference to an actual AVIOToTensorContext
rather than just the base AVIOContextHolder
. (Which is what SingleStreamDecoder
does.) A potential way around this is for AVIOToTensorContext
to accept a tensor rather than creating its own. The caller, however, would still need to keep track of how many bytes it asked it to encode in order to do the final narrow on the tensor, so that's not necessarily any cleaner. Your call on what you think makes the most sense.
std::optional<std::string_view> fileName, | ||
std::optional<std::string_view> formatName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We enforce that fileName
and formatName
parameters are mutually exclusive. Happy to consider alternative designs. Creating separate constructors seemed more complex and doesn't provide much value IMHO.
Note that this only exists at the C++ constructor level. The custom ops expose two distinct entry-points: encode_audio_to_file()
and encode_audio_to_tensor()
. I suspect that the public Python API will look similar to this, but that's still up for discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should follow the same pattern set out in SingleStreamDecoder
: one constructor that takes the file name, and another constructor that takes the AVIOToTensorContext
as an argument. That means it's the callers responsibility to create the AVIOToTensorContext
object; that's how it currently works in SingleStreamDecoder
. That would also require pulling all common initialization into some initializeEncoder()
function that both constructors would call.
The value, to me, is that it's more clear when creating the object that you're doing it correctly, and then when reading the initialization code, it's more clear what's required for, and correct for, each path.
TORCH_CHECK( | ||
(seek != nullptr) && ((write != nullptr) ^ (read != nullptr)), | ||
"seek method must be defined, and either write or read must be defined. " | ||
"But not both!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may relax the mutual-exclusivity check above eventually, if we implement both write
and read
within the same class. For now, mutual-exclusivity is assumed and enforced, because we use the existence of write
to set the write_flag
below.
src/torchcodec/_core/FFMPEGCommon.h
Outdated
void* opaque, | ||
int (*read_packet)(void* opaque, uint8_t* buf, int buf_size), | ||
int (*write_packet)(void* opaque, const uint8_t* buf, int buf_size), | ||
int64_t (*seek)(void* opaque, int64_t offset, int whence)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move the convenience function type aliases of AVIO*Function
to here, and then use those in all of our interfaces. That should be easier to grep for, and also easier to tell at a glance that the signatures are the same in different places.
Also, since AVIOAllocContext
is a function and not a type, I think we should name it avioAllocContext()
. Otherwise I think readers will assume we're instantiating an object of type AVIOAllocContext
until they look at the declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - I assume we can't use the type aliases in the child classes member declarations / definition? I.e. in the class AVIOBytesContext
header, we still have to use verbose
static int read(void* opaque, uint8_t* buf, int buf_size);
declaration, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, correct. We can only use the type aliases when we declare some variable that has that type.
Approving to unblock, but I do think we should follow the same two-constructor pattern we have in |
This PR adds support for encoding audio samples into a bytes (uint8) tensor. We allocate the output tensor to 10MB, and double the allocated size up to 320MB if needed (after which we error out).
We rely on a new
AVIOToTensorContext
class, which inherits fromAVIOContextHolder
.The C++
Encoder
class now takes two optional and mutually exclusive parameters:fileName
andformat
.fileName
is used when encoding to a file, whileformat
is used when encoding to a tensor.The core API has changed: we don't expose an Encoder object anymore. We only expose two stateless functions:
encode_audio_to_file(filename : str, ...)
encoded_tensor = encode_audio_to_tensor(format : str, ...)
I suspect that the public Python API will look similar to this, but final design is still WIP.