-
Notifications
You must be signed in to change notification settings - Fork 2.1k
fix(utils/detect-mimetype): add support for detecting id3 tags #5737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Please add a test suite that covers the id3 behavior. |
@@ -83,19 +85,48 @@ export const audioMimeTypeSignatures = [ | |||
}, | |||
] as const; | |||
|
|||
const stripID3 = (data: Uint8Array | string) => { | |||
const bytes = | |||
typeof data === 'string' ? convertBase64ToUint8Array(data) : data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This leads to unnecessary conversion of potentially large data entries. Can the edit in the base64 case happen w/o converting to UInt8Array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did spend quite a while on this today, I couldn't get raw base64 stripping working. You can detect the existence of id3 tags by the prefix SURz
but I think the remaining data between the prefix and the start of the raw mp3 frames are variable. You could strip out the data by removing everything before the known mp3 prefix e.g. //s=
but I think you run the risk of cutting at the wrong point.
I'd love to know whether it's possible but yeah I couldn't get it working without conversion.
Please add a changeset (e.g. |
Co-authored-by: Lars Grammel <[email protected]>
Background
While testing the new Fal transcription functionality, I noticed that we weren't able to correctly detect the mimetype of an mp3 file. Some console logging showed that the headers didn't match. So after a bit of a deep dive, I learned that mp3 files can sometimes contain id3 tags, essentially metadata.
This PR strips that metadata if it exists, allowing the signature checks to function as intended.
Summary
This pull request includes significant changes to the
detect-mimetype.ts
file to enhance the detection of MIME types by handling ID3 tags in audio files. The most important changes include the addition of functions to strip ID3 tags from binary and base64 data and modifications to thedetectMimeType
function to utilize these new functions.Enhancements to MIME type detection:
getID3v2TagSize
function to calculate the size of the ID3v2 tag in an audio file.stripID3
function to remove ID3 tags from aUint8Array
.isBase64ID3v2
function to check if a base64 string starts with an ID3v2 header.base64ToBytes
function to convert base64 encoded data to aUint8Array
.detectMimeType
function to usestripID3TagsIfPresent
to process data before checking MIME type signatures.Tasks
pnpm prettier-fix
to fix any formatting issuesFuture Work
None.