Skip to content

fix(utils/detect-mimetype): add support for detecting id3 tags #5737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 16, 2025

Conversation

haydenbleasel
Copy link
Contributor

Background

While testing the new Fal transcription functionality, I noticed that we weren't able to correctly detect the mimetype of an mp3 file. Some console logging showed that the headers didn't match. So after a bit of a deep dive, I learned that mp3 files can sometimes contain id3 tags, essentially metadata.

This PR strips that metadata if it exists, allowing the signature checks to function as intended.

Summary

This pull request includes significant changes to the detect-mimetype.ts file to enhance the detection of MIME types by handling ID3 tags in audio files. The most important changes include the addition of functions to strip ID3 tags from binary and base64 data and modifications to the detectMimeType function to utilize these new functions.

Enhancements to MIME type detection:

  • Added getID3v2TagSize function to calculate the size of the ID3v2 tag in an audio file.
  • Added stripID3 function to remove ID3 tags from a Uint8Array.
  • Added isBase64ID3v2 function to check if a base64 string starts with an ID3v2 header.
  • Added base64ToBytes function to convert base64 encoded data to a Uint8Array.
  • Modified detectMimeType function to use stripID3TagsIfPresent to process data before checking MIME type signatures.

Tasks

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • If required, a patch changeset for relevant packages has been added
  • You've run pnpm prettier-fix to fix any formatting issues

Future Work

None.

@haydenbleasel haydenbleasel requested a review from Copilot April 13, 2025 17:13
@haydenbleasel haydenbleasel self-assigned this Apr 13, 2025
@haydenbleasel haydenbleasel requested a review from lgrammel April 13, 2025 17:13
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

@lgrammel
Copy link
Collaborator

Please add a test suite that covers the id3 behavior.

@@ -83,19 +85,48 @@ export const audioMimeTypeSignatures = [
},
] as const;

const stripID3 = (data: Uint8Array | string) => {
const bytes =
typeof data === 'string' ? convertBase64ToUint8Array(data) : data;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leads to unnecessary conversion of potentially large data entries. Can the edit in the base64 case happen w/o converting to UInt8Array?

Copy link
Contributor Author

@haydenbleasel haydenbleasel Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did spend quite a while on this today, I couldn't get raw base64 stripping working. You can detect the existence of id3 tags by the prefix SURz but I think the remaining data between the prefix and the start of the raw mp3 frames are variable. You could strip out the data by removing everything before the known mp3 prefix e.g. //s= but I think you run the risk of cutting at the wrong point.

I'd love to know whether it's possible but yeah I couldn't get it working without conversion.

@lgrammel
Copy link
Collaborator

Please add a changeset (e.g. fix (core): support mimetypes with id3 tags) bc it changes the behavior.

@lgrammel lgrammel merged commit b69a253 into main Apr 16, 2025
7 checks passed
@lgrammel lgrammel deleted the detect-id3v2 branch April 16, 2025 17:40
samdenty pushed a commit that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants