add Sam2 video test #1436

zyd-ustc · 2025-11-24T10:13:36Z

What does this PR do?

Fixes # (issue)
1. Modify convert_to_rgb, do_resize, and do_normalize to handle video frames.
Adds # (feature)
1. Add a fast test for sam2_video.
2. Add a cv2 import error message.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines.
Did you build and run the code without any errors?
Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

gemini-code-assist · 2025-11-24T10:13:59Z

Summary of Changes

Hello @zyd-ustc, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on enabling and testing video processing capabilities for the Sam2Video model within the MindSpore framework. It involves significant updates to image processing utilities to correctly handle video frame batches, refining color conversion logic, and establishing a robust testing framework to ensure functional parity and correctness for the Sam2Video model's video inference session.

Highlights

Video Frame Processing Enhancements: The resize and normalize functions in image_processing_utils_fast.py have been updated to correctly handle batch dimensions for video frames, ensuring proper processing of multi-frame inputs by iterating and applying operations per frame.
Refined Video Color Conversion: The convert_to_rgb function in video_processing_utils.py has been refactored to explicitly manage 1-channel (grayscale), 3-channel (RGB), and 4-channel (RGBA with alpha blending) video inputs, improving robustness and removing reliance on a specific transform.
OpenCV Dependency Handling: An import error message and backend mapping for OpenCV (cv2) have been added to import_utils.py, which will improve error handling and user guidance when the library is missing.
SAM2 Video Model Test Suite: A new, comprehensive test suite has been introduced for the Sam2Video model, including dedicated testers for the prompt encoder, mask decoder, and the full model. This suite features parameterized tests that compare MindSpore and PyTorch outputs for functional consistency and correctness.
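
The OpenCV item above describes an import error message plus a backend mapping. A minimal sketch of that pattern using only the standard library is shown here. The names `BACKEND_ERRORS`, `is_available`, and `require_backend`, as well as the message wording, are hypothetical and are not taken from import_utils.py.

```python
import importlib.util

# Hypothetical message table keyed by backend module name.
# The actual wording used in import_utils.py is not shown on this page.
BACKEND_ERRORS = {
    "cv2": (
        "{0} requires the OpenCV library, which was not found in your environment. "
        "Install it with `pip install opencv-python`."
    ),
}


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_backend(caller: str, module_name: str) -> None:
    """Raise ImportError with a readable hint when a backend is missing."""
    if is_available(module_name):
        return
    # Fall back to a generic message for backends without a dedicated entry.
    template = BACKEND_ERRORS.get(
        module_name, "{0} requires the `" + module_name + "` package."
    )
    raise ImportError(template.format(caller))
```

Calling `require_backend("SomeVideoProcessor", "cv2")` at the top of a code path that needs OpenCV would then fail early with an installation hint instead of a bare `ModuleNotFoundError` deep inside the call stack.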

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request primarily focuses on adapting image processing utilities for video frames and adding a test for Sam2VideoModel. The modifications in convert_to_rgb and the import utilities are well-implemented. However, the changes to the resize and normalize functions in image_processing_utils_fast.py introduce performance issues by using loops for batch processing. I have provided suggestions to refactor these functions using batched tensor operations, which will significantly improve efficiency. Additionally, I've noted a minor maintainability improvement for the new test file.

mindone/transformers/image_processing_utils_fast.py

gemini-code-assist · 2025-11-24T10:17:15Z

+        original_shape = image.shape
+        batch_dims = original_shape[:-3]
+        num_batch = 1
+        for dim in batch_dims:
+            num_batch *= dim
+        image_flat = image.view(num_batch, *original_shape[-3:])
+
        mean = [float(mean[0]), float(mean[1]), float(mean[2])]
        std = [float(std[0]), float(std[1]), float(std[2])]
        normalize = vision.Normalize(
            mean=mean,
            std=std,
        )
-        images = []
-        for img in image:
-            images.append(normalize(img.permute(1, 2, 0).asnumpy()))
-        return ms.tensor(images).permute(0, 3, 1, 2)
+        normalized_images = []
+        for img in image_flat:
+            normalized_img = normalize(img.permute(1, 2, 0).asnumpy())
+            normalized_images.append(ms.tensor(normalized_img).permute(2, 0, 1))
+
+        normalized_flat = mint.stack(normalized_images, dim=0)
+        _, new_C, new_H, new_W = normalized_flat.shape
+        return normalized_flat.view(*batch_dims, new_C, new_H, new_W)
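
The shape bookkeeping in the hunk above (collapse every leading dimension into one batch axis, process per frame, then restore the leading dimensions) can be sketched without MindSpore. This is only an illustration of the arithmetic on shape tuples. The helper names `flatten_leading_dims` and `restore_leading_dims` are hypothetical and do not appear in the PR.

```python
from math import prod


def flatten_leading_dims(shape):
    """Split a (..., C, H, W) shape into its batch dims and a flat 4D shape."""
    if len(shape) < 3:
        raise ValueError("expected at least 3 dimensions (C, H, W)")
    batch_dims = tuple(shape[:-3])
    # prod(()) is 1, so a single (C, H, W) image becomes a batch of one.
    flat_shape = (prod(batch_dims), *shape[-3:])
    return batch_dims, flat_shape


def restore_leading_dims(batch_dims, flat_shape):
    """Rebuild the full shape after per-frame processing on the flat batch."""
    return (*batch_dims, *flat_shape[1:])


# A video batch laid out as (batch, frames, channels, height, width).
batch_dims, flat_shape = flatten_leading_dims((2, 4, 3, 8, 8))
restored = restore_leading_dims(batch_dims, flat_shape)
```

Here `flat_shape` is `(8, 3, 8, 8)` and `restored` equals the original `(2, 4, 3, 8, 8)`, which is the round trip the `view` calls in the hunk perform.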


The normalize function iterates over each image in the batch, converting it to a NumPy array for processing. This is inefficient. Normalization is a simple arithmetic operation (image - mean) / std that can be performed directly on the entire batch as a tensor operation using broadcasting. This would be significantly faster and avoid unnecessary data conversions.

Suggested change

-        original_shape = image.shape
-        batch_dims = original_shape[:-3]
-        num_batch = 1
-        for dim in batch_dims:
-            num_batch *= dim
-        image_flat = image.view(num_batch, *original_shape[-3:])
-
-        mean = [float(mean[0]), float(mean[1]), float(mean[2])]
-        std = [float(std[0]), float(std[1]), float(std[2])]
-        normalize = vision.Normalize(
-            mean=mean,
-            std=std,
-        )
-        images = []
-        for img in image:
-            images.append(normalize(img.permute(1, 2, 0).asnumpy()))
-        return ms.tensor(images).permute(0, 3, 1, 2)
-        normalized_images = []
-        for img in image_flat:
-            normalized_img = normalize(img.permute(1, 2, 0).asnumpy())
-            normalized_images.append(ms.tensor(normalized_img).permute(2, 0, 1))
-
-        normalized_flat = mint.stack(normalized_images, dim=0)
-        _, new_C, new_H, new_W = normalized_flat.shape
-        return normalized_flat.view(*batch_dims, new_C, new_H, new_W)
+        mean = ms.tensor(mean, dtype=image.dtype)
+        std = ms.tensor(std, dtype=image.dtype)
+        # Reshape mean and std to broadcast across batch and spatial dimensions
+        view_shape = (1,) * (image.ndim - 3) + (image.shape[-3], 1, 1)
+        mean = mean.view(view_shape)
+        std = std.view(view_shape)
+        return (image - mean) / std
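
The broadcasting idea in the suggestion above can be checked against a per-frame loop. The sketch below uses NumPy rather than MindSpore so that it runs anywhere, on the assumption that the broadcasting rules used here are the same. The function names and the mean/std values are illustrative only and are not taken from the PR.

```python
import numpy as np


def normalize_batched(image, mean, std):
    """Normalize a (..., C, H, W) array per channel using broadcasting."""
    mean = np.asarray(mean, dtype=image.dtype)
    std = np.asarray(std, dtype=image.dtype)
    # One axis of size 1 per leading dim, then (C, 1, 1) for channel/spatial.
    view_shape = (1,) * (image.ndim - 3) + (image.shape[-3], 1, 1)
    return (image - mean.reshape(view_shape)) / std.reshape(view_shape)


def normalize_looped(image, mean, std):
    """Reference version: flatten leading dims and loop frame by frame."""
    flat = image.reshape(-1, *image.shape[-3:])
    out = np.empty_like(flat)
    for i, frame in enumerate(flat):
        for c in range(frame.shape[0]):
            out[i, c] = (frame[c] - mean[c]) / std[c]
    return out.reshape(image.shape)


# Illustrative input: 2 videos, 4 frames each, 3 channels, 8x8 pixels.
rng = np.random.default_rng(0)
video = rng.random((2, 4, 3, 8, 8), dtype=np.float32)
mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

batched = normalize_batched(video, mean, std)
looped = normalize_looped(video, mean, std)
```

Both paths should agree to within floating-point tolerance, and the batched version works unchanged for 3D, 4D, and 5D inputs because `view_shape` is derived from `image.ndim`.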

tests/transformers_tests/models/sam2_video/test_modeling_sam2_video.py

zyd-ustc added 6 commits November 24, 2025 17:46

Create __init__.py

fea3ddf

Create test_modeling_sam2_video.py

f0886b7

Update processing_utils.py

4fa773d

Update import_utils.py

f241224

Update video_processing_utils.py

b22a062

Update image_processing_utils_fast.py

5775f78

zyd-ustc requested a review from vigo999 as a code owner November 24, 2025 10:13

gemini-code-assist bot reviewed Nov 24, 2025

zyd-ustc and others added 2 commits November 25, 2025 17:52

Update tests/transformers_tests/models/sam2_video/test_modeling_sam2_…

4702638

…video.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Refactor image normalization function for clarity

8dfbfed

Updated the normalization function to simplify the process by removing unnecessary flattening and looping. Enhanced the documentation to clarify input shape and output characteristics.
