Hi SAM2 team,
I built a lightweight wrapper around the SAM2 video predictor for long-video
segmentation. SAM2 currently loads the entire video into memory, which makes
it impractical for longer clips.
simple-sam2 adds:
- Batch processing: only `batch_size` frames are held in memory at once, so
GPU/CPU memory usage stays roughly constant regardless of video length
- Unified prompt API: mask, points, and box can be passed in a single call
(SAM2 natively accepts either a mask or points/box, through separate calls)
- Direct video file input: pass any .mp4/.avi/.mov file directly; frames are
extracted automatically via OpenCV, so no manual preprocessing is needed
- Selective frame range processing: specify `start_frame_idx` and `end_frame_idx`
to segment only part of the video instead of the entire clip, which is useful
for long videos where only a specific segment is of interest
- Persistent storage layout: extracted frames and output masks are organized
under a canonical directory structure, so re-runs skip re-extraction and
masks are always easy to locate
- Carry-over mask mechanism: object identity is maintained across batch
boundaries by using the last frame's mask as the seed for the next batch
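For anyone curious how batching, frame ranges, and the carry-over seed can fit together, here is a minimal illustrative sketch in plain Python. It is not simple-sam2's actual code: `plan_batches`, `segment_in_batches`, and the `segment_batch` callable (a stand-in for a SAM2 propagation run over one batch) are hypothetical names. It also assumes that `end_frame_idx` is inclusive and that consecutive batches overlap by one frame, so the shared frame's mask can seed the next batch.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple, TypeVar

Mask = TypeVar("Mask")  # in practice probably a NumPy array; any type works here


def plan_batches(start_frame_idx: int, end_frame_idx: int, batch_size: int) -> List[List[int]]:
    """Split the inclusive range [start_frame_idx, end_frame_idx] into batches.

    Assumption: consecutive batches share one frame (the last frame of one batch
    is the first frame of the next), so its mask can seed the next batch.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be >= 2 so batches can share a seed frame")
    if end_frame_idx < start_frame_idx:
        raise ValueError("end_frame_idx must be >= start_frame_idx")
    batches = []
    first = start_frame_idx
    while True:
        last = min(first + batch_size - 1, end_frame_idx)
        batches.append(list(range(first, last + 1)))
        if last == end_frame_idx:
            return batches
        first = last  # overlap by one frame for the carry-over seed


def segment_in_batches(
    start_frame_idx: int,
    end_frame_idx: int,
    batch_size: int,
    initial_mask: Mask,
    segment_batch: Callable[[Sequence[int], Mask], Dict[int, Mask]],
) -> Iterator[Tuple[int, Mask]]:
    """Yield (frame_idx, mask) pairs, holding only one batch of masks at a time.

    segment_batch(frame_ids, seed_mask) is a hypothetical stand-in for a SAM2
    run over just those frames, prompted with seed_mask on frame_ids[0].
    """
    seed = initial_mask
    for k, frame_ids in enumerate(plan_batches(start_frame_idx, end_frame_idx, batch_size)):
        batch_masks = segment_batch(frame_ids, seed)
        # Every batch after the first starts on the shared frame, already yielded.
        to_emit = frame_ids if k == 0 else frame_ids[1:]
        for idx in to_emit:
            yield idx, batch_masks[idx]
        # Carry-over: the last frame's mask seeds the next batch.
        seed = batch_masks[frame_ids[-1]]
```

Because masks are yielded frame by frame, the caller can write each one to disk as it arrives rather than keeping the whole video's masks in memory.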
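The persistent-storage idea can be sketched in the same spirit. The layout here (`<root>/<video stem>/frames/` and `<root>/<video stem>/masks/`, zero-padded names like `00000.jpg`, and a `.complete` marker file) is a hypothetical example, not simple-sam2's actual directory structure, and `extract` is a placeholder for the OpenCV extraction step. The marker is written only after extraction returns, so an interrupted run is redone instead of being mistaken for a finished one.

```python
from pathlib import Path
from typing import Callable, Tuple, Union

PathLike = Union[str, Path]
DONE_MARKER = ".complete"  # hypothetical marker file name


def video_dirs(root: PathLike, video_path: PathLike) -> Tuple[Path, Path]:
    """Return (frames_dir, masks_dir) for a video under a canonical root (hypothetical layout)."""
    base = Path(root) / Path(video_path).stem
    return base / "frames", base / "masks"


def frame_path(frames_dir: Path, frame_idx: int) -> Path:
    """Zero-padded JPEG name, e.g. 00042.jpg, so frames sort in numeric order."""
    return frames_dir / f"{frame_idx:05d}.jpg"


def ensure_frames(
    root: PathLike,
    video_path: PathLike,
    extract: Callable[[Path, Path], None],
) -> Path:
    """Extract frames once; later runs reuse the cached directory."""
    frames_dir, masks_dir = video_dirs(root, video_path)
    marker = frames_dir / DONE_MARKER
    if marker.exists():
        return frames_dir  # already extracted on an earlier run
    frames_dir.mkdir(parents=True, exist_ok=True)
    masks_dir.mkdir(parents=True, exist_ok=True)
    extract(Path(video_path), frames_dir)  # placeholder for an OpenCV-based extractor
    marker.touch()  # only reached if extraction finished without raising
    return frames_dir
```

Keying the directory on the file stem keeps paths predictable, though under this particular scheme two videos with the same name in different folders would share a directory.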
PyPI: https://pypi.org/project/simple-sam2/
GitHub: https://github.com/varun-kolluru/simple_sam2
Sharing here in case it's useful to others in the community. Happy to
answer questions or take feedback.