feat: add nim image retrieval endpoint support #394

ajcasagrande · 2025-10-24T04:32:31Z

Summary by CodeRabbit

New Features
- Image retrieval endpoint for NIM-based image inference workloads.
- Video media modality support in datasets alongside images and audio.
- Audio and video encoding/decoding utilities for media file handling.
Metrics
- Image throughput and latency metrics for tracking performance.
- Video throughput and latency metrics for performance measurement.
- Extended metric units supporting per-second and per-time-unit calculations.

ajcasagrande · 2025-10-24T04:34:21Z

coderabbitai · 2025-10-24T04:34:28Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

codecov · 2025-10-24T04:35:17Z

Codecov Report

❌ Patch coverage is 78.31858% with 49 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/aiperf/endpoints/nim_image_retrieval.py	72.50%	6 Missing and 5 partials ⚠️
src/aiperf/metrics/types/image_metrics.py	76.19%	10 Missing ⚠️
src/aiperf/metrics/types/video_metrics.py	75.60%	10 Missing ⚠️
src/aiperf/dataset/loader/mixins.py	86.36%	3 Missing and 3 partials ⚠️
src/aiperf/dataset/utils.py	79.31%	4 Missing and 2 partials ⚠️
src/aiperf/dataset/loader/models.py	55.55%	2 Missing and 2 partials ⚠️
src/aiperf/common/enums/metric_enums.py	93.33%	1 Missing ⚠️
src/aiperf/common/models/record_models.py	75.00%	1 Missing ⚠️


coderabbitai · 2025-10-24T04:41:59Z

Walkthrough

This pull request adds comprehensive support for image retrieval and video processing capabilities to the aiperf framework. It introduces new metric units and metrics for images/videos, a new image retrieval endpoint type, video modality support across dataset models and loaders, audio/video encoding utilities, and response data structures for image retrieval, with corresponding test coverage.

Changes

Cohort / File(s)	Summary
Metric Enums & Units `src/aiperf/common/enums/metric_enums.py`	Added generic metric units (IMAGE, IMAGES, VIDEO, VIDEOS); introduced inverted flag to MetricOverTimeUnitInfo for swapped unit ordering; exposed inverted property on MetricOverTimeUnit; added composite units (IMAGES_PER_SECOND, MS_PER_IMAGE, VIDEOS_PER_SECOND, MS_PER_VIDEO); extended MetricFlags with SUPPORTS_VIDEO_ONLY.
Plugin Enums `src/aiperf/common/enums/plugin_enums.py`	Added IMAGE_RETRIEVAL member to EndpointType enum.
Response Models `src/aiperf/common/models/__init__.py`, `src/aiperf/common/models/record_models.py`	Created ImageRetrievalResponseData class with data field and get_text method; extended ParsedResponse union to include ImageRetrievalResponseData; exposed via package exports.
Image Retrieval Endpoint `src/aiperf/endpoints/__init__.py`, `src/aiperf/endpoints/nim_image_retrieval.py`	Implemented ImageRetrievalEndpoint with payload formatting (validates images, enforces non-empty content), response parsing (extracts JSON data), and metadata endpoint; registered with EndpointFactory.
Media Utilities `src/aiperf/dataset/utils.py`	Added open_audio, encode_audio, open_video, encode_video functions supporting format detection and base64 encoding with proper data URL formatting.
Dataset Package Exports `src/aiperf/dataset/__init__.py`	Exposed audio/video encoding and opening utilities (encode_audio, encode_video, open_audio, open_video) in public API.
Dataset Models & Video Support `src/aiperf/dataset/loader/models.py`	Added video and videos fields to SingleTurn and RandomPool; updated validation for mutual exclusivity and modality presence checks.
Media Conversion Mixin `src/aiperf/dataset/loader/mixins.py`	Added convert_to_media_objects method; introduced helper methods (_is_url, _is_already_encoded, _encode_media_file, _handle_media_content) for detecting and encoding local media files to base64 data URLs.
Turn Loaders - Video Integration `src/aiperf/dataset/loader/single_turn.py`, `src/aiperf/dataset/loader/multi_turn.py`, `src/aiperf/dataset/loader/random_pool.py`	Integrated video field population from media extraction across single-turn, multi-turn, and random pool conversion flows.
Image & Video Metrics `src/aiperf/metrics/types/image_metrics.py`, `src/aiperf/metrics/types/video_metrics.py`	Introduced NumImagesMetric, ImageThroughputMetric, ImageLatencyMetric; introduced NumVideosMetric, VideoThroughputMetric, VideoLatencyMetric with appropriate dependencies and unit conversions.
Endpoint Tests `tests/endpoints/test_nim_image_retrieval_endpoint.py`, `tests/endpoints/test_nim_image_retrieval_endpoint_parse_response.py`	Added tests for ImageRetrievalEndpoint payload formatting, validation, response parsing, and error handling.
Loader Fixtures & Tests `tests/loaders/conftest.py`, `tests/loaders/test_single_turn.py`, `tests/loaders/test_multi_turn.py`, `tests/loaders/test_random_pool.py`	Added pytest fixtures (test_images, create_test_image, create_test_audio, create_test_video); updated loader tests to validate base64 encoding of local media assets, URL pass-through, and mixed media source handling.
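The "base64 data URL" encoding that the Media Utilities and Media Conversion Mixin rows refer to can be sketched roughly as follows. This is an illustration only, not the PR's code: `encode_file_as_data_url` and `is_data_url` are hypothetical names, and MIME detection via `mimetypes` is an assumption (the PR describes its own format detection).

```python
import base64
import mimetypes
from pathlib import Path


def encode_file_as_data_url(path: str) -> str:
    """Read a local media file and return it as a base64 data URL."""
    # Hypothetical helper: guess the MIME type from the file extension.
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Cannot determine media type for {path!r}")
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def is_data_url(value: str) -> bool:
    """Return True when the value already looks like a base64 data URL."""
    return value.startswith("data:") and ";base64," in value
```

A check like `is_data_url` lets a loader skip re-encoding content that is already inline, which is presumably the role of `_is_already_encoded` in the mixin.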

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

This PR spans 20+ files with heterogeneous changes: enum/metric additions, media encoding logic with format detection, endpoint validation and response parsing, dataset model extensions, and comprehensive test coverage. While individual sections follow consistent patterns, the breadth of interconnected functionality and logic density around media handling, metric calculations, and endpoint validation warrant sustained attention across multiple distinct areas.

Poem

🐰 Hops with joy through pixels and frames,
Images flow, videos now stake their claims,
Metrics bloom where media once slumbered alone,
Each frame encoded in base64 stone,
A retrieval endpoint awaits the next quest,
From metrics to loaders, our framework's now blessed! ✨📹

Pre-merge checks

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title "feat: add nim image retrieval endpoint support" is directly related to the pull request changes. The PR includes a new ImageRetrievalEndpoint class with format_payload and parse_response methods, along with supporting infrastructure like ImageRetrievalResponseData and the IMAGE_RETRIEVAL endpoint type enum value. However, the changeset has a broader scope that extends beyond the image retrieval endpoint, including comprehensive video support across data models and loaders, new image and video metrics, audio/video utilities, and metric unit enhancements. The title captures a real and significant part of the changes but does not fully summarize the complete scope of multi-modal support additions in the PR.
Docstring Coverage	✅ Passed	Docstring coverage is 92.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/aiperf/dataset/loader/random_pool.py (1)

179-192: Add videos to the merge logic.

The _merge_turns method merges texts, images, and audios but omits videos. This inconsistency will cause video data to be lost when turns are merged.

Apply this diff to include videos in the merged turn:
     def _merge_turns(self, turns: list[Turn]) -> Turn:
         """Merge turns into a single turn.
 
         Args:
             turns: A list of turns.
 
         Returns:
             A single turn.
         """
         merged_turn = Turn(
             texts=[text for turn in turns for text in turn.texts],
             images=[image for turn in turns for image in turn.images],
             audios=[audio for turn in turns for audio in turn.audios],
+            videos=[video for turn in turns for video in turn.videos],
         )
         return merged_turn

🧹 Nitpick comments (27)

tests/endpoints/test_nim_image_retrieval_endpoint.py (1)

49-52: Make the failure assertion less brittle and add multi-image coverage.

Use a stable substring in the regex to reduce brittleness if wording changes.
Add a test for multiple images to ensure list ordering and formatting.

Apply this minimal tweak to the assertion:

-with pytest.raises(
-    ValueError, match="Image Retrieval request requires at least one image"
-):
+with pytest.raises(ValueError, match=r"requires at least one image"):

Optionally add:

def test_format_payload_multiple_images(endpoint, model_endpoint):
    turn = Turn(images=[Image(contents=["data:image/png;base64,AAA"]), Image(contents=["data:image/png;base64,BBB"])], model="image-retrieval-model")
    req = RequestInfo(model_endpoint=model_endpoint, turns=[turn])
    payload = endpoint.format_payload(req)
    assert [i["url"] for i in payload["input"]] == ["data:image/png;base64,AAA","data:image/png;base64,BBB"]

tests/loaders/test_multi_turn.py (1)

456-467: Good coverage; consider asserting image ordering for stability.

Add an explicit order check to ensure the two encoded images remain in the provided order during conversion.

 first_turn = conversation.turns[0]
 assert first_turn.texts[0].contents == ["What's this?"]
 assert len(first_turn.images[0].contents) == 1
 # ...
 second_turn = conversation.turns[1]
 assert second_turn.texts[0].contents == ["Follow up"]
 assert len(second_turn.images[0].contents) == 1
+
+# Optional sanity check: the two turns should carry different encoded images.
+# This only proves the contents are distinct; to verify ordering, compare each
+# one against the expected encoding of its source file.
+img0 = first_turn.images[0].contents[0]
+img1 = second_turn.images[0].contents[0]
+assert img0 != img1

Also applies to: 479-485, 486-492

tests/loaders/test_random_pool.py (2)

251-275: Add an explicit ordering assertion for batched images.

Helps catch accidental reordering during encoding.
 for img_content in turn.images[0].contents:
     assert img_content.startswith("data:image/")
     assert ";base64," in img_content
+assert turn.images[0].contents[0] != turn.images[0].contents[1]

325-375: Good multi-file assertions; mirror the image-encoding checks for both conversations.

You already validate base64 for both; consider asserting that text-image pairs belong to different files (queries vs contexts) by name when available to tighten guarantees.

tests/loaders/conftest.py (2)

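89-111: Minor: space the generated audio samples exactly 1/sample_rate apart.

Use endpoint=False so the time axis stops one sample short of t = duration. With the default endpoint=True, the same number of samples spans [0, duration] inclusive, so the spacing becomes duration / (N - 1) rather than 1/sample_rate and the generated tone is slightly off-frequency.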

- t = np.linspace(0, duration, int(sample_rate * duration))
+ t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
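To make the spacing difference concrete, here is a small pure-Python sketch that mimics the two `np.linspace` modes without requiring NumPy. `sample_times` is a hypothetical helper written for this illustration; it is not part of the fixture.

```python
def sample_times(duration: float, sample_rate: int, endpoint: bool) -> list[float]:
    """Mimic the spacing of np.linspace(0, duration, n, endpoint=...) in plain Python."""
    n = int(sample_rate * duration)
    # With endpoint=True the last sample sits at `duration`, so the n samples
    # are spread over n - 1 intervals instead of n.
    intervals = max(n - 1, 1) if endpoint else n
    step = duration / intervals
    return [i * step for i in range(n)]


# Four samples over one second at a nominal 4 Hz sample rate.
with_endpoint = sample_times(1.0, 4, endpoint=True)      # spacing of 1/3 s
without_endpoint = sample_times(1.0, 4, endpoint=False)  # spacing of 1/4 s
```

Both calls return the same number of samples; only the second one keeps the spacing equal to 1/sample_rate.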

124-166: Narrow exception handling and ensure temp-frame cleanup in video fixture.

Catching Exception masks errors (Ruff BLE001). Also, ensure frames are cleaned even on failure by using TemporaryDirectory.

     def _create_video(name: str = "test_video.mp4"):
         dest_path = tmp_path / name
-
-        # Try using ffmpeg-python if available, otherwise create a minimal MP4
-        try:
-            import tempfile
-
-            import ffmpeg
-            # Create a few simple frames
-            temp_frame_dir = tempfile.mkdtemp(prefix="video_frames_")
-            for i in range(3):
-                img = Image.new("RGB", (64, 64), (i * 80, 0, 0))
-                draw = ImageDraw.Draw(img)
-                draw.text((10, 25), f"F{i}", fill=(255, 255, 255))
-                img.save(f"{temp_frame_dir}/frame_{i:03d}.png")
-            # Use ffmpeg to create video
-            (
-                ffmpeg.input(f"{temp_frame_dir}/frame_%03d.png", framerate=1)
-                .output(str(dest_path), vcodec="libx264", pix_fmt="yuv420p", t=1)
-                .overwrite_output()
-                .run(quiet=True)
-            )
-            for file in Path(temp_frame_dir).glob("*.png"):
-                file.unlink()
-            Path(temp_frame_dir).rmdir()
-        except (ImportError, Exception):
+        # Try using ffmpeg-python if available, otherwise create a minimal MP4
+        try:
+            try:
+                import ffmpeg  # type: ignore
+            except ImportError:
+                ffmpeg = None
+            if ffmpeg:
+                from tempfile import TemporaryDirectory
+                with TemporaryDirectory(prefix="video_frames_") as temp_frame_dir:
+                    for i in range(3):
+                        img = Image.new("RGB", (64, 64), (i * 80, 0, 0))
+                        draw = ImageDraw.Draw(img)
+                        draw.text((10, 25), f"F{i}", fill=(255, 255, 255))
+                        img.save(f"{temp_frame_dir}/frame_{i:03d}.png")
+                    (
+                        ffmpeg.input(f"{temp_frame_dir}/frame_%03d.png", framerate=1)
+                        .output(str(dest_path), vcodec="libx264", pix_fmt="yuv420p", t=1)
+                        .overwrite_output()
+                        .run(quiet=True)
+                    )
+            else:
+                raise RuntimeError("ffmpeg not available")
+        except Exception:
             # Fallback: create a minimal valid MP4 file
             minimal_mp4 = bytes.fromhex(
                 "000000186674797069736f6d0000020069736f6d69736f32617663310000"
                 "0008667265650000002c6d6461740000001c6d6f6f7600000000006d7668"
                 "6400000000000000000000000000000001000000"
             )
             with open(dest_path, "wb") as f:
                 f.write(minimal_mp4)
         return str(dest_path)

If keeping broad except is intentional, add a noqa for BLE001 with a short rationale.

tests/loaders/test_single_turn.py (5)

399-437: Avoid hard-coded asset UUIDs; use the fixture to reduce skips.

Replace the fixed source path with the create_test_image fixture to make this portable and keep the test running across environments.

- def test_convert_local_image_to_base64(self, create_jsonl_file):
+ def test_convert_local_image_to_base64(self, create_jsonl_file, create_test_image):
     """Test that local image files are encoded to base64 data URLs."""
-    test_image = Path("src/aiperf/dataset/generator/assets/source_images/0bfd8fdf-457f-43c8-9253-a2346d37d26a_1024.jpg")
-    if not test_image.exists():
-        pytest.skip("Test image not found")
+    test_image = Path(create_test_image())

Also, narrow the exception in base64 validation:

-    try:
-        base64.b64decode(base64_part)
-    except Exception as e:
+    import binascii
+    try:
+        base64.b64decode(base64_part)
+    except (binascii.Error, ValueError) as e:
         pytest.fail(f"Invalid base64 encoding: {e}")
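As a standalone illustration of the narrowed exception handling, the check could be factored into a helper along these lines. `is_valid_base64` is a hypothetical name, and passing `validate=True` is an extra strictness choice that the suggested diff does not make.

```python
import base64
import binascii


def is_valid_base64(text: str) -> bool:
    """Return True if `text` decodes as strict base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True
```

Note that `binascii.Error` is a subclass of `ValueError`, so listing both is redundant but harmless; it mainly documents which failure is expected.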

472-512: Use fixture-driven images instead of hard-coded paths.

Swap the two explicit paths with the test_images fixture to avoid brittle skips.

- test_images = [
-     Path("src/.../source_images/0bfd8fdf-..._1024.jpg"),
-     Path("src/.../source_images/119544eb-..._861.jpg"),
- ]
+ def_imgs = [Path(p) for _, p in sorted(test_images.items())[:2]]
+ test_images = def_imgs

513-552: Prefer create_test_image for the local component in mixed sources.

Keeps the test self-contained and portable.

- test_image = Path("src/aiperf/dataset/generator/assets/source_images/0bfd8fdf-457f-43c8-9253-a2346d37d26a_1024.jpg")
- if not test_image.exists():
-     pytest.skip("Test image not found")
+ test_image = Path(create_test_image())

596-601: Narrow the exception type in audio base64 validation.

Catching Exception is too broad and hides unrelated bugs.

-    try:
-        base64.b64decode(base64_part)
-    except Exception as e:
+    import binascii
+    try:
+        base64.b64decode(base64_part)
+    except (binascii.Error, ValueError) as e:
         pytest.fail(f"Invalid base64 encoding: {e}")

Note: Audio uses "wav," whereas images/videos use data URLs. Consider aligning formats or documenting the difference clearly.

665-668: Same here: narrow the exception type for video base64 validation.

-    try:
-        base64.b64decode(base64_part)
-    except Exception as e:
+    import binascii
+    try:
+        base64.b64decode(base64_part)
+    except (binascii.Error, ValueError) as e:
         pytest.fail(f"Invalid base64 encoding: {e}")

src/aiperf/dataset/loader/models.py (1)

56-69: Reduce duplication in validators to prevent drift.

Extract shared helpers for:

mutually exclusive scalar vs list per modality

at-least-one-modality checks

This keeps SingleTurn and RandomPool in sync as modalities evolve.

Example helper sketch:
def _ensure_exclusive(self, pairs: list[tuple[object, object]], names: list[tuple[str,str]]):
    for (a,b), (an,bn) in zip(pairs, names):
        if a and b:
            raise ValueError(f"{an} and {bn} cannot be set together")

def _has_any(self, fields: list[object]) -> bool:
    return any(bool(f) for f in fields)
Then call with the relevant fields per model. Also consider rejecting empty lists explicitly if passed.

Also applies to: 149-160, 162-178

src/aiperf/dataset/utils.py (1)

150-153: Consider using shorter exception messages or custom exception classes.

Static analysis suggests avoiding long exception messages outside the exception class. While not critical, consider either shortening these messages or creating custom exception classes if this pattern appears frequently.

Also applies to: 197-200
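If the project did want to address TRY003, one common shape is to move the message into a dedicated exception class. The sketch below uses hypothetical names (`UnsupportedMediaFormatError`, `check_extension`) and is not taken from the PR.

```python
class UnsupportedMediaFormatError(ValueError):
    """Raised when a media file has an extension that cannot be encoded."""

    def __init__(self, path: str, supported: list[str]) -> None:
        self.path = path
        self.supported = supported
        # The long message lives inside the exception class, which is
        # what TRY003 asks for.
        super().__init__(
            f"Unsupported media format for {path!r}; "
            f"supported formats: {', '.join(supported)}"
        )


def check_extension(path: str, supported: list[str]) -> None:
    """Raise if the file extension is not in the supported list."""
    extension = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    if extension not in supported:
        raise UnsupportedMediaFormatError(path, supported)
```

Subclassing `ValueError` keeps existing `except ValueError` handlers and `pytest.raises(ValueError)` tests working.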

src/aiperf/endpoints/nim_image_retrieval.py (1)

35-35: Consider using shorter exception messages or custom exception classes.

Static analysis suggests avoiding long exception messages outside the exception class. While not critical for functionality, this is a style consideration.

Also applies to: 46-46, 49-49

src/aiperf/dataset/loader/mixins.py (1)

111-111: Consider using shorter exception messages or custom exception classes.

Static analysis suggests avoiding long exception messages outside the exception class. This is a style consideration and not critical for functionality.

Also applies to: 171-171

src/aiperf/common/enums/metric_enums.py (2)

296-301: Guard conversions between inverted and non‑inverted over‑time units.

Tags reflect inversion, but convert_to does not explicitly prevent converting between inverted and non‑inverted units (e.g., IMAGES_PER_SECOND ↔ MS_PER_IMAGE). Make this fail fast with a clear error to avoid accidental misuse.

Apply this diff:
 class MetricOverTimeUnitInfo(BaseMetricUnitInfo):
@@
     def convert_to(self, other_unit: "MetricUnitT", value: int | float) -> float:
@@
-        if isinstance(other_unit, MetricOverTimeUnit | MetricOverTimeUnitInfo):
+        if isinstance(other_unit, MetricOverTimeUnit | MetricOverTimeUnitInfo):
+            # Disallow conversions across inverted orientation to avoid subtle errors.
+            if self.inverted != other_unit.inverted:
+                raise MetricUnitError(
+                    f"Cannot convert between inverted ('{self.tag}') and non-inverted ('{other_unit.tag}') units. "
+                    "Compute the reciprocal metric explicitly."
+                )
             # Chain convert each unit to the other unit.
             value = self.primary_unit.convert_to(other_unit.primary_unit, value)
             value = self.time_unit.convert_to(other_unit.time_unit, value)
             if self.third_unit and other_unit.third_unit:
                 value = self.third_unit.convert_to(other_unit.third_unit, value)
             return value
Also applies to: 315-336
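The reason a guard is useful: going from a rate (images per second) to its inverted form (milliseconds per image) takes a reciprocal, which a chain of per-unit scale factors cannot express. A minimal sketch of computing the reciprocal explicitly, with a hypothetical function name:

```python
def images_per_second_to_ms_per_image(images_per_second: float) -> float:
    """Convert a throughput in images/s to a latency in ms/image."""
    if images_per_second <= 0:
        raise ValueError("throughput must be positive")
    # 1 / (images per second) gives seconds per image; scale to milliseconds.
    return 1000.0 / images_per_second


latency_at_4 = images_per_second_to_ms_per_image(4.0)
latency_at_8 = images_per_second_to_ms_per_image(8.0)
```

Doubling the throughput halves the latency, so no constant multiplier can map one unit onto the other.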

354-371: Naming and inverted configuration look good; consider optional seconds variants.

IMAGES_PER_SECOND/MS_PER_IMAGE and VIDEOS_PER_SECOND/MS_PER_VIDEO are coherent. If consumers need seconds-per-image/video without rounding to ms, consider adding SECONDS_PER_IMAGE and SECONDS_PER_VIDEO for symmetry; otherwise current time-unit conversions on latency metrics suffice.

Confirm whether UI/CSV exporters ever need “s/image” or “s/video” tags directly.

src/aiperf/metrics/types/image_metrics.py (5)

1-10: Import ClassVar to annotate mutable class attributes.

Needed for RUF012 compliance.

-from aiperf.common.enums import MetricFlags
+from typing import ClassVar
+from aiperf.common.enums import MetricFlags

21-35: Count logic ok; silence unused record_metrics.

The summation matches the stated behavior. Add del record_metrics at the top of the method body to satisfy ARG002 without changing the signature.

     def _parse_record(
         self, record: ParsedResponseRecord, record_metrics: MetricRecordDict
     ) -> int:
         """Parse the number of images from the record by summing the number of images in each turn."""
+        del record_metrics  # unused
         num_images = sum(
             len(image.contents)
             for turn in record.request.turns
             for image in turn.images
         )
         if num_images == 0:
-            raise NoMetricValue(
-                "Record must have at least one image in at least one turn."
-            )
+            raise NoMetricValue("No images found.")
         return num_images

46-49: Annotate mutable class attribute required_metrics with ClassVar.

Avoids it being treated as an instance attribute.

-    required_metrics = {
+    required_metrics: ClassVar[set[str]] = {
         NumImagesMetric.tag,
         RequestLatencyMetric.tag,
     }
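For context on what RUF012 is guarding against: a mutable class attribute is a single object shared by every instance, and `ClassVar` makes that intent explicit to readers and type checkers. The class below is a hypothetical stand-in, not one of the PR's metric classes.

```python
from typing import ClassVar


class ExampleMetric:
    # One set object, shared by the class and all of its instances.
    required_metrics: ClassVar[set[str]] = {"num_images", "request_latency"}


first = ExampleMetric()
second = ExampleMetric()

# Mutating through one instance is visible through the other,
# because both names resolve to the same class-level set.
first.required_metrics.add("extra")
shared = "extra" in second.required_metrics
```

The annotation does not change runtime behavior for a plain class; it documents the sharing and lets type checkers flag accidental per-instance assignment.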

71-74: Annotate mutable class attribute required_metrics with ClassVar.

Same as throughput metric.

-    required_metrics = {
+    required_metrics: ClassVar[set[str]] = {
         NumImagesMetric.tag,
         RequestLatencyMetric.tag,
     }

76-84: Silence unused record parameter.

Keeps signature while appeasing ARG002.

     def _parse_record(
         self, record: ParsedResponseRecord, record_metrics: MetricRecordDict
     ) -> float:
         """Parse the image latency from the record by dividing the request latency by the number of images."""
+        del record  # unused
         num_images = record_metrics.get_or_raise(NumImagesMetric)
         request_latency_ms = record_metrics.get_converted_or_raise(
             RequestLatencyMetric, self.unit.time_unit
         )
         return request_latency_ms / num_images

src/aiperf/metrics/types/video_metrics.py (5)

1-10: Import ClassVar for mutable class attribute annotations.

-from aiperf.common.enums import MetricFlags
+from typing import ClassVar
+from aiperf.common.enums import MetricFlags

21-35: Count logic ok; silence unused record_metrics.

     def _parse_record(
         self, record: ParsedResponseRecord, record_metrics: MetricRecordDict
     ) -> int:
         """Parse the number of videos from the record by summing the number of videos in each turn."""
+        del record_metrics  # unused
         num_videos = sum(
             len(video.contents)
             for turn in record.request.turns
             for video in turn.videos
         )
         if num_videos == 0:
-            raise NoMetricValue(
-                "Record must have at least one video in at least one turn."
-            )
+            raise NoMetricValue("No videos found.")
         return num_videos

45-48: Annotate mutable class attribute required_metrics with ClassVar.

-    required_metrics = {
+    required_metrics: ClassVar[set[str]] = {
         NumVideosMetric.tag,
         RequestLatencyMetric.tag,
     }

70-73: Annotate mutable class attribute required_metrics with ClassVar.

-    required_metrics = {
+    required_metrics: ClassVar[set[str]] = {
         NumVideosMetric.tag,
         RequestLatencyMetric.tag,
     }

75-83: Silence unused record parameter.

     def _parse_record(
         self, record: ParsedResponseRecord, record_metrics: MetricRecordDict
     ) -> float:
         """Parse the video latency from the record by dividing the request latency by the number of videos."""
+        del record  # unused
         num_videos = record_metrics.get_or_raise(NumVideosMetric)
         request_latency_ms = record_metrics.get_converted_or_raise(
             RequestLatencyMetric, self.unit.time_unit
         )
         return request_latency_ms / num_videos

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ddb6b4 and b44390a.

📒 Files selected for processing (21)

src/aiperf/common/enums/metric_enums.py (6 hunks)
src/aiperf/common/enums/plugin_enums.py (1 hunks)
src/aiperf/common/models/__init__.py (2 hunks)
src/aiperf/common/models/record_models.py (2 hunks)
src/aiperf/dataset/__init__.py (2 hunks)
src/aiperf/dataset/loader/mixins.py (4 hunks)
src/aiperf/dataset/loader/models.py (9 hunks)
src/aiperf/dataset/loader/multi_turn.py (1 hunks)
src/aiperf/dataset/loader/random_pool.py (1 hunks)
src/aiperf/dataset/loader/single_turn.py (1 hunks)
src/aiperf/dataset/utils.py (2 hunks)
src/aiperf/endpoints/__init__.py (2 hunks)
src/aiperf/endpoints/nim_image_retrieval.py (1 hunks)
src/aiperf/metrics/types/image_metrics.py (1 hunks)
src/aiperf/metrics/types/video_metrics.py (1 hunks)
tests/endpoints/test_nim_image_retrieval_endpoint.py (1 hunks)
tests/endpoints/test_nim_image_retrieval_endpoint_parse_response.py (1 hunks)
tests/loaders/conftest.py (2 hunks)
tests/loaders/test_multi_turn.py (2 hunks)
tests/loaders/test_random_pool.py (5 hunks)
tests/loaders/test_single_turn.py (6 hunks)

🧰 Additional context used

🪛 Ruff (0.14.1)

src/aiperf/dataset/utils.py

150-153: Avoid specifying long messages outside the exception class

(TRY003)

197-200: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/metrics/types/image_metrics.py

22-22: Unused method argument: record_metrics

(ARG002)

31-33: Avoid specifying long messages outside the exception class

(TRY003)

46-49: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

52-52: Unused method argument: record

(ARG002)

71-74: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

77-77: Unused method argument: record

(ARG002)

src/aiperf/dataset/loader/mixins.py

111-111: Avoid specifying long messages outside the exception class

(TRY003)

171-171: Avoid specifying long messages outside the exception class

(TRY003)

tests/loaders/conftest.py

153-153: Do not catch blind exception: Exception

(BLE001)

src/aiperf/dataset/loader/models.py

66-66: Avoid specifying long messages outside the exception class

(TRY003)

159-159: Avoid specifying long messages outside the exception class

(TRY003)

tests/loaders/test_single_turn.py

434-434: Do not catch blind exception: Exception

(BLE001)

599-599: Do not catch blind exception: Exception

(BLE001)

667-667: Do not catch blind exception: Exception

(BLE001)

src/aiperf/endpoints/nim_image_retrieval.py

35-35: Avoid specifying long messages outside the exception class

(TRY003)

46-46: Avoid specifying long messages outside the exception class

(TRY003)

49-49: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/metrics/types/video_metrics.py

22-22: Unused method argument: record_metrics

(ARG002)

31-33: Avoid specifying long messages outside the exception class

(TRY003)

45-48: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

51-51: Unused method argument: record

(ARG002)

70-73: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

76-76: Unused method argument: record

(ARG002)

🔇 Additional comments (27)

src/aiperf/common/models/__init__.py (1)

72-72: LGTM! ImageRetrievalResponseData properly exported.

The new response data class is correctly imported and exported following the same pattern as other response data types.

Also applies to: 144-144

src/aiperf/common/models/record_models.py (2)

602-612: LGTM! ImageRetrievalResponseData follows established pattern.

The new response data class is well-structured and consistent with similar non-text response types (EmbeddingResponseData, RankingsResponseData).

623-623: LGTM! ParsedResponse union updated correctly.

ImageRetrievalResponseData properly added to the SerializeAsAny union type.

src/aiperf/dataset/loader/multi_turn.py (1)

142-142: LGTM! Video modality support added consistently.

The videos field is correctly passed to the Turn constructor, following the same pattern as texts, images, and audios.

src/aiperf/dataset/loader/random_pool.py (1)

167-167: LGTM! Video modality support added.

The videos field is correctly passed to the Turn constructor, consistent with the pattern for other modalities.

src/aiperf/common/enums/plugin_enums.py (1)

30-30: LGTM! IMAGE_RETRIEVAL endpoint type added.

The new endpoint type follows the established pattern and naming convention for other endpoint types.

src/aiperf/endpoints/__init__.py (1)

7-9: LGTM! ImageRetrievalEndpoint properly exported.

The new endpoint is correctly imported and exported, following the same pattern as other endpoint implementations.

Also applies to: 28-28

src/aiperf/dataset/loader/single_turn.py (1)

113-113: LGTM! Video modality support added consistently.

The videos field is correctly passed to the Turn constructor, following the same pattern as other modalities.

tests/endpoints/test_nim_image_retrieval_endpoint_parse_response.py (3)

19-36: LGTM! Endpoint fixture properly configured.

The fixture correctly sets up an ImageRetrievalEndpoint with appropriate mocking for the transport layer.

38-68: LGTM! Basic parse response test is comprehensive.

The test validates the complete parsing flow including timestamp preservation, response type verification, and data structure integrity.

70-77: LGTM! Invalid response handling tested.

The test properly verifies that None is returned for invalid/empty responses.

tests/endpoints/test_nim_image_retrieval_endpoint.py (1)

31-43: Happy path looks solid.

Asserting a single image_url item and echoing the data URL is correct for the NIM payload.
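For readers unfamiliar with the payload shape these tests assert on, a rough sketch follows. The `input` list and the `url` key come from the test snippet in this review; the `"type": "image_url"` key and the function name are assumptions made for illustration and may not match the actual endpoint.

```python
def format_image_retrieval_payload(image_urls: list[str]) -> dict:
    """Build a request body with one entry per image, preserving order."""
    if not image_urls:
        raise ValueError("Image Retrieval request requires at least one image")
    # Assumed item shape: {"type": "image_url", "url": <data URL or remote URL>}.
    return {"input": [{"type": "image_url", "url": url} for url in image_urls]}


payload = format_image_retrieval_payload(
    ["data:image/png;base64,AAA", "data:image/png;base64,BBB"]
)
```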

tests/loaders/test_random_pool.py (1)

223-250: LGTM for multimodal conversion assertions.

Data URL checks for image and passthrough for audio URL are appropriate.

tests/loaders/test_single_turn.py (1)

310-326: URL passthrough assertions look correct.

Good separation: local files are encoded elsewhere; remote URLs pass through as-is.

src/aiperf/dataset/loader/models.py (1)

42-46: Video modality support is correctly integrated.

Fields and validators mirror existing modalities; docstrings updated accordingly.

Also applies to: 65-67, 75-85, 143-147, 158-160, 166-176

src/aiperf/dataset/utils.py (2)

127-159: Verify type consistency between open_audio return value and encode_audio parameter.

The function returns audio_format.value (a string), but encode_audio expects format: AudioFormat (an enum). This type mismatch could cause confusion and may fail static type checking.

Consider either:

Changing the return type to tuple[bytes, AudioFormat] and returning the enum, or

Updating encode_audio to accept str instead of AudioFormat

Apply this diff to return the enum for consistency:
-    return audio_bytes, audio_format.value
+    return audio_bytes, audio_format
And update the docstring:
     Returns:
-        A tuple of (audio_bytes, format) where format is 'wav' or 'mp3'.
+        A tuple of (audio_bytes, format) where format is an AudioFormat enum.
176-206: Verify type consistency between open_video return value and encode_video parameter.

Similar to open_audio, this function returns video_format.value (a string), but encode_video expects format: VideoFormat (an enum). This creates a type mismatch.

Apply this diff to return the enum for consistency:
-    return video_bytes, video_format.value
+    return video_bytes, video_format
And update the docstring:
     Returns:
-        A tuple of (video_bytes, format) where format is VideoFormat.MP4.
+        A tuple of (video_bytes, format) where format is a VideoFormat enum.
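A minimal sketch of the enum-consistent variant follows, shown for audio only; the video pair would follow the same shape. The `AudioFormat` class, both function signatures, and the output layout of `encode_audio` are simplified stand-ins assumed for illustration, not the project's actual definitions.

```python
import base64
from enum import Enum


class AudioFormat(str, Enum):
    # Hypothetical stand-in for the project's AudioFormat enum.
    WAV = "wav"
    MP3 = "mp3"


def open_audio(data: bytes, suffix: str) -> tuple[bytes, AudioFormat]:
    """Return the raw bytes together with the enum member, not its string value."""
    return data, AudioFormat(suffix.lower().lstrip("."))


def encode_audio(audio_bytes: bytes, format: AudioFormat) -> str:
    """Encode as "<format>,<base64>"; this output layout is illustrative only."""
    encoded = base64.b64encode(audio_bytes).decode("utf-8")
    return f"{format.value.lower()},{encoded}"


audio_bytes, audio_format = open_audio(b"abc", ".WAV")
payload = encode_audio(audio_bytes, audio_format)
```

With the enum flowing end to end, the `.value` access happens in exactly one place (the encoder), and a static type checker can verify the call chain.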
src/aiperf/dataset/__init__.py (1)

40-51: LGTM!

The new audio and video utilities are correctly imported and exported. The public API surface expansion is clean and consistent with existing patterns.

Also applies to: 53-92

src/aiperf/endpoints/nim_image_retrieval.py (2)

23-30: LGTM!

The metadata configuration is appropriate for an image retrieval endpoint.

65-83: LGTM!

The response parsing handles missing JSON and missing data fields appropriately with debug logging. Returning None for unparseable responses appears to be the established pattern in this codebase.

src/aiperf/dataset/loader/mixins.py (4)

47-89: LGTM!

The extended media conversion logic correctly handles video alongside image and audio, with appropriate encoding for local files. The singular and plural field handling is consistent.

91-114: LGTM!

The URL validation logic is robust, correctly handling valid URLs, non-URLs, and raising errors for malformed URLs with only scheme or netloc. This prevents subtle bugs.
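The three outcomes described above can be sketched with `urllib.parse.urlparse`. The function name `is_url` and the error message are assumptions for illustration; the actual mixin method may differ.

```python
from urllib.parse import urlparse


def is_url(content: str) -> bool:
    """Return True for a full URL, False for a non-URL, raise for a half-formed one."""
    parsed = urlparse(content)
    if parsed.scheme and parsed.netloc:
        # Both parts present: treat as a valid remote URL.
        return True
    if parsed.scheme or parsed.netloc:
        # Only one of the two parts present: reject instead of guessing.
        raise ValueError(f"Malformed URL (needs both scheme and netloc): {content!r}")
    # Neither part present: plain text or a relative file path.
    return False
```

Under this sketch a `data:` URL has a scheme but no netloc and would raise, so already-encoded content has to be filtered out before the check runs.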

144-171: Verify compatibility with utils.py type signatures.

This method calls utils.open_audio and utils.open_video which currently return string format values, but then passes those to utils.encode_audio and utils.encode_video which expect enum types. This works at runtime because the encode functions incorrectly call .lower() on the parameter without .value, but the type signatures are inconsistent.

Ensure the type signature fixes suggested for utils.py are applied consistently, so that:

open_audio and open_video return enums

encode_audio and encode_video accept enums and call .value.lower()

Or alternatively:

All functions use strings consistently

173-202: LGTM!

The media content handling logic is well-structured, checking for already-encoded content (including data URLs) before checking for remote URLs, then treating remaining content as local files. The ordering is correct and prevents data URLs from being misidentified as remote URLs.
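The check ordering can be illustrated with a small classifier. This is a sketch under assumptions: the function name and the returned labels are hypothetical, and only `data:` URLs are treated as already encoded here, whereas the real method may recognize other encoded forms.

```python
from urllib.parse import urlparse


def classify_media(content: str) -> str:
    """Classify content as 'encoded', 'remote', or 'local', checked in that order."""
    # 1. Data URLs are already encoded; check them before any URL parsing.
    if content.startswith("data:"):
        return "encoded"
    # 2. Remote URLs have both a scheme and a network location.
    parsed = urlparse(content)
    if parsed.scheme and parsed.netloc:
        return "remote"
    # 3. Everything else is treated as a local file path to be encoded.
    return "local"
```

Swapping steps 1 and 2 would send data URLs through the URL parser first, which is the misidentification the ordering is meant to avoid.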

src/aiperf/common/enums/metric_enums.py (3)

191-195: Units added are consistent and clear.

IMAGE/IMAGES/VIDEO/VIDEOS naming aligns with existing pattern and tag casing. No issues.

393-397: Good API: expose inverted on the enum.

Surface mirrors info cleanly; helps callers avoid peeking into info.

678-680: Video-only flag addition is sensible and non-breaking.

Bit position continues sequence; no overlap.

src/aiperf/dataset/loader/mixins.py

src/aiperf/dataset/utils.py

coderabbitai · 2025-10-24T04:42:03Z

src/aiperf/endpoints/nim_image_retrieval.py

+        if not turn.images:
+            raise ValueError("Image Retrieval request requires at least one image.")
+
+        if not turn.images[0].contents:
+            raise ValueError("Image content is required for Image Retrieval.")
+
+        payload = {
+            "input": [
+                {"type": "image_url", "url": content}
+                for img in turn.images
+                for content in img.contents
+            ],


⚠️ Potential issue | 🟡 Minor

Incomplete validation: only first image is checked.

Lines 48-49 only validate that turn.images[0].contents is not empty, but lines 53-56 iterate over all images and their contents. If subsequent images have empty contents lists, they won't contribute to the payload but won't raise an error either.

Consider validating all images:

         if not turn.images:
             raise ValueError("Image Retrieval request requires at least one image.")
-        if not turn.images[0].contents:
-            raise ValueError("Image content is required for Image Retrieval.")
+        if not any(img.contents for img in turn.images):
+            raise ValueError("At least one image must have content for Image Retrieval.")

Or validate that each image has content:

         if not turn.images:
             raise ValueError("Image Retrieval request requires at least one image.")
-        if not turn.images[0].contents:
-            raise ValueError("Image content is required for Image Retrieval.")
+        for img in turn.images:
+            if not img.contents:
+                raise ValueError("All images must have content for Image Retrieval.")

🧰 Tools

🪛 Ruff (0.14.1)

46-46: Avoid specifying long messages outside the exception class

(TRY003)

49-49: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

In src/aiperf/endpoints/nim_image_retrieval.py around lines 45 to 56, the code only checks turn.images[0].contents but builds a payload from every image; update validation to ensure every image in turn.images has a non-empty contents list (and optionally non-empty content values) before constructing the payload, raising a ValueError that includes the offending image index or a clear message if any image.contents is empty, so the payload only proceeds when all images have content.
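A sketch of the per-image validation with the offending index in the error message is given below. The `Image` dataclass and the `build_input` helper are hypothetical stand-ins; the real endpoint operates on the project's turn model.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    # Hypothetical stand-in for the project's image model.
    contents: list[str] = field(default_factory=list)


def build_input(images: list[Image]) -> list[dict[str, str]]:
    """Validate every image, then flatten all contents into payload items."""
    if not images:
        raise ValueError("Image Retrieval request requires at least one image.")
    for index, img in enumerate(images):
        if not img.contents:
            # Name the offending image so the bad dataset entry is easy to find.
            raise ValueError(f"Image at index {index} has no content.")
    return [
        {"type": "image_url", "url": content}
        for img in images
        for content in img.contents
    ]
```

Compared with the `any(...)` variant, this stricter form rejects a turn as soon as one image is empty, so the payload never silently drops an image.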

src/aiperf/metrics/types/image_metrics.py

src/aiperf/metrics/types/video_metrics.py

coderabbitai · 2025-10-24T04:42:03Z

tests/loaders/conftest.py

+@pytest.fixture
+def test_images(tmp_path):
+    """Create temporary test images copied from source assets.
+
+    Returns:
+        A dictionary mapping image names to their temporary file paths.
+    """
+    # Get the source images directory
+    source_images_dir = Path("src/aiperf/dataset/generator/assets/source_images")
+
+    # Get some actual image files
+    source_images = list(source_images_dir.glob("*.jpg"))[:4]
+
+    if not source_images:
+        pytest.skip("No source images found for testing")
+
+    # Create temporary copies with simple names
+    image_map = {}
+    for i, source_img in enumerate(source_images, 1):
+        dest_path = tmp_path / f"image{i}.png"
+        shutil.copy(source_img, dest_path)
+        image_map[f"image{i}.png"] = str(dest_path)
+
+    return image_map
+


⚠️ Potential issue | 🟠 Major

Fix image extension/content mismatch to avoid incorrect MIME.

Source files are “.jpg” but are copied to “.png” names. If encoders infer MIME from suffix, you’ll produce data:image/png with JPEG bytes. Preserve the original suffix for both path and key.

-    for i, source_img in enumerate(source_images, 1):
-        dest_path = tmp_path / f"image{i}.png"
-        shutil.copy(source_img, dest_path)
-        image_map[f"image{i}.png"] = str(dest_path)
+    for i, source_img in enumerate(source_images, 1):
+        dest_path = tmp_path / f"image{i}{source_img.suffix}"
+        shutil.copy(source_img, dest_path)
+        image_map[dest_path.name] = str(dest_path)

Optionally, if no source images, generate a tiny synthetic image instead of skipping to keep tests running.

🤖 Prompt for AI Agents

In tests/loaders/conftest.py around lines 31 to 55, the fixture copies JPEG source files but renames them to .png which causes MIME/encoder mismatches; fix by preserving the original file extension when constructing dest_path and the dict key (use source_img.suffix and source_img.stem), i.e., copy to tmp_path / f"{source_img.stem}{source_img.suffix}" and map that filename to the dest path string; optionally, if no source_images are found, create a tiny valid synthetic image (e.g., write a minimal 1x1 PNG or JPEG byte payload) into tmp_path and include it in image_map instead of skipping so tests still run.
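For the optional synthetic-image fallback, one possible sketch writes a valid 1x1 PNG using only the standard library, so the fixture needs no bundled assets. The helper names `_chunk` and `write_tiny_png` are hypothetical and not part of the PR.

```python
import struct
import zlib
from pathlib import Path


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def write_tiny_png(dest: Path) -> Path:
    """Write a valid 1x1 8-bit RGB PNG (a single red pixel) to dest."""
    signature = b"\x89PNG\r\n\x1a\n"
    # IHDR: width=1, height=1, bit depth=8, color type=2 (RGB),
    # compression=0, filter=0, interlace=0.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
    # One scanline: filter byte 0 followed by one RGB pixel.
    idat = zlib.compress(b"\x00\xff\x00\x00")
    dest.write_bytes(
        signature
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )
    return dest
```

In the fixture, a call such as `write_tiny_png(tmp_path / "image1.png")` could replace the `pytest.skip(...)` branch, and the `.png` suffix would then match the file's actual bytes.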

coderabbitai · 2025-10-24T04:42:03Z

tests/loaders/conftest.py

+@pytest.fixture
+def create_test_image(tmp_path):
+    """Create a single test image copied from source assets.
+
+    Returns:
+        A function that creates a test image with the given name.
+    """
+    source_images_dir = Path("src/aiperf/dataset/generator/assets/source_images")
+    source_images = list(source_images_dir.glob("*.jpg"))
+
+    if not source_images:
+        pytest.skip("No source images found for testing")
+
+    def _create_image(name: str = "test_image.png"):
+        dest_path = tmp_path / name
+        shutil.copy(source_images[0], dest_path)
+        return str(dest_path)
+
+    return _create_image
+


🛠️ Refactor suggestion | 🟠 Major

Align create_test_image naming with source suffix.

Same suffix issue here. Respect caller-provided suffix if present; otherwise inherit from source.

-    def _create_image(name: str = "test_image.png"):
-        dest_path = tmp_path / name
+    def _create_image(name: str = "test_image"):
+        # If caller passed a suffix, use it; else inherit from source
+        suffix = Path(name).suffix or source_images[0].suffix
+        stem = Path(name).stem
+        dest_path = tmp_path / f"{stem}{suffix}"
         shutil.copy(source_images[0], dest_path)
         return str(dest_path)

🤖 Prompt for AI Agents

In tests/loaders/conftest.py around lines 57 to 76, the fixture always writes "test_image.png" ignoring the source image extension; change the factory so it respects a caller-provided suffix if present, otherwise inherit the suffix from source_images[0]. Specifically: when building dest_path, parse the provided name with Path(name) and if it has no suffix, append source_images[0].suffix; if it already has a suffix, use it as-is; then copy the source image to that resolved destination and return its string path.
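The suffix rule can be isolated into a small pure function, which makes it easy to test apart from the fixture. The name `resolve_dest_name` is a hypothetical helper introduced only for this sketch.

```python
from pathlib import Path


def resolve_dest_name(name: str, source: Path) -> str:
    """Keep the caller's suffix if given; otherwise inherit the source file's suffix."""
    requested = Path(name)
    suffix = requested.suffix or source.suffix
    return f"{requested.stem}{suffix}"


inherited = resolve_dest_name("test_image", Path("assets/photo.jpg"))
explicit = resolve_dest_name("custom.png", Path("assets/photo.jpg"))
```

Note that a caller who explicitly passes a `.png` name still gets JPEG bytes under a PNG suffix; the rule only removes the mismatch from the default path.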

feat: add nim image retrieval endpoint support

b44390a

github-actions bot added the feat label Oct 24, 2025

coderabbitai bot reviewed Oct 24, 2025

ajcasagrande self-assigned this Oct 24, 2025

Merge branch 'main' into ajc/img-ret

74c8ff3
