feat: add nim image retrieval endpoint support #394
base: main
@coderabbitai review
✅ Actions performed: Review triggered.
Walkthrough
This pull request adds image retrieval and video processing support to the aiperf framework. It introduces new metric units and metrics for images/videos, a new image retrieval endpoint type, video modality support across dataset models and loaders, audio/video encoding utilities, and response data structures for image retrieval, with corresponding test coverage.
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~50 minutes
This PR spans 20+ files with heterogeneous changes: enum/metric additions, media encoding logic with format detection, endpoint validation and response parsing, dataset model extensions, and comprehensive test coverage. While individual sections follow consistent patterns, the breadth of interconnected functionality and logic density around media handling, metric calculations, and endpoint validation warrant sustained attention across multiple distinct areas.
Pre-merge checks
✅ Passed checks (3 passed)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/aiperf/dataset/loader/random_pool.py (1)
179-192: Add videos to the merge logic.
The _merge_turns method merges texts, images, and audios but omits videos. This inconsistency will cause video data to be lost when turns are merged.
Apply this diff to include videos in the merged turn:
 def _merge_turns(self, turns: list[Turn]) -> Turn:
     """Merge turns into a single turn.

     Args:
         turns: A list of turns.

     Returns:
         A single turn.
     """
     merged_turn = Turn(
         texts=[text for turn in turns for text in turn.texts],
         images=[image for turn in turns for image in turn.images],
         audios=[audio for turn in turns for audio in turn.audios],
+        videos=[video for turn in turns for video in turn.videos],
     )
     return merged_turn
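As a complementary sketch, one way to keep future modalities from being dropped is to drive the merge from a single tuple of field names. The `Turn` dataclass and the `MODALITIES` tuple below are hypothetical stand-ins for illustration, not the aiperf classes:

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """Hypothetical stand-in for aiperf's Turn; only the modality lists are modeled."""

    texts: list[str] = field(default_factory=list)
    images: list[str] = field(default_factory=list)
    audios: list[str] = field(default_factory=list)
    videos: list[str] = field(default_factory=list)


# Assumed single source of truth for modality field names.
MODALITIES = ("texts", "images", "audios", "videos")


def merge_turns(turns: list[Turn]) -> Turn:
    """Concatenate every modality list across turns, preserving input order."""
    return Turn(
        **{
            name: [item for turn in turns for item in getattr(turn, name)]
            for name in MODALITIES
        }
    )


merged = merge_turns([Turn(texts=["q"], videos=["a.mp4"]), Turn(videos=["b.mp4"])])
```

With this shape, adding a modality means touching the tuple once rather than every merge site; whether that indirection is worth it for four fields is a judgment call.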
🧹 Nitpick comments (27)
tests/endpoints/test_nim_image_retrieval_endpoint.py (1)
49-52: Make the failure assertion less brittle and add multi-image coverage.
- Use a stable substring in the regex to reduce brittleness if wording changes.
- Add a test for multiple images to ensure list ordering and formatting.
Apply this minimal tweak to the assertion:
-with pytest.raises(
-    ValueError, match="Image Retrieval request requires at least one image"
-):
+with pytest.raises(ValueError, match=r"requires at least one image"):
Optionally add:
def test_format_payload_multiple_images(endpoint, model_endpoint):
    turn = Turn(images=[Image(contents=[""]), Image(contents=[""])], model="image-retrieval-model")
    req = RequestInfo(model_endpoint=model_endpoint, turns=[turn])
    payload = endpoint.format_payload(req)
    assert [i["url"] for i in payload["input"]] == ["",""]
tests/loaders/test_multi_turn.py (1)
456-467: Good coverage; consider asserting image ordering for stability.
Add an explicit order check to ensure the two encoded images remain in the provided order during conversion.
 first_turn = conversation.turns[0]
 assert first_turn.texts[0].contents == ["What's this?"]
-assert len(first_turn.images[0].contents) == 1
+assert len(first_turn.images[0].contents) == 1
 # ...
 second_turn = conversation.turns[1]
 assert second_turn.texts[0].contents == ["Follow up"]
-assert len(second_turn.images[0].contents) == 1
+assert len(second_turn.images[0].contents) == 1
+
+# Optional: verify order is preserved by comparing raw contents before/after
+# (placeholders; focus is on positional stability)
+img0 = first_turn.images[0].contents[0]
+img1 = second_turn.images[0].contents[0]
+assert img0 != img1  # sanity check; should represent different inputs
Also applies to: 479-485, 486-492
tests/loaders/test_random_pool.py (2)
251-275: Add an explicit ordering assertion for batched images.
Helps catch accidental reordering during encoding.
 for img_content in turn.images[0].contents:
     assert img_content.startswith("data:image/")
     assert ";base64," in img_content
+assert turn.images[0].contents[0] != turn.images[0].contents[1]
325-375: Good multi-file assertions; mirror the image-encoding checks for both conversations.
You already validate base64 for both; consider asserting that text-image pairs belong to different files (queries vs contexts) by name when available to tighten guarantees.
tests/loaders/conftest.py (2)
89-111: Minor: produce evenly spaced samples for generated audio.
Use endpoint=False so the end point is excluded and samples are spaced exactly 1/sample_rate apart.
-    t = np.linspace(0, duration, int(sample_rate * duration))
+    t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
124-166: Narrow exception handling and ensure temp-frame cleanup in video fixture.
Catching Exception masks errors (Ruff BLE001). Also, ensure frames are cleaned even on failure by using TemporaryDirectory.
-    def _create_video(name: str = "test_video.mp4"):
+    def _create_video(name: str = "test_video.mp4"):
         dest_path = tmp_path / name
-
-        # Try using ffmpeg-python if available, otherwise create a minimal MP4
-        try:
-            import tempfile
-
-            import ffmpeg
-            # Create a few simple frames
-            temp_frame_dir = tempfile.mkdtemp(prefix="video_frames_")
-            for i in range(3):
-                img = Image.new("RGB", (64, 64), (i * 80, 0, 0))
-                draw = ImageDraw.Draw(img)
-                draw.text((10, 25), f"F{i}", fill=(255, 255, 255))
-                img.save(f"{temp_frame_dir}/frame_{i:03d}.png")
-            # Use ffmpeg to create video
-            (
-                ffmpeg.input(f"{temp_frame_dir}/frame_%03d.png", framerate=1)
-                .output(str(dest_path), vcodec="libx264", pix_fmt="yuv420p", t=1)
-                .overwrite_output()
-                .run(quiet=True)
-            )
-            for file in Path(temp_frame_dir).glob("*.png"):
-                file.unlink()
-            Path(temp_frame_dir).rmdir()
-        except (ImportError, Exception):
+        # Try using ffmpeg-python if available, otherwise create a minimal MP4
+        try:
+            try:
+                import ffmpeg  # type: ignore
+            except ImportError:
+                ffmpeg = None
+            if ffmpeg:
+                import tempfile as _tf
+                from tempfile import TemporaryDirectory
+                with TemporaryDirectory(prefix="video_frames_") as temp_frame_dir:
+                    for i in range(3):
+                        img = Image.new("RGB", (64, 64), (i * 80, 0, 0))
+                        draw = ImageDraw.Draw(img)
+                        draw.text((10, 25), f"F{i}", fill=(255, 255, 255))
+                        img.save(f"{temp_frame_dir}/frame_{i:03d}.png")
+                    (
+                        ffmpeg.input(f"{temp_frame_dir}/frame_%03d.png", framerate=1)
+                        .output(str(dest_path), vcodec="libx264", pix_fmt="yuv420p", t=1)
+                        .overwrite_output()
+                        .run(quiet=True)
+                    )
+            else:
+                raise RuntimeError("ffmpeg not available")
+        except Exception:
             # Fallback: create a minimal valid MP4 file
             minimal_mp4 = bytes.fromhex(
                 "000000186674797069736f6d0000020069736f6d69736f32617663310000"
                 "0008667265650000002c6d6461740000001c6d6f6f7600000000006d7668"
                 "6400000000000000000000000000000001000000"
             )
             with open(dest_path, "wb") as f:
                 f.write(minimal_mp4)
         return str(dest_path)
If keeping broad except is intentional, add a noqa for BLE001 with a short rationale.
tests/loaders/test_single_turn.py (5)
399-437: Avoid hard-coded asset UUIDs; use the fixture to reduce skips.
Replace the fixed source path with the create_test_image fixture to make this portable and keep the test running across environments.
-    def test_convert_local_image_to_base64(self, create_jsonl_file):
+    def test_convert_local_image_to_base64(self, create_jsonl_file, create_test_image):
         """Test that local image files are encoded to base64 data URLs."""
-        test_image = Path("src/aiperf/dataset/generator/assets/source_images/0bfd8fdf-457f-43c8-9253-a2346d37d26a_1024.jpg")
-        if not test_image.exists():
-            pytest.skip("Test image not found")
+        test_image = Path(create_test_image())
Also, narrow the exception in base64 validation:
-        try:
-            base64.b64decode(base64_part)
-        except Exception as e:
+        import binascii
+        try:
+            base64.b64decode(base64_part)
+        except (binascii.Error, ValueError) as e:
             pytest.fail(f"Invalid base64 encoding: {e}")
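A minimal sketch of the narrowed check as a reusable helper, assuming the data URL carries a `;base64,` marker as asserted elsewhere in these tests. The helper name `is_valid_base64_data_url` is hypothetical. Note that `base64.b64decode` silently discards non-alphabet characters unless `validate=True` is passed, so a strict check likely wants that flag:

```python
import base64
import binascii


def is_valid_base64_data_url(data_url: str) -> bool:
    """Return True if the payload after ';base64,' decodes as strict base64."""
    _, marker, payload = data_url.partition(";base64,")
    if not marker:
        # No base64 marker at all, so this is not a base64 data URL.
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet;
        # without it, such characters are dropped before decoding.
        base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


good_url = "data:image/png;base64," + base64.b64encode(b"abc").decode("ascii")
```

`binascii.Error` is itself a subclass of `ValueError`, so listing both is redundant but harmless and documents intent.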
472-512: Use fixture-driven images instead of hard-coded paths.
Swap the two explicit paths with the test_images fixture to avoid brittle skips.
-        test_images = [
-            Path("src/.../source_images/0bfd8fdf-..._1024.jpg"),
-            Path("src/.../source_images/119544eb-..._861.jpg"),
-        ]
+        def_imgs = [Path(p) for _, p in sorted(test_images.items())[:2]]
+        test_images = def_imgs
513-552: Prefer create_test_image for the local component in mixed sources.
Keeps the test self-contained and portable.
-        test_image = Path("src/aiperf/dataset/generator/assets/source_images/0bfd8fdf-457f-43c8-9253-a2346d37d26a_1024.jpg")
-        if not test_image.exists():
-            pytest.skip("Test image not found")
+        test_image = Path(create_test_image())
596-601: Narrow the exception type in audio base64 validation.
Catching Exception is too broad and hides unrelated bugs.
-        try:
-            base64.b64decode(base64_part)
-        except Exception as e:
+        import binascii
+        try:
+            base64.b64decode(base64_part)
+        except (binascii.Error, ValueError) as e:
             pytest.fail(f"Invalid base64 encoding: {e}")
Note: Audio uses "wav," whereas images/videos use data URLs. Consider aligning formats or documenting the difference clearly.
665-668: Same here: narrow the exception type for video base64 validation.
-        try:
-            base64.b64decode(base64_part)
-        except Exception as e:
+        import binascii
+        try:
+            base64.b64decode(base64_part)
+        except (binascii.Error, ValueError) as e:
             pytest.fail(f"Invalid base64 encoding: {e}")
src/aiperf/dataset/loader/models.py (1)
56-69: Reduce duplication in validators to prevent drift.
Extract shared helpers for:
- mutually exclusive scalar vs list per modality
- at-least-one-modality checks
This keeps SingleTurn and RandomPool in sync as modalities evolve.
Example helper sketch:
def _ensure_exclusive(self, pairs: list[tuple[object, object]], names: list[tuple[str, str]]):
    for (a, b), (an, bn) in zip(pairs, names):
        if a and b:
            raise ValueError(f"{an} and {bn} cannot be set together")

def _has_any(self, fields: list[object]) -> bool:
    return any(bool(f) for f in fields)
Then call with the relevant fields per model. Also consider rejecting empty lists explicitly if passed.
Also applies to: 149-160, 162-178
src/aiperf/dataset/utils.py (1)
150-153: Consider using shorter exception messages or custom exception classes.
Static analysis suggests avoiding long exception messages outside the exception class. While not critical, consider either shortening these messages or creating custom exception classes if this pattern appears frequently.
Also applies to: 197-200
src/aiperf/endpoints/nim_image_retrieval.py (1)
35-35: Consider using shorter exception messages or custom exception classes.
Static analysis suggests avoiding long exception messages outside the exception class. While not critical for functionality, this is a style consideration.
Also applies to: 46-46, 49-49
src/aiperf/dataset/loader/mixins.py (1)
111-111: Consider using shorter exception messages or custom exception classes.
Static analysis suggests avoiding long exception messages outside the exception class. This is a style consideration and not critical for functionality.
Also applies to: 171-171
src/aiperf/common/enums/metric_enums.py (2)
296-301: Guard conversions between inverted and non-inverted over-time units.
Tags reflect inversion, but convert_to does not explicitly prevent converting between inverted and non-inverted units (e.g., IMAGES_PER_SECOND ↔ MS_PER_IMAGE). Make this fail fast with a clear error to avoid accidental misuse.
Apply this diff:
 class MetricOverTimeUnitInfo(BaseMetricUnitInfo):
@@
     def convert_to(self, other_unit: "MetricUnitT", value: int | float) -> float:
@@
-        if isinstance(other_unit, MetricOverTimeUnit | MetricOverTimeUnitInfo):
+        if isinstance(other_unit, MetricOverTimeUnit | MetricOverTimeUnitInfo):
+            # Disallow conversions across inverted orientation to avoid subtle errors.
+            if self.inverted != other_unit.inverted:
+                raise MetricUnitError(
+                    f"Cannot convert between inverted ('{self.tag}') and non-inverted ('{other_unit.tag}') units. "
+                    "Compute the reciprocal metric explicitly."
+                )
             # Chain convert each unit to the other unit.
             value = self.primary_unit.convert_to(other_unit.primary_unit, value)
             value = self.time_unit.convert_to(other_unit.time_unit, value)
             if self.third_unit and other_unit.third_unit:
                 value = self.third_unit.convert_to(other_unit.third_unit, value)
             return value
Also applies to: 315-336
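To illustrate why the guard matters, here is a simplified, self-contained sketch. `OverTimeUnit` and this `MetricUnitError` are hypothetical stand-ins that model only the time component, not aiperf's unit classes. A rate (items per time) and its inverse (time per item) scale in opposite directions when the time unit changes, so mixing orientations cannot be a plain rescale:

```python
from dataclasses import dataclass


class MetricUnitError(ValueError):
    """Hypothetical stand-in for aiperf's MetricUnitError."""


@dataclass(frozen=True)
class OverTimeUnit:
    tag: str
    seconds_per_time_unit: float  # 1.0 for seconds, 0.001 for milliseconds
    inverted: bool = False  # True for time-per-item units such as ms/image

    def convert_to(self, other: "OverTimeUnit", value: float) -> float:
        if self.inverted != other.inverted:
            # Fail fast: a reciprocal is a different metric, not a unit change.
            raise MetricUnitError(
                f"Cannot convert between '{self.tag}' and '{other.tag}'."
            )
        if self.inverted:
            # Time per item: the time unit is in the numerator.
            return value * self.seconds_per_time_unit / other.seconds_per_time_unit
        # Items per time: the time unit is in the denominator.
        return value * other.seconds_per_time_unit / self.seconds_per_time_unit


IMAGES_PER_SECOND = OverTimeUnit("images/sec", 1.0)
IMAGES_PER_MS = OverTimeUnit("images/ms", 0.001)
MS_PER_IMAGE = OverTimeUnit("ms/image", 0.001, inverted=True)
SECONDS_PER_IMAGE = OverTimeUnit("sec/image", 1.0, inverted=True)
```

Under these assumptions, 1000 images/sec converts to 1 image/ms and 500 ms/image converts to 0.5 sec/image, while `IMAGES_PER_SECOND.convert_to(MS_PER_IMAGE, ...)` raises.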
354-371: Naming and inverted configuration look good; consider optional seconds variants.
IMAGES_PER_SECOND/MS_PER_IMAGE and VIDEOS_PER_SECOND/MS_PER_VIDEO are coherent. If consumers need seconds-per-image/video without rounding to ms, consider adding SECONDS_PER_IMAGE and SECONDS_PER_VIDEO for symmetry; otherwise current time-unit conversions on latency metrics suffice.
Confirm whether UI/CSV exporters ever need "s/image" or "s/video" tags directly.
src/aiperf/metrics/types/image_metrics.py (5)
1-10: Import ClassVar to annotate mutable class attributes.
Needed for RUF012 compliance.
-from aiperf.common.enums import MetricFlags
+from typing import ClassVar
+from aiperf.common.enums import MetricFlags
21-35: Count logic ok; silence unused record_metrics.
The summation matches the stated behavior. Delete the unused parameter to satisfy ARG002 without changing the signature.
 def _parse_record(
     self, record: ParsedResponseRecord, record_metrics: MetricRecordDict
 ) -> int:
     """Parse the number of images from the record by summing the number of images in each turn."""
+    del record_metrics  # unused
     num_images = sum(
         len(image.contents) for turn in record.request.turns for image in turn.images
     )
     if num_images == 0:
-        raise NoMetricValue(
-            "Record must have at least one image in at least one turn."
-        )
+        raise NoMetricValue("No images found.")
     return num_images
46-49: Annotate mutable class attribute required_metrics with ClassVar.
Avoids it being treated as an instance attribute.
-    required_metrics = {
+    required_metrics: ClassVar[set[str]] = {
         NumImagesMetric.tag,
         RequestLatencyMetric.tag,
     }
71-74: Annotate mutable class attribute required_metrics with ClassVar.
Same as throughput metric.
-    required_metrics = {
+    required_metrics: ClassVar[set[str]] = {
         NumImagesMetric.tag,
         RequestLatencyMetric.tag,
     }
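A small sketch of what the ClassVar annotation communicates. The class below is a hypothetical dataclass used purely for illustration; the aiperf metric classes may not be dataclasses at all. With `ClassVar`, the set is declared as shared class-level state and is excluded from the generated instance fields:

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class ImageLatencySketch:
    """Hypothetical metric-like class; not the aiperf implementation."""

    # ClassVar marks this as shared class-level state, so dataclasses
    # (and type checkers) do not treat it as a per-instance field.
    required_metrics: ClassVar[set[str]] = {"num_images", "request_latency"}
    tag: str = "image_latency"


field_names = [f.name for f in fields(ImageLatencySketch)]
first, second = ImageLatencySketch(), ImageLatencySketch()
```

For plain (non-dataclass) classes the annotation has no runtime effect; it mainly tells readers and static analysis that mutating the set affects every instance.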
76-84: Silence unused record parameter.
Keeps signature while appeasing ARG002.
 def _parse_record(
     self, record: ParsedResponseRecord, record_metrics: MetricRecordDict
 ) -> float:
     """Parse the image latency from the record by dividing the request latency by the number of images."""
+    del record  # unused
     num_images = record_metrics.get_or_raise(NumImagesMetric)
     request_latency_ms = record_metrics.get_converted_or_raise(
         RequestLatencyMetric, self.unit.time_unit
     )
     return request_latency_ms / num_images
src/aiperf/metrics/types/video_metrics.py (5)
1-10: Import ClassVar for mutable class attribute annotations.
-from aiperf.common.enums import MetricFlags
+from typing import ClassVar
+from aiperf.common.enums import MetricFlags
21-35: Count logic ok; silence unused record_metrics.
 def _parse_record(
     self, record: ParsedResponseRecord, record_metrics: MetricRecordDict
 ) -> int:
     """Parse the number of videos from the record by summing the number of videos in each turn."""
+    del record_metrics  # unused
     num_videos = sum(
         len(video.contents) for turn in record.request.turns for video in turn.videos
     )
     if num_videos == 0:
-        raise NoMetricValue(
-            "Record must have at least one video in at least one turn."
-        )
+        raise NoMetricValue("No videos found.")
     return num_videos
45-48: Annotate mutable class attribute required_metrics with ClassVar.
-    required_metrics = {
+    required_metrics: ClassVar[set[str]] = {
         NumVideosMetric.tag,
         RequestLatencyMetric.tag,
     }
70-73: Annotate mutable class attribute required_metrics with ClassVar.
-    required_metrics = {
+    required_metrics: ClassVar[set[str]] = {
         NumVideosMetric.tag,
         RequestLatencyMetric.tag,
     }
75-83: Silence unused record parameter.
 def _parse_record(
     self, record: ParsedResponseRecord, record_metrics: MetricRecordDict
 ) -> float:
     """Parse the video latency from the record by dividing the request latency by the number of videos."""
+    del record  # unused
     num_videos = record_metrics.get_or_raise(NumVideosMetric)
     request_latency_ms = record_metrics.get_converted_or_raise(
         RequestLatencyMetric, self.unit.time_unit
     )
     return request_latency_ms / num_videos
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
- src/aiperf/common/enums/metric_enums.py (6 hunks)
- src/aiperf/common/enums/plugin_enums.py (1 hunks)
- src/aiperf/common/models/__init__.py (2 hunks)
- src/aiperf/common/models/record_models.py (2 hunks)
- src/aiperf/dataset/__init__.py (2 hunks)
- src/aiperf/dataset/loader/mixins.py (4 hunks)
- src/aiperf/dataset/loader/models.py (9 hunks)
- src/aiperf/dataset/loader/multi_turn.py (1 hunks)
- src/aiperf/dataset/loader/random_pool.py (1 hunks)
- src/aiperf/dataset/loader/single_turn.py (1 hunks)
- src/aiperf/dataset/utils.py (2 hunks)
- src/aiperf/endpoints/__init__.py (2 hunks)
- src/aiperf/endpoints/nim_image_retrieval.py (1 hunks)
- src/aiperf/metrics/types/image_metrics.py (1 hunks)
- src/aiperf/metrics/types/video_metrics.py (1 hunks)
- tests/endpoints/test_nim_image_retrieval_endpoint.py (1 hunks)
- tests/endpoints/test_nim_image_retrieval_endpoint_parse_response.py (1 hunks)
- tests/loaders/conftest.py (2 hunks)
- tests/loaders/test_multi_turn.py (2 hunks)
- tests/loaders/test_random_pool.py (5 hunks)
- tests/loaders/test_single_turn.py (6 hunks)
🧰 Additional context used
🪛 Ruff (0.14.1)
src/aiperf/dataset/utils.py
150-153: Avoid specifying long messages outside the exception class
(TRY003)
197-200: Avoid specifying long messages outside the exception class
(TRY003)
src/aiperf/metrics/types/image_metrics.py
22-22: Unused method argument: record_metrics
(ARG002)
31-33: Avoid specifying long messages outside the exception class
(TRY003)
46-49: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
52-52: Unused method argument: record
(ARG002)
71-74: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
77-77: Unused method argument: record
(ARG002)
src/aiperf/dataset/loader/mixins.py
111-111: Avoid specifying long messages outside the exception class
(TRY003)
171-171: Avoid specifying long messages outside the exception class
(TRY003)
tests/loaders/conftest.py
153-153: Do not catch blind exception: Exception
(BLE001)
src/aiperf/dataset/loader/models.py
66-66: Avoid specifying long messages outside the exception class
(TRY003)
159-159: Avoid specifying long messages outside the exception class
(TRY003)
tests/loaders/test_single_turn.py
434-434: Do not catch blind exception: Exception
(BLE001)
599-599: Do not catch blind exception: Exception
(BLE001)
667-667: Do not catch blind exception: Exception
(BLE001)
src/aiperf/endpoints/nim_image_retrieval.py
35-35: Avoid specifying long messages outside the exception class
(TRY003)
46-46: Avoid specifying long messages outside the exception class
(TRY003)
49-49: Avoid specifying long messages outside the exception class
(TRY003)
src/aiperf/metrics/types/video_metrics.py
22-22: Unused method argument: record_metrics
(ARG002)
31-33: Avoid specifying long messages outside the exception class
(TRY003)
45-48: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
51-51: Unused method argument: record
(ARG002)
70-73: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
76-76: Unused method argument: record
(ARG002)
🔇 Additional comments (27)
src/aiperf/common/models/__init__.py (1)
72-72: LGTM! ImageRetrievalResponseData properly exported.
The new response data class is correctly imported and exported following the same pattern as other response data types.
Also applies to: 144-144
src/aiperf/common/models/record_models.py (2)
602-612: LGTM! ImageRetrievalResponseData follows established pattern.
The new response data class is well-structured and consistent with similar non-text response types (EmbeddingResponseData, RankingsResponseData).
623-623: LGTM! ParsedResponse union updated correctly.
ImageRetrievalResponseData properly added to the SerializeAsAny union type.
src/aiperf/dataset/loader/multi_turn.py (1)
142-142: LGTM! Video modality support added consistently.
The videos field is correctly passed to the Turn constructor, following the same pattern as texts, images, and audios.
src/aiperf/dataset/loader/random_pool.py (1)
167-167: LGTM! Video modality support added.
The videos field is correctly passed to the Turn constructor, consistent with the pattern for other modalities.
src/aiperf/common/enums/plugin_enums.py (1)
30-30: LGTM! IMAGE_RETRIEVAL endpoint type added.
The new endpoint type follows the established pattern and naming convention for other endpoint types.
src/aiperf/endpoints/__init__.py (1)
7-9: LGTM! ImageRetrievalEndpoint properly exported.
The new endpoint is correctly imported and exported, following the same pattern as other endpoint implementations.
Also applies to: 28-28
src/aiperf/dataset/loader/single_turn.py (1)
113-113: LGTM! Video modality support added consistently.
The videos field is correctly passed to the Turn constructor, following the same pattern as other modalities.
tests/endpoints/test_nim_image_retrieval_endpoint_parse_response.py (3)
19-36: LGTM! Endpoint fixture properly configured.
The fixture correctly sets up an ImageRetrievalEndpoint with appropriate mocking for the transport layer.
38-68: LGTM! Basic parse response test is comprehensive.
The test validates the complete parsing flow including timestamp preservation, response type verification, and data structure integrity.
70-77: LGTM! Invalid response handling tested.
The test properly verifies that None is returned for invalid/empty responses.
tests/endpoints/test_nim_image_retrieval_endpoint.py (1)
31-43: Happy path looks solid.
Asserting a single image_url item and echoing the data URL is correct for the NIM payload.
tests/loaders/test_random_pool.py (1)
223-250: LGTM for multimodal conversion assertions.
Data URL checks for image and passthrough for audio URL are appropriate.
tests/loaders/test_single_turn.py (1)
310-326: URL passthrough assertions look correct.
Good separation: local files are encoded elsewhere; remote URLs pass through as-is.
src/aiperf/dataset/loader/models.py (1)
42-46: Video modality support is correctly integrated.
Fields and validators mirror existing modalities; docstrings updated accordingly.
Also applies to: 65-67, 75-85, 143-147, 158-160, 166-176
src/aiperf/dataset/utils.py (2)
127-159: Verify type consistency between open_audio return value and encode_audio parameter.
The function returns audio_format.value (a string), but encode_audio expects format: AudioFormat (an enum). This type mismatch could cause confusion and may fail static type checking.
Consider either:
- Changing the return type to tuple[bytes, AudioFormat] and returning the enum, or
- Updating encode_audio to accept str instead of AudioFormat
Apply this diff to return the enum for consistency:
-    return audio_bytes, audio_format.value
+    return audio_bytes, audio_format
And update the docstring:
 Returns:
-    A tuple of (audio_bytes, format) where format is 'wav' or 'mp3'.
+    A tuple of (audio_bytes, format) where format is an AudioFormat enum.
176-206: Verify type consistency between open_video return value and encode_video parameter.
Similar to open_audio, this function returns video_format.value (a string), but encode_video expects format: VideoFormat (an enum). This creates a type mismatch.
Apply this diff to return the enum for consistency:
-    return video_bytes, video_format.value
+    return video_bytes, video_format
And update the docstring:
 Returns:
-    A tuple of (video_bytes, format) where format is VideoFormat.MP4.
+    A tuple of (video_bytes, format) where format is a VideoFormat enum.
src/aiperf/dataset/__init__.py (1)
40-51: LGTM!
The new audio and video utilities are correctly imported and exported. The public API surface expansion is clean and consistent with existing patterns.
Also applies to: 53-92
src/aiperf/endpoints/nim_image_retrieval.py (2)
23-30: LGTM!
The metadata configuration is appropriate for an image retrieval endpoint.
65-83: LGTM!
The response parsing handles missing JSON and missing data fields appropriately with debug logging. Returning None for unparseable responses appears to be the established pattern in this codebase.
src/aiperf/dataset/loader/mixins.py (4)
47-89: LGTM!
The extended media conversion logic correctly handles video alongside image and audio, with appropriate encoding for local files. The singular and plural field handling is consistent.
91-114: LGTM!
The URL validation logic is robust, correctly handling valid URLs, non-URLs, and raising errors for malformed URLs with only scheme or netloc. This prevents subtle bugs.
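For readers without the diff open, the three-way behavior described here can be sketched with the standard library. This is an illustrative reconstruction under the assumption that the check is built on `urllib.parse.urlparse`; the function name `is_url` is hypothetical and the actual mixin code may differ:

```python
from urllib.parse import urlparse


def is_url(content: str) -> bool:
    """Classify content as URL (True), non-URL (False), or malformed (raises)."""
    parsed = urlparse(content)
    if parsed.scheme and parsed.netloc:
        return True  # e.g. https://example.com/a.png
    if parsed.scheme or parsed.netloc:
        # Only one of the two parts is present, e.g. "https:" or "//host/a.png".
        raise ValueError(f"Malformed URL: {content!r}")
    return False  # plain relative or absolute file path
```

One consequence of this shape: a `data:` URL has a scheme but no netloc, so it would be rejected here, which is presumably why already-encoded content has to be recognized before this check runs.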
144-171: Verify compatibility with utils.py type signatures.
This method calls utils.open_audio and utils.open_video, which currently return string format values, but then passes those to utils.encode_audio and utils.encode_video, which expect enum types. This works at runtime because the encode functions incorrectly call .lower() on the parameter without .value, but the type signatures are inconsistent.
Ensure the type signature fixes suggested for utils.py are applied consistently, so that:
- open_audio and open_video return enums
- encode_audio and encode_video accept enums and call .value.lower()
Or alternatively:
- All functions use strings consistently
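A sketch of the first option (enums end to end). The `AudioFormat` class here is a hypothetical stand-in that assumes a str-mixin enum, and the `"wav,<base64>"` output shape follows the audio convention noted in the test review above; neither is confirmed against the aiperf source:

```python
import base64
from enum import Enum


class AudioFormat(str, Enum):
    """Hypothetical stand-in; assumes the real enum mixes in str."""

    WAV = "wav"
    MP3 = "mp3"


def detect_format(file_name: str) -> AudioFormat:
    """Return the enum member (not its .value) for a file name's suffix."""
    suffix = file_name.rsplit(".", 1)[-1].lower()
    return AudioFormat(suffix)  # raises ValueError for unsupported suffixes


def encode_audio(audio_bytes: bytes, format: AudioFormat) -> str:
    """Encode as '<format>,<base64>' using the enum's value explicitly."""
    payload = base64.b64encode(audio_bytes).decode("ascii")
    return f"{format.value.lower()},{payload}"


encoded = encode_audio(b"\x00", detect_format("clip.WAV"))
```

With a str-mixin enum, calling `.lower()` directly on a member also happens to work because string methods operate on the member's value, which may be part of why the mismatch goes unnoticed at runtime.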
173-202: LGTM!
The media content handling logic is well-structured, checking for already-encoded content (including data URLs) before checking for remote URLs, then treating remaining content as local files. The ordering is correct and prevents data URLs from being misidentified as remote URLs.
src/aiperf/common/enums/metric_enums.py (3)
191-195: Units added are consistent and clear.
IMAGE/IMAGES/VIDEO/VIDEOS naming aligns with existing pattern and tag casing. No issues.
393-397: Good API: expose inverted on the enum.
Surface mirrors info cleanly; helps callers avoid peeking into info.
678-680: Video-only flag addition is sensible and non-breaking.
Bit position continues sequence; no overlap.
if not turn.images:
    raise ValueError("Image Retrieval request requires at least one image.")

if not turn.images[0].contents:
    raise ValueError("Image content is required for Image Retrieval.")

payload = {
    "input": [
        {"type": "image_url", "url": content}
        for img in turn.images
        for content in img.contents
    ],
Incomplete validation: only first image is checked.
Lines 48-49 only validate that turn.images[0].contents is not empty, but lines 53-56 iterate over all images and their contents. If subsequent images have empty contents lists, they won't contribute to the payload but won't raise an error either.
Consider validating all images:
if not turn.images:
raise ValueError("Image Retrieval request requires at least one image.")
- if not turn.images[0].contents:
- raise ValueError("Image content is required for Image Retrieval.")
+ if not any(img.contents for img in turn.images):
+ raise ValueError("At least one image must have content for Image Retrieval.")
Or validate that each image has content:
if not turn.images:
raise ValueError("Image Retrieval request requires at least one image.")
- if not turn.images[0].contents:
- raise ValueError("Image content is required for Image Retrieval.")
+ for img in turn.images:
+ if not img.contents:
+ raise ValueError("All images must have content for Image Retrieval.")
🧰 Tools
🪛 Ruff (0.14.1)
46-46: Avoid specifying long messages outside the exception class
(TRY003)
49-49: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In src/aiperf/endpoints/nim_image_retrieval.py around lines 45 to 56, the code
only checks turn.images[0].contents but builds a payload from every image;
update validation to ensure every image in turn.images has a non-empty contents
list (and optionally non-empty content values) before constructing the payload,
raising a ValueError that includes the offending image index or a clear message
if any image.contents is empty, so the payload only proceeds when all images
have content.
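A minimal sketch of the per-image validation described in the prompt above, reporting the offending index. `Image` and `build_input` are hypothetical stand-ins for illustration; the real endpoint works on aiperf's Turn and Image models:

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Hypothetical stand-in for aiperf's Image model."""

    contents: list[str] = field(default_factory=list)


def build_input(images: list[Image]) -> list[dict[str, str]]:
    """Validate every image, then flatten all contents into payload items."""
    if not images:
        raise ValueError("Image Retrieval request requires at least one image.")
    for index, img in enumerate(images):
        if not img.contents:
            # Name the offending position so bad dataset rows are easy to find.
            raise ValueError(f"Image at index {index} has no content.")
    return [
        {"type": "image_url", "url": content}
        for img in images
        for content in img.contents
    ]


items = build_input([Image(["a"]), Image(["b", "c"])])
```

Validating before building keeps the payload construction itself free of conditionals, at the cost of iterating the images twice.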
@pytest.fixture
def test_images(tmp_path):
    """Create temporary test images copied from source assets.

    Returns:
        A dictionary mapping image names to their temporary file paths.
    """
    # Get the source images directory
    source_images_dir = Path("src/aiperf/dataset/generator/assets/source_images")

    # Get some actual image files
    source_images = list(source_images_dir.glob("*.jpg"))[:4]

    if not source_images:
        pytest.skip("No source images found for testing")

    # Create temporary copies with simple names
    image_map = {}
    for i, source_img in enumerate(source_images, 1):
        dest_path = tmp_path / f"image{i}.png"
        shutil.copy(source_img, dest_path)
        image_map[f"image{i}.png"] = str(dest_path)

    return image_map
Fix image extension/content mismatch to avoid incorrect MIME.
Source files are “.jpg” but are copied to “.png” names. If encoders infer MIME from suffix, you’ll produce data:image/png with JPEG bytes. Preserve the original suffix for both path and key.
-    for i, source_img in enumerate(source_images, 1):
-        dest_path = tmp_path / f"image{i}.png"
-        shutil.copy(source_img, dest_path)
-        image_map[f"image{i}.png"] = str(dest_path)
+    for i, source_img in enumerate(source_images, 1):
+        dest_path = tmp_path / f"image{i}{source_img.suffix}"
+        shutil.copy(source_img, dest_path)
+        image_map[dest_path.name] = str(dest_path)
Optionally, if no source images are found, generate a tiny synthetic image instead of skipping, so the tests keep running.
🤖 Prompt for AI Agents
In tests/loaders/conftest.py around lines 31 to 55, the fixture copies JPEG
source files but renames them to .png which causes MIME/encoder mismatches; fix
by preserving the original file extension when constructing dest_path and the
dict key (use source_img.suffix and source_img.stem), i.e., copy to tmp_path /
f"{source_img.stem}{source_img.suffix}" and map that filename to the dest path
string; optionally, if no source_images are found, create a tiny valid synthetic
image (e.g., write a minimal 1x1 PNG or JPEG byte payload) into tmp_path and
include it in image_map instead of skipping so tests still run.
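To illustrate the mismatch this comment describes, here is a small sketch comparing a suffix-based MIME guess with a check of the file's leading bytes. It assumes the encoder guesses from the suffix in a way comparable to `mimetypes.guess_type`; the actual encoder may behave differently. `mime_from_suffix` and `mime_from_bytes` are hypothetical helper names.

```python
import mimetypes
from typing import Optional

JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def mime_from_suffix(filename: str) -> Optional[str]:
    """Guess the MIME type from the file name alone."""
    return mimetypes.guess_type(filename)[0]


def mime_from_bytes(data: bytes) -> Optional[str]:
    """Identify JPEG or PNG content from its leading magic bytes."""
    if data.startswith(JPEG_MAGIC):
        return "image/jpeg"
    if data.startswith(PNG_MAGIC):
        return "image/png"
    return None


# JPEG bytes stored under a .png name: the two answers disagree
jpeg_header = b"\xff\xd8\xff\xe0" + b"\x00" * 16
mismatch = mime_from_suffix("image1.png") != mime_from_bytes(jpeg_header)
```

With the suffix preserved (`image1.jpg`), both helpers agree on `image/jpeg`, which is the state the suggested fix restores.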
| @pytest.fixture | ||
| def create_test_image(tmp_path): | ||
|     """Create a single test image copied from source assets. | ||
|     Returns: | ||
|         A function that creates a test image with the given name. | ||
|     """ | ||
|     source_images_dir = Path("src/aiperf/dataset/generator/assets/source_images") | ||
|     source_images = list(source_images_dir.glob("*.jpg")) | ||
| | ||
|     if not source_images: | ||
|         pytest.skip("No source images found for testing") | ||
| | ||
|     def _create_image(name: str = "test_image.png"): | ||
|         dest_path = tmp_path / name | ||
|         shutil.copy(source_images[0], dest_path) | ||
|         return str(dest_path) | ||
| | ||
|     return _create_image | ||
🛠️ Refactor suggestion | 🟠 Major
Align create_test_image naming with source suffix.
Same suffix issue here. Respect caller-provided suffix if present; otherwise inherit from source.
-    def _create_image(name: str = "test_image.png"):
-        dest_path = tmp_path / name
+    def _create_image(name: str = "test_image"):
+        # If caller passed a suffix, use it; else inherit from source
+        suffix = Path(name).suffix or source_images[0].suffix
+        stem = Path(name).stem
+        dest_path = tmp_path / f"{stem}{suffix}"
         shutil.copy(source_images[0], dest_path)
         return str(dest_path)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @pytest.fixture | |
| def create_test_image(tmp_path): | |
|     """Create a single test image copied from source assets. | |
|     Returns: | |
|         A function that creates a test image with the given name. | |
|     """ | |
|     source_images_dir = Path("src/aiperf/dataset/generator/assets/source_images") | |
|     source_images = list(source_images_dir.glob("*.jpg")) | |
|     if not source_images: | |
|         pytest.skip("No source images found for testing") | |
|     def _create_image(name: str = "test_image"): | |
|         # If caller passed a suffix, use it; else inherit from source | |
|         suffix = Path(name).suffix or source_images[0].suffix | |
|         stem = Path(name).stem | |
|         dest_path = tmp_path / f"{stem}{suffix}" | |
|         shutil.copy(source_images[0], dest_path) | |
|         return str(dest_path) | |
|     return _create_image |
🤖 Prompt for AI Agents
In tests/loaders/conftest.py around lines 57 to 76, the fixture defaults to
"test_image.png" regardless of the source image extension; change the factory so it
respects a caller-provided suffix if present, otherwise inherit the suffix from
source_images[0]. Specifically: when building dest_path, parse the provided name
with Path(name) and if it has no suffix, append source_images[0].suffix; if it
already has a suffix, use it as-is; then copy the source image to that resolved
destination and return its string path.
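The suffix rule from this suggestion can be exercised in isolation. The sketch below pulls it into a standalone helper; `resolve_dest_name` is a hypothetical name used only for illustration, and the logic mirrors the `Path(name).suffix or source_images[0].suffix` expression in the diff.

```python
from pathlib import Path


def resolve_dest_name(name: str, source: Path) -> str:
    """Keep the caller's suffix if present; otherwise inherit the source's."""
    requested = Path(name)
    suffix = requested.suffix or source.suffix
    return f"{requested.stem}{suffix}"


source = Path("assets/source_images/cat.jpg")  # hypothetical source file
inherited = resolve_dest_name("test_image", source)  # no suffix given
explicit = resolve_dest_name("custom.png", source)   # caller's suffix wins
```

One caveat worth checking before adopting the suggestion: `Path(name).stem` drops any directory component, so a name such as `sub/img` resolves to `img.jpg` directly under `tmp_path`, whereas the original `tmp_path / name` would have targeted the subdirectory.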
Summary by CodeRabbit
New Features
Metrics