[ViTDet] Remove hardcoded image preprocessing and add ViTDet ImageConverter #2452
base: master
New file (`@@ -0,0 +1,40 @@`):

````python
from keras_hub.src.api_export import keras_hub_export
from keras_hub.src.layers.preprocessing.image_converter import ImageConverter
from keras_hub.src.models.vit_det.vit_det_backbone import ViTDetBackbone


@keras_hub_export("keras_hub.layers.ViTDetImageConverter")
class ViTDetImageConverter(ImageConverter):
    """Image converter for ViTDet models.

    This layer applies ImageNet normalization (mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]) to input images for ViTDet models.

    Args:
        image_size: int or tuple of (height, width). The output size of the
            image. Defaults to `(1024, 1024)`.

    Example:
    ```python
    converter = keras_hub.layers.ViTDetImageConverter(image_size=(1024, 1024))
    converter(np.random.rand(1, 512, 512, 3))  # Resizes and normalizes
    ```
    """

    backbone_cls = ViTDetBackbone

    def __init__(
        self,
        image_size=(1024, 1024),
        **kwargs,
    ):
        mean = [0.485, 0.456, 0.406]
        std = [0.229, 0.224, 0.225]
        variance = [x**2 for x in std]
        super().__init__(
            image_size=image_size,
            scale=1.0 / 255.0,  # Scale to [0, 1]
            mean=mean,
            variance=variance,
            **kwargs,
        )
````
Collaborator (on lines +8 to +40):

None of these are required, except `backbone_cls = ViTDetBackbone`. If the model is used anywhere else, callers need to construct the `ViTDetImageConverter` by passing `scale` and `offset`, which are calculated from the `mean` and `variance`.
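The reviewer's suggestion can be sketched as follows: since `(x / 255 - mean) / std` is an affine transform, the same ImageNet normalization folds into per-channel `scale` and `offset` values that a caller could pass to the base `ImageConverter` directly. The arithmetic below is a minimal illustration of that folding; whether the constructor accepts per-channel lists in exactly this form is an assumption about the keras-hub API.

```python
# Sketch (assumption: base ImageConverter accepts per-channel scale/offset).
# (x / 255 - mean) / std  ==  x * (1 / (255 * std)) + (-mean / std)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# Per-channel scale and offset equivalent to the mean/variance form:
scale = [1.0 / (255.0 * s) for s in std]
offset = [-m / s for m, s in zip(mean, std)]
```

With these values, `ImageConverter(scale=scale, offset=offset, ...)` would apply the same normalization as the subclass in the diff, without hardcoding it.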
||
Reviewer comment:

The example code is a bit misleading and incomplete:

- It references `np` without showing the `import numpy as np` statement.
- `np.random.rand()` generates float values in `[0, 1)`. The layer then scales these by `1/255`, which is likely not the intended demonstration. Using `np.random.randint(0, 256, ...)` would better simulate a typical `uint8` image, for which `scale=1.0 / 255.0` is appropriate.

Please add the import inside the fenced code block and use `randint` for clarity.
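A corrected docstring example along the lines the reviewer asks for might look like this (the `keras_hub` call is left commented since it requires the library; the point is the import and the `uint8`-like input):

```python
# Revised example per review: show the numpy import and simulate a
# uint8 image with randint, so scale=1.0 / 255.0 is meaningful.
import numpy as np

images = np.random.randint(0, 256, size=(1, 512, 512, 3), dtype="uint8")
# converter = keras_hub.layers.ViTDetImageConverter(image_size=(1024, 1024))
# converter(images)  # resizes to 1024x1024 and normalizes
```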