
yyq19990828 commented Aug 21, 2025

Summary

This PR improves dataset handling in rfdetr/datasets/coco.py by detecting the validation folder name automatically and by falling back to the validation split when a dataset has no test split.

Key improvements:

  • Adaptive val/valid folder detection: Automatically detects whether the dataset uses a val folder (YOLO convention) or a valid folder (COCO convention)
  • Smart test dataset fallback: When the test split is missing, which is common, the validation split is used in its place
  • Reduced manual dataset modification: Users no longer need to rename folders to match the expected naming convention
  • Better error handling: Raises a FileNotFoundError when neither a val nor a valid folder exists
  • Code internationalization: Converted Chinese comments to English
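A minimal sketch of the val/valid detection and the FileNotFoundError behavior described above, assuming the split folders sit directly under the dataset root. The helper name `resolve_val_dir` and the lookup order are hypothetical and are not taken from rfdetr/datasets/coco.py; the sketch uses only the standard library.

```python
import tempfile
from pathlib import Path

# Accepted names for the validation split. Checking "valid" before "val" is an
# assumption made for this sketch; the PR does not say which wins if both exist.
VAL_DIR_CANDIDATES = ("valid", "val")


def resolve_val_dir(dataset_dir):
    """Return the validation folder inside dataset_dir, accepting "valid" or "val"."""
    root = Path(dataset_dir)
    for name in VAL_DIR_CANDIDATES:
        candidate = root / name
        if candidate.is_dir():
            return candidate
    # Neither naming convention is present: fail loudly instead of guessing.
    raise FileNotFoundError(
        f"No validation folder in {root}: expected 'valid' or 'val'"
    )


# Usage: a YOLO-style layout with a "val" folder is picked up without renaming.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "val").mkdir()
    print(resolve_val_dir(tmp).name)  # val
```

Raising instead of silently creating or guessing a path keeps a misnamed folder from turning into a confusing downstream error about a missing annotation file.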

Why this change?

  • Most datasets don't include a separate test split, making the fallback mechanism necessary
  • YOLO datasets typically use a val folder, while COCO datasets use a valid folder
  • This reduces friction when working with different dataset formats, since users no longer have to restructure folders by hand

Test plan

  • Verify the code handles both val and valid folder structures
  • Confirm test dataset fallback works when test folder is missing
  • Ensure proper error handling when validation folders are missing

🤖 Generated with Claude Code

CLAassistant commented Aug 21, 2025

CLA assistant check
All committers have signed the CLA.

…st dataset fallback

- Add adaptive val/valid folder detection to support both YOLO (val) and COCO (valid) dataset structures
- Implement a fallback mechanism for the test dataset, since most datasets don't include a test split
- Reduce the need for manual dataset modification by automatically handling different folder naming conventions
- Add proper error handling with FileNotFoundError for missing validation folders
- Convert Chinese comments to English for better internationalization

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
yyq19990828 force-pushed the feature/adaptive-dataset-folder-detection branch from fcd7819 to 2c9e689 on August 21, 2025 09:37
isaacrob-roboflow (Collaborator)

I think this is interesting, but I will say that datasets SHOULD include a test set that's not the val set ;) Nowadays people often benchmark on COCO val, but that's not because that setup is fine; it's because the test set is hidden away on a private server. Ultralytics notably does NOT report test-set numbers on the datasets they train on, but imo that gives a misleading measure of final accuracy: they also pick the best checkpoint based on val score, so the result is biased.
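The selection-bias argument above can be checked with a small simulation (illustrative only, not from the PR): every checkpoint has the same true accuracy, validation and test scores are independent noisy estimates of it, and the checkpoint with the best validation score is kept. The function name and parameter values are arbitrary assumptions.

```python
import random
import statistics


def simulate_selection_bias(num_checkpoints=20, true_acc=0.50, noise=0.02,
                            trials=2000, seed=0):
    """Average val and test scores of the checkpoint chosen by best val score."""
    rng = random.Random(seed)
    picked_val, picked_test = [], []
    for _ in range(trials):
        # All checkpoints are equally good; scores differ only by sampling noise.
        val = [true_acc + rng.gauss(0, noise) for _ in range(num_checkpoints)]
        test = [true_acc + rng.gauss(0, noise) for _ in range(num_checkpoints)]
        # Model selection: keep the checkpoint with the highest val score.
        best = max(range(num_checkpoints), key=val.__getitem__)
        picked_val.append(val[best])
        picked_test.append(test[best])
    return statistics.mean(picked_val), statistics.mean(picked_test)


val_score, test_score = simulate_selection_bias()
print(f"val score of selected checkpoint:  {val_score:.3f}")
print(f"test score of selected checkpoint: {test_score:.3f}")
```

With these settings the selected checkpoint's validation score lands a few points above the true 0.50, while its test score stays close to 0.50. That gap is exactly what reporting on a separate test split exposes.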

So I would still like it to be clear, when folks train a model, that they SHOULD have a test set. But handling val vs. valid seems logical to me.
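One way to keep the test fallback while making it obvious, as the reviewer asks, is to emit a warning whenever the validation split stands in for a missing test split. This is a hypothetical sketch: `resolve_test_dir` and the warning text are assumptions, and the PR description does not say whether its fallback warns.

```python
import tempfile
import warnings
from pathlib import Path


def resolve_test_dir(dataset_dir, val_dir):
    """Return dataset_dir/test, or fall back to val_dir with a visible warning."""
    test_dir = Path(dataset_dir) / "test"
    if test_dir.is_dir():
        return test_dir
    # Fallback path: keep training usable, but tell the user that the
    # reported "test" numbers actually come from the validation split.
    warnings.warn(
        f"No 'test' folder in {dataset_dir}; using the validation split "
        f"({val_dir}) for test evaluation. These numbers are optimistic if the "
        "best checkpoint is also chosen on validation; add a held-out test split.",
        UserWarning,
        stacklevel=2,
    )
    return Path(val_dir)


# Usage: a dataset with only a valid/ folder falls back and warns.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "valid").mkdir()
    print(resolve_test_dir(tmp, Path(tmp) / "valid").name)  # valid
```

A warning rather than an error keeps datasets without a test split trainable while still surfacing the concern raised above.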
