Task creation with cloud storage, frame filter (optionally honeypots) and static cache fails #9021

Open
2 tasks done
zhiltsov-max opened this issue Jan 30, 2025 · 0 comments
Labels
bug Something isn't working

Actions before raising this issue

  • I searched the existing issues and did not find anything similar.
  • I read/searched the docs

Steps to Reproduce

Related #9010

from cvat_sdk import make_client, models
from cvat_sdk.core.proxies.tasks import ResourceType

with make_client("http://localhost", port=8080, credentials=("user", "password")) as client:
    task = client.tasks.create_from_data(
        spec=models.TaskWriteRequest(
            name="task with cs images frame filter and honeypots",
            labels=[{"name": "cat"}],
            segment_size=3,
        ),
        resources=[
            "test/frame_1.jpg",
            "test/frame_2.jpg",
            "test/frame_3.jpg",
            "test/frame_4.jpg",
            "test/frame_5.jpg",
            "test/frame_6.jpg",
            "test/frame_7.jpg",
            "test/frame_8.jpg",
            "test/frame_9.jpg",
            "test/frame_10.jpg",
            "test/frame_11.jpg",
            "test/frame_12.jpg",
            "test/frame_13.jpg",
            "test/frame_14.jpg",
        ],
        resource_type=ResourceType.SHARE,
        data_params=dict(
            cloud_storage_id=157,
            image_quality=70,
            sorting_method="random",
            start_frame=2,
            stop_frame=14,
            frame_step=2,
            validation_params={
                "mode": "gt_pool",
                "frame_selection_method": "random_uniform",
                "frame_count": 3,
                "frames_per_job_count": 2,
            },
            use_cache=False, # ensure static cache
        ),
    )
$ CVAT_ALLOW_STATIC_CACHE=yes SMOKESCREEN_OPTS="--allow-address=172.22.0.1" docker compose -f docker-compose.yml up -d

$ python samples/create_task.py
...
`FileNotFoundError: [Errno 2] No such file or directory: '/home/django/data/data/1901/raw/test/frame_xxx.jpg'`

Expected Behavior

No response

Possible Solution

No response

Context

  1. Here
    filtered_data = []
    for files in (i for i in media.values() if i):
        filtered_data.extend(files)
    media_to_download = filtered_data

    if media['image']:
        start_frame = db_data.start_frame
        stop_frame = len(filtered_data) - 1
        if data['stop_frame'] is not None:
            stop_frame = min(stop_frame, data['stop_frame'])

        step = db_data.get_frame_step()
        if start_frame or step != 1 or stop_frame != len(filtered_data) - 1:
            media_to_download = filtered_data[start_frame : stop_frame + 1: step]
    _download_data_from_cloud_storage(db_data.cloud_storage, media_to_download, upload_dir)

    Frame sorting is not applied yet at this point, so the frame filter selects files by their position in the input list.
  2. Next, a media extractor is created
    details = {
        'source_path': source_paths,
        'step': db_data.get_frame_step(),
        'start': db_data.start_frame,
        'stop': data['stop_frame'],
    }
    if media_type in {'archive', 'zip', 'pdf'} and db_data.storage == models.StorageChoice.SHARE:
        details['extract_dir'] = db_data.get_upload_dirname()
        upload_dir = db_data.get_upload_dirname()
        db_data.storage = models.StorageChoice.LOCAL
    if media_type != 'video':
        details['sorting_method'] = data['sorting_method'] if not is_media_sorted else models.SortingMethod.PREDEFINED

    extractor = MEDIA_TYPES[media_type]['extractor'](**details)

    The extractor receives the same filtering and sorting params.
  3. Next, manifest creation starts here

    cvat/cvat/apps/engine/task.py

    Lines 1105 to 1114 in 3b5202e

    manifest.link(
        sources=extractor.absolute_source_paths,
        meta={
            k: {'related_images': related_images[k] }
            for k in related_images
        },
        data_dir=upload_dir,
        DIM_3D=(db_task.dimension == models.DimensionType.DIM_3D),
    )
    manifest.create()

    It gets items from the extractor, but the extractor returns the frames sorted. If the sorting is random, the order can differ from the input order used in (1). The frame filter is then applied to the sorted list, so it can select frames that were not downloaded in (1).
  4. If honeypots are requested as well, the code fails in
    manifest.reorder([images[frame_idx_map[image.frame]].path for image in new_db_images])
    because .reorder() doesn't seem to expect frames without meta in the manifest.
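
The mismatch between (1) and (3) can be reproduced without CVAT. The following is a minimal sketch, not CVAT code: `apply_frame_filter` is a hypothetical helper that mimics the slice from (1), the file names and filter values are taken from the reproduction script, and a seeded shuffle stands in for the "random" sorting method.

```python
import random

# Same file list and frame filter values as in the reproduction script.
files = [f"test/frame_{i}.jpg" for i in range(1, 15)]
start_frame, stop_frame, step = 2, 14, 2


def apply_frame_filter(paths, start, stop, step):
    """Hypothetical helper: clamp stop to the last index, then slice as in (1)."""
    stop = min(len(paths) - 1, stop)
    return paths[start : stop + 1 : step]


# (1): the filter is applied to the input order; only these files get downloaded.
downloaded = set(apply_frame_filter(files, start_frame, stop_frame, step))

# (2)-(3): the extractor sorts first (a seeded shuffle stands in for "random"),
# then the same filter is applied to the sorted list.
shuffled = files[:]
random.Random(0).shuffle(shuffled)  # fixed seed only to make the sketch repeatable
expected_by_manifest = apply_frame_filter(shuffled, start_frame, stop_frame, step)

# Files the manifest step would try to read but that were never downloaded.
missing = [p for p in expected_by_manifest if p not in downloaded]

print("downloaded:          ", sorted(downloaded))
print("expected by manifest:", expected_by_manifest)
print("missing on disk:     ", missing)
```

Whenever the shuffled order differs from the input order at the selected positions, `missing` is non-empty, which corresponds to the `FileNotFoundError` above.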

Environment

zhiltsov-max added the bug label on Jan 30, 2025