
BUG: Unnecessary duplicate scans with "Scan All Packages" due to inclusion of user UUID in ScanCode.io project name #387

@rogu-beta


Describe the bug
DejaCode includes the UUID of the user who triggers a package scan in the name of the project it creates in ScanCode.io. This causes duplicate scans when one tries to fix failed package scans. To do so, you can delete the failed projects in ScanCode.io and then go back to DejaCode and run "Scan All Packages" again. This works fine when it is done by the same user as before. However, if a different user than the one who originally triggered the scans does this, "Scan All Packages" rescans all packages, even those that were already scanned successfully. This is time-consuming for projects with hundreds or thousands of packages.

It is unclear to me why the user UUID is included in the project name in the first place, given that the scan result does not depend on who triggers the scan.

```python
def get_project_name(uri, user_uuid, dataspace_uuid):
    """
    Return a project name based on a hash of the provided `uri` combined with a hash
    of the `user_uuid` and `dataspace_uuid`.
    project_name = "uri_hash.dataspace_uuid_hash.user_uuid_hash"
    """
    # get_hash_uid() is a DejaCode helper defined elsewhere (not shown here).
    uri_hash = get_hash_uid(uri)
    dataspace_hash = get_hash_uid(dataspace_uuid)
    user_hash = get_hash_uid(user_uuid)  # user-specific part of the name
    return f"{uri_hash}.{dataspace_hash}.{user_hash}"
```
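To illustrate the effect, here is a minimal, self-contained sketch. Since `get_hash_uid` is not shown above, it is replaced by a hypothetical stand-in (a truncated SHA-256 hex digest), and the download URL is a made-up example. For the same package and the same dataspace, two users get names that share the first two segments but differ in the last one.

```python
import hashlib
import uuid


def get_hash_uid(value):
    # Hypothetical stand-in for DejaCode's helper: a short, stable hex digest.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:10]


def get_project_name(uri, user_uuid, dataspace_uuid):
    # Same composition as the function quoted above.
    uri_hash = get_hash_uid(uri)
    dataspace_hash = get_hash_uid(dataspace_uuid)
    user_hash = get_hash_uid(user_uuid)
    return f"{uri_hash}.{dataspace_hash}.{user_hash}"


dataspace = uuid.uuid4()
user_a = uuid.uuid4()
user_b = uuid.uuid4()
uri = "https://example.com/pkg-1.0.tar.gz"  # hypothetical download URL

name_a = get_project_name(uri, user_a, dataspace)
name_b = get_project_name(uri, user_b, dataspace)

# Same package, same dataspace: only the user segment differs.
print(name_a.rsplit(".", 1)[0] == name_b.rsplit(".", 1)[0])  # True
print(name_a == name_b)  # False
```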

To Reproduce
Steps to reproduce the behavior:

  1. Log in as user A in DejaCode and create a product
  2. Import an SBOM to the product to populate the inventory with packages
  3. Make sure that packages have download URLs (this may require running "populate_purldb" in ScanCode.io, followed by "Improve Package from PurlDB" in DejaCode)
  4. Trigger "Scan All Packages" in DejaCode
  5. Wait for the scans to complete. Some will most likely fail due to timeouts.
  6. Delete failed projects in ScanCode.io (either directly in ScanCode.io or from DejaCode)
  7. Log in as user B in DejaCode
  8. Trigger "Scan All Packages"
  9. Notice that all packages are rescanned instead of only those without a scan result
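The steps above can be simulated in a few lines. This sketch assumes that "Scan All Packages" skips a package only when a ScanCode.io project with the computed name already exists; `packages_to_scan` and the URLs are hypothetical, and `get_hash_uid` is again a stand-in for DejaCode's helper. After two failed projects are deleted, user A would resubmit only those two, while user B resubmits all five.

```python
import hashlib
import uuid


def get_hash_uid(value):
    # Hypothetical stand-in for DejaCode's helper.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:10]


def get_project_name(uri, user_uuid, dataspace_uuid):
    # Mirrors the current naming: uri_hash.dataspace_hash.user_hash
    return ".".join(get_hash_uid(v) for v in (uri, dataspace_uuid, user_uuid))


def packages_to_scan(uris, user_uuid, dataspace_uuid, existing_projects):
    # Assumption: a package is skipped only if a project with the computed
    # name already exists in ScanCode.io.
    return [
        uri
        for uri in uris
        if get_project_name(uri, user_uuid, dataspace_uuid) not in existing_projects
    ]


dataspace = uuid.uuid4()
user_a, user_b = uuid.uuid4(), uuid.uuid4()
uris = [f"https://example.com/pkg-{i}.tar.gz" for i in range(5)]

# Steps 4-6: user A scans everything; two scans fail and are deleted.
existing = {get_project_name(u, user_a, dataspace) for u in uris}
for failed in uris[:2]:
    existing.discard(get_project_name(failed, user_a, dataspace))

print(len(packages_to_scan(uris, user_a, dataspace, existing)))  # 2
print(len(packages_to_scan(uris, user_b, dataspace, existing)))  # 5
```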

Expected behavior
ScanCode.io scans should not be tied to the user who triggers them, but rather to the dataspace and the package. When "Scan All Packages" is run again, it should recognize that scan results are already present. This could be fixed by changing the way project names are computed, for example by leaving the user UUID out of the name.
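A minimal sketch of such a user-independent name, assuming the existing `uri_hash.dataspace_hash` prefix is kept and only the user segment is dropped. This is not an actual DejaCode patch; `get_hash_uid` is again a hypothetical stand-in.

```python
import hashlib
import uuid


def get_hash_uid(value):
    # Hypothetical stand-in for DejaCode's helper.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:10]


def get_project_name(uri, dataspace_uuid):
    """
    Return a project name based on a hash of `uri` and a hash of
    `dataspace_uuid` only, so it does not depend on who triggered the scan.
    project_name = "uri_hash.dataspace_uuid_hash"
    """
    return f"{get_hash_uid(uri)}.{get_hash_uid(dataspace_uuid)}"


dataspace = uuid.uuid4()
uri = "https://example.com/pkg-1.0.tar.gz"  # hypothetical download URL
print(get_project_name(uri, dataspace))  # identical for every user
```

One consequence worth considering: projects created under the old three-part naming would no longer match the new names, so existing scans might need to be looked up or migrated separately.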

Screenshots
n.a.

Context (OS, Browser, Device, etc.):
n.a.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), design needed (Design details needed to complete the issue), enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
