BUG: Unnecessary duplicate scans with "Scan All Packages" due to inclusion of user UUID in ScanCode.io project name

**Describe the bug**
DejaCode includes the UUID of the user who triggers a package scan in the name of the project it creates in ScanCode.io. This causes duplicate scans when someone tries to fix failed package scans. To do that, you can delete the failed projects in ScanCode.io and then run "Scan All Packages" again in DejaCode. This works fine when done as the same user as before. However, if it is done as a different user than the one who originally triggered the scans, "Scan All Packages" rescans all packages, including those that already have successful scans. This is time-consuming on projects with hundreds or thousands of packages.

It is unclear to me why the user UUID is included in the project name in the first place, given that scan results do not depend on who triggers the scan.

https://github.com/aboutcode-org/dejacode/blob/e80db0eaeffdab150a834413b1f14360a46d3c0c/dejacode_toolkit/scancodeio.py#L447-L458

**To Reproduce**
Steps to reproduce the behavior:
1. Log in as user A in DejaCode and create a product
2. Import an SBOM to the product to populate the inventory with packages
3. Make sure that packages have download URLs (this may require running "populate_purldb" in ScanCode.io, followed by "Improve Package from PurlDB" in DejaCode)
4. Trigger "Scan All Packages" in DejaCode
5. Wait for scans to complete. Some will most likely fail due to timeouts
6. Delete failed projects in ScanCode.io (either directly in ScanCode.io or from DejaCode)
7. Log in as user B in DejaCode
8. Trigger "Scan All Packages"
9. Notice that all packages will be rescanned instead of just the ones without a scan result
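
The outcome of step 9 can be illustrated with a minimal sketch. It assumes the naming scheme of the linked function (`uri_hash.dataspace_uuid_hash.user_uuid_hash`); `hash_uid` is a hypothetical stand-in for DejaCode's `get_hash_uid`, and the URL and UUIDs are made up for illustration.

```python
import hashlib
import uuid


def hash_uid(value):
    # Hypothetical stand-in for DejaCode's `get_hash_uid`; the real helper may differ.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:10]


def get_project_name(uri, user_uuid, dataspace_uuid):
    # Same structure as the linked function: uri hash, dataspace hash, user hash.
    return f"{hash_uid(uri)}.{hash_uid(dataspace_uuid)}.{hash_uid(user_uuid)}"


uri = "https://example.com/pkg-1.0.tar.gz"  # made-up download URL
dataspace = uuid.UUID(int=1)
user_a = uuid.UUID(int=2)
user_b = uuid.UUID(int=3)

name_a = get_project_name(uri, user_a, dataspace)
name_b = get_project_name(uri, user_b, dataspace)

# The package and dataspace segments are identical for both users...
same_prefix = name_a.rsplit(".", 1)[0] == name_b.rsplit(".", 1)[0]
# ...but the full names differ, so a lookup by user B's project name
# cannot find the project that user A created.
same_name = name_a == name_b
print(same_prefix, same_name)
```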

**Expected behavior**
ScanCode.io scans should not be tied to the user triggering them but rather to the namespace and package. When another "Scan All Packages" is run, it should be able to recognize that scan results are already present. This could be fixed by adapting the way the project names are calculated.
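
One possible shape of such a change, as a sketch only and not the project's actual implementation: drop the user segment so the name depends on the package URI and the dataspace alone. `hash_uid` is a hypothetical stand-in for `get_hash_uid`, and `uris_needing_scan` is an illustrative helper that does not exist in DejaCode.

```python
import hashlib
import uuid


def hash_uid(value):
    # Hypothetical stand-in for DejaCode's `get_hash_uid`; the real helper may differ.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:10]


def get_project_name(uri, dataspace_uuid):
    # Proposed scheme (assumption): "uri_hash.dataspace_uuid_hash", no user segment.
    return f"{hash_uid(uri)}.{hash_uid(dataspace_uuid)}"


def uris_needing_scan(uris, dataspace_uuid, existing_project_names):
    # Keep only the download URLs that have no matching ScanCode.io project yet.
    existing = set(existing_project_names)
    return [u for u in uris if get_project_name(u, dataspace_uuid) not in existing]


dataspace = uuid.UUID(int=1)
uris = ["https://example.com/a-1.0.tar.gz", "https://example.com/b-2.0.tar.gz"]

# Pretend a first user already scanned package "a" successfully.
existing_names = [get_project_name(uris[0], dataspace)]

# A second user in the same dataspace now only needs to scan package "b".
pending = uris_needing_scan(uris, dataspace, existing_names)
print(pending)
```

Projects already created under the current three-segment scheme would not match the shorter name, so a real fix would probably also need to handle them, for example by comparing only the first two segments.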

**Screenshots**
n.a.

**Context (OS, Browser, Device, etc.):**
n.a.


	def get_project_name(uri, user_uuid, dataspace_uuid):
	    """
	    Return a project name based on a hash of the provided `uri` combined with a hash
	    of the `user_uuid` and `dataspace_uuid`.

	    project_name = "uri_hash.dataspace_uuid_hash.user_uuid_hash"
	    """
	    uri_hash = get_hash_uid(uri)
	    dataspace_hash = get_hash_uid(dataspace_uuid)
	    user_hash = get_hash_uid(user_uuid)

	    return f"{uri_hash}.{dataspace_hash}.{user_hash}"

BUG: Unnecessary duplicate scans with "Scan All Packages" due to inclusion of user UUID in ScanCode.io project name #387
