Add package uploader/maintainers to the Package metadata API #9978

Open
Duppils opened this issue Aug 31, 2021 · 4 comments · May be fixed by #16780
Labels
feature request · help needed

Comments

@Duppils

Duppils commented Aug 31, 2021

What's the problem this feature will solve?

Help identify trustworthy package uploaders. Currently, the package metadata API (https://pypi.org/pypi/{package_name}/json) returns the maintainers declared in the package's own metadata, but not the PyPI accounts that maintain or upload the package. Exposing the package uploaders/maintainers can help establish a package's credibility or expose risks.
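For context, here is a minimal sketch of what a client can read from the API today. It assumes the JSON endpoint lives at https://pypi.org/pypi/{package_name}/json and that the `info` object carries the `author`, `author_email`, `maintainer`, and `maintainer_email` keys shown in the serializer quoted later in this thread. The helper names and the sample payload are hypothetical and invented for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: PyPI serves project metadata as JSON at this URL pattern.
JSON_URL = "https://pypi.org/pypi/{name}/json"

# Self-declared fields: they come from the uploaded package's own metadata
# and say nothing about which PyPI account uploaded the files.
DECLARED_FIELDS = ("author", "author_email", "maintainer", "maintainer_email")


def declared_people(payload):
    """Return the self-declared author/maintainer fields from a payload."""
    info = payload.get("info", {})
    # Normalize missing keys and empty strings to None.
    return {field: info.get(field) or None for field in DECLARED_FIELDS}


def fetch_declared_people(name):
    """Fetch a project's metadata over the network (not called below)."""
    with urlopen(JSON_URL.format(name=name)) as response:
        return declared_people(json.load(response))


# Invented sample payload, trimmed to the relevant keys.
sample = {
    "info": {
        "name": "example",
        "author": "Jane Doe",
        "author_email": "",
        "maintainer": None,
    }
}
result = declared_people(sample)
print(result)
```

Nothing in this result identifies the account that uploaded the release, which is the gap this issue describes.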

Describe the solution you'd like

Add the package maintainers to the API. If the maintainers' historic contributions could also be exposed, in this or a separate API, that would further help identify trustworthy packages.

Additional context

Consider home-brewed or forked packages, which should not inherit credibility, such as https://pypi.org/project/f-ask/. At a glance, this package (incorrectly) looks to be owned by the pallets team, which has a different level of trust associated with it. This is just an example; please do not take any negative action against whoever uploaded it. I do not wish to check whether it is a malicious typo-squat, as that is irrelevant to the problem to fix.

@cthoyt
Contributor

cthoyt commented Feb 26, 2023

I second @Duppils, though my motivation for wanting this information is to link PyPI users to their GitHub accounts, Wikidata entries, and ORCID identifiers, so that we in the computational life and natural sciences (and others) can better report on the bibliometrics of software.

What code needs to be changed

The following code is responsible for what gets put on the metadata API (https://pypi.org/pypi/{package_name}/json). The trick is just to connect the Project model in the database to the associated information, then do some SQLAlchemy magic (i.e., joining and filtering) to get it out:

def _json_data(request, project, release, *, all_releases):
    # Get all of the releases and files for this project.
    release_files = (
        request.db.query(Release, File)
        .options(
            Load(Release).load_only(
                "version", "requires_python", "yanked", "yanked_reason"
            )
        )
        .outerjoin(File)
        .filter(Release.project == project)
    )
    # If we're not looking for all_releases, then we'll filter this further
    # to just this release.
    if not all_releases:
        release_files = release_files.filter(Release.id == release.id)
    # Finally set an ordering, and execute the query.
    release_files = release_files.order_by(
        Release._pypi_ordering.desc(), File.filename
    ).all()
    # Map our releases + files into a dictionary that maps each release to a
    # list of all its files.
    releases = {}
    for r, file_ in release_files:
        files = releases.setdefault(r, [])
        if file_ is not None:
            files.append(file_)
    # Serialize our database objects to match the way that PyPI legacy
    # presented this data.
    releases = {
        r.version: [
            {
                "filename": f.filename,
                "packagetype": f.packagetype,
                "python_version": f.python_version,
                "has_sig": f.has_signature,
                "comment_text": f.comment_text,
                "md5_digest": f.md5_digest,
                "digests": {
                    "md5": f.md5_digest,
                    "sha256": f.sha256_digest,
                    "blake2b_256": f.blake2_256_digest,
                },
                "size": f.size,
                # TODO: Remove this once we've had a long enough time with it
                # here to consider it no longer in use.
                "downloads": -1,
                "upload_time": f.upload_time.strftime("%Y-%m-%dT%H:%M:%S"),
                "upload_time_iso_8601": f.upload_time.isoformat() + "Z",
                "url": request.route_url("packaging.file", path=f.path),
                "requires_python": r.requires_python if r.requires_python else None,
                "yanked": r.yanked,
                "yanked_reason": r.yanked_reason or None,
            }
            for f in fs
        ]
        for r, fs in releases.items()
    }
    # Serialize a list of vulnerabilities for this release
    vulnerabilities = [
        {
            "id": vulnerability_record.id,
            "source": vulnerability_record.source,
            "link": vulnerability_record.link,
            "aliases": vulnerability_record.aliases,
            "details": vulnerability_record.details,
            "summary": vulnerability_record.summary,
            "fixed_in": vulnerability_record.fixed_in,
            "withdrawn": (
                vulnerability_record.withdrawn.strftime("%Y-%m-%dT%H:%M:%SZ")
                if vulnerability_record.withdrawn
                else None
            ),
        }
        for vulnerability_record in release.vulnerabilities
    ]
    data = {
        "info": {
            "name": project.name,
            "version": release.version,
            "summary": release.summary,
            "description_content_type": release.description.content_type,
            "description": release.description.raw,
            "keywords": release.keywords,
            "license": release.license,
            "classifiers": list(release.classifiers),
            "author": release.author,
            "author_email": release.author_email,
            "maintainer": release.maintainer,
            "maintainer_email": release.maintainer_email,
            "requires_python": release.requires_python,
            "platform": release.platform,
            "downloads": {"last_day": -1, "last_week": -1, "last_month": -1},
            "package_url": request.route_url("packaging.project", name=project.name),
            "project_url": request.route_url("packaging.project", name=project.name),
            "project_urls": release.urls if release.urls else None,
            "release_url": request.route_url(
                "packaging.release", name=project.name, version=release.version
            ),
            "requires_dist": (
                list(release.requires_dist) if release.requires_dist else None
            ),
            "docs_url": project.documentation_url,
            "bugtrack_url": None,
            "home_page": release.home_page,
            "download_url": release.download_url,
            "yanked": release.yanked,
            "yanked_reason": release.yanked_reason or None,
        },
        "urls": releases[release.version],
        "vulnerabilities": vulnerabilities,
        "last_serial": project.last_serial,
    }
    if all_releases:
        data["releases"] = releases
    return data

Investigation

This appears to be the enum responsible for people's roles in a package:

class TeamProjectRoleType(str, enum.Enum):
    Owner = "Owner"  # Granted "Administer" permissions.
    Maintainer = "Maintainer"  # Granted "Upload" permissions.

This enum appears in the following database model:

class TeamProjectRole(db.Model):
    __tablename__ = "team_project_roles"
    __table_args__ = (
        Index("team_project_roles_project_id_idx", "project_id"),
        Index("team_project_roles_team_id_idx", "team_id"),
        UniqueConstraint(
            "project_id",
            "team_id",
            name="_team_project_roles_project_team_uc",
        ),
    )
    __repr__ = make_repr("role_name", "team", "project")
    role_name = Column(
        Enum(TeamProjectRoleType, values_callable=lambda x: [e.value for e in x]),
        nullable=False,
    )

This model's table serves as the secondary table linking the Team model to Project:

projects = orm.relationship(
    "Project", secondary=TeamProjectRole.__table__, backref="teams", viewonly=True  # type: ignore # noqa
)

Proposal

Given the release object in the JSON API, you can (probably) traverse the data model to get a list of maintainers with the following:

from collections import defaultdict

maintainer_to_roles = defaultdict(list)
maintainers = {}
# Untested: assumes Project exposes team_project_roles and Team exposes members.
for tpr in release.project.team_project_roles:
    role_name = tpr.role_name
    for user in tpr.team.members:
        maintainers[user.username] = user
        maintainer_to_roles[user.username].append(role_name)
maintainers = [
    {
        "username": username,
        "name": user.name,
        "roles": sorted(maintainer_to_roles[username]),
    }
    for username, user in maintainers.items()
]
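To try the grouping logic without a Warehouse checkout, here is a self-contained sketch. The `User`, `Team`, and `TeamProjectRole` dataclasses are hypothetical stand-ins for the real models, not Warehouse's definitions, and `collect_maintainers` is an invented helper name. Unlike the snippet above, it collects roles in a set so that a user who belongs to two teams with the same role is listed once, and it emits the enum's `.value` so the result is plain JSON.

```python
import enum
import json
from collections import defaultdict
from dataclasses import dataclass


class TeamProjectRoleType(str, enum.Enum):
    Owner = "Owner"
    Maintainer = "Maintainer"


# Hypothetical stand-ins for the database models.
@dataclass
class User:
    username: str
    name: str


@dataclass
class Team:
    members: list


@dataclass
class TeamProjectRole:
    role_name: TeamProjectRoleType
    team: Team


def collect_maintainers(team_project_roles):
    """Group team members by username and list each user's distinct roles."""
    users = {}
    roles_by_user = defaultdict(set)
    for tpr in team_project_roles:
        for user in tpr.team.members:
            users[user.username] = user
            roles_by_user[user.username].add(tpr.role_name.value)
    return [
        {
            "username": username,
            "name": users[username].name,
            "roles": sorted(roles_by_user[username]),
        }
        for username in sorted(users)
    ]


alice = User("alice", "Alice")
bob = User("bob", "Bob")
roles = [
    TeamProjectRole(TeamProjectRoleType.Owner, Team([alice])),
    TeamProjectRole(TeamProjectRoleType.Maintainer, Team([alice, bob])),
]
maintainers = collect_maintainers(roles)
print(json.dumps(maintainers, indent=2))
```

Note that this only covers roles granted through teams; roles granted directly to individual users would need a separate lookup.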

@miketheman
Member

@cthoyt Thanks for the investigation! Would you consider turning this into a pull request? Our dev docs should help get you going. https://warehouse.pypa.io/

miketheman added the help needed label on May 23, 2023

@cthoyt
Contributor

cthoyt commented May 23, 2023

@miketheman yes, in fact I already have the code ready :) will post it later

@peterk

peterk commented Sep 22, 2024

@cthoyt Did you work any more on implementing this? I had a look today and maybe the code from the package html view could be used to load maintainers?

# Get all of the maintainers for this project.
maintainers = [
    r.user
    for r in (
        request.db.query(Role)
        .join(User)
        .filter(Role.project == project)
        .distinct(User.username)
        .order_by(User.username)
        .all()
    )
]
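The join-and-filter shape of that query can be tried in isolation with the standard library's sqlite3 module. This is only a sketch: the `users` and `roles` tables and their columns are hypothetical, not Warehouse's schema, and `maintainers_for` is an invented helper. In SQLAlchemy, `.distinct(User.username)` renders as PostgreSQL's `DISTINCT ON`, which SQLite does not support, so the sketch uses a plain `DISTINCT` over the selected columns instead.

```python
import sqlite3

# Hypothetical, heavily simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE, name TEXT);
    CREATE TABLE roles (
        user_id INTEGER REFERENCES users(id),
        project TEXT,
        role_name TEXT
    );
    INSERT INTO users VALUES (1, 'bob', 'Bob'), (2, 'alice', 'Alice');
    INSERT INTO roles VALUES
        (1, 'example', 'Maintainer'),
        (2, 'example', 'Owner'),
        (2, 'other', 'Maintainer');
    """
)


def maintainers_for(conn, project):
    """Return the users holding a role on `project`, ordered by username."""
    rows = conn.execute(
        """
        SELECT DISTINCT users.username, users.name, roles.role_name
        FROM roles
        JOIN users ON users.id = roles.user_id
        WHERE roles.project = ?
        ORDER BY users.username, roles.role_name
        """,
        (project,),
    ).fetchall()
    return [
        {"username": username, "name": name, "role": role}
        for username, name, role in rows
    ]


example_maintainers = maintainers_for(conn, "example")
print(example_maintainers)
```

A list shaped like this could then be placed under a new key in the JSON response; which key to use, and which user fields are appropriate to expose publicly, is left to the pull request discussion.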

peterk added a commit to peterk/warehouse that referenced this issue Sep 23, 2024
@peterk peterk linked a pull request Sep 23, 2024 that will close this issue
peterk added a commit to peterk/warehouse that referenced this issue Sep 28, 2024