Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cloudapi routes for listing and deleting composes #4154

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

bcl
Copy link
Contributor

@bcl bcl commented May 18, 2024

This pull request includes:

  • adequate testing for the new functionality or fixed issue
  • adequate documentation informing people about the change such as
    • submit a PR for the READMEs listed here
    • submit a PR for the osbuild.org website repository if this PR changed any behavior not covered by the automatically updated READMEs

This PR adds 2 new cloudapi routes, GET /composes and DELETE /composes/delete/<uuid>, see the commits for the details.
As part of this it now also cleans up artifacts that are no longer attached to a job.

What it doesn't do is work with the dbjobqueue yet. It works on-premises with the fsjobqueue.

Copy link

This PR is stale because it has been open 30 days with no activity. Remove "Stale" label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jun 23, 2024
@bcl bcl removed the Stale label Jun 26, 2024
Copy link

This PR is stale because it has been open 30 days with no activity. Remove "Stale" label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jul 26, 2024
@bcl bcl removed the Stale label Jul 26, 2024
@bcl bcl marked this pull request as ready for review August 1, 2024 23:17
achilleas-k
achilleas-k previously approved these changes Aug 5, 2024
Copy link
Member

@achilleas-k achilleas-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This LGTM. Other than a couple of very minor test variable names, which I don't feel too strongly about.

internal/jobqueue/fsjobqueue/fsjobqueue_test.go Outdated Show resolved Hide resolved
internal/jobqueue/fsjobqueue/fsjobqueue_test.go Outdated Show resolved Hide resolved
achilleas-k
achilleas-k previously approved these changes Aug 8, 2024
Copy link
Member

@achilleas-k achilleas-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM :)

achilleas-k
achilleas-k previously approved these changes Aug 8, 2024
Copy link
Member

@croissanne croissanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Mostly some SQL issues.

And I am confused about one thing, I've always thought as the first job in the chain (depsolve > manifest > osbuild) to be the root node.

@@ -92,6 +92,9 @@ type JobQueue interface {

// Deletes the worker
DeleteWorker(workerID uuid.UUID) error

// AllRootJobIDs returns a list of top level job UUIDs that the worker knows about
AllRootJobIDs() ([]uuid.UUID, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add / filter on channel here. That way the brew tenant can't look at the job ids of the consoledot tenant etc…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that belongs at this layer. If the caller wants to filter by channel that is a separate issue. The goal here is to list everything so that composer-cli can let the user examine what's happening.

@@ -93,6 +93,9 @@ type JobQueue interface {
// Deletes the worker
DeleteWorker(workerID uuid.UUID) error

// AllJobIDs returns a list of all job UUIDs that the worker knows about
AllJobIDs() ([]uuid.UUID, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I'd add a job channel.

// the dependants list and then re-save the job instead of removing it.
//
// This assumes that the jobs have been created correctly, and that they have
// no dependency loops. Shared Dependants are ok, but a job cannot have a dependancy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, dependancy > dependency.

@@ -18,6 +18,7 @@ import (
"github.com/labstack/echo/v4"
"github.com/labstack/echo/v4/middleware"
"github.com/sirupsen/logrus"
"golang.org/x/exp/slices"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use "slices" now, we're using 1.21

@@ -341,6 +342,42 @@ func (s *Server) RemoveJob(id uuid.UUID) error {
return s.jobs.RemoveJob(id)
}

// CleanupArtifacts removes worker artifact directories that do not have matching jobs
func (s *Server) CleanupArtifacts() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder how this will run in the service, probably fine since I don't think we create anything in the artifact directory there.


return nil, nil
var jobs []uuid.UUID
jobs, err = q.AllJobIDs()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to just write a SQL query to query all IDs which aren't in the dependency_id (in case the osbuild/koji-finalize jobs are the root nodes) column in the job_dependencies table. In the DB there could be thousands of jobs so it's better to just offload the filtering to the DB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with thousands of jobs I don't think performance should be a problem. I find it far easier to think about this in go than in SQL. I find complex SQL queries are difficult to maintain.

@@ -70,6 +74,9 @@ const (
SELECT job_id
FROM job_dependencies
WHERE dependency_id = $1`
sqlDeleteDependencies = `
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this isn't necessary, because deleting from the jobs table cascades automatically to the dependencies and heartbeats table.

// This assumes that the jobs have been created correctly, and that they have
// no dependency loops. Shared Dependents are ok, but a job cannot have a dependency
// on any of its parents (this should never happen).
func (q *DBJobQueue) RemoveJob(id uuid.UUID) error {
Copy link
Member

@croissanne croissanne Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just deleting from the jobs table and letting psql do the rest via cascades is advised here. Should be much simpler.

We used to use this query to recursively select dependencies:

	sqlQueryDepedenciesRecursively = `
                WITH RECURSIVE dependencies(d) AS (
                                SELECT dependency_id
                                FROM job_dependencies
                                WHERE job_id = $1
                                UNION ALL
                                SELECT dependency_id
                                FROM dependencies, job_dependencies
                                WHERE job_dependencies.job_id = d  )
                SELECT * FROM dependencies`

Could tweak it to switch to job dependants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm not sure if the way the db is setup is actually the same as the fsjobqueue then. The goal is to only delete the jobs that only depend on the jobs 'above' it in the tree. That's why there is a sqlDeleteDependencies query.

I can't draw this in ascii so I'll try with words :) My understanding is that it is possible that a job could have two dependents (I like to call them parents). So when you delete a top level job you only want to walk down the tree as far as the job with two parents, and remove the parent that was in the path back to the root job you are deleting but leave that job with its other parent intact.

if err != nil {
return jobIDs, err
}
if !exists || len(j.Dependents) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always considered the root node to be the first job in a chain. So in a depsolve > manifest > osbuild chain it's the depsolve job that's the root I thought, and in a koji job koji-init. The osbuild / koji-finalize would be leaf nodes then.

But this considers the osbuild or koji-finalize job to be the root node, no? As those are the only ones without dependents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen any live SQL examples, but with the fsjobqueue backend when you start a compose the first job is the one with no Dependents. TBF I find the terminology here hard to keep straight in my head, but from what I see happening Dependencies are jobs the current job is a parent of, and Dependents are jobs that depend on the current job.

@bcl bcl force-pushed the main-composes-list branch 2 times, most recently from 632ea2a to c2ce996 Compare September 25, 2024 17:49
This lists the root job UUIDs (the jobs with no dependants).
Currently only implemented by fsjobqueue. The function for
dbjobqueue currently returns nil.

Related: RHEL-60120
This will be used to list the top level job UUIDs.

Related: RHEL-60120
This function lists all of the jobs, not just the root jobs.
Currently only implemented by fsjobqueue. The function for
dbjobqueue currently returns nil.

Related: RHEL-60120
This allows database entries to be removed.

Related: RHEL-60120
This allows jobs to be removed from the database.
Currently only implemented by fsjobqueue. The function for
dbjobqueue currently returns nil.

This will remove all the job files used by the root job UUID as long as
no other job depends on them. ie. It starts at the top, and moves down
the dependency tree until it finds a job that is also used by another
job.

Related: RHEL-60120
This will be used to delete jobs and their artifacts.

Related: RHEL-60120
This adds the handler for /composes/delete which will delete a job and
all of its dependencies.

Related: RHEL-60120
This makes it easier for the delete handler to test for running jobs and
reject the delete request. The job must be either a success or failure
before it can be deleted.

Related: RHEL-60120
This removes all artifact directories, and their contents, if there
isn't an associated Job ID (as returned by worker.Server.jobs.AllJobIDs)

Related: RHEL-60120
This adds SQL to delete jobs and dependencies, and implements the
database version of the RemoveJobs function.

Related: RHEL-60120
This adds tests for retrieving all job, all root jobs, and removing jobs
and unused dependencies. These tests are run against the fsjobqueue for
unit testing, and against dbjobqueue for integration testing.

Resolves: RHEL-60120
Copy link

github-actions bot commented Nov 9, 2024

This PR is stale because it has been open 30 days with no activity. Remove "Stale" label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 9, 2024
@bcl bcl removed the Stale label Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants