Job labels #530

Open
andrewmarkle opened this issue Mar 10, 2025 · 4 comments
Comments

@andrewmarkle

andrewmarkle commented Mar 10, 2025

Hey @rosa. Thanks for this gem and all the work you've done with SolidQueue. We're transitioning our monolith from Resque to SolidQueue and it's been a great experience!

Just wondering if you would be interested in a feature similar to GoodJob's labelled jobs.

The problem

We need something like this feature so that we can add team labels to jobs. Our Rails app is worked on by hundreds of developers, and we have a concept of code ownership where each class is owned and maintained by a team; this concept is codified in our app. What we would really love is to be able to filter jobs by team. This would allow teams to triage their own jobs and would generally make our lives easier. The solution would also be flexible enough that anyone could add their own labels or job metadata and filter jobs by them in the UI.

Proposed solution

The basic premise is that there would be an Active Job extension (very similar to how concurrency_controls are implemented) that you could include in your job, and then you could add a label to a specific job.

class ApplicationJob < ActiveJob::Base
  include ActiveJob::Labels
end

# Add a default label to every job within the class
class WelcomeJob < ApplicationJob
  self.job_labels = ["email"]

  def perform
    # Labels can be inspected from within the job
    puts job_labels # => ["email"]
  end
end

# Or add to individual jobs when enqueued
WelcomeJob.set(job_labels: ["email"]).perform_later

This would then tie into Mission Control and we'd be able to filter jobs by these labels in the UI.

GoodJob stores these labels in the database, essentially stamping jobs as they get enqueued with whatever labels are set at the time. This allows the UI to quickly filter and display these labels. Because GoodJob is built on PostgreSQL, it stores them as an array of text.

We could do something similar, but I was thinking of having labels as their own table, with unique label names. SolidQueue::Job would then basically have the equivalent of a has_many :labels relationship and, when jobs are enqueued, we could essentially do a label.find_or_create_by(name: active_job.label). In reality it would have to be more efficient than this so that we don't slow down job enqueuing, but that's the general idea.
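To make the "more efficient than this" point a bit more concrete, here is a rough sketch of memoizing label lookups in process memory, so that only the first enqueue for a given label has to touch the store. It is plain Ruby and purely illustrative: `LabelCache` and the `store` hash standing in for the `solid_queue_labels` table are hypothetical, not part of SolidQueue.

```ruby
# Hypothetical sketch: cache label name => id so repeated enqueues skip the lookup.
# `store` stands in for the solid_queue_labels table; nothing here is SolidQueue API.
class LabelCache
  attr_reader :store_hits

  def initialize(store = {})
    @store = store      # name => id, stand-in for the database table
    @cache = {}         # per-process memo of ids that were already resolved
    @mutex = Mutex.new  # guards the memo when several threads enqueue at once
    @store_hits = 0     # counts how often we had to go to the store
  end

  # Returns the id for a label name, creating the entry on first use.
  def id_for(name)
    @mutex.synchronize do
      @cache[name] ||= begin
        @store_hits += 1
        @store[name] ||= @store.size + 1
      end
    end
  end
end

cache  = LabelCache.new
first  = cache.id_for("email")
second = cache.id_for("email")   # served from the memo, no store access
other  = cache.id_for("billing")
```

With a real database the first lookup would still need to cope with two processes creating the same label at once, which is where the unique index on the label name matters.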

Proposed table changes:

# add a label_id to solid_queue_jobs
change_table "solid_queue_jobs" do |t|
  t.bigint "label_id", null: true
  t.index ["label_id"], name: "index_solid_queue_label"
end

create_table "solid_queue_labels" do |t|
  t.datetime "created_at", null: false
  t.string "name", null: false
  t.index ["name"], name: "solid_queue_label_name", unique: true
end

Anyway just throwing it out there. If this is something you would be interested in for solid_queue/mission control please let me know. I'd be happy to work on this feature and contribute back to both projects. Of course open to any feedback!

Thanks so much!

@rosa
Member

rosa commented Mar 11, 2025

Hey @andrewmarkle, thanks for writing this up! I didn't know about this GoodJob feature; it seems quite useful for cases like yours indeed. I imagine the ability to filter by labels is very important, no? It's not that you only want to check the label when you look at a failed job, in which case you wouldn't actually need to filter by it, no? 🤔 I ask because I was wondering if this could be something that Active Job supports, instead of being something specific to the adapter... in that case, the obvious solution would be to store the labels with the Active Job serialized data, but filtering wouldn't be possible unless the adapter stored these in a specific way to allow for that. You'd get labels working out of the box with all adapters, though 🤔

@andrewmarkle
Author

Hey Rosa. Thanks for the reply!

I think there's definitely a use case for us (and I'm sure for others) who need to categorize or filter jobs into different buckets. For our organization the biggest benefit would be team labels. But I think this would be a flexible feature where folks could add labels for all kinds of things: slow jobs, jobs that fail more frequently than they should, a specific type of error, whatever you want to highlight really!

That's a really interesting idea to add labels as part of ActiveJob though. I hadn't considered that! I think it would be fairly trivial to add a labels or tags field as part of the arguments for ActiveJob's serialized data. That would be a nice way to do it if the Rails team likes that change.
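As a rough illustration of that idea, the sketch below carries labels inside the serialized job hash. Active Job jobs can override the `serialize` and `deserialize` hooks to add custom data; to keep the example runnable without Rails, `StandInJob` is a hypothetical stand-in for `ActiveJob::Base`, and the `Labelled` module and the `"labels"` key are assumptions, not an existing API.

```ruby
# Hypothetical stand-in for ActiveJob::Base so the sketch runs without Rails.
class StandInJob
  def serialize
    { "job_class" => self.class.name }
  end

  def deserialize(job_data)
    # The real Active Job restores arguments, queue name, etc. here.
  end
end

# Assumed module name and "labels" key; not part of Active Job today.
module Labelled
  attr_writer :labels

  def labels
    @labels ||= []
  end

  # Add the labels to whatever the base class already serializes.
  def serialize
    super.merge("labels" => labels)
  end

  # Restore the labels when the job is loaded back from its payload.
  def deserialize(job_data)
    super
    self.labels = job_data.fetch("labels", [])
  end
end

class WelcomeJob < StandInJob
  include Labelled
end

job = WelcomeJob.new
job.labels = ["email"]
payload = job.serialize

restored = WelcomeJob.new
restored.deserialize(payload)
```

Because the labels only live in the payload, every adapter would carry them for free, but an adapter would still have to copy them somewhere queryable to filter efficiently.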

We would really like to filter jobs by these labels, and yes, especially failed jobs! On a bad day we might have hundreds or even a thousand failed jobs. Currently there's a Ruby dev who is on call for the day and, as part of being on call, goes through all the failed jobs and figures out what to do with them (retry, discard, etc.). But as we continue to grow and add more teams and developers, we're finding that this process isn't scaling well. It would be much better if teams were responsible for triaging their own failed jobs. And to do that we need some way to segment failed jobs into different categories (aka labels! 🙂).

I think the real power comes with being able to take that data and filter by it IMO. I think there's a couple ways you could slice this.

1. Full on adapter support where labels are a first class citizen.

For solid_queue I think this would likely mean database changes so that labels are stored as jobs get enqueued, mostly so that you can quickly and efficiently filter jobs by these labels in Mission Control. This would make querying for jobs by labels fast and easy, and it scales. But... it won't work for other adapters like Resque.

2. In-memory filtering

If we didn't want to go that route (or we just wanted to support it for other adapters) then I think we could still have something where we filter jobs by labels in-memory. This is doable! The trade-off is that the more labels / jobs you have the slower filtering is going to be. This might be fine for adapters like Resque as those jobs have a shelf life. Once they're done they are gone from existence and we no longer need to display them! But with SolidQueue it's a different matter because finished jobs are stored indefinitely. If we wanted to filter finished jobs by a label then I feel there's a tipping point where this no longer works at scale.
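A minimal sketch of what the in-memory option could look like, assuming each job exposes its labels as an array. The `Job` struct and `filter_by_labels` are hypothetical names for illustration, not Mission Control code.

```ruby
# Hypothetical job records; a real adapter would return its own job objects.
Job = Struct.new(:id, :class_name, :labels, keyword_init: true)

# Keep only jobs carrying every requested label. This scans the whole list,
# so the cost grows linearly with the number of jobs loaded.
def filter_by_labels(jobs, wanted)
  jobs.select { |job| (wanted - job.labels).empty? }
end

failed_jobs = [
  Job.new(id: 1, class_name: "WelcomeJob", labels: ["email", "team-growth"]),
  Job.new(id: 2, class_name: "InvoiceJob", labels: ["team-billing"]),
  Job.new(id: 3, class_name: "DigestJob",  labels: ["email"])
]

email_ids  = filter_by_labels(failed_jobs, ["email"]).map(&:id)
growth_ids = filter_by_labels(failed_jobs, ["email", "team-growth"]).map(&:id)
```

Every job has to be loaded and inspected before it can be matched, which is why this approach is fine for a bounded set of failed jobs but degrades for a large history of finished ones.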

GoodJob does this by adding a labels column to its dashboard:

[Image: screenshot of the GoodJob dashboard with a labels column]

I was thinking of doing something similar in Mission Control, if that makes sense, where each job would show its associated labels and you could filter by them.

Anyway thanks for the discussion. Let me know what you prefer. If you like the idea I can put together a proof of concept so we can look at something real. 🙇‍♂️

@rosa
Member

rosa commented Mar 11, 2025

Yes! Full-on adapter support vs. in-memory filtering is already at play in Mission Control for filtering jobs by class name. Solid Queue supports that "natively" (there's a column for that in the solid_queue_jobs table) whereas for Resque, the filtering happens in memory. That wouldn't be a problem, but yes, you'd run into performance problems depending on how many jobs you have 😅 If you only ever used it to filter failed jobs, it could be ok (class filtering works fine for us with Resque because we apply it mostly to failed jobs), but if you wanted to filter finished jobs, then that'd be a different matter 🤔

I think, to support this in Solid Queue, the labels would need to be part of the solid_queue_jobs table to avoid more INSERTs when enqueuing, but I'm not sure yet what column type / index type would work in MySQL, PostgreSQL and SQLite 🤔 I'd need to look into it a bit more. If you know this, please feel free to share! 🙏

@andrewmarkle
Author

I'll look into this and give you some options! I forgot that we had to support all 3 types of databases but that makes sense.
