Skip to content

Conversation

@d3flex
Copy link
Contributor

@d3flex d3flex commented Oct 22, 2025

With this commit once the job goes to its end, it will check if it can restarted and if it fulfills the criteria, it will send the new job to the minion queue via enqueue_restart, where now it should populate the event.

issue: https://progress.opensuse.org/issues/190557

@d3flex
Copy link
Contributor Author

d3flex commented Oct 22, 2025

  • I want to see where it will fail (some test failed on my side) and the codecov report
  • also I cant see where a test for this should go. there is t/10-jobs.t as an obvious candidate but it doesnt seem to cover the events

@codecov
Copy link

codecov bot commented Oct 22, 2025

Codecov Report

❌ Patch coverage is 98.76543% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 99.25%. Comparing base (1801a6d) to head (406afd4).
⚠️ Report is 7 commits behind head on master.

Files with missing lines Patch % Lines
lib/OpenQA/Task/Job/Restart.pm 94.11% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (98.76%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6816      +/-   ##
==========================================
- Coverage   99.26%   99.25%   -0.01%     
==========================================
  Files         402      402              
  Lines       41396    41469      +73     
==========================================
+ Hits        41090    41162      +72     
- Misses        306      307       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@perlpunk
Copy link
Contributor

  • also I cant see where a test for this should go. there is t/10-jobs.t as an obvious candidate but it doesnt seem to cover the events

t/23-amqp.t looks right in this case. Take a job and call enqueue_restart on it

Copy link
Member

@okurz okurz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs tests

@d3flex d3flex force-pushed the feat/amqp_restart branch from 8632b55 to e5e5fd2 Compare October 23, 2025 07:44
Copy link
Member

@okurz okurz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks good so far. I suggest you do a manual system test with a local openQA instance and ensure that no duplicate events are sent.

Also in your commit message I don't quite understand "With this commit once the job goes to its end, it will check if it can restarted and if it fulfills the criteria". What "end" and what "criteria"? Also "check if it can restarted" is grammatically incorrect

@d3flex d3flex force-pushed the feat/amqp_restart branch from e5e5fd2 to 44e622e Compare October 23, 2025 08:12
@d3flex
Copy link
Contributor Author

d3flex commented Oct 23, 2025

code looks good so far. I suggest you do a manual system test with a local openQA instance and ensure that no duplicate events are sent.

Also in your commit message I don't quite understand "With this commit once the job goes to its end, it will check if it can restarted and if it fulfills the criteria". What "end" and what "criteria"? Also "check if it can restarted" is grammatically incorrect

image

@d3flex d3flex force-pushed the feat/amqp_restart branch from 44e622e to 0b69cf5 Compare October 23, 2025 13:05
@d3flex
Copy link
Contributor Author

d3flex commented Oct 23, 2025

code looks good so far. I suggest you do a manual system test with a local openQA instance and ensure that no duplicate events are sent.

Also in your commit message I don't quite understand "With this commit once the job goes to its end, it will check if it can restarted and if it fulfills the criteria". What "end" and what "criteria"? Also "check if it can restarted" is grammatically incorrect

updated.

@d3flex d3flex marked this pull request as ready for review October 23, 2025 13:06
@d3flex d3flex removed the not-ready label Oct 23, 2025
@perlpunk
Copy link
Contributor

With this commit once the job goes to execute its final stage, it will
check if it can be restarted. If yes, it will send the new job to
the minion queue via enqueue_restart, where now it should populate
the event.

The only change in this commit is that, if the job is restarted via enqueue_restart, it now publishes a restart event.
The description sounds overly complicated ("goes to execute its final stage").
" it will check if it can be restarted" - that's already happening without your commit.

How about: "When a job is enqueued to be restarted via the RETRY feature, it now publishes a job.restart event."

@d3flex
Copy link
Contributor Author

d3flex commented Oct 23, 2025

With this commit once the job goes to execute its final stage, it will
check if it can be restarted. If yes, it will send the new job to
the minion queue via enqueue_restart, where now it should populate
the event.

The only change in this commit is that, if the job is restarted via enqueue_restart, it now publishes a restart event. The description sounds overly complicated ("goes to execute its final stage"). " it will check if it can be restarted" - that's already happening without your commit.

How about: "When a job is enqueued to be restarted via the RETRY feature, it now publishes a job.restart event."

I avoid to reffer RETRY because is not the only path to enqueue_restart. are you good with When a job is enqueued to be restarted, it now publishes a job.restart event?

@perlpunk
Copy link
Contributor

When a job is enqueued to be restarted as part of the $job->done handling, a job.restart event is now published

@d3flex d3flex force-pushed the feat/amqp_restart branch 2 times, most recently from c3478ec to c8fe5b4 Compare October 25, 2025 04:12
@d3flex d3flex force-pushed the feat/amqp_restart branch 3 times, most recently from 0499fd0 to b987963 Compare November 4, 2025 09:38
Copy link
Contributor

@Martchus Martchus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the inline comments, this might be much harder to make actually work than we estimated. Maybe we should reevaluate the best way of implementing this and possibly also re-consider whether it is worth it.

sub restart_openqa_job ($minion_job, $openqa_job) {
my $cloned_job_or_error = $openqa_job->auto_duplicate;
my $is_ok = ref $cloned_job_or_error || $cloned_job_or_error =~ qr/(already.*clone|direct parent)/i;
OpenQA::Events->singleton->emit_event('job_restart', data => {id => $openqa_job->id}) if $is_ok;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this actually work we would probably need to make sure the AMQP plugin is loaded in the Gru service. It probably isn't so emitting this event probably has no effect. Now that I think about it, running the AMQP plugin within the Gru service and the processes it forks to run the particular Minion jobs is probably not so easy and could end up becoming quite a mess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good one. Lets take a step back.

It probably isn't so emitting this event probably has no effect

I am not sure about that but the emit_event in lib/OpenQA/Events.pm implies that this method is intended for non-controller contexts, such as Minion background tasks. right?

To make this actually work we would probably need to make sure the AMQP plugin is loaded in the Gru service.

Maybe is enough to make the Openqa::events::emit_events clever and enqueue an event from there? could that possibly work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is intended to be used from non-controller context but not Minion tasks. I think "non-controller context" mainly refers to DBIx result classes or some utility modules.

We probably don't need to make emit_events clever. We just need to make sure that in the process that runs the Minion task the AMQP plugin is initialized (and maybe also other similar plugins for the sake of consistency). The problem with that is just that it might get messy due to the involved forking in combination with having to manage a permanent connection to the AMQP server.

Of course we could actually make emit_event clever and make it invoke some internal route of the normal web UI service to emit the event in case it is running from the Gru service. That might be the easiest solution - even though it requires a new internal route to do IPC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this can closed as well?

Comment on lines 48 to 49
combined_like { $t->post_ok('/api/v1/jobs/99926/restart?force=1')->status_is(200) } qr/Job 99926 duplicated/,
'Job restarted successfully';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This route is not what your ticket is about. Of course it makes also sense to add an explicit test for this in case we don't already have one. However, the relevant route would be the one that ends with /done I suppose. And then we'd of course need to run the Minion job enqueued by that route. And then you'll probably run into the problem mentioned in my other review comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and I use this opportunity to remind about TDD (test driven development).
I took this PR, but only the new test and not the code changes.
The test passed.

It should be clear that this should not be the case. If you write a new test for a new or changed feature, then the test must fail until you add the actual code changes. If it passes, then you are not testing the new feature.

For checking this rule it isn't even important whether you started development with a test or not.

So for the future: This is an easy rule to check for yourself before you push something to a PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I remember correctly, I added that change mainly because was not covered in the api tests. Apparently thats why it is added in t/api. it could be something else as well simultaneously. in a few words, this was not meant to cover the new case. The emission from the minion job should have implemented in 10-jobs.t:AMQP event emission for minion restarts

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is covering an existing case that was not covered before, it should be in its own commit.

Copy link
Contributor

@perlpunk perlpunk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just ran a job with RETRY=1 in my local instance with this change.

I am seeing Sending AMQP event: opensusetest.openqa.job.done in the webui log, so the AMQP plugin seems to be working.

Now I would expect a message Sending AMQP event: opensusetest.openqa.job.restart" in the gru log, but I don't see it, so my conclusion is that the plugin is indeed not loaded in the gru server.

@d3flex d3flex force-pushed the feat/amqp_restart branch 3 times, most recently from f470c8a to f4b0cb6 Compare November 18, 2025 19:34
@d3flex
Copy link
Contributor Author

d3flex commented Nov 18, 2025

I just ran a job with RETRY=1 in my local instance with this change.

I am seeing Sending AMQP event: opensusetest.openqa.job.done in the webui log, so the AMQP plugin seems to be working.

Now I would expect a message Sending AMQP event: opensusetest.openqa.job.restart" in the gru log, but I don't see it, so my conclusion is that the plugin is indeed not loaded in the gru server.

you were right. So i went back to the suggestion. I submit a second commit to register the AMQP inside the minion space.

Here what I see now in the gru logs.

image

@d3flex d3flex removed the invalid label Nov 18, 2025
@perlpunk
Copy link
Contributor

I submit a second commit to register the AMQP inside the minion space.

That's good news!
So now you can work on the open discussions.


try {
$app->plugin('AMQP');
Mojo::IOLoop->singleton->one_tick;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is described in the commit (if you think it is not clear let me know to update it.)
but to have a proper answer.
The register function

Mojo::IOLoop->singleton->next_tick(
registers the events inside the Mojo::IOLoop->singleton->next_tick. The task tries to emit an event but it isnt actually registered yet. So I tried to force it, in order to have it registered asap. one_tick should trigger next_tick in the register sub.

@d3flex
Copy link
Contributor Author

d3flex commented Nov 19, 2025

Today I will take care of the coverage.

Copy link
Contributor

@Martchus Martchus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether the AMQP plugin will be able to maintain its connection without the event loop running in the background I don't know. Maybe this approach can work. Maybe you'll need to run the event loop explicitly also after emitting the event.


try {
$app->plugin('AMQP');
Mojo::IOLoop->singleton->one_tick;
Copy link
Contributor

@Martchus Martchus Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably work but is of course a bit fragile as there's no explicit condition for stopping the event loop this way. I would at least add a short note:

Suggested change
Mojo::IOLoop->singleton->one_tick;
Mojo::IOLoop->singleton->one_tick; # handle async plugin registration

Of course it would be better to register some event handler that is fired after the registration has completed in which we will stop the event loop explicitly. Then we would just start the event loop.

Maybe this kind of plugin registration should be its own function, e.g. _register_amqp_plugin_synchronously.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the comment. And it took me a while to find how to write the test for the test coverage. I hope it is fine to keep it as is for now and write a note in the ticket, in case it is something to look at later.

my $openqa_job_id = $self->id;
my $minion_job_id = OpenQA::App->singleton->gru->enqueue(restart_job => [$openqa_job_id], $options)->{minion_id};
log_debug "Enqueued restarting openQA job $openqa_job_id via Minion job $minion_job_id";
OpenQA::App->singleton->emit_event(openqa_job_restart => \(id => $openqa_job_id));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have removed this. I will check

@perlpunk
Copy link
Contributor

perlpunk commented Nov 19, 2025

The 3 commits and their commit messages don't make sense to me.

Commit 1:

Emit restart events when job restarted automatically
Send an event through the task itself after the auto_duplicate when the job
have been process and send for restart. While the auto_duplicate could be
used as a single point for the event's emission it interferes with other
functions which also emit events independently. For instance from the API
restart, it processes the payload and adds data to it, so it left as is.
Using the restart_openqa_job from the Restart module should not duplicate
the event which is emitted by the API, which also invoke auto_duplicate. And
it should also trigger the event when the job is processed and not while is
queued.

That should make sure that the event is send to AMQP for jobs with
RETRY once it is actually triggered.

That's a big wall of text and I don't understand most of it.

Commit 2:

Load AMQP plugin into the Gru tasks

ok.

Commit 3:

Emit restart events when a job is enqueued
With this commit once the job goes to its end, it will check if it can
restarted and if it fulfills the criteria, it will send the new job to
the minion queue via enqueue_restart, where now it should populate
the event.

This commit contains fixes for the previous two separate commits. That's not what the message says.

Now the titles of commit 1 and 3:

  • Emit restart events when job restarted automatically
  • Emit restart events when a job is enqueued

Commit 3 just contains fixes, and "when a job is enqueued" is not really helpful. What does it mean a job is enqueued? That's a very vague thing.

@perlpunk
Copy link
Contributor

I tried this out with a local rabbitmq and reported the problems here: https://progress.opensuse.org/issues/190557#note-27
So currently this doesn't work as expected.

@perlpunk perlpunk force-pushed the feat/amqp_restart branch 2 times, most recently from 377acf8 to c98c54a Compare November 20, 2025 14:45
@perlpunk
Copy link
Contributor

Thanks to @Martchus we got the AMQP plugin working in the gru server during our collab session: https://progress.opensuse.org/issues/190557#note-29
I pushed two commits.

@d3flex d3flex force-pushed the feat/amqp_restart branch 2 times, most recently from 692b0b0 to 309a1c6 Compare November 21, 2025 20:12
# SPDX-License-Identifier: GPL-2.0-or-later

use Test::Most;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obviously left over. I had edited it when i tried to add a test. I think I added signatures. When I remove it I missed that there was a line here previously. fixed

d3flex and others added 3 commits November 22, 2025 05:56
TBD

Using the `restart_openqa_job` from the Restart module should not duplicate
the event which is emitted by the API, which also invoke `auto_duplicate`. And
it should also trigger the event when the job is processed and not while is
queued.

That should make sure that the event is send to AMQP for jobs with
`RETRY` once it is actually triggered.

Load AMQP plugin into the Gru tasks

Makes possible to the minion tasks to emit events, as it makes the AMQP plugin
available in its context. The `one_tick` is called to trigger the `next_tick`
in the `register` function of the AMPQ module immediately, otherwise it seems
like it is registered late and the event is not emitted.

issue: https://progress.opensuse.org/issues/190557
Signed-off-by: Ioannis Bonatakis <[email protected]>
The plugin gets loaded automatically via OpenQA::Setup
Start Mojo::IOLoop and let it end when the AMQP plugin has finished
(or failed) publishing the event
And create test coverage
- Covers the emission of events on Minion restarts
- Loads the AMQP plugin
- Checks event body for consistency with events from API

issue: https://progress.opensuse.org/issues/190557
Signed-off-by: Ioannis Bonatakis <[email protected]>
Comment on lines +21 to +24
if ($is_ok) {
my $openqa_job_id = $openqa_job->id;
my $result = ref $cloned_job_or_error ? $cloned_job_or_error->{cluster_cloned} : $cloned_job_or_error;
my %event_data = (id => $openqa_job_id, result => $result, auto => 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't emit an event if cloning failed.

Suggested change
if ($is_ok) {
my $openqa_job_id = $openqa_job->id;
my $result = ref $cloned_job_or_error ? $cloned_job_or_error->{cluster_cloned} : $cloned_job_or_error;
my %event_data = (id => $openqa_job_id, result => $result, auto => 1);
if (ref $cloned_job_or_error) {
my %event_data = (id => $openqa_job->id, result => $cloned_job_or_error->{cluster_cloned}, auto => 1);

Comment on lines +965 to +971
my $jobs2 = $t->app->schema->resultset('Jobs');
is $t->app->config->{amqp}->{enabled}, 1, 'AMQP enabled from config file';

my %_settings = %settings;
$_settings{TEST} = 'test_amqp_restart';
$_settings{RETRY} = '1';
my $job = $jobs2->create_from_settings(\%_settings);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for a new variable $jobs2.

Suggested change
my $jobs2 = $t->app->schema->resultset('Jobs');
is $t->app->config->{amqp}->{enabled}, 1, 'AMQP enabled from config file';
my %_settings = %settings;
$_settings{TEST} = 'test_amqp_restart';
$_settings{RETRY} = '1';
my $job = $jobs2->create_from_settings(\%_settings);
is $t->app->config->{amqp}->{enabled}, 1, 'AMQP enabled from config file';
my %_settings = %settings;
$_settings{TEST} = 'test_amqp_restart';
$_settings{RETRY} = '1';
my $job = $jobs->create_from_settings(\%_settings);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants