-
Notifications
You must be signed in to change notification settings - Fork 220
Emit restart events when a job is enqueued #6816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Codecov Report❌ Patch coverage is
❌ Your patch check has failed because the patch coverage (98.76%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## master #6816 +/- ##
==========================================
- Coverage 99.26% 99.25% -0.01%
==========================================
Files 402 402
Lines 41396 41469 +73
==========================================
+ Hits 41090 41162 +72
- Misses 306 307 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
okurz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs tests
8632b55 to
e5e5fd2
Compare
okurz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code looks good so far. I suggest you do a manual system test with a local openQA instance and ensure that no duplicate events are sent.
Also in your commit message I don't quite understand "With this commit once the job goes to its end, it will check if it can restarted and if it fulfills the criteria". What "end" and what "criteria"? Also "check if it can restarted" is grammatically incorrect
e5e5fd2 to
44e622e
Compare
44e622e to
0b69cf5
Compare
updated. |
The only change in this commit is that, if the job is restarted via How about: "When a job is enqueued to be restarted via the |
I avoid to reffer |
|
When a job is enqueued to be restarted as part of the |
c3478ec to
c8fe5b4
Compare
0499fd0 to
b987963
Compare
Martchus
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the inline comments, this might be much harder to make actually work than we estimated. Maybe we should reevaluate the best way of implementing this and possibly also re-consider whether it is worth it.
lib/OpenQA/Task/Job/Restart.pm
Outdated
| sub restart_openqa_job ($minion_job, $openqa_job) { | ||
| my $cloned_job_or_error = $openqa_job->auto_duplicate; | ||
| my $is_ok = ref $cloned_job_or_error || $cloned_job_or_error =~ qr/(already.*clone|direct parent)/i; | ||
| OpenQA::Events->singleton->emit_event('job_restart', data => {id => $openqa_job->id}) if $is_ok; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this actually work we would probably need to make sure the AMQP plugin is loaded in the Gru service. It probably isn't so emitting this event probably has no effect. Now that I think about it, running the AMQP plugin within the Gru service and the processes it forks to run the particular Minion jobs is probably not so easy and could end up becoming quite a mess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good one. Lets take a step back.
It probably isn't so emitting this event probably has no effect
I am not sure about that but the emit_event in lib/OpenQA/Events.pm implies that this method is intended for non-controller contexts, such as Minion background tasks. right?
To make this actually work we would probably need to make sure the AMQP plugin is loaded in the Gru service.
Maybe is enough to make the Openqa::events::emit_events clever and enqueue an event from there? could that possibly work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is intended to be used from non-controller context but not Minion tasks. I think "non-controller context" mainly refers to DBIx result classes or some utility modules.
We probably don't need to make emit_events clever. We just need to make sure that in the process that runs the Minion task the AMQP plugin is initialized (and maybe also other similar plugins for the sake of consistency). The problem with that is just that it might get messy due to the involved forking in combination with having to manage a permanent connection to the AMQP server.
Of course we could actually make emit_event clever and make it invoke some internal route of the normal web UI service to emit the event in case it is running from the Gru service. That might be the easiest solution - even though it requires a new internal route to do IPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this can closed as well?
t/api/18-job-amqp-event.t
Outdated
| combined_like { $t->post_ok('/api/v1/jobs/99926/restart?force=1')->status_is(200) } qr/Job 99926 duplicated/, | ||
| 'Job restarted successfully'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This route is not what your ticket is about. Of course it makes also sense to add an explicit test for this in case we don't already have one. However, the relevant route would be the one that ends with /done I suppose. And then we'd of course need to run the Minion job enqueued by that route. And then you'll probably run into the problem mentioned in my other review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and I use this opportunity to remind about TDD (test driven development).
I took this PR, but only the new test and not the code changes.
The test passed.
It should be clear that this should not be the case. If you write a new test for a new or changed feature, then the test must fail until you add the actual code changes. If it passes, then you are not testing the new feature.
For checking this rule it isn't even important whether you started development with a test or not.
So for the future: This is an easy rule to check for yourself before you push something to a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I remember correctly, I added that change mainly because was not covered in the api tests. Apparently thats why it is added in t/api. it could be something else as well simultaneously. in a few words, this was not meant to cover the new case. The emission from the minion job should have implemented in 10-jobs.t:AMQP event emission for minion restarts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is covering an existing case that was not covered before, it should be in its own commit.
perlpunk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just ran a job with RETRY=1 in my local instance with this change.
I am seeing Sending AMQP event: opensusetest.openqa.job.done in the webui log, so the AMQP plugin seems to be working.
Now I would expect a message Sending AMQP event: opensusetest.openqa.job.restart" in the gru log, but I don't see it, so my conclusion is that the plugin is indeed not loaded in the gru server.
f470c8a to
f4b0cb6
Compare
That's good news! |
lib/OpenQA/Shared/Plugin/Gru.pm
Outdated
|
|
||
| try { | ||
| $app->plugin('AMQP'); | ||
| Mojo::IOLoop->singleton->one_tick; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is described in the commit (if you think it is not clear let me know to update it.)
but to have a proper answer.
The register function
openQA/lib/OpenQA/WebAPI/Plugin/AMQP.pm
Line 31 in 1801a6d
| Mojo::IOLoop->singleton->next_tick( |
Mojo::IOLoop->singleton->next_tick. The task tries to emit an event but it isnt actually registered yet. So I tried to force it, in order to have it registered asap. one_tick should trigger next_tick in the register sub.
|
Today I will take care of the coverage. |
Martchus
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether the AMQP plugin will be able to maintain its connection without the event loop running in the background I don't know. Maybe this approach can work. Maybe you'll need to run the event loop explicitly also after emitting the event.
lib/OpenQA/Shared/Plugin/Gru.pm
Outdated
|
|
||
| try { | ||
| $app->plugin('AMQP'); | ||
| Mojo::IOLoop->singleton->one_tick; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will probably work but is of course a bit fragile as there's no explicit condition for stopping the event loop this way. I would at least add a short note:
| Mojo::IOLoop->singleton->one_tick; | |
| Mojo::IOLoop->singleton->one_tick; # handle async plugin registration |
Of course it would be better to register some event handler that is fired after the registration has completed in which we will stop the event loop explicitly. Then we would just start the event loop.
Maybe this kind of plugin registration should be its own function, e.g. _register_amqp_plugin_synchronously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the comment. And it took me a while to find how to write the test for the test coverage. I hope it is fine to keep it as is for now and write a note in the ticket, in case it is something to look at later.
lib/OpenQA/Schema/Result/Jobs.pm
Outdated
| my $openqa_job_id = $self->id; | ||
| my $minion_job_id = OpenQA::App->singleton->gru->enqueue(restart_job => [$openqa_job_id], $options)->{minion_id}; | ||
| log_debug "Enqueued restarting openQA job $openqa_job_id via Minion job $minion_job_id"; | ||
| OpenQA::App->singleton->emit_event(openqa_job_restart => \(id => $openqa_job_id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have removed this. I will check
b11bce2 to
7c83bef
Compare
|
The 3 commits and their commit messages don't make sense to me. Commit 1:
That's a big wall of text and I don't understand most of it. Commit 2:
ok. Commit 3:
This commit contains fixes for the previous two separate commits. That's not what the message says. Now the titles of commit 1 and 3:
Commit 3 just contains fixes, and "when a job is enqueued" is not really helpful. What does it mean a job is enqueued? That's a very vague thing. |
|
I tried this out with a local rabbitmq and reported the problems here: https://progress.opensuse.org/issues/190557#note-27 |
377acf8 to
c98c54a
Compare
|
Thanks to @Martchus we got the AMQP plugin working in the gru server during our collab session: https://progress.opensuse.org/issues/190557#note-29 |
692b0b0 to
309a1c6
Compare
| # SPDX-License-Identifier: GPL-2.0-or-later | ||
|
|
||
| use Test::Most; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obviously left over. I had edited it when i tried to add a test. I think I added signatures. When I remove it I missed that there was a line here previously. fixed
TBD Using the `restart_openqa_job` from the Restart module should not duplicate the event which is emitted by the API, which also invoke `auto_duplicate`. And it should also trigger the event when the job is processed and not while is queued. That should make sure that the event is send to AMQP for jobs with `RETRY` once it is actually triggered. Load AMQP plugin into the Gru tasks Makes possible to the minion tasks to emit events, as it makes the AMQP plugin available in its context. The `one_tick` is called to trigger the `next_tick` in the `register` function of the AMPQ module immediately, otherwise it seems like it is registered late and the event is not emitted. issue: https://progress.opensuse.org/issues/190557 Signed-off-by: Ioannis Bonatakis <[email protected]>
The plugin gets loaded automatically via OpenQA::Setup
Start Mojo::IOLoop and let it end when the AMQP plugin has finished (or failed) publishing the event
309a1c6 to
7d16752
Compare
And create test coverage - Covers the emission of events on Minion restarts - Loads the AMQP plugin - Checks event body for consistency with events from API issue: https://progress.opensuse.org/issues/190557 Signed-off-by: Ioannis Bonatakis <[email protected]>
7d16752 to
406afd4
Compare
| if ($is_ok) { | ||
| my $openqa_job_id = $openqa_job->id; | ||
| my $result = ref $cloned_job_or_error ? $cloned_job_or_error->{cluster_cloned} : $cloned_job_or_error; | ||
| my %event_data = (id => $openqa_job_id, result => $result, auto => 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't emit an event if cloning failed.
| if ($is_ok) { | |
| my $openqa_job_id = $openqa_job->id; | |
| my $result = ref $cloned_job_or_error ? $cloned_job_or_error->{cluster_cloned} : $cloned_job_or_error; | |
| my %event_data = (id => $openqa_job_id, result => $result, auto => 1); | |
| if (ref $cloned_job_or_error) { | |
| my %event_data = (id => $openqa_job->id, result => $cloned_job_or_error->{cluster_cloned}, auto => 1); | |
| my $jobs2 = $t->app->schema->resultset('Jobs'); | ||
| is $t->app->config->{amqp}->{enabled}, 1, 'AMQP enabled from config file'; | ||
|
|
||
| my %_settings = %settings; | ||
| $_settings{TEST} = 'test_amqp_restart'; | ||
| $_settings{RETRY} = '1'; | ||
| my $job = $jobs2->create_from_settings(\%_settings); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for a new variable $jobs2.
| my $jobs2 = $t->app->schema->resultset('Jobs'); | |
| is $t->app->config->{amqp}->{enabled}, 1, 'AMQP enabled from config file'; | |
| my %_settings = %settings; | |
| $_settings{TEST} = 'test_amqp_restart'; | |
| $_settings{RETRY} = '1'; | |
| my $job = $jobs2->create_from_settings(\%_settings); | |
| is $t->app->config->{amqp}->{enabled}, 1, 'AMQP enabled from config file'; | |
| my %_settings = %settings; | |
| $_settings{TEST} = 'test_amqp_restart'; | |
| $_settings{RETRY} = '1'; | |
| my $job = $jobs->create_from_settings(\%_settings); |


With this commit once the job goes to its end, it will check if it can restarted and if it fulfills the criteria, it will send the new job to the minion queue via
enqueue_restart, where now it should populate the event.issue: https://progress.opensuse.org/issues/190557