Allow configuration to tell cat-log to fetch logs from (local) disk rather than remote host #506

jarich · 2023-09-15T07:28:04Z

For jobs running via PBS, job logs are not available until the job has completed (yes, even if the job runs for 4 hours and generates GBs of logs). I don't know why this is the case, but it is the case. PBS spools log messages somewhere and then moves them into the correct locations when the job is done. As such, if remote log fetching is enabled, then pulling the job logs from the job host means additional SSH connections, network traffic and load on the login hosts for zero value. This is even more the case when those connections are left open by users who navigate to another tab or whatever.

It would be ideal if there was a configuration item that told cat-log to only serve files from local disk, if that is what the user wanted.

hjoliver · 2023-09-18T00:16:21Z

Apparently (recent versions of?) PBS do allow you to write job stdout and stderr directly to their final destinations.

3.3.6 Writing Files Directly to Final Destination

If the MoM on the primary execution host can reach the final destination, she can write the job's standard output and standard error files to that destination. To be reachable, the final destination host and path must either be on the execution host, or be mapped from the primary execution host via the $usecp directive in the MoM configuration file. To specify that standard output and/or standard error should be written directly to their final destinations, use the d sub-option to the -k option to qsub or qalter. Indicate which files to write via the e and/or o suboptions.

For example, to directly write both output and error to their final destinations:
qsub -koed -o <output path> -e <error path> job.sh

To directly write output to its final destination, and let error go through normal spooling and staging:
qsub -kod -o <output path> job.sh

From https://2021.help.altair.com/2021.1.2/PBS%20Professional/PBSUserGuide2021.1.2.pdf
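
As a rough illustration of the rule quoted above, the -k argument is the letters of the streams to write directly, followed by d. The sketch below only assembles that option string; the helper name `direct_write_flag` is hypothetical and is not part of PBS or Cylc.

```python
def direct_write_flag(stdout: bool, stderr: bool) -> str:
    """Build the qsub/qalter -k argument for direct-write of the chosen streams.

    Hypothetical helper for illustration only: it assembles the option
    string described in the PBS user guide excerpt, nothing more.
    """
    if not (stdout or stderr):
        raise ValueError("select at least one of stdout/stderr")
    # 'o' and/or 'e' pick the streams; the trailing 'd' requests direct write.
    streams = ("o" if stdout else "") + ("e" if stderr else "")
    return f"-k{streams}d"


print(direct_write_flag(stdout=True, stderr=True))   # -koed
print(direct_write_flag(stdout=True, stderr=False))  # -kod
```

Whether direct writing actually works still depends on the MoM being able to reach the destination path, as the excerpt notes.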

@dpmatthews is this used with Cylc at your site?

Not that this invalidates the Issue - it's a reasonable request regardless.

jarich · 2023-09-18T07:11:59Z

I notice in flow/scripts/cat_log.py the documentation claims:

If remote job logs are retrieved to the workflow host on completion (global
config '[JOB-HOST]retrieve job logs = True') and the job is not currently
running, the local (retrieved) log will be accessed unless '-o/--force-remote'
is used.

There may be a bug in the calling code, if this is supposed to be the case, because if I attempt to select files for jobs I know are complete (and I can see in my cylc-run directory) I can't see these files through the log viewer.

@hjoliver This comes back to what we spoke about with respect to only showing the default file "job-activity.log" when SSH keys have not been set up properly.

dpmatthews · 2023-09-18T10:08:23Z

@hjoliver We have a locally written command which provides access to the stdout and stderr of a job whilst it is running. For all PBS platforms we configure:

        err tailer = qcat -f -e %(job_id)s
        out tailer = qcat -f -o %(job_id)s
        err viewer = qcat -e %(job_id)s
        out viewer = qcat -o %(job_id)s
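
The `%(job_id)s` placeholders have the shape of Python %-style named substitution. Assuming that is how the templates are expanded (an assumption, not confirmed in this thread), a minimal sketch of the expansion looks like this. The job ID and the `expand` helper are made up for illustration.

```python
# Two of the templates configured above; qcat is the site-local command.
TEMPLATES = {
    "err tailer": "qcat -f -e %(job_id)s",
    "out viewer": "qcat -o %(job_id)s",
}


def expand(template: str, job_id: str) -> str:
    """Fill the %(job_id)s placeholder using %-style formatting."""
    return template % {"job_id": job_id}


# "12345.pbs01" is a hypothetical PBS job ID.
print(expand(TEMPLATES["err tailer"], "12345.pbs01"))  # qcat -f -e 12345.pbs01
print(expand(TEMPLATES["out viewer"], "12345.pbs01"))  # qcat -o 12345.pbs01
```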

@jarich

> This comes back to what we spoke about with respect to only showing the default file "job-activity.log" when SSH keys have not been set up properly.

I think the UI server is currently using the cat-log --force-remote option, so I'm not surprised it isn't working correctly if SSH is failing.

dpmatthews · 2023-09-18T15:48:21Z

I think this issue can be resolved simply by removing the use of the --force-remote option. However, this will need a bit of investigation to make sure we understand the implications (@oliver-sanders originally thought it would be needed).

hjoliver · 2023-09-18T21:27:00Z

> We have a locally written command which provides access to the stdout and stderr of a job whilst it is running.

@jarich - I had forgotten we supported that, because we've never needed it at my site. But that sounds like something you should do at yours?

jarich · 2023-09-19T05:00:26Z

> > We have a locally written command which provides access to the stdout and stderr of a job whilst it is running.
>
> @jarich - I had forgotten we supported that, because we've never needed it at my site. But that sounds like something you should do at yours?

I will mention it to those who are able to decide whether to do it or not. I'll also look at making a local edit to remove --force-remote for the time being.

jarich · 2023-09-20T23:43:54Z

I commented out the --force-remote from

cylc-uiserver/cylc/uiserver/resolvers.py

Line 381 in b043bb9

'--force-remote',

and removed the -o from

cylc-uiserver/cylc/uiserver/resolvers.py

Line 459 in b043bb9

cmd: List[str] = ['cylc', 'cat-log', '-m', 'l', '-o', id_.id]

and I can confirm that this achieves the goals I want without yet causing any problems.
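
The local change described above amounts to dropping one element from the command list. A minimal sketch of a switchable version follows. The function name, the `force_remote` parameter and the example job ID are hypothetical and are not taken from cylc-uiserver or from the PR mentioned below.

```python
from typing import List


def list_cmd(job_id: str, force_remote: bool) -> List[str]:
    """Build the `cylc cat-log` list-mode command, optionally forcing remote."""
    cmd = ['cylc', 'cat-log', '-m', 'l']
    if force_remote:
        cmd.append('-o')  # short form of --force-remote
    cmd.append(job_id)
    return cmd


# "myflow//1/foo/01" is a made-up job ID used only for illustration.
print(list_cmd("myflow//1/foo/01", force_remote=True))
print(list_cmd("myflow//1/foo/01", force_remote=False))
```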

Scott will be creating a PR to allow this to be set by configuration.

hjoliver · 2023-09-21T04:58:49Z

> I think this issue can be resolved simply by removing the use of the --force-remote option. However, this will need a bit of investigation to make sure we understand the implications (@oliver-sanders originally thought it would be needed).

@oliver-sanders, in light of @jarich 's findings, do you recall the reason for the use of force-remote? Was it to ensure that we always get an up-to-date log file rather than a stale local one? Even if that's wanted, maybe we could eschew remote retrieval if the task has finished and a local log exists. (Speculations from a chat with Jacinta and Scott today).

jarich · 2023-09-21T05:06:42Z

Establishing that the job has finished could be done by calling get_task_job_attrs and I presume the required arguments for that could be determined by unwrapping id with parse_ids

dpmatthews · 2023-09-21T06:45:43Z

> Establishing that the job has finished could be done by calling get_task_job_attrs and I presume the required arguments for that could be determined by unwrapping id with parse_ids

As far as I can see, cat-log already gets the log files from the remote platform for running jobs so we don't need to use the --force-remote option in the UI server (and we don't need it to be configurable). We just need to confirm this works correctly in all cases.

oliver-sanders · 2023-09-21T09:31:54Z

> @oliver-sanders, in light of @jarich's findings, do you recall the reason for the use of force-remote?

We got things set up so that the cylc-uiserver log view worked. We were aware that we were going remote in some situations where we didn't necessarily need to. Improving cat-log to avoid this is on the list but a much lower priority for us than other ongoing work.

Some edge cases to consider:

Platforms where job log retrieval is not configured.
Lag between a job completing and the log files being synced back.
Log files which exceed the maximum filesize so are not synced back, but which still need to be visible.
Offline log file viewing for tasks which were active when the scheduler shut down.

The easiest way to avoid these issues without having to look into the implementation in depth was to go to the remote platform for the logs.

hjoliver · 2023-09-21T22:36:39Z

@dpmatthews - given @oliver-sanders' response there, I think making it configurable as per #509 is a reasonable solution for the moment, to give BOM what they need. With the proviso that the config might not be needed after some future release. Agree?

oliver-sanders · 2023-09-22T08:38:13Z

There's no need for configuration here, we just need to invest the time to make any required improvements to cat-log such that the option isn't needed.

Personally I have not had the time to spare to think this through so have left the default as is.

dpmatthews · 2023-09-22T08:59:17Z

We should put the effort into checking whether removal of the option causes any problems rather than adding configuration we don't want

hjoliver · 2023-09-24T22:15:34Z

> There's no need for configuration here, we just need to invest the time to make any required improvements to cat-log such that the option isn't needed.

> We should put the effort into checking whether removal of the option causes any problems rather than adding configuration we don't want

The problem with that is it's needed quite urgently (apparently) and your approach requires work from us that we probably cannot spare the time for right now.

(Hence my comment "I think making it configurable as per #509 is a reasonable solution for the moment, to give BOM what they need.")

Of course I'm not keen on adding future-unnecessary config to the system in general, but that can't be absolute - it depends on how long it will take to get the feature fixed.

I guess the alternative is temporary local patching.

dpmatthews · 2023-09-25T12:58:25Z

I think the risk of removing this option now is fairly small and we can probably address any issues it causes quite quickly if necessary. So, I'd prefer to just remove the option rather than cause additional work by making it configurable.

jarich · 2023-09-26T07:49:57Z

We can use the patch Scott provided for now, and wait to see what you settle on. So we don't require an immediate release.


oliver-sanders · 2025-02-06T14:05:09Z

I have dug through the cat-log code and done some testing.

Quick summary:

I think this PR resolves the issue as reported above: cat-log: list out/err files when available via tailer cylc-flow#6480
Turning the --force-remote option off would reduce network traffic when querying finished jobs only, but it comes with a small caveat.

> additional SSH connections, network traffic and load on the login hosts for zero value

It seems logical that removing --force-remote would help in this case, but removing this option won't actually prevent remote commands from being run whilst the job is running (the case reported in the original post) because cylc cat-log implicitly goes to the remote platform if the job is active (whether --force-remote is specified or not).
However, removing this option would prevent unnecessary remote commands once the job has finished (providing job log retrieval is configured).
But these remote commands are not of zero value (irrespective of PBS spooling behaviour) because the remote filesystem listing will pick up on other remote-only files such as job.status, job.xtrace, rose-bunch logs and any custom log files.

> PBS spools log messages somewhere and then moves them into the correct locations when the job is done

PBS may create its log files in some temporary location then move them into the job log directory after the job finishes.
Cylc can access these files from their temporary locations if you configure a "tailer" and "viewer" command (as Dave mentioned above).
However, log files accessible via "tailer" or "viewer" commands don't presently appear in the GUI because our file listing doesn't take them into account. This is a bug!
This issue will be resolved in Cylc 8.4.1: cat-log: list out/err files when available via tailer cylc-flow#6480
From Cylc 8.4.1 onwards, job.out and job.err files accessible via "tailer" / "viewer" commands will now appear along with any other remote log files in the GUI, even whilst the job is running.
(caveat, you must configure both the "tailer" and the "viewer" for this to work).

> It would be ideal if there was a configuration item that told cat-log to only serve files from local disk, if that is what the user wanted.

Removing the --force-remote option will result in users losing access to log files in some circumstances.
But this will only reduce functionality to match that of Cylc Review (which users are already used to the limitations of) so maybe not too bad.
We can work around the worst of the caveats.

Caveats we will / do handle:

Platforms where remote viewers/tailers are configured.
- WIP (8.4.1): cat-log: list out/err files when available via tailer cylc-flow#6480
Platforms where job log retrieval is not configured.
- We can detect this.
- cat-log will use the --force-remote option implicitly.
- Handled by: https://github.com/cylc/cylc-flow/blob/b4dec534fae8b9ce9999a796182cdf63cb5a6fdc/cylc/flow/scripts/cat_log.py#L573-L574
Offline log file viewing for tasks which were active when the scheduler shut down.
- We can detect this.
- cat-log will use the --force-remote option implicitly.
- Handled by: https://github.com/cylc/cylc-flow/blob/b4dec534fae8b9ce9999a796182cdf63cb5a6fdc/cylc/flow/scripts/cat_log.py#L545
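
The handled cases above can be summarised as a small decision function. This is a simplified sketch of the behaviour described in this comment, not the cat-log implementation. The function and parameter names are hypothetical, and the offline case (tasks that were active when the scheduler shut down) is folded into `job_active` for brevity.

```python
def goes_remote(
    job_active: bool,
    retrieval_configured: bool,
    force_remote: bool = False,
) -> bool:
    """Return True if logs would be read from the remote platform."""
    if force_remote:
        # Explicit request: always go remote.
        return True
    if job_active:
        # Active jobs are read remotely whether or not the option is given.
        return True
    if not retrieval_configured:
        # Nothing is synced back, so the remote copy is the only one.
        return True
    # Finished job with retrieval configured: use the local copy.
    return False


print(goes_remote(job_active=True, retrieval_configured=True))    # True
print(goes_remote(job_active=False, retrieval_configured=True))   # False
print(goes_remote(job_active=False, retrieval_configured=False))  # True
```

Only the second case (finished job, retrieval configured) is affected by removing --force-remote, which matches the point made earlier about running jobs.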

Caveats which are not handled:

Lag between a job completing and the log files being synced back.
- We can't easily detect this (remote log file retrieval is performed by a Cylc subprocess).
Log files which exceed the maximum filesize so are not synced back, but which still need to be visible.
- We can't easily detect this (rsync does the file-size filtering for us).

Symptoms of the remaining caveats:

If the log view is opened whilst a job is running:
- The remote-only log files will be listed in the dropdown, but will become unavailable as soon as the job finishes, resulting in a "file not found" error if the user selects them.
- If the log files are refreshed once the job has stopped running, the remote-only files will disappear from the listing.
If the log view is opened once the job has finished:
- The remote-only files will not be available.
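
These symptoms can be illustrated with plain sets. The sketch assumes, as a simplification, that the listing comes entirely from the remote platform while the job runs and entirely from local disk afterwards, and that only job.out and job.err were retrieved. The helper name and the file sets are illustrative.

```python
from typing import Set


def listed_files(job_running: bool, local: Set[str], remote: Set[str]) -> Set[str]:
    """Files offered in the log view when --force-remote is not used."""
    return remote if job_running else local


remote = {"job.out", "job.err", "job.status", "job.xtrace"}
local = {"job.out", "job.err"}  # assumed: only these were synced back

while_running = listed_files(True, local, remote)
after_finish = listed_files(False, local, remote)

# Files that were selectable while running but vanish once the job stops.
print(sorted(while_running - after_finish))  # ['job.status', 'job.xtrace']
```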

The main issue on the UX front is the dynamic behaviour of cat-log giving different results depending on the job state.

Question:

Removing the --force-remote option will avoid unnecessary remote commands being run against succeeded/failed/submit-failed jobs (but not submitted/running ones) when remote log retrieval is configured.
But in niche circumstances, it may make certain log files inaccessible via the GUI. The log view may appear buggy in these situations as file listings may change and files offered might not actually be accessible.
I don't think we can reasonably suppress this caveat, but it's not that big a deal.
Do we want to turn --force-remote off and accept the small caveat documented above?
On the flip side there is a potential caveat of not removing --force-remote which is that workflows might housekeep the remote platform more aggressively than the local platform.

Happy either way.

oliver-sanders · 2025-02-07T11:43:37Z

The two remaining caveats:

Lag between a job completing and the log files being synced back.
Log files which exceed the maximum filesize so are not synced back, but which still need to be visible.

Should be well addressed by these small cat-log changes:

cat-log: infer --force-remote for --mode=list-dir if job.out not present cylc-flow#6596
cat-log: infer --force-remote if log file not present locally cylc-flow#6597
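
The idea behind both changes is a local existence check before deciding whether to go remote. A minimal sketch of that check follows, assuming the local job log directory is known. The function names are hypothetical and do not come from the linked issues.

```python
import tempfile
from pathlib import Path


def infer_force_remote(local_job_dir: Path, filename: str) -> bool:
    """Go remote when the requested log file has no local copy."""
    return not (local_job_dir / filename).is_file()


def infer_force_remote_for_listing(local_job_dir: Path) -> bool:
    """Go remote for a directory listing when job.out is not yet local."""
    return not (local_job_dir / "job.out").is_file()


with tempfile.TemporaryDirectory() as tmp:
    job_dir = Path(tmp)
    print(infer_force_remote_for_listing(job_dir))    # True: nothing synced yet
    (job_dir / "job.out").write_text("hello\n")
    print(infer_force_remote(job_dir, "job.out"))     # False: local copy exists
    print(infer_force_remote(job_dir, "job.xtrace"))  # True: never retrieved
```

A check like this covers both remaining caveats without needing to know why the file is missing: sync lag and over-size files both show up as a missing local copy.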

With these done, we are clear to remove --force-remote, with all caveats handled as well as we reasonably can:

cat-log: drop the --force-remote option #667

Unrelated, but I have also opened a third small cat-log issue that will help to reduce the load impacts of running cat-log against finished jobs (whether local or otherwise):

cat-log: new auto cat/tail mode cylc-flow#6598

oliver-sanders · 2025-02-07T11:48:48Z

Themes extracted from the original post:

Non-visibility of job.out and job.err files due to PBS "spooling"
- Already handled in cylc-flow.
- The GUI side of this issue is now handled by: cat-log: list out/err files when available via tailer cylc-flow#6480
Use of the --force-remote option.
- Removed by cat-log: drop the --force-remote option #667
System load impact of cat-log commands:
- Reduced by cat-log: new auto cat/tail mode cylc-flow#6598

Hopefully this removes the motivation for a site-configuration to remove the --force-remote option. Although, as noted above, cylc cat-log will often go remote whether this option is specified or not so a UIS configuration won't make much difference. If there is still a need to prevent remote commands, I suggest opening an issue in cylc-flow as we will have to disable it at the level of the cat-log command itself.

I'm going to close this issue as superseded by the three issues linked above. Feel free to follow up if I've missed anything.

jarich changed the title ~~Allow configuration to tell cat-log to fetches logs from (local) disk rather than remote host~~ Allow configuration to tell cat-log to fetch logs from (local) disk rather than remote host Sep 15, 2023

ScottWales mentioned this issue Sep 21, 2023

Allow local logs for cat-log #509

Closed

8 tasks

oliver-sanders self-assigned this Feb 6, 2025

This was referenced Feb 7, 2025

cat-log: infer --force-remote for --mode=list-dir if job.out not present cylc/cylc-flow#6596

Open

cat-log: infer --force-remote if log file not present locally cylc/cylc-flow#6597

Open

cat-log: drop the --force-remote option #667

Open

oliver-sanders closed this as completed Feb 7, 2025
