
rmuir (Member) commented Oct 9, 2025

  • Gives us a 10-minute CI system
  • Expands test coverage to "ARM on Linux", which is a real use-case
  • Replaces the slowest runner (20 minutes) with the fastest runner (5 minutes)
  • Still tests with macOS ARM at night
  • Policeman Jenkins still tests macOS many times a day (and more reliably)
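For illustration, a per-PR test job that covers both x86 and ARM Linux could be written as a matrix. This is a sketch only: the job id, the step list, the JDK version and the `ubuntu-latest` entry are assumptions, not the actual contents of the Lucene workflow. Only the `ubuntu-24.04-arm` runner label comes from this discussion.

```yaml
# Sketch only: job id, steps and JDK version are assumptions,
# not copied from the real Lucene workflow.
jobs:
  test:
    name: "Run checks (${{ matrix.os }})"
    strategy:
      fail-fast: false            # let the other OS finish if one fails
      matrix:
        os: [ ubuntu-latest, ubuntu-24.04-arm ]   # x86-64 and ARM64 Linux
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '21'      # assumed JDK version
      - run: ./gradlew check
```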

github-actions bot (Contributor) commented Oct 9, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

dweiss (Contributor) commented Oct 9, 2025

I don't think Uwe's mac vm is an equivalent to these gh runners. I'd like to believe gh mac runners are actually on apple hardware while Uwe's is a hackintosh. Not that we ever discovered anything interesting up until now - I say this because I distinctly remember those mac runners on gh being much faster than anything else. It could have been the number of available cores or maybe something has changed externally.

@@ -0,0 +1,56 @@
name: "Run checks on MacOS: all modules"
Contributor

I think we can merge this back into the single run-check-all and have a conditional parameter stating which OSs you want to run on? The point of having this single workflow was to decrease the maintenance of those gradle parameters and checks. I don't mind pulling it out for the time being until the problem with macs is sorted out.
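One way to get what is suggested here is a reusable workflow (`on: workflow_call`) that takes the runner list as an input and feeds it into the matrix with `fromJSON`. A minimal sketch, assuming a hypothetical file name `run-checks-all.yml` and a hypothetical input `os_list`; the macOS-only step stands in for the "mac hacks", and its contents are a placeholder.

```yaml
# Hypothetical reusable workflow: .github/workflows/run-checks-all.yml
# The input name "os_list" is made up for illustration.
name: "Run checks: all modules"

on:
  workflow_call:
    inputs:
      os_list:
        description: "JSON array of runner labels to test on"
        type: string
        default: '["ubuntu-latest", "ubuntu-24.04-arm"]'

jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        os: ${{ fromJSON(inputs.os_list) }}
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      # Placeholder for macOS-specific workarounds ("mac hacks");
      # skipped on every other runner.
      - name: macOS-only setup
        if: runner.os == 'macOS'
        run: echo "macOS-specific workarounds would go here"
      - run: ./gradlew check
```

A calling workflow would then reference it at job level with `uses: ./.github/workflows/run-checks-all.yml` and pass, for example, `os_list: '["macos-latest"]'` under `with:`. The cost of this design is that the macOS-specific steps live in the shared file, guarded by `if:` conditions.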

Member Author

Yeah, I feel bad copy-pasting the job; I just didn't want strange behavior, since I'm adding a cron here. We can probably merge into a single workflow, but it would mean dragging the mac hacks back in too :(

Member Author

@dweiss does this work better?

rmuir (Member Author) commented Oct 9, 2025

> I don't think Uwe's mac vm is an equivalent to these gh runners. I'd like to believe gh mac runners are actually on apple hardware while Uwe's is a hackintosh. Not that we ever discovered anything interesting up until now - I say this because I distinctly remember those mac runners on gh being much faster than anything else. It could have been the number of available cores or maybe something has changed externally.

I agree, which is why I don't just drop the job, but instead try to reduce the frequency of it. These mac runners are a struggle and also have a high cost. I don't want to lose test coverage, but a nightly job would really take the pressure off, as opposed to invoking these on every PR/push.
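A nightly-only macOS workflow can be driven by a `schedule` event instead of `push`/`pull_request`. A sketch, assuming the `macos-latest` runner label and an arbitrary 04:00 UTC start time; the job body is simplified and hypothetical.

```yaml
# Sketch of a nightly-only macOS workflow; cron time and job body are assumptions.
name: "Run checks on MacOS: all modules"

on:
  schedule:
    - cron: "0 4 * * *"   # every day at 04:00 UTC
  workflow_dispatch:        # also allow manual runs from the Actions tab

jobs:
  test:
    runs-on: macos-latest   # Apple Silicon (ARM64) runner
    steps:
      - uses: actions/checkout@v4
      - run: ./gradlew check
```

Scheduled workflows run against the latest commit on the default branch, and GitHub may delay scheduled runs during periods of high load, so the exact start time is not guaranteed.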

The Ubuntu one gives us more realistic ARM testing (IMO) for search engines running on server-side ARM. It is efficient, and it is a real use-case, e.g. deploying to the cloud on Graviton CPUs and the like. It is a good one to run on every PR.

rmuir (Member Author) commented Oct 9, 2025

I'll go as far as to say that we should change the "Run checks: all modules / checks without tests" job to run on ubuntu-24.04-arm. The runner is simply faster than ubuntu-latest for our use-case.
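In workflow terms, the proposal amounts to changing the `runs-on` label of that job. A hypothetical fragment: the job id and the Gradle invocation are assumptions, and `check -x test` is used here only to illustrate "checks without tests".

```yaml
# Hypothetical job definition; only the runs-on line reflects the proposal.
jobs:
  lint:
    name: "checks without tests"
    runs-on: ubuntu-24.04-arm         # previously: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./gradlew check -x test  # run all checks, skip the test task
```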

rmuir (Member Author) commented Oct 9, 2025

I tried running the linter on the ARM runner: with a cached build, it saves only a few seconds versus Intel for the linting job (which takes around 3-4 minutes), so it's not worth the trouble. It's nice that all the linters work correctly on ARM, though!

dweiss (Contributor) left a comment

Fancy, I like this.

rmuir (Member Author) commented Oct 9, 2025

> I don't think Uwe's mac vm is an equivalent to these gh runners. I'd like to believe gh mac runners are actually on apple hardware while Uwe's is a hackintosh. Not that we ever discovered anything interesting up until now - I say this because I distinctly remember those mac runners on gh being much faster than anything else. It could have been the number of available cores or maybe something has changed externally.

These mac runners are sometimes quite fast, even approaching 6 minutes (latest main build): https://github.com/apache/lucene/actions/runs/18367514153/job/52323135287

But sometimes they take 20 minutes, like the two builds that ran on the PR before it was merged. The variation is wildly unpredictable. It may be some kind of throttling or old hardware; I have not fully debugged it, nor am I sure I want to. This PR tries to make the problem less annoying.

dweiss (Contributor) commented Oct 9, 2025

We don't see behind the scenes of how it's organized. It could be that you're hitting a runner on a heavily loaded machine where VM scheduling isn't fair, or an older piece of hardware. It gets really complicated to trace a problem like this back to its root cause these days.

rmuir (Member Author) commented Oct 9, 2025

When debugging, I just dumped the sysctls: they show a "virtualized cpu" and a list of features, nothing exciting. You can find other complaints about the problem on GitHub, with no clear solution. My gut feeling is that macOS is probably just not supported as well as the other operating system choices: maybe the virtualization around it is less mature, and maybe GitHub struggles to meet load demands too. Also, it's a quirky OS that does strange things.
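To reproduce that kind of diagnostic, a macOS-only step could dump the relevant sysctls at the start of a job. A sketch, assuming a job already running on a macOS runner; the exact sysctl names available differ between Intel and Apple Silicon machines, so the list below is an assumption, and each command is allowed to fail without failing the step.

```yaml
# Hypothetical diagnostic step for a job running on a macOS runner.
steps:
  - name: Dump CPU info (macOS only)
    if: runner.os == 'macOS'
    run: |
      sysctl machdep.cpu || true         # CPU model string and related fields
      sysctl hw.ncpu hw.memsize || true  # logical CPU count, memory in bytes
      sysctl hw.optional || true         # CPU feature flags
```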

dweiss added the skip-changelog label Oct 9, 2025
dweiss (Contributor) commented Oct 9, 2025

Take a look and see if it's what you had in mind, Robert:
[screenshot]

It may be a bit verbose, but it only shows up in GitHub runs, and it may be better to have too much information there than too little (?).

rmuir (Member Author) commented Oct 9, 2025

@dweiss that's awesome: it fills in the missing context in the CI build logs. Thank you!

rmuir merged commit fcbeefe into apache:main Oct 9, 2025 (15 checks passed)

Labels: module:build-infra, skip-changelog

2 participants