Remove stale solaris configure references #13206

Open
wants to merge 2 commits into main

Conversation

@rhc54 (Contributor) commented Apr 22, 2025

A prior commit (PR #13163) removed the stale Solaris components, as we no longer support that environment. However, it left the Solaris configure references in the code base.

This PR removes those references. It also removes a duplicate m4 file (opal_check_os_flavors.m4) whose equivalent already exists in the OAC configure area. All references to the OPAL version have been updated to OAC.
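
For illustration, the reference updates might look something like this in a component's configure.m4 fragment (a sketch only; the macro names are inferred from the m4 file name and the description above, not taken from the actual diff):

dnl Before: call the duplicated OPAL copy of the macro
OPAL_CHECK_OS_FLAVORS

dnl After: call the equivalent macro from the OAC configure area
dnl (assumed name; the macro actually used in this PR may differ)
OAC_CHECK_OS_FLAVORS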

rhc54 requested review from jsquyres and bwbarrett Apr 22, 2025 15:20
rhc54 self-assigned this Apr 22, 2025
@rhc54 (Contributor, Author) commented Apr 22, 2025

Not sure the failure has anything to do with this PR - the singleton test is timing out. I do see the following:

testHIndexed (test_util_dtlib.TestUtilDTLib.testHIndexed) ... 20 more processes
have sent help message help-mca-bml-r2.txt / unreachable proc

Any ideas?

bwbarrett previously approved these changes Apr 22, 2025

 #
 # Put -mt before -mthreads because HP-UX aCC will properly compile
 # with -mthreads (reading as -mt), but emit a warning about unknown
-# flags hreads. Stupid compilers.
+# flags threads. Stupid compilers.

Member

This was not a typo. The compiler emitted a warning about the unknown flag "hreads", because the "t" was consumed as part of the -mt parsing.
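
For context, the comment appears to guard a flag-probing pattern along these lines: candidate thread flags are tried in order and the first one that compiles is kept, so -mt has to be listed before -mthreads. A minimal generic sketch (assumed shape and placeholder variable names, not the actual Open MPI macro):

dnl Minimal sketch: try candidate flags in order and keep the first one
dnl that compiles.  THREAD_CFLAGS is a placeholder name for illustration.
THREAD_CFLAGS=
for flag in -mt -mthreads -pthread; do
    save_CFLAGS="$CFLAGS"
    CFLAGS="$CFLAGS $flag"
    AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[#include <pthread.h>]],
                                        [[pthread_self();]])],
                      [THREAD_CFLAGS="$flag"])
    CFLAGS="$save_CFLAGS"
    test -n "$THREAD_CFLAGS" && break
done

Because aCC only warns, rather than errors, when it misparses -mthreads, a check like this would accept it; listing -mt first avoids relying on that behavior.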

rhc54 (Contributor, Author)

Done - I put quotes around the word so it is a little more obviously intentional.

Commit: "the unusual spelling is intentional."
Signed-off-by: Ralph Castain <[email protected]>

@rhc54 (Contributor, Author) commented Apr 22, 2025

Something is messed up in your main branch - I'm seeing a bunch of errors like this one:

testCreateGroup (test_exceptions.TestExcSession.testCreateGroup) ... --------------------------------------------------------------------------
Your application has invoked an MPI function that is not supported in
this environment.

  MPI function: MPI_Group_from_session_pset
  Reason:       PMIx server unreachable
--------------------------------------------------------------------------

Looks like you are trying to test comm_spawn related functions and the "mpirun" server isn't getting spawned for some reason. The tests don't always just fail - you get lots of "proc not reachable" for the child job. Since it takes time for all those individual comm_spawn tests to fail, the overall CI test eventually times out.

Again, I can't see how this is related to what is being done here. Did something sneak into your main branch?

@hppritcha (Member)

This is expected behavior. Nothing to do at all with spawning processes. Just no server to handle pmix group construct ops.

@rhc54 (Contributor, Author) commented Apr 23, 2025

> This is expected behavior. Nothing to do at all with spawning processes. Just no server to handle pmix group construct ops.

Okay - so how do you guys get this CI to pass? I didn't touch the yaml file.

@hppritcha (Member)

Check the mpi4py testcreatefromgroup unit test. That’s where the exception is special cased.

@rhc54 (Contributor, Author) commented Apr 23, 2025

Just running it by hand, the problem is that mpi4py is running a bunch of singleton comm_spawn tests - and those are generating errors and an eventual hang. Here is a sample of them:

testArgsOnlyAtRoot (test_spawn.TestSpawnSingleWorld.testArgsOnlyAtRoot) ... 64 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
testCommSpawn (test_spawn.TestSpawnSingleWorldMany.testCommSpawn) ... ERROR
testErrcodes (test_spawn.TestSpawnSingleWorldMany.testErrcodes) ... ERROR
testNoArgs (test_spawn.TestSpawnSingleWorldMany.testNoArgs) ... ERROR
testToMemory (test_status.TestStatus.testToMemory) ... ERROR
test_util_dtlib (unittest.loader._FailedTest.test_util_dtlib) ... ERROR

and here is the traceback for the test that eventually hangs - note that it has called spawn 40 times!

test_apply (test_util_pool.TestProcessPool.test_apply) ... 17 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
^CTraceback (most recent call last):
  File "/opt/hpc/build/mpi4py/test/main.py", line 361, in <module>
    main(module=None)
  File "/usr/local/lib/python3.11/unittest/main.py", line 102, in __init__
    self.runTests()
  File "/opt/hpc/build/mpi4py/test/main.py", line 346, in runTests
    super().runTests()
  File "/usr/local/lib/python3.11/unittest/main.py", line 274, in runTests
    self.result = testRunner.run(self.test)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/unittest/runner.py", line 217, in run
    test(result)
  File "/usr/local/lib/python3.11/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/unittest/suite.py", line 122, in run
    test(result)
  File "/usr/local/lib/python3.11/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/unittest/suite.py", line 122, in run
    test(result)
  File "/usr/local/lib/python3.11/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/unittest/suite.py", line 122, in run
    test(result)
  File "/usr/local/lib/python3.11/unittest/case.py", line 678, in __call__
    return self.run(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/unittest/case.py", line 623, in run
    self._callTestMethod(testMethod)
  File "/usr/local/lib/python3.11/unittest/case.py", line 579, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
  File "/opt/hpc/build/mpi4py/test/test_util_pool.py", line 80, in test_apply
    self.assertEqual(papply(sqr, (5,)), sqr(5))
                     ^^^^^^^^^^^^^^^^^
  File "/opt/hpc/build/mpi4py/build/lib.linux-aarch64-cpython-311/mpi4py/util/pool.py", line 79, in apply
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 451, in result
    self._condition.wait(timeout)
  File "/usr/local/lib/python3.11/threading.py", line 327, in wait
    waiter.acquire()

[rhc-node01:10447] dpm_disconnect_init: error -12 in isend to process 0

...bunch of error outputs like the one below:

--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[24043,0],0]) is on host: rhc-node01
  Process 2 ([[51572,40],0]) is on host: rhc-node01
  BTLs attempted: self sm smcuda

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
