Remove stale solaris configure references #13206
Conversation
A prior commit (PR open-mpi#13163) removed the stale Solaris components as we no longer support that environment. However, the PR left the Solaris configure references in the code base. This PR removes those references. It also removes a duplicate m4 file (opal_check_os_flavors.m4) that exists in the OAC configure area. All references to the OPAL version have been updated to OAC.

Signed-off-by: Ralph Castain <[email protected]>
Not sure the failure has anything to do with this PR - the singleton test is timing out. I do see the following:

```
testHIndexed (test_util_dtlib.TestUtilDTLib.testHIndexed) ... 20 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
```

Any ideas?
```diff
 #
 # Put -mt before -mthreads because HP-UX aCC will properly compile
 # with -mthreads (reading as -mt), but emit a warning about unknown
-# flags hreads. Stupid compilers.
+# flags threads. Stupid compilers.
```
This was not a typo. The compiler emitted a warning about unknown flags `hreads`, because the `t` was consumed as part of the `-mt` parsing.
Done - I put quotes around the word so it is a little more obviously intentional.
the unusual spelling is intentional.

Signed-off-by: Ralph Castain <[email protected]>
Something is messed up in your main branch - I'm seeing a bunch of errors like this one:

```
testCreateGroup (test_exceptions.TestExcSession.testCreateGroup) ...
--------------------------------------------------------------------------
Your application has invoked an MPI function that is not supported in
this environment.

MPI function: MPI_Group_from_session_pset
Reason: PMIx server unreachable
--------------------------------------------------------------------------
```

Looks like you are trying to test comm_spawn related functions and the "mpirun" server isn't getting spawned for some reason. The tests don't always just fail - you get lots of "proc not reachable" for the child job. Since it takes time for all those individual comm_spawn tests to fail, the overall CI test eventually times out. Again, I can't see how this is related to what is being done here. Did something sneak into your main branch?
This is expected behavior. Nothing to do at all with spawning processes. Just no server to handle pmix group construct ops.
Okay - so how do you guys get this CI to pass? I didn't touch the yaml file.
Check the mpi4py testcreatefromgroup unit test. That’s where the exception is special cased. |
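For reference, here is a minimal sketch of the special-casing pattern being described: treat the "unsupported in this environment" error as a skip rather than a failure. This is illustrative only, not mpi4py's actual test code; the `Create_from_group` call, test name, and string tag are assumptions based on the `MPI_Group_from_session_pset` error in the log above.

```python
# Illustrative sketch (not mpi4py's actual test code): special-case the
# "MPI function not supported in this environment" error as a skip.
import unittest
from mpi4py import MPI

class TestCreateGroup(unittest.TestCase):
    def test_create_from_group(self):
        group = MPI.COMM_WORLD.Get_group()
        try:
            # Assumed call from the MPI-4 group/session family named in
            # the error output above; requires a recent mpi4py/MPI-4 build.
            comm = MPI.Intracomm.Create_from_group(group, "org.example.tag")
        except MPI.Exception as exc:
            group.Free()
            # Without a PMIx server (singleton runs), Open MPI reports the
            # function as unsupported; skip instead of failing.
            self.skipTest(f"group construct ops unavailable: {exc}")
        else:
            group.Free()
            comm.Free()

if __name__ == "__main__":
    unittest.main()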
Just running it by hand, the problem is that mpi4py is running a bunch of singleton comm_spawn tests - and those are generating errors and an eventual hang. Here is a sample of them:

```
testArgsOnlyAtRoot (test_spawn.TestSpawnSingleWorld.testArgsOnlyAtRoot) ... 64 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
testCommSpawn (test_spawn.TestSpawnSingleWorldMany.testCommSpawn) ... ERROR
testErrcodes (test_spawn.TestSpawnSingleWorldMany.testErrcodes) ... ERROR
testNoArgs (test_spawn.TestSpawnSingleWorldMany.testNoArgs) ... ERROR
testToMemory (test_status.TestStatus.testToMemory) ... ERROR
test_util_dtlib (unittest.loader._FailedTest.test_util_dtlib) ... ERROR
```

and here is the traceback for the test that eventually hangs - note that it has called spawn 40 times!

```
test_apply (test_util_pool.TestProcessPool.test_apply) ... 17 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
^CTraceback (most recent call last):
File "/opt/hpc/build/mpi4py/test/main.py", line 361, in <module>
main(module=None)
File "/usr/local/lib/python3.11/unittest/main.py", line 102, in __init__
self.runTests()
File "/opt/hpc/build/mpi4py/test/main.py", line 346, in runTests
super().runTests()
File "/usr/local/lib/python3.11/unittest/main.py", line 274, in runTests
self.result = testRunner.run(self.test)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/unittest/runner.py", line 217, in run
test(result)
File "/usr/local/lib/python3.11/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/unittest/suite.py", line 122, in run
test(result)
File "/usr/local/lib/python3.11/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/unittest/suite.py", line 122, in run
test(result)
File "/usr/local/lib/python3.11/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/unittest/suite.py", line 122, in run
test(result)
File "/usr/local/lib/python3.11/unittest/case.py", line 678, in __call__
return self.run(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/unittest/case.py", line 623, in run
self._callTestMethod(testMethod)
File "/usr/local/lib/python3.11/unittest/case.py", line 579, in _callTestMethod
if method() is not None:
^^^^^^^^
File "/opt/hpc/build/mpi4py/test/test_util_pool.py", line 80, in test_apply
self.assertEqual(papply(sqr, (5,)), sqr(5))
^^^^^^^^^^^^^^^^^
File "/opt/hpc/build/mpi4py/build/lib.linux-aarch64-cpython-311/mpi4py/util/pool.py", line 79, in apply
return future.result()
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 451, in result
self._condition.wait(timeout)
File "/usr/local/lib/python3.11/threading.py", line 327, in wait
waiter.acquire()
[rhc-node01:10447] dpm_disconnect_init: error -12 in isend to process 0
```

...bunch of error outputs like the one below:

```
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.
Process 1 ([[24043,0],0]) is on host: rhc-node01
Process 2 ([[51572,40],0]) is on host: rhc-node01
BTLs attempted: self sm smcuda
Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
```
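For what it's worth, the failing path can be exercised with something much smaller than the pool tests. The sketch below is an assumed reduction, not one of the actual mpi4py tests: a singleton parent (launched with plain `python`, no `mpirun`) spawning children, which is where the "unreachable proc" help message shows up in the runs above. The `child.py` helper name is hypothetical.

```python
# Minimal singleton comm_spawn reproducer; run as `python spawn_parent.py`
# without mpirun so that Open MPI initializes as a singleton.
import sys
from mpi4py import MPI

# child.py is a hypothetical child program; any MPI program that calls
# MPI.Comm.Get_parent() and then Disconnect()s will do.
intercomm = MPI.COMM_SELF.Spawn(sys.executable, args=["child.py"], maxprocs=2)

# In the failing environment, the spawn or the disconnect is where the
# "unreachable proc" errors and the eventual hang appear.
intercomm.Disconnect()
```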