BUG/TST: special.logsumexp on non-default device #22756
Conversation
special: run logsumexp on non-default device
Closes gh-22680?

TL;DR yes. It removes scipy's bugs and the user is left with the bugs of their backend of choice.
"""Test input device propagation to output.""" | ||
x = xp.asarray(x, device=nondefault_device) | ||
assert xp_device(logsumexp(x)) == nondefault_device | ||
assert xp_device(logsumexp(x, b=x)) == nondefault_device |
At first, if you're working on a machine with a single GPU for example, this PR may appear to be fairly straightforward. However, when I test on a node/machine that has multiple GPUs and use e.g.

SCIPY_DEVICE=cuda python dev.py test -t scipy/special/tests/test_logsumexp.py::TestLogSumExp::test_device -b cupy

this test currently fails on this branch:

scipy/special/tests/test_logsumexp.py:299: in test_device
    assert xp_device(logsumexp(x)) == nondefault_device
E   assert <CUDA Device 0> == <CUDA Device 1>

Does CuPy require special treatment? Do multiple GPUs require special treatment? It isn't immediately obvious to me, but the nature of the failure suggests that device propagation is not working as intended.

I'm using CuPy 13.3.0, which is fairly recent. I could try bumping the version, maybe.
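For context on where such a mismatch can come from, here is a hedged illustration (not taken from the PR diff; it requires a host with at least two CUDA devices): any helper allocation that doesn't receive an explicit device lands on the current default device, so intermediate results can drift back to device 0 even when the input lives on device 1.

```python
# Hedged illustration, not from the PR: shows how an allocation made
# without an explicit device ends up on the current (default) device.
# Requires a host with at least two CUDA devices.
import cupy as cp

with cp.cuda.Device(1):
    x = cp.asarray([1.0, 2.0, 3.0])   # input lives on device 1

helper = cp.full(x.shape, 0.0)        # allocated on the current device (0)
print(x.device, helper.device)        # <CUDA Device 1> <CUDA Device 0>
```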
Is our ecosystem currently shimming around cupy.cuda.Device to compensate for not having the device kwarg on the array coercion for CuPy?
> At first, if you're working on a machine with a single GPU for example, this PR may appear to be fairly straightforward.

On a single-GPU machine, CuPy has only one device and the test introduced by this PR is skipped.

> Is our ecosystem currently shimming around cupy.cuda.Device to compensate for not having the device kwarg on the array coercion for CuPy?

Yes, by array-api-compat.

Looks like multi-device support was not thought through: data-apis/array-api-compat#293
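For reference, the shim is roughly of the following shape; this is a simplified sketch of the approach rather than array-api-compat's actual code, and the function name is illustrative.

```python
# Simplified sketch of an array-api-compat-style device shim (illustrative
# name, not the library's actual code): since CuPy's asarray has no `device`
# keyword, the target device's context is entered around the allocation.
import cupy as cp

def _asarray_on_device(obj, /, *, device=None, **kwargs):
    if device is None:
        return cp.asarray(obj, **kwargs)
    with device:                      # cupy.cuda.Device is a context manager
        # Note: if obj is already a CuPy array on another device, asarray
        # may return it unchanged rather than copying it over.
        return cp.asarray(obj, **kwargs)
```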
Well, actually, it probably shouldn't close gh-22680. "Bug" or "lack of support for an experimental feature we eventually want to support", this is a systematic shortcoming of a lot of xp-translated code. There wasn't even GPU default-device testing in CI when a lot of the translations were done, and there have been a lot of issues in backends surrounding the …

I've re-enabled the test on torch; however, it will only run on a GPU-enabled host with SCIPY_DEVICE=cpu, which is not something that ever happens in CI.
This only superficially has to do with logsumexp, so I'll provide a superficial review and approval from the logsumexp side of things.

The changes there are minimal, but they appear to be complete. I think the only xp functions used by logsumexp (and private functions) that accept device are full, arange, and asarray. This PR provides the correct device argument to full and arange, and I think the calls to asarray can infer the device correctly based on the input.
Setting up the test fixture, etc., is not really in my wheelhouse, so I'll let others comment on that.
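As a concrete illustration of the pattern described above, here is a hedged sketch; the helper name and body are illustrative, not the PR's actual diff.

```python
# Hedged sketch of the device-threading pattern; illustrative, not the PR's
# actual code. `xp` is an array API namespace obtained from the input array.
def _device_aware_allocations(a, xp):
    device = a.device                          # device of the input array
    # Fresh allocations get the input's device passed explicitly ...
    zeros = xp.full(a.shape, 0.0, device=device)
    idx = xp.arange(a.shape[-1], device=device)
    # ... while asarray of an existing array already keeps that array's device.
    a2 = xp.asarray(a)
    return zeros, idx, a2
```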
Just to continue providing feedback on the multi-device scenario with the latest version of this branch plus the latest version of the cognate array-api-compat branch, the next point of failure for one of the test cases is here:

@@ -115,14 +119,18 @@ def logsumexp(a, axis=None, b=None, keepdims=False, return_sign=False):
     # Where result is infinite, we use the direct logsumexp calculation to
     # delegate edge case handling to the behavior of `xp.log` and `xp.exp`,
     # which should follow the C99 standard for complex values.
+    print("xp_device(a) at logsumexp checkpoint 4b:", xp_device(a))
+    xp.exp(a)
     b_exp_a = xp.exp(a) if b is None else b * xp.exp(a)
     sum_ = xp.sum(b_exp_a, axis=axis, keepdims=True)

Even the isolated xp.exp(a) call added for debugging exhibits the failure.
Tested that data-apis/array-api-compat#296 fully fixes PyTorch.

I've reworked the fixture to incorporate prior art from #19900 (@lucascolley).
# Note workaround when parsing SCIPY_DEVICE above.
# Also note that when SCIPY_DEVICE=cpu this test won't run in CI
# because CUDA-enabled CI boxes always use SCIPY_DEVICE=cuda.
pytest.xfail(reason="pytorch/pytorch#150199")
Workaround: data-apis/array-api-compat#299
# While this issue is specific to jax.jit, it would be unnecessarily
# verbose to skip the test for each jit-capable function and run it for
# those that only support eager mode.
pytest.xfail(reason="jax-ml/jax#26000")
Also see jax-ml/jax#27606 (fixed in next JAX release).
thanks Guido!
This PR sets up a test infrastructure to test device propagation when a function receives input arrays that do not live on the default device.
This benefits all multi-device backends, which means PyTorch, JAX, and (for testing purposes only) array-api-strict.
This PR fixes logsumexp, and incidentally finds that, at the time of writing, there are bugs in both JAX and PyTorch that prevent this from functioning properly; so I don't plan to extend the same treatment to other functions until the upstream bugs are solved.

Upstream issues:

- asarray: device does not propagate from input to output after set_default_device (pytorch/pytorch#150199)
- .device attribute inside @jax.jit (jax-ml/jax#26000)
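For readers unfamiliar with the setup, here is a minimal sketch of what such a device-propagation check can look like with array-api-strict, which ships extra fake devices for exactly this purpose; the fixture in the PR is more general, and running this assumes SciPy's experimental array API support is enabled (SCIPY_ARRAY_API=1).

```python
# Minimal sketch of a device-propagation check, assuming array-api-strict's
# extra fake devices and SciPy's experimental array API mode
# (SCIPY_ARRAY_API=1). The actual test fixture in the PR is more general.
import array_api_strict as xp
from scipy.special import logsumexp

info = xp.__array_namespace_info__()
nondefault_device = next(
    d for d in info.devices() if d != info.default_device()
)

x = xp.asarray([1.0, 2.0, 3.0], device=nondefault_device)
assert logsumexp(x).device == nondefault_device  # output follows the input
```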