BitGenerator support #499

flying-sheep · 2025-06-06T16:29:34Z

See

Fixes #498

The idea is to have a safe wrapper around the npy_bitgen struct that implements rand::RngCore. That way pyo3 functions could be passed a np.random.Generator, get that wrapper from it, and pass it to Rust APIs, which could then call its methods repeatedly.

The way it’s implemented, the workflow would look like this:

1. Acquire the GIL.
2. Downcast a np.random.BitGenerator instance into a numpy::random::PyBitGenerator.
3. Call .lock() on it to get a numpy::random::PyBitGeneratorGuard.
4. Release the GIL.
5. Call functions on the guard object without needing to hold the GIL.
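The steps above can be modelled without Python at all. What follows is a minimal std-only sketch, not the code in this PR: `ModelBitGenerator` and `ModelGuard` are hypothetical stand-ins for `PyBitGenerator` and `PyBitGeneratorGuard`, a `std::sync::Mutex` stands in for the generator's Python-side lock, and SplitMix64 stands in for the real bit generator.

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};

/// Hypothetical stand-in for `numpy::random::PyBitGenerator`.
pub struct ModelBitGenerator {
    state: Mutex<u64>,
}

/// Hypothetical stand-in for `PyBitGeneratorGuard`.
/// Dropping the guard releases the lock.
pub struct ModelGuard<'a> {
    state: MutexGuard<'a, u64>,
}

impl ModelBitGenerator {
    pub fn new(seed: u64) -> Self {
        Self { state: Mutex::new(seed) }
    }

    /// Models `.lock()`: take the lock without blocking, or report why not.
    pub fn lock(&self) -> Result<ModelGuard<'_>, &'static str> {
        match self.state.try_lock() {
            Ok(state) => Ok(ModelGuard { state }),
            Err(TryLockError::WouldBlock) => Err("BitGenerator is already locked"),
            Err(TryLockError::Poisoned(_)) => Err("lock poisoned by a panicking holder"),
        }
    }
}

impl ModelGuard<'_> {
    /// Models drawing values through the guard (one SplitMix64 step).
    pub fn next_u64(&mut self) -> u64 {
        *self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = *self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}
```

In this model the guard borrows the generator, so the compiler rejects any use of the guard after the generator is gone. The real guard cannot lean on a borrow like that once the GIL is released, which is why the safety of the actual implementation needs separate care.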

TODO:

I see local crashes when running all tests, so there’s probably some UB, I’d appreciate help to fix it.

Safety

If somebody releases the threading lock of the BitGenerator while we’re using it, this isn’t safe 🤔

API design options

I could make this more complex by adding a new trait implemented by both PyBitGenerator and PyBitGeneratorGuard, letting callers choose whether to

- use the PyBitGenerator’s random_* methods directly on that object while holding the GIL, without locking it, or
- use it as it works now: lock the np.random.BitGenerator and get back a GIL-free object.

But for now I just implemented the use case that’s actually desired.
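To make the trait option concrete, here is a rough std-only sketch of its shape. Everything in it is hypothetical: the trait name `NextU64`, the model types, and the SplitMix64 generator are illustrations, not the PR's API. Exclusive access through `&mut self` plays the role of "holding the GIL" for the direct style, and a `MutexGuard` plays the role of the locked, GIL-free style.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical common interface for both access styles.
pub trait NextU64 {
    fn next_u64(&mut self) -> u64;
}

/// One SplitMix64 step, standing in for the real bit generator.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

pub struct ModelBitGenerator {
    state: Mutex<u64>,
}

pub struct ModelGuard<'a> {
    state: MutexGuard<'a, u64>,
}

impl ModelBitGenerator {
    pub fn new(seed: u64) -> Self {
        Self { state: Mutex::new(seed) }
    }

    /// Locked style: blocks until the lock is free, then hands out a guard.
    pub fn lock(&self) -> ModelGuard<'_> {
        ModelGuard { state: self.state.lock().unwrap() }
    }
}

/// Direct style: `&mut self` already proves exclusive access, so no lock is taken.
impl NextU64 for ModelBitGenerator {
    fn next_u64(&mut self) -> u64 {
        splitmix64(self.state.get_mut().unwrap())
    }
}

/// Locked style: the guard owns the lock for as long as it lives.
impl NextU64 for ModelGuard<'_> {
    fn next_u64(&mut self) -> u64 {
        splitmix64(&mut *self.state)
    }
}

/// Downstream code is written once against the trait.
pub fn sum_of_draws<R: NextU64>(rng: &mut R, n: usize) -> u64 {
    (0..n).fold(0u64, |acc, _| acc.wrapping_add(rng.next_u64()))
}
```

The cost of this design is a second public abstraction to document and keep sound, which is the extra complexity mentioned above.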

Icxolu

This looks like a useful addition! Thanks for working on it. I'm definitely not an expert here, but I left a few comments about things that stood out to me. Let me know what you think.
Also, are there any differences between numpy v1 and v2 that we need to consider?

Icxolu · 2025-06-08T13:44:09Z

.vscode/settings.json

This should be removed

sure, will do when I’m done. I like working on multiple machines, and I don’t like re-doing settings for individual projects

src/random.rs

Icxolu

I still don't love the drop impl, but with the way to manually release it with a Python token, it may be acceptable. Maybe @davidhewitt has an idea and/or comments about the approach. Otherwise I only have a few minor remarks.

src/random.rs

flying-sheep · 2025-06-09T21:14:21Z

Thanks for the comments, I’ll address them!

The main issue is that I think I’m triggering UB somehow and I don’t know how: when running all tests, often some unrelated test run after this one crashes …

> Also, are there any differences between numpy v1 and v2 that we need to consider?

I didn’t forget about this either, will look!

/edit: the C API for random has been available since 1.19: https://numpy.org/doc/1.26/reference/random/c-api.html

mejrs · 2025-06-10T11:05:44Z

src/random.rs

+//! # use pyo3::prelude::*;
+//! use rand::Rng as _;
+//! # use numpy::random::{PyBitGenerator, PyBitGeneratorMethods as _};
+//! # // TODO: reuse function definition from above?


It feels like there should be a convenient way to get this. I'm thinking about something like

impl PyBitGenerator { fn new(py: Python<'_>) -> PyResult<Bound<..>>; }

there are many implementations, we’d have to cover all of them.

I’d rather leave this minimal until this PR is mostly done.

src/random.rs

Icxolu · 2025-06-10T16:01:24Z

src/random.rs

+            .getattr(intern!(py, "capsule"))?
+            .downcast_into::<PyCapsule>()?;
+        let lock = self.getattr(intern!(py, "lock"))?;
+        // we’re holding the GIL, so there’s no race condition checking the lock and acquiring it later.


This may not be true under free-threaded Python. Is the lock known to be thread-safe, and does acquire simply fail if the lock is already held? If not, we may need to guard the whole module under cfg(not(Py_GIL_DISABLED))

it doesn’t fail, it hangs, but that’s configurable with a timeout or by making it non-blocking: https://docs.python.org/3/library/threading.html#threading.Lock.acquire

and it’s a threading.Lock!
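The concern in this thread is the gap between checking a lock and acquiring it. Python's `Lock.acquire(blocking=False)` closes that gap by doing both in one call. The same idea in a std-only sketch, using a hypothetical `FlagLock` built on an atomic compare-and-exchange (an illustration only, not NumPy's lock and not the PR's code):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical flag-based lock used only for illustration.
pub struct FlagLock {
    locked: AtomicBool,
}

impl FlagLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    /// Check and acquire in one atomic step; never blocks.
    /// Returns `true` only for the caller that flipped the flag.
    pub fn try_acquire(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    pub fn release(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

/// Spawn `n` threads that each try to acquire once, without releasing.
/// Returns how many of them succeeded.
pub fn count_winners(n: usize) -> usize {
    let lock = Arc::new(FlagLock::new());
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let lock = Arc::clone(&lock);
            thread::spawn(move || lock.try_acquire())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&won| won)
        .count()
}
```

Because the check and the acquisition are a single atomic operation, exactly one thread wins regardless of scheduling, with or without a GIL. A separate "is it locked?" query followed by a later acquire does not give that guarantee.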

src/random.rs

flying-sheep · 2025-06-10T17:09:27Z

OK, with the release attr and changing the parallel test to use the explicit release as well, the UB now sometimes manifests as a lock poisoning error. progress?

Icxolu · 2025-06-10T18:32:59Z

I may have found a problem:

This fails as intended:

Python::with_gil(|py| {
    let obj = get_bit_generator(py)?;
    let a = obj.lock()?;
    let b = obj.lock()?;

    Ok::<_, PyErr>(())
})
.unwrap();

returning

called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'RuntimeError'>, value: RuntimeError('BitGenerator is already locked'), traceback: None }

But this does not fail:

Python::with_gil(|py| {
    let a = get_bit_generator(py)?.lock()?;
    let b = get_bit_generator(py)?.lock()?;

    Ok::<_, PyErr>(())
})
.unwrap();

and crucially it gives the same pointers:

[src/random.rs:113:18] ptr = 0x00007b9f6be44cc0
[src/random.rs:113:18] *ptr = bitgen_t {
    state: 0x00007b9f6be44d08,
    next_uint64: 0x00007b9f6837d320,
    next_uint32: 0x00007b9f6837d370,
    next_double: 0x00007b9f6837d3f0,
    next_raw: 0x00007b9f6837d320,
}
[src/random.rs:113:18] ptr = 0x00007b9f6be44cc0
[src/random.rs:113:18] *ptr = bitgen_t {
    state: 0x00007b9f6be44d08,
    next_uint64: 0x00007b9f6837d320,
    next_uint32: 0x00007b9f6837d370,
    next_double: 0x00007b9f6837d3f0,
    next_raw: 0x00007b9f6837d320,
}

So when using multiple threads, for example multiple tests running in parallel, we have a data race on the state. I think we need a lock across all instances to make this work.
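One way to read "a lock across all instances" is a process-wide registry keyed by the state address, so that two wrappers pointing at the same `bitgen_t` state cannot both be locked. The sketch below is std-only and hypothetical (`LOCKED_STATES` and `StateLock` are invented names, not part of the PR), and it only models the bookkeeping, not the FFI side.

```rust
use std::collections::BTreeSet;
use std::sync::Mutex;

/// Hypothetical registry of state addresses that are currently locked.
static LOCKED_STATES: Mutex<BTreeSet<usize>> = Mutex::new(BTreeSet::new());

/// Proof that one state address is held; the address is freed again on drop.
pub struct StateLock {
    addr: usize,
}

impl StateLock {
    /// Returns `None` if any other holder already locked this address.
    pub fn acquire(addr: usize) -> Option<Self> {
        let mut locked = LOCKED_STATES.lock().unwrap();
        // `insert` returns false when the address was already present.
        locked.insert(addr).then(|| StateLock { addr })
    }
}

impl Drop for StateLock {
    fn drop(&mut self) {
        LOCKED_STATES.lock().unwrap().remove(&self.addr);
    }
}
```

With a registry like this, the second `lock()` in the non-failing example would be rejected because both calls resolve to the same state address, whichever Python object they came from.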

flying-sheep · 2025-06-11T09:34:43Z

Oh wow, so while default_rng(...).bit_generator.state is always different (somehow), default_rng(...).bit_generator.ctypes.state_address isn’t necessarily (somehow).

note the different seed sequence passed to the function, not even then is the state address different wtf:

>>> np.random.default_rng([1, 4]).bit_generator.ctypes.state_address
4355856392
>>> np.random.default_rng([2, 4]).bit_generator.ctypes.state_address
4355856392

I have no clue what to make of that. I just assumed different random state on the Python side means a different underlying struct, because how can that not be the case?

But anyway, you made me realize that the whole approach is flawed because the same BitGenerator can be passed from the Python side multiple times. So a generator passed from Python doesn’t have guaranteed independent state from another. Therefore if we want to use it, we’d have to use its threading lock as intended instead of abusing that lock into meaning “we can now do whatever we want with it”

So I think the way to go is instead of locking to use spawn to get independent child generators (which are always different, so we could use them to our hearts’ content, but also one should probably use one per thread anyway):

>>> [bg.ctypes.state_address for bg in np.random.default_rng().bit_generator.spawn(2)]
[4355860968, 4355862376]
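The "one child generator per thread" idea can be sketched in std-only Rust as follows. This is a simplified model under stated assumptions: `ModelBitGen` and its `spawn` are hypothetical, SplitMix64 stands in for a real bit generator, and children are seeded from parent output for brevity. NumPy derives children through its SeedSequence machinery instead, so this sketch makes no claim about statistical independence.

```rust
use std::thread;

/// SplitMix64, standing in for a real bit generator.
pub struct ModelBitGen {
    state: u64,
}

impl ModelBitGen {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Hypothetical analogue of `spawn(n)`: each child owns its own state.
    pub fn spawn(&mut self, n: usize) -> Vec<ModelBitGen> {
        (0..n).map(|_| ModelBitGen::new(self.next_u64())).collect()
    }
}

/// Give every thread its own child, so no state is shared and no lock is needed.
pub fn draw_in_threads(parent: &mut ModelBitGen, n_threads: usize) -> Vec<u64> {
    let handles: Vec<_> = parent
        .spawn(n_threads)
        .into_iter()
        .map(|mut child| thread::spawn(move || child.next_u64()))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Each child is moved into its thread, so ownership rules out aliasing of the state at compile time. That is the property the locking approach was trying to recover at run time.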

mejrs · 2025-06-11T09:45:23Z

Maybe we should skip the guard part and just lock and unlock within the RngCore implementation itself. Can you give an example for why you'd want this, and why the api has this form? Why would someone want to use this rather than the RngCore impl that rand ships with? Maybe we can come up with a better design.

flying-sheep · 2025-06-11T09:49:39Z

When implementing Python-facing APIs, having a rng: np.random.Generator parameter is common. I want to write code that actually respects that parameter and uses it instead of ignoring it or calling it once to seed the actual generator.

WIP bitgen

06d6ce1

flying-sheep changed the title ~~BItGenerator support~~ BitGenerator support Jun 6, 2025

flying-sheep added 14 commits June 6, 2025 19:11

nonnull

07e2416

fix and test

b611943

cmt

d93a264

safer: don’t allow trying to get BitGen from any PyAny

f52b2fa

less indirection

05814d6

add tryfrom

37d360e

implement rand

eed5b19

fmt

6c1a89b

rename and deref

d1909d3

order

bde2553

make into lock

a0b9ec5

docs

ee32246

more docs

1be6838

guard

2aa3d90

flying-sheep marked this pull request as ready for review June 8, 2025 12:44

Icxolu reviewed Jun 8, 2025

flying-sheep added 10 commits June 8, 2025 17:01

call_method0

0258e6d

reaname test

876001b

manually drop and capsule

71ce8be

remove useless test

2de7072

doctests

016eb7a

smaller

1f7f37f

clarify where to release the GIL

1d01c7a

safety

c90176a

oops

f49d3fa

less unsafe

a16846d

Icxolu reviewed Jun 8, 2025

add thread test

573d890

flying-sheep added 2 commits June 8, 2025 20:27

back to lock acquiring

06bb693

docs

663fa29

Icxolu reviewed Jun 9, 2025

flying-sheep added 6 commits June 10, 2025 10:11

no copy/clone

c6105c9

rename to release

3a0aa92

remove lifetime

a92861a

static

6dbb6dc

no mut ref conversion

b102d20

disambiguate

e5e440e

mejrs requested changes Jun 10, 2025

flying-sheep added 8 commits June 10, 2025 14:59

rand_core only

e73e3a2

rename bitgen type

c6493df

c_str macro

2327f36

intern strings

e5c6458

docs

e8cd5e8

more doc

0868405

clean up tests

8667203

no let-else

1fd7bb5

Icxolu reviewed Jun 10, 2025

use GILOnceCell::import

3913171
