Non-trivial runtime of heterogeneous polymer system

## Is your feature request related to a problem? ##

This is a user support question: I hoped for blisteringly fast NAGL charge assignment, and all I'm getting is extremely fast.

I'm dealing with a system containing several large (~400 atoms) polymers which look similar but are not strictly isomorphic with each other.

[topology.json.zip](https://github.com/user-attachments/files/19297942/topology.json.zip)

In:
```python
from openff.toolkit.utils.toolkits import (
    NAGLToolkitWrapper,
    toolkit_registry_manager,
    ToolkitRegistry,
    RDKitToolkitWrapper,
)
from openff.toolkit import Topology


def assign_nagl(topology):
    with toolkit_registry_manager(
        ToolkitRegistry(
            [
                NAGLToolkitWrapper(),
                RDKitToolkitWrapper(),
            ]
        )
    ):
        for molecule in topology.molecules:
            molecule.assign_partial_charges(
                partial_charge_method="openff-gnn-am1bcc-0.1.0-rc.3.pt",
            )
            assert molecule.partial_charges is not None


# Read the serialized topology, closing the file handle once it has been parsed
with open("topology.json") as file:
    topology = Topology.from_json(file.read())

print(
    f"{topology.n_atoms=}\n"
    + f"{topology.n_molecules=}\n"
    + f"{[molecule.n_atoms for molecule in topology.molecules]=}"
)

%timeit assign_nagl(topology)
```

Out:
```
topology.n_atoms=4119
topology.n_molecules=10
[molecule.n_atoms for molecule in topology.molecules]=[420, 414, 417, 408, 411, 411, 417, 411, 399, 411]
6.72 s ± 247 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Some naive profiling suggests that
* most of the runtime is spent loading the model
* the model is reloaded from disk for each molecule in this loop

<img width="1231" alt="Image" src="https://github.com/user-attachments/assets/a8546788-a457-4e01-9a75-54feea678810" />
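For anyone wanting to reproduce this kind of measurement without a GUI profiler, here is a minimal sketch using only the standard library's `cProfile` and `pstats`. The functions `load_model` and `assign_charges` are hypothetical stand-ins for the NAGL internals and for the loop above; they are not part of any OpenFF API. Swapping in the real `assign_nagl(topology)` call should give a comparable report.

```python
import cProfile
import io
import pstats
import time


def load_model():
    """Hypothetical stand-in for reading model weights from disk."""
    time.sleep(0.01)
    return object()


def assign_charges(n_molecules):
    """Hypothetical stand-in for the per-molecule loop; reloads on every pass."""
    for _ in range(n_molecules):
        load_model()


def profile_calls(func, *args):
    """Run func under cProfile and return the cumulative-time report as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


report = profile_calls(assign_charges, 10)
print(report)
```

In the printed report, the `ncalls` column for the loader shows how many times it ran, which is the quickest way to confirm a once-per-molecule reload.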

## Describe the solution you'd like ##

The same, or similar, code to magically run 10-100x quicker. Or, more feasibly, a different way to interact with the model that allows it to be loaded only once.
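To make "loaded only once" concrete, here is a minimal sketch of the pattern I have in mind, using `functools.lru_cache` to memoize the loader by model name. This is not how NAGL or the toolkit wrapper is implemented: `load_model` and `assign_all` are hypothetical names, and the dictionary returned by the loader is a placeholder for the real model object.

```python
from functools import lru_cache

# Counts how often the (pretend) expensive load actually runs.
load_calls = {"count": 0}


@lru_cache(maxsize=None)
def load_model(model_name: str):
    """Hypothetical loader: the body runs once per distinct model name."""
    load_calls["count"] += 1
    return {"name": model_name}  # placeholder for the real model object


def assign_all(molecules, model_name):
    """Hypothetical per-molecule loop that reuses the cached model."""
    results = []
    for molecule in molecules:
        model = load_model(model_name)  # cache hit after the first call
        results.append((molecule, model["name"]))
    return results


results = assign_all(range(10), "openff-gnn-am1bcc-0.1.0-rc.3.pt")
print(load_calls["count"], len(results))
```

The cache is keyed on the model name, so switching models would still trigger a fresh load, while repeated calls with the same name would not.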

## Describe alternatives you've considered ##

I hoped that using the context manager meant the model would stay in memory between calls. Another way of using the toolkit is `[NAGLToolkitWrapper().assign_partial_charges(molecule) for molecule in topology.molecules]`, but that is a touch slower at 6.92 s ± 187 ms per loop. Intuitively I would expect that, since I think it re-instantiates `NAGLToolkitWrapper` for each molecule.

## Additional context ##

I didn't try other models, since I understand this one to be the best. The problem also seems to be that the model is loaded too many times, not that the first load takes too long.

I certainly did not try this with AM1-BCC, but I'm amused by the idea of doing so.
