
Non-trivial runtime of heterogeneous polymer system #193

@mattwthompson

Is your feature request related to a problem?

This is a user support question: I hoped for a blisteringly fast runtime for NAGL charge assignment, and all I'm getting is extremely fast.

I'm dealing with a system containing several large (~400 atoms) polymers which look similar but are not strictly isomorphic with each other.

topology.json.zip

In:

from openff.toolkit.utils.toolkits import (
    NAGLToolkitWrapper,
    toolkit_registry_manager,
    ToolkitRegistry,
    RDKitToolkitWrapper,
)
from openff.toolkit import Topology


def assign_nagl(topology):
    with toolkit_registry_manager(
        ToolkitRegistry(
            [
                NAGLToolkitWrapper(),
                RDKitToolkitWrapper(),
            ]
        )
    ):
        for molecule in topology.molecules:
            molecule.assign_partial_charges(
                partial_charge_method="openff-gnn-am1bcc-0.1.0-rc.3.pt",
            )
            assert molecule.partial_charges is not None


# Read the serialized topology, closing the file handle once it has been read
with open("topology.json") as file:
    topology = Topology.from_json(file.read())

print(
    f"{topology.n_atoms=}\n"
    + f"{topology.n_molecules=}\n"
    + f"{[molecule.n_atoms for molecule in topology.molecules]=}"
)

%timeit assign_nagl(topology)

Out:

topology.n_atoms=4119
topology.n_molecules=10
[molecule.n_atoms for molecule in topology.molecules]=[420, 414, 417, 408, 411, 411, 417, 411, 399, 411]
6.72 s ± 247 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
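For anyone reproducing this outside IPython, where the `%timeit` magic is unavailable, a rough stand-in using the standard library is sketched below. `workload` is a hypothetical placeholder for `assign_nagl(topology)`, and the standard deviation here is computed with `statistics.pstdev`, which may not match exactly how IPython reports its spread.

```python
import statistics
import timeit


def workload():
    """Hypothetical stand-in for assign_nagl(topology)."""
    return sum(i * i for i in range(10_000))


# Seven runs of one loop each, mirroring the "7 runs, 1 loop each" shape above.
timings = timeit.repeat(workload, repeat=7, number=1)

mean = statistics.mean(timings)
spread = statistics.pstdev(timings)

print(
    f"{mean:.6f} s ± {spread:.6f} s per loop "
    f"(mean ± std. dev. of {len(timings)} runs, 1 loop each)"
)
```

Swapping `workload` for a zero-argument callable such as `lambda: assign_nagl(topology)` should give numbers comparable to the ones reported here.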

Some naive profiling suggests that

  • most of the runtime is spent loading the model
  • the model is re-loaded from disk again for each molecule in this loop
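The kind of naive profiling described above can be sketched with `cProfile` from the standard library. The functions `load_model` and `assign_charges` are hypothetical stand-ins, not NAGL or toolkit functions; they only imitate a loop in which every call reloads a model before doing a small amount of work.

```python
import cProfile
import io
import pstats
import time


def load_model():
    """Hypothetical stand-in for reading model weights from disk."""
    time.sleep(0.01)
    return object()


def assign_charges(n_atoms):
    """Hypothetical stand-in: reloads the model on every call, then 'infers'."""
    load_model()
    return [0.0] * n_atoms


def profile_loop(atom_counts):
    """Profile the per-molecule loop and return the pstats report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for n_atoms in atom_counts:
        assign_charges(n_atoms)
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return stream.getvalue()


report = profile_loop([420, 414, 417])
print(report)
```

In the report, the `ncalls` column for the loading function equal to the number of molecules would be the sign that the model is loaded once per molecule rather than once overall.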

Describe the solution you'd like

The same, or similar, code to magically run 10-100x quicker. Or, more feasibly, a different way to interact with the model that allows it to be loaded only once.
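The "load it only once" idea can be sketched generically with a memoized loader. This is not how NAGL or the toolkit wrapper is implemented; `get_model`, `assign_charges`, and `LOAD_COUNT` are hypothetical names, and the body of `get_model` is a placeholder for whatever call actually reads the model file.

```python
from functools import lru_cache

# Counts how many times the (hypothetical) expensive load actually runs.
LOAD_COUNT = 0


@lru_cache(maxsize=None)
def get_model(model_name):
    """Load a model once per name; later calls return the cached object."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # Placeholder for the real, expensive read of the model file.
    return {"name": model_name}


def assign_charges(n_atoms, model_name):
    """Hypothetical per-molecule step that reuses the cached model."""
    model = get_model(model_name)
    assert model["name"] == model_name
    return [0.0] * n_atoms


atom_counts = [420, 414, 417, 408, 411, 411, 417, 411, 399, 411]
charges = [
    assign_charges(n, "openff-gnn-am1bcc-0.1.0-rc.3.pt") for n in atom_counts
]

print(f"{len(charges)} molecules processed, model loaded {LOAD_COUNT} time(s)")
```

With a cache keyed on the model name, the per-molecule loop pays the loading cost once and only the inference cost afterwards, which is the behavior requested here.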

Describe alternatives you've considered

I hoped that using the context manager meant the model would stay in memory between calls. Another way of using the toolkit is [NAGLToolkitWrapper().assign_partial_charges(molecule) for molecule in topology.molecules], but that is a touch slower at 6.92 s ± 187 ms per loop. I would expect that, since I think it re-instantiates NAGLToolkitWrapper for each molecule.

Additional context

I didn't try other models, since I understand this one to be the best, and the problem seems to be that the model is loaded too many times rather than that the first load takes too long.

I certainly did not try this with AM1-BCC, but I'm amused by the idea of doing so.
