Description
Is your feature request related to a problem?
This is a user support question - I hoped for blistering fast runtime of NAGL charge assignment and all I'm getting is extremely fast.
I'm dealing with a system containing several large (~400 atoms) polymers which look similar but are not strictly isomorphic with each other.
In:
from openff.toolkit.utils.toolkits import (
    NAGLToolkitWrapper,
    toolkit_registry_manager,
    ToolkitRegistry,
    RDKitToolkitWrapper,
)
from openff.toolkit import Topology


def assign_nagl(topology):
    with toolkit_registry_manager(
        ToolkitRegistry(
            [
                NAGLToolkitWrapper(),
                RDKitToolkitWrapper(),
            ]
        )
    ):
        for molecule in topology.molecules:
            molecule.assign_partial_charges(
                partial_charge_method="openff-gnn-am1bcc-0.1.0-rc.3.pt",
            )
            assert molecule.partial_charges is not None


topology = Topology.from_json(open("topology.json").read())
print(
    f"{topology.n_atoms=}\n"
    + f"{topology.n_molecules=}\n"
    + f"{[molecule.n_atoms for molecule in topology.molecules]=}"
)
%timeit assign_nagl(topology)
Out:
topology.n_atoms=4119
topology.n_molecules=10
[molecule.n_atoms for molecule in topology.molecules]=[420, 414, 417, 408, 411, 411, 417, 411, 399, 411]
6.72 s ± 247 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Some naive profiling suggests that
- most of the runtime is spent loading the model
- the model is re-loaded from disk again for each molecule in this loop
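For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of profiling with the standard library's `cProfile`. The functions `load_model` and `assign_charges` are hypothetical stand-ins, not part of the toolkit or NAGL; in practice the profiled call would be something like `assign_nagl(topology)`.

```python
import cProfile
import io
import pstats
import time


def load_model():
    """Hypothetical stand-in for reading model weights from disk."""
    time.sleep(0.01)
    return object()


def assign_charges(n_molecules):
    """Hypothetical stand-in loop that reloads the model for every molecule."""
    for _ in range(n_molecules):
        load_model()


def profile_call(func, *args):
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


report = profile_call(assign_charges, 5)
print(report)
```

In the report, the `ncalls` column shows how often the loader ran, and the cumulative time column shows how much of the total it accounts for.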
Describe the solution you'd like
The same, or similar, code to magically run 10-100x quicker. Or, more feasibly, a different way to interact with the model that allows it to be loaded only once.
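To illustrate the "load only once" idea in general terms, here is a minimal sketch using `functools.lru_cache`. It is not the toolkit's or NAGL's API: `get_model`, `assign_all`, and the model name are hypothetical, and whether the real wrapper can be made to cache its model this way is an open question in this request.

```python
from functools import lru_cache

# Counts how many times the (hypothetical) expensive load actually runs.
LOAD_COUNT = 0


@lru_cache(maxsize=None)
def get_model(model_name):
    """Hypothetical loader: the body runs once per distinct model name."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": model_name}


def assign_all(molecules, model_name):
    """Look up the cached model for each molecule instead of reloading it."""
    results = []
    for molecule in molecules:
        model = get_model(model_name)
        results.append((molecule, model["name"]))
    return results


results = assign_all(
    ["polymer_a", "polymer_b", "polymer_c"], "example-model.pt"
)
print(LOAD_COUNT)
```

The loop still asks for the model once per molecule, but only the first request pays the loading cost; later requests return the cached object.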
Describe alternatives you've considered
I hoped that using the context manager meant the model would still be floating in memory. Another way of using the toolkit is [NAGLToolkitWrapper().assign_partial_charges(molecule) for molecule in topology.molecules] but that is a touch slower at 6.92 s ± 187 ms per loop. Intuitively I would expect it to be, since I think it needs to re-instantiate NAGLToolkitWrapper for each molecule.
Additional context
I didn't try other models, since I understand this one to be the best, and the problem seems to be that the model is loaded too many times rather than that the first load takes too long.
I certainly did not try this with AM1-BCC, but I'm amused by the idea of doing so.