Topology.identical_molecule_groups is too slow #2035

@mattwthompson

Describe the bug

Subgraph isomorphism is central to using the toolkit on multi-molecule systems, but it is slow, especially for large and complicated systems.
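For readers unfamiliar with the operation, here is a minimal sketch of what grouping identical molecules by graph matching involves. It is not the toolkit's implementation: the `(elements, bonds)` tuples, `fingerprint`, `is_isomorphic`, and `group_identical` are all hypothetical names made up for illustration, and the matcher is a naive backtracking search. The point is only that a cheap invariant can narrow the candidates, while the expensive atom-by-atom matching is still needed to tell apart molecules that share it.

```python
from collections import Counter

# Toy molecule: (elements, bonds). elements[i] is the element symbol of atom i,
# bonds is a list of unique (i, j) index pairs. Hypothetical representation,
# not the toolkit's Molecule class.

def adjacency(n_atoms, bonds):
    adj = [set() for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def fingerprint(mol):
    """Cheap invariant: isomorphic molecules always share it, but not vice versa."""
    elements, bonds = mol
    adj = adjacency(len(elements), bonds)
    counts = Counter((e, len(adj[i])) for i, e in enumerate(elements))
    return tuple(sorted(counts.items()))

def is_isomorphic(mol_a, mol_b):
    """Naive backtracking search for an element- and bond-preserving atom mapping."""
    (elem_a, bonds_a), (elem_b, bonds_b) = mol_a, mol_b
    if len(elem_a) != len(elem_b) or len(bonds_a) != len(bonds_b):
        return False
    adj_a = adjacency(len(elem_a), bonds_a)
    adj_b = adjacency(len(elem_b), bonds_b)
    mapping = {}  # atom index in a -> atom index in b
    used = set()  # atom indices in b that are already taken

    def extend(i):
        if i == len(elem_a):
            return True
        for j in range(len(elem_b)):
            if j in used or elem_a[i] != elem_b[j] or len(adj_a[i]) != len(adj_b[j]):
                continue
            # Every already-mapped neighbour of i must map onto a neighbour of j.
            if all(mapping[k] in adj_b[j] for k in adj_a[i] if k in mapping):
                mapping[i] = j
                used.add(j)
                if extend(i + 1):
                    return True
                del mapping[i]
                used.remove(j)
        return False

    return extend(0)

def group_identical(molecules):
    """Map the index of each group's first molecule to the indices of its members."""
    groups = {}          # representative index -> member indices
    by_fingerprint = {}  # fingerprint -> representative indices
    for idx, mol in enumerate(molecules):
        reps = by_fingerprint.setdefault(fingerprint(mol), [])
        for rep in reps:
            if is_isomorphic(molecules[rep], mol):
                groups[rep].append(idx)
                break
        else:
            reps.append(idx)
            groups[idx] = [idx]
    return groups

water = (["O", "H", "H"], [(0, 1), (0, 2)])
water_reordered = (["H", "O", "H"], [(0, 1), (1, 2)])
c2h6o = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
ethanol = (c2h6o, [(0, 1), (1, 2), (2, 3), (0, 4), (0, 5), (0, 6), (1, 7), (1, 8)])
dimethyl_ether = (c2h6o, [(0, 2), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (1, 8)])

groups = group_identical([water, ethanol, water_reordered, dimethyl_ether])
print(groups)  # {0: [0, 2], 1: [1], 3: [3]}
```

Ethanol and dimethyl ether share the same fingerprint here, so only the full matching step separates them. In the worst case that step is repeated for every molecule against every existing group representative, which is where large, complicated molecules become expensive.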

To Reproduce

topology.json.zip

With a topology of fairly modest size (4119 atoms in 10 molecules), and even after refactoring to use a Rust re-implementation of networkx (#2033), this takes about 11 minutes:

In [1]: from openff.toolkit import Topology

In [2]: topology = Topology.from_json(open("topology.json").read())

In [3]: %timeit -o -r1 -n1 topology.identical_molecule_groups
11min 2s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Out[3]: <TimeitResult : 11min 2s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)>

In [4]: !open .

In [5]: topology.n_atoms, topology.n_molecules
Out[5]: (4119, 10)

Using the current main branch, it takes at least twice that (24 minutes, still running):

Image

Output

Computing environment (please complete the following information):

  • Operating system
  • Output of running conda list

Additional context

#1143 #1734 #353 #2008 openforcefield/openff-interchange#1156 etc.

Metadata

Labels

polymer-performance: Runtime of loading and/or parametrizing (bio)polymers

Projects

Status

Done
