Speedup for molecule isomorphism checking #2114

j-wags · 2025-10-14T21:17:45Z

- Fix "Topology.identical_molecule_groups is too slow" (#2035)
  - This makes the reproducing case run on my machine in ~0.5 sec, compared to 11 min using the old stack
- Add tests
- Update docstrings/documentation, if applicable
- Lint codebase
- Update changelog

codecov · 2025-10-14T22:13:07Z

Codecov Report

❌ Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.69%. Comparing base (7febf6d) to head (e204d49).
⚠️ Report is 1 commits behind head on main.


mattwthompson · 2025-10-15T03:39:49Z

openff/toolkit/topology/molecule.py

+                for node in data:
+                    h_counter = -1
+                    for neighbor in data.neighbors(node):
+                        if data.nodes[neighbor]['atomic_number'] == 1:
+                            data.nodes[neighbor]['atomic_number'] = h_counter
+                            h_counter -= 1


Does this depend on the order of the data.neighbors(node) iterator (in problematic ways)? I will try to think of an antagonistic test case tomorrow, if possible

j-wags · 2025-10-15

Thanks, this is an absolutely filthy hack, but I've been unable to think of edge cases that would trip it up. The order of the neighbors shouldn't matter, since they're all chemically equivalent hydrogens and we specify stereochemistry on the central atom using CIP rules, which should be invariant to the ordering of the bonds/neighbors (as opposed to tying it to the ordering of bonds, where losing track of the Hs' identities could MAYBE scramble things in some complex way).

I'd love any other ideas you come up with, even if they're just smells that you can't solidify into test cases. This is for sure cooked if we ever get an H with two bonds or try to handle deuterium or something.
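To make the idea in the hunk above concrete, here is a minimal standalone sketch using plain networkx graphs. It is not the toolkit's implementation: the helper names (`relabel_hydrogens`, `count_mappings`, `build_methane`) are hypothetical, and the graph carries only an `atomic_number` node attribute. It assumes every hydrogen has exactly one bond and that no real atom starts with a negative atomic number. Giving the hydrogens on each atom distinct negative labels removes the interchangeable-hydrogen symmetry that the matcher would otherwise have to enumerate.

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher


def relabel_hydrogens(graph: nx.Graph) -> nx.Graph:
    """Return a copy where the hydrogens on each atom get labels -1, -2, ...

    Assumes monovalent hydrogens and no pre-existing negative atomic numbers.
    """
    relabeled = graph.copy()  # new attribute dicts, so the input is untouched
    for node in relabeled:
        h_counter = -1
        for neighbor in relabeled.neighbors(node):
            if relabeled.nodes[neighbor]["atomic_number"] == 1:
                relabeled.nodes[neighbor]["atomic_number"] = h_counter
                h_counter -= 1
    return relabeled


def count_mappings(a: nx.Graph, b: nx.Graph) -> int:
    """Count the atom mappings that match on atomic number."""
    matcher = GraphMatcher(
        a, b, node_match=lambda x, y: x["atomic_number"] == y["atomic_number"]
    )
    return sum(1 for _ in matcher.isomorphisms_iter())


def build_methane(hydrogen_ids=(1, 2, 3, 4)) -> nx.Graph:
    """Carbon is node 0; hydrogens are attached in the order given."""
    graph = nx.Graph()
    graph.add_node(0, atomic_number=6)
    for h in hydrogen_ids:
        graph.add_node(h, atomic_number=1)
        graph.add_edge(0, h)
    return graph


methane = build_methane()
print(count_mappings(methane, methane))  # 24: every permutation of the four Hs
relabeled = relabel_hydrogens(methane)
print(count_mappings(relabeled, relabeled))  # 1: the Hs are now distinguishable
```

In this toy case the neighbor iteration order only decides which hydrogen receives which label. Two methane graphs built with their hydrogens in different orders still match after relabeling, which is the property the review question is probing.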

mattwthompson · 2025-10-15T20:02:57Z

So far I'm seeing a net improvement in some Interchange.to_x export times. I have not looked at anything beforehand (molecule loading, topology preparation, parameter assignment, partial charge assignment, etc.) nor is this a completely comprehensive analysis.¹

The two environments I used are my Interchange development environment (with this branch installed) and a second one with only released versions of our software.²

| Export target | OpenFE JACS set (vacuum) | Mixed solvent system | Ligand in water | Tim's polymers |
|---|---|---|---|---|
| GROMACS | No change | Marginally worse | Slightly better | Huge improvement |
| OpenMM | No change | Marginally worse | Somewhat better | No change |

¹ I have receipts for everything elsewhere if people want lots of log files.
² micromamba create --name released openff-toolkit seaborn polars tqdm matplotlib

j-wags · 2025-10-15T22:41:50Z

A quick update here - I'm offline for the rest of this week and half of the next, but my plan is to come back around the 27th and try to prove to myself that this is a safe approach and put in tests to raise the alarm if a later change violates some assumption that this needs (no divalent H, no H isotopes, no negative atomic numbers, etc). If folks want to help out by thinking of edge cases/regressions, that could help me get through the QA process faster.
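The guard described above could look roughly like the following sketch, again on a plain networkx graph with an `atomic_number` node attribute. The exception name and the function are hypothetical and are not the error this PR adds. Isotopes are left out because this toy graph has no isotope attribute to inspect.

```python
import networkx as nx


class HydrogenRelabelAssumptionError(ValueError):
    """Hypothetical error type, used only for this illustration."""


def check_relabel_assumptions(graph: nx.Graph) -> None:
    """Raise if the graph violates what the hydrogen relabeling relies on."""
    for node, data in graph.nodes(data=True):
        atomic_number = data["atomic_number"]
        if atomic_number < 0:
            # Negative values are reserved for the relabeled hydrogens.
            raise HydrogenRelabelAssumptionError(
                f"atom {node} already has a negative atomic number"
            )
        if atomic_number == 1 and graph.degree(node) > 1:
            # A hydrogen with two bonds would be labeled by both neighbors.
            raise HydrogenRelabelAssumptionError(
                f"hydrogen {node} has {graph.degree(node)} bonds"
            )


# Water passes: each hydrogen has a single bond.
water = nx.Graph()
water.add_node(0, atomic_number=8)
for h in (1, 2):
    water.add_node(h, atomic_number=1)
    water.add_edge(0, h)
check_relabel_assumptions(water)

# A bridging hydrogen (B-H-B fragment) is rejected.
bridged = nx.Graph()
bridged.add_node(0, atomic_number=5)
bridged.add_node(1, atomic_number=1)
bridged.add_node(2, atomic_number=5)
bridged.add_edges_from([(0, 1), (1, 2)])
try:
    check_relabel_assumptions(bridged)
except HydrogenRelabelAssumptionError as error:
    print(f"rejected: {error}")
```

Running a check like this before relabeling turns a silent mismatch into a loud failure if a later change introduces one of the unsupported cases.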

hannaomi · 2025-10-29T16:28:54Z

Some results using the faster_isomorphism branch with the highly polydisperse systems that were the use case in issue 1156 (trying to link this properly but it will only let me link the PR with that number).

The faster_isomorphism branch has consistently lower create_interchange() runtimes than off-toolkit v.0.17.0 as the number of unique polymer chain components in the topology increases.

Unclear to me what was causing that huge jump at >40 unique polymer chains, but it appears to have been fixed by the changes in faster_isomorphism.

For comparison, here is where I started when this issue was opened (toolkit v.0.16.8)

I can post some reproducing code for this benchmark if needed.

mattwthompson · 2025-10-29T16:47:04Z

Thanks @hannaomi - those results look great! The discontinuity is indeed confusing but in either case the scaling appears to remain linear and the performance is improved.

Outside the context of this PR and your benchmarks, we have a couple of other Interchange patches in the pipeline which might help even further, for both creation and export. Based on some benchmarking I've done, I suspect most of the runtime is in Interchange and not the toolkit, but I don't think that's worth a deep dive right now.

j-wags and others added 5 commits October 14, 2025 14:17

potentially quick speedup for molecule isomorphism

ca6a652

silence yappy test and handle nx graphs

0385a8b

[pre-commit.ci] auto fixes from pre-commit.com hooks

ef2a004

for more information, see https://pre-commit.ci

fix atom counting with negative numbered Hs

07e538f

Merge remote-tracking branch 'origin/faster_isomorphism' into faster_…

132e538

…isomorphism

change test to expect new (equivalent) ordering for atom mapping

1da30e8

mattwthompson reviewed Oct 15, 2025

mattwthompson mentioned this pull request Oct 15, 2025

Improve quadratic runtime of Interchange 0.4.0 GMX export openforcefield/openff-interchange#1264

Open

mattwthompson mentioned this pull request Oct 15, 2025

Non-trivial runtime of heterogeneous polymer system openforcefield/openff-nagl#193

Open

mrshirts mentioned this pull request Oct 16, 2025

Exporting to Interchange is SLOW! joelaforet/polyzymd_builder#1

Open

mattwthompson added the polymer-performance Runtime of loading and/or parametrizing (bio)polymers label Oct 17, 2025

j-wags self-assigned this Oct 20, 2025

j-wags and others added 3 commits October 28, 2025 15:54

add new error, test, and update releasenotes

fd5138e

[pre-commit.ci] auto fixes from pre-commit.com hooks

7fb3cb0

for more information, see https://pre-commit.ci

fix greater-than/less-than mismatch

505e3ad

privatize Molecule.are_isomorphic.to_networks, unconditionally deepco…

763aa00

…py input, polish docstrings

j-wags changed the title ~~[WIP/DNM] potentially quick speedup for molecule isomorphism~~ Speedup for molecule isomorphism checking Oct 29, 2025

j-wags marked this pull request as ready for review October 29, 2025 18:36

jameseastwood unassigned j-wags Oct 29, 2025

mattwthompson approved these changes Nov 3, 2025

mattwthompson mentioned this pull request Nov 3, 2025

Isomorphism comparisons are slow on similar large molecules #2072

Closed

j-wags and others added 2 commits November 4, 2025 07:50

Update releasenotes for 0.18.0 release

fadc7f6

Merge branch 'main' into faster_isomorphism

e204d49

j-wags merged commit f63576a into main Nov 4, 2025
17 checks passed

j-wags deleted the faster_isomorphism branch November 4, 2025 16:38
