@j-wags
Member

j-wags commented Oct 14, 2025

@codecov

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.69%. Comparing base (7febf6d) to head (e204d49).
⚠️ Report is 1 commits behind head on main.

Comment on lines 2154 to 2159
for node in data:
    h_counter = -1
    for neighbor in data.neighbors(node):
        if data.nodes[neighbor]['atomic_number'] == 1:
            data.nodes[neighbor]['atomic_number'] = h_counter
            h_counter -= 1
Member

Does this depend, in problematic ways, on the order of the data.neighbors(node) iterator? I will try to think of an adversarial test case tomorrow, if possible.

Member Author

@j-wags Oct 15, 2025

Thanks, this is an absolutely filthy hack but I've been unable to think of edge cases that would trip it up. The order of the neighbors shouldn't matter since they're all chemically equivalent hydrogens, and we specify stereochemistry on the central atom using CIP rules which should be invariant to the ordering of the bonds/neighbors (as opposed to tying it to the ordering of bonds, where losing track of the Hs' identities could MAYBE scramble things in some complex way).

I'd love any other ideas you come up with, even if they're just smells that you can't solidify into test cases. This is for sure cooked if we ever get an H with two bonds or try to handle deuterium or something.
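To make the ordering argument concrete, here is a minimal sketch using networkx. It is not the toolkit's actual code: `build_graph`, `relabel_hydrogens`, and `same_element` are hypothetical helpers, and unlike the snippet under review this version writes to a copy instead of mutating the graph in place. Two methane graphs with different atom and bond orderings are relabeled and still compare as isomorphic, because each carbon ends up with the same set of hydrogen labels regardless of which hydrogen received which label.

```python
import networkx as nx


def build_graph(atomic_numbers, bonds):
    """Build a molecular graph with one 'atomic_number' attribute per node."""
    graph = nx.Graph()
    for index, atomic_number in enumerate(atomic_numbers):
        graph.add_node(index, atomic_number=atomic_number)
    graph.add_edges_from(bonds)
    return graph


def relabel_hydrogens(graph):
    """Return a copy where the hydrogens on each atom get -1, -2, -3, ..."""
    relabeled = graph.copy()
    for node in graph:
        h_counter = -1
        for neighbor in graph.neighbors(node):
            # Read from the untouched original so earlier writes cannot interfere.
            if graph.nodes[neighbor]["atomic_number"] == 1:
                relabeled.nodes[neighbor]["atomic_number"] = h_counter
                h_counter -= 1
    return relabeled


def same_element(attrs_a, attrs_b):
    """Node matcher: atoms match only if their (possibly relabeled) numbers agree."""
    return attrs_a["atomic_number"] == attrs_b["atomic_number"]


# The same molecule (methane) written with two different atom and bond orderings.
methane_a = build_graph([6, 1, 1, 1, 1], [(0, 1), (0, 2), (0, 3), (0, 4)])
methane_b = build_graph([1, 1, 6, 1, 1], [(2, 4), (2, 0), (2, 3), (2, 1)])

relabeled_a = relabel_hydrogens(methane_a)
relabeled_b = relabel_hydrogens(methane_b)

is_match = nx.is_isomorphic(relabeled_a, relabeled_b, node_match=same_element)
hydrogen_labels = sorted(
    z for _, z in relabeled_a.nodes(data="atomic_number") if z < 0
)
```

A relabeled graph is only meaningful when compared against another relabeled graph; comparing it to an unmodified one would fail on every hydrogen.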

@mattwthompson
Member

So far I'm seeing a net improvement in some Interchange.to_x export times. I have not looked at anything beforehand (molecule loading, topology preparation, parameter assignment, partial charge assignment, etc.) nor is this a completely comprehensive analysis.[1]

The two environments I used are my Interchange development environment (with this branch installed) and a second with only released versions of our software.[2]

| | OpenFE JACS set (vacuum) | Mixed solvent system | Ligand in water | Tim's polymers |
|---|---|---|---|---|
| GROMACS | No change | Marginally worse | Slightly better | Huge improvement |
| OpenMM | No change | Marginally worse | Somewhat better | No change |

Footnotes

  1. I have receipts for everything elsewhere if people want lots of log files

  2. micromamba create --name released openff-toolkit seaborn polars tqdm matplotlib

@j-wags
Member Author

j-wags commented Oct 15, 2025

A quick update here - I'm offline for the rest of this week and half of the next, but my plan is to come back around the 27th and try to prove to myself that this is a safe approach and put in tests to raise the alarm if a later change violates some assumption that this needs (no divalent H, no H isotopes, no negative atomic numbers, etc). If folks want to help out by thinking of edge cases/regressions, that could help me get through the QA process faster.
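As a starting point for those alarm-raising tests, the assumptions listed above could be checked with something like the sketch below. Everything in it is an assumption for illustration: `find_violations` is a hypothetical helper, not part of the toolkit, and the `mass_numbers` argument (with `None` meaning "unspecified isotope") is an invented convention rather than how the toolkit stores isotopes.

```python
from collections import Counter


def find_violations(atomic_numbers, bonds, mass_numbers=None):
    """Return reasons why per-atom hydrogen relabeling would be unsafe.

    atomic_numbers: list of ints, one per atom.
    bonds: iterable of (atom_index, atom_index) pairs.
    mass_numbers: optional list (hypothetical convention); None means unspecified.
    """
    # Count how many bonds each atom takes part in.
    degree = Counter()
    for atom_a, atom_b in bonds:
        degree[atom_a] += 1
        degree[atom_b] += 1

    violations = []
    for index, atomic_number in enumerate(atomic_numbers):
        # Negative numbers are reserved for the relabeled hydrogens.
        if atomic_number < 0:
            violations.append(f"atom {index}: negative atomic number {atomic_number}")
        if atomic_number != 1:
            continue
        # A hydrogen with two neighbors would be relabeled twice.
        if degree[index] > 1:
            violations.append(f"atom {index}: hydrogen with {degree[index]} bonds")
        # Deuterium or tritium is not interchangeable with protium.
        if mass_numbers is not None and mass_numbers[index] not in (None, 1):
            violations.append(f"atom {index}: hydrogen isotope {mass_numbers[index]}")
    return violations


water = find_violations([8, 1, 1], [(0, 1), (0, 2)])
bridging_h = find_violations([5, 1, 5], [(0, 1), (1, 2)])
heavy_water = find_violations(
    [8, 1, 1], [(0, 1), (0, 2)], mass_numbers=[None, 2, 2]
)
```

Returning a list of messages rather than raising on the first problem makes it easier for a test to assert on exactly which assumption was broken.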

mattwthompson added the polymer-performance label Oct 17, 2025
j-wags self-assigned this Oct 20, 2025
@hannaomi

hannaomi commented Oct 29, 2025

Some results using the faster_isomorphism branch with the highly polydisperse systems that were the use case in issue 1156 (trying to link this properly but it will only let me link the PR with that number).

The faster_isomorphism branch has consistently lower create_interchange() runtimes than off-toolkit v.0.17.0 as the number of unique polymer chain components in the topology increases.
[image]

Unclear to me what was causing that huge jump at >40 unique polymer chains, but it appears to have been fixed by the changes in faster_isomorphism.

For comparison, here is where I started when this issue was opened (toolkit v.0.16.8)
[image]

I can post some reproducing code for this benchmark if needed.

@mattwthompson
Member

Thanks @hannaomi - those results look great! The discontinuity is indeed confusing but in either case the scaling appears to remain linear and the performance is improved.

Outside the context of this PR and your benchmarks, we have a couple of other Interchange patches in the pipeline that might help even further with both creation and export. Based on some benchmarking I've done, I suspect most of the runtime is Interchange and not the toolkit, but I don't think that's worth a deep dive right now.

j-wags changed the title from "[WIP/DNM] potentially quick speedup for molecule isomorphism" to "Speedup for molecule isomorphism checking" Oct 29, 2025
j-wags marked this pull request as ready for review October 29, 2025 18:36
j-wags merged commit f63576a into main Nov 4, 2025
17 checks passed
j-wags deleted the faster_isomorphism branch November 4, 2025 16:38

Labels

polymer-performance Runtime of loading and/or parametrizing (bio)polymers

Development

Successfully merging this pull request may close these issues.

Topology.identical_molecule_groups is too slow

4 participants