Torch problems when using NNPOps with Openmm-ML

Thanks for the great ecosystem for ML potentials in MD!

<details>
<summary>I tried running this simple `openmm-ml` example that uses `createSystem`:</summary>

```python
#!/usr/bin/env python3

from openmm.app import *
from openmm import *
from openmm.unit import *
from openmmml import MLPotential

from sys import argv, stdout

# must be either "nnpops" or "torchani"
implementation = argv[1]
input_file = argv[2]

pdb = PDBFile(input_file)

print("Creating ANI potential")
potential = MLPotential('ani2x')

print("Creating system")
system = potential.createSystem(pdb.topology, implementation=implementation)

print("Creating simulation")
integrator = LangevinMiddleIntegrator(300*kelvin, 1/picosecond, 0.004*picoseconds)
simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)

print("Minimizing energy")
simulation.minimizeEnergy()

print("Simulating")
simulation.reporters.append(StateDataReporter(stdout, 1000, step=True,
            potentialEnergy=True, temperature=True))
simulation.step(10000)
print("done")
```

</details>

<details>
<summary>I'm using a simple methane PDB file:</summary>

```
HETATM    1  C1  UNK     0      -0.238   0.373   0.000  1.00  0.00           C
HETATM    2  H1  UNK     0      -0.238   1.486   0.000  1.00  0.00           H
HETATM    3  H2  UNK     0      -1.286   0.002  -0.057  1.00  0.00           H
HETATM    4  H3  UNK     0       0.335   0.002  -0.879  1.00  0.00           H
HETATM    5  H4  UNK     0       0.236   0.002   0.936  1.00  0.00           H
END
```
</details>
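A quick way to rule out a bad input geometry is a small, stdlib-only check like the sketch below. It is not part of the original script. It reads the HETATM records using the fixed PDB columns (x, y, z in columns 31-38, 39-46 and 47-54) and prints each C-H distance. For this file all four come out at roughly 1.11 Å, a normal C-H bond length, so the input itself looks reasonable.

```python
import math

# Methane PDB from the report above (coordinates in angstroms).
METHANE_PDB = """\
HETATM    1  C1  UNK     0      -0.238   0.373   0.000  1.00  0.00           C
HETATM    2  H1  UNK     0      -0.238   1.486   0.000  1.00  0.00           H
HETATM    3  H2  UNK     0      -1.286   0.002  -0.057  1.00  0.00           H
HETATM    4  H3  UNK     0       0.335   0.002  -0.879  1.00  0.00           H
HETATM    5  H4  UNK     0       0.236   0.002   0.936  1.00  0.00           H
END
"""


def read_coordinates(pdb_text):
    """Return [(atom_name, (x, y, z))] from ATOM/HETATM records.

    Uses fixed PDB columns (0-based slices): name [12:16],
    x [30:38], y [38:46], z [46:54].
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()
            xyz = tuple(float(line[c:c + 8]) for c in (30, 38, 46))
            atoms.append((name, xyz))
    return atoms


atoms = read_coordinates(METHANE_PDB)
carbon = dict(atoms)["C1"]

# Print the distance from the carbon to every other atom.
for name, xyz in atoms:
    if name != "C1":
        print(f"C1-{name}: {math.dist(carbon, xyz):.3f} A")
```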

When I use the `torchani` implementation, everything runs fine.

However, when I use `nnpops`, I get the following stack trace during energy minimization:
```
Traceback (most recent call last):
  File "/scratch/openmm-nnp/./run_md.py", line 28, in <module>
    simulation.minimizeEnergy()
  File "/scratch/.conda/envs/openmm_nnp/lib/python3.10/site-packages/openmm/app/simulation.py", line 137, in minimizeEnergy
    mm.LocalEnergyMinimizer.minimize(self.context, tolerance, maxIterations)
  File "/scratch/.conda/envs/openmm_nnp/lib/python3.10/site-packages/openmm/openmm.py", line 8544, in minimize
    return _openmm.LocalEnergyMinimizer_minimize(context, tolerance, maxIterations)
openmm.OpenMMException: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "<string>", line 57, in <backward op>
            self_scalar_type = self.dtype
            def backward(grad_output):
                grad_self = AD_sum_backward(grad_output, self_size, dim, keepdim).to(self_scalar_type) / AD_safe_size(self_size, dim)
                            ~~~~~~~~~~~~~~~ <--- HERE
                return grad_self, None, None, None
  File "<string>", line 24, in AD_sum_backward
            if not keepdim and len(sizes) > 0:
                if len(dims) == 1:
                    return grad.unsqueeze(dims[0]).expand(sizes)
                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
                else:
                    res = AD_unsqueeze_multiple(grad, dims, len(sizes))
RuntimeError: expand(CUDADoubleType{[1, 1]}, size=[1]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)
```
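The final `RuntimeError` comes from `Tensor.expand`, which needs at least as many target sizes as the tensor has dimensions. The minimal sketch below only illustrates that rule on the CPU; it is not a reproduction of the NNPOps problem itself. In the trace, the tensor being expanded is the incoming gradient *after* `unsqueeze`, which suggests (as an interpretation, not a diagnosis) that the gradient reaching that `sum` backward had one more dimension than the summed output.

```python
import torch

# A 2-D double tensor shaped like the one in the trace: CUDADoubleType{[1, 1]}.
# Created on the CPU here so the sketch runs without a GPU.
grad = torch.ones(1, 1, dtype=torch.float64)

# Asking for a 1-D target shape [1] violates expand's rule and raises
# a RuntimeError with the same wording as in the trace.
try:
    grad.expand(1)
except RuntimeError as err:
    print(err)

# Supplying as many sizes as dimensions, or flattening first, is accepted.
print(grad.expand(1, 1).shape)
print(grad.reshape(1).expand(1).shape)
```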

I'm using the `mmh/openmm-8-beta-linux` environment (via the command `mamba env create mmh/openmm-8-beta-linux`) on a Debian Bullseye system with an NVIDIA T4.
<details>
<summary>My full environment dump (`conda env export`):</summary>

```
channels:
  - conda-forge/label/openmm-torch_rc
  - conda-forge/label/openmm_rc
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_kmp_llvm
  - attrs=22.2.0=pyh71513ae_0
  - brotlipy=0.7.0=py310h5764c6d_1005
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.18.1=h7f98852_0
  - ca-certificates=2022.12.7=ha878542_0
  - cached-property=1.5.2=hd8ed1ab_1
  - cached_property=1.5.2=pyha770c72_1
  - certifi=2022.12.7=pyhd8ed1ab_0
  - cffi=1.15.1=py310h255011f_3
  - charset-normalizer=2.1.1=pyhd8ed1ab_0
  - colorama=0.4.6=pyhd8ed1ab_0
  - cryptography=39.0.0=py310h34c0648_0
  - cudatoolkit=11.8.0=h37601d7_11
  - cudnn=8.4.1.50=hed8a83a_0
  - exceptiongroup=1.1.0=pyhd8ed1ab_0
  - h5py=3.7.0=nompi_py310h416281c_102
  - hdf5=1.12.2=nompi_h4df4325_101
  - icu=70.1=h27087fc_0
  - idna=3.4=pyhd8ed1ab_0
  - importlib-metadata=6.0.0=pyha770c72_0
  - importlib_metadata=6.0.0=hd8ed1ab_0
  - iniconfig=2.0.0=pyhd8ed1ab_0
  - keyutils=1.6.1=h166bdaf_0
  - krb5=1.20.1=h81ceb04_0
  - lark-parser=0.12.0=pyhd8ed1ab_0
  - ld_impl_linux-64=2.39=hcc3a1bd_1
  - libaec=1.0.6=h9c3ff4c_0
  - libblas=3.9.0=16_linux64_openblas
  - libcblas=3.9.0=16_linux64_openblas
  - libcurl=7.87.0=hdc1c0ab_0
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=12.2.0=h65d4601_19
  - libgfortran-ng=12.2.0=h69a702a_19
  - libgfortran5=12.2.0=h337968e_19
  - libhwloc=2.8.0=h32351e8_1
  - libiconv=1.17=h166bdaf_0
  - liblapack=3.9.0=16_linux64_openblas
  - libnghttp2=1.51.0=hff17c54_0
  - libnsl=2.0.0=h7f98852_0
  - libopenblas=0.3.21=pthreads_h78a6416_3
  - libprotobuf=3.21.12=h3eb15da_0
  - libsqlite=3.40.0=h753d276_0
  - libssh2=1.10.0=hf14f497_3
  - libstdcxx-ng=12.2.0=h46fd767_19
  - libuuid=2.32.1=h7f98852_1000
  - libxml2=2.10.3=hca2bb57_1
  - libzlib=1.2.13=h166bdaf_4
  - llvm-openmp=15.0.6=he0ac6c6_0
  - magma=2.5.4=hc72dce7_4
  - mkl=2022.2.1=h84fe81f_16997
  - nccl=2.14.3.1=h0800d71_0
  - ncurses=6.3=h27087fc_1
  - ninja=1.11.0=h924138e_0
  - nnpops=0.2=cuda112py310h8b99da5_5
  - numpy=1.24.1=py310h08bbf29_0
  - ocl-icd=2.3.1=h7f98852_0
  - ocl-icd-system=1.0.0=1
  - openmm=8.0.0beta=py310h2996cf7_2
  - openmm-ml=1.0beta=pyh79ba5db_2
  - openmm-torch=1.0beta=cuda112py310h02d4f52_2
  - openssl=3.0.7=h0b41bf4_1
  - packaging=22.0=pyhd8ed1ab_0
  - pip=22.3.1=pyhd8ed1ab_0
  - pluggy=1.0.0=pyhd8ed1ab_5
  - pycparser=2.21=pyhd8ed1ab_0
  - pyopenssl=23.0.0=pyhd8ed1ab_0
  - pysocks=1.7.1=pyha2e5f31_6
  - pytest=7.2.0=pyhd8ed1ab_2
  - python=3.10.8=h4a9ceb5_0_cpython
  - python_abi=3.10=3_cp310
  - pytorch=1.12.1=cuda112py310he33e0d6_201
  - readline=8.1.2=h0f457ee_0
  - requests=2.28.1=pyhd8ed1ab_1
  - setuptools=59.5.0=py310hff52083_0
  - setuptools-scm=6.3.2=pyhd8ed1ab_0
  - setuptools_scm=6.3.2=hd8ed1ab_0
  - sleef=3.5.1=h9b69904_2
  - tbb=2021.7.0=h924138e_1
  - tk=8.6.12=h27826a3_0
  - tomli=2.0.1=pyhd8ed1ab_0
  - torchani=2.2.2=cuda112py310h98dee98_6
  - typing_extensions=4.4.0=pyha770c72_0
  - tzdata=2022g=h191b570_0
  - urllib3=1.26.13=pyhd8ed1ab_0
  - wheel=0.38.4=pyhd8ed1ab_0
  - xz=5.2.6=h166bdaf_0
  - zipp=3.11.0=pyhd8ed1ab_0
```
</details>
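When comparing environments, it can help to print the versions that matter here in one go. The sketch below uses a hypothetical helper, `collect_versions`, which is not part of any of these packages. It only reads each module's `__version__` attribute when one exists, and it reports missing modules instead of failing. The names are import names (`openmm`, `openmmml`, `openmmtorch`, `NNPOps`, `torchani`, `torch`), which differ from some conda package names.

```python
import importlib

# Import names of the packages involved in this report (assumed, not conda names).
MODULES = ["openmm", "openmmml", "openmmtorch", "NNPOps", "torchani", "torch"]


def collect_versions(modules=MODULES):
    """Return {module: version string}, tolerating missing modules or __version__."""
    versions = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except Exception as err:  # ImportError, or a native library failing to load
            versions[name] = f"not importable ({type(err).__name__})"
            continue
        versions[name] = str(getattr(mod, "__version__", "unknown"))
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name:12s} {version}")
    # CUDA details as seen by PyTorch, if it is installed.
    try:
        import torch
        print("CUDA build:", torch.version.cuda)
        if torch.cuda.is_available():
            print("GPU:", torch.cuda.get_device_name(0))
    except Exception:
        pass
```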

I've seen mentions of similar problems, but I haven't been able to find a solution.

Any help is greatly appreciated. Apologies if this isn't the correct repo to open this issue in.

Thanks!

Torch problems when using NNPOps with Openmm-ML #76

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Torch problems when using NNPOps with Openmm-ML #76

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions