We're encountering a runtime error in AttentionPairBiasMask.backward() during training:
```
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
```
The error occurs at line 358 in attention_pair_bias_torch.py:
```python
grad_out_mask = grad_out_mask.view(B, ctx.num_heads, -1)
```
All input tensors passed to cuet.attention_pair_bias() in the forward pass are verified to be contiguous, and the forward pass always completes successfully. The error only appears during backpropagation, when the incoming gradient tensors are non-contiguous.
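For illustration, here is a minimal, self-contained reproduction of the mechanism (hypothetical code, not the cuEquivariance implementation): a transpose between the op's output and the loss is enough to make the gradient reaching a custom backward() non-contiguous, at which point .view() fails while .reshape() would not.

```python
import torch

class ViewInBackward(torch.autograd.Function):
    """Toy stand-in for an op whose backward reshapes the gradient with .view()."""

    @staticmethod
    def forward(ctx, x):
        ctx.num_heads = 4
        ctx.in_shape = x.shape
        return 2.0 * x

    @staticmethod
    def backward(ctx, grad_out):
        B = grad_out.shape[0]
        # Raises the same RuntimeError when grad_out is non-contiguous;
        # grad_out.reshape(B, ctx.num_heads, -1) would copy only if needed.
        grad_heads = grad_out.view(B, ctx.num_heads, -1)
        return (2.0 * grad_heads).view(ctx.in_shape)

x = torch.randn(2, 8, 16, requires_grad=True)  # contiguous forward input
y = ViewInBackward.apply(x)
# The transpose downstream of the op makes the gradient flowing back into
# backward() a non-contiguous view, even though x itself is contiguous.
y.transpose(1, 2).sum().backward()  # RuntimeError at the .view() above
```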
Environment
- Hardware: H100 GPUs
- cuEquivariance version: 0.6.1
- PyTorch version: 2.7
- CUDA version: 12.x
We've implemented a monkey patch that makes the incoming gradient tensors contiguous before calling the original backward function (a sketch is included after the questions below), which resolves the issue. However:
- Should AttentionPairBiasMask.backward() handle non-contiguous gradient tensors? (A simple fix would be changing view() to reshape().)
- Or is it expected that all gradient inputs will be contiguous? If so, what's the recommended approach for ensuring this?
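For reference, the workaround is roughly the following (an illustrative sketch, not the exact patch; the import path of AttentionPairBiasMask is omitted). It simply forces every incoming gradient to be contiguous before delegating to the library's own backward.

```python
import torch
# from ... import AttentionPairBiasMask  # import path omitted; the autograd
# Function defined in attention_pair_bias_torch.py

_original_backward = AttentionPairBiasMask.backward

def _contiguous_backward(ctx, *grad_outputs):
    # Make every gradient tensor contiguous so the .view() at line 358 succeeds.
    grad_outputs = tuple(
        g.contiguous() if isinstance(g, torch.Tensor) else g
        for g in grad_outputs
    )
    return _original_backward(ctx, *grad_outputs)

AttentionPairBiasMask.backward = staticmethod(_contiguous_backward)
```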