Race condition in disrupt_abort_repair #11126

@cezarmoise

Description

During disrupt_abort_repair, the call to storage_service/force_terminate_repair can arrive just after one repair has finished and before the next one starts. In that window there is nothing to abort, so the call is effectively a no-op: the nodetool repair never fails, runs to completion, and may hit the 120s timeout on the thread. The log excerpt below shows this window; a sketch of a possible mitigation follows it.

Jun 09 23:49:17.110407 longevity-50gb-12h-2024-2-db-node-d169477f-2 scylla[6905]:  [shard  0:strm] repair - repair[dbc91c86-2a77-4c7a-91f9-b93721bb823e]: starting user-requested repair for keyspace system_distributed_everywhere, repair id 56, options {"trace": "false", "primaryRange": "false", "jobThreads": "1", "incremental": "false", "parallelism": "parallel"}
...
Jun 09 23:49:18.147909 longevity-50gb-12h-2024-2-db-node-d169477f-2 scylla[6905]:  [shard  0:strm] repair - repair[dbc91c86-2a77-4c7a-91f9-b93721bb823e]: completed successfully
Jun 09 23:49:18.282972 longevity-50gb-12h-2024-2-db-node-d169477f-2 scylla[6905]:  [shard  0:strm] repair - Started to abort repair jobs={}, nr_jobs=0
Jun 09 23:49:18.315701 longevity-50gb-12h-2024-2-db-node-d169477f-2 scylla[6905]:  [shard  0:strm] repair - repair[585f8d94-1250-4719-89ea-1f53a5bed91a]: starting user-requested repair for keyspace drop_table_during_repair_ks_6, repair id 57, options {"trace": "false", "primaryRange": "false", "jobThreads": "1", "incremental": "false", "parallelism": "parallel"}
Jun 09 23:49:18.315717 longevity-50gb-12h-2024-2-db-node-d169477f-2 scylla[6905]:  [shard  0:strm] repair - repair[585f8d94-1250-4719-89ea-1f53a5bed91a]: completed successfully: no tables to repair
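A minimal sketch of one way the nemesis could close this race: instead of firing force_terminate_repair once and assuming it landed on a running repair, keep polling for active repairs and re-issuing the abort until no repair remains or a deadline expires. This is not the SCT implementation; the polling endpoint /storage_service/active_repair/, the API address, and all names and timeouts below are assumptions for illustration. Only the force_terminate_repair path is taken from this issue.

import time
import requests

API = "http://10.12.9.121:10000"  # hypothetical node REST API address


def abort_repair_until_idle(timeout: float = 120.0, poll_interval: float = 1.0) -> bool:
    """Repeatedly terminate repairs until no repair is active or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Assumed endpoint: returns the list of currently active repair ids.
        active = requests.get(f"{API}/storage_service/active_repair/").json()
        if not active:
            return True  # nothing running, so there is no repair left to win the race
        # Endpoint from the issue: ask the node to terminate ongoing repairs.
        requests.post(f"{API}/storage_service/force_terminate_repair")
        time.sleep(poll_interval)
    return False

With a loop like this, an abort issued in the gap between two repairs simply retries once the next repair (repair id 57 in the log above) starts, rather than silently doing nothing.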

Packages

Scylla version: 2024.2.11-20250609.98e7e1fec707 with build-id 0137055552a86ec74fe7808066cd25cae9b712a1
Kernel Version: 5.15.0-1085-aws

Installation details

Cluster size: 4 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

  • longevity-50gb-12h-2024-2-db-node-d169477f-5 (34.201.146.63 | 10.12.10.226) (shards: 12)
  • longevity-50gb-12h-2024-2-db-node-d169477f-4 (13.218.33.140 | 10.12.10.191) (shards: 11)
  • longevity-50gb-12h-2024-2-db-node-d169477f-3 (18.209.14.117 | 10.12.9.2) (shards: -1)
  • longevity-50gb-12h-2024-2-db-node-d169477f-2 (98.84.130.64 | 10.12.9.121) (shards: 11)
  • longevity-50gb-12h-2024-2-db-node-d169477f-1 (13.218.246.21 | 10.12.10.20) (shards: 11)

OS / Image: ami-0abd2efc39812f7d0 (aws: undefined_region)

Test: longevity-150gb-asymmetric-cluster-12h-test
Test id: d169477f-0422-4d78-b8ce-864c35c693db
Test name: enterprise-2024.2/tier1/longevity-150gb-asymmetric-cluster-12h-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor d169477f-0422-4d78-b8ce-864c35c693db
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs d169477f-0422-4d78-b8ce-864c35c693db

Logs:

Jenkins job URL
Argus
