@scylladbbot

We occasionally lose SSH access to an instance and are left with zero logs explaining why (this has happened multiple times over the years, caused by credentials or other cloud-init/boot issues).

This change makes sure the SSM agents are working on our instances, and falls back to them during log collection if we can't get SSH access.

  • enabled it in the region configuration
  • added it at the top of the cloud-init to unmask the agent, see: scylladb/scylla-machine-image@b8e494d
  • SSMCommandRunner, which has a run() API like our SSH-based remoters
  • CommandLog collection falls back to SSMCommandRunner
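
For readers unfamiliar with SSM, the shape of a runner with a `run()` API can be sketched roughly as below. This is not the code from this PR: the class name `SSMCommandRunnerSketch`, the `SSMResult` type, and the `poll_interval`/`timeout` parameters are hypothetical. The only real API assumed is the boto3 SSM client's `send_command` and `get_command_invocation` with the `AWS-RunShellScript` document. The client is passed in, so the sketch can be exercised with a stub.

```python
import time
from dataclasses import dataclass


@dataclass
class SSMResult:
    """Hypothetical result type, loosely mirroring an SSH remoter result."""
    exit_status: int
    stdout: str
    stderr: str

    @property
    def ok(self) -> bool:
        return self.exit_status == 0


class SSMCommandRunnerSketch:
    """Illustrative stand-in only; the PR's SSMCommandRunner may differ."""

    # Statuses that mean the command has not finished yet.
    _PENDING = ("Pending", "InProgress", "Delayed")

    def __init__(self, ssm_client, instance_id, poll_interval=2.0, timeout=300.0):
        # ssm_client is expected to behave like boto3.client("ssm").
        self.ssm = ssm_client
        self.instance_id = instance_id
        self.poll_interval = poll_interval
        self.timeout = timeout

    def run(self, cmd: str) -> SSMResult:
        # Ask the SSM agent on the instance to run a shell command.
        response = self.ssm.send_command(
            InstanceIds=[self.instance_id],
            DocumentName="AWS-RunShellScript",
            Parameters={"commands": [cmd]},
        )
        command_id = response["Command"]["CommandId"]

        # Poll until the invocation leaves the pending states.
        # Note: real code should also tolerate the invocation not being
        # registered yet right after send_command; omitted here for brevity.
        deadline = time.monotonic() + self.timeout
        while time.monotonic() < deadline:
            invocation = self.ssm.get_command_invocation(
                CommandId=command_id, InstanceId=self.instance_id
            )
            if invocation["Status"] not in self._PENDING:
                return SSMResult(
                    exit_status=invocation.get("ResponseCode", -1),
                    stdout=invocation.get("StandardOutputContent", ""),
                    stderr=invocation.get("StandardErrorContent", ""),
                )
            time.sleep(self.poll_interval)
        raise TimeoutError(f"SSM command {command_id} did not finish in {self.timeout}s")
```

With a real client this would be used as `SSMCommandRunnerSketch(boto3.client("ssm"), instance_id).run("journalctl --no-pager")`, assuming the instance has a running SSM agent and an instance profile that permits SSM.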

Ref: #11581

TODO

  • configure all regions

Testing

  • locally - tested the SSM implementation against actual machines, and the region configuration code
  • aws provision
  • locally hardcoded the fallback - to validate it's working
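
The fallback check in the last item can be pictured with a small sketch like the one below. It is an assumption about the general pattern, not the PR's code: `collect_with_fallback`, `ssh_run`, `ssm_run`, and the `force_fallback` flag are all hypothetical names, with `force_fallback` standing in for the local hardcoding used to exercise the SSM path.

```python
import logging
from typing import Callable, Tuple

LOGGER = logging.getLogger(__name__)


def collect_with_fallback(
    cmd: str,
    ssh_run: Callable[[str], str],
    ssm_run: Callable[[str], str],
    force_fallback: bool = False,
) -> Tuple[str, str]:
    """Run cmd over SSH, falling back to SSM if SSH fails.

    Returns a (transport, output) pair so callers can see which path was used.
    force_fallback skips SSH entirely, mimicking a hardcoded fallback.
    """
    if not force_fallback:
        try:
            return "ssh", ssh_run(cmd)
        except Exception as exc:  # broad on purpose: any SSH failure triggers fallback
            LOGGER.warning("SSH failed for %r (%s), falling back to SSM", cmd, exc)
    return "ssm", ssm_run(cmd)
```

Returning the transport name alongside the output makes it easy to assert in a test that the SSM path was actually taken.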

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)

  • Add unit tests to cover my changes (under unit-test/ folder)

  • Update the Readme/doc folder relevant to this change (if needed)

  • (cherry picked from commit 5aad56a)

Parent PR: #12298

@scylladbbot
Author

@fruch - This PR has conflicts, therefore it was moved to draft
Please resolve them and mark this PR as ready for review

@fruch fruch force-pushed the backport/12298/to-perf-v17 branch from 82d900e to b88f5f1 Compare October 30, 2025 09:03
@fruch fruch marked this pull request as ready for review October 30, 2025 09:03
@fruch fruch removed the conflicts label Oct 30, 2025
We occasionally lose SSH access to an instance and are left with
zero logs explaining why (this has happened multiple times over the years,
caused by credentials or other cloud-init/boot issues).

This change makes sure the SSM agents are working on our instances,
and falls back to them during log collection if we can't get SSH access.

* enabled it in the region configuration
* added it at the top of the cloud-init to unmask the agent
  see: scylladb/scylla-machine-image@b8e494d
* `SSMCommandRunner`, which has a `run()` API like our SSH-based remoters
* `CommandLog` collection falls back to `SSMCommandRunner`

Ref: scylladb#11581

Update sdcm/utils/aws_region.py

Co-authored-by: Copilot <[email protected]>

Update sdcm/provision/aws/utils.py

Co-authored-by: Copilot <[email protected]>

Update sdcm/utils/aws_ssm_runner.py

Co-authored-by: Copilot <[email protected]>
(cherry picked from commit 5aad56a)
@fruch fruch force-pushed the backport/12298/to-perf-v17 branch from b88f5f1 to baedcf9 Compare October 30, 2025 13:57
@fruch fruch merged commit b54cf8b into scylladb:branch-perf-v17 Oct 30, 2025
4 checks passed