Skip to content

Support additional nodegroups #704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open

Support additional nodegroups #704

wants to merge 15 commits into from

Conversation

sjpb
Copy link
Collaborator

@sjpb sjpb commented Jun 11, 2025

Important

I'm not sure the name additional_nodegroups is the best? Others are compute and login, although in a way it would be better if they contained _nodegroups too.

  • Adds opentofu variable additional_nodegroups to support defining non-Slurm nodes in the cluster. E.g.:

    additional_nodegroups = {
        squid = {
            nodes = ["squid-0"]
            flavor = var.other_node_flavor
        }
    }

    Nodes are automatically added to an inventory group of the same name as the node group.
    Slurm-controlled rebuild and compute-init is not supported, as these nodes will not be running slurmd.
    Security groups are those from opentofu variable nonlogin_security_groups by default, but may be overriden.

  • Also adds compute nodes into an inventory group of the same name as the note group, in addition to the existing ${cluster_name}_${group_name} inventory group required for stackhpc.openhpc role partition configuration. This simplifies multi-environment configuration.

Base automatically changed from control-ip-addresses to main June 13, 2025 14:30
@sjpb sjpb marked this pull request as ready for review June 13, 2025 14:35
@sjpb sjpb requested a review from a team as a code owner June 13, 2025 14:35
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this is exactly as per tofu/login.tf, except it has non-login secgroups by default, and those can optionally be overriden.

Copy link
Member

@bertiethorpe bertiethorpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, once CI passing. I agree that login and compute should be suffixed with "_nodegroups" but we can live with it. It would be good to test this in CI at some point too

@sjpb
Copy link
Collaborator Author

sjpb commented Jun 18, 2025

Should have said, this has been tested locally using the CI environment, so it does actually work! Bit loath to add it to CI as we already struggle with resources.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants