[Fixes: mosip/mosip-infra#1874] Add wg-onboard.sh for WireGuard onboa…#253
[Fixes: mosip/mosip-infra#1874] Add wg-onboard.sh for WireGuard onboa…#253Ivanmeneges wants to merge 7 commits into
Conversation
…rding automation This script automates the onboarding process for WireGuard in a new environment, handling peer allocation, configuration transformation, and publishing secrets to GitHub. Signed-off-by: Ivanmeneges <ivan.anil016@gmail.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughA new Bash script ChangesWireGuard Onboarding Script and Workflow
Sequence Diagram(s)sequenceDiagram
participant Operator
participant Workflow as GitHub Actions
participant Script as wg-onboard.sh
participant Jumpserver
participant WGDocker as WireGuard Docker
participant GitHubAPI as GitHub API
rect rgba(70, 130, 180, 0.5)
note over Operator,Script: Workflow dispatch triggered
Operator->>Workflow: Trigger with env name, jumpserver, flags
Workflow->>Workflow: Validate secrets (ACTION_PAT, MOSIP_AWS_PEM)
Workflow->>Workflow: Write & verify SSH key
end
rect rgba(100, 149, 237, 0.5)
note over Script,WGDocker: Preflight and Atomic Allocation
Script->>Jumpserver: SSH ls WG_DIR, inventory peerN, check templates
Jumpserver-->>Script: existing peers, MAX_PEERS, CAN_CREATE flag
Script->>Jumpserver: SSH heredoc (flock ALLOC_LOCK)
Jumpserver->>Jumpserver: parse assigned.txt, reuse or gap-fill 3 peers
alt peer config missing and CAN_CREATE
Jumpserver->>WGDocker: generate privkey, pubkey, PSK
WGDocker-->>Jumpserver: keys
Jumpserver->>Jumpserver: write peerN.conf, update wg0 runtime
end
Jumpserver->>Jumpserver: record to assigned.txt
Jumpserver-->>Script: TF_PEER, WG0_PEER, WG1_PEER, REUSED flag
end
rect rgba(60, 179, 113, 0.5)
note over Script,GitHubAPI: Transform Configs and Publish Secrets
Script->>Jumpserver: SSH cat peerN.conf (×3)
Jumpserver-->>Script: raw confs
Script->>Script: validate/transform: remove DNS, set AllowedIPs CIDR
alt dry-run mode
Script->>Script: log intended entries + exit
else publish
Script->>GitHubAPI: ensure GitHub environment exists
loop each of 3 secrets
Script->>GitHubAPI: gh secret set ENV/SECRET_NAME
alt publish fails
Script->>GitHubAPI: gh secret delete (rollback)
Script->>Jumpserver: SSH rollback assigned.txt
Script->>Script: exit non-zero
end
end
end
end
rect rgba(184, 134, 11, 0.5)
note over Script,Workflow: Local Tracking and Repository Push
Script->>Script: flock, append to wg-peer-allocation.tsv
Script-->>Workflow: allocation complete
alt DRY_RUN is false
Workflow->>Workflow: Check TSV for changes
alt TSV changed
Workflow->>Workflow: git pull --rebase (handle concurrent)
Workflow->>Workflow: git commit wg-peer-allocation.tsv (signed)
Workflow->>Workflow: git push to GITHUB_REF_NAME
end
end
Workflow->>Workflow: Remove temp SSH key
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/scripts/wg-onboard.sh:
- Around line 272-320: The record_assignment function writes a temp file and
then uses cat to copy it to the destination file, which is not atomic and can
leave the file empty or partial if the write fails. Replace the final two lines
(cat "$tmp" > "$file" and rm -f "$tmp") with a single mv "$tmp" "$file" command
to atomically replace the file after successful generation. Apply the same
atomic replacement pattern to the remove_peer_line function as well.
- Around line 139-144: The wg_onboard_exit_cleanup function currently references
the TMP variable without initialization, which could be inherited from the
caller and cause deletion of unrelated files. Create a new script-owned variable
initialized to an empty string (such as WG_ONBOARD_TMP) at the beginning of the
script before the EXIT trap is installed. Replace all references to TMP in the
wg_onboard_exit_cleanup function and throughout the script with this new
script-owned variable. This ensures only temporary files created by this script
are cleaned up, not files from the caller's environment.
- Around line 438-465: The script skips the entire existing allocation lookup
when any FORCE_TF, FORCE_WG0, or FORCE_WG1 flag is set, which causes duplicate
assignments in assigned.txt when re-running with forced peers. Remove or modify
the condition that gates the allocation lookup (the check for empty FORCE_TF,
FORCE_WG0, FORCE_WG1 flags), so the existing assignment discovery always runs
for tf, wg0, and wg1. Then apply the force overrides after the lookup completes.
When a force override changes a peer for an existing environment, either
atomically replace the old per-secret assignment line in assigned.txt or fail
with a clear error message indicating manual cleanup is needed.
- Around line 146-157: The security issue is that the script defaults to
creating a temporary SSH_KNOWN_HOSTS file via mktemp and uses
StrictHostKeyChecking=accept-new, which allows any host key to be accepted
without validation on each CI run, creating a vector for host impersonation
attacks. Fix this by either making SSH_KNOWN_HOSTS required by removing the else
block that creates the temporary file, or by adding an explicit environment
variable flag that gates the accept-new behavior. When the flag is not
explicitly set, fall back to requiring a pre-configured known_hosts file,
ensuring trust-on-first-use is only enabled through explicit opt-in rather than
by default in the ssh_cmd function.
- Around line 197-200: The preflight check in the CAN_CREATE_PEERS condition
allows peer creation when a template file exists at templates/peer.conf, but the
ensure_peer_conf function (also mentioned in lines 330-333) does not actually
use this template and will fail if no existing peerN/peerN.conf configurations
are present. Remove the template existence check from the conditional expression
in the preflight validation so that CAN_CREATE_PEERS is only set to true when
there are existing peers that can be used as a basis for new peer creation,
ensuring the preflight check accurately reflects what the actual peer creation
logic can accomplish.
- Around line 485-490: The validation in the peer loop does not check if the
peer ID numeric value falls within the configured pool range. After extracting
the numeric portion into variable n using parameter expansion
(n="${peer#peer}"), add validation to ensure that n is within the valid range 1
to MAX_PEERS before calling ensure_peer_conf. This validation should reject peer
IDs that are outside the configured pool, preventing forced assignment of peers
like peer0 or peer999 when MAX_PEERS limits the actual pool size.
- Around line 153-162: The `ssh_cmd()` function definition is correct, but all
calls to it throughout the script pass user-controlled variables without proper
shell escaping, creating command injection vulnerabilities. Create a
`remote_quote()` helper function that properly escapes special characters in
shell strings, then apply this function to all unescaped variables before
passing them to `ssh_cmd()`. Specifically, find all instances where variables
like `CONFIG_DIR`, `ASSIGNED_FILE`, `ASSIGN_LOCK_FILE`, peer names, and ticket
labels are embedded in command strings passed to `ssh_cmd()` (such as at line
166 with the ls command, lines 217-220 and 528-530 with bash -s calls, and line
620 with the cat command), and wrap each variable with `$(remote_quote
"$VARIABLE")` to ensure proper escaping regardless of the variable content.
- Around line 653-672: The script records peer assignments before GitHub
operations, but fails to clean up on partial success. If environment creation
fails at the initial PUT call, the script exits via die without rolling back the
pre-recorded assignments. If secret publishing fails after partial success, the
code calls atomic_rollback_assignments to free peers without first deleting the
already-published secrets from GitHub, leaving stale credentials exposed. Add a
trap or error handler around the environment creation to trigger rollback if it
fails. Additionally, before calling atomic_rollback_assignments in the secret
publishing failure path, iterate through the successfully published secrets
(tracked via PUBLISHED_SECRETS counter and SECRET_NAMES array indices) and
delete each one using gh secret delete with the --env flag for the specific
environment to ensure stale secrets are removed before peer assignment records
are rolled back.
- Around line 96-109: The command-line argument parsing loop in the while
statement expands $2 directly in each case branch before require_arg can
validate whether the argument value exists. When set -u is enabled and a flag is
missing its required value, Bash exits with an unbound variable error instead of
allowing require_arg to provide the intended error message. Replace all
instances of "$2" in the case branches (for flags like --env, --host, --ssh-key,
--repo, --ticket, --wg-dir, --allowed-ips, --tf-peer, --wg0-peer, --wg1-peer,
and --max-peers) with "${2-}" to provide an empty default value, allowing
require_arg to validate the argument properly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4c2ab4d9-9819-4131-97e3-762adcee4a38
📒 Files selected for processing (1)
.github/scripts/wg-onboard.sh
Harden CLI parsing, SSH quoting, allocation lookup with forced peers, atomic assigned.txt writes, pool bounds, and rollback of both GitHub secrets and jumpserver assignments on partial publish failures. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/scripts/wg-onboard.sh:
- Around line 602-620: The rollback_published_secrets function logs deletion
failures but still returns success (0), allowing rollback_allocation_state to
proceed with atomic_rollback_assignments even when secrets remain. Modify
rollback_published_secrets to track whether any secret deletions failed and
return a non-zero exit code if so, then update rollback_allocation_state to
check the return status of rollback_published_secrets before calling
atomic_rollback_assignments, ensuring peer assignments are only freed if all
secrets are successfully deleted.
- Around line 743-746: The temporary file created with mktemp defaults to /tmp,
causing the subsequent mv operation to be non-atomic across filesystems and
potentially replacing the allocation tracker with a file having restrictive 0600
permissions. Modify the TRACKER_TMP assignment to create the temporary file in
the same directory as $ALLOCATION_FILE by using mktemp with a template that
includes the directory path of $ALLOCATION_FILE (for example, using dirname to
extract the directory). This ensures the mv operation stays within the same
filesystem and behaves atomically.
- Around line 211-213: In the peer_config_exists() function, split the single
local statement containing both peer and conf assignments into two separate
local statements. First declare and assign local peer="$1", then on the next
line declare and assign local conf="$CONFIG_DIR/$peer/$peer.conf". This ensures
that the peer variable is properly set before it is referenced in the conf
variable assignment, preventing unbound variable errors when set -u is enabled
and avoiding stale global peer values.
- Around line 223-234: The remote_quote function expansions in the ssh_cmd bash
command (lines 223-234 and 557-565) are not properly quoted locally, causing the
local shell to split arguments containing whitespace before they reach ssh. Wrap
each $(remote_quote ...) expansion in double quotes to ensure paths and labels
with whitespace are passed to the remote shell as single argv tokens. This
applies to all the variable expansions like $(remote_quote "$ASSIGNED_FILE"),
$(remote_quote "$ASSIGN_LOCK_FILE"), $(remote_quote "$CONFIG_DIR"), and all
other similar expansions in that command block.
- Line 172: The ssh_cmd invocations that include `bash -c` are incorrectly
formatted, causing OpenSSH to parse the remote command improperly. Remove the
`bash -c` wrapper from the ssh_cmd calls (keeping the command string that
follows), as ssh_cmd already passes the entire string as a single argument to
the remote shell. This applies to four instances of the pattern `ssh_cmd bash -c
"..."` which should become just `ssh_cmd "..."`. Note that the `bash -s` pattern
used elsewhere should remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 364cb21e-fdb1-460d-94a9-23c5e43d0143
📒 Files selected for processing (1)
.github/scripts/wg-onboard.sh
Remove broken bash -c SSH wrappers, split dependent local declarations, quote remote argv expansions, guard assignment rollback on secret delete failure, and create tracker temp files beside the allocation TSV. Co-authored-by: Cursor <cursoragent@cursor.com>
remote_quote is for inline remote shell commands only; bash -s receives plain argv so ASSIGNED_FILE and related paths are not polluted with literal quote characters. Co-authored-by: Cursor <cursoragent@cursor.com>
Preflight SSH check and clearer errors when ubuntu cannot update assigned.txt; write temp files beside the target for atomic replace. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@coderabbitai review |
✅ Action performedReview finished.
|
Signed-off-by: Ivanmeneges <ivan.anil016@gmail.com>
This workflow automates the onboarding of WireGuard environments by allocating peers and managing GitHub secrets. Signed-off-by: Ivanmeneges <ivan.anil016@gmail.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
.github/workflows/wg-onboard.yml (1)
53-62: 💤 Low valuePass secrets through environment variables instead of direct interpolation.
Direct interpolation of
${{ secrets.X }}in shell context is a minor security anti-pattern. While unlikely for PAT/PEM values, secrets containing shell metacharacters could cause unexpected behavior.Suggested improvement
- name: Validate required secrets + env: + ACTION_PAT: ${{ secrets.ACTION_PAT }} + MOSIP_AWS_PEM: ${{ secrets.MOSIP_AWS_PEM }} run: | missing=() - [[ -z "${{ secrets.ACTION_PAT }}" ]] && missing+=(ACTION_PAT) - [[ -z "${{ secrets.MOSIP_AWS_PEM }}" ]] && missing+=(MOSIP_AWS_PEM) + [[ -z "$ACTION_PAT" ]] && missing+=(ACTION_PAT) + [[ -z "$MOSIP_AWS_PEM" ]] && missing+=(MOSIP_AWS_PEM) if ((${`#missing`[@]})); then echo "ERROR: Missing repository secrets: ${missing[*]}" echo "Add them under Settings → Secrets and variables → Actions for ${GITHUB_REPOSITORY}" exit 1 fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/wg-onboard.yml around lines 53 - 62, The Validate required secrets step directly interpolates secrets using ${{ secrets.ACTION_PAT }} and ${{ secrets.MOSIP_AWS_PEM }} syntax in the shell script, which is a security anti-pattern. Instead, add an env section to the step that maps these secrets to environment variables (such as ACTION_PAT_ENV and MOSIP_AWS_PEM_ENV), then update the conditional checks to reference these environment variables (e.g., [[ -z "$ACTION_PAT_ENV" ]]) instead of directly interpolating the secrets inline.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.github/workflows/wg-onboard.yml:
- Around line 53-62: The Validate required secrets step directly interpolates
secrets using ${{ secrets.ACTION_PAT }} and ${{ secrets.MOSIP_AWS_PEM }} syntax
in the shell script, which is a security anti-pattern. Instead, add an env
section to the step that maps these secrets to environment variables (such as
ACTION_PAT_ENV and MOSIP_AWS_PEM_ENV), then update the conditional checks to
reference these environment variables (e.g., [[ -z "$ACTION_PAT_ENV" ]]) instead
of directly interpolating the secrets inline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 646ca814-c8a6-44e1-9956-d336edcf5d86
📒 Files selected for processing (1)
.github/workflows/wg-onboard.yml
…rding automation
This script automates the onboarding process for WireGuard in a new environment, handling peer allocation, configuration transformation, and publishing secrets to GitHub.
Summary by CodeRabbit
Release Notes