
Fng infra install fix v2 #5

Merged

FNGarvin merged 5 commits into main from fng-infra-install-fix-v2 on Mar 5, 2026

FNGarvin (Owner) commented Mar 3, 2026

Fng infra install fix v2

Replaces and combines runpod#240 and runpod#241.

Hot off the presses: freshly rebased onto the latest main as of this writing.

Test plan:

  • Verify that root and non-root installs still work in a clean Linux environment:

    • podman run --rm -v $(pwd):/workspace alpine:latest sh -c 'apk add --no-cache bash wget grep sed tar coreutils jq curl && cp /workspace/install.sh /tmp/install.sh && chmod 755 /tmp/install.sh && echo "=== TESTING ROOT INSTALL ===" && cd /tmp && bash /tmp/install.sh && /usr/local/bin/runpodctl version && echo -e "\n=== TESTING NON-ROOT INSTALL ===" && adduser -D tester && su tester -c "cd /home/tester && mkdir -p ~/.local/bin && bash /tmp/install.sh" && su tester -c "~/.local/bin/runpodctl version"'
    • The non-root install should warn that it is a user-only install, and should also warn if it installed into a location that isn't already on the PATH.
  • Local test of new e2e test against live Runpod services:

    • From a shell: export RUNPOD_API_KEY=XXXYYYZZZ && go build -o runpodctl main.go && go test -tags e2e -v ./e2e/cli_lifecycle_test.go
    • Naturally, review the e2e test first to make sure you're comfortable running it with your key.
    • The export is required even if you've previously blessed a key with runpodctl doctor; that seemed sensible for a CI-focused test.
  • CI testing:

    • Run the test from the integration matrix manually, or trigger it via the appropriate action.

Summary by Sourcery

Improve the runpodctl installation script for non-root environments and add end-to-end integration testing against live RunPod services.

New Features:

  • Add end-to-end CLI lifecycle tests for pods and serverless endpoints that exercise create, list, get, update, and cleanup flows against live RunPod APIs.
  • Introduce a GitHub Actions integration test workflow that runs the e2e suite in both root and non-root user modes with optional configurable images.

Enhancements:

  • Revise the installer to support non-root, user-space installation paths with clearer messaging and PATH guidance.
  • Remove the jq dependency from the installer by relying on standard tools (wget, grep, sed, tar) and validate semantic version tags when fetching releases.
  • Improve installer robustness by handling multiple download URLs, cleaning up temporary files, and preferring Homebrew installation on macOS when available.

CI:

  • Add an integration-tests GitHub Actions workflow that conditionally uses a private RUNPOD_API_KEY secret, supports manual triggers with image overrides, and includes safety cleanup of CI-created resources.

sourcery-ai bot commented Mar 3, 2026

Reviewer's Guide

Refactors the install.sh script to support non-root, jq-free installations with better path detection and macOS Homebrew support, adds Go end-to-end lifecycle tests for pods and serverless resources against live RunPod APIs, and wires them into a GitHub Actions integration-test matrix that exercises both root and non-root modes while cleaning up CI resources.

Sequence diagram for CI-triggered e2e integration tests with conditional API key usage

sequenceDiagram
    actor Dev as Developer
    participant GH as GitHub
    participant WF as "integration-tests workflow"
    participant R as GitHub_Runner
    participant GoT as go_test_e2e
    participant RP as RunPod_API

    Dev->>GH: Push commit / open PR
    GH-->>WF: Trigger integration-tests workflow

    WF->>R: Start job (matrix: root & non-root)

    par Root and non-root matrix
        R->>R: Set up environment (install Go, build runpodctl)
        R->>R: Evaluate RUNPOD_API_KEY secret

        alt RUNPOD_API_KEY present
            R->>GoT: go test -tags e2e ./e2e/cli_lifecycle_test.go
            GoT->>RP: Create pod / serverless resources
            GoT->>RP: Poll for status and exercise lifecycle
            GoT->>RP: Delete created resources
            GoT-->>R: Exit 0 (tests passed)
        else RUNPOD_API_KEY missing (fork or external PR)
            R->>GoT: go test -tags e2e ./e2e/cli_lifecycle_test.go
            GoT-->>R: Detect missing key and skip e2e tests gracefully
            R-->>WF: Mark job as success with skipped tests
        end
    end

    WF-->>GH: Report integration-test status on commit/PR

File-Level Changes

Make the CLI installer non-root friendly, remove the jq dependency, and improve download resilience and macOS support.
  • Introduce detect_install_dir to choose an appropriate, writable install directory based on EUID, PATH, and common user bin locations, with high-visibility messaging and PATH hints for non-root installs.
  • Relax check_root to only warn on non-root and rely on INSTALL_DIR instead of hard-coded /usr/local/bin, including writeability checks before moving the binary.
  • Replace jq-based GitHub API parsing with grep/sed extraction for tag_name and validate that a plausible semantic version is returned.
  • Refactor download_url_constructor to produce an array of candidate URLs (including universal macOS binary and Linux arch-specific URLs) and update download_and_install_cli to iterate through them with better error messaging, cleanup of archives, and writable-dir validation.
  • Add try_brew_install to prefer Homebrew installation on macOS (using the invoking non-root user when available) and fall back to binary installation when brew is unavailable or fails.
  • Tighten system requirements by checking only for wget, tar, grep, and sed and failing fast if they are missing.
Files: install.sh
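To make the described install-directory selection concrete, here is a minimal Go sketch of that decision logic. It is not the script's code: `detectInstallDir`, the candidate list (`~/.local/bin`, then `~/bin`), and the assumption that `/usr/local/bin` is on PATH for root are all illustrative assumptions. The writability probe is injected so the logic can be tested without touching the filesystem.

```go
package main

import "path/filepath"

// detectInstallDir is a hypothetical Go rendering of the kind of choice
// install.sh's detect_install_dir is described as making. It returns the
// chosen directory and whether that directory is already on PATH.
func detectInstallDir(euid int, pathEnv, home string, writable func(string) bool) (string, bool) {
	// Root: use the system-wide location (assumed to be on PATH).
	if euid == 0 {
		return "/usr/local/bin", true
	}

	// Assumed user-space candidates, in order of preference.
	candidates := []string{
		filepath.Join(home, ".local", "bin"),
		filepath.Join(home, "bin"),
	}

	// Index PATH entries so a directory the shell already searches wins.
	onPath := make(map[string]bool)
	for _, p := range filepath.SplitList(pathEnv) {
		if p != "" {
			onPath[filepath.Clean(p)] = true
		}
	}

	// First pass: writable and already on PATH.
	for _, c := range candidates {
		if onPath[c] && writable(c) {
			return c, true
		}
	}
	// Second pass: any writable candidate; the caller should print a PATH hint.
	for _, c := range candidates {
		if writable(c) {
			return c, false
		}
	}
	return "", false
}
```

The two-pass search mirrors the "prefer something on PATH, otherwise warn" behavior described above; a `false` second return value is the signal to print the PATH guidance.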
Add Go e2e tests that exercise pod and serverless lifecycles via the runpodctl binary against live RunPod APIs, with key redaction and robust ID/JSON handling.
  • Create helpers to locate the runpodctl binary in common locations, run commands while capturing combined stdout/stderr, redact API keys from output, and extract id fields from JSON snippets embedded in CLI output.
  • Implement TestE2E_CLILifecycle_Pod to create, list, get, update, stop, start, and finally delete a pod using environment-configurable images/disk sizes and t.Cleanup-based teardown, plus optional croc send/receive testing gated by RUNPOD_E2E_TEST_CROC.
  • Implement TestE2E_CLILifecycle_Serverless to create a temporary serverless template, create an endpoint from it, wait for propagation, list and validate presence, update the endpoint name, and clean up template and endpoint resources.
  • Add robust handling of partial/extra CLI output by locating JSON blocks (objects/arrays) via string indices before unmarshalling.
Files: e2e/cli_lifecycle_test.go
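The key-redaction and "locate the JSON block by string indices" helpers described above can be sketched as follows. The names `redactKey` and `extractID` and the `[REDACTED]` marker are hypothetical; this assumes the CLI prints at most one top-level JSON object, possibly surrounded by non-JSON progress text that contains no braces.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// redactKey replaces every occurrence of the API key in CLI output so it
// cannot leak into test logs. An empty key leaves the output unchanged.
func redactKey(out, key string) string {
	if key == "" {
		return out
	}
	return strings.ReplaceAll(out, key, "[REDACTED]")
}

// extractID takes the span from the first '{' to the last '}' in noisy CLI
// output, unmarshals it, and returns its "id" field.
func extractID(out string) (string, error) {
	start := strings.Index(out, "{")
	end := strings.LastIndex(out, "}")
	if start < 0 || end < start {
		return "", errors.New("no JSON object found in output")
	}
	var obj struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal([]byte(out[start:end+1]), &obj); err != nil {
		return "", err
	}
	if obj.ID == "" {
		return "", errors.New(`JSON object has no "id" field`)
	}
	return obj.ID, nil
}
```

Redacting before every `t.Logf`/`t.Errorf` that echoes CLI output keeps the key out of CI logs even when a command fails unexpectedly.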
Introduce a GitHub Actions integration test workflow that runs the e2e tests in both root and non-root modes and performs defensive cleanup of CI-created resources.
  • Define an integration-tests matrix workflow triggered on pushes, PRs, and manual dispatch with inputs for pod and serverless images, using a concurrency group per ref to avoid overlap.
  • Set up Go 1.25.7 and install required CLI dependencies for tests on ubuntu-latest runners.
  • Build the runpodctl binary once and, for the non-root matrix leg, create a tester user and isolated workspace under /tmp with proper ownership.
  • Run go test -tags e2e against the e2e/cli_lifecycle_test.go file in both root and non-root modes, exporting RUNPOD_API_KEY and image overrides via workflow inputs or defaults.
  • Add an always-running cleanup step that, when RUNPOD_API_KEY is present, lists pods and serverless endpoints whose names start with ci-test- and deletes them to avoid resource leaks from CI runs.
Files: .github/workflows/integration-tests.yml
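The selection rule behind the defensive cleanup step (delete only resources whose names start with `ci-test-`) is sketched below in Go. The workflow itself presumably does this in shell; this sketch only illustrates the filter. It assumes the list output is a JSON array of objects with `id` and `name` fields, as produced with the same `--output json` flag the e2e tests use; `ciResourceIDs` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"strings"
)

// resource mirrors only the fields the cleanup needs; the field names are an
// assumption about the CLI's JSON list output.
type resource struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// ciResourceIDs returns the IDs of resources whose names start with prefix,
// so a cleanup step deletes only what CI created.
func ciResourceIDs(listJSON []byte, prefix string) ([]string, error) {
	var items []resource
	if err := json.Unmarshal(listJSON, &items); err != nil {
		return nil, err
	}
	var ids []string
	for _, it := range items {
		if strings.HasPrefix(it.Name, prefix) {
			ids = append(ids, it.ID)
		}
	}
	return ids, nil
}
```

Keying on a name prefix rather than "everything in the account" is what keeps an always-run cleanup safe to execute against a shared account.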


sourcery-ai bot left a comment

Hey - I've found 4 issues and left some high-level feedback:

  • The GitHub release version parsing in install.sh using grep/sed on the API response is fairly brittle; consider at least validating the extracted string (e.g., it starts with v and contains no whitespace) or falling back with a clearer error when the response shape changes or is rate-limited.
  • In e2e/cli_lifecycle_test.go, several filesystem operations (os.WriteFile, os.MkdirAll, os.RemoveAll) ignore returned errors; checking and failing early on these would make test failures easier to diagnose and avoid silent misbehavior.
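For comparison with the grep/sed approach, the same extract-then-validate pattern is sketched below in Go, the language of the e2e suite. It is illustrative only and not code from this PR; `latestTag` is a hypothetical name, and the version regex is the same permissive `^v[0-9]+(\.[0-9]+)*$` check suggested in the review.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

var (
	// Matches the first "tag_name": "..." pair, tolerating any whitespace.
	tagNameRe = regexp.MustCompile(`"tag_name"\s*:\s*"([^"]+)"`)
	// Plausibility check: v1, v1.2, v1.2.3, ...
	versionRe = regexp.MustCompile(`^v[0-9]+(\.[0-9]+)*$`)
)

// latestTag extracts tag_name from a GitHub "latest release" API body and
// rejects anything that does not look like a version tag, including bodies
// with no tag_name at all (for example, a rate-limit error response).
func latestTag(body string) (string, error) {
	m := tagNameRe.FindStringSubmatch(body)
	if m == nil {
		return "", errors.New("no tag_name in response")
	}
	if !versionRe.MatchString(m[1]) {
		return "", fmt.Errorf("implausible version tag %q", m[1])
	}
	return m[1], nil
}
```

Failing loudly with the offending value makes a changed or rate-limited API response obvious instead of producing a broken download URL.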
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="install.sh" line_range="111" />
<code_context>
     local version_url="https://api.github.com/repos/runpod/runpodctl/releases/latest"
-    VERSION=$(wget -q -O- "$version_url" | jq -r '.tag_name')
+    # Using grep/sed instead of jq for zero-dependency parsing
+    VERSION=$(wget -q -O- "$version_url" | grep '"tag_name":' | sed -E 's/.*"tag_name": "([^"]+)".*/\1/')
+    
     if [ -z "$VERSION" ]; then
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The ad‑hoc grep/sed JSON parsing of the GitHub API response is fairly brittle and may break if the response format changes or includes multiple tag_name fields.

This zero-dependency approach increases the risk of extracting the wrong value (e.g., from HTML error pages, rate‑limit responses, or future responses with additional `"tag_name"` fields). To mitigate this, you could either tighten the grep/sed pattern (e.g., match a more specific JSON context or anchor at line start) and/or validate `VERSION` against an expected tag format (e.g., `^v[0-9]`). That helps avoid silently using an incorrect or non-existent release.

Suggested implementation:

```
    # Using grep/sed instead of jq for zero-dependency parsing
    # - Restrict to the first matching tag_name line
    # - Expect the canonical JSON indentation for the field
    VERSION=$(wget -q -O- "$version_url" \
        | grep -m1 '^  "tag_name":' \
        | sed -E 's/^[^"]*"tag_name": "([^"]+)".*/\1/')


```

```
    # Ensure we got a plausible semantic version tag (e.g., v1.2.3)
    if [ -z "$VERSION" ] || ! [[ "$VERSION" =~ ^v[0-9]+(\.[0-9]+)*$ ]]; then
        echo "Failed to fetch a valid latest version of runpodctl (got: '${VERSION:-<empty>}')."
        exit 1

```

These changes assume:
1. The script is executed with Bash (for the `[[ ... =~ ... ]]` regex test). If `install.sh` is intended for `/bin/sh`, replace the regex check with a `case` statement instead.
2. The GitHub API continues to indent `"tag_name"` with two spaces. If you want to be more permissive, you can relax the `grep` pattern to `grep -m1 '"tag_name":'` at the cost of slightly looser matching.
</issue_to_address>

### Comment 2
<location path="e2e/cli_lifecycle_test.go" line_range="161-164" />
<code_context>
+		t.Errorf("Pod ID %s not found in list output", podID)
+	}
+
+	// Get Pod
+	t.Logf("Getting pod details...")
+	getOut, getErr := runCLI("pod", "get", podID, "--output", "json")
+	if getErr != nil {
+		t.Errorf("Failed to get pod: %v\nOutput: %s", getErr, getOut)
+	}
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen pod lifecycle assertions by validating the returned data, not just the absence of errors

Right now the test only verifies that `pod get` exits successfully. To better validate the CLI contract, you could parse `getOut` (as with `extractIDField`) and assert that the returned `id` equals `podID` and, after the update, that the `name` equals `newName`. This will help catch regressions in the CLI output shape or contents, not just failures to run the command.
</issue_to_address>
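A minimal sketch of the stronger pod assertion suggested in Comment 2. It assumes `pod get <id> --output json` prints a single JSON object with `id` and `name` fields (field names not confirmed here); `checkPod` is a hypothetical helper. If the output can contain extra text, locate the braces first as the existing helpers do.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podView holds only the fields the assertion needs (assumed field names).
type podView struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// checkPod parses `pod get` JSON and reports a mismatch in ID or name.
// Pass wantName == "" to skip the name check (e.g. before the update).
func checkPod(getOut, wantID, wantName string) error {
	var p podView
	if err := json.Unmarshal([]byte(getOut), &p); err != nil {
		return fmt.Errorf("parse pod get output: %w", err)
	}
	if p.ID != wantID {
		return fmt.Errorf("pod get returned id %q, want %q", p.ID, wantID)
	}
	if wantName != "" && p.Name != wantName {
		return fmt.Errorf("pod get returned name %q, want %q", p.Name, wantName)
	}
	return nil
}
```

Inside the test this would be used as `if err := checkPod(getOut, podID, newName); err != nil { t.Fatal(err) }`.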

### Comment 3
<location path="e2e/cli_lifecycle_test.go" line_range="328-172" />
<code_context>
+
+	t.Logf("Endpoint is ready and propagated.")
+
+	// List
+	listOut, listErr := runCLI("serverless", "list", "--output", "json")
+	if listErr != nil {
+		t.Errorf("Failed to list endpoints: %v\nOutput: %s", listErr, listOut)
+	} else if !strings.Contains(listOut, epID) {
+		t.Errorf("Endpoint ID %s not found in list output", epID)
+	}
+
+	// Update
+	newName := epName + "-updated"
+	t.Logf("Updating endpoint name to %s...", newName)
+	updateOut, updateErr := runCLI("serverless", "update", epID, "--name", newName)
+	if updateErr != nil {
+		t.Errorf("Failed to update serverless endpoint: %v\nOutput: %s", updateErr, updateOut)
+	}
</code_context>
<issue_to_address>
**suggestion (testing):** Assert serverless endpoint state (ID and updated name) from CLI output to fully validate the lifecycle

Right now this test only checks that `list`/`get` don’t error and that `listOut` contains `epID`, but it never confirms the update took effect. To tighten this:
- Parse `listOut`/`get` as JSON and assert there’s an entry for `epID`.
- After `update`, call `serverless get` and assert the name equals `newName`.
This ensures we verify actual state changes, not just successful exit codes.

Suggested implementation:

```golang
	t.Logf("Endpoint is ready and propagated.")

	// List endpoints and assert the created endpoint exists
	listOut, listErr := runCLI("serverless", "list", "--output", "json")
	if listErr != nil {
		t.Fatalf("Failed to list endpoints: %v\nOutput: %s", listErr, listOut)
	}

	type serverlessEndpoint struct {
		ID   string `json:"id"`
		Name string `json:"name"`
	}

	var endpoints []serverlessEndpoint
	if err := json.Unmarshal([]byte(listOut), &endpoints); err != nil {
		t.Fatalf("Failed to parse serverless list output as JSON: %v\nOutput: %s", err, listOut)
	}

	var listedEp *serverlessEndpoint
	for i := range endpoints {
		if endpoints[i].ID == epID {
			listedEp = &endpoints[i]
			break
		}
	}
	if listedEp == nil {
		t.Fatalf("Endpoint ID %s not found in serverless list output", epID)
	}

	// Update endpoint name
	newName := epName + "-updated"
	t.Logf("Updating endpoint name to %s...", newName)
	updateOut, updateErr := runCLI("serverless", "update", epID, "--name", newName)
	if updateErr != nil {
		t.Fatalf("Failed to update serverless endpoint: %v\nOutput: %s", updateErr, updateOut)
	}

	// Get endpoint and assert the name was updated
	getOut, getErr := runCLI("serverless", "get", epID, "--output", "json")
	if getErr != nil {
		t.Fatalf("Failed to get serverless endpoint: %v\nOutput: %s", getErr, getOut)
	}

	var updatedEp serverlessEndpoint
	if err := json.Unmarshal([]byte(getOut), &updatedEp); err != nil {
		t.Fatalf("Failed to parse serverless get output as JSON: %v\nOutput: %s", err, getOut)
	}

	if updatedEp.ID != epID {
		t.Fatalf("Expected endpoint ID %s from get, got %s", epID, updatedEp.ID)
	}
	if updatedEp.Name != newName {
		t.Fatalf("Expected endpoint name %s after update, got %s", newName, updatedEp.Name)
	}

}

```

1. Ensure `encoding/json` is imported at the top of `e2e/cli_lifecycle_test.go`, for example by updating the import block:
   - Add: `encoding/json`
2. If the actual JSON shape from `serverless list/get` differs (e.g., wrapped in an object like `{ "endpoints": [...] }` or uses different field names such as `endpointId` / `endpointName`), adjust the `serverlessEndpoint` struct and unmarshalling logic accordingly:
   - Update struct tags (e.g., ``ID string `json:"endpointId"` ``).
   - If list is wrapped, unmarshal into an intermediate struct like ``struct { Endpoints []serverlessEndpoint `json:"endpoints"` }``.
</issue_to_address>

### Comment 4
<location path="e2e/cli_lifecycle_test.go" line_range="190" />
<code_context>
+		t.Errorf("Failed to start pod: %v\nOutput: %s", startErr, startOut)
+	}
+
+	// Test Croc File Transfer (Send/Receive)
+	t.Logf("Testing croc file transfer...")
+	testFileName := "ci-test-file.txt"
</code_context>
<issue_to_address>
**suggestion (testing):** Clarify expectations and failure behavior for the croc send/receive section to avoid silent test degradation

This croc file transfer block is currently best-effort and hides several failure modes (missing binary, send not starting, code not captured in time, receive failing). That means the main test can still pass while croc support is effectively broken. Please either:
- Make the croc check explicitly optional (e.g., behind a `RUNPOD_E2E_TEST_CROC` env var) and log/skip clearly when disabled, or
- Treat it as required by asserting that `binaryPath` is found and `sendCmd.Start()` (and other critical steps) succeed, failing the test if those expectations aren’t met.
This will surface when croc coverage is actually running in CI instead of silently being skipped.

Suggested implementation:

```golang
	// Test Croc File Transfer (Send/Receive)
	enableCroc := os.Getenv("RUNPOD_E2E_TEST_CROC") != ""
	if !enableCroc {
		t.Logf("Skipping croc file transfer test: RUNPOD_E2E_TEST_CROC not set")
	} else {
		t.Logf("RUNPOD_E2E_TEST_CROC set; croc file transfer test is required")
	}

	t.Logf("Testing croc file transfer...")
	testFileName := "ci-test-file.txt"
	testFileContent := "v1.14.15-ci-test"
	if err := os.WriteFile(testFileName, []byte(testFileContent), 0644); err != nil && enableCroc {
		t.Fatalf("Failed to create croc test file %q: %v", testFileName, err)
	}
	defer os.Remove(testFileName)

```

To fully implement the behavior described in your review comment, you should:

1. **Guard the entire croc test block with `enableCroc`:**
   - Wrap all subsequent croc send/receive logic (starting at `// Test Croc File Transfer (Send/Receive)` and ending where the croc-related code finishes) in:
   ```go
   if enableCroc {
       // existing croc send/receive code...
   }
   ```
   This ensures nothing runs when `RUNPOD_E2E_TEST_CROC` is not set.

2. **Fail fast when croc is required but the binary is missing:**
   - Immediately after the `for _, p := range []string{...}` loop that resolves `binaryPath`, add:
   ```go
   if enableCroc && binaryPath == "" {
       t.Fatalf("RUNPOD_E2E_TEST_CROC is set but runpodctl binary was not found in any of the expected paths")
   }
   ```

3. **Treat critical croc steps as required when enabled:**
   For the send/receive commands (which are not visible in the snippet), ensure:
   - `exec.Command` returns only a `*Cmd` (no error); a failure to locate the binary is reported by `Start()` or `Run()`, so check those calls rather than the construction.
   - `sendCmd.Start()` is checked:
     ```go
     if err := sendCmd.Start(); err != nil && enableCroc {
         t.Fatalf("Failed to start croc send command: %v", err)
     }
     ```
   - Any subsequent `Wait()`, `Run()`, or `CombinedOutput()` calls for both send and receive commands are checked, and when `enableCroc` is true, failures should call `t.Fatalf` with clear messages (include stderr/stdout in the message).

4. **Imports:**
   - Ensure `os` and `os/exec` are already imported at the top of the file. If not, add:
   ```go
   import (
       "os"
       "os/exec"
       // ...other imports...
   )
   ```
   The code above assumes these packages are available.

With these changes, croc coverage will be explicitly optional (via `RUNPOD_E2E_TEST_CROC`), clearly logged when disabled, and strictly enforced when enabled, avoiding silent degradation.
</issue_to_address>


FNGarvin (Owner, Author) commented Mar 4, 2026

@sourcery-ai review, please?

sourcery-ai bot left a comment

Hey - I've found 1 issue and left some high-level feedback:

  • The integration workflow is configured with go-version: '1.25.7', which does not exist yet and will cause the job to fail; update this to a currently supported Go release (e.g., 1.22.x) that matches the project’s target version.
  • The croc code extraction logic in TestE2E_CLILifecycle_Pod is quite brittle (scanning arbitrary lines for hyphenated tokens); consider matching the exact expected output format from runpodctl send or printing a machine-readable code line to make the parsing deterministic.
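A minimal Go sketch of deterministic croc-code parsing along the lines suggested above. It assumes (not confirmed by this PR) that `runpodctl send` prints a line of the form `Code is: <code>` and that codes are digits followed by hyphenated lowercase words; `parseCrocCode` and `codeLineRe` are hypothetical, and the pattern should be adjusted to the real output.

```go
package main

import (
	"bufio"
	"regexp"
	"strings"
)

// codeLineRe anchors on an assumed "Code is: <code>" line with an assumed
// code shape such as 1234-foo-bar-baz.
var codeLineRe = regexp.MustCompile(`^Code is:\s*([0-9]+(?:-[a-z]+)+)\s*$`)

// parseCrocCode scans sender output line by line and returns the transfer
// code only from a line that matches the expected format exactly.
func parseCrocCode(out string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		if m := codeLineRe.FindStringSubmatch(strings.TrimSpace(sc.Text())); m != nil {
			return m[1], true
		}
	}
	return "", false
}
```

Anchoring on a full line rather than any hyphenated token means unrelated output (file names, log lines) cannot be mistaken for the code.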
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="install.sh" line_range="223-226" />
<code_context>
-    echo "runpodctl installed successfully."
-}

+    if ! mv "$cli_file_name" "$INSTALL_DIR/"; then
+        echo "Failed to move $cli_file_name to $INSTALL_DIR/."
+        exit 1
+    fi
</code_context>
<issue_to_address>
**suggestion:** On move failure, the extracted binary is left behind in the working directory.

The archive is cleaned up (`rm -f "$cli_archive_file_name"`), but if `mv` to `$INSTALL_DIR` fails, the extracted `runpodctl` remains in the current directory. Please either delete it on failure (e.g. `rm -f "$cli_file_name"`) or at least print its path so users can handle it explicitly.

```suggestion
    if ! mv "$cli_file_name" "$INSTALL_DIR/"; then
        echo "Failed to move $cli_file_name to $INSTALL_DIR/."
        echo "Removing extracted binary at '$(pwd)/$cli_file_name' to avoid leaving stray files behind."
        rm -f "$cli_file_name"
        exit 1
    fi
```
</issue_to_address>


FNGarvin force-pushed the fng-infra-install-fix-v2 branch 5 times, most recently from 147f55d to 2a39607, March 5, 2026 15:12
FNGarvin force-pushed the fng-infra-install-fix-v2 branch from 2a39607 to ec2d06f, March 5, 2026 15:20
FNGarvin merged commit 24ec641 into main on Mar 5, 2026
4 checks passed