feat: unified integration matrix with gated CI signaling#241

Closed
FNGarvin wants to merge 1 commit into runpod:main from FNGarvin:feat-ci-tests

@FNGarvin

PR: Feature: CI Testing

Description

This PR implements a tiered testing strategy that tests validation and integration independently of each other.

Key Changes

  • Unified Integration Matrix: Added tests/integration_suite.sh to report results as PASSED, FAILED, or EXPECTED_FAIL. This ensures that maintainers see clear "Feature Unavailable" signals for missing keys or root privileges instead of generic failures.
  • Full Lifecycle Audit: The suite exercises the complete lifecycle of Pods and Serverless resources (Create -> List -> Get -> Update -> Stop -> Start -> Send/Receive -> Delete).
  • Gated CI Workflow: Managed via tests/integration_suite.sh with separate root and non-root jobs. Should the parallel non-root PR be accepted, builds will be tested for fitness in both root and non-root contexts.
  • Automatic Resource Cleanup: Implemented a "Fail-Safe Kill Switch" in CI to ensure all created resources are removed even if tests fail or hang.
  • Safety Valve: Although a RUNPOD_API_KEY is required for the integration tier, the build actions have been set up so that the key is not required for the validation tier (for example, on downstream PRs). The validation tier still confirms that the project builds, installs, and executes basic commands.
  • TINY Images: The CI images are based on alpine and the test images are based on Python:Alpine. CI testing, spinning up the pods, and running the CLI tests are all extremely fast owing to Runpod's architecture.
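
As a rough illustration of the three-state reporting described under "Unified Integration Matrix", a step runner could classify outcomes by exit code. This is a minimal sketch, not the code in tests/integration_suite.sh: the `run_step` and `needs_root` names, the counters, and the choice of exit code 77 for "feature unavailable" are all assumptions made for the example.

```shell
# Hypothetical exit-code convention for this sketch (not taken from the PR):
#   0 = pass, 77 = feature unavailable (expected failure), anything else = fail.
EXPECTED_FAIL_CODE=77

PASSED=0
FAILED=0
EXPECTED=0

# run_step NAME CMD [ARGS...]: run one step and classify its outcome.
run_step() {
    step_name=$1
    shift
    if "$@"; then
        step_rc=0
    else
        step_rc=$?
    fi
    case $step_rc in
        0)
            PASSED=$((PASSED + 1))
            echo "PASSED: $step_name"
            ;;
        "$EXPECTED_FAIL_CODE")
            EXPECTED=$((EXPECTED + 1))
            echo "EXPECTED_FAIL: $step_name"
            ;;
        *)
            FAILED=$((FAILED + 1))
            echo "FAILED: $step_name (exit $step_rc)"
            ;;
    esac
    # Always return 0 so one failing step does not abort the whole run.
    return 0
}

# Example precondition: a step that needs root reports "unavailable"
# instead of a generic failure when run as a regular user.
needs_root() {
    [ "$(id -u)" -eq 0 ] || return "$EXPECTED_FAIL_CODE"
}
```

With this convention a suite would exit non-zero only when `FAILED` is greater than zero, so EXPECTED_FAIL results stay visible in the log without turning the job red.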

Integration Notes

Cost & Privacy

  • Standardized on CPU-only compute and minimal templates to minimize CI expenses.
  • The RUNPOD_API_KEY is protected against abuse by automatically reporting EXPECTED_FAIL on unapproved external forks.
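
One way the key gating could look, as a sketch under stated assumptions: GitHub Actions does not pass repository secrets to workflows triggered by pull requests from forks, so the variable arrives empty and a plain emptiness check is enough. The function names `api_key_status` and `gate_integration` are hypothetical, and how the PR's workflow handles fork approval is not shown here.

```shell
# api_key_status: print "available" when RUNPOD_API_KEY is set and non-empty,
# otherwise "EXPECTED_FAIL". On fork pull requests the secret is not injected,
# so the variable is empty.
api_key_status() {
    if [ -n "${RUNPOD_API_KEY:-}" ]; then
        echo "available"
    else
        echo "EXPECTED_FAIL"
    fi
}

# gate_integration LABEL: return 0 if the integration tier may run; otherwise
# report EXPECTED_FAIL and return 1 so the caller skips every API call.
gate_integration() {
    if [ "$(api_key_status)" = "available" ]; then
        return 0
    fi
    echo "EXPECTED_FAIL: $1 (RUNPOD_API_KEY not available)"
    return 1
}
```

A caller could write `gate_integration "pod lifecycle" && run_pod_lifecycle`, where `run_pod_lifecycle` stands in for whatever function performs the billable calls.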

Sourcery reviews, diagrams, and so on are available here if you would like to review them.

Thanks,
FNG

@FNGarvin
Author

FNGarvin commented Mar 1, 2026

Reviewer's Guide

Introduces a unified CI integration test matrix that validates runpodctl installation and full API lifecycle under both root and non-root contexts, while enhancing the installer for non-root installs, architecture detection, and robust binary download behavior.

Sequence diagram for integration_suite.sh lifecycle and EXPECTED_FAIL handling

sequenceDiagram
    actor Dev
    participant GA as GitHubActions
    participant Job as Integration_job
    participant Suite as integration_suite_sh
    participant RP as Runpod_API

    Dev->>GA: Open or update PR
    GA->>Job: Trigger integration tier job (root or non root)
    Job->>Suite: Execute tests/integration_suite.sh

    Suite->>Suite: Check RUNPOD_API_KEY and fork status
    alt Missing key or unapproved fork
        Suite-->>Job: Mark result EXPECTED_FAIL
        Suite-->>RP: Skip API calls
        Job-->>GA: Report EXPECTED_FAIL status
    else Authorized with key
        Suite->>RP: Create Pod
        RP-->>Suite: Pod identifier
        Suite->>RP: List Pods
        RP-->>Suite: Pod list
        Suite->>RP: Get Pod details
        RP-->>Suite: Pod spec
        Suite->>RP: Update Pod configuration
        RP-->>Suite: Updated spec
        Suite->>RP: Stop Pod
        RP-->>Suite: Stopped state
        Suite->>RP: Start Pod
        RP-->>Suite: Running state
        Suite->>RP: Send workload or message
        RP-->>Suite: Receive result or response

        Suite->>RP: Delete Pod and related resources
        RP-->>Suite: Deletion confirmed

        Suite->>Suite: Fail safe kill switch
        Suite->>RP: Cleanup any leaked Pods or Serverless

        alt Any lifecycle step fails or hangs
            Suite-->>Job: Mark result FAILED
        else All lifecycle steps succeed
            Suite-->>Job: Mark result PASSED
        end

        Job-->>GA: Report PASSED or FAILED status
    end

Flow diagram for enhanced install.sh logic

flowchart TD
    A[Start install.sh] --> B[Print Installing runpodctl]

    B --> C{OS is macOS?}
    C -->|Yes| D[try_brew_install]
    D --> E{brew installed and runpodctl tap install succeeds?}
    E -->|Yes| F[Exit success]
    E -->|No| G[Fall back to binary install]
    C -->|No| G

    G --> H[detect_install_dir]
    H --> I{EUID == 0?}
    I -->|Yes| J[Set INSTALL_DIR to /usr/local/bin]
    I -->|No| K[Search preferred user dirs in PATH and writable]
    K --> L{Found suitable dir?}
    L -->|Yes| M[Set INSTALL_DIR to found dir]
    L -->|No| N[Create $HOME/.local/bin and set as INSTALL_DIR]
    J --> O[check_root]
    M --> O
    N --> O

    O --> P[Print non-root note if EUID != 0]
    P --> Q[check_system_requirements]
    Q --> R{All of wget tar grep sed present?}
    R -->|No| S[Print error Missing required commands and exit]
    R -->|Yes| T[fetch_latest_version]

    T --> U[Call GitHub releases latest API with wget]
    U --> V[Parse tag_name using grep and sed into VERSION]
    V --> W{VERSION empty?}
    W -->|Yes| X[Print failure to fetch version and exit]
    W -->|No| Y[download_url_constructor]

    Y --> Z[Detect os_type and arch_type]
    Z --> ZA{darwin linux or unsupported?}
    ZA -->|Unsupported| ZB[Print unsupported OS and exit]
    ZA -->|darwin| ZC[Set os_type darwin and arch_type all]
    ZA -->|linux| ZD[Map uname -m to amd64 or arm64 or exit]
    ZC --> ZE[Build URL1 and URL2 naming patterns]
    ZD --> ZE

    ZE --> ZF[download_and_install_cli]
    ZF --> ZG[Set cli_archive_file_name runpodctl.tar.gz]
    ZG --> ZH[Loop over DOWNLOAD_URLS]
    ZH --> ZI{wget succeeds for url?}
    ZI -->|Yes| ZJ[Set success true and break loop]
    ZI -->|No| ZH
    ZH -->|After loop| ZK{success == true?}
    ZK -->|No| ZL[Print failed to download from any URLs and exit]

    ZK -->|Yes| ZM[Extract runpodctl with tar]
    ZM --> ZN{tar extraction failed?}
    ZN -->|Yes| ZO[Print extract failure and exit]
    ZN -->|No| ZP[Remove archive file]
    ZP --> ZQ[chmod +x runpodctl]

    ZQ --> ZR{INSTALL_DIR writable?}
    ZR -->|No| ZS[Print INSTALL_DIR not writable, remove binary, exit]
    ZR -->|Yes| ZT[Move runpodctl into INSTALL_DIR]
    ZT --> ZU{mv succeeded?}
    ZU -->|No| ZV[Print failure to move binary and exit]
    ZU -->|Yes| ZW[Print runpodctl installed successfully to INSTALL_DIR]

    ZW --> ZX[End]
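
The detect_install_dir branch of the flow above (nodes H through N) might be sketched as follows. This is an illustration, not the installer's code: the candidate directory list is an assumption, and `id -u` is used in place of the `EUID` check shown in the diagram so the sketch also runs in shells other than bash.

```shell
# detect_install_dir: print the directory the binary should be installed into.
detect_install_dir() {
    # Root installs system-wide.
    if [ "$(id -u)" -eq 0 ]; then
        echo "/usr/local/bin"
        return 0
    fi
    # Non-root: prefer a user directory that is already in PATH and writable.
    # The candidate list is an assumption for this sketch.
    for candidate in "$HOME/.local/bin" "$HOME/bin"; do
        case ":$PATH:" in
            *":$candidate:"*)
                if [ -d "$candidate" ] && [ -w "$candidate" ]; then
                    echo "$candidate"
                    return 0
                fi
                ;;
        esac
    done
    # Fallback: create $HOME/.local/bin and warn if it is not in PATH.
    fallback="$HOME/.local/bin"
    mkdir -p "$fallback" || return 1
    case ":$PATH:" in
        *":$fallback:"*) ;;
        *) echo "note: $fallback is not in PATH" >&2 ;;
    esac
    echo "$fallback"
}
```

Typical use would be `INSTALL_DIR=$(detect_install_dir) || exit 1`; the warning goes to stderr so it does not pollute the captured value.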

File-Level Changes

Change: Refactor install.sh to support non-root user-space installs, broaden platform support, and make the download process more resilient with fewer external dependencies.
Details:
  • Detect and select an appropriate INSTALL_DIR based on root/non-root execution, preferring user directories already in PATH and warning when the chosen directory is not in PATH.
  • Relax the root requirement to allow non-root installs while still noting when running without root privileges.
  • Replace the jq dependency and package auto-install logic with a simple check for core command-line tools (wget, tar, grep, sed), failing fast if any are missing.
  • Parse the latest GitHub release tag_name using grep/sed instead of jq to remove the jq runtime dependency.
  • Improve OS/architecture detection to support Linux amd64/arm64 and macOS universal binaries, constructing multiple candidate download URLs to support both old and new naming conventions.
  • Add a Homebrew-based install path on macOS that is attempted first, with a fallback to direct binary download if brew is unavailable or fails.
  • Make download/extraction more robust by iterating over multiple URLs, validating extraction, cleaning up temporary artifacts, and installing into the detected install directory with permission checks.
Files: install.sh
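
To make the jq-free parsing and architecture mapping concrete, here is a minimal sketch. The function names `parse_tag_name` and `map_arch` and the exact patterns are assumptions, not the expressions used in install.sh; `grep -o` is not specified by POSIX but is available in GNU, BSD, and BusyBox grep.

```shell
# parse_tag_name: read a GitHub "releases/latest" JSON document on stdin and
# print the value of its tag_name field (empty output if the field is absent).
parse_tag_name() {
    grep -o '"tag_name"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | sed -n '1s/.*"\([^"]*\)"$/\1/p'
}

# map_arch MACHINE: translate a `uname -m` value into the release
# architecture name, or fail for anything unsupported.
map_arch() {
    case $1 in
        x86_64 | amd64) echo "amd64" ;;
        aarch64 | arm64) echo "arm64" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1
            ;;
    esac
}
```

An installer could then run `VERSION=$(wget -q -O- "$RELEASES_URL" | parse_tag_name)` and exit when `VERSION` is empty, where `RELEASES_URL` is a placeholder for the releases endpoint. Text matching like this is more fragile than a JSON parser, which is the trade-off for dropping jq.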
Change: Add a comprehensive bash-based integration test suite that validates install behavior and performs a full RunPod API lifecycle across pods and serverless endpoints.
Details:
  • Implement a colored, step-based runner that classifies outcomes as PASSED, FAILED, or EXPECTED_FAIL, with optional retry behavior and environment-based expected-failure handling.
  • Validate presence of required commands (jq, curl, wget, tar, grep, sed) before running tests.
  • Add Phase 1 installation validation to test root and non-root installation flows and basic runpodctl execution.
  • Add Phase 2 integration audit that, when RUNPOD_API_KEY is present, exercises pod lifecycle (create, list, get, update, stop/start, send/receive, delete) including data-plane send/receive via croc codes.
  • Add serverless lifecycle coverage (create, get, list, update, HTTP job submit/status, delete) with propagation polling and cleanup retries.
  • Gracefully treat missing RUNPOD_API_KEY as EXPECTED_FAIL for integration-heavy tests to keep CI signal meaningful for external contributors.
Files: tests/integration_suite.sh
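
The "propagation polling" mentioned above can be illustrated with a small retry helper. This is a sketch under assumptions: the `poll_until` name and its argument order are invented for the example, and the suite's actual polling logic is not shown in this conversation.

```shell
# poll_until TIMEOUT_SECONDS INTERVAL_SECONDS CMD [ARGS...]
# Re-run CMD until it succeeds or roughly TIMEOUT_SECONDS have elapsed.
# Both durations must be whole numbers (shell integer arithmetic).
poll_until() {
    poll_timeout=$1
    poll_interval=$2
    shift 2
    poll_elapsed=0
    while :; do
        if "$@"; then
            return 0
        fi
        # Give up once the time budget is spent.
        if [ "$poll_elapsed" -ge "$poll_timeout" ]; then
            return 1
        fi
        sleep "$poll_interval"
        poll_elapsed=$((poll_elapsed + poll_interval))
    done
}
```

For example, `poll_until 60 5 endpoint_is_visible "$endpoint_id"` would retry a hypothetical `endpoint_is_visible` check every 5 seconds for about a minute. Elapsed time is counted from the sleeps only, so a slow command stretches the real timeout.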
Change: Introduce a GitHub Actions workflow to run the integration test suite in a root/non-root matrix inside an Alpine container, including emergency cleanup of leaked resources.
Details:
  • Define an Integration Test Matrix workflow triggered on pushes to key branches, pull requests to main, and manual dispatch, with a user_mode matrix for root and non-root runs.
  • Use an alpine container and install required tooling (bash, wget, grep, sed, tar, git, coreutils, jq, curl) for the test environment.
  • Set up a non-root tester user and isolated workspace for non-root runs, ensuring the test suite is executable and owned correctly.
  • Run the unified integration_suite.sh with RUNPOD_API_KEY injected from secrets, executing as root or via su for non-root mode.
  • Add a post-run emergency cleanup step that, when runpodctl and RUNPOD_API_KEY are available, enumerates and deletes any remaining pods and serverless endpoints created during tests.
Files: .github/workflows/integration-tests.yml

@TimPietruskyRunPod
Member

hey, thanks for this — having proper integration test coverage is something we've been wanting. the test suite design covering the full pod and serverless lifecycle is thorough and well thought out!

a few things we'd need addressed before this can be merged:

  1. this PR and #240 ("feat: portable, root-free installer refactor (dependency-free)") have identical diffs: same 3 files, same changes. was this intentional? we'd suggest consolidating into one PR or splitting into two focused ones (installer changes vs. CI tests).

  2. merge conflicts: #235 ("fix: correct download URL pattern in install script") was merged recently and rewrote the same install.sh sections. this needs a rebase onto current main.

  3. resource cleanup safety — the emergency cleanup step deletes all pods and all serverless endpoints on the account. if this runs against a production key, that could be catastrophic. please scope cleanup to only resources created by the test run (e.g. filter by a ci-test- name prefix).

  4. set -e + resource leaks — if a test fails, set -e causes the script to exit immediately, potentially skipping cleanup of billable resources. a trap handler for cleanup would fix this.

  5. hardcoded values — the template ids (bwf8egptou, wvrr20un0l) and the feature branch names (feat-ci-tests, feat-install-noroot) in the workflow trigger should be parameterized via env vars.
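
Items 3 to 5 could be combined roughly as follows. This is a sketch, not code from the PR: `CI_NAME_PREFIX`, `CREATED_FILE`, `track_resource`, and `delete_resource` are hypothetical names, and `delete_resource` is a placeholder that only prints what it would do, since the exact CLI calls are not shown in this conversation.

```shell
# Values come from the environment; the defaults are placeholders for this sketch.
CI_NAME_PREFIX="${CI_NAME_PREFIX:-ci-test-}"
CREATED_FILE="${CREATED_FILE:-$(mktemp)}"

# track_resource KIND ID NAME: remember a resource this run created.
track_resource() {
    printf '%s %s %s\n' "$1" "$2" "$3" >> "$CREATED_FILE"
}

# delete_resource KIND ID: placeholder; a real suite would call the CLI or API.
delete_resource() {
    echo "would delete $1 $2"
}

# cleanup: delete only tracked resources whose name carries the CI prefix,
# so nothing else on the account is touched.
cleanup() {
    [ -f "$CREATED_FILE" ] || return 0
    while read -r kind id name; do
        case $name in
            "$CI_NAME_PREFIX"*) delete_resource "$kind" "$id" || true ;;
        esac
    done < "$CREATED_FILE"
    rm -f "$CREATED_FILE"
}

# Run cleanup on every exit path, including exits caused by `set -e`.
trap cleanup EXIT
# Convert signals into normal exits so the EXIT trap also fires for them.
trap 'exit 130' INT
trap 'exit 143' TERM
```

Because cleanup works from the list of resources the run itself recorded, a production key would only ever lose resources the suite created. The same `${VAR:-default}` pattern could carry the template ids and branch names mentioned in item 5.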

we've created tracking issues for both features:

if you'd like to keep working on this, we'd suggest splitting the installer refactor and the test suite into separate PRs rebased on current main — it'll be easier to review and merge incrementally. but totally fine if you'd rather leave it to us.

really appreciate you taking the time on this — great work! 🙏

@FNGarvin
Author

FNGarvin commented Mar 3, 2026

Merged into #249

FNGarvin closed this Mar 3, 2026